Friday, February 23, 2018

The Apache Sentry security service - part III

This is the third in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The second post looked at how to define the authorization privileges held in the Sentry security service. In this post we will look at updating an earlier tutorial I wrote about securing Apache Kafka with Apache Sentry, this time using the security service instead of defining the privileges in a file local to the Kafka distribution.

1) Configure authorization in the broker

First, download and configure Apache Kafka using SSL as per this tutorial, except use Kafka 0.11.0.2. To enable authorization using Apache Sentry we also need to follow some additional steps. Edit 'config/server.properties' and add:
  • authorizer.class.name=org.apache.sentry.kafka.authorizer.SentryKafkaAuthorizer
  • sentry.kafka.site.url=file:./config/sentry-site.xml
Next copy the jars from the "lib" directory of the Sentry distribution to the Kafka "libs" directory. Then create a new file in the config directory called "sentry-site.xml" with the following content:
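A sketch of what this file contains - the property names follow the Sentry generic authorization model, and the security service address (localhost:8038) is an assumption based on the default used in this series:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Retrieve authorization privileges from the Sentry security service -->
    <property>
        <name>sentry.kafka.provider.backend</name>
        <value>org.apache.sentry.provider.db.generic.SentryGenericProviderBackend</value>
    </property>
    <property>
        <name>sentry.service.client.server.rpc-addresses</name>
        <value>localhost</value>
    </property>
    <property>
        <name>sentry.service.client.server.rpc-port</name>
        <value>8038</value>
    </property>
    <!-- No kerberos in this test scenario -->
    <property>
        <name>sentry.service.security.mode</name>
        <value>none</value>
    </property>
    <!-- Retrieve the groups of authenticated users from the local sentry.ini file -->
    <property>
        <name>sentry.provider</name>
        <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
    </property>
    <property>
        <name>sentry.kafka.provider.resource</name>
        <value>./config/sentry.ini</value>
    </property>
</configuration>
```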

This is the configuration file for the Sentry plugin for Kafka. It instructs Sentry to retrieve the authorization privileges from the Sentry security service, and to get the groups of authenticated users from the 'sentry.ini' configuration file. Create a new file in the config directory called "sentry.ini" with the following content:
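A minimal sketch - the principal names below are placeholders, so substitute the principals from your SSL setup:

```ini
[users]
# Map each authenticated principal to its group(s). The principal
# names here are placeholders - use the ones from your SSL setup.
kafka = admin
client = producer, consumer
```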
Note that in the earlier tutorial this file also contained the authorization privileges, but they are not required in this scenario as we are using the Apache Sentry security service.

2) Configure the Apache Sentry security service

Follow the first tutorial to install the Apache Sentry security service. Now we need to create the authorization privileges for our Apache Kafka test scenario, as per the second tutorial. Start the 'sentryCli' tool in the Apache Sentry distribution.

First set the component type to Kafka, then create the roles:
  • t kafka
  • cr admin_role
  • cr describe_role
  • cr read_role
  • cr write_role
  • cr describe_consumer_group_role 
  • cr read_consumer_group_role
Add the privileges to the roles:
  • gp admin_role "Host=*->Cluster=kafka-cluster->action=ALL"
  • gp describe_role "Host=*->Topic=test->action=describe"
  • gp read_role "Host=*->Topic=test->action=read"
  • gp write_role "Host=*->Topic=test->action=write"
  • gp describe_consumer_group_role "Host=*->ConsumerGroup=test-consumer-group->action=describe"
  • gp read_consumer_group_role "Host=*->ConsumerGroup=test-consumer-group->action=read"
Associate the roles with groups (defined in 'sentry.ini' above):
  • gr admin_role admin
  • gr describe_role producer
  • gr read_role producer
  • gr write_role producer
  • gr read_role consumer
  • gr describe_role consumer
  • gr describe_consumer_group_role consumer
  • gr read_consumer_group_role consumer
3) Test authorization

Now start the broker (after starting Zookeeper):
  • bin/kafka-server-start.sh config/server.properties
Start the producer:
  • bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --producer.config config/producer.properties
Send a few messages to check that the producer is authorized correctly. Now start the consumer:
  • bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer.config config/consumer.properties --new-consumer
Authorization should succeed and you should see the messages made by the producer appear in the consumer console window.
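For reference, the 'config/producer.properties' and 'config/consumer.properties' files come from the SSL tutorial referenced above; a hedged sketch of what they typically contain (paths and passwords are placeholders):

```properties
# SSL configuration for the console producer/consumer (placeholder values)
security.protocol=SSL
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=truststore-password
ssl.keystore.location=/path/to/keystore.jks
ssl.keystore.password=keystore-password
ssl.key.password=key-password

# In consumer.properties only: the group must match the "ConsumerGroup"
# used in the Sentry privileges above
group.id=test-consumer-group
```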

Tuesday, February 20, 2018

Enabling Apache CXF Fediz plugin logging in Apache Tomcat

The Apache CXF Fediz subproject provides an easy way to secure your web applications via the WS-Federation Passive Requestor Profile. An earlier tutorial I wrote covers how to deploy and secure a "simpleWebapp" project that ships with Fediz in Apache Tomcat. One of the questions that came up recently on that article was how to enable logging for the Fediz plugin itself (as opposed to the IdP/STS). My colleague Jan Bernhardt has covered this topic using Apache Log4j. Here we will show a simple alternative way to enable logging using java.util.logging.

Please follow the earlier tutorial to set up and secure the "simpleWebapp" in Apache Tomcat. Note that after a successful test, the IdP logs appear in "logs/idp.log" and the STS logs appear in "logs/sts.log". However no logs exist for the plugin itself. To rectify this, copy the "slf4j-jdk14" jar into "lib/fediz" (for example from here). Then edit 'webapps/fedizhelloworld/WEB-INF/classes/logging.properties' with the following content:
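A sketch of the kind of java.util.logging configuration required - the relative log file path assumes Tomcat is started from the CATALINA_HOME directory:

```properties
handlers = java.util.logging.ConsoleHandler, java.util.logging.FileHandler
.level = INFO

# "INFO" level messages go to the console (catalina.out)
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# "FINE" level messages go to logs/rp.log in XML format
java.util.logging.FileHandler.level = FINE
java.util.logging.FileHandler.pattern = logs/rp.log
java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

# Enable "FINE" logging for the Fediz plugin classes
org.apache.cxf.fediz.level = FINE
```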

This configuration logs "INFO" level messages to the console (catalina.out) and logs "FINE" level messages to the log file "logs/rp.log" in XML format.

Wednesday, February 14, 2018

The Apache Sentry security service - part II

This is the second in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The next logical question is how we can define the authorization privileges held in the Sentry security service. In this post we will briefly cover what those privileges look like, and how we can query them using two different tools that ship with the Apache Sentry distribution.

1) Apache Sentry privileges

The Apache Sentry docker image we covered in the previous tutorial ships with a 'sentry.ini' configuration file (see here) that is used to retrieve the groups associated with a given user. A user must be a member of the "admin" group to invoke on the Apache Sentry security service, as configured in 'sentry-site.xml' (see here). Note that 'sentry.ini' also contains "[groups]" and "[roles]" sections; to avoid confusion, be aware that these are not used by the Sentry security service.

In Apache Sentry, a user is associated with one or more groups, which in turn are associated with one or more roles, which in turn are associated with one or more privileges. Privileges are made up of a number of different components that vary slightly depending on what service the privilege is associated with (e.g. Hive, Kafka, etc.). For example:
  • Host=*->Topic=test->action=ALL - This Kafka privilege grants all actions on the "test" topic on all hosts.
  • Collection=logs->action=* - This Solr privilege grants all actions on the "logs" collection.
  • Server=sqoopServer1->Connector=c1->action=* - This Sqoop privilege grants all actions on the "c1" connector on the "sqoopServer1" server.
  • Server=server1->Db=default->Table=words->Column=count->action=select - This Hive privilege grants the "select" action on the "count" column of the "words" table in the "default" database on the "server1" server.
For more information on the Apache Sentry privilege model please consult the official wiki.

2) Querying the Apache Sentry security service using 'sentryShell'

Follow the steps outlined in the previous tutorial to get the Apache Sentry security service up and running, using either the docker image or by setting it up manually. The Apache Sentry distribution ships with a "sentryShell" command line tool that we can use to query the Apache Sentry security service. Depending on which approach you followed to install Sentry, either go to the distribution or else log into the docker container.

We can query the roles, groups and privileges via:
  • bin/sentryShell -conf sentry-site.xml -lr
  • bin/sentryShell -conf sentry-site.xml -lg
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role
We can create an "admin_role" role and add it to the "admin" group via:
  • bin/sentryShell -conf sentry-site.xml -cr -r admin_role
  • bin/sentryShell -conf sentry-site.xml -arg -g admin -r admin_role
We can grant a (Hive) privilege to the "admin_role" role as follows:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Server=*->action=ALL"
If we are adding a privilege for anything other than Apache Hive, we need to explicitly specify the "type", e.g.:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Host=*->Cluster=kafka-cluster->action=ALL" -t kafka
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role -t kafka
3) Querying the Apache Sentry security service using 'sentryCli'

A rather more user-friendly alternative to the 'sentryShell' is available from Apache Sentry 2.0.0. The 'sentryCli' can be started with 'bin/sentryCli'. Typing '?l' lists the available commands:
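The full listing is not reproduced here, but the short-form commands used in this series of posts include the following sketch:

```text
t  <component>             set the component type (e.g. kafka)
cr <role>                  create a role
gp <role> "<privilege>"    grant a privilege to a role
gr <role> <group>          grant a role to a group
```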

The Apache Sentry security service can be queried using any of these commands.

Monday, February 12, 2018

The Apache Sentry security service - part I

Apache Sentry is a role-based authorization solution for a number of big-data projects. I have previously blogged about how to install the authorization plugin to secure various deployments, such as Apache Kafka, Apache Solr and Apache Sqoop.
For all of these tutorials, the authorization privileges were stored in a configuration file local to the deployment. However this is just a "test configuration" to get simple examples up and running quickly. For production scenarios, Apache Sentry offers a central security service, which stores the user roles and privileges in a database, and provides an RPC service that the Sentry authorization plugins can invoke on. In this article, we will show how to set up the Apache Sentry security service in a couple of different ways.

1) Installing the Apache Sentry security service manually

Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}. In addition, download a compatible version of Apache Hadoop (2.7.5 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Hadoop distribution.

First we need to specify two configuration files, "sentry-site.xml" which contains the Sentry configuration, and "sentry.ini" which defines the user/group information for the user who will be invoking on the Sentry security service. You can download sample configuration files here. Copy these files to the root directory of "${sentry.home}". Edit the 'sentry.ini' file and replace 'user' with the user who will be invoking on the security service (such as "kafka" or "solr"). The other entries will be ignored - 'sentry-site.xml' defines that a user must belong to the "admin" group to invoke on the security service successfully.
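For example, if the user invoking on the security service is "kafka", the relevant section of 'sentry.ini' would look like the following sketch:

```ini
[users]
# Replace 'kafka' with the user who will invoke on the security
# service; the user must be mapped to the "admin" group
kafka = admin
```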

Finally configure the database and start the Apache Sentry security service via:
  • bin/sentry --command schema-tool --conffile sentry-site.xml --dbType derby --initSchema
  • bin/sentry --command service -c sentry-site.xml
2) Installing the Apache Sentry security service via docker

Instead of having to download and configure Apache Sentry and Hadoop, a simpler way to get started is to download a pre-made docker image that I created. The Dockerfile is available here and the docker image is available here. Note that this docker image is intended for testing only, as the security service is not secured with Kerberos and it uses the default credentials. Download and run the docker image with:
  • docker pull coheigea/sentry
  • docker run -p 8038:8038 coheigea/sentry
Once the container has started, we need to update the 'sentry.ini' file with the username that we are going to use to invoke on the Apache Sentry security service. Get the "id" of the running container via "docker ps" and then run "docker exec -it <id> bash". Edit 'sentry.ini' and change 'user' to the username you are using.

In the next tutorial we will look at how to manually invoke on the security service.

Thursday, February 8, 2018

Securing Apache Sqoop - part III

This is the third and final post about securing Apache Sqoop. The first post looked at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. The second post showed how to secure Apache Sqoop with Apache Ranger. In this post we will look at an alternative way of implementing authorization in Apache Sqoop, namely using Apache Sentry.

1) Install the Apache Sentry Sqoop plugin

If you have not done so already, please follow the steps in the earlier tutorial to set up Apache Sqoop. Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}.

a) Configure sqoop.properties

We need to configure Apache Sqoop to use Apache Sentry for authorization. Edit 'conf/sqoop.properties' and add the following properties:
  • org.apache.sqoop.security.authentication.type=SIMPLE
  • org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
  • org.apache.sqoop.security.authorization.handler=org.apache.sentry.sqoop.authz.SentryAuthorizationHandler
  • org.apache.sqoop.security.authorization.access_controller=org.apache.sentry.sqoop.authz.SentryAccessController
  • org.apache.sqoop.security.authorization.validator=org.apache.sentry.sqoop.authz.SentryAuthorizationValidator
  • org.apache.sqoop.security.authorization.server_name=SqoopServer1
  • sentry.sqoop.site.url=file:./conf/sentry-site.xml
In addition, we need to add some of the Sentry jars to the Sqoop classpath. Add the following property to 'conf/sqoop.properties', substituting the value for "${sentry.home}":
  • org.apache.sqoop.classpath.extra=${sentry.home}/lib/sentry-binding-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-core-common-2.0.0.jar:${sentry.home}/lib/sentry-core-model-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-provider-file-2.0.0.jar:${sentry.home}/lib/sentry-provider-common-2.0.0.jar:${sentry.home}/lib/sentry-provider-db-2.0.0.jar:${sentry.home}/lib/shiro-core-1.4.0.jar:${sentry.home}/lib/sentry-policy-engine-2.0.0.jar:${sentry.home}/lib/sentry-policy-common-2.0.0.jar

b) Add Apache Sentry configuration files

Next we will configure the Apache Sentry authorization plugin. Create a new file in the Sqoop "conf" directory called "sentry-site.xml" with the following content (substituting the correct directory for "sentry.sqoop.provider.resource"):
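A sketch of what this file contains - the provider class is the standard Sentry file-based provider, and the resource path is a placeholder to be substituted:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Retrieve the groups of authenticated users from a local file -->
    <property>
        <name>sentry.sqoop.provider</name>
        <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
    </property>
    <!-- The file containing the groups, roles and authorization privileges -->
    <property>
        <name>sentry.sqoop.provider.resource</name>
        <value>file:///path/to/sqoop/conf/sentry.ini</value>
    </property>
</configuration>
```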

It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. Finally, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content, substituting "colm" for the name of the user running the Sqoop shell:
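A sketch consistent with the test scenario in the next section; the group and role names here are hypothetical:

```ini
[users]
# Substitute 'colm' for the user running the Sqoop shell
colm = admin_group

[groups]
admin_group = admin_role

[roles]
# Change "ALL" to "WRITE" to test that access is then denied
admin_role = Server=SqoopServer1->action=ALL
```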

2) Test authorization 

Now start Apache Sqoop ("bin/sqoop2-server start") and start the shell ("bin/sqoop2-shell"). "show connector" should list the full range of Sqoop Connectors, as authorization has succeeded. To test that authorization is correctly disabling access for unauthorized users, change the "ALL" permission in 'conf/sentry.ini' to "WRITE", and restart the server and shell. This time access is not granted and a blank list should be returned for "show connector".

Monday, January 29, 2018

Securing Apache Sqoop - part II

This is the second in a series of posts on how to secure Apache Sqoop. The first post looked at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. In this post we will look at securing Apache Sqoop with Apache Ranger, such that only authorized users can interact with it. We will then show how to use the Apache Ranger Admin UI to create authorization policies for Apache Sqoop.

1) Install the Apache Ranger Sqoop plugin

If you have not done so already, please follow the steps in the earlier tutorial to set up Apache Sqoop. First we will install the Apache Ranger Sqoop plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-sqoop-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-sqoop-plugin ${ranger.sqoop.home}
Now go to ${ranger.sqoop.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "SqoopTest".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Sqoop installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-sqoop-plugin.sh". Make sure that the user you are running Sqoop as has permission to access '/etc/ranger/SqoopTest', which is where the Ranger plugin for Sqoop will download authorization policies created in the Ranger Admin UI.

In the Apache Sqoop directory, copy 'conf/ranger-sqoop-security.xml' to the root directory (or else add the 'conf' directory to the Sqoop classpath). Now restart Apache Sqoop and try to see the Connectors that were installed:
  • bin/sqoop2-server start
  • bin/sqoop2-shell
  • show connector
You should see an empty list here as you are not authorized to see the connectors. Note that "show job" should still work OK, as you have permission to view jobs that you created.

2) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Sqoop. Follow the steps in this tutorial (except use at least Ranger 1.0.0) to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Sqoop service with the following configuration values:
  • Service Name: SqoopTest
  • Username: admin
  • Sqoop URL: http://localhost:12000
Note that "Test Connection" is not going to work here, as the "admin" user is not authorized at this stage to read from the Sqoop 2 server. However, once the service is created and the policies synced to the Ranger plugin in Sqoop (roughly every 30 seconds by default), it should work correctly.

Once the "SqoopTest" service is created, we will create some authorization policies for the user who is using the Sqoop Shell.
Click on "Settings" and "Users/Groups" and add a new user corresponding to the user for whom you wish to create authorization policies. When this is done, click on the "SqoopTest" service and edit the existing policies, adding this user (for example):

Wait 30 seconds for the policies to sync to the Ranger plugin that is co-located with the Sqoop service. Now re-start the Shell and "show connector" should list the full range of Sqoop Connectors, as authorization has succeeded. Similar policies could be created to allow only certain users to run jobs created by other users.

Friday, January 26, 2018

Securing Apache Sqoop - part I

This is the first in a series of posts on how to secure Apache Sqoop. Apache Sqoop is a tool to transfer bulk data mainly between HDFS and relational databases, but also supporting other projects such as Apache Kafka. In this post we will look at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. Subsequent posts will show how to authorize this data transfer using both Apache Ranger and Apache Sentry.

Note that we will only use Sqoop 2 (current version 1.99.7), as this is the only version that both Sentry and Ranger support. However, this version is not (yet) recommended for production deployment.

1) Set up Apache Hadoop and Apache Kafka

First we will set up Apache Hadoop and Apache Kafka. The use-case is that we want to transfer a file from HDFS (/data/LICENSE.txt) to a Kafka topic (test). Follow part (1) of an earlier tutorial I wrote about installing Apache Hadoop. The following change is also required for 'etc/hadoop/core-site.xml' (in addition to the "fs.defaultFS" setting that is configured in the earlier tutorial):
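A hedged sketch of the kind of change required - most likely the standard proxy-user settings that allow Sqoop to impersonate users when submitting jobs (the username 'colm' is a placeholder for the user running Sqoop):

```xml
<!-- Allow the Sqoop user to impersonate other users (placeholder username) -->
<property>
    <name>hadoop.proxyuser.colm.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.colm.groups</name>
    <value>*</value>
</property>
```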

Make sure that LICENSE.txt is uploaded to the /data directory as outlined in the tutorial. Now we will set up Apache Kafka. Download Apache Kafka and extract it (1.0.0 was used for the purposes of this tutorial). Start Zookeeper with:
  • bin/zookeeper-server-start.sh config/zookeeper.properties
and start the broker and then create a "test" topic with:
  • bin/kafka-server-start.sh config/server.properties
  • bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Finally let's set up a consumer for the "test" topic:
  • bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer.config config/consumer.properties
2) Set up Apache Sqoop

Download Apache Sqoop and extract it (1.99.7 was used for the purposes of this tutorial).

2.a) Configure + start Sqoop

Before starting Sqoop, edit 'conf/sqoop.properties' and change the following property to point instead to the Hadoop configuration directory (e.g. /path.to.hadoop/etc/hadoop):
  • org.apache.sqoop.submission.engine.mapreduce.configuration.directory
Then configure and start Apache Sqoop with the following commands:
  • export HADOOP_HOME=<path to the Hadoop distribution>
  • bin/sqoop2-tool upgrade
  • bin/sqoop2-tool verify
  • bin/sqoop2-server start (use 'stop' to shut the server down)
2.b) Configure links/job in Sqoop

Now that Sqoop has started we need to configure it to transfer data from HDFS to Kafka. Start the Shell via:
  • bin/sqoop2-shell
"show connector" lists the connectors that are available. We first need to configure a link for the HDFS connector:
  • create link -connector hdfs-connector
  • Name: HDFS
  • URI: hdfs://localhost:9000
  • Conf directory: Path to Hadoop conf directory
Similarly, for the Kafka connector:
  • create link -connector kafka-connector
  • Name: KAFKA
  • Kafka brokers: localhost:9092
  • Zookeeper quorum: localhost:2181
"show link" shows the links we've just created. Now we need to create a job from the HDFS link to the Kafka link as follows (accepting the default values if they are not specified below):
  • create job -f HDFS -t KAFKA
  • Name: testjob
  • Input Directory: /data
  • Topic: test
We can see the job we've created with "show job". Now let's start the job:
  • start job -name testjob 
You should see the content of the HDFS "/data" directory (i.e. the LICENSE.txt) appear in the window of the Kafka "test" consumer, showing that Sqoop has transferred data from HDFS to Kafka.