Wednesday, April 18, 2018

Running the Apache Ranger Admin service 1.0.0 in Docker

Apache Ranger 1.0.0 was recently released after a long development cycle, featuring a huge number of improvements and bug fixes. A previous blog post covered how to manually install the Apache Ranger admin service by compiling the Apache Ranger source and using MySQL as the database. However, this involves a large number of steps, as well as installing MySQL, Apache Maven, Java, etc. In this post we will show how Docker Compose can be used to easily set up the Apache Ranger 1.0.0 admin service.

1) Description

The project is available in my GitHub testcases repository here. This project is provided as a quick and easy way to play around with the Apache Ranger admin service. It should not be deployed in production as it uses default security credentials, it is not secured with Kerberos, auditing is not enabled, etc. It contains the configuration required to build two Docker images:
  • ranger-postgres: Contains a Dockerfile to set up a Postgres database for Apache Ranger, creating the necessary users for the Ranger admin installation scripts to work.
  • ranger-admin: Contains a Dockerfile to build, configure and install the Apache Ranger admin service. It downloads the Apache Ranger source code, then builds and extracts the admin service. It configures the service to use the Postgres database and starts it when the Docker container is started.
2) Building and running

First we need to build the docker images. This can be done via:
  • (In ranger-postgres) docker build . -t coheigea/ranger-postgres
  • (In ranger-admin) docker build . -t coheigea/ranger-admin
Note that the ranger-admin Docker image takes a long time to build, as it has to build the source code using Apache Maven and hence needs to download a large number of dependencies.

There are two ways of running the project. The easiest is to install Docker Compose and then simply start it with:
  • docker-compose up
The alternative is to create a network so that we can link containers, and then run the images separately using docker, i.e.:
  • docker network create my-network
  • docker run -p 5432:5432 --name postgres-server --network my-network coheigea/ranger-postgres
  • docker run -p 6080:6080 -it --network my-network coheigea/ranger-admin
Once the Ranger admin server has started, open a browser and navigate to:
  • http://localhost:6080 (credentials: admin/admin)
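As the admin service can take a little while to come up after the container starts, it can be handy to poll the URL before opening the browser. The following is a minimal Python sketch of that idea, not part of the project itself: the helper name `wait_for_http`, the timeout values and the injectable `opener` parameter are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=300.0, interval_s=5.0, opener=urllib.request.urlopen):
    """Poll `url` until the server answers or `timeout_s` seconds have passed.

    Returns the HTTP status code of the first answer, or None on timeout.
    `opener` is a hypothetical hook so that the function can be tried out
    without a running server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as exc:
            # The server answered, just not with a success status.
            return exc.code
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet: give up on timeout, otherwise retry.
            if time.monotonic() >= deadline:
                return None
            time.sleep(interval_s)
```

Calling `wait_for_http("http://localhost:6080")` after starting the containers should return a status code once the Ranger admin service is listening, or `None` if it does not come up within the timeout.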
To see how to create authorization policies for various big data components using the UI, please refer to the numerous blog posts I have previously written on this topic (for example: Kafka, HBase, HDFS).

Thursday, March 1, 2018

The Apache Sentry security service - part IV

This is the fourth in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The second post looked at how to define the authorization privileges held in the Sentry security service. The third post looked at securing Apache Kafka with Apache Sentry, where the privileges were defined in the Sentry security service. In this post, we will update an earlier tutorial I wrote on securing Apache Hive using Apache Sentry to also retrieve the privileges from the Sentry security service.

1) Configure authorization in Apache Hive

Please follow this tutorial to install and configure Apache Hadoop and Apache Hive, except use version 2.3.2 of Apache Hive, which is the version supported by Apache Sentry 2.0.0. After installation, follow the instructions to create a table in Hive and make sure that a query is successful. Now we will integrate Apache Sentry 2.0.0 with Apache Hive. First copy the jars from the "lib" directory of the Sentry distribution to the Hive "lib" directory. We need to add three new configuration files to the "conf" directory of Apache Hive.

Create a file called 'conf/hiveserver2-site.xml' with the content:

Here we are enabling authorization and adding the Sentry authorization plugin. Note that it differs a bit from the hiveserver2-site.xml given in the previous tutorial, namely that we are not using the "v2" Sentry Hive binding as before.

Next create a new file in the "conf" directory of Apache Hive called "sentry-site.xml" with the following content:


This is the configuration file for the Sentry plugin for Hive. It instructs Sentry to retrieve the authorization privileges from the Sentry security service, and to get the groups of authenticated users from the 'sentry.ini' configuration file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true". Finally, we need to define the groups associated with a given user in 'sentry.ini' in the conf directory:

Here we assign "alice" the group "user". Note that in the earlier tutorial this file also contained the authorization privileges, but they are not required in this scenario as we are using the Apache Sentry security service.
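As a minimal sketch of how such a user-to-group mapping can be read, the Python snippet below parses a sample file. The `[users]` section layout and the comma-separated group list are assumptions made for illustration, as is the helper name `groups_by_user`; it is not how Sentry itself loads the file.

```python
import configparser

# Assumed layout: a [users] section mapping each user to its groups.
SAMPLE_SENTRY_INI = """\
[users]
alice = user
"""


def groups_by_user(ini_text):
    """Return {user: [group, ...]} from the (assumed) [users] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep user names case-sensitive
    parser.read_string(ini_text)
    if not parser.has_section("users"):
        return {}
    return {
        user: [group.strip() for group in groups.split(",") if group.strip()]
        for user, groups in parser.items("users")
    }


print(groups_by_user(SAMPLE_SENTRY_INI))
```

With the sample above, "alice" maps to the single group "user", which is the group we will grant a role to in the next section.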

2) Configure the Apache Sentry security service

Follow the first tutorial to install the Apache Sentry security service. Now we need to create the authorization privileges for our Apache Hive test scenario as per the second tutorial. Start the 'sentryCli' in the Apache Sentry distribution, and assign a role to the "user" group (of which "alice" is a member) with the privilege to perform a "select" statement on the "words" table:
  • cr select_role
  • gp select_role "Server=server1->Db=default->Table=words->Column=*->action=select"
  • gr select_role user
Now we can test authorization after restarting Apache Hive. The user 'alice' should now be able to query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)

Friday, February 23, 2018

The Apache Sentry security service - part III

This is the third in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The second post looked at how to define the authorization privileges held in the Sentry security service. In this post we will look at updating an earlier tutorial I wrote about securing Apache Kafka with Apache Sentry, this time using the security service instead of defining the privileges in a file local to the Kafka distribution.

1) Configure authorization in the broker

First download and configure Apache Kafka using SSL as per this tutorial, except use Kafka 0.11.0.2. To enable authorization using Apache Sentry, a few further steps are needed. Edit 'config/server.properties' and add:
  • authorizer.class.name=org.apache.sentry.kafka.authorizer.SentryKafkaAuthorizer
  • sentry.kafka.site.url=file:./config/sentry-site.xml
Next copy the jars from the "lib" directory of the Sentry distribution to the Kafka "libs" directory. Then create a new file in the config directory called "sentry-site.xml" with the following content:

This is the configuration file for the Sentry plugin for Kafka. It instructs Sentry to retrieve the authorization privileges from the Sentry security service, and to get the groups of authenticated users from the 'sentry.ini' configuration file. Create a new file in the config directory called "sentry.ini" with the following content:

Note that in the earlier tutorial this file also contained the authorization privileges, but they are not required in this scenario as we are using the Apache Sentry security service.

2) Configure the Apache Sentry security service

Follow the first tutorial to install the Apache Sentry security service. Now we need to create the authorization privileges for our Apache Kafka test scenario as per the second tutorial. Start the 'sentryCli' in the Apache Sentry distribution.

Select the "kafka" type and create the roles:
  • t kafka
  • cr admin_role
  • cr describe_role
  • cr read_role
  • cr write_role
  • cr describe_consumer_group_role
  • cr read_consumer_group_role
Add the privileges to the roles:
  • gp admin_role "Host=*->Cluster=kafka-cluster->action=ALL"
  • gp describe_role "Host=*->Topic=test->action=describe"
  • gp read_role "Host=*->Topic=test->action=read"
  • gp write_role "Host=*->Topic=test->action=write"
  • gp describe_consumer_group_role "Host=*->ConsumerGroup=test-consumer-group->action=describe"
  • gp read_consumer_group_role "Host=*->ConsumerGroup=test-consumer-group->action=read"
Associate the roles with groups (defined in 'sentry.ini' above):
  • gr admin_role admin
  • gr describe_role producer
  • gr read_role producer
  • gr write_role producer
  • gr read_role consumer
  • gr describe_role consumer
  • gr describe_consumer_group_role consumer
  • gr read_consumer_group_role consumer
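To see at a glance what each group ends up being allowed to do, the role and group assignments above can be modelled in a few lines of Python. This is only an illustrative sketch of the group → role → privilege lookup, with the data copied from the commands above; the function name `privileges_for_group` is made up for the example and has nothing to do with how Sentry evaluates policies internally.

```python
# Role -> privileges, copied from the "gp" commands above.
ROLE_PRIVILEGES = {
    "admin_role": ["Host=*->Cluster=kafka-cluster->action=ALL"],
    "describe_role": ["Host=*->Topic=test->action=describe"],
    "read_role": ["Host=*->Topic=test->action=read"],
    "write_role": ["Host=*->Topic=test->action=write"],
    "describe_consumer_group_role": [
        "Host=*->ConsumerGroup=test-consumer-group->action=describe"
    ],
    "read_consumer_group_role": [
        "Host=*->ConsumerGroup=test-consumer-group->action=read"
    ],
}

# Group -> roles, copied from the "gr" commands above.
GROUP_ROLES = {
    "admin": ["admin_role"],
    "producer": ["describe_role", "read_role", "write_role"],
    "consumer": [
        "read_role",
        "describe_role",
        "describe_consumer_group_role",
        "read_consumer_group_role",
    ],
}


def privileges_for_group(group):
    """Return the sorted privileges a group holds via its roles."""
    privileges = set()
    for role in GROUP_ROLES.get(group, []):
        privileges.update(ROLE_PRIVILEGES.get(role, []))
    return sorted(privileges)


for group in GROUP_ROLES:
    print(group)
    for privilege in privileges_for_group(group):
        print("  " + privilege)
```

Note that only the "producer" group holds the "write" privilege on the "test" topic, while only the "consumer" group holds privileges on the consumer group.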
3) Test authorization

Now start the broker (after starting ZooKeeper):
  • bin/kafka-server-start.sh config/server.properties
Start the producer:
  • bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --producer.config config/producer.properties
Send a few messages to check that the producer is authorized correctly. Now start the consumer:
  • bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer.config config/consumer.properties --new-consumer
Authorization should succeed and you should see the messages sent by the producer appear in the consumer console window.

Tuesday, February 20, 2018

Enabling Apache CXF Fediz plugin logging in Apache Tomcat

The Apache CXF Fediz subproject provides an easy way to secure your web applications via the WS-Federation Passive Requestor Profile. An earlier tutorial I wrote covers how to deploy and secure a "simpleWebapp" project that ships with Fediz in Apache Tomcat. One of the questions that came up recently on that article was how to enable logging for the Fediz plugin itself (as opposed to the IdP/STS). My colleague Jan Bernhardt has covered this topic using Apache Log4j. Here we will show a simple alternative way to enable logging using java.util.logging.

Please follow the earlier tutorial to set up and secure the "simpleWebapp" in Apache Tomcat. Note that after a successful test, the IdP logs appear in "logs/idp.log" and the STS logs appear in "logs/sts.log". However no logs exist for the plugin itself. To rectify this, copy the "slf4j-jdk14" jar into "lib/fediz" (for example from here). Then edit 'webapps/fedizhelloworld/WEB-INF/classes/logging.properties' with the following content:

This configuration logs "INFO" level messages to the console (catalina.out) and logs "FINE" level messages to the log file "logs/rp.log" in XML format. For example:

Wednesday, February 14, 2018

The Apache Sentry security service - part II

This is the second in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The next logical question is how we can define the authorization privileges held in the Sentry security service. In this post we will briefly cover what those privileges look like, and how we can query them using two different tools that ship with the Apache Sentry distribution.

1) Apache Sentry privileges

The Apache Sentry docker image we covered in the previous tutorial ships with a 'sentry.ini' configuration file (see here) that is used to retrieve the groups associated with a given user. A user must be a member of the "admin" group to invoke on the Apache Sentry security service, as configured in 'sentry-site.xml' (see here). Note that 'sentry.ini' also contains "[groups]" and "[roles]" sections; to avoid confusion, these are not used by the Sentry security service.

In Apache Sentry, a user is associated with one or more groups, which in turn are associated with one or more roles, which in turn are associated with one or more privileges. Privileges are made up of a number of different components that vary slightly depending on what service the privilege is associated with (e.g. Hive, Kafka, etc.). For example:
  • Host=*->Topic=test->action=ALL - This Kafka privilege grants all actions on the "test" topic on all hosts.
  • Collection=logs->action=* - This Solr privilege grants all actions on the "logs" collection.
  • Server=sqoopServer1->Connector=c1->action=* - This Sqoop privilege grants all actions on the "c1" connector on the "sqoopServer1" server.
  • Server=server1->Db=default->Table=words->Column=count->action=select - This Hive privilege grants the "select" action on the "count" column of the "words" table in the "default" database on the "server1" server.
For more information on the Apache Sentry privilege model please consult the official wiki.
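To make the structure of these privilege strings a little more concrete, here is a small Python sketch that splits a privilege into its "key=value" components. It only illustrates the string format shown in the examples above; the function names `parse_privilege` and `action_of` are hypothetical and this is not the parser that Sentry uses.

```python
def parse_privilege(privilege):
    """Split a privilege string into an ordered list of (key, value) pairs."""
    pairs = []
    for segment in privilege.split("->"):
        key, sep, value = segment.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError("malformed privilege segment: %r" % segment)
        pairs.append((key, value))
    return pairs


def action_of(privilege):
    """Return the value of the 'action' component, or None if there is none."""
    for key, value in parse_privilege(privilege):
        if key.lower() == "action":
            return value
    return None


hive_privilege = "Server=server1->Db=default->Table=words->Column=count->action=select"
for key, value in parse_privilege(hive_privilege):
    print(key, "=", value)
```

For the Hive example this yields the components "Server", "Db", "Table", "Column" and "action" in order, which mirrors the resource hierarchy described above.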

2) Querying the Apache Sentry security service using 'sentryShell'

Follow the steps outlined in the previous tutorial to get the Apache Sentry security service up and running using either the docker image or by setting it up manually. The Apache Sentry distribution ships with a "sentryShell" command line tool that we can use to query the Apache Sentry security service. So depending on which approach you followed to install Sentry, either go to the distribution or else log into the docker container.

We can query the roles, groups and privileges via:
  • bin/sentryShell -conf sentry-site.xml -lr
  • bin/sentryShell -conf sentry-site.xml -lg
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role
We can create an "admin_role" role and add it to the "admin" group via:
  • bin/sentryShell -conf sentry-site.xml -cr -r admin_role
  • bin/sentryShell -conf sentry-site.xml -arg -g admin -r admin_role
We can grant a (Hive) privilege to the "admin_role" role as follows:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Server=*->action=ALL"
If we are adding a privilege for anything other than Apache Hive, we need to explicitly specify the "type", e.g.:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Host=*->Cluster=kafka-cluster->action=ALL" -t kafka
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role -t kafka
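If you find yourself granting many privileges, the commands above are easy to generate from a script. The following Python sketch builds the argument list for a grant, adding "-t" only when a type other than the default (Hive) is needed. It uses only the flags shown above; the helper name `grant_privilege_args` and its parameters are assumptions for the purposes of illustration.

```python
import shlex


def grant_privilege_args(role, privilege, service_type=None, conf="sentry-site.xml"):
    """Build the sentryShell argument list to grant `privilege` to `role`.

    `service_type` is left out for Hive privileges and set to e.g. "kafka"
    for anything else, mirroring the "-t" flag used above.
    """
    args = ["bin/sentryShell", "-conf", conf, "-gpr", "-r", role, "-p", privilege]
    if service_type is not None:
        args += ["-t", service_type]
    return args


# Print the commands in a form that can be pasted into a shell.
print(shlex.join(grant_privilege_args("admin_role", "Server=*->action=ALL")))
print(shlex.join(grant_privilege_args(
    "admin_role", "Host=*->Cluster=kafka-cluster->action=ALL", service_type="kafka")))
```

The resulting lists could also be passed to `subprocess.run` from the root of the Sentry distribution (or inside the docker container) instead of being printed.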
3) Querying the Apache Sentry security service using 'sentryCli'

A rather more user-friendly alternative to the 'sentryShell' is available in Apache Sentry 2.0.0. The 'sentryCli' can be started with 'bin/sentryCli'. Typing ?l lists the available commands:

The Apache Sentry security service can be queried using any of these commands.

Monday, February 12, 2018

The Apache Sentry security service - part I

Apache Sentry is a role-based authorization solution for a number of big-data projects. I have previously blogged about how to install the authorization plugin to secure various deployments.

For all of these tutorials, the authorization privileges were stored in a configuration file local to the deployment. However, this is just a "test configuration" to get simple examples up and running quickly. For production scenarios, Apache Sentry offers a central security service, which stores the user roles and privileges in a database, and provides an RPC service that the Sentry authorization plugins can invoke on. In this article, we will show how to set up the Apache Sentry security service in a couple of different ways.

1) Installing the Apache Sentry security service manually

Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}. In addition, download a compatible version of Apache Hadoop (2.7.5 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Hadoop distribution.

First we need to specify two configuration files, "sentry-site.xml" which contains the Sentry configuration, and "sentry.ini" which defines the user/group information for the user who will be invoking on the Sentry security service. You can download sample configuration files here. Copy these files to the root directory of "${sentry.home}". Edit the 'sentry.ini' file and replace 'user' with the user who will be invoking on the security service (such as "kafka" or "solr"). The other entries will be ignored - 'sentry-site.xml' defines that a user must belong to the "admin" group to invoke on the security service successfully.

Finally configure the database and start the Apache Sentry security service via:
  • bin/sentry --command schema-tool --conffile sentry-site.xml --dbType derby --initSchema
  • bin/sentry --command service -c sentry-site.xml
2) Installing the Apache Sentry security service via docker

Instead of having to download and configure Apache Sentry and Hadoop, a simpler way to get started is to download a pre-made docker image that I created. The Dockerfile is available here and the docker image is available here. Note that this docker image is only for testing use, as the security service is not secured with Kerberos and it uses the default credentials. Download and run the docker image with:
  • docker pull coheigea/sentry
  • docker run -p 8038:8038 coheigea/sentry
Once the container has started, we need to update the 'sentry.ini' file with the username that we are going to use to invoke on the Apache Sentry security service. Get the "id" of the running container via "docker ps" and then run "docker exec -it <id> bash". Edit 'sentry.ini' and change 'user' to the username you are using.

In the next tutorial we will look at how to manually invoke on the security service.

Thursday, February 8, 2018

Securing Apache Sqoop - part III

This is the third and final post about securing Apache Sqoop. The first post looked at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. The second post showed how to secure Apache Sqoop with Apache Ranger. In this post we will look at an alternative way of implementing authorization in Apache Sqoop, namely using Apache Sentry.

1) Install the Apache Sentry Sqoop plugin

If you have not done so already, please follow the steps in the earlier tutorial to set up Apache Sqoop. Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}.

a) Configure sqoop.properties

We need to configure Apache Sqoop to use Apache Sentry for authorization. Edit 'conf/sqoop.properties' and add the following properties:
  • org.apache.sqoop.security.authentication.type=SIMPLE
  • org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
  • org.apache.sqoop.security.authorization.handler=org.apache.sentry.sqoop.authz.SentryAuthorizationHandler
  • org.apache.sqoop.security.authorization.access_controller=org.apache.sentry.sqoop.authz.SentryAccessController
  • org.apache.sqoop.security.authorization.validator=org.apache.sentry.sqoop.authz.SentryAuthorizationValidator
  • org.apache.sqoop.security.authorization.server_name=SqoopServer1
  • sentry.sqoop.site.url=file:./conf/sentry-site.xml
In addition, we need to add some of the Sentry jars to the Sqoop classpath. Add the following property to 'conf/sqoop.properties', substituting the value for "${sentry.home}":
  • org.apache.sqoop.classpath.extra=${sentry.home}/lib/sentry-binding-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-core-common-2.0.0.jar:${sentry.home}/lib/sentry-core-model-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-provider-file-2.0.0.jar:${sentry.home}/lib/sentry-provider-common-2.0.0.jar:${sentry.home}/lib/sentry-provider-db-2.0.0.jar:${sentry.home}/lib/shiro-core-1.4.0.jar:${sentry.home}/lib/sentry-policy-engine-2.0.0.jar:${sentry.home}/lib/sentry-policy-common-2.0.0.jar
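As the property value is long and easy to get wrong when substituting by hand, it can be generated instead. The Python sketch below joins the jar names listed above onto a given Sentry home directory; the directory "/opt/sentry" and the function name `sqoop_extra_classpath` are purely hypothetical.

```python
import posixpath

# Jar names copied from the property above (Apache Sentry 2.0.0).
SENTRY_JARS = [
    "sentry-binding-sqoop-2.0.0.jar",
    "sentry-core-common-2.0.0.jar",
    "sentry-core-model-sqoop-2.0.0.jar",
    "sentry-provider-file-2.0.0.jar",
    "sentry-provider-common-2.0.0.jar",
    "sentry-provider-db-2.0.0.jar",
    "shiro-core-1.4.0.jar",
    "sentry-policy-engine-2.0.0.jar",
    "sentry-policy-common-2.0.0.jar",
]


def sqoop_extra_classpath(sentry_home):
    """Return the ':'-separated classpath value for the given Sentry home."""
    return ":".join(posixpath.join(sentry_home, "lib", jar) for jar in SENTRY_JARS)


# "/opt/sentry" is a made-up location; use wherever Sentry was extracted.
print("org.apache.sqoop.classpath.extra=" + sqoop_extra_classpath("/opt/sentry"))
```

The printed line can then be pasted into 'conf/sqoop.properties'.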

b) Add Apache Sentry configuration files

Next we will configure the Apache Sentry authorization plugin. Create a new file in the Sqoop "conf" directory called "sentry-site.xml" with the following content (substituting the correct directory for "sentry.sqoop.provider.resource"):

This configuration essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. Finally, we need to specify the authorization privileges. Create a new file in the Sqoop "conf" directory called "sentry.ini" with the following content, substituting "colm" for the name of the user running the Sqoop shell:

2) Test authorization 

Now start Apache Sqoop ("bin/sqoop2-server start") and start the shell ("bin/sqoop2-shell"). "show connector" should list the full range of Sqoop Connectors, as authorization has succeeded. To test that authorization correctly denies access to unauthorized users, change the "ALL" permission in 'conf/sentry.ini' to "WRITE", and restart the server and shell. This time access is not granted and a blank list should be returned for "show connector".