Wednesday, April 26, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part IV

This is the fourth in a series of blog posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. The third post looked at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. In this post I will look at how you can implement transparent encryption in HDFS using the Apache Ranger Key Management Service (KMS).

1) Install and Configure the Apache Ranger KMS

If you have not done so already, follow the instructions in this tutorial to install the Apache Ranger admin service, and then start it via "sudo ranger-admin start". Open a browser and go to "http://localhost:6080/". Log on with "admin/admin" and click on "Settings". Create a new user corresponding to the name of the user that starts HDFS.

The next step is to install the Apache Ranger KMS. Please follow step (2) in a blog post I wrote last year about this. When installation is complete, start the KMS service with "sudo ranger-kms start". Log out of the Admin UI and log back in again with the credentials "keyadmin/keyadmin". Click on the "+" button on the "KMS" tab to create a new KMS Service. Specify the following values:
  • Service Name: kmsdev
  • KMS URL: kms://http@localhost:9292/kms
  • Username: keyadmin
  • Password: keyadmin
When the "kmsdev" service has been created then click on it and edit the default policy that has been created. Edit the existing "allow condition" for "hdfs" adding in the user that will be starting HDFS (if not the "hdfs" user itself). Also grant the "CREATE" permission to that user so that we can create keys from the command line, and the "DECRYPT EEK" permission, so that the user can decrypt the data encryption key:


2) Create an encryption zone in HDFS

In your Hadoop distribution (after first following the steps in the first post), edit 'etc/hadoop/core-site.xml' and add the following property:
  • hadoop.security.key.provider.path - kms://http@localhost:9292/kms
Similarly, edit 'etc/hadoop/hdfs-site.xml' and add the following property (sample XML for both files is shown below):
  • dfs.encryption.key.provider.uri - kms://http@localhost:9292/kms
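For reference, here is a minimal sketch of how these two properties look in XML form, using the same KMS URL as configured above:

  <!-- etc/hadoop/core-site.xml -->
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:9292/kms</value>
  </property>

  <!-- etc/hadoop/hdfs-site.xml -->
  <property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@localhost:9292/kms</value>
  </property>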
Start HDFS via 'sbin/start-dfs.sh'. Let's create a new encryption key called "enckey" as follows:
  • bin/hadoop key create enckey
If you go back to the Ranger Admin UI and click on "Encryption / Key Manager" and select the "kmsdev" service, you should be able to see the new key that was created. Now let's create a new encryption zone in HDFS as follows:
  • bin/hadoop fs -mkdir /zone
  • bin/hdfs crypto -createZone -keyName enckey -path /zone
  • bin/hdfs crypto -listZones
That's it! Any data we put into the '/zone' directory will be encrypted with a data encryption key, which is in turn encrypted by the "enckey" key we created and stored in the Ranger KMS.
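As a quick check, copy a file into the encryption zone and read it back (this assumes LICENSE.txt is present in the Hadoop distribution directory, as in the earlier posts in this series):
  • bin/hadoop fs -put LICENSE.txt /zone
  • bin/hadoop fs -cat /zone/LICENSE.txt
The read will only succeed for a user that has been granted the "DECRYPT EEK" permission in the KMS policy above.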

Friday, April 21, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part III

This is the third in a series of posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. In this post we will look at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. For information on how to create tag-based authorization policies for Apache Kafka, see a post I wrote earlier this year.

The Apache Ranger admin console allows you to create security policies for HDFS by associating a user/group with some permissions (read/write/execute) and a resource, such as a directory or file. This is called a "Resource based policy" in Apache Ranger. An alternative is to use a "Tag based policy", which instead associates the user/group + permissions with a "tag". You can create and manage tags in Apache Atlas, and Apache Ranger supports the ability to import tags from Apache Atlas via a tagsync service, something we will cover in this post.

1) Start Apache Atlas and create entities/tags for HDFS

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "Data". Click on "Search" and then on the "Create new entity" link. Select an entity type of "hdfs_path" with the following values:
  • QualifiedName: data@cl1
  • Name: Data
  • Path: /data
Once the new entity has been created, click on "+" beside "Tags" and associate the new entity with the "Data" tag.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. First, start the Apache Ranger admin service and rename the HDFS service we created in the previous tutorial from "HDFSTest" to "cl1_hadoop". This is because the TagSync service syncs tags into the Ranger service whose name is formed from the cluster name in the entity's qualified name ("cl1" in "data@cl1") followed by "_hadoop". Next, edit 'etc/hadoop/ranger-hdfs-security.xml' in your Hadoop distribution and change "ranger.plugin.hdfs.service.name" to "cl1_hadoop", and change "ranger.plugin.hdfs.policy.cache.dir" along the same lines. Finally, make sure the directory '/etc/ranger/cl1_hadoop/policycache' exists and that the user you are running Hadoop as can read from and write to it.
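A sketch of how the relevant properties in 'etc/hadoop/ranger-hdfs-security.xml' look after these changes (the other properties in the file are left untouched):

  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>cl1_hadoop</value>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.cache.dir</name>
    <value>/etc/ranger/cl1_hadoop/policycache</value>
  </property>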

After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows (a sample of the relevant lines is shown after the list):
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to "true"
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". It can now be started via "sudo ranger-tagsync-services.sh start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HDFSTagService". Create a new policy for this service called "DataPolicy". In the "TAG" field enter a capital "D" and the "Data" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "HDFS" component permissions "read" and "execute".


The last thing we need to do is go back to the Resource based policies, edit "cl1_hadoop", and select the tag service we created above.

4) Testing authorization in HDFS using our tag based policy

Wait until the Ranger authorization plugin syncs the new authorization policies from the Ranger Admin service and then we can test authorization. In the previous tutorial we showed that the file owner and user "alice" can read the data stored in '/data', but "bob" could not. Now we should be able to successfully read the data as "bob" due to the tag based authorization policy we have created:
  • sudo -u bob bin/hadoop fs -cat /data/LICENSE.txt

Thursday, April 20, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part II

This is the second in a series of posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. In this post we will look at how to use Apache Ranger to authorize access to data stored in HDFS. The Apache Ranger Admin console allows you to create policies which are retrieved and enforced by an HDFS authorization plugin. Apache Ranger allows us to create centralized authorization policies for HDFS, as well as an authorization audit trail stored in Solr or HDFS.

1) Install the Apache Ranger HDFS plugin

First we will install the Apache Ranger HDFS plugin. Follow the steps in the previous tutorial to setup Apache Hadoop, if you have not done this already. Then download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-hdfs-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-hdfs-plugin ${ranger.hdfs.home}
Now go to ${ranger.hdfs.home} and edit "install.properties". You need to specify the following properties (a sample is shown after the list):
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "HDFSTest".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
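The relevant lines in 'install.properties' end up looking roughly like this (the Hadoop path is a placeholder for wherever your distribution is installed):

  POLICY_MGR_URL=http://localhost:6080
  REPOSITORY_NAME=HDFSTest
  COMPONENT_INSTALL_DIR_NAME=/path/to/hadoop-2.7.3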
Save "install.properties" and install the plugin as root via "sudo ./enable-hdfs-plugin.sh". The Apache Ranger HDFS plugin should now be successfully installed. Start HDFS with:
  • sbin/start-dfs.sh
2) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for our data in HDFS. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start", open a browser at "http://localhost:6080/", and log on with "admin/admin". Add a new HDFS service with the following configuration values:
  • Service Name: HDFSTest
  • Username: admin
  • Password: admin
  • Namenode URL: hdfs://localhost:9000
Click on "Test Connection" to verify that we can connect successfully to HDFS + then save the new service. Now click on the "HDFSTest" service that we have created. Add a new policy for the "/data" resource path for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with permissions of "read" and "execute".


3) Testing authorization in HDFS

Now let's test the Ranger authorization policy we created above in action. Note that by default the HDFS authorization plugin first checks for a Ranger authorization policy that grants access, and if none is found it falls back to the default POSIX permissions. The Ranger authorization plugin will pull policies from the Admin service every 30 seconds by default. For the "HDFSTest" example above, they are stored in "/etc/ranger/HDFSTest/policycache/" by default. Make sure that the user you are running Hadoop as can access this directory.

Now let's test to see if I can read the data file as follows:
  • bin/hadoop fs -cat /data/LICENSE* (this should work via the underlying POSIX permissions)
  • sudo -u alice bin/hadoop fs -cat /data/LICENSE* (this should work via the Ranger authorization policy)
  • sudo -u bob bin/hadoop fs -cat /data/LICENSE* (this should fail as we don't have an authorization policy for "bob").

Wednesday, April 19, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part I

Last year, I wrote a series of articles on securing Apache Kafka using Apache Ranger and Apache Sentry. In this series of posts I will look at how to secure the Apache Hadoop Distributed File System (HDFS) using Ranger and Sentry, such that only authorized users can access data stored in it. In this post we will look at a very basic way of installing Apache Hadoop and accessing some data stored in HDFS. Then we will look at how to authorize access to the data stored in HDFS using POSIX permissions and ACLs.

1) Installing Apache Hadoop

The first step is to download and extract Apache Hadoop. This tutorial uses version 2.7.3. The next step is to configure Apache Hadoop as a single node cluster so that we can easily get it up and running on a local machine. You will need to follow the steps outlined in the previous link to install ssh and pdsh. If you can't log in to localhost without a password ("ssh localhost"), then you need to follow the instructions given in the link about setting up passphraseless ssh.

In addition, we want to run Apache Hadoop in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process. Edit 'etc/hadoop/core-site.xml' and add:
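A minimal sketch of the property to add, assuming the standard single-node setup (this is the same NameNode URL used when configuring Ranger later in this series):

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>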
Next edit 'etc/hadoop/hdfs-site.xml' and add:
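For a single-node cluster the usual setting is a replication factor of 1, along these lines:

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>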

Make sure that the JAVA_HOME variable in 'etc/hadoop/hadoop-env.sh' is correct, and then format the filesystem and start Hadoop via:
  • bin/hdfs namenode -format
  • sbin/start-dfs.sh
To confirm that everything is working correctly, you can open "http://localhost:50070" and check on the status of the cluster there. Once Hadoop has started, upload some data to HDFS and then access it:
  • bin/hadoop fs -mkdir /data
  • bin/hadoop fs -put LICENSE.txt /data
  • bin/hadoop fs -ls /data
  • bin/hadoop fs -cat /data/*
2) Securing HDFS using POSIX Permissions

We've seen how to access some data stored in HDFS via the command line. Now how can we create some authorization policies to restrict access to this data? The simplest way is to use the standard POSIX permissions. If we look at the LICENSE.txt file we uploaded to '/data', we see that it has the permissions "-rw-r--r--", which means that other users can read it. Remove read access for all users other than the owner on the '/data' directory via:
  • bin/hadoop fs -chmod og-r /data
Now create a test user called "alice" on your system and try to access the LICENSE.txt file we uploaded above via:
  • sudo -u alice bin/hadoop fs -cat /data/*
You will see an error that says "cat: Permission denied: user=alice, access=READ_EXECUTE".

3) Securing HDFS using ACLs

Securing access to data stored in HDFS via POSIX permissions works fine, however it does not allow you to specify fine-grained permissions for users other than the file owner. What if we want to allow "alice" from the previous section to read the file, but not "bob"? We can achieve this via Hadoop ACLs. To enable ACLs, we will need to add a property called "dfs.namenode.acls.enabled" with value "true" to 'etc/hadoop/hdfs-site.xml' and restart HDFS.
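The property as it would appear in 'etc/hadoop/hdfs-site.xml':

  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
  </property>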

We can grant read access to 'alice' via:
  • bin/hadoop fs -setfacl -m user:alice:r-- /data/*
  • bin/hadoop fs -setfacl -m user:alice:r-x /data
To see the new ACLs associated with LICENSE.txt, run:
  • bin/hadoop fs -getfacl /data/LICENSE.txt
In addition to the owner, we now have the ACL "user:alice:r--". Now we can read the data as "alice". However another user "bob" cannot read the data. To avoid confusion with future blog posts on securing HDFS, we will now remove the ACLs we added via:
  • bin/hadoop fs -setfacl -b /data
  • bin/hadoop fs -setfacl -b /data/LICENSE.txt

Tuesday, April 18, 2017

Apache CXF 3.1.11 released

Apache CXF 3.1.11 (and 3.0.13) has been released. This release fixes a large number of bugs (there are over 100 issues fixed in the CXF JIRA for this release). From a security POV, here are some of the more notable bug fixes and changes:
  • CXF-7315 - Abstract the STS client token caching behaviour to allow the user to plug in a custom implementation
  • CXF-7296 - Add support to enable revocation for TLS via configuration (see here). 
  • CXF-7314 - Custom BinarySecurityTokens are not used to set up the security context
  • CXF-4692 - Allow customization of Request Security Token Response
  • CXF-7252 - TLSParameterJaxBUtils.getTrustManagers getting password from wrong system property
In addition, two new security advisories have been issued for bugs fixed in this release:
  • CVE-2017-5653 - Apache CXF JAX-RS XML Security streaming clients do not validate that the service response was signed or encrypted.
  • CVE-2017-5656 - Apache CXF's STSClient uses a flawed way of caching tokens that are associated with delegation tokens.
Please update to the latest releases if you are affected by either of these issues.

Thursday, March 30, 2017

Using OCSP with TLS in Apache CXF

The previous article showed how to enable OCSP for WS-Security based SOAP services in Apache CXF, by checking the revocation status of a certificate used for X.509 digital signature. The article stated that OCSP is supported in Apache CXF when TLS is used to secure communication between a web service client and server, but didn't give any further information. In this post we will show how to enable OCSP when using TLS for both a web service (JAX-WS or JAX-RS) client and server.

The test-code is available on github here (also contains WS-Security OCSP tests):
  • cxf-ocsp: This project contains a number of tests that show how a CXF service can validate client certificates using OCSP.
1) Enabling OCSP for web service clients

First we'll look at enabling OCSP for web service clients. The TLSOCSPTest shows how this can be done. Two Java security properties are set in the test-code to enable OCSP: 
  • "ocsp.responderURL": The URL of the OCSP service
  • "ocsp.enable": "true" to enable OCSP
The first property is required if the service certificate does not contain the URL of the OCSP service in a certificate extension. Before running the test, install openssl and run the following command from the "openssl" directory included in the project (use the passphrase "security"):
  • openssl ocsp -index ca.db.index -port 12345 -text -rkey wss40CAKey.pem -CA wss40CA.pem -rsigner wss40CA.pem
Two options are available to get OCSP working for a web service client. The first is to configure TLS in code as shown in the first test contained in TLSOCSPTest. A PKIXBuilderParameters instance is created with the truststore and revocation is explicitly "enabled" on it. This is then wrapped in a CertPathTrustManagerParameters and used to initialise the TrustManagerFactory. 
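Here is a rough sketch of that first approach. The truststore path and password are placeholders, and the responder URL assumes the openssl responder started above on port 12345:

  import java.io.FileInputStream;
  import java.security.KeyStore;
  import java.security.Security;
  import java.security.cert.PKIXBuilderParameters;
  import java.security.cert.X509CertSelector;
  import javax.net.ssl.CertPathTrustManagerParameters;
  import javax.net.ssl.TrustManagerFactory;

  public class OcspTrustSetup {

      public static TrustManagerFactory createTrustManagerFactory() throws Exception {
          // Enable OCSP checking of the certificate presented during the TLS handshake
          Security.setProperty("ocsp.enable", "true");
          // Only required if the service certificate does not carry the OCSP URL in an extension
          Security.setProperty("ocsp.responderURL", "http://localhost:12345");

          // Load the client truststore (placeholder path and password)
          KeyStore trustStore = KeyStore.getInstance("JKS");
          try (FileInputStream fis = new FileInputStream("client-truststore.jks")) {
              trustStore.load(fis, "security".toCharArray());
          }

          // Build PKIX parameters against the truststore, with revocation checking enabled
          PKIXBuilderParameters pkixParams =
              new PKIXBuilderParameters(trustStore, new X509CertSelector());
          pkixParams.setRevocationEnabled(true);

          // Wrap the parameters and use them to initialise the TrustManagerFactory
          TrustManagerFactory tmf = TrustManagerFactory.getInstance("PKIX");
          tmf.init(new CertPathTrustManagerParameters(pkixParams));
          return tmf;
      }
  }

The trust managers returned by tmf.getTrustManagers() can then be set on CXF's TLSClientParameters (via setTrustManagers()) before making the client call.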

The second test shows a new and alternative way of enabling OCSP if you want to configure your TLS keys in spring. This feature is only available from CXF 3.1.11 onwards. The spring configuration file for the client contains a tlsClientParameters element with the attribute enableRevocation="true". Once the "ocsp.enable" security property is set, this will enable revocation checking on the certificate presented by the server during the TLS handshake.
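For illustration, the client configuration might look something like this; the conduit name, service namespace and truststore details are placeholders rather than the values used in the test project:

  <http:conduit name="{http://www.example.org/doubleit}DoubleItPort.http-conduit"
      xmlns:http="http://cxf.apache.org/transports/http/configuration"
      xmlns:sec="http://cxf.apache.org/configuration/security">
      <http:tlsClientParameters enableRevocation="true">
          <sec:trustManagers>
              <sec:keyStore type="JKS" password="security" file="client-truststore.jks"/>
          </sec:trustManagers>
      </http:tlsClientParameters>
  </http:conduit>

Remember that the "ocsp.enable" security property still needs to be set for the revocation check to actually use OCSP.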

2) Enabling OCSP for web service servers

We also show via the TLSOCSPClientAuthTest how to enable OCSP for web service servers that use CXF's Jetty transport. Openssl should be started as per the client tests. The server requires client authentication and then uses OCSP to verify the revocation status of the certificate presented by the client during the TLS handshake. The TLS configuration for the server is done in code. However it can also be done in spring using the "enableRevocation" attribute as per the client above.

Tuesday, March 21, 2017

Using OCSP with WS-Security in Apache CXF

OCSP (Online Certificate Status Protocol) is an HTTP-based protocol used to check whether a given X.509 certificate is revoked or not. It is supported in Apache CXF when TLS is used to secure communication between a web service client and server. However, it is also possible to use it with a SOAP request secured with WS-Security. When the client signs a portion of the SOAP request using XML digital signature, the service can be configured to check whether the certificate in question is revoked or not via OCSP. We will cover some simple test-cases in this post that show how this can be done.

The test-code is available on github here:
  • cxf-ocsp: This project contains a number of tests that show how a CXF service can validate client certificates using OCSP.
The project contains two separate test-classes for WS-Security in particular. Both are for a simple "double it" SOAP web service invocation using Apache CXF. The clients are configured with CXF's WSS4JOutInterceptor, to encrypt and sign the SOAP Body using credentials contained in keystores. For signature, the signing certificate is included in the security header of the request. On the receiving side, the services are configured to validate the signature and to decrypt the request. In particular, the property "enableRevocation" is set to "true" to enable revocation checking.
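As a rough illustration of the service-side WS-Security configuration (the crypto property file and callback handler names here are placeholders, not the ones used in the test project):

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.cxf.ws.security.wss4j.WSS4JInInterceptor;
  import org.apache.wss4j.dom.handler.WSHandlerConstants;

  public class SecuredServiceConfig {

      public static WSS4JInInterceptor createInInterceptor() {
          Map<String, Object> inProps = new HashMap<>();
          // Expect the SOAP Body to be signed and encrypted by the client
          inProps.put(WSHandlerConstants.ACTION, "Signature Encrypt");
          inProps.put(WSHandlerConstants.SIG_VER_PROP_FILE, "serviceKeystore.properties");
          inProps.put(WSHandlerConstants.DEC_PROP_FILE, "serviceKeystore.properties");
          inProps.put(WSHandlerConstants.PW_CALLBACK_CLASS,
              "org.example.ServiceKeystorePasswordCallback");
          // Check the revocation status of the client's signing certificate
          inProps.put(WSHandlerConstants.ENABLE_REVOCATION, "true");
          return new WSS4JInInterceptor(inProps);
      }
  }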

The first test, WSSecurityOCSPTest, is a conventional test of the OCSP functionality. Two Java security properties are set in the test-code to enable OCSP (the server runs in the same process as the client):
  • "ocsp.responderURL": The URL of the OCSP service
  • "ocsp.enable": "true" to enable OCSP
The first property is required if the client certificate does not contain the URL of the OCSP service in a certificate extension. Before running the test, install openssl and run the following command from the "openssl" directory included in the project (use the passphrase "security"):
  • openssl ocsp -index ca.db.index -port 12345 -text -rkey wss40CAKey.pem -CA wss40CA.pem -rsigner wss40CA.pem
Now run the test (e.g. mvn test -Dtest=WSSecurityOCSPTest). In the openssl console window you should see the OCSP request data.

The second test, WSSecurityOCSPCertTest, tests the scenario where the OCSP service signs the response with a different certificate to that of the issuer of the client certificate. Under ordinary circumstances OCSP revocation checking will fail in this case, and indeed this is covered in the test. However, it's also possible to support this scenario, by adding the OCSP certificate to the service truststore (this is already done in the test), and by setting the following additional security properties (a sketch is shown after the list):
  • "ocsp.responderCertIssuerName": DN of the issuer of the cert
  • "ocsp.responderCertSerialNumber": Serial number of the cert
Launch Openssl from the "openssl" directory included in the project:
  • openssl ocsp -index ca.db.index -port 12345 -text -rkey wss40key.pem -CA wss40CA.pem -rsigner wss40.pem
and run the test via "mvn test -Dtest=WSSecurityOCSPCertTest".