Product: TIBCO Spotfire®
Configure Hive HDFS Permissions
Note: As of 6.2.2, we provide updated Hive support. This Knowledge Base article walks you through the steps to set up your system and describes how we store results files on HDFS.
HDFS Directory and Permissions Configuration
User Chorus HDFS Directory
Create the /user/chorus directory; its owner:group is set with the chown command further below.
hdfs dfs -mkdir -p /user/chorus
This directory is used to cache the uploaded JAR files.
The /user/chorus directory should have read, write, and execute permissions set for the chorus user.
hdfs dfs -chown chorus:supergroup /user/chorus
hdfs dfs -chmod 777 /user/chorus
The staging directory is typically set as
/user. If not, please create a directory using the modified
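After running the commands in this section, it can help to confirm that /user/chorus ended up with the intended owner, group, and mode. The following is a minimal sketch, not part of the product: check_ls_line is a hypothetical helper that inspects one line of `hdfs dfs -ls -d` output, assuming the usual whitespace-separated listing format (mode, replication, owner, group, ...).

```shell
# Succeeds only if one `hdfs dfs -ls -d <path>` line has the expected mode, owner, and group.
# Usage: check_ls_line "<ls line>" <mode string> <owner> <group>
# check_ls_line is a hypothetical helper name, not a Hadoop or Spotfire command.
check_ls_line() (
    line=$1; want_mode=$2; want_owner=$3; want_group=$4
    set -f              # disable globbing before splitting the line into fields
    set -- $line        # fields: 1=mode, 2=replication, 3=owner, 4=group
    mode=${1%+}         # a trailing "+" only signals that ACLs are present
    [ "$mode" = "$want_mode" ] && [ "$3" = "$want_owner" ] && [ "$4" = "$want_group" ]
)
```

A possible invocation, assuming the hdfs client is on the PATH: check_ls_line "$(hdfs dfs -ls -d /user/chorus)" drwxrwxrwx chorus supergroup && echo "OK"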
Active Directory (AD) Permissions
In order to run Pig jobs, the Spotfire Data Science application attempts to create a folder called
/user/<username> as the AD user. By default, the permissions are set to
hdfs:supergroup:drwxr-xr-x which prevents Spotfire Data Science from creating that folder. Change the permissions to grant write access to that folder to the AD users running the Spotfire Data Science application. (Use
Temp Directory Permissions
In order to run YARN, Pig, and similar jobs, each individual user may need to write temporary files to the temporary directories.
There are many Hadoop temp directories, such as pig.tmp.dir. By default, all of them are based on the /tmp directory.
The /tmp directory must be writable by everyone so that everyone can run jobs.
The /tmp directory must also be executable by everyone so that everyone can traverse the directory tree.
Set the /tmp permissions by using the following command:
hdfs dfs -chmod 777 /tmp
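The octal mode 777 used above grants read, write, and execute to the owner, the group, and everyone else. As an illustration only, the sketch below converts a three-digit octal mode into the symbolic form that `hdfs dfs -ls` prints; octal_to_symbolic is a hypothetical helper, and it deliberately ignores four-digit modes such as those carrying a sticky bit.

```shell
# Convert a three-digit octal mode (for example 777) to its nine-character symbolic form.
# octal_to_symbolic is a hypothetical helper name used for illustration.
octal_to_symbolic() (
    mode=$1
    case $mode in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got: $mode" >&2; exit 1 ;;
    esac
    out=""
    # Split the mode into single digits and map each one to its rwx triplet.
    for digit in $(printf '%s' "$mode" | sed 's/./& /g'); do
        case $digit in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
        esac
    done
    printf '%s\n' "$out"
)
```

For example, octal_to_symbolic 777 prints rwxrwxrwx, which is what the listing for /tmp should show after the d type flag.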
Spotfire Data Science Related HDFS Configuration
Spotfire Data Science Directory Structure
Spotfire Data Science uses several temporary directories on HDFS. These directories and files are created with
mapred, and other users.
The temporary directories must be made accessible to the user
alpine and other relevant users at the base level.
Note: Only individual directories for the specified user can be viewed by that user.
These directories are:
- Standard output for operators:
- Spotfire Data Science temporary output:
- Spotfire Data Science model location:
Temp Directory Ownership For Spotfire Data Science Folders
The /tmp directory should be readable and writable.
The /tmp/hadoop-yarn directory should be readable and writable for Spark jobs.
Create the Spotfire Data Science folders and assign permissions to them to avoid permission failures.
hdfs dfs -mkdir /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
hdfs dfs -chown chorus /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
hdfs dfs -chmod 777 /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
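The folder setup above can also be written as a loop, which makes it easier to rerun. This is a sketch under stated assumptions: the folder names are the ones passed to mkdir above, setup_tsds_dirs is a hypothetical function name, the hdfs client is assumed to be on the PATH, and the caller is assumed to have HDFS superuser rights. The -p flag is added so that mkdir does not fail when a folder already exists.

```shell
# Create each Spotfire Data Science folder, then set its owner and mode.
# setup_tsds_dirs is a hypothetical name; stops at the first failing command.
setup_tsds_dirs() {
    for dir in /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model; do
        hdfs dfs -mkdir -p "$dir" || return 1   # -p: no error if the folder exists
        hdfs dfs -chown chorus "$dir" || return 1
        hdfs dfs -chmod 777 "$dir" || return 1
    done
}
```

Calling setup_tsds_dirs runs the same three operations as the commands above, one folder at a time.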
Hive ACL (Access Control List) Configuration
In order to run Hive operators and jobs, we need to set up an Access Control List (ACL) for the Hive user.
The Hive user should have read, write, and execute access to
/tmp and all Spotfire Data Science folders.
hdfs dfs -setfacl -m default:user:hive:rwx /tmp
hdfs dfs -setfacl -m user:hive:rwx /tmp
hdfs dfs -setfacl -R -m default:user:hive:rwx /tmp/alpine_*
hdfs dfs -setfacl -R -m user:hive:rwx /tmp
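To confirm that the ACL entries took effect, the output of `hdfs dfs -getfacl` can be inspected. The sketch below is illustrative: has_user_acl is a hypothetical helper that reads getfacl output on standard input and succeeds only if both the access entry and the default entry for the given user are exactly rwx. If the ACL mask restricts an entry, getfacl appends an effective-permissions note to that line, and this check then fails on purpose.

```shell
# Read `hdfs dfs -getfacl <path>` output on stdin and succeed only if the given
# user has both an access entry and a default entry with rwx.
# has_user_acl is a hypothetical helper name used for illustration.
has_user_acl() (
    user=$1
    acl=$(cat)
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    printf '%s\n' "$acl" | grep -qxF "user:${user}:rwx" &&
    printf '%s\n' "$acl" | grep -qxF "default:user:${user}:rwx"
)
```

A possible invocation: hdfs dfs -getfacl /tmp | has_user_acl hive && echo "hive ACL present"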
If you're upgrading Spotfire Data Science from a previous version to 6.2 or later, you'll need to perform these actions as well:
Set the /tmp/alpine_* directories to have full permissions so that everyone can read, write, and execute.
hdfs dfs -chmod -R 777 /tmp/alpine_out /tmp/alpine_runtime /tmp/alpine_model
hdfs dfs -setfacl -R -m default:user:hive:rwx /tmp
hdfs dfs -setfacl -R -m user:hive:rwx /tmp
Customizing Your Permission Settings
With the following settings, users can customize their permissions for the Spotfire Data Science user folders, workflow folders, operator folders, and output files.
There are three configuration options you can set in
alpine.hdfs.userDirPerms – sets permissions for the user folders
alpine.hdfs.dirPerms – sets permissions for the workflow folders and the operator folders in
alpine.hdfs.filePerms – sets permissions for Spotfire Data Science output files.
Each of these must be set to a 10-character permission string. Here are the default settings:
alpine.hdfs.userDirPerms = "-rwxrwxrwx"
alpine.hdfs.dirPerms = "-rwxrwxrwx"
alpine.hdfs.filePerms = "-rwxr-x---"
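Because a malformed permission string is easy to mistype, a quick format check before editing the configuration may be useful. The sketch below assumes the 10 characters follow the familiar ls-style layout: one type flag followed by three rwx triplets. is_valid_perm_string is a hypothetical helper, and accepting d as well as - in the first position is an assumption; the defaults above all use -.

```shell
# Succeed only if the argument is a 10-character ls-style permission string,
# for example "-rwxr-x---". Hypothetical helper, not part of the product.
is_valid_perm_string() {
    case $1 in
        [-d][-r][-w][-x][-r][-w][-x][-r][-w][-x]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_valid_perm_string "-rwxr-x---" succeeds, while a 9-character string such as "rwxr-x---" is rejected.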
Frequently Asked Questions
How do I change
Which files can I safely clear from
Spotfire Data Science overwrites
@default_tmpdir/alpine_* files when users re-run workflows.
Spotfire Data Science users can clear selected
@default_tmpdir/alpine_out files using Clear Temporary Data.
Hadoop administrators can safely clear
@default_tmpdir/alpine_runtime from HDFS, as this directory stores information for runs where Spotfire Data Science users chose the option "Store Results = False".
Clear @default_tmpdir/alpine_model with caution, as Spotfire Data Science users may need to export models from this directory.
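For administrators who clear alpine_runtime regularly, the step could be wrapped in a small function. This is a sketch, not a product tool: clear_runtime_dir is a hypothetical name, the temp directory is passed in explicitly because @default_tmpdir depends on your configuration, and the function removes only the contents of alpine_runtime so that the folder and its permissions stay in place. If HDFS trash is enabled, `-rm` moves the files to the trash rather than deleting them immediately.

```shell
# Remove the contents of <tmpdir>/alpine_runtime on HDFS, keeping the folder itself.
# Usage: clear_runtime_dir <absolute HDFS temp dir>, for example clear_runtime_dir /tmp
# clear_runtime_dir is a hypothetical name; assumes the hdfs client is on the PATH.
clear_runtime_dir() {
    tmpdir=$1
    case $tmpdir in
        /*) ;;
        *) echo "usage: clear_runtime_dir <absolute HDFS temp dir>" >&2; return 2 ;;
    esac
    # The glob is quoted so that HDFS, not the local shell, expands it.
    # -f suppresses the error when the folder is already empty.
    hdfs dfs -rm -r -f "${tmpdir%/}/alpine_runtime/*"
}
```

The function never touches alpine_model, which should be cleared only after checking that no models still need to be exported.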