Hive connector security configuration
Zipstack Cloud features a powerful SQL querying engine on top of many types of connectors, including Trino's connectors, some custom connectors, and connectors from the open source Airbyte project. The underlying native connectors are Trino's connectors, and parts of the documentation for these connectors have been adapted from the connector documentation in the open source Trino project.
Please reach out to [email protected] if you need Hive/Kerberos based security. This requires provisioning Zipstack Cloud with extra modules/properties.
Overview
The Hive connector supports both authentication and authorization. Zipstack Cloud can impersonate the end user who is running a query. Authentication can be configured with or without user impersonation on Kerberized Hadoop clusters.
Requirements
End user authentication is limited to Kerberized Hadoop clusters. User impersonation is available for both Kerberized and non-Kerberized clusters.
You must ensure that you meet the Kerberos, user impersonation and keytab requirements described in this section that apply to your configuration.
Kerberos
To use the Hive connector with a Hadoop cluster that uses Kerberos
authentication, you must configure the connector to work with
two services on the Hadoop cluster:
- The Hive metastore Thrift service
- The Hadoop Distributed File System (HDFS)
Access to these services by the Hive connector is configured in the general Hive connector configuration.
Kerberos authentication by ticket cache is not yet supported.
::: note
If your krb5.conf location is different from /etc/krb5.conf, you must
set it explicitly using the java.security.krb5.conf JVM property in the
jvm.config file.

Example: -Djava.security.krb5.conf=/example/path/krb5.conf
:::
::: warning
When using Kerberos authentication to Hadoop services, access to the
Trino coordinator must be secured, for example with Kerberos or
password authentication. Failure to secure access to the Trino
coordinator could result in unauthorized access to sensitive data on
the Hadoop cluster. Refer to /security for further information.

See /security/kerberos for information on setting up Kerberos
authentication.
:::
Keytab files
Keytab files contain encryption keys that are used to authenticate
principals to the Kerberos KDC (Key Distribution Center). These
encryption keys must be stored securely; you must take the same
precautions to protect them that you take to protect SSH private keys.
In particular, access to keytab files must be limited to only the accounts that must use them to authenticate. In practice, this is the user that the Trino process runs as. The ownership and permissions on keytab files must be set to prevent other users from reading or modifying the files.
Keytab files must be distributed to every node running Trino. In typical deployments, the Hive connector configuration is the same on all nodes, which means the keytab needs to be in the same location on every node.
You must ensure that the keytab files have the correct permissions on every node after distributing them.
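As an illustrative sketch, the following commands show one way to secure a keytab after copying it to a node. The /etc/trino/hive.keytab path and the trino service user are assumptions for this example; substitute the account and location used in your deployment.

# Copy the keytab into place on the node (example path)
cp hive.keytab /etc/trino/hive.keytab
# Restrict ownership and permissions to the user that runs the Trino process
chown trino:trino /etc/trino/hive.keytab
chmod 600 /etc/trino/hive.keytab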
Impersonation in Hadoop
In order to use impersonation, the Hadoop cluster must be configured to
allow the user or principal that Trino is running as to impersonate the
users who log in to Trino. Impersonation in Hadoop is configured in the
file core-site.xml. A complete description of the configuration
options can be found in the Hadoop
documentation.
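As a minimal sketch, assuming Trino runs as the operating system user trino, the following core-site.xml entries allow that user to impersonate other users. The wildcard values are illustrative; production clusters should restrict the hosts and groups lists.

<!-- Allow the trino user to impersonate end users (example values) -->
<property>
  <name>hadoop.proxyuser.trino.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.trino.groups</name>
  <value>*</value>
</property>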
Authentication
The default security configuration of the Hive connector does not use authentication when connecting to a Hadoop cluster. All queries are executed as the user who runs the Trino process, regardless of which user submits the query.
The Hive connector provides additional security options to support
Hadoop clusters that have been configured to use Kerberos.
When accessing HDFS (Hadoop Distributed File System), Trino can
impersonate the end user who is running the query. This can be used
with HDFS permissions and ACLs (Access Control Lists) to provide
additional security for data.
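For example, with impersonation enabled, an HDFS ACL can grant a specific end user read access to a table's files. The user name alice and the warehouse path below are hypothetical:

# Hypothetical example: allow the end user alice to read a table directory
hdfs dfs -setfacl -R -m user:alice:r-x /user/hive/warehouse/orders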
Hive metastore Thrift service authentication
In a Kerberized Hadoop cluster, Trino connects to the Hive metastore
Thrift service using SASL (Simple Authentication and Security Layer)
and authenticates using Kerberos. Kerberos authentication for the
metastore is configured in the connector's properties file using the
following optional properties:
| Property name | Description | Default |
|---|---|---|
| metastore.authentication.type | Hive metastore authentication type. One of NONE or KERBEROS. When using the default value of NONE, Kerberos authentication is disabled, and no other properties must be configured. When set to KERBEROS, the Hive connector connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos. | NONE |
| metastore.thrift.impersonation.enabled | Enable Hive metastore end user impersonation. See KERBEROS authentication with impersonation for more information. | false |
| metastore.service.principal | The Kerberos principal of the Hive metastore service. The coordinator uses this to authenticate the Hive metastore. The _HOST placeholder can be used in this property value. When connecting to the Hive metastore, the Hive connector substitutes in the hostname of the metastore server it is connecting to. This is useful if the metastore runs on multiple hosts. Example: hive/[email protected] or hive/[email protected]. | |
| metastore.client.principal | The Kerberos principal that Trino uses when connecting to the Hive metastore service. Example: trino/[email protected] or trino/[email protected]. The _HOST placeholder can be used in this property value. When connecting to the Hive metastore, the Hive connector substitutes in the hostname of the worker node Trino is running on. This is useful if each worker node has its own Kerberos principal. Unless KERBEROS authentication with impersonation is enabled, the principal specified by metastore.client.principal must have sufficient privileges to remove files and directories within the hive/warehouse directory. Warning: If the principal does not have sufficient permissions, only the metadata is removed, and the data continues to consume disk space. This occurs because the Hive metastore is responsible for deleting the internal table data. When the metastore is configured to use Kerberos authentication, all of the HDFS operations performed by the metastore are impersonated. Errors deleting data are silently ignored. | |
| metastore.client.keytab | The path to the keytab file that contains a key for the principal specified by metastore.client.principal. This file must be readable by the operating system user running Trino. | |
Configuration examples
The following sections describe the configuration properties and values needed for the various authentication configurations used with the Hive metastore Thrift service and the Hive connector.
Default NONE authentication without impersonation
metastore.authentication.type=NONE
The default authentication type for the Hive metastore is NONE. When
the authentication type is NONE, Trino connects to an unsecured Hive
metastore. Kerberos is not used.
KERBEROS authentication with impersonation
metastore.authentication.type=KERBEROS
metastore.thrift.impersonation.enabled=true
metastore.service.principal=hive/[email protected]
metastore.client.principal=trino@EXAMPLE.COM
metastore.client.keytab=/etc/trino/hive.keytab
When the authentication type for the Hive metastore Thrift service is
KERBEROS, Trino connects as the Kerberos principal specified by the
property metastore.client.principal. Trino authenticates this
principal using the keytab specified by the
metastore.client.keytab property, and verifies that the identity
of the metastore matches metastore.service.principal.
When using KERBEROS metastore authentication with impersonation, the
principal specified by the metastore.client.principal property
must be allowed to impersonate the current Trino user, as discussed in
the section Impersonation in Hadoop.
Keytab files must be distributed to every node in the cluster that runs Trino.
See Keytab files for additional information.
HDFS authentication
In a Kerberized Hadoop cluster, Trino authenticates to HDFS using Kerberos. Kerberos authentication for HDFS is configured in the connector's properties file using the following optional properties:
| Property name | Description | Default |
|---|---|---|
| hdfs.authentication.type | HDFS authentication type; one of NONE or KERBEROS. When using the default value of NONE, Kerberos authentication is disabled, and no other properties must be configured. When set to KERBEROS, the Hive connector authenticates to HDFS using Kerberos. | NONE |
| hdfs.impersonation.enabled | Enable HDFS end-user impersonation. Impersonating the end user can provide additional security when accessing HDFS if HDFS permissions or ACLs are used. HDFS permissions and ACLs are explained in the HDFS Permissions Guide. | false |
| hdfs.trino.principal | The Kerberos principal Trino uses when connecting to HDFS. Example: trino-hdfs-superuser/[email protected] or trino-hdfs-superuser/[email protected]. The _HOST placeholder can be used in this property value. When connecting to HDFS, the Hive connector substitutes in the hostname of the worker node Trino is running on. This is useful if each worker node has its own Kerberos principal. | |
| hdfs.trino.keytab | The path to the keytab file that contains a key for the principal specified by hdfs.trino.principal. This file must be readable by the operating system user running Trino. | |
| hdfs.wire-encryption.enabled | Enable HDFS wire encryption. In a Kerberized Hadoop cluster that uses HDFS wire encryption, this must be set to true to enable Trino to access HDFS. Note that using wire encryption may impact query execution performance. | |
Configuration examples
The following sections describe the configuration properties and values needed for the various authentication configurations with HDFS and the Hive connector.
Default NONE authentication without impersonation
hdfs.authentication.type=NONE
The default authentication type for HDFS is NONE. When the
authentication type is NONE, Trino connects to HDFS using Hadoop's
simple authentication mechanism. Kerberos is not used.
NONE authentication with impersonation
hdfs.authentication.type=NONE
hdfs.impersonation.enabled=true
When using NONE authentication with impersonation, Trino impersonates
the user who is running the query when accessing HDFS. The user Trino is
running as must be allowed to impersonate this user, as discussed in the
section Impersonation in Hadoop. Kerberos is not used.
KERBEROS authentication without impersonation
hdfs.authentication.type=KERBEROS
hdfs.trino.principal=hdfs@EXAMPLE.COM
hdfs.trino.keytab=/etc/trino/hdfs.keytab
When the authentication type is KERBEROS, Trino accesses HDFS as the
principal specified by the hdfs.trino.principal property. Trino
authenticates this principal using the keytab specified by the
hdfs.trino.keytab property.
Keytab files must be distributed to every node in the cluster that runs Trino.
See Keytab files for additional information.
KERBEROS authentication with impersonation
hdfs.authentication.type=KERBEROS
hdfs.impersonation.enabled=true
hdfs.trino.principal=hdfs@EXAMPLE.COM
hdfs.trino.keytab=/etc/trino/hdfs.keytab
When using KERBEROS authentication with impersonation, Trino
impersonates the user who is running the query when accessing HDFS. The
principal specified by the hdfs.trino.principal property must be
allowed to impersonate the current Trino user, as discussed in the
section Impersonation in Hadoop. Trino authenticates
hdfs.trino.principal using the keytab specified by
hdfs.trino.keytab.
Keytab files must be distributed to every node in the cluster that runs Trino.
See Keytab files for additional information.
Authorization
You can enable authorization checks for the Hive connector by setting
the security property in the Hive catalog properties file (see the
example after the table). This property must be one of the following
values:
| Property value | Description |
|---|---|
| legacy (default value) | Few authorization checks are enforced, thus allowing most operations. The config properties allow-drop-table, allow-rename-table, allow-add-column, allow-drop-column and allow-rename-column are used. |
| read-only | Operations that read data or metadata, such as SELECT, are permitted, but none of the operations that write data or metadata, such as CREATE, INSERT or DELETE, are allowed. |
| file | Authorization checks are enforced using a catalog-level access control configuration file whose path is specified in the security.config-file catalog configuration property. See Catalog-level access control files for details. |
| sql-standard | Users are permitted to perform the operations as long as they have the required privileges as per the SQL standard. In this mode, Trino enforces the authorization checks for queries based on the privileges defined in the Hive metastore. To alter these privileges, use the GRANT and REVOKE commands. See the SQL standard based authorization section for details. |
| allow-all | No authorization checks are enforced. |
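For example, a minimal sketch of enabling SQL standard based authorization in a Hive catalog properties file; the file name and location depend on your deployment:

# In the Hive catalog properties file
security=sql-standard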
SQL standard based authorization
When sql-standard security is enabled, Trino enforces the same SQL
standard-based authorization as Hive does.
Since Trino's ROLE syntax support matches the SQL standard, and Hive
does not exactly follow the SQL standard, there are the following
limitations and differences:
- CREATE ROLE role WITH ADMIN is not supported.
- The admin role must be enabled to execute CREATE ROLE, DROP ROLE or CREATE SCHEMA.
- GRANT role TO user GRANTED BY someone is not supported.
- REVOKE role FROM user GRANTED BY someone is not supported.
- By default, all a user's roles, except admin, are enabled in a new user session.
- One particular role can be selected by executing SET ROLE role.
- SET ROLE ALL enables all of a user's roles except admin.
- The admin role must be enabled explicitly by executing SET ROLE admin.
- GRANT privilege ON SCHEMA schema is not supported. Schema ownership can be changed with ALTER SCHEMA schema SET AUTHORIZATION user.
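As an illustrative sketch, assuming a table named orders exists and the current user may enable the admin role, a session could manage roles and privileges as follows; the role name analyst is hypothetical:

-- The admin role must be enabled before CREATE ROLE
SET ROLE admin;
CREATE ROLE analyst;
-- Privileges are stored in the Hive metastore
GRANT SELECT ON orders TO ROLE analyst;
-- Return to the default of all roles except admin
SET ROLE ALL;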