Cassandra connector

Zipstack Cloud features a powerful SQL querying engine on top of many types of connectors, including connectors from Trino, custom connectors, and connectors from the open source Airbyte project. The underlying native connectors are Trino's connectors, and parts of this documentation have been adapted from the connector documentation in the Trino open source project.

The Cassandra connector allows querying data stored in Apache Cassandra.

Requirements

To connect to Cassandra, you need:

  • Cassandra version 3.0 or higher.

  • Network access from Zipstack Cloud to Cassandra. The default port is 9042.

Configuration

Multiple Cassandra clusters

You can have as many catalogs as you need. If you have additional Cassandra clusters, simply add more data sources of type Cassandra, each with a different name.

Configuration properties

The following configuration properties are available:

contact-points: Comma-separated list of hosts in a Cassandra cluster. The Cassandra driver uses these contact points to discover the cluster topology. At least one Cassandra host is required.
native-protocol-port: The Cassandra server port running the native client protocol. Defaults to 9042.
consistency-level: The consistency level used for both read and write operations. More information about consistency levels can be found in the Cassandra consistency documentation. Defaults to ONE. Possible values are ALL, EACH_QUORUM, QUORUM, LOCAL_QUORUM, ONE, TWO, THREE, LOCAL_ONE, ANY, SERIAL, LOCAL_SERIAL.
allow-drop-table: Enables DROP TABLE operations. Defaults to false.
username: Username used for authentication to the Cassandra cluster. This is a global setting used for all connections, regardless of the user connected to Trino.
password: Password used for authentication to the Cassandra cluster. This is a global setting used for all connections, regardless of the user connected to Trino.
protocol-version: Overrides the protocol version for older Cassandra clusters. By default, the value corresponds to the default protocol version used in the underlying Cassandra Java driver. Possible values are V3, V4, V5, V6.
note

If authorization is enabled, the username must have sufficient permissions to perform SELECT queries on the system.size_estimates table.

The following additional configuration properties are available:

fetch-size: Number of rows fetched at a time in a Cassandra query.
partition-size-for-batch-select: Number of partitions batched together into a single SELECT for a table with a single partition key column.
split-size: Number of keys per split when querying Cassandra.
splits-per-node: Number of splits per node. By default, the values from the system.size_estimates table are used. Only override this when connecting to Cassandra versions earlier than 2.1.5, which lack the system.size_estimates table.
batch-size: Maximum number of statements to execute in one batch.
client.read-timeout: Maximum time the Cassandra driver waits for an answer to a query from one Cassandra node. Note that the underlying Cassandra driver may retry a query against more than one node in the event of a read timeout. Increasing this may help with queries that use an index.
client.connect-timeout: Maximum time the Cassandra driver waits to establish a connection to a Cassandra node. Increasing this may help with heavily loaded Cassandra clusters.
client.so-linger: Number of seconds to linger on close if unsent data is queued. If set to zero, the socket is closed immediately. When this option is non-zero, a socket lingers that many seconds for an acknowledgement that all data was written to a peer. This option can be used to avoid consuming sockets on a Cassandra server by immediately closing connections when they are no longer needed.
retry-policy: Policy used to retry failed requests to Cassandra. Defaults to DEFAULT. Using BACKOFF may help when queries fail with "not enough replicas". The other possible values are DOWNGRADING_CONSISTENCY and FALLTHROUGH.
load-policy.use-dc-aware: Set to true if the load balancing policy requires a local datacenter. Defaults to true.
load-policy.dc-aware.local-dc: The name of the datacenter considered "local".
load-policy.dc-aware.used-hosts-per-remote-dc: Uses the provided number of hosts per remote datacenter as failover for the local hosts in DefaultLoadBalancingPolicy.
load-policy.dc-aware.allow-remote-dc-for-local: Set to true to allow using hosts of a remote datacenter for the local consistency level.
no-host-available-retry-timeout: Retry timeout for AllNodesFailedException. Defaults to 1m.
speculative-execution.limit: The number of speculative executions. Disabled by default.
speculative-execution.delay: The delay between each speculative execution. Defaults to 500ms.
tls.enabled: Whether TLS security is enabled. Defaults to false.
tls.keystore-path: Path to the PEM or JKS key store.
tls.truststore-path: Path to the PEM or JKS trust store.
tls.keystore-password: Password for the key store.
tls.truststore-password: Password for the trust store.
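
For illustration only, a catalog configured with some of these properties might look like the following sketch. The exact way properties are set in Zipstack Cloud may differ, and the host names, datacenter name, and credentials are placeholders:

contact-points=cassandra-1.example.com,cassandra-2.example.com
native-protocol-port=9042
consistency-level=LOCAL_QUORUM
username=trino_user
password=changeit
load-policy.use-dc-aware=true
load-policy.dc-aware.local-dc=datacenter1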

Querying Cassandra tables

The users table is an example Cassandra table from the Cassandra Getting Started guide. It can be created along with the example_keyspace keyspace using Cassandra's cqlsh (CQL interactive terminal):

cqlsh> CREATE KEYSPACE example_keyspace
... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE example_keyspace;
cqlsh:example_keyspace> CREATE TABLE users (
... user_id int PRIMARY KEY,
... fname text,
... lname text
... );

This table can be described in Trino:

DESCRIBE example.example_keyspace.users;
 Column  |  Type   | Extra | Comment
---------+---------+-------+---------
 user_id | bigint  |       |
 fname   | varchar |       |
 lname   | varchar |       |
(3 rows)

This table can then be queried in Trino:

SELECT * FROM example.example_keyspace.users;

Type mapping

Because Trino and Cassandra each support types that the other does not, this connector modifies some types when reading or writing data. Data types may not map the same way in both directions between Trino and the data source. Refer to the following sections for type mapping in each direction.

Cassandra type to Trino type mapping

The connector maps Cassandra types to the corresponding Trino types according to the following table:

Cassandra type | Trino type | Notes
BOOLEAN | BOOLEAN |
TINYINT | TINYINT |
SMALLINT | SMALLINT |
INT | INTEGER |
BIGINT | BIGINT |
FLOAT | REAL |
DOUBLE | DOUBLE |
DECIMAL | DOUBLE |
ASCII | VARCHAR | US-ASCII character string
TEXT | VARCHAR | UTF-8 encoded string
VARCHAR | VARCHAR | UTF-8 encoded string
VARINT | VARCHAR | Arbitrary-precision integer
BLOB | VARBINARY |
DATE | DATE |
TIME | TIME(9) |
TIMESTAMP | TIMESTAMP(3) WITH TIME ZONE |
LIST<?> | VARCHAR |
MAP<?, ?> | VARCHAR |
SET<?> | VARCHAR |
TUPLE | ROW with anonymous fields |
UDT | ROW with field names |
INET | IPADDRESS |
UUID | UUID |
TIMEUUID | UUID |

No other types are supported.
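
As an illustration of the collection mappings, a Cassandra set<text> column is exposed to Trino as varchar, so its contents are read as text rather than as a structured value. A brief sketch; the tags column is hypothetical and not part of the earlier users example:

cqlsh:example_keyspace> ALTER TABLE users ADD tags set<text>;

SELECT user_id, tags -- tags is returned as a varchar value
FROM example.example_keyspace.users;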

Trino type to Cassandra type mapping

The connector maps Trino types to the corresponding Cassandra types according to the following table:

Trino type | Cassandra type | Notes
BOOLEAN | BOOLEAN |
TINYINT | TINYINT |
SMALLINT | SMALLINT |
INTEGER | INT |
BIGINT | BIGINT |
REAL | FLOAT |
DOUBLE | DOUBLE |
VARCHAR | TEXT |
DATE | DATE |
TIMESTAMP(3) WITH TIME ZONE | TIMESTAMP |
IPADDRESS | INET |
UUID | UUID |

No other types are supported.

Partition key types

Partition keys can only be of the following types:

  • ASCII

  • TEXT

  • VARCHAR

  • BIGINT

  • BOOLEAN

  • DOUBLE

  • INET

  • INT

  • FLOAT

  • DECIMAL

  • TIMESTAMP

  • UUID

  • TIMEUUID

Limitations

  • Queries without filters containing the partition key result in fetching all partitions. This causes a full scan of the entire data set, and is therefore much slower than a similar query that filters on the partition key (see the example after this list).

  • IN list filters are only allowed on index (that is, partition key or clustering key) columns.

  • Range (< or > and BETWEEN) filters can be applied only to the partition keys.
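
For example, with the users table from earlier, user_id is the partition key. Filtering on it lets Cassandra read a single partition, while the unfiltered variant fetches and scans every partition:

-- Fast: user_id is the partition key, so only one partition is read
SELECT * FROM example.example_keyspace.users WHERE user_id = 1;

-- Slower: no partition key filter, so all partitions are fetched
SELECT * FROM example.example_keyspace.users WHERE fname = 'Alice';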

SQL support

The connector provides read and write access to data and metadata in the Cassandra database. In addition to the globally available and read operation statements, the connector supports the following features:

  • INSERT

  • DELETE (see SQL delete limitation)

  • TRUNCATE

  • CREATE TABLE

  • CREATE TABLE AS

  • DROP TABLE
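
As a brief illustration of write support, rows can be inserted into the users table created earlier (the values shown are placeholders):

INSERT INTO example.example_keyspace.users (user_id, fname, lname)
VALUES (1, 'Alice', 'Smith');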

Table functions

The connector provides specific table functions to access Cassandra.

query(varchar) -> table

The query function allows you to query the underlying Cassandra cluster directly. It requires syntax native to Cassandra, because the full query is pushed down and processed by Cassandra. This can be useful for accessing native features which are not available in Trino, or for improving query performance in situations where running a query natively may be faster.

note

Polymorphic table functions may not preserve the order of the query result. If the table function contains a query with an ORDER BY clause, the function result may not be ordered as expected.

As a simple example, to select an entire table:

SELECT
  *
FROM
  TABLE(
    example.system.query(
      query => 'SELECT
        *
      FROM
        tpch.nation'
    )
  );
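
If a deterministic order is needed, one option is to apply ORDER BY in the enclosing Trino query rather than relying on any ordering inside the pushed-down query. A sketch, assuming the nation table from the example above has a name column:

SELECT *
FROM
  TABLE(
    example.system.query(
      query => 'SELECT * FROM tpch.nation'
    )
  )
ORDER BY name;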

DROP TABLE

By default, DROP TABLE operations are disabled on Cassandra catalogs. To enable DROP TABLE, set the allow-drop-table catalog configuration property to true.
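
With the property enabled, a table can then be dropped from Trino; for example, the users table from earlier:

DROP TABLE example.example_keyspace.users;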

SQL delete limitation

DELETE is only supported if the WHERE clause matches entire partitions.
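
For example, user_id is the partition key of the users table from earlier, so a DELETE that fully specifies it is supported, while a DELETE filtering on a non-key column is not:

-- Supported: the WHERE clause identifies entire partitions
DELETE FROM example.example_keyspace.users WHERE user_id = 1;

-- Not supported: fname is not part of the partition key
DELETE FROM example.example_keyspace.users WHERE fname = 'Alice';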