Thrift connector

Zipstack Cloud features a powerful SQL querying engine on top of many types of connectors, including Trino connectors, some custom connectors, and connectors from the open source Airbyte project. The underlying native connectors are Trino's connectors, and some parts of the documentation for these connectors have been adapted from the connector documentation in Trino's open source project.

The Thrift connector makes it possible to integrate with external storage systems without a custom Zipstack Cloud connector implementation by using Apache Thrift on these servers. It is therefore generic and can provide access to any backend, as long as it exposes the expected API by using Thrift.

To use the Thrift connector with an external system, you need to implement the TrinoThriftService interface, described below. Next, you configure the Thrift connector to point to a set of machines, called Thrift servers, that implement the interface. As part of the interface implementation, the Thrift servers provide metadata, splits and data. For metadata calls, the connector randomly chooses a server from the available instances. It does the same for data calls, unless the splits include a list of addresses. All requests are assumed to be idempotent and can be retried freely on any server.
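
The server selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the connector's code: `choose_server`, `configured` and `split_addresses` are hypothetical names, and picking randomly among the addresses of a split is an assumption, since the text only says that such splits restrict where data calls go.

```python
import random


def choose_server(configured, split_addresses=None, rng=random):
    """Pick one Thrift server ("host:port") for a single request.

    configured: servers listed in the connector configuration.
    split_addresses: optional addresses carried by a split (data calls only).
    """
    # A data call for a split that lists addresses is limited to those
    # addresses (assumption: one of them is chosen at random).
    candidates = split_addresses if split_addresses else configured
    if not candidates:
        raise ValueError("no Thrift servers available")
    # Metadata calls, and data calls without split addresses, may go to
    # any configured server.
    return rng.choice(candidates)
```

Because requests are assumed to be idempotent, a caller could invoke `choose_server` again after a failure and send the same request to whichever server is returned.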

Requirements

To connect to your custom servers with the Thrift protocol, you need:

  • Network access from Zipstack Cloud to the Thrift servers.

  • A trino-thrift-service for your system.

Configuration

To configure the Thrift connector, create a data source with the following minimum properties. Replace the properties as appropriate:

trino.thrift.client.addresses=host:port,host:port

Multiple Thrift systems

You can have as many catalogs as you need, so if you have additional Thrift systems to connect to, simply add another data source.

Configuration properties

The following configuration properties are available:

Property name                             Description
trino.thrift.client.addresses             Location of Thrift servers
trino-thrift.max-response-size            Maximum size of data returned from Thrift server
trino-thrift.metadata-refresh-threads     Number of refresh threads for metadata cache
trino.thrift.client.max-retries           Maximum number of retries for failed Thrift requests
trino.thrift.client.max-backoff-delay     Maximum interval between retry attempts
trino.thrift.client.min-backoff-delay     Minimum interval between retry attempts
trino.thrift.client.max-retry-time        Maximum duration across all attempts of a Thrift request
trino.thrift.client.backoff-scale-factor  Scale factor for exponential back off
trino.thrift.client.connect-timeout       Connect timeout
trino.thrift.client.request-timeout       Request timeout
trino.thrift.client.socks-proxy           SOCKS proxy address
trino.thrift.client.max-frame-size        Maximum size of a raw Thrift response
trino.thrift.client.transport             Thrift transport type (UNFRAMED, FRAMED, HEADER)
trino.thrift.client.protocol              Thrift protocol type (BINARY, COMPACT, FB_COMPACT)
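
To see how the retry properties listed above could interact, the following Python sketch computes a capped exponential back off schedule. It is an assumption about how such settings are commonly combined, not a description of the connector's internals: `backoff_delays` is a hypothetical helper, the values are illustrative rather than defaults, and neither jitter nor the overall max-retry-time limit is modeled.

```python
def backoff_delays(min_delay_ms, max_delay_ms, scale_factor, max_retries):
    """Return the wait time in milliseconds before each retry attempt."""
    delays = []
    delay = min_delay_ms
    for _ in range(max_retries):
        # Never wait longer than the configured maximum interval.
        delays.append(min(delay, max_delay_ms))
        # Grow the interval exponentially for the next attempt.
        delay *= scale_factor
    return delays


# Illustrative values only, not the connector's defaults.
print(backoff_delays(min_delay_ms=100, max_delay_ms=2000, scale_factor=2, max_retries=6))
```

With these values the wait doubles from the minimum interval until it reaches the maximum interval, after which every further retry waits the maximum.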

trino.thrift.client.addresses

Comma-separated list of Thrift servers, each in the form host:port. For example:

trino.thrift.client.addresses=192.0.2.3:7777,192.0.2.4:7779

This property is required; there is no default.
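
If you generate or validate this value in your own tooling, a small check like the following Python sketch can catch malformed entries early. `parse_addresses` is a hypothetical helper written for this page. It handles only the plain host:port form shown above and is not the parser used by the connector.

```python
def parse_addresses(value):
    """Split a comma-separated host:port list into (host, port) tuples."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers


servers = parse_addresses("192.0.2.3:7777,192.0.2.4:7779")
```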

trino-thrift.max-response-size

Maximum size of a data response that the connector accepts. This value is sent by the connector to the Thrift server when requesting data, allowing it to size the response appropriately.

This property is optional; the default is 16MB.
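
On the server side, honoring the requested size could look roughly like the Python sketch below. It is a simplified assumption about one way to size a response, not part of the TrinoThriftService definition: `next_batch` is a hypothetical helper, rows are modeled as already-encoded byte strings, and the returned index stands in for whatever continuation token your service uses.

```python
def next_batch(rows, start, max_bytes):
    """Return (batch, next_start) with the batch kept within max_bytes.

    rows: list of encoded rows (bytes); start: index to resume from.
    next_start is None once all rows have been returned.
    """
    batch, size, i = [], 0, start
    while i < len(rows):
        row_size = len(rows[i])
        # Stop before exceeding the limit, but always return at least one
        # row so that the caller makes progress even on an oversized row.
        if batch and size + row_size > max_bytes:
            break
        batch.append(rows[i])
        size += row_size
        i += 1
    return batch, (i if i < len(rows) else None)
```

A caller would keep requesting batches, passing back the returned index, until it receives `None`.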

trino-thrift.metadata-refresh-threads

Number of refresh threads for metadata cache.

This property is optional; the default is 1.

TrinoThriftService implementation

The following IDL describes the TrinoThriftService that must be implemented:

/include/TrinoThriftService.thrift

Type mapping

The Thrift service defines data type support and mappings to Trino data types.

SQL support

The connector provides globally available and read operation statements to access data and metadata in your Thrift service.