Thrift connector
Zipstack Cloud features a powerful SQL querying engine on top of many types of connectors, including those from Trino, some custom connectors and connectors from the open source Airbyte project. The underlying native connectors are Trino's connectors. Additionally, some parts of the documentation for these connectors have been adapted from the connector documentation found in Trino's open source project.
The Thrift connector makes it possible to integrate with external storage systems without a custom Zipstack Cloud connector implementation by using Apache Thrift on these servers. It is therefore generic and can provide access to any backend, as long as it exposes the expected API by using Thrift.
In order to use the Thrift connector with an external system, you need
to implement the TrinoThriftService interface, found below. Next, you
configure the Thrift connector to point to a set of machines, called
Thrift servers, that implement the interface. As part of the interface
implementation, the Thrift servers provide metadata, splits and data.
The connector randomly chooses a server to talk to from the available
instances for metadata calls, or for data calls unless the splits
include a list of addresses. All requests are assumed to be idempotent
and can be retried freely among any server.
Requirements
To connect to your custom servers with the Thrift protocol, you need:
Network access from Zipstack Cloud to the Thrift servers.
A
trino-thrift-servicefor your system.
Configuration
To configure the Thrift connector, create a data source with the following minimum properties. Replace the properties as appropriate:
trino.thrift.client.addresses=host:port,host:port
Multiple Thrift systems
You can have as many catalogs as you need, so if you have additional Thrift systems to connect to, simply add another data source.
Configuration properties
The following configuration properties are available:
| Property name | Description |
|---|---|
trino.thrift.client.addresses | Location of Thrift servers |
trino-thrift.max-response-size | Maximum size of data returned from Thrift server |
trino-thrift.metadata-refresh-threads | Number of refresh threads for metadata cache |
trino.thrift.client.max-retries | Maximum number of retries for failed Thrift requests |
trino.thrift.client.max-backoff-delay | Maximum interval between retry attempts |
trino.thrift.client.min-backoff-delay | Minimum interval between retry attempts |
trino.thrift.client.max-retry-time | Maximum duration across all attempts of a Thrift request |
trino.thrift.client.backoff-scale-factor | Scale factor for exponential back off |
trino.thrift.client.connect-timeout | Connect timeout |
trino.thrift.client.request-timeout | Request timeout |
trino.thrift.client.socks-proxy | SOCKS proxy address |
trino.thrift.client.max-frame-size | Maximum size of a raw Thrift response |
trino.thrift.client.transport | Thrift transport type (UNFRAMED, FRAMED, HEADER) |
trino.thrift.client.protocol | Thrift protocol type (BINARY, COMPACT, FB_COMPACT) |
trino.thrift.client.addresses
Comma-separated list of thrift servers in the form of host:port. For
example:
trino.thrift.client.addresses=192.0.2.3:7777,192.0.2.4:7779
This property is required; there is no default.
trino-thrift.max-response-size
Maximum size of a data response that the connector accepts. This value is sent by the connector to the Thrift server when requesting data, allowing it to size the response appropriately.
This property is optional; the default is 16MB.
trino-thrift.metadata-refresh-threads
Number of refresh threads for metadata cache.
This property is optional; the default is 1.
TrinoThriftService implementation
The following IDL describes the TrinoThriftService that must be
implemented:
/include/TrinoThriftService.thrift
Type mapping
The Thrift service defines data type support and mappings to Trino data types.
SQL support
The connector provides globally available <sql-globally-available> and
read operation <sql-read-operations> statements to access data and
metadata in your Thrift service.