Arrow Flight RPC
Arrow Flight is an RPC framework for high-performance data services based on Arrow data, and is built on top of gRPC and the IPC format.
Flight is organized around streams of Arrow record batches, being either downloaded from or uploaded to another service. A set of metadata methods offers discovery and introspection of streams, as well as the ability to implement application-specific methods.
Methods and message wire formats are defined by Protobuf, enabling interoperability with clients that may support gRPC and Arrow separately, but not Flight. However, Flight implementations include further optimizations to avoid overhead in usage of Protobuf (mostly around avoiding excessive memory copies).
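Because the messages are plain Protobuf, a client without a Flight library can still build them by hand. The sketch below is only an illustration of the Protobuf wire format applied to two fields from the definitions at the end of this page (Ticket.ticket, field 1, and FlightData.data_body, field 1000). The helper names encode_varint and encode_bytes_field are hypothetical; a real client would normally use generated Protobuf code instead.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2): tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Ticket { bytes ticket = 1; } carrying the (made-up) ticket b"abc".
ticket_message = encode_bytes_field(1, b"abc")

# FlightData.data_body is field 1000, so its tag needs a two-byte varint.
data_body_tag = encode_varint((1000 << 3) | 2)

print(ticket_message.hex(" "))
print(data_body_tag.hex(" "))
```

Here ticket_message is the five bytes 0a 03 61 62 63. Since the Arrow data sits in a single length-delimited field, an implementation can locate the body by its tag and hand the payload bytes to Arrow directly, which is one way to picture the copy-avoiding optimizations mentioned above.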
RPC Methods
Flight defines a set of RPC methods for uploading/downloading data, retrieving metadata about a data stream, listing available data streams, and for implementing application-specific RPC methods. A Flight service implements some subset of these methods, while a Flight client can call any of these methods. Thus, one Flight client can connect to any Flight service and perform basic operations.
Data streams are identified by descriptors, which are either a path or an arbitrary binary command. A client that wishes to download the data would:
1. Construct or acquire a FlightDescriptor for the data set they are interested in. A client may know what descriptor they want already, or they may use methods like ListFlights to discover them.
2. Call GetFlightInfo(FlightDescriptor) to get a FlightInfo message containing details on where the data is located (as well as other metadata, like the schema and possibly an estimate of the dataset size). Flight does not require that data live on the same server as metadata: this call may list other servers to connect to. The FlightInfo message includes a Ticket, an opaque binary token that the server uses to identify the exact data set being requested.
3. Connect to other servers (if needed).
4. Call DoGet(Ticket) to get back a stream of Arrow record batches.
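The download sequence can be sketched without any networking. The example below is a toy model, not the Flight API: InMemoryFlightService, its method names, and the way the ticket is derived from the path are all assumptions made for illustration, and record batches are stood in for by lists of dicts.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Batch = List[dict]  # stand-in for an Arrow record batch


@dataclass(frozen=True)
class FlightDescriptor:
    path: Tuple[str, ...]  # only PATH-type descriptors, for brevity


@dataclass(frozen=True)
class Ticket:
    ticket: bytes  # opaque to the client


@dataclass
class FlightEndpoint:
    ticket: Ticket
    locations: List[str] = field(default_factory=list)  # empty: same server


@dataclass
class FlightInfo:
    descriptor: FlightDescriptor
    endpoints: List[FlightEndpoint]
    total_records: int = -1  # -1 means unknown


class InMemoryFlightService:
    """Hypothetical stand-in for a Flight server."""

    def __init__(self, datasets: Dict[Tuple[str, ...], List[Batch]]) -> None:
        self._datasets = datasets

    def list_flights(self) -> Iterator[FlightInfo]:
        for path in self._datasets:
            yield self.get_flight_info(FlightDescriptor(path))

    def get_flight_info(self, descriptor: FlightDescriptor) -> FlightInfo:
        batches = self._datasets[descriptor.path]
        # Assumption: this toy server encodes the path into the ticket.
        ticket = Ticket("/".join(descriptor.path).encode())
        return FlightInfo(
            descriptor=descriptor,
            endpoints=[FlightEndpoint(ticket)],
            total_records=sum(len(batch) for batch in batches),
        )

    def do_get(self, ticket: Ticket) -> Iterator[Batch]:
        path = tuple(ticket.ticket.decode().split("/"))
        yield from self._datasets[path]


def download(service: InMemoryFlightService, descriptor: FlightDescriptor) -> List[dict]:
    info = service.get_flight_info(descriptor)  # where does the data live?
    rows: List[dict] = []
    for endpoint in info.endpoints:
        # A real client would connect to endpoint.locations here if non-empty.
        for batch in service.do_get(endpoint.ticket):
            rows.extend(batch)
    return rows


service = InMemoryFlightService(
    {("sales", "2024"): [[{"id": 1}, {"id": 2}], [{"id": 3}]]}
)
descriptor = next(service.list_flights()).descriptor  # discover a descriptor
rows = download(service, descriptor)
print(rows)
```

The client never inspects the ticket bytes; it only passes them back to the service, which is what lets a server change how it identifies data without affecting clients.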
To upload data, a client would:
1. Construct or acquire a FlightDescriptor, as before.
2. Call DoPut(FlightData) and upload a stream of Arrow record batches. They would also include the FlightDescriptor with the first message.
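The upload side can be modelled the same way. In this sketch the FlightData class, upload_stream, and InMemoryFlightService are hypothetical stand-ins (record batches are plain lists of dicts); the point is only to show the descriptor travelling on the first message of the stream and nowhere else.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Batch = List[dict]  # stand-in for an Arrow record batch
Path = Tuple[str, ...]


@dataclass
class FlightData:
    record_batch: Batch
    # Only set on the first message of an upload stream.
    flight_descriptor: Optional[Path] = None


def upload_stream(path: Path, batches: Iterable[Batch]) -> Iterator[FlightData]:
    """Wrap batches in messages, attaching the descriptor to the first one."""
    for index, batch in enumerate(batches):
        yield FlightData(batch, flight_descriptor=path if index == 0 else None)


class InMemoryFlightService:
    """Hypothetical stand-in for a Flight server that stores uploads."""

    def __init__(self) -> None:
        self.datasets: Dict[Path, List[Batch]] = {}

    def do_put(self, stream: Iterable[FlightData]) -> int:
        target: Optional[List[Batch]] = None
        stored = 0
        for message in stream:
            if target is None:
                if message.flight_descriptor is None:
                    raise ValueError("first message must carry a descriptor")
                target = self.datasets.setdefault(message.flight_descriptor, [])
            target.append(message.record_batch)
            stored += 1
        return stored


service = InMemoryFlightService()
stored = service.do_put(
    upload_stream(("uploads", "demo"), [[{"x": 1}], [{"x": 2}]])
)
print(stored, service.datasets)
```

Sending the descriptor in-band keeps DoPut a single streaming call: the server learns where the data belongs from the first message instead of from a separate request.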
See Protocol Buffer Definitions for full details on the methods and messages involved.
Authentication
Flight supports application-implemented authentication methods. Authentication, if enabled, has two phases: at connection time, the client and server can exchange any number of messages. Then, the client can provide a token alongside each call, and the server can validate that token.
Applications may use any part of this; for instance, they may ignore the initial handshake and send an externally acquired token on each call, or they may establish trust during the handshake and not validate a token for each call. (Note that the latter is not secure if you choose to deploy a layer 7 load balancer, as is common with gRPC.)
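A minimal sketch of the two phases, assuming a username/password handshake that issues a random token: TokenAuthServer and its methods are hypothetical names, the in-memory token table is a simplification, and nothing here reflects how a particular Flight implementation exposes authentication.

```python
import hmac
import secrets
from typing import Dict


class TokenAuthServer:
    """Hypothetical server: handshake issues a token, each call validates it."""

    def __init__(self, passwords: Dict[str, str]) -> None:
        self._passwords = passwords
        self._tokens: Dict[str, str] = {}  # token -> username

    def handshake(self, username: str, password: str) -> str:
        """Phase one: exchange credentials for a token at connection time."""
        expected = self._passwords.get(username)
        if expected is None or not hmac.compare_digest(
            expected.encode(), password.encode()
        ):
            raise PermissionError("UNAUTHENTICATED")
        token = secrets.token_hex(16)
        self._tokens[token] = username
        return token

    def call(self, token: str, method: str) -> str:
        """Phase two: validate the token supplied alongside every call."""
        username = self._tokens.get(token)
        if username is None:
            raise PermissionError("UNAUTHENTICATED")
        return f"{method} ok for {username}"


server = TokenAuthServer({"alice": "s3cret"})
token = server.handshake("alice", "s3cret")
print(server.call(token, "DoGet"))
```

Validating the token on every call is what keeps this pattern safe behind a layer 7 load balancer: consecutive calls may land on different backends, so trust established on one connection cannot be assumed on the next.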
Error Handling
Arrow Flight defines its own set of error codes. The implementation differs between languages (e.g. in C++, Unimplemented is a general Arrow error status while it’s a Flight-specific exception in Java), but the following set is exposed:
Error Code | Description |
---|---|
UNKNOWN | An unknown error. The default if no other error applies. |
INTERNAL | An error internal to the service implementation occurred. |
INVALID_ARGUMENT | The client passed an invalid argument to the RPC. |
TIMED_OUT | The operation exceeded a timeout or deadline. |
NOT_FOUND | The requested resource (action, data stream) was not found. |
ALREADY_EXISTS | The resource already exists. |
CANCELLED | The operation was cancelled (either by the client or the server). |
UNAUTHENTICATED | The client is not authenticated. |
UNAUTHORIZED | The client is authenticated, but does not have permissions for the requested operation. |
UNIMPLEMENTED | The RPC is not implemented. |
UNAVAILABLE | The server is not available. May be emitted by the client for connectivity reasons. |
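As an illustration of how an application might carry these codes, the sketch below defines them as a Python enum with a matching exception. FlightErrorCode, FlightError, and lookup_stream are hypothetical names chosen for this example; actual Flight libraries expose the codes through their own status or exception types.

```python
from enum import Enum
from typing import Dict, List


class FlightErrorCode(Enum):
    UNKNOWN = "An unknown error."
    INTERNAL = "An error internal to the service implementation occurred."
    INVALID_ARGUMENT = "The client passed an invalid argument to the RPC."
    TIMED_OUT = "The operation exceeded a timeout or deadline."
    NOT_FOUND = "The requested resource was not found."
    ALREADY_EXISTS = "The resource already exists."
    CANCELLED = "The operation was cancelled."
    UNAUTHENTICATED = "The client is not authenticated."
    UNAUTHORIZED = "The client lacks permission for the operation."
    UNIMPLEMENTED = "The RPC is not implemented."
    UNAVAILABLE = "The server is not available."


class FlightError(Exception):
    """Hypothetical exception that pairs a message with an error code."""

    def __init__(self, code: FlightErrorCode, message: str = "") -> None:
        super().__init__(f"{code.name}: {message or code.value}")
        self.code = code


def lookup_stream(streams: Dict[str, List[dict]], name: str) -> List[dict]:
    """Return a stream by name, reporting NOT_FOUND instead of KeyError."""
    try:
        return streams[name]
    except KeyError:
        raise FlightError(
            FlightErrorCode.NOT_FOUND, f"no stream named {name!r}"
        ) from None


try:
    lookup_stream({}, "missing")
except FlightError as error:
    print(error.code.name, "-", error)
```

Keeping the code as a field on the exception lets callers branch on it (for example, treating UNAUTHENTICATED differently from UNAUTHORIZED) without parsing message text.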
External Resources
Protocol Buffer Definitions
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* <p>
* http://www.apache.org/licenses/LICENSE-2.0
* <p>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
syntax = "proto3";
option java_package = "org.apache.arrow.flight.impl";
option go_package = "github.com/apache/arrow/go/flight;flight";
option csharp_namespace = "Apache.Arrow.Flight.Protocol";
package arrow.flight.protocol;
/*
* A flight service is an endpoint for retrieving or storing Arrow data. A
* flight service can expose one or more predefined endpoints that can be
* accessed using the Arrow Flight Protocol. Additionally, a flight service
* can expose a set of actions that are available.
*/
service FlightService {

  /*
   * Handshake between client and server. Depending on the server, the
   * handshake may be required to determine the token that should be used for
   * future operations. Both request and response are streams to allow multiple
   * round-trips depending on auth mechanism.
   */
  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}

  /*
   * Get a list of available streams given a particular criteria. Most flight
   * services will expose one or more streams that are readily available for
   * retrieval. This api allows listing the streams available for
   * consumption. A user can also provide a criteria. The criteria can limit
   * the subset of streams that can be listed via this interface. Each flight
   * service allows its own definition of how to consume criteria.
   */
  rpc ListFlights(Criteria) returns (stream FlightInfo) {}

  /*
   * For a given FlightDescriptor, get information about how the flight can be
   * consumed. This is a useful interface if the consumer of the interface
   * already can identify the specific flight to consume. This interface can
   * also allow a consumer to generate a flight stream through a specified
   * descriptor. For example, a flight descriptor might be something that
   * includes a SQL statement or a Pickled Python operation that will be
   * executed. In those cases, the descriptor will not be previously available
   * within the list of available streams provided by ListFlights but will be
   * available for consumption for the duration defined by the specific flight
   * service.
   */
  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}

  /*
   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
   * This is used when a consumer needs the Schema of flight stream. Similar to
   * GetFlightInfo this interface may generate a new flight that was not previously
   * available in ListFlights.
   */
  rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}

  /*
   * Retrieve a single stream associated with a particular descriptor
   * associated with the referenced ticket. A Flight can be composed of one or
   * more streams where each stream can be retrieved using a separate opaque
   * ticket that the flight service uses for managing a collection of streams.
   */
  rpc DoGet(Ticket) returns (stream FlightData) {}

  /*
   * Push a stream to the flight service associated with a particular
   * flight stream. This allows a client of a flight service to upload a stream
   * of data. Depending on the particular flight service, a client consumer
   * could be allowed to upload a single stream per descriptor or an unlimited
   * number. In the latter, the service might implement a 'seal' action that
   * can be applied to a descriptor once all streams are uploaded.
   */
  rpc DoPut(stream FlightData) returns (stream PutResult) {}

  /*
   * Open a bidirectional data channel for a given descriptor. This
   * allows clients to send and receive arbitrary Arrow data and
   * application-specific metadata in a single logical stream. In
   * contrast to DoGet/DoPut, this is more suited for clients
   * offloading computation (rather than storage) to a Flight service.
   */
  rpc DoExchange(stream FlightData) returns (stream FlightData) {}

  /*
   * Flight services can support an arbitrary number of simple actions in
   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
   * operations that are potentially available. DoAction allows a flight client
   * to do a specific action against a flight service. An action includes
   * opaque request and response objects that are specific to the type action
   * being undertaken.
   */
  rpc DoAction(Action) returns (stream Result) {}

  /*
   * A flight service exposes all of the available action types that it has
   * along with descriptions. This allows different flight consumers to
   * understand the capabilities of the flight service.
   */
  rpc ListActions(Empty) returns (stream ActionType) {}

}
/*
* The request that a client provides to a server on handshake.
*/
message HandshakeRequest {

  /*
   * A defined protocol version
   */
  uint64 protocol_version = 1;

  /*
   * Arbitrary auth/handshake info.
   */
  bytes payload = 2;
}

message HandshakeResponse {

  /*
   * A defined protocol version
   */
  uint64 protocol_version = 1;

  /*
   * Arbitrary auth/handshake info.
   */
  bytes payload = 2;
}

/*
 * A message for doing simple auth.
 */
message BasicAuth {
  string username = 2;
  string password = 3;
}
message Empty {}
/*
* Describes an available action, including both the name used for execution
* along with a short description of the purpose of the action.
*/
message ActionType {
  string type = 1;
  string description = 2;
}

/*
 * A service specific expression that can be used to return a limited set
 * of available Arrow Flight streams.
 */
message Criteria {
  bytes expression = 1;
}

/*
 * An opaque action specific for the service.
 */
message Action {
  string type = 1;
  bytes body = 2;
}

/*
 * An opaque result returned after executing an action.
 */
message Result {
  bytes body = 1;
}

/*
 * Wrap the result of a getSchema call
 */
message SchemaResult {
  // schema of the dataset as described in Schema.fbs::Schema.
  bytes schema = 1;
}
/*
* The name or tag for a Flight. May be used as a way to retrieve or generate
* a flight or be used to expose a set of previously defined flights.
*/
message FlightDescriptor {

  /*
   * Describes what type of descriptor is defined.
   */
  enum DescriptorType {

    // Protobuf pattern, not used.
    UNKNOWN = 0;

    /*
     * A named path that identifies a dataset. A path is composed of a string
     * or list of strings describing a particular dataset. This is conceptually
     * similar to a path inside a filesystem.
     */
    PATH = 1;

    /*
     * An opaque command to generate a dataset.
     */
    CMD = 2;
  }

  DescriptorType type = 1;

  /*
   * Opaque value used to express a command. Should only be defined when
   * type = CMD.
   */
  bytes cmd = 2;

  /*
   * List of strings identifying a particular dataset. Should only be defined
   * when type = PATH.
   */
  repeated string path = 3;
}
/*
* The access coordinates for retrieval of a dataset. With a FlightInfo, a
* consumer is able to determine how to retrieve a dataset.
*/
message FlightInfo {
  // schema of the dataset as described in Schema.fbs::Schema.
  bytes schema = 1;

  /*
   * The descriptor associated with this info.
   */
  FlightDescriptor flight_descriptor = 2;

  /*
   * A list of endpoints associated with the flight. To consume the whole
   * flight, all endpoints must be consumed.
   */
  repeated FlightEndpoint endpoint = 3;

  // Set these to -1 if unknown.
  int64 total_records = 4;
  int64 total_bytes = 5;
}

/*
 * A particular stream or split associated with a flight.
 */
message FlightEndpoint {

  /*
   * Token used to retrieve this stream.
   */
  Ticket ticket = 1;

  /*
   * A list of URIs where this ticket can be redeemed. If the list is
   * empty, the expectation is that the ticket can only be redeemed on the
   * current service where the ticket was generated.
   */
  repeated Location location = 2;
}

/*
 * A location where a Flight service will accept retrieval of a particular
 * stream given a ticket.
 */
message Location {
  string uri = 1;
}

/*
 * An opaque identifier that the service can use to retrieve a particular
 * portion of a stream.
 */
message Ticket {
  bytes ticket = 1;
}
/*
* A batch of Arrow data as part of a stream of batches.
*/
message FlightData {

  /*
   * The descriptor of the data. This is only relevant when a client is
   * starting a new DoPut stream.
   */
  FlightDescriptor flight_descriptor = 1;

  /*
   * Header for message data as described in Message.fbs::Message.
   */
  bytes data_header = 2;

  /*
   * Application-defined metadata.
   */
  bytes app_metadata = 3;

  /*
   * The actual batch of Arrow data. Preferably handled with minimal-copies
   * coming last in the definition to help with sidecar patterns (it is
   * expected that some implementations will fetch this field off the wire
   * with specialized code to avoid extra memory copies).
   */
  bytes data_body = 1000;
}

/**
 * The response message associated with the submission of a DoPut.
 */
message PutResult {
  bytes app_metadata = 1;
}