feat(format): expose functions/procedures in catalog metadata #4031
lidavidm wants to merge 2 commits into apache:spec-1.2.0 from
Some notes:
c/include/arrow-adbc/adbc.h
/// 5. Metadata about the accepted parameters and return values as an Arrow
/// schema, serialized as an IPC message containing a schema Flatbuffers
/// structure. Only populated if include_arrow_schema is set, otherwise
/// null.
How about another function(s) instead of invoking flatbuffers and only returning if a flag is set?
AdbcConnectionGetRoutineParameterSchema(struct AdbcConnection* connection,
                                        const char* catalog,
                                        const char* db_schema,
                                        const char* routine_name,
                                        struct AdbcSchema** parameter_schema_out,
                                        int n_parameter_schema_out,
                                        struct AdbcSchema** result_schema_out,
                                        int n_result_schema_out,
                                        struct AdbcError* error);
...or something.
A related question whose answer is often as useful (or more useful) is: given a list of arrow types (and the scalar argument values, if present), what Arrow type would you give me back? This is roughly DataFusion's ScalarUdfImpl::return_field(). I have an ABI for this because I have to write a bunch of our UDFs in C++:
Yes, but it's not in the spec yet 🙂
I still think serializing function arguments and results as Flatbuffers is pretty gross and potentially not even that useful (the actual output type sometimes depends on the actual literal arguments). A driver-specific string or binary representation of a parameter data type, plus a function that converts that driver-specific representation to an Arrow type (no roundtrip needed), would solve both of those issues without Flatbuffers.
It's on my list, I just need to get to it!
I'm not sure that metadata is necessarily commonly exposed, or that a driver can necessarily predict the return type (outside of something like DataFusion). JDBC/ODBC seem to assume a fixed list of overloads, for instance.
> I'm not sure that metadata is necessarily commonly exposed
Yes, this is probably unique to systems where the driver is the database (and decimals have type-level precision/scale). It is roughly what is needed to do syntax checking on SQL expressions, although it can be approximated by a fixed list of overloads of coarser types.
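To illustrate why a fixed list of overloads only approximates the return type when decimals carry precision and scale, here is a minimal sketch in C. The `DecimalType` struct and `DecimalAddReturnType` function are hypothetical names, not part of ADBC, and the widening rule shown (scale is the larger of the two scales, precision gains one carry digit) is one common SQL convention assumed for illustration; a given database may use a different rule.

```c
#include <assert.h>

/* Hypothetical descriptor for a decimal type; not an ADBC or Arrow struct. */
struct DecimalType {
  int precision; /* total number of decimal digits */
  int scale;     /* digits to the right of the decimal point */
};

static int MaxInt(int a, int b) { return a > b ? a : b; }

/* Assumed rule for lhs + rhs (a common SQL convention, not mandated by ADBC):
 *   scale     = max(s1, s2)
 *   precision = max(p1 - s1, p2 - s2) + scale + 1, capped at max_precision
 * The result depends on the exact argument types, so it cannot be listed
 * as a single overload such as "decimal + decimal -> decimal". */
static struct DecimalType DecimalAddReturnType(struct DecimalType lhs,
                                               struct DecimalType rhs,
                                               int max_precision) {
  assert(lhs.scale <= lhs.precision && rhs.scale <= rhs.precision);
  struct DecimalType out;
  out.scale = MaxInt(lhs.scale, rhs.scale);
  int integer_digits =
      MaxInt(lhs.precision - lhs.scale, rhs.precision - rhs.scale);
  out.precision = integer_digits + out.scale + 1;
  if (out.precision > max_precision) out.precision = max_precision;
  return out;
}
```

Under this assumed rule, adding decimal(10, 2) and decimal(5, 4) with a cap of 38 gives decimal(13, 4), which a coarse "decimal" overload entry could not express.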
Maybe if a fully serialized vendor/driver-specific type (e.g., what would appear in CREATE TABLE) were available in the parameter schema and we had a function to convert that to an Arrow type? (The function that can do the reverse of that would be #3793, I think.)
Or you can ignore me and funnel Arrow IPC via the Arrow C Data interface 🙂 (I just think it's a shame we never agreed on a string representation of a data type that would make this cleaner)
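As a rough sketch of the string-based alternative discussed above, the following C function maps a vendor type string to an Arrow C Data Interface format string (for example `i` for int32, `l` for int64, `g` for float64, `u` for utf8, and `d:precision,scale` for decimal128). The function name `VendorTypeToArrowFormat` and the handful of type names it recognizes are assumptions for illustration only; nothing like this exists in the ADBC spec, and a real driver would need its own, far more complete parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of ADBC: converts a vendor type string
 * (as it might appear in CREATE TABLE) into an Arrow C Data Interface
 * format string. Returns 0 on success, -1 if the type is not recognized
 * or the output buffer is too small. */
static int VendorTypeToArrowFormat(const char* type_name, char* out,
                                   size_t out_size) {
  assert(type_name != NULL && out != NULL);
  const char* format = NULL;
  int n;

  /* Types without parameters map to a fixed format string. */
  if (strcmp(type_name, "INTEGER") == 0) format = "i";       /* int32 */
  else if (strcmp(type_name, "BIGINT") == 0) format = "l";   /* int64 */
  else if (strcmp(type_name, "DOUBLE") == 0) format = "g";   /* float64 */
  else if (strcmp(type_name, "VARCHAR") == 0) format = "u";  /* utf8 */

  if (format != NULL) {
    n = snprintf(out, out_size, "%s", format);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
  }

  /* DECIMAL(p, s) carries type-level precision and scale. */
  int precision = 0, scale = 0;
  char closing = '\0';
  if (sscanf(type_name, "DECIMAL(%d,%d%c", &precision, &scale, &closing) == 3 &&
      closing == ')') {
    n = snprintf(out, out_size, "d:%d,%d", precision, scale);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
  }

  return -1; /* unrecognized vendor type */
}
```

With this sketch, `"DECIMAL(10, 2)"` converts to `"d:10,2"` without any Flatbuffers or IPC decoding on the caller's side, which is the human-readability point raised in the thread.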
Ah, is the concern mostly about not being human-readable? I agree there...I don't want to go yak shave the JSON issue but maybe it's time...
Also, the vendor-specific type is already there as xdbc_type_name, unless there's more metadata you're expecting.
c/include/arrow-adbc/adbc.h
/// | routine_parameters       | list<PARAMETER_SCHEMA> | (4) |
/// | routine_result           | list<PARAMETER_SCHEMA> | (4) |
/// | routine_parameter_schema | binary                 | (5) |
/// | routine_result_schema    | binary                 | (5) |
///
DataFusion and DuckDB both have "description" and SQL examples (useful for generating help pages or inline help). Where would these go?
That's probably something I should investigate and add to the list! I believe some other systems also provide the definition for UDFs.
I updated this: remarks can contain the main docstring, examples can contain a list of examples, and now there are two fields for the definition (e.g. for a UDF) and the definition language.
c/include/arrow-adbc/adbc.h
/// | xdbc_scale             | int16 | (3) |
/// | xdbc_num_prec_radix    | int16 | (3) |
/// | xdbc_nullable          | int16 | (3) |
/// | xdbc_char_octet_length | int32 | (3) |
/// | xdbc_is_nullable       | utf8  | (3) |
It is a good thing LLMs are much better than they were when creating builders/parsers for the last nested GetObjects schema 🙂
Closes #3983.