feat(format): expose functions/procedures in catalog metadata#4031

Draft
lidavidm wants to merge 2 commits into apache:spec-1.2.0 from lidavidm:gh-3983

Member

lidavidm commented Mar 2, 2026

Closes #3983.

@lidavidm lidavidm added this to the ADBC API Specification 1.2.0 milestone Mar 2, 2026
Member Author

lidavidm commented Mar 2, 2026

Some notes:

  • There may be more metadata that is not in JDBC/ODBC but is common enough that we could standardize it?
  • I've chosen not to model the types of multiple result sets. Based on my search, almost nothing actually exposes this metadata (and it may not even be possible to determine the types with static analysis), but we should add a note about this. (It seems MSSQL with SET FMTONLY ON is the only DBMS that even tries to expose this metadata, but that is no longer recommended, and the result it gave was suspect anyway, as it didn't/couldn't properly evaluate which branches to take.)
  • I need to describe the Arrow schema field in more detail.

Comment on lines +1897 to +1900
/// 5. Metadata about the accepted parameters and return values as an Arrow
/// schema, serialized as an IPC message containing a schema Flatbuffers
/// structure. Only populated if include_arrow_schema is set, otherwise
/// null.
Member

How about another function(s) instead of invoking flatbuffers and only returning if a flag is set?

AdbcConnectionGetRoutineParameterSchema(struct AdbcConnection* connection, const char* catalog, const char* db_schema, const char* routine_name, struct AdbcSchema** parameter_schema_out, int n_parameter_schema_out, struct AdbcSchema** result_schema_out, int n_result_schema_out, struct AdbcError* error);

...or something.

A related question whose answer is often as useful (or more useful) is: given a list of Arrow types (and the scalar argument values, if present), what Arrow type would you give me back? This is roughly DataFusion's ScalarUdfImpl::return_field(). I have an ABI for this because I have to write a bunch of our UDFs in C++:

https://github.com/apache/sedona-db/blob/8b16f39b8a8806daacf3c09a2f6e1c864fed69e8/c/sedona-extension/src/sedona_extension.h#L148-L151

Member Author

#3793 and #3996 are related to that, I believe.

I thought about a separate function, but note that even for GetObjects there's a request to add a serialized schema to avoid round trips (#1704).

Member

Yes, but it's not in the spec yet 🙂

I still think serializing function arguments and results as flatbuffers is pretty gross and potentially not even that useful (the actual output type sometimes depends on the literal argument values). A driver-specific string or binary representation of a parameter data type, plus a function that converts that representation to an Arrow type (no round trip needed), would solve both of those issues without flatbuffers.
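
To make the idea of converting a driver-specific type string without a round trip more concrete, here is a minimal sketch in C. It is not part of ADBC or of this PR: the function name `VendorTypeToArrowFormat` and the small set of recognized type names are assumptions chosen for illustration. The only external convention it relies on is the Arrow C Data Interface format strings (`i`, `l`, `g`, `b`, `u`, `d:precision,scale`).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an ADBC API): map a vendor type string such as
 * "DECIMAL(10, 2)" to an Arrow C Data Interface format string such as
 * "d:10,2". Returns 0 on success, -1 if the type is not recognized or the
 * output buffer is too small. Only a handful of types are handled, and
 * length modifiers such as VARCHAR(20) are ignored. */
static int VendorTypeToArrowFormat(const char* vendor_type, char* out,
                                   size_t out_len) {
  char name[32];
  size_t n = 0;
  /* Copy the leading alphabetic identifier, upper-cased. */
  while (vendor_type[n] != '\0' && isalpha((unsigned char)vendor_type[n]) &&
         n + 1 < sizeof(name)) {
    name[n] = (char)toupper((unsigned char)vendor_type[n]);
    n++;
  }
  name[n] = '\0';
  const char* rest = vendor_type + n;

  /* Types whose Arrow format string carries no parameters. */
  const char* fixed = NULL;
  if (strcmp(name, "INTEGER") == 0 || strcmp(name, "INT") == 0) fixed = "i";
  else if (strcmp(name, "BIGINT") == 0) fixed = "l";
  else if (strcmp(name, "DOUBLE") == 0) fixed = "g";
  else if (strcmp(name, "BOOLEAN") == 0) fixed = "b";
  else if (strcmp(name, "VARCHAR") == 0 || strcmp(name, "TEXT") == 0) fixed = "u";

  if (fixed != NULL) {
    if (strlen(fixed) + 1 > out_len) return -1;
    strcpy(out, fixed);
    return 0;
  }

  /* Decimals carry precision and scale in the type itself. */
  if (strcmp(name, "DECIMAL") == 0 || strcmp(name, "NUMERIC") == 0) {
    int precision = 0;
    int scale = 0;
    if (sscanf(rest, " ( %d , %d )", &precision, &scale) != 2) return -1;
    int written = snprintf(out, out_len, "d:%d,%d", precision, scale);
    return (written > 0 && (size_t)written < out_len) ? 0 : -1;
  }
  return -1;
}
```

A real driver would need to cover its full type grammar (nested types, time zones, and so on); the sketch only shows that the conversion can be a pure local function.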

Member Author

It's on my list, I just need to get to it!

I'm not sure that metadata is necessarily commonly exposed, or that a driver can necessarily predict the return type (outside of something like DataFusion). JDBC/ODBC seem to assume a fixed list of overloads, for instance.

Member

> I'm not sure that metadata is necessarily commonly exposed

Yes, this is probably unique to systems where the driver is the database (and decimals have type-level precision/scale). It is roughly what is needed to do syntax checking on SQL expressions, although it can be approximated by a fixed list of overloads of coarser types.

Maybe if a fully serialized vendor/driver-specific type (e.g., what would appear in CREATE TABLE) were available in the parameter schema and we had a function to convert that to an Arrow type? (The function that can do the reverse of that would be #3793, I think.)

Or you can ignore me and funnel Arrow IPC via the Arrow C Data interface 🙂 (I just think it's a shame we never agreed on a string representation of a data type that would make this cleaner)
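
As an illustration of why a fixed list of overloads cannot fully describe decimal routines, the sketch below computes the result type of decimal addition from its argument types. The struct and function names are hypothetical, not ADBC or Arrow definitions. The rule used is the widely documented SQL convention for addition; capping precision at 38 digits is a simplifying assumption, since engines differ in how they handle overflow.

```c
#include <stdint.h>

/* Hypothetical type descriptor for this sketch; not an ADBC or Arrow struct. */
struct DecimalType {
  int32_t precision;
  int32_t scale;
};

static int32_t MaxI32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Result type of lhs + rhs under the common SQL rule:
 *   scale     = max(s1, s2)
 *   precision = max(p1 - s1, p2 - s2) + scale + 1
 * Precision is capped at 38 digits (assumption: a plain cap, with no
 * scale reduction, which some engines apply instead). */
static struct DecimalType DecimalAddReturnType(struct DecimalType lhs,
                                               struct DecimalType rhs) {
  struct DecimalType out;
  out.scale = MaxI32(lhs.scale, rhs.scale);
  out.precision =
      MaxI32(lhs.precision - lhs.scale, rhs.precision - rhs.scale) +
      out.scale + 1;
  if (out.precision > 38) out.precision = 38;
  return out;
}
```

Because the output depends on the precision and scale of each argument, every pair of decimal types yields a potentially different return type, which is the case a coarser "decimal, decimal -> decimal" overload only approximates.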

Member Author

Ah, is the concern mostly about not being human-readable? I agree there...I don't want to go yak shave the JSON issue but maybe it's time...

Member Author

Also, the vendor-specific type is already there as xdbc_type_name, unless there's more metadata you're expecting.

Comment on lines +1885 to +1889
/// | routine_parameters | list<PARAMETER_SCHEMA> | (4) |
/// | routine_result | list<PARAMETER_SCHEMA> | (4) |
/// | routine_parameter_schema | binary | (5) |
/// | routine_result_schema | binary | (5) |
///
Member

DataFusion and DuckDB both have "description" and SQL examples (useful for generating help pages or inline help). Where would these go?

Member Author

That's probably something I should investigate and add to the list! I believe some other systems also provide the definition for UDFs.

Member Author

I updated this: remarks can contain the main docstring, examples can contain a list of examples, and now there are two fields for the definition (e.g. for a UDF) and the definition language.
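
For the help-page use case raised above, a client might consume these fields roughly as sketched below. This is an illustration only: `remarks` and `examples` follow the description in the comment above, while the struct itself, the `definition` and `definition_language` member names, and `RenderRoutineHelp` are hypothetical and not taken from the PR.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical client-side view of one routine's documentation fields. */
struct RoutineDocs {
  const char* routine_name;
  const char* remarks;             /* main docstring; may be NULL */
  const char* const* examples;     /* SQL examples; may be NULL if n_examples == 0 */
  size_t n_examples;
  const char* definition;          /* e.g. a UDF body; name is an assumption */
  const char* definition_language; /* e.g. "SQL"; name is an assumption */
};

/* Render a short plain-text help entry into out.
 * Returns 0 on success, -1 if the buffer is too small. */
static int RenderRoutineHelp(const struct RoutineDocs* docs, char* out,
                             size_t out_len) {
  int n = snprintf(out, out_len, "%s\n  %s\n", docs->routine_name,
                   docs->remarks ? docs->remarks : "(no description)");
  if (n < 0 || (size_t)n >= out_len) return -1;
  size_t used = (size_t)n;

  for (size_t i = 0; i < docs->n_examples; i++) {
    n = snprintf(out + used, out_len - used, "  example: %s\n",
                 docs->examples[i]);
    if (n < 0 || (size_t)n >= out_len - used) return -1;
    used += (size_t)n;
  }

  if (docs->definition != NULL) {
    n = snprintf(out + used, out_len - used, "  definition (%s): %s\n",
                 docs->definition_language ? docs->definition_language
                                           : "unknown language",
                 docs->definition);
    if (n < 0 || (size_t)n >= out_len - used) return -1;
  }
  return 0;
}
```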

Comment on lines +1914 to +1918
/// | xdbc_scale | int16 | (3) |
/// | xdbc_num_prec_radix | int16 | (3) |
/// | xdbc_nullable | int16 | (3) |
/// | xdbc_char_octet_length | int32 | (3) |
/// | xdbc_is_nullable | utf8 | (3) |
Member

It is a good thing LLMs are much better than they were when creating builders/parsers for the last nested GetObjects schema 🙂
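
As a small example of the kind of parsing logic a consumer of this schema ends up writing, the sketch below reconciles the two nullability fields shown in the diff. It assumes that the `xdbc_` fields follow the JDBC conventions (0 = no nulls, 1 = nullable, 2 = unknown for the integer field; "YES", "NO", or an empty string for the text field); the enum and function names are hypothetical, and a NULL pointer stands in for an Arrow null.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror the JDBC nullability constants (assumed to apply here). */
enum Nullability { kNoNulls = 0, kNullable = 1, kNullableUnknown = 2 };

/* Hypothetical helper: prefer the integer field when it is present and in
 * range, fall back to the text field, and otherwise report "unknown". */
static enum Nullability ResolveNullability(const int16_t* xdbc_nullable,
                                           const char* xdbc_is_nullable) {
  if (xdbc_nullable != NULL && *xdbc_nullable >= 0 && *xdbc_nullable <= 2) {
    return (enum Nullability)*xdbc_nullable;
  }
  if (xdbc_is_nullable != NULL) {
    if (strcmp(xdbc_is_nullable, "YES") == 0) return kNullable;
    if (strcmp(xdbc_is_nullable, "NO") == 0) return kNoNulls;
  }
  return kNullableUnknown;
}
```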
