fix(c): handle non-ASCII driver paths in GetDriverInfo on Windows #4006

Draft
amoeba wants to merge 2 commits into apache:main from amoeba:gh-3970--nonascii-path

Conversation

@amoeba (Member) commented Feb 18, 2026

This is just a draft to verify my test fails on CI like it does locally. Will update PR body before merge.

Ref #3970

Comment on lines +815 to +816
// Create a UTF-8 encoded string to simulate what Python or another caller would
// pass
Member

So the issue is that Python is treating the path as a string when it shouldn't?

Member Author

I'm not sure it's that.

I initially started Claude Code on it, and it zeroed in on this change:

diff --git a/c/driver_manager/adbc_driver_manager.cc b/c/driver_manager/adbc_driver_manager.cc
index 15d98c00b..5bb4a34d1 100644
--- a/c/driver_manager/adbc_driver_manager.cc
+++ b/c/driver_manager/adbc_driver_manager.cc
@@ -704,7 +704,12 @@ struct ManagedLibrary {
     }
 
     // First try to treat the given driver name as a path to a manifest or shared library
+#ifdef _WIN32
+    // On Windows, decode UTF-8 to wide string to properly handle non-ASCII paths
+    std::filesystem::path driver_path(Utf8Decode(std::string(driver_name)));
+#else
     std::filesystem::path driver_path(driver_name);
+#endif
     const bool allow_relative_paths = load_options & ADBC_LOAD_FLAG_ALLOW_RELATIVE_PATHS;
     if (driver_path.has_extension()) {
       if (driver_path.is_relative() && !allow_relative_paths) {
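
To make the intent of the patch concrete, here is a minimal Python sketch of the conversion it appears to rely on: take the UTF-8 bytes the caller passed and re-encode them as UTF-16, the layout Windows wide-character APIs expect. It assumes that `Utf8Decode` performs a UTF-8 to UTF-16 conversion (its implementation is not shown in this thread); `utf8_to_wide` is a hypothetical name used only for illustration.

```python
def utf8_to_wide(raw: bytes) -> bytes:
    """Decode UTF-8 bytes, then re-encode as UTF-16-LE code units.

    Hypothetical stand-in for what Utf8Decode is assumed to do on Windows.
    """
    text = raw.decode("utf-8")
    return text.encode("utf-16-le")


# Bytes a caller would hand over for the path used in this thread.
driver_name = "项目\\some.dll".encode("utf-8")
wide = utf8_to_wide(driver_name)

print(driver_name)  # UTF-8: three bytes per CJK character
print(wide)         # UTF-16-LE: two bytes per character in this example
```

Whether decoding is the right fix is exactly what the rest of the conversation debates; the sketch only shows the mechanics.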

So I started to see if I could prove it. I looked at the sequence of calls that leads there from Python, and it looks like we pass a Python str and treat it as a C string on the C++ side. I also built adbc_driver_manager with debug symbols, and when I run,

>>> dbapi.connect(driver="项目\some.dll")

and stop on a breakpoint in the C++ driver manager's AdbcDatabaseInit, I see,

[image: debugger screenshot]

Member Author

For some reason I wasn't able to set a breakpoint on GetDriverInfo, so I laboriously stepped my way into it. There I can see that the code in the patch gives me what I expect. Notice that driver_name is the garbled form and driver_path is correct:

[image: debugger screenshot]
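
One plausible reason a `char*` looks garbled in a debugger is that UTF-8 bytes are being displayed through a single-byte code page. The sketch below reproduces that effect in Python. It is an assumption that this is what the screenshot shows; `cp1252` is chosen only as an example code page, and `misread` is a hypothetical helper.

```python
def misread(raw: bytes, codepage: str = "cp1252") -> str:
    """Interpret bytes under the wrong code page, as a naive viewer might."""
    return raw.decode(codepage, errors="replace")


raw = "项目".encode("utf-8")     # the bytes that actually cross the C boundary
garbled = misread(raw)           # each byte rendered as its own character
restored = raw.decode("utf-8")   # decoding with the right encoding recovers the name

print(garbled)
print(restored)
```

The bytes themselves are unchanged in both cases; only the interpretation differs.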

Member

The problem is that paths aren't UTF-8, so decoding them seems very wrong.

Member

Paths should be bytestrings (with some restrictions), so if Python provided UTF-8 that we have to decode into the native codepage, then presumably Python should have provided the raw bytestring in the first place.
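
To illustrate the point about code pages, this small sketch shows that the same name has different byte representations under UTF-8 and under a legacy code page. GBK (Python's `gbk` codec, also available as `cp936`) is used purely as an example of a "native codepage"; `transcode` is a hypothetical helper.

```python
def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from one encoding to another via str."""
    return raw.decode(src).encode(dst)


name = "项目"
as_utf8 = name.encode("utf-8")  # 3 bytes per character here
as_gbk = name.encode("gbk")     # 2 bytes per character here

print(as_utf8 == as_gbk)                            # the byte strings differ
print(transcode(as_utf8, "utf-8", "gbk") == as_gbk)  # explicit transcoding matches
```

A callee that receives raw bytes therefore has to know, or assume, which encoding the caller used.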

Member Author

Is there a codepath I'm not seeing where we use a Path object? I'm assuming that's what you're referring to when you say paths are bytestrings. All I see are str values involved in the code paths here, and I guess I'm assuming those turn into UTF-8 encoded const char* on the C side... I'll look again tomorrow.

Member

I'm talking about the Python side. I'm assuming this should not be converting to str, but should be using bytes?

@functools.lru_cache
def _driver_path() -> str:
    import pathlib
    import sys

    import importlib_resources

    driver = "adbc_driver_sqlite"

    # Wheels bundle the shared library
    root = importlib_resources.files(driver)
    # The filename is always the same regardless of platform
    entrypoint = root.joinpath(f"lib{driver}.so")
    if entrypoint.is_file():
        return str(entrypoint)

    # Search sys.prefix + '/lib' (Unix, Conda on Unix)
    root = pathlib.Path(sys.prefix)
    for filename in (f"lib{driver}.so", f"lib{driver}.dylib"):
        entrypoint = root.joinpath("lib", filename)
        if entrypoint.is_file():
            return str(entrypoint)

    # Conda on Windows
    entrypoint = root.joinpath("bin", f"{driver}.dll")
    if entrypoint.is_file():
        return str(entrypoint)

    # Let the driver manager fall back to (DY)LD_LIBRARY_PATH/PATH
    # (It will insert 'lib', 'so', etc. as needed)
    return driver
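
As a rough sketch of the "use bytes instead of str" suggestion, the lookup could return the path through `os.fsencode`, which converts a path to bytes using Python's filesystem encoding. `find_library_bytes` and the file names are hypothetical, and whether the driver manager bindings accept `bytes` for the driver argument is an assumption not confirmed in this thread.

```python
import os
import pathlib
import tempfile
from typing import Iterable, Optional


def find_library_bytes(root: pathlib.Path, filenames: Iterable[str]) -> Optional[bytes]:
    """Return the first existing candidate as bytes, or None if none exists."""
    for filename in filenames:
        candidate = root / filename
        if candidate.is_file():
            # os.fsencode accepts str or os.PathLike and returns bytes
            return os.fsencode(candidate)
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "libexample.so").touch()  # placeholder file standing in for a driver
    found = find_library_bytes(root, ["libmissing.so", "libexample.so"])
    missing = find_library_bytes(root, ["libmissing.so"])

print(found)
print(missing)
```

Note that on Windows `os.fsencode` produces UTF-8 by default (PEP 529), so this alone would not hand the C side bytes in the ANSI code page.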
