Skip to content

Fix/1158 udf thread safety#5

Closed
otegami wants to merge 8 commits intomainfrom
fix/1158-udf-thread-safety
Closed

Fix/1158 udf thread safety#5
otegami wants to merge 8 commits intomainfrom
fix/1158-udf-thread-safety

Conversation

@otegami
Copy link
Copy Markdown
Owner

@otegami otegami commented Apr 2, 2026

No description provided.

otegami and others added 8 commits April 2, 2026 17:18
…callbacks

DuckDB calls scalar function callbacks from its own worker threads,
which are not Ruby threads. Ruby's GVL cannot be acquired from
non-Ruby threads (rb_thread_call_with_gvl crashes with rb_bug),
so an executor pattern is required.

Previously, scalar_function.c contained a single global executor
thread that serialized all callbacks through one condvar. On Windows,
this caused deadlocks due to GVL starvation under Minitest parallel
execution (Issue suketa#1158).

This commit makes two changes:

1. Extract the executor infrastructure into a shared module
   (executor.c/executor.h) so both scalar_function.c and
   table_function.c can use it.

2. Add per-worker proxy threads using DuckDB's init_local_state
   API. Each DuckDB worker thread gets a dedicated Ruby proxy
   thread created via duckdb_scalar_function_set_init(). The
   proxy acquires the GVL independently, eliminating the global
   executor bottleneck. The global executor is retained as a
   bootstrap mechanism for proxy creation and as a fallback.

Refs: suketa#1158

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
table_function.c previously called Ruby C API functions (rb_funcall,
rb_class_new_instance, rb_protect) directly from whichever thread
DuckDB invoked the callback on. When called from a non-Ruby worker
thread, this was undefined behavior — potential crash or deadlock.

The workaround was a Ruby-level check_threads guard in connection.rb
that raised an error if threads > 1 when registering table functions,
forcing all table function queries to run single-threaded.

This commit adds the same three-path dispatch pattern used by
scalar_function.c to all three table function callbacks:
  1. Ruby thread WITH GVL    -> call directly
  2. Ruby thread WITHOUT GVL -> rb_thread_call_with_gvl
  3. Non-Ruby thread         -> per-worker proxy (or global executor)

The execute callback also registers a local_init that creates a
per-worker proxy thread for non-Ruby workers, matching the scalar
function approach.

With table functions now thread-safe, the check_threads guard and
its threads=1 requirement are removed from connection.rb.

Refs: suketa#1158

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… table function test

With scalar and table function callbacks now thread-safe via
per-worker proxy threads, the SET threads=1 workaround is no
longer needed in UDF-related tests.

- Remove SET threads=1 from table_function_test.rb,
  data_chunk_test.rb, gc_stress_test.rb, table_function_csv_test.rb,
  and table_function_integration_test.rb
- Add test_table_function_with_multithread that runs a table function
  with threads=4 to verify thread safety

Non-UDF tests (query_progress, pending_result) retain their
SET threads=1 as those are unrelated to callback thread safety.

Refs: suketa#1158

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
duckdb_scalar_function_set_init, duckdb_scalar_function_get_state,
and duckdb_table_function_set_local_init were introduced in DuckDB
1.5.0. CI tests against DuckDB 1.4.4, which lacks these symbols.

Wrap per-worker proxy creation and dispatch behind
HAVE_DUCKDB_H_GE_V1_5_0 (#ifdef). On DuckDB < 1.5.0, the global
executor thread handles all non-Ruby thread callbacks (the previous
behavior). The three-path dispatch (Ruby+GVL / Ruby-GVL / non-Ruby)
remains active on all versions.

Refs: suketa#1158

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per-worker proxy threads require DuckDB >= 1.5.0 APIs
(duckdb_scalar_function_set_init, duckdb_scalar_function_get_state,
duckdb_table_function_set_local_init). On older versions, these
symbols are unavailable and the global executor is the only
dispatch path for non-Ruby thread callbacks.

Restore backward-compatible behavior:

- connection.rb: Restore check_threads for DuckDB < 1.5.0.
  On >= 1.5.0 the check is skipped (per-worker proxy handles
  thread safety). On < 1.5.0 the threads=1 restriction remains
  to prevent the global executor deadlock.

- Tests: Keep SET threads=1 in all table function tests for
  compatibility with DuckDB 1.4.x. Version-gate the new
  multithread tests behind DuckDB >= 1.5.0.

Refs: suketa#1158

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Break long skip lines into unless/skip form
- Add rubocop:disable for MethodLength on scalar multithread test
- Replace empty block with nil body in table function init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, rbduckdb_worker_proxy_destroy only signaled the proxy thread
to stop but never waited for it to exit or freed the struct. This leaked
the worker_proxy struct, its mutex, and condvars for every DuckDB worker
thread on each UDF query — unbounded growth in long-lived processes.

Add an exit_cond condvar so destroy can block until the proxy thread has
fully exited, then safely destroy all OS primitives and free the struct.
Switch from xcalloc/xfree to calloc/free since destroy runs on a
non-Ruby thread where xfree is unsafe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename terse field names in executor_request and worker_proxy structs
to be more descriptive, following Arrow C GLib naming conventions:

- fn/data → callback_func/callback_data
- done → request_done
- done_lock/done_mutex/done_cond → request_done_lock/request_done_mutex/request_done_cond
- stop → stop_requested
- exited → thread_exited
- exit_cond → thread_exit_cond

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@otegami otegami closed this Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant