[plan] Extract ResourceCache and PythonResourceBridge from AgentPlan by weiqingy · Pull Request #548 · apache/flink-agents

weiqingy · 2026-02-22T03:39:11Z

Linked issue: #547

Purpose of change

AgentPlan (624 lines) mixes plan definition, resource caching/resolution, Python bridge wiring, and serialization.
This PR extracts two classes to separate concerns:

ResourceCache — lazy resource resolution, caching, and cleanup. Created by the operator in open(), owned by
the operator lifecycle.
PythonResourceBridge — static discoverPythonMCPResources() for Python MCP tool/prompt discovery. Called
during operator init.

After extraction, AgentPlan becomes immutable after construction (~490 lines, down from 624). The removed public
methods are getResource(), close(), and setPythonResourceAdapter().
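For readers unfamiliar with the pattern, the "lazy resolution, caching, and cleanup" responsibility described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchResource`, `LazyResourceCacheSketch`, and `cachedCount()` are hypothetical names, the provider is modeled as a plain `Supplier`, and the sketch is deliberately single-threaded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a resolved resource (not the flink-agents Resource type).
interface SketchResource {
    void close();
}

// Illustrative only: a single-threaded cache that resolves lazily and owns cleanup.
final class LazyResourceCacheSketch implements AutoCloseable {
    private final Map<String, Supplier<SketchResource>> providers;
    private final Map<String, SketchResource> cache = new HashMap<>();

    LazyResourceCacheSketch(Map<String, Supplier<SketchResource>> providers) {
        this.providers = providers;
    }

    // Resolve on first use, then keep returning the cached instance.
    SketchResource getResource(String name) {
        SketchResource cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        Supplier<SketchResource> provider = providers.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("Resource not found: " + name);
        }
        SketchResource created = provider.get();
        cache.put(name, created);
        return created;
    }

    int cachedCount() {
        return cache.size();
    }

    // The cache owner (the operator, in this PR's design) closes every resolved resource.
    @Override
    public void close() {
        for (SketchResource resource : cache.values()) {
            resource.close();
        }
        cache.clear();
    }
}
```

With this split, the plan object only hands out providers and never holds resolved resources, which is what allows it to stay immutable after construction.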

Tests

mvn test -pl plan — all plan module tests pass
mvn test -pl runtime — all runtime module tests pass
./tools/lint.sh -c — formatting check passed
./tools/ut.sh -j — full Java test suite passed

API

Yes. Three public methods removed from AgentPlan:

getResource(String, ResourceType) — replaced by ResourceCache.getResource()
close() — replaced by ResourceCache.close()
setPythonResourceAdapter(PythonResourceAdapter) — replaced by PythonResourceBridge.discoverPythonMCPResources()

Documentation

doc-needed
doc-not-needed
doc-included

weiqingy · 2026-02-22T04:14:28Z

Both CI failures are unrelated to our changes:

it-python [flink-2.2] — Ollama download returned HTTP 404. This is a transient infrastructure issue — the Ollama release artifact was temporarily unavailable. Nothing to fix on our side.
cross-language — Py4JError / Py4JNetworkError in the cross-language e2e tests. The Java gateway crashed during test_java_chat_model_integration and subsequent tests couldn't connect. Our PR only touches plan/ and runtime/ Java code — it doesn't affect the Python-Java bridge or cross-language e2e test infrastructure.

wenjin272

Thanks for your work @weiqingy and I think the separation makes sense.

Besides, I think we should also apply this separation on the Python side. If possible, could you implement it in this PR?

wenjin272 · 2026-02-27T10:17:26Z

plan/src/main/java/org/apache/flink/agents/plan/ResourceCache.java

+public class ResourceCache implements AutoCloseable {
+
+    private final Map<ResourceType, Map<String, ResourceProvider>> resourceProviders;
+    private final Map<ResourceType, Map<String, Resource>> cache = new HashMap<>();


Should we use ConcurrentHashMap here? Tasks submitted via ctx.durableExecuteAsync may read and write this map in parallel.

Good question! This is intentional. After the refactoring, ResourceCache is created and owned by ActionExecutionOperator.open(), so it's scoped to a single operator subtask. In Flink's execution model, all access (processElement, open, close) runs on the same mailbox thread. durableExecuteAsync dispatches work to external threads, but resource resolution from the cache happens on the operator thread before dispatch. So HashMap is sufficient and avoids the overhead of ConcurrentHashMap. Let me know if this matches your understanding.

but resource resolution from the cache happens on the operator thread before dispatch

I think resource resolution may not always happen before dispatch.

Take the built-in chat action as an example. The chat action submits a chat task to external threads, and that task calls the chat method of BaseChatModelSetup on an asynchronous thread. Inside that chat method, the setup resolves the corresponding connection, prompt, and tools, which I think may lead to concurrent access to the cache.

You're right, thanks for pushing back on this. I traced the code path more carefully:

ChatModelAction → durableExecuteAsync(callable) → async pool thread runs chatModel.chat() → BaseChatModelSetup.chat() calls this.getResource.apply() to resolve connection, prompt, and tools → which hits ResourceCache.getResource().

So resource resolution does happen on async threads, not just the mailbox thread. I'll switch back to ConcurrentHashMap. Good catch!

Updated the PR.
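For illustration, the kind of change discussed in this thread could look roughly like the sketch below. It is not the PR's actual code: `SketchResourceType`, `ConcurrentResourceCacheSketch`, and `ConcurrentCacheDemo` are hypothetical names, providers are modeled as plain `Supplier`s, and resources as plain `Object`s. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for the project's ResourceType enum.
enum SketchResourceType { TOOL, CHAT_MODEL }

final class ConcurrentResourceCacheSketch {
    // Read-only after construction, so a plain map is enough for the providers.
    private final Map<SketchResourceType, Map<String, Supplier<Object>>> providers;
    // Written from the operator thread and from async pool threads.
    private final Map<SketchResourceType, Map<String, Object>> cache = new ConcurrentHashMap<>();

    ConcurrentResourceCacheSketch(Map<SketchResourceType, Map<String, Supplier<Object>>> providers) {
        this.providers = providers;
    }

    Object getResource(String name, SketchResourceType type) {
        Map<String, Supplier<Object>> byName = providers.get(type);
        Supplier<Object> provider = byName == null ? null : byName.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("Resource not found: " + type + "/" + name);
        }
        // Each computeIfAbsent runs its function at most once per key, even under contention.
        return cache
                .computeIfAbsent(type, t -> new ConcurrentHashMap<>())
                .computeIfAbsent(name, n -> provider.get());
    }
}

final class ConcurrentCacheDemo {
    // Resolves the same tool from many async tasks and reports how often the provider ran.
    static int providerCallsUnderContention(int tasks) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<Object> addTool = () -> {
            calls.incrementAndGet();
            return new Object();
        };
        Map<String, Supplier<Object>> tools = new HashMap<>();
        tools.put("add", addTool);
        Map<SketchResourceType, Map<String, Supplier<Object>>> providers = new HashMap<>();
        providers.put(SketchResourceType.TOOL, tools);
        ConcurrentResourceCacheSketch cache = new ConcurrentResourceCacheSketch(providers);

        List<CompletableFuture<Object>> lookups = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            lookups.add(CompletableFuture.supplyAsync(
                    () -> cache.getResource("add", SketchResourceType.TOOL)));
        }
        lookups.forEach(CompletableFuture::join);
        return calls.get();
    }
}
```

One caveat with this approach: the mapping function passed to `computeIfAbsent` must not update the same map. If a provider resolves its own dependencies through the same cache during creation, a get-then-`putIfAbsent` variant may be the safer choice.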

weiqingy · 2026-03-02T06:02:59Z

Thanks for the review, @wenjin272! I’ve updated the PR to apply the same separation on the Python side as well - could you please take another look?

wenjin272

LGTM. Could you take a look at your convenience @xintongsong ?

weiqingy · 2026-03-02T07:25:12Z

I checked the CI failures - both are LLM-dependent e2e tests and don’t appear to be caused by this PR.

Test 1 (react_agent_test): The output 4444 = 2123 + 2321 proves our ResourceCache IS working correctly — the chat model was resolved, the add tool was resolved and called successfully. The LLM (qwen3:1.7b) simply stopped after one tool call instead of continuing to call multiply(4444, 312). This is LLM non-determinism.

Test 2 (long_term_memory_test): This runs on the Flink remote runner, where there's exactly ONE FlinkRunnerContext with ONE ResourceCache. The behavior is identical to before. The failure is assert len(doc) == 1 after LLM-based compaction using qwen3:8b — if the model's summarization response is malformed, compaction produces incorrect output.

We can re-run CI to confirm flakiness — if it fails again with different assertion values, that would further support LLM non-determinism.

@wenjin272 do you have access to re-run the CI tests? It looks like admin rights are required.

wenjin272 · 2026-03-02T07:51:56Z

Hi, @weiqingy, sorry, I don't have access to re-run the CI. I agree that the failing tests are flaky on their own and the failures are not caused by this PR.

wenjin272 · 2026-03-04T09:05:40Z

Hi, @weiqingy, it looks like the timeout of the cross-language test may be related to this PR. Both attempts failed.

xintongsong · 2026-03-04T09:16:48Z

@weiqingy Thanks for working on this refactor.

I have a few comments. Please take a look.

Since ResourceCache and PythonResourceBridge are only needed in runtime, should we move them from the plan package/module to runtime?
What is the relationship between PythonResourceBridge and PythonResourceAdapter? Is there a clear boundary of responsibilities? Or can they be combined into one class?
The cross-language test in CI got stuck even after a retry. This might be related to the code changes.

wenjin272 · 2026-03-05T10:42:41Z

runtime/src/main/java/org/apache/flink/agents/runtime/operator/ActionExecutionOperator.java

-        if (runnerContext != null) {
-            runnerContext.close();
+        if (resourceCache != null) {
+            resourceCache.close();


I investigated the cross-language test issues locally and found that resourceCache.close() must be called before pythonInterpreter.close(). After moving it to the very beginning of the close() method, the tests passed successfully.

The issue didn't appear earlier because I missed something while resolving a merge conflict, which caused runnerContext to be closed twice in the close() method. In reality, only the first close operation took effect.

@wenjin272 Thanks for the review and for catching the close() ordering issue! Applied your fix — moved resourceCache.close() to the top of ActionExecutionOperator.close(), before pythonInterpreter.close(). Added a comment explaining the ordering constraint so it doesn't get accidentally re-ordered in the future.
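The ordering constraint described above can be illustrated with a small sketch. The class below is hypothetical (`OperatorCloseOrderSketch`, `Closer`, and `closeLog()` are made-up names, and both collaborators are reduced to stubs that record when they are closed); it only shows the shape of the fix, not the actual `ActionExecutionOperator` code.

```java
import java.util.ArrayList;
import java.util.List;

final class OperatorCloseOrderSketch {
    // Hypothetical minimal stand-in for anything that needs cleanup.
    interface Closer {
        void close();
    }

    private final List<String> closeLog = new ArrayList<>();
    private Closer resourceCache = () -> closeLog.add("resourceCache");
    private Closer pythonInterpreter = () -> closeLog.add("pythonInterpreter");

    void close() {
        // Must come first: cached resources may hold Python object references
        // that need the interpreter alive during cleanup.
        if (resourceCache != null) {
            resourceCache.close();
            resourceCache = null; // a repeated close() becomes a no-op
        }
        if (pythonInterpreter != null) {
            pythonInterpreter.close();
            pythonInterpreter = null;
        }
    }

    List<String> closeLog() {
        return closeLog;
    }
}
```

Nulling each field after closing it is an extra safeguard assumed here, not something stated in the thread; it keeps an accidental second close() from touching an already-closed collaborator.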

weiqingy · 2026-03-10T05:52:27Z

@weiqingy Thanks for working on this refactor.

There're a few comments from my side. Please take a look.

Since ResourceCache and PythonResourceBridge are only needed in runtime, should we move them from the plan package/module to runtime?

What is the relationship between PythonResourceBridge and PythonResourceAdapter? Is there a clear boundary of responsibilities? Or can they be combined into one class?

The cross-language test in CI stuck even after a retry. This might be related to the code changes.

Thanks for reviewing @xintongsong !

I kept both in plan. They only depend on plan/api types (ResourceProvider, PythonResourceAdapter, etc.) with zero runtime imports. ResourceCache is also used by 6 plan test files — moving it would require either circular test dependencies or relocating tests that fundamentally test AgentPlan behavior. Being consumed by runtime doesn't mean it should be owned by runtime.
Relationship between PythonResourceBridge and PythonResourceAdapter?
They serve different roles — PythonResourceAdapter is a general-purpose Java-Python interop interface in api, while PythonResourceBridge is a one-time MCP server discovery utility. Combining them doesn't make sense since they're at different abstraction levels. I renamed PythonResourceBridge → PythonMCPResourceDiscovery to make the distinction obvious from the name.
Cross-language test stuck after retry.
Fixed per @wenjin272's finding — moved resourceCache.close() to the top of ActionExecutionOperator.close(),
before pythonInterpreter.close(). Cached resources may hold Python object references that need the
interpreter alive during cleanup. Added a comment explaining the ordering constraint.

weiqingy · 2026-03-10T18:58:41Z

@xintongsong @wenjin272 The 2 CI failures are unrelated to our changes:

it-python [flink-2.2] — long_term_memory_test::test_long_term_memory_async_execution_in_action failed with assert 4 == 1. Same LLM-dependent compaction flakiness from previous runs — the LLM (qwen3:8b) produced malformed summarization output. Our PR doesn't touch long-term memory or compaction logic.
it-python [flink-1.20] — 3 tests failed (long_term_memory_test, python_event_logging_test, react_agent_test) all with Py4JError / Py4JNetworkError. The Java gateway crashed during the first test and subsequent tests couldn't connect. Same transient Py4J infrastructure pattern from earlier runs.

Notably, the cross-language test now passes (previously stuck/timing out) — confirming that the close() ordering fix works as @wenjin272 suggested.

Could you please take another look at the PR? Thanks!

wenjin272

LGTM, please take a look at your convenience @xintongsong

xintongsong · 2026-03-13T08:44:04Z

I kept both in plan. They only depend on plan/api types (ResourceProvider, PythonResourceAdapter, etc.) with zero runtime imports. ResourceCache is also used by 6 plan test files — moving it would require either circular test dependencies or relocating tests that fundamentally test AgentPlan behavior. Being consumed by runtime doesn't mean it should be owned by runtime.

I tend to disagree. I think the key question is which module the class's responsibility conceptually belongs to, not what it depends on.

The responsibility of the plan module is to provide a common representation of agents, which may be programmed with different sets of APIs, so that they can be executed by a unified runtime. In other words, an agent plan is a translation of the user program, which determines the behavior of the agent.
On the other hand, the responsibility of the runtime module is to actually execute the agent, performing the behaviors described by the given plan.

From that perspective, I think both ResourceProvider and PythonResourceAdapter belong to "how to execute the agent", and thus should be moved to runtime.

As for the testing dependencies, first of all, production code should determine the test code, not the other way around. I briefly checked the code in the plan module that references ResourceCache, and found that it either should also be moved to runtime (e.g., PythonMCPResourceDiscovery is only used by ActionExecutionOperator, and AgentPlanTest should become ResourceCacheTest after decoupling ResourceCache from the original AgentPlan), or should not call ResourceCache at all (AgentPlanDeclareToolFieldTest should get the provider from the plan and call the provider directly).

Relationship between PythonResourceBridge and PythonResourceAdapter?
They serve different roles — PythonResourceAdapter is a general-purpose Java-Python interop interface in api, while PythonResourceBridge is a one-time MCP server discovery utility. Combining them doesn't make sense since they're at different abstraction levels. I renamed PythonResourceBridge → PythonMCPResourceDiscovery to make the distinction obvious from the name.

That's much clearer. Thanks.

Cross-language test stuck after retry.
Fixed per @wenjin272's finding — moved resourceCache.close() to the top of ActionExecutionOperator.close(),
before pythonInterpreter.close(). Cached resources may hold Python object references that need the
interpreter alive during cleanup. Added a comment explaining the ordering constraint.

Sounds good.

… to runtime

Address reviewer feedback: ResourceCache and PythonMCPResourceDiscovery belong to "how to execute the agent" (runtime), not "what the agent looks like" (plan).

- Move ResourceCache.java and PythonMCPResourceDiscovery.java to runtime module
- Move Python resource_cache.py to runtime package
- Extract ResourceCache-specific tests from AgentPlanTest into new ResourceCacheTest
- Refactor 5 plan test files to use provider.provide() directly instead of ResourceCache
- Update imports in runtime consumers (ActionExecutionOperator, RunnerContextImpl, etc.)

weiqingy · 2026-03-31T06:08:54Z

@xintongsong Addressed your review feedback:

Moved ResourceCache and PythonMCPResourceDiscovery from plan to runtime — these are execution concerns ("how to execute"), not plan representation ("what the agent looks like").
Refactored plan tests to not use ResourceCache:

Extracted testGetResourceNotFound and testGetResourceFromResourceProvider from AgentPlanTest into a new ResourceCacheTest in runtime/src/test/.
5 plan test files (AgentPlanDeclareToolFieldTest, AgentPlanDeclareToolMethodTest, AgentPlanDeclareChatModelTest, AgentPlanDeclareMCPServerTest, FunctionToolPlanTest) now call provider.provide() directly instead of going through ResourceCache.

Python side: Moved resource_cache.py from flink_agents/plan/ to flink_agents/runtime/, updated all imports.

Note: ResourceProvider itself stays in plan since AgentPlan.getResourceProviders() returns it — moving it would create a plan→runtime circular dependency. Let me know if you intended something different.
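As a rough illustration of what "call provider.provide() directly" means for a plan-level test, here is a sketch with stand-in types. Everything in it is hypothetical: the `Provider` interface, its `provide` signature (a lookup callback for dependencies), `Kind`, and `Resource` are assumptions made for the example and may not match the real ResourceProvider API.

```java
import java.util.function.BiFunction;

final class ProviderDirectSketch {
    // Hypothetical stand-ins; the real flink-agents types may look different.
    enum Kind { TOOL, CHAT_MODEL }

    interface Resource {
        String name();
    }

    interface Provider {
        // Assumed shape: the provider receives a callback for resolving its dependencies.
        Resource provide(BiFunction<String, Kind, Resource> getResource);
    }

    // A provider whose resource has no dependencies, as a plan test for a tool might declare.
    static Provider toolProvider(String name) {
        return lookup -> () -> name;
    }

    // Resolves a provider without any cache. A dependency lookup is not expected
    // in this kind of test, so the callback fails loudly if it is ever used.
    static Resource resolveWithoutCache(Provider provider) {
        return provider.provide((dependencyName, kind) -> {
            throw new IllegalStateException(
                    "unexpected dependency lookup: " + kind + "/" + dependencyName);
        });
    }
}
```

The point of the refactoring is visible in the last method: a plan test can check what a provider produces without constructing any runtime object.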

weiqingy · 2026-03-31T06:47:09Z

The two failing checks are unrelated to this PR:

cross-language [python-3.12] [java-21]: httpx.ReadTimeout — the Python HTTP client timed out waiting for the Ollama service in CI. This same test also failed on main (run #23576637167).
it-java [java-17] [flink-1.20]: FlinkIntegrationTest.testFromTableToTable. This looks like a flaky assertion caused by non-deterministic event processing order (visit_count=2 vs. expected 1). The test passed in the earlier run on the same commit.

@wenjin272 @xintongsong Could you please help re-trigger the failed jobs and check whether the failures still reproduce? Thanks!

weiqingy · 2026-04-01T05:21:15Z

@wenjin272 The cross-language test (ChatModelCrossLanguageTest) has been consistently failing with httpx.ReadTimeout on this PR. After digging into it, I found:

On JDK 21, BaseChatModelSetup.chat() resolves resources (connection, prompt, tools) via getResource() inside the async thread (durableExecuteAsync). In cross-language scenarios, this ends up triggering Pemja calls into the Python interpreter from the async pool thread, which could lead to issues.

It looks like your PR #571 addresses this by moving resource resolution into open() on the main thread, so chat() no longer calls getResource() from the async thread. WDYT?
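To make the idea concrete, the pattern described here (resolve in open() on the calling thread, then let the async task use only what was already resolved) can be sketched as follows. This is an assumption-laden illustration, not the code from #571: `EagerResolutionSketch` and `chatAsync` are hypothetical names, and the resource lookup is modeled as a plain `Function`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class EagerResolutionSketch {
    private final Function<String, Object> getResource;
    private Object connection;

    EagerResolutionSketch(Function<String, Object> getResource) {
        this.getResource = getResource;
    }

    // Runs on the caller's thread (the operator's main thread in the scenario above).
    void open() {
        connection = getResource.apply("connection");
    }

    // The async task only touches the already-resolved connection, never getResource.
    CompletableFuture<String> chatAsync(String prompt) {
        Object resolved = connection;
        if (resolved == null) {
            throw new IllegalStateException("open() must run before chatAsync()");
        }
        return CompletableFuture.supplyAsync(() -> resolved + " <- " + prompt);
    }
}
```

With this shape, no resource lookup can happen on a pool thread, so a lookup that ends up calling into the Python interpreter stays on the thread that called open().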

wenjin272 · 2026-04-01T12:39:28Z

Hi, @weiqingy. Actually, async execution is disabled for cross-language resources. Without the pemja patch and #571, getting a cross-language resource in an async thread would cause a pemja exception or a JVM crash. So I think the ReadTimeout is just caused by excessively slow LLM inference; after all, there are no GPUs in the GitHub CI runners.

weiqingy · 2026-04-01T18:35:47Z

Hi, @weiqingy. Actually, async execution is disabled for cross-language resources. Without the pemja patch and #571, getting a cross-language resource in an async thread would cause a pemja exception or a JVM crash. So I think the ReadTimeout is just caused by excessively slow LLM inference; after all, there are no GPUs in the GitHub CI runners.

@wenjin272 Thanks for the clarification - that makes sense. Given that the PR itself is unrelated to the flakiness, would it be reasonable to merge after re-triggering CI? I'll also open a follow-up issue to improve the test's resilience on slow runners (e.g., increasing the Ollama timeout or adding a retry).

[plan] Extract ResourceCache and PythonResourceBridge from AgentPlan

acf2c36

github-actions bot added the priority/major (default priority of the PR or issue), fixVersion/0.3.0 (the feature or bug should be implemented/fixed in the 0.3.0 version), and doc-not-needed (the PR changes do not impact docs) labels on Feb 22, 2026

xintongsong requested a review from wenjin272 February 24, 2026 09:33

wenjin272 reviewed Feb 28, 2026

Add refactor for python

90bb60b

Use ConcurrentHashMap instead

704c77a

wenjin272 approved these changes Mar 2, 2026

wenjin272 reviewed Mar 5, 2026

Fixed review comments

0f8dd78

wenjin272 reviewed Mar 12, 2026

Fix Python import sorting for ruff check

fc8bf9e
