
GH-3510: Sparql Adapter System #3511

Open
Aklakan wants to merge 1 commit into apache:main from Aklakan:2025-05-11-sparqladapter

Conversation

Aklakan (Contributor) commented Oct 14, 2025

GitHub issue resolved #3510

Pull request Description: Updated proposal for custom SPARQL adapters over DatasetGraphs based on #3184 .
The goal is to unify local and remote query execution on the DatasetGraph level in a way that allows for efficient query/update execution, possibly by offloading the execution work to an external or remote engine. Execution tracking on the dataset graph level (using its Context) should thus work for both local and remote workloads.

The main difference from the previous proposal is that unification happens at the builder level, i.e. QueryExecBuilder and UpdateExecBuilder. A newly introduced SparqlAdapterRegistry is a registry for {Query|Update}ExecBuilderProvider instances, which can create dataset-graph-specific exec builders.

SparqlAdapter, as the DatasetGraph-specific factory for those builders, is the ARQ-level driver interface for implementing custom SPARQL execution over a DatasetGraph.

A DatasetGraphOverRDFLink class with matching query/update providers is also included.
This adapter design aims to make any future third-party extension based on RDFLink immediately available on the ARQ level as well. Conversely, new query/update adapter implementations - while possible - should be avoided in favor of RDFLink-based implementations.

The class ExampleDBpediaViaRemoteDataset.java demonstrates the system: A query with Virtuoso-specific features is passed through the Jena API to the DBpedia endpoint, a custom execution wrapper is applied, and yet the outcome is a QueryExecHTTP instance that allows for inspecting certain HTTP fields.

String queryString =
	"SELECT * FROM <http://dbpedia.org> { ?s rdfs:label ?o . ?o bif:contains 'Leipzig' } LIMIT 3";

DatasetGraph dsg = new DatasetGraphOverRDFLink(() ->
	RDFLinkHTTP.newBuilder().destination("http://dbpedia.org/sparql").build());

String label = "Remote Execution Deferred"; // prefix used by the demo wrapper in the output below

try (QueryExec qe = QueryExec.newBuilder().dataset(dsg).query(queryString)
		.timeout(10, TimeUnit.SECONDS).transformExec(e -> new QueryExecWrapperDemo(label, e)).build()) {
    // ...
}
Remote Execution Deferred: Dataset type: DatasetGraphOverRDFLink
Remote Execution Deferred: QueryExecBuilder type: QueryExecDatasetBuilderDeferred
Remote Execution Deferred: QueryExec type: QueryExecHTTPWrapper
Remote Execution Deferred: Execution result object type: RowSetBuffered
---------------------------------------------------------------------------------------------------------------------
| s                                                                        | o                                      |
=====================================================================================================================
| <http://dbpedia.org/resource/1._FC_Lokomotive_Leipzig>                   | "1. FC Lokomotive Leipzig"@en          |
| <http://dbpedia.org/resource/Category:1._FC_Lokomotive_Leipzig>          | "1. FC Lokomotive Leipzig"@en          |
| <http://dbpedia.org/resource/Category:1._FC_Lokomotive_Leipzig_managers> | "1. FC Lokomotive Leipzig managers"@en |
---------------------------------------------------------------------------------------------------------------------

Probably the most critical changes are:

  • QueryExecBuilderDataset and QueryExecHTTP are now interfaces.

  • Tests are included.
  • Documentation changes and updates are provided for the Apache Jena website:
    • {Query|Update}ExecBuilder.transformExec
    • SparqlAdapterRegistry extension point.
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Aklakan (Contributor, author) commented Oct 14, 2025

Overall I'd say the proposal is in a state where it can be reviewed.

The part I am not totally sure about is whether the (public) transformExec methods are really needed at this stage, because post-transformations of execs could also be hard-wired into the build method of builders, i.e. during build, check the context for some key(s) and apply the exec transforms.
However, the indirection with QueryExecHTTPWrapper would still be needed in order to transparently modify a QueryExec while still exposing the HTTP fields.
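The context-driven alternative mentioned above could look roughly like the following self-contained sketch. This is plain Java, not the actual Jena API: Exec, Builder, and the context key name are all stand-ins for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Schematic sketch of hard-wiring exec transforms into build():
// the builder consults a context map for a registered transform
// and applies it to the freshly built exec. All names are stand-ins.
public class TransformViaContext {
    record Exec(String label) {}

    static final String KEY_EXEC_TRANSFORM = "execTransform"; // hypothetical context key

    static class Builder {
        final Map<String, Object> context = new HashMap<>();

        Builder set(String key, Object value) {
            context.put(key, value);
            return this;
        }

        Exec build() {
            Exec exec = new Exec("base");
            // During build, check the context and apply any registered transform.
            @SuppressWarnings("unchecked")
            UnaryOperator<Exec> transform =
                (UnaryOperator<Exec>) context.get(KEY_EXEC_TRANSFORM);
            return transform == null ? exec : transform.apply(exec);
        }
    }

    public static void main(String[] args) {
        UnaryOperator<Exec> wrap = e -> new Exec("wrapped(" + e.label() + ")");
        Exec exec = new Builder().set(KEY_EXEC_TRANSFORM, wrap).build();
        System.out.println(exec.label()); // prints wrapped(base)
    }
}
```

The trade-off discussed in the comment is visible here: with this approach the transform is invisible at the API surface, but callers lose the explicit, type-checked transformExec(...) builder method.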

Aklakan force-pushed the 2025-05-11-sparqladapter branch from 53ff0ac to 413a5ee on October 15, 2025.
Aklakan force-pushed the 2025-05-11-sparqladapter branch 2 times, most recently from eab080d to e8d0931 on October 27, 2025.
Inline review on SparqlAdapter (diff context):

import org.apache.jena.sparql.exec.UpdateExecBuilder;

public interface SparqlAdapter {
    QueryExecBuilder newQuery();
Member comment:

There was a suggestion createAdapter.

But this is the adapter?

Isn't this QueryExecBuilder newQueryExecBuilder();? or shorter: newBuilder?

Aklakan (Contributor, author) replied Nov 29, 2025:

The basic idea of this proposal is to introduce SparqlAdapter as the one place in ARQ where eventually all the SPARQL-level machinery over a specific DatasetGraph implementation would go. So it adapts a DatasetGraph to the SPARQL layer (it could also be called a bridge).
Currently, this comprises the SPARQL subset covered by the query and update exec builders. Streaming updates (UpdateExecStreaming) and GSP would require additional methods on SparqlAdapter.

Conceptually, for a specific dataset implementation, there would be one specific SparqlAdapter instance - possibly dynamically assembled by a SparqlAdapterProvider.
However, I think "real" custom implementations should build upon RDFLink - so the only ARQ-level adapter that needs to exist (besides that for the ARQ engine) is the bridge to RDFLink, which is based on DatasetGraphOverRDFLink and SparqlAdapterProviderForDatasetGraphOverRDFLink.java (registered within jena-rdfconnection)

In the example above, DatasetGraphOverRDFLink is handled by a specific SparqlAdapterProvider.
Using custom wrappers with DatasetGraphs (without registering custom providers) will fall back to the default ARQ engine provider SparqlAdapterProviderMain.

An alternative design for SparqlAdapter is to keep QueryExecBuilder, UpdateExecBuilder and GSP in separate registries:
QueryExecBuilder.adapt(dsg) would go to a QueryExecBuilderRegistry backed by a list of QueryExecBuilderProviders and the first match creates the final QueryExecBuilder. Same for update.
I think collecting this related functionality in a single SparqlAdapter system might be nicer.

Isn't this QueryExecBuilder newQueryExecBuilder();? or shorter: newBuilder?

I took the naming from RDFLink.newQuery(), which returns a QueryExecBuilder. In essence, SparqlAdapter is an ARQ-level variant of RDFLink - though ARQ doesn't have links; transactions are so far managed by the DatasetGraph (per thread).

Contributor comment:

I'm sorry but this just does not feel right... I mean, just look at the class name of SparqlAdapterProviderForDatasetGraphOverRDFLink :) To me it's a clear indication that two if not more concerns that are orthogonal got conflated/"flattened" into one.

Aklakan (Contributor, author) replied Dec 9, 2025:

I don't think that I am mixing orthogonal concerns - but I am open to criticism. To recap:

  • SparqlAdapterProvider provides a "SPARQL layer" implementation for a dataset graph implementation.
    • A SPARQL layer implementation comprises builders for query, update and GSP requests.
  • DatasetGraphOverRDFLink is a specific DatasetGraph implementation backed by a factory of RDFLinks. Much like a JDBC data source is a factory for connections.
  • SparqlAdapterProviderDatasetGraphOverRDFLink is the adapter for DatasetGraphOverRDFLink. For example, when querying, this provider will supply specialized QueryExecBuilderOverRDFLink instances that will pass on a query statement to the link - instead of evaluating it against the dataset graph API.

Note that the SPARQL adapter system of this PR works under the hood; conventional code should never have to interact with it directly.
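Schematically, the provider mechanism recapped above amounts to a first-match lookup over registered providers, with the default ARQ engine as fallback. The following is a self-contained plain-Java sketch of that dispatch; all names are stand-ins, not the actual Jena classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// First-match provider dispatch, schematically: each provider declares
// which DatasetGraph implementations it accepts; unmatched datasets
// fall back to the default engine. All names are stand-ins.
public class ProviderDispatch {
    interface DatasetGraph {}
    static class LocalDatasetGraph implements DatasetGraph {}
    static class DatasetGraphOverLink implements DatasetGraph {}

    record Provider(Predicate<DatasetGraph> accepts,
                    Function<DatasetGraph, String> create) {}

    static final List<Provider> REGISTRY = new ArrayList<>();
    static {
        // Link-backed datasets get a builder that forwards to the remote link.
        REGISTRY.add(new Provider(
            dsg -> dsg instanceof DatasetGraphOverLink,
            dsg -> "builder-over-link"));
    }

    static String newExecBuilder(DatasetGraph dsg) {
        return REGISTRY.stream()
            .filter(p -> p.accepts().test(dsg))
            .findFirst()
            .map(p -> p.create().apply(dsg))
            .orElse("default-engine-builder"); // fallback: default ARQ-style engine
    }

    public static void main(String[] args) {
        System.out.println(newExecBuilder(new DatasetGraphOverLink()));
        System.out.println(newExecBuilder(new LocalDatasetGraph()));
    }
}
```

This matches the behavior described in the thread: a registered provider intercepts its own dataset type, and everything else falls back to the default engine provider.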

Aklakan force-pushed the 2025-05-11-sparqladapter branch 2 times, most recently from 7baa3e0 to 2dd1dbb on November 30, 2025.
Aklakan force-pushed the 2025-05-11-sparqladapter branch from 2dd1dbb to 116e788 on December 9, 2025.
Aklakan (Contributor, author) commented Dec 9, 2025

I think there are three meaningful policies for how to pass on transactions when doing

DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator);
RDFLink frontLink = RDFLink.connect(dsg);

RDFLink backingLink = dsg.newLink(); // Calls linkCreator.create()
  • Link-per-execution: Every query exec obtained from the front facing link, such as via QueryExec qe = frontLink.query(...), is backed by a fresh backing link (dsg.newLink()) that is closed when qe is closed.
  • Pass through: RDFLink.connect(dsg) returns the same link returned by dsg.newLink(). This is currently unsupported because (a) it would require adapting RDFLink.connect and (b) the behavior should be covered by the thread-local-links policy.
  • Thread-local-links: dsg.begin() will open a link via dsg.newLink() and place it into a ThreadLocal of dsg. All further API calls will go to that backing link until dsg.end() is called.
    The behavior should be the same as pass-through. The only difference is that the link returned by RDFLink.connect is a wrapper that delegates to the backing link in dsg's thread local.

I updated the code of DatasetGraphOverRDFLink with an internal TransactionalOverRDFLink class for the thread-local-links policy.

// Pass 'true' for supportsTransactions to enable thread local links:
boolean supportsTransactions = true;
DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator,
    supportsTransactions, supportsTransactionAbort);

[Update]

  • Note: Placing a link into a DatasetGraph's thread local on begin and closing it on end ties the link's life-cycle to that of the transaction. Consequently, beginning a transaction may incur connection overhead. This shouldn't be a problem, but it is a possible caveat when using DatasetGraphOverRDFLink.
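The thread-local-links policy can be illustrated with a minimal stand-alone sketch. This is plain Java with stand-in names (Link, linkCreator); it mirrors the described life-cycle, not the actual TransactionalOverRDFLink implementation.

```java
import java.util.function.Supplier;

// Minimal sketch of the thread-local-links policy: begin() opens a
// backing link and stores it per thread; end() closes and clears it.
// Link and linkCreator are stand-ins for the RDFLink machinery.
public class ThreadLocalLinks {
    static class Link {
        boolean closed = false;
        void close() { closed = true; }
    }

    static class DatasetOverLink {
        final Supplier<Link> linkCreator;
        final ThreadLocal<Link> activeLink = new ThreadLocal<>();

        DatasetOverLink(Supplier<Link> linkCreator) { this.linkCreator = linkCreator; }

        void begin() {
            if (activeLink.get() != null)
                throw new IllegalStateException("Transaction already active on this thread");
            activeLink.set(linkCreator.get()); // connection overhead happens here
        }

        Link link() {
            Link l = activeLink.get();
            if (l == null) throw new IllegalStateException("Not in a transaction");
            return l; // all API calls within the transaction share this link
        }

        void end() {
            Link l = activeLink.get();
            if (l != null) { l.close(); activeLink.remove(); }
        }
    }

    public static void main(String[] args) {
        DatasetOverLink dsg = new DatasetOverLink(Link::new);
        dsg.begin();
        Link l = dsg.link();
        dsg.end();
        System.out.println(l.closed); // link life-cycle is tied to the transaction
    }
}
```

The caveat from the note is visible in begin(): the backing connection is opened eagerly when the transaction starts, not lazily on first use.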

Aklakan force-pushed the 2025-05-11-sparqladapter branch 2 times, most recently from 5637051 to cc1fc9b on December 18, 2025.
Aklakan force-pushed the 2025-05-11-sparqladapter branch 3 times, most recently from b6625e0 to 96f95ef on January 23, 2026.
afs (Member) left a comment:

I'd like the use of the deferred builder to be controlled as a build-time constant from one place.

SystemARQ:

    /**
     * Control for {@link QueryExec} and {@link UpdateExec} builder.
     */
    public final static boolean DeferredExecBuilders = true;

This also requires various minor code changes to make everything go via QueryExecDatasetBuilder and UpdateExecDatasetBuilder, together with enforcing code to use .create and not go through the implicit constructor.
The no-arg constructors in (Query|Update)ExecDatasetBuilder(Deferred|Impl) should be explicit and private.

I've made these changes, chased other places where "Deferred" is hard-wired, and fixed a few other things I noticed.
Attached: a (git) patch -- "git apply ....".

afs-adapter-2026-02-21.patch

(I'm still unsure why there is indirection via "Deferred" and then again via SparqlAdapterRegistry/SparqlAdapterProvider.)

afs (Member) commented Feb 21, 2026

Other things:

Initial bindings are mentioned in

  • QueryExecDataset
  • QueryExecDatasetBuilderImpl
  • QueryExecutionAdapter

There are FIXME/TODO in

  • SparqlAdapterRegistry - 2 TODO
  • QueryExec - 1 FIXME
  • QueryExecDatasetBuilder - 1 TODO

QueryExecHTTPBuilder: commented-out code around line 46. Remove it?

Sort out package "todelete" - the class seems to be in use.
./jena-rdfconnection/src/main/java/org/apache/jena/rdflink/dataset/todelete

Javadoc warning:

Javadoc: InitExecTracking cannot be resolved to a type
InitDatasetGraphOverRDFLink.java
/jena-rdfconnection/src/main/java/org/apache/jena/rdflink/dataset	line 38

Aklakan added a commit to Aklakan/jena that referenced this pull request Feb 22, 2026
Aklakan (Contributor, author) commented Feb 24, 2026

I'm still unsure why there is indirection via "Deferred" and then again via SparqlAdapterRegistry/SparqlAdapterProvider.

The rationale is:

  • QueryExecDatasetBuilderDeferred: This builder does not know for which dataset implementation it is being built - the builder's dataset may at any time be changed via a call to dataset(dsg).

  • Upon QueryExecBuilder.build(): The physical QueryExecBuilder for the dataset implementation is obtained from the SparqlAdapterRegistry, and the settings of the deferred builder are transferred to the physical (possibly vendor-specific) builder.

A note about SparqlAdapter: This interface combines query and update aspects. I think I'll drop this interface in favor of a separation similar to {Query|Update}EngineFactory.
The current SparqlAdapterRegistry will remain the central registry, but there will be separate add{Query|Update}ExecBuilderProvider methods instead of the single addProvider(SparqlAdapterProvider) method.
This should be easier to evolve - because what's still missing is support for custom (vendor-specific) adapters for streaming updates and GSP - but that can be added at a later stage.
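The settings-transfer step described in the rationale can be sketched as follows. This is self-contained plain Java with illustrative names, not the actual QueryExecDatasetBuilderDeferred implementation: the deferred builder records settings without committing to a dataset implementation, and build() resolves a physical builder and replays the settings onto it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the deferred-builder pattern: the deferred builder records
// settings without knowing the target dataset implementation; build()
// resolves the physical builder and replays the settings onto it.
public class DeferredBuilderSketch {
    interface PhysicalBuilder {
        void apply(String key, String value);
        String build();
    }

    // Stand-in for a vendor-specific (e.g. HTTP-backed) builder.
    static class HttpBuilder implements PhysicalBuilder {
        final Map<String, String> settings = new LinkedHashMap<>();
        public void apply(String key, String value) { settings.put(key, value); }
        public String build() { return "http-exec" + settings; }
    }

    static class DeferredBuilder {
        final Map<String, String> settings = new LinkedHashMap<>();
        String datasetKind = "local";

        DeferredBuilder dataset(String kind) { datasetKind = kind; return this; }
        DeferredBuilder set(String key, String value) { settings.put(key, value); return this; }

        String build() {
            // Resolve the physical builder for the (now final) dataset kind,
            // then transfer the recorded settings onto it.
            PhysicalBuilder physical = resolve(datasetKind);
            settings.forEach(physical::apply);
            return physical.build();
        }

        static PhysicalBuilder resolve(String kind) {
            return new HttpBuilder(); // registry lookup elided in this sketch
        }
    }

    public static void main(String[] args) {
        String exec = new DeferredBuilder()
            .set("timeout", "10s")
            .dataset("remote") // the dataset may change at any time before build()
            .build();
        System.out.println(exec); // prints http-exec{timeout=10s}
    }
}
```

Note how dataset(...) can be called after set(...): deferring the resolution to build() is exactly what allows the builder's dataset to change at any time, as described above.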

afs (Member) commented Feb 24, 2026

custom (vendor-specific) adapters

How many such vendor specific cases are there?

Aklakan (Contributor, author) commented Feb 24, 2026

Any implementation that wishes to bypass ARQ's default query engine and have QueryExec/UpdateExec operate on custom DatasetGraph implementations.

As one concrete example - besides having a DatasetGraph backed by SPARQL/HTTP - I can think of the Virtuoso Jena adapter going through the new ARQ indirection layer proposed by this PR:

https://vos.openlinksw.com/owiki/wiki/VOS/VirtJenaProvider

Bypass Jena/ARQ parser
To use Virtuoso-specific SPARQL extensions (such as bif:contains), queries must bypass the Jena/ARQ parser and go straight to the Virtuoso server. This is done by using the VirtuosoQueryExecutionFactory.create() method instead of the Jena-specific QueryFactory.create() method, which always invokes the Jena/ARQ parser, which in turn rejects any Virtuoso-specific extensions. Thus one would execute a query as follows to bypass the Jena parser --

VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create (query, set);
vqe.execSelect();

With the SparqlAdapterRegistry indirection, this could become a conventional QueryExecutionFactory.create(queryStr, dsg);.

Aklakan (Contributor, author) commented Feb 24, 2026

So just to clarify: The primary use case is to abstract remote SPARQL endpoints as a DatasetGraph - because this is a generic mechanism. But the same pattern might be extended to e.g. ODBC/JDBC backed datasets. The intent is not to tie the machinery to specific vendors.

namedgraph (Contributor) commented:

@Aklakan sorry, but I have to push back on this again :)

The uniform local/remote execution you're after can be achieved more simply by putting the abstraction boundary at the SPARQL Protocol level rather than the DatasetGraph level.

DatasetGraph is a storage abstraction (quads, add/delete/find). Wrapping a remote SPARQL endpoint as a DatasetGraph and then detecting at build time that it's not actually local storage is a round-trip through the wrong layer.

For reference, AtomGraph Core solves this with a simple SPARQLEndpoint interface that maps directly to the SPARQL Protocol spec. The local implementation executes against a Jena Dataset; the remote.SPARQLEndpoint extends it with getURI() and getSPARQLClient() and delegates over HTTP. Polymorphism at the protocol level keeps things straightforward.

This also handles your Virtuoso bif:contains case naturally: the remote implementation passes the query string straight to the endpoint without touching the ARQ parser, because it operates at the HTTP level.
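The protocol-level alternative being argued for here would look roughly like the following sketch. These are illustrative plain-Java interfaces with stand-in names; they do not reproduce the actual AtomGraph API.

```java
// Illustrative sketch of putting the abstraction at the SPARQL Protocol
// level instead of the DatasetGraph level: one interface for the protocol
// operation, with local and remote implementations. Names are stand-ins
// and do not reproduce the actual AtomGraph API.
public class ProtocolLevelSketch {
    interface SPARQLEndpoint {
        String query(String sparql); // returns a serialized result, schematically
    }

    // Local implementation: would parse and evaluate against a local dataset.
    static class LocalEndpoint implements SPARQLEndpoint {
        public String query(String sparql) { return "local:" + sparql; }
    }

    // Remote implementation: would send the query string over HTTP untouched,
    // so vendor extensions (e.g. bif:contains) never hit a local parser.
    static class RemoteEndpoint implements SPARQLEndpoint {
        final String uri;
        RemoteEndpoint(String uri) { this.uri = uri; }
        public String query(String sparql) { return "remote(" + uri + "):" + sparql; }
    }

    public static void main(String[] args) {
        SPARQLEndpoint ep = new RemoteEndpoint("http://example.org/sparql");
        System.out.println(ep.query("ASK {}"));
    }
}
```

The contrast with the PR's design is where polymorphism lives: here the caller programs against the endpoint interface, whereas the SparqlAdapterRegistry keeps the DatasetGraph API and selects the executor behind the builder.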

Aklakan (Contributor, author) commented Feb 24, 2026

@namedgraph

[...] and then detecting at build time that it's not actually local storage is a round-trip through the wrong layer.

DatasetGraph does not mandate local storage. As @afs noted before, at the core this is about facilitating low-level adapters (DatasetGraph) over high-level abstractions (third-party SPARQL engines/endpoints).

The adapter system of this PR is about making specifically the DatasetGraph ecosystem work consistently and efficiently over different backends - primarily HTTP SPARQL endpoints. By ecosystem I mean the query/update builders and their use throughout Jena, including assemblers and Fuseki.
It gives users the option to write application logic on the DatasetGraph level that can operate on third-party backends - it's the users that have to decide whether this abstraction fits their use case.

Your proposal to abstract DatasetGraph (or any other backend) with SPARQLEndpoint is simply the adapter in the other direction (high-level over low-level).

P.S.:
The reason I mentioned Virtuoso is that it is quite well-known prior art that provides a custom VirtDataset implementation, which allows RDF operations to be carried out against a Virtuoso endpoint using Jena's Dataset API - and it has existed for 8+ years. The problem is that querying requires VirtuosoQueryExecutionFactory in order to bypass ARQ, thus tying application logic to Virtuoso. While this PR is not about Virtuoso, it addresses exactly the point where that abstraction breaks.

Aklakan added a commit to Aklakan/jena that referenced this pull request Feb 24, 2026
Aklakan force-pushed the 2025-05-11-sparqladapter branch 6 times, most recently from 5921bcb to 5cae312 on February 26, 2026.
Aklakan force-pushed the 2025-05-11-sparqladapter branch 3 times, most recently from 41a3457 to 7e5b4f5 on February 26, 2026.
Aklakan (Contributor, author) commented Feb 26, 2026

(Quoting the full list of review items from afs's comment of Feb 21, 2026.)

I have completed my revision; the patch is applied, and the issues should be resolved.

  • I replaced the SparqlAdapter interface with separate {Query|Update}ExecBuilderProvider interfaces because it more closely follows the modular design of {Query|Update}EngineFactory.
  • The current SparqlAdapterRegistry system operates on the DatasetGraph level.
  • For Graph-level adapters I suggest a separate iteration. Right now, Graphs will be wrapped with DatasetGraphFactory.wrap(graph). The wrapper DatasetGraph will cause the adapter system to fall back to the default ARQ query/update engine system.

As for the initial bindings: they are also mentioned - but unused - in the original classes.

afs (Member) commented Feb 28, 2026

After all these changes, where does it leave {Query|Update}EngineFactory?

Can the choice of normal/ref/TDB1/TDB2, and/or unwrapping DatasetGraphWrapper in QueryEngineFactoryWrapper, now be in the deferred/ExecBuilder framework?

Aklakan force-pushed the 2025-05-11-sparqladapter branch from e14ffa5 to 6964b4d on February 28, 2026.
Aklakan (Contributor, author) commented Feb 28, 2026

The precedence is:

QueryExec.newBuilder()
  --creates-->QueryExecDatasetBuilderDeferred
    --on build delegates to-->SparqlAdapterRegistry.new{Query|Update}ExecBuilder
      --falls back to-->{Query|Update}ExecDatasetBuilderImpl
        --uses-->{Query|Update}EngineRegistry.
  • The builders can intercept any SPARQL query form: select, ask, construct and describe with quad variants, json.
  • The engines operate below on the select query/Op and QueryIterator level: QueryEngineFactory -> Plan -> QueryIterator.

Because of these different levels of abstraction, I don't think it is a good idea to pull query-engine-level machinery (especially the unwrapping system) up to the builder level.
The SPARQL adapter / builder abstraction forwards the whole SPARQL statement (against a DatasetGraph) to a certain executor, which provides its own {Query|Update}ExecBuilder implementation - for example, the forwarding to remote HTTP SPARQL endpoints.

For your question, I added a new TestSparqlAdapterSystem test class, which introduces a custom DatasetGraphRef and then uses the adapter system to supply query builders that configure the ref(erence) query engine.
For this to work, the provider accept and create methods now also have a context argument - similar to QueryEngineFactory.
This example is just to check whether the adapter system can do it - it's not that meaningful because the query engine can be set directly in the dataset context.
You may want to check whether this looks OK to you.
