Conversation
Overall I'd say the proposal is in a state where it can be reviewed. The part that I am not totally sure about is whether the (public) …
Force-pushed 53ff0ac to 413a5ee
Force-pushed eab080d to e8d0931
```java
import org.apache.jena.sparql.exec.UpdateExecBuilder;

public interface SparqlAdapter {
    QueryExecBuilder newQuery();
    // ...
```
There was a suggestion: createAdapter.
But this is the adapter?
Isn't this QueryExecBuilder newQueryExecBuilder();? Or shorter: newBuilder?
The basic idea of this proposal is to introduce SparqlAdapter as the one place in ARQ where eventually all the sparql level machinery over a specific DatasetGraph implementation would go. So adapt a DatasetGraph to the SPARQL layer (could also be called a bridge).
Currently, this comprises the SPARQL subset covered by query and update exec builders. Streaming updates (UpdateExecStreaming) and GSP would require additional methods on SparqlAdapter.
Conceptually, for a specific dataset implementation, there would be one specific SparqlAdapter instance - possibly dynamically assembled by a SparqlAdapterProvider.
However, I think "real" custom implementations should build upon RDFLink - so the only ARQ-level adapter that needs to exist (besides that for the ARQ engine) is the bridge to RDFLink, which is based on DatasetGraphOverRDFLink and SparqlAdapterProviderForDatasetGraphOverRDFLink.java (registered within jena-rdfconnection)
In the example above, DatasetGraphOverRDFLink is handled by a specific SparqlAdapterProvider.
Using custom wrappers with DatasetGraphs (without registering custom providers) will fall back to the default ARQ engine provider SparqlAdapterProviderMain.
An alternative design for SparqlAdapter is to keep QueryExecBuilder, UpdateExecBuilder and GSP in separate registries:
QueryExecBuilder.adapt(dsg) would go to a QueryExecBuilderRegistry backed by a list of QueryExecBuilderProviders and the first match creates the final QueryExecBuilder. Same for update.
I think collecting this related functionality in a single SparqlAdapter system might be nicer.
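The registry/provider dispatch described above (first matching provider creates the adapter, otherwise fall back to the default ARQ engine) can be sketched with minimal stand-in types. All names here are simplified illustrations, not the PR's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Stand-ins for DatasetGraph / SparqlAdapter / SparqlAdapterProvider /
// SparqlAdapterRegistry; illustrative only.
interface Dsg {}
interface Adapter { String engineName(); }

interface AdapterProvider {
    // A provider accepts certain DatasetGraph implementations and supplies an adapter.
    Optional<Adapter> create(Dsg dsg);
}

class AdapterRegistry {
    private final List<AdapterProvider> providers = new ArrayList<>();

    void add(AdapterProvider p) { providers.add(p); }

    // First matching provider wins; otherwise fall back to the default engine.
    Adapter adapt(Dsg dsg) {
        for (AdapterProvider p : providers) {
            Optional<Adapter> a = p.create(dsg);
            if (a.isPresent())
                return a.get();
        }
        return () -> "default-arq";
    }
}

// Stand-in for DatasetGraphOverRDFLink, handled by a specific provider.
class DsgOverLink implements Dsg {}
```

With this shape, a wrapper DatasetGraph without a registered provider simply falls through to the default adapter, matching the fallback behavior described above.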
> Isn't this QueryExecBuilder newQueryExecBuilder();? Or shorter: newBuilder?
I took the naming from RDFLink.newQuery(), which returns a QueryExecBuilder. In essence, SparqlAdapter is an ARQ-level variant of RDFLink - though ARQ doesn't have links; transactions are so far managed by the DatasetGraph (per thread).
I'm sorry but this just does not feel right... I mean, just look at the class name of SparqlAdapterProviderForDatasetGraphOverRDFLink :) To me it's a clear indication that two if not more concerns that are orthogonal got conflated/"flattened" into one.
I don't think that I am mixing orthogonal concerns - but I am open to criticism. To recap:
- SparqlAdapterProvider provides a "SPARQL layer" implementation for a dataset graph implementation.
- A SPARQL layer implementation comprises builders for query, update and GSP requests.
DatasetGraphOverRDFLink is a specific DatasetGraph implementation backed by a factory of RDFLinks, much like a JDBC data source is a factory for connections. SparqlAdapterProviderForDatasetGraphOverRDFLink is the adapter for DatasetGraphOverRDFLink. For example, when querying, this provider will supply specialized QueryExecBuilderOverRDFLink instances that pass the query statement on to the link - instead of evaluating it against the dataset graph API.
Note that the SPARQL adapter system of this PR works under the hood; conventional code should never have to interact with it directly.
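The delegation idea - a link-backed builder forwarding the query to the backing link instead of running it through the local engine - can be sketched as follows. The class and the Function-based "link" are hypothetical stand-ins, not the PR's actual types:

```java
import java.util.function.Function;

// Illustrative sketch of a QueryExecBuilderOverRDFLink-style builder: the
// query string is handed to the backing link, not to the dataset graph API.
class QueryExecBuilderOverLinkSketch {
    private final Function<String, String> link; // stand-in for an RDFLink submitting a query
    private String queryString;

    QueryExecBuilderOverLinkSketch(Function<String, String> link) { this.link = link; }

    QueryExecBuilderOverLinkSketch query(String q) { this.queryString = q; return this; }

    // In the real design build() would return a QueryExec; here it just shows
    // that execution is delegated to the link.
    String build() { return link.apply(queryString); }
}
```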
Force-pushed 7baa3e0 to 2dd1dbb
Force-pushed 2dd1dbb to 116e788
I think there are three meaningful policies for how to pass on transactions when doing:

```java
DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator);
RDFLink frontLink = RDFLink.connect(dsg);
RDFLink backingLink = dsg.newLink(); // Calls linkCreator.create()
```

I updated the code:

```java
// Pass 'true' for supportsTransactions to enable thread-local links:
boolean supportsTransactions = true;
DatasetGraphOverRDFLink dsg = new DatasetGraphOverRDFLink(linkCreator,
        supportsTransactions, supportsTransactionAbort);
```

[Update]
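One way such a "thread-local links" policy could work - each thread transacts over its own backing link obtained from the linkCreator - is sketched below. This is a hypothetical illustration; the class name and method names are not from the PR:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch: while supportsTransactions is on, every thread gets its
// own backing link, so all operations inside one thread's transaction share it.
class ThreadLocalLinkPolicy<L> {
    private final Supplier<L> linkCreator;
    private final ThreadLocal<L> active = new ThreadLocal<>();

    ThreadLocalLinkPolicy(Supplier<L> linkCreator) { this.linkCreator = linkCreator; }

    // Lazily create the thread's link on first use within a transaction.
    L currentLink() {
        L link = active.get();
        if (link == null) {
            link = linkCreator.get();
            active.set(link);
        }
        return link;
    }

    // Called on commit/abort: the thread's next transaction gets a fresh link.
    void release() { active.remove(); }
}
```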
Force-pushed 5637051 to cc1fc9b
Force-pushed b6625e0 to 96f95ef
afs left a comment
I'd like the use of deferred builder to be controlled as a build time constant from one place.
SystemARQ:

```java
/**
 * Control for {@link QueryExec} and {@link UpdateExec} builder.
 */
public final static boolean DeferredExecBuilders = true;
```

This also requires various minor code changes to make everything go via QueryExecDatasetBuilder and UpdateExecDatasetBuilder, together with enforcing code to use .create and not go to the implicit constructor.
The no arg constructors in (Query|Update)ExecDatasetBuilder(Deferred|Impl) should be explicit and private.
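The combination asked for here - one build-time constant, a static create() that dispatches on it, and private constructors so no code can bypass create() - can be sketched like this (simplified stand-in names, not the actual ARQ classes):

```java
// Illustrative sketch: all callers go through create(); the constructor is
// private, and the deferred/direct choice lives in exactly one constant.
public class QueryExecBuilderSketch {
    // Build-time constant, flipped in one place (SystemARQ in the PR).
    public static final boolean DEFERRED_EXEC_BUILDERS = true;

    private final String kind;

    // Private: no implicit-constructor use from outside.
    private QueryExecBuilderSketch(String kind) { this.kind = kind; }

    public static QueryExecBuilderSketch create() {
        return DEFERRED_EXEC_BUILDERS
                ? new QueryExecBuilderSketch("deferred")
                : new QueryExecBuilderSketch("direct");
    }

    public String kind() { return kind; }
}
```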
I've made these changes + chasing other places where "Deferred" is hard-wired + a few other things I noticed.
Attached: a (git) patch -- "git apply ....".
(I'm still unsure why there is indirection via "Deferred" and then again via SparqlAdapterRegistry/SparqlAdapterProvider.)
Other things:
- Initial bindings are mentioned in …
- There are FIXME/TODO in …
- Sort out package "todelete" - the class seems to be in-use.
- Javadoc warning: …
The rationale is:
A note about …
How many such vendor-specific cases are there?
Any implementation that wishes to bypass ARQ's default query engine and have QueryExec/UpdateExec operate on custom DatasetGraph implementations. As one concrete example - besides having a DatasetGraph backed by SPARQL/HTTP - I can think of the Virtuoso Jena adapter going through the new ARQ indirection layer proposed by this PR: https://vos.openlinksw.com/owiki/wiki/VOS/VirtJenaProvider
With the SparqlAdapterRegistry indirection, this could become a conventional …
So just to clarify: The primary use case is to abstract remote SPARQL endpoints as a DatasetGraph, because this is a generic mechanism. But the same pattern might be extended to e.g. ODBC/JDBC-backed datasets. The intent is not to tie the machinery to specific vendors.
@Aklakan sorry, but I have to push back on this again :) The uniform local/remote execution you're after can be achieved more simply by putting the abstraction boundary at the SPARQL Protocol level rather than the …
For reference, AtomGraph Core solves this with a simple … This also handles your Virtuoso …
DatasetGraph does not mandate local storage. As @afs noted before, at the core this is about facilitating low-level adapters (DatasetGraph) over high-level abstractions (third-party SPARQL engines/endpoints). The adapter system of this PR is about making specifically the …

Your proposal to abstract DatasetGraph (or any other backend) with SPARQLEndpoint is simply the adapter in the other direction (high-level over low-level).

p.s.: …
Force-pushed 5921bcb to 5cae312
Force-pushed 41a3457 to 7e5b4f5
I have completed my revision, the patch is applied and the issues should be resolved.
As for the initial bindings: They are also mentioned - but unused - in the original classes: …
After all these changes, where does it leave {Query|Update}EngineFactory? Can the choice of normal/ref/TDB1/TDB2, and/or unwrapping …
Force-pushed e14ffa5 to 6964b4d
The precedence is: …

Because of these different levels of abstraction, I don't think it is a good idea to pull query-engine-level machinery (especially the unwrapping system) up to the builder level. For your question, I added a new …
GitHub issue resolved #3510
Pull request Description: Updated proposal for custom SPARQL adapters over DatasetGraphs based on #3184 .
The goal is to unify local and remote query execution on the DatasetGraph level in a way that allows for efficient query/update execution by possibly offloading the execution work to an external or remote engine. This way, execution tracking on the dataset graph level (using its Context) should thus work for both local and remote workloads.
The main difference to the previous proposal is that unification happens at the builder level, i.e. QueryExecBuilder and UpdateExecBuilder. A newly introduced SparqlAdapterRegistry is a registry for {Query|Update}ExecBuilderProvider instances, which can create dataset-graph-specific exec builders. SparqlAdapter, as the DatasetGraph-specific factory for those builders, is the ARQ-level driver interface for implementing custom SPARQL execution over a DatasetGraph.

A DatasetGraphOverRDFLink class with matching query/update providers is also provided. This adapter design is aimed at making any future third-party extension based on RDFLink also immediately available on the ARQ level. Conversely, new query/update adapter implementations - while possible - should be avoided in favor of RDFLink-based implementations.
The class ExampleDBpediaViaRemoteDataset.java demonstrates the system: A query with Virtuoso-specific features is passed through the Jena API to the DBpedia endpoint, a custom execution wrapper is applied, and yet the outcome is a QueryExecHTTP instance that allows for inspecting certain HTTP fields.

Probably the most critical changes are:
- QueryExecBuilderDataset and QueryExecHTTP are now interfaces.
- {Query|Update}ExecBuilder.transformExec …
- SparqlAdapterRegistry extension point.

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.