This repository provides a performant way to convert CSV files to RDF. It contains:
- A sample CSV file
- A corresponding XRM mapping that generates R2RML mapping files
- A script that converts the input CSV to RDF
- A default GitHub Action configuration that runs the script and creates an artifact for download
This is a GitHub template repository. A repository created with the Use this template button above is not marked as a fork. Click the button, then add your data sources and adjust the XRM mapping accordingly.
Make sure to commit the input, mappings and src-gen directories if you want to build it using GitHub Actions.
See Further reading for more information about the XRM mapping language.
Download the DuckDB CLI, the DuckDB JDBC driver and the Ontop CLI, and unpack them into the bin folder:
$ ./install.sh
Try converting the example CSV:
$ ./convert.sh
- Replace the example CSV in the input directory with your CSV file(s).
- Edit load-csv.sql to define tables for your CSV files.
- Create the DuckDB database xrm-csv-workflow.duckdb from CSV files:
$ npm run db:create
- Generate new XRM files in the mappings directory:
$ npm run xrm:bootstrap
xrm:bootstrap generates a scaffold based on your DB schema. It picks the first column of each table as the subject URI key, so review and adjust the generated mappings before using them.
- Create/adjust the XRM files in the mappings directory.
- Materialize transformed.nt in the output directory:
$ npm run rdf:create
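When defining tables in load-csv.sql, keep in mind that Ontop expects lowercase table names (see the notes further down). As a minimal sketch, a table name could be derived from a CSV file name as follows. The helper table_name_for is hypothetical and not part of this repository; it only illustrates one possible naming convention.

```shell
# Hypothetical helper (not part of this repository): derive a lowercase
# table name from a CSV path so that Ontop accepts it.
table_name_for() {
  # Strip the directory and the .csv suffix.
  base=$(basename "$1" .csv)
  # Lowercase the name, then replace every character other than
  # a-z, 0-9 and underscore with an underscore.
  printf '%s\n' "$base" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_\n' '_'
}
```

For example, table_name_for input/My-Data.csv prints my_data.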
Each time you change the CSV files or the corresponding definitions (in load-csv.sql), you need to recreate the DuckDB database (with npm run db:create) and refresh the XRM sources (with npm run xrm:sources).
XRM will warn you about any source field used in the mappings that is no longer available.
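The two refresh steps can be chained so that the XRM sources are only refreshed when the database was rebuilt successfully. This is a small sketch; the function name refresh_sources is made up for illustration, while the two npm scripts are the ones named above.

```shell
# Hypothetical convenience wrapper: rebuild the DuckDB database, then
# refresh the XRM sources. The second step runs only if the first succeeds.
refresh_sources() {
  npm run db:create && npm run xrm:sources
}
```

Call refresh_sources from the repository root after editing a CSV file or load-csv.sql.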
- Table names in DuckDB need to be lowercase, otherwise ontop materialize throws exceptions.
- Override binary paths via env vars: DUCKDB_BIN and ONTOP_BIN (default to ./bin/duckdb/duckdb and ./bin/ontop/ontop).
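The override behaviour described above matches the usual shell default-expansion pattern. The sketch below shows how such a fallback typically works; it is an assumption about the approach, not a copy of the repository's scripts, and the function name resolve_bins is hypothetical.

```shell
# Hypothetical sketch of env var fallback; the actual scripts may differ.
resolve_bins() {
  # Use the caller's value if set and non-empty, else the documented default.
  DUCKDB_BIN="${DUCKDB_BIN:-./bin/duckdb/duckdb}"
  ONTOP_BIN="${ONTOP_BIN:-./bin/ontop/ontop}"
  printf 'duckdb=%s ontop=%s\n' "$DUCKDB_BIN" "$ONTOP_BIN"
}
```

Assuming the scripts read these variables as described, an override for a single run could look like DUCKDB_BIN=/usr/local/bin/duckdb ./convert.sh, where the path is only an example.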
We provide additional template repositories:
- xrm-csvw-workflow: A template repository for converting CSV to RDF using barnard59 pipelines and the CSVW specification.
- xrm-r2rml-workflow: A template repository for converting complete relational databases to RDF using the R2RML specification and Ontop as mapper.
- DuckDB for creating relational databases from CSV files.
- RDB to RDF Mapping Language (R2RML) for expressing customized mappings from relational databases to RDF datasets.
- Expressive RDF Mapping Language (XRM) for creating R2RML mappings with a user-friendly domain-specific language (DSL).
- Ontop for exposing the content of arbitrary relational databases as knowledge graphs, relying on R2RML mappings.