
zazuko/xrm-csv-workflow


CSV to RDF conversion template project

This repository provides a performant way to convert CSV files to RDF. It contains:

  • A sample CSV file
  • A corresponding XRM mapping that generates R2RML mapping files
  • A script that converts the input CSV to RDF
  • A default GitHub Action configuration that runs the script and creates an artifact for download

This is a GitHub template repository: a repository created with the Use this template button above will not be declared a "fork". Create your copy that way, then start adding your data sources and adjust the XRM mapping accordingly.

Make sure to commit the input, mappings and src-gen directories if you want to build it using GitHub Actions.

See Further reading for more information about the XRM mapping language.

Install

Download the DuckDB CLI, the DuckDB JDBC driver and the Ontop CLI, and unpack them into the bin folder:

$ ./install.sh

Try converting the example CSV

$ ./convert.sh
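After a run, a quick count of the generated statements is a cheap sanity check. The sketch below assumes the result is written to output/transformed.nt, the location the Customize steps give for npm run rdf:create; whether convert.sh uses the same path is an assumption, and the function name count_triples is hypothetical. It relies on N-Triples placing at most one triple per line.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: count the statements in an N-Triples file
# (default: output/transformed.nt, assumed output location).
# N-Triples puts one triple per line, so every line that is neither
# blank nor a comment is counted.
count_triples() {
  local nt="${1:-output/transformed.nt}"
  if [ ! -s "$nt" ]; then
    echo "no RDF output at $nt" >&2
    return 1
  fi
  grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$nt"
}
```

A count of zero, or the "no RDF output" message, usually points at a mapping whose source table or column does not match the database.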

Customize

  1. Replace the example CSV in the input directory with your CSV file(s).
  2. Edit load-csv.sql to define tables for your CSV files.
  3. Create the DuckDB database xrm-csv-workflow.duckdb from the CSV files:
     $ npm run db:create
  4. Generate new XRM files in the mappings directory:
     $ npm run xrm:bootstrap

     xrm:bootstrap generates a scaffold based on your DB schema. It picks the first column of each table as the subject URI key, so review and adjust the generated mappings before using them.

  5. Create/adjust the XRM files in the mappings directory.
  6. Materialize transformed.nt in the output directory:
     $ npm run rdf:create
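Steps 2 and 4 can be supported by two small helpers. This is a sketch under stated assumptions, not part of the repository: gen_load_sql and preview_subject_keys are hypothetical names, the CSVs are assumed to sit directly in input/ with a header row, and the generated SQL uses DuckDB's read_csv_auto with one table per file. Table names are lowercased because of the first hint under Hints. The second helper shows which column xrm:bootstrap would pick as subject URI key, provided the table keeps the CSV column order (as SELECT * does).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Customize steps; not part of this repository.

# Step 2 sketch: print one CREATE TABLE statement per CSV file in a directory
# (default: input). Assumes a header row and file names without single quotes.
gen_load_sql() {
  local dir="${1:-input}" csv base table
  for csv in "$dir"/*.csv; do
    [ -e "$csv" ] || continue   # glob matched nothing
    base="$(basename "$csv" .csv)"
    # Lowercase the table name and turn any character that is not a
    # lowercase letter, digit or underscore into an underscore.
    table="$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_' '_')"
    printf "CREATE TABLE %s AS SELECT * FROM read_csv_auto('%s');\n" "$table" "$csv"
  done
}

# Step 4 companion: print the first header field of each CSV. If the table
# was created with SELECT *, this is the column xrm:bootstrap picks as the
# subject URI key. Assumes comma-separated files without quoted header fields.
preview_subject_keys() {
  local dir="${1:-input}" csv
  for csv in "$dir"/*.csv; do
    [ -e "$csv" ] || continue
    printf '%s: %s\n' "$(basename "$csv")" "$(head -n 1 "$csv" | cut -d, -f1)"
  done
}
```

For example, gen_load_sql input prints statements to compare with the existing load-csv.sql before editing it. The relative paths inside read_csv_auto assume DuckDB is run from the repository root.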

Develop

Each time you change the CSV files or their table definitions in load-csv.sql, recreate the DuckDB database (npm run db:create) and refresh the XRM sources (npm run xrm:sources). XRM warns you about any source field used in the mappings that is no longer available.
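The rebuild rule above can be turned into a freshness check based on file modification times. A minimal sketch, assuming the database file xrm-csv-workflow.duckdb lives in the repository root next to load-csv.sql and the input directory; the function name db_is_stale is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) when the DuckDB database is
# missing or older than load-csv.sql or any file in the input directory,
# i.e. when npm run db:create and npm run xrm:sources should be rerun.
db_is_stale() {
  local db="${1:-xrm-csv-workflow.duckdb}" sql="${2:-load-csv.sql}" input="${3:-input}"
  [ -e "$db" ] || return 0              # no database yet
  [ "$sql" -nt "$db" ] && return 0      # table definitions changed
  # -print -quit stops at the first input file newer than the database.
  [ -n "$(find "$input" -type f -newer "$db" -print -quit)" ]
}

# Example: rebuild only when something changed.
# if db_is_stale; then npm run db:create && npm run xrm:sources; fi
```

Comparing timestamps only catches edits that update the modification time; when in doubt, simply rerun both commands.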

Hints

  • Table names in DuckDB must be lowercase; otherwise ontop materialize throws exceptions
  • Override the binary paths via the env vars DUCKDB_BIN and ONTOP_BIN (defaults: ./bin/duckdb/duckdb and ./bin/ontop/ontop)
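The override in the last hint corresponds to the shell's default-value expansion. The sketch below is an assumption about how a wrapper could resolve the paths and fail early when a tool is missing; the repository's own scripts may do this differently, and resolve_bins and check_bins are hypothetical names. With the :- form, an empty variable is treated like an unset one.

```shell
#!/usr/bin/env bash
# Resolve the tool paths as the hint describes: DUCKDB_BIN / ONTOP_BIN
# win when set and non-empty, otherwise fall back to the install.sh layout.
resolve_bins() {
  DUCKDB="${DUCKDB_BIN:-./bin/duckdb/duckdb}"
  ONTOP="${ONTOP_BIN:-./bin/ontop/ontop}"
}

# Hypothetical preflight check: report the first missing or non-executable
# binary on stderr and return 1, so a run can fail early.
check_bins() {
  local bin
  resolve_bins
  for bin in "$DUCKDB" "$ONTOP"; do
    if [ ! -x "$bin" ]; then
      echo "missing or not executable: $bin (run ./install.sh or set DUCKDB_BIN/ONTOP_BIN)" >&2
      return 1
    fi
  done
}
```

To point a run at system-wide installs, set the variables on the command line, for example DUCKDB_BIN=/usr/local/bin/duckdb ./convert.sh (the path is illustrative).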

Other template repositories

We provide additional template repositories:

  • xrm-csvw-workflow: A template repository for converting CSV to RDF using barnard59 pipelines and the CSVW specification.
  • xrm-r2rml-workflow: A template repository for converting complete relational databases to RDF using the R2RML specification and Ontop as mapper.

Further reading

About

Convert CSV to RDF using DuckDB and Ontop
