
LDE – Linked Data Engine

Shared building blocks for the full Linked Data lifecycle.


Every organization working with Linked Data ends up building the same infrastructure from scratch: endpoint management, data import, transformation pipelines, dataset discovery.

LDE covers the full Linked Data lifecycle – from discovery and ingestion through transformation to publication – as an open-source toolkit of composable building blocks for Node.js.

Data transformations are expressed as plain SPARQL queries: portable, transparent and free of vendor lock-in.

Key capabilities

  • Discover datasets from DCAT-AP 3.0 registries.
  • Download and import data dumps to a local SPARQL endpoint for querying.
  • Transform datasets with pure SPARQL CONSTRUCT queries: composable stages with fan-out item selection.
  • Analyze datasets with VoID statistics and SPARQL monitoring.
  • Publish results to SPARQL endpoints or local files.
  • Serve RDF data over HTTP with content negotiation (Fastify plugin).
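To picture the content-negotiation idea behind the last capability (a simplified sketch; @lde/fastify-rdf's actual behaviour and API may differ):

```typescript
// Simplified content negotiation: pick the first supported RDF media type
// listed in the Accept header. Real negotiation also honours q-values.
const supported = ['application/ld+json', 'text/turtle', 'application/n-triples'];

function negotiate(acceptHeader: string): string | undefined {
  return acceptHeader
    .split(',')
    .map((part) => part.split(';')[0].trim())
    .find((type) => supported.includes(type));
}

const chosen = negotiate('text/turtle;q=0.9, application/ld+json');
// chosen === 'text/turtle' (first listed match; q-values ignored in this sketch)
```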

Standards

| Standard | Usage in LDE |
| --- | --- |
| DCAT-AP 3.0 (EU) | Dataset discovery and registry queries |
| SPARQL 1.1 | Data transformations, dataset queries and endpoint management |
| SHACL | Documentation generation from shapes (@lde/docgen) |
| VoID | Statistical analysis of RDF datasets (@lde/pipeline-void) |
| RDF/JS | Internal data model (N3) |
| LDES (EU) | Event stream consumption and publication (planned) |

Quick example

```ts
import {
  Pipeline,
  Stage,
  SparqlConstructExecutor,
  SparqlItemSelector,
  SparqlUpdateWriter,
  ManualDatasetSelection,
} from '@lde/pipeline';

const pipeline = new Pipeline({
  datasetSelector: new ManualDatasetSelection([dataset]),
  stages: [
    new Stage({
      name: 'per-class',
      itemSelector: new SparqlItemSelector({
        query: 'SELECT DISTINCT ?class WHERE { ?s a ?class }',
      }),
      executors: new SparqlConstructExecutor({
        query:
          'CONSTRUCT { ?class a <http://example.org/Class> } WHERE { ?s a ?class }',
      }),
    }),
  ],
  writers: new SparqlUpdateWriter({
    endpoint: new URL('http://localhost:7200/repositories/lde/statements'),
  }),
});

await pipeline.run();
```
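The fan-out behaviour of the item selector can be pictured in plain TypeScript (a simplified sketch, not the library's internals): the selector yields one item per query result, and the executor then runs once per item:

```typescript
// Sketch of fan-out item selection: one executor run per selected item.
type Item = { class: string };

// Stand-in for a SELECT DISTINCT ?class query over in-memory triples.
function* selectItems(triples: [string, string, string][]): Generator<Item> {
  const seen = new Set<string>();
  for (const [, predicate, object] of triples) {
    if (predicate === 'rdf:type' && !seen.has(object)) {
      seen.add(object);
      yield { class: object };
    }
  }
}

const data: [string, string, string][] = [
  ['ex:alice', 'rdf:type', 'ex:Person'],
  ['ex:acme', 'rdf:type', 'ex:Organization'],
  ['ex:bob', 'rdf:type', 'ex:Person'],
];

const runs: string[] = [];
for (const item of selectItems(data)) {
  // In the real pipeline this would execute the CONSTRUCT query for the item.
  runs.push(`CONSTRUCT run for ${item.class}`);
}
// Two distinct classes, so the executor fans out to two runs.
```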

Packages

**Discovery** – Find and retrieve dataset descriptions from registries

| Package | Description |
| --- | --- |
| @lde/dataset | Core dataset and distribution objects |
| @lde/dataset-registry-client | Retrieve dataset descriptions from DCAT-AP 3.0 registries |

**Processing** – Transform, enrich and analyze datasets with SPARQL pipelines

| Package | Description |
| --- | --- |
| @lde/pipeline | Build pipelines that query, transform and enrich Linked Data |
| @lde/pipeline-void | VoID statistical analysis for RDF datasets |
| @lde/distribution-downloader | Download distributions for local processing |
| @lde/sparql-importer | Import data dumps to a local SPARQL endpoint for querying |

**Publication** – Serve and document your data

| Package | Description |
| --- | --- |
| @lde/fastify-rdf | Fastify plugin for RDF content negotiation and request body parsing |
| @lde/docgen | Generate documentation from RDF such as SHACL shapes |

**Monitoring** – Observe pipeline runs and endpoint health

| Package | Description |
| --- | --- |
| @lde/sparql-monitor | Monitor SPARQL endpoints with periodic checks |
| @lde/pipeline-console-reporter | Console progress reporter for pipelines |

**Infrastructure** – Manage SPARQL servers and run tasks

| Package | Description |
| --- | --- |
| @lde/local-sparql-endpoint | Quickly start a local SPARQL endpoint for testing and development |
| @lde/sparql-server | Start, stop and control SPARQL servers |
| @lde/sparql-qlever | QLever SPARQL adapter for importing and serving data |
| @lde/wait-for-sparql | Wait for a SPARQL endpoint to become available |
| @lde/task-runner | Task runner core classes and interfaces |
| @lde/task-runner-docker | Run tasks in Docker containers |
| @lde/task-runner-native | Run tasks natively on the host system |
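The @lde/wait-for-sparql idea can be sketched as a generic polling loop (the function name and options below are assumptions for illustration, not the package's API):

```typescript
// Illustrative polling loop: retry a readiness check until it passes
// or the retry budget runs out.
async function waitFor(
  check: () => Promise<boolean>,
  { retries = 10, delayMs = 100 } = {},
): Promise<boolean> {
  for (let attempt = 0; attempt < retries; attempt++) {
    if (await check()) return true; // endpoint answered: done
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // gave up after `retries` attempts
}

// Simulated endpoint that becomes available on the third check.
let calls = 0;
const available = await waitFor(async () => ++calls >= 3, { delayMs: 1 });
// available === true after three checks
```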

Architecture

```mermaid
graph TD
  subgraph Discovery
    dataset
    dataset-registry-client --> dataset
  end

  subgraph Processing
    pipeline --> dataset-registry-client
    pipeline --> sparql-server
    pipeline --> sparql-importer
    pipeline-void --> pipeline
    distribution-downloader --> dataset
    sparql-importer --> dataset
  end

  subgraph Publication
    fastify-rdf
    docgen
  end

  subgraph Monitoring
    pipeline-console-reporter --> pipeline
    sparql-monitor
  end

  subgraph Infrastructure
    sparql-qlever --> sparql-importer
    sparql-qlever --> sparql-server
    sparql-qlever --> task-runner-docker
    task-runner-docker --> task-runner
    task-runner-native --> task-runner
    sparql-server
    local-sparql-endpoint
    wait-for-sparql
  end
```

Who uses LDE

Netwerk Digitaal Erfgoed — Dutch national digital heritage infrastructure, commissioned by the Ministry of Education, Culture and Science

Comparison

| | LDE | TriplyETL | rdf-connect |
| --- | --- | --- | --- |
| Focus | SPARQL-native pipelines | RDF ETL platform | RDF stream processing |
| Pipeline language | SPARQL + TypeScript | TypeScript DSL | Declarative (RML) |
| Lock-in | None – plain SPARQL files | Proprietary platform | Framework-specific |
| License | MIT | Proprietary | MIT |

Development

Prerequisites: Node.js (LTS) and npm.

```sh
npm install
npx nx run-many -t build
npx nx run-many -t test
npx nx affected -t lint test typecheck build  # only changed packages
```

See CONTRIBUTING.md for the full development workflow.

License

MIT – see LICENSE.

Acknowledgements

LDE originated at the Dutch national infrastructure for digital heritage (NDE).
