Loading and saving RDF
Reading RDF files
RDF data can be represented using various syntaxes (turtle, rdf/xml, n3, n-triples, trix, JSON-LD, etc.). The simplest format is ntriples, which is a triple-per-line format.
Create the file demo.nt in the current directory with these two lines in it:
<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.com/drewp> <http://example.com/says> "Hello World" .
On line 1 this file says “drewp is a FOAF Person”. On line 2 it says “drewp says ‘Hello World’”.
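If you would rather create demo.nt from Python than by hand, the following sketch writes the same two lines into the current directory using only the standard library. The constant name DEMO_NT is just an illustrative choice:

```python
from pathlib import Path

# The two N-Triples statements shown above, one triple per line
DEMO_NT = (
    "<http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> .\n"
    '<http://example.com/drewp> <http://example.com/says> "Hello World" .\n'
)

# Writes (or overwrites) demo.nt in the current working directory
Path("demo.nt").write_text(DEMO_NT, encoding="utf-8")
```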
RDFLib can guess the format of a file from its file ending (“.nt” is commonly used for N-Triples), so you can just use parse() to read in the file. If the file had a non-standard RDF file ending, you could set the format keyword parameter to either an Internet Media Type or the format name (see the list of available parsers).
In an interactive Python interpreter, try this:
from rdflib import Graph
g = Graph()
g.parse("demo.nt")
print(len(g))
# prints: 2
import pprint
for stmt in g:
    pprint.pprint(stmt)
# prints (the order of the triples may vary):
# (rdflib.term.URIRef('http://example.com/drewp'),
# rdflib.term.URIRef('http://example.com/says'),
# rdflib.term.Literal('Hello World'))
# (rdflib.term.URIRef('http://example.com/drewp'),
# rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
# rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
The final lines show how RDFLib represents the two statements in the file: the statements themselves are just length-3 tuples (“triples”), and their subjects, predicates, and objects are instances of rdflib term classes (here URIRef and Literal).
Reading remote RDF
Reading graphs from the Internet is easy:
from rdflib import Graph
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")
print(len(g))
# prints: 86 (at the time of writing; the remote document may change)
rdflib.Graph.parse() can process local files, remote data via a URL (as in this example), or RDF data in a string (using the data parameter).
Saving RDF
To store a graph in a file, use the rdflib.Graph.serialize() method:
from rdflib import Graph
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")
g.serialize(destination="tbl.ttl")
This parses data from http://www.w3.org/People/Berners-Lee/card and stores it in the file tbl.ttl in the current directory using the Turtle format, which is the default RDF serialization (as of rdflib 6.0.0).
To read the same data and save it as an RDF/XML format string in the variable v, do this:
from rdflib import Graph
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")
v = g.serialize(format="xml")
The following table lists the RDF formats you can serialize data to with rdflib out of the box, and the format=KEYWORD keyword used to reference them within serialize():
RDF Format | Keyword | Notes
---|---|---
Turtle | turtle, ttl or turtle2 | turtle2 is just turtle with more spacing and line breaks
RDF/XML | xml or pretty-xml | Was the default format, rdflib < 6.0.0
JSON-LD | json-ld | There are further options for compact syntax and other JSON-LD variants
N-Triples | ntriples, nt or nt11 | nt11 is exactly like nt, only UTF-8 encoded
Notation-3 | n3 | N3 is a superset of Turtle that also caters for rules and a few other things
TriG | trig | Turtle-like format for RDF triples + context (RDF quads) and thus multiple graphs
TriX | trix | RDF/XML-like format for RDF quads
N-Quads | nquads | N-Triples-like format for RDF quads
Working with multi-graphs
To read and query multi-graphs, that is, RDF data that is context-aware, you need to use rdflib’s rdflib.ConjunctiveGraph or rdflib.Dataset class. These are extensions to rdflib.Graph that know all about quads (triples + graph IDs).
If you had this multi-graph data file, demo.trig, in the TriG format (using the new-style PREFIX statement rather than the older @prefix):
PREFIX eg: <http://example.com/person/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
eg:graph-1 {
eg:drewp a foaf:Person .
eg:drewp eg:says "Hello World" .
}
eg:graph-2 {
eg:nick a foaf:Person .
eg:nick eg:says "Hi World" .
}
You could parse the file and query it like this:
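from rdflib import Dataset
from rdflib.namespace import RDF
ds = Dataset()
ds.parse("demo.trig")
# Each quad is (subject, predicate, object, graph identifier); the graph gets
# its own name (c) so it does not overwrite the dataset variable
for s, p, o, c in ds.quads((None, RDF.type, None, None)):
    print(s, c)
This will print out (the order of the lines may vary):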
http://example.com/person/drewp http://example.com/person/graph-1
http://example.com/person/nick http://example.com/person/graph-2