NSMNTX diagram

Introduction

NSMNTX is a plugin that enables the use of RDF in Neo4j. RDF is a W3C standard model for data interchange. This effectively means that NSMNTX makes it possible to:

  • Store RDF data in Neo4j in a lossless manner (imported RDF can subsequently be exported without losing a single triple in the process).

  • Export property graph data from Neo4j as RDF on demand.

Other features in NSMNTX include model mapping and inferencing on Neo4j graphs.

Installation

You can either download a prebuilt jar from the releases area or build it from the source. If you prefer to build, check the note below.

  1. Copy the jar(s) into the <NEO_HOME>/plugins directory of your Neo4j instance. (Note: if you’re going to use the JSON-LD serialisation format for RDF, you’ll also need to include APOC.)

  2. Add the following line to your <NEO_HOME>/conf/neo4j.conf (notice that it is possible to modify where the extension is mounted by using an alternative name to /rdf below).

dbms.unmanaged_extension_classes=semantics.extension=/rdf

  3. Restart the server.

  4. Check that the installation went well by running

    call dbms.procedures()

The list of procedures should include the ones documented below. You can check that the extension is mounted by running

:GET /rdf/ping

Note on build

When you run

  mvn clean package

it will produce two jars

  1. A neosemantics-[…​].jar. This jar bundles all the dependencies.

  2. An original-neosemantics-[…​].jar. This jar contains only the neosemantics code. Choose it if you want to keep the third-party jars separate; in that case you will have to add all third-party dependencies yourself (look at the pom.xml).

Feedback

Please provide feedback and report bugs as GitHub issues or join the Neo4j Community forum.

Acknowledgements

NSMNTX uses rdf4j for parsing and serialising RDF. Eclipse rdf4j is a powerful Java framework for processing and handling RDF data.

Importing RDF data

The main method for importing RDF is semantics.importRDF. It imports and persists into Neo4j the triples returned by a URL. This URL can point to an RDF file (local or remote) or to a service producing RDF dynamically. More on how to parameterise the access to web services in section Advanced settings for fetching RDF.

All import procedures take the following three parameters:

| Parameter | Type   | Description |
|-----------|--------|-------------|
| url       | String | URL of the dataset |
| format    | String | Serialization format. Valid formats are: Turtle, N-Triples, JSON-LD, RDF/XML, TriG and N-Quads (for named graphs) |
| params    | Map    | Optional set of parameters (see description in table below) |

Note that for this method to run, an index needs to exist on the uri property of nodes labeled Resource. If you have not created it yet, just run the following command on your DB; otherwise the semantics.importRDF procedure will remind you with an error message.

CREATE INDEX ON :Resource(uri)

In its most basic form the semantics.importRDF method just takes the url string to access the RDF data and the serialisation format. Let’s say you’re trying to load the following set of triples into Neo4j.

@prefix neo4voc: <http://neo4j.org/vocab/sw#> .
@prefix neo4ind: <http://neo4j.org/ind#> .

neo4ind:nsmntx3502 neo4voc:name "NSMNTX" ;
         a neo4voc:Neo4jPlugin ;
         neo4voc:version "3.5.0.2" ;
         neo4voc:releaseDate "03-06-2019" ;
         neo4voc:runsOn neo4ind:neo4j355 .

neo4ind:apoc3502 neo4voc:name "APOC" ;
         a neo4voc:Neo4jPlugin ;
         neo4voc:version "3.5.0.4" ;
         neo4voc:releaseDate "05-31-2019" ;
         neo4voc:runsOn neo4ind:neo4j355 .

neo4ind:graphql3502 neo4voc:name "Neo4j-GraphQL" ;
         a neo4voc:Neo4jPlugin ;
         neo4voc:version "3.5.0.3" ;
         neo4voc:releaseDate "05-05-2019" ;
         neo4voc:runsOn neo4ind:neo4j355 .

neo4ind:neo4j355 neo4voc:name "neo4j" ;
         a neo4voc:GraphPlatform , neo4voc:AwesomePlatform ;
         neo4voc:version "3.5.5" .

You can save them to your local drive or access them directly here. All you’ll need to provide to NSMNTX is the location (file:// or http://) and the serialisation used, Turtle in this case.

CALL semantics.importRDF("https://raw.githubusercontent.com/jbarrasa/neosemantics/3.5/docs/rdf/nsmntx.ttl","Turtle")

NSMNTX will import the RDF data and persist it into your Neo4j graph as the following structure

RDF data imported in Neo4j

The first thing we notice is that datatype properties in your RDF have been converted into node properties and object properties are now relationships connecting nodes. Every node represents a resource and has a property with its uri. Similarly, rdf:type statements are transformed into node labels. That’s pretty much it, but if you are interested, there is a complete description of the way triple data is transformed into Property Graph data for storage in Neo4j in this post. You will also notice a terminology/vocabulary transformation applied by default. The URIs identifying the elements in the RDF data (resources, properties, etc) have their namespace part shortened to make them more human readable and easier to query with Cypher.

In our example, http://neo4j.org/vocab/sw#name has been shortened to ns0__name (notice the double underscore separator used between the prefix and the local name in the URI). Similarly, http://www.w3.org/1999/02/22-rdf-syntax-ns#type would be shortened to rdf__type, and so on.

Prefixes for custom namespaces are assigned sequentially (ns0, ns1, etc) as they appear in the imported RDF. This is the default behavior but we’ll see later on that it is possible to control that, and use custom prefixes. More details in section Defining custom prefixes for namespaces.
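To make the shortening convention concrete, here is a minimal Python sketch of the idea. It is not NSMNTX's actual code: the class name, the rule used to split a URI into namespace and local name, and the small table of preset prefixes are assumptions made for illustration.

```python
# Illustrative only: mimics the default shortening convention described above.
# The split rule and the preset table are assumptions, not NSMNTX internals.

PRESET_PREFIXES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
    "http://www.w3.org/2004/02/skos/core#": "skos",
}


class PrefixShortener:
    def __init__(self):
        self.prefixes = dict(PRESET_PREFIXES)  # namespace -> prefix
        self.counter = 0                       # index of the next nsN prefix

    def shorten(self, uri):
        # Split right after the last '#' or '/', whichever comes later.
        cut = max(uri.rfind("#"), uri.rfind("/")) + 1
        namespace, local_name = uri[:cut], uri[cut:]
        if namespace not in self.prefixes:
            # Unknown namespaces get ns0, ns1, ... in order of appearance.
            self.prefixes[namespace] = "ns" + str(self.counter)
            self.counter += 1
        # Prefix and local name are joined with a double underscore.
        return self.prefixes[namespace] + "__" + local_name


shortener = PrefixShortener()
print(shortener.shorten("http://neo4j.org/vocab/sw#name"))
print(shortener.shorten("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
```

Because the namespace-to-prefix table is kept, the full URI can be rebuilt later from the short name, which is what makes a lossless export possible.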

Keeping namespaces can be important if you care about being able to regenerate the imported RDF as we will see in section Exporting RDF data. If you don’t care about that, you can ignore the namespaces by setting the handleVocabUris parameter to 'IGNORE' and namespaces will be lost on import. If you run the import with this setting only the local names of URIs will be kept. Here’s what that would look like:

CALL semantics.importRDF("http://.../nsmntx.ttl","Turtle", { handleVocabUris: "IGNORE" })

The imported graph will look something like the following one, in which the names for labels, properties and relationships are more of the kind you’re used to working with in Neo4j:

RDF data imported in Neo4j ignoring namespaces
Important
The first great thing about getting your RDF data into Neo4j is that now you can query it with Cypher

Here’s an example that showcases the difference: Let’s say you want to produce a list of plugins that run on Neo4j and the latest version of each.

If your RDF data is stored in a triple store, you would need to use the SPARQL query shown first below to answer the question. After it you can see the same thing expressed with Cypher in Neo4j.

SPARQL

prefix neovoc: <http://neo4j.org/vocab/sw#>
prefix neoind: <http://neo4j.org/ind#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?pluginName
       (MAX(?ver) as ?latestVersion)
WHERE {

    ?plugin rdf:type neovoc:Neo4jPlugin ;
            neovoc:runsOn ?neosrv ;
            neovoc:name ?pluginName ;
            neovoc:version ?ver .

    ?neosrv rdf:type neovoc:GraphPlatform ;
            neovoc:name "neo4j"
}
GROUP BY ?pluginName

Cypher

MATCH (n:Neo4jPlugin)-[:runsOn]->(p:GraphPlatform)
WHERE p.name = "neo4j"
RETURN n.name, MAX(n.version)

We’ve seen how to shorten RDF uris into more readable names using namespace prefixes, and we’ve also seen how to ignore them completely. There is a third option: You can keep the complete uris in property names, labels and relationships in the graph by setting the handleVocabUris property to "KEEP". The result will not be pretty and your cypher queries will be horrible, but hey, the option is there. Here’s an example on the same RDF file:

CALL semantics.importRDF("http://.../nsmntx.ttl","Turtle", { handleVocabUris: "KEEP" })
RDF data imported in Neo4j keeping namespaces

The imported graph in this case has the same structure, of course, but uses full uris as labels, relationships and property names.

Filtering triples by predicate

Something you may need to do when importing RDF data into Neo4j is exclude certain triples so that they are not persisted in your Neo4j graph. This is useful when only a portion of the RDF data is relevant to your project. The exclusion is done by predicate type (for example, "I don’t need to load the version property or the release date"): all you need to do is provide the list of URIs of the predicates you want excluded in the predicateExclusionList parameter, as in the call below. Note that the list needs to contain full URIs.

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/nsmntx.ttl","Turtle", { handleVocabUris: "IGNORE" , predicateExclusionList : [ "http://neo4j.org/vocab/sw#version", "http://neo4j.org/vocab/sw#releaseDate"] })
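Conceptually, the exclusion list acts as a filter applied to the stream of parsed triples before anything is written to the graph. The following Python sketch illustrates that idea only; the function name and the tuple representation of a triple are assumptions, not part of NSMNTX.

```python
# Illustrative sketch of predicate-based filtering; not NSMNTX code.

def drop_excluded(triples, excluded_predicates):
    """Yield only the (subject, predicate, object) triples to be kept."""
    excluded = set(excluded_predicates)  # full predicate URIs, as in the call above
    for subject, predicate, obj in triples:
        if predicate not in excluded:
            yield (subject, predicate, obj)


triples = [
    ("http://neo4j.org/ind#apoc3502", "http://neo4j.org/vocab/sw#name", "APOC"),
    ("http://neo4j.org/ind#apoc3502", "http://neo4j.org/vocab/sw#version", "3.5.0.4"),
    ("http://neo4j.org/ind#apoc3502", "http://neo4j.org/vocab/sw#releaseDate", "05-31-2019"),
]
kept = list(drop_excluded(triples, [
    "http://neo4j.org/vocab/sw#version",
    "http://neo4j.org/vocab/sw#releaseDate",
]))
print(kept)  # only the name triple remains
```

The comparison is done on full URIs, which is why the list passed to the procedure cannot contain shortened names.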

Handling multivalued properties

In RDF multiple values for the same property are just multiple triples. For example, you can have multiple alternative names for an individual like in the next RDF fragment:

<neo4j://individual/JB> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://neo4j.org/voc#Person> .
<neo4j://individual/JB> <http://neo4j.org/voc#name> "J. Barrasa" .
<neo4j://individual/JB> <http://neo4j.org/voc#altName> "JB" .
<neo4j://individual/JB> <http://neo4j.org/voc#altName> "Jesús" .
<neo4j://individual/JB> <http://neo4j.org/voc#altName> "Dr J" .

NSMNTX’s default behavior is to keep only one value for literal properties: the last one read in the triples parsed. So if you run a straight import on that data like this

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/multivalued1.nt","N-Triples")

Only the last value for the multivalued altName property will be kept.

MATCH (n:ns0__Person)
RETURN n.ns0__name as name, n.ns0__altName as altName

returns

╒════════════╤═════════╕
│"name"      │"altName"│
╞════════════╪═════════╡
│"J. Barrasa"│"Dr J"   │
└────────────┴─────────┘

This makes things simple and will be perfect if your dataset does not have multivalued properties. It can also be fine if keeping only one value is acceptable, either because the property is not critical or because one value is enough. There will be other cases though, where we do need to keep all the values, and here’s where the config parameter handleMultival will help. Here’s how:

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/multivalued1.nt","N-Triples", { handleMultival: 'ARRAY' })

Now all properties are stored as arrays in Neo4j. Even the ones that have one value only! But we can do better than that, let’s have a look at another example.

The following Turtle RDF fragment contains the description of a news article. The article has a number of keywords associated with it.

@prefix og: <http://ogp.me/ns#> .
@prefix nyt: <http://nyt.com/voc/> .

<nyt://article/a17a9514-73e7-51be-8ade-283e84a6cd87>
  a og:article ;
  og:title "Bengal Tigers May Not Survive Climate Change" ;
  og:url "https://www.nytimes.com/2019/05/06/science/tigers-climate-change-sundarbans.html" ;
  og:description "The tigers of the Sundarbans may be gone in fifty years, according to study" ;
  nyt:keyword "Climate Change", "Endangered Species", "Global Warming", "India", "Poaching" .

We want to make sure we keep all values for the nyt:keyword property. The natural way to do this in Neo4j is storing them in an array, so we’ll instruct NSMNTX to do that by setting the handleMultival to 'ARRAY' and the multivalPropList to the list of property types that are multivalued and we want stored as arrays of values. In the example the list will only contain 'http://nyt.com/voc/keyword'.

Here’s the import command that we need. Note that I’m combining the multivalued property config setting with handleVocabUris set to "IGNORE" (the interested reader can try dropping this setting to get URIs shortened with prefixes instead):

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/multivalued2.ttl","Turtle", { handleVocabUris: "IGNORE", handleMultival: 'ARRAY', multivalPropList : ['http://nyt.com/voc/keyword']})

And here’s what the result of the import would look like:

Multivalued properties loaded as arrays in Neo4j

When we analyse the result in the Neo4j browser we realise that there’s only one node for the nine triples imported! Yes, keep in mind that all triples in our RDF fragment are datatype properties, or in other words, properties with literal values, which are stored in Neo4j as node properties. All the statements are there, no data is lost, it’s just stored as the internal structure of the node. We can see all properties on the table view on the left hand side of the image.

Note that this time only the properties listed in the multivalPropList config parameter are stored as arrays, the rest are kept as atomic values.

Warning
Remember that if we set handleMultival to 'ARRAY' but we don’t provide a list of property URIs as multivalPropList ALL literal properties will be stored as arrays.
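The three behaviours described in this section (keep the last value, store everything as arrays, store only selected properties as arrays) can be summarised in a few lines of Python. This is a sketch of the logic as described above, not the plugin's implementation; the function name and the "DEFAULT" placeholder used for the non-array mode are assumptions.

```python
# Illustrative sketch of the multivalued-property handling; not NSMNTX code.

def collect_properties(literal_triples, handle_multival="DEFAULT", multival_prop_list=None):
    """Build the property map of one node from its literal-valued triples."""
    props = {}
    for _subject, predicate, value in literal_triples:
        # In ARRAY mode an empty or missing list means "all properties".
        as_array = handle_multival == "ARRAY" and (
            not multival_prop_list or predicate in multival_prop_list
        )
        if as_array:
            props.setdefault(predicate, []).append(value)
        else:
            props[predicate] = value  # the last value read wins
    return props


jb = [
    ("neo4j://individual/JB", "http://neo4j.org/voc#name", "J. Barrasa"),
    ("neo4j://individual/JB", "http://neo4j.org/voc#altName", "JB"),
    ("neo4j://individual/JB", "http://neo4j.org/voc#altName", "Jesús"),
    ("neo4j://individual/JB", "http://neo4j.org/voc#altName", "Dr J"),
]
print(collect_properties(jb))
print(collect_properties(jb, "ARRAY"))
print(collect_properties(jb, "ARRAY", ["http://neo4j.org/voc#altName"]))
```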

Here’s an example of how to query the multiple values of the keyword property: Give me articles tagged with the "Global Warming" keyword.

MATCH (a:article)
WHERE "Global Warming" IN a.keyword
RETURN a.title as title
╒══════════════════════════════════════════════╕
│"title"                                       │
╞══════════════════════════════════════════════╡
│"Bengal Tigers May Not Survive Climate Change"│
└──────────────────────────────────────────────┘

Handling language tags

Literal values in RDF can be tagged with language information. This can be used in any context but it’s common to find it used in combination with multivalued properties to create multilingual descriptions for items in a dataset. In the following example we have a description of a TV series with a multivalued property show:localName where each of the values is annotated with the language.

@prefix show: <http://example.org/vocab/show/> .
@prefix ind: <http://example.org/ind/> .

ind:218 a show:TVSeries .
ind:218 show:name "That Seventies Show" .
ind:218 show:localName "That Seventies Show"@en .
ind:218 show:localName 'Cette Série des Années Soixante-dix'@fr .
ind:218 show:localName "Cette Série des Années Septante"@fr-be .

By default, NSMNTX will strip out the language tags, but if you want to keep them you’ll need to set keepLangTag to true. If we use it in combination with the settings required to keep all values of a property stored in an array, the import invocation would look something like this:

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/multilang.ttl","Turtle", { keepLangTag: true, handleMultival: 'ARRAY', multivalPropList : ['http://example.org/vocab/show/localName']})

When you import literal values keeping the language annotation, you’ll see that string values have a suffix like @fr for French, @zh-cmn-Hant for Mandarin Chinese (traditional), and so on. The function getLangValue can be used to get the value for a particular language tag. It returns null when there is no value for the selected language tag. The following Cypher fragment returns the French version of a property and, when not found, defaults to the English version.

MATCH (n:Resource) RETURN coalesce(semantics.getLangValue("fr",n.ns0__localName), semantics.getLangValue("en",n.ns0__localName))
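If it helps to see the lookup spelled out, the following Python sketch emulates the behaviour described above on values that carry an @lang suffix. It is only an approximation written for this guide: the function name is hypothetical and the real getLangValue may treat edge cases differently.

```python
# Illustrative emulation of a language-tag lookup; not the actual getLangValue.

def pick_lang_value(lang, values):
    """Return the first value tagged with `lang`, or None if there is none."""
    if isinstance(values, str):
        values = [values]  # accept a single string as well as an array
    for stored in values or []:
        text, separator, tag = stored.rpartition("@")
        if separator and tag == lang:
            return text
    return None


local_names = [
    "That Seventies Show@en",
    "Cette Série des Années Soixante-dix@fr",
    "Cette Série des Années Septante@fr-be",
]
# Same fallback as the coalesce(...) in the Cypher fragment above.
print(pick_lang_value("fr", local_names) or pick_lang_value("en", local_names))
```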

Filtering triples by language tag

Language tags can also be used as a filter criterion. If we are only interested in a particular language when loading a multilingual dataset, we can set a filter so only literal values with a given language tag (or untagged ones) are imported into Neo4j. The configuration parameter that does it is languageFilter and you’ll need to set it to the relevant tag, for instance 'es' for literals in Spanish. Here’s what such a configuration would look like:

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/multilang.ttl","Turtle", { languageFilter: 'es'})
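The rule applied by the filter (keep literals with the requested tag, plus untagged ones) can be written as a one-line predicate. The Python sketch below only illustrates that rule; the function name and the (value, language) pair representation are assumptions.

```python
# Illustrative sketch of the language filter rule; not NSMNTX code.

def passes_language_filter(literal_language, language_filter):
    """literal_language is None for literals that carry no language tag."""
    return (
        language_filter is None          # no filter configured: keep everything
        or literal_language is None      # untagged literals are always kept
        or literal_language == language_filter
    )


literals = [("Hola", "es"), ("Hello", "en"), ("218", None)]
kept_literals = [value for value, lang in literals if passes_language_filter(lang, "es")]
print(kept_literals)
```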

Handling custom data types

In RDF, custom data types are attached to literals as an IRI after the separator ^^. For example, you can have a custom data type for a currency like in the following Turtle RDF fragment:

@prefix ex: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

ex:Mercedes
	rdf:type ex:Car ;
	ex:price "10000"^^ex:EUR ;
	ex:power "300"^^ex:HP ;
	ex:color "red"^^ex:Color .

NSMNTX’s default behavior is to not keep custom data types for properties. So if you run a straight import on that data like this:

CALL semantics.importRDF("file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/customDataTypes.ttl","Turtle")

Only the value for the properties will be kept.

MATCH (n:ns0__Car)
RETURN n.ns0__price, n.ns0__power, n.ns0__color
╒══════════════╤══════════════╤══════════════╕
│"n.ns0__price"│"n.ns0__power"│"n.ns0__color"│
╞══════════════╪══════════════╪══════════════╡
│"10000"       │"300"         │"red"         │
└──────────────┴──────────────┴──────────────┘

This makes things simple and will be perfect if your dataset does not have properties with custom data types. But if you need to keep the custom data types the config parameter keepCustomDataTypes comes into play. Here’s how:

CALL semantics.importRDF("file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/customDataTypes.ttl","Turtle", {keepCustomDataTypes: true})

Now all properties that have a custom data type are saved as strings with their respective custom data type IRIs in Neo4j.

╒═════════════════╤══════════════╤═════════════════╕
│"n.ns0__price"   │"n.ns0__power"│"n.ns0__color"   │
╞═════════════════╪══════════════╪═════════════════╡
│"10000^^ns0__EUR"│"300^^ns0__HP"│"red^^ns0__Color"│
└─────────────────┴──────────────┴─────────────────┘

But we can do better than that. Let’s have a look at another example, using the same Turtle file as above.

If we want to keep the custom data type for only some of the properties then we can instruct NSMNTX to do that by setting keepCustomDataTypes to true and customDataTypedPropList to the list of property types whose custom data types we want to keep. In the example the list will only contain 'http://example.com/power'.

Here is the import command that we need:

CALL semantics.importRDF("file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/customDataTypes.ttl","Turtle", {keepCustomDataTypes: true, customDataTypedPropList: ['http://example.com/power']})

And here’s what the result of the cypher query above would look like after this import:

╒══════════════╤══════════════╤══════════════╕
│"n.ns0__price"│"n.ns0__power"│"n.ns0__color"│
╞══════════════╪══════════════╪══════════════╡
│"10000"       │"300^^ns0__HP"│"red"         │
└──────────────┴──────────────┴──────────────┘

Note that this time only the custom data types of the properties listed in the customDataTypedPropList are kept, the rest will only have the literal value.

Warning
Remember that if we set keepCustomDataTypes to true but we don’t provide a list of property URIs as customDataTypedPropList ALL literals with a custom data type will be stored as strings with their respective custom data type IRIs.

When you import literal values keeping the custom data types, you’ll see that string values have an IRI suffix separated by ^^ from the raw value, for instance "10000^^ns0__EUR" in the example above. The function getDataType can be used to get the data type for a particular property. It returns null when there is no custom data type for the given property.

The following Cypher fragment returns the data type of power.

MATCH (n:ns0__Car)
RETURN semantics.getDataType(n.ns0__power)

The function getValue can be used to get the raw value of a particular property without custom data types or language tags.

The following Cypher fragment returns the raw value of power.

MATCH (n:ns0__Car)
RETURN semantics.getValue(n.ns0__power)

The user functions mentioned above can be combined with other user functions like uriFromShort or getIRILocalName etc.
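As a rough mental model of what these functions do with a stored string such as "300^^ns0__HP", here is a short Python sketch. It only emulates the split on the ^^ separator described above; the function name is hypothetical and the real getDataType and getValue may handle more cases (language tags, for instance).

```python
# Illustrative emulation of splitting a stored value from its data type suffix.

def split_typed_value(stored):
    """Return (raw_value, data_type); data_type is None when there is no ^^ suffix."""
    raw_value, separator, data_type = stored.rpartition("^^")
    if not separator:
        return stored, None
    return raw_value, data_type


print(split_typed_value("300^^ns0__HP"))
print(split_typed_value("red"))
```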

Classes as Nodes (instead of Labels)

The rdf:type statements in RDF (triples) are transformed into labels by default when we import them into Neo4j. While this is a reasonable approach, it may not be your preferred option, especially if you want to load an ontology too and link it to your instance data. In that case you’ll probably want to represent the types as nodes and let 'the magic' of uris link them. Be careful if you try this approach when loading large datasets, as it can create very dense nodes. If you want rdf:type statements (triples) to be imported in this way, all you have to do is set the typesToLabels parameter to false.

Here’s an example: Let’s say we want to load an ontology (notice that it’s actually a small fragment of several ontologies, but it will work for our example). For what it’s worth, it’s an RDF file, so we load it the usual way, with all default settings

call semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/minionto.ttl","Turtle")

We can inspect the result of the import to see that the ontology contains just five class definitions linked in a hierarchy like this.

Ontology imported in Neo4j

Now we want to load the instance data and we want it to link to the ontology graph rather than build a disconnected graph by transforming rdf:type statements into Property Graph labels. We can achieve this by setting the typesToLabels to false.

call semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/miniinstances.ttl","Turtle", { typesToLabels: false })

The resulting graph connects the instance data to the ontology elements. This is the magic of unique identifiers (uris): there’s nothing you need to do for the linkage to happen. If your RDF is well formed and uris are used consistently in it, then it will happen automatically.

Connected ontology and instance data imported in Neo4j

More on the usefulness of representing the ontology in the Neo4j graph in section Inferencing/Reasoning.

Handling named graphs (RDF Quads)

You can also import RDF datasets using semantics.importQuadRDF. The only difference in comparison to semantics.importRDF is that you can import not just triples but also quads. RDF statements can have an extra IRI containing the context of the statement. It enables the partitioning of the data into multiple so called named graphs. When a statement has context information NSMNTX annotates Resources from this statement with a property "graphUri". This property contains the context IRI from the statement.

Note that you need to use TriG or N-Quads serializations if you want to take advantage of the named graph function.
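One way to picture the effect of the context information is to think of every node as being identified by the pair (resource uri, graph uri) rather than by the uri alone. The Python sketch below illustrates that idea on literal-valued statements only; it is a simplification written for this guide, not NSMNTX code, and the function name and quad representation are assumptions.

```python
# Illustrative sketch: grouping quads into one property map per (uri, graph) pair.
from collections import defaultdict


def group_by_resource_and_graph(quads):
    """quads are (subject, predicate, literal, graph); graph is None without context."""
    nodes = defaultdict(dict)
    for subject, predicate, literal, graph in quads:
        node = nodes[(subject, graph)]
        node["uri"] = subject
        if graph is not None:
            node["graphUri"] = graph  # context IRI of the statement
        node.setdefault(predicate, []).append(literal)
    return dict(nodes)


MONICA = "http://www.example.org/exampleDocument#Monica"
G1 = "http://www.example.org/exampleDocument#G1"
G2 = "http://www.example.org/exampleDocument#G2"
grouped = group_by_resource_and_graph([
    (MONICA, "http://www.example.org/vocabulary#name", "Monica Murphy", G1),
    (MONICA, "http://www.example.org/vocabulary#city", "New York", G2),
    (MONICA, "http://www.example.org/vocabulary#country", "USA", G2),
])
print(len(grouped))  # one entry per (uri, graph) pair
```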

Similar to the semantics.importRDF method, semantics.importQuadRDF also takes the url string to access the RDF dataset and the serialisation format. Let’s say you’re trying to load the following set of quads into Neo4j.

@prefix ex: <http://www.example.org/vocabulary#> .
@prefix exDoc: <http://www.example.org/exampleDocument#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

exDoc:G1 ex:created "2019-06-06"^^xsd:date .
exDoc:G2 ex:created "2019-06-07T10:15:30"^^xsd:dateTime .

exDoc:Monica a ex:Person ;
             ex:friendOf exDoc:John .

exDoc:G1 {
    exDoc:Monica
              ex:name "Monica Murphy" ;
              ex:homepage <http://www.monicamurphy.org> ;
              ex:email <mailto:monica@monicamurphy.org> ;
              ex:hasSkill ex:Management ,
                                  ex:Programming .
    exDoc:Monica ex:knows exDoc:John . }

exDoc:G2 {
    exDoc:Monica
              ex:city "New York" ;
              ex:country "USA" . }


exDoc:G3 {
    exDoc:John a ex:Person . }

Note that for this method to run, an index needs to exist on the uri property of nodes labeled Resource. If you have not created it yet, just run the following command on your DB; otherwise the semantics.importQuadRDF procedure will remind you with an error message.

CREATE INDEX ON :Resource(uri)

This procedure takes the same generic params described at the beginning of the Importing RDF data section, so we will invoke it with a URL and a serialisation format. In the following example we will import the RDF dataset in this file.

You can use the following cypher snippet to import the set of quads from above:

CALL semantics.importQuadRDF( "file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/RDFDataset/RDFDataset.trig", "TriG", {typesToLabels: true, keepCustomDataTypes: true, handleMultival: 'ARRAY'})

Merging nodes virtually

While importing the RDF dataset above, NSMNTX will create a separate node for each instance of exDoc:Monica. That means you will have three nodes, each representing a different graph. This might complicate things when you want to, for example, query everything about exDoc:Monica with the following Cypher snippet:

MATCH (monica:Resource {uri: 'http://www.example.org/exampleDocument#Monica'})
RETURN monica

As a result you will get three distinct nodes, which look like this in text mode:

╒══════════════════════════════════════════════════════════════════════╕
│"monica"                                                              │
╞══════════════════════════════════════════════════════════════════════╡
│{"http://www.example.org/vocabulary#name":["Monica Murphy"],"uri":"htt│
│p://www.example.org/exampleDocument#Monica","graphUri":"http://www.exa│
│mple.org/exampleDocument#G1"}                                         │
├──────────────────────────────────────────────────────────────────────┤
│{"http://www.example.org/vocabulary#city":["New York"],"http://www.exa│
│mple.org/vocabulary#country":["USA"],"uri":"http://www.example.org/exa│
│mpleDocument#Monica","graphUri":"http://www.example.org/exampleDocumen│
│t#G2"}                                                                │
├──────────────────────────────────────────────────────────────────────┤
│{"uri":"http://www.example.org/exampleDocument#Monica"}               │
└──────────────────────────────────────────────────────────────────────┘

To avoid this, you can use APOC Nodes collapse. apoc.nodes.collapse merges the set of nodes into a virtual node.

Here is the cypher snippet showing how to do that with the exDoc:Monica example:

MATCH (monica:Resource {uri: 'http://www.example.org/exampleDocument#Monica'})
WITH collect(monica) AS nodes
CALL apoc.nodes.collapse(nodes,{properties:'combine'}) YIELD from, rel, to
RETURN DISTINCT from AS monica

As a result you will get a single node which looks like this in text mode:

╒══════════════════════════════════════════════════════════════════════╕
│"monica"                                                              │
╞══════════════════════════════════════════════════════════════════════╡
│{"http://www.example.org/vocabulary#city":["New York"],"count":3,"http│
│://www.example.org/vocabulary#country":["USA"],"uri":"http://www.examp│
│le.org/exampleDocument#Monica","http://www.example.org/vocabulary#name│
│":["Monica Murphy"],"graphUri":["http://www.example.org/exampleDocumen│
│t#G2","http://www.example.org/exampleDocument#G1"]}                   │
└──────────────────────────────────────────────────────────────────────┘

You can find more information about the parameter configuration of apoc.nodes.collapse on APOC Nodes collapse.

Advanced settings for fetching RDF

Sometimes the RDF data will be a static file, and other times it will be generated dynamically in response to an HTTP request (GET or POST), possibly containing parameters or even a SPARQL query. The following two parameters help in these situations:

  • payload: Takes a String as value and sends the specified data in a POST HTTP request to the url passed as first parameter of the stored procedure. Typically useful for SPARQL endpoints where we want to submit a query to produce the actual RDF.

  • headerParams: Takes a map of property-values and adds each of them as an extra header in the HTTP request. Useful for sending credentials to services requiring authentication (using the Authorization header) or for specifying the required format (using the Accept header).
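To make the role of the two parameters concrete, the following Python sketch builds (without sending) the kind of HTTP request they describe: the payload becomes the body of a POST and each entry of headerParams becomes a request header. The helper name, the example.org endpoint and the query are hypothetical, and this is only a picture of the request shape, not the plugin's networking code.

```python
# Illustrative sketch of the request described by payload and headerParams.
from urllib.parse import quote_plus
from urllib.request import Request


def build_request(url, payload=None, header_params=None):
    # urllib issues a POST when a body is present and a GET otherwise.
    body = payload.encode("utf-8") if payload is not None else None
    return Request(url, data=body, headers=dict(header_params or {}))


sparql = "DESCRIBE <http://example.org/resource/1>"  # hypothetical query
request = build_request(
    "https://example.org/sparql",                     # hypothetical endpoint
    payload="query=" + quote_plus(sparql),            # URL-encode the query text
    header_params={"Accept": "application/turtle"},
)
print(request.get_method(), request.get_header("Accept"))
print(request.data)
```

When no payload is given, the same helper produces a plain GET, which matches the static-file case.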

Here is an example of how to send a request to a SPARQL endpoint and ingest the results directly in Neo4j. The service in question is the Linked Open Data service of the British Library. You can test it here. The service is not authenticated, so no need to use the Authorization header but we want to select the RDF serialisation produced by our request, which we do by setting Accept: "application/turtle". Finally, we pass the SPARQL query as the value of the payload parameter, prefixed with query=.

{ headerParams: { Accept: "application/turtle"}, payload: "query=DESCRIBE <http://bnb.data.bl.uk/id/resource/018212405>" }

We obviously need a query that produces RDF so we can import it into Neo4j. I’m using a SPARQL DESCRIBE query in the following example, but a SPARQL CONSTRUCT query could be used too. The following call imports all the details available in the British Library about 'The world of yesterday' by Stefan Zweig (which, by the way, if you haven’t read, you should really take a break after this section and go read).

CALL semantics.importRDF("https://bnb.data.bl.uk/sparql","Turtle",{ handleVocabUris: "IGNORE", headerParams: { Accept: "application/turtle"}, payload: "query=" + apoc.text.urlencode("DESCRIBE <http://bnb.data.bl.uk/id/resource/018212405>") })

Notice that the British Library service requires you to encode the SPARQL query. We do this with APOC’s apoc.text.urlencode function. After running this you get a pretty poor graph, because the DESCRIBE query only returns the statements having 'The world of yesterday' (http://bnb.data.bl.uk/id/resource/018212405) as subject or object. But we can enrich it a bit by re-running it for all of the URIs connected to our book as follows:

MATCH (:Book)-->(t) WITH DISTINCT t
CALL semantics.importRDF("https://bnb.data.bl.uk/sparql","Turtle",{ handleVocabUris: "IGNORE", headerParams: { Accept: "application/turtle"}, payload: "query=" + apoc.text.urlencode("CONSTRUCT {<" + t.uri + "> ?p ?o } { <" + t.uri + "> ?p ?o } LIMIT 10 ")}) yield triplesLoaded
return t.uri, triplesLoaded

Which returns:

╒══════════════════════════════════════════════════════════════════════╤═══════════════╕
│"t.uri"                                                               │"triplesLoaded"│
╞══════════════════════════════════════════════════════════════════════╪═══════════════╡
│"http://bnb.data.bl.uk/id/person/ZweigStefan1881-1942"                │5              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://rdaregistry.info/termList/RDACarrierType/1018"                │1              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/concept/place/lcsh/Europe"                  │4              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/concept/lcsh/EuropeCivilization20thcentury" │5              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/resource/GBB721847"                         │1              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/place/Europe"                               │3              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://lexvo.org/id/iso639-3/eng"                                    │0              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/concept/lcsh/WorldWar1914-1918Influence"    │5              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://rdaregistry.info/termList/RDAMediaType/1003"                  │1              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/concept/lcsh/AuthorsAustrian20thcenturyBiogr│5              │
│aphy"                                                                 │               │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/resource/018212405/publicationevent/Placeofp│4              │
│ublicationnotidentifiedPushkinPress2009"                              │               │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://rdaregistry.info/termList/RDAContentType/1020"                │1              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/concept/ddc/e22/838.91209"                  │3              │
├──────────────────────────────────────────────────────────────────────┼───────────────┤
│"http://bnb.data.bl.uk/id/concept/person/lcsh/ZweigStefan1881-1942"   │5              │
└──────────────────────────────────────────────────────────────────────┴───────────────┘

And produces this graph:

Graph resulting of importing the data in the British National Library on 'The world of yesterday' by Stefan Zweig

Of course, you could achieve this (or something similar) in different ways. In this case we are using a SPARQL CONSTRUCT query so that we can limit the number of triples returned for each resource, as some of them are pretty dense.
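The payload passed to the SPARQL endpoint is an ordinary form-encoded request body. As a rough illustration outside Neo4j, the same string could be assembled in Python. The helper name build_construct_payload is hypothetical, and quote_plus is only assumed to behave closely enough to apoc.text.urlencode for this purpose.

```python
from urllib.parse import quote_plus


def build_construct_payload(resource_uri, limit=10):
    """Return a form-encoded body carrying a SPARQL CONSTRUCT query
    that describes one resource and caps the number of triples."""
    query = (
        "CONSTRUCT {<" + resource_uri + "> ?p ?o } "
        "{ <" + resource_uri + "> ?p ?o } LIMIT " + str(limit)
    )
    # quote_plus percent-encodes reserved characters and turns spaces into '+'
    return "query=" + quote_plus(query)


payload = build_construct_payload("http://bnb.data.bl.uk/id/place/Europe")
print(payload)
```

The LIMIT clause is what keeps dense resources from returning too many triples in a single call.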

Defining custom prefixes for namespaces

When applying URL shortening on RDF ingestion (either explicitly or implicitly), we have the option of letting neosemantics automatically assign prefixes to namespaces as they appear in the imported RDF. Before doing that, however, a few popular namespaces will be set with familiar prefixes. These include "http://www.w3.org/1999/02/22-rdf-syntax-ns#" prefixed as rdf and "http://www.w3.org/2004/02/skos/core#" prefixed as skos.

At any point you can check the prefixes in use by running the listNamespacePrefixes procedure.

CALL semantics.listNamespacePrefixes()

Before running your first import, this method should return no results, but after your first run it should return a list containing at least the following entries.

╒════════╤═════════════════════════════════════════════╕
│"prefix"│"namespace"                                  │
╞════════╪═════════════════════════════════════════════╡
│"skos"  │"http://www.w3.org/2004/02/skos/core#"       │
├────────┼─────────────────────────────────────────────┤
│"sch"   │"http://schema.org/"                         │
├────────┼─────────────────────────────────────────────┤
│"sh"    │"http://www.w3.org/ns/shacl#"                │
├────────┼─────────────────────────────────────────────┤
│"rdfs"  │"http://www.w3.org/2000/01/rdf-schema#"      │
├────────┼─────────────────────────────────────────────┤
│"dc"    │"http://purl.org/dc/elements/1.1/"           │
├────────┼─────────────────────────────────────────────┤
│"dct"   │"http://purl.org/dc/terms/"                  │
├────────┼─────────────────────────────────────────────┤
│"rdf"   │"http://www.w3.org/1999/02/22-rdf-syntax-ns#"│
├────────┼─────────────────────────────────────────────┤
│"owl"   │"http://www.w3.org/2002/07/owl#"             │
└────────┴─────────────────────────────────────────────┘

Let’s say the RDF dataset that you are going to import uses the namespace http://neo4j.org/vocab/sw# and you want it to be prefixed as neo instead of ns0 (or ns7), as would happen if the prefix was assigned automatically by neosemantics. You can do this by calling the addNamespacePrefix procedure as follows:

call semantics.addNamespacePrefix("neo","http://neo4j.org/vocab/sw#")

This will return:

╒════════╤════════════════════════════╕
│"prefix"│"namespace"                 │
╞════════╪════════════════════════════╡
│"neo"   │"http://neo4j.org/vocab/sw#"│
└────────┴────────────────────────────┘

And then when the namespace is detected during the ingestion of the RDF data, the neo prefix will be used.
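To make the effect of a prefix definition concrete, here is a minimal sketch of namespace shortening in Python. It is not the plugin's implementation: the shorten function is hypothetical, and the double-underscore separator is assumed from shortened names such as ns0__runsOn that appear in queries elsewhere in this document.

```python
def shorten(uri, prefixes):
    """Replace a known namespace with its prefix (namespace + local -> prefix__local).

    `prefixes` maps prefix -> namespace, the same shape as the rows
    returned by listNamespacePrefixes.
    """
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + "__" + uri[len(namespace):]
    return uri  # unknown namespace: left untouched in this sketch


prefixes = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "neo": "http://neo4j.org/vocab/sw#",
}
print(shorten("http://neo4j.org/vocab/sw#name", prefixes))
```

Because the stored names depend on this mapping, changing a prefix after data has been loaded would break the link between stored names and their original URIs.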

Make sure you know what you’re doing if you manipulate the prefix definition, especially after loading RDF data as you can overwrite namespaces in use, which would affect the possibility of regenerating the imported RDF.

Deleting RDF

The method to delete imported RDF data is semantics.deleteRDF. It deletes from Neo4j the triples returned by a URL. This URL can point to an RDF file (local or remote) or a service producing RDF dynamically. Like the import procedures, all delete procedures take the following three parameters:

Parameter  Type    Description
url        String  URL of the dataset
format     String  Serialization format. Valid formats are: Turtle, N-Triples, JSON-LD, RDF/XML, TriG and N-Quads (for named graphs)
params     Map     Set of parameters (see description in table below)

In its most basic form the semantics.deleteRDF method just takes the url string to access the RDF data and the serialisation format. Let’s say you have already imported the following set of triples into Neo4j with this command:

CALL semantics.importRDF("file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/deleteRDF/dataset.ttl","Turtle",{typesToLabels: true, keepCustomDataTypes: true, handleMultival: 'ARRAY'})

@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Resource1
  a ex:TestResource ;
  ex:Predicate1 "100"^^ex:CDT ;
  ex:Predicate2 "test";
  ex:Predicate3 ex:Resource2 ;
  ex:Predicate4 "val1" ;
  ex:Predicate4 "val2" ;
  ex:Predicate4 "val3" ;
  ex:Predicate4 "val4" .

ex:Resource2
  a ex:TestResource ;
  ex:Predicate1 "test";
  ex:Predicate2 ex:Resource3 ;
  ex:Predicate3 "100"^^xsd:long ;
  ex:Predicate3 "200"^^xsd:long ;
  ex:Predicate4 "300.0"^^xsd:double ;
  ex:Predicate4 "400.0"^^xsd:double .

Let’s say you’re trying to delete the following set of triples from Neo4j after the import above:

@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Resource1
  ex:Predicate3 ex:Resource2 .

ex:Resource2
  a ex:TestResource ;
  ex:Predicate1 "test";
  ex:Predicate2 ex:Resource3 ;
  ex:Predicate3 "100"^^xsd:long ;
  ex:Predicate3 "200"^^xsd:long ;
  ex:Predicate4 "300.0"^^xsd:double ;
  ex:Predicate4 "400.0"^^xsd:double .

Here is the cypher snippet showing how to do that:

CALL semantics.deleteRDF("file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/deleteRDF/delete.ttl","Turtle",{typesToLabels: true, keepCustomDataTypes: true, handleMultival: 'ARRAY'})

NSMNTX will delete the RDF data in your Neo4j graph. After this deletion your RDF data will look like this:

@prefix ex: <http://example.org/> .

ex:Resource1
  a ex:TestResource ;
  ex:Predicate1 "100"^^ex:CDT ;
  ex:Predicate2 "test";
  ex:Predicate4 "val1" ;
  ex:Predicate4 "val2" ;
  ex:Predicate4 "val3" ;
  ex:Predicate4 "val4" .

Important
You must use the same set of parameters for deletion that you used during import; otherwise you will not get the expected results.

Note that blank nodes currently cannot be deleted because they do not have a persistent IRI.
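Conceptually, the deletion in this example behaves like a set difference over triples. The following sketch reproduces the dataset.ttl and delete.ttl examples as plain Python sets. It is only a model of the outcome, not of how NSMNTX performs the deletion; the tuple encoding of literals is an assumption made for the illustration.

```python
EX = "http://example.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
R1, R2, R3 = EX + "Resource1", EX + "Resource2", EX + "Resource3"

# Literals are modelled as (value, datatype) pairs; resources as plain strings.
resource1 = {
    (R1, RDF_TYPE, EX + "TestResource"),
    (R1, EX + "Predicate1", ("100", EX + "CDT")),
    (R1, EX + "Predicate2", ("test", None)),
    (R1, EX + "Predicate3", R2),
    (R1, EX + "Predicate4", ("val1", None)),
    (R1, EX + "Predicate4", ("val2", None)),
    (R1, EX + "Predicate4", ("val3", None)),
    (R1, EX + "Predicate4", ("val4", None)),
}
resource2 = {
    (R2, RDF_TYPE, EX + "TestResource"),
    (R2, EX + "Predicate1", ("test", None)),
    (R2, EX + "Predicate2", R3),
    (R2, EX + "Predicate3", ("100", XSD + "long")),
    (R2, EX + "Predicate3", ("200", XSD + "long")),
    (R2, EX + "Predicate4", ("300.0", XSD + "double")),
    (R2, EX + "Predicate4", ("400.0", XSD + "double")),
}

imported = resource1 | resource2                         # dataset.ttl
to_delete = {(R1, EX + "Predicate3", R2)} | resource2    # delete.ttl
remaining = imported - to_delete

for triple in sorted(remaining, key=str):
    print(triple)
```

The seven remaining triples all have ex:Resource1 as subject, matching the Turtle shown after the deletion.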

Handling named graphs (RDF Quads)

You can also delete imported quads using semantics.deleteQuadRDF which takes the same generic params described in [common_params_delete].

Note that you need to use TriG or N-Quads serializations if you want to take advantage of the named graph function.

For this example we will use the same dataset which was used in Handling named graphs (RDF Quads).

Let’s say you’re trying to delete the following set of quads from Neo4j after the import from the Handling named graphs (RDF Quads) section:

@prefix ex: <http://www.example.org/vocabulary#> .
@prefix exDoc: <http://www.example.org/exampleDocument#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

exDoc:G2 ex:created "2019-06-07T10:15:30"^^xsd:dateTime .

exDoc:Monica a ex:Person ;
             ex:friendOf exDoc:John .

exDoc:G1 {
    exDoc:Monica
              ex:name "Monica Murphy" ;
              ex:email <mailto:monica@monicamurphy.org> ;
              ex:hasSkill ex:Management ;
              ex:knows exDoc:John . }

exDoc:G2 {
    exDoc:Monica
              ex:city "New York" ;
              ex:country "USA" . }

As mentioned above, the semantics.deleteQuadRDF procedure takes the same generic params described in [common_params_delete] at the beginning of the Deleting RDF section, so we will invoke it with a URL and a serialisation format. In the following example we will delete the quads contained in this file.

Here is the cypher snippet showing how to do that:

CALL semantics.deleteQuadRDF("file:///Users/emrearkan/IdeaProjects/neosemantics/docs/rdf/RDFDataset/RDFDataset.trig","TriG",{typesToLabels: true, keepCustomDataTypes: true, handleMultival: 'ARRAY'})

NSMNTX will delete the given quads in your Neo4j graph. After this deletion your RDF dataset will look like this:

@prefix ex: <http://www.example.org/vocabulary#> .
@prefix exDoc: <http://www.example.org/exampleDocument#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

exDoc:G1 ex:created "2019-06-06"^^xsd:date .

exDoc:G1 {
    exDoc:Monica
              ex:homepage <http://www.monicamurphy.org> ;
              ex:hasSkill ex:Programming . }

exDoc:G3 {
    exDoc:John a ex:Person . }

Importing Ontologies

Ontologies are serialised as RDF, so they can be imported using plain importRDF, but the importOntology method gives us a higher level of control over how an RDFS or OWL ontology is imported into Neo4j. It’s important to note that this procedure imports only the following:

  1. Named class (category) declarations with both rdfs:Class and owl:Class.

  2. Explicit class hierarchies defined with rdfs:subClassOf statements.

  3. Property definitions with owl:ObjectProperty, owl:DatatypeProperty and rdf:Property.

  4. Explicit property hierarchies defined with rdfs:subPropertyOf statements.

  5. Domain and range information for properties described as rdfs:domain and rdfs:range statements.

All other elements will be ignored by this loader.

The importOntology procedure takes the same generic params described in [common_params] at the beginning of the Importing RDF data section, so we will invoke it with a URL and a serialisation format. In the following example we will import the ontology in this file.

CALL semantics.importOntology("http://jbarrasa.github.io/neosemantics/docs/rdf/vw.owl","Turtle")

VW ontology imported in Neo4j

As we see in the ingested graph, by default, classes will be persisted as nodes with label Class and two properties, uri and name, and rdfs:subClassOf statements are stored as relationships of type SCO between Class nodes. Similarly, properties will be persisted as nodes with name and uri, labeled Relationship or Property for owl:ObjectProperty and owl:DatatypeProperty respectively. Statements with rdfs:subPropertyOf as predicate are stored as relationships of type SPO between Relationship or Property nodes.

These graph model elements can be overridden by using the following configuration params:

  • classLabel: Label to be used for Ontology Classes (categories). Default is Class.

  • subClassOfRel: Relationship to be used for rdfs:subClassOf hierarchies. Default is SCO.

  • dataTypePropertyLabel: Label to be used for DataType properties in the Ontology. Default is Property.

  • objectPropertyLabel: Label to be used for Object properties in the Ontology. Default is Relationship.

  • subPropertyOfRel: Relationship to be used for rdfs:subPropertyOf hierarchies. Default is SPO.

  • domainRel: Relationship to be used for rdfs:domain. Default is DOMAIN.

  • rangeRel: Relationship to be used for rdfs:range. Default is RANGE.

Here’s an example of how to load an ontology using some of these parameters:

CALL semantics.importOntology("http://jbarrasa.github.io/neosemantics/docs/rdf/vw.owl","Turtle", { classLabel : 'Category', objectPropertyLabel: 'Rel', dataTypePropertyLabel: 'Prop'})

Finally, it’s also possible to have imported nodes (both Classes and Properties/Relationships) labeled as Resource for compatibility with the importRDF procedure. This is done by setting the addResourceLabels parameter to true.
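The way overrides combine with the defaults listed above can be pictured as a simple dictionary overlay. This is only an illustrative sketch in Python: the DEFAULTS map restates the values documented in this section, and effective_config is a hypothetical helper, not part of NSMNTX.

```python
# Default graph model elements, as documented in the parameter list above.
DEFAULTS = {
    "classLabel": "Class",
    "subClassOfRel": "SCO",
    "dataTypePropertyLabel": "Property",
    "objectPropertyLabel": "Relationship",
    "subPropertyOfRel": "SPO",
    "domainRel": "DOMAIN",
    "rangeRel": "RANGE",
}


def effective_config(overrides):
    """Overlay user-supplied params on the documented defaults."""
    return {**DEFAULTS, **overrides}


# Same overrides as in the importOntology example above.
config = effective_config(
    {"classLabel": "Category", "objectPropertyLabel": "Rel", "dataTypePropertyLabel": "Prop"}
)
print(config)
```

Any key that is not overridden, such as subClassOfRel, keeps its default value.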

Previewing RDF data

Sometimes, before we go ahead and import RDF data into Neo4j, we want to see what it looks like, or we may even want to take full control with Cypher over the data ingestion process and customise what to do with each parsed triple. For these purposes NSMNTX provides the following procedures.

Streaming triples

The streamRDF procedure also takes the same generic params described in [common_params], so we will invoke it with a URL and a serialisation format just as we would invoke the importRDF procedure:

CALL semantics.streamRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/nsmntx.ttl","Turtle")

It will produce a stream of records, each one representing a triple parsed. So you will get fields for the subject, predicate and object plus three additional ones:

  1. isLiteral: a boolean indicating whether the object of the statement is a literal

  2. literalType: The datatype of the literal value when available

  3. literalLang: The language when available

In the previous example the output would look something like this:

RDF parsed and streamed in Neo4j

The procedure is read-only and nothing is written to the graph. However, it is possible to use Cypher on the output of the procedure to analyse the triples returned, as in this first example:

CALL semantics.streamRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/nsmntx.ttl","Turtle") yield subject, predicate, object
WHERE predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RETURN object as category, count(*) as itemsInCategory

╒═══════════════════════════════════════════╤═════════════════╕
│"category"                                 │"itemsInCategory"│
╞═══════════════════════════════════════════╪═════════════════╡
│"http://neo4j.org/vocab/sw#Neo4jPlugin"    │3                │
├───────────────────────────────────────────┼─────────────────┤
│"http://neo4j.org/vocab/sw#GraphPlatform"  │1                │
├───────────────────────────────────────────┼─────────────────┤
│"http://neo4j.org/vocab/sw#AwesomePlatform"│1                │
└───────────────────────────────────────────┴─────────────────┘

Or even to write to the Graph to create your own custom structure like in this second one:

CALL semantics.streamRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/nsmntx.ttl","Turtle")
YIELD subject, predicate, object, isLiteral
WHERE NOT ( isLiteral OR predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" )
MERGE (from:Thing { id: subject})
MERGE (to:Thing { id: object })
MERGE (from)-[:CONNECTED_TO { id: predicate }]->(to)
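If the streamed records are consumed by a client application rather than by Cypher, the same kind of per-category count can be computed there. The sketch below assumes each record is a dictionary with the fields described above (subject, predicate, object, isLiteral); the sample records are made up for illustration and are not real output of the procedure.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Made-up records shaped like the rows streamRDF yields (not real output).
records = [
    {"subject": "http://example.org/a", "predicate": RDF_TYPE,
     "object": "http://example.org/Plugin", "isLiteral": False},
    {"subject": "http://example.org/b", "predicate": RDF_TYPE,
     "object": "http://example.org/Plugin", "isLiteral": False},
    {"subject": "http://example.org/c", "predicate": RDF_TYPE,
     "object": "http://example.org/Platform", "isLiteral": False},
    {"subject": "http://example.org/a", "predicate": "http://example.org/name",
     "object": "A", "isLiteral": True},
]

# Keep only rdf:type statements and count how many subjects fall in each category.
counts = Counter(r["object"] for r in records if r["predicate"] == RDF_TYPE)

for category, items in counts.most_common():
    print(category, items)
```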

Previewing RDF data

The previewRDF and previewRDFSnippet methods provide a convenient way to visualise some RDF data in the Neo4j browser before we go ahead with the actual import. Like all methods in the Previewing RDF data section, both previewRDF and previewRDFSnippet are read-only, so they will not persist anything in the graph. The difference between them is that previewRDF takes a URL (and optionally additional configuration settings as described in Advanced settings for fetching RDF) whereas previewRDFSnippet takes an RDF fragment as text instead.

Exporting RDF data

In the previous section we covered how to ingest RDF into Neo4j. In this one we will focus on how to generate RDF from our Neo4j graph. We will see that it is possible to serialise any Neo4j graph as RDF, even when the data in Neo4j is not the result of importing RDF.

RDF is a W3C standard model for data interchange on the Web that represents data as a graph, hence the seamless serialisation of graph data from Neo4j in RDF as we’ll see.

There are three main ways of generating RDF from your graph in Neo4j: selecting a node in the graph by its unique identifier (id or uri), selecting a group of nodes by label + property value, and via Cypher. Let’s analyse each of them in detail.

Warning
The paths used in the following sections assume that NSMNTX is mounted at /rdf. If you’ve mounted the extension under a different name (instructions on how to do this can be found in the [install] section) all you need to do is replace the /rdf bits in the urls in the following examples with the name you’ve used.

By node ID

/rdf/describe/id

To explain how this method works, we’ll use the Northwind Graph. You can easily load it in your Neo4j instance by running :play northwind-graph in your browser. This will bring up a guide with step-by-step instructions on how to create it. I’ll assume the graph is now loaded.

The describe/id method emulates the SPARQL DESCRIBE operation. It takes the unique identifier of an element in the graph and it produces an RDF serialisation of all information available about it. This includes properties, labels, and relationships (both incoming and outgoing).

The only required parameter is the id of the node. As you know any node in Neo4j has a unique identifier that you can get via cypher using the id function.

MATCH (p:Product) WHERE p.productName = "Queso Manchego La Pastora" RETURN ID(p)

In my case, the ID returned by the previous query is 11, so if I want NSMNTX to produce an RDF serialisation of this node, all I need to do is issue the following HTTP request:

http://localhost:7474/rdf/describe/id/11

Or if you’re working on your Neo4j browser, you can run it like this too:

:GET /rdf/describe/id/11

And you will get a description for your node, serialised as Turtle RDF by default. Something like this:

@prefix neovoc: <neo4j://vocabulary#> .
@prefix neoind: <neo4j://individuals#> .


neoind:11 a neovoc:Product;
  neovoc:PART_OF neoind:83;
  neovoc:categoryID "4";
  neovoc:discontinued false;
  neovoc:productID "12";
  neovoc:productName "Queso Manchego La Pastora";
  neovoc:quantityPerUnit "10 - 500 g pkgs.";
  neovoc:reorderLevel "0"^^<http://www.w3.org/2001/XMLSchema#long>;
  neovoc:supplierID "5";
  neovoc:unitPrice 3.8E1;
  neovoc:unitsInStock "86"^^<http://www.w3.org/2001/XMLSchema#long>;
  neovoc:unitsOnOrder "0"^^<http://www.w3.org/2001/XMLSchema#long> .

neoind:1038 neovoc:ORDERS neoind:11 .
neoind:684 neovoc:ORDERS neoind:11 .
neoind:1035 neovoc:ORDERS neoind:11 .
neoind:525 neovoc:ORDERS neoind:11 .
neoind:622 neovoc:ORDERS neoind:11 .
neoind:968 neovoc:ORDERS neoind:11 .
neoind:532 neovoc:ORDERS neoind:11 .
neoind:667 neovoc:ORDERS neoind:11 .
neoind:957 neovoc:ORDERS neoind:11 .
neoind:1007 neovoc:ORDERS neoind:11 .
neoind:707 neovoc:ORDERS neoind:11 .
neoind:255 neovoc:ORDERS neoind:11 .
neoind:1066 neovoc:ORDERS neoind:11 .
neoind:428 neovoc:ORDERS neoind:11 .
neoind:104 neovoc:SUPPLIES neoind:11 .

You can modify the output of the describe method as follows:

  • Change the serialisation format, either by using the accept header param with any of the RDF media types ("application/rdf+xml", "text/plain", "text/turtle", "text/n3", "application/trix", "application/x-trig", "application/ld+json") or by using the format request param with any of the following values: Turtle, N-Triples, JSON-LD, TriG, RDF/XML. The format request parameter, if used, will override the accept header param.

  • Exclude relationships and just return the properties and labels of the selected node by setting the request parameter excludeContext to true.

  • Exclude unmapped elements by setting the request parameter showOnlyMapped to true. We’ll see in section Mapping graph models how to define basic model mappings with NSMNTX.

Here’s an example of using some of these modifiers. The following request (again in simplified notation for the Neo4j browser):

:GET /rdf/describe/id/11?format=RDF/XML&excludeContext=true

Would filter out relationships (RDF Object Properties) and set the serialisation format to RDF/XML to produce:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
	xmlns:neovoc="neo4j://vocabulary#"
	xmlns:neoind="neo4j://individuals#"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="neo4j://individuals#11">
	<rdf:type rdf:resource="neo4j://vocabulary#Product"/>
	<neovoc:reorderLevel rdf:datatype="http://www.w3.org/2001/XMLSchema#long">0</neovoc:reorderLevel>
	<neovoc:unitsInStock rdf:datatype="http://www.w3.org/2001/XMLSchema#long">86</neovoc:unitsInStock>
	<neovoc:unitPrice rdf:datatype="http://www.w3.org/2001/XMLSchema#double">38.0</neovoc:unitPrice>
	<neovoc:supplierID>5</neovoc:supplierID>
	<neovoc:productID>12</neovoc:productID>
	<neovoc:quantityPerUnit>10 - 500 g pkgs.</neovoc:quantityPerUnit>
	<neovoc:discontinued rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</neovoc:discontinued>
	<neovoc:productName>Queso Manchego La Pastora</neovoc:productName>
	<neovoc:categoryID>4</neovoc:categoryID>
	<neovoc:unitsOnOrder rdf:datatype="http://www.w3.org/2001/XMLSchema#long">0</neovoc:unitsOnOrder>
</rdf:Description>

</rdf:RDF>
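From a client program, the two ways of choosing the serialisation (the format request parameter and the accept header) could be prepared as follows. This is a sketch using only the Python standard library. It assumes the server runs at http://localhost:7474 with the extension mounted at /rdf; the describe_by_id_request helper is hypothetical, and the request object is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def describe_by_id_request(base_url, node_id, accept=None, **params):
    """Build (without sending) a GET request for /rdf/describe/id/<node_id>."""
    url = "{}/rdf/describe/id/{}".format(base_url, node_id)
    if params:
        # safe="/" keeps 'RDF/XML' readable instead of 'RDF%2FXML'
        url += "?" + urlencode(params, safe="/")
    headers = {"Accept": accept} if accept else {}
    return Request(url, headers=headers)


# Serialisation chosen through request parameters
by_param = describe_by_id_request(
    "http://localhost:7474", 11, format="RDF/XML", excludeContext="true"
)
print(by_param.full_url)

# Serialisation chosen through the accept header
by_header = describe_by_id_request("http://localhost:7474", 11, accept="text/turtle")
print(by_header.get_header("Accept"))
```

Passing either object to urllib.request.urlopen would issue the request; depending on your server configuration, authentication may also be needed.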

By URI

/rdf/describe/uri

If you’ve imported an RDF dataset using NSMNTX (and you did NOT use the IGNORE option), you can now export it and generate exactly the same set of RDF triples that were originally ingested. You can do this in a very similar way to how you do it for any other Neo4j graph. The /rdf/describe/uri method works in exactly the same way as /rdf/describe/id, but instead of taking a node’s ID, it takes its URI. Here’s an example on the graph we imported in section [import]:

:GET /rdf/describe/uri/http%3A%2F%2Fneo4j.org%2Find%23neo4j355?format=RDF/XML

Again, notice the URL encoding of the URI (the clean URI is http://neo4j.org/ind#neo4j355) and the format parameter to specify the serialisation format. Here’s the output of the request:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
	xmlns:neovoc="neo4j://vocabulary#"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="http://neo4j.org/ind#neo4j355">
	<rdf:type rdf:resource="http://neo4j.org/vocab/sw#GraphPlatform"/>
	<rdf:type rdf:resource="http://neo4j.org/vocab/sw#AwesomePlatform"/>
	<name xmlns="http://neo4j.org/vocab/sw#">neo4j</name>
	<version xmlns="http://neo4j.org/vocab/sw#">3.5.5</version>
</rdf:Description>

<rdf:Description rdf:about="http://neo4j.org/ind#graphql3502">
	<runsOn xmlns="http://neo4j.org/vocab/sw#" rdf:resource="http://neo4j.org/ind#neo4j355"/>
</rdf:Description>

<rdf:Description rdf:about="http://neo4j.org/ind#nsmntx3502">
	<runsOn xmlns="http://neo4j.org/vocab/sw#" rdf:resource="http://neo4j.org/ind#neo4j355"/>
</rdf:Description>

<rdf:Description rdf:about="http://neo4j.org/ind#apoc3502">
	<runsOn xmlns="http://neo4j.org/vocab/sw#" rdf:resource="http://neo4j.org/ind#neo4j355"/>
</rdf:Description>

</rdf:RDF>

Additionally, you can provide a graph URI to specify the context of the given resource using the graphuri parameter. Here is how you can serialise as RDF the resource identified by the URI http://www.example.org/exampleDocument#Monica, but only the statements in the named graph http://www.example.org/exampleDocument#G1. Normally such a model will be the result of importing RDF Quads as described in the Handling named graphs (RDF Quads) section. Note that URIs are URL encoded:

:GET /rdf/describe/uri/http%3A%2F%2Fwww.example.org%2FexampleDocument%23Monica?graphuri=http%3A%2F%2Fwww.example.org%2FexampleDocument%23G1&format=TriG
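The URL encoding of the resource and graph URIs can be produced with the Python standard library. The describe_by_uri_path helper below is hypothetical and only builds the request path; it assumes the extension is mounted at /rdf.

```python
from urllib.parse import quote, urlencode


def describe_by_uri_path(resource_uri, **params):
    """Build the /rdf/describe/uri path for a resource, URL-encoding every URI."""
    # safe="" forces ':', '/' and '#' to be percent-encoded as well
    path = "/rdf/describe/uri/" + quote(resource_uri, safe="")
    if params:
        path += "?" + urlencode(params)
    return path


plain = describe_by_uri_path("http://neo4j.org/ind#neo4j355")
in_graph = describe_by_uri_path(
    "http://www.example.org/exampleDocument#Monica",
    graphuri="http://www.example.org/exampleDocument#G1",
    format="TriG",
)
print(plain)
print(in_graph)
```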

By Label + property value

/rdf/describe/find/

An alternative way to select the node (or set of nodes) to serialise as RDF is to do a search by label and property. Let’s say in our Northwind Database example we want to get the Suppliers in a given postal code. The label we’re interested in is Supplier and the property is postalCode. Here’s what a request of this type would look like:

:GET /rdf/describe/find/Supplier/postalCode/EC1%204SD?format=N-Triples

In this request we are setting the serialisation to N-Triples format. Also notice that the property value (EC1 4SD) needs to be URL Encoded. Here’s the output of the request:

<neo4j://individuals#100> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <neo4j://vocabulary#Supplier> .
<neo4j://individuals#100> <neo4j://vocabulary#country> "UK" .
<neo4j://individuals#100> <neo4j://vocabulary#contactTitle> "Purchasing Manager" .
<neo4j://individuals#100> <neo4j://vocabulary#address> "49 Gilbert St." .
<neo4j://individuals#100> <neo4j://vocabulary#supplierID> "1" .
<neo4j://individuals#100> <neo4j://vocabulary#phone> "(171) 555-2222" .
<neo4j://individuals#100> <neo4j://vocabulary#city> "London" .
<neo4j://individuals#100> <neo4j://vocabulary#contactName> "Charlotte Cooper" .
<neo4j://individuals#100> <neo4j://vocabulary#companyName> "Exotic Liquids" .
<neo4j://individuals#100> <neo4j://vocabulary#postalCode> "EC1 4SD" .
<neo4j://individuals#100> <neo4j://vocabulary#region> "NULL" .
<neo4j://individuals#100> <neo4j://vocabulary#fax> "NULL" .
<neo4j://individuals#100> <neo4j://vocabulary#homePage> "NULL" .
<neo4j://individuals#100> <neo4j://vocabulary#SUPPLIES> <neo4j://individuals#0> .
<neo4j://individuals#100> <neo4j://vocabulary#SUPPLIES> <neo4j://individuals#1> .
<neo4j://individuals#100> <neo4j://vocabulary#SUPPLIES> <neo4j://individuals#2> .

By default property values are treated as strings which may or may not work depending on the actual datatype stored in the node property in the Database. If you need to specify the datatype, you’ll need the valType parameter. The following request returns all products with a given price point.

:GET /rdf/describe/find/Product/unitPrice/15?valType=INTEGER&excludeContext

Notice how we are being explicit about the datatype using the valType request parameter. If we removed this parameter the request would return no results because there is no Product in the Northwind Database with a unitPrice stored as a string. Here’s the output produced (default serialisation is Turtle).

@prefix neovoc: <neo4j://vocabulary#> .
@prefix neoind: <neo4j://individuals#> .


neoind:69 a neovoc:Product;
  neovoc:categoryID "1";
  neovoc:discontinued false;
  neovoc:productID "70";
  neovoc:productName "Outback Lager";
  neovoc:quantityPerUnit "24 - 355 ml bottles";
  neovoc:reorderLevel "30"^^<http://www.w3.org/2001/XMLSchema#long>;
  neovoc:supplierID "7";
  neovoc:unitPrice 1.5E1;
  neovoc:unitsInStock "15"^^<http://www.w3.org/2001/XMLSchema#long>;
  neovoc:unitsOnOrder "10"^^<http://www.w3.org/2001/XMLSchema#long> .

neoind:72 a neovoc:Product;
  neovoc:categoryID "8";
  neovoc:discontinued false;
  neovoc:productID "73";
  neovoc:productName "Röd Kaviar";
  neovoc:quantityPerUnit "24 - 150 g jars";
  neovoc:reorderLevel "5"^^<http://www.w3.org/2001/XMLSchema#long>;
  neovoc:supplierID "17";
  neovoc:unitPrice 1.5E1;
  neovoc:unitsInStock "101"^^<http://www.w3.org/2001/XMLSchema#long>;
  neovoc:unitsOnOrder "0"^^<http://www.w3.org/2001/XMLSchema#long> .

The different values that the valType request parameter can take are currently: INTEGER, FLOAT and BOOLEAN.
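The reason valType matters is that a value taken from a URL path is always text, and text does not compare equal to a number. The following Python sketch illustrates the idea only; the coerce function is hypothetical and does not claim to reproduce how NSMNTX converts values internally.

```python
def coerce(raw, val_type=None):
    """Convert the raw path segment to the type named by valType (string by default)."""
    if val_type is None:
        return raw
    if val_type == "INTEGER":
        return int(raw)
    if val_type == "FLOAT":
        return float(raw)
    if val_type == "BOOLEAN":
        return raw.lower() == "true"
    raise ValueError("unsupported valType: " + val_type)


stored_unit_price = 15.0  # a numeric property value, like unitPrice in the output above

print(coerce("15") == stored_unit_price)             # string vs number
print(coerce("15", "INTEGER") == stored_unit_price)  # number vs number
```

Without a valType the comparison is between the string "15" and a number, which is why the untyped request finds no matching products.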

Using Cypher

/rdf/cypher

Finally, the most powerful way of selecting the portion of the graph that we want to serialise as RDF is, of course, to use Cypher. That’s exactly what this method does. In this case it’s a POST request that takes as payload a JSON map with at least one cypher key having as its value the query returning the graph objects (nodes with their properties and relationships) to be serialised.

Optionally, the JSON map may include a format key to override the default serialization format (Turtle), and a showOnlyMapped key (default value is false). When set to true, the returned serialisation will exclude unmapped elements (the same functionality explained in the describe methods). Note that your query needs to return graph elements: nodes, relationships or paths. The request produces an RDF serialization of the nodes and relationships returned by the query. Here’s an example of use on the Northwind database.

:POST /rdf/cypher
{ "cypher" : "MATCH path = (n:Customer { customerID : 'GROSR'})-[:PURCHASED]->(o)-[:ORDERS]->()-[:PART_OF]->(:Category { categoryName : 'Beverages'}) RETURN path " , "format": "RDF/XML" }

This is the subgraph (path) that we are serialising as RDF. We’re taking a customer by its customerID and getting all orders containing items in the Beverages category, which makes for a nice path expression in Cypher:

Customer

And this is the generated RDF/XML.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
	xmlns:neovoc="neo4j://vocabulary#"
	xmlns:neoind="neo4j://individuals#"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="neo4j://individuals#172">
	<rdf:type rdf:resource="neo4j://vocabulary#Customer"/>
	<neovoc:country>Venezuela</neovoc:country>
	<neovoc:address>5ª Ave. Los Palos Grandes</neovoc:address>
	<neovoc:contactTitle>Owner</neovoc:contactTitle>
	<neovoc:city>Caracas</neovoc:city>
	<neovoc:phone>(2) 283-2951</neovoc:phone>
	<neovoc:contactName>Manuel Pereira</neovoc:contactName>
	<neovoc:companyName>GROSELLA-Restaurante</neovoc:companyName>
	<neovoc:postalCode>1081</neovoc:postalCode>
	<neovoc:customerID>GROSR</neovoc:customerID>
	<neovoc:fax>(2) 283-3397</neovoc:fax>
	<neovoc:region>DF</neovoc:region>
</rdf:Description>

<rdf:Description rdf:about="neo4j://individuals#774">
	<rdf:type rdf:resource="neo4j://vocabulary#Order"/>
	<neovoc:shipCity>Caracas</neovoc:shipCity>
	<neovoc:orderID>10785</neovoc:orderID>
	<neovoc:freight>1.51</neovoc:freight>
	<neovoc:requiredDate>1998-01-15 00:00:00.000</neovoc:requiredDate>
	<neovoc:employeeID>1</neovoc:employeeID>
	<neovoc:shipPostalCode>1081</neovoc:shipPostalCode>
	<neovoc:shipName>GROSELLA-Restaurante</neovoc:shipName>
	<neovoc:shipCountry>Venezuela</neovoc:shipCountry>
	<neovoc:shipAddress>5ª Ave. Los Palos Grandes</neovoc:shipAddress>
	<neovoc:shipVia>3</neovoc:shipVia>
	<neovoc:customerID>GROSR</neovoc:customerID>
	<neovoc:shipRegion>DF</neovoc:shipRegion>
	<neovoc:shippedDate>1997-12-24 00:00:00.000</neovoc:shippedDate>
	<neovoc:orderDate>1997-12-18 00:00:00.000</neovoc:orderDate>
</rdf:Description>

<rdf:Description rdf:about="neo4j://individuals#74">
	<rdf:type rdf:resource="neo4j://vocabulary#Product"/>
	<neovoc:reorderLevel rdf:datatype="http://www.w3.org/2001/XMLSchema#long">25</neovoc:reorderLevel>
	<neovoc:unitsInStock rdf:datatype="http://www.w3.org/2001/XMLSchema#long">125</neovoc:unitsInStock>
	<neovoc:unitPrice rdf:datatype="http://www.w3.org/2001/XMLSchema#double">7.75</neovoc:unitPrice>
	<neovoc:supplierID>12</neovoc:supplierID>
	<neovoc:productID>75</neovoc:productID>
	<neovoc:quantityPerUnit>24 - 0.5 l bottles</neovoc:quantityPerUnit>
	<neovoc:discontinued rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</neovoc:discontinued>
	<neovoc:productName>Rhönbräu Klosterbier</neovoc:productName>
	<neovoc:categoryID>1</neovoc:categoryID>
	<neovoc:unitsOnOrder rdf:datatype="http://www.w3.org/2001/XMLSchema#long">0</neovoc:unitsOnOrder>
</rdf:Description>

<rdf:Description rdf:about="neo4j://individuals#80">
	<rdf:type rdf:resource="neo4j://vocabulary#Category"/>
	<neovoc:description>Soft drinks, coffees, teas, beers, and ales</neovoc:description>
	<neovoc:categoryName>Beverages</neovoc:categoryName>
	<neovoc:picture>0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E5069637475726500010500000200000007000000504272757368000000000000000000A0290000424D98290000000000005600000028000000AC00000078000000010004000000000000000000880B0000880B0000080000</neovoc:picture>
	<neovoc:categoryID>1</neovoc:categoryID>
</rdf:Description>

<rdf:Description rdf:about="neo4j://individuals#172">
	<neovoc:PURCHASED rdf:resource="neo4j://individuals#774"/>
</rdf:Description>

<rdf:Description rdf:about="neo4j://individuals#774">
	<neovoc:ORDERS rdf:resource="neo4j://individuals#74"/>
</rdf:Description>

<rdf:Description rdf:about="neo4j://individuals#74">
	<neovoc:PART_OF rdf:resource="neo4j://individuals#80"/>
</rdf:Description>

</rdf:RDF>

And here’s the graph visualisation produced by the W3C’s RDF validation service for this RDF. Feel free to test the parsing of the generated RDF yourself. You can do it manually copy-pasting it in the form, or you can point directly to your Neo4j instance RDF endpoint if the URL is publicly accessible.

RDF Graph visualisation generated by W3C RDF Validation service

It is possible to pass parameters to the query using the cypherParams parameter in the request, and you should use parameters whenever possible. Here’s exactly the same request, this time passing the customerID as a parameter to the Cypher query.

:POST /rdf/cypher
{ "cypher" : "MATCH path = (n:Customer { customerID : $custid })-[:PURCHASED]->(o)-[:ORDERS]->()-[:PART_OF]->(:Category { categoryName : 'Beverages'}) RETURN path " , "cypherParams" : { "custid": "GROSR" }, "format": "RDF/XML" }
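For clients outside the Neo4j browser, the same request can be assembled programmatically. The following is a minimal Python sketch, assuming the extension is mounted at /rdf on http://localhost:7474 as in the examples in this manual. The helper names build_cypher_payload and build_request are hypothetical, and no authentication is configured. The sketch only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

def build_cypher_payload(cypher, params=None, fmt="RDF/XML"):
    """Assemble the JSON body for /rdf/cypher (keys taken from the example above)."""
    payload = {"cypher": cypher, "format": fmt}
    if params:
        # Values travel separately from the query text, so they are never
        # concatenated into the Cypher string.
        payload["cypherParams"] = params
    return payload

def build_request(payload, base_url="http://localhost:7474"):
    """Wrap the payload in a POST request object without sending it."""
    return urllib.request.Request(
        url=base_url + "/rdf/cypher",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

query = (
    "MATCH path = (n:Customer { customerID : $custid })-[:PURCHASED]->(o)"
    "-[:ORDERS]->()-[:PART_OF]->(:Category { categoryName : 'Beverages'}) "
    "RETURN path"
)
payload = build_cypher_payload(query, {"custid": "GROSR"})
request = build_request(payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Keeping the customer id in cypherParams rather than in the query string means the same query text can be reused for any customer.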

/rdf/cypheronrdf

And finally, if the graph in your Neo4j DB is the result of importing an RDF dataset using NSMNTX (and of course if you did NOT use the IGNORE option), /rdf/cypheronrdf will work in exactly the same way as /rdf/cypher but will use the stored namespace information to generate exactly the same RDF triples that were originally ingested. The parameters are identical to the previous case. Here’s an example on the graph we imported in section Importing RDF data that returns a plugin’s information given a releaseDate:

:POST /rdf/cypheronrdf
{ "cypher":"MATCH (neo4j:ns0__GraphPlatform)<-[ro:ns0__runsOn]-(plugin:ns0__Neo4jPlugin) WHERE plugin.ns0__releaseDate = '03-06-2019' RETURN plugin, ro, neo4j " , "format" : "JSON-LD"}

This request sets the serialisation format to JSON-LD, so it produces the following RDF fragment:

[ {
  "@id" : "http://neo4j.org/ind#neo4j355",
  "@type" : [ "http://neo4j.org/vocab/sw#GraphPlatform", "http://neo4j.org/vocab/sw#AwesomePlatform" ],
  "http://neo4j.org/vocab/sw#name" : [ {
    "@value" : "neo4j"
  } ],
  "http://neo4j.org/vocab/sw#version" : [ {
    "@value" : "3.5.5"
  } ]
}, {
  "@id" : "http://neo4j.org/ind#nsmntx3502",
  "@type" : [ "http://neo4j.org/vocab/sw#Neo4jPlugin" ],
  "http://neo4j.org/vocab/sw#name" : [ {
    "@value" : "NSMNTX"
  } ],
  "http://neo4j.org/vocab/sw#releaseDate" : [ {
    "@value" : "03-06-2019"
  } ],
  "http://neo4j.org/vocab/sw#runsOn" : [ {
    "@id" : "http://neo4j.org/ind#neo4j355"
  } ],
  "http://neo4j.org/vocab/sw#version" : [ {
    "@value" : "3.5.0.2"
  } ]
} ]
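
The fragment above appears to be in JSON-LD expanded form (full IRIs, values wrapped in @value or @id objects), which makes it easy to post-process without an RDF library. Here is a minimal Python sketch, using a trimmed copy of the output and a hypothetical helper jsonld_to_triples, that flattens it back into subject/predicate/object triples. It handles only the constructs that appear in this example.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def jsonld_to_triples(doc):
    """Flatten expanded JSON-LD node objects into (s, p, o) tuples.

    Only the constructs present in the example are handled: @id, @type,
    and property values given as lists of {"@value": ...} or {"@id": ...}.
    """
    triples = []
    for node in doc:
        subject = node["@id"]
        for type_iri in node.get("@type", []):
            triples.append((subject, RDF_TYPE, type_iri))
        for predicate, values in node.items():
            if predicate.startswith("@"):
                continue  # JSON-LD keywords were handled above
            for value in values:
                obj = value["@id"] if "@id" in value else value["@value"]
                triples.append((subject, predicate, obj))
    return triples

sample = json.loads("""
[ { "@id" : "http://neo4j.org/ind#nsmntx3502",
    "@type" : [ "http://neo4j.org/vocab/sw#Neo4jPlugin" ],
    "http://neo4j.org/vocab/sw#name" : [ { "@value" : "NSMNTX" } ],
    "http://neo4j.org/vocab/sw#runsOn" : [ { "@id" : "http://neo4j.org/ind#neo4j355" } ] } ]
""")

triples = jsonld_to_triples(sample)
for triple in triples:
    print(triple)
```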

Run this Cypher instead, MATCH (n:Resource)-[r]-(m) RETURN *, and you’ll be returning the whole dataset, or in other words, regenerating from Neo4j exactly the same RDF that we ingested in the first place.

Export Graph Ontology

It is possible to export your Graph schema in the form of an OWL Ontology. The same output produced by the db.schema() procedure can be generated as RDF/OWL through the /onto method.

/rdf/onto

The /onto method will run db.schema() on your Neo4j graph and will generate owl:Class definitions for each label found, and owl:ObjectProperty definitions for each relationship along with rdfs:domain and rdfs:range based on the labels of their start and end nodes. Here’s an example of the output for the Neo4j Movie database.

:GET /rdf/onto

or

http://localhost:7474/rdf/onto

And the ontology generated would be:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix neovoc: <neo4j://vocabulary#> .
@prefix neoind: <neo4j://individuals#> .


neovoc:Movie a owl:Class;
  rdfs:label "Movie" .

neovoc:Person a owl:Class;
  rdfs:label "Person" .

neovoc:ACTED_IN a owl:ObjectProperty;
  rdfs:domain neovoc:Person;
  rdfs:range neovoc:Movie .

neovoc:REVIEWED a owl:ObjectProperty;
  rdfs:domain neovoc:Person;
  rdfs:range neovoc:Movie .

neovoc:PRODUCED a owl:ObjectProperty;
  rdfs:domain neovoc:Person;
  rdfs:range neovoc:Movie .

neovoc:WROTE a owl:ObjectProperty;
  rdfs:domain neovoc:Person;
  rdfs:range neovoc:Movie .

neovoc:FOLLOWS a owl:ObjectProperty;
  rdfs:domain neovoc:Person;
  rdfs:range neovoc:Person .

neovoc:DIRECTED a owl:ObjectProperty;
  rdfs:domain neovoc:Person;
  rdfs:range neovoc:Movie .
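
To make the rule described above concrete, here is a small Python sketch that produces the same kind of Turtle from a list of labels and (start label, relationship type, end label) tuples. It illustrates the mapping only and is not the plugin’s implementation. The function name schema_to_turtle and the input shape are assumptions, and the @prefix declarations are omitted for brevity.

```python
def schema_to_turtle(labels, relationships, prefix="neovoc"):
    """Render owl:Class / owl:ObjectProperty statements as Turtle text.

    labels:        iterable of node label names
    relationships: iterable of (start_label, rel_type, end_label) tuples
    """
    blocks = []
    for label in labels:
        # One owl:Class per node label.
        blocks.append(
            f'{prefix}:{label} a owl:Class;\n'
            f'  rdfs:label "{label}" .'
        )
    for start, rel_type, end in relationships:
        # One owl:ObjectProperty per relationship type, with domain and
        # range taken from the labels of the start and end nodes.
        blocks.append(
            f'{prefix}:{rel_type} a owl:ObjectProperty;\n'
            f'  rdfs:domain {prefix}:{start};\n'
            f'  rdfs:range {prefix}:{end} .'
        )
    return "\n\n".join(blocks)

turtle = schema_to_turtle(
    labels=["Movie", "Person"],
    relationships=[("Person", "ACTED_IN", "Movie"), ("Person", "FOLLOWS", "Person")],
)
print(turtle)
```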

It is possible to set the serialisation format using the accept header param or the format request param. The following request would serialise the ontology as N-Triples.

:GET /rdf/onto?format=N-Triples

/rdf/ontonrdf

Similarly, if the Neo4j graph is the result of importing RDF via semantics.importRDF, the Ontology can be exported by running ontonrdf, which will take care of expanding the namespaces shortened in the import process.

:GET /rdf/ontonrdf

Applied to the example dataset about Neo4j plugins used in section Importing RDF data, this would produce the following ontology:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix neovoc: <neo4j://vocabulary#> .
@prefix neoind: <neo4j://individuals#> .


<http://neo4j.org/vocab/sw#GraphPlatform> a owl:Class;
  rdfs:label "GraphPlatform" .

<http://neo4j.org/vocab/sw#Neo4jPlugin> a owl:Class;
  rdfs:label "Neo4jPlugin" .

<http://neo4j.org/vocab/sw#AwesomePlatform> a owl:Class;
  rdfs:label "AwesomePlatform" .

<http://neo4j.org/vocab/sw#runsOn> a owl:ObjectProperty;
  rdfs:domain <http://neo4j.org/vocab/sw#Neo4jPlugin>;
  rdfs:label "runsOn";
  rdfs:range <http://neo4j.org/vocab/sw#AwesomePlatform>, <http://neo4j.org/vocab/sw#GraphPlatform> .

Mapping graph models

Mappings can be used for applying transformations to the RDF as it’s imported into Neo4j and they can also be used to transform the vocabulary used in a Neo4j graph as it’s exported through the different RDF export methods described in Exporting RDF data. Mappings are based on terminology but they will not modify the structure of the graph. In other words, as we will see in this section, you will be able to use them to rename a property, a relationship or a label but not to change a property into a relationship.

Public Vocabularies/Ontologies

A public graph model is also called an Ontology (or a schema, or a vocabulary). We will not go into the details of the subtle differences between each flavour in this manual. All we need to know is that a graph model normally defines a set of terms (categories, properties, relationships…​) and how they relate to each other. Some common examples are schema.org, FIBO or the Gene Ontology. Public vocabularies like the ones mentioned typically identify their terms uniquely by using namespaces, so roughly speaking, a namespace identifies a vocabulary (or at least a portion of it).

To create a mapping with NSMNTX we need to do two things: first, create a reference to a public schema, and then use that reference to create individual mappings from elements in the Neo4j schema to elements in the public schema. Here’s how to do it:

Defining mappings

Let’s say we want to map our movie database schema to the public schema.org vocabulary. First we’ll create a reference to the schema.org vocabulary, passing the base URI and a prefix to be used in the RDF serialisation. You can use a standard prefix or an acronym of your choice. Just make sure that both the URI and the prefix are unique in your mapping definition.

call semantics.mapping.addSchema("http://schema.org/","sch")

The call to create a reference to a public vocabulary will output the newly created reference or, alternatively, an error message indicating what went wrong:

╒════════╤════════════════════╕
│"prefix"│"namespace"         │
╞════════╪════════════════════╡
│"sch"   │"http://schema.org/"│
└────────┴────────────────────┘

We can create as many references to public vocabularies as needed, and there is also a useful method (mapping.addCommonSchemas) that can be used to include a set of the most common schemas in one go:

call semantics.mapping.addCommonSchemas()

References to schemas can be removed using the mapping.dropSchema method, passing as a single parameter the exact URI of the vocabulary we want to delete. Notice that this will remove both the schema and all element mappings defined on it.

call semantics.mapping.dropSchema("http://purl.org/dc/elements/1.1/")

And we can list the currently existing schemas by running mapping.listSchemas. This method takes an optional string parameter that can be used to filter the list to the ones that match a particular search string in the schema uri or in the prefix. If we run the following after running the mapping.addCommonSchemas:

call semantics.mapping.listSchemas("rdf")

We would get the following results:

╒════════╤═════════════════════════════════════════════╕
│"prefix"│"namespace"                                  │
╞════════╪═════════════════════════════════════════════╡
│"rdfs"  │"http://www.w3.org/2000/01/rdf-schema#"      │
├────────┼─────────────────────────────────────────────┤
│"rdf"   │"http://www.w3.org/1999/02/22-rdf-syntax-ns#"│
└────────┴─────────────────────────────────────────────┘

Once we have defined a reference to a public vocabulary/schema, we can create actual mappings from elements in our graph to elements in the public schema. We do this with the mapping.addMappingToSchema procedure. This method takes three parameters: the URI of a public schema previously added via mapping.addSchema, the name of the element in our graph (a property name, a label or a relationship type) and the matching element in the public schema.

The following example shows how to define a mapping from a CHILD_CATEGORY relationship type in a Neo4j graph to the skos:narrower relationship (or ObjectProperty in RDF terminology).

call semantics.mapping.addMappingToSchema("http://www.w3.org/2004/02/skos/core#", "CHILD_CATEGORY", "narrower")

Just like we did with schema references, we can list existing mappings using mapping.listMappings and filter the list with an optional search string parameter to return only mappings where either the graph element name or the schema element name match the search string.

call semantics.mapping.listMappings()

Producing a listing with the following structure:

╒══════════════════════════════════════╤══════════════╤═══════════════╤════════════════╕
│"schemaNs"                            │"schemaPrefix"│"schemaElement"│"elemName"      │
╞══════════════════════════════════════╪══════════════╪═══════════════╪════════════════╡
│"http://www.w3.org/2004/02/skos/core#"│"skos"        │"narrower"     │"CHILD_CATEGORY"│
└──────────────────────────────────────┴──────────────┴───────────────┴────────────────┘

It is also possible to remove individual mappings with mapping.dropMapping, passing as a single parameter the name of the graph model element on which the mapping is defined.

call semantics.mapping.dropMapping("CHILD_CATEGORY")
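
The behaviour of the mapping procedures described in this section can be summarised with a small in-memory model. The Python sketch below is purely illustrative: the class MappingRegistry and its methods are hypothetical stand-ins that mirror the documented behaviour (a schema must exist before mapping to it, dropping a schema also drops its mappings, listings can be filtered by a search string). It is not the way NSMNTX stores mappings internally, and it assumes one mapping per graph element.

```python
class MappingRegistry:
    """Toy model of schema references and element mappings."""

    def __init__(self):
        self.schemas = {}   # namespace URI -> prefix
        self.mappings = {}  # graph element name -> (namespace URI, schema element)

    def add_schema(self, namespace, prefix):
        # Both the namespace and the prefix must be unique.
        if namespace in self.schemas or prefix in self.schemas.values():
            raise ValueError("namespace or prefix already in use")
        self.schemas[namespace] = prefix

    def add_mapping(self, namespace, elem_name, schema_elem):
        if namespace not in self.schemas:
            raise KeyError("schema must be added before mapping to it")
        self.mappings[elem_name] = (namespace, schema_elem)

    def drop_schema(self, namespace):
        # Removing a schema also removes every mapping defined on it.
        del self.schemas[namespace]
        self.mappings = {
            name: target for name, target in self.mappings.items()
            if target[0] != namespace
        }

    def drop_mapping(self, elem_name):
        del self.mappings[elem_name]

    def list_schemas(self, search=""):
        # Match the search string against the namespace URI or the prefix.
        return [
            (prefix, ns) for ns, prefix in self.schemas.items()
            if search in ns or search in prefix
        ]

    def list_mappings(self, search=""):
        # Match against the graph element name or the schema element name.
        return [
            (ns, self.schemas[ns], schema_elem, name)
            for name, (ns, schema_elem) in self.mappings.items()
            if search in name or search in schema_elem
        ]

SKOS = "http://www.w3.org/2004/02/skos/core#"
registry = MappingRegistry()
registry.add_schema(SKOS, "skos")
registry.add_mapping(SKOS, "CHILD_CATEGORY", "narrower")
print(registry.list_mappings())
registry.drop_schema(SKOS)
print(registry.list_mappings())
```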

Mappings for export

Let’s look at the case where we want to publish a graph in Neo4j but we want to map it to our organisation’s canonical model, our Enterprise Ontology or any public vocabulary. For this example we’re going to use the Northwind database in Neo4j (:play northwind-graph) and the public schema.org vocabulary.

Here’s the script that defines the reference to the schema.org public vocabulary and a few individual mappings for elements in the Northwind database in Neo4j.

//set parameter uri ->   :param uri: "http://schema.org/"

CALL semantics.mapping.addSchema($uri,"sch");
CALL semantics.mapping.addMappingToSchema($uri,"Order","Order");
CALL semantics.mapping.addMappingToSchema($uri,"orderID","orderNumber");
CALL semantics.mapping.addMappingToSchema($uri,"orderDate","orderDate");

CALL semantics.mapping.addMappingToSchema($uri,"ORDERS","orderedItem");

CALL semantics.mapping.addMappingToSchema($uri,"Product","Product");
CALL semantics.mapping.addMappingToSchema($uri,"productID","productID");
CALL semantics.mapping.addMappingToSchema($uri,"productName","name");

CALL semantics.mapping.addMappingToSchema($uri,"PART_OF","category");

CALL semantics.mapping.addMappingToSchema($uri,"categoryName","name");

After running the previous script, we can check that the mappings have been correctly defined with

call semantics.mapping.listMappings()

That should return:

╒════════════════════╤══════════════╤═══════════════╤══════════════╕
│"schemaNs"          │"schemaPrefix"│"schemaElement"│"elemName"    │
╞════════════════════╪══════════════╪═══════════════╪══════════════╡
│"http://schema.org/"│"sch"         │"Order"        │"Order"       │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"orderNumber"  │"orderID"     │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"orderDate"    │"orderDate"   │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"orderedItem"  │"ORDERS"      │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"Product"      │"Product"     │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"productID"    │"productID"   │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"name"         │"productName" │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"category"     │"PART_OF"     │
├────────────────────┼──────────────┼───────────────┼──────────────┤
│"http://schema.org/"│"sch"         │"name"         │"categoryName"│
└────────────────────┴──────────────┴───────────────┴──────────────┘

Now we can see these mappings in action by running any of the RDF generating methods described in Exporting RDF data (/describe/id, /describe/find/ or /cypher). Let’s use the /cypher method to serialise as RDF an order given its orderID.

:POST /rdf/cypher
{ "cypher" : "MATCH path = (n:Order { orderID : '10785'})-[:ORDERS]->()-[:PART_OF]->(:Category { categoryName : 'Beverages'}) RETURN path " , "format": "RDF/XML" , "mappedElemsOnly" : true }

The Cypher query uses the elements in the Neo4j graph but the generated RDF uses schema.org vocabulary elements. The mapping we just defined is bridging the two. Note that the mapping is completely dynamic which means that any change to the mapping definition will be applied to any subsequent request.

Warning
Elements for which no mapping has been defined will use the default Neo4j schema but we can specify that only mapped elements are to be exported by setting the mappedElemsOnly parameter to true in the request.

Here’s the output generated by the previous request:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
	xmlns:neovoc="neo4j://com.neo4j/voc#"
	xmlns:neoind="neo4j://com.neo4j/indiv#"
	xmlns:sch="http://schema.org/"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="neo4j://com.neo4j/indiv#786">
	<rdf:type rdf:resource="http://schema.org/Order"/>
	<sch:orderNumber>10785</sch:orderNumber>
	<sch:orderDate>1997-12-18 00:00:00.000</sch:orderDate>
</rdf:Description>

<rdf:Description rdf:about="neo4j://com.neo4j/indiv#74">
	<rdf:type rdf:resource="http://schema.org/Product"/>
	<sch:productID>75</sch:productID>
	<sch:name>Rhönbräu Klosterbier</sch:name>
</rdf:Description>

<rdf:Description rdf:about="neo4j://com.neo4j/indiv#80">
	<sch:name>Beverages</sch:name>
</rdf:Description>

<rdf:Description rdf:about="neo4j://com.neo4j/indiv#786">
	<sch:orderedItem rdf:resource="neo4j://com.neo4j/indiv#74"/>
</rdf:Description>

<rdf:Description rdf:about="neo4j://com.neo4j/indiv#74">
	<sch:category rdf:resource="neo4j://com.neo4j/indiv#80"/>
</rdf:Description>

</rdf:RDF>
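
The renaming visible in the output above can be sketched in a few lines of Python. This is an illustration of the documented behaviour under stated assumptions, not the plugin’s code: the function export_name is hypothetical, and the default vocabulary namespace is copied from the xmlns:neovoc declaration in the output.

```python
SCHEMA_NS = "http://schema.org/"
DEFAULT_NS = "neo4j://com.neo4j/voc#"  # assumed default, taken from the output above

# Graph element name -> schema.org element, as defined in the mapping script.
MAPPINGS = {
    "Order": "Order", "orderID": "orderNumber", "orderDate": "orderDate",
    "ORDERS": "orderedItem", "Product": "Product", "productID": "productID",
    "productName": "name", "PART_OF": "category", "categoryName": "name",
}

def export_name(elem_name, mapped_elems_only=False):
    """Return the IRI used for a graph element on export, or None if it is skipped."""
    if elem_name in MAPPINGS:
        return SCHEMA_NS + MAPPINGS[elem_name]
    if mapped_elems_only:
        return None  # unmapped elements are left out of the export
    return DEFAULT_NS + elem_name  # unmapped elements fall back to the default vocabulary

for name in ["orderID", "productName", "Category", "unitPrice"]:
    print(name, "->", export_name(name, mapped_elems_only=True))
```

With mapped_elems_only set to True, the unmapped Category label and unitPrice property are skipped, which is why the Beverages node in the output carries no rdf:type.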

There’s another example of use of mappings for export in this blog post.

Mappings for import

In this section we’ll see how to use mappings to apply changes to an RDF dataset on ingestion using the RDF import procedures described in Importing RDF data.

Let’s say we are importing into Neo4j the Open PermID dataset from Thomson Reuters. Here is a small fragment of the 'Person' file:

@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix permid: <https://permid.org/> .

permid:1-34419230351
  a vcard:Person ;
  vcard:given-name "Keith"^^xsd:string .

permid:1-34419198943
  vcard:family-name "Peltz"^^xsd:string ;
  vcard:given-name "Maxwell"^^xsd:string ;
  vcard:additional-name "S"^^xsd:string ;
  a vcard:Person .

permid:1-34418273443
  vcard:family-name "Benner"^^xsd:string ;
  vcard:given-name "Thomas"^^xsd:string ;
  a vcard:Person ;
  vcard:friend-of <https://permid.org/1-34419230351> .

As part of the import process, we want to drop the namespaces (as described in Importing RDF data, this can be done using the handleVocabUris: "IGNORE" configuration) BUT in this case, we also want to create more neo4j-friendly names for properties. We want to get rid of the dashes in property names like given-name or additional-name and use 'camelCase' notation instead. The way to tell NSMNTX to do that is by defining a model mapping and setting the handleVocabUris parameter on import to 'MAP'.

We’ll start by defining a mapping like the one we defined for exporting RDF. Note that the properties we want to map are all in the same vcard vocabulary: http://www.w3.org/2006/vcard/ns#. The following script should do the job:

WITH
[{ neoSchemaElem : "givenName", publicSchemaElem:	"given-name" },
{ neoSchemaElem : "familyName", publicSchemaElem: "family-name" },
{ neoSchemaElem : "additionalName", publicSchemaElem: "additional-name" },
{ neoSchemaElem : "FRIEND_OF", publicSchemaElem: "friend-of" }] AS mappings,
"http://www.w3.org/2006/vcard/ns#" AS vcardUri

CALL semantics.mapping.addSchema(vcardUri,"vcard") YIELD namespace
UNWIND mappings as m
CALL semantics.mapping.addMappingToSchema(vcardUri,m.neoSchemaElem,m.publicSchemaElem) YIELD schemaElement
RETURN count(schemaElement) AS mappingsDefined

Just like we did in the previous section, we define a vocabulary with mapping.addSchema and then we add individual mappings for elements in the vocabulary with mapping.addMappingToSchema. If there were multiple vocabularies to map, we would just need to repeat the process for each of them.

Now we can check that the mappings are correctly defined by running:

CALL semantics.mapping.listMappings()

which produces:

╒══════════════════════════════════╤══════════════╤═════════════════╤════════════════╕
│"schemaNs"                        │"schemaPrefix"│"schemaElement"  │"elemName"      │
╞══════════════════════════════════╪══════════════╪═════════════════╪════════════════╡
│"http://www.w3.org/2006/vcard/ns#"│"vcard"       │"given-name"     │"givenName"     │
├──────────────────────────────────┼──────────────┼─────────────────┼────────────────┤
│"http://www.w3.org/2006/vcard/ns#"│"vcard"       │"family-name"    │"familyName"    │
├──────────────────────────────────┼──────────────┼─────────────────┼────────────────┤
│"http://www.w3.org/2006/vcard/ns#"│"vcard"       │"additional-name"│"additionalName"│
├──────────────────────────────────┼──────────────┼─────────────────┼────────────────┤
│"http://www.w3.org/2006/vcard/ns#"│"vcard"       │"friend-of"      │"FRIEND_OF"     │
└──────────────────────────────────┴──────────────┴─────────────────┴────────────────┘

It is important to note that when using the option handleVocabUris: "MAP", all non-mapped vocabulary elements will get the default treatment they get when the 'IGNORE' option is selected.
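
As a rough illustration of this rule, the Python sketch below resolves a vocabulary URI to the name used in Neo4j. It assumes that the 'IGNORE' treatment simply keeps the local part of the URI (whatever follows the last '#' or '/'), which is how dropping the namespace is interpreted here. The helper names local_name and neo4j_name are hypothetical.

```python
VCARD = "http://www.w3.org/2006/vcard/ns#"

# Full vocabulary URI -> Neo4j element name, as defined in the mapping script.
MAPPINGS = {
    VCARD + "given-name": "givenName",
    VCARD + "family-name": "familyName",
    VCARD + "additional-name": "additionalName",
    VCARD + "friend-of": "FRIEND_OF",
}

def local_name(uri):
    """Strip the namespace: keep what follows the last '#' or '/' (assumed IGNORE behaviour)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]

def neo4j_name(uri):
    """Use the mapping when one exists, otherwise fall back to the local name."""
    return MAPPINGS.get(uri, local_name(uri))

for uri in [VCARD + "given-name", VCARD + "friend-of", VCARD + "Person"]:
    print(uri, "->", neo4j_name(uri))
```

Here vcard:Person has no mapping, so it falls back to its local name and becomes the Person label.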

Once the mappings are defined, we can run the import process as described in Importing RDF data with the mentioned config param handleVocabUris: 'MAP' as follows:

CALL semantics.importRDF("http://jbarrasa.github.io/neosemantics/docs/rdf/permid-person-fragment.ttl","Turtle", { handleVocabUris: 'MAP' })

After the data load, we will be able to query the imported graph with much friendlier Cypher:

MATCH (n:Person) RETURN n.uri AS uri, n.familyName as familyName LIMIT 10

to get:

╒══════════════════════════════════╤══════════════╕
│"uri"                             │"familyName"  │
╞══════════════════════════════════╪══════════════╡
│"https://permid.org/1-34419230351"│null          │
├──────────────────────────────────┼──────────────┤
│"https://permid.org/1-34418273443"│"Benner"      │
├──────────────────────────────────┼──────────────┤
│"https://permid.org/1-34419198943"│"Peltz"       │
└──────────────────────────────────┴──────────────┘

Note
The combination of a mapping definition plus the use of the handleVocabUris: 'MAP' configuration can be applied not only to the semantics.importRDF procedure but also to the preview procedures semantics.previewRDF and semantics.previewRDFSnippet.

Inferencing/Reasoning

By inferencing/reasoning we understand the process of getting information from the Neo4j database that is not explicitly stored. Here is a simple example: you have in your Neo4j graph some nodes labeled as loans and some nodes labeled as mortgages. If you manage to express the fact that a mortgage is a type of loan and consequently nodes labeled as mortgages are loans too, then you could expect your smart Neo4j DB to apply this reasoning on the fly and return both loan and mortgage nodes when you query for loans (even though you never explicitly labeled mortgage nodes as loans).

This kind of reasoning/inferencing is what this set of procedures will help you with.

Hierarchies of Categories

To model a hierarchy of categories we’ll typically use nodes in the graph to represent the categories and related nodes connected through SUBCAT_OF or NARROWER_THAN relationships (or whatever your choice of terminology will be).

Here’s a set of categories from the Library of Congress Subject Headings.

CREATE (c:LCSHTopic { authoritativeLabel: "Crystallography", dbLabel: "Crystallography", identifier: "sh 85034498" })
CREATE (po:LCSHTopic { authoritativeLabel: "Physical optics", dbLabel: "PhysicalOptics", identifier: "sh 85095187" })
CREATE (s:LCSHTopic { authoritativeLabel: "Solids", dbLabel: "Solids", identifier: "sh 85124647" })
CREATE (c)<-[:NARROWER_THAN]-(:LCSHTopic { authoritativeLabel: "Crystal optics", dbLabel: "CrystalOptics", identifier: "sh 85034488" })-[:NARROWER_THAN]->(po)
CREATE (c)<-[:NARROWER_THAN]-(:LCSHTopic { authoritativeLabel: "Crystals", dbLabel: "Crystals", identifier: "sh 85034503" })-[:NARROWER_THAN]->(s)
CREATE (c)<-[:NARROWER_THAN]-(:LCSHTopic { authoritativeLabel: "Dimorphism (Crystallography)", dbLabel: "DimorphismCrystallography", identifier: "sh 2007001101" })
CREATE (c)<-[:NARROWER_THAN]-(:LCSHTopic { authoritativeLabel: "Isomorphism (Crystallography)", dbLabel: "IsomorphismCrystallography", identifier: "sh 85068653" })

In this example we use LCSHTopic to label the categories and NARROWER_THAN to link them in a hierarchy that, as we can see in this fragment, does not necessarily need to be a tree (in the general case it will be a graph).

Topic hierarchy from the LCSH

We have defined a hierarchy of categories and now we’ll want to annotate individuals with the categories defined. To do this we have two main options:

  • We can use labels to tag a node representing an individual with the category it belongs to. While this approach is preferable in many cases, it will be harder to navigate to nodes with related labels (by related in this case I mean super or sublabels).

  • We can link nodes representing individuals to the category (or categories) they belong to using a TYPE or IN_CATEGORY relationship (or again whatever your preferred name for that relationship is).

The following methods will help you leverage explicit class hierarchies in your graph to run inferences, whichever of the modelling approaches described above you follow.

semantics.inference.nodesLabelled

Let’s look at the first way of annotating individuals. This script creates a few publications (books) from the British National Library catalog and sets a label on each of them. The labels match the categories defined before.

CREATE (:Book:CrystalOptics { title: "Crystals and light", identifier: "2167673"})
CREATE (:Book:CrystalOptics { title: "Optical crystallography", identifier: "11916857"})

CREATE (:Book:IsomorphismCrystallography { title: "Isomorphism in minerals", identifier: "8096161"})

CREATE (:Book:Crystals { title: "Crystals and life", identifier: "12873809"})
CREATE (:Book:Crystals { title: "Highlights in applied mineralogy", identifier: "20234576"})

Note that in this case there is no relationship connecting the books with the category they belong to. We are using labels instead. But there is an explicit hierarchy for these labels that we want to exploit.

What we want now is to be able to ask Neo4j for all the books on Crystallography and get all those actually labelled as Crystallography but also all those labelled as any of Crystallography’s subcategories. That’s exactly what the semantics.inference.nodesLabelled procedure does for us. All we need to do is pass the details of how the category hierarchy is built: catLabel will contain the label used to describe categories (the default is Label), which in our case is LCSHTopic. We’ll also need to specify in the subCatRel parameter the relationship used to define the hierarchy (the default is SLO for Sub Label Of), which in our case is NARROWER_THAN. Finally, we need to specify in the catNameProp parameter the property in the category node containing the label name (the default is name), which in our example is dbLabel.

CALL semantics.inference.nodesLabelled('Crystallography',  { catNameProp: "dbLabel", catLabel: "LCSHTopic", subCatRel: "NARROWER_THAN" }) YIELD node
RETURN node.identifier as id, node.title as title, labels(node) as categories

When we run this query, and even though not a single node in our graph is actually labelled as Crystallography, we get the following results:

╒══════════╤══════════════════════════════════╤═════════════════════════════════════╕
│"id"      │"title"                           │"categories"                         │
╞══════════╪══════════════════════════════════╪═════════════════════════════════════╡
│"2167673" │"Crystals and light"              │["CrystalOptics","Book"]             │
├──────────┼──────────────────────────────────┼─────────────────────────────────────┤
│"11916857"│"Optical crystallography"         │["CrystalOptics","Book"]             │
├──────────┼──────────────────────────────────┼─────────────────────────────────────┤
│"12873809"│"Crystals and life"               │["Crystals","Book"]                  │
├──────────┼──────────────────────────────────┼─────────────────────────────────────┤
│"20234576"│"Highlights in applied mineralogy"│["Crystals","Book"]                  │
├──────────┼──────────────────────────────────┼─────────────────────────────────────┤
│"8096161" │"Isomorphism in minerals"         │["IsomorphismCrystallography","Book"]│
└──────────┴──────────────────────────────────┴─────────────────────────────────────┘
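
To see why all five books come back, it helps to reproduce the inference by hand. The Python sketch below is an illustration of the idea and not the procedure’s implementation: it collects the requested label plus every label below it in the NARROWER_THAN hierarchy, then keeps the books carrying any of those labels. The data is copied from the scripts above; the function names are hypothetical.

```python
from collections import deque

# (narrower dbLabel, broader dbLabel) pairs, copied from the NARROWER_THAN
# relationships created earlier.
NARROWER_THAN = [
    ("CrystalOptics", "Crystallography"),
    ("CrystalOptics", "PhysicalOptics"),
    ("Crystals", "Crystallography"),
    ("Crystals", "Solids"),
    ("DimorphismCrystallography", "Crystallography"),
    ("IsomorphismCrystallography", "Crystallography"),
]

# (identifier, title, labels) for each book created above.
BOOKS = [
    ("2167673", "Crystals and light", {"Book", "CrystalOptics"}),
    ("11916857", "Optical crystallography", {"Book", "CrystalOptics"}),
    ("8096161", "Isomorphism in minerals", {"Book", "IsomorphismCrystallography"}),
    ("12873809", "Crystals and life", {"Book", "Crystals"}),
    ("20234576", "Highlights in applied mineralogy", {"Book", "Crystals"}),
]

def label_and_sublabels(label):
    """Return the label plus every label reachable by following NARROWER_THAN backwards."""
    seen = {label}
    queue = deque([label])
    while queue:
        broader = queue.popleft()
        for narrower, parent in NARROWER_THAN:
            if parent == broader and narrower not in seen:
                seen.add(narrower)
                queue.append(narrower)
    return seen

def nodes_labelled(label):
    """Keep the books that carry the label or any of its sublabels."""
    wanted = label_and_sublabels(label)
    return [book for book in BOOKS if book[2] & wanted]

ids = [book[0] for book in nodes_labelled("Crystallography")]
print(ids)
```

The seen set guards against visiting a category twice, which matters because the hierarchy is a graph rather than a tree.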

semantics.inference.hasLabel

If what we are looking for is not a set of nodes in a given category but rather a predicate telling us whether or not a node is in a category, then the function we need is semantics.inference.hasLabel.

Let’s create a user with interests in some of the books in our database:

MERGE (jb:User { userId : "JB2020"}) with jb
MATCH (book1:Book { identifier : "20234576" })
MATCH (book2:Book { identifier : "11916857" })
WITH jb, book1, book2
CREATE (book1)<-[:INTERESTED_IN]-(jb)-[:INTERESTED_IN]->(book2)

Now we can query for the books about Physical optics that this user is interested in. Here’s how:

MATCH (:User { userId : "JB2020"})-[:INTERESTED_IN]->(b:Book)
WHERE semantics.inference.hasLabel(b,'PhysicalOptics',$inferenceParams)
RETURN b.identifier as id, b.title as title, labels(b) as categories

Notice that now we’re passing the function configuration as a parameter. So we’ll have to set the param value upfront if we’re using the Neo4j browser.

:param inferenceParams: { catNameProp: "dbLabel", catLabel: "LCSHTopic", subCatRel: "NARROWER_THAN" }

And again, even though there’s no book explicitly labelled as 'PhysicalOptics', the query will produce the following result:

╒══════════╤═════════════════════════╤════════════════════════╕
│"id"      │"title"                  │"categories"            │
╞══════════╪═════════════════════════╪════════════════════════╡
│"11916857"│"Optical crystallography"│["CrystalOptics","Book"]│
└──────────┴─────────────────────────┴────────────────────────┘
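
The predicate can be pictured as a walk up the hierarchy from the node’s own labels. The following Python sketch illustrates that idea only (it is not how the function is implemented); the BROADER table is derived from the NARROWER_THAN relationships created earlier and the function name has_label is hypothetical.

```python
# dbLabel -> set of directly broader dbLabels, derived from NARROWER_THAN.
BROADER = {
    "CrystalOptics": {"Crystallography", "PhysicalOptics"},
    "Crystals": {"Crystallography", "Solids"},
    "DimorphismCrystallography": {"Crystallography"},
    "IsomorphismCrystallography": {"Crystallography"},
}

def has_label(node_labels, target):
    """True if any label is the target or a (transitive) sublabel of it."""
    pending = list(node_labels)
    visited = set()
    while pending:
        label = pending.pop()
        if label == target:
            return True
        if label in visited:
            continue
        visited.add(label)
        # Climb one level: queue every directly broader label.
        pending.extend(BROADER.get(label, ()))
    return False

# The two books the user JB2020 is interested in, with their labels.
interests = {
    "20234576": {"Book", "Crystals"},
    "11916857": {"Book", "CrystalOptics"},
}
matches = [
    book_id for book_id, labels in interests.items()
    if has_label(labels, "PhysicalOptics")
]
print(matches)
```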

semantics.inference.nodesInCategory

Now let’s look at the second way of annotating individuals.

Warning
If you were running the previous example delete all Book nodes before continuing with this second approach.

This script creates a few works and links them to the categories defined before.

MATCH (co:LCSHTopic { authoritativeLabel: "Crystal optics"})
MATCH (is:LCSHTopic { authoritativeLabel: "Isomorphism (Crystallography)"})
MATCH (cr:LCSHTopic { authoritativeLabel: "Crystals"})

CREATE (:Work { title: "Crystals and light", identifier: "2167673"})-[:HAS_SUBJECT]->(co)
CREATE (:Work { title: "Optical crystallography", identifier: "11916857"})-[:HAS_SUBJECT]->(co)

CREATE (:Work { title: "Isomorphism in minerals", identifier: "8096161"})-[:HAS_SUBJECT]->(is)

CREATE (:Work { title: "Crystals and life", identifier: "12873809"})-[:HAS_SUBJECT]->(cr)
CREATE (:Work { title: "Highlights in applied mineralogy", identifier: "20234576"})-[:HAS_SUBJECT]->(cr)

Topic hierarchy with instances

In this case, the query to get the nodes in a particular category will make use of the semantics.inference.nodesInCategory procedure. This procedure takes as parameters the details of how the category hierarchy is built and how individuals are connected to the categories: inCatRel specifies the relationship used to link an instance to a category (the default is IN_CAT), which in our example is HAS_SUBJECT. subCatRel specifies the relationship used to define the hierarchy (the default is SCO for Sub Category Of), which in our example is NARROWER_THAN.

MATCH (cat:LCSHTopic { authoritativeLabel: "Crystallography"})
CALL semantics.inference.nodesInCategory(cat, { inCatRel: "HAS_SUBJECT", subCatRel: "NARROWER_THAN"}) yield node
return node.title as work

When we run this Cypher fragment, we get the following list of results, even though not a single node in the graph is actually explicitly connected to the Crystallography category.

╒══════════════════════════════════╕
│"work"                            │
╞══════════════════════════════════╡
│"Optical crystallography"         │
├──────────────────────────────────┤
│"Crystals and light"              │
├──────────────────────────────────┤
│"Isomorphism in minerals"         │
├──────────────────────────────────┤
│"Crystals and life"               │
├──────────────────────────────────┤
│"Highlights in applied mineralogy"│
└──────────────────────────────────┘
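
The role of the two parameters can be illustrated with a plain edge list. In the Python sketch below (an illustration under stated assumptions, not the procedure’s implementation), sub_cat_rel edges are followed backwards to collect the category and its subcategories, and in_cat_rel edges are then followed backwards to collect the instances. Nodes are identified by their authoritativeLabel or title for brevity, the function name is hypothetical, and the defaults mirror the documented ones.

```python
# (source, relationship type, target) triples copied from the scripts above.
EDGES = [
    ("Crystal optics", "NARROWER_THAN", "Crystallography"),
    ("Crystal optics", "NARROWER_THAN", "Physical optics"),
    ("Crystals", "NARROWER_THAN", "Crystallography"),
    ("Crystals", "NARROWER_THAN", "Solids"),
    ("Dimorphism (Crystallography)", "NARROWER_THAN", "Crystallography"),
    ("Isomorphism (Crystallography)", "NARROWER_THAN", "Crystallography"),
    ("Crystals and light", "HAS_SUBJECT", "Crystal optics"),
    ("Optical crystallography", "HAS_SUBJECT", "Crystal optics"),
    ("Isomorphism in minerals", "HAS_SUBJECT", "Isomorphism (Crystallography)"),
    ("Crystals and life", "HAS_SUBJECT", "Crystals"),
    ("Highlights in applied mineralogy", "HAS_SUBJECT", "Crystals"),
]

def nodes_in_category(edges, category, in_cat_rel="IN_CAT", sub_cat_rel="SCO"):
    """Return, sorted, the instances linked to the category or any of its subcategories."""
    categories = {category}
    frontier = [category]
    while frontier:
        current = frontier.pop()
        for source, rel_type, target in edges:
            if rel_type == sub_cat_rel and target == current and source not in categories:
                categories.add(source)
                frontier.append(source)
    return sorted(
        {source for source, rel_type, target in edges
         if rel_type == in_cat_rel and target in categories}
    )

works = nodes_in_category(
    EDGES, "Crystallography", in_cat_rel="HAS_SUBJECT", sub_cat_rel="NARROWER_THAN"
)
print(works)
```

Calling the function with the default relationship names on this data returns nothing, which shows why both parameters have to be set when the graph uses its own terminology.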

semantics.inference.inCategory(node, category, {})

If what we are looking for is not a set of nodes in a given category but rather a predicate telling us whether or not a node is in a category, then the function we need is semantics.inference.inCategory.

Let’s create a user with interests in some of the books in our database:

MERGE (jb:User { userId : "JB2020"}) with jb
MATCH (book1:Work { identifier : "20234576" })
MATCH (book2:Work { identifier : "11916857" })
WITH jb, book1, book2
CREATE (book1)<-[:INTERESTED_IN]-(jb)-[:INTERESTED_IN]->(book2)

Now we can query for the books about Physical optics that this user is interested in. Here’s how:

MATCH (phyOpt:LCSHTopic { authoritativeLabel: "Physical optics"})
MATCH (:User { userId : "JB2020"})-[:INTERESTED_IN]->(b:Work)
WHERE semantics.inference.inCategory(b,phyOpt,$inferenceParams)
RETURN b.identifier as id, b.title as title

Notice that now we’re passing the function configuration as a parameter. So we’ll have to set the param value upfront if we’re using the Neo4j browser.

:param inferenceParams: { inCatRel: "HAS_SUBJECT", subCatRel: "NARROWER_THAN"}

And again, even though there’s no book explicitly connected to the 'PhysicalOptics' category, the query will produce the following result:

╒══════════╤═════════════════════════╕
│"id"      │"title"                  │
╞══════════╪═════════════════════════╡
│"11916857"│"Optical crystallography"│
└──────────┴─────────────────────────┘

A real world example

We can use the semantics.importOntology procedure to import the NCBI Taxon ontology. This is an ontology representation of the National Center for Biotechnology Information (NCBI) organismal taxonomy. It contains 1.8 million classes (Class) and 3.6 million subClass of (SCO) relationships.

CALL semantics.importOntology("http://purl.obolibrary.org/obo/ncbitaxon.owl","RDF/XML")

It takes just over a couple of minutes to load it into Neo4j.

╒═══════════════════╤═══════════════╤═══════════════╤════════════╤═══════════╤═══════════════╕
│"terminationStatus"│"triplesLoaded"│"triplesParsed"│"namespaces"│"extraInfo"│"configSummary"│
╞═══════════════════╪═══════════════╪═══════════════╪════════════╪═══════════╪═══════════════╡
│"OK"               │5480841        │12581469       │null        │""         │{}             │
└───────────────────┴───────────────┴───────────────┴────────────┴───────────┴───────────────┘

Let’s add a few individuals to the hierarchy. Some dogs (NCBITaxon_9615, "Canis lupus familiaris"):

CREATE (p:Person { name: "Mr. Doglover"}) WITH p
UNWIND [ { name: "Perdita" , dob: "30/11/2016"}, { name: "Toby" , dob: "14/03/2019"}, { name: "Lucky" , dob: "14/11/2018"}, { name: "Pongo" , dob: "4/10/2012"}] as doggy
CREATE (:Pet:NCBITaxon_9615 { name: doggy.name, dob: doggy.dob })-[:OWNER]->(p)

And, why not, some mice (NCBITaxon_10092, "Mus musculus domesticus"):

CREATE (p:Person { name: "Mr. Mouselover"}) WITH p
UNWIND [ { name: "Mickey" , dob: "30/11/2016"}, { name: "Minnie" , dob: "14/03/2019"}, { name: "Topo" , dob: "14/11/2018"}, { name: "Rastamouse" , dob: "4/10/2012"}] as mouse
CREATE (:Pet:NCBITaxon_10092 { name: mouse.name, dob: mouse.dob })-[:OWNER]->(p)

If we’re looking for instances of mammals in our database, we’d look for nodes labelled as NCBITaxon_40674 ("Mammalia"). Obviously no node has been labelled as mammal, but we expect NSMNTX to do the job for us.

CALL semantics.inference.nodesLabelled('NCBITaxon_40674',{ catLabel: "Class", subCatRel: "SCO" }) YIELD node
RETURN node.name as name, node.dob as dob, labels(node)

It takes only a few milliseconds to identify them among the nearly 11k categories under Mammalia.

╒════════════╤════════════╤═════════════════════════╕
│"name"      │"dob"       │"labels(node)"           │
╞════════════╪════════════╪═════════════════════════╡
│"Mickey"    │"30/11/2016"│["Pet","NCBITaxon_10092"]│
├────────────┼────────────┼─────────────────────────┤
│"Minnie"    │"14/03/2019"│["Pet","NCBITaxon_10092"]│
├────────────┼────────────┼─────────────────────────┤
│"Topo"      │"14/11/2018"│["Pet","NCBITaxon_10092"]│
├────────────┼────────────┼─────────────────────────┤
│"Rastamouse"│"4/10/2012" │["Pet","NCBITaxon_10092"]│
├────────────┼────────────┼─────────────────────────┤
│"Perdita"   │"30/11/2016"│["NCBITaxon_9615","Pet"] │
├────────────┼────────────┼─────────────────────────┤
│"Toby"      │"14/03/2019"│["NCBITaxon_9615","Pet"] │
├────────────┼────────────┼─────────────────────────┤
│"Lucky"     │"14/11/2018"│["NCBITaxon_9615","Pet"] │
├────────────┼────────────┼─────────────────────────┤
│"Pongo"     │"4/10/2012" │["NCBITaxon_9615","Pet"] │
└────────────┴────────────┴─────────────────────────┘

Interestingly, because Neo4j is a native graph database implementing index-free adjacency, if we were to search across the 1.8 million categories for all instances of "Eukaryota" (NCBITaxon_2759), one of the top three categories that all cellular organisms are divided into, it would take NSMNTX roughly the same time to identify them. Here’s the query:

CALL semantics.inference.nodesLabelled('NCBITaxon_2759',{ catLabel: "Class", subCatRel: "SCO" }) YIELD node
RETURN node.name as name, node.dob as dob, labels(node)

Similarly, we can verify in milliseconds how many of an individual’s pets are actually instances of "Eukaryota". Here’s how:

MATCH (:Person { name : "Mr. Doglover"})<-[:OWNER]-(pet)
WHERE semantics.inference.hasLabel(pet,'NCBITaxon_2759',$inferenceParams)
RETURN count(pet)
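Note that $inferenceParams was last set for the LCSH example (inCatRel and subCatRel), so it would have to be reset before running this query. As a sketch, the configuration can also be passed inline. The values below are the ones used with semantics.inference.nodesLabelled earlier in this section; that semantics.inference.hasLabel takes the same settings is an assumption based on that call.

```cypher
// Assumption: hasLabel accepts the same catLabel / subCatRel settings
// that were passed to nodesLabelled above
MATCH (:Person { name : "Mr. Doglover"})<-[:OWNER]-(pet)
WHERE semantics.inference.hasLabel(pet,'NCBITaxon_2759',{ catLabel: "Class", subCatRel: "SCO" })
RETURN count(pet)
```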

Hierarchies of Relationships

Just like we did with categories, we can use rdfs:subPropertyOf to create hierarchies of relationships, or in other words, to state that all resources connected by one relationship are also implicitly connected by any parent relationship. If we state that ACTED_IN is a subproperty of WORKED_IN, then when we find in the graph that Keanu Reeves ACTED_IN The Matrix, we can safely derive the fact that he also WORKED_IN that movie, even if there is no explicit WORKED_IN relationship in the graph between Keanu and The Matrix. This is useful in situations where we want to be able to dynamically define relationships by composing existing ones.

The semantics.inference.getRels stored procedure uses exactly these semantics to infer implicit relationships between nodes in the graph.

semantics.inference.getRels

Let’s take the movie database. Remember you can load it into Neo4j by running :play movies and following the instructions in the guide. Let’s say we have a fragment of a movie ontology that contains a definition of a relationship hierarchy. It does so by defining a number of rdfs:subPropertyOf statements between relationships. For instance, it states that every ACTED_IN relationship is also a WORKED_IN one. These are the statements in question:

...

neovoc:ACTED_IN a owl:ObjectProperty;
  rdfs:label "ACTED_IN";
  rdfs:subPropertyOf neovoc:WORKED_IN .

...

To see this inferencing procedure in action, we’ll start by loading the ontology. We can do this by either using the semantics.importOntology or the semantics.importRDF methods described in the Importing RDF data section.

Note
We can get a hierarchy from an ontology, or we can create it with a Cypher script from any other source.
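As a minimal sketch of the second option in the note, a fragment of the hierarchy could be created by hand with plain Cypher. The Relationship label, the name property and the SPO relationship type mirror the ones that appear in the queries of this section; whether semantics.inference.getRels needs any additional configuration to work with a hand-built hierarchy is an assumption not verified here.

```cypher
// Hand-built fragment of a relationship hierarchy (no ontology import involved).
// Label, property and relationship names mirror the ones used in this section.
MERGE (workedIn:Relationship { name: "WORKED_IN" })
MERGE (actedIn:Relationship { name: "ACTED_IN" })
MERGE (directed:Relationship { name: "DIRECTED" })
// Each sub-relationship points to its parent through SPO (subPropertyOf)
MERGE (actedIn)-[:SPO]->(workedIn)
MERGE (directed)-[:SPO]->(workedIn)
```

If you try this, use it instead of the ontology import rather than in addition to it, so the same relationship is not defined twice.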

If we run:

CALL semantics.importOntology("http://jbarrasa.github.io/neosemantics/docs/rdf/movieDBRelHierarchy.ttl", "Turtle")

We should get a simple hierarchy of properties like the one in this screen capture from the Neo4j browser.

property hierarchy in a possible Movie Database Ontology loaded into Neo4j

Writing a query that returns all nodes connected to the movie The Matrix through the 'virtual' WORKED_IN relationship is an easy task with the semantics.inference.getRels procedure.

MATCH (thematrix:Movie {title: "The Matrix"})
CALL semantics.inference.getRels(thematrix,"WORKED_IN", { subRelRel: "SPO" }) YIELD rel, node
RETURN type(rel) AS relType, node

Returning:

╒══════════╤═════════════════════════════════════════╕
│"relType" │"node"                                   │
╞══════════╪═════════════════════════════════════════╡
│"ACTED_IN"│{"name":"Emil Eifrem","born":1978}       │
├──────────┼─────────────────────────────────────────┤
│"PRODUCED"│{"name":"Joel Silver","born":1952}       │
├──────────┼─────────────────────────────────────────┤
│"DIRECTED"│{"name":"Lana Wachowski","born":1965}    │
├──────────┼─────────────────────────────────────────┤
│"DIRECTED"│{"name":"Lilly Wachowski","born":1967}   │
├──────────┼─────────────────────────────────────────┤
│"ACTED_IN"│{"name":"Hugo Weaving","born":1960}      │
├──────────┼─────────────────────────────────────────┤
│"ACTED_IN"│{"name":"Laurence Fishburne","born":1961}│
├──────────┼─────────────────────────────────────────┤
│"ACTED_IN"│{"name":"Carrie-Anne Moss","born":1967}  │
├──────────┼─────────────────────────────────────────┤
│"ACTED_IN"│{"name":"Keanu Reeves","born":1964}      │
└──────────┴─────────────────────────────────────────┘

Now let’s say we want to modify the meaning of the WORKED_IN relationship to exclude PRODUCED and keep only artistic involvement connections, that is, WROTE, ACTED_IN and DIRECTED. We don’t need to alter our database, just our ontology.

MATCH (:Relationship {name:"PRODUCED"})-[r:SPO]->(:Relationship {name:"WORKED_IN"})
DELETE r

If we run the same query again, we’ll get different results, this time excluding producers. Think of this in a large scale DB. We can effectively modify relationships globally by adding or deleting a simple link to the hierarchy and without having to modify every single instance.
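The change is just as easy to undo. As a sketch, assuming the Relationship nodes for PRODUCED and WORKED_IN are still in the graph after the deletion above, the link can be restored with a MERGE, and the hierarchy currently in effect can be inspected with plain Cypher. The two statements are meant to be run one after the other.

```cypher
// Restore the subPropertyOf link removed above
MATCH (produced:Relationship { name: "PRODUCED" })
MATCH (workedIn:Relationship { name: "WORKED_IN" })
MERGE (produced)-[:SPO]->(workedIn);

// List every relationship type that currently counts as WORKED_IN,
// directly or through a chain of SPO links
MATCH (sub:Relationship)-[:SPO*]->(:Relationship { name: "WORKED_IN" })
RETURN DISTINCT sub.name AS impliedBy;
```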

NSMNTX Reference

Complete list of all available stored procedures, functions and extensions in NSMNTX.

Stored Procedures

RDF Import

Procedure Name params Description and example usage

semantics.importRDF

  • URL of the dataset

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the table below

Fetches RDF from a url (file or http) and stores it in Neo4j as a property graph. This procedure requires an index on :Resource(uri)

semantics.importRDFSnippet

  • string containing a valid RDF fragment

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the table below

Imports an RDF snippet passed as parameter and stores it in Neo4j as a property graph. Requires an index on :Resource(uri)

semantics.importQuadRDF

  • URL of the dataset

  • serialization format (valid formats: TriG, N-Quads)

  • optional map with params from the table below

Equivalent of semantics.importRDF for RDF quads

semantics.importOntology

  • URL of the dataset

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the ontology import table below

Imports classes, properties (dataType and Object), hierarchies thereof and domain and range info.

semantics.streamRDF

  • URL of the dataset

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the table below

Parses RDF and streams each triple as a record with <S,P,O> along with datatype and language tag for Literal values. No writing to the DB. This SP is useful when you want to import into your Neo4j graph fragments of an RDF dataset in a custom way.

semantics.previewRDF

  • URL of the dataset

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the table below

Parses RDF and produces virtual Nodes and relationships for preview in the Neo4j browser. No writing to the DB. Notice that this is adequate for a preliminary visual analysis of a SMALL dataset. Think how many nodes you want rendered in your browser.

semantics.previewRDFSnippet

  • string containing a valid RDF fragment

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the table below

Parses an RDF fragment passed as parameter (no retrieval from url) and produces virtual Nodes and relationships for preview in the Neo4j browser. No writing to the DB

semantics.deleteRDF

  • URL of the dataset

  • serialization format (valid formats: Turtle, N-Triples, JSON-LD, TriG, RDF/XML)

  • optional map with params from the table below

Deletes triples from Neo4j. Works on a graph resulting from importing RDF via semantics.importRDF(). The delete config must match the one used on import

semantics.deleteQuadRDF

  • URL of the dataset

  • serialization format (valid formats: TriG, N-Quads)

  • optional map with params from the table below

Deletes quads from Neo4j. Works on a graph resulting from importing RDF quads via semantics.importQuadRDF(). The delete config must match the one used on import

RDF Import Params
Param values(default) Description

handleVocabUris

'SHORTEN','IGNORE','MAP','KEEP' ('SHORTEN')

  • 'SHORTEN', full uris are shortened using prefixes for property names, relationship names and labels

  • 'IGNORE' uris are ignored and only local names are kept

  • 'MAP' vocabulary element mappings are applied on import

  • 'KEEP' uris are kept unchanged

applyNeo4jNaming

boolean (false)

when set to true and in combination with handleVocabUris: 'IGNORE', Neo4j capitalisation is applied to vocabulary elements (all caps for relationship types, capital first for labels, etc.)

handleMultival

'OVERWRITE', 'ARRAY' ('OVERWRITE')

  • 'OVERWRITE' property values are kept single valued. Multiple values in the imported RDF are overwritten (only the last one is kept)

  • 'ARRAY' properties are stored in an array, enabling storage of multiple values. This applies to all properties unless multivalPropList is set.

multivalPropList

list of strings ([])

List of property names (full uri) to be stored as arrays. The rest are treated as 'OVERWRITE'.

keepLangTag

boolean (false)

when set to true, the language tag is kept along with the property value. Useful for multilingual datasets. Use helper function getLangValue to get specific values.

predicateExclusionList

list of strings ([])

List of predicates (full uri) that are to be ignored on parsing RDF and not stored in Neo4j.

typesToLabels

boolean (true)

when set to true, rdf:type statements are imported as node labels in Neo4j

languageFilter

['en','fr','es',…​]

when set, only literal properties with this language tag (or untagged ones) are imported

headerParams

map {}

parameters to be passed in the HTTP GET request or payload if POST request. Example: { authorization: 'Basic user:pwd', Accept: 'application/rdf+xml'}

commitSize

integer (25000)

commit a partial transaction every n triples

nodeCacheSize

integer (10000)

keep n nodes in cache to minimize reads from DB

verifyUriSyntax

boolean (true)

by default, uri syntax is checked. This can be disabled by setting this parameter to false

keepCustomDataTypes

boolean (false)

when set to true, all properties containing a custom data type will be saved as a string followed by their custom data type IRIs

customDataTypedPropList

list of strings ([])

when set, only custom data types of literal properties in this list are imported
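To illustrate how these parameters are passed, here is a hedged sketch of an import call. The file URL is a placeholder and the parameter values are just one possible combination taken from the table above. The index statement uses the Neo4j 3.x syntax, which is an assumption about the server version in use.

```cypher
// Index required by semantics.importRDF (Neo4j 3.x syntax; assumed server version)
CREATE INDEX ON :Resource(uri);

// Placeholder URL: replace it with the location of your own dataset
CALL semantics.importRDF("file:///path/to/dataset.ttl", "Turtle", {
  handleVocabUris: "IGNORE",
  applyNeo4jNaming: true,
  handleMultival: "ARRAY",
  keepLangTag: true,
  commitSize: 10000
});
```

The two statements should be run separately, since the index has to exist before the import starts.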

Ontology Import Params
Param values(default) Description

predicateExclusionList

list of strings ([])

List of predicates (full uri) that are to be ignored on parsing RDF and not stored in Neo4j.

headerParams

map {}

parameters to be passed in the HTTP GET request or payload if POST request. Example: { authorization: 'Basic user:pwd', Accept: 'application/rdf+xml'}

commitSize

integer (25000)

commit a partial transaction every n triples

nodeCacheSize

integer (10000)

keep n nodes in cache to minimize reads from DB

verifyUriSyntax

boolean (true)

by default, uri syntax is checked. This can be disabled by setting this parameter to false

classLabelName

string ('Class')

Label for classes in the ontology

subClassOfRelName

string ('SCO')

Relationship name for rdfs:subClassOf statements

dataTypePropertyLabelName

string ('Property')

Label for DataTypeProperty definitions (attributes)

objectPropertyLabelName

string ('Relationship')

Label for ObjectProperty definitions (relationships)

subPropertyOfRelName

string ('SPO')

Relationship for rdfs:subPropertyOf statements

domainRelName

string ('DOMAIN')

Domain relationship between Classes and DataTypeProperty/ObjectProperty

rangeRelName

string ('RANGE')

Range relationship between Classes and DataTypeProperty/ObjectProperty
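As a sketch of how the ontology import parameters can be used, the call below loads the movie relationship hierarchy used earlier in this document while overriding two of the defaults. The names "RelType" and "SUB_REL_OF" are arbitrary values chosen for illustration.

```cypher
// Override the default label for object properties and the default
// relationship type for rdfs:subPropertyOf (illustrative names)
CALL semantics.importOntology("http://jbarrasa.github.io/neosemantics/docs/rdf/movieDBRelHierarchy.ttl", "Turtle", {
  objectPropertyLabelName: "RelType",
  subPropertyOfRelName: "SUB_REL_OF"
});
```

Inference calls would then have to be told about the non-default names, for instance through the subRelRel setting shown in the getRels example. Any other settings that may be required are not covered here.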

RDF Import Utils

Procedure Name params Description and example usage

semantics.addNamespacePrefix

Adds namespace - prefix pair definition to be used for RDF import

semantics.listNamespacePrefixes

no params

Lists all currently defined namespace prefix definitions

Model Mapping

Procedure Name params Description and example usage

semantics.mapping.addSchema

  • URL of the schema/vocabulary/ontology

  • prefix to be used in serialisations

Creates a reference to a vocabulary. Needed to define mappings.

semantics.mapping.dropSchema

  • URL of the schema/vocabulary/ontology

Deletes a vocabulary reference and all associated mappings.

semantics.mapping.listSchemas

  • optional filter string

Returns all vocabulary references. When filter string is set, only schemas containing the search string in their uri or in the associated prefix are returned.

semantics.mapping.addCommonSchemas

no params

Creates references to a number of popular vocabularies including schema.org, Dublin Core, SKOS, OWL, etc

semantics.mapping.addMappingToSchema

  • URL of the schema/voc/ontology

  • The name of the element in the Neo4j graph (a property name, a label or a relationship type)

  • The matching element (Class, DataTypeProperty or ObjectProperty) in the public schema. Only the local name of the element

Creates a mapping for an element in the Neo4j DB schema to a vocabulary element

semantics.mapping.dropMapping

  • name of the mapped DB element whose mapping is to be removed

Returns an output text message indicating success/failure of the deletion

semantics.mapping.listMappings

  • optional filter string

Returns a list with all the currently defined mappings. When a filter string is passed, only mappings containing the string in the DB element name or the schema element URI are returned
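A minimal sketch of the first steps when working with mappings, based only on the parameter descriptions in the table above. The schema URL and the "sch" prefix are illustrative choices, and the shape of the returned records is not shown because it is not described here.

```cypher
// Register a vocabulary under a prefix (URL and prefix are illustrative)
CALL semantics.mapping.addSchema("http://schema.org/", "sch");

// List the registered vocabularies whose uri or prefix contains "sch"
CALL semantics.mapping.listSchemas("sch");
```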

Inferencing

Stored Proc Name params Description

semantics.inference.nodesLabelled

  • a string with a label name

  • parameters as described in table below

returns all nodes with label 'label' or its sublabels

semantics.inference.nodesInCategory

  • a node representing the category

  • parameters as described in table below

returns all nodes connected to Node 'catNode' or its subcategories

semantics.inference.getRels

  • a start node

  • a (real or 'virtual') relationship type

  • parameters as described in table below

returns all relationships of type 'virtRel' or its subtypes along with the target nodes

semantics.inference.hasLabel (function)

  • a node

  • a label name as a string

  • parameters as described in table below

checks whether node is explicitly or implicitly labeled as 'label'

semantics.inference.inCategory (function)

  • a node representing an instance

  • a node representing a category

  • parameters as described in table below

checks whether node is explicitly or implicitly in a category

Utility Functions

Function Name params Description

semantics.getIRILocalName

URI string

Returns the local part of the URI (stripping out the namespace)

semantics.getIRINamespace

URI string

Returns the namespace part of the URI (stripping out the local part)

semantics.getDataType

string (a property value)

Returns the XMLSchema (or custom) datatype of a property value when present

semantics.getLangValue

string (a property value)

Returns the value with the language tag passed as first argument or null if there’s not a value for the provided language tag

semantics.getValue

string (a property value)

Returns the value of a property after stripping out the datatype information or language tag when present

semantics.shortFromUri

string (a URI)

Returns the shortened version of an IRI using the existing namespace definitions

semantics.uriFromShort

string (a shortened URI)

Returns the expanded (full) URI given a shortened one created in the load process with semantics.importRDF

semantics.importJSONAsTree

  • node to link the imported json to

  • the json fragment

  • (optional) relationship name linking the root node of the JSON to the node passed as first param

Imports a JSON payload by mapping it to nodes and relationships (JSON-LD style). Requires a uniqueness constraint on :Resource(uri)
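As a quick illustration of the two IRI helpers at the top of this table, the query below splits an IRI into its two parts. The IRI is an arbitrary example value.

```cypher
// Split an illustrative IRI into its namespace and its local name
WITH "http://schema.org/Person" AS iri
RETURN semantics.getIRINamespace(iri) AS namespace,
       semantics.getIRILocalName(iri) AS localName
```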

Extensions (HTTP endpoints)

method type params Description

/rdf/describe/id/<nodeid>

GET

  • nodeid: path parameter containing the id of a node

  • excludeContext: Optional named parameter. If present output will not include connected nodes, just selected one.

  • format: RDF serialisation format. When present, it overrides the header param accept.

Produces an RDF serialization of the selected node. The format will be determined by the accept parameter in the header. Default is Turtle

/rdf/describe/uri/<nodeuri>

GET

  • nodeuri: path parameter containing the (urlencoded) uri of a node.

  • excludeContext: (optional) if present output will not include connected nodes, just selected one.

  • graphuri: (optional) if present and the graph includes Quad information, only statements in the selected named graph are returned. The value of the parameter is the (urlencoded) uri of a named graph.

  • format: RDF serialisation format. When present, it overrides the header param accept.

Produces an RDF serialization of the selected node. It works on a model either imported from an RDF dataset via semantics.importRDF, semantics.importQuadRDF or built in a way that nodes are labeled as :Resource and have an uri.

/rdf/describe/find/<l>/<p>/<v>

GET

  • the method takes three parameters passed as path parameters in the URL: <l>/<p>/<v>. They represent respectively a label, a property name and a property value.

  • excludeContext: Optional named parameter. If present output will not include connected nodes, just selected one.

  • valType: required when the property value is not to be treated as a string. Valid values: INTEGER, FLOAT and BOOLEAN

  • format: RDF serialisation format. When present, it overrides the header param accept.

returns nodes matching the filter on label and property value

/rdf/cypher

POST

POST request taking as parameter a JSON map with the following keys:

  • cypher: the cypher query to run

  • cypherParams: parameters for the cypher query

  • showOnlyMapped: (optional, default is false) if present output will exclude unmapped elements (see how to define mappings for labels, attributes, relationships)

  • format: RDF serialisation format. When present, it overrides the header param accept.

Produces an RDF serialization of the nodes and relationships returned by the Cypher query

/rdf/cypheronrdf

POST

same parameters as /rdf/cypher

Same as /rdf/cypher but it works on a model either imported from an RDF dataset via semantics.importRDF or built in a way that nodes are labeled as :Resource and have an uri.

/rdf/onto

GET

  • format: RDF serialisation format. When present, it overrides the header param accept.

returns an OWL ontology based on the graph schema

/rdf/ontonrdf

GET

  • format: RDF serialisation format. When present, it overrides the header param accept.

Same as /rdf/onto but it works on a model either imported from an RDF dataset via semantics.importRDF or built in a way that nodes are labeled as :Resource and have an uri.
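The endpoints can be tried from the Neo4j browser in the same way as the /rdf/ping check in the installation section. The two commands below are a sketch against the movie database, run one at a time. They assume that the extension is mounted at /rdf and that the format parameter accepts the same format names listed for import (here "Turtle"); neither assumption is confirmed by the tables above.

```cypher
:GET /rdf/describe/find/Person/born/1964?valType=INTEGER&excludeContext

:POST /rdf/cypher { "cypher": "MATCH (m:Movie { title: 'The Matrix' }) RETURN m", "format": "Turtle" }
```

The first command looks up Person nodes by an integer property, which is why valType is set. The second serialises the result of an arbitrary Cypher query.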

Projects using NSMNTX

  • Nick Doyle writes on using Neo4j to map your AWS Infrastructure and he uses NSMNTX to ingest the RDF produced by awless. Read his post here

  • Refinitiv in their developers tutorials section include a step by step description of how to import one of their public datasets into Neo4j using NSMNTX.

  • Tony Hammond created an Elixir wrapper library for NSMNTX. He describes it in this blog post.

We know there are more out there! If you’re using NSMNTX we’d love to hear about what you’re doing. Let us know about it at the community forum. You can help us grow this list!