
A cross-RDF Graph Database investigation: the case of the missing context!

What is a graph in RDF?

RDF Graph Databases, also known as Triplestores, are a subset of Graph Databases where data is represented in triples. A simple triple consists of a subject, a predicate and an object aka subject-predicate-object. The predicate is the edge in the data graph that connects the subject to the object nodes. If we add context or graph information to a triple, we end up having the following structure: graph-subject-predicate-object. And when we talk about a graph in an RDF Graph Database, we always refer to it as the context. This type of triple, in turn, is named a quad.

The graph exists to structure and represent your data better because the triples with the same graph have the same context. The existence of the graph is one of the main differences between a property graph database and an RDF graph database. Yes, you can store your graph information in a property graph database too, but the RDF store is designed from the ground up with this in mind.

In the end, the choice of the database type is a matter of performance and how you want your data to be represented best for your use case.

What happens if there is no graph? 

One can insert data into the RDF Graph Database that does not contain graph information. These simple triples are stored in the so-called “unnamed graph” or “default graph” of the database. We want to see how to access this graph, and the SPARQL keyword DEFAULT is usually used in such cases.

Now that we specified what the DEFAULT graph is in relation to an RDF Graph Database, we will take a look at different triplestores and their specific implementation of it. We will look at some basic actions like data insert, delete and query. 

The triplestores we evaluated are: RDF4J 2.4, Stardog 6.1.1, GraphDB 8.8, Virtuoso v7.2.2.1, AllegroGraph 6.4.6, MarkLogic 9.0, Apache JENA TDB, Oracle Spatial and Graph 18c. From now on, when we mention one of them, we refer to the versions listed here. We did not change any configurations upon installation, so our observations relate to the default setup.

Learnings

Data insert observations

The insert data SPARQL query used is

INSERT DATA {

<http://example.org/picasso> <http://example.org/paints> <http://example.org/guernica>

}

This query inserts a triple which has no graph information. The triple is stored in the DEFAULT graph of each RDF Graph Database. However, what the DEFAULT graph represents differs from store to store.

In Stardog, the DEFAULT graph keyword does not exist; instead one needs to use <tag:stardog:api:context:default>. All triples land here.

Apache JENA TDB uses <urn:x-arq:DefaultGraph> as the default graph, and the triples land here. You can use the DEFAULT keyword to query them.

Virtuoso has an internal default graph but the big difference is that a user cannot access it by using the DEFAULT keyword. The triples without graph information are added to this internal default graph.
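For contrast, here is a minimal sketch of the same statement inserted with graph information, that is, as a quad. It uses the standard SPARQL 1.1 GRAPH clause; the graph name <http://example.org/artworks> is a made-up example and not part of our evaluation.

```sparql
# Quad: the triple is stored in the named graph <http://example.org/artworks>
# (hypothetical graph name, chosen only for illustration).
INSERT DATA {
  GRAPH <http://example.org/artworks> {
    <http://example.org/picasso> <http://example.org/paints> <http://example.org/guernica>
  }
}
```

A triple inserted this way belongs to the named graph, so the store-specific default graph handling described above should not come into play for it.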

Select data observations

The SPARQL query for selecting data used is:

SELECT * WHERE {

?s ?p ?o

}

Most of the triplestores retrieve data from all graphs, including the DEFAULT graph; they do not restrict the query to any specific graph. The exceptions are:

Stardog retrieves data only from its internal default graph <tag:stardog:api:context:default>.

For Virtuoso you always need a graph, otherwise you receive: “No default graph specified in the preamble”.
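To check where a store actually put a triple, one option is the standard GRAPH pattern with a variable, sketched below. In standard SPARQL this pattern ranges over named graphs only, so whether triples from the default graph show up in the result depends on the store; treat it as a diagnostic, not a guarantee.

```sparql
# List every statement together with the named graph it belongs to.
# ?g is bound to the graph IRI of each match.
SELECT ?g ?s ?p ?o WHERE {
  GRAPH ?g { ?s ?p ?o }
}
```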

Delete data observations

The SPARQL query used to delete a triple is:

DELETE {

<http://example.org/picasso> <http://example.org/paints> ?o

} WHERE {

<http://example.org/picasso> <http://example.org/paints> ?o

}

Generally, the triples that match the pattern are deleted from ALL graphs they exist in. We found exceptions to this behaviour in:

Stardog deletes the triple only in the defined default graph.

MarkLogic and Apache JENA TDB behave the same: they delete the triples that match the pattern only from the internal default graph.

In Virtuoso one always needs to specify a graph to delete data. 
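If you do not want to rely on the store's default behaviour, you can name the graph explicitly when deleting. The sketch below uses the standard SPARQL 1.1 WITH clause, which applies the given graph to both the DELETE template and the WHERE pattern. The graph name <http://example.org/artworks> is an assumption for illustration.

```sparql
# Delete matching triples from one named graph only.
# <http://example.org/artworks> is a hypothetical graph name.
WITH <http://example.org/artworks>
DELETE {
  <http://example.org/picasso> <http://example.org/paints> ?o
} WHERE {
  <http://example.org/picasso> <http://example.org/paints> ?o
}
```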

We also want to show what a SPARQL query looks like when the DEFAULT keyword is present. The query to select data would look like:

SELECT * FROM DEFAULT WHERE {

?s ?p ?o

}

Additional known configurations 

In Stardog there is a configuration property which lets you choose which behaviour you like better. Through the query.all.graphs = true parameter, when you query without a graph, it will look in all graphs – default and named graphs – exactly like in the case of RDF4J. And if the property is set to false, it will only query the internal default graph. 

Additionally, if for some reason, you really need a graph in your SPARQL query even when you only need data from the DEFAULT graph, in Stardog you can write it as: FROM <tag:stardog:api:context:default>. And if you want to query all graphs, you can also do FROM <tag:stardog:api:context:all>.
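Written out in full, such a Stardog query could look like the sketch below. Both graph IRIs are the ones mentioned above; everything else is the plain select query used earlier.

```sparql
# Query all graphs (default and named) in Stardog.
# Swap in <tag:stardog:api:context:default> to query only the default graph.
SELECT * FROM <tag:stardog:api:context:all> WHERE {
  ?s ?p ?o
}
```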

In Virtuoso we learned that you always need to specify a graph when you query. So how do we work with the DEFAULT graph then?

There is a specific syntax for Virtuoso which lets you define/set your graph at the beginning of the query:

define input:default-graph-uri <graph_name>

INSERT DATA {

<http://example.org/picasso> <http://example.org/paints> <http://example.org/guernica>

}

Read more about it in the Virtuoso documentation.

AllegroGraph also provides some configurations. The defaultDatasetBehavior can be used directly in the SPARQL query to determine if :all, :default or :rdf should be used when no graph name is specified in the query.

Or one can fix the default graph name with the default-graph-uris option (or the default-dataset-behavior) upon the run-sparql command.

In MarkLogic, when working with REST or XQuery, the default-graph-uri and named-graph-uri parameters are available to specify the graph, as described in the SPARQL 1.1 Protocol recommendation.

In Apache JENA TDB all named graphs can be called with <urn:x-arq:UnionGraph>. The configuration parameter tdb:unionDefaultGraph can be added to switch the default graph to the union of all graphs. And the default graph can be specifically called with <urn:x-arq:DefaultGraph>.
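As a small sketch of how these special graph names might be used in a query against Apache JENA TDB, the union of all named graphs can be addressed through a GRAPH pattern. This assumes a default TDB setup; the same form should work with <urn:x-arq:DefaultGraph> to address the default graph explicitly.

```sparql
# Query the union of all named graphs in Apache JENA TDB.
SELECT * WHERE {
  GRAPH <urn:x-arq:UnionGraph> { ?s ?p ?o }
}
```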

Conclusion

RDF Graph Databases are built from the ground up with the context of your data in mind. Knowing your graphs and triplestore setup is, from my point of view, basic knowledge for both developers and data engineers. Always start with the question: “what setup do I need for my use case?”

Cross-RDF Graph Database behavior – the DEFAULT graph 

| Triple store behavior on new install | WRITE triples without graph | SELECT triple without graph | DELETE triple without graph |
|---|---|---|---|
| RDF4J 2.4 | Triples are added to DEFAULT graph. | Retrieves data from ALL graphs including the DEFAULT graph. | Deletes triples that match the pattern, from ALL graphs. |
| Stardog 6.1.1 | Triples are added to <tag:stardog:api:context:default> graph which acts as the DEFAULT graph. | It retrieves data only from the <tag:stardog:api:context:default> graph. | It tries to delete the triple in the defined default graph. |
| AllegroGraph 6.4.6 | Triples are added to an internal DEFAULT graph. | Retrieves data from ALL graphs including the DEFAULT graph. | Deletes triples that match the pattern, from ALL graphs. |
| MarkLogic 9.0 | Triples are added to an internal DEFAULT graph. | Retrieves data from ALL graphs including the DEFAULT graph. | It tries to delete the data in the internal DEFAULT graph. |
| GraphDB 8.8 | Triples are added to DEFAULT graph. | Retrieves data from ALL graphs including the DEFAULT graph. | Deletes triples that match the pattern, from ALL graphs. |
| Virtuoso v7.2.2.1 | Triples are added to an internal DEFAULT graph. | You always need a graph otherwise you receive: “No default graph specified in the preamble” | You always need to specify a graph to delete data. |
| Apache JENA TDB | Triples are added to <urn:x-arq:DefaultGraph> graph which acts as the DEFAULT graph. | It retrieves data only from the <urn:x-arq:DefaultGraph> graph. | It tries to delete the triple in the specified default graph. |
| Oracle Spatial and Graph 18c | Triples are added to an internal DEFAULT graph. | Retrieves data from ALL graphs including the DEFAULT graph. | Deletes triples that match the pattern, from ALL graphs. |

New year new project

It is official: I am a developer, again!!!

Yup, and I am very happy about it. The first month back to code is almost over and I learned quite some stuff, 24 points to be exact, ’cause I’m keeping track. I should maybe mention some of them in another post. For now, this post is about something else, something I have been wanting to pick up for some time but ALWAYS found a reason why not. Well, this time I should be out of ‘why not’ reasons and just do it: work on a personal project.

Last year I started flirting with Golang, ah… just for the sake of it (another postponed thing). And I actually did some problem solving on HackerRank to get started. Then, I also read a super interesting survey result, also from HackerRank, which talks about what employers want and what developers can do. That is where I got the idea to also go back and do some JavaScript (also because my project needs at least a basic frontend).

Here I am, shaping up a project idea, more useful than fun, more for learning than being useful. But that is to be decided later… what it will become.

Some things are certain:

Functionality

  • search & autocomplete
  • suggest a new entry
  • admin dashboard with CRUD actions
  • 2 types of users: admin and public
  • API (maybe) for further features like – statistics, graphics, similarity…

Stack

  • backend: Golang
  • frontend: Javascript

This is my commitment to see it through basically! I will also soon post my git repo when it is up. Oh oh and did I mention, the topic is: funding and women in STEM!

Impostor Syndrome

Hello everybody,

I had the opportunity to give an interview for the magazine WOMAN on the topic of Impostor Syndrome. The article is only in German and was not available online, so I uploaded it here for anyone interested in reading it. Credits to Angelika Strobl, who interviewed me and wrote the article. Enjoy!

Developing for the Semantic Web

This year’s DevFest was again a blast!

I had the opportunity to give a presentation about what I have been doing lately: a Web Application to show off the power of SPARQL. I turned my experience into an introduction titled “Developing for the Semantic Web”.

Take a look:



My video from DevFest:



DevFest Vienna Website.

Spring Boot and Polymer

Last week was the Google I/O developer conference, where Polymer 1.0 was presented. My curiosity was finally sparked and I made some time to check it out a little. I was looking for a fast way to create a Java web application where I can use Polymer, and I had heard how easy and fast Spring Boot is.

So voilà, my first Java web app with Spring Boot and Polymer 1.0. You can clone it from Git and use it as an archetype or for learning purposes; the Polymer files are already included in the project. I used Maven to build the project, which is also easy, but one can also use Gradle.

https://github.com/theRealImy/SpringBootPolymer

Using Spring Boot was super easy! One can simply follow the Getting started

Polymer is home here

The only issue I encountered was that the index.html was not displayed. After a bit of reading, I found this in the Spring Boot documentation:

Do not use the src/main/webapp directory if your application will be packaged as a jar…

By default Spring Boot will serve static content from a directory called /static (or /public or /resources or /META-INF/resources) in the classpath or from the root of the ServletContext.

Fast enough, I changed the folder name and it worked.

 

Viel Spaß!