Operations on a Graph Database (Part 1 – Nodes)

Graph Database Tutorial

Part 1: Nodes

Part 2: Edges

Part 3: Types

Part 4: Properties

Part 5: Identifiers

Part 6: Traversals

Part 7: Sets

Part 8: Events

Graph databases are still unfamiliar to many developers. This is the first post in a series discussing the operations a graph database makes available to the developer. Just as there are only so many different things you can do with a relational database (like CREATE TABLE or INSERT), there are only so many things you can do with a graph database. It is worth looking at them one at a time, and that’s the goal of this series. This first post covers creating and deleting nodes.

To recap, a graph database contains nodes and edges, or MeshObjects and Relationships (as we call them in InfoGrid), or Instances and Links (as the UML would call them), or Resources and Triples (as the semantic web folks would call them), or boxes and arrows (as we draw them on a whiteboard).

Nodes are those objects in a graph database that can stand on their own: they don’t depend on anything else. Edges are those objects that depend on the existence of (typically two) other objects, their source and their destination; we think of edges as connecting nodes.
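To make the distinction concrete, here is a minimal Java sketch. The classes Node and Edge are hypothetical and are not InfoGrid’s API: a node can be constructed from nothing but an identifier, while an edge cannot be constructed without both its source and its destination.

```java
import java.util.Objects;

// Hypothetical types for illustration only; not InfoGrid's API.
// A node stands on its own: it needs nothing but an identifier.
final class Node {
    final String id;

    Node(String id) {
        this.id = Objects.requireNonNull(id, "id");
    }
}

// An edge depends on two other objects: it cannot be constructed without both.
final class Edge {
    final Node source;
    final Node destination;

    Edge(Node source, Node destination) {
        this.source = Objects.requireNonNull(source, "source");
        this.destination = Objects.requireNonNull(destination, "destination");
    }
}
```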

Creating a node is one of the basic operations of a graph database. For example, in InfoGrid, you can simply say:

MeshObject createMeshObject()

and voila, you have one. Similarly, you can delete a node by saying:

deleteMeshObject( MeshObject toDelete )

There are a few conditions around these operations, such as having a transaction open and having the access rights to actually perform the operation, but those go without saying.
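The following Java sketch illustrates the two operations together with the transaction precondition mentioned above. SimpleGraph, beginTransaction, commitTransaction and the use of IllegalStateException are assumptions made for illustration; InfoGrid’s actual classes and exceptions differ, and access-rights checks are left out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical minimal graph; names are illustrative, not InfoGrid's API.
// Transactions are only tracked as open or closed; there is no rollback.
final class SimpleGraph {
    private final Set<String> nodes = new HashSet<>();
    private boolean transactionOpen = false;

    void beginTransaction() { transactionOpen = true; }

    void commitTransaction() { transactionOpen = false; }

    // Creates a new node and returns its generated identifier.
    String createNode() {
        requireTransaction();
        String id = UUID.randomUUID().toString();
        nodes.add(id);
        return id;
    }

    // Deletes an existing node; fails if the node is unknown.
    void deleteNode(String id) {
        requireTransaction();
        if (!nodes.remove(id)) {
            throw new IllegalArgumentException("No such node: " + id);
        }
    }

    boolean contains(String id) { return nodes.contains(id); }

    private void requireTransaction() {
        if (!transactionOpen) {
            throw new IllegalStateException("No transaction open");
        }
    }
}
```

Calling createNode() before beginTransaction() throws, which mirrors the rule that graph-modifying operations need an open transaction.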

When deleting a node, the graph database may require you to first delete all edges connected to the node. Alternatively, it may “ripple delete” all connected edges as part of the delete operation. Graph database products differ on this, but either behavior makes little difference to the developer.
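Both deletion behaviors can be sketched side by side in Java. EdgeGraph and its DeletePolicy values are hypothetical names chosen for this illustration, not any product’s API; the only difference between the policies is whether a node with connected edges may be deleted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two deletion policies; not any product's actual API.
final class EdgeGraph {
    enum DeletePolicy { REQUIRE_NO_EDGES, RIPPLE_DELETE }

    private final Set<String> nodes = new HashSet<>();
    // Each edge is stored as a two-element array {source, destination}.
    private final List<String[]> edges = new ArrayList<>();
    private final DeletePolicy policy;

    EdgeGraph(DeletePolicy policy) { this.policy = policy; }

    void addNode(String id) { nodes.add(id); }

    void addEdge(String source, String destination) {
        if (!nodes.contains(source) || !nodes.contains(destination)) {
            throw new IllegalArgumentException("Both endpoints must exist");
        }
        edges.add(new String[] { source, destination });
    }

    int edgeCount() { return edges.size(); }

    boolean hasNode(String id) { return nodes.contains(id); }

    void deleteNode(String id) {
        boolean connected = edges.stream().anyMatch(e -> e[0].equals(id) || e[1].equals(id));
        if (connected && policy == DeletePolicy.REQUIRE_NO_EDGES) {
            throw new IllegalStateException("Delete connected edges first: " + id);
        }
        // Under RIPPLE_DELETE this removes every edge touching the node;
        // under REQUIRE_NO_EDGES no such edges remain, so nothing is removed.
        edges.removeIf(e -> e[0].equals(id) || e[1].equals(id));
        nodes.remove(id);
    }
}
```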

If the graph database enforces a model (aka schema), as some graph databases do, you may need to make sure that deleting a node does not violate the schema. For example, if the schema says “an Order must be placed by exactly one Customer”, and you attempt to delete the node representing the Customer, the graph database may prevent you from doing so as long as nodes representing Orders are still related to that Customer node. We’ll discuss schemas and graph databases in more detail in a later post.
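The Order/Customer rule from this example can be sketched as a Java class that checks the constraint before deleting. OrderStore and its methods are hypothetical; a real schema-enforcing graph database would derive such checks from a declared model rather than from hand-written code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a store that enforces
// "an Order must be placed by exactly one Customer".
final class OrderStore {
    private final Set<String> customers = new HashSet<>();
    // Maps each Order to the single Customer that placed it.
    private final Map<String, String> placedBy = new HashMap<>();

    void addCustomer(String customer) { customers.add(customer); }

    // An Order can only be created together with its one Customer.
    void addOrder(String order, String customer) {
        if (!customers.contains(customer)) {
            throw new IllegalArgumentException("Unknown customer: " + customer);
        }
        placedBy.put(order, customer);
    }

    void deleteOrder(String order) { placedBy.remove(order); }

    // Deleting a Customer that still has Orders would leave those Orders
    // with zero Customers, so the delete is refused.
    void deleteCustomer(String customer) {
        if (placedBy.containsValue(customer)) {
            throw new IllegalStateException("Customer still has orders: " + customer);
        }
        customers.remove(customer);
    }

    boolean hasCustomer(String customer) { return customers.contains(customer); }
}
```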

So far, we have learned two basic operations on a graph database:

  • create node
  • delete node

Stay tuned for the next installment.

Graph Databases vs. Object Databases — What’s the Difference?

Great question on Stackoverflow.com about the difference between Graph Databases and Object Databases. I answered it there, and decided to post it here as well:

Object and graph databases operate on two different levels of abstraction.

An object database’s main data elements are objects, the way we know them from an object-oriented programming language.

A graph database’s main data elements are nodes and edges.

An object database does not have the notion of a (bidirectional) edge between two things with automatic referential integrity etc. A graph database does not have the notion of a pointer that can be NULL. (Of course one can imagine hybrids.)
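A small Java sketch may help illustrate the difference. In the object style, a field can simply be null, and setting it on one object does nothing to the other side. In the graph style, a hypothetical RelationshipStore records each relationship in both directions and removes both when a node is deleted, so no dangling edge remains. All names here are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Object style: a plain reference that may be null.
final class Person {
    Person spouse; // may be null; setting it does not update the other Person's field
}

// Graph style (hypothetical): an undirected edge store that keeps both directions consistent.
final class RelationshipStore {
    private final Map<String, Set<String>> neighbors = new HashMap<>();

    void relate(String a, String b) {
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    Set<String> neighborsOf(String node) {
        return Collections.unmodifiableSet(neighbors.getOrDefault(node, Collections.emptySet()));
    }

    // Removing a node also removes it from every neighbor's set,
    // so no edge can point at a node that no longer exists.
    void deleteNode(String node) {
        Set<String> removed = neighbors.remove(node);
        if (removed == null) {
            return;
        }
        for (String other : removed) {
            Set<String> otherNeighbors = neighbors.get(other);
            if (otherNeighbors != null) { // null for a self-edge already removed above
                otherNeighbors.remove(node);
            }
        }
    }
}
```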

In terms of schema, an object database’s schema is whatever set of classes the application defines. A graph database’s schema (whether implicit, by convention of what String labels mean, or explicit, by declaration as models, as we do in InfoGrid for example) is independent of the application. This makes it much simpler, for example, to write multiple applications against the same data using a graph database instead of an object database, because the schema is application-independent. On the other hand, with a graph database you can’t simply take an arbitrary object and persist it.

Different tools for different jobs, I would think.

The FirstStep Example

The new FirstStep example application allows you to get an InfoGrid application running literally in 60 seconds or less.

FirstStep shows the essence of how a tagging application like Delicious would be implemented using InfoGrid.

Instructions and annotated source code are here: http://infogrid.org/wiki/Examples/FirstStep.

InfoGrid 2.9.2 Released

InfoGrid 2.9.2 is focused on the new project layout of the code base. This new layout has also been documented on the wiki, starting with the front page and continuing to the projects page.

The new layout will make it easier for newcomers to find their way around InfoGrid, and to selectively include only those parts of InfoGrid required for a given application. Its top-level structure is as follows:

Below, you find directories such as:

  • modules: contains the functionality of the project
  • tests: automated tests for the project
  • testapps: web applications testing the project
  • etc.

Enjoy!

First Academic Workshop on Graph Databases: in China

This is remarkable.

Sooner or later, somebody had to organize an “international workshop on graph databases” in an academic setting. It happens with all technologies. So it’s not surprising that there will be one in July this year, in conjunction with a conference and with the proceedings published by Springer, just as you would expect.

It is surprising that the workshop is organized in China, by Chinese researchers. That’s a first, at least from what I have seen so far. Usually you would expect something in, say, Florida, or the south of France or perhaps Spain or Germany. But it is China.

Way to go! Some people are faster to spot a trend than others, and as an entrepreneur, I admire that.

Jonathan Ellis: The NoSQL Ecosystem

Excellent article by Jonathan Ellis on the various approaches to non-relational databases in the market today. He categorizes products and projects along three dimensions:

  1. scalability: in particular, how well one can add and remove servers in local or remote data centers
  2. data and query model: he finds a lot of variety there
  3. persistence design: alternatives range from in-memory only to smart caching strategies to on-disk

This categorization is really useful, more so than several others that have been proposed.

Let’s apply this to InfoGrid’s graph database layer:

Regarding scalability, InfoGrid scales as well as its underlying persistence layer. InfoGrid makes storage pluggable by delegating to the Store abstraction, and Store can be implemented on top of any key-value store. So InfoGrid is just as scalable as the Store implementation it runs on.
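The idea of pluggable storage can be sketched in Java as follows. The KeyValueStore interface and its methods are hypothetical and much simpler than InfoGrid’s actual Store abstraction; the point is only that the layer above depends on the interface alone, so the backing implementation can be swapped for one built on a different key-value store.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified storage interface; InfoGrid's real Store API differs.
interface KeyValueStore {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
    void delete(String key);
}

// One possible implementation, backed by a HashMap. Another implementation
// could delegate to an external or distributed key-value store instead.
final class InMemoryKeyValueStore implements KeyValueStore {
    private final Map<String, byte[]> data = new HashMap<>();

    @Override public void put(String key, byte[] value) { data.put(key, value.clone()); }

    @Override public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    @Override public void delete(String key) { data.remove(key); }
}

// The layer above only sees the interface, so its scalability is whatever
// the chosen KeyValueStore implementation provides.
final class NodeRepository {
    private final KeyValueStore store;

    NodeRepository(KeyValueStore store) { this.store = store; }

    void saveNode(String id, String serialized) {
        store.put("node:" + id, serialized.getBytes(StandardCharsets.UTF_8));
    }

    Optional<String> loadNode(String id) {
        return store.get("node:" + id).map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }
}
```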

InfoGrid’s data and query model is based on a graph and an explicit object model. This makes life even easier for the developer than any of the alternatives he discusses in his article. Also, we think our traversal API is a lot simpler than some others that we have seen.

InfoGrid’s persistence design actually gives developers more choices than is typical: InfoGrid can run entirely in memory (if class MMeshBase is instantiated, for example), or be smartly cached to an external Store (if class StoreMeshBase is instantiated). Most importantly, developers write to the same API either way. This allows them to write application code once, and only later decide how to store their application data. And if one kind of Store does not work out (or does not scale once the application becomes popular), they can move to another without changing the application, other than its initialization.
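The benefit of one API with interchangeable backings can be sketched in Java. NodeBase, InMemoryNodeBase and CachedNodeBase are hypothetical stand-ins loosely modeled on the roles described for MMeshBase and StoreMeshBase; they are not InfoGrid classes, and the “external store” is simulated by a Map. The application method tagNode is written once, and only the construction of the NodeBase changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical API that application code is written against.
interface NodeBase {
    void setProperty(String nodeId, String value);
    String getProperty(String nodeId);
}

// Keeps everything in memory only.
final class InMemoryNodeBase implements NodeBase {
    private final Map<String, String> values = new HashMap<>();

    @Override public void setProperty(String nodeId, String value) { values.put(nodeId, value); }

    @Override public String getProperty(String nodeId) { return values.get(nodeId); }
}

// Writes through to an external store (simulated by a Map) and caches reads.
final class CachedNodeBase implements NodeBase {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> externalStore;

    CachedNodeBase(Map<String, String> externalStore) { this.externalStore = externalStore; }

    @Override public void setProperty(String nodeId, String value) {
        externalStore.put(nodeId, value); // write-through
        cache.put(nodeId, value);
    }

    @Override public String getProperty(String nodeId) {
        // On a cache miss, load from the external store and remember the result.
        return cache.computeIfAbsent(nodeId, externalStore::get);
    }
}

final class TaggingApp {
    // Application code: identical no matter which NodeBase is passed in.
    static String tagNode(NodeBase base, String nodeId, String tag) {
        base.setProperty(nodeId, tag);
        return base.getProperty(nodeId);
    }
}
```

Switching from TaggingApp.tagNode(new InMemoryNodeBase(), ...) to a CachedNodeBase changes only the initialization, which is the property the paragraph above describes.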

Carsonified: Why Graph Databases

Martin Kleppmann summarizes the case for Graph Databases at carsonified.com. This is exactly why InfoGrid is built around a graph of MeshObjects:

… graph databases focus on the relationships between items — a better fit for highly interconnected data models.

Standard SQL cannot query transitive relationships, i.e. variable-length chains of joins which continue until some condition is reached. Graph databases, on the other hand, are optimised precisely for this kind of data. Look out for these symptoms indicating that your data would better fit into a graph model:

  • you find yourself writing long chains of joins (join table A to B, B to C, C to D) in your queries;
  • you are writing loops of queries in your application in order to follow a chain of relationships (particularly when you don’t know in advance how long that chain is going to be);
  • you have lots of many-to-many joins or tree-like data structures;
  • your data is already in a graph form (e.g. information about who is friends with whom in a social network).

Graph databases are often associated with the semantic web and RDF datastores, which is one of the applications they are used for. I actually believe that many other applications’ data would also be well represented in graphs. However, as before, don’t try to force data into a graph if it fits better into tables or documents.
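The point about transitive relationships can be illustrated with a breadth-first traversal, sketched here in Java over a plain adjacency map (a hypothetical structure, not InfoGrid’s traversal API). It follows edges hop by hop until it reaches a node satisfying the condition, without knowing in advance how many hops are needed; in SQL this typically takes one join per hop, or a recursive query where the database supports one.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

final class Traversal {
    // Breadth-first search: returns the number of hops from start to the nearest
    // node matching target, or empty if no such node is reachable.
    static Optional<Integer> hopsTo(Map<String, List<String>> edges, String start, Predicate<String> target) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (target.test(current)) {
                return Optional.of(distance.get(current));
            }
            for (String next : edges.getOrDefault(current, Collections.emptyList())) {
                if (!distance.containsKey(next)) { // skip visited nodes, so cycles terminate
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return Optional.empty();
    }
}
```

For example, with edges ann→bob, bob→cat and cat→dan, Traversal.hopsTo(edges, "ann", "dan"::equals) returns Optional[3].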

In our experience, social applications in particular, and applications that deal with complex, interrelated data, are much easier to build using a graph of typed objects in InfoGrid than by shoehorning their data into relational tables. But then, InfoGrid can use relational databases as storage engines, so we have the best of both worlds: graphs on the front, and enterprise-friendly SQL on the back.