Traversals are the most common operations on a graph database. They are as important for graph databases as joins are for relational databases. But they work differently, because graphs are not tables.
A traversal in a graph database is uniquely described by two data items:
- a start node
- a specification for the traversal
A traversal always leads to a set of nodes. Depending on the structure of the graph being traversed, that set may contain many, one or zero nodes. For example, if we traverse from a node representing a Person to the nodes representing their grandchildren, we may or may not find any, depending on whether that person has grandchildren.
From a given node, we can traverse in many different ways (i.e. same node, different traversal specifications). Or, given the same traversal specification, we can start with different nodes.
By way of analogy, consider street directions:
- start node: my house
- traversal specification: first turn left, then go either straight or left.
The result of this particular traversal is a single-element set containing the neighborhood park. If you had started at the same node (my house), but gone right first, you would not have arrived at the park. If you had started at a different node (somebody else’s house), you might or might not have arrived at the park. You might not have arrived anywhere (perhaps there is no left turn that one can take from certain houses). Or you might have arrived in multiple places (“go either straight or left” might not take you to the same place regardless of which you take, but might take you in different directions).
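The street-directions analogy can be turned into a small runnable sketch. Everything in it is an assumption made for illustration: the class name StreetDirections, the place names other than the house and the park, and the map-based street layout are hypothetical and have nothing to do with any graph database API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the analogy; none of these names come from InfoGrid.
final class StreetDirections {
    // Each place maps a turn ("left", "straight", "right") to the place it leads to.
    static final Map<String, Map<String, String>> STREETS = Map.of(
        "myHouse",      Map.of("left", "corner", "right", "mall"),
        "corner",       Map.of("straight", "park", "left", "park"),
        "otherHouse",   Map.of("left", "crossing"),
        "crossing",     Map.of("straight", "school", "left", "library"),
        "deadEndHouse", Map.of("right", "mall"));

    // One step: from every current place, take each of the allowed turns that exists.
    static Set<String> step(Set<String> from, List<String> allowedTurns) {
        Set<String> reached = new LinkedHashSet<>();
        for (String place : from) {
            Map<String, String> turns = STREETS.getOrDefault(place, Map.of());
            for (String turn : allowedTurns) {
                String next = turns.get(turn);
                if (next != null) {
                    reached.add(next); // a set: arriving twice at the same place counts once
                }
            }
        }
        return reached;
    }

    // The traversal specification from the text: first left, then straight or left.
    static Set<String> leftThenStraightOrLeft(String start) {
        Set<String> afterFirstTurn = step(Set.of(start), List.of("left"));
        return step(afterFirstTurn, List.of("straight", "left"));
    }

    public static void main(String[] args) {
        System.out.println(leftThenStraightOrLeft("myHouse"));      // [park]
        System.out.println(leftThenStraightOrLeft("otherHouse"));   // [school, library]
        System.out.println(leftThenStraightOrLeft("deadEndHouse")); // []
    }
}
```

The same specification yields one place, several places, or none, depending only on the start node and on the shape of the street map.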
Graph database products seem to differ in how they deliver the result of a traversal, a set of nodes, to the developer. In InfoGrid, all traversals produce a MeshObjectSet, which, as the name says, is a set of nodes. One can then, for example, iterate over that set, subset it, unify it with another set, or ask how many elements it has. In other products, traversals produce an iterator directly, which can then be queried for one member of the result set at a time. Regardless of API details, the result of a traversal is always a set (i.e. it can’t contain duplicates).
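To make the set semantics concrete, here is a minimal sketch that uses plain java.util collections as a stand-in for a traversal result. It does not use InfoGrid's MeshObjectSet; the class ResultSetDemo, its method names, and the sample node names are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Plain java.util collections standing in for a traversal result; not InfoGrid's API.
final class ResultSetDemo {
    // Collapse raw traversal hits into a set: a node reached via two paths appears once.
    static Set<String> toResultSet(List<String> reachedNodes) {
        return new LinkedHashSet<>(reachedNodes);
    }

    // Unify two result sets.
    static Set<String> unify(Set<String> a, Set<String> b) {
        Set<String> union = new LinkedHashSet<>(a);
        union.addAll(b);
        return union;
    }

    // Keep only the nodes that satisfy some criterion.
    static Set<String> subset(Set<String> nodes, Predicate<String> criterion) {
        Set<String> kept = new LinkedHashSet<>();
        for (String node : nodes) {
            if (criterion.test(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "carol" was reached along two different paths.
        Set<String> first = toResultSet(Arrays.asList("carol", "dave", "carol"));
        Set<String> second = toResultSet(Arrays.asList("dave", "erin"));

        System.out.println(first);                                  // [carol, dave]
        System.out.println(first.size());                           // 2
        System.out.println(unify(first, second));                   // [carol, dave, erin]
        System.out.println(subset(first, n -> n.startsWith("d")));  // [dave]
        for (String node : first) {
            System.out.println("visiting " + node);                 // iterate one member at a time
        }
    }
}
```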
Just like there are many ways of giving directions, traversal specifications can be captured in many different ways. In InfoGrid, we have — you guessed it — an abstract data type called TraversalSpecification and several different classes that implement that type, such as:
- traverse by going to all direct neighbor nodes of the start node
- go to all neighbor nodes related by an edge of a particular type in a particular direction (e.g. “traverse from employee to their manager(s)”)
- go N steps in sequence, where each step can be any traversal specification
- go N steps in parallel, and unify the resulting set
- select a subset of the found nodes based on some criteria, etc.
The FirstStep example shows some simple traversals.
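The kinds of traversal specification listed above can be sketched as small composable objects. This is a hypothetical illustration, not InfoGrid's TraversalSpecification classes: the interface Spec, the factory methods allNeighbors, byEdge, sequence, parallel and select, and the sample graph are all assumed names invented for this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: these names are illustrative and are not InfoGrid's API.
final class TraversalSketch {
    enum Direction { OUTGOING, INCOMING }

    // A directed edge of a given type between two named nodes.
    static final class Edge {
        final String from, type, to;
        Edge(String from, String type, String to) { this.from = from; this.type = type; this.to = to; }
    }

    // Sample graph (assumed data).
    static final List<Edge> EDGES = List.of(
        new Edge("alice", "reportsTo", "bob"),
        new Edge("alice", "reportsTo", "carol"),
        new Edge("bob", "reportsTo", "dana"),
        new Edge("carol", "reportsTo", "dana"),
        new Edge("alice", "friendOf", "erin"));

    // A traversal specification maps a start node to the set of nodes it reaches.
    interface Spec { Set<String> from(String start); }

    // All direct neighbors, regardless of edge type or direction.
    static Spec allNeighbors() {
        return start -> {
            Set<String> out = new LinkedHashSet<>();
            for (Edge e : EDGES) {
                if (e.from.equals(start)) out.add(e.to);
                if (e.to.equals(start)) out.add(e.from);
            }
            return out;
        };
    }

    // Neighbors related by an edge of one type, followed in one direction.
    static Spec byEdge(String type, Direction direction) {
        return start -> {
            Set<String> out = new LinkedHashSet<>();
            for (Edge e : EDGES) {
                if (!e.type.equals(type)) continue;
                if (direction == Direction.OUTGOING && e.from.equals(start)) out.add(e.to);
                if (direction == Direction.INCOMING && e.to.equals(start)) out.add(e.from);
            }
            return out;
        };
    }

    // N steps in sequence: each step starts from every node the previous step reached.
    static Spec sequence(Spec... steps) {
        return start -> {
            Set<String> current = new LinkedHashSet<>(Collections.singleton(start));
            for (Spec step : steps) {
                Set<String> next = new LinkedHashSet<>();
                for (String node : current) next.addAll(step.from(node));
                current = next;
            }
            return current;
        };
    }

    // N steps in parallel: every alternative starts from the same node; results are unified.
    static Spec parallel(Spec... alternatives) {
        return start -> {
            Set<String> out = new LinkedHashSet<>();
            for (Spec alternative : alternatives) out.addAll(alternative.from(start));
            return out;
        };
    }

    // Keep only the found nodes that satisfy a criterion.
    static Spec select(Spec inner, Predicate<String> criterion) {
        return start -> {
            Set<String> out = new LinkedHashSet<>(inner.from(start));
            out.removeIf(criterion.negate());
            return out;
        };
    }

    public static void main(String[] args) {
        Spec managers = byEdge("reportsTo", Direction.OUTGOING);
        Spec friends = byEdge("friendOf", Direction.OUTGOING);
        System.out.println(allNeighbors().from("alice"));                // [bob, carol, erin]
        System.out.println(managers.from("alice"));                      // [bob, carol]
        System.out.println(sequence(managers, managers).from("alice"));  // [dana]
        System.out.println(parallel(managers, friends).from("alice"));   // [bob, carol, erin]
        System.out.println(select(allNeighbors(), n -> n.startsWith("c")).from("alice")); // [carol]
    }
}
```

Because every specification has the same shape (start node in, set of nodes out), sequence, parallel and select can wrap any other specification, including each other. Note that the two-step traversal reaches dana along two paths, yet the result contains her once.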
To keep things simple, InfoGrid also allows traversals to start from a set of nodes, not just a single node. So we can say things like this:
    MeshObject me = ...;
    MeshObjectSet myParents = me.traverse( childToParents );
    MeshObjectSet myGrandParents = myParents.traverse( childToParents );
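One way to read a traversal that starts from a set is: traverse from each member and unify the results. The following self-contained sketch shows that reading with plain Java collections. It is an assumption about the semantics, not InfoGrid's implementation, and the class SetTraversal, the PARENTS table, and the family names are made up for the example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of set-based traversal; not InfoGrid's MeshObjectSet.
final class SetTraversal {
    // Sample data (assumed): each child maps to its parents.
    static final Map<String, List<String>> PARENTS = Map.of(
        "me",  List.of("mom", "dad"),
        "mom", List.of("grandmaA", "grandpaA"),
        "dad", List.of("grandmaB", "grandpaB"));

    // Traverse child-to-parents from every node in the start set and unify the results.
    static Set<String> childToParents(Set<String> startNodes) {
        Set<String> result = new LinkedHashSet<>();
        for (String child : startNodes) {
            result.addAll(PARENTS.getOrDefault(child, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> myParents = childToParents(Set.of("me"));
        Set<String> myGrandParents = childToParents(myParents);
        System.out.println(myParents);       // [mom, dad]
        System.out.println(myGrandParents);  // [grandmaA, grandpaA, grandmaB, grandpaB]
        System.out.println(childToParents(myGrandParents)); // [] (no data that far back)
    }
}
```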
In our experience, working with sets makes complex traversals easy to understand.