Operations on a Graph Database (Part 2 – Edges)

Graph Database Tutorial

Part 1: Nodes

Part 2: Edges

Part 3: Types

Part 4: Properties

Part 5: Identifiers

Part 6: Traversals

Part 7: Sets

Part 8: Events

In the first post of this series, we looked at creating and deleting Nodes. Today we are looking at Edges.

Unlike simpler NoSQL data stores like key-value stores, graph databases not only manage nodes, but also edges. Edges are things that connect two other data elements, and graph datastores have them as a basic element managed by the store. Think of them as the line between two boxes; that’s exactly what they are.

Edges often take developers a while to get used to, because there isn’t much precedent in the world of software. Even the so-called “relational database” doesn’t actually have “relationships” as a first-class concept: we have to infer them from primary/foreign key declarations; and that only works if developers actually declare them, which is not all that common.

Edges don’t exist in normal code either. Pretty much all mainstream programming languages only have pointers, not relationships aka edges. Edges are bidirectional, managed things, while pointers are one-directional and not managed at all. Let’s take an example (using a simplified version of the InfoGrid API, see the FirstStep example for complete code of a basic URL tagging application):

MeshObject customer = createMeshObject(); // create first node, called MeshObject in InfoGrid
MeshObject order    = createMeshObject(); // create second node
customer.relate( order );

What did we just do?

We created a customer object, and an order object, and then we said the two are related. (The graph database makes sure the objects get persisted automatically when the Transaction is closed; not shown here as we try to stay on topic.)

If we had to do that in straight Java, we’d do something like this:

Customer customer = new Customer();
Order    order    = new Order();
customer.addOrder( order );
order.setCustomer( customer );

and we’d have to write the code to manage the edge ourselves, such as:

class Customer {
    ...
    private List<Order> ordersOfThisCustomer = new ArrayList<Order>();
}
class Order {
    ...
    private Customer customerPlacingThisOrder;
}

The question is: why do we have to do all this work for a simple 1-N relationship between customers and orders? The graph database API is much better: for one, it lets the database worry about how and when to save and restore the involved objects. It could, for example, (as InfoGrid does), decide to restore from disk the Customer object but not the Order object for some time because the application code does not currently need to know the Customer’s orders. And referential integrity is always guaranteed. For example:

customer.traverseToNeighbors().getSingleMember(); // returns the single Order object
order.traverseToNeighbors().getSingleMember(); // returns the single Customer object

// now we delete the edge
customer.unrelate( order );

customer.traverseToNeighbors().getSingleMember(); // returns null
order.traverseToNeighbors().getSingleMember(); // returns null

If there is no graph database involved, we need to do it manually, like this:

customer.removeOrder( order );
order.setCustomer( null );

… and hope that we don’t forget one of those calls, because then referential integrity would be out the window, and the next application crash is a certainty.

Imagine if we wanted to restore the Customer and the Order object at different times from disk. Without help from sophisticated run-time infrastructure like a graph database, band-aid-technologies such as object-relational mapping is most likely going to create a separate instance for, say, the restored Order object, and code such as List.remove( … ) is not going to work because we have two Java objects in memory that represent the same order. (Shudder.)

Of course, code could be written to manage all of this manually, but it’s much better if the platform takes care of it.

[The astute reader will notice that the plain Java example has one advantage: it provides type safety. I'll have to say more about this in an upcoming post about types.]

So: after working with graph databases for a while, many people believe that edges are actually the much more interesting and useful concept than nodes. Just like many data modelers think that the value of a data model is often more in the way the entities are connected by relationships than the details of the entities. Automatic management of relationships make things simple, and that’s what any good database should do. Developers have enough to worry about, and graph databases provide real help here.

In the next post, we’ll look at types.

Leave a comment:

You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>