Operations on a Graph Database (Part 4 – Properties)

Graph Database Tutorial

Part 1: Nodes

Part 2: Edges

Part 3: Types

Part 4: Properties

Part 5: Identifiers

Part 6: Traversals

Part 7: Sets

Part 8: Events

Today we’re looking at properties. There are a few different philosophies that a graph database might employ.

1. The purists often argue that properties aren’t needed at all: all properties can be modeled as edges to separate nodes, each of which represents a value. That’s of course true at some level: instead of a node representing a person, for example, that “contains” the person’s FirstName, LastName and DateOfBirth, one could create two String nodes and a TimeStamp node, and connect them to the person node with edges representing “first name”, “last name” and “date of birth”.

The non-purists counter that for practical purposes, it is much simpler to think of these data elements as properties instead of as independent things that are related. For example, it makes deletion of the Person much simpler (and we don’t need to implement cascading delete rules). There are also performance tradeoffs: if the three properties are stored with their owning node, a single read restores the node and all of its properties from disk. Stored independently, the same data would require at least 4 reads (the Person node plus three value nodes), and perhaps 7 if the graph database also stores each of the three edges as a separate record.
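To make the read counts concrete, here is a minimal sketch that counts record loads for both layouts. Everything in it (ToyStore, PropertyLayouts, the record layout, the sample values) is a hypothetical illustration and not InfoGrid code; each call to load() simply stands in for one disk read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy store; every load() stands in for one disk read. */
class ToyStore {
    int reads = 0;
    private final Map<String, Map<String, Object>> records = new HashMap<>();

    void put(String id, Map<String, Object> record) {
        records.put(id, record);
    }

    Map<String, Object> load(String id) {
        reads++;
        return records.get(id);
    }
}

class PropertyLayouts {
    /** Builds a record with a single key/value pair. */
    static Map<String, Object> record(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    /** Non-purist layout: the three properties live in the Person record. */
    static int readsWithProperties() {
        ToyStore store = new ToyStore();
        Map<String, Object> person = new HashMap<>();
        person.put("FirstName", "Ada");
        person.put("LastName", "Lovelace");
        person.put("DateOfBirth", "1815-12-10");
        store.put("person", person);

        store.load("person"); // one read restores the node and its properties
        return store.reads;
    }

    /** Purist layout: each value is its own node, reached through an edge. */
    static int readsWithValueNodes(boolean edgesStoredSeparately) {
        ToyStore store = new ToyStore();
        String[] labels = {"first name", "last name", "date of birth"};
        String[] values = {"Ada", "Lovelace", "1815-12-10"};
        Map<String, Object> person = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            String valueId = "value" + i;
            store.put(valueId, record("value", values[i]));
            if (edgesStoredSeparately) {
                // The edge is a record of its own that points at the value node.
                String edgeId = "edge" + i;
                store.put(edgeId, record("target", valueId));
                person.put(labels[i], edgeId);
            } else {
                // The edge is stored inside the Person record.
                person.put(labels[i], valueId);
            }
        }
        store.put("person", person);

        Map<String, Object> loaded = store.load("person");
        for (Object ref : loaded.values()) {
            Map<String, Object> next = store.load((String) ref);
            if (edgesStoredSeparately) {
                store.load((String) next.get("target"));
            }
        }
        return store.reads;
    }
}
```

In this toy model, readsWithProperties() performs 1 load, readsWithValueNodes(false) performs 4, and readsWithValueNodes(true) performs 7, matching the numbers above. A real graph database may cache or cluster records, so actual costs will differ.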

In InfoGrid, we believe in properties. We don’t prevent anybody from creating as many edges as they like, of course, but think that properties definitely have their uses.

2. Properties have to be named in some fashion, and the simplest approach — used by a number of graph database projects — is to give them a String label as a name. Correspondingly, the essence of the property API using Strings as labels would look like this:

public Object getPropertyValue( String name );
public void setPropertyValue( String name, Object value );

The advantage of this model is obviously that it is very simple. The disadvantage is that for complex schemas or models created by multiple development teams, name conflicts and spelling errors for property names occur more frequently than one would like. At least that is our experience when building InfoGrid applications, which is why we prefer the next alternative:
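The spelling-error problem is easy to reproduce. The following sketch backs the two methods above with a plain HashMap; the class names StringKeyedNode and StringKeyedDemo are hypothetical and not taken from any particular graph database.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node whose properties are labelled with Strings. */
class StringKeyedNode {
    private final Map<String, Object> properties = new HashMap<>();

    public Object getPropertyValue(String name) {
        return properties.get(name);
    }

    public void setPropertyValue(String name, Object value) {
        properties.put(name, value);
    }
}

class StringKeyedDemo {
    public static void main(String[] args) {
        StringKeyedNode person = new StringKeyedNode();
        person.setPropertyValue("FirstName", "Ada");

        // Correct label: prints "Ada".
        System.out.println(person.getPropertyValue("FirstName"));

        // Misspelled label: compiles fine and quietly prints "null".
        System.out.println(person.getPropertyValue("Firstname"));
    }
}
```

Nothing in the compiler or in the node itself can tell that "Firstname" was a mistake, so the error only surfaces later as a missing value.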

3. Properties are identified by true meta-data objects. We call them PropertyTypes, and they are part of what developers define when defining an InfoGrid Model. So the InfoGrid property API looks like this:

public Object getPropertyValue( PropertyType name );
public void setPropertyValue( PropertyType name, Object value );

We’ll have more to say on the subject of meta-data and Models in a future post.
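As a rough illustration of why meta-data objects help, consider the sketch below. ToyPropertyType, PersonModel and TypedNode are simplified, hypothetical stand-ins written for this example; they are not InfoGrid's actual PropertyType or MeshObject classes, and the validation shown is an assumption about what such a design could enforce.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a property meta-data object. */
final class ToyPropertyType {
    private final String name;

    ToyPropertyType(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical model: the one place where property types are declared. */
final class PersonModel {
    static final ToyPropertyType FIRST_NAME = new ToyPropertyType("FirstName");
    static final ToyPropertyType LAST_NAME = new ToyPropertyType("LastName");

    private PersonModel() {
    }
}

/** Hypothetical node that only accepts the property types it was created with. */
class TypedNode {
    private final Set<ToyPropertyType> allowed;
    private final Map<ToyPropertyType, Object> values = new HashMap<>();

    TypedNode(ToyPropertyType... allowedTypes) {
        this.allowed = new HashSet<>(Arrays.asList(allowedTypes));
    }

    public Object getPropertyValue(ToyPropertyType type) {
        check(type);
        return values.get(type);
    }

    public void setPropertyValue(ToyPropertyType type, Object value) {
        check(type);
        values.put(type, value);
    }

    private void check(ToyPropertyType type) {
        // Types are compared by identity, so a same-named type from
        // another model does not collide with this node's types.
        if (!allowed.contains(type)) {
            throw new IllegalArgumentException(
                    "Property type not in this node's model: " + type.getName());
        }
    }
}
```

With this shape, a misspelling such as PersonModel.FIRST_NAEM fails at compile time, and a property type that happens to share a name but was defined elsewhere is rejected at run time instead of silently overwriting a value.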

Finally, we need to discuss what in a graph database can carry properties. Everybody other than the purists (see above) agrees that nodes (called MeshObjects in InfoGrid) can carry properties. Some graph database projects (like the now-obsolete InfoGrid V1) also allow properties on edges (called Relationships in InfoGrid). Others (InfoGrid today) do not allow that.

It may sound peculiar that we had what looks like a more powerful approach in an earlier InfoGrid version but not any more. Here is what we observed in our practice with InfoGrid:

  • Properties on edges are fairly rare compared to Properties on nodes. We’ve been involved in several projects over the years where the Models were substantial and not a single property was found on any edge; nor did anybody ask for one.
  • If a property is needed on an edge, there is an easy workaround known as “associative entity” in data modeling circles: simply create an intermediary node that carries the property.
  • The deciding factor was performance: if edges carry no properties, it is possible to traverse from one node to a neighbor node in a single step. If edges can carry properties, the edge needs to be represented as a separate object, and a traversal from one node to its neighbor requires two steps: from the start node to the connecting edge, and from the edge to the destination node. So not having properties on edges can improve traversal performance by as much as 100%, which is why we got rid of them for InfoGrid V2.
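The “associative entity” workaround from the list above can be sketched as follows. SimpleNode and AssociativeEntityDemo are hypothetical names, and the Person / Employment / Company example with its StartDate value is made up for illustration; this is not InfoGrid's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node: properties plus direct references to neighbors. Edges carry nothing. */
class SimpleNode {
    final String label;
    final Map<String, Object> properties = new HashMap<>();
    final List<SimpleNode> neighbors = new ArrayList<>();

    SimpleNode(String label) {
        this.label = label;
    }

    /** Relates two nodes in both directions; the edge is just a pair of references. */
    void relateTo(SimpleNode other) {
        neighbors.add(other);
        other.neighbors.add(this);
    }
}

class AssociativeEntityDemo {
    /** Builds Person - Employment - Company and returns the Person node. */
    static SimpleNode buildExample() {
        SimpleNode person = new SimpleNode("Person");
        SimpleNode company = new SimpleNode("Company");

        // The intermediary node carries what would otherwise be an edge property.
        SimpleNode employment = new SimpleNode("Employment");
        employment.properties.put("StartDate", "2010-06-10");

        person.relateTo(employment);
        employment.relateTo(company);
        return person;
    }
}
```

Only relationships that actually need a property pay for the extra hop through the intermediary node; every other traversal remains a single step from node to neighbor.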

In the next post, we will look at data types for properties.

Comments:

  1. Robert Quinn says on June 10th, 2010 at 12:15 pm:

    Bummer about not allowing properties on edges. We effective date all of our relationships so now I have to use an associative object rather than just adding a start and end date to the edge.

  2. Robert Quinn says on June 10th, 2010 at 2:29 pm:

    Can you explain how I might use the associative entity pattern to implement effective dating (start/stop) for relationships?

    I’m thinking that every relationship goes through an effectivity node. simply make each effectivity node unique, double the relationship counts and (nodes increase by 1/3rd) in the graph.

    transversal would be work, but all N step and neighbor logic has to account for this.

    I’m new to graphs, any suggestions?

  3. I don’t quite understand what the semantics are that you are trying to capture. Perhaps move the discussion to the mailing list?

  4. Robert Quinn says on June 12th, 2010 at 3:54 pm:

    thanks I’ll post my question there

  5. Najam Haq says on June 6th, 2013 at 2:04 pm:

    No properties on edges? That would make a whole array of applications rather cumbersome to write. Graphs with labeled edges, or weighted edges. Most graph applications I have written — and I have written quite a few — have edges with associated weights. Sometimes the weight is a vector. Multiple numeric properties on edges would be very useful to me.

  6. admin says on June 6th, 2013 at 2:21 pm:

    It’s a tradeoff. We used to have properties on RelationshipTypes but removed them because most apps developed on InfoGrid that we were aware of at the time did not have much of a need for that feature (and creating an associated EntityType is a straightforward workaround).

    At the same time, developers really did not like that any relationship traversal effectively needed one additional disk access to retrieve those property values, which is not good if 95%+ of your RelationshipTypes don’t actually have any PropertyTypes.

    As you say, it depends on the application.
