Today we’re looking at properties. There are a few different philosophies that a graph database might employ.
1. The purists often argue that properties aren’t needed at all: all properties can be modeled as edges to separate nodes, each of which represents a value. That’s of course true at some level: instead of a node representing a person, for example, that “contains” the person’s FirstName, LastName and DateOfBirth, one could create two String nodes and a TimeStamp node, and connect the with edges representing “first name”, “last name” and “date of birth”.
The non-purists counter that for practical purposes, it is much simpler to think of these data elements as properties instead of as independent things that are related. For example, it makes deletion of the Person much simpler (and we don’t need to implement cascading delete rules). Also, there are performance tradeoffs: if the three properties are stored with their owning node, for example, a single read is required to restore from disk the node and all of its properties. This would require at least 4 (perhaps 7, depending on how edges are stored in the graph database) reads if stored independently.
In InfoGrid, we believe in properties. We don’t prevent anybody from creating as many edges as they like, of course, but think that properties definitely have their uses.
2. Properties have to be named in some fashion, and the simplest approach — used by a number of graph database projects — is to give them a String label as a name. Correspondingly, the essence of the property API using Strings as labels would look like this:
public Object getPropertyValue( String name );
public void setPropertyValue( String name, Object value );
The advantage of this model is obviously that it is very simple. The disadvantage is that for complex schemas or models created by multiple development teams, name conflicts and spelling errors for property names occur more frequently than one would like. At least that is our experience when building InfoGrid applications, which is why we prefer the next alternative:
3. Properties are identified by true meta-data objects. We call them PropertyTypes, and they are part of what developers define when defining an InfoGrid Model. So the InfoGrid property API looks like this:
public Object getPropertyValue( PropertyType name );
public void setPropertyValue( PropertyType name, Object value );
We’ll have more to say on the subject of meta-data and Models in a future post.
Finally, we need to discuss what in a graph database can carry properties. Everybody other than the purists (see above) agree that nodes (called MeshObjects in InfoGrid) can carry properties. Some graph database projects (like the now-obsolete InfoGrid V1) also allow properties on edges (called Relationships in InfoGrid). Others (InfoGrid today) do not allow that.
It may sound peculiar that we had what looks like a more powerful approach in an earlier InfoGrid version but not any more. Here is what we observed in our practice with InfoGrid:
- Properties on edges are fairly rare compared to Properties on nodes. We’ve been involved in several projects over the years where the Models were substantial and not a single property was found on any edge; nor did anybody ask for one.
- If a property is needed on an edge, there is an easy workaround known as “associative entity” in data modeling circles: simply create an intermediary node that carries the property.
- The deciding factor was performance: if properties are rarely needed on edges, it is possible to traverse from one node to a neighbor node in a single step. If properties are needed on edges, the edge needs to be represented as a separate object, and a traversal from one node to its neighbor requires two steps: from the start node to the connecting edge, and from the edge to the destination node. So not having properties on edges can improve performance by a 100%. Which is why we got rid of them for InfoGrid V2.
In the next post, we will look at data types for properties.