Wild Idea: Historical Information

Motivation

Surprisingly many apps built on InfoGrid need to store the historical evolution of some kind of data that is represented as a MeshObject. Examples:

  • if a MeshObject represents a web page at URL http://example.com/page1, it is often desirable to keep a change history of that page around, similar to a version control system, but native to InfoGrid so that relationships are preserved.
  • if a MeshObject represents a measurement, e.g. of a temperature, it would be desirable to keep a history of measurements around (e.g. yesterday's temperature).

While it is possible to capture historical information in InfoGrid today by creating suitable RelationshipTypes, graph databases are notoriously inefficient at managing the often thousands of Relationships between some anchor MeshObject and the MeshObjects representing its history. Also, such RelationshipTypes tend to mess up the semantics of the model, and they are generally hard to use when the goal is to version not just individual MeshObjects but the entire MeshObjectGraph.

A cleaner and more efficient approach would be desirable.

Approach

In a possible future InfoGrid version, we could do this:

  • MeshBase implementations may or may not support historical information, or perhaps offer a flag that switches the behavior on and off for efficiency reasons.
  • MeshObjects carry an additional version number attribute of type positive integer. Before each update of a MeshObject, a copy of the MeshObject is created to preserve the old version; then the version number is incremented and the update is made.
  • We extend MeshObjectIdentifier with an optional version number. If this is non-null, the MeshObjectIdentifier refers to the MeshObject in the specified version. If null, it refers to the most recent version of the MeshObject.
  • To obtain the entire, ordered history of a MeshObject, one looks up the current version of the MeshObject (leaving the version attribute at null in the MeshObjectIdentifier), determines its version number, and then looks up past versions of the MeshObject by creating a MeshObjectIdentifier with the same root and version numbers from 1 to the current version.
  • To obtain the version of a MeshObject at a certain past time, we perform a binary search on the history of the MeshObject, comparing with the timeUpdated pseudo-property.
  • To perform a historical traversal, we find the appropriate version of the MeshObject, traverse the RolePlayerTable, and then look up the then-current version of the reached MeshObject.
  • All of this can be packaged into nice APIs.
  • Existing InfoGrid apps continue to behave the same. Similarly, if an app decides to never specify a version number in a MeshObjectIdentifier, InfoGrid's behavior is unchanged.
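
The versioning, history and time lookup steps above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not InfoGrid code: `Snapshot` and `VersionedStore` are hypothetical names, a string stands in for a MeshObject's properties, the copy-on-update step is modeled by appending immutable snapshots, and `timeUpdated` is assumed never to decrease from one version to the next (which is what makes the binary search valid).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are part of the InfoGrid API.
final class Snapshot {
    final String id;          // root identifier, shared by all versions
    final int version;        // 1 for the first version, then 2, 3, ...
    final long timeUpdated;   // stand-in for the timeUpdated pseudo-property
    final String value;       // stand-in for the object's properties

    Snapshot(String id, int version, long timeUpdated, String value) {
        this.id = id;
        this.version = version;
        this.timeUpdated = timeUpdated;
        this.value = value;
    }
}

final class VersionedStore {
    // Versions of each object in ascending order; index i holds version i + 1.
    private final Map<String, List<Snapshot>> versions = new HashMap<>();

    // Record an update: earlier snapshots stay untouched, a new one is appended.
    Snapshot update(String id, String value, long time) {
        List<Snapshot> list = versions.computeIfAbsent(id, k -> new ArrayList<>());
        Snapshot next = new Snapshot(id, list.size() + 1, time, value);
        list.add(next);
        return next;
    }

    // A null version means "most recent", mirroring the optional version number.
    Snapshot find(String id, Integer version) {
        List<Snapshot> list = versions.get(id);
        if (list == null || list.isEmpty()) {
            return null;
        }
        int v = (version == null) ? list.size() : version;
        return (v >= 1 && v <= list.size()) ? list.get(v - 1) : null;
    }

    // Full history: look up the current version, then versions 1..current.
    List<Snapshot> history(String id) {
        List<Snapshot> result = new ArrayList<>();
        Snapshot current = find(id, null);
        if (current == null) {
            return result;
        }
        for (int v = 1; v <= current.version; v++) {
            result.add(find(id, v));
        }
        return result;
    }

    // Binary search for the latest version whose timeUpdated is <= time.
    Snapshot findAtTime(String id, long time) {
        Snapshot current = find(id, null);
        if (current == null) {
            return null;
        }
        int lo = 1;
        int hi = current.version;
        Snapshot best = null;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            Snapshot candidate = find(id, mid);
            if (candidate.timeUpdated <= time) {
                best = candidate;   // candidate existed at that time; try newer
                lo = mid + 1;
            } else {
                hi = mid - 1;       // candidate is too new; try older
            }
        }
        return best;
    }
}
```

With three updates to the same id at times 100, 200 and 300, `findAtTime` for time 250 would return version 2, and a lookup for a time before the first update would return null. Callers that always pass a null version never see anything but the most recent state, which matches the backward-compatibility point in the last bullet.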

API examples

These are just API ideas; there may be better ways of doing this.

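    // Look up the most recent version: the identifier carries no version number.
    MeshObjectIdentifier currentId = idFact.fromExternalForm( "foo" );
    MeshObject           current   = mb.findMeshObjectById( currentId );

    // Look up the version just before the current one, assuming the identifier
    // of a found MeshObject reports its concrete version number.
    MeshObjectIdentifier previousId = idFact.fromExternalForm(
            "foo",
            current.getIdentifier().getVersion() - 1 );
    MeshObject           previous   = mb.findMeshObjectById( previousId );

    // The history is ordered from oldest to newest.
    MeshObject []        history    = mb.getMeshObjectHistory( currentId );
    assertEquals( history[ history.length-1 ], current );
    assertEquals( history[ history.length-2 ], previous );

    // Find the version that was current at a past point in time.
    MeshObject lastYear = mb.findMeshObjectVersionByIdTime(
            currentId,
            TimeStampValue.create( LAST_YEAR ));

    // Traverse to the neighbors as they were at that time.
    MeshObjectSet lastYearNeighbors = lastYear.traverseToNeighbors(
            TimeStampValue.create( LAST_YEAR ));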
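
What a time-based traversal such as `traverseToNeighbors` might do internally can be sketched as follows. This is a rough illustration rather than a proposed implementation: `NodeVersion` and `HistoricalGraph` are hypothetical names, a list of neighbor identifiers stands in for the neighbor information kept with each version, and a `TreeMap` keyed by update time replaces the explicit binary search over version numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: these types are not InfoGrid classes.
final class NodeVersion {
    final String id;
    final long timeUpdated;
    final List<String> neighborIds;   // neighbors as recorded in this version

    NodeVersion(String id, long timeUpdated, List<String> neighborIds) {
        this.id = id;
        this.timeUpdated = timeUpdated;
        this.neighborIds = new ArrayList<>(neighborIds);
    }
}

final class HistoricalGraph {
    // Per object: versions keyed by update time, so floorEntry finds "as of" versions.
    private final Map<String, TreeMap<Long, NodeVersion>> versions = new HashMap<>();

    // Store a new version of the object with the given neighbors.
    void record(String id, long time, List<String> neighborIds) {
        versions.computeIfAbsent(id, k -> new TreeMap<>())
                .put(time, new NodeVersion(id, time, neighborIds));
    }

    // The version of the object that was current at the given time, or null.
    NodeVersion asOf(String id, long time) {
        TreeMap<Long, NodeVersion> history = versions.get(id);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, NodeVersion> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    // Traverse from the then-current start version to the then-current neighbors.
    List<NodeVersion> neighborsAsOf(String id, long time) {
        List<NodeVersion> result = new ArrayList<>();
        NodeVersion start = asOf(id, time);
        if (start == null) {
            return result;   // the start object did not exist yet
        }
        for (String neighborId : start.neighborIds) {
            NodeVersion neighbor = asOf(neighborId, time);
            if (neighbor != null) {
                result.add(neighbor);
            }
        }
        return result;
    }
}
```

The point of the sketch is that both ends of the traversal are resolved against the same timestamp: the neighbor list comes from the version of the start object that was current at that time, and each neighbor is returned in the version that was current at that time as well.
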
Last modified on 02/24/13 21:45:27