Generating a Hierarchical HTML Outline

Graph databases like InfoGrid are much better at traversing recursive data structures than relational databases. That is probably an established fact by now and not controversial.

But what does this mean in practice?

Here is an example just how simple it can be to generate a hierarchical HTML outline from a graph using InfoGrid’s custom tags:

<dl>
  <tree:treeIterate
            startObjectName="Subject"
            traversalSpecification="xpath:*"
            meshObjectLoopVar="current">
    <tree:down>
      <dd><dl>
    </tree:down>
    <tree:up>
      </dl></dd>
    </tree:up>
    <tree:nodeBefore>
      <dt>
        <mesh:meshObject meshObjectName="current"/>
      </dt>
    </tree:nodeBefore>
  </tree:treeIterate>
</dl>

The result is a set of nested HTML definition lists, which are a good match for an outline in HTML.

By way of explanation of this example, the outer tag is a big loop, which iterates over all elements in the outline. Every time we walk down the tree, the <tree:down> tag fires and emits the HTML list start element contained there. Every time we walk up the tree, the corresponding <tree:up> tag triggers and closes the list. “Subject” is the name of the start node as JSP bean, and the xpath-like expression is used to describe which branches to traverse in the graph, because it would just as simple traversing the opposite direction, or sideways.

I don’t think it gets much simpler ;-) Imagine the pain in SQL, particularly if the tree is deep …

Web Apps and InfoGrid

Graph databases are great for web apps: instead of somehow having to map tables and rows to URLs, developers can simply 1-to-1 map objects in the GraphDB to URLs.

Unlike most other GraphDBs, InfoGrid has extensive libraries to make the creation of web pages from dynamic graphs straightforward.

Here’s an example. Let’s say on your web page, you want to show a list of nodes related to some node ‘x’. For example, a social media friends list, or list of purchases, or bookmarks etc. Here is the JSP code:

<set:traverseIterate startObjectName="x" traversalSpecification="*" loopVar="y">
  <set:iterateNoContent><p>Sorry, no neighbors.</p></set:iterateNoContent>
  <set:iterateHeader>
    <ul>
  </set:iterateHeader>
  <set:iterateContentRow>
    <li>
      <mesh:meshObjectLink meshObjectName="y">
        <mesh:meshObject meshObjectName="y" />
      </mesh:meshObjectLink>
    <li>
  </set:iterateContentRow>
  <set:iterateFooter>
    </ul>
  </set:iterateFooter>
<set:traverseIterate>

If you look closely, you see that not only does it print the neighbor nodes in the graph, but also adds a hyperlink to them, so they can clicked on and examined. And if there are no neighbors, a message is printed saying so.

What if I don’t want any kind of neighbor, but only those related in some fashion, say business colleagues? Well, depends a bit on your model, but it’s usually as simple as replacing the first line of JSP above with something like this:

<set:traverseIterate startObjectName="x"
        traversalSpecification="com.example.MySubjectArea/Person_IsColleagueOf_Person-S"
        loopVar="y">

Notice that there is no other code that needs to be written, no object-relational mapping, no SQL syntax to get right, no handler code or other clumsy magic. Of course, you can have may of these statements on the same page, nested etc.

A great way of developing rich web pages really quickly.

The Bless Relationships API

InfoGrid supports types on both nodes and edges, aka MeshObjects and Relationships. They are called EntityTypes and RelationshipsTypes, respectively.

Assume we have two related MeshObjects, each blessed with a (hypothetical) type Person. Like this:

MeshObject obj1 = ...;
MeshObject obj2 = ...;
obj1.bless( PERSON );
obj2.bless( PERSON );
obj1.relate( obj2 );

Now if we want to express that the second Person is the daughter of the first Person, we cannot do this:

obj1.blessRelationship( PERSON_PARENT_PERSON, obj2 );

because we don’t know, from this API, whether obj1 or obj2 is the daughter or the parent.

Instead, in InfoGrid, we have to explicitly specify the direction, and we do this by specifying the “end” of the relationship type instead of the relationship type for the bless operation, like this:

obj1.blessRelationship( PERSON_PARENT_PERSON.getSource(), obj2 );

Hopefully, the documentation of the parent relationship type in the model indicates the semantics of the source and the destination of each relationship type. These “ends” are called RoleTypes, by the way.

One nice side effect of this approach to the API is that it becomes quite straightforward to extend it if InfoGrid ever decided to support ternary or N-ary relationship types: Just pick a RoleType beyond a source or destination RoleType.

If you bless relationships from the HttpShell — the simplest way of manipulating the graph from a web application — you thus need to specify the identifier of the RoleType, not of the RelationshipType. By convention, the source RoleType of a RelationshipType with identifier ID is ID-S, and ID-D for the destination.

Error Messages, Parameters and Other Resources In InfoGrid

InfoGrid internally configures all parameters and messages via resource files. This makes internationalization much easier, and gives application developers a way of changing internal parameters if so desired. InfoGrid includes an override mechanism allows developers to use different values than InfoGrid’s default values without having to change InfoGrid’s property files.

Let’s take an example. Class org.infogrid.httpd.HttpServer implements a very simple HTTP server that’s been quite useful for automated testing, for example. Its default listening port should not be hard-coded, but instead is configured via a resource. In the code, it looks as follows:

private static final ResourceHelper theResourceHelper
        = ResourceHelper.getInstance( HttpServer.class );

protected static final int DEFAULT_ACCEPT_PORT
        = theResourceHelper.getResourceIntegerOrDefault(
                "DefaultAcceptPort",
                8081 );

Class org.infogrid.util.ResourceHelper is basically a wrapper around the Java resource facilities, but with some additional facilities, such as the ability to parse integers and use defaults as can be seen from this code fragment. When this code runs, InfoGrid looks for an override first (more about this below), then for a property file called org/infogrid/httpd/HttpServer.properties. In that file, it looks for a line that looks like this:

DefaultAcceptPort=1234

If found, DEFAULT_ACCEPT_PORT will be assigned that value. If not found (as in the current InfoGrid version, as the line is commented out), it will use the default specified in the code, here 8081.

If a developer wishes to override the property in that file, they could of course change that property file, but that would effectively create an InfoGrid branch, which is undesirable. Instead, an override mechanism can be used:

The ResourceHelper can be initialized with an “application resource bundle”, which essentially is just another property file specific to the application. Resources specified in that file override all others. An application developer could specify, in that file:

org.infogrid.httpd.HttpServer!DefaultAcceptPort=80

and DEFAULT_ACCEPT_PORT will become 80. It is necessary to qualify the name of the resource with the fully-qualified name of the ResourceHelper so no naming collisions occur across modules or developers, hence the prefix separated by the !.

The application resource bundle for an application can be set manually, by invoking ResourceHelper.setApplicationResourceBundle( ResourceBundle ) in the application’s startup code, or, preferably when using the InfoGrid Module Framework, in the application’s module advertisement as a parameter like this:

<parameter name="org.infogrid.util.ResourceHelper.ApplicationResourceBundle"
        value="com/example/ExampleApp" />

This identifies file com/example/ExampleApp.properties as the application resource file.

Required vs. Optional Property Values

InfoGrid distinguishes between properties that must have a non-null value, and properties that may or or may not be null.

When creating an InfoGrid model, a developer has to specify which by using the <isoptional/> tag in the model file.

Why?

By way of parallel, consider the following piece of Java code:

class Foo {
    private int max1 = 10;
    private Integer max2 = 20;

    public void doSomething() {
        for( int i=0 ; i<max1 ; ++i ) {
            //...
        }
        for( int i=0 ; i<max2 ; ++i ) {
            //...
        }
    }

Spot the problem? max2 of course might be null, which means our code will throw an exception in the innocent-looking second for loop. To get the code right, we will have to protect that section with an if-then-else section that checks for null first.

Of course, such a protection is often the right thing to do. But in this example, a “max” should hardly ever be null, so using an “int” as a data type like for max1 (which can’t be null) is much better than using an “Integer” like for max2 (which may be null).

It’s the same thing for properties in InfoGrid models. Some properties simply should never be null. For example, consider a time stamp indicating when a MeshObject was created. Given that the MeshObject was created, the time stamp must exist, and therefore a null value makes no sense. In which case the property would be specified as “mandatory”. On the contrary, a time stamp when a MeshObject is likely to become obsolete is very likely optional: we might not know that time (yet), or it might never become obsolete, so null values are fine.

If InfoGrid did not distinguish between required and optional values, application code would be littered with unnecessary tests for null values. (or failing that, unexpected NullPointerExceptions.) We think being specific is better when creating the model; higher-quality and less cluttered application code is the reward.

Also check out the following related posts:

Tip: Always RelateIfNeeded when using HttpShell

The InfoGrid HttpShell is something rather amazing when creating web applications. With just a little bit of HTML markup, we can modify the graph of nodes and edges in the GraphDB any way we want, all without writing any handler code for it.

For edges, it understands the keywords:

  • relate
  • relateIfNeeded
  • unrelate
  • unrelateIfNeeded

Relate will fail if the two nodes in question are related already. And in practice, that turns out to be more common than one would assume; only blessing of the edge with the appropriate RoleType is often needed.

So here’s the tip of the day: unless there are good reasons not to, use “relateIfNeeded” and “unrelateIfNeeded” for your application code.

InfoGrid Now Supports Money as a Native Data Type

There’s a new DataType called CurrencyDataType, and corresponding CurrencyValue. A CurrencyValue consists of a fixed-point decimal number and one of the ISO 4217 currencies. For example, you can say

CurrencyValue newValue = CurrencyValue.create( 12, 34, CurrencyDataType.USD );
System.out.println( newValue );

which will, by default, print:

12.34 USD

It correctly handles currencies that have 0, 1, or 3 digits after the decimal point, too.

In a JSP page, you can say, as you would expect:

<mesh:property
    meshObjectName="Subject"
    propertyType="org.infogrid.model.Test/AllProperties_OptionalCurrencyDataType" />

(or the identifier of a PropertyType in your model that is of type CurrencyDataType) which will print 12.34 USD in view mode, and automatically make it editable in edit mode.

Easiest to try out by running the test app org.infogrid.jee.testapp. Currently currency support is only in trunk, but will get promoted over time.

Isn’t it much easier if your data platform does this, than having to write currency parsing and rendering code all over one’s application?

InfoGrid Screencasts on YouTube

InfoGrid is now on YouTube at InfoGridDotOrg.

Occasionally we’ll post a software demo. The first one is there already: a screencast of the MeshWorld example application for the InfoGrid graph database. It shows:

  • How to create and delete MeshObjects (ie. nodes)
  • How to relate them to each other (ie. edges)

It follows the FirstStep example, except that MeshWorld is a web application, while FirstStep is a command-line application. There’ll be more to come.

InfoGrid and Multigraphs

Graph theory distinguishes between graphs that are multigraphs and those that are not, i.e. between graphs where there may be more than one edge between any two nodes, and those graphs where there can only be one edge between any two nodes.

On the first glance, InfoGrid does not support multigraphs. On the second glance, InfoGrid is a bit more powerful than it appears; not an unusual occurrence ;-) Let me explain:

Here is how we create an edge between two nodes in InfoGrid:

MeshObject node1 = ...;
MeshObject node2 = ...;

node1.relate( node2 );

If we try to do the last statement again, InfoGrid will throw a RelatedAlreadyException. That clearly says that multiple edges between the same two nodes are not allowed.

But.

We do this in InfoGrid because unless there is some associated information with each of these edges, we cannot distinguish them and so there is no point in having two. The way we do that in InfoGrid is to bless the edge with more than one RelationshipType, such as:

node1.blessRelationship( PARENT_TO_SON, node2 );
node1.blessRelationship( CUSTOMER_TO_VENDOR, node2 );

These two relationships express that node1 is a parent of son node2, and that node2 (the son) has sold something to customer node 1 (the parent). Very different semantics associated with the same edge. Just like multiple edges in a multigraph.

So: InfoGrid supports multigraphs just fine, just perhaps not entirely as you would have expected.

Operations on a Graph Database (Part 6 – Traversals)

Graph Database Tutorial

Part 1: Nodes

Part 2: Edges

Part 3: Types

Part 4: Properties

Part 5: Identifiers

Part 6: Traversals

Part 7: Sets

Part 8: Events

Traversals are the most common operations on a graph database. They are just as important for graph databases as joins are for relational databases. But of course, they are something else as graphs are not tables.

A traversal in a graph database is uniquely described by two data items:

  • a start node
  • a specification for the traversal

A traversal always leads to a set of nodes. Depending on the structure of the graph being traversed, that set may contain many, one or zero nodes. For example, if we traverse from a node representing a Person to the nodes representing their grandchildren, we may or may not find any, depending on whether they have any.

From a given node, we can traverse in many different ways (i.e. same node, different traversal specifications). Or, given the same traversal specification, we can start with different nodes.

By way of analogy, consider street directions:

  • start node: my house
  • traversal specification: first turn left, then go either straight or left.

The result of this particular traversal is a single-element set containing the neighborhood park. If you had started at the same node (my house), but gone right first, you would not have arrived at the park. If you had started at a different node (somebody else’s house), you may or may not have arrived at the park. You may not have arrived anywhere (perhaps there is no left that one can take from certain houses). Or you might have arrived in multiple places (“go either straight or left” might not take you to the same part regardless which you take, but taken you into different directions.

Graph database products seem to differ on how to deliver to the developer the set of nodes that is the result of a traversal. In InfoGrid, all traversals produce a MeshObjectSet, which, as the name says, is a set of nodes. One can then iterate over that set, for example, subset it, unify it with another or ask how many elements it has. In other products, traversals produce an iterator directly which then can be queried for one member of the result set at a time. Regardless of API details, the result of a traversal is always a set (e.g. it can’t contain duplicates.)

Just like there are many ways of giving directions, traversal specifications can be captured in many different ways. In InfoGrid, we have — you guessed it — an abstract data type called TraversalSpecification and several different classes that implement that type, such as:

  • traverse by going to all direct neighbor nodes of the start node
  • go to all neighbor nodes related with a edge of a particular type in a particular direction (e.g. “traverse from employee to their manager(s)”)
  • go N steps in sequence, where each step can be any traversal specification
  • go N steps in parallel, and unify the resulting set
  • select a subset of the found nodes based on some criteria, etc.

The FirstStep example shows some simple traversals.

And just for simplicity, InfoGrid also allows traversals starting from a set of start nodes, not just one. So we can say things like this:

MeshObject me = ...;
MeshObjectSet myParents      = me.traverse( childToParents );
MeshObjectSet myGrandParents = myParents.traverse( childToParents );

In our experience, working with sets makes complex traversals very easily understandable.