It always looked like it was only a matter of time until the object database companies would try to become graph databases. Perhaps that is what they should have been all along. I’m speaking as somebody who tried several products almost 20 years ago and decided that they were just too much hassle to be worth it: for a database, graphs are a much better level of abstraction than programming-level constructs.
strategic business unit is tasked with bringing [an] enterprise-ready and distributed graph database product to market
(I took the liberty of eliminating the “marketing” superlatives from the quote; the entire press release has a very generous sprinkling of them.)
Actually, they only announced a beta program, which I signed up for. InfiniteGraph.com says:
X:\> BETA IS NOW OPEN
But then, on the screen behind, they say:
Over the next several days, we’ll be preparing our installer and documentation for distribution to the InfiniteGraph community. Stay tuned, and feel free to participate in the discussion on our beta blog!
Well, well, the difficulties of a launch. So I don’t know yet what they created. But it’s good to see another player legitimizing graph databases as a category. So, welcome Objectivity!
This release contains some major improvements, in particular to the way the object graph is mapped to and from the web. Download or browse documentation here.
General:
improved stability and error reporting
lots of bug fixes
more tests
removed symbolic links from SVN; was an endless source of frustration
Core:
renamed TraversalDictionary to TraversalTranslator: it can be much more dynamic than a dictionary
implemented KeywordTraversalTranslator with fixed translation keywords
implemented XpathTraversalTranslator with a pseudo-subset of Xpath
introduced AllNeighborsTraversalSpecification instead of null TraversalSpecification; introduced StayRightHereTraversalSpecification
MeshObject’s userVisibleName now always returns value of a Property called “Name” if the MeshObject has one; this seems a sensible default for many applications
MeshObjectIdentifierFactory now has a pointer back to the MeshBase to which it belongs. The downside is that one cannot use the same MeshObjectIdentifierFactory instance for multiple MeshBases any more.
got rid of MeshStringRepresentationContext, which partially overlapped with the purpose of MeshStringRepresentationParameters; only historical reasons can explain why we had both
removed title/target/additionalParameters argument from HasStringRepresentation.toStringRepresentationLinkStart; now handled as parameters in StringRepresentationParameters
simplified StringRepresentation of PropertyValues and related
additional pre-defined StringRepresentations e.g. HttpPost
distinguish between formatting Properties (which may be null) and PropertyValues; no more muddling with funny MeshObject context parameters; formatting is now performed via DataType
use correct ClassLoader to load ResourceHelper default properties file during Module initialization
correct initialization of ResourceHelper in module.adv
expanded MeshObjectSet and MeshObjectSetFactory API
Identity-related:
refactored LID implementation to use an “instructions” based approach to the pipeline, instead of an exceptions-based approach. This is more flexible for users of the module.
fixed typo about credential vs. credtype.
renamed LidPersona to LidAccount; more natural to talk about it using that name
added LidAccountStatus and SiteIdentifier to LidAccount for multi-tenancy
factored account and session-related concepts from org.infogrid.lid.model.lid into new SubjectArea org.infogrid.lid.model.account.
new custom tag library that deals with identity
Model-related:
fixed code generation bug for descriptions of values in EnumeratedDataTypes
code generator now generates static constants for all EnumeratedValues
added AccountCollection to account LID model
expanded TestModel to cover both optional and mandatory PropertyTypes; renamed PropertyTypes correspondingly
improved Test model for more comprehensive testing
better user-visible strings for EntityTypes Bookmark, Account and WebResource
Viewlet/GUI-related:
Viewlet framework and tag library extensions for including Viewlets in Viewlets; updated Viewlets accordingly; now allows in-context editing, change of viewlet types etc. for included JeeViewlets; no contiguous TraversalPath from top required
support RESTful URLs on hierarchical Viewlets, e.g. GraphTreeViewlet with multiple Viewlet alternatives in the contained Viewlet
removed iframe in (Net)MeshWorld; hierarchical Viewlets is better approach
various HTML fixes and improvements related to the rendering and editing of PropertyValues
removed rootPath on all custom tags; not needed or used
sanitized formatting of Identifiers; it’s still not totally sane but a lot more so
eliminated PropertyValueTag.{css,js} and replaced with PropertyTag.{css,js}
removed -moz-opacity CSS value per recent Firefox updates
fixed HTML doctype to make IE happier
BlobViewlet to get its PropertyType from URL argument not POST argument
footer element for MeshObjectSetIterate tag
do not print iterateHeader and iterateFooter when set has no content in MeshObjectSetIterateTags
default POST behavior is now redirect-to-GET on same URL, so browser refresh is not as awful
added missing setter methods on StructuredResponse
created SetSizeTag to print the size of a MeshObjectSet in JSP
slight changes how MeshObjects are shown on screen by default (dropped annotation in which non-standard MeshBase they are)
overflow: auto; to support long CSS floats
added orderBy property to setIterate JSP tags
added ability to sort in the inverse direction
created propertymeter JSP tag for bar graphs or temperature graphs based on Properties
default sorting in JSP MeshObjectSet tags is by user-visible String
don’t set domains on cookies when running on localhost; doing so makes developers’ lives hard
generate Javascript for PropertyValues
eliminating unnecessary projects by moving their code into other projects:
renamed meshObjectLoopVar to loopVar in custom tags
added arrivedAt property to Viewlet
enctype attribute on safeForm is all lowercase
created SaneUrl, a new supertype of SaneRequest that allows reusing the same API for URLs and servlet requests; slight API naming changes (httpHost vs. server); allows us to get rid of the OverridingSaneRequest nonsense
DEFAULT_LINK_START/END_ENTRY now consistently on StringRepresentation
removed RestfulRequest, replaced with a MeshObjectsToViewFactory that directly translates SaneRequest into MeshObjectsToView
an instance of MeshObjectsToViewFactory must now reside in Context
removed NetViewletDispatcherServlet; not needed any more
removed most redundant methods on Viewlet; better to have only one clear way to do it
upgraded ViewletFactoryChoice: now HasStringRepresentation and contains MeshObjectsToView; this means unfortunately that ViewletFactory setup in applications needs to pass MeshObjectsToView to their choices()
ViewedMeshObjects now keeps reference to MeshObjectsToView that it took its data from
removed unnecessary request attributes like JeeViewlet.VIEWLET_STATE_TRANSITION_NAME: can be obtained via Viewlet
made MeshObjectsToView an interface and subtyped to JeeMeshObjectsToView and NetMeshObjectsToView for cleaner model
renamed getMeshObjects to getViewedMeshObjects for consistency
ViewletState has moved from JeeViewlet to JeeViewedMeshObjects; added isDefaultState
The database industry is not used to databases that can generate events. The closest thing the relational database has to events is triggers and stored procedures, but they never “reach out” to the application, so their usefulness is limited. But events are quite natural for graph databases. Broadly speaking, they occur in two places:
Events on the graph database itself (example: “tell me when a transaction has been committed, regardless of which thread”)
Events on individual objects stored in the graph database (example: “tell me when property X on object Y has changed to value Z”, or “tell me when Node A has a new Edge”)
Events on the GraphDB itself are more useful for administrative and management purposes. For example, an event handler listening to GraphDB events can examine the list of changes that a Transaction is performing at commit time and collect statistics.
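To make the administrative use case more concrete, here is a minimal Java sketch of a commit-time listener that tallies changes. The `TransactionListener` interface, the `ChangeKind` enum and the `committed()` callback are hypothetical names for illustration only; they are not InfoGrid’s API, which may deliver transaction events quite differently.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CommitStatisticsDemo {

    // Hypothetical kinds of change a transaction might contain (not InfoGrid's API).
    enum ChangeKind { NODE_CREATED, NODE_DELETED, EDGE_ADDED, PROPERTY_CHANGED }

    // Hypothetical listener: the database would call committed() once per
    // committed transaction, passing the changes that transaction performed.
    interface TransactionListener {
        void committed( List<ChangeKind> changes );
    }

    // Administrative listener that tallies changes by kind across all commits.
    // Methods are synchronized because commits may arrive on different threads.
    static class StatisticsCollector implements TransactionListener {
        private final Map<ChangeKind, Integer> counts = new EnumMap<>( ChangeKind.class );
        private int transactions = 0;

        @Override
        public synchronized void committed( List<ChangeKind> changes ) {
            transactions++;
            for( ChangeKind kind : changes ) {
                counts.merge( kind, 1, Integer::sum );
            }
        }

        synchronized int transactionCount() {
            return transactions;
        }

        synchronized int count( ChangeKind kind ) {
            return counts.getOrDefault( kind, 0 );
        }
    }

    public static void main( String[] args ) {
        StatisticsCollector stats = new StatisticsCollector();
        // Simulate the database notifying the listener about two commits.
        stats.committed( List.of( ChangeKind.NODE_CREATED, ChangeKind.EDGE_ADDED ) );
        stats.committed( List.of( ChangeKind.PROPERTY_CHANGED, ChangeKind.PROPERTY_CHANGED ) );
        System.out.println( stats.transactionCount() + " transactions, "
                + stats.count( ChangeKind.PROPERTY_CHANGED ) + " property changes" );
        // prints "2 transactions, 2 property changes"
    }
}
```

Because the listener sees each transaction’s complete change list in one call, statistics stay consistent with what was actually committed rather than with individual, possibly rolled-back, operations.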
From an application developer’s perspective, events on the data are more interesting:
An example may illustrate this. Imagine an application that helps manage an emergency room in a hospital. The application’s object graph contains things such as the doctors on staff, the patients currently in the emergency room and their status (like “arrived”, “has been triaged”, “waiting for doctor”, “waiting for lab results” etc.) Doctors carry pagers. One of the requirements for the application is that the doctor be paged when the status of one of their assigned patients changes (e.g. from “waiting for lab results” to “waiting for doctor”).
With a passive database, i.e. one that cannot generate events, like a typical relational database, we usually have to write some kind of background task (e.g. a cron job) that periodically checks whether certain properties have changed, and then sends the message to the pager. That is very messy: e.g. how does your cron job know which properties *changed* from one run to the next? Or we have to add the message sending code to every single screen and web service interface in the app that could possibly change the relevant property, which is just as messy and hard to maintain.
With a GraphDB like InfoGrid, you simply subscribe to events, like this:
MeshObject patientStatus = ...; // the node in the graph representing a patient's status
patientStatus.addPropertyChangeListener( new PropertyChangeListener() {
    public void propertyChanged( PropertyChangeEvent e ) {
        sendPagerMessage( ... );
    }
});
The graph database will trigger the event handler whenever a property changes on that particular object. It’s really simple.
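For readers who want to try the pattern without a graph database, the following self-contained sketch uses the standard `java.beans.PropertyChangeSupport` class to get the same effect on a plain Java object. `PatientStatus`, the `status` property and the pager outbox are hypothetical stand-ins for the graph node and the paging system; in a graph database the events would be fired by the database itself rather than by a hand-written setter.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

public class PatientStatusDemo {

    // Hypothetical stand-in for the graph node holding a patient's status.
    static class PatientStatus {
        private final PropertyChangeSupport support = new PropertyChangeSupport( this );
        private String status = "arrived";

        void addPropertyChangeListener( PropertyChangeListener listener ) {
            support.addPropertyChangeListener( listener );
        }

        void setStatus( String newStatus ) {
            String oldStatus = status;
            status = newStatus;
            // PropertyChangeSupport skips the event if old and new values are equal.
            support.firePropertyChange( "status", oldStatus, newStatus );
        }
    }

    // Runs a short scenario and returns the messages the (hypothetical) pager received.
    static List<String> simulate() {
        List<String> pagerOutbox = new ArrayList<>();
        PatientStatus patientStatus = new PatientStatus();

        patientStatus.addPropertyChangeListener( new PropertyChangeListener() {
            @Override
            public void propertyChange( PropertyChangeEvent e ) {
                pagerOutbox.add( "Patient is now: " + e.getNewValue() );
            }
        });

        patientStatus.setStatus( "waiting for lab results" );
        patientStatus.setStatus( "waiting for lab results" ); // unchanged: no page
        patientStatus.setStatus( "waiting for doctor" );
        return pagerOutbox;
    }

    public static void main( String[] args ) {
        System.out.println( simulate() );
        // prints [Patient is now: waiting for lab results, Patient is now: waiting for doctor]
    }
}
```

Note that the listener only hears about actual changes: setting the same status twice produces one page, not two. That is exactly the “what changed since last time?” bookkeeping the cron-job approach has to do by hand.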
It’s rather apparent that while these projects are all GraphDBs, they differ substantially in what they are trying to accomplish, and why, and therefore how they do it. This is a good resource for developers investigating GraphDBs and trying to understand their alternatives.
Occasionally we’ll post a software demo. The first one is there already: a screencast of the MeshWorld example application for the InfoGrid graph database. It shows:
How to create and delete MeshObjects (i.e. nodes)
How to relate them to each other (i.e. edges)
It follows the FirstStep example, except that MeshWorld is a web application, while FirstStep is a command-line application. There’ll be more to come.
Sets are a core concept of most databases. For example, any SQL SELECT statement in a relational database produces a set. Sets apply to Graph Databases just as well and are just as useful:
The most frequently encountered set of nodes in a Graph Database is the result of a traversal. For example, in InfoGrid, all traversal operations return a MeshObjectSet.
We might as well have returned an array, or an Iterator over the members of the set, were it not for the fact that there are well-understood set operations that often make our jobs as developers much simpler, like union, intersection and so forth.
For example, in a social bookmarking application we might want to find out which sites both you and I have bookmarked. Code might look like this:
MeshObject me = ...; // node representing me
MeshObject you = ...; // node representing you
TraversalSpecification ME_TO_BOOKMARKS_SPEC = ...;
// how to get from a person to their bookmarks, see post on traversals
MeshObjectSet myBookmarks = me.traverse( ME_TO_BOOKMARKS_SPEC );
MeshObjectSet yourBookmarks = you.traverse( ME_TO_BOOKMARKS_SPEC );
// Bookmarks that you and I share
MeshObjectSet sharedBookmarks = myBookmarks.intersect( yourBookmarks );
Notice how simple this code is to understand? That’s one of the powers of sets. Or, if you know what a “minus” operation on a set is, this is immediately obvious:
// Bookmarks unique to me
MeshObjectSet myUniqueBookmarks = myBookmarks.minus( yourBookmarks );
This is clearly much simpler than writing imperative code with lots of loops, if/then/else’s, comparisons and perhaps indexes. (And seeing this might put to rest some concerns that NoSQL databases are primitive because they don’t have a SQL-like query language. I’d argue the power comes less from the language than from sets: if you have sets, you have a lot of power at your fingertips.)
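The same set arithmetic can be tried out with nothing but `java.util`. This sketch is not InfoGrid’s `MeshObjectSet` implementation; it applies `retainAll`, `removeAll` and `addAll` to copies of the input sets, with strings standing in for bookmark nodes, to show how intersection and minus replace hand-written nested loops.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SharedBookmarksDemo {

    // Intersection: members of a that are also in b. Works on a copy, so inputs stay unchanged.
    static <T> Set<T> intersect( Set<T> a, Set<T> b ) {
        Set<T> result = new LinkedHashSet<>( a );
        result.retainAll( b );
        return result;
    }

    // Difference ("minus"): members of a that are not in b.
    static <T> Set<T> minus( Set<T> a, Set<T> b ) {
        Set<T> result = new LinkedHashSet<>( a );
        result.removeAll( b );
        return result;
    }

    // Union: members of either set.
    static <T> Set<T> union( Set<T> a, Set<T> b ) {
        Set<T> result = new LinkedHashSet<>( a );
        result.addAll( b );
        return result;
    }

    public static void main( String[] args ) {
        // Strings stand in for bookmark nodes; LinkedHashSet keeps output order predictable.
        Set<String> myBookmarks   = new LinkedHashSet<>( List.of( "a.example", "b.example", "c.example" ) );
        Set<String> yourBookmarks = new LinkedHashSet<>( List.of( "b.example", "c.example", "d.example" ) );

        System.out.println( "shared:    " + intersect( myBookmarks, yourBookmarks ) ); // [b.example, c.example]
        System.out.println( "mine only: " + minus( myBookmarks, yourBookmarks ) );     // [a.example]
        System.out.println( "either:    " + union( myBookmarks, yourBookmarks ) );     // [a.example, b.example, c.example, d.example]
    }
}
```

Copying before mutating keeps the operations side-effect free, which matches how `intersect` and `minus` read in the bookmark example: they produce new sets and leave `myBookmarks` and `yourBookmarks` intact.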
To check out sets in InfoGrid, try package org.infogrid.mesh.set. Clearly much more can be done than we have so far in InfoGrid, but it’s a very useful start in our experience.
Nick Kallen at Twitter last night released FlockDB, Twitter’s social graph database. Source code is here.
If anybody doubted that graph databases are real, or are useful, this release is yet another good reason to investigate. Welcome FlockDB to the crowd.
I haven’t had time to take a detailed look, but it appears that FlockDB has a hard-coded schema developed specifically for the needs at Twitter. That makes a lot of sense for Twitter but less so as a general-purpose graph database. On the other hand, lots of people could probably benefit from that schema when building social applications. We’ll see.
I’m planning to be at the Big Data Workshop, the first unconference on NoSQL and Big Data. If past events moderated by Kaliya Hamlin are any guide, it will be a great opportunity for everybody:
to explore together how the Big Data market will come together
to understand how the key technologies and projects work
to find out which interfaces and interoperability standards are emerging and/or needed
to discuss how we can grow the overall market and make it easier for everybody to adopt these technologies for interesting new projects.
Arguably, the Internet Identity Workshop (also moderated by Kaliya) was the enabler for the stunning adoption rate of OpenID, OAuth and related technologies over the past five years (at last count, more than 1 billion enabled accounts). I hope history repeats itself here.
P.S. Feel free to corner me on InfoGrid, graph databases or any other subject. That’s the whole point of an unconference.
Little did I know what would follow when I put up InfoGrid’s FirstStep example. The example creates just a few nodes and a few edges to show, in principle, how to build a URL tagging application based on a graph database like InfoGrid.
Alex Popescu at MyNoSQL challenged the Neo4j folks to show how they would implement it, and they responded promptly. Then the folks at Sones implemented the same example themselves, and just now the Filament project did the same. Worth a blog post with the links!
I’m tempted to list my own observations, but I’d like to avoid a blogging contest in which everybody will, naturally, claim “but the way we do it is better”. Independent reviews, anybody?
Alex Popescu has a great comparison of how the InfoGrid FirstStep example would look in Neo4j, another graph database. As I noted in an earlier post, there are far more similarities in our approaches to the basics of graph databases than there are differences.
A couple of comments, addressing some of Alex’s notes. He says:
everything in Neo4j must happen inside a transaction even if it’s a graph traversal operation (this gives a very strong Isolation level). The InfoGrid traversal code seem to happen outside the transaction…
That’s correct. You can do the traversal inside a transaction if you like, but you are not required to. This gives application developers more options for concurrency control: transactions, critical sections, or no protection at all.
Re InfoGrid terminology, its ancient roots are in object modeling (think UML); for example, we still talk about InfoGrid Models. However, over time it became clear that InfoGrid’s core ideas are distinct enough to warrant their own terms. So when we moved from InfoGrid V1 to V2 a few years ago, we changed terms. For example, an “instance” (in UML or programming terminology) aka “node” (graph terminology) rarely can have more than one type outside of InfoGrid. Think of a Java object that has more than one class, where you can dynamically add and remove classes to and from the instance at run-time. So we call them MeshObjects rather than using a term that carries the wrong connotations. The closest analog we are aware of is Perl’s “bless”, which is why we use that term.
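A toy version of the idea may help readers who have not met multiply-typed instances before. This is a hypothetical sketch, not InfoGrid’s MeshObject API: types are plain strings here (InfoGrid uses EntityTypes defined in its Models), and the `bless`/`unbless` methods shown simply add or remove a type on a node at run-time.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class BlessDemo {

    // Hypothetical node whose types can be added and removed at run-time.
    // This illustrates the concept only; it is not InfoGrid's MeshObject API.
    static class Node {
        private final String identifier;
        private final Set<String> types = new LinkedHashSet<>();

        Node( String identifier ) {
            this.identifier = identifier;
        }

        String getIdentifier() {
            return identifier;
        }

        // Adds a type; returns false if the node already had it.
        boolean bless( String type ) {
            return types.add( type );
        }

        // Removes a type; returns false if the node did not have it.
        boolean unbless( String type ) {
            return types.remove( type );
        }

        boolean isBlessedBy( String type ) {
            return types.contains( type );
        }

        // Read-only view of the current types, in the order they were added.
        Set<String> getTypes() {
            return Collections.unmodifiableSet( types );
        }
    }

    public static void main( String[] args ) {
        Node node = new Node( "http://example.com/" );
        node.bless( "WebResource" );
        node.bless( "Bookmark" );      // same instance, second type
        node.unbless( "WebResource" ); // types can be removed again later
        System.out.println( node.getIdentifier() + " " + node.getTypes() );
        // prints http://example.com/ [Bookmark]
    }
}
```

The node’s identity stays fixed while its set of types changes, which is the property that makes “instance of a class” the wrong mental model here.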
the Neo4j uses also the LuceneIndexService for indexing both the tag and web resources nodes, but that’s only because the code there makes sure not to duplicate either tags or web resources (i.e. this functionality is not present in the InfoGrid code and I don’t know how that would look like)
Correction. You are invited to modify the example and attempt to create a second MeshObject with the same identifier. You can’t (it will throw an exception at you).
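The duplicate check described here can be sketched in a few lines. The following is a hypothetical in-memory graph, not InfoGrid code: nodes are registered by identifier, `create` refuses an identifier that is already taken, and `findOrCreate` shows one way an application might avoid duplicate tag or web-resource nodes, which is what the Neo4j version uses its index for.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class TaggingGraphDemo {

    // Hypothetical exception for an identifier that is already in use.
    static class DuplicateIdentifierException extends RuntimeException {
        DuplicateIdentifierException( String identifier ) {
            super( "Identifier already in use: " + identifier );
        }
    }

    static class Node {
        final String identifier;
        final Set<Node> neighbors = new LinkedHashSet<>();

        Node( String identifier ) {
            this.identifier = identifier;
        }
    }

    // Every node is registered under its identifier, so duplicates can be detected.
    private final Map<String, Node> nodesById = new HashMap<>();

    // Creates a node, refusing identifiers that are already taken.
    Node create( String identifier ) {
        if( nodesById.containsKey( identifier ) ) {
            throw new DuplicateIdentifierException( identifier );
        }
        Node node = new Node( identifier );
        nodesById.put( identifier, node );
        return node;
    }

    // Returns the existing node for this identifier, or creates it if absent.
    Node findOrCreate( String identifier ) {
        return nodesById.computeIfAbsent( identifier, Node::new );
    }

    // Adds an undirected edge between two nodes.
    static void relate( Node a, Node b ) {
        a.neighbors.add( b );
        b.neighbors.add( a );
    }

    int size() {
        return nodesById.size();
    }

    public static void main( String[] args ) {
        TaggingGraphDemo graph = new TaggingGraphDemo();
        Node site = graph.create( "http://example.com/" );
        relate( site, graph.findOrCreate( "tag:graphdb" ) );
        relate( site, graph.findOrCreate( "tag:graphdb" ) ); // reuses the existing tag node

        try {
            graph.create( "http://example.com/" );
        } catch( DuplicateIdentifierException e ) {
            System.out.println( e.getMessage() );
        }
        System.out.println( graph.size() + " nodes" ); // prints "2 nodes"
    }
}
```

Keeping the identifier-to-node map inside the graph means uniqueness is enforced in one place, instead of every caller having to look up an index before creating a node.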
But never mind the comparatively minor differences between Neo4j and InfoGrid. We should compare this to all the stuff one would have to do with a relational database to build the same thing. Object-relational mapping, anybody? No thanks.