Required vs. Optional Property Values

InfoGrid distinguishes between properties that must have a non-null value, and properties that may or or may not be null.

When creating an InfoGrid model, a developer has to specify which by using the <isoptional/> tag in the model file.

Why?

By way of parallel, consider the following piece of Java code:

class Foo {
    private int max1 = 10;
    private Integer max2 = 20;

    public void doSomething() {
        for( int i=0 ; i<max1 ; ++i ) {
            //...
        }
        for( int i=0 ; i<max2 ; ++i ) {
            //...
        }
    }

Spot the problem? max2 of course might be null, which means our code will throw an exception in the innocent-looking second for loop. To get the code right, we will have to protect that section with an if-then-else section that checks for null first.

Of course, such a protection is often the right thing to do. But in this example, a “max” should hardly ever be null, so using an “int” as a data type like for max1 (which can’t be null) is much better than using an “Integer” like for max2 (which may be null).

It’s the same thing for properties in InfoGrid models. Some properties simply should never be null. For example, consider a time stamp indicating when a MeshObject was created. Given that the MeshObject was created, the time stamp must exist, and therefore a null value makes no sense. In which case the property would be specified as “mandatory”. On the contrary, a time stamp when a MeshObject is likely to become obsolete is very likely optional: we might not know that time (yet), or it might never become obsolete, so null values are fine.

If InfoGrid did not distinguish between required and optional values, application code would be littered with unnecessary tests for null values. (or failing that, unexpected NullPointerExceptions.) We think being specific is better when creating the model; higher-quality and less cluttered application code is the reward.

Also check out the following related posts:

InfoGrid Storage Interface

We’ve been getting a lot of hits from search engines looking for the Infogrid’s “Storage Interface”. That interface is called Store and here is more info about it: entry on wiki, entry to JavaDoc.

Sometimes one has to manually fix Google and that’s the sole purpose of this post ;-)

ACID Transactions Are Overrated

For many years, the canonical example why we need database transactions has been banking. If you move $100, you don’t really want the money be subtracted from the first account, but never be added to the second because of some problem in between. You want both the subtraction and the addition to happen, or neither.

Sounds good so far. Just apparently that is not how banks work in the real world, and they certainly use enough database systems that have ACID transactions. The Economist (July 24, 2010, “Computer says no”) quotes a former executive of the Royal Bank of Scotland saying: “The reality was you could never be certain that anything was correct.” Continuing: “Reported numbers fot the bank’s exposure were regularly billions of dollars adrift of reality.” The article offers an explanation: “banks tend to operate lots of different databases producing conflicting numbers.” HSBC is quoted: 55 separate systems for core banking, 24 for credit cards, and 41 for internet banking.

According to traditional transaction wisdom, if a customer makes an internet transaction to pay off his credit card, it should be a single transaction: start transaction, take money from checking account, put it into credit card account, commit. But transactions generally cannot span systems. Because the system responsible for internet banking is separate in this real-world example from core banking and from credit card systems, no such single ACID transaction is possible. Given the numbers above, it looks very much like those money transfers that actually can follow the canonical ACID transaction pattern constitute only a very small fraction of all transactions (like transferring from checking to savings.)

If I look at my own banking, the vast majority of my banking activity isn’t even within the same bank, but with other banks: bills to pay usually have to be paid at other banks. No cross-bank ACID database transactions that I’ve ever heard of.

So banking software necessarily has to have functionality that prevents that money is deducted but never arrives, all without depending on database transactions. If we have to have this functionality anyway, why then are transactions “indispensable” as some people still want to make us believe?

This pattern can be generalized: the more distributed and decentralized a system is, the less likely it is that we can use transactions that span the entire system. That is certainly true for the banking system, apparently also true for systems inside banks, and in many other places. ACID transactions were invented for the mainframe, the world’s most centralized computing construct. But computing is not “one mainframe” any more I’m afraid as it was in the sixties.

Instead of trotting out transactions as the answer, what we need for NoSQL databases is the ability to get the same benefits in a distributed system (”nothing is lost”) without relying on transactions. That’s where all of our efforts should be.

In the InfoGrid graph database, we have a weaker form of transactions for individual MeshBases, but then synchronize with the rest of the world by passing XPRISO messages. I’d be surprised if the eventual “transactional” architecture for large-scale distributed and decentralized systems looked very different.

Tip: Always RelateIfNeeded when using HttpShell

The InfoGrid HttpShell is something rather amazing when creating web applications. With just a little bit of HTML markup, we can modify the graph of nodes and edges in the GraphDB any way we want, all without writing any handler code for it.

For edges, it understands the keywords:

  • relate
  • relateIfNeeded
  • unrelate
  • unrelateIfNeeded

Relate will fail if the two nodes in question are related already. And in practice, that turns out to be more common than one would assume; only blessing of the edge with the appropriate RoleType is often needed.

So here’s the tip of the day: unless there are good reasons not to, use “relateIfNeeded” and “unrelateIfNeeded” for your application code.

InfiniteGraph Implementation of FirstStep

InfiniteGraph, the currently youngest member of the GraphDB party, has now also implemented our FirstStep example. Todd Stavish’s code is here. It joins implementations of the same example from InfoGrid, Neo4j, Sones, and Filament.

[Update: see Todd's comment below. Apparently there is checking in the second step.] On cursory examination, I’m surprised that InfiniteGraph allows you (requires you?) to create edges without source and destination nodes. Only after the creation of the edge does one assign the nodes to the edge. I’m unclear why this is an advantage for any particular scenario; however, I would think there’s a clear disadvantage as an application developer, because I now have to check for null pointers that I wouldn’t have to in most other graph databases (InfoGrid included).

Hope somebody more neutral than me will perform an API comparison using this example some day.

InfoGrid Now Supports Money as a Native Data Type

There’s a new DataType called CurrencyDataType, and corresponding CurrencyValue. A CurrencyValue consists of a fixed-point decimal number and one of the ISO 4217 currencies. For example, you can say

CurrencyValue newValue = CurrencyValue.create( 12, 34, CurrencyDataType.USD );
System.out.println( newValue );

which will, by default, print:

12.34 USD

It correctly handles currencies that have 0, 1, or 3 digits after the decimal point, too.

In a JSP page, you can say, as you would expect:

<mesh:property
    meshObjectName="Subject"
    propertyType="org.infogrid.model.Test/AllProperties_OptionalCurrencyDataType" />

(or the identifier of a PropertyType in your model that is of type CurrencyDataType) which will print 12.34 USD in view mode, and automatically make it editable in edit mode.

Easiest to try out by running the test app org.infogrid.jee.testapp. Currently currency support is only in trunk, but will get promoted over time.

Isn’t it much easier if your data platform does this, than having to write currency parsing and rendering code all over one’s application?

Welcome Infinite Graph

It always looked like it was only a matter of time until the object database companies would try and become graph databases. Perhaps that is what they should have been all along. I’m speaking as somebody who tried several products almost 20 years ago and decided that they were just too much hassle to be worth it: graphs are a much better abstraction level than programming-level constructs for a database.

Today, Objectivity announced “Infinite Graph”, a:

strategic business unit is tasked with bringing [a] enterprise-ready and distributed graph database product to market

(I took the liberty of eliminating the “marketing” superlatives from the quote; the entire press release has a very generous sprinkling of them.)

Actually, they only announced a beta program, which I signed up for. InfiniGraph.com says:

X:\> BETA IS NOW OPEN

But then, on the screen behind, they say:

Over the next several days, we’ll be preparing our installer and documentation for distribution to the InfiniteGraph community. Stay tuned, and feel free to participate in the discussion on our beta blog!

Well, well, the difficulties of a launch. So I don’t know yet what they created. But it’s good to see another player legitimizing graph databases as a category. So, welcome Objectivity!

InfoGrid 2.9.4 Released

This release contains some major improvements, in particular to the way the object graph is mapped to and from the web. Download or browse documentation here.

General:

  • improved stability and error reporting
  • lots of bug fixes
  • more tests
  • removed symbolic links from SVN; was an endless source of frustration

Core:

  • renamed TraversalDictionary to TraversalTranslator: it can be much more dynamic than a dictionary
  • implemented KeywordTraversalTranslator with fixed translation keywords
  • implemented XpathTraversalTranslator with a pseudo-subset of Xpath
  • introduced AllNeighborsTraversalSpecification instead of null TraversalSpecification; introduced StayRightHereTraversalSpecification
  • MeshObject’s userVisibleName now always returns value of a Property called “Name” if the MeshObject has one; this seems a sensible default for many applications
  • MeshObjectIdentifierFactory now has a pointer back to the MeshBase to which it belongs. Downside is one cannot use the same instance of MeshObjectIdentifierFactory for multiple MeshBases any more.
  • got rid of MeshStringRepresentationContext, which partially overlapped with the purpose of MeshStringRepresentationParameters; only historical reasons can explain why we had both
  • removed title/target/additionalParameters argument from HasStringRepresentation.toStringRepresentationLinkStart; now handled as parameters in StringRepresentationParameters
  • simplified StringRepresentation of PropertyValues and related
  • additional pre-defined StringRepresentations e.g. HttpPost
  • distinguish between formatting Properties (which may be null) and PropertyValues; no more muddling with funny MeshObject context parameters; formatting is now performed via DataType
  • use correct ClassLoader to load ResourceHelper default properties file during Module initialization
  • correct initialization of ResourceHelper in module.adv
  • expanded MeshObjectSet and MeshObjectSetFactory API

Identity-related:

  • refactored LID implementation to use an “instructions” based approach to the pipeline, instead of an exceptions-based approach. This is more flexible for users of the module.
  • fixed typo about credential vs. credtype.
  • renamed LidPersona to LidAccount; more natural to talk about it using that name
  • added LidAccountStatus and SiteIdentifier to LidAccount for multi-tenancy
  • factored account and session-related concepts from org.infogrid.lid.model.lid into new SubjectArea org.infogrid.lid.model.account.
  • new custom tag library that deals with identity

Model-related:

  • fixed code generation bug for descriptions of values in EnumeratedDataTypes
  • code generator to generate static constants for all EnumeratedValues.
  • added AccountCollection to account LID model
  • expanded TestModel to cover both optional and mandatory PropertyTypes; renamed PropertyTypes correspondingly
  • improved Test model for more comprehensive testing
  • better user-visible strings for EntityTypes Bookmark, Account and WebResource

Viewlet/GUI-related:

  • Viewlet framework and tag library extensions for including Viewlets in Viewlets; updated Viewlets accordingly; now allows in-context editing, change of viewlet types etc. for included JeeViewlets; no contiguous TraversalPath from top required
  • support REST-ful URLs on hierarchical Viewlets e.g. GraphTreeViewlet with multiple Viewlet alternatives in contained Viewlet
  • removed iframe in (Net)MeshWorld; hierarchical Viewlets is better approach
  • various HTML fixes and improvements related to the rendering and editing of PropertyValues
  • removed rootPath on all custom tags; not needed or used
  • sanitized formatting of Identifiers; it’s still not totally sane but a lot more so
  • eliminated PropertyValueTag.{css,js} and replaced with PropertyTag.{css,js}
  • removed -moz-opacity CSS value per recent Firefox updates
  • fix HTML doctype to make IE more happy
  • BlobViewlet to get its PropertyType from URL argument not POST argument
  • footer element for MeshObjectSetIterate tag
  • do not print iterateHeader and iterateFooter when set has no content in MeshObjectSetIterateTags
  • default POST behavior is now redirect-to-GET on same URL, so browser refresh is not as awful
  • added missing setter methods on StructuredResponse
  • created SetSizeTag to print the size of a MeshObjectSet in JSP
  • slight changes how MeshObjects are shown on screen by default (dropped annotation in which non-standard MeshBase they are)
  • overflow: auto; to support long CSS floats
  • added orderBy property to setIterate JSP tags
  • added ability to sort in the inverse direction
  • created propertymeter JSP tag for bar graphs or temperature graphs based on Properties
  • default sorting in JSP MeshObjectSet tags is by user-visible String
  • don’t set domains on cookies when run from localhost; makes life of developers hard
  • generate Javascript for PropertyValues
  • eliminating unnecessary projects by moving their code into other projects:
    • org.infogrid.jee.rest -> org.infogrid.jee.viewlet
    • org.infogrid.jee.rest.net -> org.infogrid.jee.viewlet.net
  • renaming projects:
  • org.infogrid.jee.rest.net.local -> org.infogrid.jee.viewlet.net.local
  • org.infogrid.jee.rest.net.local.store -> org.infogrid.jee.viewlet.net.local.store
  • org.infogrid.jee.rest.store -> org.infogrid.jee.viewlet.store
  • renamed meshObjectLoopVar to loopVar in custom tags
  • added arrivedAt property to Viewlet
  • enctype attribute on safeForm is all lowercase
  • created SaneUrl, new supertype of SaneRequest that allows to reuse API for URLs and servlet requests; slight API naming changes httpHost vs. server; allows us to get rid of OverridingSaneRequest nonsense
  • DEFAULT_LINK_START/END_ENTRY now consistently on StringRepresentation
  • removed RestfulRequest, replaced with a MeshObjectsToViewFactory that directly translates SaneRequest into MeshObjectsToView
  • an instance of MeshObjectsToViewFactory must now reside in Context
  • removed NetViewletDispatcherServlet; not needed any more
  • removed most redundant methods on Viewlet; better have one clear way how to do it only
  • upgraded ViewletFactoryChoice: now HasStringRepresentation and contains MeshObjectsToView; this means unfortunately that ViewletFactory setup in applications needs to pass MeshObjectsToView to their choices()
  • ViewedMeshObjects now keeps reference to MeshObjectsToView that it took its data from
  • removed unnecessary request attributes like JeeViewlet.VIEWLET_STATE_TRANSITION_NAME: can be obtained via Viewlet
  • made MeshObjectsToView an interface and subtyped to JeeMeshObjectsToView and NetMeshObjectsToView for cleaner model
  • renamed getMeshObjects to getViewedMeshObjects for consistency
  • ViewletState has moved from JeeViewlet to JeeViewedMeshObjects; added isDefaultState

Graph Database Scaling: InfoGrid’s Contrarian View

There’s an ongoing debate how to scale graph databases horizontally, and I got “strongly encouraged” to present the InfoGrid point of view.

In a nutshell: to make it work, scale like the web, not like a database.

Let me explain:

If you start out with a single relational database server, and you want to scale it horizontally to thousands of servers, you have a problem: it doesn’t really work. SQL makes too many assumptions that one simply cannot make for a massively distributed system. Which is why instead of running Oracle, Google runs BigTable (and Amazon runs Dynamo etc.), which were designed from the ground up for thousands of servers.

If you start out with a single hypertext system, and you want to scale it horizontally to millions of servers, you have a problem, too: it doesn’t really work either. Which is why we got the world-wide-web, which has been inherently designed for a massively decentralized architecture from the get-go, instead of a horizontally-scaled version of Xanadu.

So he we are, and people are trying to scale graph databases from one to many servers. They are finding it hard, and so far I have not seen a single credible idea, never mind an implementation, even in an early stage. Guess what? I think massively scaling a graph database that’s been designed using a one-server philosophy will not work either, just like it didn’t work for relational databases or hypertext. We need a different approach.

With InfoGrid, we take such a fundamentally different approach. To be clear, its current re-implementation in the code base is early and is still missing important parts. But the architecture is stable, the core protocol is stable, and it has been proven to work. Turning it back on across the network is a matter of “mere” programming.

To explain it, let’s lay out our high-level assumptions. They are very similar to the original assumptions of the web itself, but unlike traditional database assumptions:

Traditional database assumptions Web assumptions InfoGrid assumptions
All relevant data is stored in a centrally administered database Let everybody operate their own server with their own content and create hyperlinks to each other Let everybody operate their own InfoGrid server on their own data and share sub-graphs with each other as needed (see also note on XPRISO protocol below)
Data is “imported” and “exported” from/to the database from time to time, but not very often: it’s an unusual act. Only “my” data in “my” database really matters. Data stays where it is, a web server makes it accessible from/to the web. Through hyperlinks, that server becomes just one in millions that together form the world-wide-web. Data stays where it is, and a graph database server makes it look like a (seamless) sub-graph of a larger graph, which conceivably one day could be the entire internet
Mashing up data from different databases is a “web services” problem and uses totally different APIs and abstractions than the database’s Mashing up content from different servers is as simple as an <a…, <img… or <script… tag. Application developers should not have to worry which server currently has the subgraph they are interested in; it becomes local through automatic replication, synchronization and garbage collection.

This approach is supported by a protocol we came up with called XPRISO, which stands for eXtensible Protocol for the Replication, Integration and Synchronization of distributed Objects. (Perhaps “graphs” would have been a better name, but then the acronym wouldn’t sound as good.) There is some more info about XPRISO on the InfoGrid wiki.

Simplified, XPRISO allows one server to ask another server for a copy of a node or edge in the graph, so they can create a replica. When granted, this replica comes with an event stream reflecting changes made to it and related objects over time, so the receiving server can obtain the received replica in sync. When not needed any more, the replication relationship can be canceled and the replicas garbage collected. There are also features to move update rights around etc.

For a (conceptual) graphical illustration, look at InfoGrid Core Ideas on Slideshare, in particular slides 15 and 16.

Code looks like this:

// create a local node:
MeshObject myLocalObject = mb.getMeshObjectLifecycleManager().createMeshObject();
// get a replica of a remote node. The identifier is essentially a URL where to find the original
MeshObject myRemoteObject = mb.accessLocally( remote_identifier );

// now we can do stuff, like relate the two nodes:
myLocalObject.relate( myRemoteObject );

// or set a property on the replica, which is kept consistent with the original via XPRISO:
myRemoteObject.setProperty( property_type, property_value );

// or traverse to all neighbors of the replica: they automatically get replicated, too
MeshObjectSet neighbors = myRemoteObject.traverseToNeighbors();

Simple enough? [There are several versions of this scenario in the automated tests in SVN. Feel free to download and run.]

What does this contrarian approach to graphdb scaling give us? The most important result is that we can assemble truly gigantic graphs, one server at a time: each server serves a subgraph, and XPRISO makes sure that the overlaps between the subgraphs are kept consistent in the face of updates by any of the servers. Almost as important is that it enables a bottom-up bootstrapping of large graphs: everybody who feels like participating sets up their own server, populates it with the data they wish to contribute (which doesn’t even need to be “copied” by virtue of the InfoGrid Probe Framework), and link to others.

Now, if you think that makes InfoGrid more like a web system than a database, we sympathize. There’s a reason we call InfoGrid an “internet graph database”. Or it might look more like a P2P system. But other NoSQL projects use peer-to-peer protocols, too, like Cassandra’s gossip protocol. And I predict: the more we distribute databases, the more decentralized they become, the more the “database” will look like the web itself. That’s particularly so for graph databases: after all, the web is the largest graph ever assembled.

We do not claim that this approach addresses all possible use cases for scaling graph databases. For example, if you need to visit every single node in a gazillion-node graph in sequence, this approach is just as good or bad as any other: you can’t afford the memory to get the entire graph onto your local server, and so things will be slow.

However, the InfoGrid approach elegantly addresses a very common use case: adding somebody else’s data to the graph. “Simply” have them set up their own graph server, and create the relationships to objects in other graph servers: XPRISO will keep them maintained. Leave data in the place where its owners have it already is a very powerful feature; without it, the web arguably could not have emerged. It further addresses control issues and privacy/security issues much better than a more database’y approach, because people can remain in control over their subgraph: just like on the web, where nobody needs to trust a single “web server operator” with their content; you simply set up your own web server and then link.

Want to help? ;-)

Operations on a Graph Database (Part 8 - Events)

Graph Database Tutorial

Part 1: Nodes

Part 2: Edges

Part 3: Types

Part 4: Properties

Part 5: Identifiers

Part 6: Traversals

Part 7: Sets

Part 8: Events

The database industry is not used to databases that can generate events. The closest the relational database has to events are stored procedures, but they never “reach out” back to the application, so their usefulness is limited. But events are quite natural for graph databases. Broadly speaking, they occur in two places:

  • Events on the graph database itself (example: “tell me when a transaction has been committed, regardless on which thread”)
  • Events on individual objects stored in the graph database (example: “tell me when property X on object Y has changed to value Z”, or “tell me when Node A has a new Edge”)

Events on the GraphDB itself are more useful for administrative and management purposes. For example, an event handler listening to GraphDB events can examine the list of changes that a Transaction is performing at commit time, and collect statistics (for example).

From an application developer’s perspective, events on the data are more interesting:

An example may illustrate this. Imagine an application that helps manage an emergency room in a hospital. The application’s object graph contains things such as the doctors on staff, the patients currently in the emergency room and their status (like “arrived”, “has been triaged”, “waiting for doctor”, “waiting for lab results” etc.) Doctors carry pagers. One of the requirements for application is that the doctor be paged when the status of one of their assigned patients changes (e.g. from “waiting for lab results” to “waiting for doctor”).

With a passive database, i.e. one that cannot generate events, like a typical relational database, we usually have to write some kind of background task (e.g. a cron job) that periodically checks whether certain properties have changed, and then sends the message to the pager. That is very messy: e.g. how does your cron job know which properties *changed* from one run to the next? Or we have to add the message sending code to every single screen and web service interface in the app that could possibly change the relevant property, which is just as messy and hard to maintain.

With a GraphDB like InfoGrid, you simply subscribe to events, like this:

MeshObject patientStatus = ...; // the node in the graph representing a patient's status
patientStatus.addPropertyChangeListener( new PropertyChangeListener() {
    public void propertyChanged( PropertyChangeEvent e ) {
        sendPagerMessage( ... );
    });
}

The graph database will trigger the event handler whenever a property changed on that particular object. It’s real simple.