Adam Keys: Post-Relational, not NoSQL

Adam Keys makes a good argument for why the budding NoSQL movement should instead be called post-relational. Among other things, he says:

Right now, the best we have is NoSQL. The problem with that name is that it only defines what it is not…

What we’re seeing is the end of the assumption that valuable data should go in some kind of relational database. The end of the assumption that SQL and ACID are the only tools for solving our problems. The end of the viability of master/slave scaling. The end of weaving the relational model through our application code.

We agree, which is why we use the term post-relational when talking about InfoGrid.

NoSQL East Conference in Atlanta

The aspiring NoSQL movement has a conference:

no:sql(east)

October 28-30, 2009. Atlanta, GA.

You’ve got to love their motto:

select fun, profit from real_world where relational=false;

Reportedly there will be talks on:

and perhaps others (e.g. Project Voldemort, Tokyo *, Neo4J, Riak, Kai, Hypertable, Dryad/Cosmos).

Should be interesting.

InfoGrid and the Unified Modeling Language (UML) — Technical Comparison

UML | InfoGrid | Explanation
Class | EntityType | In InfoGrid, the Code Generator is responsible for generating classes from interfaces/types, so there is no need to represent (implementation) classes in the model. EntityType in InfoGrid therefore most closely resembles an interface in UML.
Association | RelationshipType | Very similar.
Multiplicity | Multiplicity | Very similar.
Class inheritance | EntityType inheritance | Very similar.
Association inheritance | RelationshipType inheritance | Very similar.
Aggregation, composition, … | (none) | Considered unnecessary in InfoGrid.
Attributes | Properties | Similar. In InfoGrid, a property’s definition can be overridden in a subtype. For simplicity, InfoGrid limits the DataTypes supported for Properties.
Instances | [represented differently] | In InfoGrid, an instance is represented by a MeshObject at a certain URL, so there is no need to extend the modeling language to capture instances as well.
Model | SubjectArea | Very similar.

InfoGrid is, of course, a (model-driven) software platform, while the UML is just a modeling language. This comparison covers only InfoGrid’s modeling language component and ignores the remainder (i.e. most) of InfoGrid. Conversely, InfoGrid’s modeling language component focuses on information representation, while the UML provides many more diagram types, which of course can be used with InfoGrid as well.

Query Languages Are Overrated

Can you imagine a database without a query language? Say: MySQL without SQL?

Ridiculous? Perhaps not.

I’m aware of three major arguments that have been made for SQL, or for query languages in general:

  1. A query language allows non-technical users (like managers) to interact with persistent data themselves, bypassing engineers and their applications.
  2. It allows developers to interact with persistent data in the same way, regardless of the language in which the application is written.
  3. It gives developers and system operators a way of quickly “looking at the data” when things go wrong.

On closer inspection, none of these holds up well. (A fourth argument, that SQL allows developers to ignore which vendor’s product the data is stored in, is very obviously false.)

Argument #1 was the big one in the 1970s, when SQL was invented. However, widespread use of SQL by non-engineers never happened, and by now the argument has been conclusively disproved: users need user-friendly applications, not SQL, to accomplish anything.

Argument #2 sounds good from the database vendor’s perspective: hey, we can say we support FORTRAN just as well as Java and Python, all with the same code. The trouble is that it does not matter to developers, because they tend to develop in one language at a time. And given the hoops one has to jump through to interact with a SQL database from the average language, it’s not even true.

Argument #3 is indeed true. But SQL is hardly the only way to accomplish this objective. A debugger is another (and nobody has ever asked for a query language for a debugger). Yet another way is to map all data objects to web URLs, as we do in InfoGrid, and use the web browser as the “query language”.
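As a minimal illustration of the “browser as query language” idea, here is a Java sketch. It assumes a toy in-memory store rather than InfoGrid’s actual APIs: each object is keyed by its URL path and rendered as an HTML page by the JDK’s built-in com.sun.net.httpserver server. The class UrlBrowsableStore and its methods are hypothetical names chosen for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: every data object gets its own URL, so a browser can "look at the data". */
class UrlBrowsableStore {
    // Objects keyed by URL path, e.g. "/object-1" (toy data model, not InfoGrid's)
    private final Map<String, Map<String, String>> objects = new LinkedHashMap<>();

    void put(String path, Map<String, String> properties) {
        objects.put(path, properties);
    }

    /** Renders the object at the given path as simple HTML, or returns null if there is none. */
    String render(String path) {
        Map<String, String> props = objects.get(path);
        if (props == null) {
            return null;
        }
        StringBuilder html = new StringBuilder("<html><body><h1>").append(escape(path)).append("</h1><ul>");
        for (Map.Entry<String, String> e : props.entrySet()) {
            html.append("<li>").append(escape(e.getKey())).append(": ")
                .append(escape(e.getValue())).append("</li>");
        }
        return html.append("</ul></body></html>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Serves every object at its own path using the JDK's built-in HTTP server. */
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            String body = render(exchange.getRequestURI().getPath());
            int status = (body == null) ? 404 : 200;
            byte[] bytes = (body == null ? "Not found" : body).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

After store.start(8080), pointing a browser at http://localhost:8080/object-1 would show that object’s properties, with no query language involved.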

So why SQL? Beats me!

InfoGrid’s Debt To CDIF

[Updated with more names]

Although CDIF has not existed for over 10 years, it’s amazing how many people I still meet who say they were involved at some point, and who have nothing but the highest respect for the work that was done there. I’m in that camp: InfoGrid would not exist if it hadn’t been for what I learned in CDIF.

CDIF was started to move information from one CASE tool to another. To accomplish this, the CDIF architects designed a very elegant architecture:

  • A language to express a conceptual information model. CDIF called this the MetaMetaModel.
  • An extensible list of conceptual information models. CDIF called those SubjectAreas.
  • Data instances created according to a conceptual information model. Given CDIF’s purpose, CDIF called these Models of various kinds.
  • A transfer format that could carry any data instances of any model that could be defined within the CDIF architecture.

The beauty of this architecture is its separation of concerns: the transfer format, for example, would correctly transfer any information, even if the schema for that information had not yet been defined when the transfer syntax was frozen.
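That separation of concerns can be sketched in Java as follows. The sketch assumes a deliberately simple, hypothetical line-based syntax (this is not CDIF’s actual transfer format): the encoder and decoder only know about generic instances carrying a type name and attributes, so instances of a subject area defined after the code was written still round-trip unchanged. TransferInstance and GenericTransfer are made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A schema-agnostic instance: an id, a type name from some subject area, and attribute values. */
record TransferInstance(String id, String type, Map<String, String> attributes) {}

/** Hypothetical transfer format that knows nothing about any particular subject area. */
final class GenericTransfer {
    // Encodes each instance as one line: "id|type|key=value;key=value".
    // Assumes ids, types, keys and values contain no '|', ';', '=' or newline characters.
    static String encode(List<TransferInstance> instances) {
        StringBuilder out = new StringBuilder();
        for (TransferInstance i : instances) {
            List<String> pairs = new ArrayList<>();
            i.attributes().forEach((k, v) -> pairs.add(k + "=" + v));
            out.append(i.id()).append('|').append(i.type()).append('|')
               .append(String.join(";", pairs)).append('\n');
        }
        return out.toString();
    }

    // Decodes the same syntax back into generic instances, whatever their type.
    static List<TransferInstance> decode(String text) {
        List<TransferInstance> result = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\|", -1);
            Map<String, String> attrs = new LinkedHashMap<>();
            if (!parts[2].isEmpty()) {
                for (String pair : parts[2].split(";")) {
                    int eq = pair.indexOf('=');
                    attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
            result.add(new TransferInstance(parts[0], parts[1], attrs));
        }
        return result;
    }
}
```

For example, an instance of a newly defined type such as "NewArea.Widget" (a made-up name) passes through encode and decode without any change to GenericTransfer.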

The InfoGrid architecture is very similar:

  • In InfoGrid, the language to express a conceptual information model is defined by the types in package org.infogrid.model.primitives, i.e. the types that can be held in a ModelBase. There is also an XML format.
  • InfoGrid calls the information models SubjectAreas as well. One example is the default InfoGrid Tagging Model.
  • Instances in InfoGrid are called MeshObjects and Relationships.
  • InfoGrid uses several technologies for instance transfer, including object serialization for persistence purposes, InfoGrid extensions to RSS and Atom, and the XPRISO protocol to keep distributed, replicated instances in sync with each other.

Naturally, given the many years that have passed, InfoGrid goes beyond CDIF in many ways, including:

  • A richer conceptual modeling language.
  • More flexible uses of conceptual models, e.g. multiple types per instance, dynamic blessing and unblessing.
  • Explicit roles and more precise relationship subtyping.
  • Sophisticated APIs for real-time instance manipulation and event generation and reception.
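Multiple types per instance, and dynamic blessing and unblessing, can be illustrated with a small Java sketch. The classes below are hypothetical and are not InfoGrid’s MeshObject API; they only show the idea that an object’s types form a set that can change at runtime, and that the properties an object may carry follow from whichever types it currently has.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical entity type: a name plus the property names it contributes. */
record EntityTypeSketch(String name, Set<String> properties) {}

/** Hypothetical object that can carry several types at once and gain or lose them at runtime. */
final class BlessableObject {
    private final Set<EntityTypeSketch> types = new LinkedHashSet<>();
    private final Map<String, Object> values = new LinkedHashMap<>();

    void bless(EntityTypeSketch type) {
        types.add(type);
    }

    /** Removing a type also drops property values that no remaining type declares. */
    void unbless(EntityTypeSketch type) {
        types.remove(type);
        for (String p : type.properties()) {
            if (!isDeclared(p)) {
                values.remove(p);
            }
        }
    }

    boolean isBlessedBy(EntityTypeSketch type) {
        return types.contains(type);
    }

    /** Only properties declared by one of the current types may be set. */
    void set(String property, Object value) {
        if (!isDeclared(property)) {
            throw new IllegalStateException("No current type declares " + property);
        }
        values.put(property, value);
    }

    Object get(String property) {
        return values.get(property);
    }

    private boolean isDeclared(String property) {
        return types.stream().anyMatch(t -> t.properties().contains(property));
    }
}
```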

[To be clear, this is just the part of InfoGrid inspired by CDIF. InfoGrid is a software platform, not just a way to move data around. It goes far beyond that, with REST-ful user interfaces, automatic persistence to a variety of data store technologies, real-time bidirectional synchronization of instances held in different places via the XPRISO protocol, and real-time information integration via the Probe Framework.]

This is as good a time as any to thank the CDIF folks. CDIF was easily the place where I learned the most about information technology in the shortest amount of time, probably because so many companies sent their chief architects to the meetings. For some time I even had the good fortune to serve as CDIF’s Technical Vice Chair.

So, thanks to:

  • Adrian Blakey (Sybase)
  • Bob Lechner (Univ. Massachusetts, Lowell)
  • Bob Matthews (IBM)
  • Chuck Foley (Intersolv)
  • Chuck ‘CQ’ Rehberg (DEC)
  • David Swift (Cadre)
  • Hugh Davis (ICL)
  • Jacob Okyne (Lucas)
  • Kelsey Bruso (Unisys)
  • M’hamed Bouziane (DEC)
  • Mike Imber (LBMS)
  • Mary Lomas (Oracle)
  • Paolo Puncello (FINSIEL)
  • Pete Rivett (Virtual Software Factory)
  • Rob Hill (Deft/Sybase)
  • Woody Pidcock (Boeing)
  • Shaike Artsy (Transtar)
  • Tammy Kirkendall (NIST)
  • and many others…

For what it’s worth, InfoGrid would not have been conceived without you.

Getting data into and out of InfoGrid applications

A variety of mechanisms are available to developers to get data into and out of InfoGrid applications. Here is an overview:

Export:

  • Each data element (aka MeshObject) in an InfoGrid application has a URL. The easiest form of export is to simply access the MeshObject at its URL (subject to access control).

    The MeshWorld example application shows how the Viewlet Framework can be used to easily make multiple formats available at that same URL, e.g. HTML, JSON, RSS and Atom. For example, http://example.com/object-1 may be the MeshObject’s URL that emits HTML, while http://example.com/object-1?lid-format=mime:text/json emits a JSON feed.

  • The WritableProbe from the Probe Framework can be used to push data out, in any format, on a configurable schedule. This is particularly advantageous if the InfoGrid application accesses an outside information feed and makes changes to the imported objects; WritableProbe makes it easy to “write back” those changed objects to the feed they came from (assuming the feed permits updates in some fashion).

  • Direct export from the application through an encoder such as ExternalizedMeshObjectXmlEncoder or BulkExternalizedMeshObjectXmlEncoder, which serialize MeshObjects into XML.

  • If all else fails, one can of course export directly from the database.
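The lid-format example above implies that one URL can serve several representations, selected by a query parameter. Here is a hypothetical Java sketch of that dispatch. It assumes, based only on the example URL, that the parameter value has the form mime:<type>, and it is not the Viewlet Framework’s actual implementation; FormatSelector is a made-up name.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: pick an output format for an object URL from its lid-format parameter. */
final class FormatSelector {
    /** Returns the requested MIME type, defaulting to HTML when no lid-format is given. */
    static String requestedMimeType(String url) {
        String query = URI.create(url).getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals("lid-format")) {
                    String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                    // Assumed convention, taken from the example URL: values look like "mime:text/json"
                    if (value.startsWith("mime:")) {
                        return value.substring("mime:".length());
                    }
                }
            }
        }
        return "text/html";
    }

    /** Illustrative renderer: the same data, emitted in whichever format was requested. */
    static String render(String name, String url) {
        // Assumes name needs no JSON or HTML escaping; a real renderer must escape it.
        switch (requestedMimeType(url)) {
            case "text/json":
                return "{\"name\":\"" + name + "\"}";
            default:
                return "<html><body><h1>" + name + "</h1></body></html>";
        }
    }
}
```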

Import:

  • For bulk import through the user interface, the BulkLoaderViewlet can be used.

  • For repeated import of information from the same feed, the InfoGrid Probe Framework was created. InfoGrid already contains Probes for common formats, such as RSS, Atom and VCard, as well as for Blobs.

    While the Probe Framework can be used to perform one-time imports, it goes beyond that. Usually, developers ask the Probe Framework to check back, on a regular or adaptive schedule, whether the content of the feed has been updated since it was last read. If so, the Probe Framework incrementally updates the previously imported data with no additional programming effort. This enables developers to focus on their application, and not on how to reconcile old and new data sets, which is typically a major expense and QA headache.

  • Direct import from the application through an encoder such as ExternalizedMeshObjectXmlEncoder or BulkExternalizedMeshObjectXmlEncoder, which also know how to deserialize MeshObjects from XML.

  • If all else fails, one can of course import directly into the database.
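The incremental update described for the Probe Framework comes down to comparing the previous snapshot of a feed with the current one. Below is a hypothetical Java sketch of such a diff. It assumes each feed item has a stable id (e.g. an RSS guid) and that item content can be compared as a string; it is not the Probe Framework’s actual algorithm, which works on MeshObjects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of incremental re-import: diff the new feed snapshot against the old one. */
final class FeedReconciler {
    record Change(String kind, String id) {}

    /**
     * Compares two snapshots keyed by item id and returns only what changed, so previously
     * imported objects can be updated in place instead of being deleted and re-created.
     */
    static List<Change> diff(Map<String, String> previous, Map<String, String> current) {
        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!previous.containsKey(e.getKey())) {
                changes.add(new Change("added", e.getKey()));
            } else if (!Objects.equals(previous.get(e.getKey()), e.getValue())) {
                changes.add(new Change("updated", e.getKey()));
            }
        }
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                changes.add(new Change("removed", id));
            }
        }
        return changes;
    }
}
```

Items whose content is unchanged produce no Change at all, which is what keeps repeated polling of a large feed cheap.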

Inaugural InfoGrid.org post

Hear, hear.

InfoGrid finally has its own blog, at http://infogrid.org/blog.

It will be about:

  • InfoGrid technology
  • InfoGrid application scenarios
  • InfoGrid and related technologies
  • relevant news
  • and the like.

What is InfoGrid? The next big thing, of course. Please visit infogrid.org.