InfoGrid At Enterprise Data World 2010

I’ll be talking about integrating disparate, highly related information in real time using InfoGrid and model-driven development at the upcoming Enterprise Data World Conference in San Francisco in March 2010. Please join me on Wednesday, March 17, at 8:45am.

There’s much talk in the NoSQL universe, usually by non-believers ;-), about whether NoSQL solutions solve any problems that “regular” companies have, rather than only those of Google, Facebook and the like.

Well, representing complex information as a graph is clearly worthwhile, as the web has shown, in particular if disparate pieces of information need to be related to each other. Relating such information is of course indispensable today, yet it is usually far too complex and expensive.

The InfoGrid Graph Database is designed to address this problem. Its Probe Framework makes this particularly easy because it can seamlessly work with information that InfoGrid itself does not manage. InfoGrid Models govern all data elements and further simplify what is otherwise a rather complex problem.

See you there for a discussion on NoSQL in the enterprise!

The Problems With SQL: from 1991

Via Twitter. This is simply too good to ignore. Original found here. Note the date.

Nimble Computer Corporation
16231 Meadow Ridge Way
Encino, CA 91436
(818) 986-1436
FAX: (818) 986-1360

October 15, 1991

ACM Forum
Association for Computing Machinery
11 West 42nd St.
New York, NY 10036

Dear ACM Forum:

I had great difficulty in controlling my mirth while I read the self-congratulatory article “Database Systems: Achievements and Opportunities” in the October, 1991, issue of the Communications, because its authors consider relational databases to be one of the three major achievements of the past two decades. As a designer of commercial manufacturing applications on IBM mainframes in the late 1960’s and early 1970’s, I can categorically state that relational databases set the commercial data processing industry back at least ten years and wasted many of the billions of dollars that were spent on data processing. With the recent arrival of object-oriented databases, the industry may finally achieve some of the promises which were made 20 years ago about the capabilities of computers to automate and improve organizations.

Biological systems follow the rule “ontogeny recapitulates phylogeny”, which states that every higher-level organism goes through a developmental history which mirrors the evolutionary development of the species itself. Data processing systems seem to have followed the same rule in perpetuating the Procrustean bed of the “unit record”. Virtually all commercial applications in the 1960’s were based on files of fixed-length records of multiple fields, which were selected and merged. Codd’s relational theory dressed up these concepts with the trappings of mathematics (wow, we lowly Cobol programmers are now mathematicians!) by calling files relations, records rows, fields domains, and merges joins. To a close approximation, established data processing practice became database theory by simply renaming all of the concepts. Because “algebraic relation theory” was much more respectable than “data processing”, database theoreticians could now get tenure at respectable schools whose names did not sound like the “Control Data Institute”.

Unfortunately, relational databases performed a task that didn’t need doing; e.g., these databases were orders of magnitude slower than the “flat files” they replaced, and they could not begin to handle the requirements of real-time transaction systems. In mathematical parlance, they made trivial problems obviously trivial, but did nothing to solve the really hard data processing problems. In fact, the advent of relational databases made the hard problems harder, because the application engineer now had to convince his non-technical management that the relational database had no clothes.

Why were relational databases such a Procrustean bed? Because organizations, budgets, products, etc., are hierarchical; hierarchies require transitive closures for their “explosions”; and transitive closures cannot be expressed within the classical Codd model using only a finite number of joins (I wrote a paper in 1971 discussing this problem). Perhaps this sounds like 20-20 hindsight, but most manufacturing databases of the late 1960’s were of the “Bill of Materials” type, which today would be characterized as “object-oriented”. Parts “explosions” and budgets “explosions” were the norm, and these databases could easily handle the complexity of large amounts of CAD-equivalent data. These databases could also respond quickly to “real-time” requests for information, because the data was readily accessible through pointers and hash tables–without performing “joins”.

I shudder to think about the large number of man-years that were devoted during the 1970’s and 1980’s to “optimizing” relational databases to the point where they could remotely compete in the marketplace. It is also a tribute to the power of the universities, that by teaching only relational databases, they could convince an entire generation of computer scientists that relational databases were more appropriate than “ad hoc” databases such as flat files and Bills of Materials.

Computing history will consider the past 20 years as a kind of Dark Ages of commercial data processing in which the religious zealots of the Church of Relationalism managed to hold back progress until a Renaissance rediscovered the Greece and Rome of pointer-based databases. Database research has produced a number of good results, but the relational database is not one of them.

Sincerely,

Henry G. Baker, Ph.D.
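Baker’s point about parts “explosions” is easy to see in code. What follows is a minimal sketch, assuming a hypothetical in-memory Part class in which each part holds direct references to its components: exploding a bill of materials becomes a recursive walk over pointers that multiplies quantities along each path, with no joins and no fixed limit on nesting depth. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bill-of-materials node: each part points directly at its components. */
class Part {
    final String name;
    final List<Part> components = new ArrayList<>();
    final List<Integer> quantities = new ArrayList<>();

    Part(String name) { this.name = name; }

    /** Records that one unit of this part uses qty units of child. */
    Part uses(Part child, int qty) {
        components.add(child);
        quantities.add(qty);
        return this;
    }
}

public class PartsExplosion {
    /** Total quantity of every leaf part needed to build count units of root. */
    public static Map<String, Integer> explode(Part root, int count) {
        Map<String, Integer> totals = new TreeMap<>();
        walk(root, count, totals);
        return totals;
    }

    private static void walk(Part part, int multiplier, Map<String, Integer> totals) {
        if (part.components.isEmpty()) {
            // Leaf (purchased) part: accumulate the required quantity.
            totals.merge(part.name, multiplier, Integer::sum);
            return;
        }
        // Follow pointers into sub-assemblies; depth is unbounded, unlike a fixed chain of joins.
        for (int i = 0; i < part.components.size(); i++) {
            walk(part.components.get(i), multiplier * part.quantities.get(i), totals);
        }
    }

    public static void main(String[] args) {
        Part bolt = new Part("bolt");
        Part plate = new Part("plate");
        Part bracket = new Part("bracket").uses(bolt, 4).uses(plate, 1);
        Part frame = new Part("frame").uses(bracket, 2).uses(bolt, 6);
        System.out.println(explode(frame, 1)); // prints {bolt=14, plate=2}
    }
}
```

The sketch assumes the part structure is acyclic; a production system would also need to detect cycles.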

Jonathan Ellis: The NoSQL Ecosystem

Excellent article by Jonathan Ellis on the various approaches to non-relational databases in the market today. He categorizes products and projects along three dimensions:

  1. Scalability: in particular, how easily servers can be added to and removed from local or remote data centers.
  2. Data and query model: he finds a lot of variety there.
  3. Persistence design: alternatives range from in-memory only, to smart caching strategies, to on-disk.

This categorization is really useful, more so than several others that have been proposed.

Let’s apply this to InfoGrid’s graph database layer:

Regarding scalability: InfoGrid scales as well as its underlying persistence layer. InfoGrid makes storage pluggable by delegating to the Store abstraction, which can be implemented on top of any key-value store, so InfoGrid is exactly as scalable as the Store beneath it.
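To illustrate what “pluggable” storage means, here is a minimal sketch that is not InfoGrid’s actual Store API: a hypothetical KeyValueStore interface with one in-memory implementation, and a hypothetical graph layer that persists node records through it. Swapping in another implementation of the interface (for example, one backed by a distributed key-value store) would leave the graph layer untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction: the only operations the graph layer relies on. */
interface KeyValueStore {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
    void delete(String key);
}

/** One possible backend; a distributed key-value store could implement the same interface. */
class InMemoryStore implements KeyValueStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    public void put(String key, byte[] value) { data.put(key, value.clone()); }

    public Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key)).map(b -> b.clone());
    }

    public void delete(String key) { data.remove(key); }
}

/** Hypothetical graph layer: it knows nothing about how or where the bytes are kept. */
public class GraphOverStore {
    private final KeyValueStore store;

    public GraphOverStore(KeyValueStore store) { this.store = store; }

    public void saveNode(String id, String encodedProperties) {
        store.put("node/" + id, encodedProperties.getBytes(StandardCharsets.UTF_8));
    }

    public Optional<String> loadNode(String id) {
        return store.get("node/" + id).map(b -> new String(b, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Choosing the backend is the only storage-specific line.
        GraphOverStore graph = new GraphOverStore(new InMemoryStore());
        graph.saveNode("customer-42", "name=Acme;city=Encino");
        System.out.println(graph.loadNode("customer-42").orElse("<missing>"));
    }
}
```

Under this design, scaling the graph layer reduces to choosing a KeyValueStore implementation that scales.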

InfoGrid’s data and query model is based on a graph and an explicit object model. We believe this makes life even easier for developers than any of the alternatives discussed in his article. We also think our traversal API is a lot simpler than some others we have seen.
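The combination of a graph with an explicit object model can be sketched as follows. This is a hypothetical illustration, not InfoGrid’s traversal API: nodes carry an entity type, a small hard-coded model (the MODEL map, an assumption for this example) says which relationship types may connect which entity types, and a traversal is a chain of steps that each follow one relationship type.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TypedGraph {
    /** Hypothetical node: an entity type from the model plus a display name. */
    record Node(String type, String name) {}

    // Hypothetical explicit model: relationship type -> [source entity type, destination entity type].
    private static final Map<String, List<String>> MODEL = Map.of(
            "placed", List.of("Customer", "Order"),
            "contains", List.of("Order", "Product"));

    // Adjacency: node -> relationship type -> neighbours.
    private final Map<Node, Map<String, Set<Node>>> edges = new HashMap<>();

    /** Adds an edge, rejecting it if the model does not allow it. */
    public void relate(Node from, String relType, Node to) {
        List<String> ends = MODEL.get(relType);
        if (ends == null || !ends.get(0).equals(from.type()) || !ends.get(1).equals(to.type())) {
            throw new IllegalArgumentException(
                    "Model does not allow " + from.type() + " -" + relType + "-> " + to.type());
        }
        edges.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(relType, k -> new LinkedHashSet<>())
             .add(to);
    }

    /** Follows each relationship type in turn, starting from the given nodes. */
    public Set<Node> traverse(Set<Node> start, String... relTypes) {
        Set<Node> current = new LinkedHashSet<>(start);
        for (String rel : relTypes) {
            Set<Node> next = new LinkedHashSet<>();
            for (Node n : current) {
                next.addAll(edges.getOrDefault(n, Map.of()).getOrDefault(rel, Set.of()));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        TypedGraph g = new TypedGraph();
        Node alice = new Node("Customer", "Alice");
        Node order = new Node("Order", "#1001");
        Node widget = new Node("Product", "Widget");
        g.relate(alice, "placed", order);
        g.relate(order, "contains", widget);
        // "Which products did Alice order?" expressed as two traversal steps.
        System.out.println(g.traverse(Set.of(alice), "placed", "contains"));
    }
}
```

Because the model is explicit, a malformed edge is rejected when it is created rather than discovered later during a query. The sketch uses a Java record, so it needs Java 16 or later.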

InfoGrid’s persistence design actually gives developers more choices than is typical: InfoGrid can run entirely in memory (for example, if class MMeshBase is instantiated), or smartly cache data held in an external Store (if class StoreMeshBase is instantiated). Most importantly, developers write to the same API either way. This allows them to write application code once, and only later decide how to store their application data. And if one kind of Store does not work out (or does not scale once the application becomes popular), they can move to another without changing the application, other than its initialization.
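The “write once, choose storage at initialization” idea can be sketched as below. The names NodeBase, InMemoryNodeBase and CachedStoreNodeBase are hypothetical stand-ins for the roles that MMeshBase and StoreMeshBase play; they are not InfoGrid’s real API, and the external Store is simulated with a plain map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical API that application code is written against. */
interface NodeBase {
    void setProperty(String nodeId, String value);
    String getProperty(String nodeId);
}

/** Keeps everything in memory (the role MMeshBase plays). */
class InMemoryNodeBase implements NodeBase {
    private final Map<String, String> nodes = new HashMap<>();
    public void setProperty(String id, String v) { nodes.put(id, v); }
    public String getProperty(String id) { return nodes.get(id); }
}

/** Caches in memory in front of an external store (the role StoreMeshBase plays). */
class CachedStoreNodeBase implements NodeBase {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> store; // stand-in for an external key-value store
    int storeReads = 0; // counts cache misses that had to go to the store

    CachedStoreNodeBase(Map<String, String> store) { this.store = store; }

    public void setProperty(String id, String v) {
        cache.put(id, v);
        store.put(id, v); // write-through to the store
    }

    public String getProperty(String id) {
        // Only a cache miss reaches the store.
        return cache.computeIfAbsent(id, k -> { storeReads++; return store.get(k); });
    }
}

public class WriteOnceApp {
    /** Application logic: identical no matter which NodeBase it is given. */
    static String run(NodeBase base) {
        base.setProperty("greeting", "hello");
        return base.getProperty("greeting");
    }

    public static void main(String[] args) {
        // Only the initialization differs between deployments.
        System.out.println(run(new InMemoryNodeBase()));
        System.out.println(run(new CachedStoreNodeBase(new HashMap<>())));
    }
}
```

Moving from one backend to another touches only the line that constructs the NodeBase; run() stays the same.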

The NoSQL Business and Use Cases

My question on the NoSQL mailing list about the most important business and use cases for NoSQL technologies sparked an interesting discussion. There appears to be widespread agreement on the following three high-level business and use cases:

  1. The amount of data, or bandwidth, required by an application is so massive that a massively distributed architecture is needed.
    This is the original use case for systems such as Google’s BigTable, built to index the internet.
  2. The query load or query complexity is too large to be handled by relational “joins”.
    Digg explained this very well as its reason for moving to a NoSQL architecture.
  3. The gap between the physical relational data structures required by a SQL database and an application’s schema complexity and flexibility requirements is too large.
    This encompasses the entire range from needing very loose, weakly typed storage to needing very expressive, strongly typed systems (e.g. graph databases with explicit object models).

Each of these is, of course, a broad category, and more detail can be added. But it is good to see that the community seems able to agree on the top three. This should also put to rest the argument that “NoSQL is not needed”.