What’s the biggest obstacle to GraphDB adoption?

The recent workshop on Graph Databases in Barcelona sparked an interesting debate among vendors of graph databases about how to accelerate graph database adoption.

I don’t think that debate has been resolved yet. Perhaps this post will help a bit.

Some opinions that I heard were:

  • there are significant differences between the various graph database products on the market (e.g. graphs, property graphs, properties on edges or not, hypergraphs etc.). Unless the products become more similar, customers will fear lock-in and not buy any of them. Here is a variation:
  • relational databases only took off once there was agreement on the SQL standard among vendors. We need to cooperate to create a similar language, otherwise graph database adoption will not take off either.
  • most potential users of graph databases have never heard of them. How could they use any if they don’t know they exist?
  • even if potential users know of graph databases, they do not know what the use cases are, because use cases and success stories have not been broadly documented.

What do you think the biggest obstacles are for graph database adoption?

Strong and Weak Typing With Graph Databases

Whether programming systems should be strongly typed or weakly typed has been one of the longest-running controversies in the history of computer science going back something like 50 years. Generally speaking, strongly typed systems tend to require more programmer effort up-front, in exchange for earlier or more definite error reports.

We also need to distinguish between static typing and dynamic typing: a dynamically typed system enables changes of types at run-time, while a statically typed system can’t do that.

Not surprisingly, typing for graph databases (or any other kind of NoSQL database) can be implemented in different ways, too:

Dynamically typed, weakly typed:

  • At development time: types may be declared but are not checked except perhaps rudimentarily.
  • At run-time: errors may occur, which may or may not be discovered; mis-interpretations of data are possible; data corruption is likely in case of programming errors.

Dynamically typed, strongly typed:

  • At development time: types are declared and checked as well as possible.
  • At run-time: all operations are checked for type safety; types can be discovered dynamically; type mis-interpretations are not possible.

Statically typed, weakly typed:

  • At development time: only rudimentary checking, if at all.
  • At run-time: errors may occur, which may or may not be discovered; mis-interpretations of data are possible; data corruption is likely in case of programming errors.

Statically typed, strongly typed:

  • At development time: all type errors are caught; additional developer effort is required; some types of data are hard to represent.
  • At run-time: no checking required due to “correctness by construction”.

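Let’s place some systems into these four quadrants:

  • Dynamically typed, weakly typed: most NoSQL systems
  • Dynamically typed, strongly typed: InfoGrid
  • Statically typed, weakly typed: (no example given)
  • Statically typed, strongly typed: SQL database (if used as intended)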

Side note: when NoSQL proponents argue that weakly typed systems are much better than stronger-typed SQL, they sometimes throw out the baby with the bath water: there are four choices, not two. We agree that statically, strongly typed systems like a typical SQL database have considerable disadvantages in a fast-moving world, but so do weakly typed systems; the only difference is the type of disadvantage. In our view, a strong but dynamic type system is the best compromise for most applications with a non-trivial schema, which is why InfoGrid V2 implements it. (There are some applications that do not require a non-trivial schema; web caching, for example.)
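To make the difference between the weakly typed and the strongly typed flavor of dynamic typing a bit more concrete, here is a minimal sketch in Java. It assumes a toy key/value store; the names `WeakStore` and `CheckedStore` are made up for this illustration and do not come from any product. The weak store accepts anything, while the checked store lets types be declared at run-time and verifies every write against them.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical and not taken from any product.
class TypingSketch {

    // Weakly typed: any value may be stored under any key, and nothing is checked.
    static final class WeakStore {
        private final Map<String, Object> values = new HashMap<>();

        void put(String key, Object value) {
            values.put(key, value);
        }

        Object get(String key) {
            return values.get(key);
        }
    }

    // Strongly but dynamically typed: property types are declared at run-time,
    // and every write is checked against the declaration.
    static final class CheckedStore {
        private final Map<String, Class<?>> declaredTypes = new HashMap<>();
        private final Map<String, Object> values = new HashMap<>();

        void declare(String key, Class<?> type) {
            declaredTypes.put(key, type);
        }

        void put(String key, Object value) {
            Class<?> type = declaredTypes.get(key);
            if (type == null) {
                throw new IllegalArgumentException("Undeclared property: " + key);
            }
            if (!type.isInstance(value)) {
                throw new IllegalArgumentException(
                        key + " requires " + type.getSimpleName() + ", got " + value);
            }
            values.put(key, value);
        }

        Object get(String key) {
            return values.get(key);
        }
    }

    public static void main(String[] args) {
        WeakStore weak = new WeakStore();
        weak.put("age", "forty-two"); // accepted silently; a reader expecting a number fails later

        CheckedStore checked = new CheckedStore();
        checked.declare("age", Integer.class); // the type is introduced at run-time
        checked.put("age", 42);
        try {
            checked.put("age", "forty-two");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point of the checked variant is that the mistake surfaces at the write that causes it, not at some later read that mis-interprets the data.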

In a graph database like InfoGrid, the following items can be typed:

  • Nodes
  • Edges
  • Properties

In other graph databases, only a subset of these items may be typed. More in the next post on types.
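As a rough picture of what typing all three items can mean, here is a small Java sketch. It is a toy model under my own assumptions, not InfoGrid's API: `NodeType`, `EdgeType`, `Node`, `Edge` and `connect` are hypothetical names. Node types declare which properties they allow and of which value type, and edge types declare which node types may sit at each end.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model with hypothetical names; it does not reproduce InfoGrid's API.
class TypedGraphSketch {

    // A node type lists the properties its nodes may carry and each property's value type.
    static final class NodeType {
        final String name;
        final Map<String, Class<?>> propertyTypes = new HashMap<>();

        NodeType(String name) {
            this.name = name;
        }

        NodeType declare(String property, Class<?> valueType) {
            propertyTypes.put(property, valueType);
            return this;
        }
    }

    // An edge type fixes which node type may sit at each end.
    static final class EdgeType {
        final String name;
        final NodeType source;
        final NodeType target;

        EdgeType(String name, NodeType source, NodeType target) {
            this.name = name;
            this.source = source;
            this.target = target;
        }
    }

    static final class Node {
        final NodeType type;
        final Map<String, Object> properties = new HashMap<>();

        Node(NodeType type) {
            this.type = type;
        }

        // Typed property: the value must match what the node type declared.
        void set(String property, Object value) {
            Class<?> expected = type.propertyTypes.get(property);
            if (expected == null || !expected.isInstance(value)) {
                throw new IllegalArgumentException(
                        type.name + " does not accept " + property + " = " + value);
            }
            properties.put(property, value);
        }
    }

    static final class Edge {
        final EdgeType type;
        final Node source;
        final Node target;

        Edge(EdgeType type, Node source, Node target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }
    }

    // Typed edge: both ends must have the node types the edge type requires.
    static Edge connect(EdgeType type, Node source, Node target) {
        if (source.type != type.source || target.type != type.target) {
            throw new IllegalArgumentException(
                    type.name + " cannot connect " + source.type.name + " to " + target.type.name);
        }
        return new Edge(type, source, target);
    }
}
```

A database that types only nodes, say, would keep the check in `set` but drop the one in `connect`, or the other way around.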

InfoGrid and Relational Databases

The other day I was asked:

Your pitch for InfoGrid really disses relational databases. But then, InfoGrid applications usually use MySQL (or PostgreSQL) to store their data. What gives?

To which I responded:

All the database vendors want you to store your data in their database, instead of files in the file system. But then, the databases themselves store their data as files in the file system. What gives?

That does not sound nearly as contradictory. It’s fine that a database stores its data as files in a file system. It may or may not; as an application developer, you really don’t care much. You care about the high-level facilities (such as SQL) that the database provides, because writing code against them is much easier and faster than writing against files (for many applications).

The InfoGrid argument is the same one, just one level up: it is much better to develop against the InfoGrid APIs than against SQL directly, because of all the high-level facilities that InfoGrid gives you. Here’s an example:

Try this with a relational database:

MeshObject employee = ...;
employee.bless( CustomerSubjectArea.CUSTOMER );

Your employee has just also become a customer, with all that this entails (e.g. participating in the relationship Customer_Places_Order, which you can’t as a mere employee). For more on blessing objects, see the documentation.

With raw SQL, you wouldn’t even know where exactly to start, but chances are you would have to redesign your schema, and write and update a whole lot of application code.
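To give a feel for what adding a type to a live object involves conceptually, here is a deliberately tiny Java sketch. It is not how InfoGrid implements `bless`; `ToyObject`, `hasType` and `placeOrder` are hypothetical names I made up for the illustration, and types are plain strings to keep it short.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration with hypothetical names; this is not how InfoGrid implements bless().
class BlessSketch {

    static final class ToyObject {
        private final Set<String> types = new HashSet<>();

        // Adds a type to an existing object at run-time; no schema change is involved.
        void bless(String type) {
            types.add(type);
        }

        boolean hasType(String type) {
            return types.contains(type);
        }
    }

    // A relationship that only objects blessed as "Customer" may take part in.
    static String placeOrder(ToyObject who, String orderId) {
        if (!who.hasType("Customer")) {
            throw new IllegalStateException("Only a Customer can place an order");
        }
        return "Order " + orderId + " placed";
    }

    public static void main(String[] args) {
        ToyObject employee = new ToyObject();
        employee.bless("Employee");
        try {
            placeOrder(employee, "A-1"); // a mere employee is refused
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        employee.bless("Customer"); // the same object now also is a customer
        System.out.println(placeOrder(employee, "A-1"));
    }
}
```

The object keeps its identity and its existing type; it simply gains a second one, and the relationship check passes from then on.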

Query Languages Are Overrated

Can you imagine a database without a query language? Say: MySQL without SQL?

Ridiculous? Perhaps not.

I’m aware of three major arguments that have been made for SQL, or query languages in general:

  1. A query language allows non-technical users (like managers) to interact with persistent data themselves, bypassing engineers and their applications, without requiring technical assistance.
  2. It allows developers to interact with persistent data in the same way, regardless of the language in which the application is written.
  3. It gives developers and system operators a way of quickly “looking at the data” when things go wrong.

On closer inspection, none of these holds up well. (A fourth argument, that SQL would allow developers to ignore which product of which vendor the data is stored in, is very obviously false.)

Argument #1 was the big one in the 1970s when SQL was invented. However, widespread use of SQL by non-engineers never happened, and by now the argument has been conclusively disproved. Users need user-friendly applications, not SQL, to accomplish anything.

Argument #2 sounds good from the perspective of the database vendor: hey, we can say we support FORTRAN just as well as Java and Python, all with the same code. The trouble is: it does not matter to developers, because they tend to develop in one language at a time. And given the hoops one has to go through to interact with a SQL database from the average language, it’s not even true.

Argument #3 is indeed true. But using SQL is hardly the only way to accomplish this objective. A debugger is another (and nobody has ever asked for a query language for a debugger). Another way would be to map all data objects to web URLs, as we do in InfoGrid, and use the web browser as the “query language”.
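The URL idea can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the `/objects/{id}` path scheme and the names `UrlBrowseSketch`, `store`, `urlFor` and `render` are hypothetical and say nothing about how InfoGrid actually lays out its URLs. Every stored object gets a path, and resolving a path returns a plain-text view that a browser could display.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the path scheme and all names are assumptions for illustration.
class UrlBrowseSketch {

    private static final String PREFIX = "/objects/";

    private final Map<String, Map<String, Object>> objectsById = new HashMap<>();

    // Stores a copy of the properties, sorted by name so the rendering is stable.
    void store(String id, Map<String, Object> properties) {
        objectsById.put(id, new TreeMap<>(properties));
    }

    // Every object is addressable by a path derived from its identifier.
    static String urlFor(String id) {
        return PREFIX + id;
    }

    // Resolves a path to a plain-text view of the object, one property per line.
    String render(String path) {
        if (!path.startsWith(PREFIX)) {
            return "404 Not Found";
        }
        Map<String, Object> properties = objectsById.get(path.substring(PREFIX.length()));
        if (properties == null) {
            return "404 Not Found";
        }
        StringBuilder view = new StringBuilder();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            view.append(entry.getKey()).append(" = ").append(entry.getValue()).append('\n');
        }
        return view.toString();
    }
}
```

“Looking at the data” then means typing a path into the address bar, or following links from one object's view to its neighbors if the rendering includes them.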

So why SQL? Beats me!