Our problem statement: The barriers of remote data access

Author: Daniel Jozsef April 27th, 2012

, , , ,

Why SQL is no longer enough

With the growing prominence of cloud storage and computing, we have seen a rise in the number of standard or vendor-specific languages and protocols for on-line data querying. In the world of today’s computing, maybe the single most important repeated task is finding relevant records in a database.
For years, the de-facto standard for this type of operation has been the SQL language. The core language stems from the 1970s, and is about the same age as relational databases themselves. It is a well-tested concept that has proven itself over the course of years, and is still with us in many applications. However, as the need to provide public access to datastores through on-line, usually HTTP channels grows, the strengths of SQL become its main drawbacks.
It is far too powerful for a public data service. Even with proper rights management, it is too easy to overlook and almost impossible to filter out or optimize queries that would consume disproportionate computing power. When one creates a database for a specific application, the most important queries are known, and indexes and clusterings are placed according to the client’s behavior, to speed up the most common and most problematic tasks.
When designing a public data service, one does not have the advantage of knowing the client. The client can be anyone, and might want to query anything.
Also, as object-oriented datastores and automatically coupled, “code first” ORM systems gain ground from traditionally relational databases, and as the need rises to query high-level representations of data at or above the application layer, aggregating from heterogenous sources and systems, SQL is slowly ceding its reach as a globally accepted way of querying.
As long as all the data is stored in an Oracle or MSSQL server, publishing a SQL interface is workable, if risky from a Denial-of-Service standpoint. However, if the data you need is a combination of the information stored in several legacy web applications, sometimes running in different operating systems and different database servers, providing a complete SQL implementation is out of the question.

Babel’s confusion

This is where dozens of cloud vendors stepped in with their own, on-line query languages. Some of them specific to the cloud platform it is bound to, such as YQL (Yahoo Query Language) and FQL (Facebook Query Language), others are aimed to become worldwide standards (OData from Microsoft Azure).
When an application developer sets out to create his own mobile or web app, chances are he will have to access a multitude of data sources. Today is the age of mash-ups and interconnectivity. One expects his photo app to upload photos to Facebook, Google+, Twitter and Flickr, find uploaded photos in all those services based on keywords and geolocation, etc. You get the picture. What the developer needs to do is, eventually, to learn and make use of a multitude of query languages, each with its own quirks, to provide the functionality consumers expect.



Some languages, such as YQL and FBQL, are SQL-like query languages, which is also mirrored in their names. They are simple and powerful, and learning these languages seems easy for someone who comes from a SQL background, and that is at the same time their largest pitfall. Modern SQL implementations are almost (or fully) Turing-complete, and allow the developer to do anything. As this has been one of the reasons why SQL is inapplicable as an on-line querying tool, these languages are necessarily less complex and less powerful.
For the project manager, this means unpredictability of effort. Something that looks trivial at the first glance can become a serious bottleneck in the development process, because of some unforeseen limitation of the web query language used. These limitations are of course less prominent as long as one sticks to simple queries, but become a real risk when wandering into the realm of nested queries, cursors, GROUPBY statements and aggregator functions, etc.


Other languages, like OData, follow a completely different model. OData is a URI-based protocol, following the resource model. The datasource consists of various resources which one can access through standardized URIs, and enter queries through URI parameters.
This is closer to the way the Web thinks, and carries less chances of misunderstanding than the SQL-like languages. And it works wondefully, as long as one doesn’t venture outside the realm of simplistic “field EQUALS constant” type queries, but once you’d like the right operand to be a field as well (eg. “I want to see all transactions where the expense is greater than the revenue”), language support stops short.

The Ajax request

Once a developer has learned all that, and is now proficient in using each query language to its full capabilities, and use them in client code, one must handle the mess that is the Ajax request. This is a generic tool that has been with us for a very long time, has specific quirks, and before all, the developer needs to assemble the request with string operations.
Even though we have JSON to help us handle the return value, this is all too similar to the glory days of programming, when connecting to a database involved first assembling a correct network request as a byte array, pushing it through a pipe, and then parsing the response. It is something today’s enterprise developers have forgotten many years ago. The data access layer of the day should be strongly typed, smart and abstract. The Ajax request is all BUT those things.


And this is where we come into the picture. Our goal was to create a developer-friendly data access system for client code. JayData abstracts the underlying query languages and network operations via a provider model, similar to the provider architecture used in the ASP.NET. Each data source type can have its own provider – we have created providers for the most important ones, but a clear point of extensibility enables anyone with the needed knowledge to support the data source he needs.
From an application developer’s standpoint, JayData allows data to be queried using JavaScript Language Query (JSLQ), a language native query system based on the functional aspect of JavaScript. Queries can be written as JavaScript boolean expressions and functions, and operations such as filtering, sorting, cropping or paging the result, etc. are represented through a fluent API over the object representing the data context.
Of course this approach is entirely Turing-complete, type-safe, abstract, and extremely comfortable for the developer, as all data sources can be queried using the exactly same syntax. Of course this also means there will be operations that certain data sources do not support, but then it is up to the provider developer to create a workaround on the client side, or simply throw an error stating that the provider does not support this.
Provider documentations also contain the exact specification of what the provider does or doesn’t support, making planning with data source capabilities simple and understandable.
To show you what we mean, let the code speak for itself.

YQL example

Selecting European countries with an “e” in their name, in native YQL:

…and in JSLQ:

FQL example:

Selecting all details of current user and friends in native FQL:

…in JSLQ:

, , , ,