Developer Blog
When running a process, it's rather important to have somewhere to store the relevant data. This could be in a value store, but if longer-term data persistence is required, it's likely you'll rely on a database. This post covers traditional databases and hints at what's next.
Short term and carefree
By default, Appway manages all Data Entities in a container called "Value Store". Data related to a Process instance is stored for the duration of that instance. Once the end of the Process is reached, the data for that instance is deleted automatically.
The Value Store persists arbitrary object graphs, a wonderful feature which allows you to focus on what the data is about, what relations exist between the different concepts, what the concept hierarchy looks like, and how entities are interconnected. There's no need to care about specific value sizes: You do not have to define whether a person's surname has 50 or 150 characters, or whether UTF-8, ASCII, or an ISO-8859 charset is used for storage. The Value Store takes care of that.
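As a conceptual sketch only (the class and method names below are invented for illustration and are not Appway's real API), a value store that holds an arbitrary object graph per process instance, and drops it when the instance ends, might look like this:

```python
# Conceptual sketch of a per-instance value store. Names are illustrative,
# not Appway's actual API. Each process instance gets its own slot that can
# hold any object graph; the slot is discarded when the instance ends.

class ValueStore:
    def __init__(self):
        self._instances = {}  # instance id -> object graph

    def put(self, instance_id, key, value):
        # No column sizes, no charsets: any object graph is accepted as-is.
        self._instances.setdefault(instance_id, {})[key] = value

    def get(self, instance_id, key):
        return self._instances[instance_id][key]

    def end_instance(self, instance_id):
        # Reaching the end of the process deletes that instance's data.
        self._instances.pop(instance_id, None)

store = ValueStore()
store.put("proc-1", "applicant", {
    "surname": "Müller",  # length doesn't matter
    "addresses": [{"city": "Zurich"}, {"city": "Basel"}],  # nested graph
})
print(store.get("proc-1", "applicant")["addresses"][0]["city"])  # Zurich
store.end_instance("proc-1")
```

Note how the store never asks what shape the data has; that is exactly the freedom you give up when moving to a schema-bound database.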
Everything just falls into place … until you have to persist data for a longer time, beyond the length of the process. Then what?
Persistence and rigidity
The industry came up with a solution for longer-term data persistence decades ago: relational database management systems (RDBMS). Hardware limitations had a huge impact on the development of this technology: Memory was expensive 20 years ago. RDBMS therefore mostly deals with how large your data is, and how fast it can be retrieved from cheap disk space.
In an RDBMS you need to define tables of equal-sized rows. Each row is split into the same number of columns, and each column must have a fixed size.
The devotion to a pre-ordained structure has benefits: an RDBMS set-up allows engineers to create high-performance data retrieval mechanisms. Flexibility, however, suffers. If lots of information only takes up a couple of hundred bytes, but the outlier — with a few thousand bytes — must be captured as well… you have to define a column which is large enough for the outlier, wasting space. Modern databases can deal with this in more efficient ways, but the initial problem is still there: Data is given a fixed size and a fixed structure.
A database schema defines what data may be stored in a specific database; more precisely, in a specific database table. This isn't an issue when the data is really uniform. But when you have lots of different data, you have to define specific tables for each data type. Engineers have to create a data model that fits current needs, is efficient, and can hold the structures required — but then also anticipate future changes to the data, because every change in data structures or types requires a change in the database schema.
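To make the rigidity concrete, here is a small sketch using Python's built-in sqlite3 module (any RDBMS behaves similarly in principle): the schema fixes the columns up front, and a field the original model didn't anticipate cannot be stored until the schema itself is changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The schema fixes the shape of every row up front.
cur.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, surname VARCHAR(150))")
cur.execute("INSERT INTO person (surname) VALUES (?)", ("Miller",))

# A field the original model didn't anticipate is simply rejected;
# the schema has to change before the data can be stored.
try:
    cur.execute("INSERT INTO person (surname, nickname) VALUES (?, ?)",
                ("Smith", "Smitty"))
except sqlite3.OperationalError as e:
    print("rejected:", e)

# Every such change to the data model is a schema migration.
cur.execute("ALTER TABLE person ADD COLUMN nickname VARCHAR(50)")
cur.execute("INSERT INTO person (surname, nickname) VALUES (?, ?)",
            ("Smith", "Smitty"))
print(cur.execute("SELECT surname, nickname FROM person ORDER BY id").fetchall())
# [('Miller', None), ('Smith', 'Smitty')]
```

In a toy script the ALTER TABLE is one line; on a production system holding years of data, the same step is a migration that has to be planned, tested and rolled out — which is exactly the "get it right first" pressure described above.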
As former IBM executive Jnan Dash notes, "This 'get it right first' approach may have worked in the old world of static schema, but it will not be suitable for the new world of dynamic schema, where changes need to be made daily, if not hourly, to fit the ever changing data model."
So what's next?
The sequel to SQL
The industry came up with yet another concept, the key-value store: a very simple approach in which you store a value under a specific key. This kind of data structure is easily distributed and thus fits nicely in a cluster environment. Nowadays grouped under the label "NoSQL stores", such systems can deal with very large amounts of data while also embracing data versatility.
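The basic idea can be sketched in a few lines, assuming nothing more than a mapping from keys to opaque values (real NoSQL stores add distribution, replication and durable persistence on top of this):

```python
import json

# A key-value store at its simplest: one value per key, no schema.
kv = {}

def put(key, value):
    kv[key] = json.dumps(value)  # values are opaque blobs to the store

def get(key):
    return json.loads(kv[key])

# Records of completely different shapes live side by side; no table
# definition constrains them, and no migration is needed to add a field.
put("person:1", {"surname": "Miller"})
put("person:2", {"surname": "Smith", "nickname": "Smitty",
                 "addresses": [{"city": "Zurich"}]})
put("order:17", {"items": ["book", "pen"], "total": 23.5})

print(get("person:2")["nickname"])  # Smitty
```

The store never needed to know that "person:2" gained a nickname and addresses while "person:1" has neither; that is the versatility the paragraph above describes.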
The price of this flexibility comes in the form of 'query-ability': Huge, variously-structured datasets cannot easily be queried. As a work-around, large data warehouses do nightly processing of their data to create indexes in RDBMS or other systems — or it may suffice to remember a root key, which then allows a system to efficiently retrieve data in an incremental way.
Is this the future? No. It's a stepping stone along the way. Read my next post to find out what I consider to be the direction we're heading in.
Note: Within Appway, the Data Store extension provides a tool for mapping Appway Data Classes to an RDBMS system, handling the creation and alteration of table definitions and dealing with storage and loading.
---
Image courtesy tec_estromberg / Flickr


