FaunaDB: A Guide for Relational Users

Evolution of relational databases

The evolution of databases has been a history of "creative destruction". Joseph Schumpeter, the famous Austrian economist, described creative destruction as the "process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one". Taking a cue from Schumpeter, we can easily see the same pattern in how database management systems have evolved over the last 50 years or so.

In the 1960s, both the hierarchical and the network models for storing data became popular. Hierarchical databases structured data in a tree-like model in which each record is connected to another using links. The hierarchical model enforced a one-to-many relationship between a parent and a child record, and reading any record required traversing the database from the root node down to the leaf. The hierarchical model was the first database model introduced by IBM. The network model allowed more natural modeling of relationships between entities: unlike the hierarchical model, it allowed each record to have multiple parent and child records, forming a generalized graph structure. In spite of this relatively flexible structure, the network model did not gain traction over the hierarchical model because of IBM’s decision to use the latter in its established IMS database. These early databases saw adoption only among businesses with deep pockets, and they were perceived as complex, difficult, and time-consuming to design. The upfront cost of procuring specialized hardware was another inhibitor, and the poor performance of applications left a lot to be desired. The difficulty of modeling anything beyond one-to-many relationships only added to these shortcomings.

Tedious navigational access and high total cost of ownership ultimately paved the way for the next disruption in the 1970s: the relational database model.

Relational databases

The next big step in database technology came in the 1970s when Edgar Codd described the relational database model. Codd proposed 12 rules (actually 13, numbered 0 through 12) to explain the properties of a relational database system:

Rule 0: The foundation rule
  • The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

  • The other 12 rules derive from this rule. The rules are as follows:

Rule 1: The information rule

All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule

All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model

The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database’s structure (catalog) using the same query language that they use to access the database’s data.

Rule 5: The comprehensive data sub-language rule

The system must support at least one relational language that:

  1. Has a linear syntax.

  2. Can be used both interactively and within application programs.

  3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule

All views that are theoretically updatable must also be updatable by the system.

Rule 7: High-level insert, update, and delete

The system must support set-at-a-time insert, update, and delete operations. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical data independence

Changes to the physical level (how the data is stored, whether in arrays or linked lists, etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence

Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence

Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence

The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:

  1. When a distributed version of the DBMS is first introduced.

  2. When existing distributed data are redistributed around the system.

Rule 12: The non-subversion rule

If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.

In a strict sense, none of the modern-day databases adhere to all 13 rules, but many commercial databases do come close. An extensive examination of all the rules is beyond the scope of this paper, but if we summarize them, the core tenets of an RDBMS distill down to three very high-level concepts: Structure, Integrity, and Manipulation (Baker, Relational Databases 1992).

Structure

The structure of the relational model refers to Entities, which over the years have become synonymous with a Table in the relational lingo. Codd proposed (in Rule #1) that all data in the database be represented as values in column positions within rows of tables. Basically, only put one piece of data in each field, and do not put the same data in different fields of different tables. This process is called normalization and leads to the creation of entity relationships within the database. Normalization is essentially a data-analysis process used to find the simplest possible data structure for a given collection of information by eliminating the replication of data across tables. However, in order to build relationships between entities, there is a need to uniquely identify each record. This unique identifier for a table row is called a Primary key. Data in primary key columns are used to build relationships between two or more related entities.

It is very important to understand here that the ground rules for relational databases were set at a time when storing and retrieving information from disks was quite expensive, and thus application of normalization became really important. Normalizing data to the most granular level often leads to complicated data models that are extremely difficult to comprehend. Over the years, new data modeling techniques like the star schema and the snowflake schema have been accepted even though they don’t adhere to the principles of normalization.

Integrity

Integrity mainly refers to referential integrity and constraints. Referential integrity requires that a foreign key in one relation either hold values that are present in the referenced primary key column or be null. This means that if one instance of an entity refers to another entity occurrence, then the referenced entity must also exist. The other aspect of integrity is constraints, which can enforce uniqueness on a specific column without making it a primary key. In relational databases, unique constraints are enforced using indexes. Indexes are key to data access in relational databases.

Manipulation

The data manipulation aspect of relational databases can be divided into two major elements:

  • Manipulation Operations

  • Manipulation Language

Data manipulation operations adhere to relational algebra and consist of the set operators, such as intersection, union, minus, etc. The ability to Join by comparing data in two columns in two different tables forms the crux of data manipulation in relational databases. Joins provide the ability to access any data in the normalized entities.

The manipulation language as mentioned in Codd’s Rule #5 allows the user to interact with the data stored across various tables. Users access data through a high-level, declarative database language. Instead of writing an algorithm to obtain desired records one at a time, the application programmer is only required to specify a predicate that identifies the desired record(s) or combination of records.

The Structured Query Language, or SQL, has become the universal standard for this data manipulation language. SQL can be executed interactively or embedded in a programming language. The embedded mode makes SQL the database programming language for a host language, e.g., C, FORTRAN, COBOL, and so forth. SQL has become so popular as a data manipulation language that it is now synonymous with relational databases. It is easy to learn and provides a very intuitive syntax for operating on data.

Database transactions and ACID

A database transaction is a unit of work performed within a database in a coherent and reliable way independent of other transactions. Transactions generally represent any change in the database. The built-in support for transactions has been one of the key reasons for widespread adoption of relational databases. A database transaction by definition needs to be atomic, consistent, isolated, and durable, i.e., ACID compliant.

Atomicity

An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.

Consistent

Consistency guarantees that a transaction can only bring the database from one valid state to another, preserving all defined rules and integrity constraints.

Isolated

Isolation determines when and how the changes made by one transaction become visible to other concurrent transactions and users.

Durable

Durability guarantees that transactions that have been committed will survive permanently.

What does relational mean to different database operators?

We have discussed the foundations of relational databases in the above sections. However, over the past 40 years, the interpretation of the word "relational" has morphed. If we ran a survey across database developers, architects, and even administrators to ask them which relational property is the most important, the answers would be surprisingly different. While most developers would identify SQL, a database architect might choose referential integrity, and the administrator might choose the ACID properties.

Over the years, the use of relational databases for mission-critical systems has also ensured certain operational readiness. The ability to recover to any point in time or to fail-over to a standby system in case of any hardware failure are now considered table stakes in the relational world.

Figure: Common stakeholders involved in an RDBMS

Challenges posed by the advent of the internet era

The internet revolution in the late 90s changed how businesses were conducted. The need for global internet-scale applications changed the fundamental requirements of how applications needed to be designed. The introduction of new application programming paradigms exposed issues with the relational database model, some of which forced more innovation in the database industry.

Impedance mismatch

A mismatch between the object-oriented and the relational world.

New-generation programming languages are primarily object-oriented, where objects are connected via references and build an object hierarchy or graph. But relational databases store data in table rows and columns. To store the object hierarchy or the graph in a relational database, the object or graph has to be sliced and flattened to fit the normalized data format. This results in a complex joining of tables and often leads to performance issues in the databases.

Sub-optimal data models

While relational systems meet most business operations needs (because relationships are the way most businesses are modeled), the relational data model doesn’t fit every data domain. Today, more than 85% of corporate data is generated either on the web as free-form text or by machines and modern applications that need to store data in various formats. Textual unstructured data includes documents, presentations, videos, email, chat, and social media posts, while machine data includes data generated by sensors, electric meters, IoT devices, and so on. Analyzing all of this unstructured data is challenging given massive data volumes, many different file types, and high creation speeds. The structured data model, with its upfront definition of table structure and referential constraints, is not suitable for storing and retrieving such unstructured data.

In the pre-internet era, the design of an application was driven by the data model. A business process was mapped to entity relationships and applications were built within the bounds of that definition. With the advent of the internet era, the focus changed to capturing data in its original free form. Thus, the rigidity of a structured model had to be given up. Applications now require flexibility so that data can be captured in its original form.

Dynamic scalability

Relational databases thrived during the client-server era and clearly were not built to handle web-scale workloads. So, when the application load increased, the only way to keep up with performance needs was to scale up, i.e., move the database to a bigger machine. Many people tried horizontal scaling with shared disks across multiple machines, but that approach could not keep up with the demands of running a database across multiple data centers. The scale up approach left a lot to be desired and led to a high total cost of ownership. The operational overhead to maintain large databases is significant and often requires hiring a team of highly skilled DBAs to maintain the application backend.

Increasing data volumes

Data volumes generated by applications are growing exponentially. In the last decade, the notion of a large database has changed from a few hundred GBs to hundreds of TBs, and even petabyte scale is not uncommon these days. Relational databases were designed to run on single servers, and hence they are limited by design to the system resources available on that node. Relational databases did undergo improvements to cope with this increased demand, such as active-passive architectures and data sharding, but eventually everything still hinged on a single point of failure that left the database vulnerable. For example, Oracle RAC can support multiple database nodes, but it still requires a shared disk subsystem where a single corruption or crash can take down the entire application.

Performance

Application performance expectations have changed significantly over the years. Applications are expected to ingest data at high speed from multiple sources and to keep read latencies under a few milliseconds.

One of the prime requirements of a relational database is to support a declarative language for data manipulation. If we take a step back and assess the reasons behind the popularity of SQL, we see that it is primarily because SQL answers the "What?" rather than the "How?". In other words, the end user doesn’t need to specify the access path (i.e., an index or join order), only the name of the entity (table). While this sounds good in theory, the reality is quite different. For any given application, a team of performance engineers carefully studies access patterns and suggests the indices required to meet agreed SLAs. While end users may not care about the nuts and bolts of indices or materialized views, the operational overhead of maintaining such additional objects cannot be ignored. As data volumes grow and new access patterns emerge, the predictability of performance goes out the window. This has been such a problem with relational databases that vendors like Oracle have introduced several features just to ensure that the execution plan (access path) doesn’t change.

The other side of the problem has been the need to join multiple tables, even for addressing simple queries. As data volumes grow, performing efficient joins across multiple tables poses a challenge to meet business expectations.

Primarily because of these shortcomings, companies are progressively considering alternatives to legacy relational infrastructure.

The NoSQL evolution

NoSQL (or Not Only SQL) encompasses a wide variety of database technologies that have been developed in response to growing data volumes, increasingly frequent data retrieval, and ever-tighter performance and processing needs. NoSQL has become an alternative to traditional relational databases, offering built-in horizontal scalability and high availability without compromising performance.

The growth of the multi-datacenter

One of the factors that has triggered the need for distributed databases is the adoption of multiple data centers by enterprise IT. Multiple data centers were initially introduced purely as a disaster recovery strategy, but enterprises soon realized that in order to stay competitive they could not have machines idle in a data center purely as insurance for an impending failure. Relational databases provide ways to keep a copy of the database in-sync across data centers, but they do not support active-active setups, i.e., writes across multiple data centers. NoSQL databases have been built with the premise of being distributed across multiple data centers.

The key benefits to using NoSQL to process Big Data are as follows:

Scaling out

As discussed earlier, relational databases relied on scaling up — purchasing larger, more expensive servers — as workload increased, but NoSQL databases can scale out transparently across multiple data centers. NoSQL databases are designed to work on low-cost commodity hardware.

Superior data handling

NoSQL distributed databases allow data to be spread across hundreds of servers with little reduction in performance. Features like data replication, automatic repair, relaxed data distribution, and simpler data models make database management much easier in NoSQL databases.

Lower cost of ownership

NoSQL databases usually use clusters of inexpensive commodity servers to manage the extremely large data and transaction volumes. Thus, both upfront cost and management cost for NoSQL databases are much less than with RDBMS.

Flexible data model

An upfront data model definition is not a requirement in a NoSQL database. Thus, it fosters rapid application development and deployment. The data created or the underlying data model can be restructured at any time, without any application disruption. This offers immense application and business flexibility.

Application performance

The flexibility in data models and the ability to store data in multiple formats (key-value, document, column-family, and graph) also help improve application performance. These advantages have led to widespread adoption of NoSQL databases in the last decade.

But what about data consistency and relations?

Before getting into the shortcomings of the first-generation NoSQL databases, it is important to understand the CAP theorem, which describes the trade-offs involved in distributing data across a network. The CAP theorem names three areas of concern and states that, when a network partition occurs, a system can fully guarantee only two of the three:

  • Consistency - All database users see the same data even with concurrent updates

  • Availability - All database users are able to access some version of the data

  • Partition Tolerance - The entire database can be distributed across multiple networks (read: data centers)

How consistency concerns overlap

In distributed NoSQL databases, one of the biggest challenges is ensuring that data remains synchronized across data centers or regions. If it is very important for application users to have a consistent view of the data irrespective of the data center they are connected to, any write on one of the database servers must be replicated to all the data centers without much delay. If it is a critical business requirement that each user see all changes as they happen, the application must wait until a majority of the database servers have acknowledged a specific write or the version of data being read. In this case, consistency of data is the driver, and for a long time this was a core use case for relational databases.

However, if the availability of the application is more important than consistency, database clients can write data to any node of the cluster without waiting for an agreement from another database server. The database must provide a way to reconcile the changes and the data is "eventually consistent".

A majority of the first generation NoSQL databases chose availability over consistency and are "eventually consistent".

While this strategy of eventual consistency (AP systems) may work for some applications, most mission-critical applications will choose consistency over availability. The requirement to see a consistent version of data is paramount to a number of financial, retail, and inventory-based applications. Hence, mission-critical applications have been resistant to adopting NoSQL databases.

"When we started at Twitter, databases were bad. When we left, they were still bad."
— Evan Weaver
CEO of Fauna

Anecdotal evidence of the inefficiencies of the first generation NoSQL systems is available first hand from Evan Weaver (now CEO at Fauna), employee number 15 at Twitter and responsible for running its software infrastructure team. Twitter, one of the early adopters of Cassandra, couldn’t rely on existing databases to scale the site. This resulted in frequent outages and the famous "fail whale" page.

Apart from the problem of eventual consistency, the following shortcomings have also been a deterrent to the adoption of NoSQL databases.

Operational readiness

A lot of the NoSQL databases are extremely difficult to manage and maintain. One of the possible reasons is that most of these databases are an outgrowth of an open source project. Open source committers are great at adding new functionality, but they seldom care about enterprise readiness. Large companies can throw developers at these issues and work around the immaturities, but smaller digital businesses cannot afford to do that. They need a solution that just works out of the box.

Typical problems are often very basic requirements for the operators. There are no easy ways to take incremental backups or straightforward ways to do point-in-time recoveries. Most of these first generation NoSQL databases simply lacked the enterprise readiness required to run the database in a corporate data center.

Security

Security is a major concern for enterprise IT infrastructure. Security in many NoSQL databases is weak: authentication and encryption, both in motion and at rest, are often poorly implemented.

Lack of standardization

The design and query languages vary widely among various NoSQL database products. Depending on the type of NoSQL database chosen, the learning curve can be very steep.

Joins

NoSQL databases do not require any predefined data models or referential integrity. As a result, there is no predefined structure that can be used to join two different collections of data.

With these shortcomings in place, it is clear that modern-day applications need a database that has the best of breed features from both relational and NoSQL database worlds.

Need of the day: A modern database

The advent of NoSQL databases in the last decade has addressed the issue of scaling out at the expense of relational integrity, but even then the primary focus has been on how data is stored and distributed. Some of these NoSQL databases are strictly key-value stores, some are document based, and a few are wide-column stores. This focus on how to store the data has often meant that the need for consistency, enterprise readiness, and ease of use have been overlooked. We have many NoSQL database solutions available today as point solutions to niche use cases, but none of them have evolved to a platform that provides the enterprise-grade readiness that we usually identify with RDBMSs. Today’s modern applications want the global scale and flexibility of NoSQL but, at the same time, they want the consistency, security, and reliability of relational systems.

FaunaDB provides the best of both the relational and NoSQL worlds. Here is a summary of FaunaDB features that let you create distributed applications at a global scale without compromising the integrity of data:


Strong ACID Transactions

FaunaDB guarantees 100% ACID transactions across a distributed database cluster.

Native Multi-tenancy

FaunaDB has built-in quality of service (QoS) capabilities that allow multiple tenants to co-exist without any noisy neighbor issues. FaunaDB is great for SaaS applications.
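
One way multi-tenancy is typically exercised is through child databases, each acting as an isolated tenant with its own access keys. A minimal sketch (the tenant name below is illustrative):

# Create an isolated child database for a tenant
CreateDatabase({ name: "tenant_acme" })

# Issue an access key scoped to that tenant only
CreateKey({ database: Database("tenant_acme"), role: "server" })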

Scalability

FaunaDB can scale from a single node install to an enterprise-grade setup across multiple data centers with no downtime. The management of the database is completely transparent to the application.

High Availability

FaunaDB’s peer-to-peer/masterless architecture provides a seamless response to any node failure or data center outage.

Operational Simplicity

Operational simplicity is at the core of FaunaDB. Unlike incumbent NoSQL solutions, FaunaDB offers rich cluster management APIs that make cluster setup a breeze.

Security

FaunaDB provides identity management, row-level access control, transaction auditing, and encryption in motion, with encryption at rest coming soon.
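
As a sketch of the access control model, an attribute-based role can limit what a key or identity may do. The role name and privileges below are illustrative and assume the emp collection used in the examples later in this guide:

# A role granting read-only access to the emp collection
CreateRole({
  name: "emp_reader",
  privileges: [
    { resource: Collection("emp"), actions: { read: true } }
  ]
})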

Document Relational Features

FaunaDB supports various application programming languages with built-in drivers. A single FQL query can pull data across documents, relational graphs, and temporal datasets.

The relational checklist in FaunaDB


Structure

FaunaDB provides the flexibility of NoSQL databases when it comes to schema design. At the same time, through indexes and the (soon to be added) collection validators, FaunaDB can enforce structure and defined access patterns on the data, much like a relational database. FaunaDB stores data as documents (rows) in the database.

Integrity

FaunaDB supports the creation of Primary keys and will soon be able to enforce Foreign keys.
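
In the meantime, a unique index can approximate a primary key constraint by rejecting documents that duplicate an existing value. The index below on "empno" is an illustrative sketch, assuming the emp collection from the examples later in this guide:

# Enforce uniqueness of employee numbers, similar to a primary key
CreateIndex({
  name: "emp_by_empno",
  source: Collection("emp"),
  terms: [{ field: ["data", "empno"] }],
  unique: true
})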

Joins

FaunaDB supports equi-joins and other set operations like union, intersect, minus, etc. across collections. Support for outer joins is on the roadmap.
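
As a sketch that reuses the emp_by_deptno index defined in the examples below, the refs of employees in department 10 or department 20 can be combined with Union; Intersection and Difference follow the same pattern:

# Employees belonging to department 10 or department 20
Paginate(
  Union(
    Match(Index("emp_by_deptno"), 10),
    Match(Index("emp_by_deptno"), 20)
  )
)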

Indexes

FaunaDB supports the creation of various types of Indexes. Indexes are used to define the access path to data.
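
For instance, an index can also return values, optionally sorted, so that a query can be answered from the index alone. The following index listing employees by salary, highest first, is an illustrative sketch that assumes the emp collection from the examples below:

# Covering index that returns [sal, ename] pairs ordered by salary, descending
CreateIndex({
  name: "emp_by_sal_desc",
  source: Collection("emp"),
  values: [
    { field: ["data", "sal"], reverse: true },
    { field: ["data", "ename"] }
  ]
})

# Read the ordered pairs directly from the index
Paginate(Match(Index("emp_by_sal_desc")))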

ACID

FaunaDB fully supports transactions and provides strong consistency.
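
Every FQL query executes as a single transaction, and multiple writes can be grouped with Do so that they commit or fail together. A minimal sketch (the employee data below is illustrative):

# Both inserts commit atomically, or neither does
Do(
  Create(Collection("emp"), { data: { empno: 8001, ename: "DOE", job: "CLERK", deptno: 40 } }),
  Create(Collection("emp"), { data: { empno: 8002, ename: "ROE", job: "CLERK", deptno: 40 } })
)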

Query Language

Fauna Query Language (FQL) allows for complex, precise manipulation and retrieval of data stored within FaunaDB.
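
As a small illustration, the following sketch reuses the emp_by_deptno index from the examples below to return just the name and salary of each employee in department 30:

# Project only the ename and sal fields for department 30
Map(
  Paginate(Match(Index("emp_by_deptno"), 30)),
  Lambda("ref",
    Let({ doc: Get(Var("ref")) },
      { name: Select(["data", "ename"], Var("doc")),
        salary: Select(["data", "sal"], Var("doc")) }
    )
  )
)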

Enterprise Readiness

FaunaDB not only solves the problem of scale, but all the problems adjacent to scale like security, compliance, global distribution, consistency, multi-tenancy, and more.

Examples

In this section, we will show how indexes can be created in FaunaDB and how we can leverage them to join two different collections. In order to expand on the example, we will use the data and structure of the EMP and DEPT tables usually found in the SCOTT schema of almost every Oracle database.

  1. Create the collection

    Fauna doesn’t enforce a predefined structure on collections, as one would normally associate with tables in a relational database. The structure is enforced in the documents created in the collection.

    The following commands can be executed using Fauna Shell. Refer to the FaunaDB documentation for setting up Fauna Shell on your laptop or workstation.

    # Create the dept collection
    CreateCollection({name: "dept"})
    
    # Create the Emp collection
    CreateCollection({name: "emp"})
  2. Create indices

    Access paths to data are defined via indexes within Fauna. In this example, two such indexes are created to join the two collections on the department number ("deptno") column. Indexes can have both terms and values, and the values of one index are joined to the terms of another index. So, to run a query that returns all employees for a specific department, we need an index on the Dept collection with the term "dname" (department name) and the value "deptno" (department number). The index on the Emp collection should use "deptno" as the term and the Refs (locations of the data, loosely analogous to ROWIDs in relational databases) as the values. Both index definitions are shown below.

    CreateIndex({
      name: "emp_by_deptno",
      source: Collection("emp"),
      terms:[{ field: ["data", "deptno"] }]
    })
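
    The matching index on the dept collection, reproduced here from the appendix, exposes the "deptno" value keyed by department name:

    CreateIndex({
      name: "dept_by_name",
      source: Collection("dept"),
      terms: [{ field: ["data", "dname"] }],
      values: [{ field: ["data", "deptno"] }]
    })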
  3. Create documents

    Use the following commands to create documents in the Emp and Dept collections. Only a single sample document is shown for each collection. The entire script is available in the appendix of this document.

    # Create the dept document
    Create(Collection("dept"), { data: {
      "deptno": 10,
      "dname": "ACCOUNTING",
      "loc": "NEW YORK"
    }});
    # Create the emp document
    Create(Collection("emp"), { data: {
      "empno": 7839,
      "ename": "KING",
      "job": "PRESIDENT",
      "mgr": null,
      "hiredate": "1981-11-17",
      "sal": 5000,
      "comm": null,
      "deptno": 10
    }});
  4. Join the two and return the results

    In this example, let us assume that we want to find all employees in the SALES department.

    # Get data using Refs
    Map(
      Paginate(
        Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno"))
      ),
      Lambda("X", Get(Var("X")))
    )

    The above query returns all records in the Emp collection matching the SALES department. The results below show, first, the refs returned by the inner Paginate(Join(...)) expression and, second, the full documents returned once those refs are passed through Map with Get.

    Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno")))
    { data:
       [ Ref(Collection("emp"), "212658237367910913"),
         Ref(Collection("emp"), "212658785828733440"),
         Ref(Collection("emp"), "212659256472633858"),
         Ref(Collection("emp"), "212659262752555520"),
         Ref(Collection("emp"), "212659274083467777") ] }
    Map(
      Paginate(Join(Match(Index("dept_by_name"), "SALES"), Index("emp_by_deptno"))),
      Lambda("X", Get(Var("X")))
    )
    { data:
       [ { ref: Ref(Collection("emp"), "212658237367910913"),
           ts: 1539065549080650,
           data:
            { empno: 7698,
              ename: 'BLAKE',
              job: 'MANAGER',
              mgr: 7839,
              hiredate: '1981-5-1',
              sal: 2850,
              deptno: 30 } },
         { ref: Ref(Collection("emp"), "212658785828733440"),
           ts: 1539066072138690,
           data:
            { empno: 7499,
              ename: 'ALLEN',
              job: 'SALESMAN',
              mgr: 7698,
              hiredate: '1981-02-22',
              sal: 1600,
              comm: 300,
              deptno: 30 } },
         { ref: Ref(Collection("emp"), "212659256472633858"),
           ts: 1539066520973573,
           data:
            { empno: 7521,
              ename: 'WARD',
              job: 'SALESMAN',
              mgr: 7698,
              hiredate: '1981-02-22',
              sal: 1250,
              comm: 500,
              deptno: 30 } },
         { ref: Ref(Collection("emp"), "212659262752555520"),
           ts: 1539066526966442,
           data:
            { empno: 7844,
              ename: 'TURNER',
              job: 'SALESMAN',
              mgr: 7698,
              hiredate: '1981-09-08',
              sal: 1500,
              comm: 0,
              deptno: 30 } },
         { ref: Ref(Collection("emp"), "212659274083467777"),
           ts: 1539066537773781,
           data:
            { empno: 7900,
              ename: 'JAMES',
              job: 'CLERK',
              mgr: 7698,
              hiredate: '1987-12-3',
              sal: 950,
              deptno: 30 } } ] }

Conclusion

We have inspected the continuous evolution of database management systems over the last 50 years and how disruption during the internet era has led to a shift from the traditional relational model to the new generation of NoSQL databases. While ACID guarantees were taken for granted in traditional relational databases, the first generation of NoSQL databases either had no support or supported a very restrictive flavor of transactions. As a result, much of the complexity of ensuring the integrity and consistency of data had to be pushed into the application layers. This made developing applications on the first generation of NoSQL databases often cumbersome and eventually difficult to maintain. The lack of complete transaction support coupled with the lack of enterprise readiness has prevented the widespread adoption of NoSQL databases in mission-critical transactional applications that require global scale.

Unlike those first-generation NoSQL databases, FaunaDB supports fully distributed ACID transactions with serializable isolation across geographically distributed replicas. FaunaDB’s built-in relational-like capabilities provide the operational readiness and robustness that are usually associated with relational databases.

FaunaDB is the future of data-driven applications that require the best of both relational and NoSQL worlds.

Appendix

/* Insert into Department collection */
CreateCollection({name: "dept"})

Create(Collection("dept"), { data: {
  "deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"
}})

Create(Collection("dept"), { data: {
  "deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"
}})

Create(Collection("dept"), { data: {
  "deptno": 30, "dname": "SALES", "loc": "CHICAGO"
}})

Create(Collection("dept"), { data: {
  "deptno": 40, "dname": "OPERATIONS", "loc": "BOSTON"
}})

/* Insert into Employee collection */
CreateCollection({name: "emp"})

Create(Collection("emp"), { data: {
  "empno": 7839, "ename": "KING", "job": "PRESIDENT",
  "mgr": null, "hiredate": "1981-11-17", "sal": 5000,
  "comm": null, "deptno": 10
}})

Create(Collection("emp"), { data: {
  "empno": 7698, "ename": "BLAKE", "job": "MANAGER",
  "mgr": 7839, "hiredate": "1981-5-1", "sal": 2850,
  "comm": null, "deptno": 30
}})

Create(Collection("emp"), { data: {
  "empno": 7782, "ename": "CLARK", "job": "MANAGER",
  "mgr": 7839, "hiredate": "1981-6-9", "sal": 2450,
  "comm": null, "deptno": 10
}})

Create(Collection("emp"), { data: {
  "empno": 7566, "ename": "JONES", "job": "MANAGER",
  "mgr": 7839, "hiredate": "1981-4-2", "sal": 2975,
  "comm": null, "deptno": 20
}})

Create(Collection("emp"), { data: {
  "empno": 7788, "ename": "SCOTT", "job": "ANALYST",
  "mgr": 7566, "hiredate": "1987-07-13", "sal": 3000,
  "comm": null, "deptno": 20
}})

Create(Collection("emp"), { data: {
  "empno": 7902, "ename": "FORD", "job": "ANALYST",
  "mgr": 7566, "hiredate": "1981-12-3", "sal": 3000,
  "comm": null, "deptno": 20
}})

Create(Collection("emp"), { data: {
  "empno": 7369, "ename": "SMITH", "job": "CLERK",
  "mgr": 7902, "hiredate": "1980-12-17", "sal": 800,
  "comm": null, "deptno": 20
}})

Create(Collection("emp"), { data: {
  "empno": 7499, "ename": "ALLEN", "job": "SALESMAN",
  "mgr": 7698, "hiredate": "1981-02-22", "sal": 1600,
  "comm": 300, "deptno": 30
}})

Create(Collection("emp"), { data: {
  "empno": 7521, "ename": "WARD", "job": "SALESMAN",
  "mgr": 7698, "hiredate": "1981-02-22", "sal": 1250,
  "comm": 500, "deptno": 30
}})

Create(Collection("emp"), { data: {
  "empno": 7844, "ename": "TURNER", "job": "SALESMAN",
  "mgr": 7698, "hiredate": "1981-09-08", "sal": 1500,
  "comm": 0, "deptno": 30
}})

Create(Collection("emp"), { data: {
  "empno": 7876, "ename": "ADAMS", "job": "CLERK",
  "mgr": 7788, "hiredate": "1987-07-13", "sal": 1100,
  "comm": null, "deptno": 20
}})

Create(Collection("emp"), { data: {
  "empno": 7900, "ename": "JAMES", "job": "CLERK",
  "mgr": 7698, "hiredate": "1987-12-3", "sal": 950,
  "comm": null, "deptno": 30
}})

/* Here we create two indexes */
CreateIndex({
  name: "dept_by_name",
  source: Collection("dept"),
  terms: [{ field: ["data",  "dname"] }],
  values: [{ field: ["data", "deptno"] }]
})

CreateIndex({
  name: "emp_by_deptno",
  source: Collection("emp"),
  terms: [{ field: ["data", "deptno"] }]
})

/* Find the refs of all EMP records for the department SALES */
Paginate(
  Join(
    Match(Index("dept_by_name"), "SALES"),
    Index("emp_by_deptno")
  )
)

/* Get the full EMP documents for the department SALES */
Map(
  Paginate(
    Join(
      Match(Index("dept_by_name"), "SALES"),
      Index("emp_by_deptno")
    )
  ),
  Lambda("X", Get(Var("X")))
)
