FaunaDB Data Manager

The FaunaDB Data Manager (FDM) is a terminal application that performs migration, export, import, backup, and restore tasks for FaunaDB databases. For example:

  • Copying a FaunaDB database, including its documents, collections, indexes, functions, and roles, at any point in time, to another FaunaDB database.

  • Importing and updating documents from:

    • JSON or CSV files, in the local filesystem or in an AWS S3 bucket,

    • any JDBC-compliant SQL database, such as MySQL or PostgreSQL,

    • another FaunaDB database.

  • Exporting and backing up documents to a local filesystem as JSON files.

  • Simple ETL tasks, such as:

    • changing a field/column name and/or data type,

    • specifying a "primary key" field (to use as a reference),

    • specifying the import time,

    • ignoring fields,

    • applying a simple merge, replace, or ignore policy for specified schema types.

  • Copying an existing FaunaDB database into a new database, to establish initial schema and content, for testing or multi-tenant scenarios.
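To make the copy task concrete, here is a minimal Python sketch that scripts a database-to-database copy by invoking the FaunaDB Data Manager as a subprocess. It assumes that the fdm launcher is on your PATH and uses the -source and -dest parameters described in the Parameters topic; the key values are placeholders:

    import subprocess

    # Placeholder admin keys; replace with secrets for your own databases.
    SOURCE_KEY = "<source-admin-key>"
    DEST_KEY = "<destination-admin-key>"

    # Copy the source database, including its documents, collections,
    # indexes, functions, and roles, into the destination database.
    result = subprocess.run(
        ["fdm", "-source", f"key={SOURCE_KEY}", "-dest", f"key={DEST_KEY}"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)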

For more information on the FaunaDB Data Manager, see the following topics:

  • Install the FaunaDB Data Manager: describes the requirements and installation procedure.

  • Parameters: describes all of the parameters that specify the source, destination, and formatting of document fields.

  • Configuration: describes the format of the FaunaDB Data Manager configuration file, fdm.props, which can be used to record settings for multiple FaunaDB Data Manager invocations.

  • Format transformations: describes the syntax of format transformations, which are used to rename fields, ignore fields, and/or change field types during processing.

  • Examples: presents a number of import, export, and document copying goals and how to invoke the FaunaDB Data Manager to achieve them.

Limitations

The FaunaDB Data Manager is currently in preview mode: we’d like you to try it, but you should not use it on production databases.

The current release of the FaunaDB Data Manager has the following limitations:

  • Document history is not processed. Only the most recent version of each document is exported or copied.

  • Child databases are not processed. To process a child database, run the FaunaDB Data Manager with an admin key for that child database.

  • Keys and tokens are not copied. Since the secret for a key or token is only provided on initial creation, it is not possible to recreate existing keys and tokens. You would need to create new keys and tokens in the target database.

  • GraphQL schema metadata is not fully processed. This means that if you import an exported database, or copy one FaunaDB database to another, you need to import an appropriate GraphQL schema into the target database in order to run GraphQL queries.

  • Schema documents have an upper limit of 10,000 entries per type. If a source database contains more than 10,000 collections, indexes, functions, or roles, only the first 10,000 of each type are processed and the remainder are ignored.

  • When exporting a FaunaDB database to the local filesystem, only collections and their associated documents are exported. The schema documents describing collections, indexes, functions, and roles are written to the file fauna_schema. Currently, that schema file cannot be used during import.

  • FaunaDB imposes collection-naming rules: specifically, a collection cannot be named events, set, self, documents, or _. The FaunaDB Data Manager cannot currently rename collections during processing, so if your CSV, JSON, or JDBC sources have file or table names that use these reserved names, processing terminates with an error.

  • While the FaunaDB Data Manager works on Windows, only limited testing has been done on that platform, so you may encounter unexpected platform-specific issues. We plan to expand Windows testing in future releases.

Over time, we hope to remove many of these limitations and add new features. We love your feedback and want to hear whether the FaunaDB Data Manager is useful to you, whether you encounter problems, and especially whether you have suggestions for improvement! Let us know in the #fdm channel in our Community Slack.

Export file format

When the FaunaDB Data Manager creates a backup of a FaunaDB database to a filesystem, it creates one file per collection in the source database, and each file is named after its collection.

Each exported file contains one JSON document per line, representing all of the documents in the associated collection.

One additional file is created, called fauna_schema. It too contains one JSON document per line, and these documents record the collections, indexes, functions, and roles within the source database.
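As an illustration, the following Python sketch reads one of these line-delimited JSON files; the file name orders is a placeholder for whichever collection you exported:

    import json

    # Each exported file is named after its collection and contains
    # one JSON document per line.
    with open("orders", encoding="utf-8") as export_file:
        for line in export_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines, if any
            document = json.loads(line)
            print(document)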

Processing synopsis

The FaunaDB Data Manager operates using multiple threads to achieve the best throughput possible.

The loader thread evaluates the source for documents, and populates a read queue that streams in documents to be processed.

The main processing thread fetches a document from the read queue, applies any Format transformations that may be enabled, and sends the document to a write queue.

The write thread fetches documents from the write queue as quickly as it can, and sends those documents to the specified destination.
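The following Python sketch models this three-stage pipeline in miniature. It is a simplified illustration of the design described above, not the FaunaDB Data Manager's actual implementation; the in-memory source, destination, queue sizes, and transformation are all placeholders:

    import queue
    import threading

    SENTINEL = None  # marks end-of-stream between stages

    read_queue = queue.Queue(maxsize=100)
    write_queue = queue.Queue(maxsize=100)

    source = [{"Name": "a"}, {"Name": "b"}]  # stand-in for a real source
    destination = []                         # stand-in for a real destination

    def loader():
        # Evaluate the source for documents and populate the read queue.
        for doc in source:
            read_queue.put(doc)
        read_queue.put(SENTINEL)

    def processor():
        # Fetch documents, apply a (trivial) transformation, and queue writes.
        while True:
            doc = read_queue.get()
            if doc is SENTINEL:
                write_queue.put(SENTINEL)
                break
            write_queue.put({k.lower(): v for k, v in doc.items()})

    def writer():
        # Drain the write queue and send documents to the destination.
        while True:
            doc = write_queue.get()
            if doc is SENTINEL:
                break
            destination.append(doc)

    threads = [threading.Thread(target=fn) for fn in (loader, processor, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(destination)  # [{'name': 'a'}, {'name': 'b'}]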

When the destination is a FaunaDB database, documents are created using the Insert function.
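For reference, the following sketch shows what such a write looks like with FQL's Insert function, using the faunadb Python driver. The collection name, document ID, timestamp, data, and secret are placeholders:

    from faunadb import query as q
    from faunadb.client import FaunaClient

    client = FaunaClient(secret="<destination-admin-key>")  # placeholder secret

    # Insert(ref, ts, action, params) writes a document version at a specific
    # reference and timestamp instead of overwriting an existing document.
    client.query(
        q.insert(
            q.ref(q.collection("orders"), "1001"),  # placeholder reference
            1577836800000000,                       # placeholder timestamp (microseconds)
            "create",
            {"data": {"status": "shipped"}},        # placeholder document data
        )
    )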

Source documents that contain references result in practically identical documents in the destination (history is not copied). When source documents do not contain references, or there is no suitable ID field that can serve as a reference (via Format transformations), destination documents receive generated references. In the latter case, repeated FaunaDB Data Manager runs create duplicate copies of the source documents in the destination.

Similarly, when a source document contains a timestamp, or contains a field that can be used as a timestamp (via Format transformations), the destination document uses the source document's timestamp. For source documents without timestamps, the destination document receives the current time as its timestamp during processing.

If a source document has the same reference as an existing document in the destination database, but a different timestamp, a new version of that document is created in the destination. This prevents overwriting any existing documents in the destination database.
