
Concepts

In one sentence: Datawaves is a real-time data integration platform.
With Datawaves, you define pipelines in SQL that process streams of blockchain data in real time, and connections that integrate those streams into your data infrastructure. If you don't want to host a backend, Datawaves can even integrate a stream into a managed database and let you access the data via GraphQL.
Datawaves is a cloud service, meaning you have no archive nodes, clusters, or services to manage. We handle all the complex infrastructure of ingesting, decoding, transforming, and integrating blockchain data for you.
To ease the friction of handling tedious, fragmented blockchain data, Datawaves provides higher-level abstractions: it transforms blockchain activity into structured, semantic data streams from which you can easily derive insights.
(Figure: Real-time Data Integration Platform)

Streams

A stream carries records within Datawaves and can be read from or written to. It is conceptually similar to a "topic" in Kafka. You can also think of a Datawaves stream as a database table: a stream's records are like a table's rows, and its fields are like a table's columns.
Developers on the Datawaves platform can use the Web3 data streams maintained by our team. Datawaves also allows you to derive a stream from one or more existing streams using pipelines. Once a stream is defined, it can be used as an input or output by any number of pipelines or connections.

Schema

Each stream has a schema that defines the data types of its records. Datawaves supports a standard list of data types for schemas.
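For illustration, a stream of decoded ERC-20 token transfers might have a schema like the one sketched below. This is only an illustrative sketch: the stream name, field names, data types, and the DDL-style syntax are assumptions, not the definition of an actual Datawaves stream.

    -- Illustrative sketch only: the stream name, fields, and types below are
    -- hypothetical and do not describe an actual Datawaves stream.
    CREATE STREAM erc20_transfers (
      block_number   BIGINT,           -- block that contains the transfer
      block_time     TIMESTAMP,        -- block timestamp
      tx_hash        VARCHAR,          -- transaction hash
      token_address  VARCHAR,          -- ERC-20 contract address
      from_address   VARCHAR,          -- sender address
      to_address     VARCHAR,          -- recipient address
      amount         DECIMAL(38, 0)    -- raw transfer amount
    );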

Pipelines

A pipeline is a streaming SQL query that processes data from one or more streams and writes the results to an output stream.
Datawaves uses SQL to process data, which should feel familiar to anyone who has used a relational database system. However, certain operations (joins and aggregations) must include a window.
Datawaves ensures that your pipeline is fault-tolerant and delivers data exactly once by handling pipeline state management, consistent checkpointing, and recovery behind the scenes.
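As a sketch of what a pipeline could look like, the streaming SQL below aggregates the hypothetical erc20_transfers stream from the schema example into hourly per-token totals. The TUMBLE windowing functions are borrowed from common streaming SQL dialects and are an assumption, not necessarily the exact Datawaves syntax.

    -- Hypothetical pipeline: a windowed aggregation over the erc20_transfers
    -- stream. TUMBLE/TUMBLE_START follow common streaming SQL dialects and may
    -- differ from the exact Datawaves syntax.
    INSERT INTO hourly_token_transfers
    SELECT
      token_address,
      TUMBLE_START(block_time, INTERVAL '1' HOUR) AS window_start,
      COUNT(*)                                    AS transfer_count,
      SUM(amount)                                 AS total_amount
    FROM erc20_transfers
    GROUP BY
      token_address,
      TUMBLE(block_time, INTERVAL '1' HOUR);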

Connections

A connection allows data to flow between a Datawaves stream and an external system. Datawaves provides a wide range of connectors for messaging systems, object stores, operational databases, data warehouses, and data lakes, making it easy to connect to the rest of your stack.
Connections carry the technology-specific configuration that allows Datawaves to communicate with these systems: typically hostnames, ports, authentication information, and other settings.
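For illustration, a connection to a downstream database could be declared with its technology-specific settings as in the sketch below. The CREATE CONNECTION statement and every option name shown are assumptions modeled on common streaming platforms, not Datawaves' actual configuration keys.

    -- Hypothetical connection to a PostgreSQL database; the statement form and
    -- option names are illustrative assumptions, not Datawaves' actual syntax.
    CREATE CONNECTION postgres_warehouse WITH (
      'type'     = 'postgres',
      'hostname' = 'db.example.com',
      'port'     = '5432',
      'database' = 'analytics',
      'username' = 'datawaves',
      'password' = '${POSTGRES_PASSWORD}'  -- supplied via a secret, not hard-coded
    );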

Sync Modes

A key feature of an efficient connection is the ability to sync incrementally. The alternative is a full sync, which re-reads all historical data.
  • Full Sync: You must run a full sync to begin using your connector. The first historical sync that Datawaves performs for a connector is called the initial sync. Full syncs are necessary to capture all data, and occasionally to fix corrupted records or other data integrity issues (we call this a re-sync).
  • Incremental Sync: After a successful initial sync, the connector runs in incremental sync mode. In this mode, only data that has been modified or added, also known as incremental changes, is extracted, processed, and loaded on a schedule, as sketched below.
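Conceptually, the two modes differ only in the predicate applied when extracting records. The sketch below is illustrative: the cursor column (block_number) and the query shapes are assumptions about how the two modes could be expressed, not Datawaves' internal implementation.

    -- Full sync (conceptual): read every record from the source.
    SELECT * FROM erc20_transfers;

    -- Incremental sync (conceptual): read only records added or modified since
    -- the last successful sync, tracked by a cursor such as block_number.
    SELECT *
    FROM erc20_transfers
    WHERE block_number > :last_synced_block;  -- cursor saved from the previous run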