Motivation
AWS Timestream provides a fully managed time-series database designed for a range of workloads, from low-latency queries to large-scale data ingestion.
One of its distinguishing features is that it doesn’t allow you to define a table schema up front; instead, the schema is inferred automatically from the first records written to the table. While this might not seem like a major concern at first, it can lead to runtime or even deployment errors until the table is populated with data.
For instance, consider a table:
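A minimal CDK sketch of such a table (in TypeScript; the `my_db` and `my_metrics` names are illustrative, and L1 constructs are used because `aws-timestream` only ships CloudFormation-level resources):

```typescript
import * as cdk from "aws-cdk-lib";
import * as timestream from "aws-cdk-lib/aws-timestream";
import { Construct } from "constructs";

export class MetricsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const database = new timestream.CfnDatabase(this, "Database", {
      databaseName: "my_db",
    });

    const table = new timestream.CfnTable(this, "Table", {
      databaseName: "my_db",
      tableName: "my_metrics",
    });
    // The database must exist before the table is created
    table.addDependency(database);
  }
}
```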
If the table has just been created and remains empty, a query such as
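(a hypothetical query; the column and measure names are placeholders matching the sample table above)

```sql
SELECT region, AVG(measure_value::double) AS avg_cpu
FROM "my_db"."my_metrics"
WHERE measure_name = 'cpu_usage'
GROUP BY region
```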
will fail to run. Because Timestream has not yet inferred a schema, it cannot resolve the referenced columns and returns a validation error rather than an empty result set.
Since software engineers typically can’t wait for production data to arrive before starting work on query functionality, this can present several challenges:
- When developing a query that is executed programmatically (e.g., from a Lambda function) and deploying it to a production environment, the function may start generating errors, leading to noisy monitoring and failure alerts. Alternatively, engineers would need to disable monitoring for this specific functionality and remember to re-enable it later, adding extra complexity to the process.
- When developing a query that runs as a Timestream Scheduled Query, the situation becomes more challenging. The deployment of a CDK/CloudFormation stack may fail entirely because a scheduled query requires the table schema to be defined in advance. This creates a vicious cycle: the stack, which includes both the table and the scheduled query, cannot be deployed until the table is populated, yet the table cannot be populated before the stack is deployed.
While manually splitting the deployment into two phases (first the table, then the query) can provide a temporary workaround, this approach is cumbersome, especially when it has to be repeated multiple times. It also requires maintaining runbooks that explain how to populate the newly created table, and the manual procedure is inherently error-prone.
Solution
Fortunately, this issue can be resolved using built-in CDK functionality. The idea is to have CDK invoke the Timestream Write API and insert a sample record during the deployment process; once that record is written, Timestream infers the schema, and the deployment can proceed with the query.
To achieve this, you’ll use the custom resource feature (the sketches below assume the `AwsCustomResource` construct from `aws-cdk-lib/custom-resources`, which makes a single SDK call during deployment).
Let’s begin with defining a policy to perform the Timestream Write API call:
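A sketch of the required statements, assuming the table from the earlier example (note that Timestream clients call `DescribeEndpoints` before any write, and that action only supports the wildcard resource):

```typescript
import * as iam from "aws-cdk-lib/aws-iam";

// Permission to write the sample record into this specific table
const writeRecordsStatement = new iam.PolicyStatement({
  actions: ["timestream:WriteRecords"],
  resources: [table.attrArn],
});

// Endpoint discovery, required by all Timestream SDK clients
const describeEndpointsStatement = new iam.PolicyStatement({
  actions: ["timestream:DescribeEndpoints"],
  resources: ["*"],
});
```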
Next, let’s create a Timestream Write request payload:
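For example (the field names follow the Timestream WriteRecords API; the dimension and measure values are placeholders):

```typescript
const payload = {
  DatabaseName: "my_db",
  TableName: "my_metrics",
  Records: [
    {
      Dimensions: [{ Name: "region", Value: "eu-west-1" }],
      MeasureName: "cpu_usage",
      MeasureValue: "0",
      MeasureValueType: "DOUBLE",
      // Evaluated at synth time, which is fine for a one-off seed record
      Time: Date.now().toString(),
      TimeUnit: "MILLISECONDS",
    },
  ],
};
```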
Remember to tailor the payload to your needs: the schema Timestream infers comes from this record, so its dimensions and measures should match what your real producers will write.
The final and most crucial step is to make the call to the Timestream API:
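A sketch using `AwsCustomResource` (the `service`/`action` pair follows the AWS SDK naming the construct expects; `PopulateTable` is an arbitrary construct ID):

```typescript
import * as cr from "aws-cdk-lib/custom-resources";

const populateTable = new cr.AwsCustomResource(this, "PopulateTable", {
  onCreate: {
    service: "TimestreamWrite",
    action: "writeRecords",
    parameters: payload,
    physicalResourceId: cr.PhysicalResourceId.of("PopulateTable"),
  },
  policy: cr.AwsCustomResourcePolicy.fromStatements([
    writeRecordsStatement,
    describeEndpointsStatement,
  ]),
});

// The table must exist before the seed record can be written
populateTable.node.addDependency(table);
```

If the stack also defines a Timestream Scheduled Query, give it an explicit dependency on the custom resource (e.g. `scheduledQuery.node.addDependency(populateTable)`) so CloudFormation only creates it once the schema exists.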
That’s all! With this simple approach, the table is populated with a sample record at deployment time, so queries no longer fail for lack of a schema during or after deployment.
Further steps
If you have multiple tables that need to be populated with data, it would be beneficial to extract the previously mentioned functionality into its own construct for improved reusability.
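One possible shape for such a construct, bundling the pieces above (a sketch; all names are illustrative):

```typescript
import * as iam from "aws-cdk-lib/aws-iam";
import * as timestream from "aws-cdk-lib/aws-timestream";
import * as cr from "aws-cdk-lib/custom-resources";
import { Construct } from "constructs";

export interface SeededTimestreamTableProps {
  databaseName: string;
  tableName: string;
  /** Records for the initial WriteRecords call that establishes the schema */
  seedRecords: Array<Record<string, unknown>>;
}

/** A Timestream table that seeds itself with sample records on creation. */
export class SeededTimestreamTable extends Construct {
  public readonly table: timestream.CfnTable;

  constructor(scope: Construct, id: string, props: SeededTimestreamTableProps) {
    super(scope, id);

    this.table = new timestream.CfnTable(this, "Table", {
      databaseName: props.databaseName,
      tableName: props.tableName,
    });

    const seed = new cr.AwsCustomResource(this, "Seed", {
      onCreate: {
        service: "TimestreamWrite",
        action: "writeRecords",
        parameters: {
          DatabaseName: props.databaseName,
          TableName: props.tableName,
          Records: props.seedRecords,
        },
        physicalResourceId: cr.PhysicalResourceId.of(`${id}-seed`),
      },
      policy: cr.AwsCustomResourcePolicy.fromStatements([
        new iam.PolicyStatement({
          actions: ["timestream:WriteRecords"],
          resources: [this.table.attrArn],
        }),
        new iam.PolicyStatement({
          actions: ["timestream:DescribeEndpoints"],
          resources: ["*"],
        }),
      ]),
    });
    // Seed only after the table exists
    seed.node.addDependency(this.table);
  }
}
```

Each table in the stack can then be declared as a single `SeededTimestreamTable` with its own seed payload.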