Data seeding
Seed your Velocity environment with production-like data.
In this guide we will create an environment that does the following:
  • starts a PostgreSQL database service
  • starts a job that will seed the DB from a data file (a snapshot)
  • starts an application that queries the DB and uses the seed data
First, we will create a text file called migrate.sql that contains the following SQL:
CREATE TABLE IF NOT EXISTS tasks (
  task_id INT PRIMARY KEY NOT NULL,
  title TEXT NOT NULL,
  start_date DATE,
  due_date DATE,
  status INT NOT NULL,
  priority INT NOT NULL,
  description TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO tasks (task_id, title, status, priority) VALUES (1, 'grep', 1, 1);
Here we are creating a new table named tasks and populating it with a single record.
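If you want to sanity-check this SQL before uploading it, you can run it against any local Postgres instance. The connection details below are only an example and are not part of the guide's setup; adjust them to whatever Postgres you have available:
# Optional local sanity check (example connection values)
psql -h localhost -U postgres -d postgres -f migrate.sql
psql -h localhost -U postgres -d postgres -c "SELECT * FROM tasks;"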
Next, run the following command to upload the data file to the snapshot storage:
veloctl snapshot put -f <path/to/migrate.sql> --target seeding --default
Note that the target must be seeding (we will explain why shortly). The --default flag means that this data will be used by default whenever a new environment is created. Later, we will see an example with non-default snapshots as well.
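Conceptually, you can picture the snapshot storage after this upload like this (an illustration only, not actual veloctl output):
snapshot "default"
  └── target "seeding" → migrate.sql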
Next, we will create an environment that has a Postgres DB. For that, we will use a sample blueprint from our samples repository:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-single-job.yaml --env-version v2
Running this command should result in output similar to this:
Requesting the creation of environment surprised-bart-rozum with services at 2022-09-13 10:43:28 IDT...
Environment 'surprised-bart-rozum' status:
Point in time: 2022-09-13 10:43:28 IDT
Service   Status   Version                Public URI   Tunneled
psql      Ready    ...postgresql:13.2.0
seeding   Ready    ...postgresql:13.2.0
the-app   Ready    ...postgresql:13.2.0
Overall status: Ready
You can see that 3 services were successfully deployed. psql is the Postgres DB itself, seeding is the service responsible for seeding the data, and the-app is the application that queries the DB.
Recall that in the previous step the value of the --target flag had to be seeding. The target of a snapshot is the name of the Velocity service that uses it. Each data file has precisely one target, which means that each data file is handled by precisely one seeding job.
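As a mental model (this is only an illustration, not Velocity's actual implementation), you can think of the seeding job as applying its data file to the database, roughly as if it ran something like:
# Illustration only: the connection string variable and file location are hypothetical.
psql "$DATABASE_URL" -f migrate.sql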
Looking at the logs of the-app service, you should see:
 task_id | title | start_date | due_date | status | priority | description |         created_at
---------+-------+------------+----------+--------+----------+-------------+----------------------------
       1 | grep  |            |          |      1 |        1 |             | 2022-09-13 07:44:02.673977
(1 row)

 task_id | title | start_date | due_date | status | priority | description |         created_at
---------+-------+------------+----------+--------+----------+-------------+----------------------------
       1 | grep  |            |          |      1 |        1 |             | 2022-09-13 07:44:02.673977
(1 row)

 task_id | title | start_date | due_date | status | priority | description |         created_at
---------+-------+------------+----------+--------+----------+-------------+----------------------------
       1 | grep  |            |          |      1 |        1 |             | 2022-09-13 07:44:02.673977
(1 row)
In this example, the app repeatedly runs the query select * from tasks. You can see in the output above that the table exists and contains one record, which is the data that was successfully seeded by the seeding job.
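If you would like to verify the seeded data yourself rather than relying on the app's logs, you can run the same query directly against the DB. How you reach the psql service depends on your setup (for example, through a tunnel); the connection string below is only a placeholder:
# Placeholder connection string; replace it with however you reach the psql service.
psql "postgresql://postgres@localhost:5432/postgres" -c "SELECT * FROM tasks;"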

Create a non-default snapshot

If you want to create a new environment whose DB is seeded with data different from the default, you can do so by providing a different data file. Let's create another text file with the following SQL:
CREATE TABLE IF NOT EXISTS tasks (
  task_id INT PRIMARY KEY NOT NULL,
  title TEXT NOT NULL,
  start_date DATE,
  due_date DATE,
  status INT NOT NULL,
  priority INT NOT NULL,
  description TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO tasks (task_id, title, status, priority) VALUES
(1, 'ps', 1, 2),
(2, 'ls', 2, 1),
(3, 'vim', 3, 1);
Note that we are creating the same table, but filling it with different values.
Now run the following command:
veloctl snapshot put -f <path/to/other/file> --target seeding --name special-data
The --target is still seeding, because we are using the same blueprint with the same seeding job. However, this time we use the --name special-data flag, which means that our data file will be associated with a different snapshot named special-data.
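At this point the seeding target has two files in snapshot storage, one per snapshot name. Conceptually (an illustration only, not actual veloctl output):
target "seeding"
  ├── snapshot "default"      → migrate.sql (uploaded earlier with --default)
  └── snapshot "special-data" → the new data file (this upload)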
Next, let’s create a new environment with the special-data snapshot like this:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-single-job.yaml --env-version v2 --snapshot special-data
Note the --snapshot flag. This command means: create a new environment, and instead of the default snapshot use the snapshot named special-data.

Using multiple seeding jobs/targets

In the previous example, we had to repeat the SQL for creating the table. This isn't ideal, and it isn't DRY. We will solve this in the following example.
Let's create two text files:
migrate.sql
CREATE TABLE IF NOT EXISTS tasks (
  task_id INT PRIMARY KEY NOT NULL,
  title TEXT NOT NULL,
  start_date DATE,
  due_date DATE,
  status INT NOT NULL,
  priority INT NOT NULL,
  description TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
data.sql
INSERT INTO tasks (task_id, title, status, priority) VALUES
(1, 'ps', 1, 2),
(2, 'ls', 2, 1),
(3, 'vim', 3, 1);
We will refer to these file names in the next steps of the guide, so be sure to use these exact names.
Upload the data files:
veloctl snapshot put -f migrate.sql --target migrate --default
veloctl snapshot put -f data.sql --target data --default
Note that each file is associated with its own target: migrate and data respectively.
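After these two uploads, the default snapshot contains one file per target. Conceptually (an illustration only, not actual veloctl output):
snapshot "default"
  ├── target "migrate" → migrate.sql (table definition)
  └── target "data"    → data.sql    (rows)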
Now, let’s create a new environment from a multi-job sample:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-multi-job.yaml --env-version v2
And the expected output should be similar to:
Requesting the creation of environment surprised-bart-rozum with services at 2022-09-13 10:43:28 IDT...
Environment 'surprised-bart-rozum' status:
Point in time: 2022-09-13 10:43:28 IDT
Service   Status   Version                Public URI   Tunneled
psql      Ready    ...postgresql:13.2.0
migrate   Ready    ...postgresql:13.2.0
data      Ready    ...postgresql:13.2.0
the-app   Ready    ...postgresql:13.2.0
Overall status: Ready
We can see that this time 4 services are provisioned: psql and the-app as before, plus two seeding jobs, migrate and data.
Now, if we want to override only the seeded values but keep the same table definition, we create another file:
INSERT INTO tasks (task_id, title, status, priority) VALUES
(10, 'cp', 1, 2),
(20, 'rm', 2, 1),
(30, 'top', 3, 1);
And upload it like this:
veloctl snapshot put -f <path/to/file> --target data --name special-data
And create another environment to demonstrate an interesting behavior:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-multi-job.yaml --snapshot special-data,default
Note that we pass a comma-separated list of snapshot names here, and the order of the names matters. First, we include all the data files associated with the special-data snapshot name; recall that we uploaded only one file for special-data, and it is associated with the data job.
Next, we take additional data files from the default snapshot name. The default snapshot includes two files, associated with the data and migrate jobs respectively.
Since the data job is already using the data file from the special-data snapshot, it will not be replaced with the data from default. However, the migrate job is still “free,” because there wasn’t any file in the special-data snapshot associated with it, so that job will use the data from the default snapshot.
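Putting it together, the effective file-to-job mapping for --snapshot special-data,default in this example is (an illustration only, not actual veloctl output):
data job    → the file from snapshot "special-data" (uploaded with --target data)
migrate job → migrate.sql from snapshot "default"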
You can use as many snapshot names as you want when creating an environment. The default snapshot name is special: it is always added to the end of the list, so you don't have to add it yourself. The following command therefore has the same result:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-multi-job.yaml --snapshot special-data

Summary

You can use as many seeding jobs as you need and divide the work however is appropriate for your application. It is possible to create multiple jobs that seed a single DB, as demonstrated in this guide. It is also possible to create multiple jobs, each seeding a different DB, or one job that seeds multiple databases; however, those setups are outside the scope of this guide.