Advanced Data Seeding Configurations
If you want to create a new Velocity Environment with a DB that is seeded with data that is different from the default, you can achieve that by providing a different data file.
Let’s continue the Blueprints for Data Seeding flow and create another text file called
special-data.sql
with the following SQL:CREATE TABLE public.users (
first_name text,
last_name text,
user_id smallint
);
ALTER TABLE public.users OWNER TO postgres;
COPY public.users (first_name, last_name, user_id) FROM stdin;
ps '1' 1
ls '2' 2
vim '3' 3
\.
Note that we are creating the same table we created in the previous tutorial, but we fill it with different values.
Now run the following command:
veloctl snapshot put -f special-data.sql --target seeding --name special-data
The
--target
is still seeding
, because we are using the same Blueprints with the same seeding job as we used before. However, this time we are using the
--name special-data
flag. That means that our data file will be associated with a different snapshot named special-data
.Next, let’s create a new Velocity Environment with the
special-data
snapshot like this:veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-single-job.yaml --env-version v2 --snapshot special-data
Note the
--snapshot
flag. This command means: create a new environment and don’t use the default snapshot but rather a snapshot data named special-data
. In the previous example, we had to repeat the SQL for creating a table. This isn’t ideal and not DRY. We will try to solve this issue with the following example.
Let's create two text files:
migrate.sql
CREATE TABLE IF NOT EXISTS tasks (
task_id INT PRIMARY KEY NOT NULL,
title TEXT NOT NULL,
start_date DATE,
due_date DATE,
status INT NOT NULL,
priority INT NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
data.sql
INSERT INTO tasks (task_id, title, status, priority) VALUES
(1, 'ps', 1, 2),
(2, 'ls', 2, 1),
(3, 'vim', 3, 1);
We will refer to these file names in the next steps of the guide, so be sure to use these exact names.
Upload the data files:
veloctl snapshot put -f migrate.sql --target migrate-job --default
veloctl snapshot put -f data.sql --target data-job --default
Note that each file is associated with a target:
migrate-job
and data-job
respectively.Now, let’s create a new Velocity Environment from a multi-job sample:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-multi-job.yaml --env-version v2
And the expected output should be similar to:
Requesting the creation of environment surprised-bart-rozum with services at 2022-09-13 10:43:28 IDT...
Environment 'surprised-bart-rozum' status:
Point in time: 2022-09-13 10:43:28 IDT
Service Status Version Public URI Tunneled
psql Ready ...postgresql:13.2.0
migrate Ready ...postgresql:13.2.0
data Ready ...postgresql:13.2.0
the-app Ready ...postgresql:13.2.0
Overall status: Ready
We can see that this time there are 4 Velocity Services provisioned.
Now, if we want to override only the values seeded, but keep the same table definitions, we will create another file:
data-2.sql
INSERT INTO tasks (task_id, title, status, priority) VALUES
(10, 'cp', 1, 2),
(20, 'rm', 2, 1),
(30, 'top', 3, 1);
And upload it like this:
veloctl snapshot put -f data-2.sql --target data-job --default
And create another Velocity Environment to demonstrate an interesting behavior:
veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-multi-job.yaml --snapshot special-data,default
Note that we pass a comma-separated list of snapshot names here. The order of names is important. First, we include all the data files that are associated with the
special-data
snapshot name. Note that we only uploaded one file for special-data
that is associated with the data
job.Next, we take more data files from the
default
snapshot name. The default
snapshot includes two files, each associated with data
and migrate
jobs respectively. Since the
data
job is already using the data file from the special-data
snapshot, it will not be replaced with the data from default
. However, the migrate
job is still “free,” because there wasn’t any file in the special-data
snapshot associated with it, so that K8s Job will use the data from the default
snapshot.You can use as many snapshot names as you want when creating a Velocity Environment. The
default
snapshot name is a special name that is always added to the end of the list, so you don’t have to add it yourself, so the following command has the same result:veloctl env create -f https://raw.githubusercontent.com/techvelocity/velocity-blueprints/main/getting-started/aws/data-seeding/postgres-multi-job.yaml --snapshot special-data
You can use as many seeding jobs as you need and divide the work as is relevant to your application. It is possible to create multiple jobs that seed a single DB, as was demonstrated in this guide. It is also possible to create multiple jobs, each seeding a different DB, and it is also possible to create one job that seeds multiple databases; however, this is outside the scope of this guide.
Last modified 2mo ago