datagen
npm install -g @jircik/datagen

Realistic fake data for your database in one command.

datagen is a schema-driven CLI that populates Postgres and MongoDB with realistic seed data using Faker.js — fast, reproducible, and built for developers who hate writing seed scripts.

PostgreSQL & MongoDB
Relation-aware
YAML schemas (versionable)
Reproducible (--seed)
$ datagen connect "postgresql://user:pass@localhost:5432/mydb"
✓ Connected to PostgreSQL at localhost:5432/mydb

$ datagen populate .datagen/ --count 500
↳ resolving dependency graph...
✓ users      500 rows inserted (212ms)
✓ posts      500 rows inserted (188ms)
✓ comments   500 rows inserted (240ms)

// Built for developers

Stop writing seed scripts.

Declarative schemas, real Faker.js methods, and database-native inserts — without leaving the terminal.

Two databases, one CLI

PostgreSQL and MongoDB drivers built in. Auto-detected from your connection string.

Relation-aware

Define foreign keys in YAML. datagen builds a dependency graph and populates in the right order.

Versionable schemas

Schemas live in .datagen/ as YAML. Commit them with your code.

Reproducible runs

Pass --seed 42 for byte-identical output. Perfect for CI and demos.
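What `--seed` buys you can be illustrated with any deterministic PRNG: the same seed drives the same pseudo-random sequence, so two runs emit identical values. A toy sketch of the idea (not datagen's internals, which seed Faker.js):

```javascript
// mulberry32: a tiny deterministic PRNG; same seed -> same sequence
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two independent "runs" with the same seed
const runA = mulberry32(42);
const runB = mulberry32(42);
const rowsA = Array.from({ length: 3 }, () => Math.floor(runA() * 100));
const rowsB = Array.from({ length: 3 }, () => Math.floor(runB() * 100));
// rowsA and rowsB are identical
```

This is why a seeded run is safe to bake into CI snapshots and demo scripts: the generated rows never drift between machines.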

Inline mode

No schema file needed for quick seeding. Pass fields directly on the command line.

Fail loudly

Schema errors, FK violations, and unknown Faker methods are caught before insertion.
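For instance, a schema with a misspelled Faker method (hypothetical file) fails `datagen schema validate` rather than inserting garbage:

```yaml
# bad.schema.yaml: 'person.fullname' is not a Faker.js method
# (the real method is person.fullName), so validation rejects it
target: postgres
table: users
fields:
  id:   string.uuid
  name: person.fullname
```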

// 01 — Install

One command. Globally available.

datagen ships as a global npm package. Install it once and run datagen from any project. Your connection lives in ~/.datagen/config.json; schemas live alongside your code.

  • Requires Node.js 18+
  • Works on macOS, Linux, and Windows
  • Zero project setup — no migrations or codegen
npm (recommended)
$ npm install -g @jircik/datagen

pnpm
$ pnpm add -g @jircik/datagen

yarn
$ yarn global add @jircik/datagen

Verify the installation:
$ datagen --version
1.0.0

// 02 — How it works

From zero to seeded in 3 steps.

Connect once, write a schema (or skip it), populate. That's it.

STEP 01

Connect your database

datagen detects the database type from your connection string and saves it globally.

$ datagen connect \
  "postgresql://localhost/mydb"
✓ Connected to PostgreSQL
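Once connected, the stored connection might look roughly like this (field names are illustrative; the actual layout of ~/.datagen/config.json may differ):

```json
{
  "activeConnection": {
    "type": "postgres",
    "uri": "postgresql://user:****@localhost:5432/mydb"
  }
}
```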
STEP 02

Describe your data

Drop a YAML schema in .datagen/ mapping fields to Faker methods.

# users.schema.yaml
target: postgres
table: users
fields:
  id:    string.uuid
  name:  person.fullName
  email: internet.email
STEP 03

Populate

Run one schema, a whole folder, or pass fields inline. datagen handles the rest.

$ datagen populate .datagen/ \
    --count 1000 --seed 42
✓ 3 tables · 3000 rows in 640ms

// 03 — Commands

A small, focused CLI.

Seven commands plus two global flags. Each does one thing well.

datagen connect <uri> · setup

Save and validate a database connection. Auto-detects Postgres or MongoDB.

datagen status · setup

Show the currently connected database with masked credentials.

datagen disconnect · setup

Clear the active connection from the global config.

datagen populate ... · core

Insert fake data from a schema file, an entire .datagen/ folder, or inline fields.

datagen schema validate · schema

Validate types, relations, and primary keys in a schema file before running.

datagen schema list · schema

List all .schema.yaml files in the current project.

datagen list tables · inspect

List all tables (Postgres) or collections (MongoDB) in the connected database.

--dry-run / --seed · flags

Preview without inserting, or fix the seed for byte-identical output.

// 04 — Schema examples

Schemas your team can read.

Plain YAML. Versioned next to your code. No DSL to learn — just Faker.js method names.

.datagen/posts.schema.yaml · postgres · relation
target: postgres
table: posts
fields:
  id:
    type: string.uuid
    primary: true
  title: lorem.sentence
  body:  lorem.paragraphs
  user_id:
    type: relation
    table: users
    field: id
    strategy: random
.datagen/orders.schema.yaml · mongodb · nested
target: mongo
collection: orders
fields:
  status:
    type: helpers.arrayElement
    values: ['pending', 'paid', 'shipped']
  amount:
    type: number.float
    min: 10
    max: 9999
  tags:
    type: array
    items: commerce.productAdjective
    length: 3
No schema file? Use inline mode · --field × N
$ datagen populate --table users \
    --field "name:person.fullName" \
    --field "email:internet.email" \
    --field "age:number.int:min=18:max=80" \
    --count 50
new · Claude Code plugin

Use datagen with Claude Code.

Skip the manual schema authoring. The official Claude Code plugin teaches Claude how to set up connections, generate schemas from your tables, and run populate commands — all from a natural-language prompt inside your editor.

  • Generate .datagen schemas from your existing tables
  • Connect, populate, and validate without leaving the chat
  • Works in any project — global install, project-aware
claude code · datagen plugin

You:

Seed the users and posts tables with 200 realistic rows each.

Claude:

I'll generate schemas from your tables and populate them in the right order.

→ creating .datagen/users.schema.yaml
→ creating .datagen/posts.schema.yaml
$ datagen populate .datagen/ --count 200
✓ users  200 rows
✓ posts  200 rows

Done — 400 rows seeded in 380ms.

// 05 — FAQ

Questions, answered.

Does datagen modify my schema or run migrations?

No. datagen only inserts data. Your tables and collections must exist already — bring your own migrations.

How does relation resolution work?

For every type: relation field, datagen runs a SELECT to fetch existing IDs and picks them using your strategy (random or sequential). In folder mode, dependencies are sorted automatically.
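The folder-mode ordering amounts to a topological sort of the table dependency graph. A minimal sketch of the idea (not datagen's actual implementation):

```javascript
// Topologically sort tables so every table is inserted after
// the tables it references via foreign keys.
function topoSort(deps) {
  const order = [];
  const visited = new Set();
  const visiting = new Set();

  function visit(table) {
    if (visited.has(table)) return;
    if (visiting.has(table)) throw new Error(`circular relation involving ${table}`);
    visiting.add(table);
    for (const dep of deps[table] || []) visit(dep); // parents before children
    visiting.delete(table);
    visited.add(table);
    order.push(table);
  }

  Object.keys(deps).forEach(visit);
  return order;
}

// posts and comments reference users; comments also reference posts
const order = topoSort({
  users: [],
  posts: ["users"],
  comments: ["users", "posts"],
});
// order: ["users", "posts", "comments"]
```

The cycle check is why self-referencing relations need special handling: a table that depends on itself has no valid insertion order without a second pass.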

Can I commit schemas to git?

Yes — that's the point. Schemas live in .datagen/ in your repo. Connection strings live globally in ~/.datagen/config.json and never touch your project files.

What about many-to-many or self-referencing relations?

One-to-many and many-to-one are supported today. Many-to-many (join tables) and self-referencing relations are on the post-MVP roadmap.

Is it safe to run against production?

datagen is built for local and staging environments. Don't point it at production — it inserts data, and that data is fake.

Seed a thousand rows before your coffee cools.

Install datagen and you'll never write another seed script.

npm install -g @jircik/datagen