Volkanic · Backend

Database design decisions that age well

The schema choices that look obvious in retrospect but are easy to get wrong early. Written from the perspective of maintaining production databases over years.

8 min read
Database · PostgreSQL · MySQL · Architecture · Backend

Database schemas are one of the most expensive things to change in a live system. You can refactor code, swap libraries, replace APIs. But migrating production data at scale is slow, risky, and often requires backward-compatible intermediary states that linger for months.

Getting the schema right early pays dividends for years.

Use UUIDs or ULIDs as primary keys in distributed systems

Sequential integer IDs are fine for simple systems with a single database. They become a problem when you:

  • Generate IDs in application code before inserting (distributed creation)
  • Need to merge databases from different sources
  • Want to avoid exposing record count information

UUIDs are universally unique. ULIDs are sortable and more compact. Both avoid the tight coupling between “create an ID” and “insert into the database.”

The downside is storage and index size: UUID primary keys are 16 bytes vs 4 or 8 bytes for integers, and the cost repeats in every foreign key and index that references them. Random UUIDs (v4) also insert at random B-tree positions, which hurts index locality; sortable ULIDs sidestep this. On very large tables this matters. Profile before assuming it’s a problem.

CREATE TABLE reservations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- ...
);
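In practice, decoupled ID creation means the application supplies the key; a sketch (the literal UUID below is just an example value):

```sql
-- The application generates the ID before touching the database,
-- so the row can be referenced (in logs, queues, child rows)
-- before the INSERT ever runs.
INSERT INTO reservations (id)
VALUES ('9b2f6f0a-3c1d-4e8b-9a7e-2f5d1c0b4a6e');
```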

Store monetary amounts as integers (minor currency units)

Never store prices as floats. Many decimal amounts have no exact binary representation, so arithmetic drifts: in IEEE 754 doubles, 0.1 + 0.2 evaluates to 0.30000000000000004, not 0.3, and the error compounds across sums, taxes, and refunds.
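You can see the binary-representation drift directly in PostgreSQL, where float8 is the IEEE 754 double type:

```sql
SELECT 0.1::float8 + 0.2::float8;                 -- 0.30000000000000004
SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;   -- false
```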

Store amounts in the smallest unit of the currency (cents for USD/EUR, pence for GBP):

total_amount_cents INTEGER NOT NULL,
currency VARCHAR(3) NOT NULL DEFAULT 'EUR',

12.50 EUR becomes 1250 in the database. Your application layer handles display formatting. Arithmetic on integers is exact.
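A sketch with a hypothetical order_items table (column names are illustrative), summing line totals exactly:

```sql
-- Hypothetical line-items table; all amounts in cents.
CREATE TABLE order_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0)
);

-- Integer arithmetic is exact: 3 x 1250 cents is always 3750.
SELECT SUM(unit_price_cents * quantity) AS total_cents
FROM order_items
WHERE order_id = $1;
```

Use BIGINT instead of INTEGER if totals can exceed about 21 million in a two-decimal currency: INTEGER tops out at 2,147,483,647 cents.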

Use timestamps with time zones

In PostgreSQL, TIMESTAMPTZ stores timestamps as UTC internally and handles timezone conversion in queries. In MySQL, DATETIME does not store timezone; you’re responsible for ensuring all values are UTC.

Pick one and be consistent. Store and retrieve in UTC. Convert to the user’s timezone at the display layer only.

created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
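Conversion then happens per query or in the application; in PostgreSQL, AT TIME ZONE renders a stored UTC timestamptz as wall-clock time in a given zone:

```sql
-- created_at is stored as UTC; AT TIME ZONE converts for display.
SELECT id,
       created_at AT TIME ZONE 'Europe/Berlin' AS created_local
FROM reservations;
```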

Soft deletes:

deleted_at TIMESTAMPTZ,

NULL means not deleted. Non-null means deleted at that timestamp. Simpler than a boolean and you get the deletion time for free.
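Deleting and filtering then become a plain UPDATE and a NULL check:

```sql
-- Soft-delete one row; the deleted_at IS NULL guard makes it
-- idempotent (the timestamp is only ever set once).
UPDATE reservations
SET deleted_at = NOW()
WHERE id = $1 AND deleted_at IS NULL;

-- Every "live rows" query carries the same predicate.
SELECT * FROM reservations WHERE deleted_at IS NULL;
```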

Separate “status” from “timestamps”

This is a common pattern I’ve seen cause confusion:

-- Problematic: status is implicit in which timestamp is set
confirmed_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,

The problem: determining the current status requires checking multiple nullable columns. Business logic to determine status gets scattered across the codebase.

Better to have both:

status VARCHAR(20) NOT NULL DEFAULT 'pending',
confirmed_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,

status is the source of truth for current state, queryable and indexable. The timestamps are the audit trail, telling you when each transition happened.
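Transitions then update both in one statement, with the old status in the WHERE clause as a cheap guard against double-processing:

```sql
-- Confirm a pending reservation; a second concurrent confirm
-- matches zero rows instead of overwriting confirmed_at.
UPDATE reservations
SET status = 'confirmed',
    confirmed_at = NOW()
WHERE id = $1
  AND status = 'pending';
```

Checking the affected-row count tells the application whether its transition actually won.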

Index for your access patterns, not for your schema

Foreign key columns get indexes because joins need them. Status columns get indexes because you filter by them. But index every column “just in case” and you’ll slow down writes without benefiting reads.

Think about your actual queries:

-- You query reservations by customer frequently
CREATE INDEX idx_reservations_customer_id ON reservations(customer_id);

-- You query active reservations by start date
CREATE INDEX idx_reservations_active_starts_at
    ON reservations(starts_at)
    WHERE status = 'confirmed';

Partial indexes are powerful and often overlooked. If nearly every query filters on WHERE deleted_at IS NULL, a partial index with that predicate stays small and covers exactly the rows you actually read.

Use EXPLAIN ANALYZE on your real queries. Don’t guess.
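For example, checking whether the partial index above is actually used (note that EXPLAIN ANALYZE executes the query, so run it on read-only statements):

```sql
EXPLAIN ANALYZE
SELECT * FROM reservations
WHERE status = 'confirmed'
  AND starts_at >= NOW();
-- Look for "Index Scan using idx_reservations_active_starts_at";
-- a "Seq Scan" here means the planner ignored the index.
```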

Keep migrations additive, never destructive

Every production database migration should be safe to run while the application is serving traffic. This means:

  • Add columns as nullable first, then backfill, then add the constraint.
  • Never rename a column while the old code is running (use a new column + deprecate the old one).
  • Drop columns and tables only after the code that references them is already deployed and gone.

The sequence for adding a required field:

  1. Deploy migration: add new_column as nullable
  2. Deploy code: write to both old and new columns; read from old
  3. Backfill: populate new_column for existing rows
  4. Deploy code: read from new column; write only to new
  5. Deploy migration: add NOT NULL constraint
  6. Deploy code: remove writes to old column
  7. Deploy migration: drop old column (much later)

This is more steps than you want. It’s also how you avoid a 2am incident.