Friday, April 10, 2026

vms homage

license: public domain CC0

Vision and design: Versioned filesystem for modern workloads

This is a design for a VMS‑style automatically versioned filesystem reimagined for:

  • SQLite and other page‑oriented databases
  • streaming and append‑heavy workloads
  • cloud‑local hybrid operation
  • semantic, provenance‑rich development environments

The core move: version at the block/page layer, not at the “whole file” layer, and expose a VMS‑like versioned namespace as a projection over an immutable, content‑addressable store.


1. Vision

1.1 What we want

A filesystem where:

  • Every file has a versioned history by default (like VMS foo.txt;17).
  • Versions are cheap to create and store, even for large, mutable files.
  • SQLite databases and similar workloads get transaction‑consistent snapshots without hacks.
  • Streaming and append‑heavy workloads don’t explode the version count.
  • The same underlying store can back:
    • local development
    • cloud sync
    • semantic code storage
    • time‑travel debugging and provenance

1.2 Core principles

  • Immutability at the block level: once written, blocks never change.
  • Copy‑on‑write structure: new versions share blocks with old ones.
  • Semantic version boundaries: versioning is aligned with meaningful events (fsync, close, WAL checkpoint, explicit commit), not every write.
  • Content addressability: blocks are identified by hash, enabling deduplication and integrity.
  • Namespace as projection: the “filesystem” is a view over a versioned object graph, not the primary storage primitive.

2. Goals and non‑goals

2.1 Goals

  • Automatic versioning: every file has a history without application changes.
  • SQLite‑friendly: support efficient, consistent snapshots of SQLite DBs and similar page‑oriented stores.
  • Streaming‑friendly: support long‑lived, append‑heavy files without pathological version growth.
  • Cloud‑ready: support remote block storage, local caching, and multi‑device sync.
  • Debuggable and explainable: versioning rules and boundaries are explicit and inspectable.

2.2 Non‑goals (for v1)

  • Not a full POSIX replacement: we target a FUSE‑style or OS‑integrated filesystem, but we can initially accept some edge‑case incompatibilities.
  • Not a distributed consensus system: multi‑writer concurrency is handled via version branching and merge semantics, not strong global transactions.
  • Not a full Git replacement: we can integrate with Git, but we’re not re‑implementing its UX.

3. Core concepts

3.1 Block store

Primitive: fixed‑size blocks (e.g., 4 KB or 8 KB).

  • ID: block_id = hash(block_contents)
  • Properties:
    • immutable
    • content‑addressed
    • deduplicated
    • stored in local cache + optional remote store
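The block-store contract above can be sketched in a few lines of Python. This is an in-memory stand-in, not a committed API; `BlockStore`, `put`, and `get` are illustrative names:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks, as in the design


class BlockStore:
    """Content-addressed, immutable, deduplicating block store (in-memory sketch)."""

    def __init__(self):
        self._blocks = {}  # block_id -> bytes

    def put(self, data: bytes) -> str:
        block_id = hashlib.sha256(data).hexdigest()
        # Storing identical content twice is a no-op: dedup falls out of addressing.
        self._blocks.setdefault(block_id, data)
        return block_id

    def get(self, block_id: str) -> bytes:
        return self._blocks[block_id]


store = BlockStore()
a = store.put(b"x" * BLOCK_SIZE)
b = store.put(b"x" * BLOCK_SIZE)  # identical content -> same id, one copy stored
```

Immutability and dedup both follow directly from using the hash as the identity.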

3.2 File version

A file version is a logical object:

FileVersion {
  file_id        // stable identity for the logical file
  version_id     // monotonically increasing or hash-based
  parent_version // optional, for history/branching
  path_at_time   // path in the namespace when this version was created
  block_list     // ordered list of block_ids
  size           // byte length
  metadata       // timestamps, permissions, etc.
  tags           // optional semantic labels (e.g., "checkpoint", "autosave")
}

Multiple FileVersions share blocks via block_list.
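A minimal sketch of that sharing, using a subset of the fields above (`path_at_time` and `metadata` omitted for brevity; the hashing helper is illustrative):

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


def block_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class FileVersion:
    file_id: str
    version_id: int
    parent_version: Optional[int]
    block_list: List[str]
    size: int
    tags: List[str] = field(default_factory=list)


# Version 1: three blocks.
v1 = FileVersion("f1", 1, None, [block_id(b) for b in (b"aaaa", b"bbbb", b"cccc")], 12)

# Version 2: only the middle block changed; the other two block IDs are reused.
v2 = FileVersion("f1", 2, 1, [block_id(b) for b in (b"aaaa", b"BBBB", b"cccc")], 12)

shared = set(v1.block_list) & set(v2.block_list)
```

Here two of the three blocks are shared, so the second version costs one new block of storage.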

3.3 Namespace

The live filesystem view maps paths to a current version:

Path -> FileVersion(version = "latest")

Historical versions are accessible via extended syntax, e.g.:

  • foo.txt;17
  • foo.txt;latest
  • foo.txt;timestamp:2026-04-10T23:17Z
  • foo.txt;version:abc123

Internally, this is just a lookup into the version metadata store.
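A sketch of that lookup's front half: parsing the `;` selector syntax into a (path, selector) pair. The helper name and the tuple shape are hypothetical; the grammar mirrors the examples above (paths containing `;` are out of scope for this sketch):

```python
def parse_versioned_path(path: str):
    """Split 'foo.txt;SELECTOR' into (path, selector)."""
    if ";" not in path:
        return path, ("latest", None)
    base, sel = path.rsplit(";", 1)
    if sel == "latest":
        return base, ("latest", None)
    if sel.isdigit():
        return base, ("number", int(sel))          # foo.txt;17
    if ":" in sel:
        kind, value = sel.split(":", 1)
        return base, (kind, value)                 # timestamp:... or version:...
    raise ValueError(f"unrecognized version selector: {sel!r}")
```

The resulting selector is then just a key into the version metadata store.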

3.4 Version boundary

A version boundary is an event that causes a new FileVersion to be committed:

  • close()
  • fsync()
  • SQLite WAL checkpoint
  • explicit ioctl / API call
  • time‑based or size‑based thresholds (e.g., at most 1 version per second per file)

Between boundaries, writes mutate an uncommitted working state (in memory or temp structures) that is not yet a committed version.


4. Architecture

4.1 High‑level components

  1. Block store

    • Local block cache (disk)
    • Optional remote block store (cloud)
    • Content‑addressable, immutable
  2. Metadata store

    • File identities (file_id)
    • File versions (FileVersion)
    • Directory structure and path mapping
    • Indices for lookup by path, time, tags
  3. Versioned filesystem layer

    • Implements POSIX‑like operations
    • Maintains working state for open files
    • Decides when to commit new versions
  4. Sync and replication layer

    • Push/pull blocks and metadata to/from remote
    • Conflict detection and version branching
  5. Semantic layer (optional, higher level)

    • Tags versions with semantic events (e.g., “test passed”, “build succeeded”)
    • Integrates with tools like Git, CI, editors, agents

4.2 Data flow: write path

  1. Application opens foo.db.
  2. Filesystem resolves file_id and current FileVersion.
  3. Writes go to a working file state:
    • In memory or temp file
    • Tracked as a list of modified blocks
  4. On version boundary (e.g., fsync, WAL checkpoint, close):
    • Compute hashes for modified blocks
    • Store new blocks in block store (if not already present)
    • Construct new FileVersion with updated block_list
    • Update namespace mapping for foo.db → new version
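Steps 3–4 can be sketched as follows. The store is a plain dict and `put`/`commit_version` are illustrative helpers; the point is that only modified pages produce new blocks:

```python
import hashlib


def put(store: dict, data: bytes) -> str:
    """Store a block by content hash; re-storing identical content is a no-op."""
    bid = hashlib.sha256(data).hexdigest()
    store.setdefault(bid, data)
    return bid


def commit_version(store: dict, old_block_list: list, dirty: dict) -> list:
    """Build the new version's block_list: only pages in `dirty`
    (page index -> new bytes) get fresh blocks; the rest are shared."""
    new_list = list(old_block_list)
    for idx, data in dirty.items():
        new_list[idx] = put(store, data)
    return new_list


store = {}
v1 = [put(store, b"page0"), put(store, b"page1")]
v2 = commit_version(store, v1, {1: b"page1-changed"})  # one dirty page
```

After the commit, the namespace mapping is updated to point at the new block list.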

4.3 Data flow: read path

  1. Application opens foo.db (or foo.db;17).
  2. Filesystem resolves the requested FileVersion.
  3. Reads map file offsets to block indices.
  4. Blocks are fetched from:
    • Local block cache if present
    • Otherwise remote store, then cached locally
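The offset-to-block mapping and cache fallback (steps 3–4) amount to a short loop. This sketch backs both stores with dicts; real stores would be disk and object storage:

```python
BLOCK_SIZE = 4096


def read_version(block_list, offset, length, local_cache, remote_store):
    """Map a byte range onto block indices; serve each block from the local
    cache when possible, otherwise fetch from remote and cache it."""
    out = b""
    while length > 0:
        idx, within = divmod(offset, BLOCK_SIZE)
        bid = block_list[idx]
        if bid not in local_cache:
            local_cache[bid] = remote_store[bid]  # cache miss: fetch and keep
        chunk = local_cache[bid][within:within + min(length, BLOCK_SIZE - within)]
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return out


remote = {"b0": b"A" * BLOCK_SIZE, "b1": b"B" * BLOCK_SIZE}
cache = {}
data = read_version(["b0", "b1"], offset=4090, length=10,
                    local_cache=cache, remote_store=remote)  # spans two blocks
```

Reading any historical version is the same loop over a different block_list.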

5. Versioning semantics by workload

5.1 Regular files (text, code, configs)

Default behavior:

  • New version on:
    • close()
    • fsync()
    • explicit versioning call
  • Optional throttling:
    • If a file is opened/closed rapidly, coalesce versions within a time window.

This gives a natural history of edits without overwhelming storage.
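The coalescing rule can be sketched as a small throttle. The class and its clock-injection style are hypothetical; commits arriving within the window simply replace the pending uncommitted state rather than creating versions:

```python
class Coalescer:
    """Throttle version creation: a commit within `window` seconds of the
    previous one is coalesced into the pending version."""

    def __init__(self, window: float):
        self.window = window
        self.last_commit = None
        self.versions = 0

    def maybe_commit(self, now: float) -> bool:
        if self.last_commit is not None and now - self.last_commit < self.window:
            return False  # coalesced: working state absorbs this boundary
        self.last_commit = now
        self.versions += 1
        return True


c = Coalescer(window=1.0)
results = [c.maybe_commit(t) for t in (0.0, 0.3, 0.6, 1.5, 1.8)]
```

Five rapid close() calls collapse to two committed versions.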

5.2 SQLite and page‑oriented databases

SQLite is the most important special case.

Key properties:

  • Writes in fixed‑size pages.
  • Uses WAL mode for concurrency and durability.
  • Checkpoints consolidate WAL into the main DB file.

Design:

  • Track SQLite DBs explicitly (by extension, magic bytes, or configuration).
  • Treat WAL checkpoints as version boundaries:
    • Before checkpoint: DB is at version N.
    • After checkpoint: DB is at version N+1.
  • Optionally, treat WAL segments themselves as versioned objects for finer‑grained time travel.
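Detection by magic bytes is straightforward: every SQLite database file begins with the fixed 16-byte header string "SQLite format 3\0". A sketch (the helper name is illustrative):

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # fixed 16-byte header of every SQLite file


def is_sqlite_db(path: str) -> bool:
    """Detect a SQLite database by magic bytes rather than by extension."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC


# Create a real database; committing a table forces the header to be written.
fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE t (x)")
con.commit()
con.close()
detected = is_sqlite_db(db_path)
os.unlink(db_path)
```

Extension and configuration checks remain useful as cheaper first-pass filters.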

Benefits:

  • Each version is transaction‑consistent.
  • No version spam from individual page writes.
  • Snapshots are cheap: only changed pages create new blocks.

5.3 Streaming and append‑heavy files

Examples: logs, video recording, long‑running data streams.

Problems to avoid:

  • Creating a new version for every append.
  • Keeping infinite history for unbounded streams.

Design:

  • Maintain a live append version:
    • Writes extend the current version’s block_list.
  • Version boundaries:
    • On close()
    • Periodic checkpoints (e.g., every N MB or N seconds)
  • Retention policy:
    • Keep only the last K checkpoints or last T time window.
    • Older versions can be garbage‑collected or compacted.

This keeps the version tree manageable while still enabling time‑bounded rewind.
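A sketch of the size-based variant of this policy, combining checkpointing with bounded retention (class and parameter names are hypothetical):

```python
class StreamPolicy:
    """Cut a checkpoint every `checkpoint_bytes` appended; keep only the
    last `keep` checkpoints, letting older ones become GC candidates."""

    def __init__(self, checkpoint_bytes: int, keep: int):
        self.checkpoint_bytes = checkpoint_bytes
        self.keep = keep
        self.since_checkpoint = 0
        self.checkpoints = []  # version ids, oldest first
        self.next_version = 1

    def append(self, nbytes: int):
        self.since_checkpoint += nbytes
        while self.since_checkpoint >= self.checkpoint_bytes:
            self.since_checkpoint -= self.checkpoint_bytes
            self.checkpoints.append(self.next_version)
            self.next_version += 1
            self.checkpoints = self.checkpoints[-self.keep:]  # retention


p = StreamPolicy(checkpoint_bytes=10, keep=2)
for _ in range(5):
    p.append(7)  # 35 bytes appended -> 3 checkpoints cut, oldest dropped
```

The live append version sits on top of whatever checkpoint is newest.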


6. Namespace and UX

6.1 Path and version syntax

Expose a VMS‑inspired syntax while remaining POSIX‑compatible.

  • Default path: foo.txt → latest version.
  • Explicit version: foo.txt;17
  • By timestamp: foo.txt;timestamp:2026-04-10T23:17Z
  • By label/tag: foo.txt;tag:checkpoint

Internally, these resolve to specific FileVersion objects.

6.2 Tools and introspection

Provide tools to explore history:

  • vls foo.txt → list versions with timestamps, sizes, tags.
  • vcat foo.txt;17 → print specific version.
  • vdiff foo.txt;17 foo.txt;23 → diff two versions.
  • vmeta foo.txt;17 → show metadata and block structure.

These tools make the system explainable and debuggable.


7. Consistency, concurrency, and branching

7.1 Single‑writer per file (v1 assumption)

For simplicity, assume:

  • At any moment, a given file_id has at most one active writer.
  • Concurrent writes from multiple processes are serialized by the filesystem layer.

This matches typical local FS semantics and avoids distributed locking.

7.2 Multi‑device / multi‑replica

When multiple devices modify the same logical file:

  • Each device creates its own FileVersion chain.
  • On sync, if two new versions share the same parent, we have a branch.
  • Branches can be:
    • kept as parallel histories
    • merged via higher‑level tools (e.g., text merge, DB merge)
    • resolved by user choice

The filesystem itself remains neutral: it records divergent histories; it doesn’t auto‑merge semantics.
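Branch detection on sync reduces to finding parents with more than one child. A sketch over (version_id, parent_version) pairs (the helper name is illustrative):

```python
def detect_branches(versions):
    """Return {parent: [children]} for every parent with more than one
    child -- i.e. every point where histories diverged."""
    children = {}
    for vid, parent in versions:
        children.setdefault(parent, []).append(vid)
    return {p: kids for p, kids in children.items() if len(kids) > 1}


# Device A committed "a2" and device B committed "b2", both on top of "1".
history = [("1", None), ("a2", "1"), ("b2", "1")]
branches = detect_branches(history)
```

What to do with a detected branch (keep, merge, ask the user) is left to higher layers.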

7.3 Atomicity and durability

  • FileVersion is either fully committed (all blocks stored, metadata updated) or not visible.
  • Crash during commit:
    • Blocks may be present, but metadata not updated → GC can reclaim or finalize.
  • Durability guarantees depend on:
    • local fsync to block store
    • remote sync policy
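The "fully committed or not visible" property follows from ordering: make blocks durable first, then publish metadata with an atomic rename. A sketch of the metadata half (JSON-on-disk is a stand-in for the real metadata store):

```python
import json
import os
import tempfile


def commit_metadata(meta_path: str, metadata: dict):
    """Commit-by-rename: write new metadata to a temp file, fsync it, then
    atomically rename over the old file. A crash leaves either the old or
    the new metadata visible, never a torn mix."""
    dirname = os.path.dirname(meta_path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(metadata, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, meta_path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise


d = tempfile.mkdtemp()
path = os.path.join(d, "meta.json")
commit_metadata(path, {"file_id": "f1", "latest": 17})
with open(path) as f:
    loaded = json.load(f)
```

If the crash happens before the rename, the orphaned blocks and temp file are exactly what GC reclaims.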

8. Storage, GC, and retention

8.1 Storage growth

Storage grows with:

  • number of unique blocks
  • number of versions
  • retention policy

Because blocks are content‑addressed and shared:

  • Repeated edits that reuse content are cheap.
  • Large files with small changes are efficient (only changed blocks are new).

8.2 Garbage collection

GC operates at the block and version levels.

  • Version GC:

    • Apply retention policies:
      • keep last N versions
      • keep versions newer than T
      • keep tagged versions (e.g., “checkpoint”, “release”)
    • Delete metadata for expired versions.
  • Block GC:

    • Periodically scan for blocks not referenced by any remaining FileVersion.
    • Delete unreferenced blocks from local and/or remote stores.
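Block GC is mark-and-sweep over the version metadata. A dict-backed sketch (names are illustrative; a real scan would stream, not materialize, the reference set):

```python
def gc_blocks(block_store: dict, live_versions):
    """Delete every block not referenced by a surviving FileVersion's
    block_list; return the ids of the blocks that were reclaimed."""
    referenced = set()
    for version in live_versions:
        referenced.update(version["block_list"])
    dead = [bid for bid in block_store if bid not in referenced]
    for bid in dead:
        del block_store[bid]
    return dead


store = {"b1": b"...", "b2": b"...", "b3": b"..."}
live = [{"block_list": ["b1"]}, {"block_list": ["b1", "b3"]}]
removed = gc_blocks(store, live)
```

Version GC must run first, so that expired versions no longer pin their blocks.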

8.3 Tiered storage

Support multiple storage tiers:

  • Hot: local SSD cache for recent blocks and metadata.
  • Warm: remote object store (e.g., S3‑like).
  • Cold: archival storage or compressed packfiles.

Policies can move blocks between tiers based on age, access frequency, or tags.


9. Integration and migration

9.1 Integration with existing tools

  • Git:

    • Map Git commits to snapshots of a directory tree.
    • Store Git objects in the same block store for deduplication.
    • Allow git checkout to be implemented as a cheap namespace projection.
  • Editors and IDEs:

    • Provide an API to tag versions with semantic events (save, build, test).
    • Allow time‑travel debugging by mapping source versions to runtime traces.
  • CI/CD:

    • Pin builds to specific filesystem versions.
    • Reproduce builds by re‑mounting the same versioned tree.

9.2 Migration path

  • Start as a user‑space filesystem (FUSE or equivalent).
  • Allow mounting a directory as versioned storage.
  • Existing applications can run unmodified:
    • SQLite uses the filesystem as usual.
    • Logs and streams write to regular files.

Over time, add:

  • OS‑level integration (kernel module).
  • Tooling and UI for history exploration.
  • Cloud sync and multi‑device support.

10. Open questions and extensions

10.1 Policy configuration

  • How configurable should version boundaries be per file/path?
  • Do we expose a policy language (e.g., “for *.db, version on WAL checkpoint; for logs/, checkpoint every 10 MB”)?

10.2 Semantic tagging

  • How deeply do we integrate with higher‑level semantics (tests, builds, deployments)?
  • Do we treat semantic events as first‑class objects linked to FileVersions?

10.3 Security and encryption

  • Per‑block encryption with keys managed per user or per project.
  • Integrity verification via hashes is already built‑in.

10.4 Observability

  • Expose metrics:
    • version creation rate
    • block deduplication ratio
    • storage usage by tier
  • Provide tracing for:
    • version commit paths
    • sync operations
    • GC cycles

