Vision and design: Versioned filesystem for modern workloads
This is a design for a VMS‑style automatically versioned filesystem reimagined for:
- SQLite and other page‑oriented databases
- streaming and append‑heavy workloads
- cloud‑local hybrid operation
- semantic, provenance‑rich development environments
The core move: version at the block/page layer, not at the “whole file” layer, and expose a VMS‑like versioned namespace as a projection over an immutable, content‑addressable store.
1. Vision
1.1 What we want
A filesystem where:
- Every file has a versioned history by default (like VMS `foo.txt;17`).
- Versions are cheap to create and store, even for large, mutable files.
- SQLite databases and similar workloads get transaction‑consistent snapshots without hacks.
- Streaming and append‑heavy workloads don’t explode the version count.
- The same underlying store can back:
- local development
- cloud sync
- semantic code storage
- time‑travel debugging and provenance
1.2 Core principles
- Immutability at the block level: once written, blocks never change.
- Copy‑on‑write structure: new versions share blocks with old ones.
- Semantic version boundaries: versioning is aligned with meaningful events (fsync, close, WAL checkpoint, explicit commit), not every write.
- Content addressability: blocks are identified by hash, enabling deduplication and integrity.
- Namespace as projection: the “filesystem” is a view over a versioned object graph, not the primary storage primitive.
2. Goals and non‑goals
2.1 Goals
- Automatic versioning: every file has a history without application changes.
- SQLite‑friendly: support efficient, consistent snapshots of SQLite DBs and similar page‑oriented stores.
- Streaming‑friendly: support long‑lived, append‑heavy files without pathological version growth.
- Cloud‑ready: support remote block storage, local caching, and multi‑device sync.
- Debuggable and explainable: versioning rules and boundaries are explicit and inspectable.
2.2 Non‑goals (for v1)
- Not a full POSIX replacement: we target a FUSE‑style or OS‑integrated filesystem, but we can initially accept some edge‑case incompatibilities.
- Not a distributed consensus system: multi‑writer concurrency is handled via version branching and merge semantics, not strong global transactions.
- Not a full Git replacement: we can integrate with Git, but we’re not re‑implementing its UX.
3. Core concepts
3.1 Block store
Primitive: fixed‑size blocks (e.g., 4 KB or 8 KB).
- ID: `block_id = hash(block_contents)`
- Properties:
- immutable
- content‑addressed
- deduplicated
- stored in local cache + optional remote store
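The block-store contract above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real implementation; the class and method names are made up for this sketch:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks, per the design

class BlockStore:
    """Immutable, content-addressed block store (in-memory sketch)."""

    def __init__(self):
        self._blocks = {}  # block_id -> bytes

    def put(self, data: bytes) -> str:
        """Store a block; identical contents deduplicate to one entry."""
        block_id = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(block_id, data)  # never overwrite: immutable
        return block_id

    def get(self, block_id: str) -> bytes:
        return self._blocks[block_id]

store = BlockStore()
a = store.put(b"hello" * 100)
b = store.put(b"hello" * 100)  # same contents -> same id, no extra storage
assert a == b and len(store._blocks) == 1
```

Because the ID is the hash of the contents, deduplication and integrity checking fall out of the addressing scheme for free.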
3.2 File version
A file version is a logical object:
```
FileVersion {
  file_id         // stable identity for the logical file
  version_id      // monotonically increasing or hash-based
  parent_version  // optional, for history/branching
  path_at_time    // path in the namespace when this version was created
  block_list      // ordered list of block_ids
  size            // byte length
  metadata        // timestamps, permissions, etc.
  tags            // optional semantic labels (e.g., "checkpoint", "autosave")
}
```
Multiple FileVersions share blocks via block_list.
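Block sharing between versions can be made concrete with a small sketch (the `FileVersion` dataclass here is an illustrative rendering of the structure above, not a spec):

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

def block_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class FileVersion:
    file_id: str
    version_id: int
    parent_version: Optional[int]
    block_list: List[str]
    size: int
    tags: List[str] = field(default_factory=list)

# Two versions of a three-block file where only the middle block changed:
v1_blocks = [block_id(b"A" * 4096), block_id(b"B" * 4096), block_id(b"C" * 4096)]
v2_blocks = [v1_blocks[0], block_id(b"B!" + b"B" * 4094), v1_blocks[2]]

v1 = FileVersion("f1", 1, None, v1_blocks, 3 * 4096)
v2 = FileVersion("f1", 2, 1, v2_blocks, 3 * 4096, tags=["checkpoint"])

# Two of the three blocks are shared; only the changed block costs storage.
assert len(set(v1.block_list) & set(v2.block_list)) == 2
```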
3.3 Namespace
The live filesystem view maps paths to a current version:
`Path -> FileVersion(version = "latest")`
Historical versions are accessible via extended syntax, e.g.:
- `foo.txt;17`
- `foo.txt;latest`
- `foo.txt;timestamp:2026-04-10T23:17Z`
- `foo.txt;version:abc123`
Internally, this is just a lookup into the version metadata store.
3.4 Version boundary
A version boundary is an event that causes a new FileVersion to be committed:
- `close()`
- `fsync()`
- SQLite WAL checkpoint
- explicit ioctl / API call
- time‑based or size‑based thresholds (e.g., at most 1 version per second per file)
Between boundaries, writes mutate an uncommitted working state (in memory or temp structures) that is not yet a committed version.
4. Architecture
4.1 High‑level components
Block store
- Local block cache (disk)
- Optional remote block store (cloud)
- Content‑addressable, immutable
Metadata store
- File identities (`file_id`)
- File versions (`FileVersion`)
- Directory structure and path mapping
- Indices for lookup by path, time, tags
Versioned filesystem layer
- Implements POSIX‑like operations
- Maintains working state for open files
- Decides when to commit new versions
Sync and replication layer
- Push/pull blocks and metadata to/from remote
- Conflict detection and version branching
Semantic layer (optional, higher level)
- Tags versions with semantic events (e.g., “test passed”, “build succeeded”)
- Integrates with tools like Git, CI, editors, agents
4.2 Data flow: write path
1. Application opens `foo.db`.
2. Filesystem resolves `file_id` and the current `FileVersion`.
3. Writes go to a working file state:
   - in memory or a temp file
   - tracked as a list of modified blocks
4. On a version boundary (e.g., `fsync`, WAL checkpoint, `close`):
   - Compute hashes for modified blocks.
   - Store new blocks in the block store (if not already present).
   - Construct a new `FileVersion` with an updated `block_list`.
   - Update the namespace mapping for `foo.db` → new version.
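The commit step of the write path above can be sketched as follows. This is a simplified model (a plain dict stands in for the block store, and dirty pages are tracked by block index):

```python
import hashlib

BLOCK = 4096
store = {}  # block_id -> bytes (stand-in for the block store)

def commit(old_block_list, dirty_pages):
    """Build a new block_list, hashing and storing only modified blocks.

    dirty_pages maps block index -> new bytes; untouched indices reuse
    the old block ids unchanged, so unmodified data costs nothing.
    """
    new_list = list(old_block_list)
    for idx, data in dirty_pages.items():
        bid = hashlib.sha256(data).hexdigest()
        store.setdefault(bid, data)  # dedup: skip if already present
        new_list[idx] = bid
    return new_list

# Version N has 4 blocks; the app rewrites block 2, then hits a boundary.
old = [hashlib.sha256(bytes([i]) * BLOCK).hexdigest() for i in range(4)]
new = commit(old, {2: b"\xff" * BLOCK})
assert new[2] != old[2] and new[0] == old[0]  # only the dirty block changed
```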
4.3 Data flow: read path
1. Application opens `foo.db` (or `foo.db;17`).
2. Filesystem resolves the requested `FileVersion`.
3. Reads map file offsets to block indices.
4. Blocks are fetched from:
   - the local block cache, if present
   - otherwise the remote store, then cached locally
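The offset-to-block mapping in step 3 is just integer division by the block size. A minimal sketch (the `fetch` callable stands in for the cache-then-remote lookup):

```python
BLOCK = 4096

def read(block_list, fetch, offset, length):
    """Map a byte range onto blocks and stitch the result together."""
    out = b""
    while length > 0:
        idx, within = divmod(offset, BLOCK)          # which block, where in it
        take = min(length, BLOCK - within)           # stop at block boundary
        chunk = fetch(block_list[idx])[within:within + take]
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return out

blocks = {"b0": b"a" * BLOCK, "b1": b"b" * BLOCK}
data = read(["b0", "b1"], blocks.__getitem__, 4090, 10)  # spans two blocks
assert data == b"a" * 6 + b"b" * 4
```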
5. Versioning semantics by workload
5.1 Regular files (text, code, configs)
Default behavior:
- New version on:
  - `close()`
  - `fsync()`
  - explicit versioning call
- Optional throttling:
- If a file is opened/closed rapidly, coalesce versions within a time window.
This gives a natural history of edits without overwhelming storage.
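The coalescing rule reduces to a time-window check at each would-be boundary. A sketch, with an illustrative one-second window:

```python
COALESCE_WINDOW = 1.0  # seconds; an assumed default, tunable per policy

def commit_or_coalesce(last_commit_time, now, window=COALESCE_WINDOW):
    """Fold rapid open/close cycles into the previous version."""
    if now - last_commit_time >= window:
        return "new-version"   # enough time has passed: commit a version
    return "coalesce"          # too soon: update the working state in place

assert commit_or_coalesce(10.0, 10.2) == "coalesce"
assert commit_or_coalesce(10.0, 11.5) == "new-version"
```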
5.2 SQLite and page‑oriented databases
SQLite is special and important.
Key properties:
- Writes in fixed‑size pages.
- Uses WAL mode for concurrency and durability.
- Checkpoints consolidate WAL into the main DB file.
Design:
- Track SQLite DBs explicitly (by extension, magic bytes, or configuration).
- Treat WAL checkpoints as version boundaries:
- Before checkpoint: DB is at version N.
- After checkpoint: DB is at version N+1.
- Optionally, treat WAL segments themselves as versioned objects for finer‑grained time travel.
Benefits:
- Each version is transaction‑consistent.
- No version spam from individual page writes.
- Snapshots are cheap: only changed pages create new blocks.
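The "only changed pages create new blocks" claim can be checked with a toy model: hash each page as the block store would, and count how many pages of the post-checkpoint DB are not already stored. Names and page contents here are illustrative:

```python
import hashlib

PAGE = 4096

def snapshot_cost(old_pages, new_pages):
    """Count how many pages of a checkpointed DB need new blocks."""
    old_ids = {hashlib.sha256(p).hexdigest() for p in old_pages}
    return sum(hashlib.sha256(p).hexdigest() not in old_ids for p in new_pages)

db_v1 = [bytes([i]) * PAGE for i in range(100)]   # 100-page database
db_v2 = list(db_v1)
db_v2[7] = b"\x99" + db_v1[7][1:]                 # checkpoint touched one page

assert snapshot_cost(db_v1, db_v2) == 1           # one new block, not 100
```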
5.3 Streaming and append‑heavy files
Examples: logs, video recording, long‑running data streams.
Problems to avoid:
- Creating a new version for every append.
- Keeping infinite history for unbounded streams.
Design:
- Maintain a live append version:
  - Writes extend the current version's `block_list`.
- Version boundaries:
  - on `close()`
  - periodic checkpoints (e.g., every N MB or N seconds)
- Retention policy:
- Keep only the last K checkpoints or last T time window.
- Older versions can be garbage‑collected or compacted.
This keeps the version tree manageable while still enabling time‑bounded rewind.
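The checkpoint and retention policies above can be sketched as two small functions. The thresholds and the tuple representation of versions are assumptions for the sketch:

```python
def should_checkpoint(bytes_since, seconds_since,
                      max_bytes=64 * 2**20, max_seconds=60):
    """Size/time thresholds for append-heavy files (values illustrative)."""
    return bytes_since >= max_bytes or seconds_since >= max_seconds

def retain(versions, keep_last=3, keep_tags=("checkpoint",)):
    """Keep the last K versions, but never drop tagged ones.

    Each version is (version_id, tags) with tags as a tuple.
    """
    keep = set(versions[-keep_last:])
    keep |= {v for v in versions if any(t in v[1] for t in keep_tags)}
    return [v for v in versions if v in keep]

versions = [(1, ("checkpoint",)), (2, ()), (3, ()), (4, ()), (5, ())]
survivors = retain(versions)
assert survivors == [(1, ("checkpoint",)), (3, ()), (4, ()), (5, ())]
```

Versions dropped by `retain` become candidates for the GC described in section 8.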
6. Namespace and UX
6.1 Path and version syntax
Expose a VMS‑inspired syntax while remaining POSIX‑compatible.
- Default path: `foo.txt` → latest version.
- Explicit version: `foo.txt;17`
- By timestamp: `foo.txt;@2026-04-10T23:17Z`
- By label/tag: `foo.txt;tag:checkpoint`
Internally, these resolve to specific FileVersion objects.
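Parsing the selector syntax above is straightforward. A sketch (the function name and the returned triple are illustrative, not a defined API):

```python
def resolve_selector(path_spec: str):
    """Split a VMS-style path spec into (path, selector kind, value)."""
    if ";" not in path_spec:
        return path_spec, "latest", None          # plain path means latest
    path, sel = path_spec.rsplit(";", 1)
    if sel == "latest":
        return path, "latest", None
    if sel.startswith("@"):
        return path, "timestamp", sel[1:]
    if sel.startswith("tag:"):
        return path, "tag", sel[4:]
    return path, "version", int(sel)              # numeric version selector

assert resolve_selector("foo.txt") == ("foo.txt", "latest", None)
assert resolve_selector("foo.txt;17") == ("foo.txt", "version", 17)
assert resolve_selector("foo.txt;tag:checkpoint") == ("foo.txt", "tag", "checkpoint")
```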
6.2 Tools and introspection
Provide tools to explore history:
- `vls foo.txt` → list versions with timestamps, sizes, tags.
- `vcat foo.txt;17` → print a specific version.
- `vdiff foo.txt;17 foo.txt;23` → diff two versions.
- `vmeta foo.txt;17` → show metadata and block structure.
These tools make the system explainable and debuggable.
7. Consistency, concurrency, and branching
7.1 Single‑writer per file (v1 assumption)
For simplicity, assume:
- At any moment, a given `file_id` has at most one active writer.
- Concurrent writes from multiple processes are serialized by the filesystem layer.
This matches typical local FS semantics and avoids distributed locking.
7.2 Multi‑device / multi‑replica
When multiple devices modify the same logical file:
- Each device creates its own `FileVersion` chain.
- On sync, if two new versions share the same parent, we have a branch.
- Branches can be:
- kept as parallel histories
- merged via higher‑level tools (e.g., text merge, DB merge)
- resolved by user choice
The filesystem itself remains neutral: it records divergent histories; it doesn’t auto‑merge semantics.
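The branch-detection rule during sync can be stated in a few lines. This sketch assumes a linear, monotonically numbered chain per device (the simplest case; hash-based version ids would compare parent pointers the same way):

```python
def detect_branch(local, remote):
    """Classify two version heads. Each head is (version_id, parent_version).

    Same head: already in sync. Same parent but different heads: a branch
    (recorded as divergent histories, never auto-merged). Otherwise one
    side is strictly ahead and can fast-forward.
    """
    (lv, lp), (rv, rp) = local, remote
    if lv == rv:
        return "in-sync"
    if lp == rp:
        return "branch"
    return "fast-forward"

assert detect_branch((18, 17), (18, 17)) == "in-sync"
assert detect_branch((18, 17), (19, 17)) == "branch"      # divergence recorded
assert detect_branch((19, 18), (18, 17)) == "fast-forward"
```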
7.3 Atomicity and durability
- A `FileVersion` is either fully committed (all blocks stored, metadata updated) or not visible.
- Crash during commit:
- Blocks may be present, but metadata not updated → GC can reclaim or finalize.
- Durability guarantees depend on:
- local fsync to block store
- remote sync policy
8. Storage, GC, and retention
8.1 Storage growth
Storage grows with:
- number of unique blocks
- number of versions
- retention policy
Because blocks are content‑addressed and shared:
- Repeated edits that reuse content are cheap.
- Large files with small changes are efficient (only changed blocks are new).
8.2 Garbage collection
GC operates at the block and version levels.
Version GC:
- Apply retention policies:
  - keep the last N versions
  - keep versions newer than T
  - keep tagged versions (e.g., "checkpoint", "release")
- Delete metadata for expired versions.
Block GC:
- Periodically scan for blocks not referenced by any remaining `FileVersion`.
- Delete unreferenced blocks from local and/or remote stores.
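Block GC is a mark-and-sweep over the surviving versions' block lists. A minimal sketch (a dict stands in for the block store; real GC would also have to guard against versions committed mid-scan):

```python
def gc_blocks(surviving_versions, blocks):
    """Mark-and-sweep: drop blocks unreferenced by any surviving version.

    surviving_versions: list of block_lists after version GC.
    blocks: dict block_id -> bytes, mutated in place.
    """
    live = {bid for block_list in surviving_versions for bid in block_list}
    for bid in list(blocks):        # copy keys: we delete while iterating
        if bid not in live:
            del blocks[bid]
    return blocks

blocks = {"b1": b"..", "b2": b"..", "b3": b".."}
gc_blocks([["b1", "b2"], ["b2"]], blocks)   # the version holding b3 expired
assert set(blocks) == {"b1", "b2"}
```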
8.3 Tiered storage
Support multiple storage tiers:
- Hot: local SSD cache for recent blocks and metadata.
- Warm: remote object store (e.g., S3‑like).
- Cold: archival storage or compressed packfiles.
Policies can move blocks between tiers based on age, access frequency, or tags.
9. Integration and migration
9.1 Integration with existing tools
Git:
- Map Git commits to snapshots of a directory tree.
- Store Git objects in the same block store for deduplication.
- Allow `git checkout` to be implemented as a cheap namespace projection.
Editors and IDEs:
- Provide an API to tag versions with semantic events (save, build, test).
- Allow time‑travel debugging by mapping source versions to runtime traces.
CI/CD:
- Pin builds to specific filesystem versions.
- Reproduce builds by re‑mounting the same versioned tree.
9.2 Migration path
- Start as a user‑space filesystem (FUSE or equivalent).
- Allow mounting a directory as versioned storage.
- Existing applications can run unmodified:
- SQLite uses the filesystem as usual.
- Logs and streams write to regular files.
Over time, add:
- OS‑level integration (kernel module).
- Tooling and UI for history exploration.
- Cloud sync and multi‑device support.
10. Open questions and extensions
10.1 Policy configuration
- How configurable should version boundaries be per file/path?
- Do we expose a policy language (e.g., "for `*.db`, version on WAL checkpoint; for `logs/`, checkpoint every 10 MB")?
10.2 Semantic tagging
- How deeply do we integrate with higher‑level semantics (tests, builds, deployments)?
- Do we treat semantic events as first-class objects linked to `FileVersion`s?
10.3 Security and encryption
- Per‑block encryption with keys managed per user or per project.
- Integrity verification via hashes is already built‑in.
10.4 Observability
- Expose metrics:
- version creation rate
- block deduplication ratio
- storage usage by tier
- Provide tracing for:
- version commit paths
- sync operations
- GC cycles