Project document: Stackless CPS engine for Haskell
1. Vision and goals
Vision:
Build a stackless CPS engine written in Haskell that:
- Executes Haskell programs via a Core→CPS IR pipeline.
- Has no semantic call–return: only one active frame; all “future” is heap data.
- Treats control flow as data: continuations are first-class, mutable, cloneable, serializable.
- Enables:
- hot-swap and version routing,
- shadow execution (fork futures),
- replay / time travel,
- semantic drift governance,
- rich introspection and lore.
Key separation:
- Host Haskell (GHC): stackful, classic call–return, used to implement the engine and tooling.
- Guest Haskell: compiled to a Core-like IR, executed by the CPS engine with stackless semantics.
2. Architecture overview
2.1 High-level pipeline
- Haskell source (guest)
- GHC front-end
- parse, rename, typecheck
- desugar to GHC Core
- run Core-to-Core optimizations
- Core extraction (plugin/backend)
- capture Core AST
- translate to EngineCore (our IR)
- CPS engine (host Haskell)
- interpret EngineCore using heap-resident continuations
- manage scheduling, effects, hot-swap, shadow execution, replay
2.2 Components
GHC plugin / custom backend
- Intercepts Core.
- Serializes or passes it directly to the engine.
EngineCore IR
- Small, typed lambda calculus with ADTs and case.
- Designed for CPS interpretation.
CPS runtime
- Machine state (expr, env, cont, meta).
- Step function (small-step interpreter).
- Engine loop (schedule–resume, no semantic stack).
Meta layer
- Versioning, tracing, effect routing, shadow execution hooks.
3. Front-end: GHC integration
3.1 Core interception via plugin
A GHC plugin that captures Core after optimization:
module Plugin (plugin) where
import GhcPlugins
plugin :: Plugin
plugin = defaultPlugin { installCoreToDos = install }
install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
install _ todos = do
modGuts <- getModGuts
let binds = mg_binds modGuts
liftIO $ writeFile "program.core" (show binds)
-- Option A: stop compilation here
pure []
Later iteration: instead of writing to a file, directly call into the engine or serialize a structured format.
3.2 Core → EngineCore translation
Define a Core-like IR:
type Name = String
data EngineExpr
= EVar Name
| ELam Name EngineExpr
| EApp EngineExpr EngineExpr
| ELet Name EngineExpr EngineExpr
| EInt Int
| ECase EngineExpr [Alt]
| EConstr Name [EngineExpr]
deriving (Show)
data Alt = Alt ConName [Name] EngineExpr
type ConName = String
Translator skeleton:
ghcCoreToEngine :: GHC.CoreExpr -> EngineExpr
ghcCoreToEngine = go
where
go (Var v) = EVar (occNameString (getOccName v))
go (Lit (MachInt n)) = EInt (fromIntegral n)
go (Lam b e) = ELam (occNameString (getOccName b)) (go e)
go (App f a) = EApp (go f) (go a)
go (Let (NonRec b rhs) body) =
ELet (occNameString (getOccName b)) (go rhs) (go body)
go (Case scrut b _ alts) =
ECase (go scrut) (map goAlt alts)
go _ = error "unsupported Core form (first pass)"
goAlt (DataAlt dc, bs, rhs) =
Alt (occNameString (getOccName dc))
(map (occNameString . getOccName) bs)
(go rhs)
goAlt _ = error "unsupported alt"
4. CPS engine: core runtime model
4.1 Values, environments, continuations
data Value
= VInt Int
| VClosure Name EngineExpr Env
| VConstr ConName [Value]
deriving (Show)
type Env = [(Name, Value)]
data Frame
= FAppFun EngineExpr Env
| FAppArg Value
| FLet Name EngineExpr Env
| FCase [Alt] Env
deriving (Show)
type Cont = [Frame] -- head = next frame
4.2 Meta state
data VersionId = VersionId String
data TraceId = TraceId String
data MetaState = MetaState
{ msVersion :: VersionId
, msTrace :: TraceId
-- hooks for logging, lore, etc.
}
4.3 Machine state
data MachineState = MachineState
{ msExpr :: EngineExpr
, msEnv :: Env
, msCont :: Cont
, msMeta :: MetaState
}
Exactly one semantic frame is active: (msExpr, msEnv).
All other “future work” is in msCont on the heap.
5. CPS interpreter loop
5.1 Engine monad
newtype Engine a = Engine { runEngine :: IO a }
deriving (Functor, Applicative, Monad, MonadIO)
5.2 Step function
step :: MachineState -> Engine MachineState
step st@MachineState{ msExpr = expr, msEnv = env, msCont = cont } =
case expr of
EVar x ->
case lookup x env of
Just v -> continueValue v cont (msMeta st)
Nothing -> error ("unbound variable: " ++ x)
ELam x body ->
continueValue (VClosure x body env) cont (msMeta st)
EApp f a ->
pure st { msExpr = f
, msCont = FAppFun a env : cont
}
ELet x e body ->
pure st { msExpr = e
, msCont = FLet x body env : cont
}
EInt n ->
continueValue (VInt n) cont (msMeta st)
EConstr c es ->
-- naive: evaluate args left-to-right
case es of
[] -> continueValue (VConstr c []) cont (msMeta st)
(e0:rest) ->
pure st { msExpr = e0
, msCont = FAppFun (EConstr c rest) env : cont
}
ECase scrut alts ->
pure st { msExpr = scrut
, msCont = FCase alts env : cont
}
5.3 Continuation handling
continueValue :: Value -> Cont -> MetaState -> Engine MachineState
continueValue v [] meta =
pure $ MachineState (EInt 0) [] [] meta -- “done” marker
continueValue v (frame:rest) meta =
case frame of
FAppFun arg env ->
pure $ MachineState arg env (FAppArg v : rest) meta
FAppArg funVal ->
case funVal of
VClosure x body cloEnv ->
let env' = (x, v) : cloEnv
in pure $ MachineState body env' rest meta
_ ->
error "attempt to call non-function"
FLet x body env ->
let env' = (x, v) : env
in pure $ MachineState body env' rest meta
FCase alts env ->
matchCase v alts env rest meta
5.4 Case matching
matchCase :: Value -> [Alt] -> Env -> Cont -> MetaState -> Engine MachineState
matchCase v alts env cont meta =
case v of
VConstr c vs ->
case findAlt c alts of
Just (Alt _ names body) ->
let env' = zip names vs ++ env
in pure $ MachineState body env' cont meta
Nothing ->
error "no matching constructor"
_ ->
error "case on non-constructor"
findAlt :: ConName -> [Alt] -> Maybe Alt
findAlt c = find (\(Alt c' _ _) -> c' == c)
5.5 Run loop
runMachine :: MachineState -> Engine Value
runMachine st0 = loop st0
where
loop st = do
st' <- step st
case msCont st' of
[] -> extractResult st'
_ -> loop st'
extractResult :: MachineState -> Engine Value
extractResult MachineState{ msExpr = expr, msEnv = env } =
case expr of
EInt n -> pure (VInt n)
_ -> error "non-value at termination"
6. Advanced features: where the CPS model shines
Because continuations and machine states are heap data, we can:
6.1 Hot-swap / version routing
- Store
VersionIdinMetaState. - At each
step, consult a version router to decide:- which implementation of primitives to use,
- which function body to dispatch to (e.g., by name + version).
Hook point: inside step or continueValue, before executing a function body.
6.2 Shadow execution
- Clone
MachineState(deep copy ofmsExpr,msEnv,msCont,msMeta). - Run:
stAunder version A,stBunder version B.
- Compare results, logs, or effects.
Shadow execution is “fork the future,” not “clone a stack.”
6.3 Replay / time travel
- Periodically snapshot
MachineState(or log deltas). - To replay:
- restore a snapshot,
- run forward under new code or different version.
6.4 Semantic drift governance
- When semantics change, traverse
msContand:- rewrite frames (e.g., rename functions, adjust environments),
- migrate data representations,
- annotate frames with drift metadata.
Because frames are just data, you can patch history.
7. Implementation roadmap
Phase 1: Minimal engine
- Implement
EngineExpr,Value,Env,Frame,Cont,MachineState. - Implement
step,continueValue,runMachine. - Hand-construct small
EngineExprprograms and run them.
- Implement
Phase 2: GHC integration
- Write a GHC plugin that dumps Core.
- Implement
ghcCoreToEnginefor a small subset (ints, lambdas, apps, lets). - Run simple Haskell programs on the engine.
Phase 3: Extend Core coverage
- Add ADTs, pattern matching, recursion, basic primitives.
- Support more Core constructs (case, letrec, etc.).
Phase 4: Meta and effects
- Add
MetaStatewith versioning and tracing. - Introduce an
Effectlayer (IO-like operations via explicit effect nodes). - Implement effect routing in the engine.
- Add
Phase 5: Advanced semantics
- Hot-swap: version routing for functions and primitives.
- Shadow execution: state cloning and dual runs.
- Replay: checkpoints and re-runs.
- Drift governance: continuation rewriting tools.
Phase 6: Tooling and ergonomics
- CLI or library to compile Haskell → EngineCore → run.
- Debug views: pretty-print continuations, traces, versions.
- Optional: GHC plugin that directly invokes the engine for
main.
This gives you a full, end-to-end story:
- Haskell in → GHC front-end → Core → EngineCore → CPS engine → stackless, mutable-history execution.
ADDENDUM
🌟 The “Big Five” features your runtime truly depends on
These are the features that make your hot‑swap / replay / shadow‑execution / version‑routing runtime easy instead of a nightmare of hacks.
1. Semantic closures
Not just syntactic closures — but closures that:
- capture lexical environments predictably
- can be reified or at least reasoned about
- behave deterministically
- don’t hide mutable references in surprising ways
This is the foundation of your entire model.
2. Purity (or at least effect isolation)
You don’t need religious purity.
You need semantic purity:
- deterministic functions
- explicit effects
- replayable computations
- serializable state transitions
- no hidden side effects
This is what makes:
- shadow execution
- hot‑swap
- drift detection
- replay
- time travel
- version routing
…possible without black magic.
3. First‑class continuations (or CPS‑friendly semantics)
Your runtime is fundamentally:
- continuation‑driven
- checkpoint‑driven
- resumable
- hot‑swappable
- replayable
You don’t need full call/cc, but you do need:
- the ability to reify the “rest of the computation”
- or to structure the runtime so that the stack is already in the heap
This is why your “no real stack, only semantic frames” idea is so powerful.
4. Structured concurrency / actor‑style isolation
Your runtime wants:
- tasks as independent semantic units
- message passing
- supervision trees
- restartability
- deterministic scheduling (or at least observable scheduling)
This is what makes:
- shadow execution of tasks
- version‑routed execution
- hot‑swap of individual tasks
- replay of task histories
…tractable.
5. A type system that can express effects, capabilities, or at least “what this function touches”
This doesn’t have to be Haskell‑level purity.
It can be:
- algebraic effects
- effect rows
- capabilities
- linear types
- region types
- ownership types
- or even a disciplined annotation system
But you need some way to know:
- what a function reads
- what it writes
- what it depends on
- what it captures
- what it can be safely swapped with
This is essential for:
- drift detection
- safe hot‑swap
- shadow execution
- replay correctness
- version routing
🌱 Secondary but still important features
These aren’t strictly required, but they make your life dramatically easier.
6. Immutable data structures as the default
Because:
- snapshots become trivial
- diffs become trivial
- replay becomes trivial
- shadow execution becomes trivial
7. Pattern matching / algebraic data types
Because your runtime manipulates:
- semantic frames
- effect logs
- versioned closures
- task states
…and ADTs make this clean.
8. Tail‑call friendliness
Not for performance —
but because your runtime is fundamentally CPS‑shaped.
9. A runtime that doesn’t hide too much
Languages like Java, Go, and Python hide:
- stack frames
- scheduling
- GC timing
- object layout
- closure representation
This makes your runtime harder to implement.
Languages like:
- Haskell
- OCaml
- Scheme
- Racket
- Clojure
- Rust (with some work)
…give you more control.
🎯 The crisp answer to your question
If closures were the only requirement, you could port the runtime anywhere.
But closures are only 1 of the 5 essential features.
The full set is:
- Semantic closures
- Purity / effect isolation
- Continuation‑friendly semantics
- Structured concurrency
- Effect‑expressive type system
These five together make your runtime natural instead of forced.
No comments:
Post a Comment