Sunday, February 8, 2026

cps ftw

license: public domain CC0


Project document: Stackless CPS engine for Haskell

1. Vision and goals

Vision:
Build a stackless CPS engine written in Haskell that:

  • Executes Haskell programs via a Core→CPS IR pipeline.
  • Has no semantic call–return: only one active frame; all “future” is heap data.
  • Treats control flow as data: continuations are first-class, mutable, cloneable, serializable.
  • Enables:
    • hot-swap and version routing,
    • shadow execution (fork futures),
    • replay / time travel,
    • semantic drift governance,
    • rich introspection and lore.

Key separation:

  • Host Haskell (GHC): stackful, classic call–return, used to implement the engine and tooling.
  • Guest Haskell: compiled to a Core-like IR, executed by the CPS engine with stackless semantics.

2. Architecture overview

2.1 High-level pipeline

  1. Haskell source (guest)
  2. GHC front-end
    • parse, rename, typecheck
    • desugar to GHC Core
    • run Core-to-Core optimizations
  3. Core extraction (plugin/backend)
    • capture Core AST
    • translate to EngineCore (our IR)
  4. CPS engine (host Haskell)
    • interpret EngineCore using heap-resident continuations
    • manage scheduling, effects, hot-swap, shadow execution, replay

2.2 Components

  • GHC plugin / custom backend

    • Intercepts Core.
    • Serializes or passes it directly to the engine.
  • EngineCore IR

    • Small, typed lambda calculus with ADTs and case.
    • Designed for CPS interpretation.
  • CPS runtime

    • Machine state (expr, env, cont, meta).
    • Step function (small-step interpreter).
    • Engine loop (schedule–resume, no semantic stack).
  • Meta layer

    • Versioning, tracing, effect routing, shadow execution hooks.

3. Front-end: GHC integration

3.1 Core interception via plugin

A GHC plugin that captures Core after optimization:

module Plugin (plugin) where

import GhcPlugins

plugin :: Plugin
plugin = defaultPlugin { installCoreToDos = install }

install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
install _ todos = do
  modGuts <- getModGuts
  let binds = mg_binds modGuts
  liftIO $ writeFile "program.core" (show binds)
  -- Option A: stop compilation here
  pure []

Later iteration: instead of writing to a file, directly call into the engine or serialize a structured format.

3.2 Core → EngineCore translation

Define a Core-like IR:

type Name = String

data EngineExpr
  = EVar Name
  | ELam Name EngineExpr
  | EApp EngineExpr EngineExpr
  | ELet Name EngineExpr EngineExpr
  | EInt Int
  | ECase EngineExpr [Alt]
  | EConstr Name [EngineExpr]
  deriving (Show)

data Alt = Alt ConName [Name] EngineExpr
type ConName = String

Translator skeleton:

ghcCoreToEngine :: GHC.CoreExpr -> EngineExpr
ghcCoreToEngine = go
  where
    go (Var v)      = EVar (occNameString (getOccName v))
    go (Lit (MachInt n)) = EInt (fromIntegral n)
    go (Lam b e)    = ELam (occNameString (getOccName b)) (go e)
    go (App f a)    = EApp (go f) (go a)
    go (Let (NonRec b rhs) body) =
      ELet (occNameString (getOccName b)) (go rhs) (go body)
    go (Case scrut b _ alts) =
      ECase (go scrut) (map goAlt alts)
    go _ = error "unsupported Core form (first pass)"

    goAlt (DataAlt dc, bs, rhs) =
      Alt (occNameString (getOccName dc))
          (map (occNameString . getOccName) bs)
          (go rhs)
    goAlt _ = error "unsupported alt"

4. CPS engine: core runtime model

4.1 Values, environments, continuations

data Value
  = VInt Int
  | VClosure Name EngineExpr Env
  | VConstr ConName [Value]
  deriving (Show)

type Env = [(Name, Value)]

data Frame
  = FAppFun EngineExpr Env
  | FAppArg Value
  | FLet Name EngineExpr Env
  | FCase [Alt] Env
  deriving (Show)

type Cont = [Frame]  -- head = next frame

4.2 Meta state

data VersionId = VersionId String
data TraceId   = TraceId String

data MetaState = MetaState
  { msVersion :: VersionId
  , msTrace   :: TraceId
  -- hooks for logging, lore, etc.
  }

4.3 Machine state

data MachineState = MachineState
  { msExpr :: EngineExpr
  , msEnv  :: Env
  , msCont :: Cont
  , msMeta :: MetaState
  }

Exactly one semantic frame is active: (msExpr, msEnv).
All other “future work” is in msCont on the heap.


5. CPS interpreter loop

5.1 Engine monad

newtype Engine a = Engine { runEngine :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO)

5.2 Step function

step :: MachineState -> Engine MachineState
step st@MachineState{ msExpr = expr, msEnv = env, msCont = cont } =
  case expr of
    EVar x ->
      case lookup x env of
        Just v  -> continueValue v cont (msMeta st)
        Nothing -> error ("unbound variable: " ++ x)

    ELam x body ->
      continueValue (VClosure x body env) cont (msMeta st)

    EApp f a ->
      pure st { msExpr = f
              , msCont = FAppFun a env : cont
              }

    ELet x e body ->
      pure st { msExpr = e
              , msCont = FLet x body env : cont
              }

    EInt n ->
      continueValue (VInt n) cont (msMeta st)

    EConstr c es ->
      -- naive: evaluate args left-to-right
      case es of
        [] -> continueValue (VConstr c []) cont (msMeta st)
        (e0:rest) ->
          pure st { msExpr = e0
                  , msCont = FAppFun (EConstr c rest) env : cont
                  }

    ECase scrut alts ->
      pure st { msExpr = scrut
              , msCont = FCase alts env : cont
              }

5.3 Continuation handling

continueValue :: Value -> Cont -> MetaState -> Engine MachineState
continueValue v [] meta =
  pure $ MachineState (EInt 0) [] [] meta  -- “done” marker

continueValue v (frame:rest) meta =
  case frame of
    FAppFun arg env ->
      pure $ MachineState arg env (FAppArg v : rest) meta

    FAppArg funVal ->
      case funVal of
        VClosure x body cloEnv ->
          let env' = (x, v) : cloEnv
          in pure $ MachineState body env' rest meta
        _ ->
          error "attempt to call non-function"

    FLet x body env ->
      let env' = (x, v) : env
      in pure $ MachineState body env' rest meta

    FCase alts env ->
      matchCase v alts env rest meta

5.4 Case matching

matchCase :: Value -> [Alt] -> Env -> Cont -> MetaState -> Engine MachineState
matchCase v alts env cont meta =
  case v of
    VConstr c vs ->
      case findAlt c alts of
        Just (Alt _ names body) ->
          let env' = zip names vs ++ env
          in pure $ MachineState body env' cont meta
        Nothing ->
          error "no matching constructor"
    _ ->
      error "case on non-constructor"

findAlt :: ConName -> [Alt] -> Maybe Alt
findAlt c = find (\(Alt c' _ _) -> c' == c)

5.5 Run loop

runMachine :: MachineState -> Engine Value
runMachine st0 = loop st0
  where
    loop st = do
      st' <- step st
      case msCont st' of
        [] -> extractResult st'
        _  -> loop st'

extractResult :: MachineState -> Engine Value
extractResult MachineState{ msExpr = expr, msEnv = env } =
  case expr of
    EInt n -> pure (VInt n)
    _      -> error "non-value at termination"

6. Advanced features: where the CPS model shines

Because continuations and machine states are heap data, we can:

6.1 Hot-swap / version routing

  • Store VersionId in MetaState.
  • At each step, consult a version router to decide:
    • which implementation of primitives to use,
    • which function body to dispatch to (e.g., by name + version).

Hook point: inside step or continueValue, before executing a function body.

6.2 Shadow execution

  • Clone MachineState (deep copy of msExprmsEnvmsContmsMeta).
  • Run:
    • stA under version A,
    • stB under version B.
  • Compare results, logs, or effects.

Shadow execution is “fork the future,” not “clone a stack.”

6.3 Replay / time travel

  • Periodically snapshot MachineState (or log deltas).
  • To replay:
    • restore a snapshot,
    • run forward under new code or different version.

6.4 Semantic drift governance

  • When semantics change, traverse msCont and:
    • rewrite frames (e.g., rename functions, adjust environments),
    • migrate data representations,
    • annotate frames with drift metadata.

Because frames are just data, you can patch history.


7. Implementation roadmap

  1. Phase 1: Minimal engine

    • Implement EngineExprValueEnvFrameContMachineState.
    • Implement stepcontinueValuerunMachine.
    • Hand-construct small EngineExpr programs and run them.
  2. Phase 2: GHC integration

    • Write a GHC plugin that dumps Core.
    • Implement ghcCoreToEngine for a small subset (ints, lambdas, apps, lets).
    • Run simple Haskell programs on the engine.
  3. Phase 3: Extend Core coverage

    • Add ADTs, pattern matching, recursion, basic primitives.
    • Support more Core constructs (case, letrec, etc.).
  4. Phase 4: Meta and effects

    • Add MetaState with versioning and tracing.
    • Introduce an Effect layer (IO-like operations via explicit effect nodes).
    • Implement effect routing in the engine.
  5. Phase 5: Advanced semantics

    • Hot-swap: version routing for functions and primitives.
    • Shadow execution: state cloning and dual runs.
    • Replay: checkpoints and re-runs.
    • Drift governance: continuation rewriting tools.
  6. Phase 6: Tooling and ergonomics

    • CLI or library to compile Haskell → EngineCore → run.
    • Debug views: pretty-print continuations, traces, versions.
    • Optional: GHC plugin that directly invokes the engine for main.

This gives you a full, end-to-end story:

  • Haskell in → GHC front-end → Core → EngineCore → CPS engine → stackless, mutable-history execution.

ADDENDUM

🌟 The “Big Five” features your runtime truly depends on

These are the features that make your hot‑swap / replay / shadow‑execution / version‑routing runtime easy instead of a nightmare of hacks.

1. Semantic closures

Not just syntactic closures — but closures that:

  • capture lexical environments predictably
  • can be reified or at least reasoned about
  • behave deterministically
  • don’t hide mutable references in surprising ways

This is the foundation of your entire model.


2. Purity (or at least effect isolation)

You don’t need religious purity.
You need semantic purity:

  • deterministic functions
  • explicit effects
  • replayable computations
  • serializable state transitions
  • no hidden side effects

This is what makes:

  • shadow execution
  • hot‑swap
  • drift detection
  • replay
  • time travel
  • version routing

…possible without black magic.


3. First‑class continuations (or CPS‑friendly semantics)

Your runtime is fundamentally:

  • continuation‑driven
  • checkpoint‑driven
  • resumable
  • hot‑swappable
  • replayable

You don’t need full call/cc, but you do need:

  • the ability to reify the “rest of the computation”
  • or to structure the runtime so that the stack is already in the heap

This is why your “no real stack, only semantic frames” idea is so powerful.


4. Structured concurrency / actor‑style isolation

Your runtime wants:

  • tasks as independent semantic units
  • message passing
  • supervision trees
  • restartability
  • deterministic scheduling (or at least observable scheduling)

This is what makes:

  • shadow execution of tasks
  • version‑routed execution
  • hot‑swap of individual tasks
  • replay of task histories

…tractable.


5. A type system that can express effects, capabilities, or at least “what this function touches”

This doesn’t have to be Haskell‑level purity.
It can be:

  • algebraic effects
  • effect rows
  • capabilities
  • linear types
  • region types
  • ownership types
  • or even a disciplined annotation system

But you need some way to know:

  • what a function reads
  • what it writes
  • what it depends on
  • what it captures
  • what it can be safely swapped with

This is essential for:

  • drift detection
  • safe hot‑swap
  • shadow execution
  • replay correctness
  • version routing

🌱 Secondary but still important features

These aren’t strictly required, but they make your life dramatically easier.

6. Immutable data structures as the default

Because:

  • snapshots become trivial
  • diffs become trivial
  • replay becomes trivial
  • shadow execution becomes trivial

7. Pattern matching / algebraic data types

Because your runtime manipulates:

  • semantic frames
  • effect logs
  • versioned closures
  • task states

…and ADTs make this clean.

8. Tail‑call friendliness

Not for performance —
but because your runtime is fundamentally CPS‑shaped.

9. A runtime that doesn’t hide too much

Languages like Java, Go, and Python hide:

  • stack frames
  • scheduling
  • GC timing
  • object layout
  • closure representation

This makes your runtime harder to implement.

Languages like:

  • Haskell
  • OCaml
  • Scheme
  • Racket
  • Clojure
  • Rust (with some work)

…give you more control.


🎯 The crisp answer to your question

If closures were the only requirement, you could port the runtime anywhere.
But closures are only 1 of the 5 essential features.

The full set is:

  1. Semantic closures
  2. Purity / effect isolation
  3. Continuation‑friendly semantics
  4. Structured concurrency
  5. Effect‑expressive type system

These five together make your runtime natural instead of forced.



No comments:

Post a Comment