std.cluster
std.cluster currently exposes the local supervised-actor runtime for the :cluster profile tracer bullet.
This page documents what exists now. Local grains have a source-level activation
shell, lifecycle hook syntax, deactivation-policy metadata, a local activation
registry that enforces one live writer per durable identity, a local namespace
lookup layer, and explicit GrainStore-backed lifecycle callbacks.
Source on_activate/on_deactivate hooks execute over the current local scalar
grain state slot. Source supervisor ... child ... end declarations lower to
local Name_start_link(node_id) helpers for actor children. Persistent source
grains that import std.cluster.persist as persist also get the narrow generated
Name_lookup_or_start_persistent_state0_u64(...) helper for slot-0 u64
state over GrainStoreBytes. Local idle passivation is executable through
explicit activity touch and deterministic sweep calls. Local namespace mappings
can now be persisted in GrainStoreBytes and restored into a fresh local actor
system before activation. std.cluster.persist also exposes explicit
arbitrary-slot u64 helpers for nonzero scalar state slots, and the compiler
can generate all-slot scalar persistence helpers for numeric and namespace
grain starts. Heterogeneous typed state serializers, non-u64 state slots,
schema evolution, placement, membership, migration, distributed namespace
synchronization, distributed supervision aggregation, and remote transport
remain future :cluster work.
The current facade is local-only:

- LocalActorSystem owns one cluster-budget Nursery and one Supervisor.
- Actors run as nursery tasks.
- Supervisor strategies are one_for_one, one_for_all, and rest_for_one.
- Restart policies are permanent, transient, and temporary.
- Restart budgets and pledge-violation restart opt-in are exposed.
- Child status snapshots expose lifecycle, actor id, task id, task state, last exit reason, and restart count.
- Actor tombstones can be counted, inspected as bounded latest-record scalar metadata, classified for repeated deterministic patterns, mirrored to a caller-provided sink, and used to drive explicit local quarantine.
- Grain declarations with durable identity syntax lower through the same local supervised activation shell as actors.
- Grain lifecycle metadata accepts @lifecycle(activation: .lazy, deactivation: .idle_timeout(ms)).
- Grain bodies accept on_activate(stored: T) -> T do ... end and on_deactivate(state: T) -> T do ... end as the stable source contract.
- Local grain lookup/start maps (grain_type, grain_id) to one stable local actor reference, so duplicate activation attempts reuse the existing live activation instead of creating a second mutator.
- Local namespace lookup maps (grain_type, namespace) to an internal durable grain id, then routes duplicate lookups through the same single-writer activation registry.
- std.cluster.grainstore exposes explicit local namespace persistence helpers: bind_local_namespace_u64(...) persists and mirrors a binding into the local runtime, and restore_local_namespace_u64(...) restores a persisted binding into a fresh local actor system before activation.
- Persistent local grain start can invoke caller-provided load/store callbacks backed by GrainStoreBytes after setup, after message/timeout boundaries, and before teardown.
- Persistent source grains that import std.cluster.persist as persist can use generated Name_lookup_or_start_persistent_state0_u64(system, grain_id, slot, policy, ctx) helpers for the current scalar slot-0 u64 state runtime. The helper wires canonical std.cluster.persist load/store callbacks and deterministic (grain_type, grain_id, slot) GrainStoreBytes keys.
- The compiler also emits Name_lookup_or_start_persistent_slots_u64(system, grain_id, slot, policy, ctx) for the current scalar u64 state-slot runtime. It wires generated per-grain callbacks that persist every scalar state slot through GrainStoreBytes.
- The compiler also emits Name_lookup_or_start_namespace_persistent_slots_u64(system, namespace, slot, policy, ctx) for namespace-addressed local grains. It resolves the local namespace binding first, then uses the same generated all-slot scalar callbacks.
- std.cluster.persist also exposes get_slot_u64, put_slot_u64, load_slot_u64, store_slot_u64, load_slots_u64, and store_slots_u64 for explicit scalar state slots and generated all-slot callbacks.
- Generated grain lifecycle hooks run inside the persistence boundary: on_activate runs after a persistence load and before the first message; on_deactivate runs before teardown and before the final persistent store.
- Local idle passivation is explicit and deterministic: local_grain_touch(ref, now_ms) records an activity boundary, and local_grain_passivate_idle(system, idle_timeout_ms, now_ms, reason) passivates elapsed grains through the same on_deactivate/store boundary.
- A source grain with @lifecycle(activation: .lazy, deactivation: .idle_timeout(ms)) also emits Name_passivate_idle(system, now_ms, reason). The helper uses the source timeout literal and keeps the clock sample and stop reason explicit.
- Source supervisors create local systems through generated SupervisorName_start_link(node_id) helpers and start declared actor children through generated supervised refs.
- Compiler-generated local actor/grain starts forward @arena(max_bytes: N) into the runtime. The current executable boundary enforces that limit for generated scalar u64 state-slot allocation and exposes the configured ceiling through explicit local observation helpers.
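The list above names three supervisor strategies and three restart policies but does not define them. As a minimal sketch, assuming they keep their Erlang/OTP meanings, the Python model below computes which children a supervisor restarts after one child exits. Child, should_restart, and restart_set are hypothetical names for illustration, not part of std.cluster.

```python
from dataclasses import dataclass


@dataclass
class Child:
    name: str
    policy: str  # "permanent", "transient", or "temporary"


def should_restart(policy, abnormal_exit):
    """Assumed OTP-style policy check for the child that exited."""
    if policy == "permanent":
        return True
    if policy == "transient":
        return abnormal_exit  # only abnormal exits trigger a restart
    return False  # temporary children are never restarted


def restart_set(children, failed, strategy, abnormal_exit):
    """Names of children restarted after children[failed] exits, in start order."""
    if not should_restart(children[failed].policy, abnormal_exit):
        return []
    if strategy == "one_for_one":
        affected = [children[failed]]
    elif strategy == "one_for_all":
        affected = children
    elif strategy == "rest_for_one":
        affected = children[failed:]  # the failed child plus later siblings
    else:
        raise ValueError("unknown strategy: " + strategy)
    # Assumption: temporary siblings stopped as collateral are not restarted.
    return [c.name for c in affected if c.policy != "temporary"]


kids = [
    Child("a", "permanent"),
    Child("b", "transient"),
    Child("c", "temporary"),
    Child("d", "permanent"),
]
print(restart_set(kids, 1, "rest_for_one", abnormal_exit=True))  # ['b', 'd']
```

Under one_for_all, the same abnormal exit of b would restart a, b, and d; a normal exit of b restarts nothing because b is transient.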
Janus actor start
Define a message protocol with message. The declaration uses tagged variants:

```janus
message CounterMsg {
    Tick,
    Stop,
}
```

Attach the protocol to an actor with actor Name(msg: Msg). The payload binding name is part of the header; the current generated handler still receives the raw i64 tag as __msg.

```janus
@mailbox(capacity: 4)
actor Counter(msg: CounterMsg) do
    var count: u64 = 0

    receive do
        // __msg is the raw i64 variant tag in the current lowering.
        count += __msg
    end
end
```

For a source-level Janus actor, the compiler emits the supervised start wrappers:

```janus
ActorName_start_supervised(system: u64, slot: u64, policy: u32) -> u64
ActorName_start_supervised_ref(system: u64, slot: u64, policy: u32) -> u64
```

ActorName_start_supervised returns the transient ActorId.
ActorName_start_supervised_ref starts the actor and returns a stable
local actor reference for the supervised (system, slot) identity. Use
the _ref form for production send and observation paths that should
survive supervisor restarts.
```janus
{.profile: cluster.}

use std.cluster.local as cluster

message CounterMsg {
    Tick,
    Stop,
}

@mailbox(capacity: 4)
actor Counter(msg: CounterMsg) do
    var count: u64 = 0

    receive do
        count += __msg
    end
end

pub func main() -> i32 do
    // Local system: node id 1, one_for_one strategy, one child slot.
    let system = cluster.local_new(
        1 as u64,
        cluster.STRATEGY_ONE_FOR_ONE,
        1 as u64,
    )
    if system == 0 as u64 do
        return 1
    end

    // Stable reference for supervised slot 0; it survives supervisor restarts.
    let counter = Counter_start_supervised_ref(
        system,
        0 as u64,
        cluster.POLICY_PERMANENT,
    )
    if counter == 0 as u64 do
        return 2
    end

    // @mailbox(capacity: 4) is visible through the reference.
    if cluster.local_ref_mailbox_capacity(counter) != 4 as i64 do
        return 3
    end
    // Non-blocking send: 1 means the message was accepted.
    if cluster.local_ref_try_send(counter, 1 as i64) != 1 as i32 do
        return 4
    end
    if cluster.local_shutdown(system) != 1 as i32 do
        return 5
    end
    if cluster.local_destroy(system) != 1 as i32 do
        return 6
    end
    return 0
end
```

The wrapper hides the setup/handler/destroy runtime-entry plumbing. Public
Janus APIs accept typed callables or generated actor/grain starters; callable
addresses are not ordinary u64 values on the language surface.
Source supervisor start
supervisor declarations are now executable in the local v1 runtime when their
children are source actors with generated supervised-ref helpers:

```janus
{.profile: cluster.}

use std.cluster.local as cluster

actor Worker do
    receive do
        __msg
    end
end

actor Scratch do
    receive do
        __msg
    end
end

supervisor Root, strategy: .one_for_one, restart_pledge_violations: true do
    child Worker, restart: .permanent
    child Scratch, restart: .temporary
end

pub func main() -> i32 do
    // Generated helper: creates the system and starts both children.
    let system = Root_start_link(1 as u64)
    if system == 0 as u64 do
        return 1
    end
    if cluster.local_destroy(system) != 1 as i32 do
        return 2
    end
    return 0
end
```

The compiler emits:

```janus
Root_start_link(node_id: u64) -> u64
```

The helper lowers deterministically to the local runtime:

- cluster_local_new(node_id, strategy, child_count) creates the system.
- restart_pledge_violations: true calls cluster_local_set_restart_pledge_violations(system, 1).
- Each child Actor, restart: .policy calls Actor_start_supervised_ref(system, slot, policy).
- If system creation or any child start fails, the helper returns 0; child start failure also destroys the partially created system.
The current source helper intentionally covers local actor children. Child argument lists, grain identity arguments, distributed supervisor aggregation, and cross-node placement remain runtime layers below the same source doctrine.
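The lowering steps for Root_start_link can be modeled in a few lines of Python. FakeRuntime and root_start_link are hypothetical stand-ins that only record calls, so the control flow, including the destroy-on-child-failure rule, is visible. The model also assumes children take slots 0, 1, 2, ... in declaration order, which this page does not state.

```python
class FakeRuntime:
    """Hypothetical stand-in for the local runtime entry points; records calls."""

    def __init__(self, fail_child_slot=None):
        self.calls = []
        self.fail_child_slot = fail_child_slot

    def local_new(self, node_id, strategy, child_count):
        self.calls.append(("new", node_id, strategy, child_count))
        return 7  # arbitrary nonzero system handle

    def set_restart_pledge_violations(self, system, flag):
        self.calls.append(("pledge", system, flag))

    def start_supervised_ref(self, actor, system, slot, policy):
        self.calls.append(("start", actor, slot, policy))
        return 0 if slot == self.fail_child_slot else 1000 + slot

    def local_destroy(self, system):
        self.calls.append(("destroy", system))
        return 1


def root_start_link(rt, node_id, strategy, restart_pledge_violations, children):
    """Mirror of the documented lowering; returns 0 on any failure."""
    system = rt.local_new(node_id, strategy, len(children))
    if system == 0:
        return 0
    if restart_pledge_violations:
        rt.set_restart_pledge_violations(system, 1)
    for slot, (actor, policy) in enumerate(children):
        if rt.start_supervised_ref(actor, system, slot, policy) == 0:
            rt.local_destroy(system)  # drop the partially created system
            return 0
    return system


children = [("Worker", "permanent"), ("Scratch", "temporary")]
print(root_start_link(FakeRuntime(), 1, "one_for_one", True, children))  # 7
print(root_start_link(FakeRuntime(fail_child_slot=1), 1, "one_for_one", True, children))  # 0
```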
Local grain activation registry
The compiler accepts the final local-persistent grain header shape and emits the same local supervised start wrapper used by actors:

```janus
@persist(via: GrainStoreBytes)
@lifecycle(activation: .lazy, deactivation: .idle_timeout(300_000))
@requires(cap: [.network])
@reload(boundary: .message, state: UserState, migrate: user_v1_to_v2)
@reductions(limit: 128)
@arena(scope: .grain, reset: .on_deactivate)
@observe(mailbox: .summary, state: .none)
@tombstone(digest_includes: [.payload], retention_window: 60_000, deadly_threshold: 3)
@behaviour(.worker)
grain User(id: u64, msg: UserMsg) do
    var count: u64 = 0

    // Runs after the persistence load, before the first message.
    on_activate(stored: u64) -> u64 do
        return stored
    end

    // Runs before teardown, before the final persistent store.
    on_deactivate(state: u64) -> u64 do
        return state
    end

    receive do
        UserMsg.Ping => do
            count += 1
        end,
        UserMsg.Stop => do
            return 0
        end,
    end
end
```

For the compiler slice, User_start_supervised(system, slot, policy) remains an
activation shell over the local actor runtime. Lifecycle hooks now execute for
the current local scalar state-slot implementation. Local idle passivation uses
explicit caller-supplied milliseconds; no hidden clock or scheduler is implied.
When the source declares .idle_timeout(ms), the compiler emits
User_passivate_idle(system, now_ms, reason) so callers do not duplicate the
timeout literal. The caller still supplies now_ms and reason as visible inputs.
For the runtime registry slice, use std.cluster.local to locate or start a grain
activation by durable numeric identity; the registry calls follow the
source-contract checks below.
@persist and @lifecycle are now checked as grain source contract during
janus build. @persist is valid only on grains and, in the v1 local runtime,
must spell via: GrainStoreBytes; missing via, unknown fields, actor use, or
future store names fail with E_CLUSTER_PERSIST. @lifecycle is valid only on
grains, requires activation: .lazy, and accepts omitted deactivation
metadata as the current .never default. If deactivation is present, it must
be .never or .idle_timeout(ms) with a positive compile-time millisecond
literal. Invalid lifecycle metadata fails with E_CLUSTER_LIFECYCLE.
@requires(cap: [...]) is already enforced by janus build for calls inside
the grain body. The compiler maps source symbols to the current Cap*
call-graph requirements. .network covers CapNetRead and CapNetWrite;
.storage_nvme covers filesystem-style storage requirements; .stdout,
.stderr, and .alloc cover their matching runtime powers. If a grain body
calls a function requiring CapNetRead without declaring .network, the build
fails with E_CAP_MISSING. The annotation itself is closed over the canonical
cap: [...] field; missing cap, an empty list, or invented fields such as
caps fail with E_CLUSTER_REQUIRES.
```janus
func read_socket() requires CapNetRead do
end

@requires(cap: [.storage_nvme])
grain StorageOnly(msg: UserMsg) do
    receive do
        UserMsg.Ping => do
            read_socket() // E_CAP_MISSING during janus build
        end,
        else => do
        end,
    end
end
```

This compile-time check is separate from runtime placement. NodeManifest matching, migration refusal, and remote routing still belong to the NexusOS cluster runtime.
Memory tags have the same source-contract discipline. The live Phase-B checks are compile-time source rules:

- alloc[Local.Shared](...) is rejected inside a grain with E_CLUSTER_MEMTAG. A grain owns its state and mutates it through its protocol; shared mutable local memory is a rival authority path.
- alloc[Volatile.Ephemeral](...) is rejected inside a grain unless the grain declares reconstruct(). Ephemeral grain state is allowed only when the source shows how the grain rebuilds it after activation, migration, or passivation boundaries.
- @replicate(scope: .wing | .cluster | .swarm, protocol: .pbft) is validated as replication source metadata. scope is required; unknown fields, unsupported scopes, or unsupported protocols fail with E_CLUSTER_REPLICATE. Runtime replication, membership, and consensus execution remain runtime work.
```janus
grain BadStore(msg: UserMsg) do
    receive do
        UserMsg.Ping => do
            let slot = alloc[Local.Shared](0 as u64) // E_CLUSTER_MEMTAG
            _ = slot
        end,
        else => do
        end,
    end
end
```

Use Local.Exclusive, Session.Replicated, Session.Consistent, or
Volatile.Ephemeral according to the migration contract. For
Volatile.Ephemeral, declare reconstruct() next to the receive loop:
```janus
grain ScratchStore(msg: UserMsg) do
    reconstruct() do
        // Rebuild dropped caches or scratch state from durable state.
    end

    receive do
        UserMsg.Ping => do
            let scratch = alloc[Volatile.Ephemeral](0 as u64)
            _ = scratch
        end,
        else => do
        end,
    end
end
```

Full runtime replication and passivation behavior remains future work below the same source surface.

To locate or start a grain activation by durable numeric identity, call the local registry:

```janus
let user_ref = cluster.local_grain_lookup_or_start(
    system,
    100 as u64, // grain type id
    42 as u64,  // durable grain id
    0 as u64,   // local supervisor slot
    cluster.POLICY_PERMANENT,
    4 as u64,   // mailbox capacity
    user_setup,
    user_handler,
    user_destroy,
)
```

If another call uses the same (grain_type, grain_id), the runtime returns the
same stable local reference while the activation is live. This pins the first
grain runtime invariant: one durable identity has one active local writer.
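A minimal Python model of that invariant: the registry keys live activations by (grain_type, grain_id), so a second lookup for a live identity returns the existing reference instead of running setup again. LocalGrainRegistry and its methods are hypothetical; the real runtime returns scalar local actor references rather than these small integers.

```python
class LocalGrainRegistry:
    """Hypothetical model: at most one live activation per durable identity."""

    def __init__(self):
        self._live = {}       # (grain_type, grain_id) -> local reference
        self._next_ref = 1
        self.activations = 0  # how many times setup actually ran

    def lookup_or_start(self, grain_type, grain_id):
        key = (grain_type, grain_id)
        if key in self._live:
            return self._live[key]  # duplicate lookup reuses the live writer
        ref = self._next_ref
        self._next_ref += 1
        self.activations += 1
        self._live[key] = ref
        return ref

    def deactivate(self, grain_type, grain_id):
        self._live.pop((grain_type, grain_id), None)

    def active_count(self):
        return len(self._live)


reg = LocalGrainRegistry()
first = reg.lookup_or_start(100, 42)
second = reg.lookup_or_start(100, 42)  # same identity: no second mutator
other = reg.lookup_or_start(100, 43)
print(first == second, reg.activations)  # True 2
```

The guarantee only holds while the activation is live: after deactivation, the next lookup starts a fresh activation.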
The local namespace layer resolves human-readable namespace keys to internal durable grain ids before entering the same activation registry:
```janus
let user_ref = cluster.local_grain_lookup_or_start_namespace(
    system,
    100 as u64,    // grain type id
    "users/alice", // local namespace key
    0 as u64,      // local supervisor slot
    cluster.POLICY_PERMANENT,
    4 as u64,      // mailbox capacity
    user_setup,
    user_handler,
    user_destroy,
)
```

local_grain_namespace_lookup returns the mapped internal id, or 0 when the
namespace is unbound. local_grain_lookup_or_start_namespace derives and stores
an internal id on first lookup, then returns the same live activation ref for
duplicate namespace lookups. local_grain_namespace_bind can bind aliases to an
existing id; rebinding an existing namespace to a different id is rejected.
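The namespace rules above can be sketched as a small Python table. NamespaceTable is hypothetical, and two details are assumptions: internal ids come from a simple counter (the page says only that the runtime derives them), and bind returns 1 on success and 0 on rejection, treating a repeated bind to the same id as a success.

```python
class NamespaceTable:
    """Hypothetical model of the local (grain_type, namespace) -> grain_id layer."""

    def __init__(self):
        self._ids = {}     # (grain_type, namespace) -> internal grain id
        self._next_id = 1  # assumption: illustrative id derivation only

    def lookup(self, grain_type, namespace):
        return self._ids.get((grain_type, namespace), 0)  # 0 means unbound

    def bind(self, grain_type, namespace, grain_id):
        current = self._ids.get((grain_type, namespace))
        if current is not None and current != grain_id:
            return 0  # rebinding to a different id is rejected
        self._ids[(grain_type, namespace)] = grain_id
        return 1

    def resolve_or_derive(self, grain_type, namespace):
        """First lookup derives and stores an id; later lookups reuse it."""
        grain_id = self.lookup(grain_type, namespace)
        if grain_id == 0:
            grain_id = self._next_id
            self._next_id += 1
            self._ids[(grain_type, namespace)] = grain_id
        return grain_id


table = NamespaceTable()
alice = table.resolve_or_derive(100, "users/alice")
table.bind(100, "alice", alice)             # alias to the existing id
print(table.bind(100, "alice", alice + 1))  # 0: rebinding is rejected
```

The resolved id would then go through the same single-writer activation registry as a numeric grain id.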
For local persistence, use the persistent lookup/start variant and pass lifecycle callbacks:
```janus
let user_ref = cluster.local_grain_lookup_or_start_persistent(
    system,
    100 as u64,       // grain type id
    42 as u64,        // durable grain id
    0 as u64,         // local supervisor slot
    cluster.POLICY_PERMANENT,
    4 as u64,         // mailbox capacity
    user_setup,
    user_handler,
    user_destroy,
    store_ctx as u64, // persistence context passed to load/store
    load,
    store,
)
```

The load/store callbacks use this shape:

```janus
pub func load(ctx: u64, grain_type: u64, grain_id: u64, state: u64) -> i32 do
    // Return >= 0 for a valid cold miss or restore, negative for fatal load.
end

pub func store(ctx: u64, grain_type: u64, grain_id: u64, state: u64) -> i32 do
    // Return 1 when durable state was committed, 0 on failure.
end
```

ctx is the caller-provided store context, commonly a pointer to a
GrainStoreBytes facade. The runtime calls load after setup returns a state
pointer, calls store after message and timeout handlers, and calls store
again before teardown. Store failure turns the handler boundary into a stop so
the activation does not continue pretending volatile mutation was committed.
Use local_grain_persistence_load_failures(system) and
local_grain_persistence_store_failures(system) to inspect persistence callback
failures observed by the local runtime. The counters are scoped to the local
actor system handle and increment only when a user-provided load callback returns
a negative value or a store callback returns anything other than 1.
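The callback contract and failure counters can be sketched as a Python model of one activation. PersistentActivation is hypothetical; state is a one-element list standing in for the state pointer, and the callbacks drop the ctx, grain_type, and grain_id arguments for brevity. The model assumes no final store is attempted after a failed load, which the page does not specify.

```python
class PersistentActivation:
    """Hypothetical model of when the local runtime calls load and store."""

    def __init__(self, load, store, counters):
        self.store = store
        self.counters = counters
        self.state = [0]                      # setup has returned a state slot
        self.loaded = load(self.state) >= 0   # >= 0: cold miss or restore
        if not self.loaded:
            counters["load_failures"] += 1
        self.stopped = not self.loaded

    def _commit(self):
        if self.store(self.state) != 1:       # anything but 1 is a failed commit
            self.counters["store_failures"] += 1
            return False
        return True

    def deliver(self, handler, msg):
        """Run one message boundary; returns False once the activation stops."""
        if self.stopped:
            return False
        handler(self.state, msg)
        if not self._commit():
            self.stopped = True               # never continue on uncommitted state
        return not self.stopped

    def teardown(self):
        if self.loaded:                       # assumption: skipped after a failed load
            self._commit()


durable = {"count": 5}
commits = []


def load(state):
    state[0] = durable["count"]
    return 1


def store(state):
    if len(commits) >= 2:                     # simulate an outage after two commits
        return 0
    commits.append(state[0])
    durable["count"] = state[0]
    return 1


def add(state, msg):
    state[0] += msg


counters = {"load_failures": 0, "store_failures": 0}
act = PersistentActivation(load, store, counters)
results = [act.deliver(add, 1) for _ in range(4)]
print(results, durable["count"], counters)
# [True, True, False, False] 7 {'load_failures': 0, 'store_failures': 1}
```

The third message mutates volatile state to 8, but the failed store stops the activation, so durable state stays at 7 and the fourth message is refused.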
For source-declared grains with scalar u64 state slots, the compiler also
emits callback-free helpers:
```janus
use std.cluster.persist as persist

// Slot-0 u64 convenience path.
let user_ref = User_lookup_or_start_persistent_state0_u64(
    system,
    42 as u64,
    0 as u64,
    cluster.POLICY_PERMANENT,
    store_ctx as u64,
)

// Every generated scalar u64 state slot, addressed by numeric id.
let full_user_ref = User_lookup_or_start_persistent_slots_u64(
    system,
    42 as u64,
    0 as u64,
    cluster.POLICY_PERMANENT,
    store_ctx as u64,
)

// Every generated scalar u64 state slot, addressed by local namespace.
let named_user_ref = User_lookup_or_start_namespace_persistent_slots_u64(
    system,
    "users/alice",
    0 as u64,
    cluster.POLICY_PERMANENT,
    store_ctx as u64,
)
```

These helpers still expose the cost: the caller passes the persistence context
explicitly, and the runtime performs load/store at the same boundaries. The
state0 helper preserves the original single-slot convenience path; the
numeric and namespace slots helpers persist every generated scalar u64
state slot. The namespace helper is still local: callers must persist or restore
namespace bindings explicitly when they need durable names across systems.
If a source grain declares @lifecycle(..., deactivation: .idle_timeout(ms)),
the compiler also emits:
```janus
User_passivate_idle(system, now_ms, reason) -> u64
```

This helper lowers to cluster.local_grain_passivate_idle with the source
timeout literal. It does not read a clock and does not install a hidden timer;
the scheduler or caller remains responsible for choosing when to sweep.
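The touch-and-sweep contract can be modeled in Python as below. IdleSweeper and its methods are hypothetical. The model assumes a grain counts as elapsed once now_ms - last_activity >= idle_timeout_ms and that the sweep returns how many grains it passivated; the page pins down neither. Like the real helpers, the model never reads a clock: every timestamp is passed in.

```python
class IdleSweeper:
    """Hypothetical model of local_grain_touch and local_grain_passivate_idle."""

    def __init__(self):
        self.last_active = {}  # grain ref -> last activity time (ms)

    def touch(self, ref, now_ms):
        self.last_active[ref] = now_ms

    def passivate_idle(self, idle_timeout_ms, now_ms, on_passivate):
        elapsed = [ref for ref, last in self.last_active.items()
                   if now_ms - last >= idle_timeout_ms]
        for ref in elapsed:
            on_passivate(ref)  # stands in for on_deactivate + final store
            del self.last_active[ref]
        return len(elapsed)


sweeper = IdleSweeper()
passivated = []
sweeper.touch(1, 0)
sweeper.touch(2, 250_000)
# Same 300_000 ms timeout as the User example's .idle_timeout(300_000).
first = sweeper.passivate_idle(300_000, 300_000, passivated.append)
second = sweeper.passivate_idle(300_000, 550_000, passivated.append)
print(first, second, passivated)  # 1 1 [1, 2]
```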
The current registry and namespace layer are still local. They do not yet
provide heterogeneous typed GrainStore serializers, non-u64 state slots,
typed-state schema evolution, hidden scheduler-owned sweep loops, migration,
remote routing, cross-node placement, or distributed namespace synchronization.
Those are separate runtime layers.
The local grain registry helpers are:
- cluster.local_grain_lookup_or_start(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy) -> u64
- cluster.local_grain_lookup_or_start_lifecycle(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, activate, deactivate) -> u64
- cluster.local_grain_lookup_or_start_persistent(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, ctx, load, store) -> u64
- cluster.local_grain_lookup_or_start_persistent_lifecycle(system, grain_type, grain_id, slot, policy, capacity, setup, handler, destroy, ctx, load, store, activate, deactivate) -> u64
- cluster.local_grain_ref_try_send(grain_ref, msg) -> i32
- cluster.local_grain_touch(grain_ref, now_ms) -> i32
- cluster.local_grain_passivate_idle(system, idle_timeout_ms, now_ms, reason) -> u64
- cluster.local_grain_active_count(system) -> u64
- cluster.local_grain_persistence_load_failures(system) -> u64
- cluster.local_grain_persistence_store_failures(system) -> u64
- cluster.local_grain_namespace_lookup(system, grain_type, namespace) -> u64
- cluster.local_grain_namespace_bind(system, grain_type, namespace, grain_id) -> i32
- cluster.local_grain_lookup_or_start_namespace(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy) -> u64
- cluster.local_grain_lookup_or_start_namespace_lifecycle(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, activate, deactivate) -> u64
- cluster.local_grain_lookup_or_start_namespace_persistent(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, ctx, load, store) -> u64
- cluster.local_grain_lookup_or_start_namespace_persistent_lifecycle(system, grain_type, namespace, slot, policy, capacity, setup, handler, destroy, ctx, load, store, activate, deactivate) -> u64
- cluster.local_arena_max_bytes(system, slot) -> u64

Stable local actor references are scalar handles. They encode the local system
handle, child slot, and slot generation, not the runtime ActorId, so a
permanent or transient child keeps the same reference after restart. If you stop
a child and reuse the slot for a different child, the old reference becomes
invalid instead of aliasing the replacement. The current ref helpers are:

- cluster.local_actor_ref(system, slot) -> u64
- cluster.local_ref_try_send(actor_ref, msg) -> i32
- cluster.local_ref_child_actor_id(actor_ref) -> i32
- cluster.local_ref_child_lifecycle(actor_ref) -> i32
- cluster.local_ref_child_task_state(actor_ref) -> i32
- cluster.local_ref_child_last_exit(actor_ref) -> i32
- cluster.local_ref_mailbox_len(actor_ref) -> i64
- cluster.local_ref_mailbox_capacity(actor_ref) -> i64
- cluster.local_ref_arena_max_bytes(actor_ref) -> u64
- cluster.local_ref_stop_child(actor_ref, reason) -> i32

Capability-gated callers use the same reference shape with explicit
ClusterLocalCap authority:

- cluster.local_actor_ref_cap(cap, system, slot) -> u64
- cluster.local_ref_try_send_cap(cap, actor_ref, msg) -> i32
- cluster.local_ref_child_actor_id_cap(cap, actor_ref) -> i32
- cluster.local_ref_child_lifecycle_cap(cap, actor_ref) -> i32
- cluster.local_ref_child_task_state_cap(cap, actor_ref) -> i32
- cluster.local_ref_child_last_exit_cap(cap, actor_ref) -> i32
- cluster.local_ref_mailbox_len_cap(cap, actor_ref) -> i64
- cluster.local_ref_mailbox_capacity_cap(cap, actor_ref) -> i64
- cluster.local_ref_arena_max_bytes_cap(cap, actor_ref) -> u64
- cluster.local_ref_stop_child_cap(cap, actor_ref, reason) -> i32
- cluster.local_arena_max_bytes_cap(cap, system, slot) -> u64

Grain @requires is declaration-level metadata; capability-token facade calls
remain expression-level authority. Use both when both are true: the grain
declares what kind of node/API authority it needs, and a specific runtime call
passes the concrete token that authorizes the operation.
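To see why a restart keeps a stable local reference valid while slot reuse invalidates it, here is a Python model with an assumed bit layout. make_ref, unpack_ref, SlotTable, and the 16-bit slot and generation fields are hypothetical; the page says only that a reference encodes the system handle, child slot, and slot generation.

```python
SLOT_BITS = 16  # assumed field widths, for illustration only
GEN_BITS = 16


def make_ref(system, slot, generation):
    """Pack (system, slot, generation) into one integer handle."""
    return (system << (SLOT_BITS + GEN_BITS)) | (slot << GEN_BITS) | generation


def unpack_ref(ref):
    generation = ref & ((1 << GEN_BITS) - 1)
    slot = (ref >> GEN_BITS) & ((1 << SLOT_BITS) - 1)
    system = ref >> (SLOT_BITS + GEN_BITS)
    return system, slot, generation


class SlotTable:
    """Hypothetical per-system slot table behind stable references."""

    def __init__(self, system):
        self.system = system
        self.generation = {}  # slot -> generation of the child now in it
        self.actor_id = {}    # slot -> transient runtime ActorId

    def start(self, slot, actor_id):
        gen = self.generation.get(slot, 0) + 1  # new child: bump the generation
        self.generation[slot] = gen
        self.actor_id[slot] = actor_id
        return make_ref(self.system, slot, gen)

    def restart(self, slot, new_actor_id):
        self.actor_id[slot] = new_actor_id  # same generation, so same reference

    def stop(self, slot):
        self.actor_id.pop(slot, None)

    def resolve(self, ref):
        """Return the current ActorId, or None for a stale or foreign ref."""
        system, slot, gen = unpack_ref(ref)
        if system != self.system or self.generation.get(slot) != gen:
            return None
        return self.actor_id.get(slot)


table = SlotTable(system=3)
ref = table.start(0, actor_id=101)
table.restart(0, new_actor_id=102)  # ActorId changes, reference does not
print(table.resolve(ref))  # 102
```

Starting a different child in slot 0 after a stop bumps the generation, so the old reference resolves to nothing instead of aliasing the replacement.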
Use ActorRef[Msg] for compile-time message protocol checks on direct
spawned actors. Use the scalar local actor reference above for the
supervised local bridge path. Local GrainRef[Msg] uses the same protocol
check and boxed payload send ABI for local grain activations; the
test-cluster-grain-payload gate proves payload delivery by resolving a
typed Promise[T] from inside the grain receive arm.
Inside receive, you can either write normal statements against __msg or
write bare match arms. Bare arms desugar to match __msg { ... }:
```janus
receive do
    0 => do
        count += 1
    end,
    1 => do
        return 0
    end,
    else => do
        count = count
    end,
end
```

For typed message protocols, receive arms can match named variants, destructure payload fields, guard on destructured bindings, and include a timeout arm:

```janus
message CounterMsg {
    Tick,
    Set { value: u64 },
    Stop,
}

receive do
    CounterMsg.Tick => do
        count += 1
    end,
    CounterMsg.Set { value } when value >= 0 as u64 => do
        count += value
    end,
    CounterMsg.Stop => do
        return 0
    end,
    else => do
        count = count
    end,
    after 0 => do
        count = count
    end,
end
```

The shorthand { value } binds the payload field named value into the arm
scope. Message payload fields must be SBI-conformant; pointer-typed fields are
rejected at declaration time with E2530.
For compiler-generated supervised actors, an after N => ... arm is wired into
the local runtime. The compiler emits an ActorName_timeout(actor) helper and
the generated ActorName_start_supervised* wrappers register it with the
mailbox timeout. Delivered messages still call ActorName_handler(actor, msg);
an empty mailbox at the timeout boundary calls ActorName_timeout(actor).
Direct spawned actors can use typed actor references:
```janus
pub func send_tick(ref: ActorRef[CounterMsg]) -> i32 do
    ref.send(CounterMsg.Tick)
    return 0
end

pub func spawn_counter() -> ActorRef[CounterMsg] do
    return spawn Counter()
end
```

ActorRef[Msg] is a compile-time protocol witness over the current actor
handle ABI. The compiler checks direct ref.send(Msg.UnitVariant) calls,
typed local bindings, and direct return spawn Actor() expressions. Unit
variants lower to their i64 tag. Payload-carrying variants are now
supported: fields transfer through boxed slot arrays, and receive arms can
destructure them with Msg.Variant { field } patterns. All message fields
must be SBI-conformant (owned, by-value, no pointers); the compiler
rejects non-conformant declarations with E2530.
Local GrainRef[Msg] follows the same boxed payload ABI for source-level
.send(...) calls. The local runtime still activates grains through the
node-local actor substrate, but the source witness is grain-shaped and
protocol-checked independently from ActorRef[Msg].
Sendability
SPEC-029 sendability is enforced before actor payload delivery ships. For proven actor, channel, and mailbox send boundaries:

- ref T payloads are rejected with E2801.
- iso T payloads are accepted and the binding is consumed.
- Reading a consumed iso binding emits E2802.
- val T and tag T payloads are sendable.
This is a type check, not a serialization trait check. Janus does not require
a Serialize trait for actor messages. Wire-ready message payloads must use
SBI-compatible layout when the distributed transport path lands.
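The sendability rules can be modeled as a toy checker that walks the send statements of one straight-line path, roughly as a static pass would. SendChecker and SendError are hypothetical names, and capabilities are plain strings here; only the diagnostic codes E2801 and E2802 come from this page.

```python
class SendError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


class SendChecker:
    """Hypothetical model of the send-boundary sendability rules."""

    SENDABLE = {"iso", "val", "tag"}

    def __init__(self):
        self.consumed = set()

    def read(self, name):
        if name in self.consumed:
            raise SendError("E2802")  # use of a consumed iso binding

    def send(self, name, capability):
        self.read(name)               # sending reads the binding first
        if capability == "ref":
            raise SendError("E2801")  # ref T payloads are rejected
        if capability not in self.SENDABLE:
            raise ValueError("unknown capability: " + capability)
        if capability == "iso":
            self.consumed.add(name)   # iso T moves into the message
        # val T and tag T stay usable after the send


checker = SendChecker()
checker.send("config", "val")
checker.send("config", "val")  # val may be sent repeatedly
checker.send("buffer", "iso")  # buffer is consumed here
try:
    checker.read("buffer")
except SendError as err:
    print(err.code)  # E2802
```

Note that the checker is purely about reference capabilities; like the rule it models, it never asks whether a payload is serializable.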
Explicit child stop
Use local_stop_child when a caller wants to stop a live child without
applying its restart policy:

```janus
let stopped = cluster.local_stop_child(
    system,
    0 as u64,
    cluster.STOP_REASON_SHUTDOWN,
)
```

Shutdown and normal stop reasons do not create tombstones. Abnormal,
killed, and pledge-violation stop reasons do create tombstones, but
still do not restart the child. local_handle_crash and
local_handle_exit remain the restart-policy paths for simulated or
observed actor exits.
Mailbox backpressure
The local actor mailbox is bounded. Actors without @mailbox use the
runtime channel default: one pending handoff slot. The public send surface
is non-blocking:

```janus
let sent = cluster.local_try_send(system, 0 as u64, 42 as i64)
let sent_ref = cluster.local_ref_try_send(actor_ref, 42 as i64)
```

Return codes are stable for the current tracer bullet:

- 1: the message was accepted.
- 0: the child slot is empty or the mailbox is full.
- -1: the mailbox channel is closed.
Use @mailbox(capacity: N) or
@mailbox(capacity: N, overflow: .reject) on a compiler-generated actor to set
the supervised actor mailbox capacity. The compiler also uses the same value
for direct spawn Actor() mailboxes. In the v1 local runtime, omitted
overflow means .reject: send returns 0 when the mailbox is full.
overflow: .drop_oldest, .drop_newest, and .block_sender are rejected by
janus build with E_CLUSTER_MAILBOX until those runtime policies are
executable. Unknown @mailbox fields are also rejected: the canonical v1
shape is capacity plus optional overflow. Production callers should treat
0 as backpressure or missing-child rejection and retry, drop, or escalate
according to their actor protocol.
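A Python model of the v1 mailbox contract: bounded, non-blocking, overflow: .reject, with the stable 1 / 0 / -1 return codes. Mailbox, try_send_slot, and the constant names are hypothetical. The model assumes a closed mailbox reports -1 even when it is also full.

```python
from collections import deque

ACCEPTED = 1
REJECTED = 0   # mailbox full, or no live child in the slot
CLOSED = -1


class Mailbox:
    """Hypothetical model of a bounded v1 mailbox with overflow: .reject."""

    def __init__(self, capacity=1):  # no @mailbox: one pending handoff slot
        self.capacity = capacity
        self.pending = deque()
        self.closed = False

    def try_send(self, msg):
        if self.closed:
            return CLOSED
        if len(self.pending) >= self.capacity:
            return REJECTED  # never blocks and never drops queued messages
        self.pending.append(msg)
        return ACCEPTED


def try_send_slot(mailboxes, slot, msg):
    """Slot-addressed send: an empty slot is reported like a full mailbox."""
    box = mailboxes.get(slot)
    if box is None:
        return REJECTED
    return box.try_send(msg)


boxes = {0: Mailbox(), 1: Mailbox(capacity=4)}
codes = [try_send_slot(boxes, 0, m) for m in (10, 11)]
print(codes)  # [1, 0]
```

A caller that gets 0 can retry later, drop the message, or escalate, as its protocol decides; the model deliberately leaves that choice outside the mailbox.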
@arena policy is also checked at build time. If present, it must describe an
executable actor/grain allocator-domain contract:
```janus
@arena(scope: .actor, reset: .on_restart, max_bytes: 4096)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do
        end,
    end
end

@arena(scope: .grain, reset: .on_deactivate)
grain User(id: u64, msg: UserMsg) do
    receive do
        UserMsg.Ping => do
        end,
    end
end
```

The scope must match the declaration kind. reset must be one of
.on_stop, .on_restart, .on_deactivate, .generation, or .manual.
reset: .manual requires explicit reason metadata. Optional max_bytes
currently must be a positive compile-time integer literal. Invalid arena
metadata fails with E_CLUSTER_ARENA.
For compiler-generated local actors and grains, max_bytes is forwarded into
the generated start helper. The current runtime allocation for generated scalar
state slots uses u64 slots; setup fails before activation when slot_count * 8
is greater than the configured byte ceiling. The configured limit is visible by
ref or by raw local slot:
```janus
let actor_limit = cluster.local_ref_arena_max_bytes(actor_ref)
let slot_limit = cluster.local_arena_max_bytes(system, 0 as u64)
```

Capability-gated callers use local_ref_arena_max_bytes_cap and
local_arena_max_bytes_cap. Full allocator-domain accounting for arbitrary
actor-local allocations remains future runtime work.
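The arena check is simple arithmetic. Each generated scalar state slot is a u64, so setup succeeds only while slot_count * 8 <= max_bytes. With max_bytes: 4096, for example, up to 512 slots fit and a 513th fails setup. A one-function Python model follows; the name arena_setup_ok is hypothetical, and it assumes an omitted max_bytes means no enforced ceiling.

```python
U64_SLOT_BYTES = 8  # generated scalar state slots are u64


def arena_setup_ok(slot_count, max_bytes=None):
    """Hypothetical model of the generated-slot arena ceiling check."""
    if max_bytes is None:
        return True  # assumption: no max_bytes means no enforced ceiling
    return slot_count * U64_SLOT_BYTES <= max_bytes


print(arena_setup_ok(512, 4096), arena_setup_ok(513, 4096))  # True False
```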
@replicate validates the source shape for replicated or consistent session
state:
```janus
@replicate(scope: .wing)
var threat_map = alloc[Session.Replicated](0 as u64)

@replicate(scope: .swarm, protocol: .pbft)
var engagement_rules = alloc[Session.Consistent](0 as u64)
```

Allowed scopes are .wing, .cluster, and .swarm. The only v1 protocol
metadata accepted today is .pbft, and it may be omitted. Invalid replication
metadata fails with E_CLUSTER_REPLICATE; this is compile-time contract
validation, not runtime replication execution.
@reductions metadata uses one canonical shape:
```janus
@reductions(limit: 128, yield: .loop_backedge)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do
        end,
    end
end
```

limit is required and must be a positive compile-time integer literal.
yield may be omitted; if present in the current v1 surface, it must be
.loop_backedge. The old budget spelling is not a synonym and fails with
E_CLUSTER_REDUCTIONS.
For compiler-generated local actors and grains, the accepted limit is now
forwarded into the local runtime. The current executable surface counts
handler-boundary reductions: each delivered message or timeout consumes one
local reduction unit, and the runtime exposes the configured limit, remaining
budget, and yield-marker count.
```janus
let limit = cluster.local_ref_reduction_limit(actor_ref)
let remaining = cluster.local_ref_reduction_remaining(actor_ref)
let yields = cluster.local_ref_reduction_yields(actor_ref)
```

The same counters are available by raw system slot:

```janus
let limit = cluster.local_reduction_limit(system, 0 as u64)
let yields = cluster.local_reduction_yields(system, 0 as u64)
```

This is deliberately narrower than the final scheduler contract. Function-entry checks, loop-backedge checks, selective-receive scan costs, send/reply costs, and blocking-call reduction costs remain future compiler/runtime injection work under the same source annotation.
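A Python model of the handler-boundary accounting. ReductionBudget is hypothetical, and its refill rule is an assumption: the page says each delivered message or timeout consumes one unit and that a yield-marker count is exposed, but not when the budget refills. The model therefore records one yield marker and refills whenever the budget reaches zero.

```python
class ReductionBudget:
    """Hypothetical model of handler-boundary reduction accounting."""

    def __init__(self, limit):
        if limit <= 0:
            raise ValueError("limit must be positive")  # mirrors the build-time rule
        self.limit = limit
        self.remaining = limit
        self.yields = 0

    def consume_boundary(self):
        """Charge one unit for a delivered message or a fired timeout."""
        self.remaining -= 1
        if self.remaining == 0:
            self.yields += 1              # assumption: record a yield marker
            self.remaining = self.limit   # assumption: then refill the budget


budget = ReductionBudget(3)
for _ in range(7):
    budget.consume_boundary()
print(budget.limit, budget.remaining, budget.yields)  # 3 2 2
```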
@reload metadata is checked as dispatch-table source contract:
```janus
@reload(boundary: .message, state: UserState, migrate: user_v1_to_v2)
grain User(id: u64, msg: UserMsg) do
    receive do
        UserMsg.Ping => do
        end,
    end
end
```

boundary is required and must be .message, .idle,
.supervised_restart, or .forbidden. state and migrate must be declared
together. Unknown fields and non-executable boundaries fail with
E_CLUSTER_RELOAD. This is metadata validation only; signed module loading,
ABI/state hash comparison, dispatch-entry swap, and hot-reload authorization
remain runtime work.
@observe metadata also has one source shape:
```janus
@observe(mailbox: .summary, state: .none, current_message: .type_only)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do
        end,
    end
end
```

mailbox may be .summary or .none. state may be .none, .redacted,
or .full. current_message may be .none, .type_only, .redacted, or
.full. The old events field is not canonical and fails with
E_CLUSTER_OBSERVE; activation/deactivation events belong to lifecycle or
tombstone streams, not observation-level metadata.
The local v1 runtime exposes the .summary registry through capability-gated
packed snapshots:
```janus
// Status summary: 0 (not present) for absent or stale refs.
let summary = cluster.local_observe_ref_summary_cap(cap, actor_ref)
if cluster.local_observe_is_present(summary) do
    let lifecycle = cluster.local_observe_lifecycle(summary)
    let pending = cluster.local_observe_mailbox_len(summary)
    let restarts = cluster.local_observe_restart_count(summary)
end

// Reduction summary: configured limit, remaining budget, yield markers.
let reductions = cluster.local_observe_ref_reductions_cap(cap, actor_ref)
if cluster.local_observe_is_present(reductions) do
    let limit = cluster.local_observe_reduction_limit(reductions)
    let remaining = cluster.local_observe_reduction_remaining(reductions)
    let yields = cluster.local_observe_reduction_yields(reductions)
end

let reason = cluster.local_ref_schedule_reason_cap(cap, actor_ref)
```

Use local_observe_child_summary_cap(cap, system, slot) when the caller has a
system handle and slot rather than a stable ref. Use
local_observe_child_reductions_cap(cap, system, slot) for the equivalent
packed reduction counters. The status summary exposes only status metadata:
lifecycle, task state, last exit reason, mailbox length, mailbox capacity, and
restart count. The reduction summary exposes configured limit, remaining budget,
and yield markers. local_ref_schedule_reason_cap and
local_schedule_reason_cap expose the last local scheduling reason as one of
SCHEDULE_REASON_NONE, SCHEDULE_REASON_MESSAGE, or
SCHEDULE_REASON_REDUCTION_YIELD. Observation summaries return 0 for absent
or stale refs. State snapshots, payload snapshots, and cross-node aggregation
remain future observation levels.
@tombstone metadata uses explicit hot-index policy fields:
```janus
@tombstone(enabled: true, digest_includes: [.payload], retention_window: 60_000, deadly_threshold: 3)
actor Worker(msg: WorkMsg) do
    receive do
        WorkMsg.Ping => do end,
    end
end
```
enabled must be true or false. digest_includes may list .payload
and .state; state digests still require a redacted observation or
serialization contract. retention_window and deadly_threshold must be
positive compile-time integer literals. The old classifier field is not
canonical and fails with E_CLUSTER_TOMBSTONE.
@behaviour metadata validates common actor/grain shapes:
```janus
@behaviour(.server)
actor Worker(msg: WorkMsg) do
    init(start: i64) -> i64 do
        return start
    end

    receive do
        WorkMsg.Ping => do end,
    end
end
```
The v1 compiler accepts exactly one positional behaviour symbol. Known symbols
are .server, .worker, .event_handler, .state_machine, and
.supervisor. .server currently requires an init hook so the state shape
is visible. .supervisor belongs to supervisor ... end syntax, not an
actor/grain annotation. Shape mismatches fail with CL-E1413.
Mailbox pressure is observable through scalar status accessors:
```janus
let pending = cluster.local_child_mailbox_len(system, 0 as u64)
let slots = cluster.local_child_mailbox_capacity(system, 0 as u64)
```
With the default mailbox, slots == 1. An actor declared with
@mailbox(capacity: 4) reports slots == 4. Both functions return -1
when the slot has no live child.
Janus Status Accessors
The Janus facade exposes local supervisor and child status without exposing actor state:
```janus
let supervisor_state = cluster.local_supervisor_state(system)
let lifecycle = cluster.local_child_lifecycle(system, 0 as u64)
let task_state = cluster.local_child_task_state(system, 0 as u64)
let last_exit = cluster.local_child_last_exit(system, 0 as u64)
```
local_supervisor_state returns:

- SUPERVISOR_STATE_RUNNING
- SUPERVISOR_STATE_STOPPED
- SUPERVISOR_STATE_FAILED
- -1 for an invalid handle

local_child_lifecycle returns:

- CHILD_LIFECYCLE_UNCONFIGURED
- CHILD_LIFECYCLE_CONFIGURED
- CHILD_LIFECYCLE_RUNNING
- CHILD_LIFECYCLE_STOPPED
- CHILD_LIFECYCLE_FAILED
- -1 for an invalid handle or slot
local_child_task_state returns TASK_STATE_READY,
TASK_STATE_RUNNING, TASK_STATE_BLOCKED,
TASK_STATE_BUDGET_EXHAUSTED, TASK_STATE_COMPLETED,
TASK_STATE_CANCELLED, or -1 when no live task is present.
local_child_last_exit returns the same STOP_REASON_* codes used
by local_handle_exit, or -1 when no exit is recorded.
Prefer local_observe_ref_summary_cap or local_observe_child_summary_cap for
the canonical capability-gated status snapshot. The individual accessors remain
low-level local bridge tools and compatibility probes.
Every status accessor has a _cap form that consumes
ClusterLocalCap. These accessors report lifecycle and pressure only;
they do not expose actor-local variables or grain-owned state.
Reduction accessors follow the same local-only rule. Prefer
local_observe_ref_reductions_cap or local_observe_child_reductions_cap when
the caller is already using the observation registry. The lower-level
local_ref_reduction_* helpers accept a stable actor or grain ref, and
local_reduction_* accepts a system handle plus child slot. The values are
counters, not scheduler authority; code that changes reduction policy or forces
preemption still belongs behind Cap.cluster.preempt.
Scheduling reason accessors are also local-only. local_ref_schedule_reason_*
accepts a stable actor or grain ref, and local_schedule_reason_* accepts a
system handle plus child slot. The current reason codes identify no observed
dispatch, ordinary message dispatch, or reduction-budget yield marker.
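The reduction counters and schedule reasons can be pictured as a per-actor budget that message dispatch draws down. The Python model below is a hypothetical sketch: ReductionModel, the per-dispatch cost argument, and the refill-on-yield rule are assumptions, and the string values stand in for the SCHEDULE_REASON_* constants, whose numeric codes this page does not give. Like the real accessors, it only counts; it does not preempt anything.

```python
from enum import Enum


class ScheduleReason(Enum):
    # Names mirror the documented constants; values are placeholders.
    NONE = "SCHEDULE_REASON_NONE"
    MESSAGE = "SCHEDULE_REASON_MESSAGE"
    REDUCTION_YIELD = "SCHEDULE_REASON_REDUCTION_YIELD"


class ReductionModel:
    """Hypothetical counters for one actor: limit, remaining, yields, last reason."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.remaining = limit
        self.yields = 0
        self.last_reason = ScheduleReason.NONE  # no dispatch observed yet

    def dispatch(self, cost: int) -> ScheduleReason:
        """Record one message dispatch that consumed `cost` reductions."""
        self.remaining -= cost
        if self.remaining <= 0:
            # Budget spent: record a yield marker and refill (assumed policy).
            self.yields += 1
            self.remaining = self.limit
            self.last_reason = ScheduleReason.REDUCTION_YIELD
        else:
            self.last_reason = ScheduleReason.MESSAGE
        return self.last_reason
```

With a limit of 3, a dispatch costing 1 reports a message reason, and a following dispatch costing 2 exhausts the budget and reports a reduction yield.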
LocalActorSystem
LocalActorSystem is the ergonomic root for the local tracer bullet. It keeps callers on the public std.cluster path instead of reaching into runtime internals.

```zig
const cluster = @import("std_cluster");

var system = try cluster.LocalActorSystem.init(
    allocator,
    1, // nursery id
    "root", // supervisor id
    .one_for_one,
    2, // child slots
);
defer system.deinit();
```
Starting Children
Children are started from ChildSpec values. A child start function receives the actor-system nursery and the allocator owned by the supervisor.
```zig
const std = @import("std");

fn startWorker(nursery: *cluster.Nursery, allocator: std.mem.Allocator) !cluster.SupervisedChild {
    // Allocate the actor with the supervisor-owned allocator; errdefer frees it
    // if a later step fails.
    const actor = try allocator.create(cluster.Actor);
    errdefer allocator.destroy(actor);

    actor.* = try cluster.Actor.init(allocator, 1, 1);
    errdefer actor.deinit();

    // workerHandler is the application's message handler (not shown).
    const task = cluster.spawn(nursery, actor, workerHandler) orelse return error.ActorSpawnRejected;
    return .{ .actor = actor, .task = task };
}

_ = try system.startChild(0, .{
    .id = "worker",
    .start_fn = startWorker,
    .restart = .permanent,
});
```
You can also configure children first and start them later:

```zig
try system.configureChild(0, .{
    .id = "worker",
    .start_fn = startWorker,
    .restart = .permanent,
});

const started = try system.startConfiguredChildren();
```
Handling Exits
Use handleCrash for ordinary abnormal actor failure:

```zig
try system.handleCrash(0);
```

Use handleExit when the caller knows the exact stop reason:

```zig
try system.handleExit(0, .pledge_violated);
```

The Janus facade exposes the same path with stable STOP_REASON_* codes:

```janus
if cluster.local_handle_exit(
    system,
    0 as u64,
    cluster.STOP_REASON_PLEDGE_VIOLATED,
) != 1 as i32 do
    return 1
end
```

Use handleExitAt for deterministic restart-window tests or runtime loops that already have a timestamp:

```zig
try system.handleExitAt(0, .abnormal, 100);
const status = system.statusAt(100);
```
Actor Tombstones
Abnormal terminal exits now produce actor tombstones. Normal exits and shutdown exits are intentionally skipped; tombstones are for failure classes that may need replay, audit, or repair.
The local runtime keeps the existing bounded in-memory tombstone index and can also mirror each tombstone to a caller-provided sink. Use the typed Janus sink hook:
local_set_tombstone_sink(system, ctx_addr, append_callback). The callback is
a top-level func(u64, u64) -> i32; the compiler lowers it to internal bridge
plumbing. The legacy _addr hook remains bridge-only compatibility surface and
must not be taught as the public callback API.
The callback receives an opaque context pointer and a callback-scoped
record pointer. Copy or persist the record during the callback; do not
retain record_raw.
Sink counters are exposed for monitoring:
```janus
let stored = cluster.local_tombstone_sink_appends(system)
let failed = cluster.local_tombstone_sink_failures(system)
```
Stable stop-reason codes are available as STOP_REASON_NORMAL,
STOP_REASON_SHUTDOWN, STOP_REASON_ABNORMAL, STOP_REASON_KILLED,
STOP_REASON_PLEDGE_VIOLATED, and STOP_REASON_MIGRATION_ABORTED.
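The tombstone rules so far (skip normal and shutdown exits, keep a bounded hot index, optionally mirror to a sink, count sink appends and failures) can be modeled in a few lines. This Python sketch is hypothetical: TombstoneIndexModel, the capacity, sequence numbering from 1, oldest-first eviction, and treating a sink return of 1 as success (borrowed from the Janus sink example in the STL section) are assumptions. Reason strings mirror the Zig enum tags such as .abnormal and .pledge_violated.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

# Exits that never produce a tombstone.
SKIPPED_REASONS = {"normal", "shutdown"}


@dataclass(frozen=True)
class Tombstone:
    sequence: int
    child: int
    reason: str


class TombstoneIndexModel:
    """Hypothetical bounded hot index that can mirror records to a sink."""

    def __init__(self, capacity: int,
                 sink: Optional[Callable[[Tombstone], int]] = None) -> None:
        self.hot = deque(maxlen=capacity)  # oldest records are evicted first
        self.next_sequence = 1             # 0 is reserved for "no tombstone"
        self.sink = sink
        self.sink_appends = 0
        self.sink_failures = 0

    def on_exit(self, child: int, reason: str) -> int:
        """Return the new tombstone sequence, or 0 if the exit is skipped."""
        if reason in SKIPPED_REASONS:
            return 0
        record = Tombstone(self.next_sequence, child, reason)
        self.next_sequence += 1
        self.hot.append(record)
        if self.sink is not None:
            # Assumption: the sink returns 1 on success, anything else on failure.
            if self.sink(record) == 1:
                self.sink_appends += 1
            else:
                self.sink_failures += 1
        return record.sequence

    def latest_sequence(self) -> int:
        return self.hot[-1].sequence if self.hot else 0
```

Reserving sequence 0 for "no tombstone" matches the presence-check convention used by local_latest_tombstone_sequence_cap.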
Tombstone Classification
The supervisor hot index can classify the latest tombstone against prior tombstones with the same deterministic pattern: child slot, spec id, stop reason, code version, and input digest. Janus exposes scalar accessors for the current local runtime:

```janus
let matches = cluster.local_tombstone_classify_match_count(
    system,
    now_seconds,
    3 as u32,
    60 as i64,
)

let deadly = cluster.local_tombstone_classify_deadly(
    system,
    now_seconds,
    3 as u32,
    60 as i64,
)

let oldest = cluster.local_tombstone_classify_oldest_sequence(
    system,
    now_seconds,
    3 as u32,
    60 as i64,
)
```
matches is the number of hot-index tombstones matching the latest
pattern inside the window. deadly returns 1 when matches reaches the
threshold. oldest returns the oldest matching tombstone sequence, or 0
when no latest tombstone exists. Each function also has a _cap form that
consumes ClusterLocalCap.
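The three classifier results can be derived together from the hot index. The Python sketch below is a hypothetical model, not the runtime's implementation: HotTombstone and classify are illustrative names, and it assumes the window is an inclusive lower bound on timestamps (now_seconds - window) and that the latest tombstone counts toward its own match total.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HotTombstone:
    sequence: int
    timestamp_seconds: int
    child: int
    spec_id: str
    reason: str
    code_version: int
    input_digest: int

    def pattern(self) -> tuple:
        # The deterministic pattern the classifier compares.
        return (self.child, self.spec_id, self.reason,
                self.code_version, self.input_digest)


def classify(hot: Sequence[HotTombstone], now_seconds: int,
             threshold: int, window_seconds: int) -> Tuple[int, int, int]:
    """Return (matches, deadly, oldest_sequence) for the latest tombstone."""
    if not hot:
        return 0, 0, 0  # no latest tombstone
    latest = max(hot, key=lambda t: t.sequence)
    cutoff = now_seconds - window_seconds  # assumed inclusive lower bound
    matching = [t for t in hot
                if t.pattern() == latest.pattern()
                and t.timestamp_seconds >= cutoff]
    matches = len(matching)  # the latest record counts itself
    deadly = 1 if matches >= threshold else 0
    oldest = min((t.sequence for t in matching), default=0)
    return matches, deadly, oldest
```

A tombstone with a different input digest never matches, even inside the window, so a crash loop is only classified as deadly when the same input keeps failing the same code version.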
The latest hot-index tombstone can also be observed as bounded scalar metadata:
```janus
let seq = cluster.local_latest_tombstone_sequence_cap(cap, system)
if seq != 0 as u64 do
    let child = cluster.local_latest_tombstone_child_cap(cap, system)
    let reason = cluster.local_latest_tombstone_reason_cap(cap, system)
    let code = cluster.local_latest_tombstone_code_version_cap(cap, system)
    let digest = cluster.local_latest_tombstone_input_digest_cap(cap, system)
    let has_replay = cluster.local_latest_tombstone_replay_token_present_cap(cap, system)
    let attempt = cluster.local_latest_tombstone_attempt_count_cap(cap, system)
    _ = child
    _ = reason
    _ = code
    _ = digest
    _ = has_replay
    _ = attempt
end
```
local_latest_tombstone_sequence_cap is the presence check. When it returns
0, the child accessor also returns 0; callers should not treat that as a
real child slot without a nonzero sequence. Replay-token observation is a
presence flag only. The token value is not exposed by this surface because
replay is a separate diagnostic authority.
Tombstone Quarantine
The local supervisor can suppress deterministic-deadly restart loops before the restart budget is exhausted. Quarantine is explicit local runtime policy:

```janus
let cap = caps.unsafe_forge_cluster_local_cap()

_ = cluster.local_set_tombstone_quarantine_config_cap(
    cap,
    system,
    3 as u32,
    60 as i64,
)
_ = cluster.local_set_tombstone_quarantine_cap(cap, system, 1 as u32)

let quarantined = cluster.local_child_quarantined_cap(cap, system, 0 as u64)
let total = cluster.local_quarantined_children_cap(cap, system)
let first = cluster.local_first_quarantined_child_cap(cap, system)
```
local_child_lifecycle_cap returns CHILD_LIFECYCLE_QUARANTINED for a
configured child that the local tombstone classifier has suppressed.
local_clear_tombstone_quarantine_cap clears the local mark for a slot; it
does not restart the child, delete tombstones, replay payloads, or affect
distributed placement policy. Cross-node quarantine gossip and placement
aggregation remain runtime/operator work.
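The quarantine rules can be read as a small state machine over child slots. This Python sketch is hypothetical: QuarantineModel and its method names are illustrative, the deadly flag stands in for the classifier result, and "restart_policy" means only that the exit falls through to the ordinary restart rules (policy and budget). It shows that clearing a mark does not restart the child.

```python
class QuarantineModel:
    """Hypothetical local quarantine marks for supervised child slots."""

    def __init__(self) -> None:
        self.enabled = False  # quarantine is opt-in local policy
        self.marked = set()

    def on_abnormal_exit(self, slot: int, deadly: bool) -> str:
        """Decide what happens after an abnormal exit of `slot`."""
        if self.enabled and deadly:
            # Suppress the restart loop instead of spending restart budget.
            self.marked.add(slot)
            return "quarantined"
        return "restart_policy"  # fall through to the normal restart rules

    def is_quarantined(self, slot: int) -> bool:
        return slot in self.marked

    def quarantined_children(self) -> int:
        return len(self.marked)

    def clear(self, slot: int) -> None:
        # Clearing drops the mark only: no restart, no tombstone deletion.
        self.marked.discard(slot)
```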
Tombstones To STL
std.cluster.tombstones converts callback records into canonical STL
events. The adapter keeps cluster supervision and STL storage separate:
the sink copies scalar tombstone fields, builds an ActorTombstone, and
appends through an std.stl.lsm_store.LSMStore.
```janus
use std.cluster.local as cluster
use std.cluster.tombstones as tombstones
use std.db.lsm as lsm
use std.stl.lsm_store as lsm_store
use std.stl.store as store

pub func tombstone_sink(ctx: u64, record_raw: u64) -> i32 do
    let gs = as[*lsm.GrainStoreBytes](ctx)
    var stl = lsm_store.make_store(gs)

    var t = tombstones.zero()
    t.sequence = cluster.tombstone_sequence(record_raw)
    t.child = cluster.tombstone_child(record_raw)
    t.reason = cluster.tombstone_reason(record_raw)
    t.attempt_count = cluster.tombstone_attempt_count(record_raw)
    t.timestamp_seconds = cluster.tombstone_timestamp_seconds(record_raw)

    if tombstones.append_lsm(&stl, &t) != store.STORE_OK do
        return 0
    end
    return 1
end
```
The sink context should point at the borrowed GrainStoreBytes. The
callback creates a short-lived LSMStore wrapper over that same store;
fresh wrappers can rescan LSM truth later for count, rank lookup, and
flush.
Task Completion Routing
The local actor system can route a completed nursery task back to the supervised child slot:

```zig
const task = system.childTaskAt(1) orelse return error.MissingTask;
task.markCompleted(5);

const restarted_idx = try system.handleTaskCompleteByTask(task);
```
Stale task handles are rejected. This matters after a restart, because the old task pointer must not be allowed to affect the replacement child.
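One common way to make stale handles detectable is a per-slot generation counter that every restart bumps. This page does not say how std.cluster performs the check, so the Python model below is an assumption throughout: TaskHandle, ChildSlots, StaleTaskError, and the generation field are hypothetical, and the model assumes completion always restarts the slot, as with a permanent child.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskHandle:
    slot: int
    generation: int  # which incarnation of the child this handle belongs to


class StaleTaskError(Exception):
    pass


class ChildSlots:
    """Hypothetical slot table that rejects handles from earlier incarnations."""

    def __init__(self, count: int) -> None:
        self.generation = [0] * count
        self.restarts = [0] * count

    def task_at(self, slot: int) -> TaskHandle:
        return TaskHandle(slot, self.generation[slot])

    def handle_task_complete(self, task: TaskHandle) -> int:
        if task.generation != self.generation[task.slot]:
            raise StaleTaskError(f"handle for slot {task.slot} is from an earlier child")
        # Restart: the replacement child gets a fresh generation.
        self.generation[task.slot] += 1
        self.restarts[task.slot] += 1
        return task.slot
```

After one completion, the old handle no longer matches the slot's generation, so replaying it cannot trigger a second restart.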
Restart Controls
Restart budgets are opt-in:

```zig
system.setRestartLimit(2, 60);
```

From Janus:

```janus
_ = cluster.local_set_restart_limit(system, 2 as u32, 60 as i64)
```
The limit is counted per restart window. When the budget is exhausted, the supervisor moves to failed, records the failed child and reason, and stops the remaining active children as part of its failure cleanup.
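The page says only that the limit is counted per restart window. The Python sketch below assumes a sliding window (at most limit restarts in any span of window time units, using the same units as the handleExitAt timestamps) and that a failed supervisor stays failed. RestartBudgetModel and its methods are hypothetical names; the real runtime may use a different window rule.

```python
from collections import deque


class RestartBudgetModel:
    """Hypothetical sliding-window budget: at most `limit` restarts per window."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        self.granted = deque()  # timestamps of restarts inside the window
        self.exhausted = False

    def _expire(self, now: int) -> None:
        # Drop restarts that have aged out of the window.
        while self.granted and self.granted[0] <= now - self.window:
            self.granted.popleft()

    def request_restart(self, now: int) -> bool:
        if self.exhausted:
            return False  # a failed supervisor stays failed
        self._expire(now)
        if len(self.granted) >= self.limit:
            self.exhausted = True  # the supervisor would move to failed here
            return False
        self.granted.append(now)
        return True

    def remaining(self, now: int) -> int:
        self._expire(now)
        return self.limit - len(self.granted)
```

With RestartBudgetModel(2, 60), restarts at times 0 and 10 succeed and a third at time 20 exhausts the budget.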
Janus callers can test exhaustion through:
```janus
let exhausted = cluster.local_restart_limit_exhausted(system)
```
Pledge violations do not restart by default. This is intentional because pledge failure is a capability boundary event, not an ordinary crash. Local systems can explicitly opt in:

```zig
system.setRestartPledgeViolations(true);
```

From Janus:

```janus
_ = cluster.local_set_restart_pledge_violations(system, 1 as u32)
```
At the source supervisor surface, the same explicit choice is loud. A
declaration with restart_pledge_violations: true builds, but emits PU-W007
so the opt-in is visible during review.
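The pledge-violation default interacts with the per-child restart policy. The Python sketch below models that decision. Only the pledge rule comes from this page; the meanings of permanent, transient, and temporary are assumptions borrowed from Erlang/OTP supervisor conventions, and should_restart and RestartPolicy are hypothetical names. Restart budgets and quarantine are left out.

```python
from enum import Enum


class RestartPolicy(Enum):
    PERMANENT = "permanent"
    TRANSIENT = "transient"
    TEMPORARY = "temporary"


def should_restart(policy: RestartPolicy, reason: str,
                   restart_pledge_violations: bool = False) -> bool:
    """Decide whether a child that exited with `reason` is restarted."""
    if reason == "pledge_violated" and not restart_pledge_violations:
        # Documented default: a pledge violation is a capability boundary
        # event, so it is not restarted unless the system opts in.
        return False
    if policy is RestartPolicy.TEMPORARY:
        return False  # assumed: never restarted
    if policy is RestartPolicy.TRANSIENT:
        return reason not in ("normal", "shutdown")  # assumed: abnormal only
    return True  # assumed: permanent children are always restarted
```

Checking the pledge rule first keeps the opt-in independent of the policy: even with the opt-in enabled, a temporary child is still not restarted.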
Lifecycle
Use stopChild, stopChildren, or shutdown for explicit lifecycle control:

```zig
try system.stopChild(0, .shutdown);
_ = try system.stopChildren(.killed);
system.shutdown();
```
shutdown stops active children and moves the supervisor to stopped.
Status Inspection
The facade exposes supervisor and child snapshots:

```zig
const supervisor_status = system.status();
const child_status = system.childStatus(0);
const failure = system.failure();
```
SupervisorStatus includes:
- strategy and state
- slot count
- configured, active, stopped, and failed child counts
- total restarts
- restart exhaustion metadata
- restart limit and remaining restarts
ChildStatus includes:
- lifecycle
- configured spec id and restart policy
- actor id and task id when running
- task state when available
- last exit reason
- restart count
Current Limits
- Local runtime only.
- Grains are local only; there is no distributed grain API.
- No placement, membership, gossip, or remote send.
- No automatic actor registry integration.
- No hot reload.
- No persistence for actor state. Actor tombstones can be persisted to STL; live actor state replay remains future work.
- Slot type is u64. Heterogeneous typed state and non-u64 payload fields remain future work.
The current goal is a correct local supervised-actor tracer bullet. Distributed :cluster features build on this surface later.