:cluster — The Sanctum
“Fault tolerance by design.”
:cluster is where Janus becomes a system for building software that endures.
Everything from :service plus the actor model, supervised lifecycle, local
grain activation shells, the local grain activation registry, local grain
namespace lookup, and future distribution layers belongs here.
The current compiler/runtime slice is local supervised actors plus the first
grain source-contract shell, local single-writer activation registry, local
namespace lookup layer, and explicit GrainStore-backed lifecycle callbacks.
Scalar on_activate/on_deactivate hooks execute in the local grain runtime.
Local idle passivation is executable through explicit activity touch and
deterministic sweep calls. Grains with
@lifecycle(..., deactivation: .idle_timeout(ms)) also emit a generated
Name_passivate_idle(system, now_ms, reason) helper that carries the source
timeout literal into the local runtime.
Local namespace mappings can be persisted in GrainStoreBytes and restored into
a fresh local actor system before activation.
@reductions(limit: N) is also executable for generated local actor/grain
starts as handler-boundary budget accounting with visible local counters.
Source supervisor ... child ... end declarations now emit local
Name_start_link(node_id) helpers for actor children. Compiler-generated typed
state serializers, non-u64 state slots, remote placement, full
loop-backedge/function-entry reduction injection, migration, distributed
namespace synchronization, and distributed registries are roadmap work, shown
below as future sketches where noted.
What :cluster Gives You
Actors: Concurrent Entities with Mailboxes
The canonical shape today is actor X do var ... receive do match __msg { ... } end end.
Each actor compiles to a setup/handler/destroy triple that the
generated X_start_supervised(system, slot, policy) wrapper threads
into the local supervisor. Messages are i64 and dispatch is over
their value via an explicit match __msg.

    actor Counter do
        var count: i64 = 0

        receive do
            match __msg {
                0 => do count = count + 1 end,
                1 => do return 0 end,
                _ => do count = count end,
            }
        end
    end

- Isolated state: No shared memory; each var is a private slot.
- Auto-supervised: Counter_setup / Counter_handler / Counter_destroy and Counter_start_supervised are auto-emitted alongside the spawn-form __Counter_loop.
- No locks: Message passing is the only concurrency.
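
To make the setup/handler/destroy triple concrete, here is a minimal Python model of the Counter actor above. It is an illustration only, not Janus compiler output: the run_actor driver, the dict-based state, and the convention that a handler returning a non-None value stops the actor are all assumptions made for this sketch.

```python
from collections import deque

def counter_setup():
    # Assumed model: the actor's private state slots live in a dict it owns.
    return {"count": 0}

def counter_handler(state, msg):
    # Dispatch on the integer message value, mirroring `match __msg`.
    if msg == 0:
        state["count"] += 1
        return None   # keep running
    if msg == 1:
        return 0      # assumed to stop the actor, like `return 0` in the arm
    return None       # default arm: state unchanged

def counter_destroy(state):
    # Hand a copy of the final state back so the caller can inspect it.
    return dict(state)

def run_actor(setup, handler, destroy, messages):
    """Hypothetical driver: feed a mailbox to one actor, one message at a time."""
    mailbox = deque(messages)
    state = setup()
    while mailbox:
        result = handler(state, mailbox.popleft())
        if result is not None:
            break
    return destroy(state), len(mailbox)

final_state, unread = run_actor(
    counter_setup, counter_handler, counter_destroy, [0, 0, 7, 1, 0]
)
print(final_state, unread)  # {'count': 2} 1
```

Only the handler ever touches the state, and it sees one message at a time, which is why no lock is needed in this model.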
Walk through it hands-on in the Stateful Actors tutorial.
Typed message protocols are now local actor syntax, not just a sketch:
message declarations may include payload variants, ActorRef[Msg]
checks the send protocol, and receive arms can destructure local boxed
payload messages:

    message Cmd {
        Tick,
        Set { value: u64 },
        Stop,
    }

    actor Counter(msg: Cmd) do
        var count: u64 = 0

        receive do
            Cmd.Tick => do count += 1 end,
            Cmd.Set { value } when value >= 0 as u64 => do count += value end,
            Cmd.Stop => do return 0 end,
            after 30_000 => do count = count end,
        end
    end

This is still the node-local actor path. Guards and receive-loop
timeouts are live; supervised actors register after arms as local mailbox
timeouts. Distributed payload wire formats remain future
:cluster work.
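
The two ideas in this section, a protocol check at the send site and guarded payload destructuring in the receive arms, can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than a description of the Janus runtime: the send and handle functions are hypothetical, timeouts are left out, and a message whose guard fails is assumed to leave the state unchanged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tick:
    pass

@dataclass(frozen=True)
class Set:
    value: int

@dataclass(frozen=True)
class Stop:
    pass

CMD = (Tick, Set, Stop)  # the variants of the Cmd protocol

def send(mailbox, msg):
    # Stand-in for the ActorRef[Cmd] check: reject anything outside the protocol.
    if not isinstance(msg, CMD):
        raise TypeError(f"not a Cmd variant: {msg!r}")
    mailbox.append(msg)

def handle(count, msg):
    """Return (new_count, keep_running) for one message."""
    if isinstance(msg, Tick):
        return count + 1, True
    if isinstance(msg, Set) and msg.value >= 0:  # guard, like `when value >= 0`
        return count + msg.value, True
    if isinstance(msg, Stop):
        return count, False
    return count, True  # assumed: guard failed, no arm matched, state unchanged

mailbox = []
for msg in (Tick(), Set(5), Set(-3), Stop(), Tick()):
    send(mailbox, msg)

count, running = 0, True
for msg in mailbox:
    count, running = handle(count, msg)
    if not running:
        break

print(count)  # 6
```

In Janus the payload is a u64, so the guard in the source example can never fail; the negative value here exists only to exercise the guard path in the model.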
Local Grain Shell: Virtual Identity Shape

    message UserMsg {
        Ping,
        Stop,
    }

    @persist(via: GrainStoreBytes)
    @lifecycle(activation: .lazy, deactivation: .idle_timeout(300_000))
    @reload(boundary: .message, state: UserState, migrate: user_v1_to_v2)
    @reductions(limit: 128)
    @arena(scope: .grain, reset: .on_deactivate)
    @observe(mailbox: .summary, state: .none)
    @tombstone(digest_includes: [.payload], retention_window: 60_000, deadly_threshold: 3)
    @behaviour(.worker)
    grain User(id: u64, msg: UserMsg) do
        var count: u64 = 0

        on_activate(stored: u64) -> u64 do
            return stored
        end

        on_deactivate(state: u64) -> u64 do
            return state
        end

        receive do
            UserMsg.Ping => do count += 1 end,
            UserMsg.Stop => do return 0 end,
        end
    end

- Live now: the parser accepts grain Name(id: Id, msg: Msg), @persist, @lifecycle including deactivation metadata, lifecycle hooks, future-runtime annotations, state slots, receive arms, and emits a local supervised start wrapper.
- Live now: janus build validates @persist(via: GrainStoreBytes) and grain @lifecycle(activation: .lazy, deactivation: .never | .idle_timeout(ms)). Invalid persistence metadata fails with E_CLUSTER_PERSIST; invalid lifecycle metadata fails with E_CLUSTER_LIFECYCLE.
- Live now: janus build validates @replicate(scope: .wing | .cluster | .swarm, protocol: .pbft) as source metadata. Invalid replication metadata fails with E_CLUSTER_REPLICATE; actual replication remains runtime work.
- Live now: cluster.local_grain_lookup_or_start(...) maps a numeric (grain_type, grain_id) to one stable local activation ref while it is live.
- Live now: cluster.local_grain_lookup_or_start_namespace(...) maps a local (grain_type, namespace) key to an internal durable id, then reuses the same single-writer activation registry.
- Live now: std.cluster.grainstore.bind_local_namespace_u64(...) can persist and mirror a namespace binding, and std.cluster.grainstore.restore_local_namespace_u64(...) can restore that binding into a fresh local actor system before activation.
- Live now: cluster.local_grain_lookup_or_start_persistent(...) invokes explicit load/store callbacks that can restore and commit state through GrainStoreBytes.
- Live now: persistent source grains that import std.cluster.persist as persist also emit Name_lookup_or_start_persistent_state0_u64(system, grain_id, slot, policy, ctx) for the current scalar slot-0 u64 runtime. It uses canonical GrainStoreBytes load/store callbacks, so simple grains no longer need to hand-write the callback pair.
- Live now: persistent source grains also emit Name_lookup_or_start_persistent_slots_u64(system, grain_id, slot, policy, ctx) for all current scalar u64 state slots.
- Live now: namespace-addressed persistent source grains emit Name_lookup_or_start_namespace_persistent_slots_u64(system, namespace, slot, policy, ctx). The helper resolves the local namespace binding first, then uses generated all-slot scalar callbacks.
- Live now: std.cluster.persist exposes explicit arbitrary-slot u64 helpers for nonzero scalar state slots: get_slot_u64, put_slot_u64, load_slot_u64, store_slot_u64, load_slots_u64, and store_slots_u64.
- Live now: generated on_activate hooks run after load and before the first message; generated on_deactivate hooks run before teardown and the final store for the current scalar state-slot implementation.
- Live now: cluster.local_grain_touch(ref, now_ms) records a visible activity boundary, and cluster.local_grain_passivate_idle(system, timeout_ms, now_ms, reason) passivates idle local grains through the same on_deactivate/store boundary. The caller supplies milliseconds; no hidden scheduler or wall-clock read is implied.
- Live now: a source grain with @lifecycle(activation: .lazy, deactivation: .idle_timeout(ms)) emits Name_passivate_idle(system, now_ms, reason). The generated helper uses the metadata literal as the timeout while keeping the sweep time and stop reason explicit.
- Live now: @reductions(limit: N) on compiler-generated local actors and grains forwards the configured limit into the runtime. The local facade exposes limit, remaining budget, and yield-marker counters at handler boundaries through local_ref_reduction_* and local_reduction_*; the capability-gated observation registry also exposes the same counters through local_observe_ref_reductions_cap and local_observe_child_reductions_cap. local_ref_schedule_reason_cap and local_schedule_reason_cap expose the last local reason as none, message dispatch, or reduction-yield marker.
- Live now: @arena(max_bytes: N) on compiler-generated local actors and grains forwards the configured ceiling into the runtime. The current executable boundary enforces the byte ceiling for generated scalar u64 state slots and exposes it through local_ref_arena_max_bytes(_cap) and local_arena_max_bytes(_cap). Arbitrary actor-local allocation accounting remains future runtime work.
- Live now: local tombstone quarantine can be enabled and configured from Janus source. local_child_quarantined_cap, local_quarantined_children_cap, and local_first_quarantined_child_cap expose the local suppression state; clearing quarantine is explicit and does not restart, replay, or delete tombstones.
- Live now: latest local tombstone metadata can be observed as bounded scalar fields: sequence, child slot, stop reason, code version, redacted input digest, replay-token presence, state epoch, attempt count, and timestamp. Replay-token contents are not exposed by this observation path.
- Live now: local grain persistence exposes per-system load/store failure counters so operators can detect callback failures instead of inferring them from stopped activations.
- Not live yet: heterogeneous typed GrainStore serializers, non-u64 state slots, full reduction preemption injection, distributed scheduling aggregation, hidden scheduler-owned passivation loops, migration, cross-node tombstone gossip, placement quarantine aggregation, and remote routing.
- Rule: a grain is virtual identity with owned state. The current shell proves the source shape; the local registry pins the single-writer identity invariant.
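
The lifecycle ordering and the explicit-clock passivation described in the list above can be sketched as a small Python model. Everything here is hypothetical and only mirrors the behaviour stated in the bullets: the LocalGrains class, its dict-backed store, and the choice to treat an activation as idle once now_ms - last_touch >= timeout_ms are assumptions, not the Janus runtime.

```python
class LocalGrains:
    """Hypothetical model of a single-writer local activation registry."""

    def __init__(self, store):
        self.store = store  # stands in for a GrainStore: {(type, id): state}
        self.live = {}      # (type, id) -> {"state": int, "last_touch": int}
        self.log = []       # lifecycle events, in the order they happen

    def lookup_or_start(self, grain_type, grain_id, now_ms):
        key = (grain_type, grain_id)
        if key not in self.live:
            stored = self.store.get(key, 0)
            self.log.append("load")
            self.log.append("on_activate")  # hook runs after load
            self.live[key] = {"state": stored, "last_touch": now_ms}
        return key  # same ref for as long as the activation is live

    def touch(self, ref, now_ms):
        # The caller supplies the time; the model never reads a clock.
        self.live[ref]["last_touch"] = now_ms

    def passivate_idle(self, timeout_ms, now_ms):
        idle = [k for k, a in self.live.items()
                if now_ms - a["last_touch"] >= timeout_ms]
        for key in idle:
            activation = self.live.pop(key)
            self.log.append("on_deactivate")  # hook runs before the final store
            self.store[key] = activation["state"]
            self.log.append("store")
        return len(idle)

store = {("User", 7): 41}
grains = LocalGrains(store)
a = grains.lookup_or_start("User", 7, now_ms=0)
b = grains.lookup_or_start("User", 7, now_ms=10)  # reuses the live activation
grains.live[a]["state"] += 1                      # pretend a Ping was handled
grains.touch(a, now_ms=100)
early = grains.passivate_idle(300, now_ms=350)    # only 250 ms idle
late = grains.passivate_idle(300, now_ms=400)     # 300 ms idle
print(a == b, early, late, store[("User", 7)])    # True 0 1 42
```

Because both the touch and the sweep take now_ms as an argument, a test can replay any schedule deterministically, which is the point of keeping the scheduler out of the picture.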
Local Supervision Trees

    supervisor GameServerSupervisor, strategy: .one_for_one, restart_pledge_violations: true do
        child LobbyManager, restart: .permanent
        child MatchMaker, restart: .permanent
        child MetricsCollector, restart: .temporary
    end

    let system = GameServerSupervisor_start_link(1 as u64)

- one_for_one: Restart only the crashed child
- one_for_all: Restart all children if any one crashes
- rest_for_one: Restart the crashed child and all subsequent children
- permanent / transient / temporary: Child restart policy is visible at the declaration site
- PU-W007: restart_pledge_violations: true is allowed but intentionally loud
The v1 helper is local and actor-child oriented. It creates the local supervisor
system, applies the declared strategy and pledge-restart opt-in, starts each
child through its generated supervised reference helper, and returns 0 if the
tree cannot start cleanly.
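
The three strategies differ only in which siblings restart alongside the crashed child. A minimal Python sketch of that selection follows, assuming children are kept in declaration order and that "subsequent" means declared later; the function name children_to_restart is hypothetical and is not part of the Janus runtime.

```python
def children_to_restart(strategy, children, crashed):
    """Return the children a supervisor would restart, in declaration order."""
    index = children.index(crashed)  # raises ValueError for an unknown child
    if strategy == "one_for_one":
        return [crashed]
    if strategy == "one_for_all":
        return list(children)
    if strategy == "rest_for_one":
        return children[index:]
    raise ValueError(f"unknown strategy: {strategy}")

children = ["LobbyManager", "MatchMaker", "MetricsCollector"]
for strategy in ("one_for_one", "one_for_all", "rest_for_one"):
    print(strategy, children_to_restart(strategy, children, "MatchMaker"))
```

Child restart policies (permanent, transient, temporary) would act as a further filter on this set; they are left out here because this page does not spell out their exact semantics.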
Additional Features
- Memory sovereignty tags: Local.Exclusive, Session.Replicated, Volatile.Ephemeral
- Typed message protocols: message declarations, ActorRef[Msg], local GrainRef[Msg], local payload sends, guarded receive-arm payload destructuring, and direct receive-loop timeout arms are live for node-local actors and grain activations
- Visible local cost counters: @reductions(limit: N) exposes executable local handler-boundary accounting before the later full preemption injector lands
- Local failure quarantine: repeated deterministic tombstones can suppress local restarts, with explicit observation and clear APIs
- Source supervision trees: local supervisor declarations lower to executable Name_start_link(node_id) helpers for declared actor children
- Location transparency: Same syntax for local and remote once the distributed runtime layers land
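
Handler-boundary reduction accounting, as opposed to full preemption, can be pictured with a short Python model. The ReductionBudget class is hypothetical, and two details are assumptions of this sketch rather than documented Janus behaviour: each handler invocation costs one reduction, and the budget refills to the limit after a yield marker is recorded.

```python
class ReductionBudget:
    """Hypothetical model of per-actor budget accounting at handler boundaries."""

    def __init__(self, limit):
        self.limit = limit
        self.remaining = limit
        self.yield_markers = 0

    def charge_handler(self, cost=1):
        # Called once per handled message, never in the middle of a handler.
        self.remaining -= cost
        if self.remaining <= 0:
            self.yield_markers += 1
            self.remaining = self.limit  # assumed: refill after the marker
            return "reduction_yield"
        return "message_dispatch"

budget = ReductionBudget(limit=3)
reasons = [budget.charge_handler() for _ in range(7)]
print(reasons)
print(budget.limit, budget.remaining, budget.yield_markers)  # 3 2 2
```

The model only counts and reports; it never interrupts a running handler, which matches the distinction the page draws between visible counters today and the later preemption injector.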
What :cluster Excludes
| Excluded | Available In |
|---|---|
| Tensors and GPU | :compute |
| Raw pointers and unsafe | :sovereign |
When to Use :cluster
Perfect for:
- Game servers handling thousands of concurrent connections
- Chat systems and real-time messaging
- Distributed databases and key-value stores
- Metaverse infrastructure and virtual world backends
- Any system where a node crash should not take down the service
- Stateful services that need to persist across restarts
The rule: If it needs to stay up when hardware fails, :cluster is your home.
Future Code Sketches
The following examples show the intended destination for distributed grains and remote message payloads. They are broader than the current local actor/grain and source-supervisor tracer bullet.
A Chat Server

    message ChatMsg {
        Join { user_id: UserId, reply: Reply[void] },
        Send { user_id: UserId, text: String, reply: Reply[void] },
        Leave { user_id: UserId },
        History { count: i32, reply: Reply[[Message]] },
    }

    actor ChatRoom(room_id: RoomId) implements ChatMsg do
        var members: Set[UserId] := Set.new()
        var messages: [Message] := []

        receive do
            | Join { user_id, reply } => do
                members.insert(user_id)
                reply.send(void.ok())
            end

            | Send { user_id, text, reply } => do
                if not members.contains(user_id) do
                    reply.send(Error.not_a_member())
                    return
                end
                messages.push(Message{user_id, text, now()})
                reply.send(void.ok())
            end

            | Leave { user_id } => do
                members.remove(user_id)
            end
        end
    end

Supervision with Recovery

    supervisor DatabaseCluster do
        strategy: one_for_all

        child ConnectionPool(max: 10)
        child QueryProcessor
        child MetricsExporter

        # If ConnectionPool crashes, ALL children restart
        # This ensures consistent state across the cluster
    end

Distributed Key-Value Store

    message KVStoreMsg {
        Get { key: String, reply: Reply[Option[Bytes]] },
        Set { key: String, value: Bytes, reply: Reply[void] },
        Delete { key: String, reply: Reply[void] },
        Range { start: String, end: String, reply: Reply[[(String, Bytes)]] },
    }

    @requires(cap: [.storage_nvme, .network_infiniband])
    grain KVNode(node_id: NodeId) implements KVStoreMsg do
        var data: HashMap[String, Bytes]

        receive do
            | Get { key, reply } => do
                reply.send(data.get(key))
            end

            | Set { key, value, reply } => do
                data.set(key, value)
                # Replicate to other nodes
                replicate(key, value)
                reply.send(void.ok())
            end
        end
    end

Why :cluster Wins
vs. Erlang/OTP:
- Types: message protocols are statically typed, where Erlang relies on dynamic typing
- Generics: No more boilerplate for different message types
- Single language: Everything in Janus, not a separate DSL
vs. Akka (Scala/Java):
- Lighter: No JVM overhead
- Better interop: Native Zig bindings via graft
- Simpler: No implicit state machines
vs. Go + etcd:
- Supervision built-in: etcd is an external service; supervision here is native
- Location transparency: Go needs service discovery; Janus plans to bake it in once the distributed runtime layers land
- Grain migration: Go services can’t move between nodes automatically; grain migration is on the :cluster roadmap
Next Steps
- Move to :sovereign: When you need raw performance
- Move to :service: For simpler applications
- Architecture Docs: Deep dive into the actor model
Build systems that endure.