
:cluster — The Sanctum

“Fault tolerance by design.”

:cluster is where Janus becomes a system for building software that endures. Everything from :service plus the actor model, supervised lifecycle, local grain activation shells, the local grain activation registry, local grain namespace lookup, and future distribution layers belongs here.

The current compiler/runtime slice is local supervised actors plus the first grain source-contract shell, local single-writer activation registry, local namespace lookup layer, and explicit GrainStore-backed lifecycle callbacks. Scalar on_activate/on_deactivate hooks execute in the local grain runtime. Local idle passivation is executable through explicit activity touch and deterministic sweep calls. Grains with @lifecycle(..., deactivation: .idle_timeout(ms)) also emit a generated Name_passivate_idle(system, now_ms, reason) helper that carries the source timeout literal into the local runtime. Local namespace mappings can be persisted in GrainStoreBytes and restored into a fresh local actor system before activation. @reductions(limit: N) is also executable for generated local actor/grain starts as handler-boundary budget accounting with visible local counters. Source supervisor ... child ... end declarations now emit local Name_start_link(node_id) helpers for actor children.

Compiler-generated typed state serializers, non-u64 state slots, remote placement, full loop-backedge/function-entry reduction injection, migration, distributed namespace synchronization, and distributed registries are roadmap work, shown below as future sketches where noted.


Actors — Concurrent Entities with Mailboxes

The canonical shape today is actor X do var ... receive do match __msg { ... } end end. Each actor compiles to a setup/handler/destroy triple that the generated X_start_supervised(system, slot, policy) wrapper threads into the local supervisor. Messages are i64 and dispatch is over their value via an explicit match __msg.

actor Counter do
    var count: i64 = 0
    receive do
        match __msg {
            0 => do
                count = count + 1
            end,
            1 => do
                return 0
            end,
            _ => do
                count = count
            end,
        }
    end
end

  • Isolated state — No shared memory; each var is a private slot.
  • Auto-supervised — Counter_setup / Counter_handler / Counter_destroy and Counter_start_supervised are auto-emitted alongside the spawn-form __Counter_loop.
  • No locks — Message passing is the only concurrency.
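
The three properties above can be illustrated with a small Python model. This is a sketch of the concept only, not Janus runtime code: the class name `CounterActor`, its `send`/`handle`/`drain` methods, and the reading of `return 0` as "stop the receive loop" are assumptions made for illustration.

```python
from collections import deque


class CounterActor:
    """Python model of the Counter actor: private state plus a mailbox."""

    def __init__(self):
        self.count = 0          # private slot, like `var count: i64 = 0`
        self.mailbox = deque()  # integer messages, delivered in order
        self.running = True

    def send(self, msg: int) -> None:
        # Enqueueing a message is the only way to affect the actor.
        self.mailbox.append(msg)

    def handle(self, msg: int) -> None:
        # Mirrors `match __msg`: 0 increments, 1 stops (assumed meaning
        # of `return 0`), any other value leaves the state unchanged.
        if msg == 0:
            self.count += 1
        elif msg == 1:
            self.running = False

    def drain(self) -> None:
        # Handle queued messages one at a time until empty or stopped.
        while self.running and self.mailbox:
            self.handle(self.mailbox.popleft())


actor = CounterActor()
for msg in (0, 0, 7, 1, 0):
    actor.send(msg)
actor.drain()
```

After `drain()`, the two `0` messages have been counted, `7` fell through to the catch-all, `1` stopped the loop, and the trailing `0` is still queued because a stopped actor handles nothing further. No lock is needed because only `handle` ever touches `count`.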

Walk through it hands-on in the Stateful Actors tutorial.

Typed message protocols are now local actor syntax, not just a sketch: message declarations may include payload variants, ActorRef[Msg] checks the send protocol, and receive arms can destructure local boxed payload messages:

message Cmd {
    Tick,
    Set { value: u64 },
    Stop,
}

actor Counter(msg: Cmd) do
    var count: u64 = 0
    receive do
        Cmd.Tick => do
            count += 1
        end,
        Cmd.Set { value } when value >= 0 as u64 => do
            count += value
        end,
        Cmd.Stop => do
            return 0
        end,
        after 30_000 => do
            count = count
        end,
    end
end

This is still the node-local actor path. Guards and receive-loop timeouts are live; supervised actors register after arms as local mailbox timeouts. Distributed payload wire formats remain future :cluster work.

Local Grain Shell — Virtual Identity Shape

message UserMsg {
    Ping,
    Stop,
}

@persist(via: GrainStoreBytes)
@lifecycle(activation: .lazy, deactivation: .idle_timeout(300_000))
@reload(boundary: .message, state: UserState, migrate: user_v1_to_v2)
@reductions(limit: 128)
@arena(scope: .grain, reset: .on_deactivate)
@observe(mailbox: .summary, state: .none)
@tombstone(digest_includes: [.payload], retention_window: 60_000, deadly_threshold: 3)
@behaviour(.worker)
grain User(id: u64, msg: UserMsg) do
    var count: u64 = 0
    on_activate(stored: u64) -> u64 do
        return stored
    end
    on_deactivate(state: u64) -> u64 do
        return state
    end
    receive do
        UserMsg.Ping => do
            count += 1
        end,
        UserMsg.Stop => do
            return 0
        end,
    end
end

  • Live now — the parser accepts grain Name(id: Id, msg: Msg), @persist, @lifecycle including deactivation metadata, lifecycle hooks, future-runtime annotations, state slots, receive arms, and emits a local supervised start wrapper.
  • Live now — janus build validates @persist(via: GrainStoreBytes) and grain @lifecycle(activation: .lazy, deactivation: .never | .idle_timeout(ms)). Invalid persistence metadata fails with E_CLUSTER_PERSIST; invalid lifecycle metadata fails with E_CLUSTER_LIFECYCLE.
  • Live now — janus build validates @replicate(scope: .wing | .cluster | .swarm, protocol: .pbft) as source metadata. Invalid replication metadata fails with E_CLUSTER_REPLICATE; actual replication remains runtime work.
  • Live now — cluster.local_grain_lookup_or_start(...) maps a numeric (grain_type, grain_id) to one stable local activation ref while it is live.
  • Live now — cluster.local_grain_lookup_or_start_namespace(...) maps a local (grain_type, namespace) key to an internal durable id, then reuses the same single-writer activation registry.
  • Live now — std.cluster.grainstore.bind_local_namespace_u64(...) can persist and mirror a namespace binding, and std.cluster.grainstore.restore_local_namespace_u64(...) can restore that binding into a fresh local actor system before activation.
  • Live now — cluster.local_grain_lookup_or_start_persistent(...) invokes explicit load/store callbacks that can restore and commit state through GrainStoreBytes.
  • Live now — persistent source grains that import std.cluster.persist as persist also emit Name_lookup_or_start_persistent_state0_u64(system, grain_id, slot, policy, ctx) for the current scalar slot-0 u64 runtime. It uses canonical GrainStoreBytes load/store callbacks, so simple grains no longer need to hand-write the callback pair.
  • Live now — persistent source grains also emit Name_lookup_or_start_persistent_slots_u64(system, grain_id, slot, policy, ctx) for all current scalar u64 state slots.
  • Live now — namespace-addressed persistent source grains emit Name_lookup_or_start_namespace_persistent_slots_u64(system, namespace, slot, policy, ctx). The helper resolves the local namespace binding first, then uses generated all-slot scalar callbacks.
  • Live now — std.cluster.persist exposes explicit arbitrary-slot u64 helpers for nonzero scalar state slots: get_slot_u64, put_slot_u64, load_slot_u64, store_slot_u64, load_slots_u64, and store_slots_u64.
  • Live now — generated on_activate hooks run after load and before the first message; generated on_deactivate hooks run before teardown and the final store for the current scalar state-slot implementation.
  • Live now — cluster.local_grain_touch(ref, now_ms) records a visible activity boundary, and cluster.local_grain_passivate_idle(system, timeout_ms, now_ms, reason) passivates idle local grains through the same on_deactivate/store boundary. The caller supplies milliseconds; no hidden scheduler or wall-clock read is implied.
  • Live now — a source grain with @lifecycle(activation: .lazy, deactivation: .idle_timeout(ms)) emits Name_passivate_idle(system, now_ms, reason). The generated helper uses the metadata literal as the timeout while keeping the sweep time and stop reason explicit.
  • Live now — @reductions(limit: N) on compiler-generated local actors and grains forwards the configured limit into the runtime. The local facade exposes limit, remaining budget, and yield-marker counters at handler boundaries through local_ref_reduction_* and local_reduction_*; the capability-gated observation registry also exposes the same counters through local_observe_ref_reductions_cap and local_observe_child_reductions_cap. local_ref_schedule_reason_cap and local_schedule_reason_cap expose the last local reason as none, message dispatch, or reduction-yield marker.
  • Live now — @arena(max_bytes: N) on compiler-generated local actors and grains forwards the configured ceiling into the runtime. The current executable boundary enforces the byte ceiling for generated scalar u64 state slots and exposes it through local_ref_arena_max_bytes(_cap) and local_arena_max_bytes(_cap). Arbitrary actor-local allocation accounting remains future runtime work.
  • Live now — local tombstone quarantine can be enabled and configured from Janus source. local_child_quarantined_cap, local_quarantined_children_cap, and local_first_quarantined_child_cap expose the local suppression state; clearing quarantine is explicit and does not restart, replay, or delete tombstones.
  • Live now — latest local tombstone metadata can be observed as bounded scalar fields: sequence, child slot, stop reason, code version, redacted input digest, replay-token presence, state epoch, attempt count, and timestamp. Replay-token contents are not exposed by this observation path.
  • Live now — local grain persistence exposes per-system load/store failure counters so operators can detect callback failures instead of inferring them from stopped activations.
  • Not live yet — heterogeneous typed GrainStore serializers, non-u64 state slots, full reduction preemption injection, distributed scheduling aggregation, hidden scheduler-owned passivation loops, migration, cross-node tombstone gossip, placement quarantine aggregation, and remote routing.
  • Rule — a grain is virtual identity with owned state. The current shell proves the source shape; the local registry pins the single-writer identity invariant.
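
The single-writer registry and the explicit-time passivation sweep described in the list above can be modelled in a few lines of Python. This is a hedged sketch, not the Janus runtime: `LocalGrainRegistry`, its method names, the dict standing in for GrainStoreBytes, the default state of 0, and the `>=` idle comparison are all assumptions. Lifecycle hooks are treated as identity functions, as in the User grain.

```python
class LocalGrainRegistry:
    """Model of a single-writer activation registry with caller-supplied time."""

    def __init__(self, store: dict):
        self.store = store  # stands in for durable storage: (type, id) -> state
        self.live = {}      # (type, id) -> the one live activation record

    def lookup_or_start(self, grain_type: str, grain_id: int, now_ms: int) -> dict:
        key = (grain_type, grain_id)
        activation = self.live.get(key)
        if activation is None:
            # Load happens once, before the first message is handled.
            activation = {
                "key": key,
                "state": self.store.get(key, 0),
                "last_touch_ms": now_ms,
            }
            self.live[key] = activation
        return activation

    def touch(self, activation: dict, now_ms: int) -> None:
        # Record an activity boundary; the caller supplies the clock value.
        activation["last_touch_ms"] = now_ms

    def passivate_idle(self, timeout_ms: int, now_ms: int) -> list:
        # Deterministic sweep: no hidden scheduler, no wall-clock read.
        idle = [
            key
            for key, a in self.live.items()
            if now_ms - a["last_touch_ms"] >= timeout_ms
        ]
        for key in idle:
            activation = self.live.pop(key)
            self.store[key] = activation["state"]  # final store on deactivation
        return idle


store = {}
registry = LocalGrainRegistry(store)

a = registry.lookup_or_start("User", 42, now_ms=0)
b = registry.lookup_or_start("User", 42, now_ms=10)
same_ref = a is b  # one identity maps to one live activation

a["state"] += 3
registry.touch(a, now_ms=100)

early = registry.passivate_idle(timeout_ms=300_000, now_ms=200_000)
late = registry.passivate_idle(timeout_ms=300_000, now_ms=300_100)
restored = registry.lookup_or_start("User", 42, now_ms=300_200)
```

The early sweep finds nothing because only 199,900 ms have passed since the last touch. The later sweep passivates the grain and commits its state, so the next lookup starts a fresh activation that loads the stored value. Because time is an argument, the whole sequence is reproducible in tests.
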

supervisor GameServerSupervisor, strategy: .one_for_one,
        restart_pledge_violations: true do
    child LobbyManager, restart: .permanent
    child MatchMaker, restart: .permanent
    child MetricsCollector, restart: .temporary
end

let system = GameServerSupervisor_start_link(1 as u64)

  • one_for_one — Restart crashed child only
  • one_for_all — Restart all if any crashes
  • rest_for_one — Restart crashed + subsequent children
  • permanent / transient / temporary — Child restart policy is visible at the declaration site
  • PU-W007 — restart_pledge_violations: true is allowed but intentionally loud

The v1 helper is local and actor-child oriented. It creates the local supervisor system, applies the declared strategy and pledge-restart opt-in, starts each child through its generated supervised reference helper, and returns 0 if the tree cannot start cleanly.
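
The three strategies differ only in which children are restarted when one crashes. The Python sketch below models that selection, plus the per-child restart policy. It is illustrative only: `children_to_restart` and `should_restart` are hypothetical names, and the policy semantics (permanent always restarts, transient restarts only after an abnormal exit, temporary never restarts) are borrowed from Erlang/OTP conventions as an assumption, since the text above does not define them.

```python
def children_to_restart(strategy: str, children: list[str], crashed: str) -> list[str]:
    """Return the children a supervisor restarts, in declaration order."""
    index = children.index(crashed)
    if strategy == "one_for_one":
        return [crashed]            # only the crashed child
    if strategy == "one_for_all":
        return list(children)       # every child
    if strategy == "rest_for_one":
        return children[index:]     # crashed child and those declared after it
    raise ValueError(f"unknown strategy: {strategy}")


def should_restart(policy: str, abnormal_exit: bool) -> bool:
    """Assumed OTP-style meaning of the per-child restart policy."""
    if policy == "permanent":
        return True
    if policy == "transient":
        return abnormal_exit
    if policy == "temporary":
        return False
    raise ValueError(f"unknown policy: {policy}")


children = ["LobbyManager", "MatchMaker", "MetricsCollector"]

one = children_to_restart("one_for_one", children, "MatchMaker")
everyone = children_to_restart("one_for_all", children, "MatchMaker")
rest = children_to_restart("rest_for_one", children, "MatchMaker")
```

With `MatchMaker` crashing, `one_for_one` touches only that child, `one_for_all` restarts all three, and `rest_for_one` restarts `MatchMaker` and `MetricsCollector` but leaves `LobbyManager` running. Declaration order therefore matters for `rest_for_one`: children that depend on an earlier sibling should be declared after it.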

  • Memory sovereignty tags — Local.Exclusive, Session.Replicated, Volatile.Ephemeral
  • Typed message protocols — message declarations, ActorRef[Msg], local GrainRef[Msg], local payload sends, guarded receive-arm payload destructuring, and direct receive-loop timeout arms are live for node-local actors and grain activations
  • Visible local cost counters — @reductions(limit: N) exposes executable local handler-boundary accounting before the later full preemption injector lands
  • Local failure quarantine — repeated deterministic tombstones can suppress local restarts, with explicit observation and clear APIs
  • Source supervision trees — local supervisor declarations lower to executable Name_start_link(node_id) helpers for declared actor children
  • Location transparency — Same syntax for local and remote once the distributed runtime layers land
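
The local failure quarantine item can be pictured as a counter over recent tombstones. The Python model below is a sketch only: `QuarantineTracker` is a hypothetical name, and it assumes that quarantine trips when `deadly_threshold` tombstones with the same input digest fall inside `retention_window` milliseconds. The actual trigger rule of the Janus runtime is not specified in the text above.

```python
from collections import deque


class QuarantineTracker:
    """Model of tombstone-driven restart suppression for one child."""

    def __init__(self, retention_window_ms: int, deadly_threshold: int):
        self.retention_window_ms = retention_window_ms
        self.deadly_threshold = deadly_threshold
        self.tombstones = deque()  # (timestamp_ms, digest), oldest first
        self.quarantined = False

    def record_tombstone(self, now_ms: int, digest: str) -> None:
        self.tombstones.append((now_ms, digest))
        cutoff = now_ms - self.retention_window_ms
        # Count recent failures that share the same input digest.
        repeats = sum(
            1 for ts, d in self.tombstones if ts >= cutoff and d == digest
        )
        if repeats >= self.deadly_threshold:
            self.quarantined = True

    def may_restart(self) -> bool:
        return not self.quarantined

    def clear_quarantine(self) -> None:
        # Explicit clear: tombstones are kept, nothing restarts or replays.
        self.quarantined = False


# Three identical failures within 60 s trip the quarantine.
tracker = QuarantineTracker(retention_window_ms=60_000, deadly_threshold=3)
tracker.record_tombstone(0, "digest-a")
tracker.record_tombstone(1_000, "digest-a")
before_third = tracker.may_restart()
tracker.record_tombstone(2_000, "digest-a")
after_third = tracker.may_restart()
tracker.clear_quarantine()

# The same failures spread outside the window do not.
spread = QuarantineTracker(retention_window_ms=60_000, deadly_threshold=3)
for ts in (0, 50_000, 120_000):
    spread.record_tombstone(ts, "digest-a")
```

Keying on the digest is what separates a deterministic poison input, which will crash the child again on every restart, from unrelated transient failures that a restart can reasonably fix.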

| Excluded | Available In |
| --- | --- |
| Tensors and GPU | :compute |
| Raw pointers and unsafe | :sovereign |

Perfect for:

  • Game servers handling thousands of concurrent connections
  • Chat systems and real-time messaging
  • Distributed databases and key-value stores
  • Metaverse infrastructure and virtual world backends
  • Any system where a node crash should not take down the service
  • Stateful services that need to persist across restarts

The rule: If it needs to stay up when hardware fails, :cluster is your home.


The following examples show the intended destination for distributed grains and remote message payloads. They are broader than the current local actor/grain and source-supervisor tracer bullet.

message ChatMsg {
    Join { user_id: UserId, reply: Reply[void] },
    Send { user_id: UserId, text: String },
    Leave { user_id: UserId },
    History { count: i32, reply: Reply[[Message]] },
}

actor ChatRoom(room_id: RoomId) implements ChatMsg do
    var members: Set[UserId] := Set.new()
    var messages: [Message] := []
    receive do
        | Join { user_id, reply } => do
            members.insert(user_id)
            reply.send(void.ok())
        end
        | Send { user_id, text } => do
            if not members.contains(user_id) do
                # Send carries no reply channel, so a non-member message is dropped
                return
            end
            messages.push(Message{user_id, text, now()})
        end
        | Leave { user_id } => do
            members.remove(user_id)
        end
    end
end

supervisor DatabaseCluster do
    strategy: one_for_all
    child ConnectionPool(max: 10)
    child QueryProcessor
    child MetricsExporter
    # If ConnectionPool crashes, ALL children restart
    # This ensures consistent state across the cluster
end

message KVStoreMsg {
    Get { key: String, reply: Reply[Option[Bytes]] },
    Set { key: String, value: Bytes, reply: Reply[void] },
    Delete { key: String, reply: Reply[void] },
    Range { start: String, end: String, reply: Reply[[(String, Bytes)]] },
}

@requires(cap: [.storage_nvme, .network_infiniband])
grain KVNode(node_id: NodeId) implements KVStoreMsg do
    var data: HashMap[String, Bytes]
    receive do
        | Get { key, reply } => do
            reply.send(data.get(key))
        end
        | Set { key, value, reply } => do
            data.set(key, value)
            # Replicate to other nodes
            replicate(key, value)
            reply.send(void.ok())
        end
    end
end

vs. Erlang/OTP:

  • Types — Erlang’s dynamic types are a feature we left behind
  • Generics — No more boilerplate for different message types
  • Single language — Everything in Janus, not a separate DSL

vs. Akka (Scala/Java):

  • Lighter — No JVM overhead
  • Better interop — Native Zig bindings via graft
  • Simpler — No implicit state machines

vs. Go + etcd:

  • Supervision built-in — etcd is external, here it’s native
  • Location transparency — Go needs service discovery, Janus has it baked in
  • Grain migration — Go services can’t move between nodes automatically


Build systems that endure.