Snapshot Management

Overview

Checkpoint is a first-class CRD in OpenKruise Agents. Each Checkpoint object captures a point-in-time state of a running sandbox Pod — its memory and/or filesystem — and can later be used to clone (fork) brand-new sandboxes from that exact state.

OpenKruise Agents exposes Checkpoint capabilities through two parallel interfaces; both write to the same underlying Checkpoint CR:

| Interface | Consumer | Typical usage |
| --- | --- | --- |
| Kubernetes CRD | Cluster operators, declarative GitOps, direct kubectl / controller use | Create, list, delete, GC Checkpoints natively |
| E2B SDK (snapshot API) | Application code using the E2B Python / JavaScript SDK | Programmatic snapshot lifecycle from a sandbox |

In the E2B SDK, the concept is called a snapshot. In OpenKruise Agents, the underlying object is a Checkpoint CR, and the snapshotId returned by the E2B API is exactly the Checkpoint name. This page documents the Checkpoint features and shows both access styles side by side.

Checkpoint vs. Pause / Resume

Checkpoints and Pause / Resume share the underlying capture machinery, but they answer different questions:

| | Pause / Resume | Checkpoint (Snapshot) |
| --- | --- | --- |
| Effect on the source sandbox | Paused (stopped) | Briefly paused, then resumes running (configurable) |
| Relationship | One-to-one: resume restores the same sandbox | One-to-many: one Checkpoint can be cloned into many sandboxes |
| Sandbox ID after the call | Unchanged | Source unchanged; each cloned sandbox gets a new ID |
| Typical use case | Suspend and later resume a single sandbox | Checkpointing, rollback, forking runtime state |

Checkpoint CRD

The Checkpoint resource (agents.kruise.io/v1alpha1, short name cp) has the following key fields:

| Field | Type | Description |
| --- | --- | --- |
| spec.podName | string | Name of the target Pod to checkpoint. Usually equals the target sandbox name. |
| spec.sandboxName | string | Optional: explicit source Sandbox name when podName is not enough to locate it. |
| spec.keepRunning | bool | Whether the source Pod keeps running after the checkpoint completes. Defaults to true. If false, Pod phase becomes Succeeded. |
| spec.persistentContents | []string | What to persist. Valid values: memory, filesystem. Defaults to both when empty (inherited from the source template). |
| spec.ttlAfterFinished | string | Go duration (e.g. 30m, 24h, 30d). When set, the Checkpoint is auto-deleted after that period. Unset means keep until manual delete. |
| status.phase | string | Pending / Creating / Succeeded / Failed / Terminating. |
| status.checkpointId | string | Identifier of the captured state in the backing Checkpoint driver. Filled once phase is Succeeded. |
| status.completionTime | Time | Set when phase transitions to Succeeded or Failed. |
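
To make the interaction between status.completionTime and spec.ttlAfterFinished concrete, the following is a minimal Python sketch that estimates when a finished Checkpoint would be garbage-collected. It rests on two assumptions that this page does not confirm: that the TTL is counted from status.completionTime, and that a d unit means 24 hours (Go's standard time.ParseDuration has no d unit, yet 30d appears as an example above). The function names parse_duration and expiry_time are hypothetical helpers, not part of OpenKruise Agents.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit. "d" is NOT accepted by Go's time.ParseDuration; it is
# included only because this page lists 30d as an example (assumed = 24h).
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0,
}
# Longer unit names come first so that "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h|d)")


def parse_duration(text: str) -> timedelta:
    """Parse a subset of Go-style durations such as '30m', '24h' or '1h30m'."""
    pos, total = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return timedelta(seconds=total)


def expiry_time(completion_time: str, ttl: str) -> datetime:
    """Return completion_time + ttl.

    completion_time is a UTC timestamp in the form Kubernetes writes,
    for example '2024-01-01T00:00:00Z'.
    """
    finished = datetime.strptime(completion_time, "%Y-%m-%dT%H:%M:%SZ")
    return finished.replace(tzinfo=timezone.utc) + parse_duration(ttl)
```

For example, expiry_time("2024-01-01T00:00:00Z", "30h") yields 06:00 UTC on 2 January 2024. If the controller measures the TTL from a different timestamp, only expiry_time needs to change.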

Creating a Checkpoint

Apply a Checkpoint manifest against the target Pod / Sandbox:

apiVersion: agents.kruise.io/v1alpha1
kind: Checkpoint
metadata:
  name: checkpoint-code-demo-01
  namespace: default
spec:
  # Target Pod name
  podName: code-interpreter-28rvn
  # Whether the Pod remains Running after the checkpoint completes.
  # If false, the Pod phase transitions to Succeeded.
  keepRunning: true
  # Auto-GC the Checkpoint CR after this duration. Accepts Go duration format (30m, 30h, 30d).
  # When unset, the Checkpoint is kept until explicitly deleted.
  ttlAfterFinished: 30h
  # What to persist. Currently only "memory" and "filesystem" are supported.
  # Defaults to both.
  persistentContents:
    - memory
    - filesystem
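
When Checkpoints are created from scripts rather than hand-written files, the same manifest can be generated programmatically. The sketch below uses only the Python standard library and the fields documented above; the function name checkpoint_manifest and its defaults are hypothetical. It emits JSON, which kubectl apply accepts in place of YAML.

```python
import json

ALLOWED_CONTENTS = {"memory", "filesystem"}


def checkpoint_manifest(name, pod_name, namespace="default",
                        keep_running=True, ttl=None,
                        contents=("memory", "filesystem")):
    """Build a Checkpoint manifest as a plain dict (hypothetical helper)."""
    unknown = set(contents) - ALLOWED_CONTENTS
    if unknown:
        raise ValueError(f"unsupported persistentContents: {sorted(unknown)}")
    spec = {
        "podName": pod_name,
        "keepRunning": keep_running,
        "persistentContents": list(contents),
    }
    # Leave ttlAfterFinished out entirely when unset, so the Checkpoint is
    # kept until it is explicitly deleted.
    if ttl is not None:
        spec["ttlAfterFinished"] = ttl
    return {
        "apiVersion": "agents.kruise.io/v1alpha1",
        "kind": "Checkpoint",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = checkpoint_manifest(
        "checkpoint-code-demo-01", "code-interpreter-28rvn", ttl="30h")
    print(json.dumps(manifest, indent=2))
```

Saved under a file name of your choice (make_checkpoint.py here is only an example), the output can be piped straight into the cluster with python make_checkpoint.py | kubectl apply -f -.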

Watch the Checkpoint progress:

$ kubectl get cp
NAME                      STATUS      AGE
checkpoint-code-demo-01   Succeeded   24s

Once STATUS is Succeeded, the Checkpoint is ready to be used as a starting point for new sandboxes.
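
In automation, the same wait can be done by polling status.phase until it reaches Succeeded or Failed. The following is a sketch under stated assumptions: it shells out to kubectl get cp ... -o json (so kubectl must be on PATH and configured), and the names kubectl_phase and wait_for_checkpoint, the timeout, and the polling interval are all hypothetical choices rather than anything OpenKruise Agents prescribes.

```python
import json
import subprocess
import time

# Phases after which the Checkpoint will not progress further.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def kubectl_phase(name, namespace="default"):
    """Read status.phase of one Checkpoint via kubectl ('' if not set yet)."""
    result = subprocess.run(
        ["kubectl", "get", "cp", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True)
    return json.loads(result.stdout).get("status", {}).get("phase", "")


def wait_for_checkpoint(name, get_phase=kubectl_phase, timeout=300.0,
                        interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the Checkpoint is Succeeded or Failed, then return the phase.

    get_phase, sleep and clock are injectable so the loop can be exercised
    without a cluster. Raises TimeoutError if no terminal phase is reached.
    """
    deadline = clock() + timeout
    while True:
        phase = get_phase(name)
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(
                f"checkpoint {name!r} still in phase {phase!r} after {timeout}s")
        sleep(interval)
```

A caller would typically treat a returned Failed as an error and only create sandboxes from the Checkpoint when the result is Succeeded.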

Listing Checkpoints

Use the short name cp with kubectl. Checkpoints are namespaced:

$ kubectl get cp -n default
NAME                      STATUS      AGE
checkpoint-code-demo-01   Succeeded   5m
checkpoint-code-demo-02   Creating    10s

Filter by the source sandbox with the agents.kruise.io/sandbox-name label (set automatically on Checkpoints created through the E2B API; can be set manually on CRD-created ones):

$ kubectl get cp -l agents.kruise.io/sandbox-name=code-interpreter-28rvn
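
The label filter can also be applied client-side, for instance to pick only the Checkpoints of one sandbox that are ready to clone from. The sketch below assumes its input is the parsed output of kubectl get cp -o json (a List object with an items array); the function name ready_checkpoints is hypothetical.

```python
SANDBOX_LABEL = "agents.kruise.io/sandbox-name"


def ready_checkpoints(checkpoint_list, sandbox_name):
    """Return sorted names of Succeeded Checkpoints labeled with sandbox_name.

    checkpoint_list is the dict obtained from json.loads() on the output of
    `kubectl get cp -o json`.
    """
    names = []
    for item in checkpoint_list.get("items", []):
        metadata = item.get("metadata", {})
        # Checkpoints created directly through the CRD may carry no labels.
        labels = metadata.get("labels") or {}
        phase = item.get("status", {}).get("phase")
        if labels.get(SANDBOX_LABEL) == sandbox_name and phase == "Succeeded":
            names.append(metadata["name"])
    return sorted(names)
```

Checkpoints created directly through the CRD only show up here if the label was added manually, as noted above.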

Deleting a Checkpoint

Deleting a Checkpoint removes both the Checkpoint CR and its paired SandboxTemplate. Existing sandboxes already cloned from it are not affected.

kubectl delete cp checkpoint-code-demo-01 -n default

You can also let spec.ttlAfterFinished handle automatic cleanup.

Creating a Sandbox from a Checkpoint

A Succeeded Checkpoint can be used as the starting point for new sandboxes. The Checkpoint's name (metadata.name) is passed as the template identifier in spec.templateName.

apiVersion: agents.kruise.io/v1alpha1
kind: SandboxClaim
metadata:
  name: demo-from-checkpoint
  namespace: default
spec:
  # When a Checkpoint with this name exists in the same namespace,
  # the claim goes through the clone path instead of the warm-pool path.
  templateName: checkpoint-code-demo-01
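
Because one Checkpoint can be cloned into many sandboxes, a fork of several sandboxes can be expressed as several claims that share the same templateName. The sketch below assumes one SandboxClaim per fork and uses only the fields shown in the manifest above; the helper name claims_from_checkpoint and the fork-NN naming scheme are hypothetical. The result is wrapped in a v1 List so that a single kubectl apply -f - can create all claims.

```python
import json


def claims_from_checkpoint(checkpoint_name, count, prefix="fork",
                           namespace="default"):
    """Build `count` SandboxClaim manifests that clone the same Checkpoint."""
    if count < 1:
        raise ValueError("count must be at least 1")
    items = []
    for index in range(1, count + 1):
        items.append({
            "apiVersion": "agents.kruise.io/v1alpha1",
            "kind": "SandboxClaim",
            "metadata": {"name": f"{prefix}-{index:02d}",
                         "namespace": namespace},
            # The Checkpoint name doubles as the template identifier.
            "spec": {"templateName": checkpoint_name},
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(claims_from_checkpoint("checkpoint-code-demo-01", 3),
                     indent=2))
```

The claims must live in the same namespace as the Checkpoint for the clone path to apply, which is why namespace is a single argument rather than per-claim.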

See Claiming Sandboxes for the full set of claim-time options.

Checkpoint vs. SandboxTemplate / SandboxSet

Checkpoints and templates both provide reusable starting points for sandboxes, but they solve different problems:

| | SandboxTemplate / SandboxSet | Checkpoint (Snapshot) |
| --- | --- | --- |
| Defined by | Declarative CRD + image | Capturing a running sandbox |
| Reproducibility | Same definition always produces the same sandbox | Captures whatever runtime state exists |
| Best for | Repeatable base environments, warm pools | Checkpointing, rollback, forking runtime state |

Use templates when every sandbox should start from an identical, pre-provisioned base. Use Checkpoints when you need to capture or fork live runtime state that depends on what happened during execution.

Notes

  1. Connection drop: The source sandbox is briefly paused during capture. All active WebSocket, PTY and command-stream connections are dropped; clients must be able to reconnect.
  2. Backend dependency: The actual Checkpoint semantics (what is preserved, speed, size) depend on the Checkpoint driver configured in your cluster.
  3. Ownership isolation (E2B): listSnapshots and the E2B delete path are scoped to the API-key user that owns the Checkpoint. Admin-team keys may see Checkpoints across namespaces.
  4. No isolation for CRD access: Direct kubectl/CRD access bypasses the E2B user scoping and is governed purely by Kubernetes RBAC on checkpoints.agents.kruise.io.
  5. E2B deleteSnapshot covers both: DELETE /templates/{id} handles both Checkpoint deletion and template deletion. Checkpoints are deletable; SandboxSet-backed templates are not.
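
For the connection-drop behavior in note 1, clients usually wrap their connect call in a retry loop. The following is a generic sketch of exponential backoff with jitter, not an OpenKruise Agents or E2B API: connect stands for whatever function your client uses to open its WebSocket, PTY or command stream, and ConnectionError is an assumed stand-in for the exception that client raises when the sandbox is paused.

```python
import random
import time


def reconnect_with_backoff(connect, attempts=5, base_delay=0.5,
                           max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between failures.

    The delay doubles after each failure (capped at max_delay), with up to
    50% random jitter added so that many clients do not retry in lockstep.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt == attempts - 1:
                break  # no point in sleeping after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Since the pause during capture is described as brief, a handful of attempts with sub-second initial delay is a reasonable starting point; tune the values to the capture times your Checkpoint driver actually exhibits.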