This controller distributes a Pod on every node in the cluster. Like a DaemonSet, a BroadcastJob makes sure a Pod is created and runs on all selected nodes, once per node. Like a Job, a BroadcastJob is expected to run to completion.
Once every Pod has succeeded on its node, the BroadcastJob consumes no further resources. This controller is particularly useful for upgrading software, e.g., Kubelet, running a validation check on every node (typically needed only once within a long period of time), or running an ad-hoc full-cluster inspection script.
Optionally, a BroadcastJob can stay alive after all Pods on the desired nodes complete, so that a Pod is automatically launched on every new node after it joins the cluster.
Template describes the Pod template used to run the job.
Note that for the Pod restart policy, only
OnFailure is allowed for BroadcastJob.
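As a sketch, a minimal BroadcastJob manifest with such a template might look like the following (the image, command, and object name are illustrative; the apiVersion is assumed to be the Kruise apps group):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: broadcastjob-sample
spec:
  template:
    spec:
      containers:
        - name: check
          image: busybox                          # illustrative image
          command: ["echo", "hello from this node"]
      restartPolicy: OnFailure                    # only OnFailure is allowed
```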
Parallelism specifies the maximum desired number of Pods that should run at
any given time. By default, there is no limit.
For example, if a cluster has ten nodes and
Parallelism is set to three, at most
three Pods can run in parallel; a new Pod is created only after one of the running Pods finishes.
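This limit maps to a single field in the spec, shown here as a fragment (field placement assumed from the Kruise API):

```yaml
spec:
  parallelism: 3   # at most 3 Pods run concurrently across the cluster
```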
CompletionPolicy specifies the controller behavior when reconciling the BroadcastJob.
Always policy means the job will eventually complete with either a Failed or Succeeded
condition. The following parameters take effect with this policy:
ActiveDeadlineSeconds specifies the duration in seconds, relative to the startTime, that the job may be active before the system tries to terminate it. For example, if
ActiveDeadlineSeconds is set to 60 seconds, then 60 seconds after the BroadcastJob starts running, all the running Pods will be deleted and the job will be marked as Failed.
BackoffLimit specifies the number of retries before marking this job Failed. Currently, the number of retries is defined as the aggregated number of restart counts across all Pods created by the job, i.e., the sum of ContainerStatus.RestartCount for all containers in every Pod. If this value exceeds
BackoffLimit, the job is marked as Failed and all running Pods are deleted. No limit is enforced if
BackoffLimit is not set.
TTLSecondsAfterFinished limits the lifetime of a BroadcastJob that has finished execution (either Complete or Failed). For example, if TTLSecondsAfterFinished is set to 10 seconds, the job will be kept for 10 seconds after it finishes, and then the job along with all of its Pods will be deleted.
Never policy means the BroadcastJob will never be marked as Failed or Succeeded, even if
all Pods run to completion. This also means the
TTLSecondsAfterFinished parameter above takes no effect if the
Never policy is used.
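Putting these together, the Always policy and its parameters can be expressed as a spec fragment like this (field names follow the descriptions above; the values are illustrative):

```yaml
spec:
  completionPolicy:
    type: Always
    activeDeadlineSeconds: 60     # terminate the job after 60s of activity
    backoffLimit: 3               # mark Failed after 3 aggregated restarts
    ttlSecondsAfterFinished: 10   # delete job and Pods 10s after it finishes
```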
For example, if a user wants to perform an initial configuration validation for every newly
added node in the cluster, they can deploy a BroadcastJob with the Never completion policy.
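Such a validation job could be sketched as follows (the image, script, and object name are placeholders; the Never policy keeps the job alive so newly added nodes also get a Pod):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: node-validation
spec:
  completionPolicy:
    type: Never                 # keep the job alive for future nodes
  template:
    spec:
      containers:
        - name: validate
          image: busybox        # placeholder image
          command: ["sh", "-c", "echo validating node $(hostname)"]
      restartPolicy: OnFailure
```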
Monitor BroadcastJob status
Assuming the cluster has only one node, run
kubectl get bcj (bcj is the shortcut name for BroadcastJob) and
you will see the following:
NAME DESIRED ACTIVE SUCCEEDED FAILED
broadcastjob-sample 1 0 1 0
Desired: The number of desired Pods. This equals the number of matched nodes in the cluster.
Active: The number of active Pods.
Succeeded: The number of succeeded Pods.
Failed: The number of failed Pods.
Run a BroadcastJob in which each Pod computes pi, with
ttlSecondsAfterFinished set to 30.
The job will be deleted 30 seconds after it finishes.
- name: pi
  image: perl   # image assumed; any image providing perl works
  command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
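Embedded in a full manifest, this example might look like the following (the apiVersion, object name, and image are assumptions based on the fields described earlier):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: broadcastjob-pi
spec:
  completionPolicy:
    type: Always
    ttlSecondsAfterFinished: 30   # delete the job 30s after it finishes
  template:
    spec:
      containers:
        - name: pi
          image: perl             # assumed; any image providing perl works
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure
```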
Run a BroadcastJob in which each Pod sleeps for 50000 seconds, with
activeDeadlineSeconds set to 10 seconds.
The job will be marked as Failed after it has run for 10 seconds, and the running Pods will be deleted.
- name: sleep
command: ["sleep", "50000"]
Run a BroadcastJob with the
Never completionPolicy. The job will continue to run even after all Pods
have completed on all nodes.
- name: sleep
command: ["sleep", "5"]