Kubernetes does not provide clear guidance about which controller is the best fit for
a user application. Sometimes this is not a big problem, provided users understand
both the application and the workload well. For example, users usually know when to choose
Job or DaemonSet since the concepts behind these workloads are straightforward:
the former is designed for temporary, batch-style applications and the latter is suitable
for long-running Pods that are distributed to every node. On the other hand, the boundary
between Deployment and StatefulSet usage is vague. An application managed by
Deployment can conceptually be managed by a
StatefulSet as well, and the opposite may
also apply as long as the Pod
OrderedReady capability of
StatefulSet is not mandatory.
Furthermore, as more and more customized controllers/operators become available in the Kubernetes
community, finding a suitable controller can be a non-negligible problem for users, especially
when some controllers overlap in functionality.
Kruise attempts to mitigate the problem from two aspects:
- Carefully design the new controllers in the Kruise suite to avoid unnecessary functional duplication that may confuse users.
- Establish a classification mechanism for existing workload controllers so that users can more easily understand their use cases. We elaborate on this in the rest of the post.

The first and most intuitive criterion for classification is the controller name.
An easily understandable controller name certainly helps adoption. After consulting many internal and external Kubernetes users, we decided to use the following naming conventions in Kruise. Note that these conventions do not contradict the controller names used upstream.
- Set-suffix names: This type of controller manages Pods directly. Examples include SidecarSet. It supports various deployment/rollout strategies at the Pod level.
- Deployment-suffix names: This type of controller does not manage Pods directly. Instead, it manages one or many Set-suffix workload instances which are created on behalf of one application. The controller can provide capabilities to orchestrate the deployment/rollout of multiple instances. For example, Deployment manages ReplicaSet and provides rollout capability which is not available in ReplicaSet. UnitedDeployment (planned in M3 release) manages multiple StatefulSets created for multiple domains (i.e., fault domains) within one cluster.
- Job-suffix names: This type of controller manages batch-style applications with different deployment/rollout strategies. For example, BroadcastJob distributes a job-style Pod to every node in the cluster.
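
As a rough illustration of the naming convention above, the sketch below maps a controller name to the role its suffix implies. The function `role_from_name` and the role descriptions are hypothetical and written only for this post; they are not part of Kruise or Kubernetes.

```python
# Hypothetical helper, not part of Kruise: it only encodes the
# suffix-based naming convention described in this post.
SUFFIX_ROLES = {
    "Deployment": "manages Set-suffix workloads rather than Pods",
    "Set": "manages Pods directly",
    "Job": "manages batch-style applications",
}


def role_from_name(controller_name: str) -> str:
    """Return the role implied by the controller's name suffix."""
    for suffix, role in SUFFIX_ROLES.items():
        if controller_name.endswith(suffix):
            return role
    return "unknown"


if __name__ == "__main__":
    for name in ("SidecarSet", "UnitedDeployment", "BroadcastJob"):
        print(f"{name}: {role_from_name(name)}")
```

The lookup only tells you which family a controller belongs to; it says nothing about how two controllers with the same suffix differ.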
Set, Deployment and Job are widely adopted terms in the Kubernetes community. Kruise leverages them with certain extensions.
Can we further distinguish controllers with the same name suffix? Normally the string before
the suffix should be self-explanatory, but in many cases it is hard to find the right word to
describe what the controller does. Check how
StatefulSet originated in
this thread. It took the community four
months to decide on the name
StatefulSet to replace the original
PetSet, and the new name still confuses people who read
its API documentation. This example shows that even a well-thought-out name
may not help identify a controller. Again, Kruise does not plan to resolve
this problem. As an incremental effort, Kruise considers the following criterion to help classify
Set-suffix controllers.
One unique property of
StatefulSet is that it maintains consistent identities for
Pod network and storage. Essentially, this is done by fixing Pod names.
A Pod name can identify both network and storage since it is part of the DNS record and
can be used to name the Pod's volume claim. Why is this property needed given that all Pods in
a StatefulSet are created from the same Pod template?
A well-known use case is managing distributed coordination server applications such as
etcd or ZooKeeper. This type of application requires each cluster member
(i.e., the Pod) to access the same data (in the Pod volume) whenever a member is
reconstructed upon failure, in order to function correctly. To differentiate the term
State in StatefulSet from the same term used in other areas of computer science,
I'd like to associate
State with the Pod name in this document. That being said, controllers
that do not need to reuse the
old Pod name when a Pod is recreated are Stateless.
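
To make the link between a fixed Pod name and its network and storage identities concrete, here is a minimal sketch that derives them from the StatefulSet name and ordinal. The function `stateful_identity` and the example values are hypothetical. The sketch also assumes the default cluster domain `cluster.local` and a headless Service governing the StatefulSet.

```python
def stateful_identity(set_name: str, ordinal: int, service: str,
                      namespace: str, claim_template: str,
                      cluster_domain: str = "cluster.local") -> dict:
    """Derive the identities tied to a StatefulSet Pod name.

    Assumes the standard StatefulSet conventions:
      Pod name: <statefulset-name>-<ordinal>
      DNS name: <pod-name>.<service>.<namespace>.svc.<cluster-domain>
      PVC name: <volume-claim-template-name>-<pod-name>
    """
    pod_name = f"{set_name}-{ordinal}"
    return {
        "pod": pod_name,
        "dns": f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}",
        "pvc": f"{claim_template}-{pod_name}",
    }


if __name__ == "__main__":
    # Example values are illustrative only.
    print(stateful_identity("etcd", 0, "etcd", "default", "data"))
```

Because every value is a pure function of the Pod name, a Pod recreated with the same name gets the same DNS record and the same volume claim back.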
Supporting Stateful does make a controller less flexible.
StatefulSet relies on ordinal
numbers to fix Pod names, so workload rollout and scaling
have to be done in a strict order. As a consequence, some useful enhancements to
StatefulSet become impossible. For example,
- Selective Pod upgrade and Pod deletion (when scaling in). These features can be helpful when Pods are spread across different regions or fault domains.
- The ability to take control of existing Pods with arbitrary names. There are
cases where Pod creation is done by one controller but Pod lifecycle management
is done by another controller (e.g.,
We found that many containerized applications do not require the
Stateful property of fixing Pod names, and
StatefulSet is hard to extend for those
applications in many cases. To fill the gap, Kruise has released a new controller,
CloneSet, to manage
Stateless applications. In a nutshell, CloneSet
provides PVC support and enriched rollout and management capabilities.
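
The contrast between ordinal-based scale-in and selective deletion can be sketched as follows. Both functions are hypothetical illustrations of the two policies; they do not reflect the actual StatefulSet or CloneSet implementation or API.

```python
def ordered_scale_in(pod_ordinals: list[int], replicas: int) -> list[int]:
    """Ordinal-based policy: only Pods with ordinal >= replicas can be
    removed, and they are removed from the highest ordinal downwards."""
    return sorted((o for o in pod_ordinals if o >= replicas), reverse=True)


def selective_scale_in(pods: list[str], replicas: int,
                       preferred: set[str]) -> list[str]:
    """Selective policy: the caller names the Pods it would rather delete
    (e.g. those in a crowded fault domain); they are picked first."""
    excess = len(pods) - replicas
    if excess <= 0:
        return []
    chosen = [p for p in pods if p in preferred]
    rest = [p for p in pods if p not in preferred]
    return (chosen + rest)[:excess]


if __name__ == "__main__":
    print(ordered_scale_in([0, 1, 2, 3, 4], replicas=3))
    print(selective_scale_in(["a", "b", "c", "d"], replicas=3, preferred={"c"}))
```

With fixed ordinals the caller has no say in which Pods go away, whereas randomly named Pods can be removed in any order the user prefers.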
The following table roughly compares Advanced StatefulSet and CloneSet in a few aspects.

| Feature | Advanced StatefulSet | CloneSet |
|---|---|---|
| Change Pod ownership | No | Yes |
Now, a clear recommendation to Kruise users: if your applications require fixed Pod names (identities for Pod network and storage), you can start with
Advanced StatefulSet. Otherwise,
CloneSet is the primary choice among Set-suffix controllers (if
DaemonSet is not applicable).
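
The recommendation can be written down as a small decision function. This is only a sketch: `recommend_set_controller` is a hypothetical name, it assumes fixed Pod names point to Advanced StatefulSet (the controller compared with CloneSet in the table above), and it assumes the DaemonSet case means one Pod per node.

```python
def recommend_set_controller(needs_fixed_pod_names: bool,
                             one_pod_per_node: bool = False) -> str:
    """Pick a Set-suffix controller following the rule of thumb in this post.

    Assumption: fixed Pod names take priority over the per-node case.
    """
    if needs_fixed_pod_names:
        return "Advanced StatefulSet"
    if one_pod_per_node:
        return "DaemonSet"
    return "CloneSet"


if __name__ == "__main__":
    print(recommend_set_controller(needs_fixed_pod_names=True))
    print(recommend_set_controller(needs_fixed_pod_names=False))
```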
Kruise aims to provide intuitive names for new controllers. As a supplement, this post provides additional guidance for Kruise users to pick the right controller for their applications. Hope it helps!