
A Flexible and Configurable Serverless Elastic Solution at the Workload Level

· 13 min read
Tianyun Zhong
Member of OpenKruise

Serverless is an extension of cloud computing that inherits its most significant feature: on-demand elastic scaling. This model lets developers focus on application logic without worrying about deployment resources, fully leveraging resource scalability to deliver superior elasticity, while enterprises benefit from genuine pay-as-you-go billing. Consequently, more cloud providers are converging on this architectural paradigm.

The core capability of "flexible configurability" in Serverless technology focuses on enabling specific cloud usage scenarios to make full use of cloud resources through simple, minimally invasive, and highly configurable methods. Its essence lies in resolving the conflict between capacity planning and actual cluster load. This article introduces two configurable components, WorkloadSpread and UnitedDeployment, covering their core capabilities, technical principles, strengths and weaknesses, and real-world applications, in order to share OpenKruise's technical evolution and thinking around Serverless workload elasticity.

Overview of Elastic Scenarios

As Serverless technology matures, more enterprises prefer using cloud resources (such as Alibaba Cloud ACS Serverless container instances) over on-premise resources (like managed resource pools or on-premise IDC data centers) to host applications with temporary, tidal, or bursty characteristics. This approach enhances resource utilization efficiency and reduces overall costs by adopting a pay-as-you-go model. Below are some typical elastic scenarios:

  1. Prioritize using on-premise resources in offline IDC data centers; scale the application to the cloud when resources are insufficient.
  2. Prefer using a pre-paid resource pool in the cloud; use pay-as-you-go Serverless instances for additional replicas when resources are insufficient.
  3. Use high-quality stable compute power (e.g., dedicated cloud server instances) first; then use lower-quality compute power (e.g., Spot instances).
  4. Configure different resource quantities for container replicas deployed on different compute platforms (e.g., X86, ARM, Serverless instances) to achieve similar performance.
  5. Inject different middleware configurations into replicas deployed on nodes versus Serverless environments (e.g., shared Daemon on nodes, Sidecar injection on Serverless).

The components introduced in this article offer distinct advantages in solving the above problems. Users can choose the appropriate capabilities based on their specific scenarios to make effective use of elastic compute power.

Capabilities and Advantageous Scenarios of Two Components

  • WorkloadSpread: Utilizes a Mutating Webhook to intercept Pod creation requests that meet certain criteria and apply Patch operations to inject differentiated configurations. Suitable for existing applications requiring multiple elastic partitions with customized Pod Metadata and Spec fields.
  • UnitedDeployment: A workload with built-in elastic partitioning and Pod customization capabilities, offering stronger elasticity and capacity planning capabilities. Ideal for new applications needing detailed partitioning and individual configurations for each partition.

WorkloadSpread: An Elastic Strategy Plugin Based on Pod Mutating Webhook

WorkloadSpread is a bypass component provided by the OpenKruise community that spreads target workload Pods across different types of subsets according to specific rules, enhancing multi-region and elastic deployment capabilities without modifying the original workload. It supports almost all native or custom Kubernetes workloads, ensuring adaptability and flexibility in various environments.

Example Configuration

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-demo
spec:
  targetRef: # Supports almost all native or custom Kubernetes workloads
    apiVersion: apps/v1 | apps.kruise.io/v1alpha1
    kind: Deployment | CloneSet
    name: workload-xxx
  subsets:
  - name: subset-a
    # The first three replicas will be scheduled to this Subset
    maxReplicas: 3
    # Node affinity configuration for Pods in this Subset
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-a
    patch:
      # Inject a custom label into Pods scheduled to this Subset
      metadata:
        labels:
          xxx-specific-label: xxx
  - name: subset-b
    # Deploy to Serverless clusters, with no capacity limit (unlimited replicas)
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - acs-cn-hangzhou
  scheduleStrategy:
    # Adaptive mode reschedules failed Pods to other Subsets
    type: Adaptive | Fixed
    adaptive:
      rescheduleCriticalSeconds: 30

Powerful Partitioning Capability

WorkloadSpread spreads Pods into different elastic partitions using Subsets, scaling up in the order the Subsets are defined and scaling down in reverse order. For example, if subset-a is capped at 3 replicas and is followed by an unlimited subset-b, scaling the workload to 5 replicas places 3 Pods in subset-a and 2 in subset-b; scaling back down removes the subset-b Pods first.

Flexible Scheduling Configuration

At the Subset level, WorkloadSpread supports selecting nodes via labels and configuring advanced options such as taints and tolerations. For example, requiredNodeSelectorTerm specifies mandatory node attributes, preferredNodeSelectorTerms sets preferred node attributes, and tolerations configures Pod tolerance for node taints. These configurations allow precise control over Pod scheduling and distribution.

At the global level, WorkloadSpread supports two scheduling strategies via the scheduleStrategy field: Fixed and Adaptive. The Fixed strategy ensures strict adherence to predefined Subset distributions, while the Adaptive strategy provides higher flexibility by automatically rescheduling Pods to other available Subsets when necessary.
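The snippet below is a minimal sketch of how these Subset-level and global fields might be combined; the node-type label, the dedicated taint, and the subset name are illustrative assumptions, while the field names follow the example configuration above:

subsets:
- name: stable-nodes
  # Hard requirement: only nodes carrying the assumed node-type label are eligible
  requiredNodeSelectorTerm:
    matchExpressions:
    - key: node-type
      operator: In
      values:
      - ecs
  # Soft preference: favor zone-a when capacity allows
  preferredNodeSelectorTerms:
  - weight: 100
    preference:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-a
  # Let Pods in this Subset tolerate an assumed dedicated taint
  tolerations:
  - key: dedicated
    operator: Equal
    value: elastic
    effect: NoSchedule
scheduleStrategy:
  # Reschedule Pods that stay pending for 30 seconds to other Subsets
  type: Adaptive
  adaptive:
    rescheduleCriticalSeconds: 30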

Detailed Pod Customization

In Subset configurations, the patch field allows for fine-grained customization of Pods scheduled to that subset. Supported fields include container images, resource limits, environment variables, volume mounts, startup commands, probe configurations, and labels. This decouples Pod specifications from environment adaptations, enabling flexible workload adjustments for various partition environments.

...
# patch pod with a topology label:
patch:
metadata:
labels:
topology.application.deploy/zone: "zone-a"
...

The example above demonstrates how to add or modify a label to all Pods in a Subset.

...
# patch pod container resources:
patch:
  spec:
    containers:
    - name: main
      resources:
        limits:
          cpu: "2"
          memory: 800Mi
...

The example above demonstrates how to modify container resource limits in the Pod Spec.

...
# patch pod container env with a zone name:
patch:
  spec:
    containers:
    - name: main
      env:
      - name: K8S_AZ_NAME
        value: zone-a
...

The example above demonstrates how to add or modify a container environment variable.

WorkloadSpread's Pod Mutating Webhook Mechanism

WorkloadSpread operates directly on Pods created by the target workload via a Pod Mutating Webhook, ensuring non-intrusive operation. When a Pod creation request meets the criteria, the Webhook intercepts it, reads the corresponding WorkloadSpread configuration, selects an appropriate Subset, and modifies the Pod configuration accordingly. The controller maintains the controller.kubernetes.io/pod-deletion-cost annotation to ensure the correct scale-down order.
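As an illustration, a Pod admitted into subset-a of the demo configuration above might leave the webhook with roughly the following fields injected; the shape below is simplified, and the deletion-cost value is adjusted dynamically by the controller:

apiVersion: v1
kind: Pod
metadata:
  labels:
    xxx-specific-label: xxx                            # injected by subset-a's patch
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "100"  # maintained by the controller for scale-down order
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a                                    # from subset-a's requiredNodeSelectorTerm
  containers:
  - name: main
    image: example/app:v1                               # unchanged application container (illustrative)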

Limitations of WorkloadSpread

Potential Risks of Webhook

WorkloadSpread depends on the Pod Mutating Webhook to function, which intercepts all Pod creation requests in the cluster. If the webhook Pod (kruise-manager) experiences performance issues or failures, it may prevent new Pods from being created. Additionally, during large-scale scaling operations, the webhook can become a performance bottleneck.

Limitations of Acting on Pods

While acting on Pods reduces business intrusion, it introduces limitations. For instance, CloneSet's gray release ratio cannot be controlled per Subset.

Case Study 1: Bandwidth Package Allocation in Large-Scale Load Testing

A company needed to perform load testing before a major shopping festival. They developed a load-agent program to generate requests and used a CloneSet to manage agent replicas. To save costs, they purchased 10 shared bandwidth packages (each supporting 300 Pods) and aimed to dynamically allocate them to elastic agent replicas.

They configured a WorkloadSpread with 11 Subsets: the first 10 Subsets had a capacity of 300 and patched Pod Annotations to bind specific bandwidth packages; the last Subset had no capacity and no bandwidth package, preventing extra bandwidth allocation if more than 3000 replicas were created.

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: bandwidth-spread
  namespace: loadtest
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: load-agent-XXXXX
  subsets:
  - name: bandwidthPackage-1
    maxReplicas: 300
    patch:
      metadata:
        annotations:
          k8s.aliyun.com/eip-common-bandwidth-package-id: <id1>
  - ...
  - name: bandwidthPackage-10
    maxReplicas: 300
    patch:
      metadata:
        annotations:
          k8s.aliyun.com/eip-common-bandwidth-package-id: <id10>
  - name: no-eip

Case Study 2: Compatibility for Scaling Managed K8S Cluster Services to Serverless Instances

A company had a web service running in an IDC that needed to scale up due to business growth but could not expand the local data center. They chose to use virtual nodes to access cloud-based Serverless elastic compute power, forming a hybrid cloud. Their application relied on acceleration services such as Fluid, which were pre-deployed on nodes in the IDC but not available in the serverless subset, so a sidecar had to be injected into the cloud Pods to provide the acceleration capability.

To achieve this without modifying the existing Deployment's 8 replicas, they used WorkloadSpread to add a label to the Pods in each subset, which controls Fluid sidecar injection.

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: data-processor-spread
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor
  subsets:
  - name: local
    maxReplicas: 8
    patch:
      metadata:
        labels:
          serverless.fluid.io/inject: "false"
  - name: aliyun-acs
    patch:
      metadata:
        labels:
          serverless.fluid.io/inject: "true"

UnitedDeployment: A Native Workload with Built-in Elasticity

UnitedDeployment is an advanced workload provided by the OpenKruise community that natively supports partition management. Unlike WorkloadSpread, which enhances basic workloads, UnitedDeployment offers a new mode for managing partitioned elastic applications. It defines applications through a single template, and the controller creates and manages multiple secondary workloads to match different subsets. UnitedDeployment manages the entire lifecycle of applications within a single resource, including definition, partitioning, scaling, and upgrades.

Example Configuration

apiVersion: apps.kruise.io/v1alpha1
kind: UnitedDeployment
metadata:
  name: sample-ud
spec:
  replicas: 6
  selector:
    matchLabels:
      app: sample
  template:
    cloneSetTemplate:
      metadata:
        labels:
          app: sample
      spec:
        # CloneSet Spec
        ...
  topology:
    subsets:
    - name: ecs
      nodeSelectorTerm:
        matchExpressions:
        - key: node-type
          operator: In
          values:
          - ecs
      maxReplicas: 2
    - name: acs-serverless
      nodeSelectorTerm:
        matchExpressions:
        - key: node-type
          operator: In
          values:
          - acs-virtual-kubelet

Advantages of UnitedDeployment

All-In-One Elastic Application Management

UnitedDeployment offers comprehensive all-in-one application management, enabling users to define applications, manage subsets, scale, and upgrade using a single resource.

For each subset, the UnitedDeployment controller manages a secondary workload of the corresponding type based on the workload template, without requiring additional attention from the user. Users only need to manage the application template and the subsets; the controller handles all subsequent management of each secondary workload, including creation, modification, and deletion. It also monitors the status of the Pods created by these workloads and makes corresponding adjustments when necessary.

It is the secondary workload controllers that implement the actual scaling and update operations, so scaling and updating through UnitedDeployment produces exactly the same effect as using the corresponding workload directly. For example, a UnitedDeployment created with a CloneSet template inherits CloneSet's grayscale release and in-place upgrade capabilities.
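As an illustration, for the sample-ud example above (6 replicas, subsets ecs and acs-serverless), the controller would own one CloneSet per subset. The sketch below is an assumption of what those generated workloads might roughly look like; the names and suffixes are hypothetical, following a <uniteddeployment-name>-<subset-name>- prefix convention:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-ud-ecs-abcde                # hypothetical generated name, owned by sample-ud
spec:
  replicas: 2                              # capped by the ecs subset's maxReplicas
  # Pod template copied from cloneSetTemplate, with the ecs nodeSelectorTerm merged in
  ...
---
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-ud-acs-serverless-fghij     # hypothetical generated name, owned by sample-ud
spec:
  replicas: 4                              # the remaining replicas flow into the elastic subset
  # Pod template copied from cloneSetTemplate, with the acs-serverless nodeSelectorTerm merged in
  ...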

Advanced Subset Management

UnitedDeployment incorporates two capacity allocation algorithms, enabling users to handle various scenarios of elastic applications through detailed subset capacity configurations.

The elastic allocation algorithm implements a classic elastic capacity allocation method similar to WorkloadSpread: by setting upper and lower capacity limits for each subset, Pods are scaled up in the defined order of subsets and scaled down in reverse order. This method has been thoroughly introduced earlier, so it will not be elaborated further here.

The specified allocation algorithm represents a new approach to capacity allocation. It directly assigns fixed numbers or percentages to some subsets and reserves at least one elastic subset to distribute the remaining replicas.
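For example, a topology like the following sketch (subset names are illustrative) pins a fixed count to one subset, a percentage to another, and leaves the last subset elastic to absorb the remaining replicas:

topology:
  subsets:
  - name: fixed-pool
    # always holds exactly 2 replicas
    replicas: 2
  - name: proportional-pool
    # always holds roughly 30% of spec.replicas
    replicas: 30%
  - name: elastic-pool
    # no replicas value: receives whatever remains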

In addition to capacity allocation, UnitedDeployment also allows customizing any Pod Spec field (including container images) for each subset, similar to WorkloadSpread. This gives UnitedDeployment's subset configuration powerful flexibility.

Adaptive Elasticity

UnitedDeployment offers robust adaptive elasticity, automating scaling and rescheduling operations to reduce operational overhead. It supports Kubernetes Horizontal Pod Autoscaler (HPA), enabling automatic scaling based on predefined conditions while adhering strictly to subset configurations.

UnitedDeployment also offers adaptive Pod rescheduling capabilities similar to WorkloadSpread. Additionally, it allows configuration of timeout durations for scheduling failures and recovery times for subsets to exit the unschedulable status, providing enhanced control over adaptive scheduling.

Limitations of UnitedDeployment

The many advantages of UnitedDeployment stem from its all-in-one management capabilities as an independent workload. However, this also makes it more intrusive to adopt for existing applications: PaaS systems and tools (such as operations and release systems) must be modified to switch from existing workloads like Deployment and CloneSet to UnitedDeployment.

Case Study 1: Elastic Scaling of Pods to Virtual Nodes with Adaptation for Serverless Containers

Cloud providers typically offer three types of Kubernetes services:

  1. Managed clusters with fixed nodes using cloud servers purchased by users.
  2. Serverless clusters delivering container computing power directly via virtual node technology.
  3. Hybrid clusters containing both managed nodes and virtual nodes.

In this case, a company planned to launch a new service with significant peak-to-valley traffic differences (up to tenfold). To handle this characteristic, they purchased a batch of cloud servers to form a managed cluster nodepool for handling baseline traffic and intended to quickly scale out new replicas to a serverless subset during peak hours. Additionally, their application required extra configuration to run in the Serverless environment. Below is an example configuration:

apiVersion: apps.kruise.io/v1alpha1
kind: UnitedDeployment
metadata:
  name: elastic-app
spec:
  # Omitted business workload template
  ...
  topology:
    # Enable Adaptive scheduling to dispatch Pod replicas to ECS node pools and ACS instances adaptively
    scheduleStrategy:
      type: Adaptive
      adaptive:
        # Start scheduling to ACS Serverless instances 10 seconds after ECS node scheduling failure
        rescheduleCriticalSeconds: 10
        # Do not schedule to ECS nodes within one hour after the above scheduling failure
        unschedulableLastSeconds: 3600
    subsets:
    # Prioritize ECS without an upper limit; only schedule to ACS when ECS fails
    # During scale-in, delete ACS instances first, then ECS node pool Pods
    - name: ecs
      nodeSelectorTerm:
        matchExpressions:
        - key: type
          operator: NotIn
          values:
          - acs-virtual-kubelet
    - name: acs-serverless
      nodeSelectorTerm:
        matchExpressions:
        - key: type
          operator: In
          values:
          - acs-virtual-kubelet
      # Use patch to modify environment variables for Pods scheduled to elastic computing power, enabling Serverless mode
      patch:
        spec:
          containers:
          - name: main
            env:
            - name: APP_RUNTIME_MODE
              value: SERVERLESS
---
# Combine with HPA for automatic scaling
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: elastic-app-hpa
spec:
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 2
  scaleTargetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: UnitedDeployment
    name: elastic-app

Case Study 2: Allocating Different Resources to Pods with Different CPU Types

In this case, a company purchased several cloud server instances with Intel, AMD, and ARM platform CPUs to prepare for launching a new service. They wanted Pods scheduled on different platforms to exhibit similar performance. After stress testing, it was found that, compared to Intel CPUs as the benchmark, AMD platforms needed more CPU cores, while ARM platforms required more memory.

apiVersion: apps.kruise.io/v1alpha1
kind: UnitedDeployment
metadata:
  name: my-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
  template:
    deploymentTemplate:
      ... # Omitted business workload template
  topology:
    # Intel, AMD, and Yitian 710 ARM machines carry 50%, 25%, and 25% of the replicas respectively
    subsets:
    - name: intel
      replicas: 50%
      nodeSelectorTerm:
        ... # Select Intel node pool through labels
      patch:
        spec:
          containers:
          - name: main
            resources:
              limits:
                cpu: 2000m
                memory: 4000Mi
    - name: amd64
      replicas: 25%
      nodeSelectorTerm:
        ... # Select AMD node pool through labels
      # Allocate more CPU to the AMD platform
      patch:
        spec:
          containers:
          - name: main
            resources:
              limits:
                cpu: 3000m
                memory: 4000Mi
    - name: yitian-arm
      replicas: 25%
      nodeSelectorTerm:
        ... # Select ARM node pool through labels
      # Allocate more memory to the ARM platform
      patch:
        spec:
          containers:
          - name: main
            resources:
              limits:
                cpu: 2000m
                memory: 6000Mi

Summary

Elastic computing power can significantly reduce business costs and effectively increase the performance ceiling of services. To make good use of elastic computing power, it is necessary to choose appropriate elastic components based on specific application characteristics. The following table summarizes the capabilities of the two components introduced in this article, hoping to provide some reference.

Component        | Partition Principle                     | Ease of Modification | Granularity of Partition | Elasticity Capability
WorkloadSpread   | Modify Pods via Webhook                 | High                 | Medium                   | Medium
UnitedDeployment | Create multiple workloads via templates | Low                  | High                     | High

UnitedDeployment - Supporting Multi-domain Workload Management

· 7 min read
Fei Guo
Maintainer of OpenKruise

Probably every cloud user knows (or should realize) that failures in cloud resources are inevitable. Hence, high availability is one of the most desirable features a cloud provider can offer. For example, in AWS, each geographic region has multiple isolated locations known as Availability Zones (AZs). AWS provides various AZ-aware solutions that allow the compute or storage resources of user applications to be distributed across multiple AZs in order to tolerate AZ failures, which have indeed happened in the past.

In Kubernetes, the concept of an AZ is not realized by an API object. Instead, an AZ is usually represented by a group of hosts that share the same location label. Although hosts within the same AZ can be identified by labels, the capability of distributing Pods across AZs was missing from the Kubernetes default scheduler, so it was difficult to use a single StatefulSet or Deployment to perform AZ-aware Pod deployment. Fortunately, Kubernetes 1.16 introduced a new feature called "Pod Topology Spread Constraints": users can now add constraints to the Pod Spec, and the scheduler will enforce them so that Pods are distributed across failure domains such as AZs, regions, or nodes in a uniform fashion.
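For reference, a minimal Pod Topology Spread Constraint looks roughly like the following; the app label and the image are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: web-example
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1                               # zone replica counts may differ by at most 1
    topologyKey: topology.kubernetes.io/zone # spread across zones
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: nginx:1.25                        # illustrative image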

In Kruise, UnitedDeployment provides an alternative way to achieve high availability in a cluster that consists of multiple fault domains: it manages multiple homogeneous workloads, each dedicated to a single Subset. Pod distribution across AZs is determined by the replica number of each workload. Since each Subset is associated with a workload, UnitedDeployment can support finer-grained rollout and deployment strategies. In addition, UnitedDeployment can be further extended to support multiple clusters! Let us reveal how UnitedDeployment is designed.

Using Subsets to describe domain topology

UnitedDeployment uses Subsets to represent failure domains. The Subset API primarily specifies the nodes that form the domain and the number of replicas, or the percentage of total replicas, to run in this domain. UnitedDeployment manages subset workloads against a specific domain topology, described by a Subset array.

type Topology struct {
	// Contains the details of each subset.
	Subsets []Subset
}

type Subset struct {
	// Indicates the name of this subset, which will be used to generate
	// the subset workload name prefix in the format '<deployment-name>-<subset-name>-'.
	Name string

	// Indicates the node selection strategy to form the subset.
	NodeSelector corev1.NodeSelector

	// Indicates the number of the subset replicas or the percentage of the
	// UnitedDeployment replicas.
	Replicas *intstr.IntOrString
}

The specification of the subset workload is saved in Spec.Template. UnitedDeployment only supports StatefulSet subset workloads as of now. An interesting part of the Subset design is that users can now specify a customized Pod distribution across AZs, which is not necessarily uniform. For example, if AZ utilization or capacity is not homogeneous, evenly distributing Pods may lead to deployment failures due to lack of resources. If users have prior knowledge about AZ resource capacity or usage, UnitedDeployment can help apply an optimal Pod distribution that keeps overall cluster utilization balanced. Of course, if it is not specified, a uniform Pod distribution is applied to maximize availability.
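As a sketch of such a non-uniform distribution (zone names are assumptions, node selection is omitted, and the YAML keys are inferred from the Go types above rather than taken from a released API):

topology:
  subsets:
  - name: zone-a
    # zone-a has the most spare capacity, so it carries half of the replicas
    replicas: 50%
  - name: zone-b
    replicas: 25%
  - name: zone-c
    replicas: 25%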

Customized subset rollout Partitions

Users can update all of the UnitedDeployment subset workloads by providing a new version of the subset workload template. Note that UnitedDeployment does not control the entire rollout process of all subset workloads; this is typically done by another rollout controller built on top of it. Since the replica number in each Subset can differ, it is much more convenient to let users specify an individual rollout Partition for each subset workload, instead of one Partition to rule them all, so that subsets can be upgraded at the same pace. UnitedDeployment provides the ManualUpdate strategy to customize the rollout Partition per subset.

type UnitedDeploymentUpdateStrategy struct {
	// Type of UnitedDeployment update.
	Type UpdateStrategyType
	// Indicates the partition of each subset.
	ManualUpdate *ManualUpdate
}

type ManualUpdate struct {
	// Indicates the partition number of each subset.
	Partitions map[string]int32
}

[Figure 1: coordinated per-subset rollout partitions]

This makes it fairly easy to coordinate the rollout of multiple subsets. For example, as illustrated in Figure 1, assuming UnitedDeployment manages three subsets with replica numbers of 4, 2, and 2 respectively, a rollout controller can realize a canary release plan of upgrading 50% of the Pods in each subset at a time by setting the subset partitions to 2, 1, and 1 respectively. The same cannot be easily achieved by using a single workload controller like StatefulSet or Deployment.
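A hedged sketch of the corresponding spec for this plan, assuming the three subsets are named subset-a, subset-b, and subset-c and following the Go types above, might look like:

spec:
  updateStrategy:
    type: ManualUpdate
    manualUpdate:
      # keep 2 / 1 / 1 Pods per subset on the old revision,
      # i.e. upgrade roughly 50% of each subset at a time
      partitions:
        subset-a: 2
        subset-b: 1
        subset-c: 1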

Multi-Cluster application management (In future)

UnitedDeployment can be extended to support multi-cluster workload management. The idea is that Subsets may not only reside in one cluster, but may also spread over multiple clusters. More specifically, the domain topology specification will associate a ClusterRegistryQuerySpec, which describes the clusters that UnitedDeployment may distribute Pods to. Each cluster is represented by a custom resource managed by a ClusterRegistry controller using the Kubernetes cluster registry APIs.

type Topology struct {
	// ClusterRegistryQuerySpec is used to find all the clusters that
	// the workload may be deployed to.
	ClusterRegistry *ClusterRegistryQuerySpec
	// Contains the details of each subset including the target cluster name and
	// the node selector in the target cluster.
	Subsets []Subset
}

type ClusterRegistryQuerySpec struct {
	// Namespaces that the cluster objects reside in.
	// If not specified, the default namespace is used.
	Namespaces []string
	// Selector is the label matcher to find all qualified clusters.
	Selector map[string]string
	// Describes the kind and APIVersion of the cluster object.
	ClusterType metav1.TypeMeta
}

type Subset struct {
	Name string

	// The name of the target cluster. The controller will validate that
	// the TargetCluster exists based on Topology.ClusterRegistry.
	TargetCluster *TargetCluster

	// Indicates the node selection strategy in the Subset.TargetCluster.
	// If Subset.TargetCluster is not set, the node selector strategy refers to
	// the current cluster.
	NodeSelector corev1.NodeSelector

	Replicas *intstr.IntOrString
}

type TargetCluster struct {
	// Namespace of the target cluster CRD
	Namespace string
	// Target cluster name
	Name string
}

A new TargetCluster field is added to the Subset API. If it is present, the NodeSelector indicates the node selection logic in the target cluster. The UnitedDeployment controller can then distribute application Pods to multiple clusters by instantiating a StatefulSet workload in each target cluster with a specific replica number (or a percentage of the total replicas), as illustrated in Figure 2.

[Figure 2: multi-cluster workload management]

At first glance, UnitedDeployment looks like a federation controller following the design pattern of Kubefed, but it is not. The fundamental difference is that Kubefed focuses on propagating arbitrary object types to remote clusters rather than managing an application across clusters. In this example, had a Kubefed-style controller been used, each StatefulSet workload in each individual cluster would have 100 replicas. UnitedDeployment focuses on providing the ability to manage multiple workloads in multiple clusters on behalf of one application, a capability that, to the best of our knowledge, is absent from the Kubernetes community.

Summary

This blog post introduces UnitedDeployment, a new controller which helps manage applications spread over multiple domains (in arbitrary clusters). It not only allows Pods to be evenly distributed over AZs (which, arguably, can now be done more efficiently using the new Pod Topology Spread Constraints APIs), but also enables flexible workload deployment and rollout, and will support multi-cluster use cases in the future.