Draft
37 commits
b8a604c
Finalize API
SoWieMarkus Feb 18, 2026
c19990b
Finalize the API this time frfr
SoWieMarkus Feb 18, 2026
b6ec5bd
Remove waiting state in decision observer kpi
SoWieMarkus Feb 18, 2026
c29a6eb
Refactor DecisionStateKPI_Collect tests to remove waiting state metrics
SoWieMarkus Feb 18, 2026
210081e
Refactor Run method to return FilterWeigherPipelineDecision instead o…
SoWieMarkus Feb 18, 2026
d59441a
Example implementation for cinder api
SoWieMarkus Feb 18, 2026
61bb3f6
Merge branch 'main' into decision-crd
SoWieMarkus Feb 19, 2026
96b44c1
Refactor FilterWeigherPipelineController tests to simplify request ha…
SoWieMarkus Feb 18, 2026
8bdee65
Example implementation of external scheduler api tests in cinder
SoWieMarkus Feb 18, 2026
4f58ad2
Remove decision as source for scheduling pipeline
SoWieMarkus Feb 19, 2026
23eccb0
Merge branch 'main' into decision-crd
SoWieMarkus Feb 19, 2026
2e5f6e4
Remove explanation controller from config
SoWieMarkus Feb 19, 2026
b9d7185
Fix external scheduler api tests for cinder
SoWieMarkus Feb 19, 2026
54ef046
Fix manila tests
SoWieMarkus Feb 19, 2026
b4d6c18
Fix nova external scheduler api tests
SoWieMarkus Feb 19, 2026
abedb1c
Fix filter weigher pipeline controller for pods
SoWieMarkus Feb 19, 2026
2186be9
Fix machine scheduler
SoWieMarkus Feb 19, 2026
c8e46dd
Fix pod scheduler tests
SoWieMarkus Feb 19, 2026
8c9b596
Fix type of ordered host
SoWieMarkus Feb 19, 2026
4ff6bfc
Added back ignore preselection option
SoWieMarkus Feb 20, 2026
68d097c
Fix manila and cinder tests for filter weigher controller
SoWieMarkus Feb 20, 2026
50012fd
Fix nova controller tests
SoWieMarkus Feb 20, 2026
0dbffc4
Add decision creation and event publishing, disabled explainer for now
SoWieMarkus Feb 20, 2026
408349f
Fix machines test
SoWieMarkus Feb 20, 2026
0f583d6
Fix pipeline deletion handling in Reconcile method and update test cases
SoWieMarkus Feb 20, 2026
a978822
Fix duplicate lib import
SoWieMarkus Feb 20, 2026
45e2568
Remove linting errors
SoWieMarkus Feb 20, 2026
6074f15
Update decision state KPI test cases and fix expected status in Cinde…
SoWieMarkus Feb 20, 2026
d0b2721
Update expected hosts in Nova external scheduler test case
SoWieMarkus Feb 20, 2026
56c4d69
Refactor machine processing test cases to remove decision creation ch…
SoWieMarkus Feb 20, 2026
1f27fd0
Refactor test cases to remove decision creation checks in filter weig…
SoWieMarkus Feb 20, 2026
b7ad01d
Merge branch 'main' into decision-crd
SoWieMarkus Feb 23, 2026
9b344e9
Rename pipeline controller to reflect descheduler functionality
SoWieMarkus Feb 23, 2026
6e58883
Merge branch 'main' into decision-crd
SoWieMarkus Feb 23, 2026
3742f41
Rename reason to intent
SoWieMarkus Feb 24, 2026
85dbe3f
Rename 'Reason' to 'Intent' in filter weigher pipeline controllers
SoWieMarkus Feb 24, 2026
eafa5ee
Merge branch 'main' into decision-crd
SoWieMarkus Feb 25, 2026
109 changes: 41 additions & 68 deletions api/v1alpha1/decision_types.go
@@ -6,90 +6,67 @@ package v1alpha1
import (
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
)

type DecisionSpec struct {
// SchedulingDomain defines in which scheduling domain this decision
// was or is processed (e.g., nova, cinder, manila).
SchedulingDomain SchedulingDomain `json:"schedulingDomain"`
// SchedulingIntent represents the intent for a scheduling event.
type SchedulingIntent string

// A reference to the pipeline that should be used for this decision.
const (
// SchedulingIntentInitialPlacement indicates that this is the initial placement of a resource.
SchedulingIntentInitialPlacement SchedulingIntent = "InitialPlacement"
// SchedulingIntentLiveMigration indicates that this scheduling event is triggered by a live migration operation.
SchedulingIntentLiveMigration SchedulingIntent = "LiveMigration"
// SchedulingIntentResize indicates that this scheduling event is triggered by a resize operation.
SchedulingIntentResize SchedulingIntent = "Resize"
// SchedulingIntentRebuild indicates that this scheduling event is triggered by a rebuild operation.
SchedulingIntentRebuild SchedulingIntent = "Rebuild"
// SchedulingIntentEvacuate indicates that this scheduling event is triggered by an evacuate operation.
SchedulingIntentEvacuate SchedulingIntent = "Evacuate"
// SchedulingIntentUnknown indicates that the Intent for this scheduling event is unknown.
SchedulingIntentUnknown SchedulingIntent = "Unknown"
)

// SchedulingHistoryEntry represents a single entry in the scheduling history of a resource.
type SchedulingHistoryEntry struct {
// The hosts that were selected in this scheduling event, in order of preference.
OrderedHosts []string `json:"orderedHosts"`
// Timestamp of when the scheduling event occurred.
Timestamp metav1.Time `json:"timestamp"`
// A reference to the pipeline that was used for this decision.
// This reference can be used to look up the pipeline definition and its
// scheduler step configuration for additional context.
PipelineRef corev1.ObjectReference `json:"pipelineRef"`
// The Intent for this scheduling event.
Intent SchedulingIntent `json:"intent"`
}

Note: this can be >1000 entries. Can you test at what point this would exceed the Kubernetes resource size limit? If we're orders of magnitude off (e.g. 50k entries are possible), this should be fine. Otherwise we should truncate the scheduling history in the controller and document this. E.g.

// SchedulingHistory provides the history of the observed resource.
// For example, this can be the (re-)schedulings of a virtual machine
// over time. Entries are always truncated to 100 elements to stay
// within the kubernetes resource size limits. Check `HistoryLength`
// for the untruncated number of events recorded for this resource.
SchedulingHistory []SchedulingHistoryEntry `json:"schedulingHistory,omitempty"`


type DecisionSpec struct {
// SchedulingDomain defines in which scheduling domain this decision
// was or is processed (e.g., nova, cinder, manila).
SchedulingDomain SchedulingDomain `json:"schedulingDomain"`

// An identifier for the underlying resource to be scheduled.
// For example, this can be the UUID of a nova instance or cinder volume.
// This can be used to correlate multiple decisions for the same resource.
ResourceID string `json:"resourceID"`

// If the type is "nova", this field contains the raw nova decision request.
// +kubebuilder:validation:Optional
NovaRaw *runtime.RawExtension `json:"novaRaw,omitempty"`
// If the type is "cinder", this field contains the raw cinder decision request.
// +kubebuilder:validation:Optional
CinderRaw *runtime.RawExtension `json:"cinderRaw,omitempty"`
// If the type is "manila", this field contains the raw manila decision request.
// +kubebuilder:validation:Optional
ManilaRaw *runtime.RawExtension `json:"manilaRaw,omitempty"`
// If the type is "machine", this field contains the machine reference.
// +kubebuilder:validation:Optional
MachineRef *corev1.ObjectReference `json:"machineRef,omitempty"`
// If the type is "pod", this field contains the pod reference.
// +kubebuilder:validation:Optional
PodRef *corev1.ObjectReference `json:"podRef,omitempty"`
}

type StepResult struct {
// object reference to the scheduler step.
StepName string `json:"stepName"`
// Activations of the step for each host.
Activations map[string]float64 `json:"activations"`
}

type DecisionResult struct {
// Raw input weights to the pipeline.
// +kubebuilder:validation:Optional
RawInWeights map[string]float64 `json:"rawInWeights"`
// Normalized input weights to the pipeline.
// +kubebuilder:validation:Optional
NormalizedInWeights map[string]float64 `json:"normalizedInWeights"`
// Outputs of the decision pipeline including the activations used
// to make the final ordering of compute hosts.
// +kubebuilder:validation:Optional
StepResults []StepResult `json:"stepResults,omitempty"`
// Aggregated output weights from the pipeline.
// +kubebuilder:validation:Optional
AggregatedOutWeights map[string]float64 `json:"aggregatedOutWeights"`
// Final ordered list of hosts from most preferred to least preferred.
// +kubebuilder:validation:Optional
OrderedHosts []string `json:"orderedHosts,omitempty"`
// The first element of the ordered hosts is considered the target host.
// +kubebuilder:validation:Optional
TargetHost *string `json:"targetHost,omitempty"`
}

const (
// The decision was successfully processed.
// The decision is ready and tracking the resource.
DecisionConditionReady = "Ready"
// The decision has failed to make a placement decision for the resource.
DecisionConditionFailed = "Failed"
)

type DecisionStatus struct {
// The result of this decision.
// +kubebuilder:validation:Optional
Result *DecisionResult `json:"result,omitempty"`

// If there were previous decisions for the underlying resource, they can
// be resolved here to provide historical context for the decision.
// The target host selected for the resource. Can be empty if no host could be determined.
// +kubebuilder:validation:Optional
History *[]corev1.ObjectReference `json:"history,omitempty"`
TargetHost string `json:"targetHost,omitempty"`

// The number of decisions that preceded this one for the same resource.
// The history of scheduling events for this resource.
// +kubebuilder:validation:Optional
Precedence *int `json:"precedence,omitempty"`
SchedulingHistory []SchedulingHistoryEntry `json:"schedulingHistory,omitempty"`

// A human-readable explanation of the decision result.
// A human-readable explanation of the current scheduling decision.
// +kubebuilder:validation:Optional
Explanation string `json:"explanation,omitempty"`

@@ -103,12 +80,8 @@ type DecisionStatus struct {
// +kubebuilder:resource:scope=Cluster
// +kubebuilder:printcolumn:name="Domain",type="string",JSONPath=".spec.schedulingDomain"
// +kubebuilder:printcolumn:name="Resource ID",type="string",JSONPath=".spec.resourceID"
// +kubebuilder:printcolumn:name="#",type="string",JSONPath=".status.precedence"
// +kubebuilder:printcolumn:name="Target Host",type="string",JSONPath=".status.targetHost"
// +kubebuilder:printcolumn:name="Created",type="date",JSONPath=".metadata.creationTimestamp"
// +kubebuilder:printcolumn:name="Pipeline",type="string",JSONPath=".spec.pipelineRef.name"
// +kubebuilder:printcolumn:name="TargetHost",type="string",JSONPath=".status.result.targetHost"
// +kubebuilder:printcolumn:name="Ready",type="string",JSONPath=".status.conditions[?(@.type=='Ready')].status"
// +kubebuilder:selectablefield:JSONPath=".spec.resourceID"

// Decision is the Schema for the decisions API
type Decision struct {
123 changes: 16 additions & 107 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

14 changes: 0 additions & 14 deletions cmd/main.go
@@ -41,7 +41,6 @@ import (
"github.com/cobaltcore-dev/cortex/internal/knowledge/extractor"
"github.com/cobaltcore-dev/cortex/internal/knowledge/kpis"
"github.com/cobaltcore-dev/cortex/internal/scheduling/cinder"
"github.com/cobaltcore-dev/cortex/internal/scheduling/explanation"
schedulinglib "github.com/cobaltcore-dev/cortex/internal/scheduling/lib"
"github.com/cobaltcore-dev/cortex/internal/scheduling/machines"
"github.com/cobaltcore-dev/cortex/internal/scheduling/manila"
@@ -439,19 +438,6 @@ func main() {
os.Exit(1)
}
}
if slices.Contains(mainConfig.EnabledControllers, "explanation-controller") {
// Setup a controller which will reconcile the history and explanation for
// decision resources.
explanationControllerConfig := conf.GetConfigOrDie[explanation.ControllerConfig]()
explanationController := &explanation.Controller{
Client: multiclusterClient,
Config: explanationControllerConfig,
}
if err := explanationController.SetupWithManager(mgr, multiclusterClient); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "ExplanationController")
os.Exit(1)
}
}
if slices.Contains(mainConfig.EnabledControllers, "reservations-controller") {
monitor := reservationscontroller.NewControllerMonitor(multiclusterClient)
metrics.Registry.MustRegister(&monitor)
1 change: 0 additions & 1 deletion helm/bundles/cortex-cinder/values.yaml
@@ -97,7 +97,6 @@ cortex-scheduling-controllers:
component: cinder-scheduling
enabledControllers:
- cinder-decisions-pipeline-controller
- explanation-controller
enabledTasks:
- cinder-decisions-cleanup-task

1 change: 0 additions & 1 deletion helm/bundles/cortex-ironcore/values.yaml
@@ -32,7 +32,6 @@ cortex:
schedulingDomain: machines
enabledControllers:
- ironcore-decisions-pipeline-controller
- explanation-controller
monitoring:
labels:
github_org: cobaltcore-dev
1 change: 0 additions & 1 deletion helm/bundles/cortex-manila/values.yaml
@@ -97,7 +97,6 @@ cortex-scheduling-controllers:
component: manila-scheduling
enabledControllers:
- manila-decisions-pipeline-controller
- explanation-controller
enabledTasks:
- manila-decisions-cleanup-task

1 change: 0 additions & 1 deletion helm/bundles/cortex-nova/values.yaml
@@ -107,7 +107,6 @@ cortex-scheduling-controllers:
enabledControllers:
- nova-pipeline-controllers
- nova-deschedulings-executor
- explanation-controller
enabledTasks:
- nova-decisions-cleanup-task

1 change: 0 additions & 1 deletion helm/bundles/cortex-pods/values.yaml
@@ -32,7 +32,6 @@ cortex:
schedulingDomain: pods
enabledControllers:
- pods-decisions-pipeline-controller
- explanation-controller
monitoring:
labels:
github_org: cobaltcore-dev