Releases · cozystack/cozystack
v1.2.1
Features and Improvements
- [postgres] Hardcode PostgreSQL 17 for monitoring databases and add migration: The CloudNativePG operator defaults to PostgreSQL 18.3 when no explicit image is specified, but monitoring queries in Grafana and Alerta rely on PostgreSQL 17 features such as pg_stat_checkpointer and the updated pg_stat_bgwriter. This mismatch could break monitoring after fresh installs or database recreation. PostgreSQL 17.7 images are now hardcoded for monitoring databases, and migration 37 is added to set version v17 for any existing PostgreSQL resources (@IvanHunters in #2304, #2309).
Fixes
- [platform] Prevent installed packages deletion: Added the helm.sh/resource-policy: keep annotation to all platform packages. Previously, moving a package to disabledPackages or removing it from enabledPackages caused Helm to automatically delete the corresponding resource, contradicting the documented behavior that requires the platform administrator to manually delete packages when needed (@myasnikovdaniil in #2273, #2297).
- [linstor] Preserve TCP ports during toggle-disk operations: During toggle-disk operations, removeLayerData() freed TCP ports from the number pool and ensureStackDataExists() could then allocate different ports. If a satellite missed the resulting update (e.g. due to a controller restart), it retained the old ports while peers received the new ones, causing DRBD connections to fail with StandAlone state. The fix adds copyDrbdTcpPortsIfExists(), which saves existing TCP ports into the LayerPayload before removeLayerData() deletes them (@kvaps in #2292, #2299).
- [platform] Fix resource allocation ratios not propagated to managed packages: A regression introduced in the bundle restructure caused cpuAllocationRatio, memoryAllocationRatio, and ephemeralStorageAllocationRatio set in platform/values.yaml to become no-ops: they were never written to the cozystack-values Secret that cozy-lib reads in child packages. This meant all managed applications silently used the hardcoded defaults (10, 1, 40) regardless of operator-configured values. The fix restores propagation by writing the ratios into the _cluster section of the cozystack-values Secret and passing cpuAllocationRatio to the KubeVirt Package component (@sircthulhu in #2296, #2301).
- [linstor] Fix DRBD connectivity failures on kernels without crct10dif by setting verify-alg to crc32c: LINSTOR's auto-verify algorithm selection defaults to crct10dif, but this kernel crypto module is no longer available in newer kernels (e.g. Talos v1.12.6, kernel 6.18.18). When crct10dif is unavailable, DRBD peer connections fail with VERIFYAlgNotAvail: failed to allocate crct10dif for verify, causing all DRBD resources to enter Diskless state and lose quorum. DrbdOptions/Net/verify-alg is now set to crc32c at the controller level (@kvaps in #2303, #2312).
- [multus] Fix stale sandbox reservations permanently blocking pod creation after CNI ADD failure: After a node disruption (e.g. DRBD or kube-ovn issues during upgrade), containerd accumulated stale sandbox name reservations. Cleanup failed because multus called delegate plugins for DEL without cached state and they rejected the incomplete config, causing DEL to fail instead of succeeding. Stale entries were never released, permanently blocking new pod creation on the affected node. A custom multus-cni image is now built with a patch that returns success from DEL when CNI ADD never completed (@kvaps in #2313, #2314).
- [multus] Pin master CNI to 05-cilium.conflist to prevent race condition at boot: During node boot or Talos upgrade, multus auto-detects the master CNI conflist by scanning the CNI config directory. If kube-ovn writes 10-kube-ovn.conflist before Cilium writes 05-cilium.conflist, multus selects the wrong file; the affected pods bypass the Cilium chain entirely, have no Cilium endpoint, and have their traffic blocked by cluster-wide network policies. multusMasterCNI is now pinned to 05-cilium.conflist (@kvaps in #2315, #2316).
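The boot-time race described in the last multus fix above can be illustrated with a small sketch. It assumes that auto-detection picks the lexicographically first config file present at scan time, and that a pinned name is simply treated as "not ready" until it exists; both function names are hypothetical and this is not the multus implementation.

```python
from typing import List, Optional


def autodetect_master_cni(present_files: List[str]) -> Optional[str]:
    """Assumed auto-detect rule: first CNI config file in lexicographic order."""
    candidates = sorted(
        f for f in present_files if f.endswith((".conf", ".conflist"))
    )
    return candidates[0] if candidates else None


def pinned_master_cni(
    present_files: List[str], pinned: str = "05-cilium.conflist"
) -> Optional[str]:
    """Pinned rule: only the named file is acceptable; None means 'not ready yet'."""
    return pinned if pinned in present_files else None


# Directory contents at two moments during boot.
early = ["10-kube-ovn.conflist"]                        # kube-ovn wrote its file first
late = ["05-cilium.conflist", "10-kube-ovn.conflist"]   # Cilium has caught up

print(autodetect_master_cni(early))  # the wrong file wins the race
print(pinned_master_cni(early))      # pinned selection refuses to pick it
```

With pinning, the outcome no longer depends on which CNI writes its file first.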
Documentation
- [website] Add custom Keycloak themes documentation: Added documentation for custom Keycloak theme injection to the White Labeling guide, covering the theme image contract (/themes/ directory structure), configuration via the cozystack.keycloak Package resource, imagePullSecrets for private registries, and theme activation in the Keycloak admin console (@lexfrei in cozystack/website#463).
- [website] Add documentation for Go types usage: Added a guide for using the generated Go types for Cozystack managed applications as a Go module, including installation instructions, programmatic resource management examples, and deployment approaches (@myasnikovdaniil in cozystack/website#465).
Full Changelog: v1.2.0...v1.2.1
v1.1.5
Fixes
- [platform] Prevent installed packages deletion: Added the helm.sh/resource-policy: keep annotation to all platform packages. Previously, moving a package to disabledPackages or removing it from enabledPackages caused Helm to automatically delete it, contradicting the documented behavior that requires the platform administrator to manually delete packages when needed (@myasnikovdaniil in #2273, #2298).
- [linstor] Fix TCP port mismatches after toggle-disk operations causing DRBD resources to enter StandAlone state: During toggle-disk operations, removeLayerData() freed TCP ports from the number pool and ensureStackDataExists() could then allocate different ports. If a satellite missed the resulting update (e.g. due to a controller restart), it retained the old ports while peers received the new ones, causing DRBD connections to fail with StandAlone state. The fix introduces copyDrbdTcpPortsIfExists(), which preserves existing TCP ports in the LayerPayload before removeLayerData() releases them (@kvaps in #2292, #2300).
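To see why freeing and re-allocating a port can change it, consider the toy model below. It is only a sketch: PortPool, toggle_disk, and the lowest-free-port policy are hypothetical stand-ins, not LINSTOR code, and the comments map each step loosely onto the functions named in the fix above.

```python
class PortPool:
    """Toy number pool that hands out the lowest free port (assumed policy)."""

    def __init__(self, start, end):
        self.free = set(range(start, end + 1))

    def allocate(self, preferred=None):
        # Reuse the preferred port when it is still free, else take the lowest.
        port = preferred if preferred in self.free else min(self.free)
        self.free.remove(port)
        return port

    def release(self, port):
        self.free.add(port)


def toggle_disk(pool, current_port, preserve):
    # Loosely analogous to copying the ports into the payload before removal.
    saved = current_port if preserve else None
    pool.release(current_port)            # loosely analogous to removeLayerData()
    return pool.allocate(preferred=saved)  # loosely analogous to ensureStackDataExists()


def scenario(preserve):
    pool = PortPool(7000, 7010)
    other = pool.allocate()     # another resource takes 7000
    mine = pool.allocate()      # our resource takes 7001
    pool.release(other)         # the other resource goes away, 7000 is free again
    return mine, toggle_disk(pool, mine, preserve)


print(scenario(preserve=False))  # port changes across the toggle
print(scenario(preserve=True))   # port is kept
```

When the port is kept, a satellite that missed the update still agrees with its peers.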
Full Changelog: v1.1.4...v1.1.5
v1.1.4
Features and Improvements
- [boot-to-talos] Add support for ISO, RAW, and HTTP image sources: The boot-to-talos tool can now use ISO files, raw disk images, and HTTP URLs as Talos image sources in addition to container registry images. This allows bootstrapping nodes in air-gapped environments or from locally stored images without requiring a container registry (@lexfrei in cozystack/boot-to-talos#13).
- [boot-to-talos] Use permanent MAC address for predictable network interface names: Interface name detection now reads the permanent MAC address directly from sysfs instead of relying on udev data, providing a stable hardware MAC that is unaffected by user modifications to the active MAC address. This makes network interface naming more reliable across reboots and hardware changes (@IvanHunters in cozystack/boot-to-talos#14).
Fixes
- [dashboard] Fix broken backup menu links missing cluster context: Backup resources (plans, backupjobs, backups) are not ApplicationDefinitions, so ensureNavigation() never created their baseFactoriesMapping entries. Without these entries the OpenUI frontend could not resolve the {cluster} context for backup pages, producing broken sidebar links with an empty cluster segment (e.g. /openapi-ui//tenant-root/...). The missing baseFactoriesMapping entries for all backup resource types are now added to the static Navigation resource (@sircthulhu in #2232, #2269).
- [platform] Fix tenant admins unable to create FoundationDB, Harbor, MongoDB, OpenBAO, OpenSearch, Qdrant, and VPN applications: The cozy:tenant:admin:base ClusterRole was missing seven application resources from apps.cozystack.io (foundationdbs, harbors, mongodbs, openbaos, opensearches, qdrants, vpns). Without these permissions, tenant admins could not create these applications: the "Add" button was inactive in the dashboard. The missing resources have been added to the ClusterRole (@sircthulhu in #2268, #2272).
- [dashboard] Fix StorageClass dropdown showing "Error" in application forms: The dashboard UI fetches StorageClass resources to populate dropdowns (e.g. in the Postgres form), but the cozystack-dashboard-readonly ClusterRole did not include storage.k8s.io/storageclasses. This caused authenticated users to see "Error" instead of the StorageClass name. get/list/watch permissions for storageclasses have been added to the dashboard readonly role (@sircthulhu in #2267, #2274).
- [system] Fix 403 error on Service details page by granting tenants read access to EndpointSlices: The dashboard requested EndpointSlices from the discovery.k8s.io API group to display the "Pod serving" section on the Service details page, but the cozy:tenant:base and cozy:tenant:view:base ClusterRoles lacked permissions for this resource. Tenant users received a 403 error when opening the Service details page. get/list/watch permissions for endpointslices have been added to both tenant ClusterRoles (@sircthulhu in #2257, #2285).
- [dashboard] Fix "Pod serving" table displaying "Raw:" and "Invalid Date" on Service details page: The Service details page EndpointSlice table showed "Raw:" prefixes and "Invalid Date" values because the EnrichedTable referenced customizationId factory-kube-service-details-endpointslice, which had no corresponding CustomColumnsOverride. Column definitions for Pod (.targetRef.name), Addresses (.addresses), Ready (.conditions.ready), and Node (.nodeName) have been added (@sircthulhu in #2266, #2283).
- [piraeus-operator] Fix LINSTOR satellite alert labels, reduce scrape-flap false positives, and improve controller alerting: Three alerting issues in cozy-piraeus-operator have been addressed: (1) linstorSatelliteErrorRate used a non-existent name label in annotations, resulting in Satellite "" in alert notifications (corrected to {{ $labels.hostname }}); (2) linstorSatelliteErrorRate could produce false positives when the linstor-controller scrape flapped and historical linstor_error_reports_count counters reappeared inside the alert window (fixed by adding a minimum scrape-count guard); (3) the LinstorControllerOffline alert has been split into separate availability and metrics-availability alerts with configurable hold time to reduce noise during brief connectivity interruptions (@sasha-sup in #2265, #2286).
- [linstor] Fix swapped VMPodScrape job labels causing incorrect controller offline alerts: The cozy-linstor VictoriaMetrics VMPodScrape templates had the job relabeling rules swapped: linstor-satellite metrics were labeled as job=linstor-controller and vice versa. This caused linstorControllerOffline alerts to fire for satellite endpoints (:9942) while reporting that the controller was unreachable. The job labels are now correctly assigned to their respective targets (@sasha-sup in #2264, #2289).
- [boot-to-talos] Fix triple-fault on hosts with 5-level paging (LA57) enabled: On hosts with CONFIG_X86_5LEVEL=y in the kernel, kexec into Talos caused a triple-fault because the Talos kernel does not support 5-level page tables. boot-to-talos now detects LA57 before kexec and automatically patches GRUB with no5lvl, runs update-grub, and reboots. After reboot with 5-level paging disabled, boot-to-talos proceeds normally (@IvanHunters in cozystack/boot-to-talos#15).
- [boot-to-talos] Fix EFI boot entry creation when using loop device images: The Talos installer skips EFI variable creation when running on loop devices. Instead of relying on the Talos installer, boot-to-talos now reads the GPT partition table from the target disk after the image copy and creates a proper UEFI boot entry with an HD() device path pointing to the real target disk's ESP (@kvaps in cozystack/boot-to-talos#16).
- [talm] Fix silent empty output when no template files are specified: Running talm template without --file or --template flags previously produced minimal or empty output without any error. Validation has been added to engine.Render to return a clear error message when no template files are specified, making misconfigured invocations immediately apparent (@kvaps in cozystack/talm#112).
Documentation
- [website] Add documentation for VMInstance and VMDisk backups: Added a new virtualization-focused Backup and Recovery guide covering one-off and scheduled backups for VMInstance and VMDisk resources, restore procedures, status verification commands, and troubleshooting notes including Velero-related issues (@myasnikovdaniil in cozystack/website#456).
- [website] Update developer guide with operator-driven architecture and OCIRepository migration flow: Rewrote the development guide to describe the operator-driven in-cluster architecture, bootstrap flow, operator responsibilities, and the platform install/update sequence. Added an "OCIRepositories and Migration Flow" section with migration hook examples and sequencing rules for pre-upgrade hooks (@myasnikovdaniil in cozystack/website#458).
Full Changelog: v1.1.3...v1.1.4
v1.0.7
Fixes
- [platform] Fix tenant admins unable to create FoundationDB, Harbor, MongoDB, OpenBAO, OpenSearch, Qdrant, and VPN applications: The cozy:tenant:admin:base ClusterRole was missing RBAC entries for the foundationdbs, harbors, mongodbs, openbaos, opensearches, qdrants, and vpns resources from apps.cozystack.io. Without these permissions, tenant admins could not create these applications: the "Add" button was inactive in the dashboard. The fix adds all seven missing resources to the ClusterRole (@sircthulhu in #2268, #2271).
- [system] Fix 403 error on Service details page for tenant users: The cozy:tenant:base and cozy:tenant:view:base ClusterRoles were missing read permissions for discovery.k8s.io/endpointslices. The dashboard requests EndpointSlices to display the "Pod serving" section on the Service details page, and without this permission tenant users received a 403 error. The fix adds get, list, and watch verbs for endpointslices to both tenant roles (@sircthulhu in #2257, #2284).
- [dashboard] Fix "Pod serving" table showing "Raw:" prefixes and "Invalid Date" on Service details page: The EndpointSlice table on the Service details page displayed raw data and broken timestamps because the EnrichedTable component referenced the factory-kube-service-details-endpointslice customization ID, which had no corresponding CustomColumnsOverride. The fix adds column definitions for Pod (.targetRef.name), Addresses (.addresses), Ready (.conditions.ready), and Node (.nodeName) (@sircthulhu in #2266, #2282).
- [dashboard] Fix broken backup menu links missing cluster context: Backup resources (plans, backupjobs, backups) are not ApplicationDefinitions, so ensureNavigation() never created their baseFactoriesMapping entries. Without these mappings, the OpenUI frontend could not resolve the {cluster} context for backup pages, producing broken sidebar links with an empty cluster segment (e.g. /openapi-ui//tenant-root/... instead of /openapi-ui/default/tenant-root/...). The fix adds the three missing static entries to the Navigation resource (@sircthulhu in #2232, #2270).
- [linstor] Fix swapped VMPodScrape job labels causing incorrect alerts: The job labels in the cozy-linstor VictoriaMetrics VMPodScrape templates were swapped: linstor-satellite metrics were relabeled as job=linstor-controller and vice versa. This caused linstorControllerOffline alerts to fire against satellite endpoints (:9942) while reporting the controller as unreachable. The fix ensures linstor-satellite metrics keep job=linstor-satellite and linstor-controller metrics keep job=linstor-controller, restoring consistent alerting and dashboard semantics (@sasha-sup in #2264, #2288).
- [piraeus-operator] Fix LINSTOR satellite alert annotations and reduce false-positive alerts: Two issues in the LINSTOR alerts shipped by cozy-piraeus-operator were fixed. First, linstorSatelliteErrorRate used a non-existent name label in annotations, resulting in Satellite "" in alert notifications; the annotation now uses {{ $labels.hostname }}. Second, linstorSatelliteErrorRate produced false positives when the linstor-controller scrape flapped and historical linstor_error_reports_count counters reappeared inside the alert window; the alert now requires a stable up{job="linstor-controller"} for the full 15-minute window. Additionally, the controller availability alert was split to add a dedicated warning for metrics scrape failures with a 10-minute hold time to reduce transient noise (@sasha-sup in #2265, #2287).
Documentation
- [website] Add Backup and Recovery guide for VMInstance and VMDisk: Replaced the generic Kubernetes Backup and Recovery guide with a virtualization-focused Backup and Recovery doc covering VMInstance and VMDisk one-off and scheduled backups, restores, status checks, and troubleshooting (including Velero-related notes) (@myasnikovdaniil in cozystack/website#456).
- [website] Update developer guide with operator-driven architecture and OCIRepository/migration flow: Rewrote the development guide to describe the operator-driven in-cluster architecture, bootstrap flow, operator responsibilities, and platform install/update sequence. Added documentation for OCIRepositories and the migration flow with migration hook examples and sequencing rules for pre-upgrade/install migrations. Also updated the concepts guide with the two-repository update model, dependency ordering rules, namespace creation behavior, and cluster-wide values injection (@myasnikovdaniil in cozystack/website#458).
Full Changelog: v1.0.6...v1.0.7
v1.2.0
Cozystack v1.2.0
Cozystack v1.2.0 delivers significant platform enhancements: a fully managed OpenSearch service joining the application catalog, VPC peering for secure inter-tenant networking, tenant workload placement control via the new SchedulingClass system, a highly-available VictoriaLogs cluster replacing the single-node setup, and Linstor volume relocation for optimized clone and snapshot restore placement. Additional highlights include external-dns as a standalone extra package, multi-node RWX volume fixes, and a wave of dashboard and monitoring improvements.
Feature Highlights
OpenSearch: Managed Search and Analytics Service
Cozystack now ships OpenSearch as a fully managed PaaS application — supporting OpenSearch v1, v2, and v3 in a multi-role topology with dedicated master, data, ingest, coordinating, and ML nodes. TLS is enabled by default, HTTP Basic auth is provided out of the box, and custom user definitions allow per-application credentials. The optional OpenSearch Dashboards UI can be enabled alongside the engine. External access, topology spread policies, and a comprehensive JSON schema are all included.
A companion opensearch-operator system package wraps the upstream Opster OpenSearch Operator v2.8.0 and adds a sysctl DaemonSet to configure the required vm.max_map_count kernel parameter on every node automatically. An ApplicationDefinition package ties everything into the Cozystack platform dashboard with schema validation and resource management.
SchedulingClass: Tenant Workload Placement
Cozystack now supports a SchedulingClass CRD that allows platform operators to define cluster-wide scheduling constraints — pinning tenant workloads to specific data centers, hardware generations, or node groups without requiring tenants to manage scheduler configuration themselves. Tenants declare a schedulingClass in their Tenant spec; the platform injects the appropriate schedulerName into all workloads in that namespace.
The lineage-controller-webhook has been extended to verify the referenced SchedulingClass CR before injection, and child tenants inherit their parent's scheduling constraints (children cannot override). A SchedulingClass dropdown in the Tenant creation form in the dashboard makes the feature fully self-service. The underlying cozystack-scheduler — a custom kube-scheduler extension with SchedulingClass-aware affinity plugins — is now installed and enabled by default as part of the platform.
VPC Peering for Multi-Tenant Environments
The vpc application gains bilateral VPC peering using Kube-OVN's native vpcPeerings mechanism, allowing tenants to securely interconnect their private networks without routing traffic through public endpoints. Peering link-local IPs (169.254.0.0/16) are allocated deterministically from a hash of the sorted VPC pair names, ensuring stable addresses across reconciliations. Static route support (staticRoutes) enables fine-grained inter-VPC routing policies. A cozy-lib helper (hexToInt) performs the deterministic IP allocation, and a JSON Schema validation enforces the ^tenant- namespace pattern for peered VPCs.
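A minimal sketch of deterministic allocation from a sorted-pair hash follows. The hash function (SHA-256), the "|" separator, and the 4-address block layout are all assumptions made for illustration; the source only states that addresses come from a hash of the sorted VPC pair names and that a hex-to-int helper is involved. The function name is hypothetical.

```python
import hashlib
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def peering_link_local_ips(vpc_a, vpc_b):
    """Return a stable {vpc_name: link_local_ip} mapping for one peering."""
    # Sorting makes the result independent of argument order.
    first, second = sorted([vpc_a, vpc_b])
    digest = hashlib.sha256(f"{first}|{second}".encode()).hexdigest()
    # Convert a hex prefix of the digest to an integer (the assumed role of hexToInt).
    value = int(digest[:8], 16)
    # Assumed layout: split the /16 into 4-address blocks, use offsets 1 and 2.
    block = value % (LINK_LOCAL.num_addresses // 4)
    base = int(LINK_LOCAL.network_address) + block * 4
    return {
        first: str(ipaddress.ip_address(base + 1)),
        second: str(ipaddress.ip_address(base + 2)),
    }


# Both sides of the peering compute the same pair, whatever the argument order.
print(peering_link_local_ips("tenant-a/vpc-1", "tenant-b/vpc-2"))
print(peering_link_local_ips("tenant-b/vpc-2", "tenant-a/vpc-1"))
```

Because nothing but the two names feeds the hash, repeated reconciliations yield the same addresses without any stored allocation state. A scheme like this can collide for different pairs, which the sketch does not handle.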
VictoriaLogs: Clustered Mode for High Availability
The platform's log storage has been upgraded from the deprecated single-node VLogs CR to a VLCluster deployment with separate vlinsert, vlselect, and vlstorage components, each running with 2 replicas by default — consistent with the existing VMCluster setup. This brings horizontal scalability and resilience to the logging tier. VPA autoscaling is enabled for all VLCluster components, and the victoria-metrics-operator has been upgraded from v0.55.0 to v0.68.1 to add VLCluster CRD support.
Linstor CSI: Volume Relocation After Clone and Restore
The Linstor CSI driver now carries upstream patches enabling automatic replica relocation after PVC clone and snapshot restore operations. Two new parameters control the behavior: linstor.csi.linbit.com/relocateAfterClone on StorageClasses moves replicas to optimal nodes after a clone, and snap.linstor.csi.linbit.com/relocate-after-restore on VolumeSnapshotClasses does the same after a restore. VolumeSnapshotClasses for Velero and Kasten use cases are pre-configured. This enables full PVC-level backup and restore workflows with automatic data rebalancing, a key prerequisite for production Velero/Kasten integrations.
Major Features and Improvements
- [apps] Add managed OpenSearch service: Deployed as a PaaS application supporting OpenSearch v1/v2/v3 with multi-role node topology, TLS, HTTP Basic auth, custom users, optional OpenSearch Dashboards UI, external access, and topology spread policies; backed by the Opster OpenSearch Operator v2.8.0 and a sysctl DaemonSet for vm.max_map_count (@matthieu-robin in #1953).
- [vpc] Add VPC peering support for multi-tenant environments: Bilateral VPC peering via Kube-OVN's vpcPeerings, deterministic link-local IP allocation from sorted VPC pair hash, static routes support, ConfigMap peer discovery enrichment, and JSON Schema validation enforcing the ^tenant- namespace pattern (@mattia-eleuteri in #2152).
- [monitoring] Migrate VictoriaLogs from VLogs to VLCluster: Replaced the deprecated single-node VLogs CR with a clustered VLCluster (vlinsert/vlselect/vlstorage, 2 replicas each), added VPA for all components, upgraded victoria-metrics-operator to v0.68.1 (@sircthulhu in #2153).
- [scheduler] Integrate SchedulingClass support for tenant workloads: Added the schedulingClass Tenant parameter with inheritance enforcement, the scheduling.cozystack.io/class namespace label, a lineage-webhook extension to verify and inject schedulerName, and a SchedulingClass dropdown in the Tenant dashboard form (@sircthulhu in #2223).
- [cozystack-scheduler] Add custom scheduler as an optional system package: Vendored cozystack-scheduler from github.com/cozystack/cozystack-scheduler, a kube-scheduler extension with SchedulingClass-aware affinity plugins, including a Helm chart with RBAC, ConfigMap, Deployment, and CRD (@lllamnyp in #2205).
- [platform] Enable cozystack-scheduler by default: The cozystack-scheduler and SchedulingClass CRD are now installed as default system packages; the backup tool has been moved to optional packages (@lllamnyp in #2253).
- [extra] Add external-dns as a standalone extra package: Packaged external-dns as an installable extra (tenant-level) component for automatic DNS record management from Kubernetes Service and Ingress resources (@mattia-eleuteri in #1988).
- [linstor] Add linstor-csi patches for clone/snapshot relocation: New patch enabling the relocateAfterClone StorageClass parameter and the relocate-after-restore VolumeSnapshotClass parameter; pre-configured VolumeSnapshotClasses for Velero and relocation workflows; CDI switched to csi-clone strategy (@kvaps in #2133).
- [monitoring] Add inlineScrapeConfig support to tenant vmagent: Tenants can now define inline scrape configurations directly in their VMAgent spec, enabling custom metrics collection from services that are not discoverable via standard Kubernetes service discovery (@mattia-eleuteri in #2200).
- [monitoring] Add Slack dashboard URL, vmagent environment label, and dynamictext Grafana plugin: Added SLACK_DASHBOARD_URL and SLACK_SUMMARY_FMT environment variables for richer alert notifications, a per-vmagent environment label for metric source identification, and the dynamictext-panel plugin for Grafana dashboards (@vnyakas in #2210).
- [monitoring] Scope infrastructure dashboards to tenant-root only: Infrastructure-level Grafana dashboards are now scoped to the tenant-root namespace only, preventing them from appearing in tenant sub-namespaces and reducing dashboard noise (@mattia-eleuteri in #2197).
- [tenant] Allow egress to virt-handler for VM metrics scraping: Extended tenant NetworkPolicy to permit egress to virt-handler pods, enabling Prometheus to scrape VM-level metrics from KubeVirt without additional policy exceptions (@mattia-eleuteri in #2199).
- [dashboard] Add keycloakInternalUrl for backend-to-backend OIDC requests: Added a keycloakInternalUrl platform value for the dashboard backend to perform OIDC token introspection via an internal cluster URL, avoiding external round-trips and improving reliability in air-gapped environments (@sircthulhu in #2224).
- [dashboard] Add secret-hash annotation to KeycloakClient for secret sync: Added a secret-hash annotation to the KeycloakClient resource so that changes to the client secret trigger automatic reconciliation and propagation to dependent components (@sircthulhu in #2231).
- [docs] Add OpenAPI and Go types code generation for apps: Added tooling to generate OpenAPI schemas and Go types from Helm chart values, enabling type-safe programmatic access to managed application configurations and automatic API reference generation (@myasnikovdaniil in #2214).
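The secret-hash annotation in the KeycloakClient item above follows a common pattern: put a digest of the secret into resource metadata so that a secret change also changes the resource. The sketch below assumes a SHA256 hex digest (as the v1.0.6 and v1.1.3 notes describe) and uses the bare annotation key from the text; the function name is hypothetical and the real key may carry a prefix.

```python
import hashlib


def secret_hash_annotation(client_secret: bytes) -> dict:
    """Build the annotation map carrying a SHA256 digest of the client secret."""
    return {"secret-hash": hashlib.sha256(client_secret).hexdigest()}


old = secret_hash_annotation(b"old-client-secret")
new = secret_hash_annotation(b"new-client-secret")

# The spec is identical in both cases; only the annotation differs,
# which is the change an operator can react to.
print(old != new)
```

Only the digest is exposed in metadata, so the secret value itself never appears on the KeycloakClient resource.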
Improvements (minor)
v1.0.6
Fixes
- [kubernetes] Fix CiliumNetworkPolicy endpointSelector for multi-node RWX volumes: When an NFS-backed RWX volume is published to multiple VMs, the network policy's endpointSelector was only capturing the first VM. Subsequent volume publications added owner references but never broadened the selector, causing Cilium to block NFS egress and making mounts hang on all nodes except the first. The fix switches from matchLabels to matchExpressions (operator: In) so the selector lists all VM names and is rebuilt whenever owner references change (@mattia-eleuteri in #2227, #2228).
- [dashboard] Fix dashboard authentication failures after secret recreation: Added a secret-hash annotation containing the SHA256 hash of the client secret to the dashboard KeycloakClient resource. Without this annotation, if the dashboard-client Secret was recreated (e.g. after an upgrade or reinstall), the KeycloakClient spec stayed unchanged, the EDP Keycloak operator skipped reconciliation, and Keycloak kept the stale secret, causing dashboard authentication failures. Now any secret value change updates the annotation hash, triggering operator reconciliation and syncing the new secret to Keycloak (@sircthulhu in #2231, #2240).
- [etcd] Fix defrag CronJob accumulating pods during cluster upgrades: Added protective limits to the etcd defragmentation CronJob to prevent job pile-up when etcd is temporarily unavailable during upgrades. Without concurrencyPolicy: Forbid, new jobs kept being created hourly while previous ones were still failing, accumulating hundreds of running/failed pods across tenants. The fix adds concurrencyPolicy: Forbid, a startingDeadlineSeconds: 300 guard against missed schedules, a 30-minute job timeout, and a retry limit of 2 (@sircthulhu in #2233, #2235).
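The selector change in the CiliumNetworkPolicy fix above can be sketched as a function that rebuilds the selector from the current owner references. The label key and the owner-reference shape are hypothetical placeholders; only the switch to matchExpressions with operator In comes from the release note.

```python
def build_endpoint_selector(owner_references, label_key="example.io/vm-name"):
    """Rebuild the selector so that it matches every VM owning the volume.

    label_key is a placeholder; the real policy uses whatever label
    identifies the VM pods.
    """
    names = sorted({ref["name"] for ref in owner_references})
    return {
        "matchExpressions": [
            {"key": label_key, "operator": "In", "values": names}
        ]
    }


owners = [{"name": "vm-b"}, {"name": "vm-a"}]
selector = build_endpoint_selector(owners)
print(selector["matchExpressions"][0]["values"])
```

A matchLabels selector can name only one value per key, which is why the first VM was the only one ever matched; an In expression grows with the owner list.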
Documentation
- [website] Document keycloakInternalUrl platform value: Added documentation for the authentication.oidc.keycloakInternalUrl platform value to the Platform Package Reference, Self-Signed Certificates guide, and Enable OIDC Server pages. This value routes dashboard backend OIDC requests through the internal Keycloak service, which is useful in environments with self-signed certificates (@sircthulhu in cozystack/website#452).
- [website] Publish Cozystack v1.0 release announcement: Added the official Cozystack v1.0 release announcement blog post and supporting images, celebrating the first stable release of the platform (@tym83 in cozystack/website#453, cozystack/website#454).
Full Changelog: v1.0.5...v1.0.6
v1.1.3
Fixes
- [kubernetes] Fix CiliumNetworkPolicy endpointSelector not updated for multi-node RWX volumes: When an NFS-backed RWX volume was published to multiple VMs, the CiliumNetworkPolicy endpointSelector.matchLabels was set only for the first VM and never broadened on subsequent ControllerPublishVolume calls. This caused Cilium to block NFS egress so that mounts hung on all nodes except the first. The selector now uses matchExpressions with operator: In and is rebuilt whenever owner references are added or removed (@mattia-eleuteri in #2227, #2229).
- [dashboard] Fix dashboard-client secret desynchronization with Keycloak after upgrades: When the dashboard-client Kubernetes Secret was recreated with a new value after an upgrade or reinstall, the KeycloakClient spec remained unchanged and the EDP Keycloak operator skipped reconciliation, leaving Keycloak with the stale secret and causing authentication failures for the dashboard. A secret-hash annotation containing the SHA256 hash of the client secret is now added to the KeycloakClient resource; any secret rotation updates the hash in metadata, triggering operator reconciliation and syncing the new secret to Keycloak (@sircthulhu in #2231, #2241).
- [etcd] Fix defrag CronJob accumulating hundreds of pods during cluster upgrades: After upgrading Cozystack, the etcd defrag CronJob could accumulate hundreds of running and failed pods when etcd was temporarily unavailable during the upgrade, because no concurrency or retry limits were configured. Added concurrencyPolicy: Forbid to prevent parallel jobs, startingDeadlineSeconds: 300 to discard missed schedules older than 5 minutes, failedJobsHistoryLimit: 1 to limit failure retention, activeDeadlineSeconds: 1800 for a 30-minute per-job timeout, and backoffLimit: 2 to cap retries (@sircthulhu in #2233, #2234).
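For orientation, the five limits from the etcd defrag fix above sit at two different levels of a Kubernetes CronJob: three on the CronJob spec and two on the Job template. The sketch below builds only that fragment as a Python dict. The hourly cron expression is an assumption, the pod template is omitted, and the function name is hypothetical.

```python
def defrag_cronjob_limits(schedule="0 * * * *"):
    """Partial CronJob spec showing where each protective field lives."""
    return {
        "schedule": schedule,                 # assumed hourly expression
        "concurrencyPolicy": "Forbid",        # never run two defrag jobs at once
        "startingDeadlineSeconds": 300,       # drop schedules missed by more than 5 min
        "failedJobsHistoryLimit": 1,          # keep only the latest failed job
        "jobTemplate": {
            "spec": {
                "activeDeadlineSeconds": 1800,  # 30-minute per-job timeout
                "backoffLimit": 2,              # cap retries
            }
        },
    }


spec = defrag_cronjob_limits()
print(spec["jobTemplate"]["spec"])
```

The first three fields limit how many Jobs the CronJob creates and retains; the last two limit how long each Job may run and retry.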
Documentation
- [website] Document keycloakInternalUrl platform value: Added reference documentation for the authentication.oidc.keycloakInternalUrl platform value to the Platform Package Reference, Self-Signed Certificates guide, and Enable OIDC Server guide, explaining how to route dashboard backend OIDC requests through the internal Keycloak service URL (@sircthulhu in cozystack/website#452).
Full Changelog: v1.1.2...v1.1.3
v1.1.2
Fixes
- [bucket] Fix S3 Manager endpoint mismatch with COSI credentials: The S3 Manager UI previously constructed an s3.<tenant>.<cluster-domain> endpoint even though COSI-issued bucket credentials point to the root-level S3 endpoint. This caused login failures with "invalid credentials" despite valid secrets. The deployment now uses the actual endpoint from BucketInfo, with the old namespace-based endpoint kept only as a fallback before BucketAccess secrets exist (@IvanHunters in #2211, #2215).
- [platform] Fix spurious OpenAPI post-processing errors on cozystack-api startup: The OpenAPI post-processor was being invoked for non-apps.cozystack.io group versions where the base Application* schemas do not exist, producing noisy startup errors on every API server launch. It now skips those non-apps group versions gracefully instead of returning an error (@kvaps in #2212, #2217).
Documentation
- [website] Add troubleshooting for packages stuck in DependenciesNotReady: Added an operations guide that explains how to diagnose missing package dependencies in operator logs and corrected the packages management development docs to use the current make image-packages target (@kvaps in cozystack/website#450).
- [website] Reorder installation docs to install the operator before the platform package: Updated the platform installation guide and tutorial so the setup sequence consistently installs the Cozystack operator first, then prepares and applies the Platform Package, matching the rest of the documentation set (@sircthulhu in cozystack/website#449).
- [website] Add automated installation guide for the Ansible collection: Added a full guide for deploying Cozystack with the cozystack.installer collection, including inventory examples, distro-specific playbooks, configuration reference, and explicit version pinning guidance (@lexfrei in cozystack/website#442).
- [website] Expand monitoring and platform architecture reference docs: Added a tenant custom metrics collection guide for VMServiceScrape and VMPodScrape, and documented PackageSource/Package architecture, reconciliation flow, rollback behavior, and the cozypkg workflow in Key Concepts (@IvanHunters in cozystack/website#444, cozystack/website#445).
- [website] Improve operations guides for CA rotation and Velero backups: Completed the CA rotation documentation with dry-run and post-rotation credential retrieval steps, and expanded the backup configuration guide with concrete examples, verification commands, and clearer operator procedures (@kvaps in cozystack/website#406; @androndo in cozystack/website#440).
Full Changelog: v1.1.1...v1.1.2
v1.0.5
Fixes
- [api] Fix spurious OpenAPI post-processing errors for non-apps group versions: The API server no longer logs false errors while generating OpenAPI specs for core and other non-`apps.cozystack.io` group versions. The post-processor now exits early when the base `Application` schemas are absent, reducing noisy startup logs without affecting application schema generation (@kvaps in #2212, #2216).
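The early-exit behavior can be pictured with the minimal sketch below. It is not the actual cozystack-api code: the function name, the `kinds` parameter, and the way per-kind schemas are derived from the base schema are all assumptions made for illustration.

```python
def post_process(schemas: dict, kinds: list, base: str = "Application") -> dict:
    """Derive per-kind schemas from a base schema, if the base exists.

    Hypothetical sketch: when the base schema is absent (core and other
    non-apps group versions), return the input unchanged instead of
    raising an error.
    """
    if base not in schemas:
        return dict(schemas)  # nothing to post-process for this group version

    result = dict(schemas)
    for kind in kinds:
        # Copy the base schema and label it with the concrete kind.
        result[kind] = dict(schemas[base], title=kind)
    return result
```

The point of the guard is that a missing base schema is an expected condition for non-apps group versions, so it is handled as a no-op and not reported as a failure.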
Documentation
- [website] Add `DependenciesNotReady` troubleshooting and correct packages management build target: Added a troubleshooting guide for packages stuck in `DependenciesNotReady`, including how to inspect operator logs and identify missing dependencies, and fixed the outdated `make image-cozystack` command to `make image-packages` in the packages management guide (@kvaps in cozystack/website#450).
- [website] Clarify operator-first installation order: Reordered the platform installation guide and tutorial so users install the Cozystack operator before preparing and applying the Platform Package, matching the rest of the installation docs and reducing setup confusion during fresh installs (@sircthulhu in cozystack/website#449).
- [website] Add automated installation guide for Ansible: Added end-to-end documentation for deploying Cozystack with the `cozystack.installer` Ansible collection, including inventory examples, distro-specific playbooks, configuration reference, verification steps, and explicit version pinning guidance to help operators automate installs safely (@lexfrei in cozystack/website#442).
- [website] Expand CA rotation operations guide: Completed the CA rotation documentation with separate Talos and Kubernetes certificate rotation procedures, dry-run preview steps, and post-rotation guidance for fetching updated `talosconfig` and `kubeconfig` files after certificate changes (@kvaps in cozystack/website#406).
- [website] Improve backup operations documentation: Enhanced the operator backup and recovery guide with clearer Velero enablement steps, concrete provider and bucket examples, and more useful commands for inspecting backups, schedules, restores, CRD status, and logs (@androndo in cozystack/website#440).
- [website] Add custom metrics collection guide: Added a monitoring guide showing how tenants can expose their own Prometheus exporters through `VMServiceScrape` and `VMPodScrape`, including namespace labeling requirements, example manifests, verification steps, and troubleshooting advice (@IvanHunters in cozystack/website#444).
- [website] Document PackageSource and Package architecture: Added a Key Concepts reference covering `PackageSource` and `Package` reconciliation flow, dependency handling, update propagation, rollback behavior, FluxPlunger recovery, and the `cozypkg` CLI for package management (@IvanHunters in cozystack/website#445).
- [website] Refresh v1 application and platform documentation: Fixed the documentation auto-update flow and published a broad v1 documentation refresh covering newly documented applications, updated naming and navigation, virtualization and platform content updates, and reorganized versioned docs pages (@myasnikovdaniil in cozystack/website#439).
Full Changelog: v1.0.4...v1.0.5
v1.1.1
Fixes
- [dashboard] Fix hidden MarketplacePanel resources appearing in sidebar menu: The sidebar was generated independently from MarketplacePanels, always showing all resources regardless of their `hidden` state. Fixed by fetching MarketplacePanels during sidebar reconciliation and skipping resources where `hidden=true`, so hiding a resource from the marketplace also removes it from the sidebar navigation (@IvanHunters in #2177, #2203).
- [dashboard] Fix disabled/hidden state overwritten on every MarketplacePanel reconciliation: The controller was hardcoding `disabled=false` and `hidden=false` on every reconciliation, silently overwriting any user changes made through the dashboard UI. Fixed by reading and preserving the current `disabled`/`hidden` values from the existing resource before updating (@IvanHunters in #2176, #2201).
- [dashboard] Fix External IPs factory EnrichedTable rendering: The external-IPs table displayed empty rows because the factory used incorrect `EnrichedTable` properties. Replaced `clusterNamePartOfUrl` with `cluster` and changed `pathToItems` from array to dot-path string format, consistent with all other working `EnrichedTable` instances (@IvanHunters in #2175, #2193).
- [platform] Fix VM MAC address not preserved during virtual-machine to vm-instance migration: Kube-OVN reads the MAC address exclusively from the pod annotation `ovn.kubernetes.io/mac_address`, not from the IP resource `spec.macAddress`. Without the annotation, migrated VMs received a new random MAC, breaking OS-level network configurations that match by MAC (e.g. netplan). Added a Helm `lookup` for the Kube-OVN IP resource in the vm-instance chart so that MAC and IP addresses are automatically injected as pod annotations when the resource exists (@sircthulhu in #2169, #2190).
- [etcd-operator] Replace deprecated kube-rbac-proxy image: The `gcr.io/kubebuilder/kube-rbac-proxy` image became unavailable after Google Container Registry was deprecated. Replaced it with `quay.io/brancz/kube-rbac-proxy` from the original upstream author, restoring etcd-operator functionality (@kvaps in #2181, #2182).
- [migrations] Handle missing RabbitMQ CRD in migration 34: Migration 34 failed with an error when the `rabbitmqs.apps.cozystack.io` CRD did not exist, which occurs on clusters where RabbitMQ was never installed. Added a CRD presence check before attempting to list resources so that migration 34 completes cleanly on such clusters (@IvanHunters in #2168, #2180).
- [keycloak] Fix Keycloak crashloop due to misconfigured health probes: Keycloak 26.x redirects all HTTP requests on port 8080 to the configured HTTPS hostname; since kubelet does not follow redirects, liveness and readiness probes failed, causing a crashloop. Fixed by enabling `KC_HEALTH_ENABLED=true`, exposing management port 9000, and switching all probes to `/health/live` and `/health/ready` on port 9000. Also added a `startupProbe` for improved startup tolerance (@mattia-eleuteri in #2162, #2179).
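The two MarketplacePanel fixes above amount to "preserve user-set flags on reconcile" and "filter the sidebar by those flags". The sketch below shows that logic in simplified form. The function names and the plain-dictionary data model are hypothetical and do not mirror the controller's actual types.

```python
from typing import Optional

FLAGS = ("disabled", "hidden")


def reconcile_panel(desired: dict, existing: Optional[dict]) -> dict:
    """Build the panel to write, keeping flags the user already set.

    Hypothetical sketch: flags default to False only when the panel
    does not exist yet.
    """
    result = dict(desired)
    for flag in FLAGS:
        result[flag] = bool(existing.get(flag, False)) if existing else False
    return result


def sidebar_items(resources: list, panels: dict) -> list:
    """Return the resources to show in the sidebar, skipping hidden panels."""
    hidden = {name for name, panel in panels.items() if panel.get("hidden")}
    return [name for name in resources if name not in hidden]
```

For example, a panel that a user has hidden stays hidden after the next reconciliation, and the matching resource no longer appears in the sidebar list.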
Full Changelog: v1.1.0...v1.1.1