Releases · cozystack/cozystack

v1.2.1

Features and Improvements

  • [postgres] Hardcode PostgreSQL 17 for monitoring databases and add migration: The CloudNativePG operator defaults to PostgreSQL 18.3 when no explicit image is specified, but monitoring queries in Grafana and Alerta rely on PostgreSQL 17 features such as pg_stat_checkpointer and the updated pg_stat_bgwriter. This mismatch could break monitoring after fresh installs or database recreation. PostgreSQL 17.7 images are now hardcoded for monitoring databases, and migration 37 sets version v17 on any existing PostgreSQL resources (@IvanHunters in #2304, #2309).

Fixes

  • [platform] Prevent installed packages deletion: Added the helm.sh/resource-policy: keep annotation to all platform packages. Previously, moving a package to disabledPackages or removing it from enabledPackages caused Helm to automatically delete the corresponding resource, contradicting the documented behavior that requires the platform administrator to manually delete packages when needed (@myasnikovdaniil in #2273, #2297).

  • [linstor] Preserve TCP ports during toggle-disk operations: During toggle-disk operations, removeLayerData() freed TCP ports from the number pool and ensureStackDataExists() could then allocate different ports. If a satellite missed the resulting update (e.g. due to a controller restart), it retained the old ports while peers received the new ones, causing DRBD connections to fail with StandAlone state. The fix adds copyDrbdTcpPortsIfExists() which saves existing TCP ports into the LayerPayload before removeLayerData() deletes them (@kvaps in #2292, #2299).

  • [platform] Fix resource allocation ratios not propagated to managed packages: A regression introduced in the bundle restructure caused cpuAllocationRatio, memoryAllocationRatio, and ephemeralStorageAllocationRatio set in platform/values.yaml to become no-ops — they were never written to the cozystack-values Secret that cozy-lib reads in child packages. This meant all managed applications silently used the hardcoded defaults (10, 1, 40) regardless of operator-configured values. The fix restores propagation by writing the ratios into the _cluster section of the cozystack-values Secret and passing cpuAllocationRatio to the KubeVirt Package component (@sircthulhu in #2296, #2301).

  • [linstor] Fix DRBD connectivity failures on kernels without crct10dif by setting verify-alg to crc32c: LINSTOR's auto-verify algorithm selection defaults to crct10dif, but this kernel crypto module is no longer available in newer kernels (e.g. Talos v1.12.6, kernel 6.18.18). When crct10dif is unavailable, DRBD peer connections fail with VERIFYAlgNotAvail: failed to allocate crct10dif for verify, causing all DRBD resources to enter Diskless state and lose quorum. DrbdOptions/Net/verify-alg is now set to crc32c at the controller level (@kvaps in #2303, #2312).

  • [multus] Fix stale sandbox reservations permanently blocking pod creation after CNI ADD failure: After a node disruption (e.g. DRBD or kube-ovn issues during upgrade), containerd accumulated stale sandbox name reservations. Cleanup failed because multus called delegate plugins for DEL without cached state and they rejected the incomplete config, causing DEL to fail instead of succeeding. Stale entries were never released, permanently blocking new pod creation on the affected node. A custom multus-cni image is now built with a patch that returns success from DEL when CNI ADD never completed (@kvaps in #2313, #2314).

  • [multus] Pin master CNI to 05-cilium.conflist to prevent race condition at boot: During node boot or Talos upgrade, multus auto-detects the master CNI conflist by scanning the CNI config directory. If kube-ovn writes 10-kube-ovn.conflist before Cilium writes 05-cilium.conflist, multus selects the wrong file and pods bypass the Cilium chain entirely, have no Cilium endpoint, and their traffic is blocked by cluster-wide network policies. multusMasterCNI is now pinned to 05-cilium.conflist (@kvaps in #2315, #2316).
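
The boot-time race in the last item can be illustrated with a small model. This is only a sketch: it assumes that auto-detection picks the lexicographically first config file present at scan time, and the function name pick_master_cni is hypothetical rather than part of multus.

```python
from typing import Iterable, Optional


def pick_master_cni(conf_files: Iterable[str], pinned: Optional[str] = None) -> Optional[str]:
    """Simplified model of choosing the master CNI config.

    Assumption: auto-detection takes the lexicographically first config
    file that exists at scan time; a pinned name bypasses the scan.
    """
    if pinned is not None:
        return pinned
    candidates = sorted(
        name for name in conf_files if name.endswith((".conf", ".conflist"))
    )
    return candidates[0] if candidates else None


# Early boot: kube-ovn has written its config, Cilium has not yet.
early_boot = ["10-kube-ovn.conflist"]
# Steady state: both files exist.
steady_state = ["10-kube-ovn.conflist", "05-cilium.conflist"]

print(pick_master_cni(early_boot))
print(pick_master_cni(steady_state))
print(pick_master_cni(early_boot, pinned="05-cilium.conflist"))
```

In this model the unpinned scan returns the kube-ovn file during early boot, while the pinned variant always names 05-cilium.conflist regardless of which file was written first.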

Documentation

  • [website] Add custom Keycloak themes documentation: Added documentation for custom Keycloak theme injection to the White Labeling guide, covering the theme image contract (/themes/ directory structure), configuration via the cozystack.keycloak Package resource, imagePullSecrets for private registries, and theme activation in the Keycloak admin console (@lexfrei in cozystack/website#463).

  • [website] Add documentation for Go types usage: Added a guide for using the generated Go types for Cozystack managed applications as a Go module, including installation instructions, programmatic resource management examples, and deployment approaches (@myasnikovdaniil in cozystack/website#465).


Full Changelog: v1.2.0...v1.2.1


v1.1.5

Fixes

  • [platform] Prevent installed packages deletion: Added the helm.sh/resource-policy: keep annotation to all platform packages. Previously, moving a package to disabledPackages or removing it from enabledPackages caused Helm to automatically delete it, contradicting the documented behavior that requires the platform administrator to manually delete packages when needed (@myasnikovdaniil in #2273, #2298).

  • [linstor] Fix TCP port mismatches after toggle-disk operations causing DRBD resources to enter StandAlone state: During toggle-disk operations, removeLayerData() freed TCP ports from the number pool and ensureStackDataExists() could then allocate different ports. If a satellite missed the resulting update (e.g. due to a controller restart), it retained the old ports while peers received the new ones, causing DRBD connections to fail with StandAlone state. The fix introduces copyDrbdTcpPortsIfExists(), which preserves existing TCP ports in the LayerPayload before removeLayerData() releases them (@kvaps in #2292, #2300).


Full Changelog: v1.1.4...v1.1.5


v1.1.4

Features and Improvements

  • [boot-to-talos] Add support for ISO, RAW, and HTTP image sources: The boot-to-talos tool can now use ISO files, raw disk images, and HTTP URLs as Talos image sources in addition to container registry images. This allows bootstrapping nodes in air-gapped environments or from locally stored images without requiring a container registry (@lexfrei in cozystack/boot-to-talos#13).

  • [boot-to-talos] Use permanent MAC address for predictable network interface names: Interface name detection now reads the permanent MAC address directly from sysfs instead of relying on udev data, providing a stable hardware MAC that is unaffected by user modifications to the active MAC address. This makes network interface naming more reliable across reboots and hardware changes (@IvanHunters in cozystack/boot-to-talos#14).

Fixes

  • [dashboard] Fix broken backup menu links missing cluster context: Backup resources (plans, backupjobs, backups) are not ApplicationDefinitions, so ensureNavigation() never created their baseFactoriesMapping entries. Without these entries the OpenUI frontend could not resolve the {cluster} context for backup pages, producing broken sidebar links with an empty cluster segment (e.g. /openapi-ui//tenant-root/...). The missing baseFactoriesMapping entries for all backup resource types are now added to the static Navigation resource (@sircthulhu in #2232, #2269).

  • [platform] Fix tenant admins unable to create FoundationDB, Harbor, MongoDB, OpenBAO, OpenSearch, Qdrant, and VPN applications: The cozy:tenant:admin:base ClusterRole was missing seven application resources from apps.cozystack.io (foundationdbs, harbors, mongodbs, openbaos, opensearches, qdrants, vpns). Without these permissions, tenant admins could not create these applications — the "Add" button was inactive in the dashboard. The missing resources have been added to the ClusterRole (@sircthulhu in #2268, #2272).

  • [dashboard] Fix StorageClass dropdown showing "Error" in application forms: The dashboard UI fetches StorageClass resources to populate dropdowns (e.g. in the Postgres form), but the cozystack-dashboard-readonly ClusterRole did not include storage.k8s.io/storageclasses. This caused authenticated users to see "Error" instead of the StorageClass name. get/list/watch permissions for storageclasses have been added to the dashboard readonly role (@sircthulhu in #2267, #2274).

  • [system] Fix 403 error on Service details page by granting tenants read access to EndpointSlices: The dashboard requested EndpointSlices from the discovery.k8s.io API group to display the "Pod serving" section on the Service details page, but cozy:tenant:base and cozy:tenant:view:base ClusterRoles lacked permissions for this resource. Tenant users received a 403 error when opening the Service details page. get/list/watch permissions for endpointslices have been added to both tenant ClusterRoles (@sircthulhu in #2257, #2285).

  • [dashboard] Fix "Pod serving" table displaying "Raw:" and "Invalid Date" on Service details page: The Service details page EndpointSlice table showed "Raw:" prefixes and "Invalid Date" values because the EnrichedTable referenced customizationId factory-kube-service-details-endpointslice which had no corresponding CustomColumnsOverride. Column definitions for Pod (.targetRef.name), Addresses (.addresses), Ready (.conditions.ready), and Node (.nodeName) have been added (@sircthulhu in #2266, #2283).

  • [piraeus-operator] Fix LINSTOR satellite alert labels, reduce scrape-flap false positives, and improve controller alerting: Three alerting issues in cozy-piraeus-operator have been addressed: (1) linstorSatelliteErrorRate used a non-existent name label in annotations, resulting in Satellite "" in alert notifications (corrected to {{ $labels.hostname }}); (2) linstorSatelliteErrorRate could produce false positives when the linstor-controller scrape flapped and historical linstor_error_reports_count counters reappeared inside the alert window (fixed by adding a minimum scrape-count guard); (3) the LinstorControllerOffline alert has been split into separate availability and metrics-availability alerts with a configurable hold time to reduce noise during brief connectivity interruptions (@sasha-sup in #2265, #2286).

  • [linstor] Fix swapped VMPodScrape job labels causing incorrect controller offline alerts: The cozy-linstor VictoriaMetrics VMPodScrape templates had the job relabeling rules swapped: linstor-satellite metrics were labeled as job=linstor-controller and vice versa. This caused linstorControllerOffline alerts to fire for satellite endpoints (:9942) while reporting that the controller was unreachable. The job labels are now correctly assigned to their respective targets (@sasha-sup in #2264, #2289).

  • [boot-to-talos] Fix triple-fault on hosts with 5-level paging (LA57) enabled: On hosts with CONFIG_X86_5LEVEL=y in the kernel, kexec into Talos caused a triple-fault because the Talos kernel does not support 5-level page tables. boot-to-talos now detects LA57 before kexec and automatically patches GRUB with no5lvl, runs update-grub, and reboots. After reboot with 5-level paging disabled, boot-to-talos proceeds normally (@IvanHunters in cozystack/boot-to-talos#15).

  • [boot-to-talos] Fix EFI boot entry creation when using loop device images: The Talos installer skips EFI variable creation when running on loop devices. Instead of relying on the installer, boot-to-talos now reads the GPT partition table from the target disk after the image copy and creates a proper UEFI boot entry with an HD() device path pointing to the real target disk's ESP (@kvaps in cozystack/boot-to-talos#16).

  • [talm] Fix silent empty output when no template files are specified: Running talm template without --file or --template flags previously produced minimal or empty output without any error. Validation has been added to engine.Render to return a clear error message when no template files are specified, making misconfigured invocations immediately apparent (@kvaps in cozystack/talm#112).
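
The LA57 fix in the list above combines a detection step with a kernel command-line change. A minimal sketch of both steps follows, assuming the la57 CPU flag in /proc/cpuinfo is an acceptable signal. The notes do not say which signal boot-to-talos actually reads, and has_la57 and add_no5lvl are hypothetical helper names.

```python
def has_la57(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists la57.

    Assumption: the la57 flag is used as the indicator of 5-level paging.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "la57" in value.split():
            return True
    return False


def add_no5lvl(cmdline: str) -> str:
    """Append no5lvl to a kernel command line unless it is already present."""
    params = cmdline.split()
    if "no5lvl" not in params:
        params.append("no5lvl")
    return " ".join(params)


sample = "processor\t: 0\nflags\t\t: fpu vme la57 sse2\n"
if has_la57(sample):
    print(add_no5lvl("quiet splash"))
```

On a real host the text would come from reading /proc/cpuinfo, and the resulting command line would still have to be written into the GRUB configuration before running update-grub and rebooting, as the release note describes.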

Documentation

  • [website] Add documentation for VMInstance and VMDisk backups: Added a new virtualization-focused Backup and Recovery guide covering one-off and scheduled backups for VMInstance and VMDisk resources, restore procedures, status verification commands, and troubleshooting notes including Velero-related issues (@myasnikovdaniil in cozystack/website#456).

  • [website] Update developer guide with operator-driven architecture and OCIRepository migration flow: Rewrote the development guide to describe the operator-driven in-cluster architecture, bootstrap flow, operator responsibilities, and the platform install/update sequence. Added an "OCIRepositories and Migration Flow" section with migration hook examples and sequencing rules for pre-upgrade hooks (@myasnikovdaniil in cozystack/website#458).


Full Changelog: v1.1.3...v1.1.4


v1.0.7

Fixes

  • [platform] Fix tenant admins unable to create FoundationDB, Harbor, MongoDB, OpenBAO, OpenSearch, Qdrant, and VPN applications: The cozy:tenant:admin:base ClusterRole was missing RBAC entries for foundationdbs, harbors, mongodbs, openbaos, opensearches, qdrants, and vpns resources from apps.cozystack.io. Without these permissions, tenant admins could not create these applications, and the "Add" button was inactive in the dashboard. The fix adds the seven missing resources to the ClusterRole (@sircthulhu in #2268, #2271).

  • [system] Fix 403 error on Service details page for tenant users: The cozy:tenant:base and cozy:tenant:view:base ClusterRoles were missing read permissions for discovery.k8s.io/endpointslices. The dashboard requests EndpointSlices to display the "Pod serving" section on the Service details page, and without this permission tenant users received a 403 error. The fix adds get, list, and watch verbs for endpointslices to both tenant roles (@sircthulhu in #2257, #2284).

  • [dashboard] Fix "Pod serving" table showing "Raw:" prefixes and "Invalid Date" on Service details page: The EndpointSlice table on the service details page displayed raw data and broken timestamps because the EnrichedTable component referenced the factory-kube-service-details-endpointslice customization ID which had no corresponding CustomColumnsOverride. The fix adds column definitions for Pod (.targetRef.name), Addresses (.addresses), Ready (.conditions.ready), and Node (.nodeName) (@sircthulhu in #2266, #2282).

  • [dashboard] Fix broken backup menu links missing cluster context: Backup resources (plans, backupjobs, backups) are not ApplicationDefinitions, so ensureNavigation() never created their baseFactoriesMapping entries. Without these mappings, the OpenUI frontend could not resolve the {cluster} context for backup pages, producing broken sidebar links with an empty cluster segment (e.g. /openapi-ui//tenant-root/... instead of /openapi-ui/default/tenant-root/...). The fix adds the three missing static entries to the Navigation resource (@sircthulhu in #2232, #2270).

  • [linstor] Fix swapped VMPodScrape job labels causing incorrect alerts: The job labels in the cozy-linstor VictoriaMetrics VMPodScrape templates were swapped: linstor-satellite metrics were relabeled as job=linstor-controller and vice versa. This caused linstorControllerOffline alerts to fire against satellite endpoints (:9942) while reporting the controller as unreachable. The fix ensures linstor-satellite metrics keep job=linstor-satellite and linstor-controller metrics keep job=linstor-controller, restoring consistent alerting and dashboard semantics (@sasha-sup in #2264, #2288).

  • [piraeus-operator] Fix LINSTOR satellite alert annotations and reduce false-positive alerts: Two issues in the LINSTOR alerts shipped by cozy-piraeus-operator were fixed. First, linstorSatelliteErrorRate used a non-existent name label in annotations, resulting in Satellite "" in alert notifications — corrected to use {{ $labels.hostname }}. Second, linstorSatelliteErrorRate produced false positives when the linstor-controller scrape flapped and historical linstor_error_reports_count counters reappeared inside the alert window — fixed by requiring stable up{job="linstor-controller"} for the full 15-minute window. Additionally, the controller availability alert was split to add a dedicated warning for metrics scrape failures with a 10-minute hold time to reduce transient noise (@sasha-sup in #2265, #2287).

Documentation

  • [website] Add Backup and Recovery guide for VMInstance and VMDisk: Replaced the generic Kubernetes Backup and Recovery guide with a virtualization-focused Backup and Recovery doc covering VMInstance and VMDisk one-off and scheduled backups, restores, status checks, and troubleshooting (including Velero-related notes) (@myasnikovdaniil in cozystack/website#456).

  • [website] Update developer guide with operator-driven architecture and OCIRepository/migration flow: Rewrote the development guide to describe the operator-driven in-cluster architecture, bootstrap flow, operator responsibilities, and platform install/update sequence. Added documentation for OCIRepositories and the migration flow with migration hook examples and sequencing rules for pre-upgrade/install migrations. Also updated the concepts guide with the two-repository update model, dependency ordering rules, namespace creation behavior, and cluster-wide values injection (@myasnikovdaniil in cozystack/website#458).


Full Changelog: v1.0.6...v1.0.7


v1.2.0

Cozystack v1.2.0

Cozystack v1.2.0 delivers significant platform enhancements: a fully managed OpenSearch service joining the application catalog, VPC peering for secure inter-tenant networking, tenant workload placement control via the new SchedulingClass system, a highly-available VictoriaLogs cluster replacing the single-node setup, and Linstor volume relocation for optimized clone and snapshot restore placement. Additional highlights include external-dns as a standalone extra package, multi-node RWX volume fixes, and a wave of dashboard and monitoring improvements.

Feature Highlights

OpenSearch: Managed Search and Analytics Service

Cozystack now ships OpenSearch as a fully managed PaaS application — supporting OpenSearch v1, v2, and v3 in a multi-role topology with dedicated master, data, ingest, coordinating, and ML nodes. TLS is enabled by default, HTTP Basic auth is provided out of the box, and custom user definitions allow per-application credentials. The optional OpenSearch Dashboards UI can be enabled alongside the engine. External access, topology spread policies, and a comprehensive JSON schema are all included.

A companion opensearch-operator system package wraps the upstream Opster OpenSearch Operator v2.8.0 and adds a sysctl DaemonSet to configure the required vm.max_map_count kernel parameter on every node automatically. An ApplicationDefinition package ties everything into the Cozystack platform dashboard with schema validation and resource management.

SchedulingClass: Tenant Workload Placement

Cozystack now supports a SchedulingClass CRD that allows platform operators to define cluster-wide scheduling constraints — pinning tenant workloads to specific data centers, hardware generations, or node groups without requiring tenants to manage scheduler configuration themselves. Tenants declare a schedulingClass in their Tenant spec; the platform injects the appropriate schedulerName into all workloads in that namespace.

The lineage-controller-webhook has been extended to verify the referenced SchedulingClass CR before injection, and child tenants inherit their parent's scheduling constraints (children cannot override). A SchedulingClass dropdown in the Tenant creation form in the dashboard makes the feature fully self-service. The underlying cozystack-scheduler — a custom kube-scheduler extension with SchedulingClass-aware affinity plugins — is now installed and enabled by default as part of the platform.
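
The injection step can be pictured as a small mutation applied to each pod. The sketch below is illustrative only: the scheduling.cozystack.io/class namespace label is named in these notes, but the scheduler name string, the behavior for an unknown class, and the function inject_scheduler are assumptions.

```python
import copy

CLASS_LABEL = "scheduling.cozystack.io/class"  # namespace label named in these notes
SCHEDULER_NAME = "cozystack-scheduler"  # assumption: the exact schedulerName value


def inject_scheduler(pod: dict, namespace_labels: dict, known_classes: set) -> dict:
    """Return a copy of the pod with schedulerName set when the namespace
    references an existing SchedulingClass.

    Assumption: pods are left untouched when the class is missing or unknown.
    """
    scheduling_class = namespace_labels.get(CLASS_LABEL)
    if scheduling_class is None or scheduling_class not in known_classes:
        return pod
    patched = copy.deepcopy(pod)
    patched.setdefault("spec", {})["schedulerName"] = SCHEDULER_NAME
    return patched


pod = {"metadata": {"name": "web-0"}, "spec": {"containers": []}}
labels = {CLASS_LABEL: "dc-east"}  # hypothetical class name
print(inject_scheduler(pod, labels, {"dc-east"})["spec"]["schedulerName"])
```

The original pod object is not modified, which keeps the sketch easy to test; a real admission webhook would return a patch instead.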

VPC Peering for Multi-Tenant Environments

The vpc application gains bilateral VPC peering using Kube-OVN's native vpcPeerings mechanism, allowing tenants to securely interconnect their private networks without routing traffic through public endpoints. Peering link-local IPs (169.254.0.0/16) are allocated deterministically from a hash of the sorted VPC pair names, ensuring stable addresses across reconciliations. Static route support (staticRoutes) enables fine-grained inter-VPC routing policies. A cozy-lib helper (hexToInt) performs the deterministic IP allocation, and a JSON Schema validation enforces the ^tenant- namespace pattern for peered VPCs.
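
The deterministic allocation can be sketched as follows. This is not the cozy-lib implementation: the hash function (SHA-256), the separator, and the way the digest is mapped onto the range are assumptions chosen only to show why sorting the pair names yields the same addresses on both sides.

```python
import hashlib
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def peering_ips(vpc_a: str, vpc_b: str) -> dict:
    """Derive a stable pair of link-local addresses for two peered VPCs.

    Sorting the names first makes the result independent of argument order.
    """
    first, second = sorted((vpc_a, vpc_b))
    digest = hashlib.sha256(f"{first}|{second}".encode("utf-8")).hexdigest()
    # Split the /16 into adjacent address pairs and skip the first and last pair.
    usable_pairs = LINK_LOCAL.num_addresses // 2 - 2
    pair_index = 1 + int(digest[:8], 16) % usable_pairs
    base = int(LINK_LOCAL.network_address) + 2 * pair_index
    return {
        first: str(ipaddress.ip_address(base)),
        second: str(ipaddress.ip_address(base + 1)),
    }


print(peering_ips("tenant-a/vpc-one", "tenant-b/vpc-two"))
print(peering_ips("tenant-b/vpc-two", "tenant-a/vpc-one"))  # same mapping
```

A scheme like this does not rule out two different pairs hashing to the same addresses; how the real helper handles that case is not described in these notes.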

VictoriaLogs: Clustered Mode for High Availability

The platform's log storage has been upgraded from the deprecated single-node VLogs CR to a VLCluster deployment with separate vlinsert, vlselect, and vlstorage components, each running with 2 replicas by default — consistent with the existing VMCluster setup. This brings horizontal scalability and resilience to the logging tier. VPA autoscaling is enabled for all VLCluster components, and the victoria-metrics-operator has been upgraded from v0.55.0 to v0.68.1 to add VLCluster CRD support.

Linstor CSI: Volume Relocation After Clone and Restore

The Linstor CSI driver now carries upstream patches enabling automatic replica relocation after PVC clone and snapshot restore operations. Two new parameters control the behavior: linstor.csi.linbit.com/relocateAfterClone on StorageClasses moves replicas to optimal nodes after a clone, and snap.linstor.csi.linbit.com/relocate-after-restore on VolumeSnapshotClasses does the same after a restore. VolumeSnapshotClasses for Velero and Kasten use cases are pre-configured. This enables full PVC-level backup and restore workflows with automatic data rebalancing, a key prerequisite for production Velero/Kasten integrations.
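
As a rough illustration, the two parameters could appear in manifests like the ones built below. The parameter keys come from the text above; the object names, the "true" values, and the deletionPolicy are assumptions, and the manifests are printed as JSON (which is also valid YAML) rather than taken from the shipped charts.

```python
import json

LINSTOR_DRIVER = "linstor.csi.linbit.com"

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "replicated-relocate"},  # hypothetical name
    "provisioner": LINSTOR_DRIVER,
    "parameters": {
        # Key from the release notes; the value "true" is an assumption.
        "linstor.csi.linbit.com/relocateAfterClone": "true",
    },
}

snapshot_class = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshotClass",
    "metadata": {"name": "linstor-relocate"},  # hypothetical name
    "driver": LINSTOR_DRIVER,
    "deletionPolicy": "Delete",  # assumption
    "parameters": {
        # Key from the release notes; the value "true" is an assumption.
        "snap.linstor.csi.linbit.com/relocate-after-restore": "true",
    },
}

print(json.dumps([storage_class, snapshot_class], indent=2))
```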

Major Features and Improvements

  • [apps] Add managed OpenSearch service: Deployed as a PaaS application supporting OpenSearch v1/v2/v3 with multi-role node topology, TLS, HTTP Basic auth, custom users, optional OpenSearch Dashboards UI, external access, and topology spread policies; backed by the Opster OpenSearch Operator v2.8.0 and a sysctl DaemonSet for vm.max_map_count (@matthieu-robin in #1953).

  • [vpc] Add VPC peering support for multi-tenant environments: Bilateral VPC peering via Kube-OVN's vpcPeerings, deterministic link-local IP allocation from sorted VPC pair hash, static routes support, ConfigMap peer discovery enrichment, and JSON Schema validation enforcing ^tenant- namespace pattern (@mattia-eleuteri in #2152).

  • [monitoring] Migrate VictoriaLogs from VLogs to VLCluster: Replaced deprecated single-node VLogs CR with clustered VLCluster (vlinsert/vlselect/vlstorage, 2 replicas each), added VPA for all components, upgraded victoria-metrics-operator to v0.68.1 (@sircthulhu in #2153).

  • [scheduler] Integrate SchedulingClass support for tenant workloads: Added schedulingClass Tenant parameter with inheritance enforcement, scheduling.cozystack.io/class namespace label, lineage-webhook extension to verify and inject schedulerName, SchedulingClass dropdown in Tenant dashboard form (@sircthulhu in #2223).

  • [cozystack-scheduler] Add custom scheduler as an optional system package: Vendored cozystack-scheduler from github.com/cozystack/cozystack-scheduler — a kube-scheduler extension with SchedulingClass-aware affinity plugins, including Helm chart with RBAC, ConfigMap, Deployment, and CRD (@lllamnyp in #2205).

  • [platform] Enable cozystack-scheduler by default: The cozystack-scheduler and SchedulingClass CRD are now installed as default system packages; the backup tool has been moved to optional packages (@lllamnyp in #2253).

  • [extra] Add external-dns as a standalone extra package: Packaged external-dns as an installable extra (tenant-level) component for automatic DNS record management from Kubernetes Service and Ingress resources (@mattia-eleuteri in #1988).

  • [linstor] Add linstor-csi patches for clone/snapshot relocation: New patch enabling relocateAfterClone StorageClass parameter and relocate-after-restore VolumeSnapshotClass parameter; pre-configured VolumeSnapshotClasses for Velero and relocation workflows; CDI switched to csi-clone strategy (@kvaps in #2133).

  • [monitoring] Add inlineScrapeConfig support to tenant vmagent: Tenants can now define inline scrape configurations directly in their VMAgent spec, enabling custom metrics collection from services that are not discoverable via standard Kubernetes service discovery (@mattia-eleuteri in #2200).

  • [monitoring] Add Slack dashboard URL, vmagent environment label, and dynamictext Grafana plugin: Added SLACK_DASHBOARD_URL and SLACK_SUMMARY_FMT environment variables for richer alert notifications, per-vmagent environment label for metric source identification, and the dynamictext-panel plugin for Grafana dashboards (@vnyakas in #2210).

  • [monitoring] Scope infrastructure dashboards to tenant-root only: Infrastructure-level Grafana dashboards are now scoped to the tenant-root namespace only, preventing them from appearing in tenant sub-namespaces and reducing dashboard noise (@mattia-eleuteri in #2197).

  • [tenant] Allow egress to virt-handler for VM metrics scraping: Extended tenant NetworkPolicy to permit egress to virt-handler pods, enabling Prometheus to scrape VM-level metrics from KubeVirt without additional policy exceptions (@mattia-eleuteri in #2199).

  • [dashboard] Add keycloakInternalUrl for backend-to-backend OIDC requests: Added a keycloakInternalUrl platform value for the dashboard backend to perform OIDC token introspection via an internal cluster URL, avoiding external round-trips and improving reliability in air-gapped environments (@sircthulhu in #2224).

  • [dashboard] Add secret-hash annotation to KeycloakClient for secret sync: Added a secret-hash annotation to the KeycloakClient resource so that changes to the client secret trigger automatic reconciliation and propagation to dependent components (@sircthulhu in #2231).

  • [docs] Add OpenAPI and Go types code generation for apps: Added tooling to generate OpenAPI schemas and Go types from Helm chart values, enabling type-safe programmatic access to managed application configurations and automatic API reference generation (@myasnikovdaniil in #2214).
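
The secret-hash annotation on KeycloakClient listed above works because a changed annotation changes the object, which in turn triggers reconciliation. A minimal sketch of computing such an annotation is shown below, assuming a SHA-256 hex digest of the secret value; the bare annotation key and the function name are illustrative.

```python
import hashlib


def secret_hash_annotation(client_secret: str) -> dict:
    """Build an annotation whose value changes whenever the secret changes."""
    digest = hashlib.sha256(client_secret.encode("utf-8")).hexdigest()
    return {"secret-hash": digest}  # assumption: the real key may carry a prefix


before = secret_hash_annotation("old-secret-value")
after = secret_hash_annotation("new-secret-value")
print(before != after)
```

Only the digest is stored in metadata, so the secret value itself is not exposed through the annotation.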

Improvements (minor)

  • [cozystack-scheduler] Update to v0.2.0: Updated the cozystack-scheduler to v0.2.0 with improved SchedulingClass affinity handling (@lllamnyp in #2244).

  • [platform] Ensure cozystack-packages OCIRepository updates reliably: Added configuration to...


v1.0.6

Fixes

  • [kubernetes] Fix CiliumNetworkPolicy endpointSelector for multi-node RWX volumes: When an NFS-backed RWX volume was published to multiple VMs, the network policy's endpointSelector captured only the first VM. Subsequent volume publications added owner references but never broadened the selector, causing Cilium to block NFS egress and making mounts hang on all nodes except the first. The fix switches from matchLabels to matchExpressions (operator: In) so the selector lists all VM names and is rebuilt whenever owner references change (@mattia-eleuteri in #2227, #2228).

  • [dashboard] Fix dashboard authentication failures after secret recreation: Added a secret-hash annotation containing the SHA256 hash of the client secret to the dashboard KeycloakClient resource. Without this annotation, if the dashboard-client Secret was recreated (e.g. after an upgrade or reinstall), the KeycloakClient spec stayed unchanged, the EDP Keycloak operator skipped reconciliation, and Keycloak kept the stale secret — causing dashboard authentication failures. Now any secret value change updates the annotation hash, triggering operator reconciliation and syncing the new secret to Keycloak (@sircthulhu in #2231, #2240).

  • [etcd] Fix defrag CronJob accumulating pods during cluster upgrades: Added protective limits to the etcd defragmentation CronJob to prevent job pile-up when etcd is temporarily unavailable during upgrades. Without concurrencyPolicy: Forbid, new jobs kept being created hourly while previous ones were still failing, accumulating hundreds of running/failed pods across tenants. The fix adds concurrencyPolicy: Forbid, a startingDeadlineSeconds: 300 guard against missed schedules, a 30-minute job timeout, and limits retries to 2 (@sircthulhu in #2233, #2235).
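
The selector change in the first fix can be sketched as a function that rebuilds the selector from the current owner references. The label key and the shape of the owner reference entries are assumptions made for illustration; only the move to matchExpressions with operator In is taken from the notes.

```python
def build_endpoint_selector(owner_references: list, label_key: str = "vm.kubevirt.io/name") -> dict:
    """Rebuild the endpointSelector so that it matches every VM that
    currently has the volume published.

    Assumption: each owner reference name equals the VM label value, and
    label_key is only a plausible placeholder.
    """
    vm_names = sorted({ref["name"] for ref in owner_references})
    return {
        "matchExpressions": [
            {"key": label_key, "operator": "In", "values": vm_names},
        ]
    }


owners = [{"name": "vm-b"}, {"name": "vm-a"}]
print(build_endpoint_selector(owners))
```

Calling the function again after an owner reference is added or removed yields a selector that reflects the new set, which is the property the matchLabels form lacked.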

Documentation

  • [website] Document keycloakInternalUrl platform value: Added documentation for the authentication.oidc.keycloakInternalUrl platform value to the Platform Package Reference, Self-Signed Certificates guide, and Enable OIDC Server pages. This value routes dashboard backend OIDC requests through the internal Keycloak service, which is useful in environments with self-signed certificates (@sircthulhu in cozystack/website#452).

  • [website] Publish Cozystack v1.0 release announcement: Added the official Cozystack v1.0 release announcement blog post and supporting images, celebrating the first stable release of the platform (@tym83 in cozystack/website#453, cozystack/website#454).


Full Changelog: v1.0.5...v1.0.6


v1.1.3

Fixes

  • [kubernetes] Fix CiliumNetworkPolicy endpointSelector not updated for multi-node RWX volumes: When an NFS-backed RWX volume was published to multiple VMs, the CiliumNetworkPolicy endpointSelector.matchLabels was set only for the first VM and never broadened on subsequent ControllerPublishVolume calls. This caused Cilium to block NFS egress so that mounts hung on all nodes except the first. The selector now uses matchExpressions with operator: In and is rebuilt whenever owner references are added or removed (@mattia-eleuteri in #2227, #2229).

  • [dashboard] Fix dashboard-client secret desynchronization with Keycloak after upgrades: When the dashboard-client Kubernetes Secret was recreated with a new value after an upgrade or reinstall, the KeycloakClient spec remained unchanged and the EDP Keycloak operator skipped reconciliation, leaving Keycloak with the stale secret and causing authentication failures for the dashboard. A secret-hash annotation containing the SHA256 hash of the client secret is now added to the KeycloakClient resource; any secret rotation updates the hash in metadata, triggering operator reconciliation and syncing the new secret to Keycloak (@sircthulhu in #2231, #2241).

  • [etcd] Fix defrag CronJob accumulating hundreds of pods during cluster upgrades: After upgrading Cozystack, the etcd defrag CronJob could accumulate hundreds of running and failed pods when etcd was temporarily unavailable during the upgrade, because no concurrency or retry limits were configured. Added concurrencyPolicy: Forbid to prevent parallel jobs, startingDeadlineSeconds: 300 to discard missed schedules older than 5 minutes, failedJobsHistoryLimit: 1 to limit failure retention, activeDeadlineSeconds: 1800 for a 30-minute per-job timeout, and backoffLimit: 2 to cap retries (@sircthulhu in #2233, #2234).
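
The limits from the etcd fix map onto standard CronJob and Job fields. The sketch below places them at the levels where Kubernetes expects them; the schedule expression and the empty pod template are placeholders, not values from the chart.

```python
def defrag_cronjob_spec(pod_template: dict, schedule: str = "0 * * * *") -> dict:
    """Build a CronJob spec carrying the protective limits from the fix.

    Assumption: the schedule default is only a placeholder.
    """
    return {
        "schedule": schedule,
        "concurrencyPolicy": "Forbid",   # never run two defrag jobs in parallel
        "startingDeadlineSeconds": 300,  # drop runs missed by more than 5 minutes
        "failedJobsHistoryLimit": 1,     # keep a single failed job for inspection
        "jobTemplate": {
            "spec": {
                "activeDeadlineSeconds": 1800,  # 30-minute timeout per job
                "backoffLimit": 2,              # at most two retries
                "template": pod_template,
            }
        },
    }


spec = defrag_cronjob_spec({"spec": {"restartPolicy": "Never", "containers": []}})
print(spec["concurrencyPolicy"], spec["jobTemplate"]["spec"]["backoffLimit"])
```

Note that activeDeadlineSeconds and backoffLimit belong to the Job spec under jobTemplate, while the other three settings sit on the CronJob spec itself.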

Documentation

  • [website] Document keycloakInternalUrl platform value: Added reference documentation for the authentication.oidc.keycloakInternalUrl platform value to the Platform Package Reference, Self-Signed Certificates guide, and Enable OIDC Server guide, explaining how to route dashboard backend OIDC requests through the internal Keycloak service URL (@sircthulhu in cozystack/website#452).

Full Changelog: v1.1.2...v1.1.3


v1.1.2

Fixes

  • [bucket] Fix S3 Manager endpoint mismatch with COSI credentials: The S3 Manager UI previously constructed an s3.<tenant>.<cluster-domain> endpoint even though COSI-issued bucket credentials point to the root-level S3 endpoint. This caused login failures with "invalid credentials" despite valid secrets. The deployment now uses the actual endpoint from BucketInfo, with the old namespace-based endpoint kept only as a fallback before BucketAccess secrets exist (@IvanHunters in #2211, #2215).

  • [platform] Fix spurious OpenAPI post-processing errors on cozystack-api startup: The OpenAPI post-processor was being invoked for non-apps.cozystack.io group versions where the base Application* schemas do not exist, producing noisy startup errors on every API server launch. It now skips those non-apps group versions gracefully instead of returning an error (@kvaps in #2212, #2217).

Documentation

  • [website] Add troubleshooting for packages stuck in DependenciesNotReady: Added an operations guide that explains how to diagnose missing package dependencies in operator logs and corrected the packages management development docs to use the current make image-packages target (@kvaps in cozystack/website#450).

  • [website] Reorder installation docs to install the operator before the platform package: Updated the platform installation guide and tutorial so the setup sequence consistently installs the Cozystack operator first, then prepares and applies the Platform Package, matching the rest of the documentation set (@sircthulhu in cozystack/website#449).

  • [website] Add automated installation guide for the Ansible collection: Added a full guide for deploying Cozystack with the cozystack.installer collection, including inventory examples, distro-specific playbooks, configuration reference, and explicit version pinning guidance (@lexfrei in cozystack/website#442).

  • [website] Expand monitoring and platform architecture reference docs: Added a tenant custom metrics collection guide for VMServiceScrape and VMPodScrape, and documented PackageSource/Package architecture, reconciliation flow, rollback behavior, and the cozypkg workflow in Key Concepts (@IvanHunters in cozystack/website#444, cozystack/website#445).

  • [website] Improve operations guides for CA rotation and Velero backups: Completed the CA rotation documentation with dry-run and post-rotation credential retrieval steps, and expanded the backup configuration guide with concrete examples, verification commands, and clearer operator procedures (@kvaps in cozystack/website#406; @androndo in cozystack/website#440).


Full Changelog: v1.1.1...v1.1.2

v1.0.5

Fixes

  • [api] Fix spurious OpenAPI post-processing errors for non-apps group versions: The API server no longer logs false errors while generating OpenAPI specs for core and other non-apps.cozystack.io group versions. The post-processor now exits early when the base Application schemas are absent, reducing noisy startup logs without affecting application schema generation (@kvaps in #2212, #2216).

Documentation

  • [website] Add DependenciesNotReady troubleshooting and correct packages management build target: Added a troubleshooting guide for packages stuck in DependenciesNotReady, including how to inspect operator logs and identify missing dependencies, and fixed the outdated make image-cozystack command to make image-packages in the packages management guide (@kvaps in cozystack/website#450).

  • [website] Clarify operator-first installation order: Reordered the platform installation guide and tutorial so users install the Cozystack operator before preparing and applying the Platform Package, matching the rest of the installation docs and reducing setup confusion during fresh installs (@sircthulhu in cozystack/website#449).

  • [website] Add automated installation guide for Ansible: Added end-to-end documentation for deploying Cozystack with the cozystack.installer Ansible collection, including inventory examples, distro-specific playbooks, configuration reference, verification steps, and explicit version pinning guidance to help operators automate installs safely (@lexfrei in cozystack/website#442).

  • [website] Expand CA rotation operations guide: Completed the CA rotation documentation with separate Talos and Kubernetes certificate rotation procedures, dry-run preview steps, and post-rotation guidance for fetching updated talosconfig and kubeconfig files after certificate changes (@kvaps in cozystack/website#406).

  • [website] Improve backup operations documentation: Enhanced the operator backup and recovery guide with clearer Velero enablement steps, concrete provider and bucket examples, and more useful commands for inspecting backups, schedules, restores, CRD status, and logs (@androndo in cozystack/website#440).

  • [website] Add custom metrics collection guide: Added a monitoring guide showing how tenants can expose their own Prometheus exporters through VMServiceScrape and VMPodScrape, including namespace labeling requirements, example manifests, verification steps, and troubleshooting advice (@IvanHunters in cozystack/website#444).

  • [website] Document PackageSource and Package architecture: Added a Key Concepts reference covering PackageSource and Package reconciliation flow, dependency handling, update propagation, rollback behavior, FluxPlunger recovery, and the cozypkg CLI for package management (@IvanHunters in cozystack/website#445).

  • [website] Refresh v1 application and platform documentation: Fixed the documentation auto-update flow and published a broad v1 documentation refresh covering newly documented applications, updated naming and navigation, virtualization and platform content updates, and reorganized versioned docs pages (@myasnikovdaniil in cozystack/website#439).


Full Changelog: v1.0.4...v1.0.5

v1.1.1

Fixes

  • [dashboard] Fix hidden MarketplacePanel resources appearing in sidebar menu: The sidebar was generated independently from MarketplacePanels, always showing all resources regardless of their hidden state. Fixed by fetching MarketplacePanels during sidebar reconciliation and skipping resources where hidden=true, so hiding a resource from the marketplace also removes it from the sidebar navigation (@IvanHunters in #2177, #2203).

  • [dashboard] Fix disabled/hidden state overwritten on every MarketplacePanel reconciliation: The controller was hardcoding disabled=false and hidden=false on every reconciliation, silently overwriting any user changes made through the dashboard UI. Fixed by reading and preserving the current disabled/hidden values from the existing resource before updating (@IvanHunters in #2176, #2201).

  • [dashboard] Fix External IPs factory EnrichedTable rendering: The external-IPs table displayed empty rows because the factory used incorrect EnrichedTable properties. Replaced clusterNamePartOfUrl with cluster and changed pathToItems from array to dot-path string format, consistent with all other working EnrichedTable instances (@IvanHunters in #2175, #2193).

  • [platform] Fix VM MAC address not preserved during virtual-machine to vm-instance migration: Kube-OVN reads the MAC address exclusively from the pod annotation ovn.kubernetes.io/mac_address, not from spec.macAddress on the IP resource. Without the annotation, migrated VMs received a new random MAC, breaking OS-level network configurations that match by MAC (e.g. netplan). Added a Helm lookup for the Kube-OVN IP resource in the vm-instance chart so that MAC and IP addresses are automatically injected as pod annotations when the resource exists (@sircthulhu in #2169, #2190).

  • [etcd-operator] Replace deprecated kube-rbac-proxy image: The gcr.io/kubebuilder/kube-rbac-proxy image became unavailable after Google Container Registry was deprecated. Replaced it with quay.io/brancz/kube-rbac-proxy from the original upstream author, restoring etcd-operator functionality (@kvaps in #2181, #2182).

  • [migrations] Handle missing RabbitMQ CRD in migration 34: Migration 34 failed when the rabbitmqs.apps.cozystack.io CRD did not exist, as happens on clusters where RabbitMQ was never installed. Added a CRD presence check before attempting to list resources so that migration 34 completes cleanly on such clusters (@IvanHunters in #2168, #2180).

  • [keycloak] Fix Keycloak crashloop due to misconfigured health probes: Keycloak 26.x redirects all HTTP requests on port 8080 to the configured HTTPS hostname; since kubelet does not follow redirects, liveness and readiness probes failed, causing a crashloop. Fixed by enabling KC_HEALTH_ENABLED=true, exposing management port 9000, and switching all probes to /health/live and /health/ready on port 9000. Also added a startupProbe for improved startup tolerance (@mattia-eleuteri in #2162, #2179).
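
The Keycloak probe change can be pictured as the container fragment below. It is a minimal sketch under stated assumptions: the container and port names, the probe timings, the startupProbe path, and the use of the default plain-HTTP scheme on port 9000 are illustrative and not taken from the chart. The release note confirms only KC_HEALTH_ENABLED=true, management port 9000, and the /health/live and /health/ready paths.

```yaml
containers:
  - name: keycloak                   # hypothetical container name
    env:
      - name: KC_HEALTH_ENABLED      # enables the health endpoints
        value: "true"
    ports:
      - name: http
        containerPort: 8080
      - name: management             # hypothetical port name
        containerPort: 9000
    startupProbe:
      httpGet:
        path: /health/ready          # assumption: path not stated in the release note
        port: management
      periodSeconds: 10              # illustrative timing
      failureThreshold: 30           # illustrative timing
    livenessProbe:
      httpGet:
        path: /health/live
        port: management
    readinessProbe:
      httpGet:
        path: /health/ready
        port: management
```

Probing the management port avoids the redirect issued on port 8080, which the kubelet would otherwise treat as a probe failure.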


Full Changelog: v1.1.0...v1.1.1
