Unless explicitly stated otherwise, the data model and algorithms described here are applicable to IPS deployments on Kubernetes (K8s) as well as OpenShift.
IPS comprises the following high-level components, which communicate over a shared network to coordinate all operations:
IPS-Web: the management console (web application) coordinating all configuration management, deployment and monitoring operations of IPS; health check endpoint exposed at
IPS-Worker: tailored UltraESB-X instances that perform the actual work inside IPS (hosting and running projects, with runtime log generation and statistics reporting)
database: provides persistence by holding metadata and configurations of all Clusters, Projects, deployments (Cluster Versions) and access control entities, as well as logging, monitoring and audit trails; while MySQL is primarily used, other DBMSs such as Oracle are also supported (not externally accessible)
DNS: a K8s platform-level DNS provider that facilitates service discovery for communication across the abovementioned components (not externally accessible)
|While the above arrangement corresponds to the standalone demo IPS installer, a production deployment would usually host the database and Elasticsearch instances on external dedicated servers for better resource provisioning and reliability.|
IPS uses a database-backed persistence model. The database is shared by the IPS-Web application (for all management and monitoring operations), Config-Server component (for serving pods with assigned configurations) and by IPS-Worker instances (for persistent logging).
In addition, IPS-Worker instances deployed in all clusters report system and communication statistics to a common Elasticsearch instance, which is queried by IPS-Web for displaying cluster- (summarized) and pod-level statistics.
Cluster deployments are based on K8s Deployment entities (or DeploymentConfig entities in the case of OpenShift). Usually each deployment creates a new ClusterVersion on the Cluster against which the deployment is done (somewhat similar to a version control mechanism), but it is also possible to deploy a specific (existing) ClusterVersion in order to reproduce a former state of the cluster. A ClusterVersion encompasses the details required for configuring an IPS-Worker runtime, including the projects and configuration artifacts that should be deployed. The K8s Deployment (and hence each pod derived from it) holds a reference to the ClusterVersion, based on which the IPS-Worker runtime pulls and deploys the required artifacts from the Config-Server.
Each actively deployed Cluster has an active ClusterVersion, indicating its current deployment configuration. A deployment effectively involves updating the active version of the parent Cluster to the specified ClusterVersion and then refreshing the Cluster, which internally deploys the configuration of the (newly activated) version. Cluster refresh can also be invoked by a user to propagate general configuration updates made on a Cluster to its actual deployment on K8s side.
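The version-control-like relationship between a Cluster and its ClusterVersions can be sketched as follows. This is a minimal in-memory illustration with hypothetical class and field names; the actual IPS data model is database-backed:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ClusterVersion:
    """A snapshot of what an IPS-Worker runtime needs to deploy."""
    version_id: int
    projects: List[str]    # projects to be deployed
    artifacts: List[str]   # configuration artifacts served by Config-Server

@dataclass
class Cluster:
    name: str
    versions: Dict[int, ClusterVersion] = field(default_factory=dict)
    active_version: Optional[int] = None
    next_id: int = 1

    def deploy(self, projects, artifacts):
        """A normal deployment creates a new ClusterVersion (like a commit)
        and makes it the active version of the Cluster."""
        cv = ClusterVersion(self.next_id, list(projects), list(artifacts))
        self.versions[cv.version_id] = cv
        self.next_id += 1
        self.active_version = cv.version_id  # picked up by a cluster refresh
        return cv

    def redeploy(self, version_id):
        """Deploying an existing ClusterVersion reproduces a former state."""
        if version_id not in self.versions:
            raise KeyError(version_id)
        self.active_version = version_id
```

Redeploying an old version only moves the active-version pointer; no new ClusterVersion is created.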
Once a deployment request is received, IPS-Web performs the following:
Checking whether a deployment is already ongoing for the requested cluster (and rejecting the request if one is found, unless a "force deploy" flag is specified in the request)
Performing sanity checks (such as validating the set of projects scheduled for deployment, and suspending the cluster if the set is empty)
Updating the K8s Service associated with the cluster with the ports that should be exposed for external connectivity (creating the Service if one does not exist, or deleting it if there are no ports to be mapped)
Updating the K8s Deployment associated with the cluster with a Pod template carrying the specified replication level, container image and resource limits, along with parameters such as the ClusterVersion to be deployed and the environment variables required by the IPS-Worker runtime (creating a new Deployment if one does not exist)
Marking the cluster as DEPLOYING, so that the monitoring mechanism can start monitoring the ongoing deployment
A deployment is performed transactionally: any failure rolls back both the database state and the platform-level (K8s) representation of the cluster to their previous states.
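The steps above, including the transactional rollback on failure, can be sketched as follows. The function and parameter names are hypothetical, and the `k8s` client stands in for the real Service/Deployment reconciliation:

```python
class DeploymentError(Exception):
    pass

def deploy(cluster, request, k8s):
    """Sketch of the IPS-Web deployment flow (hypothetical API)."""
    # 1. Reject concurrent deployments unless force-deploy is requested
    if cluster["state"] == "DEPLOYING" and not request.get("force"):
        raise DeploymentError("deployment already in progress")

    # Snapshot the prior state so any failure can roll everything back
    snapshot = dict(cluster)
    try:
        # 2. Sanity checks: an empty project set suspends the cluster
        if not request["projects"]:
            cluster["state"] = "SUSPENDED"
            return cluster
        # 3. Reconcile the K8s Service with the ports to be exposed
        k8s.update_service(cluster["name"], request["ports"])
        # 4. Reconcile the K8s Deployment (replicas, image, ClusterVersion)
        k8s.update_deployment(cluster["name"], request["replicas"],
                              request["image"], request["version"])
        # 5. Hand over to the monitoring mechanism
        cluster["state"] = "DEPLOYING"
        cluster["active_version"] = request["version"]
        return cluster
    except Exception:
        cluster.clear()
        cluster.update(snapshot)  # roll back to the previous state
        raise
```

In the real system the rollback also covers the K8s-level objects; here only the cluster record is restored, to keep the sketch self-contained.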
Once a deployment is started, K8s spawns the requested number of IPS-Worker pods, each of which goes through these steps:
Obtaining a pod name from Config-Server to uniquely represent itself
Downloading the necessary project properties and configuration artifacts from Config-Server, and placing them on the pod-local filesystem
Spawning an UltraESB-X process as the main process of the container, which includes
Establishing database connections for persistent log publishing,
Establishing connectivity to Elasticsearch for statistics publishing,
Initializing and starting all configured projects, and
Starting a management server that acts as a management API endpoint and server health indicator utilized by IPS
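The pod bootstrap sequence above can be sketched as follows, with fake collaborator objects standing in for Config-Server and the UltraESB-X runtime (all names here are hypothetical):

```python
def start_worker_pod(config_server, esb):
    """Sketch of the IPS-Worker pod startup sequence (hypothetical API)."""
    # 1. Obtain a pod name from Config-Server to uniquely represent itself
    pod_name = config_server.request_pod_name()
    # 2. Download project properties and configuration artifacts
    #    (placed on the pod-local filesystem in the real runtime)
    artifacts = config_server.fetch_artifacts(pod_name)
    # 3. Spawn UltraESB-X as the main process of the container, which:
    esb.connect_database()         # persistent log publishing
    esb.connect_elasticsearch()    # statistics publishing
    esb.start_projects(artifacts)  # initialize and start configured projects
    esb.start_management_server()  # management API endpoint + health indicator
    return pod_name
```

The ordering matters: the management server starts last, so a successful health response implies the preceding steps have completed.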
In order to provide a unique name for each pod, IPS-Web maintains a pod name map: a registry of all pod identifiers currently active on the system, each mapped to a corresponding human-readable name in the format
As part of its startup process, each pod requests a mapped name from Config-Server, which issues a unique mapped name for each new incoming K8s pod name. If a pod is restarted (in which case its K8s pod name remains unchanged), Config-Server reissues the old name to the newly spawned pod, ensuring that a given pod name retains the same mapped name throughout its lifetime.
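The reissue-on-restart behaviour of the pod name map can be sketched as follows. The `{cluster}-{counter}` naming scheme is a placeholder assumption, since the actual format is not specified here:

```python
import itertools

class PodNameMap:
    """Sketch of the Config-Server pod name registry: each K8s pod name
    is mapped to a stable human-readable name (hypothetical scheme)."""

    def __init__(self, cluster_name):
        self.cluster = cluster_name
        self.mapped = {}                   # K8s pod name -> mapped name
        self.counter = itertools.count(1)

    def request_mapped_name(self, k8s_pod_name):
        # A restarted pod keeps its K8s pod name, so it gets its old
        # mapped name back; only genuinely new pods get a fresh one.
        if k8s_pod_name not in self.mapped:
            self.mapped[k8s_pod_name] = f"{self.cluster}-{next(self.counter)}"
        return self.mapped[k8s_pod_name]
```

Keying the registry by the K8s pod name is what makes the mapped name stable across container restarts.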
In order to track the status of the Cluster deployment (and optionally initiate rollback if a failure state is detected), a monitor agent (thread) is assigned to each Cluster, and remains active throughout the Cluster’s lifetime. This thread will periodically check the status of the Cluster’s pods, and determine cluster state based on the following:
If any of the pods is in a failing state, the Cluster (and hence the deployment) is marked as FAILED.
If no failures are detected, the Cluster remains DEPLOYING until the required number of pods enters the Running state.
When the replication count is satisfied (as above), management server endpoints on each pod are invoked to obtain the lists of running projects from each instance. The Cluster remains in DEPLOYING state until all management server endpoints start producing success responses (indicating that all IPS-Worker instances have started successfully).
Each project is checked for the expected number of replicas (across all running worker instances), and the deployment is marked as SUCCESS or FAILED based on whether all projects have started successfully.
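The monitor's decision logic can be sketched as a pure function over the observed pod states. The representation is a simplifying assumption (each pod is a dict with a phase and the project list reported by its management server, or None if that endpoint is not yet responding; a project is considered fully replicated when it runs on every running worker):

```python
def cluster_state(pods, required_replicas, expected_projects):
    """Sketch of the per-cluster monitor's state determination."""
    # Any failing pod fails the Cluster (and hence the deployment)
    if any(p["phase"] == "Failed" for p in pods):
        return "FAILED"
    running = [p for p in pods if p["phase"] == "Running"]
    # Remain DEPLOYING until the replication count is satisfied and every
    # management server endpoint produces a success response
    if len(running) < required_replicas or any(
            p["projects"] is None for p in running):
        return "DEPLOYING"
    # Every expected project must have started on the running workers
    for project in expected_projects:
        if not all(project in p["projects"] for p in running):
            return "FAILED"
    return "SUCCESS"
```

A periodic monitor thread would call this with freshly polled pod data and act on transitions between the returned states.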
Whenever a state change (e.g. all pods entering the Running state) is detected in the Cluster, a monitor log is generated. In case of a failure event (the cluster entering the FAILED state), this log may be accompanied by multiple error log entries (one entry per pod), which contain detailed diagnostic information such as pod configurations and
In order to ensure availability for failure-prone clusters, an automatic rollback mechanism is available for reverting a cluster to its "latest stabilizable" state in case of a deployment failure. The rollback mechanism is attached to the same deployment monitoring process explained above, and is triggered whenever a deployment entity (a pod or a project) exhibits a failure (at least temporarily). On failure, the Cluster configurations are updated to reflect those of the last known successful deployment, effectively rolling the Cluster back to its previous state. If this deployment attempt also fails, the process is repeated for the next most recent successful version, and so forth, until a version deploys successfully or all available versions are exhausted (in which case the Cluster is suspended, as there is no deployable configuration left).
|While the monitoring process is an inherent part of all clusters, the automatic rollback mechanism is disabled by default, and should be explicitly enabled for required clusters by ticking the advanced option Enable automatic rollback on failure in the cluster configuration.|
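The automatic rollback loop can be sketched as follows. The `try_deploy` callback stands in for a full deployment-plus-monitoring cycle, and all names are hypothetical:

```python
def rollback_until_stable(cluster, successful_versions, try_deploy):
    """Sketch of automatic rollback: walk back through previously
    successful ClusterVersion ids (newest first) until one deploys,
    or suspend the cluster when every candidate is exhausted."""
    for version in sorted(successful_versions, reverse=True):
        if try_deploy(version):
            cluster["state"] = "SUCCESS"
            cluster["active_version"] = version
            return version
    # No deployable configuration remains
    cluster["state"] = "SUSPENDED"
    return None
```

Iterating newest-first is what makes the result the "latest stabilizable" state rather than merely any working one.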