Unless explicitly stated otherwise, the data model and algorithms described here are applicable to IPS deployments on Kubernetes (K8s) as well as OpenShift.
IPS comprises the following high-level components, which communicate over a shared network to coordinate all operations:
IPS-Web: the management console (web application) coordinating all configuration management, deployment and monitoring operations of IPS; health check endpoint exposed at
IPS-Worker: tailored UltraESB-X instances that perform the actual work inside IPS (hosting and running projects, with runtime log generation and statistics reporting)
database: provides persistence by holding metadata and configurations of all Clusters, Projects, deployments (Cluster Versions) and access control entities, as well as logging, monitoring and audit trails; while MySQL is primarily used, other DBMSs such as Oracle are also supported (not externally accessible)
DNS: a K8s platform-level DNS provider that facilitates service discovery for communication across the abovementioned components (not externally accessible)
|While the above arrangement corresponds to the standalone demo IPS installer, a production deployment would usually host the database and Elasticsearch instances on external dedicated servers for better resource provisioning and reliability.|
IPS uses a database-backed persistence model. The database is shared by the IPS-Web application (for all management and monitoring operations), Config-Server component (for serving pods with assigned configurations) and by IPS-Worker instances (for persistent logging).
In addition, IPS-Worker instances deployed in all clusters report system and communication statistics to a common Elasticsearch instance, which is queried by IPS-Web for displaying cluster- (summarized) and pod-level statistics.
Cluster deployments are based on K8s Deployment entities (or DeploymentConfig entities in the case of OpenShift). Usually each deployment creates a new ClusterVersion on the Cluster against which the deployment is done (somewhat similar to a version control mechanism), but it is also possible to deploy a specific (existing) ClusterVersion in order to reproduce a former state of the cluster. A ClusterVersion encompasses the details required for configuring an IPS-Worker runtime, including the projects and configuration artifacts that should be deployed. The K8s Deployment (and hence each pod derived from it) holds a reference to the ClusterVersion, based on which the IPS-Worker runtime pulls and deploys the required artifacts from the Config-Server.
Each actively deployed Cluster has an active ClusterVersion, indicating its current deployment configuration. A deployment effectively involves updating the active version of the parent Cluster to the specified ClusterVersion and then refreshing the Cluster, which internally deploys the configuration of the (newly activated) version. Cluster refresh can also be invoked by a user to propagate general configuration updates made on a Cluster to its actual deployment on K8s side.
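The version-control-like relationship between a Cluster and its ClusterVersions can be sketched as follows. This is a minimal in-memory illustration with hypothetical class and field names; the actual IPS data model is database-backed:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ClusterVersion:
    """A snapshot of what an IPS-Worker runtime needs to deploy."""
    version_id: int
    projects: List[str]    # projects to be deployed
    artifacts: List[str]   # configuration artifacts served by Config-Server

@dataclass
class Cluster:
    name: str
    versions: Dict[int, ClusterVersion] = field(default_factory=dict)
    active_version: Optional[int] = None
    next_id: int = 1

    def deploy(self, projects, artifacts):
        """A normal deployment creates a new ClusterVersion (like a commit)
        and makes it the active version of the Cluster."""
        cv = ClusterVersion(self.next_id, list(projects), list(artifacts))
        self.versions[cv.version_id] = cv
        self.next_id += 1
        self.active_version = cv.version_id  # picked up by a cluster refresh
        return cv

    def redeploy(self, version_id):
        """Deploying an existing ClusterVersion reproduces a former state."""
        if version_id not in self.versions:
            raise KeyError(version_id)
        self.active_version = version_id
```

Redeploying an old version only moves the active-version pointer; no new ClusterVersion is created.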
Once a deployment request is received, IPS-Web performs the following:
Checking whether a deployment is already ongoing for the requested cluster (and rejecting the request if one is found, unless a "force deploy" flag is specified in the request)
Performing sanity checks (such as validating the set of projects scheduled for deployment, and suspending the cluster if the set is empty)
Updating the K8s Service associated with the cluster with the ports that should be exposed for external connectivity (creating the Service if one does not exist, or deleting it if there are no ports to be mapped)
Updating the K8s Deployment associated with the cluster with a Pod template carrying the specified replication level, container image and resource limits, along with parameters such as the ClusterVersion to be deployed and the environment variables required by the IPS-Worker runtime (creating a new Deployment if one does not exist)
Marking the cluster as DEPLOYING, so that the monitoring mechanism can start monitoring the ongoing deployment
A deployment is performed transactionally: any failure rolls back both the database state and the platform-level (K8s) representation of the cluster to their previous states.
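The steps above, including the transactional rollback on failure, can be sketched as follows. The function and parameter names are hypothetical, and the `k8s` client stands in for the real Service/Deployment reconciliation:

```python
class DeploymentError(Exception):
    pass

def deploy(cluster, request, k8s):
    """Sketch of the IPS-Web deployment flow (hypothetical API)."""
    # 1. Reject concurrent deployments unless force-deploy is requested
    if cluster["state"] == "DEPLOYING" and not request.get("force"):
        raise DeploymentError("deployment already in progress")

    # Snapshot the prior state so any failure can roll everything back
    snapshot = dict(cluster)
    try:
        # 2. Sanity checks: an empty project set suspends the cluster
        if not request["projects"]:
            cluster["state"] = "SUSPENDED"
            return cluster
        # 3. Reconcile the K8s Service with the ports to be exposed
        k8s.update_service(cluster["name"], request["ports"])
        # 4. Reconcile the K8s Deployment (replicas, image, ClusterVersion)
        k8s.update_deployment(cluster["name"], request["replicas"],
                              request["image"], request["version"])
        # 5. Hand over to the monitoring mechanism
        cluster["state"] = "DEPLOYING"
        cluster["active_version"] = request["version"]
        return cluster
    except Exception:
        cluster.clear()
        cluster.update(snapshot)  # roll back to the previous state
        raise
```

In the real system the rollback also covers the K8s-level objects; here only the cluster record is restored, to keep the sketch self-contained.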
Once a deployment is started, K8s spawns the requested number of IPS-Worker pods, each of which goes through these steps:
Obtaining a pod name from Config-Server to uniquely represent itself
Downloading the necessary project properties and configuration artifacts from Config-Server, and placing them on the pod-local filesystem
Spawning an UltraESB-X process as the main process of the container, which includes
Establishing database connections for persistent log publishing,
Establishing connectivity to Elasticsearch for statistics publishing,
Initializing and starting all configured projects, and
Starting a management server that acts as a management API endpoint and server health indicator utilized by IPS
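The pod bootstrap sequence above can be sketched as follows, with fake collaborator objects standing in for Config-Server and the UltraESB-X runtime (all names here are hypothetical):

```python
def start_worker_pod(config_server, esb):
    """Sketch of the IPS-Worker pod startup sequence (hypothetical API)."""
    # 1. Obtain a pod name from Config-Server to uniquely represent itself
    pod_name = config_server.request_pod_name()
    # 2. Download project properties and configuration artifacts
    #    (placed on the pod-local filesystem in the real runtime)
    artifacts = config_server.fetch_artifacts(pod_name)
    # 3. Spawn UltraESB-X as the main process of the container, which:
    esb.connect_database()         # persistent log publishing
    esb.connect_elasticsearch()    # statistics publishing
    esb.start_projects(artifacts)  # initialize and start configured projects
    esb.start_management_server()  # management API endpoint + health indicator
    return pod_name
```

The ordering matters: the management server starts last, so a successful health response implies the preceding steps have completed.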
In order to provide a unique name for each pod, IPS-Web maintains a pod name map: a registry of all pod identifiers currently active on the system, each mapped to a corresponding human-readable name in the format
As part of its startup process, each pod requests a mapped name from Config-Server, which issues a unique mapped name for each new incoming K8s pod name. If a pod is restarted (in which case its K8s pod name remains unchanged), Config-Server reissues the old name to the newly spawned pod, ensuring that a given pod name retains the same mapped name throughout its lifetime.
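The reissue-on-restart behaviour of the pod name map can be sketched as follows. The `{cluster}-{counter}` naming scheme is a placeholder assumption, since the actual format is not specified here:

```python
import itertools

class PodNameMap:
    """Sketch of the Config-Server pod name registry: each K8s pod name
    is mapped to a stable human-readable name (hypothetical scheme)."""

    def __init__(self, cluster_name):
        self.cluster = cluster_name
        self.mapped = {}                   # K8s pod name -> mapped name
        self.counter = itertools.count(1)

    def request_mapped_name(self, k8s_pod_name):
        # A restarted pod keeps its K8s pod name, so it gets its old
        # mapped name back; only genuinely new pods get a fresh one.
        if k8s_pod_name not in self.mapped:
            self.mapped[k8s_pod_name] = f"{self.cluster}-{next(self.counter)}"
        return self.mapped[k8s_pod_name]
```

Keying the registry by the K8s pod name is what makes the mapped name stable across container restarts.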
In order to track the status of the Cluster deployment (and optionally initiate rollback if a failure state is detected), a monitor agent (thread) is assigned to each Cluster, and remains active throughout the Cluster’s lifetime. This thread will periodically check the status of the Cluster’s pods, and determine cluster state based on the following:
If any of the pods is in a failing state, the Cluster (and hence the deployment) is marked as FAILED.
If no failures are detected, the Cluster remains DEPLOYING until the required number of pods enters the Running state.
When the replication count is satisfied (as above), management server endpoints on each pod are invoked to obtain the lists of running projects from each instance. The Cluster remains in DEPLOYING state until all management server endpoints start producing success responses (indicating that all IPS-Worker instances have started successfully).
Each project is checked for the expected number of replicas (across all running worker instances), and the deployment is marked as SUCCESS or FAILED based on whether all projects have started successfully.
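The monitor's decision logic can be sketched as a pure function over the observed pod states. The representation is a simplifying assumption (each pod is a dict with a phase and the project list reported by its management server, or None if that endpoint is not yet responding; a project is considered fully replicated when it runs on every running worker):

```python
def cluster_state(pods, required_replicas, expected_projects):
    """Sketch of the per-cluster monitor's state determination."""
    # Any failing pod fails the Cluster (and hence the deployment)
    if any(p["phase"] == "Failed" for p in pods):
        return "FAILED"
    running = [p for p in pods if p["phase"] == "Running"]
    # Remain DEPLOYING until the replication count is satisfied and every
    # management server endpoint produces a success response
    if len(running) < required_replicas or any(
            p["projects"] is None for p in running):
        return "DEPLOYING"
    # Every expected project must have started on the running workers
    for project in expected_projects:
        if not all(project in p["projects"] for p in running):
            return "FAILED"
    return "SUCCESS"
```

A periodic monitor thread would call this with freshly polled pod data and act on transitions between the returned states.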
Whenever a state change (e.g. all pods entering the Running state) is detected in the Cluster, a monitor log is generated. In case of a failure event (the cluster entering the FAILED state), this log may be accompanied by multiple error log entries (one entry per pod), which contain detailed diagnostic information such as pod configurations and
In order to ensure availability for failure-prone clusters, an automatic rollback mechanism is available for reverting a cluster to its "latest stabilizable" state in case of a deployment failure. The rollback mechanism is attached to the same deployment monitoring process explained above, and is triggered whenever a deployment entity (a pod or a project) exhibits a failure (at least temporarily). On failure, the Cluster configurations are updated to reflect those of the last known successful deployment, effectively rolling the Cluster back to its previous state. If this deployment attempt also fails, the process is repeated for the next most recent successful version, and so forth, until a version deploys successfully or all available versions are exhausted (in which case the Cluster is suspended, as there is no deployable configuration left).
|While the monitoring process is an inherent part of all clusters, the automatic rollback mechanism is disabled by default, and should be explicitly enabled for required clusters by ticking the advanced option Enable automatic rollback on failure in the cluster configuration.|
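The automatic rollback loop can be sketched as follows. The `try_deploy` callback stands in for a full deployment-plus-monitoring cycle, and all names are hypothetical:

```python
def rollback_until_stable(cluster, successful_versions, try_deploy):
    """Sketch of automatic rollback: walk back through previously
    successful ClusterVersion ids (newest first) until one deploys,
    or suspend the cluster when every candidate is exhausted."""
    for version in sorted(successful_versions, reverse=True):
        if try_deploy(version):
            cluster["state"] = "SUCCESS"
            cluster["active_version"] = version
            return version
    # No deployable configuration remains
    cluster["state"] = "SUSPENDED"
    return None
```

Iterating newest-first is what makes the result the "latest stabilizable" state rather than merely any working one.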