An UltraESB installation can be configured to run as one or more stand-alone instances, or as a cluster of nodes. By default, instances do not share any state with each other, unless the mediation logic explicitly requests such behavior using the distributed caching support. Clustering support in the UltraESB is implemented using the Apache ZooKeeper framework, which is also used to coordinate extremely large clusters of Hadoop instances. ZooKeeper is a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. The clustering support thus separates the following concerns and handles each of them independently:
Group coordination & Co-operative control
A group of nodes in the cluster needs to intercommunicate seamlessly in order to operate as a single unit; this is referred to as Group coordination. At the same time, an ESB cluster should provide a mechanism to control the complete group as a single unit, which is referred to as Co-operative control.
State replication & Content sharing
If there are stateful mediations operating in an ESB cluster, it may be required to replicate the state among the nodes in the group, which is referred to as State replication. It is also common for the nodes in the group to share content when a mediation flow needs to be visible as a single unit, and that is accomplished by Content sharing in the cluster.
ZooKeeper is a fast, simple, ordered and replicated coordination service, which could itself be deployed as a collection of multiple instances called an ensemble. Refer to the ZooKeeper Overview for a more detailed description of its features and concepts.
Coordination is mainly used in the UltraESB to find the nodes in a given cluster domain. The UltraESB has a concept of a clustering domain, so that multiple clusters can be configured to work independently on the same LAN using the same ZooKeeper quorum. ZooKeeper keeps information in a tree structure as a set of nodes, similar to the inodes of a Unix file system, and names these nodes znodes. This structure is efficiently shared across each node in the clustering domain, and reads and writes are guaranteed to be atomic and ordered across the cluster with a stamped transaction state. An UltraESB instance acts as a client to the ZooKeeper service, maintaining a connection to that service, which may itself be replicated as one or more instances.
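As a rough illustration of the znode concept, the following self-contained Java sketch models a ZooKeeper-style path tree and shows how listing the children of a domain's membership znode amounts to node discovery. The paths used here (e.g. `/ultraesb/domains/...`) are hypothetical, chosen only for the illustration; they are not the actual znode layout used by the UltraESB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Minimal in-memory sketch of a ZooKeeper-style znode tree; the real service
// replicates this tree across the ensemble with atomic, ordered updates.
public class ZnodeTreeSketch {
    private final NavigableSet<String> znodes = new TreeSet<>();

    void create(String path) {
        znodes.add(path);
    }

    // List the direct children of a znode, as ZooKeeper's getChildren() would.
    List<String> getChildren(String parent) {
        List<String> children = new ArrayList<>();
        String prefix = parent.endsWith("/") ? parent : parent + "/";
        for (String path : znodes.tailSet(prefix)) {
            if (!path.startsWith(prefix)) break;        // past this subtree
            String rest = path.substring(prefix.length());
            if (!rest.isEmpty() && !rest.contains("/")) // direct child only
                children.add(rest);
        }
        return children;
    }

    public static void main(String[] args) {
        ZnodeTreeSketch zk = new ZnodeTreeSketch();
        // Each node registers itself under its own clustering domain,
        // so two domains coexist on the same quorum without interference.
        zk.create("/ultraesb/domains/default/nodes/node-1");
        zk.create("/ultraesb/domains/default/nodes/node-2");
        zk.create("/ultraesb/domains/hr/nodes/node-1");
        // Discovery = listing the children of the domain's membership znode.
        System.out.println(zk.getChildren("/ultraesb/domains/default/nodes"));
        // → [node-1, node-2]
    }
}
```

In the real system the membership entries would be ephemeral znodes, so a crashed node's entry disappears automatically when its ZooKeeper session expires.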
The above diagram depicts a three-node UltraESB cluster, which uses a replicated ZooKeeper quorum of three instances spread across three nodes. This avoids any single point of failure, and ZooKeeper can operate correctly even with the loss of a ZooKeeper instance, as long as a majority of the ensemble is available; hence a ZooKeeper quorum usually consists of an odd number of nodes such as 1, 3 or 5. When ZooKeeper itself operates as a replicated group of instances, it elects a leader for its internal operation; however, this has absolutely no impact on the operation of the UltraESB.
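The majority rule behind the odd-sized quorum recommendation can be sketched in a few lines; the method names here are illustrative, not part of any ZooKeeper or UltraESB API:

```java
// Sketch of ZooKeeper's majority rule: an ensemble of `total` servers stays
// available only while a strict majority of them are alive.
public class QuorumSketch {

    static boolean hasQuorum(int total, int alive) {
        return alive > total / 2;
    }

    // How many servers may fail before the ensemble loses its majority.
    static int tolerableFailures(int total) {
        return total - (total / 2 + 1);
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(3, 2));      // true: 3-node quorum survives one loss
        System.out.println(tolerableFailures(3)); // 1
        System.out.println(tolerableFailures(4)); // still 1: the 4th server buys nothing
        System.out.println(tolerableFailures(5)); // 2
    }
}
```

This is why an even-sized ensemble tolerates no more failures than the next smaller odd size, making 1, 3 or 5 servers the sensible deployment choices.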
Unlike some JEE application server clustering concepts, the nodes of an UltraESB cluster are absolutely identical. Hence an UltraESB cluster can operate irrespective of the failure or network partitioning of any of the other nodes, as long as any replicated state (see below) and shared content can operate accordingly.
Typically a hardware load-balancer would be used to front a cluster of nodes to load balance HTTP/S, TCP/S or MLLP/S style traffic reaching proxy services over such transports. Proxy services over transports such as JMS or JDBC polling would be transactional as configured, by default. These services would thus operate in an Active-Active manner. However, File, FTP/S, SFTP and Email style proxy services may not be able to run multiple instances of the same service concurrently across a cluster. Hence, such services need to be "pinned" to one node of a cluster and allowed to operate in an Active-Passive manner, with the service failing over to another node when the primary active node fails. The clustering configuration specifies this behavior with a fail-over matrix that lists one or more other nodes that "can act" as a failed node.
e.g. Node 1 ⇒ Node 2, Node 3; Node 2 ⇒ Node 1, Node 3; Node 3 ⇒ Node 1, Node 2
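The fail-over matrix above could be sketched as a simple preference-ordered lookup; the node names and the `standInFor` helper are hypothetical, shown only to illustrate how a stand-in might be chosen among the nodes that are still alive:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Sketch of the fail-over matrix from the example above: each node lists, in
// preference order, the nodes that can act on its behalf.
public class FailoverMatrixSketch {
    static final Map<String, List<String>> MATRIX = Map.of(
            "node1", List.of("node2", "node3"),
            "node2", List.of("node1", "node3"),
            "node3", List.of("node1", "node2"));

    // Pick the first configured stand-in that is currently alive.
    static Optional<String> standInFor(String failed, Set<String> alive) {
        return MATRIX.getOrDefault(failed, List.of()).stream()
                .filter(alive::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        // node1 crashes while node2 and node3 are up: node2 takes over.
        System.out.println(standInFor("node1", Set.of("node2", "node3")).orElse("none"));
        // node1 and node2 are both down: node3 picks up node1's pinned services.
        System.out.println(standInFor("node1", Set.of("node3")).orElse("none"));
    }
}
```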
For controlled maintenance, a specific node can also be asked to "start acting as some other node", using the UTerm command console. In such a situation, the node concerned will locally start the services pinned to the other node. On a typical node crash, the clustering framework will detect the loss of the heart-beat, and fail over the services pinned to the failed node to another node. If the original node comes back online, the services are moved back to the original node.
Leaving the state replication and content sharing aspects aside, an UltraESB cluster operates as a collection of independent nodes coordinated through ZooKeeper. Internally, a cluster can be controlled by connecting to any of its nodes, as all nodes share the same clustering domain state. Any cluster-wide change or operation can be issued while connected to any of the nodes, and in turn, the local node will publish the event as a cluster-wide command. A cluster command contains the operation to invoke and the scope of application (i.e. cluster-wide, or a specific node, etc.). Each node in the cluster receives each cluster command and takes the necessary local action as per the command. The cluster maintains a list of nodes, the commands issued, and their acknowledged/completed state. Any node that crashes and re-starts will catch up to the same level of issued cluster commands. For example, if a stop proxy service command was issued and one of the nodes crashed just before receiving it, that node would first synchronize itself to the latest command state of the cluster, and thus execute the missed command.
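The catch-up behavior described above can be sketched as an ordered command log, where each command carries an increasing sequence number and a restarting node replays everything past its last-seen sequence. The class and method names here are illustrative, not the UltraESB's internal API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cluster command log: commands are stamped with an increasing
// sequence number, and a restarting node replays every command it missed.
public class CommandLogSketch {
    record Command(long seq, String operation) {}

    private final List<Command> log = new ArrayList<>();
    private long nextSeq = 1;

    void publish(String operation) {
        log.add(new Command(nextSeq++, operation));
    }

    // Commands a node with the given last-seen sequence number must replay.
    List<Command> catchUp(long lastSeenSeq) {
        return log.stream().filter(c -> c.seq() > lastSeenSeq).toList();
    }

    public static void main(String[] args) {
        CommandLogSketch cluster = new CommandLogSketch();
        cluster.publish("start proxy-service-A");
        // A node crashes here, having only seen command #1 ...
        cluster.publish("stop proxy-service-A");
        // ... and on restart replays everything after its last seen sequence.
        for (Command missed : cluster.catchUp(1)) {
            System.out.println("replaying: " + missed.operation());
        }
        // → replaying: stop proxy-service-A
    }
}
```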
The cluster management aspects are available through IMonitor as well as the UTerm command line interface. A special cluster management command of interest is the cluster-wide round-robin restart command, which takes down each node of the cluster in turn for a restart, while ensuring that only one node restarts at any given point in time.
AdroitLogic Integration Monitor - IMonitor
AdroitLogic Integration Monitor executes as an independent Web Application, and allows the easy management of a single UltraESB instance or a cluster of instances. Whether it is a single instance or a cluster of ESB nodes, IMonitor delivers business-level statistics and monitoring at its best. Apart from the operational statistics, IMonitor presents friendly troubleshooting and diagnostics capabilities, saving hours of developer time and improving organisational efficiency. Note that IMonitor replaces the UConsole shipped with previous UltraESB releases, and is covered separately in the AdroitLogic - Integration Monitor User Guide.
State replication among nodes in an UltraESB cluster takes place through the distributed caching support, implemented using the Ehcache framework underneath. The caching support allows local as well as distributed caching, including persisted caches, and is configured and operated as per Ehcache norms. It is exposed to the user via the Mediation.getCachingSupport() API call, and sample 800 shows a simple use case of a distributed cache.
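Conceptually, a distributed cache means that an entry put by the mediation logic on one node becomes visible to the same logic running on any other node. The following is a self-contained Java sketch of that semantics only; the shared map merely stands in for the replicated Ehcache store, and the class is not the API returned by Mediation.getCachingSupport():

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch: two "nodes" backed by one replicated store, so a value
// put by one ESB node can be read by any other node in the cluster.
public class DistributedCacheSketch {
    // Shared store simulating the replicated cache behind all nodes.
    private static final Map<String, String> REPLICATED = new ConcurrentHashMap<>();

    void put(String key, String value) {
        REPLICATED.put(key, value);
    }

    String get(String key) {
        return REPLICATED.get(key);
    }

    public static void main(String[] args) {
        DistributedCacheSketch node1 = new DistributedCacheSketch();
        DistributedCacheSketch node2 = new DistributedCacheSketch();
        // Mediation logic on node1 stores some session state ...
        node1.put("session-42", "state-for-session-42");
        // ... and the same flow on node2 sees it, as if on a single unit.
        System.out.println(node2.get("session-42"));
        // → state-for-session-42
    }
}
```

In a real deployment the replication, eviction and persistence behavior would all be governed by the Ehcache configuration rather than a shared in-process map.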