Version: 17.07
Suppose you want to run a cluster of UltraESB-X instances for a highly available distributed solution or any other similar distributed solution. To address such clustering requirements, UltraESB-X has inbuilt clustering support which is convenient and flexible to use, providing two major clustering functionalities.
Functionality | Description |
---|---|
Failover Support | Strategies to be used whenever a cluster or an individual node requires failover. A set of inbuilt failover techniques can be configured and used out of the box. Apart from that, you can plug in any customized failover technique you prefer through our failover support implementation. |
Distributed Command Framework | A convenient mechanism to execute distributed commands on all the nodes in a cluster at once. The commands have to be predefined, as bean introspection is used to marshal and unmarshal commands when they are notified and executed at each node belonging to the same command space. |
To provide these clustering functionalities we rely on the well-known distributed coordination system, Apache ZooKeeper. That is, the UltraESB-X cluster manager needs to connect to a ZooKeeper ensemble in order to perform the operations mentioned above. The ZooKeeper version must be higher than 3.5.0-alpha for the cluster manager to operate correctly.
Apache Curator is used internally as the client to connect to the ZooKeeper ensemble. If you want to use SSL for client-server communication, please refer to the SSL user guide for ZooKeeper.
As UltraESB-X clustering is implemented using Apache ZooKeeper, the first step is to set up ZooKeeper! Before digging into the details, let's first look at the deployment diagram that we are planning to implement.
As per the above diagram, our deployment will be on 3 server nodes, each running both ZooKeeper and the UltraESB-X.
ZooKeeper, along with the scripts and configurations required to run it, is already shipped with the UltraESB-X. So we will be using those to set up ZooKeeper for our clustered deployment instead of downloading and setting up ZooKeeper from scratch. (However, if your enterprise already has ZooKeeper nodes set up for wider use, you may use them, or set up such an environment if required.) You may also want to go through the ZooKeeper documentation to understand what you are doing and why, although this guide will be sufficient to set up ZooKeeper for our purposes. The ZooKeeper documentation states that you have to run a replicated ZooKeeper quorum if you are running ZooKeeper in production. Here is the quote:
Running ZooKeeper in standalone mode is convenient for evaluation, some development, and testing. But in production, you should run ZooKeeper in replicated mode. A replicated group of servers in the same application is called a quorum, and in replicated mode, all servers in the quorum have copies of the same configuration file.
Hence we will be using a three node ZooKeeper quorum. You might wonder whether you can go with two ZooKeeper nodes
instead. The answer is yes, but there is no real value, as a two node ZooKeeper quorum is equivalent in reliability
to a single node ZooKeeper setup. ZooKeeper assumes that a majority of servers is available at any given
time; hence in a two node deployment, if one node goes down, ZooKeeper fails to have the necessary quorum to operate, as
there is no possible majority. So in effect, any even number n
of nodes is equal in reliability to n-1
nodes.
ZooKeeper quorum should be expanded in odd numbers
The conclusion of the above discussion is that a ZooKeeper quorum should only be expanded in odd numbers, i.e. 3, 5, 7, etc., where necessary. |
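The majority arithmetic behind this conclusion can be sketched in a few lines of shell (an illustration only, not part of ZooKeeper): a quorum of n servers needs a strict majority up, so it tolerates at most (n - 1) / 2 failures (integer division).

```shell
# Failure tolerance of an n-server ZooKeeper quorum:
# a majority (more than n/2) must stay up, so at most
# (n - 1) / 2 servers may fail (integer division).
for n in 1 2 3 4 5 6 7; do
  echo "$n server(s) -> tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

Note that 3 and 4 servers both tolerate exactly one failure, which is why growing the quorum to an even count adds no reliability.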
Next you will need to prepare the ZooKeeper configuration for the quorum, which is a standard properties file with
name-value pairs. A default configuration for a standalone ZooKeeper is shipped with the UltraESB-X and can be found in
the ULTRA_HOME/conf/cluster/zoo.cfg
file, so let's edit it as follows to configure it for the quorum we
will be setting up.
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=hostname1:2888:3888
server.2=hostname2:2888:3888
server.3=hostname3:2888:3888
Parameter | Description |
---|---|
tickTime | The basic time unit in milliseconds used by ZooKeeper. It is used for heartbeats, and the minimum session timeout will be twice the tickTime. |
dataDir | The location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database. |
clientPort | The port to listen on for client connections. |
initLimit | The timeout ZooKeeper uses to limit the length of time (in tickTime units) the ZooKeeper servers in the quorum have to connect to a leader. |
syncLimit | Limits how far (in tickTime units) out of date a server can be from a leader. |
server.x | This line is repeated, incrementing x by 1, to define the servers in the quorum. These entries list the servers that make up the ZooKeeper service. |
Note the two port numbers after each server name, 2888 and 3888 . Peers use the former port to connect to
other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of
updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader
arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses
TCP, we currently require another port for leader election. This is the second port in the server entry.
Now we need to inform each server which node it is in the quorum. Remember that in the above configuration we have
defined server.1, server.2 and server.3, so it is time to give the servers their identifiers. This requires
creating a file named myid
in the ZooKeeper data directory (dataDir
), which we configured in the above
configuration file. Simply create a file named myid
there (according to the previous configuration, the file path is
/var/zookeeper/myid
) and specify only the identifier on each server, which is just 1, 2, and 3 respectively on each
server.
myid configurations
Server 1: Host name = hostname1, file /var/zookeeper/myid will contain "1"
Server 2: Host name = hostname2, file /var/zookeeper/myid will contain "2"
Server 3: Host name = hostname3, file /var/zookeeper/myid will contain "3"
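Creating each myid file is a one-liner on each host (a sketch assuming the dataDir of /var/zookeeper from the zoo.cfg above; run the echo with the matching identifier on the matching server):

```shell
# Run this on hostname1; use "2" on hostname2 and "3" on hostname3.
mkdir -p /var/zookeeper
echo "1" > /var/zookeeper/myid
cat /var/zookeeper/myid   # prints 1
```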
That completes the ZooKeeper setup, but before starting the ZooKeeper quorum you may want to take a peek at the ZooKeeper administration guide, as a production setup is not child’s play.
If for any reason this setup needs to be done on a single machine, you need to change the client port and the peer and
election ports to be unique for each server, while the host name of all the servers stays the same,
for example localhost. |
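For instance, on a single machine the server entries might use distinct peer and election ports like these (hypothetical port numbers; each server would also need its own dataDir and clientPort in its own copy of zoo.cfg):

```
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
```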
Starting the ZooKeeper quorum is now a matter of running the ZooKeeper startup script shipped with the UltraESB-X
on each server instance, which resides in the ULTRA_HOME/bin
directory of the installations. So the command
for running the ZooKeeper quorum would be:
$ sh bin/zkServer.sh start
ZooKeeper should now be up and running if all went well. You can check the status of the ZooKeeper quorum before
proceeding with the UltraESB-X clustering configuration, with the ZooKeeper client (bin/zkCli.sh
)
shipped with the UltraESB-X Tools distribution.
If you want to enable clustering in UltraESB-X, all you have to do is define
the Cluster Manager's configuration in the server.xml as shown below. This
segment is not provided (or is commented out) in the default server.xml.
<bean id="cluster-manager" class="org.adroitlogic.x.clustering.CuratorClusterManager">
------
</bean>
By adding the above configuration, the UltraESB-X instance will start with
clustering support through the CuratorClusterManager
. Before we move forward, the
cluster manager needs to be configured with some parameters in order to function correctly.
Properties marked with * are mandatory.
Property | Description |
---|---|
connectString* | A comma separated list of ZooKeeper servers (belonging to the ZooKeeper ensemble in use) to connect to. A list of servers is provided because the underlying client will try to connect to one of those servers at the beginning; if the connection attempt fails, the client will try another server in the list. |
domain | The domain name of this UltraESB-X instance. |
commandSpace | The command space of this UltraESB-X instance. All the UltraESB-X instances in the same command space will receive and execute the distributed commands notified within that space. |
zkUsername | The username to be used by this UltraESB-X instance when setting the ACL (Access Control Lists) of ZNodes. |
zkPassword | The password to be used by this UltraESB-X instance when setting the ACL (Access Control Lists) of ZNodes. The digest ACL scheme is used, where the username and password are joined together (as "zkUsername:zkPassword") and hashed to authorize clients. |
connectionTimeoutMs | The connection timeout to be used by the underlying Apache Curator client. |
sessionTimeoutMs | The session timeout to be used by the underlying Apache Curator client. When configuring sessionTimeoutMs and connectionTimeoutMs, make sure that the session timeout is larger than the connection timeout; otherwise the cluster manager may behave inconsistently. |
startupTimeoutMs | The maximum time the cluster manager will wait to connect to the ZooKeeper ensemble at startup. If the cluster manager is unable to connect to ZooKeeper within that time, the instance fails to start. |
retryPolicy | The underlying Apache Curator client uses a retry policy to retry connecting to ZooKeeper whenever the connection is lost. The same retry policy is used whenever the Curator client has to retry an operation, such as creating/deleting ZNodes, due to network interruptions. |
The bean configuration will look like this once all the above mentioned properties are taken into account.
<bean id="cluster-manager" class="org.adroitlogic.x.clustering.CuratorClusterManager">
<property name="connectString" value="hostname1:port1,hostname2:port2,hostname3:port3"/>
<!--Optional configurations-->
<property name="commandSpace" value="newSpace"/>
<property name="domain" value="local"/>
<property name="zkUsername" value="admin"/>
<property name="zkPassword" value="password"/>
<property name="connectionTimeoutMs" value="15000"/>
<property name="sessionTimeoutMs" value="60000"/>
    <property name="startupTimeoutMs" value="30000"/>
<property name="retryPolicy">
<bean class="org.apache.curator.retry.RetryNTimes">
<constructor-arg name="n" value="200"/>
<constructor-arg name="sleepMsBetweenRetries" value="3000"/>
</bean>
</property>
</bean>
Now you have learnt how to set up the cluster manager in UltraESB-X. By setting the CuratorClusterManager bean,
your UltraESB-X instance will have the functionality of the distributed command framework. If you want to enable
failover support too, you have to add another bean to the server.xml, to specify the failover strategy you
will be using and this node’s configuration related to that failover strategy.
In order to start the UltraESB-X instance with failover support, a small modification has to be made
to the XML configuration shown in the Basic Clustering Configuration. First, a
failover-strategy
bean has to be defined. This bean should be of a class which extends
org.adroitlogic.x.clustering.failover.strategy.FailoverStrategy
. Then this failover-strategy bean has to be set as the
failoverStrategy
of the cluster manager. Therefore the bean configuration’s structure will look like below.
<bean id="cluster-manager" class="org.adroitlogic.x.clustering.CuratorClusterManager">
    .......
    <!-- Failover strategy -->
    <property name="failoverStrategy" ref="failover-strategy"/>
</bean>

<!-- Failover strategy bean -->
<bean id="failover-strategy" class="${failover_strategy's_class}">
    ......
</bean>
Here, we have referenced the bean named failover-strategy
, which is the class responsible for failover processing, as the
failoverStrategy
of the CuratorClusterManager. At this point, we have to choose the class to act as the failover-strategy
of the node being configured.
Currently there are 2 inbuilt failover strategies:
Node wise failover strategy
Group wise failover strategy
You have to choose one of those strategies as per your requirement and set it as the failover-strategy
bean. Apart from these
inbuilt strategies, UltraESB-X has the capability to introduce new strategies easily. The configuration details of the
existing failover strategies are described below.
UltraESB-X failover support is very flexible due to its pluggable nature. All you have to do is choose the strategy
that matches your requirement and plug it into the cluster-manager bean as shown before. We have currently included 2
strategies which are expected to address 2 of the most commonly encountered failover requirements. More strategies will be added in
the future.
This strategy provides failover to individual nodes in a cluster. Consider a very simple scenario where a cluster has 3 nodes and should run 3 projects which are configured as below.
Projects
Project name | Description |
---|---|
project1 |
Listens for SFTP messages and copies the files received to an external storage. Only one instance of this project should be running in the cluster. |
project2 |
Listens for JMS messages of 2 queues and submits the messages to an external JMS queue. Only one instance must be running at a time. |
project3 |
Listens for HTTP requests and sends them to an external endpoint. |
This cluster should guarantee that all 3 distinct projects are running without any downtime. That is, the external clients who are sending messages to the nodes running in this cluster should see the corresponding service as highly available. How do we address this situation?
We distribute those projects among the nodes in the cluster as shown below. Each node is configured to run a specific project by default. But a node may start other projects in order to provide failover to another node. Therefore, each node is configured with all 3 projects, but only one project is pinned to a given node.
Relationship between nodes and projects
Node | Configured projects | Projects to be run by default |
---|---|---|
node1 |
project1,project2,project3 |
project1 |
node2 |
project1,project2,project3 |
project2 |
node3 |
project1,project2,project3 |
project3 |
We first define a failover matrix. It’s just a map where keys are node names and values are comma separated lists of node names. The dependencies among nodes can be specified as below.
Node | Nodes expecting to provide failover to "Node" |
---|---|
node1 |
node2, node3 |
node2 |
node1, node3 |
node3 |
node1, node2 |
In short, each node provides failover to all other nodes. Say node1 goes down. Then either node2 or node3 will
take over the operations of node1 and start its projects (in this case, project1), i.e. start acting as node1. Whenever node1
comes back online, the node which is currently acting as node1 will stop project1 and stop acting as node1. Let’s see how
we configure this solution in UltraESB-X. We can deploy this solution easily with the node wise failover strategy
as follows.
As you saw before in the Failover Clustering Configuration, we have to plug the failover-strategy
bean into our cluster manager. Since we selected the node wise failover strategy
to address this
scenario, we should use the class org.adroitlogic.x.clustering.failover.strategy.NodeWiseFailoverStrategy
as the
failover-strategy bean, as below.
<bean id="failover-strategy" class="org.adroitlogic.x.clustering.failover.strategy.NodeWiseFailoverStrategy">
    ......
</bean>
There are several properties, such as the failover matrix, to be configured when using this strategy.
Properties marked with * are mandatory.
failoverMatrix* |
A map with node names as keys and comma separated lists of node names as values. For each key
in this map, the values are the node names that are supposed to provide failover to the node given by the key. The following XML segment shows the failover matrix for the solution to the scenario discussed above. Note that we want each node to provide failover to the other two nodes. |
<property name="failoverMatrix">
<map>
<entry key="node1" value="node2,node3"/>
<entry key="node2" value="node1,node3"/>
<entry key="node3" value="node1,node2"/>
</map>
</property>
A Complex Scenario
The above failover matrix is for the scenario we discussed. We specified it as above since we wanted each node to provide failover to all the other nodes. There can be scenarios like the one below where only a subset of nodes is expected to provide failover to a given node, or the relationship can be even more complex.

<property name="failoverMatrix">
    <map>
        <entry key="node1" value="node2,node3"/>
        <entry key="node2" value="node1,node3"/>
        <entry key="node3" value="node1,node2"/>
        <entry key="node4" value="node5"/>
        <entry key="node6" value="node7,node8,node9,node10"/>
        <entry key="node7" value="node6,node8,node9,node10"/>
        <entry key="node8" value="node6,node7,node9,node10"/>
        <entry key="node9" value="node6,node7,node8,node10"/>
        <entry key="node10" value="node6,node7,node8,node9"/>
    </map>
</property>

Here, node1, node2 and node3 provide failover to each other; node5 provides failover to node4 while node4 provides failover to no one; and node6, node7, node8, node9 and node10 provide failover to each other. |
recursivelyFailover |
A boolean enabling recursive failover, where a node already acting as another node may itself be failed over. If you are willing to use this feature, you should make sure that the nodes which can have a recursive relationship when providing failover have the common set of projects configured properly in each of those nodes. |
failoverOnStartup |
A boolean which determines whether this node checks, on startup, the availability of the nodes to which it provides failover. |
waitTimeBeforeFailoverMs |
Whenever this strategy is notified about the unavailability of a node, it waits this amount of time (in milliseconds) before starting to provide failover, so that transient disconnections do not trigger unnecessary failovers. |
distributedLockWaitTimeMs |
This strategy makes use of distributed locks to make sure that no two nodes start acting as
the same node whenever that node goes down. Therefore, the nodes willing to act as a node that went down compete for
a distributed lock. This property specifies the maximum time (in milliseconds) to wait when acquiring that lock. |
Under this strategy, the final configuration of the 'failover-strategy' bean will look like this.
<bean id="failover-strategy" class="org.adroitlogic.x.clustering.failover.strategy.NodeWiseFailoverStrategy">
<property name="failoverMatrix">
<map>
<entry key="node1" value="node2,node3"/>
<entry key="node2" value="node1,node3"/>
<entry key="node3" value="node1,node2"/>
</map>
</property>
<property name="recursivelyFailover" value="true"/>
<property name="failoverOnStartup" value="true"/>
<property name="distributedLockWaitTimeMs" value="2000"/>
<property name="waitTimeBeforeFailoverMs" value="1000"/>
</bean>
By configuring the failover-strategy
bean we are halfway through deploying our
solution to the sample scenario.
What we are left with is the configuration of the projects.
When you are configuring an UltraESB-X instance with NodeWiseFailoverStrategy
, it is your responsibility to configure the projects
as well, to match the failover strategy you chose. Therefore, if you are using this strategy and you want a project to respond to
failover events, you should add the following evaluator to that project’s "project.xpml"
as an x:evaluator
, as
shown below.
<x:evaluators>
........
<x:evaluator>
<bean class="org.adroitlogic.x.base.evaluators.NodeWiseFailoverEnvironmentEvaluator">
<constructor-arg name="weight" value="1"/>
<constructor-arg name="pinnedNodes">
<list>
<value>node1</value>
<value>node2</value>
</list>
</constructor-arg>
</bean>
</x:evaluator>
</x:evaluators>
weight |
The weight this evaluator returns when its condition is satisfied. |
pinnedNodes |
Names of the nodes in which this project is configured to run by default (at startup). For this project to
start in a given UltraESB-X instance, that instance should either be one of those pinned nodes or be currently acting as one of them. |
In our sample scenario we should configure the evaluators in each project with:
node1 as the pinned node for project1.
node2 as the pinned node for project2.
node3 as the pinned node for project3.
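For example, project2's evaluator would differ from the one shown above only in its pinned node (a sketch following the same structure, with the class name and arguments taken from the earlier example):

```xml
<x:evaluators>
    <x:evaluator>
        <bean class="org.adroitlogic.x.base.evaluators.NodeWiseFailoverEnvironmentEvaluator">
            <constructor-arg name="weight" value="1"/>
            <constructor-arg name="pinnedNodes">
                <list>
                    <!-- project2 runs on node2 by default -->
                    <value>node2</value>
                </list>
            </constructor-arg>
        </bean>
    </x:evaluator>
</x:evaluators>
```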
Congratulations! You now know how to configure the inbuilt node wise failover strategy in your UltraESB-X clustering deployment.
This strategy provides failover to groups of nodes; it can be considered a strategy providing failover for data centers.
Let’s consider a scenario where there is a service, served by a data center, which should be highly available. Outsiders should not see any downtime in this service. The data center has 6 identical nodes providing the same service, and a load balancer is used to direct incoming requests to individual nodes. How do we ensure that there will be no downtime in this service as seen by outsiders?
The most common solution is keeping a backup data center which comes online if the first data center loses a majority of its
nodes. Let’s name the main data center which originally provides the service the PDC (Primary Data Center)
and the backup
data center the SDC (Secondary Data Center)
. All the nodes in these two data centers are identical, and therefore each node
has the same set of projects. Hence there is no dependency between data centers and projects other than the dependency
between the two data centers.
We can use the group failover strategy
inbuilt with UltraESB-X to deploy this solution, considering the two
data centers, PDC and SDC, as two groups in this strategy. Let’s see how we configure and deploy our solution.
As you saw before in the Failover Clustering Configuration, we have to plug the failover-strategy
bean into our cluster manager. Since we have selected the group failover strategy
for this scenario,
we should use the class org.adroitlogic.x.clustering.failover.strategy.GroupFailoverStrategy
as the failover-strategy
bean, as below.
<bean id="failover-strategy" class="org.adroitlogic.x.clustering.failover.strategy.GroupFailoverStrategy">
    ......
</bean>
There are several properties, such as the failover matrix, to be configured when using this strategy.
Properties marked with * are mandatory.
groupName* |
The name of the group of this UltraESB-X instance. |
failoverMatrix* |
A map showing the relationships among the node groups. The keys are the group names; for each key, a list of
FailoverGroupConfig beans specifies the groups providing failover to that group and the conditions that trigger the failover, as in the following example. |
<property name="failoverMatrix">
<map>
<entry key="PDC">
<list>
<bean class="org.adroitlogic.x.clustering.failover.strategy.config.FailoverGroupConfig">
<property name="failoverGroup" value="SDC"/>
<property name="triggerOnNodeCount" value="3"/>
<property name="triggerOnPercentage" value="50"/>
<property name="triggerOnMajority" value="true"/>
</bean>
</list>
</entry>
</map>
</property>
failoverGroup |
The name of the group which is providing failover. |
triggerOnNodeCount |
If the number of available nodes in the corresponding group drops below this amount, the group given by failoverGroup will start providing failover to it. |
triggerOnPercentage |
If the active node percentage (out of total nodes) drops below the amount specified by this parameter, this group will start providing failover to the corresponding group. |
triggerOnMajority |
If set to true, this group will start providing failover when the corresponding group loses the majority of its nodes. |
When a group is checking another group’s statistics (active node count, total node count, etc.) to determine whether to start providing failover, fulfilling any single condition given by the above described properties is enough to trigger failover. That is, any of the active node count dropping below triggerOnNodeCount, the active node percentage dropping below triggerOnPercentage, or the loss of a majority of nodes (when triggerOnMajority is true)
can cause this strategy to start providing failover to that group. |
failoverOnStartup |
Whether to check the availability of the groups to which this instance provides failover
on this instance’s startup. |
Finally, this is how our "failover-strategy"
bean looks once configured to address our
scenario.
<bean id="failover-strategy" class="org.adroitlogic.x.clustering.failover.strategy.GroupFailoverStrategy">
<property name="failoverMatrix">
<map>
<entry key="PDC">
<list>
<bean class="org.adroitlogic.x.clustering.failover.strategy.config.FailoverGroupConfig">
<property name="failoverGroup" value="SDC"/>
<property name="triggerOnNodeCount" value="3"/>
<property name="triggerOnPercentage" value="50"/>
<property name="triggerOnMajority" value="true"/>
</bean>
</list>
</entry>
</map>
</property>
<property name="failoverOnStartup" value="true"/>
<property name="groupName" value="PDC"/>
</bean>
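On a node in the backup data center, the same strategy bean would plausibly carry the same failover matrix but declare its own group name (a sketch only; whether SDC itself needs matrix entries depends on your topology):

```xml
<!-- failover-strategy bean for a node in the SDC group (sketch) -->
<bean id="failover-strategy" class="org.adroitlogic.x.clustering.failover.strategy.GroupFailoverStrategy">
    <property name="failoverMatrix">
        <map>
            <!-- same entry as on the PDC nodes: SDC provides failover to PDC -->
            <entry key="PDC">
                <list>
                    <bean class="org.adroitlogic.x.clustering.failover.strategy.config.FailoverGroupConfig">
                        <property name="failoverGroup" value="SDC"/>
                        <property name="triggerOnNodeCount" value="3"/>
                        <property name="triggerOnPercentage" value="50"/>
                        <property name="triggerOnMajority" value="true"/>
                    </bean>
                </list>
            </entry>
        </map>
    </property>
    <property name="failoverOnStartup" value="true"/>
    <property name="groupName" value="SDC"/>
</bean>
```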
We are almost done with our solution, but we still have to configure the projects deployed in the UltraESB-X instances to respond to the failover events.
When you are configuring an UltraESB-X instance with GroupFailoverStrategy
, it is your responsibility to configure the projects
as well, to match the failover strategy you chose. Therefore, if you are using this strategy and you want a project to respond to
failover events, you should add the following evaluator to that project’s "project.xpml"
as an x:evaluator
, as
shown below. In this strategy, we don’t have to configure each project separately, since all the projects use the same
configuration in each group.
<x:evaluators>
........
<x:evaluator>
<bean class="org.adroitlogic.x.base.evaluators.GroupFailoverEnvironmentEvaluator">
<constructor-arg name="weight" value="1"/>
<constructor-arg name="pinnedGroups">
<list>
<value>SDC</value>
</list>
</constructor-arg>
</bean>
</x:evaluator>
</x:evaluators>
weight |
The weight this evaluator returns when its condition is satisfied. |
pinnedGroups |
Names of the groups in which this project is configured to run by default (at startup). For this project to
start in a given UltraESB-X instance, that instance should either belong to one of those pinned groups or belong to a group currently acting for one of them. |
Congratulations! We have finished configuring our solution. What we configured was a very simple scenario, but UltraESB-X failover support and the group failover strategy are capable of handling more complex scenarios, such as multiple groups providing backup to a given group.