Intelligent MQ Cluster workload balancing with AMQSCLM

Availability

Included in WebSphere MQ version 7.0.1.8 as a sample program, in both compiled and source form, except on z/OS.

 

Overview

IBM has included a sample Cluster Queue Monitoring program (amqsclm) in the package. It was added because the MQ product has no built-in dynamic cluster workload balancing.

Because the amqsclm program is not part of the product, you need to start it yourself or start it as an MQ service. It runs as a separate process.
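
Starting the monitor as an MQ service could be scripted roughly as below. This is only a sketch: it generates an MQSC DEFINE SERVICE command as text for you to review and run through runmqsc yourself. The service name, the install path /opt/mqm/samp/bin/amqsclm and the argument string are assumptions for illustration, not values taken from the IBM documentation.

```python
def mqsc_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def define_service_mqsc(service_name: str, start_cmd: str, start_args: str) -> str:
    """Return an MQSC DEFINE SERVICE command that starts a long-running program.

    CONTROL(QMGR) asks the queue manager to start the service when it starts;
    SERVTYPE(SERVER) marks it as a single long-running process.
    """
    lines = [
        f"DEFINE SERVICE({service_name}) +",
        "       CONTROL(QMGR) +",
        "       SERVTYPE(SERVER) +",
        f"       STARTCMD({mqsc_quote(start_cmd)}) +",
        f"       STARTARG({mqsc_quote(start_args)})",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical name, path and arguments; adjust to your installation.
    print(define_service_mqsc(
        "AMQSCLM.MONITOR",
        "/opt/mqm/samp/bin/amqsclm",
        "-m QM1 -c CLUS1 -q APP.* -r MONQ -l /mon_reports",
    ))
```

Generating the text rather than running it keeps the sketch free of any dependency on a live queue manager, so you can inspect the definition before applying it.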

 

What does this mean to you?

It means that if the consuming service is stuck, messages pile up in the input queue and the client services do not get their replies. This can lead to serious problems for the queue manager if it is not properly configured (maximum queue depth and dead-letter queue configuration).

 

Limitations

amqsclm does not add High Availability (HA) to your applications. If the queue manager fails, the messages in its queues are stuck until the queue manager is back in service. This is the same as always with MQ; there is no difference here because amqsclm is a normal MQ application.

Message sequencing (better known as "message affinity") is bad practice. Messages should not rely on other messages; they should be autonomous. Yes, I know there is "Bind on Open", which (among other options) should have been deprecated to avoid problems.

When you enable message transfer, messages are transferred and lose their original routing information. By the time messages arrive at their target queues (where the application is running), the context of the target queue manager is lost to applications such as this sample, and when it re-puts the messages they are put with no such context. The documentation does state "Message order, and any binding options are not preserved".

For this reason, if your messages are being targeted to specific queue managers, enabling the message transfer function would not be wise. This sample is not without its limitations; the documentation tries to make these as clear as possible so that it can be used in appropriate situations.

IBM MQ Clusters are designed for workload balancing.

Real MQ HA is implemented using hardware clustering such as HACMP or SYSPLEX. Zero downtime is only available on z/OS, using SYSPLEX and Shared Queues with the right setup. A newer, near-HA solution is the Multi-Instance Queue Manager, where the IBM MQ software itself provides the near-HA capability.
However, messages on a local queue on a failing queue manager are still stuck until the queue manager instance is back in service. Nothing changes there, and amqsclm does not change that.

 

Logic

The amqsclm program controls the CLWLPRTY of the managed queues, so active queues have CLWLPRTY=1 and inactive queues have CLWLPRTY=0. To do this, amqsclm inquires the queue status of the controlled queues at the configured interval.

Every queue manager with managed queues needs its own instance of the amqsclm service.
 

Figure 1

When amqsclm detects that the service is stuck, it sets CLWLPRTY=0 on the queue to prevent more messages from being sent to it.

Figure 2

When amqsclm is in control and has adjusted CLWLPRTY, it can requeue the messages by sending them to the remaining service queue managers, provided it was started with the '-t' option.

Figure 3

The amqsclm program detects when the failing service is reinstated (IPPROCS > 0) and adjusts CLWLPRTY to get traffic back to the service.

Figure 4

This is actually a great sample program.

Usage

 Usage:   AMQSCLM -m QMgrName -c ClusterName                         
                  (-q QNameMask | -f QListFile) -r MonitorQName     
                  -l ReportDir [-i Interval] [-t]                   
                  [-u ActiveVal] [-d] [-s] [-v]                     
                                                                    
  where:  -m QMgrName     Queue manager to monitor                  
          -c ClusterName  Cluster containing the queues to monitor  
          -q QNameMask    Queue(s) to monitor, where a trailing '*' will monitor all matching queues          
          -f QListFile    File containing list of queue(s) or queue masks to monitor. Contains one queue name per line.
                          (not valid if -q argument provided)       
          -r MonitorQName Local queue to be used exclusively by the monitor                                   
          -l ReportDir    Directory path to store logged informational messages                    
          -i Interval     (optional) Interval in seconds at which the monitor checks the queues.            
                          Defaults to 300 (5 minutes)               
          -t              (optional) Transfer messages from inactive queues to active instances in the cluster 
                          (No transfer by default)                  
          -u ActiveVal    (optional) Automatically switch the CLWLUSEQ value of a queue from the 'ActiveVal' to 'ANY' while the queue is inactive
                          (ActiveVal may be 'LOCAL' or 'QMGR')      
                          (not modified by default)                 
          -d              (optional) Enable additional diagnostic output
                          (No diagnostic output by default)         
          -s              (optional) Enable minimal statistics output per interval  
                          (No per-iteration statistics output by default) 
          -v              (optional) Log report information to standard out 

Example

Example: AMQSCLM -m QM1 -c CLUS1 -f /QList.txt -r MONQ -l /mon_reports -t -u QMGR -s                 

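This monitors queue manager QM1's local cluster queues in cluster CLUS1, specifically the queues listed in the file /QList.txt.
The local queue MONQ is used as the monitor's working queue, and informational messages are logged to /mon_reports. Messages are transferred from inactive queues to active instances (-t), CLWLUSEQ is switched from QMGR to ANY while a queue is inactive (-u QMGR), and minimal statistics are output per interval (-s).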
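
If you launch the monitor from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not part of the sample: the function build_amqsclm_args and its parameter names are hypothetical, and it assumes the executable is reachable as amqsclm. The flags themselves are the ones listed in the usage text.

```python
from typing import List, Optional


def build_amqsclm_args(
    qmgr: str,
    cluster: str,
    monitor_queue: str,
    report_dir: str,
    queue_mask: Optional[str] = None,
    queue_list_file: Optional[str] = None,
    interval: Optional[int] = None,
    transfer: bool = False,
    active_val: Optional[str] = None,
    diagnostics: bool = False,
    statistics: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list following the usage text (hypothetical helper)."""
    # The usage text allows either -q or -f, never both.
    if (queue_mask is None) == (queue_list_file is None):
        raise ValueError("provide exactly one of queue_mask (-q) or queue_list_file (-f)")
    if active_val not in (None, "LOCAL", "QMGR"):
        raise ValueError("active_val (-u) may only be 'LOCAL' or 'QMGR'")

    args = ["amqsclm", "-m", qmgr, "-c", cluster]
    if queue_mask is not None:
        args += ["-q", queue_mask]
    else:
        args += ["-f", queue_list_file]
    args += ["-r", monitor_queue, "-l", report_dir]
    if interval is not None:
        args += ["-i", str(interval)]
    if transfer:
        args.append("-t")
    if active_val is not None:
        args += ["-u", active_val]
    if diagnostics:
        args.append("-d")
    if statistics:
        args.append("-s")
    if verbose:
        args.append("-v")
    return args


if __name__ == "__main__":
    # Reproduces the example invocation shown above.
    print(" ".join(build_amqsclm_args(
        "QM1", "CLUS1", "MONQ", "/mon_reports",
        queue_list_file="/QList.txt", transfer=True,
        active_val="QMGR", statistics=True,
    )))
```

Returning a list rather than a single string means the result can be passed straight to subprocess.run without shell quoting concerns.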

Documentation

You can find the IBM documentation here; however, I strongly recommend taking a close look at the source code. That is where you find the real documentation.

 

 

Current state    State definition               Desired state
Active           IPPROCS > 0 and CLWLPRTY > 0   Active
Inactive         IPPROCS = 0 and CLWLPRTY = 0   Inactive
Become active    IPPROCS > 0 and CLWLPRTY = 0   Active
Become inactive  IPPROCS = 0 and CLWLPRTY > 0   Inactive
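
The state table can be expressed as a small decision function. This is a sketch of the logic as the table describes it, not a transcription of the amqsclm source. The names QueueState, classify and desired_clwlprty are made up for illustration, and the active priority of 1 is taken from the Logic section.

```python
from enum import Enum


class QueueState(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"
    BECOME_ACTIVE = "Become active"
    BECOME_INACTIVE = "Become inactive"


def classify(ipprocs: int, clwlprty: int) -> QueueState:
    """Map the two observed values to a row of the state table."""
    has_consumers = ipprocs > 0        # at least one application has the queue open for input
    receives_traffic = clwlprty > 0    # the cluster still routes messages to this instance
    if has_consumers and receives_traffic:
        return QueueState.ACTIVE
    if not has_consumers and not receives_traffic:
        return QueueState.INACTIVE
    if has_consumers:
        return QueueState.BECOME_ACTIVE
    return QueueState.BECOME_INACTIVE


def desired_clwlprty(ipprocs: int, active_priority: int = 1) -> int:
    """Return the CLWLPRTY the queue should end up with (0 when nobody consumes)."""
    return active_priority if ipprocs > 0 else 0


if __name__ == "__main__":
    for ipprocs, clwlprty in [(2, 1), (0, 0), (1, 0), (0, 1)]:
        state = classify(ipprocs, clwlprty)
        print(ipprocs, clwlprty, state.value, desired_clwlprty(ipprocs))
```

Only the two transitional states require the monitor to change anything; in the Active and Inactive rows the queue already has the desired CLWLPRTY.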