Collecting IBM statistics data – meshIQ

Some of the content described in this article is available in meshIQ Manage versions 11 and greater. See meshIQ Highlights v11 for an overview of feature changes.

This article explains the basic concepts of collecting data provided by IBM statistics with your meshIQ management application. It covers activating the statistics, collecting them in the AutoPilot database, and using the data in AutoPilot policies. Both distributed and z/OS queue managers are discussed.

Jump ahead:

Architecture review
Activating statistics capture
Statistics data collection
Streaming queue statistics to XRay

Architecture review

The diagram below shows the basic AutoPilot architecture. The data is collected in the Workgroup Server (WGS) database in tables designated for statistics. The statistics that can be collected include queue, channel and MQI statistics. This article focuses on queue statistics, but the concepts are the same for all statistics types. In addition to statistics, the queue manager can also publish accounting data. This is detailed data intended for billing or other application-specific usage. It is not covered in this article, although the concepts are the same.

An alternative is using reset queue statistics. This facility also reports the number of gets and puts since last reset.

These statistics are aggregated over time and do not identify any specific messages that have problems. XRay, which captures individual message flow information, provides far more detail about specific message flows.
Additionally, system metrics such as CPU and disk usage can be produced by the queue manager. These take a different path and are not covered in this article.

Activating statistics capture

The following uses the meshIQ management application to activate the statistics, but the process would be similar with any tool that is used.

Statistics can be turned on or off for each queue. This is controlled via the monitoring options, which can be set to ENABLED, DISABLED or, as shown below, QUEUE MANAGER. With QUEUE MANAGER, the queue uses whatever setting is in effect on the queue manager.

1. You should check your model queues when using QUEUE MANAGER since you may not want stats for all of your temporary dynamic queues. These should be set to DISABLED rather than QUEUE MANAGER except for temporary dynamic queues that require statistics.

2. The Queue Manager Properties window is displayed below. As shown, Statistics are on for queues with an 1800 second interval for publishing the stats.
3. Online collection should be activated. This increases the level of detail for the queue status monitor.
4. Accounting data collection should not be used with AutoPilot. The amount of data generated can be very high and unless properly tuned, can impact the performance of the Workgroup Server.

Distributed systems

On distributed systems, the stats are written just like MQ Events to the SYSTEM.ADMIN.STATISTICS.QUEUE.

This should be configured as an alias to one of the following depending on how you have your meshIQ management application set up.

NASTEL.EVENT.QUEUE
NASTEL.PUBSUB.EVENT.QUEUE (when running with publish subscribe process)
SYSTEM.ADMIN.QMGR.EVENT (for connection manager)

z/OS systems

On z/OS, the statistics are written to SMF and must be post-processed before they are sent to the Workgroup Server. The job that does this is nsqzas, which is run whenever SMF is switched and converts the data from SMF format to a form accepted by the Workgroup Server. This means that, in addition to the interval specified in the statistics collection, an SMF switch must occur before the data reaches AutoPilot. However, once received by the WGS, the processing is identical to distributed systems.

Statistics data collection

Example of data collected

The following shows an example of the data collected, using the meshIQ management application to demonstrate.

You can use the current date or a range to view the stats. You can also customize the columns to view what is pertinent to you.

SQL access

The data collected is stored in the WGS in SQL tables. In addition to access via our tools, this data could be used in reporting servers to provide reports about activity. It can also be exported to other databases for use within other tools.
The schema below is an example of the schema created for a DB2 database, although the content will be similar for other databases.

You do not have to create this schema yourself. It is created when the original database creation scripts are run, and is shown here only to describe the fields stored.

CREATE TABLE STATQUEUE (
                STATQUEUE_NO                  INTEGER     NOT NULL,
                MANAGER_NAME                  CHAR(33)    NOT NULL,
                MQNODE_NAME                   CHAR(49),
                MQMGR_NAME                    CHAR(49),
                STAT_TIME_STAMP               INTEGER,
                INTERVAL_START_DATE_TIME      TIMESTAMP,
                INTERVAL_END_DATE_TIME        TIMESTAMP,
                COMMAND_LEVEL                 INTEGER,
                QUEUE_NAME                    CHAR(49),
                QUEUE_TYPE                    INTEGER,
                Q_DEFINITION_TYPE             INTEGER,
                CREATION_TIMESTAMP            CHAR(13),
                CREATION_TIME                 CHAR(9),
                MIN_DEPTH                     INTEGER,
                MAX_DEPTH                     INTEGER,
                NONPERS_TIME_ON_Q_AVG         INTEGER,
                PERS_TIME_ON_Q_AVG            INTEGER,
                NONPERS_PUT_COUNT             INTEGER,
                PERS_PUT_COUNT                INTEGER,
                PUT_FAIL_COUNT                INTEGER,
                NONPERS_PUT1_COUNT            INTEGER,
                PERS_PUT1_COUNT               INTEGER,
                PUT1_FAIL_COUNT               INTEGER,
                NONPERS_PUT_BYTES             DOUBLE,
                PERS_PUT_BYTES                DOUBLE,
                NONPERS_GET_COUNT             INTEGER,
                PERS_GET_COUNT                INTEGER,
                GET_FAIL_COUNT                INTEGER,
                NONPERS_GET_BYTES             DOUBLE,
                PERS_GET_BYTES                DOUBLE,
                NONPERS_BROWSE_COUNT          INTEGER,
                PERS_BROWSE_COUNT             INTEGER,
                BROWSE_FAIL_COUNT             INTEGER,
                NONPERS_BROWSE_BYTES          DOUBLE,
                PERS_BROWSE_BYTES             DOUBLE,
                EXPIRED_MSG_COUNT             INTEGER,
                NOT_QUEUED_MSG_COUNT          INTEGER,
                PURGED_MSG_COUNT              INTEGER,
                CB_CRT_ALT_COUNT              INTEGER,
                CB_REMOVE_COUNT               INTEGER,
                CB_RESUME_COUNT               INTEGER,
                CB_SUSPEND_COUNT              INTEGER,
                CB_FAIL_COUNT                 INTEGER,
    CONSTRAINT STATQUEUEPK PRIMARY KEY
    (
       STATQUEUE_NO,
       MANAGER_NAME
    )
)

Data elements are collected from IBM supplied data. Specific information can be found at https://www.ibm.com/docs/en/ibm-mq/9.3?topic=messages-accounting-statistics-message-reference
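As an illustration of the kind of activity report mentioned above, the following is a minimal sketch that totals puts and gets per queue. It uses an in-memory SQLite database as a stand-in for the WGS database, reproduces only a handful of the STATQUEUE columns, and loads hypothetical sample rows; the queue and manager names are invented for the example. Against a real WGS database, only the SELECT statement would be relevant, and its syntax may need adjusting for your database.

```python
import sqlite3

# In-memory SQLite stand-in for the WGS database (an assumption for this
# sketch). Only a few STATQUEUE columns are reproduced.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STATQUEUE (
        STATQUEUE_NO      INTEGER  NOT NULL,
        MANAGER_NAME      CHAR(33) NOT NULL,
        QUEUE_NAME        CHAR(49),
        NONPERS_PUT_COUNT INTEGER,
        PERS_PUT_COUNT    INTEGER,
        NONPERS_GET_COUNT INTEGER,
        PERS_GET_COUNT    INTEGER,
        PRIMARY KEY (STATQUEUE_NO, MANAGER_NAME)
    )
    """
)

# Hypothetical sample statistics records (one row per queue per interval).
sample_rows = [
    (1, "WGS1", "ORDERS.IN", 100, 20, 90, 20),
    (2, "WGS1", "ORDERS.IN", 50, 10, 60, 10),
    (3, "WGS1", "BILLING.IN", 5, 0, 5, 0),
]
conn.executemany("INSERT INTO STATQUEUE VALUES (?, ?, ?, ?, ?, ?, ?)", sample_rows)

# Total puts and gets per queue across all stored intervals.
report = conn.execute(
    """
    SELECT QUEUE_NAME,
           SUM(NONPERS_PUT_COUNT + PERS_PUT_COUNT) AS TOTAL_PUTS,
           SUM(NONPERS_GET_COUNT + PERS_GET_COUNT) AS TOTAL_GETS
    FROM STATQUEUE
    GROUP BY QUEUE_NAME
    ORDER BY TOTAL_PUTS DESC
    """
).fetchall()

for queue_name, total_puts, total_gets in report:
    print(queue_name, total_puts, total_gets)
```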

Estimating size of the statistics database

The statistics records have the following sizes:

MQI: 623
Queue: 366
Channel: 557

To calculate size you would need to determine the number of stat records per day and number of days to keep on file. The number of statistics records per day will vary depending on the activity being performed. For the purpose of estimating, you could assume that all queues will have activity every day. Channel statistics are even more dynamic as they are created at the interval but also at channel end. As such, channels that end frequently could generate significant load.
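The calculation described above can be sketched as follows. This is a rough estimate only: it assumes the record sizes listed are bytes per record, and the queue count, channel count, interval and retention period are hypothetical values chosen for illustration. It also assumes one MQI record per interval and ignores the extra channel records written at channel end, so frequently ending channels would push the real figure higher.

```python
# Record sizes from the list above, assumed here to be bytes per record.
RECORD_SIZES = {"MQI": 623, "Queue": 366, "Channel": 557}


def estimate_size(records_per_day, retention_days, record_sizes=RECORD_SIZES):
    """Return the estimated storage for the given daily record counts."""
    return sum(
        record_sizes[kind] * count * retention_days
        for kind, count in records_per_day.items()
    )


# Hypothetical environment: 500 active queues, 20 channels, statistics
# published every 1800 seconds, 30 days kept on file.
STATISTICS_INTERVAL_SECONDS = 1800
intervals_per_day = 86400 // STATISTICS_INTERVAL_SECONDS  # 48 intervals

records_per_day = {
    "MQI": intervals_per_day,          # assumed: one record per interval
    "Queue": 500 * intervals_per_day,  # every queue active in every interval
    "Channel": 20 * intervals_per_day, # interval records only
}

estimate = estimate_size(records_per_day, retention_days=30)
print(f"Estimated size: {estimate / 1024 / 1024:.1f} MiB")
```

Because every queue is assumed to be active in every interval, this gives an upper bound for the queue statistics rather than a typical value.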

Using MQ Statistics in AutoPilot Policies

The AutoPilot WMQ Expert can be configured to receive MQ statistics. This is done by selecting Publish Statistics & Accounting on the General tab of WS Monitor Properties.

The Fact Options tab should also be specified. The following two settings are important when collecting statistics data. The others could be set to control which facts are published.

Expire facts(ms): If not set, the statistics for queues that no longer exist (such as temporary dynamic queues) will not be deleted. This should be set greater than the frequency that the statistics are produced. In the example below, it is 360,000 milliseconds (360 seconds / 6 minutes) which was greater than the collection frequency of 5 minutes.
Fact History Size: This allows derived historical analytics of the queue statistics to be used. It should be set to at least 50 to provide adequate history. It adds to the memory required for storing facts, which should be considered when setting the size.

When active, the statistics will be published under the expert as AutoPilot facts as shown in the example below. They will include the number of gets and puts, the average time on queue and other statistics.

These metrics can be used in AutoPilot policies to determine various characteristics, such as total puts and gets, types of calls, total bytes transferred, and the maximum and minimum depth of the queue.

The following would be used to determine message get and put rates:

1. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Gets\Mqimo_Gets_Persist_Count
2. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Gets\Mqimo_Gets_NonPersist_Count
3. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Puts_NonPersist_Count
4. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Put1s_NonPersist_Count
5. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Puts_Persist_Count
6. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Put1s_Persist_Count

For example, the total number of get message requests (excluding browses) for the collection period is the sum of the first two facts listed (1 + 2), and the total number of put requests is the sum of the remaining four (3 + 4 + 5 + 6). The difference between the put and get request counts gives insight into the behavior of the queue. Some applications process messages immediately, so the counts are the same. For others the counts vary slightly, which may be a result of the reporting frequency, and some process messages periodically, creating a sawtooth pattern.

The collection period would be reported as the update latency in milliseconds. The gets and puts per second would be calculated as follows:

total_count / (update latency) * 1000
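As a worked illustration of this formula, the sketch below totals the get and put facts for one collection period and converts them to per-second rates. The function name and the sample counts are hypothetical; in practice the values would come from the facts published under the expert.

```python
def per_second_rate(total_count, update_latency_ms):
    """Apply total_count / (update latency) * 1000 to get a per-second rate."""
    if update_latency_ms <= 0:
        raise ValueError("update latency must be a positive number of milliseconds")
    return total_count / update_latency_ms * 1000


# Hypothetical fact values for a single 5-minute collection period.
update_latency_ms = 300_000

persistent_gets = 1200
nonpersistent_gets = 300
total_gets = persistent_gets + nonpersistent_gets  # gets, excluding browses

# Puts and put1s, persistent and non-persistent.
total_puts = 1000 + 200 + 500 + 100

get_rate = per_second_rate(total_gets, update_latency_ms)
put_rate = per_second_rate(total_puts, update_latency_ms)

# A positive backlog means more messages were put than retrieved in the period.
backlog_delta = total_puts - total_gets

print(f"gets/s: {get_rate:.2f}, puts/s: {put_rate:.2f}, delta: {backlog_delta}")
```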

The following is an example policy based on these values:

Similarly, you could view the statistics over time using charts.

Using MQ statistics in your meshIQ management application

MQ statistics can be viewed using your meshIQ management application. This is done by selecting MQ Statistics… from the context menus for queues, channels or queue managers.

This will display a default view for the selected objects, such as queue as seen below.

You can update the date range using the Date mode list: Last 24 hours, Last 48 hours, Last 7 days, Custom Days Count (enter the number of previous days), or User Date Range (select a date range). When switching back and forth between User Date Range and Custom Days Count, the date range is updated. For example, if you view records after selecting a Custom Days Count of 14 and then switch to User Date Range, the range shows the past 14 days.

You can modify the default view by selecting schema and selecting a subset of columns, as seen below.

Using a database “view” of the data is especially useful if you want to combine or simplify the columns displayed.

SELECT * FROM statqueue
SELECT QUEUE_NAME, MIN_DEPTH, MAX_DEPTH, NONPERS_PUT_COUNT, PERS_PUT_COUNT, NONPERS_PUT1_COUNT FROM statqueue
SELECT * FROM view1
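To illustrate how a view such as view1 might combine columns, the following is a minimal sketch. It again uses an in-memory SQLite database as a stand-in for the WGS database, with a cut-down statqueue table and one hypothetical row. The view definition and the TOTAL_PUTS column are assumptions made for this example, not a view shipped with the product.

```python
import sqlite3

# In-memory SQLite stand-in for the WGS database (an assumption for this
# sketch), with only the columns the view needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE statqueue (
        QUEUE_NAME         CHAR(49),
        MIN_DEPTH          INTEGER,
        MAX_DEPTH          INTEGER,
        NONPERS_PUT_COUNT  INTEGER,
        PERS_PUT_COUNT     INTEGER,
        NONPERS_PUT1_COUNT INTEGER,
        PERS_PUT1_COUNT    INTEGER
    )
    """
)

# Hypothetical statistics record.
conn.execute(
    "INSERT INTO statqueue VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("ORDERS.IN", 0, 42, 100, 20, 5, 1),
)

# Hypothetical view that collapses the four put counters into one column.
conn.execute(
    """
    CREATE VIEW view1 AS
    SELECT QUEUE_NAME,
           MIN_DEPTH,
           MAX_DEPTH,
           NONPERS_PUT_COUNT + PERS_PUT_COUNT
             + NONPERS_PUT1_COUNT + PERS_PUT1_COUNT AS TOTAL_PUTS
    FROM statqueue
    """
)

view_rows = conn.execute("SELECT * FROM view1").fetchall()
print(view_rows)
```

Selecting from the view then returns one simplified row per statistics record, which keeps the displayed column list short.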

Streaming queue statistics to XRay

XRay provides long-term data analysis as well as the ability to address decision support requirements such as volumes over time and busiest periods. An English-like query language (JKQL) allows ad hoc questions about application behavior to be answered. While the SQL database described above is a good option for a simple deployment, the big data aspects of XRay offer a longer-term ability to capture the required data and a more robust query language for getting to the root of the data. In addition to queue statistics, XRay can also collect MQ error logs and transaction activity to further correlate performance anomalies using machine learning, advanced analytics and objectives.

Using the same metrics collected above, these can be forwarded to XRay by indicating the required metrics on the Streaming tab using the raw metrics or periodic summary records.

As an example, to stream all MQI and queue statistics, the following would be specified on the WS_Monitor or equivalent expert. With WGS 10 and MQ V9 and above, additional system metrics are also collected and available for streaming.

Once streamed to XRay, the various displays can be used to analyze and present them as shown in the example below.

This chart is a 1 year breakdown of bytes sent by the queue manager as a bar chart and below that as a percentage by respective queue managers.

This chart, in contrast, shows a breakdown by queue, for both messages and bytes.

Other presentation types exist, including line charts, anomaly charts, histograms, heat maps, and many others.
