Some of the content described in this article is available in meshIQ Manage versions 11 and greater. See meshIQ Highlights v11 for an overview of feature changes.
This article explains the basic concepts of collecting IBM MQ statistics data with your meshIQ management application. It covers activating the statistics, collecting them in the AutoPilot database and using the data in AutoPilot policies. Both distributed and z/OS queue managers are discussed.
Architecture review
The diagram below shows the basic AutoPilot architecture. The data is collected in the Workgroup Server (WGS) database in tables designated for statistics. The statistics that can be collected include Queue, Channel and MQI statistics. This article focuses on queue statistics, but the concepts are the same for all statistics types. In addition to statistics, the queue manager can also publish accounting data. This is detailed data intended for billing or other application-specific usage. It is not covered in this article, although the concepts are the same.
These statistics are aggregated over time and do not identify specific messages that have problems. XRay, which captures individual message flow information, provides far more detail about specific message flows.
Additionally, system metrics such as CPU and Disk usage can be produced by the queue manager. These take a different path and are not covered in this article.
Activating statistics capture
The following uses the meshIQ management application to activate the statistics, but the process would be similar with any tool that is used.
Statistics can be turned on or off for each queue. This is controlled through the queue's monitoring options, which can be set to ENABLED, DISABLED or, as shown below, QUEUE MANAGER. With QUEUE MANAGER, the queue uses whatever the queue manager is set to.
1. Check your model queues when using QUEUE MANAGER, since you may not want statistics for all of your temporary dynamic queues. Set model queues to DISABLED rather than QUEUE MANAGER unless the temporary dynamic queues created from them require statistics.
2. The Queue Manager Properties window is displayed below. As shown, Statistics are on for queues with an 1800 second interval for publishing the stats.
3. Online collection should be activated. This increases the level of detail for the queue status monitor.
4. Accounting data collection should not be used with AutoPilot. The amount of data generated can be very high and unless properly tuned, can impact the performance of the Workgroup Server.
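For readers who script their configuration rather than use the properties windows, the settings above can be sketched as MQSC commands. The following Python helper is only an illustration that builds the command text. It assumes that the DISABLED and QUEUE MANAGER options correspond to the MQSC values OFF and QMGR, that MEDIUM is an acceptable online monitoring level, and that the model queue name is hypothetical.

```python
def statistics_mqsc(interval_seconds=1800, model_queues=()):
    """Build MQSC text that enables queue statistics at the queue manager level.

    Assumption: queues set to QUEUE MANAGER (MQSC value QMGR) inherit these values.
    """
    commands = [
        # Queue statistics on, published every interval_seconds.
        f"ALTER QMGR STATQ(ON) STATINT({interval_seconds})",
        # Online monitoring for queues; MEDIUM is an assumed level.
        "ALTER QMGR MONQ(MEDIUM)",
        # Leave queue accounting data collection off.
        "ALTER QMGR ACCTQ(OFF)",
    ]
    # Turn statistics off for model queues so that their temporary
    # dynamic queues do not generate statistics records.
    for name in model_queues:
        commands.append(f"ALTER QMODEL('{name}') STATQ(OFF)")
    return commands


# 'APP.MODEL.QUEUE' is a hypothetical model queue name.
for line in statistics_mqsc(1800, ["APP.MODEL.QUEUE"]):
    print(line)
```

The printed commands could be reviewed and then run through runmqsc or whichever administration tool is in use.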
Distributed systems
On distributed systems, the statistics are written, just like MQ events, to the SYSTEM.ADMIN.STATISTICS.QUEUE queue.
This queue should be configured as an alias to one of the following, depending on how your meshIQ management application is set up.
- NASTEL.EVENT.QUEUE
- NASTEL.PUBSUB.EVENT.QUEUE (when running with publish subscribe process)
- SYSTEM.ADMIN.QMGR.EVENT (for connection manager)
z/OS systems
On z/OS, the statistics are written to SMF and have to be post-processed before being sent to the Workgroup Server. The job that does this is nsqzas, which runs whenever SMF is switched and converts the data from SMF format to a form accepted by the Workgroup Server. This means that, in addition to the interval specified for statistics collection, an SMF switch must occur before the data reaches AutoPilot. Once received by the WGS, however, the processing is identical to that for distributed systems.
Statistics data collection
Example of data collected
The following shows an example of the data collected, using the meshIQ management application to demonstrate.
You can use the current date or a range to view the stats. You can also customize the columns to view what is pertinent to you.
SQL access
The data collected is stored in the WGS in SQL tables. In addition to access via our tools, this data could be used in reporting servers to provide reports about activity. It can also be exported to other databases for use within other tools.
The schema below is an example of the schema created for a DB2 database; the content will be similar for other databases.
You do not have to create this schema yourself. It is created when you run the original database creation scripts and is shown here to describe the fields stored.
CREATE TABLE STATQUEUE (
STATQUEUE_NO INTEGER NOT NULL,
MANAGER_NAME CHAR(33) NOT NULL,
MQNODE_NAME CHAR(49),
MQMGR_NAME CHAR(49),
STAT_TIME_STAMP INTEGER,
INTERVAL_START_DATE_TIME TIMESTAMP,
INTERVAL_END_DATE_TIME TIMESTAMP,
COMMAND_LEVEL INTEGER,
QUEUE_NAME CHAR(49),
QUEUE_TYPE INTEGER,
Q_DEFINITION_TYPE INTEGER,
CREATION_TIMESTAMP CHAR(13),
CREATION_TIME CHAR(9),
MIN_DEPTH INTEGER,
MAX_DEPTH INTEGER,
NONPERS_TIME_ON_Q_AVG INTEGER,
PERS_TIME_ON_Q_AVG INTEGER,
NONPERS_PUT_COUNT INTEGER,
PERS_PUT_COUNT INTEGER,
PUT_FAIL_COUNT INTEGER,
NONPERS_PUT1_COUNT INTEGER,
PERS_PUT1_COUNT INTEGER,
PUT1_FAIL_COUNT INTEGER,
NONPERS_PUT_BYTES DOUBLE,
PERS_PUT_BYTES DOUBLE,
NONPERS_GET_COUNT INTEGER,
PERS_GET_COUNT INTEGER,
GET_FAIL_COUNT INTEGER,
NONPERS_GET_BYTES DOUBLE,
PERS_GET_BYTES DOUBLE,
NONPERS_BROWSE_COUNT INTEGER,
PERS_BROWSE_COUNT INTEGER,
BROWSE_FAIL_COUNT INTEGER,
NONPERS_BROWSE_BYTES DOUBLE,
PERS_BROWSE_BYTES DOUBLE,
EXPIRED_MSG_COUNT INTEGER,
NOT_QUEUED_MSG_COUNT INTEGER,
PURGED_MSG_COUNT INTEGER,
CB_CRT_ALT_COUNT INTEGER,
CB_REMOVE_COUNT INTEGER,
CB_RESUME_COUNT INTEGER,
CB_SUSPEND_COUNT INTEGER,
CB_FAIL_COUNT INTEGER,
CONSTRAINT STATQUEUEPK PRIMARY KEY
(
STATQUEUE_NO,
MANAGER_NAME
)
)
Data elements are collected from IBM supplied data. Specific information can be found at https://www.ibm.com/docs/en/ibm-mq/9.3?topic=messages-accounting-statistics-message-reference
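As an illustration of the kind of report that could be built on this table, the sketch below totals puts and gets per queue over a date range. It uses an in-memory SQLite database with a subset of the STATQUEUE columns purely so that the example is self-contained. The sample rows, the queue names and the daily_activity function are invented for illustration, and the SQL may need adjusting for the database actually used by the WGS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the STATQUEUE columns, enough for an activity report.
conn.execute("""
    CREATE TABLE STATQUEUE (
        QUEUE_NAME CHAR(49),
        INTERVAL_END_DATE_TIME TIMESTAMP,
        NONPERS_PUT_COUNT INTEGER, PERS_PUT_COUNT INTEGER,
        NONPERS_PUT1_COUNT INTEGER, PERS_PUT1_COUNT INTEGER,
        NONPERS_GET_COUNT INTEGER, PERS_GET_COUNT INTEGER,
        MAX_DEPTH INTEGER
    )
""")
# Invented sample data: two intervals for one queue, one for another.
rows = [
    ("ORDERS.IN", "2024-01-01 00:30:00", 100, 20, 5, 0, 90, 20, 15),
    ("ORDERS.IN", "2024-01-01 01:00:00", 50, 10, 0, 0, 60, 10, 12),
    ("AUDIT.LOG", "2024-01-01 00:30:00", 5, 0, 1, 0, 0, 0, 5),
]
conn.executemany("INSERT INTO STATQUEUE VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)


def daily_activity(conn, start, end):
    """Return (queue, total puts, total gets, peak depth) per queue, busiest first."""
    sql = """
        SELECT QUEUE_NAME,
               SUM(NONPERS_PUT_COUNT + PERS_PUT_COUNT
                   + NONPERS_PUT1_COUNT + PERS_PUT1_COUNT) AS total_puts,
               SUM(NONPERS_GET_COUNT + PERS_GET_COUNT) AS total_gets,
               MAX(MAX_DEPTH) AS peak_depth
        FROM STATQUEUE
        WHERE INTERVAL_END_DATE_TIME >= ? AND INTERVAL_END_DATE_TIME < ?
        GROUP BY QUEUE_NAME
        ORDER BY total_puts DESC
    """
    return conn.execute(sql, (start, end)).fetchall()


report = daily_activity(conn, "2024-01-01 00:00:00", "2024-01-02 00:00:00")
for queue, puts, gets, peak in report:
    print(queue, puts, gets, peak)
```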
Estimating size of the statistics database
The statistics records have the following sizes:
- MQI: 623
- Queue: 366
- Channel: 557
To calculate size you would need to determine the number of stat records per day and number of days to keep on file. The number of statistics records per day will vary depending on the activity being performed. For the purpose of estimating, you could assume that all queues will have activity every day. Channel statistics are even more dynamic as they are created at the interval but also at channel end. As such, channels that end frequently could generate significant load.
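The estimate described above can be sketched as a small calculation. This is a rough illustration only: it assumes the record sizes listed above are in bytes, that every object produces exactly one record per statistics interval, and it ignores the extra channel records written at channel end. The function name and the example figures are hypothetical.

```python
# Record sizes from the list above (assumed to be bytes per record).
RECORD_SIZE = {"mqi": 623, "queue": 366, "channel": 557}


def estimate_size(kind, objects, interval_seconds, retention_days):
    """Estimate storage for one statistics type.

    Assumes one record per object per interval, every day.
    """
    intervals_per_day = 86_400 // interval_seconds
    records_per_day = objects * intervals_per_day
    return records_per_day * retention_days * RECORD_SIZE[kind]


# Example: 500 queues, 1800-second interval, 30 days kept on file.
queue_bytes = estimate_size("queue", 500, 1800, 30)
print(f"{queue_bytes / 1_000_000:.1f} MB")
```

Because real activity varies, a figure produced this way is best treated as an upper-end planning number for queues and a lower bound for frequently ending channels.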
Using MQ Statistics in AutoPilot Policies
The AutoPilot WMQ Expert can be configured to receive MQ statistics. This is done by selecting Publish Statistics & Accounting on the General tab of WS Monitor Properties.
The Fact Options tab should also be configured. The following two settings are important when collecting statistics data. The others can be set to control which facts are published.
- Expire facts(ms): If not set, the statistics for queues that no longer exist (such as temporary dynamic queues) will not be deleted. This should be set to a value greater than the interval at which the statistics are produced. In the example below, it is 360,000 milliseconds (360 seconds / 6 minutes), which is greater than the collection interval of 5 minutes.
- Fact History Size: This allows derived historical analytics to be used on the queue statistics. It should be set to at least 50 to provide adequate history. A larger history adds to the memory required for storing facts, which should be considered when setting the size.
When active, the statistics will be published under the expert as AutoPilot facts as shown in the example below. They will include the number of gets and puts, the average time on queue and other statistics.
These metrics can be used in AutoPilot policies to evaluate characteristics such as total puts and gets, types of calls, total bytes transferred, and the maximum and minimum depth of the queue.
The following would be used to determine message get and put rates:
1. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Gets\Mqimo_Gets_Persist_Count
2. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Gets\Mqimo_Gets_NonPersist_Count
3. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Puts_NonPersist_Count
4. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Put1s_NonPersist_Count
5. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Puts_Persist_Count
6. {WSMON}\Queue_Statistics\*\*\*\QStatistics\*\Puts\Mqimo_Put1s_Persist_Count
For example, the total number of get requests (excluding browses) for the collection period is the sum of the first two facts listed above, and the total number of put requests is the sum of the remaining four. The difference between put and get requests gives insight into the behavior of the queue. Some applications process messages immediately, so the counts are the same; others vary slightly, which may be a result of the reporting frequency; and some process messages periodically, creating a sawtooth pattern.
The collection period would be reported as the update latency in milliseconds. The gets and puts per second would be calculated as follows:
total_count / (update latency) * 1000
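The rate formula above can be worked through as a short calculation. The counts and the update latency used here are invented sample values, and the function name is hypothetical.

```python
def per_second(total_count, update_latency_ms):
    """Convert a count for the collection period into a per-second rate."""
    if update_latency_ms <= 0:
        raise ValueError("update latency must be a positive number of milliseconds")
    return total_count / update_latency_ms * 1000


# Invented sample values for a 5-minute (300,000 ms) collection period.
total_gets = 1200 + 300              # persistent + non-persistent gets
total_puts = 900 + 100 + 400 + 100   # puts and put1s, both persistence types
update_latency_ms = 300_000

get_rate = per_second(total_gets, update_latency_ms)
put_rate = per_second(total_puts, update_latency_ms)
print(f"gets/sec: {get_rate:.2f}, puts/sec: {put_rate:.2f}")
```

In this sample both totals are 1,500 requests over 300 seconds, so both rates work out to about 5 per second, which corresponds to an application that keeps up with its incoming messages.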
The following is an example policy based on these values:
Similarly, you could view the statistics over time using charts.
Using MQ statistics in your meshIQ management application
MQ statistics can be viewed using your meshIQ management application. This is done by selecting MQ Statistics… from the context menus for queues, channels or queue managers.
This displays a default view for the selected objects, such as the queue view seen below.
You can update the date range using the Date mode list: Last 24 hours, Last 48 hours, Last 7 days, Custom Days Count (enter the number of previous days), or User Date Range (select a date range). When switching back and forth between User Date Range and Custom Days Count, the date range is updated. For example, if you view records after selecting a Custom Days Count of 14 and then switch to User Date Range, the range shows the past 14 days.
You can modify the default view by selecting schema and selecting a subset of columns, as seen below.
Using a database “view” of the data is especially useful if you want to combine or simplify the columns displayed.
- SELECT * FROM statqueue
- SELECT QUEUE_NAME, MIN_DEPTH, MAX_DEPTH, NONPERS_PUT_COUNT, PERS_PUT_COUNT, NONPERS_PUT1_COUNT FROM statqueue
- SELECT * FROM view1
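As a sketch of how a view can combine columns, the example below defines a view that adds the four put counters into a single total. It runs against an in-memory SQLite table with a subset of the statqueue columns so that it is self-contained. The view name view1 comes from the list above, but its definition, the TOTAL_PUTS column and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the statqueue columns used by the view.
conn.execute("""
    CREATE TABLE statqueue (
        QUEUE_NAME CHAR(49), MIN_DEPTH INTEGER, MAX_DEPTH INTEGER,
        NONPERS_PUT_COUNT INTEGER, PERS_PUT_COUNT INTEGER,
        NONPERS_PUT1_COUNT INTEGER, PERS_PUT1_COUNT INTEGER
    )
""")
# Hypothetical view definition: depth columns plus one combined put total.
conn.execute("""
    CREATE VIEW view1 AS
    SELECT QUEUE_NAME, MIN_DEPTH, MAX_DEPTH,
           NONPERS_PUT_COUNT + PERS_PUT_COUNT
           + NONPERS_PUT1_COUNT + PERS_PUT1_COUNT AS TOTAL_PUTS
    FROM statqueue
""")
# Invented sample row.
conn.execute("INSERT INTO statqueue VALUES ('ORDERS.IN', 0, 15, 100, 20, 5, 0)")

view_rows = conn.execute("SELECT * FROM view1").fetchall()
for row in view_rows:
    print(row)
```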
Streaming queue statistics to XRay
XRay provides long-term data analysis as well as the ability to address decision support requirements such as volumes over time and busiest periods. An English-like query language (JKQL) allows ad hoc questions about application behavior to be answered. While the SQL database described above is a good option for a simple deployment, the big data aspects of XRay offer a longer-term ability to capture the required data and a more robust query language to get to the root of the data. In addition to queue statistics, XRay can also collect MQ error logs and transaction activity to further correlate performance anomalies using machine learning, advanced analytics and objectives.
The same metrics collected above can be forwarded to XRay by selecting the required metrics on the Streaming tab, either as raw metrics or as periodic summary records.
As an example, to stream all MQI and queue statistics, the following would be specified on the WS_Monitor or equivalent expert. With WGS 10 and MQ V9 and above, additional system metrics are also collected and available for streaming.
Once streamed to XRay, the various displays can be used to analyze and present them as shown in the example below.
This chart is a one-year breakdown of bytes sent by the queue manager, shown as a bar chart and, below that, as a percentage by queue manager.
This chart, by contrast, shows a breakdown by queue, for both messages and bytes.
Other presentation types exist, including line charts, anomaly charts, histograms, heat maps, and many others.