How to validate whether my WGS is running and which logs help explain the situation

Workgroup Server status can be validated in several ways. This article uses the Enterprise Manager and Command Line Interface as examples. Before we begin, we also recommend reviewing the article My WGS Disconnected to understand any changes or updates that may be required on the Manage/Navigator side.

Using Enterprise Manager

Enterprise Manager is the recommended way to check whether WGS is active. On the deployed WGS expert, the colour indicates the current state:

Red colour means it is not running.
Yellow means it is starting.
Green shows that it is up & running.
No colour means WGS has not been initialized since the CEP started.

When WGS is fully up and running, it publishes four types of facts in a tree structure. They are:

DBSTATS : This section shows whether the WGS connected to the database successfully and provides information about the database it connected to. It also displays errors if the connection could not be established.
License : This section lists the currently applied (in-use) license limits, expiry date, and related information.
WGSName : This is the section where WGS publishes facts/metrics about the middleware broker/queue manager and its objects. In the sample screenshot below, it shows MQM.
MQM refers to the name of the WGS, and it may change depending on the environment (based on the name in the license or as defined in the WGS properties).
WGSTATS : This section stores information and metrics about the WGS itself.

In the screenshot below, the highlighted part shows the expected fact structure for a running WGS.

Note that WGS startup includes multiple steps, and one of the important ones is loading the cache from the database. If the amount of data is large, cache loading takes time. During this period, users may see that the WGS state in Enterprise Manager is Running but fail to connect via Manage/Navigator. Additionally, only the DBSTATS facts appear under the WGS fact tree. Give WGS enough time to load the cache from the database and complete its startup operations. Once this is done, all four fact types discussed previously are published, indicating that WGS is fully operational and users can connect via Navigator/Manage. The WGS log located at $AUTOPILOT_HOME/logs/log4j shows whether the cache load has completed.

Alternatively, WGS can be configured to start up with only the required data/cache. After startup finishes and WGS connects to the underlying middleware broker/queue manager, it begins collecting the cache. This option is called Minimal Cache Startup and is available in the WGS properties under the General tab. A screenshot is attached below for reference.

Using apnet from CLI

If, for some reason, users do not have access to Enterprise Manager, the apnet utility allows them to query the status of the WGS and to determine whether the facts are published. Together, these two steps confirm whether the WGS is fully operational.

Before validating the state of WGS using apnet, we recommend first checking whether the CEP that hosts WGS is running. This can be done by checking the process status from the Linux terminal with the command

ps -ef|grep ATPNODE

If the process is running, check whether the port configured for WGS is in use, by running

netstat -na|grep 4010

4010 is a sample port and should be replaced with the port configured in WGS properties.

If the port is in use, proceed with the next steps.
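
The two pre-checks above can be wrapped in a small shell sketch. The helper names (process_listed, port_listening) are illustrative and not part of any meshIQ tooling. The port check is slightly stricter than the plain grep: it only accepts a socket in the LISTEN state, which is an assumption about what "in use" should mean here.

```shell
#!/usr/bin/env bash
# Sketch of the pre-checks: CEP process present, WGS port listening.
# Both helpers read command output on stdin, so they also work on saved output.

# process_listed PATTERN: succeeds if any ps line (other than a grep command) contains PATTERN.
process_listed() {
    grep -v 'grep' | grep -q -- "$1"
}

# port_listening PORT: succeeds if a LISTEN line has an address ending in PORT.
# Accepts both "addr:4010" (Linux) and "addr.4010" (some other Unix systems).
port_listening() {
    grep -E "[.:]$1[[:space:]]" | grep -q 'LISTEN'
}

# Example run on the WGS host (4010 is the sample port from this article):
# ps -ef      | process_listed ATPNODE && echo "CEP process found"
# netstat -na | port_listening 4010    && echo "port 4010 is listening"
```

Filtering out the grep line avoids a false positive where the only match is the grep command itself.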

The command to query WGS status using apnet is provided below. In the command, update the domain IP address, port, username, password, and service name to match your environment. In the example below, the service name is WGS. In the response, look for the status field.

apnet -domain 172.16.31.152 -port 2323 -user Admin -password admin lookup WGS
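
As a minimal sketch, the lookup can be wrapped in a function that keeps only the lines mentioning the status. The variable names (APNET_DOMAIN and so on) and the function name wgs_status are hypothetical. The exact layout of the lookup response is not shown in this article, so the filter is a plain case-insensitive text match rather than a parser.

```shell
#!/usr/bin/env bash
# Connection details: the defaults are the sample values from this article,
# replace them (or export the variables) for your environment.
APNET_DOMAIN="${APNET_DOMAIN:-172.16.31.152}"
APNET_PORT="${APNET_PORT:-2323}"
APNET_USER="${APNET_USER:-Admin}"
APNET_PASSWORD="${APNET_PASSWORD:-admin}"

# wgs_status SERVICE: run the apnet lookup and print only lines that mention "status".
wgs_status() {
    apnet -domain "$APNET_DOMAIN" -port "$APNET_PORT" \
          -user "$APNET_USER" -password "$APNET_PASSWORD" \
          lookup "$1" | grep -i 'status'
}

# Example: wgs_status WGS
```

Reading the connection details from variables keeps environment-specific values in one place.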

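To check whether all required facts are published, use the apnet command provided below. This example queries the license facts, but it can be changed to query any other facts, such as WGSTATS or the name of your WGS. On a Linux shell, enclose the fact path in single quotes (or escape each backslash) so that the backslashes and the asterisk reach apnet unchanged.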

apnet -domain 172.16.31.152 -port 2323 -user Admin -password admin get WGS\LICENSE\*
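
To review several fact groups in one pass, the same command can be run in a loop. This is a sketch under assumptions: it assumes the other groups follow the same path pattern as the LICENSE example (service name, group name, wildcard), which this article does not confirm, and the names fact_path, query_fact_groups, and the APNET_* variables are hypothetical.

```shell
#!/usr/bin/env bash
# Connection details: sample values from this article, replace for your environment.
APNET_DOMAIN="${APNET_DOMAIN:-172.16.31.152}"
APNET_PORT="${APNET_PORT:-2323}"
APNET_USER="${APNET_USER:-Admin}"
APNET_PASSWORD="${APNET_PASSWORD:-admin}"

# fact_path SERVICE GROUP: build a path such as WGS\LICENSE\* (pattern taken from the example).
fact_path() {
    printf '%s\\%s\\*' "$1" "$2"
}

# query_fact_groups SERVICE GROUP...: run "apnet get" once per group and print the raw
# output under a header, so each group can be reviewed by eye.
query_fact_groups() {
    local service="$1" group
    shift
    for group in "$@"; do
        echo "== $service / $group =="
        apnet -domain "$APNET_DOMAIN" -port "$APNET_PORT" \
              -user "$APNET_USER" -password "$APNET_PASSWORD" \
              get "$(fact_path "$service" "$group")"
    done
}

# Example (MQM stands for the WGS name in your environment; group paths are assumed):
# query_fact_groups WGS DBSTATS LICENSE WGSTATS MQM
```

Because the path is passed as a quoted argument, the shell hands the backslashes and the asterisk to apnet without interpreting them.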

Going through WGS logs

WGS logs are named CEPName_wgs*.log4j* (where CEPName refers to the name of the CEP instance) and are located at $AUTOPILOT_HOME/logs/log4j (typically /opt/nastel/AutoPilotM6/logs/log4j, /opt/meshiq/platform/logs/log4j, or a user-defined path). These logs are specific to Workgroup Server and contain information written by the WGS component.
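
Based on the naming pattern above, a short sketch for locating the most recently written WGS log could look like this. The function name latest_wgs_log and the CEP name MYCEP are placeholders, and the sketch assumes that file names contain no newlines.

```shell
#!/usr/bin/env bash
# latest_wgs_log LOG_DIR CEP_NAME: print the most recently modified WGS log file.
# Prints nothing if no file matches CEP_NAME_wgs*.log4j* in LOG_DIR.
latest_wgs_log() {
    ls -t "$1"/"$2"_wgs*.log4j* 2>/dev/null | head -n 1
}

# Example (MYCEP is a placeholder for your CEP instance name):
# latest_wgs_log "$AUTOPILOT_HOME/logs/log4j" MYCEP
```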

If the requirement is to understand why WGS went down, we recommend opening the WGS log from the time it went down, or going to the last log entries before the service was restarted. In the log, look for entries stating that WGS is stopping (see the samples below). From this point, scroll up and search for any ERROR entries, which should provide insight into the cause of the issue.

2025-01-13T12:18:24,644 INFO [WGSExpert][WGServer-Thread/MQM] - Stopping all network & communication services | INFO | 2025-01-13 12:18:24.643224 +0530 | APPL=WGSExpert#RUNTIME=7872@ODIN#SERVER=ODIN#NETADDR=169.254.197.38#DATACENTER=UNKNOWN#GEOADDR=0,0
2025-01-13T12:18:24,644 INFO [WGSExpert][WGServer-Thread/MQM] - Closing RegistrationListener | INFO | 2025-01-13 12:18:24.643766 +0530 | APPL=WGSExpert#RUNTIME=7872@ODIN#SERVER=ODIN#NETADDR=169.254.197.38#DATACENTER=UNKNOWN#GEOADDR=0,0
2025-01-13T12:18:24,645 INFO [WGSExpert][WGServer-Thread/MQM] - Closing ClientListener | INFO | 2025-01-13 12:18:24.644702 +0530 | APPL=WGSExpert#RUNTIME=7872@ODIN#SERVER=ODIN#NETADDR=169.254.197.38#DATACENTER=UNKNOWN#GEOADDR=0,0
2025-01-13T12:18:24,645 INFO [WGSExpert][WGServer-Thread/MQM] - Closing AysncListener | INFO | 2025-01-13 12:18:24.644823 +0530 | APPL=WGSExpert#RUNTIME=7872@ODIN#SERVER=ODIN#NETADDR=169.254.197.38#DATACENTER=UNKNOWN#GEOADDR=0,0
2025-01-13T12:18:24,646 INFO [WGSExpert][WGServer-Thread/MQM] - Stopping all background threads, connection pools | INFO | 2025-01-13 12:18:24.645441 +0530 | APPL=WGSExpert#RUNTIME=7872@ODIN#SERVER=ODIN#NETADDR=169.254.197.38#DATACENTER=UNKNOWN#GEOADDR=0,0
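
The "find the stop, then scroll up for errors" step can be sketched with awk. This assumes that error entries carry the level token ERROR in the same way the samples above carry INFO, and that the first "Stopping all network & communication services" entry in the file marks the shutdown of interest. The function name errors_before_stop is illustrative.

```shell
#!/usr/bin/env bash
# errors_before_stop LOG_FILE: print every ERROR entry logged before the first
# "Stopping all network & communication services" entry in the file.
# Prints nothing if the file contains no such stop entry.
errors_before_stop() {
    awk '
        / ERROR / { errors[++n] = $0 }          # remember each ERROR entry as it is read
        /Stopping all network & communication services/ {
            for (i = 1; i <= n; i++) print errors[i]
            exit                                 # ignore everything after the stop
        }
    ' "$1"
}

# Example: errors_before_stop "$AUTOPILOT_HOME/logs/log4j/MYCEP_wgs.log4j"
```

The entries closest to the stop are printed last, so the end of the output is usually the best place to start reading.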

If the requirement is to understand why WGS startup is taking too long, open the active WGS log and look for the following entries:

CacheUtils.CacheLoad.call loaded
CacheUtils.CacheLoad.load loaded: elapsed.ms=665

The first entry (CacheLoad.call) indicates that WGS is still loading the cache, meaning it is still initializing. There will be multiple CacheLoad.call entries, because WGS must load the cache for many object types. The second entry (CacheLoad.load) is reported when WGS completes loading the cache, and it also shows how long the operation took, in milliseconds.
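
A quick way to see where startup stands is to count the CacheLoad.call entries and check whether a CacheLoad.load entry has appeared yet. The sketch below matches only the two strings quoted above. The function name cache_load_summary and its output wording are illustrative, not WGS output.

```shell
#!/usr/bin/env bash
# cache_load_summary LOG_FILE: count CacheLoad.call entries and report the most
# recent CacheLoad.load entry (which carries elapsed.ms), if one exists.
cache_load_summary() {
    awk '
        /CacheUtils\.CacheLoad\.call loaded/ { calls++ }
        /CacheUtils\.CacheLoad\.load loaded/ { last_load = $0 }
        END {
            printf "CacheLoad.call entries: %d\n", calls
            if (last_load != "") print "completed: " last_load
            else                 print "completed: not yet"
        }
    ' "$1"
}

# Example: cache_load_summary "$AUTOPILOT_HOME/logs/log4j/MYCEP_wgs.log4j"
```

Running it repeatedly against the active log shows whether the call count is still growing while the completion line is absent.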

For any of these tasks, we recommend referring to the WGS log. If the log is difficult to interpret, collect the entire WGS log and contact the meshIQ technical support team for assistance.