You may run into an issue where MQ agents are displayed as down or in an unknown state even though the agent processes are actually running fine. For example, on UNIX the following commands confirm that the agent is up:
- ps -ef | grep nsq: shows nsqmq, nsqpub and nsqmsg processes are running
- netstat -an | grep 5010: shows that the agent is listening on its port (5010 in this example)
If both of these are successful, continue to the next set of tests.
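The two checks above can be scripted so that they give a clear pass or fail result. The following is a minimal sketch, assuming the process names listed above (nsqmq, nsqpub, nsqmsg) and netstat output in which a listening socket is marked LISTEN; the function names missing_agent_procs and is_listening are hypothetical helpers, not part of the product.

```shell
#!/bin/sh
# Hypothetical helpers; they only inspect text passed on standard input.

# Print the name of each expected agent process that does NOT appear
# in a process listing read from stdin. No output means all were found.
missing_agent_procs() {
    listing=$(cat)
    for name in nsqmq nsqpub nsqmsg; do
        printf '%s\n' "$listing" | grep -q "$name" || printf '%s\n' "$name"
    done
}

# Succeed if netstat output on stdin shows a socket listening on port $1.
# The local address may end in ":PORT" or ".PORT" depending on the platform.
is_listening() {
    grep -E "[.:]$1[[:space:]].*LISTEN" >/dev/null
}

# Usage (5010 is the example port from this article):
#   ps -ef | missing_agent_procs
#   netstat -an | is_listening 5010 && echo "agent port is listening"
```

The helpers read their input from a pipe rather than running ps or netstat themselves, so saved output from a remote machine can be checked the same way.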
Troubleshooting suggestions
Perform connectivity tests between the workgroup server (WGS) and the agent. See this article for details on how the agent and workgroup server communicate. The tests you can perform are listed below. In the examples, "wgshost" is the name of the workgroup server host and "mqhost" is the name of the agent host. Substitute the values shown in the nodes viewlet.
In this example, since Use DNS is NO, the WGS uses the IP address, not the host name.
On the wgshost machine
Run the following using a command line:
- nslookup mqhost
  Returns the IP address of the agent box. Use the resulting IP address of the agent machine ("mqIP") for the next tests.
- tracert mqIP (Windows) or traceroute mqIP (UNIX)
  Confirms that traffic can be routed from one node to the other.
- ping mqIP
  This can sometimes be unsuccessful because ping (ICMP echo) is disabled or blocked on the target machine.
- telnet mqIP 5010
  Where 5010 is the default port used by the MQ agent.
- telnet mqhost 5010
  Where 5010 is the default port used by the MQ agent.
If the telnet client does not respond or reports "Connection refused", there could be a firewall issue, and the network engineers or network administrators should troubleshoot.
If telnet connects, press the key combination "Ctrl+]" to get to the telnet prompt.
At the telnet prompt, type "status" to see the connection status.
Type "close" to close the connection between the telnet client and the MQ agent.
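Where an interactive telnet session is inconvenient, the same port test can be approximated from a script. The following is a minimal sketch, assuming bash (built with /dev/tcp support) and the GNU coreutils timeout command are available on the machine; check_port is a hypothetical helper name, and mqIP and 5010 are the example values used above.

```shell
#!/bin/sh
# check_port HOST PORT [SECONDS]
# Hypothetical helper: attempts a TCP connection and prints one of
#   open     the connection was accepted
#   timeout  no answer within the time limit
#   failed   refused, unreachable, or a required tool is missing
check_port() {
    host=$1
    port=$2
    limit=${3:-5}
    # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT;
    # GNU timeout exits with status 124 if the time limit is reached.
    timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
    rc=$?
    case $rc in
        0)   echo open ;;
        124) echo timeout ;;
        *)   echo failed ;;
    esac
}

# Usage (example values from this article):
#   check_port mqIP 5010
#   check_port mqhost 5010
```

A result other than "open" corresponds to the telnet client not responding or reporting a refused connection, and calls for the same follow-up with the network team.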
On the mqhost machine
The following tests are useful when the agent does not show up in the nodes viewlet. See this article for details on how the agent and workgroup server communicate. Run a similar set of tests from a command line:
- nslookup wgshost
- tracert wgsIP
- ping wgsIP
- telnet wgsIP 4010
  Where 4010 is the default port used by the workgroup server.
Note that the secure shell client 'ssh' can be used instead of telnet to test the port, as in: ssh -p 4010 wgsIP
For example, when the port cannot be reached, the attempt times out:
-bash-4.1$ ssh -p 4217 11.0.0.45
ssh: connect to host 11.0.0.45 port 4217: Connection timed out
If step 4 (the telnet test) succeeds in both directions, there are usually no fundamental TCP/IP connectivity issues between the MQ agent and the workgroup server.
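When these tests are run on many hosts, it can help to reduce the client messages to a short result. The following is a minimal sketch that recognizes only the two messages quoted in this article ("Connection timed out" and "Connection refused"); classify_conn_msg is a hypothetical helper name, and the interpretations in the comments are general TCP behavior rather than anything specific to the MQ agent or workgroup server.

```shell
#!/bin/sh
# classify_conn_msg MESSAGE
# Hypothetical helper: maps a telnet or ssh error message to a short keyword.
classify_conn_msg() {
    case "$1" in
        # No reply at all: packets are typically being dropped on the way,
        # for example by a firewall, or the host is unreachable.
        *"Connection timed out"*) echo timeout ;;
        # The connection attempt was rejected: typically nothing is listening
        # on that port, or a firewall rejects the connection.
        *"Connection refused"*)   echo refused ;;
        # Anything else is left for manual inspection.
        *)                        echo unknown ;;
    esac
}

# Usage with the message from the ssh example in this article:
#   classify_conn_msg "ssh: connect to host 11.0.0.45 port 4217: Connection timed out"
```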