Thursday, January 15, 2009

BlackBerry Troubleshooting

Here are some basic BlackBerry alerts, and the actions to take for each, when troubleshooting BlackBerry issues:

In most instances, the BlackBerry server will automatically restart an agent once its hung-thread WaitCount reaches 6 (about 60 minutes). The place to look is the Controller log file, which is located on each server at the following path:
\Research In Motion\BlackBerry Enterprise Server\Logs\YYYYMMDD\SERVER_CTRL_01_YYYYMMDD_0001.txt
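If you check these logs often, it can help to build the path for a given day instead of typing it out. Here is a minimal Python sketch. It assumes that SERVER in the file name stands for the server's name (just as YYYYMMDD stands for the date) and keeps the trailing _0001 exactly as written above; the install root and the server name MAIL01 are hypothetical placeholders.

```python
from datetime import date
from pathlib import PureWindowsPath


def controller_log_path(root: str, server: str, day: date) -> PureWindowsPath:
    """Build the Controller log path for one server and one day.

    `root` is the folder that contains "Research In Motion" (an assumption:
    it depends on where BES was installed). `server` fills the SERVER part
    of the file name, and `day` fills both YYYYMMDD placeholders.
    """
    stamp = day.strftime("%Y%m%d")  # YYYYMMDD
    return PureWindowsPath(
        root,
        "Research In Motion",
        "BlackBerry Enterprise Server",
        "Logs",
        stamp,
        f"{server}_CTRL_01_{stamp}_0001.txt",
    )


# Hypothetical install root and server name, for illustration only.
print(controller_log_path(r"C:\Program Files", "MAIL01", date(2009, 1, 15)))
```

PureWindowsPath is used so the Windows-style path can be built and inspected from any machine without touching the file system.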

Check that the agent stopped and restarted without error, and that the next one or two health checks passed without incident, as in this example:

[30000] (05/19 17:44:59.703):{0x86C} Performing system health check (BlackBerry Controller Version 4.1.4.13)
[30000] (05/19 17:46:26.093):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 6
[30000] (05/19 17:46:26.093):{0x950} 'SERVERNAME' agent 3: requesting the restart
[30000] (05/19 17:46:26.296):{0x950} 'SERVERNAME' agent 3 requested to stop
[30000] (05/19 17:47:11.671):{0x950} 'SERVERNAME' agent 3 stopped. Exit code = 255
[30000] (05/19 17:47:16.718):{0x950} 'SERVERNAME' agent 3 started as process 1376
[30000] (05/19 17:47:18.656):{0x950} 'SERVERNAME' agent 3: UDP log port is 4091
[30000] (05/19 17:54:59.718):{0x86C} Performing system health check (BlackBerry Controller Version 4.1.4.13)
[30000] (05/19 18:04:59.750):{0x86C} Performing system health check (BlackBerry Controller Version 4.1.4.13)
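The same check can be scripted. Below is a rough Python sketch that assumes only the line layout and message wording shown in the excerpt above: it finds the most recent "requesting the restart" line for an agent, then requires a "stopped" line, a "started as process" line, and a number of later health checks (two by default, a parameter of my choosing) with no new hung-thread report for that agent. It does not interpret the exit code.

```python
import re

# Layout of the Controller log lines shown in this post:
# [30000] (MM/DD HH:MM:SS.mmm):{0xTHREAD} message
LINE_RE = re.compile(
    r"^\[\d+\] \(\d\d/\d\d \d\d:\d\d:\d\d\.\d+\):\{0x[0-9A-Fa-f]+\} (.*)$"
)


def messages(lines):
    """Yield the message text of every line that matches the layout above."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m.group(1)


def restart_recovered(lines, agent, checks_required=2):
    """Return True if the agent's most recent restart request was followed by
    a stop, a start, and `checks_required` health checks with no new
    hung-thread report for that agent."""
    msgs = list(messages(lines))
    tag = f"agent {agent}"
    requests = [i for i, msg in enumerate(msgs)
                if f"{tag}: requesting the restart" in msg]
    if not requests:
        return False  # no restart for this agent in the lines given
    stopped = started = False
    clean_checks = 0
    for msg in msgs[requests[-1] + 1:]:
        if f"{tag} stopped." in msg:
            stopped = True
        elif f"{tag} started as process" in msg:
            started = True
        elif f"{tag}: hung threads detected" in msg:
            return False  # the agent hung again after the restart
        elif msg.startswith("Performing system health check") and started:
            clean_checks += 1
    return stopped and started and clean_checks >= checks_required
```

Fed the nine log lines above with agent 3, this returns True; cut off before the health checks at 17:54 and 18:04, it returns False.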

When a large number of hung threads is detected (see below), I would suggest restarting the BlackBerry services on that particular server, because that group of users will already have seen delays of at least 30 minutes (WaitCount = 3) on some messages. Also, the BlackBerry server will continue to queue messages into that thread even though it is hung.

Please note that a service restart will delay messages to all users on that server for about 5 minutes while the server resynchronizes with each handheld. Also, any agent that has been restarted 10 times in a 24-hour period will not be restarted again automatically. In those cases you must restart all the BlackBerry services, and often a reboot is in order.

[30000] (05/19 17:14:59.687):{0x86C} Performing system health check (BlackBerry Controller Version 4.1.4.13)
[30000] (05/19 17:16:09.750):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.750):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 2
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 2
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 2
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 2
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 2
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 2
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.765):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.781):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.781):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 1
[30000] (05/19 17:16:09.781):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.781):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.781):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
[30000] (05/19 17:16:09.781):{0x950} 'SERVERNAME' agent 3: hung threads detected. WaitCount = 3
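To help with the decisions above, a script can summarize what the Controller reported since its last health check and flag agents that have hit the 10-restarts-in-24-hours limit. This is a rough Python sketch under several assumptions: each "hung threads detected" line counts as one report, WaitCount converts to minutes at 10 minutes per count (matching the 6 = 60 minutes figure above), each "started as process" line counts as one restart, and the caller supplies the year because the log timestamps omit it. What counts as a "large number" of hung threads is left to your judgment.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Patterns for the Controller messages quoted in this post.
STAMP_RE = re.compile(r"^\[\d+\] \((\d\d/\d\d \d\d:\d\d:\d\d\.\d+)\)")
HUNG_RE = re.compile(r"agent (\d+): hung threads detected\. WaitCount = (\d+)")
STARTED_RE = re.compile(r"agent (\d+) started as process")

CHECK_INTERVAL_MINUTES = 10  # assumption: one WaitCount per 10 minutes
RESTART_LIMIT = 10           # restarts per 24 hours, per this post


def hung_thread_summary(lines):
    """Map agent -> (hung-thread reports, worst WaitCount, approximate minutes
    delayed) for the lines after the most recent health check."""
    reports, worst = Counter(), Counter()
    for line in lines:
        if "Performing system health check" in line:
            reports.clear()  # keep only the latest health-check period
            worst.clear()
            continue
        m = HUNG_RE.search(line)
        if m:
            agent, wait = int(m.group(1)), int(m.group(2))
            reports[agent] += 1
            worst[agent] = max(worst[agent], wait)
    return {a: (reports[a], worst[a], worst[a] * CHECK_INTERVAL_MINUTES)
            for a in reports}


def agents_at_restart_limit(lines, year, now):
    """Return agents started at least RESTART_LIMIT times in the 24 hours
    ending at `now`. `year` is needed because the log omits it."""
    window_start = now - timedelta(hours=24)
    restarts = Counter()
    for line in lines:
        stamp = STAMP_RE.match(line.strip())
        started = STARTED_RE.search(line)
        if stamp and started:
            when = datetime.strptime(f"{year}/{stamp.group(1)}",
                                     "%Y/%m/%d %H:%M:%S.%f")
            if window_start < when <= now:
                restarts[int(started.group(1))] += 1
    return sorted(a for a, n in restarts.items() if n >= RESTART_LIMIT)
```

Run over the excerpt above, hung_thread_summary would report 35 hung-thread lines for agent 3 with a worst WaitCount of 3, or roughly 30 minutes of delay. Because each log folder covers one day, checking the 24-hour restart window may mean reading the previous day's Controller log as well.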
