Case #: 10055 (Closed)
Affiliated Job: New Trier Township District 2031
Opened: Tuesday, September 13, 2011
Closed: Tuesday, September 13, 2011
Total Hit Count: 13853
Last Hit: Wednesday, October 30, 2024 9:49:31 AM
Unique Hit Count: 5285
Last Unique Hit: Wednesday, October 30, 2024 9:49:31 AM
Case Type(s): Server, Vendor Support
Case Note(s): All cases are posted for review purposes only. Any implementations should be performed at your own risk.

Problem:
Our Avamar grid appeared to stall over a weekend, although nothing on our network had changed. Scheduled jobs had been running for 30 hours straight despite the 12:00 noon blackout window, which should have timed out the jobs but did not. I canceled these jobs and tested manual runs (both image-level and agent-level), which simply remained in a "waiting" status. I then allowed a scheduled job to begin, but it never started. On further review I noticed that "Server" - "Active Sessions" listed 19 sessions, even though nothing appeared to be running on the "Activity" screen. Later I also received a notice that a recent checkpoint had not run. I contacted EMC support.

Resolution:
It turned out there were stalled sessions in a "CLOSE_WAIT" state. The technician ran the following command in a PuTTY session to diagnose the problem:
netstat -alp | grep CLOSE
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp 1 0 avamar-u.nths.net:28001 avamar-vm-01.domain.local:50441 CLOSE_WAIT 19222/java
tcp 138 0 avamar-u.nths.net:28001 avamar-vm-02.domain.local:46636 CLOSE_WAIT 19222/java
...
...
The output above showed the connections stuck in "CLOSE_WAIT"; there were 38 of them.
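For a quicker read of the same netstat output, the CLOSE_WAIT lines can be tallied per local address and owning process. The following is only a minimal sketch, not something EMC support ran: the function name close_wait_summary is hypothetical, and it assumes the usual Linux net-tools column order (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program).

```shell
#!/bin/sh
# Summarize CLOSE_WAIT sockets from `netstat -alp`-style output read on stdin.
# Assumed column order (Linux net-tools):
#   proto recv-q send-q local-address foreign-address state pid/program
# close_wait_summary is a hypothetical helper name, not an Avamar tool.
close_wait_summary() {
    awk '$1 ~ /^tcp/ && $6 == "CLOSE_WAIT" {
             count[$4 " " $7]++      # key: local address + PID/program
             total++
         }
         END {
             # One line per (local address, process) pair, then a grand total.
             for (k in count) print count[k], k
             print "total", total + 0
         }'
}
```

A possible invocation, run as root so that every owning process is shown: netstat -alp 2>/dev/null | close_wait_summary. A large count against a single PID, as with 19222/java on port 28001 above, points at one process that is not closing its sockets.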
The technician then ran a manual checkpoint and, once it completed, restarted the MCS service and the schedulers, then checked the status:
dpnctl stop mcs
dpnctl start mcs
dpnctl start sched
dpnctl start maint
dpnctl status
dpnctl: INFO: gsan status: ready
dpnctl: INFO: MCS status: up.
dpnctl: INFO: EMS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: dtlt status: up.
dpnctl: INFO: axionfs status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
dpnctl: INFO: Unattended startup status: disabled.
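After a restart like this, the status output can be scanned for anything that did not come back. The sketch below is an illustration only: dpnctl_unhealthy is a hypothetical helper name, it treats the three states seen in the output above (ready, up, enabled) as healthy, and it assumes the "Unattended startup status" line is informational and can be skipped.

```shell
#!/bin/sh
# Print any `dpnctl status` line whose state is not one of the healthy values
# seen in the case output (ready, up, enabled). Reads status text on stdin and
# returns non-zero if at least one unhealthy line was found.
# Assumption: "Unattended startup status" is informational and is skipped.
# dpnctl_unhealthy is a hypothetical helper name, not part of Avamar.
dpnctl_unhealthy() {
    awk '/^dpnctl: INFO: .* status: / {
             if ($0 ~ /Unattended startup/) next
             state = $NF
             sub(/\.$/, "", state)      # drop trailing period, e.g. "up."
             if (state != "ready" && state != "up" && state != "enabled") {
                 print
                 bad = 1
             }
         }
         END { exit bad + 0 }'
}
```

A possible invocation: dpnctl status 2>&1 | dpnctl_unhealthy && echo "all services up". With the status output shown above, the function prints nothing and returns zero.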
This resolved our issue; the next evening's backups proceeded without a hitch.