
IBM Spectrum Scale troubleshooting

For initial data collection, IBM requests that analysis data be gathered using the following procedure.

The steps below gather all the documentation you can provide for first-time data capture of an unknown problem. Perform these steps for any performance, hang, or unknown GPFS issue WHILE the problem is occurring. Commands are executed from one node. Which documentation is collected will vary based on the working collective created below.

1) Gather waiters and create the working collective. It can be useful to get multiple looks at the waiters and how they change over time, so repeating the first mmlsnode command (with -L) several times as you proceed through the steps below may be helpful (especially if the issue is purely performance, with no hangs).

mmlsnode -N waiters > /tmp/waiters.wcoll
mmdsh -N /tmp/waiters.wcoll "mkdir /tmp/mmfs  2>/dev/null"
mmlsnode -N waiters -L  | sort -nk 4,4 > /tmp/mmfs/service.allwaiters.$(date +"%m%d%H%M%S")
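The suggestion above to take multiple looks at the waiters can be sketched as a small loop. The snapshot count and interval below are illustrative assumptions, not values prescribed by this procedure, and the helper name capture_waiters is hypothetical:

```shell
# Hedged sketch: repeat the waiter snapshot from step 1 several times
# so changes in the waiters can be compared. capture_waiters is a
# hypothetical helper; the count (3) and interval (60s) are assumptions.
capture_waiters() {
    mmlsnode -N waiters -L | sort -nk 4,4 \
        > /tmp/mmfs/service.allwaiters.$(date +"%m%d%H%M%S")
}

# Only run on a node where GPFS is installed (the GPFS commands live in
# /usr/lpp/mmfs/bin, which may need to be on PATH).
if command -v mmlsnode >/dev/null 2>&1; then
    for i in 1 2 3; do
        capture_waiters
        sleep 60
    done
fi
```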

View the allwaiters and waiters.wcoll files to verify that they are <hi #fff200>not empty</hi>.

If either (or both) of the files is empty, the issue seen is not GPFS waiting on any of its threads. The data to gather in that case will vary; do not continue with these steps. Inform your IBM Service contact, who will determine the best course of action and which documentation is needed.
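The empty-file check can be scripted as follows; check_nonempty is a hypothetical helper name, not part of GPFS:

```shell
# Hedged sketch of the verification step: warn if either file from
# step 1 is empty or missing. check_nonempty is a hypothetical helper.
check_nonempty() { [ -s "$1" ]; }

for f in /tmp/waiters.wcoll /tmp/mmfs/service.allwaiters.*; do
    if ! check_nonempty "$f"; then
        echo "WARNING: $f is empty or missing - stop and contact IBM Service"
    fi
done
```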

2) Gather an internal dump from all nodes in the working collective.
For performance/non-hangs:

mmdsh -N /tmp/waiters.wcoll "/usr/lpp/mmfs/bin/mmfsadm saferdump all > /tmp/mmfs/service.\$(hostname -s).safer.dumpall.\$(date +"%m%d%H%M%S")"

3) If this is a performance problem, capture a 60-second mmfs trace from the nodes in the working collective.

mmtracectl --start --aix-trace-buffer-size=64M --trace-file-size=128M -N /tmp/waiters.wcoll ; sleep 60; mmtracectl --stop -N /tmp/waiters.wcoll

4) Gather a gpfs.snap from the same nodes.
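A typical invocation limited to the working collective might look like the sketch below; whether gpfs.snap on your level of Spectrum Scale accepts the node file created in step 1 via -N should be verified against the command's documentation:

```shell
# Hedged sketch: collect gpfs.snap data from the nodes in the working
# collective. Assumes -N accepts the node file written in step 1.
NODEFILE=/tmp/waiters.wcoll
if [ -x /usr/lpp/mmfs/bin/gpfs.snap ]; then
    /usr/lpp/mmfs/bin/gpfs.snap -N "$NODEFILE"
fi
```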

ibm.txt · Last modified: 2022/07/10 09:34 by