* [NFS] Sudden high load average and abnormal behavior
@ 2008-06-16 5:25 howard chen
From: howard chen @ 2008-06-16 5:25 UTC
To: nfs
Hi,
I have a dedicated NFS server running on RAID 5 disks, and I recently
observed a sudden increase in load average along with some abnormal behavior
(e.g. the command "df -h" hangs without returning).
I have checked Dell OpenManage and it reports that the hardware is okay. The
load average used to be around 3 to 4.
Some info that might be useful:
>> top
top - 13:17:53 up 382 days, 23:44, 6 users, load average: 20.53, 20.21, 18.93
Tasks: 286 total, 1 running, 285 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1% us, 1.1% sy, 0.0% ni, 68.4% id, 29.9% wa, 0.0% hi, 0.5% si
Mem: 4045256k total, 4028028k used, 17228k free, 437428k buffers
Swap: 9775512k total, 160k used, 9775352k free, 2814332k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2049 root 15 0 0 0 0 S 1 0.0 861:21.26 kjournald
26094 root 15 0 0 0 0 S 0 0.0 85:02.82 nfsd
26106 root 15 0 0 0 0 S 0 0.0 83:49.86 nfsd
26110 root 15 0 0 0 0 S 0 0.0 84:33.23 nfsd
26124 root 15 0 0 0 0 S 0 0.0 84:37.47 nfsd
2839 root 16 0 6280 1172 780 R 0 0.0 0:00.02 top
..
>> iostat
avg-cpu: %user %nice %sys %iowait %idle
0.06 0.00 1.34 21.60 77.00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 114.89 4.33 18.05 143391021 597126208
sda1 1.07 0.69 8.26 22771290 273100496
sda2 0.00 0.00 0.00 2 0
sda5 0.00 0.00 0.00 1010 408
sda6 110.49 3.63 9.79 119979495 323992464
dm-0 0.58 2.91 3.22 96295602 106444120
dm-1 0.55 0.60 4.31 19996266 142435600
dm-2 0.02 0.08 0.18 2673626 5953184
dm-3 109.53 1.52 2.09 50389354 69192400
>> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.7G 1.6G 7.6G 18% /
none 2.0G 0 2.0G 0% /dev/shm
/dev/mapper/lvm01-lvm01_usr
20G 1.5G 18G 8% /usr
/dev/mapper/lvm01-lvm01_var
9.9G 327M 9.1G 4% /var
/dev/mapper/lvm01-lvm01_home
9.9G 56M 9.3G 1% /home
/dev/mapper/lvm01-lvm01_data0
492G 285G 182G 62% /data0
# !!! == The command stopped here without returning === !!!
Any ideas?
Howard
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that nfs@lists.sourceforge.net is being discontinued.
Please subscribe to linux-nfs@vger.kernel.org instead.
http://vger.kernel.org/vger-lists.html#linux-nfs
* Re: [NFS] Sudden high load average and abnormal behavior
@ 2008-06-16 15:18 Wendy Cheng
From: Wendy Cheng @ 2008-06-16 15:18 UTC
To: howard chen; +Cc: nfs

howard chen wrote:
> top - 13:17:53 up 382 days, 23:44, 6 users, load average: 20.53, 20.21, 18.93
> Tasks: 286 total, 1 running, 285 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.1% us, 1.1% sy, 0.0% ni, 68.4% id, 29.9% wa, 0.0% hi, 0.5% si
> Mem: 4045256k total, 4028028k used, 17228k free, 437428k buffers
> Swap: 9775512k total, 160k used, 9775352k free, 2814332k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2049 root 15 0 0 0 0 S 1 0.0 861:21.26 kjournald
> 26094 root 15 0 0 0 0 S 0 0.0 85:02.82 nfsd
> 26106 root 15 0 0 0 0 S 0 0.0 83:49.86 nfsd
> 26110 root 15 0 0 0 0 S 0 0.0 84:33.23 nfsd
> 26124 root 15 0 0 0 0 S 0 0.0 84:37.47 nfsd
> 2839 root 16 0 6280 1172 780 R 0 0.0 0:00.02 top
>

I haven't used ext3 for a very long time, so I'm not sure whether there have
been changes. IIRC, if kjournald is up and running (implying ext3 is flushing
its data to the disk), it holds the journal lock, so access to that particular
filesystem is temporarily suspended. So the issue here is to find out why
kjournald takes such a long time to do the flushing. Normally we want to see
the thread backtrace of "kjournald" by asking for a "sysrq-t" output via:

shell> cd /proc
shell> echo t > sysrq-trigger

This will write all the thread backtraces into the system log file
/var/log/messages, so people can get a rough idea of what is going wrong.
The *trick* here is to make sure /var/log/messages doesn't live on the
particular filesystem that has the high load issue (otherwise the writes to
/var/log/messages will hang as well). So you may want to configure /var on a
separate filesystem. Remember that each ext3 filesystem has its own kjournald
(again, I have not touched ext3 for a while, so this is from old memory).

Another option is to google to see whether other people on the same kernel
level have the same issue as you and pull their fix into your system.
However, that is more of a long shot (since you're doing the guessing).

-- Wendy
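A minimal Python sketch of the check suggested above (is /var/log/messages on a
different filesystem than the loaded one?), assuming a /proc/mounts-style mount
table. The function names, the sample table (rebuilt from the "df -h" output
earlier in the thread, with ext3 assumed for the disk filesystems), and the
choice of /data0 as the busy filesystem are illustrative assumptions, not
something established in the thread. It does a plain longest-prefix match and
does not resolve symlinks or bind mounts.

```python
import os


def parse_mounts(mounts_text):
    """Return (device, mount_point) pairs from /proc/mounts-style text."""
    pairs = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def mount_point_for(path, mount_points):
    """Return the longest mount point that is a path prefix of `path`."""
    path = os.path.normpath(path)
    best = None
    for mp in mount_points:
        mp_norm = os.path.normpath(mp)
        # Compare on whole path components so "/var" does not match "/variant".
        prefix = mp_norm if mp_norm.endswith("/") else mp_norm + "/"
        if path == mp_norm or path.startswith(prefix):
            if best is None or len(mp_norm) > len(best):
                best = mp_norm
    return best


def log_shares_filesystem(log_path, busy_path, mounts_text):
    """True if both paths resolve to the same mount point."""
    mps = [mp for _, mp in parse_mounts(mounts_text)]
    return mount_point_for(log_path, mps) == mount_point_for(busy_path, mps)


# Hypothetical mount table; the ext3 type and mount options are assumptions.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
/dev/mapper/lvm01-lvm01_usr /usr ext3 rw 0 0
/dev/mapper/lvm01-lvm01_var /var ext3 rw 0 0
/dev/mapper/lvm01-lvm01_home /home ext3 rw 0 0
/dev/mapper/lvm01-lvm01_data0 /data0 ext3 rw 0 0
"""

if __name__ == "__main__":
    if log_shares_filesystem("/var/log/messages", "/data0", SAMPLE):
        print("warning: the log file shares a filesystem with /data0")
    else:
        print("log file is on a separate filesystem from /data0")
```

On a live system the same functions could be fed the contents of /proc/mounts
instead of SAMPLE. A separate mount point only shows that the log has its own
filesystem and journal; it says nothing about whether the two filesystems share
the same physical disks.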
* Re: [NFS] Sudden high load average and abnormal behavior
@ 2008-06-16 16:07 howard chen
From: howard chen @ 2008-06-16 16:07 UTC
To: Wendy Cheng; +Cc: nfs

Hi

On Mon, Jun 16, 2008 at 11:18 PM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:
> howard chen wrote:
> This will write all the thread backtraces into the system file
> /var/log/messages file so people can have a rough idea of what goes wrong.
> The *trick* here is to make sure the /var/log/messages file doesn't live on
> the particular filesystem that has the high load issue (otherwise the
> writing to the /var/log/messages will hang as well). So you may want to
> configure the /var on a separate filesystem. Remember each ext3 filesystem
> has its own kjournald (again, I have not touched ext3 for a while so this is
> from my old memory).
>
> Another option is to google to see whether other people on the same kernel
> level has the same issue as yours and pull their fix into your system -
> however, it is more of a long shot (since you're doing the guessing).
>
> -- Wendy

Thanks. I will run some more detailed tests.

Howard