* 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?
@ 2013-10-17 20:42 Helge Deller
2013-10-17 21:07 ` Myklebust, Trond
0 siblings, 1 reply; 8+ messages in thread
From: Helge Deller @ 2013-10-17 20:42 UTC (permalink / raw)
To: Linux Kernel Development, NFS list, list parisc-linux
I'm seeing a regression with current kernel git head when using NFS-mounts.
Architecture in my case is parisc, although I don't think that this is relevant.
At least kernel 3.10 (and I think 3.11) didn't showed that problem.
The symtom is, that "top" shows high usage of either kswapd0 or kswapd1.
Here is an output with kswapd1:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1
28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top
1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init
This is what ps shows:
lsXXXX:~# ps -ef | grep mount
root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
And using sysrq to show the blocked tasks I get in syslog:
SysRq : Show Blocked State
mount.nfs D 00000000401040c0 0 25332 25331 0x00000010
Backtrace:
[<0000000040113a68>] __schedule+0x500/0x810
I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
The NFS directory was already mounted and in use when this mount happened again (called by cron-job).
Any ideas?
Helge
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-17 20:42 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? Helge Deller @ 2013-10-17 21:07 ` Myklebust, Trond 2013-10-18 19:26 ` Helge Deller 0 siblings, 1 reply; 8+ messages in thread From: Myklebust, Trond @ 2013-10-17 21:07 UTC (permalink / raw) To: Helge Deller; +Cc: Linux Kernel Development, NFS list, list parisc-linux On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote: > I'm seeing a regression with current kernel git head when using NFS-mounts. > Architecture in my case is parisc, although I don't think that this is relevant. > At least kernel 3.10 (and I think 3.11) didn't showed that problem. > > The symtom is, that "top" shows high usage of either kswapd0 or kswapd1. > Here is an output with kswapd1: > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1 > 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top > 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init > > This is what ps shows: > lsXXXX:~# ps -ef | grep mount > root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid > root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1 > root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr > > And using sysrq to show the blocked tasks I get in syslog: > SysRq : Show Blocked State > mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 > Backtrace: > [<0000000040113a68>] __schedule+0x500/0x810 > > I know it's not a problem of the NFS server, since the same mount is still ok on other machines. > The NFS directory was already mounted and in use when this mount happened again (called by cron-job). > > Any ideas? If the NFS directory is already mounted, then why is the automounter trying to mount it a second time? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-17 21:07 ` Myklebust, Trond @ 2013-10-18 19:26 ` Helge Deller 2013-10-18 19:36 ` Myklebust, Trond 0 siblings, 1 reply; 8+ messages in thread From: Helge Deller @ 2013-10-18 19:26 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Linux Kernel Development, NFS list, linux-parisc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=UTF-7, Size: 2195 bytes --] On 10/17/2013 11:07 PM, Myklebust, Trond wrote: > On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote: >> I'm seeing a regression with current kernel git head when using NFS-mounts. >> Architecture in my case is parisc, although I don't think that this is relevant. >> At least kernel 3.10 (and I think 3.11) didn't showed that problem. >> >> The symtom is, that "top" shows high usage of either kswapd0 or kswapd1. >> Here is an output with kswapd1: >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1 >> 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top >> 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init >> >> This is what ps shows: >> lsXXXX:~# ps -ef | grep mount >> root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid >> root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1 >> root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr >> >> And using sysrq to show the blocked tasks I get in syslog: >> SysRq : Show Blocked State >> mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 >> Backtrace: >> [<0000000040113a68>] __schedule+0x500/0x810 >> >> I know it's not a problem of the NFS server, since the same mount is still ok on other machines. >> The NFS directory was already mounted and in use when this mount happened again (called by cron-job). >> >> Any ideas? > > If the NFS directory is already mounted, then why is the automounter > trying to mount it a second time? I was wrong in this. The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new mount.nfs was called). I'm now not even sure, that the high kswapd is really triggered by the NFS problem, because I now have another machine with the blocked NFS-mount, but without the high kswapd usage. Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly some kind of regression since it doesn't happen with older kernels. Helge ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-18 19:26 ` Helge Deller @ 2013-10-18 19:36 ` Myklebust, Trond 2013-10-18 20:03 ` Helge Deller 0 siblings, 1 reply; 8+ messages in thread From: Myklebust, Trond @ 2013-10-18 19:36 UTC (permalink / raw) To: Helge Deller; +Cc: Linux Kernel Development, NFS list, linux-parisc On Fri, 2013-10-18 at 21:26 +0200, Helge Deller wrote: > On 10/17/2013 11:07 PM, Myklebust, Trond wrote: > > On Thu, 2013-10-17 at 22:42 퍭, Helge Deller wrote: > >> I'm seeing a regression with current kernel git head when using NFS-mounts. > >> Architecture in my case is parisc, although I don't think that this is relevant. > >> At least kernel 3.10 (and I think 3.11) didn't showed that problem. > >> > >> The symtom is, that "top" shows high usage of either kswapd0 or kswapd1. > >> Here is an output with kswapd1: > >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME COMMAND > >> 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1 > >> 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top > >> 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init > >> > >> This is what ps shows: > >> lsXXXX:~# ps -ef | grep mount > >> root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid > >> root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1 > >> root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr > >> > >> And using sysrq to show the blocked tasks I get in syslog: > >> SysRq : Show Blocked State > >> mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 > >> Backtrace: > >> [<0000000040113a68>] __schedule팞瓓ﴱ > >> > >> I know it's not a problem of the NFS server, since the same mount is still ok on other machines. > >> The NFS directory was already mounted and in use when this mount happened again (called by cron-job). > >> > >> Any ideas? > > > > If the NFS directory is already mounted, then why is the automounter > > trying to mount it a second time? > > I was wrong in this. > The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new > mount.nfs was called). > > I'm now not even sure, that the high kswapd is really triggered by the NFS problem, > because I now have another machine with the blocked NFS-mount, but without > the high kswapd usage. > > Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly > some kind of regression since it doesn't happen with older kernels. Have you ever reproduced it without the automounter? Also, could you please try a sysRQ-t the next time it happens, so that we can get a trace of where the mount program is hanging. Knowing that the mount is stuck in "__schedule()" is not really interesting unless we know from where that was called. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-18 19:36 ` Myklebust, Trond @ 2013-10-18 20:03 ` Helge Deller 2013-10-18 20:12 ` Myklebust, Trond 0 siblings, 1 reply; 8+ messages in thread From: Helge Deller @ 2013-10-18 20:03 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Linux Kernel Development, NFS list, linux-parisc On 10/18/2013 09:36 PM, Myklebust, Trond wrote: > On Fri, 2013-10-18 at 21:26 퍭, Helge Deller wrote: >> On 10/17/2013 11:07 PM, Myklebust, Trond wrote: >>> On Thu, 2013-10-17 at 22:42 m, Helge Deller wrote: >>>> I'm seeing a regression with current kernel git head when using NFS-mounts. >>>> Architecture in my case is parisc, although I don't think that this is relevant. >>>> At least kernel 3.10 (and I think 3.11) didn't showed that problem. >>>> >>>> The symtom is, that "top" shows high usage of either kswapd0 or kswapd1. >>>> Here is an output with kswapd1: >>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME COMMAND >>>> 37 root 20 0 0 0 0 R 91.8 0.0 63:00.40 kswapd1 >>>> 28448 root 20 0 3252 1428 1060 R 15.3 0.0 0:00.09 top >>>> 1 root 20 0 2784 988 852 S 0.0 0.0 0:09.95 init >>>> >>>> This is what ps shows: >>>> lsXXXX:~# ps -ef | grep mount >>>> root 1181 1 0 14:51 ? 00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid >>>> root 25331 1181 0 21:25 ? 00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1 >>>> root 25332 25331 0 21:25 ? 00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr >>>> >>>> And using sysrq to show the blocked tasks I get in syslog: >>>> SysRq : Show Blocked State >>>> mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 >>>> Backtrace: >>>> [<0000000040113a68>] __schedule >>>> >>>> I know it's not a problem of the NFS server, since the same mount is still ok on other machines. >>>> The NFS directory was already mounted and in use when this mount happened again (called by cron-job). >>>> >>>> Any ideas? >>> >>> If the NFS directory is already mounted, then why is the automounter >>> trying to mount it a second time? >> >> I was wrong in this. >> The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new >> mount.nfs was called). >> >> I'm now not even sure, that the high kswapd is really triggered by the NFS problem, >> because I now have another machine with the blocked NFS-mount, but without >> the high kswapd usage. >> >> Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly >> some kind of regression since it doesn't happen with older kernels. > > Have you ever reproduced it without the automounter? No, because it happens only after quite some time (>12h) and only if I have it under pressure (load is >9 on a 4-way box). I'll try it as soon as possible. > Also, could you please try a sysRQ-t the next time it happens, so that > we can get a trace of where the mount program is hanging. Knowing that > the mount is stuck in "__schedule()" is not really interesting unless we > know from where that was called. Actually, the machine was still running in this state. Here is sysrq-t: [112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010 [112009.084000] Backtrace: [112009.084000] [<0000000040113a68>] __schedule팞瓓ﴱ [112009.232000] [112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 [112009.232000] Backtrace: [112009.232000] [<0000000040113a68>] __schedule팞瓓ﴱ Helge ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-18 20:03 ` Helge Deller @ 2013-10-18 20:12 ` Myklebust, Trond 2013-10-19 18:27 ` Helge Deller 0 siblings, 1 reply; 8+ messages in thread From: Myklebust, Trond @ 2013-10-18 20:12 UTC (permalink / raw) To: Helge Deller; +Cc: Linux Kernel Development, NFS list, linux-parisc On Fri, 2013-10-18 at 22:03 +0200, Helge Deller wrote: > On 10/18/2013 09:36 PM, Myklebust, Trond wrote: > > Also, could you please try a sysRQ-t the next time it happens, so that > > we can get a trace of where the mount program is hanging. Knowing that > > the mount is stuck in "__schedule()" is not really interesting unless we > > know from where that was called. > > Actually, the machine was still running in this state. > Here is sysrq-t: > [112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010 > [112009.084000] Backtrace: > [112009.084000] [<0000000040113a68>] __schedule팞瓓ﴱ > [112009.232000] > [112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 > [112009.232000] Backtrace: > [112009.232000] [<0000000040113a68>] __schedule팞瓓ﴱ That makes no sense unless sysrq-t works differently on parisc than on other platforms. I'd expect the backtrace to at least include a system call. Parisc experts? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-18 20:12 ` Myklebust, Trond @ 2013-10-19 18:27 ` Helge Deller 2013-10-31 19:45 ` Helge Deller 0 siblings, 1 reply; 8+ messages in thread From: Helge Deller @ 2013-10-19 18:27 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Linux Kernel Development, NFS list, linux-parisc On 10/18/2013 10:12 PM, Myklebust, Trond wrote: > On Fri, 2013-10-18 at 22:03 퍭, Helge Deller wrote: >> On 10/18/2013 09:36 PM, Myklebust, Trond wrote: >>> Also, could you please try a sysRQ-t the next time it happens, so that >>> we can get a trace of where the mount program is hanging. Knowing that >>> the mount is stuck in "__schedule()" is not really interesting unless we >>> know from where that was called. >> >> Actually, the machine was still running in this state. >> Here is sysrq-t: >> [112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010 >> [112009.084000] Backtrace: >> [112009.084000] [<0000000040113a68>] __schedule >> [112009.232000] >> [112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 >> [112009.232000] Backtrace: >> [112009.232000] [<0000000040113a68>] __schedule > > That makes no sense unless sysrq-t works differently on parisc than on > other platforms. I'd expect the backtrace to at least include a system > call. Parisc experts? sysrq-t doesn't work differently on parisc. For other processes I do get a backtrace like the one on x86_64. That's the main reason why I asked for ideas here on the list. I do see the stuck process, but don't see any indications where it comes from. Helge ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? 2013-10-19 18:27 ` Helge Deller @ 2013-10-31 19:45 ` Helge Deller 0 siblings, 0 replies; 8+ messages in thread From: Helge Deller @ 2013-10-31 19:45 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Linux Kernel Development, NFS list, linux-parisc On 10/19/2013 08:27 PM, Helge Deller wrote: > On 10/18/2013 10:12 PM, Myklebust, Trond wrote: >> On Fri, 2013-10-18 at 22:03 m, Helge Deller wrote: >>> On 10/18/2013 09:36 PM, Myklebust, Trond wrote: >>>> Also, could you please try a sysRQ-t the next time it happens, so that >>>> we can get a trace of where the mount program is hanging. Knowing that >>>> the mount is stuck in "__schedule()" is not really interesting unless we >>>> know from where that was called. >>> >>> Actually, the machine was still running in this state. >>> Here is sysrq-t: >>> [112009.084000] mount S 00000000401040c0 0 25331 1 0x00000010 >>> [112009.084000] Backtrace: >>> [112009.084000] [<0000000040113a68>] __schedule >>> [112009.232000] >>> [112009.232000] mount.nfs D 00000000401040c0 0 25332 25331 0x00000010 >>> [112009.232000] Backtrace: >>> [112009.232000] [<0000000040113a68>] __schedule >> >> That makes no sense unless sysrq-t works differently on parisc than on >> other platforms. I'd expect the backtrace to at least include a system >> call. Parisc experts? > > sysrq-t doesn't work differently on parisc. For other processes I do get > a backtrace like the one on x86_64. > That's the main reason why I asked for ideas here on the list. > I do see the stuck process, but don't see any indications where it comes from. I'm pretty sure that the regression (kswapd using 100% CPU) I reported here is fixed by this patch: http://git.kernel.org/cgit/linux/kernel/git/deller/parisc-linux.git/commit/?id=c56b097af26cb11c1f49a4311ba538c825666fed I will start some testing... Helge ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-10-31 19:45 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-10-17 20:42 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU? Helge Deller 2013-10-17 21:07 ` Myklebust, Trond 2013-10-18 19:26 ` Helge Deller 2013-10-18 19:36 ` Myklebust, Trond 2013-10-18 20:03 ` Helge Deller 2013-10-18 20:12 ` Myklebust, Trond 2013-10-19 18:27 ` Helge Deller 2013-10-31 19:45 ` Helge Deller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).