Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

From: Helge Deller <deller@gmx.de>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Linux Kernel Development <linux-kernel@vger.kernel.org>,
	NFS list <linux-nfs@vger.kernel.org>,
	linux-parisc <linux-parisc@vger.kernel.org>
Subject: Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?
Date: Fri, 18 Oct 2013 22:03:22 +0200
Message-ID: <5261940A.4090101@gmx.de>
In-Reply-To: <1382124981.20461.4.camel@leira.trondhjem.org>

On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
> On Fri, 2013-10-18 at 21:26 +0200, Helge Deller wrote:
>> On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
>>> On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
>>>> I'm seeing a regression with current kernel git head when using NFS-mounts.
>>>> Architecture in my case is parisc, although I don't think that this is relevant.
>>>> At least kernel 3.10 (and I think 3.11) didn't show that problem.
>>>>
>>>> The symptom is that "top" shows high CPU usage of either kswapd0 or kswapd1.
>>>> Here is an output with kswapd1:
>>>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME COMMAND
>>>>    37 root      20   0     0    0    0 R  91.8  0.0  63:00.40 kswapd1
>>>> 28448 root      20   0  3252 1428 1060 R  15.3  0.0   0:00.09 top
>>>>     1 root      20   0  2784  988  852 S   0.0  0.0   0:09.95 init
>>>>
>>>> This is what ps shows:
>>>> lsXXXX:~# ps -ef |  grep mount
>>>> root      1181     1  0 14:51 ?        00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
>>>> root     25331  1181  0 21:25 ?        00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
>>>> root     25332 25331  0 21:25 ?        00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
>>>>
>>>> And using sysrq to show the blocked tasks I get in syslog:
>>>> SysRq : Show Blocked State
>>>> mount.nfs       D 00000000401040c0     0 25332  25331 0x00000010
>>>> Backtrace:
>>>> [<0000000040113a68>] __schedule
>>>>
>>>> I know it's not a problem of the NFS server, since the same mount is still ok on other machines.
>>>> The NFS directory was already mounted and in use when this mount happened again (called by cron-job). 
>>>>  
>>>> Any ideas?
>>>
>>> If the NFS directory is already mounted, then why is the automounter
>>> trying to mount it a second time?
>>
>> I was wrong in this.
>> The directory wasn't mounted yet (or at least it was unmounted in the meantime before the new
>> mount.nfs was called).
>>
>> I'm no longer even sure that the high kswapd usage is really triggered by the
>> NFS problem, because I now have another machine with the blocked NFS mount
>> but without the high kswapd usage.
>>
>> Nevertheless, the blocked nfs mount tasks really make me wonder. There is clearly
>> some kind of regression since it doesn't happen with older kernels.
> 
> Have you ever reproduced it without the automounter?

No, because it happens only after quite some time (>12h) and only when the
machine is under pressure (load is >9 on a 4-way box).

I'll try it as soon as possible.

> Also, could you please try a sysRQ-t the next time it happens, so that
> we can get a trace of where the mount program is hanging. Knowing that
> the mount is stuck in "__schedule()" is not really interesting unless we
> know from where that was called.

Actually, the machine was still running in this state.
Here is sysrq-t:
[112009.084000] mount           S 00000000401040c0     0 25331      1 0x00000010
[112009.084000] Backtrace:
[112009.084000]  [<0000000040113a68>] __schedule
[112009.232000]
[112009.232000] mount.nfs       D 00000000401040c0     0 25332  25331 0x00000010
[112009.232000] Backtrace:
[112009.232000]  [<0000000040113a68>] __schedule

Helge
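
For longer sysrq-t dumps, picking out the tasks in uninterruptible sleep (state "D") by eye gets tedious. The following is a minimal Python sketch, not part of the original report, that filters task header lines like the ones above. The field layout (name, state, address, free stack, PID, parent PID, flags) is an assumption inferred from this log alone, the helper name blocked_tasks is hypothetical, and task names containing spaces are not handled.

```python
import re

# Task header line as it appears in the sysrq output above. The field order
# (name, state, address, free stack, PID, parent PID, flags) is inferred
# from this log only and may differ on other kernels or architectures.
TASK_LINE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s+)?"   # optional printk timestamp
    r"(?P<comm>\S+)\s+"           # task name (no embedded spaces assumed)
    r"(?P<state>[A-Z])\s+"        # single-letter scheduler state
    r"[0-9a-fA-F]+\s+"            # address column
    r"\d+\s+"                     # free stack column
    r"(?P<pid>\d+)\s+"
    r"(?P<ppid>\d+)\s+"
    r"0x[0-9a-fA-F]+\s*$"         # task flags
)


def blocked_tasks(log_text):
    """Return (name, pid, ppid) for every task reported in state D."""
    tasks = []
    for line in log_text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group("state") == "D":
            tasks.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("ppid")))
            )
    return tasks


SAMPLE = """\
[112009.084000] mount           S 00000000401040c0     0 25331      1 0x00000010
[112009.084000] Backtrace:
[112009.084000]  [<0000000040113a68>] __schedule
[112009.232000]
[112009.232000] mount.nfs       D 00000000401040c0     0 25332  25331 0x00000010
"""

if __name__ == "__main__":
    for name, pid, ppid in blocked_tasks(SAMPLE):
        print(f"{name} (pid {pid}, parent {ppid}) is blocked")
```

On the sample, only mount.nfs (PID 25332, parent 25331) is reported, which matches the ps output quoted earlier in the thread.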
