Re: nfsd deadlock, 2.6.36-rc3

From: Tim Gardner <tim.gardner@canonical.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>,
	linux-nfs@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Trond.Myklebust@netapp.com
Subject: Re: nfsd deadlock, 2.6.36-rc3
Date: Wed, 08 Sep 2010 10:52:51 -0600
Message-ID: <4C87BF63.3070808@canonical.com>
In-Reply-To: <4C7FBF26.3090203@canonical.com>

On 09/02/2010 09:13 AM, Tim Gardner wrote:
> On 09/01/2010 03:13 PM, J. Bruce Fields wrote:
>> On Wed, Sep 01, 2010 at 03:11:23PM -0600, Tim Gardner wrote:
>>> On 09/01/2010 02:55 PM, Neil Brown wrote:
>>>> On Wed, 1 Sep 2010 12:54:01 -0400
>>>> "J. Bruce Fields"<bfields@fieldses.org> wrote:
>>>>
>>>>> On Wed, Sep 01, 2010 at 09:39:55AM -0600, Tim Gardner wrote:
>>>>>> I've been pursuing a simple reproducer for an NFS lockup that shows
>>>>>> up under stress. There is a bunch of info (some of it extraneous) in
>>>>>> http://bugs.launchpad.net/bugs/561210. I can reproduce it by writing
>>>>>> to loopback-mounted NFS exports:
>>>>>>
>>>>>> /etc/fstab: 127.0.0.1:/srv /mnt/srv nfs rw 0 2
>>>>>> /etc/exports: /srv 127.0.0.1(rw,insecure,no_subtree_check)
>>>>>>
>>>>>> See the attached scripts test_master.sh and test_client.sh. I simply
>>>>>> repeat './test_master.sh wait' until nfsd locks up, typically within
>>>>>> 1-3 cycles, e.g.,
>>>>>
>>>>> Without looking at the dmesg and scripts carefully to confirm, one
>>>>> possible explanation is a deadlock when the server can't allocate
>>>>> memory required to service client requests, memory which the client
>>>>> itself needs to free by writing back dirty pages, but can't because
>>>>> the server isn't processing its writes.
>>>>
>>>> Having looked closely I'd say it is almost certainly this issue.
>>>> nfsd thread 1266 is in zone_reclaim waiting on a page to be written
>>>> out so the memory can be reused.
>>>> The other nfsd threads are blocking on a mutex held by 1266.
>>>> The dd processes are waiting for pages to be written to the server.
>>>>
>>>> The particular page that 1266 is waiting on is almost certainly a
>>>> page on an NFS file, so you have a cyclic deadlock.
>>>>
>>>>>
>>>>> For that reason we just don't support loopback mounts--they're OK for
>>>>> light testing, but it would be difficult to make them completely
>>>>> robust under load.
>>>>
>>>> I wonder if we could use 'containers' to partition available memory
>>>> between 'nfsd threads' and 'everything else'?? Probably not worth the
>>>> effort.
>>>>
>>>> NeilBrown
>>>>
>>>
>>> I'm currently working with my support folks to reproduce this using
>>> the exact same configuration as the customer, e.g., an NFS server
>>> (running as a guest on a VMWare ESX host) serving multiple gigabit
>>> clients.
>>>
>>> I assume that is a reasonable scenario?
>>
>> Assuming no VMWare problem (which I know nothing about), sure.
>>
>> --b.
>>
>
> The support folks were able to reproduce the failure using external
> clients after about 6 hours. We're thinking that it's the same symptom as
> seen in https://bugzilla.kernel.org/show_bug.cgi?id=16056. That
> backported patch b608b283a962caaa280756bc8563016a71712acf from Trond was
> just incorporated into the Ubuntu 10.04 kernel, so they'll retest to see
> if it's a bona fide fix.
>
> rtg

The solution appears to be to twiddle with /proc/sys/vm/min_free_kbytes 
and /proc/sys/vm/drop_caches, though I'm not sure this addresses the 
root cause. Perhaps low memory really is the root cause.

At any rate, their solution was to set min_free_kbytes to 4GB, and to 
'echo 1 > /proc/sys/vm/drop_caches' whenever free memory fell below 8GB. 
Not particularly elegant, but it appears to have stopped their server 
from wedging.
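
For anyone who wants to try the same workaround, here is a rough sketch of
the check. It is not the script the customer ran. The function and variable
names (mem_free_kb, check_and_drop, MEMINFO, DROP_CACHES, THRESHOLD_KB) are
made up for illustration, and reading "free memory" as the MemFree line of
/proc/meminfo is an assumption on my part.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, not the customer's actual script.
# Paths are overridable so the logic can be exercised without root.
MEMINFO=${MEMINFO:-/proc/meminfo}
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}
THRESHOLD_KB=${THRESHOLD_KB:-8388608}   # 8 GB expressed in kB

# Print the MemFree value (in kB) from the meminfo file.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "$MEMINFO"
}

# Drop the page cache if free memory is below the threshold.
# Prints "dropped" or "ok" so the caller can log the decision.
check_and_drop() {
    free_kb=$(mem_free_kb)
    if [ "$free_kb" -lt "$THRESHOLD_KB" ]; then
        echo 1 > "$DROP_CACHES"
        echo dropped
    else
        echo ok
    fi
}
```

The reserve would be set once as root with
'echo 4194304 > /proc/sys/vm/min_free_kbytes' (the value is in kB, so this
is 4GB), and check_and_drop would then be called periodically, for example
from a loop with a sleep between calls. The polling interval is a guess;
the report above does not say how often they checked.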

rtg

-- 
Tim Gardner tim.gardner@canonical.com

Thread overview: 10+ messages
2010-09-01 15:39 nfsd deadlock, 2.6.36-rc3 Tim Gardner
2010-09-01 16:54 ` J. Bruce Fields
2010-09-01 20:55   ` Neil Brown
2010-09-01 21:05     ` J. Bruce Fields
2010-09-01 21:11     ` Tim Gardner
2010-09-01 21:13       ` J. Bruce Fields
2010-09-02 15:13         ` Tim Gardner
2010-09-08 16:52           ` Tim Gardner [this message]
2010-09-08 17:50             ` J. Bruce Fields
2010-09-03 19:12 ` Maciej Rutecki
