public inbox for linux-kernel@vger.kernel.org
From: Tim Gardner <tim.gardner@canonical.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>,
	linux-nfs@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Trond.Myklebust@netapp.com
Subject: Re: nfsd deadlock, 2.6.36-rc3
Date: Wed, 08 Sep 2010 10:52:51 -0600
Message-ID: <4C87BF63.3070808@canonical.com>
In-Reply-To: <4C7FBF26.3090203@canonical.com>

On 09/02/2010 09:13 AM, Tim Gardner wrote:
> On 09/01/2010 03:13 PM, J. Bruce Fields wrote:
>> On Wed, Sep 01, 2010 at 03:11:23PM -0600, Tim Gardner wrote:
>>> On 09/01/2010 02:55 PM, Neil Brown wrote:
>>>> On Wed, 1 Sep 2010 12:54:01 -0400
>>>> "J. Bruce Fields"<bfields@fieldses.org> wrote:
>>>>
>>>>> On Wed, Sep 01, 2010 at 09:39:55AM -0600, Tim Gardner wrote:
>>>>>> I've been pursuing a simple reproducer for an NFS lockup that shows
>>>>>> up under stress. There is a bunch of info (some of it extraneous) in
>>>>>> http://bugs.launchpad.net/bugs/561210. I can reproduce it by writing
>>>>>> loop mounted NFS exports:
>>>>>>
>>>>>> /etc/fstab: 127.0.0.1:/srv /mnt/srv nfs rw 0 2
>>>>>> /etc/exports: /srv 127.0.0.1(rw,insecure,no_subtree_check)
>>>>>>
>>>>>> See the attached scripts test_master.sh and test_client.sh. I simply
>>>>>> repeat './test_master.sh wait' until nfsd locks up, typically within
>>>>>> 1-3 cycles, e.g.,
>>>>>
>>>>> Without looking at the dmesg and scripts carefully to confirm, one
>>>>> possible explanation is a deadlock when the server can't allocate
>>>>> memory required to service client requests, memory which the client
>>>>> itself needs to free by writing back dirty pages, but can't because
>>>>> the server isn't processing its writes.
>>>>
>>>> Having looked closely I'd say it is almost certainly this issue.
>>>> nfsd thread 1266 is in zone_reclaim waiting on a page to be written
>>>> out so the memory can be reused.
>>>> The other nfsd threads are blocking on a mutex held by 1266.
>>>> The dd processes are waiting for pages to be written to the server
>>>>
>>>> The particular page that 1266 is waiting on is almost certainly a
>>>> page on an NFS file, so you have a cyclic deadlock.
>>>>
>>>>>
>>>>> For that reason we just don't support loopback mounts--they're OK for
>>>>> light testing, but it would be difficult to make them completely
>>>>> robust under load.
>>>>
>>>> I wonder if we could use 'containers' to partition available memory
>>>> between 'nfsd threads' and 'everything else'?? Probably not worth the
>>>> effort.
>>>>
>>>> NeilBrown
>>>>
>>>
>>> I'm currently working with my support folks to reproduce this using
>>> the exact same configuration as the customer, e.g., an NFS server
>>> (running as a guest on a VMWare ESX host) serving multiple gigabit
>>> clients.
>>>
>>> I assume that is a reasonable scenario?
>>
>> Assuming no VMWare problem (which I know nothing about), sure.
>>
>> --b.
>>
>
> The support folks were able to reproduce the failure using external
> clients after about 6 hours. We're thinking that it's the same symptom as
> seen in https://bugzilla.kernel.org/show_bug.cgi?id=16056. That
> backported patch b608b283a962caaa280756bc8563016a71712acf from Trond was
> just incorporated into the Ubuntu 10.04 kernel, so they'll retest to see
> if it's a bona fide fix.
>
> rtg

The solution appears to be to twiddle with /proc/sys/vm/min_free_kbytes 
and /proc/sys/vm/drop_caches, though I'm not sure this addresses the 
root cause. Perhaps low memory really is the root cause.

At any rate, their solution was to set min_free_kbytes to 4GB, and to 
'echo 1 > /proc/sys/vm/drop_caches' whenever free memory fell below 8GB. 
Not particularly elegant, but it appears to have stopped their server 
from wedging.
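
For anyone wanting to try the same workaround, here is a rough sketch of
what it might look like as a script. This is not the customer's actual
script. The function names are made up for illustration, and reading
"free memory" as the MemFree field of /proc/meminfo is my assumption.
Only the two thresholds (4GB and 8GB) and the two /proc files come from
the description above. Writing to /proc/sys/vm needs root.

```shell
#!/bin/sh
# Illustrative sketch only: thresholds are from the message above,
# everything else (names, MemFree as the metric) is an assumption.

MIN_FREE_KB=$((4 * 1024 * 1024))      # 4GB expressed in kB
DROP_BELOW_KB=$((8 * 1024 * 1024))    # 8GB expressed in kB

# Print the MemFree value (in kB) from a meminfo-format file.
# Defaults to /proc/meminfo; a path can be passed for testing.
mem_free_kb() {
    awk '$1 == "MemFree:" { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) when free memory ($1) is below the threshold ($2).
should_drop_caches() {
    [ "$1" -lt "$2" ]
}

# One pass of the check: drop the page cache if free memory is low.
check_once() {
    free_kb=$(mem_free_kb)
    if should_drop_caches "$free_kb" "$DROP_BELOW_KB"; then
        echo 1 > /proc/sys/vm/drop_caches
    fi
}

# One-time setup (as root), then call check_once periodically,
# e.g. from cron or a sleep loop:
#   echo "$MIN_FREE_KB" > /proc/sys/vm/min_free_kbytes
#   check_once
```

Note that writing 1 to drop_caches only releases clean page cache, so
dirty pages still have to be written back before they can be freed.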

rtg

-- 
Tim Gardner tim.gardner@canonical.com

Thread overview: 10+ messages
2010-09-01 15:39 nfsd deadlock, 2.6.36-rc3 Tim Gardner
2010-09-01 16:54 ` J. Bruce Fields
2010-09-01 20:55   ` Neil Brown
2010-09-01 21:05     ` J. Bruce Fields
2010-09-01 21:11     ` Tim Gardner
2010-09-01 21:13       ` J. Bruce Fields
2010-09-02 15:13         ` Tim Gardner
2010-09-08 16:52           ` Tim Gardner [this message]
2010-09-08 17:50             ` J. Bruce Fields
2010-09-03 19:12 ` Maciej Rutecki
