linux-nfs.vger.kernel.org archive mirror
From: Benjamin Coddington <bcodding@redhat.com>
To: Shawn Bohrer <shawn.bohrer@gmail.com>
Cc: linux-nfs@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mayoff@rgmadvisors.com,
	Jeff Layton <jeff.layton@primarydata.com>,
	fsorenso@redhat.com
Subject: Re: NFS Freezer and stuck tasks
Date: Fri, 1 May 2015 16:56:00 -0400 (EDT)
Message-ID: <alpine.OSX.2.19.9992.1505011631390.946@planck.local>
In-Reply-To: <20150304220027.GB20242@sbohrermbp13-local.rgmadvisors.com>

On Wed, 4 Mar 2015, Shawn Bohrer wrote:

> Hello,
>
> We're using the Linux cgroup Freezer on some machines that use NFS and
> have run into what appears to be a bug where frozen tasks are blocking
> running tasks and preventing them from completing.  On one of our
> machines, which happens to be running an older 3.10.46 kernel, we have
> frozen some of the tasks on the system using the cgroup Freezer.  We
> also have a separate set of tasks that are NOT frozen and are stuck
> trying to open some files on NFS.
>
> Looking at the frozen tasks there are several that have the following
> stack:
>
> [<ffffffff814fd055>] rpc_wait_bit_killable+0x35/0x80
> [<ffffffff814fd01d>] __rpc_wait_for_completion_task+0x2d/0x30
> [<ffffffff811dce5d>] nfs4_run_open_task+0x11d/0x170
> [<ffffffff811de7a3>] _nfs4_open_and_get_state+0x53/0x260
> [<ffffffff811e12d1>] nfs4_do_open+0x121/0x400
> [<ffffffff811e15e1>] nfs4_atomic_open+0x31/0x50
> [<ffffffff811f02dc>] nfs4_file_open+0xac/0x180
> [<ffffffff811479be>] do_dentry_open.isra.19+0x1ee/0x280
> [<ffffffff81147b3e>] finish_open+0x1e/0x30
> [<ffffffff811578d2>] do_last.isra.64+0x2c2/0xc40
> [<ffffffff81158519>] path_openat.isra.65+0x2c9/0x490
> [<ffffffff81158c38>] do_filp_open+0x38/0x80
> [<ffffffff81148cd4>] do_sys_open+0xe4/0x1c0
> [<ffffffff81148dce>] SyS_open+0x1e/0x20
> [<ffffffff8153e719>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Here it looks like we are waiting in a wait queue inside
> rpc_wait_bit_killable() for RPC_TASK_ACTIVE.
>
> And there is a single task with a stack that looks like the following:
>
> [<ffffffff8107dc05>] __refrigerator+0x55/0x150
> [<ffffffff814fd086>] rpc_wait_bit_killable+0x66/0x80
> [<ffffffff814fd01d>] __rpc_wait_for_completion_task+0x2d/0x30
> [<ffffffff811dce5d>] nfs4_run_open_task+0x11d/0x170
> [<ffffffff811de7a3>] _nfs4_open_and_get_state+0x53/0x260
> [<ffffffff811e12d1>] nfs4_do_open+0x121/0x400
> [<ffffffff811e15e1>] nfs4_atomic_open+0x31/0x50
> [<ffffffff811f02dc>] nfs4_file_open+0xac/0x180
> [<ffffffff811479be>] do_dentry_open.isra.19+0x1ee/0x280
> [<ffffffff81147b3e>] finish_open+0x1e/0x30
> [<ffffffff811578d2>] do_last.isra.64+0x2c2/0xc40
> [<ffffffff81158519>] path_openat.isra.65+0x2c9/0x490
> [<ffffffff81158c38>] do_filp_open+0x38/0x80
> [<ffffffff81148cd4>] do_sys_open+0xe4/0x1c0
> [<ffffffff81148dce>] SyS_open+0x1e/0x20
> [<ffffffff8153e719>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> This looks similar, but the different offset into
> rpc_wait_bit_killable() shows that we have returned from the
> schedule() call in freezable_schedule() and are now blocked in
> __refrigerator() inside freezer_count().
>
> Similarly, if you look at the tasks that are NOT frozen but are stuck
> opening an NFS file, they also have the following stack, showing they
> are waiting in the wait queue for RPC_TASK_ACTIVE.
>
> [<ffffffff814fd055>] rpc_wait_bit_killable+0x35/0x80
> [<ffffffff814fd01d>] __rpc_wait_for_completion_task+0x2d/0x30
> [<ffffffff811dce5d>] nfs4_run_open_task+0x11d/0x170
> [<ffffffff811de7a3>] _nfs4_open_and_get_state+0x53/0x260
> [<ffffffff811e12d1>] nfs4_do_open+0x121/0x400
> [<ffffffff811e15e1>] nfs4_atomic_open+0x31/0x50
> [<ffffffff811f02dc>] nfs4_file_open+0xac/0x180
> [<ffffffff811479be>] do_dentry_open.isra.19+0x1ee/0x280
> [<ffffffff81147b3e>] finish_open+0x1e/0x30
> [<ffffffff811578d2>] do_last.isra.64+0x2c2/0xc40
> [<ffffffff81158519>] path_openat.isra.65+0x2c9/0x490
> [<ffffffff81158c38>] do_filp_open+0x38/0x80
> [<ffffffff81148cd4>] do_sys_open+0xe4/0x1c0
> [<ffffffff81148dce>] SyS_open+0x1e/0x20
> [<ffffffff8153e719>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> We have hit this a couple of times now and know that if we THAW all of
> the frozen tasks, the running tasks will unwedge and finish.
>
> We have also tried thawing the single task that is frozen in
> __refrigerator() inside rpc_wait_bit_killable().  This usually
> results in a different frozen task entering the __refrigerator() state
> inside rpc_wait_bit_killable().  It looks like each one of those tasks
> must wake up another, letting it progress.  Again, if you thaw enough
> of the frozen tasks, eventually everything unwedges and completes.
>
> I've looked through the 3.10 stable patches since 3.10.46 and don't
> see anything that looks like it addresses this.  Does anyone have any
> idea what might be going on here, and what the fix might be?
>
> Thanks,
> Shawn

Hi Shawn, I've just started looking at this myself.  Frank Sorensen points
out in https://bugzilla.redhat.com/show_bug.cgi?id=1209143 that the problem
is a task taking the xprt lock and then ending up in the refrigerator,
effectively blocking other tasks from proceeding.
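
To make that failure mode concrete, here is a small userspace model of it
in C.  It is a hypothetical sketch, not kernel, NFS or freezer code: a
pthread mutex and condition variable stand in for the scheduler, the
xprt_locked flag stands in for the RPC transport lock, the freezing flag
stands in for the freezer cgroup being FROZEN, and the holder's wait loop
plays the role of __refrigerator().  The names lock_holder(), waiter() and
simulate() are invented for illustration, and the model collapses the real
wait on RPC_TASK_ACTIVE into a direct wait on the lock.

```c
/*
 * Hypothetical userspace model (not kernel code): a task takes a shared
 * "xprt lock", is frozen before releasing it, and unfrozen tasks stall.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MAX_WAITERS 8

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool xprt_locked;   /* stands in for the RPC transport lock */
static bool freezing;      /* stands in for the cgroup FROZEN state */
static bool holder_parked; /* holder has reached its freeze point */
static int completed;      /* tasks that finished their "RPC" */

/* Task in the frozen cgroup: takes the lock, then hits a freeze point. */
static void *lock_holder(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mu);
	xprt_locked = true;
	holder_parked = true;
	pthread_cond_broadcast(&cv);
	/* Analog of __refrigerator(): sleep until thawed, lock still held. */
	while (freezing)
		pthread_cond_wait(&cv, &mu);
	xprt_locked = false;
	completed++;
	pthread_cond_broadcast(&cv);
	pthread_mutex_unlock(&mu);
	return NULL;
}

/* Unfrozen task: cannot finish until the lock is free. */
static void *waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mu);
	while (xprt_locked)
		pthread_cond_wait(&cv, &mu);
	completed++; /* lock is free: the "RPC" goes through */
	pthread_cond_broadcast(&cv);
	pthread_mutex_unlock(&mu);
	return NULL;
}

/*
 * Returns how many tasks completed within 100 ms while the holder was
 * frozen (-1 on failure); *after_thaw gets the total after thawing.
 */
int simulate(int nwaiters, int *after_thaw)
{
	pthread_t holder, w[MAX_WAITERS];
	struct timespec deadline;
	int during, started;

	assert(nwaiters >= 0 && nwaiters <= MAX_WAITERS);
	completed = 0;
	xprt_locked = false;
	holder_parked = false;
	freezing = true;
	if (pthread_create(&holder, NULL, lock_holder, NULL) != 0)
		return -1;
	pthread_mutex_lock(&mu);
	while (!holder_parked)
		pthread_cond_wait(&cv, &mu);
	pthread_mutex_unlock(&mu);
	for (started = 0; started < nwaiters; started++)
		if (pthread_create(&w[started], NULL, waiter, NULL) != 0)
			break;
	/* Give the unfrozen tasks 100 ms to make any progress. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 100L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	pthread_mutex_lock(&mu);
	while (completed == 0 &&
	       pthread_cond_timedwait(&cv, &mu, &deadline) != ETIMEDOUT)
		;
	during = completed;
	/* Thaw: the holder leaves the "refrigerator" and drops the lock. */
	freezing = false;
	pthread_cond_broadcast(&cv);
	pthread_mutex_unlock(&mu);
	pthread_join(holder, NULL);
	for (int i = 0; i < started; i++)
		pthread_join(w[i], NULL);
	*after_thaw = completed;
	return during;
}
```

Called from your own main() (built with cc -pthread), simulate(3, &after)
should return 0, because no unfrozen task can finish while the frozen
holder keeps the lock, and should leave after at 4 once the holder is
thawed, which mirrors the observation that thawing unwedges everything.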

Jeff, any suggestions on how to proceed here?

Ben

Thread overview: 6+ messages
2015-03-04 22:00 NFS Freezer and stuck tasks Shawn Bohrer
2015-05-01 20:56 ` Benjamin Coddington [this message]
2015-05-01 21:10   ` Benjamin Coddington
2015-05-01 21:18     ` Shawn Bohrer
2015-05-01 23:17     ` Jeff Layton
2015-05-03  2:03       ` Tejun Heo
