All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Seth Forshee <seth.forshee@canonical.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Tycho Andersen <tycho.andersen@canonical.com>
Subject: Re: Hang due to nfs letting tasks freeze with locked inodes
Date: Fri, 08 Jul 2016 08:51:54 -0400	[thread overview]
Message-ID: <1467982314.13822.5.camel@redhat.com> (raw)
In-Reply-To: <20160708122224.GA20200@dhcp22.suse.cz>

On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote:
> On Wed 06-07-16 18:07:18, Jeff Layton wrote:
> > 
> > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
> > > 
> > > We're seeing a hang when freezing a container with an nfs bind mount while
> > > running iozone. Two iozone processes were hung with this stack trace.
> > > 
> > >  [] schedule+0x35/0x80
> > >  [] schedule_preempt_disabled+0xe/0x10
> > >  [] __mutex_lock_slowpath+0xb9/0x130
> > >  [] mutex_lock+0x1f/0x30
> > >  [] do_unlinkat+0x12b/0x2d0
> > >  [] SyS_unlink+0x16/0x20
> > >  [] entry_SYSCALL_64_fastpath+0x16/0x71
> > > 
> > > This seems to be due to another iozone thread frozen during unlink with
> > > this stack trace:
> > > 
> > >  [] __refrigerator+0x7a/0x140
> > >  [] nfs4_handle_exception+0x118/0x130 [nfsv4]
> > >  [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
> > >  [] nfs_unlink+0x149/0x350 [nfs]
> > >  [] vfs_unlink+0xf1/0x1a0
> > >  [] do_unlinkat+0x279/0x2d0
> > >  [] SyS_unlink+0x16/0x20
> > >  [] entry_SYSCALL_64_fastpath+0x16/0x71
> > > 
> > > Since nfs is allowing the thread to be frozen with the inode locked it's
> > > preventing other threads trying to lock the same inode from freezing. It
> > > seems like a bad idea for nfs to be doing this.
> > > 
> > Yeah, known problem. Not a simple one to fix though.
> Apart from alternative Dave was mentioning in other email, what is the
> point to use freezable wait from this path in the first place?
> 
> nfs4_handle_exception does nfs4_wait_clnt_recover from the same path and
> that does wait_on_bit_action with TASK_KILLABLE so we are waiting in two
> different modes from the same path AFAICS. There do not seem to be other
> callers of nfs4_delay outside of nfs4_handle_exception. Sounds like
> something is not quite right here to me. If the nfs4_delay did regular
> wait then the freezing would fail as well but at least it would be clear
> who is the culrprit rather than having an indirect dependency.

The codepaths involved there are a lot more complex than that
unfortunately.

nfs4_delay is the function that we use to handle the case where the
server returns NFS4ERR_DELAY. Basically telling us that it's too busy
right now or has some transient error and the client should retry after
a small, sliding delay.

That codepath could probably be made more freezer-safe. The typical
case however, is that we've sent a call and just haven't gotten a
reply. That's the trickier one to handle.

-- 
Jeff Layton <jlayton@redhat.com>

  parent reply	other threads:[~2016-07-08 12:51 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-06 17:46 Hang due to nfs letting tasks freeze with locked inodes Seth Forshee
2016-07-06 22:07 ` Jeff Layton
2016-07-07  3:55   ` Seth Forshee
2016-07-07 10:29     ` Jeff Layton
2016-07-07 23:53   ` Dave Chinner
2016-07-07 23:53     ` Dave Chinner
2016-07-08 11:33     ` Jeff Layton
2016-07-08 12:48     ` Seth Forshee
2016-07-08 12:55       ` Trond Myklebust
2016-07-08 12:55         ` Trond Myklebust
2016-07-08 13:05         ` Trond Myklebust
2016-07-08 13:05           ` Trond Myklebust
2016-07-11  1:20           ` Dave Chinner
2016-07-08 12:22   ` Michal Hocko
2016-07-08 12:22     ` Michal Hocko
2016-07-08 12:47     ` Seth Forshee
2016-07-08 12:51     ` Jeff Layton [this message]
2016-07-08 14:23       ` Michal Hocko
2016-07-08 14:27         ` Jeff Layton
2016-07-11  7:23           ` Michal Hocko
2016-07-11 11:03             ` Jeff Layton
2016-07-11 11:43               ` Michal Hocko
2016-07-11 12:50               ` Seth Forshee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1467982314.13822.5.camel@redhat.com \
    --to=jlayton@redhat.com \
    --cc=anna.schumaker@netapp.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=seth.forshee@canonical.com \
    --cc=trond.myklebust@primarydata.com \
    --cc=tycho.andersen@canonical.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.