Re: nfsd oops on Linus' current tree.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Tejun Heo <tj@kernel.org>
Cc: "Adamson, Dros" <Weston.Adamson@netapp.com>,
	"Myklebust, Trond" <Trond.Myklebust@netapp.com>,
	Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: nfsd oops on Linus' current tree.
Date: Thu, 3 Jan 2013 18:08:11 -0500	[thread overview]
Message-ID: <20130103230811.GC3238@fieldses.org> (raw)
In-Reply-To: <20130103220309.GA2753@mtj.dyndns.org>

On Thu, Jan 03, 2013 at 05:03:09PM -0500, Tejun Heo wrote:
> Hello, guys.
> 
> On Thu, Jan 03, 2013 at 04:28:37PM +0000, Adamson, Dros wrote:
> > The deadlock we were seeing was:
> > 
> > - task A gets queued on rpciod workqueue and assigned kworker-0:0
> > - task B gets queued on rpciod workqueue and assigned the same kworker (kworker-0:0)
> > - task A gets run, calls rpc_shutdown_client(), which will loop forever waiting for task B to run rpc_async_release()
> > - task B will never run rpc_async_release() - it can't run until kworker-0:0 is free, which won't happen until task A (rpc_shutdown_client) is done
> > 
> > The same deadlock happened when we tried queuing the tasks on a
> > different workqueues -- queue_work() assigns the task to a kworker
> > thread and it's luck of the draw if it's the same kworker as task A.
> > We tried the different workqueue options, but nothing changed this
> > behavior.
> 
> Work items don't get assigned to workers on queueing.  Idle workers
> pick up work items.

Oh, so that's why the case where we can't create a new worker is the
only case we should need the rescuers for.  Got it.  I think.

--b.

> A work item is directly assigned to a specific
> worker iff the worker is already executing that specific work item or
> the new work item is "linked" to the one it's currently executing.
> Currently, the only case where a linked work item is used is when
> flushing which is guaranteed to not introduce dependency the other way
> around.
> 
> So, your diagnosis looks wrong to me.  If such problem existed, we
> would be seeing deadlocks all over the place.
> 
> > Once a work struct is queued, there is no way to back out of the
> > deadlock.  From kernel/workqueue.c:insert_wq_barrier comment:
> 
> Yes, there are.  cancel_work[_sync]() do exactly that.
> 
> >  * Currently, a queued barrier can't be canceled.  This is because
> >  * try_to_grab_pending() can't determine whether the work to be
> >  * grabbed is at the head of the queue and thus can't clear LINKED
> >  * flag of the previous work while there must be a valid next work
> >  * after a work with LINKED flag set.
> > 
> > So once a work struct is queued and there is an ordering dependency
> > (i.e. task A is before task B), there is no way to back task B out -
> > so we can't just call cancel_work() or something on task B in
> > rpc_shutdown_client.
> 
> A *barrier* can't be canceled.  A barrier is used only to flush work
> items.  The above comment means that we currently don't (or can't)
> support canceling flush_work().  It has *nothing* to do with canceling
> regular work items.  You can cancel work items fine.
> 
> > The root of our issue is that rpc_shutdown_client is never safe to
> > call from a workqueue context - it loops until there are no more
> > tasks, marking tasks as killed and waiting for them to be cleaned up
> > in each task's own workqueue context. Any tasks that have already
> > been assigned to the same kworker thread will never have a chance to
> > run this cleanup stage.
> >
> > When fixing this deadlock, Trond and I discussed changing how
> > rpc_shutdown_client works (making it workqueue safe), but Trond felt
> > that it'd be better to just not call it from a workqueue context and
> > print a warning if it is.
> >
> > IIRC we tried using different workqueues with WQ_MEM_RECLAIM (with
> > no success), but I'd argue that even if that did work it would still
> > be very easy to call rpc_shutdown_client from the wrong context and
> > MUCH harder to detect it.  It's also unclear to me if setting rpciod
> > workqueue to WQ_MEM_RECLAIM would limit it to one kworker, etc...
> 
> It looks like you guys ended up in a weird place misled by wrong
> analysis.  Unless you require more than one concurrent execution on
> the same workqueue, WQ_MEM_RECLAIM guarantees forward progress.  It
> won't deadlock because "a different work item is queued to the same
> worker".  The whole thing is designed *exactly* to avoid problems like
> that.  So, I'd strongly recommend looking again at why the deadlocks
> are occurring.
> 
> Thanks.
> 
> -- 
> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

next prev parent reply	other threads:[~2013-01-03 23:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-21 15:33 nfsd oops on Linus' current tree Dave Jones
2012-12-21 18:08 ` J. Bruce Fields
2012-12-21 18:40   ` Myklebust, Trond
2012-12-21 23:08     ` J. Bruce Fields
2012-12-21 23:15       ` Myklebust, Trond
2012-12-21 23:26         ` J. Bruce Fields
2012-12-21 23:36           ` Myklebust, Trond
2012-12-21 23:45             ` J. Bruce Fields
2013-01-03 16:28               ` Adamson, Dros
2013-01-03 20:11                 ` J. Bruce Fields
2013-01-03 20:27                   ` Adamson, Dros
2013-01-03 20:52                     ` J. Bruce Fields
2013-01-03 22:15                       ` Tejun Heo
2013-01-03 22:08                   ` Tejun Heo
2013-01-03 22:12                     ` Myklebust, Trond
2013-01-03 22:26                       ` Tejun Heo
2013-01-03 22:34                         ` Tejun Heo
2013-01-03 23:11                         ` Myklebust, Trond
     [not found]                         ` <1357254692.55285.33.camel@lade.trondhjem.org>
2013-01-03 23:26                           ` Myklebust, Trond
2013-01-04 17:11                             ` Adamson, Dros
2013-01-04 18:15                               ` [PATCH 1/3] SUNRPC: Ensure that we free the rpc_task after cleanups are done Trond Myklebust
2013-01-04 18:15                                 ` [PATCH 2/3] NFS: Ensure that we free the rpc_task after read and write " Trond Myklebust
2013-01-04 18:15                                   ` [PATCH 3/3] SUNRPC: Partial revert of commit 168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9 Trond Myklebust
2013-01-04 18:29                                   ` [PATCH 2/3] NFS: Ensure that we free the rpc_task after read and write cleanups are done Bruce Fields
2013-01-04 18:52                                     ` Myklebust, Trond
2013-01-04 19:12                                       ` Bruce Fields
2013-01-07 17:47                                 ` [PATCH 1/3] SUNRPC: Ensure that we free the rpc_task after " Tejun Heo
2013-01-07 17:54                                   ` Myklebust, Trond
2013-01-03 22:03                 ` nfsd oops on Linus' current tree Tejun Heo
2013-01-03 23:08                   ` J. Bruce Fields [this message]
2012-12-22  0:48   ` [PATCH] Revert "nfsd: warn on odd reply state in nfsd_vfs_read" J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130103230811.GC3238@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=Trond.Myklebust@netapp.com \
    --cc=Weston.Adamson@netapp.com \
    --cc=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.