From: Tejun Heo <tj@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@redhat.com>,
Eric Sandeen <sandeen@sandeen.net>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH 2/2] xfs: mark the xfs-alloc workqueue as high priority
Date: Sat, 10 Jan 2015 12:41:33 -0500 [thread overview]
Message-ID: <20150110174133.GC25319@htj.dyndns.org> (raw)
In-Reply-To: <20150109232815.GL31508@dastard>
Hello, Dave.
On Sat, Jan 10, 2015 at 10:28:15AM +1100, Dave Chinner wrote:
> process A kworker (1..N)
> ilock(excl)
> alloc
> queue work(allocwq)
> (work queued as no kworker
> threads available)
> execute work from xfsbuf-wq
> xfs_end_io
> ilock(excl)
> (blocks waiting on queued work)
>
> No new kworkers are started, so the queue never makes progress,
> we deadlock.
But allocwq is a separate workqueue from xfsbuf-wq and should have its
own rescuer. The work item queued by process on A is guaranteed to
make forward progress no matter what work items on xfsbuf-wq are
doing. The deadlock as depicted above cannot happen. A workqueue
with WQ_MEM_RECLAIM can deadlock iff an executing work item on the
workqueue deadlocks.
> AFAICT, the only way we can get here is that we have N idle
> kworkers, and N+M works get queued where the allocwq work is at the
> tail end of the queue. This results in newly queued work is not
> kicking a new kworker threadi as there are idle threads, and as
> works are executed the are all for the xfsbuf-wq and blocking on the
> ilock(excl).
The workqueue doesn't work that way. Forward progress guarantee is
separate for each workqueue with WQ_MEM_RECLAIM set. Each is
guaranteed to make forward progress indepdently of what others are
doing.
> We eventually get to the point where there are no more idle
> kworkers, but we still have works queued, and progress is still
> dependent the queued works completing....
>
> This is actually not an uncommon queuing occurrence, because we
> can get storms of end-io works queued from batched IO completion
> processing.
And all these have nothing to with why the deadlock is happening.
> > Ummm, this feel pretty voodoo. In practice, it'd change the order of
> > things being executed and may make certain deadlocks unlikely enough,
> > but I don't think this can be a proper fix.
>
> Right, that's why Eric approached about this a few weeks ago asking
> whether it could be fixed in the workqueue code.
>
> As I've said before (in years gone by), we've got multiple levels of
> priority needed for executing XFS works because of lock ordering
> requirements. We *always* want the allocation workqueue work to run
There's no problem there - create a separate WQ_MEM_RECLAIM workqueue
at each priority level. Each is guaranteed to make forward progress
and as long as the dependency graph isn't looped, the whole thing is
guaranteed to make forward progress.
> before the end-io processing of the xfsbuf-wq and unwritten-wq
> because of this lock inversion, just like we we always want the
> xfsbufd to run before the unwritten-wq because unwritten extent
> conversion may block waiting for metadata buffer IO to complete, and
> we always want the the xfslog-wq works to run before all of them
> because metadata buffer IO may get blocked waiting for buffers
> pinned by the log to be unpinned for log Io completion...
I'm not really following your logic here. Are you saying that xfs is
trying to work around cyclic dependency by manipulating execution
order of specific work items?
> We solve these dependencies in a sane manner with a single high
> priority workqueue level, so we're stuck with hacking around the
> worst of the problems for the moment.
I'm afraid I'm lost. Sans bugs in the rescuer logic, each workqueue
with WQ_MEM_RECLAIM guarantees forward progress of a single work item
at any given time. If there are dependencies among the work items,
each node of the dependency graph should correspond to a separate
workqueue. As long as the dependency graph isn't cyclic, the whole is
guaranteed to make forward progress.
There no reason to play with priorities to avoid deadlock. That
doesn't make any sense to me. Priority or chained queueing, which is
a lot better way to do it, can be used to optimize queueing and
execution behavior if one set of work items are likely to wait on
another set, but that should have *NOTHING* to do with deadlock
avoidance.
If the dependency graph is cyclic, it will deadlock. If not, it
won't. It's as simple as that.
Thanks.
--
tejun
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2015-01-10 17:41 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-09 18:08 [PATCH 0/2] xfs: make xfs allocation workqueue per-mount, and high priority Eric Sandeen
2015-01-09 18:10 ` [PATCH 1/2] xfs: make global xfsalloc workqueue per-mount Eric Sandeen
2015-01-12 15:35 ` Brian Foster
2015-01-09 18:12 ` [PATCH 2/2] xfs: mark the xfs-alloc workqueue as high priority Eric Sandeen
2015-01-09 18:23 ` Tejun Heo
2015-01-09 20:36 ` Eric Sandeen
2015-01-10 19:28 ` Tejun Heo
2015-01-11 0:04 ` Eric Sandeen
2015-01-11 6:33 ` Tejun Heo
2015-01-12 20:09 ` Eric Sandeen
2015-01-12 22:53 ` Tejun Heo
2015-01-12 23:12 ` Eric Sandeen
2015-01-12 23:37 ` Tejun Heo
2015-01-13 19:08 ` Eric Sandeen
2015-01-13 20:19 ` Tejun Heo
2015-01-13 20:29 ` Eric Sandeen
2015-01-13 20:46 ` Tejun Heo
2015-01-13 22:58 ` Eric Sandeen
2015-01-13 23:35 ` [PATCH wq/for-3.19] workqueue: fix subtle pool management issue which can stall whole worker_pool Tejun Heo
2015-01-16 19:32 ` [PATCH workqueue wq/for-3.19-fixes] " Tejun Heo
2015-01-19 2:15 ` Lai Jiangshan
2015-01-09 23:28 ` [PATCH 2/2] xfs: mark the xfs-alloc workqueue as high priority Dave Chinner
2015-01-10 17:41 ` Tejun Heo [this message]
2015-01-12 3:30 ` Dave Chinner
2015-01-13 20:50 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150110174133.GC25319@htj.dyndns.org \
--to=tj@kernel.org \
--cc=david@fromorbit.com \
--cc=sandeen@redhat.com \
--cc=sandeen@sandeen.net \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox