From: Filipe Manana <fdmanana@gmail.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
kernel-team@fb.com, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] Btrfs: fix workqueue deadlock on dependent filesystems
Date: Mon, 12 Aug 2019 19:53:10 +0100
Message-ID: <CAL3q7H4d6mGsiKRmsELw1o6XQBVTWOEQ-bYqwr5RS7i__jarzA@mail.gmail.com>
In-Reply-To: <20190812184812.GA4142@vader>

On Mon, Aug 12, 2019 at 7:48 PM Omar Sandoval <osandov@osandov.com> wrote:
>
> On Mon, Aug 12, 2019 at 12:38:55PM +0100, Filipe Manana wrote:
> > On Tue, Aug 6, 2019 at 6:48 PM Omar Sandoval <osandov@osandov.com> wrote:
> > >
> > > From: Omar Sandoval <osandov@fb.com>
> > >
> > > We hit the following very strange deadlock on a system with Btrfs on a
> > > loop device backed by another Btrfs filesystem:
> > >
> > > 1. The top (loop device) filesystem queues an async_cow work item from
> > > cow_file_range_async(). We'll call this work X.
> > > 2. Worker thread A starts work X (normal_work_helper()).
> > > 3. Worker thread A executes the ordered work for the top filesystem
> > > (run_ordered_work()).
> > > 4. Worker thread A finishes the ordered work for work X and frees X
> > > (work->ordered_free()).
> > > 5. Worker thread A executes another ordered work and gets blocked on I/O
> > > to the bottom filesystem (still in run_ordered_work()).
> > > 6. Meanwhile, the bottom filesystem allocates and queues an async_cow
> > > work item which happens to be the recently-freed X.
> > > 7. The workqueue code sees that X is already being executed by worker
> > > thread A, so it schedules X to be executed _after_ worker thread A
> > > finishes (see the find_worker_executing_work() call in
> > > process_one_work()).
> > >
> > > Now, the top filesystem is waiting for I/O on the bottom filesystem, but
> > > the bottom filesystem is waiting for the top filesystem to finish, so we
> > > deadlock.
> > >
> > > This happens because we are breaking the workqueue assumption that a
> > > work item cannot be recycled while it still depends on other work. Fix
> > > it by waiting to free the work item until we are done with all of the
> > > related ordered work.
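
The deferred free described in the paragraph above can be modelled in user space roughly as follows. This is an illustrative sketch, not the kernel patch: `struct item`, `item_free()`, `run_ordered()` and `demo_self_outlives_ordered_work()` are hypothetical names, the free is simulated with a flag, and deferring only the worker's own item (`self`) is an assumption about one possible shape of the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item with an ordered step. */
struct item {
	bool freed;              /* set by the simulated free below */
	bool self_live_when_run; /* was the worker's own item still allocated
				  * when this item's ordered step ran? */
};

/* Simulated free: only flips a flag so the sketch can inspect it later. */
static void item_free(struct item *it)
{
	it->freed = true;
}

/*
 * Run the ordered step for each queued item in order. 'self' is the item
 * the calling worker is currently executing; its free is deferred until
 * every ordered step has finished, so its address cannot be recycled
 * while the worker is still busy with ordered work.
 */
static size_t run_ordered(struct item **queue, size_t n, struct item *self)
{
	bool free_self = false;
	size_t i;

	assert(self != NULL);

	for (i = 0; i < n; i++) {
		struct item *it = queue[i];

		/* Ordered step; in the reported scenario this may block on I/O. */
		it->self_live_when_run = !self->freed;

		if (it == self)
			free_self = true; /* defer: do not free our own item yet */
		else
			item_free(it);
	}

	if (free_self)
		item_free(self);
	return i;
}

/* Worker runs item x, then also handles the ordered step of item y. */
static bool demo_self_outlives_ordered_work(void)
{
	struct item x = {0}, y = {0};
	struct item *queue[] = { &x, &y };

	run_ordered(queue, 2, &x);
	return y.self_live_when_run && x.freed && y.freed;
}
```

In this model, `demo_self_outlives_ordered_work()` checks that item `x` is still allocated while the ordered step for `y` runs, and that both items are released by the time `run_ordered()` returns.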
> > >
> > > P.S.:
> > >
> > > One might ask why the workqueue code doesn't try to detect a recycled
> > > work item. It actually does try by checking whether the work item has
> > > the same work function (find_worker_executing_work()), but in our case
> > > the function is the same. This is the only key that the workqueue code
> > > has available to compare, short of adding an additional, layer-violating
> > > "custom key". Considering that we're the only ones that have ever hit
> > > this, we should just play by the rules.
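
To illustrate why comparing the work function does not help here, the following user-space sketch models the check with hypothetical types. `struct work_item`, `struct worker`, `worker_is_executing()` and `recycled_item_collides()` are names made up for this example and only approximate the address-plus-function comparison described above; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a work item and the worker running it. */
struct work_item;
typedef void (*work_fn)(struct work_item *);

struct work_item {
	work_fn func;
};

struct worker {
	struct work_item *current_work; /* address of the item being run */
	work_fn current_func;           /* its function when it was started */
};

/* An item is treated as "already running" if address and function match. */
static bool worker_is_executing(const struct worker *w,
				const struct work_item *item)
{
	assert(w != NULL && item != NULL);
	return w->current_work == item && w->current_func == item->func;
}

static void async_cow_fn(struct work_item *item) { (void)item; }
static void other_fn(struct work_item *item) { (void)item; }

/*
 * Simulate freeing an item and reallocating the same memory for a new,
 * unrelated item that uses 'new_func'. Returns true if the new item would
 * be mistaken for the one the worker is still executing.
 */
static bool recycled_item_collides(work_fn new_func)
{
	struct work_item slot = { .func = async_cow_fn };
	struct worker a = { .current_work = &slot, .current_func = slot.func };

	/* Same address, new owner: only the function can tell them apart. */
	slot.func = new_func;
	return worker_is_executing(&a, &slot);
}
```

With a different function (`other_fn`) the recycled item is recognized as new, but with the same function (`async_cow_fn`) it is indistinguishable from the item that worker A is still running, which is the situation in the report.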
> > >
> > > Unfortunately, we haven't been able to create a minimal reproducer other
> > > than our full container setup using a compress-force=zstd filesystem on
> > > top of another compress-force=zstd filesystem.
> > >
> > > Suggested-by: Tejun Heo <tj@kernel.org>
> > > Signed-off-by: Omar Sandoval <osandov@fb.com>
> >
> > Reviewed-by: Filipe Manana <fdmanana@suse.com>
> >
> > Looks good to me, thanks.
> > Another variant of the problem Liu fixed back in 2014 (commit
> > 9e0af23764344f7f1b68e4eefbe7dc865018b63d).
>
> Good point. I think we can actually get rid of those unique helpers with
> this fix. I'll send some followup cleanups.

Great! Thanks.
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”