All of lore.kernel.org
 help / color / mirror / Atom feed
From: Omar Sandoval <osandov@osandov.com>
To: Filipe Manana <fdmanana@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	kernel-team@fb.com, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] Btrfs: fix workqueue deadlock on dependent filesystems
Date: Mon, 12 Aug 2019 11:48:12 -0700	[thread overview]
Message-ID: <20190812184812.GA4142@vader> (raw)
In-Reply-To: <CAL3q7H4cSMNSKfQKtFk9Q5Shw3VxMFZQ0E7uusL0efHzyN3MXw@mail.gmail.com>

On Mon, Aug 12, 2019 at 12:38:55PM +0100, Filipe Manana wrote:
> On Tue, Aug 6, 2019 at 6:48 PM Omar Sandoval <osandov@osandov.com> wrote:
> >
> > From: Omar Sandoval <osandov@fb.com>
> >
> > We hit a the following very strange deadlock on a system with Btrfs on a
> > loop device backed by another Btrfs filesystem:
> >
> > 1. The top (loop device) filesystem queues an async_cow work item from
> >    cow_file_range_async(). We'll call this work X.
> > 2. Worker thread A starts work X (normal_work_helper()).
> > 3. Worker thread A executes the ordered work for the top filesystem
> >    (run_ordered_work()).
> > 4. Worker thread A finishes the ordered work for work X and frees X
> >    (work->ordered_free()).
> > 5. Worker thread A executes another ordered work and gets blocked on I/O
> >    to the bottom filesystem (still in run_ordered_work()).
> > 6. Meanwhile, the bottom filesystem allocates and queues an async_cow
> >    work item which happens to be the recently-freed X.
> > 7. The workqueue code sees that X is already being executed by worker
> >    thread A, so it schedules X to be executed _after_ worker thread A
> >    finishes (see the find_worker_executing_work() call in
> >    process_one_work()).
> >
> > Now, the top filesystem is waiting for I/O on the bottom filesystem, but
> > the bottom filesystem is waiting for the top filesystem to finish, so we
> > deadlock.
> >
> > This happens because we are breaking the workqueue assumption that a
> > work item cannot be recycled while it still depends on other work. Fix
> > it by waiting to free the work item until we are done with all of the
> > related ordered work.
> >
> > P.S.:
> >
> > One might ask why the workqueue code doesn't try to detect a recycled
> > work item. It actually does try by checking whether the work item has
> > the same work function (find_worker_executing_work()), but in our case
> > the function is the same. This is the only key that the workqueue code
> > has available to compare, short of adding an additional, layer-violating
> > "custom key". Considering that we're the only ones that have ever hit
> > this, we should just play by the rules.
> >
> > Unfortunately, we haven't been able to create a minimal reproducer other
> > than our full container setup using a compress-force=zstd filesystem on
> > top of another compress-force=zstd filesystem.
> >
> > Suggested-by: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Looks good to me, thanks.
> Another variant of the problem Liu fixed back in 2014 (commit
> 9e0af23764344f7f1b68e4eefbe7dc865018b63d).

Good point. I think we can actually get rid of those unique helpers with
this fix. I'll send some followup cleanups.

  reply	other threads:[~2019-08-12 18:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-06 17:34 [PATCH] Btrfs: fix workqueue deadlock on dependent filesystems Omar Sandoval
2019-08-07  7:17 ` Nikolay Borisov
2019-08-07 17:08   ` Omar Sandoval
2019-08-12 11:38 ` Filipe Manana
2019-08-12 18:48   ` Omar Sandoval [this message]
2019-08-12 18:53     ` Filipe Manana
2019-08-19 16:37 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190812184812.GA4142@vader \
    --to=osandov@osandov.com \
    --cc=fdmanana@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.