From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
xfs <linux-xfs@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Byungchul Park <byungchul.park@lge.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: false positive lockdep splat with loop device
Date: Tue, 26 Sep 2017 14:24:21 +1000
Message-ID: <20170926042421.GP10955@dastard>
In-Reply-To: <CAOQ4uxixLMRU2YT-3csTr_684vNUZ=2CazKKD-ojO68fC4f6tA@mail.gmail.com>
On Thu, Sep 21, 2017 at 09:43:41AM +0300, Amir Goldstein wrote:
> On Thu, Sep 21, 2017 at 1:22 AM, Dave Chinner <david@fromorbit.com> wrote:
> > [cc lkml, PeterZ and Byungchul]
> ...
> > The thing is, this IO completion has nothing to do with the lower
> > filesystem - it's the IO completion for the filesystem on the loop
> > device (the upper filesystem) and is not in any way related to the
> > IO completion from the dax device the lower filesystem is waiting
> > on.
> >
> > IOWs, this is a false positive.
> >
> > Peter, this is the sort of false positive I mentioned were likely to
> > occur without some serious work to annotate the IO stack to prevent
> > them. We can nest multiple layers of IO completions and locking in
> > the IO stack via things like loop and RAID devices. They can be
> > nested to arbitrary depths, too (e.g. loop on fs on loop on fs on
> > dm-raid on n * (loop on fs) on bdev) so this new completion lockdep
> > checking is going to be a source of false positives until there is
> > an effective (and simple!) way of providing context based completion
> > annotations to avoid them...
> >
>
> IMO, the way to handle this is to add 'nesting_depth' information
> on blockdev (or bdi?). 'nesting' in the sense of blockdev->fs->blockdev->fs.
> AFAIK, the only blockdev drivers that need to bump nesting_depth
> are loop and nbd??
You're assuming that this sort of "completion inversion" can only
happen with bdev->fs->bdev, and that submit_bio_wait() is the only
place where completions are used in stackable block devices.

AFAICT, this could happen with any block device that uses
completions and can be stacked multiple times. e.g. MD has a function
sync_page_io() that calls submit_bio_wait(), and that is called from
places in the raid 5, raid 10, raid 1 and bitmap layers (plus others
in DM). These can get stacked anywhere - even on top of loop devices
- and so I think the issue has a much wider scope than just loop and
nbd devices.
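
To illustrate the scope, here is a minimal userspace sketch (Python, not
kernel code) of why class-based tracking conflates the layers. It
assumes, as a simplification, that a dependency checker keys each
completion by the code site that declares it rather than by instance.
The names dependency_graph, has_cycle, by_class and by_instance are
hypothetical and are not lockdep's API.

```python
from collections import defaultdict


def dependency_graph(stack, key):
    """Build a wait graph for a device stack listed from top to bottom.

    An edge A -> B means: the IO that signals completion A may itself
    have to wait on completion B in the layer below.
    """
    graph = defaultdict(set)
    for depth in range(len(stack) - 1):
        upper = key(depth, stack[depth])
        lower = key(depth + 1, stack[depth + 1])
        graph[upper].add(lower)
    return graph


def has_cycle(graph):
    """Depth-first search that reports whether the graph has a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = defaultdict(int)  # every node starts WHITE

    def visit(node):
        state[node] = GREY
        for nxt in graph.get(node, ()):
            if state[nxt] == GREY:
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in list(graph))


# Assumption: every layer waits via the same call site, so a class-keyed
# checker sees a single node for all of them.
def by_class(depth, dev):
    return "submit_bio_wait"


# Keying by (depth, device) keeps each layer's completion distinct.
def by_instance(depth, dev):
    return (depth, dev)


stack = ["md", "loop", "bdev"]  # hypothetical stack, top to bottom
class_view_cycle = has_cycle(dependency_graph(stack, by_class))
instance_view_cycle = has_cycle(dependency_graph(stack, by_instance))
```

In this model the class-keyed graph collapses to one node with an edge
to itself, which the checker reports as a cycle, while the
instance-keyed graph is a simple chain with no cycle. Nothing in the
model is specific to loop or nbd; any stackable layer that waits
through a shared call site behaves the same way.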
> Not sure if the kernel should limit loop blockdev nesting depth??
There's no way we should do that just because new lockdep
functionality is unable to express such constructs.
-Dave.
--
Dave Chinner
david@fromorbit.com