Re: [PATCH] xfs: avoid lockdep false positives in xfs_trans_alloc

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Jeff Mahoney <jeffm@suse.com>, Theodore Tso <tytso@mit.edu>,
	Jan Kara <jack@suse.cz>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: [PATCH] xfs: avoid lockdep false positives in xfs_trans_alloc
Date: Thu, 4 Oct 2018 08:59:17 +1000	[thread overview]
Message-ID: <20181003225917.GR18567@dastard> (raw)
In-Reply-To: <CAOQ4uxhrEm4uMhsw5G+_Bv87-zPenJ9gwvDJvCPSmRkQAWiszQ@mail.gmail.com>

On Wed, Oct 03, 2018 at 06:45:13AM +0300, Amir Goldstein wrote:
> On Wed, Oct 3, 2018 at 2:14 AM Dave Chinner <david@fromorbit.com> wrote:
> [...]
> > > Seems like freezing any of the layers if overlay itself is not frozen
> > > is not a good idea.
> >
> > That's something we can't directly control. e.g. lower filesystem is
> > on a DM volume. DM can freeze the lower fileystem through the block
> > device when a dm command is run. It may well be that the admins that
> > set up the storage and filesystem layer have no idea that there are
> > now overlay users on top of the filesystem they originally set up.
> > Indeed, the admins may not even know that dm operations freeze
> > filesystems because it happens completely transparently to them.
> >
> 
> I don't think we should be binding the stacked filesystem issues with
> the stacked block over fs issues.

It's the same problem.  Hacking a one-off solution to hide a specific
overlay symptom does not address the root problem. And, besides, if
you stack like this:

overlay
  lower_fs
    loopback dev
      loop img fs

And freeze the loop img fs, overlay can still get stuck in it's
shrinker because the the lower_fs gets stuck doing IO on the frozen
loop img fs.

i.e. it's the same issue - kswapd will get stuck doing reclaim from
the overlay shrinker.

> The latter is more complex to solve
> generally and has by design non limited stack depth. The former has
> limited stack depth (2) and each sb knows its own stack depth, which
> is already used in overlay to annotate lockdep correctly.
> 
> If vfs stores a reverse tree of stacked fs dependencies, then individual
> sb freeze can be solved.

Don't make me mention bind mounts... :/

> Drawing the fire away from overlayfs.. I personally find the behavior that
> a process that only has files open for read could block when filesystem is
> frozen somewhat unexpected to users (even if I can expect it).

Filesystem reads have always been able to modify the file (e.g.
atime updates). Not to mention filesystem reads require memory
allocation, and that means any GFP_KERNEL direct reclaim can get
stuck on a frozen filesystem if that filesystem hasn't properly
cleared out it's dangerous reclaimable objects when freezing.

> I wonder out loud if it wouldn't be friendlier for any filesystem to defer
> "garbage collection" (e.g. truncate deleted inode blocks) to thawing time,

https://marc.info/?l=linux-xfs&m=153022904909523&w=2

Been on the list of "nice to have" unlink optimisations for XFS
since 2008.

But it's a performance optimisation and precursor for offlining AGs
for online repair, not something we've ever considered as needed for
correctness or to prevent deadlocks.

> just as those operations are already run on mount (post crash) anyway.

That's a completely different context - log recovery is much more
constrained in the amount of work it needs to do and has much more
freedom for handling errors (i.e. it can just leak bad unlinked
inodes). Runtime deferral of post-unlink, post-reference inode
reclaim is a *lot* more complex than processing pending unlinks in
log recovery.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2018-10-04  5:49 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-07  1:51 [PATCH] xfs: avoid lockdep false positives in xfs_trans_alloc Dave Chinner
2018-09-07 14:07 ` Brian Foster
2018-09-10  6:47 ` Christoph Hellwig
2018-09-30  7:56 ` Amir Goldstein
2018-10-01  1:09   ` Dave Chinner
2018-10-01  7:56     ` Amir Goldstein
2018-10-01 22:32       ` Dave Chinner
2018-10-02  4:02         ` Amir Goldstein
2018-10-02  6:39           ` Dave Chinner
2018-10-02  7:33             ` Miklos Szeredi
2018-10-02 23:14               ` Dave Chinner
2018-10-03  3:45                 ` Amir Goldstein
2018-10-03 22:59                   ` Dave Chinner [this message]
2018-10-03 23:14                     ` Miklos Szeredi
2018-10-04  5:38                       ` Dave Chinner
2018-10-04  7:33                         ` Miklos Szeredi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181003225917.GR18567@dastard \
    --to=david@fromorbit.com \
    --cc=amir73il@gmail.com \
    --cc=jack@suse.cz \
    --cc=jeffm@suse.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).