From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
brauner@kernel.org, jack@suse.cz, laoar.shao@gmail.com,
linux-fsdevel@vger.kernel.org, longman@redhat.com,
walters@verbum.org, wangkai86@huawei.com, willy@infradead.org
Subject: Re: [PATCH] vfs: move dentry shrinking outside the inode lock in 'rmdir()'
Date: Thu, 23 May 2024 17:18:41 +1000
Message-ID: <Zk7t0WdpHE24UNDG@dread.disaster.area>
In-Reply-To: <20240513163332.GK2118490@ZenIV>
On Mon, May 13, 2024 at 05:33:32PM +0100, Al Viro wrote:
> On Mon, May 13, 2024 at 08:58:33AM -0700, Linus Torvalds wrote:
>
> > We *could* strive for a hybrid approach, where we handle the common
> > case ("not a ton of child dentries") differently, and just get rid of
> > them synchronously, and handle the "millions of children" case by
> > unhashing the directory and dealing with shrinking the children async.
>
> try_to_shrink_children()? Doable, and not even that hard to do, but
> as for shrinking async... We can easily move it out of inode_lock
> on parent, but doing that really async would either need to be
> tied into e.g. remount r/o logics or we'd get userland regressions.
>
> I mean, "I have an opened unlinked file, can't remount r/o" is one
> thing, but "I've done rm -rf ~luser, can't remount r/o for a while"
> when all luser's processes had been killed and nothing is holding
> any of that opened... ouch.
There is no ouch for the vast majority of users: XFS has been doing
background async inode unlink processing since 5.14 (i.e. for almost
3 years now). See commit ab23a7768739 ("xfs: per-cpu deferred inode
inactivation queues") for more of the background on this change - it
was implemented because we needed to allow the scrub code to drop
inode references from within transaction contexts, and evict()
processing could run a nested transaction which could then deadlock
the filesystem.
Hence XFS offloads the inode freeing half of the unlink operation
(i.e. the bit that happens in evict() context) to per-cpu workqueues
instead of doing the work directly in evict() context. We allow
evict() to completely tear down the VFS inode context, but don't
free it in ->destroy_inode() because we still have work to do on it.
XFS doesn't need an active VFS inode context to do the internal
metadata updates needed to free an inode, so it's trivial to defer
this work to a background context outside the VFS inode life cycle.
Hence over half the processing work of every unlink() operation on
XFS is now done in kworker threads rather than via the unlink()
syscall context.
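
As an illustration of the split described above, here is a minimal userspace sketch, not XFS code. It assumes a toy model in which unlink() only queues the freeing work on a per-CPU queue and a separate worker pass drains it later. The names ToyInodeGC, NR_CPUS, unlink() and run_worker() are hypothetical and exist only for this example.

```python
from collections import deque

NR_CPUS = 4  # hypothetical CPU count for this toy model


class ToyInodeGC:
    """Toy model of deferred inode freeing; not the XFS implementation."""

    def __init__(self):
        # One queue per CPU, loosely mirroring the per-cpu design above.
        self.queues = [deque() for _ in range(NR_CPUS)]
        self.freed = []  # inode numbers whose freeing work has been done

    def unlink(self, ino, cpu):
        # Syscall-context half: the name is gone and the in-memory inode is
        # torn down. The freeing work is only queued here, not performed.
        self.queues[cpu % NR_CPUS].append(ino)

    def pending(self):
        # Number of inodes still waiting for their background half.
        return sum(len(q) for q in self.queues)

    def run_worker(self, cpu):
        # Background half: what a kworker would do for this CPU's queue.
        q = self.queues[cpu % NR_CPUS]
        while q:
            self.freed.append(q.popleft())


gc = ToyInodeGC()
for ino in range(8):
    gc.unlink(ino, cpu=ino)      # returns immediately, work is deferred
print("pending after unlink:", gc.pending())
for cpu in range(NR_CPUS):
    gc.run_worker(cpu)           # background pass drains every queue
print("pending after workers:", gc.pending())
```

The worker is invoked explicitly here to keep the model deterministic; in the design described above that half runs asynchronously in kworker threads.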
Yes, that means freeze, remount r/o and unmount will block in
xfs_inodegc_stop() waiting for these async inode freeing operations
to be flushed and completed.
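
The blocking behaviour can be modelled in the same spirit. The sketch below is an assumption-laden toy, not kernel code: ToyFs, its single worker thread and the sleep-based "cost" are all hypothetical, and inodegc_stop() is only named after xfs_inodegc_stop() to show the wait-for-drain step, not to reproduce what the real function does.

```python
import queue
import threading
import time


class ToyFs:
    """Toy filesystem: remount r/o must wait for queued freeing work."""

    def __init__(self):
        self.work = queue.Queue()  # deferred inode freeing work
        self.read_only = False
        self.freed = 0
        threading.Thread(target=self._kworker, daemon=True).start()

    def _kworker(self):
        # Single background worker draining the queue.
        while True:
            cost = self.work.get()
            time.sleep(cost)       # stands in for freeing extents
            self.freed += 1
            self.work.task_done()

    def unlink(self, cost=0.01):
        if self.read_only:
            raise OSError("read-only filesystem")
        self.work.put(cost)        # returns before the freeing is done

    def inodegc_stop(self):
        # Models only the "wait until all queued work has completed" part.
        self.work.join()

    def remount_ro(self):
        # Blocks for as long as the outstanding background work takes.
        self.inodegc_stop()
        self.read_only = True


fs = ToyFs()
for _ in range(5):
    fs.unlink()
start = time.monotonic()
fs.remount_ro()
print(f"remount r/o waited {time.monotonic() - start:.3f}s, freed={fs.freed}")
```

In this model the delay seen by remount_ro() is simply the sum of the remaining work, which is the same work a synchronous unlink() would have had to do before returning.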
However, there have been very few reported issues with freeze,
remount r/o or unmount being significantly delayed - there's an
occasional report of an inode with tens of millions of extents to
free delaying an operation, but that's no change from unlink()
taking minutes to run and delaying the operation that way,
anyway....
-Dave.
--
Dave Chinner
david@fromorbit.com