From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Lai, Yi" <yi1.lai@linux.intel.com>,
Tycho Andersen <tycho@tycho.pizza>,
Andrei Vagin <avagin@google.com>,
Pavel Tikhomirov <snorcht@gmail.com>
Subject: [PATCHES][RFC][CFT] mount fixes
Date: Sat, 16 Aug 2025 00:33:16 +0100 [thread overview]
Message-ID: <20250815233316.GS222315@ZenIV> (raw)
Several regression fixes of varying severity; live in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git, individual
patches in followups. Help with testing and reviewing would be very
welcome.
1) attach_recursive_mnt() slowdown - severe in some pathological cases.
This cycle regression, fortunately trivial to fix. Underlying mechanism
of slowdown (bypassed by that fix) is due to bad semantics of skip_mnt_tree(),
but that's a separate story.
The case with severe slowdown is mounting on top of a long chain of overmounts,
all peers with each other. Ends up being time-quardatic by the depth of chain,
easily triggers softlockup warnings. Since the slow part is under mount_lock,
all pathname resolution pretty much stops for the duration. Kudos for catching
that to "Lai, Yi" <yi1.lai@linux.intel.com>...
2) reparenting in propagate_umount() should *NOT* be done to mounts that are
themselves going to be taken out. I'd even put "<argument> is not to going
away, and it overmounts the top of a stack of mounts that are going away"
right on top of reparent()... and lost the logics in its caller that used
to check that at some point during reshuffling the queue ;-/ This cycle's
regression and that one is not just a slowdown - in some cases it might end
up as data corruptor. Trivial to fix...
3) permission checks for --make-{shared,slave,private,unlinkable} (i.e.
MS_{SHARED,...} in mount(2) flags) ended up being too restrictive.
The checks used to be too weak until the last cycle. Then they were
made stricter - basically to "mount is a part of our namespace and we have
admin permissions there". Turns out that CRIU userland depended upon the
weak^Wabsent checks we used to have there before. Sane replacement that
would suffice for them is "mount is a part of *SOME* namespace that we have
admin permissions for". Last cycle userland regression...
4) change_mnt_propagation() slowdown in some cases. On umount we want all
victims out of propagation graph and propagation between the surviving mounts
to be unchanged. So if victim used to have slaves, they need to be transfered
to its peer (if any) or master. In case when victim had many peers, all
taken out by that umount(), that ended up with all its slaves being gradually
transferred between all peers until we finally ran out of those. It can
easily lead to quadratic time. The patch in -rc1 switched that to "just
find where they'll end up upfront, and move them once", which eliminated
that... except that I hadn't noticed that on massage of change_mnt_propagation()
we ended up calculating the place where they'd be transferred in cases
when there had been nothing to transfer. With obvious effects when there
had been a large peer group entirely taken out, with not a single slave between
them. The minimal fix ("call propagation_source() only if we are going to
use its return value") is enough to recover in all cases.
Longer term we should kick all victims out of propagation graph at once
and I have that plotted out, but that's for the next merge window; for
now the minimal obvious fix is good enough.
next reply other threads:[~2025-08-15 23:33 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-15 23:33 Al Viro [this message]
2025-08-15 23:34 ` [PATCH 1/4] fix the softlockups in attach_recursive_mnt() Al Viro
2025-08-19 10:18 ` Christian Brauner
2025-08-15 23:34 ` [PATCH 2/4] propagate_umount(): only surviving overmounts should be remounted Al Viro
2025-08-19 10:19 ` Christian Brauner
2025-08-15 23:35 ` [PATCH 3/4] use uniform permission checks for all mount propagation changes Al Viro
2025-08-16 18:28 ` Andrei Vagin
2025-08-19 4:44 ` Pavel Tikhomirov
2025-08-19 10:20 ` Christian Brauner
2025-08-15 23:36 ` [PATCH 4/4] change_mnt_propagation(): calculate propagation source only if we'll need it Al Viro
2025-08-19 10:20 ` Christian Brauner
2025-08-16 15:58 ` [PATCHES][RFC][CFT] mount fixes Al Viro
2025-08-19 16:12 ` [git pull] " Al Viro
2025-08-19 17:31 ` Linus Torvalds
2025-08-19 17:33 ` pr-tracker-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250815233316.GS222315@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=avagin@google.com \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=snorcht@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=tycho@tycho.pizza \
--cc=yi1.lai@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).