From: Al Viro <viro@zeniv.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Christian Brauner <brauner@kernel.org>
Subject: [BUG] propagate_umount() breakage
Date: Mon, 12 May 2025 00:27:32 +0100 [thread overview]
Message-ID: <20250511232732.GC2023217@ZenIV> (raw)
reproducer:
------------------------------------------------------------
# create a playground
mkdir /tmp/foo
mount -t tmpfs none /tmp/foo
mount --make-private /tmp/foo
cd /tmp/foo
# set one-way propagation from A to B
mkdir A
mkdir B
mount -t tmpfs none A
mount --make-shared A
mount --bind A B
mount --make-slave B
# A/1 -> B/1, A/1/2 -> B/1/2
mkdir A/1
mount -t tmpfs none A/1
mkdir A/1/2
mount -t tmpfs none A/1/2
# overmount the entire B/1/2
mount -t tmpfs none B/1/2
# make sure it's busy - set a mount at B/1/2/x
mkdir B/1/2/x
mount -t tmpfs none B/1/2/x
stat B/1/x # shouldn't exist
umount -l A/1
stat B/1/x # ... and now it does
------------------------------------------------------------
What happens is that mounts on B/1 and B/1/2 had been considered
as victims - and taken out, since the overmount on top of B/1/2
overmounted the root of the first mount on B/1/2 and it got
reparented - all the way to B/1.
Correct behaviour would be to have B/1 left in place and upper
B/1/2 to be reparented once.
As an aside, that's a catch from several days of attempts to prove
correctness of propagate_umount(); I'm still not sure there's
nothing else wrong with it. _Maybe_ it's the only problem in
there, but reconstructing the proof of correctness has turned
out to be a real bitch ;-/
I seriously suspect that a lot of headache comes from trying
to combine collecting the full set of potential victims with
deciding what can and what can not be taken out - gathering
all of them first would simplify things. First pass would've
logics similar to your variant, but without __propagate_umount()
part[*]
After the set is collected, we could go through it, doing the
something along the lines of
how = 0
for each child in children(m)
if child in set
continue
how = 1
if child is not mounted on root
how = 2
break
if how == 2
kick_out_ancestors(m)
remove m itself from set // needs to cooperate with outer loop
else if how == 1
for (p = m; p in set && p is mounted on root; p = p->mnt_parent)
;
if p in set
kick_out_ancestors(p)
else if children(m) is empty && m is not locked // to optimize things a bit
commit to unmounting m (again, needs to cooperate with the outer loop)
"Cooperate with the outer loop" might mean something like
having this per-element work leave removal of its argument to
caller and report whether its argument needs to be removed.
After that we'd be left with everything still in the set
having no out-of-set children that would be obstacles.
The only thing that remains after that is MNT_LOCKED and
that's as simple as
while set is not empty
m = first element of set
for (p = m; p is locked && p in set; p = p->mnt_parent)
;
if p not in set {
if p is not committed to unmount
remove everything from m to p from set
continue
} else {
p = p->mnt_parent
}
commit everything from m to p to unmount, removing from set
I'll try to put something of that sort together, along with
detailed explanation of what it's doing - in D/f/*, rather than
buring it in commit messages, and along with "read and update
D/f/... if you are ever touch this function" in fs/pnode.c itself;
this fun is not something I would like to repeat several years
down the road ;-/
We *REALLY* need a good set of regression tests for that stuff.
If you have anything along those lines sitting somewhere, please
post a reference. The same goes for everybody else who might
have something in that general area.
[*] well, that and with fixed skip_propagation_subtree() logics; it's
easier to combine it with propagation_next() rather than trying to set
the things up so that the next call of propagation_next() would DTRT -
it's less work and yours actually has a corner case if the last element
of ->mnt_slave_list has non-empty ->mnt_slave_list itself.
next reply other threads:[~2025-05-11 23:27 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-11 23:27 Al Viro [this message]
2025-05-12 4:50 ` [BUG] propagate_umount() breakage Eric W. Biederman
2025-05-13 3:56 ` Al Viro
2025-05-15 11:41 ` Al Viro
2025-05-15 11:47 ` Al Viro
2025-05-16 5:21 ` [RFC][CFT][PATCH] Rewrite of propagate_umount() (was Re: [BUG] propagate_umount() breakage) Al Viro
2025-05-19 18:11 ` Linus Torvalds
2025-05-19 21:35 ` Al Viro
2025-05-19 22:08 ` Linus Torvalds
2025-05-19 22:26 ` Linus Torvalds
2025-05-20 22:27 ` Eric W. Biederman
2025-05-20 23:08 ` Al Viro
2025-05-23 2:10 ` [RFC][CFT][PATCH v2] Rewrite of propagate_umount() Al Viro
[not found] ` <20250520075317.GB2023217@ZenIV>
[not found] ` <87y0uqlvxs.fsf@email.froward.int.ebiederm.org>
[not found] ` <20250520231854.GF2023217@ZenIV>
[not found] ` <20250521023219.GA1309405@ZenIV>
[not found] ` <20250617041754.GA1591763@ZenIV>
2025-06-17 21:18 ` [PATCH][RFC] replace collect_mounts()/drop_collected_mounts() with safer variant Al Viro
2025-05-20 11:10 ` [RFC][CFT][PATCH] Rewrite of propagate_umount() (was Re: [BUG] propagate_umount() breakage) Christian Brauner
2025-05-21 2:11 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250511232732.GC2023217@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=ebiederm@xmission.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).