From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
linux-fsdevel@vger.kernel.org,
Christian Brauner <brauner@kernel.org>
Subject: Re: [RFC][CFT][PATCH] Rewrite of propagate_umount() (was Re: [BUG] propagate_umount() breakage)
Date: Mon, 19 May 2025 22:35:08 +0100 [thread overview]
Message-ID: <20250519213508.GA2023217@ZenIV> (raw)
In-Reply-To: <CAHk-=wi1r1QFu=mfr75VtsCpx3xw_uy5yMZaCz2Cyxg0fQh4hg@mail.gmail.com>
On Mon, May 19, 2025 at 11:11:10AM -0700, Linus Torvalds wrote:
> Another thing that is either purely syntactic, or shows that I
> *really* don't understand your patch. Why do you do this odd thing:
>
> // reduce the set until it's non-shifting
> for (m = first_candidate(); m; m = trim_one(m))
> ;
>
> which seems to just walk the candidates list in a very non-obvious
> manner (this is one of those "I had to go back and forth to see what
> first_candidate() did and what lists it used" cases).
>
> It *seems* to be the same as
>
> list_for_each_entry_safe(m, tmp, &candidates, mnt_umounting)
> trim_one(m);
>
> because if I read that code right, 'trim_one()' will just always
> return the next entry in that candidate list.
list_for_each_entry_safe() is safe wrt removal of the _current_
element. Suppose the list is <A, B, C, D> and trim_one(A) wants to
take out A, B and D. What we want is to have it followed by trim_one(C),
but how could *anything* outside of trim_one() get to C?
list_for_each_entry_safe() would pick B as the next entry
to consider, which will end up with looping in B ever after.
What trim_one(A) does instead is
* note that A itself needs to be removed
* remove B and D
* remember who currently follows A (the list is down to <A, C>, so C it is)
* remove A
* return C.
We could have a separate list and have trim_one(m) move m to in
case it still looks like a valid candidate. Then the loop would turn
into
while !empty(candidates)
trim_one(first candidate)
move the second list over to candidates (or just use it on the
next stage instead of candidates)
What's slightly confusing about that variant is that m might survive
trim_one(m), only to be taken out by subsequent call of trim_one() for
another mount. So remove_candidate(p) would become "remove p from
candidates or from the set of seen candidates, whichever p currently
belongs to"...
Another variant would be to steal one more bit from mnt_flags,
set it for all candidates when collecting them, have is_candidate() check
that instead of list_empty(&m->mnt_umounting) and clean it where this
variant removes from the list; trim_one() would immediately return if
bit is not set. Then we could really do list_for_each_entry_safe(),
with another loop doing list removals afterwards. Extra work that way,
though, and I still think it's more confusing...
OTOH, "steal one more bit" would allow to get rid of
->mnt_umounting - we could use ->mnt_list for all these sets. IIRC,
the reason for ->mnt_umounting was that we used to maintain ->mnt_list
for everything in a namespace, and having to restore it if we decide
that candidate has to stay alive would be painful. These days we
use rbtree for iterators, so ->mnt_list on those suckers is fair
game...
Re globals - the other bunch is part of propagate_mnt(), not
propagate_umount()... IIRC, at some point I got fed up by arseloads
of arguments passed from propagate_umount() down to the helpers and
went "fuck it, we have a hard dependency on global serialization anyway,
might as well make all those arguments global" (and I remember trying
to go with per-namespace locks; stuff of nightmares, that)...
I'll look into gathering that stuff into a single structure
passed by the callers, but the whole thing (especially on the umount
side) really depends upon not running into another instance working
at the same time. Trying to cope with duelling umount propagations
from two threads... the data structures would be the least of headache
sources there.
next prev parent reply other threads:[~2025-05-19 21:35 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-11 23:27 [BUG] propagate_umount() breakage Al Viro
2025-05-12 4:50 ` Eric W. Biederman
2025-05-13 3:56 ` Al Viro
2025-05-15 11:41 ` Al Viro
2025-05-15 11:47 ` Al Viro
2025-05-16 5:21 ` [RFC][CFT][PATCH] Rewrite of propagate_umount() (was Re: [BUG] propagate_umount() breakage) Al Viro
2025-05-19 18:11 ` Linus Torvalds
2025-05-19 21:35 ` Al Viro [this message]
2025-05-19 22:08 ` Linus Torvalds
2025-05-19 22:26 ` Linus Torvalds
2025-05-20 22:27 ` Eric W. Biederman
2025-05-20 23:08 ` Al Viro
2025-05-23 2:10 ` [RFC][CFT][PATCH v2] Rewrite of propagate_umount() Al Viro
[not found] ` <20250520075317.GB2023217@ZenIV>
[not found] ` <87y0uqlvxs.fsf@email.froward.int.ebiederm.org>
[not found] ` <20250520231854.GF2023217@ZenIV>
[not found] ` <20250521023219.GA1309405@ZenIV>
[not found] ` <20250617041754.GA1591763@ZenIV>
2025-06-17 21:18 ` [PATCH][RFC] replace collect_mounts()/drop_collected_mounts() with safer variant Al Viro
2025-05-20 11:10 ` [RFC][CFT][PATCH] Rewrite of propagate_umount() (was Re: [BUG] propagate_umount() breakage) Christian Brauner
2025-05-21 2:11 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250519213508.GA2023217@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=ebiederm@xmission.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).