* [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
@ 2025-05-09 8:26 Al Viro
2025-05-09 8:28 ` Al Viro
0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2025-05-09 8:26 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Pavel Tikhomirov, Christian Brauner
AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing
group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data
structures in question are actually documented.
Documentation/filesystem/sharedsubtree.rst:
All vfsmounts in a peer group have the same ->mnt_master. If it is
non-NULL, they form a contiguous (ordered) segment of slave list.
fs/pnode.c:
* Note that peer groups form contiguous segments of slave lists.
fs/namespace.c:do_set_group():
if (IS_MNT_SLAVE(from)) {
struct mount *m = from->mnt_master;
list_add(&to->mnt_slave, &m->mnt_slave_list);
to->mnt_master = m;
}
if (IS_MNT_SHARED(from)) {
to->mnt_group_id = from->mnt_group_id;
list_add(&to->mnt_share, &from->mnt_share);
lock_mount_hash();
set_mnt_shared(to);
unlock_mount_hash();
}
Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group
list) and into the beginning of the slave list 'from' belongs to. IOW,
contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED.
Which is what happens when the peer group 'from' is in gets propagation
from somewhere.
It's not hard to fix - something like
if (IS_MNT_SHARED(from)) {
to->mnt_group_id = from->mnt_group_id;
list_add(&to->mnt_share, &from->mnt_share);
if (IS_MNT_SLAVE(from))
list_add(&to->mnt_slave, &from->mnt_slave);
to->mnt_master = from->mnt_master;
lock_mount_hash();
set_mnt_shared(to);
unlock_mount_hash();
} else if (IS_MNT_SLAVE(from)) {
to->mnt_master = from->mnt_master;
list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list);
}
ought to do it. I'm nowhere near sufficiently awake right now to put
together a regression test, but unless I'm missing something subtle, it
should be possible to get a fairly obvious breakage of propagate_mnt()
out of that...
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) 2025-05-09 8:26 [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro @ 2025-05-09 8:28 ` Al Viro 2025-05-09 10:59 ` Pavel Tikhomirov 0 siblings, 1 reply; 4+ messages in thread From: Al Viro @ 2025-05-09 8:28 UTC (permalink / raw) To: linux-fsdevel; +Cc: Pavel Tikhomirov, Christian Brauner On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote: > AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing > group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data > structures in question are actually documented. > > Documentation/filesystem/sharedsubtree.rst: > All vfsmounts in a peer group have the same ->mnt_master. If it is > non-NULL, they form a contiguous (ordered) segment of slave list. > > fs/pnode.c: > * Note that peer groups form contiguous segments of slave lists. > > fs/namespace.c:do_set_group(): > if (IS_MNT_SLAVE(from)) { > struct mount *m = from->mnt_master; > > list_add(&to->mnt_slave, &m->mnt_slave_list); > to->mnt_master = m; > } > > if (IS_MNT_SHARED(from)) { > to->mnt_group_id = from->mnt_group_id; > list_add(&to->mnt_share, &from->mnt_share); > lock_mount_hash(); > set_mnt_shared(to); > unlock_mount_hash(); > } > > Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group > list) and into the beginning of the slave list 'from' belongs to. IOW, > contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED. > Which is what happens when the peer group 'from' is in gets propagation > from somewhere. > > It's not hard to fix - something like > > if (IS_MNT_SHARED(from)) { > to->mnt_group_id = from->mnt_group_id; > list_add(&to->mnt_share, &from->mnt_share); > if (IS_MNT_SLAVE(from)) > list_add(&to->mnt_slave, &from->mnt_slave); > to->mnt_master = from->mnt_master; > lock_mount_hash(); > set_mnt_shared(to); > unlock_mount_hash(); > } else if (IS_MNT_SLAVE(from)) { > to->mnt_master = from->mnt_master; > list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list); > } > > ought to do it. I'm nowhere near sufficiently awake right now to put > together a regression test, but unless I'm missing something subtle, it > should be possible to get a fairly obvious breakage of propagate_mnt() > out of that... Not sufficiently awake is right - wrong address on Cc... Anyway, bedtime for me... ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) 2025-05-09 8:28 ` Al Viro @ 2025-05-09 10:59 ` Pavel Tikhomirov 2025-05-09 11:06 ` Pavel Tikhomirov 0 siblings, 1 reply; 4+ messages in thread From: Pavel Tikhomirov @ 2025-05-09 10:59 UTC (permalink / raw) To: Al Viro, linux-fsdevel; +Cc: Christian Brauner On 5/9/25 16:28, Al Viro wrote: > On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote: >> AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing >> group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data >> structures in question are actually documented. >> >> Documentation/filesystem/sharedsubtree.rst: >> All vfsmounts in a peer group have the same ->mnt_master. If it is >> non-NULL, they form a contiguous (ordered) segment of slave list. >> >> fs/pnode.c: >> * Note that peer groups form contiguous segments of slave lists. >> >> fs/namespace.c:do_set_group(): >> if (IS_MNT_SLAVE(from)) { >> struct mount *m = from->mnt_master; >> >> list_add(&to->mnt_slave, &m->mnt_slave_list); >> to->mnt_master = m; >> } >> >> if (IS_MNT_SHARED(from)) { >> to->mnt_group_id = from->mnt_group_id; >> list_add(&to->mnt_share, &from->mnt_share); >> lock_mount_hash(); >> set_mnt_shared(to); >> unlock_mount_hash(); >> } >> >> Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group >> list) and into the beginning of the slave list 'from' belongs to. IOW, >> contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED. >> Which is what happens when the peer group 'from' is in gets propagation >> from somewhere. Agreed, list ordering consistency looks broken by my commit. >> >> It's not hard to fix - something like >> >> if (IS_MNT_SHARED(from)) { >> to->mnt_group_id = from->mnt_group_id; >> list_add(&to->mnt_share, &from->mnt_share); >> if (IS_MNT_SLAVE(from)) >> list_add(&to->mnt_slave, &from->mnt_slave); >> to->mnt_master = from->mnt_master; >> lock_mount_hash(); >> set_mnt_shared(to); >> unlock_mount_hash(); >> } else if (IS_MNT_SLAVE(from)) { >> to->mnt_master = from->mnt_master; >> list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list); >> } >> >> ought to do it. Yes it should work. In case (IS_MNT_SLAVE(from) && !IS_MNT_SHARED(from)) we can probably also do: list_add(&to->mnt_slave, &from->mnt_slave); as next slave after "from" is definitely not from the same shared group with "from" (as it's not in a shared group) so we won't break list continuity. That will allow to simplify code change to: if (IS_MNT_SLAVE(from)) { struct mount *m = from->mnt_master; - list_add(&to->mnt_slave, &m->mnt_slave_list); + list_add(&to->mnt_slave, &from->mnt_slave); to->mnt_master = m; } if (IS_MNT_SHARED(from)) { to->mnt_group_id = from->mnt_group_id; list_add(&to->mnt_share, &from->mnt_share); lock_mount_hash(); set_mnt_shared(to); unlock_mount_hash(); } If I'm not missing something (didn't test yet). >> I'm nowhere near sufficiently awake right now to put >> together a regression test, but unless I'm missing something subtle, it >> should be possible to get a fairly obvious breakage of propagate_mnt() >> out of that... I managed to see weird behavior like that: # rmdir /tmp/{A,B,C,D,E,Z} # unshare -m mkdir /tmp/{A,B,C,D,E,Z} mount --make-rprivate / mount -t tmpfs tmpfs /tmp/A mount --bind /tmp/A /tmp/Z mount --make-shared /tmp/A mount --bind /tmp/A /tmp/B mount --make-slave /tmp/B mount --make-shared /tmp/B mount --bind /tmp/B /tmp/C mount --bind /tmp/C /tmp/D mount --bind /tmp/D /tmp/E ./setgroup-v2 /tmp/C /tmp/Z mkdir /tmp/A/subdir mount -t tmpfs tmpfs /tmp/A/subdir This creates 16 subdir mounts instead of expected 6: cat /proc/self/mountinfo | grep /tmp/ 1071 1065 0:109 / /tmp/A rw,relatime shared:556 - tmpfs tmpfs rw,seclabel,inode64 1073 1065 0:109 / /tmp/Z rw,relatime shared:1040 master:556 - tmpfs tmpfs rw,seclabel,inode64 1076 1065 0:109 / /tmp/B rw,relatime shared:1040 master:556 - tmpfs tmpfs rw,seclabel,inode64 1077 1065 0:109 / /tmp/C rw,relatime shared:1040 master:556 - tmpfs tmpfs rw,seclabel,inode64 1078 1065 0:109 / /tmp/D rw,relatime shared:1040 master:556 - tmpfs tmpfs rw,seclabel,inode64 1079 1065 0:109 / /tmp/E rw,relatime shared:1040 master:556 - tmpfs tmpfs rw,seclabel,inode64 1080 1071 0:136 / /tmp/A/subdir rw,relatime shared:1041 - tmpfs tmpfs rw,seclabel,inode64 1081 1073 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1082 1078 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1083 1079 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1084 1076 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1085 1077 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1086 1084 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1087 1085 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1088 1081 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1089 1082 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1090 1083 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1142 1089 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1143 1090 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1144 1086 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1145 1087 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 1146 1088 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - tmpfs tmpfs rw,seclabel,inode64 Maybe that can be converted to a regression test. > > Not sufficiently awake is right - wrong address on Cc... Anyway, bedtime > for me... -- Best regards, Pavel Tikhomirov Senior Software Developer, Virtuozzo. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) 2025-05-09 10:59 ` Pavel Tikhomirov @ 2025-05-09 11:06 ` Pavel Tikhomirov 0 siblings, 0 replies; 4+ messages in thread From: Pavel Tikhomirov @ 2025-05-09 11:06 UTC (permalink / raw) To: Al Viro, linux-fsdevel; +Cc: Christian Brauner On 5/9/25 18:59, Pavel Tikhomirov wrote: > > > On 5/9/25 16:28, Al Viro wrote: >> On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote: >>> AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing >>> group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data >>> structures in question are actually documented. >>> >>> Documentation/filesystem/sharedsubtree.rst: >>> All vfsmounts in a peer group have the same ->mnt_master. >>> If it is >>> non-NULL, they form a contiguous (ordered) segment of slave list. >>> >>> fs/pnode.c: >>> * Note that peer groups form contiguous segments of slave lists. >>> >>> fs/namespace.c:do_set_group(): >>> if (IS_MNT_SLAVE(from)) { >>> struct mount *m = from->mnt_master; >>> >>> list_add(&to->mnt_slave, &m->mnt_slave_list); >>> to->mnt_master = m; >>> } >>> >>> if (IS_MNT_SHARED(from)) { >>> to->mnt_group_id = from->mnt_group_id; >>> list_add(&to->mnt_share, &from->mnt_share); >>> lock_mount_hash(); >>> set_mnt_shared(to); >>> unlock_mount_hash(); >>> } >>> >>> Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group >>> list) and into the beginning of the slave list 'from' belongs to. IOW, >>> contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED. >>> Which is what happens when the peer group 'from' is in gets propagation >>> from somewhere. > > Agreed, list ordering consistency looks broken by my commit. > >>> >>> It's not hard to fix - something like >>> >>> if (IS_MNT_SHARED(from)) { >>> to->mnt_group_id = from->mnt_group_id; >>> list_add(&to->mnt_share, &from->mnt_share); >>> if (IS_MNT_SLAVE(from)) >>> list_add(&to->mnt_slave, &from->mnt_slave); >>> to->mnt_master = from->mnt_master; >>> lock_mount_hash(); >>> set_mnt_shared(to); >>> unlock_mount_hash(); >>> } else if (IS_MNT_SLAVE(from)) { >>> to->mnt_master = from->mnt_master; >>> list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list); >>> } >>> >>> ought to do it. > > Yes it should work. > > In case (IS_MNT_SLAVE(from) && !IS_MNT_SHARED(from)) we can probably > also do: > > list_add(&to->mnt_slave, &from->mnt_slave); > > as next slave after "from" is definitely not from the same shared group > with "from" (as it's not in a shared group) so we won't break list > continuity. > > That will allow to simplify code change to: > > if (IS_MNT_SLAVE(from)) { > struct mount *m = from->mnt_master; > > - list_add(&to->mnt_slave, &m->mnt_slave_list); > + list_add(&to->mnt_slave, &from->mnt_slave); > to->mnt_master = m; > } > > if (IS_MNT_SHARED(from)) { > to->mnt_group_id = from->mnt_group_id; > list_add(&to->mnt_share, &from->mnt_share); > lock_mount_hash(); > set_mnt_shared(to); > unlock_mount_hash(); > } > > If I'm not missing something (didn't test yet). > >>> I'm nowhere near sufficiently awake right now to put >>> together a regression test, but unless I'm missing something subtle, it >>> should be possible to get a fairly obvious breakage of propagate_mnt() >>> out of that... > > I managed to see weird behavior like that: > > # rmdir /tmp/{A,B,C,D,E,Z} > # unshare -m > mkdir /tmp/{A,B,C,D,E,Z} > mount --make-rprivate / > mount -t tmpfs tmpfs /tmp/A > mount --bind /tmp/A /tmp/Z > mount --make-shared /tmp/A > mount --bind /tmp/A /tmp/B > mount --make-slave /tmp/B > mount --make-shared /tmp/B > mount --bind /tmp/B /tmp/C > mount --bind /tmp/C /tmp/D > mount --bind /tmp/D /tmp/E > ./setgroup-v2 /tmp/C /tmp/Z Note I put ./setgroup-v2 code here: https://github.com/Snorch/linux-helpers/blob/master/mount_set_group.c > mkdir /tmp/A/subdir > mount -t tmpfs tmpfs /tmp/A/subdir > > This creates 16 subdir mounts instead of expected 6: > > cat /proc/self/mountinfo | grep /tmp/ > 1071 1065 0:109 / /tmp/A rw,relatime shared:556 - tmpfs tmpfs > rw,seclabel,inode64 > 1073 1065 0:109 / /tmp/Z rw,relatime shared:1040 master:556 - tmpfs > tmpfs rw,seclabel,inode64 > 1076 1065 0:109 / /tmp/B rw,relatime shared:1040 master:556 - tmpfs > tmpfs rw,seclabel,inode64 > 1077 1065 0:109 / /tmp/C rw,relatime shared:1040 master:556 - tmpfs > tmpfs rw,seclabel,inode64 > 1078 1065 0:109 / /tmp/D rw,relatime shared:1040 master:556 - tmpfs > tmpfs rw,seclabel,inode64 > 1079 1065 0:109 / /tmp/E rw,relatime shared:1040 master:556 - tmpfs > tmpfs rw,seclabel,inode64 > 1080 1071 0:136 / /tmp/A/subdir rw,relatime shared:1041 - tmpfs tmpfs > rw,seclabel,inode64 > 1081 1073 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1082 1078 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1083 1079 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1084 1076 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1085 1077 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1086 1084 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1087 1085 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1088 1081 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1089 1082 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1090 1083 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1142 1089 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1143 1090 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1144 1086 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1145 1087 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > 1146 1088 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - > tmpfs tmpfs rw,seclabel,inode64 > > Maybe that can be converted to a regression test. > >> >> Not sufficiently awake is right - wrong address on Cc... Anyway, bedtime >> for me... > > > -- Best regards, Pavel Tikhomirov Senior Software Developer, Virtuozzo. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-05-09 11:07 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-09 8:26 [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro 2025-05-09 8:28 ` Al Viro 2025-05-09 10:59 ` Pavel Tikhomirov 2025-05-09 11:06 ` Pavel Tikhomirov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.