[BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move

All of lore.kernel.org
 help / color / mirror / Atom feed

* [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
@ 2025-05-09  8:26 Al Viro
  2025-05-09  8:28 ` Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2025-05-09  8:26 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Pavel Tikhomirov, Christian Brauner

AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing
group" breaks assertions on ->mnt_share/->mnt_slave.  For once, the data
structures in question are actually documented.

Documentation/filesystem/sharedsubtree.rst:
        All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of slave list.

fs/pnode.c:
 * Note that peer groups form contiguous segments of slave lists.

fs/namespace.c:do_set_group():
        if (IS_MNT_SLAVE(from)) {
                struct mount *m = from->mnt_master;

                list_add(&to->mnt_slave, &m->mnt_slave_list);
                to->mnt_master = m;
        }

        if (IS_MNT_SHARED(from)) {
                to->mnt_group_id = from->mnt_group_id;
                list_add(&to->mnt_share, &from->mnt_share);
                lock_mount_hash();
                set_mnt_shared(to);
                unlock_mount_hash();
        }

Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group
list) and into the beginning of the slave list 'from' belongs to.  IOW,
contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED.
Which is what happens when the peer group 'from' is in gets propagation
from somewhere.

It's not hard to fix - something like

        if (IS_MNT_SHARED(from)) {
		to->mnt_group_id = from->mnt_group_id;
                list_add(&to->mnt_share, &from->mnt_share);
		if (IS_MNT_SLAVE(from))
			list_add(&to->mnt_slave, &from->mnt_slave);
		to->mnt_master = from->mnt_master;
                lock_mount_hash();
                set_mnt_shared(to);
                unlock_mount_hash();
        } else if (IS_MNT_SLAVE(from)) {
		to->mnt_master = from->mnt_master;
		list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list);
	}

ought to do it.  I'm nowhere near sufficiently awake right now to put
together a regression test, but unless I'm missing something subtle, it
should be possible to get a fairly obvious breakage of propagate_mnt()
out of that...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
  2025-05-09  8:26 [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro
@ 2025-05-09  8:28 ` Al Viro
  2025-05-09 10:59   ` Pavel Tikhomirov
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2025-05-09  8:28 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Pavel Tikhomirov, Christian Brauner

On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote:
> AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing
> group" breaks assertions on ->mnt_share/->mnt_slave.  For once, the data
> structures in question are actually documented.
> 
> Documentation/filesystem/sharedsubtree.rst:
>         All vfsmounts in a peer group have the same ->mnt_master.  If it is
> 	non-NULL, they form a contiguous (ordered) segment of slave list.
> 
> fs/pnode.c:
>  * Note that peer groups form contiguous segments of slave lists.
> 
> fs/namespace.c:do_set_group():
>         if (IS_MNT_SLAVE(from)) {
>                 struct mount *m = from->mnt_master;
> 
>                 list_add(&to->mnt_slave, &m->mnt_slave_list);
>                 to->mnt_master = m;
>         }
> 
>         if (IS_MNT_SHARED(from)) {
>                 to->mnt_group_id = from->mnt_group_id;
>                 list_add(&to->mnt_share, &from->mnt_share);
>                 lock_mount_hash();
>                 set_mnt_shared(to);
>                 unlock_mount_hash();
>         }
> 
> Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group
> list) and into the beginning of the slave list 'from' belongs to.  IOW,
> contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED.
> Which is what happens when the peer group 'from' is in gets propagation
> from somewhere.
> 
> It's not hard to fix - something like
> 
>         if (IS_MNT_SHARED(from)) {
> 		to->mnt_group_id = from->mnt_group_id;
>                 list_add(&to->mnt_share, &from->mnt_share);
> 		if (IS_MNT_SLAVE(from))
> 			list_add(&to->mnt_slave, &from->mnt_slave);
> 		to->mnt_master = from->mnt_master;
>                 lock_mount_hash();
>                 set_mnt_shared(to);
>                 unlock_mount_hash();
>         } else if (IS_MNT_SLAVE(from)) {
> 		to->mnt_master = from->mnt_master;
> 		list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list);
> 	}
> 
> ought to do it.  I'm nowhere near sufficiently awake right now to put
> together a regression test, but unless I'm missing something subtle, it
> should be possible to get a fairly obvious breakage of propagate_mnt()
> out of that...

Not sufficiently awake is right - wrong address on Cc...  Anyway, bedtime
for me...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
  2025-05-09  8:28 ` Al Viro
@ 2025-05-09 10:59   ` Pavel Tikhomirov
  2025-05-09 11:06     ` Pavel Tikhomirov
  0 siblings, 1 reply; 4+ messages in thread
From: Pavel Tikhomirov @ 2025-05-09 10:59 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Christian Brauner



On 5/9/25 16:28, Al Viro wrote:
> On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote:
>> AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing
>> group" breaks assertions on ->mnt_share/->mnt_slave.  For once, the data
>> structures in question are actually documented.
>>
>> Documentation/filesystem/sharedsubtree.rst:
>>          All vfsmounts in a peer group have the same ->mnt_master.  If it is
>> 	non-NULL, they form a contiguous (ordered) segment of slave list.
>>
>> fs/pnode.c:
>>   * Note that peer groups form contiguous segments of slave lists.
>>
>> fs/namespace.c:do_set_group():
>>          if (IS_MNT_SLAVE(from)) {
>>                  struct mount *m = from->mnt_master;
>>
>>                  list_add(&to->mnt_slave, &m->mnt_slave_list);
>>                  to->mnt_master = m;
>>          }
>>
>>          if (IS_MNT_SHARED(from)) {
>>                  to->mnt_group_id = from->mnt_group_id;
>>                  list_add(&to->mnt_share, &from->mnt_share);
>>                  lock_mount_hash();
>>                  set_mnt_shared(to);
>>                  unlock_mount_hash();
>>          }
>>
>> Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group
>> list) and into the beginning of the slave list 'from' belongs to.  IOW,
>> contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED.
>> Which is what happens when the peer group 'from' is in gets propagation
>> from somewhere.

Agreed, list ordering consistency looks broken by my commit.

>>
>> It's not hard to fix - something like
>>
>>          if (IS_MNT_SHARED(from)) {
>> 		to->mnt_group_id = from->mnt_group_id;
>>                  list_add(&to->mnt_share, &from->mnt_share);
>> 		if (IS_MNT_SLAVE(from))
>> 			list_add(&to->mnt_slave, &from->mnt_slave);
>> 		to->mnt_master = from->mnt_master;
>>                  lock_mount_hash();
>>                  set_mnt_shared(to);
>>                  unlock_mount_hash();
>>          } else if (IS_MNT_SLAVE(from)) {
>> 		to->mnt_master = from->mnt_master;
>> 		list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list);
>> 	}
>>
>> ought to do it.

Yes it should work.

In case (IS_MNT_SLAVE(from) && !IS_MNT_SHARED(from)) we can probably 
also do:

list_add(&to->mnt_slave, &from->mnt_slave);

as next slave after "from" is definitely not from the same shared group 
with "from" (as it's not in a shared group) so we won't break list 
continuity.

That will allow to simplify code change to:

         if (IS_MNT_SLAVE(from)) {
                 struct mount *m = from->mnt_master;

-                list_add(&to->mnt_slave, &m->mnt_slave_list);
+                list_add(&to->mnt_slave, &from->mnt_slave);
                 to->mnt_master = m;
         }

         if (IS_MNT_SHARED(from)) {
                 to->mnt_group_id = from->mnt_group_id;
                 list_add(&to->mnt_share, &from->mnt_share);
                 lock_mount_hash();
                 set_mnt_shared(to);
                 unlock_mount_hash();
         }

If I'm not missing something (didn't test yet).

>> I'm nowhere near sufficiently awake right now to put
>> together a regression test, but unless I'm missing something subtle, it
>> should be possible to get a fairly obvious breakage of propagate_mnt()
>> out of that...

I managed to see weird behavior like that:

# rmdir /tmp/{A,B,C,D,E,Z}
# unshare -m
mkdir /tmp/{A,B,C,D,E,Z}
mount --make-rprivate /
mount -t tmpfs tmpfs /tmp/A
mount --bind /tmp/A /tmp/Z
mount --make-shared /tmp/A
mount --bind /tmp/A /tmp/B
mount --make-slave /tmp/B
mount --make-shared /tmp/B
mount --bind /tmp/B /tmp/C
mount --bind /tmp/C /tmp/D
mount --bind /tmp/D /tmp/E
./setgroup-v2 /tmp/C /tmp/Z
mkdir /tmp/A/subdir
mount -t tmpfs tmpfs /tmp/A/subdir

This creates 16 subdir mounts instead of expected 6:

cat /proc/self/mountinfo | grep /tmp/
1071 1065 0:109 / /tmp/A rw,relatime shared:556 - tmpfs tmpfs 
rw,seclabel,inode64
1073 1065 0:109 / /tmp/Z rw,relatime shared:1040 master:556 - tmpfs 
tmpfs rw,seclabel,inode64
1076 1065 0:109 / /tmp/B rw,relatime shared:1040 master:556 - tmpfs 
tmpfs rw,seclabel,inode64
1077 1065 0:109 / /tmp/C rw,relatime shared:1040 master:556 - tmpfs 
tmpfs rw,seclabel,inode64
1078 1065 0:109 / /tmp/D rw,relatime shared:1040 master:556 - tmpfs 
tmpfs rw,seclabel,inode64
1079 1065 0:109 / /tmp/E rw,relatime shared:1040 master:556 - tmpfs 
tmpfs rw,seclabel,inode64
1080 1071 0:136 / /tmp/A/subdir rw,relatime shared:1041 - tmpfs tmpfs 
rw,seclabel,inode64
1081 1073 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1082 1078 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1083 1079 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1084 1076 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1085 1077 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1086 1084 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1087 1085 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1088 1081 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1089 1082 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1090 1083 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1142 1089 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1143 1090 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1144 1086 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1145 1087 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64
1146 1088 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - 
tmpfs tmpfs rw,seclabel,inode64

Maybe that can be converted to a regression test.

> 
> Not sufficiently awake is right - wrong address on Cc...  Anyway, bedtime
> for me...



-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
  2025-05-09 10:59   ` Pavel Tikhomirov
@ 2025-05-09 11:06     ` Pavel Tikhomirov
  0 siblings, 0 replies; 4+ messages in thread
From: Pavel Tikhomirov @ 2025-05-09 11:06 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Christian Brauner



On 5/9/25 18:59, Pavel Tikhomirov wrote:
> 
> 
> On 5/9/25 16:28, Al Viro wrote:
>> On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote:
>>> AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing
>>> group" breaks assertions on ->mnt_share/->mnt_slave.  For once, the data
>>> structures in question are actually documented.
>>>
>>> Documentation/filesystem/sharedsubtree.rst:
>>>          All vfsmounts in a peer group have the same ->mnt_master.  
>>> If it is
>>>     non-NULL, they form a contiguous (ordered) segment of slave list.
>>>
>>> fs/pnode.c:
>>>   * Note that peer groups form contiguous segments of slave lists.
>>>
>>> fs/namespace.c:do_set_group():
>>>          if (IS_MNT_SLAVE(from)) {
>>>                  struct mount *m = from->mnt_master;
>>>
>>>                  list_add(&to->mnt_slave, &m->mnt_slave_list);
>>>                  to->mnt_master = m;
>>>          }
>>>
>>>          if (IS_MNT_SHARED(from)) {
>>>                  to->mnt_group_id = from->mnt_group_id;
>>>                  list_add(&to->mnt_share, &from->mnt_share);
>>>                  lock_mount_hash();
>>>                  set_mnt_shared(to);
>>>                  unlock_mount_hash();
>>>          }
>>>
>>> Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group
>>> list) and into the beginning of the slave list 'from' belongs to.  IOW,
>>> contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED.
>>> Which is what happens when the peer group 'from' is in gets propagation
>>> from somewhere.
> 
> Agreed, list ordering consistency looks broken by my commit.
> 
>>>
>>> It's not hard to fix - something like
>>>
>>>          if (IS_MNT_SHARED(from)) {
>>>         to->mnt_group_id = from->mnt_group_id;
>>>                  list_add(&to->mnt_share, &from->mnt_share);
>>>         if (IS_MNT_SLAVE(from))
>>>             list_add(&to->mnt_slave, &from->mnt_slave);
>>>         to->mnt_master = from->mnt_master;
>>>                  lock_mount_hash();
>>>                  set_mnt_shared(to);
>>>                  unlock_mount_hash();
>>>          } else if (IS_MNT_SLAVE(from)) {
>>>         to->mnt_master = from->mnt_master;
>>>         list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list);
>>>     }
>>>
>>> ought to do it.
> 
> Yes it should work.
> 
> In case (IS_MNT_SLAVE(from) && !IS_MNT_SHARED(from)) we can probably 
> also do:
> 
> list_add(&to->mnt_slave, &from->mnt_slave);
> 
> as next slave after "from" is definitely not from the same shared group 
> with "from" (as it's not in a shared group) so we won't break list 
> continuity.
> 
> That will allow to simplify code change to:
> 
>          if (IS_MNT_SLAVE(from)) {
>                  struct mount *m = from->mnt_master;
> 
> -                list_add(&to->mnt_slave, &m->mnt_slave_list);
> +                list_add(&to->mnt_slave, &from->mnt_slave);
>                  to->mnt_master = m;
>          }
> 
>          if (IS_MNT_SHARED(from)) {
>                  to->mnt_group_id = from->mnt_group_id;
>                  list_add(&to->mnt_share, &from->mnt_share);
>                  lock_mount_hash();
>                  set_mnt_shared(to);
>                  unlock_mount_hash();
>          }
> 
> If I'm not missing something (didn't test yet).
> 
>>> I'm nowhere near sufficiently awake right now to put
>>> together a regression test, but unless I'm missing something subtle, it
>>> should be possible to get a fairly obvious breakage of propagate_mnt()
>>> out of that...
> 
> I managed to see weird behavior like that:
> 
> # rmdir /tmp/{A,B,C,D,E,Z}
> # unshare -m
> mkdir /tmp/{A,B,C,D,E,Z}
> mount --make-rprivate /
> mount -t tmpfs tmpfs /tmp/A
> mount --bind /tmp/A /tmp/Z
> mount --make-shared /tmp/A
> mount --bind /tmp/A /tmp/B
> mount --make-slave /tmp/B
> mount --make-shared /tmp/B
> mount --bind /tmp/B /tmp/C
> mount --bind /tmp/C /tmp/D
> mount --bind /tmp/D /tmp/E
> ./setgroup-v2 /tmp/C /tmp/Z

Note I put ./setgroup-v2 code here:
https://github.com/Snorch/linux-helpers/blob/master/mount_set_group.c

> mkdir /tmp/A/subdir
> mount -t tmpfs tmpfs /tmp/A/subdir
> 
> This creates 16 subdir mounts instead of expected 6:
> 
> cat /proc/self/mountinfo | grep /tmp/
> 1071 1065 0:109 / /tmp/A rw,relatime shared:556 - tmpfs tmpfs 
> rw,seclabel,inode64
> 1073 1065 0:109 / /tmp/Z rw,relatime shared:1040 master:556 - tmpfs 
> tmpfs rw,seclabel,inode64
> 1076 1065 0:109 / /tmp/B rw,relatime shared:1040 master:556 - tmpfs 
> tmpfs rw,seclabel,inode64
> 1077 1065 0:109 / /tmp/C rw,relatime shared:1040 master:556 - tmpfs 
> tmpfs rw,seclabel,inode64
> 1078 1065 0:109 / /tmp/D rw,relatime shared:1040 master:556 - tmpfs 
> tmpfs rw,seclabel,inode64
> 1079 1065 0:109 / /tmp/E rw,relatime shared:1040 master:556 - tmpfs 
> tmpfs rw,seclabel,inode64
> 1080 1071 0:136 / /tmp/A/subdir rw,relatime shared:1041 - tmpfs tmpfs 
> rw,seclabel,inode64
> 1081 1073 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1082 1078 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1083 1079 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1084 1076 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1085 1077 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1086 1084 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1087 1085 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1088 1081 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1089 1082 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1090 1083 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1142 1089 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1143 1090 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1144 1086 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1145 1087 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 1146 1088 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 - 
> tmpfs tmpfs rw,seclabel,inode64
> 
> Maybe that can be converted to a regression test.
> 
>>
>> Not sufficiently awake is right - wrong address on Cc...  Anyway, bedtime
>> for me...
> 
> 
> 

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-05-09 11:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-09  8:26 [BUG] propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro
2025-05-09  8:28 ` Al Viro
2025-05-09 10:59   ` Pavel Tikhomirov
2025-05-09 11:06     ` Pavel Tikhomirov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.