All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Ram Pai <linuxram@us.ibm.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [REVIEW][PATCH] mnt: In umount propagation reparent in a separate pass
Date: Tue, 23 May 2017 08:58:09 -0500	[thread overview]
Message-ID: <87mva3oetq.fsf@xmission.com> (raw)
In-Reply-To: <20170522223403.GB5393@ram.oc3035372033.ibm.com> (Ram Pai's message of "Mon, 22 May 2017 15:34:03 -0700")

Ram Pai <linuxram@us.ibm.com> writes:

> On Mon, May 22, 2017 at 01:33:05PM -0500, Eric W. Biederman wrote:
>> Ram Pai <linuxram@us.ibm.com> writes:
>> 
>> > On Mon, May 15, 2017 at 03:10:38PM -0500, Eric W. Biederman wrote:
>> >> 
>> >> It was observed that in some pathlogical cases that the current code
>> >> does not unmount everything it should.  After investigation it was
>> >> determined that the issue is that mnt_change_mntpoint can can change
>> >> which mounts are available to be unmounted during mount propagation
>> >> which is wrong.
>> >> 
>> >> The trivial reproducer is:
>> >> $ cat ./pathological.sh
>> >> 
>> >> mount -t tmpfs test-base /mnt
>> >> cd /mnt
>> >> mkdir 1 2 1/1
>> >> mount --bind 1 1
>> >> mount --make-shared 1
>> >> mount --bind 1 2
>> >> mount --bind 1/1 1/1
>> >> mount --bind 1/1 1/1
>> >> echo
>> >> grep test-base /proc/self/mountinfo
>> >> umount 1/1
>> >> echo
>> >> grep test-base /proc/self/mountinfo
>> >> 
>> >> $ unshare -Urm ./pathological.sh
>> >> 
>> >> The expected output looks like:
>> >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
>> >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 
>> >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
>> >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 
>> >> The output without the fix looks like:
>> >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
>> >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 
>> >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
>> >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
>> >> 
>> >> That last mount in the output was in the propgation tree to be unmounted but
>> >> was missed because the mnt_change_mountpoint changed it's parent before the walk
>> >> through the mount propagation tree observed it.
>> >> 
>> >
>> > Looks patch correct to me.
>> > Reviewed-by: Ram Pai <linuxram@us.ibm.com>
>> >
>> > BTW: The logic of find_topper() looks not-so-accurate to me. Why dont we
>> > explicitly flag tucked mounts with MNT_TUCKED, and use that information
>> > to determine if the child is really a topper?  Currently we determine
>> > the topper if it entirely is covering. How do we diambiguate that from an
>> > entirely-covering-mount that is explicitly mounted by the administrator?
>> > A topper situation is applicable only when tucked, right?
>> 
>> In the current code explictly does not care about the difference.
>> The code just restricts untucking mounts of any kind to umount
>> propagation.
>> 
>> This is where we have previously disagreed.
>> 
>> A short summary of our previous discussions:
>> Eric Biederman: find_topper makes tucked mounts ordinary mounts and is simple.
>> Eric Biederman: I don't see a compelling case for a MNT_TUCKED flag
>> Eric Biederman: I think the change is a nice behavioral improvement
>> Ram Pai: a MNT_TUCKED flag would perfectly preserve existing behavior
>> Ram Pai: find_topper while not perfect is better than the previous
>>          very special case for side/shadow mounts
>> 
>> With respect to backwards compatibility the set of bugs I am fixing
>> shows that it is possible to have some very egregious bugs in this
>> area and in practice no one cares.
>> 
>> 
>> Without a MNT_TUCKED flag I can readily tell what the following
>> code should do by simply inspection of the of the mount
>> propgation information in /proc/self/mountinfo:
>> 
> Step 1>  $ mount -t tmpfs test-base /mnt
> Step 2>  $ cd /mnt
> Step 3>  $ mkdir -p 1 2 1/1
> Step 4>  $ mount --bind 1 1
> Step 5>  $ mount --make-shared 1
> Step 6>  $ mount --bind 1 2
> Step 7>  $ mount --bind 1/1 1/1
> Step 8>  $ mount --bind 1/1 1/1
> Step 9>  $ umount 1/1
>> 
>> Before the umount /proc/self/mountinfo shows:
>>  46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=502,gid=502
>>  47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>>  52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502
>> 
>> So it is clear to me that umount /mnt/1/1 should just leave:
>> /mnt
>> /mnt/1
>> /mnt/2
>
> This is one other place where we disagree....  I expect things to peel
> back to the state the trees were, when the Step 7 was executed. which
> is 
> /mnt 
> /mnt/1
> /mnt/2
> /mnt/1/1
> /mnt/2/1
> And all tucked mounts disapper.
>
> Dont get me wrong. I dont think we will agree because we have different
> expections. There is no standard on what to expect. Someone
> authoritative; may be Al Viro, has to define what to expect. 
>
>> 
>> I would argue that is what the code should always have done.
>> 
>> I believe the code with a MNT_TUCKED flag would leave:
>> /mnt
>> /mnt/1
>> /mnt/1/1
>> /mnt/2
>> But in truth it makes my head hurt to think about it.
>
> Yes it is extremely mind-bending; sometimes mind-bleeding. :-(
>
>> I don't see that MNT_TUCKED adds anything except aditional code
>> complexity.
>> 
>> I don't actually see what the value is in keeping mounts that you can
>> not use (because they are overmounted) around.
>
> I argue that MNT_TUCKED leaves markers that can be used to determine
> what can be taken out and what needs to be kept.
>
> I will stop here and say.. there is value in marking TUCKED mounts.
> Someone will run into some obscure issue in the future; probably a
> decade from now, and the same story will repeat.
>
> I wish there was a mathematical formula, where you plugin the operation
> and a state of the trees, and the new state of the mount-trees emerge.
>
> For now your patches look good to me.

Then let me ask you this.  Please look at the two successor patches to
this.  I am confident in them but I also know I am human and may have
missed something.  

Then on top of those et's look at marking tucked mounts.  If we write
the code and examine motivating cases where the behavior differs we
should be able to make an informed choice.

I say motivating cases as there are use cases with slave mounts that
motivated the support of tucking mounts.  I figure if we go back through
and examine those we can at least see if there are any cases where
without marking them we have a practical issue.  Or perhaps we will
see that for all cases that we can think of that matter there are no
differences.

I am motivated to solve this because after fixing the perfomance issues
we need find a way for some set of mount namespaces to recreate their
mount propgation tree on a different machine.   That is needed for
CRIU and it may be needed for the plan9 case where logging into a remote
system you could bring all of your filesystems with you into your own
personal mount namespace.

Eric

  reply	other threads:[~2017-05-23 14:04 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-31  4:10 [PATCH] Fix a race in put_mountpoint Krister Johansen
2016-12-31  6:17 ` Al Viro
2017-01-03  0:51   ` Eric W. Biederman
2017-01-03  1:48     ` Al Viro
2017-01-03  3:17       ` Eric W. Biederman
2017-01-03  4:00         ` Al Viro
2017-01-04  3:52           ` Eric W. Biederman
2017-01-04  3:53             ` [PATCH] mnt: Protect the mountpoint hashtable with mount_lock Eric W. Biederman
2017-01-04 21:04               ` [REVIEW][PATCH] mnt: Tuck mounts under others instead of creating shadow/side mounts Eric W. Biederman
2017-01-07  5:06                 ` Al Viro
2017-01-11  0:10                   ` Eric W. Biederman
2017-01-11  4:11                     ` Al Viro
2017-01-11 16:03                       ` Eric W. Biederman
2017-01-11 16:18                         ` [REVIEW][PATCH 1/2] mnt: Fix propagate_mount_busy to notice all cases of busy mounts Eric W. Biederman
2017-01-11 16:19                           ` [REVIEW][PATCH 2/2] mnt: Tuck mounts under others instead of creating shadow/side mounts Eric W. Biederman
2017-01-12  5:45                             ` Al Viro
2017-01-20  7:20                               ` Eric W. Biederman
2017-01-20  7:26                               ` [PATCH v5] " Eric W. Biederman
2017-01-21  3:58                                 ` Ram Pai
2017-01-21  4:15                                   ` Eric W. Biederman
2017-01-23 19:02                                     ` Ram Pai
2017-01-24  0:16                                       ` Eric W. Biederman
2017-02-03 10:54                                         ` Eric W. Biederman
2017-02-03 17:10                                           ` Ram Pai
2017-02-03 18:26                                             ` Eric W. Biederman
2017-02-03 20:28                                               ` Ram Pai
2017-02-03 20:58                                                 ` Eric W. Biederman
2017-02-06  3:25                                                   ` Andrei Vagin
2017-02-06 21:40                                                     ` Ram Pai
2017-02-07  6:35                                                       ` Andrei Vagin
2017-01-12  5:30                           ` [REVIEW][PATCH 1/2] mnt: Fix propagate_mount_busy to notice all cases of busy mounts Al Viro
2017-01-20  7:18                             ` Eric W. Biederman
2017-01-13 20:32                           ` Andrei Vagin
2017-01-18 19:20                             ` Andrei Vagin
2017-01-20 23:18                           ` Ram Pai
2017-01-23  8:15                             ` Eric W. Biederman
2017-01-23 17:04                               ` Ram Pai
2017-01-12  5:03                         ` [REVIEW][PATCH] mnt: Tuck mounts under others instead of creating shadow/side mounts Al Viro
2017-05-14  2:15                 ` Andrei Vagin
2017-05-14  4:05                   ` Eric W. Biederman
2017-05-14  9:26                     ` Eric W. Biederman
2017-05-15 18:27                       ` Andrei Vagin
2017-05-15 19:42                         ` Eric W. Biederman
2017-05-15 20:10                           ` [REVIEW][PATCH] mnt: In umount propagation reparent in a separate pass Eric W. Biederman
2017-05-15 23:12                             ` Andrei Vagin
2017-05-16  5:42                             ` [PATCH] test: check a case when a mount is propagated between exiting mounts Andrei Vagin
2017-05-17  5:54                             ` [REVIEW][PATCH 1/2] mnt: In propgate_umount handle visiting mounts in any order Eric W. Biederman
2017-05-17  5:55                               ` [REVIEW][PATCH 2/2] mnt: Make propagate_umount less slow for overlapping mount propagation trees Eric W. Biederman
2017-05-17 22:48                                 ` Andrei Vagin
2017-05-17 23:26                                   ` Eric W. Biederman
2017-05-18  0:51                                     ` Andrei Vagin
2017-05-24 20:42                               ` [REVIEW][PATCH 1/2] mnt: In propgate_umount handle visiting mounts in any order Ram Pai
2017-05-24 21:54                                 ` Eric W. Biederman
2017-05-24 22:35                                   ` Ram Pai
2017-05-30  6:07                               ` Ram Pai
2017-05-30 15:07                                 ` Eric W. Biederman
2017-06-07  9:54                                   ` Ram Pai
2017-06-07 13:09                                     ` Eric W. Biederman
2017-05-22  8:15                             ` [REVIEW][PATCH] mnt: In umount propagation reparent in a separate pass Ram Pai
2017-05-22 18:33                               ` Eric W. Biederman
2017-05-22 22:34                                 ` Ram Pai
2017-05-23 13:58                                   ` Eric W. Biederman [this message]
2017-01-06  7:00               ` [PATCH] mnt: Protect the mountpoint hashtable with mount_lock Krister Johansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mva3oetq.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=avagin@virtuozzo.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linuxram@us.ibm.com \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.