All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Ram Pai <linuxram@us.ibm.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [REVIEW][PATCH 1/2] mnt: In propgate_umount handle visiting mounts in any order
Date: Wed, 24 May 2017 16:54:34 -0500	[thread overview]
Message-ID: <87d1axapk5.fsf@xmission.com> (raw)
In-Reply-To: <20170524204253.GB5554@ram.oc3035372033.ibm.com> (Ram Pai's message of "Wed, 24 May 2017 13:42:53 -0700")

Ram Pai <linuxram@us.ibm.com> writes:

> On Wed, May 17, 2017 at 12:54:34AM -0500, Eric W. Biederman wrote:
>> 
>> While investigating some poor umount performance I realized that in
>> the case of overlapping mount trees where some of the mounts are locked
>> the code has been failing to unmount all of the mounts it should
>> have been unmounting.
>> 
>> This failure to unmount all of the necessary
>> mounts can be reproduced with:
>> 
>> $ cat locked_mounts_test.sh
>> 
>> mount -t tmpfs test-base /mnt
>> mount --make-shared /mnt
>> mkdir -p /mnt/b
>> 
>> mount -t tmpfs test1 /mnt/b
>> mount --make-shared /mnt/b
>> mkdir -p /mnt/b/10
>> 
>> mount -t tmpfs test2 /mnt/b/10
>> mount --make-shared /mnt/b/10
>> mkdir -p /mnt/b/10/20
>> 
>> mount --rbind /mnt/b /mnt/b/10/20
>> 
>> unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi'
>> sleep 1
>> umount -l /mnt/b
>> wait %%
>> 
>> $ unshare -Urm ./locked_mounts_test.sh
>> 
>> This failure is corrected by removing the prepass that marks mounts
>> that may be umounted.
>> 
>> A first pass is added that umounts mounts if possible and if not sets
>> mount mark if they could be unmounted if they weren't locked and adds
>> them to a list to umount possibilities.  This first pass reconsiders
>> the mounts parent if it is on the list of umount possibilities, ensuring
>> that information of umoutability will pass from child to mount parent.
>> 
>> A second pass then walks through all mounts that are umounted and processes
>> their children unmounting them or marking them for reparenting.
>> 
>> A last pass cleans up the state on the mounts that could not be umounted
>> and if applicable reparents them to their first parent that remained
>> mounted.
>> 
>> While a bit longer than the old code this code is much more robust
>> as it allows information to flow up from the leaves and down
>> from the trunk making the order in which mounts are encountered
>> in the umount propgation tree irrelevant.
>
> Eric,
> 	I think we can accomplish what you want in a much simpler way.
>        	Would the patch below; UNTESTED BUT COMPILED, resolve your
> 	issue?

The reason I came up with an algorithm where the information flows
both directions is that especially in the case of umount -l
but even in some rare cases of a simple umount, the ordering
of the mount propagation tree can result in parent mounts being
visited before the child mounts.

This case shows in in the case of a mount or a set of mounts
being mounted below itself.

So no.   Irregardless of tucked mount state we can't do this.

I see this also doesn't have the change to move mnt_change_mountpoint
into another pass.  That one is quite important from a practical
point of view as that means the way the mount tree changes in umount
is the same irrespective of the number of times a mount shows
up in the mount propagation trees.  Which is a very important
property to have for optimizing umount -l.  Which in
the worst case allows reduces umount from O(N^2+) to roughly O(N).

All of what I am doing should have not effect on an implementation of
MNT_TUCKED.

That said your code to deal with MNT_TUCKED seems reasonable.

Eric

>
> 	Its a two pass unmount. First pass marks mounts that can
> 	be unmounted, and second pass does the neccessary unlinks.
> 	It does mark TUCKED mounts, and uses that information
> 	to peel off the correct mounts. Key points are
>
> 	a) a tucked mount never entertain any unmount propagation
> 	 	on its root dentry.
>
> 	b) when the child on the root dentry of a tucked mount is
> 	   unmounted, the tucked mount is not a tucked mount anymore.
>
> 	c) if the child is a tucked mount, than its child is reparented
> 	   to the parent.
>
>
> Signed-off-by: "Ram Pai" <linuxram@us.ibm.com>
>
> fs/namespace.c        |    4 ++-
> fs/pnode.c            |   53 +++++++++++++++++++++++++++++++++++++-------------
> fs/pnode.h            |    3 ++
> include/linux/mount.h |    1 
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index cc1375ef..ff3ec90 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2050,8 +2050,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  		hlist_del_init(&child->mnt_hash);
>  		q = __lookup_mnt(&child->mnt_parent->mnt,
>  				 child->mnt_mountpoint);
> -		if (q)
> +		if (q) {
>  			mnt_change_mountpoint(child, smp, q);
> +			SET_MNT_TUCKED(child);
> +		}
>  		commit_tree(child);
>  	}
>  	put_mountpoint(smp);
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5bc7896..b44a544 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -448,31 +448,58 @@ static void __propagate_umount(struct mount *mnt)
>  
>  	for (m = propagation_next(parent, parent); m;
>  			m = propagation_next(m, parent)) {
> -		struct mount *topper;
> -		struct mount *child = __lookup_mnt(&m->mnt,
> -						mnt->mnt_mountpoint);
> -		/*
> -		 * umount the child only if the child has no children
> -		 * and the child is marked safe to unmount.
> +		struct mount *topper, *child;
> +
> +		/* Tucked mount must drop umount propagation events on
> +		 * its **root dentry**.
> +		 * The tucked mount did not exist when that child came
> +		 * into existence. It never received that mount propagation.
> +		 * Hence it should never entertain the umount propagation
> +		 * aswell.
>  		 */
> +		if (IS_MNT_TUCKED(m) && list_is_singular(&mnt->mnt_mounts))
> +			continue;
> +
> +
> +		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
> +
>  		if (!child || !IS_MNT_MARKED(child))
>  			continue;
> +
>  		CLEAR_MNT_MARK(child);
>  
> -		/* If there is exactly one mount covering all of child
> -		 * replace child with that mount.
> -		 */
> -		topper = find_topper(child);
> -		if (topper)
> -			mnt_change_mountpoint(child->mnt_parent, child->mnt_mp,
> -					      topper);
> +		if (IS_MNT_TUCKED(child) &&
> +			(list_is_singular(&child->mnt_mounts))) {
> +			topper = find_topper(child);
> +			if (topper) {
> +				mnt_change_mountpoint(child->mnt_parent,
> +					child->mnt_mp, topper);
> +				CLEAR_MNT_TUCKED(child); /*lets be precise*/
> +			}
> +		}
>  
>  		if (list_empty(&child->mnt_mounts)) {
>  			list_del_init(&child->mnt_child);
>  			child->mnt.mnt_flags |= MNT_UMOUNT;
>  			list_move_tail(&child->mnt_list, &mnt->mnt_list);
>  		}
> +#if 0
> +	       	else {
> +			mntput(child); /* mark it for deletion. It will
> +				       	  be deleted whenever it looses
> +					  all its remaining references.
> +					  TODO: some more thought
> +					  needed, please validate */
> +		}
> +#endif
>  	}
> +
> +	/*
> +	 * This explicit umount operation is exposing the parent.
> +	 * In case the parent was a 'tucked' mount, it cannot be so
> +	 * anymore.
> +	 */
> +	CLEAR_MNT_TUCKED(parent);
>  }
>  
>  /*
> diff --git a/fs/pnode.h b/fs/pnode.h
> index dc87e65..9ebd1a8 100644
> --- a/fs/pnode.h
> +++ b/fs/pnode.h
> @@ -18,8 +18,11 @@
>  #define IS_MNT_UNBINDABLE(m) ((m)->mnt.mnt_flags & MNT_UNBINDABLE)
>  #define IS_MNT_MARKED(m) ((m)->mnt.mnt_flags & MNT_MARKED)
>  #define SET_MNT_MARK(m) ((m)->mnt.mnt_flags |= MNT_MARKED)
> +#define SET_MNT_TUCKED(m) ((m)->mnt.mnt_flags |= MNT_TUCKED)
>  #define CLEAR_MNT_MARK(m) ((m)->mnt.mnt_flags &= ~MNT_MARKED)
> +#define CLEAR_MNT_TUCKED(m) ((m)->mnt.mnt_flags &= ~MNT_TUCKED)
>  #define IS_MNT_LOCKED(m) ((m)->mnt.mnt_flags & MNT_LOCKED)
> +#define IS_MNT_TUCKED(m) ((m)->mnt.mnt_flags & MNT_TUCKED)
>  
>  #define CL_EXPIRE    		0x01
>  #define CL_SLAVE     		0x02
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index 8e0352a..41674e7 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -62,6 +62,7 @@
>  #define MNT_SYNC_UMOUNT		0x2000000
>  #define MNT_MARKED		0x4000000
>  #define MNT_UMOUNT		0x8000000
> +#define MNT_TUCKED		0x10000000
>  
>  struct vfsmount {
>  	struct dentry *mnt_root;	/* root of the mounted tree */

  reply	other threads:[~2017-05-24 22:01 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-31  4:10 [PATCH] Fix a race in put_mountpoint Krister Johansen
2016-12-31  6:17 ` Al Viro
2017-01-03  0:51   ` Eric W. Biederman
2017-01-03  1:48     ` Al Viro
2017-01-03  3:17       ` Eric W. Biederman
2017-01-03  4:00         ` Al Viro
2017-01-04  3:52           ` Eric W. Biederman
2017-01-04  3:53             ` [PATCH] mnt: Protect the mountpoint hashtable with mount_lock Eric W. Biederman
2017-01-04 21:04               ` [REVIEW][PATCH] mnt: Tuck mounts under others instead of creating shadow/side mounts Eric W. Biederman
2017-01-07  5:06                 ` Al Viro
2017-01-11  0:10                   ` Eric W. Biederman
2017-01-11  4:11                     ` Al Viro
2017-01-11 16:03                       ` Eric W. Biederman
2017-01-11 16:18                         ` [REVIEW][PATCH 1/2] mnt: Fix propagate_mount_busy to notice all cases of busy mounts Eric W. Biederman
2017-01-11 16:19                           ` [REVIEW][PATCH 2/2] mnt: Tuck mounts under others instead of creating shadow/side mounts Eric W. Biederman
2017-01-12  5:45                             ` Al Viro
2017-01-20  7:20                               ` Eric W. Biederman
2017-01-20  7:26                               ` [PATCH v5] " Eric W. Biederman
2017-01-21  3:58                                 ` Ram Pai
2017-01-21  4:15                                   ` Eric W. Biederman
2017-01-23 19:02                                     ` Ram Pai
2017-01-24  0:16                                       ` Eric W. Biederman
2017-02-03 10:54                                         ` Eric W. Biederman
2017-02-03 17:10                                           ` Ram Pai
2017-02-03 18:26                                             ` Eric W. Biederman
2017-02-03 20:28                                               ` Ram Pai
2017-02-03 20:58                                                 ` Eric W. Biederman
2017-02-06  3:25                                                   ` Andrei Vagin
2017-02-06 21:40                                                     ` Ram Pai
2017-02-07  6:35                                                       ` Andrei Vagin
2017-01-12  5:30                           ` [REVIEW][PATCH 1/2] mnt: Fix propagate_mount_busy to notice all cases of busy mounts Al Viro
2017-01-20  7:18                             ` Eric W. Biederman
2017-01-13 20:32                           ` Andrei Vagin
2017-01-18 19:20                             ` Andrei Vagin
2017-01-20 23:18                           ` Ram Pai
2017-01-23  8:15                             ` Eric W. Biederman
2017-01-23 17:04                               ` Ram Pai
2017-01-12  5:03                         ` [REVIEW][PATCH] mnt: Tuck mounts under others instead of creating shadow/side mounts Al Viro
2017-05-14  2:15                 ` Andrei Vagin
2017-05-14  4:05                   ` Eric W. Biederman
2017-05-14  9:26                     ` Eric W. Biederman
2017-05-15 18:27                       ` Andrei Vagin
2017-05-15 19:42                         ` Eric W. Biederman
2017-05-15 20:10                           ` [REVIEW][PATCH] mnt: In umount propagation reparent in a separate pass Eric W. Biederman
2017-05-15 23:12                             ` Andrei Vagin
2017-05-16  5:42                             ` [PATCH] test: check a case when a mount is propagated between exiting mounts Andrei Vagin
2017-05-17  5:54                             ` [REVIEW][PATCH 1/2] mnt: In propgate_umount handle visiting mounts in any order Eric W. Biederman
2017-05-17  5:55                               ` [REVIEW][PATCH 2/2] mnt: Make propagate_umount less slow for overlapping mount propagation trees Eric W. Biederman
2017-05-17 22:48                                 ` Andrei Vagin
2017-05-17 23:26                                   ` Eric W. Biederman
2017-05-18  0:51                                     ` Andrei Vagin
2017-05-24 20:42                               ` [REVIEW][PATCH 1/2] mnt: In propgate_umount handle visiting mounts in any order Ram Pai
2017-05-24 21:54                                 ` Eric W. Biederman [this message]
2017-05-24 22:35                                   ` Ram Pai
2017-05-30  6:07                               ` Ram Pai
2017-05-30 15:07                                 ` Eric W. Biederman
2017-06-07  9:54                                   ` Ram Pai
2017-06-07 13:09                                     ` Eric W. Biederman
2017-05-22  8:15                             ` [REVIEW][PATCH] mnt: In umount propagation reparent in a separate pass Ram Pai
2017-05-22 18:33                               ` Eric W. Biederman
2017-05-22 22:34                                 ` Ram Pai
2017-05-23 13:58                                   ` Eric W. Biederman
2017-01-06  7:00               ` [PATCH] mnt: Protect the mountpoint hashtable with mount_lock Krister Johansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d1axapk5.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=avagin@virtuozzo.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linuxram@us.ibm.com \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.