From: Andrei Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: linux-fsdevel
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Linux Containers
<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
Alexander Viro
<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
Subject: Re: [RFC][PATCH v2] mount: In propagate_umount handle overlapping mount propagation trees
Date: Mon, 31 Oct 2016 23:14:23 -0700 [thread overview]
Message-ID: <20161101061422.GA14866@outlook.office365.com> (raw)
In-Reply-To: <87mvhs14s7.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org> writes:
> >
> > From 8e0f45c0272aa1f789d1657a0acc98c58919dcc3 Mon Sep 17 00:00:00 2001
> > From: Andrei Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> > Date: Tue, 25 Oct 2016 13:57:31 -0700
> > Subject: [PATCH] mount: skip all mounts from a shared group if one is marked
> >
> > If we meet a marked mount, it means that all mounts from
> > its group have been already revised.
> >
> > Signed-off-by: Andrei Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> > ---
> > fs/pnode.c | 18 +++++++++++++++---
> > 1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 8fd1a3f..ebb7134 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -426,10 +426,16 @@ static struct mount *propagation_visit_child(struct mount *last_child,
> > if (child && !IS_MNT_MARKED(child))
> > return child;
> >
> > - if (!child)
> > + if (!child) {
> > m = propagation_next(m, origin);
> > - else
> > + } else {
> > + if (IS_MNT_MARKED(child)) {
> > + if (m->mnt_group_id == origin->mnt_group_id)
> > + return NULL;
> > + m = m->mnt_master;
> > + }
> > m = propagation_next_sib(m, origin);
> > + }
> > }
> > return NULL;
> > }
> > @@ -456,8 +462,14 @@ static struct mount *propagation_revisit_child(struct mount *last_child,
> >
> > if (!child)
> > m = propagation_next(m, origin);
> > - else
> > + else {
> > + if (!IS_MNT_MARKED(child)) {
> > + if (m->mnt_group_id == origin->mnt_group_id)
> > + return NULL;
> > + m = m->mnt_master;
> > + }
> > m = propagation_next_sib(m, origin);
> > + }
> > }
> > return NULL;
> > }
>
> That is certainly interesting. The problem is that the reason we were
> going slow is that there were in fact mounts that had not been traversed
> in the share group.
>
> And in fact the entire idea of visiting a vfsmount mountpoint pair
> exactly once is wrong in the face of shadow mounts. For a vfsmount
> mountpoint pair that has shadow mounts the number of shadow mounts needs
> to be descreased by one each time the propgation tree is traversed
> during unmount. Which means that as far as I can see we have to kill
> shadow mounts to correctly optimize this code. Once shadow mounts are
> gone I don't know of a case where need your optimization.
I am not sure that now shadow mounts are worked as you described
here. start_umount_propagation() doesn't remove a mount from mnt_hash,
so in a second time we will look up the same mount again.
Look at this script:
[root@fc24 mounts]# cat ./opus02.sh
set -e
mkdir -p /mnt
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/A/a
mkdir -p /mnt/B/a
mount --bind --make-shared /mnt/A /mnt/A
mount --bind /mnt/A /mnt/B
mount --bind /mnt/A/a /mnt/A/a
mount --bind /mnt/A/a /mnt/A/a
umount -l /mnt/A
cat /proc/self/mountinfo | grep zdtm
[root@fc24 mounts]# unshare --propagation private -m ./opus02.sh
159 121 0:46 / /mnt rw,relatime - tmpfs zdtm rw
162 159 0:46 /A /mnt/B rw,relatime shared:67 - tmpfs zdtm rw
167 162 0:46 /A/a /mnt/B/a rw,relatime shared:67 - tmpfs zdtm rw
We mount nothing into /mnt/B, but when we umount everything from A, we
still have something in B.
Thanks,
Andrei
>
> I am busily verifying my patch to kill shadow mounts but the following
> patch is the minimal version. As far as I can see propagate_one
> is the only place we create shadow mounts, and holding the
> namespace_lock over attach_recursive_mnt, propagate_mnt, and
> propgate_one is sufficient for that __lookup_mnt to be competely safe.
>
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..b14119b370d4 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -217,6 +217,9 @@ static int propagate_one(struct mount *m)
> /* skip if mountpoint isn't covered by it */
> if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
> return 0;
> + /* skip if mountpoint already has a mount on it */
> + if (__lookup_mnt(&m->mnt, mp->m_dentry))
> + return 0;
> if (peers(m, last_dest)) {
> type = CL_MAKE_SHARED;
> } else {
>
> If you run with that patch you will see that there are go faster stripes.
>
> Eric
>
WARNING: multiple messages have this Message-ID (diff)
From: Andrei Vagin <avagin@virtuozzo.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrey Vagin <avagin@openvz.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Linux Containers <containers@lists.linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH v2] mount: In propagate_umount handle overlapping mount propagation trees
Date: Mon, 31 Oct 2016 23:14:23 -0700 [thread overview]
Message-ID: <20161101061422.GA14866@outlook.office365.com> (raw)
In-Reply-To: <87mvhs14s7.fsf@xmission.com>
On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@virtuozzo.com> writes:
> >
> > From 8e0f45c0272aa1f789d1657a0acc98c58919dcc3 Mon Sep 17 00:00:00 2001
> > From: Andrei Vagin <avagin@openvz.org>
> > Date: Tue, 25 Oct 2016 13:57:31 -0700
> > Subject: [PATCH] mount: skip all mounts from a shared group if one is marked
> >
> > If we meet a marked mount, it means that all mounts from
> > its group have been already revised.
> >
> > Signed-off-by: Andrei Vagin <avagin@openvz.org>
> > ---
> > fs/pnode.c | 18 +++++++++++++++---
> > 1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 8fd1a3f..ebb7134 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -426,10 +426,16 @@ static struct mount *propagation_visit_child(struct mount *last_child,
> > if (child && !IS_MNT_MARKED(child))
> > return child;
> >
> > - if (!child)
> > + if (!child) {
> > m = propagation_next(m, origin);
> > - else
> > + } else {
> > + if (IS_MNT_MARKED(child)) {
> > + if (m->mnt_group_id == origin->mnt_group_id)
> > + return NULL;
> > + m = m->mnt_master;
> > + }
> > m = propagation_next_sib(m, origin);
> > + }
> > }
> > return NULL;
> > }
> > @@ -456,8 +462,14 @@ static struct mount *propagation_revisit_child(struct mount *last_child,
> >
> > if (!child)
> > m = propagation_next(m, origin);
> > - else
> > + else {
> > + if (!IS_MNT_MARKED(child)) {
> > + if (m->mnt_group_id == origin->mnt_group_id)
> > + return NULL;
> > + m = m->mnt_master;
> > + }
> > m = propagation_next_sib(m, origin);
> > + }
> > }
> > return NULL;
> > }
>
> That is certainly interesting. The problem is that the reason we were
> going slow is that there were in fact mounts that had not been traversed
> in the share group.
>
> And in fact the entire idea of visiting a vfsmount mountpoint pair
> exactly once is wrong in the face of shadow mounts. For a vfsmount
> mountpoint pair that has shadow mounts the number of shadow mounts needs
> to be descreased by one each time the propgation tree is traversed
> during unmount. Which means that as far as I can see we have to kill
> shadow mounts to correctly optimize this code. Once shadow mounts are
> gone I don't know of a case where need your optimization.
I am not sure that now shadow mounts are worked as you described
here. start_umount_propagation() doesn't remove a mount from mnt_hash,
so in a second time we will look up the same mount again.
Look at this script:
[root@fc24 mounts]# cat ./opus02.sh
set -e
mkdir -p /mnt
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/A/a
mkdir -p /mnt/B/a
mount --bind --make-shared /mnt/A /mnt/A
mount --bind /mnt/A /mnt/B
mount --bind /mnt/A/a /mnt/A/a
mount --bind /mnt/A/a /mnt/A/a
umount -l /mnt/A
cat /proc/self/mountinfo | grep zdtm
[root@fc24 mounts]# unshare --propagation private -m ./opus02.sh
159 121 0:46 / /mnt rw,relatime - tmpfs zdtm rw
162 159 0:46 /A /mnt/B rw,relatime shared:67 - tmpfs zdtm rw
167 162 0:46 /A/a /mnt/B/a rw,relatime shared:67 - tmpfs zdtm rw
We mount nothing into /mnt/B, but when we umount everything from A, we
still have something in B.
Thanks,
Andrei
>
> I am busily verifying my patch to kill shadow mounts but the following
> patch is the minimal version. As far as I can see propagate_one
> is the only place we create shadow mounts, and holding the
> namespace_lock over attach_recursive_mnt, propagate_mnt, and
> propgate_one is sufficient for that __lookup_mnt to be competely safe.
>
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..b14119b370d4 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -217,6 +217,9 @@ static int propagate_one(struct mount *m)
> /* skip if mountpoint isn't covered by it */
> if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
> return 0;
> + /* skip if mountpoint already has a mount on it */
> + if (__lookup_mnt(&m->mnt, mp->m_dentry))
> + return 0;
> if (peers(m, last_dest)) {
> type = CL_MAKE_SHARED;
> } else {
>
> If you run with that patch you will see that there are go faster stripes.
>
> Eric
>
next prev parent reply other threads:[~2016-11-01 6:14 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-10 23:26 [PATCH] [v3] mount: dont execute propagate_umount() many times for same mounts Andrei Vagin
2016-10-10 23:26 ` Andrei Vagin
[not found] ` <1476141965-21429-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-10-13 17:14 ` Eric W. Biederman
2016-10-13 17:14 ` Eric W. Biederman
2016-10-13 19:53 ` [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once Eric W. Biederman
2016-10-13 21:46 ` Andrei Vagin
[not found] ` <20161013214650.GB19836-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-14 2:31 ` Andrey Vagin
2016-10-14 2:31 ` Andrey Vagin
[not found] ` <CANaxB-xPkgdyeg0z6TvExMfyy4uOC+Nu4Q99WpCscNKMWz8VPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-14 2:45 ` Eric W. Biederman
2016-10-14 2:45 ` Eric W. Biederman
[not found] ` <87wphb4pjn.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-14 18:29 ` [RFC][PATCH v2] " Eric W. Biederman
2016-10-14 18:29 ` Eric W. Biederman
[not found] ` <8737jy3htt.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-18 2:40 ` Andrei Vagin
2016-10-18 2:40 ` Andrei Vagin
[not found] ` <20161018024000.GA4901-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-18 6:49 ` Eric W. Biederman
2016-10-18 6:49 ` Eric W. Biederman
[not found] ` <87r37e9mnj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-19 3:46 ` [REVIEW][PATCH] mount: In propagate_umount handle overlapping mount propagation trees Eric W. Biederman
2016-10-19 3:46 ` Eric W. Biederman
[not found] ` <877f95ngpr.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-20 21:30 ` Andrei Vagin
2016-10-20 21:30 ` Andrei Vagin
[not found] ` <20161020213052.GA25226-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-21 19:26 ` Eric W. Biederman
2016-10-21 19:26 ` Eric W. Biederman
[not found] ` <87pomtec6c.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-22 19:42 ` [RFC][PATCH v2] " Eric W. Biederman
2016-10-22 19:42 ` Eric W. Biederman
[not found] ` <877f90b27o.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-25 20:58 ` Andrei Vagin
2016-10-25 20:58 ` Andrei Vagin
[not found] ` <20161025205846.GA25080-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-25 21:45 ` Eric W. Biederman
2016-10-25 21:45 ` Eric W. Biederman
[not found] ` <87mvhs14s7.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-25 23:41 ` Andrei Vagin
2016-10-25 23:41 ` Andrei Vagin
[not found] ` <20161025234125.GA20335-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-26 1:42 ` Eric W. Biederman
2016-10-26 1:42 ` Eric W. Biederman
2016-11-01 6:14 ` Andrei Vagin [this message]
2016-11-01 6:14 ` Andrei Vagin
[not found] ` <87pon458l1.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-13 21:46 ` [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once Andrei Vagin
[not found] ` <877f9c6ui8.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-13 19:53 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161101061422.GA14866@outlook.office365.com \
--to=avagin-5hdwgun5lf+gspxsjd1c4w@public.gmane.org \
--cc=avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.