* how to show propagation state for mounts
@ 2008-02-20 15:39 Miklos Szeredi
  2008-02-20 16:04 ` Al Viro
From: Miklos Szeredi @ 2008-02-20 15:39 UTC
  To: linux-fsdevel; +Cc: viro, linuxram

> mountinfo - IMO needs a sane discussion of what and how should be shown
> wrt propagation state

Here's my take on the matter.

The propagation tree can be represented either

 1) "from root to leaf" listing members of peer groups and their
 slaves explicitly,

 2) or "from leaf to root" by identifying each peer group and then for
 each mount showing the id of its own group and the id of the group's
 master.

2) can have two variants:

 2a) id of peer group is constant in time

 2b) id of peer group may change

The current patch does 2b).  Having a fixed id for each peer group
would mean introducing a new object to anchor the peer group into,
which would add complexity to the whole thing.

All of these are implementable; we just need to decide which one we want.
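
For illustration, with made-up mount names and IDs, the same
propagation state might read:

  1) per peer group:  peer group 5 = {A, B}; slaves of 5 = {C}
  2) per mount:       A: shared:5    B: shared:5    C: slave:5

With 1) the whole tree is dumped in one place; with 2) each mount line
stands on its own and the tree is reconstructed by matching group IDs.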

Miklos


* Re: how to show propagation state for mounts
  2008-02-20 15:39 how to show propagation state for mounts Miklos Szeredi
@ 2008-02-20 16:04 ` Al Viro
  2008-02-20 16:27   ` Miklos Szeredi
  2008-02-20 16:31   ` how to show propagation state for mounts Matthew Wilcox
From: Al Viro @ 2008-02-20 16:04 UTC
  To: Miklos Szeredi; +Cc: linux-fsdevel, linuxram

On Wed, Feb 20, 2008 at 04:39:15PM +0100, Miklos Szeredi wrote:
> > mountinfo - IMO needs a sane discussion of what and how should be shown
> > wrt propagation state
> 
> Here's my take on the matter.
> 
> The propagation tree can be represented either
> 
>  1) "from root to leaf" listing members of peer groups and their
>  slaves explicitly,
> 
>  2) or "from leaf to root" by identifying each peer group and then for
>  each mount showing the id of its own group and the id of the group's
>  master.
> 
> 2) can have two variants:
> 
>  2a) id of peer group is constant in time
> 
>  2b) id of peer group may change
> 
> The current patch does 2b).  Having a fixed id for each peer group
> would mean introducing a new object to anchor the peer group into,
> which would add complexity to the whole thing.
> 
> All of these are implementable; we just need to decide which one we want.

Eh...  Much more interesting question: since the propagation tree spans
multiple namespaces in a lot of normal uses, how do we deal with
reconstructing propagation through the parts that are not present in
our namespace?  Moreover, what should and what should not be kept private
to namespace?  Full exposure of mount trees is definitely over the top
(it shows potentially sensitive information), so we probably want less
than that.

FWIW, my gut feeling is that for each peer group that intersects with our
namespace we ought to expose in some form
	* all vfsmounts belonging to that intersection
	* the nearest dominating peer group (== master (of master ...) of)
that also has a non-empty intersection with our namespace

It's less about the form of representation (after all, we generate poll
events when contents of that sucker changes, so one *can* get a consistent
snapshot of the entire thing) and more about having it self-contained
when we have namespaces in play.

IOW, the data in there should give answers to questions that make sense.
"Do events get propagated from this vfsmount I have to that vfsmount I have?"
is a meaningful one; ditto for "are events here propagated to somewhere I
don't see?" or "are events getting propagated here from somewhere I don't
see?".

Dumping pieces of raw graph, with IDs of nodes we can't see and without
any way to connect those pieces, OTOH, doesn't make much sense.


* Re: how to show propagation state for mounts
  2008-02-20 16:04 ` Al Viro
@ 2008-02-20 16:27   ` Miklos Szeredi
  2008-02-20 19:29     ` Ram Pai
  2008-02-20 16:31   ` how to show propagation state for mounts Matthew Wilcox
From: Miklos Szeredi @ 2008-02-20 16:27 UTC
  To: viro; +Cc: miklos, linux-fsdevel, linuxram

> On Wed, Feb 20, 2008 at 04:39:15PM +0100, Miklos Szeredi wrote:
> > > mountinfo - IMO needs a sane discussion of what and how should be shown
> > > wrt propagation state
> > 
> > Here's my take on the matter.
> > 
> > The propagation tree can be represented either
> > 
> >  1) "from root to leaf" listing members of peer groups and their
> >  slaves explicitly,
> > 
> >  2) or "from leaf to root" by identifying each peer group and then for
> >  each mount showing the id of its own group and the id of the group's
> >  master.
> > 
> > 2) can have two variants:
> > 
> >  2a) id of peer group is constant in time
> > 
> >  2b) id of peer group may change
> > 
> > The current patch does 2b).  Having a fixed id for each peer group
> > would mean introducing a new object to anchor the peer group into,
> > which would add complexity to the whole thing.
> > 
> > All of these are implementable; we just need to decide which one we want.
> 
> Eh...  Much more interesting question: since the propagation tree spans
> multiple namespaces in a lot of normal uses, how do we deal with
> reconstructing propagation through the parts that are not present in
> our namespace?  Moreover, what should and what should not be kept private
> to namespace?  Full exposure of mount trees is definitely over the top
> (it shows potentially sensitive information), so we probably want less
> than that.
> 
> FWIW, my gut feeling is that for each peer group that intersects with our
> namespace we ought to expose in some form
> 	* all vfsmounts belonging to that intersection
> 	* the nearest dominating peer group (== master (of master ...) of)
> that also has a non-empty intersection with our namespace
> 
> It's less about the form of representation (after all, we generate poll
> events when contents of that sucker changes, so one *can* get a consistent
> snapshot of the entire thing) and more about having it self-contained
> when we have namespaces in play.
> 
> IOW, the data in there should give answers to questions that make sense.
> "Do events get propagated from this vfsmount I have to that vfsmount I have?"
> is a meaningful one; ditto for "are events here propagated to somewhere I
> don't see?" or "are events getting propagated here from somewhere I don't
> see?".

Well, assuming you see only one namespace.  When I'm experimenting
with namespaces and propagation, I see both (each in a separate
xterm) and I do want to know how propagation between them happens.

Your suggestion doesn't deal with that problem.

Otherwise, yes it makes sense to have a consistent view of the tree
shown for each namespace.  Perhaps the solution is to restrict viewing
the whole tree to privileged processes.

Miklos


* Re: how to show propagation state for mounts
  2008-02-20 16:04 ` Al Viro
  2008-02-20 16:27   ` Miklos Szeredi
@ 2008-02-20 16:31   ` Matthew Wilcox
  2008-02-20 19:42     ` Ram Pai
From: Matthew Wilcox @ 2008-02-20 16:31 UTC
  To: Al Viro; +Cc: Miklos Szeredi, linux-fsdevel, linuxram

On Wed, Feb 20, 2008 at 04:04:22PM +0000, Al Viro wrote:
> It's less about the form of representation (after all, we generate poll
> events when contents of that sucker changes, so one *can* get a consistent
> snapshot of the entire thing) and more about having it self-contained
> when we have namespaces in play.
> 
> IOW, the data in there should give answers to questions that make sense.
> "Do events get propagated from this vfsmount I have to that vfsmount I have?"
> is a meaningful one; ditto for "are events here propagated to somewhere I
> don't see?" or "are events getting propagated here from somewhere I don't
> see?".

Why do those last two questions deserve an answer?  How will a person's
or application's behaviour be affected by whether a change will
propagate to something they don't know about and can't see?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: how to show propagation state for mounts
  2008-02-20 16:27   ` Miklos Szeredi
@ 2008-02-20 19:29     ` Ram Pai
  2008-02-20 21:14       ` Al Viro
From: Ram Pai @ 2008-02-20 19:29 UTC
  To: Miklos Szeredi; +Cc: viro, linux-fsdevel

On Wed, 2008-02-20 at 17:27 +0100, Miklos Szeredi wrote:
> > On Wed, Feb 20, 2008 at 04:39:15PM +0100, Miklos Szeredi wrote:
> > > > mountinfo - IMO needs a sane discussion of what and how should be shown
> > > > wrt propagation state
> > > 
> > > Here's my take on the matter.
> > > 
> > > The propagation tree can be represented either
> > > 
> > >  1) "from root to leaf" listing members of peer groups and their
> > >  slaves explicitly,
> > > 
> > >  2) or "from leaf to root" by identifying each peer group and then for
> > >  each mount showing the id of its own group and the id of the group's
> > >  master.
> > > 
> > > 2) can have two variants:
> > > 
> > >  2a) id of peer group is constant in time
> > > 
> > >  2b) id of peer group may change
> > > 
> > > The current patch does 2b).  Having a fixed id for each peer group
> > > would mean introducing a new object to anchor the peer group into,
> > > which would add complexity to the whole thing.
> > > 
> > > All of these are implementable; we just need to decide which one we want.
> > 
> > Eh...  Much more interesting question: since the propagation tree spans
> > multiple namespaces in a lot of normal uses, how do we deal with
> > reconstructing propagation through the parts that are not present in
> > our namespace?  Moreover, what should and what should not be kept private
> > to namespace?  Full exposure of mount trees is definitely over the top
> > (it shows potentially sensitive information), so we probably want less
> > than that.
> > 
> > FWIW, my gut feeling is that for each peer group that intersects with our
> > namespace we ought to expose in some form
> > 	* all vfsmounts belonging to that intersection
> > 	* the nearest dominating peer group (== master (of master ...) of)
> > that also has a non-empty intersection with our namespace
> > 
> > It's less about the form of representation (after all, we generate poll
> > events when contents of that sucker changes, so one *can* get a consistent
> > snapshot of the entire thing) and more about having it self-contained
> > when we have namespaces in play.
> > 
> > IOW, the data in there should give answers to questions that make sense.
> > "Do events get propagated from this vfsmount I have to that vfsmount I have?"
> > is a meaningful one; ditto for "are events here propagated to somewhere I
> > don't see?" or "are events getting propagated here from somewhere I don't
> > see?".
> 
> Well, assuming you see only one namespace.  When I'm experimenting
> with namespaces and propagation, I see both (each in a separate
> xterm) and I do want to know how propagation between them happens.
> 
> Your suggestion doesn't deal with that problem.
> 
> Otherwise, yes it makes sense to have a consistent view of the tree
> shown for each namespace.  Perhaps the solution is to restrict viewing
> the whole tree to privileged processes.

I wonder, what is wrong with reporting mounts in other namespaces that
either receive or send propagation to mounts in our namespace?

If we take that approach, we will report **only** the mounts in other
namespaces which have a counterpart in our namespace.  After all, the
filesystems backing the mounts here and there are the same (otherwise
they wouldn't have propagated).

And any mounts contained outside our namespace, having no propagation
relation to any mounts in our namespace, will remain hidden. 
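
For example (a hypothetical pair of namespaces): if /mnt here and its
propagated copy over there are peers, our listing would include both
mounts; a mount that exists only in the other namespace, with no
propagation relation to anything of ours, would stay invisible.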

RP





* Re: how to show propagation state for mounts
  2008-02-20 16:31   ` how to show propagation state for mounts Matthew Wilcox
@ 2008-02-20 19:42     ` Ram Pai
From: Ram Pai @ 2008-02-20 19:42 UTC
  To: Matthew Wilcox; +Cc: Al Viro, Miklos Szeredi, linux-fsdevel

On Wed, 2008-02-20 at 09:31 -0700, Matthew Wilcox wrote:
> On Wed, Feb 20, 2008 at 04:04:22PM +0000, Al Viro wrote:
> > It's less about the form of representation (after all, we generate poll
> > events when contents of that sucker changes, so one *can* get a consistent
> > snapshot of the entire thing) and more about having it self-contained
> > when we have namespaces in play.
> > 
> > IOW, the data in there should give answers to questions that make sense.
> > "Do events get propagated from this vfsmount I have to that vfsmount I have?"
> > is a meaningful one; ditto for "are events here propagated to somewhere I
> > don't see?" or "are events getting propagated here from somewhere I don't
> > see?".
> 
> Why do those last two questions deserve an answer?  How will a person's
> or application's behaviour be affected by whether a change will
> propagate to something they don't know about and can't see?

Well, I do not want to be surprised to see a mount suddenly show up in
my namespace because of some action by some other user in some other
namespace.  It's going to happen anyway if the namespace is forked off
a namespace that had shared mounts in it.  However, I would prefer to
know in advance the spots (mounts) where such surprises can happen.
Also, I would prefer to know how my actions will affect mounts in
other namespaces.

RP





* Re: how to show propagation state for mounts
  2008-02-20 19:29     ` Ram Pai
@ 2008-02-20 21:14       ` Al Viro
  2008-02-20 21:35         ` Miklos Szeredi
From: Al Viro @ 2008-02-20 21:14 UTC
  To: Ram Pai; +Cc: Miklos Szeredi, linux-fsdevel

On Wed, Feb 20, 2008 at 11:29:13AM -0800, Ram Pai wrote:

> I wonder, what is wrong with reporting mounts in other namespaces that
> either receive or send propagation to mounts in our namespace?

A plenty.  E.g. if foo entrusts control over /var/blah to bar, it's not
obvious that foo has any business knowing if bar gets it from somebody
else in turn.  And I'm not sure that bar has any business knowing that
foo has the damn thing attached in five places instead of just one,
let alone _where_ it has been attached.

If you get down to it, the thing is about delegating control over part
of namespace to somebody, without letting them control, see, etc. the
rest of it.  So I'd rather be very conservative about extra information
we allow to piggyback on that.  I don't know... perhaps with stable peer
group IDs it would be OK to show peer group ID by (our) vfsmount + peer
group ID of master + peer group ID of nearest dominating group that has
intersection with our namespace.  Then we don't leak information (AFAICS),
get full propagation information between our vfsmounts and cooperating
tasks in different namespaces can figure the things out as much as possible
without leaking 3rd-party information to either.
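
To make that concrete (made-up IDs, in the shared:/slave: notation of
the patch posted later in this thread): a mount whose line carries

  shared:5,slave:3:2

is itself in peer group 5, receives propagation from peer group 3
(which may lie entirely in a namespace we cannot see), and peer group 2
is the nearest dominating group that does intersect our namespace.
Cooperating tasks in two namespaces can match these IDs against each
other without either of them learning about mounts that belong only to
a third party.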


* Re: how to show propagation state for mounts
  2008-02-20 21:14       ` Al Viro
@ 2008-02-20 21:35         ` Miklos Szeredi
  2008-02-22 14:46           ` [rfc patch] " Miklos Szeredi
From: Miklos Szeredi @ 2008-02-20 21:35 UTC
  To: viro; +Cc: linuxram, miklos, linux-fsdevel

> > I wonder, what is wrong with reporting mounts in other namespaces that
> > either receive or send propagation to mounts in our namespace?
> 
> A plenty.  E.g. if foo entrusts control over /var/blah to bar, it's not
> obvious that foo has any business knowing if bar gets it from somebody
> else in turn.  And I'm not sure that bar has any business knowing that
> foo has the damn thing attached in five places instead of just one,
> let alone _where_ it has been attached.
> 
> If you get down to it, the thing is about delegating control over part
> of namespace to somebody, without letting them control, see, etc. the
> rest of it.  So I'd rather be very conservative about extra information
> we allow to piggyback on that.  I don't know... perhaps with stable peer
> group IDs it would be OK to show peer group ID by (our) vfsmount + peer
> group ID of master + peer group ID of nearest dominating group that has
> intersection with our namespace.  Then we don't leak information (AFAICS),
> get full propagation information between our vfsmounts and cooperating
> tasks in different namespaces can figure the things out as much as possible
> without leaking 3rd-party information to either.

This sounds fine.

I'll have a look at implementing a stable peer group ID (it doesn't
need a separate object, I now realize).

Miklos


* [rfc patch] how to show propagation state for mounts
  2008-02-20 21:35         ` Miklos Szeredi
@ 2008-02-22 14:46           ` Miklos Szeredi
  2008-03-05 19:25             ` Serge E. Hallyn
  2008-03-10  6:53             ` [RFC PATCH v2] vfs: optimization to /proc/<pid>/mountinfo patch Ram Pai
From: Miklos Szeredi @ 2008-02-22 14:46 UTC
  To: viro; +Cc: linuxram, miklos, linux-fsdevel

> > If you get down to it, the thing is about delegating control over part
> > of namespace to somebody, without letting them control, see, etc. the
> > rest of it.  So I'd rather be very conservative about extra information
> > we allow to piggyback on that.  I don't know... perhaps with stable peer
> > group IDs it would be OK to show peer group ID by (our) vfsmount + peer
> > group ID of master + peer group ID of nearest dominating group that has
> > intersection with our namespace.  Then we don't leak information (AFAICS),
> > get full propagation information between our vfsmounts and cooperating
> > tasks in different namespaces can figure the things out as much as possible
> > without leaking 3rd-party information to either.
> 

Here's a patch against current -mm implementing this (with some
cleanups thrown in).  I've done some testing on it as well; it wasn't
entirely trivial to figure out a setup where propagation goes out of
the namespace first, then comes back in (for the newns step, see the
sketch after the recipe):

  mount --bind /mnt1 /mnt1
  mount --make-shared /mnt1
  mount --bind /mnt2 /mnt2
  mount --make-shared /mnt2
  newns
  mount --make-slave /mnt1

old ns:
  mount --make-slave /mnt2
  mount --bind /mnt1/tmp /mnt1/tmp

new ns:
  mount --make-shared /mnt1/tmp
  mount --bind /mnt1/tmp /mnt2/tmp

Voila.
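
(newns above is not a standard utility; presumably it is a small helper
that starts a shell in a freshly cloned mount namespace.  A minimal
sketch of such a helper, assuming that is all it does, would be:)

  #define _GNU_SOURCE
  #include <sched.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  static int child(void *arg)
  {
  	/* this shell runs in the new mount namespace */
  	execl("/bin/sh", "sh", (char *)NULL);
  	perror("execl");
  	return 1;
  }

  int main(void)
  {
  	static char stack[64 * 1024];
  	/*
  	 * CLONE_NEWNS gives the child a private copy of the mount
  	 * tree; this needs CAP_SYS_ADMIN, i.e. run it as root.
  	 */
  	pid_t pid = clone(child, stack + sizeof(stack),
  			  CLONE_NEWNS | SIGCHLD, NULL);

  	if (pid == -1) {
  		perror("clone");
  		exit(1);
  	}
  	waitpid(pid, NULL, 0);
  	return 0;
  }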


Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c	2008-02-22 15:27:23.000000000 +0100
+++ linux/fs/pnode.c	2008-02-22 15:27:26.000000000 +0100
@@ -9,8 +9,12 @@
 #include <linux/mnt_namespace.h>
 #include <linux/mount.h>
 #include <linux/fs.h>
+#include <linux/idr.h>
 #include "pnode.h"
 
+static DEFINE_SPINLOCK(mnt_pgid_lock);
+static DEFINE_IDA(mnt_pgid_ida);
+
 /* return the next shared peer mount of @p */
 static inline struct vfsmount *next_peer(struct vfsmount *p)
 {
@@ -27,36 +31,90 @@ static inline struct vfsmount *next_slav
 	return list_entry(p->mnt_slave.next, struct vfsmount, mnt_slave);
 }
 
-static int __peer_group_id(struct vfsmount *mnt)
+static void __set_mnt_shared(struct vfsmount *mnt)
 {
-	struct vfsmount *m;
-	int id = mnt->mnt_id;
+	mnt->mnt_flags &= ~MNT_PNODE_MASK;
+	mnt->mnt_flags |= MNT_SHARED;
+}
+
+void set_mnt_shared(struct vfsmount *mnt)
+{
+	int res;
 
-	for (m = next_peer(mnt); m != mnt; m = next_peer(m))
-		id = min(id, m->mnt_id);
+ retry:
+	spin_lock(&mnt_pgid_lock);
+	if (IS_MNT_SHARED(mnt)) {
+		spin_unlock(&mnt_pgid_lock);
+		return;
+	}
 
-	return id;
+	res = ida_get_new(&mnt_pgid_ida, &mnt->mnt_pgid);
+	spin_unlock(&mnt_pgid_lock);
+	if (res == -EAGAIN) {
+		if (ida_pre_get(&mnt_pgid_ida, GFP_KERNEL))
+			goto retry;
+	}
+	__set_mnt_shared(mnt);
+}
+
+void clear_mnt_shared(struct vfsmount *mnt)
+{
+	if (IS_MNT_SHARED(mnt)) {
+		mnt->mnt_flags &= ~MNT_SHARED;
+		mnt->mnt_pgid = -1;
+	}
+}
+
+void make_mnt_peer(struct vfsmount *old, struct vfsmount *mnt)
+{
+	mnt->mnt_pgid = old->mnt_pgid;
+	list_add(&mnt->mnt_share, &old->mnt_share);
+	__set_mnt_shared(mnt);
 }
 
-/* return the smallest ID within the peer group */
 int get_peer_group_id(struct vfsmount *mnt)
 {
+	return mnt->mnt_pgid;
+}
+
+int get_master_id(struct vfsmount *mnt)
+{
 	int id;
 
 	spin_lock(&vfsmount_lock);
-	id = __peer_group_id(mnt);
+	id = get_peer_group_id(mnt->mnt_master);
 	spin_unlock(&vfsmount_lock);
 
 	return id;
 }
 
-/* return the smallest ID within the master's peer group */
-int get_master_id(struct vfsmount *mnt)
+static struct vfsmount *get_peer_in_ns(struct vfsmount *mnt,
+				       struct mnt_namespace *ns)
 {
-	int id;
+	struct vfsmount *m = mnt;
+
+	do {
+		if (m->mnt_ns == ns)
+			return m;
+		m = next_peer(m);
+	} while (m != mnt);
+
+	return NULL;
+}
+
+int get_dominator_id_same_ns(struct vfsmount *mnt)
+{
+	int id = -1;
+	struct vfsmount *m;
 
 	spin_lock(&vfsmount_lock);
-	id = __peer_group_id(mnt->mnt_master);
+	for (m = mnt->mnt_master; m != NULL; m = m->mnt_master) {
+		struct vfsmount *d = get_peer_in_ns(m, mnt->mnt_ns);
+		if (d) {
+			id = d->mnt_pgid;
+			break;
+		}
+	}
 	spin_unlock(&vfsmount_lock);
 
 	return id;
@@ -80,7 +138,13 @@ static int do_make_slave(struct vfsmount
 		if (peer_mnt == mnt)
 			peer_mnt = NULL;
 	}
-	list_del_init(&mnt->mnt_share);
+	if (!list_empty(&mnt->mnt_share))
+		list_del_init(&mnt->mnt_share);
+	else if (IS_MNT_SHARED(mnt)) {
+		spin_lock(&mnt_pgid_lock);
+		ida_remove(&mnt_pgid_ida, mnt->mnt_pgid);
+		spin_unlock(&mnt_pgid_lock);
+	}
 
 	if (peer_mnt)
 		master = peer_mnt;
@@ -89,20 +153,18 @@ static int do_make_slave(struct vfsmount
 		list_for_each_entry(slave_mnt, &mnt->mnt_slave_list, mnt_slave)
 			slave_mnt->mnt_master = master;
 		list_move(&mnt->mnt_slave, &master->mnt_slave_list);
-		list_splice(&mnt->mnt_slave_list, master->mnt_slave_list.prev);
-		INIT_LIST_HEAD(&mnt->mnt_slave_list);
+		list_splice_init(&mnt->mnt_slave_list,
+				 master->mnt_slave_list.prev);
 	} else {
-		struct list_head *p = &mnt->mnt_slave_list;
-		while (!list_empty(p)) {
-                        slave_mnt = list_first_entry(p,
+		while (!list_empty(&mnt->mnt_slave_list)) {
+			slave_mnt = list_first_entry(&mnt->mnt_slave_list,
 					struct vfsmount, mnt_slave);
 			list_del_init(&slave_mnt->mnt_slave);
 			slave_mnt->mnt_master = NULL;
 		}
 	}
 	mnt->mnt_master = master;
-	CLEAR_MNT_SHARED(mnt);
-	INIT_LIST_HEAD(&mnt->mnt_slave_list);
+	clear_mnt_shared(mnt);
 	return 0;
 }
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-02-22 15:27:23.000000000 +0100
+++ linux/fs/namespace.c	2008-02-22 15:27:26.000000000 +0100
@@ -95,6 +95,7 @@ struct vfsmount *alloc_vfsmnt(const char
 			return NULL;
 		}
 
+		mnt->mnt_pgid = -1;
 		atomic_set(&mnt->mnt_count, 1);
 		INIT_LIST_HEAD(&mnt->mnt_hash);
 		INIT_LIST_HEAD(&mnt->mnt_child);
@@ -537,10 +538,12 @@ static struct vfsmount *clone_mnt(struct
 		if (flag & CL_SLAVE) {
 			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
 			mnt->mnt_master = old;
-			CLEAR_MNT_SHARED(mnt);
+			clear_mnt_shared(mnt);
 		} else if (!(flag & CL_PRIVATE)) {
-			if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
-				list_add(&mnt->mnt_share, &old->mnt_share);
+			if (flag & CL_PROPAGATION)
+				set_mnt_shared(old);
+			if (IS_MNT_SHARED(old))
+				make_mnt_peer(old, mnt);
 			if (IS_MNT_SLAVE(old))
 				list_add(&mnt->mnt_slave, &old->mnt_slave);
 			mnt->mnt_master = old->mnt_master;
@@ -795,16 +798,24 @@ static int show_mountinfo(struct seq_fil
 	show_sb_opts(m, sb);
 	if (sb->s_op->show_options)
 		err = sb->s_op->show_options(m, mnt);
-	if (IS_MNT_SHARED(mnt)) {
-		seq_printf(m, " shared:%i", get_peer_group_id(mnt));
-		if (IS_MNT_SLAVE(mnt))
-			seq_printf(m, ",slave:%i", get_master_id(mnt));
-	} else if (IS_MNT_SLAVE(mnt)) {
-		seq_printf(m, " slave:%i", get_master_id(mnt));
+	seq_putc(m, ' ');
+	if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
+		if (IS_MNT_SHARED(mnt))
+			seq_printf(m, "shared:%i", get_peer_group_id(mnt));
+		if (IS_MNT_SLAVE(mnt)) {
+			int dominator_id = get_dominator_id_same_ns(mnt);
+
+			if (IS_MNT_SHARED(mnt))
+				seq_putc(m, ',');
+
+			seq_printf(m, "slave:%i", get_master_id(mnt));
+			if (dominator_id != -1)
+				seq_printf(m, ":%i", dominator_id);
+		}
 	} else if (IS_MNT_UNBINDABLE(mnt)) {
-		seq_printf(m, " unbindable");
+		seq_printf(m, "unbindable");
 	} else {
-		seq_printf(m, " private");
+		seq_printf(m, "private");
 	}
 	seq_putc(m, '\n');
 	return err;
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h	2008-02-22 15:27:23.000000000 +0100
+++ linux/fs/pnode.h	2008-02-22 15:27:26.000000000 +0100
@@ -14,7 +14,6 @@
 #define IS_MNT_SHARED(mnt) (mnt->mnt_flags & MNT_SHARED)
 #define IS_MNT_SLAVE(mnt) (mnt->mnt_master)
 #define IS_MNT_NEW(mnt)  (!mnt->mnt_ns)
-#define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~MNT_SHARED)
 #define IS_MNT_UNBINDABLE(mnt) (mnt->mnt_flags & MNT_UNBINDABLE)
 
 #define CL_EXPIRE    		0x01
@@ -24,12 +23,9 @@
 #define CL_PROPAGATION 		0x10
 #define CL_PRIVATE 		0x20
 
-static inline void set_mnt_shared(struct vfsmount *mnt)
-{
-	mnt->mnt_flags &= ~MNT_PNODE_MASK;
-	mnt->mnt_flags |= MNT_SHARED;
-}
-
+void set_mnt_shared(struct vfsmount *);
+void clear_mnt_shared(struct vfsmount *);
+void make_mnt_peer(struct vfsmount *, struct vfsmount *);
 void change_mnt_propagation(struct vfsmount *, int);
 int propagate_mnt(struct vfsmount *, struct dentry *, struct vfsmount *,
 		struct list_head *);
@@ -37,4 +33,5 @@ int propagate_umount(struct list_head *)
 int propagate_mount_busy(struct vfsmount *, int);
 int get_peer_group_id(struct vfsmount *);
 int get_master_id(struct vfsmount *);
+int get_dominator_id_same_ns(struct vfsmount *);
 #endif /* _LINUX_PNODE_H */
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h	2008-02-22 15:27:23.000000000 +0100
+++ linux/include/linux/mount.h	2008-02-22 15:27:26.000000000 +0100
@@ -57,6 +57,7 @@ struct vfsmount {
 	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	int mnt_id;			/* mount identifier */
+	int mnt_pgid;			/* peer group identifier */
 	/*
 	 * We put mnt_count & mnt_expiry_mark at the end of struct vfsmount
 	 * to let these frequently modified fields in a separate cache line
Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt	2008-02-22 15:27:23.000000000 +0100
+++ linux/Documentation/filesystems/proc.txt	2008-02-22 15:27:26.000000000 +0100
@@ -2367,21 +2367,20 @@ MNTOPTS: per mount options
 SBOPTS: per super block options
 PROPAGATION: propagation type
 
-propagation type: <propagation_flag>[:<mntid>][,...]
-	note: 'shared' flag is followed by the mntid of its peer mount
-	      'slave' flag is followed by the mntid of its master mount
+propagation type: <propagation_flag>[:<peergrpid>[:<domgrpid>]][,...]
+	note: 'shared' flag is followed by the id of this mount's peer group
+	      'slave' flag is followed by the peer group id of its master mount,
+	      	      optionally followed by the id of the closest dominant(*)
+		      peer group in the same namespace, if one exists.
 	      'private' flag stands by itself
 	      'unbindable' flag stands by itself
 
-The 'mntid' used in the propagation type is a canonical ID of the peer
-group (currently the smallest ID within the group is used for this
-purpose, but this should not be relied on).  Since mounts can be added
-or removed from the peer group, this ID only guaranteed to stay the
-same on a static propagation tree.
+(*) A dominant peer group is an ancestor of this mount in the
+propagation tree, in other words, this mount receives propagation from
+the dominant peer group, but not the other way round.
 
 For more information see:
 
   Documentation/filesystems/sharedsubtree.txt
 
-
 ------------------------------------------------------------------------------
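
To illustrate how the new field reads (example IDs only): a mountinfo
line ending in

  slave:3:7

says that the mount's master is peer group 3, which may live entirely
in a namespace the reader cannot see, while 7 is the peer group of the
nearest dominating mount that is visible locally.  This is exactly the
situation the recipe above constructs: propagation reaches the mount
through a master we cannot name, but the dominator field still ties it
to a peer group we can.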


* Re: [rfc patch] how to show propagation state for mounts
  2008-02-22 14:46           ` [rfc patch] " Miklos Szeredi
@ 2008-03-05 19:25             ` Serge E. Hallyn
  2008-03-05 19:34               ` Miklos Szeredi
  2008-03-10  6:53             ` [RFC PATCH v2] vfs: optimization to /proc/<pid>/mountinfo patch Ram Pai
From: Serge E. Hallyn @ 2008-03-05 19:25 UTC
  To: Miklos Szeredi; +Cc: viro, linuxram, linux-fsdevel

Quoting Miklos Szeredi (miklos@szeredi.hu):
> > > If you get down to it, the thing is about delegating control over part
> > > of namespace to somebody, without letting them control, see, etc. the
> > > rest of it.  So I'd rather be very conservative about extra information
> > > we allow to piggyback on that.  I don't know... perhaps with stable peer
> > > group IDs it would be OK to show peer group ID by (our) vfsmount + peer
> > > group ID of master + peer group ID of nearest dominating group that has
> > > intersection with our namespace.  Then we don't leak information (AFAICS),
> > > get full propagation information between our vfsmounts and cooperating
> > > tasks in different namespaces can figure the things out as much as possible
> > > without leaking 3rd-party information to either.
> > 
> 
> Here's a patch against current -mm implementing this (with some
> cleanups thrown in).  I've done some testing on it as well; it wasn't
> entirely trivial to figure out a setup where propagation goes out of
> the namespace first, then comes back in:

Looks nice, and a bit of testing/playing around showed no problem on my
end.

This would definitely be a nice feature to have, and heck, it might
greatly simplify an LTP testcase for mount propagation which is long
overdue.

thanks,
-serge


* Re: [rfc patch] how to show propagation state for mounts
  2008-03-05 19:25             ` Serge E. Hallyn
@ 2008-03-05 19:34               ` Miklos Szeredi
  2008-03-05 20:23                 ` Ram Pai
From: Miklos Szeredi @ 2008-03-05 19:34 UTC
  To: serue; +Cc: viro, linuxram, linux-fsdevel

> > > > If you get down to it, the thing is about delegating control over part
> > > > of namespace to somebody, without letting them control, see, etc. the
> > > > rest of it.  So I'd rather be very conservative about extra information
> > > > we allow to piggyback on that.  I don't know... perhaps with stable peer
> > > > group IDs it would be OK to show peer group ID by (our) vfsmount + peer
> > > > group ID of master + peer group ID of nearest dominating group that has
> > > > intersection with our namespace.  Then we don't leak information (AFAICS),
> > > > get full propagation information between our vfsmounts and cooperating
> > > > tasks in different namespaces can figure the things out as much as possible
> > > > without leaking 3rd-party information to either.
> > > 
> > 
> > Here's a patch against current -mm implementing this (with some
> > cleanups thrown in).  I've done some testing on it as well; it wasn't
> > entirely trivial to figure out a setup where propagation goes out of
> > the namespace first, then comes back in:
> 
> Looks nice, and a bit of testing/playing around showed no problem on my
> end.

Thanks for testing!

Ram, how is your patch progressing?  Could you send me the final
version sometime, so that I can put a new version of this work
together and submit to -mm for more eyeballs?

Thanks,
Miklos


* Re: [rfc patch] how to show propagation state for mounts
  2008-03-05 19:34               ` Miklos Szeredi
@ 2008-03-05 20:23                 ` Ram Pai
From: Ram Pai @ 2008-03-05 20:23 UTC
  To: Miklos Szeredi; +Cc: serue, viro, linux-fsdevel

On Wed, 2008-03-05 at 20:34 +0100, Miklos Szeredi wrote:
> > > > > If you get down to it, the thing is about delegating control over part
> > > > > of namespace to somebody, without letting them control, see, etc. the
> > > > > rest of it.  So I'd rather be very conservative about extra information
> > > > > we allow to piggyback on that.  I don't know... perhaps with stable peer
> > > > > group IDs it would be OK to show peer group ID by (our) vfsmount + peer
> > > > > group ID of master + peer group ID of nearest dominating group that has
> > > > > intersection with our namespace.  Then we don't leak information (AFAICS),
> > > > > get full propagation information between our vfsmounts and cooperating
> > > > > tasks in different namespaces can figure the things out as much as possible
> > > > > without leaking 3rd-party information to either.
> > > > 
> > > 
> > > Here's a patch against current -mm implementing this (with some
> > > cleanups thrown in).  I've done some testing on it as well; it wasn't
> > > entirely trivial to figure out a setup where propagation goes out of
> > > the namespace first, then comes back in:
> > 
> > Looks nice, and a bit of testing/playing around showed no problem on my
> > end.
> 
> Thanks for testing!
> 
> Ram, how is your patch progressing?  Could you send me the final
> version sometime, so that I can put a new version of this work
> together and submit to -mm for more eyeballs?

Miklos,

	Sorry, I will send it your way tonight, or at the latest by the
	end of the week.


RP





* [RFC PATCH v2] vfs: optimization to /proc/<pid>/mountinfo patch
  2008-02-22 14:46           ` [rfc patch] " Miklos Szeredi
  2008-03-05 19:25             ` Serge E. Hallyn
@ 2008-03-10  6:53             ` Ram Pai
  2008-03-10  7:10               ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Ram Pai @ 2008-03-10  6:53 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: viro, linux-fsdevel

1) reports deleted inode in dentry_path() consistent with that in __d_path()
2) modified __d_path() to use prepend(), reducing the size of __d_path()
3) moved all the functionality that reports mount information in /proc under
	CONFIG_PROC_FS.


The code is compile-tested only, both with and without CONFIG_PROC_FS.


Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 fs/dcache.c              |   66 +++++++++++++++++++----------------------------
 fs/namespace.c           |   15 +++++++++-
 fs/pnode.c               |   24 +++++++++++------
 fs/pnode.h               |    6 +++-
 fs/seq_file.c            |    2 +
 include/linux/dcache.h   |    3 ++
 include/linux/mount.h    |    2 +
 include/linux/seq_file.h |    3 ++
 8 files changed, 73 insertions(+), 48 deletions(-)

Index: linux-2.6.24/fs/dcache.c
===================================================================
--- linux-2.6.24.orig/fs/dcache.c
+++ linux-2.6.24/fs/dcache.c
@@ -1747,6 +1747,17 @@ shouldnt_be_hashed:
 	goto shouldnt_be_hashed;
 }
 
+static int prepend(char **buffer, int *buflen, const char *str,
+			  int namelen)
+{
+	*buflen -= namelen;
+	if (*buflen < 0)
+		return -ENAMETOOLONG;
+	*buffer -= namelen;
+	memcpy(*buffer, str, namelen);
+	return 0;
+}
+
 /**
  * d_path - return the path of a dentry
  * @dentry: dentry to report
@@ -1768,17 +1779,11 @@ static char *__d_path(struct dentry *den
 {
 	char * end = buffer+buflen;
 	char * retval;
-	int namelen;
 
-	*--end = '\0';
-	buflen--;
-	if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
-		buflen -= 10;
-		end -= 10;
-		if (buflen < 0)
+	prepend(&end, &buflen, "\0", 1);
+	if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
+		(prepend(&end, &buflen, " (deleted)", 10) != 0))
 			goto Elong;
-		memcpy(end, " (deleted)", 10);
-	}
 
 	if (buflen < 1)
 		goto Elong;
@@ -1805,13 +1810,10 @@ static char *__d_path(struct dentry *den
 		}
 		parent = dentry->d_parent;
 		prefetch(parent);
-		namelen = dentry->d_name.len;
-		buflen -= namelen + 1;
-		if (buflen < 0)
+		if ((prepend(&end, &buflen, dentry->d_name.name,
+				dentry->d_name.len) != 0) ||
+		    (prepend(&end, &buflen, "/", 1) != 0))
 			goto Elong;
-		end -= namelen;
-		memcpy(end, dentry->d_name.name, namelen);
-		*--end = '/';
 		retval = end;
 		dentry = parent;
 	}
@@ -1819,12 +1821,10 @@ static char *__d_path(struct dentry *den
 	return retval;
 
 global_root:
-	namelen = dentry->d_name.len;
-	buflen -= namelen;
-	if (buflen < 0)
-		goto Elong;
-	retval -= namelen-1;	/* hit the slash */
-	memcpy(retval, dentry->d_name.name, namelen);
+	retval += 1;	/* hit the slash */
+	if (prepend(&retval, &buflen, dentry->d_name.name,
+			dentry->d_name.len) !=0 )
+			goto Elong;
 	return retval;
 Elong:
 	return ERR_PTR(-ENAMETOOLONG);
@@ -1890,17 +1890,8 @@ char *dynamic_dname(struct dentry *dentr
 	return memcpy(buffer, temp, sz);
 }
 
-static int prepend(char **buffer, int *buflen, const char *str,
-			  int namelen)
-{
-	*buflen -= namelen;
-	if (*buflen < 0)
-		return 1;
-	*buffer -= namelen;
-	memcpy(*buffer, str, namelen);
-	return 0;
-}
 
+#ifdef CONFIG_PROC_FS
 /*
  * Write full pathname from the root of the filesystem into the buffer.
  */
@@ -1910,11 +1901,9 @@ char *dentry_path(struct dentry *dentry,
 	char *retval;
 
 	spin_lock(&dcache_lock);
-	prepend(&end, &buflen, "\0", 1);
-	if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
-		if (prepend(&end, &buflen, "//deleted", 9))
+	if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
+		(prepend(&end, &buflen, " (deleted)", 10) != 0))
 			goto Elong;
-	}
 	if (buflen < 1)
 		goto Elong;
 	/* Get '/' right */
@@ -1929,9 +1918,9 @@ char *dentry_path(struct dentry *dentry,
 		parent = dentry->d_parent;
 		prefetch(parent);
 
-		if (prepend(&end, &buflen, dentry->d_name.name,
-				dentry->d_name.len) ||
-		    prepend(&end, &buflen, "/", 1))
+		if ((prepend(&end, &buflen, dentry->d_name.name,
+				dentry->d_name.len) != 0) ||
+		    (prepend(&end, &buflen, "/", 1) != 0))
 			goto Elong;
 
 		retval = end;
@@ -1943,6 +1932,7 @@ Elong:
 	spin_unlock(&dcache_lock);
 	return ERR_PTR(-ENAMETOOLONG);
 }
+#endif /* CONFIG_PROC_FS */
 
 /*
  * NOTE! The user-level library version returns a
Index: linux-2.6.24/fs/namespace.c
===================================================================
--- linux-2.6.24.orig/fs/namespace.c
+++ linux-2.6.24/fs/namespace.c
@@ -40,7 +40,10 @@
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfsmount_lock);
 
 static int event;
+
+#ifdef CONFIG_PROC_FS
 static DEFINE_IDA(mnt_id_ida);
+#endif /* CONFIG_PROC_FS */
 
 static struct list_head *mount_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
@@ -60,6 +63,7 @@ static inline unsigned long hash(struct 
 
 #define MNT_WRITER_UNDERFLOW_LIMIT -(1<<16)
 
+#ifdef CONFIG_PROC_FS
 static int mnt_alloc_id(struct vfsmount *mnt)
 {
 	int res;
@@ -82,11 +86,14 @@ static void mnt_free_id(struct vfsmount 
 	ida_remove(&mnt_id_ida, mnt->mnt_id);
 	spin_unlock(&vfsmount_lock);
 }
+#endif /* CONFIG_PROC_FS */
 
 struct vfsmount *alloc_vfsmnt(const char *name)
 {
 	struct vfsmount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
 	if (mnt) {
+
+#ifdef CONFIG_PROC_FS
 		int err;
 
 		err = mnt_alloc_id(mnt);
@@ -96,6 +103,8 @@ struct vfsmount *alloc_vfsmnt(const char
 		}
 
 		mnt->mnt_pgid = -1;
+#endif /* CONFIG_PROC_FS */
+
 		atomic_set(&mnt->mnt_count, 1);
 		INIT_LIST_HEAD(&mnt->mnt_hash);
 		INIT_LIST_HEAD(&mnt->mnt_child);
@@ -381,7 +390,9 @@ EXPORT_SYMBOL(simple_set_mnt);
 void free_vfsmnt(struct vfsmount *mnt)
 {
 	kfree(mnt->mnt_devname);
+#ifdef CONFIG_PROC_FS
 	mnt_free_id(mnt);
+#endif /* CONFIG_PROC_FS */
 	kmem_cache_free(mnt_cache, mnt);
 }
 
@@ -679,6 +690,7 @@ void save_mount_options(struct super_blo
 }
 EXPORT_SYMBOL(save_mount_options);
 
+#ifdef CONFIG_PROC_FS
 /* iterator */
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
@@ -808,7 +820,7 @@ static int show_mountinfo(struct seq_fil
 			if (IS_MNT_SHARED(mnt))
 				seq_putc(m, ',');
 
-			seq_printf(m, "slave:%i", get_master_id(mnt));
+			seq_printf(m, "slave:%i", get_master_group_id(mnt));
 			if (dominator_id != -1)
 				seq_printf(m, ":%i", dominator_id);
 		}
@@ -866,6 +878,7 @@ const struct seq_operations mountstats_o
 	.stop	= m_stop,
 	.show	= show_vfsstat,
 };
+#endif  /* CONFIG_PROC_FS */
 
 /**
  * may_umount_tree - check if a mount tree is busy
Index: linux-2.6.24/fs/seq_file.c
===================================================================
--- linux-2.6.24.orig/fs/seq_file.c
+++ linux-2.6.24/fs/seq_file.c
@@ -391,6 +391,7 @@ int seq_path(struct seq_file *m, struct 
 }
 EXPORT_SYMBOL(seq_path);
 
+#ifdef CONFIG_PROC_FS
 /*
  * returns the path of the 'dentry' from the root of its filesystem.
  */
@@ -412,6 +413,7 @@ int seq_dentry(struct seq_file *m, struc
 	return -1;
 }
 EXPORT_SYMBOL(seq_dentry);
+#endif /* CONFIG_PROC_FS */
 
 static void *single_start(struct seq_file *p, loff_t *pos)
 {
Index: linux-2.6.24/include/linux/dcache.h
===================================================================
--- linux-2.6.24.orig/include/linux/dcache.h
+++ linux-2.6.24/include/linux/dcache.h
@@ -303,7 +303,10 @@ extern int d_validate(struct dentry *, s
 extern char *dynamic_dname(struct dentry *, char *, int, const char *, ...);
 
 extern char *d_path(struct path *, char *, int);
+
+#ifdef CONFIG_PROC_FS
 extern char *dentry_path(struct dentry *, char *, int);
+#endif /* CONFIG_PROC_FS */
 
 /* Allocation counts.. */
 
Index: linux-2.6.24/include/linux/seq_file.h
===================================================================
--- linux-2.6.24.orig/include/linux/seq_file.h
+++ linux-2.6.24/include/linux/seq_file.h
@@ -44,7 +44,10 @@ int seq_printf(struct seq_file *, const 
 	__attribute__ ((format (printf,2,3)));
 
 int seq_path(struct seq_file *, struct path *, char *);
+
+#ifdef CONFIG_PROC_FS
 int seq_dentry(struct seq_file *, struct dentry *, char *);
+#endif /* CONFIG_PROC_FS */
 
 int single_open(struct file *, int (*)(struct seq_file *, void *), void *);
 int single_release(struct inode *, struct file *);
Index: linux-2.6.24/fs/pnode.c
===================================================================
--- linux-2.6.24.orig/fs/pnode.c
+++ linux-2.6.24/fs/pnode.c
@@ -12,8 +12,10 @@
 #include <linux/idr.h>
 #include "pnode.h"
 
+#ifdef CONFIG_PROC_FS
 static DEFINE_SPINLOCK(mnt_pgid_lock);
 static DEFINE_IDA(mnt_pgid_ida);
+#endif /* CONFIG_PROC_FS */
 
 /* return the next shared peer mount of @p */
 static inline struct vfsmount *next_peer(struct vfsmount *p)
@@ -42,18 +44,18 @@ void set_mnt_shared(struct vfsmount *mnt
 	int res;
 
  retry:
-	spin_lock(&mnt_pgid_lock);
-	if (IS_MNT_SHARED(mnt)) {
-		spin_unlock(&mnt_pgid_lock);
+	if (IS_MNT_SHARED(mnt))
 		return;
-	}
 
+#ifdef CONFIG_PROC_FS
+	spin_lock(&mnt_pgid_lock);
 	res = ida_get_new(&mnt_pgid_ida, &mnt->mnt_pgid);
 	spin_unlock(&mnt_pgid_lock);
 	if (res == -EAGAIN) {
 		if (ida_pre_get(&mnt_pgid_ida, GFP_KERNEL))
 			goto retry;
 	}
+#endif /* CONFIG_PROC_FS */
 	__set_mnt_shared(mnt);
 }
 
@@ -61,10 +63,13 @@ void clear_mnt_shared(struct vfsmount *m
 {
 	if (IS_MNT_SHARED(mnt)) {
 		mnt->mnt_flags &= ~MNT_SHARED;
+#ifdef CONFIG_PROC_FS
 		mnt->mnt_pgid = -1;
+#endif /* CONFIG_PROC_FS */
 	}
 }
 
+#ifdef CONFIG_PROC_FS
 void make_mnt_peer(struct vfsmount *old, struct vfsmount *mnt)
 {
 	mnt->mnt_pgid = old->mnt_pgid;
@@ -77,7 +82,7 @@ int get_peer_group_id(struct vfsmount *m
 	return mnt->mnt_pgid;
 }
 
-int get_master_id(struct vfsmount *mnt)
+int get_master_group_id(struct vfsmount *mnt)
 {
 	int id;
 
@@ -119,6 +124,7 @@ int get_dominator_id_same_ns(struct vfsm
 
 	return id;
 }
+#endif /* CONFIG_PROC_FS */
 
 static int do_make_slave(struct vfsmount *mnt)
 {
@@ -138,13 +144,15 @@ static int do_make_slave(struct vfsmount
 		if (peer_mnt == mnt)
 			peer_mnt = NULL;
 	}
-	if (!list_empty(&mnt->mnt_share))
-		list_del_init(&mnt->mnt_share);
-	else if (IS_MNT_SHARED(mnt)) {
+
+#ifdef CONFIG_PROC_FS
+	if (IS_MNT_SHARED(mnt) && list_empty(&mnt->mnt_share)) {
 		spin_lock(&mnt_pgid_lock);
 		ida_remove(&mnt_pgid_ida, mnt->mnt_pgid);
 		spin_unlock(&mnt_pgid_lock);
 	}
+#endif /* CONFIG_PROC_FS */
+	list_del_init(&mnt->mnt_share);
 
 	if (peer_mnt)
 		master = peer_mnt;
Index: linux-2.6.24/fs/pnode.h
===================================================================
--- linux-2.6.24.orig/fs/pnode.h
+++ linux-2.6.24/fs/pnode.h
@@ -31,7 +31,11 @@ int propagate_mnt(struct vfsmount *, str
 		struct list_head *);
 int propagate_umount(struct list_head *);
 int propagate_mount_busy(struct vfsmount *, int);
+
+#ifdef CONFIG_PROC_FS
 int get_peer_group_id(struct vfsmount *);
-int get_master_id(struct vfsmount *);
+int get_master_group_id(struct vfsmount *);
 int get_dominator_id_same_ns(struct vfsmount *);
+#endif /* CONFIG_PROC_FS */
+
 #endif /* _LINUX_PNODE_H */
Index: linux-2.6.24/include/linux/mount.h
===================================================================
--- linux-2.6.24.orig/include/linux/mount.h
+++ linux-2.6.24/include/linux/mount.h
@@ -56,8 +56,10 @@ struct vfsmount {
 	struct list_head mnt_slave;	/* slave list entry */
 	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
+#ifdef CONFIG_PROC_FS
 	int mnt_id;			/* mount identifier */
 	int mnt_pgid;			/* peer group identifier */
+#endif /* CONFIG_PROC_FS */
 	/*
 	 * We put mnt_count & mnt_expiry_mark at the end of struct vfsmount
 	 * to let these frequently modified fields in a separate cache line




* Re: [RFC PATCH v2] vfs: optimization to /proc/<pid>/mountinfo patch
  2008-03-10  6:53             ` [RFC PATCH v2] vfs: optimization to /proc/<pid>/mountinfo patch Ram Pai
@ 2008-03-10  7:10               ` Christoph Hellwig
  2008-03-10 11:39                 ` Miklos Szeredi
From: Christoph Hellwig @ 2008-03-10  7:10 UTC
  To: Ram Pai; +Cc: Miklos Szeredi, viro, linux-fsdevel

On Sun, Mar 09, 2008 at 11:53:34PM -0700, Ram Pai wrote:
> 1) reports deleted inode in dentry_path() consistent with that in __d_path()
> 2) modified __d_path() to use prepend(), reducing the size of __d_path()
> 3) moved all the functionality that reports mount information in /proc under
> 	CONFIG_PROC_FS.

I think the latter one goes too far; mnt_id and mnt_pgid are generally
useful information that should be generated unconditionally.  That will
also get rid of most of the ifdef clutter introduced in this patch.



* Re: [RFC PATCH v2] vfs: optimization to /proc/<pid>/mountinfo patch
  2008-03-10  7:10               ` Christoph Hellwig
@ 2008-03-10 11:39                 ` Miklos Szeredi
From: Miklos Szeredi @ 2008-03-10 11:39 UTC
  To: hch; +Cc: linuxram, miklos, viro, linux-fsdevel

> > 1) reports deleted inode in dentry_path() consistent with that in __d_path()
> > 2) modified __d_path() to use prepend(), reducing the size of __d_path()
> > 3) moved all the functionality that reports mount information in /proc under
> > 	CONFIG_PROC_FS.

Thanks Ram.

> I think that latter one goes to far, mnt_id and mn_pdid are generally
> useful information that should be generated unconditionally.  That will
> also get rid of most of the ifdef clutter introduced in this patch.

I'll fix these up and send to Andrew.

Miklos

