public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mnt: Fix a memory stomp in umount
@ 2014-12-18 16:57 Eric W. Biederman
       [not found] ` <CA+55aFz6grtss0SXqOizXMOOF4sxT3FC4GC4NCiMF2Huy1vE4A@mail.gmail.com>
  2014-12-18 20:01 ` Al Viro
  0 siblings, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2014-12-18 16:57 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel


While reviewing the code of umount_tree I realized that when we append
to a preexisting unmounted list we do not change pprev of the former
first item in the list.

Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
the former first item of the list will stomp unmounted.first leaving
it set to some random mount point which we are likely to free soon.

This isn't likely to hit, but if it does I don't know how anyone could
track it down.

Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Al do you want to take this one, or would you like me to make certain it
makes it Linus?

 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index fe1c77145a78..6afbd7bb79f3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1370,6 +1370,8 @@ void umount_tree(struct mount *mnt, int how)
 	}
 	if (last) {
 		last->mnt_hash.next = unmounted.first;
+		if (unmounted.first)
+			unmounted.first->pprev = &last->mnt_hash.next;
 		unmounted.first = tmp_list.first;
 		unmounted.first->pprev = &unmounted.first;
 	}
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
       [not found] ` <CA+55aFz6grtss0SXqOizXMOOF4sxT3FC4GC4NCiMF2Huy1vE4A@mail.gmail.com>
@ 2014-12-18 18:41   ` Linus Torvalds
  2014-12-18 19:24     ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2014-12-18 18:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-fsdevel, Al Viro

On Thu, Dec 18, 2014 at 9:07 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Why is this piece of code using its own made up and buggy list handling in
> the first place? We have list functions for these things, exactly so that
> people shouldn't write buggy stuff by hand.

Oh. Ok, I see what's going on. We have "list_splice()", but we don't
have the equivalent "hlist_splice()". So it's doing that by hand, and
did it badly.

Al, this is your bug. I guess I can take the "manual hlist_splice" fix
from Eric, but I'm not really happy with it. There's a few other
places in that same commit where the list splice operation has been
open-coded.

Mind taking a look?

                         Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 18:41   ` Linus Torvalds
@ 2014-12-18 19:24     ` Eric W. Biederman
  2014-12-18 19:34       ` Linus Torvalds
  2014-12-18 21:05       ` Al Viro
  0 siblings, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2014-12-18 19:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Al Viro

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, Dec 18, 2014 at 9:07 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> Why is this piece of code using its own made up and buggy list handling in
>> the first place? We have list functions for these things, exactly so that
>> people shouldn't write buggy stuff by hand.
>
> Oh. Ok, I see what's going on. We have "list_splice()", but we don't
> have the equivalent "hlist_splice()". So it's doing that by hand, and
> did it badly.
>
> Al, this is your bug. I guess I can take the "manual hlist_splice" fix
> from Eric, but I'm not really happy with it. There's a few other
> places in that same commit where the list splice operation has been
> open-coded.
>
> Mind taking a look?

It looks like we can pretty easily use mnt_list instead of mnt_hash,
see below (note: the code is only compile tested).

While converting this to ordinary list helpers I found something
strange.

In __propagate_umount we currently add the child to be unmounted in a
different location in the list then we did before the conversion of
mnt_hash to a hlist for rcu's accesses benefit.

Now maybe propagate_next handles this (I still need to read and
understand that code) if not it looks like I may have found another bug,
as it looks like today we can add a node to our list without propogating
the unmount from the node.

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Thu, 18 Dec 2014 13:10:48 -0600
Subject: [PATCH] mnt: In umount_tree reuse mnt_list instead of mnt_hash

umount_tree builds a list of mounts that need to be unmounted.
Utilize mnt_list for this purpose instead of mnt_hash as mnt_list is
an ordianry list_head, allowing the use of list_splice and list_move
instead of rolling our own.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c | 45 +++++++++++++++++++--------------------------
 fs/pnode.c     |  7 ++++---
 fs/pnode.h     |  2 +-
 3 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6afbd7bb79f3..2f21a973d7bc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1285,23 +1285,22 @@ int may_umount(struct vfsmount *mnt)
 
 EXPORT_SYMBOL(may_umount);
 
-static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
+static LIST_HEAD(unmounted);	/* protected by namespace_sem */
 
 static void namespace_unlock(void)
 {
 	struct mount *mnt;
-	struct hlist_head head = unmounted;
+	LIST_HEAD(head);
 
-	if (likely(hlist_empty(&head))) {
+	if (likely(list_empty(&unmounted))) {
 		up_write(&namespace_sem);
 		return;
 	}
 
-	head.first->pprev = &head.first;
-	INIT_HLIST_HEAD(&unmounted);
+	list_splice_init(&unmounted, &head);
 
 	/* undo decrements we'd done in umount_tree() */
-	hlist_for_each_entry(mnt, &head, mnt_hash)
+	list_for_each_entry(mnt, &head, mnt_list)
 		if (mnt->mnt_ex_mountpoint.mnt)
 			mntget(mnt->mnt_ex_mountpoint.mnt);
 
@@ -1309,9 +1308,9 @@ static void namespace_unlock(void)
 
 	synchronize_rcu();
 
-	while (!hlist_empty(&head)) {
-		mnt = hlist_entry(head.first, struct mount, mnt_hash);
-		hlist_del_init(&mnt->mnt_hash);
+	while (!list_empty(&head)) {
+		mnt = list_first_entry(&head, struct mount, mnt_list);
+		list_del_init(&mnt->mnt_list);
 		if (mnt->mnt_ex_mountpoint.mnt)
 			path_put(&mnt->mnt_ex_mountpoint);
 		mntput(&mnt->mnt);
@@ -1332,24 +1331,25 @@ static inline void namespace_lock(void)
  */
 void umount_tree(struct mount *mnt, int how)
 {
-	HLIST_HEAD(tmp_list);
+	LIST_HEAD(tmp_list);
 	struct mount *p;
-	struct mount *last = NULL;
 
-	for (p = mnt; p; p = next_mnt(p, mnt)) {
-		hlist_del_init_rcu(&p->mnt_hash);
-		hlist_add_head(&p->mnt_hash, &tmp_list);
-	}
+	/* Gather the mounts to umount */
+	for (p = mnt; p; p = next_mnt(p, mnt))
+		list_move(&p->mnt_list, &tmp_list);
 
-	hlist_for_each_entry(p, &tmp_list, mnt_hash)
+	/* Hide the mounts from lookup_mnt and mnt_mounts */
+	list_for_each_entry(p, &tmp_list, mnt_list) {
+		hlist_del_init_rcu(&p->mnt_hash);
 		list_del_init(&p->mnt_child);
+	}
 
+	/* Add propogated mounts to the tmp_list */
 	if (how)
 		propagate_umount(&tmp_list);
 
-	hlist_for_each_entry(p, &tmp_list, mnt_hash) {
+	list_for_each_entry(p, &tmp_list, mnt_list) {
 		list_del_init(&p->mnt_expire);
-		list_del_init(&p->mnt_list);
 		__touch_mnt_namespace(p->mnt_ns);
 		p->mnt_ns = NULL;
 		if (how < 2)
@@ -1366,15 +1366,8 @@ void umount_tree(struct mount *mnt, int how)
 			p->mnt_mp = NULL;
 		}
 		change_mnt_propagation(p, MS_PRIVATE);
-		last = p;
-	}
-	if (last) {
-		last->mnt_hash.next = unmounted.first;
-		if (unmounted.first)
-			unmounted.first->pprev = &last->mnt_hash.next;
-		unmounted.first = tmp_list.first;
-		unmounted.first->pprev = &unmounted.first;
 	}
+	list_splice(&tmp_list, &unmounted);
 }
 
 static void shrink_submounts(struct mount *mnt);
diff --git a/fs/pnode.c b/fs/pnode.c
index 260ac8f898a4..c4319520d884 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -384,7 +384,8 @@ static void __propagate_umount(struct mount *mnt)
 		if (child && list_empty(&child->mnt_mounts)) {
 			list_del_init(&child->mnt_child);
 			hlist_del_init_rcu(&child->mnt_hash);
-			hlist_add_before_rcu(&child->mnt_hash, &mnt->mnt_hash);
+			/* list_move or list_move_tail? */
+			list_move(&child->mnt_list, &mnt->mnt_list);
 		}
 	}
 }
@@ -396,11 +397,11 @@ static void __propagate_umount(struct mount *mnt)
  *
  * vfsmount lock must be held for write
  */
-int propagate_umount(struct hlist_head *list)
+int propagate_umount(struct list_head *list)
 {
 	struct mount *mnt;
 
-	hlist_for_each_entry(mnt, list, mnt_hash)
+	list_for_each_entry(mnt, list, mnt_list)
 		__propagate_umount(mnt);
 	return 0;
 }
diff --git a/fs/pnode.h b/fs/pnode.h
index 4a246358b031..997596a5d31b 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -40,7 +40,7 @@ static inline void set_mnt_shared(struct mount *mnt)
 void change_mnt_propagation(struct mount *, int);
 int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
 		struct hlist_head *);
-int propagate_umount(struct hlist_head *);
+int propagate_umount(struct list_head *);
 int propagate_mount_busy(struct mount *, int);
 void mnt_release_group_id(struct mount *);
 int get_dominating_id(struct mount *mnt, const struct path *root);
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 19:24     ` Eric W. Biederman
@ 2014-12-18 19:34       ` Linus Torvalds
  2014-12-18 21:05       ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2014-12-18 19:34 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-fsdevel, Al Viro

On Thu, Dec 18, 2014 at 11:24 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> It looks like we can pretty easily use mnt_list instead of mnt_hash,
> see below (note: the code is only compile tested).

Ok. That sounds good.

In the meantime, I just applied your patch. I'm not a huge fan of it,
but the bug is real, and for backporting that patch is the
unquestionably the right thing to do. I just want to at least have
this discussion about better long-term solutions, and I think your
patch may be it. Thanks.

Al?

                       Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 16:57 [PATCH] mnt: Fix a memory stomp in umount Eric W. Biederman
       [not found] ` <CA+55aFz6grtss0SXqOizXMOOF4sxT3FC4GC4NCiMF2Huy1vE4A@mail.gmail.com>
@ 2014-12-18 20:01 ` Al Viro
  2014-12-18 20:15   ` Al Viro
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-18 20:01 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-fsdevel

On Thu, Dec 18, 2014 at 10:57:19AM -0600, Eric W. Biederman wrote:
> 
> While reviewing the code of umount_tree I realized that when we append
> to a preexisting unmounted list we do not change pprev of the former
> first item in the list.
> 
> Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
> the former first item of the list will stomp unmounted.first leaving
> it set to some random mount point which we are likely to free soon.
> 
> This isn't likely to hit, but if it does I don't know how anyone could
> track it down.

All you need for that kind of loop to work is correct -next on all elements.
Seriously.  Even correct first->pprev is not needed.  Look:
        struct hlist_head head = unmounted;

        if (likely(hlist_empty(&head))) // no dereferences of ppre *or* next
		sod off

        head.first->pprev = &head.first; // pprev of the first is valid now
        INIT_HLIST_HEAD(&unmounted);

        hlist_for_each_entry(mnt, &head, mnt_hash)
                if (mnt->mnt_ex_mountpoint.mnt)
                        mntget(mnt->mnt_ex_mountpoint.mnt);
	// only needed ->next for that loop
	...
        while (!hlist_empty(&head)) {
                mnt = hlist_entry(head.first, struct mount, mnt_hash);
                hlist_del_init(&mnt->mnt_hash);
		...
	}

Now, we certainly need pprev of the first to be correct for hlist_del_init()
to work.  What we do not need is correctness of pprev of the second as we
call hlist_del_init().  And _after_ hlist_del_init() pprev of what used to
be the second (the first, now) is healed.

So we actually are OK here.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 20:01 ` Al Viro
@ 2014-12-18 20:15   ` Al Viro
  2014-12-18 20:40     ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-18 20:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-fsdevel

On Thu, Dec 18, 2014 at 08:01:10PM +0000, Al Viro wrote:
> On Thu, Dec 18, 2014 at 10:57:19AM -0600, Eric W. Biederman wrote:
> > 
> > While reviewing the code of umount_tree I realized that when we append
> > to a preexisting unmounted list we do not change pprev of the former
> > first item in the list.
> > 
> > Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
> > the former first item of the list will stomp unmounted.first leaving
> > it set to some random mount point which we are likely to free soon.
> > 
> > This isn't likely to hit, but if it does I don't know how anyone could
> > track it down.
> 
> All you need for that kind of loop to work is correct -next on all elements.

Mind you, I'm not saying that this isn't worth sanitizing.  Just that analysis
in your commit message is incorrect - we actually do _not_ stomp on memory
there.  It's brittle and it's only saved by having only very few things done
to that hlist, but the actual memory corruption does not happen.  Again, I'm
no defending that code; it's certaily better off making a properly-spliced
hlist rather than relying on precise sequence of operations in
namespace_unlock().

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 20:15   ` Al Viro
@ 2014-12-18 20:40     ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2014-12-18 20:40 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Thu, Dec 18, 2014 at 08:01:10PM +0000, Al Viro wrote:
>> On Thu, Dec 18, 2014 at 10:57:19AM -0600, Eric W. Biederman wrote:
>> > 
>> > While reviewing the code of umount_tree I realized that when we append
>> > to a preexisting unmounted list we do not change pprev of the former
>> > first item in the list.
>> > 
>> > Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
>> > the former first item of the list will stomp unmounted.first leaving
>> > it set to some random mount point which we are likely to free soon.
>> > 
>> > This isn't likely to hit, but if it does I don't know how anyone could
>> > track it down.
>> 
>> All you need for that kind of loop to work is correct -next on all elements.
>
> Mind you, I'm not saying that this isn't worth sanitizing.  Just that analysis
> in your commit message is incorrect - we actually do _not_ stomp on memory
> there.  It's brittle and it's only saved by having only very few things done
> to that hlist, but the actual memory corruption does not happen.  Again, I'm
> no defending that code; it's certaily better off making a properly-spliced
> hlist rather than relying on precise sequence of operations in
> namespace_unlock().

Agreed.  That code wanted to be a memory stomp but we got lucky and it
couldn't figure out how to be.  Which I guess in the grand scheme of
things I am glad of.

When I sent the patch I knew there was an issue and I figured I had
better send a patch before my mind moved on and I forgot about the
incorrectly spliced list.

Eric




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 19:24     ` Eric W. Biederman
  2014-12-18 19:34       ` Linus Torvalds
@ 2014-12-18 21:05       ` Al Viro
  2014-12-18 21:18         ` Eric W. Biederman
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-18 21:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-fsdevel

On Thu, Dec 18, 2014 at 01:24:26PM -0600, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Thu, Dec 18, 2014 at 9:07 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >> Why is this piece of code using its own made up and buggy list handling in
> >> the first place? We have list functions for these things, exactly so that
> >> people shouldn't write buggy stuff by hand.
> >
> > Oh. Ok, I see what's going on. We have "list_splice()", but we don't
> > have the equivalent "hlist_splice()". So it's doing that by hand, and
> > did it badly.
> >
> > Al, this is your bug. I guess I can take the "manual hlist_splice" fix
> > from Eric, but I'm not really happy with it. There's a few other
> > places in that same commit where the list splice operation has been
> > open-coded.
> >
> > Mind taking a look?
> 
> It looks like we can pretty easily use mnt_list instead of mnt_hash,
> see below (note: the code is only compile tested).
> 
> While converting this to ordinary list helpers I found something
> strange.
> 
> In __propagate_umount we currently add the child to be unmounted in a
> different location in the list then we did before the conversion of
> mnt_hash to a hlist for rcu's accesses benefit.
> 
> Now maybe propagate_next handles this (I still need to read and
> understand that code) if not it looks like I may have found another bug,
> as it looks like today we can add a node to our list without propogating
> the unmount from the node.

Er...  Why would we want to reprocess it?  The loop goes through all nodes
getting events from ours; everything that gets events from _them_ included
at the same time they are.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 21:05       ` Al Viro
@ 2014-12-18 21:18         ` Eric W. Biederman
  2014-12-19  0:02           ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2014-12-18 21:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Thu, Dec 18, 2014 at 01:24:26PM -0600, Eric W. Biederman wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> 
>> > On Thu, Dec 18, 2014 at 9:07 AM, Linus Torvalds
>> > <torvalds@linux-foundation.org> wrote:
>> >> Why is this piece of code using its own made up and buggy list handling in
>> >> the first place? We have list functions for these things, exactly so that
>> >> people shouldn't write buggy stuff by hand.
>> >
>> > Oh. Ok, I see what's going on. We have "list_splice()", but we don't
>> > have the equivalent "hlist_splice()". So it's doing that by hand, and
>> > did it badly.
>> >
>> > Al, this is your bug. I guess I can take the "manual hlist_splice" fix
>> > from Eric, but I'm not really happy with it. There's a few other
>> > places in that same commit where the list splice operation has been
>> > open-coded.
>> >
>> > Mind taking a look?
>> 
>> It looks like we can pretty easily use mnt_list instead of mnt_hash,
>> see below (note: the code is only compile tested).
>> 
>> While converting this to ordinary list helpers I found something
>> strange.
>> 
>> In __propagate_umount we currently add the child to be unmounted in a
>> different location in the list then we did before the conversion of
>> mnt_hash to a hlist for rcu's accesses benefit.
>> 
>> Now maybe propagate_next handles this (I still need to read and
>> understand that code) if not it looks like I may have found another bug,
>> as it looks like today we can add a node to our list without propogating
>> the unmount from the node.
>
> Er...  Why would we want to reprocess it?  The loop goes through all nodes
> getting events from ours; everything that gets events from _them_ included
> at the same time they are.

We might have wanted to because before your change to an hlist that is
what the code did.

Having read through the code of propagate_next it does look like it
iterates through the entire propagation hierarchy so there shouldn't be
a need to visit mounts that we have placed on the propagation list.

Any thoughts on using mnt_list instead of mnt_hash to allow the use of
list_splice and list_move?

Eric


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-18 21:18         ` Eric W. Biederman
@ 2014-12-19  0:02           ` Al Viro
  2014-12-19  0:03             ` Al Viro
  2015-01-02 21:06             ` Eric W. Biederman
  0 siblings, 2 replies; 14+ messages in thread
From: Al Viro @ 2014-12-19  0:02 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-fsdevel

On Thu, Dec 18, 2014 at 03:18:49PM -0600, Eric W. Biederman wrote:

> We might have wanted to because before your change to an hlist that is
> what the code did.
> 
> Having read through the code of propagate_next it does look like it
> iterates through the entire propagation hierarchy so there shouldn't be
> a need to visit mounts that we have placed on the propagation list.
> 
> Any thoughts on using mnt_list instead of mnt_hash to allow the use of
> list_splice and list_move?

Frankly, I don't see much benefit in that.  What's wrong with actually
adding
__hlist_splice(struct hlist_node *head,
	       struct hlist_node *tail,
	       struct hlist_node **where)
{
	next = *where;
	*where = head;
	head->pprev = where;
	if (next) {
		tail->next = next;
		next->pprev = tail;
	}
}

and use it as
	__hlist_splice(tmp_list.first, &last->mnt_hash, &unmounted.first);
(and I'd start with if (unlikely(!mnt)) return; /* idiot caller */)

Another primitive would be something like
hlist_transplant(struct hlist_head **from, struct hlist_head **to)
{
	*to = *from;
	if (!hlist_empty(*to)) {
		to->first->pprev = to;
		INIT_HLIST_HEAD(from);
	}
}

with namespace_unlock() starting with
	hlist_transplant(&head, &unmounted);
	if (likely(hlist_empty(&head))) {
                up_write(&namespace_sem);
                return;
	}
        /* undo decrements we'd done in umount_tree() */
	...

Linus, would that work for you, or would you prefer something more fancy?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-19  0:02           ` Al Viro
@ 2014-12-19  0:03             ` Al Viro
  2015-01-02 21:06             ` Eric W. Biederman
  1 sibling, 0 replies; 14+ messages in thread
From: Al Viro @ 2014-12-19  0:03 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-fsdevel

On Fri, Dec 19, 2014 at 12:02:15AM +0000, Al Viro wrote:
> On Thu, Dec 18, 2014 at 03:18:49PM -0600, Eric W. Biederman wrote:
> 
> > We might have wanted to because before your change to an hlist that is
> > what the code did.
> > 
> > Having read through the code of propagate_next it does look like it
> > iterates through the entire propagation hierarchy so there shouldn't be
> > a need to visit mounts that we have placed on the propagation list.
> > 
> > Any thoughts on using mnt_list instead of mnt_hash to allow the use of
> > list_splice and list_move?
> 
> Frankly, I don't see much benefit in that.  What's wrong with actually
> adding
> __hlist_splice(struct hlist_node *head,
> 	       struct hlist_node *tail,
> 	       struct hlist_node **where)
> {
> 	next = *where;
> 	*where = head;
> 	head->pprev = where;
> 	if (next) {
> 		tail->next = next;
> 		next->pprev = tail;
> 	}
> }
> 
> and use it as
> 	__hlist_splice(tmp_list.first, &last->mnt_hash, &unmounted.first);
> (and I'd start with if (unlikely(!mnt)) return; /* idiot caller */)
> 
> Another primitive would be something like
> hlist_transplant(struct hlist_head **from, struct hlist_head **to)

hlist_transplant(struct hlist_head **to, struct hlist_head **from), that is...

> {
> 	*to = *from;
> 	if (!hlist_empty(*to)) {
> 		to->first->pprev = to;
> 		INIT_HLIST_HEAD(from);
> 	}
> }
> 
> with namespace_unlock() starting with
> 	hlist_transplant(&head, &unmounted);
> 	if (likely(hlist_empty(&head))) {
>                 up_write(&namespace_sem);
>                 return;
> 	}
>         /* undo decrements we'd done in umount_tree() */
> 	...
> 
> Linus, would that work for you, or would you prefer something more fancy?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2014-12-19  0:02           ` Al Viro
  2014-12-19  0:03             ` Al Viro
@ 2015-01-02 21:06             ` Eric W. Biederman
  2015-01-02 21:13               ` Al Viro
  1 sibling, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2015-01-02 21:06 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Thu, Dec 18, 2014 at 03:18:49PM -0600, Eric W. Biederman wrote:
>
>> We might have wanted to because before your change to an hlist that is
>> what the code did.
>> 
>> Having read through the code of propagate_next it does look like it
>> iterates through the entire propagation hierarchy so there shouldn't be
>> a need to visit mounts that we have placed on the propagation list.
>> 
>> Any thoughts on using mnt_list instead of mnt_hash to allow the use of
>> list_splice and list_move?
>
> Frankly, I don't see much benefit in that.  What's wrong with actually
> adding
> __hlist_splice(struct hlist_node *head,
> 	       struct hlist_node *tail,
> 	       struct hlist_node **where)
> {
> 	next = *where;
> 	*where = head;
> 	head->pprev = where;
> 	if (next) {
> 		tail->next = next;
> 		next->pprev = tail;
> 	}
> }
>
> and use it as
> 	__hlist_splice(tmp_list.first, &last->mnt_hash, &unmounted.first);
> (and I'd start with if (unlikely(!mnt)) return; /* idiot caller */)

I have a couple of reasons for not wanting to go that way.
- It requires keeping track of the tail of the list which is half of the
  error prone complexity.

- I have a fix for a bad interaction between MNT_LOCKED and MNT_DETACH
  where I need to leave mounts on the mount hash, until the final
  mntput.  For that I need not to reuse the the mnt_hash lists.

  Patches to follow shortly.

Of course after having gone through this code a time or two and tested
things I am very annoyed by the fact that "hlist_add_before_rcu()" and
"list_move_tail" perform the same operation.  That is down right
confusing.

> Another primitive would be something like
> hlist_transplant(struct hlist_head **from, struct hlist_head **to)
> {
> 	*to = *from;
> 	if (!hlist_empty(*to)) {
> 		to->first->pprev = to;
> 		INIT_HLIST_HEAD(from);
> 	}
> }
>
> with namespace_unlock() starting with
> 	hlist_transplant(&head, &unmounted);
> 	if (likely(hlist_empty(&head))) {
>                 up_write(&namespace_sem);
>                 return;
> 	}
>         /* undo decrements we'd done in umount_tree() */
> 	...
>
> Linus, would that work for you, or would you prefer something more fancy?


Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2015-01-02 21:06             ` Eric W. Biederman
@ 2015-01-02 21:13               ` Al Viro
  2015-01-02 21:56                 ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2015-01-02 21:13 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-fsdevel

On Fri, Jan 02, 2015 at 03:06:06PM -0600, Eric W. Biederman wrote:

> - It requires keeping track of the tail of the list which is half of the
>   error prone complexity.

Er...  In umount_tree() it comes pretty much for free, actually.

> - I have a fix for a bad interaction between MNT_LOCKED and MNT_DETACH
>   where I need to leave mounts on the mount hash, until the final
>   mntput.  For that I need not to reuse the the mnt_hash lists.

Details, please.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mnt: Fix a memory stomp in umount
  2015-01-02 21:13               ` Al Viro
@ 2015-01-02 21:56                 ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2015-01-02 21:56 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Fri, Jan 02, 2015 at 03:06:06PM -0600, Eric W. Biederman wrote:
>
>> - It requires keeping track of the tail of the list which is half of the
>>   error prone complexity.
>
> Er...  In umount_tree() it comes pretty much for free, actually.

The runtime complexity is basically free.  The conceptual complexity is
not.  It is yet another detail to pay attention to.  Having gotten it
wrong my first round while trying to do something similar while
developing my patches I can attest to this.  The hlist version is more
error prone to use and develop against. (Even if appropriate helpers are
available).

>> - I have a fix for a bad interaction between MNT_LOCKED and MNT_DETACH
>>   where I need to leave mounts on the mount hash, until the final
>>   mntput.  For that I need not to reuse the the mnt_hash lists.
>
> Details, please.

The code doing that has been sent out for review.

Eric


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-01-02 21:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-18 16:57 [PATCH] mnt: Fix a memory stomp in umount Eric W. Biederman
     [not found] ` <CA+55aFz6grtss0SXqOizXMOOF4sxT3FC4GC4NCiMF2Huy1vE4A@mail.gmail.com>
2014-12-18 18:41   ` Linus Torvalds
2014-12-18 19:24     ` Eric W. Biederman
2014-12-18 19:34       ` Linus Torvalds
2014-12-18 21:05       ` Al Viro
2014-12-18 21:18         ` Eric W. Biederman
2014-12-19  0:02           ` Al Viro
2014-12-19  0:03             ` Al Viro
2015-01-02 21:06             ` Eric W. Biederman
2015-01-02 21:13               ` Al Viro
2015-01-02 21:56                 ` Eric W. Biederman
2014-12-18 20:01 ` Al Viro
2014-12-18 20:15   ` Al Viro
2014-12-18 20:40     ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox