Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 20:58 [PATCH] vfs namespace: Don't assume mount namespace has valid root Sasha Levin
@ 2012-03-14 20:10 ` Al Viro
  2012-03-14 20:58   ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 20:10 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej

On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
> This patch fixes the assumption that a mnt namespace will always have a valid
> root object.

It's not an assumption, it's an invariant that should hold unless you have
run into a bug somewhere.

Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
There are only two callers - dup_mnt_namespace() and create_mnt_ns().
The latter will assign non-NULL vfsmount to ->root or die NULL pointer
dereference in
                mnt->mnt_ns = new_ns;
The former will either assign non-NULL to ->root or kfree() mnt_namespace
before anyone can see it.

And nothing should modify ->root after that assignment for as long as
the instance of struct mnt_namespace is allocated.

Mind explaining how have you managed to get mnt_namespace with NULL ->root
passed to dup_mnt_ns()?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 20:10 ` Al Viro
@ 2012-03-14 20:58   ` Al Viro
  2012-03-14 21:03     ` Sasha Levin
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 20:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej

On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote:
> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
> > This patch fixes the assumption that a mnt namespace will always have a valid
> > root object.
> 
> It's not an assumption, it's an invariant that should hold unless you have
> run into a bug somewhere.
> 
> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
> There are only two callers - dup_mnt_namespace() and create_mnt_ns().
> The latter will assign non-NULL vfsmount to ->root or die NULL pointer
> dereference in
>                 mnt->mnt_ns = new_ns;
> The former will either assign non-NULL to ->root or kfree() mnt_namespace
> before anyone can see it.
> 
> And nothing should modify ->root after that assignment for as long as
> the instance of struct mnt_namespace is allocated.
> 
> Mind explaining how have you managed to get mnt_namespace with NULL ->root
> passed to dup_mnt_ns()?

As the matter of fact, all of the above is easily verified; very few places
see the internals of struct mnt_namespace (defined in fs/mount.h, included
only in fs/dcache.c, fs/fhandle.c, fs/namei.c,
fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c},
and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and
fs/proc_namespace.c).  Nothing else knows what size the damn thing is,
nevermind the actual layout.

The lifetime rules are also simple and easy to verify: it's a plain refcount.
No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns().
As far as the outside cares,
	nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always
returns a new reference, either to exisiting instance or to freshly allocated
one.  put_mnt_ns() needs to be called on the other side...
	opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount
for as long as it's opened.

This is it; we actually end up acquiring one reference too many to initial
mnt_namespace, but that's not going to cause such effect.

The structure containing struct vfsmount has a pointer to struct mnt_namespace
in it.  That one is protected by namespace_sem, does *not* contribute to
refcount and is cleared when vfsmount (OK, struct mount containing it)
gets removed from namespace's ->list and set when it gets placed there.
Anything still on mnt_ns->list will get kicked out of there by umount_tree()
call from put_mnt_ns(), so those references can't outlive the instance
they point to.

And that's all pointers to mnt_namespace that ever exist, aside of function
arguments and local variables.  I'm not saying that I couldn't have possibly
fucked it up, but from rereading that code it doesn't look like we could
end up with dangling pointers to already freed instances...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] vfs namespace: Don't assume mount namespace has valid root
@ 2012-03-14 20:58 Sasha Levin
  2012-03-14 20:10 ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2012-03-14 20:58 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, davej, Sasha Levin

This patch fixes the assumption that a mnt namespace will always have a valid
root object.

This fixes a NULL ptr deref on unshare():

[   23.652282] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[   23.653105] IP: [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0
[   23.653105] PGD 35dd2067 PUD 35dd3067 PMD 0
[   23.653105] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   23.653105] CPU 2
[   23.653105] Pid: 3068, comm: trinity Not tainted 3.3.0-rc6-next-20120309-sasha-00001-g10cf0bc-dirty #42
[   23.653105] RIP: 0010:[<ffffffff81201e67>]  [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0
[   23.653105] RSP: 0018:ffff880032b09e48  EFLAGS: 00010282
[   23.653105] RAX: ffffffff81201e67 RBX: ffff880032b14200 RCX: 0000000000000000
[   23.653105] RDX: ffffffff83cde6a0 RSI: ffffffff844ce960 RDI: 0000000000000282
[   23.653105] RBP: ffff880032b09ea8 R08: 0000000000000001 R09: 0000000000000000
[   23.653105] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   23.653105] R13: ffff8800368db000 R14: ffff880032b388e8 R15: 00007fffb6d26e78
[   23.653105] FS:  00007f89447c8700(0000) GS:ffff88003e200000(006b) knlGS:0000000000f50030
[   23.653105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.666575] CR2: 0000000000000020 CR3: 0000000035df4000 CR4: 00000000000406e0
[   23.666575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.666575] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   23.666575] Process trinity (pid: 3068, threadinfo ffff880032b08000, task ffff8800368db000)
[   23.666575] Stack:
[   23.666575]  00000000000000d0 ffff88003d82d488 ffff88003d3d6178 00000000000000d0
[   23.666575]  ffff880032b09ec8 ffffffff811c2983 ffffffff810de328 ffff880032b608e8
[   23.666575]  0000000004060a00 ffff8800368db000 ffff880032b14200 00007fffb6d26e78
[   23.666575] Call Trace:
[   23.666575]  [<ffffffff811c2983>] ? kmem_cache_alloc+0x123/0x180
[   23.666575]  [<ffffffff810de328>] ? finish_task_switch+0x78/0x100
[   23.666575]  [<ffffffff81202790>] copy_mnt_ns+0x30/0x50
[   23.666575]  [<ffffffff810d889d>] create_new_namespaces+0x5d/0x190
[   23.666575]  [<ffffffff810d8b93>] unshare_nsproxy_namespaces+0x63/0x90
[   23.666575]  [<ffffffff810adc07>] sys_unshare+0x107/0x240
[   23.666575]  [<ffffffff826f10fd>] system_call_fastpath+0x1a/0x1f
[   23.666575] Code: 49 89 fc 53 48 89 f3 48 83 ec 38 e8 f4 cb ff ff 49 89 c6 48 3d 00 f0 ff ff 0f 87 96 01 00 00 48 c7 c7 00 e9 4c 84 e8 49 b9 4e 01 <49> 8b 74 24 20 ba 05 00 00 00 4c 89 e7 e8 77 f3 ff ff 49 89 c5
[   23.666575] RIP  [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0
[   23.666575]  RSP <ffff880032b09e48>
[   23.666575] CR2: 0000000000000020
[   23.693853] ---[ end trace 1a8ec15272933df5 ]---

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 fs/namespace.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5a6208b..6b6bec7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2238,6 +2238,9 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
 	struct mount *old = mnt_ns->root;
 	struct mount *new;
 
+	if (old == NULL)
+		return ERR_PTR(-EINVAL);
+
 	new_ns = alloc_mnt_ns();
 	if (IS_ERR(new_ns))
 		return new_ns;
-- 
1.7.8.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 20:58   ` Al Viro
@ 2012-03-14 21:03     ` Sasha Levin
  2012-03-14 21:37       ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2012-03-14 21:03 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel, davej

On Wed, Mar 14, 2012 at 10:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote:
>> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
>> > This patch fixes the assumption that a mnt namespace will always have a valid
>> > root object.
>>
>> It's not an assumption, it's an invariant that should hold unless you have
>> run into a bug somewhere.
>>
>> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
>> There are only two callers - dup_mnt_namespace() and create_mnt_ns().
>> The latter will assign non-NULL vfsmount to ->root or die NULL pointer
>> dereference in
>>                 mnt->mnt_ns = new_ns;
>> The former will either assign non-NULL to ->root or kfree() mnt_namespace
>> before anyone can see it.
>>
>> And nothing should modify ->root after that assignment for as long as
>> the instance of struct mnt_namespace is allocated.
>>
>> Mind explaining how have you managed to get mnt_namespace with NULL ->root
>> passed to dup_mnt_ns()?
>
> As the matter of fact, all of the above is easily verified; very few places
> see the internals of struct mnt_namespace (defined in fs/mount.h, included
> only in fs/dcache.c, fs/fhandle.c, fs/namei.c,
> fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c},
> and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and
> fs/proc_namespace.c).  Nothing else knows what size the damn thing is,
> nevermind the actual layout.
>
> The lifetime rules are also simple and easy to verify: it's a plain refcount.
> No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns().
> As far as the outside cares,
>        nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always
> returns a new reference, either to exisiting instance or to freshly allocated
> one.  put_mnt_ns() needs to be called on the other side...
>        opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount
> for as long as it's opened.
>
> This is it; we actually end up acquiring one reference too many to initial
> mnt_namespace, but that's not going to cause such effect.
>
> The structure containing struct vfsmount has a pointer to struct mnt_namespace
> in it.  That one is protected by namespace_sem, does *not* contribute to
> refcount and is cleared when vfsmount (OK, struct mount containing it)
> gets removed from namespace's ->list and set when it gets placed there.
> Anything still on mnt_ns->list will get kicked out of there by umount_tree()
> call from put_mnt_ns(), so those references can't outlive the instance
> they point to.
>
> And that's all pointers to mnt_namespace that ever exist, aside of function
> arguments and local variables.  I'm not saying that I couldn't have possibly
> fucked it up, but from rereading that code it doesn't look like we could
> end up with dangling pointers to already freed instances...

I'm trying to find the exact chain of events leading it it at the
moment, but it reproduces rather easily - so if you have any ideas on
figuring it out I'm happy to try anything.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 21:03     ` Sasha Levin
@ 2012-03-14 21:37       ` Al Viro
  2012-03-14 22:00         ` Sasha Levin
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 21:37 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej

On Wed, Mar 14, 2012 at 11:03:43PM +0200, Sasha Levin wrote:
> > And that's all pointers to mnt_namespace that ever exist, aside of function
> > arguments and local variables. ?I'm not saying that I couldn't have possibly
> > fucked it up, but from rereading that code it doesn't look like we could
> > end up with dangling pointers to already freed instances...
> 
> I'm trying to find the exact chain of events leading it it at the
> moment, but it reproduces rather easily - so if you have any ideas on
> figuring it out I'm happy to try anything.

Which kernel, for starters?  I'd probably add dumping call chain +
return value in alloc_mnt_ns(), call chain + pointer being freed in
final put_mnt_ns() and failure exit of dup_mnt_ns(), address of
mnt_ns and value assigned to ->root on assignments to ->root in
create_mnt_ns() and dup_mnt_ns() and mnt_ns in dup_mnt_ns() if it
happens to have NULL ->root.

That should give you full history of allocation/freeing mnt_namespace
instances and of assignments to anyone's ->root.  Ought to make a sane
starting point...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 21:37       ` Al Viro
@ 2012-03-14 22:00         ` Sasha Levin
  2012-03-14 22:19           ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2012-03-14 22:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel, davej

On Wed, Mar 14, 2012 at 11:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Mar 14, 2012 at 11:03:43PM +0200, Sasha Levin wrote:
>> > And that's all pointers to mnt_namespace that ever exist, aside of function
>> > arguments and local variables. ?I'm not saying that I couldn't have possibly
>> > fucked it up, but from rereading that code it doesn't look like we could
>> > end up with dangling pointers to already freed instances...
>>
>> I'm trying to find the exact chain of events leading it it at the
>> moment, but it reproduces rather easily - so if you have any ideas on
>> figuring it out I'm happy to try anything.
>
> Which kernel, for starters?  I'd probably add dumping call chain +
> return value in alloc_mnt_ns(), call chain + pointer being freed in
> final put_mnt_ns() and failure exit of dup_mnt_ns(), address of
> mnt_ns and value assigned to ->root on assignments to ->root in
> create_mnt_ns() and dup_mnt_ns() and mnt_ns in dup_mnt_ns() if it
> happens to have NULL ->root.
>
> That should give you full history of allocation/freeing mnt_namespace
> instances and of assignments to anyone's ->root.  Ought to make a sane
> starting point...

I'm running 3.3.0-rc6-next-20120309-sasha-00002-g8fd19a0.

I found that if I revert f8b88187 ("brlocks/lglocks: cleanups") it
seems that the problem is gone.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 22:00         ` Sasha Levin
@ 2012-03-14 22:19           ` Al Viro
  2012-03-15  1:05             ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 22:19 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej, Andi Kleen

On Thu, Mar 15, 2012 at 12:00:52AM +0200, Sasha Levin wrote:

> I'm running 3.3.0-rc6-next-20120309-sasha-00002-g8fd19a0.
> 
> I found that if I revert f8b88187 ("brlocks/lglocks: cleanups") it
> seems that the problem is gone.

... along with this piece of offending commit:
@@ -2249,10 +2250,9 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
                kfree(new_ns);
                return ERR_PTR(-ENOMEM);
        }
-       new_ns->root = new;
-       br_write_lock(vfsmount_lock);
+       br_write_lock(&vfsmount_lock);
        list_add_tail(&new_ns->list, &new->mnt_list);
-       br_write_unlock(vfsmount_lock);
+       br_write_unlock(&vfsmount_lock);

        /*
         * Second pass: switch the tsk->fs->* elements and mark new vfsmounts
@@ -2416,9 +2416,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,

IOW, Andi has hit delete one too many times...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
  2012-03-14 22:19           ` Al Viro
@ 2012-03-15  1:05             ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2012-03-15  1:05 UTC (permalink / raw)
  To: Al Viro; +Cc: Sasha Levin, linux-fsdevel, linux-kernel, davej

>          * Second pass: switch the tsk->fs->* elements and mark new vfsmounts
> @@ -2416,9 +2416,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
> 
> IOW, Andi has hit delete one too many times...

Oops. Sorry about that. Thanks for catching it.

The obvious patch looks good to me of course.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-03-15  1:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-14 20:58 [PATCH] vfs namespace: Don't assume mount namespace has valid root Sasha Levin
2012-03-14 20:10 ` Al Viro
2012-03-14 20:58   ` Al Viro
2012-03-14 21:03     ` Sasha Levin
2012-03-14 21:37       ` Al Viro
2012-03-14 22:00         ` Sasha Levin
2012-03-14 22:19           ` Al Viro
2012-03-15  1:05             ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox