* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 20:58 [PATCH] vfs namespace: Don't assume mount namespace has valid root Sasha Levin
@ 2012-03-14 20:10 ` Al Viro
2012-03-14 20:58 ` Al Viro
0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 20:10 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej
On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
> This patch fixes the assumption that a mnt namespace will always have a valid
> root object.
It's not an assumption, it's an invariant that should hold unless you have
run into a bug somewhere.
Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
There are only two callers - dup_mnt_namespace() and create_mnt_ns().
The latter will assign non-NULL vfsmount to ->root or die NULL pointer
dereference in
mnt->mnt_ns = new_ns;
The former will either assign non-NULL to ->root or kfree() mnt_namespace
before anyone can see it.
And nothing should modify ->root after that assignment for as long as
the instance of struct mnt_namespace is allocated.
Mind explaining how have you managed to get mnt_namespace with NULL ->root
passed to dup_mnt_ns()?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 20:10 ` Al Viro
@ 2012-03-14 20:58 ` Al Viro
2012-03-14 21:03 ` Sasha Levin
0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 20:58 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej
On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote:
> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
> > This patch fixes the assumption that a mnt namespace will always have a valid
> > root object.
>
> It's not an assumption, it's an invariant that should hold unless you have
> run into a bug somewhere.
>
> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
> There are only two callers - dup_mnt_namespace() and create_mnt_ns().
> The latter will assign non-NULL vfsmount to ->root or die NULL pointer
> dereference in
> mnt->mnt_ns = new_ns;
> The former will either assign non-NULL to ->root or kfree() mnt_namespace
> before anyone can see it.
>
> And nothing should modify ->root after that assignment for as long as
> the instance of struct mnt_namespace is allocated.
>
> Mind explaining how have you managed to get mnt_namespace with NULL ->root
> passed to dup_mnt_ns()?
As the matter of fact, all of the above is easily verified; very few places
see the internals of struct mnt_namespace (defined in fs/mount.h, included
only in fs/dcache.c, fs/fhandle.c, fs/namei.c,
fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c},
and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and
fs/proc_namespace.c). Nothing else knows what size the damn thing is,
nevermind the actual layout.
The lifetime rules are also simple and easy to verify: it's a plain refcount.
No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns().
As far as the outside cares,
nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always
returns a new reference, either to exisiting instance or to freshly allocated
one. put_mnt_ns() needs to be called on the other side...
opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount
for as long as it's opened.
This is it; we actually end up acquiring one reference too many to initial
mnt_namespace, but that's not going to cause such effect.
The structure containing struct vfsmount has a pointer to struct mnt_namespace
in it. That one is protected by namespace_sem, does *not* contribute to
refcount and is cleared when vfsmount (OK, struct mount containing it)
gets removed from namespace's ->list and set when it gets placed there.
Anything still on mnt_ns->list will get kicked out of there by umount_tree()
call from put_mnt_ns(), so those references can't outlive the instance
they point to.
And that's all pointers to mnt_namespace that ever exist, aside of function
arguments and local variables. I'm not saying that I couldn't have possibly
fucked it up, but from rereading that code it doesn't look like we could
end up with dangling pointers to already freed instances...
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] vfs namespace: Don't assume mount namespace has valid root
@ 2012-03-14 20:58 Sasha Levin
2012-03-14 20:10 ` Al Viro
0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2012-03-14 20:58 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, linux-kernel, davej, Sasha Levin
This patch fixes the assumption that a mnt namespace will always have a valid
root object.
This fixes a NULL ptr deref on unshare():
[ 23.652282] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 23.653105] IP: [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0
[ 23.653105] PGD 35dd2067 PUD 35dd3067 PMD 0
[ 23.653105] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 23.653105] CPU 2
[ 23.653105] Pid: 3068, comm: trinity Not tainted 3.3.0-rc6-next-20120309-sasha-00001-g10cf0bc-dirty #42
[ 23.653105] RIP: 0010:[<ffffffff81201e67>] [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0
[ 23.653105] RSP: 0018:ffff880032b09e48 EFLAGS: 00010282
[ 23.653105] RAX: ffffffff81201e67 RBX: ffff880032b14200 RCX: 0000000000000000
[ 23.653105] RDX: ffffffff83cde6a0 RSI: ffffffff844ce960 RDI: 0000000000000282
[ 23.653105] RBP: ffff880032b09ea8 R08: 0000000000000001 R09: 0000000000000000
[ 23.653105] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 23.653105] R13: ffff8800368db000 R14: ffff880032b388e8 R15: 00007fffb6d26e78
[ 23.653105] FS: 00007f89447c8700(0000) GS:ffff88003e200000(006b) knlGS:0000000000f50030
[ 23.653105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.666575] CR2: 0000000000000020 CR3: 0000000035df4000 CR4: 00000000000406e0
[ 23.666575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.666575] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 23.666575] Process trinity (pid: 3068, threadinfo ffff880032b08000, task ffff8800368db000)
[ 23.666575] Stack:
[ 23.666575] 00000000000000d0 ffff88003d82d488 ffff88003d3d6178 00000000000000d0
[ 23.666575] ffff880032b09ec8 ffffffff811c2983 ffffffff810de328 ffff880032b608e8
[ 23.666575] 0000000004060a00 ffff8800368db000 ffff880032b14200 00007fffb6d26e78
[ 23.666575] Call Trace:
[ 23.666575] [<ffffffff811c2983>] ? kmem_cache_alloc+0x123/0x180
[ 23.666575] [<ffffffff810de328>] ? finish_task_switch+0x78/0x100
[ 23.666575] [<ffffffff81202790>] copy_mnt_ns+0x30/0x50
[ 23.666575] [<ffffffff810d889d>] create_new_namespaces+0x5d/0x190
[ 23.666575] [<ffffffff810d8b93>] unshare_nsproxy_namespaces+0x63/0x90
[ 23.666575] [<ffffffff810adc07>] sys_unshare+0x107/0x240
[ 23.666575] [<ffffffff826f10fd>] system_call_fastpath+0x1a/0x1f
[ 23.666575] Code: 49 89 fc 53 48 89 f3 48 83 ec 38 e8 f4 cb ff ff 49 89 c6 48 3d 00 f0 ff ff 0f 87 96 01 00 00 48 c7 c7 00 e9 4c 84 e8 49 b9 4e 01 <49> 8b 74 24 20 ba 05 00 00 00 4c 89 e7 e8 77 f3 ff ff 49 89 c5
[ 23.666575] RIP [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0
[ 23.666575] RSP <ffff880032b09e48>
[ 23.666575] CR2: 0000000000000020
[ 23.693853] ---[ end trace 1a8ec15272933df5 ]---
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
fs/namespace.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 5a6208b..6b6bec7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2238,6 +2238,9 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
struct mount *old = mnt_ns->root;
struct mount *new;
+ if (old == NULL)
+ return ERR_PTR(-EINVAL);
+
new_ns = alloc_mnt_ns();
if (IS_ERR(new_ns))
return new_ns;
--
1.7.8.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 20:58 ` Al Viro
@ 2012-03-14 21:03 ` Sasha Levin
2012-03-14 21:37 ` Al Viro
0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2012-03-14 21:03 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel, davej
On Wed, Mar 14, 2012 at 10:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote:
>> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
>> > This patch fixes the assumption that a mnt namespace will always have a valid
>> > root object.
>>
>> It's not an assumption, it's an invariant that should hold unless you have
>> run into a bug somewhere.
>>
>> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
>> There are only two callers - dup_mnt_namespace() and create_mnt_ns().
>> The latter will assign non-NULL vfsmount to ->root or die NULL pointer
>> dereference in
>> mnt->mnt_ns = new_ns;
>> The former will either assign non-NULL to ->root or kfree() mnt_namespace
>> before anyone can see it.
>>
>> And nothing should modify ->root after that assignment for as long as
>> the instance of struct mnt_namespace is allocated.
>>
>> Mind explaining how have you managed to get mnt_namespace with NULL ->root
>> passed to dup_mnt_ns()?
>
> As the matter of fact, all of the above is easily verified; very few places
> see the internals of struct mnt_namespace (defined in fs/mount.h, included
> only in fs/dcache.c, fs/fhandle.c, fs/namei.c,
> fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c},
> and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and
> fs/proc_namespace.c). Nothing else knows what size the damn thing is,
> nevermind the actual layout.
>
> The lifetime rules are also simple and easy to verify: it's a plain refcount.
> No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns().
> As far as the outside cares,
> nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always
> returns a new reference, either to exisiting instance or to freshly allocated
> one. put_mnt_ns() needs to be called on the other side...
> opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount
> for as long as it's opened.
>
> This is it; we actually end up acquiring one reference too many to initial
> mnt_namespace, but that's not going to cause such effect.
>
> The structure containing struct vfsmount has a pointer to struct mnt_namespace
> in it. That one is protected by namespace_sem, does *not* contribute to
> refcount and is cleared when vfsmount (OK, struct mount containing it)
> gets removed from namespace's ->list and set when it gets placed there.
> Anything still on mnt_ns->list will get kicked out of there by umount_tree()
> call from put_mnt_ns(), so those references can't outlive the instance
> they point to.
>
> And that's all pointers to mnt_namespace that ever exist, aside of function
> arguments and local variables. I'm not saying that I couldn't have possibly
> fucked it up, but from rereading that code it doesn't look like we could
> end up with dangling pointers to already freed instances...
I'm trying to find the exact chain of events leading it it at the
moment, but it reproduces rather easily - so if you have any ideas on
figuring it out I'm happy to try anything.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 21:03 ` Sasha Levin
@ 2012-03-14 21:37 ` Al Viro
2012-03-14 22:00 ` Sasha Levin
0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 21:37 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej
On Wed, Mar 14, 2012 at 11:03:43PM +0200, Sasha Levin wrote:
> > And that's all pointers to mnt_namespace that ever exist, aside of function
> > arguments and local variables. ?I'm not saying that I couldn't have possibly
> > fucked it up, but from rereading that code it doesn't look like we could
> > end up with dangling pointers to already freed instances...
>
> I'm trying to find the exact chain of events leading it it at the
> moment, but it reproduces rather easily - so if you have any ideas on
> figuring it out I'm happy to try anything.
Which kernel, for starters? I'd probably add dumping call chain +
return value in alloc_mnt_ns(), call chain + pointer being freed in
final put_mnt_ns() and failure exit of dup_mnt_ns(), address of
mnt_ns and value assigned to ->root on assignments to ->root in
create_mnt_ns() and dup_mnt_ns() and mnt_ns in dup_mnt_ns() if it
happens to have NULL ->root.
That should give you full history of allocation/freeing mnt_namespace
instances and of assignments to anyone's ->root. Ought to make a sane
starting point...
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 21:37 ` Al Viro
@ 2012-03-14 22:00 ` Sasha Levin
2012-03-14 22:19 ` Al Viro
0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2012-03-14 22:00 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel, davej
On Wed, Mar 14, 2012 at 11:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Mar 14, 2012 at 11:03:43PM +0200, Sasha Levin wrote:
>> > And that's all pointers to mnt_namespace that ever exist, aside of function
>> > arguments and local variables. ?I'm not saying that I couldn't have possibly
>> > fucked it up, but from rereading that code it doesn't look like we could
>> > end up with dangling pointers to already freed instances...
>>
>> I'm trying to find the exact chain of events leading it it at the
>> moment, but it reproduces rather easily - so if you have any ideas on
>> figuring it out I'm happy to try anything.
>
> Which kernel, for starters? I'd probably add dumping call chain +
> return value in alloc_mnt_ns(), call chain + pointer being freed in
> final put_mnt_ns() and failure exit of dup_mnt_ns(), address of
> mnt_ns and value assigned to ->root on assignments to ->root in
> create_mnt_ns() and dup_mnt_ns() and mnt_ns in dup_mnt_ns() if it
> happens to have NULL ->root.
>
> That should give you full history of allocation/freeing mnt_namespace
> instances and of assignments to anyone's ->root. Ought to make a sane
> starting point...
I'm running 3.3.0-rc6-next-20120309-sasha-00002-g8fd19a0.
I found that if I revert f8b88187 ("brlocks/lglocks: cleanups") it
seems that the problem is gone.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 22:00 ` Sasha Levin
@ 2012-03-14 22:19 ` Al Viro
2012-03-15 1:05 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2012-03-14 22:19 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej, Andi Kleen
On Thu, Mar 15, 2012 at 12:00:52AM +0200, Sasha Levin wrote:
> I'm running 3.3.0-rc6-next-20120309-sasha-00002-g8fd19a0.
>
> I found that if I revert f8b88187 ("brlocks/lglocks: cleanups") it
> seems that the problem is gone.
... along with this piece of offending commit:
@@ -2249,10 +2250,9 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
kfree(new_ns);
return ERR_PTR(-ENOMEM);
}
- new_ns->root = new;
- br_write_lock(vfsmount_lock);
+ br_write_lock(&vfsmount_lock);
list_add_tail(&new_ns->list, &new->mnt_list);
- br_write_unlock(vfsmount_lock);
+ br_write_unlock(&vfsmount_lock);
/*
* Second pass: switch the tsk->fs->* elements and mark new vfsmounts
@@ -2416,9 +2416,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
IOW, Andi has hit delete one too many times...
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root
2012-03-14 22:19 ` Al Viro
@ 2012-03-15 1:05 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2012-03-15 1:05 UTC (permalink / raw)
To: Al Viro; +Cc: Sasha Levin, linux-fsdevel, linux-kernel, davej
> * Second pass: switch the tsk->fs->* elements and mark new vfsmounts
> @@ -2416,9 +2416,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
>
> IOW, Andi has hit delete one too many times...
Oops. Sorry about that. Thanks for catching it.
The obvious patch looks good to me of course.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-03-15 1:05 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-14 20:58 [PATCH] vfs namespace: Don't assume mount namespace has valid root Sasha Levin
2012-03-14 20:10 ` Al Viro
2012-03-14 20:58 ` Al Viro
2012-03-14 21:03 ` Sasha Levin
2012-03-14 21:37 ` Al Viro
2012-03-14 22:00 ` Sasha Levin
2012-03-14 22:19 ` Al Viro
2012-03-15 1:05 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox