* [PATCH] vfs namespace: Don't assume mount namespace has valid root @ 2012-03-14 20:58 Sasha Levin 2012-03-14 20:10 ` Al Viro 0 siblings, 1 reply; 8+ messages in thread From: Sasha Levin @ 2012-03-14 20:58 UTC (permalink / raw) To: viro; +Cc: linux-fsdevel, linux-kernel, davej, Sasha Levin This patch fixes the assumption that a mnt namespace will always have a valid root object. This fixes a NULL ptr deref on unshare(): [ 23.652282] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 23.653105] IP: [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0 [ 23.653105] PGD 35dd2067 PUD 35dd3067 PMD 0 [ 23.653105] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 23.653105] CPU 2 [ 23.653105] Pid: 3068, comm: trinity Not tainted 3.3.0-rc6-next-20120309-sasha-00001-g10cf0bc-dirty #42 [ 23.653105] RIP: 0010:[<ffffffff81201e67>] [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0 [ 23.653105] RSP: 0018:ffff880032b09e48 EFLAGS: 00010282 [ 23.653105] RAX: ffffffff81201e67 RBX: ffff880032b14200 RCX: 0000000000000000 [ 23.653105] RDX: ffffffff83cde6a0 RSI: ffffffff844ce960 RDI: 0000000000000282 [ 23.653105] RBP: ffff880032b09ea8 R08: 0000000000000001 R09: 0000000000000000 [ 23.653105] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 23.653105] R13: ffff8800368db000 R14: ffff880032b388e8 R15: 00007fffb6d26e78 [ 23.653105] FS: 00007f89447c8700(0000) GS:ffff88003e200000(006b) knlGS:0000000000f50030 [ 23.653105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.666575] CR2: 0000000000000020 CR3: 0000000035df4000 CR4: 00000000000406e0 [ 23.666575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 23.666575] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 23.666575] Process trinity (pid: 3068, threadinfo ffff880032b08000, task ffff8800368db000) [ 23.666575] Stack: [ 23.666575] 00000000000000d0 ffff88003d82d488 ffff88003d3d6178 00000000000000d0 [ 23.666575] ffff880032b09ec8 ffffffff811c2983 ffffffff810de328 ffff880032b608e8 [ 23.666575] 0000000004060a00 ffff8800368db000 ffff880032b14200 00007fffb6d26e78 [ 23.666575] Call Trace: [ 23.666575] [<ffffffff811c2983>] ? kmem_cache_alloc+0x123/0x180 [ 23.666575] [<ffffffff810de328>] ? finish_task_switch+0x78/0x100 [ 23.666575] [<ffffffff81202790>] copy_mnt_ns+0x30/0x50 [ 23.666575] [<ffffffff810d889d>] create_new_namespaces+0x5d/0x190 [ 23.666575] [<ffffffff810d8b93>] unshare_nsproxy_namespaces+0x63/0x90 [ 23.666575] [<ffffffff810adc07>] sys_unshare+0x107/0x240 [ 23.666575] [<ffffffff826f10fd>] system_call_fastpath+0x1a/0x1f [ 23.666575] Code: 49 89 fc 53 48 89 f3 48 83 ec 38 e8 f4 cb ff ff 49 89 c6 48 3d 00 f0 ff ff 0f 87 96 01 00 00 48 c7 c7 00 e9 4c 84 e8 49 b9 4e 01 <49> 8b 74 24 20 ba 05 00 00 00 4c 89 e7 e8 77 f3 ff ff 49 89 c5 [ 23.666575] RIP [<ffffffff81201e67>] dup_mnt_ns.clone.15+0x37/0x2a0 [ 23.666575] RSP <ffff880032b09e48> [ 23.666575] CR2: 0000000000000020 [ 23.693853] ---[ end trace 1a8ec15272933df5 ]--- Signed-off-by: Sasha Levin <levinsasha928@gmail.com> --- fs/namespace.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 5a6208b..6b6bec7 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2238,6 +2238,9 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns, struct mount *old = mnt_ns->root; struct mount *new; + if (old == NULL) + return ERR_PTR(-EINVAL); + new_ns = alloc_mnt_ns(); if (IS_ERR(new_ns)) return new_ns; -- 1.7.8.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 20:58 [PATCH] vfs namespace: Don't assume mount namespace has valid root Sasha Levin @ 2012-03-14 20:10 ` Al Viro 2012-03-14 20:58 ` Al Viro 0 siblings, 1 reply; 8+ messages in thread From: Al Viro @ 2012-03-14 20:10 UTC (permalink / raw) To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote: > This patch fixes the assumption that a mnt namespace will always have a valid > root object. It's not an assumption, it's an invariant that should hold unless you have run into a bug somewhere. Instances of struct mnt_namespace should *all* come from alloc_mnt_ns(). There are only two callers - dup_mnt_namespace() and create_mnt_ns(). The latter will assign non-NULL vfsmount to ->root or die NULL pointer dereference in mnt->mnt_ns = new_ns; The former will either assign non-NULL to ->root or kfree() mnt_namespace before anyone can see it. And nothing should modify ->root after that assignment for as long as the instance of struct mnt_namespace is allocated. Mind explaining how have you managed to get mnt_namespace with NULL ->root passed to dup_mnt_ns()? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 20:10 ` Al Viro @ 2012-03-14 20:58 ` Al Viro 2012-03-14 21:03 ` Sasha Levin 0 siblings, 1 reply; 8+ messages in thread From: Al Viro @ 2012-03-14 20:58 UTC (permalink / raw) To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote: > On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote: > > This patch fixes the assumption that a mnt namespace will always have a valid > > root object. > > It's not an assumption, it's an invariant that should hold unless you have > run into a bug somewhere. > > Instances of struct mnt_namespace should *all* come from alloc_mnt_ns(). > There are only two callers - dup_mnt_namespace() and create_mnt_ns(). > The latter will assign non-NULL vfsmount to ->root or die NULL pointer > dereference in > mnt->mnt_ns = new_ns; > The former will either assign non-NULL to ->root or kfree() mnt_namespace > before anyone can see it. > > And nothing should modify ->root after that assignment for as long as > the instance of struct mnt_namespace is allocated. > > Mind explaining how have you managed to get mnt_namespace with NULL ->root > passed to dup_mnt_ns()? As the matter of fact, all of the above is easily verified; very few places see the internals of struct mnt_namespace (defined in fs/mount.h, included only in fs/dcache.c, fs/fhandle.c, fs/namei.c, fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c}, and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and fs/proc_namespace.c). Nothing else knows what size the damn thing is, nevermind the actual layout. The lifetime rules are also simple and easy to verify: it's a plain refcount. No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns(). As far as the outside cares, nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always returns a new reference, either to exisiting instance or to freshly allocated one. put_mnt_ns() needs to be called on the other side... opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount for as long as it's opened. This is it; we actually end up acquiring one reference too many to initial mnt_namespace, but that's not going to cause such effect. The structure containing struct vfsmount has a pointer to struct mnt_namespace in it. That one is protected by namespace_sem, does *not* contribute to refcount and is cleared when vfsmount (OK, struct mount containing it) gets removed from namespace's ->list and set when it gets placed there. Anything still on mnt_ns->list will get kicked out of there by umount_tree() call from put_mnt_ns(), so those references can't outlive the instance they point to. And that's all pointers to mnt_namespace that ever exist, aside of function arguments and local variables. I'm not saying that I couldn't have possibly fucked it up, but from rereading that code it doesn't look like we could end up with dangling pointers to already freed instances... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 20:58 ` Al Viro @ 2012-03-14 21:03 ` Sasha Levin 2012-03-14 21:37 ` Al Viro 0 siblings, 1 reply; 8+ messages in thread From: Sasha Levin @ 2012-03-14 21:03 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel, davej On Wed, Mar 14, 2012 at 10:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote: >> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote: >> > This patch fixes the assumption that a mnt namespace will always have a valid >> > root object. >> >> It's not an assumption, it's an invariant that should hold unless you have >> run into a bug somewhere. >> >> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns(). >> There are only two callers - dup_mnt_namespace() and create_mnt_ns(). >> The latter will assign non-NULL vfsmount to ->root or die NULL pointer >> dereference in >> mnt->mnt_ns = new_ns; >> The former will either assign non-NULL to ->root or kfree() mnt_namespace >> before anyone can see it. >> >> And nothing should modify ->root after that assignment for as long as >> the instance of struct mnt_namespace is allocated. >> >> Mind explaining how have you managed to get mnt_namespace with NULL ->root >> passed to dup_mnt_ns()? > > As the matter of fact, all of the above is easily verified; very few places > see the internals of struct mnt_namespace (defined in fs/mount.h, included > only in fs/dcache.c, fs/fhandle.c, fs/namei.c, > fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c}, > and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and > fs/proc_namespace.c). Nothing else knows what size the damn thing is, > nevermind the actual layout. > > The lifetime rules are also simple and easy to verify: it's a plain refcount. > No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns(). > As far as the outside cares, > nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always > returns a new reference, either to exisiting instance or to freshly allocated > one. put_mnt_ns() needs to be called on the other side... > opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount > for as long as it's opened. > > This is it; we actually end up acquiring one reference too many to initial > mnt_namespace, but that's not going to cause such effect. > > The structure containing struct vfsmount has a pointer to struct mnt_namespace > in it. That one is protected by namespace_sem, does *not* contribute to > refcount and is cleared when vfsmount (OK, struct mount containing it) > gets removed from namespace's ->list and set when it gets placed there. > Anything still on mnt_ns->list will get kicked out of there by umount_tree() > call from put_mnt_ns(), so those references can't outlive the instance > they point to. > > And that's all pointers to mnt_namespace that ever exist, aside of function > arguments and local variables. I'm not saying that I couldn't have possibly > fucked it up, but from rereading that code it doesn't look like we could > end up with dangling pointers to already freed instances... I'm trying to find the exact chain of events leading it it at the moment, but it reproduces rather easily - so if you have any ideas on figuring it out I'm happy to try anything. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 21:03 ` Sasha Levin @ 2012-03-14 21:37 ` Al Viro 2012-03-14 22:00 ` Sasha Levin 0 siblings, 1 reply; 8+ messages in thread From: Al Viro @ 2012-03-14 21:37 UTC (permalink / raw) To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej On Wed, Mar 14, 2012 at 11:03:43PM +0200, Sasha Levin wrote: > > And that's all pointers to mnt_namespace that ever exist, aside of function > > arguments and local variables. ?I'm not saying that I couldn't have possibly > > fucked it up, but from rereading that code it doesn't look like we could > > end up with dangling pointers to already freed instances... > > I'm trying to find the exact chain of events leading it it at the > moment, but it reproduces rather easily - so if you have any ideas on > figuring it out I'm happy to try anything. Which kernel, for starters? I'd probably add dumping call chain + return value in alloc_mnt_ns(), call chain + pointer being freed in final put_mnt_ns() and failure exit of dup_mnt_ns(), address of mnt_ns and value assigned to ->root on assignments to ->root in create_mnt_ns() and dup_mnt_ns() and mnt_ns in dup_mnt_ns() if it happens to have NULL ->root. That should give you full history of allocation/freeing mnt_namespace instances and of assignments to anyone's ->root. Ought to make a sane starting point... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 21:37 ` Al Viro @ 2012-03-14 22:00 ` Sasha Levin 2012-03-14 22:19 ` Al Viro 0 siblings, 1 reply; 8+ messages in thread From: Sasha Levin @ 2012-03-14 22:00 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel, davej On Wed, Mar 14, 2012 at 11:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Mar 14, 2012 at 11:03:43PM +0200, Sasha Levin wrote: >> > And that's all pointers to mnt_namespace that ever exist, aside of function >> > arguments and local variables. ?I'm not saying that I couldn't have possibly >> > fucked it up, but from rereading that code it doesn't look like we could >> > end up with dangling pointers to already freed instances... >> >> I'm trying to find the exact chain of events leading it it at the >> moment, but it reproduces rather easily - so if you have any ideas on >> figuring it out I'm happy to try anything. > > Which kernel, for starters? I'd probably add dumping call chain + > return value in alloc_mnt_ns(), call chain + pointer being freed in > final put_mnt_ns() and failure exit of dup_mnt_ns(), address of > mnt_ns and value assigned to ->root on assignments to ->root in > create_mnt_ns() and dup_mnt_ns() and mnt_ns in dup_mnt_ns() if it > happens to have NULL ->root. > > That should give you full history of allocation/freeing mnt_namespace > instances and of assignments to anyone's ->root. Ought to make a sane > starting point... I'm running 3.3.0-rc6-next-20120309-sasha-00002-g8fd19a0. I found that if I revert f8b88187 ("brlocks/lglocks: cleanups") it seems that the problem is gone. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 22:00 ` Sasha Levin @ 2012-03-14 22:19 ` Al Viro 2012-03-15 1:05 ` Andi Kleen 0 siblings, 1 reply; 8+ messages in thread From: Al Viro @ 2012-03-14 22:19 UTC (permalink / raw) To: Sasha Levin; +Cc: linux-fsdevel, linux-kernel, davej, Andi Kleen On Thu, Mar 15, 2012 at 12:00:52AM +0200, Sasha Levin wrote: > I'm running 3.3.0-rc6-next-20120309-sasha-00002-g8fd19a0. > > I found that if I revert f8b88187 ("brlocks/lglocks: cleanups") it > seems that the problem is gone. ... along with this piece of offending commit: @@ -2249,10 +2250,9 @@ static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns, kfree(new_ns); return ERR_PTR(-ENOMEM); } - new_ns->root = new; - br_write_lock(vfsmount_lock); + br_write_lock(&vfsmount_lock); list_add_tail(&new_ns->list, &new->mnt_list); - br_write_unlock(vfsmount_lock); + br_write_unlock(&vfsmount_lock); /* * Second pass: switch the tsk->fs->* elements and mark new vfsmounts @@ -2416,9 +2416,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry, IOW, Andi has hit delete one too many times... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root 2012-03-14 22:19 ` Al Viro @ 2012-03-15 1:05 ` Andi Kleen 0 siblings, 0 replies; 8+ messages in thread From: Andi Kleen @ 2012-03-15 1:05 UTC (permalink / raw) To: Al Viro; +Cc: Sasha Levin, linux-fsdevel, linux-kernel, davej > * Second pass: switch the tsk->fs->* elements and mark new vfsmounts > @@ -2416,9 +2416,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry, > > IOW, Andi has hit delete one too many times... Oops. Sorry about that. Thanks for catching it. The obvious patch looks good to me of course. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-03-15 1:05 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-03-14 20:58 [PATCH] vfs namespace: Don't assume mount namespace has valid root Sasha Levin 2012-03-14 20:10 ` Al Viro 2012-03-14 20:58 ` Al Viro 2012-03-14 21:03 ` Sasha Levin 2012-03-14 21:37 ` Al Viro 2012-03-14 22:00 ` Sasha Levin 2012-03-14 22:19 ` Al Viro 2012-03-15 1:05 ` Andi Kleen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox