From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030450Ab2CNU6O (ORCPT ); Wed, 14 Mar 2012 16:58:14 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:38468 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1761374Ab2CNU6M (ORCPT ); Wed, 14 Mar 2012 16:58:12 -0400 Date: Wed, 14 Mar 2012 20:58:09 +0000 From: Al Viro To: Sasha Levin Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, davej@redhat.com Subject: Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root Message-ID: <20120314205809.GJ23916@ZenIV.linux.org.uk> References: <1331758710-16400-1-git-send-email-levinsasha928@gmail.com> <20120314201048.GI23916@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120314201048.GI23916@ZenIV.linux.org.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote: > On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote: > > This patch fixes the assumption that a mnt namespace will always have a valid > > root object. > > It's not an assumption, it's an invariant that should hold unless you have > run into a bug somewhere. > > Instances of struct mnt_namespace should *all* come from alloc_mnt_ns(). > There are only two callers - dup_mnt_namespace() and create_mnt_ns(). > The latter will assign non-NULL vfsmount to ->root or die NULL pointer > dereference in > mnt->mnt_ns = new_ns; > The former will either assign non-NULL to ->root or kfree() mnt_namespace > before anyone can see it. > > And nothing should modify ->root after that assignment for as long as > the instance of struct mnt_namespace is allocated. > > Mind explaining how have you managed to get mnt_namespace with NULL ->root > passed to dup_mnt_ns()? As the matter of fact, all of the above is easily verified; very few places see the internals of struct mnt_namespace (defined in fs/mount.h, included only in fs/dcache.c, fs/fhandle.c, fs/namei.c, fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c}, and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and fs/proc_namespace.c). Nothing else knows what size the damn thing is, nevermind the actual layout. The lifetime rules are also simple and easy to verify: it's a plain refcount. No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns(). As far as the outside cares, nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always returns a new reference, either to exisiting instance or to freshly allocated one. put_mnt_ns() needs to be called on the other side... opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount for as long as it's opened. This is it; we actually end up acquiring one reference too many to initial mnt_namespace, but that's not going to cause such effect. The structure containing struct vfsmount has a pointer to struct mnt_namespace in it. That one is protected by namespace_sem, does *not* contribute to refcount and is cleared when vfsmount (OK, struct mount containing it) gets removed from namespace's ->list and set when it gets placed there. Anything still on mnt_ns->list will get kicked out of there by umount_tree() call from put_mnt_ns(), so those references can't outlive the instance they point to. And that's all pointers to mnt_namespace that ever exist, aside of function arguments and local variables. I'm not saying that I couldn't have possibly fucked it up, but from rereading that code it doesn't look like we could end up with dangling pointers to already freed instances...