From: Dave Hansen
Subject: Re: [PATCH 00/26] Mount writer count and read-only bind mounts
Date: Mon, 25 Jun 2007 08:45:06 -0700
Message-ID: <1182786306.26162.102.camel@localhost>
References: <20070622200303.82D9CC3A@kernel> <20070623095246.a9061585.akpm@linux-foundation.org>
In-Reply-To: <20070623095246.a9061585.akpm@linux-foundation.org>
To: Andrew Morton
Cc: Dave Hansen, linux-fsdevel@vger.kernel.org, hch@infradead.org, viro@ftp.linux.org.uk
List-Id: linux-fsdevel.vger.kernel.org

On Sat, 2007-06-23 at 09:52 -0700, Andrew Morton wrote:
> On Fri, 22 Jun 2007 13:03:03 -0700 Dave Hansen wrote:
> > Why do we need r/o bind mounts?
> >
> > This feature allows a read-only view into a read-write filesystem.
> > In the process of doing that, it also provides infrastructure for
> > keeping track of the number of writers to any given mount.
> >
> > This has a number of uses.  It allows chroots to have parts of
> > filesystems writable.
> > It will be useful for containers in the future
> > because users may have root inside a container, but should not
> > be allowed to write to some filesystems.  This also replaces
> > patches that vserver has had out of the tree for several years.
> >
> > It allows security enhancement by making sure that parts of
> > your filesystem read-only (such as when you don't trust your
> > FTP server), when you don't want to have entire new filesystems
> > mounted, or when you want atime selectively updated.
> > I've been using the following script to test that the feature is
> > working as desired.  It takes a directory and makes a regular
> > bind and a r/o bind mount of it.  It then performs some normal
> > filesystem operations on the three directories, including ones
> > that are expected to fail, like creating a file on the r/o
> > mount.
>
> Doesn't selinux do some of this?
>
> My overall reaction: owch.  There's a ton of tricksy code here and great
> potential for us to accidentally break it in the future by forgetting a
> mnt_may_write() as the kernel evolves.

This is definitely a tricky thing.  It takes a static, single check and
replaces it with a matched set of operations.  But it's not much
different than adding a mutex to something.  People can always miss one
side of the lock pair.

People won't miss the mnt_may_write() because it will become the only
valid way to check whether a mounted fs may be written to.  IS_RDONLY()
will not be available for these kinds of checks.

> And then there's the added complexity and the added runtime overhead.
>
> Balance that against some pretty obscure-looking benefits and I'm
> struggling to see how a merge is justifiable?
One reason Al had me use these paired operations instead of just
passing the mount all over the vfs is that this fixes some existing,
fundamental problems: we do not properly track when writers are
_finished_ with our filesystems, and may allow a remount-r/o operation
to succeed while writes are still occurring.

We needed to separate the logical "users can write to this fs" from
the physical "this fs is on r/o media" or "this fs is dying and writes
will only kill it more".  That's what these patches do in the end.

One set of things that I'm going to tack on here once these go in is
the ability to increment the writer count when i_nlink is decremented
to zero.  We'll drop the write count when the file is actually
truncated.  As it stands right now, since there is never an open filp
on those files, you might unlink a file, do a r/o mount of the fs, then
still write to it when the truncate occurs.  I think fixing that was
one of Al's long-term goals with this strategy.

-- Dave