From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: Re: [patch] mm: close page_mkwrite races Date: Wed, 1 Apr 2009 16:02:41 -0700 Message-ID: <20090401160241.ec2f4573.akpm@linux-foundation.org> References: <20090330135307.GP31000@wotan.suse.de> <20090330135613.GQ31000@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: npiggin@suse.de, trond.myklebust@fys.uio.no, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org To: Sage Weil Return-path: In-Reply-To: Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Tue, 31 Mar 2009 12:55:16 -0700 (PDT) Sage Weil wrote: > On Mon, 30 Mar 2009, Nick Piggin wrote: > > [Fixed linux-mm address. Please reply here] > > > > Hi, > > > > I'd like opinions on this patch (applies on top of the previous > > page_mkwrite fixes in -mm). I was not going to ask to merge it > > immediately however it appears that fsblock is not the only one who > > needs it... > > -- > > > > I want to have the page be protected by page lock between page_mkwrite > > notification to the filesystem, and the actual setting of the page > > dirty. Do this by allowing the filesystem to return a locked page from > > page_mkwrite, and have the page fault code keep it held until after it > > calls set_page_dirty. > > > > I need this in fsblock because I am working to ensure filesystem metadata > > can be correctly allocated and refcounted. This means that page cleaning > > should not require memory allocation. wot? "page cleaning" involves writeout. How can we avoid doing allocations there? > > Without this patch, then for example we could get a concurrent writeout > > after the page_mkwrite (which allocates page metadata required to clean > > it), but before the set_page_dirty. The writeout will clean the page and > > notice that the metadata is now unused and may be deallocated (because > > it appears clean as set_page_dirty hasn't been called yet). So at this > > point the page may be dirtied via the pte without enough metadata to be > > able to write it back. > > > > Sage needs this race closed for ceph, and Trond maybe for NFS. > > I ran a few tests and this fixes the problem for me (although fyi the > patch didn't apply cleanly on top of your previously posted page_mkwrite > prototype change patch, due to some differences in block_page_mkwrite). What is "the problem"? Can we get "the problem"'s description included in the changelog? The patch is fairly ugly, somewhat costly and makes things (even) more complex. Sigh. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org