From mboxrd@z Thu Jan 1 00:00:00 1970 From: Matthew Wilcox Subject: Re: hole-punch vs fault Date: Wed, 27 Nov 2013 21:44:54 -0700 Message-ID: <20131128044454.GL24288@parisc-linux.org> References: <20131127134831.GI24288@parisc-linux.org> <20131127221932.GA27330@quack.suse.cz> <20131128023343.GJ24288@parisc-linux.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-fsdevel@vger.kernel.org To: Jan Kara Return-path: Received: from palinux.external.hp.com ([192.25.206.14]:39495 "EHLO mail.parisc-linux.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754434Ab3K1Eo4 (ORCPT ); Wed, 27 Nov 2013 23:44:56 -0500 Content-Disposition: inline In-Reply-To: <20131128023343.GJ24288@parisc-linux.org> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote: > On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote: > > > The second is to put some kind of generation counter in the inode or > > > address_space that is incremented on every deallocation. The fault path > > > would read the generation counter before calling find_get_page() and then > > > check it hasn't changed after getting the page lock (goto retry_find). I > > > like that one a little more since we can fix it all in common code, and I > > > think it'll be lower overhead in the fault path. > > I'm not sure I understand how this should work. After pagecache is > > truncated, fault can come and happily instantiate the page again. After > > that fault is done, hole punching awakes and removes blocks from under that > > page. So checking the generation counter after you get the page lock seems > > useless to me. > > Yeah, I don't think the page lock is enough (maybe someone can convince > me otherwise). How does this look? Checking the generation number > after we get i_mutex ensures that the truncate has finished running. Ohh. We can't take i_mutex because that's an inversion with mmap_sem. So option 1 is to convince ourselves that page lock is enough (and that doesn't solve the problem for ext2-xip). Option 2 is to do something similar to seqcount_t. We can't use seqcount_t as-is because it spins in read_seqcount_begin, and it could be spinning for a very long time while we read a page in from disc! How about something like this: typedef struct seqtex { unsigned sequence; spinlock_t wait_lock; struct list_head wait_list; #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif } Take seqtex_read_lock() at the beginning of filemap_fault(), release it after we have the page locked (since truncate_inode_pages_range will block on the pagelock before tearing down the data structures). Take seqtex_write_lock() at the start of truncate & holepunch operations, drop it at the end of truncate & holepunch operations. seqtex_read_lock blocks if we're inside a write lock. seqtex_read_unlock tells us whether to try again. write_unlock wakes anybody waiting. write_lock doesn't block because we're assuming the i_mutex is held above us, so we can only have one writer at a time (similar to seqcount_t). I keep hoping for an option 3. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step."