From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Chinner Subject: Re: hole-punch vs fault Date: Thu, 28 Nov 2013 15:22:48 +1100 Message-ID: <20131128042248.GP8803@dastard> References: <20131127134831.GI24288@parisc-linux.org> <20131127221932.GA27330@quack.suse.cz> <20131128023343.GJ24288@parisc-linux.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jan Kara , linux-fsdevel@vger.kernel.org To: Matthew Wilcox Return-path: Received: from ipmail06.adl6.internode.on.net ([150.101.137.145]:13736 "EHLO ipmail06.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754314Ab3K1EXT (ORCPT ); Wed, 27 Nov 2013 23:23:19 -0500 Content-Disposition: inline In-Reply-To: <20131128023343.GJ24288@parisc-linux.org> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote: > On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote: > > > The second is to put some kind of generation counter in the inode or > > > address_space that is incremented on every deallocation. The fault path > > > would read the generation counter before calling find_get_page() and then > > > check it hasn't changed after getting the page lock (goto retry_find). I > > > like that one a little more since we can fix it all in common code, and I > > > think it'll be lower overhead in the fault path. > > I'm not sure I understand how this should work. After pagecache is > > truncated, fault can come and happily instantiate the page again. After > > that fault is done, hole punching awakes and removes blocks from under that > > page. So checking the generation counter after you get the page lock seems > > useless to me. > > Yeah, I don't think the page lock is enough (maybe someone can convince > me otherwise). How does this look? Checking the generation number > after we get i_mutex ensures that the truncate has finished running. But there's no guarantee that whoever is removing the pages from the page cache is holding the i_mutex (XFS hole punching and most page cache operations serialise on XFS_IOLOCK_EXCL, not i_mutex), so you can't use i_mutex in this way for generic code. Besides... > @@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > pgoff_t offset = vmf->pgoff; > struct page *page; > pgoff_t size; > + unsigned damage = mapping_damage(mapping); > int ret = 0; > > size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; > @@ -1676,6 +1677,18 @@ retry_find: > return VM_FAULT_SIGBUS; > } > > + /* > + * Check if we were the unlucky victim of a holepunch > + */ > + mutex_lock(&inode->i_mutex); > + if (unlikely(mapping_is_damaged(mapping, damage))) { > + mutex_unlock(&inode->i_mutex); > + unlock_page(page); > + page_cache_release(page); > + damage = mapping_damage(mapping); > + goto retry_find; > + } ... aren't we holding the mmap_sem here? i.e. taking the i_mutex here will lead to deadlocks.... Cheers, Dave. -- Dave Chinner david@fromorbit.com