From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave Chinner <david@fromorbit.com>
Subject: Re: hole-punch vs fault
Date: Thu, 28 Nov 2013 15:22:48 +1100
Message-ID: <20131128042248.GP8803@dastard>
References: <20131127134831.GI24288@parisc-linux.org>
 <20131127221932.GA27330@quack.suse.cz>
 <20131128023343.GJ24288@parisc-linux.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org
To: Matthew Wilcox <matthew@wil.cx>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from ipmail06.adl6.internode.on.net ([150.101.137.145]:13736 "EHLO
	ipmail06.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1754314Ab3K1EXT (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 27 Nov 2013 23:23:19 -0500
Content-Disposition: inline
In-Reply-To: <20131128023343.GJ24288@parisc-linux.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, Nov 27, 2013 at 07:33:43PM -0700, Matthew Wilcox wrote:
> On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > > The second is to put some kind of generation counter in the inode or
> > > address_space that is incremented on every deallocation.  The fault path
> > > would read the generation counter before calling find_get_page() and then
> > > check it hasn't changed after getting the page lock (goto retry_find).  I
> > > like that one a little more since we can fix it all in common code, and I
> > > think it'll be lower overhead in the fault path.
> >   I'm not sure I understand how this should work. After pagecache is
> > truncated, fault can come and happily instantiate the page again. After
> > that fault is done, hole punching awakes and removes blocks from under that
> > page. So checking the generation counter after you get the page lock seems
> > useless to me.
> 
> Yeah, I don't think the page lock is enough (maybe someone can convince
> me otherwise).  How does this look?  Checking the generation number
> after we get i_mutex ensures that the truncate has finished running.

But there's no guarantee that whoever is removing the pages from the
page cache is holding the i_mutex (XFS hole punching and most page
cache operations serialise on XFS_IOLOCK_EXCL, not i_mutex), so you
can't use i_mutex in this way for generic code.

Besides...

> @@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	pgoff_t offset = vmf->pgoff;
>  	struct page *page;
>  	pgoff_t size;
> +	unsigned damage = mapping_damage(mapping);
>  	int ret = 0;
>  
>  	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> @@ -1676,6 +1677,18 @@ retry_find:
>  		return VM_FAULT_SIGBUS;
>  	}
>  
> +	/*
> +	 * Check if we were the unlucky victim of a holepunch
> +	 */
> +	mutex_lock(&inode->i_mutex);
> +	if (unlikely(mapping_is_damaged(mapping, damage))) {
> +		mutex_unlock(&inode->i_mutex);
> +		unlock_page(page);
> +		page_cache_release(page);
> +		damage = mapping_damage(mapping);
> +		goto retry_find;
> +	}

... aren't we holding the mmap_sem here? i.e. taking the i_mutex
here will lead to deadlocks....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com