Re: hole-punch vs fault - Matthew Wilcox

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <matthew@wil.cx>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: hole-punch vs fault
Date: Wed, 27 Nov 2013 19:33:43 -0700	[thread overview]
Message-ID: <20131128023343.GJ24288@parisc-linux.org> (raw)
In-Reply-To: <20131127221932.GA27330@quack.suse.cz>

On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > The second is to put some kind of generation counter in the inode or
> > address_space that is incremented on every deallocation.  The fault path
> > would read the generation counter before calling find_get_page() and then
> > check it hasn't changed after getting the page lock (goto retry_find).  I
> > like that one a little more since we can fix it all in common code, and I
> > think it'll be lower overhead in the fault path.
>   I'm not sure I understand how this should work. After pagecache is
> truncated, fault can come and happily instantiate the page again. After
> that fault is done, hole punching awakes and removes blocks from under that
> page. So checking the generation counter after you get the page lock seems
> useless to me.

Yeah, I don't think the page lock is enough (maybe someone can convince
me otherwise).  How does this look?  Checking the generation number
after we get i_mutex ensures that the truncate has finished running.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54eed4f..df6278b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3648,6 +3648,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 					    stop_block);
 
 	ext4_discard_preallocations(inode);
+	damage_mapping(mapping);
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a04525..190f38c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -415,6 +415,7 @@ struct address_space {
 	struct radix_tree_root	page_tree;	/* radix tree of all pages */
 	spinlock_t		tree_lock;	/* and lock protecting it */
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
+	unsigned		i_damaged;	/* damage count */
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
@@ -503,6 +504,31 @@ static inline int mapping_writably_mapped(struct address_space *mapping)
 }
 
 /*
+ * A mapping is damaged when blocks are removed from the filesystem's
+ * data structures.
+ */
+static inline unsigned mapping_damage(struct address_space *mapping)
+{
+	unsigned seq = ACCESS_ONCE(mapping->i_damaged);
+	smp_rmb();	/* Subsequent reads of damagable data structures */
+	return seq;
+}
+
+/* Must be called with i_mutex held */
+static inline bool
+mapping_is_damaged(struct address_space *mapping, unsigned seq)
+{
+	return mapping->i_damaged != seq;
+}
+
+/* Must be called with i_mutex held */
+static inline void damage_mapping(struct address_space *mapping)
+{
+	smp_wmb();	/* Prior writes to damagable data structures */
+	mapping->i_damaged++;
+}
+
+/*
  * Use sequence counter to get consistent i_size on 32-bit processors.
  */
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
diff --git a/mm/filemap.c b/mm/filemap.c
index b7749a9..71c936d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pgoff_t offset = vmf->pgoff;
 	struct page *page;
 	pgoff_t size;
+	unsigned damage = mapping_damage(mapping);
 	int ret = 0;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
@@ -1676,6 +1677,18 @@ retry_find:
 		return VM_FAULT_SIGBUS;
 	}
 
+	/*
+	 * Check if we were the unlucky victim of a holepunch
+	 */
+	mutex_lock(&inode->i_mutex);
+	if (unlikely(mapping_is_damaged(mapping, damage))) {
+		mutex_unlock(&inode->i_mutex);
+		unlock_page(page);
+		page_cache_release(page);
+		damage = mapping_damage(mapping);
+		goto retry_find;
+	}
+
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

next prev parent reply	other threads:[~2013-11-28  2:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-27 13:48 hole-punch vs fault Matthew Wilcox
2013-11-27 22:19 ` Jan Kara
2013-11-28  2:33   ` Matthew Wilcox [this message]
2013-11-28  3:30     ` Matthew Wilcox
2013-11-28  4:22     ` Dave Chinner
2013-11-28  4:44     ` Matthew Wilcox
2013-11-28 12:24       ` Matthew Wilcox
2013-11-28 22:12       ` Dave Chinner
2013-11-29 13:11         ` Matthew Wilcox
2013-12-01 21:52           ` Dave Chinner
2013-12-02  8:33             ` Jan Kara
2013-12-02 15:58               ` Matthew Wilcox
2013-12-02 20:11                 ` Jan Kara
2013-12-02 20:13                 ` Matthew Wilcox
2013-12-02 23:13                   ` Jan Kara

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:54eed4f dfblob:df6278b dfblob:1a04525 dfblob:190f38c
dfblob:b7749a9 dfblob:71c936d )
 OR (
bs:"Re: hole-punch vs fault" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131128023343.GJ24288@parisc-linux.org \
    --to=matthew@wil.cx \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).