From: Matthew Wilcox <matthew@wil.cx>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: hole-punch vs fault
Date: Wed, 27 Nov 2013 19:33:43 -0700
Message-ID: <20131128023343.GJ24288@parisc-linux.org>
In-Reply-To: <20131127221932.GA27330@quack.suse.cz>
On Wed, Nov 27, 2013 at 11:19:32PM +0100, Jan Kara wrote:
> > The second is to put some kind of generation counter in the inode or
> > address_space that is incremented on every deallocation. The fault path
> > would read the generation counter before calling find_get_page() and then
> > check it hasn't changed after getting the page lock (goto retry_find). I
> > like that one a little more since we can fix it all in common code, and I
> > think it'll be lower overhead in the fault path.
> I'm not sure I understand how this should work. After pagecache is
> truncated, fault can come and happily instantiate the page again. After
> that fault is done, hole punching awakes and removes blocks from under that
> page. So checking the generation counter after you get the page lock seems
> useless to me.
Yeah, I don't think the page lock is enough (maybe someone can convince
me otherwise). How does this look? Checking the generation number
after taking i_mutex ensures that any concurrent hole punch, which holds
i_mutex for its whole duration, has finished running.
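Below is a small userspace model of the same protocol, to make the ordering concrete. It is a sketch under stated assumptions, not kernel code:

- a pthread mutex stands in for i_mutex;
- C11 atomics stand in for ACCESS_ONCE()/smp_rmb()/smp_wmb();
- every model_* name is hypothetical.

The fault side samples the counter, does its lookup, then takes the lock and compares. A punch that was in progress holds the lock, so the comparison cannot happen until the punch has bumped the counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of the damage counter (all names hypothetical).
 * inode_lock stands in for i_mutex, damaged for mapping->i_damaged,
 * and block_present for the filesystem's block map.
 */
struct model_mapping {
	pthread_mutex_t inode_lock;
	atomic_uint damaged;
	atomic_bool block_present;
};

static void model_init(struct model_mapping *m)
{
	int rc = pthread_mutex_init(&m->inode_lock, NULL);
	assert(rc == 0);
	(void)rc;
	atomic_init(&m->damaged, 0);
	atomic_init(&m->block_present, true);
}

/* Like mapping_damage(): acquire orders later lookups after the sample. */
static unsigned model_mapping_damage(struct model_mapping *m)
{
	return atomic_load_explicit(&m->damaged, memory_order_acquire);
}

/* Like a hole punch that ends with damage_mapping(), all under the lock. */
static void model_punch_hole(struct model_mapping *m)
{
	pthread_mutex_lock(&m->inode_lock);
	atomic_store_explicit(&m->block_present, false, memory_order_relaxed);
	/* release orders the block removal before the counter bump */
	atomic_fetch_add_explicit(&m->damaged, 1, memory_order_release);
	pthread_mutex_unlock(&m->inode_lock);
}

/*
 * Like the fault path: sample, look up, then recheck under inode_lock.
 * 'race' (may be NULL) runs once between the lookup and taking the lock,
 * standing in for a concurrent hole punch. Returns the number of retries.
 */
static int model_fault(struct model_mapping *m, bool *found,
		       void (*race)(struct model_mapping *))
{
	unsigned seq = model_mapping_damage(m);
	int retries = 0;

	for (;;) {
		bool present = atomic_load_explicit(&m->block_present,
						    memory_order_relaxed);
		if (race) {
			race(m);
			race = NULL;	/* interfere only once */
		}
		/* A punch in progress holds the lock, so this waits for it. */
		pthread_mutex_lock(&m->inode_lock);
		unsigned now = atomic_load_explicit(&m->damaged,
						    memory_order_relaxed);
		if (now != seq) {
			/* Damaged since our sample: discard the lookup, retry. */
			seq = now;
			pthread_mutex_unlock(&m->inode_lock);
			retries++;
			continue;
		}
		pthread_mutex_unlock(&m->inode_lock);
		*found = present;
		return retries;
	}
}
```

Usage: model_fault(&m, &found, NULL) returns 0 with found still true. Passing model_punch_hole as the race makes the first lookup see the block, but the recheck sees the bumped counter. The fault therefore retries once and then reports the hole.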
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 54eed4f..df6278b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3648,6 +3648,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 					    stop_block);
 
 	ext4_discard_preallocations(inode);
+	damage_mapping(mapping);
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a04525..190f38c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -415,6 +415,7 @@ struct address_space {
 	struct radix_tree_root	page_tree;	/* radix tree of all pages */
 	spinlock_t		tree_lock;	/* and lock protecting it */
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
+	unsigned		i_damaged;	/* damage count */
 	struct rb_root		i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
 	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
@@ -503,6 +504,31 @@ static inline int mapping_writably_mapped(struct address_space *mapping)
 }
 
 /*
+ * A mapping is damaged when blocks are removed from the filesystem's
+ * data structures.
+ */
+static inline unsigned mapping_damage(struct address_space *mapping)
+{
+	unsigned seq = ACCESS_ONCE(mapping->i_damaged);
+	smp_rmb();	/* Subsequent reads of damageable data structures */
+	return seq;
+}
+
+/* Must be called with i_mutex held */
+static inline bool
+mapping_is_damaged(struct address_space *mapping, unsigned seq)
+{
+	return mapping->i_damaged != seq;
+}
+
+/* Must be called with i_mutex held */
+static inline void damage_mapping(struct address_space *mapping)
+{
+	smp_wmb();	/* Prior writes to damageable data structures */
+	mapping->i_damaged++;
+}
+
+/*
  * Use sequence counter to get consistent i_size on 32-bit processors.
  */
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
diff --git a/mm/filemap.c b/mm/filemap.c
index b7749a9..71c936d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pgoff_t offset = vmf->pgoff;
 	struct page *page;
 	pgoff_t size;
+	unsigned damage = mapping_damage(mapping);
 	int ret = 0;
 
 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
@@ -1676,6 +1677,19 @@ retry_find:
 		return VM_FAULT_SIGBUS;
 	}
 
+	/*
+	 * Check whether we were the unlucky victim of a hole punch
+	 */
+	mutex_lock(&inode->i_mutex);
+	if (unlikely(mapping_is_damaged(mapping, damage))) {
+		mutex_unlock(&inode->i_mutex);
+		unlock_page(page);
+		page_cache_release(page);
+		damage = mapping_damage(mapping);
+		goto retry_find;
+	}
+	mutex_unlock(&inode->i_mutex);
+
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
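The patch samples the counter at the top of filemap_fault(), before the page lookup. It also re-samples before jumping back to retry_find. The single-threaded sketch below, with hypothetical names, shows why that ordering matters when a punch completes between the lookup and the sample:

- If the sample comes first, the recheck catches the stale lookup.
- If the sample comes after the punch, the stale lookup goes unnoticed.

```c
#include <stdbool.h>

/*
 * Hypothetical single-threaded model: gen stands in for i_damaged,
 * block for whether the looked-up page still has blocks behind it.
 */
struct snap {
	unsigned gen;
	bool block;
};

static void snap_punch(struct snap *s)
{
	s->block = false;	/* blocks removed */
	s->gen++;		/* then the damage count is bumped */
}

/* Ordering used by the patch: sample, look up, punch completes, recheck. */
static bool sample_before_lookup_detects(void)
{
	struct snap s = { .gen = 0, .block = true };
	unsigned seq = s.gen;
	bool seen = s.block;	/* lookup sees a block that is about to go */
	snap_punch(&s);
	return seen && s.gen != seq;	/* stale lookup is caught */
}

/* Wrong ordering: look up, punch completes, sample, recheck. */
static bool sample_after_lookup_detects(void)
{
	struct snap s = { .gen = 0, .block = true };
	bool seen = s.block;
	snap_punch(&s);
	unsigned seq = s.gen;	/* the sample already includes the punch */
	return seen && s.gen != seq;	/* always false: stale lookup missed */
}
```

The same reasoning covers the retry path: re-sampling before goto retry_find means the next lookup is compared against a counter value taken before it.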
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."