* Add invalidatepage_range address space operation
@ 2012-07-27 8:00 Lukas Czerner
2012-07-27 8:01 ` [PATCH 01/15] mm: add " Lukas Czerner
` (15 more replies)
0 siblings, 16 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:00 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc
This patch set allows truncate_inode_pages_range() to handle
ranges which are not aligned at the end of the page. Currently it will
hit a BUG_ON() when the end of the range is not page aligned. The punch
hole feature can benefit from this ability, saving file systems some
work by not forcing them to implement their own invalidation code to
handle unaligned ranges.
In order for this to work we need a new address space operation,
invalidatepage_range, which is able to handle page invalidation with
both an offset and a length specified.
patch 01: Implements the new invalidatepage_range address space
operation in the mm layer
patch 02 - 05: Wire the new invalidatepage_range aop to the ext4, xfs and
ocfs2 file systems (which are currently the only file
systems supporting punch hole) and implement this
functionality for jbd2 as well.
patch 06: Change truncate_inode_pages_range() to handle unaligned
ranges.
patch 07 - 15: Ext4-specific changes which benefit from the
previous truncate_inode_pages_range() change. Not all
are related specifically to this change, but all are
related to the punch hole feature.
Thanks!
-Lukas
[PATCH 01/15] mm: add invalidatepage_range address space operation
[PATCH 02/15] jbd2: implement jbd2_journal_invalidatepage_range
[PATCH 03/15] ext4: implement invalidatepage_range aop
[PATCH 04/15] xfs: implement invalidatepage_range aop
[PATCH 05/15] ocfs2: implement invalidatepage_range aop
[PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non
[PATCH 07/15] ext4: Take i_mutex before punching hole
[PATCH 08/15] Revert "ext4: remove no longer used functions in
[PATCH 09/15] Revert "ext4: fix fsx truncate failure"
[PATCH 10/15] ext4: use ext4_zero_partial_blocks in punch_hole
[PATCH 11/15] ext4: remove unused discard_partial_page_buffers
[PATCH 12/15] ext4: remove unused code from ext4_remove_blocks()
[PATCH 13/15] ext4: update ext4_ext_remove_space trace point
[PATCH 14/15] ext4: make punch hole code path work with bigalloc
[PATCH 15/15] ext4: Allow punch hole with bigalloc enabled
fs/buffer.c | 23 +++-
fs/ext4/ext4.h | 14 +-
fs/ext4/extents.c | 215 +++++++++++-------------------
fs/ext4/indirect.c | 13 +--
fs/ext4/inode.c | 316 +++++++++++++++++++------------------------
fs/jbd2/journal.c | 1 +
fs/jbd2/transaction.c | 20 +++-
fs/ocfs2/aops.c | 7 +-
fs/xfs/xfs_aops.c | 14 +-
fs/xfs/xfs_trace.h | 41 ++++++-
include/linux/buffer_head.h | 2 +
include/linux/fs.h | 2 +
include/linux/jbd2.h | 2 +
include/linux/mm.h | 2 +
include/trace/events/ext4.h | 55 +++++---
mm/truncate.c | 107 +++++++++++-----
16 files changed, 437 insertions(+), 397 deletions(-)
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 01/15] mm: add invalidatepage_range address space operation
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-08-20 5:24 ` Hugh Dickins
2012-07-27 8:01 ` [PATCH 02/15] jbd2: implement jbd2_journal_invalidatepage_range Lukas Czerner
` (14 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner, Andrew Morton
Currently there is no way to truncate a partial page where the end
truncation point is not at the end of the page. This is because it was not
needed and the existing functionality was enough for file system truncate
operations to work properly. However, more file systems now support the
punch hole feature, which can benefit from mm supporting truncation of a
page just up to a certain point.
Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating a partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).
This commit adds a new address space operation, invalidatepage_range,
which allows specifying the length in bytes to invalidate, rather than
assuming truncation to the end of the page. It also introduces the
block_invalidatepage_range() and do_invalidatepage_range() functions for
exactly the same reason.
The caller does not have to implement both aops (invalidatepage and
invalidatepage_range); the latter is preferred. The old method will be
used only if invalidatepage_range is not implemented by the caller.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
---
fs/buffer.c | 23 ++++++++++++++++++++++-
include/linux/buffer_head.h | 2 ++
include/linux/fs.h | 2 ++
include/linux/mm.h | 2 ++
mm/truncate.c | 30 ++++++++++++++++++++++++++----
5 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index c7062c8..5937f30 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1457,13 +1457,27 @@ static void discard_buffer(struct buffer_head * bh)
*/
void block_invalidatepage(struct page *page, unsigned long offset)
{
+ block_invalidatepage_range(page, offset, PAGE_CACHE_SIZE);
+}
+EXPORT_SYMBOL(block_invalidatepage);
+
+void block_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length)
+{
struct buffer_head *head, *bh, *next;
unsigned int curr_off = 0;
+ unsigned long stop = length + offset;
BUG_ON(!PageLocked(page));
if (!page_has_buffers(page))
goto out;
+ /*
+ * Check for overflow
+ */
+ if (stop < length)
+ stop = PAGE_CACHE_SIZE;
+
head = page_buffers(page);
bh = head;
do {
@@ -1471,6 +1485,12 @@ void block_invalidatepage(struct page *page, unsigned long offset)
next = bh->b_this_page;
/*
+ * Are we still fully in range ?
+ */
+ if (next_off > stop)
+ goto out;
+
+ /*
* is this block fully invalidated?
*/
if (offset <= curr_off)
@@ -1489,7 +1509,8 @@ void block_invalidatepage(struct page *page, unsigned long offset)
out:
return;
}
-EXPORT_SYMBOL(block_invalidatepage);
+EXPORT_SYMBOL(block_invalidatepage_range);
+
/*
* We attach and possibly dirty the buffers atomically wrt
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..9d55645 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -194,6 +194,8 @@ extern int buffer_heads_over_limit;
* address_spaces.
*/
void block_invalidatepage(struct page *page, unsigned long offset);
+void block_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length);
int block_write_full_page(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
int block_write_full_page_endio(struct page *page, get_block_t *get_block,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8fabb03..b9eaf0c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -620,6 +620,8 @@ struct address_space_operations {
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidatepage) (struct page *, unsigned long);
+ void (*invalidatepage_range) (struct page *, unsigned long,
+ unsigned long);
int (*releasepage) (struct page *, gfp_t);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f9f279c..2db6a29 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -998,6 +998,8 @@ struct page *get_dump_page(unsigned long addr);
extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
extern void do_invalidatepage(struct page *page, unsigned long offset);
+extern void do_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length);
int __set_page_dirty_nobuffers(struct page *page);
int __set_page_dirty_no_writeback(struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 75801ac..e29e5ea 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -39,14 +39,36 @@
*/
void do_invalidatepage(struct page *page, unsigned long offset)
{
+ do_invalidatepage_range(page, offset, PAGE_CACHE_SIZE);
+}
+
+void do_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length)
+{
+ void (*invalidatepage_range)(struct page *, unsigned long,
+ unsigned long);
void (*invalidatepage)(struct page *, unsigned long);
+
+ /*
+ * Try invalidatepage_range first
+ */
+ invalidatepage_range = page->mapping->a_ops->invalidatepage_range;
+ if (invalidatepage_range)
+ (*invalidatepage_range)(page, offset, length);
+
+ /*
+ * When only invalidatepage is registered length must be
+ * PAGE_CACHE_SIZE
+ */
invalidatepage = page->mapping->a_ops->invalidatepage;
+ if (invalidatepage) {
+ BUG_ON(length != PAGE_CACHE_SIZE);
+ (*invalidatepage)(page, offset);
+ }
#ifdef CONFIG_BLOCK
- if (!invalidatepage)
- invalidatepage = block_invalidatepage;
+ if (!invalidatepage_range && !invalidatepage)
+ block_invalidatepage_range(page, offset, length);
#endif
- if (invalidatepage)
- (*invalidatepage)(page, offset);
}
static inline void truncate_partial_page(struct page *page, unsigned partial)
--
1.7.7.6
* [PATCH 02/15] jbd2: implement jbd2_journal_invalidatepage_range
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
2012-07-27 8:01 ` [PATCH 01/15] mm: add " Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 03/15] ext4: implement invalidatepage_range aop Lukas Czerner
` (13 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
mm now supports the invalidatepage_range address space operation, and
there are two file systems using jbd2 that also implement the punch hole
feature and can benefit from this. We need to implement the same thing
in the jbd2 layer in order to allow those file systems to take advantage
of this functionality.
With the new function jbd2_journal_invalidatepage_range() we can now
specify the length to invalidate, rather than assuming invalidation to
the end of the page.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/jbd2/journal.c | 1 +
fs/jbd2/transaction.c | 20 ++++++++++++++++++--
include/linux/jbd2.h | 2 ++
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e9a3c4c..a69edc1 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -86,6 +86,7 @@ EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
EXPORT_SYMBOL(jbd2_journal_wipe);
EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
EXPORT_SYMBOL(jbd2_journal_invalidatepage);
+EXPORT_SYMBOL(jbd2_journal_invalidatepage_range);
EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
EXPORT_SYMBOL(jbd2_journal_force_commit);
EXPORT_SYMBOL(jbd2_journal_file_inode);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index fb1ab953..225c6ba 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1993,10 +1993,20 @@ zap_buffer_unlocked:
*
*/
void jbd2_journal_invalidatepage(journal_t *journal,
- struct page *page,
- unsigned long offset)
+ struct page *page,
+ unsigned long offset)
+{
+ jbd2_journal_invalidatepage_range(journal, page, offset,
+ PAGE_CACHE_SIZE);
+}
+
+void jbd2_journal_invalidatepage_range(journal_t *journal,
+ struct page *page,
+ unsigned long offset,
+ unsigned long length)
{
struct buffer_head *head, *bh, *next;
+ unsigned long stop = offset + length;
unsigned int curr_off = 0;
int may_free = 1;
@@ -2005,6 +2015,9 @@ void jbd2_journal_invalidatepage(journal_t *journal,
if (!page_has_buffers(page))
return;
+ if (stop < length)
+ stop = PAGE_CACHE_SIZE;
+
/* We will potentially be playing with lists other than just the
* data lists (especially for journaled data mode), so be
* cautious in our locking. */
@@ -2014,6 +2027,9 @@ void jbd2_journal_invalidatepage(journal_t *journal,
unsigned int next_off = curr_off + bh->b_size;
next = bh->b_this_page;
+ if (next_off > stop)
+ return;
+
if (offset <= curr_off) {
/* This block is wholly outside the truncation point */
lock_buffer(bh);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f334c7f..c42c3436 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1101,6 +1101,8 @@ extern int jbd2_journal_forget (handle_t *, struct buffer_head *);
extern void journal_sync_buffer (struct buffer_head *);
extern void jbd2_journal_invalidatepage(journal_t *,
struct page *, unsigned long);
+extern void jbd2_journal_invalidatepage_range(journal_t *, struct page *,
+ unsigned long, unsigned long);
extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
extern int jbd2_journal_stop(handle_t *);
extern int jbd2_journal_flush (journal_t *);
--
1.7.7.6
* [PATCH 03/15] ext4: implement invalidatepage_range aop
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
2012-07-27 8:01 ` [PATCH 01/15] mm: add " Lukas Czerner
2012-07-27 8:01 ` [PATCH 02/15] jbd2: implement jbd2_journal_invalidatepage_range Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 04/15] xfs: " Lukas Czerner
` (12 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
mm now supports the invalidatepage_range address space operation, which
is useful to allow truncating a page range that is not aligned to the
end of the page. This will help the punch hole implementation once
truncate_inode_pages_range() is modified to allow this as well.
With this commit ext4 now registers only invalidatepage_range. Also
change the respective tracepoint to print the length of the range.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/inode.c | 58 ++++++++++++++++++++++++++++++------------
include/trace/events/ext4.h | 13 ++++++---
2 files changed, 49 insertions(+), 22 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 02bc8cb..1b8c317 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -131,7 +131,8 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
new_size);
}
-static void ext4_invalidatepage(struct page *page, unsigned long offset);
+static void ext4_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length);
static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
@@ -1262,20 +1263,28 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
}
static void ext4_da_page_release_reservation(struct page *page,
- unsigned long offset)
+ unsigned long offset,
+ unsigned long length)
{
int to_release = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ unsigned long stop = offset + length;
int num_clusters;
+ if (stop < length)
+ stop = PAGE_CACHE_SIZE;
+
head = page_buffers(page);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
+ if (next_off > stop)
+ break;
+
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
@@ -2608,7 +2617,9 @@ static int ext4_da_write_end(struct file *file,
return ret ? ret : copied;
}
-static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_da_invalidatepage_range(struct page *page,
+ unsigned long offset,
+ unsigned long length)
{
/*
* Drop reserved blocks
@@ -2617,10 +2628,10 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
if (!page_has_buffers(page))
goto out;
- ext4_da_page_release_reservation(page, offset);
+ ext4_da_page_release_reservation(page, offset, length);
out:
- ext4_invalidatepage(page, offset);
+ ext4_invalidatepage_range(page, offset, length);
return;
}
@@ -2746,47 +2757,60 @@ ext4_readpages(struct file *file, struct address_space *mapping,
return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
}
-static void ext4_invalidatepage_free_endio(struct page *page, unsigned long offset)
+static void ext4_invalidatepage_free_endio(struct page *page,
+ unsigned long offset,
+ unsigned long length)
{
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
+ unsigned long stop = offset + length;
if (!page_has_buffers(page))
return;
+ if (stop < length)
+ stop = PAGE_CACHE_SIZE;
+
head = bh = page_buffers(page);
do {
+ unsigned int next_off = curr_off + bh->b_size;
+
+ if (next_off > stop)
+ return;
+
if (offset <= curr_off && test_clear_buffer_uninit(bh)
&& bh->b_private) {
ext4_free_io_end(bh->b_private);
bh->b_private = NULL;
bh->b_end_io = NULL;
}
- curr_off = curr_off + bh->b_size;
+ curr_off = next_off;
bh = bh->b_this_page;
} while (bh != head);
}
-static void ext4_invalidatepage(struct page *page, unsigned long offset)
+static void ext4_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length)
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);
- trace_ext4_invalidatepage(page, offset);
+ trace_ext4_invalidatepage_range(page, offset, length);
/*
* free any io_end structure allocated for buffers to be discarded
*/
if (ext4_should_dioread_nolock(page->mapping->host))
- ext4_invalidatepage_free_endio(page, offset);
+ ext4_invalidatepage_free_endio(page, offset, length);
/*
* If it's a full truncate we just forget about the pending dirtying
*/
- if (offset == 0)
+ if (offset == 0 && length >= PAGE_CACHE_SIZE)
ClearPageChecked(page);
if (journal)
- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage_range(journal, page,
+ offset, length);
else
- block_invalidatepage(page, offset);
+ block_invalidatepage_range(page, offset, length);
}
static int ext4_releasepage(struct page *page, gfp_t wait)
@@ -3101,7 +3125,7 @@ static const struct address_space_operations ext4_ordered_aops = {
.write_begin = ext4_write_begin,
.write_end = ext4_ordered_write_end,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage_range = ext4_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.migratepage = buffer_migrate_page,
@@ -3116,7 +3140,7 @@ static const struct address_space_operations ext4_writeback_aops = {
.write_begin = ext4_write_begin,
.write_end = ext4_writeback_write_end,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage_range = ext4_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.migratepage = buffer_migrate_page,
@@ -3132,7 +3156,7 @@ static const struct address_space_operations ext4_journalled_aops = {
.write_end = ext4_journalled_write_end,
.set_page_dirty = ext4_journalled_set_page_dirty,
.bmap = ext4_bmap,
- .invalidatepage = ext4_invalidatepage,
+ .invalidatepage_range = ext4_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.is_partially_uptodate = block_is_partially_uptodate,
@@ -3147,7 +3171,7 @@ static const struct address_space_operations ext4_da_aops = {
.write_begin = ext4_da_write_begin,
.write_end = ext4_da_write_end,
.bmap = ext4_bmap,
- .invalidatepage = ext4_da_invalidatepage,
+ .invalidatepage_range = ext4_da_invalidatepage_range,
.releasepage = ext4_releasepage,
.direct_IO = ext4_direct_IO,
.migratepage = buffer_migrate_page,
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 69d8a69..30bae72 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -450,14 +450,15 @@ DEFINE_EVENT(ext4__page_op, ext4_releasepage,
TP_ARGS(page)
);
-TRACE_EVENT(ext4_invalidatepage,
- TP_PROTO(struct page *page, unsigned long offset),
+TRACE_EVENT(ext4_invalidatepage_range,
+ TP_PROTO(struct page *page, unsigned long offset, unsigned long length),
- TP_ARGS(page, offset),
+ TP_ARGS(page, offset, length),
TP_STRUCT__entry(
__field( pgoff_t, index )
__field( unsigned long, offset )
+ __field( unsigned long, length )
__field( ino_t, ino )
__field( dev_t, dev )
@@ -466,14 +467,16 @@ TRACE_EVENT(ext4_invalidatepage,
TP_fast_assign(
__entry->index = page->index;
__entry->offset = offset;
+ __entry->length = length;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),
- TP_printk("dev %d,%d ino %lu page_index %lu offset %lu",
+ TP_printk("dev %d,%d ino %lu page_index %lu offset %lu length %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- (unsigned long) __entry->index, __entry->offset)
+ (unsigned long) __entry->index,
+ __entry->offset, __entry->length)
);
TRACE_EVENT(ext4_discard_blocks,
--
1.7.7.6
* [PATCH 04/15] xfs: implement invalidatepage_range aop
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (2 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 03/15] ext4: implement invalidatepage_range aop Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 05/15] ocfs2: " Lukas Czerner
` (11 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner, xfs
mm now supports the invalidatepage_range address space operation, which
is useful to allow truncating a page range that is not aligned to the
end of the page. This will help the punch hole implementation once
truncate_inode_pages_range() is modified to allow this as well.
With this commit xfs now registers only invalidatepage_range.
Additionally, update the respective trace point.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: xfs@oss.sgi.com
---
fs/xfs/xfs_aops.c | 14 ++++++++------
fs/xfs/xfs_trace.h | 41 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8dad722..7aa3d68 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -798,12 +798,14 @@ xfs_cluster_write(
}
STATIC void
-xfs_vm_invalidatepage(
+xfs_vm_invalidatepage_range(
struct page *page,
- unsigned long offset)
+ unsigned long offset,
+ unsigned long length)
{
- trace_xfs_invalidatepage(page->mapping->host, page, offset);
- block_invalidatepage(page, offset);
+ trace_xfs_invalidatepage_range(page->mapping->host, page, offset,
+ length);
+ block_invalidatepage_range(page, offset, length);
}
/*
@@ -867,7 +869,7 @@ next_buffer:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
out_invalidate:
- xfs_vm_invalidatepage(page, 0);
+ xfs_vm_invalidatepage_range(page, 0, PAGE_CACHE_SIZE);
return;
}
@@ -1613,7 +1615,7 @@ const struct address_space_operations xfs_address_space_operations = {
.writepage = xfs_vm_writepage,
.writepages = xfs_vm_writepages,
.releasepage = xfs_vm_releasepage,
- .invalidatepage = xfs_vm_invalidatepage,
+ .invalidatepage_range = xfs_vm_invalidatepage_range,
.write_begin = xfs_vm_write_begin,
.write_end = xfs_vm_write_end,
.bmap = xfs_vm_bmap,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index caf5dab..1414e93 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -981,7 +981,46 @@ DEFINE_EVENT(xfs_page_class, name, \
TP_ARGS(inode, page, off))
DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
-DEFINE_PAGE_EVENT(xfs_invalidatepage);
+
+TRACE_EVENT(xfs_invalidatepage_range,
+ TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
+ unsigned long len),
+ TP_ARGS(inode, page, off, len),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(pgoff_t, pgoff)
+ __field(loff_t, size)
+ __field(unsigned long, offset)
+ __field(unsigned long, length)
+ __field(int, delalloc)
+ __field(int, unwritten)
+ ),
+ TP_fast_assign(
+ int delalloc = -1, unwritten = -1;
+
+ if (page_has_buffers(page))
+ xfs_count_page_state(page, &delalloc, &unwritten);
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = XFS_I(inode)->i_ino;
+ __entry->pgoff = page_offset(page);
+ __entry->size = i_size_read(inode);
+ __entry->offset = off;
+ __entry->length = len;
+ __entry->delalloc = delalloc;
+ __entry->unwritten = unwritten;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
+ "length %lx delalloc %d unwritten %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->pgoff,
+ __entry->size,
+ __entry->offset,
+ __entry->length,
+ __entry->delalloc,
+ __entry->unwritten)
+)
DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
--
1.7.7.6
* [PATCH 05/15] ocfs2: implement invalidatepage_range aop
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (3 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 04/15] xfs: " Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges Lukas Czerner
` (10 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner, ocfs2-devel
mm now supports the invalidatepage_range address space operation, which
is useful to allow truncating a page range that is not aligned to the
end of the page. This will help the punch hole implementation once
truncate_inode_pages_range() is modified to allow this as well.
With this commit ocfs2 now registers only invalidatepage_range.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: ocfs2-devel@oss.oracle.com
---
fs/ocfs2/aops.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 6577432..46c1757 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -603,11 +603,12 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
* from ext3. PageChecked() bits have been removed as OCFS2 does not
* do journalled data.
*/
-static void ocfs2_invalidatepage(struct page *page, unsigned long offset)
+static void ocfs2_invalidatepage_range(struct page *page, unsigned long offset,
+ unsigned long length)
{
journal_t *journal = OCFS2_SB(page->mapping->host->i_sb)->journal->j_journal;
- jbd2_journal_invalidatepage(journal, page, offset);
+ jbd2_journal_invalidatepage_range(journal, page, offset, length);
}
static int ocfs2_releasepage(struct page *page, gfp_t wait)
@@ -2091,7 +2092,7 @@ const struct address_space_operations ocfs2_aops = {
.write_end = ocfs2_write_end,
.bmap = ocfs2_bmap,
.direct_IO = ocfs2_direct_IO,
- .invalidatepage = ocfs2_invalidatepage,
+ .invalidatepage_range = ocfs2_invalidatepage_range,
.releasepage = ocfs2_releasepage,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
--
1.7.7.6
* [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (4 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 05/15] ocfs2: " Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-08-20 4:52 ` Hugh Dickins
2012-07-27 8:01 ` [PATCH 07/15] ext4: Take i_mutex before punching hole Lukas Czerner
` (9 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner, Andrew Morton
This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit a BUG_ON()
when the end of the range is not page aligned, although an unaligned
start of the range is already handled.
Being able to handle non page aligned regions can help file system
punch hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.
In order for this to work correctly, the caller must register the
invalidatepage_range address space operation, or rely solely on
block_invalidatepage_range(). That said, it will BUG_ON() if the caller
implements invalidatepage(), does not implement invalidatepage_range()
and uses truncate_inode_pages_range() with an unaligned end of the range.
This was based on the code provided by Hugh Dickins with some small
changes to make use of do_invalidatepage_range().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
---
mm/truncate.c | 77 +++++++++++++++++++++++++++++++++++++--------------------
1 files changed, 50 insertions(+), 27 deletions(-)
diff --git a/mm/truncate.c b/mm/truncate.c
index e29e5ea..1f6ea8b 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -71,14 +71,6 @@ void do_invalidatepage_range(struct page *page, unsigned long offset,
#endif
}
-static inline void truncate_partial_page(struct page *page, unsigned partial)
-{
- zero_user_segment(page, partial, PAGE_CACHE_SIZE);
- cleancache_invalidate_page(page->mapping, page);
- if (page_has_private(page))
- do_invalidatepage(page, partial);
-}
-
/*
* This cancels just the dirty bit on the kernel page itself, it
* does NOT actually remove dirty bits on any mmap's that may be
@@ -212,8 +204,8 @@ int invalidate_inode_page(struct page *page)
* @lend: offset to which to truncate
*
* Truncate the page cache, removing the pages that are between
- * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * specified offsets (and zeroing out partial pages
+ * if lstart or lend + 1 is not page aligned).
*
* Truncate takes two passes - the first pass is nonblocking. It will not
* block on page locks and it will not block on writeback. The second pass
@@ -224,35 +216,44 @@ int invalidate_inode_page(struct page *page)
* We pass down the cache-hot hint to the page freeing code. Even if the
* mapping is large, it is probably the case that the final pages are the most
* recently touched, and freeing happens in ascending file offset order.
+ *
+ * Note that it is able to handle cases where lend + 1 is not page aligned.
+ * However in order for this to work the caller has to register the
+ * invalidatepage_range address space operation or rely solely on
+ * block_invalidatepage_range(). That said, do_invalidatepage_range() will
+ * BUG_ON() if the caller implements invalidatepage(), does not implement
+ * invalidatepage_range() and uses truncate_inode_pages_range() with lend + 1
+ * unaligned to the page cache size.
*/
void truncate_inode_pages_range(struct address_space *mapping,
loff_t lstart, loff_t lend)
{
- const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
- const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+ pgoff_t start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ pgoff_t end = (lend + 1) >> PAGE_CACHE_SHIFT;
+ unsigned int partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+ unsigned int partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
struct pagevec pvec;
pgoff_t index;
- pgoff_t end;
int i;
cleancache_invalidate_inode(mapping);
if (mapping->nrpages == 0)
return;
- BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
- end = (lend >> PAGE_CACHE_SHIFT);
+ if (lend == -1)
+ end = -1; /* unsigned, so actually very big */
pagevec_init(&pvec, 0);
index = start;
- while (index <= end && pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ while (index < end && pagevec_lookup(&pvec, mapping, index,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;
if (!trylock_page(page))
@@ -271,27 +272,51 @@ void truncate_inode_pages_range(struct address_space *mapping,
index++;
}
- if (partial) {
+ if (partial_start) {
struct page *page = find_lock_page(mapping, start - 1);
if (page) {
+ unsigned int top = PAGE_CACHE_SIZE;
+ if (start > end) {
+ top = partial_end;
+ partial_end = 0;
+ }
+ wait_on_page_writeback(page);
+ zero_user_segment(page, partial_start, top);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage_range(page, partial_start,
+ top);
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ }
+ if (partial_end) {
+ struct page *page = find_lock_page(mapping, end);
+ if (page) {
wait_on_page_writeback(page);
- truncate_partial_page(page, partial);
+ zero_user_segment(page, 0, partial_end);
+ cleancache_invalidate_page(mapping, page);
+ if (page_has_private(page))
+ do_invalidatepage_range(page, 0,
+ partial_end);
unlock_page(page);
page_cache_release(page);
}
}
+ if (start >= end)
+ return;
index = start;
for ( ; ; ) {
cond_resched();
if (!pagevec_lookup(&pvec, mapping, index,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+ min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
if (index == start)
break;
index = start;
continue;
}
- if (index == start && pvec.pages[0]->index > end) {
+ if (index == start && pvec.pages[0]->index >= end) {
pagevec_release(&pvec);
break;
}
@@ -301,7 +326,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
/* We rely upon deletion not changing page->index */
index = page->index;
- if (index > end)
+ if (index >= end)
break;
lock_page(page);
@@ -646,10 +671,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
* This rounding is currently just for example: unmap_mapping_range
* expands its hole outwards, whereas we want it to contract the hole
* inwards. However, existing callers of truncate_pagecache_range are
- * doing their own page rounding first; and truncate_inode_pages_range
- * currently BUGs if lend is not pagealigned-1 (it handles partial
- * page at start of hole, but not partial page at end of hole). Note
- * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
+ * doing their own page rounding first. Note that unmap_mapping_range
+ * allows holelen 0 for all, and we allow lend -1 for end of file.
*/
/*
--
1.7.7.6
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 07/15] ext4: Take i_mutex before punching hole
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (5 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 9:04 ` Lukáš Czerner
2012-08-20 5:45 ` Hugh Dickins
2012-07-27 8:01 ` [PATCH 08/15] Revert "ext4: remove no longer used functions in inode.c" Lukas Czerner
` (8 subsequent siblings)
15 siblings, 2 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
Currently an allocation might happen in the punched range after the
truncation but before the space of the range is released. This would
lead to blocks being deallocated under mapped buffer heads, resulting
in nasty bugs.
With this commit we take i_mutex before doing anything in
ext4_ext_punch_hole(), preventing any writes from happening while the
hole punching is in progress. This will also allow us to ditch the
writeout of dirty pages within the range.
This commit was based on code provided by Zheng Liu, thanks!
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/extents.c | 26 ++++++++++----------------
1 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 91341ec..2d6a216 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4755,9 +4755,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
loff_t first_page_offset, last_page_offset;
int credits, err = 0;
+ mutex_lock(&inode->i_mutex);
+
/* No need to punch hole beyond i_size */
if (offset >= inode->i_size)
- return 0;
+ goto out1;
/*
* If the hole extends beyond i_size, set the hole
@@ -4775,18 +4777,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
first_page_offset = first_page << PAGE_CACHE_SHIFT;
last_page_offset = last_page << PAGE_CACHE_SHIFT;
- /*
- * Write out all dirty pages to avoid race conditions
- * Then release them.
- */
- if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
- err = filemap_write_and_wait_range(mapping,
- offset, offset + length - 1);
-
- if (err)
- return err;
- }
-
/* Now release the pages */
if (last_page_offset > first_page_offset) {
truncate_pagecache_range(inode, first_page_offset,
@@ -4798,12 +4788,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
credits = ext4_writepage_trans_blocks(inode);
handle = ext4_journal_start(inode, credits);
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto out1;
+ }
err = ext4_orphan_add(handle, inode);
if (err)
- goto out;
+ goto out1;
/*
* Now we need to zero out the non-page-aligned data in the
@@ -4893,6 +4885,8 @@ out:
inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
ext4_mark_inode_dirty(handle, inode);
ext4_journal_stop(handle);
+out1:
+ mutex_unlock(&inode->i_mutex);
return err;
}
int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
--
1.7.7.6
* [PATCH 08/15] Revert "ext4: remove no longer used functions in inode.c"
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (6 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 07/15] ext4: Take i_mutex before punching hole Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 09/15] Revert "ext4: fix fsx truncate failure" Lukas Czerner
` (7 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.
This commit reintroduces the functions ext4_block_truncate_page() and
ext4_block_zero_page_range(), which had previously been removed in
favour of ext4_discard_partial_page_buffers().
In future commits we want to start using those functions again and
remove ext4_discard_partial_page_buffers(), since it duplicates some
code, partially duplicates the work of truncate_pagecache_range(),
and the old implementation was much clearer.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/ext4.h | 4 ++
fs/ext4/inode.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cfc4e01..439af1e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1995,6 +1995,10 @@ extern int ext4_alloc_da_blocks(struct inode *inode);
extern void ext4_set_aops(struct inode *inode);
extern int ext4_writepage_trans_blocks(struct inode *);
extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
+extern int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from);
+extern int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1b8c317..3e69a78 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3405,6 +3405,126 @@ next:
return err;
}
+/*
+ * ext4_block_truncate_page() zeroes out a mapping from file offset `from'
+ * up to the end of the block which corresponds to `from'.
+ * This is required during truncate. We need to physically zero the tail end
+ * of that block so it doesn't yield old data if the file is later grown.
+ */
+int ext4_block_truncate_page(handle_t *handle,
+ struct address_space *mapping, loff_t from)
+{
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned length;
+ unsigned blocksize;
+ struct inode *inode = mapping->host;
+
+ blocksize = inode->i_sb->s_blocksize;
+ length = blocksize - (offset & (blocksize - 1));
+
+ return ext4_block_zero_page_range(handle, mapping, from, length);
+}
+
+/*
+ * ext4_block_zero_page_range() zeros out a mapping of length 'length'
+ * starting from file offset 'from'. The range to be zeroed must
+ * be contained within one block. If the specified range exceeds
+ * the end of the block it will be shortened to the end of the block
+ * that corresponds to 'from'.
+ */
+int ext4_block_zero_page_range(handle_t *handle,
+ struct address_space *mapping, loff_t from, loff_t length)
+{
+ ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
+ unsigned offset = from & (PAGE_CACHE_SIZE-1);
+ unsigned blocksize, max, pos;
+ ext4_lblk_t iblock;
+ struct inode *inode = mapping->host;
+ struct buffer_head *bh;
+ struct page *page;
+ int err = 0;
+
+ page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
+ mapping_gfp_mask(mapping) & ~__GFP_FS);
+ if (!page)
+ return -ENOMEM;
+
+ blocksize = inode->i_sb->s_blocksize;
+ max = blocksize - (offset & (blocksize - 1));
+
+ /*
+ * correct length if it does not fall between
+ * 'from' and the end of the block
+ */
+ if (length > max || length < 0)
+ length = max;
+
+ iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+
+ if (!page_has_buffers(page))
+ create_empty_buffers(page, blocksize, 0);
+
+ /* Find the buffer that contains "offset" */
+ bh = page_buffers(page);
+ pos = blocksize;
+ while (offset >= pos) {
+ bh = bh->b_this_page;
+ iblock++;
+ pos += blocksize;
+ }
+
+ err = 0;
+ if (buffer_freed(bh)) {
+ BUFFER_TRACE(bh, "freed: skip");
+ goto unlock;
+ }
+
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "unmapped");
+ ext4_get_block(inode, iblock, bh, 0);
+ /* unmapped? It's a hole - nothing to do */
+ if (!buffer_mapped(bh)) {
+ BUFFER_TRACE(bh, "still unmapped");
+ goto unlock;
+ }
+ }
+
+ /* Ok, it's mapped. Make sure it's up-to-date */
+ if (PageUptodate(page))
+ set_buffer_uptodate(bh);
+
+ if (!buffer_uptodate(bh)) {
+ err = -EIO;
+ ll_rw_block(READ, 1, &bh);
+ wait_on_buffer(bh);
+ /* Uhhuh. Read error. Complain and punt. */
+ if (!buffer_uptodate(bh))
+ goto unlock;
+ }
+
+ if (ext4_should_journal_data(inode)) {
+ BUFFER_TRACE(bh, "get write access");
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ goto unlock;
+ }
+
+ zero_user(page, offset, length);
+
+ BUFFER_TRACE(bh, "zeroed end of block");
+
+ err = 0;
+ if (ext4_should_journal_data(inode)) {
+ err = ext4_handle_dirty_metadata(handle, inode, bh);
+ } else
+ mark_buffer_dirty(bh);
+
+unlock:
+ unlock_page(page);
+ page_cache_release(page);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6
* [PATCH 09/15] Revert "ext4: fix fsx truncate failure"
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (7 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 08/15] Revert "ext4: remove no longer used functions in inode.c" Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 10/15] ext4: use ext4_zero_partial_blocks in punch_hole Lukas Czerner
` (6 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.
This commit reintroduces the use of ext4_block_truncate_page() in the
ext4 truncate operation instead of ext4_discard_partial_page_buffers().
The statement in that commit's description, that the truncate operation
only zeroes the block-unaligned portion of the last page, is not exactly
right, since truncate_pagecache_range() also zeroes and invalidates the
unaligned portion of the page. There is thus no need to zero and unmap
it once more; we only still need to update the buffer head containing
the last block, which is exactly what ext4_block_truncate_page() does.
Moreover, the problem described in that commit is fixed more properly by
commit
15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
This was tested on ppc64 machine with block size of 1024 bytes without
any problems.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/extents.c | 13 ++-----------
fs/ext4/indirect.c | 13 ++-----------
2 files changed, 4 insertions(+), 22 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2d6a216..9967947 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4222,7 +4222,6 @@ void ext4_ext_truncate(struct inode *inode)
struct super_block *sb = inode->i_sb;
ext4_lblk_t last_block;
handle_t *handle;
- loff_t page_len;
int err = 0;
/*
@@ -4239,16 +4238,8 @@ void ext4_ext_truncate(struct inode *inode)
if (IS_ERR(handle))
return;
- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
- goto out_stop;
- }
+ if (inode->i_size & (sb->s_blocksize - 1))
+ ext4_block_truncate_page(handle, mapping, inode->i_size);
if (ext4_orphan_add(handle, inode))
goto out_stop;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 830e1b2..a082b30 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1349,9 +1349,7 @@ void ext4_ind_truncate(struct inode *inode)
__le32 nr = 0;
int n = 0;
ext4_lblk_t last_block, max_block;
- loff_t page_len;
unsigned blocksize = inode->i_sb->s_blocksize;
- int err;
handle = start_transaction(inode);
if (IS_ERR(handle))
@@ -1362,16 +1360,9 @@ void ext4_ind_truncate(struct inode *inode)
max_block = (EXT4_SB(inode->i_sb)->s_bitmap_maxbytes + blocksize-1)
>> EXT4_BLOCK_SIZE_BITS(inode->i_sb);
- if (inode->i_size % PAGE_CACHE_SIZE != 0) {
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
* [PATCH 10/15] ext4: use ext4_zero_partial_blocks in punch_hole
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (8 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 09/15] Revert "ext4: fix fsx truncate failure" Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 11/15] ext4: remove unused discard_partial_page_buffers Lukas Czerner
` (5 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
We're going to get rid of ext4_discard_partial_page_buffers() since it
duplicates some code and partially duplicates the work of
truncate_pagecache_range(); moreover, the old implementation was much
clearer.
Now that truncate_inode_pages_range() can handle truncating non page
aligned regions, we can use it to invalidate and zero out the block
aligned region of the punched out range and then use
ext4_block_truncate_page() to zero the unaligned blocks at the start and
end of the range. This greatly simplifies the punch hole code. Moreover,
after this commit we can get rid of ext4_discard_partial_page_buffers()
completely.
This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/ext4.h | 2 +
fs/ext4/extents.c | 80 ++++++-----------------------------------------------
fs/ext4/inode.c | 31 ++++++++++++++++++++
3 files changed, 42 insertions(+), 71 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 439af1e..704ceab 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1999,6 +1999,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend);
extern int ext4_discard_partial_page_buffers(handle_t *handle,
struct address_space *mapping, loff_t from,
loff_t length, int flags);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 9967947..7f5fe66 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4740,9 +4740,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
struct inode *inode = file->f_path.dentry->d_inode;
struct super_block *sb = inode->i_sb;
ext4_lblk_t first_block, stop_block;
- struct address_space *mapping = inode->i_mapping;
handle_t *handle;
- loff_t first_page, last_page, page_len;
loff_t first_page_offset, last_page_offset;
int credits, err = 0;
@@ -4762,17 +4760,13 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
offset;
}
- first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- last_page = (offset + length) >> PAGE_CACHE_SHIFT;
-
- first_page_offset = first_page << PAGE_CACHE_SHIFT;
- last_page_offset = last_page << PAGE_CACHE_SHIFT;
+ first_page_offset = round_up(offset, sb->s_blocksize);
+ last_page_offset = round_down((offset + length), sb->s_blocksize) - 1;
- /* Now release the pages */
- if (last_page_offset > first_page_offset) {
+ /* Now release the pages and zero block aligned part of pages */
+ if (last_page_offset > first_page_offset)
truncate_pagecache_range(inode, first_page_offset,
- last_page_offset - 1);
- }
+ last_page_offset);
/* finish any pending end_io work */
ext4_flush_completed_IO(inode);
@@ -4788,66 +4782,10 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
if (err)
goto out1;
- /*
- * Now we need to zero out the non-page-aligned data in the
- * pages at the start and tail of the hole, and unmap the buffer
- * heads for the block aligned regions of the page that were
- * completely zeroed.
- */
- if (first_page > last_page) {
- /*
- * If the file space being truncated is contained within a page
- * just zero out and unmap the middle of that page
- */
- err = ext4_discard_partial_page_buffers(handle,
- mapping, offset, length, 0);
-
- if (err)
- goto out;
- } else {
- /*
- * zero out and unmap the partial page that contains
- * the start of the hole
- */
- page_len = first_page_offset - offset;
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle, mapping,
- offset, page_len, 0);
- if (err)
- goto out;
- }
-
- /*
- * zero out and unmap the partial page that contains
- * the end of the hole
- */
- page_len = offset + length - last_page_offset;
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle, mapping,
- last_page_offset, page_len, 0);
- if (err)
- goto out;
- }
- }
-
- /*
- * If i_size is contained in the last page, we need to
- * unmap and zero the partial page after i_size
- */
- if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
- inode->i_size % PAGE_CACHE_SIZE != 0) {
-
- page_len = PAGE_CACHE_SIZE -
- (inode->i_size & (PAGE_CACHE_SIZE - 1));
-
- if (page_len > 0) {
- err = ext4_discard_partial_page_buffers(handle,
- mapping, inode->i_size, page_len, 0);
-
- if (err)
- goto out;
- }
- }
+ err = ext4_zero_partial_blocks(handle, inode, offset,
+ offset + length - 1);
+ if (err)
+ goto out;
first_block = (offset + sb->s_blocksize - 1) >>
EXT4_BLOCK_SIZE_BITS(sb);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e69a78..5205152 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3525,6 +3525,37 @@ unlock:
return err;
}
+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+ loff_t lstart, loff_t lend)
+{
+ struct super_block *sb = inode->i_sb;
+ struct address_space *mapping = inode->i_mapping;
+ unsigned partial = lstart & (sb->s_blocksize - 1);
+ ext4_fsblk_t start = lstart >> sb->s_blocksize_bits;
+ ext4_fsblk_t end = lend >> sb->s_blocksize_bits;
+ int err = 0;
+
+ /* Handle partial zero within the single block */
+ if (start == end) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, lend - lstart + 1);
+ return err;
+ }
+ /* Handle partial zero out on the start of the range */
+ if (partial) {
+ err = ext4_block_zero_page_range(handle, mapping,
+ lstart, sb->s_blocksize);
+ if (err)
+ return err;
+ }
+ /* Handle partial zero out on the end of the range */
+ partial = lend & (sb->s_blocksize - 1);
+ if (partial != sb->s_blocksize - 1)
+ err = ext4_block_zero_page_range(handle, mapping,
+ lend - partial, partial + 1);
+ return err;
+}
+
int ext4_can_truncate(struct inode *inode)
{
if (S_ISREG(inode->i_mode))
--
1.7.7.6
* [PATCH 11/15] ext4: remove unused discard_partial_page_buffers
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (9 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 10/15] ext4: use ext4_zero_partial_blocks in punch_hole Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 12/15] ext4: remove unused code from ext4_remove_blocks() Lukas Czerner
` (4 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/ext4.h | 8 --
fs/ext4/inode.c | 206 -------------------------------------------------------
2 files changed, 0 insertions(+), 214 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 704ceab..72d75ff 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -583,11 +583,6 @@ enum {
#define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020
/*
- * Flags used by ext4_discard_partial_page_buffers
- */
-#define EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED 0x0001
-
-/*
* ioctl commands
*/
#define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS
@@ -2001,9 +1996,6 @@ extern int ext4_block_zero_page_range(handle_t *handle,
struct address_space *mapping, loff_t from, loff_t length);
extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
-extern int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5205152..2f950d6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -139,9 +139,6 @@ static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
static int __ext4_journalled_writepage(struct page *page, unsigned int len);
static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh);
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags);
/*
* Test whether an inode is a fast symlink.
@@ -3202,209 +3199,6 @@ void ext4_set_aops(struct inode *inode)
}
}
-
-/*
- * ext4_discard_partial_page_buffers()
- * Wrapper function for ext4_discard_partial_page_buffers_no_lock.
- * This function finds and locks the page containing the offset
- * "from" and passes it to ext4_discard_partial_page_buffers_no_lock.
- * Calling functions that already have the page locked should call
- * ext4_discard_partial_page_buffers_no_lock directly.
- */
-int ext4_discard_partial_page_buffers(handle_t *handle,
- struct address_space *mapping, loff_t from,
- loff_t length, int flags)
-{
- struct inode *inode = mapping->host;
- struct page *page;
- int err = 0;
-
- page = find_or_create_page(mapping, from >> PAGE_CACHE_SHIFT,
- mapping_gfp_mask(mapping) & ~__GFP_FS);
- if (!page)
- return -ENOMEM;
-
- err = ext4_discard_partial_page_buffers_no_lock(handle, inode, page,
- from, length, flags);
-
- unlock_page(page);
- page_cache_release(page);
- return err;
-}
-
-/*
- * ext4_discard_partial_page_buffers_no_lock()
- * Zeros a page range of length 'length' starting from offset 'from'.
- * Buffer heads that correspond to the block aligned regions of the
- * zeroed range will be unmapped. Unblock aligned regions
- * will have the corresponding buffer head mapped if needed so that
- * that region of the page can be updated with the partial zero out.
- *
- * This function assumes that the page has already been locked. The
- * The range to be discarded must be contained with in the given page.
- * If the specified range exceeds the end of the page it will be shortened
- * to the end of the page that corresponds to 'from'. This function is
- * appropriate for updating a page and it buffer heads to be unmapped and
- * zeroed for blocks that have been either released, or are going to be
- * released.
- *
- * handle: The journal handle
- * inode: The files inode
- * page: A locked page that contains the offset "from"
- * from: The starting byte offset (from the begining of the file)
- * to begin discarding
- * len: The length of bytes to discard
- * flags: Optional flags that may be used:
- *
- * EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED
- * Only zero the regions of the page whose buffer heads
- * have already been unmapped. This flag is appropriate
- * for updateing the contents of a page whose blocks may
- * have already been released, and we only want to zero
- * out the regions that correspond to those released blocks.
- *
- * Returns zero on sucess or negative on failure.
- */
-static int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
- struct inode *inode, struct page *page, loff_t from,
- loff_t length, int flags)
-{
- ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
- unsigned int offset = from & (PAGE_CACHE_SIZE-1);
- unsigned int blocksize, max, pos;
- ext4_lblk_t iblock;
- struct buffer_head *bh;
- int err = 0;
-
- blocksize = inode->i_sb->s_blocksize;
- max = PAGE_CACHE_SIZE - offset;
-
- if (index != page->index)
- return -EINVAL;
-
- /*
- * correct length if it does not fall between
- * 'from' and the end of the page
- */
- if (length > max || length < 0)
- length = max;
-
- iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-
- if (!page_has_buffers(page))
- create_empty_buffers(page, blocksize, 0);
-
- /* Find the buffer that contains "offset" */
- bh = page_buffers(page);
- pos = blocksize;
- while (offset >= pos) {
- bh = bh->b_this_page;
- iblock++;
- pos += blocksize;
- }
-
- pos = offset;
- while (pos < offset + length) {
- unsigned int end_of_block, range_to_discard;
-
- err = 0;
-
- /* The length of space left to zero and unmap */
- range_to_discard = offset + length - pos;
-
- /* The length of space until the end of the block */
- end_of_block = blocksize - (pos & (blocksize-1));
-
- /*
- * Do not unmap or zero past end of block
- * for this buffer head
- */
- if (range_to_discard > end_of_block)
- range_to_discard = end_of_block;
-
-
- /*
- * Skip this buffer head if we are only zeroing unampped
- * regions of the page
- */
- if (flags & EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED &&
- buffer_mapped(bh))
- goto next;
-
- /* If the range is block aligned, unmap */
- if (range_to_discard == blocksize) {
- clear_buffer_dirty(bh);
- bh->b_bdev = NULL;
- clear_buffer_mapped(bh);
- clear_buffer_req(bh);
- clear_buffer_new(bh);
- clear_buffer_delay(bh);
- clear_buffer_unwritten(bh);
- clear_buffer_uptodate(bh);
- zero_user(page, pos, range_to_discard);
- BUFFER_TRACE(bh, "Buffer discarded");
- goto next;
- }
-
- /*
- * If this block is not completely contained in the range
- * to be discarded, then it is not going to be released. Because
- * we need to keep this block, we need to make sure this part
- * of the page is uptodate before we modify it by writeing
- * partial zeros on it.
- */
- if (!buffer_mapped(bh)) {
- /*
- * Buffer head must be mapped before we can read
- * from the block
- */
- BUFFER_TRACE(bh, "unmapped");
- ext4_get_block(inode, iblock, bh, 0);
- /* unmapped? It's a hole - nothing to do */
- if (!buffer_mapped(bh)) {
- BUFFER_TRACE(bh, "still unmapped");
- goto next;
- }
- }
-
- /* Ok, it's mapped. Make sure it's up-to-date */
- if (PageUptodate(page))
- set_buffer_uptodate(bh);
-
- if (!buffer_uptodate(bh)) {
- err = -EIO;
- ll_rw_block(READ, 1, &bh);
- wait_on_buffer(bh);
- /* Uhhuh. Read error. Complain and punt.*/
- if (!buffer_uptodate(bh))
- goto next;
- }
-
- if (ext4_should_journal_data(inode)) {
- BUFFER_TRACE(bh, "get write access");
- err = ext4_journal_get_write_access(handle, bh);
- if (err)
- goto next;
- }
-
- zero_user(page, pos, range_to_discard);
-
- err = 0;
- if (ext4_should_journal_data(inode)) {
- err = ext4_handle_dirty_metadata(handle, inode, bh);
- } else
- mark_buffer_dirty(bh);
-
- BUFFER_TRACE(bh, "Partial buffer zeroed");
-next:
- bh = bh->b_this_page;
- iblock++;
- pos += range_to_discard;
- }
-
- return err;
-}
-
/*
* ext4_block_truncate_page() zeroes out a mapping from file offset `from'
* up to the end of the block which corresponds to `from'.
--
1.7.7.6
* [PATCH 12/15] ext4: remove unused code from ext4_remove_blocks()
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (10 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 11/15] ext4: remove unused discard_partial_page_buffers Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 13/15] ext4: update ext4_ext_remove_space trace point Lukas Czerner
` (3 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
The "head removal" branch in the condition is never used in any code
path in ext4, since the function's only caller, ext4_ext_rm_leaf(),
makes sure that the extent is properly split before removing blocks.
Note that there is a bug in this branch anyway.
This commit removes the unused code completely and uses ext4_error()
instead of printk() if a dubious range is provided.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/extents.c | 21 ++++-----------------
1 files changed, 4 insertions(+), 17 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7f5fe66..4d0084e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2339,23 +2339,10 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
*partial_cluster = EXT4_B2C(sbi, pblk);
else
*partial_cluster = 0;
- } else if (from == le32_to_cpu(ex->ee_block)
- && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
- /* head removal */
- ext4_lblk_t num;
- ext4_fsblk_t start;
-
- num = to - from;
- start = ext4_ext_pblock(ex);
-
- ext_debug("free first %u blocks starting %llu\n", num, start);
- ext4_free_blocks(handle, inode, NULL, start, num, flags);
-
- } else {
- printk(KERN_INFO "strange request: removal(2) "
- "%u-%u from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), ee_len);
- }
+ } else
+ ext4_error(sbi->s_sb, "strange request: removal(2) "
+ "%u-%u from %u:%u\n",
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
return 0;
}
--
1.7.7.6
* [PATCH 13/15] ext4: update ext4_ext_remove_space trace point
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (11 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 12/15] ext4: remove unused code from ext4_remove_blocks() Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 14/15] ext4: make punch hole code path work with bigalloc Lukas Czerner
` (2 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
Add an "end" variable to the ext4_ext_remove_space and
ext4_ext_remove_space_done trace points so that the removed range is
visible in the trace output.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/extents.c | 6 +++---
include/trace/events/ext4.h | 21 ++++++++++++++-------
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4d0084e..014bec4 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2572,7 +2572,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
again:
ext4_ext_invalidate_cache(inode);
- trace_ext4_ext_remove_space(inode, start, depth);
+ trace_ext4_ext_remove_space(inode, start, end, depth);
/*
* Check if we are removing extents inside the extent tree. If that
@@ -2725,8 +2725,8 @@ cont:
}
}
- trace_ext4_ext_remove_space_done(inode, start, depth, partial_cluster,
- path->p_hdr->eh_entries);
+ trace_ext4_ext_remove_space_done(inode, start, end, depth,
+ partial_cluster, path->p_hdr->eh_entries);
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 30bae72..884cbf2 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1999,14 +1999,16 @@ TRACE_EVENT(ext4_ext_rm_idx,
);
TRACE_EVENT(ext4_ext_remove_space,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t end, int depth),
- TP_ARGS(inode, start, depth),
+ TP_ARGS(inode, start, end, depth),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
),
@@ -2014,26 +2016,29 @@ TRACE_EVENT(ext4_ext_remove_space,
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
),
- TP_printk("dev %d,%d ino %lu since %u depth %d",
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth)
);
TRACE_EVENT(ext4_ext_remove_space_done,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, int depth,
- ext4_lblk_t partial, unsigned short eh_entries),
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
+ int depth, ext4_lblk_t partial, unsigned short eh_entries),
- TP_ARGS(inode, start, depth, partial, eh_entries),
+ TP_ARGS(inode, start, end, depth, partial, eh_entries),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, end )
__field( int, depth )
__field( ext4_lblk_t, partial )
__field( unsigned short, eh_entries )
@@ -2043,16 +2048,18 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->start = start;
+ __entry->end = end;
__entry->depth = depth;
__entry->partial = partial;
__entry->eh_entries = eh_entries;
),
- TP_printk("dev %d,%d ino %lu since %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
+ (unsigned) __entry->end,
__entry->depth,
(unsigned) __entry->partial,
(unsigned short) __entry->eh_entries)
--
1.7.7.6
* [PATCH 14/15] ext4: make punch hole code path work with bigalloc
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (12 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 13/15] ext4: update ext4_ext_remove_space trace point Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-07-27 8:01 ` [PATCH 15/15] ext4: Allow punch hole with bigalloc enabled Lukas Czerner
2012-08-19 0:57 ` Add invalidatepage_range address space operation Theodore Ts'o
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.
This commit changes the partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is of unsigned long long type and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However it has been specifically designed only for truncate.
With punch hole we can be freeing just some extents in the cluster,
leaving the rest untouched. So we have to make sure that we notice a
cluster which still has some extents in it. To do this I've changed
partial_cluster to be of signed long long type. The only scenario where
this could be a problem is when cluster_size == block_size; however, in
that case there would not be any partial clusters, so we're safe. For
bigger clusters the signed type is wide enough. Now we use a negative
value in partial_cluster to mark such a cluster as used, hence we know
that we must not free it even if all other extents have been freed from
that cluster.
This scenario can be described by a simple diagram:
|FFF...FF..FF.UUU|
^----------^
punch hole
. - free space
| - cluster boundary
F - freed extent
U - used extent
Also update respective tracepoints to use signed long long type for
partial_cluster.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
include/trace/events/ext4.h | 25 ++++++++-------
2 files changed, 64 insertions(+), 30 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 014bec4..18460cd 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2269,7 +2269,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent *ex,
- ext4_fsblk_t *partial_cluster,
+ signed long long *partial_cluster,
ext4_lblk_t from, ext4_lblk_t to)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2295,7 +2295,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* partial cluster here.
*/
pblk = ext4_ext_pblock(ex) + ee_len - 1;
- if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
+ if ((*partial_cluster > 0) &&
+ (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
ext4_free_blocks(handle, inode, NULL,
EXT4_C2B(sbi, *partial_cluster),
sbi->s_cluster_ratio, flags);
@@ -2321,23 +2322,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
&& to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
ext4_lblk_t num;
+ unsigned int unaligned;
num = le32_to_cpu(ex->ee_block) + ee_len - from;
pblk = ext4_ext_pblock(ex) + ee_len - num;
- ext_debug("free last %u blocks starting %llu\n", num, pblk);
+ /*
+ * Usually we want to free partial cluster at the end of the
+ * extent, except for the situation when the cluster is still
+ * used by any other extent (partial_cluster is negative).
+ */
+ if (*partial_cluster < 0 &&
+ -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
+ flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
+
+ ext_debug("free last %u blocks starting %llu partial %lld\n",
+ num, pblk, *partial_cluster);
ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
/*
* If the block range to be freed didn't start at the
* beginning of a cluster, and we removed the entire
- * extent, save the partial cluster here, since we
- * might need to delete if we determine that the
- * truncate operation has removed all of the blocks in
- * the cluster.
+ * extent and the cluster is not used by any other extent,
+ * save the partial cluster here, since we might need to
+ * delete if we determine that the truncate operation has
+ * removed all of the blocks in the cluster.
+ *
+ * On the other hand, if we did not manage to free the whole
+ * extent, we have to mark the cluster as used (store negative
+ * cluster number in partial_cluster).
*/
- if (pblk & (sbi->s_cluster_ratio - 1) &&
- (ee_len == num))
+ unaligned = pblk & (sbi->s_cluster_ratio - 1);
+ if (unaligned && (ee_len == num) &&
+ (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
*partial_cluster = EXT4_B2C(sbi, pblk);
- else
+ else if (unaligned)
+ *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
+ else if (*partial_cluster > 0)
*partial_cluster = 0;
} else
ext4_error(sbi->s_sb, "strange request: removal(2) "
@@ -2355,12 +2374,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
* @handle: The journal handle
* @inode: The files inode
* @path: The path to the leaf
+ * @partial_cluster: The cluster which we'll have to free if all extents
+ * has been released from it. It gets negative in case
+ * that the cluster is still used.
* @start: The first block to remove
* @end: The last block to remove
*/
static int
ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
- struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
+ struct ext4_ext_path *path,
+ signed long long *partial_cluster,
ext4_lblk_t start, ext4_lblk_t end)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -2373,6 +2396,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
unsigned short ex_ee_len;
unsigned uninitialized = 0;
struct ext4_extent *ex;
+ ext4_fsblk_t pblk;
/* the header must be checked already in ext4_ext_remove_space() */
ext_debug("truncate since %u in leaf to %u\n", start, end);
@@ -2411,6 +2435,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
/* If this extent is beyond the end of the hole, skip it */
if (end < ex_ee_block) {
+ /*
+ * We're going to skip this extent and move to another,
+ * so if this extent is not cluster aligned we have
+ * to mark the current cluster as used to avoid
+ * accidentally freeing it later on
+ */
+ pblk = ext4_ext_pblock(ex);
+ if (pblk & (sbi->s_cluster_ratio - 1))
+ *partial_cluster =
+ -((long long)EXT4_B2C(sbi, pblk));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
ex_ee_len = ext4_ext_get_actual_len(ex);
@@ -2486,7 +2520,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
sizeof(struct ext4_extent));
}
le16_add_cpu(&eh->eh_entries, -1);
- } else
+ } else if (*partial_cluster > 0)
*partial_cluster = 0;
err = ext4_ext_dirty(handle, inode, path + depth);
@@ -2504,11 +2538,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
err = ext4_ext_correct_indexes(handle, inode, path);
/*
- * If there is still a entry in the leaf node, check to see if
- * it references the partial cluster. This is the only place
- * where it could; if it doesn't, we can free the cluster.
+ * Free the partial cluster only if the current extent does not
+ * reference it. Otherwise we might free used cluster.
*/
- if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
+ if (*partial_cluster > 0 &&
(EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
*partial_cluster)) {
int flags = EXT4_FREE_BLOCKS_FORGET;
@@ -2558,7 +2591,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
struct super_block *sb = inode->i_sb;
int depth = ext_depth(inode);
struct ext4_ext_path *path;
- ext4_fsblk_t partial_cluster = 0;
+ signed long long partial_cluster = 0;
handle_t *handle;
int i, err;
@@ -2731,7 +2764,7 @@ cont:
/* If we still have something in the partial cluster and we have removed
* even the first extent, then we should free the blocks in the partial
* cluster as well. */
- if (partial_cluster && path->p_hdr->eh_entries == 0) {
+ if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
int flags = EXT4_FREE_BLOCKS_FORGET;
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 884cbf2..dad72af 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1900,7 +1900,7 @@ TRACE_EVENT(ext4_ext_show_extent,
TRACE_EVENT(ext4_remove_blocks,
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
- ext4_fsblk_t partial_cluster),
+ long long int partial_cluster),
TP_ARGS(inode, ex, from, to, partial_cluster),
@@ -1912,7 +1912,7 @@ TRACE_EVENT(ext4_remove_blocks,
__field( unsigned short, ee_len )
__field( ext4_lblk_t, from )
__field( ext4_lblk_t, to )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
),
TP_fast_assign(
@@ -1927,7 +1927,7 @@ TRACE_EVENT(ext4_remove_blocks,
),
TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
- "from %u to %u partial_cluster %u",
+ "from %u to %u partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->ee_lblk,
@@ -1935,12 +1935,13 @@ TRACE_EVENT(ext4_remove_blocks,
(unsigned short) __entry->ee_len,
(unsigned) __entry->from,
(unsigned) __entry->to,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);
TRACE_EVENT(ext4_ext_rm_leaf,
TP_PROTO(struct inode *inode, ext4_lblk_t start,
- struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
+ struct ext4_extent *ex,
+ long long int partial_cluster),
TP_ARGS(inode, start, ex, partial_cluster),
@@ -1951,7 +1952,7 @@ TRACE_EVENT(ext4_ext_rm_leaf,
__field( ext4_lblk_t, ee_lblk )
__field( ext4_fsblk_t, ee_pblk )
__field( short, ee_len )
- __field( ext4_fsblk_t, partial )
+ __field( long long int, partial )
),
TP_fast_assign(
@@ -1965,14 +1966,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
),
TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
- "partial_cluster %u",
+ "partial_cluster %lld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->ee_lblk,
(unsigned long long) __entry->ee_pblk,
(unsigned short) __entry->ee_len,
- (unsigned) __entry->partial)
+ (long long int) __entry->partial)
);
TRACE_EVENT(ext4_ext_rm_idx,
@@ -2030,7 +2031,7 @@ TRACE_EVENT(ext4_ext_remove_space,
TRACE_EVENT(ext4_ext_remove_space_done,
TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
- int depth, ext4_lblk_t partial, unsigned short eh_entries),
+ int depth, long long int partial, unsigned short eh_entries),
TP_ARGS(inode, start, end, depth, partial, eh_entries),
@@ -2040,7 +2041,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, end )
__field( int, depth )
- __field( ext4_lblk_t, partial )
+ __field( long long int, partial )
__field( unsigned short, eh_entries )
),
@@ -2054,14 +2055,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->eh_entries = eh_entries;
),
- TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
+ TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->end,
__entry->depth,
- (unsigned) __entry->partial,
+ (long long int) __entry->partial,
(unsigned short) __entry->eh_entries)
);
--
1.7.7.6
* [PATCH 15/15] ext4: Allow punch hole with bigalloc enabled
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (13 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 14/15] ext4: make punch hole code path work with bigalloc Lukas Czerner
@ 2012-07-27 8:01 ` Lukas Czerner
2012-08-19 0:57 ` Add invalidatepage_range address space operation Theodore Ts'o
15 siblings, 0 replies; 29+ messages in thread
From: Lukas Czerner @ 2012-07-27 8:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: linux-ext4, tytso, hughd, linux-mmc, Lukas Czerner
In commits 5f95d21fb6f2aaa52830e5b7fb405f6c71d3ab85 and
30bc2ec9598a1b156ad75217f2e7d4560efdeeab we've reworked the punch hole
implementation and there is nothing holding us back from using punch hole
on file systems with the bigalloc feature enabled.
This has been tested with fsx and xfstests.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
fs/ext4/inode.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2f950d6..7dc7f79 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3383,11 +3383,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
return -EOPNOTSUPP;
}
- if (EXT4_SB(inode->i_sb)->s_cluster_ratio > 1) {
- /* TODO: Add support for bigalloc file systems */
- return -EOPNOTSUPP;
- }
-
return ext4_ext_punch_hole(file, offset, length);
}
--
1.7.7.6
* Re: [PATCH 07/15] ext4: Take i_mutex before punching hole
2012-07-27 8:01 ` [PATCH 07/15] ext4: Take i_mutex before punching hole Lukas Czerner
@ 2012-07-27 9:04 ` Lukáš Czerner
2012-07-27 12:08 ` Zheng Liu
2012-08-20 5:45 ` Hugh Dickins
1 sibling, 1 reply; 29+ messages in thread
From: Lukáš Czerner @ 2012-07-27 9:04 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, Zheng Liu
Add Zheng Liu to the CC
On Fri, 27 Jul 2012, Lukas Czerner wrote:
> Date: Fri, 27 Jul 2012 10:01:06 +0200
> From: Lukas Czerner <lczerner@redhat.com>
> To: linux-fsdevel@vger.kernel.org
> Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, hughd@google.com,
> linux-mmc@vger.kernel.org, Lukas Czerner <lczerner@redhat.com>
> Subject: [PATCH 07/15] ext4: Take i_mutex before punching hole
>
> Currently the allocation might happen in the punched range after the
> truncation and before releasing the space of the range. This would
> lead to blocks being unallocated under the mapped buffer heads resulting
> in nasty bugs.
>
> With this commit we take i_mutex before going to do anything in the
> ext4_ext_punch_hole() preventing any write to happen while the hole
> punching is in progress. This will also allow us to ditch the writeout
> of dirty pages within the range.
>
> This commit was based on code provided by Zheng Liu, thanks!
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
> fs/ext4/extents.c | 26 ++++++++++----------------
> 1 files changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 91341ec..2d6a216 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4755,9 +4755,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> loff_t first_page_offset, last_page_offset;
> int credits, err = 0;
>
> + mutex_lock(&inode->i_mutex);
> +
> /* No need to punch hole beyond i_size */
> if (offset >= inode->i_size)
> - return 0;
> + goto out1;
>
> /*
> * If the hole extends beyond i_size, set the hole
> @@ -4775,18 +4777,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> first_page_offset = first_page << PAGE_CACHE_SHIFT;
> last_page_offset = last_page << PAGE_CACHE_SHIFT;
>
> - /*
> - * Write out all dirty pages to avoid race conditions
> - * Then release them.
> - */
> - if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> - err = filemap_write_and_wait_range(mapping,
> - offset, offset + length - 1);
> -
> - if (err)
> - return err;
> - }
> -
> /* Now release the pages */
> if (last_page_offset > first_page_offset) {
> truncate_pagecache_range(inode, first_page_offset,
> @@ -4798,12 +4788,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>
> credits = ext4_writepage_trans_blocks(inode);
> handle = ext4_journal_start(inode, credits);
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + goto out1;
> + }
>
> err = ext4_orphan_add(handle, inode);
> if (err)
> - goto out;
> + goto out1;
>
> /*
> * Now we need to zero out the non-page-aligned data in the
> @@ -4893,6 +4885,8 @@ out:
> inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
> ext4_mark_inode_dirty(handle, inode);
> ext4_journal_stop(handle);
> +out1:
> + mutex_unlock(&inode->i_mutex);
> return err;
> }
> int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>
* Re: [PATCH 07/15] ext4: Take i_mutex before punching hole
2012-07-27 9:04 ` Lukáš Czerner
@ 2012-07-27 12:08 ` Zheng Liu
0 siblings, 0 replies; 29+ messages in thread
From: Zheng Liu @ 2012-07-27 12:08 UTC (permalink / raw)
To: Lukáš Czerner; +Cc: linux-fsdevel, linux-ext4, tytso
On Fri, Jul 27, 2012 at 11:04:02AM +0200, Lukáš Czerner wrote:
> Add Zheng Liu to the CC
Hi Lukas,
It looks good to me. :-)
Regards,
Zheng
>
> On Fri, 27 Jul 2012, Lukas Czerner wrote:
>
> > Date: Fri, 27 Jul 2012 10:01:06 +0200
> > From: Lukas Czerner <lczerner@redhat.com>
> > To: linux-fsdevel@vger.kernel.org
> > Cc: linux-ext4@vger.kernel.org, tytso@mit.edu, hughd@google.com,
> > linux-mmc@vger.kernel.org, Lukas Czerner <lczerner@redhat.com>
> > Subject: [PATCH 07/15] ext4: Take i_mutex before punching hole
> >
> > Currently the allocation might happen in the punched range after the
> > truncation and before releasing the space of the range. This would
> > lead to blocks being unallocated under the mapped buffer heads resulting
> > in nasty bugs.
> >
> > With this commit we take i_mutex before going to do anything in the
> > ext4_ext_punch_hole() preventing any write to happen while the hole
> > punching is in progress. This will also allow us to ditch the writeout
> > of dirty pages within the range.
> >
> > This commit was based on code provided by Zheng Liu, thanks!
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> > fs/ext4/extents.c | 26 ++++++++++----------------
> > 1 files changed, 10 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 91341ec..2d6a216 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -4755,9 +4755,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> > loff_t first_page_offset, last_page_offset;
> > int credits, err = 0;
> >
> > + mutex_lock(&inode->i_mutex);
> > +
> > /* No need to punch hole beyond i_size */
> > if (offset >= inode->i_size)
> > - return 0;
> > + goto out1;
> >
> > /*
> > * If the hole extends beyond i_size, set the hole
> > @@ -4775,18 +4777,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> > first_page_offset = first_page << PAGE_CACHE_SHIFT;
> > last_page_offset = last_page << PAGE_CACHE_SHIFT;
> >
> > - /*
> > - * Write out all dirty pages to avoid race conditions
> > - * Then release them.
> > - */
> > - if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> > - err = filemap_write_and_wait_range(mapping,
> > - offset, offset + length - 1);
> > -
> > - if (err)
> > - return err;
> > - }
> > -
> > /* Now release the pages */
> > if (last_page_offset > first_page_offset) {
> > truncate_pagecache_range(inode, first_page_offset,
> > @@ -4798,12 +4788,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> >
> > credits = ext4_writepage_trans_blocks(inode);
> > handle = ext4_journal_start(inode, credits);
> > - if (IS_ERR(handle))
> > - return PTR_ERR(handle);
> > + if (IS_ERR(handle)) {
> > + err = PTR_ERR(handle);
> > + goto out1;
> > + }
> >
> > err = ext4_orphan_add(handle, inode);
> > if (err)
> > - goto out;
> > + goto out1;
> >
> > /*
> > * Now we need to zero out the non-page-aligned data in the
> > @@ -4893,6 +4885,8 @@ out:
> > inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
> > ext4_mark_inode_dirty(handle, inode);
> > ext4_journal_stop(handle);
> > +out1:
> > + mutex_unlock(&inode->i_mutex);
> > return err;
> > }
> > int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> >
* Re: Add invalidatepage_range address space operation
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
` (14 preceding siblings ...)
2012-07-27 8:01 ` [PATCH 15/15] ext4: Allow punch hole with bigalloc enabled Lukas Czerner
@ 2012-08-19 0:57 ` Theodore Ts'o
2012-08-20 4:43 ` Hugh Dickins
15 siblings, 1 reply; 29+ messages in thread
From: Theodore Ts'o @ 2012-08-19 0:57 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, hughd, linux-mmc
On Fri, Jul 27, 2012 at 10:00:59AM +0200, Lukas Czerner wrote:
> This set of patches are aimed to allow truncate_inode_pages_range() handle
> ranges which are not aligned at the end of the page. Currently it will
> hit BUG_ON() when the end of the range is not aligned. Punch hole feature
> however can benefit from this ability saving file systems some work not
> forcing them to implement their own invalidate code to handle unaligned
> ranges.
>
> In order for this to work we need however new address space operation
> invalidatepage_range which should be able to handle page invalidation with
> offset and length specified.
>
> patch 01: Implements the new invalidatepage_range address space
> operation in the mm layer
> patch 02 - 05: Wire the new invalidatepage_range aop to the ext4, xfs and
> ocfs2 file system (which are currently the only file
> systems supporting punch hole) and implement this
> functionality for jbd2 as well.
> patch 06: Change truncate_inode_pages_range() to handle unaligned
> ranges.
> patch 07 - 15: Ext4 specific changes which take benefit from the
> previous truncate_inode_pages_range() change. Not all
> are realated specifically to this change, but all are
> related to the punch hole feature.
What's the current status of this patch series?
I haven't seen much if any comment from the mm folks and the xfs/ocfs2
folks. I've incorporated the patches into the unstable portion of the
ext4 tree so I can do some testing (some minor changes to the ext4
patches were needed to rebase the patch series to the 3.6-rc1-based
ext4 dev tree, but nothing major). However, I don't want to include
this into the ext4 tree for merging into linux-next unless we get
general agreement from the maintainers of the other trees that are
affected by this patch series.
Thanks,
- Ted
* Re: Add invalidatepage_range address space operation
2012-08-19 0:57 ` Add invalidatepage_range address space operation Theodore Ts'o
@ 2012-08-20 4:43 ` Hugh Dickins
2012-08-20 13:23 ` Theodore Ts'o
0 siblings, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2012-08-20 4:43 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Lukas Czerner, linux-fsdevel, linux-ext4, linux-mmc
(I wonder if linux-mmc@vger.kernel.org was supposed to be
linux-mm@kvack.org? Our apologies to linux-mmc if so.)
On Sat, 18 Aug 2012, Theodore Ts'o wrote:
> On Fri, Jul 27, 2012 at 10:00:59AM +0200, Lukas Czerner wrote:
> > This set of patches are aimed to allow truncate_inode_pages_range() handle
> > ranges which are not aligned at the end of the page. Currently it will
> > hit BUG_ON() when the end of the range is not aligned. Punch hole feature
> > however can benefit from this ability saving file systems some work not
> > forcing them to implement their own invalidate code to handle unaligned
> > ranges.
> >
> > In order for this to work we need however new address space operation
> > invalidatepage_range which should be able to handle page invalidation with
> > offset and length specified.
> >
> > patch 01: Implements the new invalidatepage_range address space
> > operation in the mm layer
> > patch 02 - 05: Wire the new invalidatepage_range aop to the ext4, xfs and
> > ocfs2 file system (which are currently the only file
> > systems supporting punch hole) and implement this
> > functionality for jbd2 as well.
(tmpfs also supports punch hole, but has its own truncation routine,
and does not need to get involved in these changes.)
> > patch 06: Change truncate_inode_pages_range() to handle unaligned
> > ranges.
> > patch 07 - 15: Ext4 specific changes which take benefit from the
> > previous truncate_inode_pages_range() change. Not all
> > are realated specifically to this change, but all are
> > related to the punch hole feature.
>
> What's the current status of this patch series?
>
> I haven't seen much if any comment from the mm folks and the xfs/ocfs2
> folks. I've incorporated the patches into the unstable portion of the
> ext4 tree so I can do some testing (some minor changes to the ext4
> patches were needed to rebase the patch series to the 3.6-rc1-based
> ext4 dev tree, but nothing major). However, I don't want to include
> this into the ext4 tree for merging into linux-next unless we get
> general agreement from the maintainers of the other trees that are
> affected by this patch series.
Thanks for the reminder. I'm happy with the general direction, if not
some of the details; though I see it as more of a vfs than an mm thing.
I'll comment briefly now on patch 06 (which is fine) and patch 01 (which
is not) and patch 07 (which is outside my scope, but worth a remark on).
I won't attempt to judge whether the ext4 and ocfs2 and xfs parts
are correct; but from the look of it, 02 03 04 05 are wrong like 01
(or you can say it's 06 which is wrong, though it looks good to me).
Hugh
* Re: [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges
2012-07-27 8:01 ` [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges Lukas Czerner
@ 2012-08-20 4:52 ` Hugh Dickins
2012-08-20 10:26 ` Lukáš Czerner
0 siblings, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2012-08-20 4:52 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, linux-mmc, Andrew Morton
On Fri, 27 Jul 2012, Lukas Czerner wrote:
> This commit changes truncate_inode_pages_range() so it can handle non
> page aligned regions of the truncate. Currently we can hit BUG_ON when
> the end of the range is not page aligned, but we can handle unaligned
> start of the range.
>
> Being able to handle non page aligned regions of the range can help file
> system punch_hole implementations and save some work, because once we're
> holding the page we might as well deal with it right away.
>
> In order for this to work correctly, the caller must register the
> invalidatepage_range address space operation, or rely solely on
> block_invalidatepage_range(). That said, it will BUG_ON() if the caller
> implements invalidatepage(), does not implement invalidatepage_range()
> and uses truncate_inode_pages_range() with an unaligned end of the range.
>
> This was based on the code provided by Hugh Dickins with some small
> changes to make use of do_invalidatepage_range().
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
This looks good to me. I like the way you provide the same args
to do_invalidatepage_range() as to zero_user_segment():
zero_user_segment(page, partial_start, top);
if (page_has_private(page))
do_invalidatepage_range(page, partial_start, top);
Unfortunately, that is not what patches 01-05 are expecting...
Hugh
> ---
> mm/truncate.c | 77 +++++++++++++++++++++++++++++++++++++--------------------
> 1 files changed, 50 insertions(+), 27 deletions(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index e29e5ea..1f6ea8b 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -71,14 +71,6 @@ void do_invalidatepage_range(struct page *page, unsigned long offset,
> #endif
> }
>
> -static inline void truncate_partial_page(struct page *page, unsigned partial)
> -{
> - zero_user_segment(page, partial, PAGE_CACHE_SIZE);
> - cleancache_invalidate_page(page->mapping, page);
> - if (page_has_private(page))
> - do_invalidatepage(page, partial);
> -}
> -
> /*
> * This cancels just the dirty bit on the kernel page itself, it
> * does NOT actually remove dirty bits on any mmap's that may be
> @@ -212,8 +204,8 @@ int invalidate_inode_page(struct page *page)
> * @lend: offset to which to truncate
> *
> * Truncate the page cache, removing the pages that are between
> - * specified offsets (and zeroing out partial page
> - * (if lstart is not page aligned)).
> + * specified offsets (and zeroing out partial pages
> + * if lstart or lend + 1 is not page aligned).
> *
> * Truncate takes two passes - the first pass is nonblocking. It will not
> * block on page locks and it will not block on writeback. The second pass
> @@ -224,35 +216,44 @@ int invalidate_inode_page(struct page *page)
> * We pass down the cache-hot hint to the page freeing code. Even if the
> * mapping is large, it is probably the case that the final pages are the most
> * recently touched, and freeing happens in ascending file offset order.
> + *
> + * Note that it is able to handle cases where lend + 1 is not page aligned.
> + * However in order for this to work caller have to register
> + * invalidatepage_range address space operation or rely solely on
> + * block_invalidatepage_range(). That said, do_invalidatepage_range() will
> + * BUG_ON() if caller implements invalidatapage(), does not implement
invalidatepage()
> + * invalidatepage_range() and uses truncate_inode_pages_range() with lend + 1
> + * unaligned to the page cache size.
> */
> void truncate_inode_pages_range(struct address_space *mapping,
> loff_t lstart, loff_t lend)
> {
> - const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
> - const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
> + pgoff_t start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> + pgoff_t end = (lend + 1) >> PAGE_CACHE_SHIFT;
> + unsigned int partial_start = lstart & (PAGE_CACHE_SIZE - 1);
> + unsigned int partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
> struct pagevec pvec;
> pgoff_t index;
> - pgoff_t end;
> int i;
>
> cleancache_invalidate_inode(mapping);
> if (mapping->nrpages == 0)
> return;
>
> - BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
> - end = (lend >> PAGE_CACHE_SHIFT);
> + if (lend == -1)
> + end = -1; /* unsigned, so actually very big */
>
> pagevec_init(&pvec, 0);
> index = start;
> - while (index <= end && pagevec_lookup(&pvec, mapping, index,
> - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> + while (index < end && pagevec_lookup(&pvec, mapping, index,
> + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
> mem_cgroup_uncharge_start();
> for (i = 0; i < pagevec_count(&pvec); i++) {
> struct page *page = pvec.pages[i];
>
> /* We rely upon deletion not changing page->index */
> index = page->index;
> - if (index > end)
> + if (index >= end)
> break;
>
> if (!trylock_page(page))
> @@ -271,27 +272,51 @@ void truncate_inode_pages_range(struct address_space *mapping,
> index++;
> }
>
> - if (partial) {
> + if (partial_start) {
> struct page *page = find_lock_page(mapping, start - 1);
> if (page) {
> + unsigned int top = PAGE_CACHE_SIZE;
> + if (start > end) {
> + top = partial_end;
> + partial_end = 0;
> + }
> + wait_on_page_writeback(page);
> + zero_user_segment(page, partial_start, top);
> + cleancache_invalidate_page(mapping, page);
> + if (page_has_private(page))
> + do_invalidatepage_range(page, partial_start,
> + top);
> + unlock_page(page);
> + page_cache_release(page);
> + }
> + }
> + if (partial_end) {
> + struct page *page = find_lock_page(mapping, end);
> + if (page) {
> wait_on_page_writeback(page);
> - truncate_partial_page(page, partial);
> + zero_user_segment(page, 0, partial_end);
> + cleancache_invalidate_page(mapping, page);
> + if (page_has_private(page))
> + do_invalidatepage_range(page, 0,
> + partial_end);
> unlock_page(page);
> page_cache_release(page);
> }
> }
> + if (start >= end)
> + return;
>
> index = start;
> for ( ; ; ) {
> cond_resched();
> if (!pagevec_lookup(&pvec, mapping, index,
> - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
> if (index == start)
> break;
> index = start;
> continue;
> }
> - if (index == start && pvec.pages[0]->index > end) {
> + if (index == start && pvec.pages[0]->index >= end) {
> pagevec_release(&pvec);
> break;
> }
> @@ -301,7 +326,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>
> /* We rely upon deletion not changing page->index */
> index = page->index;
> - if (index > end)
> + if (index >= end)
> break;
>
> lock_page(page);
> @@ -646,10 +671,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
> * This rounding is currently just for example: unmap_mapping_range
> * expands its hole outwards, whereas we want it to contract the hole
> * inwards. However, existing callers of truncate_pagecache_range are
> - * doing their own page rounding first; and truncate_inode_pages_range
> - * currently BUGs if lend is not pagealigned-1 (it handles partial
> - * page at start of hole, but not partial page at end of hole). Note
> - * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
> + * doing their own page rounding first. Note that unmap_mapping_range
> + * allows holelen 0 for all, and we allow lend -1 for end of file.
> */
>
> /*
> --
> 1.7.7.6
>
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 01/15] mm: add invalidatepage_range address space operation
2012-07-27 8:01 ` [PATCH 01/15] mm: add " Lukas Czerner
@ 2012-08-20 5:24 ` Hugh Dickins
[not found] ` <5033a999.0f403a0a.19c3.ffff95deSMTPIN_ADDED@mx.google.com>
0 siblings, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2012-08-20 5:24 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, linux-mmc, Andrew Morton
On Fri, 27 Jul 2012, Lukas Czerner wrote:
> Currently there is no way to truncate partial page where the end
> truncate point is not at the end of the page. This is because it was not
> needed and the functionality was enough for file system truncate
> operation to work properly. However more file systems now support punch
> hole feature and it can benefit from mm supporting truncating page just
> up to the certain point.
>
> Specifically, with this functionality truncate_inode_pages_range() can
> be changed so it supports truncating partial page at the end of the
> range (currently it will BUG_ON() if 'end' is not at the end of the
> page).
>
> This commit adds a new address space operation, invalidatepage_range,
> which allows specifying the length in bytes to invalidate, rather than
> assuming truncation to the end of the page. It also introduces
> block_invalidatepage_range() and do_invalidatepage_range() functions for
> exactly the same reason.
>
> The caller does not have to implement both aops (invalidatepage and
> invalidatepage_range) and the latter is preferred. The old method will be
> used only if invalidatepage_range is not implemented by the caller.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Hugh Dickins <hughd@google.com>
Needs some rework.
> ---
Documentation/filesystems/Locking
Documentation/filesystems/netfs-api.txt
Documentation/filesystems/vfs.txt
all have something to say about invalidatepage(). I should think
you can leave the netfs-api.txt one alone, but the other two ought
now to comment on invalidatepage_range() too. Can perfectly well be
left to a separate patch, following on a little later if you prefer.
> fs/buffer.c | 23 ++++++++++++++++++++++-
> include/linux/buffer_head.h | 2 ++
> include/linux/fs.h | 2 ++
> include/linux/mm.h | 2 ++
> mm/truncate.c | 30 ++++++++++++++++++++++++++----
> 5 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index c7062c8..5937f30 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1457,13 +1457,27 @@ static void discard_buffer(struct buffer_head * bh)
> */
> void block_invalidatepage(struct page *page, unsigned long offset)
> {
> + block_invalidatepage_range(page, offset, PAGE_CACHE_SIZE);
Okay, fine, that fits with patch 06 where the third arg is endoffset.
> +}
> +EXPORT_SYMBOL(block_invalidatepage);
> +
> +void block_invalidatepage_range(struct page *page, unsigned long offset,
> + unsigned long length)
Length? No, patch 06 is passing endoffset.
> +{
> struct buffer_head *head, *bh, *next;
> unsigned int curr_off = 0;
> + unsigned long stop = length + offset;
So here you are adding together two offsets.
>
> BUG_ON(!PageLocked(page));
> if (!page_has_buffers(page))
> goto out;
>
> + /*
> + * Check for overflow
> + */
> + if (stop < length)
> + stop = PAGE_CACHE_SIZE;
And this is weird, even if length were a length: though such wraparound
is conceivable, a far more likely overflow would be stop > PAGE_CACHE_SIZE.
> +
> head = page_buffers(page);
> bh = head;
> do {
> @@ -1471,6 +1485,12 @@ void block_invalidatepage(struct page *page, unsigned long offset)
> next = bh->b_this_page;
>
> /*
> + * Are we still fully in range ?
> + */
> + if (next_off > stop)
> + goto out;
Disillusioned, I didn't stop to think about this one.
> +
> + /*
> * is this block fully invalidated?
> */
> if (offset <= curr_off)
> @@ -1489,7 +1509,8 @@ void block_invalidatepage(struct page *page, unsigned long offset)
> out:
> return;
> }
> -EXPORT_SYMBOL(block_invalidatepage);
> +EXPORT_SYMBOL(block_invalidatepage_range);
> +
>
> /*
> * We attach and possibly dirty the buffers atomically wrt
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index 458f497..9d55645 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -194,6 +194,8 @@ extern int buffer_heads_over_limit;
> * address_spaces.
> */
> void block_invalidatepage(struct page *page, unsigned long offset);
> +void block_invalidatepage_range(struct page *page, unsigned long offset,
> + unsigned long length);
See comments on fs.h.
> int block_write_full_page(struct page *page, get_block_t *get_block,
> struct writeback_control *wbc);
> int block_write_full_page_endio(struct page *page, get_block_t *get_block,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8fabb03..b9eaf0c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -620,6 +620,8 @@ struct address_space_operations {
> /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
> sector_t (*bmap)(struct address_space *, sector_t);
> void (*invalidatepage) (struct page *, unsigned long);
> + void (*invalidatepage_range) (struct page *, unsigned long,
> + unsigned long);
It may turn out to be bad advice, given how invalidatepage() already
takes an unsigned long, but I'd be tempted to make both of these args
unsigned int, since that helps to make it clearer that they're intended
to be offsets within a page, in the range 0..PAGE_CACHE_SIZE.
(partial_start, partial_end and top in truncate_inode_pages_range()
are all unsigned int.)
Andrew is very keen on naming arguments in prototypes,
and I think there is an especially strong case for it here.
> int (*releasepage) (struct page *, gfp_t);
> void (*freepage)(struct page *);
> ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f9f279c..2db6a29 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -998,6 +998,8 @@ struct page *get_dump_page(unsigned long addr);
>
> extern int try_to_release_page(struct page * page, gfp_t gfp_mask);
> extern void do_invalidatepage(struct page *page, unsigned long offset);
> +extern void do_invalidatepage_range(struct page *page, unsigned long offset,
> + unsigned long length);
Likewise; ah, good, you named them... but then passed the wrong thing.
>
> int __set_page_dirty_nobuffers(struct page *page);
> int __set_page_dirty_no_writeback(struct page *page);
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 75801ac..e29e5ea 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -39,14 +39,36 @@
> */
> void do_invalidatepage(struct page *page, unsigned long offset)
> {
> + do_invalidatepage_range(page, offset, PAGE_CACHE_SIZE);
Good.
> +}
> +
> +void do_invalidatepage_range(struct page *page, unsigned long offset,
> + unsigned long length)
Not so good.
> +{
> + void (*invalidatepage_range)(struct page *, unsigned long,
> + unsigned long);
> void (*invalidatepage)(struct page *, unsigned long);
> +
> + /*
> + * Try invalidatepage_range first
> + */
> + invalidatepage_range = page->mapping->a_ops->invalidatepage_range;
> + if (invalidatepage_range)
> + (*invalidatepage_range)(page, offset, length);
> +
> + /*
> + * When only invalidatepage is registered length must be
> + * PAGE_CACHE_SIZE
> + */
> invalidatepage = page->mapping->a_ops->invalidatepage;
> + if (invalidatepage) {
> + BUG_ON(length != PAGE_CACHE_SIZE);
> + (*invalidatepage)(page, offset);
So if a filesystem were to declare both invalidatepage_range() and
invalidatepage(), it would call them both? That seems odd, and differs
from what you said in the commit message, though probably would not matter.
> + }
> #ifdef CONFIG_BLOCK
> - if (!invalidatepage)
> - invalidatepage = block_invalidatepage;
> + if (!invalidatepage_range && !invalidatepage)
> + block_invalidatepage_range(page, offset, length);
> #endif
> - if (invalidatepage)
> - (*invalidatepage)(page, offset);
> }
>
> static inline void truncate_partial_page(struct page *page, unsigned partial)
> --
> 1.7.7.6
Patches 02-05 look as if they suffer from the same defect;
I didn't look at the later ones, they may well be unaffected.
I can see that your series would change less if we were to declare that
01 is right and 06 is wrong; but I find consistency with zero_user_segment
too appealing, and the strange "stop" stuff here too unappealing.
Hugh
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 07/15] ext4: Take i_mutex before punching hole
2012-07-27 8:01 ` [PATCH 07/15] ext4: Take i_mutex before punching hole Lukas Czerner
2012-07-27 9:04 ` Lukáš Czerner
@ 2012-08-20 5:45 ` Hugh Dickins
1 sibling, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2012-08-20 5:45 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, linux-mmc
On Fri, 27 Jul 2012, Lukas Czerner wrote:
> Currently the allocation might happen in the punched range after the
> truncation and before the releasing the space of the range. This would
> lead to blocks being unallocated under the mapped buffer heads resulting
> in nasty bugs.
>
> With this commit we take i_mutex before going to do anything in the
> ext4_ext_punch_hole() preventing any write to happen while the hole
> punching is in progress. This will also allow us to ditch the writeout
> of dirty pages within the range.
>
> This commit was based on code provided by Zheng Liu, thanks!
I'm glad you have found that i_mutex really is needed here: it had
worried me that it was not taken, but I could only raise a concern,
didn't really know one way or the other.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
> fs/ext4/extents.c | 26 ++++++++++----------------
> 1 files changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 91341ec..2d6a216 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4755,9 +4755,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> loff_t first_page_offset, last_page_offset;
> int credits, err = 0;
>
> + mutex_lock(&inode->i_mutex);
> +
> /* No need to punch hole beyond i_size */
> if (offset >= inode->i_size)
> - return 0;
> + goto out1;
Note that this is wrong, but there is no reason why it should be you
to fix it in this patchset. Blocks may have been fallocated beyond
i_size, and they should be removed when a hole is punched there.
It's on the ext4 TODO list to be fixed, so don't worry about it:
unless your changes happen to make it trivial to fix at the same time.
>
> /*
> * If the hole extends beyond i_size, set the hole
> @@ -4775,18 +4777,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
> first_page_offset = first_page << PAGE_CACHE_SHIFT;
> last_page_offset = last_page << PAGE_CACHE_SHIFT;
>
> - /*
> - * Write out all dirty pages to avoid race conditions
> - * Then release them.
> - */
> - if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> - err = filemap_write_and_wait_range(mapping,
> - offset, offset + length - 1);
> -
> - if (err)
> - return err;
> - }
> -
It's not clear to me why that's now safe to remove: a little more comment
in the commit would be good; but so long as it's clear to ext4 developers,
don't try to make it clear to me - that would take far too long!
Hugh
> /* Now release the pages */
> if (last_page_offset > first_page_offset) {
> truncate_pagecache_range(inode, first_page_offset,
> @@ -4798,12 +4788,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>
> credits = ext4_writepage_trans_blocks(inode);
> handle = ext4_journal_start(inode, credits);
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + goto out1;
> + }
>
> err = ext4_orphan_add(handle, inode);
> if (err)
> - goto out;
> + goto out1;
>
> /*
> * Now we need to zero out the non-page-aligned data in the
> @@ -4893,6 +4885,8 @@ out:
> inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
> ext4_mark_inode_dirty(handle, inode);
> ext4_journal_stop(handle);
> +out1:
> + mutex_unlock(&inode->i_mutex);
> return err;
> }
> int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> --
> 1.7.7.6
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges
2012-08-20 4:52 ` Hugh Dickins
@ 2012-08-20 10:26 ` Lukáš Czerner
2012-08-20 15:47 ` Hugh Dickins
2012-08-20 15:53 ` Hugh Dickins
0 siblings, 2 replies; 29+ messages in thread
From: Lukáš Czerner @ 2012-08-20 10:26 UTC (permalink / raw)
To: Hugh Dickins
Cc: Lukas Czerner, linux-fsdevel, linux-ext4, tytso, linux-mmc,
Andrew Morton
On Sun, 19 Aug 2012, Hugh Dickins wrote:
> Date: Sun, 19 Aug 2012 21:52:48 -0700 (PDT)
> From: Hugh Dickins <hughd@google.com>
> To: Lukas Czerner <lczerner@redhat.com>
> Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, tytso@mit.edu,
> linux-mmc@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
> Subject: Re: [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle
> non page aligned ranges
>
> On Fri, 27 Jul 2012, Lukas Czerner wrote:
>
> > This commit changes truncate_inode_pages_range() so it can handle non
> > page aligned regions of the truncate. Currently we can hit BUG_ON when
> > the end of the range is not page aligned, but we can handle unaligned
> > start of the range.
> >
> > Being able to handle non page aligned regions of the page can help file
> > system punch_hole implementations and save some work, because once we're
> > holding the page we might as well deal with it right away.
> >
> > In order for this to work correctly, the caller must register an
> > invalidatepage_range address space operation, or rely solely on
> > block_invalidatepage_range(). That said, it will BUG_ON() if the caller
> > implements invalidatepage(), does not implement invalidatepage_range(),
> > and uses truncate_inode_pages_range() with an unaligned end of the range.
> >
> > This was based on the code provided by Hugh Dickins with some small
> > changes to make use of do_invalidatepage_range().
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Hugh Dickins <hughd@google.com>
>
> Acked-by: Hugh Dickins <hughd@google.com>
>
> This looks good to me. I like the way you provide the same args
> to do_invalidatepage_range() as to zero_user_segment():
>
> zero_user_segment(page, partial_start, top);
> if (page_has_private(page))
> do_invalidatepage_range(page, partial_start, top);
>
> Unfortunately, that is not what patches 01-05 are expecting...
Thanks for the review, Hugh. The fact is that the third argument of
invalidatepage_range() was meant to be a length, and the problem is
actually in this patch, where I am passing the end offset as the third
argument.
But you've made it clear that you would prefer the semantics
where the third argument is actually the end offset. Is that right?
If so, I'll change it accordingly; otherwise I'll just fix this
patch.
Thanks!
-Lukas
>
> Hugh
>
> > ---
> > mm/truncate.c | 77 +++++++++++++++++++++++++++++++++++++--------------------
> > 1 files changed, 50 insertions(+), 27 deletions(-)
> >
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index e29e5ea..1f6ea8b 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -71,14 +71,6 @@ void do_invalidatepage_range(struct page *page, unsigned long offset,
> > #endif
> > }
> >
> > -static inline void truncate_partial_page(struct page *page, unsigned partial)
> > -{
> > - zero_user_segment(page, partial, PAGE_CACHE_SIZE);
> > - cleancache_invalidate_page(page->mapping, page);
> > - if (page_has_private(page))
> > - do_invalidatepage(page, partial);
> > -}
> > -
> > /*
> > * This cancels just the dirty bit on the kernel page itself, it
> > * does NOT actually remove dirty bits on any mmap's that may be
> > @@ -212,8 +204,8 @@ int invalidate_inode_page(struct page *page)
> > * @lend: offset to which to truncate
> > *
> > * Truncate the page cache, removing the pages that are between
> > - * specified offsets (and zeroing out partial page
> > - * (if lstart is not page aligned)).
> > + * specified offsets (and zeroing out partial pages
> > + * if lstart or lend + 1 is not page aligned).
> > *
> > * Truncate takes two passes - the first pass is nonblocking. It will not
> > * block on page locks and it will not block on writeback. The second pass
> > @@ -224,35 +216,44 @@ int invalidate_inode_page(struct page *page)
> > * We pass down the cache-hot hint to the page freeing code. Even if the
> > * mapping is large, it is probably the case that the final pages are the most
> > * recently touched, and freeing happens in ascending file offset order.
> > + *
> > + * Note that it is able to handle cases where lend + 1 is not page aligned.
> > + * However in order for this to work caller have to register
> > + * invalidatepage_range address space operation or rely solely on
> > + * block_invalidatepage_range(). That said, do_invalidatepage_range() will
> > + * BUG_ON() if caller implements invalidatapage(), does not implement
> invalidatepage()
> > + * invalidatepage_range() and uses truncate_inode_pages_range() with lend + 1
> > + * unaligned to the page cache size.
> > */
> > void truncate_inode_pages_range(struct address_space *mapping,
> > loff_t lstart, loff_t lend)
> > {
> > - const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
> > - const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
> > + pgoff_t start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> > + pgoff_t end = (lend + 1) >> PAGE_CACHE_SHIFT;
> > + unsigned int partial_start = lstart & (PAGE_CACHE_SIZE - 1);
> > + unsigned int partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
> > struct pagevec pvec;
> > pgoff_t index;
> > - pgoff_t end;
> > int i;
> >
> > cleancache_invalidate_inode(mapping);
> > if (mapping->nrpages == 0)
> > return;
> >
> > - BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
> > - end = (lend >> PAGE_CACHE_SHIFT);
> > + if (lend == -1)
> > + end = -1; /* unsigned, so actually very big */
> >
> > pagevec_init(&pvec, 0);
> > index = start;
> > - while (index <= end && pagevec_lookup(&pvec, mapping, index,
> > - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> > + while (index < end && pagevec_lookup(&pvec, mapping, index,
> > + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
> > mem_cgroup_uncharge_start();
> > for (i = 0; i < pagevec_count(&pvec); i++) {
> > struct page *page = pvec.pages[i];
> >
> > /* We rely upon deletion not changing page->index */
> > index = page->index;
> > - if (index > end)
> > + if (index >= end)
> > break;
> >
> > if (!trylock_page(page))
> > @@ -271,27 +272,51 @@ void truncate_inode_pages_range(struct address_space *mapping,
> > index++;
> > }
> >
> > - if (partial) {
> > + if (partial_start) {
> > struct page *page = find_lock_page(mapping, start - 1);
> > if (page) {
> > + unsigned int top = PAGE_CACHE_SIZE;
> > + if (start > end) {
> > + top = partial_end;
> > + partial_end = 0;
> > + }
> > + wait_on_page_writeback(page);
> > + zero_user_segment(page, partial_start, top);
> > + cleancache_invalidate_page(mapping, page);
> > + if (page_has_private(page))
> > + do_invalidatepage_range(page, partial_start,
> > + top);
> > + unlock_page(page);
> > + page_cache_release(page);
> > + }
> > + }
> > + if (partial_end) {
> > + struct page *page = find_lock_page(mapping, end);
> > + if (page) {
> > wait_on_page_writeback(page);
> > - truncate_partial_page(page, partial);
> > + zero_user_segment(page, 0, partial_end);
> > + cleancache_invalidate_page(mapping, page);
> > + if (page_has_private(page))
> > + do_invalidatepage_range(page, 0,
> > + partial_end);
> > unlock_page(page);
> > page_cache_release(page);
> > }
> > }
> > + if (start >= end)
> > + return;
> >
> > index = start;
> > for ( ; ; ) {
> > cond_resched();
> > if (!pagevec_lookup(&pvec, mapping, index,
> > - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
> > + min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
> > if (index == start)
> > break;
> > index = start;
> > continue;
> > }
> > - if (index == start && pvec.pages[0]->index > end) {
> > + if (index == start && pvec.pages[0]->index >= end) {
> > pagevec_release(&pvec);
> > break;
> > }
> > @@ -301,7 +326,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
> >
> > /* We rely upon deletion not changing page->index */
> > index = page->index;
> > - if (index > end)
> > + if (index >= end)
> > break;
> >
> > lock_page(page);
> > @@ -646,10 +671,8 @@ void truncate_pagecache_range(struct inode *inode, loff_t lstart, loff_t lend)
> > * This rounding is currently just for example: unmap_mapping_range
> > * expands its hole outwards, whereas we want it to contract the hole
> > * inwards. However, existing callers of truncate_pagecache_range are
> > - * doing their own page rounding first; and truncate_inode_pages_range
> > - * currently BUGs if lend is not pagealigned-1 (it handles partial
> > - * page at start of hole, but not partial page at end of hole). Note
> > - * unmap_mapping_range allows holelen 0 for all, and we allow lend -1.
> > + * doing their own page rounding first. Note that unmap_mapping_range
> > + * allows holelen 0 for all, and we allow lend -1 for end of file.
> > */
> >
> > /*
> > --
> > 1.7.7.6
> >
> >
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Add invalidatepage_range address space operation
2012-08-20 4:43 ` Hugh Dickins
@ 2012-08-20 13:23 ` Theodore Ts'o
0 siblings, 0 replies; 29+ messages in thread
From: Theodore Ts'o @ 2012-08-20 13:23 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Lukas Czerner, linux-fsdevel, linux-ext4, linux-mmc
On Sun, Aug 19, 2012 at 09:43:55PM -0700, Hugh Dickins wrote:
> (I wonder if linux-mmc@vger.kernel.org was supposed to be
> linux-mm@kvack.org? Our apologies to linux-mmc if so.)
Whoops, I bet it was supposed to have been linux-mm@kvack.org -- and
that's probably why we didn't get comments from other mm developers. :-)
Lukas, can you confirm?
- Ted
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges
2012-08-20 10:26 ` Lukáš Czerner
@ 2012-08-20 15:47 ` Hugh Dickins
[not found] ` <50339e0d.69b2340a.50ba.ffff92bcSMTPIN_ADDED@mx.google.com>
2012-08-20 15:53 ` Hugh Dickins
1 sibling, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2012-08-20 15:47 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, linux-mm, Andrew Morton
On Mon, 20 Aug 2012, Lukas Czerner wrote:
> On Sun, 19 Aug 2012, Hugh Dickins wrote:
> >
> > This looks good to me. I like the way you provide the same args
> > to do_invalidatepage_range() as to zero_user_segment():
> >
> > zero_user_segment(page, partial_start, top);
> > if (page_has_private(page))
> > do_invalidatepage_range(page, partial_start, top);
> >
> > Unfortunately, that is not what patches 01-05 are expecting...
>
> Thanks for the review, Hugh. The fact is that the third argument of
> invalidatepage_range() was meant to be a length, and the problem is
> actually in this patch, where I am passing the end offset as the third
> argument.
>
> But you've made it clear that you would prefer the semantics
> where the third argument is actually the end offset. Is that right?
> If so, I'll change it accordingly; otherwise I'll just fix this
> patch.
I do get irritated by gratuitous differences between function calling
conventions, so yes, I liked that you (appeared to) follow
zero_user_segment() here.
However, I don't think my opinion and that precedent are very important
in this case. What do the VFS people think makes the most sensible
interface for ->invalidatepage_range()? page, startoffset-within-page,
length-within-page or page, startoffset-within-page, endoffset-within-page?
(where "within" may actually take you to the end of the page).
If they think 3rd arg should be length (and I'd still suggest unsigned
int for both 2nd and 3rd argument, to make it clearer that it's inside
the page, not an erroneous use of unsigned long for ssize_t or loff_t),
that's okay by me.
I can see advantages to length, actually: it's often unclear
whether "end" is of the "last-of-this" or "start-of-next" variety;
in most of mm we are consistent in using end in the start-of-next
sense, but here truncate_inode_pages_range() itself has gone for
the last-of-this meaning.
But even if you keep to length, you still need to go through patches 01-05,
changing block_invalidatepage() etc. to
block_invalidatepage_range(page, offset, PAGE_CACHE_SIZE - offset);
and removing (or more probably replacing by some BUG_ONs for now) the
strange "(stop < length)" stuff in the invalidatepage_range()s.
I do not think it's a good idea to be lenient about out-of-range args
there: that approach has already wasted time.
Hugh
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges
2012-08-20 10:26 ` Lukáš Czerner
2012-08-20 15:47 ` Hugh Dickins
@ 2012-08-20 15:53 ` Hugh Dickins
1 sibling, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2012-08-20 15:53 UTC (permalink / raw)
To: Lukas Czerner
Cc: Hugh Dickins, linux-fsdevel, linux-ext4, tytso, linux-mm,
Andrew Morton
Urrgh, now I messed up trying to correct linux-mm: resend to fix.
On Mon, 20 Aug 2012, Lukas Czerner wrote:
> On Sun, 19 Aug 2012, Hugh Dickins wrote:
> >
> > This looks good to me. I like the way you provide the same args
> > to do_invalidatepage_range() as to zero_user_segment():
> >
> > zero_user_segment(page, partial_start, top);
> > if (page_has_private(page))
> > do_invalidatepage_range(page, partial_start, top);
> >
> > Unfortunately, that is not what patches 01-05 are expecting...
>
> Thank for the review Hugh. The fact is that the third argument of
> the invalidatepage_range() was meant to be length and the problem is
> actually in this patch, where I am passing end offset as the third
> argument.
>
> But you've made it clear that you would prefer the semantics
> where the third argument is actually the end offset. Is that right?
> If so, I'll change it accordingly; otherwise I'll just fix this
> patch.
I do get irritated by gratuitous differences between function calling
conventions, so yes, I liked that you (appeared to) follow
zero_user_segment() here.
However, I don't think my opinion and that precedent are very important
in this case. What do the VFS people think makes the most sensible
interface for ->invalidatepage_range()? page, startoffset-within-page,
length-within-page or page, startoffset-within-page, endoffset-within-page?
(where "within" may actually take you to the end of the page).
If they think 3rd arg should be length (and I'd still suggest unsigned
int for both 2nd and 3rd argument, to make it clearer that it's inside
the page, not an erroneous use of unsigned long for ssize_t or loff_t),
that's okay by me.
I can see advantages to length, actually: it's often unclear
whether "end" is of the "last-of-this" or "start-of-next" variety;
in most of mm we are consistent in using end in the start-of-next
sense, but here truncate_inode_pages_range() itself has gone for
the last-of-this meaning.
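[Editor's note: a minimal sketch of the ambiguity Hugh describes, in plain C with an illustrative page size and hypothetical helper names; this is not kernel code.]

```c
#include <assert.h>

/* Illustrative only; the real PAGE_CACHE_SIZE is arch-dependent. */
#define PAGE_CACHE_SIZE 4096u

/* "length" convention: the range is [offset, offset + length) in the page. */
static unsigned int exclusive_end(unsigned int offset, unsigned int length)
{
	return offset + length;		/* "start-of-next" style end */
}

/* "last-of-this" convention, as truncate_inode_pages_range()'s lend argument
 * uses: the range is [offset, end] inclusive. */
static unsigned int length_from_inclusive_end(unsigned int offset,
					      unsigned int end)
{
	return end - offset + 1;	/* the off-by-one lives here */
}
```

Invalidating bytes 512..end-of-page is (page, 512, 3584) with length, (page, 512, 4096) with a start-of-next end, but (page, 512, 4095) with a last-of-this end; mixing the last two silently drops or doubles a byte.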
But even if you keep to length, you still need to go through patches 01-05,
changing block_invalidatepage() etc. to
block_invalidatepage_range(page, offset, PAGE_CACHE_SIZE - offset);
and removing (or more probably replacing by some BUG_ONs for now) the
strange "(stop < length)" stuff in the invalidatepage_range()s.
I do not think it's a good idea to be lenient about out-of-range args
there: that approach has already wasted time.
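[Editor's note: the wrapper shape suggested above can be sketched as follows; plain C with assert() standing in for BUG_ON(), an illustrative page size, and a return value added only so the effect is observable. Not the real buffer-head-walking implementation.]

```c
#include <assert.h>

#define PAGE_CACHE_SIZE 4096u

struct page;			/* opaque for this sketch */

/* Hypothetical range variant; the real one also walks buffer heads.
 * Strict in-page checks instead of being lenient about out-of-range
 * arguments (BUG_ON in the kernel, per the discussion above). */
static unsigned int block_invalidatepage_range(struct page *page,
					       unsigned int offset,
					       unsigned int length)
{
	assert(offset < PAGE_CACHE_SIZE);
	assert(length <= PAGE_CACHE_SIZE - offset);
	return offset + length;	/* exclusive end, just to observe it */
}

/* The old entry point becomes a thin wrapper that invalidates from
 * offset to the end of the page. */
static unsigned int block_invalidatepage(struct page *page,
					 unsigned int offset)
{
	return block_invalidatepage_range(page, offset,
					  PAGE_CACHE_SIZE - offset);
}
```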
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges
[not found] ` <50339e0d.69b2340a.50ba.ffff92bcSMTPIN_ADDED@mx.google.com>
@ 2012-08-21 18:44 ` Hugh Dickins
0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2012-08-21 18:44 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, linux-mm, Andrew Morton
On Tue, 21 Aug 2012, Lukas Czerner wrote:
> On Mon, 20 Aug 2012, Hugh Dickins wrote:
> >
> > I can see advantages to length, actually: it's often unclear
> > whether "end" is of the "last-of-this" or "start-of-next" variety;
> > in most of mm we are consistent in using end in the start-of-next
> > sense, but here truncate_inode_pages_range() itself has gone for
> > the last-of-this meaning.
>
> I really do agree with this paragraph and this is why I like the "length"
> argument better. So if there are no objections I'll stick with it and
> fix the other things you've pointed out.
Okay
Hugh
* Re: [PATCH 01/15] mm: add invalidatepage_range address space operation
[not found] ` <5033a999.0f403a0a.19c3.ffff95deSMTPIN_ADDED@mx.google.com>
@ 2012-08-21 18:59 ` Hugh Dickins
0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2012-08-21 18:59 UTC (permalink / raw)
To: Lukas Czerner; +Cc: linux-fsdevel, linux-ext4, tytso, linux-mm, Andrew Morton
On Tue, 21 Aug 2012, Lukas Czerner wrote:
> On Sun, 19 Aug 2012, Hugh Dickins wrote:
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -620,6 +620,8 @@ struct address_space_operations {
> > > /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
> > > sector_t (*bmap)(struct address_space *, sector_t);
> > > void (*invalidatepage) (struct page *, unsigned long);
> > > + void (*invalidatepage_range) (struct page *, unsigned long,
> > > + unsigned long);
> >
> > It may turn out to be bad advice, given how invalidatepage() already
> > takes an unsigned long, but I'd be tempted to make both of these args
> > unsigned int, since that helps to make it clearer that they're intended
> > to be offsets within a page, in the range 0..PAGE_CACHE_SIZE.
> >
> > (partial_start, partial_end and top in truncate_inode_pages_range()
> > are all unsigned int.)
>
> Hmm, this does not seem right. I can see that PAGE_CACHE_SIZE
> (PAGE_SIZE) can be defined as unsigned long, or am I missing
> something?
They would be defined as unsigned long so that they can be used in
masks like ~(PAGE_SIZE - 1), and behave as expected on addresses,
without needing casts to be added all over.
We do not (currently!) expect PAGE_SIZE or PAGE_CACHE_SIZE to grow
beyond an unsigned int - but indeed they can be larger than what's
held in an unsigned short (look no further than ia64 or ppc64).
For more reassurance, see include/linux/highmem.h, which declares
zero_user_segments() and others: unsigned int (well, unsigned with
the int implicit) for offsets within a page.
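[Editor's note: a sketch of why the unsigned long definition matters for address masks; illustrative PAGE_SHIFT and helper names, not kernel code. If PAGE_SIZE were a 32-bit unsigned int, ~(PAGE_SIZE - 1) would zero-extend to 0x00000000fffff000 when applied to a 64-bit address and clobber its high half.]

```c
#include <assert.h>

/* Illustrative values; the real ones come from the architecture. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)	/* unsigned long, so the mask keeps full width */

/* Page-aligned base of an address: the mask must be unsigned long,
 * otherwise zero-extension would clear the high bits of the address. */
static unsigned long page_base(unsigned long addr)
{
	return addr & ~(PAGE_SIZE - 1);
}

/* Offset within the page: always < PAGE_SIZE, so an unsigned int is wide
 * enough -- which is the point of using unsigned int for in-page offsets. */
static unsigned int page_offset_in(unsigned long addr)
{
	return (unsigned int)(addr & (PAGE_SIZE - 1));
}
```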
Hugh
> >
> > Andrew is very keen on naming arguments in prototypes,
> > and I think there is an especially strong case for it here.
end of thread, other threads: [~2012-08-21 18:59 UTC | newest]
Thread overview: 29+ messages -- links below jump to the message on this page --
2012-07-27 8:00 Add invalidatepage_range address space operation Lukas Czerner
2012-07-27 8:01 ` [PATCH 01/15] mm: add " Lukas Czerner
2012-08-20 5:24 ` Hugh Dickins
[not found] ` <5033a999.0f403a0a.19c3.ffff95deSMTPIN_ADDED@mx.google.com>
2012-08-21 18:59 ` Hugh Dickins
2012-07-27 8:01 ` [PATCH 02/15] jbd2: implement jbd2_journal_invalidatepage_range Lukas Czerner
2012-07-27 8:01 ` [PATCH 03/15] ext4: implement invalidatepage_range aop Lukas Czerner
2012-07-27 8:01 ` [PATCH 04/15] xfs: " Lukas Czerner
2012-07-27 8:01 ` [PATCH 05/15] ocfs2: " Lukas Czerner
2012-07-27 8:01 ` [PATCH 06/15] mm: teach truncate_inode_pages_range() to handle non page aligned ranges Lukas Czerner
2012-08-20 4:52 ` Hugh Dickins
2012-08-20 10:26 ` Lukáš Czerner
2012-08-20 15:47 ` Hugh Dickins
[not found] ` <50339e0d.69b2340a.50ba.ffff92bcSMTPIN_ADDED@mx.google.com>
2012-08-21 18:44 ` Hugh Dickins
2012-08-20 15:53 ` Hugh Dickins
2012-07-27 8:01 ` [PATCH 07/15] ext4: Take i_mutex before punching hole Lukas Czerner
2012-07-27 9:04 ` Lukáš Czerner
2012-07-27 12:08 ` Zheng Liu
2012-08-20 5:45 ` Hugh Dickins
2012-07-27 8:01 ` [PATCH 08/15] Revert "ext4: remove no longer used functions in inode.c" Lukas Czerner
2012-07-27 8:01 ` [PATCH 09/15] Revert "ext4: fix fsx truncate failure" Lukas Czerner
2012-07-27 8:01 ` [PATCH 10/15] ext4: use ext4_zero_partial_blocks in punch_hole Lukas Czerner
2012-07-27 8:01 ` [PATCH 11/15] ext4: remove unused discard_partial_page_buffers Lukas Czerner
2012-07-27 8:01 ` [PATCH 12/15] ext4: remove unused code from ext4_remove_blocks() Lukas Czerner
2012-07-27 8:01 ` [PATCH 13/15] ext4: update ext4_ext_remove_space trace point Lukas Czerner
2012-07-27 8:01 ` [PATCH 14/15] ext4: make punch hole code path work with bigalloc Lukas Czerner
2012-07-27 8:01 ` [PATCH 15/15] ext4: Allow punch hole with bigalloc enabled Lukas Czerner
2012-08-19 0:57 ` Add invalidatepage_range address space operation Theodore Ts'o
2012-08-20 4:43 ` Hugh Dickins
2012-08-20 13:23 ` Theodore Ts'o