[PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-03-21 10:11 [RFC, PATCH 0/6] xfs: delalloc, DIO " Dave Chinner
@ 2014-03-21 10:11 ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2014-03-21 10:11 UTC (permalink / raw)
  To: xfs; +Cc: Viro, viro, Al

From: Dave Chinner <dchinner@redhat.com>

When we are zeroing space andit is covered by a delalloc range, we
need to punch the delalloc range out before we truncate the page
cache. Failing to do so leaves and inconsistency between the page
cache and the extent tree, which we later trip over when doing
direct IO over the same range.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 13 ++++++++++++-
 fs/xfs/xfs_trace.h     |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 01f6a64..3235b74 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1418,6 +1418,8 @@ xfs_zero_file_space(
 	xfs_off_t		end_boundary;
 	int			error;
 
+	trace_xfs_zero_file_space(ip);
+
 	granularity = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
 
 	/*
@@ -1432,9 +1434,18 @@ xfs_zero_file_space(
 	ASSERT(end_boundary <= offset + len);
 
 	if (start_boundary < end_boundary - 1) {
-		/* punch out the page cache over the conversion range */
+		/*
+		 * punch out delayed allocation blocks and the page cache over
+		 * the conversion range
+		 */
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		error = xfs_bmap_punch_delalloc_range(ip,
+				XFS_B_TO_FSB(mp, start_boundary),
+				XFS_B_TO_FSB(mp, end_boundary - start_boundary));
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		truncate_pagecache_range(VFS_I(ip), start_boundary,
 					 end_boundary - 1);
+
 		/* convert the blocks */
 		error = xfs_alloc_file_space(ip, start_boundary,
 					end_boundary - start_boundary - 1,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index a4ae41c..65d8c79 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -603,6 +603,7 @@ DEFINE_INODE_EVENT(xfs_readlink);
 DEFINE_INODE_EVENT(xfs_inactive_symlink);
 DEFINE_INODE_EVENT(xfs_alloc_file_space);
 DEFINE_INODE_EVENT(xfs_free_file_space);
+DEFINE_INODE_EVENT(xfs_zero_file_space);
 DEFINE_INODE_EVENT(xfs_collapse_file_space);
 DEFINE_INODE_EVENT(xfs_readdir);
 #ifdef CONFIG_XFS_POSIX_ACL
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 0/6 v2] xfs: delalloc, dio and corruption...
@ 2014-04-10  5:00 Dave Chinner
  2014-04-10  5:00 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

Hi folks,

This is version 2 of the DIO vs delalloc patchset I posted here:

http://oss.sgi.com/archives/xfs/2014-03/msg00313.html

The changes to this version are:

	- the bug fix to patch 2 that Brain noticed,
	- I dropped the delalloc extent splittting patch because
	  with the fix to patch 2 I can't trigger that bug anymore,
	  and that patch was causing transaction overruns in
	  xfs/297. Hence without an existing reproducer, I won't try
	  to fix that problem.
	- the last patch is new, and is a bug in the collapse range
	  code where it fails to shift the last N extents correctly
	  if there are N delalloc extents before the shifted range.

With these 6 patches, all of the xfstests fsx/fsstress tests pass
on 1k, 2k and 4k block size filesystems, with and without CRCs
enabled, on 1, 2 and 16p test VMs.

I'm much happier with these patches now - I don't think that there
are more problems lurking, but only time will tell. I'd like to get
these fixes to Linus for 3.15 (probably for -rc2), so eyeballs and
testing would be appreciated.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] xfs: kill buffers over failed write ranges properly
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
@ 2014-04-10  5:00 ` Dave Chinner
  2014-04-10 10:32   ` Christoph Hellwig
  2014-04-10  5:00 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When a write fails, if we don't clear the delalloc flags from the
buffers over the failed range, they can persist beyond EOF and cause
problems. writeback will see the pages int eh page cache, see they
are dirty and continually retry the write, assuming that the page
beyond EOF is just racing with a truncate. The page will eventually
be released due to some other operation (e.g. direct IO), and it
will not pass through invalidation because it is dirty. Hence it
will be released with buffer_delay set on it, and trigger warnings
in xfs_vm_releasepage() and assert fail in xfs_file_aio_write_direct
because invalidation failed and we didn't write the corect amount.

This causes failures on block size < page size filesystems in fsx
and fsstress workloads run by xfstests.

Fix it by completely trashing any state on the buffer that could be
used to imply that it contains valid data when the delalloc range
over the buffer is punched out during the failed write handling.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 75df77d..282c726 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1566,6 +1566,16 @@ xfs_vm_write_failed(

 		xfs_vm_kill_delalloc_range(inode, block_offset,
 					   block_offset + bh->b_size);
+
+		/*
+		 * This buffer does not contain data anymore. make sure anyone
+		 * who finds it knows that for certain.
+		 */
+		clear_buffer_delay(bh);
+		clear_buffer_uptodate(bh);
+		clear_buffer_mapped(bh);
+		clear_buffer_new(bh);
+		clear_buffer_dirty(bh);
 	}

 }
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] xfs: write failure beyond EOF truncates too much data
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
  2014-04-10  5:00 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
@ 2014-04-10  5:00 ` Dave Chinner
  2014-04-10 10:35   ` Christoph Hellwig
  2014-04-10  5:00 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

If we fail a write beyond EOF and have to handle it in
xfs_vm_write_begin(), we truncate the inode back to the current inode
size. This doesn't take into account the fact that we may have
already made successful writes to the same page (in the case of block
size < page size) and hence we can truncate the page cache away from
blocks with valid data in them. If these blocks are delayed
allocation blocks, we now have a mismatch between the page cache and
the extent tree, and this will trigger - at minimum - a delayed
block count mismatch assert when the inode is evicted from the cache.
We can also trip over it when block mapping for direct IO - this is
the most common symptom seen from fsx and fsstress when run from
xfstests.

Fix it by only truncating away the exact range we are updating state
for in this write_begin call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 282c726..5f29693 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1609,12 +1609,21 @@ xfs_vm_write_begin(
 	status = __block_write_begin(page, pos, len, xfs_get_blocks);
 	if (unlikely(status)) {
 		struct inode	*inode = mapping->host;
+		size_t		isize = i_size_read(inode);

 		xfs_vm_write_failed(inode, page, pos, len);
 		unlock_page(page);

-		if (pos + len > i_size_read(inode))
-			truncate_pagecache(inode, i_size_read(inode));
+		/*
+		 * If the write is beyond EOF, we only want to kill blocks
+		 * allocated in this write, not blocks that were previously
+		 * written successfully.
+		 */
+		if (pos + len > isize) {
+			ssize_t start = max_t(ssize_t, pos, isize);
+
+			truncate_pagecache_range(inode, start, pos + len);
+		}

 		page_cache_release(page);
 		page = NULL;
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
  2014-04-10  5:00 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
  2014-04-10  5:00 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
@ 2014-04-10  5:00 ` Dave Chinner
  2014-04-10 10:35   ` Christoph Hellwig
  2014-04-10  5:00 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Similar to the write_begin problem, xfs-vm_write_end will truncate
back to the old EOF, potentially removing page cache from over the
top of delalloc blocks with valid data in them. Fix this by
truncating back to just the start of the failed write.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5f29693..e0a7931 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1634,9 +1634,12 @@ xfs_vm_write_begin(
 }
 
 /*
- * On failure, we only need to kill delalloc blocks beyond EOF because they
- * will never be written. For blocks within EOF, generic_write_end() zeros them
- * so they are safe to leave alone and be written with all the other valid data.
+ * On failure, we only need to kill delalloc blocks beyond EOF in the range of
+ * this specific write because they will never be written. Previous writes
+ * beyond EOF where block allocation succeeded do not need to be trashed, so
+ * only new blocks from this write should be trashed. For blocks within
+ * EOF, generic_write_end() zeros them so they are safe to leave alone and be
+ * written with all the other valid data.
  */
 STATIC int
 xfs_vm_write_end(
@@ -1659,8 +1662,11 @@ xfs_vm_write_end(
 		loff_t		to = pos + len;
 
 		if (to > isize) {
-			truncate_pagecache(inode, isize);
+			/* only kill blocks in this write beyond EOF */
+			if (pos > isize)
+				isize = pos;
 			xfs_vm_kill_delalloc_range(inode, isize, to);
+			truncate_pagecache_range(inode, isize, to);
 		}
 	}
 	return ret;
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
                   ` (2 preceding siblings ...)
  2014-04-10  5:00 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
@ 2014-04-10  5:00 ` Dave Chinner
  2014-04-10 10:40   ` Christoph Hellwig
  2014-04-10  5:00 ` [PATCH 5/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we are zeroing space andit is covered by a delalloc range, we
need to punch the delalloc range out before we truncate the page
cache. Failing to do so leaves and inconsistency between the page
cache and the extent tree, which we later trip over when doing
direct IO over the same range.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 13 ++++++++++++-
 fs/xfs/xfs_trace.h     |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 01f6a64..3235b74 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1418,6 +1418,8 @@ xfs_zero_file_space(
 	xfs_off_t		end_boundary;
 	int			error;
 
+	trace_xfs_zero_file_space(ip);
+
 	granularity = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
 
 	/*
@@ -1432,9 +1434,18 @@ xfs_zero_file_space(
 	ASSERT(end_boundary <= offset + len);
 
 	if (start_boundary < end_boundary - 1) {
-		/* punch out the page cache over the conversion range */
+		/*
+		 * punch out delayed allocation blocks and the page cache over
+		 * the conversion range
+		 */
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		error = xfs_bmap_punch_delalloc_range(ip,
+				XFS_B_TO_FSB(mp, start_boundary),
+				XFS_B_TO_FSB(mp, end_boundary - start_boundary));
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		truncate_pagecache_range(VFS_I(ip), start_boundary,
 					 end_boundary - 1);
+
 		/* convert the blocks */
 		error = xfs_alloc_file_space(ip, start_boundary,
 					end_boundary - start_boundary - 1,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index a4ae41c..65d8c79 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -603,6 +603,7 @@ DEFINE_INODE_EVENT(xfs_readlink);
 DEFINE_INODE_EVENT(xfs_inactive_symlink);
 DEFINE_INODE_EVENT(xfs_alloc_file_space);
 DEFINE_INODE_EVENT(xfs_free_file_space);
+DEFINE_INODE_EVENT(xfs_zero_file_space);
 DEFINE_INODE_EVENT(xfs_collapse_file_space);
 DEFINE_INODE_EVENT(xfs_readdir);
 #ifdef CONFIG_XFS_POSIX_ACL
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] xfs: don't map ranges that span EOF for direct IO
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
                   ` (3 preceding siblings ...)
  2014-04-10  5:00 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
@ 2014-04-10  5:00 ` Dave Chinner
  2014-04-10 10:40   ` Christoph Hellwig
  2014-04-10  5:00 ` [PATCH 6/6] xfs: collapse range is delalloc challenged Dave Chinner
  2014-04-11 13:10 ` [PATCH 0/6 v2] xfs: delalloc, dio and corruption Brian Foster
  6 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Al Viro tracked down the problem that has caused generic/263 to fail
on XFS since the test was introduced. If is caused by
xfs_get_blocks() mapping a single extent that spans EOF without
marking it as buffer-new() so that the direct Io code does not zero
the tail of the block at the new EOF. This is a long standing bug
that has been around for many, many years.

Because xfs_get_blocks() starts the map before EOF, it can't set
buffer_new(), because that causes he direct Io code to also zero
unaligned sectors at the head of the IO. This would overwrite valid
data with zeros, and hence we cannot validly return a single extent
that spans EOF to direct IO.

Fix this by detecting a mapping that spans EOF and truncate it down
to EOF. This results in the the direct IO code doing the right thing
for unaligned data blocks before EOF, and then returning to get
another mapping for the region beyond EOF which XFS treats correctly
by setting buffer_new() on it. This makes direct Io behave correctly
w.r.t. tail block zeroing beyond EOF, and fsx is happy about that.

Again, thanks to Al Viro for finding what I couldn't.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e0a7931..2c13967 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1344,6 +1344,14 @@ __xfs_get_blocks(
 	/*
 	 * If this is O_DIRECT or the mpage code calling tell them how large
 	 * the mapping is, so that we can avoid repeated get_blocks calls.
+	 *
+	 * If the mapping spans EOF, then we have to break the mapping up as the
+	 * mapping for blocks beyond EOF must be marked new so that sub block
+	 * regions can be correctly zeroed. We can't do this for mappings within
+	 * EOF unless the mapping was just allocated or is unwritten, otherwise
+	 * the callers would overwrite existing data with zeros. Hence we have
+	 * to split the mapping into a range up to and including EOF, and a
+	 * second mapping for beyond EOF.
 	 */
 	if (direct || size > (1 << inode->i_blkbits)) {
 		xfs_off_t		mapping_size;
@@ -1354,6 +1362,12 @@ __xfs_get_blocks(
 		ASSERT(mapping_size > 0);
 		if (mapping_size > size)
 			mapping_size = size;
+		if (offset < i_size_read(inode) &&
+		    offset + mapping_size >= i_size_read(inode)) {
+			/* limit mapping to block that spans EOF */
+			mapping_size = roundup(i_size_read(inode) - offset,
+					       1 << inode->i_blkbits);
+		}
 		if (mapping_size > LONG_MAX)
 			mapping_size = LONG_MAX;
 
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] xfs: collapse range is delalloc challenged.
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
                   ` (4 preceding siblings ...)
  2014-04-10  5:00 ` [PATCH 5/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
@ 2014-04-10  5:00 ` Dave Chinner
  2014-04-10 10:44   ` Christoph Hellwig
  2014-04-11 13:10 ` [PATCH 0/6 v2] xfs: delalloc, dio and corruption Brian Foster
  6 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10  5:00 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

FSX has been detecting data corruption after to collapse range
calls. The key observation is that the offset of the last extent in
the file was not being shifted, and hence when the file size was
adjusted it was truncating away data because the extents handled
been correctly shifted.

Tracing indicated that before the collapse, the extent list looked
like:

....
ino 0x5788 state  idx 6 offset 26 block 195904 count 10 flag 0
ino 0x5788 state  idx 7 offset 39 block 195917 count 35 flag 0
ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0

and after the shift of 2 blocks:

ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0

Note that the last extent did not change offset. After the changing
of the file size:

ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
ino 0x5788 state  idx 8 offset 86 block 195964 count 30 flag 0

You can see that the last extent had it's length truncated,
indicating that we've lost data.

The reason for this is that the xfs_bmap_shift_extents() loop uses
XFS_IFORK_NEXTENTS() to determine how many extents are in the inode.
This, unfortunately, doesn't take into account delayed allocation
extents - it's a count of physically allocated extents - and hence
when the file being collapsed has a delalloc extent like this one
does prior to the range being collapsed:

....
ino 0x5788 state  idx 4 offset 11 block 4503599627239429 count 1 flag 0
....

it gets the count wrong and terminates the shift loop early.

Fix it by using the in-memory extent array size that includes
delayed allocation extents to determine the number of extents on the
inode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 5b6092e..f0efc7e 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5413,6 +5413,7 @@ xfs_bmap_shift_extents(
 	int				whichfork = XFS_DATA_FORK;
 	int				logflags;
 	xfs_filblks_t			blockcount = 0;
+	int				total_extents;
 
 	if (unlikely(XFS_TEST_ERROR(
 	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
@@ -5429,7 +5430,6 @@ xfs_bmap_shift_extents(
 	ASSERT(current_ext != NULL);
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
-
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		/* Read in all the extents */
 		error = xfs_iread_extents(tp, ip, whichfork);
@@ -5456,7 +5456,6 @@ xfs_bmap_shift_extents(
 
 	/* We are going to change core inode */
 	logflags = XFS_ILOG_CORE;
-
 	if (ifp->if_flags & XFS_IFBROOT) {
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
 		cur->bc_private.b.firstblock = *firstblock;
@@ -5467,8 +5466,14 @@ xfs_bmap_shift_extents(
 		logflags |= XFS_ILOG_DEXT;
 	}
 
-	while (nexts++ < num_exts &&
-	       *current_ext <  XFS_IFORK_NEXTENTS(ip, whichfork)) {
+	/*
+	 * There may be delalloc extents in the data fork before the range we
+	 * are collapsing out, so we cannot
+	 * use the count of real extents here. Instead we have to calculate it
+	 * from the incore fork.
+	 */
+	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
+	while (nexts++ < num_exts && *current_ext < total_extents) {
 
 		gotp = xfs_iext_get_ext(ifp, *current_ext);
 		xfs_bmbt_get_all(gotp, &got);
@@ -5556,10 +5561,11 @@ xfs_bmap_shift_extents(
 		}
 
 		(*current_ext)++;
+		total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
 	}
 
 	/* Check if we are done */
-	if (*current_ext ==  XFS_IFORK_NEXTENTS(ip, whichfork))
+	if (*current_ext == total_extents)
 		*done = 1;
 
 del_cursor:
@@ -5568,6 +5574,5 @@ del_cursor:
 			error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
 
 	xfs_trans_log_inode(tp, ip, logflags);
-
 	return error;
 }
-- 
1.9.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] xfs: kill buffers over failed write ranges properly
  2014-04-10  5:00 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
@ 2014-04-10 10:32   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 10:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/6] xfs: write failure beyond EOF truncates too much data
  2014-04-10  5:00 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
@ 2014-04-10 10:35   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 10:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure
  2014-04-10  5:00 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
@ 2014-04-10 10:35   ` Christoph Hellwig
  2014-04-14  8:13     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 10:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Apr 10, 2014 at 03:00:50PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Similar to the write_begin problem, xfs-vm_write_end will truncate
> back to the old EOF, potentially removing page cache from over the
> top of delalloc blocks with valid data in them. Fix this by
> truncating back to just the start of the failed write.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Any reason you used max_t in the previous one, and opencode it here?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-04-10  5:00 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
@ 2014-04-10 10:40   ` Christoph Hellwig
  2014-04-10 12:22     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 10:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Apr 10, 2014 at 03:00:51PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we are zeroing space andit is covered by a delalloc range, we
> need to punch the delalloc range out before we truncate the page
> cache. Failing to do so leaves and inconsistency between the page
> cache and the extent tree, which we later trip over when doing
> direct IO over the same range.

Looks good.

Which test found this?

> @@ -1432,9 +1434,18 @@ xfs_zero_file_space(
>  	ASSERT(end_boundary <= offset + len);
>  
>  	if (start_boundary < end_boundary - 1) {
> -		/* punch out the page cache over the conversion range */
> +		/*
> +		 * punch out delayed allocation blocks and the page cache over
> +		 * the conversion range
> +		 */
> +		xfs_ilock(ip, XFS_ILOCK_EXCL);
> +		error = xfs_bmap_punch_delalloc_range(ip,
> +				XFS_B_TO_FSB(mp, start_boundary),

Shouldn't this be XFS_B_TO_FSBT?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] xfs: don't map ranges that span EOF for direct IO
  2014-04-10  5:00 ` [PATCH 5/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
@ 2014-04-10 10:40   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 10:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] xfs: collapse range is delalloc challenged.
  2014-04-10  5:00 ` [PATCH 6/6] xfs: collapse range is delalloc challenged Dave Chinner
@ 2014-04-10 10:44   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 10:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> -	       *current_ext <  XFS_IFORK_NEXTENTS(ip, whichfork)) {
> +	/*
> +	 * There may be delalloc extents in the data fork before the range we
> +	 * are collapsing out, so we cannot
> +	 * use the count of real extents here. Instead we have to calculate it
> +	 * from the incore fork.
> +	 */
> +	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);

Between the amount of times we have this calculation opencoded, and the
confusing nature of XFS_IFORK_NEXTENTS it might be time to introduce
a macro to get the proper number of incore extents.

But no need to do this now, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-04-10 10:40   ` Christoph Hellwig
@ 2014-04-10 12:22     ` Dave Chinner
  2014-04-10 12:33       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10 12:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Apr 10, 2014 at 03:40:01AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 10, 2014 at 03:00:51PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we are zeroing space andit is covered by a delalloc range, we
> > need to punch the delalloc range out before we truncate the page
> > cache. Failing to do so leaves and inconsistency between the page
> > cache and the extent tree, which we later trip over when doing
> > direct IO over the same range.
> 
> Looks good.
> 
> Which test found this?

The fsstress tests - any of them will do. It showed after I merged
Lukas' FALLOC_FL_ZERO_RANGE support patches for fsstress and fsx
into xfstests, but it wasn't reliably triggering until I fixed the
__xfs_get_blocks DIO bug that Al found...

> > @@ -1432,9 +1434,18 @@ xfs_zero_file_space(
> >  	ASSERT(end_boundary <= offset + len);
> >  
> >  	if (start_boundary < end_boundary - 1) {
> > -		/* punch out the page cache over the conversion range */
> > +		/*
> > +		 * punch out delayed allocation blocks and the page cache over
> > +		 * the conversion range
> > +		 */
> > +		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +		error = xfs_bmap_punch_delalloc_range(ip,
> > +				XFS_B_TO_FSB(mp, start_boundary),
> 
> Shouldn't this be XFS_B_TO_FSBT?

XFS_B_TO_FSB gives the same result given that we've already rounded
start_boundary to block granularity....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-04-10 12:22     ` Dave Chinner
@ 2014-04-10 12:33       ` Christoph Hellwig
  2014-04-10 22:35         ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-10 12:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Apr 10, 2014 at 10:22:59PM +1000, Dave Chinner wrote:
> XFS_B_TO_FSB gives the same result given that we've already rounded
> start_boundary to block granularity....

It does, but XFS_B_TO_FSBT is one less operation, and it's the better
example for a start offset when people start copy & pasting it.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-04-10 12:33       ` Christoph Hellwig
@ 2014-04-10 22:35         ` Dave Chinner
  2014-04-11  7:34           ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2014-04-10 22:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Apr 10, 2014 at 05:33:32AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 10, 2014 at 10:22:59PM +1000, Dave Chinner wrote:
> > XFS_B_TO_FSB gives the same result given that we've already rounded
> > start_boundary to block granularity....
> 
> It does, but XFS_B_TO_FSBT is one less operation, and it's the better
> example for a start offset when people start copy & pasting it.

*nod*.

Fixed it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks
  2014-04-10 22:35         ` Dave Chinner
@ 2014-04-11  7:34           ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2014-04-11  7:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, Apr 11, 2014 at 08:35:46AM +1000, Dave Chinner wrote:
> Fixed it.

Reviewed-by: Christoph Hellwig <hch@lst.de>

with that little tweak.  And yes, we'll need that series ASAP - I not
hit the asserts everytime I do an xfstests run.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6 v2] xfs: delalloc, dio and corruption...
  2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
                   ` (5 preceding siblings ...)
  2014-04-10  5:00 ` [PATCH 6/6] xfs: collapse range is delalloc challenged Dave Chinner
@ 2014-04-11 13:10 ` Brian Foster
  6 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2014-04-11 13:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Apr 10, 2014 at 03:00:47PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> This is version 2 of the DIO vs delalloc patchset I posted here:
> 
> http://oss.sgi.com/archives/xfs/2014-03/msg00313.html
> 
> The changes to this version are:
> 
> 	- the bug fix to patch 2 that Brain noticed,
> 	- I dropped the delalloc extent splittting patch because
> 	  with the fix to patch 2 I can't trigger that bug anymore,
> 	  and that patch was causing transaction overruns in
> 	  xfs/297. Hence without an existing reproducer, I won't try
> 	  to fix that problem.
> 	- the last patch is new, and is a bug in the collapse range
> 	  code where it fails to shift the last N extents correctly
> 	  if there are N delalloc extents before the shifted range.
> 
> With these 6 patches, all of the xfstests fsx/fsstress tests pass
> on 1k, 2k and 4k block size filesystems, with and without CRCs
> enabled, on 1, 2 and 16p test VMs.
> 
> I'm much happier with these patches now - I don't think that there
> are more problems lurking, but only time will tell. I'd like to get
> these fixes to Linus for 3.15 (probably for -rc2), so eyeballs and
> testing would be appreciated.
> 

The series looks pretty good to me now with the latest fix. I was
previously tripping all over the delalloc asserts. With this set (and
also running with finobt enabled), my tests ran clean on a 4k fs.

I've seen generic/270 still hit an assert once or twice on a 1k block fs
(as noted on irc), but that's proven rather difficult to reproduce. I'll
continue beating on it a bit, but otherwise:

Tested-by: Brian Foster <bfoster@redhat.com>

Brian

> Cheers,
> 
> Dave.
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure
  2014-04-10 10:35   ` Christoph Hellwig
@ 2014-04-14  8:13     ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2014-04-14  8:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Apr 10, 2014 at 03:35:55AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 10, 2014 at 03:00:50PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Similar to the write_begin problem, xfs-vm_write_end will truncate
> > back to the old EOF, potentially removing page cache from over the
> > top of delalloc blocks with valid data in them. Fix this by
> > truncating back to just the start of the failed write.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Any reason you used max_t in the previous one, and opencode it here?

No, just made the changes at different times. I'll fix them both to
use max_t....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-04-14  8:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-10  5:00 [PATCH 0/6 v2] xfs: delalloc, dio and corruption Dave Chinner
2014-04-10  5:00 ` [PATCH 1/6] xfs: kill buffers over failed write ranges properly Dave Chinner
2014-04-10 10:32   ` Christoph Hellwig
2014-04-10  5:00 ` [PATCH 2/6] xfs: write failure beyond EOF truncates too much data Dave Chinner
2014-04-10 10:35   ` Christoph Hellwig
2014-04-10  5:00 ` [PATCH 3/6] xfs: xfs_vm_write_end truncates too much on failure Dave Chinner
2014-04-10 10:35   ` Christoph Hellwig
2014-04-14  8:13     ` Dave Chinner
2014-04-10  5:00 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner
2014-04-10 10:40   ` Christoph Hellwig
2014-04-10 12:22     ` Dave Chinner
2014-04-10 12:33       ` Christoph Hellwig
2014-04-10 22:35         ` Dave Chinner
2014-04-11  7:34           ` Christoph Hellwig
2014-04-10  5:00 ` [PATCH 5/6] xfs: don't map ranges that span EOF for direct IO Dave Chinner
2014-04-10 10:40   ` Christoph Hellwig
2014-04-10  5:00 ` [PATCH 6/6] xfs: collapse range is delalloc challenged Dave Chinner
2014-04-10 10:44   ` Christoph Hellwig
2014-04-11 13:10 ` [PATCH 0/6 v2] xfs: delalloc, dio and corruption Brian Foster
  -- strict thread matches above, loose matches on Subject: below --
2014-03-21 10:11 [RFC, PATCH 0/6] xfs: delalloc, DIO " Dave Chinner
2014-03-21 10:11 ` [PATCH 4/6] xfs: zeroing space needs to punch delalloc blocks Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox