* [PATCH v2 0/3] fix up indlen reservations on extent split
@ 2016-02-29 14:29 Brian Foster
2016-02-29 14:29 ` [PATCH RFC 1/3] xfs: debug mode forced buffered write failure Brian Foster
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Brian Foster @ 2016-02-29 14:29 UTC (permalink / raw)
To: xfs
Hi all,
This is a resurrection of an old fix for the indirect delalloc
reservation split problem. The last version apparently fell through the
cracks. The core problem and fix is the same and is described in patch
3.
The original problem is not as reproducible as it was since the last
version of this patch. The original zero range reproducer doesn't work
because zero range has since been updated to flush and invalidate the
affected range rather than kill any delayed allocation blocks in the in
core extent tree. The side effect of this is that the problem does not
currently have a clear reproducer, but the indirect reservation
management code is still incorrect nonetheless.
As a result, I've prepended an RFC test instrumentation patch that can
help induce the problem[1]. I've marked the patch RFC simply because it
is hacky and probably up in the air as to whether it is merge worthy. I
wanted to have _something_ to help reproduce the problem and verify the
fix, however, hence it is included here. I'm fine with either merging it
or using it as a one-off verification and dropping it. Also, any other
ideas for a more simple/elegant reproducer are welcome.
Thoughts, reviews, flames appreciated.
Brian
[1] An update to the original xfstests test is also required. I'll send
that update in a reply to this cover letter shortly.
v2:
- Rebase to latest for-next branch.
- Include RFC test instrumentation patch.
v1: http://oss.sgi.com/archives/xfs/2014-10/msg00294.html
- xfs_bunmapi() code into independent patch.
- Refactor fix into separate helper function.
rfc: http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
Brian Foster (3):
xfs: debug mode forced buffered write failure
xfs: update icsb freeblocks counter after extent deletion
xfs: borrow indirect blocks from freed extent when available
fs/xfs/libxfs/xfs_bmap.c | 158 ++++++++++++++++++++++++++++++++++-------------
fs/xfs/xfs_aops.c | 9 ++-
fs/xfs/xfs_mount.h | 9 +++
fs/xfs/xfs_sysfs.c | 78 ++++++++++++++++++++---
4 files changed, 200 insertions(+), 54 deletions(-)
--
2.4.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH RFC 1/3] xfs: debug mode forced buffered write failure
2016-02-29 14:29 [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
@ 2016-02-29 14:29 ` Brian Foster
2016-03-01 12:55 ` Christoph Hellwig
2016-02-29 14:29 ` [PATCH 2/3] xfs: update icsb freeblocks counter after extent deletion Brian Foster
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2016-02-29 14:29 UTC (permalink / raw)
To: xfs
Add a DEBUG mode-only sysfs knob to enable forced buffered write
failure. An additional side effect of this mode is brute force killing
of delayed allocation blocks in the range of the write. The latter is
the prime motiviation behind this patch, as userspace test
infrastructure requires a reliable mechanism to create and split
delalloc extents without causing extent conversion.
Certain fallocate operations (i.e., zero range) were used for this in
the past, but the implementations have changed such that delalloc
extents are flushed and converted to real blocks, rendering the test
useless.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/xfs_aops.c | 9 ++++++-
fs/xfs/xfs_mount.h | 9 +++++++
fs/xfs/xfs_sysfs.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 85 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 7a467b3..39968f1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1473,6 +1473,7 @@ xfs_vm_write_failed(
loff_t from = pos & (PAGE_CACHE_SIZE - 1);
loff_t to = from + len;
struct buffer_head *bh, *head;
+ struct xfs_mount *mp = XFS_I(inode)->i_mount;
/*
* The request pos offset might be 32 or 64 bit, this is all fine
@@ -1514,7 +1515,8 @@ xfs_vm_write_failed(
if (!buffer_delay(bh) && !buffer_unwritten(bh))
continue;
- if (!buffer_new(bh) && block_offset < i_size_read(inode))
+ if (!mp->m_fail_writes && !buffer_new(bh) &&
+ block_offset < i_size_read(inode))
continue;
if (buffer_delay(bh))
@@ -1554,6 +1556,7 @@ xfs_vm_write_begin(
pgoff_t index = pos >> PAGE_CACHE_SHIFT;
struct page *page;
int status;
+ struct xfs_mount *mp = XFS_I(mapping->host)->i_mount;
ASSERT(len <= PAGE_CACHE_SIZE);
@@ -1562,6 +1565,8 @@ xfs_vm_write_begin(
return -ENOMEM;
status = __block_write_begin(page, pos, len, xfs_get_blocks);
+ if (mp->m_fail_writes)
+ status = -EIO;
if (unlikely(status)) {
struct inode *inode = mapping->host;
size_t isize = i_size_read(inode);
@@ -1574,6 +1579,8 @@ xfs_vm_write_begin(
* allocated in this write, not blocks that were previously
* written successfully.
*/
+ if (mp->m_fail_writes)
+ isize = 0;
if (pos + len > isize) {
ssize_t start = max_t(ssize_t, pos, isize);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index a4e03ab..083a32a 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -147,6 +147,15 @@ typedef struct xfs_mount {
* to various other kinds of pain inflicted on the pNFS server.
*/
__uint32_t m_generation;
+
+ /*
+ * DEBUG mode instrumentation to test and/or trigger delayed allocation
+ * block killing in the event of failed writes. When enabled, all
+ * buffered writes are forced to fail. All delalloc blocks in the range
+ * of the write (including pre-existing delalloc blocks!) are tossed as
+ * part of the write failure error handling sequence.
+ */
+ bool m_fail_writes;
} xfs_mount_t;
/*
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 641d625..5f2d28d 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -22,6 +22,9 @@
#include "xfs_log.h"
#include "xfs_log_priv.h"
#include "xfs_stats.h"
+#include "libxfs/xfs_trans_resv.h"
+#include "libxfs/xfs_format.h"
+#include "xfs_mount.h"
struct xfs_sysfs_attr {
struct attribute attr;
@@ -45,16 +48,6 @@ to_attr(struct attribute *attr)
#define ATTR_LIST(name) &xfs_sysfs_attr_##name.attr
-/*
- * xfs_mount kobject. This currently has no attributes and thus no need for show
- * and store helpers. The mp kobject serves as the per-mount parent object that
- * is identified by the fsname under sysfs.
- */
-
-struct kobj_type xfs_mp_ktype = {
- .release = xfs_sysfs_release,
-};
-
STATIC ssize_t
xfs_sysfs_object_show(
struct kobject *kobject,
@@ -83,6 +76,71 @@ static const struct sysfs_ops xfs_sysfs_ops = {
.store = xfs_sysfs_object_store,
};
+/*
+ * xfs_mount kobject. The mp kobject also serves as the per-mount parent object
+ * that is identified by the fsname under sysfs.
+ */
+
+static inline struct xfs_mount *
+to_mp(struct kobject *kobject)
+{
+ struct xfs_kobj *kobj = to_kobj(kobject);
+
+ return container_of(kobj, struct xfs_mount, m_kobj);
+}
+
+#ifdef DEBUG
+
+STATIC ssize_t
+fail_writes_store(
+ struct kobject *kobject,
+ const char *buf,
+ size_t count)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+ int ret;
+ int val;
+
+ ret = kstrtoint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ if (val == 1)
+ mp->m_fail_writes = true;
+ else if (val == 0)
+ mp->m_fail_writes = false;
+ else
+ return -EINVAL;
+
+ return count;
+}
+
+STATIC ssize_t
+fail_writes_show(
+ struct kobject *kobject,
+ char *buf)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_fail_writes ? 1 : 0);
+}
+XFS_SYSFS_ATTR_RW(fail_writes);
+
+#endif /* DEBUG */
+
+static struct attribute *xfs_mp_attrs[] = {
+#ifdef DEBUG
+ ATTR_LIST(fail_writes),
+#endif
+ NULL,
+};
+
+struct kobj_type xfs_mp_ktype = {
+ .release = xfs_sysfs_release,
+ .sysfs_ops = &xfs_sysfs_ops,
+ .default_attrs = xfs_mp_attrs,
+};
+
#ifdef DEBUG
/* debug */
--
2.4.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] xfs: update icsb freeblocks counter after extent deletion
2016-02-29 14:29 [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
2016-02-29 14:29 ` [PATCH RFC 1/3] xfs: debug mode forced buffered write failure Brian Foster
@ 2016-02-29 14:29 ` Brian Foster
2016-03-01 12:48 ` Christoph Hellwig
2016-02-29 14:29 ` [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available Brian Foster
2016-02-29 14:36 ` [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
3 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2016-02-29 14:29 UTC (permalink / raw)
To: xfs
xfs_bunmapi() currently updates the icsb fdblocks counter, unreserves
quota, etc. before the extent is deleted by xfs_bmap_del_extent(). The
function has problems dividing up the indirect reserved blocks for
scenarios where a single delalloc extent is split in two. Particularly,
there aren't always enough blocks reserved for multiple extents in a
single extent reservation.
The solution to this problem is to allow the extent removal code to
steal from the deleted extent to meet indirect reservation requirements.
Move the block of code in xfs_bmapi() that updates the icsb fdblocks
counter to after the call to xfs_bmap_del_extent() to allow the codepath
to update the extent record before the free blocks are accounted. Also,
reshuffle the code slightly so the delalloc accounting occurs near the
xfs_bmap_del_extent() call to provide context for the comments.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/libxfs/xfs_bmap.c | 58 ++++++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6a05166..7180104 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5296,31 +5296,7 @@ xfs_bunmapi(
goto nodelete;
}
}
- if (wasdel) {
- ASSERT(startblockval(del.br_startblock) > 0);
- /* Update realtime/data freespace, unreserve quota */
- if (isrt) {
- xfs_filblks_t rtexts;
- rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
- do_div(rtexts, mp->m_sb.sb_rextsize);
- xfs_mod_frextents(mp, (int64_t)rtexts);
- (void)xfs_trans_reserve_quota_nblks(NULL,
- ip, -((long)del.br_blockcount), 0,
- XFS_QMOPT_RES_RTBLKS);
- } else {
- xfs_mod_fdblocks(mp, (int64_t)del.br_blockcount,
- false);
- (void)xfs_trans_reserve_quota_nblks(NULL,
- ip, -((long)del.br_blockcount), 0,
- XFS_QMOPT_RES_REGBLKS);
- }
- ip->i_delayed_blks -= del.br_blockcount;
- if (cur)
- cur->bc_private.b.flags |=
- XFS_BTCUR_BPRV_WASDEL;
- } else if (cur)
- cur->bc_private.b.flags &= ~XFS_BTCUR_BPRV_WASDEL;
/*
* If it's the case where the directory code is running
* with no block reservation, and the deleted block is in
@@ -5342,11 +5318,45 @@ xfs_bunmapi(
error = -ENOSPC;
goto error0;
}
+
+ /*
+ * Unreserve quota and update realtime free space, if
+ * appropriate. If delayed allocation, update the inode delalloc
+ * counter now and wait to update the sb counters as
+ * xfs_bmap_del_extent() might need to borrow some blocks.
+ */
+ if (wasdel) {
+ ASSERT(startblockval(del.br_startblock) > 0);
+ if (isrt) {
+ xfs_filblks_t rtexts;
+
+ rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
+ do_div(rtexts, mp->m_sb.sb_rextsize);
+ xfs_mod_frextents(mp, (int64_t)rtexts);
+ (void)xfs_trans_reserve_quota_nblks(NULL,
+ ip, -((long)del.br_blockcount), 0,
+ XFS_QMOPT_RES_RTBLKS);
+ } else {
+ (void)xfs_trans_reserve_quota_nblks(NULL,
+ ip, -((long)del.br_blockcount), 0,
+ XFS_QMOPT_RES_REGBLKS);
+ }
+ ip->i_delayed_blks -= del.br_blockcount;
+ if (cur)
+ cur->bc_private.b.flags |=
+ XFS_BTCUR_BPRV_WASDEL;
+ } else if (cur)
+ cur->bc_private.b.flags &= ~XFS_BTCUR_BPRV_WASDEL;
+
error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
&tmp_logflags, whichfork);
logflags |= tmp_logflags;
if (error)
goto error0;
+
+ if (!isrt && wasdel)
+ xfs_mod_fdblocks(mp, (int64_t)del.br_blockcount, false);
+
bno = del.br_startoff - 1;
nodelete:
/*
--
2.4.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available
2016-02-29 14:29 [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
2016-02-29 14:29 ` [PATCH RFC 1/3] xfs: debug mode forced buffered write failure Brian Foster
2016-02-29 14:29 ` [PATCH 2/3] xfs: update icsb freeblocks counter after extent deletion Brian Foster
@ 2016-02-29 14:29 ` Brian Foster
2016-03-01 13:00 ` Christoph Hellwig
2016-02-29 14:36 ` [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
3 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2016-02-29 14:29 UTC (permalink / raw)
To: xfs
xfs_bmap_del_extent() handles extent removal from the in-core and
on-disk extent lists. When removing a delalloc range, it updates the
indirect block reservation appropriately based on the removal. It
currently enforces that the new indirect block reservation is less than
or equal to the original. This is normally the case in all situations
except for in certain cases when the removed range creates a hole in a
single delalloc extent, thus splitting a single delalloc extent in two.
It is possible with small enough extents to split an indlen==1 extent
into two such slightly smaller extents. This leaves one extent with 0
indirect blocks and leads to assert failures in other areas (e.g.,
xfs_bunmapi() if the extent happens to be removed).
Refactor the indirect reservation code into a new helper function
invoked by xfs_bunmapi(). Allow the helper to steal blocks from the
deleted extent if necessary to satisfy the worst case indirect
reservation of the new extents. This is safe now that the caller does
not update icsb counters until the extent is removed (such stolen blocks
simply remain accounted as allocated). Fall back to the original
reservation using the existing algorithm and warn if we end up with
extents without any reservation at all to detect this more easily in the
future.
This allows generic/033 to run on XFS without assert failures.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/libxfs/xfs_bmap.c | 100 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 81 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7180104..79688a6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4721,6 +4721,68 @@ error0:
}
/*
+ * When a delalloc extent is split (e.g., due to a hole punch), the original
+ * indlen reservation must be shared across the two new extents that are left
+ * behind.
+ *
+ * Provided the worst case indlen for two extents, limit the total reservation
+ * against that of the original extent. We may want to steal blocks from any
+ * extra that might be available according to the caller to make up a deficiency
+ * (e.g., if the original reservation consists of one block). The number of such
+ * stolen blocks is returned. The availability and accounting of stealable
+ * blocks is the responsibility of the caller.
+ */
+static xfs_filblks_t
+xfs_bmap_limit_indlen(
+ xfs_filblks_t ores, /* original res. */
+ xfs_filblks_t *indlen1, /* ext1 worst indlen */
+ xfs_filblks_t *indlen2, /* ext2 worst indlen */
+ xfs_filblks_t avail) /* stealable blocks */
+{
+ xfs_filblks_t nres; /* new total res. */
+ xfs_filblks_t temp;
+ xfs_filblks_t temp2;
+ xfs_filblks_t stolen = 0;
+
+ temp = *indlen1;
+ temp2 = *indlen2;
+ nres = temp + temp2;
+
+ /*
+ * Steal as many blocks as we can to try and satisfy the worst case
+ * indlen of both new extents.
+ */
+ while (nres > ores && avail) {
+ nres--;
+ avail--;
+ stolen++;
+ }
+
+ /*
+ * If we've exhausted stealable blocks and still haven't satisfied the
+ * worst case reservation, we have no choice but to pare back until
+ * covered by the original reservation.
+ */
+ while (nres > ores) {
+ if (temp) {
+ temp--;
+ nres--;
+ }
+ if (nres == ores)
+ break;
+ if (temp2) {
+ temp2--;
+ nres--;
+ }
+ }
+
+ *indlen1 = temp;
+ *indlen2 = temp2;
+
+ return stolen;
+}
+
+/*
* Called by xfs_bmapi to update file extent records and the btree
* after removing space (or undoing a delayed allocation).
*/
@@ -4984,28 +5046,28 @@ xfs_bmap_del_extent(
XFS_IFORK_NEXT_SET(ip, whichfork,
XFS_IFORK_NEXTENTS(ip, whichfork) + 1);
} else {
+ xfs_filblks_t stolen;
ASSERT(whichfork == XFS_DATA_FORK);
- temp = xfs_bmap_worst_indlen(ip, temp);
+
+ /*
+ * Fix up the indlen reservations. We can safely steal
+ * blocks from the deleted extent as this simply fudges
+ * the icsb fdblocks accounting in xfs_bunmapi().
+ */
+ temp = xfs_bmap_worst_indlen(ip, got.br_blockcount);
+ temp2 = xfs_bmap_worst_indlen(ip, new.br_blockcount);
+ stolen = xfs_bmap_limit_indlen(da_old, &temp, &temp2,
+ del->br_blockcount);
+ da_new = temp + temp2 - stolen;
+ del->br_blockcount -= stolen;
+
+ /*
+ * Set the reservation for each extent. Warn if either
+ * is zero as this can lead to delalloc problems.
+ */
+ WARN_ON_ONCE(!temp || !temp2);
xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
- temp2 = xfs_bmap_worst_indlen(ip, temp2);
new.br_startblock = nullstartblock((int)temp2);
- da_new = temp + temp2;
- while (da_new > da_old) {
- if (temp) {
- temp--;
- da_new--;
- xfs_bmbt_set_startblock(ep,
- nullstartblock((int)temp));
- }
- if (da_new == da_old)
- break;
- if (temp2) {
- temp2--;
- da_new--;
- new.br_startblock =
- nullstartblock((int)temp2);
- }
- }
}
trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
xfs_iext_insert(ip, *idx + 1, 1, &new, state);
--
2.4.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] fix up indlen reservations on extent split
2016-02-29 14:29 [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
` (2 preceding siblings ...)
2016-02-29 14:29 ` [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available Brian Foster
@ 2016-02-29 14:36 ` Brian Foster
3 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2016-02-29 14:36 UTC (permalink / raw)
To: xfs
[-- Attachment #1: Type: text/plain, Size: 2778 bytes --]
On Mon, Feb 29, 2016 at 09:29:27AM -0500, Brian Foster wrote:
> Hi all,
>
> This is a resurrection of an old fix for the indirect delalloc
> reservation split problem. The last version apparently fell through the
> cracks. The core problem and fix is the same and is described in patch
> 3.
>
> The original problem is not as reproducible as it was since the last
> version of this patch. The original zero range reproducer doesn't work
> because zero range has since been updated to flush and invalidate the
> affected range rather than kill any delayed allocation blocks in the in
> core extent tree. The side effect of this is that the problem does not
> currently have a clear reproducer, but the indirect reservation
> management code is still incorrect nonetheless.
>
> As a result, I've prepended an RFC test instrumentation patch that can
> help induce the problem[1]. I've marked the patch RFC simply because it
> is hacky and probably up in the air as to whether it is merge worthy. I
> wanted to have _something_ to help reproduce the problem and verify the
> fix, however, hence it is included here. I'm fine with either merging it
> or using it as a one-off verification and dropping it. Also, any other
> ideas for a more simple/elegant reproducer are welcome.
>
> Thoughts, reviews, flames appreciated.
>
> Brian
>
> [1] An update to the original xfstests test is also required. I'll send
> that update in a reply to this cover letter shortly.
>
Attached are a couple patches to update the test as described here. I'm
sending them as attachments because they aren't worth reviewing unless
patch 1/3 is merged as is. I'll send the test updates separately
(properly) depending on what results on the kernel side of things. In
the meantime, these are useful to demonstrate the problem on a current
kernel and test the fix.
Brian
> v2:
> - Rebase to latest for-next branch.
> - Include RFC test instrumentation patch.
> v1: http://oss.sgi.com/archives/xfs/2014-10/msg00294.html
> - xfs_bunmapi() code into independent patch.
> - Refactor fix into separate helper function.
> rfc: http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
>
> Brian Foster (3):
> xfs: debug mode forced buffered write failure
> xfs: update icsb freeblocks counter after extent deletion
> xfs: borrow indirect blocks from freed extent when available
>
> fs/xfs/libxfs/xfs_bmap.c | 158 ++++++++++++++++++++++++++++++++++-------------
> fs/xfs/xfs_aops.c | 9 ++-
> fs/xfs/xfs_mount.h | 9 +++
> fs/xfs/xfs_sysfs.c | 78 ++++++++++++++++++++---
> 4 files changed, 200 insertions(+), 54 deletions(-)
>
> --
> 2.4.3
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
[-- Attachment #2: 0001-xfstests-move-generic-indlen-reservation-test-to-xfs.patch --]
[-- Type: text/plain, Size: 8077 bytes --]
>From ecb696160da6b38288ca7a339c2c386bf4955fba Mon Sep 17 00:00:00 2001
From: Brian Foster <bfoster@redhat.com>
Date: Mon, 29 Feb 2016 09:00:36 -0500
Subject: [PATCH 1/2] xfstests: move generic indlen reservation test to xfs dir
This test was originally designed to reproduce the split indlen
reservation depletion problem in XFS. It was included as a generic test
simply because it had no hard dependencies on XFS or associated tools.
This test is no longer effective in its current form. Fixing it requires
use of XFS specific mechanisms. Therefore, move the test to the XFS
specific test directory. No other changes are made in this patch.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
tests/generic/033 | 84 ---------------------------------------------------
tests/generic/033.out | 4 ---
tests/generic/group | 1 -
tests/xfs/289 | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++
tests/xfs/289.out | 4 +++
tests/xfs/group | 1 +
6 files changed, 89 insertions(+), 89 deletions(-)
delete mode 100755 tests/generic/033
delete mode 100644 tests/generic/033.out
create mode 100755 tests/xfs/289
create mode 100644 tests/xfs/289.out
diff --git a/tests/generic/033 b/tests/generic/033
deleted file mode 100755
index 4f8bb92..0000000
--- a/tests/generic/033
+++ /dev/null
@@ -1,84 +0,0 @@
-#! /bin/bash
-# FS QA Test No. 033
-#
-# This test stresses indirect block reservation for delayed allocation extents.
-# XFS reserves extra blocks for deferred allocation of delalloc extents. These
-# reserved blocks can be divided among more extents than anticipated if the
-# original extent for which the blocks were reserved is split into multiple
-# delalloc extents. If this scenario repeats, eventually some extents are left
-# without any indirect block reservation whatsoever. This leads to assert
-# failures and possibly other problems in XFS.
-#
-#-----------------------------------------------------------------------
-# Copyright (c) 2014 Red Hat, Inc. All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it would be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write the Free Software Foundation,
-# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-#-----------------------------------------------------------------------
-#
-
-seq=`basename $0`
-seqres=$RESULT_DIR/$seq
-echo "QA output created by $seq"
-
-here=`pwd`
-tmp=/tmp/$$
-status=1 # failure is the default!
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-_cleanup()
-{
- cd /
- rm -f $tmp.*
-}
-
-# get standard environment, filters and checks
-. ./common/rc
-
-# real QA test starts here
-rm -f $seqres.full
-
-# Modify as appropriate.
-_supported_fs generic
-_supported_os Linux
-_require_scratch
-_require_xfs_io_command "fzero"
-
-_scratch_mkfs >/dev/null 2>&1
-_scratch_mount
-
-file=$SCRATCH_MNT/file.$seq
-bytes=$((64 * 1024))
-
-# create sequential delayed allocation
-$XFS_IO_PROG -f -c "pwrite 0 $bytes" $file >> $seqres.full 2>&1
-
-# Zero every other 4k range to split the larger delalloc extent into many more
-# smaller extents. Use zero instead of hole punch because the former does not
-# force writeback (and hence delalloc conversion). It can simply discard
-# delalloc blocks and convert the ranges to unwritten.
-endoff=$((bytes - 4096))
-for i in $(seq 0 8192 $endoff); do
- $XFS_IO_PROG -c "fzero -k $i 4k" $file >> $seqres.full 2>&1
-done
-
-# now zero the opposite set to remove remaining delalloc extents
-for i in $(seq 4096 8192 $endoff); do
- $XFS_IO_PROG -c "fzero -k $i 4k" $file >> $seqres.full 2>&1
-done
-
-_scratch_cycle_mount
-hexdump $file
-
-status=0
-exit
diff --git a/tests/generic/033.out b/tests/generic/033.out
deleted file mode 100644
index 419d831..0000000
--- a/tests/generic/033.out
+++ /dev/null
@@ -1,4 +0,0 @@
-QA output created by 033
-0000000 0000 0000 0000 0000 0000 0000 0000 0000
-*
-0010000
diff --git a/tests/generic/group b/tests/generic/group
index 727648c..47638c3 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -35,7 +35,6 @@
030 auto quick rw
031 auto quick prealloc rw
032 auto quick rw
-033 auto quick rw
034 auto quick metadata log
035 auto quick
036 auto aio rw stress
diff --git a/tests/xfs/289 b/tests/xfs/289
new file mode 100755
index 0000000..33cf060
--- /dev/null
+++ b/tests/xfs/289
@@ -0,0 +1,84 @@
+#! /bin/bash
+# FS QA Test No. 289
+#
+# This test stresses indirect block reservation for delayed allocation extents.
+# XFS reserves extra blocks for deferred allocation of delalloc extents. These
+# reserved blocks can be divided among more extents than anticipated if the
+# original extent for which the blocks were reserved is split into multiple
+# delalloc extents. If this scenario repeats, eventually some extents are left
+# without any indirect block reservation whatsoever. This leads to assert
+# failures and possibly other problems in XFS.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2014 Red Hat, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+
+# real QA test starts here
+rm -f $seqres.full
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_xfs_io_command "fzero"
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+file=$SCRATCH_MNT/file.$seq
+bytes=$((64 * 1024))
+
+# create sequential delayed allocation
+$XFS_IO_PROG -f -c "pwrite 0 $bytes" $file >> $seqres.full 2>&1
+
+# Zero every other 4k range to split the larger delalloc extent into many more
+# smaller extents. Use zero instead of hole punch because the former does not
+# force writeback (and hence delalloc conversion). It can simply discard
+# delalloc blocks and convert the ranges to unwritten.
+endoff=$((bytes - 4096))
+for i in $(seq 0 8192 $endoff); do
+ $XFS_IO_PROG -c "fzero -k $i 4k" $file >> $seqres.full 2>&1
+done
+
+# now zero the opposite set to remove remaining delalloc extents
+for i in $(seq 4096 8192 $endoff); do
+ $XFS_IO_PROG -c "fzero -k $i 4k" $file >> $seqres.full 2>&1
+done
+
+_scratch_cycle_mount
+hexdump $file
+
+status=0
+exit
diff --git a/tests/xfs/289.out b/tests/xfs/289.out
new file mode 100644
index 0000000..bdcf195
--- /dev/null
+++ b/tests/xfs/289.out
@@ -0,0 +1,4 @@
+QA output created by 289
+0000000 0000 0000 0000 0000 0000 0000 0000 0000
+*
+0010000
diff --git a/tests/xfs/group b/tests/xfs/group
index e0c4553..b4cc1c0 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -269,6 +269,7 @@
282 dump ioctl auto quick
283 dump ioctl auto quick
287 auto dump quota quick
+289 auto quick rw
290 auto rw prealloc quick ioctl
291 auto repair
292 auto mkfs quick
--
2.4.3
[-- Attachment #3: 0002-tests-xfs-update-indlen-res.-test-to-use-fail-writes.patch --]
[-- Type: text/plain, Size: 3292 bytes --]
>From 82ca2acf976736b5dc638d6a5ab31a82a76364ae Mon Sep 17 00:00:00 2001
From: Brian Foster <bfoster@redhat.com>
Date: Mon, 29 Feb 2016 09:04:25 -0500
Subject: [PATCH 2/2] tests/xfs: update indlen res. test to use fail writes
mechanism
This test originally used zero range operations to reproduce problematic
indirect delalloc reservations on XFS. Zero range has since been updated
such that it cannot be used to reproduce the problem. Instead, a
buffered write failure mechanism has been added to XFS to facilitate
reproducing this problem.
Update the test to use the buffered write failure mechanism to split
delalloc extents and reproduce the original indlen reservation problem.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
tests/xfs/289 | 28 +++++++++++++++++++---------
tests/xfs/289.out | 4 +---
2 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/tests/xfs/289 b/tests/xfs/289
index 33cf060..5da332b 100755
--- a/tests/xfs/289
+++ b/tests/xfs/289
@@ -44,6 +44,7 @@ _cleanup()
# get standard environment, filters and checks
. ./common/rc
+. ./common/punch
# real QA test starts here
rm -f $seqres.full
@@ -52,33 +53,42 @@ rm -f $seqres.full
_supported_fs generic
_supported_os Linux
_require_scratch
-_require_xfs_io_command "fzero"
+_require_xfs_sysfs $(_short_dev $TEST_DEV)/fail_writes
_scratch_mkfs >/dev/null 2>&1
_scratch_mount
+sdev=$(_short_dev $SCRATCH_DEV)
file=$SCRATCH_MNT/file.$seq
bytes=$((64 * 1024))
# create sequential delayed allocation
$XFS_IO_PROG -f -c "pwrite 0 $bytes" $file >> $seqres.full 2>&1
-# Zero every other 4k range to split the larger delalloc extent into many more
-# smaller extents. Use zero instead of hole punch because the former does not
-# force writeback (and hence delalloc conversion). It can simply discard
-# delalloc blocks and convert the ranges to unwritten.
+# Enable write failures. All buffered writes fail from this point on.
+echo 1 > /sys/fs/xfs/$sdev/fail_writes
+
+# Write every other 4k range to split the larger delalloc extent into many more
+# smaller extents. Use pwrite because with write failures enabled, all
+# preexisting delalloc blocks in the range of the I/O are tossed without
+# discretion. This allows manipulation of the delalloc extent without conversion
+# to real blocks (and thus releasing the indirect reservation).
endoff=$((bytes - 4096))
for i in $(seq 0 8192 $endoff); do
- $XFS_IO_PROG -c "fzero -k $i 4k" $file >> $seqres.full 2>&1
+ $XFS_IO_PROG -c "pwrite $i 4k" $file >> $seqres.full 2>&1
done
-# now zero the opposite set to remove remaining delalloc extents
+# now pwrite the opposite set to remove remaining delalloc extents
for i in $(seq 4096 8192 $endoff); do
- $XFS_IO_PROG -c "fzero -k $i 4k" $file >> $seqres.full 2>&1
+ $XFS_IO_PROG -c "pwrite $i 4k" $file >> $seqres.full 2>&1
done
+echo 0 > /sys/fs/xfs/$sdev/fail_writes
+
+echo "Silence is golden."
+
_scratch_cycle_mount
-hexdump $file
+$XFS_IO_PROG -c 'bmap -vp' $file | _filter_bmap
status=0
exit
diff --git a/tests/xfs/289.out b/tests/xfs/289.out
index bdcf195..72e60f9 100644
--- a/tests/xfs/289.out
+++ b/tests/xfs/289.out
@@ -1,4 +1,2 @@
QA output created by 289
-0000000 0000 0000 0000 0000 0000 0000 0000 0000
-*
-0010000
+Silence is golden.
--
2.4.3
[-- Attachment #4: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] xfs: update icsb freeblocks counter after extent deletion
2016-02-29 14:29 ` [PATCH 2/3] xfs: update icsb freeblocks counter after extent deletion Brian Foster
@ 2016-03-01 12:48 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-03-01 12:48 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Mon, Feb 29, 2016 at 09:29:29AM -0500, Brian Foster wrote:
> xfs_bunmapi() currently updates the icsb fdblocks counter, unreserves
> quota, etc. before the extent is deleted by xfs_bmap_del_extent(). The
> function has problems dividing up the indirect reserved blocks for
> scenarios where a single delalloc extent is split in two. Particularly,
> there aren't always enough blocks reserved for multiple extents in a
> single extent reservation.
>
> The solution to this problem is to allow the extent removal code to
> steal from the deleted extent to meet indirect reservation requirements.
> Move the block of code in xfs_bmapi() that updates the icsb fdblocks
> counter to after the call to xfs_bmap_del_extent() to allow the codepath
> to update the extent record before the free blocks are accounted. Also,
> reshuffle the code slightly so the delalloc accounting occurs near the
> xfs_bmap_del_extent() call to provide context for the comments.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/3] xfs: debug mode forced buffered write failure
2016-02-29 14:29 ` [PATCH RFC 1/3] xfs: debug mode forced buffered write failure Brian Foster
@ 2016-03-01 12:55 ` Christoph Hellwig
2016-03-01 13:10 ` Brian Foster
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2016-03-01 12:55 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Mon, Feb 29, 2016 at 09:29:28AM -0500, Brian Foster wrote:
> Add a DEBUG mode-only sysfs knob to enable forced buffered write
> failure. An additional side effect of this mode is brute force killing
> of delayed allocation blocks in the range of the write. The latter is
> the prime motiviation behind this patch, as userspace test
> infrastructure requires a reliable mechanism to create and split
> delalloc extents without causing extent conversion.
>
> Certain fallocate operations (i.e., zero range) were used for this in
> the past, but the implementations have changed such that delalloc
> extents are flushed and converted to real blocks, rendering the test
> useless.
Looks harmless, but I'd add a inline function to check for the failed
writes field so that we can compile it away entirely for !DEBUG builds.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available
2016-02-29 14:29 ` [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available Brian Foster
@ 2016-03-01 13:00 ` Christoph Hellwig
2016-03-01 13:11 ` Brian Foster
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2016-03-01 13:00 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
I've been wrapping my head around this since yersterday..
It looks reasonable, but I'd be much more comfortable if you could
split this into two patches:
- one to just factor out a helper to update the temp and temp2
values (great variable names while we're at it.. not your fault,
though)
- one to actually change the algorithm used.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RFC 1/3] xfs: debug mode forced buffered write failure
2016-03-01 12:55 ` Christoph Hellwig
@ 2016-03-01 13:10 ` Brian Foster
0 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2016-03-01 13:10 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 01, 2016 at 04:55:16AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 29, 2016 at 09:29:28AM -0500, Brian Foster wrote:
> > Add a DEBUG mode-only sysfs knob to enable forced buffered write
> > failure. An additional side effect of this mode is brute force killing
> > of delayed allocation blocks in the range of the write. The latter is
> > the prime motiviation behind this patch, as userspace test
> > infrastructure requires a reliable mechanism to create and split
> > delalloc extents without causing extent conversion.
> >
> > Certain fallocate operations (i.e., zero range) were used for this in
> > the past, but the implementations have changed such that delalloc
> > extents are flushed and converted to real blocks, rendering the test
> > useless.
>
> Looks harmless, but I'd add a inline function to check for the failed
> writes field so that we can compile it away entirely for !DEBUG builds.
Good idea. The thought of burying this further down in get_blocks or
somewhere around there has also crossed my mind since sending this, so
I'll look into that as well. I'm mainly just curious whether it helps
isolate the necessary changes any better than the current form...
Brian
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available
2016-03-01 13:00 ` Christoph Hellwig
@ 2016-03-01 13:11 ` Brian Foster
0 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2016-03-01 13:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 01, 2016 at 05:00:18AM -0800, Christoph Hellwig wrote:
> I've been wrapping my head around this since yersterday..
>
> It looks reasonable, but I'd be much more comfortable if you could
> split this into two patches:
>
> - one to just factor out a helper to update the temp and temp2
> values (great variable names while we're at it.. not your fault,
> though)
> - one to actually change the algorithm used.
Sure, will do.
Brian
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-03-01 13:11 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-29 14:29 [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
2016-02-29 14:29 ` [PATCH RFC 1/3] xfs: debug mode forced buffered write failure Brian Foster
2016-03-01 12:55 ` Christoph Hellwig
2016-03-01 13:10 ` Brian Foster
2016-02-29 14:29 ` [PATCH 2/3] xfs: update icsb freeblocks counter after extent deletion Brian Foster
2016-03-01 12:48 ` Christoph Hellwig
2016-02-29 14:29 ` [PATCH 3/3] xfs: borrow indirect blocks from freed extent when available Brian Foster
2016-03-01 13:00 ` Christoph Hellwig
2016-03-01 13:11 ` Brian Foster
2016-02-29 14:36 ` [PATCH v2 0/3] fix up indlen reservations on extent split Brian Foster
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox