* [PATCH 0/2] xfs: rely on minleft instead of total for bmbt res
@ 2019-05-01 14:05 Brian Foster
2019-05-01 14:05 ` [PATCH 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
2019-05-01 14:05 ` [PATCH 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
0 siblings, 2 replies; 3+ messages in thread
From: Brian Foster @ 2019-05-01 14:05 UTC (permalink / raw)
To: linux-xfs
Hi all,
This is a follow up to the RFC[1] I posted the other day. After poking
around some more, I noticed that the bmapi allocation code already set
args.minleft based on the state of the associated fork. Based on the
reasoning in the commit logs, it seems that a bunch of the 'args.total =
extent length + bmbt res' bmapi allocation callers were superfluous and
could just rely on the existing minleft logic.
Note that this doesn't address the other users of args.total (inode,
dquot, xattr, etc.). It remains to be seen whether there is still value
in having something like args.extra instead of args.total. For one,
passing zero is somewhat of a landmine if transaction block reservations
happen to change. It would be nice to clean this all up so the
additional reservation portion of the mapping request is explicit and
robust, particularly for callers where minlen < maxlen. As it is, it's
still kind of pointless to specify minlen < maxlen and total >= maxlen
because if we can't satisfy maxlen, the total check in
xfs_alloc_space_available() will never pass until xfs_bmap_btalloc()
reduces args.total, and it's set to minlen essentially ignoring what the
caller set it to originally. That might not matter so much in cases
where the allocation is a strict minlen == maxlen request and we're
going to simply pass or fail.
In summary, this whole mechanism is still quite hairy and could probably
use further improvements. For that reason, careful review of patch 2 is
probably in order. So far, this has survived a full fstests auto run
with default geometry and an enospc group run with an agsize=20MB
geometry (~750+ AGs). I'm currently (sllloowlly) repeating the auto
group run with the latter format and perhaps will follow that up with an
fsstress cycle that slams the fs into -ENOSPC for a longish period of
time (hours/days).
Thoughts, reviews, flames appreciated.
Brian
[1] https://marc.info/?l=linux-xfs&m=155628702230215&w=2
Brian Foster (2):
xfs: drop minlen before tossing alignment on bmap allocs
xfs: don't set bmapi total block req where minleft is sufficient
fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
fs/xfs/xfs_bmap_util.c | 4 ++--
fs/xfs/xfs_dquot.c | 4 ++--
fs/xfs/xfs_iomap.c | 4 ++--
fs/xfs/xfs_reflink.c | 4 ++--
fs/xfs/xfs_rtalloc.c | 3 +--
6 files changed, 18 insertions(+), 14 deletions(-)
--
2.17.2
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/2] xfs: drop minlen before tossing alignment on bmap allocs
2019-05-01 14:05 [PATCH 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
@ 2019-05-01 14:05 ` Brian Foster
2019-05-01 14:05 ` [PATCH 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
1 sibling, 0 replies; 3+ messages in thread
From: Brian Foster @ 2019-05-01 14:05 UTC (permalink / raw)
To: linux-xfs
The bmap block allocation code issues a sequence of retries to
perform an optimal allocation, gradually loosening constraints as
allocations fail. For example, the first attempt might begin at a
particular bno, with maxlen == minlen and alignment incorporated. As
allocations fail, the parameters fall back to different modes, drop
alignment requirements and reduce the minlen and total block
requirements.
For large extent allocations with an args.total value that exceeds
the allocation length (i.e., non-delalloc), the total value tends to
dominate despite these fallbacks. For example, an aligned extent
allocation request of tens to hundreds of MB that cannot be
satisfied from a particular AG will not succeed after dropping
alignment or minlen because xfs_alloc_space_available() never
selects an AG that can't satisfy args.total. The retry sequence
eventually reduces total and ultimately succeeds if a minlen extent
is available somewhere, but the first several retries are
effectively pointless in this scenario.
Beyond simply being inefficient, another side effect of this
behavior is that we drop alignment requirements too aggressively.
Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
unit:
# xfs_io -c "falloc 0 1g" /mnt/file
# <xfstests>/src/t_stripealign /mnt/file 32
/mnt/file: Start block 347176 not multiple of sunit 32
Despite the filesystem being completely empty, the fact that the
allocation request cannot be satisifed from a single AG means the
allocation doesn't succeed until xfs_bmap_btalloc() drops total from
the original value based on maxlen. This occurs after we've dropped
minlen and alignment (unnecessarily).
As a step towards addressing this problem, insert a new retry in the
bmap allocation sequence to drop minlen (from maxlen) before tossing
alignment. This should still result in as large of an extent as
possible as the block allocator prioritizes extent size in all but
exact allocation modes. By itself, this does not change the behavior
of the command above because the preallocation code still specifies
total based on maxlen. Instead, this facilitates preservation of
alignment once extra reservation is separated from the extent length
portion of the total block requirement.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/libxfs/xfs_bmap.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 356ebd1cbe82..184ce11d9aee 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3586,6 +3586,14 @@ xfs_bmap_btalloc(
if ((error = xfs_alloc_vextent(&args)))
return error;
}
+ if (args.fsbno == NULLFSBLOCK && nullfb &&
+ args.minlen > ap->minlen) {
+ args.minlen = ap->minlen;
+ args.fsbno = ap->blkno;
+ error = xfs_alloc_vextent(&args);
+ if (error)
+ return error;
+ }
if (isaligned && args.fsbno == NULLFSBLOCK) {
/*
* allocation failed, so turn off alignment and
@@ -3597,9 +3605,7 @@ xfs_bmap_btalloc(
if ((error = xfs_alloc_vextent(&args)))
return error;
}
- if (args.fsbno == NULLFSBLOCK && nullfb &&
- args.minlen > ap->minlen) {
- args.minlen = ap->minlen;
+ if (args.fsbno == NULLFSBLOCK && nullfb) {
args.type = XFS_ALLOCTYPE_START_BNO;
args.fsbno = ap->blkno;
if ((error = xfs_alloc_vextent(&args)))
--
2.17.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] xfs: don't set bmapi total block req where minleft is sufficient
2019-05-01 14:05 [PATCH 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
2019-05-01 14:05 ` [PATCH 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
@ 2019-05-01 14:05 ` Brian Foster
1 sibling, 0 replies; 3+ messages in thread
From: Brian Foster @ 2019-05-01 14:05 UTC (permalink / raw)
To: linux-xfs
xfs_bmapi_write() takes a total block requirement parameter that is
passed down to the block allocation code and is used to specify the
total block requirement of the associated transaction. This is used
to try and select an AG that can not only satisfy the requested
extent allocation, but can also accommodate subsequent allocations
that might be required to complete the transaction. For example,
additional bmbt block allocations may be required on insertion of
the resulting extent to an inode data fork.
While it's important for callers to calculate and reserve such extra
blocks in the transaction, it is not necessary to pass the total
value to xfs_bmapi_write() in all cases. The latter automatically
sets minleft to ensure that sufficient free blocks remain after the
allocation attempt to expand the format of the associated inode
(i.e., such as extent to btree conversion, btree splits, etc).
Therefore, any callers that pass a total block requirement of the
bmap mapping length plus worst case bmbt expansion essentially
specify the additional reservation requirement twice. These callers
can pass a total of zero to rely on the bmapi minleft policy.
Beyond being superfluous, the primary motivation for this change is
that the total reservation logic in the bmbt code is dubious in
scenarios where minlen < maxlen and a maxlen extent cannot be
allocated (which more common for data extent allocations where
contiguity is not required). The total value is based on maxlen in
the xfs_bmapi_write() caller. If the bmbt code falls back to an
allocation between minlen and maxlen, that allocation will not
succeed until total is reset to minlen, which essentially throws
away any additional reservation included in total by the caller. In
addition, the total value is not reset until after alignment is
dropped, which means that such callers drop alignment far too
aggressively than necessary.
Update all callers of xfs_bmapi_write() that pass a total block
value of the mapping length plus bmbt reservation to instead pass
zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
requirement. This trades off slightly less conservative AG selection
for the ability to preserve alignment in more scenarios.
xfs_bmapi_write() callers that incorporate unrelated or additional
reservations in total beyond what is already included in minleft
must continue to use the former.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/libxfs/xfs_bmap.c | 1 -
fs/xfs/xfs_bmap_util.c | 4 ++--
fs/xfs/xfs_dquot.c | 4 ++--
fs/xfs/xfs_iomap.c | 4 ++--
fs/xfs/xfs_reflink.c | 4 ++--
fs/xfs/xfs_rtalloc.c | 3 +--
6 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 184ce11d9aee..1d1b5600273f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4512,7 +4512,6 @@ xfs_bmapi_convert_delalloc(
bma.wasdel = true;
bma.offset = bma.got.br_startoff;
bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
- bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
if (whichfork == XFS_COW_FORK)
bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 06d07f1e310b..83e23d33096f 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -967,8 +967,8 @@ xfs_alloc_file_space(
xfs_trans_ijoin(tp, ip, 0);
error = xfs_bmapi_write(tp, ip, startoffset_fsb,
- allocatesize_fsb, alloc_type, resblks,
- imapp, &nimaps);
+ allocatesize_fsb, alloc_type, 0, imapp,
+ &nimaps);
if (error)
goto error0;
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index a1af984e4913..c1928e3c58ba 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -309,8 +309,8 @@ xfs_dquot_disk_alloc(
/* Create the block mapping. */
xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
- XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
- XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
+ XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
+ &nmaps);
if (error)
return error;
ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 63d323916bba..1ed7b8869c78 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -280,8 +280,8 @@ xfs_iomap_write_direct(
* caller gave to us.
*/
nimaps = 1;
- error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
- bmapi_flags, resblks, imap, &nimaps);
+ error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
+ imap, &nimaps);
if (error)
goto out_res_cancel;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 680ae7662a78..0c9cb3410554 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -421,8 +421,8 @@ xfs_reflink_allocate_cow(
/* Allocate the entire reservation as unwritten blocks. */
nimaps = 1;
error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
- XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
- resblks, imap, &nimaps);
+ XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
+ &nimaps);
if (error)
goto out_unreserve;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index ac0fcdad0c4e..4ed0ab1852e4 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -798,8 +798,7 @@ xfs_growfs_rt_alloc(
*/
nmap = 1;
error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
- XFS_BMAPI_METADATA, resblks, &map,
- &nmap);
+ XFS_BMAPI_METADATA, 0, &map, &nmap);
if (!error && nmap < 1)
error = -ENOSPC;
if (error)
--
2.17.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-05-01 14:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-05-01 14:05 [PATCH 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
2019-05-01 14:05 ` [PATCH 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
2019-05-01 14:05 ` [PATCH 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.