* [PATCHSET] AGFL reservation changes @ 2024-08-15 19:29 Krister Johansen 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen 2024-08-15 19:36 ` [RFC PATCH 0/2] xfstests: fstests for agfl reservation Krister Johansen 0 siblings, 2 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:29 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests Hi, This patchset contains changes to linux and to xfstests to address a reoccuring panic in xfs_bmap_extents_to_btree. The RFC was discussed here: https://lore.kernel.org/linux-xfs/cover.1718232004.git.kjlx@templeofstupid.com/T/#t The kernel changes modify how the AGFL reservation is calculated when a filesystem is mounted. This is also pushed into the in-core per-AG structures to ensure that they do not consume the additional space reserved by this change. Additionally, this includes a pair of xfstest patches. The first introduces a test that triggers the problem we're trying to fix, as xfs/608. The second is a modification to xfs/306, which started failing because the increased space that is reserved by these changes is above the global reserve limit that this test intentionally lowers. The second patch increases this limit by one block. If this seems wrong, I'm happy to debug further. The change was based upon the assumption that artifically lowered global reservation limits had to be cognizant of the per-AG limits. -K ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations 2024-08-15 19:29 [PATCHSET] AGFL reservation changes Krister Johansen @ 2024-08-15 19:30 ` Krister Johansen 2024-08-15 19:31 ` [PATCH 1/5] xfs: count the number of blocks in a per-ag reservation Krister Johansen ` (4 more replies) 2024-08-15 19:36 ` [RFC PATCH 0/2] xfstests: fstests for agfl reservation Krister Johansen 1 sibling, 5 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:30 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests Hi, These patches attempt to address the problem where dependent allocations fail during a multi allocation transaction, often as a result of refilling the AGFL. The failure results in a filesystem shutdown. In many cases, it manifests as a warn in xfs_bmap_extents_to_btree, when the dependent b-tree conversion fails after inadvertently getting an ENOSPC. The RFC series was here: https://lore.kernel.org/linux-xfs/cover.1718232004.git.kjlx@templeofstupid.com/T/#t This series attempts follows David's guidance around implementing the reservation using the existing xfs_alloc_ag_max_usable and XFS_ALLOCBT_AGFL_RESERVE mechanisms. This mostly worked as advertised (thanks!), however, a few additional patches were needed in order to address test failures. In particular, without the 'xfs: include min freelist in m_ag_max_usable' patch this series would fail the generic/223 tests around stripe alignment, because m_ag_max_usable was slightly larger than the actually usable space. This turned out to be the result of the first pre-fill of the AGFL and once corrected the tests pass. This currently has a failure in xfs/306. I'm including a patch for xfstests to address this. The per-AG reservation size on the filesystem in that test increased by just enough that the filesystem reservation that was manually configured to 16 blocks was too small. Increasing that test's reservation to 17 allows it to continue. The failure manifests as dd failing to make progress and the dmesg filling with errors about xfs_discard_folio. Tracing showed that conversion of the delalloc to a real allocation was getting ENOSPC. -K Krister Johansen (5): xfs: count the number of blocks in a per-ag reservation xfs: move calculation in xfs_alloc_min_freelist to its own function xfs: make alloc_set_aside and friends aware of per-AG reservations xfs: push the agfl set aside into xfs_alloc_space_available xfs: include min freelist in m_ag_max_usable fs/xfs/libxfs/xfs_alloc.c | 184 ++++++++++++++++++++++++++++++-------- fs/xfs/libxfs/xfs_alloc.h | 1 + fs/xfs/xfs_fsops.c | 21 +++++ fs/xfs/xfs_fsops.h | 1 + fs/xfs/xfs_mount.c | 24 +++++ fs/xfs/xfs_mount.h | 12 +++ 6 files changed, 207 insertions(+), 36 deletions(-) base-commit: 7bf888fa26e8f22bed4bc3965ab2a2953104ff96 -- 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/5] xfs: count the number of blocks in a per-ag reservation 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen @ 2024-08-15 19:31 ` Krister Johansen 2024-08-15 19:32 ` [PATCH 2/5] xfs: move calculation in xfs_alloc_min_freelist to its own function Krister Johansen ` (3 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:31 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests In order to get the AGFL reservation, alloc_set_aside, and ag_max_usable calculations correct in the face of per-AG reservations, we need to understand the number of blocks that a per-AG reservation can leave free in a worst-case scenario. Compute the number of blocks used for a per-ag reservation by using AG 0's reservation. Other code already assumes AG 0's reservation is as large or larger than the other AG's. Subsequent patches will used the block count to construct a more accurate set of parameters. The reservation is counted after log_mount_finish because reservations are temporarily enabled for this operation. An updated alloc_set_aside and ag_max_usable need to be computed before enabling reservations at the end of a RW mount. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- fs/xfs/xfs_fsops.c | 21 +++++++++++++++++++++ fs/xfs/xfs_fsops.h | 1 + fs/xfs/xfs_mount.c | 7 +++++++ fs/xfs/xfs_mount.h | 7 +++++++ 4 files changed, 36 insertions(+) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index c211ea2b63c4..fefc20df8a2e 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -551,6 +551,27 @@ xfs_fs_reserve_ag_blocks( return error; } +/* + * Count the number of reserved blocks that an AG has requested. + */ +uint +xfs_fs_count_reserved_ag_blocks( + struct xfs_mount *mp, + xfs_agnumber_t agno) +{ + + struct xfs_perag *pag; + uint blocks = 0; + + pag = xfs_perag_grab(mp, agno); + if (!pag) + return blocks; + + blocks = pag->pag_meta_resv.ar_asked + pag->pag_rmapbt_resv.ar_asked; + xfs_perag_rele(pag); + return blocks; +} + /* * Free space reserved for per-AG metadata. */ diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h index 3e2f73bcf831..75f5fa1a38f4 100644 --- a/fs/xfs/xfs_fsops.h +++ b/fs/xfs/xfs_fsops.h @@ -12,6 +12,7 @@ int xfs_reserve_blocks(struct xfs_mount *mp, uint64_t request); int xfs_fs_goingdown(struct xfs_mount *mp, uint32_t inflags); int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp); +uint xfs_fs_count_reserved_ag_blocks(struct xfs_mount *mp, xfs_agnumber_t agno); void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp); #endif /* __XFS_FSOPS_H__ */ diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 09eef1721ef4..d6ba67a29e3a 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -952,6 +952,13 @@ xfs_mountfs( xfs_warn(mp, "ENOSPC reserving per-AG metadata pool, log recovery may fail."); error = xfs_log_mount_finish(mp); + /* + * Before disabling the temporary per-ag reservation, count up the + * reserved blocks in AG 0. This will be used to determine how to + * re-size the AGFL reserve and alloc_set_aside prior to enabling + * reservations if the mount is RW. + */ + mp->m_ag_resblk_count = xfs_fs_count_reserved_ag_blocks(mp, 0); xfs_fs_unreserve_ag_blocks(mp); if (error) { xfs_warn(mp, "log mount finish failed"); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index d0567dfbc036..800788043ca6 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -213,6 +213,13 @@ typedef struct xfs_mount { uint64_t m_resblks; /* total reserved blocks */ uint64_t m_resblks_avail;/* available reserved blocks */ uint64_t m_resblks_save; /* reserved blks @ remount,ro */ + + /* + * Number of per-ag resv blocks for a single AG. Derived from AG 0 + * under the assumption no per-AG reservations will be larger than that + * one. + */ + uint m_ag_resblk_count; struct delayed_work m_reclaim_work; /* background inode reclaim */ struct dentry *m_debugfs; /* debugfs parent */ struct xfs_kobj m_kobj; -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/5] xfs: move calculation in xfs_alloc_min_freelist to its own function 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen 2024-08-15 19:31 ` [PATCH 1/5] xfs: count the number of blocks in a per-ag reservation Krister Johansen @ 2024-08-15 19:32 ` Krister Johansen 2024-08-15 19:33 ` [PATCH 3/5] xfs: make alloc_set_aside and friends aware of per-AG reservations Krister Johansen ` (2 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:32 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests This splits xfs_alloc_min_freelist into two pieces. The piece that remains in the function determines the appropriate level to pass to the calculation function. The calcution is extracted into its own inline function so that it can be used in multiple locations in xfs_alloc. No functional change. A subsequent patch will leverage this split. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- fs/xfs/libxfs/xfs_alloc.c | 75 +++++++++++++++++++++++---------------- 1 file changed, 44 insertions(+), 31 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 59326f84f6a5..17e029bb1b6d 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -79,6 +79,46 @@ xfs_prealloc_blocks( return XFS_IBT_BLOCK(mp) + 1; } +static inline unsigned int +xfs_alloc_min_freelist_calc( + const unsigned int bno_level, + const unsigned int cnt_level, + const unsigned int rmap_level) +{ + unsigned int min_free; + + /* + * For a btree shorter than the maximum height, the worst case is that + * every level gets split and a new level is added, then while inserting + * another entry to refill the AGFL, every level under the old root gets + * split again. This is: + * + * (full height split reservation) + (AGFL refill split height) + * = (current height + 1) + (current height - 1) + * = (new height) + (new height - 2) + * = 2 * new height - 2 + * + * For a btree of maximum height, the worst case is that every level + * under the root gets split, then while inserting another entry to + * refill the AGFL, every level under the root gets split again. This is + * also: + * + * 2 * (current height - 1) + * = 2 * (new height - 1) + * = 2 * new height - 2 + */ + + /* space needed by-bno freespace btree */ + min_free = bno_level * 2 - 2; + /* space needed by-size freespace btree */ + min_free += cnt_level * 2 - 2; + /* space needed reverse mapping used space btree */ + if (rmap_level) + min_free += rmap_level * 2 - 2; + + return min_free; +} + /* * The number of blocks per AG that we withhold from xfs_dec_fdblocks to * guarantee that we can refill the AGFL prior to allocating space in a nearly @@ -152,7 +192,6 @@ xfs_alloc_ag_max_usable( return mp->m_sb.sb_agblocks - blocks; } - static int xfs_alloc_lookup( struct xfs_btree_cur *cur, @@ -2449,39 +2488,13 @@ xfs_alloc_min_freelist( const unsigned int bno_level = pag ? pag->pagf_bno_level : 1; const unsigned int cnt_level = pag ? pag->pagf_cnt_level : 1; const unsigned int rmap_level = pag ? pag->pagf_rmap_level : 1; - unsigned int min_free; ASSERT(mp->m_alloc_maxlevels > 0); - /* - * For a btree shorter than the maximum height, the worst case is that - * every level gets split and a new level is added, then while inserting - * another entry to refill the AGFL, every level under the old root gets - * split again. This is: - * - * (full height split reservation) + (AGFL refill split height) - * = (current height + 1) + (current height - 1) - * = (new height) + (new height - 2) - * = 2 * new height - 2 - * - * For a btree of maximum height, the worst case is that every level - * under the root gets split, then while inserting another entry to - * refill the AGFL, every level under the root gets split again. This is - * also: - * - * 2 * (current height - 1) - * = 2 * (new height - 1) - * = 2 * new height - 2 - */ - - /* space needed by-bno freespace btree */ - min_free = min(bno_level + 1, mp->m_alloc_maxlevels) * 2 - 2; - /* space needed by-size freespace btree */ - min_free += min(cnt_level + 1, mp->m_alloc_maxlevels) * 2 - 2; - /* space needed reverse mapping used space btree */ - if (xfs_has_rmapbt(mp)) - min_free += min(rmap_level + 1, mp->m_rmap_maxlevels) * 2 - 2; - return min_free; + return xfs_alloc_min_freelist_calc( + min(bno_level + 1, mp->m_alloc_maxlevels), + min(cnt_level + 1, mp->m_alloc_maxlevels), + xfs_has_rmapbt(mp) ? min(rmap_level + 1, mp->m_rmap_maxlevels) : 0); } /* -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] xfs: make alloc_set_aside and friends aware of per-AG reservations 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen 2024-08-15 19:31 ` [PATCH 1/5] xfs: count the number of blocks in a per-ag reservation Krister Johansen 2024-08-15 19:32 ` [PATCH 2/5] xfs: move calculation in xfs_alloc_min_freelist to its own function Krister Johansen @ 2024-08-15 19:33 ` Krister Johansen 2024-08-15 19:34 ` [PATCH 4/5] xfs: push the agfl set aside into xfs_alloc_space_available Krister Johansen 2024-08-15 19:35 ` [PATCH 5/5] xfs: include min freelist in m_ag_max_usable Krister Johansen 4 siblings, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:33 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests The code in xfs_alloc_set_aside and xfs_alloc_ag_max_usable assumes that the amount of free space that remains when a filesystem is at ENOSPC corresponds to the filesystem actually having consumed almost all the available free space. These functions control how much space is set aside to refill the AGFL when a filesystem is almost out of space. With per-AG reservations, an AG has more space available at ENOSPC than it did in the past. This leads to situations where the reservation code informs callers that an ENOSPC condition is present, yet the filesystem isn't fully empty. As a result, under certain edge cases, allocations that need to refill the AGFL at a reservation-induced ENOSPC may not have enough space set aside to complete that operation successfully. This is because there is more free-space metadata to track than there used to be. The result is ENOSPC related shutdowns in paths that only partially succeed at satsifying their allocations. Fix this by determining the size of the free space that remains when a filesystem's reservation is unused but all remaining blocks have been consumed. Use this remaining space to determine the size of the b-trees that manage the space, and correspondingly, the number of blocks needed to refill the AGFL if we have a split at or near ENOSPC. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- fs/xfs/libxfs/xfs_alloc.c | 85 +++++++++++++++++++++++++++++++++++++-- fs/xfs/xfs_mount.c | 16 ++++++++ 2 files changed, 98 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 17e029bb1b6d..826f527d20f2 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -26,6 +26,7 @@ #include "xfs_ag.h" #include "xfs_ag_resv.h" #include "xfs_bmap.h" +#include "xfs_bmap_btree.h" #include "xfs_health.h" #include "xfs_extfree_item.h" @@ -131,9 +132,81 @@ xfs_alloc_min_freelist_calc( * fdblocks to ensure user allocation does not overcommit the space the * filesystem needs for the AGFLs. The rmap btree uses a per-AG reservation to * withhold space from xfs_dec_fdblocks, so we do not account for that here. + * + * This value should be used on filesystems that do not have a per-AG + * reservation enabled. If per-AG reservations are on, then this value needs to + * be scaled to the size of the metadata used to track the freespace that the + * reservation prevents from being consumed. */ #define XFS_ALLOCBT_AGFL_RESERVE 4 +/* + * Calculate the number of blocks that should be reserved on a per-AG basis when + * per-AG reservations are in use. This is necessary because the per-AG + * reservations result in ENOSPC occurring before the filesystem is truly empty. + * This means that in cases where the reservations are enabled, additional space + * needs to be set aside to manage the freespace data structures that remain + * because of space held by the reservation. This function attempts to + * determine how much free space will remain, in a worst-case scenario, and then + * how much space is needed to manage the metadata for the space that remains. + * Failure to do this correctly results in users getting ENOSPC errors in the + * middle of dependent allocations when they are close to hitting the + * reservation-induced limits. + */ +static unsigned int +xfs_allocbt_agfl_reserve( + struct xfs_mount *mp) +{ + unsigned int ndependent_allocs, free_height, agfl_resv, dep_alloc_sz; + unsigned int agfl_min_refill; + + if (!mp->m_ag_resblk_count) + return XFS_ALLOCBT_AGFL_RESERVE + 4; + + /* + * Worst case, the number of dependent allocations will be a split for + * every level in the BMBT. Use the max BMBT levels for this filesystem + * to determine how many dependent allocations we'd see at the most. + */ + ndependent_allocs = XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK); + + /* + * Assume that worst case, the free space trees are managing + * single-block free records when all per-ag reservations are at their + * maximum size. Use m_ag_resblk_count, which is the maximum per-AG + * reserved space, to calculate the number of b-tree blocks needed to + * index this free space, and use that to determine the maximum height + * of the free space b-tree in this case. + */ + free_height = xfs_btree_compute_maxlevels(mp->m_alloc_mnr, + mp->m_ag_resblk_count); + + /* + * Assume that data extent can perform a full-height split, but that + * subsequent split from dependent allocations will be (height - 2). + * The these values are multipled by 2, because they count both + * freespace trees (bnobt and cnobt). + */ + agfl_resv = free_height * 2; + dep_alloc_sz = (max(free_height, 2) - 2) * 2; + + /* + * Finally, ensure that we have enough blocks reserved to keep the agfl + * at its minimum fullness for any dependent allocation once our + * freespace tree reaches its maximum height. In this case we need to + * compute the free_height + 1, and max rmap which would be our worst + * case scenario. If this function doesn't account for agfl fullness, + * it will underestimate the amount of space that must remain free to + * continue allocating. + */ + agfl_min_refill = xfs_alloc_min_freelist_calc( + free_height + 1, + free_height + 1, + xfs_has_rmapbt(mp) ? mp->m_rmap_maxlevels : 0); + + return agfl_resv + agfl_min_refill + (ndependent_allocs * dep_alloc_sz); +} + /* * Compute the number of blocks that we set aside to guarantee the ability to * refill the AGFL and handle a full bmap btree split. @@ -150,13 +223,19 @@ xfs_alloc_min_freelist_calc( * aside a few blocks which will not be reserved in delayed allocation. * * For each AG, we need to reserve enough blocks to replenish a totally empty - * AGFL and 4 more to handle a potential split of the file's bmap btree. + * AGFL and 4 more to handle a potential split of the file's bmap btree if no AG + * reservation is enabled. + * + * If per-AG reservations are enabled, then the size of the per-AG reservation + * needs to be factored into the space that is set aside to replenish a empty + * AGFL when the filesystem is at a reservation-induced ENOSPC (instead of + * actually empty). */ unsigned int xfs_alloc_set_aside( struct xfs_mount *mp) { - return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + 4); + return mp->m_sb.sb_agcount * xfs_allocbt_agfl_reserve(mp); } /* @@ -180,7 +259,7 @@ xfs_alloc_ag_max_usable( unsigned int blocks; blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */ - blocks += XFS_ALLOCBT_AGFL_RESERVE; + blocks += xfs_allocbt_agfl_reserve(mp); blocks += 3; /* AGF, AGI btree root blocks */ if (xfs_has_finobt(mp)) blocks++; /* finobt root block */ diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index d6ba67a29e3a..ec1f7925b31f 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -987,6 +987,22 @@ xfs_mountfs( xfs_qm_mount_quotas(mp); } + /* + * Prior to enabling the reservations as part of completing a RW mount, + * recompute the alloc_set_aside and ag_max_usable values to account for + * the size of the free space that the reservation occupies. Since the + * reservation keeps some free space from being utilized, these values + * need to account for the space that must also be set aside to do AGFL + * management during transactions with dependent allocations. The + * reservation initialization code uses the set_aside value and modifies + * ag_max_usable, which means this needs to get configured before the + * reservation is enabled for real. The earlier temporary + * enabling of the reservation allows this code to estimate the size of + * the reservation in order to perform its calculations. + */ + mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); + mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); + /* * Now we are mounted, reserve a small amount of unused space for * privileged transactions. This is needed so that transaction -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] xfs: push the agfl set aside into xfs_alloc_space_available 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen ` (2 preceding siblings ...) 2024-08-15 19:33 ` [PATCH 3/5] xfs: make alloc_set_aside and friends aware of per-AG reservations Krister Johansen @ 2024-08-15 19:34 ` Krister Johansen 2024-08-15 19:35 ` [PATCH 5/5] xfs: include min freelist in m_ag_max_usable Krister Johansen 4 siblings, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:34 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests The blocks that have been set aside for dependent allocations and freelist refilling at reservation induced ENOSPC are deducted from m_ag_max_usable, which prevents them from being factored into the longest_free_extent and maxlen calculations. However, it's still possible to eat into this space by making multiple small allocations. Catch this case by withholding the space that's been set aside in xfs_alloc_space_available's available space calculation. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- fs/xfs/libxfs/xfs_alloc.c | 15 +++++++++++++-- fs/xfs/libxfs/xfs_alloc.h | 1 + fs/xfs/xfs_mount.c | 1 + fs/xfs/xfs_mount.h | 5 +++++ 4 files changed, 20 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 826f527d20f2..4dd401d407c2 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -153,7 +153,7 @@ xfs_alloc_min_freelist_calc( * middle of dependent allocations when they are close to hitting the * reservation-induced limits. */ -static unsigned int +unsigned int xfs_allocbt_agfl_reserve( struct xfs_mount *mp) { @@ -2593,6 +2593,7 @@ xfs_alloc_space_available( xfs_extlen_t reservation; /* blocks that are still reserved */ int available; xfs_extlen_t agflcount; + xfs_extlen_t set_aside = 0; if (flags & XFS_ALLOC_FLAG_FREEING) return true; @@ -2605,6 +2606,16 @@ xfs_alloc_space_available( if (longest < alloc_len) return false; + /* + * Withhold from the available space any that has been set-aside as a + * reserve for refilling the AGFL close to ENOSPC. In the case where a + * dependent allocation is in progress, allow that space to be consumed + * so that the dependent allocation may complete successfully. Without + * this, we may ENOSPC in the middle of the allocation chain and + * shutdown the filesystem. + */ + if (args->tp->t_highest_agno == NULLAGNUMBER) + set_aside = args->mp->m_ag_agfl_setaside; /* * Do we have enough free space remaining for the allocation? Don't * account extra agfl blocks because we are about to defer free them, @@ -2612,7 +2623,7 @@ xfs_alloc_space_available( */ agflcount = min_t(xfs_extlen_t, pag->pagf_flcount, min_free); available = (int)(pag->pagf_freeblks + agflcount - - reservation - min_free - args->minleft); + reservation - min_free - args->minleft - set_aside); if (available < (int)max(args->total, alloc_len)) return false; diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index fae170825be0..7e92c4c455a1 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -70,6 +70,7 @@ typedef struct xfs_alloc_arg { /* freespace limit calculations */ unsigned int xfs_alloc_set_aside(struct xfs_mount *mp); unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp); +unsigned int xfs_allocbt_agfl_reserve(struct xfs_mount *mp); xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_perag *pag, xfs_extlen_t need, xfs_extlen_t reserved); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index ec1f7925b31f..1bc80983310a 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1002,6 +1002,7 @@ xfs_mountfs( */ mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); + mp->m_ag_agfl_setaside = xfs_allocbt_agfl_reserve(mp); /* * Now we are mounted, reserve a small amount of unused space for diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 800788043ca6..4a9321424954 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -220,6 +220,11 @@ typedef struct xfs_mount { * one. */ uint m_ag_resblk_count; + /* + * Blocks set aside to refill the agfl at ENOSPC and satisfy any + * dependent allocation resulting from a chain of BMBT splits. + */ + uint m_ag_agfl_setaside; struct delayed_work m_reclaim_work; /* background inode reclaim */ struct dentry *m_debugfs; /* debugfs parent */ struct xfs_kobj m_kobj; -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] xfs: include min freelist in m_ag_max_usable 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen ` (3 preceding siblings ...) 2024-08-15 19:34 ` [PATCH 4/5] xfs: push the agfl set aside into xfs_alloc_space_available Krister Johansen @ 2024-08-15 19:35 ` Krister Johansen 4 siblings, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:35 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests If agfl_reserve blocks are conditionally withheld from consideration in xfs_alloc_space_available, then m_ag_max_usable overstates the amount of max available space on an empty filesystem by the amount of blocks that mkfs placed into the AGFL on our behalf. While this space _is_ technically free, it's not usable for a maximum sized allocation on an empty filesystem, because the blocks must remain in the AGFL in order for an allocation to succeed. Without this, stripe aligned allocations on an empty AG pick a size that they can't actually get which leads to allocations which can't be satisfied and that consequently come back unaligned. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- fs/xfs/libxfs/xfs_alloc.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 4dd401d407c2..26447e6061b3 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -246,7 +246,9 @@ xfs_alloc_set_aside( * - the AG superblock, AGF, AGI and AGFL * - the AGF (bno and cnt) and AGI btree root blocks, and optionally * the AGI free inode and rmap btree root blocks. - * - blocks on the AGFL according to xfs_alloc_set_aside() limits + * - blocks on the AGFL when the filesystem is empty + * - blocks on needed to AGFL while performing dependent allocations close + * to ENOSPC as given by xfs_allocbt_agfl_reserve() * - the rmapbt root block * * The AG headers are sector sized, so the amount of space they take up is @@ -259,6 +261,13 @@ xfs_alloc_ag_max_usable( unsigned int blocks; blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */ + /* + * Minimal freelist length when filesystem is completely empty. + * xfs_alloc_min_freelist needs m_alloc_maxlevels so this is computed in + * our second invocation of xfs_alloc_ag_max_usable + */ + if (mp->m_alloc_maxlevels > 0) + blocks += xfs_alloc_min_freelist(mp, NULL); blocks += xfs_allocbt_agfl_reserve(mp); blocks += 3; /* AGF, AGI btree root blocks */ if (xfs_has_finobt(mp)) -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 0/2] xfstests: fstests for agfl reservation 2024-08-15 19:29 [PATCHSET] AGFL reservation changes Krister Johansen 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen @ 2024-08-15 19:36 ` Krister Johansen 2024-08-15 19:37 ` [RFC PATCH 1/2] xfs/608: a test case for xfs_bmap_extents_to_btree allocation failure Krister Johansen 2024-08-15 19:40 ` [RFC PATCH 2/2] xfs/306: update resblks to account for increased per-ag reservation size Krister Johansen 1 sibling, 2 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:36 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests Hi, This set includes a test to trigger the xfs_bmap_extents_to_btree WARN that tends to the the most observed symptom of the dependent allocation failures that this author encounters. The patches to fix the problem also introduced a failure in xfs/306. I'm including a fix for that as well, though if this is an inappropriate resolution of the underlying problem, I'm happy to take another approach. This is sent as an RFC since it's my first foray into submitting xfstests. The 608 test should be deterministic on kernels that precede the AG-aware allocator re-write. On my machines on recent kernels I'm getting about a 40-50% success rate with triggering the bug with this test now. -K Krister Johansen (2): xfs/608: a test case for xfs_bmap_extents_to_btree allocation failure xfs/306: update resblks to account for increased per-ag reservation size tests/xfs/306 | 2 +- tests/xfs/608 | 372 ++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/608.out | 2 + 3 files changed, 375 insertions(+), 1 deletion(-) create mode 100755 tests/xfs/608 create mode 100644 tests/xfs/608.out base-commit: f5ada754d5838d29fd270257003d0d123a9d1cd2 -- 2.25.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 1/2] xfs/608: a test case for xfs_bmap_extents_to_btree allocation failure 2024-08-15 19:36 ` [RFC PATCH 0/2] xfstests: fstests for agfl reservation Krister Johansen @ 2024-08-15 19:37 ` Krister Johansen 2024-08-15 19:40 ` [RFC PATCH 2/2] xfs/306: update resblks to account for increased per-ag reservation size Krister Johansen 1 sibling, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:37 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests Add a test case that reproduces the xfs_bmap_extents_to_btree warnings that can be seen when a filesystem runs out of space while performing a dependent allocation. This test should be 100% reproducible on older kernels, prior to the AG-aware allocator re-write. The test runs a busy work job to keep another AG occupied in order to trigger the problem regardless of kernel version. However, this is only partially successful. On newer kernels, wiht the AG-aware allocator, this test now triggers the failure around 40-50% of the time on the author's test machine. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- tests/xfs/608 | 372 ++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/608.out | 2 + 2 files changed, 374 insertions(+) create mode 100755 tests/xfs/608 create mode 100644 tests/xfs/608.out diff --git a/tests/xfs/608 b/tests/xfs/608 new file mode 100755 index 00000000..9db8d21f --- /dev/null +++ b/tests/xfs/608 @@ -0,0 +1,372 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2024 YOUR NAME HERE. All Rights Reserved. +# +# FS QA Test 608 +# +# This test reproduces the xfs_bmap_extents_to_btree WARN that occurs when XFS +# fails to allocate a block for the new b-tree in the desired AG. This should +# reproduce the issue on kernels that predate the fix for the AG-aware extent +# allocator (< 6.3). +# +. ./common/preamble +_begin_fstest dangerous insert prealloc punch + +# Override the default cleanup function. + _cleanup() +{ + cd / + _destroy_loop_device $LOOP_DEV + rm -f $tmp.* $LOOP_FILE +} + +declare -a workers + +busy_work() +{ + while : + do + $XFS_IO_PROG -f -c "falloc 0 $BLOCK_SIZE" $BUSY_FILE + $XFS_IO_PROG -f -c "fpunch 0 $BLOCK_SIZE" $BUSY_FILE + done +} + +kill_busy_procs() +{ + for pid in ${workers[@]}; do + kill $pid + wait $pid + done + # Despite killing the workers and waiting for them to exit, we still + # sometimes get an EBUSY unmounting the loop device. Wait a second + # before returning to give lingering refcounts a chance to reach zero. + sleep 1 +} + +find_freesp() +{ + umount $LOOP_MNT + local freesp=$($XFS_DB_PROG $LOOP_DEV -c "agf 1" -c "print freeblks" \ + | awk '{print $3}') + mount -o nodiscard $LOOP_DEV $LOOP_MNT + echo $freesp +} + +find_biggest_freesp() +{ + umount $LOOP_MNT + local freesp=$($XFS_DB_PROG $LOOP_DEV -c 'agf 1' -c 'addr cntroot' \ + -c btdump | sed -rn 's/^[0-9]+:\[[0-9]+,([0-9]+)\].*/\1/p' \ + | tail -1) + mount -o nodiscard $LOOP_DEV $LOOP_MNT + echo $freesp +} + + +# Import common functions. +. ./common/filter + +# real QA test starts here + +# Modify as appropriate. +_supported_fs xfs +_require_test +_require_xfs_io_command "falloc" +_require_xfs_io_command "finsert" +_require_xfs_io_command "fpunch" + +# Require loop devices so that this test can create a small filesystem with +# a specific geometry that assists in making this easier to reproduce. +_require_loop + +LOOP_FILE=$TEST_DIR/$seq.img +LOOP_MNT=$TEST_DIR/$seq.mnt +mkdir -p $LOOP_MNT +$XFS_IO_PROG -ft -c "truncate 2g" $LOOP_FILE >> $seqres.full +LOOP_DEV=`_create_loop_device $LOOP_FILE` +loop_mkfs_addl_opts= + +$MKFS_XFS_PROG 2>&1 | grep -q rmapbt && \ + loop_mkfs_addl_opts="$loop_mkfs_addl_opts,rmapbt=0" +$MKFS_XFS_PROG 2>&1 | grep -q reflink && \ + loop_mkfs_addl_opts="$loop_mkfs_addl_opts,reflink=1" + +_mkfs_dev "-b size=4096 -m crc=1$loop_mkfs_addl_opts -d size=1708m,agcount=2" \ + " -l size=1986b" $LOOP_DEV >> $seqres.full +# nodiscard makes the unbusying of extents more predictable which makes the test +# repeatable. +_mount -o nodiscard $LOOP_DEV $LOOP_MNT + +BLOCK_SIZE=$(_get_file_block_size $LOOP_MNT) + +# Add a directory under the root of LOOP_MNT in order to ensure the files placed +# there end up in AG 1. + +TEST_SUBDIR=testdir/testsubdir +mkdir -p $LOOP_MNT/$TEST_SUBDIR + +# There are 3 files in this test. The first is allocated to consume most of the +# space in the AG. The test then punches holes in the file in order to allow +# the second file to allocate the fragmented blocks as individual extents. The +# second file is written such that it is only a couple of operations away from +# being converted from in-line extents to a b-tree in its bmbt. The third file +# simply exists for us to allocate spare blocks to in order to take the AG right +# up against ENOSPC. + +FILE1=$LOOP_MNT/$TEST_SUBDIR/file1 +FILE2=$LOOP_MNT/$TEST_SUBDIR/file2 +FILE3=$LOOP_MNT/$TEST_SUBDIR/file3 + +# BUSY_FILE is here to keep AG0 busy on versions of XFS where the allocator is +# allowed to check lower numbered AGs if it fails in a higher numbered one. +BUSY_FILE=$LOOP_MNT/testdir/busyfile + +# Calculate the number of extents we need in the cnobt and bnobt. This should +# be the maximum b-tree leaf size minus one, so that after adding two extents a +# split is triggered. + +# The test is currently set to always use CRC, otherwise this would be 56 if CRC +# and 16 if not. +alloc_block_len=56 +allocbt_leaf_maxrecs=$(((BLOCK_SIZE - alloc_block_len) / 8)) + +# Look at the attrfork offset in FILE2's inode in order to determine the number +# of extents before this splits to a b-tree. This test assumes a v5 filesystem +# so if forkoff is zero, it falls back to LITINO of 336 and uses a bmbt_rec_size +# of 16. +touch $FILE2 +file2_inode=$(stat -c '%i' "$FILE2") +# Temporarily unmount while parameters are gathered via xfs_db +umount $LOOP_MNT +forkoff=$($XFS_DB_PROG $LOOP_DEV -c "inode $file2_inode" \ + -c "print core.forkoff" | awk '{print $3}') +freeblks=$($XFS_DB_PROG $LOOP_DEV -c "agf 1" \ + -c "print freeblks" | awk '{print $3}') +mount -o nodiscard $LOOP_DEV $LOOP_MNT + +# We'll recreate FILE2 later. For now be as empty as we can. +rm $FILE2 + +# Some versions of xfs_db contain the agresv command, but not all do. +# Additionally, some parameters about how much space is actually allocatable +# aren't visible from xfs_db. Tracepoints have been helpful in figuring this +# out when developing the test by hand. Instead of trying to parse ftrace data +# and hope that the right tracepoints are available, brute force the actual +# allocatable maximum size by repeatedly trying to allocate larger offsets +# subtracted from $freeblks until one succeeds. +for (( i = 0, ag = 0; ; i++ )) +do + $XFS_IO_PROG -f -c "falloc 0 $(( (freeblks-i)*BLOCK_SIZE ))" $FILE1 + ag=$(xfs_bmap -v -n 1 $FILE1 | tail -1 | awk '{print $4}') + rm $FILE1 + (( ag == 1 )) && break +done + +# Let free'd extents unbusy +sleep 30 + +# At this point, $i is one larger than whatever the allocator thinks the maximum +# available space is. This is based upon the asssumption that the data +# allocation we made above set minleft = 1, so the allocation that finally fit +# into AG 1 has had any reservation withheld along with the space the allocator +# requested be withheld for any bmbt expansion. +freeblkseff=$((freeblks - i - 1)) +blocks_withheld=$((i+1)) + +iforksize=$((forkoff > 0 ? forkoff * 8 : 336)) +maxextents=$((iforksize / 16)) +# We'll need to allocate maxextents - 1 for this test, so that the last +# allocation to the file forces an extents -> b-tree conversion. +wanted_extents=$((maxextents - 1)) + +# The first allocation is just a big chunk into the first file. Grab the +# majority of the free blocks. +first_alloc_blocks=$((freeblkseff - 8192)) + +$XFS_IO_PROG -f -c "falloc 0 $((BLOCK_SIZE * first_alloc_blocks))" $FILE1 >> \ + $seqres.full + +# Insert space in the middle in order to help the allocator pick sequential +# blocks when we add space back later. If we don't do this, then it can break +# up larger extents instead of grabbing the ones we fpunch'd out. +# +# The insert offset was chosen arbitrarily and placed towards the beginning of +# the file for the conveinence of humans. The insert_blocks size needs to be +# larger than the space we'll later punch out of file 2 and insert back into +# file 1. This is on the order of 7 blocks, so 512 should always be large +# enough. +first_insert_offset=$((BLOCK_SIZE * 2083)) +first_insert_blocks=$((BLOCK_SIZE * 512)) + +$XFS_IO_PROG -f -c "finsert $first_insert_offset $first_insert_blocks" \ + $FILE1 >> $seqres.full + +# Punch 3-block holes into the file. This number was chosen so that we could +# re-allocate blocks from these chunks without causing the extent to get removed +# from the free-space btrees. +# +# Punch enough holes to ensure that the bnobt and cnobt end up two extents away +# from a b-tree split, and overshoot that value by the number we need to consume +# in the second file to end up with wanted_extents - 1. +num_holes=$(((allocbt_leaf_maxrecs-2) + (wanted_extents - 1))) +end_hole_offset=$((num_holes * 4 - 3)) +hole_blocks=$((BLOCK_SIZE * 3)) + +for i in $(seq 1 4 $end_hole_offset); do + $XFS_IO_PROG -f -c "fpunch $((BLOCK_SIZE * i)) $hole_blocks" \ + $FILE1 >> $seqres.full +done + +# Use the newly created holes to create extents in our victim file. The goal is +# to allocate up to the point of b-tree conversion minus 2. The remaining space +# is placed in the n-1 extent, and then the last is reserved for the split we +# trigger later. The holes are placed after a gap that's left towards the front +# of the file to allocate the rest of the space. This is done to get the +# allocator to give us the contiguous free chunk that would have previously been +# occupied by the per-AG reservation's free space. +alloc_hole_seq=$(((wanted_extents - 1) * 4 - 3)) + +# The offset for the placement of the holes needs to be after the remaining +# freespace chunk so calculate how big that needs to be first. We may need to +# recalculate this value to account for blocks freed from the AGFL later. +biggest_freesp=$(find_biggest_freesp) + +# 3x the biggest chunk of free blocks should be a big enough gap +hole_offset=$((biggest_freesp * 3)) + +for i in $(seq 1 4 $alloc_hole_seq); do + $XFS_IO_PROG -f \ + -c "falloc $((BLOCK_SIZE * (i+hole_offset))) $hole_blocks" \ + $FILE2 >> $seqres.full +done + +# Attempt to compensate for any late-breaking over/undershoot in the desired +# extent count by checking the number of extents in the bnobt and adding or +# removing space to try to arrive at the desired number. +umount $LOOP_MNT +current_extents=$($XFS_DB_PROG $LOOP_DEV -c 'agf 1' -c 'addr bnoroot' \ + -c 'btdump' | grep recs | sed -rn 's/^recs\[.*\-([0-9]+)\].*/\1/p' | \ + awk '{a +=int($1)} END{printf("%d\n", a);}') +mount -o nodiscard $LOOP_DEV $LOOP_MNT + +wanted_allocbt=$((allocbt_leaf_maxrecs-2)) +if [[ $current_extents -gt $wanted_allocbt ]]; then + ext_diff=$(( current_extents - wanted_allocbt )) + end_offset=$(( ext_diff * 4 - 3 )) + for i in $(seq 1 4 $end_offset); do + $XFS_IO_PROG -f -c "falloc $((BLOCK_SIZE * i)) $hole_blocks" \ + $FILE1 >> $seqres.full + done +elif [[ $current_extents -lt $wanted_allocbt ]]; then + ext_diff=$(( wanted_allocbt - current_extents )) + end_offset=$(( (ext_diff * 4 - 3) + end_hole_offset )) + for i in $(seq $end_hole_offset 4 $end_offset); do + $XFS_IO_PROG -f -c "fpunch $((BLOCK_SIZE * i )) $hole_blocks" \ + $FILE1 >> $seqres.full + done +fi + +# The previous falloc should have triggered a reverse-split of the freespace +# btrees. The next alloc should cause the freelist to be drained. Recompute +# the available freespace with the understanding that we'll need to do this +# again after the AGFL is trimmed by the next allocation. Leave a few blocks +# free so that we can use FILE3 to create the last needed set of free extents +# before triggering a split while simultaneously using the remaining space. +freesp_remaining=$(find_freesp) +f2_alloc_blocks=$((freesp_remaining - blocks_withheld - 10)) + +$XFS_IO_PROG -f -c "falloc 0 $((BLOCK_SIZE * f2_alloc_blocks))" \ + $FILE2 >> $seqres.full + +# Recompute the remaining blocks and let FILE3 consume the remainder of the +# space. This is intended to both leave one more free extent in the btrees and +# take us down to being right before ENOSPC. +freesp_remaining=$(find_freesp) +f3_alloc_blocks=$((freesp_remaining - blocks_withheld)) +biggest_freesp=$(find_biggest_freesp) + +# Due to variance outside of the control of the test, the remaining freespace +# may be broken into smaller chunks than it's possible to allocate in a single +# attempt. If the test tries to allocate one big chunk, that allocation will +# fail and consult the next AG. To prevent that from happening, check the size +# of the remaining freespace in AG1 and break this allocation into smaller +# chunks that a) consume space from AG1 and b) do not cause the extents we've +# carefully added to the freespace trees to get removed. +if [[ $f3_alloc_blocks -lt $biggest_freesp ]]; then + $XFS_IO_PROG -f -c "falloc 0 $((BLOCK_SIZE * f3_alloc_blocks))" \ + $FILE3 >> $seqres.full +else + alloc_left=$f3_alloc_blocks + alloc_blocks=$((biggest_freesp - 1)) + alloc_ofst=0 + while ((alloc_left > 0)); do + size=$((alloc_blocks * BLOCK_SIZE)) + $XFS_IO_PROG -f -c "falloc $alloc_ofst $size" \ + $FILE3 >> $seqres.full + alloc_left=$((alloc_left - alloc_blocks)) + alloc_ofst=$((alloc_ofst + (1000*BLOCK_SIZE))) + biggest_freesp=$(find_biggest_freesp) + if [[ $alloc_left -lt $biggest_freesp ]]; then + alloc_blocks=$alloc_left + else + alloc_blocks=$((biggest_freesp - 1)) + fi + done +fi + +# That's it for the setup. Now the test punches out a 12 block extent as one 6 +# block chunk in the middle, followed by two 3 block chunks on either side. It +# sleeps after the 6 block chunk so that portion of the extent will un-busy, but +# the 3 block chunks on either side stay (temporarily) unavailable. While the +# chunks on either side are busy, re-allocate some of the space that's been +# free'd back to FILE1 so that the final falloc to FILE2 brings us to ENOSPC. +f2_off=2560 +$XFS_IO_PROG -f -c "fpunch $((BLOCK_SIZE * f2_off)) $((BLOCK_SIZE * 6))" \ + $FILE2 >> $seqres.full +# Before we finish punching the final holes, start up some busy workers to keep +# the _other_ AG's locks contended. This isn't needed to reproduce the problem +# prior to the AG-aware allocator's arrival. Ncpu * 4 has been successful at +# reproducing the problem in places where a lower number of workers succeeds +# intermittently (or not at all). +ncpu=$(nproc) +for ((i=0 ; i < ncpu*4; i++)); do + busy_work & + workers[$i]=$! +done + +# Wait for first fpunch to unbusy, and then continue with remainder. +sleep 30 +$XFS_IO_PROG -f -c "fpunch $((BLOCK_SIZE * (f2_off + 6))) $((BLOCK_SIZE * 3))" \ + $FILE2 >> $seqres.full +$XFS_IO_PROG -f -c "fpunch $((BLOCK_SIZE * (f2_off - 3))) $((BLOCK_SIZE * 3))" \ + $FILE2 >> $seqres.full + +# Put 7 blocks back into FILE1 to consume some of the space free'd above. +# The offset here was picked so that the allocator takes blocks from the 3 block +# chunks we punched earlier, but leaves the extents intact in the freespace +# trees. +f1_off=1956 +f1_wanted_blocks=7 +f1_alloc_seq=$((f1_wanted_blocks * 4 - 3)) +for i in $(seq 1 4 $f1_alloc_seq); do + $XFS_IO_PROG -f -c "falloc $((BLOCK_SIZE * (i+f1_off))) $BLOCK_SIZE" \ + $FILE1 >> $seqres.full +done + +# This next falloc should result in FILE2's bmbt getting converted from extents +# to btree while simultaneously splitting the bnotbt and cnobt. The first +# allocation succeeds, splits the free space trees, consumes all the blocks in +# the agfl, and leaves us in a situation where the second allocation to convert +# from extents to a btree fails. +$XFS_IO_PROG -f -c "falloc $((BLOCK_SIZE * f2_off)) $((BLOCK_SIZE * 5))" \ + $FILE2 >> $seqres.full + +# Terminate the busy workers or else umount will return EBUSY. +kill_busy_procs + +umount $LOOP_MNT + +# success, all done +echo "Silence is golden" +status=0 +exit diff --git a/tests/xfs/608.out b/tests/xfs/608.out new file mode 100644 index 00000000..1e534458 --- /dev/null +++ b/tests/xfs/608.out @@ -0,0 +1,2 @@ +QA output created by 608 +Silence is golden -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 2/2] xfs/306: update resblks to account for increased per-ag reservation size 2024-08-15 19:36 ` [RFC PATCH 0/2] xfstests: fstests for agfl reservation Krister Johansen 2024-08-15 19:37 ` [RFC PATCH 1/2] xfs/608: a test case for xfs_bmap_extents_to_btree allocation failure Krister Johansen @ 2024-08-15 19:40 ` Krister Johansen 1 sibling, 0 replies; 10+ messages in thread From: Krister Johansen @ 2024-08-15 19:40 UTC (permalink / raw) To: Chandan Babu R, Darrick J. Wong, Dave Chinner Cc: Dave Chinner, Zorro Lang, linux-xfs, fstests The AGFL reservation patches increase the amount of reserved space that this test needs in order to succeed. Modify the resblks xfs_io call to account for the increased per-AG reservation. Without this change, the dd in the test gets an ENOSPC in the path where a delalloc is converted to a real allocation. This results in the test spinning in the dd command and generating endless xfs_discard_folio warnings. Since resblks is supposed to prevent the filesystem from getting empty enough to hit a case like this, increase it slightly to account for the modified per-ag reservation size. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> --- tests/xfs/306 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/xfs/306 b/tests/xfs/306 index e21a5622..9acace85 100755 --- a/tests/xfs/306 +++ b/tests/xfs/306 @@ -47,7 +47,7 @@ for i in $(seq 0 3); do done # consume and fragment free space -$XFS_IO_PROG -xc "resblks 16" $SCRATCH_MNT >> $seqres.full 2>&1 +$XFS_IO_PROG -xc "resblks 17" $SCRATCH_MNT >> $seqres.full 2>&1 dd if=/dev/zero of=$SCRATCH_MNT/file bs=4k >> $seqres.full 2>&1 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/file >> $seqres.full 2>&1 $here/src/punch-alternating $SCRATCH_MNT/file -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-08-15 21:24 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-15 19:29 [PATCHSET] AGFL reservation changes Krister Johansen 2024-08-15 19:30 ` [PATCH 0/5] linux: Modifying per-ag reservation to account for dependent allocations Krister Johansen 2024-08-15 19:31 ` [PATCH 1/5] xfs: count the number of blocks in a per-ag reservation Krister Johansen 2024-08-15 19:32 ` [PATCH 2/5] xfs: move calculation in xfs_alloc_min_freelist to its own function Krister Johansen 2024-08-15 19:33 ` [PATCH 3/5] xfs: make alloc_set_aside and friends aware of per-AG reservations Krister Johansen 2024-08-15 19:34 ` [PATCH 4/5] xfs: push the agfl set aside into xfs_alloc_space_available Krister Johansen 2024-08-15 19:35 ` [PATCH 5/5] xfs: include min freelist in m_ag_max_usable Krister Johansen 2024-08-15 19:36 ` [RFC PATCH 0/2] xfstests: fstests for agfl reservation Krister Johansen 2024-08-15 19:37 ` [RFC PATCH 1/2] xfs/608: a test case for xfs_bmap_extents_to_btree allocation failure Krister Johansen 2024-08-15 19:40 ` [RFC PATCH 2/2] xfs/306: update resblks to account for increased per-ag reservation size Krister Johansen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox