From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: Dave Chinner <dchinner@redhat.com>, xfs@oss.sgi.com
Subject: [PATCH 12/51] xfs: rmap btree requires more reserved free space
Date: Tue, 06 Oct 2015 22:06:31 -0700 [thread overview]
Message-ID: <20151007050631.1504.24820.stgit@birch.djwong.org> (raw)
In-Reply-To: <20151007050513.1504.28089.stgit@birch.djwong.org>
>From : Dave Chinner <dchinner@redhat.com>
The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functiosn to handle this.
Also, because the macros are now executing conditional code and are called quite
frequently, convert them to functions that initialise varaibles in the struct
xfs_mount, use the new variables everywhere and document the calculations
better.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[port to xfsprogs]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
include/xfs_mount.h | 2 +
libxfs/xfs_alloc.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++
libxfs/xfs_alloc.h | 43 ++++----------------------------
libxfs/xfs_bmap.c | 2 +
libxfs/xfs_sb.c | 2 +
5 files changed, 79 insertions(+), 39 deletions(-)
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 9978769..5410168 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -69,6 +69,8 @@ typedef struct xfs_mount {
uint m_ag_maxlevels; /* XFS_AG_MAXLEVELS */
uint m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
uint m_in_maxlevels; /* XFS_IN_MAXLEVELS */
+ uint m_alloc_set_aside; /* space we can't use */
+ uint m_ag_max_usable; /* max space per AG */
struct radix_tree_root m_perag_tree;
uint m_flags; /* global mount flags */
uint m_qflags; /* quota status flags */
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index eeb682e..1a69bdc 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -58,6 +58,72 @@ xfs_prealloc_blocks(
}
/*
+ * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
+ * AGF buffer (PV 947395), we place constraints on the relationship among actual
+ * allocations for data blocks, freelist blocks, and potential file data bmap
+ * btree blocks. However, these restrictions may result in no actual space
+ * allocated for a delayed extent, for example, a data block in a certain AG is
+ * allocated but there is no additional block for the additional bmap btree
+ * block due to a split of the bmap btree of the file. The result of this may
+ * lead to an infinite loop when the file gets flushed to disk and all delayed
+ * extents need to be actually allocated. To get around this, we explicitly set
+ * aside a few blocks which will not be reserved in delayed allocation.
+ *
+ * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
+ * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
+ * so we set the number of set-aside blocks to 4 + 4*agcount when not using rmap
+ * btrees.
+ *
+ * When rmap btrees are active, we have to consider that using the last block in
+ * the AG can cause a full height rmap btree split and we need enough blocks on
+ * the AGFL to be able to handle this. That means we have, in addition to the
+ * above consideration, another (2 * mp->m_ag_levels) - 1 blocks required to be
+ * available to the free list.
+ */
+unsigned int
+xfs_alloc_set_aside(
+ struct xfs_mount *mp)
+{
+ unsigned int blocks;
+
+ blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return blocks;
+ return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_ag_maxlevels) - 1);
+}
+
+/*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size the AG. However, we cannot use all the
+ * blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ * - the AG superblock, AGF, AGI and AGFL
+ * - the AGF (bno and cnt) and AGI btree root blocks, and optionally
+ * the AGI free inode and rmap btree root blocks.
+ * - blocks on the AGFL according to xfs_alloc_set_aside() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+unsigned int
+xfs_alloc_ag_max_usable(struct xfs_mount *mp)
+{
+ unsigned int blocks;
+
+ blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
+ blocks += XFS_ALLOC_AGFL_RESERVE;
+ blocks += 3; /* AGF, AGI btree root blocks */
+ if (xfs_sb_version_hasfinobt(&mp->m_sb))
+ blocks++; /* finobt root block */
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+ /* rmap root block + full tree split on full AG */
+ blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
+ }
+
+ return mp->m_sb.sb_agblocks - blocks;
+}
+
+/*
* Lookup the record equal to [bno, len] in the btree given by cur.
*/
STATIC int /* error */
@@ -1897,6 +1963,9 @@ xfs_alloc_min_freelist(
/* space needed by-size freespace btree */
min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
mp->m_ag_maxlevels);
+ /* space needed reverse mapping used space btree */
+ min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+ mp->m_ag_maxlevels);
return min_free;
}
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 35b60ae..5b2b616 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -55,44 +55,6 @@ typedef unsigned int xfs_alloctype_t;
#define XFS_ALLOC_FLAG_TRYLOCK 0x00000001 /* use trylock for buffer locking */
#define XFS_ALLOC_FLAG_FREEING 0x00000002 /* indicate caller is freeing extents*/
-/*
- * In order to avoid ENOSPC-related deadlock caused by
- * out-of-order locking of AGF buffer (PV 947395), we place
- * constraints on the relationship among actual allocations for
- * data blocks, freelist blocks, and potential file data bmap
- * btree blocks. However, these restrictions may result in no
- * actual space allocated for a delayed extent, for example, a data
- * block in a certain AG is allocated but there is no additional
- * block for the additional bmap btree block due to a split of the
- * bmap btree of the file. The result of this may lead to an
- * infinite loop in xfssyncd when the file gets flushed to disk and
- * all delayed extents need to be actually allocated. To get around
- * this, we explicitly set aside a few blocks which will not be
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks
- * to 4 + 4*agcount.
- *
- * XXX: this changes for rmapbt filesystems.
- */
-#define XFS_ALLOC_SET_ASIDE(mp) (4 + ((mp)->m_sb.sb_agcount * 4))
-
-/*
- * When deciding how much space to allocate out of an AG, we limit the
- * allocation maximum size to the size the AG. However, we cannot use all the
- * blocks in the AG - some are permanently used by metadata. These
- * blocks are generally:
- * - the AG superblock, AGF, AGI and AGFL
- * - the AGF (bno and cnt) and AGI btree root blocks
- * - 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
- *
- * The AG headers are sector sized, so the amount of space they take up is
- * dependent on filesystem geometry. The others are all single blocks.
- *
- * XXX: this changes for rmapbt filesystems.
- */
-#define XFS_ALLOC_AG_MAX_USABLE(mp) \
- ((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
/*
* Argument structure for xfs_alloc routines.
@@ -134,6 +96,11 @@ typedef struct xfs_alloc_arg {
#define XFS_ALLOC_USERDATA 1 /* allocation is for user data*/
#define XFS_ALLOC_INITIAL_USER_DATA 2 /* special case start of file */
+/* freespace limit calculations */
+#define XFS_ALLOC_AGFL_RESERVE 4
+unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
+unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
+
xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
struct xfs_perag *pag, xfs_extlen_t need);
unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 87a6918..73fbdf0 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3701,7 +3701,7 @@ xfs_bmap_btalloc(
args.fsbno = ap->blkno;
/* Trim the allocation back to the maximum an AG can fit. */
- args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
+ args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
args.firstblock = *ap->firstblock;
blen = 0;
if (nullfb) {
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index f092da7..ddc1ecd 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -715,6 +715,8 @@ xfs_sb_mount_common(
mp->m_ialloc_min_blks = sbp->sb_spino_align;
else
mp->m_ialloc_min_blks = mp->m_ialloc_blks;
+ mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+ mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
}
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2015-10-07 5:06 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-07 5:05 [RFCv3 00/51] xfsprogs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
2015-10-07 5:05 ` [PATCH 01/51] libxcmd: provide a common function to report command runtimes Darrick J. Wong
2015-10-13 17:48 ` Christoph Hellwig
2015-10-13 22:39 ` Darrick J. Wong
2015-10-14 5:35 ` [PATCH v2 " Darrick J. Wong
2015-10-14 7:31 ` Christoph Hellwig
2015-10-07 5:05 ` [PATCH 02/51] libxfs: add reflink and dedupe ioctls Darrick J. Wong
2015-10-07 5:05 ` [PATCH 03/51] xfs_io: support reflink and dedupe of file ranges Darrick J. Wong
2015-10-14 5:36 ` [PATCH v2 " Darrick J. Wong
2015-11-09 7:54 ` Christoph Hellwig
2015-11-09 18:33 ` Darrick J. Wong
2015-11-09 18:57 ` Darrick J. Wong
2015-11-09 21:35 ` Dave Chinner
2015-11-10 6:27 ` Darrick J. Wong
2015-10-07 5:05 ` [PATCH 04/51] xfs_io: unshare blocks via fallocate Darrick J. Wong
2015-10-07 5:05 ` [PATCH 05/51] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
2015-10-07 5:05 ` [PATCH 06/51] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
2015-10-07 5:05 ` [PATCH 07/51] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
2015-10-14 17:08 ` Christoph Hellwig
2015-10-14 18:20 ` Darrick J. Wong
2015-10-14 18:23 ` Christoph Hellwig
2015-10-14 19:52 ` Darrick J. Wong
2015-10-14 21:26 ` Dave Chinner
2015-10-07 5:06 ` [PATCH 08/51] libxfs: reorder xfs_bmap_add_free args Darrick J. Wong
2015-10-07 5:06 ` [PATCH 09/51] libxfs: add the reverse-mapping btree Darrick J. Wong
2015-10-07 5:06 ` [PATCH 10/51] libxfs: resync xfs_prealloc_blocks with the kernel Darrick J. Wong
2015-10-07 5:06 ` [PATCH 11/51] xfs: rmap btree transaction reservations Darrick J. Wong
2015-10-07 5:06 ` Darrick J. Wong [this message]
2015-10-07 5:06 ` [PATCH 13/51] libxfs: propagate a bunch of case changes to mkfs and repair Darrick J. Wong
2015-10-07 5:06 ` [PATCH 14/51] libxfs: fix min freelist length calculation Darrick J. Wong
2015-10-07 5:06 ` [PATCH 15/51] libxfs: add the RMAP CRC to the xfs_magics list Darrick J. Wong
2015-10-07 5:06 ` [PATCH 16/51] libxfs: enhance rmapbt definition to support reflink Darrick J. Wong
2015-10-07 5:07 ` [PATCH 17/51] libxfs: refactor short btree block verification Darrick J. Wong
2015-10-07 5:07 ` [PATCH 18/51] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
2015-10-07 5:07 ` [PATCH 19/51] libxfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
2015-10-07 5:07 ` [PATCH 20/51] xfs_db: display rmap btree contents Darrick J. Wong
2015-10-07 5:07 ` [PATCH 21/51] xfs_dump: display enhanced rmapbt fields Darrick J. Wong
2015-10-07 5:07 ` [PATCH 22/51] xfs_db: check rmapbt Darrick J. Wong
2015-10-07 5:07 ` [PATCH 23/51] xfs_db: copy the rmap btree Darrick J. Wong
2015-10-07 5:07 ` [PATCH 24/51] xfs_growfs: report rmapbt presence Darrick J. Wong
2015-10-07 5:07 ` [PATCH 25/51] xfs_repair: use rmap btree data to check block types Darrick J. Wong
2015-10-07 5:08 ` [PATCH 26/51] xfs_repair: mask off length appropriately Darrick J. Wong
2015-10-07 5:08 ` [PATCH 27/51] xfs_repair: fix fino_bno calculation when rmapbt is enabled Darrick J. Wong
2015-10-07 5:08 ` [PATCH 28/51] xfs_repair: create a slab API for allocating arrays in large chunks Darrick J. Wong
2015-10-07 5:08 ` [PATCH 29/51] xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong
2015-10-07 5:08 ` [PATCH 30/51] xfs_repair: record and merge raw rmap data Darrick J. Wong
2015-10-07 5:08 ` [PATCH 31/51] xfs_repair: add inode bmbt block rmaps Darrick J. Wong
2015-10-07 5:08 ` [PATCH 32/51] xfs_repair: add fixed-location per-AG rmaps Darrick J. Wong
2015-10-21 21:08 ` Darrick J. Wong
2015-10-07 5:08 ` [PATCH 33/51] xfs_repair: check existing rmapbt entries against observed rmaps Darrick J. Wong
2015-10-07 5:08 ` [PATCH 34/51] xfs_repair: rebuild reverse-mapping btree Darrick J. Wong
2015-10-07 5:09 ` [PATCH 35/51] xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt Darrick J. Wong
2015-10-07 5:09 ` [PATCH 36/51] mkfs.xfs: Create rmapbt filesystems Darrick J. Wong
2015-10-07 5:09 ` [PATCH 37/51] xfs_mkfs: initialize extra fields during mkfs Darrick J. Wong
2015-10-07 5:09 ` [PATCH 38/51] libxfs: add support for refcount btrees Darrick J. Wong
2015-10-07 5:09 ` [PATCH 39/51] xfs_db: dump refcount btree data Darrick J. Wong
2015-10-07 5:09 ` [PATCH 40/51] xfs_db: add support for checking the refcount btree Darrick J. Wong
2015-10-07 5:09 ` [PATCH 41/51] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
2015-10-07 5:09 ` [PATCH 42/51] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
2015-10-07 5:09 ` [PATCH 43/51] xfs_repair: check the existing refcount btree Darrick J. Wong
2015-10-07 5:09 ` [PATCH 44/51] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
2015-10-07 5:10 ` [PATCH 45/51] xfs_repair: process reverse-mapping data into refcount data Darrick J. Wong
2015-10-07 5:10 ` [PATCH 46/51] xfs_repair: record reflink inode state Darrick J. Wong
2015-10-07 5:10 ` [PATCH 47/51] xfs_repair: fix inode reflink flags Darrick J. Wong
2015-10-07 5:10 ` [PATCH 48/51] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
2015-10-07 5:10 ` [PATCH 49/51] xfs_repair: rebuild the refcount btree Darrick J. Wong
2015-10-07 5:10 ` [PATCH 50/51] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
2015-10-07 5:10 ` [PATCH 51/51] mkfs: hack around not having enough log blocks Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151007050631.1504.24820.stgit@birch.djwong.org \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox