From: Pankaj Raghav <p.raghav@samsung.com>
To: linux-xfs@vger.kernel.org
Cc: bfoster@redhat.com, lukas@herbolt.com,
"Darrick J . Wong" <djwong@kernel.org>,
p.raghav@samsung.com, dgc@kernel.org, gost.dev@samsung.com,
pankaj.raghav@linux.dev, andres@anarazel.de,
kundan.kumar@samsung.com, cem@kernel.org, hch@infradead.org
Subject: [PATCH v4 1/2] xfs: add xfs_bmap_alloc_or_convert_range function
Date: Fri, 1 May 2026 13:20:25 +0200 [thread overview]
Message-ID: <20260501112026.673609-2-p.raghav@samsung.com> (raw)
In-Reply-To: <20260501112026.673609-1-p.raghav@samsung.com>
Add xfs_bmap_alloc_or_convert_range() that can either allocate a range
and/or convert unwritten extents to written extents.
This function is based on xfs_iomap_write_unwritten() but we add an
extra flag parameter. Only XFS_BMAPI_CONVERT and/or XFS_BMAPI_ZERO is
accepted as flags to this function. This function also additionally
accounts while starting the transaction for the blocks that might
be created because of XFS_BMAPI_ZERO.
This is done as a preparation to add FALLOC_FL_WRITE_ZEROES flag.
xfs_iomap_write_unwritten() function will now just call
xfs_bmap_alloc_or_convert_range() with flag set to XFS_BMAPI_CONVERT.
Suggested-by: Dave Chinner <dgc@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
I feel like xfs_bmap_alloc_or_convert_range() has become a bit big and
logic repeated, should we rethink of going back to the original solution? [1]
[1] https://lore.kernel.org/linux-xfs/20260310194245.848034-2-lukas@herbolt.com/
fs/xfs/xfs_iomap.c | 215 ++++++++++++++++++++++++++++++++-------------
fs/xfs/xfs_iomap.h | 2 +
2 files changed, 155 insertions(+), 62 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f20a02f49ed9..40715589814a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -609,12 +609,30 @@ xfs_iomap_prealloc_size(
return alloc_blocks;
}
+/*
+ * This function is used to allocate written extents over holes and/or convert
+ * unwritten extents to written extents based on the @flags passed to it.
+ *
+ * If @flags is zero, it will allocate written extents for holes and delalloc
+ * extents across the range.
+ *
+ * If XFS_BMAPI_CONVERT is specified in @flags, then it will also do conversion
+ * of unwritten extents in the range to written extents.
+ *
+ * If XFS_BMAPI_ZERO is specified in @flags, then both newly allocated extents
+ * and converted unwritten extents will be initialised to contain zeroes.
+ *
+ * If @update_isize is true, then if the range we are operating on extends
+ * beyond the current EOF, extend i_size to offset+len incrementally as extents
+ * in the range are allocated/converted.
+ */
int
-xfs_iomap_write_unwritten(
- xfs_inode_t *ip,
- xfs_off_t offset,
- xfs_off_t count,
- bool update_isize)
+xfs_bmap_alloc_or_convert_range(
+ struct xfs_inode *ip,
+ xfs_off_t offset,
+ xfs_off_t count,
+ uint32_t flags,
+ bool update_isize)
{
xfs_mount_t *mp = ip->i_mount;
xfs_fileoff_t offset_fsb;
@@ -625,98 +643,159 @@ xfs_iomap_write_unwritten(
xfs_bmbt_irec_t imap;
struct inode *inode = VFS_I(ip);
xfs_fsize_t i_size;
- uint resblks;
int error;
+ int rt = XFS_IS_REALTIME_INODE(ip);
+ xfs_extlen_t extsz, temp;
- trace_xfs_unwritten_convert(ip, offset, count);
+ ASSERT((flags & ~(XFS_BMAPI_ZERO | XFS_BMAPI_CONVERT)) == 0);
+ extsz = xfs_get_extsz_hint(ip);
offset_fsb = XFS_B_TO_FSBT(mp, offset);
count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
count_fsb = (xfs_filblks_t)(count_fsb - offset_fsb);
- /*
- * Reserve enough blocks in this transaction for two complete extent
- * btree splits. We may be converting the middle part of an unwritten
- * extent and in this case we will insert two new extents in the btree
- * each of which could cause a full split.
- *
- * This reservation amount will be used in the first call to
- * xfs_bmbt_split() to select an AG with enough space to satisfy the
- * rest of the operation.
- */
- resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
-
/* Attach dquots so that bmbt splits are accounted correctly. */
error = xfs_qm_dqattach(ip);
if (error)
return error;
do {
+
+ uint resblks, dblocks, bmapi_total, rblocks;
+ int bmapi_error;
+
+ if (flags == XFS_BMAPI_CONVERT) {
+ /*
+ * Reserve enough blocks in this transaction for two
+ * complete extent btree splits. We may be converting
+ * the middle part of an unwritten extent and in this
+ * case we will insert two new extents in the btree each
+ * of which could cause a full split.
+ *
+ * This reservation amount will be used in the first call
+ * to xfs_bmbt_split() to select an AG with enough space
+ * to satisfy the rest of the operation.
+ */
+ resblks = 0;
+ dblocks = XFS_DIOSTRAT_SPACE_RES(mp, resblks) << 1;
+ rblocks = 0;
+ bmapi_total = dblocks;
+ } else {
+ xfs_fileoff_t s, e;
+ if (unlikely(extsz)) {
+ s = offset_fsb;
+ do_div(s, extsz);
+ s *= extsz;
+ e = offset_fsb + count_fsb;
+ div_u64_rem(offset_fsb, extsz, &temp);
+ if (temp)
+ e += temp;
+ div_u64_rem(e, extsz, &temp);
+ if (temp)
+ e += extsz - temp;
+ } else {
+ s = 0;
+ e = count_fsb;
+ }
+
+ /*
+ * We might allocate data blocks (needs resblks + 1 split)
+ * or convert an unwritten extent (needs 0 data blocks +
+ * 2 splits). Ensure we have enough block reservation for
+ * the worst case.
+ */
+ resblks = XFS_FILBLKS_MIN((e - s), XFS_MAX_BMBT_EXTLEN);
+ bmapi_total = 0;
+ if (unlikely(rt)) {
+ rblocks = XFS_DIOSTRAT_SPACE_RES(mp, resblks);
+ dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
+ } else {
+ rblocks = 0;
+ dblocks = XFS_DIOSTRAT_SPACE_RES(mp, resblks);
+ dblocks = max(dblocks,
+ XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1);
+ }
+ }
+
/*
- * Set up a transaction to convert the range of extents
- * from unwritten to real. Do allocations in a loop until
- * we have covered the range passed in.
+ * Set up a transaction to convert the range of extents based on
+ * the flags. Do allocations in a loop until we have covered the
+ * range passed in.
*
* Note that we can't risk to recursing back into the filesystem
* here as we might be asked to write out the same inode that we
* complete here and might deadlock on the iolock.
*/
- error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, resblks,
- 0, true, &tp);
+ error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, dblocks,
+ rblocks, true, &tp);
if (error)
return error;
- error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
- XFS_IEXT_WRITE_UNWRITTEN_CNT);
+ if (flags & XFS_BMAPI_CONVERT)
+ error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
+ XFS_IEXT_WRITE_UNWRITTEN_CNT);
+ else
+ error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
+ XFS_IEXT_ADD_NOSPLIT_CNT);
+
if (error)
goto error_on_bmapi_transaction;
- /*
- * Modify the unwritten extent state of the buffer.
- */
nimaps = 1;
error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
- XFS_BMAPI_CONVERT, resblks, &imap,
- &nimaps);
- if (error)
- goto error_on_bmapi_transaction;
+ flags, bmapi_total, &imap, &nimaps);
+ bmapi_error = error;
+
+ if (error) {
+ if (error != -ENOSR)
+ goto error_on_bmapi_transaction;
+ /*
+ * Keep searching until we get one contiguous
+ * extent if we get ENOSR
+ */
+ error = 0;
+ } else {
+ /*
+ * Log the updated inode size as we go. We have to be
+ * careful to only log it up to the actual write offset
+ * if it is halfway into a block.
+ */
+ i_size = XFS_FSB_TO_B(mp, offset_fsb + count_fsb);
+ if (i_size > offset + count)
+ i_size = offset + count;
+ if (update_isize && i_size > i_size_read(inode))
+ i_size_write(inode, i_size);
+ i_size = xfs_new_eof(ip, i_size);
+ if (i_size) {
+ ip->i_disk_size = i_size;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ }
- /*
- * Log the updated inode size as we go. We have to be careful
- * to only log it up to the actual write offset if it is
- * halfway into a block.
- */
- i_size = XFS_FSB_TO_B(mp, offset_fsb + count_fsb);
- if (i_size > offset + count)
- i_size = offset + count;
- if (update_isize && i_size > i_size_read(inode))
- i_size_write(inode, i_size);
- i_size = xfs_new_eof(ip, i_size);
- if (i_size) {
- ip->i_disk_size = i_size;
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
}
error = xfs_trans_commit(tp);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- if (error)
- return error;
- if (unlikely(!xfs_valid_startblock(ip, imap.br_startblock))) {
- xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
- return xfs_alert_fsblock_zero(ip, &imap);
+ if (!bmapi_error) {
+ if (unlikely(!xfs_valid_startblock(ip, imap.br_startblock))) {
+ xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
+ return xfs_alert_fsblock_zero(ip, &imap);
+ }
+ if ((numblks_fsb = imap.br_blockcount) == 0) {
+ /*
+ * The numblks_fsb value should always get smaller,
+ * otherwise the loop is stuck.
+ */
+ ASSERT(imap.br_blockcount);
+ break;
+ }
+ offset_fsb += numblks_fsb;
+ count_fsb -= numblks_fsb;
}
- if ((numblks_fsb = imap.br_blockcount) == 0) {
- /*
- * The numblks_fsb value should always get
- * smaller, otherwise the loop is stuck.
- */
- ASSERT(imap.br_blockcount);
- break;
- }
- offset_fsb += numblks_fsb;
- count_fsb -= numblks_fsb;
+ if (error)
+ return error;
+
} while (count_fsb > 0);
return 0;
@@ -727,6 +806,18 @@ xfs_iomap_write_unwritten(
return error;
}
+int
+xfs_iomap_write_unwritten(
+ xfs_inode_t *ip,
+ xfs_off_t offset,
+ xfs_off_t count,
+ bool update_isize)
+{
+ trace_xfs_unwritten_convert(ip, offset, count);
+ return xfs_bmap_alloc_or_convert_range(ip, offset, count,
+ XFS_BMAPI_CONVERT, update_isize);
+}
+
static inline bool
imap_needs_alloc(
struct inode *inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index ebcce7d49446..ba7b06074539 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -16,6 +16,8 @@ int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb,
xfs_fileoff_t count_fsb, unsigned int flags,
struct xfs_bmbt_irec *imap, u64 *sequence);
int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
+int xfs_bmap_alloc_or_convert_range(struct xfs_inode *, xfs_off_t, xfs_off_t,
+ uint32_t, bool);
xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
xfs_fileoff_t end_fsb);
--
2.51.2
next prev parent reply other threads:[~2026-05-01 11:20 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 11:20 [PATCH v4 0/2] add FALLOC_FL_WRITE_ZEROES support to xfs Pankaj Raghav
2026-05-01 11:20 ` Pankaj Raghav [this message]
2026-05-08 9:01 ` [PATCH v4 1/2] xfs: add xfs_bmap_alloc_or_convert_range function Christoph Hellwig
2026-05-01 11:20 ` [PATCH v4 2/2] xfs: add support for FALLOC_FL_WRITE_ZEROES Pankaj Raghav
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260501112026.673609-2-p.raghav@samsung.com \
--to=p.raghav@samsung.com \
--cc=andres@anarazel.de \
--cc=bfoster@redhat.com \
--cc=cem@kernel.org \
--cc=dgc@kernel.org \
--cc=djwong@kernel.org \
--cc=gost.dev@samsung.com \
--cc=hch@infradead.org \
--cc=kundan.kumar@samsung.com \
--cc=linux-xfs@vger.kernel.org \
--cc=lukas@herbolt.com \
--cc=pankaj.raghav@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox