* [PATCH v10] xfs: add FALLOC_FL_WRITE_ZEROES to XFS code base
From: Lukas Herbolt @ 2026-02-25 8:39 UTC
To: linux-xfs, djwong; +Cc: cem, hch, Lukas Herbolt
Add support for FALLOC_FL_WRITE_ZEROES if the underlying device enables
the unmap write zeroes operation.
Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
---
fs/xfs/xfs_bmap_util.c | 5 +++--
fs/xfs/xfs_bmap_util.h | 2 +-
fs/xfs/xfs_file.c | 43 +++++++++++++++++++++++++++++-------------
3 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2208a720ec3f..0c1b1fa82f8b 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -646,7 +646,8 @@ int
xfs_alloc_file_space(
struct xfs_inode *ip,
xfs_off_t offset,
- xfs_off_t len)
+ xfs_off_t len,
+ uint32_t bmapi_flags)
{
xfs_mount_t *mp = ip->i_mount;
xfs_off_t count;
@@ -748,7 +749,7 @@ xfs_alloc_file_space(
* will eventually reach the requested range.
*/
error = xfs_bmapi_write(tp, ip, startoffset_fsb,
- allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
+ allocatesize_fsb, bmapi_flags, 0, imapp,
&nimaps);
if (error) {
if (error != -ENOSR)
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index c477b3361630..2895cc97a572 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -56,7 +56,7 @@ int xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
/* preallocation and hole punch interface */
int xfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
- xfs_off_t len);
+ xfs_off_t len, uint32_t bmapi_flags);
int xfs_free_file_space(struct xfs_inode *ip, xfs_off_t offset,
xfs_off_t len, struct xfs_zone_alloc_ctx *ac);
int xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 7874cf745af3..1ba4f449edb3 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1293,6 +1293,7 @@ xfs_falloc_zero_range(
unsigned int blksize = i_blocksize(inode);
loff_t new_size = 0;
int error;
+ uint32_t bmapi_flags;
trace_xfs_zero_file_space(ip);
@@ -1300,18 +1301,31 @@ xfs_falloc_zero_range(
if (error)
return error;
- if (xfs_falloc_force_zero(ip, ac)) {
- error = xfs_zero_range(ip, offset, len, ac, NULL);
- } else {
- error = xfs_free_file_space(ip, offset, len, ac);
- if (error)
- return error;
- len = round_up(offset + len, blksize) -
- round_down(offset, blksize);
- offset = round_down(offset, blksize);
- error = xfs_alloc_file_space(ip, offset, len);
+ if (mode & FALLOC_FL_WRITE_ZEROES) {
+ if (xfs_is_always_cow_inode(ip) ||
+ !bdev_write_zeroes_unmap_sectors(
+ xfs_inode_buftarg(ip)->bt_bdev))
+ return -EOPNOTSUPP;
+ bmapi_flags = XFS_BMAPI_ZERO;
+ } else {
+ if (xfs_falloc_force_zero(ip, ac)) {
+ error = xfs_zero_range(ip, offset, len, ac, NULL);
+ goto set_filesize;
+ } else {
+ error = xfs_free_file_space(ip, offset, len, ac);
+ if (error)
+ return error;
+ }
+ bmapi_flags = XFS_BMAPI_PREALLOC;
}
+
+ len = round_up(offset + len, blksize) - round_down(offset, blksize);
+ offset = round_down(offset, blksize);
+
+ error = xfs_alloc_file_space(ip, offset, len, bmapi_flags);
+
+set_filesize:
if (error)
return error;
return xfs_falloc_setsize(file, new_size);
@@ -1336,7 +1350,8 @@ xfs_falloc_unshare_range(
if (error)
return error;
- error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+ error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+ XFS_BMAPI_PREALLOC);
if (error)
return error;
return xfs_falloc_setsize(file, new_size);
@@ -1364,7 +1379,8 @@ xfs_falloc_allocate_range(
if (error)
return error;
- error = xfs_alloc_file_space(XFS_I(inode), offset, len);
+ error = xfs_alloc_file_space(XFS_I(inode), offset, len,
+ XFS_BMAPI_PREALLOC);
if (error)
return error;
return xfs_falloc_setsize(file, new_size);
@@ -1374,7 +1390,7 @@ xfs_falloc_allocate_range(
(FALLOC_FL_ALLOCATE_RANGE | FALLOC_FL_KEEP_SIZE | \
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | \
FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE | \
- FALLOC_FL_UNSHARE_RANGE)
+ FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_WRITE_ZEROES)
STATIC long
__xfs_file_fallocate(
@@ -1417,6 +1433,7 @@ __xfs_file_fallocate(
case FALLOC_FL_INSERT_RANGE:
error = xfs_falloc_insert_range(file, offset, len);
break;
+ case FALLOC_FL_WRITE_ZEROES:
case FALLOC_FL_ZERO_RANGE:
error = xfs_falloc_zero_range(file, mode, offset, len, ac);
break;
--
2.53.0
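For readers following the range arithmetic in xfs_falloc_zero_range(), here is a small userspace sketch of how the byte range is widened to block boundaries before xfs_alloc_file_space() is called. The TOY_ROUND_* macros are simplified stand-ins for the kernel's round_up()/round_down() and assume a power-of-two block size; struct toy_range and toy_align_range() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel helpers; valid only for power-of-two b. */
#define TOY_ROUND_DOWN(x, b)	((x) & ~((uint64_t)(b) - 1))
#define TOY_ROUND_UP(x, b)	TOY_ROUND_DOWN((x) + (b) - 1, (b))

/* Hypothetical container for the aligned result. */
struct toy_range {
	uint64_t offset;
	uint64_t len;
};

static struct toy_range toy_align_range(uint64_t offset, uint64_t len,
					uint64_t blksize)
{
	struct toy_range r;

	/* Compute the length first: it depends on the unrounded offset. */
	r.len = TOY_ROUND_UP(offset + len, blksize) -
		TOY_ROUND_DOWN(offset, blksize);
	r.offset = TOY_ROUND_DOWN(offset, blksize);
	return r;
}
```

The order of the two assignments matters in the same way as in the patch: rounding the offset down before computing the length would shrink the covered range.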
* Re: [PATCH v10] xfs: add FALLOC_FL_WRITE_ZEROES to XFS code base
From: Christoph Hellwig @ 2026-02-25 17:52 UTC
To: Lukas Herbolt; +Cc: linux-xfs, djwong, cem, hch
> + } else {
> + if (xfs_falloc_force_zero(ip, ac)) {
> + error = xfs_zero_range(ip, offset, len, ac, NULL);
> + goto set_filesize;
> + } else {
No need for an else after a goto.
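A rough illustration of the control-flow shape this comment points at, as a userspace toy rather than kernel code. Each TOY_* bit only records which step would run; error handling and the -EOPNOTSUPP checks from the patch are left out, and every name below is hypothetical.

```c
/* Toy markers for the steps taken; these are not kernel symbols. */
enum toy_step {
	TOY_ZERO_RANGE     = 1,	/* stands in for xfs_zero_range() */
	TOY_FREE_SPACE     = 2,	/* stands in for xfs_free_file_space() */
	TOY_ALLOC_ZERO     = 4,	/* alloc with XFS_BMAPI_ZERO */
	TOY_ALLOC_PREALLOC = 8,	/* alloc with XFS_BMAPI_PREALLOC */
	TOY_SET_SIZE       = 16	/* stands in for xfs_falloc_setsize() */
};

static int toy_zero_range_steps(int write_zeroes, int force_zero)
{
	int steps = 0;
	int alloc_step;

	if (write_zeroes) {
		alloc_step = TOY_ALLOC_ZERO;
	} else {
		if (force_zero) {
			steps |= TOY_ZERO_RANGE;
			goto set_filesize;
		}
		/* No else needed: the goto above already left this path. */
		steps |= TOY_FREE_SPACE;
		alloc_step = TOY_ALLOC_PREALLOC;
	}

	steps |= alloc_step;

set_filesize:
	steps |= TOY_SET_SIZE;
	return steps;
}
```

Dropping the else removes one level of nesting without changing which steps run for any combination of inputs.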
* Re: [PATCH v10] xfs: add FALLOC_FL_WRITE_ZEROES to XFS code base
From: Pankaj Raghav (Samsung) @ 2026-02-26 14:44 UTC
To: Lukas Herbolt; +Cc: linux-xfs, djwong, cem, hch, p.raghav, pankaj.raghav
On Wed, Feb 25, 2026 at 09:39:33AM +0100, Lukas Herbolt wrote:
> Add support for FALLOC_FL_WRITE_ZEROES if the underlying device enables
> the unmap write zeroes operation.
>
Hi Lukas,
I independently started implementing this feature as well. I ran a test case
on your patch and it triggered a warning in iomap_zero_range().
iomap_zero_range() has a check for folios outside EOF, and it is being
called as part of setsize, i.e., before we change the size of the file.
I think we need to do a PREALLOC and then do an XFS_BMAPI_ZERO with
XFS_BMAPI_CONVERT. Or I don't know if we should change the warning in
iomap_zero_range().
Doing unwritten extents first and then converting them to written with
zeroes is what ext4 does as well. Maybe it is better this way because we
can quickly allocate the blocks and return while holding the aglocks and
then do the actual write. I guess someone more experienced with XFS
can comment on that.
I can send what I have and I will CC you in the series.
This is the warning I get when I test your patch:
WARNING:
[ 112.551102] WARNING: fs/iomap/buffered-io.c:1525 at iomap_zero_range+0x42d/0x7b0, CPU#2: write_zeroes/411
[ 112.560073] RIP: 0010:iomap_zero_range+0x42d/0x7b0
[ 112.593471] xfs_zero_range+0x86/0xd0 [xfs]
<snip>
[ 112.594100] xfs_setattr_size+0x5c2/0xd90 [xfs]
<snip>
[ 112.598895] xfs_falloc_setsize+0x158/0x200 [xfs]
This is the test case:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

#define TEST_SIZE (10 * 1024 * 1024)

void test_fallocate(const char *filename, int mode, const char *mode_name) {
	int fd;

	printf("Testing %s on %s...\n", mode_name, filename);

	unlink(filename);

	fd = open(filename, O_RDWR | O_CREAT, 0666);
	if (fd < 0) {
		perror("open failed");
		return;
	}

	if (fallocate(fd, mode, 0, TEST_SIZE) == 0) {
		printf(" -> fallocate(%s) succeeded!\n", mode_name);
	} else {
		printf(" -> fallocate(%s) failed: %s\n", mode_name, strerror(errno));
	}

	close(fd);

	/* Dump extent info using xfs_io */
	char cmd[256];
	snprintf(cmd, sizeof(cmd), "xfs_io -c 'bmap -vp' %s", filename);
	printf("=== Extents for %s ===\n", filename);
	system(cmd);
	printf("\n");
}

int main() {
	printf("Starting fallocate tests...\n");
	printf("------------------------------------------------\n\n");

	test_fallocate("test_zero_range.bin", FALLOC_FL_ZERO_RANGE, "FALLOC_FL_ZERO_RANGE");
	test_fallocate("test_write_zeroes.bin", FALLOC_FL_WRITE_ZEROES, "FALLOC_FL_WRITE_ZEROES");

	printf("Test complete.\n");
	return 0;
}
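
The test case above only dumps the extent map. A possible extension, sketched here under the assumption that one also wants to confirm the range reads back as zeroes, is a small read-back helper. file_range_is_zero() is a hypothetical name; it uses plain stdio so it does not depend on any XFS or fallocate behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: true if the first `size` bytes of fp are all zero. */
static bool file_range_is_zero(FILE *fp, size_t size)
{
	unsigned char buf[4096];
	size_t done = 0;

	if (fseek(fp, 0, SEEK_SET) != 0)
		return false;

	while (done < size) {
		size_t want = size - done < sizeof(buf) ? size - done : sizeof(buf);
		size_t got = fread(buf, 1, want, fp);

		if (got == 0)
			return false;	/* read error or file shorter than size */
		for (size_t i = 0; i < got; i++)
			if (buf[i] != 0)
				return false;
		done += got;
	}
	return true;
}
```

It could be called on a stream opened with fopen(filename, "rb") after the fallocate() call, passing TEST_SIZE as the size.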
--
Pankaj
* Re: [PATCH v10] xfs: add FALLOC_FL_WRITE_ZEROES to XFS code base
From: Brian Foster @ 2026-02-26 16:42 UTC
To: Pankaj Raghav (Samsung)
Cc: Lukas Herbolt, linux-xfs, djwong, cem, hch, p.raghav
On Thu, Feb 26, 2026 at 02:44:05PM +0000, Pankaj Raghav (Samsung) wrote:
> On Wed, Feb 25, 2026 at 09:39:33AM +0100, Lukas Herbolt wrote:
> > Add support for FALLOC_FL_WRITE_ZEROES if the underlying device enables
> > the unmap write zeroes operation.
> >
> Hi Lukas,
>
> I independently started implementing this feature as well. I ran a test case
> on your patches and it resulted in a warning in iomap_zero_range.
> iomap_zero_range has a check for folios outside eof, and it is being
> called as a part of setsize, i.e, before we change the size of the file.
>
> I think we need to do a PREALLOC and then do a XFS_BMAPI_ZERO with
> XFS_BMAPI_CONVERT. Or I don't know if we should change the warning in
> iomap_zero_range.
>
The warning is there because iomap_zero_range() uses buffered writes
but doesn't actually bump i_size for writes beyond EOF. Therefore, if it
ends up zeroing folios that start beyond EOF, writeback could toss
those folios if i_size hasn't been updated one way or another by the
time it occurs.
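To make the condition concrete, here is a toy predicate for "a zeroed folio that starts beyond EOF". It is only a model of the reasoning above, not the actual check in fs/iomap/buffered-io.c; the function name and the choice of >= as the boundary are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: a folio dirtied by zeroing is at risk of being dropped by
 * writeback if it starts at or past the inode size seen at that time.
 */
static bool toy_folio_at_risk(uint64_t folio_pos, uint64_t isize)
{
	return folio_pos >= isize;
}
```

In the reported test the file is newly created, so i_size is still 0 when the zeroing runs from setsize and every folio in the 10 MiB range would count as at risk in this model.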
I'd guess there are two likely scenarios that lead to this warning, but
you'd have to confirm. One is that we're unnecessarily zeroing an
unwritten range for some reason. That would probably be harmless, but
unexpected. The other would be zeroing written blocks beyond eof, which
is risky and probably something we want to avoid, but also suspicious in
that I don't think we should ever have written extents beyond eof in XFS
(but rather either delalloc or unwritten).
Brian
* Re: [PATCH v10] xfs: add FALLOC_FL_WRITE_ZEROES to XFS code base
From: Pankaj Raghav (Samsung) @ 2026-02-27 12:05 UTC
To: Brian Foster; +Cc: Lukas Herbolt, linux-xfs, djwong, cem, hch, p.raghav
> > Hi Lukas,
> >
> > I independently started implementing this feature as well. I ran a test case
> > on your patches and it resulted in a warning in iomap_zero_range.
> > iomap_zero_range has a check for folios outside eof, and it is being
> > called as a part of setsize, i.e, before we change the size of the file.
> >
> > I think we need to do a PREALLOC and then do a XFS_BMAPI_ZERO with
> > XFS_BMAPI_CONVERT. Or I don't know if we should change the warning in
> > iomap_zero_range.
> >
>
> The reason the warning is there is because iomap_zero_range() uses
> buffered writes but doesn't actually bump i_size for writes beyond eof.
> Therefore if it ends up zeroing folios that start beyond eof, writeback
> would potentially toss those folios if i_size wasn't updated somehow or
> another by the time it occurs..
>
> I'd guess there are two likely scenarios that lead to this warning, but
> you'd have to confirm. One is that we're unnecessarily zeroing an
> unwritten range for some reason. That would probably be harmless, but
> unexpected. The other would be zeroing written blocks beyond eof, which
> is risky and probably something we want to avoid, but also suspicious in
> that I don't think we should ever have written extents beyond eof in XFS
> (but rather either delalloc or unwritten).
>
Thanks for the reply, Brian. The issue is the latter.
As part of the new FALLOC_FL_WRITE_ZEROES flag, we do want to
physically zero beyond EOF before we update the size of the file as
part of fallocate. But we will zero the blocks by talking directly to the
block device. As you say, we should only have unwritten extents beyond
EOF, so we can do something similar to what ext4 does: allocate unwritten
extents first, increase the size of the file, then zero out those extents
using XFS_BMAPI_CONVERT with XFS_BMAPI_ZERO.
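The ordering argument can be checked with a toy extent model. Everything below is a hypothetical userspace sketch: blocks are HOLE, UNWRITTEN or WRITTEN, the file size is counted in blocks, and the invariant tested is the one Brian mentions, namely no written blocks beyond EOF. It does not model real XFS allocation or locking.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_ext { TOY_HOLE, TOY_UNWRITTEN, TOY_WRITTEN };

#define TOY_NBLOCKS 8

struct toy_file {
	size_t isize;			/* file size, in blocks */
	enum toy_ext blk[TOY_NBLOCKS];
};

/* Invariant: no written block at or beyond EOF. */
static bool toy_no_written_past_eof(const struct toy_file *f)
{
	for (size_t i = f->isize; i < TOY_NBLOCKS; i++)
		if (f->blk[i] == TOY_WRITTEN)
			return false;
	return true;
}

static void toy_set_range(struct toy_file *f, size_t n, enum toy_ext st)
{
	for (size_t i = 0; i < n && i < TOY_NBLOCKS; i++)
		f->blk[i] = st;
}

/* Written, zeroed blocks first; size update last. */
static bool toy_order_written_first(size_t n)
{
	struct toy_file f = { 0 };
	bool ok = true;

	toy_set_range(&f, n, TOY_WRITTEN);
	ok &= toy_no_written_past_eof(&f);	/* violated here */
	f.isize = n;
	ok &= toy_no_written_past_eof(&f);
	return ok;
}

/* Unwritten blocks, then size update, then convert to written. */
static bool toy_order_unwritten_first(size_t n)
{
	struct toy_file f = { 0 };
	bool ok = true;

	toy_set_range(&f, n, TOY_UNWRITTEN);
	ok &= toy_no_written_past_eof(&f);
	f.isize = n;
	ok &= toy_no_written_past_eof(&f);
	toy_set_range(&f, n, TOY_WRITTEN);
	ok &= toy_no_written_past_eof(&f);
	return ok;
}
```

In this model only the second ordering keeps the invariant at every step, which matches the sequence proposed above.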
I will send the patches soon.
--
Pankaj