[PATCH 4/4] mkfs: fix log sunit automatic configuration

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 4/4] mkfs: fix log sunit automatic configuration
  2026-03-05  4:24 [PATCHSET] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
@ 2026-03-05  4:24 ` Darrick J. Wong
  2026-03-05 14:05   ` Christoph Hellwig
  2026-03-05 22:56   ` Darrick J. Wong
  0 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-05  4:24 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This patch fixes ~96% failure rates on fstests on my test fleet, some
of which now have 4k LBA disks with unexciting min/opt io geometry:

# lsblk -t /dev/sda
NAME   ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE   RA WSAME
sda            0   4096 1048576    4096     512    1 bfq       256 2048    0B
# mkfs.xfs -f -N /dev/sda3
meta-data=/dev/sda3              isize=512    agcount=4, agsize=2183680 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=1   metadir=0
data     =                       bsize=4096   blocks=8734720, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
         =                       zoned=0      start=0 reserved=0

Note that MIN-IO == PHY-SEC, so dsunit/dswidth are zero.  With this
change, we no longer set the lsunit to the fsblock size if the log
sector size is greater than 512.  Unfortunately, dsunit is also not set,
so mkfs never sets the log sunit and it remains zero.  I think
this causes problems with the log roundoff computation in the kernel:

	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)
		log->l_iclog_roundoff = mp->m_sb.sb_logsunit;
	else
		log->l_iclog_roundoff = BBSIZE;

because now the roundoff factor is less than the log sector size.  After
a while, the filesystem cannot be mounted anymore because:

XFS (sda3): Mounting V5 Filesystem 81b8ffa8-383b-4574-a68c-9b8202707a26
XFS (sda3): Corruption warning: Metadata has LSN (4:2729) ahead of current LSN (4:2727). Please unmount and run xfs_repair (>= v4.3) to resolve.
XFS (sda3): log mount/recovery failed: error -22
XFS (sda3): log mount failed

Reverting this patch makes the problem go away, but I think you're
trying to make it so that mkfs will set lsunit = dsunit if dsunit>0 and
the caller didn't specify any -lsunit= parameter, right?

But there's something that just seems off with this whole function.  If
the user provided a -lsunit/-lsu option then we need to validate the
value and either use it if it makes sense, or complain if not.  If the
user didn't specify any option, then we should figure it out
automatically from the other data device geometry options (internal) or
the external log device probing.

But that's not what this function does.  Why would you do this:

	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
		lsu = cfg->blocksize; /* lsunit matches filesystem block size */

and then loudly validate that lsu (bytes) is congruent with the fsblock
size?  This is trivially true, but then it disables the "make lsunit use
dsunit if set" logic below:

	} else if (cfg->sb_feat.log_version == 2 &&
		   cfg->loginternal && cfg->dsunit) {
		/* lsunit and dsunit now in fs blocks */
		cfg->lsunit = cfg->dsunit;
	}

AFAICT, the "lsunit matches fs block size" logic is buggy.  This code
was added with no justification as part of a "reworking" commit
2f44b1b0e5adc4 ("mkfs: rework stripe calculations") back in 2017.  I
think the correct logic is to move the "lsunit matches fs block size"
logic to the no-lsunit-option code after the validation code.

This seems to set sb_logsunit to 4096 on my test VM, to 0 on the even
more boring VMs with 512 physical sectors, and to 262144 with the
scsi_debug device that Lukas Herbolt created with:

# modprobe scsi_debug inq_vendor=XFS_TEST physblk_exp=3 sector_size=512 \
opt_xferlen_exp=9 opt_blks=512 dev_size_mb=100 virtual_gb=1000

Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: 2f44b1b0e5adc4 ("mkfs: rework stripe calculations")
Fixes: ca1eb448e116da ("mkfs.xfs fix sunit size on 512e and 4kN disks.")
Link: https://lore.kernel.org/linux-xfs/20250926123829.2101207-2-lukas@herbolt.com/
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 mkfs/xfs_mkfs.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index ece20905b28313..a45859dd633e98 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3624,10 +3624,8 @@ _("%s: Stripe unit(%d) or stripe width(%d) is not a multiple of the block size(%
 		lsunit = cli->lsunit;
 	else if (cli_opt_set(&lopts, L_SU))
 		lsu = getnum(cli->lsu, &lopts, L_SU);
-	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
-		lsu = cfg->blocksize; /* lsunit matches filesystem block size */

-	if (cli->lsu) {
+	if (lsu) {
 		/* verify if lsu is a multiple block size */
 		if (lsu % cfg->blocksize != 0) {
 			fprintf(stderr,
@@ -3651,10 +3649,14 @@ _("log stripe unit (%d) must be a multiple of the block size (%d)\n"),
 	if (lsunit) {
 		/* convert from 512 byte blocks to fs blocks */
 		cfg->lsunit = DTOBT(lsunit, cfg->blocklog);
-	} else if (cfg->sb_feat.log_version == 2 &&
-		   cfg->loginternal && cfg->dsunit) {
-		/* lsunit and dsunit now in fs blocks */
-		cfg->lsunit = cfg->dsunit;
+	} else if (cfg->sb_feat.log_version == 2 && cfg->loginternal) {
+		if (cfg->dsunit) {
+			/* lsunit and dsunit now in fs blocks */
+			cfg->lsunit = cfg->dsunit;
+		} else if (cfg->lsectorsize > XLOG_HEADER_SIZE) {
+			/* lsunit matches filesystem block size */
+			cfg->lsunit = 1;
+		}
 	} else if (cfg->sb_feat.log_version == 2 &&
 		   !cfg->loginternal) {
 		/* use the external log device properties */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/4] mkfs: fix log sunit automatic configuration
  2026-03-05  4:24 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
@ 2026-03-05 14:05   ` Christoph Hellwig
  2026-03-05 22:56   ` Darrick J. Wong
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2026-03-05 14:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/4] mkfs: fix log sunit automatic configuration
  2026-03-05  4:24 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
  2026-03-05 14:05   ` Christoph Hellwig
@ 2026-03-05 22:56   ` Darrick J. Wong
  1 sibling, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-05 22:56 UTC (permalink / raw)
  To: aalbersh; +Cc: linux-xfs

On Wed, Mar 04, 2026 at 08:24:55PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This patch fixes ~96% failure rates on fstests on my test fleet, some
> of which now have 4k LBA disks with unexciting min/opt io geometry:
> 
> # lsblk -t /dev/sda
> NAME   ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE   RA WSAME
> sda            0   4096 1048576    4096     512    1 bfq       256 2048    0B
> # mkfs.xfs -f -N /dev/sda3
> meta-data=/dev/sda3              isize=512    agcount=4, agsize=2183680 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
>          =                       exchange=1   metadir=0
> data     =                       bsize=4096   blocks=8734720, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
> log      =internal log           bsize=4096   blocks=16384, version=2
>          =                       sectsz=4096  sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>          =                       rgcount=0    rgsize=0 extents
>          =                       zoned=0      start=0 reserved=0
> 
> Note that MIN-IO == PHY-SEC, so dsunit/dswidth are zero.  With this
> change, we no longer set the lsunit to the fsblock size if the log
> sector size is greater than 512.  Unfortunately, dsunit is also not set,
> so mkfs never sets the log sunit and it remains zero.  I think
> this causes problems with the log roundoff computation in the kernel:
> 
> 	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)
> 		log->l_iclog_roundoff = mp->m_sb.sb_logsunit;
> 	else
> 		log->l_iclog_roundoff = BBSIZE;
> 
> because now the roundoff factor is less than the log sector size.  After
> a while, the filesystem cannot be mounted anymore because:
> 
> XFS (sda3): Mounting V5 Filesystem 81b8ffa8-383b-4574-a68c-9b8202707a26
> XFS (sda3): Corruption warning: Metadata has LSN (4:2729) ahead of current LSN (4:2727). Please unmount and run xfs_repair (>= v4.3) to resolve.
> XFS (sda3): log mount/recovery failed: error -22
> XFS (sda3): log mount failed
> 
> Reverting this patch makes the problem go away, but I think you're
> trying to make it so that mkfs will set lsunit = dsunit if dsunit>0 and
> the caller didn't specify any -lsunit= parameter, right?
> 
> But there's something that just seems off with this whole function.  If
> the user provided a -lsunit/-lsu option then we need to validate the
> value and either use it if it makes sense, or complain if not.  If the
> user didn't specify any option, then we should figure it out
> automatically from the other data device geometry options (internal) or
> the external log device probing.
> 
> But that's not what this function does.  Why would you do this:
> 
> 	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
> 		lsu = cfg->blocksize; /* lsunit matches filesystem block size */
> 
> and then loudly validate that lsu (bytes) is congruent with the fsblock
> size?  This is trivially true, but then it disables the "make lsunit use
> dsunit if set" logic below:
> 
> 	} else if (cfg->sb_feat.log_version == 2 &&
> 		   cfg->loginternal && cfg->dsunit) {
> 		/* lsunit and dsunit now in fs blocks */
> 		cfg->lsunit = cfg->dsunit;
> 	}
> 
> AFAICT, the "lsunit matches fs block size" logic is buggy.  This code
> was added with no justification as part of a "reworking" commit
> 2f44b1b0e5adc4 ("mkfs: rework stripe calculations") back in 2017.  I
> think the correct logic is to move the "lsunit matches fs block size"
> logic to the no-lsunit-option code after the validation code.
> 
> This seems to set sb_logsunit to 4096 on my test VM, to 0 on the even
> more boring VMs with 512 physical sectors, and to 262144 with the
> scsi_debug device that Lukas Herbolt created with:
> 
> # modprobe scsi_debug inq_vendor=XFS_TEST physblk_exp=3 sector_size=512 \
> opt_xferlen_exp=9 opt_blks=512 dev_size_mb=100 virtual_gb=1000
> 
> Cc: <linux-xfs@vger.kernel.org> # v4.15.0
> Fixes: 2f44b1b0e5adc4 ("mkfs: rework stripe calculations")
> Fixes: ca1eb448e116da ("mkfs.xfs fix sunit size on 512e and 4kN disks.")
> Link: https://lore.kernel.org/linux-xfs/20250926123829.2101207-2-lukas@herbolt.com/
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  mkfs/xfs_mkfs.c |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index ece20905b28313..a45859dd633e98 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -3624,10 +3624,8 @@ _("%s: Stripe unit(%d) or stripe width(%d) is not a multiple of the block size(%
>  		lsunit = cli->lsunit;
>  	else if (cli_opt_set(&lopts, L_SU))
>  		lsu = getnum(cli->lsu, &lopts, L_SU);
> -	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
> -		lsu = cfg->blocksize; /* lsunit matches filesystem block size */
>  
> -	if (cli->lsu) {
> +	if (lsu) {
>  		/* verify if lsu is a multiple block size */
>  		if (lsu % cfg->blocksize != 0) {
>  			fprintf(stderr,
> @@ -3651,10 +3649,14 @@ _("log stripe unit (%d) must be a multiple of the block size (%d)\n"),
>  	if (lsunit) {
>  		/* convert from 512 byte blocks to fs blocks */
>  		cfg->lsunit = DTOBT(lsunit, cfg->blocklog);
> -	} else if (cfg->sb_feat.log_version == 2 &&
> -		   cfg->loginternal && cfg->dsunit) {
> -		/* lsunit and dsunit now in fs blocks */
> -		cfg->lsunit = cfg->dsunit;
> +	} else if (cfg->sb_feat.log_version == 2 && cfg->loginternal) {
> +		if (cfg->dsunit) {
> +			/* lsunit and dsunit now in fs blocks */
> +			cfg->lsunit = cfg->dsunit;
> +		} else if (cfg->lsectorsize > XLOG_HEADER_SIZE) {
> +			/* lsunit matches filesystem block size */
> +			cfg->lsunit = 1;
> +		}
>  	} else if (cfg->sb_feat.log_version == 2 &&
>  		   !cfg->loginternal) {
>  		/* use the external log device properties */

There's a bug in this patch; the lsectorsize > XLOG_HEADER_SIZE also
needs to be copied downwards to the external log configuration part.

--D

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCHSET v2] xfsprogs: various bug fixes for 6.19
@ 2026-03-10  3:24 Darrick J. Wong
  2026-03-10  3:24 ` [PATCH 1/4] misc: fix a few memory leaks Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-10  3:24 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, hch, linux-xfs

Hi all,

This is miscellaneous bugfixes.

v2: fix the external log sunit handling

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

Unreviewed patches in this series:
[PATCHSET v2] xfsprogs: various bug fixes for 6.19
  [PATCH 4/4] mkfs: fix log sunit automatic configuration

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
---
Commits in this patchset:
 * misc: fix a few memory leaks
 * libxfs: fix data corruption bug in libxfs_file_write
 * mkfs: fix protofile data corruption when in/out file block sizes don't match
 * mkfs: fix log sunit automatic configuration
---
 libxfs/util.c   |    5 +++--
 mkfs/proto.c    |   28 ++++++++++++++++++----------
 mkfs/xfs_mkfs.c |   29 ++++++++++++++++++-----------
 3 files changed, 39 insertions(+), 23 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] misc: fix a few memory leaks
  2026-03-10  3:24 [PATCHSET v2] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
@ 2026-03-10  3:24 ` Darrick J. Wong
  2026-03-10  3:25 ` [PATCH 2/4] libxfs: fix data corruption bug in libxfs_file_write Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-10  3:24 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

valgrind caught these while I was trying to debug xfs/841 regression so
fix them.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/proto.c    |   17 +++++++----------
 mkfs/xfs_mkfs.c |    1 +
 2 files changed, 8 insertions(+), 10 deletions(-)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index 1a7b3586581278..3241a066f72951 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -1412,19 +1412,19 @@ handle_hardlink(
 	struct xfs_trans	*tp;
 	struct xfs_parent_args	*ppargs = NULL;
 
-	tp = getres(mp, 0);
-	ppargs = newpptr(mp);
-	dst_ino = get_hardlink_dst_inode(file_stat.st_ino);
-
 	/*
 	 * We didn't find the hardlink inode, this means it's the first time
 	 * we see it, report error so create_nondir_inode() can continue handling the
 	 * inode as a regular file type, and later save the source inode in our
 	 * buffer for future consumption.
 	 */
+	dst_ino = get_hardlink_dst_inode(file_stat.st_ino);
 	if (dst_ino == 0)
 		return false;
 
+	tp = getres(mp, 0);
+	ppargs = newpptr(mp);
+
 	error = -libxfs_iget(mp, NULL, dst_ino, 0, &ip);
 	if (error)
 		fail(_("failed to get inode"), error);
@@ -1546,12 +1546,7 @@ create_nondir_inode(
 		close(fd);
 		return;
 	}
-	/*
-	 * If instead we have an error it means the hardlink was not registered,
-	 * so we proceed to treat it like a regular file, and save it to our
-	 * tracker later.
-	 */
-	tp = getres(mp, 0);
+
 	/*
 	 * In case of symlinks, we need to handle things a little differently.
 	 * We need to read out our link target and act accordingly.
@@ -1563,6 +1558,8 @@ create_nondir_inode(
 		if (link_len >= PATH_MAX)
 			fail(_("symlink target too long"), ENAMETOOLONG);
 		tp = getres(mp, XFS_B_TO_FSB(mp, link_len));
+	} else {
+		tp = getres(mp, 0);
 	}
 	ppargs = newpptr(mp);
 
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index a7a6fde9327797..ece20905b28313 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -5824,6 +5824,7 @@ set_autofsck(
 	}
 
 	libxfs_irele(args.dp);
+	free(p);
 }
 
 /* Write the realtime superblock */


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] libxfs: fix data corruption bug in libxfs_file_write
  2026-03-10  3:24 [PATCHSET v2] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
  2026-03-10  3:24 ` [PATCH 1/4] misc: fix a few memory leaks Darrick J. Wong
@ 2026-03-10  3:25 ` Darrick J. Wong
  2026-03-10  3:25 ` [PATCH 3/4] mkfs: fix protofile data corruption when in/out file block sizes don't match Darrick J. Wong
  2026-03-10  3:25 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
  3 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-10  3:25 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

libxfs_file_write tries to initialize the entire file block buffer,
which includes zeroing the head portion if @pos is not aligned to the
filesystem block size.  However, @buf is the file data to copy in at
position @pos, not the position of the file block.  Therefore, block_off
should be added to b_addr, not buf.

Cc: <linux-xfs@vger.kernel.org> # v6.13.0
Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/util.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/libxfs/util.c b/libxfs/util.c
index 8dba3ef0c66139..2b1c32efc4709a 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -650,7 +650,8 @@ get_random_u32(void)
 
 /*
  * Write a buffer to a file on the data device.  There must not be sparse holes
- * or unwritten extents.
+ * or unwritten extents, and the blocks underneath the file range will be
+ * completely overwritten.
  */
 int
 libxfs_file_write(
@@ -697,7 +698,7 @@ libxfs_file_write(
 		if (block_off > 0)
 			memset((char *)bp->b_addr, 0, block_off);
 		count = min(len, XFS_FSB_TO_B(mp, map.br_blockcount));
-		memmove(bp->b_addr, buf + block_off, count);
+		memmove(bp->b_addr + block_off, buf, count);
 		bcount = BBTOB(bp->b_length);
 		if (count < bcount)
 			memset((char *)bp->b_addr + block_off + count, 0,


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] mkfs: fix protofile data corruption when in/out file block sizes don't match
  2026-03-10  3:24 [PATCHSET v2] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
  2026-03-10  3:24 ` [PATCH 1/4] misc: fix a few memory leaks Darrick J. Wong
  2026-03-10  3:25 ` [PATCH 2/4] libxfs: fix data corruption bug in libxfs_file_write Darrick J. Wong
@ 2026-03-10  3:25 ` Darrick J. Wong
  2026-03-10  3:25 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
  3 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-10  3:25 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

As written in 73fb78e5ee8940, if libxfs_file_write is passed an
unaligned file range to write, it will zero the unaligned regions at the
head and tail of the block.  This is what we want for a newly allocated
(and hence unwritten) block, but this is definitely not what we want
if some other part of the block has already been written.

Fix this by extending the data/hole_pos range to be aligned to the block
size of the new filesystem.  This means we read slightly more, but we
never rewrite blocks in the new filesystem, sidestepping the behavior.

Found by xfs/841 when the test filesystem has a 1k fsblock size.

Cc: <linux-xfs@vger.kernel.org> # v6.13.0
Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/proto.c |   11 +++++++++++
 1 file changed, 11 insertions(+)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index 3241a066f72951..a460aebd95ae1e 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -374,6 +374,17 @@ writefile(
 			break;
 		}
 
+		/*
+		 * If we pass an unaligned range to libxfs_file_write, it will
+		 * zero the unaligned head and tail parts of each block.  If
+		 * the fd filesystem has a smaller blocksize, then we can end
+		 * up writing to the same block twice, causing unwanted zeroing
+		 * and hence data corruption.
+		 */
+		data_pos = rounddown_64(data_pos, mp->m_sb.sb_blocksize);
+		hole_pos = min(roundup_64(hole_pos, mp->m_sb.sb_blocksize),
+			       statbuf.st_size);
+
 		writefile_range(ip, fname, fd, data_pos, hole_pos - data_pos);
 		data_pos = lseek(fd, hole_pos, SEEK_DATA);
 	}


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] mkfs: fix log sunit automatic configuration
  2026-03-10  3:24 [PATCHSET v2] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2026-03-10  3:25 ` [PATCH 3/4] mkfs: fix protofile data corruption when in/out file block sizes don't match Darrick J. Wong
@ 2026-03-10  3:25 ` Darrick J. Wong
  2026-03-10  6:43   ` Christoph Hellwig
  3 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-10  3:25 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This patch fixes ~96% failure rates on fstests on my test fleet, some
of which now have 4k LBA disks with unexciting min/opt io geometry:

# lsblk -t /dev/sda
NAME   ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE   RA WSAME
sda            0   4096 1048576    4096     512    1 bfq       256 2048    0B
# mkfs.xfs -f -N /dev/sda3
meta-data=/dev/sda3              isize=512    agcount=4, agsize=2183680 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=1   metadir=0
data     =                       bsize=4096   blocks=8734720, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
         =                       zoned=0      start=0 reserved=0

Note that MIN-IO == PHY-SEC, so dsunit/dswidth are zero.  With this
change, we no longer set the lsunit to the fsblock size if the log
sector size is greater than 512.  Unfortunately, dsunit is also not set,
so mkfs never sets the log sunit and it remains zero.  I think
this causes problems with the log roundoff computation in the kernel:

	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)
		log->l_iclog_roundoff = mp->m_sb.sb_logsunit;
	else
		log->l_iclog_roundoff = BBSIZE;

because now the roundoff factor is less than the log sector size.  After
a while, the filesystem cannot be mounted anymore because:

XFS (sda3): Mounting V5 Filesystem 81b8ffa8-383b-4574-a68c-9b8202707a26
XFS (sda3): Corruption warning: Metadata has LSN (4:2729) ahead of current LSN (4:2727). Please unmount and run xfs_repair (>= v4.3) to resolve.
XFS (sda3): log mount/recovery failed: error -22
XFS (sda3): log mount failed

Reverting this patch makes the problem go away, but I think you're
trying to make it so that mkfs will set lsunit = dsunit if dsunit>0 and
the caller didn't specify any -lsunit= parameter, right?

But there's something that just seems off with this whole function.  If
the user provided a -lsunit/-lsu option then we need to validate the
value and either use it if it makes sense, or complain if not.  If the
user didn't specify any option, then we should figure it out
automatically from the other data device geometry options (internal) or
the external log device probing.

But that's not what this function does.  Why would you do this:

	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
		lsu = cfg->blocksize; /* lsunit matches filesystem block size */

and then loudly validate that lsu (bytes) is congruent with the fsblock
size?  This is trivially true, but then it disables the "make lsunit use
dsunit if set" logic below:

	} else if (cfg->sb_feat.log_version == 2 &&
		   cfg->loginternal && cfg->dsunit) {
		/* lsunit and dsunit now in fs blocks */
		cfg->lsunit = cfg->dsunit;
	}

AFAICT, the "lsunit matches fs block size" logic is buggy.  This code
was added with no justification as part of a "reworking" commit
2f44b1b0e5adc4 ("mkfs: rework stripe calculations") back in 2017.  I
think the correct logic is to move the "lsunit matches fs block size"
logic to the no-lsunit-option code after the validation code.

This seems to set sb_logsunit to 4096 on my test VM, to 0 on the even
more boring VMs with 512 physical sectors, and to 262144 with the
scsi_debug device that Lukas Herbolt created with:

# modprobe scsi_debug inq_vendor=XFS_TEST physblk_exp=3 sector_size=512 \
opt_xferlen_exp=9 opt_blks=512 dev_size_mb=100 virtual_gb=1000

We also need to adjust the external log lsunit computation code, which
was omitted from the first version of this patch.

Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: 2f44b1b0e5adc4 ("mkfs: rework stripe calculations")
Fixes: ca1eb448e116da ("mkfs.xfs fix sunit size on 512e and 4kN disks.")
Link: https://lore.kernel.org/linux-xfs/20250926123829.2101207-2-lukas@herbolt.com/
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 mkfs/xfs_mkfs.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index ece20905b28313..527a662f3ac858 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3624,10 +3624,8 @@ _("%s: Stripe unit(%d) or stripe width(%d) is not a multiple of the block size(%
 		lsunit = cli->lsunit;
 	else if (cli_opt_set(&lopts, L_SU))
 		lsu = getnum(cli->lsu, &lopts, L_SU);
-	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
-		lsu = cfg->blocksize; /* lsunit matches filesystem block size */

-	if (cli->lsu) {
+	if (lsu) {
 		/* verify if lsu is a multiple block size */
 		if (lsu % cfg->blocksize != 0) {
 			fprintf(stderr,
@@ -3651,14 +3649,22 @@ _("log stripe unit (%d) must be a multiple of the block size (%d)\n"),
 	if (lsunit) {
 		/* convert from 512 byte blocks to fs blocks */
 		cfg->lsunit = DTOBT(lsunit, cfg->blocklog);
-	} else if (cfg->sb_feat.log_version == 2 &&
-		   cfg->loginternal && cfg->dsunit) {
-		/* lsunit and dsunit now in fs blocks */
-		cfg->lsunit = cfg->dsunit;
-	} else if (cfg->sb_feat.log_version == 2 &&
-		   !cfg->loginternal) {
-		/* use the external log device properties */
-		cfg->lsunit = DTOBT(ft->log.sunit, cfg->blocklog);
+	} else if (cfg->sb_feat.log_version == 2 && cfg->loginternal) {
+		if (cfg->dsunit) {
+			/* lsunit and dsunit now in fs blocks */
+			cfg->lsunit = cfg->dsunit;
+		} else if (cfg->lsectorsize > XLOG_HEADER_SIZE) {
+			/* lsunit matches filesystem block size */
+			cfg->lsunit = 1;
+		}
+	} else if (cfg->sb_feat.log_version == 2 && !cfg->loginternal) {
+		if (ft->log.sunit > 0) {
+			/* use the external log device properties */
+			cfg->lsunit = DTOBT(ft->log.sunit, cfg->blocklog);
+		} else if (cfg->lsectorsize > XLOG_HEADER_SIZE) {
+			/* lsunit matches filesystem block size */
+			cfg->lsunit = 1;
+		}
 	}

 	if (cfg->sb_feat.log_version == 2 &&

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/4] mkfs: fix log sunit automatic configuration
  2026-03-10  3:25 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
@ 2026-03-10  6:43   ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2026-03-10  6:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-03-10  6:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-10  3:24 [PATCHSET v2] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
2026-03-10  3:24 ` [PATCH 1/4] misc: fix a few memory leaks Darrick J. Wong
2026-03-10  3:25 ` [PATCH 2/4] libxfs: fix data corruption bug in libxfs_file_write Darrick J. Wong
2026-03-10  3:25 ` [PATCH 3/4] mkfs: fix protofile data corruption when in/out file block sizes don't match Darrick J. Wong
2026-03-10  3:25 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
2026-03-10  6:43   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2026-03-05  4:24 [PATCHSET] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
2026-03-05  4:24 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
2026-03-05 14:05   ` Christoph Hellwig
2026-03-05 22:56   ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox