public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: aalbersh@kernel.org
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/4] mkfs: fix log sunit automatic configuration
Date: Thu, 5 Mar 2026 14:56:54 -0800	[thread overview]
Message-ID: <20260305225654.GK57948@frogsfrogsfrogs> (raw)
In-Reply-To: <177268457083.1999857.7479249726865742847.stgit@frogsfrogsfrogs>

On Wed, Mar 04, 2026 at 08:24:55PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This patch fixes ~96% failure rates on fstests on my test fleet, some
> of which now have 4k LBA disks with unexciting min/opt io geometry:
> 
> # lsblk -t /dev/sda
> NAME   ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE   RA WSAME
> sda            0   4096 1048576    4096     512    1 bfq       256 2048    0B
> # mkfs.xfs -f -N /dev/sda3
> meta-data=/dev/sda3              isize=512    agcount=4, agsize=2183680 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
>          =                       exchange=1   metadir=0
> data     =                       bsize=4096   blocks=8734720, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
> log      =internal log           bsize=4096   blocks=16384, version=2
>          =                       sectsz=4096  sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>          =                       rgcount=0    rgsize=0 extents
>          =                       zoned=0      start=0 reserved=0
> 
> Note that MIN-IO == PHY-SEC, so dsunit/dswidth are zero.  With this
> change, we no longer set the lsunit to the fsblock size if the log
> sector size is greater than 512.  Unfortunately, dsunit is also not set,
> so mkfs never sets the log sunit and it remains zero.  I think
> this causes problems with the log roundoff computation in the kernel:
> 
> 	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)
> 		log->l_iclog_roundoff = mp->m_sb.sb_logsunit;
> 	else
> 		log->l_iclog_roundoff = BBSIZE;
> 
> because now the roundoff factor is less than the log sector size.  After
> a while, the filesystem cannot be mounted anymore because:
> 
> XFS (sda3): Mounting V5 Filesystem 81b8ffa8-383b-4574-a68c-9b8202707a26
> XFS (sda3): Corruption warning: Metadata has LSN (4:2729) ahead of current LSN (4:2727). Please unmount and run xfs_repair (>= v4.3) to resolve.
> XFS (sda3): log mount/recovery failed: error -22
> XFS (sda3): log mount failed
> 
> Reverting this patch makes the problem go away, but I think you're
> trying to make it so that mkfs will set lsunit = dsunit if dsunit>0 and
> the caller didn't specify any -lsunit= parameter, right?
> 
> But there's something that just seems off with this whole function.  If
> the user provided a -lsunit/-lsu option then we need to validate the
> value and either use it if it makes sense, or complain if not.  If the
> user didn't specify any option, then we should figure it out
> automatically from the other data device geometry options (internal) or
> the external log device probing.
> 
> But that's not what this function does.  Why would you do this:
> 
> 	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
> 		lsu = cfg->blocksize; /* lsunit matches filesystem block size */
> 
> and then loudly validate that lsu (bytes) is congruent with the fsblock
> size?  This is trivially true, but then it disables the "make lsunit use
> dsunit if set" logic below:
> 
> 	} else if (cfg->sb_feat.log_version == 2 &&
> 		   cfg->loginternal && cfg->dsunit) {
> 		/* lsunit and dsunit now in fs blocks */
> 		cfg->lsunit = cfg->dsunit;
> 	}
> 
> AFAICT, the "lsunit matches fs block size" logic is buggy.  This code
> was added with no justification as part of a "reworking" commit
> 2f44b1b0e5adc4 ("mkfs: rework stripe calculations") back in 2017.  I
> think the correct logic is to move the "lsunit matches fs block size"
> logic to the no-lsunit-option code after the validation code.
> 
> This seems to set sb_logsunit to 4096 on my test VM, to 0 on the even
> more boring VMs with 512 physical sectors, and to 262144 with the
> scsi_debug device that Lukas Herbolt created with:
> 
> # modprobe scsi_debug inq_vendor=XFS_TEST physblk_exp=3 sector_size=512 \
> opt_xferlen_exp=9 opt_blks=512 dev_size_mb=100 virtual_gb=1000
> 
> Cc: <linux-xfs@vger.kernel.org> # v4.15.0
> Fixes: 2f44b1b0e5adc4 ("mkfs: rework stripe calculations")
> Fixes: ca1eb448e116da ("mkfs.xfs fix sunit size on 512e and 4kN disks.")
> Link: https://lore.kernel.org/linux-xfs/20250926123829.2101207-2-lukas@herbolt.com/
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  mkfs/xfs_mkfs.c |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index ece20905b28313..a45859dd633e98 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -3624,10 +3624,8 @@ _("%s: Stripe unit(%d) or stripe width(%d) is not a multiple of the block size(%
>  		lsunit = cli->lsunit;
>  	else if (cli_opt_set(&lopts, L_SU))
>  		lsu = getnum(cli->lsu, &lopts, L_SU);
> -	else if (cfg->lsectorsize > XLOG_HEADER_SIZE)
> -		lsu = cfg->blocksize; /* lsunit matches filesystem block size */
>  
> -	if (cli->lsu) {
> +	if (lsu) {
>  		/* verify if lsu is a multiple block size */
>  		if (lsu % cfg->blocksize != 0) {
>  			fprintf(stderr,
> @@ -3651,10 +3649,14 @@ _("log stripe unit (%d) must be a multiple of the block size (%d)\n"),
>  	if (lsunit) {
>  		/* convert from 512 byte blocks to fs blocks */
>  		cfg->lsunit = DTOBT(lsunit, cfg->blocklog);
> -	} else if (cfg->sb_feat.log_version == 2 &&
> -		   cfg->loginternal && cfg->dsunit) {
> -		/* lsunit and dsunit now in fs blocks */
> -		cfg->lsunit = cfg->dsunit;
> +	} else if (cfg->sb_feat.log_version == 2 && cfg->loginternal) {
> +		if (cfg->dsunit) {
> +			/* lsunit and dsunit now in fs blocks */
> +			cfg->lsunit = cfg->dsunit;
> +		} else if (cfg->lsectorsize > XLOG_HEADER_SIZE) {
> +			/* lsunit matches filesystem block size */
> +			cfg->lsunit = 1;
> +		}
>  	} else if (cfg->sb_feat.log_version == 2 &&
>  		   !cfg->loginternal) {
>  		/* use the external log device properties */

There's a bug in this patch; the lsectorsize > XLOG_HEADER_SIZE also
needs to be copied downwards to the external log configuration part.

--D

  parent reply	other threads:[~2026-03-05 22:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-05  4:24 [PATCHSET] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
2026-03-05  4:24 ` [PATCH 1/4] misc: fix a few memory leaks Darrick J. Wong
2026-03-05 14:03   ` Christoph Hellwig
2026-03-05  4:24 ` [PATCH 2/4] libxfs: fix data corruption bug in libxfs_file_write Darrick J. Wong
2026-03-05  7:46   ` Holger Hoffstätte
2026-03-05 15:56     ` Darrick J. Wong
2026-03-05 14:04   ` Christoph Hellwig
2026-03-05  4:24 ` [PATCH 3/4] mkfs: fix protofile data corruption when in/out file block sizes don't match Darrick J. Wong
2026-03-05 14:04   ` Christoph Hellwig
2026-03-05  4:24 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
2026-03-05 14:05   ` Christoph Hellwig
2026-03-05 22:56   ` Darrick J. Wong [this message]
  -- strict thread matches above, loose matches on Subject: below --
2026-03-10  3:24 [PATCHSET v2] xfsprogs: various bug fixes for 6.19 Darrick J. Wong
2026-03-10  3:25 ` [PATCH 4/4] mkfs: fix log sunit automatic configuration Darrick J. Wong
2026-03-10  6:43   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260305225654.GK57948@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=aalbersh@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox