* [PATCH v7 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size
@ 2026-04-15 7:18 Ionut Nechita (Wind River)
From: Ionut Nechita (Wind River) @ 2026-04-15 7:18 UTC
To: James E . J . Bottomley, Martin K . Petersen
Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
sunlightlinux, Ionut Nechita
From: Ionut Nechita <ionut.nechita@windriver.com>

v7 (per John Garry's review of v6):
  - Dropped the redundant !opt check from the first guard; the
    !opt_sectors check later already handles the opt == 0 case.
    Now simply: if (opt >= max) return;
  - Added Reviewed-by: John Garry <john.g.garry@oracle.com>.
  - Rebased onto linux-next (next-20260414).

v6 (per John Garry's review of v5):
  - Replaced kerneldoc (/**) with a regular comment, since the
    function is static.
  - Condensed the comment to a single paragraph.
  - Removed WARN_ONCE for opt > max; that is not the driver's job.
  - Combined the !opt and opt == max checks into: if (!opt || opt >= max).
  - Apply rounddown_pow_of_two() to min(opt_sectors, max_sectors) instead
    of just opt, since max_sectors can be any value.
  - Restructured as sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
    with the dma_mask check moved inside, removing the need for a
    separate dma_dev variable in sas_host_setup().

v5 (per Damien Le Moal's and James Bottomley's review of v4):
  - Expanded kdoc, inline comment at opt == max, guard for opt == 0
    before rounddown_pow_of_two, trimmed Cc list.

v4 (per Damien Le Moal's review of v3):
  - WARN_ONCE for opt > max, min_t overflow protection, reformatted
    call site.

v3 (per Christoph Hellwig's review of v2):
  - Extracted the opt_sectors logic into a dedicated helper function.
  - Added rounddown_pow_of_two().

v2:
  - Dropped the dma_opt_mapping_size() change per Robin Murphy's
    feedback. Single patch fixing scsi_transport_sas.c.

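
For illustration only (not part of the patch): the guard ordering that the
v6/v7 notes describe can be modelled in userspace C. The names
model_opt_sectors() and rounddown_pow2() are hypothetical stand-ins, and the
model clamps in size_t rather than reproducing the truncation that
min_t(unsigned int, ...) performs in the kernel, so treat it as a sketch of
the decision order rather than the helper itself.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Userspace stand-in for rounddown_pow_of_two(); n must be non-zero. */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical model of the v7 decision order. Returns the value that
 * would end up in opt_sectors, or 0 for "no preference".
 */
static unsigned int model_opt_sectors(size_t opt, size_t max,
				      unsigned int max_sectors)
{
	size_t sectors;

	/* opt == max: no backend supplied a real hint, so skip. */
	if (opt >= max)
		return 0;

	/* Clamp to the host limit before rounding. */
	sectors = opt >> SECTOR_SHIFT;
	if (sectors > max_sectors)
		sectors = max_sectors;

	/* This also covers opt == 0, which is why v7 dropped the !opt test. */
	if (!sectors)
		return 0;

	return rounddown_pow2((unsigned int)sectors);
}
```

Under this model, equal opt and max values yield 0, while a 128 KiB hint
below max yields 256 sectors. Rounding after the clamp matters because
max_sectors (32767 on the test system) need not be a power of two.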
Test environment:
- Dell PowerEdge R750
- SAS Controller: Broadcom/LSI mpt3sas (SAS3816, FW 33.15.00.00)
- Disks: SAMSUNG MZILT800HBHQ0D3 (800GB SCSI SAS SSD)
- Kernel: 6.12.0-1-amd64 with intel_iommu=off
- IOMMU: Disabled (DMAR: IOMMU disabled), default domain: Passthrough
Based on linux-next (next-20260414).
Link: https://lore.kernel.org/lkml/20260316203956.64515-1-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/all/20260318074314.17372-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/all/20260318200532.51232-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/lkml/20260319083954.21056-1-ionut.nechita@windriver.com/ [v4]
Link: https://lore.kernel.org/linux-scsi/20260320081429.42106-1-ionut.nechita@windriver.com/ [v5]
Link: https://lore.kernel.org/linux-scsi/20260326084644.27162-1-ionut.nechita@windriver.com/ [v6]
Ionut Nechita (Wind River) (1):
scsi: sas: skip opt_sectors when DMA reports no real optimization hint
drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
1 file changed, 33 insertions(+), 5 deletions(-)
--
2.53.0

* [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
From: Ionut Nechita (Wind River) @ 2026-04-15 7:18 UTC
To: James E . J . Bottomley, Martin K . Petersen
Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
sunlightlinux, Ionut Nechita

From: Ionut Nechita <ionut.nechita@windriver.com>

sas_host_setup() unconditionally sets shost->opt_sectors from
dma_opt_mapping_size(). When the IOMMU is disabled or in passthrough
mode and no DMA ops provide an opt_mapping_size callback,
dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
which equals dma_max_mapping_size(), a hard upper bound, not an
optimization hint.

On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
and intel_iommu=off the following values are observed:

  dma_opt_mapping_size() = dma_max_mapping_size()  (no real hint)
  shost->max_sectors     = 32767
  opt_sectors            = min(32767, huge >> 9) = 32767
  optimal_io_size        = 32767 << 9 = 16776704
                           → round_down(16776704, 4096) = 16773120

The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an Optimal
Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0.
sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) = opt_sectors,
propagating the bogus value into the block device's optimal_io_size
(visible as OPT-IO = 16773120 in lsblk --topology).

mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:

  swidth = 16773120 / 4096 = 4095
  sunit  = 8192 / 4096     = 2

Since 4095 % 2 != 0, XFS rejects the geometry:

  SB stripe unit sanity check failed

This makes it impossible to create XFS filesystems (e.g. for
/var/lib/docker) during system bootstrap.

Fix this by introducing a sas_dma_setup_opt_sectors() helper that sets
opt_sectors only when dma_opt_mapping_size() is strictly less than
dma_max_mapping_size(), indicating a genuine DMA optimization
constraint. The helper computes min(opt_sectors, max_sectors) first,
then rounds down to a power of two so that filesystem geometry
calculations always produce clean results. When the two DMA values are
equal, no backend provided a real hint, so opt_sectors stays at 0
("no preference").

Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
Cc: stable@vger.kernel.org
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
Changes in v7:
  - Drop redundant !opt check; the !opt_sectors check below already
    handles the opt == 0 case (John Garry).
  - Add Reviewed-by from John Garry.
  - Rebased onto next-20260414.

Changes in v6:
  - No kerneldoc, short inline comment, removed WARN_ONCE, combined
    checks (!opt || opt >= max), rounddown on min(opt, max_sectors),
    restructured as sas_dma_setup_opt_sectors(shost) (John Garry).

Changes in v5:
  - Expanded kdoc, inline comment at opt == max, guard for opt == 0
    before rounddown_pow_of_two, trimmed Cc list (Damien/James/Sashiko).

Changes in v4:
  - WARN_ONCE for opt > max, min_t overflow protection, reformatted
    call site (Damien Le Moal).

Changes in v3:
  - sas_dma_opt_sectors() helper + rounddown_pow_of_two() (Christoph).

Changes in v2:
  - Single patch fixing scsi_transport_sas.c, Fixes: 4cbfca5f7750.

 drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 13412702188e4..45609259f27db 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/jiffies.h>
 #include <linux/err.h>
+#include <linux/log2.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -222,12 +223,42 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
  * SAS host attributes
  */
 
+/*
+ * Set shost->opt_sectors from the DMA optimal mapping size, but only
+ * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
+ * indicating a genuine optimization hint from an IOMMU or DMA backend.
+ * When the two are equal (e.g. IOMMU disabled / passthrough), no real
+ * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
+ * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
+ */
+static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
+{
+	struct device *dma_dev = shost->dma_dev;
+	size_t opt, max;
+	unsigned int opt_sectors;
+
+	if (!dma_dev->dma_mask)
+		return;
+
+	opt = dma_opt_mapping_size(dma_dev);
+	max = dma_max_mapping_size(dma_dev);
+
+	if (opt >= max)
+		return;
+
+	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
+			    shost->max_sectors);
+	if (!opt_sectors)
+		return;
+
+	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);
+}
+
 static int sas_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
-	struct device *dma_dev = shost->dma_dev;
 
 	INIT_LIST_HEAD(&sas_host->rphy_list);
 	mutex_init(&sas_host->lock);
@@ -239,10 +270,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
 		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
 			   shost->host_no);
 
-	if (dma_dev->dma_mask) {
-		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
-				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
-	}
+	sas_dma_setup_opt_sectors(shost);
 
 	return 0;
 }
-- 
2.53.0

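
For illustration only (not part of the patch): the arithmetic in the commit
message can be replayed in userspace C. model_io_opt() and
model_geometry_ok() are hypothetical names, the 4096-byte granule and
8192-byte minimum_io_size are taken from the figures quoted in the commit
message, and the divisibility test is a simplified reading of the
"4095 % 2 != 0" statement, not mkfs.xfs's actual code.

```c
#include <assert.h>

#define SECTOR_SHIFT 9

/*
 * Bytes of optimal I/O derived from opt_sectors, rounded down to a
 * multiple of granule (4096 in the reported case).
 */
static unsigned long long model_io_opt(unsigned int opt_sectors,
				       unsigned int granule)
{
	unsigned long long bytes;

	assert(granule != 0);
	bytes = (unsigned long long)opt_sectors << SECTOR_SHIFT;
	return bytes - bytes % granule;
}

/*
 * Returns 1 when the stripe width is a whole multiple of the stripe
 * unit, 0 otherwise. Both are expressed in filesystem blocks.
 */
static int model_geometry_ok(unsigned long long io_opt,
			     unsigned long long io_min,
			     unsigned int fs_block)
{
	unsigned long long swidth, sunit;

	assert(fs_block != 0);
	swidth = io_opt / fs_block;
	sunit = io_min / fs_block;
	return sunit != 0 && swidth % sunit == 0;
}
```

With opt_sectors = 32767 the model gives 16773120 bytes, which fails the
divisibility test against an 8192-byte minimum I/O size. A power-of-two
value such as 16384 sectors passes it. On the reported system the patched
helper leaves opt_sectors at 0, so no DMA-derived optimal_io_size is
advertised at all.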

* Re: [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
From: Christoph Hellwig @ 2026-04-24 13:21 UTC
To: Ionut Nechita (Wind River)
Cc: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel,
stable, hch, dlemoal, robin.murphy, john.g.garry, axboe, m.szyprowski,
ahuang12, ionut_n2001, sunlightlinux

On Wed, Apr 15, 2026 at 10:18:49AM +0300, Ionut Nechita (Wind River) wrote:
> +/*
> + * Set shost->opt_sectors from the DMA optimal mapping size, but only
> + * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
> + * indicating a genuine optimization hint from an IOMMU or DMA backend.
> + * When the two are equal (e.g. IOMMU disabled / passthrough), no real
> + * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
> + * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
> + */
> +static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
> +{
> +	struct device *dma_dev = shost->dma_dev;
> +	size_t opt, max;
> +	unsigned int opt_sectors;
> +
> +	if (!dma_dev->dma_mask)
> +		return;

Upper layers have no real business looking at dma_dev->dma_mask. What
is this check intended to do?

> +
> +	opt = dma_opt_mapping_size(dma_dev);
> +	max = dma_max_mapping_size(dma_dev);
> +
> +	if (opt >= max)
> +		return;
> +
> +	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
> +			    shost->max_sectors);
> +	if (!opt_sectors)
> +		return;
> +
> +	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);

Please add comments explaining the logic.

* Re: [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
From: John Garry @ 2026-04-28 8:15 UTC
To: Christoph Hellwig, Ionut Nechita (Wind River)
Cc: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel,
stable, dlemoal, robin.murphy, axboe, m.szyprowski, ahuang12,
ionut_n2001, sunlightlinux

On 24/04/2026 14:21, Christoph Hellwig wrote:

Responding to get things moving..

>> + */
>> +static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
>> +{
>> +	struct device *dma_dev = shost->dma_dev;
>> +	size_t opt, max;
>> +	unsigned int opt_sectors;
>> +
>> +	if (!dma_dev->dma_mask)
>> +		return;
> Upper layers have no real business looking at dma_dev->dma_mask. What
> is this check intended to do?

Back when that check was introduced, dma_max_mapping_size() could crash
for some SCSI hosts. See
https://lore.kernel.org/linux-scsi/BYAPR04MB58168CBFF8B691DF33C73DDBE7C40@BYAPR04MB5816.namprd04.prod.outlook.com/

scsi_debug would be an example of such a shost, as it is not DMA capable.

From my limited testing, that crash is no longer an issue. I think that
it comes down to new checks in dma_addressing_limited() ->
__dma_addressing_limited() for dma_mask being set.

So we may be able to get rid of that dma_mask check.