* [PATCH v8 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
2026-05-19 13:52 [PATCH v8 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size Ionut Nechita (Wind River)
@ 2026-05-19 13:52 ` Ionut Nechita (Wind River)
2026-05-25 6:02 ` Christoph Hellwig
0 siblings, 1 reply; 3+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-05-19 13:52 UTC
To: James.Bottomley, martin.petersen
Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
sunlightlinux, Ionut Nechita
From: Ionut Nechita <ionut.nechita@windriver.com>
sas_host_setup() sets shost->opt_sectors from dma_opt_mapping_size()
whenever the DMA device has a dma_mask, without checking whether that
value is a real optimization hint.
When the IOMMU is disabled or in passthrough mode and no DMA ops provide
an opt_mapping_size callback, dma_opt_mapping_size() returns
min(dma_max_mapping_size(), SIZE_MAX), which equals dma_max_mapping_size():
a hard upper bound, not an optimization hint.
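A simplified user-space model of that fallback follows. It is not kernel code: `struct dma_model`, `model_opt_mapping_size()`, `model_has_real_hint()` and their fields are hypothetical, and the model assumes the only inputs are the device's max mapping size and an optional backend hint. With no hint, the starting SIZE_MAX is clamped to the max mapping size, so the "optimal" and "maximum" values come out equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-device DMA state (not a kernel type). */
struct dma_model {
	size_t max_mapping_size; /* what dma_max_mapping_size() would report */
	bool has_opt_hint;       /* an IOMMU or DMA ops backend offers a hint */
	size_t opt_hint;         /* that backend's optimal size, if any */
};

/* Start from SIZE_MAX, take a backend hint if present, then clamp to max. */
static size_t model_opt_mapping_size(const struct dma_model *d)
{
	size_t size = SIZE_MAX;

	if (d->has_opt_hint)
		size = d->opt_hint;

	return d->max_mapping_size < size ? d->max_mapping_size : size;
}

/* True only when the "optimal" value says more than the hard maximum. */
static bool model_has_real_hint(const struct dma_model *d)
{
	return model_opt_mapping_size(d) < d->max_mapping_size;
}
```

With `{ SIZE_MAX, false, 0 }` the model reports no real hint; supplying a backend hint below the maximum makes `model_has_real_hint()` true, which is the distinction the fix relies on.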
On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
and intel_iommu=off the following values are observed:
  dma_opt_mapping_size() = dma_max_mapping_size() (no real hint)
  shost->max_sectors     = 32767
  opt_sectors            = min(32767, dma_max_mapping_size() >> 9) = 32767
  optimal_io_size        = 32767 << 9 = 16776704
    → round_down(16776704, 4096) = 16773120
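The numbers above can be reproduced with a short user-space sketch. It assumes dma_max_mapping_size() is far larger than `max_sectors << 9` (modelled here as UINT64_MAX); `old_opt_sectors()` and `round_down_to()` are hypothetical helpers mirroring the pre-patch min() and the round_down() step in the listing.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, as in the kernel */

/* Smaller of two values, like the kernel's min(). */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_to(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* opt_sectors as the pre-patch code derived it: min(max_sectors, opt >> 9). */
static uint64_t old_opt_sectors(uint64_t max_sectors, uint64_t dma_opt_bytes)
{
	return min_u64(max_sectors, dma_opt_bytes >> SECTOR_SHIFT);
}
```

Calling `old_opt_sectors(32767, UINT64_MAX)` gives 32767 sectors, i.e. 16776704 bytes, which rounds down to 16773120.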
The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an Optimal
Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0.
sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) = opt_sectors,
propagating the bogus value into the block device's optimal_io_size
(visible as OPT-IO = 16773120 in lsblk --topology).
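The propagation step hinges on min_not_zero() treating 0 as "unset". The sketch below models it in user space; `min_not_zero_u64()` and `model_disk_opt_sectors()` are hypothetical names, and it assumes both inputs are already expressed in 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same semantics as the kernel's min_not_zero(): the smaller of two
 * values, ignoring either one that is zero.
 */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Hypothetical model of the disk's optimal I/O size in sectors: the
 * VPD Optimal Transfer Length combined with the host's opt_sectors,
 * where 0 on either side means "no preference".
 */
static uint64_t model_disk_opt_sectors(uint64_t opt_xfer_sectors,
				       uint64_t host_opt_sectors)
{
	return min_not_zero_u64(opt_xfer_sectors, host_opt_sectors);
}
```

`model_disk_opt_sectors(0, 32767)` yields 32767: with no VPD value, the host's opt_sectors passes straight through, whereas a device-reported value would take precedence whenever it is smaller.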
mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:
  swidth = 16773120 / 4096 = 4095
  sunit  = 8192 / 4096     = 2
Since 4095 % 2 != 0, XFS rejects the geometry:
  SB stripe unit sanity check failed
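The failing check can be illustrated with a simplified model. This is not mkfs.xfs source; `stripe_geometry_ok()` is a hypothetical helper that assumes sunit and swidth are derived by dividing minimum_io_size and optimal_io_size by the 4096-byte block size, as in the calculation above, and that a zero value means no stripe hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the stripe sanity check: derive sunit and
 * swidth in filesystem blocks, then require swidth to be a multiple
 * of sunit.
 */
static bool stripe_geometry_ok(uint64_t min_io, uint64_t opt_io,
			       uint64_t block_size)
{
	uint64_t sunit = min_io / block_size;
	uint64_t swidth = opt_io / block_size;

	if (sunit == 0 || swidth == 0)
		return true; /* in this model: no stripe hint, nothing to check */
	return swidth % sunit == 0;
}
```

With the values above (8192, 16773120, 4096) the model reports a bad geometry, while a power-of-two optimal_io_size such as 1 MiB passes.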
This makes it impossible to create XFS filesystems (e.g. for
/var/lib/docker) during system bootstrap.
Fix this by introducing a sas_dma_setup_opt_sectors() helper that sets
opt_sectors only when dma_opt_mapping_size() is strictly less than
dma_max_mapping_size(), indicating a genuine DMA optimization constraint.
The helper first clamps the DMA-derived sector count to max_sectors, then
rounds the result down to a power of two so that filesystem geometry
calculations (such as the XFS stripe width/unit check above) produce
clean divisors.
When the two DMA values are equal, no backend provided a real hint, so
opt_sectors stays at 0 ("no preference").
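The decision logic of the new helper can be modelled in user space as follows. `model_opt_sectors()` and `rounddown_pow2()` are hypothetical names; the model returns 0 where the real helper returns early and leaves opt_sectors unset, and a bit-clearing loop stands in for the kernel's rounddown_pow_of_two().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Largest power of two <= n; n must be non-zero. */
static unsigned int rounddown_pow2(unsigned int n)
{
	assert(n != 0);
	while (n & (n - 1))
		n &= n - 1; /* clear the lowest set bit until one bit remains */
	return n;
}

/*
 * Hypothetical model of the patched logic: returns the opt_sectors
 * value the helper would store, or 0 when it would return early.
 */
static unsigned int model_opt_sectors(size_t opt, size_t max,
				      unsigned int max_sectors)
{
	size_t sectors;

	if (opt >= max)            /* no real hint from the DMA layer */
		return 0;

	sectors = opt >> SECTOR_SHIFT;
	if (sectors > max_sectors) /* clamp before narrowing */
		sectors = max_sectors;
	if (sectors == 0)          /* rounddown needs a non-zero input */
		return 0;

	return rounddown_pow2((unsigned int)sectors);
}
```

For example, a 128 KiB hint yields 256 sectors, a hint above max_sectors = 32767 is clamped and rounded down to 16384 sectors (8 MiB, a 2048-block stripe width that divides evenly by a 2-block stripe unit), and opt == max yields 0. One design difference: the model clamps in size_t before narrowing to unsigned int, whereas min_t(unsigned int, ...) casts both operands first; the two agree whenever opt >> 9 fits in an unsigned int.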
Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
Cc: stable@vger.kernel.org
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
Changes in v8:
- Remove dma_dev->dma_mask check; dma_opt/max_mapping_size() handle
the no-DMA case gracefully by returning SIZE_MAX (Christoph Hellwig).
- Add inline comments explaining each conditional (Christoph Hellwig).
Changes in v7:
- Drop redundant !opt check; the !opt_sectors check below already
handles the opt == 0 case (John Garry).
- Add Reviewed-by from John Garry.
Changes in v6:
- No kerneldoc, short inline comment, removed WARN_ONCE, combined
checks (!opt || opt >= max), rounddown on min(opt, max_sectors),
restructured as sas_dma_setup_opt_sectors(shost) (John Garry).
Changes in v5:
- Expanded kdoc, inline comment at opt == max, guard for opt == 0
before rounddown_pow_of_two, trimmed Cc list (Damien/James/Sashiko).
Changes in v4:
- WARN_ONCE for opt > max, min_t overflow protection, reformatted
call site (Damien Le Moal).
Changes in v3:
- sas_dma_opt_sectors() helper + rounddown_pow_of_two() (Christoph).
Changes in v2:
- Single patch fixing scsi_transport_sas.c, Fixes: 4cbfca5f7750.
drivers/scsi/scsi_transport_sas.c | 43 +++++++++++++++++++++++++++----
1 file changed, 38 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 13412702188e4..ebd063b51bd6b 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -27,6 +27,7 @@
#include <linux/module.h>
#include <linux/jiffies.h>
#include <linux/err.h>
+#include <linux/log2.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/blkdev.h>
@@ -222,12 +223,47 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
* SAS host attributes
*/
+/*
+ * Set shost->opt_sectors from the DMA optimal mapping size, but only
+ * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
+ * indicating a genuine optimization hint from an IOMMU or DMA backend.
+ * When the two are equal (e.g. IOMMU disabled / passthrough), no real
+ * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
+ * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
+ */
+static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
+{
+	struct device *dma_dev = shost->dma_dev;
+	size_t opt, max;
+	unsigned int opt_sectors;
+
+	opt = dma_opt_mapping_size(dma_dev);
+	max = dma_max_mapping_size(dma_dev);
+
+	/* opt >= max means no real hint was provided by the DMA layer */
+	if (opt >= max)
+		return;
+
+	/* Clamp to max_sectors to avoid overflow in sector arithmetic */
+	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
+			    shost->max_sectors);
+
+	/* Guard against zero before rounddown_pow_of_two() */
+	if (!opt_sectors)
+		return;
+
+	/*
+	 * Round down to power-of-two so filesystem geometry calculations
+	 * (e.g. XFS stripe width/unit) always produce clean divisors.
+	 */
+	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);
+}
+
static int sas_host_setup(struct transport_container *tc, struct device *dev,
struct device *cdev)
{
struct Scsi_Host *shost = dev_to_shost(dev);
struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
- struct device *dma_dev = shost->dma_dev;
INIT_LIST_HEAD(&sas_host->rphy_list);
mutex_init(&sas_host->lock);
@@ -239,10 +275,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
shost->host_no);
- if (dma_dev->dma_mask) {
- shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
- dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
- }
+ sas_dma_setup_opt_sectors(shost);
return 0;
}
--
2.43.0