* [PATCH v6 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size
From: Ionut Nechita (Wind River) @ 2026-03-26 8:46 UTC
To: James E . J . Bottomley, Martin K . Petersen
Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
sunlightlinux, Ionut Nechita (Wind River)
From: Ionut Nechita <ionut.nechita@windriver.com>

v6 (per John Garry's review of v5):
  - Replaced kerneldoc (/**) with a regular comment, since the function
    is static.
  - Condensed the comment to a single paragraph.
  - Removed WARN_ONCE for opt > max; it is not the driver's job.
  - Combined the !opt and opt == max checks into: if (!opt || opt >= max).
  - Apply rounddown_pow_of_two() to min(opt_sectors, max_sectors) instead
    of just opt, since max_sectors can be any value.
  - Restructured as sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
    with the dma_mask check moved inside, removing the need for a
    separate dma_dev variable in sas_host_setup().

v5 (per Damien Le Moal's and James Bottomley's review of v4):
  - Expanded kdoc, inline comment at opt == max, guard for opt == 0
    before rounddown_pow_of_two, trimmed Cc list.

v4 (per Damien Le Moal's review of v3):
  - WARN_ONCE for opt > max, min_t overflow protection, reformatted
    call site.

v3 (per Christoph Hellwig's review of v2):
  - Extracted the opt_sectors logic into a dedicated helper function.
  - Added rounddown_pow_of_two().

v2:
  - Dropped the dma_opt_mapping_size() change per Robin Murphy's
    feedback. Single patch fixing scsi_transport_sas.c.

Test environment:
  - Dell PowerEdge R750
  - SAS Controller: Broadcom/LSI mpt3sas (SAS3816, FW 33.15.00.00)
  - Disks: SAMSUNG MZILT800HBHQ0D3 (800GB SCSI SAS SSD)
  - Kernel: 6.12.0-1-amd64 with intel_iommu=off
  - IOMMU: Disabled (DMAR: IOMMU disabled), default domain: Passthrough

Based on linux-next (next-20260325).

Link: https://lore.kernel.org/lkml/20260316203956.64515-1-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/all/20260318074314.17372-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/all/20260318200532.51232-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/lkml/20260319083954.21056-1-ionut.nechita@windriver.com/ [v4]
Link: https://lore.kernel.org/linux-scsi/20260320081429.42106-1-ionut.nechita@windriver.com/ [v5]

Ionut Nechita (Wind River) (1):
  scsi: sas: skip opt_sectors when DMA reports no real optimization hint

 drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

--
2.53.0
* [PATCH v6 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
From: Ionut Nechita (Wind River) @ 2026-03-26 8:46 UTC
To: James E . J . Bottomley, Martin K . Petersen
Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
sunlightlinux, Ionut Nechita (Wind River)
sas_host_setup() unconditionally sets shost->opt_sectors from
dma_opt_mapping_size(). When the IOMMU is disabled or in passthrough
mode and no DMA ops provide an opt_mapping_size callback,
dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
which equals dma_max_mapping_size(): a hard upper bound, not an
optimization hint.

On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
and intel_iommu=off the following values are observed:

  dma_opt_mapping_size() = dma_max_mapping_size() (no real hint)
  shost->max_sectors = 32767
  opt_sectors = min(32767, huge >> 9) = 32767
  optimal_io_size = 32767 << 9 = 16776704
    → round_down(16776704, 4096) = 16773120

The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an
Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks
remains 0. sd_revalidate_disk() then uses min_not_zero(0, opt_sectors)
= opt_sectors, propagating the bogus value into the block device's
optimal_io_size (visible as OPT-IO = 16773120 in lsblk --topology).

mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:

  swidth = 16773120 / 4096 = 4095
  sunit = 8192 / 4096 = 2

Since 4095 % 2 != 0, XFS rejects the geometry:

  SB stripe unit sanity check failed

This makes it impossible to create XFS filesystems (e.g. for
/var/lib/docker) during system bootstrap.

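The arithmetic above can be checked with a small userspace sketch. It is only a simplified model of the numbers quoted in this message, not the actual sd or mkfs.xfs code: the function names are hypothetical, and the 4096-byte rounding granularity and filesystem block size are taken from the figures above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Hypothetical model: byte value that ends up as optimal_io_size for a
 * given opt_sectors, rounded down to the given granularity (4096 above).
 */
static uint64_t model_optimal_io_size(uint32_t opt_sectors, uint64_t granularity)
{
	return round_down_u64((uint64_t)opt_sectors << SECTOR_SHIFT, granularity);
}

/*
 * Hypothetical model of the geometry test described above: the stripe
 * width (in filesystem blocks) must be a whole multiple of the stripe unit.
 */
static bool model_stripe_geometry_ok(uint64_t optimal_io, uint64_t minimum_io,
				     uint64_t fs_block)
{
	uint64_t swidth = optimal_io / fs_block;
	uint64_t sunit = minimum_io / fs_block;

	return sunit != 0 && swidth % sunit == 0;
}
```

With the reported values, model_optimal_io_size(32767, 4096) yields 16773120, and model_stripe_geometry_ok(16773120, 8192, 4096) is false because 4095 is odd. A power-of-two opt_sectors such as 16384 passes the same check.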
Fix this by introducing a sas_dma_setup_opt_sectors() helper that
sets opt_sectors only when dma_opt_mapping_size() is strictly less
than dma_max_mapping_size(), indicating a genuine DMA optimization
constraint. The helper computes min(opt_sectors, max_sectors) first,
then rounds down to a power of two so that filesystem geometry
calculations always produce clean results. When the two DMA values
are equal, no backend provided a real hint, so opt_sectors stays at
0 ("no preference").
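The decision logic described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel helper itself: model_opt_sectors() and model_rounddown_pow_of_two() are hypothetical names, the DMA sizes are passed in as plain parameters instead of being queried from a device, and the minimum is taken before narrowing to unsigned int, which differs slightly from min_t(unsigned int, ...).

```c
#include <stddef.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Largest power of two <= n. The caller must ensure n != 0. */
static unsigned int model_rounddown_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical model of the helper's decision: returns the opt_sectors
 * value that would be set, or 0 for "no preference".
 *   opt         - modelled dma_opt_mapping_size() result, in bytes
 *   max         - modelled dma_max_mapping_size() result, in bytes
 *   max_sectors - modelled shost->max_sectors
 */
static unsigned int model_opt_sectors(size_t opt, size_t max,
				      unsigned int max_sectors)
{
	size_t sectors;
	unsigned int opt_sectors;

	/* No hint, or the "hint" is just the hard upper bound. */
	if (!opt || opt >= max)
		return 0;

	/* Assumption: compare in size_t first, then narrow. */
	sectors = opt >> SECTOR_SHIFT;
	opt_sectors = sectors < max_sectors ? (unsigned int)sectors : max_sectors;
	if (!opt_sectors)
		return 0;

	return model_rounddown_pow_of_two(opt_sectors);
}
```

For example, equal opt and max values give 0, while a made-up 128 KiB hint below a larger max gives 256 sectors. A hint large enough to be capped by max_sectors = 32767 is rounded down to 16384.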
Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 13412702188e4..fa79a0883bb3d 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/jiffies.h>
 #include <linux/err.h>
+#include <linux/log2.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -222,12 +223,42 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
  * SAS host attributes
  */
 
+/*
+ * Set shost->opt_sectors from the DMA optimal mapping size, but only
+ * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
+ * indicating a genuine optimization hint from an IOMMU or DMA backend.
+ * When the two are equal (e.g. IOMMU disabled / passthrough), no real
+ * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
+ * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
+ */
+static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
+{
+	struct device *dma_dev = shost->dma_dev;
+	size_t opt, max;
+	unsigned int opt_sectors;
+
+	if (!dma_dev->dma_mask)
+		return;
+
+	opt = dma_opt_mapping_size(dma_dev);
+	max = dma_max_mapping_size(dma_dev);
+
+	if (!opt || opt >= max)
+		return;
+
+	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
+			    shost->max_sectors);
+	if (!opt_sectors)
+		return;
+
+	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);
+}
+
 static int sas_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
-	struct device *dma_dev = shost->dma_dev;
 
 	INIT_LIST_HEAD(&sas_host->rphy_list);
 	mutex_init(&sas_host->lock);
@@ -239,10 +270,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
 		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
 			   shost->host_no);
 
-	if (dma_dev->dma_mask) {
-		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
-				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
-	}
+	sas_dma_setup_opt_sectors(shost);
 
 	return 0;
 }
--
2.53.0
* Re: [PATCH v6 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
From: John Garry @ 2026-03-26 14:50 UTC
To: Ionut Nechita (Wind River), James E . J . Bottomley,
Martin K . Petersen
Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
axboe, m.szyprowski, ahuang12, ionut_n2001, sunlightlinux
On 26/03/2026 08:46, Ionut Nechita (Wind River) wrote:
> sas_host_setup() unconditionally sets shost->opt_sectors from
> dma_opt_mapping_size(). When the IOMMU is disabled or in passthrough
> mode and no DMA ops provide an opt_mapping_size callback,
> dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
> which equals dma_max_mapping_size() — a hard upper bound, not an
> optimization hint.
>
> On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
> and intel_iommu=off the following values are observed:
>
> dma_opt_mapping_size() = dma_max_mapping_size() (no real hint)
> shost->max_sectors = 32767
> opt_sectors = min(32767, huge >> 9) = 32767
> optimal_io_size = 32767 << 9 = 16776704
> → round_down(16776704, 4096) = 16773120
>
> The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an
> Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks
> remains 0. sd_revalidate_disk() then uses min_not_zero(0, opt_sectors)
> = opt_sectors, propagating the bogus value into the block device's
> optimal_io_size (visible as OPT-IO = 16773120 in lsblk --topology).
>
> mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:
>
> swidth = 16773120 / 4096 = 4095
> sunit = 8192 / 4096 = 2
>
> Since 4095 % 2 != 0, XFS rejects the geometry:
>
> SB stripe unit sanity check failed
>
> This makes it impossible to create XFS filesystems (e.g. for
> /var/lib/docker) during system bootstrap.
>
> Fix this by introducing a sas_dma_setup_opt_sectors() helper that
> sets opt_sectors only when dma_opt_mapping_size() is strictly less
> than dma_max_mapping_size(), indicating a genuine DMA optimization
> constraint. The helper computes min(opt_sectors, max_sectors) first,
> then rounds down to a power of two so that filesystem geometry
> calculations always produce clean results. When the two DMA values
> are equal, no backend provided a real hint, so opt_sectors stays at
> 0 ("no preference").
>
> Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
I have some nits below; regardless of those, FWIW:
Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
> 1 file changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> index 13412702188e4..fa79a0883bb3d 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -27,6 +27,7 @@
> #include <linux/module.h>
> #include <linux/jiffies.h>
> #include <linux/err.h>
> +#include <linux/log2.h>
> #include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/blkdev.h>
> @@ -222,12 +223,42 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
> * SAS host attributes
> */
>
> +/*
> + * Set shost->opt_sectors from the DMA optimal mapping size, but only
> + * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
Aside from this patch, dma_opt_mapping_size() might be better named
dma_max_opt_mapping_size() or similar, to indicate that it is an upper
limit for good performance and not a sweet spot that we should aim for.
> + * indicating a genuine optimization hint from an IOMMU or DMA backend.
> + * When the two are equal (e.g. IOMMU disabled / passthrough), no real
> + * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
> + * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
> + */
> +static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
> +{
> + struct device *dma_dev = shost->dma_dev;
> + size_t opt, max;
> + unsigned int opt_sectors;
> +
> + if (!dma_dev->dma_mask)
> + return;
> +
> + opt = dma_opt_mapping_size(dma_dev);
> + max = dma_max_mapping_size(dma_dev);
> +
> + if (!opt || opt >= max)
> + return;
opt > max should not be possible, but I suppose there is no harm in
checking. I also think that the opt == 0 check is already covered by the
!opt_sectors check below.
> +
> + opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
> + shost->max_sectors);
> + if (!opt_sectors)
> + return;
I don't think that opt_sectors == 0 is possible, as max_sectors == 0 is
not possible unless someone hacks their SCSI LLD to override it to zero
after scsi_host_alloc(). Still, I suppose that the check is ok, since
rounddown_pow_of_two(0) gives undefined behaviour.
> +
> + shost->opt_sectors = rounddown_pow_of_two(opt_sectors);
> +}
> +
> static int sas_host_setup(struct transport_container *tc, struct device *dev,
> struct device *cdev)
> {
> struct Scsi_Host *shost = dev_to_shost(dev);
> struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
> - struct device *dma_dev = shost->dma_dev;
>
> INIT_LIST_HEAD(&sas_host->rphy_list);
> mutex_init(&sas_host->lock);
> @@ -239,10 +270,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
> dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
> shost->host_no);
>
> - if (dma_dev->dma_mask) {
> - shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
> - dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
> - }
> + sas_dma_setup_opt_sectors(shost);
>
> return 0;
> }