public inbox for stable@vger.kernel.org
* [PATCH v7 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size
@ 2026-04-15  7:18 Ionut Nechita (Wind River)
  2026-04-15  7:18 ` [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint Ionut Nechita (Wind River)
  0 siblings, 1 reply; 4+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-04-15  7:18 UTC
  To: James E . J . Bottomley, Martin K . Petersen
  Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
	john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
	sunlightlinux, Ionut Nechita

From: Ionut Nechita <ionut.nechita@windriver.com>

v7 (per John Garry's review of v6):
  - Dropped the redundant !opt check from the first guard; the
    !opt_sectors check later already handles the opt == 0 case.
    Now simply: if (opt >= max) return;
  - Added Reviewed-by: John Garry <john.g.garry@oracle.com>.
  - Rebased onto linux-next (next-20260414).

v6 (per John Garry's review of v5):
  - Replaced kerneldoc (/**) with a regular comment — function is static.
  - Condensed the comment to a single paragraph.
  - Removed WARN_ONCE for opt > max — not the driver's job.
  - Combined the !opt and opt == max checks into: if (!opt || opt >= max).
  - Apply rounddown_pow_of_two() to min(opt_sectors, max_sectors) instead
    of just opt, since max_sectors can be any value.
  - Restructured as sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
    with the dma_mask check moved inside, removing the need for a
    separate dma_dev variable in sas_host_setup().
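
The ordering point in the v6 notes can be illustrated with a small
userspace sketch.  Nothing below is kernel code: rounddown_pow2(),
clamp_after_round() and round_after_clamp() are hypothetical stand-ins,
and the round-then-clamp variant is only an assumed illustration of
"rounding just opt", not a copy of any earlier revision.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's rounddown_pow_of_two(); n must be nonzero. */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0);
	/* Double p while the next power of two still fits within n. */
	while (p <= n / 2)
		p *= 2;
	return p;
}

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Assumed illustration: round opt alone, then clamp to max_sectors. */
static unsigned int clamp_after_round(unsigned int opt_sectors,
				      unsigned int max_sectors)
{
	return min_uint(rounddown_pow2(opt_sectors), max_sectors);
}

/* Ordering described in the v6 notes: clamp first, then round down. */
static unsigned int round_after_clamp(unsigned int opt_sectors,
				      unsigned int max_sectors)
{
	return rounddown_pow2(min_uint(opt_sectors, max_sectors));
}
```

With the hypothetical inputs opt_sectors = 65536 and max_sectors = 32767,
clamp_after_round() yields 32767, which is not a power of two, while
round_after_clamp() yields 16384.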

v5 (per Damien Le Moal's and James Bottomley's review of v4):
  - Expanded kdoc, inline comment at opt == max, guard for opt == 0
    before rounddown_pow_of_two, trimmed Cc list.

v4 (per Damien Le Moal's review of v3):
  - WARN_ONCE for opt > max, min_t overflow protection, reformatted
    call site.

v3 (per Christoph Hellwig's review of v2):
  - Extracted the opt_sectors logic into a dedicated helper function.
  - Added rounddown_pow_of_two().

v2:
  - Dropped the dma_opt_mapping_size() change per Robin Murphy's
    feedback.  Single patch fixing scsi_transport_sas.c.

Test environment:
  - Dell PowerEdge R750
  - SAS Controller: Broadcom/LSI mpt3sas (SAS3816, FW 33.15.00.00)
  - Disks: SAMSUNG MZILT800HBHQ0D3 (800GB SCSI SAS SSD)
  - Kernel: 6.12.0-1-amd64 with intel_iommu=off
  - IOMMU: Disabled (DMAR: IOMMU disabled), default domain: Passthrough

Based on linux-next (next-20260414).

Link: https://lore.kernel.org/lkml/20260316203956.64515-1-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/all/20260318074314.17372-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/all/20260318200532.51232-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/lkml/20260319083954.21056-1-ionut.nechita@windriver.com/ [v4]
Link: https://lore.kernel.org/linux-scsi/20260320081429.42106-1-ionut.nechita@windriver.com/ [v5]
Link: https://lore.kernel.org/linux-scsi/20260326084644.27162-1-ionut.nechita@windriver.com/ [v6]

Ionut Nechita (Wind River) (1):
  scsi: sas: skip opt_sectors when DMA reports no real optimization hint

 drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

--
2.53.0


* [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-04-15  7:18 [PATCH v7 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size Ionut Nechita (Wind River)
@ 2026-04-15  7:18 ` Ionut Nechita (Wind River)
  2026-04-24 13:21   ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-04-15  7:18 UTC
  To: James E . J . Bottomley, Martin K . Petersen
  Cc: linux-scsi, linux-kernel, stable, hch, dlemoal, robin.murphy,
	john.g.garry, axboe, m.szyprowski, ahuang12, ionut_n2001,
	sunlightlinux, Ionut Nechita

From: Ionut Nechita <ionut.nechita@windriver.com>

sas_host_setup() unconditionally sets shost->opt_sectors from
dma_opt_mapping_size().  When the IOMMU is disabled or in passthrough
mode and no DMA ops provide an opt_mapping_size callback,
dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
which equals dma_max_mapping_size() — a hard upper bound, not an
optimization hint.

On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
and intel_iommu=off the following values are observed:

  dma_opt_mapping_size()  = dma_max_mapping_size() (no real hint)
  shost->max_sectors      = 32767
  opt_sectors             = min(32767, huge >> 9) = 32767
  optimal_io_size         = 32767 << 9 = 16776704
                          → round_down(16776704, 4096) = 16773120

The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an
Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks
remains 0.  sd_revalidate_disk() then uses min_not_zero(0, opt_sectors)
= opt_sectors, propagating the bogus value into the block device's
optimal_io_size (visible as OPT-IO = 16773120 in lsblk --topology).
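
The propagation step can be sketched in userspace C.  This is a
simplified model, assuming only the min_not_zero() semantics named
above; min_not_zero_uint() is a hypothetical stand-in for the kernel
macro and the real sd_revalidate_disk() does more than this.

```c
/*
 * Userspace stand-in for the kernel's min_not_zero() macro, restricted
 * to unsigned int: a zero argument means "no value" and is ignored.
 */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}
```

With a device hint of 0 and a host value of 32767 the host value wins,
so the bogus number reaches the block layer.  If the host value is also
0, the result is 0 and no optimal I/O size is advertised from this path.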

mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:

  swidth = 16773120 / 4096 = 4095
  sunit  = 8192 / 4096     = 2

Since 4095 % 2 != 0, XFS rejects the geometry:

  SB stripe unit sanity check failed

This makes it impossible to create XFS filesystems (e.g. for
/var/lib/docker) during system bootstrap.
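
The arithmetic above can be reproduced with a short userspace sketch.
The helper names are hypothetical, the 4096-byte block size and the
8192-byte minimum_io_size are taken from the figures above, and the
divisibility test is a simplified model of the geometry check rather
than the actual XFS code.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* optimal_io_size in bytes, derived from opt_sectors as in the table above. */
static uint64_t optimal_io_size(uint32_t opt_sectors, uint32_t granularity)
{
	return round_down_u64((uint64_t)opt_sectors << SECTOR_SHIFT,
			      granularity);
}

/*
 * Simplified model of the geometry check: the stripe width (in blocks)
 * must be a whole multiple of the stripe unit (in blocks).
 * Returns 1 when the geometry is consistent, 0 otherwise.
 */
static int stripe_geometry_ok(uint64_t opt_io, uint64_t min_io,
			      uint32_t blksz)
{
	uint64_t swidth = opt_io / blksz;
	uint64_t sunit = min_io / blksz;

	return sunit != 0 && swidth % sunit == 0;
}
```

For opt_sectors = 32767 this gives an optimal_io_size of 16773120 bytes
and an inconsistent geometry.  A power-of-two value such as 16384
sectors gives 8388608 bytes, i.e. 2048 blocks, which divides evenly by
a stripe unit of 2.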

Fix this by introducing a sas_dma_setup_opt_sectors() helper that
sets opt_sectors only when dma_opt_mapping_size() is strictly less
than dma_max_mapping_size(), indicating a genuine DMA optimization
constraint.  The helper computes min(opt_sectors, max_sectors) first,
then rounds down to a power of two so that filesystem geometry
calculations always produce clean results.  When the two DMA values
are equal, no backend provided a real hint, so opt_sectors stays at
0 ("no preference").
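
The decision logic described above can be modelled in userspace as a
rough sketch.  model_opt_sectors() and rounddown_pow2() are hypothetical
names; the two DMA sizes are passed in as plain numbers instead of being
queried from a device, and the dma_mask check is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Userspace stand-in for the kernel's rounddown_pow_of_two(); n must be nonzero. */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Model of the helper's decision.  opt and max stand for the values of
 * dma_opt_mapping_size() and dma_max_mapping_size().  The return value
 * is what opt_sectors would become; 0 means "no preference".
 */
static unsigned int model_opt_sectors(size_t opt, size_t max,
				      unsigned int max_sectors)
{
	unsigned int sectors;

	/* Equal (or larger) values mean no backend supplied a real hint. */
	if (opt >= max)
		return 0;

	/* The cast mirrors min_t(unsigned int, ...) in the patch. */
	sectors = (unsigned int)(opt >> SECTOR_SHIFT);
	if (sectors > max_sectors)
		sectors = max_sectors;

	/* Also covers opt == 0, so no separate !opt guard is needed. */
	if (!sectors)
		return 0;

	return rounddown_pow2(sectors);
}
```

As an example with made-up numbers, a 128 KiB hint below a 1 GiB maximum
gives 256 sectors, while a 64 MiB hint with max_sectors = 32767 is
clamped to 32767 and then rounded down to 16384.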

Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
Cc: stable@vger.kernel.org
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
Changes in v7:
- Drop redundant !opt check; the !opt_sectors check below already
  handles the opt == 0 case (John Garry).
- Add Reviewed-by from John Garry.
- Rebased onto next-20260414.

Changes in v6:
- No kerneldoc, short inline comment, removed WARN_ONCE, combined
  checks (!opt || opt >= max), rounddown on min(opt, max_sectors),
  restructured as sas_dma_setup_opt_sectors(shost) (John Garry).

Changes in v5:
- Expanded kdoc, inline comment at opt == max, guard for opt == 0
  before rounddown_pow_of_two, trimmed Cc list (Damien/James/Sashiko).

Changes in v4:
- WARN_ONCE for opt > max, min_t overflow protection, reformatted
  call site (Damien Le Moal).

Changes in v3:
- sas_dma_opt_sectors() helper + rounddown_pow_of_two() (Christoph).

Changes in v2:
- Single patch fixing scsi_transport_sas.c, Fixes: 4cbfca5f7750.

 drivers/scsi/scsi_transport_sas.c | 38 +++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 13412702188e4..45609259f27db 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/jiffies.h>
 #include <linux/err.h>
+#include <linux/log2.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -222,12 +223,42 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
  * SAS host attributes
  */
 
+/*
+ * Set shost->opt_sectors from the DMA optimal mapping size, but only
+ * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
+ * indicating a genuine optimization hint from an IOMMU or DMA backend.
+ * When the two are equal (e.g. IOMMU disabled / passthrough), no real
+ * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
+ * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
+ */
+static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
+{
+	struct device *dma_dev = shost->dma_dev;
+	size_t opt, max;
+	unsigned int opt_sectors;
+
+	if (!dma_dev->dma_mask)
+		return;
+
+	opt = dma_opt_mapping_size(dma_dev);
+	max = dma_max_mapping_size(dma_dev);
+
+	if (opt >= max)
+		return;
+
+	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
+			    shost->max_sectors);
+	if (!opt_sectors)
+		return;
+
+	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);
+}
+
 static int sas_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
-	struct device *dma_dev = shost->dma_dev;
 
 	INIT_LIST_HEAD(&sas_host->rphy_list);
 	mutex_init(&sas_host->lock);
@@ -239,10 +270,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
 		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
 			   shost->host_no);
 
-	if (dma_dev->dma_mask) {
-		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
-				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
-	}
+	sas_dma_setup_opt_sectors(shost);
 
 	return 0;
 }
-- 
2.53.0



* Re: [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-04-15  7:18 ` [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint Ionut Nechita (Wind River)
@ 2026-04-24 13:21   ` Christoph Hellwig
  2026-04-28  8:15     ` John Garry
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2026-04-24 13:21 UTC
  To: Ionut Nechita (Wind River)
  Cc: James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel, stable, hch, dlemoal, robin.murphy, john.g.garry,
	axboe, m.szyprowski, ahuang12, ionut_n2001, sunlightlinux

On Wed, Apr 15, 2026 at 10:18:49AM +0300, Ionut Nechita (Wind River) wrote:
> +/*
> + * Set shost->opt_sectors from the DMA optimal mapping size, but only
> + * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
> + * indicating a genuine optimization hint from an IOMMU or DMA backend.
> + * When the two are equal (e.g. IOMMU disabled / passthrough), no real
> + * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
> + * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
> + */
> +static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
> +{
> +	struct device *dma_dev = shost->dma_dev;
> +	size_t opt, max;
> +	unsigned int opt_sectors;
> +
> +	if (!dma_dev->dma_mask)
> +		return;

Upper layers have no real business looking at dma_dev->dma_mask. What
is this check intended to do?

> +
> +	opt = dma_opt_mapping_size(dma_dev);
> +	max = dma_max_mapping_size(dma_dev);
> +
> +	if (opt >= max)
> +		return;
> +
> +	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
> +			    shost->max_sectors);
> +	if (!opt_sectors)
> +		return;
> +
> +	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);

Please add comments explaining the logic.



* Re: [PATCH v7 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-04-24 13:21   ` Christoph Hellwig
@ 2026-04-28  8:15     ` John Garry
  0 siblings, 0 replies; 4+ messages in thread
From: John Garry @ 2026-04-28  8:15 UTC
  To: Christoph Hellwig, Ionut Nechita (Wind River)
  Cc: James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel, stable, dlemoal, robin.murphy, axboe, m.szyprowski,
	ahuang12, ionut_n2001, sunlightlinux

On 24/04/2026 14:21, Christoph Hellwig wrote:

Responding to get things moving.

>> + */
>> +static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
>> +{
>> +	struct device *dma_dev = shost->dma_dev;
>> +	size_t opt, max;
>> +	unsigned int opt_sectors;
>> +
>> +	if (!dma_dev->dma_mask)
>> +		return;
> Upper layers have no real business looking at dma_dev->dma_mask. What
> is this check intended to do?

Back when that check was introduced, dma_max_mapping_size() could crash
for some SCSI hosts. See
https://lore.kernel.org/linux-scsi/BYAPR04MB58168CBFF8B691DF33C73DDBE7C40@BYAPR04MB5816.namprd04.prod.outlook.com/

scsi_debug is an example of such a shost, as it is not DMA capable.
In my limited testing that crash no longer occurs. I think this comes
down to the newer checks in dma_addressing_limited() ->
__dma_addressing_limited() for dma_mask being set. So we may be able to
get rid of that dma_mask check.
