Linux SCSI subsystem development
* [PATCH v4 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size
@ 2026-03-19  8:39 Ionut Nechita (Wind River)
  2026-03-19  8:39 ` [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint Ionut Nechita (Wind River)
  0 siblings, 1 reply; 7+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-03-19  8:39 UTC
  To: linux-scsi
  Cc: James.Bottomley, ahuang12, axboe, damien.lemoal, dlemoal, hch,
	iommu, ionut_n2001, john.g.garry, kbusch, linux-kernel,
	linux-nvme, m.szyprowski, martin.petersen, robin.murphy, sagi,
	stable, sunlightlinux, Ionut Nechita

From: Ionut Nechita <ionut.nechita@windriver.com>

v4 (per Damien Le Moal's review of v3):
  - Split the opt >= max check into a WARN_ONCE for the impossible
    opt > max case (driver bug) and a plain == check for the "no hint"
    case.
  - Used min_t(unsigned int, ...) for the return value to avoid any
    potential overflow when shifting size_t down to sectors.
  - Reformatted the call site as suggested:
      shost->opt_sectors =
          sas_dma_opt_sectors(dma_dev, shost->max_sectors);

v3 (per Christoph Hellwig's review of v2):
  - Extracted the opt_sectors logic into a dedicated sas_dma_opt_sectors()
    helper function, clearly split out from sas_host_setup().
  - Added rounddown_pow_of_two() on the DMA optimal mapping size so that
    the resulting opt_sectors is always a power of two, keeping filesystem
    geometry calculations clean.
  - Added #include <linux/log2.h> for rounddown_pow_of_two().

v2:
  - Dropped the dma_opt_mapping_size() change per Robin Murphy's feedback:
    the DMA core semantics are correct, the bug is in the caller.
  - Dropped the nvme-pci patch (no longer needed).
  - Single patch now fixes the actual bug in scsi_transport_sas.c.

v1 feedback summary:
  - Robin Murphy: dma_opt_mapping_size() semantics are correct; if no
    restriction exists, the largest efficient size IS the largest size.
    Fix the caller, not the common code.
  - John Garry: Asked for concrete max_sectors/opt_sectors values and
    questioned whether sd_revalidate_disk() would override opt_sectors
    via opt_xfer_blocks.
  - Damien Le Moal: Suggested min_not_zero() for nvme-pci (now moot).

Answer to John's question (from v2, still relevant):
  The SAS disks on this system do not report Optimal Transfer Length in
  VPD page B0, so sdkp->opt_xfer_blocks = 0.  sd_revalidate_disk() uses
  min_not_zero(0, opt_sectors) which returns opt_sectors, propagating
  the bogus value.  Observed values:

    shost->max_sectors      = 32767
    opt_sectors             = 32767  (capped at max_sectors)
    optimal_io_size         = 16773120  (visible in lsblk --topology)
    minimum_io_size         = 8192

  mkfs.xfs computes swidth=4095, sunit=2, fails because 4095 % 2 != 0.
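
  The arithmetic above can be reproduced with a minimal userspace sketch.
  It is not kernel code: min_not_zero_u32(), struct stripe_geometry and
  model_stripe_geometry() are hypothetical names, 512-byte sectors are
  assumed, and the 4096-byte block size is the value used in the figures
  quoted in this thread.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's min_not_zero(): returns the
 * smaller of two values while treating 0 as "not set".
 */
static uint32_t min_not_zero_u32(uint32_t a, uint32_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Hypothetical container for the values discussed in the thread. */
struct stripe_geometry {
	uint64_t io_opt;	/* optimal_io_size in bytes */
	uint64_t swidth;	/* stripe width in filesystem blocks */
	uint64_t sunit;		/* stripe unit in filesystem blocks */
};

static struct stripe_geometry
model_stripe_geometry(uint32_t disk_opt_sectors, uint32_t host_opt_sectors,
		      uint64_t io_min, uint64_t block_size)
{
	struct stripe_geometry g;
	uint64_t bytes;

	/* A disk value of 0 lets the host value through unchanged. */
	bytes = (uint64_t)min_not_zero_u32(disk_opt_sectors,
					   host_opt_sectors) << 9;

	/* Round down to a whole number of blocks (block_size != 0). */
	g.io_opt = bytes - (bytes % block_size);
	g.swidth = g.io_opt / block_size;
	g.sunit = io_min / block_size;
	return g;
}
```

  Calling model_stripe_geometry(0, 32767, 8192, 4096) yields an io_opt of
  16773120 bytes, swidth 4095 and sunit 2, so swidth is not a multiple of
  sunit.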

Test environment:
  - Dell PowerEdge R750
  - SAS Controller: Broadcom/LSI mpt3sas (SAS3816, FW 33.15.00.00)
  - Disks: SAMSUNG MZILT800HBHQ0D3 (800GB SCSI SAS SSD)
  - Kernel: 6.12.0-1-amd64 with intel_iommu=off
  - IOMMU: Disabled (DMAR: IOMMU disabled), default domain: Passthrough

Based on linux-next (next-20260318).

Link: https://lore.kernel.org/lkml/20260316203956.64515-1-ionut.nechita@windriver.com/
Link: https://lore.kernel.org/all/20260318074314.17372-1-ionut.nechita@windriver.com/
Link: https://lore.kernel.org/all/20260318200532.51232-1-ionut.nechita@windriver.com/

Ionut Nechita (1):
  scsi: sas: skip opt_sectors when DMA reports no real optimization hint

 drivers/scsi/scsi_transport_sas.c | 40 +++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

--
2.43.0

* [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-03-19  8:39 [PATCH v4 0/1] scsi: sas: fix mkfs.xfs failure due to bogus optimal_io_size Ionut Nechita (Wind River)
@ 2026-03-19  8:39 ` Ionut Nechita (Wind River)
  2026-03-19 11:06   ` Damien Le Moal
  2026-03-19 11:07   ` Damien Le Moal
  0 siblings, 2 replies; 7+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-03-19  8:39 UTC
  To: linux-scsi
  Cc: James.Bottomley, ahuang12, axboe, damien.lemoal, dlemoal, hch,
	iommu, ionut_n2001, john.g.garry, kbusch, linux-kernel,
	linux-nvme, m.szyprowski, martin.petersen, robin.murphy, sagi,
	stable, sunlightlinux, Ionut Nechita

From: Ionut Nechita <ionut.nechita@windriver.com>

sas_host_setup() unconditionally sets shost->opt_sectors from
dma_opt_mapping_size().  When the IOMMU is disabled or in passthrough
mode and no DMA ops provide an opt_mapping_size callback,
dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
which equals dma_max_mapping_size() — a hard upper bound, not an
optimization hint.

On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
and intel_iommu=off the following values are observed:

  dma_opt_mapping_size()  = dma_max_mapping_size() (no real hint)
  shost->max_sectors      = 32767
  opt_sectors             = min(32767, huge >> 9) = 32767
  optimal_io_size         = 32767 << 9 = 16776704
                          → round_down(16776704, 4096) = 16773120

The SAS disks (SAMSUNG MZILT800HBHQ0D3) do not report an
Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0.
sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) = opt_sectors,
propagating the bogus value into the block device's optimal_io_size
(visible as OPT-IO = 16773120 in lsblk --topology).

mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:

  swidth = 16773120 / 4096 = 4095
  sunit  = 8192 / 4096     = 2

Since 4095 % 2 != 0, XFS rejects the geometry:

  SB stripe unit sanity check failed

This makes it impossible to create XFS filesystems (e.g. for
/var/lib/docker) during system bootstrap.

Fix this by introducing a sas_dma_opt_sectors() helper that only returns
a non-zero opt_sectors when dma_opt_mapping_size() is strictly less than
dma_max_mapping_size(), indicating a genuine DMA optimization constraint
from an IOMMU or DMA ops backend.  The helper also rounds the value down
to a power of two so that filesystem geometry calculations always produce
clean results.  When the two DMA values are equal, no backend provided a
real hint, so opt_sectors stays at 0 ("no preference").

A WARN_ONCE guards against dma_opt_mapping_size() returning a value
larger than dma_max_mapping_size(), which would indicate a driver bug.
The return value uses min_t(unsigned int, ...) to avoid any potential
overflow when shifting the size_t opt value down to sectors.

Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
Cc: stable@vger.kernel.org
Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
---
 drivers/scsi/scsi_transport_sas.c | 40 +++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 12124f9d5ccd0..696627b6fe2c3 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/jiffies.h>
 #include <linux/err.h>
+#include <linux/log2.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -222,6 +223,38 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
  * SAS host attributes
  */
 
+/**
+ * sas_dma_opt_sectors - derive opt_sectors from DMA optimal mapping size
+ * @dma_dev: device to query DMA parameters for
+ * @max_sectors: upper bound from the host adapter
+ *
+ * When the DMA layer reports a genuine optimization constraint (i.e.
+ * dma_opt_mapping_size() < dma_max_mapping_size()), convert it to a
+ * sector count, round it down to a power of two so that filesystem
+ * geometry calculations stay sane, and cap it at @max_sectors.
+ *
+ * When the two values are equal no backend provided a real hint and
+ * the function returns 0 ("no preference").
+ */
+static unsigned int sas_dma_opt_sectors(struct device *dma_dev,
+					unsigned int max_sectors)
+{
+	size_t opt = dma_opt_mapping_size(dma_dev);
+	size_t max = dma_max_mapping_size(dma_dev);
+
+	if (WARN_ONCE(opt > max,
+		      "dma_opt_mapping_size (%zu) > dma_max_mapping_size (%zu)\n",
+		      opt, max))
+		return 0;
+
+	if (opt == max)
+		return 0;
+
+	opt = rounddown_pow_of_two(opt);
+
+	return min_t(unsigned int, opt >> SECTOR_SHIFT, max_sectors);
+}
+
 static int sas_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
 {
@@ -239,10 +272,9 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
 		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
 			   shost->host_no);
 
-	if (dma_dev->dma_mask) {
-		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
-				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
-	}
+	if (dma_dev->dma_mask)
+		shost->opt_sectors =
+			sas_dma_opt_sectors(dma_dev, shost->max_sectors);
 
 	return 0;
 }
-- 
2.53.0
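
To experiment with the helper's control flow outside the kernel, the sketch below models it in userspace. It is an approximation under stated assumptions, not the patch itself: the two DMA sizes are passed in as plain parameters instead of being queried from a device, WARN_ONCE() is reduced to an early return, and the names ending in _model are hypothetical.

```c
#include <stddef.h>

#define SECTOR_SHIFT 9

/*
 * Stand-in for the kernel's rounddown_pow_of_two(). The kernel macro
 * documents its result as undefined for n == 0; this version returns 0.
 */
static size_t rounddown_pow_of_two_model(size_t n)
{
	size_t p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p *= 2;
	return p;
}

static unsigned int sas_dma_opt_sectors_model(size_t opt, size_t max,
					      unsigned int max_sectors)
{
	unsigned int sectors;

	/* opt > max: the patch warns once and reports "no preference". */
	if (opt > max)
		return 0;

	/* opt == max: no backend supplied a real hint. */
	if (opt == max)
		return 0;

	opt = rounddown_pow_of_two_model(opt);

	/* min_t(unsigned int, ...) converts both operands before comparing. */
	sectors = (unsigned int)(opt >> SECTOR_SHIFT);
	return sectors < max_sectors ? sectors : max_sectors;
}
```

With invented inputs, a 128 KiB hint under a 1 GiB maximum gives 256 sectors, a 100000-byte hint is rounded down to 64 KiB and gives 128 sectors, and equal values give 0.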


* Re: [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-03-19  8:39 ` [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint Ionut Nechita (Wind River)
@ 2026-03-19 11:06   ` Damien Le Moal
  2026-03-19 11:07   ` Damien Le Moal
  1 sibling, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2026-03-19 11:06 UTC
  To: Ionut Nechita (Wind River), linux-scsi
  Cc: James.Bottomley, ahuang12, axboe, damien.lemoal, hch, iommu,
	ionut_n2001, john.g.garry, kbusch, linux-kernel, linux-nvme,
	m.szyprowski, martin.petersen, robin.murphy, sagi, stable,
	sunlightlinux

On 3/19/26 17:39, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> sas_host_setup() unconditionally sets shost->opt_sectors from
> dma_opt_mapping_size().  When the IOMMU is disabled or in passthrough
> mode and no DMA ops provide an opt_mapping_size callback,
> dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
> which equals dma_max_mapping_size() — a hard upper bound, not an
> optimization hint.

Please reduce the distribution list. This is now a scsi patch. Nothing to do
with iommu or nvme.

> 
> On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
> and intel_iommu=off the following values are observed:
> 
>   dma_opt_mapping_size()  = dma_max_mapping_size() (no real hint)
>   shost->max_sectors      = 32767
>   opt_sectors             = min(32767, huge >> 9) = 32767
>   optimal_io_size         = 32767 << 9 = 16776704
>                           → round_down(16776704, 4096) = 16773120
> 
> The SAS disks (SAMSUNG MZILT800HBHQ0D3) do not report an
> Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0.
> sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) = opt_sectors,
> propagating the bogus value into the block device's optimal_io_size
> (visible as OPT-IO = 16773120 in lsblk --topology).
> 
> mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:
> 
>   swidth = 16773120 / 4096 = 4095
>   sunit  = 8192 / 4096     = 2
> 
> Since 4095 % 2 != 0, XFS rejects the geometry:
> 
>   SB stripe unit sanity check failed
> 
> This makes it impossible to create XFS filesystems (e.g. for
> /var/lib/docker) during system bootstrap.
> 
> Fix this by introducing a sas_dma_opt_sectors() helper that only returns
> a non-zero opt_sectors when dma_opt_mapping_size() is strictly less than
> dma_max_mapping_size(), indicating a genuine DMA optimization constraint
> from an IOMMU or DMA ops backend.  The helper also rounds the value down
> to a power of two so that filesystem geometry calculations always produce
> clean results.  When the two DMA values are equal, no backend provided a
> real hint, so opt_sectors stays at 0 ("no preference").
> 
> A WARN_ONCE guards against dma_opt_mapping_size() returning a value
> larger than dma_max_mapping_size(), which would indicate a driver bug.
> The return value uses min_t(unsigned int, ...) to avoid any potential
> overflow when shifting the size_t opt value down to sectors.
> 
> Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>  drivers/scsi/scsi_transport_sas.c | 40 +++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> index 12124f9d5ccd0..696627b6fe2c3 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -27,6 +27,7 @@
>  #include <linux/module.h>
>  #include <linux/jiffies.h>
>  #include <linux/err.h>
> +#include <linux/log2.h>
>  #include <linux/slab.h>
>  #include <linux/string.h>
>  #include <linux/blkdev.h>
> @@ -222,6 +223,38 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
>   * SAS host attributes
>   */
>  
> +/**
> + * sas_dma_opt_sectors - derive opt_sectors from DMA optimal mapping size
> + * @dma_dev: device to query DMA parameters for
> + * @max_sectors: upper bound from the host adapter
> + *
> + * When the DMA layer reports a genuine optimization constraint (i.e.
> + * dma_opt_mapping_size() < dma_max_mapping_size()), convert it to a
> + * sector count, round it down to a power of two so that filesystem
> + * geometry calculations stay sane, and cap it at @max_sectors.
> + *
> + * When the two values are equal no backend provided a real hint and
> + * the function returns 0 ("no preference").
> + */
> +static unsigned int sas_dma_opt_sectors(struct device *dma_dev,
> +					unsigned int max_sectors)
> +{
> +	size_t opt = dma_opt_mapping_size(dma_dev);
> +	size_t max = dma_max_mapping_size(dma_dev);
> +
> +	if (WARN_ONCE(opt > max,
> +		      "dma_opt_mapping_size (%zu) > dma_max_mapping_size (%zu)\n",
> +		      opt, max))
> +		return 0;
> +
> +	if (opt == max)
> +		return 0;
> +
> +	opt = rounddown_pow_of_two(opt);
> +
> +	return min_t(unsigned int, opt >> SECTOR_SHIFT, max_sectors);
> +}
> +
>  static int sas_host_setup(struct transport_container *tc, struct device *dev,
>  			  struct device *cdev)
>  {
> @@ -239,10 +272,9 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
>  		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
>  			   shost->host_no);
>  
> -	if (dma_dev->dma_mask) {
> -		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
> -				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
> -	}
> +	if (dma_dev->dma_mask)
> +		shost->opt_sectors =
> +			sas_dma_opt_sectors(dma_dev, shost->max_sectors);
>  
>  	return 0;
>  }


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-03-19  8:39 ` [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint Ionut Nechita (Wind River)
  2026-03-19 11:06   ` Damien Le Moal
@ 2026-03-19 11:07   ` Damien Le Moal
  2026-03-19 20:43     ` Ionut Nechita (Wind River)
  1 sibling, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2026-03-19 11:07 UTC
  To: Ionut Nechita (Wind River), linux-scsi
  Cc: James.Bottomley, ahuang12, axboe, hch, ionut_n2001, john.g.garry,
	linux-kernel, m.szyprowski, martin.petersen, robin.murphy,
	sunlightlinux

On 3/19/26 17:39, Ionut Nechita (Wind River) wrote:
> +static unsigned int sas_dma_opt_sectors(struct device *dma_dev,
> +					unsigned int max_sectors)
> +{
> +	size_t opt = dma_opt_mapping_size(dma_dev);
> +	size_t max = dma_max_mapping_size(dma_dev);
> +
> +	if (WARN_ONCE(opt > max,
> +		      "dma_opt_mapping_size (%zu) > dma_max_mapping_size (%zu)\n",
> +		      opt, max))
> +		return 0;
> +
> +	if (opt == max)
> +		return 0;

Why return 0 ? This is a valid case, so this should get through the alignment below.

> +
> +	opt = rounddown_pow_of_two(opt);
> +
> +	return min_t(unsigned int, opt >> SECTOR_SHIFT, max_sectors);
> +}
> +


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-03-19 11:07   ` Damien Le Moal
@ 2026-03-19 20:43     ` Ionut Nechita (Wind River)
  2026-03-19 20:49       ` Damien Le Moal
  2026-03-19 21:04       ` James Bottomley
  0 siblings, 2 replies; 7+ messages in thread
From: Ionut Nechita (Wind River) @ 2026-03-19 20:43 UTC
  To: dlemoal
  Cc: James.Bottomley, ahuang12, axboe, hch, ionut.nechita, ionut_n2001,
	john.g.garry, linux-kernel, linux-scsi, m.szyprowski,
	martin.petersen, robin.murphy, sunlightlinux

On Thu, 19 Mar 2026 11:07:00 +0000, Damien Le Moal wrote:
> Why return 0 ? This is a valid case, so this should get through the
> alignment below.

Hi Damien,

Thanks for the review.

The opt == max case is specifically the bug this patch fixes.

When the IOMMU is disabled or in passthrough mode and no DMA ops
provide an opt_mapping_size callback, dma_opt_mapping_size() falls
back to min(SIZE_MAX, dma_max_mapping_size()), which equals
dma_max_mapping_size().  So opt == max.

If we let that value through, rounddown_pow_of_two() produces a
huge power-of-two, and min_t() caps it at max_sectors (32767).
That gives opt_sectors = 32767, which is exactly the bogus value
that breaks mkfs.xfs:

  swidth = 16773120 / 4096 = 4095
  sunit  = 8192 / 4096     = 2
  4095 % 2 != 0  ->  "SB stripe unit sanity check failed"

The key insight (from Robin Murphy's v1 review) is that when no
backend provides a real optimization constraint, the DMA core
returns the largest efficient size == the largest size.  That is
correct DMA semantics, but it means opt == max signals "no
preference", not "the optimal size happens to equal the maximum".

Returning 0 in that case means "no preference", which leaves
opt_sectors at 0 and lets the disk's own geometry (or lack
thereof) determine the I/O size.

Regarding the Cc list: noted, I will trim it for v5 if needed.

Thanks,
Ionut

* Re: [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-03-19 20:43     ` Ionut Nechita (Wind River)
@ 2026-03-19 20:49       ` Damien Le Moal
  2026-03-19 21:04       ` James Bottomley
  1 sibling, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2026-03-19 20:49 UTC
  To: Ionut Nechita (Wind River)
  Cc: James.Bottomley, ahuang12, axboe, hch, ionut_n2001, john.g.garry,
	linux-kernel, linux-scsi, m.szyprowski, martin.petersen,
	robin.murphy, sunlightlinux

On 3/20/26 05:43, Ionut Nechita (Wind River) wrote:
> On Thu, 19 Mar 2026 11:07:00 +0000, Damien Le Moal wrote:
>> Why return 0 ? This is a valid case, so this should get through the
>> alignment below.
> 
> Hi Damien,
> 
> Thanks for the review.
> 
> The opt == max case is specifically the bug this patch fixes.
> 
> When the IOMMU is disabled or in passthrough mode and no DMA ops
> provide an opt_mapping_size callback, dma_opt_mapping_size() falls
> back to min(SIZE_MAX, dma_max_mapping_size()), which equals
> dma_max_mapping_size().  So opt == max.
> 
> If we let that value through, rounddown_pow_of_two() produces a
> huge power-of-two, and min_t() caps it at max_sectors (32767).
> That gives opt_sectors = 32767, which is exactly the bogus value
> that breaks mkfs.xfs:
> 
>   swidth = 16773120 / 4096 = 4095
>   sunit  = 8192 / 4096     = 2
>   4095 % 2 != 0  ->  "SB stripe unit sanity check failed"
> 
> The key insight (from Robin Murphy's v1 review) is that when no
> backend provides a real optimization constraint, the DMA core
> returns the largest efficient size == the largest size.  That is
> correct DMA semantics, but it means opt == max signals "no
> preference", not "the optimal size happens to equal the maximum".
> 
> Returning 0 in that case means "no preference", which leaves
> opt_sectors at 0 and lets the disk's own geometry (or lack
> thereof) determine the I/O size.

Thanks for re-explaining this.

The code needs to carry all of this explanation as a comment so that we do not
trip on this again.

> 
> Regarding the Cc list: noted, I will trim it for v5 if needed.
> 
> Thanks,
> Ionut


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v4] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
  2026-03-19 20:43     ` Ionut Nechita (Wind River)
  2026-03-19 20:49       ` Damien Le Moal
@ 2026-03-19 21:04       ` James Bottomley
  1 sibling, 0 replies; 7+ messages in thread
From: James Bottomley @ 2026-03-19 21:04 UTC
  To: Ionut Nechita (Wind River), dlemoal
  Cc: ahuang12, axboe, hch, ionut_n2001, john.g.garry, linux-kernel,
	linux-scsi, m.szyprowski, martin.petersen, robin.murphy,
	sunlightlinux

On Thu, 2026-03-19 at 22:43 +0200, Ionut Nechita (Wind River) wrote:
> On Thu, 19 Mar 2026 11:07:00 +0000, Damien Le Moal wrote:
> > Why return 0 ? This is a valid case, so this should get through the
> > alignment below.
> 
> Hi Damien,
> 
> Thanks for the review.
> 
> The opt == max case is specifically the bug this patch fixes.
> 
> When the IOMMU is disabled or in passthrough mode and no DMA ops
> provide an opt_mapping_size callback, dma_opt_mapping_size() falls
> back to min(SIZE_MAX, dma_max_mapping_size()), which equals
> dma_max_mapping_size().  So opt == max.
> 
> If we let that value through, rounddown_pow_of_two() produces a
> huge power-of-two, and min_t() caps it at max_sectors (32767).
> That gives opt_sectors = 32767, which is exactly the bogus value
> that breaks mkfs.xfs:
> 
>   swidth = 16773120 / 4096 = 4095
>   sunit  = 8192 / 4096     = 2
>   4095 % 2 != 0  ->  "SB stripe unit sanity check failed"

So if max_sectors is usually 32767 and this breaks xfs, why keep the final
line:

> +	return min_t(unsigned int, opt >> SECTOR_SHIFT, max_sectors);

There are surely situations where the above max_sectors (32767) comes back
as the minimum. Or are you assuming opt >> SECTOR_SHIFT is always less than
max_sectors, in which case there's no need for min_t?

Additionally, I note that the new AI code review:

https://sashiko.dev/#/patchset/20260319083954.21056-1-ionut.nechita%40windriver.com

worries that if opt comes back as its "don't care" value of zero, then
rounddown_pow_of_two(opt) returns a bogus value.

Regards,

James
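
The first concern above can be illustrated with a small userspace sketch of the helper's final line alone. The names cap_opt_sectors() and is_power_of_two() are hypothetical, and the 32 MiB hint is an invented input chosen only because it exceeds max_sectors once converted to sectors. The second concern is consistent with the kernel-doc for rounddown_pow_of_two() in include/linux/log2.h, which describes the result as undefined when n == 0.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SHIFT 9

static bool is_power_of_two(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Models only the last line of the helper; opt_bytes is assumed to be
 * already rounded down to a power of two.
 */
static unsigned int cap_opt_sectors(size_t opt_bytes, unsigned int max_sectors)
{
	/* min_t(unsigned int, ...) converts both operands first. */
	unsigned int sectors = (unsigned int)(opt_bytes >> SECTOR_SHIFT);

	return sectors < max_sectors ? sectors : max_sectors;
}
```

With max_sectors = 32767, a 128 KiB hint stays at 256 sectors, which is a power of two. A 32 MiB hint converts to 65536 sectors and is capped to 32767, which is not a power of two and is the same value that triggers the mkfs.xfs failure described earlier in the thread.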

