public inbox for stable@vger.kernel.org
From: John Garry <john.g.garry@oracle.com>
To: "Ionut Nechita (Wind River)" <ionut.nechita@windriver.com>,
	"James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: ahuang12@lenovo.com, axboe@kernel.dk, dlemoal@kernel.org,
	hch@lst.de, ionut_n2001@yahoo.com, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, m.szyprowski@samsung.com,
	robin.murphy@arm.com, sunlightlinux@gmail.com,
	stable@vger.kernel.org
Subject: Re: [PATCH v5 1/1] scsi: sas: skip opt_sectors when DMA reports no real optimization hint
Date: Mon, 23 Mar 2026 08:45:05 +0000
Message-ID: <c03ccf98-a86c-4c30-bc24-e76178ea8bdd@oracle.com>
In-Reply-To: <20260320081429.42106-2-ionut.nechita@windriver.com>

On 20/03/2026 08:14, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> sas_host_setup() unconditionally sets shost->opt_sectors from
> dma_opt_mapping_size().  When the IOMMU is disabled or in passthrough
> mode and no DMA ops provide an opt_mapping_size callback,
> dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX)
> which equals dma_max_mapping_size() — a hard upper bound, not an
> optimization hint.
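To make the fallback concrete, here is a minimal userspace sketch in plain C (not kernel code). model_dma_opt_mapping_size() and its has_hint/hint parameters are hypothetical stand-ins for the IOMMU or DMA-ops lookup; the only behavior it tries to capture is the min() described above, under which "no hint" always yields exactly the maximum mapping size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the fallback described above; this is
 * NOT the kernel's dma_opt_mapping_size().  'has_hint' stands in for
 * "an IOMMU or DMA ops backend supplied an opt_mapping_size value".
 */
static size_t model_dma_opt_mapping_size(bool has_hint, size_t hint,
					 size_t max_mapping_size)
{
	/* Without a backend hint the starting value is SIZE_MAX. */
	size_t size = has_hint ? hint : SIZE_MAX;

	/* Same min() the commit message describes. */
	return size < max_mapping_size ? size : max_mapping_size;
}
```

With no hint, model_dma_opt_mapping_size(false, 0, max) returns max itself, which is why the patch can use opt == max as its "no real hint" signal.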
> 
> On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
> and intel_iommu=off the following values are observed:
> 
>    dma_opt_mapping_size()  = dma_max_mapping_size() (no real hint)
>    shost->max_sectors      = 32767
>    opt_sectors             = min(32767, huge >> 9) = 32767
>    optimal_io_size         = 32767 << 9 = 16776704
>                            → round_down(16776704, 4096) = 16773120
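The arithmetic in the table can be reproduced with a small userspace sketch, assuming 512-byte sectors and taking 1 TiB (1ULL << 40) as a stand-in for the "huge" DMA maximum. model_opt_sectors() and model_round_down() are hypothetical helpers, not kernel functions.

```c
#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Hypothetical: min(max_sectors, opt_bytes >> 9), as in the table above. */
static unsigned int model_opt_sectors(unsigned int max_sectors,
				      unsigned long long opt_bytes)
{
	unsigned long long opt = opt_bytes >> SECTOR_SHIFT;

	return opt < max_sectors ? (unsigned int)opt : max_sectors;
}

/* Hypothetical round_down(); 'align' must be a power of two. */
static unsigned long long model_round_down(unsigned long long x,
					   unsigned long long align)
{
	return x & ~(align - 1);
}
```

For example, model_opt_sectors(32767, 1ULL << 40) gives 32767, shifting that left by 9 gives 16776704, and model_round_down(16776704, 4096) gives 16773120.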
> 
> The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an
> Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0.
> sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) = opt_sectors,
> propagating the bogus value into the block device's optimal_io_size
> (visible as OPT-IO = 16773120 in lsblk --topology).
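The min_not_zero() step is what lets the host value through. A hypothetical userspace copy of its semantics (the smaller of two values, ignoring a zero operand) shows both cases: a disk reporting 0 inherits the host's opt_sectors, and if the host also reports 0 the result stays 0.

```c
/*
 * Hypothetical userspace version of min_not_zero(): return the smaller
 * of a and b, but treat 0 as "not set" and ignore it.
 */
static unsigned int model_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}
```

So model_min_not_zero(0, 32767) is 32767 (the case above), while model_min_not_zero(0, 0) stays 0.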
> 
> mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:
> 
>    swidth = 16773120 / 4096 = 4095
>    sunit  = 8192 / 4096     = 2
> 
> Since 4095 % 2 != 0, XFS rejects the geometry:
> 
>    SB stripe unit sanity check failed
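A rough sketch of the check that fails, assuming swidth and sunit are simply the byte values divided by the 4096-byte block size, as in the numbers above. model_stripe_geometry_ok() is a hypothetical helper and is not taken from mkfs.xfs or the XFS kernel code.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the stripe sanity check described above:
 * both hints are converted to filesystem blocks, and the stripe width
 * must be a whole multiple of the stripe unit.  Not mkfs.xfs code.
 */
static bool model_stripe_geometry_ok(unsigned long long opt_io,
				     unsigned long long min_io,
				     unsigned int blocksize)
{
	unsigned long long swidth = opt_io / blocksize;	/* e.g. 4095 */
	unsigned long long sunit = min_io / blocksize;	/* e.g. 2 */

	/* A zero stripe unit is treated as invalid in this model. */
	return sunit != 0 && swidth % sunit == 0;
}
```

With the bogus 16773120 the check fails (4095 % 2 == 1); a power-of-two optimal_io_size such as 16777216 gives 4096 % 2 == 0 and passes.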
> 
> This makes it impossible to create XFS filesystems (e.g. for
> /var/lib/docker) during system bootstrap.
> 
> Fix this by introducing a sas_dma_opt_sectors() helper that only returns
> a non-zero opt_sectors when dma_opt_mapping_size() is strictly less than
> dma_max_mapping_size(), indicating a genuine DMA optimization constraint
> from an IOMMU or DMA ops backend.  The helper also rounds the value down
> to a power of two so that filesystem geometry calculations always produce
> clean results.  When the two DMA values are equal, no backend provided a
> real hint, so opt_sectors stays at 0 ("no preference").
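A userspace sketch of the decision the helper makes, assuming the two DMA sizes are passed in directly rather than queried from a struct device. model_v5_opt_sectors() and model_rounddown_pow_of_two() are hypothetical. The WARN_ONCE path is left out, and the final cap compares in size_t before narrowing, which is not exactly how min_t(unsigned int, ...) behaves.

```c
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Stand-in for rounddown_pow_of_two(); x must be non-zero. */
static size_t model_rounddown_pow_of_two(size_t x)
{
	size_t p = 1;

	/* Grow p while doubling it would still not exceed x. */
	while (p <= x / 2)
		p <<= 1;
	return p;
}

static unsigned int model_v5_opt_sectors(size_t opt, size_t max,
					 unsigned int max_sectors)
{
	size_t sectors;

	/* opt == max: no backend supplied a real hint ("no preference"). */
	if (!opt || opt == max)
		return 0;

	opt = model_rounddown_pow_of_two(opt);
	sectors = opt >> SECTOR_SHIFT;

	/* Cap at the host limit; compared in size_t before narrowing. */
	return sectors < max_sectors ? (unsigned int)sectors : max_sectors;
}
```

For instance, a 300 KiB hint becomes 512 sectors (256 KiB), while opt == max yields 0.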
> 
> A WARN_ONCE guards against dma_opt_mapping_size() returning a value
> larger than dma_max_mapping_size(), which would indicate a driver bug.
> A zero check on opt guards against undefined behavior from
> rounddown_pow_of_two(0).  The return value uses min_t(unsigned int, ...)
> to avoid any potential overflow when shifting the size_t opt value down
> to sectors.
> 
> Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>   drivers/scsi/scsi_transport_sas.c | 52 ++++++++++++++++++++++++++++---
>   1 file changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> index 12124f9d5ccd0..a5207caf8565e 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -27,6 +27,7 @@
>   #include <linux/module.h>
>   #include <linux/jiffies.h>
>   #include <linux/err.h>
> +#include <linux/log2.h>
>   #include <linux/slab.h>
>   #include <linux/string.h>
>   #include <linux/blkdev.h>
> @@ -222,6 +223,50 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
>    * SAS host attributes
>    */
>   
> +/**

this is static, so it really should not be a kerneldoc comment

> + * sas_dma_opt_sectors - derive opt_sectors from DMA optimal mapping size
> + * @dma_dev: device to query DMA parameters for
> + * @max_sectors: upper bound from the host adapter
> + *
> + * When the DMA layer reports a genuine optimization constraint (i.e.
> + * dma_opt_mapping_size() < dma_max_mapping_size()), convert it to a
> + * sector count, round it down to a power of two so that filesystem
> + * geometry calculations stay sane, and cap it at @max_sectors.
> + *
> + * When the two values are equal no backend provided a real hint and
> + * the function returns 0 ("no preference").  This happens when the
> + * IOMMU is disabled or in passthrough mode: dma_opt_mapping_size()
> + * falls back to min(SIZE_MAX, dma_max_mapping_size()) which equals
> + * dma_max_mapping_size().  Letting that value through would produce
> + * opt_sectors == max_sectors (e.g. 32767), leading to bogus
> + * optimal_io_size values that break mkfs.xfs stripe geometry.
> + *
> + * A WARN_ONCE guards against dma_opt_mapping_size() returning a value
> + * larger than dma_max_mapping_size(), which would indicate a driver bug.
> + */

too much is written here; it can be one paragraph

> +static unsigned int sas_dma_opt_sectors(struct device *dma_dev,
> +					unsigned int max_sectors)
> +{
> +	size_t opt = dma_opt_mapping_size(dma_dev);
> +	size_t max = dma_max_mapping_size(dma_dev);
> +
> +	if (WARN_ONCE(opt > max,
> +		      "dma_opt_mapping_size (%zu) > dma_max_mapping_size (%zu)\n",
> +		      opt, max))
> +		return 0;
> +

this cannot (or should not) happen, so it is not the driver's job to check it

> +	/* opt == max means no backend provided a real hint; see above. */
> +	if (opt == max)
> +		return 0;
> +
> +	if (!opt)
> +		return 0;

this can be combined with the previous check:

	if (!opt || opt == max)
		return 0;

> +
> +	opt = rounddown_pow_of_two(opt);
> +
> +	return min_t(unsigned int, opt >> SECTOR_SHIFT, max_sectors);

max_sectors can really be any value (it need not be a power of 2).

I think it would be better to take the min of opt sectors and
max_sectors and then round that down to a power of 2.
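To illustrate with a hypothetical userspace sketch (function names invented for the example, 512-byte sectors assumed): with max_sectors = 32767 and a DMA hint of 32 MiB, rounding first and capping second still yields the non-power-of-2 value 32767, while capping first and rounding second yields 16384.

```c
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Stand-in for rounddown_pow_of_two(); x must be non-zero. */
static size_t model_rounddown_pow_of_two(size_t x)
{
	size_t p = 1;

	while (p <= x / 2)
		p <<= 1;
	return p;
}

/* v5 order: round the DMA size down first, then cap at max_sectors. */
static unsigned int opt_round_then_cap(size_t opt, unsigned int max_sectors)
{
	size_t s = model_rounddown_pow_of_two(opt) >> SECTOR_SHIFT;

	return s < max_sectors ? (unsigned int)s : max_sectors;
}

/* Suggested order: cap at max_sectors first, then round down. */
static unsigned int opt_cap_then_round(size_t opt, unsigned int max_sectors)
{
	size_t s = opt >> SECTOR_SHIFT;

	if (s > max_sectors)
		s = max_sectors;
	/* Guard: rounding down 0 is undefined for the kernel helper. */
	return s ? (unsigned int)model_rounddown_pow_of_two(s) : 0;
}
```

When the hint is below max_sectors (e.g. 256 KiB, 512 sectors) both orders agree; they differ only when max_sectors is the binding limit.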

> +}
> +
>   static int sas_host_setup(struct transport_container *tc, struct device *dev,
>   			  struct device *cdev)
>   {
> @@ -239,10 +284,9 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
>   		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
>   			   shost->host_no);
>   
> -	if (dma_dev->dma_mask) {
> -		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
> -				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
> -	}
> +	if (dma_dev->dma_mask)

I think that we don't need to declare dma_dev here, so we can have:

static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
{
	struct device *dma_dev = shost->dma_dev;

	if (!dma_dev->dma_mask)
		return;

	/* continue to evaluate opt sectors */
	...
}
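Putting these suggestions together, a userspace model might look like the following. struct model_shost and model_setup_opt_sectors() are hypothetical: the struct fields stand in for shost->dma_dev->dma_mask, dma_opt_mapping_size(), dma_max_mapping_size() and the Scsi_Host sector limits, so this only sketches the control flow, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Hypothetical stand-in for the Scsi_Host state the helper touches. */
struct model_shost {
	bool has_dma_mask;	/* models shost->dma_dev->dma_mask != NULL */
	size_t dma_opt;		/* models dma_opt_mapping_size(dma_dev) */
	size_t dma_max;		/* models dma_max_mapping_size(dma_dev) */
	unsigned int max_sectors;
	unsigned int opt_sectors;
};

/* Stand-in for rounddown_pow_of_two(); x must be non-zero. */
static size_t model_rounddown_pow_of_two(size_t x)
{
	size_t p = 1;

	while (p <= x / 2)
		p <<= 1;
	return p;
}

static void model_setup_opt_sectors(struct model_shost *shost)
{
	size_t sectors;

	if (!shost->has_dma_mask)
		return;

	/* No real backend hint: leave opt_sectors untouched (0 here). */
	if (!shost->dma_opt || shost->dma_opt == shost->dma_max)
		return;

	/* Cap at max_sectors first, then round down to a power of 2. */
	sectors = shost->dma_opt >> SECTOR_SHIFT;
	if (sectors > shost->max_sectors)
		sectors = shost->max_sectors;
	if (sectors)
		shost->opt_sectors =
			(unsigned int)model_rounddown_pow_of_two(sectors);
}
```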

> +		shost->opt_sectors =
> +			sas_dma_opt_sectors(dma_dev, shost->max_sectors);
>   
>   	return 0;
>   }


