From: Christoph Hellwig <hch@lst.de>
To: Robin Murphy <robin.murphy@arm.com>
Cc: "Ionut Nechita (Wind River)" <ionut.nechita@windriver.com>,
m.szyprowski@samsung.com, kbusch@kernel.org, axboe@kernel.dk,
sagi@grimberg.me, martin.petersen@oracle.com,
damien.lemoal@opensource.wdc.com, john.g.garry@oracle.com,
ahuang12@lenovo.com, iommu@lists.linux.dev,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, ionut_n2001@yahoo.com,
sunlightlinux@gmail.com
Subject: Re: [PATCH v1 1/2] dma: return 0 from dma_opt_mapping_size() when no real hint exists
Date: Tue, 17 Mar 2026 15:19:35 +0100
Message-ID: <20260317141935.GB3539@lst.de>
In-Reply-To: <d2e0fe45-31ae-4879-951e-7d0494d764e4@arm.com>

On Tue, Mar 17, 2026 at 09:43:46AM +0000, Robin Murphy wrote:
> On 2026-03-16 8:39 pm, Ionut Nechita (Wind River) wrote:
>> From: Ionut Nechita <ionut.nechita@windriver.com>
>>
>> dma_opt_mapping_size() currently initializes its local size to SIZE_MAX
>> and, when neither an IOMMU nor a DMA ops opt_mapping_size callback is
>> present, returns min(dma_max_mapping_size(dev), SIZE_MAX). That value
>> is a large but finite number that has nothing to do with an optimal
>> transfer size — it is simply the maximum the DMA layer can map.
>
> No, the current code is correct. dma_opt_mapping_size() represents the
> largest size that can be mapped without incurring any significant
> performance penalty (compared to smaller sizes). If the implementation has
> no such restriction, then the largest "efficient" size is quite obviously
> just the largest size in total.

Yes.

>> Callers such as scsi_transport_sas treat the return value as a genuine
>> optimization hint and propagate it into Scsi_Host.opt_sectors, which in
>> turn becomes the block device's optimal_io_size. On SAS controllers
>> like mpt3sas running with IOMMU in passthrough mode the bogus value
>> (max_sectors << 9 = 16776704, rounded to 16773120) reaches mkfs.xfs,
>> which computes swidth=4095 and sunit=2. Because 4095 is not a multiple
>> of 2, XFS rejects the geometry with "SB stripe unit sanity check
>> failed", making it impossible to create filesystems during system
>> bootstrap.
>
> And that is obviously a bug. There has never been any guarantee offered
> about the values returned by either dma_max_mapping_size() or
> dma_opt_mapping_size() - they could be very large, very small, and
> certainly do not have to be powers of 2. Say an implementation has some
> internal data size optimisation that makes U32_MAX its largest "efficient"
> size, it's free to return that, and then you'll still have the same bug
> regardless of this bodge.

Yes, the SCSI/SAS code needs to properly round the value.

But we might also need to split the values up a bit, as tools just
assign too much weight to the I/O opt value.  I.e. the file system
geometry really should not be affected by the IOMMU details.

>
> Fix the actual bug, don't break common code in an attempt to paper over it
> that doesn't even achieve that very well.
>
> Thanks,
> Robin.
>
>> Fix this by returning 0 when no backend provides an optimal mapping size
>> hint. A return value of 0 unambiguously means "no preference" and lets
>> callers that use min() or min_not_zero() do the right thing without
>> special-casing.
>>
>> The only other in-tree caller (nvme-pci) is adjusted in the next patch.
>>
>> Fixes: a229cc14f339 ("dma-mapping: add dma_opt_mapping_size()")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
>> ---
>> kernel/dma/mapping.c | 13 ++++++++-----
>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
>> index 78d8b4039c3e6..fffa6a3f191a3 100644
>> --- a/kernel/dma/mapping.c
>> +++ b/kernel/dma/mapping.c
>> @@ -984,14 +984,17 @@ EXPORT_SYMBOL_GPL(dma_max_mapping_size);
>>   size_t dma_opt_mapping_size(struct device *dev)
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>> -	size_t size = SIZE_MAX;
>>
>>   	if (use_dma_iommu(dev))
>> -		size = iommu_dma_opt_mapping_size();
>> -	else if (ops && ops->opt_mapping_size)
>> -		size = ops->opt_mapping_size();
>> +		return iommu_dma_opt_mapping_size();
>> +	if (ops && ops->opt_mapping_size)
>> +		return ops->opt_mapping_size();
>>
>> -	return min(dma_max_mapping_size(dev), size);
>> +	/*
>> +	 * No backend provided an optimal size hint. Return 0 so that
>> +	 * callers can distinguish "no hint" from a real value.
>> +	 */
>> +	return 0;
>>   }
>>   EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
>>
---end quoted text---