From: David Laight <david.laight.linux@gmail.com>
To: Alvin Lim <alvinwylim@gmail.com>
Cc: linux-ide@vger.kernel.org, Damien Le Moal <dlemoal@kernel.org>,
Niklas Cassel <cassel@kernel.org>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] ata: ahci: force 32-bit DMA for ASMedia ASM1166
Date: Sun, 21 Jun 2026 22:57:27 +0100 [thread overview]
Message-ID: <20260621225727.734e5ecc@pumpkin> (raw)
In-Reply-To: <CA+CYLR6Rg-3brg9yCMAKJDr7t=mtu4vP0+aMFs+JhLPWtQxOYA@mail.gmail.com>
On Sun, 21 Jun 2026 22:02:31 +0800
Alvin Lim <alvinwylim@gmail.com> wrote:
> > It seems more likely that the wrong addresses are ending up in the host
> > side of the iommu and using bounce buffers fixes that.
> > Using the wrong addresses is likely to lead to major kernel memory
> > corruptions.
> > Mixing up physical addresses and dma addresses, assuming that memory
> > from dma_alloc_coherent() is physically contiguous, or just losing the
> > high bits of the physical address passed to the iommu seem more likely.
>
> Thank you - this is the right question to ask, and you're correct that
> my commit message overstates what I actually established. I described the
> controller's internal behaviour as fact when it is an inference from the
> failure class, not something I characterised on the silicon. I'll reword
> v2 to claim only what I measured. I also did not instrument the IOVAs
> actually handed to the device, so I won't argue the allocator internals
> with you.
>
> But I don't think a host-side addressing bug fits the observations,
> independent of mechanism, for three reasons:
>
> 1. The fault is specific to this one controller. On this host the AMD
> IOMMU also translates for four NVMe controllers (three of them Ceph
> OSDs under identical load) and the NIC - all doing 64-bit DMA through
> the same iommu, same kernel, same RAM. Every one of them is clean: the
> NVMe OSDs deep-scrub to zero errors. Only the six disks behind the
> ASM1166 corrupt. A host-side bug - phys/dma mixup, a
> dma_alloc_coherent() contiguity assumption, or high bits lost into the
> iommu - is device-agnostic: the mapping path doesn't know which device
> sits behind an IOVA, so it would have to corrupt NVMe too. It doesn't.
> That selectivity is what points at the controller rather than the
> kernel.
>
> 2. It is load-dependent. Single isolated reads are almost always clean;
> only concurrent reads corrupt (six parallel dd of one file return six
> different md5s; serial reads agree). A deterministic host-side
> addressing bug would not switch on and off with concurrency.
>
> 3. The fix holds at scale. Running amd_iommu=off, the controller only
> ever sees real physical addresses - the top of this box's RAM is
> 0x83e2fffff (~33 GiB, a 36-bit address, from /proc/iomem) - and a full
> deep-scrub of the 5.4 TiB / 1.43M-object pool re-read end-to-end with
> zero scrub errors, plus ~4.7 TiB of backfill writes through the same
> path, also at zero.
>
> On your "works with the iommu off, so it supports 64-bit" inference:
> amd_iommu=off does not exercise the 64-bit range on this box. With the
> iommu off the controller is handed untranslated physical addresses, and
> the highest in the map is 0x83e2fffff (36-bit, from /proc/iomem). The
> clean scrub in (3) is therefore strong evidence the controller does
> correct DMA up to ~36-bit - which is also how I know it is not truly
> 32-bit. It says nothing about the full 64-bit range, because no address
> that high is ever presented to it in that configuration.
>
> A note on what I actually validated: the at-scale proof above is under
> amd_iommu=off. The submitted quirk is the IOMMU-preserving equivalent -
> it caps the controller's DMA mask at 32 bits and bounces transfers above
> 4 GiB via SWIOTLB. 32-bit addresses are a strict subset of the ~36-bit
> physical range already shown clean, so the quirk is conservative relative
> to what is validated; I have not separately re-run the scrub under the
> quirk itself, as this is a production NAS and I won't deliberately
> re-induce corruption to characterise it.
I'd definitely look at how the iommu code selects the address the target
side uses.
There is something very fishy going on.
With the iommu enabled whether bounce buffers get used should make no
difference - the address the device sees is different from the physical
address. But with a 32bit dma mask it should always generate low addresses
(regardless of the physical address of the buffer).
One possibility is the iommu code uses increasing addresses for each transfer
(to get uniqueness) but resets the address when idle.
That would mean the a single thread test would keep using the same address
(so it would stay low) but concurrent requests would keep using forever
larger addresses until the target starts aliasing the addresses (it puts
in the request TLP). At which point the requests should hit invalid iommu
pte (or whatever they are called) and the PCIe read/write should fail.
I'm not sure what the device would see - PCIe writes are 'posted' and
async, host reads of invalid addresses return ~0 and trigger an AER error
(if enabled and if anyone can manage to work out what to do, hint an NMI
into the host is not helpful on a 'nebs' rated server).
One effect of setting the dma mask to 32bits is to force swiotbl bounces
when they aren't needed because the iommu is disabled/absent.
I think you are probably using the wrong shaped hammer :-)
David
>
> This is the same shape as the sibling ASMedia ASM1061, quirked to 43-bit
> DMA (20730e9b2778), and JMicron JMB585, quirked to 32-bit (105c42566a55).
> I submitted the conservative 32-bit lower bound (as JMB585 itself
> shipped) rather than assert a width I did not measure; a later change can
> widen it if someone characterises the true value.
>
> If it helps, I'll send a v2 whose commit message states only the measured
> facts - corruption isolated to the ASM1166's disks while every other DMA
> master on the same IOMMU is clean; load-dependent; eliminated by
> constraining the controller's DMA, verified at scale - and leaves the
> internal silicon mechanism as inference consistent with the existing
> ASM1061/JMB585 quirks rather than asserting it.
>
> Would that address your concern?
>
> Thanks,
> Alvin
>
> On Sun, Jun 21, 2026 at 8:48 PM David Laight <david.laight.linux@gmail.com>
> wrote:
>
> > On Sun, 21 Jun 2026 18:08:44 +0800
> > Alvin Lim <alvinwylim@gmail.com> wrote:
> >
> > > The ASMedia ASM1166 SATA controller (1b21:1166) advertises 64-bit DMA
> > > support (AHCI CAP.S64A), but on systems with the IOMMU enabled - where it
> > > can be handed DMA addresses above 4 GB - it silently corrupts data in
> > > transit.
> >
> > That seems wrong.
> > If any iommu is disabled the sata cotroller will be passed the memory's
> > physical address which is very likely to be over 4G.
> > So the controller seems to support 64bit addresses - as advertised.
> >
> > But with the iommu enabled the addresses the controller needs to use are
> > different from the physical address - so the controller will almost
> > certainly see sub 4G addresses for buffers above 4G.
> > (The iommu probably allocates 32bit PCIe addresses for all buffers so that
> > it doesn't have to worry about targets that only support 32bit addresses.)
> >
> > It seems more likely that the wrong addresses are ending up in the host
> > side of the iommu and using bounce buffers fixes that.
> > Using the wrong addresses is likely to lead to major kernel memory
> > corruptions.
> >
> > Mixing up physical addresses and dma addresses, assuming that memory
> > from dma_alloc_coherent() is physically contiguous, or just losing the
> > high bits of the physical address passed to the iommu seem more likely.
> >
> > David
> >
> >
> >
> > > Reads return different, wrong data on each access. SMART is clean,
> > > there are no SATA link resets and no MCE is raised, so the corruption is
> > > invisible until it surfaces as filesystem metadata errors (XFS EUCLEAN)
> > > or, on Ceph, mass scrub errors across multiple independent filesystems at
> > > once - i.e. host-level, not filesystem-level.
> > >
> > > This is the same failure mode already quirked for other controllers that
> > > falsely claim working 64-bit DMA. See commit 105c42566a55 ("ata: ahci:
> > > force 32-bit DMA for JMicron JMB582/JMB585") and commit 20730e9b2778
> > > ("ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers").
> > > The ASM1166 currently maps to plain board_ahci with no DMA limit.
> > >
> > > Limit the ASM1166 to 32-bit DMA. 32-bit is the guaranteed-correct lower
> > > bound; the only cost is extra SWIOTLB bounce-buffering on transfers above
> > > 4 GB, negligible for storage. A future change can widen it to the
> > > controller's true addressable width if characterised. Until this lands
> > the
> > > only workarounds are disabling the IOMMU (amd_iommu=off /
> > intel_iommu=off)
> > > or using an HBA.
> > >
> > > Reproduced on an AOOSTAR WTR MAX (AMD Ryzen 7 PRO 8845HS) whose six SATA
> > > bays all hang off one ASM1166: with the IOMMU on, six concurrent
> > > 'dd ... | md5sum' of the same large file return six different sums; with
> > > amd_iommu=off they are identical, and a full Ceph deep-scrub of a 5.4 TiB
> > > / 1.43M-object pool re-reads end-to-end with zero scrub errors.
> > >
> > > Add a board_ahci_32bit_dma board type (mirroring board_ahci_43bit_dma)
> > > and point the ASM1166 entry at it.
> > >
> > > Fixes: 3bf614106094 ("ata: ahci: add identifiers for ASM2116 series
> > adapters")
> > > Cc: stable@vger.kernel.org
> > > Assisted-by: Claude:claude-opus-4.8
> > > Signed-off-by: Alvin Lim <alvinwylim@gmail.com>
> > > ---
> > > drivers/ata/ahci.c | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> > > index 58f512f8952a..895956c2ca15 100644
> > > --- a/drivers/ata/ahci.c
> > > +++ b/drivers/ata/ahci.c
> > > @@ -48,6 +48,7 @@ enum {
> > > enum board_ids {
> > > /* board IDs by feature in alphabetical order */
> > > board_ahci,
> > > + board_ahci_32bit_dma,
> > > board_ahci_43bit_dma,
> > > board_ahci_ign_iferr,
> > > board_ahci_no_debounce_delay,
> > > @@ -132,6 +133,13 @@ static const struct ata_port_info ahci_port_info[]
> > = {
> > > .udma_mask = ATA_UDMA6,
> > > .port_ops = &ahci_ops,
> > > },
> > > + [board_ahci_32bit_dma] = {
> > > + AHCI_HFLAGS (AHCI_HFLAG_32BIT_ONLY),
> > > + .flags = AHCI_FLAG_COMMON,
> > > + .pio_mask = ATA_PIO4,
> > > + .udma_mask = ATA_UDMA6,
> > > + .port_ops = &ahci_ops,
> > > + },
> > > [board_ahci_43bit_dma] = {
> > > AHCI_HFLAGS (AHCI_HFLAG_43BIT_ONLY),
> > > .flags = AHCI_FLAG_COMMON,
> > > @@ -1559,7 +1567,7 @@ static const struct pci_device_id ahci_pci_tbl[] =
> > {
> > > }, {
> > > /* ASM1166 */
> > > PCI_VDEVICE(ASMEDIA, 0x1166),
> > > - .driver_data = board_ahci,
> > > + .driver_data = board_ahci_32bit_dma,
> > > }, {
> > > /*
> > > * Samsung SSDs found on some macbooks. NCQ times out if
> > MSI is
> > >
> > > base-commit: 322008f87f917e2217eeac386a9410945092eb2e
> >
> >
prev parent reply other threads:[~2026-06-21 21:57 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-21 10:08 [PATCH] ata: ahci: force 32-bit DMA for ASMedia ASM1166 Alvin Lim
2026-06-21 10:21 ` sashiko-bot
2026-06-21 12:48 ` David Laight
[not found] ` <CA+CYLR6Rg-3brg9yCMAKJDr7t=mtu4vP0+aMFs+JhLPWtQxOYA@mail.gmail.com>
2026-06-21 21:57 ` David Laight [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260621225727.734e5ecc@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=alvinwylim@gmail.com \
--cc=cassel@kernel.org \
--cc=dlemoal@kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox