From: David Laight <david.laight.linux@gmail.com>
To: Niklas Cassel <cassel@kernel.org>
Cc: Damien Le Moal <dlemoal@kernel.org>,
	Alvin Lim <alvinwylim@gmail.com>,
	linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] ata: ahci: force 32-bit DMA for ASMedia ASM1166
Date: Mon, 22 Jun 2026 16:02:15 +0100
Message-ID: <20260622160215.67e6def5@pumpkin>
In-Reply-To: <ajk2WIzpNgQSJ2dh@ryzen>

On Mon, 22 Jun 2026 15:19:20 +0200
Niklas Cassel <cassel@kernel.org> wrote:

> On Mon, Jun 22, 2026 at 02:02:57PM +0100, David Laight wrote:
> > On Mon, 22 Jun 2026 20:31:54 +0900
> > Damien Le Moal <dlemoal@kernel.org> wrote:
> >   
> > > On 6/21/26 19:08, Alvin Lim wrote:  
> > > > The ASMedia ASM1166 SATA controller (1b21:1166) advertises 64-bit DMA
> > > > support (AHCI CAP.S64A), but on systems with the IOMMU enabled - where it
> > > > can be handed DMA addresses above 4 GB - it silently corrupts data in
> > > > transit. Reads return different, wrong data on each access. SMART is clean,
> > > > there are no SATA link resets and no MCE is raised, so the corruption is
> > > > invisible until it surfaces as filesystem metadata errors (XFS EUCLEAN)
> > > > or, on Ceph, mass scrub errors across multiple independent filesystems at
> > > > once - i.e. host-level, not filesystem-level.
> > > > 
> > > > This is the same failure mode already quirked for other controllers that
> > > > falsely claim working 64-bit DMA. See commit 105c42566a55 ("ata: ahci:
> > > > force 32-bit DMA for JMicron JMB582/JMB585") and commit 20730e9b2778
> > > > ("ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers").
> > > > The ASM1166 currently maps to plain board_ahci with no DMA limit.    
> > > 
> > > Have you tried the same quirk, limiting DMA to 43-bits ? It is very likely that
> > > this adapter bug is the same as the 1061.
> > >   
> > 
> > It would also be worth checking that you get the read fails with a 44-bit mask.
> > 
> > I'd guess it also requires that you keep the controller busy for (about) 8TB
> > of reads - which is where sequential address allocation would exceed 43-bits.
> > But that is just conjecture since I've not looked at the iommu code.  
> 
> By default, the iommu code will first try to allocate a 32-bit IOVA:
> https://github.com/torvalds/linux/blob/v7.1/drivers/iommu/dma-iommu.c#L780-L799
> 
> Only once a 32-bit IOVA allocation fails, will it start using 64-bit IOVAs.
> 
> 
> It is possible to set iommu.forcedac=1 to allocate from the full usable
> IOVA range immediately:
> https://github.com/torvalds/linux/blob/v7.1/Documentation/admin-guide/kernel-parameters.txt#L2619

OK, so SAC => Single Address Cycle and DAC => Dual Address Cycle.
This all makes less sense than before - especially if that message
isn't being output.

That all rather implies that with the iommu enabled it is unlikely that
any device will see DMA addresses above 4G.
(Unless you manage to have close to 4G of buffers mapped at once.)
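
A minimal sketch of that reasoning, as a toy model rather than the
kernel's allocator. It assumes a 64-bit capable device and tracks only
how many bytes are mapped below 4G; toy_iova_domain and toy_map_is_sac
are hypothetical names, and setting try_sac = false from the start
stands in for iommu.forcedac=1. The "stop trying after the first
failure" step is an assumption about the policy, not a copy of
dma-iommu.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SAC_WINDOW (1ULL << 32)	/* addresses reachable in a single cycle */

/* Hypothetical toy state; this is not a kernel structure. */
struct toy_iova_domain {
	uint64_t sac_in_use;	/* bytes currently mapped below 4G */
	bool try_sac;		/* false: forcedac, or an earlier 32-bit failure */
};

/*
 * Returns true if the mapping lands below 4G (SAC), false if the device
 * would be handed an address above 4G (DAC).
 */
static bool toy_map_is_sac(struct toy_iova_domain *d, uint64_t size)
{
	assert(size > 0);

	if (d->try_sac) {
		if (size <= SAC_WINDOW - d->sac_in_use) {
			d->sac_in_use += size;
			return true;
		}
		/* Assumed policy: after one failure, stop trying 32-bit. */
		d->try_sac = false;
	}
	return false;
}
```

In this model the device only sees an address above 4G once the active
mappings have filled the low window, or when try_sac starts out false.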

If changing the dma mask is causing bounce buffers to be used (and there
is no reason it should when the iommu is enabled), then the difference
starts looking like a timing error.
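
For reference, the arithmetic behind the masks discussed in this thread
(32, 43 and 44 bits) can be sketched as below. A buffer would only need
bouncing if its bus address did not fit the mask. The helper names are
illustrative and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low 'bits' bits set; illustrative helper. */
static uint64_t toy_dma_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? UINT64_MAX : (1ULL << bits) - 1;
}

/* True if [addr, addr + size) is fully addressable under the mask. */
static bool toy_fits_mask(uint64_t addr, uint64_t size, unsigned int bits)
{
	uint64_t end;

	assert(size > 0);
	end = addr + size - 1;
	/* end < addr means the range wrapped past 2^64. */
	return end >= addr && end <= toy_dma_mask(bits);
}
```

A 43-bit mask covers exactly 8 TiB (2^43 bytes), so an address of 1 << 43
is the first one that fails a 43-bit mask while still passing a 44-bit
one.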

Have you identified the type of corruption that happens for disk reads?
I'd guess typical errors are:
- Buffer not written at all.
- End of buffer incorrect.
- Buffer written with data from the wrong sector.
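
One way to tell these cases apart, sketched under two assumptions: the
disk has been pre-written with a pattern in which every 8-byte word of a
sector holds that sector's LBA, and the read buffer is filled with a
poison byte before each read. All names below (stamp_sector,
classify_read, POISON_BYTE) are hypothetical and not taken from any
existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define POISON_BYTE 0xA5	/* pre-fill value for the read buffer */

enum read_fault {
	READ_OK,
	READ_NOT_WRITTEN,	/* buffer still holds the poison fill */
	READ_TAIL_BAD,		/* good prefix, then bad data to the end */
	READ_WRONG_SECTOR,	/* intact data, but stamped with another LBA */
	READ_OTHER,
};

/* Test pattern: every 8-byte word of a sector holds that sector's LBA. */
static void stamp_sector(uint8_t *sec, uint64_t lba)
{
	for (size_t i = 0; i < SECTOR_SIZE; i += sizeof(lba))
		memcpy(sec + i, &lba, sizeof(lba));
}

static int sector_is_poison(const uint8_t *sec)
{
	for (size_t i = 0; i < SECTOR_SIZE; i++)
		if (sec[i] != POISON_BYTE)
			return 0;
	return 1;
}

/* Returns 1 and stores the stamp if all words in the sector are equal. */
static int sector_stamp(const uint8_t *sec, uint64_t *lba)
{
	uint64_t first, word;

	memcpy(&first, sec, sizeof(first));
	for (size_t i = sizeof(first); i < SECTOR_SIZE; i += sizeof(word)) {
		memcpy(&word, sec + i, sizeof(word));
		if (word != first)
			return 0;
	}
	*lba = first;
	return 1;
}

static enum read_fault classify_read(const uint8_t *buf, uint64_t first_lba,
				     size_t nsectors)
{
	size_t ok = 0, poison = 0, wrong = 0, prefix = 0;

	assert(nsectors > 0);
	for (size_t s = 0; s < nsectors; s++) {
		const uint8_t *sec = buf + s * SECTOR_SIZE;
		uint64_t stamp;

		if (sector_is_poison(sec)) {
			poison++;
		} else if (sector_stamp(sec, &stamp)) {
			if (stamp != first_lba + s) {
				wrong++;
			} else {
				if (prefix == s)
					prefix++;	/* still in the good prefix */
				ok++;
			}
		}
	}
	if (ok == nsectors)
		return READ_OK;
	if (poison == nsectors)
		return READ_NOT_WRITTEN;
	if (wrong)
		return READ_WRONG_SECTOR;
	if (prefix > 0 && ok == prefix)
		return READ_TAIL_BAD;
	return READ_OTHER;
}
```

The poison check runs first, so a sector left untouched is never
mistaken for one stamped with a bogus LBA. Sectors that match neither
pattern fall through to READ_OTHER unless they only follow a good prefix.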

The PCIe write TLPs associated with disk reads are relatively simple.

I learnt more than I wanted to about read TLPs while diagnosing a
corruption caused by an fpga implementation failing to correctly process
read TLPs that generated more than one data TLP in response.
We managed to borrow a PCIe analyser (very expensive, difficult to set up
and difficult to use) by suggesting to a salesman that we might buy one!
That identified the problem; fortunately the bug was in logic supplied in
source form, so we could fix it.
I then added logic to our fpga image so that we could trace the TLPs and
LTSSM state changes.

	David


> 
> 
> Kind regards,
> Niklas



