public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
From: Lennert Buytenhek <kernel@wantstofly.org>
To: Niklas Cassel <cassel@kernel.org>
Cc: Damien Le Moal <dlemoal@kernel.org>,
	linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
	Robin Murphy <robin.murphy@arm.com>,
	John Garry <john.g.garry@oracle.com>,
	Joerg Roedel <jroedel@suse.de>
Subject: Re: [PATCH] ahci: add 43-bit DMA address quirk for ASMedia ASM106x controllers
Date: Thu, 25 Jan 2024 15:43:06 +0200	[thread overview]
Message-ID: <ZbJlatnuTMblK9hS@wantstofly.org> (raw)
In-Reply-To: <ZbJiZotayVjtF8ti@x1-carbon>

On Thu, Jan 25, 2024 at 02:30:14PM +0100, Niklas Cassel wrote:

> > With one of the on-board ASM1061 AHCI controllers (1b21:0612) on an
> > ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, a controller hang was
> > observed that was immediately preceded by the following kernel messages:
> > 
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: Using 64-bit DMA addresses
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000]
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000]
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000]
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000]
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000]
> > [Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000]
> > 
> > The first message is produced by code in drivers/iommu/dma-iommu.c which
> > is accompanied by the following comment that seems to apply:
> > 
> >         /*
> >          * Try to use all the 32-bit PCI addresses first. The original SAC vs.
> >          * DAC reasoning loses relevance with PCIe, but enough hardware and
> >          * firmware bugs are still lurking out there that it's safest not to
> >          * venture into the 64-bit space until necessary.
> >          *
> >          * If your device goes wrong after seeing the notice then likely either
> >          * its driver is not setting DMA masks accurately, the hardware has
> >          * some inherent bug in handling >32-bit addresses, or not all the
> >          * expected address bits are wired up between the device and the IOMMU.
> >          */
> > 
> > Asking the ASM1061 on a discrete PCIe card to DMA from I/O virtual
> > address 0xffffffff00000000 produces the following I/O page faults:
> > 
> > [Tue Jan 23 21:31:55 2024] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000000 flags=0x0010]
> > [Tue Jan 23 21:31:55 2024] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000500 flags=0x0010]
> > 
> > Note that the upper 21 bits of the logged DMA address are zero.  (When
> > asking a different PCIe device in the same PCIe slot to DMA to the same
> > I/O virtual address, we do see all the upper 32 bits of the DMA address
> > as 1, so this is not an issue with the chipset or IOMMU configuration on
> > the test system.)
> > 
> > Finally, hacking libahci to always set the upper 21 bits of all DMA
> > addresses to 1 produces no discernible effect on the behavior of the
> > ASM1061, and mkfs/mount/scrub/etc work as without this hack.
> > 
> > This all strongly suggests that the ASM1061 has a 43 bit DMA address
> > limit, and this commit therefore adds a quirk to deal with this limit.
> 
> Thank you for all the effort you spent on debugging this!

Thanks for your help!


> > We apply the quirk to the other known ASMedia parts as well, since they
> > are similar and likely to be affected by the same issue.
> 
> I think we want to limit the quirk to ASMedia device ID 0x612 (and
> probably also 0x611) for now, as that is what you have tested with.
> 
> We don't know for sure if the other devices are also broken.
> (Even though it seems highely likely that some of them are.)
> 
> If the ASMedia guys eventually reply to the other email and tell us
> the exactly which device IDs that are affected, then it will be trivial
> to apply the quirk for other affected device IDs as well.

OK, I'll do that in v2.


> > Link: https://lore.kernel.org/linux-ide/ZaZ2PIpEId-rl6jv@wantstofly.org/
> > Signed-off-by: Lennert Buytenhek <kernel@wantstofly.org>
> > ---
> >  drivers/ata/ahci.c | 38 +++++++++++++++++++++++++++-----------
> >  drivers/ata/ahci.h |  1 +
> >  2 files changed, 28 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> > index 08745e7db820..dc86c73019ce 100644
> > --- a/drivers/ata/ahci.c
> > +++ b/drivers/ata/ahci.c
> > @@ -58,6 +58,7 @@ enum board_ids {
> >  
> >  	/* board IDs for specific chipsets in alphabetical order */
> >  	board_ahci_al,
> > +	board_ahci_asm106x,
> >  	board_ahci_avn,
> >  	board_ahci_mcp65,
> >  	board_ahci_mcp77,
> > @@ -185,6 +186,13 @@ static const struct ata_port_info ahci_port_info[] = {
> >  		.udma_mask	= ATA_UDMA6,
> >  		.port_ops	= &ahci_ops,
> >  	},
> > +	[board_ahci_asm106x] = {
> > +		AHCI_HFLAGS	(AHCI_HFLAG_43BIT_ONLY),
> > +		.flags		= AHCI_FLAG_COMMON,
> > +		.pio_mask	= ATA_PIO4,
> > +		.udma_mask	= ATA_UDMA6,
> > +		.port_ops	= &ahci_ops,
> > +	},
> >  	[board_ahci_avn] = {
> >  		.flags		= AHCI_FLAG_COMMON,
> >  		.pio_mask	= ATA_PIO4,
> > @@ -596,14 +604,14 @@ static const struct pci_device_id ahci_pci_tbl[] = {
> >  	{ PCI_VDEVICE(PROMISE, 0x3f20), board_ahci },	/* PDC42819 */
> >  	{ PCI_VDEVICE(PROMISE, 0x3781), board_ahci },   /* FastTrak TX8660 ahci-mode */
> >  
> > -	/* Asmedia */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci },	/* ASM1060 */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci },	/* ASM1060 */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci },	/* ASM1061 */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },	/* ASM1062 */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0621), board_ahci },   /* ASM1061R */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0622), board_ahci },   /* ASM1062R */
> > -	{ PCI_VDEVICE(ASMEDIA, 0x0624), board_ahci },   /* ASM1062+JMB575 */
> > +	/* ASMedia, 43 bit DMA address limit */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci_asm106x },	/* ASM1060 */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci_asm106x },	/* ASM1060 */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci_asm106x },	/* ASM1061 */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci_asm106x },	/* ASM1061/1062 */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0621), board_ahci_asm106x },   /* ASM1061R */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0622), board_ahci_asm106x },   /* ASM1062R */
> > +	{ PCI_VDEVICE(ASMEDIA, 0x0624), board_ahci_asm106x },   /* ASM1062+JMB575 */
> >  
> >  	/*
> >  	 * Samsung SSDs found on some macbooks.  NCQ times out if MSI is
> > @@ -943,11 +951,19 @@ static int ahci_pci_device_resume(struct device *dev)
> >  
> >  #endif /* CONFIG_PM */
> >  
> > -static int ahci_configure_dma_masks(struct pci_dev *pdev, int using_dac)
> > +static int ahci_configure_dma_masks(struct pci_dev *pdev,
> > +				    struct ahci_host_priv *hpriv)
> >  {
> > -	const int dma_bits = using_dac ? 64 : 32;
> > +	int dma_bits;
> >  	int rc;
> >  
> > +	if (!(hpriv->cap & HOST_CAP_64))
> > +		dma_bits = 32;
> > +	else if (hpriv->flags & AHCI_HFLAG_43BIT_ONLY)
> > +		dma_bits = 43;
> > +	else
> > +		dma_bits = 64;
> > +
> 
> I would prefer if you write this as:
> 
> if (hpriv->cap & HOST_CAP_64) {
> 	dma_bits = 64;
> 	if (hpriv->flags & AHCI_HFLAG_43BIT_ONLY)
> 		dma_bits = 43;
> } else {
> 	dma_bits = 32;
> }
> 
> Such that we still require the device to advertize 64 bit support,
> and the quirk.
> If the device does not advertize 64, we don't want it to be possible
> to use a mask >32, even if the quirk flag is set.

Isn't that logic exactly the same as in my version?  I.e. in both
versions, HOST_CAP_64 has to be set and AHCI_HFLAG_43BIT_ONLY has
to be set for dma_bits to become 43.

(I don't mind doing it your way, I just don't see a functional
difference between the versions. :-) )


Kind regards,
Lennert

  reply	other threads:[~2024-01-25 13:43 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24 13:54 [PATCH] ahci: add 43-bit DMA address quirk for ASMedia ASM106x controllers Lennert Buytenhek
2024-01-25 13:30 ` Niklas Cassel
2024-01-25 13:43   ` Lennert Buytenhek [this message]
2024-01-25 14:06     ` Niklas Cassel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZbJlatnuTMblK9hS@wantstofly.org \
    --to=kernel@wantstofly.org \
    --cc=cassel@kernel.org \
    --cc=dlemoal@kernel.org \
    --cc=john.g.garry@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox