From: Bjorn Helgaas <helgaas@kernel.org>
To: fk1xdcio@duck.com
Cc: linux-pci@vger.kernel.org, Lukas Wunner <lukas@wunner.de>,
	Oliver O'Halloran <oohall@gmail.com>
Subject: Re: ASMedia ASM1812 PCIe switch causes system to freeze hard
Date: Mon, 13 Mar 2023 16:57:18 -0500
Message-ID: <20230313215718.GA1546868@bhelgaas>
In-Reply-To: <20230308204942.GA1032495@bhelgaas>

On Wed, Mar 08, 2023 at 02:49:42PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 25, 2023 at 01:37:23PM -0500, fk1xdcio@duck.com wrote:
> > I'm testing a generic 4-port PCIe x4 2.5Gbps Ethernet NIC. It uses an
> > ASM1812 for the PCI packet switch to four RTL8125BG network controllers.
> > 
> > The more load I put on the NIC the faster the system freezes. For example if
> > I activate four 2.5Gbps fully saturated network connections then the system
> > hard freezes almost immediately. When the system freezes it seems completely
> > dead. SysRq doesn't work, serial consoles are dead, etc. so I haven't been
> > able to get much debugging information. I have tested on various different
> > physical systems, Xeon E5, Xeon E3, i7, and they all behave the same so it
> > doesn't seem like a system hardware issue.
> > 
> > Disabling IOMMU makes it run for a little longer before crashing.
> > 
> > The tiny bit of error information I have been able to get under various
> > conditions (eg. disabling ASPM, forcing D0, etc):
> >   Test #1:
> >   pcieport 0000:04:02.0: Unable to change power state from D3hot to D0, device inaccessible
> > 
> >   Test #2:
> >   pcieport 0000:04:02.0: can't change power state from D3cold to D0 (config space inaccessible)
> >   pcieport 0000:03:00.0: Wakeup disabled by ACPI
> >   pcieport 0000:04:02.0: PME# disabled
> > 
> >   Test #3:
> >   enp7s0: cmd = 0xff, should be 0x07 \x0a.
> >   enp7s0: pci link is down \x0a.
> > 
> > At times there are several of those errors printed for the different PCI
> > devices of the NIC before the system locks up.
> > 
> > Setting "pci=nommconf" on the kernel command line is the only thing that
> > seems to fix the issue but performance is degraded when using bidirectional
> > transfers. 2.5Gbps TX but only 1.5Gbps RX compared to MMCONFIG enabled which
> > gets full 2.5Gbps bidirectional.
> > 
> > So it seems the MMCONFIG works sometimes but eventually something happens
> > and it becomes inaccessible at which point the system freezes. Is there a
> > way to keep MMCONFIG enabled for other devices but not this ASM1812 device?
> > Or better, is there a way to debug and fix MMCONFIG for the device?
> 
> Thanks for the report!
> 
> So IIUC, "pci=nommconf" avoids the system hang completely, but network
> performance is lower.  Do the NIC stats show packet drops that might
> explain the performance problem?
> 
> You mentioned later that you see AER errors caused by ASPM, and they
> go away if you disable power management (but the hard lockups still
> happen).  Is it "pcie_aspm=off" or "pcie_port_pm=off" or something
> else that makes this difference?

I don't want to forget about this issue.  Have you learned anything
new, e.g., any answers to the questions above?  I don't have any good
ideas yet, but if we keep pushing on it, we might be able to figure
out something.
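
For the packet-drop question above, here is a minimal sketch (not part
of the original exchange) that reads the standard per-interface
counters from /sys/class/net/<iface>/statistics/.  The interface name
enp7s0 comes from the log quoted above; the function name
read_counters() and the choice of counters are illustrative
assumptions.  Running it once with "pci=nommconf" and once without, at
the same load, would show whether drops account for the lower RX rate.

```python
#!/usr/bin/env python3
"""Print drop/error counters for network interfaces from sysfs."""
import sys
from pathlib import Path

# Counters exposed under /sys/class/net/<iface>/statistics/.
# Which ones matter here is an assumption; extend the tuple as needed.
COUNTERS = (
    "rx_dropped",
    "rx_errors",
    "rx_missed_errors",
    "rx_fifo_errors",
    "tx_dropped",
    "tx_errors",
)


def read_counters(iface, root="/sys/class/net"):
    """Return {counter: int or None} for one interface.

    A counter that cannot be read or parsed is reported as None
    rather than raising, so one missing file does not hide the rest.
    """
    stats_dir = Path(root) / iface / "statistics"
    result = {}
    for name in COUNTERS:
        try:
            result[name] = int((stats_dir / name).read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    # Interface names are given on the command line, e.g. "enp7s0".
    for iface in sys.argv[1:]:
        print(iface)
        for name, value in read_counters(iface).items():
            shown = "n/a" if value is None else value
            print(f"  {name}: {shown}")
```

Saved under a hypothetical name such as nicdrops.py, it would be run as
"python3 nicdrops.py enp7s0" before and after a transfer test, and the
two outputs compared by hand.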

Bjorn
