public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Ben Greear <greearb@candelatech.com>
Cc: "Pali Rohár" <pali@kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	bjorn@helgaas.com, LKML <linux-kernel@vger.kernel.org>,
	stable@vger.kernel.org, "Stefan Roese" <sr@denx.de>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Bharat Kumar Gogada" <bharat.kumar.gogada@xilinx.com>,
	"Michal Simek" <michal.simek@xilinx.com>,
	"Yao Hongbo" <yaohongbo@linux.alibaba.com>,
	"Naveen Naidu" <naveennaidu479@gmail.com>,
	"Sasha Levin" <sashal@kernel.org>,
	linux-pci@vger.kernel.org,
	"Gregory Greenman" <gregory.greenman@intel.com>,
	"Kalle Valo" <kvalo@kernel.org>,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 5.4 182/389] PCI/portdrv: Dont disable AER reporting in get_port_device_capability()
Date: Tue, 4 Apr 2023 12:09:52 -0500	[thread overview]
Message-ID: <20230404170952.GA3559293@bhelgaas> (raw)
In-Reply-To: <4ff1397e-1d78-bc59-f577-e69024c4c4f3@candelatech.com>

On Fri, Mar 31, 2023 at 03:31:40PM -0700, Ben Greear wrote:
> On 3/31/23 15:06, Bjorn Helgaas wrote:
> > [+cc iwlwifi folks]
> > 
> > Re: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in
> > get_port_device_capability()")
> > 
> > On Wed, Mar 29, 2023 at 04:17:29PM -0700, Ben Greear wrote:
> > > On 8/30/22 3:16 PM, Ben Greear wrote:
> > > ...
> > 
> > > I notice this patch appears to be in 6.2.6 kernel, and my kernel logs are
> > > full of spam and system is unstable.  Possibly the unstable part is related
> > > to something else, but the log spam is definitely extreme.
> > > 
> > > These systems are fairly stable on 5.19-ish kernels without the patch in
> > > question.
> > 
> > Hmmm, I was going to thank you for the report, but looking closer, I
> > see that you reported this last August [1] and we *should* have
> > pursued it with the iwlwifi folks or figured out what the PCI core is
> > doing wrong, but I totally dropped the ball.  Sorry about that.
> > 
> > To make sure we're all on the same page, we're talking about
> > 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in
> > get_port_device_capability()") [2],
> > which is present in v6.0 and later [3] but not v5.19.16 [4].
> 
> Yes, though I manually tried reverting that patch, and problem
> persisted, so maybe some secondary patch still enables whatever
> causes the issue.
> 
> Booting with pci=noaer 'fixes' the problem for me, that is what I am
> running currently.
> 
> > > Here is sample of the spam:
> > > 
> > > [ 1675.547023] pcieport 0000:03:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > > [ 1675.556851] pcieport 0000:03:02.0:   device [10b5:8619] error status/mask=00100000/00000000
> > > [ 1675.563904] pcieport 0000:03:02.0:    [20] UnsupReq               (First)
> > > [ 1675.569398] pcieport 0000:03:02.0: AER:   TLP Header: 34000000 05001f10 00000000 88c888c8
> > > [ 1675.576296] iwlwifi 0000:05:00.0: AER: can't recover (no error_detected callback)
> > 
> > The TLP header says this is an LTR message from 05:00.0.  Apparently
> > the bridge above 05:00.0 is 03:02.0, which logged an Unsupported
> > Request error for the message, probably because 03:02.0 doesn't have
> > LTR enabled.

> Here is lspci, and please note that I am using a pcie -> 12x m.2
> adapter board, which is not common in the world.  Possibly it is
> causing some of the problems with the AER logic (though, it is
> stable in 5.19 and lower.  And a similar system with 2 of these
> adapter boards filled with 24 mtk7922 radios does not show the AER
> warnings or instability problems so far.)
> 
> The lspci below is from a system with 12 ax210 radios, I have
> another with 24, it shows similar problems.

Interesting config.  Somebody is definitely doing something wrong.
LTR is enabled at 00:1c.0 (which is fine), not supported and disabled
at 02:00.0 and 03:02.0 (also fine), but *enabled* at 05:00.0, which is
absolutely not fine because 03:02.0 won't know what to do with the LTR
messages and would log the AER errors you're seeing.

> 00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1) (prog-if 00 [Normal decode])
> 	Bus: primary=00, secondary=02, subordinate=0f, sec-latency=0
> 		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
> 			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-

> 02:00.0 PCI bridge: PLX Technology, Inc. PEX 8619 16-lane, 16-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) (prog-if 00 [Normal decode])
> 	Bus: primary=02, secondary=03, subordinate=0f, sec-latency=0

> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> 			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled

> 03:02.0 PCI bridge: PLX Technology, Inc. PEX 8619 16-lane, 16-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) (prog-if 00 [Normal decode])
> 	Bus: primary=03, secondary=05, subordinate=05, sec-latency=0

> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
> 			 AtomicOpsCap: Routing-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-

> 05:00.0 Network controller: Intel Corporation Device 2725 (rev 1a)
> 		DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via WAKE#
> 			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-, LTR+, OBFF Disabled
> 			 AtomicOpsCtl: ReqEn-

For 02:00.0 and 03:02.0, pci_configure_ltr() should bail out as soon
as it sees they don't support PCI_EXP_DEVCAP2_LTR, so they should
never have dev->ltr_path set.  And pci_configure_ltr() should not set
PCI_EXP_DEVCTL2_LTR_EN for 05:00.0 since bridge->ltr_path is not set
for 03:02.0.

Can you collect the dmesg log when booted with "pci=earlydump"?  I
wonder if BIOS could be enabling LTR on 05:00.0.

Bjorn

       reply	other threads:[~2023-04-04 17:10 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4ff1397e-1d78-bc59-f577-e69024c4c4f3@candelatech.com>
2023-04-04 17:09 ` Bjorn Helgaas [this message]
2023-04-18 18:18   ` [PATCH 5.4 182/389] PCI/portdrv: Dont disable AER reporting in get_port_device_capability() Ben Greear
2023-04-18 20:26     ` Bjorn Helgaas
     [not found] <9dfa04c4-e0cc-f265-5935-254f43db931b@candelatech.com>
2023-03-31 22:06 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230404170952.GA3559293@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bharat.kumar.gogada@xilinx.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn@helgaas.com \
    --cc=greearb@candelatech.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=gregory.greenman@intel.com \
    --cc=kvalo@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=michal.simek@xilinx.com \
    --cc=naveennaidu479@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pali@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=sashal@kernel.org \
    --cc=sr@denx.de \
    --cc=stable@vger.kernel.org \
    --cc=yaohongbo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox