From: Bjorn Helgaas <helgaas@kernel.org>
To: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: "Pali Rohár" <pali@kernel.org>,
bhelgaas@google.com,
"Mario Limonciello" <mario.limonciello@amd.com>,
"Mika Westerberg" <mika.westerberg@linux.intel.com>,
"Keith Busch" <kbusch@kernel.org>,
"Kuppuswamy Sathyanarayanan"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Stefan Roese" <sr@denx.de>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] PCI/portdrv: Avoid enabling AER on Thunderbolt devices
Date: Thu, 29 Dec 2022 10:51:58 -0600
Message-ID: <20221229165158.GA608748@bhelgaas>
In-Reply-To: <CAAd53p4CU+K5sOumJYxENRE-Ci7zPKxk0ROszvBUPWV=1xYZyw@mail.gmail.com>
On Thu, Dec 29, 2022 at 11:45:51AM +0800, Kai-Heng Feng wrote:
> On Mon, Dec 26, 2022 at 11:46 PM Pali Rohár <pali@kernel.org> wrote:
> > On Monday 26 December 2022 23:30:31 Kai-Heng Feng wrote:
> > > We are seeing an igc Ethernet device on a Thunderbolt dock stop working
> > > after S3 resume because of an AER error, or even make S3 resume freeze:
> > > pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:00:1d.0
> > > pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
> > > pcieport 0000:00:1d.0: device [8086:7ab0] error status/mask=00008000/00002000
> > > pcieport 0000:00:1d.0: [15] HeaderOF
> > > pcieport 0000:00:1d.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> > > pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > > pcieport 0000:00:1d.0: device [8086:7ab0] error status/mask=00100000/00004000
> > > pcieport 0000:00:1d.0: [20] UnsupReq (First)
> > > pcieport 0000:00:1d.0: AER: TLP Header: 34000000 0a000052 00000000 00000000
> > > pcieport 0000:00:1d.0: AER: Error of this Agent is reported first
> > > pcieport 0000:04:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > > pcieport 0000:04:01.0: device [8086:1136] error status/mask=00300000/00000000
> > > pcieport 0000:04:01.0: [20] UnsupReq (First)
> > > pcieport 0000:04:01.0: [21] ACSViol
> > > pcieport 0000:04:01.0: AER: TLP Header: 34000000 04000052 00000000 00000000
> > > thunderbolt 0000:05:00.0: AER: can't recover (no error_detected callback)
> > >
> > > This supposedly should be fixed by commit c01163dbd1b8 ("PCI/PM: Always disable
> > > PTM for all devices during suspend"), but somehow it doesn't work for
> > > this case.
> > >
> > > By dumping the PCI_PTM_CTRL register on resume, it turns out PTM is
> > > already flipped on by either the Thunderbolt dock firmware or the host
> > > BIOS. Writing 0 to PCI_PTM_CTRL yields the same result.
> > >
> > > Windows, however, is not affected by this issue. WinDbg's !pci
> > > command shows that AER is not enabled for devices connected via a
> > > Thunderbolt port, and that's the reason why Windows doesn't exhibit the
> > > issue.
> >
> > Could you try manually enabling AER on Windows (by writing the PCIe
> > config registers) to see whether Windows triggers this issue too?
>
> Actually, I misread the output of the WinDbg !pci command; AER is also
> enabled under Windows.
> The !pci command also shows the same PTM error in the Header Log. I can
> also find the AER warnings in Windows' Event Viewer.
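
For reference, the "TLP Header" words in the AER log quoted above can be decoded by hand. The sketch below is only an illustration, not kernel code: the struct and function names are made up, and it assumes the log prints the header DWORDs in order with header byte 0 in the top bits of the first word. The field positions and the PTM Request message code (0x52) are taken from the PCIe specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Message Code of a PTM Request, per the PCIe specification. */
#define MSG_CODE_PTM_REQUEST 0x52

/* Hypothetical container for the fields we care about. */
struct tlp_msg_info {
	unsigned fmt;      /* DW0 bits 31:29 */
	unsigned type;     /* DW0 bits 28:24 */
	unsigned req_bus;  /* DW1 bits 31:24 (Requester ID, bus) */
	unsigned req_dev;  /* DW1 bits 23:19 (Requester ID, device) */
	unsigned req_fn;   /* DW1 bits 18:16 (Requester ID, function) */
	unsigned msg_code; /* DW1 bits 7:0, meaningful for Message TLPs only */
};

/* Split the first two header DWORDs into their fields. */
struct tlp_msg_info decode_tlp_msg(uint32_t dw0, uint32_t dw1)
{
	struct tlp_msg_info m;

	m.fmt = (dw0 >> 29) & 0x7;
	m.type = (dw0 >> 24) & 0x1f;
	m.req_bus = (dw1 >> 24) & 0xff;
	m.req_dev = (dw1 >> 19) & 0x1f;
	m.req_fn = (dw1 >> 16) & 0x7;
	m.msg_code = dw1 & 0xff;
	return m;
}

/* Message requests use Type 0b10rrr, where rrr is the routing field. */
int tlp_is_message(const struct tlp_msg_info *m)
{
	assert(m != NULL);
	return (m->type & 0x18) == 0x10;
}

/* Print a one-line summary of a decoded header. */
void print_tlp_msg(const struct tlp_msg_info *m)
{
	assert(m != NULL);
	printf("fmt=%u type=0x%02x requester=%02x:%02x.%u msg_code=0x%02x%s\n",
	       m->fmt, m->type, m->req_bus, m->req_dev, m->req_fn, m->msg_code,
	       (tlp_is_message(m) && m->msg_code == MSG_CODE_PTM_REQUEST) ?
	       " (PTM Request)" : "");
}
```

Feeding it the first two words from the log, decode_tlp_msg(0x34000000, 0x0a000052), gives a Message TLP with message code 0x52 and requester 0a:00.0, which matches the "PTM error in the Header Log" described above.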

I suspected a Linux problem (e.g., we messed up disabling/restoring
PTM). That's why I was asking about your debug patch, to see if we
could find something wrong with Linux.

But if you also see the Unsupported Request errors on Windows, that
makes it more likely that it's a firmware issue.

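
As a reading aid for the status values in the log, here is a rough sketch that maps set bits to the names the log prints. The tables are deliberately limited to the three bits that appear in this report ([15] HeaderOF in the correctable status, [20] UnsupReq and [21] ACSViol in the uncorrectable status); they are not complete, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct aer_bit {
	unsigned bit;
	const char *name;
};

/* Only the bits seen in this report; NOT a complete table. */
const struct aer_bit aer_uncor_bits[] = {
	{ 20, "UnsupReq" },
	{ 21, "ACSViol" },
};

const struct aer_bit aer_cor_bits[] = {
	{ 15, "HeaderOF" },
};

/*
 * Write a space-separated list like "[20] UnsupReq [21] ACSViol" into buf.
 * Bits missing from the table are labelled "(not in table)".
 * Returns the number of set bits written; stops early if buf is too small.
 */
size_t aer_decode(uint32_t status, const struct aer_bit *table, size_t entries,
		  char *buf, size_t len)
{
	size_t count = 0;
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (unsigned bit = 0; bit < 32; bit++) {
		const char *name = NULL;
		int n;

		if (!(status & (UINT32_C(1) << bit)))
			continue;

		for (size_t i = 0; i < entries; i++)
			if (table[i].bit == bit)
				name = table[i].name;

		n = snprintf(buf + used, len - used, "%s[%u] %s",
			     count ? " " : "", bit,
			     name ? name : "(not in table)");
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated, stop here */

		used += (size_t)n;
		count++;
	}
	return count;
}
```

With the values from the log, 0x00300000 decodes to the UnsupReq and ACSViol bits reported for 04:01.0, and 0x00008000 to the HeaderOF bit reported for 00:1d.0.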
> I am asking the hardware vendor whether it's possible to fix this on
> the firmware side.

I assume PTM was not enabled by firmware at boot-time (you might be
able to confirm this by tweaking early_dump_pci_device() to dump more
space and using "pci=earlydump"). If that's the case, it seems
strange that firmware would enable PTM at resume-time.

Linux *should* be disabling PTM at suspend-time, so firmware should
never see the fact that it had been enabled, so I don't know how it
could conclude that it's safe to enable PTM at resume-time.

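
To make the check concrete: given a full 4096-byte config space dump, something like the following user-space sketch would tell you whether the PTM Enable bit is set. It is not kernel code, and the function names are invented for illustration. The constants mirror the kernel's PCI_EXT_CAP_ID_PTM (0x1f), PCI_PTM_CTRL (0x08) and PCI_PTM_CTRL_ENABLE (0x1). The PTM capability lives in extended config space (offset 0x100 and above), so a dump of only the first 256 bytes would not show it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE   4096
#define EXT_CAP_START    0x100
#define EXT_CAP_ID_PTM   0x1f /* same value as PCI_EXT_CAP_ID_PTM */
#define PTM_CTRL_OFFSET  0x08 /* same value as PCI_PTM_CTRL */
#define PTM_CTRL_ENABLE  0x1  /* same value as PCI_PTM_CTRL_ENABLE */

/* Config space is little-endian; read one 32-bit register from the dump. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned off)
{
	return (uint32_t)cfg[off] |
	       (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 |
	       (uint32_t)cfg[off + 3] << 24;
}

/*
 * Walk the extended capability list in a 4096-byte dump.
 * Returns the offset of the PTM capability, or 0 if there is none.
 */
unsigned find_ptm_cap(const uint8_t *cfg)
{
	unsigned off = EXT_CAP_START;
	int guard = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* bound malformed lists */

	assert(cfg != NULL);

	while (off >= EXT_CAP_START && off <= CFG_SPACE_SIZE - 4 && guard-- > 0) {
		uint32_t hdr = cfg_read32(cfg, off);

		if (hdr == 0 || hdr == 0xffffffff)
			break; /* empty list or device not responding */
		if ((hdr & 0xffff) == EXT_CAP_ID_PTM)
			return off;
		off = (hdr >> 20) & 0xffc; /* next capability offset */
	}
	return 0;
}

/* Returns 1 if PTM is enabled, 0 if disabled, -1 if no PTM capability. */
int ptm_enabled(const uint8_t *cfg)
{
	unsigned cap = find_ptm_cap(cfg);

	if (!cap || cap + PTM_CTRL_OFFSET + 4 > CFG_SPACE_SIZE)
		return -1;
	return (cfg_read32(cfg, cap + PTM_CTRL_OFFSET) & PTM_CTRL_ENABLE) != 0;
}
```

Running this on a boot-time dump and again on a dump taken right after resume would show whether the enable bit changed in between.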
Bjorn