* Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
@ 2025-01-17 12:05 Marek Marczykowski-Górecki
2025-01-29 1:15 ` Bjorn Helgaas
0 siblings, 1 reply; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-17 12:05 UTC (permalink / raw)
To: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky
Cc: xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee
Hi,
After updating PV dom0 to Linux 6.12, the Mediatek MT7922 device reports
all 0xff when accessing its config space. This happens only after device
reset (which is also triggered when binding the device to the
xen-pciback driver).
Reproducer:
# lspci -xs 01:00.0
01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
...
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
# lspci -xs 01:00.0
01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
...
The same operation done on Linux 6.12 running without Xen works fine.
git bisect points at:
commit d591f6804e7e1310881c9224d72247a2b65039af
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Tue Aug 27 18:48:46 2024 -0500
PCI: Wait for device readiness with Configuration RRS
part of that commit:
@@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
return -ENOTTY;
}
- pci_read_config_dword(dev, PCI_COMMAND, &id);
- if (!PCI_POSSIBLE_ERROR(id))
- break;
+ if (root && root->config_crs_sv) {
+ pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
+ if (!pci_bus_crs_vendor_id(id))
+ break;
+ } else {
+ pci_read_config_dword(dev, PCI_COMMAND, &id);
+ if (!PCI_POSSIBLE_ERROR(id))
+ break;
+ }
With some added debugging, I see that the PCI_VENDOR_ID read in
pci_dev_wait() initially returns 0xffffffff. If I extend the condition
with "&& !PCI_POSSIBLE_ERROR(id)", the issue disappears. But judging by
the patch description, that would break VFs.
I'm not sure where the issue is, but given it breaks only when running
with Xen, I guess something is wrong with "Configuration RRS Software
Visibility" in that case.
BTW, shouldn't PCI_VENDOR_ID be accessed via pci_read_config_word()
instead of pci_read_config_dword()?
I'm also CC-ing the MT76 driver maintainers in case it turns out to be
a device-specific issue, not a generic one.
Initially reported at https://github.com/QubesOS/qubes-issues/issues/9689
#regzbot introduced: d591f6804e7e1310881c9224d72247a2b65039af
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-17 12:05 Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12) Marek Marczykowski-Górecki
@ 2025-01-29 1:15 ` Bjorn Helgaas
2025-01-29 2:10 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 1:15 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> all 0xff when accessing its config space. This happens only after device
> reset (which is also triggered when binding the device to the
> xen-pciback driver).
Thanks for the report and for all the debugging you've already done!
> Reproducer:
>
> # lspci -xs 01:00.0
> 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> ...
> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> # lspci -xs 01:00.0
> 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> The same operation done on Linux 6.12 running without Xen works fine.
>
> git bisect points at:
>
> commit d591f6804e7e1310881c9224d72247a2b65039af
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date: Tue Aug 27 18:48:46 2024 -0500
>
> PCI: Wait for device readiness with Configuration RRS
>
> part of that commit:
> @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> return -ENOTTY;
> }
>
> - pci_read_config_dword(dev, PCI_COMMAND, &id);
> - if (!PCI_POSSIBLE_ERROR(id))
> - break;
> + if (root && root->config_crs_sv) {
> + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> + if (!pci_bus_crs_vendor_id(id))
> + break;
> + } else {
> + pci_read_config_dword(dev, PCI_COMMAND, &id);
> + if (!PCI_POSSIBLE_ERROR(id))
> + break;
> + }
>
>
> Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> initially 0xffffffff. If I extend the condition with
> "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the
> patch description, it would break VF.
> I'm not sure where the issue is, but given it breaks only when running
> with Xen, I guess something is wrong with "Configuration RRS Software
> Visibility" in that case.
I'm missing something. If you get 0xffffffff, that is not the 0x0001
Vendor ID, so pci_dev_wait() should exit immediately. But the log at
https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
says it *doesn't* exit and eventually times out.
And the lspci above shows ~0 data for much of the header, even though
the device must be ready by then.
I don't have any good ideas, but since the problem only happens with
Xen, and it seems to affect more than just the Vendor ID, maybe you
could instrument xen_pcibk_config_read() and see if there's something
wonky going on there?
> BTW, shouldn't PCI_VENDOR_ID be accessed via pci_read_config_word()
> instead of pci_read_config_dword()?
Per PCIe r6.0, sec 2.3.2:
If Configuration RRS Software Visibility is enabled (see below):
For a Configuration Read Request that includes both bytes of the
Vendor ID field of a device Function's Configuration Space Header,
the Root Complex must complete the Request to the host by
returning a read-data value of 0001h for the Vendor ID field and
all ‘1’s for any additional bytes included in the request.
Since either a word (16 bit) or dword (32 bit) read includes both
bytes of Vendor ID, I think either should work. We use a 32-bit read
in the enumeration path, where we need both Vendor ID and Device ID,
but we don't care about the Device ID here, so it probably doesn't
really matter here.
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 1:15 ` Bjorn Helgaas
@ 2025-01-29 2:10 ` Marek Marczykowski-Górecki
2025-01-29 3:03 ` Bjorn Helgaas
2025-01-29 18:48 ` Bjorn Helgaas
0 siblings, 2 replies; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 2:10 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > all 0xff when accessing its config space. This happens only after device
> > reset (which is also triggered when binding the device to the
> > xen-pciback driver).
>
> Thanks for the report and for all the debugging you've already done!
>
> > Reproducer:
> >
> > # lspci -xs 01:00.0
> > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > ...
> > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > # lspci -xs 01:00.0
> > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >
> > The same operation done on Linux 6.12 running without Xen works fine.
> >
> > git bisect points at:
> >
> > commit d591f6804e7e1310881c9224d72247a2b65039af
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date: Tue Aug 27 18:48:46 2024 -0500
> >
> > PCI: Wait for device readiness with Configuration RRS
> >
> > part of that commit:
> > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > return -ENOTTY;
> > }
> >
> > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > - if (!PCI_POSSIBLE_ERROR(id))
> > - break;
> > + if (root && root->config_crs_sv) {
> > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > + if (!pci_bus_crs_vendor_id(id))
> > + break;
> > + } else {
> > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > + if (!PCI_POSSIBLE_ERROR(id))
> > + break;
> > + }
> >
> >
> > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > initially 0xffffffff. If I extend the condition with
> > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the
> > patch description, it would break VF.
> > I'm not sure where the issue is, but given it breaks only when running
> > with Xen, I guess something is wrong with "Configuration RRS Software
> > Visibility" in that case.
>
> I'm missing something. If you get 0xffffffff, that is not the 0x0001
> Vendor ID, so pci_dev_wait() should exit immediately.
I'm not sure what is going on there either, but my _guess_ is that the
loop exits too early due to the above, which makes some later actions
fail.
> But the log at
> https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> says it *doesn't* exit and eventually times out.
Note this log is from the "working" kernel, so that timeout must be
something else.
> And the lspci above shows ~0 data for much of the header, even though
> the device must be ready by then.
>
> I don't have any good ideas, but since the problem only happens with
> Xen, and it seems to affect more than just the Vendor ID, maybe you
> could instrument xen_pcibk_config_read() and see if there's something
> wonky going on there?
This one is used when pcifront (from a different PV VM) is asking pciback
to read something. I see the issue even before starting any other VM and
not even attaching the device to the xen-pciback driver...
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 2:10 ` Marek Marczykowski-Górecki
@ 2025-01-29 3:03 ` Bjorn Helgaas
2025-01-29 3:22 ` Marek Marczykowski-Górecki
2025-01-29 18:48 ` Bjorn Helgaas
1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 3:03 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
[+cc linux-pci]
On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > > all 0xff when accessing its config space. This happens only after device
> > > reset (which is also triggered when binding the device to the
> > > xen-pciback driver).
> >
> > Thanks for the report and for all the debugging you've already done!
> >
> > > Reproducer:
> > >
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > ...
> > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >
> > > The same operation done on Linux 6.12 running without Xen works fine.
> > >
> > > git bisect points at:
> > >
> > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date: Tue Aug 27 18:48:46 2024 -0500
> > >
> > > PCI: Wait for device readiness with Configuration RRS
> > >
> > > part of that commit:
> > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > return -ENOTTY;
> > > }
> > >
> > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > - if (!PCI_POSSIBLE_ERROR(id))
> > > - break;
> > > + if (root && root->config_crs_sv) {
> > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > + if (!pci_bus_crs_vendor_id(id))
> > > + break;
> > > + } else {
> > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > + if (!PCI_POSSIBLE_ERROR(id))
> > > + break;
> > > + }
> > >
> > >
> > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > initially 0xffffffff. If I extend the condition with
> > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the
> > > patch description, it would break VF.
> > > I'm not sure where the issue is, but given it breaks only when running
> > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > Visibility" in that case.
> >
> > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > Vendor ID, so pci_dev_wait() should exit immediately.
>
> I'm not sure what is going on there either, but my _guess_ is that the
> loop exits too early due to the above. And it makes some further actions
> to fail.
When RRS SV is enabled, reading PCI_VENDOR_ID should always return
0x0001 (if the device isn't ready and responds with RRS status) or the
valid Vendor ID. I don't think it should ever return 0xffff (unless
the device is powered off, unplugged, or broken, of course).
> > But the log at
> > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > says it *doesn't* exit and eventually times out.
>
> Note this log is from "working" kernel, so that timeout must be
> something else.
I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
since there are no interesting messages from the "BUG PRESENT" part.
Awfully funny coincidence if it's unrelated.
> > And the lspci above shows ~0 data for much of the header, even though
> > the device must be ready by then.
> >
> > I don't have any good ideas, but since the problem only happens with
> > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > could instrument xen_pcibk_config_read() and see if there's something
> > wonky going on there?
>
> This one is used when pcifront (from a different PV VM) is asking pciback
> to read something. I see the issue even before starting any other VM and
> not even attaching the device to the xen-pciback driver...
The report claims the problem only happens with Xen. I'm not a Xen
person, and I don't know how to find the relevant config accessors.
The snippets of kernel messages I see at [1] all mention pciback, so
that's my only clue of where to look. Bottom line, I have no idea
what the config accessor path is, and maybe we could learn something
by looking at whatever it is.
[1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:03 ` Bjorn Helgaas
@ 2025-01-29 3:22 ` Marek Marczykowski-Górecki
2025-01-29 3:40 ` Bjorn Helgaas
2025-01-29 9:17 ` Jan Beulich
0 siblings, 2 replies; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 3:22 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> [+cc linux-pci]
>
> On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > > > all 0xff when accessing its config space. This happens only after device
> > > > reset (which is also triggered when binding the device to the
> > > > xen-pciback driver).
> > >
> > > Thanks for the report and for all the debugging you've already done!
> > >
> > > > Reproducer:
> > > >
> > > > # lspci -xs 01:00.0
> > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > > ...
> > > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > > # lspci -xs 01:00.0
> > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >
> > > > The same operation done on Linux 6.12 running without Xen works fine.
> > > >
> > > > git bisect points at:
> > > >
> > > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > > Date: Tue Aug 27 18:48:46 2024 -0500
> > > >
> > > > PCI: Wait for device readiness with Configuration RRS
> > > >
> > > > part of that commit:
> > > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > > return -ENOTTY;
> > > > }
> > > >
> > > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > - if (!PCI_POSSIBLE_ERROR(id))
> > > > - break;
> > > > + if (root && root->config_crs_sv) {
> > > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > > + if (!pci_bus_crs_vendor_id(id))
> > > > + break;
> > > > + } else {
> > > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > + if (!PCI_POSSIBLE_ERROR(id))
> > > > + break;
> > > > + }
> > > >
> > > >
> > > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > > initially 0xffffffff. If I extend the condition with
> > > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the
> > > > patch description, it would break VF.
> > > > I'm not sure where the issue is, but given it breaks only when running
> > > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > > Visibility" in that case.
> > >
> > > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > > Vendor ID, so pci_dev_wait() should exit immediately.
> >
> > I'm not sure what is going on there either, but my _guess_ is that the
> > loop exits too early due to the above. And it makes some further actions
> > to fail.
>
> When RRS SV is enabled, reading PCI_VENDOR_ID should always return
> 0x0001 (if the device isn't ready and responds with RRS status) or the
> valid Vendor ID. I don't think it should ever return 0xffff (unless
> the device is powered off, unplugged, or broken, of course).
Maybe it isn't really enabled when Xen is involved?
Looking at the lspci output for this device's bridge, I do see
RootCtl: ... CRSVisible+, but maybe something else is needed too?
Just in case, here is the full lspci -vvvs 2.2 (the bridge):
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge (prog-if 00 [Normal decode])
Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 102
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
Memory behind bridge: 90b00000-90bfffff [size=1M] [32-bit]
Prefetchable memory behind bridge: 8010900000-80109fffff [size=1M] [32-bit]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Port (Slot+), IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag+ RBE+ TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #4, Speed 16GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 75W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCap: CRSVisible+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd+
AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee08000 Data: 4000
Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [270 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
Capabilities: [370 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=150us
L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
T_CommonMode=10us LTR1.2_Threshold=166912ns
L1SubCtl2: T_PwrOn=150us
Capabilities: [400 v1] Data Link Feature <?>
Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [440 v1] Lane Margining at the Receiver
PortCap: Uses Driver-
PortSta: MargReady- MargSoftReady-
Kernel driver in use: pcieport
>
> > > But the log at
> > > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > > says it *doesn't* exit and eventually times out.
> >
> > Note this log is from "working" kernel, so that timeout must be
> > something else.
>
> I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
> since there are no interesting messages from the "BUG PRESENT" part.
> Awfully funny coincidence if it's unrelated.
I have seen the timeout before, possibly also on different hardware
(although I'm not 100% sure); I think it's a different issue.
> > > And the lspci above shows ~0 data for much of the header, even though
> > > the device must be ready by then.
> > >
> > > I don't have any good ideas, but since the problem only happens with
> > > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > > could instrument xen_pcibk_config_read() and see if there's something
> > > wonky going on there?
> >
> > This one is used when pcifront (from a different PV VM) is asking pciback
> > to read something. I see the issue even before starting any other VM and
> > not even attaching the device to the xen-pciback driver...
>
> The report claims the problem only happens with Xen. I'm not a Xen
> person, and I don't know how to find the relevant config accessors.
> The snippets of kernel messages I see at [1] all mention pciback, so
> that's my only clue of where to look. Bottom line, I have no idea
> what the config accessor path is, and maybe we could learn something
> by looking at whatever it is.
AFAIK there are no separate config accessors under Xen dom0; the default
ones are used. xen-pcifront takes over PCI config space access (and a few
more things) only in a domU (and only for PV), when PCI passthrough is
used. Here, it didn't get that far...
But then, Xen may intercept such accesses [2]. If I read it right, it
should allow all accesses (is_hardware_domain(dom0)==true, and the
device is not in ro_map, otherwise reset wouldn't work at all).
>
> [1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
[2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:22 ` Marek Marczykowski-Górecki
@ 2025-01-29 3:40 ` Bjorn Helgaas
2025-01-29 3:47 ` Marek Marczykowski-Górecki
2025-01-29 9:17 ` Jan Beulich
1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 3:40 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
On Wed, Jan 29, 2025 at 04:22:43AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> > > On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > > > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > > > > all 0xff when accessing its config space. This happens only after device
> > > > > reset (which is also triggered when binding the device to the
> > > > > xen-pciback driver).
> > > >
> > > > Thanks for the report and for all the debugging you've already done!
> > > >
> > > > > Reproducer:
> > > > >
> > > > > # lspci -xs 01:00.0
> > > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > > > ...
> > > > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > > > # lspci -xs 01:00.0
> > > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >
> > > > > The same operation done on Linux 6.12 running without Xen works fine.
> > > > >
> > > > > git bisect points at:
> > > > >
> > > > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > > > Date: Tue Aug 27 18:48:46 2024 -0500
> > > > >
> > > > > PCI: Wait for device readiness with Configuration RRS
> > > > >
> > > > > part of that commit:
> > > > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > > > return -ENOTTY;
> > > > > }
> > > > >
> > > > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > > - if (!PCI_POSSIBLE_ERROR(id))
> > > > > - break;
> > > > > + if (root && root->config_crs_sv) {
> > > > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > > > + if (!pci_bus_crs_vendor_id(id))
> > > > > + break;
> > > > > + } else {
> > > > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > > + if (!PCI_POSSIBLE_ERROR(id))
> > > > > + break;
> > > > > + }
> > > > >
> > > > >
> > > > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > > > initially 0xffffffff. If I extend the condition with
> > > > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappear. But reading the
> > > > > patch description, it would break VF.
> > > > > I'm not sure where the issue is, but given it breaks only when running
> > > > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > > > Visibility" in that case.
> > > >
> > > > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > > > Vendor ID, so pci_dev_wait() should exit immediately.
> > >
> > > I'm not sure what is going on there either, but my _guess_ is that the
> > > loop exits too early due to the above. And it makes some further actions
> > > to fail.
> >
> > When RRS SV is enabled, reading PCI_VENDOR_ID should always return
> > 0x0001 (if the device isn't ready and responds with RRS status) or the
> > valid Vendor ID. I don't think it should ever return 0xffff (unless
> > the device is powered off, unplugged, or broken, of course).
>
> Maybe it isn't really enabled when Xen is involved?
> By looking at lspci of the bridge for this device, I do see RootCtl: ...
> CRSVisible+, but maybe there is something else needed too?
As far as I know, CRSVisible+ is all that's needed to enable this, and
Linux always enables it if it's supported [3].
> Just in case, full lspci -vvvs 2.2 (the bridge):
>
>
> 00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge (prog-if 00 [Normal decode])
> Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin ? routed to IRQ 102
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
> Memory behind bridge: 90b00000-90bfffff [size=1M] [32-bit]
> Prefetchable memory behind bridge: 8010900000-80109fffff [size=1M] [32-bit]
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Express (v2) Root Port (Slot+), IntMsgNum 0
> DevCap: MaxPayload 256 bytes, PhantFunc 0
> ExtTag+ RBE+ TEE-IO-
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> LnkCap: Port #4, Speed 16GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
> ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
> ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 5GT/s, Width x1
> TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> Slot #0, PowerLimit 75W; Interlock- NoCompl+
> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> Changed: MRL- PresDet- LinkState+
> RootCap: CRSVisible+
> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
> RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
> 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd+
> AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
> DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
> AtomicOpsCtl: ReqEn- EgressBlck-
> IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
> 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
> LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
> LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
> Retimer- 2Retimers- CrosslinkRes: unsupported
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee08000 Data: 4000
> Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
> Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Capabilities: [270 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> LaneErrStat: 0
> Capabilities: [2a0 v1] Access Control Services
> ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> Capabilities: [370 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=10us PortTPowerOnTime=150us
> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> T_CommonMode=10us LTR1.2_Threshold=166912ns
> L1SubCtl2: T_PwrOn=150us
> Capabilities: [400 v1] Data Link Feature <?>
> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [440 v1] Lane Margining at the Receiver
> PortCap: Uses Driver-
> PortSta: MargReady- MargSoftReady-
> Kernel driver in use: pcieport
>
> > > > But the log at
> > > > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > > > says it *doesn't* exit and eventually times out.
> > >
> > > Note this log is from "working" kernel, so that timeout must be
> > > something else.
> >
> > I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
> > since there are no interesting messages from the "BUG PRESENT" part.
> > Awfully funny coincidence if it's unrelated.
>
> I have seen the timeout before, possibly also on different hardware
> (although I'm not 100% sure); I think it's a different issue.
>
> > > > And the lspci above shows ~0 data for much of the header, even though
> > > > the device must be ready by then.
> > > >
> > > > I don't have any good ideas, but since the problem only happens with
> > > > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > > > could instrument xen_pcibk_config_read() and see if there's something
> > > > wonky going on there?
> > >
> > > This one is used when pcifront (from a different PV VM) is asking pciback
> > > to read something. I see the issue even before starting any other VM and
> > > not even attaching the device to the xen-pciback driver...
> >
> > The report claims the problem only happens with Xen. I'm not a Xen
> > person, and I don't know how to find the relevant config accessors.
> > The snippets of kernel messages I see at [1] all mention pciback, so
> > that's my only clue of where to look. Bottom line, I have no idea
> > what the config accessor path is, and maybe we could learn something
> > by looking at whatever it is.
>
> AFAIK there are no separate config accessors under Xen dom0, the default
> ones are used. xen-pcifront takes over PCI config space access (and a few
> more) only in a domU (and only for PV), when PCI passthrough is used.
> Here, it didn't get that far...
>
> But then, Xen may intercept such access [2]. If I read it right, it
> should allow all access (is_hardware_domain(dom0)==true, and also the
> device is not on ro_map - otherwise reset wouldn't work at all).
I guess the code at [2] is running in user mode and uses Linux
syscalls for config access? Is it straceable?
Can you reproduce this without Xen at all? If so, can you post a
complete dmesg and complete lspci -vv somewhere?
> > [1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
>
> [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/probe.c?id=v6.13#n1208
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:40 ` Bjorn Helgaas
@ 2025-01-29 3:47 ` Marek Marczykowski-Górecki
2025-01-29 13:32 ` Bjorn Helgaas
0 siblings, 1 reply; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 3:47 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
> I guess the code at [2] is running in user mode and uses Linux
> syscalls for config access? Is it straceable?
Nope, it's running as the hypervisor and mediates Linux's access to the
hardware. In fact, the Linux PV kernel (which dom0 is by default under Xen)
runs in ring 3...
But I can add some more logging there.
> Can you reproduce this without Xen at all? If so, can you post a
> complete dmesg and complete lspci -vv somewhere?
I haven't managed to reproduce it without Xen so far. But I can't
rule out that it's a race condition that is simply unlikely to hit when
Linux runs natively.
> > > [1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> >
> > [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
>
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/probe.c?id=v6.13#n1208
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:22 ` Marek Marczykowski-Górecki
2025-01-29 3:40 ` Bjorn Helgaas
@ 2025-01-29 9:17 ` Jan Beulich
2025-01-29 11:53 ` Marek Marczykowski-Górecki
2025-01-29 13:28 ` Bjorn Helgaas
1 sibling, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2025-01-29 9:17 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Bjorn Helgaas
On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
>> The report claims the problem only happens with Xen. I'm not a Xen
>> person, and I don't know how to find the relevant config accessors.
>> The snippets of kernel messages I see at [1] all mention pciback, so
>> that's my only clue of where to look. Bottom line, I have no idea
>> what the config accessor path is, and maybe we could learn something
>> by looking at whatever it is.
>
> AFAIK there are no separate config accessors under Xen dom0, the default
> ones are used. xen-pcifront takes over PCI config space access (and a few
> more) only in a domU (and only for PV), when PCI passthrough is used.
> Here, it didn't get that far...
>
> But then, Xen may intercept such access [2]. If I read it right, it
> should allow all access (is_hardware_domain(dom0)==true, and also the
> device is not on ro_map - otherwise reset wouldn't work at all).
The other day you mentioned (on Matrix I think) that you observe mmcfg
not being used on that system. Am I misremembering? (Since the capability
where the control bit lives is an extended one, that capability would
neither be read nor modified when mmcfg is unavailable.)
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 9:17 ` Jan Beulich
@ 2025-01-29 11:53 ` Marek Marczykowski-Górecki
2025-01-29 12:49 ` Jan Beulich
2025-01-29 13:28 ` Bjorn Helgaas
1 sibling, 1 reply; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 11:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Bjorn Helgaas
On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> >> The report claims the problem only happens with Xen. I'm not a Xen
> >> person, and I don't know how to find the relevant config accessors.
> >> The snippets of kernel messages I see at [1] all mention pciback, so
> >> that's my only clue of where to look. Bottom line, I have no idea
> >> what the config accessor path is, and maybe we could learn something
> >> by looking at whatever it is.
> >
> > AFAIK there are no separate config accessors under Xen dom0, the default
> > ones are used. xen-pcifront takes over PCI config space access (and a few
> > more) only in a domU (and only for PV), when PCI passthrough is used.
> > Here, it didn't get that far...
> >
> > But then, Xen may intercept such access [2]. If I read it right, it
> > should allow all access (is_hardware_domain(dom0)==true, and also the
> > device is not on ro_map - otherwise reset wouldn't work at all).
>
> The other day you mentioned (on Matrix I think) that you observe mmcfg
> not being used on that system. Am I misremembering? (Since the capability
> where the control bit lives is an extended one, that capability would
> neither be read nor modified when mmcfg is unavailable.)
Yes, but later (once dom0 starts) it switched back to mmcfg. Now I see
this:
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
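For reference, a minimal sketch of how an MCFG (ECAM) base like the one
logged above maps to a per-function config window. It assumes the
standard ECAM layout from the PCIe specification and that the MCFG entry
starts at bus 00, as in the log line. The function name ecam_addr() is
made up for illustration and is not a Xen or Linux symbol.

```c
#include <stdint.h>

/*
 * Standard ECAM layout: each function gets a 4 KiB window at
 *   base + (bus << 20 | dev << 15 | fn << 12)
 * and the low 12 bits select the register inside that window.
 * ecam_addr() is a hypothetical helper, not taken from Xen or Linux.
 */
static uint64_t ecam_addr(uint64_t base, unsigned bus, unsigned dev,
                          unsigned fn, unsigned reg)
{
    return base + (((uint64_t)bus << 20) |
                   ((uint64_t)dev << 15) |
                   ((uint64_t)fn << 12) |
                   (reg & 0xfff));
}
```

With base e0000000, the window for 01:00.0 would start at 0xe0100000
and the one for the bridge 00:02.2 at 0xe0012000.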
Another thing I noticed in the bug report: the reporter says a warm
reboot from 6.11 (where it works) to 6.12 avoids the issue (not sure
about further reboots). A cold boot directly into 6.12 results in this
buggy behavior.
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 11:53 ` Marek Marczykowski-Górecki
@ 2025-01-29 12:49 ` Jan Beulich
0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2025-01-29 12:49 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Bjorn Helgaas
On 29.01.2025 12:53, Marek Marczykowski-Górecki wrote:
> On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
>> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
>>> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
>>>> The report claims the problem only happens with Xen. I'm not a Xen
>>>> person, and I don't know how to find the relevant config accessors.
>>>> The snippets of kernel messages I see at [1] all mention pciback, so
>>>> that's my only clue of where to look. Bottom line, I have no idea
>>>> what the config accessor path is, and maybe we could learn something
>>>> by looking at whatever it is.
>>>
>>> AFAIK there are no separate config accessors under Xen dom0, the default
>>> ones are used. xen-pcifront takes over PCI config space access (and a few
>>> more) only in a domU (and only for PV), when PCI passthrough is used.
>>> Here, it didn't get that far...
>>>
>>> But then, Xen may intercept such access [2]. If I read it right, it
>>> should allow all access (is_hardware_domain(dom0)==true, and also the
>>> device is not on ro_map - otherwise reset wouldn't work at all).
>>
>> The other day you mentioned (on Matrix I think) that you observe mmcfg
>> not being used on that system. Am I misremembering? (Since the capability
>> where the control bit lives is an extended one, that capability would
>> neither be read nor modified when mmcfg is unavailable.)
>
> Yes, but later (once dom0 starts) it switched back to mmcfg. Now I see
> this:
> (XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
> (XEN) PCI: Using MCFG for segment 0000 bus 00-ff
>
> Another thing I noticed in the bug report - the reporter says warm
> reboot from 6.11 (where it works) to 6.12 avoids the issue (not sure
> about further reboots). Cold boot directly to 6.12 results in this buggy
> behavior.
Makes things yet more odd, imo.
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 9:17 ` Jan Beulich
2025-01-29 11:53 ` Marek Marczykowski-Górecki
@ 2025-01-29 13:28 ` Bjorn Helgaas
2025-01-29 13:54 ` Jan Beulich
1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 13:28 UTC (permalink / raw)
To: Jan Beulich
Cc: Marek Marczykowski-Górecki, Bjorn Helgaas,
Jürgen Groß, Roger Pau Monné, Boris Ostrovsky,
xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee, linux-pci
On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> >> The report claims the problem only happens with Xen. I'm not a Xen
> >> person, and I don't know how to find the relevant config accessors.
> >> The snippets of kernel messages I see at [1] all mention pciback, so
> >> that's my only clue of where to look. Bottom line, I have no idea
> >> what the config accessor path is, and maybe we could learn something
> >> by looking at whatever it is.
> >
> > AFAIK there are no separate config accessors under Xen dom0, the default
> > ones are used. xen-pcifront takes over PCI config space access (and a few
> > more) only in a domU (and only for PV), when PCI passthrough is used.
> > Here, it didn't get that far...
> >
> > But then, Xen may intercept such access [2]. If I read it right, it
> > should allow all access (is_hardware_domain(dom0)==true, and also the
> > device is not on ro_map - otherwise reset wouldn't work at all).
>
> The other day you mentioned (on Matrix I think) that you observe mmcfg
> not being used on that system. Am I misremembering? (Since the capability
> where the control bit lives is an extended one, that capability would
> neither be read nor modified when mmcfg is unavailable.)
If you're referring to the Configuration RRS Software Visibility
Enable bit, that's in the PCIe Capability Root Control register, which
is in the PCI-compatible config space (the first 256 bytes), not the
extended config space.
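As a rough illustration of the point above, here is a small sketch that
computes where Root Control lands for the bridge in the quoted lspci
output (PCIe capability at [58]) and checks the enable bit. The register
offset and bit values mirror include/uapi/linux/pci_regs.h; the macro and
function names are local to this sketch, and the RootCtl value 0x0018
used in the note below is only inferred from the "PMEIntEna+
CRSVisible+" flags, not read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/uapi/linux/pci_regs.h; names are local to this sketch. */
#define EXP_RTCTL          0x1c    /* Root Control, relative to the PCIe capability */
#define EXP_RTCTL_PMEIE    0x0008  /* PME Interrupt Enable */
#define EXP_RTCTL_RRS_SVE  0x0010  /* Config RRS Software Visibility Enable */

/* Absolute config-space offset of Root Control for a given capability position. */
static unsigned rtctl_offset(unsigned pcie_cap_pos)
{
    return pcie_cap_pos + EXP_RTCTL;
}

/* True if the offset is in the PCI-compatible region (first 256 bytes). */
static bool in_compat_space(unsigned offset)
{
    return offset < 0x100;
}

/* True if the RRS Software Visibility Enable bit is set in a Root Control value. */
static bool rrs_sv_enabled(uint16_t rtctl)
{
    return (rtctl & EXP_RTCTL_RRS_SVE) != 0;
}
```

For the capability at 0x58 this gives offset 0x74, well below 0x100, so
the bit is reachable without mmcfg.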
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:47 ` Marek Marczykowski-Górecki
@ 2025-01-29 13:32 ` Bjorn Helgaas
2025-01-29 13:52 ` Jan Beulich
0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 13:32 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Jan Beulich
On Wed, Jan 29, 2025 at 04:47:28AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
> > I guess the code at [2] is running in user mode and uses Linux
> > syscalls for config access? Is it straceable?
>
> Nope, it's running as the hypervisor and mediates Linux's access to the
> hardware. In fact, Linux PV kernel (which dom0 is by default under Xen)
> is running in ring 3...
>
> But I can add some more logging there.
So I guess the hypervisor performs the config access on behalf of the
Linux PV kernel? Obviously Linux thinks CRS/RRS SV is enabled, but I
suppose all the lspci output is similarly based on whatever the
hypervisor does, so how do we know the actual hardware config?
> > > [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 13:32 ` Bjorn Helgaas
@ 2025-01-29 13:52 ` Jan Beulich
2025-01-29 14:50 ` Bjorn Helgaas
0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2025-01-29 13:52 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Marek Marczykowski-Górecki
On 29.01.2025 14:32, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2025 at 04:47:28AM +0100, Marek Marczykowski-Górecki wrote:
>> On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
>>> I guess the code at [2] is running in user mode and uses Linux
>>> syscalls for config access? Is it straceable?
>>
>> Nope, it's running as the hypervisor and mediates Linux's access to the
>> hardware. In fact, Linux PV kernel (which dom0 is by default under Xen)
>> is running in ring 3...
>>
>> But I can add some more logging there.
>
> So I guess the hypervisor performs the config access on behalf of the
> Linux PV kernel? Obviously Linux thinks CRS/RRS SV is enabled, but I
> suppose all the lspci output is similarly based on whatever the
> hypervisor does, so how do we know the actual hardware config?
The hypervisor only intercepts config space writes; reads, particularly
when going via mmcfg, ought to be unaffected, and hence what lspci shows
should be "the actual hardware config".
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 13:28 ` Bjorn Helgaas
@ 2025-01-29 13:54 ` Jan Beulich
0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2025-01-29 13:54 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Marek Marczykowski-Górecki, Bjorn Helgaas,
Jürgen Groß, Roger Pau Monné, Boris Ostrovsky,
xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee, linux-pci
On 29.01.2025 14:28, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
>> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
>>> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
>>>> The report claims the problem only happens with Xen. I'm not a Xen
>>>> person, and I don't know how to find the relevant config accessors.
>>>> The snippets of kernel messages I see at [1] all mention pciback, so
>>>> that's my only clue of where to look. Bottom line, I have no idea
>>>> what the config accessor path is, and maybe we could learn something
>>>> by looking at whatever it is.
>>>
>>> AFAIK there are no separate config accessors under Xen dom0, the default
>>> ones are used. xen-pcifront takes over PCI config space access (and a few
>>> more) only in a domU (and only for PV), when PCI passthrough is used.
>>> Here, it didn't get that far...
>>>
>>> But then, Xen may intercept such access [2]. If I read it right, it
>>> should allow all access (is_hardware_domain(dom0)==true, and also the
>>> device is not on ro_map - otherwise reset wouldn't work at all).
>>
>> The other day you mentioned (on Matrix I think) that you observe mmcfg
>> not being used on that system. Am I misremembering? (Since the capability
>> where the control bit lives is an extended one, that capability would
>> neither be read nor modified when mmcfg is unavailable.)
>
> If you're referring to the Configuration RRS Software Visibility
> Enable bit, that's in the PCIe Capability Root Control register, which
> is in the PCI-compatible config space (the first 256 bytes), not the
> extended config space.
Oh, I clearly didn't read Marek's earlier mail correctly. I'm sorry for that.
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 13:52 ` Jan Beulich
@ 2025-01-29 14:50 ` Bjorn Helgaas
0 siblings, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 14:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Marek Marczykowski-Górecki
On Wed, Jan 29, 2025 at 02:52:33PM +0100, Jan Beulich wrote:
> On 29.01.2025 14:32, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2025 at 04:47:28AM +0100, Marek Marczykowski-Górecki wrote:
> >> On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
> >>> I guess the code at [2] is running in user mode and uses Linux
> >>> syscalls for config access? Is it straceable?
> >>
> >> Nope, it's running as the hypervisor and mediates Linux's access to the
> >> hardware. In fact, Linux PV kernel (which dom0 is by default under Xen)
> >> is running in ring 3...
> >>
> >> But I can add some more logging there.
> >
> > So I guess the hypervisor performs the config access on behalf of the
> > Linux PV kernel? Obviously Linux thinks CRS/RRS SV is enabled, but I
> > suppose all the lspci output is similarly based on whatever the
> > hypervisor does, so how do we know the actual hardware config?
>
> The hypervisor only intercepts config space writes; reads, particularly
> when going via mmcfg, ought to be unaffected, and hence what lspci shows
> should be "the actual hardware config".
FWIW, on x86, accesses to the first 256 bytes of config space in PCI
domain (segment) 0 use raw_pci_ops even when mmcfg is available (see
raw_pci_read()). This normally means IO ports 0xcf8/0xcfc.
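A simplified sketch of that dispatch rule, plus the classic 0xcf8
address encoding for registers below 256. This is a standalone
illustration, not the kernel code: pick_accessor(), conf1_address() and
the enum are hypothetical names, and the "have_*" flags stand in for
whether the respective ops are registered.

```c
#include <stdint.h>

enum accessor { ACC_CONF1_IOPORT, ACC_MMCFG, ACC_NONE };

/*
 * Mirrors the rule described above: PCI domain 0 and reg < 256 go through
 * the port-based ops when present; everything else needs the extended
 * (mmcfg) ops. Hypothetical helper, not the kernel's raw_pci_read().
 */
static enum accessor pick_accessor(unsigned domain, unsigned reg,
                                   int have_conf1, int have_mmcfg)
{
    if (domain == 0 && reg < 256 && have_conf1)
        return ACC_CONF1_IOPORT;
    if (have_mmcfg)
        return ACC_MMCFG;
    return ACC_NONE;
}

/* Configuration mechanism #1 address written to port 0xcf8 (reg < 256 only). */
static uint32_t conf1_address(unsigned bus, unsigned dev, unsigned fn,
                              unsigned reg)
{
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xfc);
}
```

Under these assumptions a Vendor ID read of 01:00.0 would use the port
path with 0x80010000 written to 0xcf8, while a read of an extended
capability at 0x100 or above would go through mmcfg.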
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 2:10 ` Marek Marczykowski-Górecki
2025-01-29 3:03 ` Bjorn Helgaas
@ 2025-01-29 18:48 ` Bjorn Helgaas
2025-01-30 4:55 ` Marek Marczykowski-Górecki
1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 18:48 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > > all 0xff when accessing its config space. This happens only after device
> > > reset (which is also triggered when binding the device to the
> > > xen-pciback driver).
> >
> > Thanks for the report and for all the debugging you've already done!
> >
> > > Reproducer:
> > >
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > ...
> > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >
> > > The same operation done on Linux 6.12 running without Xen works fine.
> > >
> > > git bisect points at:
> > >
> > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date: Tue Aug 27 18:48:46 2024 -0500
> > >
> > > PCI: Wait for device readiness with Configuration RRS
> > >
> > > part of that commit:
> > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > return -ENOTTY;
> > > }
> > >
> > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > - if (!PCI_POSSIBLE_ERROR(id))
> > > - break;
> > > + if (root && root->config_crs_sv) {
> > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > + if (!pci_bus_crs_vendor_id(id))
> > > + break;
> > > + } else {
> > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > + if (!PCI_POSSIBLE_ERROR(id))
> > > + break;
> > > + }
> > >
> > >
> > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > initially 0xffffffff. If I extend the condition with
> > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappears. But reading the
> > > patch description, it would break VF.
> > > I'm not sure where the issue is, but given it breaks only when running
> > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > Visibility" in that case.
> >
> > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > Vendor ID, so pci_dev_wait() should exit immediately.
>
> I'm not sure what is going on there either, but my _guess_ is that the
> loop exits too early due to the above, and that makes some further
> actions fail.
Seems like a good guess worth investigating. Maybe log all config
accesses to this device after the FLR and see what we're doing?
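To make the three readiness checks discussed in the quoted text concrete,
here is a standalone sketch. It assumes that the RRS check compares the
low 16 bits of the dword against the reserved Vendor ID 0x0001 and that
the "possible error" check is a comparison against all ones. The function
names are invented for this example and are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits equal the reserved Vendor ID 0x0001: device answered with RRS. */
static bool is_rrs_vendor_id(uint32_t id)
{
    return (id & 0xffff) == 0x0001;
}

/* All ones: what a failed or unclaimed config read typically returns. */
static bool is_possible_error(uint32_t id)
{
    return id == 0xffffffffu;
}

/* Pre-6.12 style check on the PCI_COMMAND dword: stop polling unless ~0. */
static bool ready_old(uint32_t command_dword)
{
    return !is_possible_error(command_dword);
}

/* 6.12 style check on the PCI_VENDOR_ID dword when RRS SV is enabled. */
static bool ready_new(uint32_t vendor_dword)
{
    return !is_rrs_vendor_id(vendor_dword);
}

/* The experimental variant from the report: also reject an all-ones read. */
static bool ready_new_plus_error_check(uint32_t vendor_dword)
{
    return !is_rrs_vendor_id(vendor_dword) &&
           !is_possible_error(vendor_dword);
}
```

With a read of 0xffffffff, ready_new() reports "ready" and the loop
exits at once, whereas ready_old() and ready_new_plus_error_check() keep
polling. That matches the guess that the loop exits too early.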
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 18:48 ` Bjorn Helgaas
@ 2025-01-30 4:55 ` Marek Marczykowski-Górecki
2025-01-30 9:30 ` Jan Beulich
0 siblings, 1 reply; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-30 4:55 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Wed, Jan 29, 2025 at 12:48:25PM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > > After updating PV dom0 to Linux 6.12, The Mediatek MT7922 device reports
> > > > all 0xff when accessing its config space. This happens only after device
> > > > reset (which is also triggered when binding the device to the
> > > > xen-pciback driver).
> > >
> > > Thanks for the report and for all the debugging you've already done!
> > >
> > > > Reproducer:
> > > >
> > > > # lspci -xs 01:00.0
> > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > > ...
> > > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > > # lspci -xs 01:00.0
> > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >
> > > > The same operation done on Linux 6.12 running without Xen works fine.
> > > >
> > > > git bisect points at:
> > > >
> > > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > > Date: Tue Aug 27 18:48:46 2024 -0500
> > > >
> > > > PCI: Wait for device readiness with Configuration RRS
> > > >
> > > > part of that commit:
> > > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > > return -ENOTTY;
> > > > }
> > > >
> > > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > - if (!PCI_POSSIBLE_ERROR(id))
> > > > - break;
> > > > + if (root && root->config_crs_sv) {
> > > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > > + if (!pci_bus_crs_vendor_id(id))
> > > > + break;
> > > > + } else {
> > > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > + if (!PCI_POSSIBLE_ERROR(id))
> > > > + break;
> > > > + }
> > > >
> > > >
> > > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > > initially 0xffffffff. If I extend the condition with
> > > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappears. But reading the
> > > > patch description, it would break VF.
> > > > I'm not sure where the issue is, but given it breaks only when running
> > > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > > Visibility" in that case.
> > >
> > > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > > Vendor ID, so pci_dev_wait() should exit immediately.
> >
> > I'm not sure what is going on there either, but my _guess_ is that the
> > loop exits too early due to the above, and that makes some further
> > actions fail.
>
> Seems like a good guess worth investigating. Maybe log all config
> accesses to this device after the FLR and see what we're doing?
I've added logging of all config reads/writes to this device. The full
log is at [1].
A little explanation:
- the logging is done in pci_conf_read/pci_conf_write in https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pci.c;h=97b792e578f1093194466081ad3651ade21cae7d;hb=HEAD
- cf8 is the cf8 port value (BDF + register)
- bytes is the read/write size (1/2/4)
- offset is the offset within the register (on top of cf8); it is not
  applied to data
- data is either the value read or the value written, depending on the
  function
- it's logging only accesses to 01:00.0
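For anyone decoding the trace by hand, here is a minimal sketch of how a cf8 value splits into bus/device/function/register under the legacy configuration mechanism #1 layout. The names cf8_fields, cf8_decode and cf8_byte_addr are made up for illustration and are not Xen code. The sketch assumes the logged offset is simply added to the dword register, as described above.

```c
#include <stdint.h>

/* Fields of a legacy 0xCF8 CONFIG_ADDRESS value (illustrative names). */
struct cf8_fields {
    unsigned enable; /* bit 31: config access enable */
    unsigned bus;    /* bits 23:16 */
    unsigned dev;    /* bits 15:11 */
    unsigned fn;     /* bits 10:8 */
    unsigned reg;    /* bits 7:2: dword-aligned register number */
};

static struct cf8_fields cf8_decode(uint32_t cf8)
{
    struct cf8_fields f;

    f.enable = cf8 >> 31;
    f.bus = (cf8 >> 16) & 0xff;
    f.dev = (cf8 >> 11) & 0x1f;
    f.fn = (cf8 >> 8) & 0x7;
    f.reg = cf8 & 0xfc;
    return f;
}

/* Config-space byte address touched: dword register plus the logged offset. */
static unsigned cf8_byte_addr(uint32_t cf8, unsigned offset)
{
    return (cf8 & 0xfc) + offset;
}
```

With this, 0x80010088 decodes to 01:00.0 register 0x88, and an access logged with offset 2 touches byte 0x8a.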
Interesting part:
lspci before reset:
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
(XEN) d0v3 conf read cf8 0x80010004 bytes 4 offset 0 data 0x100000
(XEN) d0v3 conf read cf8 0x80010008 bytes 4 offset 0 data 0x2800000
(XEN) d0v3 conf read cf8 0x8001000c bytes 4 offset 0 data 0x10
(XEN) d0v3 conf read cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v3 conf read cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v3 conf read cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v3 conf read cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010028 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
(XEN) d0v3 conf read cf8 0x80010030 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010034 bytes 4 offset 0 data 0x80
(XEN) d0v3 conf read cf8 0x80010038 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x8001003c bytes 4 offset 0 data 0x1ff
(XEN) d0v3 conf read cf8 0x80010080 bytes 4 offset 0 data 0x2e010
(XEN) d0v3 conf read cf8 0x800100e0 bytes 4 offset 0 data 0x18af805
(XEN) d0v3 conf read cf8 0x800100f8 bytes 4 offset 0 data 0xc8030001
reset:
(XEN) d0v1 conf read cf8 0x800100fc bytes 2 offset 0 data 0x8
(XEN) d0v1 conf read cf8 0x800100fc bytes 2 offset 0 data 0x8
(XEN) d0v1 conf read cf8 0x8001008c bytes 4 offset 0 data 0x145dc12
(XEN) d0v1 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
(XEN) d0v1 conf read cf8 0x80010004 bytes 4 offset 0 data 0x100000
(XEN) d0v1 conf read cf8 0x80010008 bytes 4 offset 0 data 0x2800000
(XEN) d0v1 conf read cf8 0x8001000c bytes 4 offset 0 data 0x10
(XEN) d0v1 conf read cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v1 conf read cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v1 conf read cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v1 conf read cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x80010028 bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
(XEN) d0v1 conf read cf8 0x80010030 bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x80010034 bytes 4 offset 0 data 0x80
(XEN) d0v1 conf read cf8 0x80010038 bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x8001003c bytes 4 offset 0 data 0x1ff
(XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
(XEN) d0v1 conf read cf8 0x80010090 bytes 2 offset 0 data 0x1c2
(XEN) d0v1 conf read cf8 0x800100a8 bytes 2 offset 0 data 0x400
(XEN) d0v1 conf read cf8 0x800100b0 bytes 2 offset 0 data 0x2
(XEN) d0v1 conf read cf8 0x80010004 bytes 2 offset 2 data 0x10
(XEN) d0v1 conf read cf8 0x80010034 bytes 1 offset 0 data 0x80
(XEN) d0v1 conf read cf8 0x80010080 bytes 2 offset 0 data 0xe010
(XEN) d0v1 conf read cf8 0x800100e0 bytes 2 offset 0 data 0xf805
(XEN) d0v1 conf read cf8 0x800100f8 bytes 2 offset 0 data 0x1
(XEN) d0v1 conf write cf8 0x80010004 bytes 2 offset 0 data 0x400
(XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
(XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
(XEN) d0v1 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
(XEN) d0v2 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf read cf8 0x80010090 bytes 2 offset 0 data 0xffff
(XEN) d0v2 conf write cf8 0x80010090 bytes 2 offset 0 data 0xfffc
(XEN) d0v2 conf write cf8 0x80010090 bytes 2 offset 0 data 0xffff
(XEN) d0v2 conf write cf8 0x80010088 bytes 2 offset 0 data 0x2910
(XEN) d0v2 conf write cf8 0x80010090 bytes 2 offset 0 data 0x1c2
(XEN) d0v2 conf write cf8 0x800100a8 bytes 2 offset 0 data 0x400
(XEN) d0v2 conf write cf8 0x800100b0 bytes 2 offset 0 data 0x2
(XEN) d0v2 conf read cf8 0x8001003c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001003c bytes 4 offset 0 data 0x1ff
(XEN) d0v2 conf read cf8 0x80010038 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010038 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010034 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010034 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010030 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010030 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001002c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
(XEN) d0v2 conf read cf8 0x80010028 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010028 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010024 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010020 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x8001001c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010018 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010014 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x80010010 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v2 conf read cf8 0x8001000c bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x8001000c bytes 4 offset 0 data 0x10
(XEN) d0v2 conf read cf8 0x80010008 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010008 bytes 4 offset 0 data 0x2800000
(XEN) d0v2 conf read cf8 0x80010004 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010004 bytes 4 offset 0 data 0x100000
(XEN) d0v2 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v2 conf write cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
(XEN) d0v2 conf read cf8 0x80010004 bytes 2 offset 2 data 0xffff
(XEN) d0v2 conf read cf8 0x80010034 bytes 1 offset 0 data 0xff
(XEN) d0v2 conf read cf8 0x800100fc bytes 2 offset 0 data 0xffff
[1] https://gist.github.com/marmarek/b4391c71801145e52590e877c559c5e0
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-30 4:55 ` Marek Marczykowski-Górecki
@ 2025-01-30 9:30 ` Jan Beulich
2025-01-30 21:31 ` Bjorn Helgaas
0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2025-01-30 9:30 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Bjorn Helgaas
On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> I've added logging of all config read/write to this device. Full log at
> [1].
>
> A little explanation:
> - it's done in pci_conf_read/pci_conf_write in https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pci.c;h=97b792e578f1093194466081ad3651ade21cae7d;hb=HEAD
> - cf8 means cf8 port value (BDF + register)
> - bytes is read/write size (1/2/4)
> - offset is the offset in the register (on top of cf8), but not in data
> - data is either retrieved value, or written value, depending on
> function
> - it's logging only accesses to 01:00.0
>
> interesting part:
>
> lspci before reset:
> (XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
> (XEN) d0v3 conf read cf8 0x80010004 bytes 4 offset 0 data 0x100000
> (XEN) d0v3 conf read cf8 0x80010008 bytes 4 offset 0 data 0x2800000
> (XEN) d0v3 conf read cf8 0x8001000c bytes 4 offset 0 data 0x10
> (XEN) d0v3 conf read cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
> (XEN) d0v3 conf read cf8 0x80010014 bytes 4 offset 0 data 0x80
> (XEN) d0v3 conf read cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
> (XEN) d0v3 conf read cf8 0x8001001c bytes 4 offset 0 data 0
> (XEN) d0v3 conf read cf8 0x80010020 bytes 4 offset 0 data 0
> (XEN) d0v3 conf read cf8 0x80010024 bytes 4 offset 0 data 0
> (XEN) d0v3 conf read cf8 0x80010028 bytes 4 offset 0 data 0
> (XEN) d0v3 conf read cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
> (XEN) d0v3 conf read cf8 0x80010030 bytes 4 offset 0 data 0
> (XEN) d0v3 conf read cf8 0x80010034 bytes 4 offset 0 data 0x80
> (XEN) d0v3 conf read cf8 0x80010038 bytes 4 offset 0 data 0
> (XEN) d0v3 conf read cf8 0x8001003c bytes 4 offset 0 data 0x1ff
> (XEN) d0v3 conf read cf8 0x80010080 bytes 4 offset 0 data 0x2e010
> (XEN) d0v3 conf read cf8 0x800100e0 bytes 4 offset 0 data 0x18af805
> (XEN) d0v3 conf read cf8 0x800100f8 bytes 4 offset 0 data 0xc8030001
>
> reset:
> (XEN) d0v1 conf read cf8 0x800100fc bytes 2 offset 0 data 0x8
> (XEN) d0v1 conf read cf8 0x800100fc bytes 2 offset 0 data 0x8
> (XEN) d0v1 conf read cf8 0x8001008c bytes 4 offset 0 data 0x145dc12
> (XEN) d0v1 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
> (XEN) d0v1 conf read cf8 0x80010004 bytes 4 offset 0 data 0x100000
> (XEN) d0v1 conf read cf8 0x80010008 bytes 4 offset 0 data 0x2800000
> (XEN) d0v1 conf read cf8 0x8001000c bytes 4 offset 0 data 0x10
> (XEN) d0v1 conf read cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
> (XEN) d0v1 conf read cf8 0x80010014 bytes 4 offset 0 data 0x80
> (XEN) d0v1 conf read cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
> (XEN) d0v1 conf read cf8 0x8001001c bytes 4 offset 0 data 0
> (XEN) d0v1 conf read cf8 0x80010020 bytes 4 offset 0 data 0
> (XEN) d0v1 conf read cf8 0x80010024 bytes 4 offset 0 data 0
> (XEN) d0v1 conf read cf8 0x80010028 bytes 4 offset 0 data 0
> (XEN) d0v1 conf read cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
> (XEN) d0v1 conf read cf8 0x80010030 bytes 4 offset 0 data 0
> (XEN) d0v1 conf read cf8 0x80010034 bytes 4 offset 0 data 0x80
> (XEN) d0v1 conf read cf8 0x80010038 bytes 4 offset 0 data 0
> (XEN) d0v1 conf read cf8 0x8001003c bytes 4 offset 0 data 0x1ff
> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
> (XEN) d0v1 conf read cf8 0x80010090 bytes 2 offset 0 data 0x1c2
> (XEN) d0v1 conf read cf8 0x800100a8 bytes 2 offset 0 data 0x400
> (XEN) d0v1 conf read cf8 0x800100b0 bytes 2 offset 0 data 0x2
> (XEN) d0v1 conf read cf8 0x80010004 bytes 2 offset 2 data 0x10
> (XEN) d0v1 conf read cf8 0x80010034 bytes 1 offset 0 data 0x80
> (XEN) d0v1 conf read cf8 0x80010080 bytes 2 offset 0 data 0xe010
> (XEN) d0v1 conf read cf8 0x800100e0 bytes 2 offset 0 data 0xf805
> (XEN) d0v1 conf read cf8 0x800100f8 bytes 2 offset 0 data 0x1
> (XEN) d0v1 conf write cf8 0x80010004 bytes 2 offset 0 data 0x400
> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
> (XEN) d0v1 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
This is the express capability's Link Control 2 Register afaict. As per
the copy of the 6.0 spec that I have the top 4 bits have only two
defined encodings - 0b0000 and 0b0001. 0b1000, as is being set here, is
not defined.
Yet then the earlier questions remain: Why has this suddenly become a
problem? And why would this depend on how the present session was
started, and what was running in the previous session? Is this write
perhaps conditional upon something that has changed?
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-30 9:30 ` Jan Beulich
@ 2025-01-30 21:31 ` Bjorn Helgaas
2025-01-31 7:13 ` Jan Beulich
2025-02-05 22:14 ` Marek Marczykowski-Górecki
0 siblings, 2 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2025-01-30 21:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Marek Marczykowski-Górecki, Bjorn Helgaas,
Jürgen Groß, Roger Pau Monné, Boris Ostrovsky,
xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee
On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
> On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> > I've added logging of all config read/write to this device. Full log at
> > [1].
> ...
I suspect there's something wrong with the Root Port RRS SV
configuration.
Can you add the two patches below?
I don't *think* either should make a difference. The first enables
RRS SV earlier and maybe in a cleaner place; the second should avoid
some pointless capability searches that clutter the logs.
What does d0v1/d0v2/d0v3 mean?
Can you add 00:02.2, the Root Port leading to bus 01, so we can see
the RRS SV configuration? Maybe also lspci -vv for both 00:02.2 and
01:00.0?
Maybe include timestamps and try an FLR without Xen (which I assume
works correctly) so we can see how long the device typically takes to
become ready?
Notes below on the snippet that you commented on, Jan. I think it
makes sense until the read after FLR returns 0xffffffff.
> > (XEN) d0v1 conf read cf8 0x80010034 bytes 1 offset 0 data 0x80
PCI_CAPABILITY_LIST, first cap at 0x80
> > (XEN) d0v1 conf read cf8 0x80010080 bytes 2 offset 0 data 0xe010
PCI_CAP_ID_EXP (0x10) at 0x80, next cap at 0xe0
> > (XEN) d0v1 conf read cf8 0x800100e0 bytes 2 offset 0 data 0xf805
PCI_CAP_ID_MSI (0x05) at 0xe0, next cap at 0xf8
> > (XEN) d0v1 conf read cf8 0x800100f8 bytes 2 offset 0 data 0x1
PCI_CAP_ID_PM (0x01) at 0xf8, end of cap list
These caps match the offsets from the lspci output in the full log:
1:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
Subsystem: MEDIATEK Corp. Device e616
Flags: fast devsel, IRQ 255
Memory at 8010900000 (64-bit, prefetchable) [disabled] [size=1M]
Memory at 90b00000 (64-bit, non-prefetchable) [disabled] [size=32K]
Capabilities: [80] Express Endpoint, IntMsgNum 0
Capabilities: [e0] MSI: Enable- Count=1/32 Maskable+ 64bit+
Capabilities: [f8] Power Management version 3
> > (XEN) d0v1 conf write cf8 0x80010004 bytes 2 offset 0 data 0x400
Set PCI_COMMAND_INTX_DISABLE, disable BARs, Bus Master. Looks like
the end of pci_dev_save_and_disable().
> > (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
PCIe Cap at 0x80, PCI_EXP_DEVCTL is 0x08, PCI_EXP_DEVSTA is 0x0a.
0x80010088 would be PCI_EXP_DEVCTL (a 2-byte register), maybe offset 2
gets us to PCI_EXP_DEVSTA? Not sure.
0x0001 PCI_EXP_DEVSTA_CED /* Correctable Error Detected */
0x0008 PCI_EXP_DEVSTA_URD /* Unsupported Request Detected */
Not impossible that these would be set. Lots of URs happen during
enumeration and we're not very good about cleaning these up.
Correctable errors are common for some devices. lspci -vv would
decode the PCIe cap registers, including this.
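To make the "offset 2" question concrete, here is a minimal sketch of how a CF8-style address from the Xen log could be split into bus/device/function/register. It assumes the usual type 1 configuration mechanism layout (enable in bit 31, bus in bits 23:16, device in 15:11, function in 10:8, dword-aligned register in 7:2) and assumes the logged "offset" is a byte offset within that dword. The struct and function names are made up for illustration; they are not from Xen or the kernel.

```c
#include <stdint.h>

/* Hypothetical helper type, not from Xen or Linux. */
struct cf8_addr {
	unsigned int bus;
	unsigned int dev;
	unsigned int fn;
	unsigned int reg;	/* byte address in config space */
};

/*
 * Split a CF8 address plus the byte offset from the log line into
 * its components (assumed layout, see the text above).
 */
static inline struct cf8_addr cf8_decode(uint32_t cf8, unsigned int offset)
{
	struct cf8_addr a;

	a.bus = (cf8 >> 16) & 0xff;
	a.dev = (cf8 >> 11) & 0x1f;
	a.fn  = (cf8 >> 8) & 0x07;
	a.reg = (cf8 & 0xfc) + offset;	/* dword register + byte offset */
	return a;
}
```

Under those assumptions, cf8_decode(0x80010088, 2) gives 01:00.0 register 0x8a, which would be the PCIe cap at 0x80 plus PCI_EXP_DEVSTA (0x0a).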
> > (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
PCI_EXP_DEVCTL:
0x2000 PCI_EXP_DEVCTL_READRQ_512B
0x0800 PCI_EXP_DEVCTL_NOSNOOP_EN
0x0100 PCI_EXP_DEVCTL_EXT_TAG
0x0010 PCI_EXP_DEVCTL_RELAX_EN
> > (XEN) d0v1 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
PCI_EXP_DEVCTL:
set 0x8000 PCI_EXP_DEVCTL_BCR_FLR
This looks like the actual FLR being initiated.
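As a quick cross-check of the decoding above, a small sketch that compares the two Device Control values from the log. The constants are the values listed above (they match include/uapi/linux/pci_regs.h as far as I know); the helper name is hypothetical.

```c
#include <stdint.h>

/* Device Control bits, values as listed in the decoding above. */
#define PCI_EXP_DEVCTL_RELAX_EN		0x0010
#define PCI_EXP_DEVCTL_EXT_TAG		0x0100
#define PCI_EXP_DEVCTL_NOSNOOP_EN	0x0800
#define PCI_EXP_DEVCTL_READRQ_512B	0x2000
#define PCI_EXP_DEVCTL_BCR_FLR		0x8000

/* Hypothetical helper: which bits differ between two DEVCTL values. */
static inline uint16_t devctl_changed_bits(uint16_t before, uint16_t after)
{
	return before ^ after;
}
```

With the logged values, devctl_changed_bits(0x2910, 0xa910) is 0x8000, so the only change in that write is setting PCI_EXP_DEVCTL_BCR_FLR.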
> This is the express capability's Link Control 2 Register afaict.
Unless I'm missing something this is actually Device Control. So far
I think this all looks OK. The next part:
> > (XEN) d0v2 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
looks like a problem. The normal successful read gets 0x061614c3.
After the FLR, if RRS SV is enabled, we should get either 0x0001ffff
or 0x061614c3.
Here we would exit the loop in pci_dev_wait() because we didn't see
0x0001 and we expect that the device is ready and we got a valid
Vendor ID.
So we proceed to restoring config space via pci_restore_state(), where
we restore some PCIe registers and the header (first 64 bytes). My
*guess* is the device isn't ready (or at least not responding) since
all the reads return ~0.
> > [1] https://gist.github.com/marmarek/b4391c71801145e52590e877c559c5e0
commit c2fd12204dcb ("PCI: Enable Configuration RRS SV early")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Thu Jan 30 15:16:40 2025 -0600
PCI: Enable Configuration RRS SV early
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b6536ed599c3..0b013b196d00 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1373,8 +1373,6 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
- pci_enable_rrs_sv(dev);
-
if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
!is_cardbus && !broken) {
unsigned int cmax, buses;
@@ -1615,6 +1613,11 @@ void set_pcie_port_type(struct pci_dev *pdev)
pdev->pcie_cap = pos;
pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, &reg16);
pdev->pcie_flags_reg = reg16;
+
+ type = pci_pcie_type(pdev);
+ if (type == PCI_EXP_TYPE_ROOT_PORT)
+ pci_enable_rrs_sv(pdev);
+
pci_read_config_dword(pdev, pos + PCI_EXP_DEVCAP, &pdev->devcap);
pdev->pcie_mpss = FIELD_GET(PCI_EXP_DEVCAP_PAYLOAD, pdev->devcap);
@@ -1631,7 +1634,6 @@ void set_pcie_port_type(struct pci_dev *pdev)
* correctly so detect impossible configurations here and correct
* the port type accordingly.
*/
- type = pci_pcie_type(pdev);
if (type == PCI_EXP_TYPE_DOWNSTREAM) {
/*
* If pdev claims to be downstream port but the parent
commit 4ea25d50c7c1 ("PCI: Avoid needless capability searches")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Thu Jan 30 14:33:00 2025 -0600
PCI: Avoid needless capability searches
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 869d204a70a3..02d592b81bc6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1742,19 +1742,17 @@ static void pci_restore_pcie_state(struct pci_dev *dev)
static int pci_save_pcix_state(struct pci_dev *dev)
{
- int pos;
struct pci_cap_saved_state *save_state;
+ u8 pos;
+
+ save_state = pci_find_saved_cap(dev, PCI_CAP_ID_PCIX);
+ if (!save_state)
+ return -ENOMEM;
pos = pci_find_capability(dev, PCI_CAP_ID_PCIX);
if (!pos)
return 0;
- save_state = pci_find_saved_cap(dev, PCI_CAP_ID_PCIX);
- if (!save_state) {
- pci_err(dev, "buffer not found in %s\n", __func__);
- return -ENOMEM;
- }
-
pci_read_config_word(dev, pos + PCI_X_CMD,
(u16 *)save_state->cap.data);
@@ -1763,14 +1761,19 @@ static int pci_save_pcix_state(struct pci_dev *dev)
static void pci_restore_pcix_state(struct pci_dev *dev)
{
- int i = 0, pos;
struct pci_cap_saved_state *save_state;
+ u8 pos;
+ int i = 0;
u16 *cap;
save_state = pci_find_saved_cap(dev, PCI_CAP_ID_PCIX);
- pos = pci_find_capability(dev, PCI_CAP_ID_PCIX);
- if (!save_state || !pos)
+ if (!save_state)
return;
+
+ pos = pci_find_capability(dev, PCI_CAP_ID_PCIX);
+ if (!pos)
+ return;
+
cap = (u16 *)&save_state->cap.data[0];
pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index e0bc90597dca..007e4a082e6f 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -35,16 +35,14 @@ void pci_save_ltr_state(struct pci_dev *dev)
if (!pci_is_pcie(dev))
return;
+ save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_LTR);
+ if (!save_state)
+ return;
+
ltr = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_LTR);
if (!ltr)
return;
- save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_LTR);
- if (!save_state) {
- pci_err(dev, "no suspend buffer for LTR; ASPM issues possible after resume\n");
- return;
- }
-
/* Some broken devices only support dword access to LTR */
cap = &save_state->cap.data[0];
pci_read_config_dword(dev, ltr + PCI_LTR_MAX_SNOOP_LAT, cap);
@@ -57,8 +55,11 @@ void pci_restore_ltr_state(struct pci_dev *dev)
u32 *cap;
save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_LTR);
+ if (!save_state)
+ return;
+
ltr = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_LTR);
- if (!save_state || !ltr)
+ if (!ltr)
return;
/* Some broken devices only support dword access to LTR */
diff --git a/drivers/pci/vc.c b/drivers/pci/vc.c
index a4ff7f5f66dd..c39f3be518d4 100644
--- a/drivers/pci/vc.c
+++ b/drivers/pci/vc.c
@@ -355,20 +355,17 @@ int pci_save_vc_state(struct pci_dev *dev)
int i;
for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
- int pos, ret;
struct pci_cap_saved_state *save_state;
+ int pos, ret;
+
+ save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
+ if (!save_state)
+ return -ENOMEM;
pos = pci_find_ext_capability(dev, vc_caps[i].id);
if (!pos)
continue;
- save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
- if (!save_state) {
- pci_err(dev, "%s buffer not found in %s\n",
- vc_caps[i].name, __func__);
- return -ENOMEM;
- }
-
ret = pci_vc_do_save_buffer(dev, pos, save_state, true);
if (ret) {
pci_err(dev, "%s save unsuccessful %s\n",
@@ -392,12 +389,15 @@ void pci_restore_vc_state(struct pci_dev *dev)
int i;
for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
- int pos;
struct pci_cap_saved_state *save_state;
+ int pos;
+
+ save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
+ if (!save_state)
+ continue;
pos = pci_find_ext_capability(dev, vc_caps[i].id);
- save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
- if (!save_state || !pos)
+ if (!pos)
continue;
pci_vc_do_save_buffer(dev, pos, save_state, false);
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-30 21:31 ` Bjorn Helgaas
@ 2025-01-31 7:13 ` Jan Beulich
2025-01-31 8:36 ` Marek Marczykowski-Górecki
2025-02-05 22:14 ` Marek Marczykowski-Górecki
1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2025-01-31 7:13 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Marek Marczykowski-Górecki, Bjorn Helgaas,
Jürgen Groß, Roger Pau Monné, Boris Ostrovsky,
xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee
On 30.01.2025 22:31, Bjorn Helgaas wrote:
> On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
>> On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
>>> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
>
> PCIe Cap at 0x80, PCI_EXP_DEVCTL is 0x08, PCI_EXP_DEVSTA is 0x0a.
>
> 0x80010088 would be PCI_EXP_DEVCTL (a 2-byte register), maybe offset 2
> gets us to PCI_EXP_DEVSTA? Not sure.
>
> 0x0001 PCI_EXP_DEVSTA_CED /* Correctable Error Detected */
> 0x0008 PCI_EXP_DEVSTA_URD /* Unsupported Request Detected */
>
> Not impossible that these would be set. Lots of URs happen during
> enumeration and we're not very good about cleaning these up.
> Correctable errors are common for some devices. lspci -vv would
> decode the PCIe cap registers, including this.
>
>>> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
>
> PCI_EXP_DEVCTL:
> 0x2000 PCI_EXP_DEVCTL_READRQ_512B
> 0x0800 PCI_EXP_DEVCTL_NOSNOOP_EN
> 0x0100 PCI_EXP_DEVCTL_EXT_TAG
> 0x0010 PCI_EXP_DEVCTL_RELAX_EN
>
>>> (XEN) d0v1 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
>
> PCI_EXP_DEVCTL:
> set 0x8000 PCI_EXP_DEVCTL_BCR_FLR
>
> This looks like the actual FLR being initiated.
>
>> This is the express capability's Link Control 2 Register afaict.
>
> Unless I'm missing something this is actually Device Control. So far
> I think this all looks OK. The next part:
What you say is very plausible as far as the observed behavior goes,
but: According to the lspci output provided earlier the express
capability is at 58 (hex). Hence here we're 30 (hex) into the
capability, which according to the spec I'm looking at is Link
Control 2. Yet as said - with what you say being plausible, likely
I'm simply getting something very wrong.
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-31 7:13 ` Jan Beulich
@ 2025-01-31 8:36 ` Marek Marczykowski-Górecki
0 siblings, 0 replies; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-31 8:36 UTC (permalink / raw)
To: Jan Beulich
Cc: Bjorn Helgaas, Bjorn Helgaas, Jürgen Groß,
Roger Pau Monné, Boris Ostrovsky, xen-devel, linux-kernel,
regressions, Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Fri, Jan 31, 2025 at 08:13:37AM +0100, Jan Beulich wrote:
> On 30.01.2025 22:31, Bjorn Helgaas wrote:
> > On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
> >> On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> >>> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
> >
> > PCIe Cap at 0x80, PCI_EXP_DEVCTL is 0x08, PCI_EXP_DEVSTA is 0x0a.
> >
> > 0x80010088 would be PCI_EXP_DEVCTL (a 2-byte register), maybe offset 2
> > gets us to PCI_EXP_DEVSTA? Not sure.
> >
> > 0x0001 PCI_EXP_DEVSTA_CED /* Correctable Error Detected */
> > 0x0008 PCI_EXP_DEVSTA_URD /* Unsupported Request Detected */
> >
> > Not impossible that these would be set. Lots of URs happen during
> > enumeration and we're not very good about cleaning these up.
> > Correctable errors are common for some devices. lspci -vv would
> > decode the PCIe cap registers, including this.
> >
> >>> (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
> >
> > PCI_EXP_DEVCTL:
> > 0x2000 PCI_EXP_DEVCTL_READRQ_512B
> > 0x0800 PCI_EXP_DEVCTL_NOSNOOP_EN
> > 0x0100 PCI_EXP_DEVCTL_EXT_TAG
> > 0x0010 PCI_EXP_DEVCTL_RELAX_EN
> >
> >>> (XEN) d0v1 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
> >
> > PCI_EXP_DEVCTL:
> > set 0x8000 PCI_EXP_DEVCTL_BCR_FLR
> >
> > This looks like the actual FLR being initiated.
> >
> >> This is the express capability's Link Control 2 Register afaict.
> >
> > Unless I'm missing something this is actually Device Control. So far
> > I think this all looks OK. The next part:
>
> What you say is very plausible as far as the observed behavior goes,
> but: According to the lspci output provided earlier the express
> capability is at 58 (hex).
lspci in the log says:
Capabilities: [80] Express Endpoint, IntMsgNum 0
I think you confused device config space with bridge config space.
> Hence here we're 30 (hex) into the
> capability, which according to the spec I'm looking at is Link
> Control 2. Yet as said - with what you say being plausible, likely
> I'm simply getting something very wrong.
>
> Jan
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-30 21:31 ` Bjorn Helgaas
2025-01-31 7:13 ` Jan Beulich
@ 2025-02-05 22:14 ` Marek Marczykowski-Górecki
2025-02-07 22:00 ` Bjorn Helgaas
2025-02-07 22:23 ` Bjorn Helgaas
1 sibling, 2 replies; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-02-05 22:14 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jan Beulich, Bjorn Helgaas, Jürgen Groß,
Roger Pau Monné, Boris Ostrovsky, xen-devel, linux-kernel,
regressions, Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Thu, Jan 30, 2025 at 03:31:23PM -0600, Bjorn Helgaas wrote:
> On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
> > On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> > > I've added logging of all config read/write to this device. Full log at
> > > [1].
> > ...
>
> I suspect there's something wrong with the Root Port RRS SV
> configuration.
>
> Can you add the two patches below?
I tried them, and indeed they don't make a difference. Generally it looks
like this device has a broken FLR, and the reset works only because of the
fallback to the secondary bus reset on timeout. I repeated the test with my
additional "&& !PCI_POSSIBLE_ERROR(id)" and I got this:
https://gist.github.com/marmarek/db0808702131b69ea2f66f339a55d71b
The first log is with Xen, and the second with native Linux (with added
PCI config space logging in drivers/pci/access.c).
Ignore the "usb usb2-port2" errors; that's my USB debug console, which
didn't work on native Linux...
For convenience I've pasted the interesting part at the end of the email.
> I don't *think* either should make a difference. The first enables
> RRS SV earlier and maybe in a cleaner place; the second should avoid
> some pointless capability searches that clutter the logs.
>
> What does d0v1/d0v2/d0v3 mean?
This is dom0 vcpu 1/2/3.
> Can you add 00:02.2, the Root Port leading to bus 01, so we can see
> the RRS SV configuration? Maybe also lspci -vv for both 00:02.2 and
> 01:00.0?
Yes, added in the log above.
> Maybe include timestamps and try an FLR without Xen (which I assume
> works correctly) so we can see how long the device typically takes to
> become ready?
Timestamps are in the "native" log above. For the run under Xen, I ran
the device reset call under "time", and it reported 1m10s...
> Notes below on the snippet that you commented on, Jan. I think it
> makes sense until the read after FLR returns 0xffffffff.
>
> > > (XEN) d0v1 conf read cf8 0x80010034 bytes 1 offset 0 data 0x80
>
> PCI_CAPABILITY_LIST, first cap at 0x80
>
> > > (XEN) d0v1 conf read cf8 0x80010080 bytes 2 offset 0 data 0xe010
>
> PCI_CAP_ID_EXP (0x10) at 0x80, next cap at 0xe0
>
> > > (XEN) d0v1 conf read cf8 0x800100e0 bytes 2 offset 0 data 0xf805
>
> PCI_CAP_ID_MSI (0x05) at 0xe0, next cap at 0xf8
>
> > > (XEN) d0v1 conf read cf8 0x800100f8 bytes 2 offset 0 data 0x1
>
> PCI_CAP_ID_PM (0x01) at 0xf8, end of cap list
>
> These caps match the offsets from the lspci output in the full log:
>
> 1:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> Subsystem: MEDIATEK Corp. Device e616
> Flags: fast devsel, IRQ 255
> Memory at 8010900000 (64-bit, prefetchable) [disabled] [size=1M]
> Memory at 90b00000 (64-bit, non-prefetchable) [disabled] [size=32K]
> Capabilities: [80] Express Endpoint, IntMsgNum 0
> Capabilities: [e0] MSI: Enable- Count=1/32 Maskable+ 64bit+
> Capabilities: [f8] Power Management version 3
>
> > > (XEN) d0v1 conf write cf8 0x80010004 bytes 2 offset 0 data 0x400
>
> Set PCI_COMMAND_INTX_DISABLE, disable BARs, Bus Master. Looks like
> the end of pci_dev_save_and_disable().
>
> > > (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
>
> PCIe Cap at 0x80, PCI_EXP_DEVCTL is 0x08, PCI_EXP_DEVSTA is 0x0a.
>
> 0x80010088 would be PCI_EXP_DEVCTL (a 2-byte register), maybe offset 2
> gets us to PCI_EXP_DEVSTA? Not sure.
>
> 0x0001 PCI_EXP_DEVSTA_CED /* Correctable Error Detected */
> 0x0008 PCI_EXP_DEVSTA_URD /* Unsupported Request Detected */
>
> Not impossible that these would be set. Lots of URs happen during
> enumeration and we're not very good about cleaning these up.
> Correctable errors are common for some devices. lspci -vv would
> decode the PCIe cap registers, including this.
>
> > > (XEN) d0v1 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
>
> PCI_EXP_DEVCTL:
> 0x2000 PCI_EXP_DEVCTL_READRQ_512B
> 0x0800 PCI_EXP_DEVCTL_NOSNOOP_EN
> 0x0100 PCI_EXP_DEVCTL_EXT_TAG
> 0x0010 PCI_EXP_DEVCTL_RELAX_EN
>
> > > (XEN) d0v1 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
>
> PCI_EXP_DEVCTL:
> set 0x8000 PCI_EXP_DEVCTL_BCR_FLR
>
> This looks like the actual FLR being initiated.
>
> > This is the express capability's Link Control 2 Register afaict.
>
> Unless I'm missing something this is actually Device Control. So far
> I think this all looks OK. The next part:
>
> > > (XEN) d0v2 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
>
> looks like a problem. The normal successful read gets 0x061614c3.
> After the FLR, if RRS SV is enabled, we should get either 0x0001ffff
> or 0x061614c3.
Yes, and in the most recent test I got the same result when running Linux
natively (which doesn't match my initial report).
It also looks like, when the reset does work, it's because waiting for the
FLR to complete times out and the code falls back to the secondary bus
reset, which works instantly.
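For a rough idea of how long that timeout path takes, here is a simulation of a doubling-delay poll loop for a device that never becomes ready. It is only a sketch: the 1000 ms logging threshold and the 60000 ms timeout are my assumptions about what pci_dev_wait() uses for FLR, and simulate_wait() is a made-up name.

```c
#include <stdio.h>

#define RESET_WAIT_MS	1000	/* assumed: start logging after this delay */
#define READY_POLL_MS	60000	/* assumed: FLR readiness timeout */

/*
 * Simulate polling a device that never responds. Returns the total
 * time slept in ms and stores the value printed in the "giving up"
 * message in *gave_up_at.
 */
static inline unsigned int simulate_wait(int *gave_up_at)
{
	unsigned int slept = 0;
	int delay = 1;

	for (;;) {
		if (delay > READY_POLL_MS) {
			printf("not ready %dms after FLR; giving up\n",
			       delay - 1);
			*gave_up_at = delay - 1;
			return slept;
		}
		if (delay > RESET_WAIT_MS)
			printf("not ready %dms after FLR; waiting\n",
			       delay - 1);
		slept += delay;		/* stands in for msleep(delay) */
		delay *= 2;
	}
}
```

With these assumed constants the loop sleeps about 65.5 s in total before giving up, which is in the same range as the 1m10s measured under Xen once the rest of the reset is included.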
> Here we would exit the loop in pci_dev_wait() because we didn't see
> 0x0001 and we expect that the device is ready and we got a valid
> Vendor ID.
>
> So we proceed to restoring config space via pci_restore_state(), where
> we restore some PCIe registers and the header (first 64 bytes). My
> *guess* is the device isn't ready (or at least not responding) since
> all the reads return ~0.
Logs from native (logging both the device at 01:00.0 and the bridge at
00:02.2):
[ 348.129591] PCI: read bus 0x1 devfn 0x0 pos 0xfc size 2 value 0x8
[ 348.129609] PCI: read bus 0x1 devfn 0x0 pos 0xfc size 2 value 0x8
[ 348.130258] PCI: read bus 0x0 devfn 0x12 pos 0x64 size 4 value 0x4737814
[ 348.130645] PCI: read bus 0x1 devfn 0x0 pos 0x8c size 4 value 0x145dc12
[ 348.130663] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0x61614c3
[ 348.130667] PCI: read bus 0x1 devfn 0x0 pos 0x4 size 4 value 0x100000
[ 348.130671] PCI: read bus 0x1 devfn 0x0 pos 0x8 size 4 value 0x2800000
[ 348.130675] PCI: read bus 0x1 devfn 0x0 pos 0xc size 4 value 0x10
[ 348.130678] PCI: read bus 0x1 devfn 0x0 pos 0x10 size 4 value 0x1090000c
[ 348.131855] PCI: read bus 0x1 devfn 0x0 pos 0x14 size 4 value 0x80
[ 348.131859] PCI: read bus 0x1 devfn 0x0 pos 0x18 size 4 value 0x90b00004
[ 348.131863] PCI: read bus 0x1 devfn 0x0 pos 0x1c size 4 value 0x0
[ 348.132414] PCI: read bus 0x1 devfn 0x0 pos 0x20 size 4 value 0x0
[ 348.132419] PCI: read bus 0x1 devfn 0x0 pos 0x24 size 4 value 0x0
[ 348.132422] PCI: read bus 0x1 devfn 0x0 pos 0x28 size 4 value 0x0
[ 348.132426] PCI: read bus 0x1 devfn 0x0 pos 0x2c size 4 value 0xe61614c3
[ 348.133104] PCI: read bus 0x1 devfn 0x0 pos 0x30 size 4 value 0x0
[ 348.133121] PCI: read bus 0x1 devfn 0x0 pos 0x34 size 4 value 0x80
[ 348.133125] PCI: read bus 0x1 devfn 0x0 pos 0x38 size 4 value 0x0
[ 348.133128] PCI: read bus 0x1 devfn 0x0 pos 0x3c size 4 value 0x1ff
[ 348.133133] PCI: read bus 0x1 devfn 0x0 pos 0x88 size 2 value 0x2910
[ 348.133136] PCI: read bus 0x1 devfn 0x0 pos 0x90 size 2 value 0x1c2
[ 348.133140] PCI: read bus 0x1 devfn 0x0 pos 0xa8 size 2 value 0x400
[ 348.133143] PCI: read bus 0x1 devfn 0x0 pos 0xb0 size 2 value 0x2
[ 348.133148] PCI: read bus 0x1 devfn 0x0 pos 0x11c size 4 value 0x79
[ 348.133152] PCI: read bus 0x1 devfn 0x0 pos 0x118 size 4 value 0x40a3000f
[ 348.134803] PCI: read bus 0x1 devfn 0x0 pos 0x100 size 4 value 0x1081000b
[ 348.134806] PCI: read bus 0x1 devfn 0x0 pos 0x108 size 4 value 0x11010018
[ 348.134810] PCI: read bus 0x1 devfn 0x0 pos 0x10c size 4 value 0x10011001
[ 348.134813] PCI: write bus 0x1 devfn 0x0 pos 0x4 size 2 value 0x400
[ 348.135924] PCI: read bus 0x1 devfn 0x0 pos 0x8a size 2 value 0x0
[ 348.135928] PCI: read bus 0x1 devfn 0x0 pos 0x88 size 2 value 0x2910
[ 348.135931] PCI: write bus 0x1 devfn 0x0 pos 0x88 size 2 value 0xa910
[ 348.241243] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.243160] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.246356] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.251348] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.260697] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.278824] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.312794] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.385762] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.521807] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 348.785302] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 349.353822] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 349.353842] PCI: read bus 0x0 devfn 0x12 pos 0x88 size 2 value 0x42
[ 349.354415] PCI: read bus 0x0 devfn 0x12 pos 0x6a size 2 value 0x3012
[ 349.355623] pci 0000:01:00.0: not ready 1023ms after FLR; waiting
[ 350.441192] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 350.441211] pci 0000:01:00.0: not ready 2047ms after FLR; waiting
[ 352.553793] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 352.553812] pci 0000:01:00.0: not ready 4095ms after FLR; waiting
[ 357.097805] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 357.097825] pci 0000:01:00.0: not ready 8191ms after FLR; waiting
[ 365.801764] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 365.801783] pci 0000:01:00.0: not ready 16383ms after FLR; waiting
[ 382.697684] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 382.697707] pci 0000:01:00.0: not ready 32767ms after FLR; waiting
[ 415.977738] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 415.977761] pci 0000:01:00.0: not ready 65535ms after FLR; giving up
[ 415.978610] PCI: read bus 0x0 devfn 0x12 pos 0x100 size 4 value 0x2701000b
[ 415.978614] PCI: read bus 0x0 devfn 0x12 pos 0x270 size 4 value 0x2a010019
[ 415.978617] PCI: read bus 0x0 devfn 0x12 pos 0x2a0 size 4 value 0x3701000d
[ 415.978619] PCI: read bus 0x0 devfn 0x12 pos 0x370 size 4 value 0x4001001e
[ 415.978623] PCI: read bus 0x0 devfn 0x12 pos 0x400 size 4 value 0x41010025
[ 415.978626] PCI: read bus 0x0 devfn 0x12 pos 0x410 size 4 value 0x44010026
[ 415.978628] PCI: read bus 0x0 devfn 0x12 pos 0x440 size 4 value 0x10027
[ 415.978632] PCI: read bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x2
[ 415.978636] PCI: write bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x42
[ 415.981198] PCI: write bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x2
[ 416.003807] PCI: read bus 0x0 devfn 0x12 pos 0x6a size 2 value 0x1011
[ 416.006865] PCI: read bus 0x0 devfn 0x12 pos 0x6a size 2 value 0x1011
[ 416.008690] PCI: read bus 0x0 devfn 0x12 pos 0x78 size 4 value 0x0
[ 416.009178] PCI: read bus 0x0 devfn 0x12 pos 0x6a size 2 value 0xb012
[ 416.009755] PCI: read bus 0x0 devfn 0x12 pos 0x6a size 2 value 0xb012
[ 416.010319] PCI: write bus 0x0 devfn 0x12 pos 0x6a size 2 value 0x8000
[ 416.010322] PCI: read bus 0x0 devfn 0x12 pos 0x6a size 2 value 0x3012
[ 416.114652] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0x61614c3
[ 416.115053] PCI: read bus 0x1 devfn 0x0 pos 0x100 size 4 value 0x1081000b
[ 416.115341] PCI: read bus 0x1 devfn 0x0 pos 0x108 size 4 value 0x11010018
[ 416.115563] PCI: write bus 0x1 devfn 0x0 pos 0x10c size 4 value 0x10011001
[ 416.115803] PCI: read bus 0x1 devfn 0x0 pos 0x90 size 2 value 0x0
[ 416.115807] PCI: read bus 0x0 devfn 0x12 pos 0x68 size 2 value 0xcc2
[ 416.115811] PCI: write bus 0x1 devfn 0x0 pos 0x90 size 2 value 0x0
[ 416.115814] PCI: write bus 0x0 devfn 0x12 pos 0x68 size 2 value 0xcc0
[ 416.115817] PCI: read bus 0x1 devfn 0x0 pos 0x118 size 4 value 0x0
[ 416.115820] PCI: write bus 0x1 devfn 0x0 pos 0x118 size 4 value 0x0
[ 416.115823] PCI: read bus 0x0 devfn 0x12 pos 0x378 size 4 value 0x40a30a0f
[ 416.115825] PCI: write bus 0x0 devfn 0x12 pos 0x378 size 4 value 0x40a30a0a
[ 416.115828] PCI: write bus 0x0 devfn 0x12 pos 0x37c size 4 value 0x79
[ 416.115831] PCI: write bus 0x1 devfn 0x0 pos 0x11c size 4 value 0x79
[ 416.115834] PCI: write bus 0x0 devfn 0x12 pos 0x378 size 4 value 0x40a30a0a
[ 416.115837] PCI: write bus 0x1 devfn 0x0 pos 0x118 size 4 value 0x40a3000a
[ 416.115839] PCI: write bus 0x0 devfn 0x12 pos 0x378 size 4 value 0x40a30a0f
[ 416.115842] PCI: write bus 0x1 devfn 0x0 pos 0x118 size 4 value 0x40a3000f
[ 416.115845] PCI: write bus 0x0 devfn 0x12 pos 0x68 size 2 value 0xcc2
[ 416.115849] PCI: write bus 0x1 devfn 0x0 pos 0x90 size 2 value 0x0
[ 416.115852] PCI: read bus 0x0 devfn 0x12 pos 0x80 size 4 value 0x6
[ 416.115855] PCI: read bus 0x0 devfn 0x12 pos 0x80 size 2 value 0x6
[ 416.115858] PCI: write bus 0x0 devfn 0x12 pos 0x80 size 2 value 0x406
[ 416.115861] PCI: write bus 0x1 devfn 0x0 pos 0x88 size 2 value 0x2910
[ 416.115864] PCI: write bus 0x1 devfn 0x0 pos 0x90 size 2 value 0x1c2
[ 416.115868] PCI: write bus 0x1 devfn 0x0 pos 0xa8 size 2 value 0x400
[ 416.115871] PCI: write bus 0x1 devfn 0x0 pos 0xb0 size 2 value 0x2
[ 416.115875] PCI: read bus 0x1 devfn 0x0 pos 0x100 size 4 value 0x1081000b
[ 416.115878] PCI: read bus 0x1 devfn 0x0 pos 0x108 size 4 value 0x11010018
[ 416.115881] PCI: read bus 0x1 devfn 0x0 pos 0x110 size 4 value 0x2001001e
[ 416.115884] PCI: read bus 0x1 devfn 0x0 pos 0x200 size 4 value 0x20001
[ 416.115889] PCI: write bus 0x1 devfn 0x0 pos 0x208 size 4 value 0x0
[ 416.115892] PCI: write bus 0x1 devfn 0x0 pos 0x20c size 4 value 0x0
[ 416.115895] PCI: write bus 0x1 devfn 0x0 pos 0x214 size 4 value 0x0
[ 416.115899] PCI: write bus 0x1 devfn 0x0 pos 0x218 size 4 value 0x0
[ 416.115902] PCI: read bus 0x1 devfn 0x0 pos 0x3c size 4 value 0x100
[ 416.115906] PCI: write bus 0x1 devfn 0x0 pos 0x3c size 4 value 0x1ff
[ 416.115909] PCI: read bus 0x1 devfn 0x0 pos 0x38 size 4 value 0x0
[ 416.115912] PCI: read bus 0x1 devfn 0x0 pos 0x34 size 4 value 0x80
[ 416.115916] PCI: read bus 0x1 devfn 0x0 pos 0x30 size 4 value 0x0
[ 416.115919] PCI: read bus 0x1 devfn 0x0 pos 0x2c size 4 value 0xe61614c3
[ 416.115923] PCI: read bus 0x1 devfn 0x0 pos 0x28 size 4 value 0x0
[ 416.115926] PCI: read bus 0x1 devfn 0x0 pos 0x24 size 4 value 0x0
[ 416.115930] PCI: read bus 0x1 devfn 0x0 pos 0x20 size 4 value 0x0
[ 416.115933] PCI: read bus 0x1 devfn 0x0 pos 0x1c size 4 value 0x0
[ 416.123065] PCI: read bus 0x1 devfn 0x0 pos 0x18 size 4 value 0x4
[ 416.123070] PCI: write bus 0x1 devfn 0x0 pos 0x18 size 4 value 0x90b00004
[ 416.123073] PCI: read bus 0x1 devfn 0x0 pos 0x18 size 4 value 0x90b00004
[ 416.123076] PCI: read bus 0x1 devfn 0x0 pos 0x14 size 4 value 0x0
[ 416.123080] PCI: write bus 0x1 devfn 0x0 pos 0x14 size 4 value 0x80
[ 416.123083] PCI: read bus 0x1 devfn 0x0 pos 0x14 size 4 value 0x80
[ 416.123086] PCI: read bus 0x1 devfn 0x0 pos 0x10 size 4 value 0xc
[ 416.123090] PCI: write bus 0x1 devfn 0x0 pos 0x10 size 4 value 0x1090000c
[ 416.123093] PCI: read bus 0x1 devfn 0x0 pos 0x10 size 4 value 0x1090000c
[ 416.123096] PCI: read bus 0x1 devfn 0x0 pos 0xc size 4 value 0x0
[ 416.123099] PCI: write bus 0x1 devfn 0x0 pos 0xc size 4 value 0x10
[ 416.123103] PCI: read bus 0x1 devfn 0x0 pos 0x8 size 4 value 0x2800000
[ 416.123106] PCI: read bus 0x1 devfn 0x0 pos 0x4 size 4 value 0x100000
[ 416.123109] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0x61614c3
Logs with Xen:
(XEN) d0v3 conf read cf8 0x800100fc bytes 2 offset 0 data 0x8
(XEN) d0v3 conf read cf8 0x800100fc bytes 2 offset 0 data 0x8
(XEN) d0v3 conf read cf8 0x80001264 bytes 4 offset 0 data 0x4737814
(XEN) d0v3 conf read cf8 0x8001008c bytes 4 offset 0 data 0x145dc12
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
(XEN) d0v3 conf read cf8 0x80010004 bytes 4 offset 0 data 0x100000
(XEN) d0v3 conf read cf8 0x80010008 bytes 4 offset 0 data 0x2800000
(XEN) d0v3 conf read cf8 0x8001000c bytes 4 offset 0 data 0x10
(XEN) d0v3 conf read cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v3 conf read cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v3 conf read cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v3 conf read cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010028 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
(XEN) d0v3 conf read cf8 0x80010030 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x80010034 bytes 4 offset 0 data 0x80
(XEN) d0v3 conf read cf8 0x80010038 bytes 4 offset 0 data 0
(XEN) d0v3 conf read cf8 0x8001003c bytes 4 offset 0 data 0x1ff
(XEN) d0v3 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
(XEN) d0v3 conf read cf8 0x80010090 bytes 2 offset 0 data 0x1c2
(XEN) d0v3 conf read cf8 0x800100a8 bytes 2 offset 0 data 0x400
(XEN) d0v3 conf read cf8 0x800100b0 bytes 2 offset 0 data 0x2
(XEN) d0v3 conf write cf8 0x80010004 bytes 2 offset 0 data 0x400
(XEN) d0v3 conf read cf8 0x80010088 bytes 2 offset 2 data 0x9
(XEN) d0v3 conf read cf8 0x80010088 bytes 2 offset 0 data 0x2910
(XEN) d0v3 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80001288 bytes 2 offset 0 data 0x42
(XEN) d0v3 conf read cf8 0x80001268 bytes 2 offset 2 data 0x3012
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
(XEN) d0v4 conf read cf8 0x8000123c bytes 2 offset 2 data 0x2
(XEN) d0v4 conf write cf8 0x8000123c bytes 2 offset 2 data 0x42
(XEN) d0v4 conf write cf8 0x8000123c bytes 2 offset 2 data 0x2
(XEN) d0v4 conf read cf8 0x80001268 bytes 2 offset 2 data 0x1011
(XEN) d0v4 conf read cf8 0x80001268 bytes 2 offset 2 data 0x1011
(XEN) d0v4 conf read cf8 0x80001268 bytes 2 offset 2 data 0x1011
(XEN) d0v4 conf read cf8 0x80001268 bytes 2 offset 2 data 0x1011
(XEN) d0v1 conf read cf8 0x80001278 bytes 4 offset 0 data 0
(XEN) d0v1 conf read cf8 0x80001268 bytes 2 offset 2 data 0xb012
(XEN) d0v1 conf write cf8 0x80001268 bytes 2 offset 2 data 0x8000
(XEN) d0v1 conf read cf8 0x80001268 bytes 2 offset 2 data 0x3012
(XEN) d0v4 conf read cf8 0x80001268 bytes 2 offset 2 data 0x3012
(XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
(XEN) d0v4 conf read cf8 0x80010090 bytes 2 offset 0 data 0
(XEN) d0v4 conf read cf8 0x80001268 bytes 2 offset 0 data 0xcc2
(XEN) d0v4 conf write cf8 0x80010090 bytes 2 offset 0 data 0
(XEN) d0v4 conf write cf8 0x80001268 bytes 2 offset 0 data 0xcc0
(XEN) d0v4 conf write cf8 0x80001268 bytes 2 offset 0 data 0xcc2
(XEN) d0v4 conf write cf8 0x80010090 bytes 2 offset 0 data 0
(XEN) d0v4 conf read cf8 0x80001280 bytes 4 offset 0 data 0x6
(XEN) d0v4 conf read cf8 0x80001280 bytes 2 offset 0 data 0x6
(XEN) d0v4 conf write cf8 0x80001280 bytes 2 offset 0 data 0x406
(XEN) d0v4 conf write cf8 0x80010088 bytes 2 offset 0 data 0x2910
(XEN) d0v4 conf write cf8 0x80010090 bytes 2 offset 0 data 0x1c2
(XEN) d0v4 conf write cf8 0x800100a8 bytes 2 offset 0 data 0x400
(XEN) d0v4 conf write cf8 0x800100b0 bytes 2 offset 0 data 0x2
(XEN) d0v4 conf read cf8 0x8001003c bytes 4 offset 0 data 0x100
(XEN) d0v4 conf write cf8 0x8001003c bytes 4 offset 0 data 0x1ff
(XEN) d0v4 conf read cf8 0x80010038 bytes 4 offset 0 data 0
(XEN) d0v4 conf read cf8 0x80010034 bytes 4 offset 0 data 0x80
(XEN) d0v4 conf read cf8 0x80010030 bytes 4 offset 0 data 0
(XEN) d0v4 conf read cf8 0x8001002c bytes 4 offset 0 data 0xe61614c3
(XEN) d0v4 conf read cf8 0x80010028 bytes 4 offset 0 data 0
(XEN) d0v4 conf read cf8 0x80010024 bytes 4 offset 0 data 0
(XEN) d0v4 conf read cf8 0x80010020 bytes 4 offset 0 data 0
(XEN) d0v4 conf read cf8 0x8001001c bytes 4 offset 0 data 0
(XEN) d0v4 conf read cf8 0x80010018 bytes 4 offset 0 data 0x4
(XEN) d0v4 conf write cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v4 conf read cf8 0x80010018 bytes 4 offset 0 data 0x90b00004
(XEN) d0v4 conf read cf8 0x80010014 bytes 4 offset 0 data 0
(XEN) d0v4 conf write cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v4 conf read cf8 0x80010014 bytes 4 offset 0 data 0x80
(XEN) d0v4 conf read cf8 0x80010010 bytes 4 offset 0 data 0xc
(XEN) d0v4 conf write cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v4 conf read cf8 0x80010010 bytes 4 offset 0 data 0x1090000c
(XEN) d0v4 conf read cf8 0x8001000c bytes 4 offset 0 data 0
(XEN) d0v4 conf write cf8 0x8001000c bytes 4 offset 0 data 0x10
(XEN) d0v4 conf read cf8 0x80010008 bytes 4 offset 0 data 0x2800000
(XEN) d0v4 conf read cf8 0x80010004 bytes 4 offset 0 data 0x100000
(XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-02-05 22:14 ` Marek Marczykowski-Górecki
@ 2025-02-07 22:00 ` Bjorn Helgaas
2025-02-07 22:10 ` Marek Marczykowski-Górecki
2025-02-07 22:23 ` Bjorn Helgaas
1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2025-02-07 22:00 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Jan Beulich, Bjorn Helgaas, Jürgen Groß,
Roger Pau Monné, Boris Ostrovsky, xen-devel, linux-kernel,
regressions, Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Wed, Feb 05, 2025 at 11:14:17PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 30, 2025 at 03:31:23PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
> > > On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> > > > I've added logging of all config read/write to this device. Full log at
> > > > [1].
> > > ...
> ... Generally it looks like this device has broken FLR, and the
> reset works due to the fallback to the secondary bus reset on
> timeout. I repeated the test with my additional "&&
> !PCI_POSSIBLE_ERROR(id)" and I got this:
> [2] https://gist.github.com/marmarek/db0808702131b69ea2f66f339a55d71b
>
> The first log is with xen, and the second with native linux (and
> added PCI config space logging in drivers/pci/access.c).
This is just to annotate these logs. Correct me if you see something
wrong.

Both logs include this patch:

  @@ -1297,7 +1297,8 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
                  if (root && root->config_rrs_sv) {
                          pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
  -                       if (!pci_bus_rrs_vendor_id(id))
  +                       if (!pci_bus_rrs_vendor_id(id) &&
  +                           !PCI_POSSIBLE_ERROR(id))
                                  break;

I think both logs show this sequence:

  - Initiate FLR on 01:00.0

  - In pci_dev_wait(), poll PCI_VENDOR_ID, looking for something other
    than 0x0001 (which would indicate RRS response) or 0xffff (from
    patch above).

  - Time out after ~70 seconds and return -ENOTTY.

  - Attempt Secondary Bus Reset using 00:02.2, the Root Port leading
    to 01:00.0.

  - Successfully read PCI_VENDOR_ID.

  - Looks the same, whether linux is running natively or on top of
    Xen.
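
To make the difference between the two readiness checks concrete, here
is a minimal user-space sketch. The helper names are hypothetical
stand-ins for pci_bus_rrs_vendor_id() and PCI_POSSIBLE_ERROR(); the
sketch assumes the RRS check compares the low 16 bits against 0x0001
and the error check tests for an all-ones value, as described in the
sequence above. It is not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: an RRS completion shows up as Vendor ID 0x0001. */
static bool rrs_vendor_id(uint32_t id)
{
    return (id & 0xffff) == 0x0001;
}

/* Hypothetical stand-in: a failed config read returns all ones. */
static bool possible_error(uint32_t id)
{
    return id == UINT32_MAX;
}

/* Readiness test that only looks for "not an RRS response". */
static bool ready_rrs_only(uint32_t id)
{
    return !rrs_vendor_id(id);
}

/* Readiness test with the extra "&& !PCI_POSSIBLE_ERROR(id)" condition. */
static bool ready_patched(uint32_t id)
{
    return !rrs_vendor_id(id) && !possible_error(id);
}
```

With the 0xffffffff value seen in both logs, ready_rrs_only() reports
"ready" on the first poll, while ready_patched() keeps polling; with
0x061614c3 both report "ready".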
Relevant devices (from mediatek-debug-6.12-patch2+bridgelog.log):
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
Capabilities: [80] Express (v2) Endpoint, IntMsgNum 0
From mediatek-debug-6.12-patch2+bridgelog.log (from [2] above):
[anaconda root@test-12 /]# time echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
(XEN) d0v3 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910 <-- set 01:00.0 FLR
(XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
...
(XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
...
(XEN) d0v4 conf read cf8 0x8000123c bytes 2 offset 2 data 0x2 (0x3c + offset 2 = 0x3e)
(XEN) d0v4 conf write cf8 0x8000123c bytes 2 offset 2 data 0x42 <-- set 00:02.2 SBR
(XEN) d0v4 conf write cf8 0x8000123c bytes 2 offset 2 data 0x2
...
(XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3 <-- 01:00.0 VID/DID
...
real 1m10.825s
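
For reading the Xen log lines, a small decoder sketch may help. It
assumes the logged cf8 value uses the standard PCI Configuration
Mechanism #1 layout (bus in bits 23:16, device in 15:11, function in
10:8, dword-aligned register in 7:2) and that "offset" is the byte
offset within that dword, as the "0x3c + offset 2 = 0x3e" annotation
suggests. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical container for a decoded config-space target. */
struct cf8_target {
    unsigned bus;
    unsigned dev;
    unsigned fn;
    unsigned reg;
};

/* Split a CF8 address plus data-port byte offset into bus/dev/fn/register. */
static struct cf8_target cf8_decode(uint32_t cf8, unsigned offset)
{
    struct cf8_target t;

    t.bus = (cf8 >> 16) & 0xff;
    t.dev = (cf8 >> 11) & 0x1f;
    t.fn  = (cf8 >> 8) & 0x07;
    t.reg = (cf8 & 0xfc) + offset;  /* dword-aligned register + byte offset */
    return t;
}
```

Under those assumptions, cf8 0x8000123c with offset 2 decodes to 00:02.2
register 0x3e, and cf8 0x80010088 with offset 0 decodes to 01:00.0
register 0x88.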
From mediatek-debug-native-6.12-patch2+bridgelog.log (also from [2]
above):
[anaconda root@test-12 ~]# time echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
[ 240.449215] pciback 0000:01:00.0: resetting
[ 240.450709] PCI: write bus 0x1 devfn 0x0 pos 0x88 size 2 value 0xa910 <-- set 01:00.0 FLR
[ 240.553264] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
...
[ 309.481728] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
[ 309.481747] pciback 0000:01:00.0: not ready 65535ms after FLR; giving up
...
[ 309.482667] PCI: read bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x2 PCI_BRIDGE_CONTROL
[ 309.482670] PCI: write bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x42 <-- set 00:02.2 SBR
[ 309.485184] PCI: write bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x2
...
[ 309.617782] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0x61614c3 <-- 01:00.0 VID/DID
[ 309.629234] pciback 0000:01:00.0: reset done
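
The register values in both logs can be reproduced with a few bit
operations. This is a sketch under stated assumptions: the constants
mirror what I understand Linux's pci_regs.h to define (Device Control
at offset 0x08 of the PCIe capability with FLR as bit 0x8000, and
Bridge Control at 0x3e with Secondary Bus Reset as bit 0x40); the
macro and function names here are local to the example.

```c
#include <stdint.h>

#define EX_EXP_DEVCTL        0x08    /* Device Control offset in PCIe cap (assumed) */
#define EX_DEVCTL_BCR_FLR    0x8000  /* Initiate Function Level Reset (assumed) */
#define EX_BRIDGE_CONTROL    0x3e    /* Bridge Control register (assumed) */
#define EX_BRIDGE_BUS_RESET  0x40    /* Secondary Bus Reset bit (assumed) */

/* Config-space position of Device Control given the PCIe capability offset. */
static uint16_t devctl_pos(uint8_t pcie_cap)
{
    return (uint16_t)(pcie_cap + EX_EXP_DEVCTL);
}

/* Value written to Device Control to start an FLR. */
static uint16_t flr_value(uint16_t devctl)
{
    return (uint16_t)(devctl | EX_DEVCTL_BCR_FLR);
}

/* Values written to Bridge Control to assert and then release SBR. */
static uint16_t sbr_assert(uint16_t bctl)
{
    return (uint16_t)(bctl | EX_BRIDGE_BUS_RESET);
}

static uint16_t sbr_release(uint16_t bctl)
{
    return (uint16_t)(bctl & ~EX_BRIDGE_BUS_RESET);
}
```

With the Express capability at [80] and Device Control previously read
as 0x2910, this gives position 0x88 and value 0xa910; for Bridge
Control read as 0x2, it gives 0x42 and then 0x2 again, matching the
annotated writes.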
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-02-07 22:00 ` Bjorn Helgaas
@ 2025-02-07 22:10 ` Marek Marczykowski-Górecki
0 siblings, 0 replies; 25+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-02-07 22:10 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jan Beulich, Bjorn Helgaas, Jürgen Groß,
Roger Pau Monné, Boris Ostrovsky, xen-devel, linux-kernel,
regressions, Felix Fietkau, Lorenzo Bianconi, Ryder Lee
On Fri, Feb 07, 2025 at 04:00:36PM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 05, 2025 at 11:14:17PM +0100, Marek Marczykowski-Górecki wrote:
> > On Thu, Jan 30, 2025 at 03:31:23PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
> > > > On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> > > > > I've added logging of all config read/write to this device. Full log at
> > > > > [1].
> > > > ...
>
> > ... Generally it looks like this device has broken FLR, and the
> > reset works due to the fallback to the secondary bus reset on
> > timeout. I repeated the test with my additional "&&
> > !PCI_POSSIBLE_ERROR(id)" and I got this:
> > [2] https://gist.github.com/marmarek/db0808702131b69ea2f66f339a55d71b
> >
> > The first log is with xen, and the second with native linux (and
> > added PCI config space logging in drivers/pci/access.c).
>
> This is just to annotate these logs. Correct me if you see something
> wrong.
I think you got all of that correct, yes.
> Both logs include this patch:
>
> @@ -1297,7 +1297,8 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> if (root && root->config_rrs_sv) {
> pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> - if (!pci_bus_rrs_vendor_id(id))
> + if (!pci_bus_rrs_vendor_id(id) &&
> + !PCI_POSSIBLE_ERROR(id))
> break;
>
> I think both logs show this sequence:
>
> - Initiate FLR on 01:00.0
>
> - In pci_dev_wait(), poll PCI_VENDOR_ID, looking for something other
> than 0x0001 (which would indicate RRS response) or 0xffff (from
> patch above).
>
> - Time out after ~70 seconds and return -ENOTTY.
>
> - Attempt Secondary Bus Reset using 00:02.2, the Root Port leading
> to 01:00.0.
>
> - Successfully read PCI_VENDOR_ID.
>
> - Looks the same, whether linux is running natively or on top of
> Xen.
>
> Relevant devices (from mediatek-debug-6.12-patch2+bridgelog.log):
>
> 00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>
> 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> Capabilities: [80] Express (v2) Endpoint, IntMsgNum 0
>
> From mediatek-debug-6.12-patch2+bridgelog.log (from [2] above):
>
> [anaconda root@test-12 /]# time echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> (XEN) d0v3 conf write cf8 0x80010088 bytes 2 offset 0 data 0xa910 <-- set 01:00.0 FLR
> (XEN) d0v3 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
> ...
> (XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0xffffffff
> ...
> (XEN) d0v4 conf read cf8 0x8000123c bytes 2 offset 2 data 0x2 (0x3c + offset 2 = 0x3e)
> (XEN) d0v4 conf write cf8 0x8000123c bytes 2 offset 2 data 0x42 <-- set 00:02.2 SBR
> (XEN) d0v4 conf write cf8 0x8000123c bytes 2 offset 2 data 0x2
> ...
> (XEN) d0v4 conf read cf8 0x80010000 bytes 4 offset 0 data 0x61614c3 <-- 01:00.0 VID/DID
> ...
> real 1m10.825s
>
> From mediatek-debug-native-6.12-patch2+bridgelog.log (also from [2]
> above):
>
> [anaconda root@test-12 ~]# time echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> [ 240.449215] pciback 0000:01:00.0: resetting
> [ 240.450709] PCI: write bus 0x1 devfn 0x0 pos 0x88 size 2 value 0xa910 <-- set 01:00.0 FLR
> [ 240.553264] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
> ...
> [ 309.481728] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0xffffffff
> [ 309.481747] pciback 0000:01:00.0: not ready 65535ms after FLR; giving up
> ...
> [ 309.482667] PCI: read bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x2 PCI_BRIDGE_CONTROL
> [ 309.482670] PCI: write bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x42 <-- set 00:02.2 SBR
> [ 309.485184] PCI: write bus 0x0 devfn 0x12 pos 0x3e size 2 value 0x2
>
> ...
> [ 309.617782] PCI: read bus 0x1 devfn 0x0 pos 0x0 size 4 value 0x61614c3 <-- 01:00.0 VID/DID
> [ 309.629234] pciback 0000:01:00.0: reset done
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-02-05 22:14 ` Marek Marczykowski-Górecki
2025-02-07 22:00 ` Bjorn Helgaas
@ 2025-02-07 22:23 ` Bjorn Helgaas
1 sibling, 0 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2025-02-07 22:23 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Jan Beulich, Bjorn Helgaas, Jürgen Groß,
Roger Pau Monné, Boris Ostrovsky, xen-devel, linux-kernel,
regressions, Felix Fietkau, Lorenzo Bianconi, Ryder Lee,
Alex Williamson, Deren Wu, Kai-Heng Feng, Shayne Chen, Sean Wang,
Leon Yen, linux-mediatek
[+cc Alex, Mediatek folks, thread at https://lore.kernel.org/r/Z4pHll_6GX7OUBzQ@mail-itl]
On Wed, Feb 05, 2025 at 11:14:17PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 30, 2025 at 03:31:23PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jan 30, 2025 at 10:30:33AM +0100, Jan Beulich wrote:
> > > On 30.01.2025 05:55, Marek Marczykowski-Górecki wrote:
> > > > I've added logging of all config read/write to this device. Full log at
> > > > [1].
> > > ...
> ... Generally it looks like this device has broken FLR, and the
> reset works due to the fallback to the secondary bus reset on
> timeout.
> I repeated the test with my additional "&& !PCI_POSSIBLE_ERROR(id)"
> and I got this:
> https://gist.github.com/marmarek/db0808702131b69ea2f66f339a55d71b

I'm having a really hard time piecing this all together. I'm trying
to recap the current theory:

  - https://github.com/QubesOS/qubes-issues/issues/9689 reports
    Mediatek MT7922 wifi (vendor/device [14c3:0616]) broke when
    running v6.12 on Xen.

  - Marek reproduced this and bisected to d591f6804e7e ("PCI: Wait for
    device readiness with Configuration RRS"), which appeared in
    v6.12.

  - We do FLR on the device, either via sysfs or the xen-pciback
    driver, e.g., pcistub_reset_device_state().

  - We theorize that FLR is unreliable on this device, and it may
    never respond successfully again. All reads, either to
    PCI_COMMAND (before d591f6804e7e) or PCI_VENDOR_ID (after
    d591f6804e7e) get ~0.

  - Prior to d591f6804e7e, e.g., in v6.11, pci_dev_wait() times out
    because polling PCI_COMMAND always returns ~0, and returns
    -ENOTTY.

    Since -ENOTTY was returned, we try another reset method. A
    Secondary Bus Reset (SBR) works, and the device works again.

    [3] seems to show this scenario ("NO BUG (kernel rollback 6.11)").
    We waited ~70 seconds before giving up.

  - After d591f6804e7e, e.g., in v6.12, pci_dev_wait() polls
    PCI_VENDOR_ID looking for anything other than 0x0001. We
    immediately get 0xffff and exit the loop. We assume the device is
    ready, but it's actually not.

    If pci_dbg were enabled (CONFIG_DYNAMIC_DEBUG=y and booted with
    dyndbg="file drivers/pci/* +p"), we should see "ready %dms after
    FLR" with a very small time.

    We mistakenly think the device is ready, so we restore config
    space, which the device ignores because it's not ready. The
    device doesn't work at all, perhaps because its config space has
    not been restored.

  - After including the debug patch below, pci_dev_wait() polls
    PCI_VENDOR_ID for something other than either 0x0001 or 0xffff.
    This "works" the same as before d591f6804e7e: We always get ~0,
    eventually time out, return -ENOTTY, fall back to SBR, and the
    device works again. Because of the timeout, it takes about 70
    seconds in both the Xen and the native logs in [4].

  - The initial report said this works on v6.12 after a warm reboot
    from v6.11, but fails after a cold boot [3]. Followup says this
    works on v6.12 running natively, but it fails when running on
    Xen [5].

    I can't explain why this works in some cases but not others.

  - It seems that even in v6.11, FLR didn't work for this device. The
    device did eventually become usable, but only because we waited
    70+ seconds after FLR, timed out, and fell back to SBR.

    The quirk patch below should avoid use of FLR completely. The
    mt7921 driver supports several other devices; maybe more of them
    should be added.

    Searches for mediatek "not ready" "after FLR" find many similar
    reports from the web: [6], [7] (suspicious in that holding power
    button 60 seconds seems to fix something, maybe similar to the
    warm/cold reboot thing), [8] (works, then fails after
    suspend/resume), [9], [10].
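
The roughly 70-second waits and the "not ready 65535ms after FLR"
message mentioned above are consistent with a doubling backoff. The
sketch below is a simulation under assumptions, not the kernel loop:
it assumes the poll delay starts at 1 ms and doubles after each sleep,
that the loop gives up once the next delay would exceed the timeout,
and that the reported time is the delay minus one. The 60000 ms
timeout passed in the usage note is also an assumption; all names are
hypothetical.

```c
/* Hypothetical result of simulating a wait that never sees the device ready. */
struct wait_result {
    int polls;        /* config reads issued, all returning ~0 */
    int slept_ms;     /* total time spent sleeping between polls */
    int reported_ms;  /* value that would appear in the "giving up" message */
};

/* Simulate polling with a doubling delay until the timeout is exceeded. */
static struct wait_result simulate_wait(int timeout_ms)
{
    struct wait_result r = { 0, 0, 0 };
    int delay = 1;

    for (;;) {
        r.polls++;                      /* read returns ~0, device not ready */
        if (delay > timeout_ms) {
            r.reported_ms = delay - 1;  /* sum of all previous delays */
            return r;
        }
        r.slept_ms += delay;            /* stands in for msleep(delay) */
        delay *= 2;
    }
}
```

With simulate_wait(60000), the delays are 1, 2, 4, ... 32768 ms, which
sum to 65535 ms before the next delay (65536) exceeds the timeout. The
remaining few seconds in the measured times would then come from sleep
overshoot and the reset steps themselves.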
[3] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
[4] https://gist.github.com/marmarek/db0808702131b69ea2f66f339a55d71b
[5] https://lore.kernel.org/r/Z4pHll_6GX7OUBzQ@mail-itl
[6] https://community.frame.work/t/responded-yet-more-mediatek-issues-on-amd-linux/50039
[7] https://www.linux.org/threads/solved-wifi-adaptator-not-found-mediatek-wi-fi-6-mt7921-wireless-lan-card.37699/page-2
[8] https://forum.manjaro.org/t/mediatek-mt7922-wifi-not-working-after-waking-up/160664
[9] https://forum.manjaro.org/t/mediatek-mt7921e-fails-in-kernel-6-6-and-later-through-6-10/164217
[10] https://www.reddit.com/r/archlinux/comments/188ccib/wifi_disabled_after_disconnected_power/

Debug patch:

  @@ -1297,7 +1297,8 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
                  if (root && root->config_rrs_sv) {
                          pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
  -                       if (!pci_bus_rrs_vendor_id(id))
  +                       if (!pci_bus_rrs_vendor_id(id) &&
  +                           !PCI_POSSIBLE_ERROR(id))
                                  break;

commit 70197d3ec778 ("PCI: Avoid FLR for Mediatek MT7922 WiFi")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Fri Feb 7 14:50:42 2025 -0600
PCI: Avoid FLR for Mediatek MT7922 WiFi
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b84ff7bade82..82b21e34c545 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5522,7 +5522,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
* AMD Matisse USB 3.0 Host Controller 0x149c
* Intel 82579LM Gigabit Ethernet Controller 0x1502
* Intel 82579V Gigabit Ethernet Controller 0x1503
- *
+ * Mediatek MT7922 802.11ax PCI Express Wireless Network Adapter
*/
static void quirk_no_flr(struct pci_dev *dev)
{
@@ -5534,6 +5534,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7901, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_MEDIATEK, 0x0616, quirk_no_flr);
/* FLR may cause the SolidRun SNET DPU (rev 0x1) to hang */
static void quirk_no_flr_snet(struct pci_dev *dev)
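
To illustrate what the quirk is meant to achieve, here is a toy
user-space model. Everything in it is hypothetical (struct fake_dev,
apply_no_flr_quirk(), reset_device()); it assumes that a device
flagged as "no FLR" skips FLR entirely and goes straight to SBR,
and that an unflagged device whose FLR never completes waits out the
full poll time (taken as 65535 ms from the log message) before falling
back. Only the [14c3:0616] ID from this thread is listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum reset_method { RESET_FLR, RESET_SBR };

/* Hypothetical device model for the sketch. */
struct fake_dev {
    uint16_t vendor, device;
    bool no_flr;     /* set by the quirk */
    bool flr_works;  /* whether the device ever becomes ready after FLR */
};

struct reset_outcome {
    enum reset_method used;
    int waited_ms;   /* time lost polling after a failed FLR */
};

/* IDs that should skip FLR in this model. */
static const struct { uint16_t vendor, device; } no_flr_ids[] = {
    { 0x14c3, 0x0616 },  /* Mediatek MT7922 */
};

/* Flag the device if it matches the table, like an early fixup would. */
static void apply_no_flr_quirk(struct fake_dev *d)
{
    for (size_t i = 0; i < sizeof(no_flr_ids) / sizeof(no_flr_ids[0]); i++)
        if (d->vendor == no_flr_ids[i].vendor &&
            d->device == no_flr_ids[i].device)
            d->no_flr = true;
}

/* Try FLR unless flagged; fall back to SBR if FLR is skipped or fails. */
static struct reset_outcome reset_device(const struct fake_dev *d)
{
    struct reset_outcome o = { RESET_SBR, 0 };

    if (!d->no_flr) {
        if (d->flr_works) {
            o.used = RESET_FLR;
            return o;
        }
        o.waited_ms = 65535;  /* polled until timeout before falling back */
    }
    return o;
}
```

In this model, an MT7922 whose FLR never completes ends up on SBR
either way; the difference is that after apply_no_flr_quirk() it gets
there with waited_ms == 0 instead of 65535.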