* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
[not found] <Z5mOKQUrgeF_r6te@mail-itl>
@ 2025-01-29 3:03 ` Bjorn Helgaas
2025-01-29 3:22 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 3:03 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
[+cc linux-pci]
On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > After updating PV dom0 to Linux 6.12, the Mediatek MT7922 device reports
> > > all 0xff when accessing its config space. This happens only after device
> > > reset (which is also triggered when binding the device to the
> > > xen-pciback driver).
> >
> > Thanks for the report and for all the debugging you've already done!
> >
> > > Reproducer:
> > >
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > ...
> > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > # lspci -xs 01:00.0
> > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >
> > > The same operation done on Linux 6.12 running without Xen works fine.
> > >
> > > git bisect points at:
> > >
> > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date: Tue Aug 27 18:48:46 2024 -0500
> > >
> > > PCI: Wait for device readiness with Configuration RRS
> > >
> > > part of that commit:
> > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > return -ENOTTY;
> > > }
> > >
> > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > - if (!PCI_POSSIBLE_ERROR(id))
> > > - break;
> > > + if (root && root->config_crs_sv) {
> > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > + if (!pci_bus_crs_vendor_id(id))
> > > + break;
> > > + } else {
> > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > + if (!PCI_POSSIBLE_ERROR(id))
> > > + break;
> > > + }
> > >
> > >
> > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > initially 0xffffffff. If I extend the condition with
> > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappears. But reading the
> > > patch description, it would break VF.
> > > I'm not sure where the issue is, but given it breaks only when running
> > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > Visibility" in that case.
> >
> > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > Vendor ID, so pci_dev_wait() should exit immediately.
>
> I'm not sure what is going on there either, but my _guess_ is that the
> loop exits too early due to the above, and that makes some further actions
> fail.
When RRS SV is enabled, reading PCI_VENDOR_ID should always return
0x0001 (if the device isn't ready and responds with RRS status) or the
valid Vendor ID. I don't think it should ever return 0xffff (unless
the device is powered off, unplugged, or broken, of course).
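
To make the three cases concrete, here is a minimal standalone sketch
(userspace C, not kernel code) of how a 32-bit read at PCI_VENDOR_ID can
be classified. The enum and the function names are hypothetical and exist
only for this illustration; 0x0001 is the reserved Vendor ID returned for
an RRS completion, and the all-ones test mirrors what PCI_POSSIBLE_ERROR()
checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the kernel. */
enum vendor_read {
	VENDOR_READ_RRS,	/* low 16 bits are 0x0001: device not ready yet */
	VENDOR_READ_ERROR,	/* all ones: no usable completion came back */
	VENDOR_READ_READY,	/* anything else: treated as a real Vendor ID */
};

static enum vendor_read classify_vendor_read(uint32_t id)
{
	if ((id & 0xffff) == 0x0001)
		return VENDOR_READ_RRS;
	if (id == UINT32_MAX)
		return VENDOR_READ_ERROR;
	return VENDOR_READ_READY;
}

/* Exit condition as in the quoted hunk: stop once the value is not RRS. */
static bool rrs_only_ready(uint32_t id)
{
	return classify_vendor_read(id) != VENDOR_READ_RRS;
}

/* Exit condition with the "&& !PCI_POSSIBLE_ERROR(id)" experiment added. */
static bool workaround_ready(uint32_t id)
{
	return classify_vendor_read(id) == VENDOR_READ_READY;
}
```

With these definitions, a read of 0xffffffff satisfies rrs_only_ready()
but not workaround_ready(), which is the difference between the two
variants discussed above.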
> > But the log at
> > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > says it *doesn't* exit and eventually times out.
>
> Note this log is from "working" kernel, so that timeout must be
> something else.
I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
since there are no interesting messages from the "BUG PRESENT" part.
Awfully funny coincidence if it's unrelated.
> > And the lspci above shows ~0 data for much of the header, even though
> > the device must be ready by then.
> >
> > I don't have any good ideas, but since the problem only happens with
> > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > could instrument xen_pcibk_config_read() and see if there's something
> > wonky going on there?
>
> This one is used when pcifront (from a different PV VM) is asking pciback
> to read something. I see the issue even before starting any other VM and
> not even attaching the device to the xen-pciback driver...
The report claims the problem only happens with Xen. I'm not a Xen
person, and I don't know how to find the relevant config accessors.
The snippets of kernel messages I see at [1] all mention pciback, so
that's my only clue of where to look. Bottom line, I have no idea
what the config accessor path is, and maybe we could learn something
by looking at whatever it is.
[1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:03 ` Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12) Bjorn Helgaas
@ 2025-01-29 3:22 ` Marek Marczykowski-Górecki
2025-01-29 3:40 ` Bjorn Helgaas
2025-01-29 9:17 ` Jan Beulich
0 siblings, 2 replies; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 3:22 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> [+cc linux-pci]
>
> On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > > After updating PV dom0 to Linux 6.12, the Mediatek MT7922 device reports
> > > > all 0xff when accessing its config space. This happens only after device
> > > > reset (which is also triggered when binding the device to the
> > > > xen-pciback driver).
> > >
> > > Thanks for the report and for all the debugging you've already done!
> > >
> > > > Reproducer:
> > > >
> > > > # lspci -xs 01:00.0
> > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > > ...
> > > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > > # lspci -xs 01:00.0
> > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >
> > > > The same operation done on Linux 6.12 running without Xen works fine.
> > > >
> > > > git bisect points at:
> > > >
> > > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > > Date: Tue Aug 27 18:48:46 2024 -0500
> > > >
> > > > PCI: Wait for device readiness with Configuration RRS
> > > >
> > > > part of that commit:
> > > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > > return -ENOTTY;
> > > > }
> > > >
> > > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > - if (!PCI_POSSIBLE_ERROR(id))
> > > > - break;
> > > > + if (root && root->config_crs_sv) {
> > > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > > + if (!pci_bus_crs_vendor_id(id))
> > > > + break;
> > > > + } else {
> > > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > + if (!PCI_POSSIBLE_ERROR(id))
> > > > + break;
> > > > + }
> > > >
> > > >
> > > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > > initially 0xffffffff. If I extend the condition with
> > > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappears. But reading the
> > > > patch description, it would break VF.
> > > > I'm not sure where the issue is, but given it breaks only when running
> > > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > > Visibility" in that case.
> > >
> > > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > > Vendor ID, so pci_dev_wait() should exit immediately.
> >
> > I'm not sure what is going on there either, but my _guess_ is that the
> > > loop exits too early due to the above, and that makes some further actions
> > > fail.
>
> When RRS SV is enabled, reading PCI_VENDOR_ID should always return
> 0x0001 (if the device isn't ready and responds with RRS status) or the
> valid Vendor ID. I don't think it should ever return 0xffff (unless
> the device is powered off, unplugged, or broken, of course).
Maybe it isn't really enabled when Xen is involved?
By looking at lspci of the bridge for this device, I do see RootCtl: ...
CRSVisible+, but maybe there is something else needed too?
Just in case, full lspci -vvvs 2.2 (the bridge):
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge (prog-if 00 [Normal decode])
Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 102
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
Memory behind bridge: 90b00000-90bfffff [size=1M] [32-bit]
Prefetchable memory behind bridge: 8010900000-80109fffff [size=1M] [32-bit]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Port (Slot+), IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag+ RBE+ TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #4, Speed 16GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 75W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCap: CRSVisible+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd+
AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee08000 Data: 4000
Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [270 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
Capabilities: [370 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=150us
L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
T_CommonMode=10us LTR1.2_Threshold=166912ns
L1SubCtl2: T_PwrOn=150us
Capabilities: [400 v1] Data Link Feature <?>
Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [440 v1] Lane Margining at the Receiver
PortCap: Uses Driver-
PortSta: MargReady- MargSoftReady-
Kernel driver in use: pcieport
>
> > > But the log at
> > > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > > says it *doesn't* exit and eventually times out.
> >
> > Note this log is from "working" kernel, so that timeout must be
> > something else.
>
> I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
> since there are no interesting messages from the "BUG PRESENT" part.
> Awfully funny coincidence if it's unrelated.
I have seen the timeout before, possibly also on different hardware
(although I'm not 100% sure); I think it's a different issue.
> > > And the lspci above shows ~0 data for much of the header, even though
> > > the device must be ready by then.
> > >
> > > I don't have any good ideas, but since the problem only happens with
> > > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > > could instrument xen_pcibk_config_read() and see if there's something
> > > wonky going on there?
> >
> > This one is used when pcifront (from a different PV VM) is asking pciback
> > to read something. I see the issue even before starting any other VM and
> > not even attaching the device to the xen-pciback driver...
>
> The report claims the problem only happens with Xen. I'm not a Xen
> person, and I don't know how to find the relevant config accessors.
> The snippets of kernel messages I see at [1] all mention pciback, so
> that's my only clue of where to look. Bottom line, I have no idea
> what the config accessor path is, and maybe we could learn something
> by looking at whatever it is.
AFAIK there are no separate config accessors under Xen dom0; the default
ones are used. xen-pcifront takes over PCI config space access (and a few
more things) only in a domU (and only for PV), when PCI passthrough is used.
Here, it didn't get that far...
But then, Xen may intercept such access [2]. If I read it right, it
should allow all access (is_hardware_domain(dom0)==true, and also the
device is not on ro_map - otherwise reset wouldn't work at all).
>
> [1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
[2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:22 ` Marek Marczykowski-Górecki
@ 2025-01-29 3:40 ` Bjorn Helgaas
2025-01-29 3:47 ` Marek Marczykowski-Górecki
2025-01-29 9:17 ` Jan Beulich
1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 3:40 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
On Wed, Jan 29, 2025 at 04:22:43AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2025 at 03:10:49AM +0100, Marek Marczykowski-Górecki wrote:
> > > On Tue, Jan 28, 2025 at 07:15:26PM -0600, Bjorn Helgaas wrote:
> > > > On Fri, Jan 17, 2025 at 01:05:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > After updating PV dom0 to Linux 6.12, the Mediatek MT7922 device reports
> > > > > all 0xff when accessing its config space. This happens only after device
> > > > > reset (which is also triggered when binding the device to the
> > > > > xen-pciback driver).
> > > >
> > > > Thanks for the report and for all the debugging you've already done!
> > > >
> > > > > Reproducer:
> > > > >
> > > > > # lspci -xs 01:00.0
> > > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > > 00: c3 14 16 06 00 00 10 00 00 00 80 02 10 00 00 00
> > > > > ...
> > > > > # echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> > > > > # lspci -xs 01:00.0
> > > > > 01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
> > > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >
> > > > > The same operation done on Linux 6.12 running without Xen works fine.
> > > > >
> > > > > git bisect points at:
> > > > >
> > > > > commit d591f6804e7e1310881c9224d72247a2b65039af
> > > > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > > > Date: Tue Aug 27 18:48:46 2024 -0500
> > > > >
> > > > > PCI: Wait for device readiness with Configuration RRS
> > > > >
> > > > > part of that commit:
> > > > > @@ -1311,9 +1320,15 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
> > > > > return -ENOTTY;
> > > > > }
> > > > >
> > > > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > > - if (!PCI_POSSIBLE_ERROR(id))
> > > > > - break;
> > > > > + if (root && root->config_crs_sv) {
> > > > > + pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
> > > > > + if (!pci_bus_crs_vendor_id(id))
> > > > > + break;
> > > > > + } else {
> > > > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > > > + if (!PCI_POSSIBLE_ERROR(id))
> > > > > + break;
> > > > > + }
> > > > >
> > > > >
> > > > > Adding some debugging, the PCI_VENDOR_ID read in pci_dev_wait() returns
> > > > > initially 0xffffffff. If I extend the condition with
> > > > > "&& !PCI_POSSIBLE_ERROR(id)", then the issue disappears. But reading the
> > > > > patch description, it would break VF.
> > > > > I'm not sure where the issue is, but given it breaks only when running
> > > > > with Xen, I guess something is wrong with "Configuration RRS Software
> > > > > Visibility" in that case.
> > > >
> > > > I'm missing something. If you get 0xffffffff, that is not the 0x0001
> > > > Vendor ID, so pci_dev_wait() should exit immediately.
> > >
> > > I'm not sure what is going on there either, but my _guess_ is that the
> > > > loop exits too early due to the above, and that makes some further actions
> > > > fail.
> >
> > When RRS SV is enabled, reading PCI_VENDOR_ID should always return
> > 0x0001 (if the device isn't ready and responds with RRS status) or the
> > valid Vendor ID. I don't think it should ever return 0xffff (unless
> > the device is powered off, unplugged, or broken, of course).
>
> Maybe it isn't really enabled when Xen is involved?
> By looking at lspci of the bridge for this device, I do see RootCtl: ...
> CRSVisible+, but maybe there is something else needed too?
As far as I know, CRSVisible+ is all that's needed to enable this, and
Linux always enables it if it's supported [3].
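
As a rough illustration of what "supported" and "enabled" mean here, the
sketch below decodes the two 16-bit registers that lspci prints as RootCap
and RootCtl. It is a standalone userspace example; the macro and function
names are made up for this sketch (kernel headers use different names that
have changed between versions), while the offsets and bit positions follow
the PCIe Root Capabilities and Root Control register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Names are local to this sketch; offsets are relative to the PCIe capability. */
#define RTCTL_OFFSET	0x1c	/* Root Control register */
#define RTCTL_PME_IE	0x0008	/* PME Interrupt Enable ("PMEIntEna") */
#define RTCTL_RRS_SVE	0x0010	/* Software Visibility Enable ("CRSVisible" in RootCtl) */
#define RTCAP_OFFSET	0x1e	/* Root Capabilities register */
#define RTCAP_RRS_SV	0x0001	/* Software Visibility supported ("CRSVisible" in RootCap) */

/* True when the Root Port both supports and has enabled software visibility. */
static bool rrs_sv_active(uint16_t rootcap, uint16_t rootctl)
{
	return (rootcap & RTCAP_RRS_SV) && (rootctl & RTCTL_RRS_SVE);
}

/* Config-space offset of Root Control, given where the PCIe capability starts. */
static uint16_t rootctl_config_offset(uint16_t pcie_cap_pos)
{
	return pcie_cap_pos + RTCTL_OFFSET;
}
```

For the bridge quoted below, the Express capability is at [58], so Root
Control would sit at config offset 0x74, and "PMEIntEna+ CRSVisible+"
corresponds to a Root Control value of 0x0018.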
> Just in case, full lspci -vvvs 2.2 (the bridge):
>
>
> 00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge (prog-if 00 [Normal decode])
> Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin ? routed to IRQ 102
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
> Memory behind bridge: 90b00000-90bfffff [size=1M] [32-bit]
> Prefetchable memory behind bridge: 8010900000-80109fffff [size=1M] [32-bit]
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Express (v2) Root Port (Slot+), IntMsgNum 0
> DevCap: MaxPayload 256 bytes, PhantFunc 0
> ExtTag+ RBE+ TEE-IO-
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> LnkCap: Port #4, Speed 16GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
> ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
> ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 5GT/s, Width x1
> TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> Slot #0, PowerLimit 75W; Interlock- NoCompl+
> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> Changed: MRL- PresDet- LinkState+
> RootCap: CRSVisible+
> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
> RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
> 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd+
> AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
> DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
> AtomicOpsCtl: ReqEn- EgressBlck-
> IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
> 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
> LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
> LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
> Retimer- 2Retimers- CrosslinkRes: unsupported
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee08000 Data: 4000
> Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
> Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Capabilities: [270 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> LaneErrStat: 0
> Capabilities: [2a0 v1] Access Control Services
> ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
> ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
> Capabilities: [370 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=10us PortTPowerOnTime=150us
> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> T_CommonMode=10us LTR1.2_Threshold=166912ns
> L1SubCtl2: T_PwrOn=150us
> Capabilities: [400 v1] Data Link Feature <?>
> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [440 v1] Lane Margining at the Receiver
> PortCap: Uses Driver-
> PortSta: MargReady- MargSoftReady-
> Kernel driver in use: pcieport
>
> > > > But the log at
> > > > https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> > > > says it *doesn't* exit and eventually times out.
> > >
> > > Note this log is from "working" kernel, so that timeout must be
> > > something else.
> >
> > I saw it was labeled "NO BUG" but I'm not sure it's labeled correctly
> > since there are no interesting messages from the "BUG PRESENT" part.
> > Awfully funny coincidence if it's unrelated.
>
> I have seen the timeout before, possibly also on different hardware
> (although I'm not 100% sure); I think it's a different issue.
>
> > > > And the lspci above shows ~0 data for much of the header, even though
> > > > the device must be ready by then.
> > > >
> > > > I don't have any good ideas, but since the problem only happens with
> > > > Xen, and it seems to affect more than just the Vendor ID, maybe you
> > > > could instrument xen_pcibk_config_read() and see if there's something
> > > > wonky going on there?
> > >
> > > This one is used when pcifront (from a different PV VM) is asking pciback
> > > to read something. I see the issue even before starting any other VM and
> > > not even attaching the device to the xen-pciback driver...
> >
> > The report claims the problem only happens with Xen. I'm not a Xen
> > person, and I don't know how to find the relevant config accessors.
> > The snippets of kernel messages I see at [1] all mention pciback, so
> > that's my only clue of where to look. Bottom line, I have no idea
> > what the config accessor path is, and maybe we could learn something
> > by looking at whatever it is.
>
> AFAIK there are no separate config accessors under Xen dom0; the default
> ones are used. xen-pcifront takes over PCI config space access (and a few
> more things) only in a domU (and only for PV), when PCI passthrough is used.
> Here, it didn't get that far...
>
> But then, Xen may intercept such access [2]. If I read it right, it
> should allow all access (is_hardware_domain(dom0)==true, and also the
> device is not on ro_map - otherwise reset wouldn't work at all).
I guess the code at [2] is running in user mode and uses Linux
syscalls for config access? Is it straceable?
Can you reproduce this without Xen at all? If so, can you post a
complete dmesg and complete lspci -vv somewhere?
> > [1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
>
> [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/probe.c?id=v6.13#n1208
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:40 ` Bjorn Helgaas
@ 2025-01-29 3:47 ` Marek Marczykowski-Górecki
2025-01-29 13:32 ` Bjorn Helgaas
0 siblings, 1 reply; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 3:47 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci
On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
> I guess the code at [2] is running in user mode and uses Linux
> syscalls for config access? Is it straceable?
Nope, it runs as the hypervisor and mediates Linux's access to the
hardware. In fact, the Linux PV kernel (which dom0 is by default under Xen)
runs in ring 3...
But I can add some more logging there.
> Can you reproduce this without Xen at all? If so, can you post a
> complete dmesg and complete lspci -vv somewhere?
I haven't managed to reproduce it without Xen so far, but I can't
rule out a race condition that is simply unlikely to trigger when
Linux runs natively.
> > > [1] https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
> >
> > [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
>
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/probe.c?id=v6.13#n1208
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:22 ` Marek Marczykowski-Górecki
2025-01-29 3:40 ` Bjorn Helgaas
@ 2025-01-29 9:17 ` Jan Beulich
2025-01-29 11:53 ` Marek Marczykowski-Górecki
2025-01-29 13:28 ` Bjorn Helgaas
1 sibling, 2 replies; 12+ messages in thread
From: Jan Beulich @ 2025-01-29 9:17 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Bjorn Helgaas
On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
>> The report claims the problem only happens with Xen. I'm not a Xen
>> person, and I don't know how to find the relevant config accessors.
>> The snippets of kernel messages I see at [1] all mention pciback, so
>> that's my only clue of where to look. Bottom line, I have no idea
>> what the config accessor path is, and maybe we could learn something
>> by looking at whatever it is.
>
> AFAIK there are no separate config accessors under Xen dom0; the default
> ones are used. xen-pcifront takes over PCI config space access (and a few
> more things) only in a domU (and only for PV), when PCI passthrough is used.
> Here, it didn't get that far...
>
> But then, Xen may intercept such access [2]. If I read it right, it
> should allow all access (is_hardware_domain(dom0)==true, and also the
> device is not on ro_map - otherwise reset wouldn't work at all).
The other day you mentioned (on Matrix I think) that you observe mmcfg
not being used on that system. Am I misremembering? (Since the capability
where the control bit lives is an extended one, that capability would
neither be read nor modified when mmcfg is unavailable.)
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 9:17 ` Jan Beulich
@ 2025-01-29 11:53 ` Marek Marczykowski-Górecki
2025-01-29 12:49 ` Jan Beulich
2025-01-29 13:28 ` Bjorn Helgaas
1 sibling, 1 reply; 12+ messages in thread
From: Marek Marczykowski-Górecki @ 2025-01-29 11:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Bjorn Helgaas
On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> >> The report claims the problem only happens with Xen. I'm not a Xen
> >> person, and I don't know how to find the relevant config accessors.
> >> The snippets of kernel messages I see at [1] all mention pciback, so
> >> that's my only clue of where to look. Bottom line, I have no idea
> >> what the config accessor path is, and maybe we could learn something
> >> by looking at whatever it is.
> >
> > AFAIK there are no separate config accessors under Xen dom0, the default
> > ones are used. xen-pcifront takes over PCI config space access (and a few
> > more) only in a domU (and only for PV), when PCI passthrough is used.
> > Here, it didn't go that far...
> >
> > But then, Xen may intercept such access [2]. If I read it right, it
> > should allow all access (is_hardware_domain(dom0)==true, and also the
> > device is not on ro_map - otherwise reset wouldn't work at all).
>
> The other day you mentioned (on Matrix I think) that you observe mmcfg
> not being used on that system. Am I misremembering? (Since the capability
> where the control bit lives is an extended one, that capability would
> neither be read nor modified when mmcfg is unavailable.)
Yes, but later (once dom0 starts) it switched back to mmcfg. Now I see
this:
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
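For orientation, here is a minimal Python sketch of how an mmcfg (ECAM) window like the one logged above maps a bus/device/function/register tuple to a physical address. The function name ecam_address is made up for illustration; the base value comes from the log line, and the bit layout is the standard ECAM one (bus in bits 27:20, device in 19:15, function in 14:12, register in 11:0). It only does the arithmetic and does not touch hardware.

```python
def ecam_address(base, bus, dev, fn, reg):
    """Return the physical address of a config register inside an ECAM window.

    base is the MCFG base address for the segment (assumed to cover bus 0,
    as in the "buses 00 - ff" log line); reg may be 0..4095, so extended
    config space is reachable this way.
    """
    if not (0 <= bus <= 0xff and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 4096):
        raise ValueError("bus/dev/fn/reg out of range")
    # Each function gets a 4 KiB page; each bus gets 1 MiB.
    return base + ((bus << 20) | (dev << 15) | (fn << 12) | reg)


# Device 01:00.0 under the window logged by Xen (base e0000000).
MCFG_BASE = 0xE0000000
print(hex(ecam_address(MCFG_BASE, 0x01, 0x00, 0, 0x000)))
print(hex(ecam_address(MCFG_BASE, 0x01, 0x00, 0, 0x100)))
```

With the logged base, the config space of 01:00.0 would start at 0xe0100000, and its extended config space (offset 0x100 and up) at 0xe0100100.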
Another thing I noticed in the bug report - the reporter says warm
reboot from 6.11 (where it works) to 6.12 avoids the issue (not sure
about further reboots). Cold boot directly to 6.12 results in this buggy
behavior.
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 11:53 ` Marek Marczykowski-Górecki
@ 2025-01-29 12:49 ` Jan Beulich
0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2025-01-29 12:49 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Bjorn Helgaas
On 29.01.2025 12:53, Marek Marczykowski-Górecki wrote:
> On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
>> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
>>> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
>>>> The report claims the problem only happens with Xen. I'm not a Xen
>>>> person, and I don't know how to find the relevant config accessors.
>>>> The snippets of kernel messages I see at [1] all mention pciback, so
>>>> that's my only clue of where to look. Bottom line, I have no idea
>>>> what the config accessor path is, and maybe we could learn something
>>>> by looking at whatever it is.
>>>
>>> AFAIK there are no separate config accessors under Xen dom0, the default
>>> ones are used. xen-pcifront takes over PCI config space access (and a few
>>> more) only in a domU (and only for PV), when PCI passthrough is used.
>>> Here, it didn't go that far...
>>>
>>> But then, Xen may intercept such access [2]. If I read it right, it
>>> should allow all access (is_hardware_domain(dom0)==true, and also the
>>> device is not on ro_map - otherwise reset wouldn't work at all).
>>
>> The other day you mentioned (on Matrix I think) that you observe mmcfg
>> not being used on that system. Am I misremembering? (Since the capability
>> where the control bit lives is an extended one, that capability would
>> neither be read nor modified when mmcfg is unavailable.)
>
> Yes, but later (once dom0 starts) it switched back to mmcfg. Now I see
> this:
> (XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
> (XEN) PCI: Using MCFG for segment 0000 bus 00-ff
>
> Another thing I noticed in the bug report - the reporter says warm
> reboot from 6.11 (where it works) to 6.12 avoids the issue (not sure
> about further reboots). Cold boot directly to 6.12 results in this buggy
> behavior.
Makes things yet more odd, imo.
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 9:17 ` Jan Beulich
2025-01-29 11:53 ` Marek Marczykowski-Górecki
@ 2025-01-29 13:28 ` Bjorn Helgaas
2025-01-29 13:54 ` Jan Beulich
1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 13:28 UTC (permalink / raw)
To: Jan Beulich
Cc: Marek Marczykowski-Górecki, Bjorn Helgaas,
Jürgen Groß, Roger Pau Monné, Boris Ostrovsky,
xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee, linux-pci
On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
> > On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
> >> The report claims the problem only happens with Xen. I'm not a Xen
> >> person, and I don't know how to find the relevant config accessors.
> >> The snippets of kernel messages I see at [1] all mention pciback, so
> >> that's my only clue of where to look. Bottom line, I have no idea
> >> what the config accessor path is, and maybe we could learn something
> >> by looking at whatever it is.
> >
> > AFAIK there are no separate config accessors under Xen dom0, the default
> > ones are used. xen-pcifront takes over PCI config space access (and a few
> > more) only in a domU (and only for PV), when PCI passthrough is used.
> > Here, it didn't go that far...
> >
> > But then, Xen may intercept such access [2]. If I read it right, it
> > should allow all access (is_hardware_domain(dom0)==true, and also the
> > device is not on ro_map - otherwise reset wouldn't work at all).
>
> The other day you mentioned (on Matrix I think) that you observe mmcfg
> not being used on that system. Am I misremembering? (Since the capability
> where the control bit lives is an extended one, that capability would
> neither be read nor modified when mmcfg is unavailable.)
If you're referring to the Configuration RRS Software Visibility
Enable bit, that's in the PCIe Capability Root Control register, which
is in the PCI-compatible config space (the first 256 bytes), not the
extended config space.
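To make the location concrete, here is a rough Python sketch that walks the standard capability list in a 256-byte config space dump, finds the PCIe Capability, and tests the software visibility bit in Root Control. The function names are hypothetical; the constants mirror the Linux pci_regs.h names and values as I understand them (Root Control at offset 0x1c within the PCIe Capability, the enable bit being bit 4), so treat them as assumptions to check against the header. It operates on a bytes object, not on hardware.

```python
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10      # capability list present
PCI_CAPABILITY_LIST = 0x34      # pointer to first capability
PCI_CAP_ID_EXP = 0x10           # PCI Express Capability ID
PCI_EXP_RTCTL = 0x1C            # Root Control, relative to the PCIe Capability
PCI_EXP_RTCTL_RRS_SVE = 0x0010  # Configuration RRS (formerly CRS) SV Enable


def find_capability(cfg, cap_id):
    """Return the offset of capability cap_id in cfg, or 0 if absent."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    # Bound the walk so an all-0xff dump (as in this report) cannot loop forever.
    for _ in range(48):
        if pos < 0x40:
            break
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return 0


def rrs_sv_enabled(cfg):
    """True/False for the SV enable bit, or None if no PCIe Capability is found."""
    cap = find_capability(cfg, PCI_CAP_ID_EXP)
    if not cap:
        return None
    off = cap + PCI_EXP_RTCTL
    rtctl = int.from_bytes(cfg[off:off + 2], "little")
    return bool(rtctl & PCI_EXP_RTCTL_RRS_SVE)
```

The input could be the first 256 bytes read (as root) from a device's sysfs config file. Root Control only exists in Root Ports and Root Complex Event Collectors, so the interesting device is the Root Port above 01:00.0, not the MT7922 itself.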
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 3:47 ` Marek Marczykowski-Górecki
@ 2025-01-29 13:32 ` Bjorn Helgaas
2025-01-29 13:52 ` Jan Beulich
0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 13:32 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Jan Beulich
On Wed, Jan 29, 2025 at 04:47:28AM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
> > I guess the code at [2] is running in user mode and uses Linux
> > syscalls for config access? Is it straceable?
>
> Nope, it's running as the hypervisor and mediates Linux's access to the
> hardware. In fact, Linux PV kernel (which dom0 is by default under Xen)
> is running in ring 3...
>
> But I can add some more logging there.
So I guess the hypervisor performs the config access on behalf of the
Linux PV kernel? Obviously Linux thinks CRS/RRS SV is enabled, but I
suppose all the lspci output is similarly based on whatever the
hypervisor does, so how do we know the actual hardware config?
> > > [2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/pv/emul-priv-op.c;h=70150c27227661baa253af8693ff00f2ab640a98;hb=HEAD#l295
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 13:32 ` Bjorn Helgaas
@ 2025-01-29 13:52 ` Jan Beulich
2025-01-29 14:50 ` Bjorn Helgaas
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2025-01-29 13:52 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Marek Marczykowski-Górecki
On 29.01.2025 14:32, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2025 at 04:47:28AM +0100, Marek Marczykowski-Górecki wrote:
>> On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
>>> I guess the code at [2] is running in user mode and uses Linux
>>> syscalls for config access? Is it straceable?
>>
>> Nope, it's running as the hypervisor and mediates Linux's access to the
>> hardware. In fact, Linux PV kernel (which dom0 is by default under Xen)
>> is running in ring 3...
>>
>> But I can add some more logging there.
>
> So I guess the hypervisor performs the config access on behalf of the
> Linux PV kernel? Obviously Linux thinks CRS/RRS SV is enabled, but I
> suppose all the lspci output is similarly based on whatever the
> hypervisor does, so how do we know the actual hardware config?
The hypervisor only intercepts config space writes; reads, particularly
when going via mmcfg, ought to be unaffected, and hence what lspci shows
should be "the actual hardware config".
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 13:28 ` Bjorn Helgaas
@ 2025-01-29 13:54 ` Jan Beulich
0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2025-01-29 13:54 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Marek Marczykowski-Górecki, Bjorn Helgaas,
Jürgen Groß, Roger Pau Monné, Boris Ostrovsky,
xen-devel, linux-kernel, regressions, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee, linux-pci
On 29.01.2025 14:28, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2025 at 10:17:20AM +0100, Jan Beulich wrote:
>> On 29.01.2025 04:22, Marek Marczykowski-Górecki wrote:
>>> On Tue, Jan 28, 2025 at 09:03:15PM -0600, Bjorn Helgaas wrote:
>>>> The report claims the problem only happens with Xen. I'm not a Xen
>>>> person, and I don't know how to find the relevant config accessors.
>>>> The snippets of kernel messages I see at [1] all mention pciback, so
>>>> that's my only clue of where to look. Bottom line, I have no idea
>>>> what the config accessor path is, and maybe we could learn something
>>>> by looking at whatever it is.
>>>
>>> AFAIK there are no separate config accessors under Xen dom0, the default
>>> ones are used. xen-pcifront takes over PCI config space access (and a few
>>> more) only in a domU (and only for PV), when PCI passthrough is used.
>>> Here, it didn't go that far...
>>>
>>> But then, Xen may intercept such access [2]. If I read it right, it
>>> should allow all access (is_hardware_domain(dom0)==true, and also the
>>> device is not on ro_map - otherwise reset wouldn't work at all).
>>
>> The other day you mentioned (on Matrix I think) that you observe mmcfg
>> not being used on that system. Am I misremembering? (Since the capability
>> where the control bit lives is an extended one, that capability would
>> neither be read nor modified when mmcfg is unavailable.)
>
> If you're referring to the Configuration RRS Software Visibility
> Enable bit, that's in the PCIe Capability Root Control register, which
> is in the PCI-compatible config space (the first 256 bytes), not the
> extended config space.
Oh, I clearly didn't read Marek's earlier mail correctly. I'm sorry for that.
Jan
* Re: Config space access to Mediatek MT7922 doesn't work after device reset in Xen PV dom0 (regression, Linux 6.12)
2025-01-29 13:52 ` Jan Beulich
@ 2025-01-29 14:50 ` Bjorn Helgaas
0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2025-01-29 14:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Bjorn Helgaas, Jürgen Groß, Roger Pau Monné,
Boris Ostrovsky, xen-devel, linux-kernel, regressions,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, linux-pci,
Marek Marczykowski-Górecki
On Wed, Jan 29, 2025 at 02:52:33PM +0100, Jan Beulich wrote:
> On 29.01.2025 14:32, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2025 at 04:47:28AM +0100, Marek Marczykowski-Górecki wrote:
> >> On Tue, Jan 28, 2025 at 09:40:18PM -0600, Bjorn Helgaas wrote:
> >>> I guess the code at [2] is running in user mode and uses Linux
> >>> syscalls for config access? Is it straceable?
> >>
> >> Nope, it's running as the hypervisor and mediates Linux's access to the
> >> hardware. In fact, Linux PV kernel (which dom0 is by default under Xen)
> >> is running in ring 3...
> >>
> >> But I can add some more logging there.
> >
> > So I guess the hypervisor performs the config access on behalf of the
> > Linux PV kernel? Obviously Linux thinks CRS/RRS SV is enabled, but I
> > suppose all the lspci output is similarly based on whatever the
> > hypervisor does, so how do we know the actual hardware config?
>
> The hypervisor only intercepts config space writes; reads, particularly
> when going via mmcfg, ought to be unaffected, and hence what lspci shows
> should be "the actual hardware config".
FWIW, on x86, accesses to the first 256 bytes of config space in PCI
domain (segment) 0 use raw_pci_ops even when mmcfg is available (see
raw_pci_read()). This normally means IO ports 0xcf8/0xcfc.
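As an illustration of what the 0xcf8/0xcfc path involves, here is a small Python sketch of the address word that the conventional "type 1" mechanism writes to port 0xcf8 before the data is read or written through 0xcfc. The function name conf1_address is made up here, and the sketch covers only the first 256 bytes of config space; it computes the value and performs no port I/O.

```python
def conf1_address(bus, dev, fn, reg):
    """Return the 32-bit value written to port 0xcf8 for a type 1 config access.

    Layout: bit 31 enable, bits 23:16 bus, 15:11 device, 10:8 function,
    7:2 dword-aligned register. The low two bits of reg select the byte
    lane within the data port (0xcfc + (reg & 3)) instead.
    """
    if not (0 <= bus <= 0xff and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 256):
        raise ValueError("bus/dev/fn/reg out of range for type 1 access")
    return 0x80000000 | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


# Vendor/device ID dword of 01:00.0, the device from the report.
print(hex(conf1_address(0x01, 0x00, 0, 0x00)))
```

For 01:00.0 at register 0 this gives 0x80010000. Since a PV dom0 cannot do port I/O directly, both of those port accesses are ones Xen gets a chance to see.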