* Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-24  3:31 Daniel Drake
  2018-08-24 15:42 ` [Nouveau] " Peter Wu
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Drake @ 2018-08-24  3:31 UTC
To: linux-pci, nouveau, Linux PM; +Cc: Endless Linux Upstreaming Team

Hi,

We are facing a suspend/resume problem with many different Asus laptop
models (30+ products) with Intel chipsets (multiple generations) and
nvidia GPUs (several different ones). Reproducers include:

1. Boot
2. Suspend/resume
3. Load nouveau driver
4. Start X
5. Observe slow X startup and many, many errors in logs (primarily
   nouveau fifo faults)

or

1. Boot
2. Load nouveau driver
3. Start X
4. Run glxgears - observe spinning gears
5. Suspend/resume
6. Run glxgears - observe that output is all black

or

1. Boot
2. Load proprietary nvidia driver
3. Start X
4. Suspend/resume
5. Observe screen all black, Xorg using 100% CPU

So, suspend/resume basically kills the nvidia card in some way.

After a lot of experimentation I found a workaround: during resume,
set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.

As an example of an affected product, take the Asus X542UQ (Intel
KabyLake i7-7500U with Nvidia GeForce 940MX). The PCI bridge is:

00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 120
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: ee000000-ef0fffff
    Prefetchable memory behind bridge: 00000000d0000000-00000000e1ffffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Sunrise Point-LP PCI Express Root Port [1043:1a00]
    Capabilities: [a0] Power Management version 3
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Access Control Services
    Capabilities: [200] L1 PM Substates
    Capabilities: [220] #19
    Kernel driver in use: pcieport

The really weird thing here is that the workaround register
PCI_PREF_BASE_UPPER32 already appears to have value 0, as shown above
and also verified during resume. But simply writing value 0 again
definitely results in all the problems going away.

1. Is the Intel PCI bridge misbehaving here? Why does writing the same
value of PCI_PREF_BASE_UPPER32 make any difference at all?

2. Who is responsible for saving and restoring PCI bridge
configuration during suspend and resume? Linux? ACPI? BIOS?

I could not see any Linux code to save and restore these registers.
Likewise I didn't find anything in the ACPI DSDT/SSDT - neither on the
affected products, nor on a similar product that does not suffer this
nvidia issue. Linux does put the PCI bridge into D3 power state during
suspend, and upon resume the lower 32 bits of the prefetch address are
still set to the same value, so through some means this info is not
being lost.

3. Any other suggestions, hints or experiments I could do to help move
forward on this issue?

My goal is to add a workaround to Linux (perhaps as a pci quirk) for
existing devices, but also we are in conversation with Asus engineers
and if we can come up with a concrete diagnosis, we should be able to
have them fix this at the BIOS level in future products.

Thanks
Daniel
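For readers following the register discussion in the message above, here is a minimal sketch of how the "Prefetchable memory behind bridge" window reported by lspci is assembled from the bridge's configuration registers. The offsets follow the standard PCI-to-PCI bridge (type 1) header. The function and macro names are hypothetical, and the raw 16-bit register values used in the usage note are inferred from the lspci output rather than read from the hardware.

```c
#include <stdint.h>

/* Standard offsets in a PCI-to-PCI bridge (type 1) configuration header. */
#define PREF_MEMORY_BASE   0x24 /* 16-bit: bits 15:4 hold address bits 31:20 */
#define PREF_MEMORY_LIMIT  0x26 /* 16-bit: same layout as the base register  */
#define PREF_BASE_UPPER32  0x28 /* 32-bit: address bits 63:32 of the base    */
#define PREF_LIMIT_UPPER32 0x2c /* 32-bit: address bits 63:32 of the limit   */

/* Combine the 16-bit base register with the upper-32 register.
 * The low nibble of the 16-bit register is a capability field
 * (1 means the window is 64-bit capable) and is masked off. */
uint64_t pref_window_base(uint16_t base16, uint32_t base_upper32)
{
    return ((uint64_t)base_upper32 << 32) |
           ((uint64_t)(base16 & 0xfff0) << 16);
}

/* Same for the limit; the window is 1 MiB aligned, so the low
 * 20 bits of the limit address are always ones. */
uint64_t pref_window_limit(uint16_t limit16, uint32_t limit_upper32)
{
    return ((uint64_t)limit_upper32 << 32) |
           ((uint64_t)(limit16 & 0xfff0) << 16) | 0xfffff;
}
```

Assuming the bridge holds 0xd001 in the base register and 0xe1f1 in the limit register, with both upper-32 registers at 0, these helpers give 0xd0000000 and 0xe1ffffff, matching the window in the lspci output. It also shows why PCI_PREF_BASE_UPPER32 reads as 0 here: the whole window sits below 4 GiB.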
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-24  3:31 Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues Daniel Drake
@ 2018-08-24 15:42 ` Peter Wu
  2018-08-28  2:23   ` Daniel Drake
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Wu @ 2018-08-24 15:42 UTC
To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

Hi Daniel,

On Fri, Aug 24, 2018 at 11:31:54AM +0800, Daniel Drake wrote:
> Hi,
>
> We are facing a suspend/resume problem with many different Asus laptop
> models (30+ products) with Intel chipsets (multiple generations) and
> nvidia GPUs (several different ones). Reproducers include:

Are these systems also affected through runtime power management? For
example:

    modprobe nouveau    # should enable runtime PM
    sleep 6             # wait for runtime suspend to kick in
    lspci -s1:          # runtime resume by reading PCI config space

On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
hangs on various laptops
(https://bugzilla.kernel.org/show_bug.cgi?id=156341). I wonder if you
are experiencing the same issue.

Do you have a list of affected models, an acpidump, the output of
"lspci -nnvvvxxxx" and the corresponding BIOS version (e.g. from
/sys/class/dmi/id/)?

> After a lot of experimentation I found a workaround: during resume,
> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.

I am curious, how did you discover this? While this could work, perhaps
there are alternative workarounds/fixes?

When you say "parent PCI" bridge, is that actually the device you see in
"lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:

    -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
               +-01.0-[01]----00.0  NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]

    00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)

Under 00:1c.0, there is a wireless adapter.

> As an example of an affected product, take the Asus X542UQ (Intel
> KabyLake i7-7500U with Nvidia GeForce 940MX). The PCI bridge is:
>
> 00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI
> Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     Flags: bus master, fast devsel, latency 0, IRQ 120
>     Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>     I/O behind bridge: 0000e000-0000efff
>     Memory behind bridge: ee000000-ef0fffff
>     Prefetchable memory behind bridge: 00000000d0000000-00000000e1ffffff
>     Capabilities: [40] Express Root Port (Slot+), MSI 00
>     Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>     Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Sunrise
> Point-LP PCI Express Root Port [1043:1a00]
>     Capabilities: [a0] Power Management version 3
>     Capabilities: [100] Advanced Error Reporting
>     Capabilities: [140] Access Control Services
>     Capabilities: [200] L1 PM Substates
>     Capabilities: [220] #19
>     Kernel driver in use: pcieport
>
> The really weird thing here is that the workaround register
> PCI_PREF_BASE_UPPER32 already appears to have value 0, as shown above
> and also verified during resume. But simply writing value 0 again
> definitely results in all the problems going away.
>
> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
> value of PCI_PREF_BASE_UPPER32 make any difference at all?

At what point in the suspend code path did you insert this write? Is it
possible that the write somehow acted as a fence/memory barrier?

> 2. Who is responsible for saving and restoring PCI bridge
> configuration during suspend and resume? Linux? ACPI? BIOS?

Not sure about PCI bridges, but at least the PCI Express Capability
registers are under the control of the OS once control has been granted
via the ACPI _OSC method.

> I could not see any Linux code to save and restore these registers.
> Likewise I didn't find anything in the ACPI DSDT/SSDT - neither on the
> affected products, nor on a similar product that does not suffer this
> nvidia issue. Linux does put the PCI bridge into D3 power state during
> suspend, and upon resume the lower 32 bits of the prefetch address are
> still set to the same value, so through some means this info is not
> being lost.
>
>
> 3. Any other suggestions, hints or experiments I could do to help move
> forward on this issue?
>
> My goal is to add a workaround to Linux (perhaps as a pci quirk) for
> existing devices, but also we are in conversation with Asus engineers
> and if we can come up with a concrete diagnosis, we should be able to
> have them fix this at the BIOS level in future products.

As Windows is probably not affected by this issue, it must be possible
to change Linux so that it behaves more like Windows here, though I am
not sure what change is needed.

I recently compared PCI configuration space access and ACPI method
invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
(1803). There were differences like disabling MSI/interrupts before
suspend, setting the Enable Clock Power Management bit in PCI Express
Link Control and more, but applying these changes has so far not been
successful. Some supporting files for that investigation are here:
https://github.com/Lekensteyn/acpi-stuff/tree/master/d3test

Karol noticed that when the State in PMCSR is not set to D3 for the
Nvidia GPU during runtime suspend, the device resumes successfully.
However, based on traces using VFIO-PCI, that does not seem like a good
solution, as Windows does not behave like that.
--
Kind regards,
Peter Wu
https://lekensteyn.nl
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-24 15:42 ` [Nouveau] " Peter Wu
@ 2018-08-28  2:23   ` Daniel Drake
  2018-08-28  9:57     ` Peter Wu
  2018-08-29 12:40     ` Karol Herbst
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Drake @ 2018-08-28  2:23 UTC
To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Are these systems also affected through runtime power management? For
> example:
>
>     modprobe nouveau    # should enable runtime PM
>     sleep 6             # wait for runtime suspend to kick in
>     lspci -s1:          # runtime resume by reading PCI config space
>
> On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
> hangs on various laptops
> (https://bugzilla.kernel.org/show_bug.cgi?id=156341).

This works fine here. I'm facing a different issue.

>> After a lot of experimentation I found a workaround: during resume,
>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>
> I am curious, how did you discover this? While this could work, perhaps
> there are alternative workarounds/fixes?

Based on the observation that the following procedure works fine (note
the addition of step 3):

1. Boot
2. Suspend/resume
3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
4. Load nouveau driver
5. Start X

I worked through the rescan codepath until I had isolated the specific
code which magically makes things work (in pci_bridge_check_ranges).

Having found that, step 3 in the above test procedure can be replaced
with a simple:

    setpci -s 00:1c.0 0x28.l=0

> When you say "parent PCI" bridge, is that actually the device you see in
> "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
>
>     -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
>                +-01.0-[01]----00.0  NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>
>     00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)

Yes, it's the parent bridge shown by lspci. The address of this varies
from system to system.

>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>
> At what point in the suspend code path did you insert this write? It is
> possible that the write somehow acted as a fence/memory barrier?

static void quirk_pref_base_upper32(struct pci_dev *dev)
{
        u32 pref_base_upper32;

        /* Read the register and write the same value straight back. */
        pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
        pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
}
DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);

I don't think it's acting as a barrier. I tried changing this code to
rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
the bug come back.

>> 2. Who is responsible for saving and restoring PCI bridge
>> configuration during suspend and resume? Linux? ACPI? BIOS?
>
> Not sure about PCI bridges, but at least for the PCI Express Capability
> registers, it is in control of the OS when control is granted via the
> ACPI _OSC method.

I guess you are referring to pci_save_pcie_state(). I can't see
anything equivalent for the bridge registers.

> As Windows is probably not affected by this issue, a change must be
> possible to make Linux more compatible with Windows. Though I am not
> sure what change is needed.

I agree. There's a definite difference with Windows here and it would
be great to find a fix along those lines.

> I recently compared PCI configuration space access and ACPI method
> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
> (1803). There were differences like disabling MSI/interrupts before
> suspend, setting the Enable Clock Power Management bit in PCI Express
> Link Control and more, but applying these changes were so far not really
> successful.

Interesting. Do you know any way that I could spy on Windows' accesses
to the PCI bridge registers?
Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
I suspect VFIO would not help me here.
It says:
    Note: If they are grouped with other devices in this manner, pci
    root ports and bridges should neither be bound to vfio at boot, nor be
    added to the VM.

Thanks
Daniel
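The setpci experiment in the message above can also be expressed as a small userspace helper that reads the dword at offset 0x28 through the device's sysfs `config` file and writes the same bytes straight back, mirroring what the quirk does. This is only a sketch under stated assumptions: the function names are hypothetical, the caller needs root privileges, and the bridge path (for example /sys/bus/pci/devices/0000:00:1c.0/config on the X542UQ) varies from system to system.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define PREF_BASE_UPPER32 0x28 /* Prefetchable Base Upper 32 Bits */

/* Read 4 bytes at `offset` from an open config-space file and write
 * the same bytes back. Working on raw bytes avoids any endianness
 * conversion. Returns 0 on success, -1 on a failed or short access. */
int rewrite_config_dword(int fd, off_t offset)
{
    uint8_t raw[4];

    if (pread(fd, raw, sizeof(raw), offset) != (ssize_t)sizeof(raw))
        return -1;
    if (pwrite(fd, raw, sizeof(raw), offset) != (ssize_t)sizeof(raw))
        return -1;
    return 0;
}

/* Open the given sysfs config file and rewrite PREF_BASE_UPPER32. */
int rewrite_pref_base_upper32(const char *config_path)
{
    int fd = open(config_path, O_RDWR);
    int ret;

    if (fd < 0) {
        perror(config_path);
        return -1;
    }
    ret = rewrite_config_dword(fd, PREF_BASE_UPPER32);
    close(fd);
    return ret;
}
```

Unlike `setpci -s 00:1c.0 0x28.l=0`, this writes back whatever value was read rather than a literal 0; on the affected machines the register already reads as 0, so the two are equivalent there.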
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-28  2:23   ` Daniel Drake
@ 2018-08-28  9:57     ` Peter Wu
  2018-08-29  0:19       ` Karol Herbst
                         ` (2 more replies)
  2018-08-29 12:40     ` Karol Herbst
  1 sibling, 3 replies; 12+ messages in thread
From: Peter Wu @ 2018-08-28  9:57 UTC
To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Tue, Aug 28, 2018 at 10:23:24AM +0800, Daniel Drake wrote:
> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Are these systems also affected through runtime power management? For
> > example:
> >
> >     modprobe nouveau    # should enable runtime PM
> >     sleep 6             # wait for runtime suspend to kick in
> >     lspci -s1:          # runtime resume by reading PCI config space
> >
> > On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
> > hangs on various laptops
> > (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>
> This works fine here. I'm facing a different issue.

Just to be sure, after "sleep", do both devices report "suspended" in

    /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
    /sys/bus/pci/devices/0000:01:00.0/power/runtime_status

and was this reproduced with a recent mainline kernel with no special
cmdline options? The endlessm kernel on Github seems to have quite a few
patches, one of which explicitly disables runtime PM:
https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2

> >> After a lot of experimentation I found a workaround: during resume,
> >> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
> >> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
> >
> > I am curious, how did you discover this? While this could work, perhaps
> > there are alternative workarounds/fixes?
>
> Based on the observation that the following procedure works fine (note
> the addition of step 3):
>
> 1. Boot
> 2. Suspend/resume
> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
> 4. Load nouveau driver
> 5. Start X
>
> I worked through the rescan codepath until I had isolated the specific
> code which magically makes things work (in pci_bridge_check_ranges).
>
> Having found that, step 3 in the above test procedure can be replaced
> with a simple:
>     setpci -s 00:1c.0 0x28.l=0
>
> > When you say "parent PCI" bridge, is that actually the device you see in
> > "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
> >
> >     -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
> >                +-01.0-[01]----00.0  NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
> >
> >     00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>
> Yes, it's the parent bridge shown by lspci. The address of this varies
> from system to system.

Could you share some details:
- acpidump
- lspci -nnxxxxvvv
- BIOS version (from /sys/class/dmi/id/)
- kernel version (mainline?)

Perhaps there is some magic in the ACPI suspend or resume path that
causes this.

> >> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
> >> value of PCI_PREF_BASE_UPPER32 make any difference at all?
> >
> > At what point in the suspend code path did you insert this write? It is
> > possible that the write somehow acted as a fence/memory barrier?
>
> static void quirk_pref_base_upper32(struct pci_dev *dev)
> {
>         u32 pref_base_upper32;
>         pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
>         pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
> }
> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>
> I don't think it's acting as a barrier. I tried changing this code to
> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
> the bug come back.
>
> >> 2. Who is responsible for saving and restoring PCI bridge
> >> configuration during suspend and resume? Linux? ACPI? BIOS?
> >
> > Not sure about PCI bridges, but at least for the PCI Express Capability
> > registers, it is in control of the OS when control is granted via the
> > ACPI _OSC method.
>
> I guess you are referring to pci_save_pcie_state(). I can't see
> anything equivalent for the bridge registers.

Yes, that would be the function, called via pci_save_state.

> > I recently compared PCI configuration space access and ACPI method
> > invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
> > (1803). There were differences like disabling MSI/interrupts before
> > suspend, setting the Enable Clock Power Management bit in PCI Express
> > Link Control and more, but applying these changes were so far not really
> > successful.
>
> Interesting. Do you know any way that I could spy on Windows' accesses
> to the PCI bridge registers?
> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
> I suspect VFIO would not help me here.
> It says:
>     Note: If they are grouped with other devices in this manner, pci
>     root ports and bridges should neither be bound to vfio at boot, nor be
>     added to the VM.

Only non-bridge devices can be passed to a guest, but perhaps logging
access to the emulated bridge is already sufficient. The Prefetchable
Base Upper 32 Bits register is at offset 0x28.

In a trace where the Nvidia device is disabled/enabled via Device
Manager, I see writes on the enable path:

    2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)

For Linux, I only see one write at startup, none on runtime resume.
I did not test system sleep/resume. (Disable/enable is arguably a bit
different from system s/r, so you may want to do additional testing
here.)

Full log for Windows 10 and Linux:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/slogs/win10-rp-enable-disable.txt#L3418
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/slogs/linux-rp.txt

lspci for the emulated bridge:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/lspci-vm-vfio.txt#L359

The rp_*_config trace points are non-standard and require patches:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/patches/qemu-trace.diff
--
Kind regards,
Peter Wu
https://lekensteyn.nl
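When comparing the Windows and Linux logs mentioned above, it can help to filter for writes to offset 0x28 automatically. The following is a minimal sketch of such a filter. The line format is assumed from the single rp_write_config record quoted in the message (the trace point is non-standard, so other records may look different), and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse one rp_write_config trace record of the assumed form
 *   <pid>@<timestamp>:rp_write_config (<dev>, @<offset>, <value>, len=<len>)
 * Returns 1 and fills the outputs if the line matches, 0 otherwise. */
int parse_rp_write(const char *line, unsigned *offset,
                   unsigned *value, unsigned *len)
{
    const char *p = strstr(line, "rp_write_config (");
    char dev[32]; /* device name, e.g. "ioh3420"; parsed but unused */

    if (!p)
        return 0;
    /* %x accepts an optional 0x prefix, so "0x28" parses as 0x28. */
    return sscanf(p, "rp_write_config (%31[^,], @%x, %x, len=%x)",
                  dev, offset, value, len) == 4;
}

/* True if the record is a write to Prefetchable Base Upper 32 Bits. */
int is_pref_base_upper32_write(const char *line)
{
    unsigned offset, value, len;

    return parse_rp_write(line, &offset, &value, &len) && offset == 0x28;
}
```

Feeding each line of a log through `is_pref_base_upper32_write()` and counting the matches would show how many such writes occur on each operating system.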
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-28  9:57     ` Peter Wu
@ 2018-08-29  0:19       ` Karol Herbst
  2018-08-30  7:41       ` Daniel Drake
  2018-09-05  6:26       ` Daniel Drake
  2 siblings, 0 replies; 12+ messages in thread
From: Karol Herbst @ 2018-08-29  0:19 UTC
To: Peter Wu
Cc: Daniel Drake, linux-pci, Linux PM, Endless Linux Upstreaming Team, nouveau

Hi everybody,

I came up with another workaround for the runtime suspend/resume
issues we have as well:
https://github.com/karolherbst/linux/commit/3cab4c50f77cf97c6c19a9b1e7884366f78f35a5.patch

I don't think this is really a bug inside the kernel, or at least not
directly. If you, for example, do not use Nouveau but simply enable the
runpm features without a driver or with a very dumb stub driver, the
GPU should be able to suspend and resume correctly. At least this is
the case on my laptop.

I was able to disable enough parts of Nouveau's code to be able to tell
that running some signed firmware embedded in the vbios on the GPU's
embedded PMU is what causes the runpm issues to appear on my laptop.
This firmware is also used by the nvidia driver, which makes the
argument "it happens with Nouveau and nvidia" a useless one.

I have no idea what this is all about, but it might be the
hardware/firmware just being overprotective and bailing out on an
untrusted state, maybe it is a bug inside the kernel, or maybe a bug
inside nvidia's firmware, which would be super hard to fix as it's
embedded in the vbios.
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-28  9:57     ` Peter Wu
  2018-08-29  0:19       ` Karol Herbst
@ 2018-08-30  7:41       ` Daniel Drake
  2018-08-30  9:40         ` Peter Wu
  2018-09-05  6:26       ` Daniel Drake
  2 siblings, 1 reply; 12+ messages in thread
From: Daniel Drake @ 2018-08-30  7:41 UTC
To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Just to be sure, after "sleep", do both devices report "suspended" in
>
>     /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
>     /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
>
> and was this reproduced with a recent mainline kernel with no special
> cmdline options? The endlessm kernel on Github seems to have quite some
> patches, one of them explicitly disable runtime PM:
> https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2

Yes, I checked for this issue in the past and I'm certain that nouveau
runtime pm works fine.

I also checked again now on X542UQ and the results are the same.
nouveau can do runtime suspend/resume (confirmed by reading
runtime_status) and then render 3D graphics OK. lspci is fine too. It
is just S3 suspend that is affected. This was tested on unmodified
Linux 4.18. I had to set the nouveau runpm parameter to 1 for it to use
runtime pm.

Also checked with Karol's patch, the S3 issue is still there. Seems
like 2 different issues.

> Could you share some details:
> - acpidump
> - lspci -nnxxxxvvv
> - BIOS version (from /sys/class/dmi/id/)
> - kernel version (mainline?)

Linux 4.18 mainline
BIOS version: X542UQ.202
acpidump: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/gistfile1.txt
pci: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/pci

> Only non-bridge devices can be passed to a guest, but perhaps logging
> access to the emulated bridge is already sufficient. The Prefetchable
> Base Upper 32 Bits register is at offset 0x28.
>
> In a trace where the Nvidia device is disabled/enabled via Device
> Manager, I see writes on the enable path:
>
>     2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
>
> For Linux, I only see one write at startup, none on runtime resume.
> I did not test system sleep/resume. (disable/enable is arguably a bit
> different from system s/r, you may want to do additional testing here.)

I managed to install Win10 Home under virt-manager with the nvidia
device passed through.
However, the nvidia windows driver installer refuses to install. It says:
    The NVIDIA graphics driver is not compatible with this version of Windows.
    This graphics driver could not find compatible graphics hardware.

One trick for similar-sounding problems is to change the hypervisor
vendor ID, but no luck here.

I was going to check if I can monitor PCI bridge config space access
even without the nvidia driver installed, but I can't find a way to
make the windows VM suspend and resume - the option is not available
in the VM.

Daniel
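The runtime_status check discussed in the message above (both the bridge and the GPU should report "suspended") can be scripted. Below is a minimal sketch, assuming the standard sysfs runtime PM attribute whose contents are a single word such as "active" or "suspended". The helper names are hypothetical, and the PCI addresses passed in would be the ones from the thread (0000:00:1c.0 and 0000:01:00.0) or whatever the target system uses.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of an already-open runtime_status file.
 * Returns 1 if it says "suspended", 0 for any other state,
 * and -1 if nothing could be read. */
int status_is_suspended(FILE *f)
{
    char buf[32];

    if (!fgets(buf, sizeof(buf), f))
        return -1;
    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
    return strcmp(buf, "suspended") == 0;
}

/* Look up power/runtime_status for a PCI address such as
 * "0000:01:00.0". Returns the same values as status_is_suspended(). */
int device_is_runtime_suspended(const char *pci_addr)
{
    char path[128];
    FILE *f;
    int ret;

    snprintf(path, sizeof(path),
             "/sys/bus/pci/devices/%s/power/runtime_status", pci_addr);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ret = status_is_suspended(f);
    fclose(f);
    return ret;
}
```

Reading runtime_status does not itself wake the device, so this check can be run after the "sleep" step without disturbing the suspended state, unlike the `lspci -s1:` step, which the thread uses precisely because it triggers a runtime resume.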
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-30  7:41         ` Daniel Drake
@ 2018-08-30  9:40           ` Peter Wu
  2018-08-31  7:17             ` Daniel Drake
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Wu @ 2018-08-30  9:40 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Thu, Aug 30, 2018 at 03:41:43PM +0800, Daniel Drake wrote:
> On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Just to be sure, after "sleep", do both devices report "suspended" in
> > /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
> > /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
> >
> > and was this reproduced with a recent mainline kernel with no special
> > cmdline options? The endlessm kernel on Github seems to have quite some
> > patches, one of them explicitly disable runtime PM:
> > https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2
>
> Yes, I checked for this issue in the past and I'm certain that nouveau
> runtime pm works fine.
>
> I also checked again now on X542UQ and the results are the same.
> nouveau can do runtime suspend/resume (confirmed by reading
> runtime_status) and then render 3D graphics OK. lspci is fine too. It
> is just S3 suspend that is affected. This was testing on Linux 4.18
> unmodified. I had to set nouveau runpm parameter to 1 for it to use
> runtime pm.
>
> Also checked with Karol's patch, the S3 issue is still there. Seems
> like 2 different issues.
>
> > Could you share some details:
> > - acpidump
> > - lspci -nnxxxxvvv
> > - BIOS version (from /sys/class/dmi/id/)
> > - kernel version (mainline?)
>
> Linux 4.18 mainline
> BIOS version: X542UQ.202
> acpidump: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/gistfile1.txt
> pci: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/pci

Thanks, based on the \_SB.PCI0.HGOF implementation, it looks like this
model will not be affected by the runtime suspend issue (it sets the
"Link Disable" register which is known to work for other models).

As the BIOS date is not visible, can you also confirm that this message
is visible in dmesg?

    nouveau: detected PR support, will not use DSM

FWIW, the latest BIOS version is 305, released at 2018/08/07:
https://www.asus.com/Laptops/ASUS-VivoBook-15-X542UQ/HelpDesk_BIOS/

> > Only non-bridge devices can be passed to a guest, but perhaps logging
> > access to the emulated bridge is already sufficient. The Prefetchable
> > Base Upper 32 Bits register is at offset 0x28.
> >
> > In a trace where the Nvidia device is disabled/enabled via Device
> > Manager, I see writes on the enable path:
> >
> > 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
> >
> > For Linux, I only see one write at startup, none on runtime resume.
> > I did not test system sleep/resume. (disable/enable is arguably a bit
> > different from system s/r, you may want to do additional testing here.)
>
> I managed to install Win10 Home under virt-manager with the nvidia
> device passed through.
> However the nvidia windows driver installer refuses to install, says:
> The NVIDIA graphics driver is not compatible with this version of Windows.
> This graphics driver could not find compatible graphics hardware.
>
> One trick for similar sounding problems is to change hypervisor vendor
> ID but no luck here.

For laptops, it appears that you have to do at least two things:
- Ensure that the Subsystem Vendor/Product ID are set.
- Expose a _ROM ACPI method that provides VBIOS.

Perhaps you also need to provide a "_DSM" method that emulates at least
the "Optimus" interface for GUID a486d8f8-0bda-471b-a72b-6042a6b5bee0.

You probably lost interest here, but if you want to continue anyway this
is what allowed me to install the driver on the XPS 9560:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/fakedev.asl

If you adapt it for your environment, note:
- I have only tested this with the q35 machine type with an additional
  ioh3420 root port. See the XPS9560/boot-vm script.
- The \_SB.PCI0.SE0 device should match the root port:
      cat /sys/bus/pci/devices/0000:00:1c.0/firmware_node/path
  (the SE0 name is chosen by QEMU.)
- The "NET" (\_SB.PCI0.SE0.NET) device name is arbitrarily chosen by me,
  it currently assumes PCI address 01:00.0:
      Name (_ADR, 0x00000000)  // _ADR: Address (dev+fn only, 01:00.0)
- The _DSM method is copied from the XPS 9560 SSDT with external method
  references removed (focus on the code with "OPCI" true, the other two
  with NBCI and SGCI are irrelevant). One obvious difference with your
  SSDT is function 0x10, your OPVK ("Optimus Validation Key Object") is
  different and there is another "OPDR" check afterwards.

> I was going to check if I can monitor PCI bridge config space access
> even without the nvidia driver installed, but I can't find a way to
> make the windows VM suspend and resume - the option is not available
> in the VM.

The system cannot be suspended if the GPU device has no driver.
-- 
Kind regards,
Peter Wu
https://lekensteyn.nl

^ permalink raw reply	[flat|nested] 12+ messages in thread
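
A side note on the _ADR value mentioned in the message above ("dev+fn only"): for a PCI device, ACPI encodes the device number in the high word and the function number in the low word, and the bus number is not part of the value. The following is only a minimal sketch of that encoding; the helper name pci_adr is hypothetical and does not come from the thread, the kernel, or ACPICA.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): build the ACPI _ADR value
 * for a PCI device. High word = device number, low word = function
 * number. The bus number is implied by the parent and is not encoded.
 */
uint32_t pci_adr(unsigned int device, unsigned int function)
{
    assert(device < 32);   /* PCI device numbers are 5 bits wide */
    assert(function < 8);  /* PCI function numbers are 3 bits wide */
    return ((uint32_t)device << 16) | (uint32_t)function;
}
```

With this encoding, a device at 01:00.0 gets pci_adr(0, 0), which is the 0x00000000 used in fakedev.asl, while a root port at 00:1c.0 would get pci_adr(0x1c, 0), that is 0x001C0000.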
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-30  9:40           ` Peter Wu
@ 2018-08-31  7:17             ` Daniel Drake
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Drake @ 2018-08-31  7:17 UTC (permalink / raw)
  To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Thu, Aug 30, 2018 at 5:40 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> As the BIOS date is not visible, can you also confirm that this message
> is visible in dmesg?
>
> nouveau: detected PR support, will not use DSM

Yes, that gets logged.

> For laptops, it appears that you have to do at least two things:
> - Ensure that the Subsystem Vendor/Product ID are set.
> - Expose a _ROM ACPI method that provides VBIOS.
>
> Perhaps you also need to provide a "_DSM" method that emulates at least
> the "Optimus" interface for GUID a486d8f8-0bda-471b-a72b-6042a6b5bee0.
>
> You probably lost interest here, but if you want to continue anyway this
> is what allowed me to install the driver on the XPS 9560:
> https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/fakedev.asl

Indeed. I'm going to submit the workaround and I'll look to come back
to this qemu/vfio analysis later.

Thanks
Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-08-28  9:57       ` Peter Wu
  2018-08-29  0:19         ` Karol Herbst
  2018-08-30  7:41         ` Daniel Drake
@ 2018-09-05  6:26         ` Daniel Drake
  2018-09-05 16:02           ` Peter Wu
  2 siblings, 1 reply; 12+ messages in thread
From: Daniel Drake @ 2018-09-05  6:26 UTC (permalink / raw)
  To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Only non-bridge devices can be passed to a guest, but perhaps logging
> access to the emulated bridge is already sufficient. The Prefetchable
> Base Upper 32 Bits register is at offset 0x28.
>
> In a trace where the Nvidia device is disabled/enabled via Device
> Manager, I see writes on the enable path:
>
> 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)

Did you do anything special to get an emulated bridge included in this setup?

Following the instructions at
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF I can
successfully pass through devices to windows running under
virt-manager. In the nvidia GPU case I haven't got past the driver
installation failure, but I can pass through other devices OK and
install their drivers.

However I do not end up with any PCI-to-PCI bridges in this setup. The
passed through device sits at address 00:08.0, parent is the PCI host
bridge 00:00.0.

(I'm trying to spy if Windows appears to restore or reset the PCI
bridge prefetch registers upon resume)

Thanks
Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
  2018-09-05  6:26         ` Daniel Drake
@ 2018-09-05 16:02           ` Peter Wu
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Wu @ 2018-09-05 16:02 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team

On Wed, Sep 05, 2018 at 02:26:51PM +0800, Daniel Drake wrote:
> On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Only non-bridge devices can be passed to a guest, but perhaps logging
> > access to the emulated bridge is already sufficient. The Prefetchable
> > Base Upper 32 Bits register is at offset 0x28.
> >
> > In a trace where the Nvidia device is disabled/enabled via Device
> > Manager, I see writes on the enable path:
> >
> > 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
>
> Did you do anything special to get an emulated bridge included in this setup?

Yes, I followed instructions in QEMU's docs/pcie.txt and ended up with:

    -device ioh3420,id=rp1,bus=pcie.0,addr=1c.0,port=1
    -device vfio-pci,bus=rp1,host=01:00.0,rombar=0,x-pci-sub-vendor-id=0x1028,x-pci-sub-device-id=0x07be

(Subvendor/device IDs are from lspci -nnv).

> Following the instructions at
> https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF I can
> successfully pass through devices to windows running under
> virt-manager. In the nvidia GPU case I haven't got past the driver
> installation failure, but I can pass through other devices OK and
> install their drivers.

After installing drivers, it would still not start. For that to work I
had to pass the VBIOS via an ACPI _ROM method:

    -acpitable file=fakedev.aml
    -fw_cfg name=opt/nl.lekensteyn/vfio-vbios,file=vbios.rom

These options were taken from:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/boot-vm

fakedev.asl source file and instructions to extract the VBIOS:
https://github.com/Lekensteyn/acpi-stuff/tree/master/d3test

> However I do not end up with any PCI-to-PCI bridges in this setup. The
> passed through device sits at address 00:08.0, parent is the PCI host
> bridge 00:00.0.
>
> (I'm trying to spy if Windows appears to restore or reset the PCI
> bridge prefetch registers upon resume)

If you want to suspend the guest, note that Windows refuses suspend with
the default VGA adapter (see "devicequery /a"). Try the QXL adapter with
https://gitlab.freedesktop.org/spice/win32/qxl-wddm-dod

    -vga qxl -device qemu-xhci -device usb-tablet

Not sure how well tested this is, I had to patch Linux to avoid an oops.
If I try this on Windows, it successfully suspends ("info status" in
QEMU monitor says "paused (suspended)"), but resume ends up with a black
screen...

Luckily, the important information is already logged. Windows 10 indeed
seems to write to "Prefetchable Base Upper 32 Bits" on resume[1].
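
One way to check a trace like this for the write in question is to filter the rp_write_config events by config-space offset. The sketch below is only an illustration under stated assumptions: the helper name trace_write_at is hypothetical, and the field layout it expects (pid@seconds.microseconds:rp_write_config (device, @offset, value, len=N)) is inferred from the trace lines quoted in this thread rather than from any QEMU documentation.

```c
#include <assert.h>
#include <stdio.h>

/* Offset of the Prefetchable Base Upper 32 Bits register, as stated in the thread. */
#define PREF_BASE_UPPER32_OFFSET 0x28u

/*
 * Hypothetical helper: returns 1 if `line` is an rp_write_config event
 * for config-space offset `offset` and stores the written value in
 * *value. Returns 0 for reads, other offsets, or unparsable lines.
 * The line layout is an assumption based on the traces in this thread.
 */
int trace_write_at(const char *line, unsigned int offset, unsigned int *value)
{
    unsigned int off, val, len;

    assert(line != NULL && value != NULL);

    /* pid@sec.usec and the device name are matched but not stored. */
    if (sscanf(line, "%*u@%*u.%*u:rp_write_config (%*[^,], @%x, %x, len=%x",
               &off, &val, &len) != 3)
        return 0;
    if (off != offset)
        return 0;

    *value = val;
    return 1;
}
```

Feeding each line of the trace output to trace_write_at(line, PREF_BASE_UPPER32_OFFSET, &v) would pick out only the writes to offset 0x28 and skip all reads and all writes to other registers.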
-- 
Kind regards,
Peter Wu
https://lekensteyn.nl

 [1]: QEMU output (annotated with register names) for
      ./run-vm.sh -device usb-tablet -vga qxl /tmp/w10.qcow2 -trace rp_read_config,file=/dev/stdout -trace rp_write_config,file=/dev/stdout

<suspend>
NET._PS3
32481@1536163097.415976:rp_write_config (ioh3420, @0x12c, 0x0, len=0x4)  AER: Root Error Command
32481@1536163097.415999:rp_write_config (ioh3420, @0xac, 0x0, len=0x2)  PCIE: Root Control
32481@1536163097.416008:rp_read_config (ioh3420, @0xac, len=0x2) 0x0  PCIE: Root Control
32481@1536163097.416017:rp_read_config (ioh3420, @0xa0, len=0x2) 0x0  PCIE: Link Control
32481@1536163097.416024:rp_write_config (ioh3420, @0xb0, 0x10000, len=0x4)  PCIE: Root Status
32481@1536163097.416057:rp_read_config (ioh3420, @0x4, len=0x2) 0x506  Command
32481@1536163097.416066:rp_read_config (ioh3420, @0xc, len=0x1) 0x0  Cacheline Size
32481@1536163097.416073:rp_read_config (ioh3420, @0xd, len=0x1) 0x0  Latency Timer
32481@1536163097.416081:rp_read_config (ioh3420, @0x3c, len=0x1) 0x0  Interrupt Line
32481@1536163097.416088:rp_read_config (ioh3420, @0x19, len=0x1) 0x1  Secondary Bus Number
32481@1536163097.416095:rp_read_config (ioh3420, @0x1a, len=0x1) 0x1  Subordinate Bus Number
32481@1536163097.416103:rp_read_config (ioh3420, @0x3e, len=0x2) 0x2  Bridge Control
32481@1536163097.416129:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086  Vendor ID
32481@1536163097.416136:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420  Device ID
32481@1536163097.416143:rp_read_config (ioh3420, @0x8, len=0x1) 0x2  Revision
32481@1536163097.416150:rp_read_config (ioh3420, @0x9, len=0x1) 0x0  Class Code
32481@1536163097.416156:rp_read_config (ioh3420, @0xa, len=0x1) 0x4  +1 Class Code
32481@1536163097.416164:rp_read_config (ioh3420, @0xb, len=0x1) 0x6  +2 Class Code
32481@1536163097.416172:rp_read_config (ioh3420, @0xe, len=0x1) 0x1  Header Type
32481@1536163097.416180:rp_read_config (ioh3420, @0x6, len=0x2) 0x10  Status
32481@1536163097.416187:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0  Capabilities Pointer
32481@1536163097.416195:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163097.416203:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010  PCI Express
32481@1536163097.416210:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005  Message Signaled Interrupts
32481@1536163097.416218:rp_read_config (ioh3420, @0x40, len=0x2) 0xd  Bridge subsystem vendor/device ID
32481@1536163097.416226:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163097.416234:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163097.416241:rp_read_config (ioh3420, @0x98, len=0x2) 0x7  PCIE: Device Control
32481@1536163097.416249:rp_read_config (ioh3420, @0xb8, len=0x2) 0x0  PCIE: Device Control 2
32481@1536163097.416257:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086  Vendor ID
32481@1536163097.416265:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420  Device ID
32481@1536163097.416272:rp_read_config (ioh3420, @0x8, len=0x1) 0x2  Revision
32481@1536163097.416280:rp_read_config (ioh3420, @0x9, len=0x1) 0x0  Class Code
32481@1536163097.416287:rp_read_config (ioh3420, @0xa, len=0x1) 0x4  +1 Class Code
32481@1536163097.416295:rp_read_config (ioh3420, @0xb, len=0x1) 0x6  +2 Class Code
32481@1536163097.416303:rp_read_config (ioh3420, @0xe, len=0x1) 0x1  Header Type
32481@1536163097.416310:rp_read_config (ioh3420, @0x6, len=0x2) 0x10  Status
32481@1536163097.416318:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0  Capabilities Pointer
32481@1536163097.416325:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163097.416333:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010  PCI Express
32481@1536163097.416341:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005  Message Signaled Interrupts
32481@1536163097.416349:rp_read_config (ioh3420, @0x40, len=0x2) 0xd  Bridge subsystem vendor/device ID
32481@1536163097.416356:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163097.416364:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163097.416372:rp_read_config (ioh3420, @0x4, len=0x2) 0x506  Command
32481@1536163097.416380:rp_write_config (ioh3420, @0x4, 0x506, len=0x2)  Command
32481@1536163097.416742:rp_read_config (ioh3420, @0x62, len=0x2) 0x103  MSI: Message Control
32481@1536163097.416753:rp_write_config (ioh3420, @0x62, 0x102, len=0x2)  MSI: Message Control
32481@1536163097.416762:rp_read_config (ioh3420, @0x4, len=0x2) 0x506  Command
32481@1536163097.416770:rp_write_config (ioh3420, @0x4, 0x500, len=0x2)  Command
32481@1536163097.417356:rp_read_config (ioh3420, @0x9a, len=0x2) 0x0  PCIE: Device Status
32481@1536163097.417367:rp_read_config (ioh3420, @0xe0, len=0x4) 0xc8039001
32481@1536163097.417375:rp_read_config (ioh3420, @0xe4, len=0x4) 0x8
32481@1536163097.417383:rp_write_config (ioh3420, @0xe4, 0xb, len=0x2)
32481@1536163097.456781:rp_read_config (ioh3420, @0xe4, len=0x2) 0xb
_PS3
PG00._ON
PG00._OFF
<resume>
PG00._ON
PG00._ON
_PS0
32481@1536163120.049599:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086  Vendor ID
32481@1536163120.049655:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420  Device ID
32481@1536163120.049680:rp_read_config (ioh3420, @0x8, len=0x1) 0x2  Revision
32481@1536163120.049708:rp_read_config (ioh3420, @0x9, len=0x1) 0x0  Class Code
32481@1536163120.049734:rp_read_config (ioh3420, @0xa, len=0x1) 0x4  +1 Class Code
32481@1536163120.049760:rp_read_config (ioh3420, @0xb, len=0x1) 0x6  +2 Class Code
32481@1536163120.049785:rp_read_config (ioh3420, @0xe, len=0x1) 0x1  Header Type
32481@1536163120.049811:rp_read_config (ioh3420, @0x6, len=0x2) 0x10  Status
32481@1536163120.049837:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0  Capabilities Pointer
32481@1536163120.049862:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163120.049887:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010  PCI Express
32481@1536163120.049909:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005  Message Signaled Interrupts
32481@1536163120.049932:rp_read_config (ioh3420, @0x40, len=0x2) 0xd  Bridge subsystem vendor/device ID
32481@1536163120.049958:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163120.049985:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163120.050015:rp_read_config (ioh3420, @0xe0, len=0x4) 0xc8039001
32481@1536163120.050040:rp_read_config (ioh3420, @0xe4, len=0x4) 0xb
32481@1536163120.050072:rp_write_config (ioh3420, @0xe4, 0x8, len=0x2)
32481@1536163120.068096:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086  Vendor ID
32481@1536163120.068157:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420  Device ID
32481@1536163120.068194:rp_read_config (ioh3420, @0x8, len=0x1) 0x2  Revision
32481@1536163120.068222:rp_read_config (ioh3420, @0x9, len=0x1) 0x0  Class Code
32481@1536163120.068250:rp_read_config (ioh3420, @0xa, len=0x1) 0x4  +1 Class Code
32481@1536163120.068284:rp_read_config (ioh3420, @0xb, len=0x1) 0x6  +2 Class Code
32481@1536163120.068309:rp_read_config (ioh3420, @0xe, len=0x1) 0x1  Header Type
32481@1536163120.068333:rp_read_config (ioh3420, @0x6, len=0x2) 0x10  Status
32481@1536163120.068361:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0  Capabilities Pointer
32481@1536163120.068395:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163120.068421:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010  PCI Express
32481@1536163120.068446:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005  Message Signaled Interrupts
32481@1536163120.068471:rp_read_config (ioh3420, @0x40, len=0x2) 0xd  Bridge subsystem vendor/device ID
32481@1536163120.068495:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163120.068519:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163120.068547:rp_read_config (ioh3420, @0xe4, len=0x2) 0x8
32481@1536163120.068575:rp_write_config (ioh3420, @0x10, 0x0, len=0x4)  BAR0
32481@1536163120.068607:rp_write_config (ioh3420, @0x14, 0x0, len=0x4)  BAR1
32481@1536163120.068636:rp_write_config (ioh3420, @0x1c, 0xff, len=0x2)  I/O Base
32481@1536163120.069825:rp_write_config (ioh3420, @0x20, 0xfc10fc00, len=0x4)  Memory Base
32481@1536163120.070928:rp_write_config (ioh3420, @0x24, 0xfeb0fea0, len=0x4)  Prefetchable Memory Base
32481@1536163120.071968:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)  Prefetchable Base Upper 32 Bits
32481@1536163120.072946:rp_write_config (ioh3420, @0x2c, 0x0, len=0x4)  Prefetchable Limit Upper 32 Bits
32481@1536163120.073901:rp_write_config (ioh3420, @0x30, 0x0, len=0x4)  I/O Base Upper 16 Bits
32481@1536163120.074969:rp_write_config (ioh3420, @0x38, 0x0, len=0x4)
32481@1536163120.075006:rp_write_config (ioh3420, @0x3c, 0x0, len=0x1)  Interrupt Line
32481@1536163120.075028:rp_write_config (ioh3420, @0x3e, 0x2, len=0x2)  Bridge Control
32481@1536163120.075996:rp_read_config (ioh3420, @0x3e, len=0x2) 0x2  Bridge Control
32481@1536163120.076028:rp_write_config (ioh3420, @0x18, 0x0, len=0x1)  Primary Bus Number
32481@1536163120.076051:rp_write_config (ioh3420, @0x19, 0x1, len=0x1)  Secondary Bus Number
32481@1536163120.076074:rp_write_config (ioh3420, @0x1a, 0x1, len=0x1)  Subordinate Bus Number
32481@1536163120.076097:rp_write_config (ioh3420, @0xc, 0x0, len=0x1)  Cacheline Size
32481@1536163120.076118:rp_write_config (ioh3420, @0xd, 0x0, len=0x1)  Latency Timer
32481@1536163120.076137:rp_write_config (ioh3420, @0x4, 0x500, len=0x2)  Command
32481@1536163120.077194:rp_write_config (ioh3420, @0x98, 0x7, len=0x2)  PCIE: Device Control
32481@1536163120.077225:rp_write_config (ioh3420, @0xb8, 0x0, len=0x2)  PCIE: Device Control 2
32481@1536163120.077246:rp_read_config (ioh3420, @0x4, len=0x2) 0x500  Command
32481@1536163120.077270:rp_write_config (ioh3420, @0x4, 0x506, len=0x2)  Command
32481@1536163120.078918:rp_write_config (ioh3420, @0x6, 0xf900, len=0x2)  Status
32481@1536163120.078950:rp_write_config (ioh3420, @0x1e, 0xf900, len=0x2)  Secondary Status
32481@1536163120.078972:rp_read_config (ioh3420, @0x4, len=0x2) 0x506  Command
32481@1536163120.078995:rp_write_config (ioh3420, @0x4, 0x506, len=0x2)  Command
32481@1536163120.079701:rp_read_config (ioh3420, @0x62, len=0x2) 0x102  MSI: Message Control
32481@1536163120.079722:rp_write_config (ioh3420, @0x62, 0x102, len=0x2)  MSI: Message Control
32481@1536163120.079739:rp_read_config (ioh3420, @0x62, len=0x2) 0x102  MSI: Message Control
32481@1536163120.079753:rp_write_config (ioh3420, @0x64, 0xfee0100c, len=0x4)  MSI: Message Address
32481@1536163120.079770:rp_write_config (ioh3420, @0x68, 0x4950, len=0x2)  MSI: Message Upper Address
32481@1536163120.079786:rp_write_config (ioh3420, @0x6c, 0xfffffffe, len=0x4)  MSI: Message Data
32481@1536163120.079801:rp_write_config (ioh3420, @0x62, 0x103, len=0x2)  MSI: Message Control
32481@1536163120.079855:rp_write_config (ioh3420, @0xac, 0x0, len=0x2)  PCIE: Root Control
32481@1536163120.079872:rp_write_config (ioh3420, @0xa0, 0x0, len=0x2)  PCIE: Link Control
32481@1536163120.079887:rp_read_config (ioh3420, @0x98, len=0x2) 0x7  PCIE: Device Control
32481@1536163120.079903:rp_write_config (ioh3420, @0x98, 0x7, len=0x2)  PCIE: Device Control
32481@1536163120.079918:rp_read_config (ioh3420, @0xb0, len=0x4) 0x0  PCIE: Root Status
32481@1536163120.079934:rp_write_config (ioh3420, @0x12c, 0x7, len=0x4)  AER: Root Error Command
32481@1536163120.079950:rp_write_config (ioh3420, @0xac, 0x8, len=0x2)  PCIE: Root Control
NET._PS0
32481@1536163120.175514:rp_write_config (ioh3420, @0xb8, 0x0, len=0x2)  PCIE: Device Control 2

^ permalink raw reply	[flat|nested] 12+ messages in thread
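
To make the resume writes at 0x24, 0x28 and 0x2c in the trace above easier to read, here is a minimal sketch of how a type 1 (bridge) header combines them into a prefetchable window: bits 15:4 of the 16-bit base and limit registers supply address bits 31:20, the low 4 bits are a capability field and not part of the address, the limit has its low 20 bits treated as all ones, and the two "Upper 32 Bits" registers supply address bits 63:32. The function name decode_pref_window is hypothetical and is not taken from the kernel or from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode the prefetchable memory window of a PCI
 * bridge from raw config-space values.
 *   reg24       dword at 0x24 (low 16 bits: base, high 16 bits: limit)
 *   upper_base  dword at 0x28 (Prefetchable Base Upper 32 Bits)
 *   upper_limit dword at 0x2c (Prefetchable Limit Upper 32 Bits)
 */
void decode_pref_window(uint32_t reg24, uint32_t upper_base, uint32_t upper_limit,
                        uint64_t *base, uint64_t *limit)
{
    uint32_t base16 = reg24 & 0xffffu;
    uint32_t limit16 = reg24 >> 16;

    assert(base != NULL && limit != NULL);

    /* Bits 15:4 map to address bits 31:20; mask off the low capability nibble. */
    *base = ((uint64_t)upper_base << 32) |
            ((uint64_t)(base16 & 0xfff0u) << 16);
    /* The limit is inclusive, so its low 20 bits are all ones. */
    *limit = ((uint64_t)upper_limit << 32) |
             ((uint64_t)(limit16 & 0xfff0u) << 16) | 0xfffffu;
}
```

Applied to the values Windows writes in the trace (0xfeb0fea0 at 0x24, 0x0 at 0x28 and 0x2c), this gives a window of 0xfea00000 to 0xfebfffff for the emulated root port.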
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues 2018-08-28 2:23 ` Daniel Drake 2018-08-28 9:57 ` Peter Wu @ 2018-08-29 12:40 ` Karol Herbst 2018-08-30 0:13 ` Karol Herbst 1 sibling, 1 reply; 12+ messages in thread From: Karol Herbst @ 2018-08-29 12:40 UTC (permalink / raw) To: Daniel Drake Cc: Peter Wu, linux-pci, Linux PM, Endless Linux Upstreaming Team, nouveau On Tue, Aug 28, 2018 at 4:23 AM, Daniel Drake <drake@endlessm.com> wrote: > On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote: >> Are these systems also affected through runtime power management? For >> example: >> >> modprobe nouveau # should enable runtime PM >> sleep 6 # wait for runtime suspend to kick in >> lspci -s1: # runtime resume by reading PCI config space >> >> On laptops from about 2015-2016 with a GTX 9xxM this sequence results in >> hangs on various laptops >> (https://bugzilla.kernel.org/show_bug.cgi?id=156341). > > This works fine here. I'm facing a different issue. > >>> After a lot of experimentation I found a workaround: during resume, >>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge. >>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine. >> >> I am curious, how did you discover this? While this could work, perhaps >> there are alternative workarounds/fixes? > > Based on the observation that the following procedure works fine (note > the addition of step 3): > > 1. Boot > 2. Suspend/resume > 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan > 4. Load nouveau driver > 5. Start X > > I worked through the rescan codepath until I had isolated the specific > code which magically makes things work (in pci_bridge_check_ranges). > > Having found that, step 3 in the above test procedure can be replaced > with a simple: > setpci -s 00:1c.0 0x28.l=0 > >> When you say "parent PCI" bridge, is that actually the device you see in >> "lspci -tv"? 
On a Dell XPS 9560, the GPU is under a different device: >> >> -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers >> +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] >> >> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05) > > Yes, it's the parent bridge shown by lspci. The address of this varies > from system to system. > >>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same >>> value of PCI_PREF_BASE_UPPER32 make any difference at all? >> >> At what point in the suspend code path did you insert this write? It is >> possible that the write somehow acted as a fence/memory barrier? > > static void quirk_pref_base_upper32(struct pci_dev *dev) > { > u32 pref_base_upper32; > pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32); > pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32); > } > DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32); > this workaround fixes runtime suspend/resume on my laptop as well... but what baffles me most is, unloading nouveau does as well. I will see what bits are exactly "fixing" it in the nouveau unloading path and maybe we can get around this issue inside nouveau. It would be still nice to get to the root cause of all of this as there are three known workarounds (at least on my system): 1. unload nouveau 2. skip setting the D3 power state via PCI config space (and still do the ACPI bits) 3. write value of PCI_PREF_BASE_UPPER32 > I don't think it's acting as a barrier. I tried changing this code to > rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes > the bug come back. > >>> 2. Who is responsible for saving and restoring PCI bridge >>> configuration during suspend and resume? Linux? ACPI? BIOS? 
>> >> Not sure about PCI bridges, but at least for the PCI Express Capability >> registers, it is in control of the OS when control is granted via the >> ACPI _OSC method. > > I guess you are referring to pci_save_pcie_state(). I can't see > anything equivalent for the bridge registers. > >> As Windows is probably not affected by this issue, a change must be >> possible to make Linux more compatible with Windows. Though I am not >> sure what change is needed. > > I agree. There's a definite difference with Windows here and it would > be great to find a fix along those lines. > >> I recently compared PCI configuration space access and ACPI method >> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10 >> (1803). There were differences like disabling MSI/interrupts before >> suspend, setting the Enable Clock Power Management bit in PCI Express >> Link Control and more, but applying these changes were so far not really >> successful. > > Interesting. Do you know any way that I could spy on Windows' accesses > to the PCI bridge registers? > Looking at at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF > I suspect VFIO would not help me here. > It says: > Note: If they are grouped with other devices in this manner, pci > root ports and bridges should neither be bound to vfio at boot, nor be > added to the VM. > > Thanks > Daniel > _______________________________________________ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues 2018-08-29 12:40 ` Karol Herbst @ 2018-08-30 0:13 ` Karol Herbst 0 siblings, 0 replies; 12+ messages in thread From: Karol Herbst @ 2018-08-30 0:13 UTC (permalink / raw) To: Daniel Drake Cc: Peter Wu, linux-pci, Linux PM, Endless Linux Upstreaming Team, nouveau ohh actually, I was testing with a kernel without this workaround applied, so I need to retest it later. On Wed, Aug 29, 2018 at 2:40 PM, Karol Herbst <kherbst@redhat.com> wrote: > On Tue, Aug 28, 2018 at 4:23 AM, Daniel Drake <drake@endlessm.com> wrote: >> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote: >>> Are these systems also affected through runtime power management? For >>> example: >>> >>> modprobe nouveau # should enable runtime PM >>> sleep 6 # wait for runtime suspend to kick in >>> lspci -s1: # runtime resume by reading PCI config space >>> >>> On laptops from about 2015-2016 with a GTX 9xxM this sequence results in >>> hangs on various laptops >>> (https://bugzilla.kernel.org/show_bug.cgi?id=156341). >> >> This works fine here. I'm facing a different issue. >> >>>> After a lot of experimentation I found a workaround: during resume, >>>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge. >>>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine. >>> >>> I am curious, how did you discover this? While this could work, perhaps >>> there are alternative workarounds/fixes? >> >> Based on the observation that the following procedure works fine (note >> the addition of step 3): >> >> 1. Boot >> 2. Suspend/resume >> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan >> 4. Load nouveau driver >> 5. Start X >> >> I worked through the rescan codepath until I had isolated the specific >> code which magically makes things work (in pci_bridge_check_ranges). 
>> >> Having found that, step 3 in the above test procedure can be replaced >> with a simple: >> setpci -s 00:1c.0 0x28.l=0 >> >>> When you say "parent PCI" bridge, is that actually the device you see in >>> "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device: >>> >>> -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers >>> +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] >>> >>> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05) >> >> Yes, it's the parent bridge shown by lspci. The address of this varies >> from system to system. >> >>>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same >>>> value of PCI_PREF_BASE_UPPER32 make any difference at all? >>> >>> At what point in the suspend code path did you insert this write? It is >>> possible that the write somehow acted as a fence/memory barrier? >> >> static void quirk_pref_base_upper32(struct pci_dev *dev) >> { >> u32 pref_base_upper32; >> pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32); >> pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32); >> } >> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32); >> > > this workaround fixes runtime suspend/resume on my laptop as well... > but what baffles me most is, unloading nouveau does as well. I will > see what bits are exactly "fixing" it in the nouveau unloading path > and maybe we can get around this issue inside nouveau. It would be > still nice to get to the root cause of all of this as there are three > known workarounds (at least on my system): > 1. unload nouveau > 2. skip setting the D3 power state via PCI config space (and still do > the ACPI bits) > 3. write value of PCI_PREF_BASE_UPPER32 > >> I don't think it's acting as a barrier. 
I tried changing this code to >> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes >> the bug come back. >> >>>> 2. Who is responsible for saving and restoring PCI bridge >>>> configuration during suspend and resume? Linux? ACPI? BIOS? >>> >>> Not sure about PCI bridges, but at least for the PCI Express Capability >>> registers, it is in control of the OS when control is granted via the >>> ACPI _OSC method. >> >> I guess you are referring to pci_save_pcie_state(). I can't see >> anything equivalent for the bridge registers. >> >>> As Windows is probably not affected by this issue, a change must be >>> possible to make Linux more compatible with Windows. Though I am not >>> sure what change is needed. >> >> I agree. There's a definite difference with Windows here and it would >> be great to find a fix along those lines. >> >>> I recently compared PCI configuration space access and ACPI method >>> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10 >>> (1803). There were differences like disabling MSI/interrupts before >>> suspend, setting the Enable Clock Power Management bit in PCI Express >>> Link Control and more, but applying these changes were so far not really >>> successful. >> >> Interesting. Do you know any way that I could spy on Windows' accesses >> to the PCI bridge registers? >> Looking at at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF >> I suspect VFIO would not help me here. >> It says: >> Note: If they are grouped with other devices in this manner, pci >> root ports and bridges should neither be bound to vfio at boot, nor be >> added to the VM. >> >> Thanks >> Daniel >> _______________________________________________ >> Nouveau mailing list >> Nouveau@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau ^ permalink raw reply [flat|nested] 12+ messages in thread
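
For context on workaround 2 in the list quoted above (skipping the D3 power state change via PCI config space): that state change is requested through the Power Management Control/Status register (PMCSR), located 4 bytes into the PCI Power Management capability, with the power state in bits 1:0. In Peter Wu's QEMU trace elsewhere in this thread the capability sits at 0xe0, so the accesses at 0xe4 are PMCSR accesses. The sketch below only illustrates the bit layout; the macro names mirror the ones in the Linux pci_regs.h header but are defined locally here, and the helper names pmcsr_get_state and pmcsr_set_state are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Defined locally for this sketch; values follow the PCI PM capability layout. */
#define PCI_PM_CTRL            4      /* PMCSR offset within the PM capability */
#define PCI_PM_CTRL_STATE_MASK 0x0003 /* PowerState field: 0=D0 1=D1 2=D2 3=D3hot */

/* Hypothetical helper: extract the power state (0..3) from a PMCSR value. */
unsigned int pmcsr_get_state(uint16_t pmcsr)
{
    return pmcsr & PCI_PM_CTRL_STATE_MASK;
}

/* Hypothetical helper: return pmcsr with only the PowerState field replaced. */
uint16_t pmcsr_set_state(uint16_t pmcsr, unsigned int state)
{
    assert(state <= 3);
    return (uint16_t)((pmcsr & ~PCI_PM_CTRL_STATE_MASK) | state);
}
```

Read this way, the value 0xb written to 0xe4 before suspend in that trace selects D3hot (state 3), and the value 0x8 written on resume selects D0 (state 0); bit 3, which stays set in both values, is not part of the PowerState field.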