From: Bjorn Helgaas <helgaas@kernel.org>
To: Lyude Paul <lyude@redhat.com>
Cc: linux-pci@vger.kernel.org, nouveau@lists.freedesktop.org,
dri-devel@lists.freedesktop.org,
Karol Herbst <kherbst@redhat.com>, Ben Skeggs <skeggsb@gmail.com>,
stable@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50
Date: Thu, 25 Apr 2019 08:01:24 -0500
Message-ID: <20190425130124.GD11428@google.com>
In-Reply-To: <20190212220230.1568-1-lyude@redhat.com>

On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> On a very specific subset of ThinkPad P50 SKUs, particularly ones that
> come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
> seems to have a very nasty habit of not always resetting the secondary
> Nvidia GPU between full reboots if the laptop is configured in Hybrid
> Graphics mode. The reason for this happening is unknown, but the
> following steps and possibly a good bit of patience will reproduce the
> issue:
>
> 1. Boot up the laptop normally in Hybrid graphics mode
> 2. Make sure nouveau is loaded and that the GPU is awake
> 3. Allow the nvidia GPU to runtime suspend itself after being idle
> 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
> 5. If nouveau loads up properly, reboot the machine again and go back to
>    step 2 until you reproduce the issue
>
> This results in some very strange behavior: the GPU will
> quite literally be left in exactly the same state it was in when the
> previously booted kernel started the reboot. This has all sorts of bad
> side effects: for starters, this completely breaks nouveau starting with a
> mysterious EVO channel failure that happens well before we've actually
> used the EVO channel for anything:
>
> nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000
> 00000002
> ...
> So to do this, we add a new pci quirk using
> DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
> at boot finishes. From there, we check to make sure that this is indeed
> the specific P50 variant of this GPU. We also make sure that the GPU PCI
> device is advertising NoReset- in order to prevent us from trying to
> reset the GPU when the machine is in Dedicated graphics mode (where the
> GPU being initialized by the BIOS is normal and expected). Finally, we
> try mapping the MMIO space for the GPU which should only work if the GPU
> is actually active in D0 mode. We can then read the magic 0x2240c
> register on the GPU, which will have bit 1 set if the GPU's firmware has
> already been posted during a previous boot. Once we've confirmed all of
> this, we reset the PCI device and re-disable it - bringing the GPU back
> into a healthy state.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: nouveau@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: Karol Herbst <kherbst@redhat.com>
> Cc: Ben Skeggs <skeggsb@gmail.com>
> Cc: stable@vger.kernel.org

Applied to pci/misc for v5.2, thanks!

> ---
> drivers/pci/quirks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 65 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b0a413f3f7ca..948492fda8bf 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5117,3 +5117,68 @@ SWITCHTEC_QUIRK(0x8573); /* PFXI 48XG3 */
> SWITCHTEC_QUIRK(0x8574); /* PFXI 64XG3 */
> SWITCHTEC_QUIRK(0x8575); /* PFXI 80XG3 */
> SWITCHTEC_QUIRK(0x8576); /* PFXI 96XG3 */
> +
> +/*
> + * On certain Lenovo Thinkpad P50 SKUs, specifically those with a Nvidia
> + * Quadro M1000M, the BIOS will occasionally make the mistake of not resetting
> + * the nvidia GPU between reboots if the system is configured to use hybrid
> + * graphics mode. This results in the GPU being left in whatever state it was
> + * in during the previous boot which causes spurious interrupts from the GPU,
> + * which in turn cause us to disable the wrong IRQs and end up breaking the
> + * touchpad. Unsurprisingly, this also completely breaks nouveau.
> + *
> + * Luckily, it seems a simple reset of the PCI device for the nvidia GPU
> + * manages to bring the GPU back into a clean state and fix all of these
> + * issues. Additionally since the GPU will report NoReset+ when the machine is
> + * configured in Dedicated display mode, we don't need to worry about
> + * accidentally resetting the GPU when it's supposed to already be
> + * initialized.
> + */
> +static void
> +quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot(struct pci_dev *pdev)
> +{
> +	void __iomem *map;
> +	int ret;
> +
> +	if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
> +	    pdev->subsystem_device != 0x222e ||
> +	    !pdev->reset_fn)
> +		return;
> +
> +	/*
> +	 * If we can't enable the device's mmio space, it's probably not even
> +	 * initialized. This is fine, and means we can just skip the quirk
> +	 * entirely.
> +	 */
> +	if (pci_enable_device_mem(pdev)) {
> +		pci_dbg(pdev, "Can't enable device mem, no reset needed\n");
> +		return;
> +	}
> +
> +	/* Taken from drivers/gpu/drm/nouveau/engine/device/base.c */
> +	map = ioremap(pci_resource_start(pdev, 0), 0x102000);
> +	if (!map) {
> +		pci_err(pdev, "Can't map MMIO space, this is probably very bad\n");
> +		goto out_disable;
> +	}
> +
> +	/*
> +	 * Be extra careful, and make sure that the GPU firmware is posted
> +	 * before trying a reset
> +	 */
> +	if (ioread32(map + 0x2240c) & 0x2) {
> +		pci_info(pdev,
> +			 FW_BUG "GPU left initialized by EFI, resetting\n");
> +		ret = pci_reset_function(pdev);
> +		if (ret < 0)
> +			pci_err(pdev, "Failed to reset GPU: %d\n", ret);
> +	}
> +
> +	iounmap(map);
> +out_disable:
> +	pci_disable_device(pdev);
> +}
> +
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
> +			      PCI_CLASS_DISPLAY_VGA, 8,
> +			      quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot);
> --
> 2.20.1
>
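
For readers following the thread, the gating logic described in the commit message can be modelled in plain userspace C. This is only an illustrative sketch, not kernel code and not part of the patch: `struct fake_gpu`, `fixup_matches()` and `quirk_would_reset()` are hypothetical names, and the `can_reset` and `mmio_ok` fields merely stand in for `pdev->reset_fn` and a successful `pci_enable_device_mem()`. The numeric IDs, the class shift of 8 and the bit-1 test on register 0x2240c are taken from the patch and from the standard PCI ID definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define VENDOR_NVIDIA     0x10de  /* PCI_VENDOR_ID_NVIDIA */
#define VENDOR_LENOVO     0x17aa  /* PCI_VENDOR_ID_LENOVO */
#define DEVICE_M1000M     0x13b1  /* device ID used in the fixup declaration */
#define SUBSYS_P50        0x222e  /* subsystem device ID checked by the quirk */
#define CLASS_DISPLAY_VGA 0x0300  /* PCI_CLASS_DISPLAY_VGA */
#define CLASS_SHIFT       8       /* drops the programming-interface byte */
#define POSTED_BIT        0x2     /* bit 1 of register 0x2240c */

/* Hypothetical stand-in for the few struct pci_dev fields the quirk uses. */
struct fake_gpu {
	uint16_t vendor, device;
	uint16_t subsystem_vendor, subsystem_device;
	uint32_t class_code;  /* 24 bits: base class, subclass, prog-if */
	bool can_reset;       /* models pdev->reset_fn being set */
	bool mmio_ok;         /* models pci_enable_device_mem() succeeding */
	uint32_t reg_2240c;   /* value the quirk would read with ioread32() */
};

/* Models the fixup-table match: vendor, device and shifted class code. */
static bool fixup_matches(const struct fake_gpu *g)
{
	return g->vendor == VENDOR_NVIDIA &&
	       g->device == DEVICE_M1000M &&
	       (g->class_code >> CLASS_SHIFT) == CLASS_DISPLAY_VGA;
}

/* Returns true only when every check in the quirk would lead to a reset. */
static bool quirk_would_reset(const struct fake_gpu *g)
{
	if (!fixup_matches(g))
		return false;

	/* Wrong P50 variant, or the device cannot be reset (NoReset+). */
	if (g->subsystem_vendor != VENDOR_LENOVO ||
	    g->subsystem_device != SUBSYS_P50 ||
	    !g->can_reset)
		return false;

	/* MMIO could not be enabled: GPU probably not initialized, skip. */
	if (!g->mmio_ok)
		return false;

	/* Reset only if the firmware was already posted by a previous boot. */
	return (g->reg_2240c & POSTED_BIT) != 0;
}
```

Calling `quirk_would_reset()` on a `struct fake_gpu` filled in with the IDs above, a class code of 0x030000, both flags true and bit 1 set in `reg_2240c` yields true; clearing bit 1, or clearing `can_reset` (the Dedicated graphics case), yields false.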