From: Mika Westerberg <mika.westerberg-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
"Rafael J. Wysocki" <rjw-LthD3rsA81gm4RdzfppkhA@public.gmane.org>,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Ben Skeggs <bskeggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Dave Airlie <airlied-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Len Brown <lenb-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Subject: Re: 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
Date: Wed, 28 Nov 2018 13:08:57 +0200
Message-ID: <20181128110857.GW2296@lahna.fi.intel.com>
In-Reply-To: <20181127212550-mutt-send-email-mst-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Tue, Nov 27, 2018 at 09:49:44PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 11:36:50AM +0200, Mika Westerberg wrote:
> > +linux-acpi
> >
> > Hi Michael,
> >
> > On Mon, Nov 26, 2018 at 10:53:26PM -0500, Michael S. Tsirkin wrote:
> > > So a new thinkpad:
> > > 01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev a1)
> > >
> > > Hangs whenever I try to poke at the card. It starts happily enough with
> > >
> > > [ 3.971515] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66)
> > > [ 3.971553] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66)
> > > [ 3.971721] pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
> > > [ 3.971726] VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
> > > [ 3.971727] nouveau: detected PR support, will not use DSM
>
> BTW this is also a bit strange. It says it will not use DSM - but did
> it maybe use DSM previously? The ACPI Warning seems to indicate that
> it did ...
>
> And just to complete the picture here's the _DSM from ACPI:
>
> Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method
> {
...
> }
>
>
> I am not sure what makes it think that Arg3 is a buffer and
> not a package: IIRC Index (decoded C-style here as []) can apply
> to either.
>
> Am I maybe misunderstanding the warning?

It looks like it's coming from the nouveau driver (assuming I'm reading it right):
drivers/gpu/drm/nouveau/nouveau_acpi.c::nouveau_optimus_dsm()

    union acpi_object argv4 = {
        .buffer.type = ACPI_TYPE_BUFFER,
        .buffer.length = 4,
        .buffer.pointer = args_buff
    };
    ...
    obj = acpi_evaluate_dsm_typed(handle, &nouveau_op_dsm_muid, 0x00000100,
                                  func, &argv4, ACPI_TYPE_BUFFER);
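argv4 (Arg3) is built as ACPI_TYPE_BUFFER, but the ACPI spec says the fourth
_DSM argument must be a package. (The trailing ACPI_TYPE_BUFFER in the call
is the expected type of the returned object, not of the argument.)
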
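
For reference, a simplified model of the check behind that warning. The
message numbers arguments from 1, so "Argument #4" is Arg3, and the check
appears to compare the type of the object the caller passes against the type
the spec defines for that _DSM argument, before any AML runs; how the method
later applies Index to Arg3 does not enter into it. The enum, the table and
check_dsm_args() below are hypothetical stand-ins, not ACPICA's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for ACPI object types (hypothetical, not ACPICA's values). */
enum obj_type { OBJ_INTEGER, OBJ_STRING, OBJ_BUFFER, OBJ_PACKAGE };

static const char *obj_type_name(enum obj_type t)
{
    switch (t) {
    case OBJ_INTEGER: return "Integer";
    case OBJ_STRING:  return "String";
    case OBJ_BUFFER:  return "Buffer";
    case OBJ_PACKAGE: return "Package";
    }
    return "Unknown";
}

/* Types the ACPI spec defines for _DSM: Arg0 UUID (Buffer), Arg1 revision
 * (Integer), Arg2 function index (Integer), Arg3 arguments (Package). */
static const enum obj_type dsm_arg_types[4] = {
    OBJ_BUFFER, OBJ_INTEGER, OBJ_INTEGER, OBJ_PACKAGE
};

/*
 * Compare caller-supplied argument types with the _DSM table and append one
 * warning line per mismatch to buf, numbering arguments from 1 as the log
 * does. Returns the number of mismatches.
 */
static int check_dsm_args(const char *path, const enum obj_type *args,
                          size_t nargs, char *buf, size_t buflen)
{
    int mismatches = 0;
    size_t used = 0;

    assert(nargs <= 4);
    if (buflen > 0)
        buf[0] = '\0';

    for (size_t i = 0; i < nargs; i++) {
        if (args[i] == dsm_arg_types[i])
            continue;
        mismatches++;
        if (used + 1 < buflen) {
            snprintf(buf + used, buflen - used,
                     "ACPI Warning: %s._DSM: Argument #%zu type mismatch - "
                     "Found [%s], ACPI requires [%s]\n",
                     path, i + 1, obj_type_name(args[i]),
                     obj_type_name(dsm_arg_types[i]));
            used += strlen(buf + used); /* stays within buf even if truncated */
        }
    }
    return mismatches;
}
```

With nouveau's argument types {Buffer, Integer, Integer, Buffer} it reports
exactly one mismatch, for Argument #4, in the same form as the log line
above; with a Package as Arg3 it reports none.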
> > > [ 3.971745] nouveau 0000:01:00.0: enabling device (0006 -> 0007)
> > > [ 3.971923] nouveau 0000:01:00.0: NVIDIA GP107 (137000a1)
> > > [ 4.009875] PM: Image not found (code -22)
> > > [ 4.135752] nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
> > > [ 4.135753] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> > > [ 4.135754] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> > > [ 4.135755] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> > > [ 4.135756] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> > > [ 4.135756] nouveau 0000:01:00.0: DRM: DCB version 4.1
> > > [ 4.135757] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f76 04600020
> > > [ 4.135758] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
> > > [ 4.135759] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f46 04600010
> > > [ 4.135760] nouveau 0000:01:00.0: DRM: DCB outp 03: 01033f56 04600020
> > > [ 4.135761] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
> > > [ 4.135761] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
> > > [ 4.135762] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001246
> > > [ 4.135763] nouveau 0000:01:00.0: DRM: DCB conn 03: 00002346
> > > [ 4.508355] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > > [ 4.508355] [drm] Driver supports precise vblank timestamp query.
> > > [ 4.509812] [drm] Cannot find any crtc or sizes
> > > [ 4.510144] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 2
> > >
> > >
> > > Although that type mismatch is a bit worrying. And I'm not sure what
> > > prints PM: Image not found.
> >
> > That is fine, it just says it cannot find a hibernation image from swap
> > device. I guess you have resume=... in the kernel command line.
> >
> > > But after a short while it gets pretty busy:
> > >
> > >
> > > [ 52.917009] No Local Variables are initialized for Method [NVPO]
> > > [ 52.917011] No Arguments are initialized for method [NVPO]
> > > [ 52.917012] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> > > [ 52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> > > [ 52.917084] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PG00._ON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> > > [ 52.917108] acpi device:00: Failed to change power state to D0
> >
> > Here it seems to fail to turn on the power resource for the device.
> >
> > > [ 52.969287] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
> > > [ 52.969289] pci_raw_set_power_state: 2 callbacks suppressed
> > > [ 52.969291] nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > [ 53.029514] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
> > > [ 53.041027] nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > [ 53.041035] video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
> > > [ 53.053008] nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > >
> > >
> > > And then kernel proceeds to throw up errors at random places, e.g.
> > >
> > > [ 67.021892] cfg80211: failed to load regulatory.db
> > > [ 67.021895] cfg80211: failed to load regulatory.db
> > > [ 67.021897] cfg80211: failed to load regulatory.db
> > > [ 67.021900] cfg80211: failed to load regulatory.db
> > > [ 67.021927] cfg80211: failed to load regulatory.db
> > > [ 67.021928] cfg80211: failed to load regulatory.db
> > > [ 67.021932] cfg80211: failed to load regulatory.db
> > > [ 67.021934] cfg80211: failed to load regulatory.db
> > > [ 67.024463] cfg80211: failed to load regulatory.db
> > > [ 99.980625] iwlwifi 0000:00:14.3: Error sending STATISTICS_CMD: time out after 2000ms.
>
> BTW this one might indicate that iwlwifi somehow got powered
> down too. It's at
>
> 00:14.3 Network controller: Intel Corporation Wireless-AC 9560 [Jefferson Peak] (rev 10)
>
> so really shouldn't be affected, but go figure. If driver really is getting
> all-ones from the device, it just might try to poke at a wrong b:d.f by mistake
> maybe ...

Or the power resource is shared by the wifi as well.

> > > followed by soft lockups and sometimes hard lockups in places
> > > like attempts to walk skb lists.
> > >
> > > Adding runpm=0 does away with this issue.
> > >
> > > The specific test was with noaccel=1 - it does not seem to change
> > > things for me.
> > >
> > > I poked at the ACPI method NVPO and yes it does actually
> > > seem to execute a while loop waiting for some register
> > > to become 0. Which I guess never happens? Because card
> > > is in a low power state and so reads return ffffffff maybe?
> >
> > Yes, it could be the case.
> >
> > > X isn't happy even with runpm=0 but that might be a different
> > > issue - I thought runpm=0 might be an easier place to start debugging
> > > things given there are logs of the failure.
> > >
> > > Using 4.20.0-rc3 right now.
> > >
> > > Userspace bits are from fedora 29:
> > > xorg-x11-drv-nouveau-1.0.15-6.fc29.x86_64
> > >
> > > firmware is pretty recent:
> > > linux-firmware-20181008-88.gitc6b6265d.fc29.noarch
> > >
> > > More hints for debugging would be appreciated.
> > > If anyone wants me to play with a different kernel tree,
> > > let me know.
> >
> > Can you share full dmesg and acpidump of the system? I would like to
> > check the power resource methods.
>
> Pls find it as an attachment here:
> https://bugs.freedesktop.org/show_bug.cgi?id=108873
>
> If you like I can send it to you directly too -
> spamming the list with it isn't helpful I guess?

No need to send it, I can read it from the bugzilla just fine. Can you attach
the acpidump output there as well?
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau