All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc MERLIN <marc_nouveau-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
To: Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
Cc: nouveau <nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
Date: Tue, 29 Dec 2020 09:47:50 -0800	[thread overview]
Message-ID: <20201229174750.GI23389@merlins.org> (raw)
In-Reply-To: <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

(removed other lists, since it's likely not a linux-PCI problem)

On Tue, Dec 29, 2020 at 11:33:16AM -0500, Ilia Mirkin wrote:
> > Sounds like this would be a problem with all chips if userspace is able
> > to wake them up every second or two with a probe. Now I wonder what
> > broken userspace I have that could be doing this.
> 
> Well, it's a theory. Some userspace helpfully prevents the GPU from
> suspending entirely, unfortunately I don't remember its name though by
> messing with the attached audio device. It's very common and meant to
> help... oh well.

Are you thinking about tlp maybe?  https://linrunner.de/tlp/
I submitted a blacklist patch so that it works ok-ish on my laptop now.
(when the nvidia chip is unhappy, it happily uses 70W on batteries with
1.3h of runtime. When everything is ok, I can go down to about 12W/9H)

> > Do you think that could be a reason why the boot would hang for 2 full minutes at every
> > boot ever since I upgraded to 5.5?
> 
> I'd have to check, but I'm guessing TU104 acceleration became a thing
> in 5.5. I would also not be very surprised if the code didn't handle
> failure extremely gracefully - there definitely have been problems
> with that in the past.

Ah, then the timing checks out. That's exciting, at least now I have a
lead as to why I'm having problems. This was the same time a PCI PM
change went in, and I mistakenly thought it was to blame.

> > The kernel module is in my initrd:
> > sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep nouveau
> > drwxr-xr-x   1 root     root            0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> > -rw-r--r--   1 root     root      3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> > 17+1 records in
> > 17+1 records out
> > 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s
> 
> I think that gets you out of "full newbie" land...

:)  (ok, I have been using linux since 1993, but stuff changes so much
all the time, that sometimes I feel like a newbie all over again)
In my days, we didn't complain about systemd vs sysvinit, we had rc.local
and it was good enough :-D

> > Note that ultimately I only need nouveau not to hang my boot 2mn and do
> > PM so that the nvidia chip goes to sleep since I don't use it.
> 
> I'm not extremely familiar with debian packaging, but the firmware is
> provided by NVIDIA and shipped as part of linux-firmware:
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia
 
Ah, it comes from outside just like intel firmware, thanks.
Also, I was looking for nouveau, not nvidia:
sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep tu104
shows no match

Good news is that debian did package it (they have multiple firmware
packages)
sauron:~# dpkggrep firmware | awk '{print $1}' | xargs apt-get install -y
sauron:~# dpkg -S /lib/firmware/nvidia/tu104
firmware-misc-nonfree: /lib/firmware/nvidia/tu104

update-initramfs -v -c -k 5.9.11-amd64-preempt-sysrq-20190817

Ok, I should be in business after next reboot, thank you.

> Of course now that I read your email a bit more carefully, it seems
> your issue is with the "saving config space" messages. I'm not sure
> I've seen those before. Perhaps you have some sort of debug enabled.
> I'd find where in the kernel they are being produced, and what the
> conditions for it are. But the failure to load firmware isn't great --
> not 100% sure if it impacts runpm or not.
 
Yes, I have 'nouveau.debug=disp=trace'
Someone on this list asked me to add this a few months back.

> I just double-checked, TU10x accel came in via
> afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
> Initial TU10x support came in v5.0. So that doesn't line up with your
> timeline.

You know, I said 5.5, maybe it was 5.6 now, it's been a little while
since those issues started.

Now we know I was missing the required firmware, it's a good place to
start, so I'll start there, thank you very much for the pointers.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

  parent reply	other threads:[~2020-12-29 17:47 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:18       ` Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 19:14         ` Karol Herbst
2020-09-07 20:58         ` [Nouveau] " Marc MERLIN
2020-09-07 20:58           ` Marc MERLIN
2020-09-07 23:51           ` [Nouveau] " Karol Herbst
2020-09-07 23:51             ` Karol Herbst
2020-09-08  0:29             ` [Nouveau] " Marc MERLIN
2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
     [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-29 18:53                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-29 19:46                       ` Marc MERLIN
     [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-30 17:32                           ` Karol Herbst
2023-04-19  6:49                         ` [Nouveau] 6.1 still cannot get display on Thinkpad P73Quadro " Marc MERLIN
2023-04-21  5:46                           ` [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104 Marc MERLIN
     [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-31 18:31                         ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-26 11:12                   ` Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2020-12-27 18:28                     ` Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-27 21:33                     ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-28 20:59                       ` [Nouveau] " Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-29 21:20                         ` [Nouveau] " Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
2021-05-06 14:50                             ` Bjorn Helgaas
2021-05-25  3:13                               ` Ben Skeggs
2020-12-29 15:51                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-29 15:51                   ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2020-12-29 16:33                     ` Ilia Mirkin
     [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-12-29 17:47                       ` Marc MERLIN [this message]
     [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 11:49                           ` Marc MERLIN
     [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 13:28                               ` Karol Herbst
     [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-07 11:49                                   ` Marc MERLIN
2020-12-30 12:16                       ` ael
     [not found]               ` <20200908002935.GD20064-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-13 20:15                 ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-13 20:15                   ` [Nouveau] " Marc MERLIN
     [not found]                   ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-19 23:18                     ` Marc MERLIN
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201229174750.GI23389@merlins.org \
    --to=marc_nouveau-xnduunryou1afugrpc6u6w@public.gmane.org \
    --cc=imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org \
    --cc=nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.