Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)


From: Takashi Iwai <tiwai@suse.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: alsa-devel@alsa-project.org, linux-pm@vger.kernel.org,
	linux-pci@vger.kernel.org,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	linux-kernel@vger.kernel.org,
	"Alex Xu \(Hello71\)" <alex_y_xu@yahoo.ca>,
	Roy Spliet <nouveau@spliet.org>
Subject: Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
Date: Wed, 22 Apr 2020 23:25:04 +0200
Message-ID: <s5h8sinxlfz.wl-tiwai@suse.de>
In-Reply-To: <20200422205028.GA223132@google.com>

On Wed, 22 Apr 2020 22:50:28 +0200,
Bjorn Helgaas wrote:
> 
> [+cc Rafael, linux-pm]
> 
> On Tue, Apr 21, 2020 at 03:08:44PM -0400, Alex Xu (Hello71) wrote:
> > With 5.7-rc2, after resuming from suspend to RAM, I get:
> > 
> > [   55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> > [   55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > [   55.679410] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00100000/04400000
> > [   55.679414] pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
> > [   55.679417] pcieport 0000:00:03.1: AER:   TLP Header: 40000004 0a0000ff fffc0e80 00000000
> > [   55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> > [   55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> > [   55.679455] pcieport 0000:00:03.1: AER: device recovery failed
> 
> I'm not at all confident in my decoding skills, but I *think* the TLP
> header decodes to:
> 
>   Fmt           010b         3 DW header with data (32-bit address)
>   Type          00000b       MWr
>   Length        0x4          4 DW = 16 bytes
>   Requester ID  0x0a00       0a:00.0
>   Byte enables  0xff
>   Address       0xfffc0e80
> 
> which would mean the 0a:00.0 GPU did a 16-byte write to 0xfffc0e80,
> and the 00:03.1 Root Port reported that as an Unsupported Request.
> I don't know why that would be unless the address is invalid.
> 
> Maybe that's supposed to be an MSI address?  Maybe a complete dmesg or
> /proc/iomem would have a clue?
> 
> I feel like this UR issue could be a PCI core issue or maybe some sort
> of misuse of PCI power management, but I can't seem to get traction on
> it.
> 
> > Then the display freezes and the system basically falls apart (can't 
> > even sudo reboot -f, need to use magic sysrq).
> > 
> > I bisected this to "ALSA: hda: Skip controller resume if not needed". 
> > Setting snd_hda_intel.power_save=0 resolves the issue.
> 
> FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> controller resume if not needed"),
> https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> v5.7-rc2.

Yes, and I've just posted a fix patch:
  https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de

A possible cause is the tricky resume code shared by the HD-audio
controller (the parent PCI device) and the codec devices.

At least the patch above seems to work on the reporter's machine.
It still needs a bit more testing before merging, but it looks
promising so far.


thanks,

Takashi

Thread overview:
2020-04-21 19:08 ` Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2) Alex Xu (Hello71)
2020-04-21 19:40   ` Takashi Iwai
2020-04-22 20:50   ` Bjorn Helgaas
2020-04-22 21:25     ` Takashi Iwai [this message]
2020-04-22 23:21       ` Bjorn Helgaas
2020-04-23  7:05         ` Takashi Iwai
