From: Takashi Iwai <tiwai@suse.de>
To: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Cc: "Zhou, David\(ChunMing\)" <David1.Zhou@amd.com>,
	"alsa-devel@alsa-project.org" <alsa-devel@alsa-project.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	Takashi Iwai <tiwai@suse.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	Lukas Wunner <lukas@wunner.de>,
	Alex Deucher <alexdeucher@gmail.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
Date: Wed, 29 Apr 2020 09:37:41 +0200
Message-ID: <s5ha72ulp2y.wl-tiwai@suse.de>
In-Reply-To: <PSXP216MB043899DC52E6C6BF728D77CD80AC0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>

On Tue, 28 Apr 2020 16:48:45 +0200,
Nicholas Johnson wrote:
> 
> > > > >
> > > > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > > > this code was enabled.
> > > >
> > > > Is the new DC code used for Fiji boards?  IIRC, the audio component
> > > > binding from amdgpu is enabled only for DC, and without the audio
> > > > component binding the runtime PM won't be linked up, hence you can't
> > > > power up GPU from the audio side access automatically.
> > > >
> > > 
> > > Yes, DC is enabled by default for all cards with runtime pm enabled.
> > 
> > OK, thanks, I found that amdgpu got bound via component in the dmesg
> > output, too:
> > 
> > [   21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> > 
> > This is the place soon after amdgpu driver gets initialized.
> > Then we see later another initialization phase:
> > 
> > [   26.904127] rfkill: input handler enabled
> > [   37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
> > 
> > There are 10 seconds between them.  Then it complained about something:
> > 
> > 
> > [   37.363287] [drm] UVD initialized successfully.
> > [   37.473340] [drm] VCE initialized successfully.
> > [   37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
> 
> The above would be me hitting WindowsKey+P to change screens, but with 
> no DisplayPort attached to Fiji, hence it was unable to find a crtc.
> 
> > 
> > ... and go further, and hitting HD-audio error:
> > 
> That would be me having attached the DisplayPort and done WindowsKey+P 
> again.
> 
> > [   38.936624] [drm] fb mappable at 0x4B0696000
> > [   38.936626] [drm] vram apper at 0x4B0000000
> > [   38.936626] [drm] size 33177600
> > [   38.936627] [drm] fb depth is 24
> > [   38.936627] [drm]    pitch is 15360
> > [   38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
> > [   40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500
> > 
> > After this point, HD-audio communication was screwed up.
> > 
> > The last cmd in the above message is the AC_SET_POWER_STATE verb for
> > the root node to D0, i.e. the very first command to power up the codec.
> > The remaining commands are also about powering up each node, so the
> > whole error indicates that the power-up at runtime resume failed.
> > 
> > So, this looks to me as if the device gets runtime-resumed at the bad
> > moment?
> It does. However, this is not going to be easy to pin down.
> 
> I moved from Arch to Ubuntu, and it behaves differently. I cannot 
> trigger the bug in Ubuntu. Plus, it puts the GPUs asleep, even if 
> attached at boot, unlike Arch. I will continue to try to trigger it. But 
> even if this is a problem with the Linux distribution, it should not be 
> able to trigger a kernel mode bug, so we should persist with finding it.

Sure, that's a bug to be fixed.

This made me wonder what happens if we load the HD-audio driver very
late.  Could you try blacklisting the snd-hda-intel module, then loading
it manually after plugging in the DP monitor and activating it?

Also, could you track who called the problematic power-up sequence,
e.g. by adding WARN_ON_ONCE()?
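
For reference, the command word quoted above (last cmd=0x00170500) can be
split into its fields by hand.  The following is only a minimal sketch,
assuming the usual 32-bit HD-audio command layout with a 12-bit verb
(codec address in bits 31:28, node ID in bits 27:20, verb in bits 19:8,
parameter in bits 7:0).  The struct and function names are hypothetical
and are not part of the kernel.

```c
#include <stdint.h>

/* Hypothetical helper type: fields of one HD-audio command word
 * (12-bit verb form).  Not a kernel structure. */
struct hda_cmd_fields {
	unsigned int codec_addr;	/* bits 31:28 */
	unsigned int nid;		/* bits 27:20 */
	unsigned int verb;		/* bits 19:8  */
	unsigned int param;		/* bits 7:0   */
};

/* Split a raw command word, such as the "last cmd" value printed by
 * snd_hda_intel, into its four fields. */
static struct hda_cmd_fields hda_decode_cmd(uint32_t cmd)
{
	struct hda_cmd_fields f;

	f.codec_addr = (cmd >> 28) & 0xf;
	f.nid = (cmd >> 20) & 0xff;
	f.verb = (cmd >> 8) & 0xfff;
	f.param = cmd & 0xff;
	return f;
}
```

Calling hda_decode_cmd(0x00170500) gives codec address 0, node 0x01, verb
0x705 (AC_VERB_SET_POWER_STATE in the kernel headers) and parameter 0x00
(D0), which matches the reading of the log given earlier in the thread.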

Last but not least, please check the /proc/asound/card1/eld#* files
(both card0 and card1, or similar, contain eld#* files; one is for i915
and the other for amdgpu) before and after plugging.  This shows
whether the audio connection was recognized or not.
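
If it helps with capturing the before/after state, the ELD files can be
dumped in one go.  This is only a sketch under the assumption that the
files live under /proc/asound/card*/ as described above.  The function
name is hypothetical, and a plain "cat" on the same pattern does the same
job.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Print every readable file matching the glob pattern to "out", each
 * preceded by a header line with its path.  Returns the number of files
 * printed (0 if nothing matched). */
static int dump_matching_files(const char *pattern, FILE *out)
{
	glob_t g;
	char line[256];
	int count = 0;
	size_t i;

	memset(&g, 0, sizeof(g));
	if (glob(pattern, 0, NULL, &g) == 0) {
		for (i = 0; i < g.gl_pathc; i++) {
			FILE *f = fopen(g.gl_pathv[i], "r");

			if (!f)
				continue;	/* skip unreadable entries */
			fprintf(out, "== %s ==\n", g.gl_pathv[i]);
			while (fgets(line, sizeof(line), f))
				fputs(line, out);
			fclose(f);
			count++;
		}
	}
	globfree(&g);
	return count;
}
```

Running dump_matching_files("/proc/asound/card*/eld#*", stdout) once
before and once after plugging the monitor gives two dumps that can be
compared with diff.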


thanks,

Takashi

--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -224,6 +224,7 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 			dev_err_ratelimited(bus->dev,
 				"spurious response %#x:%#x, last cmd=%#08x\n",
 				res, res_ex, bus->last_cmd[addr]);
+			WARN_ON_ONCE(1);
 		}
 	}
 }
