From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ObIu=6N=vger.kernel.org=linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 786C9C83004
	for <linux-kernel@archiver.kernel.org>; Wed, 29 Apr 2020 07:37:46 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 604D32073E
	for <linux-kernel@archiver.kernel.org>; Wed, 29 Apr 2020 07:37:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726608AbgD2Hhp (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 29 Apr 2020 03:37:45 -0400
Received: from mx2.suse.de ([195.135.220.15]:56636 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1726381AbgD2Hhp (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 29 Apr 2020 03:37:45 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
        by mx2.suse.de (Postfix) with ESMTP id 1914AAF3F;
        Wed, 29 Apr 2020 07:37:42 +0000 (UTC)
Date:   Wed, 29 Apr 2020 09:37:41 +0200
Message-ID: <s5ha72ulp2y.wl-tiwai@suse.de>
From:   Takashi Iwai <tiwai@suse.de>
To:     Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Cc:     Alex Deucher <alexdeucher@gmail.com>,
        "Zhou, David(ChunMing)" <David1.Zhou@amd.com>,
        "alsa-devel@alsa-project.org" <alsa-devel@alsa-project.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
        Takashi Iwai <tiwai@suse.com>, Lukas Wunner <lukas@wunner.de>,
        "Deucher, Alexander" <Alexander.Deucher@amd.com>,
        "Koenig, Christian" <Christian.Koenig@amd.com>
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
In-Reply-To: <PSXP216MB043899DC52E6C6BF728D77CD80AC0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
References: <PSXP216MB0438D2AF96CE0D4F83F48C4D80AE0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
        <MN2PR12MB4488E4909C1488FB507E0BF5F7AF0@MN2PR12MB4488.namprd12.prod.outlook.com>
        <s5ho8rdnems.wl-tiwai@suse.de>
        <PSXP216MB04387BF6B5F8DA84749E5D6F80AF0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
        <CADnq5_M=QEqxuCKjb_qZvFSvwM5eLEFfsepxYYXoouFoe5bn7A@mail.gmail.com>
        <s5h4kt4ojrf.wl-tiwai@suse.de>
        <CADnq5_MMQ5_MjEg=bkJJGMJP53RjB3yxvOW0nUDeWxzg3Q0pVQ@mail.gmail.com>
        <s5hv9lkm49n.wl-tiwai@suse.de>
        <PSXP216MB043899DC52E6C6BF728D77CD80AC0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka)
 FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/25.3
 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 28 Apr 2020 16:48:45 +0200,
Nicholas Johnson wrote:
> 
> > > > >
> > > > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > > > this code was enabled.
> > > >
> > > > Is the new DC code used for Fiji boards?  IIRC, the audio component
> > > > binding from amdgpu is enabled only for DC, and without the audio
> > > > component binding the runtime PM won't be linked up, hence you can't
> > > > power up GPU from the audio side access automatically.
> > > >
> > > 
> > > Yes, DC is enabled by default for all cards with runtime pm enabled.
> > 
> > OK, thanks, I found that amdgpu got bound via component in the dmesg
> > output, too:
> > 
> > [   21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> > 
> > This happens soon after the amdgpu driver gets initialized.
> > Later we see another initialization phase:
> > 
> > [   26.904127] rfkill: input handler enabled
> > [   37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
> > 
> > There is a gap of about 10 seconds between them.  Then it complained:
> > 
> > 
> > [   37.363287] [drm] UVD initialized successfully.
> > [   37.473340] [drm] VCE initialized successfully.
> > [   37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
> 
> The above would be me hitting WindowsKey+P to change screens, but with 
> no DisplayPort attached to Fiji, hence it was unable to find a crtc.
> 
> > 
> > ... and it goes further, hitting an HD-audio error:
> > 
> That would be me having attached the DisplayPort and done WindowsKey+P 
> again.
> 
> > [   38.936624] [drm] fb mappable at 0x4B0696000
> > [   38.936626] [drm] vram apper at 0x4B0000000
> > [   38.936626] [drm] size 33177600
> > [   38.936627] [drm] fb depth is 24
> > [   38.936627] [drm]    pitch is 15360
> > [   38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
> > [   40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500
> > 
> > After this point, HD-audio communication was screwed up.
> > 
> > The last cmd in the above message is the AC_VERB_SET_POWER_STATE verb
> > setting the root node to D0, so the very first command to power up the
> > codec.  The remaining commands are also about powering up each node,
> > so the whole error indicates that the power-up at runtime resume failed.
> > 
> > So, this looks to me as if the device gets runtime-resumed at a bad
> > moment?
> It does. However, this is not going to be easy to pin down.
> 
> I moved from Arch to Ubuntu, and it behaves differently. I cannot 
> trigger the bug in Ubuntu. Plus, Ubuntu puts the GPUs to sleep even if 
> they are attached at boot, unlike Arch. I will continue to try to 
> trigger it. But even if this is a problem with the Linux distribution, 
> it should not be able to trigger a kernel mode bug, so we should 
> persist with finding it.

Sure, that's a bug to be fixed.

This made me wonder what happens if we load the HD-audio driver very
late.  Could you try to blacklist the snd-hda-intel module, then load it
manually after plugging in the DP monitor and activating it?

Also, could you track who called the problematic power-up sequence,
e.g. by adding WARN_ON_ONCE()?
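
For deciding where to put the WARN_ON_ONCE(), it may help to match on
the command word itself.  Below is a rough userspace sketch (not kernel
code) that splits a command like the 0x00170500 from the log into its
fields and tests for a power-up to D0.  The two constants carry the
values from include/sound/hda_verbs.h; struct hda_cmd, hda_decode_cmd()
and hda_cmd_is_power_up_d0() are hypothetical names made up for
illustration, and the bit layout assumes a 12-bit verb with an 8-bit
payload.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as in the kernel's include/sound/hda_verbs.h. */
#define AC_VERB_SET_POWER_STATE	0x705
#define AC_PWRST_D0		0x00

/* Fields of a 32-bit HD-audio command word (12-bit verb form). */
struct hda_cmd {
	unsigned int addr;	/* codec address, bits 31:28 */
	unsigned int nid;	/* node ID, bits 27:20 */
	unsigned int verb;	/* verb, bits 19:8 */
	unsigned int parm;	/* payload, bits 7:0 */
};

/* Hypothetical helper: split a raw command word into its fields. */
static struct hda_cmd hda_decode_cmd(uint32_t cmd)
{
	struct hda_cmd c = {
		.addr = (cmd >> 28) & 0xf,
		.nid  = (cmd >> 20) & 0xff,
		.verb = (cmd >> 8) & 0xfff,
		.parm = cmd & 0xff,
	};

	return c;
}

/* Hypothetical predicate: is this a "set power state to D0" command? */
static bool hda_cmd_is_power_up_d0(uint32_t cmd)
{
	struct hda_cmd c = hda_decode_cmd(cmd);

	return c.verb == AC_VERB_SET_POWER_STATE && c.parm == AC_PWRST_D0;
}
```

With the value from the log, hda_decode_cmd(0x00170500) gives addr 0,
nid 0x01, verb 0x705 and parm 0x00.  A condition of this kind could
serve as the argument of WARN_ON_ONCE() at the point where the command
is sent; which point that should be is left open here.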

Last but not least, please check the /proc/asound/card*/eld#* files
before and after plugging.  There should be two cards (card0 and card1
or such) that contain eld#* files, one for i915 and another for amdgpu.
Their contents show whether the audio connection was recognized or not.
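
If comparing the eld#* files by hand gets tedious, something along the
lines below could pull out the relevant fields.  This is only a sketch:
it assumes each line of the file has the form "<key><whitespace><value>"
with keys such as monitor_present and eld_valid, and eld_get_flag() and
eld_read_flag() are hypothetical helper names, not existing functions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up an integer field in the text of an eld#* file.
 * Returns the value, or -1 if no line starts with the given key.
 */
static int eld_get_flag(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* The key must be followed by whitespace to count as a match. */
		if (strncmp(line, key, klen) == 0 &&
		    (line[klen] == ' ' || line[klen] == '\t'))
			return atoi(line + klen);
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read a (small) proc file and look up one field; -1 on any failure. */
static int eld_read_flag(const char *path, const char *key)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return eld_get_flag(buf, key);
}
```

Calling eld_read_flag() with the path of one eld#* file and the key
"monitor_present" or "eld_valid", once before and once after plugging
the monitor, would show whether the value changes.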


thanks,

Takashi

--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -224,6 +224,7 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 			dev_err_ratelimited(bus->dev,
 				"spurious response %#x:%#x, last cmd=%#08x\n",
 				res, res_ex, bus->last_cmd[addr]);
+			WARN_ON_ONCE(1);
 		}
 	}
 }