Date: Wed, 29 Apr 2020 09:37:41 +0200
From: Takashi Iwai
To: Nicholas Johnson
Cc: Alex Deucher, "Zhou, David(ChunMing)", alsa-devel@alsa-project.org,
    linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
    Takashi Iwai, Lukas Wunner, "Deucher, Alexander", "Koenig, Christian"
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

On Tue, 28 Apr
2020 16:48:45 +0200,
Nicholas Johnson wrote:
>
> > > > >
> > > > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > > > this code was enabled.
> > > >
> > > > Is the new DC code used for Fiji boards? IIRC, the audio component
> > > > binding from amdgpu is enabled only for DC, and without the audio
> > > > component binding the runtime PM won't be linked up, hence you can't
> > > > power up GPU from the audio side access automatically.
> > > >
> > >
> > > Yes, DC is enabled by default for all cards with runtime pm enabled.
> >
> > OK, thanks, I found that amdgpu got bound via component in the dmesg
> > output, too:
> >
> > [   21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> >
> > This is the place soon after amdgpu driver gets initialized.
> > Then we see later another initialization phase:
> >
> > [   26.904127] rfkill: input handler enabled
> > [   37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
> >
> > here shows 10 seconds between them. Then, it complained something:
> >
> >
> > [   37.363287] [drm] UVD initialized successfully.
> > [   37.473340] [drm] VCE initialized successfully.
> > [   37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
>
> The above would be me hitting WindowsKey+P to change screens, but with
> no DisplayPort attached to Fiji, hence it unable to find crtc.
>
> >
> > ... and go further, and hitting HD-audio error:
> >
> That would be me having attached the DisplayPort and done WindowsKey+P
> again.
>
> > [   38.936624] [drm] fb mappable at 0x4B0696000
> > [   38.936626] [drm] vram apper at 0x4B0000000
> > [   38.936626] [drm] size 33177600
> > [   38.936627] [drm] fb depth is 24
> > [   38.936627] [drm] pitch is 15360
> > [   38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
> > [   40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500
> >
> > After this point, HD-audio communication was screwed up.
> >
> > This lastcmd in the above message is AC_SET_POWER_STATE verb for the
> > root node to D0, so the very first command to power up the codec.
> > The rest commands are also about the power up of each node, so the
> > whole error indicate that the power up at runtime resume failed.
> >
> > So, this looks to me as if the device gets runtime-resumed at the bad
> > moment?
> It does. However, this is not going to be easy to pin down.
>
> I moved from Arch to Ubuntu, and it behaves differently. I cannot
> trigger the bug in Ubuntu. Plus, it puts the GPUs asleep, even if
> attached at boot, unlike Arch. I will continue to try to trigger it. But
> even if this is a problem with the Linux distribution, it should not be
> able to trigger a kernel mode bug, so we should persist with finding it.

Sure, that's a bug to be fixed.

This made me wonder what happens if we load the HD-audio driver very
late. Could you try blacklisting the snd-hda-intel module, then loading
it manually after plugging in the DP monitor and activating it?

Also, could you track who called the problematic power-up sequence,
e.g. by adding WARN_ON_ONCE()?

Last but not least, please check the /proc/asound/card1/eld#* files
(both card0 and card1, or similar, contain eld#* files; one is for i915
and the other for amdgpu) before and after plugging. This shows whether
the audio connection was recognized or not.
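
As a side note, the "last cmd" value in the quoted timeout message can
be decoded by hand. Below is a minimal userspace sketch in C, assuming
the usual HD-audio command layout (codec address in bits 31:28, node ID
in bits 27:20, 12-bit verb in bits 19:8, 8-bit payload in bits 7:0).
The struct, function and macro names are made up for illustration and
are not kernel APIs; only the values 0x705 (AC_VERB_SET_POWER_STATE)
and 0x00 (AC_PWRST_D0) mirror include/sound/hda_verbs.h.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; values mirror include/sound/hda_verbs.h. */
#define EX_VERB_SET_POWER_STATE 0x705
#define EX_PWRST_D0             0x00

/* Fields of a 32-bit HD-audio codec command (12-bit verb form). */
struct hda_cmd {
	unsigned int codec_addr; /* bits 31:28 */
	unsigned int nid;        /* bits 27:20 */
	unsigned int verb;       /* bits 19:8  */
	unsigned int payload;    /* bits 7:0   */
};

/* Split a raw command word into its fields. */
struct hda_cmd hda_cmd_decode(uint32_t cmd)
{
	struct hda_cmd c;

	c.codec_addr = (cmd >> 28) & 0xf;
	c.nid        = (cmd >> 20) & 0xff;
	c.verb       = (cmd >> 8) & 0xfff;
	c.payload    = cmd & 0xff;
	return c;
}

/* Write a one-line human-readable description of cmd into buf. */
int hda_cmd_describe(uint32_t cmd, char *buf, size_t len)
{
	struct hda_cmd c = hda_cmd_decode(cmd);
	const char *name = "unknown verb";

	if (c.verb == EX_VERB_SET_POWER_STATE)
		name = (c.payload == EX_PWRST_D0) ?
			"SET_POWER_STATE -> D0" : "SET_POWER_STATE";

	return snprintf(buf, len, "codec %u nid 0x%02x verb 0x%03x parm 0x%02x (%s)",
			c.codec_addr, c.nid, c.verb, c.payload, name);
}
```

Decoding 0x00170500 this way gives codec 0, node 0x01, verb 0x705 and
payload 0x00, i.e. the power-up-to-D0 request discussed in the quoted
analysis.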

thanks,

Takashi

--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -224,6 +224,7 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 			dev_err_ratelimited(bus->dev,
 				"spurious response %#x:%#x, last cmd=%#08x\n",
 				res, res_ex, bus->last_cmd[addr]);
+			WARN_ON_ONCE(1);
 		}
 	}
 }