From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from MTA-13-4.privateemail.com (mta-13-4.privateemail.com [198.54.127.109]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAFB51400C for ; Sat, 24 Jan 2026 19:49:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.54.127.109 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769284201; cv=none; b=S4kbOtVPgi7zrtefOkETl2uN2s7+U7o4KJ2jwri67y91ExlJI9BAWz63CWhuMMIHqajVW/rSHrfs3b56xDJa+V8RYD7l1gGBGzdIsAskZCz2KRQ50giWTljXtFNkXqx4rsE85mRmEBdbdHU8k2kGmPNLk1nhV0ssd8lNcathMFE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769284201; c=relaxed/simple; bh=RIkWJzMbgMi4FcW9mKuvNQHwp6MgvkqZcdgVBzcWRnE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=G+UkMEGgRJviWM9R51MZyckErkUDFE6vrwV5QfZwtwuqYeZr9Su8V6cKdqdIS3JGRAKef48H3/fRI7uWLbC7KpH5PgKv/NDeNG5w5v267JOik756ddIfKr3AbEA+ZHgErOb2HElTKVjznATh2NSvBE4CWSRhA8b8yo1q+uGqN+w= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=effective-light.com; spf=pass smtp.mailfrom=effective-light.com; arc=none smtp.client-ip=198.54.127.109 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=effective-light.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=effective-light.com Received: from mta-13.privateemail.com (localhost [127.0.0.1]) by mta-13.privateemail.com (Postfix) with ESMTP id 4dz5405rm3z3hhTT; Sat, 24 Jan 2026 14:49:52 -0500 (EST) Received: from hal-station (unknown [204.137.14.92]) by mta-13.privateemail.com (Postfix) with ESMTPA; Sat, 24 Jan 2026 14:49:19 -0500 (EST) Date: Sat, 24 Jan 2026 14:49:15 -0500 From: Hamza Mahfooz To: Mario Limonciello Cc: Timur =?iso-8859-1?Q?Krist=F3f?= , dri-devel@lists.freedesktop.org, Christian =?iso-8859-1?Q?K=F6nig?= , Alex Deucher , David Airlie , Simona Vetter , Harry Wentland , Leo Li , Rodrigo Siqueira , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Sunil Khatri , Ce Sun , Lijo Lazar , Kenneth Feng , Ivan Lipski , Alex Hung , Tom Chung , Melissa Wen , Michel =?iso-8859-1?Q?D=E4nzer?= , Fangzhi Zuo , amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] drm: introduce page_flip_timeout() Message-ID: References: <20260123000537.2450496-1-someguy@effective-light.com> <2de6d428-b997-4ba8-8766-a211e5612e72@amd.com> <2349754.vFx2qVVIhK@timur-hyperion> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Virus-Scanned: ClamAV using ClamSMTP On Sat, Jan 24, 2026 at 12:43:02PM -0600, Mario Limonciello wrote: > > > On 1/24/2026 12:32 PM, Hamza Mahfooz wrote: > > On Fri, Jan 23, 2026 at 04:30:12PM -0600, Mario Limonciello wrote: > > > Hamza - since you seem to have a "workload" that can run overnight and this > > > series recovers, can you try what Alex said and do a dc_suspend() and > > > dc_resume() for failure? > > > > > > Make sure you log a message so you can know it worked. > > > > Sure, I'll try something along the lines of: > > Generally speaking that looks good, but I'll leave a few comments. > > > > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c > > index 492457c86393..bc7abd00f5f4 100644 > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c > > @@ -579,11 +579,28 @@ amdgpu_dm_atomic_crtc_get_property(struct drm_crtc *crtc, > > } > > #endif > > > > -static void amdgpu_dm_crtc_handle_timeout(struct drm_crtc *crtc) > > +static void amdgpu_dm_crtc_handle_timeout(struct drm_crtc *crtc, > > + struct drm_crtc_commit *commit) > > { > > struct amdgpu_device *adev = drm_to_adev(crtc->dev); > > struct amdgpu_reset_context reset_ctx; > > + struct amdgpu_ip_block *ip_block; > > > > + ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_DCE); > > + if (!ip_block) > > + goto full_reset; > > + > > + ip_block->version->funcs->suspend(ip_block); > > + ip_block->version->funcs->resume(ip_block); > > Both of these can fail. Especially considering the page flip timeout could > be a DCN hang, I think you should check return code for both of them > sequentially and jump to the full reset condition if either fails. > > > + > > + if (drm_crtc_commit_wait(commit)) { > > + drm_err(crtc->dev, "suspend-resume failed!\n"); > > + goto full_reset; > > + } > > + > > At least to prove "this worked" you should log a message "right here" that > the reset occurred and you recovered. That "might not" be in the final > version, but I think it's worth having for now. I have included all of the suggestions in my test run, fingers crossed that I don't have to wait too long for a repro though. > > > + return; > > + > > +full_reset: > > if (amdgpu_device_should_recover_gpu(adev)) { > > memset(&reset_ctx, 0, sizeof(reset_ctx)); > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > > index 7175294ccb57..b38c4ee2fc95 100644 > > --- a/drivers/gpu/drm/drm_atomic_helper.c > > +++ b/drivers/gpu/drm/drm_atomic_helper.c > > @@ -1961,7 +1961,7 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev, > > crtc->base.id, crtc->name); > > > > if (crtc->funcs->page_flip_timeout) > > - crtc->funcs->page_flip_timeout(crtc); > > + crtc->funcs->page_flip_timeout(crtc, commit); > > } > > } > > > > diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h > > index 45dc5a76e915..47a34a05f6de 100644 > > --- a/include/drm/drm_crtc.h > > +++ b/include/drm/drm_crtc.h > > @@ -616,7 +616,8 @@ struct drm_crtc_funcs { > > * and can be used by drivers to attempt to recover from a page flip > > * timeout. > > */ > > - void (*page_flip_timeout)(struct drm_crtc *crtc); > > + void (*page_flip_timeout)(struct drm_crtc *crtc, > > + struct drm_crtc_commit *commit); > > > > /** > > * @set_property: >