From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Rob Clark <rob.clark@oss.qualcomm.com>
Cc: Akhil P Oommen <akhilpo@oss.qualcomm.com>,
	 Sergey Senozhatsky <senozhatsky@chromium.org>,
	Sean Paul <sean@poorly.run>,
	 Konrad Dybcio <konradybcio@kernel.org>,
	linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org,
	 freedreno@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Tomasz Figa <tfiga@chromium.org>,
	 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	 Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>,
	 Simona Vetter <simona@ffwll.ch>
Subject: Re: [RFC PATCH] drm: gpu: msm: forbid mem reclaim from reset
Date: Mon, 30 Mar 2026 11:46:25 +0900
Message-ID: <acnj2pIv_6H56GDT@google.com>
In-Reply-To: <CACSVV00MFgsn8_XjXZxubJibLFE9ULrFqiEW9dQAyU404SLj1g@mail.gmail.com>

On (26/03/27 09:08), Rob Clark wrote:
> On Thu, Mar 26, 2026 at 5:18 PM Akhil P Oommen <akhilpo@oss.qualcomm.com> wrote:
> >
> > On 3/26/2026 7:24 AM, Sergey Senozhatsky wrote:
> > > On (26/01/27 16:33), Sergey Senozhatsky wrote:
> > >> We sometimes get into a situation where GPU hangcheck fails to
> > >> recover GPU:
> > >>
> > >> [..]
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): hangcheck detected gpu lockup rb 0!
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): completed fence: 7840161
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): submitted fence: 7840162
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): hangcheck detected gpu lockup rb 0!
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): completed fence: 7840162
> > >> msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): submitted fence: 7840163
> > >> [..]
> > >>
> > >> The problem is that msm_job worker is blocked on gpu->lock
> > >>
> > >> INFO: task ring0:155 blocked for more than 122 seconds.
> > >> Not tainted 6.6.99-08727-gaac38b365d2c #1
> > >> task:ring0 state:D stack:0 pid:155 ppid:2 flags:0x00000008
> > >> Call trace:
> > >> __switch_to+0x108/0x208
> > >> schedule+0x544/0x11f0
> > >> schedule_preempt_disabled+0x30/0x50
> > >> __mutex_lock_common+0x410/0x850
> > >> __mutex_lock_slowpath+0x28/0x40
> > >> mutex_lock+0x5c/0x90
> > >> msm_job_run+0x9c/0x140
> > >> drm_sched_main+0x514/0x938
> > >> kthread+0x114/0x138
> > >> ret_from_fork+0x10/0x20
> > >>
> > >> which is owned by recover worker, which is waiting for DMA fences
> > >> from a memory reclaim path, under the very same gpu->lock
> >
> > I am still thinking if there is a better way to handle this. Btw, Rob
> > had a few fixes related to this area recently. Do you think those would
> > help in this scenario?
> 
> For some reason I was thinking we used GFP_ATOMIC or similar in the
> gpu snapshot path.. but we don't :-(
> 
> It does look like we handle allocation failures.  So this is probably
> a better option than fixing up GFP flags everywhere.
> 
> Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>

Thanks!

> 
> (and apologies for overlooking this patch earlier)

No worries.
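
For readers following the thread, the lock cycle described above (the
recover worker holds gpu->lock, allocates memory, enters reclaim, and
reclaim waits on fences that can only signal once the job worker gets
gpu->lock) can be sketched as a small userspace model. This is only an
illustration under stated assumptions, not the patch itself: every name
below (gpu_lock_held, memory_is_tight, noreclaim_save(),
noreclaim_restore(), model_alloc(), recover_worker()) is hypothetical and
does not come from the msm driver or the kernel API. It shows why a scoped
"no reclaim" marker turns a would-be deadlock into an ordinary allocation
failure, which the caller then has to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the msm driver. */
static bool gpu_lock_held;   /* stands in for gpu->lock being owned */
static bool memory_is_tight; /* when true, an allocation needs reclaim */
static int noreclaim_depth;  /* stands in for a scoped "no reclaim" task flag */

enum alloc_result { ALLOC_OK, ALLOC_FAILED, ALLOC_WOULD_DEADLOCK };

/* Enter a scope in which allocations must not enter reclaim. */
static int noreclaim_save(void)
{
	return noreclaim_depth++;
}

/* Leave the scope, restoring the depth returned by noreclaim_save(). */
static void noreclaim_restore(int old_depth)
{
	assert(old_depth >= 0 && old_depth < noreclaim_depth);
	noreclaim_depth = old_depth;
}

/* Model of an allocation that may need to reclaim memory. */
static enum alloc_result model_alloc(void)
{
	if (!memory_is_tight)
		return ALLOC_OK;

	/* Reclaim is forbidden in this scope: fail instead of waiting. */
	if (noreclaim_depth > 0)
		return ALLOC_FAILED;

	/*
	 * Reclaim waits on a fence. In this model the fence only signals
	 * after the job worker takes the GPU lock, so waiting while that
	 * lock is held can never finish.
	 */
	return gpu_lock_held ? ALLOC_WOULD_DEADLOCK : ALLOC_OK;
}

/* Model of the recover worker: allocate a snapshot under the GPU lock. */
static enum alloc_result recover_worker(bool forbid_reclaim)
{
	enum alloc_result res;
	int saved = 0;

	gpu_lock_held = true;
	if (forbid_reclaim)
		saved = noreclaim_save();

	res = model_alloc();

	if (forbid_reclaim)
		noreclaim_restore(saved);
	gpu_lock_held = false;

	return res;
}
```

With memory_is_tight set, recover_worker(false) reports the deadlock case,
while recover_worker(true) reports a plain failure. That only helps if the
snapshot path copes with failed allocations, which is the point Rob makes
above.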
