The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Rob Clark <rob.clark@oss.qualcomm.com>
Cc: Daniel J Blueman <daniel@quora.org>,
	Dmitry Baryshkov <lumag@kernel.org>,
	Abhinav Kumar <abhinav.kumar@linux.dev>,
	Jessica Zhang <jesszhan0024@gmail.com>,
	Sean Paul <sean@poorly.run>,
	Marijn Suijten <marijn.suijten@somainline.org>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Antonino Maniscalco <antomani103@gmail.com>,
	linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org,
	freedreno@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] drm/msm: Fix shrinker deadlock
Date: Mon, 11 May 2026 09:37:27 +0200	[thread overview]
Message-ID: <20260511093325.74e2777f@fedora> (raw)
In-Reply-To: <CACSVV02t99r2JpT9EUar_YRs+zgpzjNYgNvvB9BGLLnpssY4BA@mail.gmail.com>

Hi,

On Sat, 9 May 2026 08:34:15 -0700
Rob Clark <rob.clark@oss.qualcomm.com> wrote:

> On Thu, May 7, 2026 at 11:57 PM Daniel J Blueman <daniel@quora.org> wrote:
> >
> > With PROVE_LOCKING on an Snapdragon X1 and VM reclaim pressure, we see:
> >
> > """
> > kswapd0/121 is trying to acquire lock:
> > ffff800080ed3800 (reservation_ww_class_acquire){+.+.}-{0:0}, at:
> >   msm_gem_shrinker_scan (drivers/gpu/drm/msm/msm_gem_shrinker.c:189)
> >
> > but task is already holding lock:
> > ffffbf4ddb44ca40 (fs_reclaim){+.+.}-{0:0}, at:
> >   balance_pgdat (mm/vmscan.c:7236 (discriminator 2))
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >  
> > -> #2 (fs_reclaim){+.+.}-{0:0}:  
> > lock_acquire (kernel/locking/lockdep.c:5868 kernel/locking/lockdep.c:5825)
> > fs_reclaim_acquire (mm/page_alloc.c:4325 mm/page_alloc.c:4339)
> > dma_resv_lockdep (drivers/dma-buf/dma-resv.c:798)
> > do_one_initcall (init/main.c:1392)
> > kernel_init_freeable (init/main.c:1454 (discriminator 1) init/main.c:1470
> >   (discriminator 1) init/main.c:1490 (discriminator 1) init/main.c:1703
> >   (discriminator 1))
> > kernel_init (init/main.c:1593)
> > ret_from_fork (arch/arm64/kernel/entry.S:858)
> >  
> > -> #1 (reservation_ww_class_mutex){+.+.}-{4:4}:  
> > lock_acquire (kernel/locking/lockdep.c:5868 kernel/locking/lockdep.c:5825)
> > dma_resv_lockdep (./include/linux/ww_mutex.h:164 (discriminator 1)
> >   drivers/dma-buf/dma-resv.c:791 (discriminator 1))
> > do_one_initcall (init/main.c:1392)
> > kernel_init_freeable (init/main.c:1454 (discriminator 1) init/main.c:1470
> >   (discriminator 1) init/main.c:1490 (discriminator 1) init/main.c:1703
> >   (discriminator 1))
> > kernel_init (init/main.c:1593)
> > ret_from_fork (arch/arm64/kernel/entry.S:858)
> >  
> > -> #0 (reservation_ww_class_acquire){+.+.}-{0:0}:  
> > check_prev_add (kernel/locking/lockdep.c:3165)
> > __lock_acquire (kernel/locking/lockdep.c:3284
> >   kernel/locking/lockdep.c:3908 kernel/locking/lockdep.c:5237)
> > lock_acquire (kernel/locking/lockdep.c:5868 kernel/locking/lockdep.c:5825)
> > drm_gem_lru_scan (./include/linux/ww_mutex.h:163 (discriminator 1)
> >   drivers/gpu/drm/drm_gem.c:1681 (discriminator 1))  
> 
> Your line #s don't quite match mine, but AFAICT this is from the
> ww_acquire_init()
> 
> What I'm unsure about is if this could cause live-lock against another
> operation which requires obtaining both obj and vm locks in a
> potentially different order (which would also be using a
> ww_acquire_ctx ticket to backoff in case of conflicting locking
> order).  It wouldn't deadlock because we don't sleep forever if we do
> sleep, but...
> 
> Possibly we should also be using trylock to also acquire the vm lock,
> but lockdep would still complain as it doesn't know the ticket will be
> only used w/ trylock (unless we did something hacky by using a
> different ww_class?)

FWIW, we started using a ticket in the initial version of the Panthor
shrinker, and ditched it at some point because of these unsolvable lock
ordering issues. It also seems to me that trylock-all-the-way is the
right solution, and if we trylock and back off + immediately move to
the next BO if any lock is already held, the ticket approach is not as
useful, because we're not going to use the retry mechanism provided by
ww_mutex anyway.

It's true that it does the bookkeeping, which simplifies the rollback
procedure, but if you look at the other locks taken in the shrinker
path, they are static (one per-component involved in reclaim) for most
of them, meaning the rollback is pretty straightforward. The only
exception is the VM lock (one per vm_bo in case of shared BOs). In
panthor, we just decided to open-code this rollback logic (see
panthor_gem_try_evict_no_resv_wait() [1]) instead of teaching ww_mutex
about non-blocking locks when a ticket is provided. Not saying this is
the best option, but it works...

Regards,

Boris

[1]https://gitlab.freedesktop.org/drm/misc/kernel/-/blob/drm-misc-next/drivers/gpu/drm/panthor/panthor_gem.c?ref_type=heads#L1425

  reply	other threads:[~2026-05-11  7:37 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08  6:57 [PATCH] drm/msm: Fix shrinker deadlock Daniel J Blueman
2026-05-09 15:34 ` Rob Clark
2026-05-11  7:37   ` Boris Brezillon [this message]
2026-05-11 13:16     ` Rob Clark
2026-05-10 12:30 ` kernel test robot
2026-05-10 17:08 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260511093325.74e2777f@fedora \
    --to=boris.brezillon@collabora.com \
    --cc=abhinav.kumar@linux.dev \
    --cc=airlied@gmail.com \
    --cc=antomani103@gmail.com \
    --cc=daniel@quora.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=jesszhan0024@gmail.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lumag@kernel.org \
    --cc=marijn.suijten@somainline.org \
    --cc=rob.clark@oss.qualcomm.com \
    --cc=sean@poorly.run \
    --cc=simona@ffwll.ch \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox