All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org, stable@vger.kernel.org,
	Steven Price <steven.price@arm.com>,
	Rob Herring <robh+dt@kernel.org>,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH v3] drm/panfrost: Move the GPU reset bits outside the timeout handler
Date: Tue, 3 Nov 2020 13:02:21 +0100	[thread overview]
Message-ID: <20201103130221.3367da07@collabora.com> (raw)
In-Reply-To: <20201103110847.GG401619@phenom.ffwll.local>

On Tue, 3 Nov 2020 12:08:47 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Nov 03, 2020 at 12:03:26PM +0100, Boris Brezillon wrote:
> > On Tue, 3 Nov 2020 11:25:40 +0100
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Tue, Nov 03, 2020 at 09:13:47AM +0100, Boris Brezillon wrote:  
> > > > We've fixed many races in panfrost_job_timedout() but some remain.
> > > > Instead of trying to fix it again, let's simplify the logic and move
> > > > the reset bits to a separate work scheduled when one of the queue
> > > > reports a timeout.
> > > > 
> > > > v3:
> > > > - Replace the atomic_cmpxchg() by an atomic_xchg() (Robin Murphy)
> > > > - Add Steven's R-b
> > > > 
> > > > v2:
> > > > - Use atomic_cmpxchg() to conditionally schedule the reset work (Steven Price)
> > > > 
> > > > Fixes: 1a11a88cfd9a ("drm/panfrost: Fix job timeout handling")
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > Reviewed-by: Steven Price <steven.price@arm.com>    
> > > 
> > > Sprinkling the dma_fence annotations over this would be really nice ...  
> > 
> > You mean something like that?  
> 
> That's just the irq annotations, i.e. the one that's already guaranteed by
> the irq vs. locks checks. So this does nothing.
> 
> What I mean is annotating your new reset work (it's part of the critical
> path to complete batches, since it's holding up other batches that are
> stuck in the scheduler still), and the drm/scheduler annotations I've
> floated a while ago. The drm/scheduler annotations are stuck somewhat for
> lack of feedback from any of the driver teams using it though :-/
> 
> The thing is pulling something out into a worker of it's own generally
> doesn't fix any deadlocks, it just hides them from lockdep.

Hm, except that's not exactly a deadlock we were trying to fix here (as
in, not a situation where 2 threads try to acquire locks in different
orders), just a situation where the scheduler stops dequeuing jobs
because it ends up in an inconsistent state (which is caused by a
bad/lack-of synchronization between timeout handlers). The problem here
is that we have 3 schedulers (one per HW queue) but when a timeout
occurs on one of them, we need to reset them all, thus requiring some
synchronization between the different timeout works. Moving the reset
logic to a separate work simplifies the synchronization.

> So it would be
> good to make sure lockdep can see through your maze again.

Okay, but it's not clear to me which part of the panfrost_reset()
function should be annotated. I mean, I probably call functions that
can signal fences, but I don't call dma_signal_fence() directly. Are
callers of the dma_sched_xxx() helpers expected to place such
annotations?
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Herring <robh+dt@kernel.org>,
	Tomeu Vizoso <tomeu@tomeuvizoso.net>,
	Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
	Steven Price <steven.price@arm.com>,
	Robin Murphy <robin.murphy@arm.com>,
	stable@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v3] drm/panfrost: Move the GPU reset bits outside the timeout handler
Date: Tue, 3 Nov 2020 13:02:21 +0100	[thread overview]
Message-ID: <20201103130221.3367da07@collabora.com> (raw)
In-Reply-To: <20201103110847.GG401619@phenom.ffwll.local>

On Tue, 3 Nov 2020 12:08:47 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Nov 03, 2020 at 12:03:26PM +0100, Boris Brezillon wrote:
> > On Tue, 3 Nov 2020 11:25:40 +0100
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Tue, Nov 03, 2020 at 09:13:47AM +0100, Boris Brezillon wrote:  
> > > > We've fixed many races in panfrost_job_timedout() but some remain.
> > > > Instead of trying to fix it again, let's simplify the logic and move
> > > > the reset bits to a separate work scheduled when one of the queue
> > > > reports a timeout.
> > > > 
> > > > v3:
> > > > - Replace the atomic_cmpxchg() by an atomic_xchg() (Robin Murphy)
> > > > - Add Steven's R-b
> > > > 
> > > > v2:
> > > > - Use atomic_cmpxchg() to conditionally schedule the reset work (Steven Price)
> > > > 
> > > > Fixes: 1a11a88cfd9a ("drm/panfrost: Fix job timeout handling")
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > Reviewed-by: Steven Price <steven.price@arm.com>    
> > > 
> > > Sprinkling the dma_fence annotations over this would be really nice ...  
> > 
> > You mean something like that?  
> 
> That's just the irq annotations, i.e. the one that's already guaranteed by
> the irq vs. locks checks. So this does nothing.
> 
> What I mean is annotating your new reset work (it's part of the critical
> path to complete batches, since it's holding up other batches that are
> stuck in the scheduler still), and the drm/scheduler annotations I've
> floated a while ago. The drm/scheduler annotations are stuck somewhat for
> lack of feedback from any of the driver teams using it though :-/
> 
> The thing is pulling something out into a worker of it's own generally
> doesn't fix any deadlocks, it just hides them from lockdep.

Hm, except that's not exactly a deadlock we were trying to fix here (as
in, not a situation where 2 threads try to acquire locks in different
orders), just a situation where the scheduler stops dequeuing jobs
because it ends up in an inconsistent state (which is caused by a
bad/lack-of synchronization between timeout handlers). The problem here
is that we have 3 schedulers (one per HW queue) but when a timeout
occurs on one of them, we need to reset them all, thus requiring some
synchronization between the different timeout works. Moving the reset
logic to a separate work simplifies the synchronization.

> So it would be
> good to make sure lockdep can see through your maze again.

Okay, but it's not clear to me which part of the panfrost_reset()
function should be annotated. I mean, I probably call functions that
can signal fences, but I don't call dma_signal_fence() directly. Are
callers of the dma_sched_xxx() helpers expected to place such
annotations?

  reply	other threads:[~2020-11-03 12:02 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-03  8:13 [PATCH v3] drm/panfrost: Move the GPU reset bits outside the timeout handler Boris Brezillon
2020-11-03  8:13 ` Boris Brezillon
2020-11-03 10:25 ` Daniel Vetter
2020-11-03 10:25   ` Daniel Vetter
2020-11-03 11:03   ` Boris Brezillon
2020-11-03 11:03     ` Boris Brezillon
2020-11-03 11:08     ` Daniel Vetter
2020-11-03 11:08       ` Daniel Vetter
2020-11-03 12:02       ` Boris Brezillon [this message]
2020-11-03 12:02         ` Boris Brezillon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201103130221.3367da07@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=alyssa.rosenzweig@collabora.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=steven.price@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.