All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Dawson <matthew@mjdsystems.ca>
To: "Christian König" <deathsimple@vodafone.de>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH] drm/radeon: Avoid double gpu reset by adding a timeout on IB ring tests.
Date: Mon, 08 Feb 2016 09:51:03 -0500	[thread overview]
Message-ID: <5775429.5kGhVQn69T@ring00> (raw)
In-Reply-To: <56B1F191.5020607@vodafone.de>


[-- Attachment #1.1: Type: text/plain, Size: 3906 bytes --]

On Wednesday, February 3, 2016 1:24:49 PM EST Christian König wrote:
> Am 31.01.2016 um 19:50 schrieb Matthew Dawson:
> > On Sunday, January 24, 2016 10:49:00 AM EST Christian König wrote:
> >> Am 24.01.2016 um 07:18 schrieb Matthew Dawson:
> >>> When the radeon driver resets a gpu, it attempts to test whether all the
> >>> rings can successfully handle an IB.  If these rings fail to respond,
> >>> the
> >>> process will wait forever.  Another gpu reset can't happen at this
> >>> point,
> >>> as the current reset holds a lock required to do so.  Instead, make all
> >>> the IB tests run with a timeout, so the system can attempt to recover
> >>> in this case.
> >>> 
> >>> While this doesn't fix the underlying issue with card resets failing, it
> >>> gives the system a higher chance of recovering.  These timeouts have
> >>> been
> >>> confirmed to help both a Tathi and Hawaii card recover after a gpu
> >>> reset.
> >>> 
> >>> I haven't been able to test this on other cards, but everything
> >>> compiles.
> >>> I wasn't sure what to use for a timeout value, so for now I've used
> >>> radeon_lockup_timeout.  When that is 0, the timeout functionality in
> >>> this
> >>> patch is also disabled.
> >>> 
> >>> This also adds a new function, radeon_fence_wait_timeout, that behaves
> >>> like
> >>> radeon_fence_wait except allowing a different timeout value to be passed
> >>> to
> >>> radeon_fence_wait_seq_timeout.
> >>> 
> >>> Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
> >> 
> >> Really good idea, but please don't use radeon_lockup_timeout for this
> >> cause the 10 seconds default of that is way to long. Instead just use a
> >> fixed, let's say 100ms timeout defined in radeon.h.
> > 
> > I originally tried 100ms, but I found that to be too short (my system
> > would
> > just lockup completely (no network even)).  I found 1s is long enough, and
> > isn't so long to be annoying.  Would that be ok?
> 
> Yeah, 1s works as well. But that 100ms shouldn't be enough sounds like
> another bug to me.
Probably, but it has a tendency to crash my entire system, so I'm not sure how 
to debug it further.  Netconsole maybe?  I probably won't be able to dig into 
this due to time since 1s does work.

> 
> >> Also don't define our own radeon_fence_wait_timeout, just use
> >> fence_wait_timeout(&fence->base, RADEON_IB_TEST_TIMEOUT) for this.
> > 
> > I switched everything over to this, however the ib test always times out
> > after a gpu reset (on startup, everything is normal).  I'm not sure why
> > fence_wait_timeout isn't being signaled while
> > radeon_fence_wait_seq_timeout
> > is.  I'm suspicious that either radeon_fence_enable_signaling is not doing
> > something necessary to get the fence completion to actual signal
> > radeon_fence_default_wait, or because we hold the exclusive lock during a
> > reset radeon_fence_check_lockup never runs and thus never triggers the
> > fence or enables some irq lines.
> > 
> > Is it worth digging through the interactions here to get dma-buf fences to
> > work correctly during a lockup, or would it be better to just keep
> > radeon_fence_wait_timeout?  If option 1 is preferred, I'm happy to learn
> > but I need some help learning how it is supposed to work.
> 
> In this case feel free to add the radeon_fence_wait_timeout. It's
> probably a good idea to get rid of the exclusive lock sooner or later in
> radeon as well, but that really is a long term project.
> 
> If you're really bored I can give you tons of input where to start with
> that.
That sounds like a good project to learn more about the driver, so I'd be 
interested.  I don't have a lot of time right now to play with it though.  I 
do want to finish off the tf2 crasher first as well.  Once I have time I'll 
poke the list to get started.

Thanks for the reviews!
-- 
Matthew

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2016-02-08 14:51 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-24  6:18 [PATCH] drm/radeon: Avoid double gpu reset by adding a timeout on IB ring tests Matthew Dawson
2016-01-24  9:49 ` Christian König
2016-01-31 18:50   ` Matthew Dawson
2016-02-03 12:24     ` Christian König
2016-02-08 14:51       ` Matthew Dawson [this message]
2016-02-08 16:55         ` Bruno Kant
  -- strict thread matches above, loose matches on Subject: below --
2016-02-07 21:51 Matthew Dawson
2016-02-08  9:38 ` Christian König
2016-02-08 15:42   ` Alex Deucher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5775429.5kGhVQn69T@ring00 \
    --to=matthew@mjdsystems.ca \
    --cc=deathsimple@vodafone.de \
    --cc=dri-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.