Re: [PATCH] drm/radeon: add an exclusive lock for GPU reset

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Christian König" <deathsimple@vodafone.de>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH] drm/radeon: add an exclusive lock for GPU reset
Date: Tue, 03 Jul 2012 16:52:00 +0200	[thread overview]
Message-ID: <4FF30710.7030504@vodafone.de> (raw)
In-Reply-To: <CAH3drwaAxBfBtpeeihuApE-UM6BVnOdXzsfA6mAwCg3e9gNm_g@mail.gmail.com>

On 03.07.2012 16:09, Jerome Glisse wrote:
> On Tue, Jul 3, 2012 at 5:26 AM, Christian König <deathsimple@vodafone.de> wrote:
>> [SNIP]
>> Yeah, but I thought that fixing those oops as the second step. I see the
>> fact that suspend/resume is unpinning all the ttm memory and then pinning it
>> again as a bug that needs to be fixed. Or as an alternative we could split
>> the suspend/resume calls into suspend/disable and resume/enable calls, where
>> we only call disable/enable in the gpu_reset path rather than a complete
>> suspend/resume (not really sure about that).
> Fixing oops are not second step, they are first step. I am not saying
> that the suspend/resume as it happens right now is a good thing, i am
> saying it's what it's and we have to deal with it until we do
> something better. There is no excuse to not fix oops with a simple
> patch 16 lines patch.
Completely agree.

>> Also a GPU reset isn't such a rare event, currently it just occurs when
>> userspace is doing something bad, for example submitting an invalid shader
>> or stuff like that. But with VM and IO protection coming into the driver we
>> are going to need a GPU reset when there is an protection fault, and with
>> that it is really desirable to just reset the hardware in a way where we can
>> say: This IB was faulty skip over it and resume with whatever is after it on
>> the ring.
> There is mecanism to handle that properly from irq handler that AMD
> need to release, the pagefault thing could be transparent and should
> only need few lines in the irq handler (i think i did a patch for that
> and sent it to AMD for review but i am wondering if i wasn't lacking
> some doc).
I also searched the AMD internal docs for a good description of how the 
lightweight recovery is supposed to work, but haven't found anything 
clear so far. My expectation is that there should be something like a 
"abort current IB" command you can issue by writing an register, but 
that doesn't seems to be the case.

>> And todo that we need to keep the auxiliary data like sub allocator memory,
>> blitting shader bo, and especially vm page tables at the same place in
>> hardware memory.
> I agree that we need a lightweight reset but we need to keep the heavy
> reset as it is right now, if you want to do a light weight reset do it
> as a new function. I always intended to have two reset path, hint gpu
> soft reset name vs what is call hard reset but not released, i even
> made patch for that long time ago but never got them cleared from AMD
> review.
My idea was to pass in some extra informations, so asic_reset more 
clearly knows what todo. An explicit distinction between a soft and a 
hard reset also seems like a possible solution, but sounds like a bit of 
code duplication.

>>> I stress it we need at very least a mutex to protect gpu reset and i
>>> will stand on that position because there is no other way around.
>>> Using rw lock have a bonus of allowing proper handling of gpu reset
>>> failure and that what the patch i sent to linus fix tree is about, so
>>> to make drm next merge properly while preserving proper behavior in
>>> gpu reset failure the rw semaphore is the best option.
>> Oh well, I'm not arguing that we don't need a look here. I'm just
>> questioning to put it at the ioctl level (e.g. the driver entry from
>> userspace), that wasn't such a good idea with the cs_mutex and doesn't seems
>> like a good idea now. Instead we should place the looking between the
>> ioctl/ttm and the actual hardware submission and that brings it pretty close
>> (if not identically) to the ring mutex.
>>
>> Cheers,
>> Christian.
> No, locking at the ioctl level make sense please show me figure that
> it hurt performance, i did a quick sysprof and i couldn't see them
> impacting anything. No matter how much you hate this, this is the best
> solution, it avoids each ioctl to do useless things in case of gpu
> lockup and it touch a lot less code than moving a lock down the call
> path would. So again this is the best solution for the heavy reset,
> and i am not saying that a soft reset would need to take this lock or
> that we can't improve the way it's done. All i am saying is that ring
> lock is the wrong solution for heavy reset, it should be ok for light
> weight reset.
>
I'm not into any performance concerns, it just doesn't seems to be the 
right place to me. So are you really sure that the 
ttm_bo_delayed_workqueue,  pageflips or callbacks to radeon_bo_move 
can't hit us here? IIRC that always was the big concern with the 
cs_mutex not being held in all cases.

Anyway, if you think it will work and fix the crash problem at hand then 
I'm ok with commit it.

Christian.
<http://lxr.free-electrons.com/ident?v=2.6.37;i=ttm_bo_delayed_workqueue>

next prev parent reply	other threads:[~2012-07-03 14:52 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-02 15:39 [PATCH] drm/radeon: add an exclusive lock for GPU reset j.glisse
2012-07-02 16:20 ` Jerome Glisse
2012-07-02 16:26 ` Christian König
2012-07-02 16:41   ` Jerome Glisse
2012-07-02 17:05     ` Christian König
2012-07-02 17:27       ` Jerome Glisse
2012-07-03  9:26         ` Christian König
2012-07-03 14:09           ` Jerome Glisse
2012-07-03 14:52             ` Christian König [this message]
2012-07-03 15:15               ` Jerome Glisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FF30710.7030504@vodafone.de \
    --to=deathsimple@vodafone.de \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=j.glisse@gmail.com \
    --cc=jglisse@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.