From: "Christian König" <deathsimple@vodafone.de>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: GPU lockup dumping
Date: Thu, 24 May 2012 09:58:11 +0200
Message-ID: <4FBDEA13.4070208@vodafone.de>
In-Reply-To: <CAH3drwYTQTQLg432+9NPAFZ46EsDGxMyYt6WfKC3mngZ_7bmwA@mail.gmail.com>

On 23.05.2012 19:02, Jerome Glisse wrote:
> On Wed, May 23, 2012 at 12:41 PM, Dave Airlie <airlied@gmail.com> wrote:
>> On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
>>> On Wed, May 23, 2012 at 12:08 PM, Dave Airlie <airlied@gmail.com> wrote:
>>>> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
>>>>> On Wed, May 23, 2012 at 8:34 AM, Christian König
>>>>> <deathsimple@vodafone.de> wrote:
>>>>>> On 23.05.2012 11:27, Dave Airlie wrote:
>>>>>>> On Thu, May 17, 2012 at 7:28 PM, <j.glisse@gmail.com> wrote:
>>>>>>>> So here is an improved patchset, where I split the groundwork necessary
>>>>>>>> for the dumping into its own patches. The debugfs improvement could
>>>>>>>> probably be useful to Intel, instead of i915 having its own
>>>>>>>> debugfs file handling.
>>>>>>>>
>>>>>>>> The lockup dumping public API has been moved into radeon_drm.h.
>>>>>>>>
>>>>>>>> Stressing the fact again that dumps are self-contained, i.e. they have
>>>>>>>> all the data needed to be replayed (vertex, indices, shader, texture,
>>>>>>>> ...).
>>>>>>>>
>>>>>>>> I would really like to get this into 3.5; the new API is pretty
>>>>>>>> straightforward and userspace tools can easily be written to convert
>>>>>>>> it to other formats. The change to the driver is self-contained.
>>>>>>> I really don't like introducing this at this stage into 3.5.
>>>>>>>
>>>>>>> I'd really like a good review of the API and what information we provide
>>>>>>> along with how extensible it is.
>>>>>>>
>>>>>>> I'm still not convinced replay is what we want in the field. I know it's
>>>>>>> what *you* want, but I think the apitrace stuff in userspace pretty much
>>>>>>> covers the replaying situation. So I'd have to look at this and see how
>>>>>>> easy it makes dissecting command streams etc.
>>>>>>>
>>>>>>> Dave.
>>>>>>
>>>>>> I agree that it might not be a good idea to push that into 3.5, since at
>>>>>> least I (and I think Alex as well) haven't had time to look into it yet.
>>>>>> On the other hand, the patches look quite reasonable.
>>>>>>
>>>>>> But I still wanted to throw in a requirement from my day-to-day work; maybe
>>>>>> that helps in finding a more general solution:
>>>>>> When we start to work with more parts of the chip, it might be necessary to
>>>>>> dump everything that is currently "in flight". For example, I had a whole
>>>>>> bunch of problems where copying data around with a 3D blit and then missing
>>>>>> a sync between this job and a job on another ring caused a "hiccup" in the
>>>>>> hardware.
>>>>>>
>>>>>> I know that this isn't your focus, and that is absolutely OK with me, because
>>>>>> the format you are introducing is only used in debugfs and so is not part of
>>>>>> any stable API (at least not in my understanding), but you should still keep
>>>>>> in mind that we might need to extend it in that direction in the future.
>>>>>>
>>>>>> Christian.
>>>>> Note that my format is also done with that in mind; it can capture IBs
>>>>> from all rings. The only thing I don't think is worth capturing is the
>>>>> rings themselves, because there would be no way to replay them without
>>>>> adding some new special API.
>>>> I'd like to dump the rings as well. As I said, I'd rather we didn't
>>>> limit this to replay, but make it useful for getting as much info as
>>>> possible out.
>>>>
>>>> Dave.
>>> The rings will contain very little, like IB scheduling and fences; I don't
>>> see how useful this can be.
>>>
>> In case we have a bug in our ib scheduling or fencing :-0
>>
>> Dave.
> Well, I think we have several kinds of lockup. The most basic one is
> userspace sending a broken shader, vertex, or something along those lines.
> The more complex one is timing-related, like a BO move or some cache
> invalidation that didn't happen properly, so the GPU ends up reading either
> wrong data or old cached data. I don't see how to capture useful
> information for this second case, besides doing a snapshot of memory.
>
> For multi-ring, I agree that dumping the rings might prove useful to spot
> inter-ring semaphore deadlocks, or possibly an inter-ring absence of
> synchronization (but that would be a bad kernel bug).

I don't think that we need the actual data from the rings either (at
least as long as we keep the radeon_ring_* debugfs files). But it would
still be nice to know whether or not there was a sync between the rings.
See the patches I just sent to you (sorry, I actually sent more patches
than I wanted to); storing the new sync_seq array within the debug
output should enable us to figure out the dependencies and
ordering between different IBs.
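
As a rough illustration of that idea, here is a minimal userspace sketch in Python. It is not code from the patches: the `DumpedIB` record, its fields, and the rule that `sync_seq[q]` holds the last fence sequence number of ring `q` that the submitting ring had synced to are all assumptions made for the example.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class DumpedIB:
    """Hypothetical record for one IB found in a lockup dump."""
    ring: int        # ring the IB was scheduled on
    seq: int         # fence sequence number of the IB on its own ring
    sync_seq: tuple  # assumed: sync_seq[q] = last seq of ring q synced to
                     # (the entry for the IB's own ring is ignored here)


def ib_dependencies(ibs):
    """Map each IB, keyed by (ring, seq), to the IBs it must run after."""
    deps = {}
    for ib in ibs:
        wait_for = set()
        for other in ibs:
            if other is ib:
                continue
            if other.ring == ib.ring:
                # IBs on the same ring are ordered by sequence number.
                if other.seq < ib.seq:
                    wait_for.add((other.ring, other.seq))
            elif other.seq <= ib.sync_seq[other.ring]:
                # Cross-ring dependency recorded through sync_seq.
                wait_for.add((other.ring, other.seq))
        deps[(ib.ring, ib.seq)] = wait_for
    return deps


def execution_order(ibs):
    """Return one order of the IBs that respects every dependency."""
    return list(TopologicalSorter(ib_dependencies(ibs)).static_order())
```

Two IBs on different rings with no entry in each other's dependency sets had no sync between them in this model, which is the kind of case discussed above.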

Cheers,
Christian.
