From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: "Michel Dänzer" <michel@daenzer.net>, dri-devel@lists.freedesktop.org
Subject: Re: GPU lockup CP stall for more than 10000msec on latest vanilla git
Date: Tue, 18 Dec 2012 19:10:42 +0100
Message-ID: <50D0B1A2.3050102@canonical.com>
In-Reply-To: <20121218161238.GA213@x4>

On 18-12-12 17:12, Markus Trippelsdorf wrote:
> On 2012.12.18 at 16:24 +0100, Maarten Lankhorst wrote:
>> On 18-12-12 14:38, Markus Trippelsdorf wrote:
>>> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
>>>> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote: 
>>>>> On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
>>>>>> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
>>>>>>> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
>>>>>>> <markus@trippelsdorf.de> wrote:
>>>>>>>> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
>>>>>>>>> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
>>>>>>>>> <markus@trippelsdorf.de> wrote:
>>>>>>>>>> As soon as I open the following website:
>>>>>>>>>> http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
>>>>>>>>>>
>>>>>>>>>> my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
>>>>>>>>> Is this a regression?  Most likely a 3D driver bug unless you are only
>>>>>>>>> seeing it with specific kernels.  What browser are you using and do
>>>>>>>>> you have hw accelerated webgl, etc. enabled?  If so, what version of
>>>>>>>>> mesa are you using?
>>>>>>>> This is a regression, because it is caused by yesterday's merge of
>>>>>>>> drm-next by Linus. IOW I only see this bug when running a
>>>>>>>> v3.7-9432-g9360b53 kernel.
>>>>>>> Can you bisect?  I'm guessing it may be related to the new DMA rings.  Possibly:
>>>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>>>>>> Yes, the commit above causes the issue. 
>>>>>>
>>>>>>  2d6cc72  GPU lockups
>>>>> With 2d6cc72 reverted I get:
>>>>>
>>>>> Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
>>>> Probably a separate issue, can you bisect this one as well?
>>> Yes. Git-bisect points to:
>>>
>>> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
>>> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
>>> Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>> Date:   Thu Nov 29 11:36:54 2012 +0000
>>>
>>>     drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
>>>     held, v3
>>>
>>> (Please note that this bug is a little bit harder to reproduce. But
>>> when you scroll up and down for ~10 seconds on the webpage mentioned
>>> above it will trigger the oops.
>>> So while I'm not 100% sure that the issue is caused by exactly this
>>> commit, the vicinity should be right)
>>>
>> Those dmesg warnings sound suspicious, looks like something is going
>> very wrong there.
>>
>> Can you revert the one before it? "drm/radeon: allow move_notify to be
>> called without reservation" Reservation should be held at this point,
>> that commit got in accidentally.
>>
>> I doubt that not holding a reservation is causing it, though. I don't
>> really see how that commit could cause it, so can you please
>> double-check that it never happened before that point and only started
>> at that commit?
>>
>> Also slap in a BUG_ON(!ttm_bo_is_reserved(bo)) in
>> ttm_bo_cleanup_refs_and_unlock for good measure, and a
>> BUG_ON(spin_trylock(&bdev->fence_lock)); to ttm_bo_wait.
>>
>> I really don't see how that specific commit can be wrong though, so
>> awaiting your results first before I try to dig more into it.
> I just reran git-bisect on only your commits (from 1a1494def to 97a875cbd)
> and I landed on the same commit as above:
>
> commit 85b144f86 (drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v3)
>
> So now I'm pretty sure it's specifically this commit that started the
> issue.
>
> With your suggested debugging BUG_ONs added, I still get:
>
> Dec 18 17:01:15 x4 kernel: ------------[ cut here ]------------
> Dec 18 17:01:15 x4 kernel: WARNING: at include/linux/kref.h:42 radeon_fence_ref+0x2c/0x40()
> Dec 18 17:01:15 x4 kernel: Hardware name: System Product Name
> Dec 18 17:01:15 x4 kernel: Pid: 157, comm: X Not tainted 3.7.0-rc7-00520-g85b144f-dirty #174
> Dec 18 17:01:15 x4 kernel: Call Trace:
> Dec 18 17:01:15 x4 kernel: [<ffffffff81058c84>] ? warn_slowpath_common+0x74/0xb0
> Dec 18 17:01:15 x4 kernel: [<ffffffff8129273c>] ? radeon_fence_ref+0x2c/0x40
> Dec 18 17:01:15 x4 kernel: [<ffffffff8125e95c>] ? ttm_bo_cleanup_refs_and_unlock+0x18c/0x2d0
> Dec 18 17:01:15 x4 kernel: [<ffffffff8125f17c>] ? ttm_mem_evict_first+0x1dc/0x2a0
> Dec 18 17:01:15 x4 kernel: [<ffffffff81264452>] ? ttm_bo_man_get_node+0x62/0xb0
> Dec 18 17:01:15 x4 kernel: [<ffffffff8125f4ce>] ? ttm_bo_mem_space+0x28e/0x340
> Dec 18 17:01:15 x4 kernel: [<ffffffff8125fb0c>] ? ttm_bo_move_buffer+0xfc/0x170
> Dec 18 17:01:15 x4 kernel: [<ffffffff810de172>] ? kmem_cache_alloc+0xb2/0xc0
> Dec 18 17:01:15 x4 kernel: [<ffffffff8125fc15>] ? ttm_bo_validate+0x95/0x110
> Dec 18 17:01:15 x4 kernel: [<ffffffff8125ff7c>] ? ttm_bo_init+0x2ec/0x3b0
> Dec 18 17:01:15 x4 kernel: [<ffffffff8129419a>] ? radeon_bo_create+0x18a/0x200
> Dec 18 17:01:15 x4 kernel: [<ffffffff81293e80>] ? radeon_bo_clear_va+0x40/0x40
> Dec 18 17:01:15 x4 kernel: [<ffffffff812a5342>] ? radeon_gem_object_create+0x92/0x160
> Dec 18 17:01:15 x4 kernel: [<ffffffff812a575c>] ? radeon_gem_create_ioctl+0x6c/0x150
> Dec 18 17:01:15 x4 kernel: [<ffffffff812a529f>] ? radeon_gem_object_free+0x2f/0x40
> Dec 18 17:01:15 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
> Dec 18 17:01:15 x4 kernel: [<ffffffff812a56f0>] ? radeon_gem_pwrite_ioctl+0x20/0x20
> Dec 18 17:01:15 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 17:01:15 x4 kernel: [<ffffffff810e5588>] ? vfs_read+0x118/0x160
> Dec 18 17:01:15 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
> Dec 18 17:01:15 x4 kernel: [<ffffffff810e5851>] ? sys_read+0x51/0xa0
> Dec 18 17:01:15 x4 kernel: [<ffffffff814b0612>] ? system_call_fastpath+0x16/0x1b
So nothing changed. Have you reverted the drm/radeon patch before it yet? And wtf is going on here?
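
The two BUG_ON() checks suggested earlier in the thread encode locking preconditions: the buffer must be reserved when ttm_bo_cleanup_refs_and_unlock runs, and bdev->fence_lock must already be held when ttm_bo_wait is entered (if spin_trylock succeeds, nobody was holding the lock, which is the bug being caught). Below is a minimal userspace model of those two checks, assuming a C11 atomic_flag as a stand-in for a kernel spinlock. model_bo, model_bdev and all model_* helpers are hypothetical names for illustration, not the TTM API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel spinlock (not the real spinlock_t). */
struct model_spinlock {
	atomic_flag locked;
};

static void model_spin_init(struct model_spinlock *l)
{
	atomic_flag_clear(&l->locked);
}

/* Like spin_trylock(): true only if the lock was free and is now taken. */
static bool model_spin_trylock(struct model_spinlock *l)
{
	return !atomic_flag_test_and_set(&l->locked);
}

static void model_spin_lock(struct model_spinlock *l)
{
	while (!model_spin_trylock(l))
		; /* busy-wait, as a spinlock would */
}

static void model_spin_unlock(struct model_spinlock *l)
{
	atomic_flag_clear(&l->locked);
}

/* Hypothetical, heavily reduced device and buffer object. */
struct model_bdev {
	struct model_spinlock fence_lock;
};

struct model_bo {
	bool reserved;            /* stands in for ttm_bo_is_reserved(bo) */
	struct model_bdev *bdev;
};

/* Check 1, modelled on BUG_ON(!ttm_bo_is_reserved(bo)):
 * returns true when the precondition holds. */
static bool model_bo_reserved_ok(const struct model_bo *bo)
{
	return bo->reserved;
}

/* Check 2, modelled on BUG_ON(spin_trylock(&bdev->fence_lock)):
 * if trylock succeeds, the caller did not hold fence_lock. A real BUG_ON
 * would halt here; this model releases the lock again and reports false. */
static bool model_fence_lock_held_ok(struct model_bdev *bdev)
{
	if (model_spin_trylock(&bdev->fence_lock)) {
		model_spin_unlock(&bdev->fence_lock);
		return false;
	}
	return true;
}
```

The checks return a bool here so they can be exercised without crashing; in kernel code the failing case would simply be wrapped in BUG_ON(). Note that the trylock trick only proves that *someone* holds fence_lock, not that the current caller does, since spinlocks do not track their owner.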

That patch shouldn't cause such issues by itself, and I don't see how the refcount on bo->sync_obj can be zero while bo->sync_obj is still non-null.
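
The WARNING at include/linux/kref.h:42 inside radeon_fence_ref is consistent with kref_get() being called on a fence whose count had already dropped to zero, i.e. the "non-null bo->sync_obj with a zero refcount" state described above. The sketch below is a simplified userspace model of that failure mode, assuming a kref_get() that warns when it finds a zero count (as the log suggests). model_kref, model_fence and the two drop helpers are hypothetical; the "buggy" helper shows one way such a state can arise and is not a claim about what TTM or radeon actually does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified model of struct kref; not the kernel implementation. */
struct model_kref {
	atomic_int refcount;
};

static int model_warn_count; /* how many times the model "WARN" fired */

static void model_kref_init(struct model_kref *k)
{
	atomic_store(&k->refcount, 1);
}

/* Assumed behaviour: warn if the object had no references left, i.e. the
 * caller is reviving an object whose last reference was already dropped. */
static void model_kref_get(struct model_kref *k)
{
	if (atomic_fetch_add(&k->refcount, 1) == 0) {
		model_warn_count++;
		fprintf(stderr, "WARNING: kref_get() on zero refcount\n");
	}
}

/* Returns true when this put dropped the last reference. */
static bool model_kref_put(struct model_kref *k)
{
	return atomic_fetch_sub(&k->refcount, 1) == 1;
}

struct model_fence {
	struct model_kref kref;
};

struct model_bo {
	struct model_fence *sync_obj;
};

/* Buggy (hypothetical): drops the bo's reference but leaves the pointer
 * set, so bo->sync_obj stays non-null while its refcount can be zero. */
static void model_bo_drop_fence_buggy(struct model_bo *bo)
{
	model_kref_put(&bo->sync_obj->kref);
}

/* Safer (hypothetical): clear the pointer before dropping the reference,
 * so no later reader finds a fence the bo no longer owns. Real code would
 * also need the clear and any concurrent reader to share a lock. */
static void model_bo_drop_fence(struct model_bo *bo)
{
	struct model_fence *f = bo->sync_obj;

	bo->sync_obj = NULL;
	if (f)
		model_kref_put(&f->kref);
}
```

In the kernel the fence would be freed on the last put, so the buggy path is a use-after-free rather than a harmless warning; the model keeps the memory alive only so that the zero count can be observed.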

Refcounting seems to be messed up on the fence somewhere, but I don't think it's caused by this patch.

~Maarten

Thread overview: 20+ messages
2012-12-17 18:27 GPU lockup CP stall for more than 10000msec on latest vanilla git Markus Trippelsdorf
2012-12-17 21:32 ` Alex Deucher
2012-12-17 21:48   ` Markus Trippelsdorf
2012-12-17 21:58     ` Markus Trippelsdorf
2012-12-17 22:00     ` Alex Deucher
2012-12-17 22:25       ` Markus Trippelsdorf
2012-12-17 22:55         ` Markus Trippelsdorf
2012-12-18 11:20           ` Michel Dänzer
2012-12-18 13:38             ` Markus Trippelsdorf
2012-12-18 13:51               ` Markus Trippelsdorf
2012-12-18 15:24               ` Maarten Lankhorst
2012-12-18 16:12                 ` Markus Trippelsdorf
2012-12-18 18:10                   ` Maarten Lankhorst [this message]
2012-12-19 13:57                   ` Maarten Lankhorst
2012-12-19 14:20                     ` Markus Trippelsdorf
2012-12-19 14:31                       ` Maarten Lankhorst
2012-12-23  1:46         ` Alex Deucher
2012-12-23  8:43           ` Markus Trippelsdorf
2012-12-23 10:09             ` Andy Furniss
2012-12-23 10:21               ` Markus Trippelsdorf
