From: Markus Trippelsdorf <markus@trippelsdorf.de>
To: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "Michel Dänzer" <michel@daenzer.net>, dri-devel@lists.freedesktop.org
Subject: Re: GPU lockup CP stall for more than 10000msec on latest vanilla git
Date: Tue, 18 Dec 2012 17:12:38 +0100
Message-ID: <20121218161238.GA213@x4>
In-Reply-To: <50D08ACB.4090605@canonical.com>

On 2012.12.18 at 16:24 +0100, Maarten Lankhorst wrote:
> On 18-12-12 14:38, Markus Trippelsdorf wrote:
> > On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> >> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
> >>> On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> >>>> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> >>>>> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> >>>>> <markus@trippelsdorf.de> wrote:
> >>>>>> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> >>>>>>> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> >>>>>>> <markus@trippelsdorf.de> wrote:
> >>>>>>>> As soon as I open the following website:
> >>>>>>>> http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> >>>>>>>>
> >>>>>>>> my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> >>>>>>> Is this a regression? Most likely a 3D driver bug unless you are only
> >>>>>>> seeing it with specific kernels. What browser are you using and do
> >>>>>>> you have hw accelerated webgl, etc. enabled? If so, what version of
> >>>>>>> mesa are you using?
> >>>>>> This is a regression, because it is caused by yesterday's merge of
> >>>>>> drm-next by Linus. IOW I only see this bug when running a
> >>>>>> v3.7-9432-g9360b53 kernel.
> >>>>> Can you bisect? I'm guessing it may be related to the new DMA rings. Possibly:
> >>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> >>>> Yes, the commit above causes the issue.
> >>>>
> >>>> 2d6cc72 GPU lockups
> >>> With 2d6cc72 reverted I get:
> >>>
> >>> Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
> >> Probably a separate issue, can you bisect this one as well?
> > Yes. Git-bisect points to:
> >
> > 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> > commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> > Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> > Date: Thu Nov 29 11:36:54 2012 +0000
> >
> > drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
> > held, v3
> >
> > (Please note that this bug is a little bit harder to reproduce. But
> > when you scroll up and down for ~10 seconds on the webpage mentioned
> > above it will trigger the oops.
> > So while I'm not 100% sure that the issue is caused by exactly this
> > commit, the vicinity should be right)
> >
> Those dmesg warnings sound suspicious, looks like something is going
> very wrong there.
>
> Can you revert the one before it? "drm/radeon: allow move_notify to be
> called without reservation" Reservation should be held at this point,
> that commit got in accidentally.
>
> I doubt that not holding a reservation is causing it, though, and I
> don't really see how that commit could cause it either. So can you
> please double check that it never happened before that point, and only
> started at that commit?
>
> Also slap in a BUG_ON(!ttm_bo_is_reserved(bo)) in
> ttm_bo_cleanup_refs_and_unlock for good measure, and a
> BUG_ON(spin_trylock(&bdev->fence_lock)); in ttm_bo_wait.
>
> I really don't see how that specific commit can be wrong though, so
> awaiting your results first before I try to dig more into it.
I just reran git-bisect, restricted to your commits (from 1a1494def to
97a875cbd), and landed on the same commit as above:

commit 85b144f86 (drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v3)
So now I'm pretty sure it's specifically this commit that started the
issue.
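
For anyone who wants to repeat a range-limited bisect like the one above,
the general shape is sketched below on a throwaway repository. The
repository, file names, commit subjects and test command are all made up
for illustration. On the kernel tree one would instead pass the real
endpoints to "git bisect start" and mark each step good or bad by hand
after testing the booted kernel.

```shell
#!/bin/sh
# Toy demonstration of bisecting a bounded commit range.
# Everything here (repo, files, commit subjects) is hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q . >/dev/null 2>&1
git config user.email "bisect@example.invalid"
git config user.name "Bisect Sketch"

# Six commits; the fourth one introduces the "bug".
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then echo bad > state; else echo good > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
done

good=$(git rev-parse HEAD~5)      # known-good start of the range
bad=$(git rev-parse HEAD)         # known-bad end of the range
expected=$(git rev-parse HEAD~2)  # commit 4, used only to check the result

# "git bisect start <bad> <good>" limits the search to that range.
git bisect start "$bad" "$good" >/dev/null 2>&1

# The test command exits 0 on good revisions and 1 on bad ones.
git bisect run sh -c 'grep -q good state' >/dev/null 2>&1

# After a finished bisect, refs/bisect/bad names the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad: $(git log -1 --format=%s "$first_bad")"
```

With a bug that takes ~10 seconds of scrolling to trigger, "git bisect run"
is of little use, so each step has to be marked manually with
"git bisect good" or "git bisect bad".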
With your suggested debugging BUG_ONs added I still get:
Dec 18 17:01:15 x4 kernel: ------------[ cut here ]------------
Dec 18 17:01:15 x4 kernel: WARNING: at include/linux/kref.h:42 radeon_fence_ref+0x2c/0x40()
Dec 18 17:01:15 x4 kernel: Hardware name: System Product Name
Dec 18 17:01:15 x4 kernel: Pid: 157, comm: X Not tainted 3.7.0-rc7-00520-g85b144f-dirty #174
Dec 18 17:01:15 x4 kernel: Call Trace:
Dec 18 17:01:15 x4 kernel: [<ffffffff81058c84>] ? warn_slowpath_common+0x74/0xb0
Dec 18 17:01:15 x4 kernel: [<ffffffff8129273c>] ? radeon_fence_ref+0x2c/0x40
Dec 18 17:01:15 x4 kernel: [<ffffffff8125e95c>] ? ttm_bo_cleanup_refs_and_unlock+0x18c/0x2d0
Dec 18 17:01:15 x4 kernel: [<ffffffff8125f17c>] ? ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 17:01:15 x4 kernel: [<ffffffff81264452>] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 17:01:15 x4 kernel: [<ffffffff8125f4ce>] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 17:01:15 x4 kernel: [<ffffffff8125fb0c>] ? ttm_bo_move_buffer+0xfc/0x170
Dec 18 17:01:15 x4 kernel: [<ffffffff810de172>] ? kmem_cache_alloc+0xb2/0xc0
Dec 18 17:01:15 x4 kernel: [<ffffffff8125fc15>] ? ttm_bo_validate+0x95/0x110
Dec 18 17:01:15 x4 kernel: [<ffffffff8125ff7c>] ? ttm_bo_init+0x2ec/0x3b0
Dec 18 17:01:15 x4 kernel: [<ffffffff8129419a>] ? radeon_bo_create+0x18a/0x200
Dec 18 17:01:15 x4 kernel: [<ffffffff81293e80>] ? radeon_bo_clear_va+0x40/0x40
Dec 18 17:01:15 x4 kernel: [<ffffffff812a5342>] ? radeon_gem_object_create+0x92/0x160
Dec 18 17:01:15 x4 kernel: [<ffffffff812a575c>] ? radeon_gem_create_ioctl+0x6c/0x150
Dec 18 17:01:15 x4 kernel: [<ffffffff812a529f>] ? radeon_gem_object_free+0x2f/0x40
Dec 18 17:01:15 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
Dec 18 17:01:15 x4 kernel: [<ffffffff812a56f0>] ? radeon_gem_pwrite_ioctl+0x20/0x20
Dec 18 17:01:15 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 18 17:01:15 x4 kernel: [<ffffffff810e5588>] ? vfs_read+0x118/0x160
Dec 18 17:01:15 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
Dec 18 17:01:15 x4 kernel: [<ffffffff810e5851>] ? sys_read+0x51/0xa0
Dec 18 17:01:15 x4 kernel: [<ffffffff814b0612>] ? system_call_fastpath+0x16/0x1b
Dec 18 17:01:15 x4 kernel: ---[ end trace 485a2dd5755db51e ]---
Dec 18 17:01:15 x4 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
Dec 18 17:01:15 x4 kernel: IP: [<ffffffff81296488>] radeon_vm_bo_invalidate+0x18/0x30
Dec 18 17:01:15 x4 kernel: PGD 211d09067 PUD 211d52067 PMD 0
Dec 18 17:01:15 x4 kernel: Oops: 0002 [#1] SMP
Dec 18 17:01:15 x4 kernel: CPU 1
Dec 18 17:01:15 x4 kernel: Pid: 157, comm: X Tainted: G W 3.7.0-rc7-00520-g85b144f-dirty #174 System manufacturer System Product Name/M4A78T-E
Dec 18 17:01:15 x4 kernel: RIP: 0010:[<ffffffff81296488>] [<ffffffff81296488>] radeon_vm_bo_invalidate+0x18/0x30
Dec 18 17:01:15 x4 kernel: RSP: 0018:ffff880211ddfaa8 EFLAGS: 00010203
Dec 18 17:01:15 x4 kernel: RAX: 0000000000000000 RBX: ffff8801f94e1c48 RCX: ffff880205de3128
Dec 18 17:01:15 x4 kernel: RDX: 0000000000000001 RSI: ffff8801f94e1df0 RDI: ffff8801f94e1df8
Dec 18 17:01:15 x4 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
Dec 18 17:01:15 x4 kernel: R10: 0000000000000000 R11: ffff880216a766b8 R12: ffff880216a76590
Dec 18 17:01:15 x4 kernel: R13: ffffffff818383e0 R14: 0000000000000001 R15: ffff880215c83678
Dec 18 17:01:15 x4 kernel: FS: 00007fbcabc8c880(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
Dec 18 17:01:15 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 17:01:15 x4 kernel: CR2: 0000000000000024 CR3: 0000000211d07000 CR4: 00000000000007e0
Dec 18 17:01:15 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 18 17:01:15 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 18 17:01:15 x4 kernel: Process X (pid: 157, threadinfo ffff880211dde000, task ffff880211dc0ba0)
Dec 18 17:01:15 x4 kernel: Stack:
Dec 18 17:01:15 x4 kernel: ffffffff8125d2e9 ffff8801f94e1c48 ffffffff8125e909 ffff880216a769b8
Dec 18 17:01:15 x4 kernel: 01ff880200000001 ffff8801f94e1c84 0000000000000001 ffff880216a766b8
Dec 18 17:01:15 x4 kernel: 0000000000000000 ffff880215c83678 ffff8801f94e1c48 ffffffff8125f17c
Dec 18 17:01:15 x4 kernel: Call Trace:
Dec 18 17:01:15 x4 kernel: [<ffffffff8125d2e9>] ? ttm_bo_cleanup_memtype_use+0x19/0x90
Dec 18 17:01:15 x4 kernel: [<ffffffff8125e909>] ? ttm_bo_cleanup_refs_and_unlock+0x139/0x2d0
Dec 18 17:01:15 x4 kernel: [<ffffffff8125f17c>] ? ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 17:01:15 x4 kernel: [<ffffffff81264452>] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 17:01:15 x4 kernel: [<ffffffff8125f4ce>] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 17:01:15 x4 kernel: [<ffffffff8125fb0c>] ? ttm_bo_move_buffer+0xfc/0x170
Dec 18 17:01:15 x4 kernel: [<ffffffff810de172>] ? kmem_cache_alloc+0xb2/0xc0
Dec 18 17:01:15 x4 kernel: [<ffffffff8125fc15>] ? ttm_bo_validate+0x95/0x110
Dec 18 17:01:15 x4 kernel: [<ffffffff8125ff7c>] ? ttm_bo_init+0x2ec/0x3b0
Dec 18 17:01:15 x4 kernel: [<ffffffff8129419a>] ? radeon_bo_create+0x18a/0x200
Dec 18 17:01:15 x4 kernel: [<ffffffff81293e80>] ? radeon_bo_clear_va+0x40/0x40
Dec 18 17:01:15 x4 kernel: [<ffffffff812a5342>] ? radeon_gem_object_create+0x92/0x160
Dec 18 17:01:15 x4 kernel: [<ffffffff812a575c>] ? radeon_gem_create_ioctl+0x6c/0x150
Dec 18 17:01:15 x4 kernel: [<ffffffff81246b60>] ? drm_ioctl+0x420/0x4f0
Dec 18 17:01:15 x4 kernel: [<ffffffff812a56f0>] ? radeon_gem_pwrite_ioctl+0x20/0x20
Dec 18 17:01:15 x4 kernel: [<ffffffff8111c310>] ? fsnotify_clear_marks_by_inode+0x20/0xd0
Dec 18 17:01:15 x4 kernel: [<ffffffff810fbc35>] ? __destroy_inode+0x15/0x60
Dec 18 17:01:15 x4 kernel: [<ffffffff810de220>] ? kmem_cache_free+0x10/0x90
Dec 18 17:01:15 x4 kernel: [<ffffffff810f8eaf>] ? dput+0x2f/0x300
Dec 18 17:01:15 x4 kernel: [<ffffffff810f53a4>] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 18 17:01:15 x4 kernel: [<ffffffff811005fb>] ? mntput_no_expire+0x7b/0x170
Dec 18 17:01:15 x4 kernel: [<ffffffff8107bb6b>] ? lg_global_unlock+0x3b/0x50
Dec 18 17:01:15 x4 kernel: [<ffffffff81071b9c>] ? task_work_run+0x8c/0xc0
Dec 18 17:01:15 x4 kernel: [<ffffffff810f55ec>] ? sys_ioctl+0x4c/0xa0
Dec 18 17:01:15 x4 kernel: [<ffffffff814b0612>] ? system_call_fastpath+0x16/0x1b
Dec 18 17:01:15 x4 kernel: Code: 8b 44 24 04 48 83 c4 08 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 86 f0 01 00 00 48 81 c6 f0 01 00 00 48 39 f0 74 11 0f 1f 44 00 00 <c6> 40 24 00 48 8b 00 48 39 f0 75 f4 f3 c3 66 2e 0f 1f 84 00 00
Dec 18 17:01:15 x4 kernel: RIP [<ffffffff81296488>] radeon_vm_bo_invalidate+0x18/0x30
Dec 18 17:01:15 x4 kernel: RSP <ffff880211ddfaa8>
Dec 18 17:01:15 x4 kernel: CR2: 0000000000000024
Dec 18 17:01:15 x4 kernel: ---[ end trace 485a2dd5755db51f ]---
Dec 18 17:01:15 x4 kernel: [drm:drm_release] *ERROR* Device busy: 1
--
Markus