From: Markus Trippelsdorf
Subject: Re: GPU lockup CP stall for more than 10000msec on latest vanilla git
Date: Tue, 18 Dec 2012 14:51:02 +0100
Message-ID: <20121218135102.GB218@x4>
References: <20121217182752.GA351@x4> <20121217214819.GA228@x4>
 <20121217222519.GA229@x4> <20121217225534.GA219@x4>
 <1355829632.17142.59.camel@thor.local> <20121218133831.GA218@x4>
In-Reply-To: <20121218133831.GA218@x4>
To: Michel Dänzer
Cc: dri-devel@lists.freedesktop.org

On 2012.12.18 at 14:38 +0100, Markus Trippelsdorf wrote:
> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> > On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
> > > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > > wrote:
> > > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > > >> wrote:
> > > > > >> > As soon as I open the following website:
> > > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > > >> >
> > > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > > >>
> > > > > >> Is this a regression? Most likely a 3D driver bug unless you are only
> > > > > >> seeing it with specific kernels. What browser are you using and do
> > > > > >> you have hw accelerated webgl, etc. enabled? If so, what version of
> > > > > >> mesa are you using?
> > > > > >
> > > > > > This is a regression, because it is caused by yesterday's merge of
> > > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > > v3.7-9432-g9360b53 kernel.
> > > > >
> > > > > Can you bisect? I'm guessing it may be related to the new DMA rings. Possibly:
> > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > > >
> > > > Yes, the commit above causes the issue.
> > > >
> > > > 2d6cc72 GPU lockups
> > >
> > > With 2d6cc72 reverted I get:
> > >
> > > Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
> >
> > Probably a separate issue, can you bisect this one as well?
>
> Yes. Git-bisect points to:
>
> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> Author: Maarten Lankhorst
> Date:   Thu Nov 29 11:36:54 2012 +0000
>
>     drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
>     held, v3
>
> (Please note that this bug is a little bit harder to reproduce. But
> when you scroll up and down for ~10 seconds on the webpage mentioned
> above it will trigger the oops.
> So while I'm not 100% sure that the issue is caused by exactly this
> commit, the vicinity should be right)
>
> Dec 18 14:29:07 x4 kernel: ------------[ cut here ]------------
> Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42 radeon_fence_ref+0x2c/0x40()
> Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted 3.7.0-rc7-00520-g85b144f #168
> Dec 18 14:29:07 x4 kernel: Call Trace:
> Dec 18 14:29:07 x4 kernel: [] ? warn_slowpath_common+0x74/0xb0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_fence_ref+0x2c/0x40
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
> Dec 18 14:29:07 x4 kernel: [] ? ttm_mem_evict_first+0x1dc/0x2a0
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_man_get_node+0x62/0xb0
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_mem_space+0x28e/0x340
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_move_buffer+0xfc/0x170
> Dec 18 14:29:07 x4 kernel: [] ? kmem_cache_alloc+0xb2/0xc0
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_validate+0x95/0x110
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_init+0x2ec/0x3b0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_create+0x18a/0x200
> Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_clear_va+0x40/0x40
> Dec 18 14:29:07 x4 kernel: [] ? radeon_gem_object_create+0x92/0x160
> Dec 18 14:29:07 x4 kernel: [] ? radeon_gem_create_ioctl+0x6c/0x150
> Dec 18 14:29:07 x4 kernel: [] ? drm_ioctl+0x420/0x4f0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_gem_pwrite_ioctl+0x20/0x20
> Dec 18 14:29:07 x4 kernel: [] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 14:29:07 x4 kernel: [] ? vfs_read+0x118/0x160
> Dec 18 14:29:07 x4 kernel: [] ? sys_ioctl+0x4c/0xa0
> Dec 18 14:29:07 x4 kernel: [] ? sys_read+0x51/0xa0
> Dec 18 14:29:07 x4 kernel: [] ? system_call_fastpath+0x16/0x1b
> Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
> Dec 18 14:29:07 x4 kernel: BUG: unable to handle kernel paging request at 0000000100000077
> Dec 18 14:29:07 x4 kernel: IP: [] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: PGD 2156c4067 PUD 0
> Dec 18 14:29:07 x4 kernel: Oops: 0002 [#1] SMP
> Dec 18 14:29:07 x4 kernel: CPU 1
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Tainted: G W 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:07 x4 kernel: RIP: 0010:[] [] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: RSP: 0018:ffff880211645d58 EFLAGS: 00010286
> Dec 18 14:29:07 x4 kernel: RAX: 0000000000000100 RBX: ffff8801c0e29448 RCX: 0000000000000000
> Dec 18 14:29:07 x4 kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000100000077
> Dec 18 14:29:07 x4 kernel: RBP: 00000000ffffffff R08: 0000000000000000 R09: ffffffff81838370
> Dec 18 14:29:07 x4 kernel: R10: ffffffff812a5960 R11: 0000000000000246 R12: 0000000000000001
> Dec 18 14:29:07 x4 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 00007fff0723dba0
> Dec 18 14:29:07 x4 kernel: FS: 00007f958542f880(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> Dec 18 14:29:07 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Dec 18 14:29:07 x4 kernel: CR2: 0000000100000077 CR3: 000000021161a000 CR4: 00000000000007e0
> Dec 18 14:29:07 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 18 14:29:07 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 18 14:29:07 x4 kernel: Process X (pid: 161, threadinfo ffff880211644000, task ffff880215ab85d0)
> Dec 18 14:29:07 x4 kernel: Stack:
> Dec 18 14:29:07 x4 kernel:  ffffffff8125d9ba 0000000015c83600 ffff8801c0e29400 ffff880211645e30
> Dec 18 14:29:07 x4 kernel:  ffff8801c0e29448 ffff880211645dcc 0000000000000001 ffffffff81294bff
> Dec 18 14:29:07 x4 kernel:  ffff8801c0e29608 ffff880211645e30 ffff880216a76000 ffff880211645e30
> Dec 18 14:29:07 x4 kernel: Call Trace:
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_reserve+0x3a/0x110
> Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_wait+0x3f/0xc0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_gem_busy_ioctl+0x57/0x100
> Dec 18 14:29:07 x4 kernel: [] ? drm_ioctl+0x420/0x4f0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_gem_mmap_ioctl+0x20/0x20
> Dec 18 14:29:07 x4 kernel: [] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 14:29:07 x4 kernel: [] ? vfs_read+0x13d/0x160
> Dec 18 14:29:07 x4 kernel: [] ? sys_ioctl+0x4c/0xa0
> Dec 18 14:29:07 x4 kernel: [] ? sys_read+0x51/0xa0
> Dec 18 14:29:07 x4 kernel: [] ? system_call_fastpath+0x16/0x1b
> Dec 18 14:29:07 x4 kernel: Code: 31 c0 5b c3 66 90 8d 8a 00 01 00 00 89 d0 f0 66 0f b1 0b 66 39 d0 75 de b8 01 00 00 00 5b c3 0f 1f 80 00 00 00 00 b8 00 01 00 00 66 0f c1 07 0f b6 d4 38 c2 74 10 0f 1f 80 00 00 00 00 f3 90
> Dec 18 14:29:07 x4 kernel: RIP [] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel:  RSP
> Dec 18 14:29:07 x4 kernel: CR2: 0000000100000077
> Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70c ]---
> Dec 18 14:29:28 x4 kernel: BUG: unable to handle kernel paging request at 0000000100000023
> Dec 18 14:29:28 x4 kernel: IP: [] radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: PGD 205289067 PUD 0
> Dec 18 14:29:28 x4 kernel: Oops: 0002 [#2] SMP
> Dec 18 14:29:28 x4 kernel: CPU 1
> Dec 18 14:29:28 x4 kernel: Pid: 13, comm: kworker/1:0 Tainted: G D W 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:28 x4 kernel: RIP: 0010:[] [] radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: RSP: 0018:ffff8802168b3d78 EFLAGS: 00010207
> Dec 18 14:29:28 x4 kernel: RAX: 00000000ffffffff RBX: ffff8801c0e29048 RCX: ffff8801c0e2b928
> Dec 18 14:29:28 x4 kernel: RDX: 0000000000000001 RSI: ffff8801c0e291f0 RDI: 00000000ffffffff
> Dec 18 14:29:28 x4 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
> Dec 18 14:29:28 x4 kernel: R10: ffffea0007038a00 R11: dead000000100100 R12: ffff880216a76590
> Dec 18 14:29:28 x4 kernel: R13: ffffffff818383e0 R14: 0000000000000000 R15: ffff880215c83678
> Dec 18 14:29:28 x4 kernel: FS: 00007f4bb2b64740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> Dec 18 14:29:28 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 18 14:29:28 x4 kernel: CR2: 0000000100000023 CR3: 000000020698f000 CR4: 00000000000007e0
> Dec 18 14:29:28 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 18 14:29:28 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 18 14:29:28 x4 kernel: Process kworker/1:0 (pid: 13, threadinfo ffff8802168b2000, task ffff88021687d730)
> Dec 18 14:29:28 x4 kernel: Stack:
> Dec 18 14:29:28 x4 kernel:  ffffffff8125d2e9 ffff8801c0e29048 ffffffff8125e8cb ffff880216a769b8
> Dec 18 14:29:28 x4 kernel:  ffffffff810de82f ffff8801c0e2b848 ffff880215c83678 ffff8801c0e2b900
> Dec 18 14:29:28 x4 kernel:  0000000000000001 ffff880216a76a80 ffff8801c0e29048 ffffffff8125eb7d
> Dec 18 14:29:28 x4 kernel: Call Trace:
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_cleanup_memtype_use+0x19/0x90
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_cleanup_refs_and_unlock+0x12b/0x2c0
> Dec 18 14:29:28 x4 kernel: [] ? kfree+0xf/0xb0
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_delayed_delete+0x11d/0x1a0
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_delayed_workqueue+0x12/0x30
> Dec 18 14:29:28 x4 kernel: [] ? process_one_work+0x179/0x480
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_delayed_delete+0x1a0/0x1a0
> Dec 18 14:29:28 x4 kernel: [] ? worker_thread+0x1b1/0x540
> Dec 18 14:29:28 x4 kernel: [] ? busy_worker_rebind_fn+0x100/0x100
> Dec 18 14:29:28 x4 kernel: [] ? kthread+0xaf/0xc0
> Dec 18 14:29:28 x4 kernel: [] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: [] ? ret_from_fork+0x7c/0xb0
> Dec 18 14:29:28 x4 kernel: [] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: Code: 8b 44 24 04 48 83 c4 08 5b 5d 41 5c c3 66 0f 1f 44 00 00 48 8b 86 f0 01 00 00 48 81 c6 f0 01 00 00 48 39 f0 74 11 0f 1f 44 00 00 40 24 00 48 8b 00 48 39 f0 75 f4 f3 c3 66 2e 0f 1f 84 00 00
> Dec 18 14:29:28 x4 kernel: RIP [] radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel:  RSP
> Dec 18 14:29:28 x4 kernel: CR2: 0000000100000023
> Dec 18 14:29:28 x4 kernel: ---[ end trace c5e6f68fefd3a70d ]---
> Dec 18 14:29:28 x4 kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
> Dec 18 14:29:28 x4 kernel: IP: [] kthread_data+0x7/0x10
> Dec 18 14:29:28 x4 kernel: PGD 180d067 PUD 180e067 PMD 0
> Dec 18 14:29:28 x4 kernel: Oops: 0000 [#3] SMP
> Dec 18 14:29:28 x4 kernel: CPU 1
> Dec 18 14:29:28 x4 kernel: Pid: 13, comm: kworker/1:0 Tainted: G D W 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:28 x4 kernel: RIP: 0010:[] [] kthread_data+0x7/0x10
> Dec 18 14:29:28 x4 kernel: RSP: 0018:ffff8802168b3aa0 EFLAGS: 00010002
> Dec 18 14:29:28 x4 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000015c7992d1
> Dec 18 14:29:28 x4 kernel: RDX: ffffffffff8a8b63 RSI: 0000000000000001 RDI: ffff88021687d730
> Dec 18 14:29:28 x4 kernel: RBP: ffff88021687d730 R08: 0000000000000000 R09: 0000000000000000
> Dec 18 14:29:28 x4 kernel: R10: ffff880216887980 R11: 0000000000000000 R12: ffff88021fc912c0
> Dec 18 14:29:28 x4 kernel: R13: 0000000000000001 R14: ffff88021687d720 R15: ffff88021687d730
> Dec 18 14:29:28 x4 kernel: FS: 00007f4bb2b64740(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
> Dec 18 14:29:28 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 18 14:29:28 x4 kernel: CR2: ffffffffffffffd8 CR3: 000000020698f000 CR4: 00000000000007e0
> Dec 18 14:29:28 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 18 14:29:28 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 18 14:29:28 x4 kernel: Process kworker/1:0 (pid: 13, threadinfo ffff8802168b2000, task ffff88021687d730)
> Dec 18 14:29:28 x4 kernel: Stack:
> Dec 18 14:29:28 x4 kernel:  ffffffff8106fb98 ffff88021687d9d0 ffffffff814ae8b5 00000000000112c0
> Dec 18 14:29:28 x4 kernel:  ffff8802168b3fd8 00000000000112c0 ffff8802168b3fd8 0000000000000001
> Dec 18 14:29:28 x4 kernel:  ffff88021687d8d8 ffff88021687d720 ffff880216878000 ffff88021687d720
> Dec 18 14:29:28 x4 kernel: Call Trace:
> Dec 18 14:29:28 x4 kernel: [] ? wq_worker_sleeping+0x8/0xb0
> Dec 18 14:29:28 x4 kernel: [] ? __schedule+0x3a5/0x5f0
> Dec 18 14:29:28 x4 kernel: [] ? do_exit+0x52a/0x830
> Dec 18 14:29:28 x4 kernel: [] ? oops_end+0x8e/0xd0
> Dec 18 14:29:28 x4 kernel: [] ? no_context+0x251/0x25d
> Dec 18 14:29:28 x4 kernel: [] ? __do_page_fault+0x2ee/0x490
> Dec 18 14:29:28 x4 kernel: [] ? find_busiest_group+0x28/0x480
> Dec 18 14:29:28 x4 kernel: [] ? page_fault+0x1f/0x30
> Dec 18 14:29:28 x4 kernel: [] ? radeon_vm_bo_invalidate+0x18/0x30
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_cleanup_memtype_use+0x19/0x90
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_cleanup_refs_and_unlock+0x12b/0x2c0
> Dec 18 14:29:28 x4 kernel: [] ? kfree+0xf/0xb0
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_delayed_delete+0x11d/0x1a0
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_delayed_workqueue+0x12/0x30
> Dec 18 14:29:28 x4 kernel: [] ? process_one_work+0x179/0x480
> Dec 18 14:29:28 x4 kernel: [] ? ttm_bo_delayed_delete+0x1a0/0x1a0
> Dec 18 14:29:28 x4 kernel: [] ? worker_thread+0x1b1/0x540
> Dec 18 14:29:28 x4 kernel: [] ? busy_worker_rebind_fn+0x100/0x100
> Dec 18 14:29:28 x4 kernel: [] ? kthread+0xaf/0xc0
> Dec 18 14:29:28 x4 kernel: [] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: [] ? ret_from_fork+0x7c/0xb0
> Dec 18 14:29:28 x4 kernel: [] ? __kthread_bind+0x30/0x30
> Dec 18 14:29:28 x4 kernel: Code: 74 03 c6 03 00 65 48 8b 04 25 c0 b9 00 00 48 8b 80 48 02 00 00 5b 48 8b 40 c8 48 d1 e8 83 e0 01 c3 0f 1f 00 48 8b 87 48 02 00 00 <48> 8b 40 d8 c3 0f 1f 40 00 65 48 8b 04 25 c0 b9 00 00 48 8b b8
> Dec 18 14:29:28 x4 kernel: RIP [] kthread_data+0x7/0x10
> Dec 18 14:29:28 x4 kernel:  RSP
> Dec 18 14:29:28 x4 kernel: CR2: ffffffffffffffd8
> Dec 18 14:29:28 x4 kernel: ---[ end trace c5e6f68fefd3a70e ]---
> Dec 18 14:29:28 x4 kernel: Fixing recursive fault but reboot is needed!
> Dec 18 14:29:28 x4 kernel: SysRq : Emergency Sync

CCing Maarten.

-- 
Markus