From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maarten Lankhorst
Subject: Re: GPU lockup CP stall for more than 10000msec on latest vanilla git
Date: Tue, 18 Dec 2012 16:24:59 +0100
Message-ID: <50D08ACB.4090605@canonical.com>
References: <20121217182752.GA351@x4> <20121217214819.GA228@x4> <20121217222519.GA229@x4> <20121217225534.GA219@x4> <1355829632.17142.59.camel@thor.local> <20121218133831.GA218@x4>
In-Reply-To: <20121218133831.GA218@x4>
To: Markus Trippelsdorf
Cc: Michel Dänzer, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 18-12-12 14:38, Markus Trippelsdorf wrote:
> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
>> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
>>> On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
>>>> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
>>>>> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf wrote:
>>>>>> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
>>>>>>> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf wrote:
>>>>>>>> As soon as I open the following website:
>>>>>>>> http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
>>>>>>>>
>>>>>>>> my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
>>>>>>> Is this a regression?  Most likely a 3D driver bug unless you are only
>>>>>>> seeing it with specific kernels.
>>>>>>> What browser are you using and do
>>>>>>> you have hw accelerated webgl, etc. enabled?  If so, what version of
>>>>>>> mesa are you using?
>>>>>> This is a regression, because it is caused by yesterday's merge of
>>>>>> drm-next by Linus. IOW I only see this bug when running a
>>>>>> v3.7-9432-g9360b53 kernel.
>>>>> Can you bisect?  I'm guessing it may be related to the new DMA rings.  Possibly:
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>>>> Yes, the commit above causes the issue.
>>>>
>>>> 2d6cc72 GPU lockups
>>> With 2d6cc72 reverted I get:
>>>
>>> Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
>> Probably a separate issue, can you bisect this one as well?
> Yes. Git-bisect points to:
>
> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> Author: Maarten Lankhorst
> Date:   Thu Nov 29 11:36:54 2012 +0000
>
>     drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
>     held, v3
>
> (Please note that this bug is a little bit harder to reproduce. But
> when you scroll up and down for ~10 seconds on the webpage mentioned
> above it will trigger the oops.
> So while I'm not 100% sure that the issue is caused by exactly this
> commit, the vicinity should be right)
>
Those dmesg warnings sound suspicious; it looks like something is going very wrong there.

Can you revert the one before it? "drm/radeon: allow move_notify to be called without reservation"
The reservation should be held at this point; that commit got in accidentally.

I doubt that not holding a reservation is causing it, though. I don't really see how that commit
could cause it, so can you please double-check that it never happened before that point
and only started at that commit?
Also slap in a BUG_ON(!ttm_bo_is_reserved(bo)) in ttm_bo_cleanup_refs_and_unlock for good measure,
and a BUG_ON(spin_trylock(&bdev->fence_lock)); to ttm_bo_wait.

I really don't see how that specific commit can be wrong though, so awaiting your results first
before I try to dig more into it.

~Maarten
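
For illustration only: the BUG_ON(spin_trylock(...)) check suggested above is a "this lock must already be held" assertion. If the trylock succeeds, nobody held the lock, and the BUG_ON fires. Below is a rough user-space sketch of the same idiom, assuming a pthread mutex in place of the kernel spinlock. The helper name mutex_is_held is hypothetical and is not a kernel or TTM API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of BUG_ON(spin_trylock(&lock)).
 * Returns true if the mutex is currently held (by any thread),
 * false if it was free. Assumes a non-recursive mutex.
 */
static bool mutex_is_held(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);

    if (rc == 0) {
        /* We got the lock, so it was NOT held; release it again. */
        pthread_mutex_unlock(m);
        return false;
    }

    /* EBUSY means someone holds it; any other error counts as "not held". */
    return rc == EBUSY;
}

/* Hypothetical function that, like ttm_bo_wait in the mail above,
 * expects its caller to hold the lock already. */
static void must_be_called_locked(pthread_mutex_t *m)
{
    assert(mutex_is_held(m)); /* aborts if the caller forgot to lock */
}
```

A caller would take the mutex with pthread_mutex_lock() before calling must_be_called_locked(); calling it without the lock trips the assertion, which is the failure the suggested BUG_ONs are meant to surface.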