From: Maarten Lankhorst
Subject: Re: Commit ecff665f5e3f (drm/ttm: make ttm reservation calls...) causes system hang on Radeon RS780
Date: Wed, 10 Jul 2013 11:56:27 +0200
Message-ID: <51DD2FCB.70809@canonical.com>
References: <20130710092211.GB356@x4> <51DD2976.2010904@canonical.com> <20130710094637.GA354@x4>
In-Reply-To: <20130710094637.GA354@x4>
To: Markus Trippelsdorf
Cc: Dave Airlie, Jerome Glisse, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 10-07-13 11:46, Markus Trippelsdorf wrote:
> On 2013.07.10 at 11:29 +0200, Maarten Lankhorst wrote:
>> On 10-07-13 11:22, Markus Trippelsdorf wrote:
>>> By simply copy/pasting a big document under LibreOffice my system
>>> hangs. Only a hard reset gets it working again.
>>> see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551
>>>
>>> I've bisected the issue to:
>>>
>>> commit ecff665f5e3f1c6909353e00b9420e45ae23d995
>>> Author: Maarten Lankhorst
>>> Date: Thu Jun 27 13:48:17 2013 +0200
>>>
>>> drm/ttm: make ttm reservation calls behave like reservation calls
>>>
>>> This commit converts the source of the val_seq counter to
>>> the ww_mutex api. The reservation objects are converted later,
>>> because there is still a lockdep splat in nouveau that has to
>>> be resolved first.
>>>
>>> Signed-off-by: Maarten Lankhorst
>>> Reviewed-by: Jerome Glisse
>>> Signed-off-by: Dave Airlie
>> Hey,
>>
>> Can you try current head with CONFIG_PROVE_LOCKING set and post the
>> lockdep splat from dmesg, if any? If there is any locking issue
>> lockdep should warn about it. Lockdep will turn itself off after the
>> first splat, so if the lockdep splat happens before running the
>> affected parts those will have to be fixed first.
> There was an unrelated EDAC lockdep splat, so I simply disabled it.
>
> This is what I get:
>
> Jul 10 11:40:44 x4 kernel: ================================================
> Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
> Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
> Jul 10 11:40:44 x4 kernel: ------------------------------------------------
> Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
> Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
> Jul 10 11:40:44 x4 kernel: #0: (reservation_ww_class_acquire){+.+.+.}, at: [] radeon_bo_list_validate+0x20/0xd0
> Jul 10 11:40:44 x4 kernel: #1: (reservation_ww_class_mutex){+.+.+.}, at: [] ttm_eu_reserve_buffers+0x126/0x4b0
> Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
> Jul 10 11:40:53 x4 kernel: Emergency Sync complete
>

Thanks, exactly what I thought. I missed a backoff somewhere.

Does the patch below fix it?

---
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 0219d26..2020bf4 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -377,6 +377,7 @@ int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
 					domain = lobj->alt_domain;
 					goto retry;
 				}
+				ttm_eu_backoff_reservation(ticket, head);
 				return r;
 			}
 		}
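
For readers following along, here is a rough user-space sketch of the pattern
the patch restores: if validation fails after the whole list has been
reserved, every reservation has to be dropped before the error is returned,
otherwise the caller leaves with locks still held (which is what lockdep
reported above). This is only a toy model. All toy_* names are made up for
illustration and are not the real ttm/radeon structures or functions, and
plain flags stand in for the ww_mutex-based reservations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not a real kernel type. */
struct toy_bo {
	int reserved;	/* 1 while the toy reservation is held */
	int fail;	/* set to make toy_validate() fail for this object */
};

/* Reserve every object in the list (always succeeds in this toy model). */
static int toy_reserve_all(struct toy_bo *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		list[i].reserved = 1;
	return 0;
}

/* Drop every reservation taken by toy_reserve_all(). */
static void toy_backoff_all(struct toy_bo *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		list[i].reserved = 0;
}

/* Validation requires the object to be reserved; fails on request. */
static int toy_validate(const struct toy_bo *bo)
{
	assert(bo->reserved);
	return bo->fail ? -1 : 0;
}

/* Count objects that are still reserved, to detect leaked reservations. */
static size_t toy_count_reserved(const struct toy_bo *list, size_t n)
{
	size_t held = 0;

	for (size_t i = 0; i < n; i++)
		held += list[i].reserved ? 1 : 0;
	return held;
}

/*
 * Reserve the whole list, then validate each object. On success the
 * reservations stay held for the caller to release later. On failure
 * they are all dropped before returning, which is the step that was
 * missing on the error path.
 */
static int toy_list_validate(struct toy_bo *list, size_t n)
{
	int r = toy_reserve_all(list, n);

	if (r)
		return r;

	for (size_t i = 0; i < n; i++) {
		r = toy_validate(&list[i]);
		if (r) {
			toy_backoff_all(list, n);
			return r;
		}
	}
	return 0;
}
```

Without the toy_backoff_all() call on the failure path, toy_count_reserved()
would still report held reservations after an error return, which is the toy
equivalent of "leaving the kernel with locks still held".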