* [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
@ 2015-02-03 10:49 Daniel Vetter
2015-02-03 10:50 ` Chris Wilson
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Daniel Vetter @ 2015-02-03 10:49 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala
You can _never_ assert that a lock is not held, except in some very
restricted corner cases where it's guranteed that your code is running
single-threade (e.g. driver load before you've published any pointers
leading to that lock).
In addition the early return breaks a bunch of testcases since with
highly concurrent hangcheck stress tests the reset fails to work and
the test doesn't recover and time out.
This regression has been introduced in
commit b8d24a06568368076ebd5a858a011699a97bfa42
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Jan 28 17:03:14 2015 +0200
drm/i915: Remove nested work in gpu error handling
Aside: It is possible to check whether a given task doesn't hold a
lock, but only when lockdep is enabled, using the lockdep_assert_held
stuff.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/i915/i915_irq.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 53c5f9e39fe3..4145d95902f5 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2612,9 +2612,6 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
va_list args;
char error_msg[80];
- if (WARN_ON(mutex_is_locked(&dev_priv->dev->struct_mutex)))
- return;
-
va_start(args, fmt);
vscnprintf(error_msg, sizeof(error_msg), fmt, args);
va_end(args);
--
2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
2015-02-03 10:49 [PATCH] drm/i915: Remove bogus locking check in the hangcheck code Daniel Vetter
@ 2015-02-03 10:50 ` Chris Wilson
2015-02-03 11:09 ` Daniel Vetter
2015-02-03 11:00 ` Chris Wilson
2015-02-03 13:57 ` shuang.he
2 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2015-02-03 10:50 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala
On Tue, Feb 03, 2015 at 11:49:00AM +0100, Daniel Vetter wrote:
> Aside: It is possible to check whether a given task doesn't hold a
> lock, but only when lockdep is enabled, using the lockdep_assert_held
> stuff.
Bah. That's what I said, but a certain Daniel insists on using WARN_ON().
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
2015-02-03 10:49 [PATCH] drm/i915: Remove bogus locking check in the hangcheck code Daniel Vetter
2015-02-03 10:50 ` Chris Wilson
@ 2015-02-03 11:00 ` Chris Wilson
2015-02-03 11:14 ` Daniel Vetter
2015-02-03 13:57 ` shuang.he
2 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2015-02-03 11:00 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala
On Tue, Feb 03, 2015 at 11:49:00AM +0100, Daniel Vetter wrote:
> You can _never_ assert that a lock is not held, except in some very
> restricted corner cases where it's guranteed that your code is running
> single-threade (e.g. driver load before you've published any pointers
> leading to that lock).
Except that the mistake here was that we thought we were already inside
the strictly single threaded recovery phase. Seems a bit blasé not to
mention that recovery includes several tricks to break locks.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
2015-02-03 10:50 ` Chris Wilson
@ 2015-02-03 11:09 ` Daniel Vetter
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2015-02-03 11:09 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala, Daniel Vetter
On Tue, Feb 03, 2015 at 10:50:27AM +0000, Chris Wilson wrote:
> On Tue, Feb 03, 2015 at 11:49:00AM +0100, Daniel Vetter wrote:
> > Aside: It is possible to check whether a given task doesn't hold a
> > lock, but only when lockdep is enabled, using the lockdep_assert_held
> > stuff.
>
> Bah. That's what I said, but a certain Daniel insists on using WARN_ON().
That is for the inverse check which warns when the lock is not taking,
i.e. WARN_ON(!mutex_is_locked). lockdep_assert_held would be more
accurate but also won't work at all if disabled. We run single-threaded by
accident often enough that the reduced practical coverage trumps the
reduced theoretical coverage imo.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
2015-02-03 11:00 ` Chris Wilson
@ 2015-02-03 11:14 ` Daniel Vetter
2015-02-03 11:17 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2015-02-03 11:14 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala, Daniel Vetter
On Tue, Feb 03, 2015 at 11:00:56AM +0000, Chris Wilson wrote:
> On Tue, Feb 03, 2015 at 11:49:00AM +0100, Daniel Vetter wrote:
> > You can _never_ assert that a lock is not held, except in some very
> > restricted corner cases where it's guranteed that your code is running
> > single-threade (e.g. driver load before you've published any pointers
> > leading to that lock).
>
> Except that the mistake here was that we thought we were already inside
> the strictly single threaded recovery phase. Seems a bit blasé not to
> mention that recovery includes several tricks to break locks.
Even if this check is after the wake_up calls it's still invalid, since
only until we actually try to grab the mutex with mutex_lock will we
enforce enough synchronization to stall for any other lock holders. The
scheduler is free to honor our wake_up whenever it pleases.
Hence I stand by my assertion that except in cases where it's trivially
true (i.e. driver load and no other cpu could have possible seen a pointer
to that lock yet) check for unlockedness is wrong. The only reliable way
is to grab the lock (and hang if there's a bug).
We've had this exact bug in the past with hangcheck years back when we
started to stress-test hangs: There was a mutex_trylock in the recovery
work and we bailed when that failed:
commit d54a02c041ccfdcfe3efcd1e5b90c6e8d5e7a8d9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jul 4 22:18:39 2012 +0200
drm/i915: don't trylock in the gpu reset code
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
2015-02-03 11:14 ` Daniel Vetter
@ 2015-02-03 11:17 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2015-02-03 11:17 UTC (permalink / raw)
To: Daniel Vetter
Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development,
Mika Kuoppala
On Tue, Feb 03, 2015 at 12:14:18PM +0100, Daniel Vetter wrote:
> On Tue, Feb 03, 2015 at 11:00:56AM +0000, Chris Wilson wrote:
> > On Tue, Feb 03, 2015 at 11:49:00AM +0100, Daniel Vetter wrote:
> > > You can _never_ assert that a lock is not held, except in some very
> > > restricted corner cases where it's guranteed that your code is running
> > > single-threade (e.g. driver load before you've published any pointers
> > > leading to that lock).
> >
> > Except that the mistake here was that we thought we were already inside
> > the strictly single threaded recovery phase. Seems a bit blasé not to
> > mention that recovery includes several tricks to break locks.
>
> Even if this check is after the wake_up calls it's still invalid, since
> only until we actually try to grab the mutex with mutex_lock will we
> enforce enough synchronization to stall for any other lock holders. The
> scheduler is free to honor our wake_up whenever it pleases.
Yes, that is exactly the reason why I pointed it out.
> Hence I stand by my assertion that except in cases where it's trivially
> true (i.e. driver load and no other cpu could have possible seen a pointer
> to that lock yet) check for unlockedness is wrong. The only reliable way
> is to grab the lock (and hang if there's a bug).
>
> We've had this exact bug in the past with hangcheck years back when we
> started to stress-test hangs: There was a mutex_trylock in the recovery
> work and we bailed when that failed:
>
> commit d54a02c041ccfdcfe3efcd1e5b90c6e8d5e7a8d9
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date: Wed Jul 4 22:18:39 2012 +0200
>
> drm/i915: don't trylock in the gpu reset code
Oh, can we please fix those unwanted -EIO.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Remove bogus locking check in the hangcheck code
2015-02-03 10:49 [PATCH] drm/i915: Remove bogus locking check in the hangcheck code Daniel Vetter
2015-02-03 10:50 ` Chris Wilson
2015-02-03 11:00 ` Chris Wilson
@ 2015-02-03 13:57 ` shuang.he
2 siblings, 0 replies; 7+ messages in thread
From: shuang.he @ 2015-02-03 13:57 UTC (permalink / raw)
To: shuang.he, ethan.gao, intel-gfx, daniel.vetter
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5702
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
PNV 283/283 283/283
ILK -1 319/319 318/319
SNB +1-1 322/346 322/346
IVB +1-1 382/384 382/384
BYT 296/296 296/296
HSW +1 425/428 426/428
BDW -1 319/333 318/333
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
ILK igt_gem_unfence_active_buffers DMESG_WARN(1, M37)PASS(3, M26) DMESG_WARN(1, M37)
*SNB igt_kms_flip_event_leak NSPT(7, M35M22) PASS(1, M35)
*SNB igt_kms_flip_single-buffer-flip-vs-dpms-off-vs-modeset PASS(2, M35) NSPT(1, M35)
*IVB igt_gem_pwrite_pread_snooped-copy-performance PASS(3, M34) DMESG_WARN(1, M34)
IVB igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance DMESG_WARN(11, M34M21)PASS(10, M4M34) PASS(1, M34)
HSW igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance DMESG_WARN(3, M40)PASS(24, M40M20) PASS(1, M40)
*BDW igt_gem_fence_thrash_bo-write-verify-threaded-none PASS(3, M30) DMESG_WARN(1, M30)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-02-03 13:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-03 10:49 [PATCH] drm/i915: Remove bogus locking check in the hangcheck code Daniel Vetter
2015-02-03 10:50 ` Chris Wilson
2015-02-03 11:09 ` Daniel Vetter
2015-02-03 11:00 ` Chris Wilson
2015-02-03 11:14 ` Daniel Vetter
2015-02-03 11:17 ` Chris Wilson
2015-02-03 13:57 ` shuang.he
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox