All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH] drm/i915: Mark up racy read of rq->engine
@ 2020-04-23 11:58 Chris Wilson
  2020-04-23 13:25 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for " Patchwork
  2020-04-23 14:53 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
  0 siblings, 2 replies; 4+ messages in thread
From: Chris Wilson @ 2020-04-23 11:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

As the i915_request.engine may be updated by a virtual engine to either
point to the virtual engine or the real physical engine on submission,
we have to be wary that the engine pointer may change.

[  213.317076] BUG: KCSAN: data-race in execlists_dequeue [i915] / i915_request_wait [i915]
[  213.317097]
[  213.317110] write (marked) to 0xffff8881e8647650 of 8 bytes by interrupt on cpu 2:
[  213.317386]  execlists_dequeue+0x43b/0x1670 [i915]
[  213.317645]  __execlists_submission_tasklet+0x48/0x60 [i915]
[  213.317905]  execlists_submission_tasklet+0xd3/0x170 [i915]
[  213.317926]  tasklet_action_common.isra.0+0x42/0x90
[  213.317943]  __do_softirq+0xc8/0x206
[  213.317958]  irq_exit+0xcd/0xe0
[  213.317980]  irq_work_interrupt+0xf/0x20
[  213.317999]  __tsan_read8+0x30/0x100
[  213.318272]  retire_requests+0xdd/0xf0 [i915]
[  213.318502]  engine_retire+0xa6/0xe0 [i915]
[  213.318519]  process_one_work+0x3af/0x640
[  213.318534]  worker_thread+0x80/0x670
[  213.318548]  kthread+0x19a/0x1e0
[  213.318566]  ret_from_fork+0x1f/0x30
[  213.318584]
[  213.318595] read to 0xffff8881e8647650 of 8 bytes by task 458 on cpu 1:
[  213.318847]  i915_request_wait+0x3e3/0x510 [i915]
[  213.319088]  i915_gem_object_wait_fence+0x81/0xa0 [i915]
[  213.319328]  i915_gem_object_wait+0x26b/0x560 [i915]
[  213.319578]  i915_gem_wait_ioctl+0x141/0x290 [i915]
[  213.319597]  drm_ioctl_kernel+0xe9/0x130
[  213.319613]  drm_ioctl+0x27d/0x45e
[  213.319628]  ksys_ioctl+0x89/0xb0
[  213.319648]  __x64_sys_ioctl+0x42/0x60
[  213.319666]  do_syscall_64+0x6e/0x2c0
[  213.319680]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

In this case, we are merely trying to flush the most recent engine
associated with the request, and do not care which as the process of
chaing the engine is done by a tasklet, with which we are yielding to.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 22635bbabf06..e9fd20242438 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1660,7 +1660,7 @@ long i915_request_wait(struct i915_request *rq,
 			break;
 		}
 
-		intel_engine_flush_submission(rq->engine);
+		intel_engine_flush_submission(READ_ONCE(rq->engine));
 
 		if (signal_pending_state(state, current)) {
 			timeout = -ERESTARTSYS;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Mark up racy read of rq->engine
  2020-04-23 11:58 [Intel-gfx] [PATCH] drm/i915: Mark up racy read of rq->engine Chris Wilson
@ 2020-04-23 13:25 ` Patchwork
  2020-04-23 14:53 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
  1 sibling, 0 replies; 4+ messages in thread
From: Patchwork @ 2020-04-23 13:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Mark up racy read of rq->engine
URL   : https://patchwork.freedesktop.org/series/76392/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8352 -> Patchwork_17439
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17439 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17439, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17439/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17439:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gt_pm:
    - fi-byt-j1900:       [PASS][1] -> [FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8352/fi-byt-j1900/igt@i915_selftest@live@gt_pm.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17439/fi-byt-j1900/igt@i915_selftest@live@gt_pm.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@workarounds:
    - {fi-tgl-dsi}:       [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8352/fi-tgl-dsi/igt@i915_selftest@live@workarounds.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17439/fi-tgl-dsi/igt@i915_selftest@live@workarounds.html

  
Known issues
------------

  Here are the changes found in Patchwork_17439 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@sanitycheck:
    - fi-bwr-2160:        [PASS][5] -> [INCOMPLETE][6] ([i915#489])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8352/fi-bwr-2160/igt@i915_selftest@live@sanitycheck.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17439/fi-bwr-2160/igt@i915_selftest@live@sanitycheck.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_lrc:
    - fi-snb-2600:        [FAIL][7] ([i915#1763]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8352/fi-snb-2600/igt@i915_selftest@live@gt_lrc.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17439/fi-snb-2600/igt@i915_selftest@live@gt_lrc.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1763]: https://gitlab.freedesktop.org/drm/intel/issues/1763
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489


Participating hosts (48 -> 43)
------------------------------

  Additional (1): fi-bsw-kefka 
  Missing    (6): fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8352 -> Patchwork_17439

  CI-20190529: 20190529
  CI_DRM_8352: 248cbab28d58c203de956df1db4cdeb53ea97a89 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5608: e7bcaf1dd251d454706c7cd64282f531aec50183 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17439: caf81f8e993e52fdcebf6e56807b254a65257613 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

caf81f8e993e drm/i915: Mark up racy read of rq->engine

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17439/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Mark up racy read of rq->engine
  2020-04-23 11:58 [Intel-gfx] [PATCH] drm/i915: Mark up racy read of rq->engine Chris Wilson
  2020-04-23 13:25 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for " Patchwork
@ 2020-04-23 14:53 ` Tvrtko Ursulin
  2020-04-23 15:11   ` Chris Wilson
  1 sibling, 1 reply; 4+ messages in thread
From: Tvrtko Ursulin @ 2020-04-23 14:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/04/2020 12:58, Chris Wilson wrote:
> As the i915_request.engine may be updated by a virtual engine to either
> point to the virtual engine or the real physical engine on submission,
> we have to be wary that the engine pointer may change.
> 
> [  213.317076] BUG: KCSAN: data-race in execlists_dequeue [i915] / i915_request_wait [i915]
> [  213.317097]
> [  213.317110] write (marked) to 0xffff8881e8647650 of 8 bytes by interrupt on cpu 2:
> [  213.317386]  execlists_dequeue+0x43b/0x1670 [i915]
> [  213.317645]  __execlists_submission_tasklet+0x48/0x60 [i915]
> [  213.317905]  execlists_submission_tasklet+0xd3/0x170 [i915]
> [  213.317926]  tasklet_action_common.isra.0+0x42/0x90
> [  213.317943]  __do_softirq+0xc8/0x206
> [  213.317958]  irq_exit+0xcd/0xe0
> [  213.317980]  irq_work_interrupt+0xf/0x20
> [  213.317999]  __tsan_read8+0x30/0x100
> [  213.318272]  retire_requests+0xdd/0xf0 [i915]
> [  213.318502]  engine_retire+0xa6/0xe0 [i915]
> [  213.318519]  process_one_work+0x3af/0x640
> [  213.318534]  worker_thread+0x80/0x670
> [  213.318548]  kthread+0x19a/0x1e0
> [  213.318566]  ret_from_fork+0x1f/0x30
> [  213.318584]
> [  213.318595] read to 0xffff8881e8647650 of 8 bytes by task 458 on cpu 1:
> [  213.318847]  i915_request_wait+0x3e3/0x510 [i915]
> [  213.319088]  i915_gem_object_wait_fence+0x81/0xa0 [i915]
> [  213.319328]  i915_gem_object_wait+0x26b/0x560 [i915]
> [  213.319578]  i915_gem_wait_ioctl+0x141/0x290 [i915]
> [  213.319597]  drm_ioctl_kernel+0xe9/0x130
> [  213.319613]  drm_ioctl+0x27d/0x45e
> [  213.319628]  ksys_ioctl+0x89/0xb0
> [  213.319648]  __x64_sys_ioctl+0x42/0x60
> [  213.319666]  do_syscall_64+0x6e/0x2c0
> [  213.319680]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> In this case, we are merely trying to flush the most recent engine
> associated with the request, and do not care which as the process of
> chaing the engine is done by a tasklet, with which we are yielding to.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 22635bbabf06..e9fd20242438 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1660,7 +1660,7 @@ long i915_request_wait(struct i915_request *rq,
>   			break;
>   		}
>   
> -		intel_engine_flush_submission(rq->engine);
> +		intel_engine_flush_submission(READ_ONCE(rq->engine));
>   
>   		if (signal_pending_state(state, current)) {
>   			timeout = -ERESTARTSYS;
> 

What with the mutex_acquire/release in this case? No practical effect 
but they are also dereferencing rq->engine... Take a copy of engine for 
lockdep at start and another read for engine flushing in the loop?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Mark up racy read of rq->engine
  2020-04-23 14:53 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
@ 2020-04-23 15:11   ` Chris Wilson
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Wilson @ 2020-04-23 15:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-04-23 15:53:44)
> 
> On 23/04/2020 12:58, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 22635bbabf06..e9fd20242438 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1660,7 +1660,7 @@ long i915_request_wait(struct i915_request *rq,
> >                       break;
> >               }
> >   
> > -             intel_engine_flush_submission(rq->engine);
> > +             intel_engine_flush_submission(READ_ONCE(rq->engine));
> >   
> >               if (signal_pending_state(state, current)) {
> >                       timeout = -ERESTARTSYS;
> > 
> 
> What with the mutex_acquire/release in this case? No practical effect 
> but they are also dereferencing rq->engine... Take a copy of engine for 
> lockdep at start and another read for engine flushing in the loop?

No practical difference [today], since the lockmap is on the gt.

Well that will be interesting in the future. Hmm, we could replace it
with a static [global] lockmap and then link all gt->reset.mutex to.
The only danger then is that we link a wait on one gt with a wait on
another -- except that if we allow it, that will happen naturally.
So I don't see a loss in generality in using a global lockmap.

Make a note for the future.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-04-23 15:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-04-23 11:58 [Intel-gfx] [PATCH] drm/i915: Mark up racy read of rq->engine Chris Wilson
2020-04-23 13:25 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for " Patchwork
2020-04-23 14:53 ` [Intel-gfx] [PATCH] " Tvrtko Ursulin
2020-04-23 15:11   ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.