* [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
@ 2020-03-25 12:02 Chris Wilson
2020-03-25 12:02 ` [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work Chris Wilson
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Chris Wilson @ 2020-03-25 12:02 UTC (permalink / raw)
To: intel-gfx
We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.
v2: Document the read of pending[0] from outside the tasklet with
READ_ONCE.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f88d3b95c4e1..d49baade0986 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2891,6 +2891,13 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
if (reset_in_progress(execlists))
return; /* defer until we restart the engine following reset */
+ /* Hopefully we clear execlists->pending[] to let us through */
+ if (READ_ONCE(execlists->pending[0]) &&
+ tasklet_trylock(&execlists->tasklet)) {
+ process_csb(engine);
+ tasklet_unlock(&execlists->tasklet);
+ }
+
__execlists_submission_tasklet(engine);
}
--
2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 10+ messages in thread* [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work 2020-03-25 12:02 [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson @ 2020-03-25 12:02 ` Chris Wilson 2020-03-25 12:58 ` Tvrtko Ursulin 2020-03-25 12:42 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Patchwork 2020-03-25 12:56 ` [Intel-gfx] [PATCH 1/2] " Tvrtko Ursulin 2 siblings, 1 reply; 10+ messages in thread From: Chris Wilson @ 2020-03-25 12:02 UTC (permalink / raw) To: intel-gfx If the caller allows and we do not have to wait for any signals, immediately execute the work within the caller's process. By doing so we avoid the overhead of scheduling a new task, and the latency in executing it, at the cost of pulling that work back into the immediate context. (Sometimes we still prefer to offload the task to another cpu, especially if we plan on executing many such tasks in parallel for this client.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/i915_sw_fence_work.c | 5 +++- drivers/gpu/drm/i915/i915_sw_fence_work.h | 23 +++++++++++++++++++ drivers/gpu/drm/i915/i915_vma.c | 2 +- 4 files changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 6b3013d20851..c643eec4dca0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1822,7 +1822,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, dma_resv_add_excl_fence(shadow->resv, &pw->base.dma); dma_resv_unlock(shadow->resv); - dma_fence_work_commit(&pw->base); + dma_fence_work_commit_imm(&pw->base); return 0; err_batch_unlock: diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index 997b2998f1f2..a3a81bb8f2c3 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) if (!f->dma.error) { dma_fence_get(&f->dma); - queue_work(system_unbound_wq, &f->work); + if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) + fence_work(&f->work); + else + queue_work(system_unbound_wq, &f->work); } else { fence_complete(f); } diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h index 3a22b287e201..2c409f11c5c5 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h @@ -32,6 +32,10 @@ struct dma_fence_work { const struct dma_fence_work_ops *ops; }; +enum { + DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS, +}; + void dma_fence_work_init(struct dma_fence_work *f, const struct dma_fence_work_ops *ops); int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal); @@ -41,4 +45,23 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f) i915_sw_fence_commit(&f->chain); } +/** + * dma_fence_work_commit_imm: Commit the fence, and if possible execute locally. + * @f: the fenced worker + * + * Instead of always scheduling a worker to execute the callback (see + * dma_fence_work_commit()), we try to execute the callback immediately in + * the local context. It is required that the fence be committed before it + * is published, and that no other threads try to tamper with the number + * of asynchronous waits on the fence (or else the callback will be + * executed in the wrong context, i.e. not the callers). + */ +static inline void dma_fence_work_commit_imm(struct dma_fence_work *f) +{ + if (atomic_read(&f->chain.pending) <= 1) + __set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags); + + dma_fence_work_commit(f); +} + #endif /* I915_SW_FENCE_WORK_H */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 08699fa069aa..191577a98390 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -980,7 +980,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) mutex_unlock(&vma->vm->mutex); err_fence: if (work) - dma_fence_work_commit(&work->base); + dma_fence_work_commit_imm(&work->base); if (wakeref) intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref); err_pages: -- 2.20.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work 2020-03-25 12:02 ` [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work Chris Wilson @ 2020-03-25 12:58 ` Tvrtko Ursulin 0 siblings, 0 replies; 10+ messages in thread From: Tvrtko Ursulin @ 2020-03-25 12:58 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 25/03/2020 12:02, Chris Wilson wrote: > If the caller allows and we do not have to wait for any signals, > immediately execute the work within the caller's process. By doing so we > avoid the overhead of scheduling a new task, and the latency in > executing it, at the cost of pulling that work back into the immediate > context. (Sometimes we still prefer to offload the task to another cpu, > especially if we plan on executing many such tasks in parallel for this > client.) > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- > drivers/gpu/drm/i915/i915_sw_fence_work.c | 5 +++- > drivers/gpu/drm/i915/i915_sw_fence_work.h | 23 +++++++++++++++++++ > drivers/gpu/drm/i915/i915_vma.c | 2 +- > 4 files changed, 29 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > index 6b3013d20851..c643eec4dca0 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > @@ -1822,7 +1822,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, > dma_resv_add_excl_fence(shadow->resv, &pw->base.dma); > dma_resv_unlock(shadow->resv); > > - dma_fence_work_commit(&pw->base); > + dma_fence_work_commit_imm(&pw->base); > return 0; > > err_batch_unlock: > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c > index 997b2998f1f2..a3a81bb8f2c3 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c > @@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) > > if (!f->dma.error) { > dma_fence_get(&f->dma); > - queue_work(system_unbound_wq, &f->work); > + if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) > + fence_work(&f->work); > + else > + queue_work(system_unbound_wq, &f->work); > } else { > fence_complete(f); > } > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h > index 3a22b287e201..2c409f11c5c5 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h > @@ -32,6 +32,10 @@ struct dma_fence_work { > const struct dma_fence_work_ops *ops; > }; > > +enum { > + DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS, > +}; > + > void dma_fence_work_init(struct dma_fence_work *f, > const struct dma_fence_work_ops *ops); > int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal); > @@ -41,4 +45,23 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f) > i915_sw_fence_commit(&f->chain); > } > > +/** > + * dma_fence_work_commit_imm: Commit the fence, and if possible execute locally. > + * @f: the fenced worker > + * > + * Instead of always scheduling a worker to execute the callback (see > + * dma_fence_work_commit()), we try to execute the callback immediately in > + * the local context. It is required that the fence be committed before it > + * is published, and that no other threads try to tamper with the number > + * of asynchronous waits on the fence (or else the callback will be > + * executed in the wrong context, i.e. not the callers). > + */ > +static inline void dma_fence_work_commit_imm(struct dma_fence_work *f) > +{ > + if (atomic_read(&f->chain.pending) <= 1) > + __set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags); > + > + dma_fence_work_commit(f); > +} > + > #endif /* I915_SW_FENCE_WORK_H */ > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c > index 08699fa069aa..191577a98390 100644 > --- a/drivers/gpu/drm/i915/i915_vma.c > +++ b/drivers/gpu/drm/i915/i915_vma.c > @@ -980,7 +980,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) > mutex_unlock(&vma->vm->mutex); > err_fence: > if (work) > - dma_fence_work_commit(&work->base); > + dma_fence_work_commit_imm(&work->base); > if (wakeref) > intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref); > err_pages: > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 2020-03-25 12:02 [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson 2020-03-25 12:02 ` [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work Chris Wilson @ 2020-03-25 12:42 ` Patchwork 2020-03-25 12:56 ` [Intel-gfx] [PATCH 1/2] " Tvrtko Ursulin 2 siblings, 0 replies; 10+ messages in thread From: Patchwork @ 2020-03-25 12:42 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: series starting with [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission URL : https://patchwork.freedesktop.org/series/75068/ State : success == Summary == CI Bug Log - changes from CI_DRM_8186 -> Patchwork_17083 ==================================================== Summary ------- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17083/index.html Known issues ------------ Here are the changes found in Patchwork_17083 that come from known issues: ### IGT changes ### #### Possible fixes #### * igt@i915_selftest@live@gem_contexts: - fi-cfl-guc: [DMESG-FAIL][1] ([i915#730] / [i915#933]) -> [PASS][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8186/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17083/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html [i915#730]: https://gitlab.freedesktop.org/drm/intel/issues/730 [i915#933]: https://gitlab.freedesktop.org/drm/intel/issues/933 Participating hosts (44 -> 39) ------------------------------ Additional (4): fi-skl-lmem fi-bwr-2160 fi-skl-6600u fi-snb-2600 Missing (9): fi-bdw-5557u fi-hsw-4200u fi-byt-j1900 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-elk-e7500 fi-blb-e6850 fi-bdw-samus Build changes ------------- * CI: CI-20190529 -> None * Linux: CI_DRM_8186 -> Patchwork_17083 CI-20190529: 20190529 CI_DRM_8186: 1abdd8c4027177dd054471cf58a5e9ec3f2df46d @ git://anongit.freedesktop.org/gfx-ci/linux IGT_5537: 190245120758e754813d76b2c6c613413a0dba29 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_17083: 0eba3cb69652551e1a54e32cad9c830e932d7683 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == 0eba3cb69652 drm/i915: Immediately execute the fenced work 7a0b5667f8c9 drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17083/index.html _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 2020-03-25 12:02 [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson 2020-03-25 12:02 ` [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work Chris Wilson 2020-03-25 12:42 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Patchwork @ 2020-03-25 12:56 ` Tvrtko Ursulin 2 siblings, 0 replies; 10+ messages in thread From: Tvrtko Ursulin @ 2020-03-25 12:56 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 25/03/2020 12:02, Chris Wilson wrote: > We dropped calling process_csb prior to handling direct submission in > order to avoid the nesting of spinlocks and lift process_csb() and the > majority of the tasklet out of irq-off. However, we do want to avoid > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > local to direct submission if we can acquire the tasklet's lock. > > v2: Document the read of pending[0] from outside the tasklet with > READ_ONCE. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index f88d3b95c4e1..d49baade0986 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -2891,6 +2891,13 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > if (reset_in_progress(execlists)) > return; /* defer until we restart the engine following reset */ > > + /* Hopefully we clear execlists->pending[] to let us through */ > + if (READ_ONCE(execlists->pending[0]) && > + tasklet_trylock(&execlists->tasklet)) { > + process_csb(engine); > + tasklet_unlock(&execlists->tasklet); > + } > + > __execlists_submission_tasklet(engine); > } > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission @ 2020-03-24 12:07 Chris Wilson 2020-03-24 16:04 ` Tvrtko Ursulin 2020-03-24 16:11 ` Tvrtko Ursulin 0 siblings, 2 replies; 10+ messages in thread From: Chris Wilson @ 2020-03-24 12:07 UTC (permalink / raw) To: intel-gfx We dropped calling process_csb prior to handling direct submission in order to avoid the nesting of spinlocks and lift process_csb() and the majority of the tasklet out of irq-off. However, we do want to avoid ksoftirqd latency in the fast path, so try and pull the interrupt-bh local to direct submission if we can acquire the tasklet's lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> --- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 210f60e14ef4..82dee2141b46 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) if (reset_in_progress(execlists)) return; /* defer until we restart the engine following reset */ + /* Hopefully we clear execlists->pending[] to let us through */ + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) { + process_csb(engine); + tasklet_unlock(&execlists->tasklet); + } + __execlists_submission_tasklet(engine); } -- 2.20.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 2020-03-24 12:07 Chris Wilson @ 2020-03-24 16:04 ` Tvrtko Ursulin 2020-03-24 16:28 ` Chris Wilson 2020-03-24 16:11 ` Tvrtko Ursulin 1 sibling, 1 reply; 10+ messages in thread From: Tvrtko Ursulin @ 2020-03-24 16:04 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 24/03/2020 12:07, Chris Wilson wrote: > We dropped calling process_csb prior to handling direct submission in > order to avoid the nesting of spinlocks and lift process_csb() and the > majority of the tasklet out of irq-off. However, we do want to avoid > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > local to direct submission if we can acquire the tasklet's lock. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 210f60e14ef4..82dee2141b46 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > if (reset_in_progress(execlists)) > return; /* defer until we restart the engine following reset */ > > + /* Hopefully we clear execlists->pending[] to let us through */ > + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) Does access to pending needs a READ_ONCE? { > + process_csb(engine); > + tasklet_unlock(&execlists->tasklet); > + } > + > __execlists_submission_tasklet(engine); > } > > __execlists_submission_tasklet does check with READ_ONCE. I think locking is fine, given how normal flow is tasklet -> irqsave engine lock, and here we have the reverse, but a trylock. Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 2020-03-24 16:04 ` Tvrtko Ursulin @ 2020-03-24 16:28 ` Chris Wilson 0 siblings, 0 replies; 10+ messages in thread From: Chris Wilson @ 2020-03-24 16:28 UTC (permalink / raw) To: Tvrtko Ursulin, intel-gfx Quoting Tvrtko Ursulin (2020-03-24 16:04:47) > > On 24/03/2020 12:07, Chris Wilson wrote: > > We dropped calling process_csb prior to handling direct submission in > > order to avoid the nesting of spinlocks and lift process_csb() and the > > majority of the tasklet out of irq-off. However, we do want to avoid > > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > > local to direct submission if we can acquire the tasklet's lock. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 210f60e14ef4..82dee2141b46 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > > if (reset_in_progress(execlists)) > > return; /* defer until we restart the engine following reset */ > > > > + /* Hopefully we clear execlists->pending[] to let us through */ > > + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) > > Does access to pending needs a READ_ONCE? Yes, we are peeking outside of the tasklet. > { > > + process_csb(engine); > > + tasklet_unlock(&execlists->tasklet); > > + } > > + > > __execlists_submission_tasklet(engine); > > } > > > > > > __execlists_submission_tasklet does check with READ_ONCE. > > I think locking is fine, given how normal flow is tasklet -> irqsave > engine lock, and here we have the reverse, but a trylock. And inside process_csb() we don't take the active lock, so we just need to serialise calls to process_csb(). We don't strictly rely on it as it's just an optimisation for latency so we can trylock and not worry about not making forward progress -- the tasklet *running* but waiting on the execlists lock is sufficient to ensure that a dequeue will happen after this point if we can't do it ourselves. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 2020-03-24 12:07 Chris Wilson 2020-03-24 16:04 ` Tvrtko Ursulin @ 2020-03-24 16:11 ` Tvrtko Ursulin 2020-03-24 16:32 ` Chris Wilson 1 sibling, 1 reply; 10+ messages in thread From: Tvrtko Ursulin @ 2020-03-24 16:11 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 24/03/2020 12:07, Chris Wilson wrote: > We dropped calling process_csb prior to handling direct submission in > order to avoid the nesting of spinlocks and lift process_csb() and the > majority of the tasklet out of irq-off. However, we do want to avoid > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > local to direct submission if we can acquire the tasklet's lock. Oh and important question - who benefits and how much? Regards, Tvrtko > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 210f60e14ef4..82dee2141b46 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > if (reset_in_progress(execlists)) > return; /* defer until we restart the engine following reset */ > > + /* Hopefully we clear execlists->pending[] to let us through */ > + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) { > + process_csb(engine); > + tasklet_unlock(&execlists->tasklet); > + } > + > __execlists_submission_tasklet(engine); > } > > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission 2020-03-24 16:11 ` Tvrtko Ursulin @ 2020-03-24 16:32 ` Chris Wilson 0 siblings, 0 replies; 10+ messages in thread From: Chris Wilson @ 2020-03-24 16:32 UTC (permalink / raw) To: Tvrtko Ursulin, intel-gfx Quoting Tvrtko Ursulin (2020-03-24 16:11:10) > > On 24/03/2020 12:07, Chris Wilson wrote: > > We dropped calling process_csb prior to handling direct submission in > > order to avoid the nesting of spinlocks and lift process_csb() and the > > majority of the tasklet out of irq-off. However, we do want to avoid > > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > > local to direct submission if we can acquire the tasklet's lock. > > Oh and important question - who benefits and how much? Anything that doesn't want to be deferred to a tasklet like wsim, where it helps fix a small regression in tip. That and avoiding the worker where we didn't use to have one. Did not see a difference in syslatency, so that's still a bone of contention for direction submission. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-03-25 12:58 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-03-25 12:02 [Intel-gfx] [PATCH 1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson 2020-03-25 12:02 ` [Intel-gfx] [PATCH 2/2] drm/i915: Immediately execute the fenced work Chris Wilson 2020-03-25 12:58 ` Tvrtko Ursulin 2020-03-25 12:42 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Patchwork 2020-03-25 12:56 ` [Intel-gfx] [PATCH 1/2] " Tvrtko Ursulin -- strict thread matches above, loose matches on Subject: below -- 2020-03-24 12:07 Chris Wilson 2020-03-24 16:04 ` Tvrtko Ursulin 2020-03-24 16:28 ` Chris Wilson 2020-03-24 16:11 ` Tvrtko Ursulin 2020-03-24 16:32 ` Chris Wilson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.