All of lore.kernel.org
 help / color / mirror / Atom feed
From: Imre Deak <imre.deak@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	Intel-gfx@lists.freedesktop.org
Subject: Re: [RFC v2] drm/i915: Move execlists irq handler to a bottom half
Date: Thu, 24 Mar 2016 17:56:40 +0200	[thread overview]
Message-ID: <1458835000.20611.14.camel@intel.com> (raw)
In-Reply-To: <1458745056-25673-1-git-send-email-tvrtko.ursulin@linux.intel.com>

On ke, 2016-03-23 at 14:57 +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Doing a lot of work in the interrupt handler introduces huge
> latencies to the system as a whole.
> 
> Most dramatic effect can be seen by running an all engine
> stress test like igt/gem_exec_nop/all where, when the kernel
> config is lean enough, the whole system can be brought into
> multi-second periods of complete non-interactivty. That can
> look for example like this:
> 
>  NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> [kworker/u8:3:143]
>  Modules linked in: [redacted for brevity]
>  CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-
> 160321+ #183
>  Hardware name: Intel Corporation Broadwell Client platform/WhiteTip
> Mountain 1
>  Workqueue: i915 gen6_pm_rps_work [i915]
>  task: ffff8800aae88000 ti: ffff8800aae90000 task.ti:
> ffff8800aae90000
>  RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>]
> __do_softirq+0x72/0x1d0
>  RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
>  RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
>  RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
>  RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
>  R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
>  R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
>  FS:  0000000000000000(0000) GS:ffff88014f400000(0000)
> knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Stack:
>   042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
>   0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
>   0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
>  Call Trace:
>   <IRQ>
>   [<ffffffff8104a716>] irq_exit+0x86/0x90
>   [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
>   [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
>   <EOI>
>   [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
>   [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
>   [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
>   [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
>   [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
>   [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
>   [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
>   [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
>   [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
>   [<ffffffff8105ab29>] process_one_work+0x139/0x350
>   [<ffffffff8105b186>] worker_thread+0x126/0x490
>   [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
>   [<ffffffff8105fa64>] kthread+0xc4/0xe0
>   [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
>   [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
>   [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
> 
> I could not explain, or find a code path, which would explain
> a +20 second lockup, but from some instrumentation it was
> apparent the interrupts off proportion of time was between
> 10-25% under heavy load which is quite bad.
> 
> By moving the GT interrupt handling to a tasklet in a most
> simple way, the problem above disappears completely.
> 
> Also, gem_latency -n 100 shows 25% better throughput and CPU
> usage, and 14% better latencies.
> 
> I did not find any gains or regressions with Synmark2 or
> GLbench under light testing. More benchmarking is certainly
> required.
> 
> v2:
>    * execlists_lock should be taken as spin_lock_bh when
>      queuing work from userspace now. (Chris Wilson)
>    * uncore.lock must be taken with spin_lock_irq when
>      submitting requests since that now runs from either
>      softirq or process context.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>

You also have to synchronize against the tasklet now whenever we
synchronize against the IRQ, see gen6_disable_rps_interrupts(),
gen8_irq_power_well_pre_disable() and
intel_runtime_pm_disable_interrupts(). Not saying you should use a
threaded IRQ instead, but it does provide for this automatically.

--Imre

> ---
>  drivers/gpu/drm/i915/i915_irq.c         |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c        | 24 ++++++++++++++---------
> -
>  drivers/gpu/drm/i915/intel_lrc.h        |  1 -
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  4 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c
> b/drivers/gpu/drm/i915/i915_irq.c
> index 8f3e3309c3ab..e68134347007 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1324,7 +1324,7 @@ gen8_cs_irq_handler(struct intel_engine_cs
> *engine, u32 iir, int test_shift)
>  	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift))
>  		notify_ring(engine);
>  	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift))
> -		intel_lrc_irq_handler(engine);
> +		tasklet_schedule(&engine->irq_tasklet);
>  }
>  
>  static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private
> *dev_priv,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> index 67592f8354d6..b3b62b3cd90d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -418,20 +418,18 @@ static void execlists_submit_requests(struct
> drm_i915_gem_request *rq0,
>  {
>  	struct drm_i915_private *dev_priv = rq0->i915;
>  
> -	/* BUG_ON(!irqs_disabled());  */
> -
>  	execlists_update_context(rq0);
>  
>  	if (rq1)
>  		execlists_update_context(rq1);
>  
> -	spin_lock(&dev_priv->uncore.lock);
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
>  
>  	execlists_elsp_write(rq0, rq1);
>  
>  	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
> -	spin_unlock(&dev_priv->uncore.lock);
> +	spin_unlock_irq(&dev_priv->uncore.lock);
>  }
>  
>  static void execlists_context_unqueue(struct intel_engine_cs
> *engine)
> @@ -536,13 +534,14 @@ get_context_status(struct drm_i915_private
> *dev_priv, u32 csb_base,
>  
>  /**
>   * intel_lrc_irq_handler() - handle Context Switch interrupts
> - * @ring: Engine Command Streamer to handle.
> + * @engine: Engine Command Streamer to handle.
>   *
>   * Check the unread Context Status Buffers and manage the submission
> of new
>   * contexts to the ELSP accordingly.
>   */
> -void intel_lrc_irq_handler(struct intel_engine_cs *engine)
> +void intel_lrc_irq_handler(unsigned long data)
>  {
> +	struct intel_engine_cs *engine = (struct intel_engine_cs
> *)data;
>  	struct drm_i915_private *dev_priv = engine->dev-
> >dev_private;
>  	u32 status_pointer;
>  	unsigned int read_pointer, write_pointer;
> @@ -551,7 +550,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs
> *engine)
>  	unsigned int csb_read = 0, i;
>  	unsigned int submit_contexts = 0;
>  
> -	spin_lock(&dev_priv->uncore.lock);
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
>  
>  	status_pointer =
> I915_READ_FW(RING_CONTEXT_STATUS_PTR(engine));
> @@ -579,7 +578,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs
> *engine)
>  				    engine-
> >next_context_status_buffer << 8));
>  
>  	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
> -	spin_unlock(&dev_priv->uncore.lock);
> +	spin_unlock_irq(&dev_priv->uncore.lock);
>  
>  	spin_lock(&engine->execlist_lock);
>  
> @@ -621,7 +620,7 @@ static void execlists_context_queue(struct
> drm_i915_gem_request *request)
>  
>  	i915_gem_request_reference(request);
>  
> -	spin_lock_irq(&engine->execlist_lock);
> +	spin_lock_bh(&engine->execlist_lock);
>  
>  	list_for_each_entry(cursor, &engine->execlist_queue,
> execlist_link)
>  		if (++num_elements > 2)
> @@ -646,7 +645,7 @@ static void execlists_context_queue(struct
> drm_i915_gem_request *request)
>  	if (num_elements == 0)
>  		execlists_context_unqueue(engine);
>  
> -	spin_unlock_irq(&engine->execlist_lock);
> +	spin_unlock_bh(&engine->execlist_lock);
>  }
>  
>  static int logical_ring_invalidate_all_caches(struct
> drm_i915_gem_request *req)
> @@ -2016,6 +2015,8 @@ void intel_logical_ring_cleanup(struct
> intel_engine_cs *engine)
>  	if (!intel_engine_initialized(engine))
>  		return;
>  
> +	tasklet_kill(&engine->irq_tasklet);
> +
>  	dev_priv = engine->dev->dev_private;
>  
>  	if (engine->buffer) {
> @@ -2089,6 +2090,9 @@ logical_ring_init(struct drm_device *dev,
> struct intel_engine_cs *engine)
>  	INIT_LIST_HEAD(&engine->execlist_retired_req_list);
>  	spin_lock_init(&engine->execlist_lock);
>  
> +	tasklet_init(&engine->irq_tasklet, intel_lrc_irq_handler,
> +		     (unsigned long)engine);
> +
>  	logical_ring_init_platform_invariants(engine);
>  
>  	ret = i915_cmd_parser_init_ring(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h
> b/drivers/gpu/drm/i915/intel_lrc.h
> index 6690d93d603f..efcbd7bf9cc9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -123,7 +123,6 @@ int intel_execlists_submission(struct
> i915_execbuffer_params *params,
>  			       struct drm_i915_gem_execbuffer2
> *args,
>  			       struct list_head *vmas);
>  
> -void intel_lrc_irq_handler(struct intel_engine_cs *engine);
>  void intel_execlists_retire_requests(struct intel_engine_cs
> *engine);
>  
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
> b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 221a94627aab..29810cba8a8c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -266,6 +266,7 @@ struct  intel_engine_cs {
>  	} semaphore;
>  
>  	/* Execlists */
> +	struct tasklet_struct irq_tasklet;
>  	spinlock_t execlist_lock;
>  	struct list_head execlist_queue;
>  	struct list_head execlist_retired_req_list;
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2016-03-24 15:56 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-22 17:30 [RFC] drm/i915: Move execlists irq handler to a bottom half Tvrtko Ursulin
2016-03-23  7:02 ` ✓ Fi.CI.BAT: success for " Patchwork
2016-03-23  9:07 ` [RFC] " Daniel Vetter
2016-03-23  9:14   ` Chris Wilson
2016-03-23 10:08     ` Tvrtko Ursulin
2016-03-23 11:31       ` Chris Wilson
2016-03-23 12:46         ` Tvrtko Ursulin
2016-03-23 12:56           ` Chris Wilson
2016-03-23 15:23             ` Tvrtko Ursulin
2016-03-24  9:37               ` Tvrtko Ursulin
2016-03-24 12:18             ` [PATCH v3] " Tvrtko Ursulin
2016-03-24 22:24               ` Chris Wilson
2016-03-25 12:56                 ` Chris Wilson
2016-04-04 11:11                   ` [PATCH v4] " Tvrtko Ursulin
2016-04-04 11:27                     ` Chris Wilson
2016-04-04 12:51                       ` Tvrtko Ursulin
2016-03-23 14:57 ` [RFC v2] " Tvrtko Ursulin
2016-03-24 10:56   ` Chris Wilson
2016-03-24 11:50     ` Tvrtko Ursulin
2016-03-24 12:58       ` Tvrtko Ursulin
2016-03-24 15:56   ` Imre Deak [this message]
2016-03-24 16:05     ` Chris Wilson
2016-03-24 16:40       ` Imre Deak
2016-03-24 19:56         ` Chris Wilson
2016-03-24 22:13           ` Imre Deak
2016-03-23 15:32 ` ✗ Fi.CI.BAT: failure for drm/i915: Move execlists irq handler to a bottom half (rev2) Patchwork
2016-03-24 14:03 ` ✗ Fi.CI.BAT: warning for drm/i915: Move execlists irq handler to a bottom half (rev3) Patchwork
2016-03-24 15:17   ` Tvrtko Ursulin
2016-04-04 12:33 ` ✗ Fi.CI.BAT: failure for drm/i915: Move execlists irq handler to a bottom half (rev4) Patchwork
2016-04-04 12:42   ` Tvrtko Ursulin
2016-04-04 12:53     ` Chris Wilson
2016-04-04 13:14       ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1458835000.20611.14.camel@intel.com \
    --to=imre.deak@intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.