All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Intel-gfx@lists.freedesktop.org,
	Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: Re: [RFC v2] drm/i915: Move execlists irq handler to a bottom half
Date: Thu, 24 Mar 2016 11:50:38 +0000	[thread overview]
Message-ID: <56F3D48E.7000303@linux.intel.com> (raw)
In-Reply-To: <20160324105614.GB27742@nuc-i3427.alporthouse.com>


On 24/03/16 10:56, Chris Wilson wrote:
> On Wed, Mar 23, 2016 at 02:57:36PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Doing a lot of work in the interrupt handler introduces huge
>> latencies to the system as a whole.
>>
>> Most dramatic effect can be seen by running an all engine
>> stress test like igt/gem_exec_nop/all where, when the kernel
>> config is lean enough, the whole system can be brought into
>> multi-second periods of complete non-interactivty. That can
>> look for example like this:
>>
>>   NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
>>   Modules linked in: [redacted for brevity]
>>   CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
>>   Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
>>   Workqueue: i915 gen6_pm_rps_work [i915]
>>   task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
>>   RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
>>   RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
>>   RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
>>   RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
>>   RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
>>   R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
>>   R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
>>   FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   Stack:
>>    042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
>>    0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
>>    0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
>>   Call Trace:
>>    <IRQ>
>>    [<ffffffff8104a716>] irq_exit+0x86/0x90
>>    [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
>>    [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
>>    <EOI>
>>    [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
>>    [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
>>    [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
>>    [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
>>    [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
>>    [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
>>    [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
>>    [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
>>    [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
>>    [<ffffffff8105ab29>] process_one_work+0x139/0x350
>>    [<ffffffff8105b186>] worker_thread+0x126/0x490
>>    [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
>>    [<ffffffff8105fa64>] kthread+0xc4/0xe0
>>    [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
>>    [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
>>    [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
>>
>> I could not explain, or find a code path, which would explain
>> a +20 second lockup, but from some instrumentation it was
>> apparent the interrupts off proportion of time was between
>> 10-25% under heavy load which is quite bad.
>>
>> By moving the GT interrupt handling to a tasklet in a most
>> simple way, the problem above disappears completely.
>
> Perfect segue into gem_syslatency. I think gem_syslatency is the better
> tool to correlate disruptive system behaviour. And then continue on with
> gem_latency to demonstrate that is doesn't adversely affect our
> performance.

Will do.

>> Also, gem_latency -n 100 shows 25% better throughput and CPU
>> usage, and 14% better latencies.
>
> Mention the benefits of parallelising dispatch.

Hm, actually this should be the same as before I think.

> As fairly hit-and-miss as perf testing is on these machines, it is
> looking in favour of using tasklets vs the rt kthread. The numbers swing
> between 2-10%, but consistently improves in the nop sync latencies.
> There's still several hours to go in this run before we cover the
> dispatch latenies, but so far reasonable.
>
> (Hmm, looks like there may be a possible degredation on the single nop
> dispatch but an improvement on the continuous nop dispatch.)

We can add all the numbers you get to the commit message as well.

>> I did not find any gains or regressions with Synmark2 or
>> GLbench under light testing. More benchmarking is certainly
>> required.
>>
>> v2:
>>     * execlists_lock should be taken as spin_lock_bh when
>>       queuing work from userspace now. (Chris Wilson)
>>     * uncore.lock must be taken with spin_lock_irq when
>>       submitting requests since that now runs from either
>>       softirq or process context.
>
> There are a couple of execlist_lock usage outside of intel_lrc that may
> or may not be useful to convert (low frequency reset / debug paths, so
> way off the critical paths, but consistency in locking is invaluable).

Oh right, I've missed those.

>
>> +	tasklet_init(&engine->irq_tasklet, intel_lrc_irq_handler,
>> +		     (unsigned long)engine);
>
> I like trying to split lines to cluster arguments if possible. Here I
> think intel_lrc_irq_handler pairs with engine,
>
> 	tasklet_init(&engine->irq_tasklet,
> 		     intel_lrc_irq_handler, (unsigned long)engine);
>
> *shrug*

Yeah it is nicer.

>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 221a94627aab..29810cba8a8c 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -266,6 +266,7 @@ struct  intel_engine_cs {
>>   	} semaphore;
>>
>>   	/* Execlists */
>> +	struct tasklet_struct irq_tasklet;
>>   	spinlock_t execlist_lock;
>
> spinlock_t execlist_lock; /* used inside tasklet, use spin_lock_bh */

Will do.

> It's looking good, but once this run completes, I'm going to repeat it
> just to confirm how stable my numbers are.
>
> Critical bugfix, improvements, simpler patch than my kthread
> implementation,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Okay I will respin with the above and we'll see.

Unfortunately my test platform just died so there will be a delay.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-03-24 11:50 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-22 17:30 [RFC] drm/i915: Move execlists irq handler to a bottom half Tvrtko Ursulin
2016-03-23  7:02 ` ✓ Fi.CI.BAT: success for " Patchwork
2016-03-23  9:07 ` [RFC] " Daniel Vetter
2016-03-23  9:14   ` Chris Wilson
2016-03-23 10:08     ` Tvrtko Ursulin
2016-03-23 11:31       ` Chris Wilson
2016-03-23 12:46         ` Tvrtko Ursulin
2016-03-23 12:56           ` Chris Wilson
2016-03-23 15:23             ` Tvrtko Ursulin
2016-03-24  9:37               ` Tvrtko Ursulin
2016-03-24 12:18             ` [PATCH v3] " Tvrtko Ursulin
2016-03-24 22:24               ` Chris Wilson
2016-03-25 12:56                 ` Chris Wilson
2016-04-04 11:11                   ` [PATCH v4] " Tvrtko Ursulin
2016-04-04 11:27                     ` Chris Wilson
2016-04-04 12:51                       ` Tvrtko Ursulin
2016-03-23 14:57 ` [RFC v2] " Tvrtko Ursulin
2016-03-24 10:56   ` Chris Wilson
2016-03-24 11:50     ` Tvrtko Ursulin [this message]
2016-03-24 12:58       ` Tvrtko Ursulin
2016-03-24 15:56   ` Imre Deak
2016-03-24 16:05     ` Chris Wilson
2016-03-24 16:40       ` Imre Deak
2016-03-24 19:56         ` Chris Wilson
2016-03-24 22:13           ` Imre Deak
2016-03-23 15:32 ` ✗ Fi.CI.BAT: failure for drm/i915: Move execlists irq handler to a bottom half (rev2) Patchwork
2016-03-24 14:03 ` ✗ Fi.CI.BAT: warning for drm/i915: Move execlists irq handler to a bottom half (rev3) Patchwork
2016-03-24 15:17   ` Tvrtko Ursulin
2016-04-04 12:33 ` ✗ Fi.CI.BAT: failure for drm/i915: Move execlists irq handler to a bottom half (rev4) Patchwork
2016-04-04 12:42   ` Tvrtko Ursulin
2016-04-04 12:53     ` Chris Wilson
2016-04-04 13:14       ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56F3D48E.7000303@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=tvrtko.ursulin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.