From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
 Eero Tamminen <eero.t.tamminen@intel.com>,
 stable@kernel.vger.org,
 "Rantala, Valtteri" <valtteri.rantala@intel.com>
Subject: Re: [PATCH 02/15] drm/i915: Limit the busy wait on requests to 10us not 10ms!
Date: Mon, 30 Nov 2015 10:02:30 +0000
Message-ID: <565C1EB6.4060309@linux.intel.com>
In-Reply-To: <1448786893-2522-3-git-send-email-chris@chris-wilson.co.uk>

On 29/11/15 08:48, Chris Wilson wrote:
> When waiting for high frequency requests, the finite amount of time
> required to set up the irq and wait upon it limits the response rate. By
> busywaiting on the request completion for a short while we can service
> the high frequency waits as quick as possible. However, if it is a slow
> request, we want to sleep as quickly as possible. The tradeoff between
> waiting and sleeping is roughly the time it takes to sleep on a request,
> on the order of a microsecond. Based on measurements of synchronous
> workloads from across big core and little atom, I have set the limit for
> busywaiting as 10 microseconds. In most of the synchronous cases, we can
> reduce the limit down to as little as 2 microseconds, but that leaves
> quite a few test cases regressing by factors of 3 and more.
>
> The code currently uses the jiffie clock, but that is far too coarse (on
> the order of 10 milliseconds) and results in poor interactivity as the
> CPU ends up being hogged by slow requests. To get microsecond resolution
> we need to use a high resolution timer. The cheapest of which is polling
> local_clock(), but that is only valid on the same CPU. If we switch CPUs
> because the task was preempted, we can also use that as an indicator that
> the system is too busy to waste cycles on spinning and we should sleep
> instead.
>
> __i915_spin_request was introduced in
> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Apr 7 16:20:41 2015 +0100
>
> drm/i915: Optimistically spin for the request completion
>
> v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
> so we can use native register sizes on smaller architectures. Mention
> the approximate microseconds units for elapsed time and add some extra
> comments describing the reason for busywaiting.
>
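As an aside on the "32bit wraparound safe" remark in v2: the comparison stays correct across counter overflow because it is done on the signed difference of the two timestamps, which is how the kernel's time_after() is defined. Below is a minimal userspace sketch of that idea. The names ticks_after() and deadline_from() are hypothetical and only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modelled on the kernel's time_after(): true when
 * timestamp a is later than b, even if the 32-bit counter wrapped in
 * between. Only meaningful while a and b are less than 2^31 ticks apart.
 */
static bool ticks_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical helper: a deadline 'budget' ticks from 'now'. Unsigned
 * overflow is well defined and simply wraps around.
 */
static uint32_t deadline_from(uint32_t now, uint32_t budget)
{
	assert(budget < INT32_MAX); /* stay inside the comparison's valid range */
	return now + budget;
}
```

With a start value a few ticks below UINT32_MAX and a budget of 10, the deadline wraps to a small number, yet ticks_after() still reports "not yet" until the budget has really elapsed.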
> v3: Raise the limit to 10us
>
> Reported-by: Jens Axboe <axboe@kernel.dk>
> Link: https://lkml.org/lkml/2015/11/12/621
> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
> Cc: stable@kernel.vger.org
> ---
> drivers/gpu/drm/i915/i915_gem.c | 47 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 45 insertions(+), 2 deletions(-)

Again,

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 87fc34f5899f..bad112abb16b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1146,14 +1146,57 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
>  	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
>  }
>
> +static unsigned long local_clock_us(unsigned *cpu)
> +{
> +	unsigned long t;
> +
> +	/* Cheaply and approximately convert from nanoseconds to microseconds.
> +	 * The result and subsequent calculations are also defined in the same
> +	 * approximate microseconds units. The principal source of timing
> +	 * error here is from the simple truncation.
> +	 *
> +	 * Note that local_clock() is only defined wrt to the current CPU;
> +	 * the comparisons are no longer valid if we switch CPUs. Instead of
> +	 * blocking preemption for the entire busywait, we can detect the CPU
> +	 * switch and use that as indicator of system load and a reason to
> +	 * stop busywaiting, see busywait_stop().
> +	 */
> +	*cpu = get_cpu();
> +	t = local_clock() >> 10;
> +	put_cpu();
> +
> +	return t;
> +}
> +
> +static bool busywait_stop(unsigned long timeout, unsigned cpu)
> +{
> +	unsigned this_cpu;
> +
> +	if (time_after(local_clock_us(&this_cpu), timeout))
> +		return true;
> +
> +	return this_cpu != cpu;
> +}
> +
>  static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  {
>  	unsigned long timeout;
> +	unsigned cpu;
> +
> +	/* When waiting for high frequency requests, e.g. during synchronous
> +	 * rendering split between the CPU and GPU, the finite amount of time
> +	 * required to set up the irq and wait upon it limits the response
> +	 * rate. By busywaiting on the request completion for a short while we
> +	 * can service the high frequency waits as quick as possible. However,
> +	 * if it is a slow request, we want to sleep as quickly as possible.
> +	 * The tradeoff between waiting and sleeping is roughly the time it
> +	 * takes to sleep on a request, on the order of a microsecond.
> +	 */
>
>  	if (i915_gem_request_get_ring(req)->irq_refcount)
>  		return -EBUSY;
>
> -	timeout = jiffies + 1;
> +	timeout = local_clock_us(&cpu) + 10;
>  	while (!need_resched()) {
>  		if (i915_gem_request_completed(req, true))
>  			return 0;
> @@ -1161,7 +1204,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  		if (signal_pending_state(state, current))
>  			break;
>
> -		if (time_after_eq(jiffies, timeout))
> +		if (busywait_stop(timeout, cpu))
>  			break;
>
>  		cpu_relax_lowlatency();
>
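For readers who want to try the idea outside the kernel, here is a rough userspace analogue of the loop above. It is only a sketch under stated assumptions: clock_gettime(CLOCK_MONOTONIC) stands in for local_clock(), sched_getcpu() (glibc, needs _GNU_SOURCE) stands in for get_cpu()/put_cpu(), and the names approx_clock_us(), spin_should_stop(), spin_until() and flag_is_set() are made up for illustration. Note that the shift by 10 divides by 1024 rather than 1000, so each unit is 1.024us and a budget of 10 units is about 10.24us.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for local_clock_us(): nanoseconds >> 10 gives
 * approximate microseconds (units of 1.024us). Also reports the CPU
 * the sample was taken on.
 */
static uint64_t approx_clock_us(int *cpu)
{
	struct timespec ts;

	*cpu = sched_getcpu();
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec) >> 10;
}

/* Stop spinning once the budget has elapsed, or once the thread has been
 * migrated to another CPU (taken as a hint that the system is busy).
 */
static bool spin_should_stop(uint64_t deadline, int start_cpu)
{
	int this_cpu;

	if (approx_clock_us(&this_cpu) > deadline)
		return true;

	return this_cpu != start_cpu;
}

/* Poll done(arg) for roughly budget_us. Returns true if the condition
 * became true while spinning; false means the caller should fall back
 * to a sleeping wait.
 */
static bool spin_until(bool (*done)(void *), void *arg, uint64_t budget_us)
{
	int cpu;
	uint64_t deadline;

	assert(done);
	deadline = approx_clock_us(&cpu) + budget_us;

	do {
		if (done(arg))
			return true;
	} while (!spin_should_stop(deadline, cpu));

	return false;
}

/* Example predicate: treats arg as a pointer to an int flag. */
static bool flag_is_set(void *arg)
{
	return *(const volatile int *)arg != 0;
}
```

A caller would use something like spin_until(flag_is_set, &flag, 10) and only arm the slower interrupt-style wait when it returns false.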