From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
Jens Axboe <axboe@kernel.dk>,
intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
Eero Tamminen <eero.t.tamminen@intel.com>,
"Rantala, Valtteri" <valtteri.rantala@intel.com>,
stable@kernel.vger.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms!
Date: Mon, 16 Nov 2015 10:24:45 +0000 [thread overview]
Message-ID: <5649AEED.9090807@linux.intel.com> (raw)
In-Reply-To: <1447594364-4206-2-git-send-email-chris@chris-wilson.co.uk>
Hi,
On 15/11/15 13:32, Chris Wilson wrote:
> When waiting for high frequency requests, the finite amount of time
> required to set up the irq and wait upon it limits the response rate. By
> busywaiting on the request completion for a short while we can service
> the high frequency waits as quick as possible. However, if it is a slow
> request, we want to sleep as quickly as possible. The tradeoff between
> waiting and sleeping is roughly the time it takes to sleep on a request,
> on the order of a microsecond. Based on measurements from big core, I
> have set the limit for busywaiting as 2 microseconds.
Sounds like solid reasoning. Would it also be worth finding the trade
off limit for small core?
> The code currently uses the jiffie clock, but that is far too coarse (on
> the order of 10 milliseconds) and results in poor interactivity as the
> CPU ends up being hogged by slow requests. To get microsecond resolution
> we need to use a high resolution timer. The cheapest of which is polling
> local_clock(), but that is only valid on the same CPU. If we switch CPUs
> because the task was preempted, we can also use that as an indicator that
> the system is too busy to waste cycles on spinning and we should sleep
> instead.
Hm, need_resched would not cover the CPU switch anyway? Or maybe
need_resched means something other than I thought which is "there are
other runnable tasks"?
This would also have impact on the patch subject line.I thought we would
burn a jiffie of CPU cycles only if there are no other runnable tasks -
so how come an impact on interactivity?
Also again I think the commit message needs some data on how this was
found and what is the impact.
Btw as it happens, just last week as I was playing with perf, I did
notice busy spinning is the top cycle waster in some benchmarks. I was
in the process of trying to quantize the difference with it on or off
but did not complete it.
> __i915_spin_request was introduced in
> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Apr 7 16:20:41 2015 +0100
>
> drm/i915: Optimistically spin for the request completion
>
> Reported-by: Jens Axboe <axboe@kernel.dk>
> Link: https://lkml.org/lkml/2015/11/12/621
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc; "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
> Cc: stable@kernel.vger.org
> ---
> drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 740530c571d1..2a88158bd1f7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1146,14 +1146,36 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
> return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
> }
>
> +static u64 local_clock_us(unsigned *cpu)
> +{
> + u64 t;
> +
> + *cpu = get_cpu();
> + t = local_clock() >> 10;
Needs comment I think to explicitly mention the approximation, or maybe
drop the _us suffix?
> + put_cpu();
> +
> + return t;
> +}
> +
> +static bool busywait_stop(u64 timeout, unsigned cpu)
> +{
> + unsigned this_cpu;
> +
> + if (time_after64(local_clock_us(&this_cpu), timeout))
> + return true;
> +
> + return this_cpu != cpu;
> +}
> +
> static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> {
> - unsigned long timeout;
> + u64 timeout;
> + unsigned cpu;
>
> if (i915_gem_request_get_ring(req)->irq_refcount)
> return -EBUSY;
>
> - timeout = jiffies + 1;
> + timeout = local_clock_us(&cpu) + 2;
> while (!need_resched()) {
> if (i915_gem_request_completed(req, true))
> return 0;
> @@ -1161,7 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> if (signal_pending_state(state, current))
> break;
>
> - if (time_after_eq(jiffies, timeout))
> + if (busywait_stop(timeout, cpu))
> break;
>
> cpu_relax_lowlatency();
>
Otherwise looks good. Not sure what would you convert to 32-bit from
your follow up reply since you need us resolution?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
WARNING: multiple messages have this Message-ID (diff)
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
Jens Axboe <axboe@kernel.dk>,
intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org,
Daniel Vetter <daniel.vetter@ffwll.ch>,
Eero Tamminen <eero.t.tamminen@intel.com>,
"Rantala, Valtteri" <valtteri.rantala@intel.com>,
stable@kernel.vger.org
Subject: Re: [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms!
Date: Mon, 16 Nov 2015 10:24:45 +0000 [thread overview]
Message-ID: <5649AEED.9090807@linux.intel.com> (raw)
In-Reply-To: <1447594364-4206-2-git-send-email-chris@chris-wilson.co.uk>
Hi,
On 15/11/15 13:32, Chris Wilson wrote:
> When waiting for high frequency requests, the finite amount of time
> required to set up the irq and wait upon it limits the response rate. By
> busywaiting on the request completion for a short while we can service
> the high frequency waits as quick as possible. However, if it is a slow
> request, we want to sleep as quickly as possible. The tradeoff between
> waiting and sleeping is roughly the time it takes to sleep on a request,
> on the order of a microsecond. Based on measurements from big core, I
> have set the limit for busywaiting as 2 microseconds.
Sounds like solid reasoning. Would it also be worth finding the trade
off limit for small core?
> The code currently uses the jiffie clock, but that is far too coarse (on
> the order of 10 milliseconds) and results in poor interactivity as the
> CPU ends up being hogged by slow requests. To get microsecond resolution
> we need to use a high resolution timer. The cheapest of which is polling
> local_clock(), but that is only valid on the same CPU. If we switch CPUs
> because the task was preempted, we can also use that as an indicator that
> the system is too busy to waste cycles on spinning and we should sleep
> instead.
Hm, need_resched would not cover the CPU switch anyway? Or maybe
need_resched means something other than I thought which is "there are
other runnable tasks"?
This would also have impact on the patch subject line.I thought we would
burn a jiffie of CPU cycles only if there are no other runnable tasks -
so how come an impact on interactivity?
Also again I think the commit message needs some data on how this was
found and what is the impact.
Btw as it happens, just last week as I was playing with perf, I did
notice busy spinning is the top cycle waster in some benchmarks. I was
in the process of trying to quantize the difference with it on or off
but did not complete it.
> __i915_spin_request was introduced in
> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Apr 7 16:20:41 2015 +0100
>
> drm/i915: Optimistically spin for the request completion
>
> Reported-by: Jens Axboe <axboe@kernel.dk>
> Link: https://lkml.org/lkml/2015/11/12/621
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc; "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
> Cc: stable@kernel.vger.org
> ---
> drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 740530c571d1..2a88158bd1f7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1146,14 +1146,36 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
> return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
> }
>
> +static u64 local_clock_us(unsigned *cpu)
> +{
> + u64 t;
> +
> + *cpu = get_cpu();
> + t = local_clock() >> 10;
Needs comment I think to explicitly mention the approximation, or maybe
drop the _us suffix?
> + put_cpu();
> +
> + return t;
> +}
> +
> +static bool busywait_stop(u64 timeout, unsigned cpu)
> +{
> + unsigned this_cpu;
> +
> + if (time_after64(local_clock_us(&this_cpu), timeout))
> + return true;
> +
> + return this_cpu != cpu;
> +}
> +
> static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> {
> - unsigned long timeout;
> + u64 timeout;
> + unsigned cpu;
>
> if (i915_gem_request_get_ring(req)->irq_refcount)
> return -EBUSY;
>
> - timeout = jiffies + 1;
> + timeout = local_clock_us(&cpu) + 2;
> while (!need_resched()) {
> if (i915_gem_request_completed(req, true))
> return 0;
> @@ -1161,7 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> if (signal_pending_state(state, current))
> break;
>
> - if (time_after_eq(jiffies, timeout))
> + if (busywait_stop(timeout, cpu))
> break;
>
> cpu_relax_lowlatency();
>
Otherwise looks good. Not sure what would you convert to 32-bit from
your follow up reply since you need us resolution?
Regards,
Tvrtko
next prev parent reply other threads:[~2015-11-16 10:24 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-15 13:32 [PATCH 1/2] drm/i915: Break busywaiting for requests on pending signals Chris Wilson
2015-11-15 13:32 ` Chris Wilson
2015-11-15 13:32 ` [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms! Chris Wilson
2015-11-15 13:32 ` Chris Wilson
2015-11-15 17:48 ` Chris Wilson
2015-11-15 17:48 ` Chris Wilson
2015-11-16 10:24 ` Tvrtko Ursulin [this message]
2015-11-16 10:24 ` Tvrtko Ursulin
2015-11-16 11:12 ` Chris Wilson
2015-11-16 11:12 ` Chris Wilson
2015-11-16 12:08 ` Tvrtko Ursulin
2015-11-16 12:08 ` Tvrtko Ursulin
2015-11-16 12:55 ` Chris Wilson
2015-11-16 12:55 ` Chris Wilson
2015-11-16 13:09 ` Tvrtko Ursulin
2015-11-16 13:09 ` Tvrtko Ursulin
2015-11-16 13:30 ` [Intel-gfx] " Ville Syrjälä
2015-11-16 13:30 ` Ville Syrjälä
2015-11-16 16:48 ` Jens Axboe
2015-11-18 9:56 ` Limit busywaiting Chris Wilson
2015-11-18 9:56 ` [PATCH 1/3] drm/i915: Only spin whilst waiting on the current request Chris Wilson
2015-11-18 17:03 ` Daniel Vetter
2015-11-19 10:05 ` Tvrtko Ursulin
2015-11-19 10:12 ` Chris Wilson
2015-11-18 9:56 ` [PATCH 2/3] drm/i915: Convert __i915_wait_request to receive flags Chris Wilson
2015-11-18 9:56 ` [PATCH 3/3] drm/i915: Limit request busywaiting Chris Wilson
2015-11-19 15:22 ` Daniel Vetter
2015-11-19 16:29 ` Limit busywaiting Jens Axboe
2015-12-03 22:03 ` [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms! Pavel Machek
2015-12-03 22:03 ` Pavel Machek
2015-11-16 9:54 ` [PATCH 1/2] drm/i915: Break busywaiting for requests on pending signals Tvrtko Ursulin
2015-11-16 9:54 ` Tvrtko Ursulin
2015-11-16 11:22 ` Chris Wilson
2015-11-16 11:22 ` Chris Wilson
2015-11-16 11:40 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5649AEED.9090807@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=axboe@kernel.dk \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=eero.t.tamminen@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@kernel.vger.org \
--cc=valtteri.rantala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.