From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
Jens Axboe <axboe@kernel.dk>,
intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
dri-devel@lists.freedesktop.org,
Daniel Vetter <daniel.vetter@ffwll.ch>,
Eero Tamminen <eero.t.tamminen@intel.com>,
"Rantala, Valtteri" <valtteri.rantala@intel.com>,
stable@kernel.vger.org
Subject: Re: [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms!
Date: Mon, 16 Nov 2015 12:08:08 +0000 [thread overview]
Message-ID: <5649C728.5040109@linux.intel.com> (raw)
In-Reply-To: <20151116111208.GQ569@nuc-i3427.alporthouse.com>
On 16/11/15 11:12, Chris Wilson wrote:
> On Mon, Nov 16, 2015 at 10:24:45AM +0000, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 15/11/15 13:32, Chris Wilson wrote:
>>> When waiting for high frequency requests, the finite amount of time
>>> required to set up the irq and wait upon it limits the response rate. By
>>> busywaiting on the request completion for a short while we can service
>>> the high frequency waits as quick as possible. However, if it is a slow
>>> request, we want to sleep as quickly as possible. The tradeoff between
>>> waiting and sleeping is roughly the time it takes to sleep on a request,
>>> on the order of a microsecond. Based on measurements from big core, I
>>> have set the limit for busywaiting as 2 microseconds.
>>
>> Sounds like solid reasoning. Would it also be worth finding the
>> trade off limit for small core?
>
> Takes a bit longer, but 2us seems "ok" on PineView (as in it doesn't
> lose the boost from spinning rather than sleeping). Have some more
> testing to do on vlv/byt.
Cool.
>>> The code currently uses the jiffie clock, but that is far too coarse (on
>>> the order of 10 milliseconds) and results in poor interactivity as the
>>> CPU ends up being hogged by slow requests. To get microsecond resolution
>>> we need to use a high resolution timer. The cheapest of which is polling
>>> local_clock(), but that is only valid on the same CPU. If we switch CPUs
>>> because the task was preempted, we can also use that as an indicator that
>>> the system is too busy to waste cycles on spinning and we should sleep
>>> instead.
>>
>> Hm, need_resched would not cover the CPU switch anyway? Or maybe
>> need_resched means something other than I thought which is "there
>> are other runnable tasks"?
>
> As I understand it, it means that the scheduler tick fired (or something
> else yielded). I haven't spotted if it gets set as the runqueue changes.
> As it pertains to us, it means that we need to get to schedule() as
> quick as possible which along this path means going to sleep.
>
> I'm not sure if need_resched() would catch the cpu switch, if we were
> preempted as the flag would be set on the idle process not us.
Could be, I wasn't sure at all, just curious and trying to understand it
fully. Cpu check is so cheap as implemented that it is not under any
criticism.
>> This would also have impact on the patch subject line.I thought we
>> would burn a jiffie of CPU cycles only if there are no other
>> runnable tasks - so how come an impact on interactivity?
>
> I have a couple of ideas for the effect on interactivty:
>
> 1. Burning through the time slice is acting as a penalty against running
> that process (typically the compositor) in the near future, perhaps
> enough to miss some deadlines.
>
> 2. Processor power balancing.
>
>> Also again I think the commit message needs some data on how this
>> was found and what is the impact.
>
> The system felt unresponsive. It would be interesting for me to know a
> few more details about the tick on that system (HZ, tickless?,
> preemption?) to see if changing the config on my xps13 also produces the
> lag/jitter/poor interactivty.
Yes interesting but not critical I think. Since the new scheme looks as
efficient as the old one so there should be no downside anyway.
>> Btw as it happens, just last week as I was playing with perf, I did
>> notice busy spinning is the top cycle waster in some benchmarks. I
>> was in the process of trying to quantize the difference with it on
>> or off but did not complete it.
>
> I found that spin-request appearing in profiles makes tracking down the
> culprit higer in the stack much easier. Otherwise you have to remember to
> enable a pass with the tracepoint to find the stalls (or use
> intel-gpu-overlay which does it for you).
I'll put it on my TODO list of things to play with.
>>> +static u64 local_clock_us(unsigned *cpu)
>>> +{
>>> + u64 t;
>>> +
>>> + *cpu = get_cpu();
>>> + t = local_clock() >> 10;
>>
>> Needs comment I think to explicitly mention the approximation, or
>> maybe drop the _us suffix?
>
> I did consider _approx_us but thought that was overkill. A comment along
> the lines of
> /* Approximately convert ns to us - the error is less than the
> * truncation!
> */
And the result is not used in subsequent calculations apart from
comparing against an approximate timeout?
>>> @@ -1161,7 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>> if (signal_pending_state(state, current))
>>> break;
>>>
>>> - if (time_after_eq(jiffies, timeout))
>>> + if (busywait_stop(timeout, cpu))
>>> break;
>>>
>>> cpu_relax_lowlatency();
>>>
>>
>> Otherwise looks good. Not sure what would you convert to 32-bit from
>> your follow up reply since you need us resolution?
>
> s/u64/unsigned long/ s/time_after64/time_after/
>
> 32bits of us resolution gives us 1000s before wraparound between the two
> samples. And I hope that a 1000s doesn't pass between loops. Or if it does,
> the GPU managed to complete its task.
Now I see that you did say low bits.. yes that sounds fine.
Btw while you are optimizing things maybe pick up this micro
optimization: http://patchwork.freedesktop.org/patch/64339/
Not in scope of this thread but under the normal development patch flow.
Btw2, any benchmark result changes with this?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2015-11-16 12:08 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-15 13:32 [PATCH 1/2] drm/i915: Break busywaiting for requests on pending signals Chris Wilson
2015-11-15 13:32 ` [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms! Chris Wilson
2015-11-15 17:48 ` Chris Wilson
2015-11-16 10:24 ` Tvrtko Ursulin
2015-11-16 11:12 ` Chris Wilson
2015-11-16 12:08 ` Tvrtko Ursulin [this message]
2015-11-16 12:55 ` Chris Wilson
2015-11-16 13:09 ` Tvrtko Ursulin
2015-11-16 13:30 ` [Intel-gfx] " Ville Syrjälä
2015-11-16 16:48 ` Jens Axboe
2015-11-18 9:56 ` Limit busywaiting Chris Wilson
2015-11-18 9:56 ` [PATCH 1/3] drm/i915: Only spin whilst waiting on the current request Chris Wilson
2015-11-18 17:03 ` Daniel Vetter
2015-11-19 10:05 ` Tvrtko Ursulin
2015-11-19 10:12 ` Chris Wilson
2015-11-18 9:56 ` [PATCH 2/3] drm/i915: Convert __i915_wait_request to receive flags Chris Wilson
2015-11-18 9:56 ` [PATCH 3/3] drm/i915: Limit request busywaiting Chris Wilson
2015-11-19 15:22 ` Daniel Vetter
2015-11-19 16:29 ` Limit busywaiting Jens Axboe
2015-12-03 22:03 ` [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms! Pavel Machek
2015-11-16 9:54 ` [PATCH 1/2] drm/i915: Break busywaiting for requests on pending signals Tvrtko Ursulin
2015-11-16 11:22 ` Chris Wilson
2015-11-16 11:40 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5649C728.5040109@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=axboe@kernel.dk \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=eero.t.tamminen@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@kernel.vger.org \
--cc=valtteri.rantala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox