From: Kirill Tkhai <ktkhai@odin.com>
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
Date: Wed, 30 Sep 2015 22:16:23 +0300
Message-ID: <560C3507.3040906@odin.com>
In-Reply-To: <1443547751.2790.24.camel@gmail.com>
On 29.09.2015 20:29, Mike Galbraith wrote:
> On Tue, 2015-09-29 at 19:00 +0300, Kirill Tkhai wrote:
>>
>> On 29.09.2015 17:55, Mike Galbraith wrote:
>>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>>
>>>> ---
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 4df37a4..dfbe06b 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>>> int want_affine = 0;
>>>> int sync = wake_flags & WF_SYNC;
>>>>
>>>> - if (sd_flag & SD_BALANCE_WAKE)
>>>> - want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>>> + if (sd_flag & SD_BALANCE_WAKE) {
>>>> + want_affine = 1;
>>>> + if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>>> + goto want_affine;
>>>> + if (wake_wide(p))
>>>> + goto want_affine;
>>>> + }
>>>
>>> That blew wake_wide() right out of the water.
>>>
>>> It's not only about things like pgbench. Drive multiple tasks in a Xen
>>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>>> to save the day) via network, and watch workers fail to be all they can
>>> be because they keep being stacked up on the irq source. Load balancing
>>> yanks them apart, next irq stacks them right back up. I met that in
>>> enterprise land, thought wake_wide() should cure it, and indeed it did.
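To make the wake_wide() argument concrete, here is a userspace sketch of the "wakee flip" heuristic that wake_wide() used in kernels of this period. A task's flip count grows each time it wakes a different task than the one it woke last. Going by the kernel's own comment on the function, the check looks for a minimum flip frequency of llc_size in one partner and a frequency llc_size times higher in the other, and only then lets the load spread. The function and parameter names below are hypothetical. The per-task flip counters and the LLC size are passed in as plain integers rather than read from task_struct and per-CPU data, and the periodic decay of the counters is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the wake_wide() decision.
 * waker_flips / wakee_flips: how often each task switched the task it wakes.
 * llc_size: number of CPUs sharing the waker's last-level cache.
 * Returns true when the wakeup should NOT be pulled affine (spread instead).
 */
static bool wake_wide_model(unsigned int waker_flips,
                            unsigned int wakee_flips,
                            unsigned int llc_size)
{
	unsigned int master = waker_flips;
	unsigned int slave = wakee_flips;

	assert(llc_size > 0);

	/* Order the pair so that master is the partner that flips more. */
	if (master < slave) {
		unsigned int tmp = master;
		master = slave;
		slave = tmp;
	}

	/*
	 * Stay affine unless the less active partner flips at least llc_size
	 * times AND the other flips at least llc_size times more than that,
	 * i.e. the apparent partner count exceeds what one LLC can hold.
	 */
	if (slave < llc_size || master < slave * llc_size)
		return false;
	return true;
}
```

A 1:1 ping-pong pair keeps both counters near zero and so stays affine; only a many-partner relationship pushes both counters past the thresholds and lets the wakees spread away from the waker's cache domain.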
>>
>> 1) Hm.. The patch makes select_task_rq_fair() prefer the old cpu instead of
>> the current one, doesn't it? We more often don't set affine_sd, so the
>> part of the patch skipped in the quote selects prev_cpu.
>
> Not the way I read it..
>
>>> - if (affine_sd) {
>>> +want_affine:
>>> + if (want_affine) {
>>> sd = NULL; /* Prefer wake_affine over balance flags */
>>> - if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
>>> + if (affine_sd && wake_affine(affine_sd, p, sync))
>>> new_cpu = cpu;
>>> - }
>>> -
>>> - if (!sd) {
>>> - if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
>>> - new_cpu = select_idle_sibling(p, new_cpu);
>>> -
>>> + new_cpu = select_idle_sibling(p, new_cpu);
>
> ..it sets new_cpu = cpu if wake_affine() says Ok, wake_wide() has no say
> in the matter.
>
>> 2) I thought about wakeups from irq handlers, and was even going to ask why
>> we use the affine logic for such wakeups. Device handlers usually aren't
>> bound, and timers may migrate since the NO_HZ logic is present. The only
>> explanation I found is that unbound timers are a very unlikely case (I added
>> a statistics printk to my local sched_debug to check that). But if we have
>> situations like the one you described above, shouldn't we disable the affine
>> logic for in_interrupt() cases?
>
> BTDT. In my experience, the more you try to differentiate sources, the
> more corner cases you create. I've tried doing special things for irq,
> locks, wake_all, wake_one, and it always turned into a can of worms.
> IMHO, the best policy for the fast path is KISS.
>
>> 3) I ask only because (being outside of the scheduler's history) it seems a
>> little strange that we prefer smp_processor_id()'s sd_llc so much. The
>> benefit for sync wakeups is more or less clear: smp_processor_id()'s sd_llc
>> may contain data the wakee is interested in, and this minimizes cache misses.
>> But we do the same in other cases too, and at every migration we lose the
>> iTLB, dTLB... Of course, that requires more careful patches than the one
>> posted (less crude ones).
>
> IMHO, the sync wakeup hint is more often a big fat lie than anything
> else; it really just gives us a bit more headroom for affine wakeups in
> cases where that's likely to be a very good thing (affine in the cache
> sense, not affine as in an individual CPU). What it means is that the waker
> is likely to schedule RSN, but if you measure even very fast/light
> things, there is an overlap win to be had by NOT waking CPU affine,
> rather waking cache affine; that's why we cross core schedule so often.
> A real network app doing a wakeup is not necessarily gonna schedule
> RSN, and there is very often a latency win to be had by scheduling to a
> nearby core, i.e. a thread pool worker doing a "sync" wakeup may very
> well find instantly that it has more work to do. If a fast/light wakee can
> slip into an idle crack and get to CPU instantly, it can generate more
> work a little bit sooner.
Yeah, in most places where a sync wakeup is used, the task is not going to
reschedule soon.
Thanks for the explanation, Mike!
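The distinction Mike draws between waking "CPU affine" and "cache affine" can be illustrated with a deliberately simplified model of idle-sibling placement. This is a hypothetical sketch, not the kernel's select_idle_sibling(): the real function differs in its search details and also honors the task's allowed CPU mask. Topology is passed in as plain arrays (llc_id, idle), and all names here are made up for illustration. The point it shows is that a wakee aimed at a busy waker CPU is redirected to an idle CPU sharing the same last-level cache, rather than being stacked on the waker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU topology description for the model. */
struct topo {
	int ncpus;
	const int *llc_id;  /* llc_id[cpu]: which last-level cache the CPU uses */
	const bool *idle;   /* idle[cpu]: CPU currently has nothing to run */
};

static bool share_cache(const struct topo *t, int a, int b)
{
	return t->llc_id[a] == t->llc_id[b];
}

/* Pick a CPU for the wakee, preferring idle CPUs near 'target'. */
static int pick_cache_affine_cpu(const struct topo *t, int target, int prev)
{
	assert(target >= 0 && target < t->ncpus);
	assert(prev >= 0 && prev < t->ncpus);

	/* 1. Target (e.g. the waker's CPU) is idle: wake there. */
	if (t->idle[target])
		return target;

	/* 2. prev_cpu is idle and shares target's cache: reuse it. */
	if (prev != target && share_cache(t, prev, target) && t->idle[prev])
		return prev;

	/* 3. Any other idle CPU in the same LLC: slip into that idle crack. */
	for (int cpu = 0; cpu < t->ncpus; cpu++)
		if (share_cache(t, cpu, target) && t->idle[cpu])
			return cpu;

	/* 4. Nothing idle nearby: fall back to stacking on target. */
	return target;
}
```

Step 3 is what lets a light wakee start running on a neighbouring core while the "sync" waker keeps going, which is the overlap win described in the thread; step 4 is the CPU-affine stacking case.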