From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Kirill Tkhai <ktkhai@odin.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
Date: Tue, 29 Sep 2015 19:29:11 +0200
Message-ID: <1443547751.2790.24.camel@gmail.com>
In-Reply-To: <560AB591.4070407@odin.com>

On Tue, 2015-09-29 at 19:00 +0300, Kirill Tkhai wrote:
> 
> On 29.09.2015 17:55, Mike Galbraith wrote:
> > On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
> > 
> >> ---
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 4df37a4..dfbe06b 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> >>  	int want_affine = 0;
> >>  	int sync = wake_flags & WF_SYNC;
> >>  
> >> -	if (sd_flag & SD_BALANCE_WAKE)
> >> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
> >> +	if (sd_flag & SD_BALANCE_WAKE) {
> >> +		want_affine = 1;
> >> +		if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
> >> +			goto want_affine;
> >> +		if (wake_wide(p))
> >> +			goto want_affine;
> >> +	}
> > 
> > That blew wake_wide() right out of the water.
> > 
> > It's not only about things like pgbench.  Drive multiple tasks in a Xen
> > guest (single event channel dom0 -> domu, and no select_idle_sibling()
> > to save the day) via network, and watch workers fail to be all they can
> > be because they keep being stacked up on the irq source.  Load balancing
> > yanks them apart, next irq stacks them right back up.  I met that in
> > enterprise land, thought wake_wide() should cure it, and indeed it did.
> 
> 1) Hm.. The patch makes select_task_rq_fair() prefer the old cpu instead of
> the current one, doesn't it? We more often don't set affine_sd. So, the skipped
> part of the patch (skipped in the quote) selects prev_cpu.

Not the way I read it..

>> -    if (affine_sd) {
>> +want_affine:
>> +    if (want_affine) {
>>              sd = NULL; /* Prefer wake_affine over balance flags */
>> -            if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
>> +            if (affine_sd && wake_affine(affine_sd, p, sync))
>>                      new_cpu = cpu;
>> -    }
>> -
>> -    if (!sd) {
>> -            if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
>> -                    new_cpu = select_idle_sibling(p, new_cpu);
>> -
>> +            new_cpu = select_idle_sibling(p, new_cpu);

..it sets new_cpu = cpu if wake_affine() says Ok, wake_wide() has no say
in the matter.
 
> 2) I thought about wakeups from irq handlers and was even going to ask why
> we use affine logic for such wakeups. Device handlers usually aren't
> bound, and timers may migrate because of the NO_HZ logic. The only explanation
> I found is that unbound timers are a very unlikely case (I added a statistics
> printk to my local sched_debug to check that). But if we have situations like
> the one you described above, don't we have to disable affine logic for
> in_interrupt() cases?

BTDT.  In my experience, the more you try to differentiate sources, the
more corner cases you create.  I've tried doing special things for irq,
locks, wake_all, wake_one, and it always turned into a can of worms.
IMHO, the best policy for the fast path is KISS.

> 3) I ask just because (being outside of scheduler history) it seems a little
> bit strange that we prefer smp_processor_id()'s sd_llc so much. The profit of
> a sync wakeup is more or less clear: smp_processor_id()'s sd_llc may contain
> some data which is interesting for a wakee, and this minimizes cache misses.
> But we do the same in other cases too, and at every migration we lose
> itlb, dtlb... Of course, it requires more careful patches than the ones
> posted (not such crude patches).

IMHO, the sync wakeup hint is more often a big fat lie than anything
else.  It really just gives us a bit more headroom for affine wakeups in
cases where that's likely to be a very good thing (affine in the cache
sense, not affine as in an individual CPU).  What it means is that the
waker is likely to schedule RSN, but if you measure even very fast/light
things, there is an overlap win to be had by NOT waking CPU affine, but
rather waking cache affine.  That's why we cross core schedule so often.
A real network app doing a wakeup is not necessarily gonna schedule
RSN, and there is very often a latency win to be had by scheduling to a
nearby core, ie a thread pool worker doing a "sync" wakeup may very
well instantly find that it has more work to do.  If a fast/light wakee
can slip into an idle crack and get to CPU instantly, it can generate
more work a little bit sooner.

	-Mike

