From: Michael Wang <wangyun@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>, Mike Galbraith <efault@gmx.de>,
Alex Shi <alex.shi@intel.com>, Namhyung Kim <namhyung@kernel.org>,
Paul Turner <pjt@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
Ram Pai <linuxram@us.ibm.com>
Subject: Re: [PATCH] sched: smart wake-affine
Date: Tue, 02 Jul 2013 17:35:33 +0800
Message-ID: <51D29EE5.8080307@linux.vnet.ibm.com>
In-Reply-To: <20130702085202.GA23916@twins.programming.kicks-ass.net>

Hi, Peter

Thanks for your review :)

On 07/02/2013 04:52 PM, Peter Zijlstra wrote:
[snip]
>> +static void record_wakee(struct task_struct *p)
>> +{
>> + /*
>> + * Rough decay, don't worry about the boundary, really active
>> + * task won't care the loose.
>> + */
>
> OK so we 'decay' once a second.
>
>> + if (jiffies > current->last_switch_decay + HZ) {
>> + current->nr_wakee_switch = 0;
>> + current->last_switch_decay = jiffies;
>> + }
>
> This isn't so much a decay as it is wiping state. Did you try an actual
> decay -- something like: current->nr_wakee_switch >>= 1; ?
>
> I suppose you wanted to avoid something like:
>
> 	now = jiffies;
> 	while (now > current->last_switch_decay + HZ) {
> 		current->nr_wakee_switch >>= 1;
> 		current->last_switch_decay += HZ;
> 	}

Right, I actually thought about the decay problem and did some testing,
including some implementations similar to this one, but one issue I could
not solve is:

	a task woken up 10 seconds after being dequeued and a task woken up
	1 second after being dequeued would suffer the same decay.

So, to keep things fair, we would have to do some calculation here to make
the decay correct, but that means extra cost...

So I picked this wiping method, and its cost/performance trade-off is not
so bad :)

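For reference, a minimal user-space sketch of what such a "correct" decay
calculation could look like. It is not taken from the patch: struct
wakee_stats and decay_wakee_switch() are hypothetical names, 'hz' stands in
for HZ, and jiffies wraparound is ignored. It halves the counter once per
elapsed period without looping, at the price of one division per decay:

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the two fields in the patch. */
struct wakee_stats {
	unsigned int nr_wakee_switch;	/* switch counter being decayed */
	uint64_t last_switch_decay;	/* tick at which we last decayed */
};

/*
 * Halve the counter once for every full period that has elapsed,
 * in constant time. 'hz' is the number of ticks per period.
 * Assumption: 'now' never wraps around.
 */
static void decay_wakee_switch(struct wakee_stats *s, uint64_t now, uint64_t hz)
{
	uint64_t periods;

	if (now < s->last_switch_decay + hz)
		return;		/* less than one full period elapsed */

	periods = (now - s->last_switch_decay) / hz;

	/* Shifting by >= the type width is undefined in C, so clamp. */
	if (periods >= 8 * sizeof(s->nr_wakee_switch))
		s->nr_wakee_switch = 0;
	else
		s->nr_wakee_switch >>= periods;

	/* Advance by whole periods only, keeping the remainder. */
	s->last_switch_decay += periods * hz;
}
```

With this, a task that slept for 10 periods sees its counter shifted right
by 10, while a task that slept for 1 period only loses half, so the two
cases no longer decay identically.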
>
> ?
>
> And we increment every time we wake someone else. Gaining a measure of
> how often we wake someone else.
>
>> + if (current->last_wakee != p) {
>> + current->last_wakee = p;
>> + current->nr_wakee_switch++;
>> + }
>> +}
>> +
>> +static int nasty_pull(struct task_struct *p)
>
> I've seen there's some discussion as to this function name.. good :-) It
> really wants to change. How about something like:
>
> int wake_affine()
> {
> ...
>
> /*
> * If we wake multiple tasks be careful to not bounce
> * ourselves around too much.
> */
> if (wake_wide(p))
> return 0;
Do you mean wake_wipe() here?
>
>
>> +{
>> + int factor = cpumask_weight(cpu_online_mask);
>
> We have num_online_cpus() for this.. however both are rather expensive.
> Having to walk and count a 4096 bitmap for every wakeup is going to get
> tiresome real quick.
>
> I suppose the question is; to what level do we really want to scale?
>
> One fair answer would be node size I suppose; do you really want to go
> bigger than that?

Agreed, that sounds more reasonable; let me do some testing on it.

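One way to avoid counting a bitmap on every wakeup is to cache the weight
and recompute it only when the mask changes. The following is a user-space
sketch under that assumption, not kernel code: struct node_info,
node_set_cpu() and wake_factor() are hypothetical names, and a real
implementation would hook into CPU hotplug instead:

```c
#include <limits.h>
#include <stddef.h>

#define MAX_CPUS	4096
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	(MAX_CPUS / BITS_PER_WORD)

/* Hypothetical per-node state: the CPU mask plus its cached weight. */
struct node_info {
	unsigned long cpus[MASK_WORDS];
	unsigned int nr_cpus;		/* cached number of set bits */
};

/* Slow path: walk the whole mask and count the set bits. */
static unsigned int mask_weight(const unsigned long *mask)
{
	unsigned int weight = 0;
	size_t i;

	for (i = 0; i < MASK_WORDS; i++) {
		unsigned long w = mask[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			weight++;
		}
	}
	return weight;
}

/* Rare event: a CPU comes or goes, so refresh the cached weight here. */
static void node_set_cpu(struct node_info *n, unsigned int cpu, int online)
{
	unsigned long bit = 1UL << (cpu % BITS_PER_WORD);

	if (online)
		n->cpus[cpu / BITS_PER_WORD] |= bit;
	else
		n->cpus[cpu / BITS_PER_WORD] &= ~bit;

	n->nr_cpus = mask_weight(n->cpus);
}

/* Fast path: what a wakeup would read, a single load. */
static unsigned int wake_factor(const struct node_info *n)
{
	return n->nr_cpus;
}
```

The bitmap walk then happens only on the rare mask update, and the wakeup
path reads one integer.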
>
> Also; you compare a size against a switching frequency, that's not
> really an apples to apples comparison.
>
>> +
>> + /*
>> + * Yeah, it's the switching-frequency, could means many wakee or
>> + * rapidly switch, use factor here will just help to automatically
>> + * adjust the loose-degree, so more cpu will lead to more pull.
>> + */
>> + if (p->nr_wakee_switch > factor) {
>> + /*
>> + * wakee is somewhat hot, it needs certain amount of cpu
>> + * resource, so if waker is far more hot, prefer to leave
>> + * it alone.
>> + */
>> + if (current->nr_wakee_switch > (factor * p->nr_wakee_switch))
>> + return 1;
>
> Ah ok, this makes more sense; the first is simply a filter to avoid
> doing the second dereference I suppose.

Yeah, the first one is a kind of coarse filter; the second one is the
core filter ;-)

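To make the two filters concrete, here is a stand-alone sketch of the same
comparison outside the kernel. struct task and nasty_pull_sketch() are
hypothetical stand-ins; in the patch the waker is 'current', the wakee is
'p', and factor comes from the online CPU count:

```c
/* Hypothetical stand-in for the one task_struct field used here. */
struct task {
	unsigned int nr_wakee_switch;
};

/*
 * Return 1 when the affine pull should be skipped, 0 otherwise.
 * Assumption: factor * nr_wakee_switch does not overflow.
 */
static int nasty_pull_sketch(const struct task *waker,
			     const struct task *wakee,
			     unsigned int factor)
{
	/* Coarse filter: only a reasonably hot wakee is worth checking. */
	if (wakee->nr_wakee_switch > factor) {
		/* Core filter: the waker must be far hotter than the wakee. */
		if (waker->nr_wakee_switch > factor * wakee->nr_wakee_switch)
			return 1;
	}

	return 0;
}
```

With factor = 4, a wakee at 5 switches and a waker at 21 gives 1 (21 > 20),
while a waker at exactly 20 gives 0, and a wakee at 4 never passes the
coarse filter no matter how hot the waker is.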
>
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>> {
>> s64 this_load, load;
>> @@ -3118,6 +3157,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>> unsigned long weight;
>> int balanced;
>>
>> + if (nasty_pull(p))
>> + return 0;
>> +
>> idx = sd->wake_idx;
>> this_cpu = smp_processor_id();
>> prev_cpu = task_cpu(p);
>> @@ -3410,6 +3452,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>> /* while loop will break here if sd == NULL */
>> }
>> unlock:
>> + if (sd_flag & SD_BALANCE_WAKE)
>> + record_wakee(p);
>
> if we put this in task_waking_fair() we can avoid an entire conditional!

Nice, will do it in the next version :)

Regards,
Michael Wang