From: Tianchen Ding <dtcccc@linux.alibaba.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: Clear ttwu_pending after enqueue_task
Date: Wed, 2 Nov 2022 14:40:28 +0800 [thread overview]
Message-ID: <84750ddd-43cf-a439-fb3e-2b047ffe1cfc@linux.alibaba.com> (raw)
In-Reply-To: <Y2D2HIZuGP81w25+@hirez.programming.kicks-ass.net>
On 2022/11/1 18:34, Peter Zijlstra wrote:
> On Tue, Nov 01, 2022 at 03:36:30PM +0800, Tianchen Ding wrote:
>> We found a long tail latency in schbench whem m*t is close to nr_cpus.
>> (e.g., "schbench -m 2 -t 16" on a machine with 32 cpus.)
>>
>> This is because when the wakee cpu is idle, rq->ttwu_pending is cleared
>> too early, and idle_cpu() will return true until the wakee task enqueued.
>> This will mislead the waker when selecting idle cpu, and wake multiple
>> worker threads on the same wakee cpu. This situation is enlarged by
>> commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on
>> wakelist if wakee cpu is idle") because it tends to use wakelist.
>>
>> Here is the result of "schbench -m 2 -t 16" on a VM with 32vcpu
>> (Intel(R) Xeon(R) Platinum 8369B).
>>
>> Latency percentiles (usec):
>> base base+revert_f3dd3f674555 base+this_patch
>> 50.0000th: 9 13 9
>> 75.0000th: 12 19 12
>> 90.0000th: 15 22 15
>> 95.0000th: 18 24 17
>> *99.0000th: 27 31 24
>> 99.5000th: 3364 33 27
>> 99.9000th: 12560 36 30
>
> Nice; but have you also ran other benchmarks and confirmed it doesn't
> negatively affect those?
>
> If so; mentioning that is very helpful. If not; best go do so :-)
>
Thanks for the review.
We've tested with unixbench and hackbench (they show the average scores), and
the performance result seems no difference.
We don't mention here because what we found is a specific case in schbench
(where m*t==nr_cpus). It only affect long tail latency, so the problem and the
fix should also take effects on only this case, not the average scores.
>> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
>> ---
>> kernel/sched/core.c | 8 +-------
>> 1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 87c9cdf37a26..b07de1753be5 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
>> if (!llist)
>> return;
>>
>> - /*
>> - * rq::ttwu_pending racy indication of out-standing wakeups.
>> - * Races such that false-negatives are possible, since they
>> - * are shorter lived that false-positives would be.
>> - */
>> - WRITE_ONCE(rq->ttwu_pending, 0);
>> -
>> rq_lock_irqsave(rq, &rf);
>> update_rq_clock(rq);
>>
>
> Could you try the below instead? Also note the comment; since you did
> the work to figure out why -- best record that for posterity.
>
It works well for me. But I have the same thought with Chen Yu, and will explain
in detail in my next reply.
Thanks.
> @@ -3737,6 +3730,13 @@ void sched_ttwu_pending(void *arg)
> set_task_cpu(p, cpu_of(rq));
>
> ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
> + /*
> + * Must be after enqueueing at least once task such that
> + * idle_cpu() does not observe a false-negative -- if it does,
> + * it is possible for select_idle_siblings() to stack a number
> + * of tasks on this CPU during that window.
> + */
> + WRITE_ONCE(rq->ttwu_pending, 0);
> }
>
> rq_unlock_irqrestore(rq, &rf);
next prev parent reply other threads:[~2022-11-02 6:40 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-01 7:36 [PATCH] sched: Clear ttwu_pending after enqueue_task Tianchen Ding
2022-11-01 10:34 ` Peter Zijlstra
2022-11-01 13:51 ` Chen Yu
2022-11-01 14:59 ` Peter Zijlstra
2022-11-02 3:01 ` Chen Yu
2022-11-02 6:40 ` Tianchen Ding
2022-11-02 6:40 ` Tianchen Ding [this message]
2022-11-04 2:36 ` [PATCH v2] " Tianchen Ding
2022-11-04 8:00 ` Chen Yu
2022-11-14 15:27 ` Mel Gorman
2022-11-16 9:22 ` [tip: sched/core] sched: Clear ttwu_pending after enqueue_task() tip-bot2 for Tianchen Ding
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=84750ddd-43cf-a439-fb3e-2b047ffe1cfc@linux.alibaba.com \
--to=dtcccc@linux.alibaba.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox