public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched: Clear ttwu_pending after enqueue_task
@ 2022-11-01  7:36 Tianchen Ding
  2022-11-01 10:34 ` Peter Zijlstra
  2022-11-04  2:36 ` [PATCH v2] " Tianchen Ding
  0 siblings, 2 replies; 11+ messages in thread
From: Tianchen Ding @ 2022-11-01  7:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: linux-kernel

We found a long tail latency in schbench whem m*t is close to nr_cpus.
(e.g., "schbench -m 2 -t 16" on a machine with 32 cpus.)

This is because when the wakee cpu is idle, rq->ttwu_pending is cleared
too early, and idle_cpu() will return true until the wakee task enqueued.
This will mislead the waker when selecting idle cpu, and wake multiple
worker threads on the same wakee cpu. This situation is enlarged by
commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on
wakelist if wakee cpu is idle") because it tends to use wakelist.

Here is the result of "schbench -m 2 -t 16" on a VM with 32vcpu
(Intel(R) Xeon(R) Platinum 8369B).

Latency percentiles (usec):
                base      base+revert_f3dd3f674555   base+this_patch
50.0000th:         9                            13                 9
75.0000th:        12                            19                12
90.0000th:        15                            22                15
95.0000th:        18                            24                17
*99.0000th:       27                            31                24
99.5000th:      3364                            33                27
99.9000th:     12560                            36                30

Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
---
 kernel/sched/core.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87c9cdf37a26..b07de1753be5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3739,13 +3739,6 @@ void sched_ttwu_pending(void *arg)
 	if (!llist)
 		return;
 
-	/*
-	 * rq::ttwu_pending racy indication of out-standing wakeups.
-	 * Races such that false-negatives are possible, since they
-	 * are shorter lived that false-positives would be.
-	 */
-	WRITE_ONCE(rq->ttwu_pending, 0);
-
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
 
@@ -3760,6 +3753,7 @@ void sched_ttwu_pending(void *arg)
 	}
 
 	rq_unlock_irqrestore(rq, &rf);
+	WRITE_ONCE(rq->ttwu_pending, 0);
 }
 
 void send_call_function_single_ipi(int cpu)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-11-16  9:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-01  7:36 [PATCH] sched: Clear ttwu_pending after enqueue_task Tianchen Ding
2022-11-01 10:34 ` Peter Zijlstra
2022-11-01 13:51   ` Chen Yu
2022-11-01 14:59     ` Peter Zijlstra
2022-11-02  3:01       ` Chen Yu
2022-11-02  6:40       ` Tianchen Ding
2022-11-02  6:40   ` Tianchen Ding
2022-11-04  2:36 ` [PATCH v2] " Tianchen Ding
2022-11-04  8:00   ` Chen Yu
2022-11-14 15:27   ` Mel Gorman
2022-11-16  9:22   ` [tip: sched/core] sched: Clear ttwu_pending after enqueue_task() tip-bot2 for Tianchen Ding

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox