All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	Mel Gorman <mgorman@techsingularity.net>,
	LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU
Date: Mon, 18 Dec 2017 09:43:27 +0000	[thread overview]
Message-ID: <20171218094327.19562-5-mgorman@techsingularity.net> (raw)
In-Reply-To: <20171218094327.19562-1-mgorman@techsingularity.net>

If tasks wake a kworker to do some work and is woken on completion and it
was a per-cpu kworker that was used then a situation can arise where the
current CPU is always active when the kworker is waking and select_idle_sibling
moves the task. This leads to a situation where a task moves around the socket
each time a kworker is used even through the relationship is effectively sync.
This patch special cases a kworker running on the same CPU. It has a noticable
impact on migrations and performance of dbench running with the XFS filesystem
but has no impact on ext4 as ext4 interacts with a kthread, not a kworker.

                          4.15.0-rc3             4.15.0-rc3
                            wakeprev                stackwq
Hmean     1        392.92 (   0.00%)     1024.22 ( 160.67%)
Hmean     2        787.09 (   0.00%)     1808.38 ( 129.75%)
Hmean     4       1559.71 (   0.00%)     2525.42 (  61.92%)
Hmean     8       2576.05 (   0.00%)     2881.12 (  11.84%)
Hmean     16      2949.28 (   0.00%)     3137.65 (   6.39%)
Hmean     32      3041.89 (   0.00%)     3147.92 (   3.49%)
Hmean     64      1655.42 (   0.00%)     1756.21 (   6.09%)
Hmean     128     1133.19 (   0.00%)     1165.39 (   2.84%)
Stddev    1          2.59 (   0.00%)       11.21 (-332.82%)
Stddev    2          8.96 (   0.00%)       13.57 ( -51.44%)
Stddev    4         20.15 (   0.00%)        8.51 (  57.75%)
Stddev    8         17.15 (   0.00%)       14.45 (  15.75%)
Stddev    16        30.29 (   0.00%)       31.30 (  -3.33%)
Stddev    32        64.45 (   0.00%)       57.22 (  11.21%)
Stddev    64        55.89 (   0.00%)       62.84 ( -12.43%)
Stddev    128       55.89 (   0.00%)       62.75 ( -12.27%)

There is also a large drop in system CPU usage;

          4.15.0-rc3  4.15.0-rc3
            wakeprev     stackwq
User         1561.85     1166.59
System       6961.89     4965.09
Elapsed      1472.05     1471.84

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c     | 29 +++++++++++++++++++++++++++--
 kernel/sched/features.h |  8 ++++++++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 95b1145bc38d..cff55481bd19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5684,6 +5684,19 @@ static int wake_wide(struct task_struct *p)
 	return 1;
 }
 
+/*
+ * Returns true if a wakeup is either from or to a workqueue and the tasks
+ * appear to be synchronised with each other.
+ */
+static bool
+is_wakeup_workqueue_sync(struct task_struct *p, int this_cpu, int prev_cpu)
+{
+	return sched_feat(WA_STACK_WQ) &&
+		this_cpu == prev_cpu &&
+		((p->flags & PF_WQ_WORKER) || (current->flags & PF_WQ_WORKER)) &&
+		this_rq()->nr_running <= 1;
+}
+
 /*
  * The purpose of wake_affine() is to quickly determine on which CPU we can run
  * soonest. For the purpose of speed we only consider the waking and previous
@@ -5735,7 +5748,7 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 }
 
 static int
-wake_affine_sync(int this_cpu, int sync)
+wake_affine_sync(struct task_struct *p, int this_cpu, int prev_cpu, int sync)
 {
 	/*
 	 * Consider stacking tasks if it's a sync wakeup and there is only
@@ -5745,6 +5758,14 @@ wake_affine_sync(int this_cpu, int sync)
 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
 		return this_cpu;
 
+	/*
+	 * If the waker or wakee is a workqueue and it appears to be similar
+	 * to a sync wakeup then assume the waker will sleep shortly and allow
+	 * the tasks to stack on the same CPU.
+	 */
+	if (is_wakeup_workqueue_sync(p, this_cpu, prev_cpu))
+		return this_cpu;
+
 	return nr_cpumask_bits;
 }
 
@@ -5794,7 +5815,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		new_cpu = wake_affine_idle(this_cpu, prev_cpu, sync);
 
 	if (sched_feat(WA_IDLE) && new_cpu == nr_cpumask_bits)
-		new_cpu = wake_affine_sync(this_cpu, sync);
+		new_cpu = wake_affine_sync(p, this_cpu, prev_cpu, sync);
 
 	if (sched_feat(WA_WEIGHT) && new_cpu == nr_cpumask_bits)
 		new_cpu = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
@@ -6240,6 +6261,10 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (idle_cpu(target))
 		return target;
 
+	/* Allow a wakeup to stack if it looks like a synchronous workqueue */
+	if (is_wakeup_workqueue_sync(p, smp_processor_id(), target))
+		return target;
+
 	/*
 	 * If the previous cpu is cache affine and idle, don't be stupid.
 	 */
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5854bf..c96ad246584a 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -85,3 +85,11 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
 SCHED_FEAT(WA_IDLE, true)
 SCHED_FEAT(WA_WEIGHT, true)
 SCHED_FEAT(WA_BIAS, true)
+
+/*
+ * If true then a process may stack with a workqueue on the same CPU during
+ * wakeup instead of finding an idle sibling. This should only happen in the
+ * case where there appears to be a strong relationship beween the wq and the
+ * task e.g. IO operations dispatched to a workqueue on XFS.
+ */
+SCHED_FEAT(WA_STACK_WQ, true)
-- 
2.15.0

  parent reply	other threads:[~2017-12-18  9:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-18  9:43 [PATCH 0/4] Reduce scheduler migrations due to wake_affine Mel Gorman
2017-12-18  9:43 ` [PATCH 1/4] sched: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache Mel Gorman
2017-12-18  9:43 ` [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker Mel Gorman
2017-12-18 10:26   ` Peter Zijlstra
2017-12-18 10:56     ` Mel Gorman
2017-12-18 10:59       ` Peter Zijlstra
2017-12-18  9:43 ` [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU Mel Gorman
2017-12-19 19:06   ` Peter Zijlstra
2017-12-19 20:25     ` Mel Gorman
2017-12-20  4:09     ` Mike Galbraith
2017-12-20  4:21       ` Mike Galbraith
2017-12-18  9:43 ` Mel Gorman [this message]
2017-12-18 10:44   ` [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU Mike Galbraith
2017-12-18 11:25     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171218094327.19562-5-mgorman@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matt@codeblueprint.co.uk \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.