From: tip-bot for Venkatesh Pallipadi <venki@google.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@redhat.com,
	a.p.zijlstra@chello.nl, riel@redhat.com, tglx@linutronix.de,
	venki@google.com, mingo@elte.hu
Subject: [tip:sched/core] sched: Next buddy hint on sleep and preempt path
Date: Tue, 19 Apr 2011 12:05:41 GMT
Message-ID: <tip-2f36825b176f67e5c5228aa33d828bc39718811f@git.kernel.org>
In-Reply-To: <1302802253-25760-1-git-send-email-venki@google.com>
Commit-ID: 2f36825b176f67e5c5228aa33d828bc39718811f
Gitweb: http://git.kernel.org/tip/2f36825b176f67e5c5228aa33d828bc39718811f
Author: Venkatesh Pallipadi <venki@google.com>
AuthorDate: Thu, 14 Apr 2011 10:30:53 -0700
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 19 Apr 2011 10:08:38 +0200
sched: Next buddy hint on sleep and preempt path
When a task in a taskgroup sleeps, pick_next_task starts all the way back at
the root and picks the task/taskgroup with the min vruntime across all
runnable tasks.
But when there are many frequently sleeping tasks across different taskgroups,
it makes more sense to stay with the same taskgroup for its slice period (or
until all tasks in the taskgroup sleep) instead of switching across taskgroups
on each sleep after a short runtime.
This helps specifically where a taskgroup corresponds to a process with
multiple threads. The change reduces the number of CR3 switches in this case.
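To make the intended bias concrete, here is a minimal toy model in C, not kernel code. The names (toy_group, pick_min_vruntime, pick_with_buddy, gran) are hypothetical, and the rule that a hinted group wins only while it is no more than gran behind the leftmost group is an assumption loosely modeled on how CFS treats its next buddy.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a group's scheduling state; not the kernel's types. */
struct toy_group {
	unsigned long long vruntime;	/* virtual runtime of the group */
	int nr_runnable;		/* runnable tasks left in the group */
};

/* Plain pick: index of the runnable group with the smallest vruntime, or -1. */
static int pick_min_vruntime(const struct toy_group *g, size_t n)
{
	int best = -1;

	assert(g != NULL);
	for (size_t i = 0; i < n; i++) {
		if (g[i].nr_runnable == 0)
			continue;
		if (best < 0 || g[i].vruntime < g[best].vruntime)
			best = (int)i;
	}
	return best;
}

/*
 * Biased pick: prefer the hinted group (buddy) if it is still runnable
 * and at most gran ahead of the leftmost group in vruntime.
 * Pass buddy = -1 for "no hint".
 */
static int pick_with_buddy(const struct toy_group *g, size_t n,
			   int buddy, unsigned long long gran)
{
	int left = pick_min_vruntime(g, n);

	if (left < 0)
		return -1;
	if (buddy >= 0 && (size_t)buddy < n && g[buddy].nr_runnable > 0 &&
	    g[buddy].vruntime <= g[left].vruntime + gran)
		return buddy;
	return left;
}
```

With group 0 at vruntime 100 and group 1 at vruntime 90, the plain pick returns group 1, while a hint for group 0 with gran = 20 keeps the CPU on group 0. The bounded granularity is what keeps the hint from starving the other group.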
Example:
Two taskgroups with 2 threads each, where each thread runs for 2ms and
sleeps for 1ms. Looking at sched:sched_switch shows:
BEFORE: taskgroup_1 threads [5004, 5005], taskgroup_2 threads [5016, 5017]
cpu-soaker-5004 [003] 3683.391089
cpu-soaker-5016 [003] 3683.393106
cpu-soaker-5005 [003] 3683.395119
cpu-soaker-5017 [003] 3683.397130
cpu-soaker-5004 [003] 3683.399143
cpu-soaker-5016 [003] 3683.401155
cpu-soaker-5005 [003] 3683.403168
cpu-soaker-5017 [003] 3683.405170
AFTER: taskgroup_1 threads [21890, 21891], taskgroup_2 threads [21934, 21935]
cpu-soaker-21890 [003] 865.895494
cpu-soaker-21935 [003] 865.897506
cpu-soaker-21934 [003] 865.899520
cpu-soaker-21935 [003] 865.901532
cpu-soaker-21934 [003] 865.903543
cpu-soaker-21935 [003] 865.905546
cpu-soaker-21891 [003] 865.907548
cpu-soaker-21890 [003] 865.909560
cpu-soaker-21891 [003] 865.911571
cpu-soaker-21890 [003] 865.913582
cpu-soaker-21891 [003] 865.915594
cpu-soaker-21934 [003] 865.917606
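The effect in the two excerpts can be quantified by counting how often consecutive sched_switch entries belong to different taskgroups. The sketch below does that for the PIDs quoted above; the helper names (group_of, cross_group_switches) are illustrative and the group membership is hard-coded from the BEFORE/AFTER headers.

```c
#include <assert.h>
#include <stddef.h>

/* Map a PID from the excerpts to its taskgroup; 0 means unknown. */
static int group_of(int pid)
{
	switch (pid) {
	case 5004: case 5005: case 21890: case 21891:
		return 1;
	case 5016: case 5017: case 21934: case 21935:
		return 2;
	default:
		return 0;
	}
}

/* Count adjacent trace entries whose tasks are in different taskgroups. */
static size_t cross_group_switches(const int *pids, size_t n)
{
	size_t count = 0;

	for (size_t i = 1; i < n; i++) {
		int prev = group_of(pids[i - 1]);
		int next = group_of(pids[i]);

		assert(prev != 0 && next != 0);
		if (prev != next)
			count++;
	}
	return count;
}

/* PIDs in the order they appear in the two excerpts. */
static const int before_trace[] = {
	5004, 5016, 5005, 5017, 5004, 5016, 5005, 5017,
};
static const int after_trace[] = {
	21890, 21935, 21934, 21935, 21934, 21935,
	21891, 21890, 21891, 21890, 21891, 21934,
};
```

On these excerpts the count is 7 cross-taskgroup switches out of 7 transitions BEFORE, and 3 out of 11 AFTER. The excerpts are short samples, so the ratio is indicative rather than a measurement.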
A similar problem arises when there are multiple taskgroups and, say, a task A
preempts the currently running task B of taskgroup_1. On schedule, pick_next_task
can pick an unrelated task in taskgroup_2. Here it would be better to give some
preference to task B on pick_next_task.
A simple (maybe extreme-case) benchmark I tried was tbench with 2 tbench
client processes, with 2 threads each, running on a single CPU. Average
throughput across five 50-second runs was:
BEFORE: 105.84 MB/sec
AFTER: 112.42 MB/sec
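For reference, the relative gain implied by these two averages can be worked out as below. The helper name relative_gain_percent is illustrative only.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double relative_gain_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Calling relative_gain_percent(105.84, 112.42) gives roughly 6.2%, that is, 6.58 MB/sec over the 105.84 MB/sec baseline.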
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302802253-25760-1-git-send-email-venki@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 501ab63..5280272 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1344,6 +1344,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	hrtick_update(rq);
 }

+static void set_next_buddy(struct sched_entity *se);
+
 /*
  * The dequeue_task method is called before nr_running is
  * decreased. We remove the task from the rbtree and
@@ -1353,14 +1355,22 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	int task_sleep = flags & DEQUEUE_SLEEP;

 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		dequeue_entity(cfs_rq, se, flags);

 		/* Don't dequeue parent if it has other entities besides us */
-		if (cfs_rq->load.weight)
+		if (cfs_rq->load.weight) {
+			/*
+			 * Bias pick_next to pick a task from this cfs_rq, as
+			 * p is sleeping when it is within its sched_slice.
+			 */
+			if (task_sleep && parent_entity(se))
+				set_next_buddy(parent_entity(se));
 			break;
+		}
 		flags |= DEQUEUE_SLEEP;
 	}

@@ -1877,12 +1887,15 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	struct sched_entity *se = &curr->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
 	int scale = cfs_rq->nr_running >= sched_nr_latency;
+	int next_buddy_marked = 0;

 	if (unlikely(se == pse))
 		return;

-	if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK))
+	if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK)) {
 		set_next_buddy(pse);
+		next_buddy_marked = 1;
+	}

 	/*
 	 * We can come here with TIF_NEED_RESCHED already set from new task
@@ -1910,8 +1923,15 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	update_curr(cfs_rq);
 	find_matching_se(&se, &pse);
 	BUG_ON(!pse);
-	if (wakeup_preempt_entity(se, pse) == 1)
+	if (wakeup_preempt_entity(se, pse) == 1) {
+		/*
+		 * Bias pick_next to pick the sched entity that is
+		 * triggering this preemption.
+		 */
+		if (!next_buddy_marked)
+			set_next_buddy(pse);
 		goto preempt;
+	}

 	return;
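The dequeue-path change can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the kernel implementation: toy_entity, toy_rq, toy_set_next_buddy and toy_dequeue are hypothetical names, the queue load is reduced to a simple entity count, and the buddy setter is assumed to mark the entity on its own queue and on every ancestor queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq;

/* Toy scheduling entity: a task, or a group queued on its parent's queue. */
struct toy_entity {
	struct toy_entity *parent;	/* group entity one level up, NULL at top */
	struct toy_rq *rq;		/* queue this entity sits on */
};

/* Toy run queue: a count of queued entities plus a next-buddy hint. */
struct toy_rq {
	int nr_queued;
	struct toy_entity *next;
};

/* Assumption: the hint is recorded at this level and every level above. */
static void toy_set_next_buddy(struct toy_entity *se)
{
	for (; se; se = se->parent)
		se->rq->next = se;
}

/*
 * Walk up from the task, dequeuing at each level. Stop at the first queue
 * that still holds other entities; if the task is going to sleep, hint
 * that the group owning that queue should be picked next.
 */
static void toy_dequeue(struct toy_entity *se, bool task_sleep)
{
	for (; se; se = se->parent) {
		struct toy_rq *rq = se->rq;

		assert(rq->nr_queued > 0);
		rq->nr_queued--;
		if (rq->nr_queued > 0) {
			if (task_sleep && se->parent)
				toy_set_next_buddy(se->parent);
			break;
		}
	}
}
```

In this model, when one of two threads in a group sleeps, the group stays queued on the top-level queue and becomes its next buddy, so the sibling thread is favoured on the next pick. When the last thread of a group sleeps, the group itself is dequeued and no hint is left behind.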