From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Gautham R Shenoy <ego@in.ibm.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Pavel Emelyanov <xemul@openvz.org>,
Herbert Poetzl <herbert@13thfloor.at>,
Avi Kivity <avi@redhat.com>, Chris Friesen <cfriesen@nortel.com>,
Paul Menage <menage@google.com>,
Mike Waychison <mikew@google.com>
Subject: [RFC v2 PATCH 0/8] CFS Hard limits - v2
Date: Wed, 30 Sep 2009 18:19:20 +0530
Message-ID: <20090930124919.GA19951@in.ibm.com>
Hi,
Here is the v2 post of the hard limits feature for the CFS group scheduler.
This RFC post mainly adds a runtime borrowing feature and a new locking scheme
to protect the CFS runtime related fields.
It would be nice to have some comments on this set!
Changes
-------
RFC v2:
- Upgraded to 2.6.31.
- Added CFS runtime borrowing.
- New locking scheme
  The hard limit specific fields of cfs_rq (cfs_runtime, cfs_time and
  cfs_throttled) were being protected by rq->lock. This simple scheme will
  not work once runtime rebalancing is introduced, because rebalancing has
  to look at these fields on other CPUs, which would require acquiring the
  rq->lock of those CPUs. That is not feasible from update_curr().
  Hence introduce a separate lock (rq->runtime_lock) to protect these
  fields of all cfs_rq under it.
- Handle the task wakeup in a throttled group correctly.
- Make CFS_HARD_LIMITS dependent on CGROUP_SCHED (Thanks to Andrea Righi)
RFC v1:
- First version of the patches with minimal features was posted at
http://lkml.org/lkml/2009/8/25/128
RFC v0:
- The CFS hard limits proposal was first posted at
http://lkml.org/lkml/2009/6/4/24
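To illustrate the v2 locking change, here is a minimal user-space sketch. It is
not code from the patches. The field names cfs_runtime, cfs_time and
cfs_throttled and the lock name runtime_lock come from the changelog above.
Everything else is an assumption made for illustration: the struct and function
names, pthread mutexes standing in for kernel spinlocks, the "throttle once
cfs_time exceeds cfs_runtime" rule, and the per-period reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the hard limit fields named in the changelog. */
struct cfs_rq_model {
	uint64_t cfs_runtime;	/* runtime allowed in one period */
	uint64_t cfs_time;	/* runtime consumed in the current period */
	bool cfs_throttled;
};

struct rq_model {
	pthread_mutex_t lock;		/* stands in for rq->lock, unused below */
	pthread_mutex_t runtime_lock;	/* stands in for rq->runtime_lock */
	struct cfs_rq_model cfs;
};

/*
 * Charge delta units of execution time, as an update_curr()-like path
 * might. Only runtime_lock is taken. Returns true if the group is
 * throttled afterwards (assumed rule: cfs_time > cfs_runtime).
 */
static bool account_runtime(struct rq_model *rq, uint64_t delta)
{
	bool throttled;

	pthread_mutex_lock(&rq->runtime_lock);
	rq->cfs.cfs_time += delta;
	if (rq->cfs.cfs_time > rq->cfs.cfs_runtime)
		rq->cfs.cfs_throttled = true;
	throttled = rq->cfs.cfs_throttled;
	pthread_mutex_unlock(&rq->runtime_lock);
	return throttled;
}

/*
 * What another CPU could do when looking for spare runtime: it reads
 * the fields under runtime_lock and never touches this rq's main lock.
 */
static uint64_t peek_spare_runtime(struct rq_model *rq)
{
	uint64_t spare = 0;

	pthread_mutex_lock(&rq->runtime_lock);
	if (rq->cfs.cfs_time < rq->cfs.cfs_runtime)
		spare = rq->cfs.cfs_runtime - rq->cfs.cfs_time;
	pthread_mutex_unlock(&rq->runtime_lock);
	return spare;
}

/* Assumed period boundary handling: reset consumption and unthrottle. */
static void refresh_period(struct rq_model *rq)
{
	pthread_mutex_lock(&rq->runtime_lock);
	rq->cfs.cfs_time = 0;
	rq->cfs.cfs_throttled = false;
	pthread_mutex_unlock(&rq->runtime_lock);
}
```

The point of the sketch is only that the remote reader never needs the main
runqueue lock, which is the constraint the changelog describes.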
Testing and Benchmark numbers
-----------------------------
- This patchset has seen very minimal testing on a 24-way machine and is
  expected to have bugs. I need to test this under more test scenarios.
- I have run a few common benchmarks to see if my patches introduce any visible
  overhead. I am aware that the number of runs or the combinations I have
  used may not be ideal, but the intention at this early stage is to catch any
  serious regressions that the patches might have introduced.
- I plan to get numbers from more benchmarks in future releases. Any inputs
  on specific benchmarks to try would be helpful.
- hackbench (hackbench -pipe N)
  (hackbench was run as part of a group under the root group)
-----------------------------------------------------------------------
                                 Time
       ----------------------------------------------------------------
N      CFS_HARD_LIMITS=n     CFS_HARD_LIMITS=y     CFS_HARD_LIMITS=y
                             (infinite runtime)    (BW=450000/500000)
-----------------------------------------------------------------------
10     0.475                 0.384                 0.253
20     0.610                 0.670                 0.692
50     1.250                 1.201                 1.295
100    1.981                 2.174                 1.583
-----------------------------------------------------------------------
- BW = Bandwidth = runtime/period
- Infinite runtime means no hard limiting
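As a worked example of the BW definition above: BW=450000/500000 means the
group may run for 90% of each period. A minimal sketch of that arithmetic
follows; the helper name bandwidth_percent() is hypothetical and is not part
of the patchset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: express BW = runtime/period as a whole
 * percentage. Both arguments must be in the same time unit.
 */
static unsigned int bandwidth_percent(uint64_t runtime, uint64_t period)
{
	assert(period > 0);
	return (unsigned int)(runtime * 100 / period);
}
```

With runtime=450000 and period=500000 the helper yields 90.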
- lmbench (lat_ctx -N 5 -s <size_in_kb> N)
(i) size_in_kb = 1024
-----------------------------------------------------------------------
                        Context switch time (us)
       ----------------------------------------------------------------
N      CFS_HARD_LIMITS=n     CFS_HARD_LIMITS=y     CFS_HARD_LIMITS=y
                             (infinite runtime)    (BW=450000/500000)
-----------------------------------------------------------------------
10     315.87                330.19                317.04
100    675.52                699.90                698.50
500    775.01                772.86                772.30
-----------------------------------------------------------------------
(ii) size_in_kb = 2048
-----------------------------------------------------------------------
                        Context switch time (us)
       ----------------------------------------------------------------
N      CFS_HARD_LIMITS=n     CFS_HARD_LIMITS=y     CFS_HARD_LIMITS=y
                             (infinite runtime)    (BW=450000/500000)
-----------------------------------------------------------------------
10     1319.01               1332.16               1328.09
100    1400.77               1372.67               1382.27
500    1479.40               1524.57               1615.84
-----------------------------------------------------------------------
- kernbench
Average Half load -j 12 Run (std deviation):
------------------------------------------------------------------------------
          CFS_HARD_LIMITS=n     CFS_HARD_LIMITS=y     CFS_HARD_LIMITS=y
                                (infinite runtime)    (BW=450000/500000)
------------------------------------------------------------------------------
Elapsed   5.716 (0.278711)      6.06 (0.479322)       5.41 (0.360694)
User      20.464 (2.22087)      22.978 (3.43738)      18.486 (2.60754)
System    14.82 (1.52086)       16.68 (2.3438)        13.514 (1.77074)
% CPU     615.2 (41.1667)       651.6 (43.397)        588.4 (42.0214)
CtxSwt    2727.8 (243.19)       3030.6 (425.338)      2536 (302.498)
Sleeps    4981.4 (442.337)      5532.2 (847.27)       4554.6 (510.532)
------------------------------------------------------------------------------
Average Optimal load -j 96 Run (std deviation):
------------------------------------------------------------------------------
          CFS_HARD_LIMITS=n     CFS_HARD_LIMITS=y     CFS_HARD_LIMITS=y
                                (infinite runtime)    (BW=450000/500000)
------------------------------------------------------------------------------
Elapsed   4.826 (0.276641)      4.776 (0.291599)      5.13 (0.50448)
User      21.278 (2.67999)      22.138 (3.2045)       21.988 (5.63116)
System    19.213 (5.38314)      19.796 (4.32574)      20.407 (8.53682)
% CPU     778.3 (184.522)       786.1 (154.295)       803.1 (244.865)
CtxSwt    2906.5 (387.799)      3052.1 (397.15)       3030.6 (765.418)
Sleeps    4576.6 (565.383)      4796 (990.278)        4576.9 (625.933)
------------------------------------------------------------------------------
Average Maximal load -j Run (std deviation):
------------------------------------------------------------------------------
          CFS_HARD_LIMITS=n     CFS_HARD_LIMITS=y     CFS_HARD_LIMITS=y
                                (infinite runtime)    (BW=450000/500000)
------------------------------------------------------------------------------
Elapsed   5.13 (0.530236)       5.062 (0.0408656)     4.94 (0.229891)
User      22.7293 (4.37921)     22.9973 (2.86311)     22.5507 (4.78016)
System    21.966 (6.81872)      21.9713 (4.72952)     22.0287 (7.39655)
% CPU     860 (202.295)         859.8 (164.415)       864.467 (218.721)
CtxSwt    3154.27 (659.933)     3172.93 (370.439)     3127.2 (657.224)
Sleeps    4602.6 (662.155)      4676.67 (813.274)     4489.2 (542.859)
------------------------------------------------------------------------------
Features TODO
-------------
- CFS runtime borrowing still needs some work; in particular, runtime
  redistribution when a CPU goes offline needs to be handled.
- Bandwidth inheritance support (long term, not under consideration currently)
- This implementation doesn't work for the user group scheduler. Since the user
  group scheduler will eventually go away, I don't plan to work on this.
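For readers unfamiliar with runtime borrowing, here is a rough user-space
sketch of the idea. It is loosely modeled on how the RT group scheduler
balances rt_runtime between CPUs; whether patch 7/8 follows the same steps is
an assumption, since that patch is not shown in this mail. The struct
cpu_runtime, the function borrow_runtime() and the "take 1/ncpus of each
neighbour's spare runtime, capped at one full period" policy are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU view of one group's hard limit state. */
struct cpu_runtime {
	uint64_t cfs_runtime;	/* runtime this CPU may use per period */
	uint64_t cfs_time;	/* runtime already consumed this period */
};

/*
 * Let CPU 'self' borrow from the unused runtime of the other CPUs.
 * Each donor gives up 1/ncpus of its spare runtime, and the borrower
 * never ends up with more than one full period. The total runtime
 * across all CPUs is conserved. Returns the amount borrowed.
 */
static uint64_t borrow_runtime(struct cpu_runtime *rqs, size_t ncpus,
			       size_t self, uint64_t period)
{
	uint64_t before, diff;
	size_t i;

	assert(ncpus > 0 && self < ncpus);
	before = rqs[self].cfs_runtime;

	for (i = 0; i < ncpus; i++) {
		if (i == self)
			continue;
		if (rqs[self].cfs_runtime >= period)
			break;		/* already at the cap */
		if (rqs[i].cfs_time >= rqs[i].cfs_runtime)
			continue;	/* donor has nothing to spare */

		diff = (rqs[i].cfs_runtime - rqs[i].cfs_time) / ncpus;
		if (rqs[self].cfs_runtime + diff > period)
			diff = period - rqs[self].cfs_runtime;

		rqs[i].cfs_runtime -= diff;
		rqs[self].cfs_runtime += diff;
	}
	return rqs[self].cfs_runtime - before;
}
```

The CPU offline case listed above would need the reverse operation (handing
the departing CPU's runtime back to the others), which this sketch does not
attempt.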
Implementation TODO
-------------------
- It is possible to share some of the bandwidth handling code with RT, but
  the intention of this post is to show the changes associated with hard limits.
  Hence the sharing/cleanup will be done down the line when this patchset
  itself becomes more acceptable.
- When a dequeued entity is enqueued back, I don't change its vruntime. The
  entity might get an undue advantage due to its old (lower) vruntime. Need to
  address this.
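One possible way to address the stale vruntime issue, shown only as a sketch
and not as what this patchset does: when the entity is enqueued back, raise
its vruntime to at least the runqueue's min_vruntime. max_vruntime() below
mirrors the wrap-safe comparison used in kernel/sched_fair.c; the function
place_on_unthrottle() and the clamping policy itself are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wrap-safe maximum of two vruntime values: the difference is
 * interpreted as signed so the comparison survives u64 wraparound.
 */
static uint64_t max_vruntime(uint64_t max_vruntime, uint64_t vruntime)
{
	int64_t delta = (int64_t)(vruntime - max_vruntime);

	if (delta > 0)
		max_vruntime = vruntime;
	return max_vruntime;
}

/*
 * Hypothetical placement rule for an entity coming back from a
 * throttled state: never let it start behind min_vruntime, so time
 * spent throttled does not turn into a scheduling advantage.
 */
static uint64_t place_on_unthrottle(uint64_t se_vruntime,
				    uint64_t min_vruntime)
{
	return max_vruntime(se_vruntime, min_vruntime);
}
```

An entity that is already ahead of min_vruntime keeps its own value, so the
rule only affects entities whose vruntime went stale while dequeued.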
Patches description
-------------------
This post has the following patches:
1/8 sched: Rename sched_rt_period_mask() and use it in CFS also
2/8 sched: Maintain aggregated tasks count in cfs_rq at each hierarchy level
3/8 sched: Bandwidth initialization for fair task groups
4/8 sched: Enforce hard limits by throttling
5/8 sched: Unthrottle the throttled tasks
6/8 sched: Add throttle time statistics to /proc/sched_debug
7/8 sched: CFS runtime borrowing
8/8 sched: Hard limits documentation
 Documentation/scheduler/sched-cfs-hard-limits.txt |   52 ++
 include/linux/sched.h                             |    9
 init/Kconfig                                      |   13
 kernel/sched.c                                    |  427 +++++++++++++++++++
 kernel/sched_debug.c                              |   21
 kernel/sched_fair.c                               |  432 +++++++++++++++++++-
 kernel/sched_rt.c                                 |   22 -
 7 files changed, 932 insertions(+), 44 deletions(-)
Regards,
Bharata.