From: Peter Zijlstra <peterz@infradead.org>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Paul Jackson <pj@sgi.com>
Subject: Re: [PATCH 1/2] Customize sched domain via cpuset
Date: Tue, 01 Apr 2008 13:48:11 +0200
Message-ID: <1207050491.8514.710.camel@twins>
In-Reply-To: <47F21BE3.5030705@jp.fujitsu.com>
Adding CCs (highly recommended to CC at least the subsystem maintainers
of the stuff you touch :-)
On Tue, 2008-04-01 at 20:26 +0900, Hidetoshi Seto wrote:
> Hi all,
>
> Using cpusets, we can now partition the system into multiple sched domains.
> Then, how about providing different characteristics for each domain?
>
> This patch introduces a new cpuset feature: sched domain customization.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> ---
> Documentation/cpusets.txt | 89 ++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 87 insertions(+), 2 deletions(-)
>
> Index: GIT-torvalds/Documentation/cpusets.txt
> ===================================================================
> --- GIT-torvalds.orig/Documentation/cpusets.txt
> +++ GIT-torvalds/Documentation/cpusets.txt
> @@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon
> Modified by Paul Jackson <pj@sgi.com>
> Modified by Christoph Lameter <clameter@sgi.com>
> Modified by Paul Menage <menage@google.com>
> +Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> CONTENTS:
> =========
> @@ -20,7 +21,8 @@ CONTENTS:
> 1.5 What is memory_pressure ?
> 1.6 What is memory spread ?
> 1.7 What is sched_load_balance ?
> - 1.8 How do I use cpusets ?
> + 1.8 What are other sched_* files ?
> + 1.9 How do I use cpusets ?
> 2. Usage Examples and Syntax
> 2.1 Basic Usage
> 2.2 Adding/removing cpus
> @@ -497,7 +499,90 @@ the cpuset code to update these sched do
> partition requested with the current, and updates its sched domains,
> removing the old and adding the new, for each change.
>
> -1.8 How do I use cpusets ?
> +1.8 What are other sched_* files ?
> +----------------------------------
> +
> +As described in 1.7, cpusets allow you to partition the system's CPUs
> +into a number of sched domains. Each sched domain is load balanced
> +independently, in a traditional way that is designed to be good for
> +typical systems.
> +
> +But you may want to customize the behavior of load balancing for your
> +special system. For this purpose, cpuset provides some files named
> +sched_* to customize the sched domain of the cpuset for a special
> +situation, e.g. a specific application on a special system.
> +
> +These files are per-cpuset and affect the sched domain to which the
> +cpuset belongs. If multiple cpusets overlap and hence form a single
> +sched domain, changes in one of them affect the others.
> +If the "sched_load_balance" flag of a cpuset is disabled, sched_* files
> +have no effect since no sched domain belongs to the cpuset.
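
The overlap rule in this paragraph may be easier to follow with a small model. The following is only a toy sketch in Python, not the kernel's partitioning code: it assumes each cpuset is a plain, non-empty set of CPU numbers, and the function name and the cpuset names are made up for illustration. Cpusets that share a CPU end up in the same group, which is why a sched_* change in one of them would be visible to the others.

```python
def partition_domains(cpusets):
    """Merge overlapping CPU sets into disjoint groups (toy model only)."""
    domains = []
    for cpus in cpusets.values():
        if not cpus:
            continue  # assumption: an empty cpuset contributes no domain
        merged = set(cpus)
        untouched = []
        for dom in domains:
            if dom & merged:
                merged |= dom          # shares a CPU: fold into one group
            else:
                untouched.append(dom)  # disjoint: keep as a separate group
        domains = untouched + [merged]
    return sorted(domains, key=min)


# Hypothetical cpusets "a" and "b" share CPU 1, so they form one group.
print(partition_domains({"a": {0, 1}, "b": {1, 2}, "c": {4, 5}}))
# -> [{0, 1, 2}, {4, 5}]
```
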
> +
> +Note that modifying sched_* files can have both good and bad effects,
> +and whether the result is acceptable depends on your situation.
> +Don't modify these files if you are not sure of the effect.
> +
> +1.8.1 What is sched_wake_idle_far ?
> +-----------------------------------
> +
> +When a task is woken up, the scheduler tries to wake it up on an idle CPU.
> +
> +For example, if a task A running on CPU X activates another task B
> +on the same CPU X, and if CPU Y is X's sibling and is idle,
> +then the scheduler migrates task B to CPU Y so that task B can start
> +on CPU Y without waiting for task A on CPU X.
> +
> +However, the scheduler doesn't search the whole system; by default it
> +searches only nearby siblings. Assume CPU Z is relatively far from CPU X.
> +Even if CPU Z is idle while CPU X and its siblings are busy, the scheduler
> +can't migrate woken task B from X to Z. As a result, task B on CPU X
> +needs to wait for task A or for load balancing on the next tick. For some
> +special applications, waiting 1 tick is too long.
> +
> +The main reason why the scheduler limits the idle-CPU search to a
> +range as small as "siblings in the socket" is that it saves
> +searching cost and migration cost. Nowadays siblings share
> +resources such as CPU caches, so this limit can
> +save some migration cost, assuming that those resources still hold
> +enough unexpired data for the migrating task. Usually this assumption
> +holds, but it is not guaranteed.
> +
> +When the flag 'sched_wake_idle_far' is enabled, this searching range
> +is expanded to all CPUs in the sched domain of the cpuset.
> +
> +If this flag were enabled in the CPU Z example given above, the
> +scheduler could find CPU Z at some extra searching cost, and
> +migrate task B to CPU Z at some extra migration cost.
> +In exchange for these costs, task B can start relatively quickly.
> +
> +If your situation is:
> + - The migration costs between CPUs can be assumed to be considerably
> + small (for you) due to your special application's behavior or
> + special hardware support for CPU cache etc.
> + - The searching cost has no impact (for you), or you can make
> + the searching cost small enough by keeping the cpuset compact etc.
> + - Low latency is required even if it sacrifices cache hit rate etc.
> +then turning on 'sched_wake_idle_far' would benefit you.
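
For readers of this section, the X/Y/Z example could be modelled roughly as below. This is a simplified sketch in Python and not the scheduler's actual wake-up path: the function, its parameters and the CPU numbering are hypothetical, sibling and domain membership are plain sets, and search and migration costs are not modelled at all.

```python
def pick_wake_cpu(waker, idle, siblings, domain, wake_idle_far=False):
    """Return the CPU where a woken task would start (toy model only).

    waker    -- CPU running the waking task (CPU X in the text)
    idle     -- set of currently idle CPUs
    siblings -- dict: CPU -> set of nearby CPUs, including itself
    domain   -- set of all CPUs in the cpuset's sched domain
    """
    # Default behaviour: look only at the waker's nearby siblings.
    search = sorted(siblings[waker] - {waker})
    if wake_idle_far:
        # With the flag set, continue with the rest of the sched domain.
        search += sorted(domain - siblings[waker])
    for cpu in search:
        if cpu in idle:
            return cpu
    return waker  # nothing idle in range: the task waits on the waker's CPU


# X = 0, its sibling Y = 1 is busy, far-away Z = 3 is idle.
siblings = {0: {0, 1}}
domain = {0, 1, 2, 3}
print(pick_wake_cpu(0, {3}, siblings, domain))                      # -> 0
print(pick_wake_cpu(0, {3}, siblings, domain, wake_idle_far=True))  # -> 3
```

Searching the siblings first even when the flag is set is a choice made for this sketch; the text above only says that the range is expanded.
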
> +
> +1.8.2 What is sched_balance_newidle_far ?
> +-----------------------------------------
> +
> +If a CPU runs out of tasks in its runqueue, the CPU tries to pull extra
> +tasks from other busy CPUs to help them before it goes idle.
> +
> +Of course it takes some searching cost to find movable tasks, so the
> +scheduler might not search all CPUs in the system. For example,
> +the range is limited to the socket or node where the CPU is located.
> +
> +When the flag 'sched_balance_newidle_far' is enabled, this range
> +is expanded to all CPUs in the sched domain of the cpuset.
> +
> +The situation in which this flag is worth considering is almost the same
> +as that of 'sched_wake_idle_far'. If you would like to gain better
> +latency and a higher operating ratio at the expense of some other benefits,
> +then enable this flag.
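
The same kind of toy model can illustrate the newidle case. Again this is only a Python sketch under stated assumptions, not the kernel's load balancer: the function and parameter names are invented, a CPU counts as having a movable task when its runqueue length is greater than 1, and the "near" range stands in for the socket or node mentioned above.

```python
def newidle_pull_source(cpu, runqueue_len, near, domain,
                        balance_newidle_far=False):
    """Pick the CPU that a newly idle CPU pulls a task from (toy model only).

    Returns None when no CPU in the searched range has a waiting task.
    """
    candidates = set(domain) if balance_newidle_far else set(near[cpu])
    candidates.discard(cpu)
    # Assumption: length > 1 means one running task plus at least one waiter.
    busy = [c for c in candidates if runqueue_len[c] > 1]
    if not busy:
        return None
    # Pull from the longest runqueue; ties go to the lowest CPU number.
    return max(busy, key=lambda c: (runqueue_len[c], -c))


near = {0: {0, 1}}                    # CPU 0 and its nearby CPU 1
domain = {0, 1, 2, 3}
rq = {0: 0, 1: 1, 2: 3, 3: 1}         # only CPU 2 has waiting tasks
print(newidle_pull_source(0, rq, near, domain))                            # -> None
print(newidle_pull_source(0, rq, near, domain, balance_newidle_far=True))  # -> 2
```
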
> +
> +1.9 How do I use cpusets ?
> --------------------------
>
> In order to minimize the impact of cpusets on critical kernel