public inbox for linux-kernel@vger.kernel.org
From: Paul Jackson <pj@sgi.com>
To: dino@in.ibm.com
Cc: Simon.Derr@bull.net, nickpiggin@yahoo.com.au,
	linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
	akpm@osdl.org, dipankar@in.ibm.com, colpatch@us.ibm.com
Subject: Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets
Date: Sat, 23 Apr 2005 15:30:59 -0700
Message-ID: <20050423153059.0ab5fdc2.pj@sgi.com>
In-Reply-To: <20050421162738.GA4200@in.ibm.com>

Dinakar wrote:
>   cpuset               cpus       isolated   cpus_allowed   isolated_map
> top                 0-7               1           0             0-7

The top cpuset holds the kernel threads that are pinned to a particular
cpu or node.  It's not right that their cpuset's cpus_allowed is empty,
which is what I guess the "0" in the cpus_allowed column above means.
(Even if the "0" means CPU 0, that still conflicts with kernel threads
on CPUs 1-7.)

We might get away with it on cpus, because we don't change the task's
cpus_allowed to match the cpuset's cpus_allowed (we don't call
set_cpus_allowed, from kernel/cpuset.c) _except_ when someone rebinds
that task to its cpuset by writing its pid into the cpuset's tasks file.
So for as long as no one tries to rebind the per-cpu or per-node
kernel threads, no one will notice that they are in a cpuset with an
empty cpus_allowed.
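
To make the lazy CPU-side behavior concrete, here is a minimal toy model
in Python.  It is a sketch, not kernel code: the classes, set_cpus() and
write_pid_to_tasks_file() are hypothetical names invented for
illustration, and what the real kernel does when asked to rebind a task
to an empty mask is not modelled.

```python
# Toy model only. Names echo the prose above; behavior is simplified.

class Cpuset:
    def __init__(self, name, cpus_allowed):
        self.name = name
        self.cpus_allowed = set(cpus_allowed)

    def set_cpus(self, cpus):
        # Changing the cpuset's mask does NOT touch tasks already in it.
        self.cpus_allowed = set(cpus)


class Task:
    def __init__(self, comm, cpus_allowed, cpuset):
        self.comm = comm
        self.cpus_allowed = set(cpus_allowed)
        self.cpuset = cpuset


def write_pid_to_tasks_file(task, cpuset):
    # The one place this model copies the cpuset's mask into the task,
    # standing in for the set_cpus_allowed() call mentioned above.
    task.cpuset = cpuset
    task.cpus_allowed = set(cpuset.cpus_allowed)


top = Cpuset("top", range(8))
ksoftirqd3 = Task("ksoftirqd/3", {3}, top)  # per-cpu thread pinned to CPU 3

top.set_cpus([])                            # top's cpus_allowed is emptied
stale = set(ksoftirqd3.cpus_allowed)        # task mask unchanged so far

write_pid_to_tasks_file(ksoftirqd3, top)    # someone rebinds the thread
rebound = set(ksoftirqd3.cpus_allowed)      # task mask now follows the cpuset
```

In this model the thread keeps its old mask until the rebind, which is
why the empty cpus_allowed can go unnoticed for a while.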

This won't even work that well on the memory side, where we resync
a task with its cpuset anytime that a task goes to allocate memory
(if it can WAIT and it is not in interrupt) and we notice that someone
has bumped the mems_generation for its cpuset.
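
The memory-side resync can be sketched the same way.  Again this is a
toy Python model under stated assumptions, not the kernel's code:
set_mems(), alloc() and seen_generation are hypothetical names; only
mems_allowed and mems_generation come from the text above.

```python
# Toy model only: a task caches its cpuset's memory mask and a generation
# number, and refreshes the cache when it next allocates from a context
# that may wait and is not in interrupt.

class Cpuset:
    def __init__(self, mems_allowed):
        self.mems_allowed = set(mems_allowed)
        self.mems_generation = 1

    def set_mems(self, mems):
        self.mems_allowed = set(mems)
        self.mems_generation += 1   # "bump" so tasks can detect the change


class Task:
    def __init__(self, cpuset):
        self.cpuset = cpuset
        self.mems_allowed = set(cpuset.mems_allowed)
        self.seen_generation = cpuset.mems_generation

    def alloc(self, can_wait=True, in_interrupt=False):
        # Resync only if we may wait, are not in interrupt, and the
        # cpuset's generation has moved since we last looked.
        if can_wait and not in_interrupt:
            if self.seen_generation != self.cpuset.mems_generation:
                self.mems_allowed = set(self.cpuset.mems_allowed)
                self.seen_generation = self.cpuset.mems_generation
        return set(self.mems_allowed)   # nodes this allocation may use


top = Cpuset(mems_allowed={0, 1})
kthread = Task(top)

top.set_mems(set())                          # top's mems_allowed is emptied
atomic_view = kthread.alloc(can_wait=False)  # cannot wait: no resync
normal_view = kthread.alloc()                # ordinary allocation: resync
```

Unlike the cpu side, no explicit rebind is needed here: the first
ordinary allocation after the bump picks up the emptied mask.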

In other words, I strongly suspect that:

 1) The top cpuset should allow all cpus, all memory nodes.
 2) The way to assure that one task can't have its cpu or memory stolen
    by another is to put the other tasks in cpusets that don't overlap.
 3) The wrong way to assure this is by refusing to have any other cpusets
    that have overlapping cpus_allowed or mems_allowed.
 4) There are some tasks that _do_ need to run on the same cpus as
    the tasks you would assign to isolated cpusets.  These kernel threads,
    such as the migration and ksoftirqd threads, must be set up
    well before user code is run that can configure job specific isolated
    cpusets, so these tasks need a cpuset to run in that can be created
    during the system boot, before init (pid == 1) starts up.  This cpuset
    is the top cpuset.
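
Points 1 through 4 can be illustrated with a small Python sketch.  The
8-cpu layout, the job names and the overlapping_pairs() helper are all
hypothetical, chosen only to show that job cpusets can be disjoint from
each other while the top cpuset still spans every cpu.

```python
from itertools import combinations

ALL_CPUS = set(range(8))   # assumed 8-cpu system, as in the quoted table


def overlapping_pairs(cpusets):
    """Return the name pairs of cpusets whose cpu sets intersect."""
    items = sorted(cpusets.items())
    return [(a, b) for (a, ca), (b, cb) in combinations(items, 2) if ca & cb]


top = ALL_CPUS             # point 1: top allows all cpus
jobs = {                   # point 2: job cpusets that do not overlap
    "job_a": {0, 1, 2, 3},
    "job_b": {4, 5},
    "job_c": {6, 7},
}

job_overlaps = overlapping_pairs(jobs)
covered_by_top = all(cpus <= top for cpus in jobs.values())

# Point 4: top necessarily overlaps every job cpuset, because the per-cpu
# kernel threads that live in it must run on those same cpus.
with_top = overlapping_pairs({"top": top, **jobs})
```

Here the jobs are kept apart by where their tasks are placed, not by
forbidding the overlap between top and each job cpuset.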

My users are successfully managing what tasks can use what cpu or memory
resources by controlling which tasks are in which cpusets.  They do not
require the ability to disable allowed cpus or memory nodes in other cpusets
to do this.  It is not entirely clear to me that they even require the
minimal cpu_exclusive/mem_exclusive facility that is there now.

I don't understand why what's there now isn't sufficient.  I don't see
that this patch provides any capability that you can't get just by
properly placing tasks in cpusets that have the desired cpus and nodes.
This patch leaves the per-cpu kernel threads with no cpuset that allows
what they need, and it complicates the semantics of things, in ways that
I still don't entirely understand.

Earlier you wrote:
> 1. I need a method to isolate a random set of cpus in such a way that
>    only the set of processes that are specifically assigned can
>    make use of these CPUs

I don't see why you need this.  Nor do I think it is possible.

You don't need to isolate a set of cpus; you need to isolate a set of
processes.  So long as you can create non-overlapping cpusets, and
assign processes to them, I don't see where it matters that you cannot
prohibit the creation of overlapping cpusets, or in the case of the top
cpuset, why it matters that you cannot _disallow_ allowed cpus
or memory nodes in existing cpusets.

And this is not possible because at least the kernel per-cpu threads
_do_ need to run on each cpu in the system, including those cpus you
would isolate.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

