public inbox for linux-kernel@vger.kernel.org
From: Paul Jackson <pj@sgi.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: maxk@qualcomm.com, mingo@elte.hu, tglx@linutronix.de,
	oleg@tv-sign.ru, rostedt@goodmis.org,
	linux-kernel@vger.kernel.org, rientjes@google.com
Subject: Re: [RFC/PATCH] cpuset: cpuset irq affinities
Date: Thu, 6 Mar 2008 21:40:03 -0600
Message-ID: <20080306214003.aba81840.pj@sgi.com>
In-Reply-To: <1204816885.6241.265.camel@lappy>

Helpful reply, Peter.  Thanks.

Peter, replying to pj, wrote:
> On Thu, 2008-03-06 at 07:47 -0600, Paul Jackson wrote:
> > ... In your example,
> > the 'tasks' file in /cgroup/irqs is probably empty, right?
> 
> Likely; if for instance you'd want some unbound kernel threads to join
> in that overlapping set, then perhaps that name would be badly chosen.

Ok.  So, as you note below, discussing cgroups:
> ... yes, cgroups are perhaps awkward because they group tasks whereas
> the current problem is grouping IRQs.

Essentially, cpusets are like cgroups in this regard.  They group
tasks.  They just happen to be grouping tasks to associate them
with sets of CPUs (and Memory Nodes), which seems relevant somehow
to the present need, to group irqs to associate them with sets of
CPUs.



> Perhaps we're talking about something else here; how bad would it be to
> require:
> 
> for irq in `cat /cgroup/boot/irqs` ; do echo $irq > /cgroup/irqs; done

I haven't gotten my head around what such a script would do yet,
but you are correct in suspecting that we could add a script like
this easily enough in future releases, if that was useful.

I can change init scripts, for each kernel version, much easier than I
can ask the big batch scheduler providers to change their application
code (user level system code) to deal with incompatible changes.


> Certainly providing a new script in the new
> version certified for a new distro isn't too much work?

Correct - that's quite easy, from my perspective.

> > What in tarnation are you trying to do, that's painful or impossible
> > to do, with what we have now?
> 
> Assign a map of cpus where irqs will default into, and a way to
> explicitly move them out of it.

Can you spell out how or why /proc/irq/N/smp_affinity doesn't
provide what you need here?

My guess is that it's fairly obvious why /proc/irq/N/smp_affinity is
not well suited for this.  It requires poking lots of settings, one
at a time, which is cumbersome and racy from user space and difficult
to keep in sync with any other changes in placement of RT or other
jobs.  It also requires root permissions, with no finer-grained
access control being practical.
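
To make the "one at a time" point concrete, here is a minimal sketch of
what user space has to do today.  The function name and its optional
second argument (an alternate directory, useful for dry runs) are
hypothetical; only the /proc/irq/N/smp_affinity files and their hex
mask format come from the existing interface.

```shell
# set_irq_affinity MASK [DIR]
# Write the hex CPU mask MASK to every DIR/N/smp_affinity file.
# DIR defaults to /proc/irq; passing another directory is an
# assumption made here so the loop can be tried without root.
set_irq_affinity() {
    mask=$1
    root=${2:-/proc/irq}
    for dir in "$root"/[0-9]*; do
        [ -e "$dir/smp_affinity" ] || continue
        # Each write is a separate step, so IRQs are moved one by one
        # and anything else may change placement in between.
        if ! echo "$mask" > "$dir/smp_affinity"; then
            echo "irq ${dir##*/}: could not set affinity" >&2
        fi
    done
}
```

Running `set_irq_affinity 3` as root would ask for every IRQ to be
steered to CPUs 0 and 1; some IRQs may refuse the write.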


> Because we're mapping them [irqs] onto CPUs, cpusets came to mind.
> 
> The thing we 'need', is to provide named groups of irqs and for each
> such a group specify a cpu mask that is appropriate.
> 
> Grouping them makes sense in that we want to make a functional division.
> Some IRQs serve a system as a whole, others serve a subset. Typical
> subsets could be a RT process space bounded to a cpu/mem domain.
> 
> Other usable subsets could be limiting the IRQs of node local network
> and IO cards to the cpu/mem domain that runs the application that uses
> them.
> 
> So we group irqs like:
> 
>   system_on_nodes_1_2_and_3 (default)
>   big_io_app_on_nodes_2_and_3
>   rt_app_on_node_4
> 
> Where, again, you see a strong similarity to the cpu/mem divisions
> already made by cpusets.


Cool -- I'm glad now I asked (rather impatiently) what we needed.

That's a helpful reply.

Could we:
 1) name some sets of IRQs
 2) for each cpuset, specify which named IRQ set applied to it
 3) prioritize these sets of IRQs (linear order), so that
    for any given CPU, if it were in multiple cpusets
    specifying conflicting IRQ sets, we could select the
    IRQ set to apply to that CPU.

Given the reliance in (2) of cpusets on these IRQ set names, this
still needs to be part of cpusets.

But rather than (ab)use cpusets to directly accomplish (1), how
about adding some files to the root cpuset to define IRQ sets,
with names such as (for example):

	irqs.0.system
	irqs.1.big_io_apps
	irqs.2.rt

That is, more generally, add one or more "irqs.N.name" files to
the top cpuset, where N is a distinct natural number and "name"
a user space specified name (except that perhaps the first one,
the 'irqs.0.system', with its name 'system' or perhaps it should
be 'boot', is pre-ordained during system boot.)

Each of these 'irqs.N.name' files would contain a newline separated
list of irq numbers.

Also add, per item (2) above, to each cpuset, one more file, containing
a single line, naming one of these irqs.N.name files to be found in the
root cpuset.  Let me call this new per-cpuset file 'irqs'.

The number N in the name "irqs.N.name" would order these sets of irqs.

If in this example a cpuset's "irqs" file specified 'rt', that would
take priority (for the CPUs in that cpuset's 'cpus' file) over the
other two irqs.N.* files above, because the '2' in "irqs.2.rt" is
bigger than the other irqs.N numbers.

For each CPU, we'd find the largest N such that some cpuset (1) had
that CPU in its 'cpus' mask, and (2) named the corresponding
irqs.N.name file in its 'irqs' file, and then we'd use the irqs listed
in that irqs.N.name file on that CPU.
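
A rough user-space sketch of that lookup, run against a mock directory
tree rather than a real cpuset mount (nothing like this exists in the
kernel).  Assumptions made for illustration only: the function name is
hypothetical, each cpuset directory holds a 'cpus' file listing single
CPU numbers separated by spaces or commas (ranges such as 0-3 are not
handled), and directory names contain no whitespace.

```shell
# irqs_for_cpu ROOT CPU
# Print the irq list that would apply to CPU, picking the largest N
# among the irqs.N.name files named by cpusets that contain CPU.
irqs_for_cpu() {
    root=$1
    cpu=$2
    best_n=-1
    best_file=
    # Visit ROOT and every cpuset directory below it.
    for cs in $(find "$root" -type d); do
        [ -f "$cs/cpus" ] && [ -f "$cs/irqs" ] || continue
        # Skip cpusets whose 'cpus' file does not list this CPU.
        tr ' ,' '\n\n' < "$cs/cpus" | grep -qx "$cpu" || continue
        name=$(cat "$cs/irqs")
        # Locate the root-level irqs.N.<name> file and extract N.
        for f in "$root"/irqs.*."$name"; do
            [ -f "$f" ] || continue
            n=${f##*/irqs.}
            n=${n%%.*}
            if [ "$n" -gt "$best_n" ]; then
                best_n=$n
                best_file=$f
            fi
        done
    done
    [ -n "$best_file" ] && cat "$best_file"
}
```

With a root cpuset on 'system' covering every CPU and a child cpuset
on 'rt' covering one CPU, that one CPU resolves to the contents of
irqs.2.rt and the others to irqs.0.system.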

The default value of the top (root) cpuset's 'irqs' file at boot
would be 'system' (or 'boot').  The default value for any cpuset
created thereafter would be inherited from that cpuset's parent.

These 'irqs.N.name' files would be the first instance of allowing
user created files in cpuset directories.  That will require some
changes to the cpuset or cgroup code; I don't know how much.

If one of these 'irqs.N.name' files were removed, then any cpuset
that had been using it (had that 'name' in its 'irqs' file) would
have to be reverted, I suppose to its parent's 'irqs' setting.
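
One possible reading of that revert rule, sketched against a mock
directory tree.  The function name is hypothetical, and the sketch
leaves open what should happen if the root cpuset itself names the
removed set; it only rewrites cpusets below the root.

```shell
# revert_irqs ROOT NAME
# For every cpuset below ROOT whose 'irqs' file names NAME, copy the
# parent's 'irqs' setting into it.
revert_irqs() {
    root=$1
    name=$2
    # find lists a directory before its children, so a parent is
    # already reverted by the time its children copy from it.
    find "$root" -mindepth 1 -type d | while read -r cs; do
        [ -f "$cs/irqs" ] || continue
        [ "$(cat "$cs/irqs")" = "$name" ] || continue
        cat "${cs%/*}/irqs" > "$cs/irqs"
    done
}
```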

An application (any job with permission to write its own cpuset's
files) could control which named set of irqs it wanted to use,
by writing the 'irqs' file in its cpuset.  But system permissions
(such as root) would probably be required to specify which irqs
were listed in each /dev/cpuset/irqs.N.* file (unless some admin
script decided to change the permissions on those files at runtime,
of course.)

Does that make any sense?  What have I missed?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214
