From: Shailabh Nagar <nagar@watson.ibm.com>
To: Paul Jackson <pj@sgi.com>
Cc: akpm@osdl.org, Valdis.Kletnieks@vt.edu, jlan@engr.sgi.com,
balbir@in.ibm.com, csturtiv@sgi.com,
linux-kernel@vger.kernel.org, hadi@cyberus.ca,
netdev@vger.kernel.org
Subject: Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats
Date: Mon, 03 Jul 2006 11:02:17 -0400
Message-ID: <44A93179.2080303@watson.ibm.com>
In-Reply-To: <20060702215350.2c1de596.pj@sgi.com>

Paul Jackson wrote:
>Shailabh wrote:
>
>
>>Sends a separate "registration" message with cpumask to listen to.
>>Kernel stores (real) pid and cpumask.
>>
>>
>
>Question:
>=========
>
>Ah - good.
>
>So this means that I could configure a system with a fork/exit
>intensive, performance critical job on some dedicated CPUs, and be able
>to collect taskstat data from tasks exiting on the -other- CPUS, while
>avoiding collecting data from this special job, thus avoiding any
>taskstat collection performance impact on said job.
>
>If I'm understanding this correctly, excellent.
>
>
Yes. If no one registers to listen on a particular CPU, data from tasks
exiting on that CPU is not sent out at all.

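A minimal user-space sketch of that per-CPU filtering, for illustration
only. The struct listener record, the 64-bit mask and the
listeners_for_cpu() helper are assumptions made for this example, not the
code in the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration record: one per registered listener. */
struct listener {
    int pid;           /* pid of the listening process */
    uint64_t cpumask;  /* bit n set: wants exit data from CPU n */
};

/*
 * Collect the pids of all listeners registered for `cpu`.
 * Returns how many pids were written to `out` (at most `out_len`).
 * A return of zero means nobody listens on that CPU, so the exit
 * data would not be sent at all.
 */
size_t listeners_for_cpu(const struct listener *tbl, size_t n,
                         unsigned int cpu, int *out, size_t out_len)
{
    size_t count = 0;

    if (cpu >= 64)  /* this sketch only models 64 CPUs */
        return 0;

    for (size_t i = 0; i < n && count < out_len; i++) {
        if (tbl[i].cpumask & ((uint64_t)1 << cpu))
            out[count++] = tbl[i].pid;
    }
    return count;
}
```

With listeners registered only for CPUs 0 and 1, a task exiting on CPU 5
matches no one and costs nothing beyond the table scan.
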
>Caveat:
>=======
>
>Passing cpumasks across the kernel-user boundary can be tricky.
>
>Historically, Unix has a long tradition of boloxing up the passing
>of variable length data types across the kernel-user boundary.
>
>We've got perhaps a half dozen ways of getting these masks out of the
>kernel, and three ways of getting them (or the similar nodemasks) back
>into the kernel. The three ways being used in the sched_setaffinity
>system call, the mbind and set_mempolicy system calls, and the cpuset
>file system.
>
>All three of these ways have their controversial details:
> * The kernel cpumask mask size needed for sched_setaffinity calls is
> not trivially available to userland.
> * The nodemask bit size is off by one in the mbind and set_mempolicy
> calls.
> * The CPU and Node masks are ascii, not binary, in the cpuset calls.
>
>One option that might make sense for these task stat registrations
>would be to:
> 1) make the kernel/sched.c get_user_cpu_mask() routine generic,
> moving it to non-static lib/*.c code, and
> 2) provide a sensible way for user space to query the size of
> the kernel cpumask (and perhaps nodemask while you're at it.)
>
>Currently, the best way I know for user space to query the kernel's
>cpumask and nodemask size is to examine the length of the ascii
>string values labeled "Cpus_allowed:" and "Mems_allowed:" in the file
>/proc/self/status. These ascii strings always require exactly nine
>ascii chars to express each 32 bits of kernel mask code, if you include
>in the count the trailing ',' comma or '\n' newline after each eight
>ascii character word.
>
>Probing /proc/self/status fields for these mask sizes is rather
>unobvious and indirect, and requires caching the result if you care at
>all about performance. Userland code in support of your taskstat
>facility might be better served by a more obvious way to size cpumasks.
>
>... unless of course you're inclined to pass cpumasks formatted as
> ascii strings, in which case speak up, as I'd be delighted to
> throw in my 2 cents on how to do that ;).
>
>
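The /proc/self/status probing described above could be sketched roughly
as follows. It relies only on the nine-characters-per-32-bits rule quoted
above; the function names are made up for this example, and a kernel that
prints these fields in a different width would make it return 0:

```c
#include <stdio.h>
#include <string.h>

/*
 * Estimate the kernel mask width in bits from the value part of a
 * "Cpus_allowed:" or "Mems_allowed:" line. Every 32 bits of mask take
 * nine characters: eight hex digits plus the ',' or '\n' that follows.
 * `value` must include the trailing newline.
 * Returns 0 if the length is not a non-zero multiple of nine.
 */
unsigned int mask_bits_from_status_value(const char *value)
{
    size_t len = strlen(value);

    if (len == 0 || len % 9 != 0)
        return 0;
    return (unsigned int)(len / 9) * 32;
}

/*
 * Hypothetical helper: look up `label` (for example "Cpus_allowed:")
 * in /proc/self/status and size the mask from its value.
 * Returns 0 if the file or the label cannot be found.
 */
unsigned int probe_mask_bits(const char *label)
{
    char line[4096];
    size_t label_len = strlen(label);
    unsigned int bits = 0;
    FILE *fp = fopen("/proc/self/status", "r");

    if (!fp)
        return 0;

    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, label, label_len) == 0) {
            const char *value = line + label_len;

            /* Skip the whitespace between label and value. */
            while (*value == ' ' || *value == '\t')
                value++;
            bits = mask_bits_from_status_value(value);
            break;
        }
    }
    fclose(fp);
    return bits;
}
```

A caller would invoke probe_mask_bits("Cpus_allowed:") once and cache the
result, since the file has to be reopened and rescanned on each call.
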
Thanks for the size info. I did hit it while coding this up.
So I chose to use the "cpulist" ascii format that has been helpfully
provided in include/linux/cpumask.h (by whom, I wonder :-)

User specifies the cpumask as an ascii string containing comma-separated
cpu ranges. Kernel parses it and stores it as a cpumask_t, after which
we can iterate over the mask using standard helpers.

Since registration/deregistration is not a common operation, the
overhead of parsing ascii strings should be acceptable, and it avoids
the hassles of trying to determine kernel cpumask size. I don't know if
there are buffer overflow issues in passing a string (though I'm using
the standard netlink way of passing it up using NLA_STRING).
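
To make the format concrete, here is a small user-space sketch that
parses a comma-separated list of cpu ranges into a bitmask. It is not the
kernel's cpulist parser; parse_cpulist() and the 64-CPU limit are
assumptions for this example. It works on a NUL-terminated string and
rejects anything malformed or out of range:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUS 64  /* assumption for this sketch only */

/*
 * Parse a list such as "0-3,8,10-11" into a 64-bit mask.
 * Returns 0 on success, -EINVAL on malformed input and -ERANGE if a
 * CPU number does not fit in the mask. *mask is written only on success.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
    uint64_t result = 0;

    if (!s || !*s)
        return -EINVAL;

    for (;;) {
        char *end;
        unsigned long first, last;

        /* Each item must start with a digit (no sign, no spaces). */
        if (*s < '0' || *s > '9')
            return -EINVAL;
        errno = 0;
        first = strtoul(s, &end, 10);
        if (errno)
            return -ERANGE;
        last = first;

        /* Optional "-last" part of a range. */
        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -EINVAL;
            last = strtoul(s, &end, 10);
            if (errno)
                return -ERANGE;
        }
        if (first > last)
            return -EINVAL;
        if (last >= MAX_CPUS)
            return -ERANGE;

        for (unsigned long cpu = first; cpu <= last; cpu++)
            result |= (uint64_t)1 << cpu;

        if (*end == '\0')
            break;
        if (*end != ',')
            return -EINVAL;
        s = end + 1;
    }

    *mask = result;
    return 0;
}
```

For example, "0-3,8,10-11" sets bits 0 to 3, 8, 10 and 11, giving the
mask 0xd0f.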
Will post the patch shortly.
--Shailabh