public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 23:45     ` Jeff Garzik
@ 1976-03-03 15:58       ` Tim Hockin
  2002-03-11  0:08         ` Jeff Garzik
  0 siblings, 1 reply; 16+ messages in thread
From: Tim Hockin @ 1976-03-03 15:58 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Robert Love, Andreas Jaeger, torvalds, linux-kernel

> Anon!  But there is something uber-ugly about constantly jamming more
> and more stuff into procfs without thinking or planning long term...  I
> vote for the non-procfs approach :)

At some point I had done a port of SGI's pset/sysmp interface to linux 2.2.
As far as I know, lots of people are still using it.  I haven't ported it
to 2.4 for various reasons, but I have to say - IT IS A MUCH BETTER
INTERFACE than all these ad-hoc cpus_allowed bits.

If I thought that it had a chance of inclusion, maybe I'd port it up, but
last I heard none of the "core" people wanted it.

If we are going to pick an affinity system, please, let's consider sysmp().

Tim


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] syscall interface for cpu affinity
@ 2002-03-10 18:15 Robert Love
  2002-03-10 20:29 ` Andreas Jaeger
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Robert Love @ 2002-03-10 18:15 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

Linus,

I have updated the patch a bit and resynced to 2.5.6.  Are you
interested?  I believe a user interface for setting task CPU affinity is
useful and completes the rest of our sched_* syscalls.  A syscall
implementation seems to be what everyone wants (I have a proc-interface,
too...)

This patch implements

        int sched_set_affinity(pid_t pid, unsigned int len,
                               unsigned long *new_mask_ptr);

        int sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
                               unsigned long *user_mask_ptr);

which set and get the cpu affinity (task->cpus_allowed) for a task,
using the set_cpus_allowed function in Ingo's scheduler.  The functions
properly support changes to cpus_allowed, implement security, and are
well-tested.
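Both calls exchange the affinity as a single unsigned long in which bit N set means the task may run on CPU N, so one word covers at most 32 CPUs on i386 (64 on a 64-bit arch). A minimal user-space sketch of building and inspecting such a mask; the helper names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of CPUs one unsigned long mask can describe. */
#define MASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: build a mask from a list of CPU numbers. */
static unsigned long cpus_to_mask(const int *cpus, size_t n)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* CPUs beyond the width of the mask cannot be expressed. */
		assert(cpus[i] >= 0 && (size_t)cpus[i] < MASK_BITS);
		mask |= 1UL << cpus[i];
	}
	return mask;
}

/* Hypothetical helper: non-zero if 'mask' allows CPU 'cpu'. */
static int mask_has_cpu(unsigned long mask, int cpu)
{
	assert(cpu >= 0 && (size_t)cpu < MASK_BITS);
	return (int)((mask >> cpu) & 1UL);
}

/* Hypothetical helper: number of CPUs allowed by 'mask'. */
static int mask_weight(unsigned long mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

For example, pinning a task to CPUs 0 and 2 means passing the mask 0x5.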

They are based on Ingo's older affinity syscall patch and my older
affinity proc patch.

Comments?

	Robert Love

diff -urN linux-2.5.6/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.5.6/arch/i386/kernel/entry.S	Thu Mar  7 21:18:19 2002
+++ linux/arch/i386/kernel/entry.S	Sun Mar 10 13:01:03 2002
@@ -717,6 +717,8 @@
 	.long SYMBOL_NAME(sys_fremovexattr)
 	.long SYMBOL_NAME(sys_tkill)
 	.long SYMBOL_NAME(sys_sendfile64)
+	.long SYMBOL_NAME(sys_sched_set_affinity)	/* 240 */
+	.long SYMBOL_NAME(sys_sched_get_affinity)
 
 	.rept NR_syscalls-(.-sys_call_table)/4
 		.long SYMBOL_NAME(sys_ni_syscall)
diff -urN linux-2.5.6/include/asm-i386/unistd.h linux/include/asm-i386/unistd.h
--- linux-2.5.6/include/asm-i386/unistd.h	Thu Mar  7 21:18:55 2002
+++ linux/include/asm-i386/unistd.h	Sun Mar 10 13:03:41 2002
@@ -244,6 +244,8 @@
 #define __NR_fremovexattr	237
 #define __NR_tkill		238
 #define __NR_sendfile64		239
+#define __NR_sched_set_affinity	240
+#define __NR_sched_get_affinity	241
 
 /* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
 
diff -urN linux-2.5.6/kernel/sched.c linux/kernel/sched.c
--- linux-2.5.6/kernel/sched.c	Thu Mar  7 21:18:19 2002
+++ linux/kernel/sched.c	Sun Mar 10 12:59:26 2002
@@ -1215,6 +1215,95 @@
 	return retval;
 }
 
+/**
+ * sys_sched_set_affinity - set the cpu affinity of a process
+ * @pid: pid of the process
+ * @len: length in bytes of the mask at @new_mask_ptr
+ * @new_mask_ptr: user-space pointer to the new cpu mask
+ */
+asmlinkage int sys_sched_set_affinity(pid_t pid, unsigned int len,
+				      unsigned long *new_mask_ptr)
+{
+	unsigned long new_mask;
+	task_t *p;
+	int retval;
+
+	if (len < sizeof(new_mask))
+		return -EINVAL;
+
+	if (copy_from_user(&new_mask, new_mask_ptr, sizeof(new_mask)))
+		return -EFAULT;
+
+	new_mask &= cpu_online_map;
+	if (!new_mask)
+		return -EINVAL;
+
+	read_lock(&tasklist_lock);
+
+	retval = -ESRCH;
+	p = find_process_by_pid(pid);
+	if (!p)
+		goto out_unlock;
+
+	retval = -EPERM;
+	if ((current->euid != p->euid) && (current->euid != p->uid) &&
+			!capable(CAP_SYS_NICE))
+		goto out_unlock;
+
+	retval = 0;
+#ifdef CONFIG_SMP
+	set_cpus_allowed(p, new_mask);
+#endif
+
+out_unlock:
+	read_unlock(&tasklist_lock);
+	return retval;
+}
+
+/**
+ * sys_sched_get_affinity - get the cpu affinity of a process
+ * @pid: pid of the process
+ * @user_len_ptr: userspace pointer to the length of the mask
+ * @user_mask_ptr: userspace pointer to the mask
+ */
+asmlinkage int sys_sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
+				      unsigned long *user_mask_ptr)
+{
+	unsigned long mask;
+	unsigned int len, user_len;
+	task_t *p;
+	int retval;
+
+	len = sizeof(mask);
+
+	if (copy_from_user(&user_len, user_len_ptr, sizeof(user_len)))
+		return -EFAULT;
+
+	if (copy_to_user(user_len_ptr, &len, sizeof(len)))
+		return -EFAULT;
+
+	if (user_len < len)
+		return -EINVAL;
+
+	read_lock(&tasklist_lock);
+
+	retval = -ESRCH;
+	p = find_process_by_pid(pid);
+	if (!p)
+		goto out_unlock;
+
+	retval = 0;
+	mask = p->cpus_allowed & cpu_online_map;
+
+out_unlock:
+	read_unlock(&tasklist_lock);
+	if (retval)
+		return retval;
+	if (copy_to_user(user_mask_ptr, &mask, sizeof(mask)))
+		return -EFAULT;
+	return 0;
+}
+
 asmlinkage long sys_sched_yield(void)
 {
 	runqueue_t *rq;
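The argument checks in sys_sched_set_affinity() can be modelled in plain user-space C, assuming an ordinary unsigned long stands in for cpu_online_map and plain integers stand in for the uid fields. This is a sketch for reasoning about the behaviour, not kernel code; the model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Model of the mask checks done before set_cpus_allowed().
 * 'online' stands in for cpu_online_map.  Returns 0 and stores the
 * effective mask, or a negative errno value as the syscall would.
 */
static int model_set_affinity_mask(unsigned int len, unsigned long requested,
				   unsigned long online, unsigned long *effective)
{
	assert(effective != NULL);

	/* The caller must supply at least one full unsigned long. */
	if (len < sizeof(unsigned long))
		return -EINVAL;

	/* Bits for CPUs that are not online are silently dropped... */
	requested &= online;

	/* ...but a mask with no online CPU left is rejected. */
	if (!requested)
		return -EINVAL;

	*effective = requested;
	return 0;
}

/*
 * Model of the permission rule: the caller's euid must match the
 * target's euid or uid, unless the caller has CAP_SYS_NICE.
 * Returns 1 if allowed, 0 if the syscall would return -EPERM.
 */
static int model_may_set_affinity(uid_t caller_euid, uid_t target_euid,
				  uid_t target_uid, int has_cap_sys_nice)
{
	if (caller_euid == target_euid || caller_euid == target_uid)
		return 1;
	return has_cap_sys_nice ? 1 : 0;
}
```

With CPUs 0 and 1 online (0x3), a request for CPUs 1 and 2 (0x6) is quietly trimmed to 0x2, while a request for CPU 2 alone (0x4) fails with -EINVAL.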


* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 18:15 [PATCH] syscall interface for cpu affinity Robert Love
@ 2002-03-10 20:29 ` Andreas Jaeger
  2002-03-10 20:53   ` Robert Love
  2002-03-10 22:05 ` Chris Wedgwood
  2002-03-11  0:38 ` Andreas Ferber
  2 siblings, 1 reply; 16+ messages in thread
From: Andreas Jaeger @ 2002-03-10 20:29 UTC (permalink / raw)
  To: Robert Love; +Cc: torvalds, linux-kernel

Robert Love <rml@tech9.net> writes:

> Linus,
>
> I have updated the patch a bit and resynced to 2.5.6.  Are you
> interested?  I believe a user interface for setting task CPU affinity is
> useful and completes the rest of our sched_* syscalls.  A syscall
> implementation seems to be what everyone wants (I have a proc-interface,
> too...)

Please add the proc interface, too!  I found it today (for 2.4.18),
and it's much easier to use with existing programs.

Andreas

> This patch implements
>
>         int sched_set_affinity(pid_t pid, unsigned int len,
>                                unsigned long *new_mask_ptr);
>
>         int sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
>                                unsigned long *user_mask_ptr);
>
> which set and get the cpu affinity (task->cpus_allowed) for a task,
> using the set_cpus_allowed function in Ingo's scheduler.  The functions
> properly support changes to cpus_allowed, implement security, and are
> well-tested.
>
> They are based on Ingo's older affinity syscall patch and my older
> affinity proc patch.
>
> Comments?

Please add it for all archs; this is interesting not only for x86.
Andreas

[...]
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 20:29 ` Andreas Jaeger
@ 2002-03-10 20:53   ` Robert Love
  2002-03-10 21:03     ` Andreas Jaeger
  2002-03-10 23:45     ` Jeff Garzik
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Love @ 2002-03-10 20:53 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: torvalds, linux-kernel

On Sun, 2002-03-10 at 15:29, Andreas Jaeger wrote:
 
> Please add the proc interface, too!  I found it today (for 2.4.18),
> and it's much easier to use with existing programs.

I agree and I really like the proc-interface.  There is something uber
cool about:

	cat 1 > /proc/pid/affinity

I have a patch for 2.5.6 for proc-based affinity interface here:

	http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/v2.5/cpu-affinity-proc-rml-2.5.6-1.patch

I suspect, however, that despite both patches being small we really only
want to pick and standardize on one.  The syscall interface has two main
things going for it against a proc-based implementation: it is faster
and /proc may not be mounted.  The masses have spoken on this issue.

Note you can use the syscall interface with existing programs, too. 
Just write a program to take in a pid and mask and call
sched_set_affinity.
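Such a program mostly needs careful argument parsing. A minimal sketch of that part, assuming the tool takes a decimal pid and a mask in C notation ("3" or "0x3"); parse_affinity_args() is a hypothetical name, and the affinity call itself is omitted because sched_set_affinity() exists only on a kernel carrying this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical parser for "set_affinity <pid> <mask>".
 * Returns 0 on success, -1 if either argument is malformed.
 */
static int parse_affinity_args(const char *pid_str, const char *mask_str,
			       pid_t *pid, unsigned long *mask)
{
	char *end;
	long p;
	unsigned long m;

	assert(pid_str && mask_str && pid && mask);

	errno = 0;
	p = strtol(pid_str, &end, 10);
	if (errno || end == pid_str || *end != '\0' || p < 0)
		return -1;

	/* strtoul() would quietly wrap "-1" to ULONG_MAX; refuse it. */
	if (mask_str[0] == '-')
		return -1;

	errno = 0;
	m = strtoul(mask_str, &end, 0);	/* base 0 also accepts 0x... */
	if (errno || end == mask_str || *end != '\0' || m == 0)
		return -1;

	*pid = (pid_t)p;
	*mask = m;
	return 0;
}
```

main() would then pass the results on as sched_set_affinity(pid, sizeof(mask), &mask) and report errno on failure. A zero mask is refused up front, since the kernel would reject it with EINVAL anyway.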

> Please add it for all archs; this is interesting not only for x86.

I'll send Linus the patch for other arches if/when he accepts this patch
- I have no problem with that.

	Robert Love


* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 20:53   ` Robert Love
@ 2002-03-10 21:03     ` Andreas Jaeger
  2002-03-10 22:23       ` Andreas Schwab
  2002-03-10 23:56       ` Andreas Ferber
  2002-03-10 23:45     ` Jeff Garzik
  1 sibling, 2 replies; 16+ messages in thread
From: Andreas Jaeger @ 2002-03-10 21:03 UTC (permalink / raw)
  To: Robert Love; +Cc: torvalds, linux-kernel

Robert Love <rml@tech9.net> writes:

> On Sun, 2002-03-10 at 15:29, Andreas Jaeger wrote:
>  
>> Please add the proc interface, too!  I found it today (for 2.4.18),
>> and it's much easier to use with existing programs.
>
> I agree and I really like the proc-interface.  There is something uber
> cool about:
>
> 	cat 1 > /proc/pid/affinity

I agree.

> I have a patch for 2.5.6 for proc-based affinity interface here:
>
> 	http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/v2.5/cpu-affinity-proc-rml-2.5.6-1.patch
>
> I suspect, however, that despite both patches being small we really only
> want to pick and standardize on one.  The syscall interface has two main
> things going for it against a proc-based implementation: it is faster
> and /proc may not be mounted.  The masses have spoken on this issue.
>
> Note you can use the syscall interface with existing programs, too. 
> Just write a program to take in a pid and mask and call
> sched_set_affinity.

What I need at the moment is a wrapper - and you can do it two ways:

$ run_with_affinity 1 program arguments...
$ (cat 1 > /proc/self/affinity; program arguments...)

The second one is much easier to code ;-)
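The first form only needs a tiny exec wrapper. This sketch is written against the glibc sched_setaffinity()/cpu_set_t interface available on current Linux rather than the sched_set_affinity() call proposed in this thread; run_with_affinity() and mask_to_cpuset() are hypothetical names. It relies on the affinity mask being preserved across execvp().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Convert a one-word mask (bit N = CPU N) into a cpu_set_t. */
static void mask_to_cpuset(unsigned long mask, cpu_set_t *set)
{
	unsigned int cpu;

	assert(set != NULL);
	CPU_ZERO(set);
	for (cpu = 0; cpu < sizeof(mask) * CHAR_BIT; cpu++)
		if (mask & (1UL << cpu))
			CPU_SET(cpu, set);
}

/*
 * Restrict the calling process to 'mask', then exec argv[0].  The
 * mask survives execvp(), so the program starts already pinned.
 * Returns only on failure.
 */
static int run_with_affinity(unsigned long mask, char *const argv[])
{
	cpu_set_t set;

	mask_to_cpuset(mask, &set);
	if (sched_setaffinity(0, sizeof(set), &set) == -1) {
		perror("sched_setaffinity");
		return -1;
	}
	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

A main() would parse the mask from argv[1] and call run_with_affinity(mask, &argv[2]).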

>> Please add it for all archs; this is interesting not only for x86.
>
> I'll send Linus the patch for other arches if/when he accepts this patch
> - I have no problem with that.

Thanks,
Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 18:15 [PATCH] syscall interface for cpu affinity Robert Love
  2002-03-10 20:29 ` Andreas Jaeger
@ 2002-03-10 22:05 ` Chris Wedgwood
  2002-03-10 22:11   ` Robert Love
  2002-03-11  0:38 ` Andreas Ferber
  2 siblings, 1 reply; 16+ messages in thread
From: Chris Wedgwood @ 2002-03-10 22:05 UTC (permalink / raw)
  To: Robert Love; +Cc: torvalds, linux-kernel

On Sun, Mar 10, 2002 at 01:15:03PM -0500, Robert Love wrote:

    I have updated the patch a bit and resynced to 2.5.6.  Are you
    interested?  I believe a user interface for setting task CPU
    affinity is useful and completes the rest of our sched_* syscalls.
    A syscall implementation seems to be what everyone wants (I have a
    proc-interface, too...)

Can't we just copy the IRIX interface here, as some other patches have
in the past?



  --cw

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 22:05 ` Chris Wedgwood
@ 2002-03-10 22:11   ` Robert Love
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Love @ 2002-03-10 22:11 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: torvalds, linux-kernel

On Sun, 2002-03-10 at 17:05, Chris Wedgwood wrote:

> Can't we just copy the IRIX interface here, as some other patches have
> in the past?

Is that psets?  If so, no thanks.

I want a simple, clean, quick implementation.  I have seen patches that
do a lot more than what my simple implementation does, and that really
does not interest me and I suspect Ingo and others feel the same way. 
Setting a simple per-task bitmask that is inherited is all we need.

The Linux scheduler API is already our own standard.  I'd rather support
that (i.e. add another simple sched_* call) than some other, evil
interface - but that is just me.

	Robert Love


* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 21:03     ` Andreas Jaeger
@ 2002-03-10 22:23       ` Andreas Schwab
  2002-03-10 23:56       ` Andreas Ferber
  1 sibling, 0 replies; 16+ messages in thread
From: Andreas Schwab @ 2002-03-10 22:23 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: Robert Love, torvalds, linux-kernel

Andreas Jaeger <aj@suse.de> writes:

|> What I need at the moment is a wrapper - and you can do it two ways:
|> 
|> $ run_with_affinity 1 program arguments...
|> $ (cat 1 > /proc/self/affinity; program arguments...)
|> 
|> The second one is much easier to code ;-)

Apparently not, since that should be

$ (echo 1 > /proc/self/affinity; program arguments...)

:-)

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE GmbH, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 20:53   ` Robert Love
  2002-03-10 21:03     ` Andreas Jaeger
@ 2002-03-10 23:45     ` Jeff Garzik
  1976-03-03 15:58       ` Tim Hockin
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Garzik @ 2002-03-10 23:45 UTC (permalink / raw)
  To: Robert Love; +Cc: Andreas Jaeger, torvalds, linux-kernel

Robert Love wrote:
> 
> On Sun, 2002-03-10 at 15:29, Andreas Jaeger wrote:
> 
> > Please add the proc interface, too!  I found it today (for 2.4.18),
> > and it's much easier to use with existing programs.
> 
> I agree and I really like the proc-interface.  There is something uber
> cool about:
> 
>         cat 1 > /proc/pid/affinity
> 
> I have a patch for 2.5.6 for proc-based affinity interface here:
> 
>         http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/v2.5/cpu-affinity-proc-rml-2.5.6-1.patch


Anon!  But there is something uber-ugly about constantly jamming more
and more stuff into procfs without thinking or planning long term...  I
vote for the non-procfs approach :)

-- 
Jeff Garzik      | Usenet Rule #2 (John Gilmore): "The Net interprets
Building 1024    | censorship as damage and routes around it."
MandrakeSoft     |

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 21:03     ` Andreas Jaeger
  2002-03-10 22:23       ` Andreas Schwab
@ 2002-03-10 23:56       ` Andreas Ferber
  1 sibling, 0 replies; 16+ messages in thread
From: Andreas Ferber @ 2002-03-10 23:56 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: Robert Love, linux-kernel

On Sun, Mar 10, 2002 at 10:03:02PM +0100, Andreas Jaeger wrote:
> >
> > Note you can use the syscall interface with existing programs, too. 
> > Just write a program to take in a pid and mask and call
> > sched_set_affinity.
> What I need at the moment is a wrapper - and you can do it two ways:
> 
> $ run_with_affinity 1 program arguments...
> $ (cat 1 > /proc/self/affinity; program arguments...)
> 
> The second one is much easier to code ;-)

$ (set_affinity 1; program arguments...)

set_affinity just calls sched_set_affinity(getppid()), and everything
is fine (and even shorter to type) :-)

Andreas
-- 
       Andreas Ferber - dev/consulting GmbH - Bielefeld, FRG
     ---------------------------------------------------------
         +49 521 1365800 - af@devcon.net - www.devcon.net

* Re: [PATCH] syscall interface for cpu affinity
  1976-03-03 15:58       ` Tim Hockin
@ 2002-03-11  0:08         ` Jeff Garzik
  2002-03-11  0:32           ` Tim Hockin
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Garzik @ 2002-03-11  0:08 UTC (permalink / raw)
  To: Tim Hockin; +Cc: Robert Love, Andreas Jaeger, torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 642 bytes --]

Tim Hockin wrote:
> If we are going to pick an affinity system, please, let's consider sysmp().

Not too bad.  I picked a random sysmp(2) man page off the net (attached
for others' ease of reference).

It duplicates some stuff set elsewhere, and seems more than a bit like
ioctl(2) by another name, but doesn't seem too bad.  Note we should be
careful not to overengineer the interface, either...

Just setting a bitmask does seem a bit limiting when thinking about the
future, agreed.
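The per-process sysmp commands in the attached man page map fairly directly onto a single bitmask: MP_MUSTRUN selects one CPU, MP_RUNANYWHERE roughly corresponds to allowing every online CPU, and MP_GETMUSTRUN only has an answer when exactly one bit is set. A sketch of that mapping under those assumptions, with hypothetical function names; processor-wide commands such as MP_RESTRICT or MP_ISOLATE are not covered by it.

```c
#include <assert.h>
#include <limits.h>

#define MASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* MP_MUSTRUN-style request: exactly one CPU. */
static unsigned long mustrun_mask(int cpu)
{
	assert(cpu >= 0 && (unsigned long)cpu < MASK_BITS);
	return 1UL << cpu;
}

/* MP_RUNANYWHERE-style request: every online CPU. */
static unsigned long runanywhere_mask(unsigned long online)
{
	return online;
}

/*
 * MP_GETMUSTRUN-style query: the single CPU a task is pinned to, or
 * -1 if the mask allows none or several CPUs (sysmp itself returns
 * -1 with errno set to EINVAL for an unpinned process).
 */
static int getmustrun_from_mask(unsigned long mask)
{
	int cpu;

	if (mask == 0 || (mask & (mask - 1)) != 0)
		return -1;	/* zero or several bits set */
	for (cpu = 0; !(mask & 1UL); cpu++)
		mask >>= 1;
	return cpu;
}
```

Seen this way, the single-mask syscall covers the per-task half of sysmp; the open question is only whether the processor-wide controls belong in the same interface.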

-- 
Jeff Garzik      | Usenet Rule #2 (John Gilmore): "The Net interprets
Building 1024    | censorship as damage and routes around it."
MandrakeSoft     |

[-- Attachment #2: sysmp.man.txt --]
[-- Type: text/plain, Size: 10690 bytes --]


     sysmp - multiprocessing control

C SYNOPSIS


     #include <sys/types.h>
     #include <sys/sysmp.h>
     #include <sys/sysinfo.h> /* for SAGET and MINFO structures */
     int sysmp (int cmd, ...);
     ptrdiff_t sysmp (int cmd, ...);

DESCRIPTION


     sysmp provides control/information for miscellaneous system services.
     This system call is usually used by system programs and is not intended
     for general use.  The arguments arg1, arg2, arg3, arg4 are provided for
     command-dependent use.

     As specified by cmd, the following commands are available:

     MP_CLEARCFSSTAT
     MP_CLEARNFSSTAT
     MP_NUMA_GETCPUNODEMAP
     MP_NUMA_GETDISTMATRIX
                    These are all interfaces that are used to implement
                    various system library functions.  They are all subject to
                    change and should not be called directly by applications.

     MP_PGSIZE      The page size of the system is returned (see
                    getpagesize(2)).

     MP_SCHED       Interface for the schedctl(2) system call.

     MP_NPROCS      Returns the number of processors physically configured.

     MP_NAPROCS     Returns the number of processors that are available to
                    schedule unrestricted processes.

     MP_STAT        The processor ids and status flag bits of the physically
                    configured processors are copied into an array of pda_stat
                    structures to which arg1 points.  The array must be large
                    enough to hold as many pda_stat structures as the number
                    of processors returned by the MP_NPROCS sysmp command.
                    The pda_stat structure and the various status bits are
                    defined in <sys/pda.h>.

     MP_EMPOWER     The processor number given by arg1, interpreted as an
                    'int', is empowered to run any unrestricted processes.
                    This is the default for all processors.  This command
                    requires superuser authority.

     MP_RESTRICT    The processor number given by arg1, interpreted as an
                    'int', is restricted from running any processes except
                    those assigned to it by a MP_MUSTRUN or MP_MUSTRUN_PID
                    command, a runon(1) command or because of hardware
                    necessity.  Note that processor 0 cannot be restricted.
                    This command requires superuser authority.  On Challenge
                    Series machines, all timers belonging to the processor are
                    moved to the processor that owns the clock as reported by
                    MP_CLOCK.

     MP_ISOLATE     The processor number given by arg1, interpreted as an
                    'int', is isolated from running any processes except those
                    assigned to it by a MP_MUSTRUN command, a runon(1) command
                    or because of hardware necessity.  Instruction cache and
                    Translation Lookaside Buffer synchronization across
                    processors in the system is minimized or delayed on an
                    isolated processor until system services are requested.
                    Note that processor 0 cannot be isolated.  This command
                    requires superuser authority.  On Challenge Series
                    machines, all timers belonging to the processor are moved
                    to the processor that owns the clock as reported by
                    MP_CLOCK.

     MP_UNISOLATE   The processor number given by arg1, interpreted as an
                    'int', is unisolated and empowered to run any unrestricted
                    processes.  This is the default system configuration for
                    all processors.  This command requires superuser
                    authority.

     MP_PREEMPTIVE  The processor number given by arg1, interpreted as an
                    'int', has its clock scheduler enabled.  This is the
                    default for all processors.  This command requires
                    superuser authority.

     MP_NONPREEMPTIVE
                    The processor number given by arg1, interpreted as an
                    'int', has its clock scheduler disabled.  Normal process
                    time slicing is no longer enforced on that processor.  As
                    a result of turning off the clock interrupt, the interrupt
                    latency on this processor will be lower.  This command
                    requires superuser authority and is allowed only on an
                    isolated processor.  This command is not allowed on the
                    clock processor (see MP_CLOCK).

     MP_CLOCK       The processor number given by arg1, interpreted as an
                    'int', is given charge of the operating system software
                    clock (see timers(5)).  This command requires superuser
                    authority.

     MP_FASTCLOCK   The processor number given by arg1, interpreted as an
                    'int', is given charge of the operating system software
                    fast clock (see timers(5)).  This command requires
                    superuser authority.

     MP_MISER_GETREQUEST
     MP_MISER_SENDREQUEST
     MP_MISER_RESPOND
     MP_MISER_GETRESOURCE
     MP_MISER_SETRESOURCE
     MP_MISER_CHECKACCESS
                    These are all interfaces that are used to implement
                    various miser(1) functions.  These are all subject to
                    change and should not be called directly by applications.

     MP_MUSTRUN     Assigns the calling process to run only on the processor
                    number specified by arg1, interpreted as an 'int', except as
                    required for communications with hardware devices.  A
                    process that has allocated a CC sync register (see
                    ccsync(7m)) is restricted to running on a particular cpu.
                    Attempts to reassign such a process to another cpu will
                    fail until the CC sync register has been relinquished.

     MP_MUSTRUN_PID Assigns the process specified by arg2 to run only on the
                    processor number specified by arg1, both interpreted as
                    'int', except as required for communications with hardware
                    devices.  A process that has allocated a CC sync register
                    (see ccsync(7m)) is restricted to running on a particular
                    cpu.  Attempts to reassign such a process to another cpu
                    will fail until the CC sync register has been
                    relinquished.

     MP_GETMUSTRUN  Returns the processor the current process has been set to
                    run on using the MP_MUSTRUN command.  If the current
                    process has not been assigned to a specific processor, -1
                    is returned and errno is set to EINVAL.

     MP_GETMUSTRUN_PID
                    Returns the processor that the process specified by arg1
                    has been set to run on using the MP_MUSTRUN or
                    MP_MUSTRUN_PID command.  If the process has not been
                    assigned to a specific processor, -1 is returned and errno
                    is set to EINVAL.

     MP_RUNANYWHERE Frees the calling process to run on whatever processor the
                    system deems suitable.

     MP_RUNANYWHERE_PID
                    Frees the process specified by arg1 to run on whatever
                    processor the system deems suitable.

     MP_KERNADDR    Returns the address of various kernel data structures.
                    The structure returned is selected by arg1.  The list of
                    available structures is detailed in <sys/sysmp.h>.  This
                    option is used by many system programs to avoid having to
                    look in /unix for the location of the data structures.

     MP_SASZ        Returns the size of various system accounting structures.
                    As above, the structure returned is governed by arg1.

     MP_SAGET1      Returns the contents of various system accounting
                    structures.  The information is only for the processor
                    specified by arg4.  As above, the structure returned is
                    governed by arg1.  arg2 points to a buffer in the address
                    space of the calling process and arg3 specifies the
                    maximum number of bytes to transfer.

     MP_SAGET       Returns the contents of various system accounting
                    structures.  The information is summed across all
                    processors before it is returned.  As above, the structure
                    returned is governed by arg1.  arg2 points to a buffer in
                    the address space of the calling process and arg3
                    specifies the maximum number of bytes to transfer.

     Possible errors from sysmp are:

     [EPERM]     The effective user ID is not superuser.  Many of the commands
                 require superuser privilege.

     [EPERM]     The user ID of the sending process is not superuser, and its
                 real or effective user ID does not match the real, saved,  or
                 effective user ID of the receiving process.

     [ESRCH]     No process corresponding to that specified by a
                 MP_MUSTRUN_PID, MP_GETMUSTRUN_PID, or MP_RUNANYWHERE_PID
                 could be found.

     [EINVAL]    The processor named by a MP_EMPOWER, MP_RESTRICT, MP_CLOCK or
                 MP_SAGET1 command does not exist.

     [EINVAL]    The cmd argument is invalid.

     [EINVAL]    The arg1 argument to a MP_KERNADDR command is invalid.

     [EINVAL]    An attempt was made via MP_MUSTRUN or MP_MUSTRUN_PID to move
                 a process owning a CC sync register from the cpu controlling
                 the CC sync register.

     [EINVAL]    The target of the MP_GETMUSTRUN command has not been set to
                 run on a specific processor.

     [EBUSY]     An attempt was made to restrict the only unrestricted
                 processor or to restrict the master processor.

     [EFAULT]    An invalid buffer address has been supplied by the calling
                 process.

SEE ALSO


     mpadmin(1), runon(1), getpagesize(2), schedctl(2), timers(5)

DIAGNOSTICS

     Upon successful completion, the cmd dependent data is returned.
     Otherwise, a value of -1 is returned and errno is set to indicate the
     error.



[-- Attachment #3: mpadmin.man.txt --]
[-- Type: text/plain, Size: 5437 bytes --]


mpadmin(1)                                                          mpadmin(1)



NAME


     mpadmin - control and report processor status

SYNOPSIS


     mpadmin -n

     mpadmin -u[processor]

     mpadmin -r[processor]

     mpadmin -c[processor]

     mpadmin -f[processor]

     mpadmin -I[processor]

     mpadmin -U[processor]

     mpadmin -D[processor]

     mpadmin -C[processor]

     mpadmin -s

DESCRIPTION


     mpadmin provides control/information of processor status.

     Exactly one argument is accepted by mpadmin at each invocation.  The
     following arguments are accepted:

     -n           Report which processors are physically configured.  The
                  numbers of the physically configured processors are written
                  to the standard output, one processor number per line.
                  Processors are numbered beginning from 0.

     -u[processor]
                  When no processor is specified, the numbers of the
                  processors that are available to schedule unrestricted
                  processes are written to the standard output.  Otherwise,
                  mpadmin enables the processor number processor to run any
                  unrestricted processes.

     -r[processor]
                  When no processor is specified, the numbers of the
                  processors that are restricted from running any processes
                  (except those assigned via the sysmp(MP_MUSTRUN) function,
                  the runon(1) command, or because of hardware necessity) are
                  written to the standard output.  Otherwise, mpadmin
                  restricts the processor numbered processor.

     -c[processor]
                  When no processor is specified, the number of the processor
                  that handles the operating system software clock is written
                  to the standard output.  Otherwise, operating system
                  software clock handling is moved to the processor numbered
                  processor.  See timers(5) for more details.

     -f[processor]
                  When no processor is specified, the number of the processor
                  that handles the operating system fast clock is written to
                  the standard output.  Otherwise, operating system fast clock
                  handling is moved to the processor numbered processor.  See
                  ftimer(1) and timers(5) for a description of the fast clock
                  usage.

     -I[processor]
                  When no processor is specified, the numbers of the
                  processors that are isolated are written to the standard
                  output.  Otherwise, mpadmin isolates the processor numbered
                  processor.  An isolated processor is restricted as by the -r
                  argument.  In addition, instruction cache and Translation
                  Lookaside Buffer synchronization are blocked, and
                  synchronization is delayed until a system service is
                  requested.

     -U[processor]
                  When no processor is specified, the numbers of the
                  processors that are not isolated are written to the standard
                  output.  Otherwise, mpadmin unisolates the processor
                  numbered processor.

     -D[processor]
                  When no processor is specified, the numbers of the
                  processors that are not running the clock scheduler are
                  written to the standard output.  Otherwise, mpadmin disables
                  the clock scheduler on the processor numbered processor.
                  This makes that processor nonpreemptive, so that normal IRIX
                  process time slicing is no longer enforced.  Processes that
                  run on a non-preemptive processor are not preempted because
                  of timer interrupts.  They are preempted only when
                  requesting a system service that causes them to wait, or
                  that makes a higher-priority process runnable (for example,
                  posting a semaphore).

     -C[processor]
                  When no processor is specified, the numbers of the
                  processors that are running the clock scheduler are written
                  to the standard output.  Otherwise, mpadmin enables the
                  clock scheduler on the processor numbered processor.
                  Processes on a preemptive processor can be preempted at the
                  end of their time slice.

     -s           A summary of the unrestricted, restricted, isolated,
                  preemptive and clock processor numbers is written to the
                  standard output.

SEE ALSO


     ftimer(1), runon(1), sysmp(2), timers(5).

DIAGNOSTICS


     When an argument specifies a processor, 0 is returned on success, -1 on
     failure.  Otherwise, the number of processors associated with argument is
     returned.

WARNINGS


     It is not possible to restrict or isolate all processors.  Processor 0
     must never be restricted or isolated.

BUGS


     Changing the clock processor may cause the system to lose a small amount
     of system time.

     When a processor is not provided as an argument, mpadmin's exit value
     will not exceed 255.  If more than 255 processors exist, mpadmin will
     return 0.

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-11  0:08         ` Jeff Garzik
@ 2002-03-11  0:32           ` Tim Hockin
  0 siblings, 0 replies; 16+ messages in thread
From: Tim Hockin @ 2002-03-11  0:32 UTC
  To: Jeff Garzik
  Cc: Tim Hockin, Robert Love, Andreas Jaeger, torvalds, linux-kernel

> > If we are going to pick an affinity system, please, let's consider sysmp().
> 
> Not too bad.  I picked a random sysmp(2) man page off the net (attached
> for ease of others' reference).
 
So, there are actually two parts to sysmp().  The way SGI used to do it is
with PSets (MP_PSET to sysmp()).  They seem to have dropped exported support
for PSets; I don't know why.  The idea is this:

At boot the system creates a PSet with ALL processors, and one set for each
single CPU.  Root can define extra sets with specified CPUs, too.
Processes can then be run on a specific PSet (the command-line tool is 'runon'):

runon 3 yes 	# runs on PSET #3

This is OK, but it has several drawbacks:
* users cannot run on an arbitrary set of procs
* defining a set for every combination of procs is ludicrous

However, it has several upsides:
* disabling a CPU is as simple as removing it from a pset struct, not
iterating over all tasks
* conceptually hides the 'bitmask of CPUs'
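The trade-off above can be sketched in C. This is a hypothetical model, not SGI's implementation: it assumes at most 64 CPUs (one bit each in a uint64_t), and the names struct pset, pset_init(), pset_create(), pset_disable_cpu() and task_can_run_on() are invented for illustration. Tasks store only a set number, so taking a CPU away touches the small set table rather than every task.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS  64            /* assumption: one bit per CPU in a uint64_t */
#define MAX_PSETS 128
#define PSET_ALL  0             /* set 0: every CPU; set n+1: CPU n alone */

typedef uint64_t cpumask64_t;

struct pset {
    int         in_use;
    cpumask64_t cpus;           /* CPUs belonging to this set */
};

struct task {
    int pset_id;                /* tasks name a set, never a raw mask */
};

static struct pset psets[MAX_PSETS];

/* Boot-time sets: one with all CPUs and one per single CPU. */
static void pset_init(int ncpus)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS && ncpus < MAX_PSETS);
    psets[PSET_ALL].in_use = 1;
    psets[PSET_ALL].cpus = (ncpus == MAX_CPUS) ? ~(cpumask64_t)0
                                               : ((cpumask64_t)1 << ncpus) - 1;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        psets[cpu + 1].in_use = 1;
        psets[cpu + 1].cpus = (cpumask64_t)1 << cpu;
    }
}

/* Root-defined set; returns its id, or -1 if the table is full. */
static int pset_create(cpumask64_t cpus)
{
    for (int i = 0; i < MAX_PSETS; i++) {
        if (!psets[i].in_use) {
            psets[i].in_use = 1;
            psets[i].cpus = cpus;
            return i;
        }
    }
    return -1;
}

/* Taking a CPU away edits the set table only; no task is visited. */
static void pset_disable_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    for (int i = 0; i < MAX_PSETS; i++)
        if (psets[i].in_use)
            psets[i].cpus &= ~((cpumask64_t)1 << cpu);
}

/* Scheduler-side check: one lookup through the task's set. */
static int task_can_run_on(const struct task *t, int cpu)
{
    return (int)((psets[t->pset_id].cpus >> cpu) & 1);
}
```

With pset_init(4), set 0 spans CPUs 0-3 and set 3 is CPU 2 alone, so a task with pset_id = 3 can run only on CPU 2 until pset_disable_cpu(2) empties that set. A real design would have to decide what happens to tasks left with an empty set.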

> It duplicates some stuff set elsewhere, and seems more than a bit like
> ioctl(2) by another name, but doesn't seem too bad.  Note we should be
> careful not to overengineer the interface, either...

At some point Ralf Baechle asked me to extend it more for IRIX
compatibility.  We may want to just drop that altogether.  Several of the
sysmp() interfaces can be handled at the library layer and re-routed to
their existing interfaces.
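A minimal sketch of that re-routing, assuming a library-level sysmp_shim(int cmd, long arg); the function is hypothetical, and the real IRIX sysmp() is variadic. The command names are modeled on IRIX sysmp(2) commands, but the numeric values here are made up. Each command maps onto an interface Linux and glibc already provide: sysconf() for processor counts, and sched_setaffinity(2), which Linux gained later, for pinning.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/* Names modeled on IRIX sysmp(2); these numeric values are hypothetical. */
enum { MP_NPROCS = 1, MP_NAPROCS = 2, MP_MUSTRUN = 3 };

/* Hypothetical library-level sysmp(): no new system call, each command is
 * forwarded to an existing Linux/glibc interface. */
static long sysmp_shim(int cmd, long arg)
{
    switch (cmd) {
    case MP_NPROCS:                       /* processors configured */
        return sysconf(_SC_NPROCESSORS_CONF);
    case MP_NAPROCS:                      /* processors currently online */
        return sysconf(_SC_NPROCESSORS_ONLN);
    case MP_MUSTRUN: {                    /* pin the caller to CPU 'arg' */
        cpu_set_t set;

        if (arg < 0 || arg >= CPU_SETSIZE) {
            errno = EINVAL;
            return -1;
        }
        CPU_ZERO(&set);
        CPU_SET((size_t)arg, &set);
        return sched_setaffinity(0, sizeof(set), &set);
    }
    default:                              /* command not re-routed */
        errno = EINVAL;
        return -1;
    }
}
```

Commands without an obvious existing counterpart would still need kernel work; this only covers the easy ones.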

> Just setting a bitmask does seem a bit limiting when thinking about the
> future, agreed.

What is the future of the existing cpus_allowed bitmask?  Is it becoming
something else?

Perhaps we want to keep sysmp() in name and form, perhaps just in name,
perhaps not at all.  This is an area in which I have (had, but could get
again) a lot of interest, but before I waste any more time on it, I'd like
to actually co-design a feature set.

What do we want:
* unprivileged ability to change current->pset?
	- any user can call sysmp(MP_RUNON) anytime
* privileged ability only (runon becomes suid)?
	- can "trap" processes to a CPU - it has been requested a lot
* processor sets or just bitmasks/lists?
	- someone was working on memory sets, similar to psets

If we really want this, I definitely want to help. :)
Tim

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-10 18:15 [PATCH] syscall interface for cpu affinity Robert Love
  2002-03-10 20:29 ` Andreas Jaeger
  2002-03-10 22:05 ` Chris Wedgwood
@ 2002-03-11  0:38 ` Andreas Ferber
  2002-03-15 22:06   ` Stephen Samuel
  2 siblings, 1 reply; 16+ messages in thread
From: Andreas Ferber @ 2002-03-11  0:38 UTC
  To: Robert Love; +Cc: torvalds, linux-kernel

On Sun, Mar 10, 2002 at 01:15:03PM -0500, Robert Love wrote:
> 
> This patch implements
> 
>         int sched_set_affinity(pid_t pid, unsigned int len,
>                                unsigned long *new_mask_ptr);
> 
>         int sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
>                                unsigned long *user_mask_ptr)
> 
> which set and get the cpu affinity (task->cpus_allowed) for a task,
> using the set_cpus_allowed function in Ingo's scheduler.  The functions
> properly support changes to cpus_allowed, implement security, and are
> well-tested.

Setting the affinity of a whole process group also makes sense IMHO.
Therefore I think an interface more like the setpriority syscall
for sched_set_affinity (with two parameters which/who instead of a
single PID) would be more flexible, e.g.

    int sched_set_affinity(int which, int who, unsigned int len,
                           unsigned long *new_mask_ptr);

with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
to the value of who.
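A user-space model of the proposed dispatch, treating which as the namespace selector (PRIO_PROCESS, PRIO_PGRP or PRIO_USER) and who as the ID within it, as setpriority(2) does. The task table and set_affinity_model() are hypothetical stand-ins for the kernel's task list; a real syscall would also need the permission checks the patch implements, and setpriority's convention that who == 0 means the caller is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>   /* PRIO_PROCESS, PRIO_PGRP, PRIO_USER */
#include <sys/types.h>

/* Hypothetical stand-in for the kernel's task list. */
struct task {
    pid_t         pid;
    pid_t         pgid;
    uid_t         uid;
    unsigned long cpus_allowed;   /* bit n set = may run on CPU n */
};

/* Apply 'mask' to every task selected by (which, who), in the style of
 * setpriority(2). Returns the number of tasks changed, or -1 if 'which'
 * is unknown or the mask is empty. */
static int set_affinity_model(struct task *tasks, size_t ntasks,
                              int which, int who, unsigned long mask)
{
    assert(tasks != NULL || ntasks == 0);
    if (which != PRIO_PROCESS && which != PRIO_PGRP && which != PRIO_USER)
        return -1;
    if (mask == 0)
        return -1;              /* a task must keep at least one CPU */

    int changed = 0;
    for (size_t i = 0; i < ntasks; i++) {
        int match = 0;

        switch (which) {
        case PRIO_PROCESS: match = tasks[i].pid  == (pid_t)who; break;
        case PRIO_PGRP:    match = tasks[i].pgid == (pid_t)who; break;
        case PRIO_USER:    match = tasks[i].uid  == (uid_t)who; break;
        }
        if (match) {
            tasks[i].cpus_allowed = mask;
            changed++;
        }
    }
    return changed;
}
```

For example, set_affinity_model(tasks, n, PRIO_PGRP, 100, 0x1) restricts every task in process group 100 to CPU 0 and returns how many tasks it changed.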


Getting the mask of a group of processes doesn't make sense though
(what if they differ?), so the current interface of sched_get_affinity
is just fine IMHO.
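For the single-process get side, the affinity calls Linux later gained, sched_getaffinity(2) and sched_setaffinity(2), take a PID plus a buffer size and mask, much like the len/mask pair in the patch; glibc wraps the mask in a cpu_set_t. A minimal sketch reading one process's mask; allowed_cpu_count() is an illustrative helper, and a pid of 0 means the calling process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Number of CPUs that 'pid' may run on (0 = calling process),
 * or -1 if the mask cannot be read (errno is left set, e.g. ESRCH). */
static int allowed_cpu_count(pid_t pid)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(pid, sizeof(set), &set) != 0)
        return -1;

    int n = CPU_COUNT(&set);
    assert(n > 0);   /* a task is always allowed at least one CPU */
    return n;
}
```

allowed_cpu_count(0) returns at least 1 for the caller; for a nonexistent PID the call fails and the helper returns -1.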

Andreas
-- 
       Andreas Ferber - dev/consulting GmbH - Bielefeld, FRG
     ---------------------------------------------------------
         +49 521 1365800 - af@devcon.net - www.devcon.net

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-11  0:38 ` Andreas Ferber
@ 2002-03-15 22:06   ` Stephen Samuel
  2002-03-16  0:43     ` Andreas Ferber
  0 siblings, 1 reply; 16+ messages in thread
From: Stephen Samuel @ 2002-03-15 22:06 UTC
  To: Andreas Ferber; +Cc: Robert Love, torvalds, linux-kernel

Picking nits, but....

Andreas Ferber wrote:

 > Setting the affinity of a whole process group also makes sense IMHO.
 > Therefore I think an interface more like the setpriority syscall
 > for sched_set_affinity (with two parameters which/who instead of a
 > single PID) would be more flexible, e.g.
 >
 >     int sched_set_affinity(int which, int who, unsigned int len,
 >                            unsigned long *new_mask_ptr);
 >
 > with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
 > to the value of who.

I would suggest that the order be

int sched_set_affinity(int who, int which, unsigned int len,
                             unsigned long *new_mask_ptr);

This would have the {p,pg}id be the first thing that a programmer
would see (likely more important than the 'which'.).


-- 
Stephen Samuel +1(604)876-0426                samuel@bcgreen.com
		   http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.


* Re: [PATCH] syscall interface for cpu affinity
  2002-03-15 22:06   ` Stephen Samuel
@ 2002-03-16  0:43     ` Andreas Ferber
  2002-03-16  4:24       ` Stephen Samuel
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Ferber @ 2002-03-16  0:43 UTC
  To: Stephen Samuel; +Cc: Robert Love, torvalds, linux-kernel

On Fri, Mar 15, 2002 at 02:06:04PM -0800, Stephen Samuel wrote:
>  >
>  >     int sched_set_affinity(int which, int who, unsigned int len,
>  >                            unsigned long *new_mask_ptr);
>  >
>  > with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
>  > to the value of who.

Uh, who/which should be just the other way round in the description
(but not in the prototype). Sorry.

> I would suggest that the order be
> 
> int sched_set_affinity(int who, int which, unsigned int len,
>                              unsigned long *new_mask_ptr);
> 
> This would have the {p,pg}id be the first thing that a programmer
> would see (likely more important than the 'which'.).

See my correction above, does that address your concern?

Andreas
-- 
       Andreas Ferber - dev/consulting GmbH - Bielefeld, FRG
     ---------------------------------------------------------
         +49 521 1365800 - af@devcon.net - www.devcon.net

* Re: [PATCH] syscall interface for cpu affinity
  2002-03-16  0:43     ` Andreas Ferber
@ 2002-03-16  4:24       ` Stephen Samuel
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Samuel @ 2002-03-16  4:24 UTC
  To: Andreas Ferber; +Cc: Robert Love, torvalds, linux-kernel

Almost... Same effect  (mostly)...

It does, however, leave us arguing the linguistic semantics of
which name 'who' should have. It seems to me that the most
natural would be with 'who' being the 'name' of the target, and
'which' specifying which name space 'who' is operating in.

UGH: messing with these names via pronouns is too confusing:
-----------
    How about this:

int sched_set_affinity(int who, int which, unsigned int len,
                              unsigned long *new_mask_ptr);

'who' being a {process, process-group or user } ID , and
with 'which' being one of {PRIO_PROCESS, PRIO_PGRP, PRIO_USER},
respectively -- specifying which namespace 'who' operates in.
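For comparison, setpriority(2) and getpriority(2), the calls this proposal imitates, take the namespace selector first and the ID second: int setpriority(int which, id_t who, int prio). The names mean the same thing as described here, but the order suggested here reverses the setpriority convention. A small sketch of a getpriority() call in that argument order; the helper get_nice() is illustrative. Because -1 is a valid nice value, errno has to be cleared before the call and checked afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Read the nice value selected by (which, who); e.g. (PRIO_PROCESS, 0) is
 * the calling process. -1 is a legal nice value, so success is decided by
 * errno, not by the return value alone. Returns 0 on success, -1 on error. */
static int get_nice(int which, id_t who, int *nice_out)
{
    assert(nice_out != NULL);
    errno = 0;
    int prio = getpriority(which, who);
    if (prio == -1 && errno != 0)
        return -1;
    *nice_out = prio;
    return 0;
}
```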

I think that that is what you were trying to say, right?

Andreas Ferber wrote:
 > On Fri, Mar 15, 2002 at 02:06:04PM -0800, Stephen Samuel wrote:
 >
 >> >
 >> >     int sched_set_affinity(int which, int who, unsigned int len,
 >> >                            unsigned long *new_mask_ptr);
 >> >
 >> > with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
 >> > to the value of who.
 >>
 >
 > Uh, who/which should be just the other way round in the description
 > (but not in the prototype). Sorry.
 >
 >
 >>I would suggest that the order be
 >>
 >>int sched_set_affinity(int who, int which, unsigned int len,
 >>                             unsigned long *new_mask_ptr);
 >>
 >>This would have the {p,pg}id be the first thing that a programmer
 >>would see (likely more important than the 'which'.).

-- 
Stephen Samuel +1(604)876-0426                samuel@bcgreen.com
		   http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.


end of thread

Thread overview: 16+ messages
2002-03-10 18:15 [PATCH] syscall interface for cpu affinity Robert Love
2002-03-10 20:29 ` Andreas Jaeger
2002-03-10 20:53   ` Robert Love
2002-03-10 21:03     ` Andreas Jaeger
2002-03-10 22:23       ` Andreas Schwab
2002-03-10 23:56       ` Andreas Ferber
2002-03-10 23:45     ` Jeff Garzik
1976-03-03 15:58       ` Tim Hockin
2002-03-11  0:08         ` Jeff Garzik
2002-03-11  0:32           ` Tim Hockin
2002-03-10 22:05 ` Chris Wedgwood
2002-03-10 22:11   ` Robert Love
2002-03-11  0:38 ` Andreas Ferber
2002-03-15 22:06   ` Stephen Samuel
2002-03-16  0:43     ` Andreas Ferber
2002-03-16  4:24       ` Stephen Samuel