All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix argument checking in sched_setaffinity
@ 2004-08-31 14:30 Andi Kleen
  2004-09-01  1:36 ` Paul Jackson
  0 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-08-31 14:30 UTC (permalink / raw)
  To: akpm, linux-kernel


This patch fixes the argument length checking in sched_setaffinity.

Previously it would error out when the length passed was
smaller than sizeof(cpumask_t). And any bits beyond cpumask_s
would be silently ignored.

First this assumes that the user application knows the size
of cpumask_t, which should be kernel internal. When you increase 
cpumask_t old applications break and there is no good way
for the application to find out the cpumask_t size the kernel
uses.

This patch changes it to do similar checking to the NUMA API calls: 

- Any length is ok as long as all online CPUs are covered
(this could still cause application breakage with more CPUs, 
but there is no good way around it) 

- When the user passes more than cpumask_t bytes the excess
bytes are checked to be zero.


diff -u linux-2.6.8-work/kernel/sched.c-AFFINITY linux-2.6.8-work/kernel/sched.c
--- linux-2.6.8-work/kernel/sched.c-AFFINITY	2004-08-05 04:31:11.000000000 +0200
+++ linux-2.6.8-work/kernel/sched.c	2004-08-31 15:36:38.000000000 +0200
@@ -2891,6 +2891,34 @@
 	return retval;
 }
 
+static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
+			     cpumask_t *new_mask)
+{
+	if (len < sizeof(cpumask_t)) {
+		/* Smaller is ok as long as all online CPUs are covered */
+		int i, max = 0;
+		for_each_online_cpu(i) 
+			max = i; 
+		if (len < (max + 7)/8)
+			return -EINVAL;
+		memset(new_mask, 0, sizeof(cpumask_t)); 
+	} else if (len > sizeof(cpumask_t)) { 
+		/* Longer is ok as long as all high bits are 0 */
+		int i;
+		if (len > PAGE_SIZE)
+			return -EINVAL;
+		for (i = sizeof(cpumask_t); i < len; i++) { 
+			unsigned char val;
+			if (get_user(val, (unsigned char *)user_mask_ptr + i))
+				return -EFAULT; 
+			if (val)
+				return -EINVAL;
+		} 
+		len = sizeof(cpumask_t);			
+	}
+	return copy_from_user(new_mask, user_mask_ptr, len) ? -EFAULT : 0;
+}
+
 /**
  * sys_sched_setaffinity - set the cpu affinity of a process
  * @pid: pid of the process
@@ -2903,12 +2931,10 @@
 	cpumask_t new_mask;
 	int retval;
 	task_t *p;
-
-	if (len < sizeof(new_mask))
-		return -EINVAL;
-
-	if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask)))
-		return -EFAULT;
+	
+	retval = get_user_cpu_mask(user_mask_ptr, len, &new_mask);
+	if (retval)
+		return retval;
 
 	lock_cpu_hotplug();
 	read_lock(&tasklist_lock);


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-08-31 14:30 [PATCH] Fix argument checking in sched_setaffinity Andi Kleen
@ 2004-09-01  1:36 ` Paul Jackson
  2004-09-01  1:59   ` Anton Blanchard
  2004-09-04 13:37   ` Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Paul Jackson @ 2004-09-01  1:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, linux-kernel

Looks good - thanks, Andi.

I notice that you didn't bother with the fractional byte that is handled
by 'endmask' in mm/mempolicy.c:get_nodes().  But I really don't give a
hoot - either way is fine by me.

I've written a couple of code snippets that manage to intuit the size of
the kernel's cpumask dynamically from user space, by probing with
various sched_getaffinity() calls.  But since your patch only changes
the errors generated by sched_setaffinity() [that's "set", not "get"], I
will not experience any grief from this subtle change in the kernel's
API.

Should you lock hotplug before calling get_user_cpu_mask(), since
get_user_cpu_mask() depends on cpu_online_mask()?

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-01  1:36 ` Paul Jackson
@ 2004-09-01  1:59   ` Anton Blanchard
  2004-09-02  9:33     ` Paul Jackson
  2004-09-04 13:40     ` Andi Kleen
  2004-09-04 13:37   ` Andi Kleen
  1 sibling, 2 replies; 35+ messages in thread
From: Anton Blanchard @ 2004-09-01  1:59 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Andi Kleen, akpm, linux-kernel

 
> I notice that you didn't bother with the fractional byte that is handled
> by 'endmask' in mm/mempolicy.c:get_nodes().  But I really don't give a
> hoot - either way is fine by me.
> 
> I've written a couple of code snippets that manage to intuit the size of
> the kernel's cpumask dynamically from user space, by probing with
> various sched_getaffinity() calls.  But since your patch only changes
> the errors generated by sched_setaffinity() [that's "set", not "get"], I
> will not experience any grief from this subtle change in the kernel's
> API.
> 
> Should you lock hotplug before calling get_user_cpu_mask(), since
> get_user_cpu_mask() depends on cpu_online_mask()?

FYI the NUMA API and affinity code is broken on 64bit big endian. We
really need a get/set compat bitmap and use it. How does this look?
Not well tested yet...

Anton

diff -puN kernel/compat.c~compat_bitmap kernel/compat.c
--- gr_work/kernel/compat.c~compat_bitmap	2004-06-16 10:32:11.590272927 -0500
+++ gr_work-anton/kernel/compat.c	2004-06-16 10:32:11.607270238 -0500
@@ -561,3 +561,83 @@ long compat_clock_nanosleep(clockid_t wh
 
 /* timer_create is architecture specific because it needs sigevent conversion */
 
+long compat_get_bitmap(unsigned long *mask, compat_ulong_t __user *umask,
+		       unsigned long bitmap_size)
+{
+	int i, j;
+	unsigned long m;
+	compat_ulong_t um;
+	unsigned long nr_compat_longs;
+
+	/* align bitmap up to nearest compat_long_t boundary */
+	bitmap_size = ALIGN(bitmap_size, BITS_PER_COMPAT_LONG);
+
+	if (verify_area(VERIFY_READ, umask, bitmap_size / 8))
+		return -EFAULT;
+
+	nr_compat_longs = BITS_TO_COMPAT_LONGS(bitmap_size);
+
+	for (i = 0; i < BITS_TO_LONGS(bitmap_size); i++) {
+		m = 0;
+
+		for (j = 0; j < sizeof(m)/sizeof(um); j++) {
+			/*
+			 * We dont want to read past the end of the userspace
+			 * bitmap. We must however ensure the end of the
+			 * kernel bitmap is zeroed.
+			 */
+			if (nr_compat_longs-- > 0) {
+				if (__get_user(um, umask))
+					return -EFAULT;
+			} else {
+				um = 0;
+			}
+
+			umask++;
+			m |= (long)um << (j * BITS_PER_COMPAT_LONG);
+		}
+		*mask++ = m;
+	}
+
+	return 0;
+}
+
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+		       unsigned long bitmap_size)
+{
+	int i, j;
+	unsigned long m;
+	compat_ulong_t um;
+	unsigned long nr_compat_longs;
+
+	/* align bitmap up to nearest compat_long_t boundary */
+	bitmap_size = ALIGN(bitmap_size, BITS_PER_COMPAT_LONG);
+
+	if (verify_area(VERIFY_WRITE, umask, bitmap_size / 8))
+		return -EFAULT;
+
+	nr_compat_longs = BITS_TO_COMPAT_LONGS(bitmap_size);
+
+	for (i = 0; i < BITS_TO_LONGS(bitmap_size); i++) {
+		m = *mask++;
+
+		for (j = 0; j < sizeof(m)/sizeof(um); j++) {
+			um = m;
+
+			/*
+			 * We dont want to write past the end of the userspace
+			 * bitmap.
+			 */
+			if (nr_compat_longs-- > 0) {
+				if (__put_user(um, umask))
+					return -EFAULT;
+			}
+
+			umask++;
+			m >>= 4*sizeof(um);
+			m >>= 4*sizeof(um);
+		}
+	}
+
+	return 0;
+}
diff -puN include/linux/compat.h~compat_bitmap include/linux/compat.h
--- gr_work/include/linux/compat.h~compat_bitmap	2004-06-16 10:32:11.595272136 -0500
+++ gr_work-anton/include/linux/compat.h	2004-06-16 10:32:11.608270080 -0500
@@ -130,5 +130,15 @@ asmlinkage long compat_sys_select(int n,
 		compat_ulong_t __user *outp, compat_ulong_t __user *exp,
 		struct compat_timeval __user *tvp);
 
+#define BITS_PER_COMPAT_LONG    (8*sizeof(compat_long_t))
+
+#define BITS_TO_COMPAT_LONGS(bits) \
+	(((bits)+BITS_PER_COMPAT_LONG-1)/BITS_PER_COMPAT_LONG)
+
+long compat_get_bitmap(unsigned long *mask, compat_ulong_t __user *umask,
+		       unsigned long bitmap_size);
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+		       unsigned long bitmap_size);
+
 #endif /* CONFIG_COMPAT */
 #endif /* _LINUX_COMPAT_H */

_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-01  1:59   ` Anton Blanchard
@ 2004-09-02  9:33     ` Paul Jackson
  2004-09-04 13:40     ` Andi Kleen
  1 sibling, 0 replies; 35+ messages in thread
From: Paul Jackson @ 2004-09-02  9:33 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: ak, akpm, linux-kernel

Anton wrote:
> How does this look?

I haven't developed much of an eye yet for the compat routines - so this
looks ok to me, but such means little ;).

Hopefully someone else with a history here can take a look.  Or if you
whack me again, I might be able to examine it further.

Sorry ... thanks.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-01  1:36 ` Paul Jackson
  2004-09-01  1:59   ` Anton Blanchard
@ 2004-09-04 13:37   ` Andi Kleen
       [not found]     ` <20040904171417.67649169.pj@sgi.com>
  1 sibling, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-09-04 13:37 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, linux-kernel

On Tue, Aug 31, 2004 at 06:36:55PM -0700, Paul Jackson wrote:
> Looks good - thanks, Andi.
> 
> I notice that you didn't bother with the fractional byte that is handled
> by 'endmask' in mm/mempolicy.c:get_nodes().  But I really don't give a
> hoot - either way is fine by me.

It is not needed because this function gets bytes instead of bits.


> I've written a couple of code snippets that manage to intuit the size of
> the kernel's cpumask dynamically from user space, by probing with
> various sched_getaffinity() calls.  But since your patch only changes
> the errors generated by sched_setaffinity() [that's "set", not "get"], I
> will not experience any grief from this subtle change in the kernel's
> API.
> 
> Should you lock hotplug before calling get_user_cpu_mask(), since
> get_user_cpu_mask() depends on cpu_online_mask()?

Good point yes, that is missing. 

However Linus has already thrown the code out and replaced
it with something more broken :-/

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-01  1:59   ` Anton Blanchard
  2004-09-02  9:33     ` Paul Jackson
@ 2004-09-04 13:40     ` Andi Kleen
  2004-09-05 14:27       ` Anton Blanchard
  1 sibling, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-09-04 13:40 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Paul Jackson, akpm, linux-kernel

On Wed, Sep 01, 2004 at 11:59:22AM +1000, Anton Blanchard wrote:
>  
> > I notice that you didn't bother with the fractional byte that is handled
> > by 'endmask' in mm/mempolicy.c:get_nodes().  But I really don't give a
> > hoot - either way is fine by me.
> > 
> > I've written a couple of code snippets that manage to intuit the size of
> > the kernel's cpumask dynamically from user space, by probing with
> > various sched_getaffinity() calls.  But since your patch only changes
> > the errors generated by sched_setaffinity() [that's "set", not "get"], I
> > will not experience any grief from this subtle change in the kernel's
> > API.
> > 
> > Should you lock hotplug before calling get_user_cpu_mask(), since
> > get_user_cpu_mask() depends on cpu_online_mask()?
> 
> FYI the NUMA API and affinity code is broken on 64bit big endian. We
> really need a get/set compat bitmap and use it. How does this look?
> Not well tested yet...

Looks good from a quick review. But there is nothing to call it? 

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
       [not found]     ` <20040904171417.67649169.pj@sgi.com>
@ 2004-09-05  0:18       ` Linus Torvalds
  2004-09-05  1:05         ` Paul Jackson
  2004-09-06 13:16         ` Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Linus Torvalds @ 2004-09-05  0:18 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Andi Kleen, akpm, linux-kernel



On Sat, 4 Sep 2004, Paul Jackson wrote:
> 
> How is what Linus left more broken?

It's not. If anything, we should probably remove even more.

I don't see what the problem was with just requiring the right damn size.  
User mode can trivially get the size by asking for it. But if it can't be
bothered, then Andi's code certainly just made things worse.

		Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  0:18       ` Linus Torvalds
@ 2004-09-05  1:05         ` Paul Jackson
  2004-09-05  1:38           ` Linus Torvalds
  2004-09-06 13:16         ` Andi Kleen
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Jackson @ 2004-09-05  1:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ak, akpm, linux-kernel

Linus wrote:
> It's not. If anything, we should probably remove even more.
>
> I don't see what the problem was with just requiring the right damn size.  
> User mode can trivially get the size by asking for it

I'll second that motion.  Match size, or return -EINVAL.

My understanding of "asking for it" requires at present a user code
loop, to probe for the size that works.  But my user code already does
that, and the first thing for which I audit any changes to this kernel
code is not breaking my sizing loop code in user space.

I'd mildly prefer adding a kernel/user API for explicitly providing the
two values:

	sizeof(cpumask_t)
	sizeof(nodemask_t)

This might help reduce the unending confusions in the user and library
code sitting on top of us.

We could two phase this:
 1) add an obvious way to size these masks, and then
 2) six months later, require sizes to match in all these calls.

I for one could live with a full and sudden change over, no phasing.
But apparently my field exposure is more limited than Andi's is, at
this time.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  1:05         ` Paul Jackson
@ 2004-09-05  1:38           ` Linus Torvalds
  2004-09-05  3:48             ` Paul Jackson
  0 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2004-09-05  1:38 UTC (permalink / raw)
  To: Paul Jackson; +Cc: ak, akpm, linux-kernel



On Sat, 4 Sep 2004, Paul Jackson wrote:
> 
> My understanding of "asking for it" requires at present a user code
> loop, to probe for the size that works.

Yeah, or just make a frigging big area, and asking the kernel for it ;)

Something like

	/* We just assume that 8k CPU's aren't going to happen */
	#define MAX_CPUMASK_BYTES (1024)

	void *cpumask = malloc(MAX_CPUMASK_BYTES);
	int real_size = sched_getaffinity(0, cpumask, MAX_CPUMASK_BYTES);

and no loop needed.

>				  But my user code already does
> that, and the first thing for which I audit any changes to this kernel
> code is not breaking my sizing loop code in user space.

I don't think you can reasonably use the "setaffinity()" call for sizing, 
since that historically just refused to use anything but the exact size. 
Sure, you could loop over every byte value known to man, but it's just a 
lot easier to do the "getaffinity" thing - if it fails, you can double the 
size of your buffer and try again. O(log(n)) rather than O(n) ;)

(And the "just start high enough" approach means that you can basically 
make it O(1) if you don't care about the theoretical possibility of a 
8k-CPU monster machine).

> I'd mildly prefer adding a kernel/user API for explicitly providing the
> two values:
> 
> 	sizeof(cpumask_t)
> 	sizeof(nodemask_t)
> 
> This might help reduce the unending confusions in the user and library
> code sitting on top of us.

I don't know how to sanely expose the damn things. Maybe in the vsyscall 
page or something. Adding YAEAE (yet another ELF aux entry) could be done, 
of course.

> We could two phase this:
>  1) add an obvious way to size these masks, and then
>  2) six months later, require sizes to match in all these calls.

Well, historically we _have_ required sizes to match. You can pass in 
larger sizes to the "get" functions (and they'll tell you how much you 
got), but the "set" functions required the user to know exactly what the 
size was. Which is easy, see above.

		Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  1:38           ` Linus Torvalds
@ 2004-09-05  3:48             ` Paul Jackson
  2004-09-05  3:57               ` Linus Torvalds
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Jackson @ 2004-09-05  3:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ak, akpm, linux-kernel

Linus wrote:
>	/* We just assume that 8k CPU's aren't going to happen */

SGI doesn't so assume ;).


> but it's just a lot easier to do the "getaffinity" thing - if it fails,
> you can double the size of your buffer and try again. O(log(n)) rather
> than O(n) ;)

I agree.  That's what my cpumask sizing loop does.

Well ... did.

Now it reads /sys/devices/system/node/node0/cpumap and computes the
size of the cpumask as an arithmetic function of the number of bytes
read (the ascii format uses 9 chars for each 32 bits of mask).

Either way works ...

My nodemask sizing code loops on get_mempolicy() calls of increasing
size, until they stop failing -EINVAL.


> Well, historically we _have_ required sizes to match.

I'm not sure what history you're looking at here, Linus.

Last weeks sys_sched_setaffinity didn't seem to require matching size,
only that user size is >= kernel size.  The kernel ignored the extra
user bits.

For nodemask_t, well let me just say the mbind/mempolicy calls are different.

If we want to go in the direction of requiring sizes to match in the
'set' calls, then instead of this weeks changes to sys_sched_setaffinity
allowing user size < kernel size, shouldn't we be going the other way,
and tightening the check in kernel/sched.c:sys_sched_setaffinity(), from
what it was a week ago:

        if (len < sizeof(new_mask))
                return -EINVAL;

to:

        if (len != sizeof(new_mask))
                return -EINVAL;

Or at least reverting this last weeks changes back to the '<' check?


> I don't know how to sanely expose the damn things

How about:

	$ cd /proc/sys/kernel
	$ head sizeof*
	==> sizeof_cpumask <==
	64

	==> sizeof_nodemask <==
	32

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  3:48             ` Paul Jackson
@ 2004-09-05  3:57               ` Linus Torvalds
  2004-09-05  4:17                 ` Paul Jackson
  0 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2004-09-05  3:57 UTC (permalink / raw)
  To: Paul Jackson; +Cc: ak, akpm, linux-kernel



On Sat, 4 Sep 2004, Paul Jackson wrote:
> 
> > Well, historically we _have_ required sizes to match.
> 
> I'm not sure what history you're looking at here, Linus.

I have my personal drugged-up history.

Take a toke, man.

IOW: You're obviously right.

> > I don't know how to sanely expose the damn things
> 
> How about:
> 
> 	$ cd /proc/sys/kernel
> 	$ head sizeof*
> 	==> sizeof_cpumask <==
> 	64
> 
> 	==> sizeof_nodemask <==
> 	32

Well, that's so much slower and not any more obvious than just doing the 
iterative few system calls that I don't really see the point other than 
from a scripting standpoint, but on the other hand I can't see how you'd 
use sched_setaffinity() and friends from within a script anyway, so ;)

(yes, there's perl syscalls, but then the standard "find the size" also 
works fine ;)

		Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  3:57               ` Linus Torvalds
@ 2004-09-05  4:17                 ` Paul Jackson
  2004-09-05  4:52                   ` Paul Jackson
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Jackson @ 2004-09-05  4:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ak, akpm, linux-kernel

> Take a toke, man.

Ahh ... much better ... thanks.

> Well, that's so much slower and not any more obvious than just doing the 
> iterative few system calls that I don't really see the point other than 

Perhaps no more obvious to you, but if you had to see the confusion
I'm seeing over on user side, this /proc/sys/kernel/sizeof_cpumask
might be a win.

But if you want to take the position that it's not the kernels job
to keep the users head screwed on straight, I won't argue.

Besides, when you wrote "I don't know how to sanely expose the damn
things", I instinctively took that as a challenge to present a way.

==

I still like the position I thought you took for a moment there, of
tightening, not loosening, the preconditions on setaffinity, starting
with backing out the changes made to it this week.

Are you still thinking of doing that, or would you rather just let this
dog go back to sleep, as it lies now?

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  4:17                 ` Paul Jackson
@ 2004-09-05  4:52                   ` Paul Jackson
  2004-09-06 18:23                     ` Andi Kleen
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Jackson @ 2004-09-05  4:52 UTC (permalink / raw)
  To: Paul Jackson; +Cc: torvalds, ak, akpm, linux-kernel

> starting with backing out the changes made to it this week.

Andi,

Given that Linus has gutted most of your patch to sched_setaffinity,
do you have a preference between where the code started the week,
and where it ended?

If I'm reading Linus' mind right (well ... there's a first time
for everything) then your preference, either way, would likely
carry the day.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-04 13:40     ` Andi Kleen
@ 2004-09-05 14:27       ` Anton Blanchard
  0 siblings, 0 replies; 35+ messages in thread
From: Anton Blanchard @ 2004-09-05 14:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Paul Jackson, akpm, linux-kernel

 
> Looks good from a quick review. But there is nothing to call it? 

Heres one :) Unfortunately we have to frob 32bit userspace bitmaps to
64bit ones on big endian platforms. This version does extra copies but
is simple, avoids set_fs tricks and gets things working for me on ppc64.

Anton

diff -puN mm/mempolicy.c~numa_api mm/mempolicy.c
--- gr_work/mm/mempolicy.c~numa_api	2004-09-04 21:14:44.595414365 -0500
+++ gr_work-anton/mm/mempolicy.c	2004-09-05 09:12:18.899685327 -0500
@@ -525,20 +525,82 @@ asmlinkage long sys_get_mempolicy(int __
 }
 
 #ifdef CONFIG_COMPAT
-/* The other functions are compatible */
+
 asmlinkage long compat_get_mempolicy(int __user *policy,
-				  unsigned __user *nmask, unsigned  maxnode,
-				  unsigned addr, unsigned  flags)
+				     compat_ulong_t __user *nmask,
+				     compat_ulong_t maxnode,
+				     compat_ulong_t addr, compat_ulong_t flags)
 {
 	long err;
 	unsigned long __user *nm = NULL;
+	unsigned long nr_bits, alloc_size;
+	DECLARE_BITMAP(bm, MAX_NUMNODES);
+
+	nr_bits = min(maxnode-1, MAX_NUMNODES);
+	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+
 	if (nmask)
-		nm = compat_alloc_user_space(ALIGN(maxnode-1, 64) / 8);
-	err = sys_get_mempolicy(policy, nm, maxnode, addr, flags);
-	if (!err && copy_in_user(nmask, nm, ALIGN(maxnode-1, 32)/8))
-		err = -EFAULT;
+		nm = compat_alloc_user_space(alloc_size);
+
+	err = sys_get_mempolicy(policy, nm, nr_bits+1, addr, flags);
+
+	if (!err && nmask) {
+		err = copy_from_user(bm, nm, alloc_size);
+		/* ensure entire bitmap is zeroed */
+		err |= clear_user(nmask, ALIGN(maxnode-1, 8) / 8);
+		err |= compat_put_bitmap(nmask, bm, nr_bits);
+	}
+
 	return err;
 }
+
+asmlinkage long compat_set_mempolicy(int mode, compat_ulong_t __user *nmask,
+				     compat_ulong_t maxnode)
+{
+	long err;
+	unsigned long __user *nm = NULL;
+	unsigned long nr_bits, alloc_size;
+	DECLARE_BITMAP(bm, MAX_NUMNODES);
+
+	nr_bits = min(maxnode-1, MAX_NUMNODES);
+	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+
+	if (nmask) {
+		err = compat_get_bitmap(bm, nmask, nr_bits);
+		nm = compat_alloc_user_space(alloc_size);
+		err |= copy_to_user(nm, bm, alloc_size);
+	}
+
+	if (err)
+		return -EFAULT;
+
+	return sys_set_mempolicy(mode, nm, nr_bits+1);
+}
+
+asmlinkage long compat_mbind(compat_ulong_t start, compat_ulong_t len,
+			     compat_ulong_t mode, compat_ulong_t __user *nmask,
+			     compat_ulong_t maxnode, compat_ulong_t flags)
+{
+	long err;
+	unsigned long __user *nm = NULL;
+	unsigned long nr_bits, alloc_size;
+	DECLARE_BITMAP(bm, MAX_NUMNODES);
+
+	nr_bits = min(maxnode-1, MAX_NUMNODES);
+	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+
+	if (nmask) {
+		err = compat_get_bitmap(bm, nmask, nr_bits);
+		nm = compat_alloc_user_space(alloc_size);
+		err |= copy_to_user(nm, bm, alloc_size);
+	}
+
+	if (err)
+		return -EFAULT;
+
+	return sys_mbind(start, len, mode, nm, nr_bits+1, flags);
+}
+
 #endif
 
 /* Return effective policy for a VMA */
@@ -900,7 +962,7 @@ mpol_shared_policy_lookup(struct shared_
 
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
-	PDprintk("deleting %lx-l%x\n", n->start, n->end);
+	PDprintk("deleting %lx-%lx\n", n->start, n->end);
 	rb_erase(&n->nd, &sp->root);
 	mpol_free(n->policy);
 	kmem_cache_free(sn_cache, n);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  0:18       ` Linus Torvalds
  2004-09-05  1:05         ` Paul Jackson
@ 2004-09-06 13:16         ` Andi Kleen
  1 sibling, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2004-09-06 13:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, akpm, linux-kernel

On Sat, Sep 04, 2004 at 05:18:30PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 4 Sep 2004, Paul Jackson wrote:
> > 
> > How is what Linus left more broken?
> 
> It's not. If anything, we should probably remove even more.
> 
> I don't see what the problem was with just requiring the right damn size.  
> User mode can trivially get the size by asking for it. But if it can't be

I don't think writing a syscall loop is a good idea for this. 
The main reason is that when you get an EINVAL for some other
reason you will still blow up your memory until you
hit some arbitary upper size.

Currently this EINVAL is the only instance in this syscall,
but this may change in some future version.

A sysctl may have worked, but it results in a lot of code
bloat in the application to handle it.

> bothered, then Andi's code certainly just made things worse.

I disagree on that. It was not perfect, but with minor fixes
could have been a proper solution. Your current code is even worse than
what was there before my patch.

Alternative would be the sysctl and strict check again. I don't
like it too much because it makes the application more complicated
(i prefer simple interfaces, because complex interfaces tend to 
have more bugs) 

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-05  4:52                   ` Paul Jackson
@ 2004-09-06 18:23                     ` Andi Kleen
  2004-09-06 18:48                       ` Linus Torvalds
  0 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-09-06 18:23 UTC (permalink / raw)
  To: Paul Jackson; +Cc: torvalds, akpm, linux-kernel

On Sat, Sep 04, 2004 at 09:52:05PM -0700, Paul Jackson wrote:
> > starting with backing out the changes made to it this week.
> 
> Andi,
> 
> Given that Linus has gutted most of your patch to sched_setaffinity,
> do you have a preference between where the code started the week,
> and where it ended?
> 
> If I'm reading Linus' mind right (well ... there's a first time
> for everything) then your preference, either way, would likely
> carry the day.

The only change I would like to have is to check the excess bytes
to make sure they don't contain some random value. They should
be either all 0 or all 0xff. 

-Andi

Here's a patch for bk12: 

Linus, does this look better?

--------------------------------------------------------

For excess cpumask bits passed from user space ensure
they are all zero or all one.  This minimizes binary incompatibilities
when the kernel is recompiled with a bigger cpumask_t type.

diff -u linux-2.6.8/kernel/sched.c-o linux-2.6.8/kernel/sched.c
--- linux-2.6.8/kernel/sched.c-o	2004-09-06 20:06:58.000000000 +0200
+++ linux-2.6.8/kernel/sched.c	2004-09-06 20:16:33.940579241 +0200
@@ -3368,6 +3368,19 @@
 	if (len < sizeof(cpumask_t)) {
 		memset(new_mask, 0, sizeof(cpumask_t));
 	} else if (len > sizeof(cpumask_t)) {
+		unsigned i;
+		unsigned char val, initval;
+		if (len > PAGE_SIZE)
+			return -EINVAL;
+		/* excess bytes must be all 0 or all 0xff */
+		for (i = sizeof(cpumask_t); i < len; i++) { 
+			if (get_user(val, (char *)new_mask + i))
+				return -EFAULT; 
+			if (i == sizeof(cpumask_t))
+				initval = val;
+			if (!(val == 0 || val == 0xff) || val != initval)
+				return -EINVAL; 
+		} 
 		len = sizeof(cpumask_t);
 	}
 	return copy_from_user(new_mask, user_mask_ptr, len) ? -EFAULT : 0;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-06 18:23                     ` Andi Kleen
@ 2004-09-06 18:48                       ` Linus Torvalds
  2004-09-06 21:11                         ` Paul Jackson
  2004-09-07  8:07                         ` [PATCH] Fix argument checking in sched_setaffinity Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Linus Torvalds @ 2004-09-06 18:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Paul Jackson, akpm, linux-kernel



On Mon, 6 Sep 2004, Andi Kleen wrote:
> 
> The only change I would like to have is to check the excess bytes
> to make sure they don't contain some random value. They should
> be either all 0 or all 0xff. 

I hate the "byte at a time" interface.

That said, I think the "long at a time" interface we have now for bitmaps 
ends up being a compatibility problem, where the compat layer has to worry 
about big-endian 32-bit "long" lookign different from big-endian 64-bit 
"long".

So there are other issues here.

		Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-06 18:48                       ` Linus Torvalds
@ 2004-09-06 21:11                         ` Paul Jackson
  2004-09-07 14:40                           ` Linus Torvalds
  2004-09-07  8:07                         ` [PATCH] Fix argument checking in sched_setaffinity Andi Kleen
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Jackson @ 2004-09-06 21:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ak, akpm, linux-kernel

Linus wrote:
> I hate the "byte at a time" interface.
> 
> That said, I think the "long at a time" interface we have now for bitmaps 
> ends up being a compatibility problem, where the compat layer has to worry 
> about big-endian 32-bit "long" lookign different from big-endian 64-bit 
> "long".

My first preference would be to get all the binary bitmap interfaces
(affinity, mbind and mempolicy) "right":

    I think that means an array of 'u32'.  This parallels what I did for
    the ascii format, where there was less need to remain compatible
    (except that ascii is naturally big-endian, while the u32 array has
    the low order word first):

      $ cat /sys/devices/system/node/node0/cpumap
      00000000,00000000,00000000,000000ff

    No doubt Andi will veto this for mbind/mempolicy, because it breaks
    libnuma's he has in the field - a reasonable concern.  We'd probably
    have to burn another couple of system calls, introducing the new API
    while keeping the old one around, as is, for a year or three.

    And this (array of u32) is different from the kernel bitmap, due to
    the reversed u32 halves of each u64 on big endian 64 arches.  If I
    were God, the kernel bitmap would also be an array of u32's, not ulongs.
    Still ... might as well start somewhere, and get the kernel/user API
    "right", even if the kernel internals have an irreparable twist.

    I agree with Andi that there should be an explicit way to get the
    correct size - the loops cause too many user level code bugs, and
    trying to accomodate user code that doesn't know the exact size is
    causing our kernel code too much grief.

    Possible ways to publish cpumask/nodemask sizes include:

     1) # an ascii field in some proc file:

	    $ grep sizeof /proc/sys/kernel

     2) sysctl

     3) overload sched_getaffinity (for sizeof cpumask) and get_mempolicy
	(for sizeof nodemask) to return the sizes if passed zero lengths
	or NULL mask pointers or some other currently useless input.

My second preference would be what we had a week ago.  Minor tweaks
(especially ones that relax the preconditions) to busted API's do more
harm than good.  Leave 'em be, or get 'em "right".  Quit putting
lipstick on a pig.

I was surprised that Andi came up with yet another tweak to this API
(his suggested patch to allow either 0x00 or 0xff fill).  Surely Andi
doesn't need this for _his_ code, since he's competent to code to the
current API.  So I guess he's trying to make life easier for others.
Eh ... doesn't seem worth it.

Leave it be, I say.  Leave it be.  Or get it right.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-06 18:48                       ` Linus Torvalds
  2004-09-06 21:11                         ` Paul Jackson
@ 2004-09-07  8:07                         ` Andi Kleen
  1 sibling, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2004-09-07  8:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, akpm, linux-kernel

On Mon, Sep 06, 2004 at 11:48:46AM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 6 Sep 2004, Andi Kleen wrote:
> > 
> > The only change I would like to have is to check the excess bytes
> > to make sure they don't contain some random value. They should
> > be either all 0 or all 0xff. 
> 
> I hate the "byte at a time" interface.

I looked at doing it, but it would be far too complicated
for such a single operation with the two necessary alignment and
fix up left over bytes at the end loops and other fixup code. 
And this should not really be performance critical in any ways. 
long handling would be easy if the interface had been designed in longs, 
but it wasn't.

> That said, I think the "long at a time" interface we have now for bitmaps 
> ends up being a compatibility problem, where the compat layer has to worry 
> about big-endian 32-bit "long" lookign different from big-endian 64-bit 
> "long".
> 
> So there are other issues here.

In this special case not - big endian and little endian 0 and -1 are both
identical :-)

-Andi

Here's the byte at a time code again in case you change your mind.

--------------------------------------------------------------

Check that excess bytes passed by the user process to 
sched_setaffinity contain all 0 (no cpus) or all ones (all cpus)

diff -u linux-2.6.8/kernel/sched.c-o linux-2.6.8/kernel/sched.c
--- linux-2.6.8/kernel/sched.c-o	2004-09-06 20:06:58.000000000 +0200
+++ linux-2.6.8/kernel/sched.c	2004-09-06 20:16:33.940579241 +0200
@@ -3368,6 +3368,19 @@
 	if (len < sizeof(cpumask_t)) {
 		memset(new_mask, 0, sizeof(cpumask_t));
 	} else if (len > sizeof(cpumask_t)) {
+		unsigned i;
+		unsigned char val, initval;
+		if (len > PAGE_SIZE)
+			return -EINVAL;
+		/* excess bytes must be all 0 or all 0xff */
+		for (i = sizeof(cpumask_t); i < len; i++) { 
+			if (get_user(val, (char *)new_mask + i))
+				return -EFAULT; 
+			if (i == sizeof(cpumask_t))
+				initval = val;
+			if (!(val == 0 || val == 0xff) || val != initval)
+				return -EINVAL; 
+		} 
 		len = sizeof(cpumask_t);
 	}
 	return copy_from_user(new_mask, user_mask_ptr, len) ? -EFAULT : 0;



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-06 21:11                         ` Paul Jackson
@ 2004-09-07 14:40                           ` Linus Torvalds
  2004-09-07 14:48                             ` Geert Uytterhoeven
                                               ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Linus Torvalds @ 2004-09-07 14:40 UTC (permalink / raw)
  To: Paul Jackson; +Cc: ak, Andrew Morton, Linux Arch list


[ Linux-kernel replaced with linux-arch, to see if there's any commentary 
  from arch people involved.. ]

On Mon, 6 Sep 2004, Paul Jackson wrote:

> Linus wrote:
> > I hate the "byte at a time" interface.
> > 
> > That said, I think the "long at a time" interface we have now for bitmaps 
> > ends up being a compatibility problem, where the compat layer has to worry 
> > about big-endian 32-bit "long" lookign different from big-endian 64-bit 
> > "long".
> 
> My first preference would be to get all the binary bitmap interfaces
> (affinity, mbind and mempolicy) "right":
> 
>     I think that means an array of 'u32'.  This parallels what I did for
>     the ascii format, where there was less need to remain compatible
>     (except that ascii is naturally big-endian, while the u32 array has
>     the low order word first):
> 
>       $ cat /sys/devices/system/node/node0/cpumap
>       00000000,00000000,00000000,000000ff

I agree. 

>     No doubt Andi will veto this for mbind/mempolicy, because it breaks
>     libnuma's he has in the field - a reasonable concern. 

Now, it will _only_ break systems that are _both_ 64-bit _and_ big-endian. 
Little-endian or 32-bit boxes will never care.

There aren't that many of those machines. I've got one right here (ppc64), 
but that particular one is guaranteed not to break if only because it's 
running a 32-bit user space.

So it's ppc64, sparc64, s390x and sh64. I suspect the breakage is 
basically zero, since not only aren't there _that_ many machines out 
there, the percentage of them that use setaffinity or mbind is likely not 
that high either.

		Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 14:40                           ` Linus Torvalds
@ 2004-09-07 14:48                             ` Geert Uytterhoeven
  2004-09-07 14:49                             ` Andi Kleen
                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Geert Uytterhoeven @ 2004-09-07 14:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list

On Tue, 7 Sep 2004, Linus Torvalds wrote:
> Now, it will _only_ break systems that are _both_ 64-bit _and_ big-endian.
> Little-endian or 32-bit boxes will never care.
>
> There aren't that many of those machines. I've got one right here (ppc64),
> but that particular one is guaranteed not to break if only because it's
> running a 32-bit user space.
>
> So it's ppc64, sparc64, s390x and sh64. I suspect the breakage is
> basically zero, since not only aren't there _that_ many machines out
> there, the percentage of them that use setaffinity or mbind is likely not
> that high either.

And mips64, and parisc...

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 14:40                           ` Linus Torvalds
  2004-09-07 14:48                             ` Geert Uytterhoeven
@ 2004-09-07 14:49                             ` Andi Kleen
  2004-09-07 21:44                               ` Ralf Baechle
  2004-09-08  0:26                               ` Anton Blanchard
  2004-09-07 14:50                             ` Matthew Wilcox
  2004-09-08  0:24                             ` Anton Blanchard
  3 siblings, 2 replies; 35+ messages in thread
From: Andi Kleen @ 2004-09-07 14:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list

On Tue, Sep 07, 2004 at 07:40:53AM -0700, Linus Torvalds wrote:
> >     No doubt Andi will veto this for mbind/mempolicy, because it breaks
> >     libnuma's he has in the field - a reasonable concern. 
> 
> Now, it will _only_ break systems that are _both_ 64-bit _and_ big-endian. 
> Little-endian or 32-bit boxes will never care.
> 
> There aren't that many of those machines. I've got one right here (ppc64), 
> but that particular one is guaranteed not to break if only because it's 
> running a 32-bit user space.

ppc64 libnuma is not deployed - the current numactl releases
still don't have the system call numbers and nobody told me 
about hacking them in.

So for me breaking ppc64 would be no problem.

> 
> So it's ppc64, sparc64, s390x and sh64. I suspect the breakage is 
> basically zero, since not only aren't there _that_ many machines out 
> there, the percentage of them that use setaffinity or mbind is likely not 
> that high either.

For mbind() breaking existing ppc64 users is unlikely I agree.

For setaffinity I am not so sure. The system call is around
for a long time and has been used in standard utilities
in distributions also for quite some time.

Also it is commonly used in real time applications. 

So in short I think changing mbind to u32 would be fine,
but I wouldn't do it for sched_setaffinity() 

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 14:40                           ` Linus Torvalds
  2004-09-07 14:48                             ` Geert Uytterhoeven
  2004-09-07 14:49                             ` Andi Kleen
@ 2004-09-07 14:50                             ` Matthew Wilcox
  2004-09-08  0:24                             ` Anton Blanchard
  3 siblings, 0 replies; 35+ messages in thread
From: Matthew Wilcox @ 2004-09-07 14:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list

On Tue, Sep 07, 2004 at 07:40:53AM -0700, Linus Torvalds wrote:
> Now, it will _only_ break systems that are _both_ 64-bit _and_ big-endian. 
> Little-endian or 32-bit boxes will never care.
> 
> So it's ppc64, sparc64, s390x and sh64. I suspect the breakage is 

... and parisc64.  Not that we have 64 bit userland yet either.

> basically zero, since not only aren't there _that_ many machines out 
> there, the percentage of them that use setaffinity or mbind is likely not 
> that high either.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 14:49                             ` Andi Kleen
@ 2004-09-07 21:44                               ` Ralf Baechle
  2004-09-07 22:55                                 ` Paul Jackson
  2004-09-08  0:26                               ` Anton Blanchard
  1 sibling, 1 reply; 35+ messages in thread
From: Ralf Baechle @ 2004-09-07 21:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Paul Jackson, ak, Andrew Morton, Linux Arch list

On Tue, Sep 07, 2004 at 04:49:36PM +0200, Andi Kleen wrote:

> ppc64 libnuma is not deployed - the current numactl releases
> still don't have the system call numbers and nobody told me 
> about hacking them in.
> 
> So for me breaking ppc64 would be no problem.

Same for mips64; to date the SGI IP27 (Origin 200 / 2000) is the only
supported NUMA system and it's not very widespread.  Embedded NUMA is
getting closer but as long as that's still NDA stuff I have no problem
with breaking ABIs.

> > So it's ppc64, sparc64, s390x and sh64. I suspect the breakage is 
> > basically zero, since not only aren't there _that_ many machines out 
> > there, the percentage of them that use setaffinity or mbind is likely not 
> > that high either.
> 
> For mbind() breaking existing ppc64 users is unlikely I agree.
> 
> For setaffinity I am not so sure. The system call is around
> for a long time and has been used in standard utilities
> in distributions also for quite some time.
> 
> Also it is commonly used in real time applications. 
> 
> So in short I think changing mbind to u32 would be fine,
> but I wouldn't do it for sched_setaffinity() 

Same situation here for mips64.  sched_setaffinity() is established and
being used in applications, including commercial products so changing is
not an option.  For mbind(2) the situation is the opposite; I just
noticed the syscall handler was actually pointing to sys_ni_syscall and
nobody did complain so far so that's clearly an ok for changing the
interface :-)

  Ralf

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 21:44                               ` Ralf Baechle
@ 2004-09-07 22:55                                 ` Paul Jackson
  2004-09-08  6:58                                   ` Andi Kleen
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Jackson @ 2004-09-07 22:55 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: ak, torvalds, ak, akpm, linux-arch

Ralf wrote:
> Same situation here for mips64.  sched_setaffinity() is established and
> being used in applications, including commercial products so changing is
> not an option. 

Well ... not an option unless Linus makes it one.  It would be a win
in the long term, I think.  We could introduce a new system call, and
mark this one for eventual removal.

Meanwhile, if sched_setaffinity() is being used in products we can't
really test, then that seems to me to be one more good reason to back
out the API tweaking that Andi and Linus have been doing to it this
last week, and just leave it be, as it was a week ago.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 14:40                           ` Linus Torvalds
                                               ` (2 preceding siblings ...)
  2004-09-07 14:50                             ` Matthew Wilcox
@ 2004-09-08  0:24                             ` Anton Blanchard
  2004-09-08  0:33                               ` [PATCH] [ppc64] compat_get_bitmap/compat_put_bitmap Anton Blanchard
  3 siblings, 1 reply; 35+ messages in thread
From: Anton Blanchard @ 2004-09-08  0:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list

 
> Now, it will _only_ break systems that are _both_ 64-bit _and_ big-endian. 
> Little-endian or 32-bit boxes will never care.
> 
> There aren't that many of those machines. I've got one right here (ppc64), 
> but that particular one is guaranteed not to break if only because it's 
> running a 32-bit user space.
> 
> So it's ppc64, sparc64, s390x and sh64. I suspect the breakage is 
> basically zero, since not only aren't there _that_ many machines out 
> there, the percentage of them that use setaffinity or mbind is likely not 
> that high either.

Oh we've felt it all right, both the cpu affinity calls and the numa api 
calls that pass node bitmaps around. I'll resend what patches I
currently have for this.

Anton

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 14:49                             ` Andi Kleen
  2004-09-07 21:44                               ` Ralf Baechle
@ 2004-09-08  0:26                               ` Anton Blanchard
  1 sibling, 0 replies; 35+ messages in thread
From: Anton Blanchard @ 2004-09-08  0:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Paul Jackson, ak, Andrew Morton, Linux Arch list


Hi,

> For mbind() breaking existing ppc64 users is unlikely I agree.

Yes.

> For setaffinity I am not so sure. The system call is around
> for a long time and has been used in standard utilities
> in distributions also for quite some time.

We have deployed applications using setaffinity, we need a very good
reason to change this.

> So in short I think changing mbind to u32 would be fine,
> but I wouldn't do it for sched_setaffinity() 

Agreed.

Anton

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] [ppc64] compat_get_bitmap/compat_put_bitmap
  2004-09-08  0:24                             ` Anton Blanchard
@ 2004-09-08  0:33                               ` Anton Blanchard
  2004-09-08  0:40                                 ` [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit Anton Blanchard
  0 siblings, 1 reply; 35+ messages in thread
From: Anton Blanchard @ 2004-09-08  0:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list


Signed-off-by: Anton Blanchard <anton@samba.org>

diff -puN kernel/compat.c~compat_bitmap kernel/compat.c
--- gr_work/kernel/compat.c~compat_bitmap	2004-09-04 00:56:24.297280051 -0500
+++ gr_work-anton/kernel/compat.c	2004-09-04 00:56:24.313277511 -0500
@@ -590,3 +590,83 @@ long compat_clock_nanosleep(clockid_t wh
 
 /* timer_create is architecture specific because it needs sigevent conversion */
 
+long compat_get_bitmap(unsigned long *mask, compat_ulong_t __user *umask,
+		       unsigned long bitmap_size)
+{
+	int i, j;
+	unsigned long m;
+	compat_ulong_t um;
+	unsigned long nr_compat_longs;
+
+	/* align bitmap up to nearest compat_long_t boundary */
+	bitmap_size = ALIGN(bitmap_size, BITS_PER_COMPAT_LONG);
+
+	if (verify_area(VERIFY_READ, umask, bitmap_size / 8))
+		return -EFAULT;
+
+	nr_compat_longs = BITS_TO_COMPAT_LONGS(bitmap_size);
+
+	for (i = 0; i < BITS_TO_LONGS(bitmap_size); i++) {
+		m = 0;
+
+		for (j = 0; j < sizeof(m)/sizeof(um); j++) {
+			/*
+			 * We dont want to read past the end of the userspace
+			 * bitmap. We must however ensure the end of the
+			 * kernel bitmap is zeroed.
+			 */
+			if (nr_compat_longs-- > 0) {
+				if (__get_user(um, umask))
+					return -EFAULT;
+			} else {
+				um = 0;
+			}
+
+			umask++;
+			m |= (long)um << (j * BITS_PER_COMPAT_LONG);
+		}
+		*mask++ = m;
+	}
+
+	return 0;
+}
+
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+		       unsigned long bitmap_size)
+{
+	int i, j;
+	unsigned long m;
+	compat_ulong_t um;
+	unsigned long nr_compat_longs;
+
+	/* align bitmap up to nearest compat_long_t boundary */
+	bitmap_size = ALIGN(bitmap_size, BITS_PER_COMPAT_LONG);
+
+	if (verify_area(VERIFY_WRITE, umask, bitmap_size / 8))
+		return -EFAULT;
+
+	nr_compat_longs = BITS_TO_COMPAT_LONGS(bitmap_size);
+
+	for (i = 0; i < BITS_TO_LONGS(bitmap_size); i++) {
+		m = *mask++;
+
+		for (j = 0; j < sizeof(m)/sizeof(um); j++) {
+			um = m;
+
+			/*
+			 * We dont want to write past the end of the userspace
+			 * bitmap.
+			 */
+			if (nr_compat_longs-- > 0) {
+				if (__put_user(um, umask))
+					return -EFAULT;
+			}
+
+			umask++;
+			m >>= 4*sizeof(um);
+			m >>= 4*sizeof(um);
+		}
+	}
+
+	return 0;
+}
diff -puN include/linux/compat.h~compat_bitmap include/linux/compat.h
--- gr_work/include/linux/compat.h~compat_bitmap	2004-09-04 00:56:24.302279257 -0500
+++ gr_work-anton/include/linux/compat.h	2004-09-04 00:56:24.314277352 -0500
@@ -132,5 +132,15 @@ asmlinkage long compat_sys_select(int n,
 		compat_ulong_t __user *outp, compat_ulong_t __user *exp,
 		struct compat_timeval __user *tvp);
 
+#define BITS_PER_COMPAT_LONG    (8*sizeof(compat_long_t))
+
+#define BITS_TO_COMPAT_LONGS(bits) \
+	(((bits)+BITS_PER_COMPAT_LONG-1)/BITS_PER_COMPAT_LONG)
+
+long compat_get_bitmap(unsigned long *mask, compat_ulong_t __user *umask,
+		       unsigned long bitmap_size);
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+		       unsigned long bitmap_size);
+
 #endif /* CONFIG_COMPAT */
 #endif /* _LINUX_COMPAT_H */
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit
  2004-09-08  0:33                               ` [PATCH] [ppc64] compat_get_bitmap/compat_put_bitmap Anton Blanchard
@ 2004-09-08  0:40                                 ` Anton Blanchard
  2004-09-08  0:43                                   ` [PATCH] [ppc64] Fix compat NUMA API " Anton Blanchard
  2004-09-08  5:22                                   ` [PATCH] [ppc64] Fix compat cpu affinity " Andrew Morton
  0 siblings, 2 replies; 35+ messages in thread
From: Anton Blanchard @ 2004-09-08  0:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list


Add compat sched affinity code. We can argue about how 
USE_COMPAT_ULONG_CPUMASK works now that the non compat interface has changed.

The old non compat behaviour was to require a bitmap long enough in both
setaffinity and getaffinity, now its only required in getaffinity. I
could do the same for the 32bit interfaces.

Signed-off-by: Anton Blanchard <anton@samba.org>

diff -puN kernel/compat.c~compat_sys_sched_affinity kernel/compat.c
--- gr_work/kernel/compat.c~compat_sys_sched_affinity	2004-09-06 04:19:06.327399153 -0500
+++ gr_work-anton/kernel/compat.c	2004-09-07 19:36:09.127584076 -0500
@@ -412,16 +412,43 @@ compat_sys_wait4(compat_pid_t pid, compa
 	}
 }
 
+/*
+ * for maximum compatability, we allow programs to use a single (compat)
+ * unsigned long bitmask if all cpus will fit. If not, you have to have
+ * at least the kernel size available.
+ */
+#define USE_COMPAT_ULONG_CPUMASK (NR_CPUS <= BITS_PER_COMPAT_LONG)
+
 asmlinkage long compat_sys_sched_setaffinity(compat_pid_t pid, 
 					     unsigned int len,
 					     compat_ulong_t __user *user_mask_ptr)
 {
-	unsigned long kern_mask;
+	cpumask_t kern_mask;
 	mm_segment_t old_fs;
 	int ret;
 
-	if (get_user(kern_mask, user_mask_ptr))
-		return -EFAULT;
+	if (USE_COMPAT_ULONG_CPUMASK) {
+		compat_ulong_t user_mask;
+
+		if (len < sizeof(user_mask))
+			return -EINVAL;
+
+		if (get_user(user_mask, user_mask_ptr))
+			return -EFAULT;
+
+		cpus_addr(kern_mask)[0] = user_mask;
+	} else {
+		unsigned long *k;
+
+		if (len < sizeof(kern_mask))
+			return -EINVAL;
+
+		k = cpus_addr(kern_mask);
+		ret = compat_get_bitmap(k, user_mask_ptr,
+					sizeof(kern_mask) * BITS_PER_LONG);
+		if (ret)
+			return ret;
+	}
 
 	old_fs = get_fs();
 	set_fs(KERNEL_DS);
@@ -436,10 +463,14 @@ asmlinkage long compat_sys_sched_setaffi
 asmlinkage long compat_sys_sched_getaffinity(compat_pid_t pid, unsigned int len,
 					     compat_ulong_t __user *user_mask_ptr)
 {
-	unsigned long kern_mask;
+	cpumask_t kern_mask;
 	mm_segment_t old_fs;
 	int ret;
 
+	if (len < (USE_COMPAT_ULONG_CPUMASK ? sizeof(compat_ulong_t)
+				: sizeof(kern_mask)))
+		return -EINVAL;
+
 	old_fs = get_fs();
 	set_fs(KERNEL_DS);
 	ret = sys_sched_getaffinity(pid,
@@ -447,10 +478,23 @@ asmlinkage long compat_sys_sched_getaffi
 				    (unsigned long __user *) &kern_mask);
 	set_fs(old_fs);
 
-	if (ret > 0) {
-		ret = sizeof(compat_ulong_t);
-		if (put_user(kern_mask, user_mask_ptr))
+	if (ret < 0)
+		return ret;
+
+	if (USE_COMPAT_ULONG_CPUMASK) {
+		if (put_user(&cpus_addr(kern_mask)[0], user_mask_ptr))
 			return -EFAULT;
+		ret = sizeof(compat_ulong_t);
+	} else {
+		unsigned long *k;
+
+		k = cpus_addr(kern_mask);
+		ret = compat_put_bitmap(user_mask_ptr, k,
+					sizeof(kern_mask) * BITS_PER_LONG);
+		if (ret)
+			return ret;
+
+		ret = sizeof(kern_mask);
 	}
 
 	return ret;
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] [ppc64] Fix compat NUMA API on big endian 64bit
  2004-09-08  0:40                                 ` [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit Anton Blanchard
@ 2004-09-08  0:43                                   ` Anton Blanchard
  2004-09-08  5:22                                   ` [PATCH] [ppc64] Fix compat cpu affinity " Andrew Morton
  1 sibling, 0 replies; 35+ messages in thread
From: Anton Blanchard @ 2004-09-08  0:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ak, Andrew Morton, Linux Arch list


Switch the NUMA API to use compat_get_bitmap/compat_put_bitmap. In order
to use compat_alloc_userspace instead of set_fs tricks, we have to do a
few copies.

This is what we are currently using on ppc64 but are willing to
entertain the idea of going to a 32bit bitmap, especially considering
how much hoops we have to go through to get it right in this patch.

Signed-off-by: Anton Blanchard <anton@samba.org>

diff -puN mm/mempolicy.c~numa_api mm/mempolicy.c
--- gr_work/mm/mempolicy.c~numa_api	2004-09-04 21:14:44.595414365 -0500
+++ gr_work-anton/mm/mempolicy.c	2004-09-05 09:19:10.475691107 -0500
@@ -525,20 +525,82 @@ asmlinkage long sys_get_mempolicy(int __
 }
 
 #ifdef CONFIG_COMPAT
-/* The other functions are compatible */
+
 asmlinkage long compat_get_mempolicy(int __user *policy,
-				  unsigned __user *nmask, unsigned  maxnode,
-				  unsigned addr, unsigned  flags)
+				     compat_ulong_t __user *nmask,
+				     compat_ulong_t maxnode,
+				     compat_ulong_t addr, compat_ulong_t flags)
 {
 	long err;
 	unsigned long __user *nm = NULL;
+	unsigned long nr_bits, alloc_size;
+	DECLARE_BITMAP(bm, MAX_NUMNODES);
+
+	nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
+	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+
 	if (nmask)
-		nm = compat_alloc_user_space(ALIGN(maxnode-1, 64) / 8);
-	err = sys_get_mempolicy(policy, nm, maxnode, addr, flags);
-	if (!err && copy_in_user(nmask, nm, ALIGN(maxnode-1, 32)/8))
-		err = -EFAULT;
+		nm = compat_alloc_user_space(alloc_size);
+
+	err = sys_get_mempolicy(policy, nm, nr_bits+1, addr, flags);
+
+	if (!err && nmask) {
+		err = copy_from_user(bm, nm, alloc_size);
+		/* ensure entire bitmap is zeroed */
+		err |= clear_user(nmask, ALIGN(maxnode-1, 8) / 8);
+		err |= compat_put_bitmap(nmask, bm, nr_bits);
+	}
+
 	return err;
 }
+
+asmlinkage long compat_set_mempolicy(int mode, compat_ulong_t __user *nmask,
+				     compat_ulong_t maxnode)
+{
+	long err;
+	unsigned long __user *nm = NULL;
+	unsigned long nr_bits, alloc_size;
+	DECLARE_BITMAP(bm, MAX_NUMNODES);
+
+	nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
+	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+
+	if (nmask) {
+		err = compat_get_bitmap(bm, nmask, nr_bits);
+		nm = compat_alloc_user_space(alloc_size);
+		err |= copy_to_user(nm, bm, alloc_size);
+	}
+
+	if (err)
+		return -EFAULT;
+
+	return sys_set_mempolicy(mode, nm, nr_bits+1);
+}
+
+asmlinkage long compat_mbind(compat_ulong_t start, compat_ulong_t len,
+			     compat_ulong_t mode, compat_ulong_t __user *nmask,
+			     compat_ulong_t maxnode, compat_ulong_t flags)
+{
+	long err;
+	unsigned long __user *nm = NULL;
+	unsigned long nr_bits, alloc_size;
+	DECLARE_BITMAP(bm, MAX_NUMNODES);
+
+	nr_bits = min_t(unsigned long, maxnode-1, MAX_NUMNODES);
+	alloc_size = ALIGN(nr_bits, BITS_PER_LONG) / 8;
+
+	if (nmask) {
+		err = compat_get_bitmap(bm, nmask, nr_bits);
+		nm = compat_alloc_user_space(alloc_size);
+		err |= copy_to_user(nm, bm, alloc_size);
+	}
+
+	if (err)
+		return -EFAULT;
+
+	return sys_mbind(start, len, mode, nm, nr_bits+1, flags);
+}
+
 #endif
 
 /* Return effective policy for a VMA */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit
  2004-09-08  0:40                                 ` [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit Anton Blanchard
  2004-09-08  0:43                                   ` [PATCH] [ppc64] Fix compat NUMA API " Anton Blanchard
@ 2004-09-08  5:22                                   ` Andrew Morton
  2004-09-08  5:34                                     ` Anton Blanchard
  1 sibling, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2004-09-08  5:22 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: torvalds, pj, ak, linux-arch

Anton Blanchard <anton@samba.org> wrote:
>
> Add compat sched affinity code. We can argue about how 
>  USE_COMPAT_ULONG_CPUMASK works now that the non compat interface has changed.
> 
>  The old non compat behaviour was to require a bitmap long enough in both
>  setaffinity and getaffinity, now its only required in getaffinity. I
>  could do the same for the 32bit interfaces.
> 
>  Signed-off-by: Anton Blanchard <anton@samba.org>
> 
>  diff -puN kernel/compat.c~compat_sys_sched_affinity kernel/compat.c
>  --- gr_work/kernel/compat.c~compat_sys_sched_affinity	2004-09-06 04:19:06.327399153 -0500
>  +++ gr_work-anton/kernel/compat.c	2004-09-07 19:36:09.127584076 -0500
>  @@ -412,16 +412,43 @@ compat_sys_wait4(compat_pid_t pid, compa
>   	}
>   }
>   
>  +/*
>  + * for maximum compatability, we allow programs to use a single (compat)
>  + * unsigned long bitmask if all cpus will fit. If not, you have to have
>  + * at least the kernel size available.
>  + */
>  +#define USE_COMPAT_ULONG_CPUMASK (NR_CPUS <= BITS_PER_COMPAT_LONG)

umm, wanna send along a patch which defines BITS_PER_COMPAT_LONG?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit
  2004-09-08  5:22                                   ` [PATCH] [ppc64] Fix compat cpu affinity " Andrew Morton
@ 2004-09-08  5:34                                     ` Anton Blanchard
  2004-09-08  5:43                                       ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Anton Blanchard @ 2004-09-08  5:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, pj, ak, linux-arch


> umm, wanna send along a patch which defines BITS_PER_COMPAT_LONG?

Yep, did you get the first in the series? Its hiding at the bottom.

Anton

--

Date: Wed, 8 Sep 2004 10:33:59 +1000
From: Anton Blanchard <anton@samba.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Paul Jackson <pj@sgi.com>, ak@muc.de,
	Andrew Morton <akpm@osdl.org>,
	Linux Arch list <linux-arch@vger.kernel.org>
Subject: [PATCH] [ppc64] compat_get_bitmap/compat_put_bitmap

Signed-off-by: Anton Blanchard <anton@samba.org>

diff -puN kernel/compat.c~compat_bitmap kernel/compat.c
--- gr_work/kernel/compat.c~compat_bitmap	2004-09-04 00:56:24.297280051 -0500
+++ gr_work-anton/kernel/compat.c	2004-09-04 00:56:24.313277511 -0500
@@ -590,3 +590,83 @@ long compat_clock_nanosleep(clockid_t wh
 
 /* timer_create is architecture specific because it needs sigevent conversion */
 
+long compat_get_bitmap(unsigned long *mask, compat_ulong_t __user *umask,
+		       unsigned long bitmap_size)
+{
+	int i, j;
+	unsigned long m;
+	compat_ulong_t um;
+	unsigned long nr_compat_longs;
+
+	/* align bitmap up to nearest compat_long_t boundary */
+	bitmap_size = ALIGN(bitmap_size, BITS_PER_COMPAT_LONG);
+
+	if (verify_area(VERIFY_READ, umask, bitmap_size / 8))
+		return -EFAULT;
+
+	nr_compat_longs = BITS_TO_COMPAT_LONGS(bitmap_size);
+
+	for (i = 0; i < BITS_TO_LONGS(bitmap_size); i++) {
+		m = 0;
+
+		for (j = 0; j < sizeof(m)/sizeof(um); j++) {
+			/*
+			 * We dont want to read past the end of the userspace
+			 * bitmap. We must however ensure the end of the
+			 * kernel bitmap is zeroed.
+			 */
+			if (nr_compat_longs-- > 0) {
+				if (__get_user(um, umask))
+					return -EFAULT;
+			} else {
+				um = 0;
+			}
+
+			umask++;
+			m |= (long)um << (j * BITS_PER_COMPAT_LONG);
+		}
+		*mask++ = m;
+	}
+
+	return 0;
+}
+
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+		       unsigned long bitmap_size)
+{
+	int i, j;
+	unsigned long m;
+	compat_ulong_t um;
+	unsigned long nr_compat_longs;
+
+	/* align bitmap up to nearest compat_long_t boundary */
+	bitmap_size = ALIGN(bitmap_size, BITS_PER_COMPAT_LONG);
+
+	if (verify_area(VERIFY_WRITE, umask, bitmap_size / 8))
+		return -EFAULT;
+
+	nr_compat_longs = BITS_TO_COMPAT_LONGS(bitmap_size);
+
+	for (i = 0; i < BITS_TO_LONGS(bitmap_size); i++) {
+		m = *mask++;
+
+		for (j = 0; j < sizeof(m)/sizeof(um); j++) {
+			um = m;
+
+			/*
+			 * We dont want to write past the end of the userspace
+			 * bitmap.
+			 */
+			if (nr_compat_longs-- > 0) {
+				if (__put_user(um, umask))
+					return -EFAULT;
+			}
+
+			umask++;
+			m >>= 4*sizeof(um);
+			m >>= 4*sizeof(um);
+		}
+	}
+
+	return 0;
+}
diff -puN include/linux/compat.h~compat_bitmap include/linux/compat.h
--- gr_work/include/linux/compat.h~compat_bitmap	2004-09-04 00:56:24.302279257 -0500
+++ gr_work-anton/include/linux/compat.h	2004-09-04 00:56:24.314277352 -0500
@@ -132,5 +132,15 @@ asmlinkage long compat_sys_select(int n,
 		compat_ulong_t __user *outp, compat_ulong_t __user *exp,
 		struct compat_timeval __user *tvp);
 
+#define BITS_PER_COMPAT_LONG    (8*sizeof(compat_long_t))
+
+#define BITS_TO_COMPAT_LONGS(bits) \
+	(((bits)+BITS_PER_COMPAT_LONG-1)/BITS_PER_COMPAT_LONG)
+
+long compat_get_bitmap(unsigned long *mask, compat_ulong_t __user *umask,
+		       unsigned long bitmap_size);
+long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
+		       unsigned long bitmap_size);
+
 #endif /* CONFIG_COMPAT */
 #endif /* _LINUX_COMPAT_H */
_

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit
  2004-09-08  5:34                                     ` Anton Blanchard
@ 2004-09-08  5:43                                       ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2004-09-08  5:43 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: torvalds, pj, ak, linux-arch

Anton Blanchard <anton@samba.org> wrote:
>
>  > umm, wanna send along a patch which defines BITS_PER_COMPAT_LONG?
> 
>  Yep, did you get the first in the series?

<looks>

Oh, so I did.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-07 22:55                                 ` Paul Jackson
@ 2004-09-08  6:58                                   ` Andi Kleen
  2004-09-08  7:26                                     ` Paul Jackson
  0 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-09-08  6:58 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Ralf Baechle, ak, torvalds, ak, akpm, linux-arch

On Tue, Sep 07, 2004 at 03:55:30PM -0700, Paul Jackson wrote:
> Ralf wrote:
> > Same situation here for mips64.  sched_setaffinity() is established and
> > being used in applications, including commercial products so changing is
> > not an option. 
> 
> Well ... not an option unless Linus makes it one.  It would be a win
> in the long term, I think.  We could introduce a new system call, and
> mark this one for eventual removal.
> 
> Meanwhile, if sched_setaffinity() is being used in products we can't
> really test, then that seems to me to be one more good reason to back
> out the API tweaking that Andi and Linus have been doing to it this
> last week, and just leave it be, as it was a week ago.

The changes me and Linus did do not change anything for these
products. When they pass length == sizeof(cpumask_t) everything 
is ok. The only behaviour change is for cases that would
previously have returned -EINVAL.

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Fix argument checking in sched_setaffinity
  2004-09-08  6:58                                   ` Andi Kleen
@ 2004-09-08  7:26                                     ` Paul Jackson
  0 siblings, 0 replies; 35+ messages in thread
From: Paul Jackson @ 2004-09-08  7:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ralf, torvalds, ak, akpm, linux-arch

> The only behaviour change is for cases that would
> previously have returned -EINVAL.

That's a change.  It has non-zero risk of breaking (or exposing an
existing, implicit breakage in) some user code, especially given that
user code has been encouraged to write loops testing for -EINVAL in
order to probe the mask size.  I didn't see enough benefit from any
of the variants of the last week to compensate for this risk.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2004-09-08  7:27 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-31 14:30 [PATCH] Fix argument checking in sched_setaffinity Andi Kleen
2004-09-01  1:36 ` Paul Jackson
2004-09-01  1:59   ` Anton Blanchard
2004-09-02  9:33     ` Paul Jackson
2004-09-04 13:40     ` Andi Kleen
2004-09-05 14:27       ` Anton Blanchard
2004-09-04 13:37   ` Andi Kleen
     [not found]     ` <20040904171417.67649169.pj@sgi.com>
2004-09-05  0:18       ` Linus Torvalds
2004-09-05  1:05         ` Paul Jackson
2004-09-05  1:38           ` Linus Torvalds
2004-09-05  3:48             ` Paul Jackson
2004-09-05  3:57               ` Linus Torvalds
2004-09-05  4:17                 ` Paul Jackson
2004-09-05  4:52                   ` Paul Jackson
2004-09-06 18:23                     ` Andi Kleen
2004-09-06 18:48                       ` Linus Torvalds
2004-09-06 21:11                         ` Paul Jackson
2004-09-07 14:40                           ` Linus Torvalds
2004-09-07 14:48                             ` Geert Uytterhoeven
2004-09-07 14:49                             ` Andi Kleen
2004-09-07 21:44                               ` Ralf Baechle
2004-09-07 22:55                                 ` Paul Jackson
2004-09-08  6:58                                   ` Andi Kleen
2004-09-08  7:26                                     ` Paul Jackson
2004-09-08  0:26                               ` Anton Blanchard
2004-09-07 14:50                             ` Matthew Wilcox
2004-09-08  0:24                             ` Anton Blanchard
2004-09-08  0:33                               ` [PATCH] [ppc64] compat_get_bitmap/compat_put_bitmap Anton Blanchard
2004-09-08  0:40                                 ` [PATCH] [ppc64] Fix compat cpu affinity on big endian 64bit Anton Blanchard
2004-09-08  0:43                                   ` [PATCH] [ppc64] Fix compat NUMA API " Anton Blanchard
2004-09-08  5:22                                   ` [PATCH] [ppc64] Fix compat cpu affinity " Andrew Morton
2004-09-08  5:34                                     ` Anton Blanchard
2004-09-08  5:43                                       ` Andrew Morton
2004-09-07  8:07                         ` [PATCH] Fix argument checking in sched_setaffinity Andi Kleen
2004-09-06 13:16         ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.