* [PATCH] fix sys cpumap for > 352 NR_CPUS
@ 2004-06-02 23:11 Paul Jackson
2004-06-02 23:23 ` Andrew Morton
2004-06-03 0:22 ` Rusty Russell
0 siblings, 2 replies; 20+ messages in thread
From: Paul Jackson @ 2004-06-02 23:11 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Andi Kleen
Due to a bug in the 2.6 kernel, the cpumap files, such as:
/sys/devices/system/node/node1/cpumap
only display the high order 352 CPUs. If you compile a kernel with
NR_CPUS=512, such as SGI's Altix is usually compiled, then the low order
512-352 == 160 CPU positions are not shown.
This bug breaks Andi Kleens numactl and libnuma user code which depends
on this file presenting correctly. Andi has reviewed this fix, and it
builds, boots and fixes the breakage (ia64 with sn2_defconfig).
Please apply the following patch (a hack, but better than the hack that
was there before).
Signed-off-by: Paul Jackson <pj@sgi.com>
===================================================================
--- 2.6.7-rc2-mm1-bitmapv6.orig/drivers/base/node.c 2004-06-02 02:04:12.000000000 -0700
+++ 2.6.7-rc2-mm1-bitmapv6/drivers/base/node.c 2004-06-02 15:38:15.000000000 -0700
@@ -21,9 +21,21 @@ static ssize_t node_read_cpumap(struct s
cpumask_t mask = node_dev->cpumap;
int len;
- /* FIXME - someone should pass us a buffer size (count) or
- * use seq_file or something to avoid buffer overrun risk. */
- len = cpumask_scnprintf(buf, 99 /* XXX FIXME */, mask);
+ /*
+ * Hack alert:
+ * 1) This could overwrite a buffer w/o warning. Someone should
+ * pass us a buffer size (count) or use seq_file or something
+ * to avoid buffer overrun risks.
+ * 2) This can return a count larger than the read size requested
+ * by the user code - possibly confusing it.
+ * 3) Following hardcodes that mask scnprintf format requires 9
+ * chars of output for each 32 bits of mask or fraction.
+ * 4) Following prints stale node_dev->cpumap value, instead of
+ * evaluating afresh node_to_cpumask(node_dev->sysdev.id).
+ * 5) Why does struct node even has the field cpumap. Won't it
+ * just get stale, especially in the face of cpu hotplug?
+ */
+ len = cpumask_scnprintf(buf, ((NR_CPUS+31)/32)*9 /* XXX FIXME */, mask);
len += sprintf(buf + len, "\n");
return len;
}
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-02 23:11 [PATCH] fix sys cpumap for > 352 NR_CPUS Paul Jackson
@ 2004-06-02 23:23 ` Andrew Morton
2004-06-02 23:59 ` Paul Jackson
2004-06-03 0:22 ` Rusty Russell
1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-06-02 23:23 UTC (permalink / raw)
To: Paul Jackson; +Cc: linux-kernel, ak, Rusty Russell
Paul Jackson <pj@sgi.com> wrote:
>
> @@ -21,9 +21,21 @@ static ssize_t node_read_cpumap(struct s
> cpumask_t mask = node_dev->cpumap;
> int len;
>
> - /* FIXME - someone should pass us a buffer size (count) or
> - * use seq_file or something to avoid buffer overrun risk. */
> - len = cpumask_scnprintf(buf, 99 /* XXX FIXME */, mask);
> + /*
> + * Hack alert:
> + * 1) This could overwrite a buffer w/o warning. Someone should
> + * pass us a buffer size (count) or use seq_file or something
> + * to avoid buffer overrun risks.
> + * 2) This can return a count larger than the read size requested
> + * by the user code - possibly confusing it.
> + * 3) Following hardcodes that mask scnprintf format requires 9
> + * chars of output for each 32 bits of mask or fraction.
> + * 4) Following prints stale node_dev->cpumap value, instead of
> + * evaluating afresh node_to_cpumask(node_dev->sysdev.id).
> + * 5) Why does struct node even has the field cpumap. Won't it
> + * just get stale, especially in the face of cpu hotplug?
> + */
> + len = cpumask_scnprintf(buf, ((NR_CPUS+31)/32)*9 /* XXX FIXME */, mask);
Oh dear. The sysfs failure to pass in a `length' arg bites again. Can't
we just stick a PAGE_SIZE in here?
wrt the hotplug questions: dunno. Rusty is ->thattaway.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-02 23:23 ` Andrew Morton
@ 2004-06-02 23:59 ` Paul Jackson
2004-06-03 0:17 ` Andrew Morton
0 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2004-06-02 23:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, ak, rusty
> Can't we just stick a PAGE_SIZE in here?
We could - either way works about as well. Is there something special
about PAGE_SIZE here? Is that in fact what sysfs is making available?
I can send a PAGE_SIZE hack-a-shaq if you like, or you can just code it
yourself. If I do it, I will also fix the comment as appropriate.
Whatever you prefer ...
len = cpumask_scnprintf(buf, PAGE_SIZE /* XXX FIXME */, mask);
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-02 23:59 ` Paul Jackson
@ 2004-06-03 0:17 ` Andrew Morton
2004-06-03 16:24 ` Greg KH
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-06-03 0:17 UTC (permalink / raw)
To: Paul Jackson; +Cc: linux-kernel, ak, rusty, Greg KH
Paul Jackson <pj@sgi.com> wrote:
>
> > Can't we just stick a PAGE_SIZE in here?
>
> We could - either way works about as well. Is there something special
> about PAGE_SIZE here? Is that in fact what sysfs is making available?
Think so. Greg, can you confirm that a SYSDEV_ATTR's handler can safely
assume that it has a PAGE_SIZE buffer to write to?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-02 23:11 [PATCH] fix sys cpumap for > 352 NR_CPUS Paul Jackson
2004-06-02 23:23 ` Andrew Morton
@ 2004-06-03 0:22 ` Rusty Russell
2004-06-03 4:25 ` Paul Jackson
` (2 more replies)
1 sibling, 3 replies; 20+ messages in thread
From: Rusty Russell @ 2004-06-03 0:22 UTC (permalink / raw)
To: Paul Jackson
Cc: lkml - Kernel Mailing List, Andrew Morton, Andi Kleen, Greg KH
On Thu, 2004-06-03 at 09:11, Paul Jackson wrote:
> + /*
> + * Hack alert:
> + * 1) This could overwrite a buffer w/o warning. Someone should
> + * pass us a buffer size (count) or use seq_file or something
> + * to avoid buffer overrun risks.
Then just use -1UL as the arg to scnprintf, if you don't have a real
number. That way the overflow will at least have a chance of detection
in the sysfs code, which I think it should check in
file.c:fill_read_buffer(). Greg?
> + * 2) This can return a count larger than the read size requested
> + * by the user code - possibly confusing it.
That's sysfs' problem, not yours, and it handles it fine AFAICT.
> + * 3) Following hardcodes that mask scnprintf format requires 9
> + * chars of output for each 32 bits of mask or fraction.
Yes. Don't do that.
> + * 4) Following prints stale node_dev->cpumap value, instead of
> + * evaluating afresh node_to_cpumask(node_dev->sysdev.id).
> + * 5) Why does struct node even has the field cpumap. Won't it
> + * just get stale, especially in the face of cpu hotplug?
Yes, that field should be removed.
Above all, by placing your questions inside a patch, you got results,
but please don't do this again.
Thanks,
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 0:22 ` Rusty Russell
@ 2004-06-03 4:25 ` Paul Jackson
2004-06-03 6:26 ` Rusty Russell
2004-06-03 4:34 ` Paul Jackson
2004-06-03 16:27 ` Greg KH
2 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2004-06-03 4:25 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, akpm, ak, greg
Rusty wrote:
> Then just use -1UL as the arg to scnprintf, if you don't have a real
> number. That way the overflow will at least have a chance of detection
> in the sysfs code, which I think it should check in
> file.c:fill_read_buffer(). Greg?
That doesn't make sense.
My node_read_cpumap() routine is being passed a finite length buffer
into which it is supposed to put some characters. Unless by contract
or passed value it knows the length of that buffer, it cannot safely
know how far it can write.
Apparently, from Andrews comments and from the line:
buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
in file.c:fill_read_buffer(), the length is PAGE_SIZE, by contract.
Greg - perhaps a comment in include/linux/sysdev.h, near the declarations
for the show() and store() routines, specifying the buffer sizing,
would be appropriate?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 0:22 ` Rusty Russell
2004-06-03 4:25 ` Paul Jackson
@ 2004-06-03 4:34 ` Paul Jackson
2004-06-03 4:35 ` Rusty Russell
2004-06-03 16:27 ` Greg KH
2 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2004-06-03 4:34 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, akpm, ak, greg
Rusty wrote:
> Above all, by placing your questions inside a patch, you got results,
> but please don't do this again.
Ah yes ...
Apparently, Rusty, you are unfamiliar with the style of coding that
involves "put in an ugly hack now, commenting the known open issues,
and then clean up the mess some other day".
This is good ... good that you are unfamiliar with such.
Rather than try to convert you to my corrupt ways, which would be
a great loss to the Linux community, and probably not possible anyway,
I will:
1) Apologize - yes - I should have tried to ask these questions now
and get this code right (which it turns out is happening anyway).
2) Send another patch to Andrew, with PAGE_SIZE as he recommends,
and with the stale struct node cpumap field nuked, as you
recommend.
Thank-you. It's a pleasure.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 4:34 ` Paul Jackson
@ 2004-06-03 4:35 ` Rusty Russell
0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2004-06-03 4:35 UTC (permalink / raw)
To: Paul Jackson; +Cc: lkml - Kernel Mailing List, Andrew Morton, ak, Greg KH
On Thu, 2004-06-03 at 14:34, Paul Jackson wrote:
> Rusty wrote:
> > Above all, by placing your questions inside a patch, you got results,
> > but please don't do this again.
>
> Ah yes ...
>
> Apparently, Rusty, you are unfamiliar with the style of coding that
> involves "put in an ugly hack now, commenting the known open issues,
> and then clean up the mess some other day".
Known open issues I have no problems with. It's asking questions in the
comments that I disagree with: either you should ask on lkml, or grep.
To be honest, if you can't figure out the answers, you have to ask if
you're the right person to be modifying the code.
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 4:25 ` Paul Jackson
@ 2004-06-03 6:26 ` Rusty Russell
2004-06-03 8:27 ` Paul Jackson
2004-06-03 15:49 ` Paul Jackson
0 siblings, 2 replies; 20+ messages in thread
From: Rusty Russell @ 2004-06-03 6:26 UTC (permalink / raw)
To: Paul Jackson; +Cc: lkml - Kernel Mailing List, Andrew Morton, ak, Greg KH
On Thu, 2004-06-03 at 14:25, Paul Jackson wrote:
> Rusty wrote:
> > Then just use -1UL as the arg to scnprintf, if you don't have a real
> > number. That way the overflow will at least have a chance of detection
> > in the sysfs code, which I think it should check in
> > file.c:fill_read_buffer(). Greg?
>
> That doesn't make sense.
Then I apologize.
Please allow me to demonstrate with code, which should be clearer.
Name: Fix sysfs Node Cpumap for Large NR_CPUS
Status: Booted on 2.6.7-rc2-bk3
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As pointed out by Paul Jackson, sometimes 99 chars is not enough. We
currently get a page from sysfs: that code should check we don't
haven't overrun it, and for futureproofing, detect problem at
buildtime.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9869-linux-2.6.7-rc2-bk3/drivers/base/node.c .9869-linux-2.6.7-rc2-bk3.updated/drivers/base/node.c
--- .9869-linux-2.6.7-rc2-bk3/drivers/base/node.c 2004-05-31 09:57:07.000000000 +1000
+++ .9869-linux-2.6.7-rc2-bk3.updated/drivers/base/node.c 2004-06-03 16:18:44.000000000 +1000
@@ -21,9 +21,10 @@ static ssize_t node_read_cpumap(struct s
cpumask_t mask = node_dev->cpumap;
int len;
- /* FIXME - someone should pass us a buffer size (count) or
- * use seq_file or something to avoid buffer overrun risk. */
- len = cpumask_scnprintf(buf, 99 /* XXX FIXME */, mask);
+ /* 2004/06/03: buf currently PAGE_SIZE, need > 1 char per 4 bits. */
+ BUILD_BUG_ON(NR_CPUS/4 > PAGE_SIZE/2);
+
+ len = cpumask_scnprintf(buf, -1UL, mask);
len += sprintf(buf + len, "\n");
return len;
}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9869-linux-2.6.7-rc2-bk3/fs/sysfs/file.c .9869-linux-2.6.7-rc2-bk3.updated/fs/sysfs/file.c
--- .9869-linux-2.6.7-rc2-bk3/fs/sysfs/file.c 2004-05-31 09:57:31.000000000 +1000
+++ .9869-linux-2.6.7-rc2-bk3.updated/fs/sysfs/file.c 2004-06-03 16:19:39.000000000 +1000
@@ -89,6 +90,7 @@ static int fill_read_buffer(struct file
return -ENOMEM;
count = ops->show(kobj,attr,buffer->page);
+ BUG_ON(count > PAGE_SIZE);
if (count >= 0)
buffer->count = count;
else
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 6:26 ` Rusty Russell
@ 2004-06-03 8:27 ` Paul Jackson
2004-06-04 1:12 ` Rusty Russell
2004-06-03 15:49 ` Paul Jackson
1 sibling, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2004-06-03 8:27 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, akpm, ak, greg
> + BUILD_BUG_ON(NR_CPUS/4 > PAGE_SIZE/2);
Interesting. This would be the only use in the entire kernel of
BUILD_BUG_ON().
An alternative mechanism would be:
#if(badthing) ... #error "darn" ... #endif
There are over 300 such constructs in the kernel. And it has the
advantage of providing an accurate source line number and a specifiable
string.
My preference when checking limits is to check for the exact limit.
Adding fuzz only serves to disguise details. Whether coding correctly,
or screwing up, best to do so with clarity and precision.
Given all that, how about:
#if ALIGN(NR_CPUS,32)*9/32 > PAGE_SIZE
#error "Need 9 bytes space per 32 CPUs in PAGE_SIZE buffer"
#endif
==
I also could be tempted to remove BUILD_BUG_ON() from kernel.h, and
replace it with a comment:
/* BUILD_BUG_ON() obsolete - consider using #if ... #error ... #endif */
==
> + len = cpumask_scnprintf(buf, -1UL, mask);
Why not instead:
> + len = cpumask_scnprintf(buf, PAGE_SIZE-1, mask);
I see no sense in giving cpumask_scnprintf() license to write past the
end of the buffer, independent of any build-time checks (the -1 is for
the trailing newline). And since the contract says "PAGE_SIZE", we
should code exactly to that value "PAGE_SIZE", for clarity as to our
understandings. Once again - I hate fuzz ;).
==
> + BUG_ON(count > PAGE_SIZE);
Only 'BUG_ON' ?? We have in hand almost certain proof of just
having scrogged kernel memory. Time to panic, no?
==
What are we going to do about the removal of the node_dev->cpumap field,
and changing this node_read_cpumap() routine to display instead the
value of node_to_cpumask(node_dev->sysdev.id)?
Should I do it, or you? Should it presume your patch above, or collide
with it, or replace and extend it?
Since I am most impressed with your abilities, since you doubt my
abilities, and since I'm a lazy s.o.b., you're welcome to it. Or if you
prefer to ask me, that's fine. Seems to me this should be two patches -
the one discussed above to limit how many bytes cpumask_scnprintf can
posit, and a second to nuke node_dev->cpumap. The second patch would
depend on the first, for the trivial reason that they collide on some of
the same code.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 6:26 ` Rusty Russell
2004-06-03 8:27 ` Paul Jackson
@ 2004-06-03 15:49 ` Paul Jackson
1 sibling, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2004-06-03 15:49 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, akpm, ak, greg
This patch (fix-sys-cpumap-for-352-nr_cpus) doesn't work. Passing -1UL
to cpumask_scnprintf() suppresses all of the requested output, because
of the following code in lib/vsprintf.c vsnprintf():
/* Reject out-of-range values early */
if (unlikely((int) size < 0)) {
/* There can be only one.. */
static int warn = 1;
WARN_ON(warn);
warn = 0;
return 0;
}
Using PAGE_SIZE works ok. Well, lets throw in a -1 to leave a slot for
the newline, while we're here.
Signed-off-by: Paul Jackson <pj@sgi.com>
Index: 2.6.7-rc2-mm2/drivers/base/node.c
===================================================================
--- 2.6.7-rc2-mm2.orig/drivers/base/node.c 2004-06-03 08:24:19.000000000 -0700
+++ 2.6.7-rc2-mm2/drivers/base/node.c 2004-06-03 08:26:12.000000000 -0700
@@ -24,7 +24,7 @@ static ssize_t node_read_cpumap(struct s
/* 2004/06/03: buf currently PAGE_SIZE, need > 1 char per 4 bits. */
BUILD_BUG_ON(NR_CPUS/4 > PAGE_SIZE/2);
- len = cpumask_scnprintf(buf, -1UL, mask);
+ len = cpumask_scnprintf(buf, PAGE_SIZE-1, mask);
len += sprintf(buf + len, "\n");
return len;
}
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 0:17 ` Andrew Morton
@ 2004-06-03 16:24 ` Greg KH
2004-06-03 16:28 ` Paul Jackson
0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2004-06-03 16:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Paul Jackson, linux-kernel, ak, rusty
On Wed, Jun 02, 2004 at 05:17:24PM -0700, Andrew Morton wrote:
> Paul Jackson <pj@sgi.com> wrote:
> >
> > > Can't we just stick a PAGE_SIZE in here?
> >
> > We could - either way works about as well. Is there something special
> > about PAGE_SIZE here? Is that in fact what sysfs is making available?
>
> Think so. Greg, can you confirm that a SYSDEV_ATTR's handler can safely
> assume that it has a PAGE_SIZE buffer to write to?
Yes, that is correct.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 0:22 ` Rusty Russell
2004-06-03 4:25 ` Paul Jackson
2004-06-03 4:34 ` Paul Jackson
@ 2004-06-03 16:27 ` Greg KH
2004-06-03 16:38 ` Paul Jackson
2 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2004-06-03 16:27 UTC (permalink / raw)
To: Rusty Russell
Cc: Paul Jackson, lkml - Kernel Mailing List, Andrew Morton,
Andi Kleen
On Thu, Jun 03, 2004 at 10:22:36AM +1000, Rusty Russell wrote:
> On Thu, 2004-06-03 at 09:11, Paul Jackson wrote:
> > + /*
> > + * Hack alert:
> > + * 1) This could overwrite a buffer w/o warning. Someone should
> > + * pass us a buffer size (count) or use seq_file or something
> > + * to avoid buffer overrun risks.
>
> Then just use -1UL as the arg to scnprintf, if you don't have a real
> number. That way the overflow will at least have a chance of detection
> in the sysfs code, which I think it should check in
> file.c:fill_read_buffer(). Greg?
We do check for an error in that function, so returning any negative
error value for a show() sysfs callback will be handled properly.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 16:24 ` Greg KH
@ 2004-06-03 16:28 ` Paul Jackson
0 siblings, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2004-06-03 16:28 UTC (permalink / raw)
To: Greg KH; +Cc: akpm, linux-kernel, ak, rusty
> Yes, that is correct.
Good - thanks, Greg.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 16:27 ` Greg KH
@ 2004-06-03 16:38 ` Paul Jackson
2004-06-03 16:51 ` Greg KH
0 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2004-06-03 16:38 UTC (permalink / raw)
To: Greg KH; +Cc: rusty, linux-kernel, akpm, ak
> We do check for an error in that function, so returning any negative
> error value for a show() sysfs callback will be handled properly.
If a show() function returns an error, you handle it - good. As it
should be.
But if a show() function overwrites the page buffer provided to it,
before returning, then there is nothing you can do beyond sending
condolences to the next of kin.
This PATCH and email thread came about because the buffer size is not
passed into the show() function, nor, so far as I can tell, is it
documented anywhere other than implicitly in the fill_read_buffer()
code:
buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
So we were getting confused as to what size buffer we had, when
coding defensively to avoid buffer overrun.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 16:38 ` Paul Jackson
@ 2004-06-03 16:51 ` Greg KH
2004-06-04 1:27 ` Rusty Russell
0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2004-06-03 16:51 UTC (permalink / raw)
To: Paul Jackson; +Cc: rusty, linux-kernel, akpm, ak
On Thu, Jun 03, 2004 at 09:38:07AM -0700, Paul Jackson wrote:
> > We do check for an error in that function, so returning any negative
> > error value for a show() sysfs callback will be handled properly.
>
> If a show() function returns an error, you handle it - good. As it
> should be.
>
> But if a show() function overwrites the page buffer provided to it,
> before returning, then there is nothing you can do beyond sending
> condolences to the next of kin.
Yup, you broke the kernel, you get to keep both pieces :)
> This PATCH and email thread came about because the buffer size is not
> passed into the show() function, nor, so far as I can tell, is it
> documented anywhere other than implicitly in the fill_read_buffer()
> code:
>
> buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
I agree that we should document this better. This thread has shown
that.
> So we were getting confused as to what size buffer we had, when
> coding defensively to avoid buffer overrun.
Panicing the kernel is not really acceptable, even if you did just
scribble accross any portion of kernel memory in your show() function.
Just be aware of the size and code your show() function to be defensive
and not overrun that size.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 8:27 ` Paul Jackson
@ 2004-06-04 1:12 ` Rusty Russell
2004-06-04 2:25 ` Paul Jackson
0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2004-06-04 1:12 UTC (permalink / raw)
To: Paul Jackson; +Cc: lkml - Kernel Mailing List, Andrew Morton, ak, Greg KH
On Thu, 2004-06-03 at 18:27, Paul Jackson wrote:
> > + BUILD_BUG_ON(NR_CPUS/4 > PAGE_SIZE/2);
>
> Interesting. This would be the only use in the entire kernel of
> BUILD_BUG_ON().
>
> An alternative mechanism would be:
>
> #if(badthing) ... #error "darn" ... #endif
Whatever, I like BUILD_BUG_ON().
> #if ALIGN(NR_CPUS,32)*9/32 > PAGE_SIZE
> #error "Need 9 bytes space per 32 CPUs in PAGE_SIZE buffer"
> #endif
Honestly, I dislike the static check altogether, since it's tied too
closely to the implementation of independent code. Since glibc will
break on > 1024 cpus, I'm not too concerned about the 10000+ CPU limit.
> > + len = cpumask_scnprintf(buf, -1UL, mask);
>
> Why not instead:
>
> > + len = cpumask_scnprintf(buf, PAGE_SIZE-1, mask);
>
> I see no sense in giving cpumask_scnprintf() license to write past the
> end of the buffer, independent of any build-time checks (the -1 is for
> the trailing newline). And since the contract says "PAGE_SIZE", we
> should code exactly to that value "PAGE_SIZE", for clarity as to our
> understandings. Once again - I hate fuzz ;).
Because now you have silently truncated, which is much worse than
blowing up loudly. My version also works correctly when sysfs starts
handing out two pages rather than one because something else needed it.
If you want to do this, then please do:
len = cpumask_scnprintf(buf, PAGE_SIZE, mask);
if (len == PAGE_SIZE)
return -ENOMEM;
> > + BUG_ON(count > PAGE_SIZE);
>
> Only 'BUG_ON' ?? We have in hand almost certain proof of just
> having scrogged kernel memory. Time to panic, no?
That would be extremely unusual; we tend not to panic even when things
are bad. In this case, things are bad because of a static configuration
error, so I find it hard to really care...
> ==
>
> What are we going to do about the removal of the node_dev->cpumap field,
> and changing this node_read_cpumap() routine to display instead the
> value of node_to_cpumask(node_dev->sysdev.id)?
>
> Should I do it, or you? Should it presume your patch above, or collide
> with it, or replace and extend it?
I'll do it; seems like we need negotiation with Greg-KH, too.
> Since I am most impressed with your abilities, since you doubt my
> abilities, and since I'm a lazy s.o.b., you're welcome to it.
That's not true. I have faith in your abilities; I question anyone's
ability to produce a perfectly balanced solution without any external
input.
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-03 16:51 ` Greg KH
@ 2004-06-04 1:27 ` Rusty Russell
2004-06-04 18:15 ` Greg KH
0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2004-06-04 1:27 UTC (permalink / raw)
To: Greg KH; +Cc: Paul Jackson, lkml - Kernel Mailing List, Andrew Morton, ak
On Fri, 2004-06-04 at 02:51, Greg KH wrote:
> Just be aware of the size and code your show() function to be defensive
> and not overrun that size.
This is where we have a philosophical difference. As I understand it,
the rule is, "don't put big things in attributes". If we want to change
that rule, we need to do more work, like pass the length to the show
function, and handle -ENOMEM by reallocating and looping.
But I think the /rule/ is a good one: if you need to handle something
arbitrarily large, DON'T USE THIS INTERFACE, because there is no way to
do that correctly. This allows us to handle 99.9% of cases as a
one-liner, which I think has great merit.
I think we should guarantee any kernel primitive fits into the space:
this means it should comfortably fit printing a cpumask_t. I would
argue for a #error inside the cpumask or sysfs code which ensures we can
fit two cpumasks (~7000 CPUs on page-size 4096), so we explode early if
this ever becomes a problem, and a runtime sanity check inside the sysfs
code to BUG on overrun.
If the average code is worrying about the exact size of the sysfs
buffer, I think they're on the wrong path. If they are checking it at
runtime, they've got a bug.
I hope this clarifies my thinking, and I can understand that people
would disagree with my premise: that's OK.
Cheers,
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-04 1:12 ` Rusty Russell
@ 2004-06-04 2:25 ` Paul Jackson
0 siblings, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2004-06-04 2:25 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel, akpm, ak, greg
> Honestly, I dislike the static check altogether ...
The build time check was your idea in the first place, as I recall.
I hadn't added it to my variants. Apparently we agree not to add it.
Ok.
> Because now you have silently truncated, which is much worse than
I absolutely agree with your dislike of hidden intermittent failures.
For a constant failure such as this, even if everyone misses the
botch for the first few times that SGI boots a bazillion CPU system
in the lab, it will get noticed soon enough. This is in fact
exactly what happened with the 99 char limit that was there now.
> len = cpumask_scnprintf(buf, PAGE_SIZE, mask);
> if (len == PAGE_SIZE)
> return -ENOMEM;
That looks nice.
If you want me to add the "if (len ..." check, let me know.
Or if you want to send Andrew a patch that adds it, I'll
gladly support that.
==> Do note that I had to change the -1UL to PAGE_SIZE, in a
patch to Andrew about 12 hours ago. The *scnprintf()
family of fine formatting functions suppresses all requested
output if handed a length with the high bit set.
> That would be extremely unusual; we tend not to panic ...
Yup - I think it was Greg who said the same thing. Clearly this is
not a panic.
I was wrong to suggest panic'ing here.
> I'll do it; seems like we need negotiation with Greg-KH, too.
Ok - have fun.
> I question anyone's ability to produce a perfectly balanced solution
> without any external input.
Whatever ... my view of who was saying and doing what here doesn't
entirely match yours.
So be it.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] fix sys cpumap for > 352 NR_CPUS
2004-06-04 1:27 ` Rusty Russell
@ 2004-06-04 18:15 ` Greg KH
0 siblings, 0 replies; 20+ messages in thread
From: Greg KH @ 2004-06-04 18:15 UTC (permalink / raw)
To: Rusty Russell; +Cc: Paul Jackson, lkml - Kernel Mailing List, Andrew Morton, ak
On Fri, Jun 04, 2004 at 11:27:46AM +1000, Rusty Russell wrote:
> On Fri, 2004-06-04 at 02:51, Greg KH wrote:
> > Just be aware of the size and code your show() function to be defensive
> > and not overrun that size.
>
> This is where we have a philosophical difference. As I understand it,
> the rule is, "don't put big things in attributes". If we want to change
> that rule, we need to do more work, like pass the length to the show
> function, and handle -ENOMEM by reallocating and looping.
We don't want to change the rule, it's a good rule.
> But I think the /rule/ is a good one: if you need to handle something
> arbitrarily large, DON'T USE THIS INTERFACE, because there is no way to
> do that correctly. This allows us to handle 99.9% of cases as a
> one-liner, which I think has great merit.
Agreed.
> I think we should guarantee any kernel primitive fits into the space:
> this means it should comfortably fit printing a cpumask_t. I would
> argue for a #error inside the cpumask or sysfs code which ensures we can
> fit two cpumasks (~7000 CPUs on page-size 4096), so we explode early if
> this ever becomes a problem, and a runtime sanity check inside the sysfs
> code to BUG on overrun.
Again, this seems to be a issue with the code that is trying to export a
cpumask_t. I suggest you keep the check inside that code and keep it
out of the sysfs core, which does not need it for 99.99% of the cases.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2004-06-04 18:18 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-02 23:11 [PATCH] fix sys cpumap for > 352 NR_CPUS Paul Jackson
2004-06-02 23:23 ` Andrew Morton
2004-06-02 23:59 ` Paul Jackson
2004-06-03 0:17 ` Andrew Morton
2004-06-03 16:24 ` Greg KH
2004-06-03 16:28 ` Paul Jackson
2004-06-03 0:22 ` Rusty Russell
2004-06-03 4:25 ` Paul Jackson
2004-06-03 6:26 ` Rusty Russell
2004-06-03 8:27 ` Paul Jackson
2004-06-04 1:12 ` Rusty Russell
2004-06-04 2:25 ` Paul Jackson
2004-06-03 15:49 ` Paul Jackson
2004-06-03 4:34 ` Paul Jackson
2004-06-03 4:35 ` Rusty Russell
2004-06-03 16:27 ` Greg KH
2004-06-03 16:38 ` Paul Jackson
2004-06-03 16:51 ` Greg KH
2004-06-04 1:27 ` Rusty Russell
2004-06-04 18:15 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox