* wrong parsing to /proc/self/status causes wrong out-of-range errors
From: Brice Goglin @ 2009-04-21 15:48 UTC (permalink / raw)
To: linux-numa; +Cc: ianw
Hello,
After upgrading a quad-socket quad-core machine from 2.6.27 to 2.6.29,
numactl now reports --physcpubind=X as an out-of-range CPU when
12<=X<16 (even though the machine obviously still has 16 cores).
Other machines running 2.6.29 show the same problem: the last 4 cores are
"out-of-range".
I first observed this with Debian's 2.0.3-rc1, but it seems to occur with
2.0.3-rc2 as well.
It appears that this is caused by set_thread_constraints() passing a
wrong pointer to read_mask() when gathering maxproccpu and
maxprocnode from /proc/self/status. The pointer lands on the second
character of the mask instead of the first one, thus losing one "f"
(one hex digit, i.e. 4 CPUs), which is why 4 cores are lost.
The kernel code generating the "Cpus_allowed:" and "Mems_allowed:" masks
in /proc/self/status has changed recently, so perhaps the formatting
changed a bit (whitespace?).
The patch below fixes this problem by simply passing a pointer to the first
character after ":". read_mask()/strtoul() skip leading whitespace anyway,
so there is no need for the caller to guess how much whitespace follows ":".
Brice
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
diff -ur numactl-2.0.3~rc1/libnuma.c numactl-2.0.3~rc1.save/libnuma.c
--- numactl-2.0.3~rc1/libnuma.c 2008-12-09 20:38:07.000000000 +0100
+++ numactl-2.0.3~rc1.save/libnuma.c 2009-04-21 15:44:19.000000000 +0200
@@ -479,11 +479,11 @@
 
 	while (getline(&buffer, &buflen, f) > 0) {
 		if (strncmp(buffer,"Cpus_allowed:",13) == 0)
-			maxproccpu = read_mask(buffer + 15, numa_all_cpus_ptr);
+			maxproccpu = read_mask(buffer + 13, numa_all_cpus_ptr);
 
 		if (strncmp(buffer,"Mems_allowed:",13) == 0) {
 			maxprocnode =
-				read_mask(buffer + 15, numa_all_nodes_ptr);
+				read_mask(buffer + 13, numa_all_nodes_ptr);
 		}
 	}
 	fclose(f);
* Re: wrong parsing to /proc/self/status causes wrong out-of-range errors
From: Cliff Wickman @ 2009-04-23 13:55 UTC (permalink / raw)
To: Brice Goglin; +Cc: linux-numa, ianw
Hi Brice,
Thanks for the fix.
I tested it on ia64.
It is now part of numactl-2.0.3-rc2.tar.gz at
ftp://oss.sgi.com/www/projects/libnuma/download/
BTW,
Andi, Lee and all,
We have no established release schedule that I'm aware of.
2.0.2 was released in August 2008.
About 14 fixes have accumulated in 2.0.3-rc1 and -rc2, but there has
been only a trickle of small changes lately.
I'm inclined to announce it as 2.0.3, if you have used it and are
satisfied that it's stable.
-Cliff
--
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com
(651) 683-3824
* Re: wrong parsing to /proc/self/status causes wrong out-of-range errors
From: Lee Schermerhorn @ 2009-04-23 18:14 UTC (permalink / raw)
To: Cliff Wickman; +Cc: Brice Goglin, linux-numa, ianw
Hi, Cliff:
I have been using/testing 2.0.3-rc2 for a while. I have accumulated a
couple of patches that I've been remiss in not posting. They sort of
got pushed way down on my stack. One cleans up what I perceived as
bitmap leaks, but it also includes some unrelated cleanup that I've
been meaning to separate out. One of them actually addresses the
Cpus_allowed parsing that Brice fixed.
A while ago I sent a copy to Kornilios Kourtis [off list] to see if he
was interested in moving them forward. He did make some mods, which I've
been too distracted to check out :(, but KK was also too busy to dedicate
much time. I can try to spend some time tomorrow reviewing Kornilios'
version and my other patches [nothing serious, as I recall] and send
them on, before the weekend I hope.
Or I could send you what I have as a heads up and work with you to
finish testing/cleanup/... Let me know your preference.
Apologies for letting this slide,
Lee
<snip>
* Re: wrong parsing to /proc/self/status causes wrong out-of-range errors
From: Cliff Wickman @ 2009-04-23 18:40 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: linux-numa
Hi Lee.
I'm not in any hurry to release 2.0.3. It just looked like it was stable.
But if you, Kornilios and others are working on some issues, that's
fine. Finish them when you can.
-Cliff