public inbox for linux-kernel@vger.kernel.org
* Re: NUMA API observations
@ 2004-06-15  4:59 Manfred Spraul
  2004-06-15  6:18 ` Paul Jackson
  2004-06-15 11:03 ` Andi Kleen
  0 siblings, 2 replies; 19+ messages in thread
From: Manfred Spraul @ 2004-06-15  4:59 UTC (permalink / raw)
  To: pj@sgi.com; +Cc: linux-kernel, lse-tech, Andi Kleen, Anton Blanchard

>> I will probably make it loop and double the buffer until EINVAL
>> ends or it passes a page and add a nasty comment.
>
> I agree that a loop is needed.  And yes, someone didn't do a very
> good job of designing this interface.
What about fixing the interface instead? For example, if user_mask_ptr is
NULL, then sys_sched_{get,set}affinity could return the bitmap size.

--
    Manfred

* NUMA API observations
@ 2004-06-14 15:36 Anton Blanchard
  2004-06-14 16:17 ` Andi Kleen
  2004-06-15 12:53 ` Thomas Zehetbauer
  0 siblings, 2 replies; 19+ messages in thread
From: Anton Blanchard @ 2004-06-14 15:36 UTC (permalink / raw)
  To: ak; +Cc: linux-kernel


Hi Andi,

I had a chance to test out the NUMA API on ppc64. I started with 64bit
userspace, I'll send on some 32/64bit compat patches shortly.

This machine is weird in that there are lots of nodes with no memory.
Sure, I should probably not set those nodes online, but it's a test setup
that is good for finding various issues (like node failover when one is
OOM).

As you can see, only nodes 0 and 1 have memory:

# cat /proc/buddyinfo
Node 1, zone  DMA  0   53  62  22  1  0  1  1  1  1  0  0  0  1  60
Node 0, zone  DMA  136 18   5   1  2  1  0  1  0  1  1  0  0  0  59

# numastat
            node7  node6  node5  node4  node3  node2  node1   node0
numa_hit        0      0      0      0      0      0  30903 2170759
numa_miss       0      0      0      0      0      0      0       0
numa_foreign    0      0      0      0      0      0      0       0
interleave_hit  0      0      0      0      0      0    715     835
local_node      0      0      0      0      0      0  28776 2170737
other_node      0      0      0      0      0      0   2127      22

Now if I try to interleave across all nodes, the task gets OOM killed:

# numactl --interleave=all /bin/sh
Killed

in dmesg: VM: killing process sh

It works if I specify the nodes with memory:

# numactl --interleave=0,1 /bin/sh

Is this expected, or do we want it to fall back when there is lots of
memory on other nodes?

A similar scenario happens with:

# numactl --preferred=7 /bin/sh
Killed

in dmesg: VM: killing process numactl

The manpage says we should fall back to other nodes when the preferred
node is OOM.

numactl cpu affinity looks broken on big cpumask systems:

# numactl --cpubind=0 /bin/sh
sched_setaffinity: Invalid argument

sched_setaffinity(19470, 64,  { 0, 0, 0, 0, 0, 0, 2332313320, 534d50204d6f6e20 }) = -1 EINVAL (Invalid argument)

My kernel is compiled with NR_CPUS=128; the setaffinity syscall must be
called with a bitmap at least as big as the kernel's cpumask_t. I will
submit a patch for this shortly.

Next I looked at the numactl --show info:

# numactl --show
policy: default
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0 1 2 3
membind: 0 1 2 3 4 5 6 7

What's the difference between nodebind and membind? Why don't I see all 8
nodes on both of them? I notice that if I do membind=all then I only see
the nodes with memory:

# numactl --membind=all --show
policy: bind
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0 1 2 3
membind: 0 1

That kind of makes sense, but I don't understand why we have 4 nodes in
the nodebind field. My cpu layout is not contiguous, perhaps that's why
nodebind comes out strange:

processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 16
processor       : 17
processor       : 18
processor       : 19

Anton


Thread overview: 19+ messages
-- links below jump to the message on this page --
2004-06-15  4:59 NUMA API observations Manfred Spraul
2004-06-15  6:18 ` Paul Jackson
2004-06-15 11:03 ` Andi Kleen
2004-06-15 17:37   ` Manfred Spraul
2004-06-15 18:32     ` Paul Jackson
2004-06-15 18:18   ` Paul Jackson
     [not found] <271SM-3DT-7@gated-at.bofh.it>
     [not found] ` <27lI4-29E-19@gated-at.bofh.it>
2004-06-15 13:27   ` Andi Kleen
     [not found] ` <272lY-44B-49@gated-at.bofh.it>
     [not found]   ` <2772a-7VK-9@gated-at.bofh.it>
     [not found]     ` <279nf-1id-3@gated-at.bofh.it>
2004-06-15 13:52       ` Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2004-06-14 15:36 Anton Blanchard
2004-06-14 16:17 ` Andi Kleen
2004-06-14 21:21   ` Paul Jackson
2004-06-14 23:44     ` Andi Kleen
2004-06-15  0:06       ` Paul Jackson
2004-06-15  0:20         ` Andi Kleen
2004-06-15  0:25           ` Paul Jackson
2004-06-14 21:40   ` Anton Blanchard
2004-06-14 23:49     ` Andi Kleen
2004-06-15 13:50       ` Jesse Barnes
2004-06-15 12:53 ` Thomas Zehetbauer
