public inbox for linux-kernel@vger.kernel.org
  • * Re: NUMA API observations
    @ 2004-06-15  4:59 Manfred Spraul
      2004-06-15  6:18 ` Paul Jackson
      2004-06-15 11:03 ` Andi Kleen
      0 siblings, 2 replies; 19+ messages in thread
    From: Manfred Spraul @ 2004-06-15  4:59 UTC (permalink / raw)
      To: pj () sgi ! com; +Cc: linux-kernel, lse-tech, Andi Kleen, Anton Blanchard
    
    >> I will probably make it loop and double the buffer until EINVAL
    >> ends or it passes a page and add a nasty comment.
    >
    >I agree that a loop is needed.  And yes someone didn't do a very
    >good job of designing this interface.
    What about fixing the interface instead? For example, if user_mask_ptr
    is NULL, sys_sched_{get,set}affinity could return the bitmap size.
    
    --
        Manfred
    
    ^ permalink raw reply	[flat|nested] 19+ messages in thread
    * NUMA API observations
    @ 2004-06-14 15:36 Anton Blanchard
      2004-06-14 16:17 ` Andi Kleen
      2004-06-15 12:53 ` Thomas Zehetbauer
      0 siblings, 2 replies; 19+ messages in thread
    From: Anton Blanchard @ 2004-06-14 15:36 UTC (permalink / raw)
      To: ak; +Cc: linux-kernel
    
    
    Hi Andi,
    
    I had a chance to test out the NUMA API on ppc64. I started with
    64-bit userspace; I'll send some 32/64-bit compat patches shortly.
    
    This machine is weird in that there are lots of nodes with no memory.
    Sure, I should probably not set those nodes online, but it's a test
    setup that is good for finding various issues (like node failover when
    one is OOM).
    
    As you can see, only nodes 0 and 1 have memory:
    
    # cat /proc/buddyinfo
    Node 1, zone  DMA  0   53  62  22  1  0  1  1  1  1  0  0  0  1  60
    Node 0, zone  DMA  136 18   5   1  2  1  0  1  0  1  1  0  0  0  59
    
    # numastat
                node7  node6  node5  node4  node3  node2  node1   node0
    numa_hit        0      0      0      0      0      0  30903 2170759
    numa_miss       0      0      0      0      0      0      0       0
    numa_foreign    0      0      0      0      0      0      0       0
    interleave_hit  0      0      0      0      0      0    715     835
    local_node      0      0      0      0      0      0  28776 2170737
    other_node      0      0      0      0      0      0   2127      22
    
    Now if I try to interleave across all nodes, the task gets OOM killed:
    
    # numactl --interleave=all /bin/sh
    Killed
    
    in dmesg: VM: killing process sh
    
    It works if I specify the nodes with memory:
    
    # numactl --interleave=0,1 /bin/sh
    
    Is this expected, or do we want it to fall back when there is lots of
    memory on other nodes?
    
    A similar scenario happens with:
    
    # numactl --preferred=7 /bin/sh
    Killed
    
    in dmesg: VM: killing process numactl
    
    The manpage says we should fall back to other nodes when the preferred
    node is OOM.
    
    numactl CPU affinity looks broken on big cpumask systems:
    
    # numactl --cpubind=0 /bin/sh
    sched_setaffinity: Invalid argument
    
    sched_setaffinity(19470, 64,  { 0, 0, 0, 0, 0, 0, 2332313320, 534d50204d6f6e20 }) = -1 EINVAL (Invalid argument)
    
    My kernel is compiled with NR_CPUS=128; the setaffinity syscall must
    be called with a bitmap at least as big as the kernel's cpumask_t. I
    will submit a patch for this shortly.
    
    Next I looked at the numactl --show info:
    
    # numactl --show
    policy: default
    preferred node: 0
    interleavemask:
    interleavenode: 0
    nodebind: 0 1 2 3
    membind: 0 1 2 3 4 5 6 7
    
    What's the difference between nodebind and membind? Why don't I see
    all 8 nodes in both of them? I notice that if I do membind=all then I
    only see the nodes with memory:
    
    # numactl --membind=all --show
    policy: bind
    preferred node: 0
    interleavemask:
    interleavenode: 0
    nodebind: 0 1 2 3
    membind: 0 1
    
    That kind of makes sense, but I don't understand why we have 4 nodes
    in the nodebind field. My CPU layout is not contiguous; perhaps that's
    why nodebind comes out strange:
    
    processor       : 0
    processor       : 1
    processor       : 2
    processor       : 3
    processor       : 16
    processor       : 17
    processor       : 18
    processor       : 19
    
    Anton
    

    end of thread, other threads:[~2004-06-15 18:24 UTC | newest]
    
    Thread overview: 19+ messages
         [not found] <271SM-3DT-7@gated-at.bofh.it>
         [not found] ` <27lI4-29E-19@gated-at.bofh.it>
    2004-06-15 13:27   ` NUMA API observations Andi Kleen
         [not found] ` <272lY-44B-49@gated-at.bofh.it>
         [not found]   ` <2772a-7VK-9@gated-at.bofh.it>
         [not found]     ` <279nf-1id-3@gated-at.bofh.it>
    2004-06-15 13:52       ` Bill Davidsen
    2004-06-15  4:59 Manfred Spraul
    2004-06-15  6:18 ` Paul Jackson
    2004-06-15 11:03 ` Andi Kleen
    2004-06-15 17:37   ` Manfred Spraul
    2004-06-15 18:32     ` Paul Jackson
    2004-06-15 18:18   ` Paul Jackson
      -- strict thread matches above, loose matches on Subject: below --
    2004-06-14 15:36 Anton Blanchard
    2004-06-14 16:17 ` Andi Kleen
    2004-06-14 21:21   ` Paul Jackson
    2004-06-14 23:44     ` Andi Kleen
    2004-06-15  0:06       ` Paul Jackson
    2004-06-15  0:20         ` Andi Kleen
    2004-06-15  0:25           ` Paul Jackson
    2004-06-14 21:40   ` Anton Blanchard
    2004-06-14 23:49     ` Andi Kleen
    2004-06-15 13:50       ` Jesse Barnes
    2004-06-15 12:53 ` Thomas Zehetbauer
    
