linux-numa.vger.kernel.org archive mirror
* NUMA balancing degrading performance
@ 2014-10-27 18:07 Martin Ichilevici de Oliveira
       [not found] ` <CAGz0_-3ukQGieAQ_NjcW6W9A2R3u5kCUxr4NETy5Z28srsxcNw@mail.gmail.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Ichilevici de Oliveira @ 2014-10-27 18:07 UTC (permalink / raw)
  To: linux-numa


Hello,

I'm experimenting with the kernel's automatic NUMA balancing and I'm
experiencing performance loss when I turn the balancing on.

Using the sysctl kernel.numa_balancing, I've run some benchmarks with
balancing on and off, and performance with balancing turned *off* was
consistently better. Runs were typically about 50% slower with balancing
on, but in one particular case the run was 10 times slower.

AFAIK, none of the benchmarks are NUMA-aware, so I was expecting some
kind of performance gain. If any of them are in fact NUMA-aware, I
could understand a slight performance loss, but only a slight one: since
the scan period adapts to the ratio of local to remote faults, the
balancer should quickly detect that memory placement is already optimal
and increase the scan period so that the overhead stays small.
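
One way to check whether the scanner really is seeing mostly local
faults is to compare the numa_hint_faults and numa_hint_faults_local
counters in /proc/vmstat. The following is a minimal sketch, assuming
the running kernel exposes those two counters (kernels built with
CONFIG_NUMA_BALANCING normally do). The helper name numa_local_ratio is
made up for illustration.

```shell
#!/bin/sh
# numa_local_ratio [FILE]
# Hypothetical helper: report how many NUMA hinting faults were node-local.
# FILE must be in /proc/vmstat format ("name value" per line) and
# defaults to /proc/vmstat itself.
numa_local_ratio() {
    awk '
        $1 == "numa_hint_faults"       { total = $2 }
        $1 == "numa_hint_faults_local" { local_faults = $2 }
        END {
            if (total > 0)
                printf "hint_faults=%s local=%s local_pct=%.1f\n",
                       total, local_faults, 100 * local_faults / total
            else
                print "hint_faults=0 local=0 local_pct=n/a"
        }
    ' "${1:-/proc/vmstat}"
}

# Typical use: sample before and after a benchmark run, e.g.
#   numa_local_ratio; ./benchmark; numa_local_ratio
```

The counters are cumulative since boot, so only the difference between
two samples says anything about a particular run. A local share that
stays low while the benchmark is running would suggest that pages keep
being faulted from remote nodes instead of settling.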

I would appreciate it if someone could help me find what's causing this
behavior.

I'm running on CentOS 6.5 with Linux 3.17.1 (self compiled).

Below is some stuff that might be useful. If you need anything else,
just tell me.

Thank you,
Martin

$ for x in /proc/sys/kernel/numa_balancing* ; do echo $x ; cat $x; done
/proc/sys/kernel/numa_balancing
32768
/proc/sys/kernel/numa_balancing_scan_delay_ms
1000
/proc/sys/kernel/numa_balancing_scan_period_max_ms
60000
/proc/sys/kernel/numa_balancing_scan_period_min_ms
1000
/proc/sys/kernel/numa_balancing_scan_size_mb
256
(I tried different values of scan size, but it didn't make much of a
difference.)

$ numactl --hardware
available: 4 nodes (0,2,4,6)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 16076 MB
node 0 free: 15713 MB
node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 2 size: 16157 MB
node 2 free: 15903 MB
node 4 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 4 size: 16157 MB
node 4 free: 15998 MB
node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 16141 MB
node 6 free: 15941 MB
node distances:
node   0   2   4   6 
  0:  10  16  16  16 
  2:  16  10  16  16 
  4:  16  16  10  16 
  6:  16  16  16  10

$ grep NUMA .config
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_ARCH_USES_NUMA_PROT_NONE=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_NUMA_BALANCING=y
# CONFIG_X86_NUMACHIP is not set
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_ACPI_NUMA=y

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD Opteron(TM) Processor 6272
stepping        : 2
microcode       : 0x6000629
cpu MHz         : 1400.000
cache size      : 2048 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 32
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc
extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1
sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat
cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak
bogomips        : 4200.06
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb
(63 others just like this)

$ cat /proc/meminfo
MemTotal:       66083340 kB
MemFree:        64445100 kB
MemAvailable:   64477352 kB
Buffers:          172848 kB
Cached:           132988 kB
SwapCached:            0 kB
Active:           857468 kB
Inactive:          70176 kB
Active(anon):     621996 kB
Inactive(anon):        8 kB
Active(file):     235472 kB
Inactive(file):    70168 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      65535996 kB
SwapFree:       65535996 kB
Dirty:                28 kB
Writeback:             0 kB
AnonPages:        692268 kB
Mapped:            21468 kB
Shmem:               200 kB
Slab:             182152 kB
SReclaimable:      96752 kB
SUnreclaim:        85400 kB
KernelStack:       13120 kB
PageTables:         5928 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    98577664 kB
Committed_AS:    1363304 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      400824 kB
VmallocChunk:   34309010120 kB
HardwareCorrupted:     0 kB
AnonHugePages:    665600 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      160128 kB
DirectMap2M:     5064704 kB
DirectMap1G:    61865984 kB


* Re: NUMA balancing degrading performance
       [not found] ` <CAGz0_-3ukQGieAQ_NjcW6W9A2R3u5kCUxr4NETy5Z28srsxcNw@mail.gmail.com>
@ 2014-10-27 21:27   ` Martin Ichilevici de Oliveira
  2014-10-28 21:57     ` Andreas Hollmann
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Ichilevici de Oliveira @ 2014-10-27 21:27 UTC (permalink / raw)
  To: Andreas Hollmann; +Cc: linux-numa


Hello Andreas,

Thank you for your reply. Please check my comments inline.

> it would be good to know which applications/benchmarks you were running.
> 
> Have you tried out some well known and open source benchmarks?
> 
> NAS Parallel Benchmarks -
> http://www.nas.nasa.gov/publications/npb.html (Fortran Code)
> NPB2.3-omp-C.tgz (C version NPB in OpenMP) -
> http://www.hpcs.cs.tsukuba.ac.jp/omni-compiler/download/NPB2.3-omp-C.tgz
> Stream - http://www.cs.virginia.edu/stream/FTP/Code/stream.c

Sorry, I should have mentioned that. I tried some NAS benchmarks: 
bt, sp and lu-hp. bt and sp were around 60% slower with the balancing
turned on, and lu-hp was 10 times slower.

I also ran Lulesh, which was roughly 100% slower with the balancing
turned on.

> Do you have "numad" running on your machine? If it is running you
> should stop it.

I checked and it's not running.

Cheers,
Martin


* Re: NUMA balancing degrading performance
  2014-10-27 21:27   ` Martin Ichilevici de Oliveira
@ 2014-10-28 21:57     ` Andreas Hollmann
  2014-10-29  2:37       ` Martin Ichilevici de Oliveira
  0 siblings, 1 reply; 4+ messages in thread
From: Andreas Hollmann @ 2014-10-28 21:57 UTC (permalink / raw)
  To: Martin Ichilevici de Oliveira; +Cc: linux-numa

Hi Martin,

I had some time to run NPB.

LU-HP is not available in NPB 3.3.1, so I used LU instead.

Here are my results:

LU Benchmark Completed.
 Class           =                        C
 Size            =            162x 162x 162
 Iterations      =                      250
 Time in seconds =                    XX.XX
 Total threads   =                       80
 Avail threads   =                       80
 Mop/s total     =                 51420.84
 Mop/s/thread    =                   642.76
 Operation type  =           floating point
 Verification    =               SUCCESSFUL
 Version         =                    3.3.1
 Compile date    =              28 Oct 2014

 Compile options:
    F77          = gfortran
    FLINK        = $(F77)
    F_LIB        = (none)
    F_INC        = (none)
    FFLAGS       = -O3 -fopenmp -mcmodel=medium
    FLINKFLAGS   = -O3 -fopenmp
    RAND         = (none)

With NUMA balancing disabled
(sudo bash -c "echo 0 > /proc/sys/kernel/numa_balancing"):

1st run:  Time in seconds =                    39.65
2nd run:  Time in seconds =                    39.47
3rd run:  Time in seconds =                    41.31
4th run:  Time in seconds =                    40.42

The measurements without NUMA balancing are stable at around 40 seconds.

With NUMA balancing enabled
(sudo bash -c "echo 1 > /proc/sys/kernel/numa_balancing"):

1st run: Time in seconds =                    53.89
2nd run: Time in seconds =                    51.95
3rd run: Time in seconds =                    56.22
4th run: Time in seconds =                    64.20

Enabling this option increases the runtime by more than 50% in the worst case.
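
For reference, the slowdown implied by the eight timings above can be
worked out as follows. This is only a sketch: the helper names mean_of
and slowdown_pct are illustrative, and the baseline is taken to be the
mean of the four runs with balancing disabled.

```shell
#!/bin/sh
# mean_of N...: arithmetic mean of the arguments, four decimals.
mean_of() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.4f\n", s / NR }'
}

# slowdown_pct BASE NEW: how much longer NEW took than BASE, in percent.
slowdown_pct() {
    awk -v base_t="$1" -v new_t="$2" \
        'BEGIN { printf "%.1f\n", 100 * (new_t - base_t) / base_t }'
}

off=$(mean_of 39.65 39.47 41.31 40.42)   # balancing disabled
on=$(mean_of 53.89 51.95 56.22 64.20)    # balancing enabled

echo "mean off: $off s, mean on: $on s"
echo "mean slowdown:  $(slowdown_pct "$off" "$on") %"
echo "worst slowdown: $(slowdown_pct "$off" 64.20) %"
```

On these numbers the mean runtime goes from about 40.2 s to about
56.6 s, roughly 41% slower, and the slowest run (64.20 s) is about 60%
above the baseline mean.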

Here is some information about the hardware:

Kernel: Linux inwest 3.16.4-1-ARCH #1 SMP PREEMPT Mon Oct 6 08:22:27
CEST 2014 x86_64 GNU/Linux
CPU: Intel(R) Xeon(R) CPU E7- 4850  @ 2.00GHz

numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 40 41 42 43 44 45 46 47 48 49
node 0 size: 64427 MB
node 0 free: 63912 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59
node 1 size: 64509 MB
node 1 free: 64066 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69
node 2 size: 64509 MB
node 2 free: 63987 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39 70 71 72 73 74 75 76 77 78 79
node 3 size: 64509 MB
node 3 free: 64035 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10

Regards,
Andreas


* Re: NUMA balancing degrading performance
  2014-10-28 21:57     ` Andreas Hollmann
@ 2014-10-29  2:37       ` Martin Ichilevici de Oliveira
  0 siblings, 0 replies; 4+ messages in thread
From: Martin Ichilevici de Oliveira @ 2014-10-29  2:37 UTC (permalink / raw)
  To: Andreas Hollmann; +Cc: linux-numa


On Tue, Oct 28, 2014 at 10:57:37PM +0100, Andreas Hollmann wrote:
> Hi Martin,
> 
> I had some time to run NPB.
> 
> LU-HP is not available in NPB 3.3.1, so I used LU instead.
> 
> With numa balance enabled
> Enabling this option increases the runtime by more than 50% in the worst case.

Andreas,

Thank you for taking the time to run it on your machine. I ran LU from
NPB 3.1.1 and got results similar to yours (up to 50% performance loss).
Do you have any idea or guess as to what might be causing it?

Do you know of any benchmark that improves with the automatic NUMA
balancing? I'd like to see how that goes on my machine.

Thanks,
Martin
