* NUMA, migrate/N, and tuned-adm
@ 2013-12-17 18:10 David Timothy Strauss
2013-12-17 18:46 ` David Timothy Strauss
2013-12-17 20:41 ` Rik van Riel
0 siblings, 2 replies; 3+ messages in thread
From: David Timothy Strauss @ 2013-12-17 18:10 UTC
To: Mel Gorman, Ingo Molnar, Rik van Riel; +Cc: linux-kernel
Our systems get storms of migrate/N (and sometimes kswapd) kernel
tasks, based on what we've seen in top [1]. The issue is unique to our
bare-metal application servers: we run hundreds of application servers
on Xen virtual hardware with the same kernel and do not see it there.
We also have no issues with identical kernels and hardware when the
servers run databases.
System specs:
* Fedora 19 with the 3.11.10-200.fc19.x86_64 kernel (just the stock RPM)
* Bare-metal servers with 128GB RAM split between two NUMA regions,
each region with one hex-core processor
* More than 700 processes, a couple hundred of which are active
fairly frequently. The systems were at 7000 processes, but we've
dropped it while we dive into this issue.
* Many of the processes are short-lived. The long-lived ones
experience spikes in CPU and memory usage while processing requests.
Here's what we've tried, to no avail:
* tuned-adm on latency-performance and virtual-host profiles; this
places the system on the deadline scheduler, but this problem occurred
on the default one too
* kernel.sched_migration_cost_ns=5000000 (which tuned will do for
those profiles in v3.3/Fedora 20)
* numad to balance between regions
* Global use of sched_relax_domain_level=1 and sched_relax_domain_level=2
* Splitting the system with cpuset into management tasks (6 virtual
cores) and workload tasks (18 virtual cores) with
sched_relax_domain_level=2. This is based on recommendations for NUMA
systems in the cpuset man page.
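When testing a setting such as kernel.sched_migration_cost_ns, it can help to read the value back and confirm it actually took effect before judging the result. The following is only a minimal sketch and not part of the original report: the helper name show_migration_cost and its optional procfs-root argument are hypothetical, and it assumes the sysctl is exposed at the usual /proc/sys/kernel path on the running kernel.

```shell
# Hypothetical helper: print the current migration-cost tunable.
# $1 (optional): procfs root; defaults to /proc, overridable for testing.
show_migration_cost() {
    proc_root="${1:-/proc}"
    path="$proc_root/sys/kernel/sched_migration_cost_ns"
    if [ -r "$path" ]; then
        echo "kernel.sched_migration_cost_ns=$(cat "$path")"
    else
        # Assumption: a missing file means this kernel does not expose it here.
        echo "kernel.sched_migration_cost_ns=unavailable"
    fi
}

# Example: show_migration_cost
```

Reading the file directly avoids depending on which tool (tuned, sysctl, or a boot script) last wrote the value.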
Here's what we've used for analysis:
* powertop
* top/htop
* perf record -a -g
* SystemTap with code to print out migrations occurring
* numatop
[1] https://gist.github.com/davidstrauss/3ff0b29c4d3766bedd49
David Strauss
Pantheon Systems
Fedora Server Working Group
P.S. Josh Boyer (jwb) referred me here from the Fedora kernel side.
* Re: NUMA, migrate/N, and tuned-adm
From: David Timothy Strauss @ 2013-12-17 18:46 UTC
To: Mel Gorman, Ingo Molnar, Rik van Riel; +Cc: linux-kernel
On Tue, Dec 17, 2013 at 10:10 AM, David Timothy Strauss
<david@davidstrauss.net> wrote:
> * Splitting the system with cpuset into management tasks (6 virtual
> cores) and workload tasks (18 virtual cores) with
> sched_relax_domain_level=2. This is based on recommendations for NUMA
> systems in the cpuset man page.
I actually meant logical cores here, not virtual cores.
* Re: NUMA, migrate/N, and tuned-adm
From: Rik van Riel @ 2013-12-17 20:41 UTC
To: David Timothy Strauss; +Cc: Mel Gorman, Ingo Molnar, linux-kernel
On 12/17/2013 01:10 PM, David Timothy Strauss wrote:
> System specs:
> * Fedora 19 with the 3.11.10-200.fc19.x86_64 kernel (just the stock RPM)
> * Bare-metal servers with 128GB RAM split between two NUMA regions,
> each region with one hex-core processor
> * More than 700 processes, a couple hundred of which are active
> fairly frequently. The systems were at 7000 processes, but we've
> dropped it while we dive into this issue.
> * Many of the processes are short-lived. The long-lived ones
> experience spikes in CPU and memory usage while processing requests.
>
> Here's what we've tried, to no avail:
> * tuned-adm on latency-performance and virtual-host profiles; this
> places the system on the deadline scheduler, but this problem occurred
> on the default one too
> * kernel.sched_migration_cost_ns=5000000 (which tuned will do for
> those profiles in v3.3/Fedora 20)
> * numad to balance between regions
> * Global use of sched_relax_domain_level=1 and sched_relax_domain_level=2
> * Splitting the system with cpuset into management tasks (6 virtual
> cores) and workload tasks (18 virtual cores) with
> sched_relax_domain_level=2. This is based on recommendations for NUMA
> systems in the cpuset man page.
Just for a quick sanity check, can you try disabling the
automatic NUMA balancing code?
# echo NO_NUMA > /sys/kernel/debug/sched_features
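To confirm the toggle above took effect, the same file can be read back: sched_features lists each feature by name, with a NO_ prefix when it is disabled. The following is a minimal sketch rather than part of the original reply. The function name numa_feature_state is hypothetical, and it assumes the running kernel exposes a feature literally named NUMA, which varies by kernel version.

```shell
# Hypothetical helper: report whether the NUMA scheduler feature is on.
# $1 (optional): path to a sched_features file; defaults to the debugfs path.
numa_feature_state() {
    features_file="${1:-/sys/kernel/debug/sched_features}"
    state=unknown
    # The file is a whitespace-separated list of feature names.
    for feature in $(cat "$features_file"); do
        case "$feature" in
            NUMA)    state=enabled ;;
            NO_NUMA) state=disabled ;;
        esac
    done
    echo "$state"
}

# Example (as root, with debugfs mounted): numa_feature_state
```

Matching whole tokens keeps related names such as NUMA_FORCE from being mistaken for the NUMA feature itself.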
> Here's what we've used for analysis:
> * powertop
> * top/htop
> * perf record -a -g
Does "perf report -g" show where the calls to the
migration code are coming from? Something must be
migrating tasks around, and it will be good to know
what it is...
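A lightweight complement to the perf call graph is to watch per-task migration counters and see which tasks are actually being moved. The following is a rough sketch, not something from the original thread. The helper name nr_migrations is hypothetical, and it assumes the kernel is built with scheduler debugging so that /proc/<pid>/sched contains an se.nr_migrations line; that field may be absent on other configurations.

```shell
# Hypothetical helper: print the migration count from a /proc/<pid>/sched file.
# $1: path to the file, e.g. /proc/1234/sched
nr_migrations() {
    # Lines look like "se.nr_migrations   :   42"; match the exact field name
    # so that similarly named statistics are not picked up.
    awk -F: '$1 ~ /^se\.nr_migrations[ \t]*$/ {
        gsub(/[ \t]/, "", $2)
        print $2
        exit
    }' "$1"
}

# Example: nr_migrations /proc/1234/sched; sleep 10; nr_migrations /proc/1234/sched
```

Sampling the same task twice and comparing the two numbers gives a per-task migration rate without any tracing overhead.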
--
All rights reversed