* jitter test scalability problem - my theory
@ 2011-07-29 14:31 Mike Galbraith
2011-07-29 14:35 ` Peter Zijlstra
0 siblings, 1 reply; 3+ messages in thread
From: Mike Galbraith @ 2011-07-29 14:31 UTC (permalink / raw)
To: RT; +Cc: Peter Zijlstra
Greetings,
2.6.33.15-rt31 on a 64 core DL980, running a jitter test proggy that
fires up an executive (a model thereof actually) on 56 isolated cores,
which in turn fires up 3 or 4 workers in descending priority. All tasks
are pinned. Box booted isolcpus=8-63. Jitter test runs on these, 0-7
handle everything else, including interrupts.
Test proggy is looking to achieve a max jitter of +-30us, and that goal
is met.. until I increase the load to > 32 isolated cores.
Profiling, my kernel overhead is mostly (surprise) locks, but it seems
my scalability issue may be coming from...
static void
inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
{
...
	if (rq->online)
		cpupri_set(&rq->rd->cpupri, rq->cpu, prio);

void cpupri_set(struct cpupri *cp, int cpu, int newpri)
{
...
	if (likely(newpri != CPUPRI_INVALID)) {
		struct cpupri_vec *vec = &cp->pri_to_cpu[newpri];
		raw_spin_lock_irqsave(&vec->lock, flags); <== here
So it seems push/pull logic _may_ be my scalability problem. Even
though there's nothing that can be pushed/pulled, it's hammering a few
locks from many cores, so cores perturb each other enough despite
isolation, to fail once enough cores are active.
Does that look like a reasonable explanation for my jitter increase?
I'm going to hack up a test for my wild theory, but since I don't have a
lot of experience hunting jitter sources, looking for a few lousy usecs,
it couldn't hurt to ask whether I'm barking up the wrong tree or not :)
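A minimal userspace sketch of the theory above, not the kernel code: it assumes each priority slot has one lock-protected vector and that changing a CPU's priority touches both the new and the old slot. All `model_*` names, the slot count, and the two priority values used by `model_run()` are hypothetical, and a plain counter stands in for the spinlock. It only illustrates that identically loaded, pinned cores end up taking the same two locks no matter how many cores there are.

```c
#include <assert.h>

#define MODEL_NR_CPUS 56    /* isolated cores in the test described above */
#define MODEL_NR_PRIO 102   /* illustrative number of priority slots */
#define MODEL_INVALID (-1)  /* stand-in for CPUPRI_INVALID */

/* One vector per priority; the counter stands in for traffic on vec->lock. */
struct model_vec {
	unsigned long lock_acquisitions;
	int count;                       /* CPUs currently at this priority */
};

struct model_cpupri {
	struct model_vec pri_to_cpu[MODEL_NR_PRIO];
	int cpu_to_pri[MODEL_NR_CPUS];
};

static void model_init(struct model_cpupri *cp)
{
	int i;

	for (i = 0; i < MODEL_NR_PRIO; i++) {
		cp->pri_to_cpu[i].lock_acquisitions = 0;
		cp->pri_to_cpu[i].count = 0;
	}
	for (i = 0; i < MODEL_NR_CPUS; i++)
		cp->cpu_to_pri[i] = MODEL_INVALID;
}

/* Assumed shape of a priority update: take the new slot's lock, then the old. */
static void model_cpupri_set(struct model_cpupri *cp, int cpu, int newpri)
{
	int oldpri;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(newpri == MODEL_INVALID || (newpri >= 0 && newpri < MODEL_NR_PRIO));

	oldpri = cp->cpu_to_pri[cpu];
	if (newpri == oldpri)
		return;

	if (newpri != MODEL_INVALID) {
		cp->pri_to_cpu[newpri].lock_acquisitions++;
		cp->pri_to_cpu[newpri].count++;
	}
	if (oldpri != MODEL_INVALID) {
		cp->pri_to_cpu[oldpri].lock_acquisitions++;
		cp->pri_to_cpu[oldpri].count--;
	}
	cp->cpu_to_pri[cpu] = newpri;
}

/* Every CPU repeatedly wakes an RT task (busy_prio) and sleeps again (idle_prio). */
static void model_run(struct model_cpupri *cp, int rounds, int idle_prio, int busy_prio)
{
	int cpu, r;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_cpupri_set(cp, cpu, idle_prio);

	for (r = 0; r < rounds; r++) {
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
			model_cpupri_set(cp, cpu, busy_prio);
			model_cpupri_set(cp, cpu, idle_prio);
		}
	}
}

/* Number of distinct locks that saw any traffic. */
static int model_locks_touched(const struct model_cpupri *cp)
{
	int i, touched = 0;

	for (i = 0; i < MODEL_NR_PRIO; i++)
		if (cp->pri_to_cpu[i].lock_acquisitions > 0)
			touched++;
	return touched;
}

static unsigned long model_total_acquisitions(const struct model_cpupri *cp)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < MODEL_NR_PRIO; i++)
		total += cp->pri_to_cpu[i].lock_acquisitions;
	return total;
}
```

Calling `model_init()` and then `model_run()` with one idle and one busy priority leaves `model_locks_touched()` at 2 while `model_total_acquisitions()` grows linearly with the number of CPUs. That is the pattern the theory blames: more cores mean more traffic on the same couple of cache lines.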
(in previous run, _raw_spin_lock_irqsave() was > 60% of kernel
overhead, this is just the last run I did)
# dso: [kernel.kallsyms]
# Events: 214K cycles
#
# Overhead  Symbol
# ........  ...................................
#
    29.53%  [k] _raw_spin_lock_irqsave
            |
            |--61.57%-- cpupri_set
            |          |
            |          |--81.55%-- dequeue_rt_stack
            |          |          dequeue_task_rt
            |          |          dequeue_task
            |          |          |
            |          |          |--100.00%-- deactivate_task
            |          |          |          __schedule
            |          |          |          schedule
            |          |          |          |
            |          |          |          |--69.94%-- run_ksoftirqd
            |          |          |          |          kthread
            |          |          |          |          kernel_thread_helper
            |          |          |          |
            |          |          |          |--15.23%-- sys_semtimedop
            |          |          |          |          sys_semop
            |          |          |          |          system_call_fastpath
            |          |          |          |          |
            |          |          |          |          |--4.11%-- 0x7f54c025ae37
            |          |          |          |          |          __semop
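A rough reading of the profile above, assuming the call-graph percentages are relative to their parent entry (the thread does not state the callchain mode, so this is an assumption): multiplying down the tree gives the share of sampled kernel cycles spent in `_raw_spin_lock_irqsave` on behalf of `cpupri_set`, and the part of that reached through `dequeue_rt_stack`.

```latex
\begin{align*}
\text{cpupri\_set share} &\approx 0.2953 \times 0.6157 \approx 0.182 \quad (\approx 18.2\%) \\
\text{via dequeue\_rt\_stack} &\approx 0.2953 \times 0.6157 \times 0.8155 \approx 0.148 \quad (\approx 14.8\%)
\end{align*}
```

Under that assumption, close to a fifth of the sampled kernel cycles in this run went to spinning on the lock taken in `cpupri_set`.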
* Re: jitter test scalability problem - my theory
2011-07-29 14:31 jitter test scalability problem - my theory Mike Galbraith
@ 2011-07-29 14:35 ` Peter Zijlstra
2011-07-29 14:47 ` Mike Galbraith
0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2011-07-29 14:35 UTC (permalink / raw)
To: Mike Galbraith; +Cc: RT, rostedt
On Fri, 2011-07-29 at 16:31 +0200, Mike Galbraith wrote:
> So it seems push/pull logic _may_ be my scalability problem. Even
> though there's nothing that can be pushed/pulled, it's hammering a few
> locks from many cores, so cores perturb each other enough despite
> isolation, to fail once enough cores are active.
>
> Does that look like a reasonable explanation for my jitter increase?
It is, Steve actually had patches for that..
* Re: jitter test scalability problem - my theory
2011-07-29 14:35 ` Peter Zijlstra
@ 2011-07-29 14:47 ` Mike Galbraith
0 siblings, 0 replies; 3+ messages in thread
From: Mike Galbraith @ 2011-07-29 14:47 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: RT, rostedt
On Fri, 2011-07-29 at 16:35 +0200, Peter Zijlstra wrote:
> On Fri, 2011-07-29 at 16:31 +0200, Mike Galbraith wrote:
> > So it seems push/pull logic _may_ be my scalability problem. Even
> > though there's nothing that can be pushed/pulled, it's hammering a few
> > locks from many cores, so cores perturb each other enough despite
> > isolation, to fail once enough cores are active.
> >
> > Does that look like a reasonable explanation for my jitter increase?
>
> It is, Steve actually had patches for that..
Thanks for confirming.
I know a guy (me) who would _love_ to test them :) Diagnosing is one
thing, figuring out how to fix such an issue is quite another matter.
-Mike