* jitter test scalability problem - my theory
From: Mike Galbraith @ 2011-07-29 14:31 UTC
  To: RT; +Cc: Peter Zijlstra

Greetings,

2.6.33.15-rt31 on a 64 core DL980, running a jitter test proggy that
fires up an executive (a model thereof actually) on 56 isolated cores,
which in turn fires up 3 or 4 workers in descending priority.  All tasks
are pinned. Box booted isolcpus=8-63.  Jitter test runs on these, 0-7
handle everything else, including interrupts.

Test proggy is looking to achieve a max jitter of +-30us, and that goal
is met... until I increase the load to > 32 isolated cores.

Profiling, my kernel overhead is mostly (surprise) locks, but it seems
my scalability issue may be coming from...

static void 
inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
{
...
                if (rq->online)
                        cpupri_set(&rq->rd->cpupri, rq->cpu, prio);
      
void cpupri_set(struct cpupri *cp, int cpu, int newpri)
{
... 
        if (likely(newpri != CPUPRI_INVALID)) {
                struct cpupri_vec *vec = &cp->pri_to_cpu[newpri];

                raw_spin_lock_irqsave(&vec->lock, flags);  <== here

So it seems push/pull logic _may_ be my scalability problem.  Even
though there's nothing that can be pushed/pulled, it's hammering a few
locks from many cores, so cores perturb each other enough despite
isolation, to fail once enough cores are active.
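
To make the suspected mechanism concrete, here is a minimal userspace
sketch of how per-priority vectors funnel updates from many CPUs onto a
few locks. It is a model under stated assumptions, not the kernel code:
the model_* names are hypothetical, no real locks are taken (a counter
stands in for each vec->lock acquisition), the 102-entry table is assumed
to mirror CPUPRI_NR_PRIORITIES, and the priority conversion that the real
cpupri_set() performs is skipped.

```c
/* Userspace model only: counts lock acquisitions, takes no real locks. */
#include <assert.h>
#include <string.h>

#define MODEL_NR_PRIORITIES 102  /* assumption: mirrors CPUPRI_NR_PRIORITIES */
#define MODEL_INVALID       (-1) /* stands in for CPUPRI_INVALID */
#define MODEL_MAX_CPUS      64

struct model_vec {
	unsigned long lock_acquisitions; /* stands in for vec->lock traffic */
	int count;                       /* CPUs currently at this priority */
};

struct model_cpupri {
	struct model_vec pri_to_cpu[MODEL_NR_PRIORITIES];
	int cpu_to_pri[MODEL_MAX_CPUS];
};

static void model_init(struct model_cpupri *cp)
{
	memset(cp, 0, sizeof(*cp));
	for (int cpu = 0; cpu < MODEL_MAX_CPUS; cpu++)
		cp->cpu_to_pri[cpu] = MODEL_INVALID;
}

/* Move one CPU to a new priority: touch the new vector, then the old one. */
static void model_cpupri_set(struct model_cpupri *cp, int cpu, int newpri)
{
	assert(cpu >= 0 && cpu < MODEL_MAX_CPUS);
	assert(newpri >= MODEL_INVALID && newpri < MODEL_NR_PRIORITIES);

	int oldpri = cp->cpu_to_pri[cpu];

	if (newpri == oldpri)
		return;

	if (newpri != MODEL_INVALID) {
		struct model_vec *vec = &cp->pri_to_cpu[newpri];

		vec->lock_acquisitions++; /* "lock" the new vector */
		vec->count++;
	}
	if (oldpri != MODEL_INVALID) {
		struct model_vec *vec = &cp->pri_to_cpu[oldpri];

		vec->lock_acquisitions++; /* "lock" the old vector */
		vec->count--;
	}
	cp->cpu_to_pri[cpu] = newpri;
}

/*
 * Every CPU in [first_cpu, last_cpu] toggles between two priorities, as
 * pinned tasks that wake and sleep would. Returns the number of lock
 * acquisitions that landed on just those two vectors.
 */
static unsigned long model_run(struct model_cpupri *cp, int first_cpu,
			       int last_cpu, int prio_a, int prio_b, int rounds)
{
	for (int r = 0; r < rounds; r++) {
		for (int cpu = first_cpu; cpu <= last_cpu; cpu++) {
			model_cpupri_set(cp, cpu, prio_a);
			model_cpupri_set(cp, cpu, prio_b);
		}
	}
	return cp->pri_to_cpu[prio_a].lock_acquisitions +
	       cp->pri_to_cpu[prio_b].lock_acquisitions;
}
```

Calling model_init() and then model_run(&cp, 8, 63, 50, 1, 1000) (the two
priority indices are arbitrary picks for illustration) puts every update
from all 56 CPUs on the same two vectors while the other 100 stay
untouched. With real spinlocks, each of those updates would pull the same
lock cachelines across cores even though no task ever migrates.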

Does that look like a reasonable explanation for my jitter increase?

I'm going to hack up a test for my wild theory, but since I don't have a
lot of experience hunting jitter sources, looking for a few lousy usecs,
it couldn't hurt to ask whether I'm barking up the wrong tree or not :)


(In a previous run, _raw_spin_lock_irqsave() was > 60% of kernel
overhead; the profile below is just from the last run I did.)

# dso: [kernel.kallsyms]
# Events: 214K cycles
#
# Overhead                               Symbol
# ........  ...................................
#
    29.53%  [k] _raw_spin_lock_irqsave
            |
            |--61.57%-- cpupri_set
            |          |
            |          |--81.55%-- dequeue_rt_stack
            |          |          dequeue_task_rt
            |          |          dequeue_task
            |          |          |
            |          |          |--100.00%-- deactivate_task
            |          |          |          __schedule
            |          |          |          schedule
            |          |          |          |
            |          |          |          |--69.94%-- run_ksoftirqd
            |          |          |          |          kthread
            |          |          |          |          kernel_thread_helper
            |          |          |          |
            |          |          |          |--15.23%-- sys_semtimedop
            |          |          |          |          sys_semop
            |          |          |          |          system_call_fastpath
            |          |          |          |          |
            |          |          |          |          |--4.11%-- 0x7f54c025ae37
            |          |          |          |          |          __semop




* Re: jitter test scalability problem - my theory
From: Peter Zijlstra @ 2011-07-29 14:35 UTC
  To: Mike Galbraith; +Cc: RT, rostedt

On Fri, 2011-07-29 at 16:31 +0200, Mike Galbraith wrote:
> So it seems push/pull logic _may_ be my scalability problem.  Even
> though there's nothing that can be pushed/pulled, it's hammering a few
> locks from many cores, so cores perturb each other enough despite
> isolation, to fail once enough cores are active.
> 
> Does that look like a reasonable explanation for my jitter increase? 

It is, Steve actually had patches for that.. 


* Re: jitter test scalability problem - my theory
From: Mike Galbraith @ 2011-07-29 14:47 UTC
  To: Peter Zijlstra; +Cc: RT, rostedt

On Fri, 2011-07-29 at 16:35 +0200, Peter Zijlstra wrote:
> On Fri, 2011-07-29 at 16:31 +0200, Mike Galbraith wrote:
> > So it seems push/pull logic _may_ be my scalability problem.  Even
> > though there's nothing that can be pushed/pulled, it's hammering a few
> > locks from many cores, so cores perturb each other enough despite
> > isolation, to fail once enough cores are active.
> > 
> > Does that look like a reasonable explanation for my jitter increase? 
> 
> It is, Steve actually had patches for that..

Thanks for confirming. 

I know a guy (me) who would _love_ to test them :)  Diagnosing is one
thing, figuring out how to fix such an issue is quite another matter.

	-Mike

