From: Jens Axboe <jens.axboe@oracle.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>, mingo@elte.hu
Subject: Re: find_busiest_group using lots of CPU
Date: Tue, 6 Oct 2009 09:51:27 +0200
Message-ID: <20091006075127.GF5216@kernel.dk>
In-Reply-To: <1254745898.26976.52.camel@twins>

On Mon, Oct 05 2009, Peter Zijlstra wrote:
> On Wed, 2009-09-30 at 10:18 +0200, Jens Axboe wrote:
> > Hi,
> > 
> > I stuffed a few more SSDs into my test box. Running a simple workload
> > that just does streaming reads from 10 processes (throughput is around
> > 2.2GB/sec), find_busiest_group() is using > 10% of the CPU time. This is
> > a 64 thread box.
> > 
> > The top two profile entries are:
> > 
> >     10.86%      fio  [kernel]                [k] find_busiest_group
> >                 |          
> >                 |--99.91%-- thread_return
> >                 |          io_schedule
> >                 |          sys_io_getevents
> >                 |          system_call_fastpath
> >                 |          0x7f4b50b61604
> >                 |          |          
> >                 |           --100.00%-- td_io_getevents
> >                 |                     io_u_queued_complete
> >                 |                     thread_main
> >                 |                     run_threads
> >                 |                     main
> >                 |                     __libc_start_main
> >                  --0.09%-- [...]
> > 
> >      5.78%      fio  [kernel]                [k] cpumask_next_and
> >                 |          
> >                 |--67.21%-- thread_return
> >                 |          io_schedule
> >                 |          sys_io_getevents
> >                 |          system_call_fastpath
> >                 |          0x7f4b50b61604
> >                 |          |          
> >                 |           --100.00%-- td_io_getevents
> >                 |                     io_u_queued_complete
> >                 |                     thread_main
> >                 |                     run_threads
> >                 |                     main
> >                 |                     __libc_start_main
> >                 |          
> >                  --32.79%-- find_busiest_group
> >                            thread_return
> >                            io_schedule
> >                            sys_io_getevents
> >                            system_call_fastpath
> >                            0x7f4b50b61604
> >                            |          
> >                             --100.00%-- td_io_getevents
> >                                       io_u_queued_complete
> >                                       thread_main
> >                                       run_threads
> >                                       main
> >                                       __libc_start_main
> > 
> > This is with SCHED_DEBUG=y and SCHEDSTATS=y enabled, I just tried with
> > both disabled but that yields the same result (well actually worse, 22%
> > spent in there. dunno if that's normal "fluctuation"). GROUP_SCHED is
> > not set. This seems way excessive!
> 
> io_schedule() straight into find_busiest_group() leads me to think this
> could be SD_BALANCE_NEWIDLE, does something like:
> 
> for i in /proc/sys/kernel/sched_domain/cpu*/domain*/flags;
> do
> 	val=$(cat "$i"); echo $((val & ~0x02)) > "$i";
> done
> 
> [ assuming SCHED_DEBUG=y ]
> 
> Cure things?

I can try. As mentioned, it doesn't look any better with SCHED_DEBUG=n.
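
For reference, here is a minimal sketch of the same bit-clearing written so
the old values are printed and can be restored afterwards. The helper names
clear_bits and clear_flag_files are hypothetical and not part of the
suggestion above; the 0x02 mask is simply the one used in the quoted loop,
and the /proc path only exists with SCHED_DEBUG=y.

```shell
#!/bin/sh
# Hypothetical helper: print VALUE ($1) with the bits in MASK ($2) cleared.
clear_bits() {
    echo $(( $1 & ~$2 ))
}

# Hypothetical helper: clear MASK ($1) in every flags file passed after it.
# Prints "path old new" per file so the old value can be written back later.
clear_flag_files() {
    mask=$1
    shift
    for f in "$@"; do
        # Skip unmatched globs and files we cannot write to.
        [ -w "$f" ] || continue
        old=$(cat "$f")
        new=$(clear_bits "$old" "$mask")
        echo "$new" > "$f"
        echo "$f $old $new"
    done
}

# Usage (as root; the path is the one from the quoted loop):
#   clear_flag_files 0x02 /proc/sys/kernel/sched_domain/cpu*/domain*/flags
```

Saving the printed "path old new" lines somewhere makes it easy to put the
original flags back once the test run is done.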

> If so, then it's spending time looking for work, which there might not be
> on your machine, since everything is waiting for IO or somesuch.

OK, just seems way excessive for something which is only 10 tasks and
not even that context switch intensive.

> Not really sure what to do about it though, this is a quad socket
> nehalem, right? We could possibly disable SD_BALANCE_NEWIDLE on the NODE
> level, but that would again decrease throughput in things like kbuild.

Yes, it's a quad socket nehalem. I'll see if disabling NEWIDLE makes a
difference; I need to run some other tests on that box today anyway.

-- 
Jens Axboe

