From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: john stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
Nick Piggin <npiggin@suse.de>, Darren Hart <dvhltc@us.ibm.com>,
Clark Williams <williams@redhat.com>,
Dinakar Guniguntala <dino@in.ibm.com>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: -rt dbench scalabiltiy issue
Date: Fri, 16 Oct 2009 17:45:05 -0700
Message-ID: <20091017004505.GI6720@linux.vnet.ibm.com>
In-Reply-To: <1255723519.5135.121.camel@localhost.localdomain>
On Fri, Oct 16, 2009 at 01:05:19PM -0700, john stultz wrote:
> See http://lwn.net/Articles/354690/ for a bit of background here.
>
> I've been looking at scalability regressions in the -rt kernel. One easy
> place to see regressions is with the dbench benchmark. While dbench can
> be painfully noisy from run to run, it does clearly show some severe
> regressions with -rt.
>
> There's a chart in the article above that illustrates this, but here's
> some specific numbers on an 8-way box running dbench-3.04 as follows:
>
> ./dbench 8 -t 10 -D . -c client.txt 2>&1
>
> I ran both on an ext3 disk and a ramfs mounted directory.
>
> (Again, the numbers are VERY rough due to the run-to-run variance seen)
>
>                   ext3            ramfs
> 2.6.32-rc3:       ~1800 MB/sec    ~1600 MB/sec
> 2.6.31.2-rt13:    ~300 MB/sec     ~66 MB/sec
>
> Ouch. Similar to the charts in the LWN article.
>
> Dino pointed out that using lockstat with -rt, we can see the
> dcache_lock is fairly hot with the -rt kernel. One of the issues with
> the -rt tree is that the change from spinlocks to sleeping-spinlocks
> doesn't affect the un-contended case very much, but when there is
> contention on the lock, the overhead is much worse than with vanilla.
>
> And as noted at the realtime mini-conf, Ingo saw this dcache_lock
> bottleneck as well and suggested trying Nick Piggin's dcache_lock
> removal patches.
>
> So over the last week, I've ported Nick's fs-scale patches to -rt.
>
> Specifically the tarball found here:
> ftp://ftp.kernel.org/pub/linux/kernel/people/npiggin/patches/fs-scale/06102009.tar.gz
>
>
> Due to the 2.6.32 / 2.6.31-rt split, the port wasn't exactly
> straightforward, but I believe I managed to do a decent job. Once I had
> the patchset applied, built, and booted, I eagerly ran dbench to see the
> new results, aaaaaand.....
>
>                       ext3           ramfs
> 2.6.31.2-rt13-nick:   ~80 MB/sec     ~126 MB/sec
>
>
> So yea, mixed bag there. The ramfs got a little bit better but not that
> much, and the ext3 numbers regressed further.

OK, I will ask the stupid question...  What happens if you run on ext2?

							Thanx, Paul

> I then looked into the perf tool, to see if it would shed some light on
> what's going on (snipped results below).
>
> 2.6.31.2-rt13 on ext3:
> 42.45% dbench [kernel] [k] _atomic_spin_lock_irqsave
> |
> |--85.61%-- rt_spin_lock_slowlock
> | rt_spin_lock
> | |
> | |--23.91%-- start_this_handle
> | | journal_start
> | | ext3_journal_start_sb
> | |--21.29%-- journal_stop
> | |
> | |--13.80%-- ext3_test_allocatable
> | |
> | |--12.15%-- bitmap_search_next_usable_block
> | |
> | |--9.79%-- journal_put_journal_head
> | |
> | |--5.93%-- journal_add_journal_head
> | |
> | |--2.59%-- atomic_dec_and_spin_lock
> | | dput
> | | |
> | | |--65.31%-- path_put
> | | | |
> | | | |--53.37%-- __link_path_walk
> ...
>
> So this is initially interesting, as it seems that on ext3 the journal
> locking is really what's catching us, more than the dcache_lock.
> Am I reading this right?
>
>
> 2.6.31.2-rt13 on ramfs:
> 45.98% dbench [kernel] [k] _atomic_spin_lock_irqsave
> |
> |--82.94%-- rt_spin_lock_slowlock
> | rt_spin_lock
> | |
> | |--61.18%-- dcache_readdir
> | | vfs_readdir
> | | sys_getdents
> | | system_call_fastpath
> | | __getdents64
> | |
> | |--11.26%-- atomic_dec_and_spin_lock
> | | dput
> | |
> | |--7.93%-- d_path
> | | seq_path
> | | show_vfsmnt
> | | seq_read
> | | vfs_read
> | | sys_read
> | | system_call_fastpath
> | | __GI___libc_read
> | |
>
>
> So here we do see the dcache_readdir's use of the dcache lock pop up to
> the top. And with ramfs we don't see any of the ext3 journal code.
>
> Next up is with Nick's patchset:
>
> 2.6.31.2-rt13-nick on ext3:
> 45.48% dbench [kernel] [k] _atomic_spin_lock_irqsave
> |
> |--83.40%-- rt_spin_lock_slowlock
> | |
> | |--100.00%-- rt_spin_lock
> | | |
> | | |--43.35%-- dput
> | | | |
> | | | |--50.29%-- __link_path_walk
> | | | --49.71%-- path_put
> | | |--39.07%-- path_get
> | | | |
> | | | |--61.98%-- path_walk
> | | | |--38.01%-- path_init
> | | |
> | | |--7.33%-- journal_put_journal_head
> | | |
> | | |--4.32%-- journal_add_journal_head
> | | |
> | | |--2.83%-- start_this_handle
> | | | journal_start
> | | | ext3_journal_start_sb
> | | |
> | | |--2.52%-- journal_stop
> |
> |--15.87%-- rt_spin_lock_slowunlock
> | rt_spin_unlock
> | |
> | |--43.48%-- path_get
> | |
> | |--41.80%-- dput
> | |
> | |--5.34%-- journal_add_journal_head
> ...
>
> With Nick's patches on ext3, it seems dput()'s locking is the bottleneck
> more than the journal code (maybe due to the multiple spinning nested
> trylocks?).
>
> With the ramfs, it looks mostly the same, but without the journal calls:
>
> 2.6.31.2-rt13-nick on ramfs:
> 46.51% dbench [kernel] [k] _atomic_spin_lock_irqsave
> |
> |--86.95%-- rt_spin_lock_slowlock
> | rt_spin_lock
> | |
> | |--50.08%-- dput
> | | |
> | | |--56.92%-- __link_path_walk
> | | |
> | | --43.08%-- path_put
> | |
> | |--49.12%-- path_get
> | | |
> | | |--63.22%-- path_walk
> | | |
> | | |--36.73%-- path_init
> |
> |--12.59%-- rt_spin_lock_slowunlock
> | rt_spin_unlock
> | |
> | |--49.86%-- path_get
> | | |
> | | |--58.15%-- path_init
> | | | |
> ...
>
>
> So the net of this is: Nick's patches helped some but not that much in
> ramfs filesystems, and hurt ext3 performance w/ -rt.
>
> Maybe I just mis-applied the patches? I'll admit I'm unfamiliar with the
> dcache code, and converting the patches to the -rt tree was not always
> straightforward.
>
> Or maybe these results are expected? With Nick's patch against
> 2.6.32-rc3 I got:
>
>                    ext3            ramfs
> 2.6.32-rc3-nick    ~1800 MB/sec    ~2200 MB/sec
>
> So ext3 performance didn't change, but ramfs did see a nice bump. Maybe
> Nick's patches helped where they could, but we still have other
> contention points that are problematic with -rt's lock slowpath
> overhead?
>
>
> Ingo, Nick, Thomas: Any thoughts or comments here? Am I reading perf's
> results incorrectly? Any idea why with Nick's patch the contention in
> dput() hurts ext3 so much worse than in the ramfs case?
>
>
> I'll be doing some further tests today w/ ext2 to see if getting the
> journal code out of the way shows any benefit. But if folks have any
> insight or suggestions for other ideas to look at please let me know.
>
> thanks
> -john
>
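
[Editor's sketch, not part of the original message.] On the "Am I reading
this right?" question: in the snipped perf output, sibling entries sum to
roughly 100% (for example 83.40% + 15.87% under _atomic_spin_lock_irqsave
in the -nick ext3 run), which suggests each percentage is relative to its
parent entry rather than to total samples. Assuming that reading holds, an
entry's share of all samples is the product of the percentages along its
path. The helper name absolute_share is hypothetical; the numbers are
copied from the quoted output.

```python
from math import prod


def absolute_share(path_percentages):
    """Multiply parent-relative percentages along one call path.

    Returns the share of total samples, in percent. Assumes every value
    is relative to its parent entry; that is an inference from the quoted
    output, not something the thread states.
    """
    return 100.0 * prod(p / 100.0 for p in path_percentages)


# 2.6.31.2-rt13 on ext3: symbol -> rt_spin_lock_slowlock -> caller
rt13_ext3_journal_start = absolute_share([42.45, 85.61, 23.91])
rt13_ext3_dput = absolute_share([42.45, 85.61, 2.59])

# 2.6.31.2-rt13-nick on ext3: symbol -> slowlock -> rt_spin_lock -> caller
nick_ext3_dput = absolute_share([45.48, 83.40, 100.00, 43.35])
nick_ext3_path_get = absolute_share([45.48, 83.40, 100.00, 39.07])

for label, value in [
    ("rt13 ext3 start_this_handle", rt13_ext3_journal_start),
    ("rt13 ext3 dput", rt13_ext3_dput),
    ("nick ext3 dput", nick_ext3_dput),
    ("nick ext3 path_get", nick_ext3_path_get),
]:
    print(f"{label}: {value:.2f}% of samples")
```

Under that assumption, the start_this_handle path accounts for roughly
8.7% of all samples on plain -rt13 ext3 while dput is under 1%, whereas
with Nick's patches dput alone rises to about 16%. That is consistent
with the reading in the message: journal locking dominates before the
patches, dput()/path_get() locking after.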
Thread overview: 17+ messages
2009-10-16 20:05 -rt dbench scalabiltiy issue john stultz
2009-10-17 0:45 ` Paul E. McKenney [this message]
2009-10-17 1:03 ` john stultz
2009-10-17 1:37 ` john stultz
2009-10-17 23:06 ` Nick Piggin
2009-10-17 22:39 ` Nick Piggin
2009-11-18 1:28 ` john stultz
2009-11-18 4:25 ` Nick Piggin
2009-11-18 10:19 ` Thomas Gleixner
2009-11-18 10:52 ` Nick Piggin
2009-11-20 2:22 ` john stultz
2009-11-23 9:06 ` Nick Piggin
2009-11-25 2:16 ` john stultz
2009-11-25 7:18 ` Nick Piggin
2009-11-25 22:20 ` john stultz
2009-11-26 6:20 ` Nick Piggin
2009-12-02 1:53 ` john stultz