From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <koverstreet@google.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
linux-kernel@vger.kernel.org, linux-bcache@vger.kernel.org,
dm-devel@redhat.com, linux-fsdevel@vger.kernel.org,
axboe@kernel.dk, yehuda@hq.newdream.net, mpatocka@redhat.com,
vgoyal@redhat.com, bharrosh@panasas.com, tj@kernel.org,
sage@newdream.net, agk@redhat.com, drbd-dev@lists.linbit.com,
Dave Chinner <dchinner@redhat.com>,
tytso@google.com
Subject: Re: [PATCH v3 14/16] Gut bio_add_page()
Date: Tue, 29 May 2012 11:54:38 +1000
Message-ID: <20120529015438.GZ5091@dastard>
In-Reply-To: <20120525210944.GB14196@google.com>
On Fri, May 25, 2012 at 02:09:44PM -0700, Kent Overstreet wrote:
> On Fri, May 25, 2012 at 04:46:51PM -0400, Mike Snitzer wrote:
> > I'd love to see the merge_bvec stuff go away but it does serve a
> > purpose: filesystems benefit from accurately building up much larger
> > bios (based on underlying device limits). XFS has leveraged this for
> > some time and ext4 adopted this (commit bd2d0210cf) because of the
> > performance advantage.
>
> That commit only talks about skipping buffer heads, from the patch
> description I don't see how merge_bvec_fn would have anything to do with
> what it's after.
XFS has used it since 2.6.16, as building our own bios enabled the IO
path to form IOs of sizes that are independent of the filesystem block
size.
http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf
And it's not just the XFS write path that uses bio_add_page - the XFS
metadata read/write IO code uses it as well because we have metadata
constructs that are larger than a single page...
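To illustrate what building IOs independently of the block size buys, here is a rough userspace sketch. It is a toy model, not the kernel API: struct toy_io, toy_add_page() and toy_count_ios() are made-up names, and the only behaviour borrowed from bio_add_page() is the idea that the add fails once the device limit would be exceeded, at which point the caller submits the IO and starts a new one.

```c
#include <stddef.h>

/* Toy stand-in for a bio: it only tracks how many bytes it holds.
 * Hypothetical type for illustration, not a kernel structure. */
struct toy_io {
	size_t bytes;      /* bytes accumulated so far */
	size_t max_bytes;  /* assumed per-IO limit of the underlying device */
};

/* Returns len if the page fit, 0 if adding it would exceed the limit. */
static size_t toy_add_page(struct toy_io *io, size_t len)
{
	if (io->bytes + len > io->max_bytes)
		return 0;
	io->bytes += len;
	return len;
}

/*
 * Pack nr_pages pages of page_size bytes into IOs that are as large as
 * the limit allows. Returns the number of IOs issued, or 0 if page_size
 * is zero or a single page cannot fit in an IO at all.
 */
static size_t toy_count_ios(size_t nr_pages, size_t page_size, size_t max_bytes)
{
	struct toy_io io = { .bytes = 0, .max_bytes = max_bytes };
	size_t issued = 0;

	if (page_size == 0)
		return 0;

	for (size_t i = 0; i < nr_pages; i++) {
		if (toy_add_page(&io, page_size) == page_size)
			continue;
		if (io.bytes == 0)
			return 0;	/* even an empty IO cannot take one page */
		/* IO is full: "submit" it and start a fresh one. */
		issued++;
		io.bytes = 0;
		toy_add_page(&io, page_size);	/* fits: an empty IO took one before */
	}
	if (io.bytes > 0)
		issued++;
	return issued;
}
```

With 4k pages and an assumed 1MB limit, 256 pages go out as a single IO; cap the IO at 4k and the same data takes 256 submissions.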
> > So if you don't have a mechanism for the filesystem's IO to have
> > accurate understanding of the limits of the device the filesystem is
> > built on (merge_bvec was the mechanism) and are leaning on late
> > splitting does filesystem performance suffer?
>
> So is the issue that it may take longer for an IO to complete, or is it
> CPU utilization/scalability?
Both. Moving to this code reduced the CPU overhead per MB of data
written to disk by 80-90%. It also allowed us to build IOs that span
entire RAID stripe widths, thereby avoiding potential RAID RMW
(read-modify-write) cycles, and even allowing high-end RAID
controllers to trigger BBWC (battery-backed write cache) bypass fast
paths that could double or triple the write throughput of the
arrays...
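To make the stripe width point concrete, here is a small userspace sketch of the arithmetic. It is illustrative only: the function names, the sector units and the chunk_sectors/data_disks parameters are assumptions made for the example, not anything taken from md, dm or XFS. The idea is that a write which starts on a stripe boundary and covers whole stripes lets parity be computed from the new data alone, while any partially covered stripe may need old data or parity read back first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only; names and units are assumptions, not kernel API.
 * chunk_sectors: size of one chunk (stripe unit) in sectors
 * data_disks:    number of data-bearing disks, excluding parity
 */
static uint64_t stripe_width_sectors(uint32_t chunk_sectors, uint32_t data_disks)
{
	assert(chunk_sectors > 0 && data_disks > 0);
	return (uint64_t)chunk_sectors * data_disks;
}

/* True if the IO starts on a stripe boundary and covers whole stripes only. */
static bool is_full_stripe_write(uint64_t start, uint64_t len,
				 uint32_t chunk_sectors, uint32_t data_disks)
{
	uint64_t width = stripe_width_sectors(chunk_sectors, data_disks);

	return len > 0 && start % width == 0 && len % width == 0;
}

/* Number of stripes the IO covers only partially (RMW candidates). */
static unsigned int partial_stripes(uint64_t start, uint64_t len,
				    uint32_t chunk_sectors, uint32_t data_disks)
{
	uint64_t width = stripe_width_sectors(chunk_sectors, data_disks);
	uint64_t first, last;
	bool head, tail;

	if (len == 0)
		return 0;

	first = start / width;
	last = (start + len - 1) / width;
	head = start % width != 0;		/* starts mid-stripe */
	tail = (start + len) % width != 0;	/* ends mid-stripe */

	if (first == last)
		return (head || tail) ? 1 : 0;
	return (unsigned int)head + (unsigned int)tail;
}
```

For an assumed array with 256-sector chunks and 4 data disks, the stripe width is 1024 sectors: a 1024-sector write at offset 0 is a full-stripe write, while the same write shifted to offset 512 partially covers two stripes.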
> If it's the former, we've got a real problem.
... then you have a real problem.
> If it's the latter - it
> might be a problem in the interim (I don't expect generic_make_request()
> to be splitting bios in the common case long term), but I doubt it's
> going to be much of an issue.
I think this will also be an issue - the typical sort of throughput
I've been hearing about over the past year for typical HPC
deployments is >20GB/s buffered write throughput to disk on a single
XFS filesystem, and that is typically limited by the flusher thread
being CPU bound. So if your changes have a CPU usage impact, then
these systems will definitely see reduced performance....
> > Would be nice to see before and after XFS and ext4 benchmarks against a
> > RAID device (level 5 or 6). I'm especially interested to get Dave
> > Chinner's and Ted's insight here.
>
> Yeah.
>
> I can't remember who it was, but Ted knows someone who was able to
> benchmark on a 48 core system. I don't think we need numbers from a 48
> core machine for these patches, but whatever workloads they were testing
> that were problematic CPU wise would be useful to test.
Eric Whitney.
http://downloads.linux.hp.com/~enw/ext4/3.2/
His storage hardware probably isn't fast enough to demonstrate the
sort of problems I'm expecting that would occur...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com