From: Jeff Moyer <jmoyer@redhat.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Christoph Hellwig <hch@infradead.org>,
Chris Mason <chris.mason@oracle.com>,
Andrea Arcangeli <aarcange@redhat.com>, Jan Kara <jack@suse.cz>,
Boaz Harrosh <bharrosh@panasas.com>,
Mike Snitzer <snitzer@redhat.com>,
"linux-scsi\@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"neilb\@suse.de" <neilb@suse.de>,
"dm-devel\@redhat.com" <dm-devel@redhat.com>,
"linux-fsdevel\@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"lsf-pc\@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"Darrick J.Wong" <djwong@us.ibm.com>
Subject: Re: [dm-devel] [Lsf-pc] [LSF/MM TOPIC] a few storage topics
Date: Tue, 24 Jan 2012 13:05:50 -0500
Message-ID: <x49fwf5kmbl.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <186EA560-1720-4975-AC2F-8C72C4A777A9@dilger.ca> (Andreas Dilger's message of "Tue, 24 Jan 2012 10:08:47 -0700")
Andreas Dilger <adilger@dilger.ca> writes:
> On 2012-01-24, at 9:56, Christoph Hellwig <hch@infradead.org> wrote:
>> On Tue, Jan 24, 2012 at 10:15:04AM -0500, Chris Mason wrote:
>>> https://lkml.org/lkml/2011/12/13/326
>>>
>>> This patch is another example, although for a slightly different reason.
>>> I really have no idea yet what the right answer is in a generic sense,
>>> but you don't need a 512K request to see higher latencies from merging.
>>
>> That assumes the 512k request is created by merging. We have enough
>> workloads that create large I/O from the get-go, and not splitting
>> them only to merge them again later would be a big win. E.g., I'm
>> currently looking at a distributed block device which uses internal
>> 4MB chunks, and increasing the maximum request size to match
>> dramatically increases read performance.
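
For what it's worth, the cap being bumped there is just a sysfs knob.
A minimal sketch, with an illustrative device name; note that the soft
cap can't be raised past the hardware limit in max_hw_sectors_kb:

    cat /sys/block/sdb/queue/max_hw_sectors_kb   # hardware ceiling
    cat /sys/block/sdb/queue/max_sectors_kb      # current soft cap
    # raise the soft cap to 4MB, assuming the hardware allows it
    echo 4096 > /sys/block/sdb/queue/max_sectors_kb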
>
> (sorry about last email, hit send by accident)
>
> I don't think we can have a "one size fits all" policy here. In most
> RAID devices the IO size needs to be at least 1MB, and with newer
> devices 4MB gives better performance.
Right, and there's more to it than just I/O size. There's access
pattern, and more importantly, workload and related requirements
(latency vs throughput).
> One of the reasons that Lustre used to hack so much around the VFS and
> VM APIs is exactly to avoid splitting read/write requests into pages
> and then depending on the elevator to reconstruct a good-sized IO out
> of them.
>
> Things have gotten better with newer kernels, but there is still a
> way to go w.r.t. allowing large IO requests to pass unhindered
> through to disk (or at least as far as ensuring that the IO is
> aligned to the underlying disk geometry).
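
As an aside, those alignment targets are at least discoverable these
days: the block layer has exported I/O topology hints through sysfs
since 2.6.31. A quick way to check (device name is illustrative):

    cat /sys/block/sdb/queue/logical_block_size   # smallest addressable unit
    cat /sys/block/sdb/queue/physical_block_size  # native media sector size
    cat /sys/block/sdb/queue/minimum_io_size      # preferred minimum (e.g. RAID chunk)
    cat /sys/block/sdb/queue/optimal_io_size      # preferred I/O size (e.g. full stripe)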
I've been wondering whether it has gotten better, so I decided to run
a few quick tests.

Kernel: 3.2.0; storage: HP EVA FC array; I/O scheduler: CFQ;
max_sectors_kb: 1024; test program: dd.
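
The runs looked roughly like the following (a sketch; the device name,
mount point, and sizes are illustrative, and blktrace/blkparse is one
way to watch what actually reaches the scheduler):

    # in one terminal, watch request sizes at the I/O scheduler
    blktrace -d /dev/sdb -o - | blkparse -i -

    # buffered 1MB writes
    dd if=/dev/zero of=/mnt/test/file bs=1M count=1024

    # buffered O_SYNC writes
    dd if=/dev/zero of=/mnt/test/file bs=1M count=1024 oflag=sync

    # buffered 1MB reads (drop the page cache first)
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/test/file of=/dev/null bs=1M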
ext3:
- buffered writes and buffered O_SYNC writes, all with a 1MB block
  size, show 4KB I/Os passed down to the I/O scheduler
- buffered 1MB reads are a little better, typically in the 128KB-256KB
  range when they hit the I/O scheduler

ext4:
- buffered writes: 512KB I/Os show up at the elevator
- buffered O_SYNC writes: data is again 512KB, journal writes are 4KB
- buffered 1MB reads get down to the scheduler in 128KB chunks

xfs:
- buffered writes: 1MB I/Os show up at the elevator
- buffered O_SYNC writes: 1MB I/Os
- buffered 1MB reads: 128KB chunks show up at the I/O scheduler
So, ext4 is doing better than ext3, but still not perfect. xfs is
kicking ass for writes, but reads are still split up.
Cheers,
Jeff