From: Wu Fengguang <wfg@linux.intel.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
Herbert Poetzl <herbert@13thfloor.at>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
Tejun Heo <tj@kernel.org>
Subject: Re: Bad SSD performance with recent kernels
Date: Tue, 31 Jan 2012 09:07:55 +0800
Message-ID: <20120131010755.GA12776@localhost>
In-Reply-To: <1327968859.21268.12.camel@sli10-conroe>
On Tue, Jan 31, 2012 at 08:14:19AM +0800, Li, Shaohua wrote:
> On Mon, 2012-01-30 at 17:26 -0500, Vivek Goyal wrote:
> > On Mon, Jan 30, 2012 at 03:51:49PM +0100, Eric Dumazet wrote:
> > > On Monday, January 30, 2012 at 22:28 +0800, Wu Fengguang wrote:
> > > > On Mon, Jan 30, 2012 at 06:31:34PM +0800, Li, Shaohua wrote:
> > > >
> > > > > Looks like the 2.6.39 block plug introduces some latency here. Deleting
> > > > > blk_start_plug/blk_finish_plug in generic_file_aio_read seems to work
> > > > > around the issue. The plug does not seem good for sequential IO, because
> > > > > the readahead code already has a plug and has fine-grained control.
> > > >
> > > > Why not remove the generic_file_aio_read() plug completely? It
> > > > actually prevents unplugging immediately after the readahead IO is
> > > > submitted and in turn stalls the IO pipeline, as shown by Eric's
> > > > blktrace data.
> > > >
> > > > Eric, will you test this patch? Thank you.
> >
> > Can you please run blktrace again with this patch applied? I am curious
> > to see what the traffic pattern looks like now.
> >
> > In your previous trace, there were many small 8-sector requests which
> > were merged into 512-sector requests before dispatching to disk. (I am
> > not sure why those requests are not bigger. Shouldn't the readahead logic
> > submit bigger requests?) Now with the plug/unplug logic removed, I am
> > assuming we should be doing less merging and dispatching more, smaller
> > requests. Maybe that is helping by cutting down on disk idle time.
> >
> > In the previous logs, a 512-sector request seems to take around 1 ms to
> > complete after dispatch. Between requests the disk seems to be idle
> > for around 0.5 to 0.6 ms. Of this, about 0.3 ms seems to go into just
> > coming up with a new request after the previous one completes, and another
> > 0.3 ms seems to be consumed in merging the smaller IOs. So if we don't wait
> > for merging, the disk should stay busy for 0.3 ms more, which is 30% of the
> > time it takes to complete a 512-sector request. So theoretically it can give
> > a 30% boost for this workload (assuming request size does not impact the
> > disk throughput very severely).
> >
> > Anyway, some blktrace data will shed some light.
> Yep, I suspect the plug merges big requests too (iostat shows it too);
> that's why I only consider deleting the plug in generic_file_aio_read a
> workaround.
It's good to merge requests inside the same readahead window. However
I don't think readahead window A should be merged with B at the cost
of delaying A for some time, which will break the pipeline. If larger
IO is desirable, we can do so by increasing the readahead size.
> I still think readahead has something to do with this. I observed
> that async readahead issues readahead for (A, A+2M) and then follows with
> (A+128k, A+2M), (A+256k, A+2M), ...; the later readaheads have no effect
> because we already have (A, A+2M) in memory by then. Anyway, I can
> reproduce the issue and will play with it more today.
How did you observe that? I don't think that readahead pattern is
possible. However, I do see such _read_ patterns.
Thanks,
Fengguang