linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	Ralf Gross <rg@STZ-Softwaretechnik.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: io-scheduler tuning for better read/write ratio
Date: Fri, 26 Jun 2009 12:44:06 +0200
Message-ID: <20090626104406.GK23611@kernel.dk>
In-Reply-To: <20090626021905.GA23981@localhost>

On Fri, Jun 26 2009, Wu Fengguang wrote:
> On Tue, Jun 23, 2009 at 03:42:46AM +0800, Jeff Moyer wrote:
> > Ralf Gross <rg@STZ-Softwaretechnik.com> writes:
> > 
> > > Jeff Moyer schrieb:
> > >> Jeff Moyer <jmoyer@redhat.com> writes:
> > >> 
> > >> > Ralf Gross <rg@stz-softwaretechnik.com> writes:
> > >> >
> > >> >> Casey Dahlin schrieb:
> > >> >>> On 06/16/2009 02:40 PM, Ralf Gross wrote:
> > >> >>> > David Newall schrieb:
> > >> >>> >> Ralf Gross wrote:
> > >> >>> >>> write throughput is much higher than the read throughput (40 MB/s
> > >> >>> >>> read, 90 MB/s write).
> > >> >>> > 
> > >> >>> > Hm, but I get higher read throughput (160-200 MB/s) if I don't write
> > >> >>> > to the device at the same time.
> > >> >>> > 
> > >> >>> > Ralf
> > >> >>> 
> > >> >>> How specifically are you testing? It could depend a lot on the
> > >> >>> particular access patterns you're using to test.
> > >> >>
> > >> >> I did the basic tests with tiobench. The real test is a test backup
> > >> >> (bacula) with 2 jobs that create 2 30 GB spool files on that device.
> > >> >> The jobs partially write to the device in parallel. Depending on which
> > >> >> spool file reaches 30 GB first, one job starts reading from that file
> > >> >> and writing to tape, while the other is still spooling.
> > >> >
> > >> > We are missing a lot of details, here.  I guess the first thing I'd try
> > >> > would be bumping up the max_readahead_kb parameter, since I'm guessing
> > >> > that your backup application isn't driving very deep queue depths.  If
> > >> > that doesn't work, then please provide exact invocations of tiobench
> > >> > that reproduce the problem or some blktrace output for your real test.
> > >> 
> > >> Any news, Ralf?
> > >
> > > sorry for the delay. atm there are large backups running and using the
> > > raid device for spooling. So I can't do any tests.
> > >
> > > Re. read ahead: I tested different settings from 8 KB to 65 KB; this
> > > didn't help.
> > >
> > > I'll do some more tests when the backups are done (3-4 more days).
> > 
> > The default is 128KB, I believe, so it's strange that you would test
> > smaller values.  ;)  I would try something along the lines of 1 or 2 MB.
> > 
> > I'm CCing Fengguang in case he has any suggestions.
> 
> Jeff, thank you for the forwarding (and sorry for the long delay)!
> 
> The read:write (or rather sync:async) ratio control is an IO scheduler
> feature. CFQ has parameters slice_sync and slice_async for that.
> What's more, CFQ will let async IO wait if there is any sync IO in
> flight. This is good, but not quite enough. Normally sync IOs come
> one by one, with a small idle time window in between. If we only
> start dispatching async IOs after the last sync IO has been completed
> for e.g. 1ms, then we can hold back the async background write IOs
> while there is an active sync foreground read IO stream.
> 
> This simple patch aims to address the writes-push-aside-reads problem.
> Ralf, you can try applying this patch and run your workload with this
> (huge) CFQ parameter:
> 
>         echo 1000 > /sys/block/sda/queue/iosched/slice_sync 
> 
> The patch is based on 2.6.30, but can be trivially backported if you
> want to use an older kernel.
> 
> It may impact overall (sync+async) IO throughput when there are one or
> more ongoing sync IO streams, so it requires considerable benchmarking
> and tuning.
> 
> Thanks,
> Fengguang
> ---
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a55a9bd..14011b7 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1064,7 +1064,6 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
>  	if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
>  		return;
>  
> -	WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
>  	WARN_ON(cfq_cfqq_slice_new(cfqq));
>  
>  	/*
> @@ -2175,8 +2174,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  	 * or if we want to idle in case it has no pending requests.
>  	 */
>  	if (cfqd->active_queue == cfqq) {
> -		const bool cfqq_empty = RB_EMPTY_ROOT(&cfqq->sort_list);
> -
>  		if (cfq_cfqq_slice_new(cfqq)) {
>  			cfq_set_prio_slice(cfqd, cfqq);
>  			cfq_clear_cfqq_slice_new(cfqq);
> @@ -2190,8 +2187,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  		 */
>  		if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
>  			cfq_slice_expired(cfqd, 1);
> -		else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
> -			 sync && !rq_noidle(rq))
> +		else if (sync && !rq_noidle(rq) &&
> +			 !cfq_close_cooperator(cfqd, cfqq, 1))
>  			cfq_arm_slice_timer(cfqd);
>  	}

What's the purpose of this patch? If you have requests pending, you don't
want to arm the idle timer and wait; you want to dispatch those.

-- 
Jens Axboe



Thread overview: 4+ messages
     [not found] <20090616154342.GA7043@p15145560.pureserver.info>
     [not found] ` <4A37CB2A.6010209@davidnewall.com>
     [not found]   ` <20090616184027.GB7043@p15145560.pureserver.info>
     [not found]     ` <4A37E7DB.7030100@redhat.com>
     [not found]       ` <20090616185600.GC7043@p15145560.pureserver.info>
     [not found]         ` <x49d494c4u0.fsf@segfault.boston.devel.redhat.com>
     [not found]           ` <x49fxdsl46y.fsf@segfault.boston.devel.redhat.com>
     [not found]             ` <20090622163113.GD12483@p15145560.pureserver.info>
     [not found]               ` <x49hby8jbrd.fsf@segfault.boston.devel.redhat.com>
2009-06-26  2:19                 ` io-scheduler tuning for better read/write ratio Wu Fengguang
2009-06-26 10:44                   ` Jens Axboe [this message]
2009-06-27  3:46                     ` Wu Fengguang
2009-06-29  9:47                       ` Ralf Gross
