public inbox for linux-kernel@vger.kernel.org
From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Jeff Moyer <jmoyer@redhat.com>, Shaohua Li <shaohua.li@intel.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Subject: Re: [PATCH 2/2] cfq-iosched: rethink seeky detection for SSDs
Date: Wed, 3 Mar 2010 18:28:32 -0500
Message-ID: <20100303232832.GD5230@redhat.com> (raw)
In-Reply-To: <4e5e476b1003031147n6a53b646k418483a93013d77c@mail.gmail.com>

On Wed, Mar 03, 2010 at 08:47:31PM +0100, Corrado Zoccolo wrote:
> Hi Vivek,
> On Mon, Mar 1, 2010 at 3:25 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Sat, Feb 27, 2010 at 07:45:40PM +0100, Corrado Zoccolo wrote:
> >> CFQ currently applies the same logic of detecting seeky queues and
> >> grouping them together for rotational disks as well as SSDs.
> >> For SSDs, the time to complete a request doesn't depend on the
> >> request location, but only on the size.
> >> This patch therefore changes the criterion to group queues by
> >> request size in case of SSDs, in order to achieve better fairness.
> >
> > Hi Corrado,
> >
> > Can you give some numbers regarding how are you measuring fairness and
> > how did you decide that we achieve better fairness?
> >
> Please see the attached fio script. It benchmarks pairs of processes
> performing direct random I/O.
> One is always fixed at bs=4k, while I vary the other from 8K to 64K:
> test00: (g=0): rw=randread, bs=8K-8K/8K-8K, ioengine=sync, iodepth=1
> test01: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> test10: (g=1): rw=randread, bs=16K-16K/16K-16K, ioengine=sync, iodepth=1
> test11: (g=1): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> test20: (g=2): rw=randread, bs=32K-32K/32K-32K, ioengine=sync, iodepth=1
> test21: (g=2): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> test30: (g=3): rw=randread, bs=64K-64K/64K-64K, ioengine=sync, iodepth=1
> test31: (g=3): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
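> 
> [Editor's note: the attachment is not preserved in this archive. A job
> file along the following lines -- reconstructed from the output above;
> the device path and runtime are guesses -- would produce the four
> groups shown:]
> 
```ini
; Reconstructed fio job file (hypothetical): four sequentially run
; groups, each pairing a 4K random reader with a larger-block reader.
[global]
rw=randread
direct=1
ioengine=sync
iodepth=1
runtime=5
time_based
filename=/dev/sdb   ; guess: the flash device under test

[test00]
bs=8k

[test01]
bs=4k

[test10]
stonewall          ; wait for the previous group and start a new one
bs=16k

[test11]
bs=4k

[test20]
stonewall
bs=32k

[test21]
bs=4k

[test30]
stonewall
bs=64k

[test31]
bs=4k
```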
> 
> With unpatched cfq (2.6.33), on a flash card (non-NCQ), after running
> a fio script with a high number of parallel readers to make sure NCQ
> detection has stabilized, I get the following:
> Run status group 0 (all jobs):
>    READ: io=21528KiB, aggrb=4406KiB/s, minb=1485KiB/s, maxb=2922KiB/s,
> mint=5001msec, maxt=5003msec
> 
> Run status group 1 (all jobs):
>    READ: io=31524KiB, aggrb=6452KiB/s, minb=1327KiB/s, maxb=5126KiB/s,
> mint=5002msec, maxt=5003msec
> 
> Run status group 2 (all jobs):
>    READ: io=46544KiB, aggrb=9524KiB/s, minb=1031KiB/s, maxb=8493KiB/s,
> mint=5001msec, maxt=5004msec
> 
> Run status group 3 (all jobs):
>    READ: io=64712KiB, aggrb=13242KiB/s, minb=761KiB/s,
> maxb=12486KiB/s, mint=5002msec, maxt=5004msec
> 
> As you can see from minb, the process with the smallest I/O size is
> penalized: since both queues are marked as noidle, they both end up
> in the noidle tree, where they are serviced round robin. They get
> fairness in terms of IOPS, but bandwidth varies a lot.
> 
> With my patches in place, I get:
> Run status group 0 (all jobs):
>    READ: io=21544KiB, aggrb=4409KiB/s, minb=1511KiB/s, maxb=2898KiB/s,
> mint=5002msec, maxt=5003msec
> 
> Run status group 1 (all jobs):
>    READ: io=32000KiB, aggrb=6549KiB/s, minb=1277KiB/s, maxb=5274KiB/s,
> mint=5001msec, maxt=5003msec
> 
> Run status group 2 (all jobs):
>    READ: io=39444KiB, aggrb=8073KiB/s, minb=1576KiB/s, maxb=6498KiB/s,
> mint=5002msec, maxt=5003msec
> 
> Run status group 3 (all jobs):
>    READ: io=49180KiB, aggrb=10059KiB/s, minb=1512KiB/s,
> maxb=8548KiB/s, mint=5001msec, maxt=5006msec
> 
> The process doing smaller requests is no longer penalized by being
> run concurrently with the other one, and the other one still benefits
> from larger requests because it makes better use of its time slice.
> 

Hi Corrado,

I ran your fio script on my SSD, which supports NCQ. In my case both
processes lose.

Vanilla kernel (2.6.33)
=======================
Run status group 0 (all jobs):
   READ: io=258MB, aggrb=52,903KB/s, minb=18,058KB/s, maxb=36,114KB/s,
mint=5001msec, maxt=5001msec

Run status group 1 (all jobs):
   READ: io=382MB, aggrb=78,301KB/s, minb=16,037KB/s, maxb=64,143KB/s,
mint=5001msec, maxt=5001msec

Run status group 2 (all jobs):
   READ: io=560MB, aggrb=112MB/s, minb=13,052KB/s, maxb=102MB/s,
mint=5001msec, maxt=5001msec

Run status group 3 (all jobs):
   READ: io=774MB, aggrb=155MB/s, minb=9,595KB/s, maxb=149MB/s,
mint=5001msec, maxt=5001msec

With two seek patches applied
=============================
Run status group 0 (all jobs):
   READ: io=260MB, aggrb=53,148KB/s, minb=18,140KB/s, maxb=36,283KB/s,
mint=5001msec, maxt=5001msec

Run status group 1 (all jobs):
   READ: io=383MB, aggrb=78,484KB/s, minb=16,073KB/s, maxb=64,294KB/s,
mint=5001msec, maxt=5001msec

Run status group 2 (all jobs):
   READ: io=359MB, aggrb=73,534KB/s, minb=8,367KB/s, maxb=66,931KB/s,
mint=5001msec, maxt=5001msec

Run status group 3 (all jobs):
   READ: io=556MB, aggrb=111MB/s, minb=6,852KB/s, maxb=107MB/s,
mint=5001msec, maxt=5001msec

Note how overall BW suffers for groups 2 and 3. This needs some
debugging to understand why it is happening. My guess is that we are
idling on the sync-noidle tree but not on the sync-idle tree; that is
probably why the bigger-request-size process can't get enough IO
pushed. But then the smaller-request-size process should have gotten
more IO done, and that does not seem to be happening either.

Both the small-request-size and the big-request-size processes are losing.

Thanks
Vivek

> > In case of SSDs with NCQ, we will not idle on any of the queues (either
> > sync or sync-noidle (seeky queues)). So w.r.t code, what behavior changes
> > if we mark a queue as seeky/non-seeky on SSD?
> >
> 
> I haven't tested on an NCQ SSD, but I think at worst it will do no
> harm, and at best it will provide similar fairness improvements when
> the number of processes submitting requests grows beyond the
> available NCQ slots.
> 
> > IOW, looking at this patch, now any queue doing IO in smaller chunks than
> > 32K on SSD will be marked as seeky. How does that change the behavior in
> > terms of fairness for the queue?
> >
> Basically, we will have IOPS based fairness for small requests, and
> time based fairness for larger requests.
> 
> Thanks,
> Corrado
> 
> > Thanks
> > Vivek
> >
> >>
> >> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
> >> ---
> >>  block/cfq-iosched.c |    7 ++++++-
> >>  1 files changed, 6 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> >> index 806d30b..f27e535 100644
> >> --- a/block/cfq-iosched.c
> >> +++ b/block/cfq-iosched.c
> >> @@ -47,6 +47,7 @@ static const int cfq_hist_divisor = 4;
> >>  #define CFQ_SERVICE_SHIFT       12
> >>
> >>  #define CFQQ_SEEK_THR                (sector_t)(8 * 100)
> >> +#define CFQQ_SECT_THR_NONROT (sector_t)(2 * 32)
> >>  #define CFQQ_SEEKY(cfqq)     (hweight32(cfqq->seek_history) > 32/8)
> >>
> >>  #define RQ_CIC(rq)           \
> >> @@ -2958,6 +2959,7 @@ cfq_update_io_seektime(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >>                      struct request *rq)
> >>  {
> >>       sector_t sdist = 0;
> >> +     sector_t n_sec = blk_rq_sectors(rq);
> >>       if (cfqq->last_request_pos) {
> >>               if (cfqq->last_request_pos < blk_rq_pos(rq))
> >>                       sdist = blk_rq_pos(rq) - cfqq->last_request_pos;
> >> @@ -2966,7 +2968,10 @@ cfq_update_io_seektime(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >>       }
> >>
> >>       cfqq->seek_history <<= 1;
> >> -     cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);
> >> +     if (blk_queue_nonrot(cfqd->queue))
> >> +             cfqq->seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
> >> +     else
> >> +             cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);
> >>  }
> >>
> >>  /*
> >> --
> >> 1.6.4.4
> >



Thread overview: 16+ messages
     [not found] <1267296340-3820-1-git-send-email-czoccolo@gmail.com>
2010-02-27 18:45 ` [PATCH 1/2] cfq-iosched: rework seeky detection Corrado Zoccolo
2010-02-27 18:45   ` [PATCH 2/2] cfq-iosched: rethink seeky detection for SSDs Corrado Zoccolo
2010-03-01 14:25     ` Vivek Goyal
2010-03-03 19:47       ` Corrado Zoccolo
2010-03-03 21:21         ` Vivek Goyal
2010-03-03 23:28         ` Vivek Goyal [this message]
2010-03-04 20:34           ` Corrado Zoccolo
2010-03-04 22:27             ` Vivek Goyal
2010-03-05 22:31               ` Corrado Zoccolo
2010-03-08 14:08                 ` Vivek Goyal
2010-02-28 18:41 ` [RFC, PATCH 0/2] Reworking seeky detection for 2.6.34 Jens Axboe
2010-03-01 16:35 ` Vivek Goyal
2010-03-01 19:45   ` Vivek Goyal
2010-03-01 23:01   ` Corrado Zoccolo
2010-03-03 22:39     ` Corrado Zoccolo
2010-03-03 23:11       ` Vivek Goyal
