From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
	nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
	ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
	s-uchida@ap.jp.nec.com, taka@valinux.co.jp,
	guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
	balbir@linux.vnet.ibm.com, righi.andrea@gmail.com,
	m-ikeda@ds.jp.nec.com, akpm@linux-foundation.org,
	riel@redhat.com, kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [RFC] Workload type Vs Groups (Was: Re: [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps)
Date: Tue, 10 Nov 2009 14:15:20 -0500
Message-ID: <20091110191520.GC3497@redhat.com>
In-Reply-To: <4e5e476b0911101005x3da4a552g8f636022ae2c3bed@mail.gmail.com>

On Tue, Nov 10, 2009 at 07:05:19PM +0100, Corrado Zoccolo wrote:
> On Tue, Nov 10, 2009 at 3:12 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Ok, I ran some simple tests on my NCQ SSD. I had pulled Jens' branch
> > a few days back and it has your patches in it.
> >
> > I am running three direct sequential readers of prio 0, 4 and 7
> > respectively using fio for 10 seconds and then checking how much
> > work each one got done.
> >
> > Following is my fio job file
> >
> > ****************************************************************
> > [global]
> > ioengine=sync
> > runtime=10
> > size=1G
> > rw=read
> > directory=/mnt/sdc/fio/
> > direct=1
> > bs=4K
> > exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
> >
> > [seqread0]
> > prio=0
> >
> > [seqread4]
> > prio=4
> >
> > [seqread7]
> > prio=7
> > ************************************************************************
> 
> Can you try without direct and bs?
> 

OK, here are the results without direct and bs, so these are now buffered
reads. The fio job file above remains more or less the same, except that I
had to change size to 2G, because within 10 seconds some processes could
finish reading 1G and drop out of contention.

First Run
=========
read : io=382MB, bw=39,112KB/s, iops=9,777, runt= 10001msec
read : io=939MB, bw=96,194KB/s, iops=24,048, runt= 10001msec
read : io=765MB, bw=78,355KB/s, iops=19,588, runt= 10004msec

Second Run
==========
read : io=443MB, bw=45,395KB/s, iops=11,348, runt= 10004msec
read : io=1,058MB, bw=106MB/s, iops=27,081, runt= 10001msec
read : io=650MB, bw=66,535KB/s, iops=16,633, runt= 10006msec

Third Run
=========
read : io=727MB, bw=74,465KB/s, iops=18,616, runt= 10004msec
read : io=890MB, bw=91,126KB/s, iops=22,781, runt= 10001msec
read : io=406MB, bw=41,608KB/s, iops=10,401, runt= 10004msec

Fourth Run
==========
read : io=792MB, bw=81,143KB/s, iops=20,285, runt= 10001msec
read : io=1,024MB, bw=102MB/s, iops=26,192, runt= 10009msec
read : io=314MB, bw=32,093KB/s, iops=8,023, runt= 10011msec

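I still can't get service differentiation proportionate to priority
levels. In fact, in some cases it looks more like priority inversion, where
the higher priority process gets lower BW (in the first and second runs the
prio 0 reader got less BW than both the prio 4 and prio 7 readers).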
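To make the expected split concrete, here is a small Python sketch comparing slice-based shares with the buffered runs. It assumes the 2.6.32-era CFQ slice formula base + base/CFQ_SLICE_SCALE * (4 - prio), with a 100ms sync base slice and CFQ_SLICE_SCALE = 5 (this reproduces the 180/40 figure quoted below), and it assumes that bandwidth share should follow slice share. prio_slice() and shares() are hypothetical helpers; the iops values are copied from the runs above.

```python
# Sketch (assumptions, not code from this patch series): CFQ's best-effort
# slice length is modeled as base + base // CFQ_SLICE_SCALE * (4 - prio),
# with an assumed 100 ms sync base slice and CFQ_SLICE_SCALE = 5.
BASE_SLICE_MS = 100  # assumed default sync slice (HZ/10)
SLICE_SCALE = 5      # assumed value of CFQ_SLICE_SCALE
PRIOS = (0, 4, 7)


def prio_slice(prio, base=BASE_SLICE_MS, scale=SLICE_SCALE):
    """Hypothetical helper: slice length in ms for best-effort prio 0..7."""
    if not 0 <= prio <= 7:
        raise ValueError("best-effort prio must be in 0..7")
    return base + base // scale * (4 - prio)


def shares(values):
    """Hypothetical helper: each value as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]


# fio iops for the four buffered runs above, ordered prio 0, 4, 7.
# bw/iops is about 4KB in every row, so iops tracks bandwidth.
BUFFERED_IOPS = [
    (9777, 24048, 19588),
    (11348, 27081, 16633),
    (18616, 22781, 10401),
    (20285, 26192, 8023),
]

# Expected split if each queue gets one full slice per round and runs at
# similar speed while it owns the device.
expected = shares([prio_slice(p) for p in PRIOS])

if __name__ == "__main__":
    print("expected:", ", ".join(f"{s:.4f}" for s in expected))
    for run, iops in enumerate(BUFFERED_IOPS, start=1):
        observed = ", ".join(f"{s:.3f}" for s in shares(iops))
        print(f"run {run}: {observed}  prio0/prio7 = {iops[0] / iops[2]:.2f}")
```

Under these assumptions prio 0 should get 180:100:40 of the time, a 4.5x edge over prio 7, while the observed prio0/prio7 ratio ranges from about 0.5 (first run) to about 2.5 (fourth run).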

> >
> > Following are the results of 4 runs. Every run lists three jobs of prio0,
> > prio4 and prio7 respectively.
> >
> > First run
> > =========
> > read : io=75,996KB, bw=7,599KB/s, iops=1,899, runt= 10001msec
> > read : io=95,920KB, bw=9,591KB/s, iops=2,397, runt= 10001msec
> > read : io=21,068KB, bw=2,107KB/s, iops=526, runt= 10001msec
> >
> > Second run
> > ==========
> > read : io=103MB, bw=10,540KB/s, iops=2,635, runt= 10001msec
> > read : io=102MB, bw=10,479KB/s, iops=2,619, runt= 10001msec
> > read : io=720KB, bw=73,728B/s, iops=18, runt= 10000msec
> >
> > Third Run
> > =========
> > read : io=103MB, bw=10,532KB/s, iops=2,632, runt= 10001msec
> > read : io=85,728KB, bw=8,572KB/s, iops=2,142, runt= 10001msec
> > read : io=19,696KB, bw=1,969KB/s, iops=492, runt= 10001msec
> >
> > Fourth Run
> > ==========
> > read : io=50,060KB, bw=5,005KB/s, iops=1,251, runt= 10001msec
> > read : io=102MB, bw=10,409KB/s, iops=2,602, runt= 10001msec
> > read : io=54,844KB, bw=5,484KB/s, iops=1,370, runt= 10001msec
> >
> > I can't see fairness being provided to processes of different prio
> > levels. In the first run, the prio 4 process got more BW than the prio 0
> > process.
> >
> > In the second run, the prio 7 process got almost completely starved.
> > Based on the slice calculation, the ratio between prio 0 and prio 7
> > should be 180/40 = 4.5.
> >
> > The third run is somewhat better.
> >
> > In the fourth run, prio 4 again got double the BW of prio 0.
> >
> > So I can't see how you are achieving fairness on an NCQ SSD.
> >
> > One more important thing to notice is that the throughput of the SSD has
> > come down significantly. If I run just one job, I get 73MB/s. With these
> > three jobs running, we achieve close to 19MB/s in aggregate.
> 
> I think it depends on the hardware. On Jeff's SSD, 32 random readers
> were obtaining approximately the same aggregate bandwidth as a
> single sequential reader. I think that the decision to avoid idling is
> sane on that kind of hardware, but not on hardware like yours, in
> which a seek has a very large penalty (I have one in my netbook, for
> which reading 4k takes 1ms). However, if you increase the block size, or
> remove the direct I/O, the prefetch should still work for you.

Of course, increasing the block size, or making the IO buffered (which in
turn increases the effective request size for sequential reads through
readahead), will increase the throughput.

Here I wanted to take the page cache out of the picture so that we can see
what is happening at the IO scheduling layer.
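For the direct I/O case, the throughput loss at the scheduling layer can be tallied from the results quoted in this mail: aggregate bandwidth per run versus the 73MB/s of a single reader. This is a rough sketch that assumes fio's MB means 1024KB; aggregate() is a hypothetical helper and the KB/s values are copied from the four direct runs.

```python
# Sketch: aggregate bandwidth of the three direct-I/O readers in each run,
# using the fio bw figures quoted in this mail (KB/s, ordered prio 0, 4, 7).
# The prio 7 reader's 73,728B/s in the second run is entered as 72KB/s.
SINGLE_JOB_KBPS = 73 * 1024  # 73MB/s single reader; assumes 1MB = 1024KB

DIRECT_BW_KBPS = [
    (7599, 9591, 2107),
    (10540, 10479, 72),
    (10532, 8572, 1969),
    (5005, 10409, 5484),
]


def aggregate(run):
    """Hypothetical helper: total bandwidth of all readers in a run (KB/s)."""
    return sum(run)


if __name__ == "__main__":
    for n, run in enumerate(DIRECT_BW_KBPS, start=1):
        total = aggregate(run)
        print(f"run {n}: {total} KB/s, {total / SINGLE_JOB_KBPS:.0%} of one reader")
```

Every run totals roughly 19 to 21MB/s, a little over a quarter of what one reader achieves alone.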

Thanks
Vivek

> >
> > I think this is happening because a seek occurs after almost every
> > dispatch, and that brings down the overall throughput. If we had idled
> > here, overall throughput would probably have been better.
> Agreed. In fact, I'd like to add some measurements in cfq to
> determine the idle parameters, instead of relying on those binary
> rules of thumb.
> Which hardware is this, btw?
> 
> >
> > Thanks
> > Vivek
> >
> Thanks
> Corrado

Thread overview: 88+ messages
2009-11-03 23:43 [RFC] Block IO Controller V1 Vivek Goyal
2009-11-03 23:43 ` [PATCH 01/20] blkio: Documentation Vivek Goyal
2009-11-04 13:37   ` Jeff Moyer
2009-11-04 17:21   ` Balbir Singh
2009-11-04 17:52     ` Vivek Goyal
2009-11-04 23:36       ` Balbir Singh
2009-11-03 23:43 ` [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps Vivek Goyal
2009-11-04 14:30   ` Jeff Moyer
2009-11-04 16:37     ` Vivek Goyal
2009-11-04 17:59       ` Corrado Zoccolo
2009-11-04 18:54         ` Vivek Goyal
2009-11-05  2:44       ` Divyesh Shah
2009-11-05 14:39         ` Vivek Goyal
2009-11-04 21:18   ` Corrado Zoccolo
2009-11-04 22:25     ` Vivek Goyal
2009-11-05  8:36       ` Corrado Zoccolo
2009-11-04 23:22     ` Vivek Goyal
2009-11-05  8:27       ` Corrado Zoccolo
2009-11-05  0:05     ` Vivek Goyal
2009-11-06 22:22     ` [RFC] Workload type Vs Groups (Was: Re: [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps) Vivek Goyal
2009-11-09 17:33       ` Nauman Rafique
2009-11-09 21:47       ` Corrado Zoccolo
2009-11-09 23:12         ` Vivek Goyal
2009-11-10 11:29           ` Corrado Zoccolo
2009-11-10 13:31             ` Vivek Goyal
2009-11-10 14:12               ` Vivek Goyal
2009-11-10 18:05                 ` Corrado Zoccolo
2009-11-10 19:15                   ` Vivek Goyal [this message]
2009-11-12  8:53                     ` Corrado Zoccolo
2009-11-11  0:48   ` [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps Gui Jianfeng
2009-11-12 23:07     ` Vivek Goyal
2009-11-13  0:59       ` Gui Jianfeng
2009-11-13  1:24         ` Vivek Goyal
2009-11-13  2:05           ` Gui Jianfeng
2009-11-03 23:43 ` [PATCH 03/20] blkio: Introduce the notion of weights Vivek Goyal
2009-11-04 15:06   ` Jeff Moyer
2009-11-04 15:41     ` Vivek Goyal
2009-11-04 17:07       ` Divyesh Shah
2009-11-04 19:00         ` Vivek Goyal
2009-11-04 19:15       ` Jeff Moyer
2009-11-03 23:43 ` [PATCH 04/20] blkio: Introduce the notion of cfq entity Vivek Goyal
2009-11-03 23:43 ` [PATCH 05/20] blkio: Introduce the notion of cfq groups Vivek Goyal
2009-11-03 23:43 ` [PATCH 06/20] blkio: Introduce cgroup interface Vivek Goyal
2009-11-04 15:23   ` Jeff Moyer
2009-11-04 16:47     ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 07/20] blkio: Provide capablity to enqueue/dequeue group entities Vivek Goyal
2009-11-04 15:34   ` Jeff Moyer
2009-11-04 16:54     ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 08/20] blkio: Add support for dynamic creation of cfq_groups Vivek Goyal
2009-11-04 16:01   ` Jeff Moyer
2009-11-03 23:43 ` [PATCH 09/20] blkio: Porpogate blkio cgroup weight or ioprio class updation to cfq groups Vivek Goyal
2009-11-05  5:35   ` Gui Jianfeng
2009-11-05 14:42     ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 10/20] blkio: Implement cfq group deletion and reference counting support Vivek Goyal
2009-11-04 18:44   ` Jeff Moyer
2009-11-04 19:00     ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 11/20] blkio: Some CFQ debugging Aid Vivek Goyal
2009-11-04 18:52   ` Jeff Moyer
2009-11-04 19:12     ` Vivek Goyal
2009-11-04 19:25       ` Jeff Moyer
2009-11-05  3:10   ` Divyesh Shah
2009-11-05 14:42     ` Vivek Goyal
2009-11-06  0:56       ` Divyesh Shah
2009-11-03 23:43 ` [PATCH 12/20] blkio: Export disk time and sectors dispatched from cgroup interface Vivek Goyal
2009-11-03 23:43 ` [PATCH 13/20] blkio: Add a group dequeue interface in cgroup for debugging Vivek Goyal
2009-11-03 23:43 ` [PATCH 14/20] blkio: Do not allow request merging across cfq groups Vivek Goyal
2009-11-03 23:43 ` [PATCH 15/20] blkio: Take care of preemptions across groups Vivek Goyal
2009-11-04 19:00   ` Jeff Moyer
2009-11-04 19:27     ` Vivek Goyal
2009-11-04 19:30       ` Jeff Moyer
2009-11-06  7:55   ` Gui Jianfeng
2009-11-06 22:10     ` Vivek Goyal
2009-11-09  7:41       ` Gui Jianfeng
2009-11-03 23:43 ` [PATCH 16/20] blkio: do not select co-operating queues from different cfq groups Vivek Goyal
2009-11-03 23:43 ` [PATCH 17/20] blkio: Wait for queue to get backlogged before it expires Vivek Goyal
2009-11-03 23:43 ` [PATCH 18/20] blkio: arm idle timer even if think time is great then time slice left Vivek Goyal
2009-11-04 19:04   ` Jeff Moyer
2009-11-04 19:17     ` Vivek Goyal
2009-11-03 23:43 ` [PATCH 19/20] blkio: Arm slice timer even if there are requests in driver Vivek Goyal
2009-11-03 23:43 ` [PATCH 20/20] blkio: Drop the reference to queue once the task changes cgroup Vivek Goyal
2009-11-04 19:09   ` Jeff Moyer
2009-11-04 19:18     ` Vivek Goyal
2009-11-04  7:43 ` [RFC] Block IO Controller V1 Jens Axboe
2009-11-04 13:39   ` Vivek Goyal
2009-11-04 19:12 ` Jeff Moyer
2009-11-04 19:19   ` Vivek Goyal
2009-11-04 19:27     ` Jeff Moyer
2009-11-04 19:38       ` Vivek Goyal
