From: Vivek Goyal <vgoyal@redhat.com>
To: Justin TerAvest <teravest@google.com>
Cc: jaxboe@fusionio.com, m-ikeda@ds.jp.nec.com, ryov@valinux.co.jp,
taka@valinux.co.jp, kamezawa.hiroyu@jp.fujitsu.com,
righi.andrea@gmail.com, guijianfeng@cn.fujitsu.com,
balbir@linux.vnet.ibm.com, ctalbott@google.com,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes.
Date: Tue, 22 Mar 2011 21:27:55 -0400 [thread overview]
Message-ID: <20110323012755.GA10325@redhat.com> (raw)
In-Reply-To: <1300835335-2777-1-git-send-email-teravest@google.com>
On Tue, Mar 22, 2011 at 04:08:47PM -0700, Justin TerAvest wrote:
[..]
> ===================================== Isolation experiment results
>
> For isolation testing, we run a test that's available at:
> git://google3-2.osuosl.org/tests/blkcgroup.git
>
> It creates containers, runs workloads, and checks to see how well we meet
> isolation targets. For the purposes of this patchset, I only ran
> tests among buffered writers.
>
> Before patches
> ==============
> 10:32:06 INFO experiment 0 achieved DTFs: 666, 333
> 10:32:06 INFO experiment 0 FAILED: max observed error is 167, allowed is 150
> 10:32:51 INFO experiment 1 achieved DTFs: 647, 352
> 10:32:51 INFO experiment 1 FAILED: max observed error is 253, allowed is 150
> 10:33:35 INFO experiment 2 achieved DTFs: 298, 701
> 10:33:35 INFO experiment 2 FAILED: max observed error is 199, allowed is 150
> 10:34:19 INFO experiment 3 achieved DTFs: 445, 277, 277
> 10:34:19 INFO experiment 3 FAILED: max observed error is 155, allowed is 150
> 10:35:05 INFO experiment 4 achieved DTFs: 418, 104, 261, 215
> 10:35:05 INFO experiment 4 FAILED: max observed error is 232, allowed is 150
> 10:35:53 INFO experiment 5 achieved DTFs: 213, 136, 68, 102, 170, 136, 170
> 10:35:53 INFO experiment 5 PASSED: max observed error is 73, allowed is 150
> 10:36:04 INFO -----ran 6 experiments, 1 passed, 5 failed
>
> After patches
> =============
> 11:05:22 INFO experiment 0 achieved DTFs: 501, 498
> 11:05:22 INFO experiment 0 PASSED: max observed error is 2, allowed is 150
> 11:06:07 INFO experiment 1 achieved DTFs: 874, 125
> 11:06:07 INFO experiment 1 PASSED: max observed error is 26, allowed is 150
> 11:06:53 INFO experiment 2 achieved DTFs: 121, 878
> 11:06:53 INFO experiment 2 PASSED: max observed error is 22, allowed is 150
> 11:07:46 INFO experiment 3 achieved DTFs: 589, 205, 204
> 11:07:46 INFO experiment 3 PASSED: max observed error is 11, allowed is 150
> 11:08:34 INFO experiment 4 achieved DTFs: 616, 109, 109, 163
> 11:08:34 INFO experiment 4 PASSED: max observed error is 34, allowed is 150
> 11:09:29 INFO experiment 5 achieved DTFs: 139, 139, 139, 139, 140, 141, 160
> 11:09:29 INFO experiment 5 PASSED: max observed error is 1, allowed is 150
> 11:09:46 INFO -----ran 6 experiments, 6 passed, 0 failed
>
> Summary
> =======
> Isolation between buffered writers is clearly better with this patch.
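The quoted log lines are consistent with a simple pass/fail rule. As a hedged reconstruction (not the actual blkcgroup.git checker): achieved per-mille disk-time fractions (DTFs) are compared against weight-derived targets, and an experiment passes when the largest absolute deviation stays within the allowed bound. The function names and the equal-weight targets below are assumptions for illustration.

```python
def max_dtf_error(achieved, targets):
    """Largest absolute per-mille deviation of achieved DTFs from targets."""
    return max(abs(a - t) for a, t in zip(achieved, targets))

def experiment_passed(achieved, targets, allowed=150):
    """Mirrors the apparent pass criterion: max observed error <= allowed."""
    return max_dtf_error(achieved, targets) <= allowed

# Experiment 0 above, assuming two equal weights (targets 500/500 per mille):
print(max_dtf_error([666, 333], [500, 500]))  # before patches -> 167
print(max_dtf_error([501, 498], [500, 500]))  # after patches  -> 2
```

With these assumed targets the reconstruction reproduces the reported errors (167 and 2) for experiment 0 exactly.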
Can you please explain what this test is doing? All I am seeing is passed
and failed, and I really don't understand what the test measures.
Can you run, say, 4 simple dd buffered writers in 4 cgroups with weights
100, 200, 300 and 400 and see if you get better isolation?
Secondly, can you also please explain how it works? Without making
writeback cgroup aware, there are no guarantees that a higher weight
cgroup will get more IO done.
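The suggested experiment can be sketched as a shell script. This is a hedged setup sketch, not a tested harness: the mount point, device paths, and file names are assumptions, it needs root, and it assumes the cgroup-v1 blkio interface (blkio.weight, blkio.time) with CFQ as the scheduler.

```shell
# One buffered dd writer per cgroup, weights 100/200/300/400.
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio 2>/dev/null

for w in 100 200 300 400; do
    mkdir -p /sys/fs/cgroup/blkio/writer-$w
    echo $w > /sys/fs/cgroup/blkio/writer-$w/blkio.weight
    # Launch the writer, then confine it to its group.
    dd if=/dev/zero of=/mnt/test/file-$w bs=1M count=1024 &
    echo $! > /sys/fs/cgroup/blkio/writer-$w/tasks
done
wait
# Compare per-group disk time afterwards -- with good isolation,
# blkio.time should scale roughly with the configured weights.
grep . /sys/fs/cgroup/blkio/writer-*/blkio.time
```

Note the writer is moved into its group just after it starts, so a few initial pages may be dirtied outside the group; for a rough isolation check that should not matter much.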
>
>
> =============================== Read latency results
> To test read latency, I created two containers:
> - One called "readers", with weight 900
> - One called "writers", with weight 100
>
> I ran this fio workload in "readers":
> [global]
> directory=/mnt/iostestmnt/fio
> runtime=30
> time_based=1
> group_reporting=1
> exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
> cgroup_nodelete=1
> bs=4K
> size=512M
>
> [iostest-read]
> description="reader"
> numjobs=16
> rw=randread
> new_group=1
>
>
> ....and this fio workload in "writers"
> [global]
> directory=/mnt/iostestmnt/fio
> runtime=30
> time_based=1
> group_reporting=1
> exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
> cgroup_nodelete=1
> bs=4K
> size=512M
>
> [iostest-write]
> description="writer"
> cgroup=writers
> numjobs=3
> rw=write
> new_group=1
>
>
>
> I've pasted the results from the "read" workload inline.
>
> Before patches
> ==============
> Starting 16 processes
>
> Jobs: 14 (f=14): [_rrrrrr_rrrrrrrr] [36.2% done] [352K/0K /s] [86 /0 iops] [eta 01m:00s]
> iostest-read: (groupid=0, jobs=16): err= 0: pid=20606
> Description : ["reader"]
> read : io=13532KB, bw=455814 B/s, iops=111 , runt= 30400msec
> clat (usec): min=2190 , max=30399K, avg=30395175.13, stdev= 0.20
> lat (usec): min=2190 , max=30399K, avg=30395177.07, stdev= 0.20
> bw (KB/s) : min= 0, max= 260, per=0.00%, avg= 0.00, stdev= 0.00
> cpu : usr=0.00%, sys=0.03%, ctx=3691, majf=2, minf=468
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3383/0/0, short=0/0/0
>
> lat (msec): 4=0.03%, 10=2.66%, 20=74.84%, 50=21.90%, 100=0.09%
> lat (msec): 250=0.06%, >=2000=0.41%
>
> Run status group 0 (all jobs):
> READ: io=13532KB, aggrb=445KB/s, minb=455KB/s, maxb=455KB/s, mint=30400msec, maxt=30400msec
>
> Disk stats (read/write):
> sdb: ios=3744/18, merge=0/16, ticks=542713/1675, in_queue=550714, util=99.15%
>
>
>
> After patches
> =============
> Starting 16 processes
> Jobs: 16 (f=16): [rrrrrrrrrrrrrrrr] [100.0% done] [557K/0K /s] [136 /0 iops] [eta 00m:00s]
> iostest-read: (groupid=0, jobs=16): err= 0: pid=14183
> Description : ["reader"]
> read : io=14940KB, bw=506105 B/s, iops=123 , runt= 30228msec
> clat (msec): min=2 , max=29866 , avg=463.42, stdev=101.84
> lat (msec): min=2 , max=29866 , avg=463.42, stdev=101.84
> bw (KB/s) : min= 0, max= 198, per=31.69%, avg=156.52, stdev=17.83
> cpu : usr=0.01%, sys=0.03%, ctx=4274, majf=2, minf=464
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued r/w/d: total=3735/0/0, short=0/0/0
>
> lat (msec): 4=0.05%, 10=0.32%, 20=32.99%, 50=64.61%, 100=1.26%
> lat (msec): 250=0.11%, 500=0.11%, 750=0.16%, 1000=0.05%, >=2000=0.35%
>
> Run status group 0 (all jobs):
> READ: io=14940KB, aggrb=494KB/s, minb=506KB/s, maxb=506KB/s, mint=30228msec, maxt=30228msec
>
> Disk stats (read/write):
> sdb: ios=4189/0, merge=0/0, ticks=96428/0, in_queue=478798, util=100.00%
>
>
>
> Summary
> =======
> Read latencies are a bit worse, but this overhead is only imposed when users
> ask for this feature by turning on CONFIG_BLKIOTRACK. We expect there to be
> something of a latency vs isolation tradeoff.
- What number are you looking at to say READ latencies are worse?
- Who got isolated here? If READ latencies are worse and you are saying
  that's the cost of isolation, that means you are looking for isolation
  for WRITES? This is the first time I am hearing that READS starve
  WRITES and that better isolation is wanted for WRITES.
Also, CONFIG_BLKIOTRACK=n is not the solution. This will most likely be
set, and we need to figure out what makes sense.
To me, WRITE isolation is useful only if we want to create a speed
difference between multiple WRITE streams, and that cannot reliably be
done until we make the writeback logic cgroup aware.
If we try to put WRITES in a separate group, most likely WRITES will end
up getting a bigger share of the disk than they get by default, and I
seriously doubt anyone is looking for that. So far, all the complaints
I have heard are that in the presence of WRITES my READ latencies
suffer, not vice versa.
Thanks
Vivek
Thread overview (20+ messages):
2011-03-22 23:08 [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 1/8] cfq-iosched: add symmetric reference wrappers Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 2/8] block,fs,mm: IO cgroup tracking for buffered write Justin TerAvest
2011-03-23 4:52 ` KAMEZAWA Hiroyuki
2011-03-23 17:21 ` Justin TerAvest
2011-03-24 8:26 ` KAMEZAWA Hiroyuki
2011-03-22 23:08 ` [PATCH v2 3/8] cfq-iosched: Make async queues per cgroup Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 4/8] block: Modify CFQ to use IO tracking information Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 5/8] cfq: Fix up tracked async workload length Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 6/8] cfq: add per cgroup writeout done by flusher stat Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 7/8] block: Per cgroup request descriptor counts Justin TerAvest
2011-03-22 23:08 ` [PATCH v2 8/8] cfq: Don't allow preemption across cgroups Justin TerAvest
2011-03-23 1:27 ` Vivek Goyal [this message]
2011-03-23 16:27 ` [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes Justin TerAvest
2011-03-23 20:06 ` Vivek Goyal
2011-03-23 22:32 ` Justin TerAvest
2011-03-24 13:56 ` Vivek Goyal
2011-03-25 1:51 ` Justin TerAvest
2011-03-25 7:46 ` Balbir Singh
2011-03-28 15:21 ` Justin TerAvest