* [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
@ 2011-01-11 8:51 Shaohua Li
2011-01-11 21:07 ` Corrado Zoccolo
2011-01-17 14:26 ` Vivek Goyal
0 siblings, 2 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-11 8:51 UTC (permalink / raw)
To: lkml
Cc: Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Corrado Zoccolo,
Gui Jianfeng
I got this:
fio-874 [007] 2157.724514: 8,32 m N cfq874 preempt
fio-874 [007] 2157.724519: 8,32 m N cfq830 slice expired t=1
fio-874 [007] 2157.724520: 8,32 m N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
fio-874 [007] 2157.724521: 8,32 m N cfq830 set_active wl_prio:0 wl_type:0
fio-874 [007] 2157.724522: 8,32 m N cfq830 Not idling. st->count:1
cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But because
of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
It looks like preemption is currently completely broken whenever the two queues
are not from the same workload type.
The patch below fixes it. This might starve async queues, but that is what the
old code did before cgroup support was added.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/cfq-iosched.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux/block/cfq-iosched.c
===================================================================
--- linux.orig/block/cfq-iosched.c 2011-01-11 16:24:56.000000000 +0800
+++ linux/block/cfq-iosched.c 2011-01-11 16:41:49.000000000 +0800
@@ -3284,10 +3284,19 @@ cfq_should_preempt(struct cfq_data *cfqd
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
 	cfq_slice_expired(cfqd, 1);
 
 	/*
+	 * workload type is changed, don't save slice, otherwise preempt
+	 * doesn't happen
+	 */
+	if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
+		cfqq->cfqg->saved_workload_slice = 0;
+
+	/*
 	 * Put the new queue at the front of the of the current list,
 	 * so we know that it will be selected next.
 	 */
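
[Note: for context, the workload type that the patch compares is computed by
cfqq_type(). In the cfq-iosched.c of this era it reads roughly as follows; this
is an abridged sketch for reference, not part of the patch itself.]

/* Roughly how cfq classifies a queue into a workload type (sketch). */
static enum wl_type_t cfqq_type(struct cfq_queue *cfqq)
{
	if (!cfq_cfqq_sync(cfqq))
		return ASYNC_WORKLOAD;		/* async (buffered write) queues */
	if (!cfq_cfqq_idle_window(cfqq))
		return SYNC_NOIDLE_WORKLOAD;	/* e.g. seeky sync readers */
	return SYNC_WORKLOAD;			/* sync readers we idle for */
}
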
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-11 8:51 [PATCH 1/2]block cfq: make queue preempt work for queues from different workload Shaohua Li
@ 2011-01-11 21:07 ` Corrado Zoccolo
  2011-01-12 2:30   ` Shaohua Li
  2011-01-17 14:26 ` Vivek Goyal
  1 sibling, 1 reply; 10+ messages in thread
From: Corrado Zoccolo @ 2011-01-11 21:07 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

Hi Shaohua,

On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> [...]
> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But because
> of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
> It looks like preemption is currently completely broken whenever the two queues
> are not from the same workload type.
> The patch below fixes it. This might starve async queues, but that is what the
> old code did before cgroup support was added.

Have you measured latency improvements by un-breaking preemption?
AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
added, and the latency before the changes that weakened preemption in
2.6.33 was far worse.

Thanks,
Corrado

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-11 21:07 ` Corrado Zoccolo
@ 2011-01-12 2:30   ` Shaohua Li
  2011-01-14 4:44     ` Shaohua Li
  0 siblings, 1 reply; 10+ messages in thread
From: Shaohua Li @ 2011-01-12 2:30 UTC (permalink / raw)
  To: Corrado Zoccolo
  Cc: lkml, Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

Hi,

On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
> [...]
> Have you measured latency improvements by un-breaking preemption?
> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
> added, and the latency before the changes that weakened preemption in
> 2.6.33 was far worse.

Yes. I'm testing an SD card for MeeGo. Random writes are very slow (~12k/s) but
random reads are relatively fast (> 1M/s).

Without patch:
write: (groupid=0, jobs=1): err= 0: pid=3876
  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
    bw (KB/s) : min= 0, max= 52, per=168.17%, avg=11.77, stdev= 8.85
read: (groupid=0, jobs=1): err= 0: pid=3877
  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
    bw (KB/s) : min= 1, max= 1413, per=66.42%, avg=567.22, stdev=461.50

With patch:
write: (groupid=0, jobs=1): err= 0: pid=4884
  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
    bw (KB/s) : min= 0, max= 31, per=inf%, avg= 8.40, stdev=12.78
read: (groupid=0, jobs=1): err= 0: pid=4885
  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
    bw (KB/s) : min= 2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59

Both read latency and throughput show a big difference with the patch, but writes
get starved.

Thanks,
Shaohua

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-12 2:30 ` Shaohua Li
@ 2011-01-14 4:44 ` Shaohua Li
  2011-01-14 7:38 ` Jens Axboe
  2011-01-17 3:41 ` Gui Jianfeng
  0 siblings, 2 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-14 4:44 UTC (permalink / raw)
  To: Corrado Zoccolo, Jens Axboe
  Cc: lkml, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

2011/1/12 Shaohua Li <shaohua.li@intel.com>:
> [...]
> Both read latency and throughput show a big difference with the patch, but writes
> get starved.

Hi Jens and others,
what do you think about the patch?

Thanks,
Shaohua

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-14 4:44 ` Shaohua Li
@ 2011-01-14 7:38 ` Jens Axboe
  2011-01-14 14:29 ` Jeff Moyer
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2011-01-14 7:38 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Corrado Zoccolo, lkml, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

On 2011-01-14 05:44, Shaohua Li wrote:
> [...]
> Hi Jens and others,
> what do you think about the patch?

I think the patch is good. If preemption in some cases makes things
worse, then we need to look into those. That's a separate issue.

-- 
Jens Axboe

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-14 7:38 ` Jens Axboe
@ 2011-01-14 14:29 ` Jeff Moyer
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Moyer @ 2011-01-14 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, Corrado Zoccolo, lkml, Vivek Goyal, Gui Jianfeng

Jens Axboe <jaxboe@fusionio.com> writes:

> [...]
> I think the patch is good. If preemption in some cases makes things
> worse, then we need to look into those. That's a separate issue.

I think things are getting pretty messy. Who would have guessed that the
saved_workload_slice would affect preemption? If a queue is to preempt
another, can't we make that a bit more explicit? I'm still trying to walk
through the code to figure out how this ends up happening, as the patch
description leaves a lot out of the picture.

Cheers,
Jeff

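[Note: the interaction Jeff is tracing comes from the slice save/restore logic
that runs when a group is re-selected. Below is an abridged sketch of the
2.6.37-era cfq_choose_cfqg(), simplified rather than copied verbatim: on
expiry, cfq_group_served() stashes the remaining workload slice in
cfqg->saved_workload_slice, and when the group is picked again its saved
workload type, priority and slice are restored. choose_service_tree() then
sees an unexpired workload and keeps the old (async) tree active, so the
preempting sync queue is never actually selected. Resetting
saved_workload_slice on a cross-workload preempt, as the patch does, breaks
that cycle.]

/*
 * Abridged sketch of the restore path that turns a cross-workload preempt
 * into a nop (not a verbatim copy of the kernel source).
 */
static void cfq_choose_cfqg(struct cfq_data *cfqd)
{
	struct cfq_group *cfqg = cfq_get_next_cfqg(cfqd);

	cfqd->serving_group = cfqg;

	/* Restore the saved workload type, priority and remaining slice */
	if (cfqg->saved_workload_slice) {
		cfqd->workload_expires = jiffies + cfqg->saved_workload_slice;
		cfqd->serving_type = cfqg->saved_workload;
		cfqd->serving_prio = cfqg->saved_serving_prio;
	} else
		cfqd->workload_expires = jiffies - 1;

	choose_service_tree(cfqd, cfqg);
}
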
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-14 4:44 ` Shaohua Li
  2011-01-14 7:38 ` Jens Axboe
@ 2011-01-17 3:41 ` Gui Jianfeng
  2011-01-17 8:41 ` Shaohua Li
  1 sibling, 1 reply; 10+ messages in thread
From: Gui Jianfeng @ 2011-01-17 3:41 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Corrado Zoccolo, Jens Axboe, lkml, Vivek Goyal, jmoyer@redhat.com

Shaohua Li wrote:
> [...]
> Hi Jens and others,
> what do you think about the patch?

Furthermore, consider the following piece of code.

2132         /*
2133          * For RT and BE, we have to choose also the type
2134          * (SYNC, SYNC_NOIDLE, ASYNC), and to compute a workload
2135          * expiration time
2136          */
2137         st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type);
2138         count = st->count;
2139
2140         /*
2141          * check workload expiration, and that we still have other queues ready
2142          */
2143         if (count && !time_after(jiffies, cfqd->workload_expires))
2144                 return;

Here, cfqd->serving_prio might have changed, but we still check workload
expiration to decide whether to let the old workload run. I don't think this
makes much sense. If cfqd->serving_prio gets changed, we should recalculate
the workload type. Am I missing something?

Thanks
Gui

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-17 3:41 ` Gui Jianfeng
@ 2011-01-17 8:41 ` Shaohua Li
  0 siblings, 0 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-17 8:41 UTC (permalink / raw)
  To: Gui Jianfeng
  Cc: Corrado Zoccolo, Jens Axboe, lkml, Vivek Goyal, jmoyer@redhat.com

2011/1/17 Gui Jianfeng <guijianfeng@cn.fujitsu.com>:
> [...]
> Here, cfqd->serving_prio might have changed, but we still check workload
> expiration to decide whether to let the old workload run. I don't think this
> makes much sense. If cfqd->serving_prio gets changed, we should recalculate
> the workload type. Am I missing something?

This is already fixed in the latest git tree by commit
e4ea0c16a85d221ebcc3a21f32e321440459e0fc.

Thanks,
Shaohua

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-11 8:51 [PATCH 1/2]block cfq: make queue preempt work for queues from different workload Shaohua Li
  2011-01-11 21:07 ` Corrado Zoccolo
@ 2011-01-17 14:26 ` Vivek Goyal
  2011-01-18 0:55 ` Shaohua Li
  1 sibling, 1 reply; 10+ messages in thread
From: Vivek Goyal @ 2011-01-17 14:26 UTC (permalink / raw)
  To: Shaohua Li
  Cc: lkml, Jens Axboe, jmoyer@redhat.com, Corrado Zoccolo, Gui Jianfeng

On Tue, Jan 11, 2011 at 04:51:56PM +0800, Shaohua Li wrote:
> [...]
> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But because
> of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
> It looks like preemption is currently completely broken whenever the two queues
> are not from the same workload type.
> The patch below fixes it. This might starve async queues, but that is what the
> old code did before cgroup support was added.

I am not sure how good an idea this is. Effectively, if there is a preemption
across workloads, the workload being preempted can now lose its share
completely and be starved. So it is not just about async WRITES: the
sync-noidle workload can starve the sync-idle workload too, as a metadata
request will preempt any regular sync-idle queue, and then the sync-idle
workload can be starved. So this is not exactly going back to the old CFQ
behavior, but something more than that.

On one hand you are going to great lengths to fix the issue where a preempted
cfqq loses its share (patch 2 in the series), and on the other hand you don't
mind all the queues in a workload losing their share due to preemption.

I think at least we should limit this to the async workload, or solve the
problem by giving an even smaller slice length to the async workload.

Thanks
Vivek

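[Note: purely to make "limit this to the async workload" concrete, a
hypothetical narrowing of the hunk in cfq_preempt_queue() could look like the
sketch below. This variant was not posted in the thread; old_cfqq is the
cfqd->active_queue saved at the top of the function, as in Shaohua's patch.]

	/*
	 * Hypothetical variant: only discard the saved workload slice when a
	 * sync queue preempts an async one, so that sync-noidle preemption
	 * cannot starve the sync-idle workload.
	 */
	if (old_cfqq && !cfq_cfqq_sync(old_cfqq) && cfq_cfqq_sync(cfqq))
		cfqq->cfqg->saved_workload_slice = 0;
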
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-17 14:26 ` Vivek Goyal
@ 2011-01-18 0:55 ` Shaohua Li
  0 siblings, 0 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-18 0:55 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: lkml, Jens Axboe, jmoyer@redhat.com, Corrado Zoccolo, Gui Jianfeng

On Mon, 2011-01-17 at 22:26 +0800, Vivek Goyal wrote:
> [...]
> I think at least we should limit this to the async workload, or solve the
> problem by giving an even smaller slice length to the async workload.

I don't have a strong opinion about this, but disabling preemption might harm
performance even for sync queues, for example in the cfq_rq_close() case.
Can we do something like this instead:

  - sync preempts async: expire the whole workload slice;
  - sync queue A preempts sync queue B: queue A's workload expires immediately
    after queue A itself is expired.

Thanks,
Shaohua
