* [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Shaohua Li @ 2011-01-11 8:51 UTC (permalink / raw)
To: lkml
Cc: Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Corrado Zoccolo,
Gui Jianfeng
I got this:
fio-874 [007] 2157.724514: 8,32 m N cfq874 preempt
fio-874 [007] 2157.724519: 8,32 m N cfq830 slice expired t=1
fio-874 [007] 2157.724520: 8,32 m N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
fio-874 [007] 2157.724521: 8,32 m N cfq830 set_active wl_prio:0 wl_type:0
fio-874 [007] 2157.724522: 8,32 m N cfq830 Not idling. st->count:1
cfq830 is an async queue, preempted by a sync queue, cfq874. But because of the
cfqg->saved_workload_slice mechanism, the preempt is a nop.
It looks like preemption is currently completely broken whenever the two queues
are not from the same workload type.
The patch below fixes it. It might cause async queue starvation, but that is
what our old code did before cgroup support was added.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
block/cfq-iosched.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux/block/cfq-iosched.c
===================================================================
--- linux.orig/block/cfq-iosched.c 2011-01-11 16:24:56.000000000 +0800
+++ linux/block/cfq-iosched.c 2011-01-11 16:41:49.000000000 +0800
@@ -3284,10 +3284,19 @@ cfq_should_preempt(struct cfq_data *cfqd
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
 	cfq_slice_expired(cfqd, 1);
 
 	/*
+	 * workload type is changed, don't save slice, otherwise preempt
+	 * doesn't happen
+	 */
+	if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
+		cfqq->cfqg->saved_workload_slice = 0;
+
+	/*
 	 * Put the new queue at the front of the of the current list,
 	 * so we know that it will be selected next.
 	 */
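The mechanism behind the nop can be sketched with a toy model. Everything below (toy_group, toy_sched, the toy_* helpers) is a hypothetical simplification, not the real cfq-iosched code: cfq_slice_expired() stores the remaining slice of the current workload in the group, the next workload selection resumes any saved workload before honoring the preempting queue's type, and the patch's fix clears the saved slice when the workload type differs.

```c
#include <assert.h>

/* Hypothetical simplified workload types, mirroring cfq's ASYNC/SYNC split. */
enum wl_type { ASYNC_WL, SYNC_NOIDLE_WL, SYNC_WL };

struct toy_group {
	enum wl_type saved_workload;		/* models cfqg->saved_workload */
	unsigned int saved_workload_slice;	/* models cfqg->saved_workload_slice */
};

struct toy_sched {
	struct toy_group *grp;
	enum wl_type serving_type;		/* models cfqd->serving_type */
};

/* Models cfq_slice_expired(): remember where the old workload left off. */
static void toy_slice_expired(struct toy_sched *s, unsigned int remaining)
{
	s->grp->saved_workload = s->serving_type;
	s->grp->saved_workload_slice = remaining;
}

/* Models workload selection for the group: a saved slice is resumed first,
 * which is what turned a cross-workload preempt into a nop. */
static void toy_choose_workload(struct toy_sched *s, enum wl_type wanted)
{
	if (s->grp->saved_workload_slice) {
		s->serving_type = s->grp->saved_workload;
		s->grp->saved_workload_slice = 0;
	} else {
		s->serving_type = wanted;
	}
}

/* The patch's fix: discard the saved slice when the workload type changes. */
static void toy_preempt_fixup(struct toy_sched *s, enum wl_type new_type)
{
	if (s->serving_type != new_type)
		s->grp->saved_workload_slice = 0;
}
```

Without the clearing step, the sync queue's workload is never selected after the expiry, which is exactly the "preempt then re-activate cfq830" sequence visible in the trace above.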
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Corrado Zoccolo @ 2011-01-11 21:07 UTC (permalink / raw)
To: Shaohua Li; +Cc: lkml, Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng
Hi Shaohua,
On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> I got this:
> fio-874 [007] 2157.724514: 8,32 m N cfq874 preempt
> fio-874 [007] 2157.724519: 8,32 m N cfq830 slice expired t=1
> fio-874 [007] 2157.724520: 8,32 m N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
> fio-874 [007] 2157.724521: 8,32 m N cfq830 set_active wl_prio:0 wl_type:0
> fio-874 [007] 2157.724522: 8,32 m N cfq830 Not idling. st->count:1
> cfq830 is an async queue, and preempted by a sync queue cfq874. But since we
> have cfqg->saved_workload_slice mechanism, the preempt is a nop.
> Looks currently our preempt is totally broken if the two queues are not from
> the same workload type.
> Below patch fixes it. This will might make async queue starvation, but it's
> what our old code does before cgroup is added.
Have you measured latency improvements from un-breaking preemption?
AFAIK, preemption behaviour has changed since 2.6.33, before cgroups were
added, and latency before the changes that weakened preemption in
2.6.33 was far worse.
Thanks,
Corrado
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Shaohua Li @ 2011-01-12 2:30 UTC (permalink / raw)
To: Corrado Zoccolo
Cc: lkml, Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng
Hi,
On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
> [...]
> Have you measured latency improvements from un-breaking preemption?
> AFAIK, preemption behaviour has changed since 2.6.33, before cgroups were
> added, and latency before the changes that weakened preemption in
> 2.6.33 was far worse.
Yes. I'm testing an SD card for MeeGo. Random writes are very slow (~12KB/s) but
random reads are relatively fast (> 1MB/s).
Without patch:
write: (groupid=0, jobs=1): err= 0: pid=3876
write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
bw (KB/s) : min= 0, max= 52, per=168.17%, avg=11.77, stdev= 8.85
read: (groupid=0, jobs=1): err= 0: pid=3877
read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
bw (KB/s) : min= 1, max= 1413, per=66.42%, avg=567.22, stdev=461.50
With patch:
write: (groupid=0, jobs=1): err= 0: pid=4884
write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
bw (KB/s) : min= 0, max= 31, per=inf%, avg= 8.40, stdev=12.78
read: (groupid=0, jobs=1): err= 0: pid=4885
read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
bw (KB/s) : min= 2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59
Both read latency and throughput differ significantly with the patch, but
writes get starved.
Thanks,
Shaohua
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Shaohua Li @ 2011-01-14 4:44 UTC (permalink / raw)
To: Corrado Zoccolo, Jens Axboe
Cc: lkml, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng
2011/1/12 Shaohua Li <shaohua.li@intel.com>:
> [...]
Hi Jens and others,
What do you think about the patch?
Thanks,
Shaohua
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Jens Axboe @ 2011-01-14 7:38 UTC (permalink / raw)
To: Shaohua Li
Cc: Corrado Zoccolo, lkml, Vivek Goyal, jmoyer@redhat.com,
Gui Jianfeng
On 2011-01-14 05:44, Shaohua Li wrote:
> [...]
> Hi Jens and others,
> What do you think about the patch?
I think the patch is good. If preemption in some cases makes things
worse, then we need to look into those. That's a separate issue.
--
Jens Axboe
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Jeff Moyer @ 2011-01-14 14:29 UTC (permalink / raw)
To: Jens Axboe; +Cc: Shaohua Li, Corrado Zoccolo, lkml, Vivek Goyal, Gui Jianfeng
Jens Axboe <jaxboe@fusionio.com> writes:
> [...]
> I think the patch is good. If preemption in some cases makes things
> worse, then we need to look into those. That's a separate issue.
I think things are getting pretty messy. Who would have guessed that
the saved_workload_slice would affect preemption? If a queue is to
preempt another, can't we make that a bit more explicit? I'm still
trying to walk through the code to figure out how this ends up
happening, as the patch description leaves a lot out of the picture.
Cheers,
Jeff
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Gui Jianfeng @ 2011-01-17 3:41 UTC (permalink / raw)
To: Shaohua Li
Cc: Corrado Zoccolo, Jens Axboe, lkml, Vivek Goyal, jmoyer@redhat.com
Shaohua Li wrote:
> [...]
> Hi Jens and others,
> What do you think about the patch?
Furthermore, consider the following piece of code (in choose_service_tree()):
	/*
	 * For RT and BE, we have to choose also the type
	 * (SYNC, SYNC_NOIDLE, ASYNC), and to compute a workload
	 * expiration time
	 */
	st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type);
	count = st->count;

	/*
	 * check workload expiration, and that we still have other queues ready
	 */
	if (count && !time_after(jiffies, cfqd->workload_expires))
		return;
Here, cfqd->serving_prio might have changed, but we still check workload
expiration to decide whether to let the old workload keep running. I don't
think that makes much sense: if cfqd->serving_prio has changed, we should
recalculate the workload type. Am I missing something?
Thanks
Gui
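The recalculation Gui asks for can be sketched as follows. keep_old_workload() is a hypothetical helper, not the upstream diff: the idea is simply that an unexpired workload is honored only if the serving priority class it was chosen under is unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified priority classes, mirroring cfq's RT/BE/IDLE. */
enum toy_prio { RT_WL, BE_WL, IDLE_WL };

/*
 * Returns true when the previously selected workload should keep running:
 * the priority class must be the one the workload was chosen under, there
 * must still be queues on its service tree, and it must not have expired.
 */
static bool keep_old_workload(enum toy_prio original_prio,
			      enum toy_prio serving_prio,
			      unsigned long now,
			      unsigned long workload_expires,
			      unsigned int queue_count)
{
	if (serving_prio != original_prio)
		return false;	/* prio changed: pick a new workload type */
	return queue_count && now <= workload_expires;
}
```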
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Shaohua Li @ 2011-01-17 8:41 UTC (permalink / raw)
To: Gui Jianfeng
Cc: Corrado Zoccolo, Jens Axboe, lkml, Vivek Goyal, jmoyer@redhat.com
2011/1/17 Gui Jianfeng <guijianfeng@cn.fujitsu.com>:
> [...]
> Furthermore, consider the following piece of code (in choose_service_tree()):
>
>	/*
>	 * For RT and BE, we have to choose also the type
>	 * (SYNC, SYNC_NOIDLE, ASYNC), and to compute a workload
>	 * expiration time
>	 */
>	st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type);
>	count = st->count;
>
>	/*
>	 * check workload expiration, and that we still have other queues ready
>	 */
>	if (count && !time_after(jiffies, cfqd->workload_expires))
>		return;
>
> Here, cfqd->serving_prio might have changed, but we still check workload
> expiration to decide whether to let the old workload keep running. I don't
> think that makes much sense: if cfqd->serving_prio has changed, we should
> recalculate the workload type. Am I missing something?
This is already fixed in the latest git tree by commit
e4ea0c16a85d221ebcc3a21f32e321440459e0fc.
Thanks,
Shaohua
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Vivek Goyal @ 2011-01-17 14:26 UTC (permalink / raw)
To: Shaohua Li
Cc: lkml, Jens Axboe, jmoyer@redhat.com, Corrado Zoccolo,
Gui Jianfeng
On Tue, Jan 11, 2011 at 04:51:56PM +0800, Shaohua Li wrote:
> [...]
I am not sure how good an idea this is. Effectively, if there is a
preemption across workloads, the workload being preempted can lose its
share completely and be starved. So it is not just about async WRITES:
sync-noidle can starve the sync-idle workload too, since a metadata request
will preempt any regular sync-idle queue, and then the sync-idle workload
can be starved. So this is not exactly going back to the old CFQ behavior,
but something more than that.
On one hand you are going to great lengths to fix the issue where a
preempted cfqq loses its share (patch 2 in the series), and on the other
hand you don't mind all the queues in a workload losing their share due to
preemption.
I think at least we should limit this to the async workload, or solve the
problem by giving an even smaller slice length to the async workload.
Thanks
Vivek
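Vivek's "limit this to the async workload" alternative could look roughly like the following. should_discard_saved_slice() is a hypothetical helper, not code from the series: it gates the patch's new branch so only a preempted *async* workload forfeits its saved slice, while sync-noidle preempting sync-idle keeps today's resume behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified workload types, mirroring cfq's ASYNC/SYNC split. */
enum wl_type { ASYNC_WL, SYNC_NOIDLE_WL, SYNC_WL };

/*
 * Discard the group's saved workload slice only when the preempt crosses
 * workload types AND the preempted workload was async, so sync-noidle
 * preemption cannot starve the sync-idle workload.
 */
static bool should_discard_saved_slice(enum wl_type old_type,
				       enum wl_type new_type)
{
	return old_type != new_type && old_type == ASYNC_WL;
}
```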
* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
From: Shaohua Li @ 2011-01-18 0:55 UTC (permalink / raw)
To: Vivek Goyal
Cc: lkml, Jens Axboe, jmoyer@redhat.com, Corrado Zoccolo,
Gui Jianfeng
On Mon, 2011-01-17 at 22:26 +0800, Vivek Goyal wrote:
> [...]
>
> I am not sure how good an idea this is. Effectively, if there is a
> preemption across workloads, the workload being preempted can lose its
> share completely and be starved. So it is not just about async WRITES:
> sync-noidle can starve the sync-idle workload too, since a metadata request
> will preempt any regular sync-idle queue, and then the sync-idle workload
> can be starved. So this is not exactly going back to the old CFQ behavior,
> but something more than that.
>
> On one hand you are going to great lengths to fix the issue where a
> preempted cfqq loses its share (patch 2 in the series), and on the other
> hand you don't mind all the queues in a workload losing their share due to
> preemption.
>
> I think at least we should limit this to the async workload, or solve the
> problem by giving an even smaller slice length to the async workload.
I don't have a strong opinion on this, but disabling preemption might hurt
performance even for sync queues, for example in the cfq_rq_close() case.
Can we do something like this instead:
sync preempts async: expire the whole workload slice;
sync queue A preempts sync queue B: queue A's workload will expire
immediately after queue A is expired.
Thanks,
Shaohua
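The two proposed rules can be sketched as a small policy function. The names below (preempt_policy and the action enum) are hypothetical, and "sync" here lumps together cfq's SYNC and SYNC_NOIDLE workload types:

```c
#include <assert.h>

/* Hypothetical simplified workload types, mirroring cfq's ASYNC/SYNC split. */
enum wl_type { ASYNC_WL, SYNC_NOIDLE_WL, SYNC_WL };

enum preempt_action {
	EXPIRE_WORKLOAD_NOW,		/* expire the whole workload slice */
	EXPIRE_WORKLOAD_AFTER_QUEUE,	/* expire it once the preempting
					 * queue itself is expired */
};

/*
 * The proposal:
 *  - a sync queue preempting an async workload expires that workload
 *    slice right away;
 *  - a sync queue preempting a different sync workload runs first, and
 *    its workload expires immediately after the queue is expired.
 */
static enum preempt_action preempt_policy(enum wl_type old_type,
					  enum wl_type new_type)
{
	int old_sync = old_type != ASYNC_WL;
	int new_sync = new_type != ASYNC_WL;

	if (new_sync && !old_sync)
		return EXPIRE_WORKLOAD_NOW;
	return EXPIRE_WORKLOAD_AFTER_QUEUE;
}
```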