public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
@ 2011-01-11  8:51 Shaohua Li
  2011-01-11 21:07 ` Corrado Zoccolo
  2011-01-17 14:26 ` Vivek Goyal
  0 siblings, 2 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-11  8:51 UTC (permalink / raw)
  To: lkml
  Cc: Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Corrado Zoccolo,
	Gui Jianfeng

I got this:
             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
It looks like preemption is currently completely broken when the two queues
are not from the same workload type.
The patch below fixes it. This might cause async queue starvation, but it
matches what our old code did before cgroup support was added.
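To make the failure mode concrete, here is a toy model in plain userspace C. This is NOT kernel code: the struct and helper names only loosely follow cfq-iosched.c (the real paths are cfq_group_served() saving the slice and choose_service_tree() restoring it), and the slice accounting is simplified to integers.

```c
#include <assert.h>

/* Toy stand-ins for cfq_group and its workload-slice bookkeeping. */
enum wl_type { ASYNC_WL, SYNC_WL };

struct toy_group {
	enum wl_type serving_type;	/* workload currently being served */
	int workload_slice_left;	/* remaining slice of that workload */
	int saved_workload_slice;	/* slice saved when the group is requeued */
};

/* On expiry, the group saves whatever is left of its workload slice. */
static void toy_slice_expired(struct toy_group *g)
{
	g->saved_workload_slice = g->workload_slice_left;
}

/*
 * On the next selection, a saved slice is restored and the old workload
 * keeps running -- so a cross-workload preemption becomes a nop unless
 * the saved slice was zeroed first (which is what the patch does).
 */
static enum wl_type toy_choose_workload(struct toy_group *g, enum wl_type preempting)
{
	if (g->saved_workload_slice > 0) {
		g->workload_slice_left = g->saved_workload_slice;
		g->saved_workload_slice = 0;
		return g->serving_type;		/* preemption ignored */
	}
	g->serving_type = preempting;
	return preempting;			/* preemption takes effect */
}
```

With the saved slice left intact, the sync preemptor still sees the async workload selected; zeroing it on a cross-workload preempt, as the patch below does, lets the sync workload win.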

Signed-off-by: Shaohua Li <shaohua.li@intel.com>

---
 block/cfq-iosched.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux/block/cfq-iosched.c
===================================================================
--- linux.orig/block/cfq-iosched.c	2011-01-11 16:24:56.000000000 +0800
+++ linux/block/cfq-iosched.c	2011-01-11 16:41:49.000000000 +0800
@@ -3284,10 +3284,19 @@ cfq_should_preempt(struct cfq_data *cfqd
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
 	cfq_slice_expired(cfqd, 1);
 
 	/*
+	 * workload type is changed, don't save slice, otherwise preempt
+	 * doesn't happen
+	 */
+	if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
+		cfqq->cfqg->saved_workload_slice = 0;
+
+	/*
 	 * Put the new queue at the front of the of the current list,
 	 * so we know that it will be selected next.
 	 */



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-11  8:51 [PATCH 1/2]block cfq: make queue preempt work for queues from different workload Shaohua Li
@ 2011-01-11 21:07 ` Corrado Zoccolo
  2011-01-12  2:30   ` Shaohua Li
  2011-01-17 14:26 ` Vivek Goyal
  1 sibling, 1 reply; 10+ messages in thread
From: Corrado Zoccolo @ 2011-01-11 21:07 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

Hi Shaohua,
On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> I got this:
>             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
> because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
> It looks like preemption is currently completely broken when the two queues
> are not from the same workload type.
> The patch below fixes it. This might cause async queue starvation, but it
> matches what our old code did before cgroup support was added.
have you measured latency improvements by un-breaking preemption?
AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
added, and the latency before the changes that weakened preemption in
2.6.33 was far worse.

Thanks,
Corrado
>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
>
> ---
>  block/cfq-iosched.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
>
> Index: linux/block/cfq-iosched.c
> ===================================================================
> --- linux.orig/block/cfq-iosched.c      2011-01-11 16:24:56.000000000 +0800
> +++ linux/block/cfq-iosched.c   2011-01-11 16:41:49.000000000 +0800
> @@ -3284,10 +3284,19 @@ cfq_should_preempt(struct cfq_data *cfqd
>  */
>  static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  {
> +       struct cfq_queue *old_cfqq = cfqd->active_queue;
> +
>        cfq_log_cfqq(cfqd, cfqq, "preempt");
>        cfq_slice_expired(cfqd, 1);
>
>        /*
> +        * workload type is changed, don't save slice, otherwise preempt
> +        * doesn't happen
> +        */
> +       if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
> +               cfqq->cfqg->saved_workload_slice = 0;
> +
> +       /*
>         * Put the new queue at the front of the of the current list,
>         * so we know that it will be selected next.
>         */
>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-11 21:07 ` Corrado Zoccolo
@ 2011-01-12  2:30   ` Shaohua Li
  2011-01-14  4:44     ` Shaohua Li
  0 siblings, 1 reply; 10+ messages in thread
From: Shaohua Li @ 2011-01-12  2:30 UTC (permalink / raw)
  To: Corrado Zoccolo
  Cc: lkml, Jens Axboe, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

Hi,
On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
> Hi Shaohua,
> On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > I got this:
> >             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
> >             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
> >             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
> >             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
> >             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
> > cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
> > because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
> > It looks like preemption is currently completely broken when the two queues
> > are not from the same workload type.
> > The patch below fixes it. This might cause async queue starvation, but it
> > matches what our old code did before cgroup support was added.
> have you measured latency improvements by un-breaking preemption?
> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
> added, and the latency before the changes that weakened preemption in
> 2.6.33 was far worse.
Yes. I'm testing an SD card for MeeGo. Random write is very slow (~12KB/s) but
random read is relatively fast (>1MB/s).

Without patch:
write: (groupid=0, jobs=1): err= 0: pid=3876
  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
    bw (KB/s) : min=    0, max=   52, per=168.17%, avg=11.77, stdev= 8.85
read: (groupid=0, jobs=1): err= 0: pid=3877
  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
    bw (KB/s) : min=    1, max= 1413, per=66.42%, avg=567.22, stdev=461.50

With patch:
write: (groupid=0, jobs=1): err= 0: pid=4884
  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
    bw (KB/s) : min=    0, max=   31, per=inf%, avg= 8.40, stdev=12.78
read: (groupid=0, jobs=1): err= 0: pid=4885
  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
    bw (KB/s) : min=    2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59

Both read latency and throughput improve significantly with the patch, but
writes get starved.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-12  2:30   ` Shaohua Li
@ 2011-01-14  4:44     ` Shaohua Li
  2011-01-14  7:38       ` Jens Axboe
  2011-01-17  3:41       ` Gui Jianfeng
  0 siblings, 2 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-14  4:44 UTC (permalink / raw)
  To: Corrado Zoccolo, Jens Axboe
  Cc: lkml, Vivek Goyal, jmoyer@redhat.com, Gui Jianfeng

2011/1/12 Shaohua Li <shaohua.li@intel.com>:
> Hi,
> On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
>> Hi Shaohua,
>> On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>> > I got this:
>> >             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>> >             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>> >             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>> >             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>> >             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
>> > cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
>> > because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
>> > It looks like preemption is currently completely broken when the two queues
>> > are not from the same workload type.
>> > The patch below fixes it. This might cause async queue starvation, but it
>> > matches what our old code did before cgroup support was added.
>> have you measured latency improvements by un-breaking preemption?
>> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
>> added, and the latency before the changes that weakened preemption in
>> 2.6.33 was far worse.
> Yes. I'm testing an SD card for MeeGo. Random write is very slow (~12KB/s) but
> random read is relatively fast (>1MB/s).
>
> Without patch:
> write: (groupid=0, jobs=1): err= 0: pid=3876
>  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
>    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
>     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
>    bw (KB/s) : min=    0, max=   52, per=168.17%, avg=11.77, stdev= 8.85
> read: (groupid=0, jobs=1): err= 0: pid=3877
>  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
>    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
>    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
>     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
>    bw (KB/s) : min=    1, max= 1413, per=66.42%, avg=567.22, stdev=461.50
>
> With patch:
> write: (groupid=0, jobs=1): err= 0: pid=4884
>  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
>    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
>     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
>    bw (KB/s) : min=    0, max=   31, per=inf%, avg= 8.40, stdev=12.78
> read: (groupid=0, jobs=1): err= 0: pid=4885
>  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
>    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
>    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
>     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
>    bw (KB/s) : min=    2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59
>
> Both read latency and throughput improve significantly with the patch, but
> writes get starved.
Hi Jens and others,
What do you think of the patch?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-14  4:44     ` Shaohua Li
@ 2011-01-14  7:38       ` Jens Axboe
  2011-01-14 14:29         ` Jeff Moyer
  2011-01-17  3:41       ` Gui Jianfeng
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2011-01-14  7:38 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Corrado Zoccolo, lkml, Vivek Goyal, jmoyer@redhat.com,
	Gui Jianfeng

On 2011-01-14 05:44, Shaohua Li wrote:
> 2011/1/12 Shaohua Li <shaohua.li@intel.com>:
>> Hi,
>> On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
>>> Hi Shaohua,
>>> On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>> I got this:
>>>>             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>>>>             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>>>>             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>>>>             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>>>>             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
>>>> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
>>>> because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
>>>> It looks like preemption is currently completely broken when the two queues
>>>> are not from the same workload type.
>>>> The patch below fixes it. This might cause async queue starvation, but it
>>>> matches what our old code did before cgroup support was added.
>>> have you measured latency improvements by un-breaking preemption?
>>> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
>>> added, and the latency before the changes that weakened preemption in
>>> 2.6.33 was far worse.
>> Yes. I'm testing an SD card for MeeGo. Random write is very slow (~12KB/s) but
>> random read is relatively fast (>1MB/s).
>>
>> Without patch:
>> write: (groupid=0, jobs=1): err= 0: pid=3876
>>  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
>>    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
>>     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
>>    bw (KB/s) : min=    0, max=   52, per=168.17%, avg=11.77, stdev= 8.85
>> read: (groupid=0, jobs=1): err= 0: pid=3877
>>  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
>>    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
>>    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
>>     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
>>    bw (KB/s) : min=    1, max= 1413, per=66.42%, avg=567.22, stdev=461.50
>>
>> With patch:
>> write: (groupid=0, jobs=1): err= 0: pid=4884
>>  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
>>    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
>>     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
>>    bw (KB/s) : min=    0, max=   31, per=inf%, avg= 8.40, stdev=12.78
>> read: (groupid=0, jobs=1): err= 0: pid=4885
>>  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
>>    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
>>    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
>>     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
>>    bw (KB/s) : min=    2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59
>>
>> Both read latency and throughput improve significantly with the patch, but
>> writes get starved.
> Hi Jens and others,
> What do you think of the patch?

I think the patch is good. If preemption in some cases makes things
worse, then we need to look into those. That's a separate issue.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-14  7:38       ` Jens Axboe
@ 2011-01-14 14:29         ` Jeff Moyer
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Moyer @ 2011-01-14 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, Corrado Zoccolo, lkml, Vivek Goyal, Gui Jianfeng

Jens Axboe <jaxboe@fusionio.com> writes:

> On 2011-01-14 05:44, Shaohua Li wrote:
>> 2011/1/12 Shaohua Li <shaohua.li@intel.com>:
>>> Hi,
>>> On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
>>>> Hi Shaohua,
>>>> On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>> I got this:
>>>>>             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>>>>>             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>>>>>             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>>>>>             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>>>>>             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
>>>>> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
>>>>> because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
>>>>> It looks like preemption is currently completely broken when the two queues
>>>>> are not from the same workload type.
>>>>> The patch below fixes it. This might cause async queue starvation, but it
>>>>> matches what our old code did before cgroup support was added.
>>>> have you measured latency improvements by un-breaking preemption?
>>>> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
>>>> added, and the latency before the changes that weakened preemption in
>>>> 2.6.33 was far worse.
>>> Yes. I'm testing an SD card for MeeGo. Random write is very slow (~12KB/s) but
>>> random read is relatively fast (>1MB/s).
>>>
>>> Without patch:
>>> write: (groupid=0, jobs=1): err= 0: pid=3876
>>>  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
>>>    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
>>>     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
>>>    bw (KB/s) : min=    0, max=   52, per=168.17%, avg=11.77, stdev= 8.85
>>> read: (groupid=0, jobs=1): err= 0: pid=3877
>>>  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
>>>    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
>>>    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
>>>     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
>>>    bw (KB/s) : min=    1, max= 1413, per=66.42%, avg=567.22, stdev=461.50
>>>
>>> With patch:
>>> write: (groupid=0, jobs=1): err= 0: pid=4884
>>>  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
>>>    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
>>>     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
>>>    bw (KB/s) : min=    0, max=   31, per=inf%, avg= 8.40, stdev=12.78
>>> read: (groupid=0, jobs=1): err= 0: pid=4885
>>>  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
>>>    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
>>>    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
>>>     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
>>>    bw (KB/s) : min=    2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59
>>>
>>> Both read latency and throughput improve significantly with the patch, but
>>> writes get starved.
>> Hi Jens and others,
>> What do you think of the patch?
>
> I think the patch is good. If preemption in some cases makes things
> worse, then we need to look into those. That's a separate issue.

I think things are getting pretty messy.  Who would have guessed that
the saved_workload_slice would affect preemption?  If a queue is to
preempt another, can't we make that a bit more explicit?  I'm still
trying to walk through the code to figure out how this ends up
happening, as the patch description leaves a lot out of the picture.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-14  4:44     ` Shaohua Li
  2011-01-14  7:38       ` Jens Axboe
@ 2011-01-17  3:41       ` Gui Jianfeng
  2011-01-17  8:41         ` Shaohua Li
  1 sibling, 1 reply; 10+ messages in thread
From: Gui Jianfeng @ 2011-01-17  3:41 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Corrado Zoccolo, Jens Axboe, lkml, Vivek Goyal, jmoyer@redhat.com

Shaohua Li wrote:
> 2011/1/12 Shaohua Li <shaohua.li@intel.com>:
>> Hi,
>> On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
>>> Hi Shaohua,
>>> On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>> I got this:
>>>>             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>>>>             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>>>>             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>>>>             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>>>>             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
>>>> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
>>>> because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
>>>> It looks like preemption is currently completely broken when the two queues
>>>> are not from the same workload type.
>>>> The patch below fixes it. This might cause async queue starvation, but it
>>>> matches what our old code did before cgroup support was added.
>>> have you measured latency improvements by un-breaking preemption?
>>> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
>>> added, and the latency before the changes that weakened preemption in
>>> 2.6.33 was far worse.
>> Yes. I'm testing an SD card for MeeGo. Random write is very slow (~12KB/s) but
>> random read is relatively fast (>1MB/s).
>>
>> Without patch:
>> write: (groupid=0, jobs=1): err= 0: pid=3876
>>  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
>>    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
>>     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
>>    bw (KB/s) : min=    0, max=   52, per=168.17%, avg=11.77, stdev= 8.85
>> read: (groupid=0, jobs=1): err= 0: pid=3877
>>  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
>>    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
>>    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
>>     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
>>    bw (KB/s) : min=    1, max= 1413, per=66.42%, avg=567.22, stdev=461.50
>>
>> With patch:
>> write: (groupid=0, jobs=1): err= 0: pid=4884
>>  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
>>    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
>>     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
>>    bw (KB/s) : min=    0, max=   31, per=inf%, avg= 8.40, stdev=12.78
>> read: (groupid=0, jobs=1): err= 0: pid=4885
>>  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
>>    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
>>    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
>>     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
>>    bw (KB/s) : min=    2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59
>>
>> Both read latency and throughput improve significantly with the patch, but
>> writes get starved.
> Hi Jens and others,
> What do you think of the patch?

Furthermore, consider the following piece of code.

2132         /*
2133          * For RT and BE, we have to choose also the type
2134          * (SYNC, SYNC_NOIDLE, ASYNC), and to compute a workload
2135          * expiration time
2136          */
2137         st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type);
2138         count = st->count;
2139 
2140         /*
2141          * check workload expiration, and that we still have other queues ready
2142          */
2143         if (count && !time_after(jiffies, cfqd->workload_expires))
2144                 return;

Here, cfqd->serving_prio might have changed, but we continue to check workload
expiration to decide whether to let the old workload run. I don't think this
makes much sense. If cfqd->serving_prio has changed, I think we should
recalculate the workload type. Am I missing something?

Thanks
Gui

> 
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-17  3:41       ` Gui Jianfeng
@ 2011-01-17  8:41         ` Shaohua Li
  0 siblings, 0 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-17  8:41 UTC (permalink / raw)
  To: Gui Jianfeng
  Cc: Corrado Zoccolo, Jens Axboe, lkml, Vivek Goyal, jmoyer@redhat.com

2011/1/17 Gui Jianfeng <guijianfeng@cn.fujitsu.com>:
> Shaohua Li wrote:
>> 2011/1/12 Shaohua Li <shaohua.li@intel.com>:
>>> Hi,
>>> On Wed, Jan 12, 2011 at 05:07:47AM +0800, Corrado Zoccolo wrote:
>>>> Hi Shaohua,
>>>> On Tue, Jan 11, 2011 at 9:51 AM, Shaohua Li <shaohua.li@intel.com> wrote:
>>>>> I got this:
>>>>>             fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>>>>>             fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>>>>>             fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>>>>>             fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>>>>>             fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
>>>>> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
>>>>> because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
>>>>> It looks like preemption is currently completely broken when the two queues
>>>>> are not from the same workload type.
>>>>> The patch below fixes it. This might cause async queue starvation, but it
>>>>> matches what our old code did before cgroup support was added.
>>>> have you measured latency improvements by un-breaking preemption?
>>>> AFAIK, preemption behaviour changed since 2.6.33, before cgroups were
>>>> added, and the latency before the changes that weakened preemption in
>>>> 2.6.33 was far worse.
>>> Yes. I'm testing an SD card for MeeGo. Random write is very slow (~12KB/s) but
>>> random read is relatively fast (>1MB/s).
>>>
>>> Without patch:
>>> write: (groupid=0, jobs=1): err= 0: pid=3876
>>>  write: io=966656 B, bw=8054 B/s, iops=1 , runt=120008msec
>>>    clat (usec): min=5 , max=1716.3K, avg=88637.38, stdev=207100.44
>>>     lat (usec): min=5 , max=1716.3K, avg=88637.69, stdev=207100.41
>>>    bw (KB/s) : min=    0, max=   52, per=168.17%, avg=11.77, stdev= 8.85
>>> read: (groupid=0, jobs=1): err= 0: pid=3877
>>>  read : io=52516KB, bw=448084 B/s, iops=109 , runt=120014msec
>>>    slat (usec): min=7 , max=1918.5K, avg=519.78, stdev=25777.85
>>>    clat (msec): min=1 , max=2728 , avg=71.17, stdev=216.92
>>>     lat (msec): min=1 , max=2756 , avg=71.69, stdev=219.52
>>>    bw (KB/s) : min=    1, max= 1413, per=66.42%, avg=567.22, stdev=461.50
>>>
>>> With patch:
>>> write: (groupid=0, jobs=1): err= 0: pid=4884
>>>  write: io=81920 B, bw=677 B/s, iops=0 , runt=120983msec
>>>    clat (usec): min=13 , max=742976 , avg=155694.10, stdev=244610.02
>>>     lat (usec): min=13 , max=742976 , avg=155694.50, stdev=244609.89
>>>    bw (KB/s) : min=    0, max=   31, per=inf%, avg= 8.40, stdev=12.78
>>> read: (groupid=0, jobs=1): err= 0: pid=4885
>>>  read : io=133008KB, bw=1108.3KB/s, iops=277 , runt=120022msec
>>>    slat (usec): min=8 , max=1159.1K, avg=164.24, stdev=9116.65
>>>    clat (msec): min=1 , max=1988 , avg=28.34, stdev=55.81
>>>     lat (msec): min=1 , max=1989 , avg=28.51, stdev=57.51
>>>    bw (KB/s) : min=    2, max= 1808, per=51.10%, avg=1133.42, stdev=275.59
>>>
>>> Both read latency and throughput improve significantly with the patch, but
>>> writes get starved.
>> Hi Jens and others,
>> What do you think of the patch?
>
> Furthermore, consider the following piece of code.
>
> 2132         /*
> 2133          * For RT and BE, we have to choose also the type
> 2134          * (SYNC, SYNC_NOIDLE, ASYNC), and to compute a workload
> 2135          * expiration time
> 2136          */
> 2137         st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type);
> 2138         count = st->count;
> 2139
> 2140         /*
> 2141          * check workload expiration, and that we still have other queues ready
> 2142          */
> 2143         if (count && !time_after(jiffies, cfqd->workload_expires))
> 2144                 return;
>
> Here, cfqd->serving_prio might have changed, but we continue to check workload
> expiration to decide whether to let the old workload run. I don't think this
> makes much sense. If cfqd->serving_prio has changed, I think we should
> recalculate the workload type. Am I missing something?
This is already fixed in the latest git by commit
e4ea0c16a85d221ebcc3a21f32e321440459e0fc.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-11  8:51 [PATCH 1/2]block cfq: make queue preempt work for queues from different workload Shaohua Li
  2011-01-11 21:07 ` Corrado Zoccolo
@ 2011-01-17 14:26 ` Vivek Goyal
  2011-01-18  0:55   ` Shaohua Li
  1 sibling, 1 reply; 10+ messages in thread
From: Vivek Goyal @ 2011-01-17 14:26 UTC (permalink / raw)
  To: Shaohua Li
  Cc: lkml, Jens Axboe, jmoyer@redhat.com, Corrado Zoccolo,
	Gui Jianfeng

On Tue, Jan 11, 2011 at 04:51:56PM +0800, Shaohua Li wrote:
> I got this:
>              fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
>              fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
>              fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
>              fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
>              fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
> cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
> because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
> It looks like preemption is currently completely broken when the two queues
> are not from the same workload type.
> The patch below fixes it. This might cause async queue starvation, but it
> matches what our old code did before cgroup support was added.

I am not sure how good an idea this is. Effectively, if there is a
preemption across workloads, the workload being preempted can now lose its
share completely and be starved. And it is not just about async WRITEs:
sync-noidle can starve the sync-idle workload too, since a metadata request
will preempt any regular sync-idle queue, and then the sync-idle workload
can be starved. So this is not exactly going back to the old CFQ behavior
but something more than that.

On the one hand you are going to great lengths to fix the case where a
preempted cfqq loses its share (patch 2 in the series), and on the other
hand you don't mind all the queues in a workload losing their share due
to preemption.

I think at least we should limit this to the async workload, or solve the
problem by giving the async workload an even smaller slice length.
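The narrower version could be as simple as making the reset conditional on the preempted workload being async. A hypothetical sketch in toy userspace C (the enum and predicate are made up for illustration, not taken from cfq-iosched.c):

```c
#include <stdbool.h>

/* Toy workload classes mirroring CFQ's async / sync-noidle / sync-idle. */
enum toy_wl { TOY_ASYNC, TOY_SYNC_NOIDLE, TOY_SYNC_IDLE };

/*
 * Hypothetical predicate: zero the saved workload slice only when the
 * preempted workload is async, so sync-noidle preemption (e.g. by
 * metadata requests) cannot starve the sync-idle workload.
 */
static bool toy_drop_saved_slice(enum toy_wl old_wl, enum toy_wl new_wl)
{
	return old_wl != new_wl && old_wl == TOY_ASYNC;
}
```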

Thanks
Vivek

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2]block cfq: make queue preempt work for queues from different workload
  2011-01-17 14:26 ` Vivek Goyal
@ 2011-01-18  0:55   ` Shaohua Li
  0 siblings, 0 replies; 10+ messages in thread
From: Shaohua Li @ 2011-01-18  0:55 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: lkml, Jens Axboe, jmoyer@redhat.com, Corrado Zoccolo,
	Gui Jianfeng

On Mon, 2011-01-17 at 22:26 +0800, Vivek Goyal wrote:
> On Tue, Jan 11, 2011 at 04:51:56PM +0800, Shaohua Li wrote:
> > I got this:
> >              fio-874   [007]  2157.724514:   8,32   m   N cfq874 preempt
> >              fio-874   [007]  2157.724519:   8,32   m   N cfq830 slice expired t=1
> >              fio-874   [007]  2157.724520:   8,32   m   N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0
> >              fio-874   [007]  2157.724521:   8,32   m   N cfq830 set_active wl_prio:0 wl_type:0
> >              fio-874   [007]  2157.724522:   8,32   m   N cfq830 Not idling. st->count:1
> > cfq830 is an async queue, and it is preempted by a sync queue, cfq874. But
> > because of the cfqg->saved_workload_slice mechanism, the preemption is a nop.
> > It looks like preemption is currently completely broken when the two queues
> > are not from the same workload type.
> > The patch below fixes it. This might cause async queue starvation, but it
> > matches what our old code did before cgroup support was added.
> 
> I am not sure how good an idea this is. Effectively, if there is a
> preemption across workloads, the workload being preempted can now lose its
> share completely and be starved. And it is not just about async WRITEs:
> sync-noidle can starve the sync-idle workload too, since a metadata request
> will preempt any regular sync-idle queue, and then the sync-idle workload
> can be starved. So this is not exactly going back to the old CFQ behavior
> but something more than that.
>
> On the one hand you are going to great lengths to fix the case where a
> preempted cfqq loses its share (patch 2 in the series), and on the other
> hand you don't mind all the queues in a workload losing their share due
> to preemption.
>
> I think at least we should limit this to the async workload, or solve the
> problem by giving the async workload an even smaller slice length.
I don't have a strong opinion on this, but disabling preemption might hurt
performance even for sync queues, for example in the cfq_rq_close() case.
Can we do something like this:
- sync preempts async: expire the whole workload slice
- sync queue A preempts sync queue B: queue A's workload will expire
  immediately after queue A is expired.
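A rough sketch of those two rules in toy userspace C (not a patch; the state struct and helper are made up for illustration and do not exist in cfq-iosched.c):

```c
#include <stdbool.h>

enum toy_q { TOYQ_ASYNC, TOYQ_SYNC };

struct toy_sched {
	int workload_slice_left;	/* remaining slice of the preempted workload */
	bool expire_wl_with_queue;	/* expire preemptor's workload when its queue expires */
};

/* Toy illustration of the proposed preemption policy. */
static void toy_preempt(struct toy_sched *s, enum toy_q old_q, enum toy_q new_q)
{
	if (new_q != TOYQ_SYNC)
		return;				/* only sync queues preempt here */
	if (old_q == TOYQ_ASYNC)
		s->workload_slice_left = 0;	/* rule 1: drop the whole async
						   workload slice */
	else
		s->expire_wl_with_queue = true;	/* rule 2: expire queue A's workload
						   right after queue A expires */
}
```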

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-01-18  0:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-11  8:51 [PATCH 1/2]block cfq: make queue preempt work for queues from different workload Shaohua Li
2011-01-11 21:07 ` Corrado Zoccolo
2011-01-12  2:30   ` Shaohua Li
2011-01-14  4:44     ` Shaohua Li
2011-01-14  7:38       ` Jens Axboe
2011-01-14 14:29         ` Jeff Moyer
2011-01-17  3:41       ` Gui Jianfeng
2011-01-17  8:41         ` Shaohua Li
2011-01-17 14:26 ` Vivek Goyal
2011-01-18  0:55   ` Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox