Re: [PATCH] cfq: Make use of service count to estimate the rb_key offset

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Corrado Zoccolo <czoccolo@gmail.com>, Jens Axboe <jens.axboe@oracle.com>
Cc: Vivek Goyal <vgoyal@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cfq: Make use of service count to estimate the rb_key offset
Date: Mon, 30 Nov 2009 11:02:43 +0800	[thread overview]
Message-ID: <4B1335D3.7030807@cn.fujitsu.com> (raw)
In-Reply-To: <4e5e476b0911270016o6a6dfb08q7fefa2858a2ae8e6@mail.gmail.com>

Corrado Zoccolo wrote:
> Hi Guy,
> On Fri, Nov 27, 2009 at 2:42 AM, Gui Jianfeng
> <guijianfeng@cn.fujitsu.com> wrote:
>> Corrado Zoccolo wrote:
>>> Hi Gui, Jens
>>> On Thu, Nov 26, 2009 at 7:20 AM, Gui Jianfeng
>>> <guijianfeng@cn.fujitsu.com> wrote:
>>>> Hi Jens, Czoccolo
>>>>
>>>> For the moment, different workload cfq queues are put into different
>>>> service trees. But CFQ still uses "busy_queues" to estimate rb_key
>>>> offset when inserting a cfq queue into a service tree. I think this
>>>> isn't appropriate, and it should make use of service tree count to do
>>>> this estimation. This patch is for for-2.6.33 branch.
>>> In cfq_choose_wl, we rely on consistency of rb_keys across service
>>> trees to compute the next workload to be serviced.
>>>         for (i = 0; i < 3; ++i) {
>>>                 /* otherwise, select the one with lowest rb_key */
>>>                 queue = cfq_rb_first(service_tree_for(prio, i, cfqd));
>>>                 if (queue &&
>>>                     (!key_valid || time_before(queue->rb_key, lowest_key))) {
>>>                         lowest_key = queue->rb_key;
>>>                       cur_best = i;
>>>                       key_valid = true;
>>>               }
>>>         }
>>>
>>> If you change how the rb_key is computed (so it is no longer
>>> consistent across service trees) without changing how it is used can
>>> introduce problems.
>>  Ok, I think I was missing this part. This part still behaves like old CFQ regardless
>>  of workload type. I'm wondering why you prefer starting from sync no-idle only when
>>  priorities switched, after that, you do it like old CFQ behavior?
> 
> When switching priorities (e.g. from RT to BE), we may come from a
> long stall. In this case, I think it is better to run no-idle first.
> During normal operation, instead, we want a fair, starvation free way
> to switch between workloads, and I thought it was simpler to mimic old
> CFQ behaviour, instead of cook up a different method.
> The difference between new and old CFQ is that now, when we decide to
> service one no-idle request, we will then service subsequent ones from
> the same workload type.
> This allows processing them optimally on NCQ hardware.
> Moreover, when no more no-idle requests are available, but the
> timeslice for this workload did not expire yet, we will wait for more.
> This guarantees fairness for no-idle workload.
> 
>>  In order to improve
>>  latency for sync no-idle workload, is it possible to take workload type into account,
>>  not only rely on rb_keys across service trees?
> When loading a program into memory, your process will go through
> various phases w.r.t. disk access pattern: some are seeky, some others
> are sequential.
> 
> If you just improve latency for one workload, penalizing the others,
> you won't get an overall improvement of the system.
> The new scheme improves overall system behaviour because grouping
> no-idle requests together gives a better utilization of the disk, and
> fairness allows also processes making seeky requests to progress.
> Penalizing the idle service tree, instead, you will give you lower
> overall throughput (forbidding progress to the processes that make
> sequential requests), while penalizing writeback you will find
> yourself waiting for freeing dirty pages more often, and maybe
> incurring in OOM conditions.
> 
> Regarding the rb_key computation, I have done various experiments, and
> found that the actual formula doesn't matter much on rotational
> hardware, where the slice length has most importance.
> But I think it is essential on NCQ SSDs, to obtain fairness.
> Unfortunately, I don't have an NCQ SSD, so I can't test my improvement ideas.
> 

Corrado, thanks for the detailed explanation.

Jens, I think Corrado is right, we still need consistency of rb_keys across service
to compute next workload type. So would you revert this patch please?

Thanks,
Gui

next prev parent reply	other threads:[~2009-11-30  3:06 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-26  6:20 [PATCH] cfq: Make use of service count to estimate the rb_key offset Gui Jianfeng
2009-11-26  8:14 ` Jens Axboe
2009-11-26  9:08 ` Corrado Zoccolo
2009-11-27  1:42   ` Gui Jianfeng
2009-11-27  8:16     ` Corrado Zoccolo
2009-11-30  3:02       ` Gui Jianfeng [this message]
2009-11-30  8:38         ` Jens Axboe
2009-11-30 15:36   ` Vivek Goyal
2009-11-30 16:01     ` Corrado Zoccolo
2009-11-30 16:46       ` Vivek Goyal
2009-11-30 21:56         ` Corrado Zoccolo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B1335D3.7030807@cn.fujitsu.com \
    --to=guijianfeng@cn.fujitsu.com \
    --cc=czoccolo@gmail.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.