From: Suresh Jayaraman <sjayaraman@suse.de>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Moyer Jeff Moyer <jmoyer@redhat.com>,
	Shaohua Li <shaohua.li@intel.com>
Subject: Re: [PATCH] cfq-iosched: Add some more documentation about idling
Date: Mon, 01 Aug 2011 23:55:12 +0530
Message-ID: <4E36EF88.6040704@suse.de>
In-Reply-To: <20110801155523.GE3805@redhat.com>

On 08/01/2011 09:25 PM, Vivek Goyal wrote:
> There are always questions about why CFQ is idling on various conditions.
> Recent ones is Christoph asking again why to idle on REQ_NOIDLE. His
> assertion is that XFS is relying more and more on workqueues and is
> concerned that CFQ idling on IO from every workqueue will impact
> XFS badly.
> 
> So he suggested that I add some more documentation about CFQ idling
> and that can provide more clarity on the topic and also gives an
> opprotunity to poke a hole in theory and lead to improvements.
> 
> So here is my attempt at that. Any comments are welcome.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  Documentation/block/cfq-iosched.txt |   70 +++++++++++++++++++++++++++++++++++
>  1 files changed, 70 insertions(+), 0 deletions(-)

The patch looks good and documents idling nicely. A few minor nits...

> diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
> index e578fee..7ce81b8 100644
> --- a/Documentation/block/cfq-iosched.txt
> +++ b/Documentation/block/cfq-iosched.txt
> @@ -43,3 +43,73 @@ If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
>  to IOPS mode and starts providing fairness in terms of number of requests
>  dispatched. Note that this mode switching takes effect only for group
>  scheduling. For non-cgroup users nothing should change.
> +
> +CFQ IO scheduler Idling Theory
> +==============================
> +Idling on a queue is primarily about waiting for next request to come on
                                                  ^^ the ?
> +same queue after completion of a request. In this process CFQ will not
> +dispatch requests from other cfq queues even if requests are pending
> +there.
> +
> +The rationale behind idling is that it can cut down on number of seeks
> +on rotational media. For example, if a process is doing dependent
> +sequential reads (next read will come on only after completion of previous
> +one), then not dispatching request from other queue sould help as we
                                                      ^^ should
> +did not move the disk head and kept on dispatching sequential IO from
> +one queue.
> +
> +CFQ does not do idling on all the queues. It primarily tries to do idling
> +on queues which are doing synchronous sequential IO. The synchronous
> +queues which are not doing sequential IO are put on a separate service
> +tree (called sync-noidle tree) where we do not idle on individual
> +cfq queue, but idle on the whole tree or IOW, idle on a group of cfq
> +queues.
> +
> +CFQ has following tree service trees and various queues are put on these
                    ^^ extraneous "tree" ?

There seems to be some redundant information between the paragraphs above
and below. Is there room for brevity?

> +trees.
> +
> +	sync-idle	sync-noidle	async 
> +
> +All cfq queues doing synchronous sequential IO go on to sync-idle tree.
> +On this tree we idle on each queue individually.
> +
> +All synchronous non-sequential queues go on sync-noidle tree. Also any
> +request which are marked with REQ_NOIDLE go on this service tree.
> +
> +All async writes go on async service tree. There is no idling on async
> +queues.
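
As a rough illustration of the placement rules quoted above, here is a
minimal Python sketch. The names QueueInfo and pick_service_tree are
hypothetical and do not correspond to identifiers in cfq-iosched.c, and
checking for async before REQ_NOIDLE is an assumption; the text only
states where each kind of queue ends up.

```python
from dataclasses import dataclass


@dataclass
class QueueInfo:
    """Hypothetical summary of a cfq queue, for illustration only."""
    sync: bool                 # queue issues synchronous IO
    sequential: bool           # IO pattern looks sequential
    req_noidle: bool = False   # requests are marked REQ_NOIDLE


def pick_service_tree(q: QueueInfo) -> str:
    """Return the service tree name as described in the quoted text."""
    if not q.sync:
        # Async writes go on the async tree; no idling there.
        return "async"
    if q.req_noidle or not q.sequential:
        # Non-sequential sync IO and REQ_NOIDLE requests share one tree;
        # CFQ idles on the tree as a whole, not on each queue.
        return "sync-noidle"
    # Sync sequential IO: CFQ idles on each queue individually.
    return "sync-idle"
```

For instance, pick_service_tree(QueueInfo(sync=True, sequential=True,
req_noidle=True)) yields "sync-noidle" under these assumptions.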

> +FAQ
> +===
> +Q1. Why to idle at all on queues marked with REQ_NOIDLE.
> +
> +A1. We only do group idle on queues marked with REQ_NOIDLE. This helps in
> +    providing isolation with all the sync-idle queues. Otherwise in presence
> +    of many sequential readers, other synchronous IO might not get fair
> +    share of disk.
> +
> +    For example, if there are 10 sequential readers doing IO and they get
> +    100ms each. If a REQ_NOIDLE request comes in, it will be scheduled
> +    roughly after 1 second. If after completion of REQ_NOIDLE request we
> +    do not idle, and after a couple of mili seconds a another REQ_NOIDLE
> +    request comes in, again it will be scheduled after 1second. Repeat it
> +    and notice how a workload can lose its disk share and suffer due to
> +    multiple sequnetial readers.
> +
> +    fsync can generate dependent IO where bunch of data is written in the
> +    context of fsync, and later some journaling data is written. Journaling
> +    data comes in only after fsync has finished its IO (atleast for ext4
> +    that seemed to be the case). Now if one decides not to idle on fsync
> +    thread due to REQ_NOIDLE, then next journaling write will not get
> +    scheduled for another second. A process doing small fsync, will suffer
> +    badly in presence of multiple sequntial readers.
> +
> +    Hence doing group idling on threads using REQ_NOIDLE flag on requests
> +    provides isolation from multiple sequntial readers and at the same
> +    time we do not idle on individual threads. 
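
The arithmetic in A1 can be made concrete with a small back-of-envelope
model. This is only a sketch under stated assumptions: service time of
the REQ_NOIDLE requests is ignored, each follow-up request arrives
gap_ms after the previous one completes, and gap_ms is shorter than the
idle window. The function name and parameters are hypothetical.

```python
def completion_time_ms(requests: int, readers: int = 10, slice_ms: int = 100,
                       gap_ms: int = 2, idle: bool = True) -> int:
    """Rough time for a chain of dependent requests to get dispatched.

    Assumptions (not from the patch): zero service time, and gap_ms is
    smaller than the idle window so an idling scheduler catches the
    next request.
    """
    if requests <= 0:
        return 0
    # One full round of sequential readers: 10 * 100ms = roughly 1 second.
    round_ms = readers * slice_ms
    if idle:
        # The first request waits one round; later ones arrive while we idle.
        return round_ms + (requests - 1) * gap_ms
    # Without idling, every request waits behind a full round of readers.
    return requests * round_ms


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, completion_time_ms(n, idle=False), completion_time_ms(n, idle=True))
```

With the defaults, five dependent requests take about 5000ms without
idling versus about 1008ms with group idle, which is the loss of disk
share the answer describes.
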
> +
> +Q2. When to specify REQ_NOIDLE
> +A2. I would think whenever one is doing synchronous write and not expecting
> +    more writes to be dispatched from same context soon, should be able
> +    to specify REQ_NOIDLE on writes and that probably should work well for
> +    most of the cases. 


-- 
Suresh Jayaraman
