All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Randy Dunlap <randy.dunlap@oracle.com>
Cc: linux-kernel@vger.kernel.org, jaxboe@fusionio.com,
	nauman@google.com, dpshah@google.com, guijianfeng@cn.fujitsu.com,
	jmoyer@redhat.com, czoccolo@gmail.com
Subject: Re: [PATCH 5/5] cfq-iosched: Documentation update
Date: Fri, 23 Jul 2010 16:22:54 -0400	[thread overview]
Message-ID: <20100723202254.GF13104@redhat.com> (raw)
In-Reply-To: <20100722143659.39d55b0a.randy.dunlap@oracle.com>

On Thu, Jul 22, 2010 at 02:36:59PM -0700, Randy Dunlap wrote:
> On Thu, 22 Jul 2010 17:29:32 -0400 Vivek Goyal wrote:
> 
> > o Documentation update for group_idle tunable and Group IOPS mode.
> > ---

Thanks Randy. I have taken care of your comments in the attached patch.

Vivek

---
 Documentation/block/cfq-iosched.txt        |   45 +++++++++++++++++++++++++++++
 Documentation/cgroups/blkio-controller.txt |   28 ++++++++++++++++++
 2 files changed, 73 insertions(+)

Index: linux-2.6/Documentation/block/cfq-iosched.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/block/cfq-iosched.txt	2010-07-23 16:20:52.000000000 -0400
@@ -0,0 +1,45 @@
+CFQ ioscheduler tunables
+========================
+
+slice_idle
+----------
+This specifies how long CFQ should idle for next request on certain cfq queues
+(for sequential workloads) and service trees (for random workloads) before
+queue is expired and CFQ selects next queue to dispatch from.
+
+By default slice_idle is a non-zero value. That means by default we idle on
+queues/service trees. This can be very helpful on highly seeky media like
+single spindle SATA/SAS disks where we can cut down on overall number of
+seeks and see improved throughput.
+
+Setting slice_idle to 0 will remove all the idling on queues/service tree
+level and one should see an overall improved throughput on faster storage
+devices like multiple SATA/SAS disks in hardware RAID configuration. The down
+side is that isolation provided from WRITES also goes down and notion of
+IO priority becomes weaker.
+
+So depending on storage and workload, it might be useful to set slice_idle=0.
+In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
+keeping slice_idle enabled should be useful. For any configurations where
+there are multiple spindles behind single LUN (Host based hardware RAID
+controller or for storage arrays), setting slice_idle=0 might end up in better
+throughput and acceptable latencies.
+
+CFQ IOPS Mode for group scheduling
+==================================
+Basic CFQ design is to provide priority based time slices. Higher priority
+process gets bigger time slice and lower priority process gets smaller time
+slice. Measuring time becomes harder if storage is fast and supports NCQ and
+it would be better to dispatch multiple requests from multiple cfq queues in
+request queue at a time. In such scenario, it is not possible to measure time
+consumed by single queue accurately.
+
+What is possible though is to measure number of requests dispatched from a
+single queue and also allow dispatch from multiple cfq queue at the same time.
+This effectively becomes the fairness in terms of IOPS (IO operations per
+second).
+
+If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
+to IOPS mode and starts providing fairness in terms of number of requests
+dispatched. Note that this mode switching takes effect only for group
+scheduling. For non-cgroup users nothing should change.
Index: linux-2.6/Documentation/cgroups/blkio-controller.txt
===================================================================
--- linux-2.6.orig/Documentation/cgroups/blkio-controller.txt	2010-07-22 16:52:22.000000000 -0400
+++ linux-2.6/Documentation/cgroups/blkio-controller.txt	2010-07-23 16:16:09.000000000 -0400
@@ -217,6 +217,7 @@ Details of cgroup files
 CFQ sysfs tunable
 =================
 /sys/block/<disk>/queue/iosched/group_isolation
+-----------------------------------------------
 
 If group_isolation=1, it provides stronger isolation between groups at the
 expense of throughput. By default group_isolation is 0. In general that
@@ -243,6 +244,33 @@ By default one should run with group_iso
 and one wants stronger isolation between groups, then set group_isolation=1
 but this will come at cost of reduced throughput.
 
+/sys/block/<disk>/queue/iosched/slice_idle
+------------------------------------------
+On a faster hardware CFQ can be slow, especially with sequential workload.
+This happens because CFQ idles on a single queue and single queue might not
+drive deeper request queue depths to keep the storage busy. In such scenarios
+one can try setting slice_idle=0 and that would switch CFQ to IOPS
+(IO operations per second) mode on NCQ supporting hardware.
+
+That means CFQ will not idle between cfq queues of a cfq group and hence be
+able to driver higher queue depth and achieve better throughput. That also
+means that cfq provides fairness among groups in terms of IOPS and not in
+terms of disk time.
+
+/sys/block/<disk>/queue/iosched/group_idle
+------------------------------------------
+If one disables idling on individual cfq queues and cfq service trees by
+setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
+on the group in an attempt to provide fairness among groups.
+
+By default group_idle is same as slice_idle and does not do anything if
+slice_idle is enabled.
+
+One can experience an overall throughput drop if you have created multiple
+groups and put applications in that group which are not driving enough
+IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
+on individual groups and throughput should improve.
+
 What works
 ==========
 - Currently only sync IO queues are support. All the buffered writes are

  reply	other threads:[~2010-07-23 20:23 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
2010-07-22 21:29 ` [PATCH 1/5] cfq-iosched: Do not idle on service tree if slice_idle=0 Vivek Goyal
2010-07-22 21:29 ` [PATCH 2/5] cfq-iosched: Implment IOPS mode for group scheduling Vivek Goyal
2010-07-27  5:47   ` Gui Jianfeng
2010-07-27 13:09     ` Vivek Goyal
2010-07-22 21:29 ` [PATCH 3/5] cfq-iosched: Implement a tunable group_idle Vivek Goyal
2010-07-22 21:29 ` [PATCH 4/5] cfq-iosched: Print number of sectors dispatched per cfqq slice Vivek Goyal
2010-07-22 21:29 ` [PATCH 5/5] cfq-iosched: Documentation update Vivek Goyal
2010-07-22 21:36   ` Randy Dunlap
2010-07-23 20:22     ` Vivek Goyal [this message]
2010-07-23 14:03 ` [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Heinz Diehl
2010-07-23 14:13   ` Vivek Goyal
2010-07-23 14:56     ` Heinz Diehl
2010-07-23 18:37       ` Vivek Goyal
2010-07-24  8:06         ` Heinz Diehl
2010-07-26 13:43           ` Vivek Goyal
2010-07-26 13:48             ` Christoph Hellwig
2010-07-26 13:54               ` Vivek Goyal
2010-07-26 16:15             ` Heinz Diehl
2010-07-26 14:13           ` Christoph Hellwig
2010-07-27  7:48             ` Heinz Diehl
2010-07-28 20:22           ` Vivek Goyal
2010-07-28 23:57             ` Christoph Hellwig
2010-07-29  4:34               ` cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable) Vivek Goyal
2010-07-29 14:56                 ` Vivek Goyal
2010-07-29 19:39                   ` Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100723202254.GF13104@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=czoccolo@gmail.com \
    --cc=dpshah@google.com \
    --cc=guijianfeng@cn.fujitsu.com \
    --cc=jaxboe@fusionio.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nauman@google.com \
    --cc=randy.dunlap@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.