From: Vivek Goyal <vgoyal@redhat.com>
To: linux-kernel@vger.kernel.org, jens.axboe@oracle.com
Cc: dhaval@linux.vnet.ibm.com, peterz@infradead.org,
	dm-devel@redhat.com, dpshah@google.com, agk@redhat.com,
	balbir@linux.vnet.ibm.com, paolo.valente@unimore.it,
	jmarchan@redhat.com, guijianfeng@cn.fujitsu.com,
	fernando@oss.ntt.co.jp, mikew@google.com, jmoyer@redhat.com,
	nauman@google.com, mingo@elte.hu, vgoyal@redhat.com,
	m-ikeda@ds.jp.nec.com, riel@redhat.com, lizf@cn.fujitsu.com,
	fchecconi@gmail.com, s-uchida@ap.jp.nec.com,
	containers@lists.linux-foundation.org, akpm@linux-foundation.org,
	righi.andrea@gmail.com, torvalds@linux-foundation.org
Subject: [PATCH 01/23] io-controller: Documentation
Date: Fri, 28 Aug 2009 17:30:50 -0400
Message-ID: <1251495072-7780-2-git-send-email-vgoyal@redhat.com>
In-Reply-To: <1251495072-7780-1-git-send-email-vgoyal@redhat.com>

o Documentation for io-controller.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
---
 Documentation/block/00-INDEX          |    2 +
 Documentation/block/io-controller.txt |  407 +++++++++++++++++++++++++++++++++
 2 files changed, 409 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/block/io-controller.txt

diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX
index 961a051..dc8bf95 100644
--- a/Documentation/block/00-INDEX
+++ b/Documentation/block/00-INDEX
@@ -10,6 +10,8 @@ capability.txt
 	- Generic Block Device Capability (/sys/block/<disk>/capability)
 deadline-iosched.txt
 	- Deadline IO scheduler tunables
+io-controller.txt
+	- IO controller for providing hierarchical IO scheduling
 ioprio.txt
 	- Block io priorities (in CFQ scheduler)
 request.txt
diff --git a/Documentation/block/io-controller.txt b/Documentation/block/io-controller.txt
new file mode 100644
index 0000000..21948c3
--- /dev/null
+++ b/Documentation/block/io-controller.txt
@@ -0,0 +1,407 @@
+				IO Controller
+				=============
+
+Overview
+========
+
+This patchset implements a proportional weight IO controller. That is, one
+can create cgroups and assign prio/weights to them, and each task group
+will get access to the disk in proportion to the weight of the group.
+
+These patches modify the elevator layer and individual IO schedulers to do
+IO control. Hence this io controller works only on block devices which use
+one of the standard io schedulers; it cannot be used with an arbitrary
+logical block device.
+
+The assumption behind modifying the IO scheduler is that resource control
+is primarily needed on leaf nodes where the actual contention for resources is
+present and not on intermediate logical block devices.
+
+Consider the following hypothetical scenario. Let's say there are three physical
+disks, namely sda, sdb and sdc. Two logical volumes (lv0 and lv1) have been
+created on top of these. Some part of sdb is in lv0 and some part is in lv1.
+
+			    lv0      lv1
+			  /	\  /     \
+			sda      sdb      sdc
+
+Also consider the following cgroup hierarchy.
+
+				root
+				/   \
+			       A     B
+			      / \    / \
+			     T1 T2  T3  T4
+
+A and B are two cgroups and T1, T2, T3 and T4 are tasks within those cgroups.
+Assume T1, T2, T3 and T4 are doing IO on lv0 and lv1. These tasks should
+get their fair share of bandwidth on disks sda, sdb and sdc. There is no
+IO control on intermediate logical block nodes (lv0, lv1).
+
+So if tasks T1 and T2 are doing IO on lv0 only and T3 and T4 are doing IO on
+lv1 only, there will not be any contention for resources between groups A and
+B if IO is going to sda or sdc. But if the actual IO gets translated to disk
+sdb, then the IO scheduler associated with sdb will distribute disk bandwidth
+to groups A and B in proportion to their weights.
+
+CFQ already has the notion of fairness and it provides differential disk
+access based on priority and class of the task. However, it is flat, and
+with cgroups it needs to be made hierarchical to achieve good
+hierarchical control on IO.
+
+The rest of the IO schedulers (noop, deadline and AS) don't have any notion
+of fairness among various threads. They maintain only one queue where all
+the IO gets queued (internally this queue is split into read and write queues
+for deadline and AS). With this patchset, we now maintain one queue per
+cgroup per device and then try to do fair queuing among those queues.
+
+One of the concerns raised with modifying IO schedulers was that we don't
+want to replicate the code in all the IO schedulers. These patches share
+the fair queuing code which has been moved to a common layer (elevator
+layer). Hence we don't end up replicating code across IO schedulers. The
+following diagram depicts the concept.
+
+			--------------------------------
+			| Elevator Layer + Fair Queuing |
+			--------------------------------
+			 |	     |	     |       |
+			NOOP     DEADLINE    AS     CFQ
+
+Design
+======
+This patchset takes inspiration from the CFS cpu scheduler and CFQ to come
+up with the core of hierarchical scheduling. Like CFQ, we give time slices to
+every queue based on its priority. Like CFS, the disk time given to a
+queue is converted to virtual disk time (vdisktime) based on the queue's
+weight, and based on this vdisktime we decide which queue is dispatched
+next.
+
+From a data structure point of view, one can think of a tree per device, on
+which io groups and io queues hang and are scheduled using the B-WF2Q+
+algorithm. An io_queue is the end queue where requests are actually stored
+and dispatched from (like cfqq).
+
+These io queues are primarily created and managed by the end io schedulers,
+depending on their semantics. For example, the noop, deadline and AS
+ioschedulers keep one io queue per cgroup and cfq keeps one io queue per
+io_context in a cgroup (apart from async queues).
+
+A request is mapped to an io group by the elevator layer, and which io queue
+it is mapped to within the group depends on the ioscheduler. Currently the
+"current" task is used to determine the cgroup (hence io group) of the
+request. Down the line we need to make use of bio-cgroup patches to map
+delayed writes to the right group.
+
+Going back to old behavior
+==========================
+In the new scheme of things, we are essentially creating hierarchical fair
+queuing logic in the elevator layer and changing IO schedulers to make use of
+that logic so that end IO schedulers start supporting hierarchical scheduling.
+
+The elevator layer continues to support the old interfaces. So even if fair
+queuing is enabled at the elevator layer, one can have both the new hierarchical
+scheduler as well as the old non-hierarchical scheduler operating.
+
+Also, noop, deadline and AS have the option of enabling hierarchical scheduling.
+If it is selected, fair queuing is done in a hierarchical manner. If hierarchical
+scheduling is disabled, noop, deadline and AS should retain their existing
+behavior.
+
+CFQ is the only exception where one cannot disable fair queuing, as it is
+needed for providing fairness among various threads even in non-hierarchical
+mode.
+
+Various user visible config options
+===================================
+CONFIG_IOSCHED_NOOP_HIER
+	- Enables hierarchical fair queuing in noop. Not selecting this option
+	  leads to old behavior of noop.
+
+CONFIG_IOSCHED_DEADLINE_HIER
+	- Enables hierarchical fair queuing in deadline. Not selecting this
+	  option leads to old behavior of deadline.
+
+CONFIG_IOSCHED_AS_HIER
+	- Enables hierarchical fair queuing in AS. Not selecting this option
+	  leads to old behavior of AS.
+
+CONFIG_IOSCHED_CFQ_HIER
+	- Enables hierarchical fair queuing in CFQ. Not selecting this option
+	  still does fair queuing among various queues but it is flat and not
+	  hierarchical.
+
+CGROUP_BLKIO
+	- This option enables blkio-cgroup controller for IO tracking
+	  purposes. That means, by this controller one can attribute a write
+	  to the original cgroup and not assume that it belongs to submitting
+	  thread.
+
+CONFIG_TRACK_ASYNC_CONTEXT
+	- Currently CFQ attributes the writes to the submitting thread and
+	  caches the async queue pointer in the io context of the process.
+	  If this option is set, it tells cfq and elevator fair queuing logic
+	  that for async writes make use of IO tracking patches and attribute
+	  writes to original cgroup and not to write submitting thread.
+
+	  This should be primarily useful when lots of asynchronous writes
+	  are being submitted by pdflush threads and we need to assign the
+	  writes to right group.
+
+CONFIG_DEBUG_GROUP_IOSCHED
+	- Emits extra debug messages in blktrace output, helpful for
+	  debugging in a hierarchical setup.
+
+	- Also allows for export of extra debug statistics like group queue
+	  and dequeue statistics on device through cgroup interface.
+
+CONFIG_DEBUG_ELV_FAIR_QUEUING
+	- Enables some vdisktime related debugging messages.
+
+Config options selected automatically
+=====================================
+These config options are not user visible and are selected/deselected
+automatically based on IO scheduler configurations.
+
+CONFIG_ELV_FAIR_QUEUING
+	- Enables/Disables the fair queuing logic at elevator layer.
+
+CONFIG_GROUP_IOSCHED
+	- Enables/Disables hierarchical queuing and associated cgroup bits.
+
+HOWTO
+=====
+You can do a very simple test by running two dd threads in two different
+cgroups. Here is what you can do.
+
+- Enable hierarchical scheduling in the io scheduler of your choice (say cfq).
+	CONFIG_IOSCHED_CFQ_HIER=y
+
+- Enable IO tracking for async writes.
+	CONFIG_TRACK_ASYNC_CONTEXT=y
+
+  (This will automatically select CGROUP_BLKIO)
+
+- Compile and boot into the kernel, then mount the IO controller and the blkio
+  IO tracking controller.
+
+	mount -t cgroup -o io,blkio none /cgroup
+
+- Create two cgroups
+	mkdir -p /cgroup/test1/ /cgroup/test2
+
+- Set weights of group test1 and test2
+	echo 1000 > /cgroup/test1/io.weight
+	echo 500 > /cgroup/test2/io.weight
+
+- Set "fairness" parameter to 1 at the disk you are testing.
+
+  echo 1 > /sys/block/<disk>/queue/iosched/fairness
+
+- Create two files of the same size (say 512MB each) on the same disk (file1,
+  file2) and launch two dd threads in different cgroups to read those files.
+  Make sure the right io scheduler is being used for the block device where
+  the files are present (the one you compiled in hierarchical mode).
+
+	sync
+	echo 3 > /proc/sys/vm/drop_caches
+
+	dd if=/mnt/sdb/zerofile1 of=/dev/null &
+	echo $! > /cgroup/test1/tasks
+	cat /cgroup/test1/tasks
+
+	dd if=/mnt/sdb/zerofile2 of=/dev/null &
+	echo $! > /cgroup/test2/tasks
+	cat /cgroup/test2/tasks
+
+- At macro level, the first dd should finish first. To get more precise data,
+  keep watching (with the help of a script) the io.disk_time and
+  io.disk_sectors files of both test1 and test2 groups. These tell how much
+  disk time (in milliseconds) each group got and how many sectors each group
+  dispatched to the disk. We provide fairness in terms of disk time, so
+  ideally io.disk_time of the cgroups should be in proportion to the weights.
+
+Some High Level Test setups
+===========================
+One of the use cases of IO controller is to provide some kind of IO isolation
+between multiple virtual machines on the same host. Following is one
+example setup which worked for me.
+
+
+			     KVM	     KVM
+			    Guest1	    Guest2
+			   ---------      ----------
+			  |  -----  |    |  ------  |
+			  | | vdb | |    | | vdb  | |
+			  |  -----  |    |   ------ |
+			   ---------      ----------
+
+			   ---------------------------
+			  | Host		      |
+			  |         -------------     |
+			  |        | sdb1 | sdb2 |    |
+			  |         -------------     |
+			   ---------------------------
+
+On the host machine, I had a spare SATA disk. I created two partitions, sdb1
+and sdb2, and gave these partitions as additional storage to the kvm guests:
+sdb1 to KVM guest1 and sdb2 to KVM guest2. This storage appeared as /dev/vdb
+in both guests. I formatted /dev/vdb, created an ext3 file system on it and
+started a 1G file writeout in both guests. Before the writeout I had created
+two cgroups of weight 1000 and 500 and put the virtual machines in two
+different groups.
+
+The following is the write I started in both guests.
+
+dd if=/dev/zero of=/mnt/vdc/zerofile1 bs=4K count=262144 conv=fdatasync
+
+Following are the results on host with "deadline" scheduler.
+
+group1 time=8:16 17254 group1 sectors=8:16 2104288
+group2 time=8:16 8498  group2 sectors=8:16 1007040
+
+The virtual machine with cgroup weight 1000 got roughly double the disk time of
+the virtual machine with weight 500.
+
+What Works and What Does not
+============================
+Service differentiation at application level can be noticed only if completely
+parallel IO paths are created from application to IO scheduler and there
+are no serializations introduced by any intermediate layer. For example,
+in some cases file system and page cache layer introduce serialization and
+we don't see service difference between higher weight and lower weight
+process groups.
+
+For example, when I start an O_SYNC writeout on an ext3 file system (the file
+is being newly created), I see lots of activity from kjournald. I have not
+gone into details yet, but my understanding is that there are a lot more
+journal commits and kjournald effectively introduces serialization between the
+two processes. So even if you put these two processes in two different cgroups
+with different weights, the higher weight process will not see more IO done.
+
+It does work very well when we bypass the filesystem layer and IO is raw. For
+example, in the above virtual machine case, the host sees raw synchronous writes
+coming from the two guest machines and the filesystem layer at the host is not
+introducing any kind of serialization, hence we can see the service difference.
+
+It also works very well for reads, even on the same file system, as for reads
+file system journalling activity does not kick in and we can create parallel
+IO paths from the application all the way down to the IO scheduler and get more
+IO done on the IO path with the higher weight.
+
+Regarding "fairness" parameter
+==============================
+The IO controller has introduced a "fairness" tunable for every io scheduler.
+Currently this tunable can assume the values 0 and 1.
+
+If fairness is set to 1, the IO controller waits for requests from the previous
+queue to finish before requests from the new queue are dispatched. This helps in
+doing better accounting of disk time consumed by a queue. If this is not done,
+then on queuing hardware there can be requests from multiple queues in flight
+and we will not have any idea which queue consumed how much disk time.
+
+Details of cgroup files
+=======================
+- io.ioprio_class
+	- Specifies the class of the cgroup (RT, BE, IDLE). This is the default
+	  io class of the group on all devices unless overridden by a
+	  per-device rule (see io.policy).
+
+	  1 = RT, 2 = BE, 3 = IDLE
+
+- io.weight
+	- Specifies the per-cgroup weight. This is the default weight of the
+	  group on all devices unless overridden by a per-device rule
+	  (see io.policy).
+
+	  Currently the allowed range of weights is from 100 to 1000.
+
+- io.disk_time
+	- disk time allocated to cgroup per device in milliseconds. First
+	  two fields specify the major and minor number of the device and
+	  third field specifies the disk time allocated to group in
+	  milliseconds.
+
+- io.disk_sectors
+	- number of sectors transferred to/from disk by the group. First
+	  two fields specify the major and minor number of the device and
+	  third field specifies the number of sectors transferred by the
+	  group to/from the device.
+
+- io.disk_queue
+	- Debugging aid only enabled if CONFIG_DEBUG_GROUP_IOSCHED=y. This
+	  gives statistics about how many times a group was queued on the
+	  service tree of the device. The first two fields specify the major
+	  and minor number of the device and the third field specifies the
+	  number of times a group was queued on a particular device.
+
+- io.disk_dequeue
+	- Debugging aid only enabled if CONFIG_DEBUG_GROUP_IOSCHED=y. This
+	  gives statistics about how many times a group was de-queued
+	  or removed from the service tree of the device. This basically gives
+	  an idea of whether we can generate enough IO to create continuously
+	  backlogged groups. The first two fields specify the major and minor
+	  number of the device and the third field specifies the number
+	  of times a group was de-queued on a particular device.
+
+- io.policy
+	- One can specify per cgroup per device rules using this interface.
+	  These rules override the default value of group weight and class as
+	  specified by io.weight and io.ioprio_class.
+
+	  Following is the format.
+
+	# echo dev_maj:dev_minor weight ioprio_class > /path/to/cgroup/io.policy
+
+	weight=0 means removing a policy.
+
+	Examples:
+
+	Configure weight=300 ioprio_class=2 on /dev/sdb (8:16) in this cgroup
+	# echo 8:16 300 2 > io.policy
+	# cat io.policy
+	dev	weight	class
+	8:16	300	2
+
+	Configure weight=500 ioprio_class=1 on /dev/sda (8:0) in this cgroup
+	# echo 8:0 500 1 > io.policy
+	# cat io.policy
+	dev	weight	class
+	8:0	500	1
+	8:16	300	2
+
+	Remove the policy for /dev/sda in this cgroup
+	# echo 8:0 0 1 > io.policy
+	# cat io.policy
+	dev	weight	class
+	8:16	300	2
+
+About configuring request descriptors
+=====================================
+Traditionally there are 128 request descriptors allocated per request queue
+where the io scheduler is operating (/sys/block/<disk>/queue/nr_requests). If
+these request descriptors are exhausted, processes will be put to sleep and
+woken up once request descriptors are available.
+
+With the io controller and cgroups, one cannot afford to allocate requests
+from a single pool, as one group might allocate lots of requests and then tasks
+from other groups might be put to sleep, and such a group might be a
+higher weight group. Hence, to make sure that a group can always get the
+request descriptors it is entitled to, one needs a per-group request
+descriptor limit on every queue.
+
+A new parameter /sys/block/<disk>/queue/nr_group_requests has been introduced
+and this parameter controls the maximum number of requests per group.
+nr_requests still continues to control the total number of request descriptors
+on the queue.
+
+Ideally one should set nr_requests as follows.
+
+nr_requests = number_of_cgroups * nr_group_requests
+
+This will make sure that at any point of time nr_group_requests number of
+request descriptors will be available for any of the cgroups.
+
+Currently the defaults are nr_requests=512 and nr_group_requests=128. This
+makes sure that, apart from the root group, one can create 3 more groups
+without running into any issues. If one decides to create more cgroups,
+nr_requests and nr_group_requests should be adjusted accordingly.
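
The sizing rule nr_requests = number_of_cgroups * nr_group_requests from the last section of the document can be sketched as a small shell helper. This is only an illustration: the function name suggest_nr_requests is hypothetical and not part of the patchset, and the group count passed in is assumed to include the root group.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patchset.
# suggest_nr_requests NUM_CGROUPS NR_GROUP_REQUESTS
# Applies the rule: nr_requests = number_of_cgroups * nr_group_requests.
# NUM_CGROUPS is assumed to include the root group.
suggest_nr_requests() {
	echo $(( $1 * $2 ))
}

# Root group plus 3 more groups, 128 descriptors per group.
suggest_nr_requests 4 128
```

With the defaults quoted in the document (nr_group_requests=128, root plus 3 groups) this prints 512, matching the default nr_requests. The result would then be written to /sys/block/<disk>/queue/nr_requests for the disk under test.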
-- 
1.6.0.6
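
The HOWTO suggests watching io.disk_time of both groups with the help of a script. A minimal sketch of such a script follows. It assumes each line of io.disk_time looks like "8:16 17254" (major:minor, then milliseconds), as in the results quoted in the document; the function names are hypothetical and the cgroup root and device are passed in by the caller.

```shell
#!/bin/sh
# Sketch only. Assumes io.disk_time lines look like "8:16 17254".

# disk_time_for_dev FILE DEV: print the disk time recorded for DEV in FILE.
disk_time_for_dev() {
	awk -v dev="$2" '$1 == dev { print $2 }' "$1"
}

# time_ratio A B: print A/B to two decimal places, or "n/a" if B is zero
# or empty (for example before the second group has done any IO).
time_ratio() {
	awk -v a="$1" -v b="$2" 'BEGIN {
		if (b + 0 == 0) print "n/a"
		else printf "%.2f\n", a / b
	}'
}

# report_ratio CGROUP_ROOT DEV: compare groups test1 and test2 from the HOWTO.
report_ratio() {
	t1=$(disk_time_for_dev "$1/test1/io.disk_time" "$2")
	t2=$(disk_time_for_dev "$1/test2/io.disk_time" "$2")
	echo "test1=$t1 test2=$t2 ratio=$(time_ratio "$t1" "$t2")"
}
```

A possible use while the two dd threads are running is `while sleep 1; do report_ratio /cgroup 8:16; done`. With weights 1000 and 500 the ratio should ideally approach 2; for the deadline results quoted in the document (17254 and 8498) it works out to 2.03.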

WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com>
To: linux-kernel@vger.kernel.org, jens.axboe@oracle.com
Cc: containers@lists.linux-foundation.org, dm-devel@redhat.com,
	nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com,
	mikew@google.com, fchecconi@gmail.com, paolo.valente@unimore.it,
	ryov@valinux.co.jp, fernando@oss.ntt.co.jp,
	s-uchida@ap.jp.nec.com, taka@valinux.co.jp,
	guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
	dhaval@linux.vnet.ibm.com, balbir@linux.vnet.ibm.com,
	righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com, agk@redhat.com,
	vgoyal@redhat.com, akpm@linux-foundation.org,
	peterz@infradead.org, jmarchan@redhat.com,
	torvalds@linux-foundation.org, mingo@elte.hu, riel@redhat.com
Subject: [PATCH 01/23] io-controller: Documentation
Date: Fri, 28 Aug 2009 17:30:50 -0400	[thread overview]
Message-ID: <1251495072-7780-2-git-send-email-vgoyal@redhat.com> (raw)
In-Reply-To: <1251495072-7780-1-git-send-email-vgoyal@redhat.com>

o Documentation for io-controller.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
---
 Documentation/block/00-INDEX          |    2 +
 Documentation/block/io-controller.txt |  407 +++++++++++++++++++++++++++++++++
 2 files changed, 409 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/block/io-controller.txt

diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX
index 961a051..dc8bf95 100644
--- a/Documentation/block/00-INDEX
+++ b/Documentation/block/00-INDEX
@@ -10,6 +10,8 @@ capability.txt
 	- Generic Block Device Capability (/sys/block/<disk>/capability)
 deadline-iosched.txt
 	- Deadline IO scheduler tunables
+io-controller.txt
+	- IO controller for provding hierarchical IO scheduling
 ioprio.txt
 	- Block io priorities (in CFQ scheduler)
 request.txt
diff --git a/Documentation/block/io-controller.txt b/Documentation/block/io-controller.txt
new file mode 100644
index 0000000..21948c3
--- /dev/null
+++ b/Documentation/block/io-controller.txt
@@ -0,0 +1,407 @@
+				IO Controller
+				=============
+
+Overview
+========
+
+This patchset implements a proportional weight IO controller. That is one
+can create cgroups and assign prio/weights to those cgroups and task group
+will get access to disk proportionate to the weight of the group.
+
+These patches modify elevator layer and individual IO schedulers to do
+IO control hence this io controller works only on block devices which use
+one of the standard io schedulers can not be used with any xyz logical block
+device.
+
+The assumption/thought behind modifying IO scheduler is that resource control
+is primarily needed on leaf nodes where the actual contention for resources is
+present and not on intertermediate logical block devices.
+
+Consider following hypothetical scenario. Lets say there are three physical
+disks, namely sda, sdb and sdc. Two logical volumes (lv0 and lv1) have been
+created on top of these. Some part of sdb is in lv0 and some part is in lv1.
+
+			    lv0      lv1
+			  /	\  /     \
+			sda      sdb      sdc
+
+Also consider following cgroup hierarchy
+
+				root
+				/   \
+			       A     B
+			      / \    / \
+			     T1 T2  T3  T4
+
+A and B are two cgroups and T1, T2, T3 and T4 are tasks with-in those cgroups.
+Assuming T1, T2, T3 and T4 are doing IO on lv0 and lv1. These tasks should
+get their fair share of bandwidth on disks sda, sdb and sdc. There is no
+IO control on intermediate logical block nodes (lv0, lv1).
+
+So if tasks T1 and T2 are doing IO on lv0 and T3 and T4 are doing IO on lv1
+only, there will not be any contetion for resources between group A and B if
+IO is going to sda or sdc. But if actual IO gets translated to disk sdb, then
+IO scheduler associated with the sdb will distribute disk bandwidth to
+group A and B proportionate to their weight.
+
+CFQ already has the notion of fairness and it provides differential disk
+access based on priority and class of the task. Just that it is flat and
+with cgroup stuff, it needs to be made hierarchical to achive a good
+hierarchical control on IO.
+
+Rest of the IO schedulers (noop, deadline and AS) don't have any notion
+of fairness among various threads. They maintain only one queue where all
+the IO gets queued (internally this queue is split in read and write queue
+for deadline and AS). With this patchset, now we maintain one queue per
+cgropu per device and then try to do fair queuing among those queues.
+
+One of the concerns raised with modifying IO schedulers was that we don't
+want to replicate the code in all the IO schedulers. These patches share
+the fair queuing code which has been moved to a common layer (elevator
+layer). Hence we don't end up replicating code across IO schedulers. Following
+diagram depicts the concept.
+
+			--------------------------------
+			| Elevator Layer + Fair Queuing |
+			--------------------------------
+			 |	     |	     |       |
+			NOOP     DEADLINE    AS     CFQ
+
+Design
+======
+This patchset takes the inspiration from CFS cpu scheduler and CFQ to come
+up with core of hierarchical scheduling. Like CFQ we give time slices to
+every queue based on their priority. Like CFS, this disktime given to a
+queue is converted to virtual disk time based on queue's weight (vdisktime)
+and based on this vdisktime we decide which is the queue next to be
+dispatched.
+
+From data structure point of view, one can think of a tree per device, where
+io groups and io queues are hanging and are being scheduled using B-WF2Q+
+algorithm. io_queue, is end queue where requests are actually stored and
+dispatched from (like cfqq).
+
+These io queues are primarily created by and managed by end io schedulers
+depending on its semantics. For example, noop, deadline and AS ioschedulers
+keep one io queues per cgroup and cfqq keeps one io queue per io_context in
+a cgroup (apart from async queues).
+
+A request is mapped to an io group by elevator layer and which io queue it
+is mapped to with in group depends on ioscheduler. Currently "current" task
+is used to determine the cgroup (hence io group) of the request. Down the
+line we need to make use of bio-cgroup patches to map delayed writes to
+right group.
+
+Going back to old behavior
+==========================
+In new scheme of things essentially we are creating hierarchical fair
+queuing logic in elevator layer and chaning IO schedulers to make use of
+that logic so that end IO schedulers start supporting hierarchical scheduling.
+
+Elevator layer continues to support the old interfaces. So even if fair queuing
+is enabled at elevator layer, one can have both new hierchical scheduler as
+well as old non-hierarchical scheduler operating.
+
+Also noop, deadline and AS have option of enabling hierarchical scheduling.
+If it is selected, fair queuing is done in hierarchical manner. If hierarchical
+scheduling is disabled, noop, deadline and AS should retain their existing
+behavior.
+
+CFQ is the only exception where one can not disable fair queuing as it is
+needed for provding fairness among various threads even in non-hierarchical
+mode.
+
+Various user visible config options
+===================================
+CONFIG_IOSCHED_NOOP_HIER
+	- Enables hierchical fair queuing in noop. Not selecting this option
+	  leads to old behavior of noop.
+
+CONFIG_IOSCHED_DEADLINE_HIER
+	- Enables hierchical fair queuing in deadline. Not selecting this
+	  option leads to old behavior of deadline.
+
+CONFIG_IOSCHED_AS_HIER
+	- Enables hierchical fair queuing in AS. Not selecting this option
+	  leads to old behavior of AS.
+
+CONFIG_IOSCHED_CFQ_HIER
+	- Enables hierarchical fair queuing in CFQ. Not selecting this option
+	  still does fair queuing among various queus but it is flat and not
+	  hierarchical.
+
+CGROUP_BLKIO
+	- This option enables blkio-cgroup controller for IO tracking
+	  purposes. That means, by this controller one can attribute a write
+	  to the original cgroup and not assume that it belongs to submitting
+	  thread.
+
+CONFIG_TRACK_ASYNC_CONTEXT
+	- Currently CFQ attributes the writes to the submitting thread and
+	  caches the async queue pointer in the io context of the process.
+	  If this option is set, it tells cfq and elevator fair queuing logic
+	  that for async writes make use of IO tracking patches and attribute
+	  writes to original cgroup and not to write submitting thread.
+
+	  This should be primarily useful when lots of asynchronous writes
+	  are being submitted by pdflush threads and we need to assign the
+	  writes to right group.
+
+CONFIG_DEBUG_GROUP_IOSCHED
+	- Throws extra debug messages in blktrace output helpful in doing
+	  doing debugging in hierarchical setup.
+
+	- Also allows for export of extra debug statistics like group queue
+	  and dequeue statistics on device through cgroup interface.
+
+CONFIG_DEBUG_ELV_FAIR_QUEUING
+	- Enables some vdisktime related debugging messages.
+
+Config options selected automatically
+=====================================
+These config options are not user visible and are selected/deselected
+automatically based on IO scheduler configurations.
+
+CONFIG_ELV_FAIR_QUEUING
+	- Enables/Disables the fair queuing logic at elevator layer.
+
+CONFIG_GROUP_IOSCHED
+	- Enables/Disables hierarchical queuing and associated cgroup bits.
+
+HOWTO
+=====
+You can do a very simple testing of running two dd threads in two different
+cgroups. Here is what you can do.
+
+- Enable hierarchical scheduling in io scheuduler of your choice (say cfq).
+	CONFIG_IOSCHED_CFQ_HIER=y
+
+- Enable IO tracking for async writes.
+	CONFIG_TRACK_ASYNC_CONTEXT=y
+
+  (This will automatically select CGROUP_BLKIO)
+
+- Compile and boot into kernel and mount IO controller and blkio io tracking
+  controller.
+
+	mount -t cgroup -o io,blkio none /cgroup
+
+- Create two cgroups
+	mkdir -p /cgroup/test1/ /cgroup/test2
+
+- Set weights of group test1 and test2
+	echo 1000 > /cgroup/test1/io.weight
+	echo 500 > /cgroup/test2/io.weight
+
+- Set "fairness" parameter to 1 at the disk you are testing.
+
+  echo 1 > /sys/block/<disk>/queue/iosched/fairness
+
+- Create two same size files (say 512MB each) on same disk (file1, file2) and
+  launch two dd threads in different cgroup to read those files. Make sure
+  right io scheduler is being used for the block device where files are
+  present (the one you compiled in hierarchical mode).
+
+	sync
+	echo 3 > /proc/sys/vm/drop_caches
+
+	dd if=/mnt/sdb/zerofile1 of=/dev/null &
+	echo $! > /cgroup/test1/tasks
+	cat /cgroup/test1/tasks
+
+	dd if=/mnt/sdb/zerofile2 of=/dev/null &
+	echo $! > /cgroup/test2/tasks
+	cat /cgroup/test2/tasks
+
+- At macro level, first dd should finish first. To get more precise data, keep
+  on looking at (with the help of script), at io.disk_time and io.disk_sectors
+  files of both test1 and test2 groups. This will tell how much disk time
+  (in milli seconds), each group got and how many secotors each group
+  dispatched to the disk. We provide fairness in terms of disk time, so
+  ideally io.disk_time of cgroups should be in proportion to the weight.
+
+Some High Level Test setups
+===========================
+One of the use cases of IO controller is to provide some kind of IO isolation
+between multiple virtual machines on the same host. Following is one
+example setup which worked for me.
+
+
+			     KVM	     KVM
+			    Guest1	    Guest2
+			   ---------      ----------
+			  |  -----  |    |  ------  |
+			  | | vdb | |    | | vdb  | |
+			  |  -----  |    |   ------ |
+			   ---------      ----------
+
+			   ---------------------------
+			  | Host		      |
+			  |         -------------     |
+			  |        | sdb1 | sdb2 |    |
+			  |         -------------     |
+			   ---------------------------
+
+On host machine, I had a spare SATA disk. I created two partitions sdb1
+and sdb2 and gave this partitions as additional storage to kvm guests. sdb1
+to KVM guest1 and sdb2 KVM guest2. These storage appeared as /dev/vdb in
+both the guests. Formatted the /dev/vdb and created ext3 file system and
+started a 1G file writeout in both the guests. Before writeout I had created
+two cgroups of weight 1000 and 500 and put virtual machines in two different
+groups.
+
+Following is write I started in both the guests.
+
+dd if=/dev/zero of=/mnt/vdc/zerofile1 bs=4K count=262144 conv=fdatasync
+
+Following are the results on host with "deadline" scheduler.
+
+group1 time=8:16 17254 group1 sectors=8:16 2104288
+group2 time=8:16 8498  group2 sectors=8:16 1007040
+
+Virtual machine with cgroup weight 1000 got almost double the time of virtual
+machine with weight 500.
+
+What Works and What Does not
+============================
+Service differentiation at the application level can be noticed only if
+completely parallel IO paths are created from the application to the IO
+scheduler and no intermediate layer introduces serialization. In some cases
+the file system and page cache layers introduce serialization, and then we
+don't see a service difference between higher weight and lower weight
+process groups.
+
+For example, when I start an O_SYNC writeout on an ext3 file system (the file
+is being newly created), I see lots of activity from kjournald. I have not
+gone into details yet, but my understanding is that there are a lot more
+journal commits and kjournald kind of introduces serialization between two
+processes. So even if you put these two processes in two different cgroups
+with different weights, the higher weight process will not see more IO done.
+
+It does work very well when we bypass the filesystem layer and IO is raw. For
+example, in the above virtual machine case, the host sees raw synchronous
+writes coming from the two guest machines and the filesystem layer at the host
+does not introduce any kind of serialization, hence we can see the service
+difference.
+
+It also works very well for reads, even on the same file system, because for
+reads file system journalling activity does not kick in. We can create
+parallel IO paths from the application all the way down to the IO scheduler
+and get more IO done on the IO path with the higher weight.
+
+Regarding "fairness" parameter
+==============================
+The IO controller has introduced a "fairness" tunable for every io scheduler.
+Currently this tunable can assume the values 0 and 1.
+
+If fairness is set to 1, the IO controller waits for requests from the
+previous queue to finish before requests from the new queue are dispatched.
+This helps in doing better accounting of disk time consumed by a queue. If
+this is not done, then on hardware that queues requests there can be requests
+from multiple queues in flight, and we will not have any idea which queue
+consumed how much disk time.
+
+Details of cgroup files
+=======================
+- io.ioprio_class
+	- Specifies class of the cgroup (RT, BE, IDLE). This is default io
+	  class of the group on all the devices until and unless overridden by
+	  per device rule. (See io.policy).
+
+	  1 = RT, 2 = BE, 3 = IDLE
+
+- io.weight
+	- Specifies per cgroup weight. This is default weight of the group
+	  on all the devices until and unless overridden by per device rule.
+	  (See io.policy).
+
+	  Currently allowed range of weights is from 100 to 1000.
+
+- io.disk_time
+	- disk time allocated to cgroup per device in milliseconds. First
+	  two fields specify the major and minor number of the device and
+	  third field specifies the disk time allocated to group in
+	  milliseconds.
+
+- io.disk_sectors
+	- number of sectors transferred to/from disk by the group. First
+	  two fields specify the major and minor number of the device and
+	  third field specifies the number of sectors transferred by the
+	  group to/from the device.
+
+- io.disk_queue
+	- Debugging aid only enabled if CONFIG_DEBUG_GROUP_IOSCHED=y. This
+	  gives statistics about how many times a group was queued on the
+	  service tree of the device. The first two fields specify the major
+	  and minor number of the device and the third field specifies the
+	  number of times a group was queued on a particular device.
+
+- io.disk_dequeue
+	- Debugging aid only enabled if CONFIG_DEBUG_GROUP_IOSCHED=y. This
+	  gives statistics about how many times a group was de-queued
+	  or removed from the service tree of the device. This basically gives
+	  an idea of whether we can generate enough IO to create continuously
+	  backlogged groups. The first two fields specify the major and minor
+	  number of the device and the third field specifies the number
+	  of times a group was de-queued on a particular device.
+
+- io.policy
+	- One can specify per cgroup per device rules using this interface.
+	  These rules override the default value of group weight and class as
+	  specified by io.weight and io.ioprio_class.
+
+	  Following is the format.
+
+	# echo dev_maj:dev_minor weight ioprio_class > /path/to/cgroup/io.policy
+
+	weight=0 means removing a policy.
+
+	Examples:
+
+	Configure weight=300 ioprio_class=2 on /dev/sdb (8:16) in this cgroup
+	# echo 8:16 300 2 > io.policy
+	# cat io.policy
+	dev	weight	class
+	8:16	300	2
+
+	Configure weight=500 ioprio_class=1 on /dev/sda (8:0) in this cgroup
+	# echo 8:0 500 1 > io.policy
+	# cat io.policy
+	dev	weight	class
+	8:0	500	1
+	8:16	300	2
+
+	Remove the policy for /dev/sda in this cgroup
+	# echo 8:0 0 1 > io.policy
+	# cat io.policy
+	dev	weight	class
+	8:16	300	2
+
+About configuring request descriptors
+=====================================
+Traditionally there are 128 request descriptors allocated per request queue
+where the io scheduler is operating (/sys/block/<disk>/queue/nr_requests). If
+these request descriptors are exhausted, processes will be put to sleep and
+woken up once request descriptors are available.
+
+With the io controller and cgroups, one cannot afford to allocate requests
+from a single pool, as one group might allocate lots of requests and then
+tasks from other groups might be put to sleep, and such a group might be a
+higher weight group. Hence, to make sure that a group can always get the
+request descriptors it is entitled to, one needs a per group request
+descriptor limit on every queue.
+
+A new parameter /sys/block/<disk>/queue/nr_group_requests has been introduced
+and this parameter controls the maximum number of requests per group.
+nr_requests still continues to control the total number of request
+descriptors on the queue.
+
+Ideally one should set nr_requests as follows.
+
+nr_requests = number_of_cgroups * nr_group_requests
+
+This will make sure that at any point of time nr_group_requests number of
+request descriptors will be available for any of the cgroups.
+
+Currently the defaults are nr_requests=512 and nr_group_requests=128. This
+makes sure that apart from the root group one can create 3 more groups without
+running into any issues. If one decides to create more cgroups, nr_requests
+and nr_group_requests should be adjusted accordingly.
-- 
1.6.0.6
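
As a rough cross-check of the "deadline" results quoted in the test setup
section of the patch, the sketch below compares each group's share of disk
time and sectors with its share of the cgroup weights. The numbers are copied
from the results in the patch; the helper name shares() is hypothetical and
only used for illustration, it is not part of the patch.

```python
def shares(values):
    """Return each value as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]

# Values taken from the test results for device 8:16.
weights = [1000, 500]          # cgroup weights of group1 and group2
disk_time = [17254, 8498]      # io.disk_time per group, in milliseconds
sectors = [2104288, 1007040]   # io.disk_sectors per group

for name, observed in (("time", disk_time), ("sectors", sectors)):
    for expected, got in zip(shares(weights), shares(observed)):
        print(f"{name}: expected share {expected:.3f}, observed share {got:.3f}")
```

If the controller is proportional, the observed shares should land close to
the weight shares (two thirds and one third for weights 1000 and 500).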

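The sizing rule from the request descriptor section of the patch,
nr_requests = number_of_cgroups * nr_group_requests, can be sketched as plain
arithmetic. The function names below are hypothetical helpers; the code does
not read or write anything under /sys/block/<disk>/queue, it only computes
the values one would then set there.

```python
def required_nr_requests(number_of_cgroups, nr_group_requests):
    """Apply the rule nr_requests = number_of_cgroups * nr_group_requests."""
    return number_of_cgroups * nr_group_requests

def max_cgroups(nr_requests, nr_group_requests):
    """How many groups (root included) can each get a full per-group quota."""
    return nr_requests // nr_group_requests

# Defaults quoted in the patch: nr_requests=512, nr_group_requests=128.
print(required_nr_requests(4, 128))  # root group plus 3 more groups
print(max_cgroups(512, 128))         # groups supported by the defaults
```

With the quoted defaults this gives 512 descriptors for 4 groups, which
matches the statement that 3 groups can be created besides the root group.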
