Linux Container Development
  • * [RFC] IO scheduler based IO controller V9
    From: Vivek Goyal @ 2009-08-28 21:30 UTC
      To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA
      Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
    	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
    	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
    	paolo.valente-rcYM44yAMweonA0d6jMUrA,
    	jmarchan-H+wXaHxf7aLQT0dZR+AlfA, fernando-gVGce1chcLdL9jVzuh4AOg,
    	jmoyer-H+wXaHxf7aLQT0dZR+AlfA, mingo-X9Un+BFzKDI,
    	riel-H+wXaHxf7aLQT0dZR+AlfA, fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
    	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
    	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
    	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
    	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
    
    
    Hi All,
    
    Here is the V9 of the IO controller patches generated on top of 2.6.31-rc7.
    
    For ease of patching, a consolidated patch is available here.
    
    http://people.redhat.com/~vgoyal/io-controller/io-scheduler-based-io-controller-v9.patch
    
    Changes from V8
    ===============
    - Implemented bdi-like congestion semantics for io groups as well. Once an
      io group gets congested, we don't clear the congestion flag until the
      number of requests goes below nr_congestion_off.

      This helps in getting rid of the buffered write performance regression we
      were observing with the io controller patches.
    
      Gui, can you please test it and see if this version is better in terms
      of your buffered write tests.
    
    - Moved some of the functions from blk-core.c to elevator-fq.c. This reduces
      CONFIG_GROUP_IOSCHED ifdefs in blk-core.c and the code looks a little cleaner.
    
    - Fixed issue of add_front where we go left on rb-tree if add_front is
      specified in case of preemption.
    
    - Requeue async ioq after one round of dispatch. This helps in emulating
      CFQ behavior.
    
    - Pulled in v11 of io tracking patches and modified config option so that if
      CONFIG_TRACK_ASYNC_CONTEXT is not enabled, blkio is not compiled in.
    
    - Fixed some block tracepoints which were broken because of per group request
      list changes.
    
    - Fixed some logging messages.
    
    - Got rid of extra call to update_prio as pointed out by Jerome and Gui.
    
    - Merged the fix from Jerome for a crash while changing prio.

    - Got rid of redundant slice_start assignment as pointed out by Gui.

    - Merged an elv_ioq_nr_dispatched() cleanup from Gui.
    
    - Fixed a compilation issue if CONFIG_BLOCK=n.
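
    To illustrate the congestion change in the first item above, here is a
    minimal Python sketch of the hysteresis. It is not the patch code: the class
    name and the threshold values are made up for illustration, and
    nr_congestion_on is assumed to be the matching upper threshold at which the
    flag gets set.

```python
class GroupCongestion:
    """Toy model of hysteresis-based congestion tracking for one io group."""

    def __init__(self, nr_congestion_on, nr_congestion_off):
        # The "off" threshold sits below the "on" threshold (assumed values).
        assert nr_congestion_off < nr_congestion_on
        self.nr_congestion_on = nr_congestion_on
        self.nr_congestion_off = nr_congestion_off
        self.nr_requests = 0
        self.congested = False

    def request_allocated(self):
        self.nr_requests += 1
        # Mark the group congested once it reaches the upper threshold.
        if self.nr_requests >= self.nr_congestion_on:
            self.congested = True

    def request_freed(self):
        self.nr_requests -= 1
        # Clear the flag only after dropping below the lower threshold.
        if self.nr_requests < self.nr_congestion_off:
            self.congested = False


group = GroupCongestion(nr_congestion_on=8, nr_congestion_off=5)
for _ in range(8):
    group.request_allocated()
print("after 8 allocations:", group.congested)
while group.congested:
    group.request_freed()
print("flag cleared at", group.nr_requests, "requests")
```

    With thresholds of 8 and 5, a group that reached 8 requests stays congested
    while it drains through 7, 6 and 5 and is marked uncongested only at 4, so
    the flag does not flap around a single threshold.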
     
    What problem are we trying to solve
    ===================================
    Provide group IO scheduling feature in Linux along the lines of other resource
    controllers like cpu.
    
    IOW, provide facility so that a user can group applications using cgroups and
    control the amount of disk time/bandwidth received by a group based on its
    weight. 
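
    As a rough illustration of what weight-based control means, the sketch below
    computes the fraction of disk time each group would receive. It assumes all
    groups are continuously backlogged; the group names and weights are
    hypothetical.

```python
def disk_time_shares(weights):
    """Return each group's fraction of disk time, proportional to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one group must have a positive weight")
    return {group: weight / total for group, weight in weights.items()}


# Hypothetical groups and weights, chosen only for illustration.
shares = disk_time_shares({"grp_a": 1000, "grp_b": 500, "grp_c": 500})
for group, share in shares.items():
    print(f"{group}: {share:.0%} of disk time")
```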
    
    How to solve the problem
    =========================
    
    Different people have solved the issue differently. There are now at least
    three patchsets available (including this one).
    
    IO throttling
    -------------
    This is a bandwidth controller which keeps track of IO rate of a group and
    throttles the process in the group if it exceeds the user specified limit.
    
    dm-ioband
    ---------
    This is a proportional bandwidth controller implemented as a device mapper
    driver. It provides fair access in terms of amount of IO done (not in terms
    of disk time as CFQ does).

    So one will set up one or more dm-ioband devices on top of a physical/logical
    block device, configure the ioband device and pass information like grouping
    etc. This device will then keep track of bios flowing through it and control
    the flow of bios based on group policies.
    
    IO scheduler based IO controller
    --------------------------------
    Here I have viewed the problem of IO controller as a hierarchical group
    scheduling issue (along the lines of CFS group scheduling). Currently one can
    view Linux IO schedulers as flat, where there is one root group and all the
    IO belongs to that group.
    
    This patchset basically modifies IO schedulers to also support hierarchical
    group scheduling. CFQ already provides fairness among different processes. I
    have extended it to support group IO scheduling. I also took some of the code
    out of CFQ and put it in a common layer so that the same group scheduling code
    can be used by noop, deadline and AS to support group scheduling.
    
    Pros/Cons
    =========
    There are pros and cons to each of the approaches. Following are some of my
    thoughts.
    
    - IO throttling is a max bandwidth controller and not a proportional one.
      Additionally it provides fairness in terms of amount of IO done (and not in
      terms of disk time as CFQ does).
    
      Personally, I think that proportional weight controller is useful to more
      people than just max bandwidth controller. In addition, IO scheduler based
      controller can also be enhanced to do max bandwidth control, if need be.
    
    - dm-ioband also provides fairness in terms of amount of IO done, not in terms
      of disk time. So a seeky process can still run away with a lot more disk
      time. This raises an interesting question: how should fairness among groups
      be viewed, and what is more relevant? Should fairness be based on the amount
      of IO done or on the amount of disk time consumed, as CFQ does? The IO
      scheduler based controller provides fairness in terms of disk time used.
    
    - IO throttling and dm-ioband are both second level controllers. That is, these
      controllers are implemented in higher layers than the io schedulers. So they
      control the IO at a higher layer based on group policies, and later the IO
      schedulers take care of dispatching these bios to disk.

      Implementing a second level controller has the advantage of being able to
      provide bandwidth control even on logical block devices in the IO stack
      which don't have any IO scheduler attached to them. But such controllers can
      also interfere with the IO scheduling policy of the underlying IO scheduler
      and change its effective behavior. Following are some of the issues which I
      think should be visible in a second level controller in one form or another.
    
      Prio with-in group
      ------------------
      A second level controller can potentially interfere with the behavior of
      different prio processes with-in a group. bios are buffered at a higher layer
      in a single queue and release of bios is FIFO and not proportionate to the
      ioprio of the process. This can result in a particular prio level not
      getting its fair share.

      Buffering at a higher layer can delay read requests for more than the slice
      idle period of CFQ (default 8 ms). That means it is possible that we are
      waiting for a request from the queue but it is buffered at a higher layer,
      and then the idle timer will fire. The queue will then lose its share, and
      at the same time overall throughput will be impacted as we lost those 8 ms.
      
      Read Vs Write
      -------------
      Writes can overwhelm readers, hence the second level controller's FIFO
      release will run into issues here. If there is a single queue maintained,
      then reads will suffer large latencies. If there are separate queues for
      reads and writes, then it will be hard to decide in what ratio to dispatch
      reads and writes, as it is the IO scheduler's decision when and how much
      read/write to dispatch. This is another place where a higher level
      controller will not be in sync with the lower level io scheduler and can
      change the effective policies of the underlying io scheduler.
    
      Fairness in terms of disk time / size of IO
      ---------------------------------------------
      A higher level controller will most likely be limited to providing fairness
      in terms of size of IO done and will find it hard to provide fairness in
      terms of disk time used (as CFQ provides between various prio levels). This
      is because only the IO scheduler knows how much disk time a queue has used.

      Not sure how useful it is to have fairness in terms of sectors, as CFQ has
      been providing fairness in terms of disk time. So a seeky application will
      still run away with a lot of disk time and bring down the overall throughput
      of the disk more than usual.
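
      The difference between the two notions of fairness can be made concrete
      with a toy calculation. This is only a sketch under stated assumptions:
      one sequential and one seeky reader issue equal-sized requests, and the
      per-request costs below are hypothetical numbers, not measurements.

```python
# Hypothetical per-request costs on a rotating disk, chosen only for illustration.
SEQ_COST_MS = 1.0     # sequential reader: transfer time only
SEEKY_COST_MS = 10.0  # seeky reader: 9 ms seek + 1 ms transfer


def fair_by_io_amount(window_ms):
    """Both readers complete the same number of equal-sized requests."""
    requests_each = window_ms / (SEQ_COST_MS + SEEKY_COST_MS)
    return {
        "seq_requests": requests_each,
        "seeky_requests": requests_each,
        "seq_disk_time_ms": requests_each * SEQ_COST_MS,
        "seeky_disk_time_ms": requests_each * SEEKY_COST_MS,
    }


def fair_by_disk_time(window_ms):
    """Both readers get the same amount of disk time."""
    time_each = window_ms / 2
    return {
        "seq_requests": time_each / SEQ_COST_MS,
        "seeky_requests": time_each / SEEKY_COST_MS,
        "seq_disk_time_ms": time_each,
        "seeky_disk_time_ms": time_each,
    }


for policy in (fair_by_io_amount, fair_by_disk_time):
    result = policy(1000.0)
    total = result["seq_requests"] + result["seeky_requests"]
    print(f"{policy.__name__}: seeky reader uses "
          f"{result['seeky_disk_time_ms']:.0f} ms of disk time, "
          f"{total:.0f} requests completed in total")
```

      With these made-up costs, fairness by IO amount lets the seeky reader hold
      the disk for roughly 909 of every 1000 ms, and the total number of
      completed requests drops from 550 to about 182.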
    
      CFQ IO context Issues
      ---------------------
      Buffering at higher layer means submission of bios later with the help of
      a worker thread. This changes the io context information at CFQ layer which
      assigns the request to submitting thread. Change of io context info again
      leads to issues of idle timer expiry and issue of a process not getting fair
      share and reduced throughput.
    
      Throughput with noop, deadline and AS
      ---------------------------------------------
      I think a higher level controller will result in reduced overall throughput
      (as compared to the io scheduler based io controller) and more seeks with
      noop, deadline and AS.
    
      The reason being, that it is likely that IO with-in a group will be related
      and will be relatively close as compared to IO across the groups. For example,
      thread pool of kvm-qemu doing IO for virtual machine. In case of higher level
      control, IO from various groups will go into a single queue at lower level
      controller and it might happen that IO is now interleaved (G1, G2, G1, G3,
      G4....) causing more seeks and reduced throughput. (Agreed that merging will
      help up to some extent but still....).
    
      Instead, in case of a lower level controller, the IO scheduler maintains one
      queue per group, hence there is no interleaving of IO between groups. And if
      IO is related with-in a group, then we should get a reduced number/amount of
      seeks and higher throughput.
    
      Latency can be a concern but that can be controlled by reducing the time
      slice length of the queue.
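
      The interleaving argument can be sketched with a toy seek-distance count.
      It assumes three hypothetical groups that each read four consecutive
      sectors in their own region of the disk, and it ignores request merging
      and any sorting the IO scheduler might do.

```python
from itertools import chain


def total_seek_distance(sectors, start=0):
    """Sum of head movements needed to service sectors in the given order."""
    distance = 0
    position = start
    for sector in sectors:
        distance += abs(sector - position)
        position = sector
    return distance


# Hypothetical workload: each group's IO is close together on disk.
groups = {
    "G1": [0, 1, 2, 3],
    "G2": [1000, 1001, 1002, 1003],
    "G3": [2000, 2001, 2002, 2003],
}

# Higher level control: IO from all groups lands interleaved in a single
# queue (G1, G2, G3, G1, ...).
interleaved = [sector for batch in zip(*groups.values()) for sector in batch]

# IO scheduler based control: one queue per group, served one group at a time.
per_group = list(chain.from_iterable(groups.values()))

print("interleaved:", total_seek_distance(interleaved))
print("per group:  ", total_seek_distance(per_group))
```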
    
    - The IO scheduler based controller has the limitation that it works only with
      the bottom most devices in the IO stack, where the IO scheduler is attached.
      The question is how important/relevant it is to control bandwidth at higher
      level logical devices as well. The actual contention for resources is at
      the leaf block device, so it probably makes sense to do any kind of control
      there and not at the intermediate devices. Secondly, it probably also means
      better use of available resources.
    
      For example, assume a user has created a linear logical device lv0 using
      three underlying disks sda, sdb and sdc. Also assume there are two tasks
      T1 and T2 in two groups doing IO on lv0. Also assume that weights of groups
      are in the ratio of 2:1 so T1 should get double the BW of T2 on lv0 device.
    
    			     T1    T2
    			       \   /
    			        lv0
    			      /  |  \
    			    sda sdb  sdc
    
      Now suppose IO control is done at the lv0 level while T1 is doing IO only to
      sda and T2's IO is going only to sdc. In this case there is no need for
      resource management, as the two IO streams do not contend where it matters.
      If we try to do IO control at the lv0 device, it will not be an optimal use
      of resources and will bring down overall throughput.
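
      The lv0 example can be worked through numerically. This is a simplified
      model, not a description of any existing controller: every disk is assumed
      to deliver a hypothetical 60 MB/s, and the lv0-level controller is assumed
      to enforce the 2:1 ratio strictly while both groups are backlogged.

```python
def leaf_level_control(users_per_disk, weights, disk_bw):
    """Apply weights per disk, only among the tasks actually using that disk."""
    bandwidth = {task: 0.0 for task in weights}
    for users in users_per_disk.values():
        total_weight = sum(weights[task] for task in users)
        for task in users:
            bandwidth[task] += disk_bw * weights[task] / total_weight
    return bandwidth


def logical_level_control(users_per_disk, weights, disk_bw):
    """Enforce the weight ratio on the logical device as a whole."""
    # Crude per-task ceiling: an equal split of every disk the task uses.
    ceiling = {task: 0.0 for task in weights}
    for users in users_per_disk.values():
        for task in users:
            ceiling[task] += disk_bw / len(users)
    # Largest scale factor that keeps every task under its ceiling while
    # holding bandwidth strictly proportional to weight.
    scale = min(ceiling[task] / weights[task] for task in weights)
    return {task: scale * weights[task] for task in weights}


DISK_BW = 60.0  # MB/s per disk, hypothetical
users_per_disk = {"sda": ["T1"], "sdb": [], "sdc": ["T2"]}
weights = {"T1": 2, "T2": 1}

leaf = leaf_level_control(users_per_disk, weights, DISK_BW)
logical = logical_level_control(users_per_disk, weights, DISK_BW)
print("control at leaf disks:", leaf, "total", sum(leaf.values()))
print("control at lv0:       ", logical, "total", sum(logical.values()))
```

      Under these assumptions, control at the leaf disks lets both tasks run at
      full disk speed, while strict 2:1 enforcement at lv0 holds T2 to half of
      T1's rate even though sdc is otherwise idle.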
    
    IMHO, IO scheduler based IO controller is a reasonable approach to solve the
    problem of group bandwidth control, and can do hierarchical IO scheduling
    more tightly and efficiently. But I am all ears to alternative approaches and
    suggestions on how things can be done better.
    
    TODO
    ====
    - code cleanups, testing, bug fixing, optimizations, benchmarking etc...
    - More testing to make sure there are no regressions in CFQ.
    
    Open Issues
    ===========
    - Currently for async requests like buffered writes, we get the io group
      information from the page instead of the task context. How important is it
      to determine the context from the page?

      Can we put all the pdflush threads into a separate group and control system
      wide buffered write bandwidth? Any buffered writes submitted by the process
      directly will anyway go to the right group.

      If that is acceptable, then we can drop all the code associated with async io
      context, and that should simplify the patchset a lot.
    
    Testing
    =======
    I have divided the testing results into three sections.
    
    - Latency
    - Throughput and Fairness
    - Group Fairness
    
    Because I have enhanced CFQ to also do group scheduling, one of the concerns
    has been that existing CFQ should not regress, at least in a flat setup. If
    one creates groups and puts tasks in those, then this is a new environment and
    some properties can change because groups have the additional requirement
    of providing isolation.
    
    Environment
    ==========
    A 7200 RPM SATA drive with queue depth of 31. Ext3 filesystem.
     
    Latency Testing
    ++++++++++++++++
    
    Test1: fsync-test with torture test from linus as background writer
    ------------------------------------------------------------
    I looked at the Ext3 fsync latency thread and picked fsync-test from Theodore
    Ts'o and the torture test from Linus as background writer to see what the
    fsync completion latencies look like. Following are the results.
    
    Vanilla CFQ             IOC                     IOC (with map async)
    ===========             ===                     ====================
    fsync time: 0.2515      fsync time: 0.8580      fsync time: 0.0531
    fsync time: 0.1082      fsync time: 0.1408      fsync time: 0.8907
    fsync time: 0.2106      fsync time: 0.3228      fsync time: 0.2709
    fsync time: 0.2591      fsync time: 0.0978      fsync time: 0.3198
    fsync time: 0.2776      fsync time: 0.3035      fsync time: 0.0886
    fsync time: 0.2530      fsync time: 0.0903      fsync time: 0.3035
    fsync time: 0.2271      fsync time: 0.2712      fsync time: 0.0961
    fsync time: 0.1057      fsync time: 0.3357      fsync time: 0.1048
    fsync time: 0.1699      fsync time: 0.3175      fsync time: 0.2582
    fsync time: 0.1923      fsync time: 0.2964      fsync time: 0.0876
    fsync time: 0.1805      fsync time: 0.0971      fsync time: 0.2546
    fsync time: 0.2944      fsync time: 0.2728      fsync time: 0.3059
    fsync time: 0.1420      fsync time: 0.1079      fsync time: 0.2973
    fsync time: 0.2650      fsync time: 0.3103      fsync time: 0.2032
    fsync time: 0.1581      fsync time: 0.1987      fsync time: 0.2926
    fsync time: 0.2656      fsync time: 0.3048      fsync time: 0.1934
    fsync time: 0.2666      fsync time: 0.3092      fsync time: 0.2954
    fsync time: 0.1272      fsync time: 0.0165      fsync time: 0.2952
    fsync time: 0.2655      fsync time: 0.2827      fsync time: 0.2394
    fsync time: 0.0147      fsync time: 0.0068      fsync time: 0.0454
    fsync time: 0.2296      fsync time: 0.2923      fsync time: 0.2936
    fsync time: 0.0069      fsync time: 0.3021      fsync time: 0.0397
    fsync time: 0.2668      fsync time: 0.1032      fsync time: 0.2762
    fsync time: 0.1932      fsync time: 0.0962      fsync time: 0.2946
    fsync time: 0.1895      fsync time: 0.3545      fsync time: 0.0774
    fsync time: 0.2577      fsync time: 0.2406      fsync time: 0.3027
    fsync time: 0.4935      fsync time: 0.7193      fsync time: 0.2984
    fsync time: 0.2804      fsync time: 0.3251      fsync time: 0.1057
    fsync time: 0.2685      fsync time: 0.1001      fsync time: 0.3145
    fsync time: 0.1946      fsync time: 0.2525      fsync time: 0.2992
    
    IOC--> With IO controller patches applied. CONFIG_TRACK_ASYNC_CONTEXT=n
    IOC(map async) --> IO controller patches with CONFIG_TRACK_ASYNC_CONTEXT=y
    
    If CONFIG_TRACK_ASYNC_CONTEXT=y, async requests are mapped to the group based
    on the cgroup info stored in the page; otherwise they are mapped to the cgroup
    the submitting task belongs to.
    
    Notes: 
    - It looks like the max fsync time is a bit higher with the IO controller
      patches. Will dig more into it later.
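
    Per-column maximum and mean can be pulled out of a table in the format above
    with a small script. The following is a minimal sketch; the helper name is
    made up, and the sample holds only the first three rows of the table.

```python
import re

FSYNC_RE = re.compile(r"fsync time:\s*([0-9.]+)")


def column_stats(table_text, names):
    """Return max and mean fsync time for each named column."""
    columns = {name: [] for name in names}
    for line in table_text.splitlines():
        values = [float(v) for v in FSYNC_RE.findall(line)]
        if len(values) != len(names):
            continue  # skip headers, separators and blank lines
        for name, value in zip(names, values):
            columns[name].append(value)
    return {
        name: {"max": max(vals), "mean": sum(vals) / len(vals)}
        for name, vals in columns.items()
        if vals
    }


# First three rows of the table above, used as sample input.
sample = """\
fsync time: 0.2515      fsync time: 0.8580      fsync time: 0.0531
fsync time: 0.1082      fsync time: 0.1408      fsync time: 0.8907
fsync time: 0.2106      fsync time: 0.3228      fsync time: 0.2709
"""

stats = column_stats(sample, ["Vanilla CFQ", "IOC", "IOC (with map async)"])
for name, values in stats.items():
    print(f"{name}: max={values['max']:.4f} mean={values['mean']:.4f}")
```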
    
    Test2: read small files with multiple sequential readers (10) running
    =====================================================================
    Took Ingo's small file reader test and ran it while 10 sequential readers
    were running.
    
    Vanilla CFQ     IOC (flat)      IOC (10 readers in 10 groups)
    0.12 seconds    0.11 seconds    1.62 seconds
    0.05 seconds    0.05 seconds    1.18 seconds
    0.05 seconds    0.05 seconds    1.17 seconds
    0.03 seconds    0.04 seconds    1.18 seconds
    1.15 seconds    1.17 seconds    1.29 seconds
    1.18 seconds    1.16 seconds    1.17 seconds
    1.17 seconds    1.16 seconds    1.17 seconds
    1.18 seconds    1.15 seconds    1.28 seconds
    1.17 seconds    1.15 seconds    1.17 seconds
    1.16 seconds    1.18 seconds    1.18 seconds
    1.15 seconds    1.15 seconds    1.17 seconds
    1.17 seconds    1.15 seconds    1.18 seconds
    1.17 seconds    1.15 seconds    1.17 seconds
    1.17 seconds    1.16 seconds    1.18 seconds
    1.17 seconds    1.15 seconds    1.17 seconds
    0.04 seconds    0.04 seconds    1.18 seconds
    1.17 seconds    1.16 seconds    1.17 seconds
    1.18 seconds    1.15 seconds    1.17 seconds
    1.18 seconds    1.15 seconds    1.28 seconds
    1.18 seconds    1.15 seconds    1.18 seconds
    1.17 seconds    1.16 seconds    1.18 seconds
    1.17 seconds    1.18 seconds    1.17 seconds
    1.17 seconds    1.15 seconds    1.17 seconds
    1.16 seconds    1.16 seconds    1.17 seconds
    1.17 seconds    1.15 seconds    1.17 seconds
    1.16 seconds    1.15 seconds    1.17 seconds
    1.15 seconds    1.15 seconds    1.18 seconds
    1.18 seconds    1.16 seconds    1.17 seconds
    1.16 seconds    1.16 seconds    1.17 seconds
    1.17 seconds    1.16 seconds    1.17 seconds
    1.16 seconds    1.16 seconds    1.17 seconds
    
    In the third column, the 10 readers have been put into 10 groups instead of
    running in the root group. The small file reader runs in the root group.

    Notes: It looks like read latencies here remain the same as with vanilla CFQ.
    
    Test3: read small files with multiple writers (8) running
    =========================================================
    Again running small file reader test with 8 buffered writers running with
    prio 0 to 7.
    
    Latency results are in seconds. Tried to capture the output with multiple
    configurations of IO controller to see the effect.
    
    Vanilla IOC     IOC     IOC     IOC     IOC     IOC
            (flat)  (groups)(groups)(map)   (map)   (map)
                    (f=0)   (f=1)   (flat)  (groups)(groups)
                                            (f=0)   (f=1)
    0.25    0.03    0.31    0.25    0.29    1.25    0.39
    0.27    0.28    0.28    0.30    0.41    0.90    0.80
    0.25    0.24    0.23    0.37    0.27    1.17    0.24
    0.14    0.14    0.14    0.13    0.15    0.10    1.11
    0.14    0.16    0.13    0.16    0.15    0.06    0.58
    0.16    0.11    0.15    0.12    0.19    0.05    0.14
    0.03    0.17    0.12    0.17    0.04    0.12    0.12
    0.13    0.13    0.13    0.14    0.03    0.05    0.05
    0.18    0.13    0.17    0.09    0.09    0.05    0.07
    0.11    0.18    0.16    0.18    0.14    0.05    0.12
    0.28    0.14    0.15    0.15    0.13    0.02    0.04
    0.16    0.14    0.14    0.12    0.15    0.00    0.13
    0.14    0.13    0.14    0.13    0.13    0.02    0.02
    0.13    0.11    0.12    0.14    0.15    0.06    0.01
    0.27    0.28    0.32    0.24    0.25    0.01    0.01
    0.14    0.15    0.18    0.15    0.13    0.06    0.02
    0.15    0.13    0.13    0.13    0.13    0.00    0.04
    0.15    0.13    0.15    0.14    0.15    0.01    0.05
    0.11    0.17    0.15    0.13    0.13    0.02    0.00
    0.17    0.13    0.17    0.12    0.18    0.39    0.01
    0.18    0.16    0.14    0.16    0.14    0.89    0.47
    0.13    0.13    0.14    0.04    0.12    0.64    0.78
    0.16    0.15    0.19    0.11    0.16    0.67    1.17
    0.04    0.12    0.14    0.04    0.18    0.67    0.63
    0.03    0.13    0.17    0.11    0.15    0.61    0.69
    0.15    0.16    0.13    0.14    0.13    0.77    0.66
    0.12    0.12    0.15    0.11    0.13    0.92    0.73
    0.15    0.12    0.15    0.16    0.13    0.70    0.73
    0.11    0.13    0.15    0.10    0.18    0.73    0.82
    0.16    0.19    0.15    0.16    0.14    0.71    0.74
    0.28    0.05    0.26    0.22    0.17    2.91    0.79
    0.13    0.05    0.14    0.14    0.14    0.44    0.65
    0.16    0.22    0.18    0.13    0.26    0.31    0.65
    0.10    0.13    0.12    0.11    0.16    0.25    0.66
    0.13    0.14    0.16    0.15    0.12    0.17    0.76
    0.19    0.11    0.12    0.14    0.17    0.20    0.71
    0.16    0.15    0.14    0.15    0.11    0.19    0.68
    0.13    0.13    0.13    0.13    0.16    0.04    0.78
    0.14    0.16    0.15    0.17    0.15    1.20    0.80
    0.17    0.13    0.14    0.18    0.14    0.76    0.63
    
    f(0/1)--> refers to the "fairness" tunable. This is a new tunable that is part
              of CFQ. If set, we wait for requests from one queue to finish before
              a new queue is scheduled in.

    group ---> writers are running in individual groups and not in the root group.
    map---> buffered writes are mapped to group using info stored in page.
    
    Notes: Except for columns 6 and 7, where the writers are in separate groups
    and we are mapping their writes to the respective group, latencies seem to be
    fine. I think the latencies are higher for the last two cases because
    now the reader can't preempt the writer.
    
    				root
    			       / \  \ \
    			      R  G1 G2 G3
    				 |  |  |
    				 W  W  W

    Test4: Random Reader test in presence of 4 sequential readers and 4 buffered
           writers
    ============================================================================
    Used fio this time to run one random reader and see how it fares in
    the presence of 4 sequential readers and 4 writers.
    
    I have just pasted the output of random reader from fio.
    
    Vanilla Kernel, Three runs
    --------------------------
    read : io=20,512KiB, bw=349KiB/s, iops=10, runt= 60075msec
    clat (usec): min=944, max=2,675K, avg=93715.04, stdev=305815.90
    
    read : io=13,696KiB, bw=233KiB/s, iops=7, runt= 60035msec
    clat (msec): min=2, max=1,812, avg=140.26, stdev=382.55
    
    read : io=13,824KiB, bw=235KiB/s, iops=7, runt= 60185msec
    clat (usec): min=766, max=2,025K, avg=139310.55, stdev=383647.54
    
    IO controller kernel, Three runs
    --------------------------------
    read : io=10,304KiB, bw=175KiB/s, iops=5, runt= 60083msec
    clat (msec): min=2, max=2,654, avg=186.59, stdev=524.08
    
    read : io=10,176KiB, bw=173KiB/s, iops=5, runt= 60054msec
    clat (usec): min=792, max=2,567K, avg=188841.70, stdev=517154.75
    
    read : io=11,040KiB, bw=188KiB/s, iops=5, runt= 60003msec
    clat (usec): min=779, max=2,625K, avg=173915.56, stdev=508118.60
    
    Notes:
    - Looks like vanilla CFQ gives a bit more disk access to random reader. Will
      dig into it.
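
    The clat lines above mix usec and msec, which makes the runs awkward to
    compare by eye. The sketch below normalises one clat line to milliseconds.
    It is a hypothetical helper, not part of fio, and it assumes the K suffix
    means a factor of 1000 and that commas are thousands separators.

```python
import re

UNIT_TO_MSEC = {"usec": 1e-3, "msec": 1.0}
SUFFIX_FACTOR = {"": 1.0, "K": 1e3, "M": 1e6}  # assumed decimal multipliers

CLAT_RE = re.compile(r"clat \((usec|msec)\):\s*(.*)")
FIELD_RE = re.compile(r"(\w+)=\s*(\d[\d,]*(?:\.\d+)?)([KM]?)")


def clat_in_msec(line):
    """Parse one fio clat line and return its fields converted to msec."""
    match = CLAT_RE.search(line)
    if match is None:
        raise ValueError("not a clat line: " + line)
    scale = UNIT_TO_MSEC[match.group(1)]
    fields = {}
    for key, number, suffix in FIELD_RE.findall(match.group(2)):
        # Drop thousands separators (and the field-separating comma, if any).
        value = float(number.replace(",", ""))
        fields[key] = value * SUFFIX_FACTOR[suffix] * scale
    return fields


vanilla = clat_in_msec(
    "clat (usec): min=944, max=2,675K, avg=93715.04, stdev=305815.90")
ioc = clat_in_msec(
    "clat (msec): min=2, max=2,654, avg=186.59, stdev=524.08")
print(f"vanilla avg={vanilla['avg']:.2f} msec, ioc avg={ioc['avg']:.2f} msec")
```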
    
    Throughput and Fairness
    +++++++++++++++++++++++
    Test5: Bandwidth distribution between 4 sequential readers and 4 buffered
           writers
    ==========================================================================
    Used fio to launch 4 sequential readers and 4 buffered writers and watched
    how BW is distributed.
    
    Vanilla kernel, Three sets
    --------------------------
    read : io=962MiB, bw=16,818KiB/s, iops=513, runt= 60008msec
    read : io=969MiB, bw=16,920KiB/s, iops=516, runt= 60077msec
    read : io=978MiB, bw=17,063KiB/s, iops=520, runt= 60096msec
    read : io=922MiB, bw=16,106KiB/s, iops=491, runt= 60057msec
    write: io=235MiB, bw=4,099KiB/s, iops=125, runt= 60049msec
    write: io=226MiB, bw=3,944KiB/s, iops=120, runt= 60049msec
    write: io=215MiB, bw=3,747KiB/s, iops=114, runt= 60049msec
    write: io=207MiB, bw=3,606KiB/s, iops=110, runt= 60049msec
    READ: io=3,832MiB, aggrb=66,868KiB/s, minb=16,106KiB/s, maxb=17,063KiB/s,
    mint=60008msec, maxt=60096msec
    WRITE: io=882MiB, aggrb=15,398KiB/s, minb=3,606KiB/s, maxb=4,099KiB/s,
    mint=60049msec, maxt=60049msec
    
    read : io=1,002MiB, bw=17,513KiB/s, iops=534, runt= 60020msec
    read : io=979MiB, bw=17,085KiB/s, iops=521, runt= 60080msec
    read : io=953MiB, bw=16,637KiB/s, iops=507, runt= 60092msec
    read : io=920MiB, bw=16,057KiB/s, iops=490, runt= 60108msec
    write: io=215MiB, bw=3,560KiB/s, iops=108, runt= 63289msec
    write: io=136MiB, bw=2,361KiB/s, iops=72, runt= 60502msec
    write: io=127MiB, bw=2,101KiB/s, iops=64, runt= 63289msec
    write: io=233MiB, bw=3,852KiB/s, iops=117, runt= 63289msec
    READ: io=3,855MiB, aggrb=67,256KiB/s, minb=16,057KiB/s, maxb=17,513KiB/s,
    mint=60020msec, maxt=60108msec
    WRITE: io=711MiB, aggrb=11,771KiB/s, minb=2,101KiB/s, maxb=3,852KiB/s,
    mint=60502msec, maxt=63289msec
    
    read : io=985MiB, bw=17,179KiB/s, iops=524, runt= 60149msec
    read : io=974MiB, bw=17,025KiB/s, iops=519, runt= 60002msec
    read : io=962MiB, bw=16,772KiB/s, iops=511, runt= 60170msec
    read : io=932MiB, bw=16,280KiB/s, iops=496, runt= 60057msec
    write: io=177MiB, bw=2,933KiB/s, iops=89, runt= 63094msec
    write: io=152MiB, bw=2,637KiB/s, iops=80, runt= 60323msec
    write: io=240MiB, bw=3,983KiB/s, iops=121, runt= 63094msec
    write: io=147MiB, bw=2,439KiB/s, iops=74, runt= 63094msec
    READ: io=3,855MiB, aggrb=67,174KiB/s, minb=16,280KiB/s, maxb=17,179KiB/s,
    mint=60002msec, maxt=60170msec
    WRITE: io=715MiB, aggrb=11,877KiB/s, minb=2,439KiB/s, maxb=3,983KiB/s,
    mint=60323msec, maxt=63094msec
    
    IO controller kernel, Three sets
    --------------------------------
    read : io=944MiB, bw=16,483KiB/s, iops=503, runt= 60055msec
    read : io=941MiB, bw=16,433KiB/s, iops=501, runt= 60073msec
    read : io=900MiB, bw=15,713KiB/s, iops=479, runt= 60040msec
    read : io=866MiB, bw=15,112KiB/s, iops=461, runt= 60086msec
    write: io=244MiB, bw=4,262KiB/s, iops=130, runt= 60040msec
    write: io=177MiB, bw=3,085KiB/s, iops=94, runt= 60042msec
    write: io=158MiB, bw=2,758KiB/s, iops=84, runt= 60041msec
    write: io=180MiB, bw=3,137KiB/s, iops=95, runt= 60040msec
    READ: io=3,651MiB, aggrb=63,718KiB/s, minb=15,112KiB/s, maxb=16,483KiB/s,
    mint=60040msec, maxt=60086msec
    WRITE: io=758MiB, aggrb=13,243KiB/s, minb=2,758KiB/s, maxb=4,262KiB/s,
    mint=60040msec, maxt=60042msec
    
    read : io=960MiB, bw=16,734KiB/s, iops=510, runt= 60137msec
    read : io=917MiB, bw=16,001KiB/s, iops=488, runt= 60122msec
    read : io=897MiB, bw=15,683KiB/s, iops=478, runt= 60004msec
    read : io=908MiB, bw=15,824KiB/s, iops=482, runt= 60149msec
    write: io=209MiB, bw=3,563KiB/s, iops=108, runt= 61400msec
    write: io=177MiB, bw=3,030KiB/s, iops=92, runt= 61400msec
    write: io=200MiB, bw=3,409KiB/s, iops=104, runt= 61400msec
    write: io=204MiB, bw=3,489KiB/s, iops=106, runt= 61400msec
    READ: io=3,682MiB, aggrb=64,194KiB/s, minb=15,683KiB/s, maxb=16,734KiB/s,
    mint=60004msec, maxt=60149msec
    WRITE: io=790MiB, aggrb=13,492KiB/s, minb=3,030KiB/s, maxb=3,563KiB/s,
    mint=61400msec, maxt=61400msec
    
    read : io=968MiB, bw=16,867KiB/s, iops=514, runt= 60158msec
    read : io=925MiB, bw=16,135KiB/s, iops=492, runt= 60142msec
    read : io=875MiB, bw=15,286KiB/s, iops=466, runt= 60003msec
    read : io=872MiB, bw=15,221KiB/s, iops=464, runt= 60049msec
    write: io=213MiB, bw=3,720KiB/s, iops=113, runt= 60162msec
    write: io=203MiB, bw=3,536KiB/s, iops=107, runt= 60163msec
    write: io=208MiB, bw=3,620KiB/s, iops=110, runt= 60162msec
    write: io=203MiB, bw=3,538KiB/s, iops=107, runt= 60163msec
    READ: io=3,640MiB, aggrb=63,439KiB/s, minb=15,221KiB/s, maxb=16,867KiB/s,
    mint=60003msec, maxt=60158msec
    WRITE: io=827MiB, aggrb=14,415KiB/s, minb=3,536KiB/s, maxb=3,720KiB/s,
    mint=60162msec, maxt=60163msec
    
    Notes: It looks like vanilla CFQ favors readers a bit more over writers as
           compared to io controller cfq. Will dig into it.
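
    The read/write split behind this note can be computed from the aggregate
    bandwidth (aggrb) figures in the READ and WRITE summary lines above. A small
    sketch, with the function name made up for illustration:

```python
# Aggregate bandwidth (aggrb, KiB/s) as (read, write) pairs, copied from the
# READ/WRITE summary lines of the three runs per kernel above.
VANILLA = [(66868, 15398), (67256, 11771), (67174, 11877)]
IO_CONTROLLER = [(63718, 13243), (64194, 13492), (63439, 14415)]


def mean_read_share(runs):
    """Average fraction of total bandwidth that went to readers."""
    shares = [read / (read + write) for read, write in runs]
    return sum(shares) / len(shares)


vanilla_share = mean_read_share(VANILLA)
ioc_share = mean_read_share(IO_CONTROLLER)
print(f"vanilla CFQ:   {vanilla_share:.1%} of bandwidth to readers")
print(f"io controller: {ioc_share:.1%} of bandwidth to readers")
```

    On these six runs the readers' share comes out at roughly 84% for vanilla
    CFQ and 82% for the io controller kernel.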
    	 
    Test6: Bandwidth distribution between readers of diff prio
    ==========================================================
    Using fio, ran 8 readers of prio 0 to 7, let them run for 30 seconds and
    watched the overall throughput and how much IO each one got done.
    
    Vanilla kernel, Three sets
    ---------------------------
    read : io=454MiB, bw=15,865KiB/s, iops=484, runt= 30004msec
    read : io=382MiB, bw=13,330KiB/s, iops=406, runt= 30086msec
    read : io=325MiB, bw=11,330KiB/s, iops=345, runt= 30074msec
    read : io=294MiB, bw=10,253KiB/s, iops=312, runt= 30062msec
    read : io=238MiB, bw=8,321KiB/s, iops=253, runt= 30048msec
    read : io=145MiB, bw=5,061KiB/s, iops=154, runt= 30032msec
    read : io=99MiB, bw=3,456KiB/s, iops=105, runt= 30021msec
    read : io=67,040KiB, bw=2,280KiB/s, iops=69, runt= 30108msec
    READ: io=2,003MiB, aggrb=69,767KiB/s, minb=2,280KiB/s, maxb=15,865KiB/s,
    mint=30004msec, maxt=30108msec
    
    read : io=450MiB, bw=15,727KiB/s, iops=479, runt= 30001msec
    read : io=371MiB, bw=12,966KiB/s, iops=395, runt= 30040msec
    read : io=325MiB, bw=11,321KiB/s, iops=345, runt= 30099msec
    read : io=296MiB, bw=10,332KiB/s, iops=315, runt= 30086msec
    read : io=238MiB, bw=8,319KiB/s, iops=253, runt= 30056msec
    read : io=152MiB, bw=5,290KiB/s, iops=161, runt= 30070msec
    read : io=100MiB, bw=3,483KiB/s, iops=106, runt= 30020msec
    read : io=68,832KiB, bw=2,340KiB/s, iops=71, runt= 30118msec
    READ: io=2,000MiB, aggrb=69,631KiB/s, minb=2,340KiB/s, maxb=15,727KiB/s,
    mint=30001msec, maxt=30118msec
    
    read : io=450MiB, bw=15,691KiB/s, iops=478, runt= 30068msec
    read : io=369MiB, bw=12,882KiB/s, iops=393, runt= 30032msec
    read : io=364MiB, bw=12,732KiB/s, iops=388, runt= 30015msec
    read : io=283MiB, bw=9,889KiB/s, iops=301, runt= 30002msec
    read : io=228MiB, bw=7,935KiB/s, iops=242, runt= 30091msec
    read : io=144MiB, bw=5,018KiB/s, iops=153, runt= 30103msec
    read : io=97,760KiB, bw=3,327KiB/s, iops=101, runt= 30083msec
    read : io=66,784KiB, bw=2,276KiB/s, iops=69, runt= 30046msec
    READ: io=1,999MiB, aggrb=69,625KiB/s, minb=2,276KiB/s, maxb=15,691KiB/s,
    mint=30002msec, maxt=30103msec
    
    IO controller kernel, Three sets
    --------------------------------
    read : io=404MiB, bw=14,103KiB/s, iops=430, runt= 30072msec
    read : io=344MiB, bw=11,999KiB/s, iops=366, runt= 30035msec
    read : io=294MiB, bw=10,257KiB/s, iops=313, runt= 30052msec
    read : io=254MiB, bw=8,888KiB/s, iops=271, runt= 30021msec
    read : io=238MiB, bw=8,311KiB/s, iops=253, runt= 30086msec
    read : io=177MiB, bw=6,202KiB/s, iops=189, runt= 30001msec
    read : io=158MiB, bw=5,517KiB/s, iops=168, runt= 30118msec
    read : io=99MiB, bw=3,464KiB/s, iops=105, runt= 30107msec
    READ: io=1,971MiB, aggrb=68,604KiB/s, minb=3,464KiB/s, maxb=14,103KiB/s,
    mint=30001msec, maxt=30118msec
    
    read : io=375MiB, bw=13,066KiB/s, iops=398, runt= 30110msec
    read : io=326MiB, bw=11,409KiB/s, iops=348, runt= 30003msec
    read : io=308MiB, bw=10,758KiB/s, iops=328, runt= 30066msec
    read : io=256MiB, bw=8,937KiB/s, iops=272, runt= 30091msec
    read : io=232MiB, bw=8,088KiB/s, iops=246, runt= 30041msec
    read : io=192MiB, bw=6,695KiB/s, iops=204, runt= 30077msec
    read : io=144MiB, bw=5,014KiB/s, iops=153, runt= 30051msec
    read : io=96,224KiB, bw=3,281KiB/s, iops=100, runt= 30026msec
    READ: io=1,928MiB, aggrb=67,145KiB/s, minb=3,281KiB/s, maxb=13,066KiB/s,
    mint=30003msec, maxt=30110msec
    
    read : io=405MiB, bw=14,162KiB/s, iops=432, runt= 30021msec
    read : io=354MiB, bw=12,386KiB/s, iops=378, runt= 30007msec
    read : io=303MiB, bw=10,567KiB/s, iops=322, runt= 30062msec
    read : io=261MiB, bw=9,126KiB/s, iops=278, runt= 30040msec
    read : io=228MiB, bw=7,946KiB/s, iops=242, runt= 30048msec
    read : io=178MiB, bw=6,222KiB/s, iops=189, runt= 30074msec
    read : io=152MiB, bw=5,286KiB/s, iops=161, runt= 30093msec
    read : io=99MiB, bw=3,446KiB/s, iops=105, runt= 30110msec
    READ: io=1,981MiB, aggrb=68,996KiB/s, minb=3,446KiB/s, maxb=14,162KiB/s,
    mint=30007msec, maxt=30110msec
    
    Notes:
    - Overall throughput is roughly 1-4% lower with the io controller (about 2%
      on average across the three sets).
    - Bandwidth distribution between the prio levels has changed a bit. CFQ
      seems to use a 100ms slice for prio 4; each step towards higher priority
      (lower prio number) adds 20% of that slice and each step towards lower
      priority removes 20%. So the io controller does not seem to be doing too
      badly at meeting that distribution.
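
    The slice rule in the note above can be turned into expected per-prio
    shares of disk time. This is a sketch that assumes only the 100ms base
    slice and a 20ms step per prio level, and ignores idling, seek and
    preemption effects, so it gives a rough reference point rather than a
    prediction. The function names are made up for this example.

```shell
# Sketch: expected per-prio share of disk time under the slice rule above.
# Assumes a 100 ms base slice at prio 4 and a 20 ms step per prio level.

# Slice length in ms for a prio level 0..7 (0 = highest priority).
cfq_slice_ms() {
    echo $(( 100 + 20 * (4 - $1) ))
}

# Print "prio<N> <slice>ms <share>%" for prio 0..7.
cfq_expected_shares() {
    total=0
    for p in 0 1 2 3 4 5 6 7; do
        total=$(( total + $(cfq_slice_ms "$p") ))
    done
    for p in 0 1 2 3 4 5 6 7; do
        s=$(cfq_slice_ms "$p")
        awk -v p="$p" -v s="$s" -v t="$total" \
            'BEGIN { printf "prio%d %dms %.1f%%\n", p, s, 100 * s / t }'
    done
}

cfq_expected_shares
```

    Under these assumptions prio 0 would get about 20.5% of the disk time and
    prio 7 about 4.5%. For comparison, in the first io controller set the
    fastest reader got 14,103 of 68,604 KiB/s (about 20.6%) and the slowest
    about 5.0%; in the first vanilla set the figures are about 22.7% and 3.3%.
    Bandwidth share is only a proxy for disk time share, and the comparison
    assumes the fastest and slowest readers are prio 0 and prio 7.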
    
    Group Fairness
    +++++++++++++++
    Test7 (Isolation between two KVM virtual machines)
    ==================================================
    Created two KVM virtual machines. Partitioned a disk on the host into two
    partitions and gave one partition to each virtual machine. Put the two
    virtual machines in two different cgroups, of weight 1000 and 500
    respectively. The virtual machines created ext3 file systems on the
    partitions exported from the host and did buffered writes. The host sees
    these writes as synchronous, and the virtual machine with the higher weight
    gets double the disk time of the virtual machine with the lower weight. Used
    the deadline scheduler in this test case.

    More details about the configuration are in the documentation patch.
    
    Test8 (Fairness for synchronous reads)
    ======================================
    - Two dd readers in two cgroups with cgroup weights 1000 and 500 (with CFQ
      scheduler and /sys/block/<device>/queue/fairness = 1).

      The higher weight dd finishes first, and at that point my script reads
      the cgroup files io.disk_time and io.disk_sectors for both groups and
      displays the results.
    
      dd if=/mnt/$BLOCKDEV/zerofile1 of=/dev/null &
      dd if=/mnt/$BLOCKDEV/zerofile2 of=/dev/null &
    
      group1 time=8:16 2452 group1 sectors=8:16 457856
      group2 time=8:16 1317 group2 sectors=8:16 247008
    
      234179072 bytes (234 MB) copied, 3.90912 s, 59.9 MB/s
      234179072 bytes (234 MB) copied, 5.15548 s, 45.4 MB/s
    
    In the time and sectors statistics, the first two fields are the major and
    minor number of the device. The third field is the disk time in milliseconds
    and the number of sectors transferred, respectively.
    
    This patchset tries to provide fairness in terms of disk time received. group1
    got almost double the disk time of group2 (at the time the first dd finished).
    These time and sectors statistics can be read from the io.disk_time and
    io.disk_sectors files in the cgroup. More about it in the documentation file.
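
    As a quick check of the "almost double" claim, the ratios can be computed
    directly from the two statistics lines. The helper below is a hypothetical
    sketch that assumes exactly the line layout shown above (disk time in
    field 3, sectors in field 6).

```shell
# Read two statistics lines on stdin and print
# "<disk time ratio> <sector ratio>" of the first group to the second.
group_ratios() {
    awk 'NR == 1 { t1 = $3; s1 = $6 }
         NR == 2 { t2 = $3; s2 = $6 }
         END     { printf "%.2f %.2f\n", t1 / t2, s1 / s2 }'
}

# Values copied from the Test8 output above.
printf '%s\n' \
    'group1 time=8:16 2452 group1 sectors=8:16 457856' \
    'group2 time=8:16 1317 group2 sectors=8:16 247008' | group_ratios
# prints: 1.86 1.85
```

    With weights 1000 and 500 the target ratio is 2.00.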
    
    Test9 (Reader Vs Buffered Writes)
    ================================
    Buffered writes can be problematic and can overwhelm readers, especially with
    noop and deadline. IO controller can provide isolation between readers and
    buffered (async) writers.
    
    First I ran the test without the io controller to see the severity of the
    issue. Ran a hostile writer, started a reader 10 seconds later, and
    monitored the completion time of the reader. The reader reads a 256 MB file.
    Tested this with the noop scheduler.
    
    sample script
    ------------
    sync
    echo 3 > /proc/sys/vm/drop_caches
    time dd if=/dev/zero of=/mnt/sdb/reader-writer-zerofile bs=4K count=2097152 \
        conv=fdatasync &
    sleep 10
    time dd if=/mnt/sdb/256M-file of=/dev/null &
    
    Results
    -------
    8589934592 bytes (8.6 GB) copied, 106.045 s, 81.0 MB/s (Writer)
    268435456 bytes (268 MB) copied, 96.5237 s, 2.8 MB/s (Reader)
    
    Next I tested whether the io controller can provide isolation between
    readers and writers with noop. I created two cgroups of weight 1000 each,
    put the reader in group1 and the writer in group2, and ran the test again.
    Upon completion of the reader, my scripts read the io.disk_time and
    io.disk_sectors cgroup files to get an estimate of how much disk time each
    group got and how many sectors of IO each group did.
    
    For more accurate accounting of disk time for buffered writes with queuing
    hardware I had to set /sys/block/<disk>/queue/iosched/fairness to "1".
    
    sample script
    -------------
    echo $$ > /cgroup/bfqio/test2/tasks
    dd if=/dev/zero of=/mnt/$BLOCKDEV/testzerofile bs=4K count=2097152 &
    sleep 10
    echo noop > /sys/block/$BLOCKDEV/queue/scheduler
    echo  1 > /sys/block/$BLOCKDEV/queue/iosched/fairness
    echo $$ > /cgroup/bfqio/test1/tasks
    dd if=/mnt/$BLOCKDEV/256M-file of=/dev/null &
    wait $!
    # Some code for reading cgroup files upon completion of reader.
    -------------------------
    
    Results
    =======
    268435456 bytes (268 MB) copied, 6.92248 s, 38.8 MB/s
    
    group1 time=8:16 3185 group1 sectors=8:16 524824
    group2 time=8:16 3190 group2 sectors=8:16 503848
    
    Note that the reader now finishes in much less time (about 7 seconds instead
    of about 97 seconds), and both group1 and group2 got about 3.2 seconds of
    disk time. Hence the io controller provides isolation from buffered writes.
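
    The step elided in the script ("Some code for reading cgroup files") might
    look like the sketch below. It assumes that the cgroup hierarchy root is
    passed as an argument (e.g. /cgroup/bfqio), that the groups are named test1
    and test2 as in the script, and that each file holds a single
    "major:minor value" line for the one disk under test. The demo runs against
    mock files in a temporary directory so that it does not depend on a patched
    kernel.

```shell
# Print one "<group> time=... <group> sectors=..." line per group.
# $1 is the root of the cgroup hierarchy (assumed layout: <root>/<group>/).
read_group_stats() {
    root=$1
    for g in test1 test2; do
        t=$(cat "$root/$g/io.disk_time")
        s=$(cat "$root/$g/io.disk_sectors")
        echo "$g time=$t $g sectors=$s"
    done
}

# Demo with mock files holding the values reported in the results above.
demo=$(mktemp -d)
mkdir -p "$demo/test1" "$demo/test2"
echo "8:16 3185"   > "$demo/test1/io.disk_time"
echo "8:16 524824" > "$demo/test1/io.disk_sectors"
echo "8:16 3190"   > "$demo/test2/io.disk_time"
echo "8:16 503848" > "$demo/test2/io.disk_sectors"
read_group_stats "$demo"
rm -rf "$demo"
```

    On a real system the call would be made right after "wait $!" returns, so
    that the numbers reflect the moment the reader finished.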
    
    Test10 (AIO)
    ===========
    
    AIO reads
    -----------
    Set up two fio AIO read jobs in two cgroups with weights 1000 and 500
    respectively. I am using the cfq scheduler. Following are some lines from
    my test script.
    
    ---------------------------------------------------------------
    echo 1000 > /cgroup/bfqio/test1/io.weight
    echo 500 > /cgroup/bfqio/test2/io.weight
    
    fio_args="--ioengine=libaio --rw=read --size=512M --direct=1"
    echo 1 > /sys/block/$BLOCKDEV/queue/iosched/fairness
    
    echo $$ > /cgroup/bfqio/test1/tasks
    fio $fio_args --name=test1 --directory=/mnt/$BLOCKDEV/fio1/ \
        --output=/mnt/$BLOCKDEV/fio1/test1.log \
        --exec_postrun="../read-and-display-group-stats.sh $maj_dev $minor_dev" &
    
    echo $$ > /cgroup/bfqio/test2/tasks
    fio $fio_args --name=test2 --directory=/mnt/$BLOCKDEV/fio2/ \
        --output=/mnt/$BLOCKDEV/fio2/test2.log &
    ----------------------------------------------------------------
    
    test1 and test2 are two groups with weight 1000 and 500 respectively.
    "read-and-display-group-stats.sh" is a small script which reads the
    test1 and test2 cgroup files to determine how much disk time each group
    got until the first fio job finished.
    
    Results
    ------
    
    test1 statistics: time=8:16 17955   sectors=8:16 1049656 dq=8:16 2
    test2 statistics: time=8:16 9217   sectors=8:16 602592 dq=8:16 1
    
    The above shows that by the time the first fio job (higher weight) finished,
    group test1 had got 17955 ms of disk time and group test2 had got 9217 ms.
    The number of sectors transferred by each group is shown as well.

    Note that the disk time given to group test1 is almost double that of group
    test2.
    
    AIO writes
    ----------
    Set up two fio AIO direct write jobs in two cgroups with weights 1000 and
    500 respectively. I am using the cfq scheduler. Following are some lines
    from my test script.
    
    ------------------------------------------------
    echo 1000 > /cgroup/bfqio/test1/io.weight
    echo 500 > /cgroup/bfqio/test2/io.weight
    fio_args="--ioengine=libaio --rw=write --size=512M --direct=1"
    
    echo 1 > /sys/block/$BLOCKDEV/queue/iosched/fairness
    
    echo $$ > /cgroup/bfqio/test1/tasks
    fio $fio_args --name=test1 --directory=/mnt/$BLOCKDEV/fio1/ \
        --output=/mnt/$BLOCKDEV/fio1/test1.log \
        --exec_postrun="../read-and-display-group-stats.sh $maj_dev $minor_dev" &
    
    echo $$ > /cgroup/bfqio/test2/tasks
    fio $fio_args --name=test2 --directory=/mnt/$BLOCKDEV/fio2/ \
        --output=/mnt/$BLOCKDEV/fio2/test2.log &
    -------------------------------------------------
    
    test1 and test2 are two groups with weight 1000 and 500 respectively.
    "read-and-display-group-stats.sh" is a small script which reads the
    test1 and test2 cgroup files to determine how much disk time each group
    got until the first fio job finished.
    
    Following are the results.
    
    test1 statistics: time=8:16 25452   sectors=8:16 1049664 dq=8:16 2
    test2 statistics: time=8:16 12939   sectors=8:16 532184 dq=8:16 4
    
    The above shows that by the time the first fio job (higher weight) finished,
    group test1 had got almost double the disk time of group test2.
    
    Test11 (Fairness for async writes, Buffered Write Vs Buffered Write)
    ===================================================================
    Fairness for async writes is tricky. The biggest reason is that async writes
    are cached in higher layers (page cache), and possibly in the file system
    layer as well (btrfs, xfs etc.), and are not necessarily dispatched to the
    lower layers in a proportional manner.
    
    For example, consider two dd threads reading /dev/zero as the input file and
    writing huge files. Very soon we will cross vm_dirty_ratio and a dd thread
    will be forced to write out some pages to disk before more pages can be
    dirtied. But the dirty pages picked are not necessarily those of the same
    thread. Writeout can very well pick the inode of the lower weight dd thread.
    So effectively the higher weight dd is doing writeouts of the lower weight
    dd's pages and we don't see service differentiation.
    
    IOW, the core problem with buffered write fairness is that the higher weight
    thread does not throw enough IO traffic at the IO controller to keep its
    queue continuously backlogged. In my testing, there are many 0.2 to 0.8
    second intervals where the higher weight queue is empty, and in that
    duration the lower weight queue gets a lot of work done, giving the
    impression that there was no service differentiation.
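
    A toy model makes the effect concrete. Assume, purely for illustration, that
    whenever both queues are backlogged disk time is split in proportion to
    weight, and that whenever the higher weight queue is empty the lower weight
    queue gets the whole disk. If the higher weight queue is empty for a
    fraction e of the time, the observed disk time ratio can be computed as
    follows (the function name and the model itself are assumptions, not part
    of the patchset):

```shell
# Toy model, not a measurement: observed high:low disk time ratio when the
# high weight queue (weight w1) is empty for a fraction e of the time.
observed_ratio() {
    awk -v w1="$1" -v w2="$2" -v e="$3" 'BEGIN {
        hi = (1 - e) * w1 / (w1 + w2)        # share while both are backlogged
        lo = (1 - e) * w2 / (w1 + w2) + e    # plus the whole disk while hi is empty
        printf "%.2f\n", hi / lo
    }'
}

for e in 0 0.1 0.25; do
    echo "empty fraction $e -> ratio $(observed_ratio 1000 500 "$e")"
done
```

    In this model the 2:1 ratio implied by weights 1000 and 500 shrinks to
    1.5:1 if the higher weight queue is empty 10% of the time, and disappears
    entirely at 25%.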
    
    In summary, from the IO controller's point of view async write support is
    there. But because the page cache has not been designed so that a higher
    prio/weight writer can do more writeout than a lower prio/weight writer,
    getting service differentiation is hard: it is visible in some cases and
    not in others.
    
    Previous versions of the patches were posted here.
    ------------------------------------------------
    
    (V1) http://lkml.org/lkml/2009/3/11/486
    (V2) http://lkml.org/lkml/2009/5/5/275
    (V3) http://lkml.org/lkml/2009/5/26/472
    (V4) http://lkml.org/lkml/2009/6/8/580
    (V5) http://lkml.org/lkml/2009/6/19/279
    (V6) http://lkml.org/lkml/2009/7/2/369
    (V7) http://lkml.org/lkml/2009/7/24/253
    (V8) http://lkml.org/lkml/2009/8/16/204
    
    Thanks
    Vivek
    
    
    
