From: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Cc: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
sjayaraman-IBi9RG/b67k@public.gmane.org,
lsf-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC] writeback and cgroup
Date: Wed, 25 Apr 2012 11:16:35 +0800
Message-ID: <20120425031635.GA6895@localhost>
In-Reply-To: <20120424145655.GA1474-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
On Tue, Apr 24, 2012 at 04:56:55PM +0200, Jan Kara wrote:
> On Tue 24-04-12 19:33:40, Wu Fengguang wrote:
> > On Mon, Apr 16, 2012 at 10:57:45AM -0400, Vivek Goyal wrote:
> > > On Sat, Apr 14, 2012 at 10:36:39PM +0800, Fengguang Wu wrote:
> > >
> > > [..]
> > > > Yeah the backpressure idea would work nicely with all possible
> > > > intermediate stacking between the bdi and leaf devices. In my attempt
> > > > to do combined IO bandwidth control for
> > > >
> > > > - buffered writes, in balance_dirty_pages()
> > > > - direct IO, in the cfq IO scheduler
> > > >
> > > > I had to look into the cfq code over the past few days to get an idea
> > > > of how the two throttling layers can cooperate (and suffered from the
> > > > pains arising from the layering violations). It's also rather tricky to get
> > > > two previously independent throttling mechanisms to work seamlessly
> > > > with each other for providing the desired _unified_ user interface. It
> > > > took a lot of reasoning and experiments to work the basic scheme out...
> > > >
> > > > But here is the first result. The attached graph shows progress of 4
> > > > tasks:
> > > > - cgroup A: 1 direct dd + 1 buffered dd
> > > > - cgroup B: 1 direct dd + 1 buffered dd
> > > >
> > > > The 4 tasks are mostly progressing at the same pace. The top 2
> > > > smoother lines are for the buffered dirtiers. The bottom 2 lines are
> > > > for the direct writers. As you may notice, the two direct writers
> > > > somehow stalled 1-2 times, which increases the gaps between the
> > > > lines. Otherwise, the algorithm is working as expected to distribute
> > > > the bandwidth to each task.
> > > >
> > > > The current code's target is to satisfy the more realistic user demand
> > > > of distributing bandwidth equally to each cgroup, and inside each
> > > > cgroup, distribute bandwidth equally to buffered/direct writes. On top
> > > > of which, weights can be specified to change the default distribution.
> > > >
> > > > The implementation involves adding "weight for direct IO" to the cfq
> > > > groups and "weight for buffered writes" to the root cgroup. Note that
> > > > current cfq proportional IO controller does not offer explicit control
> > > > over the direct:buffered ratio.
> > > >
> > > > When there are both direct/buffered writers in the cgroup,
> > > > balance_dirty_pages() will kick in and adjust the weights for cfq to
> > > > execute. Note that cfq will continue to send all flusher IOs to the
> > > > root cgroup. balance_dirty_pages() will compute the overall async
> > > > weight for it so that in the above test case, the computed weights
> > > > will be
> > >
> > > I think having separate weights for sync IO groups and async IO is not
> > > very appealing. There should be one notion of group weight, and bandwidth
> > > should be distributed among groups according to their weight.
> >
> > There has to be some scheme, either explicit or implicit. Maybe
> > you are bearing in mind some "equal split among queues" policy? For
> > example, if the cgroup has 9 active sync queues and 1 async queue,
> > split the weight equally to the 10 queues? So the sync IOs get 90%
> > share, and the async writes get 10% share.
> Maybe I misunderstand but there doesn't have to be (and in fact isn't)
> any split among sync / async IO in CFQ. At each moment, we choose a queue
> with the highest score and dispatch a couple of requests from it. Then we
> go and choose again. The score of the queue depends on several factors
> (like age of requests, whether the queue is sync or async, IO priority,
> etc.).
>
> Practically, over a longer period the system will stabilize on some ratio,
> but that depends on the load, so your system should not impose some
> artificial direct/buffered split but rather somehow deal with the reality
> of how the IO scheduler decides to dispatch requests...
> Well, but we also have IO priorities which change which queue should get
> preference.
> And also sync queues for several processes can get merged when CFQ
> observes these processes cooperate together on one area of disk and get
> split again when processes stop cooperating. I don't think you really want
> to second-guess what CFQ does inside...
Good points, thank you!
So the cfq behavior is rather non-deterministic. I more or less realized
this from the experiments. For example, when starting 2+ "dd oflag=direct"
tasks in a single cgroup, they _sometimes_ progress at different rates.
See the attached graphs for two such examples on XFS. ext4 is fine.
The 2-dd test case is:
        mkdir /cgroup/dd
        echo $$ > /cgroup/dd/tasks
        dd if=/dev/zero of=/fs/zero1 bs=1M oflag=direct &
        dd if=/dev/zero of=/fs/zero2 bs=1M oflag=direct &
The 6-dd test case is similar.
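For reference, the N-dd variant could be scripted roughly as below. This
is only a sketch: the helper name gen_direct_dd and the zeroN file naming
for N > 2 are assumptions for illustration, not the exact script behind
the graphs. It prints the commands instead of running them, so the output
can be reviewed first.

```shell
#!/bin/sh
# Print the commands for an N-task direct-write test in one cgroup.
# gen_direct_dd is a hypothetical helper; paths follow the 2-dd case.
gen_direct_dd() {
    n=$1
    echo "mkdir /cgroup/dd"
    # Single quotes keep $$ literal so it expands in the shell that
    # eventually runs the generated commands.
    echo 'echo $$ > /cgroup/dd/tasks'
    i=1
    while [ "$i" -le "$n" ]; do
        echo "dd if=/dev/zero of=/fs/zero$i bs=1M oflag=direct &"
        i=$((i + 1))
    done
}
```

Something like "gen_direct_dd 6 | sh", run as root, would then start the
6-dd case, assuming the blkio hierarchy is mounted at /cgroup.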
> > Look at this graph, the 4 dd tasks are granted the same weight (2 of
> > them are buffered writes). I guess the 2 buffered dd tasks managed to
> > progress much faster than the 2 direct dd tasks just because the async
> > IOs are much more efficient than the bs=64k direct IOs.
> Likely because 64k is too low to get good bandwidth with direct IO. If
> it was 4M, I believe you would get similar throughput for buffered and
> direct IO. So essentially you are right, small IO benefits from caching
> effects since they allow you to submit larger requests to the device which
> is more efficient.
I didn't directly compare the effects; however, here is an example of
doing 1M, 64k and 4k direct writes in parallel. It _seems_ bs=1M has only
marginal benefits over 64k, assuming cfq is behaving well.
https://github.com/fengguang/io-controller-tests/raw/master/log/snb/ext4/direct-write-1M-64k-4k.2012-04-19-10-50/balance_dirty_pages-task-bw.png
The test case is:
        # cgroup 1
        echo 500 > /cgroup/cp/blkio.weight
        dd if=/dev/zero of=/fs/zero-1M bs=1M oflag=direct &

        # cgroup 2
        echo 1000 > /cgroup/dd/blkio.weight
        dd if=/dev/zero of=/fs/zero-64k bs=64k oflag=direct &
        dd if=/dev/zero of=/fs/zero-4k bs=4k oflag=direct &
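As a rough sanity check on what to expect from these weights: if cfq
honors blkio.weight strictly proportionally, each cgroup's nominal share
is its weight divided by the sum of the weights of the active cgroups.
A small sketch of that arithmetic (the helper name weight_share is made
up for illustration, and real shares will deviate with the workload):

```shell
#!/bin/sh
# Nominal share (in percent) of one cgroup. First argument is its weight,
# the remaining arguments are the weights of all active cgroups.
# Assumes strictly proportional service, which cfq only approximates.
weight_share() {
    w=$1
    shift
    total=0
    for x in "$@"; do
        total=$((total + x))
    done
    awk -v w="$w" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * w / t }'
}

weight_share 500 500 1000     # cgroup 1 (cp): prints 33.3
weight_share 1000 500 1000    # cgroup 2 (dd): prints 66.7
```

So cgroup 1 would nominally get about one third, and the two dd tasks in
cgroup 2 would share the remaining two thirds.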
Thanks,
Fengguang
[-- Attachment #2: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 55134 bytes --]
[-- Attachment #3: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 61243 bytes --]