From: Andrea Righi <righi.andrea@gmail.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>,
Paul Menage <menage@google.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
agk@sourceware.org, akpm@linux-foundation.org,
baramsori72@gmail.com, Carl Henrik Lunde <chlunde@ping.uio.no>,
dave@linux.vnet.ibm.com, Divyesh Shah <dpshah@google.com>,
eric.rannaud@gmail.com, fernando@oss.ntt.co.jp,
Hirokazu Takahashi <taka@valinux.co.jp>,
Li Zefan <lizf@cn.fujitsu.com>,
matt@bluehost.com, dradford@bluehost.com, ngupta@google.com,
randy.dunlap@oracle.com, roberto@unbit.it,
Ryo Tsuruta <ryov@valinux.co.jp>,
Satoshi UCHIDA <s-uchida@ap.jp.nec.com>,
subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO
Date: Tue, 21 Apr 2009 10:30:02 +0200
Message-ID: <20090421083001.GA8441@linux>
In-Reply-To: <20090421001822.GB19186@mit.edu>
On Mon, Apr 20, 2009 at 08:18:22PM -0400, Theodore Tso wrote:
> On Fri, Apr 17, 2009 at 04:39:05PM +0200, Andrea Righi wrote:
> >
> > Exactly, the purpose here is to prioritize the dispatching of journal
> > IO requests in the IO controller. I may have used an inappropriate flag
> > or a quick&dirty solution, but without this, any cgroup/process that
> > generates a lot of journal activity may be throttled and cause other
> > cgroups/processes to be incorrectly blocked when they try to write to
> > disk.
>
> With ext3 and ext4, all journal I/O requests end up going through
> kjournald. So the question is what I/O control group do you put
> kjournald in? If you unrestrict it, it makes the problem go away
> entirely. On the other hand, it is doing work on behalf of other
> processes, and there is no real way to separate out on whose behalf
> kjournald is doing said work. So I'm not sure fundamentally you'll be
> able to do much with any filesystem journalling activity --- and ext3
> makes life especially bad because of data=ordered mode.
OK, I've just removed the ext3/ext4 patch from io-throttle v14 and the
results are pretty much the same. BTW, I can't even prioritize all the
BIO_RW_SYNC requests, because that way direct IO would never be
limited at all. Or at least I would have to add something like an
is_in_direct_io() check, or something similar.
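
To make the trade-off concrete, here is a minimal userspace sketch of the check described above. It is not code from the io-throttle patch set: the flag values, struct io_req, and bypass_throttle() are hypothetical names, and is_in_direct_io() only stands in for the check that the mail says would have to be added.

```c
#include <stdbool.h>

/* Hypothetical flags for illustration; these are not kernel definitions. */
enum {
    REQ_SYNC   = 1u << 0,  /* stands in for BIO_RW_SYNC */
    REQ_DIRECT = 1u << 1,  /* request was issued by direct IO (assumed marker) */
};

/* Hypothetical, stripped-down request descriptor. */
struct io_req {
    unsigned int flags;
};

/* Stand-in for the is_in_direct_io() check mentioned in the mail. */
static bool is_in_direct_io(const struct io_req *req)
{
    return (req->flags & REQ_DIRECT) != 0;
}

/*
 * Returns true when the request should skip throttling.
 * Sync requests are prioritized, except when they come from direct IO,
 * which would otherwise never be limited.
 */
static bool bypass_throttle(const struct io_req *req)
{
    if (!(req->flags & REQ_SYNC))
        return false;            /* ordinary async IO: always throttled */
    return !is_in_direct_io(req);
}
```

Without the direct IO exclusion, any task using direct IO could escape its limit, which is the problem raised above.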
Anyway, I agree, and I think it's reasonable to always leave kjournald
in the root cgroup and not set any IO limit for that cgroup.
But I wouldn't add additional checks for this; in the end we know that
"Unix gives you just enough rope to hang yourself".
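
As a rough illustration of why no extra check is needed, the sketch below assumes a controller where a bandwidth limit of 0 means "no limit configured". The struct iocg type and throttle_delay_ms() are hypothetical and do not reflect the actual io-throttle interface. A task in a cgroup without a limit, such as kjournald left in the root cgroup, is simply never delayed.

```c
/* Hypothetical per-cgroup IO state; 0 means no limit is configured. */
struct iocg {
    unsigned long long bw_limit;   /* bytes per second */
};

/*
 * Delay, in milliseconds, that a request of `bytes` would incur under a
 * simple rate limit. An unlimited cgroup is never delayed.
 */
static unsigned long long throttle_delay_ms(const struct iocg *cg,
                                            unsigned long long bytes)
{
    if (cg->bw_limit == 0)
        return 0;                  /* e.g. root cgroup with no limit set */
    return bytes * 1000ULL / cg->bw_limit;
}
```

If the administrator does set a limit on the cgroup holding kjournald, the same function will delay journal IO; that is the "rope" referred to above.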
>
> > > I'm assuming it's the "usual" problem with lower priority IO getting
> > > access to fs exclusive data. It's quite trivial to cause problems with
> > > higher IO priority tasks then getting stuck waiting for the low priority
> > > process, since they also need to access that fs exclusive data.
> >
> > Right. I thought about using the BIO_RW_SYNC flag instead, but as Ted
> > pointed out, some cgroups/processes might be able to evade the IO
> > control by issuing a lot of fsync()s. We could also limit the fsync() rate
> > in the IO controller, but it sounds like a dirty workaround...
>
> Well, if you use data=writeback or Chris Mason's proposed data=guarded
> mode, then at least all of the data blocks will be written in the process
> context of the application, and not kjournald's process context. So
> one solution that might be the best that we have for now is to treat
> kjournald as special from an I/O controller point of view (i.e., give
> it its own cgroup), and then use a filesystem mode which avoids data
> blocks getting written in kjournald (i.e., ext3 data=writeback or
> data=guarded, ext4's delayed allocation, etc.)
Agree.
>
> One major form of leakage that you're still going to have is pdflush;
> which again, is more I/O happening in somebody else's process context.
> Ultimately I think trying to throttle I/O at write submission time
> whether at entry into block layer or in the elevators, is going to be
> highly problematic. Suppose someone dirties a large number of pages?
> That's a system resource, and delaying the writes because a particular
> container has used more than its fair share will cause the entire
> system to run out of memory, which is not a good thing.
>
> Ultimately, I think what you'll need to do is write throttling, and suspend
> processes that are dirtying too many pages, instead of trying to
> control the I/O.
We're also trying to address this issue by setting a max dirty pages limit
per cgroup and forcing a direct writeback when these limits are exceeded.
In this case dirty ratio throttling should happen automatically, because
the process will be throttled by the IO controller when it tries to
write back the dirty pages and submit IO requests.
What's your opinion?
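
The idea can be sketched in userspace as follows. This is only an illustration of the accounting described above, not code from any patch: struct cg_dirty, writeback_pages(), and cg_dirty_pages() are hypothetical names, and the real writeback path would block in the IO controller rather than just update counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup dirty page accounting. */
struct cg_dirty {
    size_t nr_dirty;      /* pages currently dirty, charged to this cgroup */
    size_t max_dirty;     /* per-cgroup dirty page limit */
    size_t nr_submitted;  /* pages handed to the (throttled) IO path */
};

/*
 * Stand-in for direct writeback. In the scheme described above, this is
 * where the dirtying process would be slowed down by the IO controller
 * while it submits the IO requests itself.
 */
static void writeback_pages(struct cg_dirty *cg, size_t nr)
{
    if (nr > cg->nr_dirty)
        nr = cg->nr_dirty;
    cg->nr_dirty -= nr;
    cg->nr_submitted += nr;
}

/*
 * Charge `nr` newly dirtied pages to the cgroup. Returns true if the
 * limit was exceeded and the caller had to write back pages directly.
 */
static bool cg_dirty_pages(struct cg_dirty *cg, size_t nr)
{
    cg->nr_dirty += nr;
    if (cg->nr_dirty <= cg->max_dirty)
        return false;
    writeback_pages(cg, cg->nr_dirty - cg->max_dirty);
    return true;
}
```

Because the writeback is issued in the context of the process that dirtied the pages, the IO is charged to the right cgroup and the dirtier, not pdflush, pays the delay.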
Thanks,
-Andrea