From: Theodore Tso <tytso@mit.edu>
To: Andrea Righi <righi.andrea@gmail.com>
Cc: Paul Menage <menage@google.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk,
	baramsori72@gmail.com, Carl Henrik Lunde <chlunde@ping.uio.no>,
	dave@linux.vnet.ibm.com, Divyesh Shah <dpshah@google.com>,
	eric.rannaud@gmail.com, fernando@oss.ntt.co.jp,
	Hirokazu Takahashi <taka@valinux.co.jp>,
	Li Zefan <lizf@cn.fujitsu.com>,
	matt@bluehost.com, dradford@bluehost.com, ngupta@google.com,
	randy.dunlap@oracle.com, roberto@unbit.it,
	Ryo Tsuruta <ryov@valinux.co.jp>,
	Satoshi UCHIDA <s-uchida@ap.jp.nec.com>,
	subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp,
	Jens Axboe <jens.axboe@oracle.com>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO
Date: Fri, 17 Apr 2009 08:38:05 -0400
Message-ID: <20090417123805.GC7117@mit.edu>
In-Reply-To: <1239740480-28125-10-git-send-email-righi.andrea@gmail.com>

On Tue, Apr 14, 2009 at 10:21:20PM +0200, Andrea Righi wrote:
> Delaying journal IO can unnecessarily delay other independent IO
> operations from different cgroups.
> 
> Add BIO_RW_META flag to the ext3 journal IO that informs the io-throttle
> subsystem to account but not delay journal IO and avoid potential
> priority inversion problems.

So this worries me for two reasons.  First of all, the meaning of
BIO_RW_META is not well defined, but I'm concerned that you are using
the flag in a way that wasn't its original intent.  I've included
Jens on the cc list so he can comment on that score.

Secondly, there are many more locations than these which can end up
causing I/O which will end up causing the journal commit to block
until they are completed.  I've done a lot of work in the past few
weeks to make sure those writes get marked using BIO_RW_SYNC.  In
data=ordered mode, the journal commit will block waiting for data
blocks to be written out, and that implies you really need to treat as
high priority all of the block writes that are marked with the
BIO_RW_SYNC flag.
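
A toy model may make that ordering constraint concrete.  The sketch
below is plain Python, not kernel code: Write, commit_latency and the
exempt_sync switch are hypothetical names, and the delays are made-up
numbers.  It assumes only that a data=ordered commit cannot finish
before the slowest write it is waiting on.

```python
from dataclasses import dataclass


@dataclass
class Write:
    cgroup: str            # cgroup that issued the write (illustrative label)
    sync: bool             # stands in for a BIO_RW_SYNC-style mark
    throttle_delay: float  # seconds the controller would hold this write


def commit_latency(writes, exempt_sync):
    """Time a commit waits when it depends on every write in `writes`."""
    latency = 0.0
    for w in writes:
        # An exempted write is still issued, just not held back.
        delay = 0.0 if (exempt_sync and w.sync) else w.throttle_delay
        latency = max(latency, delay)
    return latency


# One unthrottled cgroup and one heavily throttled cgroup share a commit.
pending = [
    Write("fast", sync=True, throttle_delay=0.0),
    Write("slow", sync=True, throttle_delay=5.0),
]

print(commit_latency(pending, exempt_sync=False))
print(commit_latency(pending, exempt_sync=True))
```

In this model the "fast" cgroup inherits the full delay of the "slow"
one unless the marked writes are let through.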

The flip side of this is it may end up making your I/O controller too
leaky; that is, someone might be able to evade your I/O controller's
attempt to impose limits by using fsync() all the time.  This is a
hard problem, though, because filesystem I/O is almost always
intertwined.
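
The leak can be sketched the same way.  This is a hedged illustration
in plain Python rather than the io-throttle implementation: throttle,
the (size, is_sync) tuples and the 1 MiB/s limit are all assumptions
made for the example.  Every write is accounted, but only writes
without the sync mark are delayed.

```python
MIB = 1024 * 1024


def throttle(writes, limit_bytes_per_sec):
    """Return (bytes accounted, seconds of delay imposed) for `writes`.

    Each write is a (size_in_bytes, is_sync) tuple.  Sync-marked writes
    are counted against the cgroup but never held back.
    """
    accounted = 0
    delay = 0.0
    for size, is_sync in writes:
        accounted += size
        if not is_sync:
            delay += size / limit_bytes_per_sec
    return accounted, delay


# Ten 1 MiB writes, first as ordinary buffered writes, then with every
# write forced out via fsync() and therefore carrying the sync mark.
buffered = [(MIB, False)] * 10
fsync_heavy = [(MIB, True)] * 10

print(throttle(buffered, MIB))
print(throttle(fsync_heavy, MIB))
```

Both workloads are charged the same number of bytes, but under these
assumptions only the buffered one is ever slowed down.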

What sort of scenarios and workloads are you envisioning might use
this I/O controller?  And can you say more about the specifics of
the priority inversion problem you are concerned about?

Regards,

						- Ted
