From: Andrea Righi <righi.andrea@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: menage@google.com, balbir@linux.vnet.ibm.com,
	guijianfeng@cn.fujitsu.com, kamezawa.hiroyu@jp.fujitsu.com,
	agk@sourceware.org, axboe@kernel.dk, baramsori72@gmail.com,
	chlunde@ping.uio.no, dave@linux.vnet.ibm.com, dpshah@google.com,
	eric.rannaud@gmail.com, fernando@oss.ntt.co.jp,
	taka@valinux.co.jp, lizf@cn.fujitsu.com, matt@bluehost.com,
	dradford@bluehost.com, ngupta@google.com,
	randy.dunlap@oracle.com, roberto@unbit.it, ryov@valinux.co.jp,
	s-uchida@ap.jp.nec.com, subrata@linux.vnet.ibm.com,
	yoshikawa.takuya@oss.ntt.co.jp,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/9] cgroup: io-throttle controller (v13)
Date: Fri, 17 Apr 2009 11:37:44 +0200
Message-ID: <20090417093744.GA8689@linux>
In-Reply-To: <20090416152433.aaaba300.akpm@linux-foundation.org>

On Thu, Apr 16, 2009 at 03:24:33PM -0700, Andrew Morton wrote:
> On Tue, 14 Apr 2009 22:21:11 +0200
> Andrea Righi <righi.andrea@gmail.com> wrote:
> 
> > Objective
> > ~~~~~~~~~
> > The objective of the io-throttle controller is to improve IO performance
> > predictability of different cgroups that share the same block devices.
> 
> We should get an IO controller into Linux.  Does anyone have a reason
> why it shouldn't be this one?
> 
> > Respect to other priority/weight-based solutions the approach used by
> > this controller is to explicitly choke applications' requests
> 
> Yes, blocking the offending application at a high level has always
> seemed to me to be the best way of implementing the controller.
> 
> > that
> > directly or indirectly generate IO activity in the system (this
> > controller addresses both synchronous IO and writeback/buffered IO).
> 
> The problem I've seen with some of the proposed controllers was that
> they didn't handle delayed writeback very well, if at all.
> 
> Can you explain at a high level but in some detail how this works?  If
> an application is doing a huge write(), how is that detected and how is
> the application made to throttle?

The writeback writes are handled in three steps:

1) track the owner of the dirty pages
2) detect writeback IO
3) delay writeback IO that exceeds the cgroup limits

For 1) I basically reuse the bio-cgroup functionality. bio-cgroup uses
the page_cgroup structure to store the owner of each dirty page when the
page is dirtied. At that point the actual owner of the page can be
retrieved by looking at current->mm->owner (e.g. in __set_page_dirty()),
and its bio_cgroup id is stored in the page_cgroup structure.
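
To make step 1) concrete, here is a minimal user-space sketch of the idea,
not the actual patch code. All the model_* names are hypothetical stand-ins
for struct page, struct page_cgroup and the task's cgroup id, and recording
the owner only on the clean -> dirty transition is a simplifying assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's definitions. */
struct model_page_cgroup {
	unsigned long bio_cgroup_id;	/* owner recorded when the page is dirtied */
};

struct model_page {
	int dirty;
	struct model_page_cgroup pc;
};

struct model_task {
	unsigned long bio_cgroup_id;	/* stands in for the id of current->mm->owner */
};

/*
 * Record the owner when the page goes from clean to dirty.
 * Assumption: a page that is already dirty keeps its first owner.
 */
static void model_set_page_dirty(struct model_page *page,
				 const struct model_task *owner)
{
	assert(page != NULL && owner != NULL);
	if (!page->dirty) {
		page->dirty = 1;
		page->pc.bio_cgroup_id = owner->bio_cgroup_id;
	}
}

/* Later, at writeback time, the id is read back from the page. */
static unsigned long model_page_owner_id(const struct model_page *page)
{
	assert(page != NULL);
	return page->pc.bio_cgroup_id;
}
```

The point is only that the id travels with the page, so whoever submits the
IO later can still find out who dirtied it.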

Then for 2) we can detect writeback IO by placing a hook,
cgroup_io_throttle(), in submit_bio():

unsigned long long
cgroup_io_throttle(struct bio *bio, struct block_device *bdev, ssize_t bytes);

If the IO operation is a write we look at the owner of the pages
involved (from bio) and we check if we must throttle the operation. If
the owner of that page is "current", we throttle the current task
directly (via schedule_timeout_killable()) and we just return 0 from
cgroup_io_throttle() after the sleep.

3) If the owner of the page must be throttled and the current task is
not the same task, e.g., it's a kernel thread (current->flags &
(PF_KTHREAD | PF_FLUSHER | PF_KSWAPD)), then we assume it's a writeback
IO and we immediately return the number of jiffies that the real owner
should sleep.
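
The decision described in 2) and 3) can be sketched in user space roughly as
follows. This is only a model under stated assumptions: the model_* names are
hypothetical, is_kthread stands in for the PF_* flag test, model_sleep()
stands in for schedule_timeout_killable(), and the delay is passed in by the
caller instead of being computed from the cgroup limits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct model_task {
	int id;
	bool is_kthread;	/* models PF_KTHREAD | PF_FLUSHER | PF_KSWAPD */
	unsigned long slept;	/* total jiffies this task was put to sleep */
};

/* Models schedule_timeout_killable(): just record the delay. */
static void model_sleep(struct model_task *t, unsigned long delay)
{
	t->slept += delay;
}

/*
 * Returns 0 if no delay is needed or the caller was throttled in place,
 * otherwise the number of jiffies by which the request must be deferred.
 * "delay" is assumed to be already computed from the owner's limits.
 */
static unsigned long long
model_io_throttle(struct model_task *cur, int owner_id, unsigned long delay)
{
	assert(cur != NULL);
	if (!delay)
		return 0;

	/* The owner itself is submitting the IO: make it sleep right here. */
	if (cur->id == owner_id && !cur->is_kthread) {
		model_sleep(cur, delay);
		return 0;
	}

	/* Someone else (e.g. a flusher thread) submits on the owner's behalf. */
	return delay;
}
```

A non-zero return value is what tells submit_bio() that the request must be
deferred instead of being dispatched immediately.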

void submit_bio(int rw, struct bio *bio)
{
...
	if (bio_has_data(bio)) {
		unsigned long sleep = 0;

		if (rw & WRITE) {
			count_vm_events(PGPGOUT, count);
			sleep = cgroup_io_throttle(bio,
					bio->bi_bdev, bio->bi_size);
		} else {
			task_io_account_read(bio->bi_size);
			count_vm_events(PGPGIN, count);
			cgroup_io_throttle(NULL, bio->bi_bdev, bio->bi_size);
		}
...

		if (sleep && !iothrottle_make_request(bio, jiffies + sleep))
			return;
	}

	generic_make_request(bio);
...
}

Since the current task must not be throttled here, we set a deadline of
jiffies + sleep and add the request to an rbtree via
iothrottle_make_request().

This request will be dispatched asynchronously by a kernel thread -
kiothrottled() - using generic_make_request() when the deadline
expires. There's a lot of room for optimization here, e.g. using several
threads per block device, a workqueue, slow-work, ...
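
The deadline queue can be modelled in user space as below. This is a sketch
only: it uses a sorted singly linked list instead of the rbtree used by the
patch, the model_* names are hypothetical, and the caller is assumed to pass
each popped request on to whatever plays the role of generic_make_request():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a deferred request; NOT the kernel's struct bio. */
struct model_req {
	int id;
	unsigned long deadline;		/* in jiffies */
	struct model_req *next;
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static int model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Insert keeping the list sorted by deadline, earliest first (FIFO on ties). */
static void model_queue_add(struct model_req **head, struct model_req *req)
{
	assert(head != NULL && req != NULL);
	while (*head && !model_time_after((*head)->deadline, req->deadline))
		head = &(*head)->next;
	req->next = *head;
	*head = req;
}

/*
 * Pop the earliest request if its deadline has expired at time "now",
 * or return NULL if nothing is due yet.
 */
static struct model_req *
model_queue_pop_expired(struct model_req **head, unsigned long now)
{
	struct model_req *req;

	assert(head != NULL);
	req = *head;
	if (!req || model_time_after(req->deadline, now))
		return NULL;
	*head = req->next;
	req->next = NULL;
	return req;
}
```

A dispatcher thread would loop on model_queue_pop_expired(), submit every
request it gets back, and then sleep until the deadline of the new head.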

In the old version (v12) I simply throttled writeback IO in
balance_dirty_pages_ratelimited_nr(), but this obviously leads to bursty
writeback. In v13 the writeback IO is much smoother.

> 
> Does it add new metadata to `struct page' for this?

struct page_cgroup

> 
> I assume that the write throttling is also wired up into the MAP_SHARED
> write-fault path?
> 

mmmh.. in case of writeback IO we account and throttle requests for
mm->owner. In case of synchronous IO (read/write) we always throttle the
current task in submit_bio().

> 
> 
> Does this patchset provide a path by which we can implement IO control
> for (say) NFS mounts?

Honestly, I haven't looked at this at all. :) I'll check, but in principle
adding the cgroup_io_throttle() hook in the appropriate NFS path should be
enough to provide IO control for NFS mounts as well.

-Andrea

