public inbox for linux-kernel@vger.kernel.org
From: "Daniel P. Berrange" <berrange@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nauman Rafique <nauman@google.com>,
	Munehiro Ikeda <m-ikeda@ds.jp.nec.com>,
	linux-kernel@vger.kernel.org, Ryo Tsuruta <ryov@valinux.co.jp>,
	taka@valinux.co.jp, Andrea Righi <righi.andrea@gmail.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	akpm@linux-foundation.org, balbir@linux.vnet.ibm.com
Subject: Re: [RFC][PATCH 00/11] blkiocg async support
Date: Tue, 27 Jul 2010 11:40:37 +0100
Message-ID: <20100727104037.GA22347@redhat.com>
In-Reply-To: <20100716151234.GG15382@redhat.com>

On Fri, Jul 16, 2010 at 11:12:34AM -0400, Vivek Goyal wrote:
> On Fri, Jul 16, 2010 at 03:53:09PM +0100, Daniel P. Berrange wrote:
> > On Fri, Jul 16, 2010 at 10:35:36AM -0400, Vivek Goyal wrote:
> > > On Fri, Jul 16, 2010 at 03:15:49PM +0100, Daniel P. Berrange wrote:
> > > Secondly, just because some controller allows creation of hierarchy does
> > > not mean that hierarchy is being enforced. For example, memory controller.
> > > IIUC, one needs to explicitly set "use_hierarchy" to enforce hierarchy
> > > otherwise effectively it is flat. So if libvirt is creating groups and
> > > putting machines in child groups thinking that we are not interfering
> > > with admin's policy, that is not entirely correct.
> > 
> > That is true, but that 'use_hierarchy' at least provides admins
> > the mechanism required to implement the necessary policy.
> > 
> > > So how do we make progress here? I really want to see blkio controller
> > > integrated with libvirt.
> > > 
> > > About the issue of hierarchy, I can probably travel down the path of allowing
> > > creation of hierarchy but CFQ will treat it as flat. Though I don't like it
> > > because it will force me to introduce variables like "use_hierarchy" once
> > > real hierarchical support comes in but I guess I can live with that.
> > > (Anyway memory controller is already doing it.).
> > > 
> > > There is another issue though and that is by default every virtual
> > > machine going into a group of its own. As of today, it can have
> > > severe performance penalties (depending on workload) if group is not
> > > driving enough IO. (Especially with group_isolation=1).
> > > 
> > > I was thinking of a model where an admin moves out the bad virtual
> > > machines in separate group and limit their IO.
> > 
> > In the simple / normal case I imagine all guests VMs will be running
> > unrestricted I/O initially. Thus instead of creating the cgroup at time
> > of VM startup, we could create the cgroup only when the admin actually
> > sets an I/O limit.
> 
> That makes sense. Run all the virtual machines by default in root group
> and move out a virtual machine to a separate group of either low weight
> (if virtual machine is a bad one and driving a lot of IO) or of higher weight
> (if we want to give more IO bw to this machine).
> 
> > IIUC, this should maintain the one cgroup per guest
> > model, while avoiding the performance penalty in normal use. The caveat
> > of course is that this would require blkio controller to have a dedicated
> > mount point, not shared with other controller.
> 
> Yes. Because for other controllers we seem to be putting virtual machines
> in separate cgroups by default at startup time. So it seems we will
> require a separate mount point here for blkio controller.
> 
> >  I think we might also
> > want this kind of model for net I/O, since we probably don't want to be
> > creating TC classes + net_cls groups for every VM the moment it starts
> > unless the admin has actually set a net I/O limit.
> 
> Looks like. So good, then network controller and blkio controller can
> share this new mount point.

After thinking about this some more, there are a couple of problems with
this plan. For QEMU, 'vhostnet' (the in-kernel virtio network backend)
requires that QEMU be in the cgroup at time of startup, otherwise the
vhost kernel thread won't end up in the right cgroup. For libvirt's LXC
container driver, moving the container in & out of the cgroups at runtime
is pretty difficult because there are an arbitrary number of processes
running in the container. It would require moving all the container
processes between two cgroups in a race-free manner. So on second thoughts
I'm more inclined to stick with our current approach of putting all guests
into the appropriate cgroups at guest/container startup, even for blkio
and net_cls.
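
To illustrate why the LXC case is awkward, here is a minimal Python
sketch, assuming the cgroup v1 'tasks' file interface (writing a thread
ID to a group's tasks file attaches that thread). The helper names, and
any directory paths passed to them, are hypothetical; this is not how
libvirt does it. A task that forks between the read and the attach
leaves its child in the old group, so the loop has to re-scan until a
pass finds the old group empty, and it can still give up if the
container forks faster than tasks are moved.

```python
import errno
import os
from typing import Callable, List


def read_tasks(cgroup_dir: str) -> List[int]:
    """Return the thread IDs currently listed in <cgroup_dir>/tasks."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


def attach_task(cgroup_dir: str, tid: int) -> None:
    """Move one task into the cgroup by writing its ID to the tasks file."""
    try:
        with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
            f.write(str(tid))
    except OSError as e:
        # ESRCH: the task exited between listing and attaching; ignore it.
        if e.errno != errno.ESRCH:
            raise


def migrate(list_src: Callable[[], List[int]],
            attach_dst: Callable[[int], None],
            max_passes: int = 20) -> bool:
    """Move tasks repeatedly until a pass observes the source empty.

    Returns True if the source was observed empty, False if tasks kept
    appearing (for example through forks) for max_passes passes.
    """
    for _ in range(max_passes):
        tids = list_src()
        if not tids:
            return True
        for tid in tids:
            attach_dst(tid)
    return False
```

It could be called as migrate(lambda: read_tasks(old_dir),
lambda t: attach_task(new_dir, t)), where old_dir and new_dir are
placeholder cgroup directories. The retry bound is the point: a plain
userspace loop like this has no way to freeze the set of processes
while it works, which is the race described above.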

Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
