linux-kernel.vger.kernel.org archive mirror
From: Vivek Goyal <vgoyal@redhat.com>
To: Chris Wright <chrisw@redhat.com>
Cc: Tejun Heo <tj@kernel.org>,
	Kent Overstreet <koverstreet@google.com>,
	axboe@kernel.dk, ctalbott@google.com, rni@google.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 7/9] block: implement bio_associate_current()
Date: Tue, 28 Feb 2012 09:10:36 -0500
Message-ID: <20120228141036.GE9920@redhat.com>
In-Reply-To: <20120227231222.GF14856@x200.localdomain>

On Mon, Feb 27, 2012 at 03:12:22PM -0800, Chris Wright wrote:

[..]
> > > > > blkcg doesn't allow that anyway (it tries but is racy) and I actually
> > > > > was thinking about sending an RFC patch to kill CLONE_IO.
> > > > 
> > > > I thought CLONE_IO is useful and it allows threads to share IO context.
> > > > qemu wanted to use it for its IO threads so that one virtual machine
> > > > does not get a higher share of disk by just creating more threads. In fact,
> > > > if multiple threads are doing related IO, we would like them to use
> > > > the same io context.
> > > 
> > > I don't think that's true.  Think of any multithreaded server program
> > > where each thread is working pretty much independently from others.
> > 
> > If threads are working pretty much independently, then one does not have
> > to specify CLONE_IO.
> > 
> > In the case of qemu IO threads, I have debugged issues where a big IO range
> > was being split among its IO threads. Just do sequential IO inside the
> > guest, and I was seeing that a few sectors of IO came from one process,
> > the next few sectors came from another process, and so on. A sequential
> > range of IO is somehow split among a bunch of threads, and that does not
> > work well with CFQ if every IO comes from its own IO context and the IO
> > context is not shared. After a bunch of IO from one io context, CFQ
> > continues to idle on that io context, thinking more IO will come soon.
> > The next IO does come, but from a different thread and a different context.
> > 
> > CFQ now employs some techniques to detect that case, to preempt,
> > and to reduce idling in such cases. But sometimes these techniques
> > work well and other times they don't. So to me, CLONE_IO can help
> > in this case, where the application can explicitly share its IO
> > context and CFQ does not have to do all the tricks.
> > 
> > Whether applications actually make use of CLONE_IO is a different
> > matter.
> > 
> > > Virtualization *can* be a valid use case but are they actually using
> > > it?  Aren't they better served by cgroup?
> > 
> > cgroup can be very heavyweight when hundreds of virtual machines
> > are running. Why? Because of idling. CFQ still has lots of tricks
> > to do preemption and cut down on idling across io contexts, but
> > across cgroup boundaries, isolation is much stronger and very
> > little preemption (if any) is allowed. I suspect that in the current
> > implementation, creating lots of blkio cgroups will be bad for the
> > overall throughput of virtual machines (purely because of idling).
> > 
> > So I am not too excited about the blkio cgroup solution, because it might
> > not scale well (at least until we find a better algorithm to cut down
> > on idling).
> > 
> > I am ccing Chris Wright <chrisw@redhat.com>. He might have thoughts
> > on usage of CLONE_IO and qemu.
> 
> Vivek, you summed it up pretty well.  Also, for qemu, raw CLONE_IO is not
> an option because threads are created via pthread (we had done some local
> hacks to verify that CLONE_IO helped w/ the idling problem, and it did).

Chris,

Just to make sure I understand this correctly, let me think out loud.

That means CLONE_IO is useful and ideally qemu would like to make use of it,
but because the pthread interface does not support it, it is not used
today.
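For readers unfamiliar with what "raw CLONE_IO" means from userspace, here is a minimal sketch (not qemu code, and not from this thread). The caller passes the flag directly to glibc's clone() wrapper, which pthread_create() offers no way to do. The worker function io_worker() and the helper spawn_io_sharing_worker() are hypothetical names. To keep it short, the child is a separate process that shares only the I/O context with its parent; a thread-like child would additionally need CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD and related flags, which is the setup pthread normally performs for you.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_IO
#define CLONE_IO 0x80000000 /* share the I/O context with the parent */
#endif

#define IO_WORKER_STACK_SIZE (64 * 1024)

/* Hypothetical worker body: a real IO worker would issue reads/writes here. */
static int io_worker(void *arg)
{
    int code = *(int *)arg;
    return code; /* becomes the child's exit status */
}

/*
 * Hypothetical helper: spawn a child with CLONE_IO so it shares the
 * caller's I/O context, wait for it, and return its exit status
 * (or -1 on any error).
 */
int spawn_io_sharing_worker(int code)
{
    char *stack = malloc(IO_WORKER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(io_worker, stack + IO_WORKER_STACK_SIZE,
                      CLONE_IO | SIGCHLD, &code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling spawn_io_sharing_worker(42) returns 42 once the child exits. The sketch only shows how the flag is passed; it does not itself verify that the I/O context is shared.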

Thanks
Vivek

Thread overview: 50+ messages
2012-02-16 22:37 [PATCHSET] blkcg: update locking and fix stacking Tejun Heo
2012-02-16 22:37 ` [PATCH 1/9] blkcg: use double locking instead of RCU for blkg synchronization Tejun Heo
2012-02-16 22:37 ` [PATCH 2/9] blkcg: drop unnecessary RCU locking Tejun Heo
2012-02-17 16:19   ` Vivek Goyal
2012-02-17 17:07     ` Tejun Heo
2012-02-17 17:14       ` Tejun Heo
2012-02-17 16:47   ` Vivek Goyal
2012-02-17 17:11     ` Tejun Heo
2012-02-17 17:28       ` Vivek Goyal
2012-02-17 17:43         ` Tejun Heo
2012-02-17 18:08           ` Vivek Goyal
2012-02-17 18:16             ` Tejun Heo
2012-02-22  0:49   ` [PATCH UPDATED " Tejun Heo
2012-02-16 22:37 ` [PATCH 3/9] block: restructure get_request() Tejun Heo
2012-02-16 22:37 ` [PATCH 4/9] block: interface update for ioc/icq creation functions Tejun Heo
2012-02-16 22:37 ` [PATCH 5/9] block: ioc_task_link() can't fail Tejun Heo
2012-02-17 20:41   ` Vivek Goyal
2012-02-17 22:18     ` Tejun Heo
2012-02-16 22:37 ` [PATCH 6/9] block: add io_context->active_ref Tejun Heo
2012-02-16 22:37 ` [PATCH 7/9] block: implement bio_associate_current() Tejun Heo
2012-02-17  1:19   ` Kent Overstreet
2012-02-17 22:14     ` Tejun Heo
2012-02-17 22:34       ` Vivek Goyal
2012-02-17 22:41         ` Tejun Heo
2012-02-17 22:51           ` Vivek Goyal
2012-02-17 22:57             ` Tejun Heo
2012-02-20 14:22               ` Vivek Goyal
2012-02-20 16:59                 ` Tejun Heo
2012-02-20 19:14                   ` Vivek Goyal
2012-02-20 21:21                     ` Tejun Heo
2012-02-27 23:12                     ` Chris Wright
2012-02-28 14:10                       ` Vivek Goyal [this message]
2012-02-28 17:01                         ` Chris Wright
2012-02-28 20:11                           ` Stefan Hajnoczi
2012-02-20 14:36               ` Vivek Goyal
2012-02-20 17:01                 ` Tejun Heo
2012-02-20 19:16                   ` Vivek Goyal
2012-02-20 21:06                     ` Tejun Heo
2012-02-20 21:10                       ` Vivek Goyal
2012-02-17 22:56           ` Vivek Goyal
2012-02-17 23:06             ` Tejun Heo
2012-02-17 21:33   ` Vivek Goyal
2012-02-17 22:03     ` Tejun Heo
2012-02-17 22:29       ` Vivek Goyal
2012-02-17 22:38         ` Tejun Heo
2012-02-17 22:42           ` Tejun Heo
2012-02-16 22:37 ` [PATCH 8/9] block: make block cgroup policies follow bio task association Tejun Heo
2012-02-16 22:37 ` [PATCH 9/9] block: make blk-throttle preserve the issuing task on delayed bios Tejun Heo
2012-02-17 21:58   ` Vivek Goyal
2012-02-17 22:17     ` Tejun Heo
