From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>,
Dor Laor <dlaor@redhat.com>, Christoph Hellwig <hch@lst.de>,
Paul Brook <paul@codesourcery.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
Date: Fri, 26 Feb 2010 08:36:23 -0600 [thread overview]
Message-ID: <4B87DC67.8030603@codemonkey.ws> (raw)
In-Reply-To: <4B878A97.2080405@redhat.com>
On 02/26/2010 02:47 AM, Avi Kivity wrote:
> qcow2 is still not fully asynchronous. All the other format drivers
> (except raw) are fully synchronous. If we had a threaded
> infrastructure, we could convert them all in a day. As it is, you can
> only use the other block format drivers in 'qemu-img convert'.
I've got a healthy amount of scepticism that it's that easy. But I'm
happy to consider patches :-)
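To make the conversion pattern concrete, here's a minimal sketch of the
usual trick, in plain POSIX rather than actual QEMU code (all of the
names below are made up for illustration): the synchronous read runs on
a worker thread, and the completion is handed back through a pipe that
the main loop already polls.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

typedef void CompletionFunc(void *opaque, int ret);

struct async_read {
    int fd;                 /* file to read from */
    void *buf;
    size_t len;
    off_t offset;
    CompletionFunc *cb;     /* runs later, on the main loop */
    void *opaque;
    int notify_fd;          /* write end of a pipe the main loop polls */
    int ret;
};

/* Worker thread: do the blocking read, then hand the finished request
 * back by writing its pointer into the notification pipe.  The main
 * loop reads the pointer and calls req->cb(req->opaque, req->ret). */
static void *read_worker(void *arg)
{
    struct async_read *req = arg;
    ssize_t n = pread(req->fd, req->buf, req->len, req->offset);

    req->ret = n < 0 ? -1 : (int)n;
    if (write(req->notify_fd, &req, sizeof(req)) != (ssize_t)sizeof(req)) {
        perror("completion notify");
    }
    return NULL;
}

/* Called from the main loop or a vcpu thread: fire and forget. */
int submit_async_read(struct async_read *req)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, read_worker, req);

    if (err) {
        return -err;
    }
    pthread_detach(tid);
    return 0;
}

The mechanism itself is simple enough; my skepticism is about auditing
every format driver for state that suddenly becomes shared between
threads.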
>
>>>
>>> Each such thread could run the same loop as the iothread. Any
>>> pollable fd or timer would be associated with a thread, so things
>>> continue as normal more or less. Unassociated objects continue with
>>> the main iothread.
>>
>> Is the point latency or increasing available CPU resources?
>
> Yes.
>
>> If the device models are re-entrant, that reduces a ton of the demand
>> on the qemu_mutex which means that IO thread can run uncontended.
>> While we have evidence that the VCPU threads and IO threads are
>> competing with each other today, I don't think we have any evidence
>> to suggest that the IO thread is self-starving itself with long
>> running events.
>
> I agree we have no evidence and that this is all speculation. But
> consider a 64-vcpu guest, it has a 1:64 ratio of vcpu time
> (initiations) to iothread time (completions). If each vcpu generates
> 5000 initiations per second, the iothread needs to handle 320,000
> completions per second. At that rate you will see some internal
> competition. That thread will also have a hard time shuffling data
> since every completion's data will reside in the wrong cpu cache.
Ultimately, it depends on what you're optimizing for. If you've got a
64-vcpu guest on a 128-way box, then sure, we want to have 64 IO threads
because that will absolutely increase throughput.
But realistically, it's more likely that if you've got a 64-vcpu guest,
you're on a 1024-way box and you've got 64 guests running at once.
Having 64 IO threads per VM means you've got 4k threads floating
around. It's still just as likely that one completion will get delayed
by something less important. Now with all of these threads on a box
like this, you get nasty NUMA interactions too.
The difference between the two models is that with threads, we rely on
pre-emption to enforce fairness and on the Linux scheduler to decide
execution order. With a single IO thread, we're determining execution
order and priority ourselves.
A lot of main loops have a notion of priority for timer and idle
callbacks. For something that is latency-sensitive, you absolutely
could introduce the concept of priority for bottom halves. It would
ensure that a +1 priority bottom half gets scheduled before any
lower-priority I/O or BHs are handled.
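A minimal sketch of what prioritized bottom halves could look like (the
existing qemu_bh_* API has no priority argument; the prio_bh_* names
below are made up for illustration):

typedef void BHFunc(void *opaque);

typedef struct PrioBH {
    BHFunc *cb;
    void *opaque;
    int priority;           /* higher runs first */
    int scheduled;
    struct PrioBH *next;
} PrioBH;

static PrioBH *bh_list;     /* kept sorted by descending priority */

/* Insert so higher-priority BHs sit ahead of lower-priority ones;
 * within one priority level, earlier scheduling wins (FIFO). */
void prio_bh_schedule(PrioBH *bh)
{
    PrioBH **p = &bh_list;

    if (bh->scheduled) {
        return;
    }
    bh->scheduled = 1;
    while (*p && (*p)->priority >= bh->priority) {
        p = &(*p)->next;
    }
    bh->next = *p;
    *p = bh;
}

/* One pass per main-loop iteration: a +1 priority bottom half always
 * runs before any priority-0 completion queued behind it. */
void prio_bh_poll(void)
{
    while (bh_list) {
        PrioBH *bh = bh_list;

        bh_list = bh->next;
        bh->scheduled = 0;
        bh->cb(bh->opaque);
    }
}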
> Note, an alternative to multiple iothreads is to move completion
> handling back to vcpus, provided we can steer the handler close to the
> guest completion handler.
Looking at something like linux-aio, I think we might actually want to
do that. We can submit the request from the VCPU thread and we can
certainly program the signal to get delivered to that VCPU thread.
Maintaining affinity for the request is likely a benefit.
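Very roughly, and assuming we go through linux-aio's eventfd
notification rather than an actual signal (the vcpu_aio names below are
made up for illustration; only the libaio and eventfd calls themselves
are real), the per-thread plumbing could look like:

#include <libaio.h>
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

struct vcpu_aio {
    io_context_t ctx;   /* this vcpu thread's private AIO context */
    int efd;            /* eventfd added to this thread's poll set */
};

int vcpu_aio_init(struct vcpu_aio *v)
{
    v->efd = eventfd(0, 0);
    if (v->efd < 0) {
        return -1;
    }
    v->ctx = 0;
    return io_setup(128, &v->ctx);
}

/* Submit from the vcpu thread itself; the completion is signalled on
 * this thread's eventfd, not on a shared iothread.  The caller keeps
 * buf alive until the completion arrives. */
int vcpu_aio_pread(struct vcpu_aio *v, int fd, void *buf,
                   size_t len, long long off, void *opaque)
{
    struct iocb cb, *cbs[1] = { &cb };

    io_prep_pread(&cb, fd, buf, len, off);
    io_set_eventfd(&cb, v->efd);
    cb.data = opaque;
    return io_submit(v->ctx, 1, cbs);
}

/* Run when this vcpu thread's eventfd becomes readable. */
int vcpu_aio_complete(struct vcpu_aio *v)
{
    uint64_t cnt;
    struct io_event ev[16];
    int got, i;

    if (read(v->efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt)) {
        return -1;
    }
    got = io_getevents(v->ctx, 1, 16, ev, NULL);
    for (i = 0; i < got; i++) {
        /* ev[i].data is the opaque from submit time; ev[i].res is the
         * byte count or negative errno.  Call the completion handler
         * right here, on the thread that initiated the I/O. */
    }
    return got;
}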
>>
>> For host services though, it's much more difficult to isolate them
>> like this.
>
> What do you mean by host services?
Things like VNC and live migration. Things that aren't directly related
to a guest's activity. One model I can imagine is to continue to
relegate these things to a single IO thread, but then move device-driven
callbacks either back to the originating thread or to a dedicated device
callback thread. Host services generally have a much lower priority.
>> I'm not necessarily claiming that this will never be the right thing
>> to do, but I don't think we really have the evidence today to suggest
>> that we should focus on this in the short term.
>
> Agreed. We will start to see evidence (one way or the other) as fully
> loaded 64-vcpu guests are benchmarked. Another driver may be
> real-time guests; if a timer can be deferred by some block device
> initiation or completion, then we can say goodbye to any realtime
> guarantees we want to make.
I'm wary of making decisions based on the performance of a 64-vcpu
guest. It's an important workload to characterize because it's an
extreme case, but I think 64 1-vcpu guests will continue to be
significantly more important than 1 64-vcpu guest.
Regards,
Anthony Liguori