qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: Paul Brook <paul@codesourcery.com>
Cc: Avi Kivity <avi@redhat.com>, Dor Laor <dlaor@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	qemu-devel@nongnu.org, Vadim Rozenfeld <vrozenfe@redhat.com>
Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device
Date: Thu, 25 Feb 2010 09:23:19 -0600	[thread overview]
Message-ID: <4B8695E7.7000405@codemonkey.ws> (raw)
In-Reply-To: <201002251506.05318.paul@codesourcery.com>

On 02/25/2010 09:06 AM, Paul Brook wrote:
>>> Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to
>>> happen, and should never be used for anything.
>>>        
>> Idle bottom halves make considerable more sense than the normal bottom
>> halves.
>>
>> The fact that rescheduling a bottom half within a bottom half results in
>> an infinite loop is absurd.  It is equally absurd that bottoms halves
>> alter the select timeout.  The result of that is that if a bottom half
>> schedules another bottom half, and that bottom half schedules the
>> previous, you get a tight infinite loop.  Since bottom halves are used
>> often times deep within functions, the result is very subtle infinite
>> loops (that we've absolutely encountered in the past).
>>      
> I disagree. The "select timeout" is a completely irrelevant implementation
> detail. Anything that relies on it is just plain wrong.

No, it's an extremely important detail because its the difference 
between an infinite loop in an idle function.

Very simply, without idle bottom halves, there's no way to implement 
polling with the main loop.  If we dropped idle bottom halves, we would 
have to add explicit polling back to the main loop.

How would you implement polling?

>   If you require a delay
> then you should be using a timer. If scheduling a BH directly then you should
> expect it to be processed without delay.
>    

If you need something that doesn't require a delay, schedule a timer 
with no delay.  What's the point of BHs?

It's because there's very subtle differences between BHs and timers and 
because we don't adjust the select timeout for the next timer deadline, 
there's a race between when we check for timer expiration and when we 
invoke select.

Actually, that's probably fixed now because we've changed the SIGALRM 
handler to write to a file descriptor to eliminate the signal/select 
race.  I'll give it a try, it might actually work now.

> If a BH reschedules itself (indirectly or indirectly) without useful work
> occuring then you absolutely should expect an infinite loop. Rescheduling
> itself after doing useful work should never cause an infinite loop. The only
> way it can loop inifinitely is if we have infinite amount of work to do, in
> which case you loose either way.  Looping over work via recursive BHs is
> probably not the most efficient way to do things, but I guess it may sometimes
> be the simplest in practice.
>    

I think the point is getting lost.  My contention is that a main loop 
needs three things 1) an idle callback 2) timers 3) io notification.  
Bottom halves act both today as a no-delay timer and an idle callback.  
I agree, that's unfortunate.  What we should do is remove idle bottom 
halves, add proper idle callbacks, and make bottom halves implemented in 
terms of no-delay timers.

> Interaction between multiple BH is slightly trickier. By my reading BH are
> processed in the order they are created. It may be reasonable to guarantee
> that BH are processed in the order they are scheduled. However I'm reluctant
> to even go that far without a good use-case. You could probably come up with
> arguments for processing them in most-recently-scheduled order.
>    

It's no different than two timers that happen to fire at the same 
deadline.  You can process in terms of when the timers were created 
initially or when they were last scheduled.  Ultimately, it shouldn't 
matter to any code that uses them.

>> A main loop should have only a few characteristics.  It should enable
>> timeouts (based on select), it should enable fd event dispatch, and it
>> should allow for idle functions to be registered.  There should be no
>> guarantees on when idle functions are executed other than they'll
>> eventually be executed.
>>      
> If you don't provide any guarantees, then surely processing them immediately
> must be an acceptable implementation.  I don't believe there is a useful
> definition of "idle".
>    

idle is necessary to implement polling.  You can argue that polling 
should never be necessary but practically speaking, it almost always is.

> All existing uses of qemu_bh_schedule_idle are in fact poorly implemented
> periodic polling. Furthermore these should not be using periodic polling, they
> can and should be event driven.  They only exist because noone has bothered to
> fix the old code properly. Having arbitrary 10ms latency on DMA transfers is
> just plain wrong.
>    

I agree with you here :-)

But there are more legitimate uses of polling.  For instance, with 
virtio-net, we use a timer to implement tx mitigation.  It's a huge 
boost for throughput but it tends to add latency on packet transmission 
because our notification comes at a longer time.  We really just need 
the notification as a means to re-enable interrupts, not necessarily to 
slow down packet transmission.

So using an idle callback to transmit packets would certainly decrease 
latency in many cases while still getting the benefits of throughput 
improvement.

>> The way we use "bottom halves" today should be implemented in terms of a
>> relative timeout of 0 or an absolute timeout of now.  The fact that we
>> can't do that in our main loop is due to the very strange dependencies
>> deep within various devices on io dispatch ordering.  I would love to
>> eliminate this but I've not been able to spend any time on this recently.
>>      
> I don't see how this helps. A self-triggering event with a timeout of "now" is
> still an infinite loop. Any delay is a bugs in the dispatch loop. "idle" BHs
> are relying on this bug.
>    

The main point is that BHs should not be implemented in the actual main 
loop and that "idle" BHs are really the only type of BHs that should 
exist as far as the main loop is concerned.  s/"idle" BHs/idle 
callbacks/g and I think we're at a more agreeable place.

Regards,

Anthony Liguori

> Paul
>    

  reply	other threads:[~2010-02-25 15:23 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-11  7:40 [Qemu-devel] [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device Vadim Rozenfeld
2010-01-11  8:30 ` [Qemu-devel] " Avi Kivity
     [not found]   ` <4B4AE95D.7080305@redhat.com>
2010-01-11  9:19     ` Dor Laor
2010-01-11 13:11       ` Christoph Hellwig
2010-01-11 13:13         ` Avi Kivity
2010-01-11 13:16           ` Christoph Hellwig
2010-01-11 13:47           ` Christoph Hellwig
2010-01-11 14:00             ` Anthony Liguori
2010-02-24  2:58               ` Paul Brook
2010-02-24 14:59                 ` Anthony Liguori
2010-02-25 15:06                   ` Paul Brook
2010-02-25 15:23                     ` Anthony Liguori [this message]
2010-02-25 16:48                       ` Paul Brook
2010-02-25 17:11                     ` Avi Kivity
2010-02-25 17:15                       ` Anthony Liguori
2010-02-25 17:33                         ` Avi Kivity
2010-02-25 18:05                           ` malc
2010-02-25 19:55                           ` Anthony Liguori
2010-02-26  8:47                             ` Avi Kivity
2010-02-26 14:36                               ` Anthony Liguori
2010-02-26 15:39                                 ` Avi Kivity
2010-01-11 13:42   ` Christoph Hellwig
2010-01-11 13:49     ` Anthony Liguori
2010-01-11 14:29       ` Avi Kivity
2010-01-11 14:37         ` Anthony Liguori
2010-01-11 14:46           ` Avi Kivity
2010-01-11 15:13             ` Anthony Liguori
2010-01-11 15:19               ` Avi Kivity
2010-01-11 15:22                 ` Anthony Liguori
2010-01-11 15:31                   ` Avi Kivity
2010-01-11 15:32                     ` Anthony Liguori
2010-01-11 15:35                       ` Avi Kivity
2010-01-11 15:38                         ` Anthony Liguori
2010-01-11 18:22               ` [Qemu-devel] " Michael S. Tsirkin
2010-01-11 18:20           ` Michael S. Tsirkin
2010-01-11 14:25     ` [Qemu-devel] " Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B8695E7.7000405@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=avi@redhat.com \
    --cc=dlaor@redhat.com \
    --cc=hch@lst.de \
    --cc=paul@codesourcery.com \
    --cc=qemu-devel@nongnu.org \
    --cc=vrozenfe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).