From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1NkfYk-00021k-2o for qemu-devel@nongnu.org; Thu, 25 Feb 2010 10:23:26 -0500 Received: from [199.232.76.173] (port=37376 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NkfYj-00021B-KE for qemu-devel@nongnu.org; Thu, 25 Feb 2010 10:23:25 -0500 Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1NkfYh-0001LP-Iy for qemu-devel@nongnu.org; Thu, 25 Feb 2010 10:23:25 -0500 Received: from mail-gw0-f45.google.com ([74.125.83.45]:49631) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NkfYh-0001LF-8v for qemu-devel@nongnu.org; Thu, 25 Feb 2010 10:23:23 -0500 Received: by gwj17 with SMTP id 17so2181326gwj.4 for ; Thu, 25 Feb 2010 07:23:22 -0800 (PST) Message-ID: <4B8695E7.7000405@codemonkey.ws> Date: Thu, 25 Feb 2010 09:23:19 -0600 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device References: <1263195647.2005.44.camel@localhost> <201002240258.19045.paul@codesourcery.com> <4B853ED3.3060707@codemonkey.ws> <201002251506.05318.paul@codesourcery.com> In-Reply-To: <201002251506.05318.paul@codesourcery.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paul Brook Cc: Avi Kivity , Dor Laor , Christoph Hellwig , qemu-devel@nongnu.org, Vadim Rozenfeld On 02/25/2010 09:06 AM, Paul Brook wrote: >>> Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to >>> happen, and should never be used for anything. >>> >> Idle bottom halves make considerable more sense than the normal bottom >> halves. >> >> The fact that rescheduling a bottom half within a bottom half results in >> an infinite loop is absurd. It is equally absurd that bottoms halves >> alter the select timeout. The result of that is that if a bottom half >> schedules another bottom half, and that bottom half schedules the >> previous, you get a tight infinite loop. Since bottom halves are used >> often times deep within functions, the result is very subtle infinite >> loops (that we've absolutely encountered in the past). >> > I disagree. The "select timeout" is a completely irrelevant implementation > detail. Anything that relies on it is just plain wrong. No, it's an extremely important detail because its the difference between an infinite loop in an idle function. Very simply, without idle bottom halves, there's no way to implement polling with the main loop. If we dropped idle bottom halves, we would have to add explicit polling back to the main loop. How would you implement polling? > If you require a delay > then you should be using a timer. If scheduling a BH directly then you should > expect it to be processed without delay. > If you need something that doesn't require a delay, schedule a timer with no delay. What's the point of BHs? It's because there's very subtle differences between BHs and timers and because we don't adjust the select timeout for the next timer deadline, there's a race between when we check for timer expiration and when we invoke select. Actually, that's probably fixed now because we've changed the SIGALRM handler to write to a file descriptor to eliminate the signal/select race. I'll give it a try, it might actually work now. > If a BH reschedules itself (indirectly or indirectly) without useful work > occuring then you absolutely should expect an infinite loop. Rescheduling > itself after doing useful work should never cause an infinite loop. The only > way it can loop inifinitely is if we have infinite amount of work to do, in > which case you loose either way. Looping over work via recursive BHs is > probably not the most efficient way to do things, but I guess it may sometimes > be the simplest in practice. > I think the point is getting lost. My contention is that a main loop needs three things 1) an idle callback 2) timers 3) io notification. Bottom halves act both today as a no-delay timer and an idle callback. I agree, that's unfortunate. What we should do is remove idle bottom halves, add proper idle callbacks, and make bottom halves implemented in terms of no-delay timers. > Interaction between multiple BH is slightly trickier. By my reading BH are > processed in the order they are created. It may be reasonable to guarantee > that BH are processed in the order they are scheduled. However I'm reluctant > to even go that far without a good use-case. You could probably come up with > arguments for processing them in most-recently-scheduled order. > It's no different than two timers that happen to fire at the same deadline. You can process in terms of when the timers were created initially or when they were last scheduled. Ultimately, it shouldn't matter to any code that uses them. >> A main loop should have only a few characteristics. It should enable >> timeouts (based on select), it should enable fd event dispatch, and it >> should allow for idle functions to be registered. There should be no >> guarantees on when idle functions are executed other than they'll >> eventually be executed. >> > If you don't provide any guarantees, then surely processing them immediately > must be an acceptable implementation. I don't believe there is a useful > definition of "idle". > idle is necessary to implement polling. You can argue that polling should never be necessary but practically speaking, it almost always is. > All existing uses of qemu_bh_schedule_idle are in fact poorly implemented > periodic polling. Furthermore these should not be using periodic polling, they > can and should be event driven. They only exist because noone has bothered to > fix the old code properly. Having arbitrary 10ms latency on DMA transfers is > just plain wrong. > I agree with you here :-) But there are more legitimate uses of polling. For instance, with virtio-net, we use a timer to implement tx mitigation. It's a huge boost for throughput but it tends to add latency on packet transmission because our notification comes at a longer time. We really just need the notification as a means to re-enable interrupts, not necessarily to slow down packet transmission. So using an idle callback to transmit packets would certainly decrease latency in many cases while still getting the benefits of throughput improvement. >> The way we use "bottom halves" today should be implemented in terms of a >> relative timeout of 0 or an absolute timeout of now. The fact that we >> can't do that in our main loop is due to the very strange dependencies >> deep within various devices on io dispatch ordering. I would love to >> eliminate this but I've not been able to spend any time on this recently. >> > I don't see how this helps. A self-triggering event with a timeout of "now" is > still an infinite loop. Any delay is a bugs in the dispatch loop. "idle" BHs > are relying on this bug. > The main point is that BHs should not be implemented in the actual main loop and that "idle" BHs are really the only type of BHs that should exist as far as the main loop is concerned. s/"idle" BHs/idle callbacks/g and I think we're at a more agreeable place. Regards, Anthony Liguori > Paul >