From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 6 Nov 2008 17:04:45 +0200
From: Gleb Natapov
Subject: Re: [Qemu-devel] [RESEND][PATCH 0/3] Fix guest time drift under heavy load.
Message-ID: <20081106150445.GB29861@redhat.com>
References: <490B59BF.3000205@codemonkey.ws>
 <20081102130441.GD16809@redhat.com>
 <49119551.2070704@redhat.com>
 <20081106071624.GC3820@redhat.com>
 <20081106100821.GF3820@redhat.com>
 <20081106141821.GG3820@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: andrzej zaborowski
Cc: dlaor@redhat.com, qemu-devel@nongnu.org

On Thu, Nov 06, 2008 at 03:35:16PM +0100, andrzej zaborowski wrote:
> 2008/11/6 Gleb Natapov :
> > On Thu, Nov 06, 2008 at 02:21:16PM +0100, andrzej zaborowski wrote:
> >> >> > It is nothing like a -win2k-hack since there is no guest "bad
> >> >> > behaviour" that causes the problem. Yes, real hardware doesn't do
> >> >> > this, but real hardware also provides the OS with enough CPU power
> >> >> > to handle every single timer interrupt.
> >> >>
> >> >> A guest that counts on having enough CPU for something is
> >> >> timing-dependent (buggy).
> >> >>
> >> > Tell this to RT developers who count every CPU cycle.
> >>
> >> They don't usually use qemu (they certainly shouldn't).
> >>
> > It seems to me that you were saying that counting on CPU availability
> > is buggy in general, and since Windows does this it is buggy. What I
> > am saying is that for certain things it is a valid thing to do. RT
> > people shouldn't use qemu in production of course, but should people
> > use qemu to run Windows in production?

> People shouldn't use buggy software in general.. if they do, they have
> to accept its bugs ;)

They don't want to (accept bugs, I mean) :( Otherwise we agree on this
point.

> >
> > You ignored this question, but it was not rhetorical at all. Can you
> > answer it please?
> >
> >> >> Linux doesn't see this because the clocksource and the
> >> >> clockevents device come from separate clocks there. It is a Windows
> >> >> problem. It *is* "bad behaviour".
> >> > OK, we will call the flag -win-time-drift-hack.
> >>
> >> I'm not saying it needs a flag, but it's a hack so it should stay local.
> >>
> > The fix is already local. It is local to the PIT and RTC. And to make
> > it local I change the infrastructure in the first patch.

> Unnecessarily and invasively. How's that different from an invasive ugly hack..

If the interface had been defined that way from the beginning, you
wouldn't even have noticed that it can be used to implement localized
hacks ;)

> >
> >> >
> >> >>
> >> >> >
> >> >> >> Instead you can have the interrupt sources register a callback in the
> >> >> >> PIC that the PIC calls when the interrupt wasn't delivered. Or.. in
> >> >> > It requires a mapping from the interrupt vector inside the PIC to
> >> >> > the interrupt source.
> >> >>
> >> >> Of course.
> >> >>
> >> >> > This approach was rejected a long time ago.
> >> >>
> >> >> Then you'll have to find a different one.
> >> >>
> >> >
> >> > I found one. Here it is, implemented by this patch series.
> >> >
> >> >> qemu_irq is the wrong place.
> >> > Why? Don't give me "that is not how real HW works". Real HW, if properly
> >>
> >> I explained in a previous mail, but let me reiterate: qemu_irq
> >> abstracts a pin: a device sets a voltage level on one end and the
> >> other end reads it. That's it - communication goes one way only. If
> >> you want to send a notification in the other direction, use a second
> >> qemu_irq or a callback. It's quite a simple change in your PIC.
> >>
> > And why is this abstraction set in stone? Why can't we extend it to be:

> Because we don't need to and because it's so backwards..

That is debatable...

> >> >> This doesn't matter, the tick that arrived while a previous interrupt
> >> >> was not acked yet, is lost anyway,
> >> > No, it is not. Not necessarily. It can be queued inside the PIC and
> >> > delivered by the PIC itself immediately after interrupt
> >> > acknowledgement.
> >>
> >> You can argue that it's the new irq that's lost or the first one that
> >> was lost; either way the PIC only sees the irq rising once, instead
> >> of twice. That means they were coalesced.
> > Nothing is lost and the PIC sees two irq risings. Example:
> > - The RTC triggers the first interrupt.
> > - It is delivered to the PIC. The PIC sets the corresponding bit in
> >   the IRR.
> > - The CPU picks up the RTC interrupt and its bit is cleared from the
> >   IRR bitmap.
> > - The CPU jumps to the RTC IRQ routine, but before it gets a chance to
> >   acknowledge the IRQ to the PIC a new timer tick is triggered.
> > - With your patch you increment irq_coalesced in that case.
> > - The interrupt is delivered to the PIC.

> No, it isn't (unless the PIC is poorly implemented). We raise the
> irq, but since it's already high, nothing happens, there's no rising
> edge.

That would be the case if the RTC used level-triggered interrupts, but
the RTC and PIT are edge-triggered. That is how they behave, like it or
not.

> > The PIC sets the corresponding bit in the IRR
> > but doesn't deliver the interrupt to the CPU since one is already
> > in progress.
> > - The CPU acks the first IRQ.
> Only here it's lowered.

Nope.

> > - The PIC delivers the second IRQ from the IRR.
> >
> > The result is that two interrupts are delivered to the CPU and
> > irq_coalesced == 1, so one more will be needlessly signaled.
> >
> >> >> i.e. had been coalesced. So
> >> >> this'll give you the right number of interrupts to re-inject.
> >> > No, it will not. You'll inject more interrupts than needed and the
> >> > clock will drift forwards.
> >>
> >> The PIC won't see more interrupts (rising edges) than the number of
> >> times the timer has ticked up to a given moment. Thus the guest OS
> >> won't see them either.
> >>
> > See above.
> >
> >> >
> >> >>
> >> >> Of course this, as well as your approach, are both wrong because the
> >> >> guest may be intentionally ignoring the irq and expecting the
> >> >> interrupts to coalesce. Once it starts processing the RTC interrupts
> >> >> it will get an unexpected storm.
> >> >>
> >> > This is one more thing which is broken in your suggestion, not mine.
> >> > If a guest wants to ignore an interrupt it will mask it, and my patch
> >> > doesn't report interrupts delivered while masked as having been
> >> > coalesced.
> >>
> >> This is moot; it may choose to mask it or not, and it can also mask it
> >> lower down the path.
> >>
> > Are we talking about real OSes that we want to run as guests, or are
> > we giving purely theoretical examples?

> We're talking about emulating the hw. We have to make compromises
> sometimes, but we try to avoid them.

Fair enough. Can we compromise in this case :)

--
	Gleb.
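P.S. To make the disputed delivery sequence concrete, here is a minimal
toy model of one edge-triggered PIC input line (hypothetical names for
illustration only, not the QEMU i8259 code). It shows that a rising edge
arriving while the first interrupt is still in service is latched in the
IRR and delivered after the ack, so both ticks reach the CPU; an edge is
coalesced only if the IRR bit is still set, i.e. the CPU hasn't taken
the first interrupt at all:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a single edge-triggered PIC input.  Hypothetical types
 * and names; nothing here is the real QEMU or 8259 implementation. */
typedef struct {
    uint8_t irr;      /* interrupt request register: latched edges    */
    uint8_t isr;      /* in-service register: irq taken but not acked */
    bool level;       /* current level on the input pin               */
    int delivered;    /* interrupts actually taken by the CPU         */
    int coalesced;    /* edges that found the IRR bit already set     */
} ToyPic;

/* Device drives the pin high: only a rising edge matters. */
static void pic_raise(ToyPic *p)
{
    if (!p->level) {
        if (p->irr & 1) {
            p->coalesced++;   /* edge lost: request already pending */
        }
        p->irr |= 1;          /* latched even while another irq is in service */
    }
    p->level = true;
}

/* Device drives the pin low again, arming the next rising edge. */
static void pic_lower(ToyPic *p)
{
    p->level = false;
}

/* CPU takes the pending request: the IRR bit moves to the ISR. */
static bool pic_cpu_take(ToyPic *p)
{
    if ((p->irr & 1) && !(p->isr & 1)) {
        p->irr &= ~1;
        p->isr |= 1;
        p->delivered++;
        return true;
    }
    return false;
}

/* End-of-interrupt from the service routine unblocks the next delivery. */
static void pic_eoi(ToyPic *p)
{
    p->isr &= ~1;
}
```

Pulsing the line once, taking the interrupt, pulsing again before EOI,
and then acking gives delivered == 2 and coalesced == 0: the second
tick is queued in the IRR, exactly the behaviour described above.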