From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=50096 helo=eggs.gnu.org)
  by lists.gnu.org with esmtp (Exim 4.43) id 1OIOHA-0003uJ-An
  for qemu-devel@nongnu.org; Sat, 29 May 2010 11:48:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
  (envelope-from ) id 1OIITS-00008x-7T
  for qemu-devel@nongnu.org; Sat, 29 May 2010 05:36:59 -0400
Received: from fmmailgate01.web.de ([217.72.192.221]:33529)
  by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from )
  id 1OIITR-0008GN-OH
  for qemu-devel@nongnu.org; Sat, 29 May 2010 05:36:58 -0400
Message-ID: <4C00E02D.9050804@web.de>
Date: Sat, 29 May 2010 11:36:45 +0200
From: Jan Kiszka
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
References: <4BFC3028.6030303@codemonkey.ws> <4BFC44D2.2090608@web.de>
  <4BFD8010.3010001@web.de> <20100527061335.GE5474@redhat.com>
  <20100528073135.GC17805@redhat.com> <20100528204746.GC3604@redhat.com>
In-Reply-To:
Content-Type: multipart/signed; micalg=pgp-sha1;
  protocol="application/pgp-signature";
  boundary="------------enigFB40489B25687855B8644AE8"
Sender: jan.kiszka@web.de
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Blue Swirl
Cc: qemu-devel@nongnu.org, Gleb Natapov , Juan Quintela

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigFB40489B25687855B8644AE8
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Blue Swirl wrote:
>>> On the contrary, APIC is actually the only source of the IRQ ack
>>> information. RTC hack would not work without APIC (or the
>>> bidirectional IRQ) passing this info to RTC.
>>>
>>> What APIC doesn't have now is the timer frequency or period info. This
>>> is known by RTC and also higher levels managing the clocks.
>>>
>> So APIC has one bit of information and RTC everything else.
>
> The information known by RTC (timer period) is also known by higher levels.

Curious to see where you'll find this.

>
>> The current
>> approach (and proposed patch) brings this one bit of information to RTC,
>> you are arguing that RTC should be able to communicate all its info to
>> APIC. Sorry I don't see that your way has any advantage. Just more
>> complex interface and it is much easier to get it wrong for other time
>> sources.
>
> I don't think anymore that APIC should be handling this but the
> generic stuff, like vl.c or exec.c. Then there would be only
> information passing from APIC to higher levels.

You neglect the information required to associate a periodic source
(e.g. RTC) with an IRQ sink (e.g. APIC). Without that, you will have a
hard time figuring out if a reported IRQ coalescing requires any
activities or should simply be welcomed (for I/O IRQs).

>
>>> I keep ignoring the idea that the current model, where both RTC and
>>> APIC must somehow work together to make coalescing work, is the only
>>> possible just because it is committed and it happens to work in some
>>> cases. It would be much better to concentrate this to one place, APIC
>>> or preferably higher level where it may benefit other timers too.
>>> Provided of course that the other models can be made to work.
>>>
>> So write the code and show us. You haven't show any evidence that RTC is
>> the wrong place. RTC knows when interrupt was acknowledge to RTC, it
>> know when clock frequency changes, it know when device reset happened.
>> APIC knows only that interrupt was coalesced. It doesn't even know that
>> it may be masked by a guest in IOAPIC (interrupts delivered while they
>> are masked not considered coalesced).
>
> Oh, I thought interrupt masking was the reason for coalescing! What
> exactly is the reason then?

Missing acks, i.e. the IRQ is still pending when the next one arrives.
You want to filter out masked/suppressed IRQs to avoid running the
de-coalescing logic on sources that are actually cut off (like the RTC
IRQ when the HPET took over).

>
>> Time source knows only when
>> frequency changes and may be when device reset happens if timer is
>> stopped by device on reset. So RTC is actually a sweet spot if you want
>> to minimize amount of info you need to pass between various layers.
>>
>>>>> Maybe that version would not bend backwards as much as the current to
>>>>> cater for buggy hosts.
>>>>>
>>>> You mean "buggy guests"?
>>> Yes, sorry.
>>>
>>>> What guests are not buggy in your opinion?
>>>> Linux tries hard to be smart and as a result the only way to have stable
>>>> clock with it is to go paravirt.
>>> I'm not an OS designer, but I think an OS should never crash, even if
>>> a burst of IRQs is received. Reprogramming the timer should consider
>>> the pending IRQ situation (0 or 1 with real HW). Those bugs are one
>>> cause of the problem.
>> OS should never crash in the absence of HW bugs? I doubt you can design
>> an OS that can run in a face of any HW failure. Anyway here we are
>> trying to solve guests time keeping problem not crashes. Do you think
>> you can design OS that can keep time accurately no matter how crazy all
>> HW clock behaves?
>
> I think my OS design skills are not relevant in this discussion, but
> IIRC there are fault tolerant operating systems for extreme conditions
> so it can be done.

No one can influence the design of released OS versions anymore.

>
>>>>>> The fact is that timer device is not "just like any
>>>>>> other device" in virtual world. Any other device is easy: you just
>>>>>> implement spec as close as possible and everything works. For time
>>>>>> source device this is not enough. You can implement RTC+HPET to the
>>>>>> letter and your guest will drift like crazy.
>>>>> It's doable: a cycle accurate emulator will not cause any drift,
>>>>> without any voodoo.
>>>>> The interrupts would come after executing the same
>>>>> instruction as the real HW. For emulating any sufficiently buggy
>>>>> guests in any sufficiently desperate low resource conditions, this may
>>>>> be the only option that will always work.
>>>>>
>>>> Yes, but qemu and kvm are not cycle accurate emulators and don't strive
>>>> to be one. On the contrary KVM runs at native host CPU speed most of the
>>>> time, so any emulation done between two instruction is theoretically
>>>> noticeable for a guest. TSC is bypassed directly to a guest too, so
>>>> keeping all time source in perfect sync is also impossible.
>>> That is actually another cause of the problem. KVM gives the guest an
>>> illusion that the VCPU speed is equal to host speed. When they don't
>>> match, especially in critical code, there can be problems. It would be
>>> better to tell the guest a lower speed, which also can be guaranteed.
>>>
>> Not possible. It's that simple. You should take it into account in your
>> architecture design stage. In case of KVM real physical CPU executes guest
>> instruction and it does this as fast as it can. The only way we can hide
>> that from a guest is by intercepting each access to TSC and at that
>> point we can use bochs instead.
>
> Well, as Paul pointed out, there's also icount option.

Which is not available in virtualization mode.

Jan


--------------enigFB40489B25687855B8644AE8
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iEYEARECAAYFAkwA4DAACgkQitSsb3rl5xRtHgCdGxeFj/ZFdVx/PC3w2yOj0wfZ
n3EAn2uShzyp2YADFrwjrLoo+W7A97fg
=oUGI
-----END PGP SIGNATURE-----

--------------enigFB40489B25687855B8644AE8--