Date: Fri, 28 May 2010 00:19:13 +0200
From: Jan Kiszka <jan.kiszka@web.de>
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Message-ID: <4BFEEFE1.2050602@web.de>
To: Blue Swirl
Cc: Juan Quintela, Paul Brook, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org

Blue Swirl wrote:
> On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka wrote:
>> Blue Swirl wrote:
>>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote:
>>>> Blue Swirl wrote:
>>>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote:
>>>>>>> At the other extreme, would it be possible to make the educated guests
>>>>>>> aware of the virtualization also in the clock aspect: virtio-clock?
>>>>>> The guest doesn't even need to be aware of virtualization.
>>>>>> It just needs to be able to accommodate the lack of guaranteed
>>>>>> realtime behavior.
>>>>>>
>>>>>> The fundamental problem here is that some guest operating systems assume
>>>>>> that the hardware provides certain realtime guarantees with respect to
>>>>>> execution of interrupt handlers. In particular, they assume that the CPU
>>>>>> will always be able to complete execution of the timer IRQ handler before
>>>>>> the periodic timer triggers again. In most virtualized environments you
>>>>>> have absolutely no guarantee of realtime response.
>>>>>>
>>>>>> With Linux guests this was solved a long time ago by the introduction of
>>>>>> tickless kernels. These separate the timekeeping from wakeup events, so
>>>>>> it doesn't matter if several wakeup triggers end up getting merged
>>>>>> (either at the hardware level or via top/bottom half guest IRQ handlers).
>>>>>>
>>>>>> It's worth mentioning that this problem also occurs on real hardware,
>>>>>> typically due to lame hardware/drivers which end up masking interrupts or
>>>>>> otherwise stalling the CPU for long periods of time.
>>>>>>
>>>>>> The PIT hack attempts to work around broken guests by adding artificial
>>>>>> latency to the timer event, ensuring that the guest "sees" them all.
>>>>>> Unfortunately, guests vary on when it is safe for them to see the next
>>>>>> timer event, and trying to observe this behavior involves potentially
>>>>>> harmful heuristics and collusion between unrelated devices (e.g.
>>>>>> interrupt controller and timer).
>>>>>>
>>>>>> In some cases we don't even do that, and just reschedule the event some
>>>>>> arbitrarily small amount of time later. This assumes the guest does
>>>>>> useful work in that time. In a single-threaded environment this is
>>>>>> probably true - qemu got enough CPU to inject the first interrupt, so it
>>>>>> will probably manage to execute some guest code before the end of its
>>>>>> timeslice.
>>>>>> In an environment where interrupt processing/delivery and execution of
>>>>>> the guest code happen in different threads, this becomes increasingly
>>>>>> likely to fail.
>>>>> So any voodoo around timer events is doomed to fail in some cases.
>>>>> What's the amount of hacks we want then? Is there any generic
>>>> The aim of this patch is to reduce the amount of existing and upcoming
>>>> hacks. It may still require some refinements, but I think we haven't
>>>> found any smarter approach yet that fits existing use cases.
>>> I don't feel we have tried other possibilities hard enough.
>> Well, seeing prototypes wouldn't be bad, also to run real load against
>> them. But at least I'm currently clueless what to implement.
>
> Perhaps now is then not the time to rush to implement something, but
> to brainstorm for a clean solution.

And sometimes it can help to understand how ideas could be improved,
or why others don't work at all.

>
>>>>> solution, like slowing down the guest system to the point where we can
>>>>> guarantee the interrupt rate vs. CPU execution speed?
>>>> That's generally a non-option in virtualized production environments.
>>>> Specifically, if the guest system lost interrupts due to host
>>>> overcommitment, you do not want it to slow down even further.
>>> I meant that the guest time could be scaled down, for example 2s in
>>> wall clock time would be presented to the guest as 1s.
>> But that is precisely what already happens when the guest loses timer
>> interrupts. There is no other time source for this kind of guest -
>> except, often, for some external events generated by systems which you
>> don't want to fall behind arbitrarily.
>>
>>> Then the amount
>>> of CPU cycles between timer interrupts would increase and hopefully
>>> the guest can keep up. If the guest sleeps, the time base could be
>>> accelerated to catch up with the wall clock and then set back to a 1:1 rate.
>> Can't follow you ATM, sorry.
>> What should be slowed down then? And how precisely?
>
> I think vm_clock and everything that depends on vm_clock; also
> rtc_clock should be tied to vm_clock in this mode, not host_clock.

Let me check if I got this idea correctly: Instead of tuning just the
tick frequency of the affected timer device / sending its backlog in a
row, you rather want to tune vm_clock correspondingly?

Maybe a way to abstract the required logic, currently sitting only in
the RTC, for use by other timer sources as well. But just switching
rtc_clock to vm_clock when the user wants host_clock is obviously not
an option. We would rather have to tune host_clock in parallel.

Still, this does not answer:
- How do you want to detect lost timer ticks?
- What subsystem(s) keeps track of the backlog?
- And, depending on the above: How to detect at all that a specific
  IRQ is a timer tick?

Jan