From mboxrd@z Thu Jan 1 00:00:00 1970
From: Blue Swirl
Date: Thu, 27 May 2010 19:19:30 +0000
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
In-Reply-To: <4BFEC322.3030207@web.de>
References: <4BFD8010.3010001@web.de> <201005270026.34119.paul@codesourcery.com> <4BFEBA66.6030804@web.de> <4BFEC322.3030207@web.de>
Content-Type: text/plain; charset=UTF-8
List-Id: qemu-devel.nongnu.org
To: Jan Kiszka
Cc: Juan Quintela, Paul Brook, qemu-devel@nongnu.org

On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka wrote:
> Blue Swirl wrote:
>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote:
>>> Blue Swirl wrote:
>>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote:
>>>>>> At the other extreme, would it be possible to make the educated guests
>>>>>> aware of the virtualization also in clock aspect: virtio-clock?
>>>>> The guest doesn't even need to be aware of virtualization. It just needs to be
>>>>> able to accommodate the lack of guaranteed realtime behavior.
>>>>>
>>>>> The fundamental problem here is that some guest operating systems assume that
>>>>> the hardware provides certain realtime guarantees with respect to execution of
>>>>> interrupt handlers. In particular, they assume that the CPU will always be
>>>>> able to complete execution of the timer IRQ handler before the periodic timer
>>>>> triggers again. In most virtualized environments you have absolutely no
>>>>> guarantee of realtime response.
>>>>>
>>>>> With Linux guests this was solved a long time ago by the introduction of
>>>>> tickless kernels. These separate timekeeping from wakeup events, so it
>>>>> doesn't matter if several wakeup triggers end up getting merged (either at the
>>>>> hardware level or via top/bottom half guest IRQ handlers).
>>>>>
>>>>> It's worth mentioning that this problem also occurs on real hardware,
>>>>> typically due to lame hardware/drivers which end up masking interrupts or
>>>>> otherwise stalling the CPU for long periods of time.
>>>>>
>>>>> The PIT hack attempts to work around broken guests by adding artificial latency
>>>>> to the timer event, ensuring that the guest "sees" them all. Unfortunately,
>>>>> guests vary on when it is safe for them to see the next timer event, and
>>>>> trying to observe this behavior involves potentially harmful heuristics and
>>>>> collusion between unrelated devices (e.g. interrupt controller and timer).
>>>>>
>>>>> In some cases we don't even do that, and just reschedule the event some
>>>>> arbitrarily small amount of time later. This assumes the guest does useful
>>>>> work in that time. In a single-threaded environment this is probably true -
>>>>> qemu got enough CPU to inject the first interrupt, so it will probably manage to
>>>>> execute some guest code before the end of its timeslice.
>>>>> In an environment
>>>>> where interrupt processing/delivery and execution of the guest code happen in
>>>>> different threads, this becomes increasingly likely to fail.
>>>> So any voodoo around timer events is doomed to fail in some cases.
>>>> What's the amount of hacks that we want then? Is there any generic
>>> The aim of this patch is to reduce the number of existing and upcoming
>>> hacks. It may still require some refinements, but I think we haven't
>>> found any smarter approach yet that fits existing use cases.
>>
>> I don't feel we have tried other possibilities hard enough.
>
> Well, seeing prototypes wouldn't be bad, also to run real load against
> them. But at least I'm currently clueless about what to implement.

Perhaps now is not the time to rush to implement something, but to
brainstorm for a clean solution.

>>
>>>> solution, like slowing down the guest system to the point where we can
>>>> guarantee the interrupt rate vs. CPU execution speed?
>>> That's generally a non-option in virtualized production environments.
>>> Specifically, if the guest system lost interrupts due to host
>>> overcommitment, you do not want it to slow down even further.
>>
>> I meant that the guest time could be scaled down; for example, 2s in
>> wall clock time would be presented to the guest as 1s.
>
> But that is precisely what already happens when the guest loses timer
> interrupts. There is no other time source for this kind of guest -
> except, often, for some external events generated by systems which you
> don't want to fall behind arbitrarily.
>
>> Then the amount
>> of CPU cycles between timer interrupts would increase and hopefully
>> the guest can keep up. If the guest sleeps, the time base could be
>> accelerated to catch up with the wall clock and then set back to a 1:1 rate.
>
> Can't follow you ATM, sorry. What should be slowed down then? And how
> precisely?
I think vm_clock, and everything that depends on vm_clock, should be
slowed down. Also, rtc_clock should be tied to vm_clock in this mode, not
host_clock.

>
> Jan
>
>>
>> Slowing down could be triggered by measuring the guest load (for
>> example, by checking for the presence of halt instructions): if the load
>> is close to 1, time would be slowed down. If the guest starts to issue
>> halt instructions because it's more idle, we can increase the speed again.
>>
>> If this approach worked, even the APIC could be made ignorant of the
>> coalescing voodoo, so it should be a major cleanup.