qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: Blue Swirl <blauwirbel@gmail.com>
Cc: Jan Kiszka <jan.kiszka@web.de>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Date: Sat, 29 May 2010 17:38:57 +0300	[thread overview]
Message-ID: <20100529143857.GE3604@redhat.com> (raw)
In-Reply-To: <AANLkTimvdjU1zunypgO1VikK3Tw47TDZyrtR7m1YC2o_@mail.gmail.com>

On Sat, May 29, 2010 at 09:15:11AM +0000, Blue Swirl wrote:
> >> There is no code, because we're still at architecture design stage.
> >>
> > Try to write test code to understand the problem better.
> 
> I will.
> 
Please do ASAP. This discussion shows that you don't understand what is the
problem that we are dialing with.

> >> >> >> guests could also be assisted with special handling (like win2k
> >> >> >> install hack), for example guest instructions could be counted
> >> >> >> (approximately, for example using TB size or TSC) and only inject
> >> >> >> after at least N instructions have passed.
> >> >> > Guest instructions cannot be easily counted in KVM (it can be done more
> >> >> > or less reliably using perf counters, may be).
> >> >>
> >> >> Aren't there any debug registers or perf counters, which can generate
> >> >> an interrupt after some number of instructions have been executed?
> >> > Don't think debug registers have something like that and they are
> >> > available for guest use anyway. Perf counters differs greatly from CPU
> >> > to CPU (even between two CPUs of the same manufacturer), and we want to
> >> > keep using them for profiling guests. And I don't see what problem it
> >> > will solve anyway that can be solved by simple delay between irq
> >> > reinjection.
> >>
> >> This would allow counting the executed instructions and limit it. Thus
> >> we could emulate a 500MHz CPU on a 2GHz CPU more accurately.
> >>
> > Why would you want to limit number of instruction executed by guest if
> > CPU has nothing else to do anyway? The problem occurs not when we have
> > spare cycles so give to a guest, but in opposite case.
> 
> I think one problem is that the guest has executed too much compared
> to what would happen with real HW with a lesser CPU. That explains the
> RTC frequency reprogramming case.
You think wrong. The problem is exactly opposite: the guest haven't
had enough execution time between two time interrupts. I don't know what
RTC frequency reprogramming case you are talking about here.

> 
> >
> >> >>
> >> >> >>
> >> >> >> > And even if the rate did not matter, the APIC woult still have to now
> >> >> >> > about the fact that an IRQ is really periodic and does not only appear
> >> >> >> > as such for a certain interval. This really does not sound like
> >> >> >> > simplifying things or even make them cleaner.
> >> >> >>
> >> >> >> It would, the voodoo would be contained only in APIC, RTC would be
> >> >> >> just like any other device. With the bidirectional irqs, this voodoo
> >> >> >> would probably eventually spread to many other devices. The logical
> >> >> >> conclusion of that would be a system where all devices would be
> >> >> >> careful not to disturb the guest at wrong moment because that would
> >> >> >> trigger a bug.
> >> >> >>
> >> >> > This voodoo will be so complex and unreliable that it will make RTC hack
> >> >> > pale in comparison (and I still don't see how you are going to make it
> >> >> > actually work).
> >> >>
> >> >> Implement everything inside APIC: only coalescing and reinjection.
> >> > APIC has zero info needed to implement reinjection correctly as was
> >> > shown to you several time in this thread and you simply keep ignoring
> >> > it.
> >>
> >> On the contrary, APIC is actually the only source of the IRQ ack
> >> information. RTC hack would not work without APIC (or the
> >> bidirectional IRQ) passing this info to RTC.
> >>
> >> What APIC doesn't have now is the timer frequency or period info. This
> >> is known by RTC and also higher levels managing the clocks.
> >>
> > So APIC has one bit of information and RTC everything else.
> 
> The information known by RTC (timer period) is also known by higher levels.
> 
What do you mean by higher level here? vl.c or APIC.

> > The current
> > approach (and proposed patch) brings this one bit of information to RTC,
> > you are arguing that RTC should be able to communicate all its info to
> > APIC. Sorry I don't see that your way has any advantage. Just more
> > complex interface and it is much easier to get it wrong for other time
> > sources.
> 
> I don't think anymore that APIC should be handling this but the
> generic stuff, like vl.c or exec.c. Then there would be only
> information passing from APIC to higher levels.
Handling reinjection by general timer code makes infinitely more sense
then handling it in APIC. One thing (from the top of my head) that can't
be implemented at that level is injection of interrupt back to back (i.e
injecting next interrupt immediately after guest acknowledge previous
one to RTC).

> 
> >> I keep ignoring the idea that the current model, where both RTC and
> >> APIC must somehow work together to make coalescing work, is the only
> >> possible just because it is committed and it happens to work in some
> >> cases. It would be much better to concentrate this to one place, APIC
> >> or preferably higher level where it may benefit other timers too.
> >> Provided of course that the other models can be made to work.
> >>
> > So write the code and show us. You haven't show any evidence that RTC is
> > the wrong place. RTC knows when interrupt was acknowledge to RTC, it
> > know when clock frequency changes, it know when device reset happened.
> > APIC knows only that interrupt was coalesced. It doesn't even know that
> > it may be masked by a guest in IOAPIC (interrupts delivered while they
> > are masked not considered coalesced).
> 
> Oh, I thought interrupt masking was the reason for coalescing! What
> exactly is the reason then?
> 
The reason is that guest has no time to process previous interrupt
before it is time to inject next one.

> > Time source knows only when
> > frequency changes and may be when device reset happens if timer is
> > stopped by device on reset. So RTC is actually a sweet spot if you want
> > to minimize amount of info you need to pass between various layers.
> >
> >> >> Maybe that version would not bend backwards as much as the current to
> >> >> cater for buggy hosts.
> >> >>
> >> > You mean "buggy guests"?
> >>
> >> Yes, sorry.
> >>
> >> > What guests are not buggy in your opinion?
> >> > Linux tries hard to be smart and as a result the only way to have stable
> >> > clock with it is to go paravirt.
> >>
> >> I'm not an OS designer, but I think an OS should never crash, even if
> >> a burst of IRQs is received. Reprogramming the timer should consider
> >> the pending IRQ situation (0 or 1 with real HW). Those bugs are one
> >> cause of the problem.
> > OS should never crash in the absence of HW bugs? I doubt you can design
> > an OS that can run in a face of any HW failure. Anyway here we are
> > trying to solve guests time keeping problem not crashes. Do you think
> > you can design OS that can keep time accurately no matter how crazy all
> > HW clock behaves?
> 
> I think my OS design skills are not relevant in this discussion, but
> IIRC there are fault tolerant operating systems for extreme conditions
> so it can be done.
> 
> >
> >>
> >> >> > The fact is that timer device is not "just like any
> >> >> > other device" in virtual world. Any other device is easy: you just
> >> >> > implement spec as close as possible and everything works. For time
> >> >> > source device this is not enough. You can implement RTC+HPET to the
> >> >> > letter and your guest will drift like crazy.
> >> >>
> >> >> It's doable: a cycle accurate emulator will not cause any drift,
> >> >> without any voodoo. The interrupts would come after executing the same
> >> >> instruction as the real HW. For emulating any sufficiently buggy
> >> >> guests in any sufficiently desperate low resource conditions, this may
> >> >> be the only option that will always work.
> >> >>
> >> > Yes, but qemu and kvm are not cycle accurate emulators and don't strive
> >> > to be one. On the contrary KVM runs at native host CPU speed most of the
> >> > time, so any emulation done between two instruction is theoretically
> >> > noticeable for a guest. TSC is bypassed directly to a guest too, so
> >> > keeping all time source in perfect sync is also impossible.
> >>
> >> That is actually another cause of the problem. KVM gives the guest an
> >> illusion that the VCPU speed is equal to host speed. When they don't
> >> match, especially in critical code, there can be problems. It would be
> >> better to tell the guest a lower speed, which also can be guaranteed.
> >>
> > Not possible. It's that simple. You should take it into account in your
> > architecture design stage. In case of KVM real physical CPU executes guest
> > instruction and it does this as fast as it can. The only way we can hide
> > that from a guest is by intercepting each access to TSC and at that
> > point we can use bochs instead.
> 
> Well, as Paul pointed out, there's also icount option.
> 
icount is not an option for KVM.

> >> Maybe we should also offline the device emulation to another host CPU
> >> with threading. A load from a device will always be much slower than
> >> on real HW though.
> > Time drift problem start to happen on loaded servers, so you do not have
> > spare CPU to offload device emulation too.
> >
> > --
> >                        Gleb.
> >

--
			Gleb.

  parent reply	other threads:[~2010-05-29 15:39 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-24 20:13 [Qemu-devel] [RFT][PATCH 00/15] HPET cleanups, fixes, enhancements Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 01/15] hpet: Catch out-of-bounds timer access Jan Kiszka
2010-05-24 20:34   ` [Qemu-devel] " Juan Quintela
2010-05-24 20:36     ` Jan Kiszka
2010-05-24 20:50       ` Juan Quintela
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 02/15] hpet: Coding style cleanups and some refactorings Jan Kiszka
2010-05-24 20:37   ` [Qemu-devel] " Juan Quintela
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 03/15] hpet: Silence warning on write to running main counter Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 04/15] hpet: Move static timer field initialization Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 05/15] hpet: Convert to qdev Jan Kiszka
2010-05-25  9:37   ` Paul Brook
2010-05-25 10:14     ` Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 06/15] hpet: Start/stop timer when HPET_TN_ENABLE is modified Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback Jan Kiszka
2010-05-25  6:07   ` Gleb Natapov
2010-05-25  6:31     ` Jan Kiszka
2010-05-25  6:40       ` Gleb Natapov
2010-05-25  6:54         ` Jan Kiszka
2010-05-25 19:09   ` [Qemu-devel] " Blue Swirl
2010-05-25 20:16     ` Anthony Liguori
2010-05-25 21:44       ` Jan Kiszka
2010-05-26  8:08         ` Gleb Natapov
2010-05-26 20:14           ` Blue Swirl
2010-05-27  5:42             ` Gleb Natapov
2010-05-26 19:55         ` Blue Swirl
2010-05-26 20:09           ` Jan Kiszka
2010-05-26 20:35             ` Blue Swirl
2010-05-26 22:35               ` Jan Kiszka
2010-05-26 23:26               ` Paul Brook
2010-05-27 17:56                 ` Blue Swirl
2010-05-27 18:31                   ` Jan Kiszka
2010-05-27 18:53                     ` Blue Swirl
2010-05-27 19:08                       ` Jan Kiszka
2010-05-27 19:19                         ` Blue Swirl
2010-05-27 22:19                           ` Jan Kiszka
2010-05-28 19:00                             ` Blue Swirl
2010-05-30 12:00                             ` Avi Kivity
2010-05-27 22:21                           ` Paul Brook
2010-05-28 19:10                             ` Blue Swirl
2010-05-27 22:21                   ` Paul Brook
2010-05-27  6:13               ` Gleb Natapov
2010-05-27 18:37                 ` Blue Swirl
2010-05-28  7:31                   ` Gleb Natapov
2010-05-28 20:06                     ` Blue Swirl
2010-05-28 20:47                       ` Gleb Natapov
2010-05-29  7:58                         ` Jan Kiszka
2010-05-29  9:35                           ` Blue Swirl
2010-05-29  9:45                             ` Jan Kiszka
2010-05-29 10:04                               ` Blue Swirl
2010-05-29 10:16                                 ` Jan Kiszka
2010-05-29 10:26                                   ` Blue Swirl
2010-05-29 10:38                                     ` Jan Kiszka
2010-05-29 14:46                             ` Gleb Natapov
2010-05-29 16:13                               ` Blue Swirl
2010-05-29 16:37                                 ` Gleb Natapov
2010-05-29 21:21                                   ` Blue Swirl
2010-05-30  6:02                                     ` Gleb Natapov
2010-05-30 12:10                                       ` Blue Swirl
2010-05-30 12:24                                         ` Jan Kiszka
2010-05-30 12:58                                           ` Blue Swirl
2010-05-31  7:46                                             ` Jan Kiszka
2010-05-30 12:33                                         ` Gleb Natapov
2010-05-30 12:56                                           ` Blue Swirl
2010-05-30 13:49                                             ` Gleb Natapov
2010-05-30 16:54                                               ` Blue Swirl
2010-05-30 19:37                                               ` Blue Swirl
2010-05-30 20:07                                                 ` Gleb Natapov
2010-05-30 20:21                                                   ` Blue Swirl
2010-05-31  5:19                                                     ` Gleb Natapov
2010-06-01 18:00                                                       ` Blue Swirl
2010-06-01 18:30                                                         ` Gleb Natapov
2010-06-02 19:05                                                           ` Blue Swirl
2010-06-03  6:23                                                             ` Jan Kiszka
2010-06-03  6:34                                                               ` Gleb Natapov
2010-06-03  6:59                                                                 ` Jan Kiszka
2010-06-03  7:03                                                                   ` Gleb Natapov
2010-06-03  7:06                                                                     ` Gleb Natapov
2010-06-04 19:05                                                                       ` Blue Swirl
2010-06-05  0:04                                                                         ` Jan Kiszka
2010-06-05  7:20                                                                           ` Blue Swirl
2010-06-05  8:27                                                                             ` Jan Kiszka
2010-06-05  9:23                                                                               ` Blue Swirl
2010-06-05 12:14                                                                                 ` Jan Kiszka
2010-06-06  7:15                                                                           ` Gleb Natapov
2010-06-06  7:39                                                                             ` Jan Kiszka
2010-06-06  7:49                                                                               ` Gleb Natapov
2010-06-06  8:07                                                                                 ` Jan Kiszka
2010-06-06  9:23                                                                                   ` Gleb Natapov
2010-06-06 10:10                                                                                     ` Jan Kiszka
2010-06-06 10:27                                                                                       ` Gleb Natapov
2010-06-06  7:39                                                                             ` Blue Swirl
2010-06-06  8:07                                                                               ` Gleb Natapov
2010-05-30 13:22                                           ` Blue Swirl
2010-05-29  9:15                         ` Blue Swirl
2010-05-29  9:36                           ` Jan Kiszka
2010-05-29 14:38                           ` Gleb Natapov [this message]
2010-05-29 16:03                             ` Blue Swirl
2010-05-29 16:32                               ` Gleb Natapov
2010-05-29 20:52                                 ` Blue Swirl
2010-05-30  5:41                                   ` Gleb Natapov
2010-05-30 11:41                                     ` Blue Swirl
2010-05-30 11:52                                       ` Gleb Natapov
2010-05-30 12:05                           ` Avi Kivity
2010-05-27  5:58             ` Gleb Natapov
2010-05-26 19:49       ` Blue Swirl
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 08/15] x86: Refactor RTC IRQ coalescing workaround Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 09/15] hpet/rtc: Rework RTC IRQ replacement by HPET Jan Kiszka
2010-05-25  9:29   ` Paul Brook
2010-05-25 10:23     ` Jan Kiszka
2010-05-25 11:05       ` Paul Brook
2010-05-25 11:19         ` Jan Kiszka
2010-05-25 11:23           ` Paul Brook
2010-05-25 11:26             ` Jan Kiszka
2010-05-25 12:03               ` Paul Brook
2010-05-25 12:39                 ` Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 10/15] hpet: Drop static state Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 11/15] hpet: Add support for level-triggered interrupts Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 12/15] vmstate: Add VMSTATE_STRUCT_VARRAY_UINT8 Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 13/15] hpet: Make number of timers configurable Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 14/15] hpet: Add MSI support Jan Kiszka
2010-05-24 20:13 ` [Qemu-devel] [RFT][PATCH 15/15] monitor/QMP: Drop info hpet / query-hpet Jan Kiszka
2010-05-24 22:16 ` [Qemu-devel] [RFT][PATCH 00/15] HPET cleanups, fixes, enhancements Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100529143857.GE3604@redhat.com \
    --to=gleb@redhat.com \
    --cc=blauwirbel@gmail.com \
    --cc=jan.kiszka@web.de \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).