From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4C00E02D.9050804@web.de>
Date: Sat, 29 May 2010 11:36:45 +0200
From: Jan Kiszka <jan.kiszka@web.de>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers
	with delivery feedback
References: <AANLkTinP2QheAYmPzPfIZ0qeOcyCv_L5CRoif4XXe5qt@mail.gmail.com>
	<4BFC3028.6030303@codemonkey.ws> <4BFC44D2.2090608@web.de>
	<AANLkTimP25BI3UI_5Zts3uZ4JUG2nq18R8ncOsTrj5v1@mail.gmail.com>
	<4BFD8010.3010001@web.de>
	<AANLkTim6zZYtkPkJxlZHbHKWhWXcU-RlpvgeyMUriBj1@mail.gmail.com>
	<20100527061335.GE5474@redhat.com>
	<AANLkTinwXJ1dwWoVZLTQOrK2byczn97TzHkh1wdjOGjQ@mail.gmail.com>
	<20100528073135.GC17805@redhat.com>
	<AANLkTin4ib3PmnKGHAVIwVpLeyPMz03xx-MVjOvlUba5@mail.gmail.com>
	<20100528204746.GC3604@redhat.com>
	<AANLkTimvdjU1zunypgO1VikK3Tw47TDZyrtR7m1YC2o_@mail.gmail.com>
In-Reply-To: <AANLkTimvdjU1zunypgO1VikK3Tw47TDZyrtR7m1YC2o_@mail.gmail.com>
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigFB40489B25687855B8644AE8"
Sender: jan.kiszka@web.de
List-Id: qemu-devel.nongnu.org
To: Blue Swirl <blauwirbel@gmail.com>
Cc: qemu-devel@nongnu.org, Gleb Natapov <gleb@redhat.com>, Juan Quintela <quintela@redhat.com>

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigFB40489B25687855B8644AE8
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Blue Swirl wrote:
>>> On the contrary, APIC is actually the only source of the IRQ ack
>>> information. RTC hack would not work without APIC (or the
>>> bidirectional IRQ) passing this info to RTC.
>>>
>>> What APIC doesn't have now is the timer frequency or period info. This
>>> is known by RTC and also higher levels managing the clocks.
>>>
>> So APIC has one bit of information and RTC everything else.
>
> The information known by RTC (timer period) is also known by higher levels.

Curious to see where you'll find this.

>
>> The current
>> approach (and proposed patch) brings this one bit of information to RTC;
>> you are arguing that RTC should be able to communicate all its info to
>> APIC. Sorry, I don't see that your way has any advantage. Just a more
>> complex interface, and it is much easier to get it wrong for other time
>> sources.
>
> I don't think anymore that APIC should be handling this but the
> generic stuff, like vl.c or exec.c. Then there would be only
> information passing from APIC to higher levels.

You neglect the information required to associate a periodic source
(e.g. RTC) with an IRQ sink (e.g. APIC). Without that, you will have a
hard time figuring out whether a reported IRQ coalescing requires any
action or should simply be welcomed (for I/O IRQs).
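To make the association argument concrete, here is a minimal C sketch. All names in it (IrqLine, irq_line_raise, irq_line_ack, IrqDeliveryStatus, PeriodicSource) are hypothetical and are not the qemu_irq interface of this patch series. The sink classifies each raised edge and reports the result only to a source that registered a feedback callback on that line; for an ordinary I/O IRQ no callback is registered, so coalescing is simply accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical delivery result reported by an IRQ sink (e.g. an APIC model). */
typedef enum {
    IRQ_DELIVERED,  /* line was idle: IRQ accepted */
    IRQ_COALESCED,  /* previous IRQ still pending (not acked): merged */
    IRQ_MASKED      /* line masked by the guest: nothing delivered */
} IrqDeliveryStatus;

/* Callback a periodic source registers to learn the fate of its IRQs. */
typedef void (*IrqFeedbackFn)(void *opaque, IrqDeliveryStatus status);

/* Hypothetical IRQ line: sink state plus the source association. */
typedef struct {
    bool pending;            /* raised but not yet acked by the guest */
    bool masked;             /* masked by the guest */
    IrqFeedbackFn feedback;  /* NULL for ordinary I/O sources */
    void *opaque;            /* source state handed to feedback */
} IrqLine;

/* Hypothetical periodic source (think RTC) counting lost ticks. */
typedef struct {
    unsigned coalesced;
} PeriodicSource;

static void periodic_source_feedback(void *opaque, IrqDeliveryStatus status)
{
    PeriodicSource *src = opaque;

    if (status == IRQ_COALESCED) {
        src->coalesced++;
    }
}

/* Sink side: classify a raised edge and report back if a source listens. */
static IrqDeliveryStatus irq_line_raise(IrqLine *line)
{
    IrqDeliveryStatus status;

    assert(line);
    if (line->masked) {
        status = IRQ_MASKED;
    } else if (line->pending) {
        status = IRQ_COALESCED;
    } else {
        line->pending = true;
        status = IRQ_DELIVERED;
    }
    /* Only registered periodic sources care; I/O IRQs drop the report. */
    if (line->feedback) {
        line->feedback(line->opaque, status);
    }
    return status;
}

/* Sink side: the guest acknowledged the interrupt. */
static void irq_line_ack(IrqLine *line)
{
    line->pending = false;
}
```

The same sink logic serves both kinds of lines; what differs is only whether a source registered a callback, which is exactly the association information the sink cannot derive on its own.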

>
>>> I keep ignoring the idea that the current model, where both RTC and
>>> APIC must somehow work together to make coalescing work, is the only
>>> possible one just because it is committed and it happens to work in some
>>> cases. It would be much better to concentrate this in one place, APIC
>>> or preferably a higher level where it may benefit other timers too.
>>> Provided of course that the other models can be made to work.
>>>
>> So write the code and show us. You haven't shown any evidence that RTC is
>> the wrong place. RTC knows when an interrupt was acknowledged to RTC, it
>> knows when the clock frequency changes, it knows when a device reset happened.
>> APIC knows only that an interrupt was coalesced. It doesn't even know that
>> it may be masked by a guest in IOAPIC (interrupts delivered while they
>> are masked are not considered coalesced).
>
> Oh, I thought interrupt masking was the reason for coalescing! What
> exactly is the reason then?

Missing acks, i.e. the IRQ is still pending when the next one arrives.
You want to filter out masked/suppressed IRQs to avoid running the
de-coalescing logic on sources that are actually cut off (like the RTC
IRQ when the HPET took over).
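The source-side half of that logic can be sketched as follows. This is a simplified illustration under stated assumptions, not QEMU's mc146818rtc.c code: TickSource, tick_source_report, tick_source_ack and tick_source_reset are hypothetical names. A tick reported as coalesced (missing ack) adds to a backlog, a masked tick is deliberately not counted, and each guest ack allows one owed tick to be reinjected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical delivery result, as reported back by the IRQ sink. */
typedef enum {
    IRQ_DELIVERED,
    IRQ_COALESCED,  /* previous tick still pending: missing ack */
    IRQ_MASKED      /* source cut off, e.g. RTC IRQ after HPET took over */
} IrqDeliveryStatus;

/* Hypothetical periodic timer source such as an RTC model. */
typedef struct {
    unsigned backlog;  /* coalesced ticks still owed to the guest */
} TickSource;

/* Record the fate of one periodic tick. Masked ticks are not counted:
 * the guest is not listening, so there is nothing to catch up on. */
static void tick_source_report(TickSource *src, IrqDeliveryStatus status)
{
    assert(src);
    if (status == IRQ_COALESCED) {
        src->backlog++;
    }
}

/* Hypothetical hook called when the guest acknowledges a tick (for the
 * MC146818 RTC, reading register C clears the interrupt flags).
 * Returns true if one lost tick should be reinjected now. */
static bool tick_source_ack(TickSource *src)
{
    if (src->backlog > 0) {
        src->backlog--;
        return true;
    }
    return false;
}

/* Device reset: ticks owed from before the reset no longer apply.
 * (A real model might rescale the backlog on a period change instead.) */
static void tick_source_reset(TickSource *src)
{
    src->backlog = 0;
}
```

Reinjecting at most one tick per ack keeps the guest seeing at most one pending timer IRQ at a time, as it would on real hardware.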

>
>> Time source knows only when
>> frequency changes and maybe when device reset happens if timer is
>> stopped by device on reset. So RTC is actually a sweet spot if you want
>> to minimize the amount of info you need to pass between various layers.
>>
>>>>> Maybe that version would not bend backwards as much as the current to
>>>>> cater for buggy hosts.
>>>>>
>>>> You mean "buggy guests"?
>>> Yes, sorry.
>>>
>>>> What guests are not buggy in your opinion?
>>>> Linux tries hard to be smart, and as a result the only way to have a stable
>>>> clock with it is to go paravirt.
>>> I'm not an OS designer, but I think an OS should never crash, even if
>>> a burst of IRQs is received. Reprogramming the timer should consider
>>> the pending IRQ situation (0 or 1 with real HW). Those bugs are one
>>> cause of the problem.
>> OS should never crash in the absence of HW bugs? I doubt you can design
>> an OS that can run in the face of any HW failure. Anyway, here we are
>> trying to solve the guest timekeeping problem, not crashes. Do you think
>> you can design an OS that can keep time accurately no matter how crazily
>> all HW clocks behave?
>
> I think my OS design skills are not relevant in this discussion, but
> IIRC there are fault tolerant operating systems for extreme conditions
> so it can be done.

No one can influence the design of released OS versions anymore.

>
>>>>>> The fact is that a timer device is not "just like any
>>>>>> other device" in the virtual world. Any other device is easy: you just
>>>>>> implement the spec as closely as possible and everything works. For a time
>>>>>> source device this is not enough. You can implement RTC+HPET to the
>>>>>> letter and your guest will drift like crazy.
>>>>> It's doable: a cycle accurate emulator will not cause any drift,
>>>>> without any voodoo. The interrupts would come after executing the same
>>>>> instruction as the real HW. For emulating any sufficiently buggy
>>>>> guests in any sufficiently desperate low resource conditions, this may
>>>>> be the only option that will always work.
>>>>>
>>>> Yes, but qemu and kvm are not cycle accurate emulators and don't strive
>>>> to be one. On the contrary, KVM runs at native host CPU speed most of the
>>>> time, so any emulation done between two instructions is theoretically
>>>> noticeable for a guest. TSC is passed through directly to a guest too, so
>>>> keeping all time sources in perfect sync is also impossible.
>>> That is actually another cause of the problem. KVM gives the guest an
>>> illusion that the VCPU speed is equal to host speed. When they don't
>>> match, especially in critical code, there can be problems. It would be
>>> better to tell the guest a lower speed, which also can be guaranteed.
>>>
>> Not possible. It's that simple. You should take it into account in your
>> architecture design stage. In the case of KVM, the real physical CPU executes
>> guest instructions, and it does this as fast as it can. The only way we can
>> hide that from a guest is by intercepting each access to TSC, and at that
>> point we can use bochs instead.
>
> Well, as Paul pointed out, there's also the icount option.

Which is not available in virtualization mode.

Jan


--------------enigFB40489B25687855B8644AE8
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iEYEARECAAYFAkwA4DAACgkQitSsb3rl5xRtHgCdGxeFj/ZFdVx/PC3w2yOj0wfZ
n3EAn2uShzyp2YADFrwjrLoo+W7A97fg
=oUGI
-----END PGP SIGNATURE-----

--------------enigFB40489B25687855B8644AE8--