Date: Fri, 28 May 2010 00:19:13 +0200
From: Jan Kiszka <jan.kiszka@web.de>
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Message-ID: <4BFEEFE1.2050602@web.de>
To: Blue Swirl
Cc: Juan Quintela, Paul Brook, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org

Blue Swirl wrote:
> On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka wrote:
>> Blue Swirl wrote:
>>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka wrote:
>>>> Blue Swirl wrote:
>>>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook wrote:
>>>>>>> At the other extreme, would it be possible to make the educated guests
>>>>>>> aware of the virtualization also in the clock aspect: virtio-clock?
>>>>>> The guest doesn't even need to be aware of virtualization.
>>>>>> It just needs to be able to accommodate the lack of guaranteed
>>>>>> realtime behavior.
>>>>>>
>>>>>> The fundamental problem here is that some guest operating systems assume
>>>>>> that the hardware provides certain realtime guarantees with respect to
>>>>>> execution of interrupt handlers. In particular, they assume that the CPU
>>>>>> will always be able to complete execution of the timer IRQ handler before
>>>>>> the periodic timer triggers again. In most virtualized environments you
>>>>>> have absolutely no guarantee of realtime response.
>>>>>>
>>>>>> With Linux guests this was solved a long time ago by the introduction of
>>>>>> tickless kernels. These separate the timekeeping from wakeup events, so
>>>>>> it doesn't matter if several wakeup triggers end up getting merged
>>>>>> (either at the hardware level or via top/bottom half guest IRQ handlers).
>>>>>>
>>>>>> It's worth mentioning that this problem also occurs on real hardware,
>>>>>> typically due to lame hardware/drivers which end up masking interrupts or
>>>>>> otherwise stalling the CPU for long periods of time.
>>>>>>
>>>>>> The PIT hack attempts to work around broken guests by adding artificial
>>>>>> latency to the timer event, ensuring that the guest "sees" them all.
>>>>>> Unfortunately, guests vary on when it is safe for them to see the next
>>>>>> timer event, and trying to observe this behavior involves potentially
>>>>>> harmful heuristics and collusion between unrelated devices (e.g.
>>>>>> interrupt controller and timer).
>>>>>>
>>>>>> In some cases we don't even do that, and just reschedule the event some
>>>>>> arbitrarily small amount of time later. This assumes the guest does
>>>>>> useful work in that time. In a single-threaded environment this is
>>>>>> probably true - qemu got enough CPU to inject the first interrupt, so it
>>>>>> will probably manage to execute some guest code before the end of its
>>>>>> timeslice.
>>>>>> In an environment where interrupt processing/delivery and execution of
>>>>>> the guest code happen in different threads, this becomes increasingly
>>>>>> likely to fail.
>>>>> So any voodoo around timer events is doomed to fail in some cases.
>>>>> What's the amount of hacks we want then? Is there any generic
>>>> The aim of this patch is to reduce the amount of existing and upcoming
>>>> hacks. It may still require some refinements, but I think we haven't
>>>> found any smarter approach yet that fits existing use cases.
>>> I don't feel we have tried other possibilities hard enough.
>> Well, seeing prototypes wouldn't be bad, also to run real load against
>> them. But at least I'm currently clueless what to implement.
>
> Perhaps now is then not the time to rush to implement something, but
> to brainstorm for a clean solution.

And sometimes it can help to understand how ideas could be improved,
or why others don't work at all.

>
>>>>> solution, like slowing down the guest system to the point where we can
>>>>> guarantee the interrupt rate vs. CPU execution speed?
>>>> That's generally a non-option in virtualized production environments.
>>>> Specifically, if the guest system lost interrupts due to host
>>>> overcommitment, you do not want it to slow down even further.
>>> I meant that the guest time could be scaled down, for example 2s in
>>> wall clock time would be presented to the guest as 1s.
>> But that is precisely what already happens when the guest loses timer
>> interrupts. There is no other time source for this kind of guest -
>> except, often, for some external events generated by systems which you
>> don't want to fall behind arbitrarily.
>>
>>> Then the amount
>>> of CPU cycles between timer interrupts would increase and hopefully
>>> the guest can keep up. If the guest sleeps, the time base could be
>>> accelerated to catch up with the wall clock and then set back to a 1:1 rate.
>> Can't follow you ATM, sorry.
>> What should be slowed down then? And how precisely?
>
> I think vm_clock and everything that depends on vm_clock; also
> rtc_clock should be tied to vm_clock in this mode, not host_clock.

Let me check if I got this idea correctly: Instead of tuning just the
tick frequency of the affected timer device / sending its backlog in a
row, you rather want to tune vm_clock correspondingly?

Maybe a way to abstract the required logic, currently sitting only in
the RTC, for use by other timer sources as well. But just switching
rtc_clock to vm_clock when the user wants host_clock is obviously not
an option. We would rather have to tune host_clock in parallel.

Still, this does not answer:
- How do you want to detect lost timer ticks?
- What subsystem(s) keeps track of the backlog?
- And, depending on the above: How to detect at all that a specific
  IRQ is a timer tick?

Jan