From: Dor Laor
Reply-To: dlaor@redhat.com
Date: Wed, 05 Nov 2008 14:45:05 +0200
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RESEND][PATCH 0/3] Fix guest time drift under heavy load.
Message-ID: <49119551.2070704@redhat.com>
In-Reply-To: <20081102130441.GD16809@redhat.com>

Gleb Natapov wrote:
> On Fri, Oct 31, 2008 at 02:17:19PM -0500, Anthony Liguori wrote:
>> Gleb Natapov wrote:
>>> Qemu device emulation for timers might be inaccurate and
>>> cause coalescing of several IRQs into one. It happens when the
>>> load on the host is high and the guest did not manage to ack the
>>> previous IRQ. The problem can be reproduced by copying a big file
>>> or many small ones inside a Windows guest. When you do that, the
>>> guest clock starts to lag behind the host one.
>>>
>>> The first patch in the series changes the qemu_irq subsystem to
>>> return IRQ delivery status information. If a device is notified
>>> that IRQs were lost, it can regenerate them as needed. The
>>> following two patches add IRQ regeneration to the PIC and RTC
>>> devices.
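
Just to make sure I read the qemu_irq change right, here is a minimal
sketch of the idea as I understand it (the names and types below are
mine, not the actual patch): raising the line reports whether the
previous edge was still pending, i.e. whether this delivery got
coalesced.

#include <stdio.h>

/* Sketch only: struct and names are illustrative, not the patch. */
typedef struct IRQLine {
    int pending;            /* an edge was raised but not yet acked */
} IRQLine;

/* Raise an edge; return 0 if delivered, -1 if the previous edge was
 * still pending, i.e. this IRQ got coalesced with it. */
static int irq_raise(IRQLine *irq)
{
    if (irq->pending) {
        return -1;          /* guest hasn't acked: this tick is lost */
    }
    irq->pending = 1;
    return 0;
}

/* Guest acks the interrupt (e.g. reads RTC register C). */
static void irq_ack(IRQLine *irq)
{
    irq->pending = 0;
}

int main(void)
{
    IRQLine line = { 0 };

    irq_raise(&line);               /* first tick: delivered */
    if (irq_raise(&line) < 0) {     /* second tick before any ack */
        printf("tick coalesced; the device must regenerate it\n");
    }
    irq_ack(&line);
    return 0;
}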

>> I don't think any of the problems raised when this was initially
>> posted have been addressed.
> So? I raise them now. Have you tried the suggested scenario, and were
> you able to reproduce the problem?

It is the same issue, just another scenario.

>> Further, I don't think that playing catch-up with interrupts is
>> always the best course of action.
> Agree. Playing catch-up with interrupts is not always the best course
> of action. But sometimes there is no other choice.
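
For the archives, a sketch of what I understand the device-side
catch-up to look like (field names are mine, not the patch): the timer
counts coalesced ticks and re-fires one on each guest ack instead of
dropping them.

#include <stdio.h>

/* Illustrative sketch of IRQ regeneration for a periodic timer
 * (RTC/PIT style); field names are mine, not the actual patch. */
typedef struct TimerDev {
    int irq_pending;        /* previous IRQ not yet acked by the guest */
    int coalesced;          /* ticks we failed to deliver */
} TimerDev;

/* Called on every timer period. */
static void timer_tick(TimerDev *d)
{
    if (d->irq_pending) {
        d->coalesced++;     /* guest is behind: remember the lost tick */
    } else {
        d->irq_pending = 1; /* normal delivery */
    }
}

/* Called when the guest acks the interrupt. */
static void timer_ack(TimerDev *d)
{
    d->irq_pending = 0;
    if (d->coalesced > 0) {
        d->coalesced--;
        d->irq_pending = 1; /* immediately regenerate one missed tick */
    }
}

int main(void)
{
    TimerDev d = { 0, 0 };

    timer_tick(&d);         /* delivered */
    timer_tick(&d);         /* coalesced: guest is busy */
    timer_tick(&d);         /* coalesced again */
    timer_ack(&d);          /* ack: one missed tick is reinjected */
    printf("ticks still owed: %d\n", d.coalesced);  /* prints 1 */
    return 0;
}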

>> As I've said repeatedly in the past, any sort of time drift fix needs
>> to have a lot of repeatable data posted with it.
>>
>> How much does this improve things with Windows?
> The time drift is eliminated. If there is a spike in the load, guest
> time may slow down, but after that it catches up (this happens only
> during very high loads, though).

Gleb, can you please provide more details:
- What's the host's kernel version exactly (including whether high-res
  timers and dynticks are configured)?
- What's the Windows version? Is it standard HAL (PIT) or ACPI (RTC),
  or both?
- The detailed scenario you used (example: I copied the entire
  c:/windows directory, etc.).
- Without the patch, what is the time drift after x seconds on the host?
- With the patch, is there a drift? Is there increased CPU consumption,
  etc.?

Btw: I ack the whole thing, including the problem, the scenario and the
solution. The first '1/3' was not received by my mailer.

>> How does having a high resolution timer in the host affect the
>> problem to begin with?
> My test machine has a relatively recent kernel that uses high
> resolution timers for timekeeping. Also, the problem is that the
> guest does not get enough time to process the injected interrupt.
> How can a high-res timer help here?

>> How do Linux guests behave with this?
> Linux guests don't use the PIT or RTC for timekeeping. They are
> completely unaffected by these patches.

It will probably also drift with clock=pit on the guest kernel cmdline.

>> Even the Windows PV spec calls out three separate approaches to
>> dealing with missed interrupts and provides an interface for the
>> host to query the guest as to which one should be used. I don't
>> think any solution that uses a single technique is going to be
>> correct.
> That is what I found in the Microsoft docs:
>
>   If a virtual processor is unavailable for a sufficiently long period
>   of time, a full timer period may be missed. In this case, the
>   hypervisor uses one of two techniques. The first technique involves
>   timer period modulation, in effect shortening the period until the
>   timer "catches up".
>
>   If a significant number of timer signals have been missed, the
>   hypervisor may be unable to compensate by using period modulation.
>   In this case, some timer expiration signals may be skipped
>   completely. For timers that are marked as lazy, the hypervisor uses
>   a second technique for dealing with the situation in which a virtual
>   processor is unavailable for a long period of time. In this case,
>   the timer signal is deferred until this virtual processor is
>   available. If it doesn't become available until shortly before the
>   next timer is due to expire, it is skipped entirely.
>
> The first technique is what I am trying to introduce with this patch
> series.
>
> --
> 			Gleb.
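
To spell out the period-modulation technique the quote describes, a
small sketch under my own assumptions (the halved period and the tick
accounting are mine, not from the spec or the patches): while ticks
are owed, the timer is rearmed with a shortened period until the
backlog drains.

#include <stdio.h>

#define PERIOD_NS 1000000       /* nominal timer period: 1 ms */

/* Illustrative only: shorten the period while the guest owes ticks so
 * its clock catches up instead of drifting. */
static long next_period(int ticks_owed)
{
    return ticks_owed > 0 ? PERIOD_NS / 2 : PERIOD_NS;
}

int main(void)
{
    int owed = 3;               /* say three ticks were missed under load */
    long t = 0;

    while (owed > 0) {
        t += next_period(owed);
        owed--;                 /* each extra tick repays one missed one */
        printf("catch-up tick at %ld ns, %d still owed\n", t, owed);
    }
    t += next_period(owed);     /* back to the nominal period */
    printf("nominal tick at %ld ns\n", t);
    return 0;
}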