From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=49016 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Pkf0D-00059d-L3
	for qemu-devel@nongnu.org; Wed, 02 Feb 2011 10:52:19 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1Pkf0B-0007SB-KX
	for qemu-devel@nongnu.org; Wed, 02 Feb 2011 10:52:16 -0500
Received: from goliath.siemens.de ([192.35.17.28]:28406)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1Pkf0B-0007Rb-7B
	for qemu-devel@nongnu.org; Wed, 02 Feb 2011 10:52:15 -0500
Message-ID: <4D497DAB.7010901@siemens.com>
Date: Wed, 02 Feb 2011 16:52:11 +0100
From: Jan Kiszka <jan.kiszka@siemens.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] KVM: Windows 64-bit troubles with user space irqchip
References: <4D4946F7.1070702@siemens.com> <20110202123532.GF14984@redhat.com>
	<4D4952FA.8020300@siemens.com> <4D49569F.6060207@redhat.com>
	<4D496A8D.90000@siemens.com> <4D496BC5.10807@redhat.com>
	<4D496D77.2010405@siemens.com> <4D496FA6.8070301@siemens.com>
	<4D49738D.7080404@redhat.com> <4D4979BD.6080900@siemens.com>
	<20110202154611.GR14984@redhat.com>
In-Reply-To: <20110202154611.GR14984@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>, kvm <kvm@vger.kernel.org>, qemu-devel <qemu-devel@nongnu.org>

On 2011-02-02 16:46, Gleb Natapov wrote:
> On Wed, Feb 02, 2011 at 04:35:25PM +0100, Jan Kiszka wrote:
>> On 2011-02-02 16:09, Avi Kivity wrote:
>>> On 02/02/2011 04:52 PM, Jan Kiszka wrote:
>>>> On 2011-02-02 15:43, Jan Kiszka wrote:
>>>>>  On 2011-02-02 15:35, Avi Kivity wrote:
>>>>>>  On 02/02/2011 04:30 PM, Jan Kiszka wrote:
>>>>>>>  On 2011-02-02 14:05, Avi Kivity wrote:
>>>>>>>>   On 02/02/2011 02:50 PM, Jan Kiszka wrote:
>>>>>>>>>>>
>>>>>>>>>>    Oops, -smp 1. With -smp 2 it boots almost completely and then hangs.
>>>>>>>>>
>>>>>>>>>   Ah, good (or not good). With Windows 2003 Server, I actually get a Blue
>>>>>>>>>   Screen (Stop 0x000000b8).
>>>>>>>>
>>>>>>>>   Userspace APIC is broken since it may run with an outdated cr8, does
>>>>>>>>   reverting 27a4f7976d5 help?
>>>>>>>
>>>>>>>  Can you elaborate on what is broken? The way hw/apic.c maintains the
>>>>>>>  tpr? Would it make sense to compare this against the in-kernel model? Or
>>>>>>>  do you mean something else?
>>>>>>
>>>>>>  The problem, IIRC, was that we look up the TPR but it may already have
>>>>>>  been changed by the running vcpu.  Not 100% sure.
>>>>>>
>>>>>>  If that is indeed the problem then the fix would be to process the APIC
>>>>>>  in vcpu context (which is what the kernel does - we set a bit in the IRR
>>>>>>  and all further processing is synchronous).
>>>>>
>>>>>  You mean: user space changes the tpr value while the vcpu is in KVM_RUN,
>>>>>  then we return from the kernel and overwrite the tpr in the apic with
>>>>>  the vcpu's view, right?
>>>>
>>>> Hmm, probably rather that there is a discrepancy between tpr and irr.
>>>> The latter is changed asynchronously wrt the vcpu, the former wrt
>>>> the user space device model.
>>>
>>> And yet, both are synchronized via qemu_mutex.  So we're still missing 
>>> something in this picture.
>>>
>>>> Run apic_set_irq on the vcpu?
>>>
>>> static void apic_set_irq(APICState *s, int vector_num, int trigger_mode)
>>> {
>>>      apic_irq_delivered += !get_bit(s->irr, vector_num);
>>>
>>>      trace_apic_set_irq(apic_irq_delivered);
>>>
>>>      set_bit(s->irr, vector_num);
>>>
>>> This is even more async with kernel irqchip
>>>
>>>      if (trigger_mode)
>>>          set_bit(s->tmr, vector_num);
>>>      else
>>>          reset_bit(s->tmr, vector_num);
>>>
>>> This is protected by qemu_mutex
>>>
>>>      apic_update_irq(s);
>>>
>>> This will be run the next time the vcpu exits, via apic_get_interrupt().
>>
>> The decision to pend an IRQ (and potentially kick the vcpu) takes place
>> immediately in apic_update_irq. And it is based on current irr as well
>> as tpr. But we update again when user space returns with a new value.
>>
>>>
>>> }
>>>
>>> Did you check whether reverting that commit helps?
>>>
>>
>> Just did so, and I can no longer reproduce the problem. Hmm...
>>
> If there is no problem in the logic of this commit (and I do not see
> one yet), then somewhere we fail to kick the vcpu when an interrupt that
> should be handled arrives?

I'm not yet confident about the logic of the kernel patch: mov to cr8 is
serializing. If the guest raises the tpr and then signals this to the
other vcpus with a subsequent, non-vm-exiting instruction, one of those
could inject an interrupt with a priority higher than the previous tpr
but lower than the current one. QEMU user space would accept this
interrupt - and would likely surprise the guest. Am I missing something?
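
To make the scenario concrete, here is a minimal sketch of the priority
comparison I have in mind. It is not the code from hw/apic.c: all function
names are hypothetical, and it assumes a simplified model that compares
only the priority class (upper four bits) of the vector against the tpr,
ignoring the isr and ppr entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector or tpr value: its upper four bits. */
static unsigned prio_class(uint8_t value)
{
    return value >> 4;
}

/* cr8 carries tpr bits 7:4; expand it to a full tpr value. */
static uint8_t cr8_to_tpr(uint8_t cr8)
{
    assert(cr8 <= 0x0f);
    return (uint8_t)(cr8 << 4);
}

/*
 * Simplified acceptance check (assumption: isr/ppr are ignored):
 * a vector is accepted only if its class is above the tpr's class.
 */
static bool irq_accepted(uint8_t vector, uint8_t tpr)
{
    return prio_class(vector) > prio_class(tpr);
}

/*
 * The suspected race: user space still holds stale_cr8 while the guest
 * has already raised cr8 to current_cr8. Returns true if the vector
 * passes the stale check but would be masked by the current tpr.
 */
static bool surprises_guest(uint8_t vector, uint8_t stale_cr8,
                            uint8_t current_cr8)
{
    return irq_accepted(vector, cr8_to_tpr(stale_cr8)) &&
           !irq_accepted(vector, cr8_to_tpr(current_cr8));
}
```

With a stale cr8 of 2 and a current cr8 of 8, vector 0x51 (class 5) falls
into the problematic window, while 0x91 (class 9) and 0x11 (class 1) do
not.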

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux