From mboxrd@z Thu Jan  1 00:00:00 1970
From: Razvan Cojocaru <rcojocaru@bitdefender.com>
Subject: Re: Deadlock in stdvga_mem_accept() with emulation
Date: Mon, 13 Jul 2015 12:42:24 +0300
Message-ID: <55A38800.3060307@bitdefender.com>
References: <55A36D67.3050203@bitdefender.com> <55A37285.3050304@citrix.com>
	<55A37BD5.2070508@bitdefender.com>
	<9AAE0902D5BC7E449B7C8E4E778ABCD02F4BF5BC@AMSPEX01CL02.citrite.net>
	<55A37ED9.7050909@bitdefender.com>
	<9AAE0902D5BC7E449B7C8E4E778ABCD02F4BF5F8@AMSPEX01CL02.citrite.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD02F4BF5F8@AMSPEX01CL02.citrite.net>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Paul Durrant <Paul.Durrant@citrix.com>, Andrew Cooper <Andrew.Cooper3@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>
List-Id: xen-devel@lists.xenproject.org

On 07/13/2015 12:05 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>> Sent: 13 July 2015 10:03
>> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Keir (Xen.org); Jan Beulich
>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
>>
>> On 07/13/2015 12:01 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>>>> Sent: 13 July 2015 09:50
>>>> To: Andrew Cooper; xen-devel@lists.xen.org
>>>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
>>>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
>>>>
>>>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>>>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm battling the following hypervisor crash with current staging:
>>>>>>
>>>>>> (d2) Invoking ROMBIOS ...
>>>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>>>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>>>>> (XEN) CPU:    7
>>>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
>>>> 000000000000c11e
>>>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
>> ffff83041e687978
>>>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
>>>> 0000000000000000
>>>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
>>>> 0000000000000000
>>>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
>> ffff83040eb37278
>>>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
>>>> 00000000001526e0
>>>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
>>>> 0000000000000282
>>>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
>>>> ffff83040eb37268
>>>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
>>>> 0000000000000001
>>>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
>>>> 00000000000c253f
>>>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
>>>> ffff83040eb37328
>>>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
>>>> 0000000000000000
>>>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
>>>> 000000000000053f
>>>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
>>>> 00000000000000c2
>>>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
>>>> ffff82d08028c620
>>>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
>>>> ffff83040eb37398
>>>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
>>>> ffff830403791bf0
>>>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
>>>> 0000000000000001
>>>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
>>>> ffff83040eb373c8
>>>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
>>>> 0000000000000001
>>>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
>>>> 00000000000a0001
>>>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
>>>> ffff83040eb37478
>>>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
>>>> 0000000000000001
>>>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
>>>> 0000000000000001
>>>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
>>>> 0100000000000000
>>>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
>>>> ffff83040eb374c8
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>>>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>>>>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>>>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>>>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>>>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>>>>> (XEN)    [<ffff82d0801cb700>]
>> handle_mmio_with_translation+0x44/0x46
>>>>>> (XEN)    [<ffff82d0801c27f6>]
>> hvm_hap_nested_page_fault+0x15f/0x589
>>>>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>>>>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>>>>>> (XEN)
>>>>>> (XEN)
>>>>>> (XEN) ****************************************
>>>>>> (XEN) Panic on CPU 7:
>>>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>>>>> (XEN) [error_code=0000]
>>>>>> (XEN) ****************************************
>>>>>>
>>>>>> At first I thought it was caused by V5 of the vm_event-based
>>>>>> introspection series, but I've rolled it back enough to apply V4 on top
>>>>>> of it (which has been thoroughly tested on Thursday), and it still
>>>>>> happens, so this would at least appear to be unrelated at this point
>>>>>> (other than the fact that our use case is maybe somewhat unusual with
>>>>>> heavy emulation).
>>>>>>
>>>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>>>>> issue a heads-up here as soon as possible, in case the problem is
>>>>>> obvious for somebody and it helps getting it fixed sooner.
>>>>>
>>>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>>>>> deliberate split between stdvga_mem_accept() and
>>>> stdvga_mem_complete()
>>>>> about locking and unlocking the stdvga lock.
>>>>>
>>>>> At a guess, the previous chain of execution accidentally omitted the
>>>>> stdvga_mem_complete() call.
>>>>
>>>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
>>>> test a fix if one is provided, but I don't know enough about that code
>>>> to go mess with it...
>>>>
>>>
>>> The problem appears to be that __hvm_copy() is calling this code before
>> establishing the type of the page:
>>>
>>>         /*
>>>          * No need to do the P2M lookup for internally handled MMIO,
>> benefiting
>>>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
>>>          * - newer Windows (like Server 2012) for HPET accesses.
>>>          */
>>>         if ( !nestedhvm_vcpu_in_guestmode(curr)
>>>              && is_hvm_vcpu(curr)
>>>              && hvm_mmio_internal(gpa) )
>>>             return HVMCOPY_bad_gfn_to_mfn;
>>>
>>> ...and that call to hvm_mmio_internal is trying to re-acquire the lock. The
>> big question is why on earth was I not hitting this every time in testing too...
>> This patch series has gone through some pretty rigorous tests. Anyway, I will
>> post a fix very shortly (since I know what needs to be done).
>>>
>>>   Paul
>>
>> Thanks Paul! I appreciate the quick response.
>>
> 
> While I'm prepping, try this:
> 
> diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
> index ebb3b42..08c797c 100644
> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handle
>  {
>      struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
> 
> +    /*
> +     * The range check must be done without taking any locks, to avoid
> +     * deadlock when hvm_mmio_internal() is called from
> +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
> +     */
> +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> +        return 0;
> +
>      spin_lock(&s->lock);
> 
> -    if ( !s->stdvga ||
> -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> +    if ( !s->stdvga )
>          goto reject;
> 
>      if ( p->dir == IOREQ_WRITE && p->count > 1 )

The crash is gone, so in that respect:

Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

But my guest seems stuck at boot for some reason, the last line of xl
dmesg is:

(XEN) stdvga.c:151:d3v0 leaving stdvga mode

and this is the guest state:

# ./xenctx -a 3
rip: fffff80001497de0
flags: 00000002 nz
rsp: fffff80001479538
rax: 0000000000000000   rcx: 00000000000003fe   rdx: 00000000000003fe
rbx: fffff8000149a110   rsi: fffff80001479601   rdi: 0000000000009860
rbp: 0000000000000000    r8: fffff8000149a174    r9: 0000000000000000
r10: 00000000000027d7   r11: fffff800014795a0   r12: 0000000000000000
r13: 0000000000000004   r14: 0000000000000002   r15: 0000000000000000
 cs: 0010        ss: 0018        ds: 002b        es: 002b
 fs: 0053 @ 0000000000000000
 gs: 002b @ fffff80001805d00/fffff80001805d00

cr0: 0000000080050031
cr2: 0000000000000030
cr3: 0000000000187000
cr4: 00000000000006b8

dr0: 0000000000000000
dr1: 0000000000000000
dr2: 0000000000000000
dr3: 0000000000000000
dr6: 00000000fffe0ff0
dr7: 0000000000000400
Code (instr addr fffff80001497de0)
90 8a c2 0f b7 d1 ee c3 90 90 90 90 90 90 90 90 90 0f b7 d1 ec <c3> 90
90 90 90 90 90 90 88 11 f0

But it's too early to tell that this isn't something my V5 code is now
doing, so I'll come back if it turns out to be unrelated.


Thanks,
Razvan