All of lore.kernel.org
 help / color / mirror / Atom feed
From: Razvan Cojocaru <rcojocaru@bitdefender.com>
To: Paul Durrant <Paul.Durrant@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>
Subject: Re: Deadlock in stdvga_mem_accept() with emulation
Date: Mon, 13 Jul 2015 12:03:21 +0300	[thread overview]
Message-ID: <55A37ED9.7050909@bitdefender.com> (raw)
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD02F4BF5BC@AMSPEX01CL02.citrite.net>

On 07/13/2015 12:01 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>> Sent: 13 July 2015 09:50
>> To: Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
>>
>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>>> Hello,
>>>>
>>>> I'm battling the following hypervisor crash with current staging:
>>>>
>>>> (d2) Invoking ROMBIOS ...
>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>>> (XEN) CPU:    7
>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
>> 000000000000c11e
>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
>> 0000000000000000
>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
>> 0000000000000000
>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
>> 00000000001526e0
>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
>> 0000000000000282
>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
>> ffff83040eb37268
>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
>> 0000000000000001
>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
>> 00000000000c253f
>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
>> ffff83040eb37328
>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
>> 0000000000000000
>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
>> 000000000000053f
>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
>> 00000000000000c2
>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
>> ffff82d08028c620
>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
>> ffff83040eb37398
>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
>> ffff830403791bf0
>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
>> 0000000000000001
>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
>> ffff83040eb373c8
>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
>> 0000000000000001
>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
>> 00000000000a0001
>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
>> ffff83040eb37478
>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
>> 0000000000000001
>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
>> 0000000000000001
>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
>> 0100000000000000
>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
>> ffff83040eb374c8
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>>> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
>>>> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
>>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 7:
>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>>> (XEN) [error_code=0000]
>>>> (XEN) ****************************************
>>>>
>>>> At first I thought it was caused by V5 of the vm_event-based
>>>> introspection series, but I've rolled it back enough to apply V4 on top
>>>> of it (which has been thoroughly tested on Thursday), and it still
>>>> happens, so this would at least appear to be unrelated at this point
>>>> (other than the fact that our use case is maybe somewhat unusual with
>>>> heavy emulation).
>>>>
>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>>> issue a heads-up here as soon as possible, in case the problem is
>>>> obvious for somebody and it helps getting it fixed sooner.
>>>
>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>>> deliberate split between stdvga_mem_accept() and
>> stdvga_mem_complete()
>>> about locking and unlocking the stdvga lock.
>>>
>>> At a guess, the previous chain of execution accidentally omitted the
>>> stdvga_mem_complete() call.
>>
>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
>> test a fix if one is provided, but I don't know enough about that code
>> to go mess with it...
>>
> 
> The problem appears to be that __hvm_copy() is calling this code before establishing the type of the page:
> 
>         /*
>          * No need to do the P2M lookup for internally handled MMIO, benefiting
>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
>          * - newer Windows (like Server 2012) for HPET accesses.
>          */
>         if ( !nestedhvm_vcpu_in_guestmode(curr)
>              && is_hvm_vcpu(curr)
>              && hvm_mmio_internal(gpa) )
>             return HVMCOPY_bad_gfn_to_mfn;
> 
> ...and that call to hvm_mmio_internal is trying to re-acquire the lock. The big question is why on earth was I not hitting this every time in testing too... This patch series has gone through some pretty rigorous tests. Anyway, I will post a fix very shortly (since I know what needs to be done).
> 
>   Paul

Thanks Paul! I appreciate the quick response.


Cheers,
Razvan

  reply	other threads:[~2015-07-13  9:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-13  7:48 Deadlock in stdvga_mem_accept() with emulation Razvan Cojocaru
2015-07-13  8:10 ` Andrew Cooper
2015-07-13  8:50   ` Razvan Cojocaru
2015-07-13  9:00     ` Andrew Cooper
2015-07-13  9:02       ` Razvan Cojocaru
2015-07-13  9:01     ` Paul Durrant
2015-07-13  9:03       ` Razvan Cojocaru [this message]
2015-07-13  9:05         ` Paul Durrant
2015-07-13  9:27           ` Jan Beulich
2015-07-13  9:30             ` Paul Durrant
2015-07-13  9:41               ` Jan Beulich
2015-07-13  9:42           ` Razvan Cojocaru
2015-07-13 10:11             ` Paul Durrant
2015-07-13 12:04               ` Paul Durrant
2015-07-13 12:07                 ` Razvan Cojocaru
2015-07-13 12:08                   ` Paul Durrant
2015-07-13  8:51   ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55A37ED9.7050909@bitdefender.com \
    --to=rcojocaru@bitdefender.com \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=Paul.Durrant@citrix.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.