From: Quentin Casasnovas <quentin.casasnovas@oracle.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Quentin Casasnovas" <quentin.casasnovas@oracle.com>,
x86 <x86@kernel.org>, kvm <kvm@vger.kernel.org>,
lkml <linux-kernel@vger.kernel.org>,
"Eugene Korenevsky" <ekorenevsky@gmail.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"H . Peter Anvin" <hpa@zytor.com>,
linux-stable <stable@vger.kernel.org>
Subject: Re: [PATCH] KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.
Date: Wed, 29 Jun 2016 19:25:54 +0200
Message-ID: <20160629172553.GC5318@chrystal>
In-Reply-To: <b425cbd1-0753-fe11-4a88-0b1adbc0dd46@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 2443 bytes --]
On Fri, Jun 24, 2016 at 03:10:03PM +0200, Paolo Bonzini wrote:
> On 24/06/2016 15:04, Quentin Casasnovas wrote:
> > On Thu, Jun 23, 2016 at 06:03:01PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 18/06/2016 11:01, Quentin Casasnovas wrote:
> >>> Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
> >>> Developer Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
> >>> Control Structure", I found that we're enforcing that the destination
> >>> operand is NOT located in a read-only data segment or any code segment when
> >>> the L1 is in long mode - BUT that check should only happen when it is in
> >>> protected mode.
> >>>
> >>> Shuffling the code a bit to make our emulation follow the specification
> >>> allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
> >>> without problems.
> >>
> >> That's great, and I'm applying the patch, but it's also pretty weird. :)
> >> Do you have a pointer to Xen source code that does a VMREAD into a
> >> read-only data segment or a code segment?
> >
> > It is indeed pretty weird. Looking at the Xen stack trace, it looks like
> > the vmread is writing to an on-stack buffer, and surely it must be writable
> > so I wonder if Xen might not be using an executable stack for some reason?
> > That would be a bit scary so I'm surely missing something.
> >
> > Is there an easy way to know from my KVM host the different segment
> > permissions set up by the guest?
>
> Remove your patch, call dump_vmcs() where the #GP is injected, and
> you'll find the VMCS (including segment permissions, but not the
> instruction info field---you probably should add it) in dmesg.
>
Thanks for the heads up :)
I've had a bit more time to spend on this this morning, and attached is the
VMCS dump. I've looked at the vmcs_instruction_info and it appears the
segment referenced is SS (which is in sync with the backtrace, where the
instruction causing the vmexit is "vmread %rbp, %rbp"), and it has awkward
attributes:
SS: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
The segment type in the low four bits is zero, so KVM's VMX emulation was
injecting the #GP(0) because we were about to write to what it sees as a
read-only data segment. At least the stack isn't executable from what I can
tell!
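
For readers following along, here is a minimal sketch in C of how the two
values from the dump can be decoded. It is not KVM's code: the helper names
are made up for illustration, and the bit positions are assumptions taken
from the Intel SDM layouts (segment register in bits 17:15 of the
instruction-information field, segment type in bits 3:0 and the "unusable"
flag in bit 16 of the access-rights field). The write check only mirrors the
rule described in the patch, i.e. fail for a read-only data segment or any
code segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Segment register numbering assumed for the instruction-information field. */
enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/* Hypothetical helper: bits 17:15 select the segment of the memory operand. */
static inline unsigned int info_seg_reg(uint32_t info)
{
	return (info >> 15) & 7;
}

/* Hypothetical helper: bits 3:0 of the access rights hold the segment type. */
static inline unsigned int ar_type(uint32_t ar)
{
	return ar & 0xf;
}

/* Hypothetical helper: bit 16 of the access rights marks an unusable segment. */
static inline bool ar_unusable(uint32_t ar)
{
	return (ar >> 16) & 1;
}

/*
 * Sketch of the write check: true for a read-only data segment (neither the
 * code bit 3 nor the writable bit 1 is set) or for any code segment (bit 3).
 */
static inline bool type_blocks_write(unsigned int type)
{
	return (type & 0xa) == 0 || (type & 0x8) != 0;
}
```

Under those assumptions, info_seg_reg(0xe2614920) yields SEG_SS and
ar_type(0x1c000) yields 0, which type_blocks_write() rejects. The same
decoding also reports bit 16 set for attr=0x1c000, so the dump appears to
describe an unusable SS rather than a genuinely read-only one.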
Attached is the full VMCS dump, where I've added a printk() to show the
'type' (all zeroes) and vmcs_instruction_info in case my above analysis is
complete nonsense.
Quentin
[-- Attachment #2: vmcs_dump_xen_vmread.txt --]
[-- Type: text/plain, Size: 2900 bytes --]
[ 9853.506447] kvm: wr: read-only segment type==0, info=e2614920
[ 9853.506464] *** Guest State ***
[ 9853.506466] CR0: actual=0x0000000080050033, shadow=0x0000000080050033, gh_mask=fffffffffffffff7
[ 9853.506467] CR4: actual=0x00000000001526e0, shadow=0x00000000001526e0, gh_mask=fffffffffffff871
[ 9853.506467] CR3 = 0x000000007aa37000
[ 9853.506468] RSP = 0xffff83007b73fab0 RIP = 0xffff82d0801e629e
[ 9853.506469] RFLAGS=0x00000202 DR7 = 0x0000000000000400
[ 9853.506470] Sysenter RSP=ffff83007b73ffc0 CS:RIP=e008:ffff82d08022c480
[ 9853.506471] CS: sel=0xe008, attr=0x0a09b, limit=0xffffffff, base=0x0000000000000000
[ 9853.506472] DS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506473] SS: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
[ 9853.506474] ES: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506475] FS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506476] GS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506477] GDTR: limit=0x0000efff, base=0xffff83007b4d7000
[ 9853.506478] LDTR: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
[ 9853.506479] IDTR: limit=0x00000fff, base=0xffff83007b4e3000
[ 9853.506480] TR: sel=0xe040, attr=0x0008b, limit=0x00000067, base=0xffff83007b4e6c80
[ 9853.506481] EFER = 0x0000000000000d00 PAT = 0x0000050100070406
[ 9853.506481] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
[ 9853.506482] Interruptibility = 00000000 ActivityState = 00000000
[ 9853.506483] *** Host State ***
[ 9853.506484] RIP = 0xffffffffa00f6daf RSP = 0xffff880131aafd00
[ 9853.506485] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[ 9853.506486] FSBase=00007fbf6bfff700 GSBase=ffff88021e240000 TRBase=ffff88021e253b40
[ 9853.506486] GDTBase=ffff88021e249000 IDTBase=ffffffffff57b000
[ 9853.506487] CR0=0000000080050033 CR3=0000000004b21000 CR4=00000000001426e0
[ 9853.506488] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff81a02740
[ 9853.506489] EFER = 0x0000000000000d01 PAT = 0x0407010600070106
[ 9853.506490] *** Control State ***
[ 9853.506491] PinBased=0000003f CPUBased=b6a06dfa SecondaryExec=000000eb
[ 9853.506491] EntryControls=0000d3ff ExitControls=002fefff
[ 9853.506492] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[ 9853.506493] VMEntry: intr_info=000000fc errcode=00000000 ilen=00000000
[ 9853.506494] VMExit: intr_info=00000000 errcode=00000000 ilen=00000006
[ 9853.506495] reason=00000017 qualification=0000000000000008
[ 9853.506495] IDTVectoring: info=00000000 errcode=00000000
[ 9853.506496] TSC Offset = 0xffffe8cdfc3ca592
[ 9853.506497] TPR Threshold = 0x00
[ 9853.506497] EPT pointer = 0x000000000467f01e
[ 9853.506498] Virtual processor ID = 0x0007