* RE: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
@ 2006-04-28 9:19 Petersson, Mats
2006-04-28 9:24 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Petersson, Mats @ 2006-04-28 9:19 UTC (permalink / raw)
To: Keir Fraser; +Cc: Khoa Huynh, xen-devel
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> Keir Fraser
> Sent: 28 April 2006 10:15
> To: Petersson, Mats
> Cc: Khoa Huynh; xen-devel
> Subject: Re: [Xen-devel] [PATCH] Calculate correct
> instruction length for data-fault VM exits on VT-x systems
>
>
> On 28 Apr 2006, at 10:02, Petersson, Mats wrote:
>
> > I'll look at your previous suggestion of merging the MMIO emulation
> > into x86_emulate later on today. We probably do need to sum up the
> > length and pass it back to the caller - as that code
> doesn't know how
> > to update the correct field of the different processor
> architectures
> > (vmcb vs. vmcs vs. stack-frame for Para-virtual machine). But it
> > shouldn't be particularly hard to achieve this.
>
> The emulator uses and updates the eip field of the passed-in
> regs structure. We may want to change this interface in
> future by having the caller explicitly pass in a buffer
> containing the instruction, and the number of valid bytes in
> the buffer. Or add a 'fetch_insn_byte'
> callback hook to the emulator interface.
I think passing a buffer is the best choice here. And I suppose we can
always stuff vmc[bs]->rip into regs->eip and pull it back out again when
we get back - using a wrapper function may be the easiest way to achieve
this (at least short term).
We will of course also need to get the communication with QEMU done in
some way.
I haven't spent any time looking at it so far...
--
Mats
>
> -- Keir
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 9:19 [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems Petersson, Mats
@ 2006-04-28 9:24 ` Keir Fraser
2006-04-29 1:20 ` Leendert van Doorn
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2006-04-28 9:24 UTC (permalink / raw)
To: Petersson, Mats; +Cc: Khoa Huynh, xen-devel
On 28 Apr 2006, at 10:19, Petersson, Mats wrote:
> I think passing a buffer is the best choice here. And I suppose we can
> always stuff vmc[bs]->rip into regs->eip and pull it back out again
> when
> we get back - using a wrapper function may be the easiest way to
> achieve
> this (at least short term).
Yes, I expect HVM users will want some kind of helpful wrapper around
the core emulator. I'm trying to keep the emulator itself very generic,
so it makes sense that some tailoring will be required in some usages.
> We will of course also need to get the communication with QEMU done in
> some way.
Yes, I don't imagine that is hard as long as you're prepared to copy
the guest's register state to and from qemu-dm. Even on x8664 it's only
a couple hundred bytes so unlikely to be a significant overhead. Then
you can simply have a copy of the emulator inside qemu-dm.
-- Keir
> I haven't spent any time looking at it so far...
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 9:24 ` Keir Fraser
@ 2006-04-29 1:20 ` Leendert van Doorn
2006-04-28 20:02 ` Anthony Liguori
2006-04-29 8:00 ` Keir Fraser
0 siblings, 2 replies; 19+ messages in thread
From: Leendert van Doorn @ 2006-04-29 1:20 UTC (permalink / raw)
To: Keir Fraser; +Cc: Petersson, Mats, xen-devel, Khoa Huynh
On Fri, 2006-04-28 at 10:24 +0100, Keir Fraser wrote:
> Yes, I expect HVM users will want some kind of helpful wrapper around
> the core emulator. I'm trying to keep the emulator itself very generic,
> so it makes sense that some tailoring will be required in some usages.
>
> > We will of course also need to get the communication with QEMU done in
> > some way.
>
> Yes, I don't imagine that is hard as long as you're prepared to copy
> the guest's register state to and from qemu-dm. Even on x8664 it's only
> a couple hundred bytes so unlikely to be a significant overhead. Then
> you can simply have a copy of the emulator inside qemu-dm.
>
> -- Keir
>
> > I haven't spent any time looking at it so far...
Here is something I've been toying with lately, inspired by the work
Steve's students presented at Eurosys.
The way we are currently doing real-mode emulation for VT-x is a royal
pain and getting the semantics right for all big real mode uses (Solaris
9, SLES's gfxboot) is next to impossible in the current framework. What
I was thinking about was to switch back and forth between a VT-x
partition and a full emulator. The obvious choice for this would be to
put back the qemu instruction emulator into qemu-dm and handle all
real-mode instructions there. As soon as CR0.PE is set to 0, we do an
upcall to the emulator. Once CR0.PE=1 and we have emulated some
threshold number of instructions (1000?) we switch back to the VMX
partition. This would allow us to amortize the cost of doing a full
context save and restore. Obviously, this is only a concern for VT-x,
but SVM could benefit in the following scenario:
We could do a similar thing for I/O operations. Basically, generate an
upcall into qemu-dm on an MMIO or PIO exit and let qemu-dm deal with it.
It can do the same trick and emulate a number of instructions (1000?)
before returning to the HVM partition. This will eliminate expensive
VMCS/VMCB exits on subsequent I/O operations (just consider doing a
block write on an IDE device in PIO mode, this is common behavior). It
will also eliminate the need for the MMIO instruction emulator in the
hypervisor.
The only difficulty is that the hypervisor keeps some of the device
state vpit and *pics and shotcuts operations to them. This state needs
to be exposed to qemu-dm so that it is saved and restored on every
qemu-dm invocation. I need to verify this, but as far as I'm aware, all
the accesses to the devices emulated in the hypervisor are PIO
operations. These are easy to decode with the exit information that is
provided by VT-x and SVM, so they don't need a a full instruction
decoder.
Comments?
Leendert
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 1:20 ` Leendert van Doorn
@ 2006-04-28 20:02 ` Anthony Liguori
2006-04-29 8:00 ` Keir Fraser
1 sibling, 0 replies; 19+ messages in thread
From: Anthony Liguori @ 2006-04-28 20:02 UTC (permalink / raw)
To: leendert; +Cc: Petersson, Mats, xen-devel, Khoa Huynh
Leendert van Doorn wrote:
> On Fri, 2006-04-28 at 10:24 +0100, Keir Fraser wrote:
>
>
>> Yes, I expect HVM users will want some kind of helpful wrapper around
>> the core emulator. I'm trying to keep the emulator itself very generic,
>> so it makes sense that some tailoring will be required in some usages.
>>
>>
>>> We will of course also need to get the communication with QEMU done in
>>> some way.
>>>
>> Yes, I don't imagine that is hard as long as you're prepared to copy
>> the guest's register state to and from qemu-dm. Even on x8664 it's only
>> a couple hundred bytes so unlikely to be a significant overhead. Then
>> you can simply have a copy of the emulator inside qemu-dm.
>>
>> -- Keir
>>
>>
>>> I haven't spent any time looking at it so far...
>>>
>
> Here is something I've been toying with lately, inspired by the work
> Steve's students presented at Eurosys.
>
> The way we are currently doing real-mode emulation for VT-x is a royal
> pain and getting the semantics right for all big real mode uses (Solaris
> 9, SLES's gfxboot) is next to impossible in the current framework. What
> I was thinking about was to switch back and forth between a VT-x
> partition and a full emulator. The obvious choice for this would be to
> put back the qemu instruction emulator into qemu-dm and handle all
> real-mode instructions there. As soon as CR0.PE is set to 0, we do an
> upcall to the emulator. Once CR0.PE=1 and we have emulated some
> threshold number of instructions (1000?) we switch back to the VMX
> partition. This would allow us to amortize the cost of doing a full
> context save and restore. Obviously, this is only a concern for VT-x,
> but SVM could benefit in the following scenario:
>
This could be extended to support systems without VT/SVM. Instead of
dropping back when CR0.PE=1 (after 1000 instructions), if VT/SVM isn't
available, you wait until a switch to ring 3. This would essentially do
what QEmu + qvm86 does today.
I'd be really happy to see this in Xen since it would let you use
unmodified guests (even when VT/SVM is not present).
Regards,
Anthony Liguori
> We could do a similar thing for I/O operations. Basically, generate an
> upcall into qemu-dm on an MMIO or PIO exit and let qemu-dm deal with it.
> It can do the same trick and emulate a number of instructions (1000?)
> before returning to the HVM partition. This will eliminate expensive
> VMCS/VMCB exits on subsequent I/O operations (just consider doing a
> block write on an IDE device in PIO mode, this is common behavior). It
> will also eliminate the need for the MMIO instruction emulator in the
> hypervisor.
>
> The only difficulty is that the hypervisor keeps some of the device
> state vpit and *pics and shotcuts operations to them. This state needs
> to be exposed to qemu-dm so that it is saved and restored on every
> qemu-dm invocation. I need to verify this, but as far as I'm aware, all
> the accesses to the devices emulated in the hypervisor are PIO
> operations. These are easy to decode with the exit information that is
> provided by VT-x and SVM, so they don't need a a full instruction
> decoder.
>
> Comments?
>
> Leendert
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 1:20 ` Leendert van Doorn
2006-04-28 20:02 ` Anthony Liguori
@ 2006-04-29 8:00 ` Keir Fraser
2006-04-29 14:54 ` Leendert van Doorn
1 sibling, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2006-04-29 8:00 UTC (permalink / raw)
To: leendert; +Cc: Petersson, Mats, xen-devel, Khoa Huynh
On 29 Apr 2006, at 02:20, Leendert van Doorn wrote:
> The only difficulty is that the hypervisor keeps some of the device
> state vpit and *pics and shotcuts operations to them. This state needs
> to be exposed to qemu-dm so that it is saved and restored on every
> qemu-dm invocation. I need to verify this, but as far as I'm aware, all
> the accesses to the devices emulated in the hypervisor are PIO
> operations. These are easy to decode with the exit information that is
> provided by VT-x and SVM, so they don't need a a full instruction
> decoder.
The APIC and IO-APIC are accessed via mmio. The former is written
fairly frequently with singleton updates (to the TPR and EOI registers)
so we'd want to carry on dealing with those directly in Xen I should
think. Still you'd have to deal with the case that one of the
Xen-emulated devices is accessed while emulating in qemu-dm -- as you
say you'd probably have to pull their state vectors out of Xen when
starting emulating. We'll need that for save/restore anyway though.
I don't know if this will make sense for emulated I/O but it does sound
like a very sane alternative to vmxassist for dealing with real mode.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 8:00 ` Keir Fraser
@ 2006-04-29 14:54 ` Leendert van Doorn
2006-04-29 10:39 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Leendert van Doorn @ 2006-04-29 14:54 UTC (permalink / raw)
To: Keir Fraser, Mats, xen-devel, Khoa Huynh
On Sat, 2006-04-29 at 09:00 +0100, Keir Fraser wrote:
> The APIC and IO-APIC are accessed via mmio. The former is written
> fairly frequently with singleton updates (to the TPR and EOI registers)
> so we'd want to carry on dealing with those directly in Xen I should
> think. Still you'd have to deal with the case that one of the
> Xen-emulated devices is accessed while emulating in qemu-dm -- as you
> say you'd probably have to pull their state vectors out of Xen when
> starting emulating. We'll need that for save/restore anyway though.
The state is already partially exposed to qemu-dm through the shared
global I/O data page (include/public/hvm/ioreq.h). This is easy to
extend so that a context switch doesn't involve copying device state.
This is also the place where we should store the vmx_assist_context
information that is required by the emulator.
The mmio *pic operations could just be handled by x86_emulate.
> I don't know if this will make sense for emulated I/O but it does sound
> like a very sane alternative to vmxassist for dealing with real mode.
The big advantage I see for I/O is that 1) we don't need the instruction
decode support anymore so it cleans up the hypervisor, 2) it has the
potential to greatly reducing the number of exits that are caused by I/O
by emulating subsequent I/O operations before returning to the HVM
partition. Especially for the older devices that we are currently
emulating this could be a major win, but even for modern devices where
you are manipulating ring buffers that reside in I/O space it would be a
win.
I don't think that moving the I/O decoding from the hypervisor to the
device model is going to be a major performance bottleneck. This cost is
dwarfed by the upcall into qemu-dm.
Leendert
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 14:54 ` Leendert van Doorn
@ 2006-04-29 10:39 ` Keir Fraser
2006-04-29 23:24 ` Leendert van Doorn
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2006-04-29 10:39 UTC (permalink / raw)
To: leendert; +Cc: Mats, xen-devel, Khoa Huynh
On 29 Apr 2006, at 15:54, Leendert van Doorn wrote:
> The state is already partially exposed to qemu-dm through the shared
> global I/O data page (include/public/hvm/ioreq.h). This is easy to
> extend so that a context switch doesn't involve copying device state.
> This is also the place where we should store the vmx_assist_context
> information that is required by the emulator.
Xen-emulated devices could have their state fetched on demand -- it
ought to be rare that qemu-dm ends up emulating an access to such a
device. Or have qemu call down into Xen to emulate those accesses.
Having a device model duplicated in both Xen and qemu, and
synchronising between the two, sounds a bit sketchy.
SSE state is another consideration, but maybe that's not too expensive
to save/restore/copy on every qemu-dm transition.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 10:39 ` Keir Fraser
@ 2006-04-29 23:24 ` Leendert van Doorn
2006-04-29 18:54 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Leendert van Doorn @ 2006-04-29 23:24 UTC (permalink / raw)
To: Keir Fraser; +Cc: Mats, xen-devel, Khoa Huynh
On Sat, 2006-04-29 at 11:39 +0100, Keir Fraser wrote:
>
> Xen-emulated devices could have their state fetched on demand -- it
> ought to be rare that qemu-dm ends up emulating an access to such a
> device. Or have qemu call down into Xen to emulate those accesses.
> Having a device model duplicated in both Xen and qemu, and
> synchronising between the two, sounds a bit sketchy.
In the BIOS (realmode) these devices are initialized and the subsequent
32-bit code depends on the proper initialization. For the common I/O
emulation case this should hopefully be rare.
Leendert
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 23:24 ` Leendert van Doorn
@ 2006-04-29 18:54 ` Keir Fraser
2006-04-30 1:37 ` Leendert van Doorn
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2006-04-29 18:54 UTC (permalink / raw)
To: leendert; +Cc: Mats, xen-devel, Khoa Huynh
On 30 Apr 2006, at 00:24, Leendert van Doorn wrote:
>> Xen-emulated devices could have their state fetched on demand -- it
>> ought to be rare that qemu-dm ends up emulating an access to such a
>> device. Or have qemu call down into Xen to emulate those accesses.
>> Having a device model duplicated in both Xen and qemu, and
>> synchronising between the two, sounds a bit sketchy.
>
> In the BIOS (realmode) these devices are initialized and the subsequent
> 32-bit code depends on the proper initialization. For the common I/O
> emulation case this should hopefully be rare.
How does this work now? Do we really have two copies of each device
model? I doubt that's implemented safely.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-29 18:54 ` Keir Fraser
@ 2006-04-30 1:37 ` Leendert van Doorn
2006-04-29 19:46 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Leendert van Doorn @ 2006-04-30 1:37 UTC (permalink / raw)
To: Keir Fraser; +Cc: Mats, xen-devel, Khoa Huynh
On Sat, 2006-04-29 at 19:54 +0100, Keir Fraser wrote:
> > In the BIOS (realmode) these devices are initialized and the subsequent
> > 32-bit code depends on the proper initialization. For the common I/O
> > emulation case this should hopefully be rare.
>
> How does this work now? Do we really have two copies of each device
> model? I doubt that's implemented safely.
Right now the realmode code runs inside the VMX partition where it is
partially emulated by vmxassist. So all accesses to the emulated devices
go through the hypervisor first before they (potentially) end up in
qemu-dm. When a transition is made to 32/64-bit code all the initialized
device state is still there.
The problem of keeping the the hypervisor state and the qemu-dm state in
sync is introduced when we alternate between emulation and real
execution. This becomes more interesting when we consider MP guests
where one CPU is running inside the emulator and another on the real
hardware.
Leendert
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-30 1:37 ` Leendert van Doorn
@ 2006-04-29 19:46 ` Keir Fraser
0 siblings, 0 replies; 19+ messages in thread
From: Keir Fraser @ 2006-04-29 19:46 UTC (permalink / raw)
To: leendert; +Cc: Mats, xen-devel, Khoa Huynh
On 30 Apr 2006, at 02:37, Leendert van Doorn wrote:
>> How does this work now? Do we really have two copies of each device
>> model? I doubt that's implemented safely.
>
> Right now the realmode code runs inside the VMX partition where it is
> partially emulated by vmxassist. So all accesses to the emulated
> devices
> go through the hypervisor first before they (potentially) end up in
> qemu-dm. When a transition is made to 32/64-bit code all the
> initialized
> device state is still there.
Ah yes, I forgot that the mmio decoder stuff in Xen handles real mode.
So that means that currently each device model is either implemented in
Xen or in qemu-dm, but not both (now that the heinous split PIT device
model is gone). That's a nice state of affairs.
> The problem of keeping the the hypervisor state and the qemu-dm state
> in
> sync is introduced when we alternate between emulation and real
> execution. This becomes more interesting when we consider MP guests
> where one CPU is running inside the emulator and another on the real
> hardware.
It'd obviously be better avoided altogether, unless we have to perform
horrible contortions to do so, or if doing so would hurt performance of
operations that we care about.
Don't get me wrong by the way: I do think that leveraging qemu's full
emulator, at least to get us out of the stickiest situations, is a very
good idea. I'm only concerned about some of the finer details.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
@ 2006-05-02 12:36 Petersson, Mats
0 siblings, 0 replies; 19+ messages in thread
From: Petersson, Mats @ 2006-05-02 12:36 UTC (permalink / raw)
To: Keir Fraser, Khoa Huynh; +Cc: xen-devel
> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: 29 April 2006 08:22
> To: Khoa Huynh
> Cc: Petersson, Mats; xen-devel
> Subject: Re: [Xen-devel] [PATCH] Calculate correct
> instruction length for data-fault VM exits on VT-x systems
>
>
> On 28 Apr 2006, at 19:10, Khoa Huynh wrote:
>
> > For these instructions, on Intel VT-x, the instruction
> length is valid
> > in VMCS. On AMD, there is a simple look-up function which
> determines
> > the length of the instruction which is passed in as a parameter.
> > We are good here.
>
> The Intel code only uses the instr-len field because it
> happens to be handy. Going to a look-up function in a
> separate file when you *know* at compile time what the
> instruction length must be is stupid, imo. We should only
> have to do that if the instruction needs some decoding for us
> to know its length (perhaps because of prefix bytes or
> effective address suffixes) and we are not otherwise going to
> be decoding the instruction as part of emulation.
We do have to forward the RIP to next instruction, and we don't know the
prefix and other things, so I don't think we can improve on the current
setup [although I noticed some time ago that you replaced the call to
calculate instrlen on HLT with a constant one, which I suppose is fine,
since IF there is a prefix (unlikely) to HLT, then we just re-execute it
without prefix, which doesn't make a whole lot of difference [Except we
MAY end up waiting for another interrupt, technically speaking, which
would probably not be a good thing - of course, I've never seen any code
with a prefix to HLT - but it's perfectly allowed to do redundant and
useless prefixes on any instruction, for example to change the alignment
of the next instruction].
>
> > I guess we can have a wrapper that takes as input the guest
> > instruction pointer (rip), fetches the whole MAX_INST_LEN (15-byte)
> > buffer starting at rip (make sure that we don't cross a page
> > boundary), and then passes that to the emulator. The
> emulator would
> > decode, emulate, and would include in its return the updated guest
> > instruction pointer (rip) and instruction length. This
> info will be
> > stuffed back into vmcs/vmcb/stack as appropriate. Is this more or
> > less what you have in mind ?
>
> Yes, exactly. It gets a bit trickier though -- we'll have to
> fill the buffer with up to 15 bytes. If we fail to get all of
> 15 bytes (perhaps because the instruction straddles a page
> boundary and the second page has been evicted since the
> instruction faulted) then we ought to tell the emulator how
> many bytes we actually copied and it shoudl return error if
> it ends up going off the end of teh instruction buffer.
> Alternatively we could re-enter the guest immediately if we
> cannot read
> 15 bytes from the EIP -- but that'll cause an infinite loop
> if the instruction itself doesn't straddle the page boundary
> as it won't trigger paging in the guest. But I/O instructions
> are unlikely to be in paged memory.
Yes - at the moment, we do not cope well with instructions that straddle
a page-boundary if the next page isn't present in memory.
On the other hand, I've been looking at just using the existing hooks
(in the ops structure) to read instruction bytes and I think that would
work just fine.
However, I haven't been looking through this entire thread to see what
the conclusion is, so this may all be moot...
--
Mats
>
> -- Keir
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
@ 2006-04-28 9:02 Petersson, Mats
2006-04-28 9:14 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Petersson, Mats @ 2006-04-28 9:02 UTC (permalink / raw)
To: Keir Fraser, Khoa Huynh; +Cc: xen-devel
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> Keir Fraser
> Sent: 28 April 2006 07:03
> To: Khoa Huynh
> Cc: xen-devel
> Subject: Re: [Xen-devel] [PATCH] Calculate correct
> instruction length for data-fault VM exits on VT-x systems
>
>
> On 28 Apr 2006, at 02:52, Khoa Huynh wrote:
>
> > It should be noted that VMX only uses this instrlen
> function when the
> > hypervisor needs the instruction-length info and that info is
> > undefined in VMCS, e.g., for MMIO instructions. In other
> cases where
> > the instruction-length field is valid in VMCS, the hypervisor
> > continues to get that info from VMCS (via vmread operation).
>
> I don't believe we need the instruction-length at all, and I
> suspect that the decoder could be removed from hvm/svm
> entirely. There are two broad categories of instruction I'm
> thinking of:
> 1. Instructions with their own VMEXIT reason code tend to
> be really simple so we know their length anyway and, if not,
> the instr-length field should be valid
> 2. For mmio instructions, the emulator can work out the
> length for itself and increment eip appropriately. There's no
> need to know the instruction length in advance of invoking
> the emulator.
>
> I guess there may be one or two instructions, particularly on
> AMD, where we aren't feeding the instruction to the mmio
> emulator and the instruction isn't fixed length, so perhaps
> we'll need a small decoder in hvm/svm for those. But even if
> so, it could be much simpler than what is there right now.
Yes, this is correct. There is a specific routine that takes as an
argument which instruction(s) we're looking for, and calculates it's
length, for this purpose [since we do know which instructions we are
looking for].
I'll look at your previous suggestion of merging the MMIO emulation into
x86_emulate later on today. We probably do need to sum up the length and
pass it back to the caller - as that code doesn't know how to update the
correct field of the different processor architectures (vmcb vs. vmcs
vs. stack-frame for Para-virtual machine). But it shouldn't be
particularly hard to achieve this.
--
Mats
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 9:02 Petersson, Mats
@ 2006-04-28 9:14 ` Keir Fraser
0 siblings, 0 replies; 19+ messages in thread
From: Keir Fraser @ 2006-04-28 9:14 UTC (permalink / raw)
To: Petersson, Mats; +Cc: Khoa Huynh, xen-devel
On 28 Apr 2006, at 10:02, Petersson, Mats wrote:
> I'll look at your previous suggestion of merging the MMIO emulation
> into
> x86_emulate later on today. We probably do need to sum up the length
> and
> pass it back to the caller - as that code doesn't know how to update
> the
> correct field of the different processor architectures (vmcb vs. vmcs
> vs. stack-frame for Para-virtual machine). But it shouldn't be
> particularly hard to achieve this.
The emulator uses and updates the eip field of the passed-in regs
structure. We may want to change this interface in future by having the
caller explicitly pass in a buffer containing the instruction, and the
number of valid bytes in the buffer. Or add a 'fetch_insn_byte'
callback hook to the emulator interface.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
@ 2006-04-28 1:52 Khoa Huynh
2006-04-28 2:41 ` Anthony Liguori
2006-04-28 6:03 ` Keir Fraser
0 siblings, 2 replies; 19+ messages in thread
From: Khoa Huynh @ 2006-04-28 1:52 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 2526 bytes --]
On VT-x systems, according to Intel VMX specifications,
the instruction-length information in VMCS on VM exits
is not always valid. The instruction-length field in
VMCS is ONLY valid in the follwing cases: when the VM
exit is caused by the execution of instructions that
cause the VM exit unconditionally or based on the
execution-control bitmap, a software exception (INT3
or INT0), or a task switch.
For VM exits caused by data faults (hardware exceptions),
the instruction-length field in VMCS is actually undefined.
In these cases, the hypervisor can derive the correct
instruction length by fetching bytes based on the guest
instruction pointer and decoding those bytes. There is
already a function to do this in the SVM sub-directory.
This function should be moved up one level to HVM
sub-directory, so both VMX and SVM can use it.
It should be noted that VMX only uses this instrlen
function when the hypervisor needs the instruction-length
info and that info is undefined in VMCS, e.g., for MMIO
instructions. In other cases where the instruction-length
field is valid in VMCS, the hypervisor continues to get
that info from VMCS (via vmread operation).
I came across this problem in my effort to get Windows
NT booting on Xen.
There are TWO patches attached below:
* instrlen1.patch effectively moves the instrlen.c file
from xen/arch/x86/hvm/svm sub-directory up one level to
xen/arch/x86/hvm sub-directory and makes minor changes
to instrlen.c so that it will work at its new location.
* instrlen2.patch makes additional changes to VMX code
so the hypervisor can use the instrlen function correctly
in all modes in cases where the instruction-length field is
undefined and read from VMCS in cases where it is defined.
I must acknowledge that most of the code in the first patch
(instrlen1.patch) does not come from me since the primary
prupose of this patch is to move the instrlen.c file from
one location to another in the tree (it also makes some
minor changes). The second patch (instrlen2.patch) is
more meaty :-)
These two patches should apply cleanly to the latest
xen-unstable tree (hg tip = 9866).
I have tested these patches successfully on two systems
using a variety of guest OSes (e.g. WinXP, Win2003 Server).
Signed-off-by: Khoa Huynh <khoa@us.ibm.com>
(See attached file: instrlen1.patch)(See attached file: instrlen2.patch)
Regards,
Khoa
_________________________________________
Khoa Huynh, Ph.D.
IBM Linux Technology Center
(512) 838-4903; T/L 678-4903; khoa@us.ibm.com
[-- Attachment #2: instrlen1.patch --]
[-- Type: application/octet-stream, Size: 32964 bytes --]
diff -r 4e1b8be54311 xen/arch/x86/hvm/instrlen.c
--- /dev/null Thu Apr 27 12:38:21 2006
+++ b/xen/arch/x86/hvm/instrlen.c Thu Apr 27 19:50:04 2006
@@ -0,0 +1,482 @@
+/*
+ * instrlen.c - calculates the instruction length for all operating modes
+ *
+ * Travis Betak, travis.betak@amd.com
+ * Copyright (c) 2005,2006 AMD
+ * Copyright (c) 1006 IBM
+ * Copyright (c) 2005 Keir Fraser
+ *
+ * Khoa Huynh (khoa@us.ibm.com):
+ * This file was moved from xen/arch/x86/hvm/svm directory up to
+ * this directory because the instrlen function is being used by
+ * both VMX and SVM. VMX only uses this function when the
+ * hypervisor needs the instruction-length info and this info
+ * is not available (i.e., undefined) in VMCS.
+ */
+
+/*
+ * TODO: the way in which we use hvm_instrlen is very inefficient as is now
+ * stands. It will be worth while to return the actual instruction buffer
+ * along with the instruction length since one of the reasons we are getting
+ * the instruction length is to know how many instruction bytes we need to
+ * fetch.
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <asm/regs.h>
+#define DPRINTF DPRINTK
+#include <asm-x86/x86_emulate.h>
+
+/* read from guest memory */
+extern int inst_copy_from_guest(unsigned char *buf, unsigned long eip,
+ int length);
+
+/*
+ * Opcode effective-address decode tables.
+ * Note that we only emulate instructions that have at least one memory
+ * operand (excluding implicit stack references). We assume that stack
+ * references and instruction fetches will never occur in special memory
+ * areas that require emulation. So, for example, 'mov <imm>,<reg>' need
+ * not be handled.
+ */
+
+/* Operand sizes: 8-bit operands or specified/overridden size. */
+#define ByteOp (1<<0) /* 8-bit operands. */
+/* Destination operand type. */
+#define ImplicitOps (1<<1) /* Implicit in opcode. No generic decode. */
+#define DstReg (2<<1) /* Register operand. */
+#define DstMem (3<<1) /* Memory operand. */
+#define DstMask (3<<1)
+/* Source operand type. */
+#define SrcNone (0<<3) /* No source operand. */
+#define SrcImplicit (0<<3) /* Source operand is implicit in the opcode. */
+#define SrcReg (1<<3) /* Register operand. */
+#define SrcMem (2<<3) /* Memory operand. */
+#define SrcMem16 (3<<3) /* Memory operand (16-bit). */
+#define SrcMem32 (4<<3) /* Memory operand (32-bit). */
+#define SrcImm (5<<3) /* Immediate operand. */
+#define SrcImmByte (6<<3) /* 8-bit sign-extended immediate operand. */
+#define SrcMask (7<<3)
+/* Generic ModRM decode. */
+#define ModRM (1<<6)
+/* Destination is only written; never read. */
+#define Mov (1<<7)
+
+static uint8_t opcode_table[256] = {
+ /* 0x00 - 0x07 */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x08 - 0x0F */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x10 - 0x17 */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x18 - 0x1F */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x20 - 0x27 */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x28 - 0x2F */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x30 - 0x37 */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x38 - 0x3F */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, 0,
+ /* 0x40 - 0x4F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x50 - 0x5F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x60 - 0x6F */
+ 0, 0, 0, DstReg|SrcMem32|ModRM|Mov /* movsxd (x86/64) */,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x70 - 0x7F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x80 - 0x87 */
+ ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImm|ModRM,
+ ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImmByte|ModRM,
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ /* 0x88 - 0x8F */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
+ ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
+ 0, 0, 0, DstMem|SrcNone|ModRM|Mov,
+ /* 0x90 - 0x9F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xA0 - 0xA7 */
+ ByteOp|DstReg|SrcMem|Mov, DstReg|SrcMem|Mov,
+ ByteOp|DstMem|SrcReg|Mov, DstMem|SrcReg|Mov,
+ ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
+ ByteOp|ImplicitOps, ImplicitOps,
+ /* 0xA8 - 0xAF */
+ 0, 0, ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
+ ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
+ ByteOp|ImplicitOps, ImplicitOps,
+ /* 0xB0 - 0xBF */
+ SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
+ SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xC0 - 0xC7 */
+ ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImmByte|ModRM, 0, 0,
+ 0, 0, ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImm|ModRM,
+ /* 0xC8 - 0xCF */
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xD0 - 0xD7 */
+ ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
+ ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
+ 0, 0, 0, 0,
+ /* 0xD8 - 0xDF */
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xE0 - 0xEF */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xF0 - 0xF7 */
+ 0, 0, 0, 0,
+ 0, 0, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM,
+ /* 0xF8 - 0xFF */
+ 0, 0, 0, 0,
+ 0, 0, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM
+};
+
+static uint8_t twobyte_table[256] = {
+ /* 0x00 - 0x0F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, 0, 0,
+ /* 0x10 - 0x1F */
+ 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x20 - 0x2F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x30 - 0x3F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x40 - 0x47 */
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ /* 0x48 - 0x4F */
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
+ /* 0x50 - 0x5F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x60 - 0x6F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x70 - 0x7F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x80 - 0x8F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0x90 - 0x9F */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xA0 - 0xA7 */
+ 0, 0, 0, DstMem|SrcReg|ModRM, 0, 0, 0, 0,
+ /* 0xA8 - 0xAF */
+ 0, 0, 0, DstMem|SrcReg|ModRM, 0, 0, 0, 0,
+ /* 0xB0 - 0xB7 */
+ ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM, 0, DstMem|SrcReg|ModRM,
+ 0, 0, ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem16|ModRM|Mov,
+ /* 0xB8 - 0xBF */
+ 0, 0, DstMem|SrcImmByte|ModRM, DstMem|SrcReg|ModRM,
+ 0, 0, ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem16|ModRM|Mov,
+ /* 0xC0 - 0xCF */
+ 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xD0 - 0xDF */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xE0 - 0xEF */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ /* 0xF0 - 0xFF */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+/*
+ * insn_fetch - fetch the next 1 to 4 bytes from instruction stream
+ *
+ * @_type: u8, u16, u32, s8, s16, or s32
+ * @_size: 1, 2, or 4 bytes
+ * @_eip: address to fetch from guest memory
+ * @_length: updated! increments the current instruction length counter by _size
+ *
+ * INTERNAL this is used internally by hvm_instrlen to fetch the next byte,
+ * word, or dword from guest memory at location _eip. we currently use a local
+ * unsigned long as the storage buffer since the most bytes we're gonna get
+ * is limited to 4.
+ */
+#define insn_fetch(_type, _size, _eip, _length) \
+({ unsigned long _x; \
+ if ((rc = inst_copy_from_guest((unsigned char *)(&(_x)), \
+ (unsigned long)(_eip), _size)) \
+ != _size) \
+ goto done; \
+ (_eip) += (_size); \
+ (_length) += (_size); \
+ (_type)_x; \
+})
+
+
+/**
+ * hvm_instrlen - returns the current instructions length
+ *
+ * @regs: guest register state
+ * @mode: guest operating mode
+ *
+ * EXTERNAL this routine calculates the length of the current instruction
+ * pointed to by eip. The guest state is _not_ changed by this routine.
+ */
+int hvm_instrlen(struct cpu_user_regs *regs, int mode)
+{
+ uint8_t b, d, twobyte = 0, rex_prefix = 0;
+ uint8_t modrm, modrm_mod = 0, modrm_reg = 0, modrm_rm = 0;
+ unsigned int op_bytes, ad_bytes, lock_prefix = 0, rep_prefix = 0, i;
+ int rc = 0;
+ int length = 0;
+ unsigned int tmp;
+
+ /* Shadow copy of register state. Committed on successful emulation. */
+ struct cpu_user_regs _regs = *regs;
+
+ /* include CS for 16-bit modes */
+ if (mode == X86EMUL_MODE_REAL || mode == X86EMUL_MODE_PROT16)
+ _regs.eip += (_regs.cs << 4);
+
+ switch ( mode )
+ {
+ case X86EMUL_MODE_REAL:
+ case X86EMUL_MODE_PROT16:
+ op_bytes = ad_bytes = 2;
+ break;
+ case X86EMUL_MODE_PROT32:
+ op_bytes = ad_bytes = 4;
+ break;
+#ifdef __x86_64__
+ case X86EMUL_MODE_PROT64:
+ op_bytes = 4;
+ ad_bytes = 8;
+ break;
+#endif
+ default:
+ printf("hvm_instrlen: invalid mode!\n");
+ return -1;
+ }
+
+ /* Legacy prefixes. */
+ for ( i = 0; i < 8; i++ )
+ {
+ switch ( b = insn_fetch(uint8_t, 1, _regs.eip, length) )
+ {
+ case 0x66: /* operand-size override */
+ op_bytes ^= 6; /* switch between 2/4 bytes */
+ break;
+ case 0x67: /* address-size override */
+ if ( mode == X86EMUL_MODE_PROT64 )
+ ad_bytes ^= 12; /* switch between 4/8 bytes */
+ else
+ ad_bytes ^= 6; /* switch between 2/4 bytes */
+ break;
+ case 0x2e: /* CS override */
+ case 0x3e: /* DS override */
+ case 0x26: /* ES override */
+ case 0x64: /* FS override */
+ case 0x65: /* GS override */
+ case 0x36: /* SS override */
+ break;
+ case 0xf0: /* LOCK */
+ lock_prefix = 1;
+ break;
+ case 0xf3: /* REP/REPE/REPZ */
+ rep_prefix = 1;
+ break;
+ case 0xf2: /* REPNE/REPNZ */
+ break;
+ default:
+ goto done_prefixes;
+ }
+ }
+done_prefixes:
+
+ /* Note quite the same as 80386 real mode, but hopefully good enough. */
+ if ( (mode == X86EMUL_MODE_REAL) && (ad_bytes != 2) ) {
+ printf("sonofabitch!! we don't support 32-bit addresses in realmode\n");
+ goto cannot_emulate;
+ }
+
+ /* REX prefix. */
+ if ( (mode == X86EMUL_MODE_PROT64) && ((b & 0xf0) == 0x40) )
+ {
+ rex_prefix = b;
+ if ( b & 8 )
+ op_bytes = 8; /* REX.W */
+ modrm_reg = (b & 4) << 1; /* REX.R */
+ /* REX.B and REX.X do not need to be decoded. */
+ b = insn_fetch(uint8_t, 1, _regs.eip, length);
+ }
+
+ /* Opcode byte(s). */
+ d = opcode_table[b];
+ if ( d == 0 )
+ {
+ /* Two-byte opcode? */
+ if ( b == 0x0f )
+ {
+ twobyte = 1;
+ b = insn_fetch(uint8_t, 1, _regs.eip, length);
+ d = twobyte_table[b];
+ }
+
+ /* Unrecognised? */
+ if ( d == 0 )
+ goto cannot_emulate;
+ }
+
+ /* ModRM and SIB bytes. */
+ if ( d & ModRM )
+ {
+ modrm = insn_fetch(uint8_t, 1, _regs.eip, length);
+ modrm_mod |= (modrm & 0xc0) >> 6;
+ modrm_reg |= (modrm & 0x38) >> 3;
+ modrm_rm |= (modrm & 0x07);
+
+ if ( modrm_mod == 3 )
+ {
+ DPRINTF("Cannot parse ModRM.mod == 3.\n");
+ goto cannot_emulate;
+ }
+
+ if ( ad_bytes == 2 )
+ {
+ /* 16-bit ModR/M decode. */
+ switch ( modrm_mod )
+ {
+ case 0:
+ if ( modrm_rm == 6 )
+ {
+ length += 2;
+ _regs.eip += 2; /* skip disp16 */
+ }
+ break;
+ case 1:
+ length += 1;
+ _regs.eip += 1; /* skip disp8 */
+ break;
+ case 2:
+ length += 2;
+ _regs.eip += 2; /* skip disp16 */
+ break;
+ }
+ }
+ else
+ {
+ /* 32/64-bit ModR/M decode. */
+ switch ( modrm_mod )
+ {
+ case 0:
+ if ( (modrm_rm == 4) &&
+ (((insn_fetch(uint8_t, 1, _regs.eip, length)) & 7)
+ == 5) )
+ {
+ length += 4;
+ _regs.eip += 4; /* skip disp32 specified by SIB.base */
+ }
+ else if ( modrm_rm == 5 )
+ {
+ length += 4;
+ _regs.eip += 4; /* skip disp32 */
+ }
+ break;
+ case 1:
+ if ( modrm_rm == 4 )
+ {
+ insn_fetch(uint8_t, 1, _regs.eip, length);
+ }
+ length += 1;
+ _regs.eip += 1; /* skip disp8 */
+ break;
+ case 2:
+ if ( modrm_rm == 4 )
+ {
+ insn_fetch(uint8_t, 1, _regs.eip, length);
+ }
+ length += 4;
+ _regs.eip += 4; /* skip disp32 */
+ break;
+ }
+ }
+ }
+
+ /* Decode and fetch the destination operand: register or memory. */
+ switch ( d & DstMask )
+ {
+ case ImplicitOps:
+ /* Special instructions do their own operand decoding. */
+ goto done;
+ }
+
+ /* Decode and fetch the source operand: register, memory or immediate. */
+ switch ( d & SrcMask )
+ {
+ case SrcImm:
+ tmp = (d & ByteOp) ? 1 : op_bytes;
+ if ( tmp == 8 ) tmp = 4;
+ /* NB. Immediates are sign-extended as necessary. */
+ switch ( tmp )
+ {
+ case 1: insn_fetch(int8_t, 1, _regs.eip, length); break;
+ case 2: insn_fetch(int16_t, 2, _regs.eip, length); break;
+ case 4: insn_fetch(int32_t, 4, _regs.eip, length); break;
+ }
+ break;
+ case SrcImmByte:
+ insn_fetch(int8_t, 1, _regs.eip, length);
+ break;
+ }
+
+ if ( twobyte )
+ goto done;
+
+ switch ( b )
+ {
+ case 0xa0 ... 0xa1: /* mov */
+ length += ad_bytes;
+ _regs.eip += ad_bytes; /* skip src displacement */
+ break;
+ case 0xa2 ... 0xa3: /* mov */
+ length += ad_bytes;
+ _regs.eip += ad_bytes; /* skip dst displacement */
+ break;
+ case 0xf6 ... 0xf7: /* Grp3 */
+ switch ( modrm_reg )
+ {
+ case 0 ... 1: /* test */
+ /* Special case in Grp3: test has an immediate source operand. */
+ tmp = (d & ByteOp) ? 1 : op_bytes;
+ if ( tmp == 8 ) tmp = 4;
+ switch ( tmp )
+ {
+ case 1: insn_fetch(int8_t, 1, _regs.eip, length); break;
+ case 2: insn_fetch(int16_t, 2, _regs.eip, length); break;
+ case 4: insn_fetch(int32_t, 4, _regs.eip, length); break;
+ }
+ goto done;
+ }
+ break;
+ }
+
+done:
+ return length;
+
+cannot_emulate:
+ printf("hvm_instrlen: Cannot emulate %02x at address %lx (eip %lx, mode %d)\n", b, (unsigned long)_regs.eip, (unsigned long)regs->eip, mode);
+ return -1;
+}
diff -r 4e1b8be54311 xen/arch/x86/hvm/svm/instrlen.c
--- a/xen/arch/x86/hvm/svm/instrlen.c Thu Apr 27 12:38:21 2006
+++ /dev/null Thu Apr 27 19:50:04 2006
@@ -1,479 +0,0 @@
-/*
- * instrlen.c - calculates the instruction length for all operating modes
- *
- * Travis Betak, travis.betak@amd.com
- * Copyright (c) 2005,2006 AMD
- * Copyright (c) 2005 Keir Fraser
- *
- * Essentially a very, very stripped version of Keir Fraser's work in
- * x86_emulate.c. Used for MMIO.
- */
-
-/*
- * TODO: the way in which we use svm_instrlen is very inefficient as is now
- * stands. It will be worth while to return the actual instruction buffer
- * along with the instruction length since one of the reasons we are getting
- * the instruction length is to know how many instruction bytes we need to
- * fetch.
- */
-
-#include <xen/config.h>
-#include <xen/types.h>
-#include <xen/lib.h>
-#include <xen/mm.h>
-#include <asm/regs.h>
-#define DPRINTF DPRINTK
-#include <asm-x86/x86_emulate.h>
-
-/* read from guest memory */
-extern int inst_copy_from_guest(unsigned char *buf, unsigned long eip,
- int length);
-extern void svm_dump_inst(unsigned long eip);
-
-/*
- * Opcode effective-address decode tables.
- * Note that we only emulate instructions that have at least one memory
- * operand (excluding implicit stack references). We assume that stack
- * references and instruction fetches will never occur in special memory
- * areas that require emulation. So, for example, 'mov <imm>,<reg>' need
- * not be handled.
- */
-
-/* Operand sizes: 8-bit operands or specified/overridden size. */
-#define ByteOp (1<<0) /* 8-bit operands. */
-/* Destination operand type. */
-#define ImplicitOps (1<<1) /* Implicit in opcode. No generic decode. */
-#define DstReg (2<<1) /* Register operand. */
-#define DstMem (3<<1) /* Memory operand. */
-#define DstMask (3<<1)
-/* Source operand type. */
-#define SrcNone (0<<3) /* No source operand. */
-#define SrcImplicit (0<<3) /* Source operand is implicit in the opcode. */
-#define SrcReg (1<<3) /* Register operand. */
-#define SrcMem (2<<3) /* Memory operand. */
-#define SrcMem16 (3<<3) /* Memory operand (16-bit). */
-#define SrcMem32 (4<<3) /* Memory operand (32-bit). */
-#define SrcImm (5<<3) /* Immediate operand. */
-#define SrcImmByte (6<<3) /* 8-bit sign-extended immediate operand. */
-#define SrcMask (7<<3)
-/* Generic ModRM decode. */
-#define ModRM (1<<6)
-/* Destination is only written; never read. */
-#define Mov (1<<7)
-
-static uint8_t opcode_table[256] = {
- /* 0x00 - 0x07 */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x08 - 0x0F */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x10 - 0x17 */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x18 - 0x1F */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x20 - 0x27 */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x28 - 0x2F */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x30 - 0x37 */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x38 - 0x3F */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, 0,
- /* 0x40 - 0x4F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x50 - 0x5F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x60 - 0x6F */
- 0, 0, 0, DstReg|SrcMem32|ModRM|Mov /* movsxd (x86/64) */,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x70 - 0x7F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x80 - 0x87 */
- ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImm|ModRM,
- ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImmByte|ModRM,
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- /* 0x88 - 0x8F */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
- ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
- 0, 0, 0, DstMem|SrcNone|ModRM|Mov,
- /* 0x90 - 0x9F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xA0 - 0xA7 */
- ByteOp|DstReg|SrcMem|Mov, DstReg|SrcMem|Mov,
- ByteOp|DstMem|SrcReg|Mov, DstMem|SrcReg|Mov,
- ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
- ByteOp|ImplicitOps, ImplicitOps,
- /* 0xA8 - 0xAF */
- 0, 0, ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
- ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
- ByteOp|ImplicitOps, ImplicitOps,
- /* 0xB0 - 0xBF */
- SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
- SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
- 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xC0 - 0xC7 */
- ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImmByte|ModRM, 0, 0,
- 0, 0, ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImm|ModRM,
- /* 0xC8 - 0xCF */
- 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xD0 - 0xD7 */
- ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
- ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
- 0, 0, 0, 0,
- /* 0xD8 - 0xDF */
- 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xE0 - 0xEF */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xF0 - 0xF7 */
- 0, 0, 0, 0,
- 0, 0, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM,
- /* 0xF8 - 0xFF */
- 0, 0, 0, 0,
- 0, 0, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM
-};
-
-static uint8_t twobyte_table[256] = {
- /* 0x00 - 0x0F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, 0, 0,
- /* 0x10 - 0x1F */
- 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, 0, 0, 0, 0, 0, 0, 0,
- /* 0x20 - 0x2F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x30 - 0x3F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x40 - 0x47 */
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- /* 0x48 - 0x4F */
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
- /* 0x50 - 0x5F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x60 - 0x6F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x70 - 0x7F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x80 - 0x8F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0x90 - 0x9F */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xA0 - 0xA7 */
- 0, 0, 0, DstMem|SrcReg|ModRM, 0, 0, 0, 0,
- /* 0xA8 - 0xAF */
- 0, 0, 0, DstMem|SrcReg|ModRM, 0, 0, 0, 0,
- /* 0xB0 - 0xB7 */
- ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM, 0, DstMem|SrcReg|ModRM,
- 0, 0, ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem16|ModRM|Mov,
- /* 0xB8 - 0xBF */
- 0, 0, DstMem|SrcImmByte|ModRM, DstMem|SrcReg|ModRM,
- 0, 0, ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem16|ModRM|Mov,
- /* 0xC0 - 0xCF */
- 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xD0 - 0xDF */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xE0 - 0xEF */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- /* 0xF0 - 0xFF */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
-};
-
-/*
- * insn_fetch - fetch the next 1 to 4 bytes from instruction stream
- *
- * @_type: u8, u16, u32, s8, s16, or s32
- * @_size: 1, 2, or 4 bytes
- * @_eip: address to fetch from guest memory
- * @_length: updated! increments the current instruction length counter by _size
- *
- * INTERNAL this is used internally by svm_instrlen to fetch the next byte,
- * word, or dword from guest memory at location _eip. we currently use a local
- * unsigned long as the storage buffer since the most bytes we're gonna get
- * is limited to 4.
- */
-#define insn_fetch(_type, _size, _eip, _length) \
-({ unsigned long _x; \
- if ((rc = inst_copy_from_guest((unsigned char *)(&(_x)), \
- (unsigned long)(_eip), _size)) \
- != _size) \
- goto done; \
- (_eip) += (_size); \
- (_length) += (_size); \
- (_type)_x; \
-})
-
-
-/**
- * svn_instrlen - returns the current instructions length
- *
- * @regs: guest register state
- * @mode: guest operating mode
- *
- * EXTERNAL this routine calculates the length of the current instruction
- * pointed to by eip. The guest state is _not_ changed by this routine.
- */
-int svm_instrlen(struct cpu_user_regs *regs, int mode)
-{
- uint8_t b, d, twobyte = 0, rex_prefix = 0;
- uint8_t modrm, modrm_mod = 0, modrm_reg = 0, modrm_rm = 0;
- unsigned int op_bytes, ad_bytes, lock_prefix = 0, rep_prefix = 0, i;
- int rc = 0;
- int length = 0;
- unsigned int tmp;
-
- /* Shadow copy of register state. Committed on successful emulation. */
- struct cpu_user_regs _regs = *regs;
-
- /* include CS for 16-bit modes */
- if (mode == X86EMUL_MODE_REAL || mode == X86EMUL_MODE_PROT16)
- _regs.eip += (_regs.cs << 4);
-
- switch ( mode )
- {
- case X86EMUL_MODE_REAL:
- case X86EMUL_MODE_PROT16:
- op_bytes = ad_bytes = 2;
- break;
- case X86EMUL_MODE_PROT32:
- op_bytes = ad_bytes = 4;
- break;
-#ifdef __x86_64__
- case X86EMUL_MODE_PROT64:
- op_bytes = 4;
- ad_bytes = 8;
- break;
-#endif
- default:
- return -1;
- }
-
- /* Legacy prefixes. */
- for ( i = 0; i < 8; i++ )
- {
- switch ( b = insn_fetch(uint8_t, 1, _regs.eip, length) )
- {
- case 0x66: /* operand-size override */
- op_bytes ^= 6; /* switch between 2/4 bytes */
- break;
- case 0x67: /* address-size override */
- if ( mode == X86EMUL_MODE_PROT64 )
- ad_bytes ^= 12; /* switch between 4/8 bytes */
- else
- ad_bytes ^= 6; /* switch between 2/4 bytes */
- break;
- case 0x2e: /* CS override */
- case 0x3e: /* DS override */
- case 0x26: /* ES override */
- case 0x64: /* FS override */
- case 0x65: /* GS override */
- case 0x36: /* SS override */
- break;
- case 0xf0: /* LOCK */
- lock_prefix = 1;
- break;
- case 0xf3: /* REP/REPE/REPZ */
- rep_prefix = 1;
- break;
- case 0xf2: /* REPNE/REPNZ */
- break;
- default:
- goto done_prefixes;
- }
- }
-done_prefixes:
-
- /* Note quite the same as 80386 real mode, but hopefully good enough. */
- if ( (mode == X86EMUL_MODE_REAL) && (ad_bytes != 2) ) {
- printf("sonofabitch!! we don't support 32-bit addresses in realmode\n");
- goto cannot_emulate;
- }
-
- /* REX prefix. */
- if ( (mode == X86EMUL_MODE_PROT64) && ((b & 0xf0) == 0x40) )
- {
- rex_prefix = b;
- if ( b & 8 )
- op_bytes = 8; /* REX.W */
- modrm_reg = (b & 4) << 1; /* REX.R */
- /* REX.B and REX.X do not need to be decoded. */
- b = insn_fetch(uint8_t, 1, _regs.eip, length);
- }
-
- /* Opcode byte(s). */
- d = opcode_table[b];
- if ( d == 0 )
- {
- /* Two-byte opcode? */
- if ( b == 0x0f )
- {
- twobyte = 1;
- b = insn_fetch(uint8_t, 1, _regs.eip, length);
- d = twobyte_table[b];
- }
-
- /* Unrecognised? */
- if ( d == 0 )
- goto cannot_emulate;
- }
-
- /* ModRM and SIB bytes. */
- if ( d & ModRM )
- {
- modrm = insn_fetch(uint8_t, 1, _regs.eip, length);
- modrm_mod |= (modrm & 0xc0) >> 6;
- modrm_reg |= (modrm & 0x38) >> 3;
- modrm_rm |= (modrm & 0x07);
-
- if ( modrm_mod == 3 )
- {
- DPRINTF("Cannot parse ModRM.mod == 3.\n");
- goto cannot_emulate;
- }
-
- if ( ad_bytes == 2 )
- {
- /* 16-bit ModR/M decode. */
- switch ( modrm_mod )
- {
- case 0:
- if ( modrm_rm == 6 )
- {
- length += 2;
- _regs.eip += 2; /* skip disp16 */
- }
- break;
- case 1:
- length += 1;
- _regs.eip += 1; /* skip disp8 */
- break;
- case 2:
- length += 2;
- _regs.eip += 2; /* skip disp16 */
- break;
- }
- }
- else
- {
- /* 32/64-bit ModR/M decode. */
- switch ( modrm_mod )
- {
- case 0:
- if ( (modrm_rm == 4) &&
- (((insn_fetch(uint8_t, 1, _regs.eip, length)) & 7)
- == 5) )
- {
- length += 4;
- _regs.eip += 4; /* skip disp32 specified by SIB.base */
- }
- else if ( modrm_rm == 5 )
- {
- length += 4;
- _regs.eip += 4; /* skip disp32 */
- }
- break;
- case 1:
- if ( modrm_rm == 4 )
- {
- insn_fetch(uint8_t, 1, _regs.eip, length);
- }
- length += 1;
- _regs.eip += 1; /* skip disp8 */
- break;
- case 2:
- if ( modrm_rm == 4 )
- {
- insn_fetch(uint8_t, 1, _regs.eip, length);
- }
- length += 4;
- _regs.eip += 4; /* skip disp32 */
- break;
- }
- }
- }
-
- /* Decode and fetch the destination operand: register or memory. */
- switch ( d & DstMask )
- {
- case ImplicitOps:
- /* Special instructions do their own operand decoding. */
- goto done;
- }
-
- /* Decode and fetch the source operand: register, memory or immediate. */
- switch ( d & SrcMask )
- {
- case SrcImm:
- tmp = (d & ByteOp) ? 1 : op_bytes;
- if ( tmp == 8 ) tmp = 4;
- /* NB. Immediates are sign-extended as necessary. */
- switch ( tmp )
- {
- case 1: insn_fetch(int8_t, 1, _regs.eip, length); break;
- case 2: insn_fetch(int16_t, 2, _regs.eip, length); break;
- case 4: insn_fetch(int32_t, 4, _regs.eip, length); break;
- }
- break;
- case SrcImmByte:
- insn_fetch(int8_t, 1, _regs.eip, length);
- break;
- }
-
- if ( twobyte )
- goto done;
-
- switch ( b )
- {
- case 0xa0 ... 0xa1: /* mov */
- length += ad_bytes;
- _regs.eip += ad_bytes; /* skip src displacement */
- break;
- case 0xa2 ... 0xa3: /* mov */
- length += ad_bytes;
- _regs.eip += ad_bytes; /* skip dst displacement */
- break;
- case 0xf6 ... 0xf7: /* Grp3 */
- switch ( modrm_reg )
- {
- case 0 ... 1: /* test */
- /* Special case in Grp3: test has an immediate source operand. */
- tmp = (d & ByteOp) ? 1 : op_bytes;
- if ( tmp == 8 ) tmp = 4;
- switch ( tmp )
- {
- case 1: insn_fetch(int8_t, 1, _regs.eip, length); break;
- case 2: insn_fetch(int16_t, 2, _regs.eip, length); break;
- case 4: insn_fetch(int32_t, 4, _regs.eip, length); break;
- }
- goto done;
- }
- break;
- }
-
-done:
- return length;
-
-cannot_emulate:
- DPRINTF("Cannot emulate %02x at address %lx (eip %lx, mode %d)\n",
- b, (unsigned long)_regs.eip, (unsigned long)regs->eip, mode);
- svm_dump_inst(_regs.eip);
- return -1;
-}
[-- Attachment #3: instrlen2.patch --]
[-- Type: application/octet-stream, Size: 8841 bytes --]
diff -r 4e1b8be54311 xen/arch/x86/hvm/Makefile
--- a/xen/arch/x86/hvm/Makefile Thu Apr 27 12:38:21 2006
+++ b/xen/arch/x86/hvm/Makefile Thu Apr 27 20:02:11 2006
@@ -6,6 +6,7 @@
obj-y += i8259.o
obj-y += intercept.o
obj-y += io.o
+obj-y += instrlen.o
obj-y += platform.o
obj-y += vioapic.o
obj-y += vlapic.o
diff -r 4e1b8be54311 xen/arch/x86/hvm/svm/Makefile
--- a/xen/arch/x86/hvm/svm/Makefile Thu Apr 27 12:38:21 2006
+++ b/xen/arch/x86/hvm/svm/Makefile Thu Apr 27 20:02:11 2006
@@ -2,7 +2,6 @@
subdir-$(x86_64) += x86_64
obj-y += emulate.o
-obj-y += instrlen.o
obj-y += intr.o
obj-y += svm.o
obj-y += vmcb.o
diff -r 4e1b8be54311 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c Thu Apr 27 12:38:21 2006
+++ b/xen/arch/x86/hvm/svm/svm.c Thu Apr 27 20:02:11 2006
@@ -70,7 +70,7 @@
extern asmlinkage void do_IRQ(struct cpu_user_regs *);
extern void send_pio_req(struct cpu_user_regs *regs, unsigned long port,
unsigned long count, int size, long value, int dir, int pvalid);
-extern int svm_instrlen(struct cpu_user_regs *regs, int mode);
+extern int hvm_instrlen(struct cpu_user_regs *regs, int mode);
extern void svm_dump_inst(unsigned long eip);
extern int svm_dbg_on;
void svm_manual_event_injection32(struct vcpu *v, struct cpu_user_regs *regs,
@@ -409,7 +409,7 @@
mode = vmcb->cs.attributes.fields.l ? 8 : 4;
else
mode = (eflags & X86_EFLAGS_VM) || !(cr0 & X86_CR0_PE) ? 2 : 4;
- return svm_instrlen(guest_cpu_user_regs(), mode);
+ return hvm_instrlen(guest_cpu_user_regs(), mode);
}
unsigned long svm_get_ctrl_reg(struct vcpu *v, unsigned int num)
diff -r 4e1b8be54311 xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c Thu Apr 27 12:38:21 2006
+++ b/xen/arch/x86/hvm/vmx/vmx.c Thu Apr 27 20:02:11 2006
@@ -50,6 +50,7 @@
static unsigned long trace_values[NR_CPUS][4];
#define TRACE_VMEXIT(index,value) trace_values[smp_processor_id()][index]=value
+extern int hvm_instrlen(struct cpu_user_regs *regs, int mode);
static void vmx_ctxt_switch_from(struct vcpu *v);
static void vmx_ctxt_switch_to(struct vcpu *v);
@@ -560,11 +561,20 @@
int vmx_instruction_length(struct vcpu *v)
{
- unsigned long inst_len;
-
- if (__vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len))
- return 0;
- return inst_len;
+ struct vmx_context c;
+ unsigned long cr0, rflags, mode;
+
+ /* check which operating mode the guest is running */
+ if( VMX_LONG_GUEST(v) ) {
+ __vmread(GUEST_CS_AR_BYTES, &c.cs_arbytes.bytes);
+ mode = c.cs_arbytes.fields.reserved1 ? 8 : 4;
+ }
+ else {
+ __vmread(GUEST_RFLAGS, &rflags);
+ __vmread_vcpu(v, GUEST_CR0, &cr0);
+ mode = (rflags & X86_EFLAGS_VM) || !(cr0 & X86_CR0_PE) ? 2 : 4;
+ }
+ return hvm_instrlen(guest_cpu_user_regs(), mode);
}
unsigned long vmx_get_ctrl_reg(struct vcpu *v, unsigned int num)
@@ -713,9 +723,9 @@
/*
* Not all cases receive valid value in the VM-exit instruction length field.
*/
-#define __get_instruction_length(len) \
+#define __get_instruction_length_from_VMCS(len) \
__vmread(VM_EXIT_INSTRUCTION_LEN, &(len)); \
- if ((len) < 1 || (len) > 15) \
+ if ((len) < 1 || (len) > MAX_INST_LEN) \
__hvm_bug(®s);
static void inline __update_guest_eip(unsigned long inst_len)
@@ -762,7 +772,7 @@
/* No support for APIC */
if (!hvm_apic_support(v->domain) && gpa >= 0xFEC00000) {
u32 inst_len;
- __vmread(VM_EXIT_INSTRUCTION_LEN, &(inst_len));
+ inst_len = hvm_instruction_length(v);
__update_guest_eip(inst_len);
return 1;
}
@@ -773,8 +783,9 @@
result = shadow_fault(va, regs);
TRACE_VMEXIT (2,result);
-#if 0
- if ( !result )
+
+#if 0 /* keep for debugging */
+if ( !result )
{
__vmread(GUEST_RIP, &eip);
printk("vmx pgfault to guest va=%lx eip=%lx\n", va, eip);
@@ -2180,11 +2191,11 @@
break;
case EXIT_REASON_CPUID:
vmx_vmexit_do_cpuid(®s);
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__update_guest_eip(inst_len);
break;
case EXIT_REASON_HLT:
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__update_guest_eip(inst_len);
vmx_vmexit_do_hlt();
break;
@@ -2194,13 +2205,13 @@
__vmread(EXIT_QUALIFICATION, &va);
vmx_vmexit_do_invlpg(va);
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__update_guest_eip(inst_len);
break;
}
#if 0 /* keep this for debugging */
case EXIT_REASON_VMCALL:
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__vmread(GUEST_RIP, &eip);
__vmread(EXIT_QUALIFICATION, &exit_qualification);
@@ -2211,7 +2222,7 @@
case EXIT_REASON_CR_ACCESS:
{
__vmread(GUEST_RIP, &eip);
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__vmread(EXIT_QUALIFICATION, &exit_qualification);
HVM_DBG_LOG(DBG_LEVEL_1, "eip = %lx, inst_len =%lx, exit_qualification = %lx",
@@ -2225,24 +2236,24 @@
case EXIT_REASON_DR_ACCESS:
__vmread(EXIT_QUALIFICATION, &exit_qualification);
vmx_dr_access(exit_qualification, ®s);
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__update_guest_eip(inst_len);
break;
case EXIT_REASON_IO_INSTRUCTION:
__vmread(EXIT_QUALIFICATION, &exit_qualification);
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
vmx_io_instruction(®s, exit_qualification, inst_len);
TRACE_VMEXIT(4,exit_qualification);
break;
case EXIT_REASON_MSR_READ:
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
vmx_do_msr_read(®s);
__update_guest_eip(inst_len);
break;
case EXIT_REASON_MSR_WRITE:
__vmread(GUEST_RIP, &eip);
vmx_do_msr_write(®s);
- __get_instruction_length(inst_len);
+ __get_instruction_length_from_VMCS(inst_len);
__update_guest_eip(inst_len);
break;
case EXIT_REASON_MWAIT_INSTRUCTION:
@@ -2258,8 +2269,8 @@
case EXIT_REASON_VMWRITE:
case EXIT_REASON_VMOFF:
case EXIT_REASON_VMON:
- /* Report invalid opcode exception when a VMX guest tries to execute
- any of the VMX instructions */
+ /* Report invalid opcode exception when a VMX guest tries to execute
+ any of the VMX instructions */
vmx_inject_exception(v, TRAP_invalid_op, VMX_DELIVER_NO_ERROR_CODE);
break;
diff -r 4e1b8be54311 xen/include/asm-x86/hvm/vmx/vmcs.h
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h Thu Apr 27 12:38:21 2006
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h Thu Apr 27 20:02:11 2006
@@ -79,6 +79,53 @@
void *io_bitmap_a, *io_bitmap_b;
struct timer hlt_timer; /* hlt ins emulation wakeup timer */
};
+/*
+ * VMX context
+ */
+typedef struct vmx_context {
+ uint32_t eip; /* execution pointer */
+ uint32_t esp; /* stack pointer */
+ uint32_t eflags; /* flags register */
+ uint32_t cr0;
+ uint32_t cr3; /* page table directory */
+ uint32_t cr4;
+ uint32_t idtr_limit; /* idt */
+ uint32_t idtr_base;
+ uint32_t gdtr_limit; /* gdt */
+ uint32_t gdtr_base;
+ uint32_t cs_sel; /* cs selector */
+ uint32_t cs_limit;
+ uint32_t cs_base;
+ union vmcs_arbytes cs_arbytes;
+ uint32_t ds_sel; /* ds selector */
+ uint32_t ds_limit;
+ uint32_t ds_base;
+ union vmcs_arbytes ds_arbytes;
+ uint32_t es_sel; /* es selector */
+ uint32_t es_limit;
+ uint32_t es_base;
+ union vmcs_arbytes es_arbytes;
+ uint32_t ss_sel; /* ss selector */
+ uint32_t ss_limit;
+ uint32_t ss_base;
+ union vmcs_arbytes ss_arbytes;
+ uint32_t fs_sel; /* fs selector */
+ uint32_t fs_limit;
+ uint32_t fs_base;
+ union vmcs_arbytes fs_arbytes;
+ uint32_t gs_sel; /* gs selector */
+ uint32_t gs_limit;
+ uint32_t gs_base;
+ union vmcs_arbytes gs_arbytes;
+ uint32_t tr_sel; /* task selector */
+ uint32_t tr_limit;
+ uint32_t tr_base;
+ union vmcs_arbytes tr_arbytes;
+ uint32_t ldtr_sel; /* ldtr selector */
+ uint32_t ldtr_limit;
+ uint32_t ldtr_base;
+ union vmcs_arbytes ldtr_arbytes;
+} vmx_context_t;
#define vmx_schedule_tail(next) \
(next)->thread.arch_vmx.arch_vmx_schedule_tail((next))
[-- Attachment #4: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 1:52 Khoa Huynh
@ 2006-04-28 2:41 ` Anthony Liguori
2006-04-28 6:03 ` Keir Fraser
1 sibling, 0 replies; 19+ messages in thread
From: Anthony Liguori @ 2006-04-28 2:41 UTC (permalink / raw)
To: Khoa Huynh; +Cc: xen-devel
Please don't submit patches with mailers that attach as binary attachments.
You'll have to resubmit anyway as your copyright line is wrong (unless
you really did write this code a thousand years ago ;-))
Regards,
Anthony Liguori
Khoa Huynh wrote:
> On VT-x systems, according to Intel VMX specifications,
> the instruction-length information in VMCS on VM exits
> is not always valid. The instruction-length field in
> VMCS is ONLY valid in the follwing cases: when the VM
> exit is caused by the execution of instructions that
> cause the VM exit unconditionally or based on the
> execution-control bitmap, a software exception (INT3
> or INT0), or a task switch.
>
> For VM exits caused by data faults (hardware exceptions),
> the instruction-length field in VMCS is actually undefined.
> In these cases, the hypervisor can derive the correct
> instruction length by fetching bytes based on the guest
> instruction pointer and decoding those bytes. There is
> already a function to do this in the SVM sub-directory.
> This function should be moved up one level to HVM
> sub-directory, so both VMX and SVM can use it.
>
> It should be noted that VMX only uses this instrlen
> function when the hypervisor needs the instruction-length
> info and that info is undefined in VMCS, e.g., for MMIO
> instructions. In other cases where the instruction-length
> field is valid in VMCS, the hypervisor continues to get
> that info from VMCS (via vmread operation).
>
> I came across this problem in my effort to get Windows
> NT booting on Xen.
>
> There are TWO patches attached below:
>
> * instrlen1.patch effectively moves the instrlen.c file
> from xen/arch/x86/hvm/svm sub-directory up one level to
> xen/arch/x86/hvm sub-directory and makes minor changes
> to instrlen.c so that it will work at its new location.
>
> * instrlen2.patch makes additional changes to VMX code
> so the hypervisor can use the instrlen function correctly
> in all modes in cases where the instruction-length field is
> undefined and read from VMCS in cases where it is defined.
>
> I must acknowledge that most of the code in the first patch
> (instrlen1.patch) does not come from me since the primary
> prupose of this patch is to move the instrlen.c file from
> one location to another in the tree (it also makes some
> minor changes). The second patch (instrlen2.patch) is
> more meaty :-)
>
> These two patches should apply cleanly to the latest
> xen-unstable tree (hg tip = 9866).
>
> I have tested these patches successfully on two systems
> using a variety of guest OSes (e.g. WinXP, Win2003 Server).
>
> Signed-off-by: Khoa Huynh <khoa@us.ibm.com>
>
> (See attached file: instrlen1.patch)(See attached file: instrlen2.patch)
>
> Regards,
> Khoa
> _________________________________________
> Khoa Huynh, Ph.D.
> IBM Linux Technology Center
> (512) 838-4903; T/L 678-4903; khoa@us.ibm.com
> ------------------------------------------------------------------------
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 1:52 Khoa Huynh
2006-04-28 2:41 ` Anthony Liguori
@ 2006-04-28 6:03 ` Keir Fraser
2006-04-28 18:10 ` Khoa Huynh
1 sibling, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2006-04-28 6:03 UTC (permalink / raw)
To: Khoa Huynh; +Cc: xen-devel
On 28 Apr 2006, at 02:52, Khoa Huynh wrote:
> It should be noted that VMX only uses this instrlen
> function when the hypervisor needs the instruction-length
> info and that info is undefined in VMCS, e.g., for MMIO
> instructions. In other cases where the instruction-length
> field is valid in VMCS, the hypervisor continues to get
> that info from VMCS (via vmread operation).
I don't believe we need the instruction-length at all, and I suspect
that the decoder could be removed from hvm/svm entirely. There are two
broad categories of instruction I'm thinking of:
1. Instructions with their own VMEXIT reason code tend to be really
simple so we know their length anyway and, if not, the instr-length
field should be valid
2. For mmio instructions, the emulator can work out the length for
itself and increment eip appropriately. There's no need to know the
instruction length in advance of invoking the emulator.
I guess there may be one or two instructions, particularly on AMD,
where we aren't feeding the instruction to the mmio emulator and the
instruction isn't fixed length, so perhaps we'll need a small decoder
in hvm/svm for those. But even if so, it could be much simpler than
what is there right now.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 6:03 ` Keir Fraser
@ 2006-04-28 18:10 ` Khoa Huynh
2006-04-29 7:21 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Khoa Huynh @ 2006-04-28 18:10 UTC (permalink / raw)
To: Keir Fraser, Petersson, Mats; +Cc: xen-devel
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote on 04/28/2006 01:03:02 AM:
>
> On 28 Apr 2006, at 02:52, Khoa Huynh wrote:
>
> > It should be noted that VMX only uses this instrlen
> > function when the hypervisor needs the instruction-length
> > info and that info is undefined in VMCS, e.g., for MMIO
> > instructions. In other cases where the instruction-length
> > field is valid in VMCS, the hypervisor continues to get
> > that info from VMCS (via vmread operation).
>
> I don't believe we need the instruction-length at all, and I suspect
> that the decoder could be removed from hvm/svm entirely. There are two
> broad categories of instruction I'm thinking of:
> 1. Instructions with their own VMEXIT reason code tend to be really
> simple so we know their length anyway and, if not, the instr-length
> field should be valid
For these instructions, on Intel VT-x, the instruction length is valid
in VMCS. On AMD, there is a simple look-up function which determines
the length of the instruction which is passed in as a parameter.
We are good here.
> 2. For mmio instructions, the emulator can work out the length for
> itself and increment eip appropriately. There's no need to know the
> instruction length in advance of invoking the emulator.
Yeah, MMIO instructions are problematic and I was trying to address
this area by using the stripped-down emulator for SVM, but you are
suggesting that we get rid of that stripped-down emulator in SVM,
get rid of the MMIO decoder/emulator in HVM directory (platform.c),
and use the generic x86 emulator in xen/arch/x86 for MMIO instructions
instead. This would certainly be much cleaner than having different
versions of decoder/emulator lying around in different places. I wonder
if there would be any noticeable impact on path lengths for MMIO
instructions ?
> > On 28 Apr 2006, at 10:02, Petersson, Mats wrote:
> >
> > > I'll look at your previous suggestion of merging the MMIO emulation
> > > into x86_emulate later on today. We probably do need to sum up the
> > > length and pass it back to the caller - as that code
> > > doesn't know how
> > > to update the correct field of the different processor
> > > architectures
> > > (vmcb vs. vmcs vs. stack-frame for Para-virtual machine). But it
> > > shouldn't be particularly hard to achieve this.
> >
> > The emulator uses and updates the eip field of the passed-in
> > regs structure. We may want to change this interface in
> > future by having the caller explicitly pass in a buffer
> > containing the instruction, and the number of valid bytes in
> > the buffer. Or add a 'fetch_insn_byte'
> > callback hook to the emulator interface.
>
> I think passing a buffer is the best choice here. And I suppose we can
> always stuff vmc[bs]->rip into regs->eip and pull it back out again when
> we get back - using a wrapper function may be the easiest way to achieve
> this (at least short term).
I guess we can have a wrapper that takes as input the guest instruction
pointer (rip), fetches the whole MAX_INST_LEN (15-byte) buffer starting
at rip (make sure that we don't cross a page boundary), and then passes
that to the emulator. The emulator would decode, emulate, and would
include in its return the updated guest instruction pointer (rip) and
instruction length. This info will be stuffed back into vmcs/vmcb/stack
as appropriate. Is this more or less what you have in mind ?
Thanks.
Regards,
Khoa
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems
2006-04-28 18:10 ` Khoa Huynh
@ 2006-04-29 7:21 ` Keir Fraser
0 siblings, 0 replies; 19+ messages in thread
From: Keir Fraser @ 2006-04-29 7:21 UTC (permalink / raw)
To: Khoa Huynh; +Cc: Petersson, Mats, xen-devel
On 28 Apr 2006, at 19:10, Khoa Huynh wrote:
> For these instructions, on Intel VT-x, the instruction length is valid
> in VMCS. On AMD, there is a simple look-up function which determines
> the length of the instruction which is passed in as a parameter.
> We are good here.
The Intel code only uses the instr-len field because it happens to be
handy. Going to a look-up function in a separate file when you *know*
at compile time what the instruction length must be is stupid, imo. We
should only have to do that if the instruction needs some decoding for
us to know its length (perhaps because of prefix bytes or effective
address suffixes) and we are not otherwise going to be decoding the
instruction as part of emulation.
> I guess we can have a wrapper that takes as input the guest instruction
> pointer (rip), fetches the whole MAX_INST_LEN (15-byte) buffer starting
> at rip (make sure that we don't cross a page boundary), and then passes
> that to the emulator. The emulator would decode, emulate, and would
> include in its return the updated guest instruction pointer (rip) and
> instruction length. This info will be stuffed back into
> vmcs/vmcb/stack
> as appropriate. Is this more or less what you have in mind ?
Yes, exactly. It gets a bit trickier though -- we'll have to fill the
buffer with up to 15 bytes. If we fail to get all of 15 bytes (perhaps
because the instruction straddles a page boundary and the second page
has been evicted since the instruction faulted) then we ought to tell
the emulator how many bytes we actually copied and it shoudl return
error if it ends up going off the end of teh instruction buffer.
Alternatively we could re-enter the guest immediately if we cannot read
15 bytes from the EIP -- but that'll cause an infinite loop if the
instruction itself doesn't straddle the page boundary as it won't
trigger paging in the guest. But I/O instructions are unlikely to be in
paged memory.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-05-02 12:36 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-28 9:19 [PATCH] Calculate correct instruction length for data-fault VM exits on VT-x systems Petersson, Mats
2006-04-28 9:24 ` Keir Fraser
2006-04-29 1:20 ` Leendert van Doorn
2006-04-28 20:02 ` Anthony Liguori
2006-04-29 8:00 ` Keir Fraser
2006-04-29 14:54 ` Leendert van Doorn
2006-04-29 10:39 ` Keir Fraser
2006-04-29 23:24 ` Leendert van Doorn
2006-04-29 18:54 ` Keir Fraser
2006-04-30 1:37 ` Leendert van Doorn
2006-04-29 19:46 ` Keir Fraser
-- strict thread matches above, loose matches on Subject: below --
2006-05-02 12:36 Petersson, Mats
2006-04-28 9:02 Petersson, Mats
2006-04-28 9:14 ` Keir Fraser
2006-04-28 1:52 Khoa Huynh
2006-04-28 2:41 ` Anthony Liguori
2006-04-28 6:03 ` Keir Fraser
2006-04-28 18:10 ` Khoa Huynh
2006-04-29 7:21 ` Keir Fraser
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.