From: "Egger, Christoph" <chegger@amazon.de>
To: "Zhang, Yang Z" <yang.z.zhang@intel.com>,
Jan Beulich <JBeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
Keir Fraser <keir@xen.org>, "Dong, Eddie" <eddie.dong@intel.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 5/5] x86/HVM: cache emulated instruction for retry processing
Date: Tue, 7 Jan 2014 10:43:43 +0100
Message-ID: <52CBCC4F.8080500@amazon.de>
In-Reply-To: <A9667DDFB95DB7438FA9D7D576C3D87E0A9A34C2@SHSMSX104.ccr.corp.intel.com>

On 07.01.14 09:54, Zhang, Yang Z wrote:
> Jan Beulich wrote on 2014-01-07:
>>>>> On 24.12.13 at 12:29, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>> Jan Beulich wrote on 2013-12-18:
>>>>>>> On 18.12.13 at 10:40, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>>>> Jan Beulich wrote on 2013-12-18:
>>>>>>>>> On 18.12.13 at 09:36, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>>>>>> Jan Beulich wrote on 2013-09-30:
>>>>>>>> Rather than re-reading the instruction bytes upon retry
>>>>>>>> processing, stash away and re-use what we already read.
>>>>>>>> That way we can be certain that the retry won't do something
>>>>>>>> different from what requested the retry, getting once again
>>>>>>>> closer to real hardware behavior (where what we use retries
>>>>>>>> for is simply a bus operation, not involving redundant
>>>>>>>> decoding of instructions).
>>>>>>>>
>>>>>>>
>>>>>>> This patch doesn't consider the nested case.
>>>>>>> For example, if the buffer holds an L2 instruction and we then
>>>>>>> vmexit to L1, L1 may use the wrong instruction.
>>>>>>
>>>>>> I'm having difficulty seeing how the two could get intermixed:
>>>>>> There should be, at any given point in time, at most one
>>>>>> instruction being emulated. Can you please give a more elaborate
>>>>>> explanation of the situation where you see a (theoretical?
>>>>>> practical?) problem?
>>>>>
>>>>> I saw this issue when booting an L1 Hyper-V. I added some debug
>>>>> output and observed this strange phenomenon:
>>>>>
>>>>> (XEN) write to buffer: eip 0xfffff8800430bc80, size 16,
>>>>> content:f7420c1f608488b 44000000011442c7
>>>>> (XEN) read from buffer: eip 0xfffff800002f6138, size 16,
>>>>> content:f7420c1f608488b 44000000011442c7
>>>>>
>>>>> From the log we can see different eips using the same buffer.
>>>>> Since I don't know how Hyper-V works, I cannot give more
>>>>> information on why this happens. And I only saw it with L1 Hyper-V
>>>>> (Xen on Xen and KVM on Xen don't have this issue).
>>>>
>>>> But in order to validate that the fix is (a) needed and (b)
>>>> correct, a proper understanding of the issue is necessary as the
>>>> very first step. That doesn't require knowing the internals of
>>>> Hyper-V; all you need is tracking of emulator state changes in Xen
>>>> (which I realize is easier said than done, since you want to make
>>>> sure you don't generate huge amounts of output before actually
>>>> hitting the issue, making it close to impossible to analyze).
>>>
>>> Ok. It is an old issue that is merely exposed by your patch:
>>> sometimes L0 needs to decode an L2 instruction to handle an IO
>>> access directly, for example when L1 passes a device through to L2
>>> (without VT-d). L0 may get X86EMUL_RETRY when handling this IO
>>> request. If, at the same time, a virtual vmexit is pending (for
>>> example, an interrupt to inject into L1), the hypervisor will
>>> switch the VCPU context from L2 to L1. We are then already in L1's
>>> context, but since we just got X86EMUL_RETRY, the hypervisor will
>>> retry handling the IO request later, and unfortunately the retry
>>> will happen in L1's context. Without your patch, L0 simply emulates
>>> an L1 instruction and everything goes on. With your patch, L0 will
>>> get the instruction from the buffer, which still belongs to L2, and
>>> the problem arises.
>>>
>>> So the fix is: while there is a pending IO request, no virtual
>>> vmexit/vmentry is allowed, which follows hardware's behavior:
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>>> index 41db52b..c5446a9 100644
>>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>>> @@ -1394,7 +1394,10 @@ void nvmx_switch_guest(void)
>>>      struct vcpu *v = current;
>>>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>>>      struct cpu_user_regs *regs = guest_cpu_user_regs();
>>> +    ioreq_t *p = get_ioreq(v);
>>>
>>> +    if ( p->state != STATE_IOREQ_NONE )
>>> +        return;
>>>      /*
>>>       * a softirq may interrupt us between a virtual vmentry is
>>>       * just handled and the true vmentry. If during this window,
>>
>> This change looks much more sensible; the question is whether simply
>> ignoring the switch request is acceptable (and I don't know the
>> nested HVM code well enough to tell). Furthermore I wonder whether
>> that's really a VMX-only issue.
>
> From hardware's point of view, an IO operation is handled atomically,
> so the problem can never happen there. But in Xen an IO operation is
> divided into several steps. Without nested virtualization, the VCPU
> is paused or loops until the IO emulation is finished. But in a
> nested environment, the VCPU will continue running even though the IO
> emulation is not finished. So my patch checks this and allows the
> VCPU to continue running only if there is no pending IO request,
> which matches hardware's behavior.
>
> I guess SVM also has this problem. But since I don't know nested SVM
> well, perhaps Christoph can help to double-check.

For SVM this issue was fixed with commit
d740d811925385c09553cbe6dee8e77c1d43b198, and there is a follow-up
cleanup commit ac97fa6a21ccd395cca43890bbd0bf32e3255ebb.

The change in nestedsvm.c in commit
d740d811925385c09553cbe6dee8e77c1d43b198 is actually not SVM-specific.
Move it into nhvm_interrupt_blocked() in hvm.c, right before

    return hvm_funcs.nhvm_intr_blocked(v);

and the fix applies to both SVM and VMX.
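
Roughly like this (an untested sketch only: it reuses the ioreq
pending check from Yang's patch above, whereas the actual check in
commit d740d811925385c09553cbe6dee8e77c1d43b198 tests the emulator's
internal I/O state, and the hvm_intblk value returned here is my
guess; the surrounding context in hvm.c may differ):

    /* xen/arch/x86/hvm/hvm.c -- illustrative sketch, not a literal patch */
    enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
    {
        /*
         * An unfinished I/O emulation must be retried in the vCPU
         * context it was started in, so report interrupts (and with
         * them any virtual vmexit/vmentry) as blocked until the
         * pending ioreq has completed.
         */
        if ( get_ioreq(v)->state != STATE_IOREQ_NONE )
            return hvm_intblk_shadow;

        return hvm_funcs.nhvm_intr_blocked(v);
    }

Being in the vendor-independent wrapper, the check then covers both
the SVM and the VMX paths at once.
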
Christoph