* Saving/Restoring IA32_TSC_AUX MSR
@ 2009-12-09 16:41 Nakajima, Jun
2009-12-09 16:59 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Nakajima, Jun @ 2009-12-09 16:41 UTC (permalink / raw)
To: xen-devel@lists.xensource.com; +Cc: Dan Magenheimer
I see code like the following in arch/x86/time.c, and I am wondering how the IA32_TSC_AUX MSR is saved/restored at domain switch time.
    if ( (d->arch.tsc_mode == TSC_MODE_PVRDTSCP) &&
         boot_cpu_has(X86_FEATURE_RDTSCP) )
        write_rdtscp_aux(d->arch.incarnation);
BTW,
include/asm-x86/msr.h
#define write_rdtscp_aux(val) wrmsr(0xc0000103, (val), 0)
We should write this as wrmsr(MSR_TSC_AUX, (val), 0) by adding
+#define MSR_TSC_AUX 0xc0000103 /* Auxiliary TSC */
to include/asm-x86/msr-index.h.
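A minimal sketch of that cleanup, runnable in user space. The wrmsr() here is a hypothetical stand-in that only records its arguments (the real one is a privileged instruction wrapper), and last_msr_write is an illustrative name that does not exist in the Xen tree.

```c
#include <assert.h>
#include <stdint.h>

/* The constant proposed above for include/asm-x86/msr-index.h. */
#define MSR_TSC_AUX 0xc0000103 /* Auxiliary TSC */

/* Hypothetical record of the most recent simulated MSR write. */
struct msr_write { uint32_t msr, lo, hi; };
static struct msr_write last_msr_write;

/* Stand-in for the privileged wrmsr(); it only models TSC_AUX. */
static void wrmsr(uint32_t msr, uint32_t lo, uint32_t hi)
{
    assert(msr == MSR_TSC_AUX);
    last_msr_write.msr = msr;
    last_msr_write.lo  = lo;
    last_msr_write.hi  = hi;
}

/* Same macro as in msr.h, with the magic number replaced by the name. */
#define write_rdtscp_aux(val) wrmsr(MSR_TSC_AUX, (val), 0)
```

Callers such as write_rdtscp_aux(d->arch.incarnation) would not need to change; only the definition does.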
Thanks,
Jun
---
Intel Open Source Technology Center
* RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-09 16:41 Saving/Restoring IA32_TSC_AUX MSR Nakajima, Jun
@ 2009-12-09 16:59 ` Dan Magenheimer
2009-12-09 17:07 ` Nakajima, Jun
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-09 16:59 UTC (permalink / raw)
To: Nakajima, Jun, xen-devel
Hi Jun --
Xen doesn't expose the rdtscp bit in cpuid, so it assumes
that no guests depend on it. So no save/restore of TSC_AUX
is necessary. Xen could provide support for the rdtscp bit
and allow a guest OS to manage TSC_AUX, but
the existing use of TSC_AUX by Linux would fail to
provide the desired result across migration, so there's
little point. Also, the pvrdtscp algorithm now assumes
that Xen itself is responsible for updating TSC_AUX
whenever a migration (across physical machines) occurs.
The #define for write_rdtscp_aux is from the Linux source,
so I left the code as is and did not define the constant.
Dan
* RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-09 16:59 ` Dan Magenheimer
@ 2009-12-09 17:07 ` Nakajima, Jun
2009-12-09 17:22 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Nakajima, Jun @ 2009-12-09 17:07 UTC (permalink / raw)
To: Dan Magenheimer, xen-devel@lists.xensource.com
Dan Magenheimer wrote on Wed, 9 Dec 2009 at 08:59:59:
> Hi Jun --
>
Dan,
> Xen doesn't expose the TSC rdtscp bit so assumes that
> no guests depend on it. So no save/restore of TSC_AUX
> is necessary. Xen could provide support for the TSC
But it's possible that multiple domains use the pvrdtscp algorithm, and the incarnation number is domain-specific. We will also have this issue when adding RDTSCP support for HVM guests.
> rdtscp bit and allow a guest OS to manage TSC_AUX, but
> the existing use of TSC_AUX by Linux would fail to
> provide the desired result across migration, so there's
> little point. Also the pvrdtscp algorithm now assumes
> that Xen itself is responsible for updating TSC_AUX
> whenever a migration (across physical machines) occurs.
>
> The #define for write_rdtscp_aux is from Linux source,
> so I didn't change the code and define the constant.
>
> Dan
Jun
___
Intel Open Source Technology Center
* RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-09 17:07 ` Nakajima, Jun
@ 2009-12-09 17:22 ` Dan Magenheimer
2009-12-10 11:21 ` Xu, Dongxiao
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-09 17:22 UTC (permalink / raw)
To: Nakajima, Jun, xen-devel
Hi Jun --
> But it's possible that multiple domains use the pvrdtscp
> algorithm, and the incarnation number is domain specific.
OK, I see. The code for writing TSC_AUX is in
__update_vcpu_system_time(), not in the context switch path.
> We also have the issue when adding RDTSCP support for
> HVM guests.
Only if you expose the rdtscp bit via cpuid. This could
certainly be done but, as I said, is probably pointless.
(The pvrdtscp algorithm uses the instruction whether or
not the rdtscp bit is set in cpuid, since Xen emulates
it -- for PV domains only now -- if the physical machine
doesn't support the instruction.)
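For reference, the cpuid bit in question is bit 27 of EDX in extended leaf 0x80000001. A small sketch of testing it and of hiding it from a guest's view follows; the helper names are hypothetical and are not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 27 reports RDTSCP support. */
#define CPUID_EXT_EDX_RDTSCP (1u << 27)

/* True if the given EDX value advertises rdtscp. */
static bool edx_has_rdtscp(uint32_t edx)
{
    return (edx & CPUID_EXT_EDX_RDTSCP) != 0;
}

/* Hypothetical helper: the EDX value a guest would see when the
 * hypervisor keeps the rdtscp bit turned off, whatever the host has. */
static uint32_t guest_edx_hide_rdtscp(uint32_t host_edx)
{
    uint32_t guest_edx = host_edx & ~CPUID_EXT_EDX_RDTSCP;

    assert(!edx_has_rdtscp(guest_edx));
    return guest_edx;
}
```

Clearing the bit only changes what the guest is told; whether the instruction itself executes or traps is a separate matter, which is the point made above.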
Dan
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-09 17:22 ` Dan Magenheimer
@ 2009-12-10 11:21 ` Xu, Dongxiao
2009-12-10 15:49 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Xu, Dongxiao @ 2009-12-10 11:21 UTC (permalink / raw)
To: Dan Magenheimer, Nakajima, Jun, xen-devel@lists.xensource.com,
Keir Fraser
Hi, Dan,
I am now trying to add rdtscp support for Xen HVM guests.
I have some questions about your pvrdtscp patch. See below.
Dan Magenheimer wrote:
> Hi Jun --
>
>> But it's possible that multiple domains use the pvrdtscp
>> algorithm, and the incarnation number is domain specific.
>
> OK, I see. The code for writing TSC_AUX is in
> __update_vcpu_system_time() not in context switch.
Will you change where the hypervisor writes the TSC_AUX MSR?
With the current pvrdtscp logic, I think this MSR should be written at
vcpu context switch. This would also make HVM support much easier,
because the MSR would not be modified by the hypervisor from time to time.
>
>> We also have the issue when adding RDTSCP support for
>> HVM guests.
>
> Only if you expose the rdtscp bit via cpuid. This could
> certainly be done but, as I said, is probably pointless.
> (The pvrdtscp algorithm uses the instruction whether or
> not the rdtscp bit is set in cpuid, since Xen emulates
> it -- for PV domains only now -- if the physical machine
> doesn't support the instruction.)
We are planning to add HVM support for RDTSCP, and the behavior of this instruction
will follow native behavior.
This causes a problem: an application using the RDTSCP instruction will see different
behavior in PV and HVM domains. Do you have any comments on this?
Thanks!
Dongxiao
>
> Dan
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-10 11:21 ` Xu, Dongxiao
@ 2009-12-10 15:49 ` Dan Magenheimer
2009-12-11 1:22 ` Xu, Dongxiao
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-10 15:49 UTC (permalink / raw)
To: Xu, Dongxiao, Nakajima, Jun, xen-devel, Keir Fraser
Hi Dongxiao --
There are two approaches to adding rdtscp support:
1) Faithful full implementation of rdtscp instruction
2) Support pvrdtscp algorithm
For (1), you would enable the rdtscp bit in cpuid. Then
on hardware that supports rdtscp, you would do context
switching of TSC_AUX. On hardware that doesn't support
rdtscp, you would intercept the illegal instruction trap
and emulate the instruction. (TSC_AUX emulation
could be handled "lazily", no need to do context
switch for that.)
BUT if you look at how TSC_AUX is used by a native
OS**, the OS sets TSC_AUX to each physical CPU number
so an application can easily determine if successive
rdtscp instructions were not executed on the same
processor. (This was important on older processors
that did not have invariant TSC.) Unfortunately,
on Xen, this mechanism is worthless and misleading
because the OS believes it is setting TSC_AUX to
a physical CPU number but it is actually setting
it to a virtual CPU number, and the physical CPU
number may change at any time due to scheduling
or migration. So an app using rdtscp will get a
wrong answer.
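The native usage described above can be sketched as follows. This is only an illustration: the sample struct and same_cpu_delta() are hypothetical names, and on real hardware the two fields would come from one rdtscp execution (for example via the compiler's __rdtscp() intrinsic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One rdtscp result: the counter value plus the TSC_AUX contents. */
struct rdtscp_sample {
    uint64_t tsc;
    uint32_t aux; /* on a native OS: the CPU number set by the kernel */
};

/* Returns true and stores the cycle delta only if both samples carry
 * the same TSC_AUX value, i.e. the app believes they came from the
 * same processor. Returns false if the values differ. */
static bool same_cpu_delta(struct rdtscp_sample first,
                           struct rdtscp_sample second,
                           uint64_t *delta)
{
    assert(delta != NULL);
    if (first.aux != second.aux)
        return false;
    *delta = second.tsc - first.tsc;
    return true;
}
```

Under Xen with a guest-managed TSC_AUX, both samples would carry the same virtual CPU number even if the vcpu moved to another physical CPU in between, so the check would pass and the delta could be wrong.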
As a result, I do NOT recommend (1) and do recommend
that Xen should continue to return zero for the rdtscp
bit in cpuid.
For (2), setting TSC_AUX in __update_vcpu_system_time()
is fine (I think). On hardware that supports rdtscp, for HVM
you would need to ensure that the rdtscp instruction
works natively (even though the rdtscp bit in cpuid
is not turned on for the guest). On hardware that
does not support rdtscp, you would intercept the illegal
instruction trap and call the existing code in
pv_soft_rdtsc().
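A rough sketch of how an application might consume the incarnation number under approach (2), assuming it caches per-incarnation time parameters and refreshes them when TSC_AUX changes. The thread does not say how the parameters are obtained, so that step is a hypothetical callback, and every name below is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters valid for one domain incarnation. */
struct pv_time_params {
    uint32_t incarnation;
    uint64_t tsc_offset;
};

/* Application-side cache plus a counter of refreshes. */
struct pv_clock {
    struct pv_time_params cached;
    unsigned refreshes;
};

/* Simulated source of current parameters (stands in for whatever
 * interface the hypervisor would actually provide). */
static struct pv_time_params fake_hv_params;

static struct pv_time_params example_fetch(void)
{
    return fake_hv_params;
}

/* Given one rdtscp result (tsc, aux), refresh the cache if the
 * incarnation in aux differs, then apply the cached offset. */
static uint64_t pv_clock_read(struct pv_clock *c, uint64_t tsc, uint32_t aux,
                              struct pv_time_params (*fetch)(void))
{
    assert(c != NULL && fetch != NULL);
    if (aux != c->cached.incarnation) {
        c->cached = fetch();
        c->refreshes++;
    }
    return tsc - c->cached.tsc_offset;
}
```

In this sketch the common case costs one comparison per read; the refresh only happens after the incarnation changes, which per the thread is on save/restore/migration.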
Does that make sense?
Thanks,
Dan
** I've looked at RHEL5. Windows actually always
returns 0 for TSC_AUX.
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-10 15:49 ` Dan Magenheimer
@ 2009-12-11 1:22 ` Xu, Dongxiao
2009-12-11 2:00 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Xu, Dongxiao @ 2009-12-11 1:22 UTC (permalink / raw)
To: Dan Magenheimer, Nakajima, Jun, xen-devel@lists.xensource.com,
Keir Fraser
Dan,
Thanks for the reply; some comments below.
Best Regards,
-- Dongxiao
Dan Magenheimer wrote:
> Hi Dongxiao --
>
> There are two approaches to adding rdtscp support:
>
> 1) Faithful full implementation of rdtscp instruction
> 2) Support pvrdtscp algorithm
>
> For (1), you would enable the rdtscp bit in cpuid. Then
> on hardware that supports rdtscp, you would do context
> switching of TSC_AUX. On hardware that doesn't support
> rdtscp, you would intercept the illegal instruction trap
> and emulate the instruction. (TSC_AUX emulation
> could be handled "lazily", no need to do context
> switch for that.)
>
> BUT if you look at how TSC_AUX is used by a native
> OS**, the OS sets TSC_AUX to each physical CPU number
> so an application can easily determine if successive
> rdtscp instructions were not executed on the same
> processor. (This was important on older processors
> that did not have invariant TSC.) Unfortunately,
> on Xen, this mechanism is worthless and misleading
> because the OS believes it is setting TSC_AUX to
> a physical CPU number but it is actually setting
> it to a virtual CPU number, and the physical CPU
> number may change at any time due to scheduling
> or migration. So an app using rdtscp will get a
> wrong answer.
However, for HVM we should keep the behavior the same as
on a native machine. So if the hardware supports rdtscp, we will also
support it in HVM; if not, we will not expose that bit in cpuid
to the guest.
>
> As a result, I do NOT recommend (1) and do recommend
> that Xen should continue to return zero for the rdtscp
> bit in cpuid.
>
> For (2), setting TSC_AUX in __update_vcpu_system_time()
> is fine (I think). On hardware that supports rdtscp, for HVM
> you would need to ensure that the rdtscp instruction
> works natively (even though the rdtscp bit in cpuid
> is not turned on for the guest). On hardware that
> does not support rdtscp, you would intercept the illegal
> instruction trap and call the existing code in
> pv_soft_rdtsc().
Writing the TSC_AUX MSR in __update_vcpu_system_time() has a
problem: the hypervisor will overwrite the value from time to time
(for example, at do_softirq()->local_time_calibration()), even if the
value hasn't changed (currently the domain incarnation value only
increases at save/restore/migration). This makes HVM support a bit
tricky because we need to save/restore the guest/host TSC_AUX at every
VMEXIT/VMENTRY. If both PV and HVM could do the TSC_AUX write in
context_switch(), then things would become easier for HVM support.
Do you have any thoughts on this? Thanks! :-)
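The repeated-write concern can be illustrated with a small user-space simulation. The first function mirrors the shape of the check quoted at the start of the thread; the second is a hypothetical variant, not something proposed in this thread, that skips the write when the value is already in place. The MSR is a plain variable here and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t msr_tsc_aux; /* simulated TSC_AUX contents */
static unsigned msr_writes;  /* number of simulated MSR writes */

static void write_rdtscp_aux_sim(uint32_t val)
{
    msr_tsc_aux = val;
    msr_writes++;
}

/* Simplified stand-in for the domain fields used by the check. */
struct domain_sim {
    bool pvrdtscp;        /* tsc_mode == TSC_MODE_PVRDTSCP */
    uint32_t incarnation;
};

/* Writes on every call, as a periodic time update would. */
static void update_time_unconditional(const struct domain_sim *d,
                                      bool cpu_has_rdtscp)
{
    assert(d != NULL);
    if (d->pvrdtscp && cpu_has_rdtscp)
        write_rdtscp_aux_sim(d->incarnation);
}

/* Hypothetical variant: write only when the value would change. */
static void update_time_if_changed(const struct domain_sim *d,
                                   bool cpu_has_rdtscp)
{
    assert(d != NULL);
    if (d->pvrdtscp && cpu_has_rdtscp && msr_tsc_aux != d->incarnation)
        write_rdtscp_aux_sim(d->incarnation);
}
```

On real hardware the comparison would have to use a per-CPU cached copy rather than the MSR itself, and that cache would not be trustworthy once an HVM guest is allowed to write TSC_AUX directly, so this does not remove the save/restore question raised above.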
>
> Does that make sense?
>
> Thanks,
> Dan
>
> ** I've looked at RHEL5. Windows actually always
> returns 0 for TSC_AUX.
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 1:22 ` Xu, Dongxiao
@ 2009-12-11 2:00 ` Dan Magenheimer
2009-12-11 8:03 ` Keir Fraser
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 2:00 UTC (permalink / raw)
To: Xu, Dongxiao, Nakajima, Jun, xen-devel, Keir Fraser; +Cc: Dugger, Donald D
> However, for HVM we should keep the behavior the same as
> on a native machine. So if the hardware supports rdtscp, we will also
> support it in HVM; if not, we will not expose that bit in cpuid
> to the guest.
As I said, I think this is a very bad idea because there
is no way to ensure the behavior of an app/OS in a VM
gives the same results as on a physical machine.
So I think the cpuid rdtscp bit should always be off.
> increases at save/restore/migration). This makes HVM support a bit
> tricky because we need to save/restore the guest/host TSC_AUX at every
> VMEXIT/VMENTRY. If both PV and HVM could do the TSC_AUX write in
> context_switch(), then things would become easier for HVM support.
If you are doing a full faithful implementation of
rdtscp (as if cpuid rdtscp bit is on), I agree this
is a problem. If not, and the only use of TSC_AUX
is for the pvrdtscp algorithm, I think setting
TSC_AUX in __update_vcpu_system_time() is fine
because TSC_AUX is not part of a VM's context,
it is a communication of information from system
software (Xen) to applications.
I expect that Keir will not support putting TSC_AUX
in the context switch code unless it is absolutely
necessary, as it is certainly expensive to read and
write to TSC_AUX and this cost will add to every
context switch of every VM even though very few will
actually use rdtscp/TSC_AUX.
So I think we need to decide first about approach (1),
the full faithful implementation of rdtscp.
> >>>>>> #define write_rdtscp_aux(val) wrmsr(0xc0000103, (val), 0)
> >>>>>>
> >>>>>> We should write like wrmsr(MSR_TSC_AUX, (val), 0) by adding
> >>>>>> +#define MSR_TSC_AUX 0xc0000103 /* Auxiliary TSC */
> >>>>>> in include/asm-x86/msr-index.h
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Jun
> >>>>>> ---
> >>>>>> Intel Open Source Technology Center
> >>>>>>
> >>>>>>
> >>>>
> >>>> Jun
> >>>> ___
> >>>> Intel Open Source Technology Center
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@lists.xensource.com
> >>> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 2:00 ` Dan Magenheimer
@ 2009-12-11 8:03 ` Keir Fraser
2009-12-11 8:43 ` Zhang, Xiantao
0 siblings, 1 reply; 30+ messages in thread
From: Keir Fraser @ 2009-12-11 8:03 UTC (permalink / raw)
To: Dan Magenheimer, Xu, Dongxiao, Nakajima, Jun,
xen-devel@lists.xensource.com
Cc: Dugger, Donald D
On 11/12/2009 02:00, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> I expect that Keir will not support putting TSC_AUX
> in the context switch code unless it is absolutely
> necessary, as it is certainly expensive to read and
> write to TSC_AUX and this cost will add to every
> context switch of every VM even though very few will
> actually use rdtscp/TSC_AUX.
Well, you'd make it dependent on the guest using TSC_AUX, I suppose. I think
that's going to be pretty rare.
> So I think we need to decide first about approach (1),
> the full faithful implementation of rdtscp.
The question has to be: what win do we get for faithful virtualisation of
RDTSCP in a virtualised environment? Supporting CPU instructions just
because they're there is not a useful effort.
-- Keir
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 8:03 ` Keir Fraser
@ 2009-12-11 8:43 ` Zhang, Xiantao
2009-12-11 9:22 ` Keir Fraser
0 siblings, 1 reply; 30+ messages in thread
From: Zhang, Xiantao @ 2009-12-11 8:43 UTC (permalink / raw)
To: Keir Fraser, Dan Magenheimer, Xu, Dongxiao, Nakajima, Jun,
xen-devel
Cc: Dugger, Donald D
Keir Fraser wrote:
> On 11/12/2009 02:00, "Dan Magenheimer" <dan.magenheimer@oracle.com>
> wrote:
>
>> I expect that Keir will not support putting TSC_AUX
>> in the context switch code unless it is absolutely
>> necessary, as it is certainly expensive to read and
>> write to TSC_AUX and this cost will add to every
>> context switch of every VM even though very few will
>> actually use rdtscp/TSC_AUX.
>
> Well, you'd make it dependent on the guest using TSC_AUX, I suppose.
> I think that's going to be pretty rare.
>
>> So I think we need to decide first about approach (1),
>> the full faithful implementation of rdtscp.
>
> The question has to be: what win do we get for faithful
> virtualisation of RDTSCP in a virtualised environment? Supporting CPU
> instructions just because they're there is not a useful effort.
As far as I know, RDTSCP can be used to implement fast vgetcpu in newer Linux kernels. The current node and cpu info is saved in the MSR, and applications or libraries can read it from ring 3 through this instruction. If this instruction is enabled for VMX non-root mode, it should benefit these kernels, I think.
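A minimal sketch of the node/cpu packing mentioned above. It assumes the layout Linux x86-64 used for vgetcpu at the time (cpu in the low 12 bits, node in the bits above); the thread itself does not spell out the layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not stated in this thread): low 12 bits hold the CPU
 * number, the remaining upper bits hold the NUMA node. */
#define AUX_CPU_BITS 12
#define AUX_CPU_MASK ((1u << AUX_CPU_BITS) - 1)

/* Pack node and cpu into the value a guest kernel would write to TSC_AUX.
 * Hypothetical helper name, for illustration only. */
static inline uint32_t tsc_aux_encode(uint32_t node, uint32_t cpu)
{
    assert(cpu <= AUX_CPU_MASK);
    assert(node <= (UINT32_MAX >> AUX_CPU_BITS));
    return (node << AUX_CPU_BITS) | cpu;
}

/* Split the ECX value returned by RDTSCP back into node and cpu. */
static inline void tsc_aux_decode(uint32_t aux, uint32_t *node, uint32_t *cpu)
{
    *cpu = aux & AUX_CPU_MASK;
    *node = aux >> AUX_CPU_BITS;
}
```

Userland would obtain the aux value from RDTSCP's ECX output; in an HVM guest the numbers decoded this way are the guest's virtual cpu and node.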
Thanks!
Xiantao
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 8:43 ` Zhang, Xiantao
@ 2009-12-11 9:22 ` Keir Fraser
2009-12-11 15:09 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Keir Fraser @ 2009-12-11 9:22 UTC (permalink / raw)
To: Zhang, Xiantao, Dan Magenheimer, Xu, Dongxiao, Nakajima, Jun,
xen-devel
Cc: Dugger, Donald D
On 11/12/2009 08:43, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> The question has to be: what win do we get for faithful
>> virtualisation of RDTSCP in a virtualised environment? Supporting CPU
>> instructions just because they're there is not a useful effort.
>
> As I know, RDTSCP can be used to implement fast vgetcpu in newer Linux kernel.
> Current node and cpu info is saved in the MSR, and applications or libraries
> can get this info at ring3 through this instruction. If enable this
> instruction for vmx non-root mode, it should benefit these kernels I think.
Sounds reasonable. Obviously this will be incompatible with pvrdtscp, but
the latter is off by default so this isn't too serious a problem, I think.
Pvrdtscp will simply trump ordinary RDTSCP emulation when it is enabled.
You can put your meddling with TSC_AUX MSR in the context-switch path,
regardless of whether pvrdtscp's stays in __update_vcpu_system_time().
In short: have at it.
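
A rough sketch of what a conditional TSC_AUX update on the context-switch path might look like, so that guests which never use TSC_AUX pay nothing. This is not Xen code: the struct, its fields, and fake_wrmsr() are hypothetical stand-ins, and the MSR write is only recorded rather than executed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_TSC_AUX 0xc0000103u  /* value quoted in the first message of this thread */

/* Hypothetical stand-in for the privileged WRMSR; it only records the call. */
static unsigned int wrmsr_calls;
static uint32_t hw_tsc_aux;      /* models the physical CPU's TSC_AUX */

static void fake_wrmsr(uint32_t msr, uint32_t val)
{
    if (msr == MSR_TSC_AUX) {
        hw_tsc_aux = val;
        wrmsr_calls++;
    }
}

/* Hypothetical per-vcpu state; field names are illustrative only. */
struct vcpu_sketch {
    bool     uses_tsc_aux;  /* guest has RDTSCP exposed and manages TSC_AUX */
    uint32_t tsc_aux;       /* value the guest last wrote */
};

/* Called on the context-switch path when 'next' is about to run. */
static void ctxt_switch_tsc_aux(const struct vcpu_sketch *next)
{
    if (!next->uses_tsc_aux)
        return;             /* guest does not use TSC_AUX: no MSR access */
    if (hw_tsc_aux == next->tsc_aux)
        return;             /* value already loaded: skip the write */
    fake_wrmsr(MSR_TSC_AUX, next->tsc_aux);
}
```

Whether comparing against a cached value is worthwhile depends on the real cost of the MSR write, which this thread asks to have measured.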
-- Keir
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 9:22 ` Keir Fraser
@ 2009-12-11 15:09 ` Dan Magenheimer
2009-12-11 15:28 ` Xu, Dongxiao
2009-12-11 18:20 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 15:09 UTC (permalink / raw)
To: Keir Fraser, Zhang, Xiantao, Xu, Dongxiao, Nakajima, Jun,
xen-devel
Cc: Dugger, Donald D
> As I know, RDTSCP can be used to implement fast vgetcpu in
> newer Linux kernel.
Yes, but code which uses fast vgetcpu is expecting
to get physical cpu and physical node number. Since
an HVM guest OS only has access to virtual cpu and
virtual node number, the information written to TSC_AUX
by a guest OS is misleading and may silently break any
userland code that assumes it is getting physical
information.
I continue to think this is a bad idea and, to use Keir's
words, is "Supporting CPU instructions just because
they're there".
But, if I am overruled, I'd like to see some measurement
of the cycle cost for writing to TSC_AUX. Since
Linux only writes it once at __cpuinit time, I wouldn't
be surprised to find out that it is horribly slow
and adding it to every context switch would be slowing
down all users of Xen for a handful of applications --
that are getting incorrect information (vcpu vs pcpu)
anyway.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 15:09 ` Dan Magenheimer
@ 2009-12-11 15:28 ` Xu, Dongxiao
2009-12-11 16:12 ` Dan Magenheimer
2009-12-11 18:20 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 30+ messages in thread
From: Xu, Dongxiao @ 2009-12-11 15:28 UTC (permalink / raw)
To: Dan Magenheimer, Keir Fraser, Zhang, Xiantao, Nakajima, Jun,
xen-devel@
Cc: Dugger, Donald D
Dan Magenheimer wrote:
>> As I know, RDTSCP can be used to implement fast vgetcpu in
>> newer Linux kernel.
>
> Yes, but code which uses fast vgetcpu is expecting
> to get physical cpu and physical node number. Since
> an HVM guest OS only has access to virtual cpu and
> virtual node number, the information written to TSC_AUX
> by a guest OS is misleading and may silently break any
> userland code that assumes it is getting physical
> information.
This depends on how the node info is virtualized.
If the virtual node reflects the physical
node info, what rdtscp returns is valuable to applications.
>
> I continue to think this is a bad idea and, to use Keir's
> words, is "Supporting CPU instructions just because
> they're there".
>
> But, if I am overruled, I'd like to see some measurement
> of the cycle cost for writing to TSC_AUX. Since
> Linux only writes it once at __cpuinit time, I wouldn't
> be surprised to find out that it is horribly slow
> and adding it to every context switch would be slowing
> down all users of Xen for a handful of applications --
> that are getting incorrect information (vcpu vs pcpu)
> anyway.
According to the current pvrdtscp logic, write_rdtscp_aux()
is called on every scheduling pass ( schedule()->
update_vcpu_system_time()->__update_vcpu_system_time()->
write_rdtscp_aux() ), which is more frequent than
__context_switch().
Best Regards,
-- Dongxiao
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 15:28 ` Xu, Dongxiao
@ 2009-12-11 16:12 ` Dan Magenheimer
2009-12-11 18:38 ` Nakajima, Jun
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 16:12 UTC (permalink / raw)
To: Xu, Dongxiao, Keir Fraser, Zhang, Xiantao, Nakajima, Jun,
xen-devel
Cc: Dugger, Donald D
> > Yes, but code which uses fast vgetcpu is expecting
> > to get physical cpu and physical node number. Since
> > an HVM guest OS only has access to virtual cpu and
> > virtual node number, the information written to TSC_AUX
> > by a guest OS is misleading and may silently break any
> > userland code that assumes it is getting physical
> > information.
>
> This depends on how the node info is virtualized.
> If the virtual node could reflect the physical
> node info, what rdtscp returns is valuable to applications.
If it is possible to ensure that the cpu/node info
is virtualized so that TSC_AUX always correctly provides the
information needed by apps, I agree this would be
valuable. I don't see how this is possible, but maybe
you have some creative ideas?
> According to the current PVRDTSC logic, write_rdtscp_aux()
> is called in each scheduling ( schedule()->
> update_vcpu_system_time()->__update_vcpu_system_time()->
> write_rdtscp_aux() ), which is more frequent than
> __context_switch().
OK, I see. Then I am OK with moving the call to write_rdtscp_aux().
Dan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 15:09 ` Dan Magenheimer
2009-12-11 15:28 ` Xu, Dongxiao
@ 2009-12-11 18:20 ` Jeremy Fitzhardinge
2009-12-11 18:35 ` Dan Magenheimer
1 sibling, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-11 18:20 UTC (permalink / raw)
To: Dan Magenheimer
Cc: xen-devel, Dugger, Donald D, Xu, Dongxiao, Keir Fraser,
Nakajima, Jun, Zhang, Xiantao
On 12/11/09 07:09, Dan Magenheimer wrote:
>> As I know, RDTSCP can be used to implement fast vgetcpu in
>> newer Linux kernel.
>>
> Yes, but code which uses fast vgetcpu is expecting
> to get physical cpu and physical node number. Since
> an HVM guest OS only has access to virtual cpu and
> virtual node number, the information written to TSC_AUX
> by a guest OS is misleading and may silently break any
> userland code that assumes it is getting physical
> information.
>
It will fall back to using the segment limit trick to get vcpu+vnode
info if rdtscp isn't available, so they'll get the info either way.
It's not clear how many apps make good use of the numa node info, but
presumably some do. So long as the virtual numa info bears some vague
resemblance to the real topology then they could still make use of it in
a Xen domain. Whether or not Xen currently implements that is a
separate question.
However, the vcpu number is definitely useful to usermode apps, so they
can get some idea how they're moved between (v)cpus. I don't think it
will matter to them that it isn't pcpu.
J
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 18:20 ` Jeremy Fitzhardinge
@ 2009-12-11 18:35 ` Dan Magenheimer
2009-12-11 18:50 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 18:35 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: xen-devel, Dugger, Donald D, Xu, Dongxiao, Keir Fraser,
Nakajima, Jun, Zhang, Xiantao
> However, the vcpu number is definitely useful to usermode
> apps, so they
> can get some idea how they're moved between (v)cpus. I don't
> think it
> will matter to them that it isn't pcpu.
My point is that an app running on native Linux can
safely assume that, if TSC_AUX==3 at time T1 and
TSC_AUX is still 3 at time T2, it is running
on the same processor and the same node at both T1
and T2. In a virtual environment it cannot even
assume it is running on the same machine.
Further if the app sees that TSC_AUX==2 at time T3
and TSC_AUX==3 at time T4, on native Linux it
can safely assume that it is running on a different
processor. While rarer, in a virtual environment,
this may also be a false assumption.
That's why I say the information is misleading.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 16:12 ` Dan Magenheimer
@ 2009-12-11 18:38 ` Nakajima, Jun
2009-12-11 19:46 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Nakajima, Jun @ 2009-12-11 18:38 UTC (permalink / raw)
To: Dan Magenheimer, Xu, Dongxiao, Keir Fraser, Zhang, Xiantao,
xen-devel
Cc: Dugger, Donald D
Dan Magenheimer wrote on Fri, 11 Dec 2009 at 08:12:00:
>>> Yes, but code which uses fast vgetcpu is expecting
>>> to get physical cpu and physical node number. Since
>>> an HVM guest OS only has access to virtual cpu and
>>> virtual node number, the information written to TSC_AUX
>>> by a guest OS is misleading and may silently break any
>>> userland code that assumes it is getting physical
>>> information.
>>
>> This depends on how the node info is virtualized.
>> If the virtual node could reflect the physical
>> node info, what rdtscp returns is valuable to applications.
>
> If it is possible to ensure that the cpu/node info
> is virtualized so that TSC_AUX always correctly provides the
> information needed by apps, I agree this would be
> valuable. I don't see how this is possible, but maybe
> you have some creative ideas?
It's possible, and it's the way guest NUMA is supposed to be. We are working on that.
Jun
___
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 18:35 ` Dan Magenheimer
@ 2009-12-11 18:50 ` Jeremy Fitzhardinge
2009-12-11 19:29 ` Nakajima, Jun
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-11 18:50 UTC (permalink / raw)
To: Dan Magenheimer
Cc: xen-devel, Dugger, Donald D, Xu, Dongxiao, Keir Fraser,
Nakajima, Jun, Zhang, Xiantao
On 12/11/09 10:35, Dan Magenheimer wrote:
>> However, the vcpu number is definitely useful to usermode
>> apps, so they
>> can get some idea how they're moved between (v)cpus. I don't
>> think it
>> will matter to them that it isn't pcpu.
>>
> My point is that an app running on native Linux can
> safely assume that, if TSC_AUX==3 at time T1 and
> TSC_AUX is still 3 at time T2, it is running
> on the same processor and the same node at both T1
> and T2. In a virtual environment it cannot even
> assume it is running on the same machine.
> Further if the app sees that TSC_AUX==2 at time T3
> and TSC_AUX==3 at time T4, on native Linux it
> can safely assume that it is running on a different
> processor. While rarer, in a virtual environment,
> this may also be a false assumption.
>
> That's why I say the information is misleading.
>
Sure, but that info is, at best, of heuristic value, and won't cause any
correctness problems if it is wrong. The performance may suck, but
that's part of the larger problem of running NUMA-aware code in a
virtual environment.
J
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 18:50 ` Jeremy Fitzhardinge
@ 2009-12-11 19:29 ` Nakajima, Jun
2009-12-11 22:23 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Nakajima, Jun @ 2009-12-11 19:29 UTC (permalink / raw)
To: Jeremy Fitzhardinge, Dan Magenheimer
Cc: Xu, Dongxiao, xen-devel@lists.xensource.com, Dugger, Donald D,
Keir Fraser, Zhang, Xiantao
Jeremy Fitzhardinge wrote on Fri, 11 Dec 2009 at 10:50:29:
> On 12/11/09 10:35, Dan Magenheimer wrote:
>>> However, the vcpu number is definitely useful to usermode
>>> apps, so they
>>> can get some idea how they're moved between (v)cpus. I don't
>>> think it
>>> will matter to them that it isn't pcpu.
>>>
>> My point is that an app running on native Linux can
>> safely assume that, if TSC_AUX==3 at time T1 and
>> TSC_AUX is still 3 at time T2, it is running
>> on the same processor and the same node at both T1
>> and T2. In a virtual environment it cannot even
>> assume it is running on the same machine.
>> Further if the app sees that TSC_AUX==2 at time T3
>> and TSC_AUX==3 at time T4, on native Linux it
>> can safely assume that it is running on a different
>> processor. While rarer, in a virtual environment,
>> this may also be a false assumption.
>>
>> That's why I say the information is misleading.
>>
> Sure, but that info is, at best, of heuristic value, and won't cause
> any correctness problems if it is wrong. The performance may suck, but
> that's part of the larger problem of running NUMA-aware code in a
> virtual environment.
>
And to utilize various NUMA optimizations in the kernel/apps in the guest, we need "the virtual numa info bears some vague resemblance to the real topology" (from Jeremy's email) with the vcpus bound to the CPU/node.
I understand that enabling RDTSCP in HVM will disable the pvrdtscp algorithm if used by the kernel. One way is to mask off the feature in CPUID (by default). Then the kernel won't use it.
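As an illustration of masking the feature off in CPUID, the sketch below clears the RDTSCP bit (leaf 0x80000001, EDX bit 27) from the value a guest would see. The filter function and its expose_rdtscp parameter are hypothetical; this is not how Xen's CPUID policy code is actually structured.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 27 advertises RDTSCP. */
#define CPUID_EXT_FEATURES_LEAF 0x80000001u
#define CPUID_EDX_RDTSCP        (1u << 27)

/* Hypothetical filter applied to the EDX value before the guest sees it.
 * With expose_rdtscp == false (the assumed default), the bit is hidden
 * and a well-behaved kernel will not use the instruction. */
static uint32_t filter_guest_cpuid_edx(uint32_t leaf, uint32_t edx,
                                       bool expose_rdtscp)
{
    if (leaf == CPUID_EXT_FEATURES_LEAF && !expose_rdtscp)
        edx &= ~CPUID_EDX_RDTSCP;
    return edx;
}
```

Hiding the bit only affects software that checks CPUID first; code that executes RDTSCP unconditionally still needs emulation or a fault.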
Jun
___
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 18:38 ` Nakajima, Jun
@ 2009-12-11 19:46 ` Dan Magenheimer
0 siblings, 0 replies; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 19:46 UTC (permalink / raw)
To: Nakajima, Jun, Xu, Dongxiao, Keir Fraser, Zhang, Xiantao,
xen-devel
Cc: Dugger, Donald D
> It's possible, and it's the way guest NUMA is supposed to be. We are
> working on that.
I'd be very interested in learning more about your plans.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 19:29 ` Nakajima, Jun
@ 2009-12-11 22:23 ` Dan Magenheimer
2009-12-11 22:58 ` Nakajima, Jun
2009-12-13 9:17 ` Zhang, Xiantao
0 siblings, 2 replies; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 22:23 UTC (permalink / raw)
To: Nakajima, Jun, Jeremy Fitzhardinge
Cc: Xu, Dongxiao, xen-devel, Dugger, Donald D, Keir Fraser,
Zhang, Xiantao
Well, although it might be nice to be able to use
rdtscp and TSC_AUX to determine pcpu/vcpu/pnode/vnode
information, I think Jeremy and Jan convinced me in
another thread a couple of months ago that in userland:
x = vgetcpu();
do_other_stuff();
y = vgetcpu();
if x==1 and y==2, there's no way to determine that
do_other_stuff() was executed on cpu 1 vs cpu 2,
or (though unlikely) even on cpu 3. And if
x==y==4, there's no guarantee that do_other_stuff()
is executed on cpu 4.
If this is true the only safe use of TSC_AUX is for
its originally designed intent: To determine if two
successive rdtscp instructions were or were not
executed on the same processor. Since this cannot
be guaranteed in a VM, that's a reasonable argument
that TSC_AUX shouldn't be exposed at all (meaning the
rdtscp bit in cpuid should be turned off by Xen).
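
The "two successive rdtscp instructions" check can be sketched as follows, assuming each sample holds the TSC value and the TSC_AUX value returned together by one RDTSCP. The struct and function names are hypothetical. Note that matching aux values only suggest the same processor on native hardware; per the argument above, a VM gives no such guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

struct tsc_sample {
    uint64_t tsc;   /* EDX:EAX from one RDTSCP */
    uint32_t aux;   /* ECX from the same RDTSCP, i.e. TSC_AUX */
};

/* Store the elapsed ticks in *delta and return true only if both samples
 * carry the same aux value and the TSC did not go backwards. Otherwise
 * the interval is considered unusable and *delta is left untouched. */
static bool tsc_delta_same_cpu(struct tsc_sample start, struct tsc_sample end,
                               uint64_t *delta)
{
    if (start.aux != end.aux || end.tsc < start.tsc)
        return false;
    *delta = end.tsc - start.tsc;
    return true;
}
```

A caller that gets false would typically discard the measurement and retry.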
True, as long as the information is ONLY used
heuristically to obtain pcpu/vcpu/pnode/vnode info,
and no guarantee of correctness is implied or expected,
it might be useful some of the time.
But frankly, if "performance sucks" when the heuristic
fails due to the fact that the app is running on
a VM instead of native OS, I'd see that as a problem
and suggest the proper way to fix that is to define
more App-to-Xen ABIs so that the app can get the
real information, not a heuristic. Which also argues
for Xen leaving the rdtscp bit in cpuid turned off.
Dan
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 22:23 ` Dan Magenheimer
@ 2009-12-11 22:58 ` Nakajima, Jun
2009-12-11 23:30 ` Dan Magenheimer
2009-12-13 9:17 ` Zhang, Xiantao
1 sibling, 1 reply; 30+ messages in thread
From: Nakajima, Jun @ 2009-12-11 22:58 UTC (permalink / raw)
To: Dan Magenheimer, Jeremy Fitzhardinge
Cc: Xu, Dongxiao, xen-devel@lists.xensource.com, Dugger, Donald D,
Keir Fraser, Zhang, Xiantao
Dan Magenheimer wrote on Fri, 11 Dec 2009 at 14:23:02:
> Well, although it might be nice to be able to use
> rdtscp and TSC_AUX to determine pcpu/vcpu/pnode/vnode
> information, I think Jeremy and Jan convinced me in
> another thread a couple of months ago that in userland:
>
> x = vgetcpu()
> do_other_stuff();
> y = vgetcpu()
>
> if x==1 and y==2, there's no way to determine that
> do_other_stuff() was executed on cpu 1 vs cpu 2,
> or (though unlikely) even on cpu 3. And if
> x==y==4, there's no guarantee that do_other_stuff()
> is executed on cpu 4.
>
> If this is true the only safe use of TSC_AUX is for
> its originally designed intent: To determine if two
> successive rdtscp instructions were or were not
> executed on the same processor. Since this cannot
> be guaranteed in a VM, that's a reasonable argument
> that TSC_AUX shouldn't be exposed at all (meaning the
> rdtscp bit in cpuid should be turned off by Xen).
This should work if you bind (i.e. pin) each vcpu to each CPU, as I suggested.
Jun
___
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 22:58 ` Nakajima, Jun
@ 2009-12-11 23:30 ` Dan Magenheimer
2009-12-11 23:44 ` Xu, Dongxiao
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-11 23:30 UTC (permalink / raw)
To: Nakajima, Jun, Jeremy Fitzhardinge
Cc: Xu, Dongxiao, xen-devel, Dugger, Donald D, Keir Fraser,
Zhang, Xiantao
> > If this is true the only safe use of TSC_AUX is for
> > its originally designed intent: To determine if two
> > successive rdtscp instructions were or were not
> > executed on the same processor. Since this cannot
> > be guaranteed in a VM, that's a reasonable argument
> > that TSC_AUX shouldn't be exposed at all (meaning the
> > rdtscp bit in cpuid should be turned off by Xen).
>
> This should work if you bind (i.e. pin) each vcpu to each
> CPU, as I suggested.
Yes, it does. If there were a reasonable way for an
application to check "am I running on a VM for which
each vcpu has been pinned?" this might be a reasonable
constraint as, if the app isn't, it could fail or at least
log a message. But if the app will randomly fail
(or perform horribly) depending on whether the
underlying VM is pinned or not (which might even
change across a migration or if a sysadmin is
"tuning" his data center), I don't think
enterprise customers would appreciate that.
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 23:30 ` Dan Magenheimer
@ 2009-12-11 23:44 ` Xu, Dongxiao
2009-12-12 0:09 ` Dan Magenheimer
0 siblings, 1 reply; 30+ messages in thread
From: Xu, Dongxiao @ 2009-12-11 23:44 UTC (permalink / raw)
To: Dan Magenheimer, Nakajima, Jun, Jeremy Fitzhardinge
Cc: xen-devel@lists.xensource.com, Dugger, Donald D, Keir Fraser,
Zhang, Xiantao
Dan Magenheimer wrote:
>>> If this is true the only safe use of TSC_AUX is for
>>> its originally designed intent: To determine if two
>>> successive rdtscp instructions were or were not
>>> executed on the same processor. Since this cannot
>>> be guaranteed in a VM, that's a reasonable argument
>>> that TSC_AUX shouldn't be exposed at all (meaning the
>>> rdtscp bit in cpuid should be turned off by Xen).
>>
>> This should work if you bind (i.e. pin) each vcpu to each
>> CPU, as I suggested.
>
> Yes, it does. If there were a reasonable way for an
> application to check "am I running on a VM for which
> each vcpu has been pinned?" this might be a reasonable
> constraint as, if the app isn't, it could fail or at least
> log a message. But if the app will randomly fail
> (or perform horribly) depending on whether the
> underlying VM is pinned or not (which might even
> change across a migration or if a sysadmin is
> "tuning" his data center), I don't think
> enterprise customers would appreciate that.
Dan,
If guest NUMA is implemented later, both the app and the
hypervisor/guest will be NUMA-aware. The app could benefit
from the node/processor information it gets from
RDTSCP. But how to implement guest NUMA is another story:
we could use pinning, or some other creative idea.
Best Regards,
-- Dongxiao
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 23:44 ` Xu, Dongxiao
@ 2009-12-12 0:09 ` Dan Magenheimer
2009-12-12 0:30 ` Xu, Dongxiao
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-12 0:09 UTC (permalink / raw)
To: Xu, Dongxiao, Nakajima, Jun, Jeremy Fitzhardinge
Cc: xen-devel, Dugger, Donald D, Keir Fraser, Zhang, Xiantao
> > Yes, it does. If there were a reasonable way for an
> > application to check "am I running on a VM for which
> > each vcpu has been pinned?" this might be a reasonable
> > constraint as, if the app isn't, it could fail or at least
> > log a message. But if the app will randomly fail
> > (or perform horribly) depending on whether the
> > underlying VM is pinned or not (which might even
> > change across a migration or if a sysadmin is
> > "tuning" his data center), I don't think
> > enterprise customers would appreciate that.
>
> Dan,
> If guest NUMA is implemented later, both the app and the
> hypervisor/guest will be NUMA-aware. The app could benefit
> from the node/processor information it gets from
> RDTSCP. But how to implement guest NUMA is another story:
> we could use pinning, or some other creative idea.
Right. A guest NUMA implementation could use:
1) rdtscp+tsc_aux, which is very fast but unreliable
(unless the app can be certain the guest is permanently
pinned), or
2) some other yet-to-be-designed mechanism, likely involving
system calls and/or hypercalls, which is slower but can be
designed to be always reliable
In my experience in the enterprise world, "slow but
reliable" is always better than "fast but unreliable",
except possibly in well-understood constrained situations.
So I am suggesting we do not implement (1) by NOT
enabling rdtscp-bit-in-cpuid and instead concentrate
on (2). I guess for the special cases where unreliable
is acceptable, (1) could be an option, but I don't
think it should be turned on by default.
Dan
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-12 0:09 ` Dan Magenheimer
@ 2009-12-12 0:30 ` Xu, Dongxiao
0 siblings, 0 replies; 30+ messages in thread
From: Xu, Dongxiao @ 2009-12-12 0:30 UTC (permalink / raw)
To: Dan Magenheimer, Nakajima, Jun, Jeremy Fitzhardinge
Cc: xen-devel@lists.xensource.com, Dugger, Donald D, Keir Fraser,
Zhang, Xiantao
Dan Magenheimer wrote:
>>> Yes, it does. If there were a reasonable way for an
>>> application to check "am I running on a VM for which
>>> each vcpu has been pinned?" this might be a reasonable
>>> constraint as, if the app isn't, it could fail or at least
>>> log a message. But if the app will randomly fail
>>> (or perform horribly) depending on whether the
>>> underlying VM is pinned or not (which might even
>>> change across a migration or if a sysadmin is
>>> "tuning" his data center), I don't think
>>> enterprise customers would appreciate that.
>>
>> Dan,
>> If guest NUMA is implemented later, both the app and the
>> hypervisor/guest will be NUMA-aware. The app could benefit
>> from the node/processor information it gets from
>> RDTSCP. But how to implement guest NUMA is another story:
>> we could use pinning, or some other creative idea.
>
> Right. A guest NUMA implementation could use:
>
> 1) rdtscp+tsc_aux, which is very fast but unreliable
> (unless the app can be certain the guest is permanently
> pinned), or
> 2) some other yet-to-be-designed mechanism, likely involving
> system calls and/or hypercalls, which is slower but can be
> designed to be always reliable
Here is my simple understanding of guest NUMA: it means that
the hypervisor presents the correct NUMA information to the
guest kernel/app. So once guest NUMA is implemented,
the information obtained from RDTSCP is both reliable and fast.
Thanks!
Dongxiao
>
> In my experience in the enterprise world, "slow but
> reliable" is always better than "fast but unreliable",
> except possibly in well-understood constrained situations.
> So I am suggesting we do not implement (1) by NOT
> enabling rdtscp-bit-in-cpuid and instead concentrate
> on (2). I guess for the special cases where unreliable
> is acceptable, (1) could be an option, but I don't
> think it should be turned on by default.
>
> Dan
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-11 22:23 ` Dan Magenheimer
2009-12-11 22:58 ` Nakajima, Jun
@ 2009-12-13 9:17 ` Zhang, Xiantao
2009-12-13 18:06 ` Dan Magenheimer
1 sibling, 1 reply; 30+ messages in thread
From: Zhang, Xiantao @ 2009-12-13 9:17 UTC (permalink / raw)
To: Dan Magenheimer, Nakajima, Jun, Jeremy Fitzhardinge
Cc: Xu, Dongxiao, xen-devel@lists.xensource.com, Keir Fraser,
Dugger, Donald D
Dan Magenheimer wrote:
> Well, although it might be nice to be able to use
> rdtscp and TSC_AUX to determine pcpu/vcpu/pnode/vnode
> information, I think Jeremy and Jan convinced me in
> another thread a couple of months ago that in userland:
>
> x = vgetcpu()
> do_other_stuff();
> y = vgetcpu()
>
> if x==1 and y==2, there's no way to determine that
> do_other_stuff() was executed on cpu 1 vs cpu 2,
> or (though unlikely) even on cpu 3. And if
> x==y==4, there's no guarantee that do_other_stuff()
> is executed on cpu 4.
>
> If this is true the only safe use of TSC_AUX is for
> its originally designed intent: To determine if two
> successive rdtscp instructions were or were not
> executed on the same processor. Since this cannot
> be guaranteed in a VM, that's a reasonable argument
> that TSC_AUX shouldn't be exposed at all (meaning the
> rdtscp bit in cpuid should be turned off by Xen).
Why do you think this is the design intent of this instruction?
For guest NUMA support, I think it is a must to pin each vcpu of a VM
to the logical processors that belong to one specific node (i.e. to
disable vcpu migration between nodes); otherwise virtual NUMA may
suffer a performance loss.
For example, take a NUMA system with two nodes, where each node has
4G of memory and 8 logical processors. If we create a VM with 2G of
memory and 4 vcpus on this system, Xen may allocate 1G of memory from
physical node 0 and another 1G from physical node 1. If we then
virtualize NUMA for this VM, vcpu0 and vcpu1 can be assigned to
virtual node 0, and vcpu2 and vcpu3 to virtual node 1. We can then
safely pin vcpu0 and vcpu1 to the 8 logical processors of physical
node 0, and likewise pin vcpu2 and vcpu3 to the 8 logical processors
of physical node 1.
Since TSC_AUX is virtualized per vcpu, and its value is saved/restored
whenever the vcpu migrates, an application that always runs on one
virtual processor should get a fixed value when it calls vgetcpu, even
if that vcpu often migrates among the logical processors of one node.
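
A minimal sketch of the placement described in this example, for illustration only. The names `vcpu_placement`, `node_cpu_mask`, `build_placement` and `pcpu_allowed` are hypothetical and are not Xen structures or functions; the numbering of logical processors (node 0 owns 0-7, node 1 owns 8-15) is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS        4
#define PCPUS_PER_NODE  8

/* Hypothetical per-vcpu placement record; not a Xen structure. */
struct vcpu_placement {
    unsigned int vnode;     /* virtual node the guest sees */
    unsigned int pnode;     /* physical node backing that virtual node */
    uint32_t     pcpu_mask; /* logical processors the vcpu may run on */
};

/* Mask of the 8 logical processors assumed to belong to a physical node. */
uint32_t node_cpu_mask(unsigned int pnode)
{
    assert(pnode < 4);      /* 4 nodes * 8 cpus fit in a 32-bit mask */
    return (uint32_t)0xffu << (pnode * PCPUS_PER_NODE);
}

/* vcpu0/vcpu1 -> vnode 0 on pnode 0, vcpu2/vcpu3 -> vnode 1 on pnode 1. */
void build_placement(struct vcpu_placement p[NR_VCPUS])
{
    for (unsigned int v = 0; v < NR_VCPUS; v++) {
        p[v].vnode = v / 2;
        p[v].pnode = p[v].vnode;
        p[v].pcpu_mask = node_cpu_mask(p[v].pnode);
    }
}

/* A vcpu may move freely inside its node but never across nodes. */
bool pcpu_allowed(const struct vcpu_placement *p, unsigned int pcpu)
{
    return pcpu < 32 && ((p->pcpu_mask >> pcpu) & 1u);
}
```

With this layout vcpu0 may run on logical processor 7 but not on 8, so the node part of whatever the guest stores in TSC_AUX stays meaningful even though the vcpu is not bound to a single processor.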
Back to this topic: in short, we can't mix the guest's virtual TSC_AUX
with the host's TSC_AUX. When switching to an HVM vcpu's context, load
that vcpu's virtual TSC_AUX MSR value into the physical TSC_AUX MSR;
when the vcpu is scheduled out, load the host's TSC_AUX MSR value
(which may be used for PV guests).
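
The save/restore described above could look roughly like the following sketch. It is an assumption-laden illustration, not Xen code: the MSR is simulated with a plain variable so the example runs in user space, and `vcpu_state`, `switch_to`, `switch_from`, `wrmsr_sim` and `rdmsr_sim` are hypothetical names. It also assumes the guest may write the MSR directly, which is why the value is read back on the way out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_TSC_AUX 0xc0000103u

/* Stand-in for the physical MSR; a hypervisor would use rdmsr/wrmsr. */
static uint32_t fake_tsc_aux;

void wrmsr_sim(uint32_t msr, uint32_t val)
{
    assert(msr == MSR_TSC_AUX);   /* only this MSR is modelled */
    fake_tsc_aux = val;
}

uint32_t rdmsr_sim(uint32_t msr)
{
    assert(msr == MSR_TSC_AUX);
    return fake_tsc_aux;
}

/* Hypothetical per-vcpu state; field names are illustrative only. */
struct vcpu_state {
    bool     is_hvm;
    uint32_t guest_tsc_aux;       /* value the guest last wrote */
};

/* Scheduling an HVM vcpu in: expose its own TSC_AUX value. */
void switch_to(const struct vcpu_state *v)
{
    if (v->is_hvm)
        wrmsr_sim(MSR_TSC_AUX, v->guest_tsc_aux);
}

/* Scheduling it out: remember the guest value, put the host value back. */
void switch_from(struct vcpu_state *v, uint32_t host_tsc_aux)
{
    if (!v->is_hvm)
        return;
    v->guest_tsc_aux = rdmsr_sim(MSR_TSC_AUX);
    wrmsr_sim(MSR_TSC_AUX, host_tsc_aux);
}
```

The point of the sketch is only that the guest value and the host value never share the register at the same time; whether the guest write is intercepted or passed through is a separate design choice.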
> True, as long as the information is ONLY used
> heuristically to obtain pcpu/vcpu/pnode/vnode info,
> and no guarantee of correctness is implied or expected,
> it might be useful some of the time.
>
> But frankly, if "performance sucks" when the heuristic
> fails due to the fact that the app is running on
> a VM instead of native OS, I'd see that as a problem
> and suggest the proper way to fix that is to define
> more App-to-Xen ABIs so that the app can get the
> real information, not a heuristic. Which also argues
> for Xen leaving the rdtscp bit in cpuid turned off
>
> Dan
>
>> -----Original Message-----
>> From: Nakajima, Jun [mailto:jun.nakajima@intel.com]
>> Sent: Friday, December 11, 2009 12:30 PM
>> To: Jeremy Fitzhardinge; Dan Magenheimer
>> Cc: Keir Fraser; Zhang, Xiantao; Xu, Dongxiao;
>> xen-devel@lists.xensource.com; Dugger, Donald D
>> Subject: RE: [Xen-devel] RE: Saving/Restoring IA32_TSC_AUX MSR
>>
>>
>> Jeremy Fitzhardinge wrote on Fri, 11 Dec 2009 at 10:50:29:
>>
>>> On 12/11/09 10:35, Dan Magenheimer wrote:
>>>>> However, the vcpu number is definitely useful to usermode apps,
>>>>> so they can get some idea how they're moved between (v)cpus. I
>>>>> don't think it will matter to them that it isn't pcpu.
>>>>>
>>>> My point is that an app running on native Linux can
>>>> safely assume that, if TSC_AUX==3 at time T1 and
>>>> TSC_AUX is still 3 at time T2, it is running
>>>> on the same processor and the same node at both T1
>>>> and T2. In a virtual environment it cannot even
>>>> assume it is running on the same machine.
>>>> Further if the app sees that TSC_AUX==2 at time T3
>>>> and TSC_AUX==3 at time T4, on native Linux it
>>>> can safely assume that it is running on a different
>>>> processor. While rarer, in a virtual environment,
>>>> this may also be a false assumption.
>>>>
>>>> That's why I say the information is misleading.
>>>>
>>> Sure, but that info is, at best, of heuristic value, and won't
>>> cause any correctness problems if it is wrong. The performance may
>>> suck, but that's part of the larger problem of running NUMA-aware
>>> code in a virtual environment.
>>>
>>
>> And to utilize various NUMA optimizations in the kernel/apps
>> in the guest, we need "the virtual numa info bears some vague
>> resemblance to the real topology" (from Jeremy's email) with
>> the vcpus bound to the CPU/node.
>>
>> I understand that enabling RDTSCP in HVM will disable the
>> pvrdtscp algorithm if used by the kernel. One way is to mask
>> off the feature in CPUID (by default). Then kernel won't use it.
>>
>> Jun
>> ___
>> Intel Open Source Technology Center
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-13 9:17 ` Zhang, Xiantao
@ 2009-12-13 18:06 ` Dan Magenheimer
2009-12-13 18:59 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 30+ messages in thread
From: Dan Magenheimer @ 2009-12-13 18:06 UTC (permalink / raw)
To: Zhang, Xiantao, Nakajima, Jun, Jeremy Fitzhardinge
Cc: Xu, Dongxiao, xen-devel, Keir Fraser, Dugger, Donald D
> > If this is true the only safe use of TSC_AUX is for
> > its originally designed intent: To determine if two
> > successive rdtscp instructions were or were not
> > executed on the same processor. Since this cannot
> > be guaranteed in a VM, that's a reasonable argument
> > that TSC_AUX shouldn't be exposed at all (meaning the
> > rdtscp bit in cpuid should be turned off by Xen).
>
> Why do you think this is the design intent of this instruction?
The instruction was designed by AMD for this purpose
a few years ago in order to allow applications to
detect (and correct) possible TSC skew between processors.
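
As a rough sketch of that intended use (an illustration under assumptions, not code from any real application): two RDTSCP readings are taken and the elapsed time is trusted only if both carry the same TSC_AUX value. `tsc_sample`, `read_tsc_sample` and `tsc_delta` are hypothetical names; `__rdtscp` is the GCC/Clang intrinsic from `<x86intrin.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* One RDTSCP reading: the counter plus the TSC_AUX value returned in ECX. */
struct tsc_sample {
    uint64_t tsc;
    uint32_t aux;
};

#if defined(__x86_64__) || defined(__i386__)
/* Requires a CPU (and hypervisor) that exposes RDTSCP. */
struct tsc_sample read_tsc_sample(void)
{
    struct tsc_sample s;
    unsigned int aux;

    s.tsc = __rdtscp(&aux);
    s.aux = aux;
    return s;
}
#endif

/*
 * Store the elapsed ticks and return true only when both samples carry
 * the same TSC_AUX value and the counter did not go backwards.
 * Otherwise the caller should discard the measurement and retry.
 */
bool tsc_delta(struct tsc_sample a, struct tsc_sample b, uint64_t *delta)
{
    assert(delta != NULL);
    if (a.aux != b.aux || b.tsc < a.tsc)
        return false;
    *delta = b.tsc - a.tsc;
    return true;
}
```

On a native OS an equal TSC_AUX in both samples means both reads happened on the same processor. In a VM whose vcpus can move between physical processors that inference no longer holds, which is the concern raised above.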
> For guest NUMA support, it should be a must to pin each vcpu
> of one VM to some logical processors which belong to one
> specific node (disable vcpu migration between nodes), I think,
> otherwise, virtual numa may suffer from performance loss.
I agree that, for guest NUMA support, restricting all
vcpus to the same physical node is important. However,
PINNING each vcpu to a fixed pcpu (and never allowing
migration) greatly reduces the value of virtualization.
> For example, in a numa system which has two nodes and each
> node has 4G memory and 8 logical processors. And in this
> Xen-configured system, if we create a VM with 2 G memory
> with 4 vcpu support, Xen system may allocate 1 G memory from
> physical node 0 and another 1 G memory from physical node 1.
> And in this case, if we virtualize numa for this VM, vcpu0
> and vcpu1 can be assigned to virtual node0, vcpu2 and vcpu3
> can be configured for virtual node1, certainly, we also can
> safely pin vcpu0 and vcpu1 to the physical node0's 8 logical
> processors and accordingly pin vcpu2 and vcpu3 to the
> physical node1's 8 logical processors. Since virtual
> TSC_AUX is virtualized for each vcpu, and the value is
> saved/restored for the vcpu when its migration occurs, so if
> one application always runs on a virtual processor, it
> should get a fixed value when it calls vgetcpu, even if this
> vcpu often migrates among logical processors of one node.
I agree there are some cases where the TSC_AUX value
set by a guest OS may be useful. But ensuring that it
is always useful (NEVER incorrect) requires too many restrictions,
such as pinning.
> Back to this topic, in all, we can't mix the virtual
> TSC_AUX of guest with the host's TSC_AUX. If switch to HVM's
> vcpu context, load this vcpu's virtual TSC_AUX_MSR to
> physical TSC_AUX_MSR, and when it is scheduled out, host's
> TSC_AUX_MSR (which may be used for pv guests) is loaded.
I agree they can't be mixed. My position is that a guest
does not have sufficient information to always correctly set
TSC_AUX, so the best way to avoid the issue is to tell
the guest OS that TSC_AUX doesn't exist (i.e. cpuid-rdtscp
bit is off). Xen can still set TSC_AUX (and even emulate
it on processors that don't support it) and this information
can still be used (correctly) by virtualization-and-NUMA-aware
OS's and applications.
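
For reference, "cpuid-rdtscp bit is off" amounts to clearing one bit in what the guest sees: RDTSCP is reported in CPUID leaf 0x80000001, EDX bit 27. A minimal sketch of such a filter follows; `guest_cpuid_edx` and its `expose_rdtscp` parameter are hypothetical names for illustration, not Xen's actual CPUID policy code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_FEATURES  0x80000001u   /* extended feature leaf */
#define CPUID_EDX_RDTSCP    (1u << 27)    /* RDTSCP / TSC_AUX available */

/*
 * Hypothetical filter applied to the EDX value a guest sees for a
 * given CPUID leaf. By default the RDTSCP bit is hidden; every other
 * bit and every other leaf is passed through unchanged.
 */
uint32_t guest_cpuid_edx(uint32_t leaf, uint32_t host_edx, bool expose_rdtscp)
{
    if (leaf == CPUID_EXT_FEATURES && !expose_rdtscp)
        return host_edx & ~CPUID_EDX_RDTSCP;
    return host_edx;
}
```

A guest that honours CPUID will then not attempt to program TSC_AUX, while Xen remains free to manage the MSR itself.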
* Re: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-13 18:06 ` Dan Magenheimer
@ 2009-12-13 18:59 ` Jeremy Fitzhardinge
2009-12-14 6:33 ` Xu, Dongxiao
0 siblings, 1 reply; 30+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-13 18:59 UTC (permalink / raw)
To: Dan Magenheimer
Cc: xen-devel, Dugger, Donald D, Xu, Dongxiao, Keir Fraser,
Nakajima, Jun, Zhang, Xiantao
On 12/13/09 10:06, Dan Magenheimer wrote:
> I agree there are some cases where the TSC_AUX value
> set by a guest OS may be useful. But ensuring that it
> is always useful (NEVER incorrect) requires too many restrictions,
> such as pinning.
>
At least with respect to Linux guests [*], this objection to rdtscp is
moot, because if it isn't present then Linux will fall back to another
mechanism which is always present. Guest usermode will get the same
info, good/bad/misleading/whatever, either way; rdtscp can't make it
worse. The only question is whether specifically adding rdtscp/TSC_AUX
support adds any overall improvement.
(* I don't know if any other rdtscp-users attempt to put NUMA or other
physical topology info into TSC_AUX. If they just stick to
setting/using the cpu number, then they will get a net win from rdtscp.)
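
To make concrete what usermode gets back either way: x86-64 Linux packs the cpu number into the low 12 bits and the node number into the bits above, and stores the same packed value in TSC_AUX and in the fallback location. The sketch below shows only that packing; `encode_cpu_node` and `decode_cpu_node` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_BITS 12
#define CPU_MASK ((1u << CPU_BITS) - 1)

/* Pack a cpu number and a node number the way vgetcpu expects them. */
uint32_t encode_cpu_node(uint32_t cpu, uint32_t node)
{
    assert(cpu <= CPU_MASK);      /* cpu must fit in the low 12 bits */
    return (node << CPU_BITS) | cpu;
}

/* Unpack a value read via rdtscp (or via the fallback mechanism). */
void decode_cpu_node(uint32_t p, uint32_t *cpu, uint32_t *node)
{
    if (cpu)
        *cpu = p & CPU_MASK;
    if (node)
        *node = p >> CPU_BITS;
}
```

Since both paths decode the same packed value, rdtscp changes only how quickly the value is fetched, not what it means.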
J
* RE: RE: Saving/Restoring IA32_TSC_AUX MSR
2009-12-13 18:59 ` Jeremy Fitzhardinge
@ 2009-12-14 6:33 ` Xu, Dongxiao
0 siblings, 0 replies; 30+ messages in thread
From: Xu, Dongxiao @ 2009-12-14 6:33 UTC (permalink / raw)
To: Jeremy Fitzhardinge, Dan Magenheimer
Cc: xen-devel@lists.xensource.com, Dugger, Donald D, Nakajima, Jun,
Zhang, Xiantao, Keir Fraser
Jeremy Fitzhardinge wrote:
> On 12/13/09 10:06, Dan Magenheimer wrote:
>> I agree there are some cases where the TSC_AUX value
>> set by a guest OS may be useful. But ensuring that it
>> is always useful (NEVER incorrect) requires too many restrictions,
>> such as pinning.
>>
>
> At least with respect to Linux guests [*], this objection to rdtscp is
> moot, because if it isn't present then Linux will fall back to another
> mechanism which is always present. Guest usermode will get the same
> info, good/bad/misleading/whatever, either way; rdtscp can't make it
> worse. The only question is whether specifically adding
> rdtscp/TSC_AUX support adds any overall improvement.
>
> (* I don't know if any other rdtscp-users attempt to put NUMA or other
> physical topology info into TSC_AUX. If they just stick to
> setting/using the cpu number, then they will get a net win from
> rdtscp.)
I just had a glance at the OpenSolaris code: in its mp_startup() function,
it writes the cpu_id value into the TSC_AUX MSR. Therefore I think
OpenSolaris also uses this feature.
Thanks,
Dongxiao
>
> J