* fpu_taskswitch adjustment proposal
@ 2012-06-15 16:03 Jan Beulich
2012-06-15 17:06 ` Keir Fraser
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2012-06-15 16:03 UTC (permalink / raw)
To: xen-devel
While pv-ops so far doesn't care to eliminate the two trap-and-
emulate CR0 accesses from the asm/xor.h save/restore
operations, the legacy x86-64 kernel uses conditional clts()/stts()
for this purpose. While looking into whether to extend this to the
newly added (in 3.5) AVX operations there I realized that this isn't
fully correct: It doesn't properly nest inside a kernel_fpu_begin()/
kernel_fpu_end() pair (as it will stts() at the end no matter what
the original state of CR0.TS was).
In order to not introduce completely new hypercalls to overcome
this (fpu_taskswitch isn't really extensible on its own), I'm
considering to add a new VM assist, altering the fpu_taskswitch
behavior so that it would return an indicator whether any change
to the virtual CR0.TS was done. That way, the kernel can
implement a true save/restore cycle here.
In order to allow the kernel to run on older hypervisors without
extra conditionals (behaving the same way as it does currently,
i.e. with the incorrect nesting), the return value 0 (which the
hypercall currently always returns) would need to be used to
indicate that the bit got actually flipped (such that on an old
hypervisor an updated kernel would always think that
something needs to be restored).
Would that be an acceptable solution? Can someone think of
other ways to deal with this? (And - is pv-ops interested in
eliminating the two CR0 access emulations in what is supposed
to be a fast path?)
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-15 16:03 fpu_taskswitch adjustment proposal Jan Beulich
@ 2012-06-15 17:06 ` Keir Fraser
2012-06-18 7:32 ` Jan Beulich
2012-06-26 16:25 ` Jan Beulich
0 siblings, 2 replies; 9+ messages in thread
From: Keir Fraser @ 2012-06-15 17:06 UTC (permalink / raw)
To: Jan Beulich, xen-devel
On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com> wrote:
> While pv-ops so far doesn't care to eliminate the two trap-and-
> emulate CR0 accesses from the asm/xor.h save/restore
> operations, the legacy x86-64 kernel uses conditional clts()/stts()
> for this purpose. While looking into whether to extend this to the
> newly added (in 3.5) AVX operations there I realized that this isn't
> fully correct: It doesn't properly nest inside a kernel_fpu_begin()/
> kernel_fpu_end() pair (as it will stts() at the end no matter what
> the original state of CR0.TS was).
>
> In order to not introduce completely new hypercalls to overcome
> this (fpu_taskswitch isn't really extensible on its own), I'm
> considering to add a new VM assist, altering the fpu_taskswitch
> behavior so that it would return an indicator whether any change
> to the virtual CR0.TS was done. That way, the kernel can
> implement a true save/restore cycle here.
It should be possible for the guest kernel to track its CR0.TS setting
shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
cleared on #NM.
-- Keir
> In order to allow the kernel to run on older hypervisors without
> extra conditionals (behaving the same way as it does currently,
> i.e. with the incorrect nesting), the return value 0 (which the
> hypercall currently always returns) would need to be used to
> indicate that the bit got actually flipped (such that on an old
> hypervisor an updated kernel would always think that
> something needs to be restored).
>
> Would that be an acceptable solution? Can someone think of
> other ways to deal with this? (And - is pv-ops interested in
> eliminating the two CR0 access emulations in what is supposed
> to be a fast path?)
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-15 17:06 ` Keir Fraser
@ 2012-06-18 7:32 ` Jan Beulich
2012-06-18 12:12 ` Keir Fraser
2012-06-18 18:24 ` Konrad Rzeszutek Wilk
2012-06-26 16:25 ` Jan Beulich
1 sibling, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2012-06-18 7:32 UTC (permalink / raw)
To: Keir Fraser; +Cc: Konrad Rzeszutek Wilk, xen-devel
>>> On 15.06.12 at 19:06, Keir Fraser <keir@xen.org> wrote:
> On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> While pv-ops so far doesn't care to eliminate the two trap-and-
>> emulate CR0 accesses from the asm/xor.h save/restore
>> operations, the legacy x86-64 kernel uses conditional clts()/stts()
>> for this purpose. While looking into whether to extend this to the
>> newly added (in 3.5) AVX operations there I realized that this isn't
>> fully correct: It doesn't properly nest inside a kernel_fpu_begin()/
>> kernel_fpu_end() pair (as it will stts() at the end no matter what
>> the original state of CR0.TS was).
>>
>> In order to not introduce completely new hypercalls to overcome
>> this (fpu_taskswitch isn't really extensible on its own), I'm
>> considering to add a new VM assist, altering the fpu_taskswitch
>> behavior so that it would return an indicator whether any change
>> to the virtual CR0.TS was done. That way, the kernel can
>> implement a true save/restore cycle here.
>
> It should be possible for the guest kernel to track its CR0.TS setting
> shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
> cleared on #NM.
Sure, but selling this to the Linux maintainers I would expect to be
harder than fitting the Xen side of things into the current save-
and-restore model the native xor code uses. It would only be strait
forward to implement on the legacy, forward ported trees.
However, with the #NM handler in pv-ops apparently not
leveraging the fact that CR0.TS is already cleared for it on entry,
maybe this could indeed be introduced together. Konrad?
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-18 7:32 ` Jan Beulich
@ 2012-06-18 12:12 ` Keir Fraser
2012-06-18 12:45 ` Jan Beulich
2012-06-18 18:24 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 9+ messages in thread
From: Keir Fraser @ 2012-06-18 12:12 UTC (permalink / raw)
To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel
On 18/06/2012 08:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>> It should be possible for the guest kernel to track its CR0.TS setting
>> shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
>> cleared on #NM.
>
> Sure, but selling this to the Linux maintainers I would expect to be
> harder than fitting the Xen side of things into the current save-
> and-restore model the native xor code uses. It would only be strait
> forward to implement on the legacy, forward ported trees.
Wouldn't it be hidden entirely behind pv_ops hooks and within Xen-specific
SSE save/restore code? I suppose you'd need to statically allocate the
per-cpu space for tracking the CR0.TS state... But overall it seems it will
be of little/no concern to other kernel maintainers?
-- Keir
> However, with the #NM handler in pv-ops apparently not
> leveraging the fact that CR0.TS is already cleared for it on entry,
> maybe this could indeed be introduced together. Konrad?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-18 12:12 ` Keir Fraser
@ 2012-06-18 12:45 ` Jan Beulich
2012-06-18 13:59 ` Keir Fraser
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2012-06-18 12:45 UTC (permalink / raw)
To: Keir Fraser; +Cc: Konrad Rzeszutek Wilk, xen-devel
>>> On 18.06.12 at 14:12, Keir Fraser <keir@xen.org> wrote:
> On 18/06/2012 08:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>> It should be possible for the guest kernel to track its CR0.TS setting
>>> shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
>>> cleared on #NM.
>>
>> Sure, but selling this to the Linux maintainers I would expect to be
>> harder than fitting the Xen side of things into the current save-
>> and-restore model the native xor code uses. It would only be strait
>> forward to implement on the legacy, forward ported trees.
>
> Wouldn't it be hidden entirely behind pv_ops hooks and within Xen-specific
> SSE save/restore code? I suppose you'd need to statically allocate the
> per-cpu space for tracking the CR0.TS state... But overall it seems it will
> be of little/no concern to other kernel maintainers?
The #NM handler part wouldn't afaict. Everything else indeed
ought to be restricted to the functions backing paravirt.h's clts()
and write_cr0().
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-18 12:45 ` Jan Beulich
@ 2012-06-18 13:59 ` Keir Fraser
0 siblings, 0 replies; 9+ messages in thread
From: Keir Fraser @ 2012-06-18 13:59 UTC (permalink / raw)
To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel
On 18/06/2012 13:45, "Jan Beulich" <JBeulich@suse.com> wrote:
>> Wouldn't it be hidden entirely behind pv_ops hooks and within Xen-specific
>> SSE save/restore code? I suppose you'd need to statically allocate the
>> per-cpu space for tracking the CR0.TS state... But overall it seems it will
>> be of little/no concern to other kernel maintainers?
>
> The #NM handler part wouldn't afaict. Everything else indeed
> ought to be restricted to the functions backing paravirt.h's clts()
> and write_cr0().
Yes, the #NM handler might need extra treatment. Perhaps we could install
our own #NM wrapper around the kernel's generic handler, which first clears
the saved CR0.TS flag. Then on clts() in the generic handler, our paravirt
clts() would see the flag is already clear and do no hypercall.
-- Keir
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-18 7:32 ` Jan Beulich
2012-06-18 12:12 ` Keir Fraser
@ 2012-06-18 18:24 ` Konrad Rzeszutek Wilk
2012-06-18 22:35 ` Keir Fraser
1 sibling, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-18 18:24 UTC (permalink / raw)
To: Jan Beulich; +Cc: Keir Fraser, xen-devel
On Mon, Jun 18, 2012 at 08:32:05AM +0100, Jan Beulich wrote:
> >>> On 15.06.12 at 19:06, Keir Fraser <keir@xen.org> wrote:
> > On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com> wrote:
> >
> >> While pv-ops so far doesn't care to eliminate the two trap-and-
> >> emulate CR0 accesses from the asm/xor.h save/restore
> >> operations, the legacy x86-64 kernel uses conditional clts()/stts()
> >> for this purpose. While looking into whether to extend this to the
> >> newly added (in 3.5) AVX operations there I realized that this isn't
> >> fully correct: It doesn't properly nest inside a kernel_fpu_begin()/
> >> kernel_fpu_end() pair (as it will stts() at the end no matter what
> >> the original state of CR0.TS was).
That sounds like a bug in the generic code then?
> >>
> >> In order to not introduce completely new hypercalls to overcome
> >> this (fpu_taskswitch isn't really extensible on its own), I'm
> >> considering to add a new VM assist, altering the fpu_taskswitch
> >> behavior so that it would return an indicator whether any change
> >> to the virtual CR0.TS was done. That way, the kernel can
> >> implement a true save/restore cycle here.
How would that work with the multi-calls? Right now clts is batched
and so is cr0 write.
> >
> > It should be possible for the guest kernel to track its CR0.TS setting
> > shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
Hm, the clts() paravirt could take advantage of the per-cpu cr0 to
figure out whether it truly needs to do anything.
> > cleared on #NM.
> Sure, but selling this to the Linux maintainers I would expect to be
> harder than fitting the Xen side of things into the current save-
> and-restore model the native xor code uses. It would only be strait
> forward to implement on the legacy, forward ported trees.
>
> However, with the #NM handler in pv-ops apparently not
> leveraging the fact that CR0.TS is already cleared for it on entry,
> maybe this could indeed be introduced together. Konrad?
Would this require an extra pvops call from the #NM handler?
>
> Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-18 18:24 ` Konrad Rzeszutek Wilk
@ 2012-06-18 22:35 ` Keir Fraser
0 siblings, 0 replies; 9+ messages in thread
From: Keir Fraser @ 2012-06-18 22:35 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Jan Beulich; +Cc: xen-devel
On 18/06/2012 19:24, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
>>> It should be possible for the guest kernel to track its CR0.TS setting
>>> shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
>
> Hm, the clts() paravirt could take advantage of the per-cpu cr0 to
> figure out whether it truly needs to do anything.
Exactly.
>>> cleared on #NM.
>> Sure, but selling this to the Linux maintainers I would expect to be
>> harder than fitting the Xen side of things into the current save-
>> and-restore model the native xor code uses. It would only be strait
>> forward to implement on the legacy, forward ported trees.
>>
>> However, with the #NM handler in pv-ops apparently not
>> leveraging the fact that CR0.TS is already cleared for it on entry,
>> maybe this could indeed be introduced together. Konrad?
>
> Would this require an extra pvops call from the #NM handler?
Or we wrap the #NM handler (somehow?) and clear the per-cpu cr0.ts software
flag before calling into the generic #NM handler.
-- Keir
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: fpu_taskswitch adjustment proposal
2012-06-15 17:06 ` Keir Fraser
2012-06-18 7:32 ` Jan Beulich
@ 2012-06-26 16:25 ` Jan Beulich
1 sibling, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2012-06-26 16:25 UTC (permalink / raw)
To: Jun Nakajima, Keir Fraser; +Cc: xen-devel
>>> On 15.06.12 at 19:06, Keir Fraser <keir@xen.org> wrote:
> On 15/06/2012 17:03, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> While pv-ops so far doesn't care to eliminate the two trap-and-
>> emulate CR0 accesses from the asm/xor.h save/restore
>> operations, the legacy x86-64 kernel uses conditional clts()/stts()
>> for this purpose. While looking into whether to extend this to the
>> newly added (in 3.5) AVX operations there I realized that this isn't
>> fully correct: It doesn't properly nest inside a kernel_fpu_begin()/
>> kernel_fpu_end() pair (as it will stts() at the end no matter what
>> the original state of CR0.TS was).
>>
>> In order to not introduce completely new hypercalls to overcome
>> this (fpu_taskswitch isn't really extensible on its own), I'm
>> considering to add a new VM assist, altering the fpu_taskswitch
>> behavior so that it would return an indicator whether any change
>> to the virtual CR0.TS was done. That way, the kernel can
>> implement a true save/restore cycle here.
>
> It should be possible for the guest kernel to track its CR0.TS setting
> shouldn't it? It gets modified only via a few paravirt hooks, and implicitly
> cleared on #NM.
While this works fine and is fairly non-intrusive, it's not really
buying us much: The non-SSE variants of the xor code will still
outperform the SSE one on both 32- and 64-bit x86 (and the
MMX ones on 32-bit). So I now instead wonder why linux-2.6.18's
include/asm-x86_64/mach-xen/asm/xor.h doesn't simply
forward to asm-generic/xor.h, or at least doesn't override the
template selection logic. Jun, do you recall whether this was
perhaps done without any actual measurements when the port
to x86-64 was first done?
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-06-26 16:25 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-15 16:03 fpu_taskswitch adjustment proposal Jan Beulich
2012-06-15 17:06 ` Keir Fraser
2012-06-18 7:32 ` Jan Beulich
2012-06-18 12:12 ` Keir Fraser
2012-06-18 12:45 ` Jan Beulich
2012-06-18 13:59 ` Keir Fraser
2012-06-18 18:24 ` Konrad Rzeszutek Wilk
2012-06-18 22:35 ` Keir Fraser
2012-06-26 16:25 ` Jan Beulich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).