Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
       [not found]                           ` <alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org>
@ 2012-02-28 11:21                             ` Avi Kivity
  2012-02-28 16:05                               ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2012-02-28 11:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On 02/21/2012 11:41 PM, Linus Torvalds wrote:
> Btw, I really don't like what arch/x86/kvm/ does with CR0.TS and the FP
> state.  I'm not at all sure that's all kosher.  But I don't know the
> code, so I just made sure that at no point did any of the semantics
> change. 
>

Can you elaborate on what you don't like in the kvm code (apart from "it
does virtualiztion")?

btw, some time ago I did some work to lazify fpu save (as opposed to
just fpu restore) and abstract out the various users (user mode, kernel
threads, irq context, guest mode, and signal handlers).  This would
allow you to run task A's user mode with task B's fpu loaded, have
preemptible kernel fpu being, avoid fpu switching while handling
signals, and run user mode with a guest fpu loaded or vice versa. 
However I abandoned the effort as too complex.  Perhaps a more
determined hacker can make more progress there.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 11:21                             ` [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces Avi Kivity
@ 2012-02-28 16:05                               ` Linus Torvalds
  2012-02-28 17:21                                 ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-02-28 16:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On Tue, Feb 28, 2012 at 3:21 AM, Avi Kivity <avi@redhat.com> wrote:
>
> Can you elaborate on what you don't like in the kvm code (apart from "it
> does virtualiztion")?

It doesn't fit any of the patterns of the x87 save/restore code, and I
don't know what it does.

It does clts on its own, in random places without actually restoring
the FPU state. Why is that ok? I don't know. And I don't think it is,
but I didn't change any of it. Why doesn't that thing corrupt the lazy
state save of some other process, for example?

Doing a "clts()" without restoring the FPU state immediately
afterwards is fundamentally *wrong*. It's crazy. Insane. You can now
use the FPU, but with whatever random state that is in it that caused
TS to be set to begin with.

And if you don't have any FPU state to restore, because you want to
use your own kernel state, you should use the
"kernel_fpu_begin()/end()" things that we have had forever.

Here's an example of the kind of UTTER AND ABSOLUTE SHIT that kvm FPU
state restore is:

  static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
  {
        preempt_disable();
        kvm_load_guest_fpu(emul_to_vcpu(ctxt));
        /*
         * CR0.TS may reference the host fpu state, not the guest fpu state,
         * so it may be clear at this point.
         */
        clts();
  }

that whole "comment" says nothing at all. And clearing CR0.TS *after*
loading the FPU state is a f*cking joke, since you need it clear to
load the FPU state to begin with. So as far as I can tell,
kvm_load_guest_fpu() will have cleared the FPU state already, but *it*
did it by:

        unlazy_fpu(current);
        fpu_restore_checking(&vcpu->arch.guest_fpu);

where "unlazy_fpu()" will have *set* TS if it wasn't set before, so
fpu_restore_checking() will now TAKE A FAULT, and in that fault
handler it will clear TS so that it can reload the state we just saved
(yes, really), only to then return to fpu_restore_checking() and
reload yet *another* state.

The code is crap. It's insane. It may work, but if it does, it does so
by pure chance and happenstance. The code is CLEARLY INSANE.

I wasn't going to touch it. It had been written by a
random-code-generator that had strung the various FPU accessor
functions up in random order until it compiled.

                  Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 16:05                               ` Linus Torvalds
@ 2012-02-28 17:21                                 ` Avi Kivity
  2012-02-28 17:37                                   ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2012-02-28 17:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On 02/28/2012 06:05 PM, Linus Torvalds wrote:
> On Tue, Feb 28, 2012 at 3:21 AM, Avi Kivity <avi@redhat.com> wrote:
> >
> > Can you elaborate on what you don't like in the kvm code (apart from "it
> > does virtualiztion")?
>
> It doesn't fit any of the patterns of the x87 save/restore code, and I
> don't know what it does.

It tries to do two things: first, keep the guest fpu loaded while
running kernel code, and second, allow the instruction emulator to
access the guest fpu.

> It does clts on its own, in random places without actually restoring
> the FPU state. Why is that ok? I don't know. 

The way we use vmx, it does an implicit stts() after an exit from a
guest (it's not required, but it's expensive to play with the value of
the host cr0, so we set it to a safe value and clear it when needed). 
So sometimes we need these random clts()s.

> And I don't think it is,
> but I didn't change any of it. Why doesn't that thing corrupt the lazy
> state save of some other process, for example?
>
> Doing a "clts()" without restoring the FPU state immediately
> afterwards is fundamentally *wrong*. It's crazy. Insane. You can now
> use the FPU, but with whatever random state that is in it that caused
> TS to be set to begin with.

There are two cases.  In one of them, we do restore the guest fpu
immediately afterwards.  In the other, we're just clearing a CR0.TS that
was set spuriously.

> And if you don't have any FPU state to restore, because you want to
> use your own kernel state, you should use the
> "kernel_fpu_begin()/end()" things that we have had forever.

We do have state - the guest state.

> Here's an example of the kind of UTTER AND ABSOLUTE SHIT that kvm FPU
> state restore is:
>
>   static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
>   {
>         preempt_disable();
>         kvm_load_guest_fpu(emul_to_vcpu(ctxt));
>         /*
>          * CR0.TS may reference the host fpu state, not the guest fpu state,
>          * so it may be clear at this point.
>          */
>         clts();
>   }
>
> that whole "comment" says nothing at all. And clearing CR0.TS *after*
> loading the FPU state is a f*cking joke, since you need it clear to
> load the FPU state to begin with. So as far as I can tell,
> kvm_load_guest_fpu() will have cleared the FPU state already, but *it*
> did it by:
>
>         unlazy_fpu(current);
>         fpu_restore_checking(&vcpu->arch.guest_fpu);
>
> where "unlazy_fpu()" will have *set* TS if it wasn't set before, so
> fpu_restore_checking() will now TAKE A FAULT, and in that fault
> handler it will clear TS so that it can reload the state we just saved
> (yes, really), only to then return to fpu_restore_checking() and
> reload yet *another* state.
>
> The code is crap. It's insane. It may work, but if it does, it does so
> by pure chance and happenstance. The code is CLEARLY INSANE.

What you described is the slow path.  The fast path is

  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
  {
      if (vcpu->guest_fpu_loaded)
          return;

If we're emulating an fpu instruction, it's very likely that we have the
guest fpu loaded into the cpu.  If we do take that path, we have the
right fpu state loaded, but CR0.TS is set by the recent exit, so we need
to clear it (the comment is in fact correct, except that it misspelled
"set").

> I wasn't going to touch it. It had been written by a
> random-code-generator that had strung the various FPU accessor
> functions up in random order until it compiled.

The tried and tested way, yes.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 17:21                                 ` Avi Kivity
@ 2012-02-28 17:37                                   ` Linus Torvalds
  2012-02-28 18:08                                     ` Linus Torvalds
  2012-02-28 18:09                                     ` Avi Kivity
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2012-02-28 17:37 UTC (permalink / raw)
  To: Avi Kivity
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On Tue, Feb 28, 2012 at 9:21 AM, Avi Kivity <avi@redhat.com> wrote:
>
> What you described is the slow path.

Indeed. I'd even call it the "glacial and stupid" path.

>The fast path is
>
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  {
>      if (vcpu->guest_fpu_loaded)
>          return;
>
> If we're emulating an fpu instruction, it's very likely that we have the
> guest fpu loaded into the cpu.  If we do take that path, we have the
> right fpu state loaded, but CR0.TS is set by the recent exit, so we need
> to clear it (the comment is in fact correct, except that it misspelled
> "set").

So why the hell do you put the clts in the wrong place, then?

Dammit, the code is crap.

The clts's are in random places, they don't make any sense, the
comments are *wrong*, and the only reason they exist in the first
place is exactly the fact that the code does what it does in the wrong
place.

There's a reason I called the code crap. It makes no sense. Your
explanation only explains what it does (badly) when what *should* have
happened is you saying "ok, that makes no sense, let's fix it up".

So let me re-iterate: it makes ZERO SENSE to clear clts *after*
restoring the state. Don't do it. Don't make excuses for doing it.
It's WRONG. Whether it even happens to work by random chance isn't
even the issue.

Where it *can* make sense to clear TS is in your code that says

>      if (vcpu->guest_fpu_loaded)
>          return;

where you could have done it like this:

    /* If we already have the FPU loaded, just clear TS - it was set
by a recent exit */
    if (vcpu->guest_fpu_loaded) {
        clts();
        return;
    }

And then at least the *placement* of clts would make sense. HOWEVER.
Even if you did that, what guarantees that the most recent FP usage
was by *your* kvm process? Sure, the "recent exit" may have set TS,
but have you had preemption disabled for the whole time? Guaranteed?

Because TS may be set because something else rescheduled too.

So where's the comment about why you actually own and control CR0.TS,
and nobody else does?

Finally, how does this all interact with the fact that the task
switching now keeps the FPU state around in the FPU and caches what
state it is? I have no idea, because the kvm code is so inpenetratable
due to all these totally unexplained things.

Btw, don't get me wrong - the core FPU state save/restore was a mess
of random "TS_USEDFPU" and clts() crap too. We had several bugs there,
partly exactly because the core FPU restore code also had "clts()"
separated from the logic that actually set or cleared the TS_USEDFPU
bit, and it was not at all clear at a "local" setting what the F was
going on.

Most of the recent i387 changes were actually to clean up and make
sense of that thing, and making sure that the clts() was paired with
the action of actually giving the FPU to the thread etc. So at least
now the core FPU handling is reasonably sane, and the clts's and
stts's are paired with the things that take control of the FPU, and we
have a few helper functions and some abstraction in place.

The kvm code definitely needs the same kind of cleanup. Because as it
is now, it's just random code junk, and there is no obvious reason why
it wouldn't interact horribly badly with an interrupt doing
"irq_fpu_usable()" + "kernel_fpu_begin/end()" for example.

Seriously.

              Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 17:37                                   ` Linus Torvalds
@ 2012-02-28 18:08                                     ` Linus Torvalds
  2012-02-28 18:29                                       ` Avi Kivity
  2012-02-28 18:09                                     ` Avi Kivity
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-02-28 18:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On Tue, Feb 28, 2012 at 9:37 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So where's the comment about why you actually own and control CR0.TS,
> and nobody else does?

So what I think KVM should strive for (but I really don't know the
code, so maybe there are good reasons why it is impossible) is to just
never touch TS at all, and let the core kernel code do it all for you.

When you need access to the FPU, let the core code just handle it for
you. Let it trap and restore the state. When you get scheduled away,
let the core code just set TS, because you really can't touch the FP
state again.

IOW, just do the FP operations you do within the thread you are. Never
touch TS at all, just don't worry about it. Worry about your own
internal FP state machine, but don't interact with the "global" kernel
TS state machine.

You can't do a lot better than that, I think. Especially now that we
do the lazy restore, we can schedule between two tasks and if only one
of them actually uses the FPU, we won't bother with extraneous state
restores.

The one exception I can think of is that if you are loading totally
*new* FP state, and you think that TS is likely to be set, instead of
trapping (and loading the old state in the trap handling) only to
return to load the *new* state, we could expose a helper for that
situation. It would look something like

   user_fpu_begin();
   fpu_restore_checking(newfpustate);

and it would avoid the trap when loading the new state.

                       Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 18:08                                     ` Linus Torvalds
@ 2012-02-28 18:29                                       ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2012-02-28 18:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On 02/28/2012 08:08 PM, Linus Torvalds wrote:
> On Tue, Feb 28, 2012 at 9:37 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So where's the comment about why you actually own and control CR0.TS,
> > and nobody else does?
>
> So what I think KVM should strive for (but I really don't know the
> code, so maybe there are good reasons why it is impossible) is to just
> never touch TS at all, and let the core kernel code do it all for you.

Which TS?  With kvm (restricting ourselves to vmx at this moment) there
are three versions lying around: CR0.TS, GUEST_CR0.TS (which is loaded
by the cpu during entry into the guest) and HOST_CR0.TS (which is loaded
by the cpu during guest exit).  GUEST_CR0.TS is actually a combination
of the guest's virtualized CR0.TS, and a flag that says whether the
guest fpu is loaded or not.  HOST_CR0 is basically a cached CR0, but as
it's expensive to change it, we don't want to reflect CR0.TS into
HOST_CR0.TS.

> When you need access to the FPU, let the core code just handle it for
> you. Let it trap and restore the state. When you get scheduled away,
> let the core code just set TS, because you really can't touch the FP
> state again.
>
> IOW, just do the FP operations you do within the thread you are. Never
> touch TS at all, just don't worry about it. Worry about your own
> internal FP state machine, but don't interact with the "global" kernel
> TS state machine.

I can't avoid touching it.  On exit vmx will set it for me.  I can
atomically copy CR0.TS into HOST_CR0.TS, but that's expensive.

Maybe we should just virtualize it into a percpu variable.  Should speed
up the non-kvm case as well since read_cr0() is likely not very fast.

> You can't do a lot better than that, I think. Especially now that we
> do the lazy restore, we can schedule between two tasks and if only one
> of them actually uses the FPU, we won't bother with extraneous state
> restores.

Ah, this is a new bit, I'll have to study it.

> The one exception I can think of is that if you are loading totally
> *new* FP state, and you think that TS is likely to be set, instead of
> trapping (and loading the old state in the trap handling) only to
> return to load the *new* state, we could expose a helper for that
> situation. It would look something like
>
>    user_fpu_begin();
>    fpu_restore_checking(newfpustate);
>
> and it would avoid the trap when loading the new state.
>

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 17:37                                   ` Linus Torvalds
  2012-02-28 18:08                                     ` Linus Torvalds
@ 2012-02-28 18:09                                     ` Avi Kivity
  2012-02-28 18:34                                       ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2012-02-28 18:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On 02/28/2012 07:37 PM, Linus Torvalds wrote:
> On Tue, Feb 28, 2012 at 9:21 AM, Avi Kivity <avi@redhat.com> wrote:
> >
> > What you described is the slow path.
>
> Indeed. I'd even call it the "glacial and stupid" path.

Right.  It won't be offended, since it's almost never called.

> >The fast path is
> >
> >  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
> >  {
> >      if (vcpu->guest_fpu_loaded)
> >          return;
> >
> > If we're emulating an fpu instruction, it's very likely that we have the
> > guest fpu loaded into the cpu.  If we do take that path, we have the
> > right fpu state loaded, but CR0.TS is set by the recent exit, so we need
> > to clear it (the comment is in fact correct, except that it misspelled
> > "set").
>
> So why the hell do you put the clts in the wrong place, then?

You mean, why not in kvm_load_guest_fpu()?  Most of the uses of
kvm_load_guest_fpu() are just before guest entry, so the clts() would be
immediately overwritten by the loading of the GUEST_CR0 register (which
may have TS set or clear).  So putting it here would be wasting cycles.

>
> Dammit, the code is crap.
>
> The clts's are in random places, they don't make any sense, the
> comments are *wrong*, and the only reason they exist in the first
> place is exactly the fact that the code does what it does in the wrong
> place.
>
> There's a reason I called the code crap. It makes no sense. Your
> explanation only explains what it does (badly) when what *should* have
> happened is you saying "ok, that makes no sense, let's fix it up".
>
> So let me re-iterate: it makes ZERO SENSE to clear clts *after*
> restoring the state. Don't do it. Don't make excuses for doing it.
> It's WRONG. Whether it even happens to work by random chance isn't
> even the issue.
>
> Where it *can* make sense to clear TS is in your code that says
>
> >      if (vcpu->guest_fpu_loaded)
> >          return;
>
> where you could have done it like this:
>
>     /* If we already have the FPU loaded, just clear TS - it was set
> by a recent exit */
>     if (vcpu->guest_fpu_loaded) {
>         clts();
>         return;
>     }
>
> And then at least the *placement* of clts would make sense. 

True, it's cleaner, but as noted above, it's wasteful.

> HOWEVER.
> Even if you did that, what guarantees that the most recent FP usage
> was by *your* kvm process? Sure, the "recent exit" may have set TS,
> but have you had preemption disabled for the whole time? Guaranteed?

Both the vcpu_load() and emulation paths happen with preemption disabled.

> Because TS may be set because something else rescheduled too.
>
> So where's the comment about why you actually own and control CR0.TS,
> and nobody else does?

The code is poorly commented, yes.

> Finally, how does this all interact with the fact that the task
> switching now keeps the FPU state around in the FPU and caches what
> state it is? I have no idea, because the kvm code is so inpenetratable
> due to all these totally unexplained things.

This is done by preempt notifiers.  Whenever a task switch happens we
push the guest fpu state into memory (if loaded) and let the normal
stuff happen.  So the if we had a task switch during instruction
emulation, for example, then we'd get the "glacial and stupid path" to fire.

> Btw, don't get me wrong - the core FPU state save/restore was a mess
> of random "TS_USEDFPU" and clts() crap too. We had several bugs there,
> partly exactly because the core FPU restore code also had "clts()"
> separated from the logic that actually set or cleared the TS_USEDFPU
> bit, and it was not at all clear at a "local" setting what the F was
> going on.
>
> Most of the recent i387 changes were actually to clean up and make
> sense of that thing, and making sure that the clts() was paired with
> the action of actually giving the FPU to the thread etc. So at least
> now the core FPU handling is reasonably sane, and the clts's and
> stts's are paired with the things that take control of the FPU, and we
> have a few helper functions and some abstraction in place.
>
> The kvm code definitely needs the same kind of cleanup. Because as it
> is now, it's just random code junk, and there is no obvious reason why
> it wouldn't interact horribly badly with an interrupt doing
> "irq_fpu_usable()" + "kernel_fpu_begin/end()" for example.

Well, interrupted_kernel_fpu_idle() does look like it returns the wrong
result when kvm is active.  Has the semantics of that changed in the
recent round?  The kvm fpu code is quite old and we haven't had any
reports of bad interactions with RAID/encryption since it was stabilized.

> Seriously.

I agree a refactoring is needed.  We may need to replace read_cr0() in

  static inline bool interrupted_kernel_fpu_idle(void)
  {
    return !(current_thread_info()->status & TS_USEDFPU) &&
        (read_cr0() & X86_CR0_TS);
  }

with some percpu variable since CR0.TS is not reliable in interrupt
context while kvm is running.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 18:09                                     ` Avi Kivity
@ 2012-02-28 18:34                                       ` Linus Torvalds
  2012-02-28 19:06                                         ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-02-28 18:34 UTC (permalink / raw)
  To: Avi Kivity
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On Tue, Feb 28, 2012 at 10:09 AM, Avi Kivity <avi@redhat.com> wrote:
>
> This is done by preempt notifiers.  Whenever a task switch happens we
> push the guest fpu state into memory (if loaded) and let the normal
> stuff happen.  So the if we had a task switch during instruction
> emulation, for example, then we'd get the "glacial and stupid path" to fire.

Oh christ.

This is exactly what the scheduler has ALWAYS ALREADY DONE FOR YOU.

That's what the i387 save-and-restore code is all about. What's the
advantage of just re-implementing it in non-obvious ways?

Stop doing it. You get *zero* advantages from just doing what the
scheduler natively does for you, and the scheduler does it *better*.

                   Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 18:34                                       ` Linus Torvalds
@ 2012-02-28 19:06                                         ` Avi Kivity
  2012-02-28 19:26                                           ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2012-02-28 19:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On 02/28/2012 08:34 PM, Linus Torvalds wrote:
> On Tue, Feb 28, 2012 at 10:09 AM, Avi Kivity <avi@redhat.com> wrote:
> >
> > This is done by preempt notifiers.  Whenever a task switch happens we
> > push the guest fpu state into memory (if loaded) and let the normal
> > stuff happen.  So the if we had a task switch during instruction
> > emulation, for example, then we'd get the "glacial and stupid path" to fire.
>
> Oh christ.
>
> This is exactly what the scheduler has ALWAYS ALREADY DONE FOR YOU.

No, the scheduler saves the state into task_struct.  I need it saved
into the vcpu structure.  We have two fpu states, the user state, and
the guest state.  APIs that take a task_struct as a parameter, or
reference current implicitly, aren't going to work.

> That's what the i387 save-and-restore code is all about. What's the
> advantage of just re-implementing it in non-obvious ways?
>
> Stop doing it. You get *zero* advantages from just doing what the
> scheduler natively does for you, and the scheduler does it *better*.

The scheduler does something different.

What I'd ideally want is

  struct fpu {
      int cpu;  /* -1 = not loaded */
      union thread_xstate *state;
  };

Perhaps with a struct fpu_ops *ops if needed.  We could then let various
users' fpus float around freely and only save/load them at the last moment.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 19:06                                         ` Avi Kivity
@ 2012-02-28 19:26                                           ` Linus Torvalds
  2012-02-28 19:45                                             ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-02-28 19:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On Tue, Feb 28, 2012 at 11:06 AM, Avi Kivity <avi@redhat.com> wrote:
>
> No, the scheduler saves the state into task_struct.  I need it saved
> into the vcpu structure.  We have two fpu states, the user state, and
> the guest state.  APIs that take a task_struct as a parameter, or
> reference current implicitly, aren't going to work.

As far as I can tell, you should do the saving into the vcpu structure
when you actually switch the thing around.

In fact, you can do it these days by just playing around with the
"tsk->thread.fpu.state" pointer, I guess.

But it all boils down to the fact that your code is not just ugly,
it's *buggy*. If you play around with setting TS, you *will* be hit by
interrupts etc that will start to use the FP code that you "don't
use".

And there is no excuse for you touching the host TS. The kernel does
that for you, and does it better. And caches the end result in
TS_USEDFPU (old) or in some variable that you shouldn't look at but
can access with the user_has_fpu() helpers.

                    Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
  2012-02-28 19:26                                           ` Linus Torvalds
@ 2012-02-28 19:45                                             ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2012-02-28 19:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Josh Boyer, Jongman Heo, Thomas Gleixner,
	Ingo Molnar, x86, Linux Kernel Mailing List, KVM list

On 02/28/2012 09:26 PM, Linus Torvalds wrote:
> On Tue, Feb 28, 2012 at 11:06 AM, Avi Kivity <avi@redhat.com> wrote:
> >
> > No, the scheduler saves the state into task_struct.  I need it saved
> > into the vcpu structure.  We have two fpu states, the user state, and
> > the guest state.  APIs that take a task_struct as a parameter, or
> > reference current implicitly, aren't going to work.
>
> As far as I can tell, you should do the saving into the vcpu structure
> when you actually switch the thing around.
>
> In fact, you can do it these days by just playing around with the
> "tsk->thread.fpu.state" pointer, I guess.

Good idea.  I can't say I like poking into struct fpu's internals, but
we can treat it as an opaque structure and copy it around.

We can also do this in kernel_fpu_begin(), and allow it to be preemptible.

> But it all boils down to the fact that your code is not just ugly,
> it's *buggy*. If you play around with setting TS, you *will* be hit by
> interrupts etc that will start to use the FP code that you "don't
> use".
>
> And there is no excuse for you touching the host TS. The kernel does
> that for you, and does it better. And caches the end result in
> TS_USEDFPU (old) or in some variable that you shouldn't look at but
> can access with the user_has_fpu() helpers.

Again, I can't avoid touching it.  I can try to get the hardware to
always preserve its value, but that comes with a cost.

btw, I think the current code is safe wrt kvm.  If the guest fpu has
been loaded, then we know that that TS_USEDFPU is set, since we will
have saved the user fpu earlier.  Yes it's "accidental" and needs to be
improved, but I don't think it's a data corruptor waiting to happen.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-02-28 19:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.LFD.2.02.1202191412060.3898@i5.linux-foundation.org>
     [not found] ` <alpine.LFD.2.02.1202201145130.4237@i5.linux-foundation.org>
     [not found]   ` <alpine.LFD.2.02.1202201146450.4237@i5.linux-foundation.org>
     [not found]     ` <alpine.LFD.2.02.1202201147350.4237@i5.linux-foundation.org>
     [not found]       ` <alpine.LFD.2.02.1202201148100.4237@i5.linux-foundation.org>
     [not found]         ` <CA+5PVA5Zkka6Q2kcvWbvA3xr9=5TZ+VWWsWzYcRQ1GpccpJHew@mail.gmail.com>
     [not found]           ` <CA+55aFyqz-2ejsRpbRdYx7UHDBg3q=9nvmtrj2nLeuZonrkz1A@mail.gmail.com>
     [not found]             ` <4F42FE08.5020309@zytor.com>
     [not found]               ` <CA+55aFxKLvHXRkx-x+zOkHJk__uTJc90Z0i+00W-hqDaQb_HCQ@mail.gmail.com>
     [not found]                 ` <4F43DB69.9060509@zytor.com>
     [not found]                   ` <CA+55aFxap0zAAJOCcz0f+DKBuG6VQa7kq1j2JD1S4ysgmb62aA@mail.gmail.com>
     [not found]                     ` <4F440945.1020904@zytor.com>
     [not found]                       ` <alpine.LFD.2.02.1202211338420.5354@i5.linux-foundation.org>
     [not found]                         ` <alpine.LFD.2.02.1202211339590.5354@i5.linux-foundation.org >
     [not found]                           ` <alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org>
2012-02-28 11:21                             ` [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces Avi Kivity
2012-02-28 16:05                               ` Linus Torvalds
2012-02-28 17:21                                 ` Avi Kivity
2012-02-28 17:37                                   ` Linus Torvalds
2012-02-28 18:08                                     ` Linus Torvalds
2012-02-28 18:29                                       ` Avi Kivity
2012-02-28 18:09                                     ` Avi Kivity
2012-02-28 18:34                                       ` Linus Torvalds
2012-02-28 19:06                                         ` Avi Kivity
2012-02-28 19:26                                           ` Linus Torvalds
2012-02-28 19:45                                             ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox