linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Avi Kivity <avi@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Josh Boyer <jwboyer@gmail.com>,
	Jongman Heo <jongman.heo@samsung.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	x86@kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces
Date: Tue, 28 Feb 2012 20:09:27 +0200	[thread overview]
Message-ID: <4F4D1857.4010706@redhat.com> (raw)
In-Reply-To: <CA+55aFyA=b-27XbhGdDQfSRMk=xet2jOD9H8hkjEq1md29bJJA@mail.gmail.com>

On 02/28/2012 07:37 PM, Linus Torvalds wrote:
> On Tue, Feb 28, 2012 at 9:21 AM, Avi Kivity <avi@redhat.com> wrote:
> >
> > What you described is the slow path.
>
> Indeed. I'd even call it the "glacial and stupid" path.

Right.  It won't be offended, since it's almost never called.

> >The fast path is
> >
> >  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
> >  {
> >      if (vcpu->guest_fpu_loaded)
> >          return;
> >
> > If we're emulating an fpu instruction, it's very likely that we have the
> > guest fpu loaded into the cpu.  If we do take that path, we have the
> > right fpu state loaded, but CR0.TS is set by the recent exit, so we need
> > to clear it (the comment is in fact correct, except that it misspelled
> > "set").
>
> So why the hell do you put the clts in the wrong place, then?

You mean, why not in kvm_load_guest_fpu()?  Most of the uses of
kvm_load_guest_fpu() are just before guest entry, so the clts() would be
immediately overwritten by the loading of the GUEST_CR0 register (which
may have TS set or clear).  So putting it here would be wasting cycles.

>
> Dammit, the code is crap.
>
> The clts's are in random places, they don't make any sense, the
> comments are *wrong*, and the only reason they exist in the first
> place is exactly the fact that the code does what it does in the wrong
> place.
>
> There's a reason I called the code crap. It makes no sense. Your
> explanation only explains what it does (badly) when what *should* have
> happened is you saying "ok, that makes no sense, let's fix it up".
>
> So let me re-iterate: it makes ZERO SENSE to clear clts *after*
> restoring the state. Don't do it. Don't make excuses for doing it.
> It's WRONG. Whether it even happens to work by random chance isn't
> even the issue.
>
> Where it *can* make sense to clear TS is in your code that says
>
> >      if (vcpu->guest_fpu_loaded)
> >          return;
>
> where you could have done it like this:
>
>     /* If we already have the FPU loaded, just clear TS - it was set
> by a recent exit */
>     if (vcpu->guest_fpu_loaded) {
>         clts();
>         return;
>     }
>
> And then at least the *placement* of clts would make sense. 

True, it's cleaner, but as noted above, it's wasteful.

> HOWEVER.
> Even if you did that, what guarantees that the most recent FP usage
> was by *your* kvm process? Sure, the "recent exit" may have set TS,
> but have you had preemption disabled for the whole time? Guaranteed?

Both the vcpu_load() and emulation paths happen with preemption disabled.

> Because TS may be set because something else rescheduled too.
>
> So where's the comment about why you actually own and control CR0.TS,
> and nobody else does?

The code is poorly commented, yes.

> Finally, how does this all interact with the fact that the task
> switching now keeps the FPU state around in the FPU and caches what
> state it is? I have no idea, because the kvm code is so inpenetratable
> due to all these totally unexplained things.

This is done by preempt notifiers.  Whenever a task switch happens we
push the guest fpu state into memory (if loaded) and let the normal
stuff happen.  So the if we had a task switch during instruction
emulation, for example, then we'd get the "glacial and stupid path" to fire.

> Btw, don't get me wrong - the core FPU state save/restore was a mess
> of random "TS_USEDFPU" and clts() crap too. We had several bugs there,
> partly exactly because the core FPU restore code also had "clts()"
> separated from the logic that actually set or cleared the TS_USEDFPU
> bit, and it was not at all clear at a "local" setting what the F was
> going on.
>
> Most of the recent i387 changes were actually to clean up and make
> sense of that thing, and making sure that the clts() was paired with
> the action of actually giving the FPU to the thread etc. So at least
> now the core FPU handling is reasonably sane, and the clts's and
> stts's are paired with the things that take control of the FPU, and we
> have a few helper functions and some abstraction in place.
>
> The kvm code definitely needs the same kind of cleanup. Because as it
> is now, it's just random code junk, and there is no obvious reason why
> it wouldn't interact horribly badly with an interrupt doing
> "irq_fpu_usable()" + "kernel_fpu_begin/end()" for example.

Well, interrupted_kernel_fpu_idle() does look like it returns the wrong
result when kvm is active.  Has the semantics of that changed in the
recent round?  The kvm fpu code is quite old and we haven't had any
reports of bad interactions with RAID/encryption since it was stabilized.

> Seriously.

I agree a refactoring is needed.  We may need to replace read_cr0() in

  static inline bool interrupted_kernel_fpu_idle(void)
  {
    return !(current_thread_info()->status & TS_USEDFPU) &&
        (read_cr0() & X86_CR0_TS);
  }

with some percpu variable since CR0.TS is not reliable in interrupt
context while kvm is running.

-- 
error compiling committee.c: too many arguments to function


  parent reply	other threads:[~2012-02-28 18:09 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-19 22:23 [PATCH 0/2] More i387 state save/restore work Linus Torvalds
2012-02-19 22:26 ` [PATCH 1/2] i387: use 'restore_fpu_checking()' directly in task switching code Linus Torvalds
2012-02-19 22:37   ` [PATCH 2/2] i387: support lazy restore of FPU state Linus Torvalds
2012-02-19 22:44     ` H. Peter Anvin
2012-02-19 23:18       ` H. Peter Anvin
2012-02-19 23:56       ` Linus Torvalds
2012-02-20  7:51     ` Ingo Molnar
2012-02-20  0:53 ` [PATCH 0/2] More i387 state save/restore work Michael Neuling
2012-02-20  1:03   ` Linus Torvalds
2012-02-20  1:06     ` Linus Torvalds
2012-02-20  1:11       ` Linus Torvalds
2012-03-01 11:30         ` Benjamin Herrenschmidt
2012-02-20  2:09     ` Indan Zupancic
2012-02-20 19:46 ` [PATCH v2 0/3] " Linus Torvalds
2012-02-20 19:47   ` [PATCH v2 1/3] i387: fix up some fpu_counter confusion Linus Torvalds
2012-02-20 19:48     ` [PATCH v2 2/3] i387: use 'restore_fpu_checking()' directly in task switching code Linus Torvalds
2012-02-20 19:48       ` [PATCH v2 3/3] i387: support lazy restore of FPU state Linus Torvalds
2012-02-21  1:50         ` Josh Boyer
2012-02-21  2:10           ` Linus Torvalds
2012-02-21  2:14             ` H. Peter Anvin
2012-02-21  5:27               ` Linus Torvalds
2012-02-21  5:35                 ` H. Peter Anvin
2012-02-21 14:19                 ` Josh Boyer
2012-02-21 17:59                 ` H. Peter Anvin
2012-02-21 18:06                   ` Ingo Molnar
2012-02-21 18:26                   ` Linus Torvalds
2012-02-21 21:14                     ` H. Peter Anvin
2012-02-21 21:39                       ` [PATCH 0/2] i387: FP state interface cleanups Linus Torvalds
2012-02-21 21:40                         ` [PATCH 1/2] i387: uninline the generic FP helpers that we expose to kernel modules Linus Torvalds
2012-02-21 21:41                           ` [PATCH 2/2] i387: split up <asm/i387.h> into exported and internal interfaces Linus Torvalds
2012-02-21 23:50                             ` [tip:x86/fpu] i387: Split " tip-bot for Linus Torvalds
2012-02-28 11:21                             ` [PATCH 2/2] i387: split " Avi Kivity
2012-02-28 16:05                               ` Linus Torvalds
2012-02-28 17:21                                 ` Avi Kivity
2012-02-28 17:37                                   ` Linus Torvalds
2012-02-28 18:08                                     ` Linus Torvalds
2012-02-28 18:29                                       ` Avi Kivity
2012-02-28 18:09                                     ` Avi Kivity [this message]
2012-02-28 18:34                                       ` Linus Torvalds
2012-02-28 19:06                                         ` Avi Kivity
2012-02-28 19:26                                           ` Linus Torvalds
2012-02-28 19:45                                             ` Avi Kivity
2012-02-21 23:49                           ` [tip:x86/fpu] i387: Uninline the generic FP helpers that we expose to kernel modules tip-bot for Linus Torvalds
2012-02-21  2:18             ` [PATCH v2 3/3] i387: support lazy restore of FPU state Linus Torvalds
2012-02-21  2:32               ` H. Peter Anvin
2012-02-21  2:11           ` H. Peter Anvin
2012-02-21 21:54         ` Suresh Siddha
2012-02-21 21:57           ` Linus Torvalds
2012-02-21 22:19             ` Suresh Siddha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F4D1857.4010706@redhat.com \
    --to=avi@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jongman.heo@samsung.com \
    --cc=jwboyer@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).