From: Borislav Petkov <bp@alien8.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>,
Oleg Nesterov <oleg@redhat.com>, Rik van Riel <riel@redhat.com>,
x86@kernel.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: [RFC PATCH] x86, fpu: Use eagerfpu by default on all CPUs
Date: Sun, 22 Feb 2015 11:48:54 +0100
Message-ID: <20150222104854.GA7529@pd.tnic>
In-Reply-To: <20150222081840.GA22972@gmail.com>

On Sun, Feb 22, 2015 at 09:18:40AM +0100, Ingo Molnar wrote:
> So am I interpreting the older and your latest numbers
> correctly in stating that the cost observation has flipped
> around 180 degrees: the first measurement showed eager FPU
> to be a win, but now that we can do more precise
> measurements, eager FPU has actually slowed down the kernel
> build by ~0.5%?

Well, I wouldn't take the latest numbers too seriously - that was a
single run without --repeat.

> That's not good, and kernel builds are just a random load
> that isn't even that FPU or context switch heavy - there
> will certainly be other loads that would be hurt even more.

That is my fear.

> So just before we base wide-reaching decisions based on any
> of these measurements, would you mind helping us increase our
> confidence in the numbers some more:
>
> - It might make sense to do a 'perf stat --null --repeat'
>   measurement as well [without any -e arguments], to make
>   sure the rich PMU stats you are gathering are not
>   interfering?
>
>   With 'perf stat --null --repeat' perf acts essentially
>   as a /usr/bin/time replacement, but can measure down to
>   microseconds and will calculate noise/stddev properly.

Cool, let me do that.

> - Perhaps also double check the debug switch: is it
>   really properly switching FPU handling mode?

I've changed the use_eager_fpu() test to do:

static __always_inline __pure bool use_eager_fpu(void)
{
	return boot_cpu_has(X86_FEATURE_EAGER_FPU);
}

and I'm clearing/setting eager FPU with
setup_force_cpu_cap/setup_clear_cpu_cap, see full diff below.

> - Do you have enough RAM that there's essentially no IO
>   in the system worth speaking of? Do you have enough RAM
>   to copy a whole kernel tree to /tmp/linux/ and do the
>   measurement there, on ramfs?

/proc/meminfo says "MemTotal: 4011860 kB", which is probably not enough.
But I could find one somewhere :-)
---
arch/x86/include/asm/fpu-internal.h | 6 +++-
arch/x86/kernel/xsave.c | 57 ++++++++++++++++++++++++++++++++++++-
2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index e97622f57722..c8a161d02056 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -38,6 +38,8 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
 # define ia32_setup_rt_frame __setup_rt_frame
 #endif
 
+
+extern unsigned long fpu_saved;
 extern unsigned int mxcsr_feature_mask;
 extern void fpu_init(void);
 extern void eager_fpu_init(void);
@@ -87,7 +89,7 @@ static inline int is_x32_frame(void)
 
 static __always_inline __pure bool use_eager_fpu(void)
 {
-	return static_cpu_has_safe(X86_FEATURE_EAGER_FPU);
+	return boot_cpu_has(X86_FEATURE_EAGER_FPU);
 }
 
 static __always_inline __pure bool use_xsaveopt(void)
@@ -242,6 +244,8 @@ static inline void fpu_fxsave(struct fpu *fpu)
  */
 static inline int fpu_save_init(struct fpu *fpu)
 {
+	fpu_saved++;
+
 	if (use_xsave()) {
 		fpu_xsave(fpu);
 
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 0de1fae2bdf0..943af0adacff 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -14,6 +14,8 @@
 #include <asm/sigframe.h>
 #include <asm/xcr.h>
 
+#include <linux/debugfs.h>
+
 /*
  * Supported feature mask by the CPU and the kernel.
  */
@@ -638,7 +640,7 @@ static void __init xstate_enable_boot_cpu(void)
 	setup_init_fpu_buf();
 
 	/* Auto enable eagerfpu for xsaveopt */
-	if (cpu_has_xsaveopt && eagerfpu != DISABLE)
+	if (eagerfpu != DISABLE)
 		eagerfpu = ENABLE;
 
 	if (pcntxt_mask & XSTATE_EAGER) {
@@ -739,3 +741,56 @@ void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
 	return (void *)xsave + xstate_comp_offsets[feature];
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
+
+unsigned long fpu_saved;
+
+static void my_clts(void *arg)
+{
+	asm volatile("clts");
+}
+
+static int eager_get(void *data, u64 *val)
+{
+	*val = fpu_saved;
+
+	return 0;
+}
+
+static int eager_set(void *data, u64 val)
+{
+	preempt_disable();
+	if (val) {
+		on_each_cpu(my_clts, NULL, 1);
+		setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
+	} else {
+		setup_clear_cpu_cap(X86_FEATURE_EAGER_FPU);
+		stts();
+	}
+	preempt_enable();
+
+	fpu_saved = 0;
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(eager_fops, eager_get, eager_set, "%llu\n");
+
+static int __init setup_eagerfpu_knob(void)
+{
+	static struct dentry *d_eager, *f_eager;
+
+	d_eager = debugfs_create_dir("fpu", NULL);
+	if (!d_eager) {
+		pr_err("Error creating fpu debugfs dir\n");
+		return -ENOMEM;
+	}
+
+	f_eager = debugfs_create_file("eager", 0644, d_eager, NULL, &eager_fops);
+	if (!f_eager) {
+		pr_err("Error creating fpu debugfs node\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+late_initcall(setup_eagerfpu_knob);
--
2.2.0.33.gc18b867
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--