From: Michael Neuling <mikey@neuling.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
benh@kernel.crashing.org, anton@samba.org
Subject: Re: [PATCH 0/2] More i387 state save/restore work
Date: Mon, 20 Feb 2012 11:53:36 +1100
Message-ID: <12996.1329699216@neuling.org>
In-Reply-To: <alpine.LFD.2.02.1202191412060.3898@i5.linux-foundation.org>

Linus,

> Ok, this is a series of two patches that continue my i387 state
> save/restore series, but aren't necessarily worth it for Linux-3.3.

We have similar lazy save/restore code on powerpc here:

  http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-December/087422.html

With your test, it looks like you're getting about a 10% performance
boost. For VSX registers on powerpc we got about 8% with a similar
micro-benchmark. We were a little disappointed it took such a
tailored/synthetic micro-benchmark to get such modest performance
improvements.

> That said, the first one is a bug-fix - but it's an old bug, and I'm not
> sure it can actually be triggered. The failure path for the FP state
> preload is bogus - and always was. But I'm not sure it really *can* fail.
>
> The first one has another small bugfix in it too, and I think that one may
> be new to the rewritten FP state preloading - it doesn't update the
> fpu_counter, so once it starts preloading, it never stops.
>
> I wrote a silly FPU task switch testing program, which basically starts
> two processes pinned to the same CPU, and then uses sched_yield() in both
> to switch back-and-forth between them. *One* of the processes uses the FPU
> between every yield, the other does not. It runs for two seconds, and
> counts how many loops it gets through.
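
For readers without the original program, here is a minimal sketch of a
test along the lines described above. It is not Linus's actual program:
the function names (fpu_yield_loops, pin_to_cpu, now_sec), the choice of
floating-point work, and the use of fork() plus SIGKILL for teardown are
all assumptions made for illustration.

```c
#define _GNU_SOURCE            /* for sched_setaffinity() and CPU_SET() */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* volatile so the compiler cannot drop the floating-point work */
static volatile double fp_sink = 1.0;

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Best-effort: pin the calling process (and later children) to one CPU. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Hypothetical helper: run the yield ping-pong for `seconds` on `cpu`.
 * The parent touches the FPU before every yield; the child only yields.
 * Returns the number of loops the parent completed, or -1 if fork() fails.
 */
long fpu_yield_loops(int cpu, double seconds)
{
	long loops = 0;
	double end;
	pid_t child;

	if (pin_to_cpu(cpu) != 0)
		perror("sched_setaffinity (continuing unpinned)");

	child = fork();          /* the child inherits the CPU affinity */
	if (child < 0)
		return -1;
	if (child == 0) {
		for (;;)         /* non-FPU process: yield forever */
			sched_yield();
	}

	end = now_sec() + seconds;
	while (now_sec() < end) {
		fp_sink = fp_sink * 1.000001 + 0.5;   /* FPU use */
		sched_yield();
		loops++;
	}

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	return loops;
}
```

Calling fpu_yield_loops(0, 2.0) from main() and printing the result
gives a "loops in 2 seconds" style figure. The timing check inside the
loop adds some overhead of its own, so absolute numbers from this
sketch are not comparable with the ones quoted below.
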
> With that test, I get:
>
> - Plain 3.3-rc4:
>
> [torvalds@i5 ~]$ uname -r
> 3.3.0-rc4
> [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
> 2216090 loops in 2 seconds
> 2216922 loops in 2 seconds
> 2217148 loops in 2 seconds
> 2232191 loops in 2 seconds
> 2186203 loops in 2 seconds
> 2231614 loops in 2 seconds
>
> - With the first patch that fixes the FPU preloading to eventually stop:
>
> [torvalds@i5 ~]$ uname -r
> 3.3.0-rc4-00001-g704ed737bd3c
> [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
> 2306667 loops in 2 seconds
> 2295760 loops in 2 seconds
> 2295494 loops in 2 seconds
> 2296282 loops in 2 seconds
> 2282229 loops in 2 seconds
> 2301842 loops in 2 seconds
>
> - With the second patch that does the lazy preloading
>
> [torvalds@i5 ~]$ uname -r
> 3.3.0-rc4-00002-g022899d937f9
> [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
> 2466973 loops in 2 seconds
> 2456168 loops in 2 seconds
> 2449863 loops in 2 seconds
> 2461588 loops in 2 seconds
> 2478256 loops in 2 seconds
> 2476844 loops in 2 seconds
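
As a quick check on the size of the gain, the quoted runs can be
averaged and compared. The sketch below only does arithmetic on the
numbers above; the array and function names are made up for
illustration.

```c
#include <stddef.h>

/* Loop counts copied from the three sets of runs quoted above. */
static const long runs_rc4[6] = {
	2216090, 2216922, 2217148, 2232191, 2186203, 2231614
};
static const long runs_patch1[6] = {
	2306667, 2295760, 2295494, 2296282, 2282229, 2301842
};
static const long runs_patch2[6] = {
	2466973, 2456168, 2449863, 2461588, 2478256, 2476844
};

/* Arithmetic mean of n loop counts. */
double mean_loops(const long *runs, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += (double)runs[i];
	return sum / (double)n;
}

/* Relative improvement of `after` over `before`, in percent. */
double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Comparing mean_loops(runs_rc4, 6) with the means of the other two
arrays through percent_gain() gives roughly 3.6% for the first patch
alone and roughly 11.2% with both patches applied.
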

Does "2476844 loops in 2 seconds" imply 2476844 context switches in 2
sec? With Anton's context_switch [1] benchmark, we don't even hit 100K
context switches per sec.

Do you have this test program anywhere?

Mikey

[1] http://ozlabs.org/~anton/junkcode/context_switch.c

> so these things do make some difference. But it is also interesting to see
> from profiles just how expensive setting CR0.TS is (the write to CR0 is
> very expensive indeed), so even when you avoid the FP state restore
> lazily, just setting TS in between task switches is still a big cost of
> FPU save/restore.
>
>
> Linus Torvalds (2):
> i387: use 'restore_fpu_checking()' directly in task switching code
> i387: support lazy restore of FPU state
>
> arch/x86/include/asm/i387.h | 48 +++++++++++++++++++++++++++----------
> arch/x86/include/asm/processor.h | 3 +-
> arch/x86/kernel/cpu/common.c | 2 +
> arch/x86/kernel/process_32.c | 2 +-
> arch/x86/kernel/process_64.c | 2 +-
> arch/x86/kernel/traps.c | 40 ++++++-------------------------
> 6 files changed, 49 insertions(+), 48 deletions(-)
>
> Comments? I feel confident enough about these that I think they might even
> work in 3.3, especially the first one. But I want people to look at
> them.
>
> Linus
>
> --
> 1.7.9.188.g12766.dirty
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>