linux-kernel.vger.kernel.org archive mirror
From: Michael Neuling <mikey@neuling.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	benh@kernel.crashing.org, anton@samba.org
Subject: Re: [PATCH 0/2] More i387 state save/restore work
Date: Mon, 20 Feb 2012 11:53:36 +1100
Message-ID: <12996.1329699216@neuling.org>
In-Reply-To: <alpine.LFD.2.02.1202191412060.3898@i5.linux-foundation.org>

Linus,

> Ok, this is a series of two patches that continue my i387 state 
> save/restore series, but aren't necessarily worth it for Linux-3.3.

We have similar lazy save/restore code on powerpc here:

  http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-December/087422.html
  
With your test, it looks like you're getting about a 10% performance
boost.  For VSX registers on powerpc we got about 8% with a similar
micro-benchmark.  We were a little disappointed it took such a
tailored/synthetic micro-benchmark to get such modest performance
improvements.

> That said, the first one is a bug-fix - but it's an old bug, and I'm not 
> sure it can actually be triggered. The failure path for the FP state 
> preload is bogus - and always was. But I'm not sure it really *can* fail.
> 
> The first one has another small bugfix in it too, and I think that one may 
> be new to the rewritten FP state preloading - it doesn't update the 
> fpu_counter, so once it starts preloading, it never stops.
> 
> I wrote a silly FPU task switch testing program, which basically starts 
> two processes pinned to the same CPU, and then uses sched_yield() in both 
> to switch back-and-forth between them. *One* of the processes uses the FPU 
> between every yield, the other does not. It runs for two seconds, and 
> counts how many loops it gets through.

> With that test, I get:
> 
>  - Plain 3.3-rc4:
> 
>    [torvalds@i5 ~]$ uname -r
>    3.3.0-rc4
>    [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
>    2216090 loops in 2 seconds
>    2216922 loops in 2 seconds
>    2217148 loops in 2 seconds
>    2232191 loops in 2 seconds
>    2186203 loops in 2 seconds
>    2231614 loops in 2 seconds
> 
>  - With the first patch that fixes the FPU preloading to eventually stop:
> 
>    [torvalds@i5 ~]$ uname -r
>    3.3.0-rc4-00001-g704ed737bd3c
>    [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
>    2306667 loops in 2 seconds
>    2295760 loops in 2 seconds
>    2295494 loops in 2 seconds
>    2296282 loops in 2 seconds
>    2282229 loops in 2 seconds
>    2301842 loops in 2 seconds
> 
>  - With the second patch that does the lazy preloading
> 
>    [torvalds@i5 ~]$ uname -r
>    3.3.0-rc4-00002-g022899d937f9
>    [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
>    2466973 loops in 2 seconds
>    2456168 loops in 2 seconds
>    2449863 loops in 2 seconds
>    2461588 loops in 2 seconds
>    2478256 loops in 2 seconds
>    2476844 loops in 2 seconds

Does "2476844 loops in 2 seconds" imply 2476844 context switches in 2
sec?  With Anton's context_switch [1] benchmark, we don't even hit 100K
context switches per sec.

Do you have this test program anywhere?

Mikey

1. http://ozlabs.org/~anton/junkcode/context_switch.c

> so these things do make some difference. But it is also interesting to see 
> from profiles just how expensive setting CR0.TS is (the write to CR0 is 
> very expensive indeed), so even when you avoid the FP state restore 
> lazily, just setting TS in between task switches is still a big cost of 
> FPU save/restore.
>
> 
> Linus Torvalds (2):
>   i387: use 'restore_fpu_checking()' directly in task switching code
>   i387: support lazy restore of FPU state
> 
>  arch/x86/include/asm/i387.h      |   48 +++++++++++++++++++++++++++----------
>  arch/x86/include/asm/processor.h |    3 +-
>  arch/x86/kernel/cpu/common.c     |    2 +
>  arch/x86/kernel/process_32.c     |    2 +-
>  arch/x86/kernel/process_64.c     |    2 +-
>  arch/x86/kernel/traps.c          |   40 ++++++-------------------------
>  6 files changed, 49 insertions(+), 48 deletions(-)
> 
> Comments? I feel confident enough about these that I think they might even 
> work in 3.3, especially the first one. But I want people to look at 
> them.
> 
>                      Linus
> 
> -- 
> 1.7.9.188.g12766.dirty
> 
