From: Michael Neuling <mikey@neuling.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	benh@kernel.crashing.org, anton@samba.org
Subject: Re: [PATCH 0/2] More i387 state save/restore work
Date: Mon, 20 Feb 2012 11:53:36 +1100
Message-ID: <12996.1329699216@neuling.org>
In-Reply-To: <alpine.LFD.2.02.1202191412060.3898@i5.linux-foundation.org>

Linus,

> Ok, this is a series of two patches that continue my i387 state 
> save/restore series, but aren't necessarily worth it for Linux-3.3.

We have similar lazy save/restore code on powerpc here:

  http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-December/087422.html
  
With your test, it looks like you're getting about a 10% performance
boost.  For VSX registers on powerpc we got about 8% with a similar
micro-benchmark.  We were a little disappointed it took such a
tailored/synthetic micro-benchmark to get such modest performance
improvements.

> That said, the first one is a bug-fix - but it's an old bug, and I'm not 
> sure it can actually be triggered. The failure path for the FP state 
> preload is bogus - and always was. But I'm not sure it really *can* fail.
> 
> The first one has another small bugfix in it too, and I think that one may 
> be new to the rewritten FP state preloading - it doesn't update the 
> fpu_counter, so once it starts preloading, it never stops.
> 
> I wrote a silly FPU task switch testing program, which basically starts 
> two processes pinned to the same CPU, and then uses sched_yield() in both 
> to switch back-and-forth between them. *One* of the processes uses the FPU 
> between every yield, the other does not. It runs for two seconds, and 
> counts how many loops it gets through.

> With that test, I get:
> 
>  - Plain 3.3-rc4:
> 
>    [torvalds@i5 ~]$ uname -r
>    3.3.0-rc4
>    [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
>    2216090 loops in 2 seconds
>    2216922 loops in 2 seconds
>    2217148 loops in 2 seconds
>    2232191 loops in 2 seconds
>    2186203 loops in 2 seconds
>    2231614 loops in 2 seconds
> 
>  - With the first patch that fixes the FPU preloading to eventually stop:
> 
>    [torvalds@i5 ~]$ uname -r
>    3.3.0-rc4-00001-g704ed737bd3c
>    [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
>    2306667 loops in 2 seconds
>    2295760 loops in 2 seconds
>    2295494 loops in 2 seconds
>    2296282 loops in 2 seconds
>    2282229 loops in 2 seconds
>    2301842 loops in 2 seconds
> 
>  - With the second patch that does the lazy preloading
> 
>    [torvalds@i5 ~]$ uname -r
>    3.3.0-rc4-00002-g022899d937f9
>    [torvalds@i5 ~]$ ./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;./a.out ;
>    2466973 loops in 2 seconds
>    2456168 loops in 2 seconds
>    2449863 loops in 2 seconds
>    2461588 loops in 2 seconds
>    2478256 loops in 2 seconds
>    2476844 loops in 2 seconds

Does "2476844 loops in 2 seconds" imply 2476844 context switches in 2
sec?  With Anton's context_switch [1] benchmark, we don't even hit 100K
context switches per sec.

Do you have this test program anywhere?

Mikey

1. http://ozlabs.org/~anton/junkcode/context_switch.c

> so these things do make some difference. But it is also interesting to see 
> from profiles just how expensive setting CR0.TS is (the write to CR0 is 
> very expensive indeed), so even when you avoid the FP state restore 
> lazily, just setting TS in between task switches is still a big cost of 
> FPU save/restore.
>
> 
> Linus Torvalds (2):
>   i387: use 'restore_fpu_checking()' directly in task switching code
>   i387: support lazy restore of FPU state
> 
>  arch/x86/include/asm/i387.h      |   48 +++++++++++++++++++++++++++----------
>  arch/x86/include/asm/processor.h |    3 +-
>  arch/x86/kernel/cpu/common.c     |    2 +
>  arch/x86/kernel/process_32.c     |    2 +-
>  arch/x86/kernel/process_64.c     |    2 +-
>  arch/x86/kernel/traps.c          |   40 ++++++-------------------------
>  6 files changed, 49 insertions(+), 48 deletions(-)
> 
> Comments? I feel confident enough about these that I think they might even 
> work in 3.3, especially the first one. But I want people to look at 
> them.
> 
>                      Linus
> 
