From: Ingo Molnar <mingo@elte.hu>
To: Andrew Lutomirski <luto@mit.edu>
Cc: linux-kernel@vger.kernel.org, x86 <x86@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Avi Kivity <avi@redhat.com>
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency
Date: Sun, 24 Jul 2011 23:15:26 +0200
Message-ID: <20110724211526.GA6785@elte.hu>
In-Reply-To: <CAObL_7GCDsfXWRJgkNk7c44GNF0JhQPAH_P0WiYHK7QUX1Bcaw@mail.gmail.com>


* Andrew Lutomirski <luto@mit.edu> wrote:

> I was trying to understand the FPU/xstate saving code, and I ran 
> some benchmarks with surprising results.  These are all on Sandy 
> Bridge i7-2600.  Please take all numbers with a grain of salt -- 
> they're in tight-ish loops and don't really take into account 
> real-world cache effects.
> 
> A clts/stts pair takes about 80 ns.  Accessing extended state from 
> userspace with TS set takes 239 ns.  A kernel_fpu_begin / 
> kernel_fpu_end pair with no userspace xstate access takes 80 ns 
> (presumably 79 of those 80 are the clts/stts).  (Note: The numbers 
> in this paragraph were measured using a hacked-up kernel and KVM.)
> 
> With nonzero ymm state, xsave + clflush (on the first cacheline of 
> xstate) + xrstor takes 128 ns.  With hot cache, xsave = 24ns, 
> xsaveopt (with unchanged state) = 16 ns, and xrstor = 40 ns.
> 
> With nonzero xmm state but zero ymm state, xsave+xrstor drops to 38 
> ns and xsaveopt saves another 5 ns.
> 
> Zeroing the state completely with vzeroall adds 2 ns.  Not sure 
> what's going on.
> 
> All of this makes me think that, at least on Sandy Bridge, lazy 
> xstate saving is a bad optimization -- if the cache is being nice, 
> save/restore is faster than twiddling the TS bit.  And the cost of 
> the trap when TS is set blows everything else away.

Interesting. Mind cooking up a delazying patch and measuring it on 
native as well? KVM generally makes exceptions more expensive, so the 
cost of the lazy TS trap might be smaller on native.
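
For reference, the arithmetic behind the quoted figures can be sketched 
in C. This is only a back-of-the-envelope model, not kernel code: the 
function names are made up for illustration, and it assumes the 
hot-cache xsave and xrstor costs simply add up, which the quoted 
numbers do not guarantee.

```c
#include <stdio.h>

/* Figures quoted above (Sandy Bridge i7-2600, tight loops), in ns. */
enum {
	CLTS_STTS_PAIR_NS    = 80,	/* lazy scheme, no userspace xstate access */
	TS_TRAP_NS           = 239,	/* userspace xstate access with TS set (KVM) */
	XSAVE_HOT_NS         = 24,
	XSAVEOPT_HOT_NS      = 16,	/* unchanged state */
	XRSTOR_HOT_NS        = 40,
	XSAVE_XRSTOR_COLD_NS = 128,	/* xsave + clflush + xrstor, nonzero ymm */
};

/*
 * Hypothetical helper: cost of an eager save/restore with a hot cache.
 * Assumption: the separately measured save and restore costs are additive.
 */
int eager_hot_ns(int use_xsaveopt)
{
	int save = use_xsaveopt ? XSAVEOPT_HOT_NS : XSAVE_HOT_NS;

	return save + XRSTOR_HOT_NS;
}

/* Hypothetical helper: nonzero if eager save/restore beats the given lazy cost. */
int eager_wins(int eager_ns, int lazy_ns)
{
	return eager_ns < lazy_ns;
}

/* Print the comparison for the cases mentioned in the mail. */
void print_comparison(void)
{
	printf("eager hot (xsave):    %d ns\n", eager_hot_ns(0));
	printf("eager hot (xsaveopt): %d ns\n", eager_hot_ns(1));
	printf("eager cold:           %d ns\n", XSAVE_XRSTOR_COLD_NS);
	printf("lazy, no access:      %d ns\n", CLTS_STTS_PAIR_NS);
	printf("lazy, TS trap:        %d ns\n", TS_TRAP_NS);
}
```

With those figures, eager_hot_ns(0) is 64 ns and eager_hot_ns(1) is 
56 ns, both below the 80 ns clts/stts pair. The 128 ns cold-cache case 
is above 80 ns but still well under the 239 ns trap.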

> 
> Which brings me to another question: what do you think about 
> declaring some of the extended state to be clobbered by syscall?  
> Ideally, we'd treat syscall like a regular function and clobber 
> everything except the floating point control word and mxcsr.  More 
> conservatively, we'd leave xmm and x87 state but clobber ymm.  This 
> would let us keep the cost of the state save and restore down when 
> kernel_fpu_begin is used in a syscall path and when a context 
> switch happens as a result of a syscall.
> 
> glibc does *not* mark the xmm registers as clobbered when it issues 
> syscalls, but I suspect that everything everywhere that issues 
> syscalls does it from a function, and functions are implicitly 
> assumed to clobber extended state.  (And if anything out there 
> assumes that ymm state is preserved, I'd be amazed.)

To build the kernel with SSE optimizations? Would certainly be 
interesting to try.

Thanks,

	Ingo
