From: Ingo Molnar <mingo@elte.hu>
To: Andrew Lutomirski <luto@mit.edu>
Cc: linux-kernel@vger.kernel.org, x86 <x86@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Avi Kivity <avi@redhat.com>
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency
Date: Sun, 24 Jul 2011 23:15:26 +0200
Message-ID: <20110724211526.GA6785@elte.hu>
In-Reply-To: <CAObL_7GCDsfXWRJgkNk7c44GNF0JhQPAH_P0WiYHK7QUX1Bcaw@mail.gmail.com>


* Andrew Lutomirski <luto@mit.edu> wrote:

> I was trying to understand the FPU/xstate saving code, and I ran 
> some benchmarks with surprising results.  These are all on Sandy 
> Bridge i7-2600.  Please take all numbers with a grain of salt -- 
> they're in tight-ish loops and don't really take into account 
> real-world cache effects.
> 
> A clts/stts pair takes about 80 ns.  Accessing extended state from 
> userspace with TS set takes 239 ns.  A kernel_fpu_begin / 
> kernel_fpu_end pair with no userspace xstate access takes 80 ns 
> (presumably 79 of those 80 are the clts/stts).  (Note: The numbers 
> in this paragraph were measured using a hacked-up kernel and KVM.)
> 
> With nonzero ymm state, xsave + clflush (on the first cacheline of 
> xstate) + xrstor takes 128 ns.  With hot cache, xsave = 24ns, 
> xsaveopt (with unchanged state) = 16 ns, and xrstor = 40 ns.
> 
> With nonzero xmm state but zero ymm state, xsave+xrstor drops to 38 
> ns and xsaveopt saves another 5 ns.
> 
> Zeroing the state completely with vzeroall adds 2 ns.  Not sure 
> what's going on.
> 
> All of this makes me think that, at least on Sandy Bridge, lazy 
> xstate saving is a bad optimization -- if the cache is being nice, 
> save/restore is faster than twiddling the TS bit.  And the cost of 
> the trap when TS is set blows everything else away.

Interesting. Mind cooking up a delazying patch and measuring it on 
native as well? KVM generally makes exceptions more expensive, so the 
penalty of lazy exceptions might be smaller on native.
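
For reference, the arithmetic behind the quoted claim can be sketched in a few lines of C. The constants are the figures from the quoted message; the helper names are hypothetical and exist only for this illustration. It is not a model of the real context-switch path, just a side-by-side of the measured components.

```c
#include <assert.h>

/* Latencies in nanoseconds, copied from the quoted message
 * (Sandy Bridge i7-2600, tight loops; the first two were
 * measured on a hacked-up kernel under KVM). */
enum {
    CLTS_STTS_PAIR_NS    = 80,   /* one clts/stts pair */
    TS_TRAP_NS           = 239,  /* userspace xstate access with TS set */
    XSAVE_HOT_NS         = 24,   /* hot cache */
    XSAVEOPT_HOT_NS      = 16,   /* hot cache, state unchanged */
    XRSTOR_HOT_NS        = 40,   /* hot cache */
    SAVE_RESTORE_COLD_NS = 128   /* xsave + clflush + xrstor, nonzero ymm */
};

/* Hypothetical helper: hot-cache cost of one eager save plus restore. */
static int eager_hot_ns(int use_xsaveopt)
{
    assert(use_xsaveopt == 0 || use_xsaveopt == 1);
    return (use_xsaveopt ? XSAVEOPT_HOT_NS : XSAVE_HOT_NS) + XRSTOR_HOT_NS;
}

/* Hypothetical helper: nanoseconds saved by an eager hot-cache
 * save/restore compared with toggling TS once via clts/stts. */
static int eager_saving_vs_ts_toggle_ns(int use_xsaveopt)
{
    return CLTS_STTS_PAIR_NS - eager_hot_ns(use_xsaveopt);
}
```

With these numbers a hot-cache xsave + xrstor costs 64 ns (56 ns with xsaveopt), which is below the 80 ns clts/stts pair, and even the 128 ns cache-flushed case stays well under the 239 ns trap.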

> 
> Which brings me to another question: what do you think about 
> declaring some of the extended state to be clobbered by syscall?  
> Ideally, we'd treat syscall like a regular function and clobber 
> everything except the floating point control word and mxcsr.  More 
> conservatively, we'd leave xmm and x87 state but clobber ymm.  This 
> would let us keep the cost of the state save and restore down when 
> kernel_fpu_begin is used in a syscall path and when a context 
> switch happens as a result of a syscall.
> 
> glibc does *not* mark the xmm registers as clobbered when it issues 
> syscalls, but I suspect that everything everywhere that issues 
> syscalls does it from a function, and functions are implicitly 
> assumed to clobber extended state.  (And if anything out there 
> assumes that ymm state is preserved, I'd be amazed.)

To build the kernel with SSE optimizations? Would certainly be 
interesting to try.
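
To make the quoted glibc point concrete, here is a minimal userspace sketch of what a syscall wrapper would look like if xmm state were declared clobbered. It assumes x86-64 Linux and GCC-style inline asm. The wrapper name is hypothetical, and this is not how glibc issues syscalls today (as the quoted text says, glibc does not list xmm registers as clobbered).

```c
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Hypothetical wrapper: issue a zero-argument syscall and tell the
 * compiler that xmm0-xmm15 do not survive it. rcx and r11 are
 * overwritten by the syscall instruction itself; the xmm entries
 * only model the convention proposed in the quoted message. */
static long raw_syscall0_clobber_xmm(long nr)
{
    long ret;

    assert(nr >= 0);
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory",
                        "xmm0", "xmm1", "xmm2", "xmm3",
                        "xmm4", "xmm5", "xmm6", "xmm7",
                        "xmm8", "xmm9", "xmm10", "xmm11",
                        "xmm12", "xmm13", "xmm14", "xmm15");
    return ret;
}
```

Calling raw_syscall0_clobber_xmm(SYS_getpid) should return the same value as getpid(). The only difference from a conventional wrapper is that the compiler will no longer keep live values in xmm registers across the call site.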

Thanks,

	Ingo
