From: Ingo Molnar <mingo@elte.hu>
To: Andrew Lutomirski <luto@mit.edu>
Cc: linux-kernel@vger.kernel.org, x86 <x86@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
Avi Kivity <avi@redhat.com>
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency
Date: Sun, 24 Jul 2011 23:15:26 +0200
Message-ID: <20110724211526.GA6785@elte.hu>
In-Reply-To: <CAObL_7GCDsfXWRJgkNk7c44GNF0JhQPAH_P0WiYHK7QUX1Bcaw@mail.gmail.com>
* Andrew Lutomirski <luto@mit.edu> wrote:
> I was trying to understand the FPU/xstate saving code, and I ran
> some benchmarks with surprising results. These are all on Sandy
> Bridge i7-2600. Please take all numbers with a grain of salt --
> they're in tight-ish loops and don't really take into account
> real-world cache effects.
>
> A clts/stts pair takes about 80 ns. Accessing extended state from
> userspace with TS set takes 239 ns. A kernel_fpu_begin /
> kernel_fpu_end pair with no userspace xstate access takes 80 ns
> (presumably 79 of those 80 are the clts/stts). (Note: The numbers
> in this paragraph were measured using a hacked-up kernel and KVM.)
>
> With nonzero ymm state, xsave + clflush (on the first cacheline of
> xstate) + xrstor takes 128 ns. With hot cache, xsave = 24ns,
> xsaveopt (with unchanged state) = 16 ns, and xrstor = 40 ns.
>
> With nonzero xmm state but zero ymm state, xsave+xrstor drops to 38
> ns and xsaveopt saves another 5 ns.
>
> Zeroing the state completely with vzeroall adds 2 ns. Not sure
> what's going on.
>
> All of this makes me think that, at least on Sandy Bridge, lazy
> xstate saving is a bad optimization -- if the cache is being nice,
> save/restore is faster than twiddling the TS bit. And the cost of
> the trap when TS is set blows everything else away.
Interesting. Mind cooking up a delazying patch and measuring it on
native hardware as well? KVM generally makes exceptions more
expensive, so the cost of the lazy-restore trap might be smaller on
native.
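The trade-off in the quoted numbers can be put into a rough cost model. The sketch below is only an illustration: the constants are the figures quoted above (measured in tight loops, partly under KVM), while the function names and the model itself are assumptions. It charges the lazy scheme one clts/stts pair per switch plus the TS trap with probability p, and charges the eager scheme one xsave plus one xrstor.

```c
/* Figures quoted in this thread (Sandy Bridge i7-2600), in nanoseconds. */
#define CLTS_STTS_PAIR_NS   80.0  /* clts/stts pair */
#define TS_TRAP_NS         239.0  /* xstate access with TS set (under KVM) */
#define XSAVE_HOT_NS        24.0  /* xsave, hot cache */
#define XRSTOR_HOT_NS       40.0  /* xrstor, hot cache */
#define SAVE_RESTORE_COLD_NS 128.0 /* xsave + clflush + xrstor, nonzero ymm */

/* Assumed model: lazy switching always pays the TS toggle and pays the
 * trap only when the task touches extended state (probability p). */
static double lazy_cost_ns(double p)
{
    return CLTS_STTS_PAIR_NS + p * TS_TRAP_NS;
}

/* Assumed model: eager switching always pays one save and one restore. */
static double eager_hot_cost_ns(void)
{
    return XSAVE_HOT_NS + XRSTOR_HOT_NS;
}

/* Probability p at which lazy and eager cost the same for a given eager
 * cost. A negative result means eager is cheaper for every p >= 0. */
static double breakeven_probability(double eager_cost_ns)
{
    return (eager_cost_ns - CLTS_STTS_PAIR_NS) / TS_TRAP_NS;
}
```

Under this model the hot-cache eager cost (64 ns) is below the lazy floor of 80 ns, so eager wins regardless of p. With the cold-cache figure of 128 ns, lazy only wins while fewer than about 20% of switches are followed by an xstate access. Native trap costs may shift that threshold.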
>
> Which brings me to another question: what do you think about
> declaring some of the extended state to be clobbered by syscall?
> Ideally, we'd treat syscall like a regular function and clobber
> everything except the floating point control word and mxcsr. More
> conservatively, we'd leave xmm and x87 state but clobber ymm. This
> would let us keep the cost of the state save and restore down when
> kernel_fpu_begin is used in a syscall path and when a context
> switch happens as a result of a syscall.
>
> glibc does *not* mark the xmm registers as clobbered when it issues
> syscalls, but I suspect that everything everywhere that issues
> syscalls does it from a function, and functions are implicitly
> assumed to clobber extended state. (And if anything out there
> assumes that ymm state is preserved, I'd be amazed.)
To build the kernel with SSE optimizations? That would certainly be
interesting to try.
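For readers unfamiliar with what "marking registers as clobbered" means at the call site, here is a minimal sketch, assuming x86-64 Linux and GCC or Clang inline assembly. The wrapper names are hypothetical and this is not glibc's actual code. The first wrapper lists only what the syscall instruction itself overwrites (rcx and r11) plus memory. The second adds the xmm registers to the clobber list, which is what a caller would have to declare if the kernel stopped preserving them.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), used only for comparison */

#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Hypothetical wrapper: zero-argument syscall with today's convention.
 * The syscall instruction overwrites rcx and r11; extended state is
 * assumed to be preserved, so no xmm registers are listed. */
static inline long raw_syscall0(long nr)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(nr)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Hypothetical wrapper: same call, but the compiler is told that all
 * xmm registers may be destroyed, as under the proposed convention. */
static inline long raw_syscall0_clobber_xmm(long nr)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(nr)
                      : "rcx", "r11", "memory",
                        "xmm0", "xmm1", "xmm2", "xmm3",
                        "xmm4", "xmm5", "xmm6", "xmm7",
                        "xmm8", "xmm9", "xmm10", "xmm11",
                        "xmm12", "xmm13", "xmm14", "xmm15");
    return ret;
}
```

Both wrappers return the same value as getpid() when called with SYS_getpid. The only difference is that the compiler will not keep live values in xmm registers across the second one. Since the x86-64 calling convention already treats the xmm registers as caller-saved, a syscall issued from an ordinary out-of-line function gets this behaviour implicitly, which is the point made in the quoted paragraph.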
Thanks,
Ingo