From: Thomas Gleixner <tglx@linutronix.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>,
LKML <linux-kernel@vger.kernel.org>,
x86@kernel.org, Filipe Manana <fdmanana@suse.com>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
linux-crypto@vger.kernel.org
Subject: Re: [patch 3/3] x86/fpu: Make FPU protection more robust
Date: Wed, 04 May 2022 17:36:38 +0200 [thread overview]
Message-ID: <87fslpjomx.ffs@tglx> (raw)
In-Reply-To: <YnDwjjdiSQ5Yml6E@hirez.programming.kicks-ass.net>
On Tue, May 03 2022 at 11:06, Peter Zijlstra wrote:
> On Mon, May 02, 2022 at 05:58:40PM +0200, Thomas Gleixner wrote:
>> Right, though currently it's guaranteed that softirq processing context
>> can use the FPU. Quite some of the network crypto work runs in softirq
>> context, so this might cause a regression. If so, then this needs to be
>> an explicit commit on top which is easy to revert. Let me stare at it
>> some more.
>
> Right, so with the:
>
> preempt_disable();
> this_cpu_write(fpu_in_use, true);
> barrier();
>
> sequence it is safe against both softirq and hardirq fpu usage. The only
> concern is performance not correctness when dropping that
> local_bh_disable() thing.
>
> So what Thomas proposes makes sense to me.
Now I was looking at it the other way round too; i.e. to use
local_bh_disable() for both fpregs_lock() and kernel_fpu_begin().
Using local_bh_disable() for both fpregs_lock() and kernel_fpu_begin()
is not possible with the current constraints, because kernel_fpu_begin()
can be called from hard interrupt context.
But the only use case which utilizes FPU from hard interrupt context is
the random generator via add_randomness_...().
I did a benchmark of these functions, which invoke blake2s_update()
three times in a row, on a SKL-X and a ZEN3. The generic code and the
FPU accelerated code are pretty much on par vs. execution time of the
algorithm itself plus/minus noise.
But in case that the interrupt hits a userspace task the FPU needs to be
saved and if the interrupt does not result in rescheduling then the
return to user space has to restore it. That's _expensive_ and the
actual cost depends on the FPU state, but 200-300 cycles for save and
200-700 cycles for restore are due.
Even if we ignore the save/restore part and assume that it averages out
vs. schedule() having to save FPU state anyway, then there is another
aspect to this: power consumption which affects also thermal budget and
capacity.
Though that made me more curious and I did the same comparison for crc32
which is heavily used by ext4. crc32c_pcl_intel_update() already
contains a switch to software when the buffer length is less than 512
bytes. But even on larger buffers, typically ~4k, FPU is not necessarily
a win. It's consistently slower by a factor of ~1.4x. And that's not due
to xsave/rstor overhead because these computations run on a worker
thread which does not do that dance at all.
IOW, using the FPU blindly for this kind of computations is not
necessarily a good plan. I have no idea how these things are analyzed
and evaluated if at all. Maybe the crypto people can shed some light on
this.
Thanks,
tglx
next parent reply other threads:[~2022-05-04 15:36 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20220501192740.203963477@linutronix.de>
[not found] ` <20220501193102.704267030@linutronix.de>
[not found] ` <Ym/sHqKqmLOJubgE@zn.tnic>
[not found] ` <87k0b4lydr.ffs@tglx>
[not found] ` <YnDwjjdiSQ5Yml6E@hirez.programming.kicks-ass.net>
2022-05-04 15:36 ` Thomas Gleixner [this message]
2022-05-04 15:55 ` [patch 3/3] x86/fpu: Make FPU protection more robust Jason A. Donenfeld
2022-05-04 16:45 ` Thomas Gleixner
2022-05-04 19:05 ` Jason A. Donenfeld
2022-05-04 21:04 ` Thomas Gleixner
2022-05-04 23:52 ` Jason A. Donenfeld
2022-05-05 0:55 ` Thomas Gleixner
2022-05-05 1:11 ` Jason A. Donenfeld
2022-05-05 1:21 ` Thomas Gleixner
2022-05-05 11:02 ` Jason A. Donenfeld
2022-05-05 11:34 ` David Laight
2022-05-05 11:35 ` Jason A. Donenfeld
2022-05-05 11:53 ` David Laight
2022-05-06 22:34 ` Jason A. Donenfeld
2022-05-07 13:50 ` David Laight
2022-05-05 13:48 ` Jason A. Donenfeld
2022-05-06 22:15 ` Jason A. Donenfeld
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fslpjomx.ffs@tglx \
--to=tglx@linutronix.de \
--cc=Jason@zx2c4.com \
--cc=bp@alien8.de \
--cc=fdmanana@suse.com \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).