Date: Mon, 25 Jul 2011 08:42:52 +0200
From: Ingo Molnar
To: Andrew Lutomirski
Cc: linux-kernel@vger.kernel.org, x86, Linus Torvalds,
	Arjan van de Ven, Avi Kivity
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency
Message-ID: <20110725064252.GD694@elte.hu>
References: <20110724211526.GA6785@elte.hu>

* Andrew Lutomirski wrote:

> On Sun, Jul 24, 2011 at 6:34 PM, Andrew Lutomirski wrote:
> >
> > I had in mind something a little less ambitious: making
> > kernel_fpu_begin very fast, especially when used more than once.
> > Currently it's slow enough to have spawned arch/x86/crypto/fpu.c,
> > which is a hideous piece of infrastructure that exists solely to
> > reduce the number of kernel_fpu_begin/end pairs when using
> > AES-NI. Clobbering registers in syscall would reduce the cost
> > even more, but it might require having a way to detect whether
> > the most recent kernel entry was via syscall or some other means.
>
> I think it will be very hard to inadvertently cause a regression,
> because the current code looks pretty bad.

[ heh, one of the rare cases where bad code works in our favor ;-) ]

> 1. Once a task uses xstate for five timeslices, the kernel decides
> that it will continue using it. The only thing that clears that
> condition is __unlazy_fpu called with TS_USEDFPU set. The only way
> I can see for that to happen is if kernel_fpu_begin is called twice
> in a row between context switches, and that has little to do with
> the task's xstate usage.
>
> 2. __switch_to, when switching to a task with fpu_counter > 5, will
> do stts(); clts().
>
> The combination means that when switching between two xstate-using
> tasks (or even tasks that were once xstate-using), we pay the full
> price of a state save/restore *and* stts/clts.

I'm all for simplifying this for modern x86 CPUs. The lazy FPU
switching logic was kind of neat on UP but started showing its
limitations with SMP already - and that was 10 years ago.

So if the numbers prove you right then go for it.

It's an added bonus that this could enable the kernel to be built
using vector instructions - you may or may not want to shoot for
the glory of achieving that feat first ;-)

Thanks,

	Ingo
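
PS: for anyone following along, here is a minimal sketch of why the
batching that arch/x86/crypto/fpu.c does matters. This is not the
actual fpu.c code, and aesni_enc_one_block() is a made-up helper;
the point is only that every kernel_fpu_begin()/kernel_fpu_end()
pair pays for a clts() on entry, an stts() on exit and potentially
a save of live user xstate, so doing one pair per 16-byte block is
much more expensive than one pair around the whole walk:

	#include <linux/types.h>
	#include <asm/i387.h>	/* kernel_fpu_begin()/kernel_fpu_end() */

	/* naive: one begin/end pair per 16-byte block */
	static void aesni_ecb_naive(u8 *out, const u8 *in,
				    unsigned int nblocks)
	{
		while (nblocks--) {
			kernel_fpu_begin();	/* clts() + save user xstate */
			aesni_enc_one_block(out, in);	/* hypothetical helper */
			kernel_fpu_end();	/* stts() */
			out += 16;
			in += 16;
		}
	}

	/* what the batching effectively buys: one pair per walk */
	static void aesni_ecb_batched(u8 *out, const u8 *in,
				      unsigned int nblocks)
	{
		kernel_fpu_begin();
		while (nblocks--) {
			aesni_enc_one_block(out, in);
			out += 16;
			in += 16;
		}
		kernel_fpu_end();
	}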