From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932821AbcJNROF (ORCPT ); Fri, 14 Oct 2016 13:14:05 -0400
Received: from mga06.intel.com ([134.134.136.31]:9382 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932741AbcJNROC (ORCPT ); Fri, 14 Oct 2016 13:14:02 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.31,493,1473145200"; d="scan'208";a="1044750641"
Subject: Re: [PATCH 2/2] x86/fpu: split old & new fpu handling into separate functions
To: riel@redhat.com, linux-kernel@vger.kernel.org
References: <1476447331-21566-1-git-send-email-riel@redhat.com> <1476447331-21566-3-git-send-email-riel@redhat.com>
Cc: hpa@zytor.com, mingo@kernel.org, bp@alien8.de, luto@kernel.org, oleg@redhat.com
From: Dave Hansen
Message-ID: <58011258.5000202@linux.intel.com>
Date: Fri, 14 Oct 2016 10:14:00 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0
MIME-Version: 1.0
In-Reply-To: <1476447331-21566-3-git-send-email-riel@redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/14/2016 05:15 AM, riel@redhat.com wrote:
> From: Rik van Riel
>
> By moving all of the new fpu state handling into switch_fpu_finish,
> the code can be simplified some more. This does get rid of the
> prefetch, but given the size of the fpu register state on modern
> CPUs, and the amount of work done by __switch_to in-between both
> functions, the value of a single cache line prefetch seems somewhat
> dubious anyway.
...
> -
> -	if (fpu.preload) {
> -		if (fpregs_state_valid(new_fpu, cpu))
> -			fpu.preload = 0;
> -		else
> -			prefetch(&new_fpu->state);
> -		fpregs_activate(new_fpu);
> -	}
> -
> -	return fpu;
>  }

Yeah, that prefetch is highly dubious.
XRSTOR might not even be _reading_ that cacheline if the state isn't present (its xstate->xfeatures bit is 0). If we had to pick *a* cacheline to prefetch for XRSTOR, it would be the XSAVE header, *not* the FPU state.

I actually made some attempts to optimize the PKRU handling by touching and prefetching the state before calling XRSTOR. Touching it before the XRSTOR actually made things _worse_ overall.

It would be ideal to have some data on whether this actually _does_ anything, but I can't imagine it being a real delta in either direction.

Acked-by: Dave Hansen
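
For illustration only, here is a minimal user-space sketch of the header-vs-state point above. It assumes the standard XSAVE layout (a 512-byte legacy region followed by a 64-byte header whose first quadword is the xfeatures bitmap) and that PKRU is state component 9. The struct and function names are hypothetical and are not the kernel's; __builtin_prefetch is the GCC/Clang builtin, standing in for the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PKRU is XSAVE state component 9. */
#define XFEATURE_PKRU_BIT 9

/* Hypothetical mirror of the 64-byte XSAVE header. */
struct xsave_header_sketch {
	uint64_t xfeatures;	/* XSTATE_BV: which components are present */
	uint64_t xcomp_bv;	/* compaction bitmap */
	uint64_t reserved[6];
};

/* Hypothetical mirror of the start of an XSAVE area. */
struct xsave_area_sketch {
	uint8_t legacy[512];			/* FXSAVE-format legacy region */
	struct xsave_header_sketch header;	/* starts at byte offset 512 */
} __attribute__((aligned(64)));

static_assert(offsetof(struct xsave_area_sketch, header) == 512,
	      "XSAVE header must follow the 512-byte legacy region");

/* Nonzero if the component's bit is set, i.e. its state is in the buffer. */
static inline int xfeature_present(const struct xsave_area_sketch *xs,
				   unsigned int bit)
{
	return (int)((xs->header.xfeatures >> bit) & 1);
}

/*
 * The one cacheline XRSTOR has to consult regardless of which components
 * are present is the header, so that is what this sketch picks.
 */
static inline const void *xrstor_prefetch_target(const struct xsave_area_sketch *xs)
{
	return &xs->header;
}

static inline void prefetch_for_xrstor(const struct xsave_area_sketch *xs)
{
	/* Read prefetch with high temporal locality. */
	__builtin_prefetch(xrstor_prefetch_target(xs), 0, 3);
}
```

With a zeroed buffer, xfeature_present(&xs, XFEATURE_PKRU_BIT) is 0, which is the case where a prefetch of the component data buys nothing. Whether prefetching the header helps at all would still need measuring.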