From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755243AbbHFRTs (ORCPT <rfc822;w@1wt.eu>);
	Thu, 6 Aug 2015 13:19:48 -0400
Received: from www.sr71.net ([198.145.64.142]:54389 "EHLO blackbird.sr71.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753096AbbHFRTq (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 6 Aug 2015 13:19:46 -0400
Message-ID: <55C39730.8060602@sr71.net>
Date: Thu, 06 Aug 2015 10:19:44 -0700
From: Dave Hansen <dave@sr71.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0
MIME-Version: 1.0
To: Ingo Molnar <mingo@kernel.org>
CC: dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org, bp@alien8.de,
        fenghua.yu@intel.com, hpa@zytor.com, x86@kernel.org,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>,
        Denys Vlasenko <dvlasenk@redhat.com>
Subject: Re: [PATCH] x86, fpu: correct XSAVE xstate size calculation
References: <20150728172143.6DDFECA7@viggo.jf.intel.com> <20150805103227.GA3233@gmail.com> <55C21EFC.3060802@sr71.net> <20150806071545.GB2194@gmail.com>
In-Reply-To: <20150806071545.GB2194@gmail.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Just to be clear: the current code is OK and correct for non-compacted
buffers.  Since we currently disable the compacted buffers, this patch
has no effect on current kernels.

This patch fixes the (currently unused) calculation for sizing the
compacted-format buffer.  I can either send it now, or try to make sure
it gets picked up by whoever goes back and re-implements
XSAVES/compact-format support.

On 08/06/2015 12:15 AM, Ingo Molnar wrote:
> * Dave Hansen <dave@sr71.net> wrote:
>>> I realize that the calculation and what CPUID gives us should match, but it's 
>>> not really good for the kernel to not know the precise layout of a critical 
>>> task context data structure ...
>>
>> There is no architectural guarantee that the sum of xstate sizes will be the 
>> same as what comes out of that CPUID leaf.  It would be nice, but it's not 
>> architectural and I've run in to platforms where that assumption does not hold.
> 
> WHY?

From a real dmesg:

[    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[    0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
[    0.000000] x86/fpu: xstate_offset[4]: 0400, xstate_sizes[4]: 0040
...

Note: 0x240 + 0x100 != 0x3c0.
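
To spell out that arithmetic, here is a minimal sketch in plain C.  The
struct and helper names are made up for illustration (they are not
kernel code); the numbers are copied from the dmesg lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One state component as printed in the dmesg lines above. */
struct xstate_component {
	unsigned int nr;	/* state component number */
	uint32_t offset;	/* offset in the non-compacted buffer */
	uint32_t size;		/* size of the component in bytes */
};

/* Values copied from the dmesg output quoted above. */
static const struct xstate_component dmesg_components[] = {
	{ 2, 0x240, 0x100 },
	{ 3, 0x3c0, 0x040 },
	{ 4, 0x400, 0x040 },
};

/*
 * Hypothetical helper: number of bytes between the end of component i
 * and the start of component i + 1.  Zero means the two are adjacent.
 */
static uint32_t gap_after(const struct xstate_component *c, size_t n, size_t i)
{
	assert(i + 1 < n);
	return c[i + 1].offset - (c[i].offset + c[i].size);
}
```

Component 2 ends at 0x340 but component 3 starts at 0x3c0, so
gap_after(dmesg_components, 3, 0) is 0x80, while components 3 and 4 are
back to back.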

> What sense does it make to have a blob we don't know the exact layout of? How will 
> debuggers or user-space in general be able to print (and change) the register 
> values if they don't know the layout?

Ingo, we know the layout.  We know where every component is.  We know
how big each component is.  This patch does not change the fact that we
calculate and store that.

The *ONLY* thing it does is stop deriving the total (compacted) buffer
size from that layout, since we have another, simpler way of getting it.

In fact, it makes the compacted format size calculation work in an
analogous way to the non-compacted one that works today!
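
As a rough illustration (a sketch, not the patch itself): as I read the
SDM, CPUID leaf 0xd reports the non-compacted size in EBX of subleaf 0
and the compacted size in EBX of subleaf 1.  The cpuid_fn callback and
fake_cpuid() below are hypothetical stand-ins so the example runs
anywhere, and the sizes fake_cpuid() returns are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical callback standing in for a real CPUID wrapper. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

#define XSTATE_LEAF 0x0d	/* CPUID leaf that enumerates XSAVE state */

/*
 * Ask CPUID for the total buffer size instead of summing components.
 * Subleaf 0 covers the non-compacted format, subleaf 1 the compacted one.
 */
static uint32_t xsave_buffer_size(cpuid_fn cpuid, int compacted)
{
	assert(cpuid != NULL);
	return cpuid(XSTATE_LEAF, compacted ? 1 : 0).ebx;
}

/* Fake CPUID for illustration only; the sizes are made-up values. */
static struct cpuid_regs fake_cpuid(uint32_t leaf, uint32_t subleaf)
{
	struct cpuid_regs r = { 0, 0, 0, 0 };

	if (leaf == XSTATE_LEAF && subleaf == 0)
		r.ebx = 0x440;	/* made-up non-compacted size */
	else if (leaf == XSTATE_LEAF && subleaf == 1)
		r.ebx = 0x3c0;	/* made-up compacted size */
	return r;
}
```

On real x86 hardware the callback could wrap something like
__get_cpuid_count() from GCC's <cpuid.h>.  Either way, both formats get
sized by the same kind of single query.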

> If 'compacted' format means "binary blob only the CPU can decode, not the kernel" 
> then our answer is "uhm, no, thank you, we'll use standard format instead" ...

Nobody is saying that.  We need to read its contents (like with MPX),
and the CPU tells us everything we need in order to decode it.

> And no, "it's not Intel architectural" is a stupid and somewhat circular argument 
> IMHO: the kernel always knew how to decompose CPU context dumps and you'll have to 
> come up with a damn better reason to break that than pointing at some text in an 
> Intel document.

setup_xstate_features() still populates xstate_offsets[] which tells us
where to find each field.  This patch does not change that.

This only changes how we size the (compacted) buffer, not how we
decompose it.

>>> So can we turn this into 'double check the CPUID size and print a warning on 
>>> mismatch' kind of boot time sanity check? Preferably for all XSAVE* data 
>>> formats we can run into. I'd be fine with applying such a patch ahead of 
>>> enabling compaction again.
>>
>> I don't think that is sufficient.
>>
>> There are 4 reasons to apply this patch that I can think of:
>>    1. There is no architectural guarantee that the calculation (sum of
>>    xstate sizes) will match what CPUID gives us as the size of the
>>    buffer.  I've seen this in practice.
> 
> So the context layout and structure on such CPUs has to be mapped and properly 
> taken into account in the size calculation. How can GDB or any other (kernel) 
> debugger display (and change) individual fields reliably if the layout is not 
> known?

Anything wanting to decode the buffer needs to read the format from
CPUID and decode it that way.  From what I hear, GDB actually gets this
wrong, and there are folks looking to fix it.

>> 2. The alignment bit indicates that there is space used in the buffer
>>    which is not part of a state component.  The current code does not
>>    take that in to account.
> 
> Then it has to be taken into account - just like user-space has to take it into 
> account if it wants to display (and change) individual fields...

This isn't the end of the world (it's just a few more lines and cpuid
calls), but it is unnecessary work.
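
For reference, a sketch of what the by-hand calculation would look
like.  It assumes (per my reading of the SDM) that compacted components
start right after the 512-byte legacy area plus the 64-byte header, and
that a component with the alignment bit set starts on the next 64-byte
boundary.  The struct, the function and the example values are all
hypothetical, not from real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 512-byte legacy region plus 64-byte XSAVE header. */
#define LEGACY_AND_HEADER_SIZE 576u

/* Per-component data that would normally be read from CPUID. */
struct xcomp {
	uint32_t size;	/* component size in bytes */
	int align64;	/* nonzero if the component must be 64-byte aligned */
};

/* Walk the enabled components in order and return the total size. */
static uint32_t compacted_size(const struct xcomp *c, size_t n)
{
	uint32_t off = LEGACY_AND_HEADER_SIZE;
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (c[i].align64)
			off = (off + 63u) & ~63u;	/* round up to 64 */
		off += c[i].size;
	}
	return off;
}

/* Made-up example values chosen so the alignment padding shows up. */
static const struct xcomp example_components[] = {
	{ 0x100, 0 },
	{ 0x008, 0 },
	{ 0x040, 1 },
};
```

With these made-up values the plain sum is 576 + 0x100 + 0x8 + 0x40 =
904 bytes, but compacted_size() returns 960 because the third component
gets pushed from offset 840 up to 896.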

>> 3. The code is currently asking for the size of an XSAVE-produced
>>    buffer.  The code will be wrong the moment we switch to XSAVES
>>    because XSAVES saves more things than XSAVE and uses more space.
> 
> This will have to be fixed before we move to compacted format.

This patch is intended to help us move in the direction of re-enabling
the compacted format.