From: Ingo Molnar <mingo@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-kernel@vger.kernel.org,
Andy Lutomirski <luto@amacapital.net>,
Borislav Petkov <bp@alien8.de>, Fenghua Yu <fenghua.yu@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [REGRESSION] 4.2-rc2: early boot memory corruption from FPU rework
Date: Fri, 17 Jul 2015 09:45:55 +0200
Message-ID: <20150717074555.GA31873@gmail.com>
In-Reply-To: <55A6FC31.5010102@linux.intel.com>
* Dave Hansen <dave.hansen@linux.intel.com> wrote:
> On 07/15/2015 04:07 AM, Ingo Molnar wrote:
> > * Dave Hansen <dave.hansen@linux.intel.com> wrote:
> >>> /*
> >>> - * Setup init_xstate_buf to represent the init state of
> >>> + * Setup init_xstate_ctx to represent the init state of
> >>> * all the features managed by the xsave
> >>> */
> >>> - init_xstate_buf = alloc_bootmem_align(xstate_size,
> >>> - __alignof__(struct xsave_struct));
> >>> - fx_finit(&init_xstate_buf->i387);
> >>> + fx_finit(&init_xstate_ctx.i387);
> >>
> >> This is causing memory corruption in 4.2-rc2.
> ...
> >> This patch works around the problem, btw:
> >>
> >> https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch
> >
> > Yeah, so I got this prototype hardware boot crash reported in private mail and
> > decoded it and after some debugging I suggested the +PAGE_SIZE hack - possibly you
> > got that hack from the same person?
>
> Nope, I came up with that gem of a patch all on my own.
:)
> I also wouldn't characterize this as prototype hardware. There are obviously
> plenty of folks depending on mainline to boot and function on hardware that has
> AVX-512 support. That's why two different Intel folks came to you
> independently.
Yeah, so I treat it as a regression even if it's unreleased hw: what matters for
regressions is the number of people affected, plus that the kernel should work for a
reasonable set of future hardware as well, without much trouble.
Just curious: does any released hardware have AVX-512? I went by Wikipedia, which
seems to list pre-release hw:
https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
Intel
Xeon Phi Knights Landing: AVX-512 F, CDI, PFI and ERI[1] in 2015[6]
Xeon Skylake: AVX-512 F, CDI, VL, BW, and DQ[7] in 2015[8]
Cannonlake (speculation)
> > My suggestion was to solve this properly: if we list xstate features as
> > supported then we should size their max size correctly. The AVX bits are
> > currently not properly enumerated and sized - and I refuse to add feature
> > support to the kernel where per task CPU state fields that the kernel
> > saves/restores are opaque...
>
> We might know the size and composition of the individual components, but we do
> not know the size of the buffer. Different implementations of a given feature
> are quite free to have different data stored in the buffer, or even to rearrange
> or pad it. That's why the sizes are not explicitly called out by the
> architecture and why we enumerated them before your patch that caused this
> regression.
But we _have_ to know the structure and layout of the XSAVE context for any
reasonable ptrace and signal frame support. Can you set/get AVX-512 registers via
ptrace? MPX state?
That's one of the reasons why I absolutely hate how this 'opaque per task CPU
context blob' concept snuck into the x86 code via the XSAVE patches without proper
enumeration of the data structures, sorry...
It makes it way too easy to 'support' CPU features without actually doing a good
job of it - and in fact it makes certain reasonable things impossible or very,
very hard, which makes me nervous.
But we'll fix the boot regression, no argument about that!
> The component itself may not be opaque, but the size of the *buffer* is not a
> simple sum of the component sizes. Here's a real-world example:
>
> [ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
> [ 0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
>
> Notice that component 3 is not at 0x240+0x100. This is why our existing
> init_xstate_size() is wrong, and why any attempt to statically size the buffer is broken.
>
> I understand why you were misled by it, but the old "xsave_hdr_struct" was
> wrong. Fenghua even posted patches to remove it before the FPU rework (you were
> cc'd):
>
> https://lkml.org/lkml/2015/4/18/164
Yeah, so I thought the worst bugs were fixed and that these would re-emerge on top
of the new code.
Whether we have a static limit or not is orthogonal to the issue of sizing it
properly - and the plan was to have a dynamic context area in any case.
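To make the arithmetic behind the quoted log lines concrete, here is a small user-space sketch (not kernel code, and not part of any posted patch). It compares a naive "sum of component sizes" with the real extent implied by the offsets. The struct and function names are made up for illustration; the only inputs taken from this thread are the two offset/size pairs in the log, plus the well-known 512-byte legacy area and 64-byte XSAVE header that put the first extended component at 0x240.

```c
#include <assert.h>
#include <stddef.h>

/* One XSAVE state component, as printed in the quoted boot log. */
struct xcomp {
	unsigned int offset;	/* byte offset inside the XSAVE buffer */
	unsigned int size;	/* size of the component in bytes */
};

/* Legacy FXSAVE area (512 bytes) + XSAVE header (64 bytes) = 0x240. */
#define XSAVE_LEGACY_AND_HDR	0x240u

/* Naive sizing: assume the components are packed back to back. */
static unsigned int size_by_sum(const struct xcomp *c, size_t n)
{
	unsigned int total = XSAVE_LEGACY_AND_HDR;
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += c[i].size;
	return total;
}

/* Layout-aware sizing: the buffer ends where the furthest component ends. */
static unsigned int size_by_extent(const struct xcomp *c, size_t n)
{
	unsigned int end = XSAVE_LEGACY_AND_HDR;
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (c[i].offset + c[i].size > end)
			end = c[i].offset + c[i].size;
	return end;
}

/* Values copied from the xstate_offset[]/xstate_sizes[] lines quoted above. */
static const struct xcomp quoted_log[] = {
	{ 0x240, 0x100 },	/* component 2 */
	{ 0x3c0, 0x040 },	/* component 3 */
};
```

For the two quoted components, size_by_sum() yields 0x380 while size_by_extent() yields 0x400, so a buffer sized by summation would come up 0x80 bytes short.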
> > So please add proper AVX512 support structures to fpu/types.h and size
> > XSTATE_RESERVE correctly - or alternatively we can remove the current
> > incomplete AVX512 bits.
>
> The old code sized the buffer in a fully architectural way and it worked. The
> CPU *tells* you how much memory the 'xsave' instruction is going to scribble on.
> The new code just merrily calls it and lets it scribble away. This is as
> clear-cut a regression as I've ever seen.
This is a regression which we'll fix, but the 'old' dynamic code clearly did not
work for a long time - I'm sure you still remember my attempt at addressing the
worst fallout in:
e88221c50cad ("x86/fpu: Disable XSAVES* support for now")
Those kinds of totally non-working aspects were what made me nervous about the
opaque data structure aspect.
Because we can have dynamic sizing of the context area and non-opaque data
structures.
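For reference, the architectural size query mentioned in the quoted text can be tried from user space. This is only a sketch under stated assumptions: it assumes GCC or Clang (for <cpuid.h> and its __get_cpuid_count() helper) and an x86 build, and the function name is invented for this example. CPUID leaf 0xD, sub-leaf 0 reports in EBX the XSAVE area size needed for the features currently enabled in XCR0.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header (assumption) */
#endif

/*
 * Returns the XSAVE area size in bytes that CPUID leaf 0xD, sub-leaf 0
 * reports in EBX for the features currently enabled in XCR0.
 * Returns 0 if the leaf is unavailable (non-x86 build, or a CPU that
 * does not implement this leaf).
 */
static uint32_t xsave_size_for_enabled_features(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when the leaf is not supported. */
	if (!__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return ebx;
#else
	return 0;
#endif
}
```

The value is hardware- and configuration-dependent, so no expected output is given here; the point is only that the size comes from the CPU rather than from a compile-time constant.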
> The least we can do is detect that the kernel undersized the buffer and disable
> support for the features that do not fit. A very lightly tested patch to do
> that is attached. I'm not super eager to put that into an -rc2 kernel though.
Ok, this approach looks good to me as an interim fix. I'll give it a whirl on
older hardware. I agree with you that it needs to be sized dynamically.
> This came out a lot more complicated than I would have liked.
>
> Instead of simply enabling all of the XSAVE features that we both know about and
> the CPU supports, we have to be careful to not overflow our buffer in
> 'init_fpstate.xsave'.
Yeah, and this can be fixed separately and on top of your fix: my plan during the
FPU rework was to move the context area to the end of task_struct and size it
dynamically.
This needs some (very minor) changes to kernel/fork.c to allow an architecture to
determine the full task_struct size dynamically - but looks very doable and clean.
Wanna try this, or should I?
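A rough user-space sketch of the "context area at the end of the task structure" idea, to show the shape of it. Everything here is hypothetical: struct toy_task is a stand-in for task_struct, the helper names are invented, and the real change would live in kernel/fork.c and the arch code rather than use calloc(). The sketch also ignores the 64-byte alignment that a real XSAVE area needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct toy_task {
	int pid;
	/* ... other fixed-size fields would go here ... */
	size_t xstate_size;	/* bytes available in xstate[] */
	unsigned char xstate[];	/* context area, sized at boot, must be last */
};

/* Full object size: fixed part plus the dynamically sized context area. */
static size_t toy_task_size(size_t xstate_size)
{
	return sizeof(struct toy_task) + xstate_size;
}

/* Allocate a zeroed task whose context area holds 'xstate_size' bytes. */
static struct toy_task *toy_task_alloc(size_t xstate_size)
{
	struct toy_task *t = calloc(1, toy_task_size(xstate_size));

	if (t)
		t->xstate_size = xstate_size;
	return t;
}

/* fork-style duplication: copy the full, dynamically determined size. */
static struct toy_task *toy_task_dup(const struct toy_task *src)
{
	struct toy_task *dst;

	assert(src != NULL);
	dst = toy_task_alloc(src->xstate_size);
	if (dst)
		memcpy(dst, src, toy_task_size(src->xstate_size));
	return dst;
}
```

The design point is that only the allocation and copy paths need to know the runtime size; everything else keeps addressing the fixed fields as before.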
> To do this, we enable each XSAVE feature and then ask the CPU how large that
> makes the buffer. If we would overflow the buffer we allocated, we turn off the
> feature.
>
> This means that no matter what the CPU does, we will not corrupt random memory
> like we do before this patch. It also means that we can fall back in a way
> which cripples the system the least.
Yes, agreed.
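The fallback logic described in the quoted text can be sketched roughly as below. This is an illustration only, not the attached patch: fit_features() and the xsave_size_fn callback are invented names, and toy_size_for() is a toy stand-in for "ask the CPU", reusing the two offsets from the log quoted earlier in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: buffer size the CPU would need for 'mask'. */
typedef unsigned int (*xsave_size_fn)(uint64_t mask);

/*
 * Try each requested feature bit in turn and keep it only if the
 * resulting buffer still fits into 'buf_size' bytes.
 */
static uint64_t fit_features(uint64_t wanted, unsigned int buf_size,
			     xsave_size_fn size_for)
{
	uint64_t enabled = 0;
	int bit;

	assert(size_for != NULL);
	for (bit = 0; bit < 64; bit++) {
		uint64_t candidate = enabled | (1ULL << bit);

		if (!(wanted & (1ULL << bit)))
			continue;
		if (size_for(candidate) <= buf_size)
			enabled = candidate;	/* fits: keep the feature */
		/* otherwise leave the bit off and try the next one */
	}
	return enabled;
}

/* Toy stand-in for the CPU, using the offsets/sizes from the quoted log. */
static unsigned int toy_size_for(uint64_t mask)
{
	unsigned int end = 0x240;	/* legacy area + XSAVE header */

	if (mask & (1ULL << 2))
		end = 0x240 + 0x100;	/* component 2 ends at 0x340 */
	if (mask & (1ULL << 3))
		end = 0x3c0 + 0x040;	/* component 3 ends at 0x400 */
	return end;
}
```

With a 0x380-byte buffer and features 0-3 requested, this keeps bits 0-2 and drops bit 3, since enabling it would need 0x400 bytes.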
Thanks,
Ingo