On 07/15/2015 04:07 AM, Ingo Molnar wrote: > * Dave Hansen wrote: >>> /* >>> - * Setup init_xstate_buf to represent the init state of >>> + * Setup init_xstate_ctx to represent the init state of >>> * all the features managed by the xsave >>> */ >>> - init_xstate_buf = alloc_bootmem_align(xstate_size, >>> - __alignof__(struct xsave_struct)); >>> - fx_finit(&init_xstate_buf->i387); >>> + fx_finit(&init_xstate_ctx.i387); >> >> This is causing memory corruption in 4.2-rc2. ... >> This patch works around the problem, btw: >> >> https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch > > Yeah, so I got this prototype hardware boot crash reported in private mail and > decoded it and after some debugging I suggested the +PAGE_SIZE hack - possibly you > got that hack from the same person? Nope, I came up with that gem of a patch all on my own. I also wouldn't characterize this as prototype hardware. There are obviously plenty of folks depending on mainline to boot and function on hardware that has AVX-512 support. That's why two different Intel folks came to you independently. > My suggestion was to solve this properly: if we list xstate features as supported > then we should size their max size correctly. The AVX bits are currently not > properly enumerated and sized - and I refuse to add feature support to the kernel > where per task CPU state fields that the kernel saves/restores are opaque... We might know the size and composition of the individual components, but we do not know the size of the buffer. Different implementations of a given feature are quite free to have different data stored in the buffer, or even to rearrange or pad it. That's why the sizes are not explicitly called out by the architecture and why we enumerated them before your patch that caused this regression. The component itself may not be opaque, but the size of the *buffer* is not a simple sum of the component sizes. Here's a real-world example: [ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100 [ 0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040 Notice that component 3 is not at 0x240+0x100. This means our existing init_xstate_size(), and why any attempt to staticlly-size the buffer is broken. I understand why you were misled by it, but the old "xsave_hdr_struct" was wrong. Fenghua even posted patches to remove it before the FPU rework (you were cc'd): https://lkml.org/lkml/2015/4/18/164 > So please add proper AVX512 support structures to fpu/types.h and size > XSTATE_RESERVE correctly - or alternatively we can remove the current incomplete > AVX512 bits. The old code sized the buffer in a fully architectural way and it worked. The CPU *tells* you how much memory the 'xsave' instruction is going to scribble on. The new code just merrily calls it and let it scribble away. This is as clear-cut a regression as I've ever seen. The least we can do is detect that the kernel undersized the buffer and disable support for the features that do not fit. A very lightly tested patch to do that is attached. I'm not super eager to put that in to an -rc2 kernel though.