* [RFC] x86: xsave/xrstor support, ucontext_t extensions
@ 2008-05-13 1:10 Suresh Siddha
2008-05-16 13:26 ` Mikael Pettersson
0 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-05-13 1:10 UTC (permalink / raw)
To: mingo, hpa, tglx, torvalds, akpm, andi, roland, drepper,
Hongjiu.lu
Cc: linux-kernel, arjan, rmk+lkml, dan, asit.k.mallick
hi,
Appended patch adds the support for xsave/xrstor infrastructure for x86.
xsave/xrstor manages the existing and future processor extended states in x86
architecutre.
More info on xsave/xrstor can be found in the Intel SDM's located at
http://www.intel.com/products/processor/manuals/index.htm
Please let me know your feedback and comments. Specifically, I am not sure
if I break anything or make anyone's life harder with the ucontext_t extensions
that are proposed in the patch.
Similar to fpstate, xsave state need to be saved/restored across signals.
x86 sigcontext doesn't seem to have any unused space, while x86_64 has
some unused space(reserved1[8]) in sigcontext.
To keep it consistent across 32bit and 64bit, ucontext is extended with
the new state context. Please review and let me know if you foresee any
compatibility or other issues with these extensions.
BTW, Traditionally glibc has this definition for struct ucontext.
/* Userlevel context. */
typedef struct ucontext
{
unsigned long int uc_flags;
struct ucontext *uc_link;
stack_t uc_stack;
mcontext_t uc_mcontext;
__sigset_t uc_sigmask;
struct _libc_fpstate __fpregs_mem;
} ucontext_t;
And application uses the same structure for get/setcontext() routines and
to refer process context in signal handler routines.
Kernel which sets up the signal handling context has this definition:
struct ucontext {
unsigned long uc_flags;
struct ucontext *uc_link;
stack_t uc_stack;
struct sigcontext uc_mcontext;
sigset_t uc_sigmask;
};
Though the kernel Vs user ucontext look somewhat similar, kernel's ucontext
struct is different from glibc's ucontext struct because of sigset_t size
differences between user Vs kernel. So fpstate in signal handlers must always
be referred through the pointer in sigcontext, and not directly through
__fpregs_mem in userlevels ucontext struct.
glibc perhaps need to use different context structures, one for
get/setcontext() and another for signal handling? Signal handling
context will be governed by the kernel and context info
referred by get/setcontext() will be governed by glibc. This is specfically
needed if glibc's get/setcontext() want to play with xsave info aswell.
This kernel patch is adding a pointer to ucontext for representing
xsave context (size of this area will be determined by the processor
and kernel capabilities). If at some point, this state need to be saved/restored
by get/setcontext() glibc routines and if they want to support application usage
like:
ucontext_t context;
void save()
{
getcontext(&context);
}
void restore()
{
setcontext(&context);
}
then, context information used by get/setcontext() need to evolve independently
from the signal handler context information provided by the kernel.
Comments?
thanks,
suresh
---
[RFC] x86: xsave/xrstor support
The layout of the xsave/xrstor area extends from the 512-byte FXSAVE/FXRSTOR
layout. xsave/xrstor area layout consists of:
- fxsave/fxrstor area (512 bytes)
- xsave header area (64 bytes)
- set of save areas, each corresponding to a processor extended state
The number of save areas, the offset and the size of each save area is
enumerated by CPUID leaf function 0xd.
This patch includes the basic xsave/xrstor infrastructure, which includes:
- context switch support, extending traditional lazy restore mechanism
- signal handling support, extending ucontext_t
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
Index: linux-2.6-x86/arch/x86/kernel/xsave.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86/arch/x86/kernel/xsave.c 2008-05-12 13:09:56.000000000 -0700
@@ -0,0 +1,272 @@
+/*
+ * xsave/xrstor support.
+ *
+ * Suresh Siddha <suresh.b.siddha@intel.com>
+ */
+#include <linux/bootmem.h>
+#include <linux/compat.h>
+#include <asm/i387.h>
+
+#ifdef CONFIG_X86_64
+#include <asm/sigcontext32.h>
+#endif
+
+unsigned int pcntxt_hmask, pcntxt_lmask;
+struct xsave_struct *init_xstate_buf;
+
+#ifdef CONFIG_X86_64
+/*
+ * Signal frame handlers.
+ */
+int save_i387_xstate(void __user *buf)
+{
+ struct task_struct *tsk = current;
+ int err = 0;
+
+ if ((unsigned long)buf % 64)
+ printk("save_i387_xstate: bad xstate %p\n", buf);
+
+ clear_used_math(); /* trigger finit */
+ if (task_thread_info(tsk)->status & TS_USEDFPU) {
+ if (cpu_has_xsave)
+ err = save_xstate_checking(buf);
+ else
+ err = save_i387_checking(buf);
+
+ if (err)
+ return err;
+
+ task_thread_info(tsk)->status &= ~TS_USEDFPU;
+ stts();
+ } else {
+ if (__copy_to_user(buf, &tsk->thread.xstate->fxsave,
+ xstate_size))
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * This restores directly out of user space. Exceptions are handled.
+ */
+int restore_i387_xstate(struct _fpstate __user *buf,
+ struct xstate_cntxt __user *buf1)
+{
+ int err = 0;
+ int size = 0;
+ unsigned int lmask = 0, hmask = 0;
+ struct _xstate __user *xstate = 0;
+
+ if (cpu_has_xsave) {
+ __get_user(size, &buf1->size);
+ __get_user(lmask, &buf1->lmask);
+ __get_user(hmask, &buf1->hmask);
+ __get_user(xstate, &buf1->xstate);
+ }
+
+ if (!buf && !xstate)
+ goto init_state;
+
+ set_used_math();
+ if (!(task_thread_info(current)->status & TS_USEDFPU)) {
+ clts();
+ task_thread_info(current)->status |= TS_USEDFPU;
+ }
+
+ if (xstate) {
+ err = xrestore_checking((struct xsave_struct *) xstate, size,
+ lmask, hmask);
+ if (err)
+ goto init_state;
+ /*
+ * initialize the other extended state that the kernel
+ * knows and not specifed in the user restore masks.
+ */
+ init_xstate(pcntxt_lmask & ~(XSTATE_FPSSE | lmask),
+ pcntxt_hmask & ~hmask);
+ } else
+ init_xstate(pcntxt_lmask & ~XSTATE_FPSSE, pcntxt_hmask);
+
+ if (buf) {
+ if (!access_ok(VERIFY_READ, buf, sizeof(*buf))) {
+ err = -1;
+ goto init_state;
+ }
+
+ err = restore_fpu_checking(buf);
+ if (err)
+ goto init_state;
+ } else
+ init_xstate(XSTATE_FPSSE, 0);
+
+ if (!err)
+ return 0;
+
+init_state:
+ if (used_math()) {
+ clear_fpu(current);
+ clear_used_math();
+ }
+
+ return err;
+}
+#endif
+
+#ifndef CONFIG_X86_64
+# define _fpstate_ia32 _fpstate
+# define xstate_cntxt_ia32 xstate_cntxt
+#endif
+
+/*
+ * FP and extended context restore during signal return. Extended state is
+ * restored directly from user space. Exceptions are handled.
+ */
+int restore_user_xstate(struct _fpstate_ia32 __user *buf,
+ struct xstate_cntxt_ia32 __user *buf1)
+{
+ int err = 0;
+ struct task_struct *tsk = current;
+ int size = 0, lmask = 0, hmask = 0;
+ struct _xstate *xstate;
+
+ if (!buf && !buf1)
+ goto init_state;
+
+ set_used_math();
+ if (!(task_thread_info(current)->status & TS_USEDFPU)) {
+ clts();
+ task_thread_info(current)->status |= TS_USEDFPU;
+ }
+
+ __get_user(size, &buf1->size);
+ __get_user(lmask, &buf1->lmask);
+ __get_user(hmask, &buf1->hmask);
+
+#ifdef CONFIG_IA32_EMULATION
+ {
+ u32 tmp;
+ __get_user(tmp, &buf1->xstate);
+ xstate = compat_ptr(tmp);
+ }
+#else
+ __get_user(xstate, &buf1->xstate);
+#endif
+
+ if (xstate) {
+ /*
+ * Restore directly from the user space and handle the possible
+ * exception. This way, we don't have to do manual error
+ * checking on the user buffer contents.
+ */
+ err = xrestore_checking((__force struct xsave_struct *) xstate,
+ size, lmask, hmask);
+ if (err)
+ return err;
+ /*
+ * initialize the other extended state that the kernel
+ * knows and not specifed in the user restore masks.
+ */
+ init_xstate(pcntxt_lmask & ~(XSTATE_FPSSE | lmask),
+ pcntxt_hmask & ~hmask);
+ } else
+ init_xstate(pcntxt_lmask & ~XSTATE_FPSSE, pcntxt_hmask);
+
+ /*
+ * FP and SSE state can't be restored directly from the userspace
+ * because of legacy reasons. Lets restore it to the fpstate
+ * in the task struct.
+ */
+ unlazy_fpu(tsk);
+
+ if (buf) {
+ /*
+ * legacy FP and SSE restore.
+ */
+ err = restore_i387_fxsave(buf);
+ if (err)
+ return err;
+ } else
+ /*
+ * initialize tasks fpstate in the memory.
+ */
+ init_task_fpstate(tsk);
+
+ return err;
+
+init_state:
+ if (used_math()) {
+ clear_fpu(current);
+ clear_used_math();
+ }
+
+ return err;
+}
+
+/*
+ * Enable the extended processor state save/restore feature
+ */
+void __cpuinit xsave_init(void)
+{
+ if (!cpu_has_xsave)
+ return;
+
+ set_in_cr4(X86_CR4_OSXSAVE);
+
+ /*
+ * Enable all the features that the HW is capable of
+ * and the Linux kernel is aware of.
+ *
+ * xsetbv();
+ */
+ asm volatile(".byte 0x0f,0x01,0xd1"::"c" (0),
+ "a" (pcntxt_lmask), "d" (pcntxt_hmask));
+}
+
+/*
+ * setup the xstate image representing the init state
+ */
+void setup_xstate_init(void)
+{
+ init_xstate_buf = alloc_bootmem(xstate_size);
+ init_xstate_buf->i387.mxcsr = MXCSR_DEFAULT;
+}
+
+/*
+ * Enable and initialize the xsave feature.
+ */
+void __init xsave_cntxt_init(void)
+{
+ unsigned int eax, ebx, ecx, edx;
+
+ cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx);
+
+ pcntxt_lmask = eax;
+ pcntxt_hmask = edx;
+
+ if ((pcntxt_lmask & XSTATE_FPSSE) != XSTATE_FPSSE) {
+ printk("FP/SSE not shown under xsave features %x\n",
+ pcntxt_lmask);
+ BUG();
+ }
+
+ /*
+ * for now OS knows only about FP/SSE
+ */
+ pcntxt_lmask = pcntxt_lmask & XCNTXT_LMASK;
+ pcntxt_hmask = pcntxt_hmask & XCNTXT_HMASK;
+
+ xsave_init();
+
+ /*
+ * Recompute the context size for enabled features
+ */
+ cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx);
+
+ xstate_size = ebx;
+
+ setup_xstate_init();
+
+ printk("xsave/xrstor: cntxt size %x, supported lmask %x, hmask %x\n",
+ xstate_size, pcntxt_lmask, pcntxt_hmask);
+}
Index: linux-2.6-x86/include/asm-x86/xsave.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86/include/asm-x86/xsave.h 2008-05-12 14:43:07.000000000 -0700
@@ -0,0 +1,120 @@
+#ifndef __ASM_X86_64_XSAVE_H
+#define __ASM_X86_64_XSAVE_H
+
+#include <asm/processor.h>
+#include <asm/i387.h>
+
+#define XSTATE_FP 0x1
+#define XSTATE_SSE 0x2
+
+#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
+
+#define FXSAVE_SIZE 512
+
+/*
+ * These are the masks that OS can handle currently.
+ */
+#define XCNTXT_LMASK (XSTATE_FP | XSTATE_SSE)
+#define XCNTXT_HMASK 0x0
+
+#ifdef CONFIG_X86_64
+#define REX_PREFIX "0x48, "
+#else
+#define REX_PREFIX
+#endif
+
+extern unsigned int xstate_size, pcntxt_hmask, pcntxt_lmask;
+extern struct xsave_struct *init_xstate_buf;
+
+extern void xsave_cntxt_init(void);
+extern void xsave_init(void);
+
+static inline void xrstor(struct xsave_struct *fx)
+{
+ asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
+ :: "D" (fx), "m" (*fx), "a" (-1), "d" (-1) : "memory");
+}
+
+static inline int xsave(struct task_struct *tsk)
+{
+ /* This, however, we can work around by forcing the compiler to select
+ an addressing mode that doesn't require extended registers. */
+ __asm__ __volatile__(".byte " REX_PREFIX "0x0f,0xae,0x27"
+ ::"D" (&(tsk->thread.xstate->xsave)),
+ "a" (-1), "d"(-1) : "memory");
+
+ return 0;
+}
+
+static inline int save_xstate_checking(struct xsave_struct __user *buf)
+{
+ int err;
+ __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl $-1,%[err]\n"
+ " jmp 2b\n"
+ ".previous\n"
+ ".section __ex_table,\"a\"\n"
+ _ASM_ALIGN "\n"
+ _ASM_PTR "1b,3b\n"
+ ".previous"
+ : [err] "=r" (err)
+ : "D" (buf), "a" (-1), "d" (-1), "0" (0)
+ : "memory");
+ if (unlikely(err) && __clear_user(buf, xstate_size))
+ err = -EFAULT;
+ /* No need to clear here because the caller clears USED_MATH */
+ return err;
+}
+
+static inline int xrestore_checking(struct xsave_struct __user *buf,
+ int size, unsigned int lmask,
+ unsigned int hmask)
+{
+ int err;
+ struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
+ int eax = lmask & ~0x3;
+ int edx = hmask;
+
+ if (!access_ok(VERIFY_READ, buf, size))
+ return -1;
+
+ __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl $-1,%[err]\n"
+ " jmp 2b\n"
+ ".previous\n"
+ ".section __ex_table,\"a\"\n"
+ _ASM_ALIGN "\n"
+ _ASM_PTR "1b,3b\n"
+ ".previous"
+ : [err] "=r" (err)
+ : "D" (xstate), "a" (eax), "d" (edx), "0" (0)
+ : "memory"); //memory required?
+ return err;
+}
+
+static inline void xrstor_state(struct xsave_struct *fx, int lmask, int hmask)
+{
+ asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
+ :: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+ : "memory");
+}
+
+static inline void init_xstate(int lmask, int hmask)
+{
+ if (cpu_has_xsave)
+ xrstor_state(init_xstate_buf, lmask, hmask);
+}
+
+static inline void init_task_fpstate(struct task_struct *tsk)
+{
+ struct xsave_struct *xstate = &tsk->thread.xstate->xsave;
+ if (cpu_has_xsave) {
+ xstate->xsave_hdr.xstate_bv &= ~XSTATE_FPSSE;
+ xstate->i387.mxcsr = MXCSR_DEFAULT;
+ }
+}
+#endif
Index: linux-2.6-x86/include/asm-x86/processor-flags.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/processor-flags.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/processor-flags.h 2008-05-12 13:09:56.000000000 -0700
@@ -59,6 +59,7 @@
#define X86_CR4_OSFXSR 0x00000200 /* enable fast FPU save and restore */
#define X86_CR4_OSXMMEXCPT 0x00000400 /* enable unmasked SSE exceptions */
#define X86_CR4_VMXE 0x00002000 /* enable VMX virtualization */
+#define X86_CR4_OSXSAVE 0x00040000 /* enable xsave and xrestore */
/*
* x86-64 Task Priority Register, CR8
Index: linux-2.6-x86/arch/x86/kernel/traps_64.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/traps_64.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/traps_64.c 2008-05-12 13:09:56.000000000 -0700
@@ -1155,7 +1155,7 @@
}
clts(); /* Allow maths ops (or we recurse) */
- restore_fpu_checking(&me->thread.xstate->fxsave);
+ restore_fpu_xstate(me);
task_thread_info(me)->status |= TS_USEDFPU;
me->fpu_counter++;
}
@@ -1191,10 +1191,6 @@
#endif
/*
- * initialize the per thread extended state:
- */
- init_thread_xstate();
- /*
* Should be a barrier for any external CPU state.
*/
cpu_init();
Index: linux-2.6-x86/arch/x86/kernel/signal_64.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/signal_64.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/signal_64.c 2008-05-12 13:09:56.000000000 -0700
@@ -59,7 +59,7 @@
*/
static int
restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
- unsigned long *pax)
+ struct xstate_cntxt __user *uc_xstate, unsigned long *pax)
{
unsigned int err = 0;
@@ -99,24 +99,11 @@
struct _fpstate __user * buf;
err |= __get_user(buf, &sc->fpstate);
- if (buf) {
- if (!access_ok(VERIFY_READ, buf, sizeof(*buf)))
- goto badframe;
- err |= restore_i387(buf);
- } else {
- struct task_struct *me = current;
- if (used_math()) {
- clear_fpu(me);
- clear_used_math();
- }
- }
+ err |= restore_i387_xstate(buf, uc_xstate);
}
err |= __get_user(*pax, &sc->ax);
return err;
-
-badframe:
- return 1;
}
asmlinkage long sys_rt_sigreturn(struct pt_regs *regs)
@@ -137,7 +124,8 @@
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- if (restore_sigcontext(regs, &frame->uc.uc_mcontext, &ax))
+ if (restore_sigcontext(regs, &frame->uc.uc_mcontext,
+ &frame->uc.uc_xstate, &ax))
goto badframe;
if (do_sigaltstack(&frame->uc.uc_stack, NULL, regs->sp) == -EFAULT)
@@ -155,7 +143,8 @@
*/
static inline int
-setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs, unsigned long mask, struct task_struct *me)
+setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs,
+ unsigned long mask, struct task_struct *me)
{
int err = 0;
@@ -207,7 +196,7 @@
sp = current->sas_ss_sp + current->sas_ss_size;
}
- return (void __user *)round_down(sp - size, 16);
+ return (void __user *)round_down(sp - size, 64);
}
static int setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
@@ -219,14 +208,14 @@
struct task_struct *me = current;
if (used_math()) {
- fp = get_stack(ka, regs, sizeof(struct _fpstate));
+ fp = get_stack(ka, regs, xstate_size);
frame = (void __user *)round_down(
(unsigned long)fp - sizeof(struct rt_sigframe), 16) - 8;
- if (!access_ok(VERIFY_WRITE, fp, sizeof(struct _fpstate)))
+ if (!access_ok(VERIFY_WRITE, fp, xstate_size))
goto give_sigsegv;
- if (save_i387(fp) < 0)
+ if (save_i387_xstate(fp) < 0)
err |= -1;
} else
frame = get_stack(ka, regs, sizeof(struct rt_sigframe)) - 8;
@@ -249,6 +238,19 @@
err |= __put_user(me->sas_ss_size, &frame->uc.uc_stack.ss_size);
err |= setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0], me);
err |= __put_user(fp, &frame->uc.uc_mcontext.fpstate);
+
+ if (cpu_has_xsave) {
+ err |= __put_user(fp, &frame->uc.uc_xstate.xstate);
+ err |= __put_user(xstate_size, &frame->uc.uc_xstate.size);
+ err |= __put_user(pcntxt_lmask, &frame->uc.uc_xstate.lmask);
+ err |= __put_user(pcntxt_hmask, &frame->uc.uc_xstate.hmask);
+ } else {
+ err |= __put_user(0, &frame->uc.uc_xstate.xstate);
+ err |= __put_user(0, &frame->uc.uc_xstate.size);
+ err |= __put_user(0, &frame->uc.uc_xstate.lmask);
+ err |= __put_user(0, &frame->uc.uc_xstate.hmask);
+ }
+
if (sizeof(*set) == 16) {
__put_user(set->sig[0], &frame->uc.uc_sigmask.sig[0]);
__put_user(set->sig[1], &frame->uc.uc_sigmask.sig[1]);
Index: linux-2.6-x86/include/asm-x86/sigcontext.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/sigcontext.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/sigcontext.h 2008-05-12 14:54:21.000000000 -0700
@@ -202,4 +202,26 @@
#endif /* !__i386__ */
+struct _xsave_hdr_struct {
+ u64 xstate_bv;
+ u64 reserved1[2];
+ u64 reserved2[5];
+} __attribute__((packed));
+
+struct _xstate {
+ /*
+ * Applications need to refer to fpstate through fpstate pointer
+ * in sigcontext. Not here directly.
+ */
+ struct _fpstate fpstate;
+ struct _xsave_hdr_struct xsave_hdr;
+ /* new processor state extensions will go here */
+} __attribute__ ((aligned (64)));
+
+struct xstate_cntxt {
+ struct _xstate __user *xstate;
+ u32 size;
+ u32 lmask;
+ u32 hmask;
+};
#endif
Index: linux-2.6-x86/arch/x86/ia32/ia32_signal.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/ia32/ia32_signal.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/ia32/ia32_signal.c 2008-05-12 13:09:56.000000000 -0700
@@ -174,9 +174,10 @@
u32 pretcode;
int sig;
struct sigcontext_ia32 sc;
- struct _fpstate_ia32 fpstate;
+ struct xstate_cntxt_ia32 xst_cnxt;
unsigned int extramask[_COMPAT_NSIG_WORDS-1];
char retcode[8];
+ /* fp and rest of the extended context state follows here */
};
struct rt_sigframe
@@ -187,8 +188,8 @@
u32 puc;
compat_siginfo_t info;
struct ucontext_ia32 uc;
- struct _fpstate_ia32 fpstate;
char retcode[8];
+ /* fp and rest of the extended context state follows here */
};
#define COPY(x) { \
@@ -207,7 +208,8 @@
static int ia32_restore_sigcontext(struct pt_regs *regs,
struct sigcontext_ia32 __user *sc,
- unsigned int *peax)
+ unsigned int *peax,
+ struct xstate_cntxt_ia32 __user *xst_cntxt)
{
unsigned int tmpflags, gs, oldgs, err = 0;
struct _fpstate_ia32 __user *buf;
@@ -254,26 +256,13 @@
err |= __get_user(tmp, &sc->fpstate);
buf = compat_ptr(tmp);
- if (buf) {
- if (!access_ok(VERIFY_READ, buf, sizeof(*buf)))
- goto badframe;
- err |= restore_i387_ia32(buf);
- } else {
- struct task_struct *me = current;
- if (used_math()) {
- clear_fpu(me);
- clear_used_math();
- }
- }
+ err |= restore_i387_xstate_ia32(buf, xst_cntxt);
err |= __get_user(tmp, &sc->ax);
*peax = tmp;
return err;
-
-badframe:
- return 1;
}
asmlinkage long sys32_sigreturn(struct pt_regs *regs)
@@ -281,6 +270,7 @@
struct sigframe __user *frame = (struct sigframe __user *)(regs->sp-8);
sigset_t set;
unsigned int ax;
+ struct xstate_cntxt_ia32 __user *xst_cnxt = &frame->xst_cnxt;
if (!access_ok(VERIFY_READ, frame, sizeof(*frame)))
goto badframe;
@@ -297,7 +287,7 @@
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- if (ia32_restore_sigcontext(regs, &frame->sc, &ax))
+ if (ia32_restore_sigcontext(regs, &frame->sc, &ax, xst_cnxt))
goto badframe;
return ax;
@@ -326,7 +316,8 @@
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- if (ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext, &ax))
+ if (ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext, &ax,
+ &frame->uc.uc_xstate))
goto badframe;
tregs = *regs;
@@ -345,8 +336,9 @@
*/
static int ia32_setup_sigcontext(struct sigcontext_ia32 __user *sc,
+ struct pt_regs *regs, unsigned int mask,
struct _fpstate_ia32 __user *fpstate,
- struct pt_regs *regs, unsigned int mask)
+ struct xsave_struct __user *xstate)
{
int tmp, err = 0;
@@ -376,7 +368,7 @@
err |= __put_user((u32)regs->flags, &sc->flags);
err |= __put_user((u32)regs->sp, &sc->sp_at_signal);
- tmp = save_i387_ia32(fpstate);
+ tmp = save_i387_xstate_ia32(fpstate, xstate);
if (tmp < 0)
err = -EFAULT;
else {
@@ -397,7 +389,9 @@
* Determine which stack to use..
*/
static void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
- size_t frame_size)
+ int frame_size,
+ struct _fpstate_ia32 **fpstate,
+ struct xsave_struct **xstate)
{
unsigned long sp;
@@ -416,7 +410,19 @@
ka->sa.sa_restorer)
sp = (unsigned long) ka->sa.sa_restorer;
- sp -= frame_size;
+ if (used_math()) {
+ sp = round_down(sp - xstate_size, 64);
+ if (cpu_has_xsave)
+ *xstate = (struct xsave_struct *) sp;
+
+ sp = sp - (sizeof(struct _fpstate_ia32) - FXSAVE_SIZE);
+
+ *fpstate = (struct _fpstate_ia32 *) sp;
+
+ sp = sp - frame_size;
+ } else
+ sp -= frame_size;
+
/* Align the stack pointer according to the i386 ABI,
* i.e. so that on function entry ((sp + 4) & 15) == 0. */
sp = ((sp + 4) & -16ul) - 4;
@@ -429,6 +435,8 @@
struct sigframe __user *frame;
void __user *restorer;
int err = 0;
+ struct _fpstate_ia32 __user *fpstate = 0;
+ struct xsave_struct __user *xstate = 0;
/* copy_to_user optimizes that into a single 8 byte store */
static const struct {
@@ -443,7 +451,7 @@
0,
};
- frame = get_sigframe(ka, regs, sizeof(*frame));
+ frame = get_sigframe(ka, regs, sizeof(*frame), &fpstate, &xstate);
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
goto give_sigsegv;
@@ -452,11 +460,24 @@
if (err)
goto give_sigsegv;
- err |= ia32_setup_sigcontext(&frame->sc, &frame->fpstate, regs,
- set->sig[0]);
+ err |= ia32_setup_sigcontext(&frame->sc, regs, set->sig[0],
+ fpstate, xstate);
if (err)
goto give_sigsegv;
+ if (cpu_has_xsave) {
+ err |= __put_user(ptr_to_compat(xstate),
+ &frame->xst_cnxt.xstate);
+ err |= __put_user(xstate_size, &frame->xst_cnxt.size);
+ err |= __put_user(pcntxt_lmask, &frame->xst_cnxt.lmask);
+ err |= __put_user(pcntxt_hmask, &frame->xst_cnxt.hmask);
+ } else {
+ err |= __put_user(0, &frame->xst_cnxt.xstate);
+ err |= __put_user(0, &frame->xst_cnxt.size);
+ err |= __put_user(0, &frame->xst_cnxt.lmask);
+ err |= __put_user(0, &frame->xst_cnxt.hmask);
+ }
+
if (_COMPAT_NSIG_WORDS > 1) {
err |= __copy_to_user(frame->extramask, &set->sig[1],
sizeof(frame->extramask));
@@ -518,6 +539,8 @@
struct exec_domain *ed = current_thread_info()->exec_domain;
void __user *restorer;
int err = 0;
+ struct _fpstate_ia32 __user *fpstate = 0;
+ struct xsave_struct __user *xstate = 0;
/* __copy_to_user optimizes that into a single 8 byte store */
static const struct {
@@ -533,7 +556,7 @@
0,
};
- frame = get_sigframe(ka, regs, sizeof(*frame));
+ frame = get_sigframe(ka, regs, sizeof(*frame), &fpstate, &xstate);
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
goto give_sigsegv;
@@ -553,8 +576,21 @@
err |= __put_user(sas_ss_flags(regs->sp),
&frame->uc.uc_stack.ss_flags);
err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
- err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
- regs, set->sig[0]);
+ err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0],
+ fpstate, xstate);
+
+ if (cpu_has_xsave) {
+ err |= __put_user(ptr_to_compat(xstate), &frame->uc.uc_xstate.xstate);
+ err |= __put_user(xstate_size, &frame->uc.uc_xstate.size);
+ err |= __put_user(pcntxt_lmask, &frame->uc.uc_xstate.lmask);
+ err |= __put_user(pcntxt_hmask, &frame->uc.uc_xstate.hmask);
+ } else {
+ err |= __put_user(0, &frame->uc.uc_xstate.xstate);
+ err |= __put_user(0, &frame->uc.uc_xstate.size);
+ err |= __put_user(0, &frame->uc.uc_xstate.lmask);
+ err |= __put_user(0, &frame->uc.uc_xstate.hmask);
+ }
+
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
if (err)
goto give_sigsegv;
Index: linux-2.6-x86/arch/x86/kernel/Makefile
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/Makefile 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/Makefile 2008-05-12 13:09:56.000000000 -0700
@@ -31,7 +31,7 @@
obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o
obj-y += process.o
-obj-y += i387.o
+obj-y += i387.o xsave.o
obj-y += ptrace.o
obj-y += ds.o
obj-$(CONFIG_X86_32) += tls.o
Index: linux-2.6-x86/arch/x86/kernel/i387.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/i387.c 2008-05-12 13:09:47.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/i387.c 2008-05-12 13:23:10.000000000 -0700
@@ -21,9 +21,10 @@
# include <asm/sigcontext32.h>
# include <asm/user32.h>
#else
-# define save_i387_ia32 save_i387
-# define restore_i387_ia32 restore_i387
+# define save_i387_xstate_ia32 save_i387_xstate
+# define restore_i387_xstate_ia32 restore_i387_xstate
# define _fpstate_ia32 _fpstate
+# define xstate_cntxt_ia32 xstate_cntxt
# define user_i387_ia32_struct user_i387_struct
# define user32_fxsr_struct user_fxsr_struct
#endif
@@ -38,6 +39,21 @@
unsigned int xstate_size;
static struct i387_fxsave_struct fx_scratch __cpuinitdata;
+void __cpuinit init_thread_xstate(void)
+{
+ if (cpu_has_xsave) {
+ xsave_cntxt_init();
+ return;
+ }
+
+ if (cpu_has_fxsr)
+ xstate_size = sizeof(struct i387_fxsave_struct);
+#ifdef CONFIG_X86_32
+ else
+ xstate_size = sizeof(struct i387_fsave_struct);
+#endif
+}
+
void __cpuinit mxcsr_feature_mask_init(void)
{
unsigned long mask = 0;
@@ -54,16 +70,6 @@
stts();
}
-void __init init_thread_xstate(void)
-{
- if (cpu_has_fxsr)
- xstate_size = sizeof(struct i387_fxsave_struct);
-#ifdef CONFIG_X86_32
- else
- xstate_size = sizeof(struct i387_fsave_struct);
-#endif
-}
-
#ifdef CONFIG_X86_64
/*
* Called at bootup to set up the initial FPU state that is later cloned
@@ -78,7 +84,12 @@
write_cr0(oldcr0 & ~(X86_CR0_TS|X86_CR0_EM)); /* clear TS and EM */
+ if (!smp_processor_id())
+ init_thread_xstate();
+ xsave_init();
+
mxcsr_feature_mask_init();
+
/* clean state in init */
current_thread_info()->status = 0;
clear_used_math();
@@ -181,6 +192,9 @@
*/
target->thread.xstate->fxsave.mxcsr &= mxcsr_feature_mask;
+ if (cpu_has_xsave)
+ target->thread.xstate->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
+
return ret;
}
@@ -381,6 +395,9 @@
if (!ret)
convert_to_fxsr(target, &env);
+ if (cpu_has_xsave)
+ target->thread.xstate->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;
+
return ret;
}
@@ -393,7 +410,6 @@
struct task_struct *tsk = current;
struct i387_fsave_struct *fp = &tsk->thread.xstate->fsave;
- unlazy_fpu(tsk);
fp->status = fp->swd;
if (__copy_to_user(buf, fp, sizeof(struct i387_fsave_struct)))
return -1;
@@ -407,8 +423,6 @@
struct user_i387_ia32_struct env;
int err = 0;
- unlazy_fpu(tsk);
-
convert_from_fxsr(&env, tsk);
if (__copy_to_user(buf, &env, sizeof(env)))
return -1;
@@ -418,14 +432,15 @@
if (err)
return -1;
- if (__copy_to_user(&buf->_fxsr_env[0], fx,
- sizeof(struct i387_fxsave_struct)))
+ if (__copy_to_user(&buf->_fxsr_env[0], fx, xstate_size))
return -1;
return 1;
}
-int save_i387_ia32(struct _fpstate_ia32 __user *buf)
+int save_i387_xstate_ia32(struct _fpstate_ia32 __user *buf,
+ struct xsave_struct __user *buf1)
{
+ struct task_struct *tsk = current;
if (!used_math())
return 0;
/*
@@ -440,7 +455,12 @@
NULL, buf) ? -1 : 1;
}
+ unlazy_fpu(tsk);
+
if (cpu_has_fxsr)
+ /*
+ * saves the extended state including legacy fxsave.
+ */
return save_i387_fxsave(buf);
else
return save_i387_fsave(buf);
@@ -450,18 +470,16 @@
{
struct task_struct *tsk = current;
- clear_fpu(tsk);
return __copy_from_user(&tsk->thread.xstate->fsave, buf,
sizeof(struct i387_fsave_struct));
}
-static int restore_i387_fxsave(struct _fpstate_ia32 __user *buf)
+int restore_i387_fxsave(struct _fpstate_ia32 __user *buf)
{
struct task_struct *tsk = current;
struct user_i387_ia32_struct env;
int err;
- clear_fpu(tsk);
err = __copy_from_user(&tsk->thread.xstate->fxsave, &buf->_fxsr_env[0],
sizeof(struct i387_fxsave_struct));
/* mxcsr reserved bits must be masked to zero for security reasons */
@@ -473,22 +491,56 @@
return 0;
}
-int restore_i387_ia32(struct _fpstate_ia32 __user *buf)
+int restore_i387_xstate_ia32(struct _fpstate_ia32 __user *buf,
+ struct xstate_cntxt_ia32 __user *buf1)
{
- int err;
+ int err = 0;
+ struct task_struct *tsk = current;
+
+ if (buf && !access_ok(VERIFY_READ, buf, sizeof(*buf))) {
+ err = -1;
+ goto init_state;
+ }
if (HAVE_HWFP) {
- if (cpu_has_fxsr)
- err = restore_i387_fxsave(buf);
- else
- err = restore_i387_fsave(buf);
+ clear_fpu(tsk);
+
+ if (!used_math()) {
+ err = init_fpu(tsk);
+ if (err)
+ return err;
+ }
+
+ if (cpu_has_xsave)
+ err = restore_user_xstate(buf, buf1);
+ else {
+ if (!buf)
+ goto init_state;
+
+ if (cpu_has_fxsr)
+ err = restore_i387_fxsave(buf);
+ else
+ err = restore_i387_fsave(buf);
+ }
} else {
err = fpregs_soft_set(current, NULL,
0, sizeof(struct user_i387_ia32_struct),
NULL, buf) != 0;
}
+
+ if (err)
+ goto init_state;
+
set_used_math();
+ return 0;
+
+init_state:
+ if (used_math()) {
+ clear_fpu(current);
+ clear_used_math();
+ }
+
return err;
}
Index: linux-2.6-x86/arch/x86/kernel/signal_32.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/signal_32.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/signal_32.c 2008-05-12 13:09:56.000000000 -0700
@@ -26,6 +26,7 @@
#include <asm/uaccess.h>
#include <asm/i387.h>
#include <asm/vdso.h>
+#include <asm/proto.h>
#include "sigframe.h"
@@ -116,7 +117,7 @@
*/
static int
restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
- unsigned long *pax)
+ unsigned long *pax, struct xstate_cntxt __user *xst_cntxt)
{
unsigned int err = 0;
@@ -162,25 +163,12 @@
struct _fpstate __user *buf;
err |= __get_user(buf, &sc->fpstate);
- if (buf) {
- if (!access_ok(VERIFY_READ, buf, sizeof(*buf)))
- goto badframe;
- err |= restore_i387(buf);
- } else {
- struct task_struct *me = current;
-
- if (used_math()) {
- clear_fpu(me);
- clear_used_math();
- }
- }
+
+ err |= restore_i387_xstate(buf, xst_cntxt);
}
err |= __get_user(*pax, &sc->ax);
return err;
-
-badframe:
- return 1;
}
asmlinkage unsigned long sys_sigreturn(unsigned long __unused)
@@ -189,9 +177,11 @@
struct pt_regs *regs;
unsigned long ax;
sigset_t set;
+ struct xstate_cntxt __user *xst_cnxt;
regs = (struct pt_regs *) &__unused;
frame = (struct sigframe __user *)(regs->sp - 8);
+ xst_cnxt = &frame->xst_cnxt;
if (!access_ok(VERIFY_READ, frame, sizeof(*frame)))
goto badframe;
@@ -206,7 +196,7 @@
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- if (restore_sigcontext(regs, &frame->sc, &ax))
+ if (restore_sigcontext(regs, &frame->sc, &ax, xst_cnxt))
goto badframe;
return ax;
@@ -245,7 +235,8 @@
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- if (restore_sigcontext(regs, &frame->uc.uc_mcontext, &ax))
+ if (restore_sigcontext(regs, &frame->uc.uc_mcontext, &ax,
+ &frame->uc.uc_xstate))
goto badframe;
if (do_sigaltstack(&frame->uc.uc_stack, NULL, regs->sp) == -EFAULT)
@@ -262,8 +253,10 @@
* Set up a signal frame.
*/
static int
-setup_sigcontext(struct sigcontext __user *sc, struct _fpstate __user *fpstate,
- struct pt_regs *regs, unsigned long mask)
+setup_sigcontext(struct sigcontext __user *sc,
+ struct pt_regs *regs, unsigned long mask,
+ struct _fpstate __user *fpstate,
+ struct xsave_struct __user *xstate)
{
int tmp, err = 0;
@@ -289,7 +282,7 @@
err |= __put_user(regs->sp, &sc->sp_at_signal);
err |= __put_user(regs->ss, (unsigned int __user *)&sc->ss);
- tmp = save_i387(fpstate);
+ tmp = save_i387_xstate(fpstate, xstate);
if (tmp < 0)
err = 1;
else
@@ -306,7 +299,8 @@
* Determine which stack to use..
*/
static inline void __user *
-get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size)
+get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
+ struct _fpstate **fpstate, struct xsave_struct **xstate)
{
unsigned long sp;
@@ -332,7 +326,18 @@
sp = (unsigned long) ka->sa.sa_restorer;
}
- sp -= frame_size;
+ if (used_math()) {
+ sp = round_down(sp - xstate_size, 64);
+ if (cpu_has_xsave)
+ *xstate = (struct xsave_struct *) sp;
+
+ sp = sp - (sizeof(struct _fpstate) - FXSAVE_SIZE);
+
+ *fpstate = (struct _fpstate *) sp;
+
+ sp -= frame_size;
+ } else
+ sp -= frame_size;
/*
* Align the stack pointer according to the i386 ABI,
* i.e. so that on function entry ((sp + 4) & 15) == 0.
@@ -350,10 +355,12 @@
void __user *restorer;
int err = 0;
int usig;
+ struct _fpstate __user *fpstate = 0;
+ struct xsave_struct __user *xstate = 0;
- frame = get_sigframe(ka, regs, sizeof(*frame));
+ frame = get_sigframe(ka, regs, sizeof(*frame), &fpstate, &xstate);
- if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
+ if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame)))
goto give_sigsegv;
usig = current_thread_info()->exec_domain
@@ -366,9 +373,21 @@
if (err)
goto give_sigsegv;
- err = setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0]);
+ err = setup_sigcontext(&frame->sc, regs, set->sig[0],
+ fpstate, xstate);
if (err)
goto give_sigsegv;
+ if (cpu_has_xsave) {
+ err |= __put_user(xstate, &frame->xst_cnxt.xstate);
+ err |= __put_user(xstate_size, &frame->xst_cnxt.size);
+ err |= __put_user(pcntxt_lmask, &frame->xst_cnxt.lmask);
+ err |= __put_user(pcntxt_hmask, &frame->xst_cnxt.hmask);
+ } else {
+ err |= __put_user(0, &frame->xst_cnxt.xstate);
+ err |= __put_user(0, &frame->xst_cnxt.size);
+ err |= __put_user(0, &frame->xst_cnxt.lmask);
+ err |= __put_user(0, &frame->xst_cnxt.hmask);
+ }
if (_NSIG_WORDS > 1) {
err = __copy_to_user(&frame->extramask, &set->sig[1],
@@ -427,8 +446,10 @@
void __user *restorer;
int err = 0;
int usig;
+ struct _fpstate __user *fpstate = 0;
+ struct xsave_struct __user *xstate = 0;
- frame = get_sigframe(ka, regs, sizeof(*frame));
+ frame = get_sigframe(ka, regs, sizeof(*frame), &fpstate, &xstate);
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
goto give_sigsegv;
@@ -453,8 +474,20 @@
err |= __put_user(sas_ss_flags(regs->sp),
&frame->uc.uc_stack.ss_flags);
err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
- err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
- regs, set->sig[0]);
+ err |= setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0],
+ fpstate, xstate);
+ if (cpu_has_xsave) {
+ err |= __put_user(xstate, &frame->uc.uc_xstate.xstate);
+ err |= __put_user(xstate_size, &frame->uc.uc_xstate.size);
+ err |= __put_user(pcntxt_lmask, &frame->uc.uc_xstate.lmask);
+ err |= __put_user(pcntxt_hmask, &frame->uc.uc_xstate.hmask);
+ } else {
+ err |= __put_user(0, &frame->uc.uc_xstate.xstate);
+ err |= __put_user(0, &frame->uc.uc_xstate.size);
+ err |= __put_user(0, &frame->uc.uc_xstate.lmask);
+ err |= __put_user(0, &frame->uc.uc_xstate.hmask);
+ }
+
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
if (err)
goto give_sigsegv;
Index: linux-2.6-x86/include/asm-x86/cpufeature.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/cpufeature.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/cpufeature.h 2008-05-12 13:09:56.000000000 -0700
@@ -90,6 +90,7 @@
#define X86_FEATURE_CX16 (4*32+13) /* CMPXCHG16B */
#define X86_FEATURE_XTPR (4*32+14) /* Send Task Priority Messages */
#define X86_FEATURE_DCA (4*32+18) /* Direct Cache Access */
+#define X86_FEATURE_XSAVE (4*32+26) /* XSAVE */
/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
#define X86_FEATURE_XSTORE (5*32+ 2) /* on-CPU RNG present (xstore insn) */
@@ -187,6 +188,7 @@
#define cpu_has_gbpages boot_cpu_has(X86_FEATURE_GBPAGES)
#define cpu_has_arch_perfmon boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
#define cpu_has_pat boot_cpu_has(X86_FEATURE_PAT)
+#define cpu_has_xsave boot_cpu_has(X86_FEATURE_XSAVE)
#if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
# define cpu_has_invlpg 1
Index: linux-2.6-x86/include/asm-x86/i387.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/i387.h 2008-05-12 13:09:47.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/i387.h 2008-05-12 14:46:43.000000000 -0700
@@ -18,6 +18,8 @@
#include <asm/sigcontext.h>
#include <asm/user.h>
#include <asm/uaccess.h>
+#include <asm/xsave.h>
+#include <asm/percpu.h>
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
@@ -31,10 +33,19 @@
#ifdef CONFIG_IA32_EMULATION
struct _fpstate_ia32;
-extern int save_i387_ia32(struct _fpstate_ia32 __user *buf);
-extern int restore_i387_ia32(struct _fpstate_ia32 __user *buf);
+struct xstate_cntxt_ia32;
+extern int save_i387_xstate_ia32(struct _fpstate_ia32 __user *buf,
+ struct xsave_struct __user *buf1);
+extern int restore_i387_xstate_ia32(struct _fpstate_ia32 __user *buf,
+ struct xstate_cntxt_ia32 __user *buf1);
+extern int restore_user_xstate(struct _fpstate_ia32 __user *buf,
+ struct xstate_cntxt_ia32 __user *buf1);
+
+extern int restore_i387_fxsave(struct _fpstate_ia32 __user *buf);
#endif
+#define X87_FSW_ES (1 << 7) /* Exception Summary */
+
#ifdef CONFIG_X86_64
/* Ignore delayed exceptions from user space */
@@ -45,9 +56,13 @@
_ASM_EXTABLE(1b, 2b));
}
-static inline int restore_fpu_checking(struct i387_fxsave_struct *fx)
+static inline int restore_fpu_checking(struct _fpstate __user *buf)
{
int err;
+ struct i387_fxsave_struct *fx = ((__force struct i387_fxsave_struct *) buf);
+
+ if (!access_ok(VERIFY_READ, buf, sizeof(*buf)))
+ return -1;
asm volatile("1: rex64/fxrstor (%[fx])\n\t"
"2:\n"
@@ -62,20 +77,42 @@
#else
: [fx] "cdaSDb" (fx), "m" (*fx), "0" (0));
#endif
- if (unlikely(err))
- init_fpu(current);
return err;
}
-#define X87_FSW_ES (1 << 7) /* Exception Summary */
+static inline void __fxrstor(struct i387_fxsave_struct *fx)
+{
+ asm volatile("1: rex64/fxrstor (%[fx])\n\t"
+#if 0 /* See comment in __fxsave_clear() below. */
+ :: [fx] "r" (fx), "m" (*fx));
+#else
+ :: [fx] "cdaSDb" (fx), "m" (*fx));
+#endif
+}
+
+static inline void restore_fpu_xstate(struct task_struct *tsk)
+{
+ if (cpu_has_xsave)
+ xrstor(&tsk->thread.xstate->xsave);
+ else
+ __fxrstor(&tsk->thread.xstate->fxsave);
+}
/* AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
is pending. Clear the x87 state here by setting it to fixed
values. The kernel data segment can be sometimes 0 and sometimes
new user value. Both should be ok.
Use the PDA as safe address because it should be already in L1. */
-static inline void clear_fpu_state(struct i387_fxsave_struct *fx)
+static inline void clear_fpu_state(struct xsave_struct *xstate)
{
+ struct i387_fxsave_struct *fx = (struct i387_fxsave_struct *) xstate;
+
+ /*
+ * Header may indicate the init state of the FP.
+ */
+ if (cpu_has_xsave && !(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
+ return;
+
if (unlikely(fx->swd & X87_FSW_ES))
asm volatile("fnclex");
alternative_input(ASM_NOP8 ASM_NOP2,
@@ -108,7 +145,7 @@
return err;
}
-static inline void __save_init_fpu(struct task_struct *tsk)
+static inline void __fxsave(struct task_struct *tsk)
{
/* Using "rex64; fxsave %0" is broken because, if the memory operand
uses any extended registers for addressing, a second REX prefix
@@ -133,55 +170,23 @@
: "=m" (tsk->thread.xstate->fxsave)
: "cdaSDb" (&tsk->thread.xstate->fxsave));
#endif
- clear_fpu_state(&tsk->thread.xstate->fxsave);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
}
-/*
- * Signal frame handlers.
- */
-
-static inline int save_i387(struct _fpstate __user *buf)
+static inline void __save_init_fpu(struct task_struct *tsk)
{
- struct task_struct *tsk = current;
- int err = 0;
-
- BUILD_BUG_ON(sizeof(struct user_i387_struct) !=
- sizeof(tsk->thread.xstate->fxsave));
-
- if ((unsigned long)buf % 16)
- printk("save_i387: bad fpstate %p\n", buf);
+ if (cpu_has_xsave)
+ xsave(tsk);
+ else
+ __fxsave(tsk);
- if (!used_math())
- return 0;
- clear_used_math(); /* trigger finit */
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- err = save_i387_checking((struct i387_fxsave_struct __user *)
- buf);
- if (err)
- return err;
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
- } else {
- if (__copy_to_user(buf, &tsk->thread.xstate->fxsave,
- sizeof(struct i387_fxsave_struct)))
- return -1;
- }
- return 1;
+ clear_fpu_state(&tsk->thread.xstate->xsave);
+ task_thread_info(tsk)->status &= ~TS_USEDFPU;
}
-
/*
- * This restores directly out of user space. Exceptions are handled.
+ * Signal frame handlers...
*/
-static inline int restore_i387(struct _fpstate __user *buf)
-{
- set_used_math();
- if (!(task_thread_info(current)->status & TS_USEDFPU)) {
- clts();
- task_thread_info(current)->status |= TS_USEDFPU;
- }
- return restore_fpu_checking((__force struct i387_fxsave_struct *)buf);
-}
+extern int save_i387_xstate(void __user *buf);
+extern int restore_i387_xstate(struct _fpstate *buf, struct xstate_cntxt *buf1);
#else /* CONFIG_X86_32 */
@@ -190,8 +195,12 @@
asm volatile("fnclex ; fwait");
}
-static inline void restore_fpu(struct task_struct *tsk)
+static inline void restore_fpu_xstate(struct task_struct *tsk)
{
+ if (cpu_has_xsave) {
+ xrstor(&tsk->thread.xstate->xsave);
+ return;
+ }
/*
* The "nop" is needed to make the instructions the same
* length.
@@ -217,6 +226,21 @@
*/
static inline void __save_init_fpu(struct task_struct *tsk)
{
+ if (cpu_has_xsave) {
+ struct xsave_struct *xstate = &tsk->thread.xstate->xsave;
+ struct i387_fxsave_struct *fx = &tsk->thread.xstate->fxsave;
+
+ xsave(tsk);
+
+ /*
+ * Header may indicate the init state of the FP.
+ */
+ if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
+ goto end;
+
+ if (unlikely(fx->swd & X87_FSW_ES))
+ asm volatile("fnclex");
+ } else {
/* Use more nops than strictly needed in case the compiler
varies code */
alternative_input(
@@ -226,6 +250,7 @@
X86_FEATURE_FXSR,
[fx] "m" (tsk->thread.xstate->fxsave),
[fsw] "m" (tsk->thread.xstate->fxsave.swd) : "memory");
+ }
/* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
is pending. Clear the x87 state here by setting it to fixed
values. safe_address is a random variable that should be in L1 */
@@ -235,15 +260,18 @@
"fildl %[addr]", /* set F?P to defined value */
X86_FEATURE_FXSAVE_LEAK,
[addr] "m" (safe_address));
+end:
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}
-/*
- * Signal frame handlers...
- */
-extern int save_i387(struct _fpstate __user *buf);
-extern int restore_i387(struct _fpstate __user *buf);
+extern int save_i387_xstate(struct _fpstate __user *buf,
+ struct xsave_struct __user *buf1);
+extern int restore_i387_xstate(struct _fpstate __user *buf,
+ struct xstate_cntxt __user *buf1);
+extern int restore_user_xstate(struct _fpstate __user *buf,
+ struct xstate_cntxt __user *buf1);
+extern int restore_i387_fxsave(struct _fpstate __user *buf);
#endif /* CONFIG_X86_64 */
static inline void __unlazy_fpu(struct task_struct *tsk)
Index: linux-2.6-x86/include/asm-x86/processor.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/processor.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/processor.h 2008-05-12 14:53:38.000000000 -0700
@@ -351,10 +351,23 @@
u32 entry_eip;
};
+struct xsave_hdr_struct {
+ u64 xstate_bv;
+ u64 reserved1[2];
+ u64 reserved2[5];
+} __attribute__((packed));
+
+struct xsave_struct {
+ struct i387_fxsave_struct i387;
+ struct xsave_hdr_struct xsave_hdr;
+ /* new processor state extensions will go here */
+} __attribute__ ((packed, aligned (64)));
+
union thread_xstate {
struct i387_fsave_struct fsave;
struct i387_fxsave_struct fxsave;
struct i387_soft_struct soft;
+ struct xsave_struct xsave;
};
#ifdef CONFIG_X86_64
Index: linux-2.6-x86/arch/x86/kernel/cpu/feature_names.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/cpu/feature_names.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/cpu/feature_names.c 2008-05-12 13:09:56.000000000 -0700
@@ -46,7 +46,7 @@
"pni", NULL, NULL, "monitor", "ds_cpl", "vmx", "smx", "est",
"tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL,
NULL, NULL, "dca", "sse4_1", "sse4_2", NULL, NULL, "popcnt",
- NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+ NULL, NULL, "xsave", NULL, NULL, NULL, NULL, NULL,
/* VIA/Cyrix/Centaur-defined */
NULL, NULL, "rng", "rng_en", NULL, NULL, "ace", "ace_en",
Index: linux-2.6-x86/include/asm-x86/ucontext.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/ucontext.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/ucontext.h 2008-05-12 15:15:12.000000000 -0700
@@ -6,7 +6,10 @@
struct ucontext *uc_link;
stack_t uc_stack;
struct sigcontext uc_mcontext;
- sigset_t uc_sigmask; /* mask last for extensibility */
+ sigset_t uc_sigmask;
+ /* Allow for uc_sigmask growth. Glibc uses a 1024-bit sigset_t. */
+ int __unused[32 - (sizeof (sigset_t) / sizeof (int))];
+ struct xstate_cntxt uc_xstate;
};
#endif /* _ASM_X86_UCONTEXT_H */
Index: linux-2.6-x86/include/asm-x86/ia32.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/ia32.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/ia32.h 2008-05-12 13:09:56.000000000 -0700
@@ -41,6 +41,7 @@
stack_ia32_t uc_stack;
struct sigcontext_ia32 uc_mcontext;
compat_sigset_t uc_sigmask; /* mask last for extensibility */
+ struct xstate_cntxt_ia32 uc_xstate;
};
/* This matches struct stat64 in glibc2.2, hence the absolutely
Index: linux-2.6-x86/include/asm-x86/sigcontext32.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/sigcontext32.h 2008-05-12 13:09:03.000000000 -0700
+++ linux-2.6-x86/include/asm-x86/sigcontext32.h 2008-05-12 13:09:56.000000000 -0700
@@ -43,6 +43,13 @@
__u32 padding[56];
};
+struct xstate_cntxt_ia32 {
+ u32 xstate; /* really (struct _xstate *) */
+ u32 size;
+ u32 lmask;
+ u32 hmask;
+};
+
struct sigcontext_ia32 {
unsigned short gs, __gsh;
unsigned short fs, __fsh;
Index: linux-2.6-x86/arch/x86/kernel/sigframe.h
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/sigframe.h 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/sigframe.h 2008-05-12 13:09:56.000000000 -0700
@@ -3,9 +3,10 @@
char __user *pretcode;
int sig;
struct sigcontext sc;
- struct _fpstate fpstate;
+ struct xstate_cntxt xst_cnxt;
unsigned long extramask[_NSIG_WORDS-1];
char retcode[8];
+ /* fp and rest of the extended context state follows here */
};
struct rt_sigframe {
@@ -15,8 +16,8 @@
void __user *puc;
struct siginfo info;
struct ucontext uc;
- struct _fpstate fpstate;
char retcode[8];
+ /* fp and rest of the extended context state follows here */
};
#else
struct rt_sigframe {
Index: linux-2.6-x86/arch/x86/kernel/traps_32.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/traps_32.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/traps_32.c 2008-05-12 13:09:56.000000000 -0700
@@ -1178,7 +1178,7 @@
}
clts(); /* Allow maths ops (or we recurse) */
- restore_fpu(tsk);
+ restore_fpu_xstate(tsk);
thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
}
@@ -1255,7 +1255,6 @@
set_bit(SYSCALL_VECTOR, used_vectors);
- init_thread_xstate();
/*
* Should be a barrier for any external CPU state:
*/
Index: linux-2.6-x86/arch/x86/kernel/cpu/common.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/cpu/common.c 2008-05-12 13:09:02.000000000 -0700
+++ linux-2.6-x86/arch/x86/kernel/cpu/common.c 2008-05-12 13:27:21.000000000 -0700
@@ -742,6 +742,10 @@
current_thread_info()->status = 0;
clear_used_math();
mxcsr_feature_mask_init();
+
+ if (!smp_processor_id())
+ init_thread_xstate();
+ xsave_init();
}
#ifdef CONFIG_HOTPLUG_CPU
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-13 1:10 [RFC] x86: xsave/xrstor support, ucontext_t extensions Suresh Siddha
@ 2008-05-16 13:26 ` Mikael Pettersson
2008-05-18 1:34 ` Suresh Siddha
0 siblings, 1 reply; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-16 13:26 UTC (permalink / raw)
To: Suresh Siddha
Cc: mingo, hpa, tglx, torvalds, akpm, andi, roland, drepper,
Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan, asit.k.mallick
Suresh Siddha writes:
> hi,
>
> Appended patch adds the support for xsave/xrstor infrastructure for x86.
> xsave/xrstor manages the existing and future processor extended states in x86
> architecutre.
>
> More info on xsave/xrstor can be found in the Intel SDM's located at
> http://www.intel.com/products/processor/manuals/index.htm
>
> Please let me know your feedback and comments. Specifically, I am not sure
> if I break anything or make anyone's life harder with the ucontext_t extensions
> that are proposed in the patch.
This is a large patch, and somewhat difficult to review since it mixes
kernel-private and user-visible changes. I'm going to focus on the
user-visible changes.
> BTW, Traditionally glibc has this definition for struct ucontext.
glibc's definition is irrelevant, in part because glibc can and does lie
about kernel types to applications, and in part because glibc is not the
only user-space consumer of kernel types: there are other libcs, and there
are user-space virtualisation tools (my interest in this matter) that care
deeply about kernel types and sigframe layouts. In particular, user-space
needs to be able to copy and assemble sigframes.
So however the xsave support ends up looking, user-space must have a
sensible way of detecting and handling the layout changes.
> --- linux-2.6-x86.orig/include/asm-x86/ucontext.h 2008-05-12 13:09:03.000000000 -0700
> +++ linux-2.6-x86/include/asm-x86/ucontext.h 2008-05-12 15:15:12.000000000 -0700
> @@ -6,7 +6,10 @@
> struct ucontext *uc_link;
> stack_t uc_stack;
> struct sigcontext uc_mcontext;
> - sigset_t uc_sigmask; /* mask last for extensibility */
> + sigset_t uc_sigmask;
> + /* Allow for uc_sigmask growth. Glibc uses a 1024-bit sigset_t. */
> + int __unused[32 - (sizeof (sigset_t) / sizeof (int))];
> + struct xstate_cntxt uc_xstate;
> };
>
> #endif /* _ASM_X86_UCONTEXT_H */
You're changing the layout of struct ucontext in two ways: uc_mcontext
changes elsewhere, and you're adding __unused and uc_xstate.
How is user-space supposed to know whether it's looking at a current
layout ucontext or an xsave-layout ucontext?
It seems that uc_flags is unused and always zero. Could you define a
flag bit (e.g. 1) for uc_flags to indicate the xsave layout?
> --- linux-2.6-x86.orig/arch/x86/kernel/sigframe.h 2008-05-12 13:09:02.000000000 -0700
> +++ linux-2.6-x86/arch/x86/kernel/sigframe.h 2008-05-12 13:09:56.000000000 -0700
> @@ -3,9 +3,10 @@
> char __user *pretcode;
> int sig;
> struct sigcontext sc;
> - struct _fpstate fpstate;
> + struct xstate_cntxt xst_cnxt;
> unsigned long extramask[_NSIG_WORDS-1];
> char retcode[8];
> + /* fp and rest of the extended context state follows here */
> };
Offset to extramask[] and retcode changes, as well as the size of the structure.
Why contract xstate_cntxt to xst_cnxt? That just obscures things.
> --- linux-2.6-x86.orig/include/asm-x86/sigcontext.h 2008-05-12 13:09:03.000000000 -0700
> +++ linux-2.6-x86/include/asm-x86/sigcontext.h 2008-05-12 14:54:21.000000000 -0700
> @@ -202,4 +202,26 @@
>
> #endif /* !__i386__ */
>
> +struct _xsave_hdr_struct {
> + u64 xstate_bv;
> + u64 reserved1[2];
> + u64 reserved2[5];
> +} __attribute__((packed));
> +
> +struct _xstate {
> + /*
> + * Applications need to refer to fpstate through fpstate pointer
> + * in sigcontext. Not here directly.
> + */
> + struct _fpstate fpstate;
> + struct _xsave_hdr_struct xsave_hdr;
> + /* new processor state extensions will go here */
> +} __attribute__ ((aligned (64)));
> +
> +struct xstate_cntxt {
> + struct _xstate __user *xstate;
> + u32 size;
> + u32 lmask;
> + u32 hmask;
> +};
> #endif
What is the purpose of the xstate pointer in xstate_cntxt?
As far as I can tell, it's redundant and can alway be derived
from the ucontext's uc_mcontext.fpstate pointer.
On x86-64 we have:
> --- linux-2.6-x86.orig/arch/x86/kernel/signal_64.c 2008-05-12 13:09:02.000000000 -0700
> +++ linux-2.6-x86/arch/x86/kernel/signal_64.c 2008-05-12 13:09:56.000000000 -0700
...
> err |= __put_user(fp, &frame->uc.uc_mcontext.fpstate);
> +
> + if (cpu_has_xsave) {
> + err |= __put_user(fp, &frame->uc.uc_xstate.xstate);
> + err |= __put_user(xstate_size, &frame->uc.uc_xstate.size);
> + err |= __put_user(pcntxt_lmask, &frame->uc.uc_xstate.lmask);
> + err |= __put_user(pcntxt_hmask, &frame->uc.uc_xstate.hmask);
> + } else {
> + err |= __put_user(0, &frame->uc.uc_xstate.xstate);
and on x86-32 we have:
> --- linux-2.6-x86.orig/arch/x86/kernel/signal_32.c 2008-05-12 13:09:02.000000000 -0700
> +++ linux-2.6-x86/arch/x86/kernel/signal_32.c 2008-05-12 13:09:56.000000000 -0700
...
> -get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size)
> +get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
> + struct _fpstate **fpstate, struct xsave_struct **xstate)
...
> - sp -= frame_size;
> + if (used_math()) {
> + sp = round_down(sp - xstate_size, 64);
> + if (cpu_has_xsave)
> + *xstate = (struct xsave_struct *) sp;
> +
> + sp = sp - (sizeof(struct _fpstate) - FXSAVE_SIZE);
> +
> + *fpstate = (struct _fpstate *) sp;
> +
> + sp -= frame_size;
> + } else
> + sp -= frame_size;
...(in setup_frame())
> - err = setup_sigcontext(&frame->sc, &frame->fpstate, regs, set->sig[0]);
> + err = setup_sigcontext(&frame->sc, regs, set->sig[0],
> + fpstate, xstate);
> if (err)
> goto give_sigsegv;
> + if (cpu_has_xsave) {
> + err |= __put_user(xstate, &frame->xst_cnxt.xstate);
> + err |= __put_user(xstate_size, &frame->xst_cnxt.size);
> + err |= __put_user(pcntxt_lmask, &frame->xst_cnxt.lmask);
> + err |= __put_user(pcntxt_hmask, &frame->xst_cnxt.hmask);
> + } else {
> + err |= __put_user(0, &frame->xst_cnxt.xstate);
...(in setup_rt_frame())
> - err |= setup_sigcontext(&frame->uc.uc_mcontext, &frame->fpstate,
> - regs, set->sig[0]);
> + err |= setup_sigcontext(&frame->uc.uc_mcontext, regs, set->sig[0],
> + fpstate, xstate);
> + if (cpu_has_xsave) {
> + err |= __put_user(xstate, &frame->uc.uc_xstate.xstate);
> + err |= __put_user(xstate_size, &frame->uc.uc_xstate.size);
> + err |= __put_user(pcntxt_lmask, &frame->uc.uc_xstate.lmask);
> + err |= __put_user(pcntxt_hmask, &frame->uc.uc_xstate.hmask);
> + } else {
> + err |= __put_user(0, &frame->uc.uc_xstate.xstate);
In all cases .xstate seems to be a function of fpstate and cpu_has_xsave.
And why does x86-64 make xstate == fpstate while on x86-32 they're at an
offset from each other?
The fpstate pointer may be pointing into a struct _xstate.
How is user-space supposed to know if this is the case?
struct _fpstate has a 'magic' field which distinguishes x87-only
from x87+FXSR structs. Could that field also be used to indicate XSAVE?
How is user-space supposed to know how large the _xstate struct is?
Is that the size field in the struct xstate_cntxt?
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6-x86/arch/x86/kernel/xsave.c 2008-05-12 13:09:56.000000000 -0700
...
> + /*
> + * FP and SSE state can't be restored directly from the userspace
> + * because of legacy reasons. Lets restore it to the fpstate
> + * in the task struct.
> + */
Can you please explain what those 'legacy reasons' are?
> --- linux-2.6-x86.orig/arch/x86/kernel/signal_32.c 2008-05-12 13:09:02.000000000 -0700
> +++ linux-2.6-x86/arch/x86/kernel/signal_32.c 2008-05-12 13:09:56.000000000 -0700
...(in setup_frame)
>
> - if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
> + if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame)))
> goto give_sigsegv;
Gratuitous whitespace change.
I know xsave will be needed once AVX is released to the masses.
Any idea on when that will happen?
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-16 13:26 ` Mikael Pettersson
@ 2008-05-18 1:34 ` Suresh Siddha
2008-05-19 14:52 ` Mikael Pettersson
0 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-05-18 1:34 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Suresh Siddha, mingo, hpa, tglx, torvalds, akpm, andi, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
On Fri, May 16, 2008 at 03:26:15PM +0200, Mikael Pettersson wrote:
> This is a large patch, and somewhat difficult to review since it mixes
> kernel-private and user-visible changes.
Sorry. I wanted to give the complete picture. Will see if I can split it up
while posting for upstream.
> I'm going to focus on the user-visible changes.
>
> > BTW, Traditionally glibc has this definition for struct ucontext.
>
> glibc's definition is irrelevant, in part because glibc can and does lie
> about kernel types to applications, and in part because glibc is not the
> only user-space consumer of kernel types: there are other libcs, and there
> are user-space virtualisation tools (my interest in this matter) that care
> deeply about kernel types and sigframe layouts. In particular, user-space
> needs to be able to copy and assemble sigframes.
Ok. I hope user-space is doing this copy and assemble in a (sane) way that
allows the kernel to modify the sigframe with out breaking old apps.
Do you know how the user-space is determining how much to copy today?
I have to probably use some magic values(or other flags), indicating the
presence of extended state context information in the ucontext.
> So however the xsave support ends up looking, user-space must have a
> sensible way of detecting and handling the layout changes.
cpuid information (cpuid.0x1.ecx.osxsave) indicates whether the
OS supports xsave or not. This can be one way, that signal handlers
can use to interpret the ucontext.
Given that it is more than signal handlers, I can add a flag also,
representing ucontext extensions.
> > --- linux-2.6-x86.orig/include/asm-x86/ucontext.h 2008-05-12 13:09:03.000000000 -0700
> > +++ linux-2.6-x86/include/asm-x86/ucontext.h 2008-05-12 15:15:12.000000000 -0700
> > @@ -6,7 +6,10 @@
> > struct ucontext *uc_link;
> > stack_t uc_stack;
> > struct sigcontext uc_mcontext;
> > - sigset_t uc_sigmask; /* mask last for extensibility */
> > + sigset_t uc_sigmask;
> > + /* Allow for uc_sigmask growth. Glibc uses a 1024-bit sigset_t. */
> > + int __unused[32 - (sizeof (sigset_t) / sizeof (int))];
> > + struct xstate_cntxt uc_xstate;
> > };
> >
> > #endif /* _ASM_X86_UCONTEXT_H */
>
> You're changing the layout of struct ucontext in two ways: uc_mcontext
> changes elsewhere, and you're adding __unused and uc_xstate.
I am not changing the uc_mcontext (struct sigcontext). Just extending
the ucontext.
>
> How is user-space supposed to know whether it's looking at a current
> layout ucontext or an xsave-layout ucontext?
>
> It seems that uc_flags is unused and always zero. Could you define a
> flag bit (e.g. 1) for uc_flags to indicate the xsave layout?
Sure. I can even add a magic word to indicate the xsave presence,
so that sigreturn can be double sure and not break older apps.
> > --- linux-2.6-x86.orig/arch/x86/kernel/sigframe.h 2008-05-12 13:09:02.000000000 -0700
> > +++ linux-2.6-x86/arch/x86/kernel/sigframe.h 2008-05-12 13:09:56.000000000 -0700
> > @@ -3,9 +3,10 @@
> > char __user *pretcode;
> > int sig;
> > struct sigcontext sc;
> > - struct _fpstate fpstate;
> > + struct xstate_cntxt xst_cnxt;
> > unsigned long extramask[_NSIG_WORDS-1];
> > char retcode[8];
> > + /* fp and rest of the extended context state follows here */
> > };
>
> Offset to extramask[] and retcode changes, as well as the size of the structure.
hm yes. this will break the restore of extramask[], for apps which set their
own sig return frames.
I can leave _fpstate as it is(and unused) and move xstate context (which
will contain fpstate + the extended state) after extramask[].
> Why contract xstate_cntxt to xst_cnxt? That just obscures things.
ok.
>
> > --- linux-2.6-x86.orig/include/asm-x86/sigcontext.h 2008-05-12 13:09:03.000000000 -0700
> > +++ linux-2.6-x86/include/asm-x86/sigcontext.h 2008-05-12 14:54:21.000000000 -0700
> > @@ -202,4 +202,26 @@
> >
> > #endif /* !__i386__ */
> >
> > +struct _xsave_hdr_struct {
> > + u64 xstate_bv;
> > + u64 reserved1[2];
> > + u64 reserved2[5];
> > +} __attribute__((packed));
> > +
> > +struct _xstate {
> > + /*
> > + * Applications need to refer to fpstate through fpstate pointer
> > + * in sigcontext. Not here directly.
> > + */
> > + struct _fpstate fpstate;
> > + struct _xsave_hdr_struct xsave_hdr;
> > + /* new processor state extensions will go here */
> > +} __attribute__ ((aligned (64)));
> > +
> > +struct xstate_cntxt {
> > + struct _xstate __user *xstate;
> > + u32 size;
> > + u32 lmask;
> > + u32 hmask;
> > +};
> > #endif
>
> What is the purpose of the xstate pointer in xstate_cntxt?
> As far as I can tell, it's redundant and can alway be derived
> from the ucontext's uc_mcontext.fpstate pointer.
This is true in 64bit.
While doing sigreturn, there is no way, kernel can know if the
uc_mcontext.fpstate is pointing to just the legacy fpstate(512 bytes)
or the extended state pointer (unless we incorporate
some magic word at the end of fpstate image, see below)
during sigreturn, fpstate will be restored from the uc_mcontext.fpstate
and extended state will be restored from xstate pointer.
> And why does x86-64 make xstate == fpstate while on x86-32 they're at an
> offset from each other?
because sigcontext's 32bit fpstate is different from 64bit fpstate.
32bit fp state is fsave frame followed by fxsave frame. Whereas 64bit
fp state is just fxsave frame. To maintain 32bit legacy compatibility,
32bit xstate is different from 32bit fpstate.
> The fpstate pointer may be pointing into a struct _xstate.
> How is user-space supposed to know if this is the case?
user-space doesn't have to know. user has to refer fpstate through
fpstate pointer, and all the extended state through xstate pointer.
during the sigreturn, kernel will restore fp state through
fpstate pointer and the rest (exclusing FP/SSE) will be restored
from xstate pointer.
> struct _fpstate has a 'magic' field which distinguishes x87-only
> from x87+FXSR structs. Could that field also be used to indicate XSAVE?
I don't think we can use the existing 'magic' field. But we can
use some what similar magic, if the fxsave/fxrstor give away
some of the fields at the end of fxsave image (today it is reserved
and ignored during fxsave/fxrstor) for software use.
We can then use these fields at the end of fpstate, to indicate the presence of
xstate. But this requires some architecture changes like giving
away this space for SW use. We can take this to architects and
see what they think.
> How is user-space supposed to know how large the _xstate struct is?
> Is that the size field in the struct xstate_cntxt?
yes. I will make the names more descriptive :)
>
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6-x86/arch/x86/kernel/xsave.c 2008-05-12 13:09:56.000000000 -0700
> ...
> > + /*
> > + * FP and SSE state can't be restored directly from the userspace
> > + * because of legacy reasons. Lets restore it to the fpstate
> > + * in the task struct.
> > + */
>
> Can you please explain what those 'legacy reasons' are?
As I mentioned earlier, fpstate contains both fsave frame followed by fxsave
frame. So the kernel can't directly restore the information.
> I know xsave will be needed once AVX is released to the masses.
> Any idea on when that will happen?
AVX is currently planned for Sandybridge time frame.
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-18 1:34 ` Suresh Siddha
@ 2008-05-19 14:52 ` Mikael Pettersson
2008-05-19 15:04 ` Andi Kleen
` (2 more replies)
0 siblings, 3 replies; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-19 14:52 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, mingo, hpa, tglx, torvalds, akpm, andi, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
On Sat, 17 May 2008 18:34:16 -0700, Suresh Siddha wrote:
>On Fri, May 16, 2008 at 03:26:15PM +0200, Mikael Pettersson wrote:
>> This is a large patch, and somewhat difficult to review since it mixes
>> kernel-private and user-visible changes.
>
>Sorry. I wanted to give the complete picture. Will see if I can split it up
>while posting for upstream.
Ok.
>> > BTW, Traditionally glibc has this definition for struct ucontext.
>>
>> glibc's definition is irrelevant, in part because glibc can and does lie
>> about kernel types to applications, and in part because glibc is not the
>> only user-space consumer of kernel types: there are other libcs, and there
>> are user-space virtualisation tools (my interest in this matter) that care
>> deeply about kernel types and sigframe layouts. In particular, user-space
>> needs to be able to copy and assemble sigframes.
>
>Ok. I hope user-space is doing this copy and assemble in a (sane) way that
>allows the kernel to modify the sigframe with out breaking old apps.
>
>Do you know how the user-space is determining how much to copy today?
The de-facto ABI for signal delivery and sigreturn is unfortunately
based on fairly fixed-layout structs on the stack (sigframes). The
only flexibility there that I've found is the sigcontext's fpstate
pointer which allows the fpstate to be located elsewhere.
User-space pretty much has to know the kernel's sigframe layout, for
instance, non-siginfo sigframes on x86-32 hide additional sigmask words
above the fpstate.
User-space could in theory compare SP with the sigcontext's SP and
the altstack (if any) and deduce the sigframe size from that, but
that gives incomplete information; for instance, user-space must
still be able to locate and adjust embedded pointers in the sigframe.
In my case we "know" the sigframe struct layout based on OS and CPU
combination. I can't comment on how others deal with this.
It's pretty much guaranteed that any extension of these structs will
break some applications. I think the best we can hope for is that
1) applications that only inspect sigframes without copying them
don't break, and
2) applications get a mechanism for detecting the new layout of
sigframes and ucontexts, allowing them to be updated to handle
those changes.
>I have to probably use some magic values(or other flags), indicating the
>presence of extended state context information in the ucontext.
I believe so too.
>cpuid information (cpuid.0x1.ecx.osxsave) indicates whether the
>OS supports xsave or not. This can be one way, that signal handlers
>can use to interpret the ucontext.
>
>Given that it is more than signal handlers, I can add a flag also,
>representing ucontext extensions.
My problem with the OSXAVE flag is that it's a very indirect way of
communicating the layout of sigframes and sigcontexts. These structures
should, if at all possible, be self-describing. A single flag bit in
the sigcontext could handle both structures (since a sigframe always
includes a sigcontext).
[But see my comment below about uc_flags.]
>> You're changing the layout of struct ucontext in two ways: uc_mcontext
>> changes elsewhere, and you're adding __unused and uc_xstate.
>
>I am not changing the uc_mcontext (struct sigcontext). Just extending
>the ucontext.
Correct, my mistake.
>> How is user-space supposed to know whether it's looking at a current
>> layout ucontext or an xsave-layout ucontext?
>>
>> It seems that uc_flags is unused and always zero. Could you define a
>> flag bit (e.g. 1) for uc_flags to indicate the xsave layout?
>
>Sure. I can even add a magic word to indicate the xsave presence,
>so that sigreturn can be double sure and not break older apps.
The uc_flags field is not specified by SUSv3. Linux stores a zero
in it on signal delivery but doesn't use it on signal return.
Solaris describes it as implementation-dependent, and stores a bit
vector in it describing which parts of the ucontext are valid and
should be restored by setcontext (i.e. sigreturn in Linux). Thus
uc_flags would be a perfect place to store an XSAVE flag or cookie.
Using uc_flags takes care of x86-64 and x86-32 with "rt" sigframes
(SA_SIGINFO signal dispositions). The remaining problem is x86-32
with non-SA_SIGINFO sigframes, since they store plain sigcontexts
on the stack not ucontexts, and so don't have a uc_flags field.
To handle those we have to augment either sigcontexts or the plain
non-rt sigframes with some kind of marker.
>> > --- linux-2.6-x86.orig/arch/x86/kernel/sigframe.h 2008-05-12 13:09:02.000000000 -0700
>> > +++ linux-2.6-x86/arch/x86/kernel/sigframe.h 2008-05-12 13:09:56.000000000 -0700
>> > @@ -3,9 +3,10 @@
>> > char __user *pretcode;
>> > int sig;
>> > struct sigcontext sc;
>> > - struct _fpstate fpstate;
>> > + struct xstate_cntxt xst_cnxt;
>> > unsigned long extramask[_NSIG_WORDS-1];
>> > char retcode[8];
>> > + /* fp and rest of the extended context state follows here */
>> > };
>>
>> Offset to extramask[] and retcode changes, as well as the size of the structure.
>
>hm yes. this will break the restore of extramask[], for apps which set their
>own sig return frames.
Updating extramask[] is useful for combining sigprocmask(SIG_SETMASK,_,_) and
sigreturn into an atomic operation. AFAIK, there's no way to reach extramask[]
without knowing the sigframe layout.
>I can leave _fpstate as it is(and unused) and move xstate context (which
>will contain fpstate + the extended state) after extramask[].
I think that would be best.
>> Why contract xstate_cntxt to xst_cnxt? That just obscures things.
>
>ok.
Thanks.
>> What is the purpose of the xstate pointer in xstate_cntxt?
>> As far as I can tell, it's redundant and can alway be derived
>> from the ucontext's uc_mcontext.fpstate pointer.
>
>This is true in 64bit.
>
>While doing sigreturn, there is no way, kernel can know if the
>uc_mcontext.fpstate is pointing to just the legacy fpstate(512 bytes)
>or the extended state pointer (unless we incorporate
>some magic word at the end of fpstate image, see below)
>
>during sigreturn, fpstate will be restored from the uc_mcontext.fpstate
>and extended state will be restored from xstate pointer.
>
>> And why does x86-64 make xstate == fpstate while on x86-32 they're at an
>> offset from each other?
>
>because sigcontext's 32bit fpstate is different from 64bit fpstate.
>32bit fp state is fsave frame followed by fxsave frame. Whereas 64bit
>fp state is just fxsave frame. To maintain 32bit legacy compatibility,
>32bit xstate is different from 32bit fpstate.
Ok, thanks for clarifying this.
>> struct _fpstate has a 'magic' field which distinguishes x87-only
>> from x87+FXSR structs. Could that field also be used to indicate XSAVE?
>
>I don't think we can use the existing 'magic' field.
Hmm, right now it seems this field has a de-facto ABI of being
either 0xffff (plain) or 0x0000 (fxsr). Using other values would
confuse at least one application I know of. Sad.
> But we can
>use some what similar magic, if the fxsave/fxrstor give away
>some of the fields at the end of fxsave image (today it is reserved
>and ignored during fxsave/fxrstor) for software use.
>We can then use these fields at the end of fpstate, to indicate the presence of
>xstate. But this requires some architecture changes like giving
>away this space for SW use. We can take this to architects and
>see what they think.
If the HW doesn't store anything valuable there, we could store
SW flags/cookies there on signal delivery, and clear them before
fxrstor (unless the HW is known to ignore those fields).
But it depends on how forgiving the HW is.
Another idea I have at the moment is that it seems that struct
_fpstate reserves storage for x87 in both the legacy part and
the fxsr part. Looking through arch/x86/kernel/i387.c it seems
that the x87 space in the fxsr part isn't actually used (see
convert_to_fxsr()). If so, it might be possible to reuse it for
an XSAVE indicator.
Another idea is that marking non-siginfo sigframes could be done
by adding a sys_xsave_sigreturn() and letting those sigframes
have a return address pointing to a vsyscall that invokes this
new syscall.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-19 14:52 ` Mikael Pettersson
@ 2008-05-19 15:04 ` Andi Kleen
2008-05-19 16:29 ` H. Peter Anvin
2008-05-20 1:57 ` Suresh Siddha
2 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2008-05-19 15:04 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Suresh Siddha, mingo, hpa, tglx, torvalds, akpm, roland, drepper,
Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan, asit.k.mallick
> The de-facto ABI for signal delivery and sigreturn is unfortunately
> based on fairly fixed-layout structs on the stack (sigframes). The
> only flexibility there that I've found is the sigcontext's fpstate
> pointer which allows the fpstate to be located elsewhere.
One sure 100% compatible way would be to only change the signal layout once
the application used anything that needs XSAVE/XRSTOR. But implementing
that would be likely complicated and I'm not sure it's worth it.
I don't remember that much breakage when the FXSAVE support was
introduced on i386. That already changed these data structures.
> Another idea is that marking non-siginfo sigframes could be done
> by adding a sys_xsave_sigreturn() and letting those sigframes
> have a return address pointing to a vsyscall that invokes this
> new syscall.
That sounds easy enough. x86-64 right now doesn't use signal vsyscalls but
there is no principle reason it can't.
-Andi
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-19 14:52 ` Mikael Pettersson
2008-05-19 15:04 ` Andi Kleen
@ 2008-05-19 16:29 ` H. Peter Anvin
2008-05-19 16:57 ` Suresh Siddha
2008-05-20 1:57 ` Suresh Siddha
2 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-19 16:29 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Suresh Siddha, mingo, tglx, torvalds, akpm, andi, roland, drepper,
Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan, asit.k.mallick
Mikael Pettersson wrote:
>
> My problem with the OSXAVE flag is that it's a very indirect way of
> communicating the layout of sigframes and sigcontexts. These structures
> should, if at all possible, be self-describing. A single flag bit in
> the sigcontext could handle both structures (since a sigframe always
> includes a sigcontext).
>
It's also wrong, since OSXSAVE indicates that the CPU can do it, not
that the kernel can.
>
>>> struct _fpstate has a 'magic' field which distinguishes x87-only
>>> from x87+FXSR structs. Could that field also be used to indicate XSAVE?
>> I don't think we can use the existing 'magic' field.
>
> Hmm, right now it seems this field has a de-facto ABI of being
> either 0xffff (plain) or 0x0000 (fxsr). Using other values would
> confuse at least one application I know of. Sad.
>
Well, arguably it is the right thing to use since we're talking about a
new format. The difference is that the new format *does* extend
backwards to match the old format.
>> But we can
>> use some what similar magic, if the fxsave/fxrstor give away
>> some of the fields at the end of fxsave image (today it is reserved
>> and ignored during fxsave/fxrstor) for software use.
>> We can then use these fields at the end of fpstate, to indicate the presence of
>> xstate. But this requires some architecture changes like giving
>> away this space for SW use. We can take this to architects and
>> see what they think.
>
> If the HW doesn't store anything valuable there, we could store
> SW flags/cookies there on signal delivery, and clear them before
> fxrstor (unless the HW is known to ignore those fields).
> But it depends on how forgiving the HW is.
All we need is a single field -- a single byte -- reserved indefinitely
for software use. Existing FXSAVE kernels will have set it to zero.
There might be fields the existing FXSAVE format which can be equally
abused, even. I will do some looking.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-19 16:29 ` H. Peter Anvin
@ 2008-05-19 16:57 ` Suresh Siddha
2008-05-19 17:45 ` H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-05-19 16:57 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Mikael Pettersson, Suresh Siddha, mingo, tglx, torvalds, akpm,
andi, roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml,
dan, asit.k.mallick
On Mon, May 19, 2008 at 09:29:01AM -0700, H. Peter Anvin wrote:
> Mikael Pettersson wrote:
> >
> >My problem with the OSXAVE flag is that it's a very indirect way of
> >communicating the layout of sigframes and sigcontexts. These structures
> >should, if at all possible, be self-describing. A single flag bit in
> >the sigcontext could handle both structures (since a sigframe always
> >includes a sigcontext).
> >
>
> It's also wrong, since OSXSAVE indicates that the CPU can do it, not
> that the kernel can.
OSXSAVE indicates the OS support and XSAVE indicates the cpu support.
> >>>struct _fpstate has a 'magic' field which distinguishes x87-only
> >>>from x87+FXSR structs. Could that field also be used to indicate XSAVE?
> >>I don't think we can use the existing 'magic' field.
> >
> >Hmm, right now it seems this field has a de-facto ABI of being
> >either 0xffff (plain) or 0x0000 (fxsr). Using other values would
> >confuse at least one application I know of. Sad.
> >
>
> Well, arguably it is the right thing to use since we're talking about a
> new format. The difference is that the new format *does* extend
> backwards to match the old format.
There might be some old applications, which just care about FP/SSE and
just check for 0xffff (plain) or 0x0000 (fxsr). We should extend this
in a backward compatible manner.
> >>But we can
> >>use some what similar magic, if the fxsave/fxrstor give away
> >>some of the fields at the end of fxsave image (today it is reserved
> >>and ignored during fxsave/fxrstor) for software use.
> >>We can then use these fields at the end of fpstate, to indicate the
> >>presence of
> >>xstate. But this requires some architecture changes like giving
> >>away this space for SW use. We can take this to architects and
> >>see what they think.
> >
> >If the HW doesn't store anything valuable there, we could store
> >SW flags/cookies there on signal delivery, and clear them before
> >fxrstor (unless the HW is known to ignore those fields).
> >But it depends on how forgiving the HW is.
>
> All we need is a single field -- a single byte -- reserved indefinitely
> for software use. Existing FXSAVE kernels will have set it to zero.
>
> There might be fields the existing FXSAVE format which can be equally
> abused, even. I will do some looking.
All the reserved fields at the end of fxsave format are zeroed and
presented as such to the user. If HW makes some of these fields SW available,
then we can use those (will check). If there is any scope with the
existing format it self, that will be much better.
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-19 16:57 ` Suresh Siddha
@ 2008-05-19 17:45 ` H. Peter Anvin
0 siblings, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-19 17:45 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, mingo, tglx, torvalds, akpm, andi, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
>>>
>> It's also wrong, since OSXSAVE indicates that the CPU can do it, not
>> that the kernel can.
>
> OSXSAVE indicates the OS support and XSAVE indicates the cpu support.
>
Sorry, brainfart. Don't post so early in the morning.
>> All we need is a single field -- a single byte -- reserved indefinitely
>> for software use. Existing FXSAVE kernels will have set it to zero.
>>
>> There might be fields the existing FXSAVE format which can be equally
>> abused, even. I will do some looking.
>
> All the reserved fields at the end of fxsave format are zeroed and
> presented as such to the user. If HW makes some of these fields SW available,
> then we can use those (will check). If there is any scope with the
> existing format it self, that will be much better.
I was thinking about what we'd really like earlier, and given a clean
slate I'd like to see a structure looking like:
struct state_ptrs {
size_t len;
struct state_foo *foo;
struct state_bar *bar;
...
};
... where len is sizeof(struct state_ptrs). This is not merely
extensible, but it's easy for userspace to massage it into whatever
format -- longer or shorter -- that it happens to know about, and it
gives a natural way for the kernel to communicate "none of this state"
by feeding a NULL pointer. So pretty much we're looking for a way to
backwards-compatible way to stash a pointer to this structure, I figure.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-19 14:52 ` Mikael Pettersson
2008-05-19 15:04 ` Andi Kleen
2008-05-19 16:29 ` H. Peter Anvin
@ 2008-05-20 1:57 ` Suresh Siddha
2008-05-20 8:58 ` Mikael Pettersson
2008-05-20 10:01 ` Andi Kleen
2 siblings, 2 replies; 57+ messages in thread
From: Suresh Siddha @ 2008-05-20 1:57 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Suresh Siddha, mingo, hpa, tglx, torvalds, akpm, andi, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
On Mon, May 19, 2008 at 04:52:01PM +0200, Mikael Pettersson wrote:
> > But we can
> >use some what similar magic, if the fxsave/fxrstor give away
> >some of the fields at the end of fxsave image (today it is reserved
> >and ignored during fxsave/fxrstor) for software use.
> >We can then use these fields at the end of fpstate, to indicate the presence of
> >xstate. But this requires some architecture changes like giving
> >away this space for SW use. We can take this to architects and
> >see what they think.
>
> If the HW doesn't store anything valuable there, we could store
> SW flags/cookies there on signal delivery, and clear them before
> fxrstor (unless the HW is known to ignore those fields).
> But it depends on how forgiving the HW is.
Ok. CPU folks are planning to make some of the bytes at the end of fxsave
image, SW usable.
We can use some of these fields, to represent the extended state
presence with a cookie, save area size, mask of the state
stored. If needed, we can include the start address of the fpstate pointer
(also as part of the cookie), so that we can detect the situation,
where apps are just memcopying sizeof(struct _fpstate) from the fpstate
pointer (but not aware of the extended state).
we don't need any ucontext_t extensions any more and just
use the fpstate pointer to indicate the extended state aswell, right?
In addition, we need to make sure that for 32bit non-rt sigframes, we
don't modify the extramask[] offset.
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 1:57 ` Suresh Siddha
@ 2008-05-20 8:58 ` Mikael Pettersson
2008-05-20 10:01 ` Andi Kleen
1 sibling, 0 replies; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-20 8:58 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, mingo, hpa, tglx, torvalds, akpm, andi, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha writes:
> On Mon, May 19, 2008 at 04:52:01PM +0200, Mikael Pettersson wrote:
> > > But we can
> > >use some what similar magic, if the fxsave/fxrstor give away
> > >some of the fields at the end of fxsave image (today it is reserved
> > >and ignored during fxsave/fxrstor) for software use.
> > >We can then use these fields at the end of fpstate, to indicate the presence of
> > >xstate. But this requires some architecture changes like giving
> > >away this space for SW use. We can take this to architects and
> > >see what they think.
> >
> > If the HW doesn't store anything valuable there, we could store
> > SW flags/cookies there on signal delivery, and clear them before
> > fxrstor (unless the HW is known to ignore those fields).
> > But it depends on how forgiving the HW is.
>
> Ok. CPU folks are planning to make some of the bytes at the end of fxsave
> image, SW usable.
Nice.
> We can use some of these fields, to represent the extended state
> presence with a cookie, save area size, mask of the state
> stored. If needed, we can include the start address of the fpstate pointer
> (also as part of the cookie), so that we can detect the situation,
> where apps are just memcopying sizeof(struct _fpstate) from the fpstate
> pointer (but not aware of the extended state).
I use a similar technique to detect user-space mangling
of ucontexts on Solaris.
> we don't need any ucontext_t extensions any more and just
> use the fpstate pointer to indicate the extended state aswell, right?
Yes, the old magic distinguishes x87-only from x87+fxsr, the new magic
distinguishes fxsr from xsave.
> In addition, we need to make sure that for 32bit non-rt sigframes, we
> don't modify the extramask[] offset.
Thanks,
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 1:57 ` Suresh Siddha
2008-05-20 8:58 ` Mikael Pettersson
@ 2008-05-20 10:01 ` Andi Kleen
2008-05-20 13:19 ` Mikael Pettersson
2008-05-20 14:55 ` H. Peter Anvin
1 sibling, 2 replies; 57+ messages in thread
From: Andi Kleen @ 2008-05-20 10:01 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, mingo, hpa, tglx, torvalds, akpm, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
> On Mon, May 19, 2008 at 04:52:01PM +0200, Mikael Pettersson wrote:
>>> But we can
>>> use some what similar magic, if the fxsave/fxrstor give away
>>> some of the fields at the end of fxsave image (today it is reserved
>>> and ignored during fxsave/fxrstor) for software use.
>>> We can then use these fields at the end of fpstate, to indicate the presence of
>>> xstate. But this requires some architecture changes like giving
>>> away this space for SW use. We can take this to architects and
>>> see what they think.
>> If the HW doesn't store anything valuable there, we could store
>> SW flags/cookies there on signal delivery, and clear them before
>> fxrstor (unless the HW is known to ignore those fields).
>> But it depends on how forgiving the HW is.
>
> Ok. CPU folks are planning to make some of the bytes at the end of fxsave
> image, SW usable.
Are they always zeroed in earlier CPUs though? If not that wouldn't
work 100% reliably because whatever cookie you put in could have been
there before by chance.
I don't see anything in the SDM guaranteeing zeroing.
-Andi
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 10:01 ` Andi Kleen
@ 2008-05-20 13:19 ` Mikael Pettersson
2008-05-20 14:58 ` H. Peter Anvin
2008-05-20 14:55 ` H. Peter Anvin
1 sibling, 1 reply; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-20 13:19 UTC (permalink / raw)
To: Andi Kleen
Cc: Suresh Siddha, Mikael Pettersson, mingo, hpa, tglx, torvalds,
akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml,
dan, asit.k.mallick
Andi Kleen writes:
> Suresh Siddha wrote:
> > On Mon, May 19, 2008 at 04:52:01PM +0200, Mikael Pettersson wrote:
> >>> But we can
> >>> use some what similar magic, if the fxsave/fxrstor give away
> >>> some of the fields at the end of fxsave image (today it is reserved
> >>> and ignored during fxsave/fxrstor) for software use.
> >>> We can then use these fields at the end of fpstate, to indicate the presence of
> >>> xstate. But this requires some architecture changes like giving
> >>> away this space for SW use. We can take this to architects and
> >>> see what they think.
> >> If the HW doesn't store anything valuable there, we could store
> >> SW flags/cookies there on signal delivery, and clear them before
> >> fxrstor (unless the HW is known to ignore those fields).
> >> But it depends on how forgiving the HW is.
> >
> > Ok. CPU folks are planning to make some of the bytes at the end of fxsave
> > image, SW usable.
>
> Are they always zeroed in earlier CPUs though? If not that wouldn't
> work 100% reliably because whatever cookie you put in could have been
> there before by chance.
I wrote a test program (fill an area with zeroes, fxsave, inspect
reserved fields, then fill it with ones, fxsave, inspect again),
and all processors appear to just not write anything to the reserved
fields after the last xmm register. (Tested on an old Mobile Athlon64,
Opteron 280, P4 Xeon, Pentium-D, and C2 Xeon E5345.)
So the question now is what if anything has the Linux kernel written
to those reserved fields. (Looking..) Hmm, signal delivery on x86-64
seems to do fxsave directly to the fxsave area in the user's sigframe,
which would imply that the reserved fields have unpredictable values.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 13:19 ` Mikael Pettersson
@ 2008-05-20 14:58 ` H. Peter Anvin
2008-05-20 15:20 ` Mikael Pettersson
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-20 14:58 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Andi Kleen, Suresh Siddha, mingo, tglx, torvalds, akpm, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Mikael Pettersson wrote:
> >
> > Are they always zeroed in earlier CPUs though? If not that wouldn't
> > work 100% reliably because whatever cookie you put in could have been
> > there before by chance.
>
> I wrote a test program (fill an area with zeroes, fxsave, inspect
> reserved fields, then fill it with ones, fxsave, inspect again),
> and all processors appear to just not write anything to the reserved
> fields after the last xmm register. (Tested on an old Mobile Athlon64,
> Opteron 280, P4 Xeon, Pentium-D, and C2 Xeon E5345.)
>
> So the question now is what if anything has the Linux kernel written
> to those reserved fields. (Looking..) Hmm, signal delivery on x86-64
> seems to do fxsave directly to the fxsave area in the user's sigframe,
> which would imply that the reserved fields have unpredictable values.
>
OK, so that's not a usable path unless we can find some area in the
existing data set to put a flag. Groan.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 14:58 ` H. Peter Anvin
@ 2008-05-20 15:20 ` Mikael Pettersson
2008-05-20 17:53 ` Suresh Siddha
2008-05-20 17:57 ` H. Peter Anvin
0 siblings, 2 replies; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-20 15:20 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Mikael Pettersson, Andi Kleen, Suresh Siddha, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
H. Peter Anvin writes:
> Mikael Pettersson wrote:
> > >
> > > Are they always zeroed in earlier CPUs though? If not that wouldn't
> > > work 100% reliably because whatever cookie you put in could have been
> > > there before by chance.
> >
> > I wrote a test program (fill an area with zeroes, fxsave, inspect
> > reserved fields, then fill it with ones, fxsave, inspect again),
> > and all processors appear to just not write anything to the reserved
> > fields after the last xmm register. (Tested on an old Mobile Athlon64,
> > Opteron 280, P4 Xeon, Pentium-D, and C2 Xeon E5345.)
> >
> > So the question now is what if anything has the Linux kernel written
> > to those reserved fields. (Looking..) Hmm, signal delivery on x86-64
> > seems to do fxsave directly to the fxsave area in the user's sigframe,
> > which would imply that the reserved fields have unpredictable values.
> >
>
> OK, so that's not a usable path unless we can find some area in the
> existing data set to put a flag. Groan.
An ugly workaround could be to start clearing one of these fields,
and say that the data there is only valid for kernels >= 2.6.26.
(I said it was ugly...)
Or we go back to stashing a flag in uc_flags (which is kosher),
and try to figure out how to mark non-rt sigframes.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 15:20 ` Mikael Pettersson
@ 2008-05-20 17:53 ` Suresh Siddha
2008-05-20 17:59 ` H. Peter Anvin
2008-05-22 0:28 ` H. Peter Anvin
2008-05-20 17:57 ` H. Peter Anvin
1 sibling, 2 replies; 57+ messages in thread
From: Suresh Siddha @ 2008-05-20 17:53 UTC (permalink / raw)
To: Mikael Pettersson
Cc: H. Peter Anvin, Andi Kleen, Suresh Siddha, mingo, tglx, torvalds,
akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml,
dan, asit.k.mallick
On Tue, May 20, 2008 at 05:20:43PM +0200, Mikael Pettersson wrote:
> H. Peter Anvin writes:
> > Mikael Pettersson wrote:
> > > So the question now is what if anything has the Linux kernel written
> > > to those reserved fields. (Looking..) Hmm, signal delivery on x86-64
> > > seems to do fxsave directly to the fxsave area in the user's sigframe,
> > > which would imply that the reserved fields have unpredictable values.
> > >
> >
> > OK, so that's not a usable path unless we can find some area in the
> > existing data set to put a flag. Groan.
>
> An ugly workaround could be to start clearing one of these fields,
> and say that the data there is only valid for kernels >= 2.6.26.
> (I said it was ugly...)
>
> Or we go back to stashing a flag in uc_flags (which is kosher),
> and try to figure out how to mark non-rt sigframes.
This issue of not-zeroing, is present in only 64bit kernels and for 64bit apps,
right?
64bit app signal handling uses only rt_frame, so we can add an uc_flag for
them and for 32bit apps, kernel was always zero'ing the reserved bits
at the end of _fpstate.
In short, for non-rt frames, they can check the reserved bits at the end
of fpstate frame and for rt-frames (perhaps even for 32bit rt frame handling)
apps can check for uc_flag aswell, for extended state presence. Is this
good enough?
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 17:53 ` Suresh Siddha
@ 2008-05-20 17:59 ` H. Peter Anvin
2008-05-22 0:28 ` H. Peter Anvin
1 sibling, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-20 17:59 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, Andi Kleen, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
>
> This issue of not-zeroing, is present in only 64bit kernels and for 64bit apps,
> right?
>
> 64bit app signal handling uses only rt_frame, so we can add an uc_flag for
> them and for 32bit apps, kernel was always zero'ing the reserved bits
> at the end of _fpstate.
>
> In short, for non-rt frames, they can check the reserved bits at the end
> of fpstate frame and for rt-frames (perhaps even for 32bit rt frame handling)
> apps can check for uc_flag aswell, for extended state presence. Is this
> good enough?
>
Are we sure about the "always" bit here (I suspect yes, but we may want
to double-check at least back to 2.2 or 2.4.)
Anyway, seems reasonable enough to me.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 17:53 ` Suresh Siddha
2008-05-20 17:59 ` H. Peter Anvin
@ 2008-05-22 0:28 ` H. Peter Anvin
2008-05-22 0:53 ` Roland McGrath
2008-05-22 8:57 ` Mikael Pettersson
1 sibling, 2 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-22 0:28 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, Andi Kleen, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
>>
>> An ugly workaround could be to start clearing one of these fields,
>> and say that the data there is only valid for kernels >= 2.6.26.
>> (I said it was ugly...)
>>
>> Or we go back to stashing a flag in uc_flags (which is kosher),
>> and try to figure out how to mark non-rt sigframes.
>
> This issue of not-zeroing, is present in only 64bit kernels and for 64bit apps,
> right?
>
> 64bit app signal handling uses only rt_frame, so we can add an uc_flag for
> them and for 32bit apps, kernel was always zero'ing the reserved bits
> at the end of _fpstate.
>
> In short, for non-rt frames, they can check the reserved bits at the end
> of fpstate frame and for rt-frames (perhaps even for 32bit rt frame handling)
> apps can check for uc_flag aswell, for extended state presence. Is this
> good enough?
>
Okay, trying to close on this :)
I would suggest using the uc_flag for the rt frames, and simply rely on
the OSXSAVE flag for non-rt signal frames. It a rather sucky approach
(as previously discussed), but since any sane user of these fields (as
opposed to just relying on the kernel to save/restore) should use the
SIGINFO frames, I don't see a problem *as long as it's possible to get
the information* -- any solution which demands performance should just
turn on SIGINFO and be happy.
The biggest potential problem with this that I see is that relying on
CPUID can mess with certain virtualization solutions. Another option to
accomplish the same thing would be to have a system call (preferrably a
prctl, since it is at least in theory personality-dependent) to query
what information is included in the fpstate data - since it will always
be the same for any particular kernel.
Thoughts?
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 0:28 ` H. Peter Anvin
@ 2008-05-22 0:53 ` Roland McGrath
2008-05-22 1:38 ` H. Peter Anvin
2008-05-22 8:49 ` Mikael Pettersson
2008-05-22 8:57 ` Mikael Pettersson
1 sibling, 2 replies; 57+ messages in thread
From: Roland McGrath @ 2008-05-22 0:53 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
> The biggest potential problem with this that I see is that relying on
> CPUID can mess with certain virtualization solutions. Another option to
> accomplish the same thing would be to have a system call (preferrably a
> prctl, since it is at least in theory personality-dependent) to query
> what information is included in the fpstate data - since it will always
> be the same for any particular kernel.
It's easy and cheap to add an indicator to the vDSO for this.
Thanks,
Roland
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 0:53 ` Roland McGrath
@ 2008-05-22 1:38 ` H. Peter Anvin
2008-05-22 6:40 ` Roland McGrath
2008-05-22 8:49 ` Mikael Pettersson
1 sibling, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-22 1:38 UTC (permalink / raw)
To: Roland McGrath
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
Roland McGrath wrote:
>> The biggest potential problem with this that I see is that relying on
>> CPUID can mess with certain virtualization solutions. Another option to
>> accomplish the same thing would be to have a system call (preferrably a
>> prctl, since it is at least in theory personality-dependent) to query
>> what information is included in the fpstate data - since it will always
>> be the same for any particular kernel.
>
> It's easy and cheap to add an indicator to the vDSO for this.
Yes, but I suspect for legacy apps running without vDSO might matter.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 1:38 ` H. Peter Anvin
@ 2008-05-22 6:40 ` Roland McGrath
2008-05-22 7:18 ` H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Roland McGrath @ 2008-05-22 6:40 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
> Yes, but I suspect for legacy apps running without vDSO might matter.
"Legacy" apps want to be changed to make a new kind of kernel query
and care about new details of signal frame layout for new features
they never used before, but don't want to handle the vDSO?
Thanks,
Roland
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 6:40 ` Roland McGrath
@ 2008-05-22 7:18 ` H. Peter Anvin
0 siblings, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-22 7:18 UTC (permalink / raw)
To: Roland McGrath
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
Roland McGrath wrote:
>> Yes, but I suspect for legacy apps running without vDSO might matter.
>
> "Legacy" apps want to be changed to make a new kind of kernel query
> and care about new details of signal frame layout for new features
> they never used before, but don't want to handle the vDSO?
I'm talking mostly about semi-embedded ISVs that have managed to get
themselves funny ideas about what they don't want to change. The vDSO
definitely involves more machinery to get to.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 0:53 ` Roland McGrath
2008-05-22 1:38 ` H. Peter Anvin
@ 2008-05-22 8:49 ` Mikael Pettersson
1 sibling, 0 replies; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-22 8:49 UTC (permalink / raw)
To: Roland McGrath
Cc: H. Peter Anvin, Suresh Siddha, Mikael Pettersson, Andi Kleen,
mingo, tglx, torvalds, akpm, drepper, Hongjiu.lu, linux-kernel,
arjan, rmk+lkml, dan, asit.k.mallick
Roland McGrath writes:
> > The biggest potential problem with this that I see is that relying on
> > CPUID can mess with certain virtualization solutions. Another option to
> > accomplish the same thing would be to have a system call (preferrably a
> > prctl, since it is at least in theory personality-dependent) to query
> > what information is included in the fpstate data - since it will always
> > be the same for any particular kernel.
>
> It's easy and cheap to add an indicator to the vDSO for this.
For our user-space virtualisation code a plain system call,
like prctl(), would be strongly preferable. Right now I
can't say whether a vDSO solution would work at all for us,
but a system call solution would be no problem at all.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 0:28 ` H. Peter Anvin
2008-05-22 0:53 ` Roland McGrath
@ 2008-05-22 8:57 ` Mikael Pettersson
2008-05-22 20:56 ` Suresh Siddha
1 sibling, 1 reply; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-22 8:57 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
H. Peter Anvin writes:
> Suresh Siddha wrote:
> >>
> >> An ugly workaround could be to start clearing one of these fields,
> >> and say that the data there is only valid for kernels >= 2.6.26.
> >> (I said it was ugly...)
> >>
> >> Or we go back to stashing a flag in uc_flags (which is kosher),
> >> and try to figure out how to mark non-rt sigframes.
> >
> > This issue of not-zeroing, is present in only 64bit kernels and for 64bit apps,
> > right?
> >
> > 64bit app signal handling uses only rt_frame, so we can add an uc_flag for
> > them and for 32bit apps, kernel was always zero'ing the reserved bits
> > at the end of _fpstate.
> >
> > In short, for non-rt frames, they can check the reserved bits at the end
> > of fpstate frame and for rt-frames (perhaps even for 32bit rt frame handling)
> > apps can check for uc_flag aswell, for extended state presence. Is this
> > good enough?
> >
>
> Okay, trying to close on this :)
>
> I would suggest using the uc_flag for the rt frames, and simply rely on
> the OSXSAVE flag for non-rt signal frames. It a rather sucky approach
> (as previously discussed), but since any sane user of these fields (as
> opposed to just relying on the kernel to save/restore) should use the
> SIGINFO frames, I don't see a problem *as long as it's possible to get
> the information* -- any solution which demands performance should just
> turn on SIGINFO and be happy.
I don't have the luxury to unconditionally change non-rt signal delivery
to rt signal delivery, but using uc_flags plus OSXSAVE or prctl() to
announce these layout changes would work for us. Of course, any existing
sigframe mangling (as opposed to just reading it) code must be updated
to avoid breakage, but that's unavoidable.
> The biggest potential problem with this that I see is that relying on
> CPUID can mess with certain virtualization solutions. Another option to
> accomplish the same thing would be to have a system call (preferrably a
> prctl, since it is at least in theory personality-dependent) to query
> what information is included in the fpstate data - since it will always
> be the same for any particular kernel.
>
> Thoughts?
Ok for me.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 8:57 ` Mikael Pettersson
@ 2008-05-22 20:56 ` Suresh Siddha
2008-05-22 21:02 ` H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-05-22 20:56 UTC (permalink / raw)
To: Mikael Pettersson
Cc: H. Peter Anvin, Suresh Siddha, Andi Kleen, mingo, tglx, torvalds,
akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml,
dan, asit.k.mallick
On Thu, May 22, 2008 at 10:57:03AM +0200, Mikael Pettersson wrote:
> H. Peter Anvin writes:
> > Suresh Siddha wrote:
> > > In short, for non-rt frames, they can check the reserved bits at the end
> > > of fpstate frame and for rt-frames (perhaps even for 32bit rt frame handling)
> > > apps can check for uc_flag aswell, for extended state presence. Is this
> > > good enough?
> > >
> >
> > Okay, trying to close on this :)
> >
> > I would suggest using the uc_flag for the rt frames, and simply rely on
> > the OSXSAVE flag for non-rt signal frames. It a rather sucky approach
> > (as previously discussed), but since any sane user of these fields (as
> > opposed to just relying on the kernel to save/restore) should use the
> > SIGINFO frames, I don't see a problem *as long as it's possible to get
> > the information* -- any solution which demands performance should just
> > turn on SIGINFO and be happy.
>
> I don't have the luxury to unconditionally change non-rt signal delivery
> to rt signal delivery, but using uc_flags plus OSXSAVE or prctl() to
> announce these layout changes would work for us. Of course, any existing
> sigframe mangling (as opposed to just reading it) code must be updated
> to avoid breakage, but that's unavoidable.
>
> > The biggest potential problem with this that I see is that relying on
> > CPUID can mess with certain virtualization solutions. Another option to
hpa, What is the virtualization problem? Are you referring to perf problem?
As you noted, regular non-rt signal handlers won't need this cpuid check. It's
needed only for those who manually look at non-rt signal frames and interpret it.
And also, they can do this check only once and not everytime.
To me, prtcl() just seems to be an overkill.
While restoring from the user, kernel also need to find out what layout
the user is passing. So it's bi-directional. I prefer the same mechanism
(using cookies/magic numbers etc inaddition to uc_flags or cpuid checks) to
interpret the fpstate for both user/kernel.
ARM also seem to be using similar things while extending their ucontext_t,
with out other kernel interfaces to indicate the layout.
BTW, how come 32bit kernel doesn't have the X86_FXSR_MAGIC checks, while restoring
the extended fxsave data from _fpstate?
thanks,
suresh
> > accomplish the same thing would be to have a system call (preferrably a
> > prctl, since it is at least in theory personality-dependent) to query
> > what information is included in the fpstate data - since it will always
> > be the same for any particular kernel.
> >
> > Thoughts?
>
> Ok for me.
>
> /Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 20:56 ` Suresh Siddha
@ 2008-05-22 21:02 ` H. Peter Anvin
2008-05-22 21:29 ` Suresh Siddha
` (2 more replies)
0 siblings, 3 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-22 21:02 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, Andi Kleen, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
>
> hpa, What is the virtualization problem? Are you referring to perf problem?
> As you noted, regular non-rt signal handlers won't need this cpuid check. It's
> needed only for those who manually look at non-rt signal frames and interpret it.
> And also, they can do this check only once and not everytime.
>
No, relying on CPUID and vdso both have implications for virtualization.
> To me, prtcl() just seems to be an overkill.
I don't think it is ... it's not overkill but rather "underkill"... it's
a low-performance solution but it's guaranteed to be safe in the
presence of virtualization of all its various ilk. Note that you don't
need to be able to *set* the format via prctl(), just *query* (get) it.
Using prctl() allows us to make this personality-dependent if we ever
need to.
> While restoring from the user, kernel also need to find out what layout
> the user is passing. So it's bi-directional. I prefer the same mechanism
> (using cookies/magic numbers etc inaddition to uc_flags or cpuid checks) to
> interpret the fpstate for both user/kernel.
No, it really doesn't: the kernel only needs to be able to read the same
format as it itself wrote.
> ARM also seem to be using similar things while extending their ucontext_t,
> with out other kernel interfaces to indicate the layout.
>
> BTW, how come 32bit kernel doesn't have the X86_FXSR_MAGIC checks, while restoring
> the extended fxsave data from _fpstate?
Again, the kernel already knows the format, so it doesn't need to check.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:02 ` H. Peter Anvin
@ 2008-05-22 21:29 ` Suresh Siddha
2008-05-22 21:34 ` H. Peter Anvin
2008-05-22 21:32 ` Chris Wright
2008-05-22 22:36 ` Mikael Pettersson
2 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-05-22 21:29 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
On Thu, May 22, 2008 at 02:02:28PM -0700, H. Peter Anvin wrote:
> Suresh Siddha wrote:
> >
> >hpa, What is the virtualization problem? Are you referring to perf problem?
> >As you noted, regular non-rt signal handlers won't need this cpuid check.
> >It's
> >needed only for those who manually look at non-rt signal frames and
> >interpret it.
> >And also, they can do this check only once and not everytime.
> >
>
> No, relying on CPUID and vdso both have implications for virtualization.
can you please elaborate? even in presence of virtualization, appropriate
cpuid bits need to be set/visible for application, for xsave/xrstor to work
properly.
> I don't think it is ... it's not overkill but rather "underkill"... it's
> a low-performance solution but it's guaranteed to be safe in the
> presence of virtualization of all its various ilk. Note that you don't
> need to be able to *set* the format via prctl(), just *query* (get) it.
>
> Using prctl() allows us to make this personality-dependent if we ever
> need to.
>
> >While restoring from the user, kernel also need to find out what layout
> >the user is passing. So it's bi-directional. I prefer the same mechanism
> >(using cookies/magic numbers etc inaddition to uc_flags or cpuid checks) to
> >interpret the fpstate for both user/kernel.
>
> No, it really doesn't: the kernel only needs to be able to read the same
> format as it itself wrote.
But user can potentially change the _fpstate pointer in sigcontext struct.
>
> >ARM also seem to be using similar things while extending their ucontext_t,
> >with out other kernel interfaces to indicate the layout.
> >
> >BTW, how come 32bit kernel doesn't have the X86_FXSR_MAGIC checks, while
> >restoring
> >the extended fxsave data from _fpstate?
>
> Again, the kernel already knows the format, so it doesn't need to check.
What happens with some old applications which change the _fpstate
pointer. they may or may not be fxsave aware...
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:29 ` Suresh Siddha
@ 2008-05-22 21:34 ` H. Peter Anvin
2008-05-22 22:22 ` Mikael Pettersson
2008-05-23 1:48 ` Suresh Siddha
0 siblings, 2 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-22 21:34 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, Andi Kleen, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
>
> can you please elaborate? even in presence of virtualization, appropriate
> cpuid bits need to be set/visible for application, for xsave/xrstor to work
> properly.
>
For many paravirtualization solutions, CPUID "leak" from the hypervisor.
The fact that CPUID cannot be disabled (made ring 0 only) is a major
flaw in the architecture.
Therefore, relying on CPUID is too dangerous.
>> I don't think it is ... it's not overkill but rather "underkill"... it's
>> a low-performance solution but it's guaranteed to be safe in the
>> presence of virtualization of all its various ilk. Note that you don't
>> need to be able to *set* the format via prctl(), just *query* (get) it.
>>
>> Using prctl() allows us to make this personality-dependent if we ever
>> need to.
>>
>>> While restoring from the user, kernel also need to find out what layout
>>> the user is passing. So it's bi-directional. I prefer the same mechanism
>>> (using cookies/magic numbers etc inaddition to uc_flags or cpuid checks) to
>>> interpret the fpstate for both user/kernel.
>> No, it really doesn't: the kernel only needs to be able to read the same
>> format as it itself wrote.
>
> But user can potentially change the _fpstate pointer in sigcontext struct.
If so, they get what they bargained for... I am not even sure the kernel
will successfully clean up the stack frame if they do that. I don't
think it has ever been supported doing that, and as you have correctly
pointed out, we don't check the magic number, so if we had had offenders
we ought to have smoked them out a long time ago.
>>> ARM also seem to be using similar things while extending their ucontext_t,
>>> with out other kernel interfaces to indicate the layout.
>>>
>>> BTW, how come 32bit kernel doesn't have the X86_FXSR_MAGIC checks, while
>>> restoring
>>> the extended fxsave data from _fpstate?
>> Again, the kernel already knows the format, so it doesn't need to check.
>
> What happens with some old applications which change the _fpstate
> pointer. they may or may not be fxsave aware...
That is not, and has never been, supported.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:34 ` H. Peter Anvin
@ 2008-05-22 22:22 ` Mikael Pettersson
2008-05-23 1:48 ` Suresh Siddha
1 sibling, 0 replies; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-22 22:22 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
H. Peter Anvin writes:
> > But user can potentially change the _fpstate pointer in sigcontext struct.
>
> If so, they get what they bargained for... I am not even sure the kernel
> will successfully clean up the stack frame if they do that.
Why would that be a problem? The stack is defined by the user's SP,
which is in the integer state part of the sigframe. We do a sigreturn(),
reload SP/PC/etc, and off we go. Where fpstate pointed to doesn't matter,
as long as it was NULL or a valid fpstate image.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:34 ` H. Peter Anvin
2008-05-22 22:22 ` Mikael Pettersson
@ 2008-05-23 1:48 ` Suresh Siddha
2008-05-23 2:12 ` Roland McGrath
2008-05-23 2:45 ` [RFC] x86: xsave/xrstor support, " H. Peter Anvin
1 sibling, 2 replies; 57+ messages in thread
From: Suresh Siddha @ 2008-05-23 1:48 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
On Thu, May 22, 2008 at 02:34:45PM -0700, H. Peter Anvin wrote:
> Suresh Siddha wrote:
> >
> >can you please elaborate? even in presence of virtualization, appropriate
> >cpuid bits need to be set/visible for application, for xsave/xrstor to work
> >properly.
> >
>
> For many paravirtualization solutions, CPUID "leak" from the hypervisor.
> The fact that CPUID cannot be disabled (made ring 0 only) is a major
> flaw in the architecture.
>
> Therefore, relying on CPUID is too dangerous.
hmm.. so the kernel needs to export all the cpuid info (that the kernel enables
and supports) to the user through some mechanism then?
atleast in the xsave case, hypervisor can completely control the
OSXSAVE flag. I am still not convinced whether I need to add prctl()
to indicate the layout. If I have to add, then it should not just
be about whether xsave information is included in _fpstate or not, it should also
be about the whole cpuid information provided by the xsave architecture. Because
the user potentially needs all that information, to make sense out of
the data layout included in the extended state area.
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 1:48 ` Suresh Siddha
@ 2008-05-23 2:12 ` Roland McGrath
2008-05-23 2:49 ` H. Peter Anvin
2008-05-23 2:45 ` [RFC] x86: xsave/xrstor support, " H. Peter Anvin
1 sibling, 1 reply; 57+ messages in thread
From: Roland McGrath @ 2008-05-23 2:12 UTC (permalink / raw)
To: Suresh Siddha
Cc: H. Peter Anvin, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
> hmm.. so the kernel needs to export all the cpuid info (that the kernel
> enables and supports) to the user through some mechanism then?
For a cheap interface, we use AT_HWCAP for this. Unfortunately that only
covers the first 32 bits of CPUID info. (For another cheap interface,
giving all the CPUID bits in the vDSO would be easy.)
For a clunky interface that already exists and is "simple" to use,
there is /dev/cpu/0/cpuid now. I wonder if having a device node and
opening it too much for applications that consider the vDSO too complex.
Thanks,
Roland
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 2:12 ` Roland McGrath
@ 2008-05-23 2:49 ` H. Peter Anvin
2008-05-23 18:09 ` Suresh Siddha
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-23 2:49 UTC (permalink / raw)
To: Roland McGrath
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
Roland McGrath wrote:
>> hmm.. so the kernel needs to export all the cpuid info (that the kernel
>> enables and supports) to the user through some mechanism then?
>
> For a cheap interface, we use AT_HWCAP for this. Unfortunately that only
> covers the first 32 bits of CPUID info. (For another cheap interface,
> giving all the CPUID bits in the vDSO would be easy.)
>
> For a clunky interface that already exists and is "simple" to use,
> there is /dev/cpu/0/cpuid now. I wonder if having a device node and
> opening it too much for applications that consider the vDSO too complex.
I doubt it.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 2:49 ` H. Peter Anvin
@ 2008-05-23 18:09 ` Suresh Siddha
2008-06-06 0:28 ` x86: xsave/xrstor support; " H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-05-23 18:09 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Roland McGrath, Suresh Siddha, Mikael Pettersson, Andi Kleen,
mingo, tglx, torvalds, akpm, drepper, Hongjiu.lu, linux-kernel,
arjan, rmk+lkml, dan, asit.k.mallick
On Thu, May 22, 2008 at 07:49:18PM -0700, H. Peter Anvin wrote:
> Roland McGrath wrote:
> >>hmm.. so the kernel needs to export all the cpuid info (that the kernel
> >>enables and supports) to the user through some mechanism then?
> >
> >For a cheap interface, we use AT_HWCAP for this. Unfortunately that only
> >covers the first 32 bits of CPUID info. (For another cheap interface,
> >giving all the CPUID bits in the vDSO would be easy.)
> >
> >For a clunky interface that already exists and is "simple" to use,
> >there is /dev/cpu/0/cpuid now. I wonder if having a device node and
> >opening it too much for applications that consider the vDSO too complex.
>
> I doubt it.
Ok. If really needed, they can use this interface aswell. But I don't
see a need for a new system call / other mechanism, just for xsave
purpose. They can rely on cpuid or any other equivalent infrastructure the
kernel provides.
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* x86: xsave/xrstor support; ucontext_t extensions
2008-05-23 18:09 ` Suresh Siddha
@ 2008-06-06 0:28 ` H. Peter Anvin
2008-06-06 20:14 ` Suresh Siddha
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-06-06 0:28 UTC (permalink / raw)
To: Suresh Siddha
Cc: Roland McGrath, Mikael Pettersson, Andi Kleen, x86, torvalds,
akpm, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Sorry for not getting back on this for so long.
I have looked at the XSAVE architecture, and it is pretty darn hideous,
mostly because it doesn't describe itself in the absence of CPUID
information. Given that, it would have been much better if there had
been separate invocations of XSAVE for each substate region. On the
other hand, normalizing to the current CPU format is probably desirable
anyway.
I would like to make this proposal for the signal frames (again, flagged
with a uc_flags bit for RT frames):
- The SW-reserved areas at the bottom of the FXSAVE region will be used
as follows:
- A magic number (M1)
- A length pointer (L1), giving the length of the entire XSAVE region.
- At the end of the XSAVE region, i.e. at the offset given by the length
pointer, we create a secondary structure looking something like this:
- Magic number (M2)
- Descriptor count (DC)
- DC * <EBX, EAX> from CPUID leaf 0Dh
- Possibly a checksum or CRC of this structure
Note that this tail structure will always be the same on a given kernel,
so it can be pre-canned at boot time. This tail structure serves two
purposes:
- It can be used to verify against truncation of the state.
(I.e. if an XSAVE-unaware application tries to copy and save away
a state and later restore it, but only copies the first 512 bytes
and later just puts a pointer to it.)
- It can be used to verify against an alien state (saved and restored
from another CPU, or even just another kernel version with different
support.)
If there is a mismatch, we can then take appropriate action:
- No M1 or M2 signature, or L1 or DC are insane:
-> Reinitialize any non-FXSAVE state.
- M1, M2, L1, and DC make sense, but mismatch on DC or descriptor
offsets:
-> Do a region-by-region copy in of the state; reinitialize any
regions not provided.
- Mismatch on descriptor sizes:
-> Consider that region corrupt and reinitialize?
The region-by-region copy could of course be used even in the same-CPU
case, if there turns out to be a negible performance difference over
whole-block copy.
Thoughts?
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: x86: xsave/xrstor support; ucontext_t extensions
2008-06-06 0:28 ` x86: xsave/xrstor support; " H. Peter Anvin
@ 2008-06-06 20:14 ` Suresh Siddha
2008-06-06 23:03 ` H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Suresh Siddha @ 2008-06-06 20:14 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Roland McGrath, Mikael Pettersson, Andi Kleen, x86,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
On Thu, Jun 05, 2008 at 05:28:15PM -0700, H. Peter Anvin wrote:
> Sorry for not getting back on this for so long.
I thought we had a closure on the previous thread. But no problem.
It's better late than never.
> I have looked at the XSAVE architecture, and it is pretty darn hideous,
> mostly because it doesn't describe itself in the absence of CPUID
> information.
xsave,xrstor are performance senstive instructions as they are used
in process context switches. It doesn't have to describe itself and
at any time, one can get all the xsave relevant layout information using
cpuid. And when needed, SW can always pass extra information with the
xsave image.
> Given that, it would have been much better if there had
> been separate invocations of XSAVE for each substate region.
It's primarily designed for context switch handling, with
fxsave's interoperability in mind. Even though the substates are
laidout one after another, xsave/xrstor can operate on individual
states based on the edx:eax mask.
Each substate will probably also have different instructions to
save/restore/init their substate.
> On the
> other hand, normalizing to the current CPU format is probably desirable
> anyway.
yes.
> I would like to make this proposal for the signal frames (again, flagged
> with a uc_flags bit for RT frames):
>
> - The SW-reserved areas at the bottom of the FXSAVE region will be used
> as follows:
> - A magic number (M1)
> - A length pointer (L1), giving the length of the entire XSAVE region.
As previously agreed.
> - At the end of the XSAVE region, i.e. at the offset given by the length
> pointer, we create a secondary structure looking something like this:
>
> - Magic number (M2)
As I mentioned earlier, we can avoid this magic number, by including
a pointer (which points to start of the fp and xstate on stack) along with M1.
This will catch any one copying the FP state of the frame but not aware of
Xstate.
> - Descriptor count (DC)
> - DC * <EBX, EAX> from CPUID leaf 0Dh
As you mentioned, this doesn't change after a kernel boot. So do we really
need to save this static information on every signal? (also please see below
about the compaction).
> - Possibly a checksum or CRC of this structure
>
> Note that this tail structure will always be the same on a given kernel,
> so it can be pre-canned at boot time. This tail structure serves two
> purposes:
>
> - It can be used to verify against truncation of the state.
> (I.e. if an XSAVE-unaware application tries to copy and save away
> a state and later restore it, but only copies the first 512 bytes
> and later just puts a pointer to it.)
As I mentioned above, pointer along with M1 should be enough to catch this?
> - It can be used to verify against an alien state (saved and restored
> from another CPU, or even just another kernel version with different
> support.)
Though the xsave layout is extendable, save area is not
compacted if some features are not supported by processor and/or
system software. This is documented in Vol 2b under "xsave"
instruction.
In my RFC, we had the bit mask also saved in the SW-reserved area,
which represents the extended state saved in the signal frame.
This is in addition to the bit mask represeted by the xstate_bv
in the header. xstate_bv indicates the current status (init/non-init)
of the sub-states for the bit mask saved in the SW-reserved area.
While restoring, kernel can also use the same bit mask in SW area to restore
the state and init the other state not referred by the bit mask.
> If there is a mismatch, we can then take appropriate action:
>
> - No M1 or M2 signature, or L1 or DC are insane:
> -> Reinitialize any non-FXSAVE state.
>
> - M1, M2, L1, and DC make sense, but mismatch on DC or descriptor
> offsets:
> -> Do a region-by-region copy in of the state; reinitialize any
> regions not provided.
Given that the descriptor offsets don't change, we can
achieve the same thing with a bit mask representing the state in
the xsave layout. xrstor with the approriate bit masks will automatically
restore/init the state.
> - Mismatch on descriptor sizes:
> -> Consider that region corrupt and reinitialize?
>
> The region-by-region copy could of course be used even in the same-CPU
> case, if there turns out to be a negible performance difference over
> whole-block copy.
Today in 64bit, we directly do fxsave/fxrstor in and out of user-space
for signal handlers. I would like to retain this behavior as much as possible
with xsave/xrstor aswell (and at the same time, provide as much information
as possible for the user to interpret the signal frame). Bit mask representing
the state saved in the xsave image, M1, length and some cookie (pointer along
with M1) to detect the image truncation can achieve this. Isn't it?
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: x86: xsave/xrstor support; ucontext_t extensions
2008-06-06 20:14 ` Suresh Siddha
@ 2008-06-06 23:03 ` H. Peter Anvin
0 siblings, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-06-06 23:03 UTC (permalink / raw)
To: Suresh Siddha
Cc: Roland McGrath, Mikael Pettersson, Andi Kleen, x86, torvalds,
akpm, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
> I thought we had a closure on the previous thread. But no problem.
> It's better late than never.
I apologize. The last two months have been exceptionally tough.
> xsave,xrstor are performance senstive instructions as they are used
> in process context switches. It doesn't have to describe itself and
> at any time, one can get all the xsave relevant layout information using
> cpuid. And when needed, SW can always pass extra information with the
> xsave image.
Yes, the big problem with it is its monolithic nature (with no realistic
alternate instruction.)
>> - Magic number (M2)
>
> As I mentioned earlier, we can avoid this magic number, by including
> a pointer (which points to start of the fp and xstate on stack) along with M1.
As I mentioned before, this introduces a very different constraint,
which is a really bad precedent; data shouldn't be dependent on its own
location.
> This will catch any one copying the FP state of the frame but not aware of
> Xstate.
>
>> - Descriptor count (DC)
>> - DC * <EBX, EAX> from CPUID leaf 0Dh
>
> As you mentioned, this doesn't change after a kernel boot. So do we really
> need to save this static information on every signal? (also please see below
> about the compaction).
I think given the compaction constraint we're okay with the bitmap plus
length of the area.
>> - Possibly a checksum or CRC of this structure
>>
>> Note that this tail structure will always be the same on a given kernel,
>> so it can be pre-canned at boot time. This tail structure serves two
>> purposes:
>>
>> - It can be used to verify against truncation of the state.
>> (I.e. if an XSAVE-unaware application tries to copy and save away
>> a state and later restore it, but only copies the first 512 bytes
>> and later just puts a pointer to it.)
>
> As I mentioned above, pointer along with M1 should be enough to catch this?
I think the pointer is a really really bad idea. Even if we don't need
the structure I think having a tail magic is the better alternative, and
it's also really cheap to do since you have to have the length pointer
anyway.
>> - It can be used to verify against an alien state (saved and restored
>> from another CPU, or even just another kernel version with different
>> support.)
>
> Though the xsave layout is extendable, save area is not
> compacted if some features are not supported by processor and/or
> system software. This is documented in Vol 2b under "xsave"
> instruction.
Ah, you're right, my bad. That does make the problem substantially
simpler (I somehow read only the second half of the and/or clause, but
it's all there.) So, OK, no need for descriptors (he says, as he waits
for the architectural shoe to drop, especially in a multivendor
environment.)
> Given that the descriptor offsets don't change, we can
> achieve the same thing with a bit mask representing the state in
> the xsave layout. xrstor with the approriate bit masks will automatically
> restore/init the state.
Agreed.
>> - Mismatch on descriptor sizes:
>> -> Consider that region corrupt and reinitialize?
>>
>> The region-by-region copy could of course be used even in the same-CPU
>> case, if there turns out to be a negible performance difference over
>> whole-block copy.
>
> Today in 64bit, we directly do fxsave/fxrstor in and out of user-space
> for signal handlers. I would like to retain this behavior as much as possible
> with xsave/xrstor aswell (and at the same time, provide as much information
> as possible for the user to interpret the signal frame). Bit mask representing
> the state saved in the xsave image, M1, length and some cookie (pointer along
> with M1) to detect the image truncation can achieve this. Isn't it?
If the state is complete, which it of course will be something like
99.9999% of the time, then doing XRSTOR from user space should work just
fine. The case of having to stitch up state is clearly an exceptional
case, which is not at all performance critical in any way.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 1:48 ` Suresh Siddha
2008-05-23 2:12 ` Roland McGrath
@ 2008-05-23 2:45 ` H. Peter Anvin
2008-05-23 11:46 ` Mikael Pettersson
1 sibling, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-23 2:45 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, Andi Kleen, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
> On Thu, May 22, 2008 at 02:34:45PM -0700, H. Peter Anvin wrote:
>> Suresh Siddha wrote:
>>> can you please elaborate? even in presence of virtualization, appropriate
>>> cpuid bits need to be set/visible for application, for xsave/xrstor to work
>>> properly.
>>>
>> For many paravirtualization solutions, CPUID "leak" from the hypervisor.
>> The fact that CPUID cannot be disabled (made ring 0 only) is a major
>> flaw in the architecture.
>>
>> Therefore, relying on CPUID is too dangerous.
>
> hmm.. so the kernel needs to export all the cpuid info (that the kernel enables
> and supports) to the user through some mechanism then?
Not really. Most of the CPUID information doesn't have a OS-dependent
portion, but the parts that do cause problems (there are other issues,
like the hypervisor wanting to limit the feature set available to the
client, but there is no real solution to that, and certainly it would be
out of scope for this discussion.)
> atleast in the xsave case, hypervisor can completely control the
> OSXSAVE flag. I am still not convinced whether I need to add prctl()
> to indicate the layout. If I have to add, then it should not just
> be about whether xsave information is included in _fpstate or not, it should also
> be about the whole cpuid information provided by the xsave architecture. Because
> the user potentially needs all that information, to make sense out of
> the data layout included in the extended state area.
Agreed that this information is needed, although it probably would be
better to leave it in the frame itself; that way at least the rt frames
would stand on their own. I'm worried about using the reserved fields
since they could be messed up by ptrace, but you could also argue that
if you have the ability to trace the process you can do anything else to
it; that way we could use a reserved field to signal the presence of the
XSAVE information in the frame; I would hope that that XSAVE information
would be self-describing (perhaps a linked list of blocks or a separate
chain of descriptors.)
In other words, we know of a place to stash the presence flag for rt
signal frames, the uc_flags field; we effectively need to find a place
to stash the uc_flags field for the non-rt frame.
Michael, you wrote:
> Hmm, right now it seems this field has a de-facto ABI of being
> either 0xffff (plain) or 0x0000 (fxsr). Using other values would
> confuse at least one application I know of. Sad.
... could you share what application that is? This otherwise would be
ideal.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 2:45 ` [RFC] x86: xsave/xrstor support, " H. Peter Anvin
@ 2008-05-23 11:46 ` Mikael Pettersson
2008-05-23 12:11 ` Andi Kleen
0 siblings, 1 reply; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-23 11:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
H. Peter Anvin writes:
> Michael, you wrote:
>
> > Hmm, right now it seems this field has a de-facto ABI of being
> > either 0xffff (plain) or 0x0000 (fxsr). Using other values would
> > confuse at least one application I know of. Sad.
>
> ... could you share what application that is? This otherwise would be
> ideal.
It's the virtual machine for the Erlang/OTP concurrent functional
language. It's open-source and used extensively for telecom and
internet server applications.
The language doesn't allow non-finite FP values, so FP ops must
be checked. As an optimisation, the virtual machine groups FP ops
into blocks, enables FP exceptions, and has a SIGFPE handler that
just records that an FP exception has occurred in a per-thread flag.
After each FP block there's a single check for this flag, which if
set triggers a language-level "badarith" exception.
In order to resume from an SSE2 exception the SIGFPE handler must
update some parts of the sigcontext. But first the handler must detect
if the exception came from x87 or SSE2, which requires looking at the
mxcsr field, which requires looking at the magic field to detect the
fpstate's layout.
The magic field is described as follows in sigcontext.h:
>struct _fpstate {
> ...
> unsigned short magic; /* 0xffff = regular FPU data only */
> ...
>};
>
>#define X86_FXSR_MAGIC 0x0000
While this in theory allows for other values than -1 and 0 in magic,
any such value would be != FXSR_MAGIC and therefore would not trigger
the application's "inspect the fxsr layout" logic.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 11:46 ` Mikael Pettersson
@ 2008-05-23 12:11 ` Andi Kleen
0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2008-05-23 12:11 UTC (permalink / raw)
To: Mikael Pettersson
Cc: H. Peter Anvin, Suresh Siddha, Andi Kleen, mingo, tglx, torvalds,
akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml,
dan, asit.k.mallick
On Fri, May 23, 2008 at 01:46:25PM +0200, Mikael Pettersson wrote:
> H. Peter Anvin writes:
> > Michael, you wrote:
> >
> > > Hmm, right now it seems this field has a de-facto ABI of being
> > > either 0xffff (plain) or 0x0000 (fxsr). Using other values would
> > > confuse at least one application I know of. Sad.
> >
> > ... could you share what application that is? This otherwise would be
> > ideal.
>
> It's the virtual machine for the Erlang/OTP concurrent functional
> language. It's open-source and used extensively for telecom and
> internet server applications.
I had a few reports over the years of other applications ddoing very weird
specific things with the sigframe. So I agree it's fairly important
to be as compatible here as possible.
-Andi
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:02 ` H. Peter Anvin
2008-05-22 21:29 ` Suresh Siddha
@ 2008-05-22 21:32 ` Chris Wright
2008-05-22 22:15 ` Mikael Pettersson
2008-05-22 22:36 ` Mikael Pettersson
2 siblings, 1 reply; 57+ messages in thread
From: Chris Wright @ 2008-05-22 21:32 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
* H. Peter Anvin (hpa@zytor.com) wrote:
> Suresh Siddha wrote:
>>
>> hpa, What is the virtualization problem? Are you referring to perf problem?
>> As you noted, regular non-rt signal handlers won't need this cpuid check. It's
>> needed only for those who manually look at non-rt signal frames and interpret it.
>> And also, they can do this check only once and not everytime.
>>
>
> No, relying on CPUID and vdso both have implications for virtualization.
hmm, what implications? cpuid is dealt with already.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:32 ` Chris Wright
@ 2008-05-22 22:15 ` Mikael Pettersson
2008-05-22 22:29 ` Chris Wright
0 siblings, 1 reply; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-22 22:15 UTC (permalink / raw)
To: Chris Wright
Cc: H. Peter Anvin, Suresh Siddha, Mikael Pettersson, Andi Kleen,
mingo, tglx, torvalds, akpm, roland, drepper, Hongjiu.lu,
linux-kernel, arjan, rmk+lkml, dan, asit.k.mallick
Chris Wright writes:
> * H. Peter Anvin (hpa@zytor.com) wrote:
> > Suresh Siddha wrote:
> >>
> >> hpa, What is the virtualization problem? Are you referring to perf problem?
> >> As you noted, regular non-rt signal handlers won't need this cpuid check. It's
> >> needed only for those who manually look at non-rt signal frames and interpret it.
> >> And also, they can do this check only once and not everytime.
> >>
> >
> > No, relying on CPUID and vdso both have implications for virtualization.
>
> hmm, what implications? cpuid is dealt with already.
By whom?
Case in point: only a few weeks ago, we (as in LKML hackers) tried
to debug a user's bug report. Something very strange seemed to be
going on. It turned out the user was actually using a virtualisation
engine (qemu, not the latest) which presented highly inconsistent
cpuid data to the kernel, causing major confusion.
So I agree with hpa. The kernel has decided to use a particular
sigframe layout. It should communicate this to user-space in a robust
manner that is independent of how the kernel arrived at this decision.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 22:15 ` Mikael Pettersson
@ 2008-05-22 22:29 ` Chris Wright
2008-05-23 0:32 ` H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Chris Wright @ 2008-05-22 22:29 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Chris Wright, H. Peter Anvin, Suresh Siddha, Andi Kleen, mingo,
tglx, torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel,
arjan, rmk+lkml, dan, asit.k.mallick
* Mikael Pettersson (mikpe@it.uu.se) wrote:
> Chris Wright writes:
> > hmm, what implications? cpuid is dealt with already.
>
> By whom?
I was thinking specifically vmx non-root.
thanks,
-chris
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 22:29 ` Chris Wright
@ 2008-05-23 0:32 ` H. Peter Anvin
2008-05-23 0:44 ` Chris Wright
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-23 0:32 UTC (permalink / raw)
To: Chris Wright
Cc: Mikael Pettersson, Suresh Siddha, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
Chris Wright wrote:
> * Mikael Pettersson (mikpe@it.uu.se) wrote:
>> Chris Wright writes:
>>> hmm, what implications? cpuid is dealt with already.
>> By whom?
>
> I was thinking specifically vmx non-root.
You're assuming the presence *and use* of VMX.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 0:32 ` H. Peter Anvin
@ 2008-05-23 0:44 ` Chris Wright
0 siblings, 0 replies; 57+ messages in thread
From: Chris Wright @ 2008-05-23 0:44 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Chris Wright, Mikael Pettersson, Suresh Siddha, Andi Kleen, mingo,
tglx, torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel,
arjan, rmk+lkml, dan, asit.k.mallick
* H. Peter Anvin (hpa@zytor.com) wrote:
> You're assuming the presence *and use* of VMX.
Yes, and I see what you're referring to w/out.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 21:02 ` H. Peter Anvin
2008-05-22 21:29 ` Suresh Siddha
2008-05-22 21:32 ` Chris Wright
@ 2008-05-22 22:36 ` Mikael Pettersson
2008-05-23 0:33 ` H. Peter Anvin
2 siblings, 1 reply; 57+ messages in thread
From: Mikael Pettersson @ 2008-05-22 22:36 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
H. Peter Anvin writes:
> Suresh Siddha wrote:
> >
> > hpa, What is the virtualization problem? Are you referring to perf problem?
> > As you noted, regular non-rt signal handlers won't need this cpuid check. It's
> > needed only for those who manually look at non-rt signal frames and interpret it.
> > And also, they can do this check only once and not everytime.
> >
>
> No, relying on CPUID and vdso both have implications for virtualization.
>
> > To me, prtcl() just seems to be an overkill.
>
> I don't think it is ... it's not overkill but rather "underkill"... it's
> a low-performance solution but it's guaranteed to be safe in the
> presence of virtualization of all its various ilk. Note that you don't
> need to be able to *set* the format via prctl(), just *query* (get) it.
I agree. It works, user-space only needs to query it once, so it's not
a big deal that it's a syscall. Admittedly a sigcontext flag would have
been better, but that doesn't seem to be viable.
> > While restoring from the user, kernel also need to find out what layout
> > the user is passing. So it's bi-directional. I prefer the same mechanism
> > (using cookies/magic numbers etc inaddition to uc_flags or cpuid checks) to
> > interpret the fpstate for both user/kernel.
>
> No, it really doesn't: the kernel only needs to be able to read the same
> format as it itself wrote.
The kernel needs to accept one(*) of the formats it can produce, which
is not necessarily what it last produced. It's not inconceivable that
user-space will construct sigframes on the fly (to emulate setcontext),
or that it will mangle sigframes (e.g. to map non-rt to rt before sigreturn).
(*) The format is determined by which version of sys_sigreturn the
user invokes.
/Mikael
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 22:36 ` Mikael Pettersson
@ 2008-05-23 0:33 ` H. Peter Anvin
2008-05-23 0:42 ` Suresh Siddha
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-23 0:33 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Suresh Siddha, Andi Kleen, mingo, tglx, torvalds, akpm, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Mikael Pettersson wrote:
>
> > > While restoring from the user, kernel also need to find out what layout
> > > the user is passing. So it's bi-directional. I prefer the same mechanism
> > > (using cookies/magic numbers etc inaddition to uc_flags or cpuid checks) to
> > > interpret the fpstate for both user/kernel.
> >
> > No, it really doesn't: the kernel only needs to be able to read the same
> > format as it itself wrote.
>
> The kernel needs to accept one(*) of the formats it can produce, which
> is not necessarily what it last produced. It's not inconceivable that
> user-space will construct sigframes on the fly (to emulate setcontext),
> or that it will mangle sigframes (e.g. to map non-rt to rt before sigreturn).
>
> (*) The format is determined by which version of sys_sigreturn the
> user invokes.
>
No. You CANNOT restore from a frame that doesn't have the full state -
you don't have enough information to do so!
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 0:33 ` H. Peter Anvin
@ 2008-05-23 0:42 ` Suresh Siddha
2008-05-23 1:33 ` Roland McGrath
2008-05-23 2:27 ` H. Peter Anvin
0 siblings, 2 replies; 57+ messages in thread
From: Suresh Siddha @ 2008-05-23 0:42 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Mikael Pettersson, Suresh Siddha, Andi Kleen, mingo, tglx,
torvalds, akpm, roland, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
On Thu, May 22, 2008 at 05:33:30PM -0700, H. Peter Anvin wrote:
> Mikael Pettersson wrote:
> >
> > > > While restoring from the user, kernel also need to find out what
> > layout
> > > > the user is passing. So it's bi-directional. I prefer the same
> > mechanism
> > > > (using cookies/magic numbers etc inaddition to uc_flags or cpuid
> > checks) to
> > > > interpret the fpstate for both user/kernel.
> > >
> > > No, it really doesn't: the kernel only needs to be able to read the
> > same > format as it itself wrote.
> >
> >The kernel needs to accept one(*) of the formats it can produce, which
> >is not necessarily what it last produced. It's not inconceivable that
> >user-space will construct sigframes on the fly (to emulate setcontext),
> >or that it will mangle sigframes (e.g. to map non-rt to rt before
> >sigreturn).
> >
> >(*) The format is determined by which version of sys_sigreturn the
> >user invokes.
> >
>
> No. You CANNOT restore from a frame that doesn't have the full state -
> you don't have enough information to do so!
What I was doing in the RFC is: restore the state what ever that was present and
init the state that was not present in the stack frame.
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 0:42 ` Suresh Siddha
@ 2008-05-23 1:33 ` Roland McGrath
2008-05-23 16:57 ` H. Peter Anvin
2008-05-23 2:27 ` H. Peter Anvin
1 sibling, 1 reply; 57+ messages in thread
From: Roland McGrath @ 2008-05-23 1:33 UTC (permalink / raw)
To: Suresh Siddha
Cc: H. Peter Anvin, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
> What I was doing in the RFC is: restore the state what ever that was
> present and init the state that was not present in the stack frame.
That is consistent in spirit with the existing treatment of FPU data.
That is, if the sigcontext.fpstate pointer is NULL, the thread's FPU
state is reset to default. (And despite what hpa said about being
"supported", the facts in the code are that sigreturn just follows the
sigcontext.fpstate pointer, whatever it is. On 32-bit, the pointer is
NULL in the context saved when the thread had not used the FPU, so
modifying the sigcontext to include FP state when it didn't before
requires putting in some user-chosen pointer. There in fact may well be
existing code that does user-level coroutine switching using sigreturn
and relies on this, for all we know.)
Thanks,
Roland
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 1:33 ` Roland McGrath
@ 2008-05-23 16:57 ` H. Peter Anvin
2008-05-23 17:50 ` Suresh Siddha
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-23 16:57 UTC (permalink / raw)
To: Roland McGrath
Cc: Suresh Siddha, Mikael Pettersson, Andi Kleen, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
Roland McGrath wrote:
>> What I was doing in the RFC is: restore the state what ever that was
>> present and init the state that was not present in the stack frame.
>
> That is consistent in spirit with the existing treatment of FPU data.
> That is, if the sigcontext.fpstate pointer is NULL, the thread's FPU
> state is reset to default. (And despite what hpa said about being
> "supported", the facts in the code are that sigreturn just follows the
> sigcontext.fpstate pointer, whatever it is. On 32-bit, the pointer is
> NULL in the context saved when the thread had not used the FPU, so
> modifying the sigcontext to include FP state when it didn't before
> requires putting in some user-chosen pointer. There in fact may well be
> existing code that does user-level coroutine switching using sigreturn
> and relies on this, for all we know.)
>
Okay. Pretty much what it comes down to is that there is no ideal
solution. Thus, we're trying to explore the potential tradeoffs. The
scenario you describe above will crash horribly for a non-FXSAVE aware
application running on an FXSAVE kernel.
Either way, there has been a long time since, and new bad applications
have obviously emerged, partially "assisted" by our propensity to not
document, and the deep gulf between our kernel and userspace developers.
Let's try another strawman on for size:
- It is clear it is desirable not just for the frame itself but for the
fpstate to be self-describing.
- Thus, let's put a magic cookie in one of the reserved fields at the
end of the FXSAVE region, and make sure it is long enough to be unlikely
to pop up randomly; as well as another magic cookie outside the FXSAVE
region.
- The signal delivery code will write the cookie (or zero, for !XSAVE)
regardless of any crap ptrace might have written into it.
- We will ALSO set bit 0 in uc_flags for RT sigframes as an additional
assurance.
- We will introduce at least a 32-bit field for future use, to be
written unconditionally zero for now. We don't want to have to go
through this particular torture yet again.
- The XSAVE state beyond the FXSAVE region needs to be self-describing.
This may mean adding information not provided by the hardware.
Furthermore, it must be possible for userspace to know the length of the
frame, even if it doesn't understand its detailed contents.
None of this is foolproof on older kernels -- there simply *IS* no
option for older kernels that is 100% guaranteed, thanks to various
assumptions made and design decisions taken over the years. There are a
couple of failure scenarious here:
- XSAVE-aware application running on pre-XSAVE kernel:
Such an application will be aware that the XSAVE information may not
exist, but needs to know (with high probability) that it isn't
present. We have CPUID.OSXSAVE, the uc_flags bit, and the magic
cookie to help here. ptrace can introduce the magic cookie falsely
into the state, but ptrace can introduce all kinds of failures;
either way they would (probablistically) not see the *second*
cookie.
The fact that 64-bit kernels don't clear the unused fields is of
less concern, since 64-bit kernels get the uc_flags field.
- Pre-XSAVE application running on XSAVE kernel with XSAVE enabled:
Here we have the potential for all kinds of corrupt state, including
userspace trying to save away the state and load it later, not
knowing the proper size of it. Worse, some sick person might try to
save and restore state from different hosts, with potential for
all kinds of mayhem.
The saved state, if copied from the original, would contain the first
cookie but not the second cookie.
Again, the use of two cookies here adds some amount of assurance;
but that again amounts to probabilistic failure detection. However,
I personally don't see any way to avoid that scenario at all.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 16:57 ` H. Peter Anvin
@ 2008-05-23 17:50 ` Suresh Siddha
0 siblings, 0 replies; 57+ messages in thread
From: Suresh Siddha @ 2008-05-23 17:50 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Roland McGrath, Suresh Siddha, Mikael Pettersson, Andi Kleen,
mingo, tglx, torvalds, akpm, drepper, Hongjiu.lu, linux-kernel,
arjan, rmk+lkml, dan, asit.k.mallick
On Fri, May 23, 2008 at 09:57:32AM -0700, H. Peter Anvin wrote:
> Roland McGrath wrote:
> >>What I was doing in the RFC is: restore the state what ever that was
> >>present and init the state that was not present in the stack frame.
> >
> >That is consistent in spirit with the existing treatment of FPU data.
> >That is, if the sigcontext.fpstate pointer is NULL, the thread's FPU
> >state is reset to default. (And despite what hpa said about being
> >"supported", the facts in the code are that sigreturn just follows the
> >sigcontext.fpstate pointer, whatever it is. On 32-bit, the pointer is
> >NULL in the context saved when the thread had not used the FPU, so
> >modifying the sigcontext to include FP state when it didn't before
> >requires putting in some user-chosen pointer. There in fact may well be
> >existing code that does user-level coroutine switching using sigreturn
> >and relies on this, for all we know.)
> >
>
> Okay. Pretty much what it comes down to is that there is no ideal
> solution. Thus, we're trying to explore the potential tradeoffs. The
> scenario you describe above will crash horribly for a non-FXSAVE aware
> application running on an FXSAVE kernel.
yes, who ever added fxsave extensions missed this and we def want to handle
these kind of scenarios in xsave context.
> Either way, there has been a long time since, and new bad applications
> have obviously emerged, partially "assisted" by our propensity to not
> document, and the deep gulf between our kernel and userspace developers.
>
> Let's try another strawman on for size:
>
> - It is clear it is desirable not just for the frame itself but for the
> fpstate to be self-describing.
Yes.
>
> - Thus, let's put a magic cookie in one of the reserved fields at the
> end of the FXSAVE region, and make sure it is long enough to be unlikely
> to pop up randomly; as well as another magic cookie outside the FXSAVE
> region.
>
> - The signal delivery code will write the cookie (or zero, for !XSAVE)
> regardless of any crap ptrace might have written into it.
>
> - We will ALSO set bit 0 in uc_flags for RT sigframes as an additional
> assurance.
>
> - We will introduce at least a 32-bit field for future use, to be
> written unconditionally zero for now. We don't want to have to go
> through this particular torture yet again.
This is exactly what I have been trying to say. CPU folks are going
to document that bytes[464..511] of fxsave frame are going to be
SW usable. We can use some of these to represent the cookie, size
of the xsave frame, state that is saved in the frame etc. And the
remaining unused bytes can be cleared to zero for future use.
you say:
> as well as another magic cookie outside the FXSAVE region.
Is this to avoid issues with memcpy, who end up copying only fxsave
info but not the extended state? We don't need another magic
cookie outside the fxsave region, we can include the
address of the _fpstate pointer along side of the first cookie
in the SW usable portion of the fxsave state.
> - The XSAVE state beyond the FXSAVE region needs to be self-describing.
> This may mean adding information not provided by the hardware.
In my RFC, I have added size of the xsave, state mask saved in xsave
frame. I will move this to the sw usable portion of fxsave. We can
add any thing more if desired.
> Furthermore, it must be possible for userspace to know the length of the
> frame, even if it doesn't understand its detailed contents.
With the above, we are just changing the _fpstate layout (which
includes the length). But if we need frame size in the
sigframe layout, we can add it (and it can be done outside the scope of xsave
patches).
> None of this is foolproof on older kernels -- there simply *IS* no
> option for older kernels that is 100% guaranteed, thanks to various
> assumptions made and design decisions taken over the years. There are a
> couple of failure scenarious here:
>
> - XSAVE-aware application running on pre-XSAVE kernel:
>
> Such an application will be aware that the XSAVE information may not
> exist, but needs to know (with high probability) that it isn't
> present. We have CPUID.OSXSAVE, the uc_flags bit, and the magic
> cookie to help here. ptrace can introduce the magic cookie falsely
> into the state, but ptrace can introduce all kinds of failures;
> either way they would (probablistically) not see the *second*
> cookie.
Application can use cpuid.OSXSAVE (even in
paravirtualization, hypervisors will prevent the leak of
cpuid.osxsave for kernels which don't use it) reliably. And also,
we have uc_flags (for rt-frame) and other cookies in the SW usable
portion of fxsave frame.
In all the cases, kernel should be as graceful as it can be. xsave
aware kernel should try to find and restore xsave state and if there
is any failure, try to restore just the fpstate and init the rest.
> The fact that 64-bit kernels don't clear the unused fields is of
> less concern, since 64-bit kernels get the uc_flags field.
>
> - Pre-XSAVE application running on XSAVE kernel with XSAVE enabled:
>
> Here we have the potential for all kinds of corrupt state, including
> userspace trying to save away the state and load it later, not
> knowing the proper size of it. Worse, some sick person might try to
> save and restore state from different hosts, with potential for
> all kinds of mayhem.
>
> The saved state, if copied from the original, would contain the first
> cookie but not the second cookie.
>
> Again, the use of two cookies here adds some amount of assurance;
I think the pointer along with the first cookie is enough right? There
is no SW usable portion in the xsave layout outside the fxsave area.
And for compatibility reasons, its best to keep the xsave layout
similar to what the cpu uses.
> but that again amounts to probabilistic failure detection. However,
> I personally don't see any way to avoid that scenario at all.
In this case, kernel will atleast not corrupt the fxsave state, which
the application cares about.
thanks,
suresh
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-23 0:42 ` Suresh Siddha
2008-05-23 1:33 ` Roland McGrath
@ 2008-05-23 2:27 ` H. Peter Anvin
1 sibling, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-23 2:27 UTC (permalink / raw)
To: Suresh Siddha
Cc: Mikael Pettersson, Andi Kleen, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Suresh Siddha wrote:
>>>
>>> The kernel needs to accept one(*) of the formats it can produce, which
>>> is not necessarily what it last produced. It's not inconceivable that
>>> user-space will construct sigframes on the fly (to emulate setcontext),
>>> or that it will mangle sigframes (e.g. to map non-rt to rt before
>>> sigreturn).
>>>
>>> (*) The format is determined by which version of sys_sigreturn the
>>> user invokes.
>>>
>> No. You CANNOT restore from a frame that doesn't have the full state -
>> you don't have enough information to do so!
>
> What I was doing in the RFC is: restore the state what ever that was present and
> init the state that was not present in the stack frame.
Either way, I find it somewhat surprising that the user would invoke a
different sys_sigreturn especially if using a restorer function (which
gcc always does.)
I really think I need to understand your application better,
*especially* in the light of the fact you wouldn't at the moment know
how even get the size of the frame.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 15:20 ` Mikael Pettersson
2008-05-20 17:53 ` Suresh Siddha
@ 2008-05-20 17:57 ` H. Peter Anvin
1 sibling, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-20 17:57 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Andi Kleen, Suresh Siddha, mingo, tglx, torvalds, akpm, roland,
drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Mikael Pettersson wrote:
>
> An ugly workaround could be to start clearing one of these fields,
> and say that the data there is only valid for kernels >= 2.6.26.
> (I said it was ugly...)
>
> Or we go back to stashing a flag in uc_flags (which is kosher),
> and try to figure out how to mark non-rt sigframes.
>
Or use the OSXSAVE hack. Let me think about this for some time.
There is also the option of simply using a field guarded by a 64- or
128-bit magic number. It's a bit ugly, but it works.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 10:01 ` Andi Kleen
2008-05-20 13:19 ` Mikael Pettersson
@ 2008-05-20 14:55 ` H. Peter Anvin
2008-05-20 15:03 ` Andi Kleen
1 sibling, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-20 14:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Suresh Siddha, Mikael Pettersson, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
Andi Kleen wrote:
>> Ok. CPU folks are planning to make some of the bytes at the end of fxsave
>> image, SW usable.
>
> Are they always zeroed in earlier CPUs though? If not that wouldn't
> work 100% reliably because whatever cookie you put in could have been
> there before by chance.
>
> I don't see anything in the SDM guaranteeing zeroing.
>
I'm pretty sure they weren't zeroed by the CPUs. If they weren't zeroed
*by the kernel*, there might have been an information leak.
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 14:55 ` H. Peter Anvin
@ 2008-05-20 15:03 ` Andi Kleen
2008-05-20 20:10 ` Roland McGrath
0 siblings, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2008-05-20 15:03 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Suresh Siddha, Mikael Pettersson, mingo, tglx, torvalds, akpm,
roland, drepper, Hongjiu.lu, linux-kernel, arjan, rmk+lkml, dan,
asit.k.mallick
H. Peter Anvin wrote:
> Andi Kleen wrote:
>>> Ok. CPU folks are planning to make some of the bytes at the end of
>>> fxsave
>>> image, SW usable.
>>
>> Are they always zeroed in earlier CPUs though? If not that wouldn't
>> work 100% reliably because whatever cookie you put in could have been
>> there before by chance.
>>
>> I don't see anything in the SDM guaranteeing zeroing.
>>
>
> I'm pretty sure they weren't zeroed by the CPUs. If they weren't zeroed
> *by the kernel*, there might have been an information leak.
I don't think there is one. We never copy fxsave completely out of the
kernel. x86-64 does FXSAVE directly in/out user space, but the
only leak is what there was before.
-Andi
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 15:03 ` Andi Kleen
@ 2008-05-20 20:10 ` Roland McGrath
2008-05-22 0:05 ` H. Peter Anvin
0 siblings, 1 reply; 57+ messages in thread
From: Roland McGrath @ 2008-05-20 20:10 UTC (permalink / raw)
To: Andi Kleen
Cc: H. Peter Anvin, Suresh Siddha, Mikael Pettersson, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
> I don't think there is one. We never copy fxsave completely out of the
> kernel. x86-64 does FXSAVE directly in/out user space, but the
> only leak is what there was before.
ptrace/user_regset copies out and in the whole fxsave block from the ptrace
caller. (Only the mxcsr word is constrained after copy-in.)
Thanks,
Roland
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-20 20:10 ` Roland McGrath
@ 2008-05-22 0:05 ` H. Peter Anvin
2008-05-22 0:47 ` Roland McGrath
0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2008-05-22 0:05 UTC (permalink / raw)
To: Roland McGrath
Cc: Andi Kleen, Suresh Siddha, Mikael Pettersson, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
Roland McGrath wrote:
>> I don't think there is one. We never copy fxsave completely out of the
>> kernel. x86-64 does FXSAVE directly in/out user space, but the
>> only leak is what there was before.
>
> ptrace/user_regset copies out and in the whole fxsave block from the ptrace
> caller. (Only the mxcsr word is constrained after copy-in.)
I see two problems with that:
1. potential information leak out of the kernel if the memory area isn't
zeroed before the first FXSAVE - I haven't verified if so is the case.
This would be a (potentially very serious) security hole.
2. Hidden state in the kernel - this means user space can set
nonarchitectural state in the kernel. There are a few risks with that:
a. Malware might use it to hide state.
b. The possibility of using the stability or lack thereof of this
state to extract information about kernel internals and/or
provide a covert channel in the presence of hardware changes.
c. It is not certain that future architectures will not have
off-limit fields here, like the equivalent of MXCSR. This is
somewhat of a tricky judgement, of course, but it seems safer
to me if we would explicitly list the modifiable fields.
Thoughts?
-hpa
^ permalink raw reply [flat|nested] 57+ messages in thread* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 0:05 ` H. Peter Anvin
@ 2008-05-22 0:47 ` Roland McGrath
2008-05-22 8:14 ` Andi Kleen
0 siblings, 1 reply; 57+ messages in thread
From: Roland McGrath @ 2008-05-22 0:47 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andi Kleen, Suresh Siddha, Mikael Pettersson, mingo, tglx,
torvalds, akpm, drepper, Hongjiu.lu, linux-kernel, arjan,
rmk+lkml, dan, asit.k.mallick
> 1. potential information leak out of the kernel if the memory area isn't
> zeroed before the first FXSAVE - I haven't verified if so is the case.
> This would be a (potentially very serious) security hole.
Not a problem (see init_fpu).
> 2. Hidden state in the kernel - this means user space can set
> nonarchitectural state in the kernel. There are a few risks with that:
>
> a. Malware might use it to hide state.
Meh. You can already use FP reg space when abusing non-FP-using programs
(via ptrace or signal frame clobbering). Via ptrace, you can use %db[0-3]
without setting %db7 to make them affect hardware. etc, etc.
> b. The possibility of using the stability or lack thereof of this
> state to extract information about kernel internals and/or
> provide a covert channel in the presence of hardware changes.
> c. It is not certain that future architectures will not have
> off-limit fields here, like the equivalent of MXCSR. This is
> somewhat of a tricky judgement, of course, but it seems safer
> to me if we would explicitly list the modifiable fields.
Meh. The status quo is that (aside from initialization) it gets
whatever the hardware does. FXRSTOR is an unprivileged instruction.
AFAIK the only reason we control the MXCSR value is because we don't
want to handle a #GP in context switch. That would happen if reserved
bits were set in MXCSR. We already have code to do fxrstor and handle
it, which sigreturn uses. We could just use that if it weren't so
trivial to keep MXCSR safe--but since it is, why not keep one fewer
entry in the exception fixup table?
The hardware people know that it is unprivileged and is used in this way
where privileged code uses a data block received from unprivileged code.
If they add things that can make the data lead to new kinds of
exceptions, they will make it require the OS enable something, just like
we already have to set OSFXSR for the MXCSR reserved bits #GP case to be
possible.
Any piece of software state we start using in this buffer just has to
meet these same expectations. (i.e. if any value could cause lossage to
anything but the user's own memory and user-mode CPU state, then we have
to police it at sigreturn/user_regset.set time.)
Thanks,
Roland
^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [RFC] x86: xsave/xrstor support, ucontext_t extensions
2008-05-22 0:47 ` Roland McGrath
@ 2008-05-22 8:14 ` Andi Kleen
0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2008-05-22 8:14 UTC (permalink / raw)
To: Roland McGrath
Cc: H. Peter Anvin, Andi Kleen, Suresh Siddha, Mikael Pettersson,
mingo, tglx, torvalds, akpm, drepper, Hongjiu.lu, linux-kernel,
arjan, rmk+lkml, dan, asit.k.mallick
> AFAIK the only reason we control the MXCSR value is because we don't
> want to handle a #GP in context switch.
x86-64 actually handles it in context switch, just i386 doesn't.
-Andi
^ permalink raw reply [flat|nested] 57+ messages in thread
end of thread, other threads:[~2008-06-06 23:17 UTC | newest]
Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-13 1:10 [RFC] x86: xsave/xrstor support, ucontext_t extensions Suresh Siddha
2008-05-16 13:26 ` Mikael Pettersson
2008-05-18 1:34 ` Suresh Siddha
2008-05-19 14:52 ` Mikael Pettersson
2008-05-19 15:04 ` Andi Kleen
2008-05-19 16:29 ` H. Peter Anvin
2008-05-19 16:57 ` Suresh Siddha
2008-05-19 17:45 ` H. Peter Anvin
2008-05-20 1:57 ` Suresh Siddha
2008-05-20 8:58 ` Mikael Pettersson
2008-05-20 10:01 ` Andi Kleen
2008-05-20 13:19 ` Mikael Pettersson
2008-05-20 14:58 ` H. Peter Anvin
2008-05-20 15:20 ` Mikael Pettersson
2008-05-20 17:53 ` Suresh Siddha
2008-05-20 17:59 ` H. Peter Anvin
2008-05-22 0:28 ` H. Peter Anvin
2008-05-22 0:53 ` Roland McGrath
2008-05-22 1:38 ` H. Peter Anvin
2008-05-22 6:40 ` Roland McGrath
2008-05-22 7:18 ` H. Peter Anvin
2008-05-22 8:49 ` Mikael Pettersson
2008-05-22 8:57 ` Mikael Pettersson
2008-05-22 20:56 ` Suresh Siddha
2008-05-22 21:02 ` H. Peter Anvin
2008-05-22 21:29 ` Suresh Siddha
2008-05-22 21:34 ` H. Peter Anvin
2008-05-22 22:22 ` Mikael Pettersson
2008-05-23 1:48 ` Suresh Siddha
2008-05-23 2:12 ` Roland McGrath
2008-05-23 2:49 ` H. Peter Anvin
2008-05-23 18:09 ` Suresh Siddha
2008-06-06 0:28 ` x86: xsave/xrstor support; " H. Peter Anvin
2008-06-06 20:14 ` Suresh Siddha
2008-06-06 23:03 ` H. Peter Anvin
2008-05-23 2:45 ` [RFC] x86: xsave/xrstor support, " H. Peter Anvin
2008-05-23 11:46 ` Mikael Pettersson
2008-05-23 12:11 ` Andi Kleen
2008-05-22 21:32 ` Chris Wright
2008-05-22 22:15 ` Mikael Pettersson
2008-05-22 22:29 ` Chris Wright
2008-05-23 0:32 ` H. Peter Anvin
2008-05-23 0:44 ` Chris Wright
2008-05-22 22:36 ` Mikael Pettersson
2008-05-23 0:33 ` H. Peter Anvin
2008-05-23 0:42 ` Suresh Siddha
2008-05-23 1:33 ` Roland McGrath
2008-05-23 16:57 ` H. Peter Anvin
2008-05-23 17:50 ` Suresh Siddha
2008-05-23 2:27 ` H. Peter Anvin
2008-05-20 17:57 ` H. Peter Anvin
2008-05-20 14:55 ` H. Peter Anvin
2008-05-20 15:03 ` Andi Kleen
2008-05-20 20:10 ` Roland McGrath
2008-05-22 0:05 ` H. Peter Anvin
2008-05-22 0:47 ` Roland McGrath
2008-05-22 8:14 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox