* [PATCH 0/2] x86: allow to enable/disable modify_ldt at run time @ 2015-08-03 18:23 Willy Tarreau 2015-08-03 18:23 ` [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values Willy Tarreau ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 18:23 UTC (permalink / raw) To: Andy Lutomirski, Kees Cook Cc: Willy Tarreau, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel This is the second version. It adds a strategy for the sysctls so that we can reject any change to a value that was already negative. This way it's possible to disable modify_ldt temporarily or permanently (eg: lock down a server) as suggested by Kees. Willy Tarreau (2): sysctl: add a new generic strategy to make permanent changes on negative values x86/ldt: allow to disable modify_ldt at runtime Documentation/sysctl/kernel.txt | 16 +++++++++++++ arch/x86/Kconfig | 17 ++++++++++++++ arch/x86/kernel/ldt.c | 15 +++++++++++++ kernel/sysctl.c | 50 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 98 insertions(+) -- 1.7.12.1 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values 2015-08-03 18:23 [PATCH 0/2] x86: allow to enable/disable modify_ldt at run time Willy Tarreau @ 2015-08-03 18:23 ` Willy Tarreau 2015-08-03 18:33 ` Andy Lutomirski 2015-08-03 18:23 ` [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime Willy Tarreau 2015-08-04 8:49 ` [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time Willy Tarreau 2 siblings, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 18:23 UTC (permalink / raw) To: Andy Lutomirski, Kees Cook Cc: Willy Tarreau, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel The new function is proc_dointvec_minmax_negperm(), it refuses to change the value if the current one is already negative. This will be used to lock down some settings such as sensitive system calls. Signed-off-by: Willy Tarreau <w@1wt.eu> --- kernel/sysctl.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 19b62b5..86c95a8 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -185,6 +185,9 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); #endif +static int proc_dointvec_minmax_negperm(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos); + static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); #ifdef CONFIG_COREDUMP @@ -2249,6 +2252,33 @@ static void validate_coredump_safety(void) #endif } +/* Like minmax except that it refuses any change if the value was already + * negative. It silently ignores overrides with the same negative value. + */ +static int do_proc_dointvec_negperm_conv(bool *negp, unsigned long *lvalp, + int *valp, + int write, void *data) +{ + if (write && *valp < 0 && (!*negp || *valp != (int)*lvalp)) + return -EINVAL; + + return do_proc_dointvec_minmax_conv(negp, lvalp, valp, write, data); +} + +/* Like proc_dointvec_minmax() except that it refuses any change once + * the destination is negative. Used to permanently disable some settings. + */ +static int proc_dointvec_minmax_negperm(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + struct do_proc_dointvec_minmax_conv_param param = { + .min = (int *) table->extra1, + .max = (int *) table->extra2, + }; + return do_proc_dointvec(table, write, buffer, lenp, ppos, + do_proc_dointvec_negperm_conv, ¶m); +} + static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { @@ -2751,6 +2781,12 @@ int proc_dointvec_minmax(struct ctl_table *table, int write, return -ENOSYS; } +static int proc_dointvec_minmax_negperm(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + return -ENOSYS; +} + int proc_dointvec_jiffies(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { -- 1.7.12.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values 2015-08-03 18:23 ` [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values Willy Tarreau @ 2015-08-03 18:33 ` Andy Lutomirski 2015-08-03 18:50 ` Willy Tarreau 0 siblings, 1 reply; 20+ messages in thread From: Andy Lutomirski @ 2015-08-03 18:33 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 3, 2015 at 11:23 AM, Willy Tarreau <w@1wt.eu> wrote: > The new function is proc_dointvec_minmax_negperm(), it refuses to change > the value if the current one is already negative. This will be used to > lock down some settings such as sensitive system calls. > > Signed-off-by: Willy Tarreau <w@1wt.eu> > --- > kernel/sysctl.c | 36 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 19b62b5..86c95a8 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -185,6 +185,9 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, > void __user *buffer, size_t *lenp, loff_t *ppos); > #endif > > +static int proc_dointvec_minmax_negperm(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, loff_t *ppos); > + > static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, > void __user *buffer, size_t *lenp, loff_t *ppos); > #ifdef CONFIG_COREDUMP > @@ -2249,6 +2252,33 @@ static void validate_coredump_safety(void) > #endif > } > > +/* Like minmax except that it refuses any change if the value was already > + * negative. It silently ignores overrides with the same negative value. > + */ > +static int do_proc_dointvec_negperm_conv(bool *negp, unsigned long *lvalp, > + int *valp, > + int write, void *data) > +{ > + if (write && *valp < 0 && (!*negp || *valp != (int)*lvalp)) I could easily have failed to follow the bizarre negative sign convention, but shouldn't that be "*valp != -(int)*lvalp" or similar? --Andy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values 2015-08-03 18:33 ` Andy Lutomirski @ 2015-08-03 18:50 ` Willy Tarreau 0 siblings, 0 replies; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 18:50 UTC (permalink / raw) To: Andy Lutomirski Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 03, 2015 at 11:33:30AM -0700, Andy Lutomirski wrote: > On Mon, Aug 3, 2015 at 11:23 AM, Willy Tarreau <w@1wt.eu> wrote: > > The new function is proc_dointvec_minmax_negperm(), it refuses to change > > the value if the current one is already negative. This will be used to > > lock down some settings such as sensitive system calls. > > > > Signed-off-by: Willy Tarreau <w@1wt.eu> > > --- > > kernel/sysctl.c | 36 ++++++++++++++++++++++++++++++++++++ > > 1 file changed, 36 insertions(+) > > > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > > index 19b62b5..86c95a8 100644 > > --- a/kernel/sysctl.c > > +++ b/kernel/sysctl.c > > @@ -185,6 +185,9 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, > > void __user *buffer, size_t *lenp, loff_t *ppos); > > #endif > > > > +static int proc_dointvec_minmax_negperm(struct ctl_table *table, int write, > > + void __user *buffer, size_t *lenp, loff_t *ppos); > > + > > static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, > > void __user *buffer, size_t *lenp, loff_t *ppos); > > #ifdef CONFIG_COREDUMP > > @@ -2249,6 +2252,33 @@ static void validate_coredump_safety(void) > > #endif > > } > > > > +/* Like minmax except that it refuses any change if the value was already > > + * negative. It silently ignores overrides with the same negative value. > > + */ > > +static int do_proc_dointvec_negperm_conv(bool *negp, unsigned long *lvalp, > > + int *valp, > > + int write, void *data) > > +{ > > + if (write && *valp < 0 && (!*negp || *valp != (int)*lvalp)) > > I could easily have failed to follow the bizarre negative sign > convention, but shouldn't that be "*valp != -(int)*lvalp" or similar? Not exactly since the sign is passed via negp apparently. There is an expression in the called function which first assigns lvalp or -lvalp to val depending on val, then uses the resulting value. The code above is the (simplified for me) equivalent of : int val = *negp ? -*lvalp : *lvalp; if (write && *valp < 0 && *valp != val) return -EINVAL; Maybe you find it more readable in which case I can redo it this way ? In my case it was the opposite in fact, I want to reject non-negative values as well as the negative ones not equal to *valp. Note that we could have decided to make it even simpler and always reject writes once *valp is < 0 but I find that it would be annoying for hardening scripts which would not be idempotent anymore. Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 18:23 [PATCH 0/2] x86: allow to enable/disable modify_ldt at run time Willy Tarreau 2015-08-03 18:23 ` [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values Willy Tarreau @ 2015-08-03 18:23 ` Willy Tarreau 2015-08-03 18:45 ` Andy Lutomirski 2015-08-03 22:35 ` Kees Cook 2015-08-04 8:49 ` [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time Willy Tarreau 2 siblings, 2 replies; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 18:23 UTC (permalink / raw) To: Andy Lutomirski, Kees Cook Cc: Willy Tarreau, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel For distros who prefer not to take the risk of completely disabling the modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a sysctl to enable, temporarily disable, or permanently disable it at runtime, and proposes to temporarily disable it by default. This can be a safe alternative. A message is logged if an attempt was stopped so that it's easy to spot if/when it is needed. Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Willy Tarreau <w@1wt.eu> --- Documentation/sysctl/kernel.txt | 16 ++++++++++++++++ arch/x86/Kconfig | 17 +++++++++++++++++ arch/x86/kernel/ldt.c | 15 +++++++++++++++ kernel/sysctl.c | 14 ++++++++++++++ 4 files changed, 62 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6fccb69..55648b9 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] +- modify_ldt [ X86 only ] - modprobe ==> Documentation/debugging-modules.txt - modules_disabled - msg_next_id [ sysv ipc ] @@ -391,6 +392,21 @@ This flag controls the L2 cache of G3 processor boards. If ============================================================== +modify_ldt: (X86 only) + +Enables (1), disables (0) or permanently disables (-1) the modify_ldt syscall. +Modifying the LDT (Local Descriptor Table) may be needed to run a 16-bit or +segmented code such as Dosemu or Wine. This is done via a system call which is +not needed to run portable applications, and which can sometimes be abused to +exploit some weaknesses of the architecture, opening new vulnerabilities. + +This sysctl allows one to increase the system's security by disabling the +system call, or to restore compatibility with specific applications when it +was already disabled. When permanently disabled, it is not possible to change +the value anymore until the next system reboot. + +============================================================== + modules_disabled: A toggle value indicating if modules are allowed to be loaded diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beabf30..88d10a0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL surface. Disabling it removes the modify_ldt(2) system call. Saying 'N' here may make sense for embedded or server kernels. + If really unsure, say 'Y', you'll be able to disable it at runtime. + +config DEFAULT_MODIFY_LDT_SYSCALL + bool "Allow userspace to modify the LDT by default" + depends on MODIFY_LDT_SYSCALL + default y + ---help--- + Modifying the LDT (Local Descriptor Table) may be needed to run a + 16-bit or segmented code such as Dosemu or Wine. This is done via + a system call which is not needed to run portable applications, + and which can sometimes be abused to exploit some weaknesses of + the architecture, opening new vulnerabilities. + + For this reason this option allows one to enable or disable the + feature at runtime. It is recommended to say 'N' here to leave + the system protected, and to enable it at runtime only if needed + by setting the sys.kernel.modify_ldt sysctl. source "kernel/livepatch/Kconfig" diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 2bcc052..420fc8f 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -11,6 +11,7 @@ #include <linux/sched.h> #include <linux/string.h> #include <linux/mm.h> +#include <linux/ratelimit.h> #include <linux/smp.h> #include <linux/slab.h> #include <linux/vmalloc.h> @@ -21,6 +22,11 @@ #include <asm/mmu_context.h> #include <asm/syscalls.h> +#ifdef CONFIG_MODIFY_LDT_SYSCALL +int sysctl_modify_ldt __read_mostly = + IS_ENABLED(CONFIG_DEFAULT_MODIFY_LDT_SYSCALL); +#endif + /* context.lock is held for us, so we don't need any locking. */ static void flush_ldt(void *current_mm) { @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (sysctl_modify_ldt <= 0) { + printk_ratelimited(KERN_INFO + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." + " Adjust sysctl if this was not an exploit attempt.\n", + current->comm, task_pid_nr(current), + from_kuid_munged(current_user_ns(), current_uid())); + return ret; + } + switch (func) { case 0: ret = read_ldt(ptr, bytecount); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 86c95a8..ec1170d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max; #ifndef CONFIG_MMU extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_MODIFY_LDT_SYSCALL +extern int sysctl_modify_ldt; +#endif /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR @@ -963,6 +966,17 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_MODIFY_LDT_SYSCALL + { + .procname = "modify_ldt", + .data = &sysctl_modify_ldt, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax_negperm, + .extra1 = &neg_one, + .extra2 = &one, + }, +#endif #endif #if defined(CONFIG_MMU) { -- 1.7.12.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 18:23 ` [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime Willy Tarreau @ 2015-08-03 18:45 ` Andy Lutomirski 2015-08-03 19:01 ` Willy Tarreau 2015-08-04 3:54 ` Borislav Petkov 2015-08-03 22:35 ` Kees Cook 1 sibling, 2 replies; 20+ messages in thread From: Andy Lutomirski @ 2015-08-03 18:45 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 3, 2015 at 11:23 AM, Willy Tarreau <w@1wt.eu> wrote: > For distros who prefer not to take the risk of completely disabling the > modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a > sysctl to enable, temporarily disable, or permanently disable it at > runtime, and proposes to temporarily disable it by default. This can be > a safe alternative. A message is logged if an attempt was stopped so that > it's easy to spot if/when it is needed. > > Cc: Andy Lutomirski <luto@kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Signed-off-by: Willy Tarreau <w@1wt.eu> > --- > Documentation/sysctl/kernel.txt | 16 ++++++++++++++++ > arch/x86/Kconfig | 17 +++++++++++++++++ > arch/x86/kernel/ldt.c | 15 +++++++++++++++ > kernel/sysctl.c | 14 ++++++++++++++ > 4 files changed, 62 insertions(+) > > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index 6fccb69..55648b9 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: > - kptr_restrict > - kstack_depth_to_print [ X86 only ] > - l2cr [ PPC only ] > +- modify_ldt [ X86 only ] > - modprobe ==> Documentation/debugging-modules.txt > - modules_disabled > - msg_next_id [ sysv ipc ] > @@ -391,6 +392,21 @@ This flag controls the L2 cache of G3 processor boards. If > > ============================================================== > > +modify_ldt: (X86 only) > + > +Enables (1), disables (0) or permanently disables (-1) the modify_ldt syscall. > +Modifying the LDT (Local Descriptor Table) may be needed to run a 16-bit or > +segmented code such as Dosemu or Wine. This is done via a system call which is > +not needed to run portable applications, and which can sometimes be abused to > +exploit some weaknesses of the architecture, opening new vulnerabilities. > + I'm not entirely convinced that the lock bit should work this way. At some point, we might want a setting for "32-bit only" or even "32-bit, present, not non-conforming only" (like we do unconditionally for set_thread_area). When we do that, having -1 act like 0 might be confusing. I'd actually favor rigging it up to support enumerated values and/or the word "locked" somewhere in the text. So we could have "0", "1 locked", "1" or even "enabled" "enabled locked", "disabled", "disabled locked", "safe 32-bit", "safe 32-bit locked", etc. I'll add an explicit 16-bit check to my infinite todo list for the asm part. Now that the synchronous modify_ldt code is merged, it won't be racy, and it would make a 32-bit only mode actually be useful (except maybe on AMD -- someone needs to test just how badly broken IRET is on AMD systems -- I know that AMD has IRET-to-16-bit differently broken from Intel, and that causes test-cast failures. Grump.) P.S. Hey CPU vendors: please consider stopping your utter suckage when it comes to critical system instructions. Intel and AMD both terminally screwed up IRET in multiple ways that clearly took actual effort. Intel screwed up SYSRET pretty badly (AFAIK every single 64-bit OS has had at least one root hole as a result), and AMD screwed SYSRET up differently (userspace crash bug that requires a performance hit to mitigate because no one at AMD realized that one might preempt a process during a syscall). P.P.S. You know what would be *way* better than allowing IRET to fault? Just allow IRET to continue executing the next instruction on failure (I'm talking about #GP, #NP, and #SS here, not page faults). P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense whatsoever when NMIs run on an IST stack? Seriously, people? --Andy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 18:45 ` Andy Lutomirski @ 2015-08-03 19:01 ` Willy Tarreau 2015-08-03 19:06 ` Andy Lutomirski 2015-08-04 3:54 ` Borislav Petkov 1 sibling, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 19:01 UTC (permalink / raw) To: Andy Lutomirski Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 03, 2015 at 11:45:24AM -0700, Andy Lutomirski wrote: > I'm not entirely convinced that the lock bit should work this way. At > some point, we might want a setting for "32-bit only" or even "32-bit, > present, not non-conforming only" (like we do unconditionally for > set_thread_area). When we do that, having -1 act like 0 might be > confusing. > > I'd actually favor rigging it up to support enumerated values and/or > the word "locked" somewhere in the text. So we could have "0", "1 > locked", "1" or even "enabled" "enabled locked", "disabled", "disabled > locked", "safe 32-bit", "safe 32-bit locked", etc. Got it, that makes sense indeed. I asked myself whether we'd use more than these 3 values, and estimated that "locked on" didn't make much sense here, and I thought that nobody would like to manipulate such things using bitmaps. But with words like this it can indeed make sense. I feel like it's probably part of a larger project then. Do you think we should step back and only support 0/1 for now ? I also have the patch available. > I'll add an explicit 16-bit check to my infinite todo list for the asm > part. Now that the synchronous modify_ldt code is merged, it won't be > racy, and it would make a 32-bit only mode actually be useful (except > maybe on AMD -- someone needs to test just how badly broken IRET is on > AMD systems -- I know that AMD has IRET-to-16-bit differently broken > from Intel, and that causes test-cast failures. Grump.) > > P.S. Hey CPU vendors: please consider stopping your utter suckage when > it comes to critical system instructions. Intel and AMD both > terminally screwed up IRET in multiple ways that clearly took actual > effort. Intel screwed up SYSRET pretty badly (AFAIK every single > 64-bit OS has had at least one root hole as a result), and AMD screwed > SYSRET up differently (userspace crash bug that requires a performance > hit to mitigate because no one at AMD realized that one might preempt > a process during a syscall). Well the good thing is that SYSRET reused the LOADALL opcode so at least this one cannot be screwed on 64-bit :-) It would have helped us to emulate IRET though. > P.P.S. You know what would be *way* better than allowing IRET to > fault? Just allow IRET to continue executing the next instruction on > failure (I'm talking about #GP, #NP, and #SS here, not page faults). > > P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense > whatsoever when NMIs run on an IST stack? Seriously, people? A dedicated flag "don't clear NMI yet" would have been nice in EFLAGS so that the software stack could set it in fault handlers. It would be one-shot and always cleared by IRET. That would have been very handy. Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 19:01 ` Willy Tarreau @ 2015-08-03 19:06 ` Andy Lutomirski 2015-08-03 19:18 ` Willy Tarreau 0 siblings, 1 reply; 20+ messages in thread From: Andy Lutomirski @ 2015-08-03 19:06 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 3, 2015 at 12:01 PM, Willy Tarreau <w@1wt.eu> wrote: > On Mon, Aug 03, 2015 at 11:45:24AM -0700, Andy Lutomirski wrote: >> I'm not entirely convinced that the lock bit should work this way. At >> some point, we might want a setting for "32-bit only" or even "32-bit, >> present, not non-conforming only" (like we do unconditionally for >> set_thread_area). When we do that, having -1 act like 0 might be >> confusing. >> >> I'd actually favor rigging it up to support enumerated values and/or >> the word "locked" somewhere in the text. So we could have "0", "1 >> locked", "1" or even "enabled" "enabled locked", "disabled", "disabled >> locked", "safe 32-bit", "safe 32-bit locked", etc. > > Got it, that makes sense indeed. I asked myself whether we'd use more > than these 3 values, and estimated that "locked on" didn't make much > sense here, and I thought that nobody would like to manipulate such > things using bitmaps. But with words like this it can indeed make > sense. > > I feel like it's probably part of a larger project then. Do you think > we should step back and only support 0/1 for now ? I also have the > patch available. Sounds good to me. > >> I'll add an explicit 16-bit check to my infinite todo list for the asm >> part. Now that the synchronous modify_ldt code is merged, it won't be >> racy, and it would make a 32-bit only mode actually be useful (except >> maybe on AMD -- someone needs to test just how badly broken IRET is on >> AMD systems -- I know that AMD has IRET-to-16-bit differently broken >> from Intel, and that causes test-cast failures. Grump.) >> >> P.S. Hey CPU vendors: please consider stopping your utter suckage when >> it comes to critical system instructions. Intel and AMD both >> terminally screwed up IRET in multiple ways that clearly took actual >> effort. Intel screwed up SYSRET pretty badly (AFAIK every single >> 64-bit OS has had at least one root hole as a result), and AMD screwed >> SYSRET up differently (userspace crash bug that requires a performance >> hit to mitigate because no one at AMD realized that one might preempt >> a process during a syscall). > > Well the good thing is that SYSRET reused the LOADALL opcode so at > least this one cannot be screwed on 64-bit :-) It would have helped us > to emulate IRET though. You sure? I'm reasonably confident that Athlon 64 and newer support SYSRET in legacy and long mode. Of course, I think that SYSCALL is totally worthless in legacy mode (SYSCALL_MASK isn't available, so I suspect that the lack of sensible TF handling would be a show-stopper). > >> P.P.S. You know what would be *way* better than allowing IRET to >> fault? Just allow IRET to continue executing the next instruction on >> failure (I'm talking about #GP, #NP, and #SS here, not page faults). >> >> P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense >> whatsoever when NMIs run on an IST stack? Seriously, people? > > A dedicated flag "don't clear NMI yet" would have been nice in EFLAGS > so that the software stack could set it in fault handlers. It would be > one-shot and always cleared by IRET. That would have been very handy. > How about a dedicated "NMI masked" flag in EFLAGS? That would be straightforward and dead simple to handle. --Andy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 19:06 ` Andy Lutomirski @ 2015-08-03 19:18 ` Willy Tarreau 0 siblings, 0 replies; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 19:18 UTC (permalink / raw) To: Andy Lutomirski Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 03, 2015 at 12:06:12PM -0700, Andy Lutomirski wrote: > On Mon, Aug 3, 2015 at 12:01 PM, Willy Tarreau <w@1wt.eu> wrote: (...) > > I feel like it's probably part of a larger project then. Do you think > > we should step back and only support 0/1 for now ? I also have the > > patch available. > > Sounds good to me. OK I'll send the other one instead once I unpack my PC. > > Well the good thing is that SYSRET reused the LOADALL opcode so at > > least this one cannot be screwed on 64-bit :-) It would have helped us > > to emulate IRET though. > > You sure? I'm reasonably confident that Athlon 64 and newer support > SYSRET in legacy and long mode. Of course, I think that SYSCALL is > totally worthless in legacy mode (SYSCALL_MASK isn't available, so I > suspect that the lack of sensible TF handling would be a > show-stopper). I meant loadall cannot be screwed since it was replaced. > >> P.P.S. You know what would be *way* better than allowing IRET to > >> fault? Just allow IRET to continue executing the next instruction on > >> failure (I'm talking about #GP, #NP, and #SS here, not page faults). > >> > >> P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense > >> whatsoever when NMIs run on an IST stack? Seriously, people? > > > > A dedicated flag "don't clear NMI yet" would have been nice in EFLAGS > > so that the software stack could set it in fault handlers. It would be > > one-shot and always cleared by IRET. That would have been very handy. > > > > How about a dedicated "NMI masked" flag in EFLAGS? That would be > straightforward and dead simple to handle. Sounds like an oxymoron. But such a flag should be atomically manipulated so that you don't re-arm queued NMIs before calling iret. With two flags, a read-only one for "NMI masked" and a modifiable one "keep NMI masked", you can provide an atomic behaviour where you have this latch executed on iret : NMI_MASKED &= KEEP_NMI_MASKED; KEEP_NMI_MASKED = 0; But anyway we're discussing in the void, this CPU doesn't exist so unless intel/AMD designers want to improve their design (and start by talking together to reach the exact same behavior), we'll never see anything like this :-/ Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 18:45 ` Andy Lutomirski 2015-08-03 19:01 ` Willy Tarreau @ 2015-08-04 3:54 ` Borislav Petkov 2015-08-04 6:00 ` Willy Tarreau 1 sibling, 1 reply; 20+ messages in thread From: Borislav Petkov @ 2015-08-04 3:54 UTC (permalink / raw) To: Andy Lutomirski Cc: Willy Tarreau, Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 03, 2015 at 11:45:24AM -0700, Andy Lutomirski wrote: > P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense > whatsoever when NMIs run on an IST stack? Seriously, people? What happened with asking Intel for a sane IRET-NG? Should be relatively easy - take the current IRET microcode, get rid of the nasty crap, allocate a new opcode and done. Validation should actually have *less* to do and can reuse all current test cases. :-) -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-04 3:54 ` Borislav Petkov @ 2015-08-04 6:00 ` Willy Tarreau 0 siblings, 0 replies; 20+ messages in thread From: Willy Tarreau @ 2015-08-04 6:00 UTC (permalink / raw) To: Borislav Petkov Cc: Andy Lutomirski, Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Tue, Aug 04, 2015 at 05:54:51AM +0200, Borislav Petkov wrote: > On Mon, Aug 03, 2015 at 11:45:24AM -0700, Andy Lutomirski wrote: > > P.P.P.S. Who thought that IRET faults unmasking NMIs made any sense > > whatsoever when NMIs run on an IST stack? Seriously, people? > > What happened with asking Intel for a sane IRET-NG? > > Should be relatively easy - take the current IRET microcode, get rid > of the nasty crap, allocate a new opcode and done. Validation should > actually have *less* to do and can reuse all current test cases. Even easier, just add a few flags (probably 2 or 3 only) that IRET can check to adjust its behaviour. Basically "don't re-enable NMIs yet", maybe something to adjust the behaviour on bad CS/SS/SP/IP and a few such things could possibly help. Maybe all of this could be summarized as a single flag "I'm in a fault handler". Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 18:23 ` [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime Willy Tarreau 2015-08-03 18:45 ` Andy Lutomirski @ 2015-08-03 22:35 ` Kees Cook 2015-08-03 23:19 ` Willy Tarreau 1 sibling, 1 reply; 20+ messages in thread From: Kees Cook @ 2015-08-03 22:35 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 3, 2015 at 11:23 AM, Willy Tarreau <w@1wt.eu> wrote: > For distros who prefer not to take the risk of completely disabling the > modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a > sysctl to enable, temporarily disable, or permanently disable it at > runtime, and proposes to temporarily disable it by default. This can be > a safe alternative. A message is logged if an attempt was stopped so that > it's easy to spot if/when it is needed. > > Cc: Andy Lutomirski <luto@kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Signed-off-by: Willy Tarreau <w@1wt.eu> > --- > Documentation/sysctl/kernel.txt | 16 ++++++++++++++++ > arch/x86/Kconfig | 17 +++++++++++++++++ > arch/x86/kernel/ldt.c | 15 +++++++++++++++ > kernel/sysctl.c | 14 ++++++++++++++ > 4 files changed, 62 insertions(+) > > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index 6fccb69..55648b9 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: > - kptr_restrict > - kstack_depth_to_print [ X86 only ] > - l2cr [ PPC only ] > +- modify_ldt [ X86 only ] > - modprobe ==> Documentation/debugging-modules.txt > - modules_disabled > - msg_next_id [ sysv ipc ] > @@ -391,6 +392,21 @@ This flag controls the L2 cache of G3 processor boards. If > > ============================================================== > > +modify_ldt: (X86 only) > + > +Enables (1), disables (0) or permanently disables (-1) the modify_ldt syscall. > +Modifying the LDT (Local Descriptor Table) may be needed to run a 16-bit or > +segmented code such as Dosemu or Wine. This is done via a system call which is > +not needed to run portable applications, and which can sometimes be abused to > +exploit some weaknesses of the architecture, opening new vulnerabilities. > + > +This sysctl allows one to increase the system's security by disabling the > +system call, or to restore compatibility with specific applications when it > +was already disabled. When permanently disabled, it is not possible to change > +the value anymore until the next system reboot. > + > +============================================================== > + > modules_disabled: > > A toggle value indicating if modules are allowed to be loaded > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index beabf30..88d10a0 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL > surface. Disabling it removes the modify_ldt(2) system call. > > Saying 'N' here may make sense for embedded or server kernels. > + If really unsure, say 'Y', you'll be able to disable it at runtime. > + > +config DEFAULT_MODIFY_LDT_SYSCALL > + bool "Allow userspace to modify the LDT by default" > + depends on MODIFY_LDT_SYSCALL > + default y > + ---help--- > + Modifying the LDT (Local Descriptor Table) may be needed to run a > + 16-bit or segmented code such as Dosemu or Wine. This is done via > + a system call which is not needed to run portable applications, > + and which can sometimes be abused to exploit some weaknesses of > + the architecture, opening new vulnerabilities. > + > + For this reason this option allows one to enable or disable the > + feature at runtime. It is recommended to say 'N' here to leave > + the system protected, and to enable it at runtime only if needed > + by setting the sys.kernel.modify_ldt sysctl. > > source "kernel/livepatch/Kconfig" > > diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c > index 2bcc052..420fc8f 100644 > --- a/arch/x86/kernel/ldt.c > +++ b/arch/x86/kernel/ldt.c > @@ -11,6 +11,7 @@ > #include <linux/sched.h> > #include <linux/string.h> > #include <linux/mm.h> > +#include <linux/ratelimit.h> > #include <linux/smp.h> > #include <linux/slab.h> > #include <linux/vmalloc.h> > @@ -21,6 +22,11 @@ > #include <asm/mmu_context.h> > #include <asm/syscalls.h> > > +#ifdef CONFIG_MODIFY_LDT_SYSCALL > +int sysctl_modify_ldt __read_mostly = > + IS_ENABLED(CONFIG_DEFAULT_MODIFY_LDT_SYSCALL); > +#endif > + > /* context.lock is held for us, so we don't need any locking. */ > static void flush_ldt(void *current_mm) > { > @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, > { > int ret = -ENOSYS; > > + if (sysctl_modify_ldt <= 0) { > + printk_ratelimited(KERN_INFO pr_info_ratelimited? *shrug* > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > + " Adjust sysctl if this was not an exploit attempt.\n", > + current->comm, task_pid_nr(current), > + from_kuid_munged(current_user_ns(), current_uid())); > + return ret; > + } > + > switch (func) { > case 0: > ret = read_ldt(ptr, bytecount); > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 86c95a8..ec1170d 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max; > #ifndef CONFIG_MMU > extern int sysctl_nr_trim_pages; > #endif > +#ifdef CONFIG_MODIFY_LDT_SYSCALL > +extern int sysctl_modify_ldt; > +#endif > > /* Constants used for minimum and maximum */ > #ifdef CONFIG_LOCKUP_DETECTOR > @@ -963,6 +966,17 @@ static struct ctl_table kern_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec, > }, > +#ifdef CONFIG_MODIFY_LDT_SYSCALL > + { > + .procname = "modify_ldt", > + .data = &sysctl_modify_ldt, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax_negperm, > + .extra1 = &neg_one, > + .extra2 = &one, > + }, > +#endif > #endif > #if defined(CONFIG_MMU) > { > -- > 1.7.12.1 > Yay for perm disable! Thank you! :) Acked-by: Kees Cook <keescook@chromium.org> -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 22:35 ` Kees Cook @ 2015-08-03 23:19 ` Willy Tarreau 2015-08-04 1:36 ` Kees Cook 0 siblings, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2015-08-03 23:19 UTC (permalink / raw) To: Kees Cook Cc: Andy Lutomirski, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 03, 2015 at 03:35:15PM -0700, Kees Cook wrote: > Yay for perm disable! Thank you! :) Andy would like to see this evolve towards something possibly more complete and/or generic. I think this needs more thoughts and that we should possibly stick to 0/1 for now and decide how we want to make this evolve later to cover permanent disable, various ABIs, etc... What do you think ? Willy ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime 2015-08-03 23:19 ` Willy Tarreau @ 2015-08-04 1:36 ` Kees Cook 0 siblings, 0 replies; 20+ messages in thread From: Kees Cook @ 2015-08-04 1:36 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Mon, Aug 3, 2015 at 4:19 PM, Willy Tarreau <w@1wt.eu> wrote: > On Mon, Aug 03, 2015 at 03:35:15PM -0700, Kees Cook wrote: >> Yay for perm disable! Thank you! :) > > Andy would like to see this evolve towards something possibly > more complete and/or generic. I think this needs more thoughts > and that we should possibly stick to 0/1 for now and decide how > we want to make this evolve later to cover permanent disable, > various ABIs, etc... > > What do you think ? That's probably the best way forward. I still think a generic syscall disabling feature would be nice. :) I won't have time to work on it for a little while, though. -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time 2015-08-03 18:23 [PATCH 0/2] x86: allow to enable/disable modify_ldt at run time Willy Tarreau 2015-08-03 18:23 ` [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values Willy Tarreau 2015-08-03 18:23 ` [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime Willy Tarreau @ 2015-08-04 8:49 ` Willy Tarreau 2015-08-05 8:00 ` Ingo Molnar 2 siblings, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2015-08-04 8:49 UTC (permalink / raw) To: Andy Lutomirski, Kees Cook Cc: Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel For distros who prefer not to take the risk of completely disabling the modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a sysctl to enable or disable it at runtime, and proposes to disable it by default. This can be a safe alternative. A message is logged if an attempt was stopped so that it's easy to spot if/when it is needed. Future improvements regarding permanent disabling will have to be done in consideration for other syscalls, ABIs and general use cases. Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Willy Tarreau <w@1wt.eu> --- So this is the third version which only allows to temporarily enable or disable the modify_ldt syscall. Permanent changes will have to be thought about later. It applies on top of Andy's series. Documentation/sysctl/kernel.txt | 15 +++++++++++++++ arch/x86/Kconfig | 17 +++++++++++++++++ arch/x86/kernel/ldt.c | 15 +++++++++++++++ kernel/sysctl.c | 14 ++++++++++++++ 4 files changed, 61 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6fccb69..60c7c7a 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] +- modify_ldt [ X86 only ] - modprobe ==> Documentation/debugging-modules.txt - modules_disabled - msg_next_id [ sysv ipc ] @@ -391,6 +392,20 @@ This flag controls the L2 cache of G3 processor boards. If ============================================================== +modify_ldt: (X86 only) + +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT +(Local Descriptor Table) may be needed to run a 16-bit or segmented code +such as Dosemu or Wine. This is done via a system call which is not needed +to run portable applications, and which can sometimes be abused to exploit +some weaknesses of the architecture, opening new vulnerabilities. + +This sysctl allows one to increase the system's security by disabling the +system call, or to restore compatibility with specific applications when it +was already disabled. + +============================================================== + modules_disabled: A toggle value indicating if modules are allowed to be loaded diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beabf30..88d10a0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL surface. Disabling it removes the modify_ldt(2) system call. Saying 'N' here may make sense for embedded or server kernels. + If really unsure, say 'Y', you'll be able to disable it at runtime. + +config DEFAULT_MODIFY_LDT_SYSCALL + bool "Allow userspace to modify the LDT by default" + depends on MODIFY_LDT_SYSCALL + default y + ---help--- + Modifying the LDT (Local Descriptor Table) may be needed to run a + 16-bit or segmented code such as Dosemu or Wine. This is done via + a system call which is not needed to run portable applications, + and which can sometimes be abused to exploit some weaknesses of + the architecture, opening new vulnerabilities. + + For this reason this option allows one to enable or disable the + feature at runtime. It is recommended to say 'N' here to leave + the system protected, and to enable it at runtime only if needed + by setting the sys.kernel.modify_ldt sysctl. source "kernel/livepatch/Kconfig" diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 2bcc052..cb64b85 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -11,6 +11,7 @@ #include <linux/sched.h> #include <linux/string.h> #include <linux/mm.h> +#include <linux/ratelimit.h> #include <linux/smp.h> #include <linux/slab.h> #include <linux/vmalloc.h> @@ -21,6 +22,11 @@ #include <asm/mmu_context.h> #include <asm/syscalls.h> +#ifdef CONFIG_MODIFY_LDT_SYSCALL +int sysctl_modify_ldt __read_mostly = + IS_ENABLED(CONFIG_DEFAULT_MODIFY_LDT_SYSCALL); +#endif + /* context.lock is held for us, so we don't need any locking. */ static void flush_ldt(void *current_mm) { @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." + " Adjust sysctl if this was not an exploit attempt.\n", + current->comm, task_pid_nr(current), + from_kuid_munged(current_user_ns(), current_uid())); + return ret; + } + switch (func) { case 0: ret = read_ldt(ptr, bytecount); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 19b62b5..492aeba 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max; #ifndef CONFIG_MMU extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_MODIFY_LDT_SYSCALL +extern int sysctl_modify_ldt; +#endif /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR @@ -960,6 +963,17 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_MODIFY_LDT_SYSCALL + { + .procname = "modify_ldt", + .data = &sysctl_modify_ldt, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one, + }, +#endif #endif #if defined(CONFIG_MMU) { -- 1.7.12.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time 2015-08-04 8:49 ` [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time Willy Tarreau @ 2015-08-05 8:00 ` Ingo Molnar 2015-08-05 8:08 ` Willy Tarreau 0 siblings, 1 reply; 20+ messages in thread From: Ingo Molnar @ 2015-08-05 8:00 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel * Willy Tarreau <w@1wt.eu> wrote: > @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, > { > int ret = -ENOSYS; > > + if (!sysctl_modify_ldt) { > + printk_ratelimited(KERN_INFO > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > + " Adjust sysctl if this was not an exploit attempt.\n", > + current->comm, task_pid_nr(current), > + from_kuid_munged(current_user_ns(), current_uid())); UI nit: so this message should really tell the user _which_ sysctl to configure, instead of passive-aggressively alluding to the fact that there's a sysctl somewhere that might do the trick... Thanks, Ingo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time 2015-08-05 8:00 ` Ingo Molnar @ 2015-08-05 8:08 ` Willy Tarreau 2015-08-05 8:26 ` Ingo Molnar 0 siblings, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2015-08-05 8:08 UTC (permalink / raw) To: Ingo Molnar Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel Hi Ingo, On Wed, Aug 05, 2015 at 10:00:37AM +0200, Ingo Molnar wrote: > > * Willy Tarreau <w@1wt.eu> wrote: > > > @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, > > { > > int ret = -ENOSYS; > > > > + if (!sysctl_modify_ldt) { > > + printk_ratelimited(KERN_INFO > > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > > + " Adjust sysctl if this was not an exploit attempt.\n", > > + current->comm, task_pid_nr(current), > > + from_kuid_munged(current_user_ns(), current_uid())); > > UI nit: so this message should really tell the user _which_ sysctl to configure, > instead of passive-aggressively alluding to the fact that there's a sysctl > somewhere that might do the trick... I agree, I did it first and changed my mind due to the repetition of the word "modify_ldt". Here's an updated version instead. Willy >From 17b2720cd54df0fde6686c1d85aaed38d679cbe7 Mon Sep 17 00:00:00 2001 From: Willy Tarreau <w@1wt.eu> Date: Sat, 25 Jul 2015 12:18:33 +0200 Subject: [PATCH] x86/ldt: allow to disable modify_ldt at runtime For distros who prefer not to take the risk of completely disabling the modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a sysctl to enable or/disable it at runtime, and proposes to disable it by default. This can be a safe alternative. A message is logged if an attempt was stopped so that it's easy to spot if/when it is needed. Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Willy Tarreau <w@1wt.eu> --- Documentation/sysctl/kernel.txt | 15 +++++++++++++++ arch/x86/Kconfig | 17 +++++++++++++++++ arch/x86/kernel/ldt.c | 16 ++++++++++++++++ kernel/sysctl.c | 14 ++++++++++++++ 4 files changed, 62 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6fccb69..60c7c7a 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] +- modify_ldt [ X86 only ] - modprobe ==> Documentation/debugging-modules.txt - modules_disabled - msg_next_id [ sysv ipc ] @@ -391,6 +392,20 @@ This flag controls the L2 cache of G3 processor boards. If ============================================================== +modify_ldt: (X86 only) + +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT +(Local Descriptor Table) may be needed to run a 16-bit or segmented code +such as Dosemu or Wine. This is done via a system call which is not needed +to run portable applications, and which can sometimes be abused to exploit +some weaknesses of the architecture, opening new vulnerabilities. + +This sysctl allows one to increase the system's security by disabling the +system call, or to restore compatibility with specific applications when it +was already disabled. + +============================================================== + modules_disabled: A toggle value indicating if modules are allowed to be loaded diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beabf30..88d10a0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL surface. Disabling it removes the modify_ldt(2) system call. Saying 'N' here may make sense for embedded or server kernels. + If really unsure, say 'Y', you'll be able to disable it at runtime. + +config DEFAULT_MODIFY_LDT_SYSCALL + bool "Allow userspace to modify the LDT by default" + depends on MODIFY_LDT_SYSCALL + default y + ---help--- + Modifying the LDT (Local Descriptor Table) may be needed to run a + 16-bit or segmented code such as Dosemu or Wine. This is done via + a system call which is not needed to run portable applications, + and which can sometimes be abused to exploit some weaknesses of + the architecture, opening new vulnerabilities. + + For this reason this option allows one to enable or disable the + feature at runtime. It is recommended to say 'N' here to leave + the system protected, and to enable it at runtime only if needed + by setting the sys.kernel.modify_ldt sysctl. source "kernel/livepatch/Kconfig" diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 2bcc052..354e854 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -11,6 +11,7 @@ #include <linux/sched.h> #include <linux/string.h> #include <linux/mm.h> +#include <linux/ratelimit.h> #include <linux/smp.h> #include <linux/slab.h> #include <linux/vmalloc.h> @@ -21,6 +22,11 @@ #include <asm/mmu_context.h> #include <asm/syscalls.h> +#ifdef CONFIG_MODIFY_LDT_SYSCALL +int sysctl_modify_ldt __read_mostly = + IS_ENABLED(CONFIG_DEFAULT_MODIFY_LDT_SYSCALL); +#endif + /* context.lock is held for us, so we don't need any locking. */ static void flush_ldt(void *current_mm) { @@ -276,6 +282,16 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." + " Adjust the modify_ldt sysctl if this was not an" + " exploit attempt.\n", + current->comm, task_pid_nr(current), + from_kuid_munged(current_user_ns(), current_uid())); + return ret; + } + switch (func) { case 0: ret = read_ldt(ptr, bytecount); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 19b62b5..492aeba 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max; #ifndef CONFIG_MMU extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_MODIFY_LDT_SYSCALL +extern int sysctl_modify_ldt; +#endif /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR @@ -960,6 +963,17 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_MODIFY_LDT_SYSCALL + { + .procname = "modify_ldt", + .data = &sysctl_modify_ldt, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one, + }, +#endif #endif #if defined(CONFIG_MMU) { -- 1.7.12.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time 2015-08-05 8:08 ` Willy Tarreau @ 2015-08-05 8:26 ` Ingo Molnar 2015-08-05 9:03 ` Willy Tarreau 0 siblings, 1 reply; 20+ messages in thread From: Ingo Molnar @ 2015-08-05 8:26 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel * Willy Tarreau <w@1wt.eu> wrote: > Hi Ingo, > > On Wed, Aug 05, 2015 at 10:00:37AM +0200, Ingo Molnar wrote: > > > > * Willy Tarreau <w@1wt.eu> wrote: > > > > > @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, > > > { > > > int ret = -ENOSYS; > > > > > > + if (!sysctl_modify_ldt) { > > > + printk_ratelimited(KERN_INFO > > > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > > > + " Adjust sysctl if this was not an exploit attempt.\n", > > > + current->comm, task_pid_nr(current), > > > + from_kuid_munged(current_user_ns(), current_uid())); > > > > UI nit: so this message should really tell the user _which_ sysctl to configure, > > instead of passive-aggressively alluding to the fact that there's a sysctl > > somewhere that might do the trick... > > I agree, I did it first and changed my mind due to the repetition of > the word "modify_ldt". > > Here's an updated version instead. > > Willy > > > From 17b2720cd54df0fde6686c1d85aaed38d679cbe7 Mon Sep 17 00:00:00 2001 > From: Willy Tarreau <w@1wt.eu> > Date: Sat, 25 Jul 2015 12:18:33 +0200 > Subject: [PATCH] x86/ldt: allow to disable modify_ldt at runtime > > For distros who prefer not to take the risk of completely disabling the > modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a > sysctl to enable or/disable it at runtime, and proposes to disable it > by default. This can be a safe alternative. A message is logged if an > attempt was stopped so that it's easy to spot if/when it is needed. > > Cc: Andy Lutomirski <luto@kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Signed-off-by: Willy Tarreau <w@1wt.eu> > --- > Documentation/sysctl/kernel.txt | 15 +++++++++++++++ > arch/x86/Kconfig | 17 +++++++++++++++++ > arch/x86/kernel/ldt.c | 16 ++++++++++++++++ > kernel/sysctl.c | 14 ++++++++++++++ > 4 files changed, 62 insertions(+) > > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index 6fccb69..60c7c7a 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: > - kptr_restrict > - kstack_depth_to_print [ X86 only ] > - l2cr [ PPC only ] > +- modify_ldt [ X86 only ] > - modprobe ==> Documentation/debugging-modules.txt > - modules_disabled > - msg_next_id [ sysv ipc ] > @@ -391,6 +392,20 @@ This flag controls the L2 cache of G3 processor boards. If > > ============================================================== > > +modify_ldt: (X86 only) > + > +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT > +(Local Descriptor Table) may be needed to run a 16-bit or segmented code s/run a/run > +such as Dosemu or Wine. This is done via a system call which is not needed s/Dosemu/DOSEMU > +to run portable applications, and which can sometimes be abused to exploit > +some weaknesses of the architecture, opening new vulnerabilities. So that's pretty vague IMHO, and a bit FUD-ish in character. How about: ... , and which system call exposes complex, rarely used legacy hardware features and semantics that had suffered vulnerabilities in the past. > + > +This sysctl allows one to increase the system's security by disabling the > +system call, or to restore compatibility with specific applications when it > +was already disabled. > + > +============================================================== > + > modules_disabled: > > A toggle value indicating if modules are allowed to be loaded > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index beabf30..88d10a0 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL > surface. Disabling it removes the modify_ldt(2) system call. > > Saying 'N' here may make sense for embedded or server kernels. > + If really unsure, say 'Y', you'll be able to disable it at runtime. > + > +config DEFAULT_MODIFY_LDT_SYSCALL > + bool "Allow userspace to modify the LDT by default" > + depends on MODIFY_LDT_SYSCALL > + default y > + ---help--- > + Modifying the LDT (Local Descriptor Table) may be needed to run a > + 16-bit or segmented code such as Dosemu or Wine. This is done via > + a system call which is not needed to run portable applications, > + and which can sometimes be abused to exploit some weaknesses of > + the architecture, opening new vulnerabilities. > + > + For this reason this option allows one to enable or disable the > + feature at runtime. It is recommended to say 'N' here to leave > + the system protected, and to enable it at runtime only if needed > + by setting the sys.kernel.modify_ldt sysctl. Here I'd do the same modifications as to the sysctl text above. > + if (!sysctl_modify_ldt) { > + printk_ratelimited(KERN_INFO > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > + " Adjust the modify_ldt sysctl if this was not an" Would it really be so difficult to write this as: Set "sys.kernel.modify_ldt = 1" in /etc/sysctl.conf if this was not an exploit attempt. 99% of the users seeing this message will see it right after an app of theirs ended up not working. Let's not add to the annoyance factor! > + " exploit attempt.\n", > + current->comm, task_pid_nr(current), > + from_kuid_munged(current_user_ns(), current_uid())); Also generally please don't break message lines in the source code while they are a single line in the syslog, to make it easier to grep for and to expose kernel hackers to the form of message they are emitting. Ignore checkpatch. > @@ -960,6 +963,17 @@ static struct ctl_table kern_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec, > }, > +#ifdef CONFIG_MODIFY_LDT_SYSCALL > + { > + .procname = "modify_ldt", > + .data = &sysctl_modify_ldt, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, > + .extra1 = &zero, > + .extra2 = &one, > + }, > +#endif So I'd actually make the permissions 0600: to make it a tiny bit harder for exploits to silently query the current value to figure out whether they can safely attempt the syscall or not ... (Sadly /etc/sysctl.conf is world-readable on most distros.) Thanks, Ingo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time 2015-08-05 8:26 ` Ingo Molnar @ 2015-08-05 9:03 ` Willy Tarreau 2015-08-05 9:10 ` Ingo Molnar 0 siblings, 1 reply; 20+ messages in thread From: Willy Tarreau @ 2015-08-05 9:03 UTC (permalink / raw) To: Ingo Molnar Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel On Wed, Aug 05, 2015 at 10:26:16AM +0200, Ingo Molnar wrote: > > +modify_ldt: (X86 only) > > + > > +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT > > +(Local Descriptor Table) may be needed to run a 16-bit or segmented code > > s/run a/run good catch, thanks. > > +such as Dosemu or Wine. This is done via a system call which is not needed > > s/Dosemu/DOSEMU I didn't know. Fixed. > > +to run portable applications, and which can sometimes be abused to exploit > > +some weaknesses of the architecture, opening new vulnerabilities. > > So that's pretty vague IMHO, and a bit FUD-ish in character. How about: > > ... , and which system call exposes complex, rarely used legacy hardware > features and semantics that had suffered vulnerabilities in the past. OK fine for me, fixed. > > + if (!sysctl_modify_ldt) { > > + printk_ratelimited(KERN_INFO > > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > > + " Adjust the modify_ldt sysctl if this was not an" > > Would it really be so difficult to write this as: > > Set "sys.kernel.modify_ldt = 1" in /etc/sysctl.conf if this was not an exploit attempt. It's just a matter of taste. Normally I consider the kernel distro-agnostic so I don't like to suggest one way to adjust sysctls nor to reference config files. Here we're in a case where only standard distro users may hit the issue, and users of embedded distros will not face this message or will easily translate it into their respective configuration scheme. So OK for this one. > 99% of the users seeing this message will see it right after an app of theirs > ended up not working. Let's not add to the annoyance factor! That's exactly why I wrote the message in the first place! > > + " exploit attempt.\n", > > + current->comm, task_pid_nr(current), > > + from_kuid_munged(current_user_ns(), current_uid())); > > Also generally please don't break message lines in the source code while they are > a single line in the syslog, to make it easier to grep for and to expose kernel > hackers to the form of message they are emitting. Ignore checkpatch. I'm fine with this rule as well, it's only in the kenrel that I tend to care about the 80-col limit, in my own code I easily ignore it for the same reason. > > @@ -960,6 +963,17 @@ static struct ctl_table kern_table[] = { > > .mode = 0644, > > .proc_handler = proc_dointvec, > > }, > > +#ifdef CONFIG_MODIFY_LDT_SYSCALL > > + { > > + .procname = "modify_ldt", > > + .data = &sysctl_modify_ldt, > > + .maxlen = sizeof(int), > > + .mode = 0644, > > + .proc_handler = proc_dointvec_minmax, > > + .extra1 = &zero, > > + .extra2 = &one, > > + }, > > +#endif > > So I'd actually make the permissions 0600: to make it a tiny bit harder for > exploits to silently query the current value to figure out whether they can safely > attempt the syscall or not ... That's a good point. If we later make other syscalls configurable, it might be different because some users might want to contact their admin to ask for a specific one. But here, there's usually no admin so I'm fine with hardening it. > (Sadly /etc/sysctl.conf is world-readable on most distros.) Yes, just like most executables are readable while not strictly needed, especially setuid ones which allow the code to be studied in place or easily duplicated to script some exploits. But we could discuss hours about basic system hardening! I've updated the patch with your suggestions. Thanks, Willy >From 8521f170515dfc0f390c396100140504c8dfbcfc Mon Sep 17 00:00:00 2001 From: Willy Tarreau <w@1wt.eu> Date: Sat, 25 Jul 2015 12:18:33 +0200 Subject: [PATCH] x86/ldt: allow to disable modify_ldt at runtime For distros who prefer not to take the risk of completely disabling the modify_ldt syscall using CONFIG_MODIFY_LDT_SYSCALL, this patch adds a sysctl to enable or/disable it at runtime, and proposes to disable it by default. This can be a safe alternative. A message is logged if an attempt was stopped so that it's easy to spot if/when it is needed. Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> --- Documentation/sysctl/kernel.txt | 16 ++++++++++++++++ arch/x86/Kconfig | 17 +++++++++++++++++ arch/x86/kernel/ldt.c | 15 +++++++++++++++ kernel/sysctl.c | 14 ++++++++++++++ 4 files changed, 62 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 6fccb69..7dcdebd 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -41,6 +41,7 @@ show up in /proc/sys/kernel: - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] +- modify_ldt [ X86 only ] - modprobe ==> Documentation/debugging-modules.txt - modules_disabled - msg_next_id [ sysv ipc ] @@ -391,6 +392,21 @@ This flag controls the L2 cache of G3 processor boards. If ============================================================== +modify_ldt: (X86 only) + +Enables (1) or disables (0) the modify_ldt syscall. Modifying the LDT +(Local Descriptor Table) may be needed to run 16-bit or segmented code +such as DOSEMU or Wine. This is done via a system call which is not needed +to run portable applications, and which exposes complex, rarely used legacy +hardware features and semantics that had suffered vulnerabilities in the +past. + +This sysctl allows one to increase the system's security by disabling the +system call, or to restore compatibility with specific applications when it +was already disabled. + +============================================================== + modules_disabled: A toggle value indicating if modules are allowed to be loaded diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beabf30..e528fb0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2069,6 +2069,23 @@ config MODIFY_LDT_SYSCALL surface. Disabling it removes the modify_ldt(2) system call. Saying 'N' here may make sense for embedded or server kernels. + If really unsure, say 'Y', you'll be able to disable it at runtime. + +config DEFAULT_MODIFY_LDT_SYSCALL + bool "Allow userspace to modify the LDT by default" + depends on MODIFY_LDT_SYSCALL + default y + ---help--- + Modifying the LDT (Local Descriptor Table) may be needed to run + 16-bit or segmented code such as DOSEMU or Wine. This is done via + a system call which is not needed to run portable applications, + and which exposes complex, rarely used legacy hardware features + and semantics that had suffered vulnerabilities in the past. + + For this reason this option allows one to enable or disable the + feature at runtime. It is recommended to say 'N' here to leave + the system protected, and to enable it at runtime only if needed + by setting the sys.kernel.modify_ldt sysctl. source "kernel/livepatch/Kconfig" diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 2bcc052..b8a4f2d 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -11,6 +11,7 @@ #include <linux/sched.h> #include <linux/string.h> #include <linux/mm.h> +#include <linux/ratelimit.h> #include <linux/smp.h> #include <linux/slab.h> #include <linux/vmalloc.h> @@ -21,6 +22,11 @@ #include <asm/mmu_context.h> #include <asm/syscalls.h> +#ifdef CONFIG_MODIFY_LDT_SYSCALL +int sysctl_modify_ldt __read_mostly = + IS_ENABLED(CONFIG_DEFAULT_MODIFY_LDT_SYSCALL); +#endif + /* context.lock is held for us, so we don't need any locking. */ static void flush_ldt(void *current_mm) { @@ -276,6 +282,15 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, { int ret = -ENOSYS; + if (!sysctl_modify_ldt) { + printk_ratelimited(KERN_INFO + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." + " Set \"sys.kernel.modify_ldt = 1\" in /etc/sysctl.conf if this was not an exploit attempt.\n", + current->comm, task_pid_nr(current), + from_kuid_munged(current_user_ns(), current_uid())); + return ret; + } + switch (func) { case 0: ret = read_ldt(ptr, bytecount); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 19b62b5..f7eb41f 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max; #ifndef CONFIG_MMU extern int sysctl_nr_trim_pages; #endif +#ifdef CONFIG_MODIFY_LDT_SYSCALL +extern int sysctl_modify_ldt; +#endif /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR @@ -960,6 +963,17 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_MODIFY_LDT_SYSCALL + { + .procname = "modify_ldt", + .data = &sysctl_modify_ldt, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one, + }, +#endif #endif #if defined(CONFIG_MMU) { -- 1.7.12.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time 2015-08-05 9:03 ` Willy Tarreau @ 2015-08-05 9:10 ` Ingo Molnar 0 siblings, 0 replies; 20+ messages in thread From: Ingo Molnar @ 2015-08-05 9:10 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Kees Cook, Steven Rostedt, security@kernel.org, X86 ML, Borislav Petkov, Sasha Levin, LKML, Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich, xen-devel * Willy Tarreau <w@1wt.eu> wrote: > > > + if (!sysctl_modify_ldt) { > > > + printk_ratelimited(KERN_INFO > > > + "Denied a call to modify_ldt() from %s[%d] (uid: %d)." > > > + " Adjust the modify_ldt sysctl if this was not an" > > > > Would it really be so difficult to write this as: > > > > Set "sys.kernel.modify_ldt = 1" in /etc/sysctl.conf if this was not an exploit attempt. > > It's just a matter of taste. Normally I consider the kernel distro-agnostic so I > don't like to suggest one way to adjust sysctls nor to reference config files. > Here we're in a case where only standard distro users may hit the issue, and > users of embedded distros will not face this message or will easily translate it > into their respective configuration scheme. So OK for this one. So it's a side issue, but it's not a matter of taste at all: why should we end up hurting 99% of Linux users (that use regular distros), just to make it slightly more 'correct' for the weird 1% 'embedded distro' case that decided to put sysctl configuration elsewhere? Users of 'embedded' distros won't normally see kernel messages, and even if they do, the message is crystal clear even to them... Such messages should be as helpful to the regular case as possible. The weird cases will be OK too: and it does not help to make a message unhelpful for _both_ cases. Thanks, Ingo ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2015-08-05 9:10 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-03 18:23 [PATCH 0/2] x86: allow to enable/disable modify_ldt at run time Willy Tarreau 2015-08-03 18:23 ` [PATCH 1/2] sysctl: add a new generic strategy to make permanent changes on negative values Willy Tarreau 2015-08-03 18:33 ` Andy Lutomirski 2015-08-03 18:50 ` Willy Tarreau 2015-08-03 18:23 ` [PATCH 2/2] x86/ldt: allow to disable modify_ldt at runtime Willy Tarreau 2015-08-03 18:45 ` Andy Lutomirski 2015-08-03 19:01 ` Willy Tarreau 2015-08-03 19:06 ` Andy Lutomirski 2015-08-03 19:18 ` Willy Tarreau 2015-08-04 3:54 ` Borislav Petkov 2015-08-04 6:00 ` Willy Tarreau 2015-08-03 22:35 ` Kees Cook 2015-08-03 23:19 ` Willy Tarreau 2015-08-04 1:36 ` Kees Cook 2015-08-04 8:49 ` [PATCH v3 1/1] x86: allow to enable/disable modify_ldt at run time Willy Tarreau 2015-08-05 8:00 ` Ingo Molnar 2015-08-05 8:08 ` Willy Tarreau 2015-08-05 8:26 ` Ingo Molnar 2015-08-05 9:03 ` Willy Tarreau 2015-08-05 9:10 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).