* [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
@ 2009-12-21 1:20 Andi Kleen
2009-12-21 1:20 ` [PATCH] [1/11] Add rcustring ADT for RCU protected strings Andi Kleen
` (11 more replies)
0 siblings, 12 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
With BKL-less sysctls most of the writable string sysctls are racy. There
is no locking on the reader side, so a reader could see an inconsistent
string or worse miss the terminating null and walk of beyond it.
This patch kit adds a new "rcu string" variant to avoid these
problems and convers the racy users. One the writer side the strings are
always copied to new memory and the readers use rcu_read_lock()
to get a stable view. For readers who access the string over
sleeps the reader copies the string.
This is all hidden in a new generic "rcu_string" ADT which can be also
used for other purposes.
This finally implements all the letters in RCU, most other users
leave out the 'C'.
I also fixed the zone order list sysctl to use a mutex to avoid
racing with itself.
I left some obscure users in architectures (sparc, mips) alone and audited
all of the others. The sparc reboot_cmd one has references to asm files
which I didn't want to touch and the mips variant seemd just too obscure.
All the others are not racy.
-Andi
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [1/11] Add rcustring ADT for RCU protected strings
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-22 2:46 ` Paul E. McKenney
2009-12-21 1:20 ` [PATCH] [2/11] Add a kernel_address() that works for data too Andi Kleen
` (10 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: paulmck, linux-kernel, paulmck, ebiederm
Add a little ADT for RCU protected strings. RCU is a convenient
way to manage modifications to read-often-write-seldom strings.
Add some helper functions to make this more straight forward.
Used by follow-on patches to implement RCU protected sysctl strings.
* General rules:
* Reader has to use rcu_read_lock() and not sleep while accessing the string,
* or alternatively get a copy with access_rcu_string()
* Writer needs an own lock against each other.
* Each modification should allocate a new string first and free the old
* one with free_rcu_string()
* In writers use rcu_assign_pointer to publicize the updated string to
* global readers.
* The size passed to access_rcu_string() must be the same as passed
* to alloc_rcu_string() and be known in advance. Don't use strlen()!
*
* For sysctls also see proc_rcu_string() as a convenient wrapper
Cc: paulmck@linux.vnet.ibm.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/rcustring.h | 20 ++++++++
lib/Makefile | 3 -
lib/rcustring.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 134 insertions(+), 1 deletion(-)
Index: linux-2.6.33-rc1-ak/include/linux/rcustring.h
===================================================================
--- /dev/null
+++ linux-2.6.33-rc1-ak/include/linux/rcustring.h
@@ -0,0 +1,20 @@
+#ifndef _RCUSTRING_H
+#define _RCUSTRING_H 1
+
+#include <linux/gfp.h>
+#include <linux/rcupdate.h>
+
+/*
+ * Simple wrapper to manage strings by RCU.
+ */
+
+extern char *alloc_rcu_string(int size, gfp_t gfp);
+extern void free_rcu_string(const char *string);
+
+/*
+ * size must be the same as alloc_rcu_string, don't
+ * use strlen on str!
+ */
+extern char *access_rcu_string(const char *str, int size, gfp_t gfp);
+
+#endif
Index: linux-2.6.33-rc1-ak/lib/rcustring.c
===================================================================
--- /dev/null
+++ linux-2.6.33-rc1-ak/lib/rcustring.c
@@ -0,0 +1,112 @@
+/*
+ * Manage strings by Read-Copy-Update. This is useful for global strings
+ * that change only very rarely, but are read often.
+ *
+ * Author: Andi Kleen
+ *
+ * General rules:
+ * Reader has to use rcu_read_lock() and not sleep while accessing the string,
+ * or alternatively get a copy with access_rcu_string()
+ * Writer needs an own lock against each other.
+ * Each modification should allocate a new string first and free the old
+ * one with free_rcu_string()
+ * In writers use rcu_assign_pointer to publicize the updated string to
+ * global readers.
+ * The size passed to access_rcu_string() must be the same as passed
+ * to alloc_rcu_string() and be known in advance. Don't use strlen()!
+ *
+ * For sysctls also see proc_rcu_string() as a convenient wrapper
+ *
+ * Typical example:
+ * #define MAX_GLOBAL_SIZE ...
+ * char *global = "default";
+ *
+ * Rare writer:
+ * char *old, *new;
+ * DECLARE_MUTEX(writer_lock);
+ * mutex_lock(&writer_lock);
+ * new = alloc_rcu_string(MAX_GLOBAL_SIZE, GFP_KERNEL);
+ * if (!new) {
+ * mutex_unlock(&writer_lock);
+ * return -ENOMEM;
+ * }
+ * strlcpy(new, new_value, MAX_GLOBAL_SIZE);
+ * old = global;
+ * rcu_assign_pointer(global, new);
+ * mutex_unlock(&writer_lock);
+ * free_rcu_string(old);
+ *
+ * Sleepy reader:
+ * char *str = access_rcu_string(global, MAX_GLOBAL_SIZE, GFP_KERNEL);
+ * if (!str)
+ * return -ENOMEM;
+ * ... use str while sleeping ...
+ * kfree(string);
+ *
+ * Non sleepy reader:
+ * rcu_read_lock();
+ * ... read str ...
+ * rcu_read_unlock();
+ *
+ * Note this code could be relatively easily generalized for other kinds
+ * of non-atomic data, but this initial version only handles strings.
+ * Only need to change the strlcpy() below to memcpy()
+ */
+#include <linux/kernel.h>
+#include <linux/rcustring.h>
+#include <linux/slab.h>
+#include <linux/rcupdate.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+struct rcu_string {
+ struct rcu_head rcu;
+ char str[0];
+};
+
+char *alloc_rcu_string(int size, gfp_t gfp)
+{
+ struct rcu_string *rs = kmalloc(sizeof(struct rcu_string) + size, gfp);
+ if (!rs)
+ return NULL;
+ return rs->str;
+}
+EXPORT_SYMBOL(alloc_rcu_string);
+
+static void do_free_rcu_string(struct rcu_head *h)
+{
+ kfree(container_of(h, struct rcu_string, rcu));
+}
+
+static inline struct rcu_string *str_to_rcustr(const char *str)
+{
+ /*
+ * Opencoded container_of because the strict type checking
+ * in normal container_of cannot deal with char str[0] vs char *str.
+ */
+ return (struct rcu_string *)(str - offsetof(struct rcu_string, str));
+}
+
+void free_rcu_string(const char *str)
+{
+ struct rcu_string *rs = str_to_rcustr(str);
+ call_rcu(&rs->rcu, do_free_rcu_string);
+}
+EXPORT_SYMBOL(free_rcu_string);
+
+/*
+ * Get a local private copy of a RCU protected string.
+ * Mostly useful to get a string that is stable while sleeping.
+ * Caller must free returned string.
+ */
+char *access_rcu_string(const char *str, int size, gfp_t gfp)
+{
+ char *copy = kmalloc(size, gfp);
+ if (!str)
+ return NULL;
+ rcu_read_lock();
+ strlcpy(copy, str, size);
+ rcu_read_unlock();
+ return copy;
+}
+EXPORT_SYMBOL(access_rcu_string);
Index: linux-2.6.33-rc1-ak/lib/Makefile
===================================================================
--- linux-2.6.33-rc1-ak.orig/lib/Makefile
+++ linux-2.6.33-rc1-ak/lib/Makefile
@@ -12,7 +12,8 @@ lib-y := ctype.o string.o vsprintf.o cmd
idr.o int_sqrt.o extable.o prio_tree.o \
sha1.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o prio_heap.o ratelimit.o show_mem.o \
- is_single_threaded.o plist.o decompress.o flex_array.o
+ is_single_threaded.o plist.o decompress.o flex_array.o \
+ rcustring.o
lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [2/11] Add a kernel_address() that works for data too
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
2009-12-21 1:20 ` [PATCH] [1/11] Add rcustring ADT for RCU protected strings Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
` (9 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-arch, linux-kernel, paulmck, ebiederm
Add a variant of kernel_text_address() that includes kernel data.
Assumes kernel is _text ... _end - init section. True everywhere?
Cc: linux-arch@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/kernel.h | 1 +
kernel/extable.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)
Index: linux-2.6.33-rc1-ak/include/linux/kernel.h
===================================================================
--- linux-2.6.33-rc1-ak.orig/include/linux/kernel.h
+++ linux-2.6.33-rc1-ak/include/linux/kernel.h
@@ -205,6 +205,7 @@ extern unsigned long long memparse(const
extern int core_kernel_text(unsigned long addr);
extern int __kernel_text_address(unsigned long addr);
extern int kernel_text_address(unsigned long addr);
+extern int kernel_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);
struct pid;
Index: linux-2.6.33-rc1-ak/kernel/extable.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/extable.c
+++ linux-2.6.33-rc1-ak/kernel/extable.c
@@ -72,6 +72,18 @@ int core_kernel_text(unsigned long addr)
return 0;
}
+static int core_kernel_address(unsigned long addr)
+{
+ if ((addr >= (unsigned long)_text &&
+ addr <= (unsigned long)_end)) {
+ if (addr >= (unsigned long)__init_begin &&
+ addr < (unsigned long)__init_end)
+ return system_state == SYSTEM_BOOTING;
+ return 1;
+ }
+ return 0;
+}
+
int __kernel_text_address(unsigned long addr)
{
if (core_kernel_text(addr))
@@ -98,6 +110,12 @@ int kernel_text_address(unsigned long ad
return is_module_text_address(addr);
}
+/* text or data in core kernel or module */
+int kernel_address(unsigned long addr)
+{
+ return core_kernel_address(addr) || is_module_address(addr);
+}
+
/*
* On some architectures (PPC64, IA64) function pointers
* are actually only tokens to some data that then holds the
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
2009-12-21 1:20 ` [PATCH] [1/11] Add rcustring ADT for RCU protected strings Andi Kleen
2009-12-21 1:20 ` [PATCH] [2/11] Add a kernel_address() that works for data too Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-22 2:51 ` Paul E. McKenney
2009-12-21 1:20 ` [PATCH] [4/11] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
` (8 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
Add a helper to use the new rcu strings for managing access
to text sysctls. Conversions will be in follow-on patches.
An alternative would be to use seqlocks here, but RCU seemed
cleaner.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/sysctl.h | 2 +
kernel/sysctl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sysctl_check.c | 1
3 files changed, 69 insertions(+)
Index: linux-2.6.33-rc1-ak/include/linux/sysctl.h
===================================================================
--- linux-2.6.33-rc1-ak.orig/include/linux/sysctl.h
+++ linux-2.6.33-rc1-ak/include/linux/sysctl.h
@@ -969,6 +969,8 @@ typedef int proc_handler (struct ctl_tab
extern int proc_dostring(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+extern int proc_rcu_string(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *);
extern int proc_dointvec(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_minmax(struct ctl_table *, int,
Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc1-ak/kernel/sysctl.c
@@ -50,6 +50,7 @@
#include <linux/ftrace.h>
#include <linux/slow-work.h>
#include <linux/perf_event.h>
+#include <linux/rcustring.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -2016,6 +2017,60 @@ static int _proc_do_string(void* data, i
}
/**
+ * proc_rcu_string - sysctl string with rcu protection
+ * @table: the sysctl table
+ * @write: %TRUE if this is a write to the sysctl file
+ * @buffer: the user buffer
+ * @lenp: the size of the user buffer
+ * @ppos: file position
+ *
+ * Handle a string sysctl similar to proc_dostring.
+ * The main difference is that the data pointer in the table
+ * points to a pointer to a string. The string should be initially
+ * pointing to a statically allocated (as a C object, not on the heap)
+ * default. When it is replaced old uses will be protected by
+ * RCU. The reader should use rcu_read_lock()/unlock() or
+ * access_rcu_string().
+ */
+int proc_rcu_string(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret;
+
+ if (write) {
+ /* protect writers against each other */
+ static DEFINE_MUTEX(rcu_string_mutex);
+ char *old;
+ char *new;
+
+ new = alloc_rcu_string(table->maxlen, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+ mutex_lock(&rcu_string_mutex);
+ old = *(char **)(table->data);
+ strcpy(new, old);
+ ret = _proc_do_string(new, table->maxlen, write, buffer, lenp, ppos);
+ rcu_assign_pointer(*(char **)(table->data), new);
+ /*
+ * For the first initialization allow constant strings.
+ */
+ if (!kernel_address((unsigned long)old))
+ free_rcu_string(old);
+ mutex_unlock(&rcu_string_mutex);
+ } else {
+ char *str;
+
+ str = access_rcu_string(*(char **)(table->data), table->maxlen,
+ GFP_KERNEL);
+ if (!str)
+ return -ENOMEM;
+ ret = _proc_do_string(str, table->maxlen, write, buffer, lenp, ppos);
+ kfree(str);
+ }
+ return ret;
+}
+
+/**
* proc_dostring - read a string sysctl
* @table: the sysctl table
* @write: %TRUE if this is a write to the sysctl file
@@ -2030,6 +2085,10 @@ static int _proc_do_string(void* data, i
* and a newline '\n' is added. It is truncated if the buffer is
* not large enough.
*
+ * WARNING: this should be only used for read only strings
+ * or when you have a wrapper with special locking. Otherwise
+ * use proc_rcu_string to avoid races with the consumer.
+ *
* Returns 0 on success.
*/
int proc_dostring(struct ctl_table *table, int write,
@@ -2614,6 +2673,12 @@ int proc_dostring(struct ctl_table *tabl
return -ENOSYS;
}
+int proc_rcu_string(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ return -ENOSYS;
+}
+
int proc_dointvec(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -2670,6 +2735,7 @@ EXPORT_SYMBOL(proc_dointvec_minmax);
EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
+EXPORT_SYMBOL(proc_rcu_string);
EXPORT_SYMBOL(proc_doulongvec_minmax);
EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
EXPORT_SYMBOL(register_sysctl_table);
Index: linux-2.6.33-rc1-ak/kernel/sysctl_check.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sysctl_check.c
+++ linux-2.6.33-rc1-ak/kernel/sysctl_check.c
@@ -131,6 +131,7 @@ int sysctl_check_table(struct nsproxy *n
set_fail(&fail, table, "Directory with extra2");
} else {
if ((table->proc_handler == proc_dostring) ||
+ (table->proc_handler == proc_rcu_string) ||
(table->proc_handler == proc_dointvec) ||
(table->proc_handler == proc_dointvec_minmax) ||
(table->proc_handler == proc_dointvec_jiffies) ||
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [4/11] SYSCTL: Use RCU strings for core_pattern sysctl
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (2 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [5/11] SYSCTL: Add call_usermodehelper_cleanup() Andi Kleen
` (7 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
As a bonus this removes one use of the BKL.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
fs/exec.c | 9 ++++-----
kernel/sysctl.c | 6 +++---
2 files changed, 7 insertions(+), 8 deletions(-)
Index: linux-2.6.33-rc1-ak/fs/exec.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/fs/exec.c
+++ linux-2.6.33-rc1-ak/fs/exec.c
@@ -62,7 +62,7 @@
#include "internal.h"
int core_uses_pid;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
+char *core_pattern = "core";
unsigned int core_pipe_limit;
int suid_dumpable = 0;
@@ -1825,12 +1825,11 @@ void do_coredump(long signr, int exit_co
clear_thread_flag(TIF_SIGPENDING);
/*
- * lock_kernel() because format_corename() is controlled by sysctl, which
- * uses lock_kernel()
+ * Protect corename by RCU vs proc_rcu_string()
*/
- lock_kernel();
+ rcu_read_lock();
ispipe = format_corename(corename, signr);
- unlock_kernel();
+ rcu_read_unlock();
if ((!ispipe) && (cprm.limit < binfmt->min_coredump))
goto fail_unlock;
Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc1-ak/kernel/sysctl.c
@@ -75,7 +75,7 @@ extern int sysctl_oom_dump_tasks;
extern int max_threads;
extern int core_uses_pid;
extern int suid_dumpable;
-extern char core_pattern[];
+extern char *core_pattern;
extern unsigned int core_pipe_limit;
extern int pid_max;
extern int min_free_kbytes;
@@ -399,10 +399,10 @@ static struct ctl_table kern_table[] = {
},
{
.procname = "core_pattern",
- .data = core_pattern,
+ .data = &core_pattern,
.maxlen = CORENAME_MAX_SIZE,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
{
.procname = "core_pipe_limit",
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [5/11] SYSCTL: Add call_usermodehelper_cleanup()
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (3 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [4/11] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [6/11] SYSCTL: Convert modprobe_path to proc_rcu_string() Andi Kleen
` (6 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
This is the same as call_usermodehelper(), but with an cleanup callback.
Needed for some of the followon proc_rcu_string() users.
This avoids open coding this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/kmod.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
Index: linux-2.6.33-rc1-ak/include/linux/kmod.h
===================================================================
--- linux-2.6.33-rc1-ak.orig/include/linux/kmod.h
+++ linux-2.6.33-rc1-ak/include/linux/kmod.h
@@ -72,7 +72,8 @@ int call_usermodehelper_exec(struct subp
void call_usermodehelper_freeinfo(struct subprocess_info *info);
static inline int
-call_usermodehelper(char *path, char **argv, char **envp, enum umh_wait wait)
+call_usermodehelper_cleanup(char *path, char **argv, char **envp, enum umh_wait wait,
+ void (*cleanup)(char **, char **))
{
struct subprocess_info *info;
gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
@@ -80,10 +81,18 @@ call_usermodehelper(char *path, char **a
info = call_usermodehelper_setup(path, argv, envp, gfp_mask);
if (info == NULL)
return -ENOMEM;
+ if (cleanup)
+ call_usermodehelper_setcleanup(info, cleanup);
return call_usermodehelper_exec(info, wait);
}
static inline int
+call_usermodehelper(char *path, char **argv, char **envp, enum umh_wait wait)
+{
+ return call_usermodehelper_cleanup(path, argv, envp, wait, NULL);
+}
+
+static inline int
call_usermodehelper_keys(char *path, char **argv, char **envp,
struct key *session_keyring, enum umh_wait wait)
{
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [6/11] SYSCTL: Convert modprobe_path to proc_rcu_string()
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (4 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [5/11] SYSCTL: Add call_usermodehelper_cleanup() Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [7/11] SYSCTL: Convert poweroff_command to proc_rcu_string Andi Kleen
` (5 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
Avoids races with lockless sysctl
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
kernel/kmod.c | 36 ++++++++++++++++++++++++++++--------
kernel/sysctl.c | 4 ++--
2 files changed, 30 insertions(+), 10 deletions(-)
Index: linux-2.6.33-rc1-ak/kernel/kmod.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/kmod.c
+++ linux-2.6.33-rc1-ak/kernel/kmod.c
@@ -35,6 +35,7 @@
#include <linux/resource.h>
#include <linux/notifier.h>
#include <linux/suspend.h>
+#include <linux/rcustring.h>
#include <asm/uaccess.h>
#include <trace/events/module.h>
@@ -48,7 +49,12 @@ static struct workqueue_struct *khelper_
/*
modprobe_path is set via /proc/sys.
*/
-char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+char *modprobe_path = "/sbin/modprobe";
+
+static void free_arg(char **argv, char **env)
+{
+ kfree(argv[0]);
+}
/**
* __request_module - try to load a kernel module
@@ -71,7 +77,8 @@ int __request_module(bool wait, const ch
char module_name[MODULE_NAME_LEN];
unsigned int max_modprobes;
int ret;
- char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
+ char *mp_copy;
+ char *argv[] = { NULL, "-q", "--", module_name, NULL };
static char *envp[] = { "HOME=/",
"TERM=linux",
"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
@@ -80,15 +87,24 @@ int __request_module(bool wait, const ch
#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;
+ /* Get a stable-over-sleeps private copy of modprobe_path */
+ mp_copy = access_rcu_string(modprobe_path, KMOD_PATH_LEN,
+ wait ? GFP_KERNEL : GFP_ATOMIC);
+ if (!mp_copy)
+ return -ENOMEM;
+ argv[0] = mp_copy;
+
va_start(args, fmt);
ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
va_end(args);
- if (ret >= MODULE_NAME_LEN)
- return -ENAMETOOLONG;
+ if (ret >= MODULE_NAME_LEN) {
+ ret = -ENAMETOOLONG;
+ goto error;
+ }
ret = security_kernel_module_request(module_name);
if (ret)
- return ret;
+ goto error;
/* If modprobe needs a service that is in a module, we get a recursive
* loop. Limit the number of running kmod threads to max_threads/2 or
@@ -111,14 +127,18 @@ int __request_module(bool wait, const ch
"request_module: runaway loop modprobe %s\n",
module_name);
atomic_dec(&kmod_concurrent);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto error;
}
trace_module_request(module_name, wait, _RET_IP_);
- ret = call_usermodehelper(modprobe_path, argv, envp,
- wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+ ret = call_usermodehelper_cleanup(mp_copy, argv, envp,
+ wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC, free_arg);
+ mp_copy = NULL; /* free_arg frees */
atomic_dec(&kmod_concurrent);
+error:
+ kfree(mp_copy);
return ret;
}
EXPORT_SYMBOL(__request_module);
Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc1-ak/kernel/sysctl.c
@@ -121,7 +121,7 @@ static int min_percpu_pagelist_fract = 8
static int ngroups_max = NGROUPS_MAX;
#ifdef CONFIG_MODULES
-extern char modprobe_path[];
+extern char *modprobe_path;
extern int modules_disabled;
#endif
#ifdef CONFIG_CHR_DEV_SG
@@ -532,7 +532,7 @@ static struct ctl_table kern_table[] = {
.data = &modprobe_path,
.maxlen = KMOD_PATH_LEN,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
{
.procname = "modules_disabled",
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [7/11] SYSCTL: Convert poweroff_command to proc_rcu_string
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (5 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [6/11] SYSCTL: Convert modprobe_path to proc_rcu_string() Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [8/11] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
` (4 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
Avoids races with lockless sysctl.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/reboot.h | 2 +-
kernel/sys.c | 8 ++++++--
kernel/sysctl.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)
Index: linux-2.6.33-rc1-ak/kernel/sys.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sys.c
+++ linux-2.6.33-rc1-ak/kernel/sys.c
@@ -1595,7 +1595,7 @@ SYSCALL_DEFINE3(getcpu, unsigned __user
return err ? -EFAULT : 0;
}
-char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
+char *poweroff_cmd = "/sbin/poweroff";
static void argv_cleanup(char **argv, char **envp)
{
@@ -1612,7 +1612,7 @@ static void argv_cleanup(char **argv, ch
int orderly_poweroff(bool force)
{
int argc;
- char **argv = argv_split(GFP_ATOMIC, poweroff_cmd, &argc);
+ char **argv;
static char *envp[] = {
"HOME=/",
"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
@@ -1621,6 +1621,10 @@ int orderly_poweroff(bool force)
int ret = -ENOMEM;
struct subprocess_info *info;
+ /* RCU protection for poweroff_cmd */
+ rcu_read_lock();
+ argv = argv_split(GFP_ATOMIC, poweroff_cmd, &argc);
+ rcu_read_unlock();
if (argv == NULL) {
printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
__func__, poweroff_cmd);
Index: linux-2.6.33-rc1-ak/include/linux/reboot.h
===================================================================
--- linux-2.6.33-rc1-ak.orig/include/linux/reboot.h
+++ linux-2.6.33-rc1-ak/include/linux/reboot.h
@@ -67,7 +67,7 @@ extern void kernel_power_off(void);
void ctrl_alt_del(void);
#define POWEROFF_CMD_PATH_LEN 256
-extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN];
+extern char *poweroff_cmd;
extern int orderly_poweroff(bool force);
Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc1-ak/kernel/sysctl.c
@@ -871,7 +871,7 @@ static struct ctl_table kern_table[] = {
.data = &poweroff_cmd,
.maxlen = POWEROFF_CMD_PATH_LEN,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
#ifdef CONFIG_KEYS
{
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [8/11] SYSCTL: Convert hotplug helper string to proc_rcu_string()
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (6 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [7/11] SYSCTL: Convert poweroff_command to proc_rcu_string Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-22 19:03 ` Greg KH
2009-12-21 1:20 ` [PATCH] [9/11] SYSCTL: Add a mutex to the page_alloc zone order sysctl Andi Kleen
` (3 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: greg, linux-kernel, paulmck, ebiederm
This avoids races with lockless sysctl.
I also moved the code into a separate function because the original
was very long.
Cc: greg@kroah.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/kobject.h | 2 -
kernel/sysctl.c | 2 -
lib/kobject_uevent.c | 50 ++++++++++++++++++++++++++++++------------------
3 files changed, 34 insertions(+), 20 deletions(-)
Index: linux-2.6.33-rc1-ak/include/linux/kobject.h
===================================================================
--- linux-2.6.33-rc1-ak.orig/include/linux/kobject.h
+++ linux-2.6.33-rc1-ak/include/linux/kobject.h
@@ -31,7 +31,7 @@
#define UEVENT_BUFFER_SIZE 2048 /* buffer for the variables */
/* path to the userspace helper executed on an event */
-extern char uevent_helper[];
+extern char *uevent_helper;
/* counter to tag the uevent, read only except for the kobject core */
extern u64 uevent_seqnum;
Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc1-ak/kernel/sysctl.c
@@ -551,7 +551,7 @@ static struct ctl_table kern_table[] = {
.data = &uevent_helper,
.maxlen = UEVENT_HELPER_PATH_LEN,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
#endif
#ifdef CONFIG_CHR_DEV_SG
Index: linux-2.6.33-rc1-ak/lib/kobject_uevent.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/lib/kobject_uevent.c
+++ linux-2.6.33-rc1-ak/lib/kobject_uevent.c
@@ -22,11 +22,12 @@
#include <linux/socket.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
+#include <linux/rcustring.h>
#include <net/sock.h>
u64 uevent_seqnum;
-char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
+char *uevent_helper = CONFIG_UEVENT_HELPER_PATH;
static DEFINE_SPINLOCK(sequence_lock);
#if defined(CONFIG_NET)
static struct sock *uevent_sock;
@@ -76,6 +77,34 @@ out:
return ret;
}
+/* Call an external helper executable. */
+static int uevent_call_helper(const char *subsystem, struct kobj_uevent_env *env)
+{
+ char *argv[3];
+ char *helper;
+ int retval;
+
+ helper = access_rcu_string(uevent_helper, UEVENT_HELPER_PATH_LEN, GFP_KERNEL);
+ if (!helper)
+ return -ENOMEM;
+
+ retval = -E2BIG;
+ argv[0] = helper;
+ argv[1] = (char *)subsystem;
+ argv[2] = NULL;
+ retval = add_uevent_var(env, "HOME=/");
+ if (retval)
+ goto error;
+ retval = add_uevent_var(env, "PATH=/sbin:/bin:/usr/sbin:/usr/bin");
+ if (retval)
+ goto error;
+
+ retval = call_usermodehelper(argv[0], argv, env->envp, UMH_WAIT_EXEC);
+error:
+ kfree(helper);
+ return retval;
+}
+
/**
* kobject_uevent_env - send an uevent with environmental data
*
@@ -243,23 +272,8 @@ int kobject_uevent_env(struct kobject *k
#endif
/* call uevent_helper, usually only enabled during early boot */
- if (uevent_helper[0]) {
- char *argv [3];
-
- argv [0] = uevent_helper;
- argv [1] = (char *)subsystem;
- argv [2] = NULL;
- retval = add_uevent_var(env, "HOME=/");
- if (retval)
- goto exit;
- retval = add_uevent_var(env,
- "PATH=/sbin:/bin:/usr/sbin:/usr/bin");
- if (retval)
- goto exit;
-
- retval = call_usermodehelper(argv[0], argv,
- env->envp, UMH_WAIT_EXEC);
- }
+ if (uevent_helper[0])
+ retval = uevent_call_helper(subsystem, env);
exit:
kfree(devpath);
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [9/11] SYSCTL: Add a mutex to the page_alloc zone order sysctl
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (7 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [8/11] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [10/11] SYSCTL: Use RCU protected sysctl for ocfs group add helper Andi Kleen
` (2 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: linux-kernel, paulmck, ebiederm
The zone list code clearly cannot tolerate concurrent writers (I couldn't find
any locks for that), so simply add a global mutex. No need for RCU in this case.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
mm/page_alloc.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
Index: linux-2.6.33-rc1-ak/mm/page_alloc.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/mm/page_alloc.c
+++ linux-2.6.33-rc1-ak/mm/page_alloc.c
@@ -2401,13 +2401,14 @@ int numa_zonelist_order_handler(ctl_tabl
{
char saved_string[NUMA_ZONELIST_ORDER_LEN];
int ret;
+ static DEFINE_MUTEX(zl_order_mutex);
+ mutex_lock(&zl_order_mutex);
if (write)
- strncpy(saved_string, (char*)table->data,
- NUMA_ZONELIST_ORDER_LEN);
+ strcpy(saved_string, (char*)table->data);
ret = proc_dostring(table, write, buffer, length, ppos);
if (ret)
- return ret;
+ goto out;
if (write) {
int oldval = user_zonelist_order;
if (__parse_numa_zonelist_order((char*)table->data)) {
@@ -2420,7 +2421,9 @@ int numa_zonelist_order_handler(ctl_tabl
} else if (oldval != user_zonelist_order)
build_all_zonelists();
}
- return 0;
+out:
+ mutex_unlock(&zl_order_mutex);
+ return ret;
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [10/11] SYSCTL: Use RCU protected sysctl for ocfs group add helper
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (8 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [9/11] SYSCTL: Add a mutex to the page_alloc zone order sysctl Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:20 ` [PATCH] [11/11] SYSCTL: Convert IRDA text sysctl to RCU Andi Kleen
2009-12-21 1:59 ` [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Eric W. Biederman
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: joel.becker, linux-kernel, paulmck, ebiederm
This avoids races with unlocked sysctl()
Cc: joel.becker@oracle.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
fs/ocfs2/stackglue.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
Index: linux-2.6.33-rc1-ak/fs/ocfs2/stackglue.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/fs/ocfs2/stackglue.c
+++ linux-2.6.33-rc1-ak/fs/ocfs2/stackglue.c
@@ -27,6 +27,7 @@
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/sysctl.h>
+#include <linux/rcustring.h>
#include "ocfs2_fs.h"
@@ -40,7 +41,7 @@ static struct ocfs2_locking_protocol *lp
static DEFINE_SPINLOCK(ocfs2_stack_lock);
static LIST_HEAD(ocfs2_stack_list);
static char cluster_stack_name[OCFS2_STACK_LABEL_LEN + 1];
-static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] = "/sbin/ocfs2_hb_ctl";
+static char *ocfs2_hb_ctl_path = "/sbin/ocfs2_hb_ctl";
/*
* The stack currently in use. If not null, active_stack->sp_count > 0,
@@ -395,8 +396,15 @@ static void ocfs2_leave_group(const char
{
int ret;
char *argv[5], *envp[3];
+ char *helper;
- argv[0] = ocfs2_hb_ctl_path;
+ helper = access_rcu_string(ocfs2_hb_ctl_path, OCFS2_MAX_HB_CTL_PATH, GFP_KERNEL);
+ if (!helper) {
+ printk(KERN_ERR "ocfs2_leave_group: no memory\n");
+ return;
+ }
+
+ argv[0] = helper;
argv[1] = "-K";
argv[2] = "-u";
argv[3] = (char *)group;
@@ -414,6 +422,7 @@ static void ocfs2_leave_group(const char
"\"%s %s %s %s\"\n",
ret, argv[0], argv[1], argv[2], argv[3]);
}
+ kfree(helper);
}
/*
@@ -621,10 +630,10 @@ error:
static ctl_table ocfs2_nm_table[] = {
{
.procname = "hb_ctl_path",
- .data = ocfs2_hb_ctl_path,
+ .data = &ocfs2_hb_ctl_path,
.maxlen = OCFS2_MAX_HB_CTL_PATH,
.mode = 0644,
- .proc_handler = proc_dostring,
+ .proc_handler = proc_rcu_string,
},
{ }
};
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [11/11] SYSCTL: Convert IRDA text sysctl to RCU
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (9 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [10/11] SYSCTL: Use RCU protected sysctl for ocfs group add helper Andi Kleen
@ 2009-12-21 1:20 ` Andi Kleen
2009-12-21 1:59 ` [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Eric W. Biederman
11 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 1:20 UTC (permalink / raw)
To: samuel, linux-kernel, paulmck, ebiederm
This avoids races with lockless sysctl
Cc: samuel@sortiz.org
---
net/irda/irlmp.c | 3 +--
net/irda/irsysctl.c | 9 ++++++---
2 files changed, 7 insertions(+), 5 deletions(-)
Index: linux-2.6.33-rc1-ak/net/irda/irlmp.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/net/irda/irlmp.c
+++ linux-2.6.33-rc1-ak/net/irda/irlmp.c
@@ -56,7 +56,7 @@ int sysctl_discovery = 0;
int sysctl_discovery_timeout = 3; /* 3 seconds by default */
int sysctl_discovery_slots = 6; /* 6 slots by default */
int sysctl_lap_keepalive_time = LM_IDLE_TIMEOUT * 1000 / HZ;
-char sysctl_devname[65];
+char *sysctl_devname = "Linux";
const char *irlmp_reasons[] = {
"ERROR, NOT USED",
@@ -101,7 +101,6 @@ int __init irlmp_init(void)
spin_lock_init(&irlmp->cachelog->hb_spinlock);
irlmp->last_lsap_sel = 0x0f; /* Reserved 0x00-0x0f */
- strcpy(sysctl_devname, "Linux");
init_timer(&irlmp->discovery_timer);
Index: linux-2.6.33-rc1-ak/net/irda/irsysctl.c
===================================================================
--- linux-2.6.33-rc1-ak.orig/net/irda/irsysctl.c
+++ linux-2.6.33-rc1-ak/net/irda/irsysctl.c
@@ -27,6 +27,7 @@
#include <linux/ctype.h>
#include <linux/sysctl.h>
#include <linux/init.h>
+#include <linux/rcupdate.h>
#include <net/irda/irda.h> /* irda_debug */
#include <net/irda/irlmp.h>
@@ -38,7 +39,7 @@ extern int sysctl_discovery_slots;
extern int sysctl_discovery_timeout;
extern int sysctl_slot_timeout;
extern int sysctl_fast_poll_increase;
-extern char sysctl_devname[];
+extern char *sysctl_devname;
extern int sysctl_max_baud_rate;
extern int sysctl_min_tx_turn_time;
extern int sysctl_max_tx_data_size;
@@ -78,13 +79,15 @@ static int do_devname(ctl_table *table,
{
int ret;
- ret = proc_dostring(table, write, buffer, lenp, ppos);
+ ret = proc_rcu_string(table, write, buffer, lenp, ppos);
if (ret == 0 && write) {
struct ias_value *val;
+ rcu_read_lock();
val = irias_new_string_value(sysctl_devname);
if (val)
irias_object_change_attribute("Device", "DeviceName", val);
+ rcu_read_unlock();
}
return ret;
}
@@ -121,7 +124,7 @@ static ctl_table irda_table[] = {
},
{
.procname = "devname",
- .data = sysctl_devname,
+ .data = &sysctl_devname,
.maxlen = 65,
.mode = 0644,
.proc_handler = do_devname,
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
` (10 preceding siblings ...)
2009-12-21 1:20 ` [PATCH] [11/11] SYSCTL: Convert IRDA text sysctl to RCU Andi Kleen
@ 2009-12-21 1:59 ` Eric W. Biederman
2009-12-21 2:04 ` Andi Kleen
11 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2009-12-21 1:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, paulmck
Andi Kleen <andi@firstfloor.org> writes:
> With BKL-less sysctls most of the writable string sysctls are racy. There
> is no locking on the reader side, so a reader could see an inconsistent
> string or worse miss the terminating null and walk of beyond it.
The walk will only extend up to the maximum length of the string.
So the worst case really is inconsistent data.
This is an unfortunate corner case. This is not a regression as this
has been the way things have worked for years. So probably 2.6.34
material.
> This patch kit adds a new "rcu string" variant to avoid these
> problems and convers the racy users. One the writer side the strings are
> always copied to new memory and the readers use rcu_read_lock()
> to get a stable view. For readers who access the string over
> sleeps the reader copies the string.
I will have to look more after the holidays. This rcu_string looks like
it introduces allocations on paths that did not use them before, which
has me wondering a bit.
Eric
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
2009-12-21 1:59 ` [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Eric W. Biederman
@ 2009-12-21 2:04 ` Andi Kleen
2009-12-21 2:31 ` Eric W. Biederman
0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 2:04 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andi Kleen, linux-kernel, paulmck
On Sun, Dec 20, 2009 at 05:59:59PM -0800, Eric W. Biederman wrote:
> Andi Kleen <andi@firstfloor.org> writes:
>
> > With BKL-less sysctls most of the writable string sysctls are racy. There
> > is no locking on the reader side, so a reader could see an inconsistent
> > string or worse miss the terminating null and walk of beyond it.
>
> The walk will only extend up to the maximum length of the string.
> So the worst case really is inconsistent data.
It could still miss the 0 byte and walk out, can't it?
> This is an unfortunate corner case. This is not a regression as this
> has been the way things have worked for years. So probably 2.6.34
> material.
The one that's a clear regression is the core pattern one, that
was protected before by the BKL. A lot of others were always
broken yes.
>
> > This patch kit adds a new "rcu string" variant to avoid these
> > problems and convers the racy users. One the writer side the strings are
> > always copied to new memory and the readers use rcu_read_lock()
> > to get a stable view. For readers who access the string over
> > sleeps the reader copies the string.
>
> I will have to look more after the holidays. This rcu_string looks like
> it introduces allocations on paths that did not use them before, which
> has me wondering a bit.
On the reader side about all of them allocated before, e.g. for
call_usermodehelper.
If the strings were made a bit smaller this could be also
put on the stack, but I didn't dare for 256 bytes.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
2009-12-21 2:04 ` Andi Kleen
@ 2009-12-21 2:31 ` Eric W. Biederman
2009-12-21 3:21 ` Andi Kleen
0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2009-12-21 2:31 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, paulmck
Andi Kleen <andi@firstfloor.org> writes:
> On Sun, Dec 20, 2009 at 05:59:59PM -0800, Eric W. Biederman wrote:
>> Andi Kleen <andi@firstfloor.org> writes:
>>
>> > With BKL-less sysctls most of the writable string sysctls are racy. There
>> > is no locking on the reader side, so a reader could see an inconsistent
>> > string or worse miss the terminating null and walk of beyond it.
>>
>> The walk will only extend up to the maximum length of the string.
>> So the worst case really is inconsistent data.
>
> It could still miss the 0 byte and walk out, can't it?
Looking again. Yes it appears there is a small vulnerability there.
The code does:
len = strlen(data);
if (len > maxlen)
len = maxlen;
So we should be calling:
len = strnlen(data, maxlen);
At which point we won't be able to walk out.
The write side appears to be in need of strnlen_user
as well, so it does not walk all of user space looking
for null byte.
>> This is an unfortunate corner case. This is not a regression as this
>> has been the way things have worked for years. So probably 2.6.34
>> material.
>
> The one that's a clear regression is the core pattern one, that
> was protected before by the BKL. A lot of others were always
> broken yes.
Nope. The core pattern just thought it was protected by BKL. I did
not change the /proc/sys code path to remove the BKL. I don't know
if we ever took the BKL on the /proc/sys codepath.
I remember looking at the core pattern earlier and my memory is that
sysctl is new enough that core pattern was not protected by the BKL on the
/proc/sys path when it was introduced.
There was a lot of confusing code in the sys_sysctl code path (which
grabbed the BKL) so I expect people thought they were safe due to the
BKL when they were not.
So we have sysctl have locking problems, not new sysctl regressions.
>> > This patch kit adds a new "rcu string" variant to avoid these
>> > problems and convers the racy users. One the writer side the strings are
>> > always copied to new memory and the readers use rcu_read_lock()
>> > to get a stable view. For readers who access the string over
>> > sleeps the reader copies the string.
>>
>> I will have to look more after the holidays. This rcu_string looks like
>> it introduces allocations on paths that did not use them before, which
>> has me wondering a bit.
>
> On the reader side about all of them allocated before, e.g. for
> call_usermodehelper.
That sounds like less of an issue.
> If the strings were made a bit smaller this could be also
> put on the stack, but I didn't dare for 256 bytes.
Hmm. rcu wise that sounds wrong, but I haven't looked into your
cool new data structure yet.
Eric
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
2009-12-21 2:31 ` Eric W. Biederman
@ 2009-12-21 3:21 ` Andi Kleen
0 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 3:21 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andi Kleen, linux-kernel, paulmck
> So we have sysctl have locking problems, not new sysctl regressions.
Ok.
> > If the strings were made a bit smaller this could be also
> > put on the stack, but I didn't dare for 256 bytes.
>
> Hmm. rcu wise that sounds wrong, but I haven't looked into your
What sounds wrong?
The reason for the copies is that when the reader sleeps
rcu_read_lock() cannot be used to protect the string completely,
so it copies instead.
The alternative would have been to use SRCU, but I didn't like that.
> cool new data structure yet.
It's not really particularly new or cool, it's just a simple RCU wrapper
around a string.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
@ 2009-12-21 7:57 Alexey Dobriyan
2009-12-21 10:50 ` Andi Kleen
0 siblings, 1 reply; 25+ messages in thread
From: Alexey Dobriyan @ 2009-12-21 7:57 UTC (permalink / raw)
To: andi; +Cc: linux-kernel, ebiederm, paulmck
> new = alloc_rcu_string(MAX_GLOBAL_SIZE, GFP_KERNEL);
I think adding failure mode is not right.
We can get away with per-sysctl-string spinlock and not introduce yet
another API.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls
2009-12-21 7:57 Alexey Dobriyan
@ 2009-12-21 10:50 ` Andi Kleen
0 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2009-12-21 10:50 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: andi, linux-kernel, ebiederm, paulmck
On Mon, Dec 21, 2009 at 09:57:32AM +0200, Alexey Dobriyan wrote:
> > new = alloc_rcu_string(MAX_GLOBAL_SIZE, GFP_KERNEL);
>
> I think adding failure mode is not right.
> We can get away with per-sysctl-string spinlock and not introduce yet
> another API.
You mean when running out of memory here?
When this runs out of memory the system will be not usable anyways
and on the last legs and likely in a deadly swap storm.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [1/11] Add rcustring ADT for RCU protected strings
2009-12-21 1:20 ` [PATCH] [1/11] Add rcustring ADT for RCU protected strings Andi Kleen
@ 2009-12-22 2:46 ` Paul E. McKenney
2009-12-22 10:05 ` Andi Kleen
0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2009-12-22 2:46 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, ebiederm
On Mon, Dec 21, 2009 at 02:20:22AM +0100, Andi Kleen wrote:
>
> Add a little ADT for RCU protected strings. RCU is a convenient
> way to manage modifications to read-often-write-seldom strings.
> Add some helper functions to make this more straight forward.
>
> Used by follow-on patches to implement RCU protected sysctl strings.
>
> * General rules:
> * Reader has to use rcu_read_lock() and not sleep while accessing the string,
> * or alternatively get a copy with access_rcu_string()
> * Writer needs an own lock against each other.
> * Each modification should allocate a new string first and free the old
> * one with free_rcu_string()
> * In writers use rcu_assign_pointer to publicize the updated string to
> * global readers.
> * The size passed to access_rcu_string() must be the same as passed
> * to alloc_rcu_string() and be known in advance. Don't use strlen()!
> *
> * For sysctls also see proc_rcu_string() as a convenient wrapper
Looks very interesting, possibly extremely useful!
I do have a couple of questions below, one a debugging suggestion, and
the other either a bug or extreme confusion on my part.
Thanx, Paul
> Cc: paulmck@linux.vnet.ibm.com
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ---
> include/linux/rcustring.h | 20 ++++++++
> lib/Makefile | 3 -
> lib/rcustring.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 134 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.33-rc1-ak/include/linux/rcustring.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6.33-rc1-ak/include/linux/rcustring.h
> @@ -0,0 +1,20 @@
> +#ifndef _RCUSTRING_H
> +#define _RCUSTRING_H 1
> +
> +#include <linux/gfp.h>
> +#include <linux/rcupdate.h>
> +
> +/*
> + * Simple wrapper to manage strings by RCU.
> + */
> +
> +extern char *alloc_rcu_string(int size, gfp_t gfp);
> +extern void free_rcu_string(const char *string);
> +
> +/*
> + * size must be the same as alloc_rcu_string, don't
> + * use strlen on str!
> + */
> +extern char *access_rcu_string(const char *str, int size, gfp_t gfp);
> +
> +#endif
> Index: linux-2.6.33-rc1-ak/lib/rcustring.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.33-rc1-ak/lib/rcustring.c
> @@ -0,0 +1,112 @@
> +/*
> + * Manage strings by Read-Copy-Update. This is useful for global strings
> + * that change only very rarely, but are read often.
> + *
> + * Author: Andi Kleen
> + *
> + * General rules:
> + * Reader has to use rcu_read_lock() and not sleep while accessing the string,
> + * or alternatively get a copy with access_rcu_string()
> + * Writer needs an own lock against each other.
> + * Each modification should allocate a new string first and free the old
> + * one with free_rcu_string()
> + * In writers use rcu_assign_pointer to publicize the updated string to
> + * global readers.
> + * The size passed to access_rcu_string() must be the same as passed
> + * to alloc_rcu_string() and be known in advance. Don't use strlen()!
> + *
> + * For sysctls also see proc_rcu_string() as a convenient wrapper
> + *
> + * Typical example:
> + * #define MAX_GLOBAL_SIZE ...
> + * char *global = "default";
> + *
> + * Rare writer:
> + * char *old, *new;
> + * DECLARE_MUTEX(writer_lock);
> + * mutex_lock(&writer_lock);
> + * new = alloc_rcu_string(MAX_GLOBAL_SIZE, GFP_KERNEL);
> + * if (!new) {
> + * mutex_unlock(&writer_lock);
> + * return -ENOMEM;
> + * }
> + * strlcpy(new, new_value, MAX_GLOBAL_SIZE);
> + * old = global;
> + * rcu_assign_pointer(global, new);
> + * mutex_unlock(&writer_lock);
> + * free_rcu_string(old);
> + *
> + * Sleepy reader:
> + * char *str = access_rcu_string(global, MAX_GLOBAL_SIZE, GFP_KERNEL);
> + * if (!str)
> + * return -ENOMEM;
> + * ... use str while sleeping ...
> + * kfree(string);
> + *
> + * Non sleepy reader:
> + * rcu_read_lock();
> + * ... read str ...
> + * rcu_read_unlock();
> + *
> + * Note this code could be relatively easily generalized for other kinds
> + * of non-atomic data, but this initial version only handles strings.
> + * Only need to change the strlcpy() below to memcpy()
> + */
> +#include <linux/kernel.h>
> +#include <linux/rcustring.h>
> +#include <linux/slab.h>
> +#include <linux/rcupdate.h>
> +#include <linux/module.h>
> +#include <linux/string.h>
> +
> +struct rcu_string {
> + struct rcu_head rcu;
> + char str[0];
My guess is that you will sooner or later need a size field, perhaps
under some debug config parameter.
> +};
> +
> +char *alloc_rcu_string(int size, gfp_t gfp)
> +{
> + struct rcu_string *rs = kmalloc(sizeof(struct rcu_string) + size, gfp);
> + if (!rs)
> + return NULL;
> + return rs->str;
> +}
> +EXPORT_SYMBOL(alloc_rcu_string);
> +
> +static void do_free_rcu_string(struct rcu_head *h)
> +{
> + kfree(container_of(h, struct rcu_string, rcu));
> +}
> +
> +static inline struct rcu_string *str_to_rcustr(const char *str)
> +{
> + /*
> + * Opencoded container_of because the strict type checking
> + * in normal container_of cannot deal with char str[0] vs char *str.
> + */
> + return (struct rcu_string *)(str - offsetof(struct rcu_string, str));
> +}
> +
> +void free_rcu_string(const char *str)
> +{
> + struct rcu_string *rs = str_to_rcustr(str);
> + call_rcu(&rs->rcu, do_free_rcu_string);
> +}
> +EXPORT_SYMBOL(free_rcu_string);
> +
> +/*
> + * Get a local private copy of a RCU protected string.
> + * Mostly useful to get a string that is stable while sleeping.
> + * Caller must free returned string.
> + */
> +char *access_rcu_string(const char *str, int size, gfp_t gfp)
> +{
> + char *copy = kmalloc(size, gfp);
> + if (!str)
> + return NULL;
Assuming that "str" points to the "str" field of a struct rcu_string,
what prevents a grace period from elapsing at this point, freeing the
"str" out from under us?
> + rcu_read_lock();
> + strlcpy(copy, str, size);
> + rcu_read_unlock();
> + return copy;
> +}
> +EXPORT_SYMBOL(access_rcu_string);
> Index: linux-2.6.33-rc1-ak/lib/Makefile
> ===================================================================
> --- linux-2.6.33-rc1-ak.orig/lib/Makefile
> +++ linux-2.6.33-rc1-ak/lib/Makefile
> @@ -12,7 +12,8 @@ lib-y := ctype.o string.o vsprintf.o cmd
> idr.o int_sqrt.o extable.o prio_tree.o \
> sha1.o irq_regs.o reciprocal_div.o argv_split.o \
> proportions.o prio_heap.o ratelimit.o show_mem.o \
> - is_single_threaded.o plist.o decompress.o flex_array.o
> + is_single_threaded.o plist.o decompress.o flex_array.o \
> + rcustring.o
>
> lib-$(CONFIG_MMU) += ioremap.o
> lib-$(CONFIG_SMP) += cpumask.o
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings
2009-12-21 1:20 ` [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
@ 2009-12-22 2:51 ` Paul E. McKenney
2009-12-22 3:00 ` Eric W. Biederman
0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2009-12-22 2:51 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, ebiederm
On Mon, Dec 21, 2009 at 02:20:24AM +0100, Andi Kleen wrote:
>
> Add a helper to use the new rcu strings for managing access
> to text sysctls. Conversions will be in follow-on patches.
>
> An alternative would be to use seqlocks here, but RCU seemed
> cleaner.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Using the below as an example of my concern about access_rcu_string(), FYI.
> ---
> include/linux/sysctl.h | 2 +
> kernel/sysctl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
> kernel/sysctl_check.c | 1
> 3 files changed, 69 insertions(+)
>
> Index: linux-2.6.33-rc1-ak/include/linux/sysctl.h
> ===================================================================
> --- linux-2.6.33-rc1-ak.orig/include/linux/sysctl.h
> +++ linux-2.6.33-rc1-ak/include/linux/sysctl.h
> @@ -969,6 +969,8 @@ typedef int proc_handler (struct ctl_tab
>
> extern int proc_dostring(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> +extern int proc_rcu_string(struct ctl_table *, int,
> + void __user *, size_t *, loff_t *);
> extern int proc_dointvec(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> extern int proc_dointvec_minmax(struct ctl_table *, int,
> Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
> +++ linux-2.6.33-rc1-ak/kernel/sysctl.c
> @@ -50,6 +50,7 @@
> #include <linux/ftrace.h>
> #include <linux/slow-work.h>
> #include <linux/perf_event.h>
> +#include <linux/rcustring.h>
>
> #include <asm/uaccess.h>
> #include <asm/processor.h>
> @@ -2016,6 +2017,60 @@ static int _proc_do_string(void* data, i
> }
>
> /**
> + * proc_rcu_string - sysctl string with rcu protection
> + * @table: the sysctl table
> + * @write: %TRUE if this is a write to the sysctl file
> + * @buffer: the user buffer
> + * @lenp: the size of the user buffer
> + * @ppos: file position
> + *
> + * Handle a string sysctl similar to proc_dostring.
> + * The main difference is that the data pointer in the table
> + * points to a pointer to a string. The string should be initially
> + * pointing to a statically allocated (as a C object, not on the heap)
> + * default. When it is replaced old uses will be protected by
> + * RCU. The reader should use rcu_read_lock()/unlock() or
> + * access_rcu_string().
> + */
> +int proc_rcu_string(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + int ret;
> +
> + if (write) {
> + /* protect writers against each other */
> + static DEFINE_MUTEX(rcu_string_mutex);
> + char *old;
> + char *new;
> +
> + new = alloc_rcu_string(table->maxlen, GFP_KERNEL);
> + if (!new)
> + return -ENOMEM;
> + mutex_lock(&rcu_string_mutex);
> + old = *(char **)(table->data);
> + strcpy(new, old);
> + ret = _proc_do_string(new, table->maxlen, write, buffer, lenp, ppos);
> + rcu_assign_pointer(*(char **)(table->data), new);
> + /*
> + * For the first initialization allow constant strings.
> + */
> + if (!kernel_address((unsigned long)old))
> + free_rcu_string(old);
> + mutex_unlock(&rcu_string_mutex);
> + } else {
> + char *str;
> +
> + str = access_rcu_string(*(char **)(table->data), table->maxlen,
> + GFP_KERNEL);
So the above statement picks up table->data, then some other CPU comes
in and executes the "write" side of this "if" statement, we get
preempted before access_rcu_string() enters its RCU read-side critical
section, the grace period elapse, we resume, and ... ouch!
One trick would be to make access_rcu_string() be a macro that does
first access to its first argument in an RCU read-side critical section.
Alternatively, pass in the address of the pointer, rather than the
pointer itself.
Or explain to me how I am confused.
> + if (!str)
> + return -ENOMEM;
> + ret = _proc_do_string(str, table->maxlen, write, buffer, lenp, ppos);
> + kfree(str);
> + }
> + return ret;
> +}
> +
> +/**
> * proc_dostring - read a string sysctl
> * @table: the sysctl table
> * @write: %TRUE if this is a write to the sysctl file
> @@ -2030,6 +2085,10 @@ static int _proc_do_string(void* data, i
> * and a newline '\n' is added. It is truncated if the buffer is
> * not large enough.
> *
> + * WARNING: this should be only used for read only strings
> + * or when you have a wrapper with special locking. Otherwise
> + * use proc_rcu_string to avoid races with the consumer.
> + *
> * Returns 0 on success.
> */
> int proc_dostring(struct ctl_table *table, int write,
> @@ -2614,6 +2673,12 @@ int proc_dostring(struct ctl_table *tabl
> return -ENOSYS;
> }
>
> +int proc_rcu_string(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + return -ENOSYS;
> +}
> +
> int proc_dointvec(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos)
> {
> @@ -2670,6 +2735,7 @@ EXPORT_SYMBOL(proc_dointvec_minmax);
> EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
> EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
> EXPORT_SYMBOL(proc_dostring);
> +EXPORT_SYMBOL(proc_rcu_string);
> EXPORT_SYMBOL(proc_doulongvec_minmax);
> EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
> EXPORT_SYMBOL(register_sysctl_table);
> Index: linux-2.6.33-rc1-ak/kernel/sysctl_check.c
> ===================================================================
> --- linux-2.6.33-rc1-ak.orig/kernel/sysctl_check.c
> +++ linux-2.6.33-rc1-ak/kernel/sysctl_check.c
> @@ -131,6 +131,7 @@ int sysctl_check_table(struct nsproxy *n
> set_fail(&fail, table, "Directory with extra2");
> } else {
> if ((table->proc_handler == proc_dostring) ||
> + (table->proc_handler == proc_rcu_string) ||
> (table->proc_handler == proc_dointvec) ||
> (table->proc_handler == proc_dointvec_minmax) ||
> (table->proc_handler == proc_dointvec_jiffies) ||
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings
2009-12-22 2:51 ` Paul E. McKenney
@ 2009-12-22 3:00 ` Eric W. Biederman
2009-12-22 7:44 ` Paul E. McKenney
0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2009-12-22 3:00 UTC (permalink / raw)
To: paulmck; +Cc: Andi Kleen, linux-kernel
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> On Mon, Dec 21, 2009 at 02:20:24AM +0100, Andi Kleen wrote:
>>
>> Add a helper to use the new rcu strings for managing access
>> to text sysctls. Conversions will be in follow-on patches.
>>
>> An alternative would be to use seqlocks here, but RCU seemed
>> cleaner.
>>
>> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> Using the below as an example of my concern about access_rcu_string(), FYI.
>
>> ---
>> include/linux/sysctl.h | 2 +
>> kernel/sysctl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
>> kernel/sysctl_check.c | 1
>> 3 files changed, 69 insertions(+)
>>
>> Index: linux-2.6.33-rc1-ak/include/linux/sysctl.h
>> ===================================================================
>> --- linux-2.6.33-rc1-ak.orig/include/linux/sysctl.h
>> +++ linux-2.6.33-rc1-ak/include/linux/sysctl.h
>> @@ -969,6 +969,8 @@ typedef int proc_handler (struct ctl_tab
>>
>> extern int proc_dostring(struct ctl_table *, int,
>> void __user *, size_t *, loff_t *);
>> +extern int proc_rcu_string(struct ctl_table *, int,
>> + void __user *, size_t *, loff_t *);
>> extern int proc_dointvec(struct ctl_table *, int,
>> void __user *, size_t *, loff_t *);
>> extern int proc_dointvec_minmax(struct ctl_table *, int,
>> Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
>> ===================================================================
>> --- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
>> +++ linux-2.6.33-rc1-ak/kernel/sysctl.c
>> @@ -50,6 +50,7 @@
>> #include <linux/ftrace.h>
>> #include <linux/slow-work.h>
>> #include <linux/perf_event.h>
>> +#include <linux/rcustring.h>
>>
>> #include <asm/uaccess.h>
>> #include <asm/processor.h>
>> @@ -2016,6 +2017,60 @@ static int _proc_do_string(void* data, i
>> }
>>
>> /**
>> + * proc_rcu_string - sysctl string with rcu protection
>> + * @table: the sysctl table
>> + * @write: %TRUE if this is a write to the sysctl file
>> + * @buffer: the user buffer
>> + * @lenp: the size of the user buffer
>> + * @ppos: file position
>> + *
>> + * Handle a string sysctl similar to proc_dostring.
>> + * The main difference is that the data pointer in the table
>> + * points to a pointer to a string. The string should be initially
>> + * pointing to a statically allocated (as a C object, not on the heap)
>> + * default. When it is replaced old uses will be protected by
>> + * RCU. The reader should use rcu_read_lock()/unlock() or
>> + * access_rcu_string().
>> + */
>> +int proc_rcu_string(struct ctl_table *table, int write,
>> + void __user *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> + int ret;
>> +
>> + if (write) {
>> + /* protect writers against each other */
>> + static DEFINE_MUTEX(rcu_string_mutex);
>> + char *old;
>> + char *new;
>> +
>> + new = alloc_rcu_string(table->maxlen, GFP_KERNEL);
>> + if (!new)
>> + return -ENOMEM;
>> + mutex_lock(&rcu_string_mutex);
>> + old = *(char **)(table->data);
>> + strcpy(new, old);
>> + ret = _proc_do_string(new, table->maxlen, write, buffer, lenp, ppos);
>> + rcu_assign_pointer(*(char **)(table->data), new);
>> + /*
>> + * For the first initialization allow constant strings.
>> + */
>> + if (!kernel_address((unsigned long)old))
>> + free_rcu_string(old);
>> + mutex_unlock(&rcu_string_mutex);
>> + } else {
>> + char *str;
>> +
>> + str = access_rcu_string(*(char **)(table->data), table->maxlen,
>> + GFP_KERNEL);
>
> So the above statement picks up table->data, then some other CPU comes
> in and executes the "write" side of this "if" statement, we get
> preempted before access_rcu_string() enters its RCU read-side critical
> section, the grace period elapse, we resume, and ... ouch!
>
> One trick would be to make access_rcu_string() be a macro that does
> first access to its first argument in an RCU read-side critical section.
> Alternatively, pass in the address of the pointer, rather than the
> pointer itself.
>
> Or explain to me how I am confused.
That sounds correct to me. There is also the missing rcu_dereference.
Which is less important but it would make clear that access_rcu_string
does the dereference outside of the rcu critical section.
Eric
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings
2009-12-22 3:00 ` Eric W. Biederman
@ 2009-12-22 7:44 ` Paul E. McKenney
0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2009-12-22 7:44 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andi Kleen, linux-kernel
On Mon, Dec 21, 2009 at 07:00:44PM -0800, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>
> > On Mon, Dec 21, 2009 at 02:20:24AM +0100, Andi Kleen wrote:
> >>
> >> Add a helper to use the new rcu strings for managing access
> >> to text sysctls. Conversions will be in follow-on patches.
> >>
> >> An alternative would be to use seqlocks here, but RCU seemed
> >> cleaner.
> >>
> >> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> >
> > Using the below as an example of my concern about access_rcu_string(), FYI.
> >
> >> ---
> >> include/linux/sysctl.h | 2 +
> >> kernel/sysctl.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
> >> kernel/sysctl_check.c | 1
> >> 3 files changed, 69 insertions(+)
> >>
> >> Index: linux-2.6.33-rc1-ak/include/linux/sysctl.h
> >> ===================================================================
> >> --- linux-2.6.33-rc1-ak.orig/include/linux/sysctl.h
> >> +++ linux-2.6.33-rc1-ak/include/linux/sysctl.h
> >> @@ -969,6 +969,8 @@ typedef int proc_handler (struct ctl_tab
> >>
> >> extern int proc_dostring(struct ctl_table *, int,
> >> void __user *, size_t *, loff_t *);
> >> +extern int proc_rcu_string(struct ctl_table *, int,
> >> + void __user *, size_t *, loff_t *);
> >> extern int proc_dointvec(struct ctl_table *, int,
> >> void __user *, size_t *, loff_t *);
> >> extern int proc_dointvec_minmax(struct ctl_table *, int,
> >> Index: linux-2.6.33-rc1-ak/kernel/sysctl.c
> >> ===================================================================
> >> --- linux-2.6.33-rc1-ak.orig/kernel/sysctl.c
> >> +++ linux-2.6.33-rc1-ak/kernel/sysctl.c
> >> @@ -50,6 +50,7 @@
> >> #include <linux/ftrace.h>
> >> #include <linux/slow-work.h>
> >> #include <linux/perf_event.h>
> >> +#include <linux/rcustring.h>
> >>
> >> #include <asm/uaccess.h>
> >> #include <asm/processor.h>
> >> @@ -2016,6 +2017,60 @@ static int _proc_do_string(void* data, i
> >> }
> >>
> >> /**
> >> + * proc_rcu_string - sysctl string with rcu protection
> >> + * @table: the sysctl table
> >> + * @write: %TRUE if this is a write to the sysctl file
> >> + * @buffer: the user buffer
> >> + * @lenp: the size of the user buffer
> >> + * @ppos: file position
> >> + *
> >> + * Handle a string sysctl similar to proc_dostring.
> >> + * The main difference is that the data pointer in the table
> >> + * points to a pointer to a string. The string should be initially
> >> + * pointing to a statically allocated (as a C object, not on the heap)
> >> + * default. When it is replaced old uses will be protected by
> >> + * RCU. The reader should use rcu_read_lock()/unlock() or
> >> + * access_rcu_string().
> >> + */
> >> +int proc_rcu_string(struct ctl_table *table, int write,
> >> + void __user *buffer, size_t *lenp, loff_t *ppos)
> >> +{
> >> + int ret;
> >> +
> >> + if (write) {
> >> + /* protect writers against each other */
> >> + static DEFINE_MUTEX(rcu_string_mutex);
> >> + char *old;
> >> + char *new;
> >> +
> >> + new = alloc_rcu_string(table->maxlen, GFP_KERNEL);
> >> + if (!new)
> >> + return -ENOMEM;
> >> + mutex_lock(&rcu_string_mutex);
> >> + old = *(char **)(table->data);
> >> + strcpy(new, old);
> >> + ret = _proc_do_string(new, table->maxlen, write, buffer, lenp, ppos);
> >> + rcu_assign_pointer(*(char **)(table->data), new);
> >> + /*
> >> + * For the first initialization allow constant strings.
> >> + */
> >> + if (!kernel_address((unsigned long)old))
> >> + free_rcu_string(old);
> >> + mutex_unlock(&rcu_string_mutex);
> >> + } else {
> >> + char *str;
> >> +
> >> + str = access_rcu_string(*(char **)(table->data), table->maxlen,
> >> + GFP_KERNEL);
> >
> > So the above statement picks up table->data, then some other CPU comes
> > in and executes the "write" side of this "if" statement, we get
> > preempted before access_rcu_string() enters its RCU read-side critical
> > section, the grace period elapse, we resume, and ... ouch!
> >
> > One trick would be to make access_rcu_string() be a macro that does
> > first access to its first argument in an RCU read-side critical section.
> > Alternatively, pass in the address of the pointer, rather than the
> > pointer itself.
> >
> > Or explain to me how I am confused.
>
> That sounds correct to me. There is also the missing rcu_dereference.
>
> Which is less important but it would make clear that access_rcu_string
> does the dereference outside of the rcu critical section.
Good point, there does indeed need to be an rcu_dereference() in any case.
Thanx, Paul
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [1/11] Add rcustring ADT for RCU protected strings
2009-12-22 2:46 ` Paul E. McKenney
@ 2009-12-22 10:05 ` Andi Kleen
2009-12-22 20:59 ` Paul E. McKenney
0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2009-12-22 10:05 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Andi Kleen, linux-kernel, ebiederm
On Mon, Dec 21, 2009 at 06:46:35PM -0800, Paul E. McKenney wrote:
> My guess is that you will sooner or later need a size field, perhaps
> under some debug config parameter.
That can be gotten from ksize() if needed.
> > +/*
> > + * Get a local private copy of a RCU protected string.
> > + * Mostly useful to get a string that is stable while sleeping.
> > + * Caller must free returned string.
> > + */
> > +char *access_rcu_string(const char *str, int size, gfp_t gfp)
> > +{
> > + char *copy = kmalloc(size, gfp);
> > + if (!str)
> > + return NULL;
>
> Assuming that "str" points to the "str" field of a struct rcu_string,
> what prevents a grace period from elapsing at this point, freeing the
> "str" out from under us?
Yes, that's broken thanks. I'll move the reference into the read lock
section.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [8/11] SYSCTL: Convert hotplug helper string to proc_rcu_string()
2009-12-21 1:20 ` [PATCH] [8/11] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
@ 2009-12-22 19:03 ` Greg KH
0 siblings, 0 replies; 25+ messages in thread
From: Greg KH @ 2009-12-22 19:03 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, paulmck, ebiederm
On Mon, Dec 21, 2009 at 02:20:29AM +0100, Andi Kleen wrote:
>
> This avoids races with lockless sysctl.
>
> I also moved the code into a separate function because the original
> was very long.
>
> Cc: greg@kroah.com
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [1/11] Add rcustring ADT for RCU protected strings
2009-12-22 10:05 ` Andi Kleen
@ 2009-12-22 20:59 ` Paul E. McKenney
0 siblings, 0 replies; 25+ messages in thread
From: Paul E. McKenney @ 2009-12-22 20:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, ebiederm
On Tue, Dec 22, 2009 at 11:05:53AM +0100, Andi Kleen wrote:
> On Mon, Dec 21, 2009 at 06:46:35PM -0800, Paul E. McKenney wrote:
> > My guess is that you will sooner or later need a size field, perhaps
> > under some debug config parameter.
>
> That can be gotten from ksize() if needed.
Fair enough!!!
> > > +/*
> > > + * Get a local private copy of a RCU protected string.
> > > + * Mostly useful to get a string that is stable while sleeping.
> > > + * Caller must free returned string.
> > > + */
> > > +char *access_rcu_string(const char *str, int size, gfp_t gfp)
> > > +{
> > > + char *copy = kmalloc(size, gfp);
> > > + if (!str)
> > > + return NULL;
> >
> > Assuming that "str" points to the "str" field of a struct rcu_string,
> > what prevents a grace period from elapsing at this point, freeing the
> > "str" out from under us?
>
> Yes, that's broken thanks. I'll move the reference into the read lock
> section.
Looking forward to seeing the updated version!
Thanx, Paul
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2009-12-22 20:59 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-21 1:20 [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Andi Kleen
2009-12-21 1:20 ` [PATCH] [1/11] Add rcustring ADT for RCU protected strings Andi Kleen
2009-12-22 2:46 ` Paul E. McKenney
2009-12-22 10:05 ` Andi Kleen
2009-12-22 20:59 ` Paul E. McKenney
2009-12-21 1:20 ` [PATCH] [2/11] Add a kernel_address() that works for data too Andi Kleen
2009-12-21 1:20 ` [PATCH] [3/11] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
2009-12-22 2:51 ` Paul E. McKenney
2009-12-22 3:00 ` Eric W. Biederman
2009-12-22 7:44 ` Paul E. McKenney
2009-12-21 1:20 ` [PATCH] [4/11] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
2009-12-21 1:20 ` [PATCH] [5/11] SYSCTL: Add call_usermodehelper_cleanup() Andi Kleen
2009-12-21 1:20 ` [PATCH] [6/11] SYSCTL: Convert modprobe_path to proc_rcu_string() Andi Kleen
2009-12-21 1:20 ` [PATCH] [7/11] SYSCTL: Convert poweroff_command to proc_rcu_string Andi Kleen
2009-12-21 1:20 ` [PATCH] [8/11] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
2009-12-22 19:03 ` Greg KH
2009-12-21 1:20 ` [PATCH] [9/11] SYSCTL: Add a mutex to the page_alloc zone order sysctl Andi Kleen
2009-12-21 1:20 ` [PATCH] [10/11] SYSCTL: Use RCU protected sysctl for ocfs group add helper Andi Kleen
2009-12-21 1:20 ` [PATCH] [11/11] SYSCTL: Convert IRDA text sysctl to RCU Andi Kleen
2009-12-21 1:59 ` [PATCH] [0/11] SYSCTL: Use RCU to avoid races with string sysctls Eric W. Biederman
2009-12-21 2:04 ` Andi Kleen
2009-12-21 2:31 ` Eric W. Biederman
2009-12-21 3:21 ` Andi Kleen
-- strict thread matches above, loose matches on Subject: below --
2009-12-21 7:57 Alexey Dobriyan
2009-12-21 10:50 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox