public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2
@ 2010-01-05  2:15 Andi Kleen
  2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


v2 version that addresses all the earlier review comments
minus patches that were already merged.

I think this one is ready for merge now. Andrew, could you please
take it?

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

With BKL-less sysctls most of the writable string sysctls are racy. There
is no locking on the reader side, so a reader could see an inconsistent
string or worse miss the terminating null and walk of beyond it.

This patch kit adds a new "rcu string" variant to avoid these 
problems and convers the racy users. One the writer side the strings are 
always copied to new memory and the readers use rcu_read_lock()
to get a stable view. For readers who access the string over
sleeps the reader copies the string. 

This is all hidden in a new generic "rcu_string" ADT which can be also 
used for other purposes.

This finally implements all the letters in RCU, most other users
leave out the 'C'.

I left some obscure users in architectures (sparc, mips) alone and audited
all of the others. The sparc reboot_cmd one has references to asm files
which I didn't want to touch and the mips variant seemd just too obscure.

All the others are not racy.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  5:32   ` Paul E. McKenney
                     ` (2 more replies)
  2010-01-05  2:15 ` [PATCH] [2/9] Add a kernel_address() that works for data too Andi Kleen
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: paulmck, ebiederm, paulmck, akpm, linux-kernel


Add a little ADT for RCU protected strings. RCU is a convenient
way to manage modifications to read-often-write-seldom strings.
Add some helper functions to make this more straight forward.

Used by follow-on patches to implement RCU protected sysctl strings.

 * General rules:
 * Reader has to use rcu_read_lock() and not sleep while accessing the string,
 * or alternatively get a copy with access_rcu_string()
 * Writer needs an own lock against each other.
 * Each modification should allocate a new string first and free the old
 * one with free_rcu_string()
 * In writers use rcu_assign_pointer to publicize the updated string to
 * global readers.
 * The size passed to access_rcu_string() must be the same as passed
 * to alloc_rcu_string() and be known in advance. Don't use strlen()!
 *
 * For sysctls also see proc_rcu_string() as a convenient wrapper

v2: Use rcu_dereference correctly

Cc: paulmck@linux.vnet.ibm.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 include/linux/rcustring.h |   20 ++++++++
 lib/Makefile              |    3 -
 lib/rcustring.c           |  115 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+), 1 deletion(-)

Index: linux-2.6.33-rc2-ak/include/linux/rcustring.h
===================================================================
--- /dev/null
+++ linux-2.6.33-rc2-ak/include/linux/rcustring.h
@@ -0,0 +1,20 @@
+#ifndef _RCUSTRING_H
+#define _RCUSTRING_H 1
+
+#include <linux/gfp.h>
+#include <linux/rcupdate.h>
+
+/*
+ * Simple wrapper to manage strings by RCU.
+ */
+
+extern char *alloc_rcu_string(int size, gfp_t gfp);
+extern void free_rcu_string(const char *string);
+
+/*
+ * size must be the same as alloc_rcu_string, don't
+ * use strlen on str!
+ */
+extern char *access_rcu_string(char **str, int size, gfp_t gfp);
+
+#endif
Index: linux-2.6.33-rc2-ak/lib/rcustring.c
===================================================================
--- /dev/null
+++ linux-2.6.33-rc2-ak/lib/rcustring.c
@@ -0,0 +1,115 @@
+/*
+ * Manage strings by Read-Copy-Update. This is useful for global strings
+ * that change only very rarely, but are read often.
+ *
+ * Author: Andi Kleen
+ *
+ * General rules:
+ * Reader has to use rcu_read_lock() and not sleep while accessing the string,
+ * or alternatively get a copy with access_rcu_string()
+ * Writer needs an own lock against each other.
+ * Each modification should allocate a new string first and free the old
+ * one with free_rcu_string()
+ * In writers use rcu_assign_pointer to publicize the updated string to
+ * global readers.
+ * The size passed to access_rcu_string() must be the same as passed
+ * to alloc_rcu_string() and be known in advance. Don't use strlen()!
+ * The pointer passed to access_rcu_string must be to the variable modified
+ * by the writer.
+ *
+ * For sysctls also see proc_rcu_string() as a convenient wrapper
+ *
+ * Typical example:
+ * #define MAX_GLOBAL_SIZE ...
+ * char *global = "default";
+ *
+ * Rare writer:
+ * char *old, *new;
+ * DECLARE_MUTEX(writer_lock);
+ * mutex_lock(&writer_lock);
+ * new = alloc_rcu_string(MAX_GLOBAL_SIZE, GFP_KERNEL);
+ * if (!new) {
+ *      mutex_unlock(&writer_lock);
+ *      return -ENOMEM;
+ * }
+ * strlcpy(new, new_value, MAX_GLOBAL_SIZE);
+ * old = global;
+ * rcu_assign_pointer(global, new);
+ * mutex_unlock(&writer_lock);
+ * free_rcu_string(old);
+ *
+ * Sleepy reader:
+ * char *str = access_rcu_string(&global, MAX_GLOBAL_SIZE, GFP_KERNEL);
+ * if (!str)
+ *     return -ENOMEM;
+ * ... use str while sleeping ...
+ * kfree(string);
+ *
+ * Non sleepy reader:
+ * rcu_read_lock();
+ * str = rcu_dereference(&global);
+ * ... use str without sleeping ...
+ * rcu_read_unlock();
+ *
+ * Note this code could be relatively easily generalized for other kinds
+ * of non-atomic data, but this initial version only handles strings.
+ * Only need to change the strlcpy() below to memcpy()
+ */
+#include <linux/kernel.h>
+#include <linux/rcustring.h>
+#include <linux/slab.h>
+#include <linux/rcupdate.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+struct rcu_string {
+	struct rcu_head rcu;
+	char str[0];
+};
+
+char *alloc_rcu_string(int size, gfp_t gfp)
+{
+	struct rcu_string *rs = kmalloc(sizeof(struct rcu_string) + size, gfp);
+	if (!rs)
+		return NULL;
+	return rs->str;
+}
+EXPORT_SYMBOL(alloc_rcu_string);
+
+static void do_free_rcu_string(struct rcu_head *h)
+{
+	kfree(container_of(h, struct rcu_string, rcu));
+}
+
+static inline struct rcu_string *str_to_rcustr(const char *str)
+{
+	/*
+	 * Opencoded container_of because the strict type checking
+	 * in normal container_of cannot deal with char str[0] vs char *str.
+	 */
+	return (struct rcu_string *)(str - offsetof(struct rcu_string, str));
+}
+
+void free_rcu_string(const char *str)
+{
+	struct rcu_string *rs = str_to_rcustr(str);
+	call_rcu(&rs->rcu, do_free_rcu_string);
+}
+EXPORT_SYMBOL(free_rcu_string);
+
+/*
+ * Get a local private copy of a RCU protected string.
+ * Mostly useful to get a string that is stable while sleeping.
+ * Caller must free returned string.
+ */
+char *access_rcu_string(char **str, int size, gfp_t gfp)
+{
+	char *copy = kmalloc(size, gfp);
+	if (!str)
+		return NULL;
+	rcu_read_lock();
+	strlcpy(copy, rcu_dereference(*str), size);
+	rcu_read_unlock();
+	return copy;
+}
+EXPORT_SYMBOL(access_rcu_string);
Index: linux-2.6.33-rc2-ak/lib/Makefile
===================================================================
--- linux-2.6.33-rc2-ak.orig/lib/Makefile
+++ linux-2.6.33-rc2-ak/lib/Makefile
@@ -12,7 +12,8 @@ lib-y := ctype.o string.o vsprintf.o cmd
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
-	 is_single_threaded.o plist.o decompress.o flex_array.o
+	 is_single_threaded.o plist.o decompress.o flex_array.o \
+	 rcustring.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [2/9] Add a kernel_address() that works for data too
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
  2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  8:44   ` Russell King
  2010-01-05  8:58   ` Sam Ravnborg
  2010-01-05  2:15 ` [PATCH] [3/9] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: linux-arch, ebiederm, paulmck, akpm, linux-kernel


Add a variant of kernel_text_address() that includes kernel data.

Assumes kernel is _text ... _end - init section. True everywhere?

Cc: linux-arch@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 include/linux/kernel.h |    1 +
 kernel/extable.c       |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

Index: linux-2.6.33-rc2-ak/include/linux/kernel.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/kernel.h
+++ linux-2.6.33-rc2-ak/include/linux/kernel.h
@@ -205,6 +205,7 @@ extern unsigned long long memparse(const
 extern int core_kernel_text(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
+extern int kernel_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
 
 struct pid;
Index: linux-2.6.33-rc2-ak/kernel/extable.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/extable.c
+++ linux-2.6.33-rc2-ak/kernel/extable.c
@@ -72,6 +72,18 @@ int core_kernel_text(unsigned long addr)
 	return 0;
 }
 
+static int core_kernel_address(unsigned long addr)
+{
+	if ((addr >= (unsigned long)_text &&
+	     addr <= (unsigned long)_end)) {
+		if (addr >= (unsigned long)__init_begin &&
+		    addr < (unsigned long)__init_end)
+			return system_state == SYSTEM_BOOTING;
+		return 1;
+	}
+	return 0;
+}
+
 int __kernel_text_address(unsigned long addr)
 {
 	if (core_kernel_text(addr))
@@ -98,6 +110,12 @@ int kernel_text_address(unsigned long ad
 	return is_module_text_address(addr);
 }
 
+/* text or data in core kernel or module */
+int kernel_address(unsigned long addr)
+{
+	return core_kernel_address(addr) || is_module_address(addr);
+}
+
 /*
  * On some architectures (PPC64, IA64) function pointers
  * are actually only tokens to some data that then holds the

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [3/9] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
  2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
  2010-01-05  2:15 ` [PATCH] [2/9] Add a kernel_address() that works for data too Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  2:15 ` [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


Add a helper to use the new rcu strings for managing access
to text sysctls. Conversions will be in follow-on patches.

An alternative would be to use seqlocks here, but RCU seemed
cleaner.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 include/linux/sysctl.h |    2 +
 kernel/sysctl.c        |   66 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl_check.c  |    1 
 3 files changed, 69 insertions(+)

Index: linux-2.6.33-rc2-ak/include/linux/sysctl.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/sysctl.h
+++ linux-2.6.33-rc2-ak/include/linux/sysctl.h
@@ -969,6 +969,8 @@ typedef int proc_handler (struct ctl_tab
 
 extern int proc_dostring(struct ctl_table *, int,
 			 void __user *, size_t *, loff_t *);
+extern int proc_rcu_string(struct ctl_table *, int,
+			 void __user *, size_t *, loff_t *);
 extern int proc_dointvec(struct ctl_table *, int,
 			 void __user *, size_t *, loff_t *);
 extern int proc_dointvec_minmax(struct ctl_table *, int,
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -50,6 +50,7 @@
 #include <linux/ftrace.h>
 #include <linux/slow-work.h>
 #include <linux/perf_event.h>
+#include <linux/rcustring.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -2016,6 +2017,60 @@ static int _proc_do_string(void* data, i
 }
 
 /**
+ * proc_rcu_string - sysctl string with rcu protection
+ * @table: the sysctl table
+ * @write: %TRUE if this is a write to the sysctl file
+ * @buffer: the user buffer
+ * @lenp: the size of the user buffer
+ * @ppos: file position
+ *
+ * Handle a string sysctl similar to proc_dostring.
+ * The main difference is that the data pointer in the table
+ * points to a pointer to a string. The string should be initially
+ * pointing to a statically allocated (as a C object, not on the heap)
+ * default. When it is replaced old uses will be protected by
+ * RCU. The reader should use rcu_read_lock()/unlock() or
+ * access_rcu_string().
+ */
+int proc_rcu_string(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	if (write) {
+		/* protect writers against each other */
+		static DEFINE_MUTEX(rcu_string_mutex);
+		char *old;
+		char *new;
+
+		new = alloc_rcu_string(table->maxlen, GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+		mutex_lock(&rcu_string_mutex);
+		old = *(char **)(table->data);
+		strcpy(new, old);
+		ret = _proc_do_string(new, table->maxlen, write, buffer, lenp, ppos);
+		rcu_assign_pointer(*(char **)(table->data), new);
+		/*
+		 * For the first initialization allow constant strings.
+		 */
+		if (!kernel_address((unsigned long)old))
+			free_rcu_string(old);
+		mutex_unlock(&rcu_string_mutex);
+	} else {
+		char *str;
+
+		str = access_rcu_string((char **)(table->data), table->maxlen,
+					GFP_KERNEL);
+		if (!str)
+			return -ENOMEM;
+		ret = _proc_do_string(str, table->maxlen, write, buffer, lenp, ppos);
+		kfree(str);
+	}
+	return ret;
+}
+
+/**
  * proc_dostring - read a string sysctl
  * @table: the sysctl table
  * @write: %TRUE if this is a write to the sysctl file
@@ -2030,6 +2085,10 @@ static int _proc_do_string(void* data, i
  * and a newline '\n' is added. It is truncated if the buffer is
  * not large enough.
  *
+ * WARNING: this should be only used for read only strings
+ * or when you have a wrapper with special locking. Otherwise
+ * use proc_rcu_string to avoid races with the consumer.
+ *
  * Returns 0 on success.
  */
 int proc_dostring(struct ctl_table *table, int write,
@@ -2614,6 +2673,12 @@ int proc_dostring(struct ctl_table *tabl
 	return -ENOSYS;
 }
 
+int proc_rcu_string(struct ctl_table *table, int write,
+		  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	return -ENOSYS;
+}
+
 int proc_dointvec(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -2670,6 +2735,7 @@ EXPORT_SYMBOL(proc_dointvec_minmax);
 EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
 EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
 EXPORT_SYMBOL(proc_dostring);
+EXPORT_SYMBOL(proc_rcu_string);
 EXPORT_SYMBOL(proc_doulongvec_minmax);
 EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
 EXPORT_SYMBOL(register_sysctl_table);
Index: linux-2.6.33-rc2-ak/kernel/sysctl_check.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl_check.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl_check.c
@@ -131,6 +131,7 @@ int sysctl_check_table(struct nsproxy *n
 				set_fail(&fail, table, "Directory with extra2");
 		} else {
 			if ((table->proc_handler == proc_dostring) ||
+			    (table->proc_handler == proc_rcu_string) ||
 			    (table->proc_handler == proc_dointvec) ||
 			    (table->proc_handler == proc_dointvec_minmax) ||
 			    (table->proc_handler == proc_dointvec_jiffies) ||

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
                   ` (2 preceding siblings ...)
  2010-01-05  2:15 ` [PATCH] [3/9] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  6:56   ` Eric W. Biederman
  2010-01-05  2:15 ` [PATCH] [5/9] SYSCTL: Add call_usermodehelper_cleanup() Andi Kleen
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


Also saves ~220 bytes in the data segment for default kernels.

As a bonus this removes one use of the BKL.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 fs/exec.c       |   11 +++++------
 kernel/sysctl.c |    6 +++---
 2 files changed, 8 insertions(+), 9 deletions(-)

Index: linux-2.6.33-rc2-ak/fs/exec.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/fs/exec.c
+++ linux-2.6.33-rc2-ak/fs/exec.c
@@ -62,7 +62,7 @@
 #include "internal.h"
 
 int core_uses_pid;
-char core_pattern[CORENAME_MAX_SIZE] = "core";
+char *core_pattern = "core";
 unsigned int core_pipe_limit;
 int suid_dumpable = 0;
 
@@ -1421,7 +1421,7 @@ EXPORT_SYMBOL(set_binfmt);
 static int format_corename(char *corename, long signr)
 {
 	const struct cred *cred = current_cred();
-	const char *pat_ptr = core_pattern;
+	const char *pat_ptr = rcu_dereference(core_pattern);
 	int ispipe = (*pat_ptr == '|');
 	char *out_ptr = corename;
 	char *const out_end = corename + CORENAME_MAX_SIZE;
@@ -1825,12 +1825,11 @@ void do_coredump(long signr, int exit_co
 	clear_thread_flag(TIF_SIGPENDING);
 
 	/*
-	 * lock_kernel() because format_corename() is controlled by sysctl, which
-	 * uses lock_kernel()
+	 * Protect corename by RCU vs proc_rcu_string()
 	 */
- 	lock_kernel();
+	rcu_read_lock();
 	ispipe = format_corename(corename, signr);
-	unlock_kernel();
+	rcu_read_unlock();
 
 	if ((!ispipe) && (cprm.limit < binfmt->min_coredump))
 		goto fail_unlock;
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -75,7 +75,7 @@ extern int sysctl_oom_dump_tasks;
 extern int max_threads;
 extern int core_uses_pid;
 extern int suid_dumpable;
-extern char core_pattern[];
+extern char *core_pattern;
 extern unsigned int core_pipe_limit;
 extern int pid_max;
 extern int min_free_kbytes;
@@ -399,10 +399,10 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname	= "core_pattern",
-		.data		= core_pattern,
+		.data		= &core_pattern,
 		.maxlen		= CORENAME_MAX_SIZE,
 		.mode		= 0644,
-		.proc_handler	= proc_dostring,
+		.proc_handler	= proc_rcu_string,
 	},
 	{
 		.procname	= "core_pipe_limit",

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [5/9] SYSCTL: Add call_usermodehelper_cleanup()
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
                   ` (3 preceding siblings ...)
  2010-01-05  2:15 ` [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  2:15 ` [PATCH] [6/9] SYSCTL: Convert modprobe_path to proc_rcu_string() Andi Kleen
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


This is the same as call_usermodehelper(), but with an cleanup callback.
Needed for some of the followon proc_rcu_string() users.

This avoids open coding this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 include/linux/kmod.h |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6.33-rc2-ak/include/linux/kmod.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/kmod.h
+++ linux-2.6.33-rc2-ak/include/linux/kmod.h
@@ -72,7 +72,8 @@ int call_usermodehelper_exec(struct subp
 void call_usermodehelper_freeinfo(struct subprocess_info *info);
 
 static inline int
-call_usermodehelper(char *path, char **argv, char **envp, enum umh_wait wait)
+call_usermodehelper_cleanup(char *path, char **argv, char **envp, enum umh_wait wait,
+			    void (*cleanup)(char **, char **))
 {
 	struct subprocess_info *info;
 	gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
@@ -80,10 +81,18 @@ call_usermodehelper(char *path, char **a
 	info = call_usermodehelper_setup(path, argv, envp, gfp_mask);
 	if (info == NULL)
 		return -ENOMEM;
+	if (cleanup)
+		call_usermodehelper_setcleanup(info, cleanup);
 	return call_usermodehelper_exec(info, wait);
 }
 
 static inline int
+call_usermodehelper(char *path, char **argv, char **envp, enum umh_wait wait)
+{
+	return call_usermodehelper_cleanup(path, argv, envp, wait, NULL);
+}
+
+static inline int
 call_usermodehelper_keys(char *path, char **argv, char **envp,
 			 struct key *session_keyring, enum umh_wait wait)
 {

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [6/9] SYSCTL: Convert modprobe_path to proc_rcu_string()
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
                   ` (4 preceding siblings ...)
  2010-01-05  2:15 ` [PATCH] [5/9] SYSCTL: Add call_usermodehelper_cleanup() Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  2:15 ` [PATCH] [7/9] SYSCTL: Convert poweroff_command to proc_rcu_string Andi Kleen
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


Avoids races with lockless sysctl

Also saves ~220 bytes in the data segment for default kernels.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 kernel/kmod.c   |   36 ++++++++++++++++++++++++++++--------
 kernel/sysctl.c |    4 ++--
 2 files changed, 30 insertions(+), 10 deletions(-)

Index: linux-2.6.33-rc2-ak/kernel/kmod.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/kmod.c
+++ linux-2.6.33-rc2-ak/kernel/kmod.c
@@ -35,6 +35,7 @@
 #include <linux/resource.h>
 #include <linux/notifier.h>
 #include <linux/suspend.h>
+#include <linux/rcustring.h>
 #include <asm/uaccess.h>
 
 #include <trace/events/module.h>
@@ -48,7 +49,12 @@ static struct workqueue_struct *khelper_
 /*
 	modprobe_path is set via /proc/sys.
 */
-char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+char *modprobe_path = "/sbin/modprobe";
+
+static void free_arg(char **argv, char **env)
+{
+	kfree(argv[0]);
+}
 
 /**
  * __request_module - try to load a kernel module
@@ -71,7 +77,8 @@ int __request_module(bool wait, const ch
 	char module_name[MODULE_NAME_LEN];
 	unsigned int max_modprobes;
 	int ret;
-	char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
+	char *mp_copy;
+	char *argv[] = { NULL, "-q", "--", module_name, NULL };
 	static char *envp[] = { "HOME=/",
 				"TERM=linux",
 				"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
@@ -80,15 +87,24 @@ int __request_module(bool wait, const ch
 #define MAX_KMOD_CONCURRENT 50	/* Completely arbitrary value - KAO */
 	static int kmod_loop_msg;
 
+	/* Get a stable-over-sleeps private copy of modprobe_path */
+	mp_copy = access_rcu_string(&modprobe_path, KMOD_PATH_LEN,
+					wait ? GFP_KERNEL : GFP_ATOMIC);
+	if (!mp_copy)
+		return -ENOMEM;
+	argv[0] = mp_copy;
+
 	va_start(args, fmt);
 	ret = vsnprintf(module_name, MODULE_NAME_LEN, fmt, args);
 	va_end(args);
-	if (ret >= MODULE_NAME_LEN)
-		return -ENAMETOOLONG;
+	if (ret >= MODULE_NAME_LEN) {
+		ret = -ENAMETOOLONG;
+		goto error;
+	}
 
 	ret = security_kernel_module_request(module_name);
 	if (ret)
-		return ret;
+		goto error;
 
 	/* If modprobe needs a service that is in a module, we get a recursive
 	 * loop.  Limit the number of running kmod threads to max_threads/2 or
@@ -111,14 +127,18 @@ int __request_module(bool wait, const ch
 			       "request_module: runaway loop modprobe %s\n",
 			       module_name);
 		atomic_dec(&kmod_concurrent);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto error;
 	}
 
 	trace_module_request(module_name, wait, _RET_IP_);
 
-	ret = call_usermodehelper(modprobe_path, argv, envp,
-			wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+	ret = call_usermodehelper_cleanup(mp_copy, argv, envp,
+			wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC, free_arg);
+	mp_copy = NULL; /* free_arg frees */
 	atomic_dec(&kmod_concurrent);
+error:
+	kfree(mp_copy);
 	return ret;
 }
 EXPORT_SYMBOL(__request_module);
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -121,7 +121,7 @@ static int min_percpu_pagelist_fract = 8
 static int ngroups_max = NGROUPS_MAX;
 
 #ifdef CONFIG_MODULES
-extern char modprobe_path[];
+extern char *modprobe_path;
 extern int modules_disabled;
 #endif
 #ifdef CONFIG_CHR_DEV_SG
@@ -532,7 +532,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &modprobe_path,
 		.maxlen		= KMOD_PATH_LEN,
 		.mode		= 0644,
-		.proc_handler	= proc_dostring,
+		.proc_handler	= proc_rcu_string,
 	},
 	{
 		.procname	= "modules_disabled",

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [7/9] SYSCTL: Convert poweroff_command to proc_rcu_string
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
                   ` (5 preceding siblings ...)
  2010-01-05  2:15 ` [PATCH] [6/9] SYSCTL: Convert modprobe_path to proc_rcu_string() Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  2:15 ` [PATCH] [8/9] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
  2010-01-05  2:15 ` [PATCH] [9/9] SYSCTL: Use RCU protected sysctl for ocfs group add helper Andi Kleen
  8 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


Avoids races with lockless sysctl.

Also saves ~220 bytes in the data segment for default kernels.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 include/linux/reboot.h |    2 +-
 kernel/sys.c           |    8 ++++++--
 kernel/sysctl.c        |    2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)

Index: linux-2.6.33-rc2-ak/kernel/sys.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sys.c
+++ linux-2.6.33-rc2-ak/kernel/sys.c
@@ -1597,7 +1597,7 @@ SYSCALL_DEFINE3(getcpu, unsigned __user
 	return err ? -EFAULT : 0;
 }
 
-char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
+char *poweroff_cmd = "/sbin/poweroff";
 
 static void argv_cleanup(char **argv, char **envp)
 {
@@ -1614,7 +1614,7 @@ static void argv_cleanup(char **argv, ch
 int orderly_poweroff(bool force)
 {
 	int argc;
-	char **argv = argv_split(GFP_ATOMIC, poweroff_cmd, &argc);
+	char **argv;
 	static char *envp[] = {
 		"HOME=/",
 		"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
@@ -1623,6 +1623,10 @@ int orderly_poweroff(bool force)
 	int ret = -ENOMEM;
 	struct subprocess_info *info;
 
+	/* RCU protection for poweroff_cmd */
+	rcu_read_lock();
+	argv = argv_split(GFP_ATOMIC, rcu_dereference(poweroff_cmd), &argc);
+	rcu_read_unlock();
 	if (argv == NULL) {
 		printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
 		       __func__, poweroff_cmd);
Index: linux-2.6.33-rc2-ak/include/linux/reboot.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/reboot.h
+++ linux-2.6.33-rc2-ak/include/linux/reboot.h
@@ -67,7 +67,7 @@ extern void kernel_power_off(void);
 void ctrl_alt_del(void);
 
 #define POWEROFF_CMD_PATH_LEN	256
-extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN];
+extern char *poweroff_cmd;
 
 extern int orderly_poweroff(bool force);
 
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -871,7 +871,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &poweroff_cmd,
 		.maxlen		= POWEROFF_CMD_PATH_LEN,
 		.mode		= 0644,
-		.proc_handler	= proc_dostring,
+		.proc_handler	= proc_rcu_string,
 	},
 #ifdef CONFIG_KEYS
 	{

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [8/9] SYSCTL: Convert hotplug helper string to proc_rcu_string()
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
                   ` (6 preceding siblings ...)
  2010-01-05  2:15 ` [PATCH] [7/9] SYSCTL: Convert poweroff_command to proc_rcu_string Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  2010-01-05  2:15 ` [PATCH] [9/9] SYSCTL: Use RCU protected sysctl for ocfs group add helper Andi Kleen
  8 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: ebiederm, paulmck, akpm, linux-kernel


This avoids races with lockless sysctl.

Also saves ~220 bytes in the data segment for default kernels.

I also moved the code into a separate function because the original
was very long.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 include/linux/kobject.h |    2 -
 kernel/sysctl.c         |    2 -
 lib/kobject_uevent.c    |   50 ++++++++++++++++++++++++++++++------------------
 3 files changed, 34 insertions(+), 20 deletions(-)

Index: linux-2.6.33-rc2-ak/include/linux/kobject.h
===================================================================
--- linux-2.6.33-rc2-ak.orig/include/linux/kobject.h
+++ linux-2.6.33-rc2-ak/include/linux/kobject.h
@@ -31,7 +31,7 @@
 #define UEVENT_BUFFER_SIZE		2048	/* buffer for the variables */
 
 /* path to the userspace helper executed on an event */
-extern char uevent_helper[];
+extern char *uevent_helper;
 
 /* counter to tag the uevent, read only except for the kobject core */
 extern u64 uevent_seqnum;
Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
+++ linux-2.6.33-rc2-ak/kernel/sysctl.c
@@ -551,7 +551,7 @@ static struct ctl_table kern_table[] = {
 		.data		= &uevent_helper,
 		.maxlen		= UEVENT_HELPER_PATH_LEN,
 		.mode		= 0644,
-		.proc_handler	= proc_dostring,
+		.proc_handler	= proc_rcu_string,
 	},
 #endif
 #ifdef CONFIG_CHR_DEV_SG
Index: linux-2.6.33-rc2-ak/lib/kobject_uevent.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/lib/kobject_uevent.c
+++ linux-2.6.33-rc2-ak/lib/kobject_uevent.c
@@ -22,11 +22,12 @@
 #include <linux/socket.h>
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
+#include <linux/rcustring.h>
 #include <net/sock.h>
 
 
 u64 uevent_seqnum;
-char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
+char *uevent_helper = CONFIG_UEVENT_HELPER_PATH;
 static DEFINE_SPINLOCK(sequence_lock);
 #if defined(CONFIG_NET)
 static struct sock *uevent_sock;
@@ -76,6 +77,34 @@ out:
 	return ret;
 }
 
+/* Call an external helper executable. */
+static int uevent_call_helper(const char *subsystem, struct kobj_uevent_env *env)
+{
+	char *argv[3];
+	char *helper;
+	int retval;
+
+	helper = access_rcu_string(&uevent_helper, UEVENT_HELPER_PATH_LEN, GFP_KERNEL);
+	if (!helper)
+		return -ENOMEM;
+
+	retval = -E2BIG;
+	argv[0] = helper;
+	argv[1] = (char *)subsystem;
+	argv[2] = NULL;
+	retval = add_uevent_var(env, "HOME=/");
+	if (retval)
+		goto error;
+	retval = add_uevent_var(env, "PATH=/sbin:/bin:/usr/sbin:/usr/bin");
+	if (retval)
+		goto error;
+
+	retval = call_usermodehelper(argv[0], argv, env->envp, UMH_WAIT_EXEC);
+error:
+	kfree(helper);
+	return retval;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -243,23 +272,8 @@ int kobject_uevent_env(struct kobject *k
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
-	if (uevent_helper[0]) {
-		char *argv [3];
-
-		argv [0] = uevent_helper;
-		argv [1] = (char *)subsystem;
-		argv [2] = NULL;
-		retval = add_uevent_var(env, "HOME=/");
-		if (retval)
-			goto exit;
-		retval = add_uevent_var(env,
-					"PATH=/sbin:/bin:/usr/sbin:/usr/bin");
-		if (retval)
-			goto exit;
-
-		retval = call_usermodehelper(argv[0], argv,
-					     env->envp, UMH_WAIT_EXEC);
-	}
+	if (uevent_helper[0])
+		retval = uevent_call_helper(subsystem, env);
 
 exit:
 	kfree(devpath);

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] [9/9] SYSCTL: Use RCU protected sysctl for ocfs group add helper
  2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
                   ` (7 preceding siblings ...)
  2010-01-05  2:15 ` [PATCH] [8/9] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
@ 2010-01-05  2:15 ` Andi Kleen
  8 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05  2:15 UTC (permalink / raw)
  To: joel.becker, ebiederm, paulmck, akpm, linux-kernel


This avoids races with unlocked sysctl() 

Also saves ~220 bytes in the data segment.

Cc: joel.becker@oracle.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 fs/ocfs2/stackglue.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Index: linux-2.6.33-rc2-ak/fs/ocfs2/stackglue.c
===================================================================
--- linux-2.6.33-rc2-ak.orig/fs/ocfs2/stackglue.c
+++ linux-2.6.33-rc2-ak/fs/ocfs2/stackglue.c
@@ -27,6 +27,7 @@
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
 #include <linux/sysctl.h>
+#include <linux/rcustring.h>
 
 #include "ocfs2_fs.h"
 
@@ -40,7 +41,7 @@ static struct ocfs2_locking_protocol *lp
 static DEFINE_SPINLOCK(ocfs2_stack_lock);
 static LIST_HEAD(ocfs2_stack_list);
 static char cluster_stack_name[OCFS2_STACK_LABEL_LEN + 1];
-static char ocfs2_hb_ctl_path[OCFS2_MAX_HB_CTL_PATH] = "/sbin/ocfs2_hb_ctl";
+static char *ocfs2_hb_ctl_path = "/sbin/ocfs2_hb_ctl";
 
 /*
  * The stack currently in use.  If not null, active_stack->sp_count > 0,
@@ -395,8 +396,15 @@ static void ocfs2_leave_group(const char
 {
 	int ret;
 	char *argv[5], *envp[3];
+	char *helper;
 
-	argv[0] = ocfs2_hb_ctl_path;
+	helper = access_rcu_string(&ocfs2_hb_ctl_path, OCFS2_MAX_HB_CTL_PATH, GFP_KERNEL);
+	if (!helper) {
+		printk(KERN_ERR "ocfs2_leave_group: no memory\n");
+		return;
+	}
+
+	argv[0] = helper;
 	argv[1] = "-K";
 	argv[2] = "-u";
 	argv[3] = (char *)group;
@@ -414,6 +422,7 @@ static void ocfs2_leave_group(const char
 		       "\"%s %s %s %s\"\n",
 		       ret, argv[0], argv[1], argv[2], argv[3]);
 	}
+	kfree(helper);
 }
 
 /*
@@ -621,10 +630,10 @@ error:
 static ctl_table ocfs2_nm_table[] = {
 	{
 		.procname	= "hb_ctl_path",
-		.data		= ocfs2_hb_ctl_path,
+		.data		= &ocfs2_hb_ctl_path,
 		.maxlen		= OCFS2_MAX_HB_CTL_PATH,
 		.mode		= 0644,
-		.proc_handler	= proc_dostring,
+		.proc_handler	= proc_rcu_string,
 	},
 	{ }
 };

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
@ 2010-01-05  5:32   ` Paul E. McKenney
  2010-01-05 10:47     ` Andi Kleen
  2010-01-08 23:50   ` Andrew Morton
  2010-01-11 12:12   ` Bert Wesarg
  2 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2010-01-05  5:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ebiederm, akpm, linux-kernel

On Tue, Jan 05, 2010 at 03:15:26AM +0100, Andi Kleen wrote:
> 
> Add a little ADT for RCU protected strings. RCU is a convenient
> way to manage modifications to read-often-write-seldom strings.
> Add some helper functions to make this more straight forward.
> 
> Used by follow-on patches to implement RCU protected sysctl strings.
> 
>  * General rules:
>  * Reader has to use rcu_read_lock() and not sleep while accessing the string,
>  * or alternatively get a copy with access_rcu_string()
>  * Writer needs an own lock against each other.
>  * Each modification should allocate a new string first and free the old
>  * one with free_rcu_string()
>  * In writers use rcu_assign_pointer to publicize the updated string to
>  * global readers.
>  * The size passed to access_rcu_string() must be the same as passed
>  * to alloc_rcu_string() and be known in advance. Don't use strlen()!
>  *
>  * For sysctls also see proc_rcu_string() as a convenient wrapper
> 
> v2: Use rcu_dereference correctly

Very good, the extra level of indirection on the first argument to
access_rcu_string() fixes the problem I complained about last time.

One additional question below.

> Cc: paulmck@linux.vnet.ibm.com
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
>  include/linux/rcustring.h |   20 ++++++++
>  lib/Makefile              |    3 -
>  lib/rcustring.c           |  115 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 137 insertions(+), 1 deletion(-)

[ . . . ]

> +/*
> + * Get a local private copy of a RCU protected string.
> + * Mostly useful to get a string that is stable while sleeping.
> + * Caller must free returned string.
> + */
> +char *access_rcu_string(char **str, int size, gfp_t gfp)
> +{
> +	char *copy = kmalloc(size, gfp);
> +	if (!str)
> +		return NULL;
> +	rcu_read_lock();
> +	strlcpy(copy, rcu_dereference(*str), size);

What if "str" is non-NULL, but "*str" is NULL?  Or is that disallowed
somehow?

If it is not disallowed, then something like the following?

	if (!str)
		return NULL;
	rcu_read_lock();
	tmp = rcu_dereference(*str);
	if (!tmp) {
		rcu_read_unlock();
		return NULL;
	}
	strlcpy(copy, tmp, size);
	rcu_read_unlock();
	return copy;

> +	rcu_read_unlock();
> +	return copy;
> +}
> +EXPORT_SYMBOL(access_rcu_string);
> Index: linux-2.6.33-rc2-ak/lib/Makefile
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/lib/Makefile
> +++ linux-2.6.33-rc2-ak/lib/Makefile
> @@ -12,7 +12,8 @@ lib-y := ctype.o string.o vsprintf.o cmd
>  	 idr.o int_sqrt.o extable.o prio_tree.o \
>  	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
>  	 proportions.o prio_heap.o ratelimit.o show_mem.o \
> -	 is_single_threaded.o plist.o decompress.o flex_array.o
> +	 is_single_threaded.o plist.o decompress.o flex_array.o \
> +	 rcustring.o
> 
>  lib-$(CONFIG_MMU) += ioremap.o
>  lib-$(CONFIG_SMP) += cpumask.o

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl
  2010-01-05  2:15 ` [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
@ 2010-01-05  6:56   ` Eric W. Biederman
  2010-01-05 11:49     ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2010-01-05  6:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: paulmck, akpm, linux-kernel

Andi Kleen <andi@firstfloor.org> writes:

> Also saves ~220 bytes in the data segment for default kernels.
>
> As a bonus this removes one use of the BKL.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ---
>  fs/exec.c       |   11 +++++------
>  kernel/sysctl.c |    6 +++---
>  2 files changed, 8 insertions(+), 9 deletions(-)
>
> Index: linux-2.6.33-rc2-ak/fs/exec.c
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/fs/exec.c
> +++ linux-2.6.33-rc2-ak/fs/exec.c
> @@ -62,7 +62,7 @@
>  #include "internal.h"
>  
>  int core_uses_pid;
> -char core_pattern[CORENAME_MAX_SIZE] = "core";
> +char *core_pattern = "core";
>  unsigned int core_pipe_limit;
>  int suid_dumpable = 0;
>  
> @@ -1421,7 +1421,7 @@ EXPORT_SYMBOL(set_binfmt);
>  static int format_corename(char *corename, long signr)
>  {
>  	const struct cred *cred = current_cred();
> -	const char *pat_ptr = core_pattern;
> +	const char *pat_ptr = rcu_dereference(core_pattern);

rcu_dereference should aways be between rcu_read_lock()
and rcu_read_unlock();

>  	int ispipe = (*pat_ptr == '|');
>  	char *out_ptr = corename;
>  	char *const out_end = corename + CORENAME_MAX_SIZE;
> @@ -1825,12 +1825,11 @@ void do_coredump(long signr, int exit_co
>  	clear_thread_flag(TIF_SIGPENDING);
>  
>  	/*
> -	 * lock_kernel() because format_corename() is controlled by sysctl, which
> -	 * uses lock_kernel()
> +	 * Protect corename by RCU vs proc_rcu_string()
>  	 */
> - 	lock_kernel();
> +	rcu_read_lock();
>  	ispipe = format_corename(corename, signr);
> -	unlock_kernel();
> +	rcu_read_unlock();
>  
>  	if ((!ispipe) && (cprm.limit < binfmt->min_coredump))
>  		goto fail_unlock;
> Index: linux-2.6.33-rc2-ak/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/kernel/sysctl.c
> +++ linux-2.6.33-rc2-ak/kernel/sysctl.c
> @@ -75,7 +75,7 @@ extern int sysctl_oom_dump_tasks;
>  extern int max_threads;
>  extern int core_uses_pid;
>  extern int suid_dumpable;
> -extern char core_pattern[];
> +extern char *core_pattern;
>  extern unsigned int core_pipe_limit;
>  extern int pid_max;
>  extern int min_free_kbytes;
> @@ -399,10 +399,10 @@ static struct ctl_table kern_table[] = {
>  	},
>  	{
>  		.procname	= "core_pattern",
> -		.data		= core_pattern,
> +		.data		= &core_pattern,
>  		.maxlen		= CORENAME_MAX_SIZE,
>  		.mode		= 0644,
> -		.proc_handler	= proc_dostring,
> +		.proc_handler	= proc_rcu_string,
>  	},
>  	{
>  		.procname	= "core_pipe_limit",

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [2/9] Add a kernel_address() that works for data too
  2010-01-05  2:15 ` [PATCH] [2/9] Add a kernel_address() that works for data too Andi Kleen
@ 2010-01-05  8:44   ` Russell King
  2010-01-05  8:58   ` Sam Ravnborg
  1 sibling, 0 replies; 24+ messages in thread
From: Russell King @ 2010-01-05  8:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-arch, ebiederm, paulmck, akpm, linux-kernel

On Tue, Jan 05, 2010 at 03:15:27AM +0100, Andi Kleen wrote:
> 
> Add a variant of kernel_text_address() that includes kernel data.
> 
> Assumes kernel is _text ... _end - init section. True everywhere?

No.  XIP kernels have the text and data/bss separated into two distinct
address regions.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [2/9] Add a kernel_address() that works for data too
  2010-01-05  2:15 ` [PATCH] [2/9] Add a kernel_address() that works for data too Andi Kleen
  2010-01-05  8:44   ` Russell King
@ 2010-01-05  8:58   ` Sam Ravnborg
  2010-01-05 19:04     ` Russell King
  1 sibling, 1 reply; 24+ messages in thread
From: Sam Ravnborg @ 2010-01-05  8:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-arch, ebiederm, paulmck, akpm, linux-kernel

On Tue, Jan 05, 2010 at 03:15:27AM +0100, Andi Kleen wrote:
> 
> Add a variant of kernel_text_address() that includes kernel data.
> 
> Assumes kernel is _text ... _end - init section. True everywhere?

Tim Abbott has done a great job lately to unify the various
linker scripts used by the different architectures.

Architectures are supposed to follow the skeleton outlined
in include/asm-generic/vmlinux.lds.h

But I think the skeleton needs a small update.
It describes that we mark start of .text with _stext.
But reality is that we use _text for this.

But you can trust _etext an almost all architectures.
It is a bug if it is missing.

So [_text, _etext] is the text section.

The data section may be placed before or after - it depends on the architecture.
But again - only some architectures define _sdata.
But all? define _edata.


	Sam


> 
> Cc: linux-arch@vger.kernel.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
>  include/linux/kernel.h |    1 +
>  kernel/extable.c       |   18 ++++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> Index: linux-2.6.33-rc2-ak/include/linux/kernel.h
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/include/linux/kernel.h
> +++ linux-2.6.33-rc2-ak/include/linux/kernel.h
> @@ -205,6 +205,7 @@ extern unsigned long long memparse(const
>  extern int core_kernel_text(unsigned long addr);
>  extern int __kernel_text_address(unsigned long addr);
>  extern int kernel_text_address(unsigned long addr);
> +extern int kernel_address(unsigned long addr);
>  extern int func_ptr_is_kernel_text(void *ptr);
>  
>  struct pid;
> Index: linux-2.6.33-rc2-ak/kernel/extable.c
> ===================================================================
> --- linux-2.6.33-rc2-ak.orig/kernel/extable.c
> +++ linux-2.6.33-rc2-ak/kernel/extable.c
> @@ -72,6 +72,18 @@ int core_kernel_text(unsigned long addr)
>  	return 0;
>  }
>  
> +static int core_kernel_address(unsigned long addr)
> +{
> +	if ((addr >= (unsigned long)_text &&
> +	     addr <= (unsigned long)_end)) {
> +		if (addr >= (unsigned long)__init_begin &&
> +		    addr < (unsigned long)__init_end)
> +			return system_state == SYSTEM_BOOTING;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  int __kernel_text_address(unsigned long addr)
>  {
>  	if (core_kernel_text(addr))
> @@ -98,6 +110,12 @@ int kernel_text_address(unsigned long ad
>  	return is_module_text_address(addr);
>  }
>  
> +/* text or data in core kernel or module */
> +int kernel_address(unsigned long addr)
> +{
> +	return core_kernel_address(addr) || is_module_address(addr);
> +}
> +
>  /*
>   * On some architectures (PPC64, IA64) function pointers
>   * are actually only tokens to some data that then holds the
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05  5:32   ` Paul E. McKenney
@ 2010-01-05 10:47     ` Andi Kleen
  2010-01-05 14:11       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2010-01-05 10:47 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andi Kleen, ebiederm, akpm, linux-kernel

On Mon, Jan 04, 2010 at 09:32:52PM -0800, Paul E. McKenney wrote:
> > +/*
> > + * Get a local private copy of a RCU protected string.
> > + * Mostly useful to get a string that is stable while sleeping.
> > + * Caller must free returned string.
> > + */
> > +char *access_rcu_string(char **str, int size, gfp_t gfp)
> > +{
> > +	char *copy = kmalloc(size, gfp);
> > +	if (!str)
> > +		return NULL;
> > +	rcu_read_lock();
> > +	strlcpy(copy, rcu_dereference(*str), size);
> 
> What if "str" is non-NULL, but "*str" is NULL?  Or is that disallowed
> somehow?

I would consider it disallowed, all the strings have some default value.
Empty string would be ""

If we allowed it the caller couldn't easily distingush error from expected NULL.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl
  2010-01-05  6:56   ` Eric W. Biederman
@ 2010-01-05 11:49     ` Andi Kleen
  0 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05 11:49 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: paulmck, akpm, linux-kernel

ebiederm@xmission.com (Eric W. Biederman) writes:
>>  
>> @@ -1421,7 +1421,7 @@ EXPORT_SYMBOL(set_binfmt);
>>  static int format_corename(char *corename, long signr)
>>  {
>>  	const struct cred *cred = current_cred();
>> -	const char *pat_ptr = core_pattern;
>> +	const char *pat_ptr = rcu_dereference(core_pattern);
>
> rcu_dereference should aways be between rcu_read_lock()
> and rcu_read_unlock();

It is, see the call site below:

>> -	 * lock_kernel() because format_corename() is controlled by sysctl, which
>> -	 * uses lock_kernel()
>> +	 * Protect corename by RCU vs proc_rcu_string()
>>  	 */
>> - 	lock_kernel();
>> +	rcu_read_lock();
>>  	ispipe = format_corename(corename, signr);
>> -	unlock_kernel();
>> +	rcu_read_unlock();

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05 10:47     ` Andi Kleen
@ 2010-01-05 14:11       ` Paul E. McKenney
  2010-01-05 14:19         ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2010-01-05 14:11 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ebiederm, akpm, linux-kernel

On Tue, Jan 05, 2010 at 11:47:46AM +0100, Andi Kleen wrote:
> On Mon, Jan 04, 2010 at 09:32:52PM -0800, Paul E. McKenney wrote:
> > > +/*
> > > + * Get a local private copy of a RCU protected string.
> > > + * Mostly useful to get a string that is stable while sleeping.
> > > + * Caller must free returned string.
> > > + */
> > > +char *access_rcu_string(char **str, int size, gfp_t gfp)
> > > +{
> > > +	char *copy = kmalloc(size, gfp);
> > > +	if (!str)
> > > +		return NULL;
> > > +	rcu_read_lock();
> > > +	strlcpy(copy, rcu_dereference(*str), size);
> > 
> > What if "str" is non-NULL, but "*str" is NULL?  Or is that disallowed
> > somehow?
> 
> I would consider it disallowed, all the strings have some default value.
> Empty string would be ""
> 
> If we allowed it the caller couldn't easily distingush error from expected NULL.

OK, then:

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05 14:11       ` Paul E. McKenney
@ 2010-01-05 14:19         ` Andi Kleen
  0 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-05 14:19 UTC (permalink / raw)
  To: paulmck; +Cc: ebiederm, akpm, linux-kernel

"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>> 
>> I would consider it disallowed, all the strings have some default value.
>> Empty string would be ""
>> 
>> If we allowed it the caller couldn't easily distingush error from expected NULL.
>
> OK, then:

Thanks. I actually changed my mind and added support for NULL strings
(although I suspect they are not too useful)

I also found one more minor problem (and there was one conversion
missing in v2), will do a repost later

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [2/9] Add a kernel_address() that works for data too
  2010-01-05  8:58   ` Sam Ravnborg
@ 2010-01-05 19:04     ` Russell King
  2010-01-05 19:15       ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Russell King @ 2010-01-05 19:04 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Andi Kleen, linux-arch, ebiederm, paulmck, akpm, linux-kernel

On Tue, Jan 05, 2010 at 09:58:53AM +0100, Sam Ravnborg wrote:
> But you can trust _etext an almost all architectures.
> It is a bug if it is missing.
> 
> So [_text, _etext] is the text section.
> 
> The data section may be placed before or after - it depends on the architecture.
> But again - only some architectures define _sdata.
> But all? define _edata.

You can not guarantee that the data segment is after the text segment,
unless you want to outlaw XIP kernels.  XIP kernels have the text
segment mapped at a completely different address to the data segment.

I'd suggest the only way to identify the data segment in a generic way
is to have everyone define _sdata, or more preferably _data (to be
consistent with _text), to be the start of the data segment.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [2/9] Add a kernel_address() that works for data too
  2010-01-05 19:04     ` Russell King
@ 2010-01-05 19:15       ` Andi Kleen
  2010-01-08 23:51         ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2010-01-05 19:15 UTC (permalink / raw)
  To: Sam Ravnborg, Andi Kleen, linux-arch, ebiederm, paulmck, akpm,
	linux-kernel

> I'd suggest the only way to identify the data segment in a generic way
> is to have everyone define _sdata, or more preferably _data (to be
> consistent with _text), to be the start of the data segment.

I agree. I'll take a look at that.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
  2010-01-05  5:32   ` Paul E. McKenney
@ 2010-01-08 23:50   ` Andrew Morton
  2010-01-11 12:12   ` Bert Wesarg
  2 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2010-01-08 23:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: paulmck, ebiederm, linux-kernel

On Tue,  5 Jan 2010 03:15:26 +0100 (CET)
Andi Kleen <andi@firstfloor.org> wrote:

> +extern char *alloc_rcu_string(int size, gfp_t gfp);
> +extern void free_rcu_string(const char *string);
> +
> +/*
> + * size must be the same as alloc_rcu_string, don't
> + * use strlen on str!
> + */
> +extern char *access_rcu_string(char **str, int size, gfp_t gfp);

I think better names would be rcu_string_alloc(), rcu_string_free() and
rcu_string_access().


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [2/9] Add a kernel_address() that works for data too
  2010-01-05 19:15       ` Andi Kleen
@ 2010-01-08 23:51         ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2010-01-08 23:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Sam Ravnborg, linux-arch, ebiederm, paulmck, linux-kernel

On Tue, 5 Jan 2010 20:15:03 +0100
Andi Kleen <andi@firstfloor.org> wrote:

> > I'd suggest the only way to identify the data segment in a generic way
> > is to have everyone define _sdata, or more preferably _data (to be
> > consistent with _text), to be the start of the data segment.
> 
> I agree. I'll take a look at that.
> 

I'll merge this as-is for now.  Please send updates when convenient.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
  2010-01-05  5:32   ` Paul E. McKenney
  2010-01-08 23:50   ` Andrew Morton
@ 2010-01-11 12:12   ` Bert Wesarg
  2010-01-11 14:26     ` Andi Kleen
  2 siblings, 1 reply; 24+ messages in thread
From: Bert Wesarg @ 2010-01-11 12:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: paulmck, ebiederm, akpm, linux-kernel

On Tue, Jan 5, 2010 at 03:15, Andi Kleen <andi@firstfloor.org> wrote:
> Index: linux-2.6.33-rc2-ak/lib/rcustring.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.33-rc2-ak/lib/rcustring.c

[ . . . ]

> +/*
> + * Get a local private copy of a RCU protected string.
> + * Mostly useful to get a string that is stable while sleeping.
> + * Caller must free returned string.
> + */
> +char *access_rcu_string(char **str, int size, gfp_t gfp)
> +{
> +       char *copy = kmalloc(size, gfp);
No check of return value from kmalloc()?

> +       if (!str)
> +               return NULL;
> +       rcu_read_lock();
> +       strlcpy(copy, rcu_dereference(*str), size);
> +       rcu_read_unlock();
> +       return copy;
> +}
> +EXPORT_SYMBOL(access_rcu_string);

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2
  2010-01-11 12:12   ` Bert Wesarg
@ 2010-01-11 14:26     ` Andi Kleen
  0 siblings, 0 replies; 24+ messages in thread
From: Andi Kleen @ 2010-01-11 14:26 UTC (permalink / raw)
  To: Bert Wesarg; +Cc: Andi Kleen, paulmck, ebiederm, akpm, linux-kernel

On Mon, Jan 11, 2010 at 01:12:41PM +0100, Bert Wesarg wrote:
> On Tue, Jan 5, 2010 at 03:15, Andi Kleen <andi@firstfloor.org> wrote:
> > Index: linux-2.6.33-rc2-ak/lib/rcustring.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6.33-rc2-ak/lib/rcustring.c
> 
> [ . . . ]
> 
> > +/*
> > + * Get a local private copy of a RCU protected string.
> > + * Mostly useful to get a string that is stable while sleeping.
> > + * Caller must free returned string.
> > + */
> > +char *access_rcu_string(char **str, int size, gfp_t gfp)
> > +{
> > +       char *copy = kmalloc(size, gfp);
> No check of return value from kmalloc()?
> 
> > +       if (!str)
> > +               return NULL;

The !str should be !copy right. I fixed that in my version.
Thanks for catching.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2010-01-11 14:27 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-05  2:15 [PATCH] [0/9] SYSCTL: Use RCU to avoid races with string sysctls v2 Andi Kleen
2010-01-05  2:15 ` [PATCH] [1/9] Add rcustring ADT for RCU protected strings v2 Andi Kleen
2010-01-05  5:32   ` Paul E. McKenney
2010-01-05 10:47     ` Andi Kleen
2010-01-05 14:11       ` Paul E. McKenney
2010-01-05 14:19         ` Andi Kleen
2010-01-08 23:50   ` Andrew Morton
2010-01-11 12:12   ` Bert Wesarg
2010-01-11 14:26     ` Andi Kleen
2010-01-05  2:15 ` [PATCH] [2/9] Add a kernel_address() that works for data too Andi Kleen
2010-01-05  8:44   ` Russell King
2010-01-05  8:58   ` Sam Ravnborg
2010-01-05 19:04     ` Russell King
2010-01-05 19:15       ` Andi Kleen
2010-01-08 23:51         ` Andrew Morton
2010-01-05  2:15 ` [PATCH] [3/9] SYSCTL: Add proc_rcu_string to manage sysctls using rcu strings Andi Kleen
2010-01-05  2:15 ` [PATCH] [4/9] SYSCTL: Use RCU strings for core_pattern sysctl Andi Kleen
2010-01-05  6:56   ` Eric W. Biederman
2010-01-05 11:49     ` Andi Kleen
2010-01-05  2:15 ` [PATCH] [5/9] SYSCTL: Add call_usermodehelper_cleanup() Andi Kleen
2010-01-05  2:15 ` [PATCH] [6/9] SYSCTL: Convert modprobe_path to proc_rcu_string() Andi Kleen
2010-01-05  2:15 ` [PATCH] [7/9] SYSCTL: Convert poweroff_command to proc_rcu_string Andi Kleen
2010-01-05  2:15 ` [PATCH] [8/9] SYSCTL: Convert hotplug helper string to proc_rcu_string() Andi Kleen
2010-01-05  2:15 ` [PATCH] [9/9] SYSCTL: Use RCU protected sysctl for ocfs group add helper Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox