public inbox for linux-kernel@vger.kernel.org
* Re: rcu-refcount stacker performance
  2005-07-14 15:23 ` Paul E. McKenney
@ 2005-07-14 13:44   ` serue
  2005-07-14 16:59     ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: serue @ 2005-07-14 13:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: serue, lkml, Dipankar Sarma, David A. Wheeler, Tony Jones

Quoting Paul E. McKenney (paulmck@us.ibm.com):
> On Thu, Jul 14, 2005 at 09:21:07AM -0500, serue@us.ibm.com wrote:
> > On July 8 I sent out a patch which re-implemented the rcu-refcounting
> > of the LSM list in stacker for the sake of supporting safe security
> > module unloading.  (patch reattached here for convenience)  Here are
> > some performance results with and without that patch.  Tests were run
> > on a 16-way ppc64 machine.  Dbench was run 50 times, and kernbench
> > and reaim were run 10 times, and intervals are 95% confidence half-
> > intervals.
> > 
> > These results seem pretty poor.  I'm now wondering whether this is
> > really necessary.  David Wheeler's original stacker had an option
> > of simply waiting a while after a module was taken out of the list
> > of active modules before freeing the modules.  Something like that
> > is of course one option.  I'm hoping we can also take advantage of
> > some already known module state info to be a little less coarse
> > about it.  For instance, sys_delete_module() sets m->state to
> > MODULE_STATE_GOING before calling mod->exit().  If in place of
> > doing atomic_inc(&m->use), stacker skipped the m->hook() if
> > m->state!=MODULE_STATE_LIVE, then it may be safe to assume that
> > any m->hook() should be finished before sys_delete_module() gets
> > to free_module(mod).  This seems to require adding a struct
> > module argument to security/security.c:mod_reg_security() so an LSM
> > can pass itself along.
> > 
> > So I'll try that next.  Hopefully by avoiding the potential cache
> > line bounces which atomic_inc(&m->use) brings, this should provide
> > far better performance.
> 
> My guess is that the reference count is indeed costing you quite a
> bit.  I glanced quickly at the patch, and most of the uses seem to
> be of the form:
> 
> 	increment ref count
> 	rcu_read_lock()
> 	do something
> 	rcu_read_unlock()
> 	decrement ref count
> 
> Can't these cases rely solely on rcu_read_lock()?  Why do you also
> need to increment the reference count in these cases?

The problem is on module unload: is it possible for CPU1 to be
on "do something", and sleep, and, while it sleeps, CPU2 does
rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
is executing has been freed?

Because stacker won't remove the lsm from the list of modules
until mod->exit() is executed, and module_free(mod) happens
immediately after that, the above scenario seems possible.

thanks,
-serge


* rcu-refcount stacker performance
@ 2005-07-14 14:21 serue
  2005-07-14 15:23 ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: serue @ 2005-07-14 14:21 UTC (permalink / raw)
  To: lkml, Paul E. McKenney, Dipankar Sarma, David A. Wheeler,
	Tony Jones

On July 8 I sent out a patch which re-implemented the rcu-refcounting
of the LSM list in stacker for the sake of supporting safe security
module unloading.  (patch reattached here for convenience)  Here are
some performance results with and without that patch.  Tests were run
on a 16-way ppc64 machine.  Dbench was run 50 times; kernbench
and reaim were run 10 times each.  Intervals are 95% confidence
half-intervals.

These results seem pretty poor.  I'm now wondering whether this is
really necessary.  David Wheeler's original stacker had an option
of simply waiting a while after a module was taken out of the list
of active modules before freeing the modules.  Something like that
is of course one option.  I'm hoping we can also take advantage of
some already known module state info to be a little less coarse
about it.  For instance, sys_delete_module() sets m->state to
MODULE_STATE_GOING before calling mod->exit().  If in place of
doing atomic_inc(&m->use), stacker skipped the m->hook() if
m->state!=MODULE_STATE_LIVE, then it may be safe to assume that
any m->hook() should be finished before sys_delete_module() gets
to free_module(mod).  This seems to require adding a struct
module argument to security/security.c:mod_reg_security() so an LSM
can pass itself along.

So I'll try that next.  Hopefully by avoiding the potential cache
line bounces which atomic_inc(&m->use) brings, this should provide
far better performance.

thanks,
-serge

Dbench (throughput, larger is better)
--------------------------------------------
plain stacker:    1531.448400 +/- 15.791116
stacker with rcu: 1408.056200 +/- 12.597277

Kernbench (runtime, smaller is better)
--------------------------------------------
plain stacker:    52.341000  +/- 0.184995
stacker with rcu: 53.722000 +/- 0.161473

Reaim (numjobs, larger is better) (gnuplot-friendly format)
plain stacker:
----------------------------------------------------------
Numforked   jobs/minute         95% CI
1           106662.857000     5354.267865
3           301628.571000     6297.121934
5           488142.858000     16031.685536
7           673200.000000     23994.030784
9           852428.570000     31485.607271
11          961714.290000     0.000000
13          1108157.144000    27287.525982
15          1171178.571000    49790.796869

Reaim (numjobs, larger is better) (gnuplot-friendly format)
stacker with rcu:
----------------------------------------------------------
Numforked   jobs/minute         95% CI
1           100542.857000     2099.040645
3           266657.139000     6297.121934
5           398892.858000     12023.765252
7           467670.000000     14911.383385
9           418648.352000     11665.751441
11          396825.000000     8700.115252
13          357480.912000     7567.947838
15          337571.428000     2332.267703

Patch:

Index: linux-2.6.12/security/stacker.c
===================================================================
--- linux-2.6.12.orig/security/stacker.c	2005-07-08 13:43:15.000000000 -0500
+++ linux-2.6.12/security/stacker.c	2005-07-08 16:21:54.000000000 -0500
@@ -33,13 +33,13 @@
 
 struct module_entry {
 	struct list_head lsm_list;  /* list of active lsms */
-	struct list_head all_lsms; /* list of all lsms */
 	char *module_name;
 	int namelen;
 	struct security_operations module_operations;
+	struct rcu_head m_rcu;
+	atomic_t use;
 };
 static struct list_head stacked_modules;  /* list of stacked modules */
-static struct list_head all_modules;  /* list of all modules, including freed */
 
 static short sysfsfiles_registered;
 
@@ -84,6 +84,32 @@ MODULE_PARM_DESC(debug, "Debug enabled o
  * We return as soon as an error is returned.
  */
 
+static inline void stacker_free_module(struct module_entry *m)
+{
+	kfree(m->module_name);
+	kfree(m);
+}
+
+/*
+ * Version of stacker_free_module called from call_rcu
+ */
+static void free_mod_fromrcu(struct rcu_head *head)
+{
+	struct module_entry *m;
+
+	m = container_of(head, struct module_entry, m_rcu);
+	stacker_free_module(m);
+}
+
+static void stacker_del_module(struct rcu_head *head)
+{
+	struct module_entry *m;
+	
+	m = container_of(head, struct module_entry, m_rcu);
+	if (atomic_dec_and_test(&m->use))
+		stacker_free_module(m);
+}
+
 #define stack_for_each_entry(pos, head, member)				\
 	for (pos = list_entry((head)->next, typeof(*pos), member);	\
 		&pos->member != (head);					\
@@ -93,16 +119,27 @@ MODULE_PARM_DESC(debug, "Debug enabled o
 /* to make this safe for module deletion, we would need to
  * add a reference count to m as we had before
  */
+/*
+ * XXX We can't quite do this - we delete the module before we grab
+ * m->next?
+ * We could just do a call_rcu.  Then the call_rcu happens in the same
+ * rcu cycle as the dereference, so the module won't be deleted until the
+ * next cycle.
+ * That's what I'm going to do.
+ */
 #define RETURN_ERROR_IF_ANY_ERROR(BASE_FUNC, FUNC_WITH_ARGS) do { \
 	int result = 0; \
 	struct module_entry *m; \
 	rcu_read_lock(); \
 	stack_for_each_entry(m, &stacked_modules, lsm_list) { \
-		if (!m->module_operations.BASE_FUNC) \
-			continue; \
-		rcu_read_unlock(); \
-		result = m->module_operations.FUNC_WITH_ARGS; \
-		rcu_read_lock(); \
+		if (m->module_operations.BASE_FUNC) { \
+			atomic_inc(&m->use); \
+			rcu_read_unlock(); \
+			result = m->module_operations.FUNC_WITH_ARGS; \
+			rcu_read_lock(); \
+			if (unlikely(atomic_dec_and_test(&m->use))) \
+				call_rcu(&m->m_rcu, free_mod_fromrcu); \
+		} \
 		if (result) \
 			break; \
 	} \
@@ -116,9 +153,12 @@ MODULE_PARM_DESC(debug, "Debug enabled o
 	rcu_read_lock(); \
 	stack_for_each_entry(m, &stacked_modules, lsm_list) { \
 		if (m->module_operations.BASE_FUNC) { \
+			atomic_inc(&m->use); \
 			rcu_read_unlock(); \
 			m->module_operations.FUNC_WITH_ARGS; \
 			rcu_read_lock(); \
+			if (unlikely(atomic_dec_and_test(&m->use))) \
+				call_rcu(&m->m_rcu, free_mod_fromrcu); \
 		} \
 	} \
 	rcu_read_unlock(); \
@@ -129,38 +169,47 @@ MODULE_PARM_DESC(debug, "Debug enabled o
 	rcu_read_lock(); \
 	stack_for_each_entry(m, &stacked_modules, lsm_list ) { \
 		if (m->module_operations.BASE_FREE) { \
+			atomic_inc(&m->use); \
 			rcu_read_unlock(); \
 			m->module_operations.FREE_WITH_ARGS; \
 			rcu_read_lock(); \
+			if (unlikely(atomic_dec_and_test(&m->use))) \
+				call_rcu(&m->m_rcu, free_mod_fromrcu); \
 		} \
 	} \
 	rcu_read_unlock(); \
 } while (0)
 
 #define ALLOC_SECURITY(BASE_FUNC,FUNC_WITH_ARGS,BASE_FREE,FREE_WITH_ARGS) do { \
-	int result; \
+	int result = 0; \
 	struct module_entry *m, *m2; \
 	rcu_read_lock(); \
 	stack_for_each_entry(m, &stacked_modules, lsm_list) { \
-		if (!m->module_operations.BASE_FUNC) \
-			continue; \
-		rcu_read_unlock(); \
-		result = m->module_operations.FUNC_WITH_ARGS; \
-		rcu_read_lock(); \
+		if (m->module_operations.BASE_FUNC) { \
+			atomic_inc(&m->use); \
+			rcu_read_unlock(); \
+			result = m->module_operations.FUNC_WITH_ARGS; \
+			rcu_read_lock(); \
+			if (unlikely(atomic_dec_and_test(&m->use))) \
+				call_rcu(&m->m_rcu, free_mod_fromrcu); \
+		} \
 		if (result) \
 			goto bad; \
 	} \
 	rcu_read_unlock(); \
 	return 0; \
 bad: \
-	stack_for_each_entry(m2, &all_modules, all_lsms) { \
+	stack_for_each_entry(m2, &stacked_modules, lsm_list) { \
 		if (m == m2) \
 			break; \
 		if (!m2->module_operations.BASE_FREE) \
 			continue; \
+		atomic_inc(&m2->use); \
 		rcu_read_unlock(); \
 		m2->module_operations.FREE_WITH_ARGS; \
 		rcu_read_lock(); \
+		if (unlikely(atomic_dec_and_test(&m2->use))) \
+			call_rcu(&m2->m_rcu, free_mod_fromrcu); \
 	} \
 	rcu_read_unlock(); \
 	return result; \
@@ -251,10 +300,16 @@ static int stacker_vm_enough_memory(long
 
 	rcu_read_lock();
 	stack_for_each_entry(m, &stacked_modules, lsm_list) {
-		if (!m->module_operations.vm_enough_memory)
+		if (!m->module_operations.vm_enough_memory) {
 			continue;
+		}
+		atomic_inc(&m->use);
 		rcu_read_unlock();
 		result = m->module_operations.vm_enough_memory(pages);
+		rcu_read_lock();
+		if (unlikely(atomic_dec_and_test(&m->use)))
+			stacker_free_module(m);
+		rcu_read_unlock();
 		return result;
 	}
 	rcu_read_unlock();
@@ -281,9 +336,12 @@ static int stacker_netlink_send (struct 
 		if (!m->module_operations.netlink_send)
 			continue;
 		NETLINK_CB(skb).eff_cap = ~0;
+		atomic_inc(&m->use);
 		rcu_read_unlock();
 		result = m->module_operations.netlink_send(sk, skb);
 		rcu_read_lock();
+		if (unlikely(atomic_dec_and_test(&m->use)))
+			call_rcu(&m->m_rcu, free_mod_fromrcu);
 		tmpcap &= NETLINK_CB(skb).eff_cap;
 		if (result)
 			break;
@@ -987,33 +1045,42 @@ stacker_getprocattr(struct task_struct *
 	stack_for_each_entry(m, &stacked_modules, lsm_list) {
 		if (!m->module_operations.getprocattr)
 			continue;
+		atomic_inc(&m->use);
 		rcu_read_unlock();
 		ret = m->module_operations.getprocattr(p, name,
 					value+len, size-len);
 		rcu_read_lock();
-		if (ret == -EINVAL)
-			continue;
-		found_noneinval = 1;
-		if (ret < 0) {
+		if (ret > 0) {
+			found_noneinval = 1;
+			len += ret;
+			if (len+m->namelen+4 < size) {
+				char *v = value;
+				if (v[len-1]=='\n')
+					len--;
+				len += sprintf(value+len, " (%s)\n",
+							m->module_name);
+			}
+		} else if (ret != -EINVAL) {
+			found_noneinval = 1;
 			memset(value, 0, len);
 			len = ret;
+		} else
+			ret = 0;
+
+		if (unlikely(atomic_dec_and_test(&m->use)))
+			call_rcu(&m->m_rcu, free_mod_fromrcu);
+
+		if (ret < 0)
 			break;
-		}
-		if (ret == 0)
-			continue;
-		len += ret;
-		if (len+m->namelen+4 < size) {
-			char *v = value;
-			if (v[len-1]=='\n')
-				len--;
-			len += sprintf(value+len, " (%s)\n", m->module_name);
-		}
 	}
 	rcu_read_unlock();
 
 	return found_noneinval ? len : -EINVAL;
 }
 
+/*
+ * find an lsm by name.  If found, increment its use count and return it
+ */
 static struct module_entry *
 find_active_lsm(const char *name, int len)
 {
@@ -1022,6 +1089,7 @@ find_active_lsm(const char *name, int le
 	rcu_read_lock();
 	stack_for_each_entry(m, &stacked_modules, lsm_list) {
 		if (m->namelen == len && !strncmp(m->module_name, name, len)) {
+			atomic_inc(&m->use);
 			ret = m;
 			break;
 		}
@@ -1043,6 +1111,7 @@ stacker_setprocattr(struct task_struct *
 	char *realv = (char *)value;
 	size_t dsize = size;
 	int loc = 0, end_data = size;
+	int ret, free_module = 0;
 
 	if (list_empty(&stacked_modules))
 		return -EINVAL;
@@ -1063,7 +1132,7 @@ stacker_setprocattr(struct task_struct *
 	callm = find_active_lsm(realv+loc+1, dsize-loc-1);
 	if (!callm)
 		goto call;
-
+	free_module = 1;
 
 	loc--;
 	while (loc && realv[loc]==' ')
@@ -1074,8 +1143,14 @@ call:
 	if (!callm || !callm->module_operations.setprocattr)
 		return -EINVAL;
 
-	return callm->module_operations.setprocattr(p, name, value, end_data) +
+	ret = callm->module_operations.setprocattr(p, name, value, end_data) +
 			(size-end_data);
+
+	if (free_module && atomic_dec_and_test(&callm->use))
+		stacker_free_module(callm);
+
+	return ret;
+
 }
 
 /*
@@ -1116,15 +1191,15 @@ static int stacker_register (const char 
 	new_module_entry->module_name = new_module_name;
 	new_module_entry->namelen = namelen;
 
+	atomic_set(&new_module_entry->use, 1);
+
 	INIT_LIST_HEAD(&new_module_entry->lsm_list);
-	INIT_LIST_HEAD(&new_module_entry->all_lsms);
 
 	rcu_read_lock();
 	if (!modules_registered) {
 		modules_registered++;
 		list_del_rcu(&default_module.lsm_list);
 	}
-	list_add_tail_rcu(&new_module_entry->all_lsms, &all_modules);
 	list_add_tail_rcu(&new_module_entry->lsm_list, &stacked_modules);
 	if (strcmp(name, "selinux") == 0)
 		selinux_module = new_module_entry;
@@ -1141,16 +1216,60 @@ out:
 }
 
 /*
- * Currently this version of stacker does not allow for module
- * unregistering.
- * Easy way to allow for this is using rcu ref counting like an older
- * version of stacker did.
- * Another way would be to force stacker_unregister to sleep between
- * removing the module from all_modules and free_modules and unloading it.
+ * find_lsm_with_namelen:
+ * Find a module by name.  Used by stacker_unregister.  Called with
+ * stacker spinlock held.
+ */
+static struct module_entry *
+find_lsm_with_namelen(const char *name, int len)
+{
+	struct module_entry *m, *ret = NULL;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(m, &stacked_modules, lsm_list) {
+		atomic_inc(&m->use);
+		rcu_read_unlock();
+		if (m->namelen == len && !strncmp(m->module_name, name, len))
+			ret = m;
+		rcu_read_lock();
+		if (unlikely(atomic_dec_and_test(&m->use)))
+			call_rcu(&m->m_rcu, free_mod_fromrcu);
+		if (ret)
+			break;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/*
  */
 static int stacker_unregister (const char *name, struct security_operations *ops)
 {
-	return -EPERM;
+	struct module_entry *m;
+	int len = strnlen(name, MAX_MODULE_NAME_LEN);
+	int ret = 0;
+
+	spin_lock(&stacker_lock);
+	m = find_lsm_with_namelen(name, len);
+
+	if (!m) {
+		printk(KERN_INFO "%s: could not find module %s.\n",
+				__FUNCTION__, name);
+		ret = -ENOENT;
+		goto out;
+	}
+
+	list_del_rcu(&m->lsm_list);
+
+	if (strcmp(m->module_name, "selinux") == 0)
+		selinux_module = NULL;
+	call_rcu(&m->m_rcu, stacker_del_module);
+
+out:
+	spin_unlock(&stacker_lock);
+
+	return ret;
 }
 
 static struct security_operations stacker_ops = {
@@ -1407,57 +1526,6 @@ static struct stacker_attribute stacker_
 	.show = listmodules_read,
 };
 
-/* respond to a request to unload a module */
-static ssize_t stacker_unload_write (struct stacker_kobj *obj, const char *name,
-					size_t count)
-{
-	struct module_entry *m;
-	int len = strnlen(name, MAX_MODULE_NAME_LEN);
-	int ret = count;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	if (count <= 0)
-		return -EINVAL;
-
-	if (!modules_registered)
-		return -EINVAL;
-
-	spin_lock(&stacker_lock);
-	m = find_active_lsm(name, len);
-
-	if (!m) {
-		printk(KERN_INFO "%s: could not find module %s.\n",
-			__FUNCTION__, name);
-		ret = -ENOENT;
-		goto out;
-	}
-
-	if (strcmp(m->module_name, "selinux") == 0)
-		selinux_module = NULL;
-
-	rcu_read_lock();
-	list_del_rcu(&m->lsm_list);
-	if (list_empty(&stacked_modules)) {
-		INIT_LIST_HEAD(&default_module.lsm_list);
-		list_add_tail_rcu(&default_module.lsm_list, &stacked_modules);
-		modules_registered = 0;
-	}
-	rcu_read_unlock();
-
-out:
-	spin_unlock(&stacker_lock);
-
-	return ret;
-}
-
-static struct stacker_attribute stacker_attr_unload = {
-	.attr = {.name = "unload", .mode = S_IFREG | S_IRUGO | S_IWUSR},
-	.store = stacker_unload_write,
-};
-
-
 /* stop responding to sysfs */
 static ssize_t stop_responding_write (struct stacker_kobj *obj,
 					const char *buff, size_t count)
@@ -1483,7 +1551,6 @@ static void unregister_sysfs_files(void)
 	sysfs_remove_file(kobj, &stacker_attr_lockdown.attr);
 	sysfs_remove_file(kobj, &stacker_attr_listmodules.attr);
 	sysfs_remove_file(kobj, &stacker_attr_stop_responding.attr);
-	sysfs_remove_file(kobj, &stacker_attr_unload.attr);
 
 	sysfsfiles_registered = 0;
 }
@@ -1506,8 +1573,6 @@ static int register_sysfs_files(void)
 			&stacker_attr_listmodules.attr);
 	sysfs_create_file(&stacker_subsys.kset.kobj,
 			&stacker_attr_stop_responding.attr);
-	sysfs_create_file(&stacker_subsys.kset.kobj,
-			&stacker_attr_unload.attr);
 	sysfsfiles_registered = 1;
 	stacker_dbg("sysfs files registered\n");
 	return 0;
@@ -1524,13 +1589,13 @@ static int __init stacker_init (void)
 	sysfsfiles_registered = 0;
 
 	INIT_LIST_HEAD(&stacked_modules);
-	INIT_LIST_HEAD(&all_modules);
 	spin_lock_init(&stacker_lock);
 	default_module.module_name = DEFAULT_MODULE_NAME;
 	default_module.namelen = strlen(DEFAULT_MODULE_NAME);
 	memcpy(&default_module.module_operations, &dummy_security_ops,
 			sizeof(struct security_operations));
 	INIT_LIST_HEAD(&default_module.lsm_list);
+	atomic_set(&default_module.use, 1);
 	list_add_tail(&default_module.lsm_list, &stacked_modules);
 
 	if (register_security (&stacker_ops)) {


* Re: rcu-refcount stacker performance
  2005-07-14 14:21 rcu-refcount stacker performance serue
@ 2005-07-14 15:23 ` Paul E. McKenney
  2005-07-14 13:44   ` serue
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2005-07-14 15:23 UTC (permalink / raw)
  To: serue; +Cc: lkml, Dipankar Sarma, David A. Wheeler, Tony Jones

On Thu, Jul 14, 2005 at 09:21:07AM -0500, serue@us.ibm.com wrote:
> On July 8 I sent out a patch which re-implemented the rcu-refcounting
> of the LSM list in stacker for the sake of supporting safe security
> module unloading.  (patch reattached here for convenience)  Here are
> some performance results with and without that patch.  Tests were run
> on a 16-way ppc64 machine.  Dbench was run 50 times, and kernbench
> and reaim were run 10 times, and intervals are 95% confidence half-
> intervals.
> 
> These results seem pretty poor.  I'm now wondering whether this is
> really necessary.  David Wheeler's original stacker had an option
> of simply waiting a while after a module was taken out of the list
> of active modules before freeing the modules.  Something like that
> is of course one option.  I'm hoping we can also take advantage of
> some already known module state info to be a little less coarse
> about it.  For instance, sys_delete_module() sets m->state to
> MODULE_STATE_GOING before calling mod->exit().  If in place of
> doing atomic_inc(&m->use), stacker skipped the m->hook() if
> m->state!=MODULE_STATE_LIVE, then it may be safe to assume that
> any m->hook() should be finished before sys_delete_module() gets
> to free_module(mod).  This seems to require adding a struct
> module argument to security/security.c:mod_reg_security() so an LSM
> can pass itself along.
> 
> So I'll try that next.  Hopefully by avoiding the potential cache
> line bounces which atomic_inc(&m->use) brings, this should provide
> far better performance.

My guess is that the reference count is indeed costing you quite a
bit.  I glanced quickly at the patch, and most of the uses seem to
be of the form:

	increment ref count
	rcu_read_lock()
	do something
	rcu_read_unlock()
	decrement ref count

Can't these cases rely solely on rcu_read_lock()?  Why do you also
need to increment the reference count in these cases?

						Thanx, Paul

> thanks,
> -serge
> 
> Dbench (throughput, larger is better)
> --------------------------------------------
> plain stacker:    1531.448400 +/- 15.791116
> stacker with rcu: 1408.056200 +/- 12.597277
> 
> Kernbench (runtime, smaller is better)
> --------------------------------------------
> plain stacker:    52.341000  +/- 0.184995
> stacker with rcu: 53.722000 +/- 0.161473
> 
> Reaim (numjobs, larger is better) (gnuplot-friendly format)
> plain stacker:
> ----------------------------------------------------------
> Numforked   jobs/minute         95% CI
> 1           106662.857000     5354.267865
> 3           301628.571000     6297.121934
> 5           488142.858000     16031.685536
> 7           673200.000000     23994.030784
> 9           852428.570000     31485.607271
> 11          961714.290000     0.000000
> 13          1108157.144000    27287.525982
> 15          1171178.571000    49790.796869
> 
> Reaim (numjobs, larger is better) (gnuplot-friendly format)
> stacker with rcu:
> ----------------------------------------------------------
> Numforked   jobs/minute         95% CI
> 1           100542.857000     2099.040645
> 3           266657.139000     6297.121934
> 5           398892.858000     12023.765252
> 7           467670.000000     14911.383385
> 9           418648.352000     11665.751441
> 11          396825.000000     8700.115252
> 13          357480.912000     7567.947838
> 15          337571.428000     2332.267703
> 
> Patch:
> 
> Index: linux-2.6.12/security/stacker.c
> ===================================================================
> --- linux-2.6.12.orig/security/stacker.c	2005-07-08 13:43:15.000000000 -0500
> +++ linux-2.6.12/security/stacker.c	2005-07-08 16:21:54.000000000 -0500
> @@ -33,13 +33,13 @@
>  
>  struct module_entry {
>  	struct list_head lsm_list;  /* list of active lsms */
> -	struct list_head all_lsms; /* list of all lsms */
>  	char *module_name;
>  	int namelen;
>  	struct security_operations module_operations;
> +	struct rcu_head m_rcu;
> +	atomic_t use;
>  };
>  static struct list_head stacked_modules;  /* list of stacked modules */
> -static struct list_head all_modules;  /* list of all modules, including freed */
>  
>  static short sysfsfiles_registered;
>  
> @@ -84,6 +84,32 @@ MODULE_PARM_DESC(debug, "Debug enabled o
>   * We return as soon as an error is returned.
>   */
>  
> +static inline void stacker_free_module(struct module_entry *m)
> +{
> +	kfree(m->module_name);
> +	kfree(m);
> +}
> +
> +/*
> + * Version of stacker_free_module called from call_rcu
> + */
> +static void free_mod_fromrcu(struct rcu_head *head)
> +{
> +	struct module_entry *m;
> +
> +	m = container_of(head, struct module_entry, m_rcu);
> +	stacker_free_module(m);
> +}
> +
> +static void stacker_del_module(struct rcu_head *head)
> +{
> +	struct module_entry *m;
> +	
> +	m = container_of(head, struct module_entry, m_rcu);
> +	if (atomic_dec_and_test(&m->use))
> +		stacker_free_module(m);
> +}
> +
>  #define stack_for_each_entry(pos, head, member)				\
>  	for (pos = list_entry((head)->next, typeof(*pos), member);	\
>  		&pos->member != (head);					\
> @@ -93,16 +119,27 @@ MODULE_PARM_DESC(debug, "Debug enabled o
>  /* to make this safe for module deletion, we would need to
>   * add a reference count to m as we had before
>   */
> +/*
> + * XXX We can't quite do this - we delete the module before we grab
> + * m->next?
> + * We could just do a call_rcu.  Then the call_rcu happens in the same
> + * rcu cycle as the dereference, so the module won't be deleted until the
> + * next cycle.
> + * That's what I'm going to do.
> + */
>  #define RETURN_ERROR_IF_ANY_ERROR(BASE_FUNC, FUNC_WITH_ARGS) do { \
>  	int result = 0; \
>  	struct module_entry *m; \
>  	rcu_read_lock(); \
>  	stack_for_each_entry(m, &stacked_modules, lsm_list) { \
> -		if (!m->module_operations.BASE_FUNC) \
> -			continue; \
> -		rcu_read_unlock(); \
> -		result = m->module_operations.FUNC_WITH_ARGS; \
> -		rcu_read_lock(); \
> +		if (m->module_operations.BASE_FUNC) { \
> +			atomic_inc(&m->use); \
> +			rcu_read_unlock(); \
> +			result = m->module_operations.FUNC_WITH_ARGS; \
> +			rcu_read_lock(); \
> +			if (unlikely(atomic_dec_and_test(&m->use))) \
> +				call_rcu(&m->m_rcu, free_mod_fromrcu); \
> +		} \
>  		if (result) \
>  			break; \
>  	} \
> @@ -116,9 +153,12 @@ MODULE_PARM_DESC(debug, "Debug enabled o
>  	rcu_read_lock(); \
>  	stack_for_each_entry(m, &stacked_modules, lsm_list) { \
>  		if (m->module_operations.BASE_FUNC) { \
> +			atomic_inc(&m->use); \
>  			rcu_read_unlock(); \
>  			m->module_operations.FUNC_WITH_ARGS; \
>  			rcu_read_lock(); \
> +			if (unlikely(atomic_dec_and_test(&m->use))) \
> +				call_rcu(&m->m_rcu, free_mod_fromrcu); \
>  		} \
>  	} \
>  	rcu_read_unlock(); \
> @@ -129,38 +169,47 @@ MODULE_PARM_DESC(debug, "Debug enabled o
>  	rcu_read_lock(); \
>  	stack_for_each_entry(m, &stacked_modules, lsm_list ) { \
>  		if (m->module_operations.BASE_FREE) { \
> +			atomic_inc(&m->use); \
>  			rcu_read_unlock(); \
>  			m->module_operations.FREE_WITH_ARGS; \
>  			rcu_read_lock(); \
> +			if (unlikely(atomic_dec_and_test(&m->use))) \
> +				call_rcu(&m->m_rcu, free_mod_fromrcu); \
>  		} \
>  	} \
>  	rcu_read_unlock(); \
>  } while (0)
>  
>  #define ALLOC_SECURITY(BASE_FUNC,FUNC_WITH_ARGS,BASE_FREE,FREE_WITH_ARGS) do { \
> -	int result; \
> +	int result = 0; \
>  	struct module_entry *m, *m2; \
>  	rcu_read_lock(); \
>  	stack_for_each_entry(m, &stacked_modules, lsm_list) { \
> -		if (!m->module_operations.BASE_FUNC) \
> -			continue; \
> -		rcu_read_unlock(); \
> -		result = m->module_operations.FUNC_WITH_ARGS; \
> -		rcu_read_lock(); \
> +		if (m->module_operations.BASE_FUNC) { \
> +			atomic_inc(&m->use); \
> +			rcu_read_unlock(); \
> +			result = m->module_operations.FUNC_WITH_ARGS; \
> +			rcu_read_lock(); \
> +			if (unlikely(atomic_dec_and_test(&m->use))) \
> +				call_rcu(&m->m_rcu, free_mod_fromrcu); \
> +		} \
>  		if (result) \
>  			goto bad; \
>  	} \
>  	rcu_read_unlock(); \
>  	return 0; \
>  bad: \
> -	stack_for_each_entry(m2, &all_modules, all_lsms) { \
> +	stack_for_each_entry(m2, &stacked_modules, lsm_list) { \
>  		if (m == m2) \
>  			break; \
>  		if (!m2->module_operations.BASE_FREE) \
>  			continue; \
> +		atomic_inc(&m2->use); \
>  		rcu_read_unlock(); \
>  		m2->module_operations.FREE_WITH_ARGS; \
>  		rcu_read_lock(); \
> +		if (unlikely(atomic_dec_and_test(&m2->use))) \
> +			call_rcu(&m2->m_rcu, free_mod_fromrcu); \
>  	} \
>  	rcu_read_unlock(); \
>  	return result; \
> @@ -251,10 +300,16 @@ static int stacker_vm_enough_memory(long
>  
>  	rcu_read_lock();
>  	stack_for_each_entry(m, &stacked_modules, lsm_list) {
> -		if (!m->module_operations.vm_enough_memory)
> +		if (!m->module_operations.vm_enough_memory) {
>  			continue;
> +		}
> +		atomic_inc(&m->use);
>  		rcu_read_unlock();
>  		result = m->module_operations.vm_enough_memory(pages);
> +		rcu_read_lock();
> +		if (unlikely(atomic_dec_and_test(&m->use)))
> +			stacker_free_module(m);
> +		rcu_read_unlock();
>  		return result;
>  	}
>  	rcu_read_unlock();
> @@ -281,9 +336,12 @@ static int stacker_netlink_send (struct 
>  		if (!m->module_operations.netlink_send)
>  			continue;
>  		NETLINK_CB(skb).eff_cap = ~0;
> +		atomic_inc(&m->use);
>  		rcu_read_unlock();
>  		result = m->module_operations.netlink_send(sk, skb);
>  		rcu_read_lock();
> +		if (unlikely(atomic_dec_and_test(&m->use)))
> +			call_rcu(&m->m_rcu, free_mod_fromrcu);
>  		tmpcap &= NETLINK_CB(skb).eff_cap;
>  		if (result)
>  			break;
> @@ -987,33 +1045,42 @@ stacker_getprocattr(struct task_struct *
>  	stack_for_each_entry(m, &stacked_modules, lsm_list) {
>  		if (!m->module_operations.getprocattr)
>  			continue;
> +		atomic_inc(&m->use);
>  		rcu_read_unlock();
>  		ret = m->module_operations.getprocattr(p, name,
>  					value+len, size-len);
>  		rcu_read_lock();
> -		if (ret == -EINVAL)
> -			continue;
> -		found_noneinval = 1;
> -		if (ret < 0) {
> +		if (ret > 0) {
> +			found_noneinval = 1;
> +			len += ret;
> +			if (len+m->namelen+4 < size) {
> +				char *v = value;
> +				if (v[len-1]=='\n')
> +					len--;
> +				len += sprintf(value+len, " (%s)\n",
> +							m->module_name);
> +			}
> +		} else if (ret != -EINVAL) {
> +			found_noneinval = 1;
>  			memset(value, 0, len);
>  			len = ret;
> +		} else
> +			ret = 0;
> +
> +		if (unlikely(atomic_dec_and_test(&m->use)))
> +			call_rcu(&m->m_rcu, free_mod_fromrcu);
> +
> +		if (ret < 0)
>  			break;
> -		}
> -		if (ret == 0)
> -			continue;
> -		len += ret;
> -		if (len+m->namelen+4 < size) {
> -			char *v = value;
> -			if (v[len-1]=='\n')
> -				len--;
> -			len += sprintf(value+len, " (%s)\n", m->module_name);
> -		}
>  	}
>  	rcu_read_unlock();
>  
>  	return found_noneinval ? len : -EINVAL;
>  }
>  
> +/*
> + * find an lsm by name.  If found, increment its use count and return it
> + */
>  static struct module_entry *
>  find_active_lsm(const char *name, int len)
>  {
> @@ -1022,6 +1089,7 @@ find_active_lsm(const char *name, int le
>  	rcu_read_lock();
>  	stack_for_each_entry(m, &stacked_modules, lsm_list) {
>  		if (m->namelen == len && !strncmp(m->module_name, name, len)) {
> +			atomic_inc(&m->use);
>  			ret = m;
>  			break;
>  		}
> @@ -1043,6 +1111,7 @@ stacker_setprocattr(struct task_struct *
>  	char *realv = (char *)value;
>  	size_t dsize = size;
>  	int loc = 0, end_data = size;
> +	int ret, free_module = 0;
>  
>  	if (list_empty(&stacked_modules))
>  		return -EINVAL;
> @@ -1063,7 +1132,7 @@ stacker_setprocattr(struct task_struct *
>  	callm = find_active_lsm(realv+loc+1, dsize-loc-1);
>  	if (!callm)
>  		goto call;
> -
> +	free_module = 1;
>  
>  	loc--;
>  	while (loc && realv[loc]==' ')
> @@ -1074,8 +1143,14 @@ call:
>  	if (!callm || !callm->module_operations.setprocattr)
>  		return -EINVAL;
>  
> -	return callm->module_operations.setprocattr(p, name, value, end_data) +
> +	ret = callm->module_operations.setprocattr(p, name, value, end_data) +
>  			(size-end_data);
> +
> +	if (free_module && atomic_dec_and_test(&callm->use))
> +		stacker_free_module(callm);
> +
> +	return ret;
> +
>  }
>  
>  /*
> @@ -1116,15 +1191,15 @@ static int stacker_register (const char 
>  	new_module_entry->module_name = new_module_name;
>  	new_module_entry->namelen = namelen;
>  
> +	atomic_set(&new_module_entry->use, 1);
> +
>  	INIT_LIST_HEAD(&new_module_entry->lsm_list);
> -	INIT_LIST_HEAD(&new_module_entry->all_lsms);
>  
>  	rcu_read_lock();
>  	if (!modules_registered) {
>  		modules_registered++;
>  		list_del_rcu(&default_module.lsm_list);
>  	}
> -	list_add_tail_rcu(&new_module_entry->all_lsms, &all_modules);
>  	list_add_tail_rcu(&new_module_entry->lsm_list, &stacked_modules);
>  	if (strcmp(name, "selinux") == 0)
>  		selinux_module = new_module_entry;
> @@ -1141,16 +1216,60 @@ out:
>  }
>  
>  /*
> - * Currently this version of stacker does not allow for module
> - * unregistering.
> - * Easy way to allow for this is using rcu ref counting like an older
> - * version of stacker did.
> - * Another way would be to force stacker_unregister to sleep between
> - * removing the module from all_modules and free_modules and unloading it.
> + * find_lsm_with_namelen:
> + * Find a module by name.  Used by stacker_unregister.  Called with
> + * the stacker spinlock held.
> + */
> +static struct module_entry *
> +find_lsm_with_namelen(const char *name, int len)
> +{
> +	struct module_entry *m, *ret = NULL;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(m, &stacked_modules, lsm_list) {
> +		atomic_inc(&m->use);
> +		rcu_read_unlock();
> +		if (m->namelen == len && !strncmp(m->module_name, name, len))
> +			ret = m;
> +		rcu_read_lock();
> +		if (unlikely(atomic_dec_and_test(&m->use)))
> +			call_rcu(&m->m_rcu, free_mod_fromrcu);
> +		if (ret)
> +			break;
> +	}
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +/*
>   */
>  static int stacker_unregister (const char *name, struct security_operations *ops)
>  {
> -	return -EPERM;
> +	struct module_entry *m;
> +	int len = strnlen(name, MAX_MODULE_NAME_LEN);
> +	int ret = 0;
> +
> +	spin_lock(&stacker_lock);
> +	m = find_lsm_with_namelen(name, len);
> +
> +	if (!m) {
> +		printk(KERN_INFO "%s: could not find module %s.\n",
> +				__FUNCTION__, name);
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	list_del_rcu(&m->lsm_list);
> +
> +	if (strcmp(m->module_name, "selinux") == 0)
> +		selinux_module = NULL;
> +	call_rcu(&m->m_rcu, stacker_del_module);
> +
> +out:
> +	spin_unlock(&stacker_lock);
> +
> +	return ret;
>  }
>  
>  static struct security_operations stacker_ops = {
> @@ -1407,57 +1526,6 @@ static struct stacker_attribute stacker_
>  	.show = listmodules_read,
>  };
>  
> -/* respond to a request to unload a module */
> -static ssize_t stacker_unload_write (struct stacker_kobj *obj, const char *name,
> -					size_t count)
> -{
> -	struct module_entry *m;
> -	int len = strnlen(name, MAX_MODULE_NAME_LEN);
> -	int ret = count;
> -
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -
> -	if (count <= 0)
> -		return -EINVAL;
> -
> -	if (!modules_registered)
> -		return -EINVAL;
> -
> -	spin_lock(&stacker_lock);
> -	m = find_active_lsm(name, len);
> -
> -	if (!m) {
> -		printk(KERN_INFO "%s: could not find module %s.\n",
> -			__FUNCTION__, name);
> -		ret = -ENOENT;
> -		goto out;
> -	}
> -
> -	if (strcmp(m->module_name, "selinux") == 0)
> -		selinux_module = NULL;
> -
> -	rcu_read_lock();
> -	list_del_rcu(&m->lsm_list);
> -	if (list_empty(&stacked_modules)) {
> -		INIT_LIST_HEAD(&default_module.lsm_list);
> -		list_add_tail_rcu(&default_module.lsm_list, &stacked_modules);
> -		modules_registered = 0;
> -	}
> -	rcu_read_unlock();
> -
> -out:
> -	spin_unlock(&stacker_lock);
> -
> -	return ret;
> -}
> -
> -static struct stacker_attribute stacker_attr_unload = {
> -	.attr = {.name = "unload", .mode = S_IFREG | S_IRUGO | S_IWUSR},
> -	.store = stacker_unload_write,
> -};
> -
> -
>  /* stop responding to sysfs */
>  static ssize_t stop_responding_write (struct stacker_kobj *obj,
>  					const char *buff, size_t count)
> @@ -1483,7 +1551,6 @@ static void unregister_sysfs_files(void)
>  	sysfs_remove_file(kobj, &stacker_attr_lockdown.attr);
>  	sysfs_remove_file(kobj, &stacker_attr_listmodules.attr);
>  	sysfs_remove_file(kobj, &stacker_attr_stop_responding.attr);
> -	sysfs_remove_file(kobj, &stacker_attr_unload.attr);
>  
>  	sysfsfiles_registered = 0;
>  }
> @@ -1506,8 +1573,6 @@ static int register_sysfs_files(void)
>  			&stacker_attr_listmodules.attr);
>  	sysfs_create_file(&stacker_subsys.kset.kobj,
>  			&stacker_attr_stop_responding.attr);
> -	sysfs_create_file(&stacker_subsys.kset.kobj,
> -			&stacker_attr_unload.attr);
>  	sysfsfiles_registered = 1;
>  	stacker_dbg("sysfs files registered\n");
>  	return 0;
> @@ -1524,13 +1589,13 @@ static int __init stacker_init (void)
>  	sysfsfiles_registered = 0;
>  
>  	INIT_LIST_HEAD(&stacked_modules);
> -	INIT_LIST_HEAD(&all_modules);
>  	spin_lock_init(&stacker_lock);
>  	default_module.module_name = DEFAULT_MODULE_NAME;
>  	default_module.namelen = strlen(DEFAULT_MODULE_NAME);
>  	memcpy(&default_module.module_operations, &dummy_security_ops,
>  			sizeof(struct security_operations));
>  	INIT_LIST_HEAD(&default_module.lsm_list);
> +	atomic_set(&default_module.use, 1);
>  	list_add_tail(&default_module.lsm_list, &stacked_modules);
>  
>  	if (register_security (&stacker_ops)) {
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: rcu-refcount stacker performance
  2005-07-14 13:44   ` serue
@ 2005-07-14 16:59     ` Paul E. McKenney
  2005-07-14 17:13       ` serue
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2005-07-14 16:59 UTC (permalink / raw)
  To: serue; +Cc: lkml, Dipankar Sarma, David A. Wheeler, Tony Jones

On Thu, Jul 14, 2005 at 08:44:50AM -0500, serue@us.ibm.com wrote:
> Quoting Paul E. McKenney (paulmck@us.ibm.com):
> > My guess is that the reference count is indeed costing you quite a
> > bit.  I glance quickly at the patch, and most of the uses seem to
> > be of the form:
> > 
> > 	increment ref count
> > 	rcu_read_lock()
> > 	do something
> > 	rcu_read_unlock()
> > 	decrement ref count
> > 
> > Can't these cases rely solely on rcu_read_lock()?  Why do you also
> > need to increment the reference count in these cases?
> 
> The problem is on module unload: is it possible for CPU1 to be
> on "do something", and sleep, and, while it sleeps, CPU2 does
> rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
> is executing has been freed?

OK, but in the above case, "do something" cannot be sleeping, since
it is under rcu_read_lock().

> Because stacker won't remove the lsm from the list of modules
> until mod->exit() is executed, and module_free(mod) happens
> immediately after that, the above scenario seems possible.

Right, if you have some other code path that sleeps (outside of
rcu_read_lock(), right?), then you need the reference count for that
code path.  But the code paths that do not sleep should be able to
dispense with the reference count, reducing the cache-line traffic.

						Thanx, Paul

* Re: rcu-refcount stacker performance
  2005-07-14 16:59     ` Paul E. McKenney
@ 2005-07-14 17:13       ` serue
  2005-07-14 18:50         ` Paul E. McKenney
  2005-07-15  0:29         ` Joe Seigh
  0 siblings, 2 replies; 9+ messages in thread
From: serue @ 2005-07-14 17:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: serue, lkml, Dipankar Sarma, David A. Wheeler, Tony Jones

Quoting Paul E. McKenney (paulmck@us.ibm.com):
> On Thu, Jul 14, 2005 at 08:44:50AM -0500, serue@us.ibm.com wrote:
> > Quoting Paul E. McKenney (paulmck@us.ibm.com):
> > > My guess is that the reference count is indeed costing you quite a
> > > bit.  I glance quickly at the patch, and most of the uses seem to
> > > be of the form:
> > > 
> > > 	increment ref count
> > > 	rcu_read_lock()
> > > 	do something
> > > 	rcu_read_unlock()
> > > 	decrement ref count
> > > 
> > > Can't these cases rely solely on rcu_read_lock()?  Why do you also
> > > need to increment the reference count in these cases?
> > 
> > The problem is on module unload: is it possible for CPU1 to be
> > on "do something", and sleep, and, while it sleeps, CPU2 does
> > rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
> > is executing has been freed?
> 
> OK, but in the above case, "do something" cannot be sleeping, since
> it is under rcu_read_lock().

Oh, but that's not quite what the code is doing; rather, it is doing:

	rcu_read_lock
	while get next element from list
		inc element.refcount
		rcu_read_unlock
		do something
		rcu_read_lock
		dec refcount
	rcu_read_unlock

What I plan to try next is:

	rcu_read_lock
	while get next element from list
		if (element->owning_module->state != LIVE)
			continue
		rcu_read_unlock
		do something
		rcu_read_lock
	rcu_read_unlock

> > Because stacker won't remove the lsm from the list of modules
> > until mod->exit() is executed, and module_free(mod) happens
> > immediately after that, the above scenario seems possible.
> 
> Right, if you have some other code path that sleeps (outside of
> rcu_read_lock(), right?), then you need the reference count for that
> code path.  But the code paths that do not sleep should be able to
> dispense with the reference count, reducing the cache-line traffic.

Most if not all of the codepaths can sleep, however.  So unfortunately
that doesn't seem a feasible solution.  That's why I'm hoping there is
something inherent in the module unload code that I can take advantage
of to forego my own refcounting.

thanks,
-serge

* Re: rcu-refcount stacker performance
  2005-07-14 17:13       ` serue
@ 2005-07-14 18:50         ` Paul E. McKenney
  2005-07-14 19:09           ` serue
  2005-07-15  0:29         ` Joe Seigh
  1 sibling, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2005-07-14 18:50 UTC (permalink / raw)
  To: serue; +Cc: lkml, Dipankar Sarma, David A. Wheeler, Tony Jones

On Thu, Jul 14, 2005 at 12:13:57PM -0500, serue@us.ibm.com wrote:
> Quoting Paul E. McKenney (paulmck@us.ibm.com):
> > On Thu, Jul 14, 2005 at 08:44:50AM -0500, serue@us.ibm.com wrote:
> > > Quoting Paul E. McKenney (paulmck@us.ibm.com):
> > > > My guess is that the reference count is indeed costing you quite a
> > > > bit.  I glance quickly at the patch, and most of the uses seem to
> > > > be of the form:
> > > > 
> > > > 	increment ref count
> > > > 	rcu_read_lock()
> > > > 	do something
> > > > 	rcu_read_unlock()
> > > > 	decrement ref count
> > > > 
> > > > Can't these cases rely solely on rcu_read_lock()?  Why do you also
> > > > need to increment the reference count in these cases?
> > > 
> > > The problem is on module unload: is it possible for CPU1 to be
> > > on "do something", and sleep, and, while it sleeps, CPU2 does
> > > rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
> > > is executing has been freed?
> > 
> > OK, but in the above case, "do something" cannot be sleeping, since
> > it is under rcu_read_lock().
> 
> Oh, but that's not quite what the code is doing, rather it is doing:
> 
> 	rcu_read_lock
> 	while get next element from list
> 		inc element.refcount
> 		rcu_read_unlock
> 		do something
> 		rcu_read_lock
> 		dec refcount
> 	rcu_read_unlock

Color me blind this morning...  :-/  Yes, "do something" can legitimately
sleep.  Sorry for my confusion!

> What I plan to try next is:
> 
> 	rcu_read_lock
> 	while get next element from list
> 		if (element->owning_module->state != LIVE)
> 			continue
> 		rcu_read_unlock
> 		do something
> 		rcu_read_lock
> 	rcu_read_unlock
> 
> > > Because stacker won't remove the lsm from the list of modules
> > > until mod->exit() is executed, and module_free(mod) happens
> > > immediately after that, the above scenario seems possible.
> > 
> > Right, if you have some other code path that sleeps (outside of
> > rcu_read_lock(), right?), then you need the reference count for that
> > code path.  But the code paths that do not sleep should be able to
> > dispense with the reference count, reducing the cache-line traffic.
> 
> Most if not all of the codepaths can sleep, however.  So unfortunately
> that doesn't seem a feasible solution.  That's why I'm hoping there is
> something inherent in the module unload code that I can take advantage
> of to forego my own refcounting.

OK, so the only way that elements are removed is when a module is
unloaded, right?

If your module trick does not pan out, how about the following:

o	Add a "need per-element reference count" global variable

o	Have a per-CPU reference-count variable.

o	Make your code snippet do something like the following:

	rcu_read_lock()
	while get next element from list
		if (need per-element reference count)
			ref = &element.refcount
		else
			ref = &__get_cpu_var(stacker_refcounts)
		atomic_inc(ref)
		rcu_read_unlock()
		do something
		rcu_read_lock()
		atomic_dec(ref)
	rcu_read_unlock()

o	The point is to (hopefully) reduce the cache thrashing associated
	with the reference counts.

At module unload time, do something like the following:

	need per-element reference count = 1
	synchronize_rcu()
	for_each_cpu(cpu)
		while (per_cpu(stacker_refcounts,cpu) != 0)
			sleep for a bit

	/* At this point, all CPUs are using per-element reference counts */

If this approach does not reduce cache thrashing enough, one could use
a per-task reference count instead of a per-CPU reference count.  The
downside of doing this per-task approach is that you have to traverse
the entire task list at unload time.  But module unloading should be
quite rare.  If doing the per-task approach, you don't need atomic
increments and decrements for the reference count, and you have excellent
cache locality.

							Thanx, Paul

* Re: rcu-refcount stacker performance
  2005-07-14 18:50         ` Paul E. McKenney
@ 2005-07-14 19:09           ` serue
  0 siblings, 0 replies; 9+ messages in thread
From: serue @ 2005-07-14 19:09 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: lkml, Dipankar Sarma, David A. Wheeler, Tony Jones

Thanks, Paul, that's a great idea!

The approach I'm testing right now just does a module_get(mod), which
is released when you manually disable the module by echoing its name
into /sys/kernel/security/stacker/unload, so that must be done before
you can rmmod.  This way no refcounting is actually needed at all
in the time-critical loops, but it involves a change in the LSM API
which may well be rejected, so your approach may be the way to go.

thanks,
-serge

Quoting Paul E. McKenney (paulmck@us.ibm.com):
> On Thu, Jul 14, 2005 at 12:13:57PM -0500, serue@us.ibm.com wrote:
> > Quoting Paul E. McKenney (paulmck@us.ibm.com):
> > > On Thu, Jul 14, 2005 at 08:44:50AM -0500, serue@us.ibm.com wrote:
> > > > Quoting Paul E. McKenney (paulmck@us.ibm.com):
> > > > > My guess is that the reference count is indeed costing you quite a
> > > > > bit.  I glance quickly at the patch, and most of the uses seem to
> > > > > be of the form:
> > > > > 
> > > > > 	increment ref count
> > > > > 	rcu_read_lock()
> > > > > 	do something
> > > > > 	rcu_read_unlock()
> > > > > 	decrement ref count
> > > > > 
> > > > > Can't these cases rely solely on rcu_read_lock()?  Why do you also
> > > > > need to increment the reference count in these cases?
> > > > 
> > > > The problem is on module unload: is it possible for CPU1 to be
> > > > on "do something", and sleep, and, while it sleeps, CPU2 does
> > > > rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
> > > > is executing has been freed?
> > > 
> > > OK, but in the above case, "do something" cannot be sleeping, since
> > > it is under rcu_read_lock().
> > 
> > Oh, but that's not quite what the code is doing, rather it is doing:
> > 
> > 	rcu_read_lock
> > 	while get next element from list
> > 		inc element.refcount
> > 		rcu_read_unlock
> > 		do something
> > 		rcu_read_lock
> > 		dec refcount
> > 	rcu_read_unlock
> 
> Color me blind this morning...  :-/  Yes, "do something" can legitimately
> sleep.  Sorry for my confusion!
> 
> > What I plan to try next is:
> > 
> > 	rcu_read_lock
> > 	while get next element from list
> > 		if (element->owning_module->state != LIVE)
> > 			continue
> > 		rcu_read_unlock
> > 		do something
> > 		rcu_read_lock
> > 	rcu_read_unlock
> > 
> > > > Because stacker won't remove the lsm from the list of modules
> > > > until mod->exit() is executed, and module_free(mod) happens
> > > > immediately after that, the above scenario seems possible.
> > > 
> > > Right, if you have some other code path that sleeps (outside of
> > > rcu_read_lock(), right?), then you need the reference count for that
> > > code path.  But the code paths that do not sleep should be able to
> > > dispense with the reference count, reducing the cache-line traffic.
> > 
> > Most if not all of the codepaths can sleep, however.  So unfortunately
> > that doesn't seem a feasible solution.  That's why I'm hoping there is
> > something inherent in the module unload code that I can take advantage
> > of to forego my own refcounting.
> 
> OK, so the only way that elements are removed is when a module is
> unloaded, right?
> 
> If your module trick does not pan out, how about the following:
> 
> o	Add a "need per-element reference count" global variable
> 
> o	Have a per-CPU reference-count variable.
> 
> o	Make your code snippet do something like the following:
> 
> 	rcu_read_lock()
> 	while get next element from list
> 		if (need per-element reference count)
> 			ref = &element.refcount
> 		else
> 			ref = &__get_cpu_var(stacker_refcounts)
> 		atomic_inc(ref)
> 		rcu_read_unlock()
> 		do something
> 		rcu_read_lock()
> 		atomic_dec(ref)
> 	rcu_read_unlock()
> 
> o	The point is to (hopefully) reduce the cache thrashing associated
> 	with the reference counts.
> 
> At module unload time, do something like the following:
> 
> 	need per-element reference count = 1
> 	synchronize_rcu()
> 	for_each_cpu(cpu)
> 		while (per_cpu(stacker_refcounts,cpu) != 0)
> 			sleep for a bit
> 
> 	/* At this point, all CPUs are using per-element reference counts */
> 
> If this approach does not reduce cache thrashing enough, one could use
> a per-task reference count instead of a per-CPU reference count.  The
> downside of doing this per-task approach is that you have to traverse
> the entire task list at unload time.  But module unloading should be
> quite rare.  If doing the per-task approach, you don't need atomic
> increments and decrements for the reference count, and you have excellent
> cache locality.
> 
> 							Thanx, Paul
> 

* Re: rcu-refcount stacker performance
  2005-07-14 17:13       ` serue
  2005-07-14 18:50         ` Paul E. McKenney
@ 2005-07-15  0:29         ` Joe Seigh
  2005-07-15 13:59           ` Joe Seigh
  1 sibling, 1 reply; 9+ messages in thread
From: Joe Seigh @ 2005-07-15  0:29 UTC (permalink / raw)
  To: linux-kernel

serue@us.ibm.com wrote:
> Quoting Paul E. McKenney (paulmck@us.ibm.com):
>>OK, but in the above case, "do something" cannot be sleeping, since
>>it is under rcu_read_lock().
> 
> 
> Oh, but that's not quite what the code is doing, rather it is doing:
> 
> 	rcu_read_lock
> 	while get next element from list
> 		inc element.refcount
> 		rcu_read_unlock
> 		do something
> 		rcu_read_lock
> 		dec refcount
> 	rcu_read_unlock
> 

I've been experimenting with various lock-free methods in user space,
which is preemptible.  Stuff like RCU, RCU+SMR which I've mentioned
before, and some atomically thread-safe reference counting.  I have a
proxy GC based on the latter called APPC (atomic pointer proxy
collector).  Basically you use one refcounted proxy object for the
whole list rather than a refcount on every element in the list.
Before you access the list, you increment the refcount of the proxy
object, and afterwards you decrement it.  One interlocked instruction
for each, so performance-wise it looks like a reader lock which never
blocks.

Writers enqueue deleted nodes on the collector object and then
push a new collector object in place.

The collector objects look like


    proxy_anchor -> c_obj <- c_obj <- c_obj
                                       ^
                                       | reader

The previous collector objects are back-linked, so when a reader
thread releases the last reference to one, all unreferenced collector
objects, along with their attached nodes, are deallocated.

A bit sketchy.  You can see a working example of this using
C++ refcounted pointers (which can't be used in the kernel
naturally, you'll have to implement your own) at
http://atomic-ptr-plus.sourceforge.net/

--
Joe Seigh


* Re: rcu-refcount stacker performance
  2005-07-15  0:29         ` Joe Seigh
@ 2005-07-15 13:59           ` Joe Seigh
  0 siblings, 0 replies; 9+ messages in thread
From: Joe Seigh @ 2005-07-15 13:59 UTC (permalink / raw)
  To: linux-kernel

Joe Seigh wrote:
> A bit sketchy.  You can see a working example of this using
> C++ refcounted pointers (which can't be used in the kernel
> naturally, you'll have to implement your own) at
> http://atomic-ptr-plus.sourceforge.net/

The APPC stuff is in the atomic-ptr-plus package if anyone is
wondering where it is.  It was one of what I guess you'd
call strawman packages.  The RCU+SMR (fastsmr) package is
currently the one I'll probably carry forward.

> 
> -- 
> Joe Seigh
> 


end of thread, other threads:[~2005-07-15 13:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-14 14:21 rcu-refcount stacker performance serue
2005-07-14 15:23 ` Paul E. McKenney
2005-07-14 13:44   ` serue
2005-07-14 16:59     ` Paul E. McKenney
2005-07-14 17:13       ` serue
2005-07-14 18:50         ` Paul E. McKenney
2005-07-14 19:09           ` serue
2005-07-15  0:29         ` Joe Seigh
2005-07-15 13:59           ` Joe Seigh

This is a public inbox; see the mirroring instructions for how to
clone and mirror all data and code used for this inbox.