[Ocfs2-devel] [RFC] configfs: module reference counting rules

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
@ 2008-06-12 23:54 ` Louis Rilling
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-12 23:54 UTC (permalink / raw)
  To: Joel Becker; +Cc: ocfs2-devel, linux-kernel

Hi,

I'm a bit confused by configfs module reference counting, and I feel
that it does not really protect against module unloading. If my feeling
is correct, I'd suggest to change module reference counting rules in
configfs, for instance like in the attached patch. If my feeling is
wrong, could someone shed some light?

Thanks

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configfs-fix-module-refcount.patch
Type: text/x-diff
Size: 6835 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080613/32facd75/attachment-0001.bin 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC] configfs: module reference counting rules
@ 2008-06-12 23:54 ` Louis Rilling
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-12 23:54 UTC (permalink / raw)
  To: Joel Becker; +Cc: ocfs2-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 568 bytes --]

Hi,

I'm a bit confused by configfs module reference counting, and I feel
that it does not really protect against module unloading. If my feeling
is correct, I'd suggest to change module reference counting rules in
configfs, for instance like in the attached patch. If my feeling is
wrong, could someone shed some light?

Thanks

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

[-- Attachment #2: configfs-fix-module-refcount.patch --]
[-- Type: text/x-diff, Size: 6835 bytes --]

configfs: make module reference counting robust

Module references taken by configfs at mkdir() unfortunately do not ensure that
the module implementing an item will remain loaded before the item is released,
because
	1/ the reference is taken after having created the item, and
	2/ the reference is dropped when dropping the item, which may be before
	   the item's last reference count is dropped.
For instance this can lead to calling the release item operation when the module
implementing the function is already unloaded.

This patch changes module reference counting rules in configfs to ensure that
for each item appearing in configfs a module reference is held until the item is
released. For each item/group/default group created in make_item()/make_group(),
the subsystem is responsible for grabbing a reference on the matching modules.
configfs will drop that reference after having released the item/group/default
group.

Something similar might be needed for configfs_attribute as well.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 drivers/net/netconsole.c       |    1 +
 fs/configfs/dir.c              |   17 +++++++----------
 fs/configfs/item.c             |    5 +++++
 fs/dlm/config.c                |    7 +++++++
 fs/ocfs2/cluster/heartbeat.c   |    1 +
 fs/ocfs2/cluster/nodemanager.c |    5 +++++
 6 files changed, 26 insertions(+), 10 deletions(-)

Index: b/drivers/net/netconsole.c
===================================================================
--- a/drivers/net/netconsole.c	2008-06-13 01:24:55.000000000 +0200
+++ b/drivers/net/netconsole.c	2008-06-13 01:25:51.000000000 +0200
@@ -608,6 +608,7 @@ static struct config_item *make_netconso
 	memset(nt->np.remote_mac, 0xff, ETH_ALEN);
 
 	/* Initialize the config_item member */
+	__module_get(netconsole_target_type.ct_owner);
 	config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
 	/* Adding, but it is disabled */
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-13 00:41:49.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-13 01:26:13.000000000 +0200
@@ -1080,18 +1080,18 @@ static int configfs_mkdir(struct inode *
 		goto out_unlink;
 	}
 
+	/*
+	 * make_item()/make_group() should have grabbed a module reference for item
+	 * Let's check it but do not keep one module reference more. The
+	 * reference for item will be dropped after item is released.
+	 */
 	owner = type->ct_owner;
 	if (!try_module_get(owner)) {
+		printk(KERN_ERR "configfs: Ill-behaved subsystem: module reference count may be broken\n");
 		ret = -EINVAL;
 		goto out_unlink;
 	}
-
-	/*
-	 * I hate doing it this way, but if there is
-	 * an error,  module_put() probably should
-	 * happen after any cleanup.
-	 */
-	module_got = 1;
+	module_put(owner)
 
 	if (group)
 		ret = configfs_attach_group(parent_item, item, dentry);
@@ -1111,9 +1111,6 @@ out_unlink:
 		client_drop_item(parent_item, item);
 
 		mutex_unlock(&subsys->su_mutex);
-
-		if (module_got)
-			module_put(owner);
 	}
 
 out_put:
Index: b/fs/configfs/item.c
===================================================================
--- a/fs/configfs/item.c	2008-06-13 00:57:53.000000000 +0200
+++ b/fs/configfs/item.c	2008-06-13 00:59:42.000000000 +0200
@@ -153,13 +153,18 @@ static void config_item_cleanup(struct c
 	struct config_item_type * t = item->ci_type;
 	struct config_group * s = item->ci_group;
 	struct config_item * parent = item->ci_parent;
+	struct module *owner = NULL;
 
 	pr_debug("config_item %s: cleaning up\n",config_item_name(item));
 	if (item->ci_name != item->ci_namebuf)
 		kfree(item->ci_name);
 	item->ci_name = NULL;
+	if (t)
+		owner = t->ct_owner;
 	if (t && t->ct_item_ops && t->ct_item_ops->release)
 		t->ct_item_ops->release(item);
+	if (owner)
+		module_put(owner);
 	if (s)
 		config_group_put(s);
 	if (parent)
Index: b/fs/dlm/config.c
===================================================================
--- a/fs/dlm/config.c	2008-06-13 01:02:35.000000000 +0200
+++ b/fs/dlm/config.c	2008-06-13 01:21:33.000000000 +0200
@@ -408,6 +408,9 @@ static struct config_group *make_cluster
 	if (!cl || !gps || !sps || !cms)
 		goto fail;
 
+	__module_get(cluster_type.ct_owner);
+	__module_get(spaces_type.ct_owner);
+	__module_get(comms_type.ct_owner);
 	config_group_init_type_name(&cl->group, name, &cluster_type);
 	config_group_init_type_name(&sps->ss_group, "spaces", &spaces_type);
 	config_group_init_type_name(&cms->cs_group, "comms", &comms_type);
@@ -479,6 +482,8 @@ static struct config_group *make_space(s
 	if (!sp || !gps || !nds)
 		goto fail;
 
+	__module_get(space_type.ct_owner);
+	__module_get(nodes_type.ct_owner);
 	config_group_init_type_name(&sp->group, name, &space_type);
 	config_group_init_type_name(&nds->ns_group, "nodes", &nodes_type);
 
@@ -530,6 +535,7 @@ static struct config_item *make_comm(str
 	if (!cm)
 		return NULL;
 
+	__module_get(comm_type.ct_owner);
 	config_item_init_type_name(&cm->item, name, &comm_type);
 	cm->nodeid = -1;
 	cm->local = 0;
@@ -563,6 +569,7 @@ static struct config_item *make_node(str
 	if (!nd)
 		return NULL;
 
+	__module_get(node_type.ct_owner);
 	config_item_init_type_name(&nd->item, name, &node_type);
 	nd->nodeid = -1;
 	nd->weight = 1;  /* default weight of 1 if none is set */
Index: b/fs/ocfs2/cluster/heartbeat.c
===================================================================
--- a/fs/ocfs2/cluster/heartbeat.c	2008-06-13 01:23:34.000000000 +0200
+++ b/fs/ocfs2/cluster/heartbeat.c	2008-06-13 01:24:19.000000000 +0200
@@ -1499,6 +1499,7 @@ static struct config_item *o2hb_heartbea
 	if (reg == NULL)
 		goto out; /* ENOMEM */
 
+	__module_get(o2hb_region_type.ct_owner);
 	config_item_init_type_name(&reg->hr_item, name, &o2hb_region_type);
 
 	ret = &reg->hr_item;
Index: b/fs/ocfs2/cluster/nodemanager.c
===================================================================
--- a/fs/ocfs2/cluster/nodemanager.c	2008-06-13 01:09:29.000000000 +0200
+++ b/fs/ocfs2/cluster/nodemanager.c	2008-06-13 01:23:05.000000000 +0200
@@ -718,6 +718,7 @@ static struct config_item *o2nm_node_gro
 		goto out; /* ENOMEM */
 
 	strcpy(node->nd_name, name); /* use item.ci_namebuf instead? */
+	__module_get(o2nm_node_type.ct_owner);
 	config_item_init_type_name(&node->nd_item, name, &o2nm_node_type);
 	spin_lock_init(&node->nd_lock);
 
@@ -831,6 +832,10 @@ static struct config_group *o2nm_cluster
 	if (cluster == NULL || ns == NULL || o2hb_group == NULL || defs == NULL)
 		goto out;
 
+	if (!try_module_get(o2hb_group->cg_item.ci_type->ct_owner))
+		goto out;
+	__modult_get(o2nm_cluster_type.ct_owner);
+	__modult_get(o2nm_node_group_type.ct_owner);
 	config_group_init_type_name(&cluster->cl_group, name,
 				    &o2nm_cluster_type);
 	config_group_init_type_name(&ns->ns_group, "node",

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-12 23:54 ` Louis Rilling
@ 2008-06-13  3:33   ` Joel Becker
  -1 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-13  3:33 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote:
> I'm a bit confused by configfs module reference counting, and I feel
> that it does not really protect against module unloading. If my feeling
> is correct, I'd suggest to change module reference counting rules in
> configfs, for instance like in the attached patch. If my feeling is
> wrong, could someone shed some light?

	You're wrong, sort of :-)  I worked quite hard on this, so I was
scared you'd found something - you haven't.
	configfs is only responsible for its *own* references on the
client module.  The client module is responsible for protecting itself.
	What do I mean?  In mkdir(), the problem is racing
configfs_unregister_subsystem().  The group *has* to be live, because we
have i_mutex - unregister_subsystem can't tear down the directory until
we release it.  So we're safe to call ->make_item/group().  After we've
done that, we have a type, and we can try_module_get().  If someone else
is in unregister_subsystem, that fails and we clean up.  If not, we have
a reference and they can't unload.
	This is hard logic, and not something we want each and every
client module to have to figure out.  Your change has them explicitly
__module_get() in ->make_item/group(), which isn't safe because of this
race!
	What about attributes?  They can only be accessed via the
filesystem exposure.  If they are in the filesystem, unregister can't
happen.  If they are opened, the module refcount is incremented.
	Your big concern came from rmdir().  Specifically, while it was
safe to allocate an object during ->make_item/group(), what happens
after ->drop_item() is called and then module_put()?  If the item has a
reference count still, the ->release() is not called.  We may be
dropping our last reference on the module, and now the module can be
unloaded.  This is the module's problem, not ours!  configfs no longer
has a reference to the module, and thus cannot control its lifetime.
	Anyone who has just the single reference is safe, because that's
the last reference.  When we call the last config_item_put(), the
release happens.  That's the simple case.
	A module that takes an additional reference to the config item
needs to have this protection in place.  All in-kernel users take and
release items in one function call.  They don't hold long-term
references.  If they did, they'd have to have a way of ensuring their
structure remained alive - and this would be the case if configfs wasn't
even involved.

Joel

-- 

Life's Little Instruction Book #232

	"Keep your promises."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-13  3:33   ` Joel Becker
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-13  3:33 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote:
> I'm a bit confused by configfs module reference counting, and I feel
> that it does not really protect against module unloading. If my feeling
> is correct, I'd suggest to change module reference counting rules in
> configfs, for instance like in the attached patch. If my feeling is
> wrong, could someone shed some light?

	You're wrong, sort of :-)  I worked quite hard on this, so I was
scared you'd found something - you haven't.
	configfs is only responsible for its *own* references on the
client module.  The client module is responsible for protecting itself.
	What do I mean?  In mkdir(), the problem is racing
configfs_unregister_subsystem().  The group *has* to be live, because we
have i_mutex - unregister_subsystem can't tear down the directory until
we release it.  So we're safe to call ->make_item/group().  After we've
done that, we have a type, and we can try_module_get().  If someone else
is in unregister_subsystem, that fails and we clean up.  If not, we have
a reference and they can't unload.
	This is hard logic, and not something we want each and every
client module to have to figure out.  Your change has them explicitly
__module_get() in ->make_item/group(), which isn't safe because of this
race!
	What about attributes?  They can only be accessed via the
filesystem exposure.  If they are in the filesystem, unregister can't
happen.  If they are opened, the module refcount is incremented.
	Your big concern came from rmdir().  Specifically, while it was
safe to allocate an object during ->make_item/group(), what happens
after ->drop_item() is called and then module_put()?  If the item has a
reference count still, the ->release() is not called.  We may be
dropping our last reference on the module, and now the module can be
unloaded.  This is the module's problem, not ours!  configfs no longer
has a reference to the module, and thus cannot control its lifetime.
	Anyone who has just the single reference is safe, because that's
the last reference.  When we call the last config_item_put(), the
release happens.  That's the simple case.
	A module that takes an additional reference to the config item
needs to have this protection in place.  All in-kernel users take and
release items in one function call.  They don't hold long-term
references.  If they did, they'd have to have a way of ensuring their
structure remained alive - and this would be the case if configfs wasn't
even involved.

Joel

-- 

Life's Little Instruction Book #232

	"Keep your promises."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-13  3:33   ` Joel Becker
@ 2008-06-13  9:51     ` Louis Rilling
  -1 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-13  9:51 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote:
> On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote:
> > I'm a bit confused by configfs module reference counting, and I feel
> > that it does not really protect against module unloading. If my feeling
> > is correct, I'd suggest to change module reference counting rules in
> > configfs, for instance like in the attached patch. If my feeling is
> > wrong, could someone shed some light?
> 
> 	You're wrong, sort of :-)  I worked quite hard on this, so I was
> scared you'd found something - you haven't.
> 	configfs is only responsible for its *own* references on the
> client module.  The client module is responsible for protecting itself.
> 	What do I mean?  In mkdir(), the problem is racing
> configfs_unregister_subsystem().  The group *has* to be live, because we
> have i_mutex - unregister_subsystem can't tear down the directory until
> we release it.  So we're safe to call ->make_item/group().  After we've
> done that, we have a type, and we can try_module_get().  If someone else
> is in unregister_subsystem, that fails and we clean up.  If not, we have
> a reference and they can't unload.

Ok, got it. The race is between unregister_subsystem() and mkdir() at the root
of the subsystem (or one of its default groups). In deeper, user-created groups,
this would be a design bug of the subsystem if this race could occur.

> 	This is hard logic, and not something we want each and every
> client module to have to figure out.  Your change has them explicitly
> __module_get() in ->make_item/group(), which isn't safe because of this
> race!

Well, this remains hard logic for the modules. But I agree that they should not
impact the logic that deals with racing mkdir() and unregister_subsystem().

> 	What about attributes?  They can only be accessed via the
> filesystem exposure.  If they are in the filesystem, unregister can't
> happen.  If they are opened, the module refcount is incremented.
> 	Your big concern came from rmdir().  Specifically, while it was
> safe to allocate an object during ->make_item/group(), what happens
> after ->drop_item() is called and then module_put()?  If the item has a
> reference count still, the ->release() is not called.  We may be
> dropping our last reference on the module, and now the module can be
> unloaded.  This is the module's problem, not ours!  configfs no longer
> has a reference to the module, and thus cannot control its lifetime.

Sure. I was thinking that configfs helped subsystems with such module reference
counting issues, but I was wrong.

> 	Anyone who has just the single reference is safe, because that's
> the last reference.  When we call the last config_item_put(), the
> release happens.  That's the simple case.
> 	A module that takes an additional reference to the config item
> needs to have this protection in place.  All in-kernel users take and
> release items in one function call.  They don't hold long-term
> references.  If they did, they'd have to have a way of ensuring their
> structure remained alive - and this would be the case if configfs wasn't
> even involved.

Thanks for these explanations.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080613/1ebdf9ec/attachment.bin 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-13  9:51     ` Louis Rilling
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-13  9:51 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3424 bytes --]

On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote:
> On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote:
> > I'm a bit confused by configfs module reference counting, and I feel
> > that it does not really protect against module unloading. If my feeling
> > is correct, I'd suggest to change module reference counting rules in
> > configfs, for instance like in the attached patch. If my feeling is
> > wrong, could someone shed some light?
> 
> 	You're wrong, sort of :-)  I worked quite hard on this, so I was
> scared you'd found something - you haven't.
> 	configfs is only responsible for its *own* references on the
> client module.  The client module is responsible for protecting itself.
> 	What do I mean?  In mkdir(), the problem is racing
> configfs_unregister_subsystem().  The group *has* to be live, because we
> have i_mutex - unregister_subsystem can't tear down the directory until
> we release it.  So we're safe to call ->make_item/group().  After we've
> done that, we have a type, and we can try_module_get().  If someone else
> is in unregister_subsystem, that fails and we clean up.  If not, we have
> a reference and they can't unload.

Ok, got it. The race is between unregister_subsystem() and mkdir() at the root
of the subsystem (or one of its default groups). In deeper, user-created groups,
this would be a design bug of the subsystem if this race could occur.

> 	This is hard logic, and not something we want each and every
> client module to have to figure out.  Your change has them explicitly
> __module_get() in ->make_item/group(), which isn't safe because of this
> race!

Well, this remains hard logic for the modules. But I agree that they should not
impact the logic that deals with racing mkdir() and unregister_subsystem().

> 	What about attributes?  They can only be accessed via the
> filesystem exposure.  If they are in the filesystem, unregister can't
> happen.  If they are opened, the module refcount is incremented.
> 	Your big concern came from rmdir().  Specifically, while it was
> safe to allocate an object during ->make_item/group(), what happens
> after ->drop_item() is called and then module_put()?  If the item has a
> reference count still, the ->release() is not called.  We may be
> dropping our last reference on the module, and now the module can be
> unloaded.  This is the module's problem, not ours!  configfs no longer
> has a reference to the module, and thus cannot control its lifetime.

Sure. I was thinking that configfs helped subsystems with such module reference
counting issues, but I was wrong.

> 	Anyone who has just the single reference is safe, because that's
> the last reference.  When we call the last config_item_put(), the
> release happens.  That's the simple case.
> 	A module that takes an additional reference to the config item
> needs to have this protection in place.  All in-kernel users take and
> release items in one function call.  They don't hold long-term
> references.  If they did, they'd have to have a way of ensuring their
> structure remained alive - and this would be the case if configfs wasn't
> even involved.

Thanks for these explanations.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-13  9:51     ` Louis Rilling
@ 2008-06-13 20:26       ` Joel Becker
  -1 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-13 20:26 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote:
> Ok, got it. The race is between unregister_subsystem() and mkdir() at the root
> of the subsystem (or one of its default groups). In deeper, user-created groups,
> this would be a design bug of the subsystem if this race could occur.

	Exactly.

> > 	This is hard logic, and not something we want each and every
> > client module to have to figure out.  Your change has them explicitly
> > __module_get() in ->make_item/group(), which isn't safe because of this
> > race!
> 
> Well, this remains hard logic for the modules. But I agree that they should not
> impact the logic that deals with racing mkdir() and unregister_subsystem().

	It is, but most modules don't have to deal with it.  Most
modules (all in-kernel configfs users) have a single refrence on it.
When they take an additional one, it's for the duration of a function
call.  They have to make sure that it's safe to call the function in the
first place, so it's clearly safe to get/put the item.
	Forget about the configfs view that is presented to userspace.
If you were a module with inter-linked structures, you'd have to make
sure they were freed cleanly.  For simple things, you create and drop
them.  If a module has a complex interlinkage, they have a mechanism to
handle it.
	Example: filesystems can hang whatever they want off of VFS
structures like inodes and superblocks - a mounted filesystem prevents
rmmod.  Everything is safe, *everything*, as long as it is all cleaned
up when put_super() returns.
	So for the simple case, we provide plenty of protection.  If
someone wants to do something fancier, they have to provide their own
protection, but they would anyway.  And we can't know their complex
lifecycle, so we can't really help anyway.

> Sure. I was thinking that configfs helped subsystems with such module reference
> counting issues, but I was wrong.

	We only help with the ones we have enough information to help
with.

> Thanks for these explanations.

	No problem.  The more questions you ask, the more I refresh my
memory.  And as you've seen on other points, we sometimes find bugs :-)

Joel


-- 

"The trouble with being punctual is that nobody's there to
 appreciate it."
	- Franklin P. Jones

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-13 20:26       ` Joel Becker
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-13 20:26 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote:
> Ok, got it. The race is between unregister_subsystem() and mkdir() at the root
> of the subsystem (or one of its default groups). In deeper, user-created groups,
> this would be a design bug of the subsystem if this race could occur.

	Exactly.

> > 	This is hard logic, and not something we want each and every
> > client module to have to figure out.  Your change has them explicitly
> > __module_get() in ->make_item/group(), which isn't safe because of this
> > race!
> 
> Well, this remains hard logic for the modules. But I agree that they should not
> impact the logic that deals with racing mkdir() and unregister_subsystem().

	It is, but most modules don't have to deal with it.  Most
modules (all in-kernel configfs users) have a single refrence on it.
When they take an additional one, it's for the duration of a function
call.  They have to make sure that it's safe to call the function in the
first place, so it's clearly safe to get/put the item.
	Forget about the configfs view that is presented to userspace.
If you were a module with inter-linked structures, you'd have to make
sure they were freed cleanly.  For simple things, you create and drop
them.  If a module has a complex interlinkage, they have a mechanism to
handle it.
	Example: filesystems can hang whatever they want off of VFS
structures like inodes and superblocks - a mounted filesystem prevents
rmmod.  Everything is safe, *everything*, as long as it is all cleaned
up when put_super() returns.
	So for the simple case, we provide plenty of protection.  If
someone wants to do something fancier, they have to provide their own
protection, but they would anyway.  And we can't know their complex
lifecycle, so we can't really help anyway.

> Sure. I was thinking that configfs helped subsystems with such module reference
> counting issues, but I was wrong.

	We only help with the ones we have enough information to help
with.

> Thanks for these explanations.

	No problem.  The more questions you ask, the more I refresh my
memory.  And as you've seen on other points, we sometimes find bugs :-)

Joel


-- 

"The trouble with being punctual is that nobody's there to
 appreciate it."
	- Franklin P. Jones

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-13 20:26       ` Joel Becker
@ 2008-06-13 22:27         ` Louis Rilling
  -1 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-13 22:27 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 13, 2008 at 01:26:05PM -0700, Joel Becker wrote:
> On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote:
> > On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote:
> > Ok, got it. The race is between unregister_subsystem() and mkdir() at the root
> > of the subsystem (or one of its default groups). In deeper, user-created groups,
> > this would be a design bug of the subsystem if this race could occur.
> 
> 	Exactly.

A few more thoughts:

1/ taking the module reference is only needed if mkdir() is called under
a subsystem root or one of its default group, right? Of course this is a
bit complex to check for this condition, and it does not hurt to take
module references in all cases of mkdir().

2/ this module reference counting makes unregister_subsystem() win
against mkdir(), but only if unregister_subsystem() is called in
module_cleanup(), because otherwise try_module_get() would succeed,
right? If so, this means that after having called register_subsystem(),
a module_init() cannot cleanly fail. Perhaps this should be documented
in that case.

3/ to make unregister_subsystem() win against mkdir(), mkdir() should
try_module_get() on the subsystem's owner, not on the new item owner (as
is done by the current code), right? Maybe there is a bug here...

> 
> > > 	This is hard logic, and not something we want each and every
> > > client module to have to figure out.  Your change has them explicitly
> > > __module_get() in ->make_item/group(), which isn't safe because of this
> > > race!
> > 
> > Well, this remains hard logic for the modules. But I agree that they should not
> > impact the logic that deals with racing mkdir() and unregister_subsystem().
> 
> 	It is, but most modules don't have to deal with it.  Most
> modules (all in-kernel configfs users) have a single refrence on it.
> When they take an additional one, it's for the duration of a function
> call.  They have to make sure that it's safe to call the function in the
> first place, so it's clearly safe to get/put the item.
> 	Forget about the configfs view that is presented to userspace.
> If you were a module with inter-linked structures, you'd have to make
> sure they were freed cleanly.  For simple things, you create and drop
> them.  If a module has a complex interlinkage, they have a mechanism to
> handle it.
> 	Example: filesystems can hang whatever they want off of VFS
> structures like inodes and superblocks - a mounted filesystem prevents
> rmmod.  Everything is safe, *everything*, as long as it is all cleaned
> up when put_super() returns.
> 	So for the simple case, we provide plenty of protection.  If
> someone wants to do something fancier, they have to provide their own
> protection, but they would anyway.  And we can't know their complex
> lifecycle, so we can't really help anyway.

Actually, I'm developing a framework based on configfs, with which many
modules can be linked together through mkdir() and symlink() operations.
So I'm already managing such reference holding, but the fact that
configfs does not hold a reference until the last config_item_put()
imposes limitations (with which I can live though) to the framework:
plugins of the framework provide custom config_item_types, but cannot define
their own config_item_operations (especially release()), because
release() must be called with a reference held on the module, and once
release() is called somebody (not the module itself) has to drop this reference.
	For instance, A is a group of the framework, and mkdir A/B creates an
object implemented by module "mod_b". Before calling mod_b's
constructor, A's make_item() has to grab a reference on mod_b, and fail
if not successful. Then this reference cannot be dropped before B's last
reference is dropped, which can happen a long time after rmdir A/B. So
A's drop_item() cannot drop mod_b's reference, and A has to provide
the release() config_item_operation for B, which will call B's
destructor and finally drop mod_b's reference.
	So I'd hoped that configfs could help me with this reference
counting, for instance by keeping a reference until last
config_item_cleanup(). But I can live without it, so don't care.

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-13 22:27         ` Louis Rilling
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-13 22:27 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Fri, Jun 13, 2008 at 01:26:05PM -0700, Joel Becker wrote:
> On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote:
> > On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote:
> > Ok, got it. The race is between unregister_subsystem() and mkdir() at the root
> > of the subsystem (or one of its default groups). In deeper, user-created groups,
> > this would be a design bug of the subsystem if this race could occur.
> 
> 	Exactly.

A few more thoughts:

1/ taking the module reference is only needed if mkdir() is called under
a subsystem root or one of its default group, right? Of course this is a
bit complex to check for this condition, and it does not hurt to take
module references in all cases of mkdir().

2/ this module reference counting makes unregister_subsystem() win
against mkdir(), but only if unregister_subsystem() is called in
module_cleanup(), because otherwise try_module_get() would succeed,
right? If so, this means that after having called register_subsystem(),
a module_init() cannot cleanly fail. Perhaps this should be documented
in that case.

3/ to make unregister_subsystem() win against mkdir(), mkdir() should
try_module_get() on the subsystem's owner, not on the new item owner (as
is done by the current code), right? Maybe there is a bug here...

> 
> > > 	This is hard logic, and not something we want each and every
> > > client module to have to figure out.  Your change has them explicitly
> > > __module_get() in ->make_item/group(), which isn't safe because of this
> > > race!
> > 
> > Well, this remains hard logic for the modules. But I agree that they should not
> > impact the logic that deals with racing mkdir() and unregister_subsystem().
> 
> 	It is, but most modules don't have to deal with it.  Most
> modules (all in-kernel configfs users) have a single refrence on it.
> When they take an additional one, it's for the duration of a function
> call.  They have to make sure that it's safe to call the function in the
> first place, so it's clearly safe to get/put the item.
> 	Forget about the configfs view that is presented to userspace.
> If you were a module with inter-linked structures, you'd have to make
> sure they were freed cleanly.  For simple things, you create and drop
> them.  If a module has a complex interlinkage, they have a mechanism to
> handle it.
> 	Example: filesystems can hang whatever they want off of VFS
> structures like inodes and superblocks - a mounted filesystem prevents
> rmmod.  Everything is safe, *everything*, as long as it is all cleaned
> up when put_super() returns.
> 	So for the simple case, we provide plenty of protection.  If
> someone wants to do something fancier, they have to provide their own
> protection, but they would anyway.  And we can't know their complex
> lifecycle, so we can't really help anyway.

Actually, I'm developing a framework based on configfs, with which many
modules can be linked together through mkdir() and symlink() operations.
So I'm already managing such reference holding, but the fact that
configfs does not hold a reference until the last config_item_put()
imposes limitations (with which I can live though) to the framework:
plugins of the framework provide custom config_item_types, but cannot define
their own config_item_operations (especially release()), because
release() must be called with a reference held on the module, and once
release() is called somebody (not the module itself) has to drop this reference.
	For instance, A is a group of the framework, and mkdir A/B creates an
object implemented by module "mod_b". Before calling mod_b's
constructor, A's make_item() has to grab a reference on mod_b, and fail
if not successful. Then this reference cannot be dropped before B's last
reference is dropped, which can happen a long time after rmdir A/B. So
A's drop_item() cannot drop mod_b's reference, and A has to provide
the release() config_item_operation for B, which will call B's
destructor and finally drop mod_b's reference.
	So I'd hoped that configfs could help me with this reference
counting, for instance by keeping a reference until last
config_item_cleanup(). But I can live without it, so don't care.

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-13 22:27         ` Louis Rilling
@ 2008-06-14  8:47           ` Joel Becker
  -1 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-14  8:47 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote:
> 1/ taking the module reference is only needed if mkdir() is called under
> a subsystem root or one of its default group, right? Of course this is a
> bit complex to check for this condition, and it does not hurt to take
> module references in all cases of mkdir().

	Correct, but it's much cleaner to always take the module ref.

> 2/ this module reference counting makes unregister_subsystem() win
> against mkdir(), but only if unregister_subsystem() is called in
> module_cleanup(), because otherwise try_module_get() would succeed,
> right? If so, this means that after having called register_subsystem(),
> a module_init() cannot cleanly fail. Perhaps this should be documented
> in that case.

	Correct again, mkdir can race a failing module_init().  This is
the same as register_filesystem().  A module that needed to protect
against this could have a mutex they release right before module_init()
succeeds.  They'd check it in make_item().  But most everyone can safely
make configfs_register_subsystem() the last thing in their
module_init().

> 3/ to make unregister_subsystem() win against mkdir(), mkdir() should
> try_module_get() on the subsystem's owner, not on the new item owner (as
> is done by the current code), right? Maybe there is a bug here...

	Nope.  You can build a subsystem out of multiple modules if you
like.  The module that owns the newly created object needs to be pinned,
and that's new_item->type->owner.  If a subsystem lives within one
module, then subsys->type->owner == new_item->type->owner, and it
doesn't matter.

> > 	So for the simple case, we provide plenty of protection.  If
> > someone wants to do something fancier, they have to provide their own
> > protection, but they would anyway.  And we can't know their complex
> > lifecycle, so we can't really help anyway.
> 
> Actually, I'm developing a framework based on configfs, with which many
> modules can be linked together through mkdir() and symlink() operations.
> So I'm already managing such reference holding, but the fact that
> configfs does not hold a reference until the last config_item_put()
> imposes limitations (with which I can live though) to the framework:

	Think about it this way: the try_module_get() isn't to protect
the client module, it is to protect configfs.  It makes sure that if
someone calls a VFS operation on a configfs inode, configfs can follow
the config_*_operations safely.  Once a config_item is removed from the
filesystem view, configfs is done with it.
	Look at it the other way around.  A config_item is not a
structure owned by configfs.  It is a part of the larger object, which
is owned by the module that created it.  The config_item portion just
lets configfs display it to userspace.

> 	For instance, A is a group of the framework, and mkdir A/B creates an
> object implemented by module "mod_b". Before calling mod_b's
> constructor, A's make_item() has to grab a reference on mod_b, and fail
> if not successful. Then this reference cannot be dropped before B's last
> reference is dropped, which can happen a long time after rmdir A/B. So
> A's drop_item() cannot drop mod_b's reference, and A has to provide
> the release() config_item_operation for B, which will call B's
> destructor and finally drop mod_b's reference.

	Wow, so A->make_item() does something like:

    submod = lookup_which_mod();
    if (!strcmp(submod, "mod_a"))
        new_thing = mod_a->alloc();
    else if (!strcmp(submod, "mod_b"))
        new_thing = mod_b->alloc();

    return &new_thing->config_item;

That's pretty complicated, I agree.  But certainly doable.
	Why can't mod_b provide a ->release() that does
module_put(self)?  Or are you trying to hide that detail from the person
who is implementing mod_b?  Even better, use the chained release scheme
that is used by bio_endio().  Have mod_b control its own
config_item_operations.  In make_item, copy off the type and operations,
then put in your own.  In drop, put them back.

struct my_item_type {
	struct config_item_type *original_type;
	struct config_item_type type;
	struct config_item_operations ops;
};

make_item()
{
	mod_b = alloc_mod_b();
	my_type = kzalloc(struct my_item_type);
	my_type->original_type = mod_b->item->ci_type;
	my_type->ops = *my_type->original_type->ct_item_ops;
	my_type->ops.release = my_item_release();
	my_type->type = *my_type->original_type;
	my_type->type.ops = &my_type->ops;
	mod_b->item->ci_type = &my_type->type;

	return &mod_b->item;
}

my_item_release(struct config_item *item)
{
	my_type = to_my_type(item->type);
	item->ci_type = my_type->original_type;
	kfree(my_type);
	item->ci_type->ct_ops->release(item);
}

Joel

-- 

"Ninety feet between bases is perhaps as close as man has ever come
 to perfection."
	- Red Smith

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-14  8:47           ` Joel Becker
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-14  8:47 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote:
> 1/ taking the module reference is only needed if mkdir() is called under
> a subsystem root or one of its default group, right? Of course this is a
> bit complex to check for this condition, and it does not hurt to take
> module references in all cases of mkdir().

	Correct, but it's much cleaner to always take the module ref.

> 2/ this module reference counting makes unregister_subsystem() win
> against mkdir(), but only if unregister_subsystem() is called in
> module_cleanup(), because otherwise try_module_get() would succeed,
> right? If so, this means that after having called register_subsystem(),
> a module_init() cannot cleanly fail. Perhaps this should be documented
> in that case.

	Correct again, mkdir can race a failing module_init().  This is
the same as register_filesystem().  A module that needed to protect
against this could have a mutex they release right before module_init()
succeeds.  They'd check it in make_item().  But most everyone can safely
make configfs_register_subsystem() the last thing in their
module_init().

> 3/ to make unregister_subsystem() win against mkdir(), mkdir() should
> try_module_get() on the subsystem's owner, not on the new item owner (as
> is done by the current code), right? Maybe there is a bug here...

	Nope.  You can build a subsystem out of multiple modules if you
like.  The module that owns the newly created object needs to be pinned,
and that's new_item->type->owner.  If a subsystem lives within one
module, then subsys->type->owner == new_item->type->owner, and it
doesn't matter.

> > 	So for the simple case, we provide plenty of protection.  If
> > someone wants to do something fancier, they have to provide their own
> > protection, but they would anyway.  And we can't know their complex
> > lifecycle, so we can't really help anyway.
> 
> Actually, I'm developing a framework based on configfs, with which many
> modules can be linked together through mkdir() and symlink() operations.
> So I'm already managing such reference holding, but the fact that
> configfs does not hold a reference until the last config_item_put()
> imposes limitations (with which I can live though) to the framework:

	Think about it this way: the try_module_get() isn't to protect
the client module, it is to protect configfs.  It makes sure that if
someone calls a VFS operation on a configfs inode, configfs can follow
the config_*_operations safely.  Once a config_item is removed from the
filesystem view, configfs is done with it.
	Look at it the other way around.  A config_item is not a
structure owned by configfs.  It is a part of the larger object, which
is owned by the module that created it.  The config_item portion just
lets configfs display it to userspace.

> 	For instance, A is a group of the framework, and mkdir A/B creates an
> object implemented by module "mod_b". Before calling mod_b's
> constructor, A's make_item() has to grab a reference on mod_b, and fail
> if not successful. Then this reference cannot be dropped before B's last
> reference is dropped, which can happen a long time after rmdir A/B. So
> A's drop_item() cannot drop mod_b's reference, and A has to provide
> the release() config_item_operation for B, which will call B's
> destructor and finally drop mod_b's reference.

	Wow, so A->make_item() does something like:

    submod = lookup_which_mod();
    if (!strcmp(submod, "mod_a"))
        new_thing = mod_a->alloc();
    else if (!strcmp(submod, "mod_b"))
        new_thing = mod_b->alloc();

    return &new_thing->config_item;

That's pretty complicated, I agree.  But certainly doable.
	Why can't mod_b provide a ->release() that does
module_put(self)?  Or are you trying to hide that detail from the person
who is implementing mod_b?  Even better, use the chained release scheme
that is used by bio_endio().  Have mod_b control its own
config_item_operations.  In make_item, copy off the type and operations,
then put in your own.  In drop, put them back.

struct my_item_type {
	struct config_item_type *original_type;
	struct config_item_type type;
	struct config_item_operations ops;
};

make_item()
{
	mod_b = alloc_mod_b();
	my_type = kzalloc(struct my_item_type);
	my_type->original_type = mod_b->item->ci_type;
	my_type->ops = *my_type->original_type->ct_item_ops;
	my_type->ops.release = my_item_release();
	my_type->type = *my_type->original_type;
	my_type->type.ops = &my_type->ops;
	mod_b->item->ci_type = &my_type->type;

	return &mod_b->item;
}

my_item_release(struct config_item *item)
{
	my_type = to_my_type(item->type);
	item->ci_type = my_type->original_type;
	kfree(my_type);
	item->ci_type->ct_ops->release(item);
}

Joel

-- 

"Ninety feet between bases is perhaps as close as man has ever come
 to perfection."
	- Red Smith

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-14  8:47           ` Joel Becker
@ 2008-06-16 12:39             ` Louis Rilling
  -1 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-16 12:39 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote:
> On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote:
> > 1/ taking the module reference is only needed if mkdir() is called under
> > a subsystem root or one of its default group, right? Of course this is a
> > bit complex to check for this condition, and it does not hurt to take
> > module references in all cases of mkdir().
> 
> 	Correct, but it's much cleaner to always take the module ref.
> 
> > 2/ this module reference counting makes unregister_subsystem() win
> > against mkdir(), but only if unregister_subsystem() is called in
> > module_cleanup(), because otherwise try_module_get() would succeed,
> > right? If so, this means that after having called register_subsystem(),
> > a module_init() cannot cleanly fail. Perhaps this should be documented
> > in that case.
> 
> 	Correct again, mkdir can race a failing module_init().  This is
> the same as register_filesystem().  A module that needed to protect
> against this could have a mutex they release right before module_init()
> succeeds.  They'd check it in make_item().  But most everyone can safely
> make configfs_register_subsystem() the last thing in their
> module_init().

Agreed. I have examples of similar issues (but not configfs related) where
a single module module_init() failing needs several cleanup that can only
safely be done in module_cleanup(), but I cannot claim that this is general
case :)

> 
> > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should
> > try_module_get() on the subsystem's owner, not on the new item owner (as
> > is done by the current code), right? Maybe there is a bug here...
> 
> 	Nope.  You can build a subsystem out of multiple modules if you
> like.  The module that owns the newly created object needs to be pinned,
> and that's new_item->type->owner.  If a subsystem lives within one
> module, then subsys->type->owner == new_item->type->owner, and it
> doesn't matter.

In the "several modules" case, the module is pinned too late to protect against
module unloading. This does probably not hurt configfs since the only
problematic call is config_item_cleanup(), where the new item's release()
operation is called. For such subsystems, the only way to protect against module
unloading is to grab a reference on the new item's module in the make_item()
operation, and find a way to ensure that the reference is dropped after the last
config_item_put().
	IMHO, what really hurts configfs is that the unregister_subsystem() vs
mkdir() race is not solved unless mkdir() tries to grab a reference on the
subsystem's module. And the current code of mkdir() does not ensure that in the
"several modules" case.

> 
> > > 	So for the simple case, we provide plenty of protection.  If
> > > someone wants to do something fancier, they have to provide their own
> > > protection, but they would anyway.  And we can't know their complex
> > > lifecycle, so we can't really help anyway.
> > 
> > Actually, I'm developing a framework based on configfs, with which many
> > modules can be linked together through mkdir() and symlink() operations.
> > So I'm already managing such reference holding, but the fact that
> > configfs does not hold a reference until the last config_item_put()
> > imposes limitations (with which I can live though) to the framework:
> 
> 	Think about it this way: the try_module_get() isn't to protect
> the client module, it is to protect configfs.  It makes sure that if
> someone calls a VFS operation on a configfs inode, configfs can follow
> the config_*_operations safely.  Once a config_item is removed from the
> filesystem view, configfs is done with it.
> 	Look at it the other way around.  A config_item is not a
> structure owned by configfs.  It is a part of the larger object, which
> is owned by the module that created it.  The config_item portion just
> lets configfs display it to userspace.

Yes, this is what I realized in your previous email.

> 
> > 	For instance, A is a group of the framework, and mkdir A/B creates an
> > object implemented by module "mod_b". Before calling mod_b's
> > constructor, A's make_item() has to grab a reference on mod_b, and fail
> > if not successful. Then this reference cannot be dropped before B's last
> > reference is dropped, which can happen a long time after rmdir A/B. So
> > A's drop_item() cannot drop mod_b's reference, and A has to provide
> > the release() config_item_operation for B, which will call B's
> > destructor and finally drop mod_b's reference.
> 
> 	Wow, so A->make_item() does something like:
> 
>     submod = lookup_which_mod();
>     if (!strcmp(submod, "mod_a"))
>         new_thing = mod_a->alloc();
>     else if (!strcmp(submod, "mod_b"))
>         new_thing = mod_b->alloc();
> 
>     return &new_thing->config_item;
> 
> That's pretty complicated, I agree.  But certainly doable.

I do something like this (and this works):

struct mod_type {
	struct list_head type_list;
	struct config_item_type item_type;
	const char *name;
	struct config_item *new_item(const char *name);
	void destroy_item(struct config_item *item);
};

void my_release(struct config_item *item)
{
	struct mod_type *type = container_of(item->ci_type, struct mod_type,
					item_type);
	type->destroy_item(item);
	module_put(type->item_type.ct_owner);
}

struct configfs_item_operations my_item_ops = {
	.release = my_release,
};

void register(struct mod_type *mod_type)
{
	mod_type->item_type.ct_item_ops = &my_item_ops;
	spin_lock(&type_list);
	list_add(&mod_type->type_list, &mod_type_head);
	spin_unlock(&type_list);
}

/* Must only be called inside module_cleanup() */
void unregister(struct mod_type *mod_type)
{
	spin_lock(&type_list);
	list_del(&mod_type->type_list);
	spin_unlock(&type_list);
}

struct mod_type *lookup(const char *name)
{
	/* return mod_type having mod_type->name == name in the list */ 
}

make_item(struct config_group *group, const char *name)
{
	spin_lock(&mod_type_list);
	mod_type = lookup(name);
	if (mod_type)
		if (!try_module_get(mod_type->item_type.ct_owner))
			mod_type = NULL;
	spin_unlock(&mod_type_list);

	new_item = NULL;
	if (mod_type) {
		new_item = mod_type->new_item(name);
		if (!new_item)
			module_put(mod_type->item_type.ct_owner);
	}
	return new_item;
}

drop_item(struct config_group *group, struct config_item *item)
{
	config_item_put(item);
}

------

mod_a_init()
	register(mod_type_a);

mod_a_cleanup()
	unregister(mod_type_a);

> 	Why can't mod_b provide a ->release() that does
> module_put(self)?

Because this is simply wrong. Doing module_put(self) exposes the modules's
function to be run while another cpu unloads the module. Note how I solve this
by doing try_module_get() while still having mod_type_list locked, and doing
module_put() only after having destroyed the module's item. This lock
actually protects item creation against concurrent removal of the module
implementing that item.

> Or are you trying to hide that detail from the person
> who is implementing mod_b?  Even better, use the chained release scheme
> that is used by bio_endio().  Have mod_b control its own
> config_item_operations.  In make_item, copy off the type and operations,
> then put in your own.  In drop, put them back.
> 
> struct my_item_type {
> 	struct config_item_type *original_type;
> 	struct config_item_type type;
> 	struct config_item_operations ops;
> };
> 
> make_item()
> {
> 	mod_b = alloc_mod_b();
> 	my_type = kzalloc(struct my_item_type);
> 	my_type->original_type = mod_b->item->ci_type;
> 	my_type->ops = *my_type->original_type->ct_item_ops;
> 	my_type->ops.release = my_item_release();
> 	my_type->type = *my_type->original_type;
> 	my_type->type.ops = &my_type->ops;
> 	mod_b->item->ci_type = &my_type->type;
> 
> 	return &mod_b->item;
> }
> 
> my_item_release(struct config_item *item)
> {
> 	my_type = to_my_type(item->type);
> 	item->ci_type = my_type->original_type;
> 	kfree(my_type);
> 	item->ci_type->ct_ops->release(item);
> }

I already do something like this, replacing the item_operations instead of the
whole item_type. And I find it ugly. Only a matter of taste, I agree.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080616/7588f9f1/attachment-0001.bin 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-16 12:39             ` Louis Rilling
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-16 12:39 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 8556 bytes --]

On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote:
> On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote:
> > 1/ taking the module reference is only needed if mkdir() is called under
> > a subsystem root or one of its default group, right? Of course this is a
> > bit complex to check for this condition, and it does not hurt to take
> > module references in all cases of mkdir().
> 
> 	Correct, but it's much cleaner to always take the module ref.
> 
> > 2/ this module reference counting makes unregister_subsystem() win
> > against mkdir(), but only if unregister_subsystem() is called in
> > module_cleanup(), because otherwise try_module_get() would succeed,
> > right? If so, this means that after having called register_subsystem(),
> > a module_init() cannot cleanly fail. Perhaps this should be documented
> > in that case.
> 
> 	Correct again, mkdir can race a failing module_init().  This is
> the same as register_filesystem().  A module that needed to protect
> against this could have a mutex they release right before module_init()
> succeeds.  They'd check it in make_item().  But most everyone can safely
> make configfs_register_subsystem() the last thing in their
> module_init().

Agreed. I have examples of similar issues (but not configfs related) where
a single module module_init() failing needs several cleanup that can only
safely be done in module_cleanup(), but I cannot claim that this is general
case :)

> 
> > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should
> > try_module_get() on the subsystem's owner, not on the new item owner (as
> > is done by the current code), right? Maybe there is a bug here...
> 
> 	Nope.  You can build a subsystem out of multiple modules if you
> like.  The module that owns the newly created object needs to be pinned,
> and that's new_item->type->owner.  If a subsystem lives within one
> module, then subsys->type->owner == new_item->type->owner, and it
> doesn't matter.

In the "several modules" case, the module is pinned too late to protect against
module unloading. This does probably not hurt configfs since the only
problematic call is config_item_cleanup(), where the new item's release()
operation is called. For such subsystems, the only way to protect against module
unloading is to grab a reference on the new item's module in the make_item()
operation, and find a way to ensure that the reference is dropped after the last
config_item_put().
	IMHO, what really hurts configfs is that the unregister_subsystem() vs
mkdir() race is not solved unless mkdir() tries to grab a reference on the
subsystem's module. And the current code of mkdir() does not ensure that in the
"several modules" case.

> 
> > > 	So for the simple case, we provide plenty of protection.  If
> > > someone wants to do something fancier, they have to provide their own
> > > protection, but they would anyway.  And we can't know their complex
> > > lifecycle, so we can't really help anyway.
> > 
> > Actually, I'm developing a framework based on configfs, with which many
> > modules can be linked together through mkdir() and symlink() operations.
> > So I'm already managing such reference holding, but the fact that
> > configfs does not hold a reference until the last config_item_put()
> > imposes limitations (with which I can live though) to the framework:
> 
> 	Think about it this way: the try_module_get() isn't to protect
> the client module, it is to protect configfs.  It makes sure that if
> someone calls a VFS operation on a configfs inode, configfs can follow
> the config_*_operations safely.  Once a config_item is removed from the
> filesystem view, configfs is done with it.
> 	Look at it the other way around.  A config_item is not a
> structure owned by configfs.  It is a part of the larger object, which
> is owned by the module that created it.  The config_item portion just
> lets configfs display it to userspace.

Yes, this is what I realized in your previous email.

> 
> > 	For instance, A is a group of the framework, and mkdir A/B creates an
> > object implemented by module "mod_b". Before calling mod_b's
> > constructor, A's make_item() has to grab a reference on mod_b, and fail
> > if not successful. Then this reference cannot be dropped before B's last
> > reference is dropped, which can happen a long time after rmdir A/B. So
> > A's drop_item() cannot drop mod_b's reference, and A has to provide
> > the release() config_item_operation for B, which will call B's
> > destructor and finally drop mod_b's reference.
> 
> 	Wow, so A->make_item() does something like:
> 
>     submod = lookup_which_mod();
>     if (!strcmp(submod, "mod_a"))
>         new_thing = mod_a->alloc();
>     else if (!strcmp(submod, "mod_b"))
>         new_thing = mod_b->alloc();
> 
>     return &new_thing->config_item;
> 
> That's pretty complicated, I agree.  But certainly doable.

I do something like this (and this works):

struct mod_type {
	struct list_head type_list;
	struct config_item_type item_type;
	const char *name;
	struct config_item *new_item(const char *name);
	void destroy_item(struct config_item *item);
};

void my_release(struct config_item *item)
{
	struct mod_type *type = container_of(item->ci_type, struct mod_type,
					item_type);
	type->destroy_item(item);
	module_put(type->item_type.ct_owner);
}

struct configfs_item_operations my_item_ops = {
	.release = my_release,
};

void register(struct mod_type *mod_type)
{
	mod_type->item_type.ct_item_ops = &my_item_ops;
	spin_lock(&type_list);
	list_add(&mod_type->type_list, &mod_type_head);
	spin_unlock(&type_list);
}

/* Must only be called inside module_cleanup() */
void unregister(struct mod_type *mod_type)
{
	spin_lock(&type_list);
	list_del(&mod_type->type_list);
	spin_unlock(&type_list);
}

struct mod_type *lookup(const char *name)
{
	/* return mod_type having mod_type->name == name in the list */ 
}

make_item(struct config_group *group, const char *name)
{
	spin_lock(&mod_type_list);
	mod_type = lookup(name);
	if (mod_type)
		if (!try_module_get(mod_type->item_type.ct_owner))
			mod_type = NULL;
	spin_unlock(&mod_type_list);

	new_item = NULL;
	if (mod_type) {
		new_item = mod_type->new_item(name);
		if (!new_item)
			module_put(mod_type->item_type.ct_owner);
	}
	return new_item;
}

drop_item(struct config_group *group, struct config_item *item)
{
	config_item_put(item);
}

------

mod_a_init()
	register(mod_type_a);

mod_a_cleanup()
	unregister(mod_type_a);

> 	Why can't mod_b provide a ->release() that does
> module_put(self)?

Because this is simply wrong. Doing module_put(self) exposes the modules's
function to be run while another cpu unloads the module. Note how I solve this
by doing try_module_get() while still having mod_type_list locked, and doing
module_put() only after having destroyed the module's item. This lock
actually protects item creation against concurrent removal of the module
implementing that item.

> Or are you trying to hide that detail from the person
> who is implementing mod_b?  Even better, use the chained release scheme
> that is used by bio_endio().  Have mod_b control its own
> config_item_operations.  In make_item, copy off the type and operations,
> then put in your own.  In drop, put them back.
> 
> struct my_item_type {
> 	struct config_item_type *original_type;
> 	struct config_item_type type;
> 	struct config_item_operations ops;
> };
> 
> make_item()
> {
> 	mod_b = alloc_mod_b();
> 	my_type = kzalloc(struct my_item_type);
> 	my_type->original_type = mod_b->item->ci_type;
> 	my_type->ops = *my_type->original_type->ct_item_ops;
> 	my_type->ops.release = my_item_release();
> 	my_type->type = *my_type->original_type;
> 	my_type->type.ops = &my_type->ops;
> 	mod_b->item->ci_type = &my_type->type;
> 
> 	return &mod_b->item;
> }
> 
> my_item_release(struct config_item *item)
> {
> 	my_type = to_my_type(item->type);
> 	item->ci_type = my_type->original_type;
> 	kfree(my_type);
> 	item->ci_type->ct_ops->release(item);
> }

I already do something like this, replacing the item_operations instead of the
whole item_type. And I find it ugly. Only a matter of taste, I agree.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-16 12:39             ` Louis Rilling
@ 2008-06-16 18:06               ` Joel Becker
  -1 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-16 18:06 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 16, 2008 at 02:39:12PM +0200, Louis Rilling wrote:
> On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote:
> 	IMHO, what really hurts configfs is that the unregister_subsystem() vs
> mkdir() race is not solved unless mkdir() tries to grab a reference on the
> subsystem's module. And the current code of mkdir() does not ensure that in the
> "several modules" case.

	Valid point.  It really does assume that the owner is always the
same.  Have to think about whether that's a big deal.

> I do something like this (and this works):

	I believe it works.  It looks fine.  I'd personally do it more
like what I displayed, wrapping release() rather than creating a
separate operations abstraction and overriding item_operations, but as
you point out that's just implementation.

> > 	Why can't mod_b provide a ->release() that does
> > module_put(self)?
> 
> Because this is simply wrong. Doing module_put(self) exposes the modules's
> function to be run while another cpu unloads the module. Note how I solve this

	How so?  As long as the module_put() is the last thing, you're
fine.  That said, we both have better solutions with our wrappered
functions.

Joel

-- 

"If you took all of the grains of sand in the world, and lined
 them up end to end in a row, you'd be working for the government!"
	- Mr. Interesting

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-16 18:06               ` Joel Becker
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-16 18:06 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 16, 2008 at 02:39:12PM +0200, Louis Rilling wrote:
> On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote:
> 	IMHO, what really hurts configfs is that the unregister_subsystem() vs
> mkdir() race is not solved unless mkdir() tries to grab a reference on the
> subsystem's module. And the current code of mkdir() does not ensure that in the
> "several modules" case.

	Valid point.  It really does assume that the owner is always the
same.  Have to think about whether that's a big deal.

> I do something like this (and this works):

	I believe it works.  It looks fine.  I'd personally do it more
like what I displayed, wrapping release() rather than creating a
separate operations abstraction and overriding item_operations, but as
you point out that's just implementation.

> > 	Why can't mod_b provide a ->release() that does
> > module_put(self)?
> 
> Because this is simply wrong. Doing module_put(self) exposes the modules's
> function to be run while another cpu unloads the module. Note how I solve this

	How so?  As long as the module_put() is the last thing, you're
fine.  That said, we both have better solutions with our wrappered
functions.

Joel

-- 

"If you took all of the grains of sand in the world, and lined
 them up end to end in a row, you'd be working for the government!"
	- Mr. Interesting

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-16 18:06               ` Joel Becker
@ 2008-06-16 18:14                 ` Louis Rilling
  -1 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-16 18:14 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 16, 2008 at 11:06:43AM -0700, Joel Becker wrote:
> > > 	Why can't mod_b provide a ->release() that does
> > > module_put(self)?
> > 
> > Because this is simply wrong. Doing module_put(self) exposes the modules's
> > function to be run while another cpu unloads the module. Note how I solve this
> 
> 	How so?  As long as the module_put() is the last thing, you're
> fine.  That said, we both have better solutions with our wrappered
> functions.

With a preemptible kernel, after module_put(self) the few assembly instructions
cleaning up the stack before returning to the caller can be called after the
memory allocated for the module has been freed. Which will make the kernel
crash.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080616/4cf790f2/attachment.bin 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-16 18:14                 ` Louis Rilling
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Rilling @ 2008-06-16 18:14 UTC (permalink / raw)
  To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 903 bytes --]

On Mon, Jun 16, 2008 at 11:06:43AM -0700, Joel Becker wrote:
> > > 	Why can't mod_b provide a ->release() that does
> > > module_put(self)?
> > 
> > Because this is simply wrong. Doing module_put(self) exposes the modules's
> > function to be run while another cpu unloads the module. Note how I solve this
> 
> 	How so?  As long as the module_put() is the last thing, you're
> fine.  That said, we both have better solutions with our wrappered
> functions.

With a preemptible kernel, after module_put(self) the few assembly instructions
cleaning up the stack before returning to the caller can be called after the
memory allocated for the module has been freed. Which will make the kernel
crash.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Ocfs2-devel] [RFC] configfs: module reference counting rules
  2008-06-16 18:14                 ` Louis Rilling
@ 2008-06-16 19:36                   ` Joel Becker
  -1 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-16 19:36 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 16, 2008 at 08:14:28PM +0200, Louis Rilling wrote:
> With a preemptible kernel, after module_put(self) the few assembly instructions
> cleaning up the stack before returning to the caller can be called after the
> memory allocated for the module has been freed. Which will make the kernel
> crash.

	Ahh, but if you are running a preemptible kernel, you've already
made your first mistake :-)

Joel

-- 

"Vote early and vote often." 
        - Al Capone

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] configfs: module reference counting rules
@ 2008-06-16 19:36                   ` Joel Becker
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Becker @ 2008-06-16 19:36 UTC (permalink / raw)
  To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel

On Mon, Jun 16, 2008 at 08:14:28PM +0200, Louis Rilling wrote:
> With a preemptible kernel, after module_put(self) the few assembly instructions
> cleaning up the stack before returning to the caller can be called after the
> memory allocated for the module has been freed. Which will make the kernel
> crash.

	Ahh, but if you are running a preemptible kernel, you've already
made your first mistake :-)

Joel

-- 

"Vote early and vote often." 
        - Al Capone

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-06-16 19:37 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-12 23:54 [Ocfs2-devel] [RFC] configfs: module reference counting rules Louis Rilling
2008-06-12 23:54 ` Louis Rilling
2008-06-13  3:33 ` [Ocfs2-devel] " Joel Becker
2008-06-13  3:33   ` Joel Becker
2008-06-13  9:51   ` [Ocfs2-devel] " Louis Rilling
2008-06-13  9:51     ` Louis Rilling
2008-06-13 20:26     ` [Ocfs2-devel] " Joel Becker
2008-06-13 20:26       ` Joel Becker
2008-06-13 22:27       ` [Ocfs2-devel] " Louis Rilling
2008-06-13 22:27         ` Louis Rilling
2008-06-14  8:47         ` [Ocfs2-devel] " Joel Becker
2008-06-14  8:47           ` Joel Becker
2008-06-16 12:39           ` [Ocfs2-devel] " Louis Rilling
2008-06-16 12:39             ` Louis Rilling
2008-06-16 18:06             ` [Ocfs2-devel] " Joel Becker
2008-06-16 18:06               ` Joel Becker
2008-06-16 18:14               ` [Ocfs2-devel] " Louis Rilling
2008-06-16 18:14                 ` Louis Rilling
2008-06-16 19:36                 ` [Ocfs2-devel] " Joel Becker
2008-06-16 19:36                   ` Joel Becker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.