* [Ocfs2-devel] [RFC] configfs: module reference counting rules @ 2008-06-12 23:54 ` Louis Rilling 0 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-12 23:54 UTC (permalink / raw) To: Joel Becker; +Cc: ocfs2-devel, linux-kernel Hi, I'm a bit confused by configfs module reference counting, and I feel that it does not really protect against module unloading. If my feeling is correct, I'd suggest to change module reference counting rules in configfs, for instance like in the attached patch. If my feeling is wrong, could someone shed some light? Thanks Louis -- Dr Louis Rilling Kerlabs - IRISA Skype: louis.rilling Campus Universitaire de Beaulieu Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France http://www.kerlabs.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: configfs-fix-module-refcount.patch Type: text/x-diff Size: 6835 bytes Desc: not available Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080613/32facd75/attachment-0001.bin ^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC] configfs: module reference counting rules @ 2008-06-12 23:54 ` Louis Rilling 0 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-12 23:54 UTC (permalink / raw) To: Joel Becker; +Cc: ocfs2-devel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 568 bytes --] Hi, I'm a bit confused by configfs module reference counting, and I feel that it does not really protect against module unloading. If my feeling is correct, I'd suggest to change module reference counting rules in configfs, for instance like in the attached patch. If my feeling is wrong, could someone shed some light? Thanks Louis -- Dr Louis Rilling Kerlabs - IRISA Skype: louis.rilling Campus Universitaire de Beaulieu Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France http://www.kerlabs.com/ [-- Attachment #2: configfs-fix-module-refcount.patch --] [-- Type: text/x-diff, Size: 6835 bytes --] configfs: make module reference counting robust Module references taken by configfs at mkdir() unfortunately do not ensure that the module implementing an item will remain loaded before the item is released, because 1/ the reference is taken after having created the item, and 2/ the reference is dropped when dropping the item, which may be before the item's last reference count is dropped. For instance this can lead to calling the release item operation when the module implementing the function is already unloaded. This patch changes module reference counting rules in configfs to ensure that for each item appearing in configfs a module reference is held until the item is released. For each item/group/default group created in make_item()/make_group(), the subsystem is responsible for grabbing a reference on the matching modules. configfs will drop that reference after having released the item/group/default group. Something similar might be needed for configfs_attribute as well. Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com> --- drivers/net/netconsole.c | 1 + fs/configfs/dir.c | 17 +++++++---------- fs/configfs/item.c | 5 +++++ fs/dlm/config.c | 7 +++++++ fs/ocfs2/cluster/heartbeat.c | 1 + fs/ocfs2/cluster/nodemanager.c | 5 +++++ 6 files changed, 26 insertions(+), 10 deletions(-) Index: b/drivers/net/netconsole.c =================================================================== --- a/drivers/net/netconsole.c 2008-06-13 01:24:55.000000000 +0200 +++ b/drivers/net/netconsole.c 2008-06-13 01:25:51.000000000 +0200 @@ -608,6 +608,7 @@ static struct config_item *make_netconso memset(nt->np.remote_mac, 0xff, ETH_ALEN); /* Initialize the config_item member */ + __module_get(netconsole_target_type.ct_owner); config_item_init_type_name(&nt->item, name, &netconsole_target_type); /* Adding, but it is disabled */ Index: b/fs/configfs/dir.c =================================================================== --- a/fs/configfs/dir.c 2008-06-13 00:41:49.000000000 +0200 +++ b/fs/configfs/dir.c 2008-06-13 01:26:13.000000000 +0200 @@ -1080,18 +1080,18 @@ static int configfs_mkdir(struct inode * goto out_unlink; } + /* + * make_item()/make_group() should have grabbed a module reference for item + * Let's check it but do not keep one module reference more. The + * reference for item will be dropped after item is released. + */ owner = type->ct_owner; if (!try_module_get(owner)) { + printk(KERN_ERR "configfs: Ill-behaved subsystem: module reference count may be broken\n"); ret = -EINVAL; goto out_unlink; } - - /* - * I hate doing it this way, but if there is - * an error, module_put() probably should - * happen after any cleanup. - */ - module_got = 1; + module_put(owner) if (group) ret = configfs_attach_group(parent_item, item, dentry); @@ -1111,9 +1111,6 @@ out_unlink: client_drop_item(parent_item, item); mutex_unlock(&subsys->su_mutex); - - if (module_got) - module_put(owner); } out_put: Index: b/fs/configfs/item.c =================================================================== --- a/fs/configfs/item.c 2008-06-13 00:57:53.000000000 +0200 +++ b/fs/configfs/item.c 2008-06-13 00:59:42.000000000 +0200 @@ -153,13 +153,18 @@ static void config_item_cleanup(struct c struct config_item_type * t = item->ci_type; struct config_group * s = item->ci_group; struct config_item * parent = item->ci_parent; + struct module *owner = NULL; pr_debug("config_item %s: cleaning up\n",config_item_name(item)); if (item->ci_name != item->ci_namebuf) kfree(item->ci_name); item->ci_name = NULL; + if (t) + owner = t->ct_owner; if (t && t->ct_item_ops && t->ct_item_ops->release) t->ct_item_ops->release(item); + if (owner) + module_put(owner); if (s) config_group_put(s); if (parent) Index: b/fs/dlm/config.c =================================================================== --- a/fs/dlm/config.c 2008-06-13 01:02:35.000000000 +0200 +++ b/fs/dlm/config.c 2008-06-13 01:21:33.000000000 +0200 @@ -408,6 +408,9 @@ static struct config_group *make_cluster if (!cl || !gps || !sps || !cms) goto fail; + __module_get(cluster_type.ct_owner); + __module_get(spaces_type.ct_owner); + __module_get(comms_type.ct_owner); config_group_init_type_name(&cl->group, name, &cluster_type); config_group_init_type_name(&sps->ss_group, "spaces", &spaces_type); config_group_init_type_name(&cms->cs_group, "comms", &comms_type); @@ -479,6 +482,8 @@ static struct config_group *make_space(s if (!sp || !gps || !nds) goto fail; + __module_get(space_type.ct_owner); + __module_get(nodes_type.ct_owner); config_group_init_type_name(&sp->group, name, &space_type); config_group_init_type_name(&nds->ns_group, "nodes", &nodes_type); @@ -530,6 +535,7 @@ static struct config_item *make_comm(str if (!cm) return NULL; + __module_get(comm_type.ct_owner); config_item_init_type_name(&cm->item, name, &comm_type); cm->nodeid = -1; cm->local = 0; @@ -563,6 +569,7 @@ static struct config_item *make_node(str if (!nd) return NULL; + __module_get(node_type.ct_owner); config_item_init_type_name(&nd->item, name, &node_type); nd->nodeid = -1; nd->weight = 1; /* default weight of 1 if none is set */ Index: b/fs/ocfs2/cluster/heartbeat.c =================================================================== --- a/fs/ocfs2/cluster/heartbeat.c 2008-06-13 01:23:34.000000000 +0200 +++ b/fs/ocfs2/cluster/heartbeat.c 2008-06-13 01:24:19.000000000 +0200 @@ -1499,6 +1499,7 @@ static struct config_item *o2hb_heartbea if (reg == NULL) goto out; /* ENOMEM */ + __module_get(o2hb_region_type.ct_owner); config_item_init_type_name(®->hr_item, name, &o2hb_region_type); ret = ®->hr_item; Index: b/fs/ocfs2/cluster/nodemanager.c =================================================================== --- a/fs/ocfs2/cluster/nodemanager.c 2008-06-13 01:09:29.000000000 +0200 +++ b/fs/ocfs2/cluster/nodemanager.c 2008-06-13 01:23:05.000000000 +0200 @@ -718,6 +718,7 @@ static struct config_item *o2nm_node_gro goto out; /* ENOMEM */ strcpy(node->nd_name, name); /* use item.ci_namebuf instead? */ + __module_get(o2nm_node_type.ct_owner); config_item_init_type_name(&node->nd_item, name, &o2nm_node_type); spin_lock_init(&node->nd_lock); @@ -831,6 +832,10 @@ static struct config_group *o2nm_cluster if (cluster == NULL || ns == NULL || o2hb_group == NULL || defs == NULL) goto out; + if (!try_module_get(o2hb_group->cg_item.ci_type->ct_owner)) + goto out; + __modult_get(o2nm_cluster_type.ct_owner); + __modult_get(o2nm_node_group_type.ct_owner); config_group_init_type_name(&cluster->cl_group, name, &o2nm_cluster_type); config_group_init_type_name(&ns->ns_group, "node", ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-12 23:54 ` Louis Rilling @ 2008-06-13 3:33 ` Joel Becker -1 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-13 3:33 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote: > I'm a bit confused by configfs module reference counting, and I feel > that it does not really protect against module unloading. If my feeling > is correct, I'd suggest to change module reference counting rules in > configfs, for instance like in the attached patch. If my feeling is > wrong, could someone shed some light? You're wrong, sort of :-) I worked quite hard on this, so I was scared you'd found something - you haven't. configfs is only responsible for its *own* references on the client module. The client module is responsible for protecting itself. What do I mean? In mkdir(), the problem is racing configfs_unregister_subsystem(). The group *has* to be live, because we have i_mutex - unregister_subsystem can't tear down the directory until we release it. So we're safe to call ->make_item/group(). After we've done that, we have a type, and we can try_module_get(). If someone else is in unregister_subsystem, that fails and we clean up. If not, we have a reference and they can't unload. This is hard logic, and not something we want each and every client module to have to figure out. Your change has them explicitly __module_get() in ->make_item/group(), which isn't safe because of this race! What about attributes? They can only be accessed via the filesystem exposure. If they are in the filesystem, unregister can't happen. If they are opened, the module refcount is incremented. Your big concern came from rmdir(). Specifically, while it was safe to allocate an object during ->make_item/group(), what happens after ->drop_item() is called and then module_put()? If the item has a reference count still, the ->release() is not called. We may be dropping our last reference on the module, and now the module can be unloaded. This is the module's problem, not ours! configfs no longer has a reference to the module, and thus cannot control its lifetime. Anyone who has just the single reference is safe, because that's the last reference. When we call the last config_item_put(), the release happens. That's the simple case. A module that takes an additional reference to the config item needs to have this protection in place. All in-kernel users take and release items in one function call. They don't hold long-term references. If they did, they'd have to have a way of ensuring their structure remained alive - and this would be the case if configfs wasn't even involved. Joel -- Life's Little Instruction Book #232 "Keep your promises." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-13 3:33 ` Joel Becker 0 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-13 3:33 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote: > I'm a bit confused by configfs module reference counting, and I feel > that it does not really protect against module unloading. If my feeling > is correct, I'd suggest to change module reference counting rules in > configfs, for instance like in the attached patch. If my feeling is > wrong, could someone shed some light? You're wrong, sort of :-) I worked quite hard on this, so I was scared you'd found something - you haven't. configfs is only responsible for its *own* references on the client module. The client module is responsible for protecting itself. What do I mean? In mkdir(), the problem is racing configfs_unregister_subsystem(). The group *has* to be live, because we have i_mutex - unregister_subsystem can't tear down the directory until we release it. So we're safe to call ->make_item/group(). After we've done that, we have a type, and we can try_module_get(). If someone else is in unregister_subsystem, that fails and we clean up. If not, we have a reference and they can't unload. This is hard logic, and not something we want each and every client module to have to figure out. Your change has them explicitly __module_get() in ->make_item/group(), which isn't safe because of this race! What about attributes? They can only be accessed via the filesystem exposure. If they are in the filesystem, unregister can't happen. If they are opened, the module refcount is incremented. Your big concern came from rmdir(). Specifically, while it was safe to allocate an object during ->make_item/group(), what happens after ->drop_item() is called and then module_put()? If the item has a reference count still, the ->release() is not called. We may be dropping our last reference on the module, and now the module can be unloaded. This is the module's problem, not ours! configfs no longer has a reference to the module, and thus cannot control its lifetime. Anyone who has just the single reference is safe, because that's the last reference. When we call the last config_item_put(), the release happens. That's the simple case. A module that takes an additional reference to the config item needs to have this protection in place. All in-kernel users take and release items in one function call. They don't hold long-term references. If they did, they'd have to have a way of ensuring their structure remained alive - and this would be the case if configfs wasn't even involved. Joel -- Life's Little Instruction Book #232 "Keep your promises." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-13 3:33 ` Joel Becker @ 2008-06-13 9:51 ` Louis Rilling -1 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-13 9:51 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote: > On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote: > > I'm a bit confused by configfs module reference counting, and I feel > > that it does not really protect against module unloading. If my feeling > > is correct, I'd suggest to change module reference counting rules in > > configfs, for instance like in the attached patch. If my feeling is > > wrong, could someone shed some light? > > You're wrong, sort of :-) I worked quite hard on this, so I was > scared you'd found something - you haven't. > configfs is only responsible for its *own* references on the > client module. The client module is responsible for protecting itself. > What do I mean? In mkdir(), the problem is racing > configfs_unregister_subsystem(). The group *has* to be live, because we > have i_mutex - unregister_subsystem can't tear down the directory until > we release it. So we're safe to call ->make_item/group(). After we've > done that, we have a type, and we can try_module_get(). If someone else > is in unregister_subsystem, that fails and we clean up. If not, we have > a reference and they can't unload. Ok, got it. The race is between unregister_subsystem() and mkdir() at the root of the subsystem (or one of its default groups). In deeper, user-created groups, this would be a design bug of the subsystem if this race could occur. > This is hard logic, and not something we want each and every > client module to have to figure out. Your change has them explicitly > __module_get() in ->make_item/group(), which isn't safe because of this > race! Well, this remains hard logic for the modules. But I agree that they should not impact the logic that deals with racing mkdir() and unregister_subsystem(). > What about attributes? They can only be accessed via the > filesystem exposure. If they are in the filesystem, unregister can't > happen. If they are opened, the module refcount is incremented. > Your big concern came from rmdir(). Specifically, while it was > safe to allocate an object during ->make_item/group(), what happens > after ->drop_item() is called and then module_put()? If the item has a > reference count still, the ->release() is not called. We may be > dropping our last reference on the module, and now the module can be > unloaded. This is the module's problem, not ours! configfs no longer > has a reference to the module, and thus cannot control its lifetime. Sure. I was thinking that configfs helped subsystems with such module reference counting issues, but I was wrong. > Anyone who has just the single reference is safe, because that's > the last reference. When we call the last config_item_put(), the > release happens. That's the simple case. > A module that takes an additional reference to the config item > needs to have this protection in place. All in-kernel users take and > release items in one function call. They don't hold long-term > references. If they did, they'd have to have a way of ensuring their > structure remained alive - and this would be the case if configfs wasn't > even involved. Thanks for these explanations. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080613/1ebdf9ec/attachment.bin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-13 9:51 ` Louis Rilling 0 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-13 9:51 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3424 bytes --] On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote: > On Fri, Jun 13, 2008 at 01:54:11AM +0200, Louis Rilling wrote: > > I'm a bit confused by configfs module reference counting, and I feel > > that it does not really protect against module unloading. If my feeling > > is correct, I'd suggest to change module reference counting rules in > > configfs, for instance like in the attached patch. If my feeling is > > wrong, could someone shed some light? > > You're wrong, sort of :-) I worked quite hard on this, so I was > scared you'd found something - you haven't. > configfs is only responsible for its *own* references on the > client module. The client module is responsible for protecting itself. > What do I mean? In mkdir(), the problem is racing > configfs_unregister_subsystem(). The group *has* to be live, because we > have i_mutex - unregister_subsystem can't tear down the directory until > we release it. So we're safe to call ->make_item/group(). After we've > done that, we have a type, and we can try_module_get(). If someone else > is in unregister_subsystem, that fails and we clean up. If not, we have > a reference and they can't unload. Ok, got it. The race is between unregister_subsystem() and mkdir() at the root of the subsystem (or one of its default groups). In deeper, user-created groups, this would be a design bug of the subsystem if this race could occur. > This is hard logic, and not something we want each and every > client module to have to figure out. Your change has them explicitly > __module_get() in ->make_item/group(), which isn't safe because of this > race! Well, this remains hard logic for the modules. But I agree that they should not impact the logic that deals with racing mkdir() and unregister_subsystem(). > What about attributes? They can only be accessed via the > filesystem exposure. If they are in the filesystem, unregister can't > happen. If they are opened, the module refcount is incremented. > Your big concern came from rmdir(). Specifically, while it was > safe to allocate an object during ->make_item/group(), what happens > after ->drop_item() is called and then module_put()? If the item has a > reference count still, the ->release() is not called. We may be > dropping our last reference on the module, and now the module can be > unloaded. This is the module's problem, not ours! configfs no longer > has a reference to the module, and thus cannot control its lifetime. Sure. I was thinking that configfs helped subsystems with such module reference counting issues, but I was wrong. > Anyone who has just the single reference is safe, because that's > the last reference. When we call the last config_item_put(), the > release happens. That's the simple case. > A module that takes an additional reference to the config item > needs to have this protection in place. All in-kernel users take and > release items in one function call. They don't hold long-term > references. If they did, they'd have to have a way of ensuring their > structure remained alive - and this would be the case if configfs wasn't > even involved. Thanks for these explanations. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-13 9:51 ` Louis Rilling @ 2008-06-13 20:26 ` Joel Becker -1 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-13 20:26 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote: > On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote: > Ok, got it. The race is between unregister_subsystem() and mkdir() at the root > of the subsystem (or one of its default groups). In deeper, user-created groups, > this would be a design bug of the subsystem if this race could occur. Exactly. > > This is hard logic, and not something we want each and every > > client module to have to figure out. Your change has them explicitly > > __module_get() in ->make_item/group(), which isn't safe because of this > > race! > > Well, this remains hard logic for the modules. But I agree that they should not > impact the logic that deals with racing mkdir() and unregister_subsystem(). It is, but most modules don't have to deal with it. Most modules (all in-kernel configfs users) have a single refrence on it. When they take an additional one, it's for the duration of a function call. They have to make sure that it's safe to call the function in the first place, so it's clearly safe to get/put the item. Forget about the configfs view that is presented to userspace. If you were a module with inter-linked structures, you'd have to make sure they were freed cleanly. For simple things, you create and drop them. If a module has a complex interlinkage, they have a mechanism to handle it. Example: filesystems can hang whatever they want off of VFS structures like inodes and superblocks - a mounted filesystem prevents rmmod. Everything is safe, *everything*, as long as it is all cleaned up when put_super() returns. So for the simple case, we provide plenty of protection. If someone wants to do something fancier, they have to provide their own protection, but they would anyway. And we can't know their complex lifecycle, so we can't really help anyway. > Sure. I was thinking that configfs helped subsystems with such module reference > counting issues, but I was wrong. We only help with the ones we have enough information to help with. > Thanks for these explanations. No problem. The more questions you ask, the more I refresh my memory. And as you've seen on other points, we sometimes find bugs :-) Joel -- "The trouble with being punctual is that nobody's there to appreciate it." - Franklin P. Jones Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-13 20:26 ` Joel Becker 0 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-13 20:26 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote: > On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote: > Ok, got it. The race is between unregister_subsystem() and mkdir() at the root > of the subsystem (or one of its default groups). In deeper, user-created groups, > this would be a design bug of the subsystem if this race could occur. Exactly. > > This is hard logic, and not something we want each and every > > client module to have to figure out. Your change has them explicitly > > __module_get() in ->make_item/group(), which isn't safe because of this > > race! > > Well, this remains hard logic for the modules. But I agree that they should not > impact the logic that deals with racing mkdir() and unregister_subsystem(). It is, but most modules don't have to deal with it. Most modules (all in-kernel configfs users) have a single refrence on it. When they take an additional one, it's for the duration of a function call. They have to make sure that it's safe to call the function in the first place, so it's clearly safe to get/put the item. Forget about the configfs view that is presented to userspace. If you were a module with inter-linked structures, you'd have to make sure they were freed cleanly. For simple things, you create and drop them. If a module has a complex interlinkage, they have a mechanism to handle it. Example: filesystems can hang whatever they want off of VFS structures like inodes and superblocks - a mounted filesystem prevents rmmod. Everything is safe, *everything*, as long as it is all cleaned up when put_super() returns. So for the simple case, we provide plenty of protection. If someone wants to do something fancier, they have to provide their own protection, but they would anyway. And we can't know their complex lifecycle, so we can't really help anyway. > Sure. I was thinking that configfs helped subsystems with such module reference > counting issues, but I was wrong. We only help with the ones we have enough information to help with. > Thanks for these explanations. No problem. The more questions you ask, the more I refresh my memory. And as you've seen on other points, we sometimes find bugs :-) Joel -- "The trouble with being punctual is that nobody's there to appreciate it." - Franklin P. Jones Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-13 20:26 ` Joel Becker @ 2008-06-13 22:27 ` Louis Rilling -1 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-13 22:27 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel On Fri, Jun 13, 2008 at 01:26:05PM -0700, Joel Becker wrote: > On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote: > > On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote: > > Ok, got it. The race is between unregister_subsystem() and mkdir() at the root > > of the subsystem (or one of its default groups). In deeper, user-created groups, > > this would be a design bug of the subsystem if this race could occur. > > Exactly. A few more thoughts: 1/ taking the module reference is only needed if mkdir() is called under a subsystem root or one of its default group, right? Of course this is a bit complex to check for this condition, and it does not hurt to take module references in all cases of mkdir(). 2/ this module reference counting makes unregister_subsystem() win against mkdir(), but only if unregister_subsystem() is called in module_cleanup(), because otherwise try_module_get() would succeed, right? If so, this means that after having called register_subsystem(), a module_init() cannot cleanly fail. Perhaps this should be documented in that case. 3/ to make unregister_subsystem() win against mkdir(), mkdir() should try_module_get() on the subsystem's owner, not on the new item owner (as is done by the current code), right? Maybe there is a bug here... > > > > This is hard logic, and not something we want each and every > > > client module to have to figure out. Your change has them explicitly > > > __module_get() in ->make_item/group(), which isn't safe because of this > > > race! > > > > Well, this remains hard logic for the modules. But I agree that they should not > > impact the logic that deals with racing mkdir() and unregister_subsystem(). > > It is, but most modules don't have to deal with it. Most > modules (all in-kernel configfs users) have a single refrence on it. > When they take an additional one, it's for the duration of a function > call. They have to make sure that it's safe to call the function in the > first place, so it's clearly safe to get/put the item. > Forget about the configfs view that is presented to userspace. > If you were a module with inter-linked structures, you'd have to make > sure they were freed cleanly. For simple things, you create and drop > them. If a module has a complex interlinkage, they have a mechanism to > handle it. > Example: filesystems can hang whatever they want off of VFS > structures like inodes and superblocks - a mounted filesystem prevents > rmmod. Everything is safe, *everything*, as long as it is all cleaned > up when put_super() returns. > So for the simple case, we provide plenty of protection. If > someone wants to do something fancier, they have to provide their own > protection, but they would anyway. And we can't know their complex > lifecycle, so we can't really help anyway. Actually, I'm developing a framework based on configfs, with which many modules can be linked together through mkdir() and symlink() operations. So I'm already managing such reference holding, but the fact that configfs does not hold a reference until the last config_item_put() imposes limitations (with which I can live though) to the framework: plugins of the framework provide custom config_item_types, but cannot define their own config_item_operations (especially release()), because release() must be called with a reference held on the module, and once release() is called somebody (not the module itself) has to drop this reference. For instance, A is a group of the framework, and mkdir A/B creates an object implemented by module "mod_b". Before calling mod_b's constructor, A's make_item() has to grab a reference on mod_b, and fail if not successful. Then this reference cannot be dropped before B's last reference is dropped, which can happen a long time after rmdir A/B. So A's drop_item() cannot drop mod_b's reference, and A has to provide the release() config_item_operation for B, which will call B's destructor and finally drop mod_b's reference. So I'd hoped that configfs could help me with this reference counting, for instance by keeping a reference until last config_item_cleanup(). But I can live without it, so don't care. Louis -- Dr Louis Rilling Kerlabs - IRISA Skype: louis.rilling Campus Universitaire de Beaulieu Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France http://www.kerlabs.com/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-13 22:27 ` Louis Rilling 0 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-13 22:27 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel On Fri, Jun 13, 2008 at 01:26:05PM -0700, Joel Becker wrote: > On Fri, Jun 13, 2008 at 11:51:59AM +0200, Louis Rilling wrote: > > On Thu, Jun 12, 2008 at 08:33:12PM -0700, Joel Becker wrote: > > Ok, got it. The race is between unregister_subsystem() and mkdir() at the root > > of the subsystem (or one of its default groups). In deeper, user-created groups, > > this would be a design bug of the subsystem if this race could occur. > > Exactly. A few more thoughts: 1/ taking the module reference is only needed if mkdir() is called under a subsystem root or one of its default group, right? Of course this is a bit complex to check for this condition, and it does not hurt to take module references in all cases of mkdir(). 2/ this module reference counting makes unregister_subsystem() win against mkdir(), but only if unregister_subsystem() is called in module_cleanup(), because otherwise try_module_get() would succeed, right? If so, this means that after having called register_subsystem(), a module_init() cannot cleanly fail. Perhaps this should be documented in that case. 3/ to make unregister_subsystem() win against mkdir(), mkdir() should try_module_get() on the subsystem's owner, not on the new item owner (as is done by the current code), right? Maybe there is a bug here... > > > > This is hard logic, and not something we want each and every > > > client module to have to figure out. Your change has them explicitly > > > __module_get() in ->make_item/group(), which isn't safe because of this > > > race! > > > > Well, this remains hard logic for the modules. But I agree that they should not > > impact the logic that deals with racing mkdir() and unregister_subsystem(). > > It is, but most modules don't have to deal with it. Most > modules (all in-kernel configfs users) have a single refrence on it. > When they take an additional one, it's for the duration of a function > call. They have to make sure that it's safe to call the function in the > first place, so it's clearly safe to get/put the item. > Forget about the configfs view that is presented to userspace. > If you were a module with inter-linked structures, you'd have to make > sure they were freed cleanly. For simple things, you create and drop > them. If a module has a complex interlinkage, they have a mechanism to > handle it. > Example: filesystems can hang whatever they want off of VFS > structures like inodes and superblocks - a mounted filesystem prevents > rmmod. Everything is safe, *everything*, as long as it is all cleaned > up when put_super() returns. > So for the simple case, we provide plenty of protection. If > someone wants to do something fancier, they have to provide their own > protection, but they would anyway. And we can't know their complex > lifecycle, so we can't really help anyway. Actually, I'm developing a framework based on configfs, with which many modules can be linked together through mkdir() and symlink() operations. So I'm already managing such reference holding, but the fact that configfs does not hold a reference until the last config_item_put() imposes limitations (with which I can live though) to the framework: plugins of the framework provide custom config_item_types, but cannot define their own config_item_operations (especially release()), because release() must be called with a reference held on the module, and once release() is called somebody (not the module itself) has to drop this reference. For instance, A is a group of the framework, and mkdir A/B creates an object implemented by module "mod_b". Before calling mod_b's constructor, A's make_item() has to grab a reference on mod_b, and fail if not successful. Then this reference cannot be dropped before B's last reference is dropped, which can happen a long time after rmdir A/B. So A's drop_item() cannot drop mod_b's reference, and A has to provide the release() config_item_operation for B, which will call B's destructor and finally drop mod_b's reference. So I'd hoped that configfs could help me with this reference counting, for instance by keeping a reference until last config_item_cleanup(). But I can live without it, so don't care. Louis -- Dr Louis Rilling Kerlabs - IRISA Skype: louis.rilling Campus Universitaire de Beaulieu Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France http://www.kerlabs.com/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-13 22:27 ` Louis Rilling @ 2008-06-14 8:47 ` Joel Becker -1 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-14 8:47 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote: > 1/ taking the module reference is only needed if mkdir() is called under > a subsystem root or one of its default group, right? Of course this is a > bit complex to check for this condition, and it does not hurt to take > module references in all cases of mkdir(). Correct, but it's much cleaner to always take the module ref. > 2/ this module reference counting makes unregister_subsystem() win > against mkdir(), but only if unregister_subsystem() is called in > module_cleanup(), because otherwise try_module_get() would succeed, > right? If so, this means that after having called register_subsystem(), > a module_init() cannot cleanly fail. Perhaps this should be documented > in that case. Correct again, mkdir can race a failing module_init(). This is the same as register_filesystem(). A module that needed to protect against this could have a mutex they release right before module_init() succeeds. They'd check it in make_item(). But most everyone can safely make configfs_register_subsystem() the last thing in their module_init(). > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should > try_module_get() on the subsystem's owner, not on the new item owner (as > is done by the current code), right? Maybe there is a bug here... Nope. You can build a subsystem out of multiple modules if you like. The module that owns the newly created object needs to be pinned, and that's new_item->type->owner. If a subsystem lives within one module, then subsys->type->owner == new_item->type->owner, and it doesn't matter. > > So for the simple case, we provide plenty of protection. If > > someone wants to do something fancier, they have to provide their own > > protection, but they would anyway. And we can't know their complex > > lifecycle, so we can't really help anyway. > > Actually, I'm developing a framework based on configfs, with which many > modules can be linked together through mkdir() and symlink() operations. > So I'm already managing such reference holding, but the fact that > configfs does not hold a reference until the last config_item_put() > imposes limitations (with which I can live though) to the framework: Think about it this way: the try_module_get() isn't to protect the client module, it is to protect configfs. It makes sure that if someone calls a VFS operation on a configfs inode, configfs can follow the config_*_operations safely. Once a config_item is removed from the filesystem view, configfs is done with it. Look at it the other way around. A config_item is not a structure owned by configfs. It is a part of the larger object, which is owned by the module that created it. The config_item portion just lets configfs display it to userspace. > For instance, A is a group of the framework, and mkdir A/B creates an > object implemented by module "mod_b". Before calling mod_b's > constructor, A's make_item() has to grab a reference on mod_b, and fail > if not successful. Then this reference cannot be dropped before B's last > reference is dropped, which can happen a long time after rmdir A/B. So > A's drop_item() cannot drop mod_b's reference, and A has to provide > the release() config_item_operation for B, which will call B's > destructor and finally drop mod_b's reference. Wow, so A->make_item() does something like: submod = lookup_which_mod(); if (!strcmp(submod, "mod_a")) new_thing = mod_a->alloc(); else if (!strcmp(submod, "mod_b")) new_thing = mod_b->alloc(); return &new_thing->config_item; That's pretty complicated, I agree. But certainly doable. Why can't mod_b provide a ->release() that does module_put(self)? Or are you trying to hide that detail from the person who is implementing mod_b? Even better, use the chained release scheme that is used by bio_endio(). Have mod_b control its own config_item_operations. In make_item, copy off the type and operations, then put in your own. In drop, put them back. struct my_item_type { struct config_item_type *original_type; struct config_item_type type; struct config_item_operations ops; }; make_item() { mod_b = alloc_mod_b(); my_type = kzalloc(struct my_item_type); my_type->original_type = mod_b->item->ci_type; my_type->ops = *my_type->original_type->ct_item_ops; my_type->ops.release = my_item_release(); my_type->type = *my_type->original_type; my_type->type.ops = &my_type->ops; mod_b->item->ci_type = &my_type->type; return &mod_b->item; } my_item_release(struct config_item *item) { my_type = to_my_type(item->type); item->ci_type = my_type->original_type; kfree(my_type); item->ci_type->ct_ops->release(item); } Joel -- "Ninety feet between bases is perhaps as close as man has ever come to perfection." - Red Smith Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-14 8:47 ` Joel Becker 0 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-14 8:47 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote: > 1/ taking the module reference is only needed if mkdir() is called under > a subsystem root or one of its default group, right? Of course this is a > bit complex to check for this condition, and it does not hurt to take > module references in all cases of mkdir(). Correct, but it's much cleaner to always take the module ref. > 2/ this module reference counting makes unregister_subsystem() win > against mkdir(), but only if unregister_subsystem() is called in > module_cleanup(), because otherwise try_module_get() would succeed, > right? If so, this means that after having called register_subsystem(), > a module_init() cannot cleanly fail. Perhaps this should be documented > in that case. Correct again, mkdir can race a failing module_init(). This is the same as register_filesystem(). A module that needed to protect against this could have a mutex they release right before module_init() succeeds. They'd check it in make_item(). But most everyone can safely make configfs_register_subsystem() the last thing in their module_init(). > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should > try_module_get() on the subsystem's owner, not on the new item owner (as > is done by the current code), right? Maybe there is a bug here... Nope. You can build a subsystem out of multiple modules if you like. The module that owns the newly created object needs to be pinned, and that's new_item->type->owner. If a subsystem lives within one module, then subsys->type->owner == new_item->type->owner, and it doesn't matter. > > So for the simple case, we provide plenty of protection. If > > someone wants to do something fancier, they have to provide their own > > protection, but they would anyway. And we can't know their complex > > lifecycle, so we can't really help anyway. > > Actually, I'm developing a framework based on configfs, with which many > modules can be linked together through mkdir() and symlink() operations. > So I'm already managing such reference holding, but the fact that > configfs does not hold a reference until the last config_item_put() > imposes limitations (with which I can live though) to the framework: Think about it this way: the try_module_get() isn't to protect the client module, it is to protect configfs. It makes sure that if someone calls a VFS operation on a configfs inode, configfs can follow the config_*_operations safely. Once a config_item is removed from the filesystem view, configfs is done with it. Look at it the other way around. A config_item is not a structure owned by configfs. It is a part of the larger object, which is owned by the module that created it. The config_item portion just lets configfs display it to userspace. > For instance, A is a group of the framework, and mkdir A/B creates an > object implemented by module "mod_b". Before calling mod_b's > constructor, A's make_item() has to grab a reference on mod_b, and fail > if not successful. Then this reference cannot be dropped before B's last > reference is dropped, which can happen a long time after rmdir A/B. So > A's drop_item() cannot drop mod_b's reference, and A has to provide > the release() config_item_operation for B, which will call B's > destructor and finally drop mod_b's reference. Wow, so A->make_item() does something like: submod = lookup_which_mod(); if (!strcmp(submod, "mod_a")) new_thing = mod_a->alloc(); else if (!strcmp(submod, "mod_b")) new_thing = mod_b->alloc(); return &new_thing->config_item; That's pretty complicated, I agree. But certainly doable. Why can't mod_b provide a ->release() that does module_put(self)? Or are you trying to hide that detail from the person who is implementing mod_b? Even better, use the chained release scheme that is used by bio_endio(). Have mod_b control its own config_item_operations. In make_item, copy off the type and operations, then put in your own. In drop, put them back. struct my_item_type { struct config_item_type *original_type; struct config_item_type type; struct config_item_operations ops; }; make_item() { mod_b = alloc_mod_b(); my_type = kzalloc(struct my_item_type); my_type->original_type = mod_b->item->ci_type; my_type->ops = *my_type->original_type->ct_item_ops; my_type->ops.release = my_item_release(); my_type->type = *my_type->original_type; my_type->type.ops = &my_type->ops; mod_b->item->ci_type = &my_type->type; return &mod_b->item; } my_item_release(struct config_item *item) { my_type = to_my_type(item->type); item->ci_type = my_type->original_type; kfree(my_type); item->ci_type->ct_ops->release(item); } Joel -- "Ninety feet between bases is perhaps as close as man has ever come to perfection." - Red Smith Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-14 8:47 ` Joel Becker @ 2008-06-16 12:39 ` Louis Rilling -1 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-16 12:39 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote: > On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote: > > 1/ taking the module reference is only needed if mkdir() is called under > > a subsystem root or one of its default group, right? Of course this is a > > bit complex to check for this condition, and it does not hurt to take > > module references in all cases of mkdir(). > > Correct, but it's much cleaner to always take the module ref. > > > 2/ this module reference counting makes unregister_subsystem() win > > against mkdir(), but only if unregister_subsystem() is called in > > module_cleanup(), because otherwise try_module_get() would succeed, > > right? If so, this means that after having called register_subsystem(), > > a module_init() cannot cleanly fail. Perhaps this should be documented > > in that case. > > Correct again, mkdir can race a failing module_init(). This is > the same as register_filesystem(). A module that needed to protect > against this could have a mutex they release right before module_init() > succeeds. They'd check it in make_item(). But most everyone can safely > make configfs_register_subsystem() the last thing in their > module_init(). Agreed. I have examples of similar issues (but not configfs related) where a single module module_init() failing needs several cleanup that can only safely be done in module_cleanup(), but I cannot claim that this is general case :) > > > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should > > try_module_get() on the subsystem's owner, not on the new item owner (as > > is done by the current code), right? Maybe there is a bug here... > > Nope. You can build a subsystem out of multiple modules if you > like. The module that owns the newly created object needs to be pinned, > and that's new_item->type->owner. If a subsystem lives within one > module, then subsys->type->owner == new_item->type->owner, and it > doesn't matter. In the "several modules" case, the module is pinned too late to protect against module unloading. This does probably not hurt configfs since the only problematic call is config_item_cleanup(), where the new item's release() operation is called. For such subsystems, the only way to protect against module unloading is to grab a reference on the new item's module in the make_item() operation, and find a way to ensure that the reference is dropped after the last config_item_put(). IMHO, what really hurts configfs is that the unregister_subsystem() vs mkdir() race is not solved unless mkdir() tries to grab a reference on the subsystem's module. And the current code of mkdir() does not ensure that in the "several modules" case. > > > > So for the simple case, we provide plenty of protection. If > > > someone wants to do something fancier, they have to provide their own > > > protection, but they would anyway. And we can't know their complex > > > lifecycle, so we can't really help anyway. > > > > Actually, I'm developing a framework based on configfs, with which many > > modules can be linked together through mkdir() and symlink() operations. > > So I'm already managing such reference holding, but the fact that > > configfs does not hold a reference until the last config_item_put() > > imposes limitations (with which I can live though) to the framework: > > Think about it this way: the try_module_get() isn't to protect > the client module, it is to protect configfs. It makes sure that if > someone calls a VFS operation on a configfs inode, configfs can follow > the config_*_operations safely. Once a config_item is removed from the > filesystem view, configfs is done with it. > Look at it the other way around. A config_item is not a > structure owned by configfs. It is a part of the larger object, which > is owned by the module that created it. The config_item portion just > lets configfs display it to userspace. Yes, this is what I realized in your previous email. > > > For instance, A is a group of the framework, and mkdir A/B creates an > > object implemented by module "mod_b". Before calling mod_b's > > constructor, A's make_item() has to grab a reference on mod_b, and fail > > if not successful. Then this reference cannot be dropped before B's last > > reference is dropped, which can happen a long time after rmdir A/B. So > > A's drop_item() cannot drop mod_b's reference, and A has to provide > > the release() config_item_operation for B, which will call B's > > destructor and finally drop mod_b's reference. > > Wow, so A->make_item() does something like: > > submod = lookup_which_mod(); > if (!strcmp(submod, "mod_a")) > new_thing = mod_a->alloc(); > else if (!strcmp(submod, "mod_b")) > new_thing = mod_b->alloc(); > > return &new_thing->config_item; > > That's pretty complicated, I agree. But certainly doable. I do something like this (and this works): struct mod_type { struct list_head type_list; struct config_item_type item_type; const char *name; struct config_item *new_item(const char *name); void destroy_item(struct config_item *item); }; void my_release(struct config_item *item) { struct mod_type *type = container_of(item->ci_type, struct mod_type, item_type); type->destroy_item(item); module_put(type->item_type.ct_owner); } struct configfs_item_operations my_item_ops = { .release = my_release, }; void register(struct mod_type *mod_type) { mod_type->item_type.ct_item_ops = &my_item_ops; spin_lock(&type_list); list_add(&mod_type->type_list, &mod_type_head); spin_unlock(&type_list); } /* Must only be called inside module_cleanup() */ void unregister(struct mod_type *mod_type) { spin_lock(&type_list); list_del(&mod_type->type_list); spin_unlock(&type_list); } struct mod_type *lookup(const char *name) { /* return mod_type having mod_type->name == name in the list */ } make_item(struct config_group *group, const char *name) { spin_lock(&mod_type_list); mod_type = lookup(name); if (mod_type) if (!try_module_get(mod_type->item_type.ct_owner)) mod_type = NULL; spin_unlock(&mod_type_list); new_item = NULL; if (mod_type) { new_item = mod_type->new_item(name); if (!new_item) module_put(mod_type->item_type.ct_owner); } return new_item; } drop_item(struct config_group *group, struct config_item *item) { config_item_put(item); } ------ mod_a_init() register(mod_type_a); mod_a_cleanup() unregister(mod_type_a); > Why can't mod_b provide a ->release() that does > module_put(self)? Because this is simply wrong. Doing module_put(self) exposes the modules's function to be run while another cpu unloads the module. Note how I solve this by doing try_module_get() while still having mod_type_list locked, and doing module_put() only after having destroyed the module's item. This lock actually protects item creation against concurrent removal of the module implementing that item. > Or are you trying to hide that detail from the person > who is implementing mod_b? Even better, use the chained release scheme > that is used by bio_endio(). Have mod_b control its own > config_item_operations. In make_item, copy off the type and operations, > then put in your own. In drop, put them back. > > struct my_item_type { > struct config_item_type *original_type; > struct config_item_type type; > struct config_item_operations ops; > }; > > make_item() > { > mod_b = alloc_mod_b(); > my_type = kzalloc(struct my_item_type); > my_type->original_type = mod_b->item->ci_type; > my_type->ops = *my_type->original_type->ct_item_ops; > my_type->ops.release = my_item_release(); > my_type->type = *my_type->original_type; > my_type->type.ops = &my_type->ops; > mod_b->item->ci_type = &my_type->type; > > return &mod_b->item; > } > > my_item_release(struct config_item *item) > { > my_type = to_my_type(item->type); > item->ci_type = my_type->original_type; > kfree(my_type); > item->ci_type->ct_ops->release(item); > } I already do something like this, replacing the item_operations instead of the whole item_type. And I find it ugly. Only a matter of taste, I agree. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080616/7588f9f1/attachment-0001.bin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-16 12:39 ` Louis Rilling 0 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-16 12:39 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 8556 bytes --] On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote: > On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote: > > 1/ taking the module reference is only needed if mkdir() is called under > > a subsystem root or one of its default group, right? Of course this is a > > bit complex to check for this condition, and it does not hurt to take > > module references in all cases of mkdir(). > > Correct, but it's much cleaner to always take the module ref. > > > 2/ this module reference counting makes unregister_subsystem() win > > against mkdir(), but only if unregister_subsystem() is called in > > module_cleanup(), because otherwise try_module_get() would succeed, > > right? If so, this means that after having called register_subsystem(), > > a module_init() cannot cleanly fail. Perhaps this should be documented > > in that case. > > Correct again, mkdir can race a failing module_init(). This is > the same as register_filesystem(). A module that needed to protect > against this could have a mutex they release right before module_init() > succeeds. They'd check it in make_item(). But most everyone can safely > make configfs_register_subsystem() the last thing in their > module_init(). Agreed. I have examples of similar issues (but not configfs related) where a single module module_init() failing needs several cleanup that can only safely be done in module_cleanup(), but I cannot claim that this is general case :) > > > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should > > try_module_get() on the subsystem's owner, not on the new item owner (as > > is done by the current code), right? Maybe there is a bug here... > > Nope. You can build a subsystem out of multiple modules if you > like. The module that owns the newly created object needs to be pinned, > and that's new_item->type->owner. If a subsystem lives within one > module, then subsys->type->owner == new_item->type->owner, and it > doesn't matter. In the "several modules" case, the module is pinned too late to protect against module unloading. This does probably not hurt configfs since the only problematic call is config_item_cleanup(), where the new item's release() operation is called. For such subsystems, the only way to protect against module unloading is to grab a reference on the new item's module in the make_item() operation, and find a way to ensure that the reference is dropped after the last config_item_put(). IMHO, what really hurts configfs is that the unregister_subsystem() vs mkdir() race is not solved unless mkdir() tries to grab a reference on the subsystem's module. And the current code of mkdir() does not ensure that in the "several modules" case. > > > > So for the simple case, we provide plenty of protection. If > > > someone wants to do something fancier, they have to provide their own > > > protection, but they would anyway. And we can't know their complex > > > lifecycle, so we can't really help anyway. > > > > Actually, I'm developing a framework based on configfs, with which many > > modules can be linked together through mkdir() and symlink() operations. > > So I'm already managing such reference holding, but the fact that > > configfs does not hold a reference until the last config_item_put() > > imposes limitations (with which I can live though) to the framework: > > Think about it this way: the try_module_get() isn't to protect > the client module, it is to protect configfs. It makes sure that if > someone calls a VFS operation on a configfs inode, configfs can follow > the config_*_operations safely. Once a config_item is removed from the > filesystem view, configfs is done with it. > Look at it the other way around. A config_item is not a > structure owned by configfs. It is a part of the larger object, which > is owned by the module that created it. The config_item portion just > lets configfs display it to userspace. Yes, this is what I realized in your previous email. > > > For instance, A is a group of the framework, and mkdir A/B creates an > > object implemented by module "mod_b". Before calling mod_b's > > constructor, A's make_item() has to grab a reference on mod_b, and fail > > if not successful. Then this reference cannot be dropped before B's last > > reference is dropped, which can happen a long time after rmdir A/B. So > > A's drop_item() cannot drop mod_b's reference, and A has to provide > > the release() config_item_operation for B, which will call B's > > destructor and finally drop mod_b's reference. > > Wow, so A->make_item() does something like: > > submod = lookup_which_mod(); > if (!strcmp(submod, "mod_a")) > new_thing = mod_a->alloc(); > else if (!strcmp(submod, "mod_b")) > new_thing = mod_b->alloc(); > > return &new_thing->config_item; > > That's pretty complicated, I agree. But certainly doable. I do something like this (and this works): struct mod_type { struct list_head type_list; struct config_item_type item_type; const char *name; struct config_item *new_item(const char *name); void destroy_item(struct config_item *item); }; void my_release(struct config_item *item) { struct mod_type *type = container_of(item->ci_type, struct mod_type, item_type); type->destroy_item(item); module_put(type->item_type.ct_owner); } struct configfs_item_operations my_item_ops = { .release = my_release, }; void register(struct mod_type *mod_type) { mod_type->item_type.ct_item_ops = &my_item_ops; spin_lock(&type_list); list_add(&mod_type->type_list, &mod_type_head); spin_unlock(&type_list); } /* Must only be called inside module_cleanup() */ void unregister(struct mod_type *mod_type) { spin_lock(&type_list); list_del(&mod_type->type_list); spin_unlock(&type_list); } struct mod_type *lookup(const char *name) { /* return mod_type having mod_type->name == name in the list */ } make_item(struct config_group *group, const char *name) { spin_lock(&mod_type_list); mod_type = lookup(name); if (mod_type) if (!try_module_get(mod_type->item_type.ct_owner)) mod_type = NULL; spin_unlock(&mod_type_list); new_item = NULL; if (mod_type) { new_item = mod_type->new_item(name); if (!new_item) module_put(mod_type->item_type.ct_owner); } return new_item; } drop_item(struct config_group *group, struct config_item *item) { config_item_put(item); } ------ mod_a_init() register(mod_type_a); mod_a_cleanup() unregister(mod_type_a); > Why can't mod_b provide a ->release() that does > module_put(self)? Because this is simply wrong. Doing module_put(self) exposes the modules's function to be run while another cpu unloads the module. Note how I solve this by doing try_module_get() while still having mod_type_list locked, and doing module_put() only after having destroyed the module's item. This lock actually protects item creation against concurrent removal of the module implementing that item. > Or are you trying to hide that detail from the person > who is implementing mod_b? Even better, use the chained release scheme > that is used by bio_endio(). Have mod_b control its own > config_item_operations. In make_item, copy off the type and operations, > then put in your own. In drop, put them back. > > struct my_item_type { > struct config_item_type *original_type; > struct config_item_type type; > struct config_item_operations ops; > }; > > make_item() > { > mod_b = alloc_mod_b(); > my_type = kzalloc(struct my_item_type); > my_type->original_type = mod_b->item->ci_type; > my_type->ops = *my_type->original_type->ct_item_ops; > my_type->ops.release = my_item_release(); > my_type->type = *my_type->original_type; > my_type->type.ops = &my_type->ops; > mod_b->item->ci_type = &my_type->type; > > return &mod_b->item; > } > > my_item_release(struct config_item *item) > { > my_type = to_my_type(item->type); > item->ci_type = my_type->original_type; > kfree(my_type); > item->ci_type->ct_ops->release(item); > } I already do something like this, replacing the item_operations instead of the whole item_type. And I find it ugly. Only a matter of taste, I agree. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-16 12:39 ` Louis Rilling @ 2008-06-16 18:06 ` Joel Becker -1 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-16 18:06 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Mon, Jun 16, 2008 at 02:39:12PM +0200, Louis Rilling wrote: > On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote: > IMHO, what really hurts configfs is that the unregister_subsystem() vs > mkdir() race is not solved unless mkdir() tries to grab a reference on the > subsystem's module. And the current code of mkdir() does not ensure that in the > "several modules" case. Valid point. It really does assume that the owner is always the same. Have to think about whether that's a big deal. > I do something like this (and this works): I believe it works. It looks fine. I'd personally do it more like what I displayed, wrapping release() rather than creating a separate operations abstraction and overriding item_operations, but as you point out that's just implementation. > > Why can't mod_b provide a ->release() that does > > module_put(self)? > > Because this is simply wrong. Doing module_put(self) exposes the modules's > function to be run while another cpu unloads the module. Note how I solve this How so? As long as the module_put() is the last thing, you're fine. That said, we both have better solutions with our wrappered functions. Joel -- "If you took all of the grains of sand in the world, and lined them up end to end in a row, you'd be working for the government!" - Mr. Interesting Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-16 18:06 ` Joel Becker 0 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-16 18:06 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Mon, Jun 16, 2008 at 02:39:12PM +0200, Louis Rilling wrote: > On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote: > IMHO, what really hurts configfs is that the unregister_subsystem() vs > mkdir() race is not solved unless mkdir() tries to grab a reference on the > subsystem's module. And the current code of mkdir() does not ensure that in the > "several modules" case. Valid point. It really does assume that the owner is always the same. Have to think about whether that's a big deal. > I do something like this (and this works): I believe it works. It looks fine. I'd personally do it more like what I displayed, wrapping release() rather than creating a separate operations abstraction and overriding item_operations, but as you point out that's just implementation. > > Why can't mod_b provide a ->release() that does > > module_put(self)? > > Because this is simply wrong. Doing module_put(self) exposes the modules's > function to be run while another cpu unloads the module. Note how I solve this How so? As long as the module_put() is the last thing, you're fine. That said, we both have better solutions with our wrappered functions. Joel -- "If you took all of the grains of sand in the world, and lined them up end to end in a row, you'd be working for the government!" - Mr. Interesting Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-16 18:06 ` Joel Becker @ 2008-06-16 18:14 ` Louis Rilling -1 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-16 18:14 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel On Mon, Jun 16, 2008 at 11:06:43AM -0700, Joel Becker wrote: > > > Why can't mod_b provide a ->release() that does > > > module_put(self)? > > > > Because this is simply wrong. Doing module_put(self) exposes the modules's > > function to be run while another cpu unloads the module. Note how I solve this > > How so? As long as the module_put() is the last thing, you're > fine. That said, we both have better solutions with our wrappered > functions. With a preemptible kernel, after module_put(self) the few assembly instructions cleaning up the stack before returning to the caller can be called after the memory allocated for the module has been freed. Which will make the kernel crash. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080616/4cf790f2/attachment.bin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-16 18:14 ` Louis Rilling 0 siblings, 0 replies; 20+ messages in thread From: Louis Rilling @ 2008-06-16 18:14 UTC (permalink / raw) To: Joel.Becker; +Cc: ocfs2-devel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 903 bytes --] On Mon, Jun 16, 2008 at 11:06:43AM -0700, Joel Becker wrote: > > > Why can't mod_b provide a ->release() that does > > > module_put(self)? > > > > Because this is simply wrong. Doing module_put(self) exposes the modules's > > function to be run while another cpu unloads the module. Note how I solve this > > How so? As long as the module_put() is the last thing, you're > fine. That said, we both have better solutions with our wrappered > functions. With a preemptible kernel, after module_put(self) the few assembly instructions cleaning up the stack before returning to the caller can be called after the memory allocated for the module has been freed. Which will make the kernel crash. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Ocfs2-devel] [RFC] configfs: module reference counting rules 2008-06-16 18:14 ` Louis Rilling @ 2008-06-16 19:36 ` Joel Becker -1 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-16 19:36 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Mon, Jun 16, 2008 at 08:14:28PM +0200, Louis Rilling wrote: > With a preemptible kernel, after module_put(self) the few assembly instructions > cleaning up the stack before returning to the caller can be called after the > memory allocated for the module has been freed. Which will make the kernel > crash. Ahh, but if you are running a preemptible kernel, you've already made your first mistake :-) Joel -- "Vote early and vote often." - Al Capone Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC] configfs: module reference counting rules @ 2008-06-16 19:36 ` Joel Becker 0 siblings, 0 replies; 20+ messages in thread From: Joel Becker @ 2008-06-16 19:36 UTC (permalink / raw) To: Louis Rilling; +Cc: ocfs2-devel, linux-kernel On Mon, Jun 16, 2008 at 08:14:28PM +0200, Louis Rilling wrote: > With a preemptible kernel, after module_put(self) the few assembly instructions > cleaning up the stack before returning to the caller can be called after the > memory allocated for the module has been freed. Which will make the kernel > crash. Ahh, but if you are running a preemptible kernel, you've already made your first mistake :-) Joel -- "Vote early and vote often." - Al Capone Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-06-16 19:37 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-12 23:54 [Ocfs2-devel] [RFC] configfs: module reference counting rules Louis Rilling 2008-06-12 23:54 ` Louis Rilling 2008-06-13 3:33 ` [Ocfs2-devel] " Joel Becker 2008-06-13 3:33 ` Joel Becker 2008-06-13 9:51 ` [Ocfs2-devel] " Louis Rilling 2008-06-13 9:51 ` Louis Rilling 2008-06-13 20:26 ` [Ocfs2-devel] " Joel Becker 2008-06-13 20:26 ` Joel Becker 2008-06-13 22:27 ` [Ocfs2-devel] " Louis Rilling 2008-06-13 22:27 ` Louis Rilling 2008-06-14 8:47 ` [Ocfs2-devel] " Joel Becker 2008-06-14 8:47 ` Joel Becker 2008-06-16 12:39 ` [Ocfs2-devel] " Louis Rilling 2008-06-16 12:39 ` Louis Rilling 2008-06-16 18:06 ` [Ocfs2-devel] " Joel Becker 2008-06-16 18:06 ` Joel Becker 2008-06-16 18:14 ` [Ocfs2-devel] " Louis Rilling 2008-06-16 18:14 ` Louis Rilling 2008-06-16 19:36 ` [Ocfs2-devel] " Joel Becker 2008-06-16 19:36 ` Joel Becker
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.