[Ocfs2-devel] [RFC] configfs: module reference counting rules

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Louis Rilling <Louis.Rilling@kerlabs.com>
To: Joel.Becker@oracle.com
Cc: ocfs2-devel@oss.oracle.com, linux-kernel@vger.kernel.org
Subject: [Ocfs2-devel] [RFC] configfs: module reference counting rules
Date: Mon, 16 Jun 2008 14:39:12 +0200	[thread overview]
Message-ID: <20080616123912.GR30804@localhost> (raw)
In-Reply-To: <20080614084701.GA9657@mail.oracle.com>

On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote:
> On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote:
> > 1/ taking the module reference is only needed if mkdir() is called under
> > a subsystem root or one of its default group, right? Of course this is a
> > bit complex to check for this condition, and it does not hurt to take
> > module references in all cases of mkdir().
> 
> 	Correct, but it's much cleaner to always take the module ref.
> 
> > 2/ this module reference counting makes unregister_subsystem() win
> > against mkdir(), but only if unregister_subsystem() is called in
> > module_cleanup(), because otherwise try_module_get() would succeed,
> > right? If so, this means that after having called register_subsystem(),
> > a module_init() cannot cleanly fail. Perhaps this should be documented
> > in that case.
> 
> 	Correct again, mkdir can race a failing module_init().  This is
> the same as register_filesystem().  A module that needed to protect
> against this could have a mutex they release right before module_init()
> succeeds.  They'd check it in make_item().  But most everyone can safely
> make configfs_register_subsystem() the last thing in their
> module_init().

Agreed. I have examples of similar issues (but not configfs related) where
a single module module_init() failing needs several cleanup that can only
safely be done in module_cleanup(), but I cannot claim that this is general
case :)

> 
> > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should
> > try_module_get() on the subsystem's owner, not on the new item owner (as
> > is done by the current code), right? Maybe there is a bug here...
> 
> 	Nope.  You can build a subsystem out of multiple modules if you
> like.  The module that owns the newly created object needs to be pinned,
> and that's new_item->type->owner.  If a subsystem lives within one
> module, then subsys->type->owner == new_item->type->owner, and it
> doesn't matter.

In the "several modules" case, the module is pinned too late to protect against
module unloading. This does probably not hurt configfs since the only
problematic call is config_item_cleanup(), where the new item's release()
operation is called. For such subsystems, the only way to protect against module
unloading is to grab a reference on the new item's module in the make_item()
operation, and find a way to ensure that the reference is dropped after the last
config_item_put().
	IMHO, what really hurts configfs is that the unregister_subsystem() vs
mkdir() race is not solved unless mkdir() tries to grab a reference on the
subsystem's module. And the current code of mkdir() does not ensure that in the
"several modules" case.

> 
> > > 	So for the simple case, we provide plenty of protection.  If
> > > someone wants to do something fancier, they have to provide their own
> > > protection, but they would anyway.  And we can't know their complex
> > > lifecycle, so we can't really help anyway.
> > 
> > Actually, I'm developing a framework based on configfs, with which many
> > modules can be linked together through mkdir() and symlink() operations.
> > So I'm already managing such reference holding, but the fact that
> > configfs does not hold a reference until the last config_item_put()
> > imposes limitations (with which I can live though) to the framework:
> 
> 	Think about it this way: the try_module_get() isn't to protect
> the client module, it is to protect configfs.  It makes sure that if
> someone calls a VFS operation on a configfs inode, configfs can follow
> the config_*_operations safely.  Once a config_item is removed from the
> filesystem view, configfs is done with it.
> 	Look at it the other way around.  A config_item is not a
> structure owned by configfs.  It is a part of the larger object, which
> is owned by the module that created it.  The config_item portion just
> lets configfs display it to userspace.

Yes, this is what I realized in your previous email.

> 
> > 	For instance, A is a group of the framework, and mkdir A/B creates an
> > object implemented by module "mod_b". Before calling mod_b's
> > constructor, A's make_item() has to grab a reference on mod_b, and fail
> > if not successful. Then this reference cannot be dropped before B's last
> > reference is dropped, which can happen a long time after rmdir A/B. So
> > A's drop_item() cannot drop mod_b's reference, and A has to provide
> > the release() config_item_operation for B, which will call B's
> > destructor and finally drop mod_b's reference.
> 
> 	Wow, so A->make_item() does something like:
> 
>     submod = lookup_which_mod();
>     if (!strcmp(submod, "mod_a"))
>         new_thing = mod_a->alloc();
>     else if (!strcmp(submod, "mod_b"))
>         new_thing = mod_b->alloc();
> 
>     return &new_thing->config_item;
> 
> That's pretty complicated, I agree.  But certainly doable.

I do something like this (and this works):

struct mod_type {
	struct list_head type_list;
	struct config_item_type item_type;
	const char *name;
	struct config_item *new_item(const char *name);
	void destroy_item(struct config_item *item);
};

void my_release(struct config_item *item)
{
	struct mod_type *type = container_of(item->ci_type, struct mod_type,
					item_type);
	type->destroy_item(item);
	module_put(type->item_type.ct_owner);
}

struct configfs_item_operations my_item_ops = {
	.release = my_release,
};

void register(struct mod_type *mod_type)
{
	mod_type->item_type.ct_item_ops = &my_item_ops;
	spin_lock(&type_list);
	list_add(&mod_type->type_list, &mod_type_head);
	spin_unlock(&type_list);
}

/* Must only be called inside module_cleanup() */
void unregister(struct mod_type *mod_type)
{
	spin_lock(&type_list);
	list_del(&mod_type->type_list);
	spin_unlock(&type_list);
}

struct mod_type *lookup(const char *name)
{
	/* return mod_type having mod_type->name == name in the list */ 
}

make_item(struct config_group *group, const char *name)
{
	spin_lock(&mod_type_list);
	mod_type = lookup(name);
	if (mod_type)
		if (!try_module_get(mod_type->item_type.ct_owner))
			mod_type = NULL;
	spin_unlock(&mod_type_list);

	new_item = NULL;
	if (mod_type) {
		new_item = mod_type->new_item(name);
		if (!new_item)
			module_put(mod_type->item_type.ct_owner);
	}
	return new_item;
}

drop_item(struct config_group *group, struct config_item *item)
{
	config_item_put(item);
}

------

mod_a_init()
	register(mod_type_a);

mod_a_cleanup()
	unregister(mod_type_a);

> 	Why can't mod_b provide a ->release() that does
> module_put(self)?

Because this is simply wrong. Doing module_put(self) exposes the modules's
function to be run while another cpu unloads the module. Note how I solve this
by doing try_module_get() while still having mod_type_list locked, and doing
module_put() only after having destroyed the module's item. This lock
actually protects item creation against concurrent removal of the module
implementing that item.

> Or are you trying to hide that detail from the person
> who is implementing mod_b?  Even better, use the chained release scheme
> that is used by bio_endio().  Have mod_b control its own
> config_item_operations.  In make_item, copy off the type and operations,
> then put in your own.  In drop, put them back.
> 
> struct my_item_type {
> 	struct config_item_type *original_type;
> 	struct config_item_type type;
> 	struct config_item_operations ops;
> };
> 
> make_item()
> {
> 	mod_b = alloc_mod_b();
> 	my_type = kzalloc(struct my_item_type);
> 	my_type->original_type = mod_b->item->ci_type;
> 	my_type->ops = *my_type->original_type->ct_item_ops;
> 	my_type->ops.release = my_item_release();
> 	my_type->type = *my_type->original_type;
> 	my_type->type.ops = &my_type->ops;
> 	mod_b->item->ci_type = &my_type->type;
> 
> 	return &mod_b->item;
> }
> 
> my_item_release(struct config_item *item)
> {
> 	my_type = to_my_type(item->type);
> 	item->ci_type = my_type->original_type;
> 	kfree(my_type);
> 	item->ci_type->ct_ops->release(item);
> }

I already do something like this, replacing the item_operations instead of the
whole item_type. And I find it ugly. Only a matter of taste, I agree.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20080616/7588f9f1/attachment-0001.bin

WARNING: multiple messages have this Message-ID (diff)

From: Louis Rilling <Louis.Rilling@kerlabs.com>
To: Joel.Becker@oracle.com
Cc: ocfs2-devel@oss.oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC] configfs: module reference counting rules
Date: Mon, 16 Jun 2008 14:39:12 +0200	[thread overview]
Message-ID: <20080616123912.GR30804@localhost> (raw)
In-Reply-To: <20080614084701.GA9657@mail.oracle.com>

[-- Attachment #1: Type: text/plain, Size: 8556 bytes --]

On Sat, Jun 14, 2008 at 01:47:01AM -0700, Joel Becker wrote:
> On Sat, Jun 14, 2008 at 12:27:44AM +0200, Louis Rilling wrote:
> > 1/ taking the module reference is only needed if mkdir() is called under
> > a subsystem root or one of its default group, right? Of course this is a
> > bit complex to check for this condition, and it does not hurt to take
> > module references in all cases of mkdir().
> 
> 	Correct, but it's much cleaner to always take the module ref.
> 
> > 2/ this module reference counting makes unregister_subsystem() win
> > against mkdir(), but only if unregister_subsystem() is called in
> > module_cleanup(), because otherwise try_module_get() would succeed,
> > right? If so, this means that after having called register_subsystem(),
> > a module_init() cannot cleanly fail. Perhaps this should be documented
> > in that case.
> 
> 	Correct again, mkdir can race a failing module_init().  This is
> the same as register_filesystem().  A module that needed to protect
> against this could have a mutex they release right before module_init()
> succeeds.  They'd check it in make_item().  But most everyone can safely
> make configfs_register_subsystem() the last thing in their
> module_init().

Agreed. I have examples of similar issues (but not configfs related) where
a single module module_init() failing needs several cleanup that can only
safely be done in module_cleanup(), but I cannot claim that this is general
case :)

> 
> > 3/ to make unregister_subsystem() win against mkdir(), mkdir() should
> > try_module_get() on the subsystem's owner, not on the new item owner (as
> > is done by the current code), right? Maybe there is a bug here...
> 
> 	Nope.  You can build a subsystem out of multiple modules if you
> like.  The module that owns the newly created object needs to be pinned,
> and that's new_item->type->owner.  If a subsystem lives within one
> module, then subsys->type->owner == new_item->type->owner, and it
> doesn't matter.

In the "several modules" case, the module is pinned too late to protect against
module unloading. This does probably not hurt configfs since the only
problematic call is config_item_cleanup(), where the new item's release()
operation is called. For such subsystems, the only way to protect against module
unloading is to grab a reference on the new item's module in the make_item()
operation, and find a way to ensure that the reference is dropped after the last
config_item_put().
	IMHO, what really hurts configfs is that the unregister_subsystem() vs
mkdir() race is not solved unless mkdir() tries to grab a reference on the
subsystem's module. And the current code of mkdir() does not ensure that in the
"several modules" case.

> 
> > > 	So for the simple case, we provide plenty of protection.  If
> > > someone wants to do something fancier, they have to provide their own
> > > protection, but they would anyway.  And we can't know their complex
> > > lifecycle, so we can't really help anyway.
> > 
> > Actually, I'm developing a framework based on configfs, with which many
> > modules can be linked together through mkdir() and symlink() operations.
> > So I'm already managing such reference holding, but the fact that
> > configfs does not hold a reference until the last config_item_put()
> > imposes limitations (with which I can live though) to the framework:
> 
> 	Think about it this way: the try_module_get() isn't to protect
> the client module, it is to protect configfs.  It makes sure that if
> someone calls a VFS operation on a configfs inode, configfs can follow
> the config_*_operations safely.  Once a config_item is removed from the
> filesystem view, configfs is done with it.
> 	Look at it the other way around.  A config_item is not a
> structure owned by configfs.  It is a part of the larger object, which
> is owned by the module that created it.  The config_item portion just
> lets configfs display it to userspace.

Yes, this is what I realized in your previous email.

> 
> > 	For instance, A is a group of the framework, and mkdir A/B creates an
> > object implemented by module "mod_b". Before calling mod_b's
> > constructor, A's make_item() has to grab a reference on mod_b, and fail
> > if not successful. Then this reference cannot be dropped before B's last
> > reference is dropped, which can happen a long time after rmdir A/B. So
> > A's drop_item() cannot drop mod_b's reference, and A has to provide
> > the release() config_item_operation for B, which will call B's
> > destructor and finally drop mod_b's reference.
> 
> 	Wow, so A->make_item() does something like:
> 
>     submod = lookup_which_mod();
>     if (!strcmp(submod, "mod_a"))
>         new_thing = mod_a->alloc();
>     else if (!strcmp(submod, "mod_b"))
>         new_thing = mod_b->alloc();
> 
>     return &new_thing->config_item;
> 
> That's pretty complicated, I agree.  But certainly doable.

I do something like this (and this works):

struct mod_type {
	struct list_head type_list;
	struct config_item_type item_type;
	const char *name;
	struct config_item *new_item(const char *name);
	void destroy_item(struct config_item *item);
};

void my_release(struct config_item *item)
{
	struct mod_type *type = container_of(item->ci_type, struct mod_type,
					item_type);
	type->destroy_item(item);
	module_put(type->item_type.ct_owner);
}

struct configfs_item_operations my_item_ops = {
	.release = my_release,
};

void register(struct mod_type *mod_type)
{
	mod_type->item_type.ct_item_ops = &my_item_ops;
	spin_lock(&type_list);
	list_add(&mod_type->type_list, &mod_type_head);
	spin_unlock(&type_list);
}

/* Must only be called inside module_cleanup() */
void unregister(struct mod_type *mod_type)
{
	spin_lock(&type_list);
	list_del(&mod_type->type_list);
	spin_unlock(&type_list);
}

struct mod_type *lookup(const char *name)
{
	/* return mod_type having mod_type->name == name in the list */ 
}

make_item(struct config_group *group, const char *name)
{
	spin_lock(&mod_type_list);
	mod_type = lookup(name);
	if (mod_type)
		if (!try_module_get(mod_type->item_type.ct_owner))
			mod_type = NULL;
	spin_unlock(&mod_type_list);

	new_item = NULL;
	if (mod_type) {
		new_item = mod_type->new_item(name);
		if (!new_item)
			module_put(mod_type->item_type.ct_owner);
	}
	return new_item;
}

drop_item(struct config_group *group, struct config_item *item)
{
	config_item_put(item);
}

------

mod_a_init()
	register(mod_type_a);

mod_a_cleanup()
	unregister(mod_type_a);

> 	Why can't mod_b provide a ->release() that does
> module_put(self)?

Because this is simply wrong. Doing module_put(self) exposes the modules's
function to be run while another cpu unloads the module. Note how I solve this
by doing try_module_get() while still having mod_type_list locked, and doing
module_put() only after having destroyed the module's item. This lock
actually protects item creation against concurrent removal of the module
implementing that item.

> Or are you trying to hide that detail from the person
> who is implementing mod_b?  Even better, use the chained release scheme
> that is used by bio_endio().  Have mod_b control its own
> config_item_operations.  In make_item, copy off the type and operations,
> then put in your own.  In drop, put them back.
> 
> struct my_item_type {
> 	struct config_item_type *original_type;
> 	struct config_item_type type;
> 	struct config_item_operations ops;
> };
> 
> make_item()
> {
> 	mod_b = alloc_mod_b();
> 	my_type = kzalloc(struct my_item_type);
> 	my_type->original_type = mod_b->item->ci_type;
> 	my_type->ops = *my_type->original_type->ct_item_ops;
> 	my_type->ops.release = my_item_release();
> 	my_type->type = *my_type->original_type;
> 	my_type->type.ops = &my_type->ops;
> 	mod_b->item->ci_type = &my_type->type;
> 
> 	return &mod_b->item;
> }
> 
> my_item_release(struct config_item *item)
> {
> 	my_type = to_my_type(item->type);
> 	item->ci_type = my_type->original_type;
> 	kfree(my_type);
> 	item->ci_type->ct_ops->release(item);
> }

I already do something like this, replacing the item_operations instead of the
whole item_type. And I find it ugly. Only a matter of taste, I agree.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

next prev parent reply	other threads:[~2008-06-16 12:39 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-12 23:54 [Ocfs2-devel] [RFC] configfs: module reference counting rules Louis Rilling
2008-06-12 23:54 ` Louis Rilling
2008-06-13  3:33 ` [Ocfs2-devel] " Joel Becker
2008-06-13  3:33   ` Joel Becker
2008-06-13  9:51   ` [Ocfs2-devel] " Louis Rilling
2008-06-13  9:51     ` Louis Rilling
2008-06-13 20:26     ` [Ocfs2-devel] " Joel Becker
2008-06-13 20:26       ` Joel Becker
2008-06-13 22:27       ` [Ocfs2-devel] " Louis Rilling
2008-06-13 22:27         ` Louis Rilling
2008-06-14  8:47         ` [Ocfs2-devel] " Joel Becker
2008-06-14  8:47           ` Joel Becker
2008-06-16 12:39           ` Louis Rilling [this message]
2008-06-16 12:39             ` Louis Rilling
2008-06-16 18:06             ` [Ocfs2-devel] " Joel Becker
2008-06-16 18:06               ` Joel Becker
2008-06-16 18:14               ` [Ocfs2-devel] " Louis Rilling
2008-06-16 18:14                 ` Louis Rilling
2008-06-16 19:36                 ` [Ocfs2-devel] " Joel Becker
2008-06-16 19:36                   ` Joel Becker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080616123912.GR30804@localhost \
    --to=louis.rilling@kerlabs.com \
    --cc=Joel.Becker@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.