public inbox for linux-kernel@vger.kernel.org
* Module Refcount & Stuff mini-FAQ
@ 2002-11-18 22:58 Rusty Russell
  2002-11-19  2:30 ` Werner Almesberger
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Rusty Russell @ 2002-11-18 22:58 UTC
  To: linux-kernel; +Cc: Doug Ledford, Alexander Viro

[ Suggestions welcome ]

Golden Rule: If you are calling through a function pointer into a
(different) module, you must hold a reference to that module.
Otherwise you risk sleeping in the module while it is unloaded.

Q: How do I get a reference to a module?
A: Usually, a successful call to try_module_get(owner).  You don't
   need to check for owner != NULL, BTW.

Q: When does try_module_get(owner) fail?
A: When the module is not ready to be entered (ie. still in
   init_module) or it is being removed.  It fails so that you cannot
   enter a module that may be about to be discarded (init might fail,
   or it's being removed).
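   A minimal userspace sketch of the get/put rules above, assuming a
   simplified module that is either coming (init running), live, or
   going.  The enum mod_state, its values and this cut-down struct
   module are hypothetical stand-ins, not the kernel's definitions;
   locking and the per-CPU counters are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct module. */
enum mod_state { MOD_COMING, MOD_LIVE, MOD_GOING };

struct module {
	enum mod_state state;
	unsigned int refcount;
};

/* Model of try_module_get(): a NULL owner (built-in code) always
 * succeeds; otherwise a reference is taken only while the module is
 * live, never while init is running or removal has started. */
static bool try_module_get(struct module *owner)
{
	if (owner == NULL)
		return true;
	if (owner->state != MOD_LIVE)
		return false;
	owner->refcount++;
	return true;
}

/* Model of module_put(): a NULL owner is a no-op. */
static void module_put(struct module *owner)
{
	if (owner == NULL)
		return;
	assert(owner->refcount > 0);
	owner->refcount--;
}
```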

Q: But modules call my register() routine which wants to call back
   into one of the function pointers immediately, and so
   try_module_get() fails!
A: You're being called from the module, so someone already has a
   reference (unless there's a bug), so you don't need a
   try_module_get().

   This does mean that if you were to register a structure for
   *another* module (does anyone do this?) you'd need to have a
   reference to it.
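   As a rough illustration of why no try_module_get() is needed here,
   this userspace sketch has a module's init function register a
   structure whose start() hook is called back immediately.  struct
   foo_ops, register_foo(), my_init() and the other names are
   hypothetical; the point is only that the callback runs while the
   module is not yet live, so a try_module_get() at that moment would
   fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct module {
	bool live;		/* false while the module's init is running */
	unsigned int refcount;
};

static bool try_module_get(struct module *m)
{
	if (m == NULL)
		return true;
	if (!m->live)
		return false;
	m->refcount++;
	return true;
}

/* Hypothetical registration structure handed to a subsystem. */
struct foo_ops {
	struct module *owner;
	int (*start)(void);
};

/* Calls straight back into the module without try_module_get(): the
 * caller *is* that module, so it cannot be unloaded under us (and
 * try_module_get() would fail here anyway, since init is running). */
static int register_foo(struct foo_ops *ops)
{
	return ops->start();
}

/* --- the "module" side --- */
static int start_calls;

static int my_start(void)
{
	start_calls++;
	return 0;
}

static struct module this_module = { .live = false, .refcount = 0 };
static struct foo_ops my_ops = { .owner = &this_module, .start = my_start };

static int my_init(void)
{
	bool got = try_module_get(my_ops.owner);

	assert(!got);		/* not live yet: a get would fail */
	return register_foo(&my_ops);
}
```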

Q: How do I put the reference back?
A: Using module_put(owner) (owner == NULL is OK).

Q: Do I really need to put try_module_get() before every function ptr
   call?
A: If the function does not sleep (and cannot be preempted), ie. it is
   called in softirq or hardirq context, you can omit this step, since
   you obviously won't sleep inside the module.

   Also, most structs have clear "start" and "stop" functions
   (eg. mount/umount), so you only need one try_module_get()
   on start, and module_put() on stop.
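   The start/stop pattern might look like this in a simplified
   userspace model: one reference is taken when a filesystem is mounted
   and dropped when it is unmounted, so the calls made while it is
   mounted need none.  struct fs_type, struct mount, do_mount() and
   do_umount() are hypothetical names, not the VFS's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct module {
	bool live;
	unsigned int refcount;
};

static bool try_module_get(struct module *m)
{
	if (m == NULL)
		return true;
	if (!m->live)
		return false;
	m->refcount++;
	return true;
}

static void module_put(struct module *m)
{
	if (m == NULL)
		return;
	assert(m->refcount > 0);
	m->refcount--;
}

/* Hypothetical filesystem type exported by a module. */
struct fs_type {
	struct module *owner;
	int (*fill_super)(void);	/* may sleep inside the module */
	void (*kill_super)(void);
};

struct mount {
	struct fs_type *type;
};

/* "start": the single reference is taken here... */
static int do_mount(struct fs_type *t, struct mount *mnt)
{
	int err;

	if (!try_module_get(t->owner))
		return -ENODEV;		/* act as if never registered */
	err = t->fill_super();
	if (err) {
		module_put(t->owner);
		return err;
	}
	mnt->type = t;
	return 0;
}

/* ...and dropped at "stop"; calls in between need no extra get/put. */
static void do_umount(struct mount *mnt)
{
	mnt->type->kill_super();
	module_put(mnt->type->owner);
	mnt->type = NULL;
}

/* Example callbacks standing in for the module's code. */
static int demo_fill_super(void) { return 0; }
static void demo_kill_super(void) { }
```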

Q: Is it safe to call try_module_get() and module_put() from an
   interrupt / softirq?
A: Yes.

Q: My code uses "MOD_INC_USE_COUNT".  Do I still need to adjust my
   module count when someone calls one of my functions?
A: No, you never need to adjust your own module count.  There are five
   ways a function in your module can get called: firstly, it could be
   your module_init() function, in which case the module code holds a
   reference.  It could be another module using one of your
   EXPORT_SYMBOL'ed functions, in which case you cannot be removed
   since they would have to be removed first.  It could be a module
   which found an EXPORT_SYMBOL'ed function using symbol_get(), in
   which case they hold a reference count.  It could be through a
   function pointer which your module gave out previously, which is
   discussed above.  Finally, it could be from within your own module,
   in which case someone must already hold a reference.

Q: My code uses "__MOD_INC_USE_COUNT(reg->owner)", but now I get a
   warning at runtime that it is unsafe.  What do I need to do?
A: You need to use try_module_get(), and not call into the module if
   it fails (act as if it hasn't registered yet).  Note that you no
   longer need to check for NULL yourself; try_module_get() does that.
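   A sketch of that conversion, assuming a hypothetical singly linked
   registry (struct reg, find_reg()) and the same cut-down userspace
   struct module as a stand-in.  A failed try_module_get() is reported
   as "not found", as if the module had not registered yet; a NULL
   owner needs no special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct module {
	bool live;
	unsigned int refcount;
};

static bool try_module_get(struct module *m)
{
	if (m == NULL)		/* built-in: nothing to pin */
		return true;
	if (!m->live)
		return false;
	m->refcount++;
	return true;
}

/* Hypothetical registry entry; owner is NULL for built-in code. */
struct reg {
	const char *name;
	struct module *owner;
	struct reg *next;
};

/* Replaces an unconditional __MOD_INC_USE_COUNT(reg->owner): if the
 * reference cannot be taken, behave as if the entry is not registered.
 * (The lock protecting the list is omitted from this sketch.) */
static struct reg *find_reg(struct reg *head, const char *name)
{
	assert(name != NULL);
	for (struct reg *r = head; r != NULL; r = r->next) {
		if (strcmp(r->name, name) != 0)
			continue;
		return try_module_get(r->owner) ? r : NULL;
	}
	return NULL;
}
```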

Q: My code used "GET_USE_COUNT(module)" to get the reference count.
A: Don't do that.  If module unloading is disabled, there is no
   reference count at all, and even when there is one it is kept in
   per-CPU counters, so there is never a single value you can read or
   assign to.

Q: My code used "try_inc_mod_count(module)" to get the reference
   count.  Should I change it?
A: No hurry.  try_module_get() is exactly the same: the new name
   reflects that this is now the only way to get a reference.

Q: How does the code in try_module_get() work?
A: It disables preemption for a moment, checks the live flag, and then
   increments a per-cpu counter if the module is live.  This is even
   lighter-weight (in icache and cycles) than using a brlock, but has
   the same effect.  If CONFIG_MODULE_UNLOAD=n, it just becomes a
   check that the module is live.
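   The per-CPU fast path can be modelled roughly as follows.  This is an
   illustration under stated assumptions, not the kernel code: NR_CPUS
   is fixed at a hypothetical 4, the CPU number is passed in explicitly
   instead of being pinned by disabling preemption, and there is no real
   concurrency.  It also shows why there is no single count to read: a
   get on one CPU and a put on another leave individual slots at 1 and
   -1, and only the sum is meaningful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical machine size */

struct module {
	bool live;
	long ref[NR_CPUS];	/* one counter per CPU: no shared cache line */
};

/* Fast path. The kernel disables preemption so the caller cannot
 * migrate between reading 'live' and bumping its own CPU's counter;
 * this model passes the CPU number in explicitly instead. */
static bool try_module_get_on(struct module *m, int cpu)
{
	bool ok;

	if (m == NULL)
		return true;
	/* preempt_disable() here */
	ok = m->live;
	if (ok)
		m->ref[cpu]++;
	/* preempt_enable() here */
	return ok;
}

/* The put may run on a different CPU, so one slot can go negative. */
static void module_put_on(struct module *m, int cpu)
{
	if (m != NULL)
		m->ref[cpu]--;
}

/* Only the sum means anything, and only while no CPU is inside a
 * get/put, which is what the removal code arranges. */
static long module_refcount(const struct module *m)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += m->ref[cpu];
	assert(sum >= 0);
	return sum;
}
```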

Q: How does the module remove code work?
A: It stops the machine by scheduling threads for every other CPU,
   then they all disable interrupts.  At this stage we know that no one
   is in try_module_get(), so we can reliably read the counter.  If
   zero, or the rmmod user specified --wait, we set the live flag to
   false.  After this, the reference count should not increase, and
   each module_put() will wake us up, so we can check the counter
   again.
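   A single-threaded sketch of the removal logic, assuming the "machine
   stopped" section can be modelled as plain code with nobody else
   running.  Instead of the remover sleeping until it is woken, the last
   module_put() completes the removal directly; delete_module(),
   finish_removal() and the unloading/freed flags are hypothetical names
   for this model only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct module {
	bool live;
	bool unloading;		/* removal decided, waiting for refs to drain */
	bool freed;
	unsigned int refcount;
};

static bool try_module_get(struct module *m)
{
	if (!m->live)
		return false;
	m->refcount++;
	return true;
}

/* Stands in for running module_exit() and freeing the module. */
static void finish_removal(struct module *m)
{
	m->unloading = false;
	m->freed = true;
}

static int delete_module(struct module *m, bool wait)
{
	/* In the kernel, every other CPU now spins with interrupts off,
	 * so nobody is inside try_module_get() and the count is stable. */
	if (m->refcount != 0 && !wait)
		return -EBUSY;
	m->live = false;	/* from here on, every get fails */
	/* Other CPUs are released again. */
	if (m->refcount == 0)
		finish_removal(m);
	else
		m->unloading = true;
	return 0;
}

/* Each put "wakes" the remover; the last one lets removal finish. */
static void module_put(struct module *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0 && m->unloading)
		finish_removal(m);
}
```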

Q: Are these changes all so you could implement an in-kernel module
   linker?
A: No, they were to prevent load and unload races without altering
   every module, nor introducing drastic new requirements.

Q: Doesn't putting linking code into the kernel just add bloat?
A: The total linking code is about 200 generic lines, 100
   x86-specific lines.  The ia64 linking code is about 500 lines (it's
   the most complex).  Richard Henderson has a great suggestion for
   simplifying it further, which I'm implementing now.  "insmod" is
   now a single portable system call, meaning insmod can be written in
   about 20 lines of code.

   The previous code that implemented the two module loading system
   calls, the module querying system call, and the /proc/ksyms output
   was a little larger than the current x86 linker.

Q: Why not just fix the old code?
A: Because having something so intimate with the kernel in userspace
   greatly restricts what changes the kernel can make.  Moving into
   the kernel means I have implemented modversions, typesafe
   extensible module parameters and kallsyms without altering
   userspace in any way.  Future extensions won't have to worry about
   the modversions problem.

Q: Why not implement two-stage insert / two-stage delete?
A: Because I implemented it first and it sucked.  And because this
   *is* two-stage insert and two-stage delete, without exposing it to
   the modules using it, with the added advantage that the second
   stage is atomic (activation/deactivation is simply changing
   mod->live, modulo locking implementation magic detailed above).
   This prevents the race between deactivating the module and finding
   that someone has started using it as you are deactivating it.
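   To make the "second stage is a single flag change" point concrete,
   here is a sketch that uses a C11 atomic_bool as a stand-in for
   mod->live (the kernel relies on its own locking for this, not on C11
   atomics).  load_module(), deactivate_module() and the init hooks are
   hypothetical; the model shows that a module whose init fails is never
   observed as live.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct module {
	atomic_bool live;
	int (*init)(void);
};

static bool module_is_live(struct module *m)
{
	return atomic_load(&m->live);
}

/* Stage 1: link and run init while 'live' is false, so nobody else can
 * enter. Stage 2: one store either publishes the module or, if init
 * failed, it is discarded without ever having been enterable. */
static int load_module(struct module *m)
{
	int err;

	atomic_store(&m->live, false);
	/* ...relocation and symbol resolution omitted... */
	err = m->init();
	if (err)
		return err;
	atomic_store(&m->live, true);
	return 0;
}

/* Deactivation is likewise one store: no new entry succeeds after it. */
static void deactivate_module(struct module *m)
{
	assert(module_is_live(m));
	atomic_store(&m->live, false);
}

static int init_ok(void) { return 0; }
static int init_fails(void) { return -ENOMEM; }
```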

Hope that helps?
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

* Re: Module Refcount & Stuff mini-FAQ
@ 2002-11-19 19:18 Adam J. Richter
  0 siblings, 0 replies; 22+ messages in thread
From: Adam J. Richter @ 2002-11-19 19:18 UTC
  To: linux-kernel, rusty

On November 18, 2002, Rusty Russell wrote:
>[ Suggestions welcome ]
>
>Golden Rule: If you are calling though a function pointer into a
>(different) module, you must hold a reference to that module.
>Otherwise you risk sleeping in the module while it is unloaded.

Although it is addressed in your subsequent questions, I think people
might read this and think that they should pepper their code with
unnecessary try_module_get()'s, often for their own module.  I'd
recommend demoting this "golden rule" to just another FAQ entry.
Otherwise, there is some implication that remembering just the
"golden rule" will be constructive, but I can see attempting to
apply this "golden rule" without following the other FAQ entries
as harmful in common cases.  If you really like the "golden rule"
format, perhaps you could add a couple of others based on issues that
you address later in your FAQ, such as the following.

"Golden Rule": Do not call try_module_get() unnecessarily.  In most
cases, you know that the module reference count cannot drop to zero
because some higher level facility is holding a reference and will
not release that reference until you return, such as the reference held
by an open file descriptor or network interface.

Another "Golden Rule": If you find yourself explicitly calling
try_module_get() and module_put() for a common type of module, then
probably some higher level facility really should be taking care of
this for you.  This is addressed under your question "Do I really need
to put try_module_get() before every function ptr call?", but I want to
phrase this as something you should proactively look for.

Perhaps there ought to be a module pointer in struct device_driver.
{,un}register_{device,driver}() would *not* modify the counts (that
would prevent module removal entirely), but it would remove the need
for separate module pointers in most of the places that have them now,
and may allow a little bit of code consolidation in places.

>Q: How do I get a reference to a module?
>A: Usually, a successful call to try_module_get(owner).  You don't
>   need to check for owner != NULL, BTW.
>
>Q: When does try_module_get(owner) fail?
>A: When the module is not ready to be entered (ie. still in
>   init_module) or it is being removed.  It fails to prevent you
>   entering the module as it is being discarded (init might fail, or
>   it's being removed).

	1. try_module_get() introduces new error legs that will get
little testing.

	2. try_module_get() introduces new failures that other software
has to anticipate.  For example, if I try to mount an ext3 file system
and it happens that ext3 was being automatically removed (for lack of
use) at this time, the attempt to get the ext3 filesystem can fail
without request_module() being called to reload it.

	3. try_module_get() introduces yet another "most fundamental"
lock type.  We have a bunch of facilities vying to do that, and I
think it's going to be a source of bugs.  It would be better to avoid
introducing a new layer of locking if possible.

	4. This kind of race is not really specific to modules, although
they may be the only example that comes up in practice.

	I agree that this approach is worth these costs if it is
the only way to avoid module remove races, or if it is the best way.
However, I think there may be potentially better alternatives.

	Here is what I have in mind.  Basically, a module's module_init
function could register an rwsem (I'm not sure what the difference is
from an rwlock; I just chose it because there is already one in struct
bus_type).  sys_delete_module would take an exclusive lock on it before
checking the reference count, and the lock would remain held into the
module_exit routine, which would be responsible for releasing it.  To
enable modules that register and unregister multiple things, it
would be possible to register more than one such lock, although it
is important that these locks normally not be held simultaneously
for any other reason (otherwise there might be deadlock).
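	A single-threaded model of this ordering, assuming a toy
reader/writer lock (struct rw_model) whose try_* calls return false
where a real rw_semaphore would sleep.  lookup(), delete_begin() and
module_exit_unregister() are hypothetical names.  While removal holds
the lock exclusively, a lookup waits rather than fails; once the module
has unregistered, the lookup simply reports "not found", at which point
a caller could run request_module() (not modelled here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded stand-in for an rw_semaphore: the try_* calls return
 * false where a real rwsem would put the caller to sleep. */
struct rw_model {
	bool writer;
	unsigned int readers;
};

static bool rw_try_read(struct rw_model *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void rw_read_done(struct rw_model *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool rw_try_write(struct rw_model *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

static void rw_write_done(struct rw_model *s)
{
	assert(s->writer);
	s->writer = false;
}

/* One lock per "bus", e.g. the list of filesystem types. */
static struct rw_model fs_bus_lock;

struct module {
	bool registered;
	unsigned int refcount;
};

/* Lookup: under the shared lock the increment cannot fail, because
 * removal cannot be in progress. Returns false if it would wait. */
static bool lookup(struct module *m, bool *found)
{
	if (!rw_try_read(&fs_bus_lock))
		return false;
	*found = m->registered;
	if (*found)
		m->refcount++;
	rw_read_done(&fs_bus_lock);
	return true;
}

/* sys_delete_module: the exclusive lock is taken before the refcount
 * check and stays held until module_exit() has unregistered. */
static bool delete_begin(struct module *m)
{
	if (!rw_try_write(&fs_bus_lock))
		return false;
	if (m->refcount != 0) {
		rw_write_done(&fs_bus_lock);
		return false;
	}
	return true;
}

static void module_exit_unregister(struct module *m)
{
	m->registered = false;
	rw_write_done(&fs_bus_lock);
}
```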

	Anyhow, here is some pseudo-code.  The change to delete_module
is most relevant.  I don't know whether I should be using rwsem or rwlock
for this, by the way.  With the example of get_fs_type and using the
generic driver layer, there is no module unload race, there is no
failure path for mod_inc_use_count, and there is no situation where
attempting to get a filesystem that is in the midst of being unloaded
will result in failure (instead, the access will wait until the filesystem
has been unregistered, and the modprobe will reload it).  Under this
scheme, there is no need for a new kind of locking primitive, and much of
this facility is not tied to modules.


struct rwsem_chain {
	struct rw_semaphore *rwsem;
	struct list_head list;
};

/* Adds the rwsem in _sorted_ order, by, say, address of rwsem. */
void rwsems_add_sorted(struct list_head *head, struct rwsem_chain *element);

/* Gets an exclusive lock on every semaphore in the chain, but duplicate
   semaphores on the list are acquired only once. */
void rwsems_wlock(struct list_head *head);

/* Remove element from the list and then release its rwsem *if* there
   are no duplicate rwsem's on the list. */
void rwsems_element_release(struct rwsem_chain *element);

struct module {
	...
	/* If this list is non-empty, then callers to MOD_INC_REF_COUNT
	   must hold one of the locks on this list (it can be a shared
	   hold). */
	struct list_head	rwsems;
};

/* in kernel/module.c: */
asmlinkage long
sys_delete_module(const char *name_user, unsigned int flags) {
	...
	rwsems_wlock(&mod->rwsems);
	if ((flags & O_NONBLOCK) && module_refcount(mod) != 0) {
		...
	}
	...
	free_module(mod);	/* destroy routine is responsible for releasing locks */
	...
}


/* Here is an example of using this facility.  Let's port the generic device
   layer. */

struct device_driver {
	...
	struct module		*module;
	struct rwsem_chain	rwsem_chain;
};

int driver_register(struct device_driver * drv)
{
	...
	if (drv->module) {
		drv->rwsem_chain.rwsem = &drv->bus->rwsem;
		rwsems_add_sorted(&drv->module->rwsems, &drv->rwsem_chain);
	}
}



void put_driver(struct device_driver * drv)
{
	...
	rwsems_element_release(&drv->rwsem_chain);
}

struct device_driver *
get_driver_by_name(struct bus_type *bus, const char *name)
{
	struct list_head *node;

	down_read(&bus->rwsem);
	list_for_each(node, &bus->drivers) {
		struct device_driver *drv = to_drv(node);
		if (strcmp(drv->name, name) == 0) {
			get_driver(drv);
			MOD_INC_REF_COUNT(drv->module);
			up_read(&bus->rwsem);
			return drv;
		}
	}
	up_read(&bus->rwsem);
	return NULL;
}

struct device_driver *
get_driver_by_name_modprobe(struct bus_type *bus_type, const char *name)
{
	struct device_driver *result = get_driver_by_name(bus_type, name);
	
	if (!result && request_module(name) == 0)
		return get_driver_by_name(bus_type, name);
	else
		return result;
}	



/* You could port the file system list interface to this facility
   by having every file system do

   rwsems_add_sorted(&THIS_MODULE->rwsems, &local_rwsems_element);

   ...or you could embed a struct device_driver in struct file_system_type
   and a struct device in struct super_block.

   Either way, you would then have a race-free get_fs_type that does not
   return failure when the file system being sought is in the midst
   of being unloaded. */

struct bus_type fs_bus_type;

int register_filesystem(struct file_system_type * fs)
{
	...
	down_write(&fs_bus_type.rwsem);
	...
	fs->driver.module = fs->owner;
	fs->driver.bus = &fs_bus_type;
	driver_register(&fs->driver);
	...
	up_write(&fs_bus_type.rwsem);
}


struct file_system_type *get_fs_type(const char *name)
{
	struct device_driver *driver =
		get_driver_by_name_modprobe(&fs_bus_type, name);

	if (driver == NULL)
		return NULL;

	return container_of(driver, struct file_system_type, driver);
}



	I may comment further on your FAQ later, especially the stuff
about having the module loader in the kernel, but I have to go now, and
that's a separate topic anyhow.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

* Re: Module Refcount & Stuff mini-FAQ
@ 2002-11-20 12:25 Adam J. Richter
  0 siblings, 0 replies; 22+ messages in thread
From: Adam J. Richter @ 2002-11-20 12:25 UTC
  To: rusty; +Cc: linux-kernel

Rusty Russell sent me a preliminary response without cc'ing
linux-kernel (I think because it was so preliminary), but I'm
re-cc'ing linux-kernel because my response explains a bit more about
my proposal to eliminate most or all try_module_get's racelessly,
which others might find informative or might want to comment on.
There is obviously nothing particularly private in this message.

Rusty Russell wrote:
>In message <200211191918.LAA11641@adam.yggdrasil.com> you [Adam Richter] write:
>> On November 18, 2002, Rusty Russell wrote:
[...]
>> 	1. try_module_get() introduces new error legs that will get
>> little testing.

>try_module_get() is exactly equivalent to try_inc_mod_count().  It's
>not actually a new thing: deprecating the other module refcount
>methods is the new thing.

	OK, s/try_module_get/try_inc_mod_count/.

	Same issue.

>I've only glanced over your locking proposal, but the most obvious
>things to me are that grabbing a rwlock strikes me as a little heavy
>for a fundmantal primitive that might be used anywhere, and secondly I
>want to grab it in a bh handler so I can modularize IPv4.

	It appears that there already is an appropriate mutex to use
in ipv4: rtnl_sem.  My code currently uses an rw_semaphore instead of
a semaphore, but it could either be changed to call a list of
arbitrary locking functions, or, probably simpler, rtnl_sem could be
changed to an rw_semaphore.  The latter is particularly appealing,
because there already is code in the net subdirectory that seems to be
written to take advantage of this change (there are already distinct
rtnl_shlock and rtnl_exlock macros).  So the problem of using a locking
primitive that can block rather than spin apparently has already been
solved in ipv4.

	I run modularized ipv4 already, so it should be easy for me to
check whether your module loader gets to the point where it works well
enough for me to complete a boot (or I could try porting my patch to
Keith Owens' module system).

	As far as the "strikes me as a little heavy" phrase goes,
that's obviously not a clear identification of a cost, so I can only
try to guess which costs you might be referring to and discuss those.  The
memory usage of struct rwsem_chain would be 3 pointers (12 or 24
bytes) per detected device (net, char, block, etc.), file_system_type
and perhaps some other resources of which there are usually a handful
of each.  My guess is that you'd probably have under 100 allocated on
a typical computer: one for each detected network interface, one for
each detected block device, one for each detected character device,
and some elsewhere, perhaps an average of 2-3 per loaded module, and
probably as much space will be saved by eliminating error paths.
The locks themselves generally already exist, and these
amount to one lock per type of device (per "bus_type" from a generic
driver standpoint): one lock for network devices, one lock for file
system types, etc.  It's a very small number and the locks already
exist in every case that I've seen anyhow.  So the net memory costs
should be approximately zero and it may even be a net memory savings.
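	The size estimate is easy to check with userspace copies of the two
structures; these mirror the shapes in the pseudo-code above (one
pointer plus a two-pointer list_head), not any particular kernel
header.  static_assert comes from C11's <assert.h>, and chain_bytes()
is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the shapes involved, for measuring only. */
struct list_head {
	struct list_head *next, *prev;
};

struct rw_semaphore;		/* opaque: only a pointer is stored */

struct rwsem_chain {
	struct rw_semaphore *rwsem;
	struct list_head list;
};

/* Three pointers, with no padding on common ABIs. */
static_assert(sizeof(struct rwsem_chain) == 3 * sizeof(void *),
	      "rwsem_chain should be three pointers");

/* Total bytes for n chain elements (e.g. one per detected device). */
static size_t chain_bytes(size_t n)
{
	return n * sizeof(struct rwsem_chain);
}
```

At 100 elements that comes to 1200 bytes with 4-byte pointers and 2400
bytes with 8-byte pointers.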

	If by "a little heavy" you were referring to lock contention,
it's important to realize that this proposal sets up lists of locks,
it does not introduce new attempts to grab these locks or attempts to
hold them much longer, with the exception of the module's unload
function.  Even there, these are locks that would be taken at some
point in the unload function anyhow.  Also, these are not spin locks
and attempts to block with these rw_semaphores held will be rare to
nonexistent.  So waiting on the locks should cause the CPU to
switch to something useful, not just spin.

	This change will likely eliminate bugs, simplify testing and
code walkthroughs (fewer untested branches), and, more importantly,
eliminate flaky behavior that is not considered a bug from the
try_inc_mod_count perspective, but is a bug from a functional
standpoint.  For example, if you do "mount -t iso9660 /dev/cdrom
/mnt", a try_inc_mod_count implementation can generate a result like
"iso9660: unknown filesystem" on a system that has iso9660 just
because of a timing fluke, whereas under my scheme you will reliably
get the iso9660 filesystem every time.

	It is also important to realize that this change can be done
incrementally.  Under this scheme, try_inc_mod_count will just succeed
all the time for users of this facility.  Note that the restored
unconditional __MOD_INC_USE_COUNT routine could have an optional
debugging feature: it would call BUG() if the caller does not have at
least a shared lock on one of the rw_semaphores in the module's list.
__MOD_INC_USE_COUNT is generally not called in IO paths, so it
wouldn't be that costly to turn that check on.
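	The optional debugging check might be sketched like this, under
loud assumptions: struct lock_state stands in for an rw_semaphore, the
lock list is a small fixed array, assert() stands in for BUG(), and
mod_inc_use_count_checked() is a hypothetical name.  Unlike a real
check, the model cannot tell whether the lock is held by the caller or
by someone else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MOD_LOCKS 4		/* hypothetical limit for the sketch */

/* Minimal record of a lock's state; a stand-in for an rw_semaphore. */
struct lock_state {
	unsigned int readers;
	bool writer;
};

struct module {
	struct lock_state *locks[MAX_MOD_LOCKS];	/* the module's list */
	size_t nr_locks;
	unsigned int refcount;
};

/* Is at least one lock on the module's list held, shared or exclusive?
 * A real check would ask whether *this caller* holds it; the model can
 * only see whether anybody does. */
static bool module_lock_held(const struct module *m)
{
	for (size_t i = 0; i < m->nr_locks; i++)
		if (m->locks[i]->readers > 0 || m->locks[i]->writer)
			return true;
	return false;
}

/* Unconditional increment plus the optional debugging check; assert()
 * stands in for BUG(). An empty list means the module does not use
 * the facility, so nothing is checked. */
static void mod_inc_use_count_checked(struct module *m)
{
	assert(m->nr_locks == 0 || module_lock_held(m));
	m->refcount++;
}
```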

	If we are able to eliminate all calls to try_inc_mod_count,
then that "lock the entire computer" code can be deleted, as well as
try_inc_mod_count.  In the unlikely event that try_inc_mod_count cannot
be eliminated, I think the reliability that this change would bring to
those places that can use this facility would still be enough to
warrant keeping this facility.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."


Thread overview: 22+ messages
2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
2002-11-19  2:30 ` Werner Almesberger
2002-11-24 22:50   ` Rusty Russell
2002-11-25  2:07     ` Werner Almesberger
2002-11-25  2:27       ` Rusty Russell
2002-11-25  6:39         ` Werner Almesberger
2002-11-25 22:43           ` Rusty Russell
2002-11-26  2:26             ` Werner Almesberger
2002-11-26  3:16               ` Rusty Russell
2002-11-26  7:12                 ` Werner Almesberger
2002-11-26 22:56                   ` Rusty Russell
2002-11-19  2:40 ` John Levon
2002-11-24 23:02   ` Rusty Russell
2002-11-25  0:38     ` John Levon
2002-11-19  3:10 ` kksymoops Jeff Garzik
2002-11-19 21:10   ` kksymoops Rusty Russell
2002-11-20 15:46     ` kksymoops Kai Germaschewski
2002-11-19  3:50 ` kksymoops Jeff Garzik
2002-11-23 22:23 ` Module Refcount & Stuff mini-FAQ Pavel Machek
2002-11-25  0:26   ` Rusty Russell
  -- strict thread matches above, loose matches on Subject: below --
2002-11-19 19:18 Adam J. Richter
2002-11-20 12:25 Adam J. Richter
