Module Refcount & Stuff mini-FAQ

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Module Refcount & Stuff mini-FAQ
@ 2002-11-18 22:58 Rusty Russell
  2002-11-19  2:30 ` Werner Almesberger
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Rusty Russell @ 2002-11-18 22:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Doug Ledford, Alexander Viro

[ Suggestions welcome ]

Golden Rule: If you are calling though a function pointer into a
(different) module, you must hold a reference to that module.
Otherwise you risk sleeping in the module while it is unloaded.

Q: How do I get a reference to a module?
A: Usually, a successful call to try_module_get(owner).  You don't
   need to check for owner != NULL, BTW.

Q: When does try_module_get(owner) fail?
A: When the module is not ready to be entered (ie. still in
   init_module) or it is being removed.  It fails to prevent you
   entering the module as it is being discarded (init might fail, or
   it's being removed).

Q: But modules call my register() routine which wants to call back
   into one of the function pointers immediately, and so
   try_module_get() fails!
A: You're being called from the module, so someone already has a
   reference (unless there's a bug), so you don't need a
   try_module_get().

   This does mean that if you were to register a structure for
   *another* module (does anyone do this?) you'd need to have a
   reference to it.

Q: How do I put the reference back?
A: Using module_put(owner) (owner == NULL is OK).

Q: Do I really need to put try_module_get() before every function ptr
   call?
A: If the function does not sleep (any cannot be preempted) ie. is
   called in softirq or hardirq context, you can omit this step, since
   you obviously won't sleep inside the module.

   Also, most structs have clear "start" and "stop" functions
   (eg. mount/umount), so you only need one try_module_get()
   on start, and module_put() on stop.

Q: Is it safe to call try_module_get() and module_put() from an
   interrtupt / softirq?
A: Yes.

Q: My code use "MOD_INC_USE_COUNT".  Do I still need to adjust my
   module count when someone calls one of my functions?
A: No, you never need to adjust your own module count.  There are five
   ways a function in your module can get called: firstly, it could be
   your module_init() function, in which case the module code holds a
   reference.  It could be another module using one of your
   EXPORT_SYMBOL'ed functions, in which case you cannot be removed
   since they would have to be removed first.  It could be a module
   which found an EXPORT_SYMBOL'ed function using symbol_get(), in
   which case they hold a reference count.  It could be through a
   function pointer which your module gave out previously, which is
   discussed above.  Finally, it could be from within your own module,
   in which case someone must already hold a reference.

Q: My code uses "__MOD_INC_USE_COUNT(reg->owner)", but now I get a
   warning at runtime that it is unsafe.  What do I need to do?
A: You need to use try_module_get(), and not call into the module if
   it fails (act as if it hasn't registered yet).  Note that you no
   longer need to check for NULL yourself, try_module_get() does that.

Q: My code used "GET_USE_COUNT(module)" to get the reference count.
A: Don't do that.  If module unloading is disabled, there is no
   reference count, and there is never a single value you can assign
   to.

Q: My code used "try_inc_mod_count(module)" to get the reference
   count.  Should I change it?
A: No hurry.  try_module_get() is exactly the same: the new name
   reflects that this is now the only way to get a reference.

Q: How does the code in try_module_get() work?
A: It disables preemption for a moment, checks the live flag, and then
   increments a per-cpu counter if the module is live.  This is even
   lighter-weight (in icache and cycles) than using a brlock, but has
   the same effect.  If CONFIG_MODULE_UNLOAD=n, it just becomes a
   check that the module is live.

Q: How does the module remove code work?
A: It stops the machine by scheduling threads for every other CPU,
   then they all disable interrupts.  At this stage we know that noone
   is in try_module_get(), so we can reliably read the counter.  If
   zero, or the rmmod user specified --wait, we set the live flag to
   false.  After this, the reference count should not increase, and
   each module_put() will wake us up, so we can check the counter
   again.

Q: Are these changes all so you could implement an in-kernel module
   linker?
A: No, they were to prevent load and unload races without altering
   every module, nor introducing drastic new requirements.

Q: Doesn't putting linking code into the kernel just add bloat?
A: The total linking code is about 200 generic lines, 100
   x86-specific lines.  The ia64 linking code is about 500 lines (it's
   the most complex).  Richard Henderson has a great suggestion for
   simplifying it furthur, which I'm implementing now.  "insmod" is
   now a single portable system call, meaning insmod can be written in
   about 20 lines of code.

   The previous code required to implement the two module loading
   system call, the module querying system call, and the /proc/ksyms
   output, required a little more code than the current x86 linker.

Q: Why not just fix the old code?
A: Because having something so intimate with the kernel in userspace
   greatly restricts what changes the kernel can make.  Moving into
   the kernel means I have implemented modversions, typesafe
   extensible module parameters and kallsyms without altering
   userspace in any way.  Future extensions won't have to worry about
   the modversions problem.

Q: Why not implement two-stage insert / two-stage delete?
A: Because I implemented it first and it sucked.  And because this
   *is* two-stage insert and two-stage delete, without exposing it to
   the modules using it, with the added advantage that the second
   stage is atomic (activation/deactivation is simply changing
   mod->live, modulo locking implementation magic detailed above).
   This prevents the race between deactivating the module and finding
   that someone has starting using it as you are deactivating it.

Hope that helps?
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
@ 2002-11-19  2:30 ` Werner Almesberger
  2002-11-24 22:50   ` Rusty Russell
  2002-11-19  2:40 ` John Levon
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Werner Almesberger @ 2002-11-19  2:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Doug Ledford, Alexander Viro

Rusty Russell wrote:
> Q: When does try_module_get(owner) fail?
> A: When the module is not ready to be entered (ie. still in
>    init_module) or it is being removed.  It fails to prevent you
>    entering the module as it is being discarded (init might fail, or
>    it's being removed).

I'd suggest "It fails in order to [...]" to avoid the "does work
exactly NOT as advertised" ambiguity ;-)

> Q: But modules call my register() routine which wants to call back
>    into one of the function pointers immediately, and so
>    try_module_get() fails!
> A: You're being called from the module, so someone already has a
>    reference (unless there's a bug), so you don't need a
>    try_module_get().

Hmm, I haven't really looked at your new code, but this sounds as
if try_module_get gets an exclusive lock. Is this true ? Doesn't
seem to make sense to me.

> Hope that helps?

Don't you want to repeat the golden rule at the end, just as a
polite reminder ? :-) (Just kidding.)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-19  2:30 ` Werner Almesberger
@ 2002-11-24 22:50   ` Rusty Russell
  2002-11-25  2:07     ` Werner Almesberger
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2002-11-24 22:50 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: linux-kernel, Doug Ledford, Alexander Viro

In message <20021118233047.P1407@almesberger.net> you write:
> Rusty Russell wrote:
> > Q: When does try_module_get(owner) fail?
> > A: When the module is not ready to be entered (ie. still in
> >    init_module) or it is being removed.  It fails to prevent you
> >    entering the module as it is being discarded (init might fail, or
> >    it's being removed).
> 
> I'd suggest "It fails in order to [...]" to avoid the "does work
> exactly NOT as advertised" ambiguity ;-)

Ah, thanks, good catch.  I changed it to "This prevents you".

> > Q: But modules call my register() routine which wants to call back
> >    into one of the function pointers immediately, and so
> >    try_module_get() fails!
> > A: You're being called from the module, so someone already has a
> >    reference (unless there's a bug), so you don't need a
> >    try_module_get().
> 
> Hmm, I haven't really looked at your new code, but this sounds as
> if try_module_get gets an exclusive lock. Is this true ? Doesn't
> seem to make sense to me.

No, it doesn't.  The question is badly phrased.  How about:

Q: But the modules' init routine calls my register() routine which
   wants to call back into one of the function pointers immediately,
   and so try_module_get() fails! (because the module is not finished
   initializing yet)
A: You're being called from the module, so someone already has a
   reference (unless there's a bug), so you don't need a
   try_module_get().

   This does mean that if you were to register a structure for
   *another* module (does anyone do this?) you'd need to have a
   reference to it.

> > Hope that helps?
> 
> Don't you want to repeat the golden rule at the end, just as a
> polite reminder ? :-) (Just kidding.)

Heh.

Well, if we continue to start modules unisolated, I need to rewrite
the FAQ anyway...

Thanks,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-24 22:50   ` Rusty Russell
@ 2002-11-25  2:07     ` Werner Almesberger
  2002-11-25  2:27       ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Werner Almesberger @ 2002-11-25  2:07 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Doug Ledford, Alexander Viro

Rusty Russell wrote:
> Q: But the modules' init routine calls my register() routine which
>    wants to call back into one of the function pointers immediately,
>    and so try_module_get() fails! (because the module is not finished
>    initializing yet)
> A: You're being called from the module, so someone already has a
>    reference (unless there's a bug), so you don't need a
>    try_module_get().

Hmm, I wouldn't call this the answer. How about:
 - Q: why does it fail ?
 - A: because you're initializing
 - solution: but since you're calling from a module, and the call
   goes back to the same module, you don't have to worry

This raises the question: why is this a special case ? The
registration function shouldn't have to know all these details.
(That's the whole point of try_module_get, isn't it ?)

Wouldn't it be possible to simply allow try_module_get also
while the module is initializing ?

> Well, if we continue to start modules unisolated, I need to rewrite
> the FAQ anyway...

Does "unisolated" mean that try_module_get would work ? If yes,
you've already solved the problem ;-)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-25  2:07     ` Werner Almesberger
@ 2002-11-25  2:27       ` Rusty Russell
  2002-11-25  6:39         ` Werner Almesberger
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2002-11-25  2:27 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: linux-kernel, Doug Ledford, Alexander Viro

In message <20021124230758.A1549@almesberger.net> you write:
> Rusty Russell wrote:
> > Q: But the modules' init routine calls my register() routine which
> >    wants to call back into one of the function pointers immediately,
> >    and so try_module_get() fails! (because the module is not finished
> >    initializing yet)
> > A: You're being called from the module, so someone already has a
> >    reference (unless there's a bug), so you don't need a
> >    try_module_get().
> 
> Hmm, I wouldn't call this the answer. How about:
>  - Q: why does it fail ?
>  - A: because you're initializing
>  - solution: but since you're calling from a module, and the call
>    goes back to the same module, you don't have to worry
> 
> This raises the question: why is this a special case ? The
> registration function shouldn't have to know all these details.
> (That's the whole point of try_module_get, isn't it ?)

Yes, this is a fairly rare case: I'm debating it now.  For example,
scsi calls back into the module which just registered, as does the
block layer (to probe for partitions).

> > Well, if we continue to start modules unisolated, I need to rewrite
> > the FAQ anyway...
> 
> Does "unisolated" mean that try_module_get would work ? If yes,
> you've already solved the problem ;-)

At the cost of exposing the module to initialization races.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-25  2:27       ` Rusty Russell
@ 2002-11-25  6:39         ` Werner Almesberger
  2002-11-25 22:43           ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Werner Almesberger @ 2002-11-25  6:39 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Doug Ledford, Alexander Viro

Rusty Russell wrote:
> At the cost of exposing the module to initialization races.

Hmm, what races are there that don't correspond to a bug in
some module ?

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-25  6:39         ` Werner Almesberger
@ 2002-11-25 22:43           ` Rusty Russell
  2002-11-26  2:26             ` Werner Almesberger
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2002-11-25 22:43 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: linux-kernel, Doug Ledford, Alexander Viro

In message <20021125033906.B1549@almesberger.net> you write:
> Rusty Russell wrote:
> > At the cost of exposing the module to initialization races.
> 
> Hmm, what races are there that don't correspond to a bug in
> some module ?

Ah, see other thread (weren't you at the kernel summit?).  There's
currently no way to abort if you've exposed interfaces and then
something fails ("don't do that" is great except noone knows that, and
it's not always possible or nice)

The old module code used to just ignore it.  My code tries to catch it
and leaves the module stuck (except if !CONFIG_MODULE_UNLOAD, in which
case it can't really detect it).

Since we have a way of isolating modules for the symmetric race on
exit, it makes sense to use it on entry (of course, try_inc_mod_count
would have to keep violating the rule during the transition).

Hope that helps,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-25 22:43           ` Rusty Russell
@ 2002-11-26  2:26             ` Werner Almesberger
  2002-11-26  3:16               ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Werner Almesberger @ 2002-11-26  2:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Doug Ledford, Alexander Viro

Rusty Russell wrote:
> Ah, see other thread

Argh, there are only about a hundred threads on modules ;-)

> (weren't you at the kernel summit?).

Yes, but I hadn't paid much attention to modules before, so I only
understood about half of what you said, sorry. It was interesting
to learn that there were actually so many problems, though :-)

> There's currently no way to abort if you've exposed interfaces and then
> something fails ("don't do that" is great except noone knows that, and
> it's not always possible or nice)

Hmm, if "expose interface" == "publish symbol", why can't you simply
defer publishing until after initialization completes ? If "expose
interface" == "register something somewhere", then this has to be
undone anyway. Or am I overlooking something here ?

Thanks,
- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-26  2:26             ` Werner Almesberger
@ 2002-11-26  3:16               ` Rusty Russell
  2002-11-26  7:12                 ` Werner Almesberger
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2002-11-26  3:16 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: linux-kernel, Doug Ledford, Alexander Viro

In message <20021125232610.A22825@almesberger.net> you write:
> Rusty Russell wrote:
> > There's currently no way to abort if you've exposed interfaces and then
> > something fails ("don't do that" is great except noone knows that, and
> > it's not always possible or nice)
> 
> Hmm, if "expose interface" == "publish symbol", why can't you simply
> defer publishing until after initialization completes ? If "expose
> interface" == "register something somewhere", then this has to be
> undone anyway. Or am I overlooking something here ?

Yes, but between doing and undoing (in the failure path) someone has
started using the module.  The old modutils would unload it underneath
them here.  I catch it (if CONFIG_MODULE_UNLOAD, otherwise I can't)
and yell "module is now stuck" and leave it hanging.

Given we have a method of isolating a module already, it seems logical
to use it to prevent exactly this race.  Unfortunately my last attempt
assumed noone did this, and broke IDE and SCSI (hence pissing
*everyone* off 8).

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-26  3:16               ` Rusty Russell
@ 2002-11-26  7:12                 ` Werner Almesberger
  2002-11-26 22:56                   ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Werner Almesberger @ 2002-11-26  7:12 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Doug Ledford, Alexander Viro

Rusty Russell wrote:
> Yes, but between doing and undoing (in the failure path) someone has
> started using the module.

But how ? Don't we have only two ways of calling a module, i.e.
by symbol, or by callback ? All callbacks that might call a module
must be protected with try_module_get, right ? (*)

(*) Actually, if the registration can be revoked, and the
    deregistration function does properly synchronize with on-going
    callbacks, you shouldn't need try_module_get either. E.g.
    del_timer_sync doesn't need to know about module owners.

So, if you make try_module_get work during initialization, and
modules don't publish their symbols before initialization is done,
there should be no problem ?

> Given we have a method of isolating a module already, it seems logical
> to use it to prevent exactly this race.

But that's mainly the symbol-unload race, considering that any call
through a function pointer requires try_module_get anyway. Without
try_module_get, there would also be callback-unload races if
callback de-registration doesn't synchronize (e.g. del_timer
vs. del_timer_sync).

By the way, it's also not so nice that there can't be
callbacks at removal, e.g.

service_unregister(...)
{
	...
	for_pending_requests(req) {
		...
		if (try_module_get(req->owner))
			req->fn(req,REQUEST_CANCELLED);
		else
			printk(KERN_CRIT "we just dropped a request on the "
			  "floor, how nice\n");
		...
	}
	...
}

Calling this from the module removal function would be
perfectly safe.

Actually ... can't you allow modules to be called until the
cleanup function has returned ?

> Unfortunately my last attempt
> assumed noone did this, and broke IDE and SCSI (hence pissing
> *everyone* off 8).

Two in one strike ain't bad ;-) Maybe we can find something that
gets rid of that pesky NFS, too, e.g. by adding

#define while if    /* enforce efficient programming practices */

to linux/kernel.h, or such ;-)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-26  7:12                 ` Werner Almesberger
@ 2002-11-26 22:56                   ` Rusty Russell
  0 siblings, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2002-11-26 22:56 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: linux-kernel, Doug Ledford, Alexander Viro

In message <20021126041212.B22825@almesberger.net> you write:
> Rusty Russell wrote:
> > Yes, but between doing and undoing (in the failure path) someone has
> > started using the module.
> 
> But how ? Don't we have only two ways of calling a module, i.e.
> by symbol, or by callback ? All callbacks that might call a module
> must be protected with try_module_get, right ? (*)

try_module_get() will succeed now during initialization, because the
module starts live.

> 
> (*) Actually, if the registration can be revoked, and the
>     deregistration function does properly synchronize with on-going
>     callbacks, you shouldn't need try_module_get either. E.g.
>     del_timer_sync doesn't need to know about module owners.

del_timer_sync is actually an oddity: most deregistration functions do
not block pending outstanding calls, they reference count and "unhook"
at deregistration, and delete when the refcount hits zero (if they do
anything at all 8).

(Note also: timers don't need to do try_module_get() since they can't
sleep).

> So, if you make try_module_get work during initialization, and
> modules don't publish their symbols before initialization is done,
> there should be no problem ?

Yes, but that's not the way things work currently: the interfaces to
reserve and publish are not separated.  And whether it's worth
separating them simply because of this, is the question.

> By the way, it's also not so nice that there can't be
> callbacks at removal, e.g.
> 
> service_unregister(...)
> {
> 	...
> 	for_pending_requests(req) {
> 		...
> 		if (try_module_get(req->owner))
> 			req->fn(req,REQUEST_CANCELLED);
> 		else
> 			printk(KERN_CRIT "we just dropped a request on the "
> 			  "floor, how nice\n");
> 		...
> 	}
> 	...
> }
> 
> Calling this from the module removal function would be
> perfectly safe.

Yes, and it's already in the list of exceptions: you're being called
by the module itself here.

Also, you're assuming that the coder chose not to do the
try_module_get() at request submission time.

Finally, there are some cases where a module can miss events while
unloading, but that's OK, because it's *guaranteed* to exit at this
point, so it must clean everything up in its cleanup routine anyway.
I couldn't think of an exception, can you?

> Actually ... can't you allow modules to be called until the
> cleanup function has returned ?

Only by splitting cleanup into "cleanup" and "destroy" (either by
having cleanup say "OK, I'm not live anymore" halfway though, or
having separate hooks).

We have over 1500 modules: not changing the interfaces to them was one
of the key goals.  If someone decides to later, fine.

> Two in one strike ain't bad ;-) Maybe we can find something that
> gets rid of that pesky NFS, too, e.g. by adding
> 
> #define while if    /* enforce efficient programming practices */
> 
> to linux/kernel.h, or such ;-)

Heh...  <sigh> Point taken.  I already decided to leave this one for
the moment, and do something more productive.

Thanks,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
  2002-11-19  2:30 ` Werner Almesberger
@ 2002-11-19  2:40 ` John Levon
  2002-11-24 23:02   ` Rusty Russell
  2002-11-19  3:10 ` kksymoops Jeff Garzik
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: John Levon @ 2002-11-19  2:40 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

On Tue, Nov 19, 2002 at 09:58:56AM +1100, Rusty Russell wrote:

>    The previous code required to implement the two module loading
>    system call, the module querying system call, and the /proc/ksyms
>    output, required a little more code than the current x86 linker.

This makes it sound like you're not bringing /proc/ksyms back (or an
equivalent to let userspace know where modules are loaded). I hope this
isn't the case...

regards
john

-- 
Khendon's Law: If the same point is made twice by the same person,
the thread is over.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-19  2:40 ` John Levon
@ 2002-11-24 23:02   ` Rusty Russell
  2002-11-25  0:38     ` John Levon
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2002-11-24 23:02 UTC (permalink / raw)
  To: John Levon; +Cc: linux-kernel

In message <20021119024032.GA99837@compsoc.man.ac.uk> you write:
> On Tue, Nov 19, 2002 at 09:58:56AM +1100, Rusty Russell wrote:
> 
> >    The previous code required to implement the two module loading
> >    system call, the module querying system call, and the /proc/ksyms
> >    output, required a little more code than the current x86 linker.
> 
> This makes it sound like you're not bringing /proc/ksyms back (or an
> equivalent to let userspace know where modules are loaded). I hope this
> isn't the case...

I implemented the minimal subset: it's trivial to put back.  The
important question is why do you want it?  Do you only want it when
CONFIG_MODULES=y?  Do you only want the exported symbols, or all
symbols?

If this is for oprofile to figure out where modules are, then an entry
in /proc/modules seems more appropriate, yes?

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-24 23:02   ` Rusty Russell
@ 2002-11-25  0:38     ` John Levon
  0 siblings, 0 replies; 22+ messages in thread
From: John Levon @ 2002-11-25  0:38 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

On Mon, Nov 25, 2002 at 10:02:00AM +1100, Rusty Russell wrote:

> I implemented the minimal subset: it's trivial to put back.  The
> important question is why do you want it?  Do you only want it when
> CONFIG_MODULES=y?  Do you only want the exported symbols, or all
> symbols?
> 
> If this is for oprofile to figure out where modules are, then an entry
> in /proc/modules seems more appropriate, yes?

Keith Owens pointed out that I need info on where each section is
mapped (consider the separate alloc of init/core sections; sure,
module_init() is an odd thing to profile but it's nice not to lose
samples). I don't care about the symbols themselves, but I believe Keith
does.

regards
john
-- 
Khendon's Law: If the same point is made twice by the same person,
the thread is over.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* kksymoops
  2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
  2002-11-19  2:30 ` Werner Almesberger
  2002-11-19  2:40 ` John Levon
@ 2002-11-19  3:10 ` Jeff Garzik
  2002-11-19 21:10   ` kksymoops Rusty Russell
  2002-11-19  3:50 ` kksymoops Jeff Garzik
  2002-11-23 22:23 ` Module Refcount & Stuff mini-FAQ Pavel Machek
  4 siblings, 1 reply; 22+ messages in thread
From: Jeff Garzik @ 2002-11-19  3:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

Rusty,

What is the status of kksymoops?

I realize that you have a full plate, but [for the short time it existed 
in mainline] this was a incredibly useful feature for end users 
providing bug reports to me (and others).

I'm _not_ asking "when", just wondering what the plan is to resuscitate 
kksymoops.

	Jeff




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: kksymoops
  2002-11-19  3:10 ` kksymoops Jeff Garzik
@ 2002-11-19 21:10   ` Rusty Russell
  2002-11-20 15:46     ` kksymoops Kai Germaschewski
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2002-11-19 21:10 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Kai Germaschewski

In message <3DD9AB88.4000102@pobox.com> you write:
> I'm _not_ asking "when", just wondering what the plan is to resuscitate 
> kksymoops.

Looks like someone pushed my patch.  Erm, OK, wonder if it works on
x86? 8)

My mental TODO list looked something like this:
1) Drop the optimization which checks against addr between _stext and
   _etext, as this skips __init functions on most archs.
2) Only put in the symbols for functions (currently CONFIG_KALLSYMS=y
   adds 400k on my laptop: ouch!).
3) Keith asked me to rename it, so as not to get confused with the
   previous patches and kgdb support).  I guess it's too late for this
   one.
4) Use a simple scheme like the mini-oopser did to reduce the symbol
   table size furthur (I got it down to 70k IIRC).
5) See if Kai prefers the compile step inside the Makefile rather than
   in the script.
6) It'd be nice if CONFIG_KALLSYMS=m worked.

If someone wants to champion any or all of these, be my guest, there's
plenty to go around 8)

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: kksymoops
  2002-11-19 21:10   ` kksymoops Rusty Russell
@ 2002-11-20 15:46     ` Kai Germaschewski
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Germaschewski @ 2002-11-20 15:46 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Jeff Garzik, linux-kernel

On Wed, 20 Nov 2002, Rusty Russell wrote:

> In message <3DD9AB88.4000102@pobox.com> you write:
> > I'm _not_ asking "when", just wondering what the plan is to resuscitate 
> > kksymoops.
> 
> Looks like someone pushed my patch.  Erm, OK, wonder if it works on
> x86? 8)

I think Linus himself did that ;)

> My mental TODO list looked something like this:
> 1) Drop the optimization which checks against addr between _stext and
>    _etext, as this skips __init functions on most archs.

Well, this was put in to avoid all kind of garbage in the traces, so it 
shouldn't just go without replacement. Probably one could even get it 
correct now, using ->module_init() and ->module_core() (just set them for 
the core kernel as well).

> 2) Only put in the symbols for functions (currently CONFIG_KALLSYMS=y
>    adds 400k on my laptop: ouch!).

I'm not to sure about this, I sometimes find it useful to have variables 
on the stack identified correctly.

> 3) Keith asked me to rename it, so as not to get confused with the
>    previous patches and kgdb support).  I guess it's too late for this
>    one.

Nothing wrong with a follow-up patch, is it?

> 5) See if Kai prefers the compile step inside the Makefile rather than
>    in the script.

I'll actually have to look into this. The script is probably fine.

> 6) It'd be nice if CONFIG_KALLSYMS=m worked.

Shouldn't be too hard.

Well, I know talk is cheap. I'll try to find some time to actually look
into some of these issues and come up with patches.

--Kai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: kksymoops
  2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
                   ` (2 preceding siblings ...)
  2002-11-19  3:10 ` kksymoops Jeff Garzik
@ 2002-11-19  3:50 ` Jeff Garzik
  2002-11-23 22:23 ` Module Refcount & Stuff mini-FAQ Pavel Machek
  4 siblings, 0 replies; 22+ messages in thread
From: Jeff Garzik @ 2002-11-19  3:50 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

Jeff Garzik wrote:

> Rusty,
>
> What is the status of kksymoops?



LOL -- look at me eat my words.

I see it's now in Linus's tree -- thanks much!

	Jeff




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
                   ` (3 preceding siblings ...)
  2002-11-19  3:50 ` kksymoops Jeff Garzik
@ 2002-11-23 22:23 ` Pavel Machek
  2002-11-25  0:26   ` Rusty Russell
  4 siblings, 1 reply; 22+ messages in thread
From: Pavel Machek @ 2002-11-23 22:23 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel, Doug Ledford, Alexander Viro

Hi!

> Q: How does the module remove code work?
> A: It stops the machine by scheduling threads for every other CPU,
>    then they all disable interrupts.  At this stage we know that noone
>    is in try_module_get(), so we can reliably read the counter.  If
>    zero, or the rmmod user specified --wait, we set the live flag to
>    false.  After this, the reference count should not increase, and
>    each module_put() will wake us up, so we can check the counter
>    again.

Where is this implemented? I guess I need this for swsusp...

								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
  2002-11-23 22:23 ` Module Refcount & Stuff mini-FAQ Pavel Machek
@ 2002-11-25  0:26   ` Rusty Russell
  0 siblings, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2002-11-25  0:26 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

In message <20021123222334.GB5170@elf.ucw.cz> you write:
> Hi!
> 
> > Q: How does the module remove code work?
> > A: It stops the machine by scheduling threads for every other CPU,
> >    then they all disable interrupts.  At this stage we know that noone
> >    is in try_module_get(), so we can reliably read the counter.  If
> >    zero, or the rmmod user specified --wait, we set the live flag to
> >    false.  After this, the reference count should not increase, and
> >    each module_put() will wake us up, so we can check the counter
> >    again.
> 
> Where is this implemented? I guess I need this for swsusp...

I'm not so sure, but it's worth a look.  Look for stop_refcounts() in
module.c.  BTW, I call this a "bogolock".

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
@ 2002-11-19 19:18 Adam J. Richter
  0 siblings, 0 replies; 22+ messages in thread
From: Adam J. Richter @ 2002-11-19 19:18 UTC (permalink / raw)
  To: linux-kernel, rusty

On November 18, 2002, Rusty Russell wrote:
>[ Suggestions welcome ]
>
>Golden Rule: If you are calling though a function pointer into a
>(different) module, you must hold a reference to that module.
>Otherwise you risk sleeping in the module while it is unloaded.

Although it is addressed in your subsequent questions, I think people
might read this and think that they should pepper their code with
unneccessary try_module_get()'s, often for their own module.  I'd
recommend demoting this "golden rule to just another FAQ entry."
Otherwise, there is some implication that if you remember just the
"golden rule", that will be constructive, but I can see attempting to
apply this "golden rule" without following the rest other FAQ entries
as harmful in common cases.  If you really like the "golden rule"
format, perhaps you could add a couple of others based on issues that
you addresss later in your FAQ, such as the following.

"Golden Rule": Do not call try_module_get() unnecessarily.  In most
cases, you know that the module reference count cannot drop to zero
because some higher level facility is holding a reference and will
not release that reference until you return, such as reference count
from an open file descriptor or network interface.

Another "Golden Rule": If you find yourself explicitly calling
try_module_get() and module_put() for a common type of module, then
probably some higher level facility really should be taking care of
this for you.  This is addressed under your question "Do I really need
to put try_module_get() before every function ptr call?", but I want
phrase this as something you should proactively look for.

Perhaps there ought to be a module pointer in struct device_driver.
{,un}register_{device,driver}() would *not* modify the counts (that
would prevent module removal entirely), but it would eliminate the
need to sprinkle module pointers from most places that have them and
may allow a little bit of code consolidation in places.

>Q: How do I get a reference to a module?
>A: Usually, a successful call to try_module_get(owner).  You don't
>   need to check for owner != NULL, BTW.
>
>Q: When does try_module_get(owner) fail?
>A: When the module is not ready to be entered (ie. still in
>   init_module) or it is being removed.  It fails to prevent you
>   entering the module as it is being discarded (init might fail, or
>   it's being removed).

	1. try_module_get() introduces new error legs that will get
little testing.

	2. try_module_get() introduces new failures that other software
has to anticipate.  For example, if I try to mount an ext3 file system
and it happens that ext3 was being automatically removed (for lack of
use) at this time, the attempt to get the ext3 filesystem can fail
without request_module() being called to reload it.

	3. try_module_get() introduces yet another "most fundamental"
lock type.  We have a bunch of facilities vying to do that, and I
think it's going to be a source of bugs.  It would be better to avoid
introducing a new layer of locking if possible.

	4. This kind of race is not really specific to modules, although
they may be the only example that comes up in practice.

	I agree that this approach is worth these costs if it is
the only way to avoid module remove races, or if it is the best way.
However, I think there may be potentially better alternatives.

	Here is what I have in mind.  Basically, a module's module_init
function could register an rwsem (not sure what the difference is with
rwlock; I just it because there is already one in struct bus_type).
sys_delete_module take an exclusive lock it before checking the
reference count, and it would remain held in the the module_exit
routine, which would be responsible for releasing the lock.  Just to
enable modules that register and unregister multiple things, it
would be possible to register more than one such lock, although it
is important that these locks normally not be held simultaneously
for any other reason (otherwise there might be deadlock).

	Anyhow, here is some pseudo-code.  The change to delete_module
is most relevant.  I don't know whether I should by using rwsem or rwlock
for this, by the way.  With the example of get_fs_type and using the
generic driver layer, there is no module unload race, there is no
failure path for mod_inc_use_count, and there is no situation where
attempting to get a filesystem that is in the midst of being unloaded
will result in failure (instead, the access will wait until the filesystem
has been unregistered, and the modprobe will reload it).  Under this
scheme, there is no need for a new kind of locking primitive, and much of
this facility is not tied to modules.

struct rwsem_chain {
	struct rw_semaphore *rwsem;
	struct list_head list;
};

/* Adds the rwsem in _sorted_ order, by, say, address of rwsem. */
void rwsems_add_sorted(struct list_head *head, struct rwsem_chain *element);

/* Gets an exclusive lock on every semaphore in the chain, but duplicate
   semaphores on the list are acquired only once. */
void rwsems_wlock(struct list_head *head);

/* Remove element from the list and then release its rwsem *if* there
   are no duplicate rwsem's on the list. */
void rwsems_element_release(struct rwsem_chain *element);

struct module {
	...
	/* If this list is non-empty, then callers to MOD_INC_REF_COUNT
	   must hold one of the locks on this list (it can be a shared
	   hold). */
	struct list_head	rwsems;
};

/* in kernel/module.c: */
asmlinkage long
sys_delete_module(const char *name_user, unsigned int flags) {
	...
	rwsem_chain_wlock(mod->rwsems);
        if ((flags & O_NONBLOCK) && module_refcount(mod) != 0) {
		...
	}
	...
	free_module(mod);/*Destroy routine is responsible for releasing locks*/
	...
}

/* Here is an example of using this facility.  Let's port the generic device
   layer. */

struct device_driver {
	...
	struct module		*module;
	struct rwsem_chain	rwsem_chain;
};

int driver_register(struct device_driver * drv)
{
	...
	if (drv->module) {
		drv->rwsem_chain.rwsem = &drv->bus->rwsem;
		rwsems_add_sorted(&module->rwsems, &drv->rwsem_chain);
	}
}

void put_driver(struct device_driver * drv)
{
	...
	rwsems_element_release(&drv->rwsem_chain);
}

struct device_driver *
get_driver_by_name(struct bus_type *bus_type, const char *name)
{
	down_read(&bus->rwsem);
	list_for_each(node,&bus_type->drivers) {
		struct driver * dev = get_device(to_drv(node));
		if (strcmp(dev->name, name) == 0) {
			MOD_INC_REF_COUNT(drv->module);
			up_read(&bus->rwsem);
			return drv;
		}
	}
	up_read(&bus->rwsem);
	return NULL;
}

struct device_driver *
get_driver_by_name_modprobe(struct bus_type *bus_type, const char *name)
{
	struct device_driver *result = get_driver_by_name(bus_type, name);

	if (!result && request_module(name) == 0)
		return get_driver_by_name(bus_type, name);
	else
		return result;
}	

/* You could port the file system list interface to file system
   by having every file system do

   rwsems_add_sorted(&THIS_MODULE->rwsems, &local_rwsems_element);

   ...or you could embed a struct device_driver in struct file_system_type
   and a struct device in struct super_block.

   Either way, you would then a have raceless get_fs_type does not
   return failure when the file system being sought is in the midst
   of being unloaded. */

struct bus_type fs_bus_type;

int register_filesystem(struct file_system_type * fs)
{
	...
	down_write(&fs_bus_type.rwsem);
	...
	fs->driver.module = fs->module;
	fs->driver.bus_type = &fs_bus_type;
	register_driver(&fs->driver);
	...
	up_write(&fs_bus_type.rwsem);
}

struct file_system_type *get_fs_type(const char *name)
{
	struct device_driver *driver = get_driver_by_name_modprobe(name);

	if (driver == NULL)
		return NULL;

	return container_of(driver, struct file_system_type, driver);
}

	I may comment further on your FAQ later, especially the stuff
about having the module loader in the kernel, but I have to go now, and
that's a separate topic anyhow.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Module Refcount & Stuff mini-FAQ
@ 2002-11-20 12:25 Adam J. Richter
  0 siblings, 0 replies; 22+ messages in thread
From: Adam J. Richter @ 2002-11-20 12:25 UTC (permalink / raw)
  To: rusty; +Cc: linux-kernel

Although Rusty Russell sent me a preliminary response without cc'ing
linux-kernel (I think because it was so preliminary), but I'm
re-cc'ing linux-kernel because my response explains a bit more about
my proposal to eliminate most or all try_module_get's racelessly,
which is something others might find informative or might want to
comment on.  There is obviously nothing particularly private in this
message.

Rusty Russell wrote:
>In message <200211191918.LAA11641@adam.yggdrasil.com> you [Adam Richter] write:
>> On November 18, 2002, Rusty Russell wrote:
[...]
>> 	1. try_module_get() introduces new error legs that will get
>> little testing.

>try_module_get() is exactly equivalent to try_inc_mod_count().  It's
>not actually a new thing: deprecating the other module refcount
>methods is the new thing.

	OK, s/try_module_get/try_inc_mod_count/.

	Same issue.

>I've only glanced over your locking proposal, but the most obvious
>things to me are that grabbing a rwlock strikes me as a little heavy
>for a fundmantal primitive that might be used anywhere, and secondly I
>want to grab it in a bh handler so I can modularize IPv4.

	It appears that there already is an appropriate mutex to use
in ipv4: rtnl_sem.  My code currently uses an rw_semaphore instead of
a semaphore, but it could either be changed to call a list of
arbirtrary locking functions, or, probably simpler, rtnl_sem could be
changed to an rw_semaphore.  The latter is particularly appealing,
because there already is code in the net subdirectory that seems to be
written to take advantage of this change (there are already distinct
rtnl_shlock and rtnl_exlock macros).  So, the using a locking primitive
that can lock rather than spin has already been solved apparently in ipv4.

	I run modularized ipv4 already, so it should be easy for me to
check if your module loader gets to the point where it works enough
for me to complete a boot (or I could try to patch to Keith Owen's
module system).

	As far as the "strikes me as a little heavy" phrase goes,
that's obviously not a clear identification of a cost, so I can only
try guess what costs you might be referring to talk about them.  The
memory usage of struct rwsem_chain would be 3 pointers (12 or 24
bytes) per detected device (net, char, block, etc.), file_system_type
and perhaps some other resources of which there are usually a handful
of each.  My guess is that you'd probably have under 100 allocated on
a typical computer: one for each detected network interface, one for
each detected block device, one for each detected character device,
and some elsewhere, perhaps an average of 2-3 per loaded module,
probably as much space will be saved from eliminating error paths.
The locks themselves generally generally already exist, and these
amount to one lock per type of device (per "bus_type" from a generic
driver standpoint): one lock for network devices, one lock for file
system types, etc.  It's a very small number and the locks already
exist in every case that I've seen anyhow.  So the net memory costs
should be approximately zero and it may even be a net memory savings.

	If by "a little heavy" you were referring to lock contention,
it's important to realize that this proposal sets up lists of locks,
it does not introduce new attempts to grab these locks or attempts to
hold them much longer, with the exception of the module's unload
function.  Even there, these are locks that would be taken at some
point in the unload function anyhow.  Also, these are not spin locks
and attempts to block with these rw_semaphore's held will be rare to
nonexistant.  So, the waiting on the locks should cause the CPU to
switch to something useful, not just spin.

	This change will likely eliminate bugs, simplify testing and
code walk throughs (fewer untested branches), and, more importantly,
eliminate flakey behavior that is not considered a bug from the
try_inc_mod_count perspective, but is a bug from a functional
standpoint.  For example, if you do "mount -t iso9660 /dev/cdrom
/mnt", a try_inc_mod_count implementation can generate a result like
"iso9660: unknown filesystem" on a system that has iso9660 just
because of a timing fluke, whereas under my scheme you will reliably
get the iso9660 filesystem every time.

	It is also important to realize that this change can be done
incrementally.  Under this scheme, try_inc_mod_count will just succeed
all the time for users of this facility.  Note that the restored
unconditional __MOD_INC_USE_COUNT routine could have an optional
debugging feature: it would call BUG() if the caller does not have at
least a shared lock on one of the rw_semaphore's in the module's list.
__MOD_INC_USE_COUNT is generally not called in IO paths, so it
wouldn't be that costly to turn that check on.

	If we are able to eliminate all calls to try_inc_mod_count,
then that "lock the entire computer" code can be deleted, as well as
try_inc_mod_count.  In the unlikely even that try_inc_mod_count cannot
be eliminated, I think the reliability that this change would bring to
those places that can use this facility would still be enough to
warrant keeping this facility.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2002-11-26 22:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-11-18 22:58 Module Refcount & Stuff mini-FAQ Rusty Russell
2002-11-19  2:30 ` Werner Almesberger
2002-11-24 22:50   ` Rusty Russell
2002-11-25  2:07     ` Werner Almesberger
2002-11-25  2:27       ` Rusty Russell
2002-11-25  6:39         ` Werner Almesberger
2002-11-25 22:43           ` Rusty Russell
2002-11-26  2:26             ` Werner Almesberger
2002-11-26  3:16               ` Rusty Russell
2002-11-26  7:12                 ` Werner Almesberger
2002-11-26 22:56                   ` Rusty Russell
2002-11-19  2:40 ` John Levon
2002-11-24 23:02   ` Rusty Russell
2002-11-25  0:38     ` John Levon
2002-11-19  3:10 ` kksymoops Jeff Garzik
2002-11-19 21:10   ` kksymoops Rusty Russell
2002-11-20 15:46     ` kksymoops Kai Germaschewski
2002-11-19  3:50 ` kksymoops Jeff Garzik
2002-11-23 22:23 ` Module Refcount & Stuff mini-FAQ Pavel Machek
2002-11-25  0:26   ` Rusty Russell
  -- strict thread matches above, loose matches on Subject: below --
2002-11-19 19:18 Adam J. Richter
2002-11-20 12:25 Adam J. Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox