* [RFC] Is an alternative module interface needed/possible?
2003-02-17 17:04 ` Werner Almesberger
@ 2003-02-17 23:09 ` Roman Zippel
2003-02-18 1:18 ` Werner Almesberger
0 siblings, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-17 23:09 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel
Hi,
(Subject changed to hopefully get a bit more attention.)
On Mon, 17 Feb 2003, Werner Almesberger wrote:
> > If we exclude the possibly-wait-forever-option, do you see the problem
> > for dynamic objects which also contain references to static data/
> > functions?
>
> You mean that two locking mechanisms are used, where one of them
> shouldn't be doing all that much ? Well, yes.
>
> Now, is this a problem, or just a symptom ? I'd say it's a symptom:
> we already have a perfectly good locking/synchronization method,
> and that's through the register/unregister interface, so the
> module-specific part is unnecessary.
If it was perfectly good, we hadn't a problem. :)
> That much about the theory. Of course, in real life, we have to
> face a few more problems:
>
> - if callbacks can happen after apparently successful "unregister",
> we die
> - if accesses to other static data owned by a module can happen
> after apparently successful "unregister", we may die
> - if a module doesn't "unregister" at all, we die too
>
> But all these problems equally affect code that does other things
> that can break a callback/access, e.g. if we destroy *de->data
> immediately after remove_proc_entry returns.
>
> So this is not a module-specific problem.
You're skipping ahead. You haven't solved the problem yet, but you're
already jumping to conclusions. :-)
Remember, that we want to savely remove a proc entry and as added bonus,
we only want a single reference count. Let's look first at the possible
solutions:
module count: by design this only works for entries, which are removed
during module exit, but not for dynamic entries.
failure: if the object is still busy, we just return -EBUSY. This is
simple, but this doesn't work for modules, since during module exit you
can't fail anymore.
callbacks: the callback function itself had to be protected somehow, so
just to unregister a proc entry, you have to register a callback. To
unregister that callback, it would be silly to use another callback and
failure doesn't work with modules, so that only leaves the module count.
The last solution sounds complicated, but exactly this is done for
filesystems and we didn't really get rid of the second reference count, we
just moved it somewhere else, where it hurts least.
Without interface changes this is also the only generic option to export
dynamic data - the drivers have to get a filesystem like interface (or
just become filesystem themselves).
The very basic reason which prevents another solution is that static data
(which includes functions) is controlled by the generic module code and
dynamic data is controlled by the driver itself. It's obvious that we
can't give the module code control over dynamic data, on the other hand
would it be possible to give the driver control over the static data? This
way it suddenly it becomes a module-specific problem - how can we give
drivers more control over its data?
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-17 23:09 ` [RFC] Is an alternative module interface needed/possible? Roman Zippel
@ 2003-02-18 1:18 ` Werner Almesberger
2003-02-18 4:54 ` Rusty Russell
2003-02-19 1:48 ` Roman Zippel
0 siblings, 2 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-18 1:18 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel
Roman Zippel wrote:
> If it was perfectly good, we hadn't a problem. :)
I said we he have the method. Now we need to use it properly :-)
> You're skipping ahead. You haven't solved the problem yet, but you're
> already jumping to conclusions. :-)
The solution is another issue. I simply stated that the problem
happens with or without modules.
> module count: by design this only works for entries, which are removed
> during module exit, but not for dynamic entries.
Works only for modules, not good.
> failure: if the object is still busy, we just return -EBUSY. This is
> simple, but this doesn't work for modules, since during module exit you
> can't fail anymore.
That's a modules API problem. And yes, I think modules should
eventually be able to say that they're busy.
> callbacks: the callback function itself had to be protected somehow, so
> just to unregister a proc entry, you have to register a callback. To
> unregister that callback, it would be silly to use another callback and
If all you want to do is to decrement the module count, you could
have a global handler for this that is guaranteed not to reside
in a module.
By the way, a loong time ago, in the modules thread, I suggested
a "decrement_module_count_and_return" function [1]. Such a
construct would be useful in this specific case.
[1] http://www.uwsg.iu.edu/hypermail/linux/kernel/0207.0/0147.html
> failure doesn't work with modules, so that only leaves the module count.
And how would you ensure correct access to static data in the
absence of modules ? Any solution that _requires_ a module count
looks highly suspicious to me.
Likewise, possibly dynamically allocated data that is synchronized
by the caller, e.g. "user" in "struct proc_dir_entry".
> The last solution sounds complicated, but exactly this is done for
> filesystems and we didn't really get rid of the second reference count, we
> just moved it somewhere else, where it hurts least.
Hmm, I'm confused. With "filesystem", do you mean the file system
driver per se (e.g. "ext3"), or a specific instance of such a file
system (e.g. /dev/hda1 mounted on /) ?
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 1:18 ` Werner Almesberger
@ 2003-02-18 4:54 ` Rusty Russell
2003-02-18 7:20 ` Werner Almesberger
2003-02-19 1:48 ` Roman Zippel
1 sibling, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2003-02-18 4:54 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Roman Zippel, kuznet, davem, kronos, linux-kernel
In message <20030217221837.Q2092@almesberger.net> you write:
> > failure: if the object is still busy, we just return -EBUSY. This is
> > simple, but this doesn't work for modules, since during module exit you
> > can't fail anymore.
Of course, they never could fail module exit. You could use
can_unload() which was only protected by the BKL, hence so rarely
used: grep 2.4.20 to see all four users.
Having a separate can_unload function means a window between asking
can_unload() and doing the actual unload. Doing it in the function
itself (ie. cleanup returns an int, or a separate "prepare_to_unload"
kind of function) means you have to deal with the "I started
deregistering my stuff, but then had to re-register them" which leaves
a race where the module is partially unavailable (ie. the spurious
failure problem).
Both these problems are soluble, but they're not trivial.
> That's a modules API problem. And yes, I think modules should
> eventually be able to say that they're busy.
Take a look at the current ip_conntrack code. It keeps its own
reference count and spins until it hits zero in unload, because
otherwise it would never be unloadable (it attaches a callback to each
packet, so logically it would bump its refcount there). In 2.5 it can
use try_module_get() and be unloaded with rmmod --wait in this worst
case (I'm slack, patching is on my todo list).
Security modules have the same issues, and Greg Kroah-Hartmann was
telling me USB has it as well.
You can see why I considered deprecating module unloading: it's a hard
problem, and fixing it in general probably means massive changes.
Which I'm happy to leave to someone else.
> By the way, a loong time ago, in the modules thread, I suggested
> a "decrement_module_count_and_return" function [1]. Such a
> construct would be useful in this specific case.
Stephen and I even implemented one, for x86 (see below). But
implementing the matching "enter and inc" case remains problematic, so
I dropped that idea.
Anyway, good luck!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Name: module_put_return primitive for x86
Author: Rusty Russell
Status: Experimental
Depends: Module/forceunload.patch.gz
D: This patch implements module_put_return() for x86, which allows a
D: module to control its own reference counts in some cases. A module
D: must never use module_put() on itself, since this may result in the
D: module being removable immediately: this is the alternative, which
D: atomically decrements the count and returns.
D:
D: Each architecture will need to implement this for preempt.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/arch/i386/kernel/module.c .13134-linux-2.5.38.updated/arch/i386/kernel/module.c
--- .13134-linux-2.5.38/arch/i386/kernel/module.c 2002-09-25 02:02:28.000000000 +1000
+++ .13134-linux-2.5.38.updated/arch/i386/kernel/module.c 2002-09-25 02:02:56.000000000 +1000
@@ -28,6 +28,9 @@
#define DEBUGP(fmt , ...)
#endif
+/* For the magic module return. */
+struct module_percpu module_percpu[NR_CPUS];
+
static void *alloc_and_zero(unsigned long size)
{
void *ret;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/arch/i386/lib/module.S .13134-linux-2.5.38.updated/arch/i386/lib/module.S
--- .13134-linux-2.5.38/arch/i386/lib/module.S 1970-01-01 10:00:00.000000000 +1000
+++ .13134-linux-2.5.38.updated/arch/i386/lib/module.S 2002-09-25 02:02:56.000000000 +1000
@@ -0,0 +1,33 @@
+/* Icky, icky, return trampoline for dying modules. Entered with
+ interrupts off. (C) 2002 Stephen Rothwell, IBM Corporation. */
+
+#include <asm/thread_info.h>
+
+.text
+.align 4
+.globl __magic_module_return
+
+#define MODULE_PERCPU_SIZE_ORDER 3
+
+__magic_module_return:
+ /* Save one working variable */
+ pushl %eax
+
+ /* Get CPU number from current. */
+ GET_THREAD_INFO(%eax)
+ movl TI_CPU(%eax), %eax
+
+ /* Push module_percpu[cpu].flags on the stack */
+ shll $MODULE_PERCPU_SIZE_ORDER, %eax
+ pushl module_percpu(%eax)
+
+ /* Put module_percpu[cpu].returnaddr into %eax */
+ movl module_percpu+4(%eax), %eax
+
+ /* Push returnaddr and restore eax */
+ xchgl %eax, 4(%esp)
+
+ /* Restore interrupts */
+ popf
+ /* Go home */
+ ret
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/include/asm-i386/module.h .13134-linux-2.5.38.updated/include/asm-i386/module.h
--- .13134-linux-2.5.38/include/asm-i386/module.h 2002-09-25 02:02:28.000000000 +1000
+++ .13134-linux-2.5.38.updated/include/asm-i386/module.h 2002-09-25 02:02:56.000000000 +1000
@@ -8,4 +8,33 @@ struct mod_arch_specific
#define Elf_Shdr Elf32_Shdr
#define Elf_Sym Elf32_Sym
#define Elf_Ehdr Elf32_Ehdr
+
+/* Non-preemtible decrement the refcount and return. */
+#define module_put_return(firstarg , ...) \
+do { \
+ unsigned int cpu; \
+ unsigned long flags; \
+ \
+ local_irq_save(flags); \
+ if (unlikely(module_put(THIS_MODULE)) { \
+ module_percpu[cpu].flags = flags; \
+ module_percpu[cpu].returnaddr = ((void **)&(firstarg))[-1]; \
+ ((void **)&(firstarg))[-1] = __magic_module_return; \
+ } else \
+ local_irq_restore(flags); \
+ return __VA_ARGS__; \
+} while(0)
+
+/* FIXME: Use per-cpu vars --RR */
+struct module_percpu
+{
+ unsigned long flags;
+ void *returnaddr;
+};
+
+extern struct module_percpu module_percpu[NR_CPUS];
+
+/* Restore flags and return to caller. */
+extern void __magic_module_return(void);
+
#endif /* _ASM_I386_MODULE_H */
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 4:54 ` Rusty Russell
@ 2003-02-18 7:20 ` Werner Almesberger
2003-02-18 12:06 ` Roman Zippel
2003-02-18 12:35 ` Roman Zippel
0 siblings, 2 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-18 7:20 UTC (permalink / raw)
To: Rusty Russell; +Cc: Roman Zippel, kuznet, davem, kronos, linux-kernel
Grr, I hate this. I just typed a long reply to your posting, and
wanted to finish it with a reminder that the issue is more general
than just modules, so that we shouldn't look at modules yet. Then
I realized of course that everything above was about modules :-(
I don't think we'll make much progress if we keep on mixing issues
of interface correctness, current module constraints, and possible
module interface changes, all that with performance considerations
and minimum invasive migration plans thrown in. So I'd suggest the
following sequence:
1) do we agree that the current registration/deregistration
interfaces are potential hazards for their users, be they
modules or not ?
2) one we agree with this, we can look for mechanisms that
solve this, again for general users, which may or may not
be modules
3) last but not least, we can look at what this means for
modules (and that's where beautiful tools like
"module_put_return" (thanks !), or also ideas about
module_exit redesign have their place)
4) "the root of all evil ...". Okay, and now to which level
of hell would all this shoot our performance ? (And back
we go to step 2.)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 7:20 ` Werner Almesberger
@ 2003-02-18 12:06 ` Roman Zippel
2003-02-18 14:12 ` Werner Almesberger
2003-02-18 12:35 ` Roman Zippel
1 sibling, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-18 12:06 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel
Hi,
On Tue, 18 Feb 2003, Werner Almesberger wrote:
> I don't think we'll make much progress if we keep on mixing issues
> of interface correctness, current module constraints, and possible
> module interface changes, all that with performance considerations
> and minimum invasive migration plans thrown in. So I'd suggest the
> following sequence:
>
> 1) do we agree that the current registration/deregistration
> interfaces are potential hazards for their users, be they
> modules or not ?
> 2) one we agree with this, we can look for mechanisms that
> solve this, again for general users, which may or may not
> be modules
> 3) last but not least, we can look at what this means for
> modules (and that's where beautiful tools like
> "module_put_return" (thanks !), or also ideas about
> module_exit redesign have their place)
> 4) "the root of all evil ...". Okay, and now to which level
> of hell would all this shoot our performance ? (And back
> we go to step 2.)
Basically I can agree with this, although I'd like to avoid that we
iterate too much over these steps, as it would too easily divert the
discussion to other things, so I'd rather take smaller steps and keep the
scope a bit broader.
Another point is the perfomance, which is not that important right now.
I'm more interested in the general complexity, it's simply easier to
optimize a simple design than a very complex design.
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 7:20 ` Werner Almesberger
2003-02-18 12:06 ` Roman Zippel
@ 2003-02-18 12:35 ` Roman Zippel
2003-02-18 14:14 ` Werner Almesberger
1 sibling, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-18 12:35 UTC (permalink / raw)
To: Werner Almesberger; +Cc: linux-kernel
Hi,
On Tue, 18 Feb 2003, Werner Almesberger wrote:
> I don't think we'll make much progress if we keep on mixing issues
> of interface correctness, current module constraints, and possible
> module interface changes, all that with performance considerations
> and minimum invasive migration plans thrown in. So I'd suggest the
> following sequence:
Um, another point, let's ignore "minimum invasive migration plans", if we
found a good solution, we can still figure out how to get there smoothly,
so this shouldn't be a primary concern.
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 14:12 ` Werner Almesberger
@ 2003-02-18 12:45 ` Thomas Molina
2003-02-18 17:22 ` Werner Almesberger
1 sibling, 0 replies; 30+ messages in thread
From: Thomas Molina @ 2003-02-18 12:45 UTC (permalink / raw)
To: linux-kernel
On Tue, 18 Feb 2003, Werner Almesberger wrote:
> Good :-) I want to avoid modules as much as possible, because
> they've extensively been tackled in the past (which didn't help
> much making the interfaces better), and also because they're
> just a bit too political an issue.
>
> Okay, this brings us to the issue of broken interfaces. Do we
> have agreement that there are cases where interfaces like
> remove_proc_entry, in their current state, cannot be used
> correctly ?
I hope this discussion is taking place in the context of looking forward
towards something to implement in 2.7. IMHO we are much too late in the
2.5 cycle to implement this now.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 12:06 ` Roman Zippel
@ 2003-02-18 14:12 ` Werner Almesberger
2003-02-18 12:45 ` Thomas Molina
2003-02-18 17:22 ` Werner Almesberger
0 siblings, 2 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-18 14:12 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel
(I think we should limit the Cc:s to Roman, Rusty, the list, and me,
and leave the others in peace. Please yell if you really want that
extra copy :-)
Roman Zippel wrote:
> Basically I can agree with this, although I'd like to avoid that we
> iterate too much over these steps, as it would too easily divert the
> discussion to other things, so I'd rather take smaller steps and keep the
> scope a bit broader.
Good :-) I want to avoid modules as much as possible, because
they've extensively been tackled in the past (which didn't help
much making the interfaces better), and also because they're
just a bit too political an issue.
Okay, this brings us to the issue of broken interfaces. Do we
have agreement that there are cases where interfaces like
remove_proc_entry, in their current state, cannot be used
correctly ?
Example:
static ...callback...(... struct file *file ...)
{
void *user = PDE(file->f_dentry->d_inode)->data;
...
do something with "*user"
...
}
...
struct proc_dir_entry *de = create_proc_entry(...);
void *my_data;
de->data = my_data = kmalloc(...);
...
remove_proc_entry(...);
/* what happens with "my_data", formerly known as "de->data" ? */
a) kfree it. May cause an oops or other problems if we're in the
middle of the callback.
b) don't kfree it. So we now have a (hopefully small) memory leak.
Problem: my_data may point to a lot more than just some tiny
allocation.
Okay so far ?
(Possible solutions on the next, slid^H^H^H^Hposting :-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 12:35 ` Roman Zippel
@ 2003-02-18 14:14 ` Werner Almesberger
0 siblings, 0 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-18 14:14 UTC (permalink / raw)
To: Roman Zippel; +Cc: linux-kernel
Roman Zippel wrote:
> Um, another point, let's ignore "minimum invasive migration plans", if we
> found a good solution, we can still figure out how to get there smoothly,
> so this shouldn't be a primary concern.
In my next posting, I just wrote (this very minute)
"For brainstorming's sake, let's not worry about backward-compatibility
for now:"
So I guess I don't need this disclaimer then ;-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 14:12 ` Werner Almesberger
2003-02-18 12:45 ` Thomas Molina
@ 2003-02-18 17:22 ` Werner Almesberger
2003-02-19 3:30 ` Rusty Russell
` (2 more replies)
1 sibling, 3 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-18 17:22 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, linux-kernel
Next round: possible remedies and their side-effects. As
usual, if you disagree with something, please holler.
If yes, let's look at possible (and not overly insane) solutions,
using remove_proc_entry as a case study:
1) still don't kfree, and leave it to the user to somehow
minimize the damage. (Good luck :-)
2) add a callback that is invoked when the proc entry gets
deleted. (This callback may be called before remove_proc_entry
completes.) Problem: unload/return race for modules.
3) change remove_proc_entry or add remove_proc_entry_wait that
works like remove_proc_entry, but blocks until the entry is
deleted. Problem: may sleep "forever".
4) make remove_proc_entry return an indication of whether the
entry was successfully removed or not. Problem: if it
wasn't, what can we do then ?
5) like above, but don't remove the entry if we can't do it
immediately. Problem: there's no notification for when we
should try again, so we'd have to poll.
6) export the lookup mechanism, and let the caller poll for
removal. Problem: races with creation of a new entry with
the same name.
7) transfer ownership of de->data to procfs, and set some
(possibly configurable) destruction policy (e.g. always
kfree, or such). Similar to 2), but less flexible.
Any more ?
I think that most programmers would intuitively expect an
interface of type 3). In cases where we can't sleep, either
type 2) or type 5) would be common choices.
Does that sound reasonable so far ?
I'll wait a little until I continue with ways for dealing
with the side-effects.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 1:18 ` Werner Almesberger
2003-02-18 4:54 ` Rusty Russell
@ 2003-02-19 1:48 ` Roman Zippel
2003-02-19 2:27 ` Werner Almesberger
1 sibling, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-19 1:48 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Hi,
On Mon, 17 Feb 2003, Werner Almesberger wrote:
> > failure: if the object is still busy, we just return -EBUSY. This is
> > simple, but this doesn't work for modules, since during module exit you
> > can't fail anymore.
>
> That's a modules API problem. And yes, I think modules should
> eventually be able to say that they're busy.
As Rusty already correctly said, this is currently not possible with the
current modules API. On the other hand he also jumps very quickly to
conclusions, so let's look at all the options first and then decide how
trivial they are.
Anyway, for now just keep this option in mind and let's look at the other
options, which don't require a module API change.
> By the way, a loong time ago, in the modules thread, I suggested
> a "decrement_module_count_and_return" function [1]. Such a
> construct would be useful in this specific case.
This would just be an optimization(?) for the module count, it doesn't
change the general problem and is not useful as generic solution, so I'd
rather put it back for now.
> > failure doesn't work with modules, so that only leaves the module count.
>
> And how would you ensure correct access to static data in the
> absence of modules ? Any solution that _requires_ a module count
> looks highly suspicious to me.
In that case it would be kernel memory, which cannot be freed, so it will
not go away and requires no module count.
> Likewise, possibly dynamically allocated data that is synchronized
> by the caller, e.g. "user" in "struct proc_dir_entry".
user?
> > The last solution sounds complicated, but exactly this is done for
> > filesystems and we didn't really get rid of the second reference count, we
> > just moved it somewhere else, where it hurts least.
>
> Hmm, I'm confused. With "filesystem", do you mean the file system
> driver per se (e.g. "ext3"), or a specific instance of such a file
> system (e.g. /dev/hda1 mounted on /) ?
A generic file system as it's registered via register_filesystem. A file
system exports lots of dynamic and static data and the only (used) module
pointer you will find is in struct file_system_type (the owner pointer in
struct file_operations is not used by any standard fs). It's interesting
to see how file systems keep the module business at a minimum (no
try_module_get() is still cheaper than the cheapest try_module_get()).
Um, it's getting late and I just played too much with procfs to find a
sensible solution. Below is an experimental patch to add callback which
would allow it to safely remove user data. Very lightly tested, so no
guarantees.
bye, Roman
Index: include/linux/proc_fs.h
===================================================================
RCS file: /usr/src/cvsroot/linux-2.5/include/linux/proc_fs.h,v
retrieving revision 1.1.1.9
diff -u -p -r1.1.1.9 proc_fs.h
--- include/linux/proc_fs.h 27 Aug 2002 23:38:02 -0000 1.1.1.9
+++ include/linux/proc_fs.h 19 Feb 2003 00:46:28 -0000
@@ -69,6 +69,7 @@ struct proc_dir_entry {
void *data;
read_proc_t *read_proc;
write_proc_t *write_proc;
+ void (*release_proc)(struct proc_dir_entry *);
atomic_t count; /* use count */
int deleted; /* delete flag */
kdev_t rdev;
Index: fs/proc/generic.c
===================================================================
RCS file: /usr/src/cvsroot/linux-2.5/fs/proc/generic.c,v
retrieving revision 1.1.1.12
diff -u -p -r1.1.1.12 generic.c
--- fs/proc/generic.c 15 Feb 2003 01:25:07 -0000 1.1.1.12
+++ fs/proc/generic.c 19 Feb 2003 01:40:36 -0000
@@ -639,6 +639,8 @@ void free_proc_entry(struct proc_dir_ent
if (ino < PROC_DYNAMIC_FIRST ||
ino >= PROC_DYNAMIC_FIRST+PROC_NDYNAMIC)
return;
+ if (de->release_proc)
+ de->release_proc(de);
if (S_ISLNK(de->mode) && de->data)
kfree(de->data);
kfree(de);
@@ -655,6 +657,7 @@ void remove_proc_entry(const char *name,
const char *fn = name;
int len;
+ lock_kernel();
if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
goto out;
len = strlen(fn);
@@ -680,5 +683,6 @@ void remove_proc_entry(const char *name,
break;
}
out:
+ unlock_kernel();
return;
}
Index: fs/proc/inode.c
===================================================================
RCS file: /usr/src/cvsroot/linux-2.5/fs/proc/inode.c,v
retrieving revision 1.1.1.13
diff -u -p -r1.1.1.13 inode.c
--- fs/proc/inode.c 1 Feb 2003 19:59:21 -0000 1.1.1.13
+++ fs/proc/inode.c 19 Feb 2003 01:40:36 -0000
@@ -23,13 +23,6 @@
extern void free_proc_entry(struct proc_dir_entry *);
-static inline struct proc_dir_entry * de_get(struct proc_dir_entry *de)
-{
- if (de)
- atomic_inc(&de->count);
- return de;
-}
-
/*
* Decrements the use count and checks for deferred deletion.
*/
@@ -44,11 +37,13 @@ static void de_put(struct proc_dir_entry
}
if (atomic_dec_and_test(&de->count)) {
+ struct module *owner = de->owner;
if (de->deleted) {
printk("de_put: deferred delete of %s\n",
de->name);
free_proc_entry(de);
}
+ module_put(owner);
}
unlock_kernel();
}
@@ -71,11 +66,8 @@ static void proc_delete_inode(struct ino
/* Let go of any associated proc directory entry */
de = PROC_I(inode)->pde;
- if (de) {
- if (de->owner)
- module_put(de->owner);
+ if (de)
de_put(de);
- }
}
struct vfsmount *proc_mnt;
@@ -178,44 +170,36 @@ struct inode * proc_get_inode(struct sup
/*
* Increment the use count so the dir entry can't disappear.
*/
- de_get(de);
-#if 1
-/* shouldn't ever happen */
-if (de && de->deleted)
-printk("proc_iget: using deleted entry %s, count=%d\n", de->name, atomic_read(&de->count));
-#endif
+ if (!atomic_read(&de->count) && !try_module_get(de->owner))
+ return NULL;
+ atomic_inc(&de->count);
inode = iget(sb, ino);
if (!inode)
goto out_fail;
-
+
PROC_I(inode)->pde = de;
- if (de) {
- if (de->mode) {
- inode->i_mode = de->mode;
- inode->i_uid = de->uid;
- inode->i_gid = de->gid;
- }
- if (de->size)
- inode->i_size = de->size;
- if (de->nlink)
- inode->i_nlink = de->nlink;
- if (!try_module_get(de->owner))
- goto out_fail;
- if (de->proc_iops)
- inode->i_op = de->proc_iops;
- if (de->proc_fops)
- inode->i_fop = de->proc_fops;
- else if (S_ISBLK(de->mode)||S_ISCHR(de->mode)||S_ISFIFO(de->mode))
- init_special_inode(inode,de->mode,kdev_t_to_nr(de->rdev));
+ if (de->mode) {
+ inode->i_mode = de->mode;
+ inode->i_uid = de->uid;
+ inode->i_gid = de->gid;
}
+ if (de->size)
+ inode->i_size = de->size;
+ if (de->nlink)
+ inode->i_nlink = de->nlink;
+ if (de->proc_iops)
+ inode->i_op = de->proc_iops;
+ if (de->proc_fops)
+ inode->i_fop = de->proc_fops;
+ else if (S_ISBLK(de->mode)||S_ISCHR(de->mode)||S_ISFIFO(de->mode))
+ init_special_inode(inode,de->mode,kdev_t_to_nr(de->rdev));
-out:
return inode;
out_fail:
de_put(de);
- goto out;
+ return NULL;
}
int proc_fill_super(struct super_block *s, void *data, int silent)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-19 1:48 ` Roman Zippel
@ 2003-02-19 2:27 ` Werner Almesberger
0 siblings, 0 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-19 2:27 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Roman Zippel wrote:
> Anyway, for now just keep this option in mind and let's look at the other
> options, which don't require a module API change.
Yes, we can look at the modules case at the end.
> In that case it would be kernel memory, which cannot be freed, so it will
> not go away and requires no module count.
kfree isn't the only way to make data unusable. Remember the
"my_state" example.
> > Likewise, possibly dynamically allocated data that is synchronized
> > by the caller, e.g. "user" in "struct proc_dir_entry".
>
> user?
"data", sorry. I always call this kind of argument something like
"user" in my code ...
> A generic file system as it's registered via register_filesystem.
Okay, I'll have a look at that.
> Um, it's getting late and I just played too much with procfs to find a
> sensible solution. Below is an experimental patch to add callback which
> would allow it to safely remove user data. Very lightly tested, so no
> guarantees.
Yep, that's the kind of callback I had in mind.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 17:22 ` Werner Almesberger
@ 2003-02-19 3:30 ` Rusty Russell
2003-02-19 4:11 ` Werner Almesberger
2003-02-20 0:40 ` Roman Zippel
2003-02-23 23:34 ` Kevin O'Connor
2 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2003-02-19 3:30 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Roman Zippel, kuznet, linux-kernel
In message <20030218142257.A10210@almesberger.net> you write:
> Next round: possible remedies and their side-effects. As
> usual, if you disagree with something, please holler.
>
> If yes, let's look at possible (and not overly insane) solutions,
> using remove_proc_entry as a case study:
>
> 1) still don't kfree, and leave it to the user to somehow
> minimize the damage. (Good luck :-)
>
> 2) add a callback that is invoked when the proc entry gets
> deleted. (This callback may be called before remove_proc_entry
> completes.) Problem: unload/return race for modules.
OK. For reference, the "state of 2.4" solution (which is also the
"state of 2.5" solution) looks like:
> struct proc_dir_entry *de = create_proc_entry(...);
> void *my_data;
>
> de->data = my_data = kmalloc(...);
=====> de->owner = THIS_MODULE;
> ...
> remove_proc_entry(...);
> /* what happens with "my_data", formerly known as "de->data" ? */
And have proc_file_operations do the standard owner get and release:
open: proc_open,
release: proc_release,
static int proc_open(struct inode *inode, struct file *filp)
{
struct proc_dir_entry *dp = PDE(inode);
if (!try_module_get(dp->owner))
return -ENOENT;
return 0;
}
static int proc_release(struct inode *inode, struct file *filp)
{
struct proc_dir_entry *dp = PDE(inode);
module_put(dp->owner);
return 0;
}
Now, if remove_proc_entry() is called from module_exit(), the kfree()
works fine, since (1) we wouldn't be in module_exit() if the proc
entry was in used, and (2) the try_module_get() prevents any new
users.
Of course, if you wanted to remove the entry at any other time
(eg. hotplug), this doesn't help you one damn bit (which is kind of
your point).
> 3) change remove_proc_entry or add remove_proc_entry_wait that
> works like remove_proc_entry, but blocks until the entry is
> deleted. Problem: may sleep "forever".
This is what network devices do, and what the sockopt registration
code does, too, so this is already in the kernel, too. It's not
great, but it becomes a noop for the module deregistration stuff.
Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-19 3:30 ` Rusty Russell
@ 2003-02-19 4:11 ` Werner Almesberger
2003-02-19 23:38 ` Rusty Russell
0 siblings, 1 reply; 30+ messages in thread
From: Werner Almesberger @ 2003-02-19 4:11 UTC (permalink / raw)
To: Rusty Russell; +Cc: Roman Zippel, kuznet, kronos, linux-kernel
Rusty Russell wrote:
> Of course, if you wanted to remove the entry at any other time
> (eg. hotplug), this doesn't help you one damn bit (which is kind of
> your point).
Yep, try_module_get solves the general synchronization problem for
the special but interesting case of modules, but not for the general
case.
> This is what network devices do, and what the sockopt registration
> code does, too, so this is already in the kernel, too. It's not
> great, but it becomes a noop for the module deregistration stuff.
Yes, I think just sleeping isn't so bad at all. First of all,
we already have the module use count as a kind of "don't unload
now" advice (not sure if you plan to phase out MOD_INC_USE_COUNT ?),
and second, it's not exactly without precedent anyway. E.g. umount
will have little qualms of letting you sleep "forever". (And,
naturally, every once in a while, people hate it for this :-)
Anyway, I'll write more about this tomorrow. For tonight, I
have my advanced insanity 101 to finish, topic "ptracing
more than one UML/TT at the same time".
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-19 4:11 ` Werner Almesberger
@ 2003-02-19 23:38 ` Rusty Russell
2003-02-20 9:46 ` Roman Zippel
0 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2003-02-19 23:38 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Roman Zippel, kuznet, kronos, linux-kernel
In message <20030219011154.X2092@almesberger.net> you write:
> Rusty Russell wrote:
> > Of course, if you wanted to remove the entry at any other time
> > (eg. hotplug), this doesn't help you one damn bit (which is kind of
> > your point).
>
> Yep, try_module_get solves the general synchronization problem for
> the special but interesting case of modules, but not for the general
> case.
Absolutely. And I admire your (and Roman's) bravery for trying the
general case.
> will have little qualms of letting you sleep "forever". (And,
> naturally, every once in a while, people hate it for this :-)
(Well, MOD_INC_USE_COUNT is already deprecated, mainly because it's
used to try to control module counts from within, which preempt makes
really hard).
Yes, the addition of "umount -l" is congruent to "rmmod --wait". The
assumption is "I don't want any new users, and I'll handle any current
ones". You can get yourself in trouble with both of them.
Keep up the good work,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 17:22 ` Werner Almesberger
2003-02-19 3:30 ` Rusty Russell
@ 2003-02-20 0:40 ` Roman Zippel
2003-02-20 2:17 ` Werner Almesberger
2003-02-23 23:34 ` Kevin O'Connor
2 siblings, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-20 0:40 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Hi,
On Tue, 18 Feb 2003, Werner Almesberger wrote:
> Next round: possible remedies and their side-effects. As
> usual, if you disagree with something, please holler.
>
> If yes, let's look at possible (and not overly insane) solutions,
> using remove_proc_entry as a case study:
>
> 1) still don't kfree, and leave it to the user to somehow
> minimize the damage. (Good luck :-)
Obviously not a good idea.
> 3) change remove_proc_entry or add remove_proc_entry_wait that
> works like remove_proc_entry, but blocks until the entry is
> deleted. Problem: may sleep "forever".
Sleeping is not a good general solution, as you have to be very careful
not to hold any important locks, otherwise it's easy to abuse.
> 6) export the lookup mechanism, and let the caller poll for
> removal. Problem: races with creation of a new entry with
> the same name.
I don't really understand this one.
> 7) transfer ownership of de->data to procfs, and set some
> (possibly configurable) destruction policy (e.g. always
> kfree, or such). Similar to 2), but less flexible.
de->data might contain references to other data and de->data might not the
only dynamic data, just the most visible one, the read/write functions can
access other dynamic data as well.
> 2) add a callback that is invoked when the proc entry gets
> deleted. (This callback may be called before remove_proc_entry
> completes.) Problem: unload/return race for modules.
>
> 4) make remove_proc_entry return an indication of whether the
> entry was successfully removed or not. Problem: if it
> wasn't, what can we do then ?
>
> 5) like above, but don't remove the entry if we can't do it
> immediately. Problem: there's no notification for when we
> should try again, so we'd have to poll.
>
> Any more ?
If you want to make 4) and 5) separate options, you could the same with
2), you can unlink the entry during remove_proc_entry or after the last
user is gone (before the callback is called).
It would not be difficult to separate an unlink_proc_entry from
remove_proc_entry, so we can force an unlink if needed and we can reduce
this again to two basic options in two variations. :)
The question is now whether we return an error value or use a callback.
When a device is removed, we usually also want to remove all its data
structures, on the other hand we can only remove a module when there are
no users, so here we return an error value.
Now I need a bigger example to put this into a context, a nice example is
scsi_unregister. It removes among other things procfs entries and these
entries have a reference to struct Scsi_Host. As long as scsi_unregister
is called from module_exit everything works fine, but a bit searching
reveals drivers/usb/storage/usb.c, which create/removes a scsi host when
you plug/unplug a storage device (let's ignore other problems here, like
it's still mounted). In this case nothing protects the removal of the proc
entries anymore, this means if someone accesses /proc/scsi/usb-storage/...
while the device is unplugged he will access freed data. Here would be now
the patch from yesterday useful to implement a callback, but now we also
can't simply free the Scsi_Host structure, so we have to add a reference
count to it and only if all references to it are gone, it can be removed
as well.
At this point I can also respond to Rusty's main argument against giving
more control to the modules, that they suddenly had to deal with "I
started deregistering my stuff, but then...". Unless interfaces are
only used during module_init/module_exit, they have to deal with this
anyway. I can understand that the module interface is tempting - just wait
until all user are gone and then disable the module. The only problem is
that this breaks horribly, when the driver interface is used in a
different context (as demonstrated with the usb/scsi example). So what
speaks against forcing driver/interface writers to get it right from the
beginning?
(I at least could think of a few more reason that speak for this, but I'll
continue this later.)
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-20 0:40 ` Roman Zippel
@ 2003-02-20 2:17 ` Werner Almesberger
2003-02-23 16:02 ` Roman Zippel
0 siblings, 1 reply; 30+ messages in thread
From: Werner Almesberger @ 2003-02-20 2:17 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Roman Zippel wrote:
> Sleeping is not a good general solution, as you have to be very careful
> not to hold any important locks, otherwise it's easy to abuse.
Sleeping has many problems, but it also has one great advantage:
it's relatively easy to understand and doesn't need much extra
code (if any) in the user.
Also, I wouldn't expect lock interdependencies to be a major
problem at the level of something removing all traces of itself
(e.g. a driver module unregistering itself). Or would you happen
to know some example where such problems are happening ?
> > 6) export the lookup mechanism, and let the caller poll for
> > removal. Problem: races with creation of a new entry with
> > the same name.
>
> I don't really understand this one.
Something like:
foo_schedule_destruction(&whatever);
while (foo_lookup(whatever.name))
yield();
Usually too ugly to consider for anything but desperate cases.
> It would not be difficult to separate an unlink_proc_entry from
> remove_proc_entry, so we can force an unlink if needed and we can reduce
> this again to two basic options in two variations. :)
Yes, separating unlink (quick, non-blocking, always succeeds) and
destruction (slow, may run nethack, fragile) clearly has some appeal.
> The question is now whether we return an error value or use a callback.
There's also the difference that the version using a callback would
schedule the removal, while the version returning an error would do
nothing (unless the caller tries again).
You could also do both, but that gets pretty difficult to use.
> Now I need a bigger example to put this into a context, a nice example is
> scsi_unregister.
Nicely illustrates the problem of the "can look around one corner,
but not two" property of things like try_module_get, yes.
I'd also expect many cases of multiple devices which are serviced by
the same driver to have the same sort of problems. E.g. I'm pretty
sure I committed a few such sins in the ATM code, too.
> So what speaks against forcing driver/interface writers to get it
> right from the beginning?
Raising the general awareness of such problems is exactly what I'm
trying to accomplish here :-) By the way, also the argument "it may
be broken but it has never failed so far" is getting weaker with
the emergence of technologies that change relative code path
lengths, like hyperthreading.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-19 23:38 ` Rusty Russell
@ 2003-02-20 9:46 ` Roman Zippel
0 siblings, 0 replies; 30+ messages in thread
From: Roman Zippel @ 2003-02-20 9:46 UTC (permalink / raw)
To: Rusty Russell; +Cc: Werner Almesberger, kuznet, kronos, linux-kernel
Hi,
On Thu, 20 Feb 2003, Rusty Russell wrote:
> Yes, the addition of "umount -l" is congruent to "rmmod --wait". The
> assumption is "I don't want any new users, and I'll handle any current
> ones". You can get yourself in trouble with both of them.
With the small difference that "umount -l" won't deadlock.
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
@ 2003-02-20 12:09 Adam J. Richter
2003-02-20 12:46 ` Roman Zippel
0 siblings, 1 reply; 30+ messages in thread
From: Adam J. Richter @ 2003-02-20 12:09 UTC (permalink / raw)
To: zippel; +Cc: linux-kernel, rusty, wa
On 2003-02-20, Roman Zippel responded to Werner Almesberger:
>The question is now whether we return an error value or use a callback.
>When a device is removed, we usually also want to remove all its data
>structures, on the other hand we can only remove a module when there are
>no users, so here we return an error value.
>Now I need a bigger example to put this into a context, a nice example is
>scsi_unregister. It removes among other things procfs entries and these
>entries have a reference to struct Scsi_Host. As long as scsi_unregister
>is called from module_exit everything works fine, but a bit searching
>reveals drivers/usb/storage/usb.c, which create/removes a scsi host when
>you plug/unplug a storage device (let's ignore other problems here, like
>it's still mounted).
The ability to remove a module is generally independent of
whether or not there is any hardware present at that moment for which
the module supplies a driver. Instead, the determining issue is
whether there are file descriptors open for that driver.
Of course, if the necessary hardware was never present at any
time when the device driver's module was loaded, then there never will
be any file descriptors open for the device driver.
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-20 12:09 Adam J. Richter
@ 2003-02-20 12:46 ` Roman Zippel
0 siblings, 0 replies; 30+ messages in thread
From: Roman Zippel @ 2003-02-20 12:46 UTC (permalink / raw)
To: Adam J. Richter; +Cc: linux-kernel, Rusty Russell, wa
Hi,
On Thu, 20 Feb 2003, Adam J. Richter wrote:
> The ability to remove a module is generally independent of
> whether or not there is any hardware present at that moment for which
> the module supplies a driver. Instead, the determining issue is
> whether there are file descriptors open for that driver.
I don't understand, what you're trying to say.
File descriptors are not the only way to access a driver and the ability
to remove a module is only dependent on the number of references to this
module.
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
@ 2003-02-20 13:51 Adam J. Richter
2003-02-20 14:06 ` Werner Almesberger
2003-02-20 15:38 ` Roman Zippel
0 siblings, 2 replies; 30+ messages in thread
From: Adam J. Richter @ 2003-02-20 13:51 UTC (permalink / raw)
To: zippel; +Cc: linux-kernel, rusty, wa
On Thu, 20 Feb 2003, Roman Zippel wrote:
>On Thu, 20 Feb 2003, Adam J. Richter wrote:
>> The ability to remove a module is generally independent of
>> whether or not there is any hardware present at that moment for which
>> the module supplies a driver. Instead, the determining issue is
>> whether there are file descriptors open for that driver.
>I don't understand, what you're trying to say.
>File descriptors are not the only way to access a driver and the ability
>to remove a module is only dependent on the number of references to this
>module.
You're right. My second sentence was an oversimplification.
I should have said "software references" rather than file descriptors
to include things like "ifconfig eth0 up" creating a reference,
mounting a block device creating a refernece, etc. (Perhaps I
should have stated only my first sentence and stopped there.)
Anyhow, my point is that removing a piece of hardware
does not require that the corresponding module be unloaded
immediately.
Adam J. Richter __ ______________ 575 Oroville Road
adam@yggdrasil.com \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-20 13:51 [RFC] Is an alternative module interface needed/possible? Adam J. Richter
@ 2003-02-20 14:06 ` Werner Almesberger
2003-02-20 15:38 ` Roman Zippel
1 sibling, 0 replies; 30+ messages in thread
From: Werner Almesberger @ 2003-02-20 14:06 UTC (permalink / raw)
To: Adam J. Richter; +Cc: zippel, linux-kernel, rusty
Adam J. Richter wrote:
> Anyhow, my point is that removing a piece of hardware
> does not require that the corresponding module be unloaded
> immediately.
Nor does the use of a module require the presence of any
non-generic hardware in the first place (e.g. net/sched/,
netfilter, PPP, etc.) :-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-20 13:51 [RFC] Is an alternative module interface needed/possible? Adam J. Richter
2003-02-20 14:06 ` Werner Almesberger
@ 2003-02-20 15:38 ` Roman Zippel
1 sibling, 0 replies; 30+ messages in thread
From: Roman Zippel @ 2003-02-20 15:38 UTC (permalink / raw)
To: Adam J. Richter; +Cc: linux-kernel, Rusty Russell, wa
Hi,
On Thu, 20 Feb 2003, Adam J. Richter wrote:
> Anyhow, my point is that removing a piece of hardware
> does not require that the corresponding module be unloaded
> immediately.
That's true, but when a piece of hardware is removed, you usually also
want to get rid of some data structures, but if the relevant functions are
not prepared to be called outside of module_exit, you have a huge problem.
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-20 2:17 ` Werner Almesberger
@ 2003-02-23 16:02 ` Roman Zippel
2003-02-26 23:26 ` Werner Almesberger
0 siblings, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-23 16:02 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Hi,
On Wed, 19 Feb 2003, Werner Almesberger wrote:
> Roman Zippel wrote:
> > Sleeping is not a good general solution, as you have to be very careful
> > not to hold any important locks, otherwise it's easy to abuse.
>
> Sleeping has many problems, but it also has one great advantage:
> it's relatively easy to understand and doesn't need much extra
> code (if any) in the user.
Uninterruptable sleep should generally be avoided at kernel level, either
it should be possible to interrupt it or it should at least include a
timeout, but then you have to deal with failure again.
OTOH if you have callbacks, you can easily implement sleeping yourself.
> Also, I wouldn't expect lock interdependencies to be a major
> problem at the level of something removing all traces of itself
> (e.g. a driver module unregistering itself). Or would you happen
> to know some example where such problems are happening ?
Imagine you want to remove a device structure, if you manage to block
this somehow, you might completely block the creation and removeal of
other devices.
> > Now I need a bigger example to put this into a context, a nice example is
> > scsi_unregister.
>
> Nicely illustrates the problem of the "can look around one corner,
> but not two" property of things like try_module_get, yes.
Thanks, it seems I didn't cause too much confusion yet, so I can go on
with the next part. I think we can agree that the module locking only
protects a special case and as hotpluggable devices become more and more
important, we should rather get it right in the general case.
Anyway, this alone would be not reason enough to change the module
interface, but another module interface would give us more flexibility and
reduce the locking complexity. To make it easy for Rusty I take an example
he is familiar with - netfilter.
Let's assume we had an interface which allows exit to fail, how would that
help? First, struct nf_hook_ops gets a reference count (struct module does
the job), now with a small change to nf_unregister_hook:
void nf_unregister_hook(struct nf_hook_ops *reg)
{
br_write_lock_bh(BR_NETPROTO_LOCK);
- list_del(®->list);
+ list_del_init(®->list);
br_write_unlock_bh(BR_NETPROTO_LOCK);
}
we can do the following from module_exit:
int foo_exit()
{
nf_unregister_hook(&foo_ops);
return module_refcount(THIS_MODULE) ? -EBUSY : 0;
}
to get a reference to the module this would be enough:
static inline void __module_get(struct module *module)
{
local_inc(&module->ref[smp_processor_id()].count);
}
releasing the reference is as simple:
static inline void __module_put(struct module *module)
{
local_dec(&module->ref[smp_processor_id()].count);
}
This probably needs a bit of explanation:
1. more flexibility: The driver has better the knowledge about how the
module can be stopped and especially only the driver knows which events
should prevent the shutdown of the module.
There is an important difference between stopping the module and actually
removing the module. Any reference to the module must prevent the module
removal, but not everything needs to stop the module from shutting down.
For example an incoming network packet doesn't need to prevent the
shutdown, so we can just remove the hook above independent of the
reference count and just wait for the reference to become zero.
On the other hand this means the module could be in a disabled state, but
this can rather be compared with lazy unmount (via "umount -l") - after
the last user is gone the module can be removed without problems, but the
module could also be reenabled.
2. reduced locking complexity: Above demonstrates how it could look like,
if we can get rid of the live state (at least outside of the generic
module code). I explained the theory behind this before and posted the
link already a few times:
http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2
This live state isn't needed most of the time and if some module should
really need this, it can easily implement that itself.
To help understanding this it maybe helps to clear up a common
misconception: modules are no special objects and they are not different
to other objects which can be removed while the kernel is running. Even
Rusty confuses this issue (http://lwn.net/Articles/15404/). The question
should rather be: how can I safely access an object which I don't own?
There are exactly two rules:
- The user has to lock out the owner.
- When the user wants to access the object outside the lock, he has to
tell that the owner, usually done via a reference count.
One of these conditions must be true (no exception), otherwise we have a
problem.
Applied to netfilter this means as long as we hold the netproto brlock we
can safely access any netfilter hook and only if we need a reference to
the hook outside of this lock, we had to call try_module_get(). This also
means that we don't have to try to get a reference for every packet,
instead e.g. it would be enough to do this once per connection.
(Why now an atomic counter wasn't sufficient for this and why Rusty had to
mess up the module code for this, will probably stay one of the big
mysteries...)
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-18 17:22 ` Werner Almesberger
2003-02-19 3:30 ` Rusty Russell
2003-02-20 0:40 ` Roman Zippel
@ 2003-02-23 23:34 ` Kevin O'Connor
2003-02-24 12:14 ` Roman Zippel
2 siblings, 1 reply; 30+ messages in thread
From: Kevin O'Connor @ 2003-02-23 23:34 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Roman Zippel, Rusty Russell, linux-kernel
On Tue, Feb 18, 2003 at 02:22:57PM -0300, Werner Almesberger wrote:
> Next round: possible remedies and their side-effects. As
> usual, if you disagree with something, please holler.
>
> If yes, let's look at possible (and not overly insane) solutions,
> using remove_proc_entry as a case study:
Hi Werner,
Thanks for putting together this list; I've been following this thread for
a while and it seems that the discussion always deteriorates into
semantics. Anyway, I think there is an eighth possible solution to the
list you proposed.
Just to be specific, the requirements for the proc entry stuff are:
a) a mechanism needs to be defined to indicate that the entry is no longer
needed (something that starts the delete process -- ie, remove_proc_entry),
b) the code must conform to a system that ensures de->data will not be used
after the "release" code is executed, and
c) the "release" code must be eventually executed.
Assuming these requirements are really requirements (your options 1 and 4
don't seem to meet these) then an "eighth" solution is:
8) Have the unregister code (remove_proc_entry) set an external flag (eg,
de->data_is_there), and update all users of de->data to check the flag
before following the pointer.
Option 8 may not qualify as "sane", but I think it is important to add it
because it is what the module code is currently using. Thus, one need not
look at the module stuff as a "special case", but as a general (if
complicated) resource management solution.
Finally, one could probably apply any of the "options" for any dynamically
allocated memory. It is interesting that Linus seems to prefer option 2/7
(from the recent kobject work and CodingStyle).
If I've missed something, please let me know.
-Kevin
> 1) still don't kfree, and leave it to the user to somehow
> minimize the damage. (Good luck :-)
>
> 2) add a callback that is invoked when the proc entry gets
> deleted. (This callback may be called before remove_proc_entry
> completes.) Problem: unload/return race for modules.
>
> 3) change remove_proc_entry or add remove_proc_entry_wait that
> works like remove_proc_entry, but blocks until the entry is
> deleted. Problem: may sleep "forever".
>
> 4) make remove_proc_entry return an indication of whether the
> entry was successfully removed or not. Problem: if it
> wasn't, what can we do then ?
>
> 5) like above, but don't remove the entry if we can't do it
> immediately. Problem: there's no notification for when we
> should try again, so we'd have to poll.
>
> 6) export the lookup mechanism, and let the caller poll for
> removal. Problem: races with creation of a new entry with
> the same name.
>
> 7) transfer ownership of de->data to procfs, and set some
> (possibly configurable) destruction policy (e.g. always
> kfree, or such). Similar to 2), but less flexible.
--
------------------------------------------------------------------------
| Kevin O'Connor "BTW, IMHO we need a FAQ for |
| kevin@koconnor.net 'IMHO', 'FAQ', 'BTW', etc. !" |
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-23 23:34 ` Kevin O'Connor
@ 2003-02-24 12:14 ` Roman Zippel
0 siblings, 0 replies; 30+ messages in thread
From: Roman Zippel @ 2003-02-24 12:14 UTC (permalink / raw)
To: Kevin O'Connor; +Cc: Werner Almesberger, Rusty Russell, linux-kernel
Hi,
On Sun, 23 Feb 2003, Kevin O'Connor wrote:
> 8) Have the unregister code (remove_proc_entry) set an external flag (eg,
> de->data_is_there), and update all users of de->data to check the flag
> before following the pointer.
>
> Option 8 may not qualify as "sane", but I think it is important to add it
> because it is what the module code is currently using. Thus, one need not
> look at the module stuff as a "special case", but as a general (if
> complicated) resource management solution.
Yes, it's another possible solution, but it has the same problem as the
current module locking - increased locking complexity.
Such flag actually exists already ("deleted"), but no user can use it
currently, because the read/write functions don't have the proc entry
argument. Even if they could use it, switching this flag isn't enough
remove_proc_entry also had to synchronize with active users, so users had
to take some lock just to read the data, where a simple reference was
sufficient before.
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-23 16:02 ` Roman Zippel
@ 2003-02-26 23:26 ` Werner Almesberger
2003-02-27 12:34 ` Roman Zippel
0 siblings, 1 reply; 30+ messages in thread
From: Werner Almesberger @ 2003-02-26 23:26 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Roman Zippel wrote:
> Anyway, this alone would be not reason enough to change the module
> interface, but another module interface would give us more flexibility and
> reduce the locking complexity.
Wait, wait ! :-) There's one step you've left out: what we actually
expect the module interface to do. We have:
- what it currently does, or what it did in the past
- what users think it does
- what users want it to do
- what we think the users should want
- what we think is a comfortable compromise
With "users", I mean primarily the guy who invokes "rmmod", or such.
Anyway, I'm afraid I can't offer much wisdom from experience for this
part, for I'm not much of a module user myself. I'll have more to say
on service interfaces, though.
Sorry for slowing down, but I'm currently quite busy absorbing all
the cool stuff that's recently been happening with UML. (So, blame
Jeff ;-))
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-26 23:26 ` Werner Almesberger
@ 2003-02-27 12:34 ` Roman Zippel
2003-02-27 13:20 ` Werner Almesberger
0 siblings, 1 reply; 30+ messages in thread
From: Roman Zippel @ 2003-02-27 12:34 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Hi,
On Wed, 26 Feb 2003, Werner Almesberger wrote:
> Roman Zippel wrote:
> > Anyway, this alone would be not reason enough to change the module
> > interface, but another module interface would give us more flexibility and
> > reduce the locking complexity.
>
> Wait, wait ! :-) There's one step you've left out: what we actually
> expect the module interface to do. We have:
There are several module interfaces:
- module user interface
- module load interface
- module driver interface
These are quite independent and so far we were talking about the last one,
so I'm a bit confused about your request to talk about the first.
<rant>
BTW Why Rusty had to completely break the first two interfaces to
"improve" the last one, is probably another mystery, I'll never
understand.
</rant>
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-27 12:34 ` Roman Zippel
@ 2003-02-27 13:20 ` Werner Almesberger
2003-02-27 14:33 ` Roman Zippel
0 siblings, 1 reply; 30+ messages in thread
From: Werner Almesberger @ 2003-02-27 13:20 UTC (permalink / raw)
To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Roman Zippel wrote:
> There are several module interfaces:
> - module user interface
> - module load interface
> - module driver interface
Hmm, the "load interface" would be the system calls, while
the "driver interface" would be initialization and cleanup
functions in the module ?
We've already established that most things called from the
latter isn't actually module specific.
> These are quite independent and so far we were talking about the last one,
> so I'm a bit confused about your request to talk about the first.
I'm not so sure about them being independent. E.g. if we
just had a blocking single-phase cleanup, users would always
see success, but resources may be tied up indefinitely, which
would break uses like
rmmod foo
insmod foo.o
On the other hand, with a non-blocking two-phase cleanup, users
would still always see success, but only "anonymous" resources
(memory, etc.) would be tied up.
Last but not least, a non-blocking single-phase cleanup would
expose all failures to the user, and require some retry strategy.
Furthermore, use counts can have several meanings:
- indicate when it's convenient for the module to be removed
- indicate when it's possible for the module to be removed
- indicate what consequences the user may experience if
trying to remove now (e.g. blocking)
The "old" module count was a bit of a mixture of the first two.
The "new" count is clearer.
Oh, lest I be misunderstood: I'm not saying that we should
redesign everything, and screw compatibility. I just want to
bring the hidden assumptions that have piled up over the life
of the module system out in the open.
Then, of course, there are still plenty of opportunities to
plot any massive breakage ;-)
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible?
2003-02-27 13:20 ` Werner Almesberger
@ 2003-02-27 14:33 ` Roman Zippel
0 siblings, 0 replies; 30+ messages in thread
From: Roman Zippel @ 2003-02-27 14:33 UTC (permalink / raw)
To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel
Hi,
On Thu, 27 Feb 2003, Werner Almesberger wrote:
> > There are several module interfaces:
> > - module user interface
> > - module load interface
> > - module driver interface
>
> Hmm, the "load interface" would be the system calls, while
> the "driver interface" would be initialization and cleanup
> functions in the module ?
>
> We've already established that most things called from the
> latter isn't actually module specific.
But the module driver interface has an impact on the other driver
interfaces and that's what I tried to describe in the previous mail.
> > These are quite independent and so far we were talking about the last one,
> > so I'm a bit confused about your request to talk about the first.
>
> I'm not so sure about them being independent. E.g. if we
> just had a blocking single-phase cleanup, users would always
> see success, but resources may be tied up indefinitely, which
> would break uses like
>
> rmmod foo
> insmod foo.o
rmmod could already fail before, because the module is busy, so I'm not
sure, what should break here.
> On the other hand, with a non-blocking two-phase cleanup, users
> would still always see success, but only "anonymous" resources
> (memory, etc.) would be tied up.
>
> Last but not least, a non-blocking single-phase cleanup would
> expose all failures to the user, and require some retry strategy.
Now you're skipping ahead. You are already looking at the possible
consequences for the module user interface, before we actually made clear
in which ways the module driver interface could be changed.
There are of course dependencies between the interfaces, but you can
change a lot in one before you have to modify another.
> Furthermore, use counts can have several meanings:
> - indicate when it's convenient for the module to be removed
> - indicate when it's possible for the module to be removed
> - indicate what consequences the user may experience if
> trying to remove now (e.g. blocking)
>
> The "old" module count was a bit of a mixture of the first two.
> The "new" count is clearer.
There is/was no count for the first?!
Which "new" count?
bye, Roman
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2003-02-27 14:23 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-02-20 13:51 [RFC] Is an alternative module interface needed/possible? Adam J. Richter
2003-02-20 14:06 ` Werner Almesberger
2003-02-20 15:38 ` Roman Zippel
-- strict thread matches above, loose matches on Subject: below --
2003-02-20 12:09 Adam J. Richter
2003-02-20 12:46 ` Roman Zippel
2003-02-14 12:04 [RFC] Migrating net/sched to new module interface Rusty Russell
2003-02-14 13:21 ` Roman Zippel
2003-02-14 13:53 ` Werner Almesberger
2003-02-14 14:24 ` Roman Zippel
2003-02-14 18:30 ` Werner Almesberger
2003-02-14 20:09 ` Roman Zippel
2003-02-15 0:12 ` Werner Almesberger
2003-02-15 0:51 ` Roman Zippel
2003-02-15 2:28 ` Werner Almesberger
2003-02-15 23:20 ` Roman Zippel
2003-02-17 17:04 ` Werner Almesberger
2003-02-17 23:09 ` [RFC] Is an alternative module interface needed/possible? Roman Zippel
2003-02-18 1:18 ` Werner Almesberger
2003-02-18 4:54 ` Rusty Russell
2003-02-18 7:20 ` Werner Almesberger
2003-02-18 12:06 ` Roman Zippel
2003-02-18 14:12 ` Werner Almesberger
2003-02-18 12:45 ` Thomas Molina
2003-02-18 17:22 ` Werner Almesberger
2003-02-19 3:30 ` Rusty Russell
2003-02-19 4:11 ` Werner Almesberger
2003-02-19 23:38 ` Rusty Russell
2003-02-20 9:46 ` Roman Zippel
2003-02-20 0:40 ` Roman Zippel
2003-02-20 2:17 ` Werner Almesberger
2003-02-23 16:02 ` Roman Zippel
2003-02-26 23:26 ` Werner Almesberger
2003-02-27 12:34 ` Roman Zippel
2003-02-27 13:20 ` Werner Almesberger
2003-02-27 14:33 ` Roman Zippel
2003-02-23 23:34 ` Kevin O'Connor
2003-02-24 12:14 ` Roman Zippel
2003-02-18 12:35 ` Roman Zippel
2003-02-18 14:14 ` Werner Almesberger
2003-02-19 1:48 ` Roman Zippel
2003-02-19 2:27 ` Werner Almesberger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox