From mboxrd@z Thu Jan 1 00:00:00 1970 From: Rusty Russell Subject: Re: dev->destructor Date: Sat, 03 May 2003 14:07:41 +1000 Sender: netdev-bounce@oss.sgi.com Message-ID: <20030503040949.804182C003@lists.samba.org> References: <20030502.134804.78707298.davem@redhat.com> Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br Return-path: To: "David S. Miller" In-reply-to: Your message of "Fri, 02 May 2003 13:48:04 MST." <20030502.134804.78707298.davem@redhat.com> Errors-to: netdev-bounce@oss.sgi.com List-Id: netdev.vger.kernel.org In message <20030502.134804.78707298.davem@redhat.com> you write: > From: Rusty Russell > Date: Fri, 02 May 2003 15:25:15 +1000 > > If this is true, I think you can use the module reference count only, > and your code will be faster, too. I can prepare the patch for you > later tonight, to see how it looks. > > And where do we get the counter from when dev->owner is NULL > (ie. non-modular)? We need the reference counting regardless of > whether the device is implemented statically in the kernel or modular. But Alexey said you can only call unregister_netdev from module unload, ie. if not a module, it can't be unloaded, hence no refcount needed. I wrote the above paragraph because I'm not sure if I understood Alexey correctly? > Alexey, you are using a module but don't want to reference count it. > I made module reference counts very cheap so you don't have to worry, > but you still are trying to cheat 8) > > Understood. > > I think even stronger part of Alexey's argument is that all of > this "if (x->owner)" all over the place takes away some of the > gains of compiling things statically into the kernel. Why extra > branches all over the place? Agreed. I have considered removing that, and making THIS_MODULE equal a dummy module struct for the core kernel. I think that would be a win (we've already established that the actual refcount is cheap, possibly cheaper than the branch in practice). BTW, we should look at making local_inc() etc. a first-class citizen: it has uses outside modules. > As long as I can Control-C rmmod when it waits like this, which would > be the case, what is the problem? If you can, and the device is still usable afterwards, it would be nirvana 8) [ Refcounting tutorial skipped: I went through the same pain with early conntrack code, and learned to refcount everything now 8) ] > Look, once external view into module (ie. socket operations, > superblock ops, netdev registry) is removed, all that remains to > reference object is exactly these objects. It is the only different > part about modules vs. non-modules. This argument applies to all objects. If you reference count everything which holds a reference to an object, you can infer the reference count of the object from the sum of reference counts of its referees. In practice, as you pointed out in an earlier mail (I think sockets were your example), doing this proves to be extremely painful. And we're feeling the pain now. The module functions, and all its data, are objects. For convenience, size, and speed, we don't reference count them separately. OK, so why doesn't, say, struct netdevice grab the module refcount on registration (as is logical), and drop it on unregistration? Because the module holds a reference to the struct device, so now you have a classic circular reference count problem: the module reference count will never fall to zero. That's why we grab a reference just before use. You can, of course, do two-stage module cleanup (ie. first stage with refcounts non-zero), but the user wants clean "remove or fail to remove" semantics, so half-way through the cleanup you realize it's still in use, and restore things: now you have the spurious failure problem where it was half-unregistered for a while, and AFAICT you have to rewrite 1400 modules's cleanup routines. I imagined schemes where the kernel would be basically stopped during module remove, so the half-remove and unremove would appear atomic. I shied away from implementing such a monster without deadlock, but it might be possible. Then we would truly have nirvana 8) > Trying to unload %s, waiting for all references to go away. > Perhaps you have programs running which still reference this > module? (Hit Ctrl-C to interrupt unload to go and remove > those references) > ==================== > > nobody would ask what does this mean. :-) Please, implement such a thing. I was unable to, without introducing spurious failure in the components, *and* rewriting every module_cleanup function 8( Hence rmmod does not wait by default, but says "module is busy". > And I want to mention that in some cases you have to "wait". The best > example are TCP_TIME_WAIT sockets. Even after users downs all the > interfaces, and closes all the sockets, these remnants must remain for > their full life of 60 seconds. Yes, certainly. The two-stage unload provided by rmmod --wait ensures that the reference count decreases to zero (it makes try_module_get fail). This was my desire if unloading security modules or some netfilter modules was going to be reasonable. > I really am concerned at both sides, both user observed behavior and > kernel side correctness. There are shades of correctness, too. Not jumping into a module which has gone away is probably the most important. Having finite unload time is also nice. Not having spurious failures because someone tried to unload a module is nice too. Roman Zippel suggested (among other things) that every unregister fail when busy, and that modules the reinitialize. This gives the spurious failure problem, and means rewriting every unregister interface in the kernel, every module cleanup function, and dealing with the case where you're cleaning up a failed initialization and your unregister failed. Adam Richter suggested the module or interface register with a central repository to say "this modules uses this interface", then you can atomically freeze all the module interfaces (say they all share a single lock) and see if it's in use, but you can't really call the module's cleanup routine, so you have to atomically deactivate all these registrations before dropping the lock, and that's the same as try_module_get(). As you know, I love radical change. But I want be make sure we're going to end up somewhere we like at the end of it, which is why I didn't do it. Anyway, putting the module loader in the kernel was enough to sate my appetite for change 8) Thanks! Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell.