public inbox for linux-kernel@vger.kernel.org
* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-04 17:24 UTC
  To: R.E.Wolff; +Cc: linux-kernel

Rogier Wolfff wrote:
>Adam J. Richter wrote:
>> >The total saving over all 2.5.24 modules is 4% of the total module
>> >sizes, rounded to page boundaries.
>> 
>>       As individual space optimizations go, 4% is respectable,
>> especially for something that has no cost, helps detect bugs and
>> simplifies the kernel.  It is hard to think of many potential
>> other space improvements that would be as effective, especially as a
>> function of implementation effort.  In comparison, my vmlinux is
>> 5% init sections.  So, if init sections are worth it for the
>> core kernel, they should be worth it for modules.
>
>Ehmmm. You normally load one big 1Mb kernel, freeing about 40 or 50k
>at init time.
>
>You normally load a couple of modules, totalling much less. 
>
>Hmm. Just checked on a system with sound as modules, I see half a
>megabyte of modules. So maybe that 20k is worth it. On the other hand,
>you only load half a megabyte of shit if you have the RAM to spare.
>20k is not worth the time I spend typing this....

	The system that I am composing this email on has 1.1MB of
modules and does not have sound drivers loaded.  It has ipv4 and a
number of other facilities modularized that are not modules in the
stock kernels.  Every system that I use has a configuration like this.
With a lower per-module overhead, I would be more inclined to try to
modularize other facilities and break up some larger modules into
smaller ones, in the case where there is substantial code that is not
needed for some configurations.

	Just for fun, using your numbers and US dollars:

    20kB DRAM x $150/GB of DRAM    = $0.003
    $0.003/user x 10 million users = $30,000 contribution to Linux users
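The arithmetic above as a small C sketch. It assumes decimal units (1 kB = 10^3 bytes, 1 GB = 10^9 bytes), which reproduces the $0.003 figure; with binary units the per-user value is about $0.00286. The function names are illustrative only.

```c
/* DRAM cost of a per-system memory saving.
 * Assumption: decimal units, 1 kB = 1e3 bytes and 1 GB = 1e9 bytes. */
static double dram_cost_per_user(double bytes_saved, double dollars_per_gb)
{
    return bytes_saved / 1e9 * dollars_per_gb;
}

/* Aggregate value of that saving across a user base. */
static double aggregate_value(double per_user, double users)
{
    return per_user * users;
}
```

For example, dram_cost_per_user(20e3, 150.0) gives 0.003, and multiplying by 10 million users gives about 30000.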


>> >Most of that saving comes from a few modules.
>> 
>>       This makes me wonder if __init procedures are not being
>> aggressively identified.  I wonder if people would use __init a little
>> more if they knew they would get the benefit of it in the module case.
>> Perhaps someday someone will write a tool to identify procedures that
>> are only called from init sections.
>
>Sometimes the "error path" will try to reset/reinit the chip. You will
>not see that happening during a normal usage cycle, but you will get
>bitten if you remove the init based on an actual call-trace.... 

	Such routines would correctly be skipped by the tool that I
described.
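A minimal sketch, in C, of the kind of tool described above, working on a hypothetical, already-extracted call graph (no existing tool or data format is implied). A function is flagged only if it has callers and every one of them is init-only, and a function whose address is taken is never flagged. So a chip-reset routine that is also reached from an error path keeps a non-init caller and is skipped, as described.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 64

/* One call site in a hypothetical, already-extracted call graph. */
struct call_edge {
    int caller;
    int callee;
};

/*
 * is_init[f]:       f is already marked __init.
 * address_taken[f]: f is referenced from data (e.g. a function-pointer
 *                   table), so it may run after init; never flag it.
 * candidate[f]:     set on return for functions that could become __init.
 * Returns the number of candidates, or -1 if nfuncs is too large.
 */
static int find_init_only(int nfuncs, const struct call_edge *calls,
                          size_t ncalls, const bool *is_init,
                          const bool *address_taken, bool *candidate)
{
    bool init_only[MAX_FUNCS];
    bool changed = true;
    int count = 0;

    if (nfuncs > MAX_FUNCS)
        return -1;
    for (int f = 0; f < nfuncs; f++) {
        init_only[f] = is_init[f];
        candidate[f] = false;
    }

    /* Repeat until nothing changes, so helpers of helpers are found too. */
    while (changed) {
        changed = false;
        for (int f = 0; f < nfuncs; f++) {
            bool has_caller = false, all_callers_init = true;

            if (init_only[f] || address_taken[f])
                continue;
            for (size_t e = 0; e < ncalls; e++) {
                if (calls[e].callee != f)
                    continue;
                has_caller = true;
                if (!init_only[calls[e].caller])
                    all_callers_init = false;
            }
            if (has_caller && all_callers_init) {
                init_only[f] = true;
                candidate[f] = true;
                count++;
                changed = true;
            }
        }
    }
    return count;
}
```

The fixed-point loop is deliberately conservative: recursive helpers are never flagged, which errs on the side of keeping code resident.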

>>       Kernel modules have been a way of life for me for years, and I
>> don't think I've ever caught a kernel bug by the mechanism that you
>
>This happens often enough "during development" that the bugs get fixed
>before you get to see them....

	As I said, you could have the following facility for when you
want to force use of vmalloc:

>> describe.  However, I see no harm in having a debugging option that
>> always vmalloc'ed kernel modules.  This facility could be entirely
>> configurable from user level by having insmod allocate a module of
>> *exactly* one page for modules that were less than a page (since you
>> would only want to kmalloc modules that were *less* than one page).
>
>As far as I know, kmallocing more than half a page will actually
>allocate the whole page.

	If so, then that could be retuned if it turns out to be
optimal to do so.  Even without the change, there could still be a lot
of modules under half a page, such as the logitech bus mouse driver
that is loaded on this system right now.
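A sketch of the rounding Rogier describes, assuming power-of-two general caches from 32 bytes up to 128 kB on a machine with 4 kB pages (roughly the 2.4-era arrangement on x86; exact sizes varied by architecture). Any request just over half a page is backed by a full 4096-byte object, while a module under 2048 bytes still fits a smaller size class.

```c
#include <stddef.h>

/* Model of power-of-two kmalloc size classes.
 * Assumption: classes run from 32 bytes to 131072 bytes (128 kB).
 * Returns the object size that backs a request, or 0 if the request
 * is too large for kmalloc in this model. */
static size_t kmalloc_backing_size(size_t request)
{
    size_t size = 32;

    if (request > 131072)
        return 0;
    while (size < request)
        size <<= 1;
    return size;
}
```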

	Making efficient use of a resource like memory often involves
repeatedly grabbing small savings of a percent or two.  Maybe you
start by releasing .init.data for 2%.  Then somebody submits a patch
to release .init.text without substantial kernel modifications (even
if only on x86) for 2%.  Then somebody writes a script to identify
.text and .data labels that are only referenced from init sections,
and that saves another 1%.  Then somebody adds a flag to insmod to
load modules in a non-removable mode that does not load the exit
sections, saving another 0.25%.  Then somebody changes allocation of
modules that are less than a page to use kmalloc(GFP_HIGHMEM) instead
of vmalloc (~30% of modules on my system are already this small).
Then somebody figures out a way for vmalloc'ed areas larger than a page
that do not need page alignment to sometimes start in the unused last
page of another vmalloc'ed area.  This reduction in the per-module memory
overhead encourages people to chip off some parts of larger library
modules that are not used by all of the clients of that library.  Then
somebody adds a kernel option to configure out some kernel code that
is unnecessary in an "everything is a module" configuration, and so on.

	At the end of the day, somebody who is trying to deploy web
browsers on donated PCs for the local school district without
maintaining a custom kernel finds that they can, or someone is able to
squeeze IPSec into the wireless access point that they turned into a
router, or someone finds that they can run a more standard kernel on a
future wristwatch, or someone chooses Linux over VxWorks for a storage
area network disk drive dongle for lower engineering costs and greater
extensibility.  Incremental savings can add up to important advantages.
 
Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-11  5:44 UTC
  To: acme; +Cc: davem, linux-kernel

>Date: Thu, 11 Jul 2002 00:01:54 -0300
>From: Arnaldo Carvalho de Melo <acme@conectiva.com.br>

>BTW, where are these patches for IPv4 modularisation? I'd love to take a look
>and try it... Adam? Is it available for 2.5.latest?

	I have to catch a plane to Beijing in the morning, I haven't
packed, and the internet connectivity in the rooms there is flaky
(possibly due to their router, which is running a Linux 2.2
kernel, by the way).  So, please excuse my sloppy approach, as this
might otherwise take weeks.

	I have made a diff of linux/{net,drivers/net} against 2.5.25,
which should show my ipv4 modularization changes, although there are a
bunch of other changes that are irrelevant (unrelated changes to
various net device drivers) and some that might be relevant (e.g.,
disintegrating drivers/net/net_init.c, modularizing some media level
network protocols).

	The diff is FTPable from

	ftp://ftp.yggdrasil.com/private/adam/kernel/netdiff-2.5.25.gz

	In case I missed something, I have also placed a complete .tar.gz
kernel snapshot at

	ftp://ftp.yggdrasil.com/private/adam/kernel/linux-2.5.25.ygg.tar.gz

	ipv4 modularization would need to be looked over by the lkml
crowd and cleaned up before being sent to Linus.  I probably got lots
of details wrong.  As I mentioned in a previous email, I thought that
there was a modularized ipv4 already working its way to Linus from the
vger cvs tree (don't know if it still exists), which I presumed would
have had a lot more programmer power already applied to it.  Perhaps
Dave Miller could comment on whether I misunderstood the situation
and, if there were other ipv4 modularization patches floating around,
whether he or anyone else knows their current status.



* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-11  5:07 UTC
  To: rusty; +Cc: linux-kernel

On 2002-07-11 2:48:30, Rusty Russell <rusty@rustcorp.com.au> wrote:
>On Thu, 4 Jul 2002 10:24:11 -0700
>"Adam J. Richter" <adam@yggdrasil.com> wrote:
>>       The system that I am composing this email on has 1.1MB of
>> modules and does not have sound drivers loaded.  It has ipv4 and a
>> number of other facilities modularized that are not modules in the
>> stock kernels.  Every system that I use has a configuration like this.
>> With a lower per-module overhead, I would be more inclined to try to
>> modularize other facilities and break up some larger modules into
>> smaller ones, in the case where there is substantial code that is not
>> needed for some configurations.
>
>For God's sake, WHY?  Look at what you're doing to your TLB (and if you
>made IPv4 a removable module, I'll bet real money you have a bug unless
>you are *very* *very* clever).

	My motivation in modularizing ipv4 was to be able to squeeze more
drivers onto a boot floppy for CDs or hard disks and have that kernel
still be able to continue on to bring up networking later (and to avoid
maintaining a different kernel binary).  Ultimately, I would like to
see CONFIG_NET modularized, if only to reduce the time spent reading
the floppy.

	I have deliberately not fixed some reference count problems in
my ipv4.o module right now because I'm pretty sure lots of things would
break if I tried removing it.  I did write a module_exit function, but
I never tried turning off the reference counting and executing it.

	I also was under the impression that Dave Miller had a modularized
ipv4 in a "vger cvs" kernel (remember them?), so I assumed that some
modularization of ipv4 was working its way to Linus.

	About translation lookaside buffer (TLB) misses, I was considering
breaking down these large modules mostly after the optimizations that
I wishfully described later in my posting:

| Then somebody changes allocation of
| modules that are less than a page to use kmalloc(GFP_HIGHMEM) instead
| of vmalloc (~30% of modules on my system are already this small).
| Then somebody figures out a way for vmalloc'ed areas larger than a page
| that do not need page alignment to sometimes start in the unused last
| page of another vmalloc'ed area.

	In that case, it's a much more empirical question whether
eliminating large chunks of unused code brings the code that does run
into the same page more often than splitting the module causes code
that was in the same page to be split into two different pages,
especially if there is a reasonable chance that that code is
going to be loaded into a location that shares a page that would already
be in the TLB.

	Come to think of it, if modules do not have to occupy full pages,
you could perhaps add a "module affinity" so that modules that reference
each other would be more likely to end up sharing a page.  Module loading
happens tens of times a day, if that.  Inter-module calls can happen a
zillion times per second.  So, who knows, it might be worth the complexity,
and it could be done in insmod.

	Dave Miller's proposal to use 4MB pages for modules is an
interesting alternative, but, isn't kmalloc()'ed memory already
in the kernel's big page?  If so, then using that for small modules
would have the same effect for at least those modules, and I believe
that kmalloc is set up to handle up to 128kB.


* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-03 15:53 UTC
  To: kaos; +Cc: linux-kernel

On Wed, 03 Jul 2002 22:27:33 +1000, Keith Owens wrote:
>On Wed, 3 Jul 2002 00:31:35 -0700, 
>"Adam J. Richter" <adam@yggdrasil.com> wrote:
>>      As individual space optimizations go, 4% is respectable,
>>especially for something that has no cost

>It is not at no cost.  Getting 4% requires arch dependent code to
>handle all the tables that are affected by partial text removal.  I can
>get 2% for nothing by discarding data.init.  Discarding text.init is a
>lot harder.
[...]
>The problem is the partial removal of code when there are tables that
>point to _all_ the code.  Partial code removal requires a lot of work
>to adjust every table that refers to code and correct them.  To make it
>worse, the tables are arch specific.  Most architectures have
>__ex_table, with different formats for each arch.  Some have unwind
>data, always arch dependent format.  MIPS has dbe.

>Data is not referenced by any of these tables so a partial discard of
>data is easy, no side effects to worry about.

	OK, I agree that anyone wanting to implement discarding of
some module init sections would be best off to start with .init.data.

	I don't know enough about the formats of these tables right now
to really understand the best way to handle them, but I suspect that
the simplest approach might be a mechanism where copy_*_user and the like
could generate assembler that does a .pushsection to a different section
depending on the current section, so you could have "__ex_table" and
".init.__ex_table", etc.  Then it might be possible to deal with these
sections in a way that is not architecture specific, and be able
to discard the obsolete parts of these tables after the init code
has completed.  However, this would probably require a gas or gcc
extension.
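To make Keith's objection concrete, here is a sketch of the compaction a single mixed table would need once the init text is freed. The two-word entry layout is modelled on the x86 __ex_table format (faulting instruction address, fixup address) and is an assumption; other architectures, and the unwind and dbe tables, differ. With separate ".init.__ex_table"-style sections as proposed above, the init entries would instead be freed whole and no compaction would be needed.

```c
#include <stddef.h>

/* Assumed entry layout, modelled on the x86 exception table. */
struct ex_entry {
    unsigned long insn;   /* address of the instruction that may fault */
    unsigned long fixup;  /* address to resume at after the fault */
};

/* Remove every entry whose faulting instruction lies in the discarded
 * init text [init_start, init_end), compacting in place and keeping the
 * original order (so a sorted table stays sorted). Returns the new length. */
static size_t ex_table_drop_range(struct ex_entry *table, size_t n,
                                  unsigned long init_start,
                                  unsigned long init_end)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (table[i].insn >= init_start && table[i].insn < init_end)
            continue;   /* points into freed code: must never match again */
        table[kept++] = table[i];
    }
    return kept;
}
```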

[...]
>>	As I understand it, __ex_table is just for copy_{to,from}_user,
>>which would almost never be done from init sections

>__ex_table is used for any code that requires recovery.  Mainly
>copy..user but not exclusively.

>>The core kernel already deals with the same issue.

>It does not.  There is no code to adjust any tables after discarding
>kernel __init sections.  We rely on the fact that the discarded kernel
>area is not reused for executable text.

	Come to think of it, if the core kernel's .text.init pages could
later be vmalloc'ed for module .text section, then I think you may have
found a potential kernel bug.


* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-03  7:31 UTC
  To: kaos; +Cc: linux-kernel

On Wed, 03 Jul 2002 15:01:53 +1000, Keith Owens <kaos@ocs.com.au> wrote:
>On Mon, 1 Jul 2002  9:12:56 -0700, 
>"Adam J. Richter" <adam@yggdrasil.com> wrote:
>>
>>	As an extreme illustration, imagine a module with 4095 bytes
>>of non-init data and 2 bytes of init data.  With the .init section loaded,
>>the module will occupy two pages.  Freeing the .init section will free
>>an entire page, making 4096 bytes available to the system, even though
>>only two bytes were in the .init section.

>Agreed, so let's look at some real figures.  The tar ball below contains

>  A patch against kernel 2.5.24 to use init sections for module code
>  and data.

>  A patch against modutils 2.4.16 to disable error checks.  We are not
>  loading the modules, just getting data about their size.

>  A Perl script to read the output from the patched insmod and work out
>  what would be saved by discarding init sections.

>  Two reports from running the script against 2.5.24 with everything
>  that will build as a module.  One report is from discarding both code
>  and data.init, the other report is discarding just data.init.


	Cool.  Out of curiosity, is there some reason you need a
patched version of modutils for extracting this information, rather
than reading the output of "objdump --section-headers"?


[...]
>The total saving over all 2.5.24 modules is 4% of the total module
>sizes, rounded to page boundaries.

      As individual space optimizations go, 4% is respectable,
especially for something that has no cost, helps detect bugs and
simplifies the kernel.  It is hard to think of many potential
other space improvements that would be as effective, especially as a
function of implementation effort.  In comparison, my vmlinux is
5% init sections.  So, if init sections are worth it for the
core kernel, they should be worth it for modules.

>Most of that saving comes from a few modules.

	This makes me wonder if __init procedures are not being
aggressively identified.  I wonder if people would use __init a little
more if they knew they would get the benefit of it in the module case.
Perhaps someday someone will write a tool to identify procedures that
are only called from init sections.


>There is a lot of arch dependent work required to adjust the in-module
>tables to correctly record which code has been discarded.  If the
>tables are not adjusted then we run the risk of applying unwind or
>exception recovery to the wrong areas.

>I don't see that the complexity required to adjust the arch dependent
>tables is worth the small saving.

	I don't follow you.  Right now, I don't think one would have
to write any new kernel code to load init sections and the non-init
sections as two separate kernel modules, but I'm probably
missing something.

[..]

>>       It would also be possible to achieve space savings for modules
>>with non-init text+data+bss sizes smaller than a page by allocating
>>their space with kmalloc(...,__GFP_HIGHMEM) instead of vmalloc.

>That requires kfree() but kfree does not unmap the area.  Any buggy
>code that accesses the module after rmmod (which is the main problem
>with module unload) will not be detected.  vfree unmaps the entire
>module on removal.  An oops to detect buggy code is better than a
>silent data corruption.

	I do not believe that there is any guarantee that a subsequent
vmalloc() will not remap the same virtual addresses, and I do not believe
that there is any guarantee that a kfree'd area will remain mapped.  So,
in both cases, there are no guarantees.

	Kernel modules have been a way of life for me for years, and I
don't think I've ever caught a kernel bug by the mechanism that you
describe.  However, I see no harm in having a debugging option that
always vmalloc'ed kernel modules.  This facility could be entirely
configurable from user level by having insmod allocate a module of
*exactly* one page for modules that were less than a page (since you
would only want to kmalloc modules that were *less* than one page).
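A sketch of the allocation policy described above. PAGE_SIZE is assumed to be 4096, and both functions are hypothetical, not existing insmod or kernel code. The debugging mode needs no kernel support: insmod simply asks for a full page, which pushes the module onto the vmalloc path.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 kB pages */

enum module_alloc { ALLOC_KMALLOC, ALLOC_VMALLOC };

/* Hypothetical insmod side: the size it requests from the kernel.
 * In debugging mode, a sub-page module is padded to exactly one page. */
static size_t module_request_size(size_t image_size, bool force_vmalloc)
{
    if (force_vmalloc && image_size < SKETCH_PAGE_SIZE)
        return SKETCH_PAGE_SIZE;
    return image_size;
}

/* Hypothetical kernel side: only requests strictly smaller than a page
 * are kmalloc'ed; anything else is vmalloc'ed (and unmapped on removal). */
static enum module_alloc module_allocator(size_t request_size)
{
    return request_size < SKETCH_PAGE_SIZE ? ALLOC_KMALLOC : ALLOC_VMALLOC;
}
```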


>>       Here is what I have in mind.  I believe that removal of .init
>>sections could be implemented entirely in user land (aside from
>>removing the include/linux/init.h code that disables init sections for
>>modules).  Insmod would allocate two kernel modules, one for the init
>>sections and the other for the regular sections.  Insmod would resolve
>>references between the two sections.  The temporary module for the
>>init sections would be loaded first, with no initialization routine.
>>The module with the real data would be loaded second, and would run
>>the initialization routine (even if the initialization routine were in
>>the temporary init module).  When the initialization routine
>>completed, regardless of success or failure, the temporary init module
>>would be unloaded.

>I looked at that several years ago and discarded the idea.  There may
>be references from the init code/data to the main code/data. Those
>references cannot be resolved until the second module has known
>addresses, which requires insmod to keep track of two modules at once
>before either can be loaded.

	I do not understand how this is a problem.  As far as I know,
there is nothing preventing one from doing two create_module calls
followed by two init_module calls, so there should be no problem
allocating the kernel modules.  The init module would be loaded first,
and would not run any initialization routine.  So, both modules would
be in kernel memory before any code was run.

>It also requires insmod to split the tables that refer to the init
>code.  For example, insmod would have to separate __ex_table and
>.modinfo data according to which sub-module each entry referred to.

	As I understand it, __ex_table is just for copy_{to,from}_user,
which would almost never be done from init sections, so it can go
in the non-init section.  The core kernel already deals with the same
issue.

	The .modinfo section is not something that would be loaded
into kernel memory.  The MODULE_PARM entries may refer to locations
in either kernel module, but I don't see how that is a problem.


>All things considered, loading as two modules is too much modutils work
>and maintenance for too little gain.

	Obviously it's not for me to tell you to write software for
me.  I just hope you'll accept a good patch if someone develops one.



* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-01 17:20 UTC
  To: jlnance; +Cc: linux-kernel

On Mon Jul  1 10:07:38 PDT 2002, Jim L. Nance wrote:
>On Mon, Jul 01, 2002 at 09:12:56AM -0700, Adam J. Richter wrote:
>>       As an extreme illustration, imagine a module with 4095 bytes
>> of non-init data and 2 bytes of init data.  With the .init section loaded,
>> the module will occupy two pages.  Freeing the .init section will free
>> an entire page, making 4096 bytes available to the system, even though
>> only two bytes were in the .init section.
>
>Surely we can do better and just not generate .init sections for modules
>where the size would be smaller than a page.  Is binutils capable of doing
>this given the proper linker script?

	I wasn't talking specifically about modules smaller than a
page in that paragraph.  I was talking about modules where the non-init
section ends toward the end of a page and appending the init
section would make it end more toward the beginning of a (different) page.

	I also wasn't talking about modifying binutils.  binutils
already works fine with the present .text.init, .data.init, etc.
sections used in compiled-in kernel .o files (see include/linux/init.h).
I was talking about leaving some of this enabled for modules, and how
insmod could be changed to support it.


* Re: Rusty's module talk at the Kernel Summit
From: Adam J. Richter @ 2002-07-01 16:12 UTC
  To: kaos; +Cc: linux-kernel

Keith Owens wrote:
>Since modules are always allocated on page boundaries, discarding init
>sections is only a win if it reduces the final size of the module from
>m to m-n pages.  So far the pain of loading in multiple areas and
>adjusting the associated arch dependent tables after discard has
>outweighed any gain from discarding the init sections from modules.

	That would average out for modules larger than a page if the
distribution of the non-init sections modulo 4096 (or whatever
PAGE_CACHE_SIZE is on your architecture) is basically uniform.
You would be just as likely to free more bytes than the size of the .init
sections as a result of page granularity as to free fewer bytes.

	As an extreme illustration, imagine a module with 4095 bytes
of non-init data and 2 bytes of init data.  With the .init section loaded,
the module will occupy two pages.  Freeing the .init section will free
an entire page, making 4096 bytes available to the system, even though
only two bytes were in the .init section.
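The page arithmetic behind this example, as a sketch assuming 4096-byte pages and a module that occupies whole pages before and after the init sections are freed. It also shows Keith's side of the argument: a module of 4000 bytes plus 2 bytes of init data frees nothing.

```c
#include <stddef.h>

#define SKETCH_PAGE 4096UL   /* assumption: 4 kB pages */

/* Number of whole pages needed to hold `bytes`. */
static size_t pages_for(size_t bytes)
{
    return (bytes + SKETCH_PAGE - 1) / SKETCH_PAGE;
}

/* Bytes returned to the system when init_bytes of init sections are
 * discarded from a module whose other sections total core_bytes. */
static size_t bytes_freed_by_discard(size_t core_bytes, size_t init_bytes)
{
    return (pages_for(core_bytes + init_bytes) - pages_for(core_bytes))
           * SKETCH_PAGE;
}
```

With bytes_freed_by_discard(4095, 2) the result is a full 4096 bytes; with bytes_freed_by_discard(4000, 2) it is 0.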

	On the linux-2.5.24 x86 machine on which I am composing this
email, 654 out of 983 modules (two thirds) have text+data+bss larger than
4096 bytes.  The byte count of these modules modulo 4096 is actually a bit
heavier on the low end, which bodes well for saving space by releasing .init
sections:

Module text+data+bss
size modulo 4096	 # of modules

0000...0511              102
0512...1023              108
1024...1535               81
1536...2047               76
2048...2559               71
2560...3071               90
3072...3583               69
3584...4095               57

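A sketch of how a table like the one above can be tabulated, given per-module text+data+bss sizes (for example, summed from the output of size(1); that extraction step is not shown). The bucket counts in the table sum to 654, matching the number of modules larger than 4096 bytes.

```c
#include <stddef.h>

#define BUCKETS 8
#define BUCKET_WIDTH 512UL

/* For modules larger than one 4096-byte page, count the size of the
 * last, partly filled page (size modulo 4096) in 512-byte buckets.
 * Returns how many modules were counted. */
static size_t bucket_tail_sizes(const unsigned long *sizes, size_t n,
                                size_t counts[BUCKETS])
{
    size_t counted = 0;

    for (size_t b = 0; b < BUCKETS; b++)
        counts[b] = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] <= 4096)
            continue;   /* sub-page modules are not in the table */
        counts[(sizes[i] % 4096) / BUCKET_WIDTH]++;
        counted++;
    }
    return counted;
}
```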

	It would also be possible to achieve space savings for modules
with non-init text+data+bss sizes smaller than a page by allocating
their space with kmalloc(...,__GFP_HIGHMEM) instead of vmalloc.  This
would require loading the init and non-init parts as separate modules,
which would happen if this were implemented in what I regard as the
"easy" way, a way that would only delete lines from the current kernel
(but add code to insmod).

	Here is what I have in mind.  I believe that removal of .init
sections could be implemented entirely in user land (aside from
removing the include/linux/init.h code that disables init sections for
modules).  Insmod would allocate two kernel modules, one for the init
sections and the other for the regular sections.  Insmod would resolve
references between the two sections.  The temporary module for the
init sections would be loaded first, with no initialization routine.
The module with the real data would be loaded second, and would run
the initialization routine (even if the initialization routine were in
the temporary init module).  When the initialization routine
completed, regardless of success or failure, the temporary init module
would be unloaded.
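A user-space model of the load order proposed above, with malloc and free standing in for creating and deleting kernel modules; all names are hypothetical. What it illustrates is that both images are allocated before any code runs, so references in either direction can be resolved before the init routine is called.

```c
#include <stdlib.h>

/* Non-init part: survives after initialization. */
struct core_image {
    int device_count;
};

/* Temporary init part: a reference into the core image plus the
 * initialization routine. */
struct init_image {
    struct core_image *core;
    int (*init_fn)(struct init_image *self);
};

/* Stands in for init code living in the temporary image that
 * writes non-init data. */
static int sketch_module_init(struct init_image *self)
{
    self->core->device_count = 2;
    return 0;
}

/* Returns the surviving core image, or NULL on failure.
 * *init_result receives the init routine's return value, or -1 if the
 * images could not be allocated. */
static struct core_image *load_split_module(int *init_result)
{
    /* 1. Allocate both images before any code runs (two create_module
     *    calls), so every address on both sides is known. */
    struct core_image *core = malloc(sizeof *core);
    struct init_image *init = malloc(sizeof *init);

    *init_result = -1;
    if (core == NULL || init == NULL) {
        free(core);
        free(init);
        return NULL;
    }
    core->device_count = 0;

    /* 2. Resolve references between the images (init -> core here). */
    init->core = core;
    init->init_fn = sketch_module_init;

    /* 3. Run the initialization routine, even though it belongs to the
     *    temporary init image. */
    *init_result = init->init_fn(init);

    /* 4. Drop the temporary init image whether init succeeded or not. */
    free(init);

    if (*init_result != 0) {
        free(core);     /* a failed init leaves nothing behind */
        return NULL;
    }
    return core;
}
```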


* Rusty's module talk at the Kernel Summit
From: Keith Owens @ 2002-07-01  8:45 UTC
  To: linux-kernel; +Cc: rusty

Rusty at Kernel Summit:

  "Keith is not angry yet, but he will be if he hears some of the
  things I am going to say".
  
Having heard the talk, I am not angry, but some corrections are
required.


=== inter_module_{un}register

inter_module_unregister does BUG() because inter_module_register and
unregister _must_ be matching pairs.  The BUG() is to catch coding
errors, i.e. a run time check to ensure that the interface is being
used correctly.  Rusty, was that point 4 or 5 on your "nice interface"
list?

The only way that inter_module_register can fail to register the
interface is if kmalloc fails, hence the check and the use of
kmalloc_failed.  Anything else is a programmer error, hence BUG().

Registering two blobs with the same name is also a programming error,
so that fails as well.  Blob names must be unique.

The checks and BUG calls are done in inter_module_{un}register because
they must be done for all users of this interface.  The alternative was
assuming that every caller would check for their own coding errors.
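A user-space model of the rules listed above, with assert() standing in for BUG() and malloc for kmalloc. It is a sketch of the semantics as described, not the kernel's implementation, and the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct sketch_ime {
    const char *name;
    const void *userdata;
    struct sketch_ime *next;
};

static struct sketch_ime *sketch_ime_list;
static bool sketch_kmalloc_failed;

static void sketch_im_register(const char *name, const void *userdata)
{
    struct sketch_ime *e;

    /* Duplicate names are a programming error (BUG() in the kernel). */
    for (e = sketch_ime_list; e != NULL; e = e->next)
        assert(strcmp(e->name, name) != 0);

    e = malloc(sizeof *e);
    if (e == NULL) {
        /* The only legitimate failure: remember it for unregister. */
        sketch_kmalloc_failed = true;
        return;
    }
    e->name = name;
    e->userdata = userdata;
    e->next = sketch_ime_list;
    sketch_ime_list = e;
}

static void sketch_im_unregister(const char *name)
{
    for (struct sketch_ime **pp = &sketch_ime_list; *pp != NULL;
         pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            struct sketch_ime *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
    /* Not registered: acceptable only if an earlier registration failed
     * to allocate; otherwise register/unregister were not paired. */
    assert(sketch_kmalloc_failed);
}

static const void *sketch_im_lookup(const char *name)
{
    for (struct sketch_ime *e = sketch_ime_list; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->userdata;
    return NULL;
}
```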


=== Discarding init sections.

Discarding init sections is relatively easy, just position the sections
where they can be freed after module_init().  Ensuring that the
associated tables in the module such as exception lists, MIPS dbe, ia64
unwind etc. are updated to reflect that some code/data that used to
exist no longer exists is a lot harder.

Since modules are always allocated on page boundaries, discarding init
sections is only a win if it reduces the final size of the module from
m to m-n pages.  So far the pain of loading in multiple areas and
adjusting the associated arch dependent tables after discard has
outweighed any gain from discarding the init sections from modules.


=== modversions

Keep a "list of symbols and their versions and the in kernel module
linker matches them up".  That will not work.  The whole point of
modversions is to identify the ABI used to compile the module, at the
time it was compiled, not when it is loaded.

IOW, the ABI version information must be bound to the module at the
time the module is built, not when it is loaded.  Hence the mangling of
exported symbols at compile time, not at load time.
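A sketch of the compile-time binding described above: the versioned name is derived from the declaration text when the module is built, so a module compiled against a different ABI asks for a different symbol name. It uses a plain CRC-32 over a declaration string and a suffix modelled on the "_R" plus eight hex digits form used by 2.4 modversions; the real genksyms checksums a fully expanded type description, so its values will not match these.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bitwise. */
static uint32_t crc32_str(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (; *s; s++) {
        crc ^= (unsigned char)*s;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Build a versioned name such as "foo_Rxxxxxxxx" from a symbol name and
 * the text of its declaration, as would happen at compile time. */
static void mangle_symbol(const char *name, const char *declaration,
                          char *out, size_t outlen)
{
    snprintf(out, outlen, "%s_R%08lx", name,
             (unsigned long)crc32_str(declaration));
}
```

Changing the declaration from "int foo(long)" to "int foo(char)" changes the mangled name, so a stale module fails to resolve the symbol at load time instead of oopsing later.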

The Makefiles list the objects that export symbols as a build
optimization.  Because there is no way of telling where an exported
symbol is used and because the exported symbols must reflect the
compile time ABI, kbuild has to calculate the modversion data at the
start of compilation, before anything else can be compiled.

kbuild could 'fgrep -rl EXPORT_SYMBOL .' to get the list of exporting
objects instead of manually specifying them, but that would slow down
every build.  The existing kbuild system is full of these little
optimizations to make it run faster, e.g. only descend into a directory
if CONFIG_FOO is set.  I agree that they are a pain in the neck, but
you should see how much slower the existing build system runs without
them.

kbuild 2.5 does away with almost all of the hand coded build
optimizations and still manages to be as fast or faster than the
existing build system.  Apparently that does not count for most people,
they are happy with a build system that requires manual optimization to
get decent speed.

"md5sum over the source code, .config etc." to verify if a module and
kernel belong together is pointless.  One advantage of modversions is
that the version data allows a module to be loaded into any compatible
kernel as long as the ABI has not changed, so a checksum of the kernel
source is no good.  Changing the config does not necessarily invalidate
a module, turning on CONFIG_DRIVER_FOO only affects the FOO module, not
every module, so a checksum of .config is no good either.

BTW, I have a design for doing modversions correctly that will not
require manual entries in the Makefiles, will not require name
mangling, will provide better error checking than the current
modversions and provides better error messages for users.

It not only detects a mismatch between SMP and non-SMP but it also
detects all the other build differences that slip past the current
modversion algorithm.  It is cheap enough and accurate enough that
modversions can be the default, this will improve error detection at
the time the module is loaded instead of some random oops later.  Only
one problem, it requires kbuild 2.5, so you will not get this design.


=== MOD_INC_USE_COUNT vs try_inc_mod_count

MOD_INC_USE_COUNT is perfectly safe within a module init routine.
sys_init_module() bumps the use count temporarily around the call to
the module init routine.

MOD_INC_USE_COUNT within a module but outside the init routine has a
race between entering the module on one cpu and freeing the module on
another.  However that race also affects try_inc_mod_count, or any
other method that adjusts the use count from within the object itself.

If you solve the problem of lack of reference counting for code
executing with use count == 0 (including all the preempt hassles) then
both MOD_INC_USE_COUNT and try_inc_mod_count are safe.  AFAICT there is
no need to change every module's use of MOD_INC_USE_COUNT for its own
use count.
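A user-space model of try_inc_mod_count as described, using a C11 atomic_flag as a stand-in for the kernel's unload lock; the names are illustrative. The check only protects a caller that is outside the module: code already running inside a module whose use count is 0 has no such guarantee, which is the race described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_module {
    int use_count;
    bool deleted;           /* set once unloading has started */
};

/* Stand-in for the kernel's unload lock. */
static atomic_flag sketch_unload_lock = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
    while (atomic_flag_test_and_set(&sketch_unload_lock))
        ;                   /* spin */
}

static void sketch_unlock(void)
{
    atomic_flag_clear(&sketch_unload_lock);
}

/* Called by code OUTSIDE the module before calling into it: take a
 * reference unless unloading has already begun.  If this were called by
 * code inside the module, another CPU could already have unloaded it
 * between entry and this call; the lock cannot close that window. */
static bool sketch_try_inc_mod_count(struct sketch_module *mod)
{
    bool ok;

    sketch_lock();
    ok = !mod->deleted;
    if (ok)
        mod->use_count++;
    sketch_unlock();
    return ok;
}

static void sketch_dec_mod_count(struct sketch_module *mod)
{
    sketch_lock();
    mod->use_count--;
    sketch_unlock();
}

/* rmmod side: only a module nobody holds may be marked deleted. */
static bool sketch_try_unload(struct sketch_module *mod)
{
    bool ok;

    sketch_lock();
    ok = (mod->use_count == 0 && !mod->deleted);
    if (ok)
        mod->deleted = true;
    sketch_unlock();
    return ok;
}
```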


=== Pointer trampolines

A couple of years ago I looked at putting a trampoline around function
calls that entered a module.  The aim was to bump the use count
_before_ entering the code, removing the unload race.  However the
implementation sucked.  Each architecture needed its own trampoline
code.  Passing parameters from the trampoline to the real code was a
nightmare, especially on ia64 where the hardware says how many
parameters are being passed.

gcc __builtin_apply and __builtin_return would have helped, but they do
not work on all architectures.  I gave up trampolines for module entry
as a nice idea that was just too difficult to implement and maintain.
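A sketch of a single trampoline in C, with hypothetical names. Writing one by hand for one fixed signature is easy; the difficulty described above is doing it generically, forwarding arbitrary arguments on every architecture. A real version would also need try_inc_mod_count-style checking rather than a plain increment.

```c
/* Hypothetical use count for one module. */
static int sketch_use_count;

/* Stands in for a function inside the module. */
static int module_ioctl(int cmd, long arg)
{
    return cmd + (int)arg;
}

/* Trampoline living OUTSIDE the module: callers jump here instead of to
 * module_ioctl, so the count is raised before any module code runs.
 * It only works for this one signature. */
static int trampoline_module_ioctl(int cmd, long arg)
{
    int ret;

    sketch_use_count++;     /* real code would need an unload-safe check */
    ret = module_ioctl(cmd, arg);
    sketch_use_count--;     /* dropped only after module code has returned */
    return ret;
}
```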



end of thread

Thread overview: 48+ messages
2002-07-04 17:24 Rusty's module talk at the Kernel Summit Adam J. Richter
2002-07-11  2:48 ` Rusty Russell
2002-07-11  2:45   ` David S. Miller
2002-07-11  3:30     ` Alexander Viro
2002-07-11  5:13       ` Rusty Russell
2002-07-11  6:37         ` Alexander Viro
2002-07-11  7:14           ` Rusty Russell
2002-07-11 10:54             ` Daniel Phillips
2002-07-11 17:37               ` Roman Zippel
2002-07-11 18:01                 ` Thunder from the hill
2002-07-11 18:50                   ` Daniel Phillips
2002-07-17 18:16                   ` bill davidsen
2002-07-17 19:35                     ` Thunder from the hill
2002-07-11 18:28                 ` Daniel Phillips
2002-07-11 19:48                   ` Roman Zippel
2002-07-11 20:29                     ` Daniel Phillips
2002-07-11 23:37                     ` Alexander Viro
2002-07-12  1:54                       ` Daniel Phillips
2002-07-12  3:53                       ` Rusty Russell
2002-07-12  6:49                         ` Kai Henningsen
2002-07-12 11:30                       ` Roman Zippel
2002-07-12  0:00               ` Rusty Russell
2002-07-12  6:57                 ` Kai Henningsen
2002-07-19  0:19           ` Richard Gooch
2002-07-22 16:29             ` Alexander Viro
2002-07-23  4:37               ` Richard Gooch
2002-07-11  4:02     ` Cort Dougan
2002-07-11  4:19       ` Arnaldo Carvalho de Melo
2002-07-11  4:46       ` Cort Dougan
2002-07-11  2:55   ` Arnaldo Carvalho de Melo
2002-07-11  3:01     ` Arnaldo Carvalho de Melo
2002-07-11  5:16     ` Rusty Russell
  -- strict thread matches above, loose matches on Subject: below --
2002-07-11  5:44 Adam J. Richter
2002-07-11  5:07 Adam J. Richter
2002-07-03 15:53 Adam J. Richter
2002-07-03 17:07 ` Hugh Dickins
2002-07-03 18:46   ` Oliver Neukum
2002-07-03 23:25     ` Keith Owens
2002-07-03 23:09 ` Keith Owens
2002-07-03  7:31 Adam J. Richter
2002-07-03  8:54 ` Rogier Wolff
2002-07-03 12:27 ` Keith Owens
2002-07-03 14:10   ` Keith Owens
2002-07-01 17:20 Adam J. Richter
2002-07-01 16:12 Adam J. Richter
2002-07-01 17:02 ` jlnance
2002-07-03  5:01 ` Keith Owens
2002-07-01  8:45 Keith Owens
