* vmalloc/module_alloc: unable to handle two memory regions
@ 2003-01-31 10:20 Russell King
From: Russell King @ 2003-01-31 10:20 UTC
  To: linux-kernel

On ARM, modules need to either be placed within 32MB of the kernel
image, or we need to build a jump table for code jumps within the
module to reach the main kernel body.
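
The 32MB figure comes from the ARM B/BL encoding: a signed 24-bit word
offset, taken relative to the branch address plus 8.  A minimal sketch of
the reachability test follows; the helper name and the addresses used to
exercise it are hypothetical, not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM B/BL hold a signed 24-bit word offset, so the byte offset
 * ranges from -2^25 to 2^25 - 4, relative to (branch address + 8). */
#define ARM_BRANCH_MIN (-(int64_t)0x02000000)	/* -32MB */
#define ARM_BRANCH_MAX ((int64_t)0x01fffffc)	/* +32MB - 4 */

/* Hypothetical helper: can a B/BL located at 'insn' reach 'target'
 * directly, without going through a jump table? */
static bool arm_branch_reaches(uint32_t insn, uint32_t target)
{
	int64_t offset = (int64_t)target - ((int64_t)insn + 8);

	if (offset & 3)		/* ARM branch targets are word aligned */
		return false;
	return offset >= ARM_BRANCH_MIN && offset <= ARM_BRANCH_MAX;
}
```

With an assumed kernel text address of 0xc0008000, a branch from
0xbf000000 (the module region) is in range, while one from an assumed
vmalloc address of 0xd0000000 is not.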

In order to solve this problem, we created a 16MB region between
TASK_SIZE and PAGE_OFFSET for modules.  Any allocations within this
range are required to be linked into the "vmlist" for /proc/kcore -
this list is also used by vmalloc() and friends.

On ARM, we end up with the following virtual memory layout:

 +----------------------------+ 4GB
   devices
 +----------------------------+
   vmalloc/ioremap
 +----------------------------+ PAGE_OFFSET + RAM_SIZE
   kernel direct-mapped ram
 +----------------------------+ PAGE_OFFSET (3GB) = MODULE_END = 0xc0000000
   module
 +----------------------------+ TASK_SIZE = MODULE_START = 0xbf000000
   user space
 +----------------------------+

This idea was borrowed from x86_64, and appears on the face of it to
be the perfect solution.  However, it can (and does) go horribly wrong.
After loading many modules, I end up with the following vm_struct vmlist:

=== vm list dump ===
area c1555e24 addr bf000000 size 00013000 flags 00000002
area c1555dac addr bf013000 size 00002000 flags 00000001
area c1555d84 addr bf015000 size 00002000 flags 00000001
area c1555d5c addr bf018000 size 00004000 flags 00000002
area c1555974 addr bf01c000 size 00002000 flags 00000002
area c15559ec addr bf01f000 size 00002000 flags 00000002
area c1555c44 addr bf021000 size 00003000 flags 00000002
area c1555dfc addr bf024000 size 00002000 flags 00000002
area c1555e74 addr bf027000 size 0000c000 flags 00000002
area c1307fdc addr bf033000 size 00003000 flags 00000002
area c155585c addr bf036000 size 00002000 flags 00000002
area c1555c94 addr bf038000 size 00002000 flags 00000002
area c15557e4 addr bf03a000 size 00002000 flags 00000002
area c155576c addr bf03c000 size 00002000 flags 00000002
area c1555514 addr bf03e000 size 00002000 flags 00000002
area c155549c addr bf040000 size 00002000 flags 00000001
area c1307f64 addr bf046000 size 00003000 flags 00000002
area c1555cbc addr bf06e000 size 00026000 flags 00000002
area c1555564 addr bf094000 size 01001000 flags 00000001

Pay special attention to the last entry - it starts at 0xbf094000 and
ends at 0xc0095000, so it overlaps the kernel direct-mapped RAM.  What's
more, the entries at 0xbf013000, 0xbf015000, 0xbf040000 and 0xbf094000
(the ones with flags 00000001) are for ioremapped memory, but are placed
in the module region.
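
The overlap can be checked with plain address arithmetic.  This is only a
sketch: the helper names are made up for illustration, and the constants
are the MODULE_START/MODULE_END values from the layout above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODULE_START 0xbf000000u	/* == TASK_SIZE in the layout above */
#define MODULE_END   0xc0000000u	/* == PAGE_OFFSET */

/* First address past the end of the area. */
static uint32_t area_end(uint32_t addr, uint32_t size)
{
	return addr + size;
}

/* True if [addr, addr + size) lies entirely inside the module region. */
static bool in_module_region(uint32_t addr, uint32_t size)
{
	uint32_t end = area_end(addr, size);

	/* "end >= addr" rejects ranges that wrap past 4GB */
	return addr >= MODULE_START && end >= addr && end <= MODULE_END;
}
```

For the last entry, area_end(0xbf094000, 0x01001000) is 0xc0095000, which
is past MODULE_END, so in_module_region() returns false.  The entry before
it (0xbf06e000, size 0x00026000) ends exactly at 0xbf094000 and fits.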

This occurs due to the way get_vm_area() works:

	addr = VMALLOC_START;
...
        for (p = &vmlist; (tmp = *p) ;p = &tmp->next) {
                if ((size + addr) < addr)
                        goto out;
                if (size + addr <= (unsigned long)tmp->addr)	/* A */
                        goto found;
                addr = tmp->size + (unsigned long)tmp->addr;
                if (addr > VMALLOC_END - size)
                        goto out;
        }

Initially, addr is > PAGE_OFFSET.  If the first vmlist entry is 0xbf000000,
size 0x00013000, "A" is obviously false, and we calculate the new addr.
This ends up being 0xbf013000, which is _less than_ VMALLOC_START.  Let's
say, for the sake of argument, that the size being requested is 0x02000000,
and VMALLOC_END is 0xe0000000.

addr + size = 0xc1013000, which overlaps the kernel direct mapped region.
addr itself is still below VMALLOC_START, and it is not greater than
VMALLOC_END - size, so none of the checks reject it.  We end up
allocating this region, and overwriting the kernel direct-mapped
page tables, in this case completely unmapping the kernel from virtual
memory space.
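
The walk-through above can be reproduced in user space.  The following is
a sketch only, not the kernel function: it uses 32-bit integers in place
of pointers, returns 0 where the kernel code does "goto out", leaves out
locking and everything else around the loop, and assumes a VMALLOC_START
of 0xc8000000 (the mail only says it is above PAGE_OFFSET).

```c
#include <stddef.h>
#include <stdint.h>

#define VMALLOC_START 0xc8000000u	/* assumed value, above PAGE_OFFSET */
#define VMALLOC_END   0xe0000000u	/* value used in the example above */

/* Cut-down stand-in for struct vm_struct. */
struct area {
	uint32_t addr;
	uint32_t size;
	struct area *next;
};

/* Mirrors the unpatched loop: returns the chosen address, 0 on failure. */
static uint32_t scan_unpatched(const struct area *vmlist, uint32_t size)
{
	uint32_t addr = VMALLOC_START;
	const struct area *tmp;

	for (tmp = vmlist; tmp; tmp = tmp->next) {
		if ((uint32_t)(size + addr) < addr)	/* wrapped past 4GB */
			return 0;
		if (size + addr <= tmp->addr)		/* A */
			return addr;
		/* An entry below VMALLOC_START drags addr down with it. */
		addr = tmp->size + tmp->addr;
		if (addr > VMALLOC_END - size)
			return 0;
	}
	return addr;
}
```

With a single module entry at 0xbf000000 of size 0x00013000 on the list,
a request for 0x02000000 bytes returns 0xbf013000, matching the numbers
above.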

Oops.

The following should fix this, and needs to be applied to _all_ allocators
which touch the vmlist (I'd rather remove the duplication and have one
common allocation function, but that's a subject for separate discussion):

--- orig/mm/vmalloc.c	Tue Nov  5 12:51:41 2002
+++ linux/mm/vmalloc.c	Fri Jan 31 10:11:22 2003
@@ -212,6 +212,8 @@
 	for (p = &vmlist; (tmp = *p) ;p = &tmp->next) {
 		if ((size + addr) < addr)
 			goto out;
+		if (addr > (unsigned long)tmp->addr)
+			continue;
 		if (size + addr <= (unsigned long)tmp->addr)
 			goto found;
 		addr = tmp->size + (unsigned long)tmp->addr;
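
For illustration, here is the same loop with the two added lines, as a
user-space sketch rather than the kernel code.  It assumes 32-bit
integers in place of pointers, a return value of 0 in place of
"goto out", and a made-up VMALLOC_START of 0xc8000000.

```c
#include <stddef.h>
#include <stdint.h>

#define VMALLOC_START 0xc8000000u	/* assumed value, above PAGE_OFFSET */
#define VMALLOC_END   0xe0000000u	/* value used in the example above */

/* Cut-down stand-in for struct vm_struct. */
struct area {
	uint32_t addr;
	uint32_t size;
	struct area *next;
};

/* Loop with the fix applied: returns the chosen address, 0 on failure. */
static uint32_t scan_patched(const struct area *vmlist, uint32_t size)
{
	uint32_t addr = VMALLOC_START;
	const struct area *tmp;

	for (tmp = vmlist; tmp; tmp = tmp->next) {
		if ((uint32_t)(size + addr) < addr)	/* wrapped past 4GB */
			return 0;
		if (addr > tmp->addr)	/* entry starts below our candidate: skip */
			continue;
		if (size + addr <= tmp->addr)
			return addr;
		addr = tmp->size + tmp->addr;
		if (addr > VMALLOC_END - size)
			return 0;
	}
	return addr;
}
```

With only the module entry at 0xbf000000 on the list, a request for
0x02000000 bytes now returns VMALLOC_START instead of 0xbf013000, because
the module entry is skipped rather than used to recompute addr.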


-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: vmalloc/module_alloc: unable to handle two memory regions
From: Russell King @ 2003-01-31 10:55 UTC
  To: Andrew Morton; +Cc: Linux Kernel List

On Fri, Jan 31, 2003 at 02:48:20AM -0800, Andrew Morton wrote:
> Boggle.
> 
> Isn't this totally abusing get_vm_area?
> 
> What stops an ioremap region from landing in module space?

Exactly the problem.

What's more is that fs/proc/kcore.c:get_kcore_size() also breaks, so
this isn't an acceptable solution.  get_kcore_size wants the module
region to be above PAGE_OFFSET.

In order to place the module in the normal vmalloc space, we end up with
a chicken and egg problem - we need to scan the module from kernel space
to find out how large to make the jump table, but we can't because the
module hasn't been loaded into kernel memory - this is the reason why it
was suggested to go down this route.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: vmalloc/module_alloc: unable to handle two memory regions
From: Andrew Morton @ 2003-01-31 21:04 UTC
  To: Russell King; +Cc: linux-kernel

Russell King <rmk@arm.linux.org.uk> wrote:
>
> On Fri, Jan 31, 2003 at 02:48:20AM -0800, Andrew Morton wrote:
> > Boggle.
> > 
> > Isn't this totally abusing get_vm_area?
> > 
> > What stops an ioremap region from landing in module space?
> 
> Exactly the problem.
> 
> What's more is that fs/proc/kcore.c:get_kcore_size() also breaks, so
> this isn't an acceptable solution.  get_kcore_size wants the module
> region to be above PAGE_OFFSET.
> 
> In order to place the module in the normal vmalloc space, we end up with
> a chicken and egg problem - we need to scan the module from kernel space
> to find out how large to make the jump table, but we can't because the
> module hasn't been loaded into kernel memory - this is the reason why it
> was suggested to go down this route.
> 

Well, could you not do something like:

 +----------------------------+ 4GB
   devices
 +----------------------------+ VMALLOC_END = 0xc2000000
   vmalloc/ioremap
 +----------------------------+ 0xc1000000 + sizeof(linux)
   kernel direct-mapped ram
 +----------------------------+ 0xc1000000
   module
 +----------------------------+ TASK_SIZE = MODULE_START = PAGE_OFFSET =
                                VMALLOC_START = 0xc0000000
   user space
 +----------------------------+

And then arrange for a (start=0xc1000000,len=sizeof(linux)) entry which
describes the kernel itself to be added to the vmlist before anything else?
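
A rough user-space sketch of how the existing first-fit scan would behave
with such an entry on the list.  The names are illustrative, 32-bit
integers stand in for pointers, 0 stands in for "goto out", and the
kernel size (32MB) and upper limit (0xe0000000) used to exercise it are
assumptions, not values from the layout above.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct vm_struct. */
struct area {
	uint32_t addr;
	uint32_t size;
	struct area *next;
};

/* Same first-fit scan as the unpatched loop, with the search window
 * passed in.  Returns the chosen address, or 0 on failure. */
static uint32_t first_fit(const struct area *vmlist, uint32_t start,
			  uint32_t end, uint32_t size)
{
	uint32_t addr = start;
	const struct area *tmp;

	for (tmp = vmlist; tmp; tmp = tmp->next) {
		if ((uint32_t)(size + addr) < addr)	/* wrapped past 4GB */
			return 0;
		if (size + addr <= tmp->addr)	/* fits in the hole below tmp */
			return addr;
		addr = tmp->size + tmp->addr;	/* step past tmp */
		if (addr > end - size)
			return 0;
	}
	return addr;
}
```

With a kernel entry { 0xc1000000, 0x02000000 } at the head of the list
and a scan starting at 0xc0000000, a 0x00013000 byte request lands at
0xc0000000, below the kernel, and a 0x02000000 byte request is pushed to
0xc3000000, past it.  Note that in this sketch a small ioremap request
would still be placed in the hole below the kernel entry.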


