public inbox for linux-kernel@vger.kernel.org
* sparsemem usage
@ 2006-08-02 13:44 moreau francis
  2006-08-02 14:25 ` Andy Whitcroft
  2006-08-02 15:24 ` Alan Cox
  0 siblings, 2 replies; 8+ messages in thread
From: moreau francis @ 2006-08-02 13:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: apw

My board has a really weird memory mapping.

MEM1: 0xc000 0000 - 32 MB
MEM2: 0xd000 0000 - 8 MB
MEM3: 0xd800 0000 - 128 KB

MEM3 has interesting properties, such as speed and security,
and I really need to use it.

I think sparsemem can deal with such a mapping, but I run into an
issue when choosing the section bit size. I chose
SECTION_SIZE_BITS = 17, so that the section size equals the size of
my smallest memory. However, I get a compilation error caused by
this check:

#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
#error Allocator MAX_ORDER exceeds SECTION_SIZE
#endif

I'm not sure I understand why there is such a check. To fix this
I would have to change MAX_ORDER to 6.

Is that the only way to fix it?

Thanks

Francis



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sparsemem usage
  2006-08-02 13:44 sparsemem usage moreau francis
@ 2006-08-02 14:25 ` Andy Whitcroft
  2006-08-02 15:12   ` Re : " moreau francis
  2006-08-02 15:24 ` Alan Cox
  1 sibling, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-08-02 14:25 UTC (permalink / raw)
  To: moreau francis; +Cc: linux-kernel

moreau francis wrote:
> My board has a really weird memory mapping.
> 
> MEM1: 0xc000 0000 - 32 MB
> MEM2: 0xd000 0000 - 8 MB
> MEM3: 0xd800 0000 - 128 KB
> 
> MEM3 has interesting properties, such as speed and security,
> and I really need to use it.
> 
> I think sparsemem can deal with such a mapping, but I run into an
> issue when choosing the section bit size. I chose
> SECTION_SIZE_BITS = 17, so that the section size equals the size of
> my smallest memory. However, I get a compilation error caused by
> this check:
> 
> #if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
> #error Allocator MAX_ORDER exceeds SECTION_SIZE
> #endif
> 
> I'm not sure I understand why there is such a check. To fix this
> I would have to change MAX_ORDER to 6.
> 
> Is that the only way to fix it?

The memory allocator's buddy-location algorithm makes an implicit 
assumption that the memory map is contiguous and valid out to MAX_ORDER, 
i.e. that we can always do relative arithmetic on a page* to find a 
page's buddy.  The allocator never looks outside a MAX_ORDER block, 
aligned to MAX_ORDER in physical pages.  SPARSEMEM's implementation by 
its nature breaks up the mem_map at the section size.  Thus, for the 
buddy system to work, a section must be >= MAX_ORDER in size to 
maintain the contiguity constraint.

However, just because you have a small memory block in your memory map 
doesn't mean that the sparsemem section size needs to be that small to 
match.  If there is any valid memory in a section, that section will be 
instantiated and the valid memory marked within it; any invalid memory 
is marked reserved.  The section size bounds the amount of internal 
fragmentation we can have in the mem_map.  SPARSEMEM, as its name 
suggests, wins biggest when memory is very sparsely populated.  If I am 
reading correctly, your memory is actually contiguous.

-apw



* Re : sparsemem usage
  2006-08-02 14:25 ` Andy Whitcroft
@ 2006-08-02 15:12   ` moreau francis
  2006-08-02 15:36     ` Andy Whitcroft
  0 siblings, 1 reply; 8+ messages in thread
From: moreau francis @ 2006-08-02 15:12 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

Andy Whitcroft wrote:
> The memory allocator's buddy-location algorithm makes an implicit 
> assumption that the memory map is contiguous and valid out to MAX_ORDER, 
> i.e. that we can always do relative arithmetic on a page* to find a 
> page's buddy.  The allocator never looks outside a MAX_ORDER block, 
> aligned to MAX_ORDER in physical pages.  SPARSEMEM's implementation by 
> its nature breaks up the mem_map at the section size.  Thus, for the 
> buddy system to work, a section must be >= MAX_ORDER in size to 
> maintain the contiguity constraint.

thanks for the explanation. But there's still something I'm missing: how
can a MAX_ORDER block be allocated in a memory region whose size is only
128 KB? Can't that be detected by the buddy allocator very early, without
doing any relative arithmetic on a page*?

> However, just because you have a small memory block in your memory map 
> doesn't mean that the sparsemem section size needs to be that small to 
> match.  If there is any valid memory in a section, that section will be 
> instantiated and the valid memory marked within it; any invalid memory 
> is marked reserved.  

ah, ok, but that means pfn_valid() will still return ok for invalid
pages that lie in invalid memory marked as reserved. Isn't that risky?

> The section size bounds the amount of internal 
> fragmentation we can have in the mem_map.  SPARSEMEM, as its name 
> suggests, wins biggest when memory is very sparsely populated. 

sorry, but I don't understand. I would say the sparsemem section size
should be chosen to make the mem_map[] and mem_section[] sizes as small
as possible.

> If I am 
> reading correctly, your memory is actually contiguous.

well, there are big holes in the address space.

thanks

Francis






* Re: sparsemem usage
  2006-08-02 13:44 sparsemem usage moreau francis
  2006-08-02 14:25 ` Andy Whitcroft
@ 2006-08-02 15:24 ` Alan Cox
  2006-08-02 15:33   ` Re : " moreau francis
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2006-08-02 15:24 UTC (permalink / raw)
  To: moreau francis; +Cc: linux-kernel, apw

On Wed, 2006-08-02 at 13:44 +0000, moreau francis wrote:
> #if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
> #error Allocator MAX_ORDER exceeds SECTION_SIZE
> #endif
> 
> I'm not sure I understand why there is such a check. To fix this
> I would have to change MAX_ORDER to 6.
> 
> Is that the only way to fix it?

The kernel hands out memory in groups of blocks using a buddy system.
128K is smaller than one of those blocks, so the kernel cannot handle
it on its own. You need 2MB (if I remember right) granularity for your
sections, but nothing stops you marking all of the 2MB section except
the 128K that exists as "in use".

Alan



* Re : sparsemem usage
  2006-08-02 15:24 ` Alan Cox
@ 2006-08-02 15:33   ` moreau francis
  2006-08-02 16:33     ` Alan Cox
  0 siblings, 1 reply; 8+ messages in thread
From: moreau francis @ 2006-08-02 15:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, apw

Hi Alan !

Alan Cox wrote:
> The kernel allocates memory out using groups of blocks in a buddy
> system. 128K is smaller than one of the blocks so the kernel cannot
> handle this. 

As I wrote to Andy Whitcroft, I would have thought that the kernel
forbids allocation of blocks whose size is greater than the current
memory size. But I know nothing about the buddy allocator, so I trust
you ;)

> You need 2MB (if I remember right) granularity for your

MAX_ORDER is 11 by default. Without changing that, I would say I need
4MB granularity.

> sections, but nothing stops you marking all of the 2MB section except
> the 128K that exists as "in use"

ok. But it will make pfn_valid() return "valid" for pages beyond the
first 128 KB. Won't that have bad consequences later?

thanks

Francis





* Re: Re : sparsemem usage
  2006-08-02 15:12   ` Re : " moreau francis
@ 2006-08-02 15:36     ` Andy Whitcroft
  2006-08-03  9:56       ` Re : " moreau francis
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-08-02 15:36 UTC (permalink / raw)
  To: moreau francis; +Cc: linux-kernel

moreau francis wrote:
> Andy Whitcroft wrote:
>> The memory allocator's buddy-location algorithm makes an implicit 
>> assumption that the memory map is contiguous and valid out to MAX_ORDER, 
>> i.e. that we can always do relative arithmetic on a page* to find a 
>> page's buddy.  The allocator never looks outside a MAX_ORDER block, 
>> aligned to MAX_ORDER in physical pages.  SPARSEMEM's implementation by 
>> its nature breaks up the mem_map at the section size.  Thus, for the 
>> buddy system to work, a section must be >= MAX_ORDER in size to 
>> maintain the contiguity constraint.
> 
> thanks for the explanation. But there's still something I'm missing: how
> can a MAX_ORDER block be allocated in a memory region whose size is only
> 128 KB? Can't that be detected by the buddy allocator very early, without
> doing any relative arithmetic on a page*?

When allocating we do not have a problem, as we simply pull a free page 
off the appropriately sized free list.  It's when freeing that we have 
an issue: all the allocator has to work with is the page you are 
freeing.  As the MAX_ORDER block size is > 128K, we can get into the 
situation where all but one page is free.  When we free that last page 
we then need to merge this 128KB block with its buddy, if that buddy is 
free.  To tell whether the buddy is free we have to look at its page*, 
so that page* must also exist for the check to work.

>> However, just because you have a small memory block in your memory map 
>> doesn't mean that the sparsemem section size needs to be that small to 
>> match.  If there is any valid memory in a section, that section will be 
>> instantiated and the valid memory marked within it; any invalid memory 
>> is marked reserved.  
> 
> ah, ok, but that means pfn_valid() will still return ok for invalid
> pages that lie in invalid memory marked as reserved. Isn't that risky?

pfn_valid() will indeed say 'ok'.  But that is defined only to mean that 
it is safe to look at the page* for that page.  It says nothing else 
about the page itself.  Pages which are reserved never get freed into 
the allocator, so they are never there to be allocated and we should 
not be referring to them.

>> The section size bounds the amount of internal 
>> fragmentation we can have in the mem_map.  SPARSEMEM, as its name 
>> suggests, wins biggest when memory is very sparsely populated. 
> 
> sorry, but I don't understand. I would say the sparsemem section size
> should be chosen to make the mem_map[] and mem_section[] sizes as small
> as possible.

There are tradeoffs here.  The smaller the section size, the less 
internal fragmentation there will be.  However, there will also be more 
sections, more space used tracking them, and more cachelines touched.  
Also, as we have seen, we can't have anything in the allocator bigger 
than the section size, which constrains the lower bound on the section 
size.  Finally, on 32-bit systems the overall number of sections is 
bounded by the available space in the fields section of the page* flags 
field.

If your system has 256 1GB sections and one 128KB section, then it could 
well make sense to have a 1GB section size, or perhaps a 256MB section 
size, as you are only wasting space in the last section.

> 
>> If I am 
>> reading correctly, your memory is actually contiguous.
> 
> well, there are big holes in the address space.
> 

I read that as saying there was a major gap up to 3GB and that it was 
contiguous from there; but then I was guessing at the units :).

-apw


* Re: Re : sparsemem usage
  2006-08-02 15:33   ` Re : " moreau francis
@ 2006-08-02 16:33     ` Alan Cox
  0 siblings, 0 replies; 8+ messages in thread
From: Alan Cox @ 2006-08-02 16:33 UTC (permalink / raw)
  To: moreau francis; +Cc: linux-kernel, apw

On Wed, 2006-08-02 at 15:33 +0000, moreau francis wrote:
> > sections, but nothing stops you marking all of the 2MB section except
> > the 128K that exists as "in use"
> 
> ok. But it will make pfn_valid() return "valid" for pages beyond the
> first 128 KB. Won't that have bad consequences later?

Mapping out parts of a section is quite normal - think of the 640K to
1MB hole in the PC memory space.

Alan



* Re : Re : sparsemem usage
  2006-08-02 15:36     ` Andy Whitcroft
@ 2006-08-03  9:56       ` moreau francis
  0 siblings, 0 replies; 8+ messages in thread
From: moreau francis @ 2006-08-03  9:56 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

Andy Whitcroft wrote:
> When allocating we do not have a problem, as we simply pull a free page 
> off the appropriately sized free list.  It's when freeing that we have 
> an issue: all the allocator has to work with is the page you are 
> freeing.  As the MAX_ORDER block size is > 128K, we can get into the 
> situation where all but one page is free.  When we free that last page 
> we then need to merge this 128KB block with its buddy, if that buddy is 
> free.  To tell whether the buddy is free we have to look at its page*, 
> so that page* must also exist for the check to work.

Maybe in the sparsemem code we could mark a well-chosen page as reserved
if the size of a memory region is < MAX_ORDER. That way the buddy
allocator would never have to free a 128 KB block...

> pfn_valid() will indeed say 'ok'.  But that is defined only to mean that 
> it is safe to look at the page* for that page.  It says nothing else 
> about the page itself.  Pages which are reserved never get freed into 
> the allocator, so they are never there to be allocated and we should 
> not be referring to them.

wouldn't it be safer to mark these pages as "invalid" instead of
"reserved", with a special value stored in mem_map[]?

> There are tradeoffs here.  The smaller the section size, the less 
> internal fragmentation there will be.  However, there will also be more 
> sections, more space used tracking them, and more cachelines touched.  
> Also, as we have seen, we can't have anything in the allocator bigger 
> than the section size, which constrains the lower bound on the section 
> size.  Finally, on 32-bit systems the overall number of sections is 
> bounded by the available space in the fields section of the page* flags 
> field.

thanks for that.

> If your system has 256 1GB sections and one 128KB section, then it could 
> well make sense to have a 1GB section size, or perhaps a 256MB section 
> size, as you are only wasting space in the last section.

> I read that as saying there was a major gap up to 3GB and that it was 
> contiguous from there; but then I was guessing at the units :).

here is an updated version of my mapping; it should be clearer now:

HOLE0: 0 - 3 GB
MEM1: 0xc000 0000 - 32 MB
HOLE1: 0xc200 0000 - 224 MB
MEM2: 0xd000 0000 - 8 MB
HOLE2: 0xd080 0000 - 120 MB
MEM3: 0xd800 0000 - 128 KB
HOLE3: rest of mem

Francis





end of thread, other threads:[~2006-08-03  9:56 UTC | newest]

Thread overview: 8+ messages
2006-08-02 13:44 sparsemem usage moreau francis
2006-08-02 14:25 ` Andy Whitcroft
2006-08-02 15:12   ` Re : " moreau francis
2006-08-02 15:36     ` Andy Whitcroft
2006-08-03  9:56       ` Re : " moreau francis
2006-08-02 15:24 ` Alan Cox
2006-08-02 15:33   ` Re : " moreau francis
2006-08-02 16:33     ` Alan Cox
