* Re : Re : sparsemem usage
@ 2006-08-03  9:07 moreau francis
  2006-08-03  9:19 ` KAMEZAWA Hiroyuki
  2006-08-03  9:47 ` Andy Whitcroft
  0 siblings, 2 replies; 13+ messages in thread
From: moreau francis @ 2006-08-03  9:07 UTC
  To: Alan Cox; +Cc: linux-kernel, apw

Alan Cox wrote:
>
> Mapping out parts of a section is quite normal - think about the 640K to
> 1Mb hole in PC memory space.

OK, but I'm still worried. Please consider the following code:

        for (...; ...; ...) {
                [...]
                if (pfn_valid(i))
                        num_physpages++;
                [...]
        }

In that case num_physpages won't hold an accurate value, since pfn_valid()
also returns true for pages that sit in a hole. Yet the kernel still uses
it to make statistical assumptions about the sizes of other kernel data
structures.
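
To make the concern concrete, here is a minimal userspace sketch, not
kernel code. It assumes 4 KiB pages and models only the first 2 MiB of a
PC with the 640K to 1M hole mentioned above. Every name in it (hole_range,
hole_pfn_valid, hole_page_is_ram, hole_count_pages) is hypothetical and
stands in for whatever the architecture really provides.

```c
#include <stdbool.h>
#include <stddef.h>

#define HOLE_PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define HOLE_MAX_PFN    512UL       /* the model covers the first 2 MiB */

/* One RAM range in page frame numbers, end exclusive (hypothetical type). */
struct hole_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Low RAM with the classic PC hole between 640K and 1M. */
static const struct hole_range hole_ram[] = {
    { 0x00000000UL >> HOLE_PAGE_SHIFT, 0x000a0000UL >> HOLE_PAGE_SHIFT },
    { 0x00100000UL >> HOLE_PAGE_SHIFT, 0x00200000UL >> HOLE_PAGE_SHIFT },
};

/* Stand-in for pfn_valid(): a page struct exists for every pfn in the map. */
static bool hole_pfn_valid(unsigned long pfn)
{
    return pfn < HOLE_MAX_PFN;
}

/* Stand-in for an "is this real RAM?" test. */
static bool hole_page_is_ram(unsigned long pfn)
{
    for (size_t i = 0; i < sizeof(hole_ram) / sizeof(hole_ram[0]); i++)
        if (pfn >= hole_ram[i].start_pfn && pfn < hole_ram[i].end_pfn)
            return true;
    return false;
}

/* Count pages the way the loop above does, optionally also requiring RAM. */
static unsigned long hole_count_pages(bool require_ram)
{
    unsigned long n = 0;

    for (unsigned long pfn = 0; pfn < HOLE_MAX_PFN; pfn++)
        if (hole_pfn_valid(pfn) && (!require_ram || hole_page_is_ram(pfn)))
            n++;
    return n;
}
```

In this model hole_count_pages(false) counts the 96 pages of the hole as
well, whereas hole_count_pages(true) does not. That difference is the
inaccuracy being described.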

Francis
        




* Re: Re : sparsemem usage
@ 2006-08-10 15:21 Andy Whitcroft
  2006-08-10 15:37 ` Re : " moreau francis
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Whitcroft @ 2006-08-10 15:21 UTC
  To: moreau francis; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel

moreau francis wrote:
> KAMEZAWA Hiroyuki wrote:
>> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
>> moreau francis <francis_moreau2000@yahoo.fr> wrote: 
>>>> BTW, ioresource information (see kernel/resource.c)
>>>>
>>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>>> 00000000-0009fbff : System RAM
>>>> 000a0000-000bffff : Video RAM area
>>>> 00100000-2dfeffff : System RAM
>>>>
>>>> is that not enough?
>>>>
>>> Well, actually you show that to get a really simple piece of information,
>>> i.e. "does a page exist?", we need to parse some kernel data structures
>>> like ioresource (which is, IMHO, hackish) or duplicate in each
>>> architecture some data to keep track of existing pages.
>>>
>> Because the memory maps from e820 (x86) or EFI (ia64) are registered in
>> iomem_resource, we should avoid duplicating that information. kdump and
>> memory hotplug use this information. (Memory hotplug updates
>> iomem_resource.)
>>
>> Implementing a "page_is_exist" function based on ioresource is one generic
>> and robust way to go, I think.
>> (If the performance of list walking is a problem, enhancing the ioresource
>>  code is better.)
>>  
> 
> Why not implement page_exist() by simply using mem_map[]? When
> allocating mem_map[], we can just fill it with a special value. And
> then when registering a memory area, we replace this special value with
> the "reserved" value. Hence for the flatmem model, we can have:
> 
> #define page_exist(pfn)        (mem_map[pfn] != SPECIAL_VALUE)
> 
> and it should work for sparsemem too and other models that will use
> mem_map[].

The mem_map isn't an array of pointers, it's an array of physical 
structures.  We have a special value within each one to tell you if the 
page is usable; that's called PG_reserved.  If a page is reserved the 
kernel can't touch it, can't look at it.
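
As a rough illustration of that point, the sketch below models mem_map as
an array of structures carrying a flags word, with a reserved bit playing
the role of PG_reserved. This is a simplified userspace model and not the
kernel's struct page: the names flag_page, FLAG_PG_RESERVED and the helper
functions are made up for the example, and the bit number is arbitrary.

```c
#include <stdbool.h>

#define FLAG_PG_RESERVED 0      /* bit number; arbitrary in this model */
#define FLAG_NR_PAGES    8      /* tiny map, for illustration only */

/* Simplified stand-in for struct page: a flags word, not a pointer. */
struct flag_page {
    unsigned long flags;
};

static struct flag_page flag_mem_map[FLAG_NR_PAGES];

static bool flag_page_reserved(const struct flag_page *page)
{
    return (page->flags >> FLAG_PG_RESERVED) & 1UL;
}

/*
 * Boot-time setup in this model: mark every page reserved first, then
 * clear the bit only for the range registered as usable memory.
 */
static void flag_init(unsigned long usable_start, unsigned long usable_end)
{
    for (unsigned long i = 0; i < FLAG_NR_PAGES; i++)
        flag_mem_map[i].flags |= 1UL << FLAG_PG_RESERVED;
    for (unsigned long i = usable_start; i < usable_end && i < FLAG_NR_PAGES; i++)
        flag_mem_map[i].flags &= ~(1UL << FLAG_PG_RESERVED);
}

/* Pages the allocator may use are exactly the non-reserved ones. */
static unsigned long flag_count_usable(void)
{
    unsigned long n = 0;

    for (unsigned long i = 0; i < FLAG_NR_PAGES; i++)
        if (!flag_page_reserved(&flag_mem_map[i]))
            n++;
    return n;
}
```

The per-page state lives inside the structure itself, so a test such as
mem_map[pfn] != SPECIAL_VALUE has nothing scalar to compare against; a
flag test on the entry is the closest equivalent.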

> Another point: is page_exist() going to replace page_valid()?
> I mean page_exist() is going to be something more accurate than
> page_valid(). All tests on page_valid() _only_ will be fine testing
> page_exist() instead. But all tests such as:
> 
>     if (page_valid(x) && page_is_ram(x))
> 
> can be replaced by
> 
>     if (page_exist(x))
> 
> So, again, why not simply improve the page_valid() definition rather
> than introduce a new service?

Whilst I can understand that not knowing if a page is real or not is 
perhaps unappealing, I've yet to see any case where we need to know or 
care.  Changing things to make them 'nicer' intellectually is sometimes 
worthwhile.  But who is the user here?

The only consumer you have shown is show_mem(), which is a debug 
function, and that only dumps out the current memory counts.  It's not 
clear it really cares to know if a page is real or not.

-ap

* Re: Re : sparsemem usage
@ 2006-08-10 15:05 KAMEZAWA Hiroyuki
  2006-08-10 15:23 ` Re : " moreau francis
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10 15:05 UTC
  To: moreau francis; +Cc: apw, alan, linux-kernel

On Thu, 10 Aug 2006 14:46:01 +0000 (GMT)
moreau francis <francis_moreau2000@yahoo.fr> wrote:
> 
> Why not implement page_exist() by simply using mem_map[]? When
> allocating mem_map[], we can just fill it with a special value. And
> then when registering a memory area, we replace this special value with
> the "reserved" value. Hence for the flatmem model, we can have:
> 
> #define page_exist(pfn)        (mem_map[pfn] != SPECIAL_VALUE)
>  
Putting a special value into the page struct at mem_map + pfn?

> and it should work for sparsemem too and other models that will use
> mem_map[].
> 
> Another point: is page_exist() going to replace page_valid()?
What is page_valid() here? pfn_valid() (in the current kernel)?

> I mean page_exist() is going to be something more accurate than
> page_valid(). All tests on page_valid() _only_ will be fine testing
> page_exist() instead. But all tests such as:
> 
>     if (page_valid(x) && page_is_ram(x))
> 
> can be replaced by
> 
>     if (page_exist(x))
> 
> So, again, why not simply improve the page_valid() definition rather
> than introduce a new service?
> 
I'd welcome that if the implementation is sane.
pfn_valid() --- checks that there is a page struct
page_exist() --- checks that there is physical memory
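
One way to picture the two checks side by side is the userspace sketch
below. It reuses the System RAM ranges from the /proc/iomem output quoted
earlier in this thread and assumes 4 KiB pages and a flat map covering
everything up to the end of RAM. The names res_range, res_pfn_valid and
res_page_exist are hypothetical; this is not the kernel's resource API.

```c
#include <stdbool.h>
#include <stddef.h>

#define RES_PAGE_SHIFT 12                         /* assumption: 4 KiB pages */
#define RES_PAGE_SIZE  (1UL << RES_PAGE_SHIFT)

/* A "System RAM" range as shown by /proc/iomem: byte addresses, inclusive. */
struct res_range {
    unsigned long start;
    unsigned long end;
};

static const struct res_range res_system_ram[] = {
    { 0x00000000UL, 0x0009fbffUL },
    { 0x00100000UL, 0x2dfeffffUL },
};

/* Flat model: page structs exist for every pfn up to the end of RAM. */
#define RES_MAX_PFN (0x2dff0000UL >> RES_PAGE_SHIFT)

/* pfn_valid() in this model: is there a page struct for this pfn? */
static bool res_pfn_valid(unsigned long pfn)
{
    return pfn < RES_MAX_PFN;
}

/* page_exist() in this model: is the whole page backed by System RAM? */
static bool res_page_exist(unsigned long pfn)
{
    unsigned long first = pfn << RES_PAGE_SHIFT;
    unsigned long last = first + RES_PAGE_SIZE - 1;

    if (!res_pfn_valid(pfn))
        return false;
    for (size_t i = 0; i < sizeof(res_system_ram) / sizeof(res_system_ram[0]); i++)
        if (first >= res_system_ram[i].start && last <= res_system_ram[i].end)
            return true;
    return false;
}
```

For pfn 0xa0 (the start of the video RAM area) res_pfn_valid() is true
while res_page_exist() is false. The linear walk is the "list walking"
cost raised in the quoted discussion of ioresource.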

But discussing without a patch is not very productive. Please post your patch.
Then we can discuss more concrete things.

-Kame


* Re: Re : sparsemem usage
@ 2006-08-02 15:36 Andy Whitcroft
  2006-08-03  9:56 ` Re : " moreau francis
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Whitcroft @ 2006-08-02 15:36 UTC
  To: moreau francis; +Cc: linux-kernel

moreau francis wrote:
> Andy Whitcroft wrote:
>> The memory allocator buddy location algorithm has an implicit assumption 
>> that the memory map will be contiguous and valid out to MAX_ORDER, i.e. 
>> that we can do relative arithmetic on a page* for a page to find its 
>> buddy at all times.  The allocator never looks outside a MAX_ORDER 
>> block, aligned to MAX_ORDER in physical pages.  SPARSEMEM's 
>> implementation by its nature breaks up the mem_map at the section size. 
>> Thus for the buddy to work a section must be >= MAX_ORDER in size to 
>> maintain the contiguity constraint.
> 
> Thanks for the explanation. But I'm still missing something: how can a
> MAX_ORDER block be allocated in a memory whose size is only 128KB?
> Can't that be detected by the buddy allocator very early, without doing any
> relative arithmetic on a page*?

When allocating we do not have a problem, as we simply pull a free page 
off the appropriately sized free list.  It's when freeing that we have an 
issue: all the allocator has to work with is the page you are freeing. 
As MAX_ORDER is >128KB we can get to the situation where all but one page 
is free.  When we free that page we then need to merge this 128KB block 
with its buddy if it's free.  To tell if that one is free it has to look 
at the page* for it, so that page* must also exist for this check to work.
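
The freeing path described here can be sketched as a small userspace
model. It is not the kernel's allocator: buddy_page, buddy_map and the
BUDDY_* constants are hypothetical, the map is only 8 pages long, and free
lists are reduced to a per-page "free" flag plus an order. What it keeps
is the relative arithmetic (XOR with the block size) and the read of the
buddy's entry.

```c
#include <stdbool.h>

#define BUDDY_MAX_ORDER 4                               /* orders 0..3 in this model */
#define BUDDY_NR_PAGES  (1UL << (BUDDY_MAX_ORDER - 1))  /* 8 pages */

/* Simplified per-page state: is this page the head of a free block? */
struct buddy_page {
    bool free;
    unsigned int order;     /* order of the free block headed here */
};

static struct buddy_page buddy_map[BUDDY_NR_PAGES];

/* Relative arithmetic: the buddy of a 2^order block differs in one bit. */
static unsigned long buddy_index(unsigned long idx, unsigned int order)
{
    return idx ^ (1UL << order);
}

/*
 * Free the 2^order block starting at idx, merging upwards while the
 * buddy is free. Returns the order of the resulting free block.
 */
static unsigned int buddy_free_block(unsigned long idx, unsigned int order)
{
    while (order < BUDDY_MAX_ORDER - 1) {
        unsigned long buddy = buddy_index(idx, order);

        /* This read is why the buddy's page struct must exist. */
        if (!buddy_map[buddy].free || buddy_map[buddy].order != order)
            break;
        buddy_map[buddy].free = false;  /* take the buddy off its free list */
        idx &= buddy;                   /* merged block starts at the lower index */
        order++;
    }
    buddy_map[idx].free = true;
    buddy_map[idx].order = order;
    return order;
}
```

Freeing pages 0, 1, 2 and 3 in turn ends with a single order-2 block at
index 0. Each step reads buddy_map[buddy], which in the real allocator
corresponds to dereferencing the buddy's page*, whether or not memory
backs that page.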

>> However, just because you have a small memory block in your memory map 
>> doesn't mean that the sparsemem section size needs to be that small to 
>> match.  If there is any valid memory in any section that section will be 
>> instantiated and the valid memory marked within it, any invalid memory 
>> is marked reserved.  
> 
> Ah, OK, but that means pfn_valid() will still return true for invalid pages
> that lie in invalid memory marked as reserved. Isn't that risky?

pfn_valid() will indeed say 'ok'.  But that is defined only to mean that 
it is safe to look at the page* for that page.  It says nothing else 
about the page itself.  Pages which are reserved never get freed into 
the allocator, so they are never available to be allocated and we should 
never be referring to them.
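
A toy model of that contract, under stated assumptions: sections of 16
pages, four possible sections, and only section 0 instantiated. The names
(sec_page, sec_section, sec_pfn_valid and so on) are hypothetical and the
layout is far simpler than the real SPARSEMEM code. The point is only that
validity is decided per section, while per-page state lives in the entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define SEC_PAGES_SHIFT 4                       /* hypothetical: 16 pages per section */
#define SEC_PAGES       (1UL << SEC_PAGES_SHIFT)
#define SEC_NR_SECTIONS 4

struct sec_page {
    bool reserved;          /* stands in for PG_reserved */
};

struct sec_section {
    struct sec_page *map;   /* NULL when the section is not instantiated */
};

static struct sec_page sec_section0_map[SEC_PAGES];
static struct sec_section sec_sections[SEC_NR_SECTIONS];

/*
 * Instantiate section 0. Only the first ram_pages pages are real memory;
 * the remainder of the section is marked reserved.
 */
static void sec_setup(unsigned long ram_pages)
{
    for (unsigned long i = 0; i < SEC_PAGES; i++)
        sec_section0_map[i].reserved = (i >= ram_pages);
    sec_sections[0].map = sec_section0_map;
}

/* Valid means only: the section exists, so the page struct can be read. */
static bool sec_pfn_valid(unsigned long pfn)
{
    unsigned long nr = pfn >> SEC_PAGES_SHIFT;

    return nr < SEC_NR_SECTIONS && sec_sections[nr].map != NULL;
}

static struct sec_page *sec_pfn_to_page(unsigned long pfn)
{
    if (!sec_pfn_valid(pfn))
        return NULL;
    return &sec_sections[pfn >> SEC_PAGES_SHIFT].map[pfn & (SEC_PAGES - 1)];
}
```

After sec_setup(4), pfn 10 is valid even though no memory backs it; its
entry is readable and says "reserved". A pfn in section 1 is not valid,
so there is no entry to look at.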

>> The section size bounds the amount of internal 
>> fragmentation we can have in the mem_map.  SPARSEMEM, as its name 
>> suggests, wins biggest when memory is very sparsely populated.
> 
> sorry but I don't understand. I would say that sparsemem section size should
> be chosen to make mem_map[] and mem_section[] sizes as small as possible.

There are tradeoffs here.  The smaller the section size, the less 
internal fragmentation there will be.  However, the more sections there 
are, the more space is used tracking them and the more cachelines are 
touched with them.  Also, as we have seen, we can't have things in the 
allocator bigger than the section size.  This can constrain the lower 
bound on the section size.  Finally, on 32-bit systems the overall 
number of sections is bounded by the space available for the section 
field in the page* flags field.

If your system has 256 1GB sections and one 128KB section then it could 
well make sense to have a 1GB section size, or perhaps a 256MB section 
size, as you are only wasting space in the last section.
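
To put rough numbers on that tradeoff, the sketch below computes how much
mem_map is spent on page structs that describe no memory in the last
section. It rests on assumptions not stated in the thread: 4 KiB pages, a
32-byte struct page, and the 128KB region starting on a section boundary.
All names are hypothetical.

```c
#include <stdint.h>

#define WASTE_PAGE_SIZE        4096ULL  /* assumption: 4 KiB pages */
#define WASTE_STRUCT_PAGE_SIZE 32ULL    /* assumption: bytes per page struct */

/* Sections needed to cover a section-aligned region of `bytes` bytes. */
static uint64_t waste_sections_needed(uint64_t bytes, uint64_t section_size)
{
    return (bytes + section_size - 1) / section_size;
}

/*
 * Bytes of mem_map describing pages that do not exist, for a region of
 * tail_bytes that has to be rounded up to whole sections.
 */
static uint64_t waste_mem_map_bytes(uint64_t tail_bytes, uint64_t section_size)
{
    uint64_t covered = waste_sections_needed(tail_bytes, section_size) * section_size;

    return (covered - tail_bytes) / WASTE_PAGE_SIZE * WASTE_STRUCT_PAGE_SIZE;
}
```

Under those assumptions a 1GB section size costs about 8MB of unused
mem_map for the 128KB region (8387584 bytes) and needs 257 sections in
total, while a 256MB section size costs about 2MB (2096128 bytes) and
needs 1025 sections.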

> 
>> If I am 
>> reading correctly your memory is actually contiguous.
> 
> well there're big holes in address space.
> 

I read that as saying there was a major gap up to 3GB and then it was 
contiguous from there; but then I was guessing at the units :).

-apw

