* Re : sparsemem usage
2006-08-02 14:25 Andy Whitcroft
@ 2006-08-02 15:12 ` moreau francis
2006-08-02 15:36 ` Andy Whitcroft
0 siblings, 1 reply; 15+ messages in thread
From: moreau francis @ 2006-08-02 15:12 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: linux-kernel
Andy Whitcroft wrote:
> The memory allocator buddy location algorithm has an implicit assumption
> that the memory map will be contigious and valid out to MAX_ORDER. ie
> that we can do relative arithmetic on a page* for a page to find its
> buddy at all times. The allocator never looks outside a MAX_ORDER
> block, aligned to MAX_ORDER in physical pages. SPARSEMEM's
> implementation by it nature breaks up the mem_map at the section size.
> Thus for the buddy to work a section must be >= MAX_ORDER in size to
> maintain the contiguity constraint.
Thanks for the explanation. But I'm still missing something: how can a
MAX_ORDER block be allocated in a memory whose size is only 128KB?
Can't the buddy allocator detect this very early, without doing any
relative arithmetic on a page*?
> However, just because you have a small memory block in your memory map
> doesn't mean that the sparsemem section size needs to be that small to
> match. If there is any valid memory in any section that section will be
> instantiated and the valid memory marked within it, any invalid memory
> is marked reserved.
Ah, OK, but that means pfn_valid() will still return true for invalid pages
that lie in invalid memory marked as reserved. Isn't that risky?
> The section size bounds the amount of internal
> fragmentation we can have in the mem_map. SPARSEMEM as its name
> suggests wins biggest when memory is very sparsly populate.
Sorry, but I don't understand. I would say that the sparsemem section size
should be chosen to make the mem_map[] and mem_section[] sizes as small as
possible.
> If I am
> reading correctly your memory is actually contigious.
well there're big holes in address space.
thanks
Francis
* Re : sparsemem usage
2006-08-02 15:24 Alan Cox
@ 2006-08-02 15:33 ` moreau francis
2006-08-02 16:33 ` Alan Cox
0 siblings, 1 reply; 15+ messages in thread
From: moreau francis @ 2006-08-02 15:33 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel, apw
Hi Alan !
Alan Cox wrote:
> The kernel allocates memory out using groups of blocks in a buddy
> system. 128K is smaller than one of the blocks so the kernel cannot
> handle this.
As I wrote to Andy Whitcroft, I would have thought that the kernel forbids
allocation of blocks whose size is greater than the current memory size.
But I know nothing about the buddy allocator, so I trust you ;)
> You need 2MB (if I remember right) granularity for your
MAX_ORDER is by default 11. Without changing this, I would say that I
need 4MB granularity.
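The 4MB figure can be checked with a little arithmetic. This is a minimal sketch, assuming 4KB pages and that the largest buddy block has order MAX_ORDER - 1; the helper name and the PAGE_SHIFT value are illustrative and not taken from any particular architecture:

```c
#include <assert.h>

#define PAGE_SHIFT 12   /* assumption: 4 KB pages */
#define MAX_ORDER  11   /* the default mentioned above */

/*
 * Size in bytes of the largest block the buddy allocator manages:
 * 2^(max_order - 1) pages of 2^page_shift bytes each.
 */
static unsigned long max_block_bytes(unsigned int max_order,
                                     unsigned int page_shift)
{
    assert(max_order >= 1);
    return 1UL << (max_order - 1 + page_shift);
}
```

Under these assumptions max_block_bytes(MAX_ORDER, PAGE_SHIFT) is 4MB, while a MAX_ORDER of 10 would give the 2MB figure.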
> sections but nothing stops you marking most of the 2Mb section except
> the 128K that exists as "in use"
OK. But it will make pfn_valid() return "valid" for pages beyond the first
128 KB. Won't that cause problems later?
thanks
Francis
* Re: Re : sparsemem usage
2006-08-02 15:12 ` Re : " moreau francis
@ 2006-08-02 15:36 ` Andy Whitcroft
0 siblings, 0 replies; 15+ messages in thread
From: Andy Whitcroft @ 2006-08-02 15:36 UTC (permalink / raw)
To: moreau francis; +Cc: linux-kernel
moreau francis wrote:
> Andy Whitcroft wrote:
>> The memory allocator buddy location algorithm has an implicit assumption
>> that the memory map will be contigious and valid out to MAX_ORDER. ie
>> that we can do relative arithmetic on a page* for a page to find its
>> buddy at all times. The allocator never looks outside a MAX_ORDER
>> block, aligned to MAX_ORDER in physical pages. SPARSEMEM's
>> implementation by it nature breaks up the mem_map at the section size.
>> Thus for the buddy to work a section must be >= MAX_ORDER in size to
>> maintain the contiguity constraint.
>
> thanks for the explanation. But still something I'm missing, how can a
> MAX_ORDER block be allocated in a memory whose size is only 128Ko ?
> Can't it be detected by the buddy allocatorvery early without doing any
> relative arithmetic on a page* ?
When allocating we do not have a problem, as we simply pull a free page
off the appropriately sized free list. It's when freeing that we have an
issue: all the allocator has to work with is the page you are freeing.
As a MAX_ORDER block is >128KB we can get to the situation where all but
one page is free. When we free that page we then need to merge this 128KB
block with its buddy if it's free. To tell if that one is free it has to
look at the page* for it, so that page* must also exist for this check to
work.
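The relative arithmetic described here can be sketched as follows. This is a simplified model of the index calculation only, not the kernel's actual code; the function names are hypothetical and indices are page offsets within a MAX_ORDER-aligned region:

```c
#include <assert.h>

#define MAX_ORDER 11

/*
 * Index of the buddy of the block that starts at page_idx: the two
 * buddies of a given order differ only in bit `order` of their index.
 */
static unsigned long buddy_index(unsigned long page_idx, unsigned int order)
{
    assert(order < MAX_ORDER - 1);  /* the largest block is never merged further */
    return page_idx ^ (1UL << order);
}

/* Index of the combined block once both buddies are free. */
static unsigned long merged_index(unsigned long page_idx, unsigned int order)
{
    return page_idx & ~(1UL << order);
}
```

Assuming 4KB pages, a 128KB block is 32 pages, i.e. order 5. For the block at index 0, buddy_index(0, 5) is 32, which lies just past the 128KB that really exists, so the free path ends up inspecting a page* outside the populated memory.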
>> However, just because you have a small memory block in your memory map
>> doesn't mean that the sparsemem section size needs to be that small to
>> match. If there is any valid memory in any section that section will be
>> instantiated and the valid memory marked within it, any invalid memory
>> is marked reserved.
>
> ah ok but that means that pfn_valid() will still returns ok for invalid page which
> are in a invalid memory marked as reserved. Is it not risky ?
pfn_valid() will indeed say 'ok'. But that is defined only to mean that
it is safe to look at the page* for that page. It says nothing else
about the page itself. Pages which are reserved never get freed into
the allocator, so they are never there to be allocated and we should
never be referring to them.
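The distinction can be illustrated with a toy model. Everything prefixed toy_ or TOY_ is hypothetical, including the bit position of the reserved flag; the sketch only mirrors the two separate questions "is there a page struct?" and "may the allocator use the page?":

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES    1024UL      /* page structs in one hypothetical section */
#define TOY_PG_RESERVED (1UL << 0)  /* illustrative bit position */

struct toy_page {
    unsigned long flags;
};

static struct toy_page toy_mem_map[TOY_NR_PAGES];

/* "Valid" only means that a page struct exists for this pfn. */
static bool toy_pfn_valid(unsigned long pfn)
{
    return pfn < TOY_NR_PAGES;
}

/* Mark a hole [start, end) reserved so it never reaches the free lists. */
static void toy_reserve_range(unsigned long start, unsigned long end)
{
    unsigned long pfn;

    assert(start <= end && end <= TOY_NR_PAGES);
    for (pfn = start; pfn < end; pfn++)
        toy_mem_map[pfn].flags |= TOY_PG_RESERVED;
}

/* Usable means: has a page struct and is not reserved. */
static bool toy_page_usable(unsigned long pfn)
{
    return toy_pfn_valid(pfn) &&
           !(toy_mem_map[pfn].flags & TOY_PG_RESERVED);
}
```

After toy_reserve_range(32, TOY_NR_PAGES), pfn 500 is still "valid" in the sense that its page struct can be read, but it is not usable.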
>> The section size bounds the amount of internal
>> fragmentation we can have in the mem_map. SPARSEMEM as its name
>> suggests wins biggest when memory is very sparsly populate.
>
> sorry but I don't understand. I would say that sparsemem section size should
> be chosen to make mem_map[] and mem_section[] sizes as small as possible.
There are tradeoffs here. The smaller the section size, the lower the
internal fragmentation will be. However, the more sections there are,
the more space is used tracking them and the more cachelines are touched
with them. Also, as we have seen, we can't have things in the allocator
bigger than the section size. This constrains the lower bound on the
section size. Finally, on 32 bit systems the overall number of sections
is bounded by the space available for the section number in the page*
flags field.
If your system has 256 1GB sections and one 128KB section then it could
well make sense to have a 1GB section size or perhaps a 256MB section
size, as you are only wasting space in the last section.
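To put rough numbers on this tradeoff, here is a small sketch. It assumes 4KB pages and memory that starts on a section boundary; both helper names are made up for illustration:

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* assumption: 4 KB pages */

/* Sections needed to cover mem_bytes starting on a section boundary. */
static unsigned long long sections_needed(unsigned long long mem_bytes,
                                          unsigned long long section_bytes)
{
    assert(section_bytes != 0);
    return (mem_bytes + section_bytes - 1) / section_bytes;
}

/* Page structs in those sections that have no memory behind them. */
static unsigned long long unbacked_page_structs(unsigned long long mem_bytes,
                                                unsigned long long section_bytes)
{
    unsigned long long covered =
        sections_needed(mem_bytes, section_bytes) * section_bytes;

    return (covered - mem_bytes) >> PAGE_SHIFT;
}
```

For a lone 128KB region, 1GB sections leave 262112 page structs unbacked, 256MB sections leave 65504 and 4MB sections leave 992. In the other direction, covering 256GB takes 256 sections at 1GB but 65536 at 4MB, which is the tracking cost mentioned above.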
>
>> If I am
>> reading correctly your memory is actually contigious.
>
> well there're big holes in address space.
>
I read that as saying there was a major gap to 3GB and then it was
contiguous from there; but then I was guessing at the units :).
-apw
* Re: Re : sparsemem usage
2006-08-02 15:33 ` Re : " moreau francis
@ 2006-08-02 16:33 ` Alan Cox
0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2006-08-02 16:33 UTC (permalink / raw)
To: moreau francis; +Cc: linux-kernel, apw
On Wed, 2006-08-02 at 15:33 +0000, moreau francis wrote:
> > sections but nothing stops you marking most of the 2Mb section except
> > the 128K that exists as "in use"
>
> ok. But it will make pfn_valid() return "valid" for page beyond the first 128 KB.
> Won't that result in bad impacts later ?
Mapping out parts of a section is quite normal - think about the 640K to
1MB hole in PC memory space.
Alan
* Re : sparsemem usage
2006-08-10 4:46 KAMEZAWA Hiroyuki
@ 2006-08-10 12:40 ` moreau francis
2006-08-10 12:49 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 15+ messages in thread
From: moreau francis @ 2006-08-10 12:40 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: apw, alan, linux-kernel
KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>
>> Not all arch have page_is_ram(). OK it should be easy to have but we will
>> need create new data structures to keep this info. The point is that it's
>> really easy for memory model such sparsemem to keep this info.
>>
>>> Do you have a usage model in which we really care about the number of
>>> pages in the system to that level of accuracy?
>>>
>> show_mem(), which is arch specific, needs to report them. And some
>> implementations use only pfn_valid().
>>
>
> BTW, ioresouce information (see kernel/resouce.c)
>
> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> 00000000-0009fbff : System RAM
> 000a0000-000bffff : Video RAM area
> 00100000-2dfeffff : System RAM
>
> is not enough ?
>
Well, actually you show that to get a really simple piece of information,
i.e. whether a page exists, we need to parse some kernel data structures
like ioresource (which is, IMHO, hackish) or duplicate in each architecture
some data to keep track of existing pages.
Francis
* Re: Re : sparsemem usage
2006-08-10 12:40 ` moreau francis
@ 2006-08-10 12:49 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10 12:49 UTC (permalink / raw)
To: moreau francis; +Cc: apw, alan, linux-kernel
On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
moreau francis <francis_moreau2000@yahoo.fr> wrote:
> > BTW, ioresouce information (see kernel/resouce.c)
> >
> > [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> > 00000000-0009fbff : System RAM
> > 000a0000-000bffff : Video RAM area
> > 00100000-2dfeffff : System RAM
> >
> > is not enough ?
> >
>
> well actually you show that to get a really simple information, ie does
> a page exist ?, we need to parse some kernel data structures like
> ioresource (which is, IMHO, hackish) or duplicate in each architecture
> some data to keep track of existing pages.
>
Because the memory map from e820 (x86) or EFI (ia64) is registered in
iomem_resource, we should avoid duplicating that information. kdump and
memory hotplug use this information. (Memory hotplug updates this
iomem_resource.)
Implementing a "page_is_exist" function based on ioresource is one generic
and robust way to go, I think.
(If the performance of list walking is a problem, enhancing the ioresource
code is better.)
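As a rough illustration of a lookup based on RAM ranges, the following sketch walks a fixed table holding the two "System RAM" entries from the /proc/iomem output quoted above. It does not use the kernel's resource API; the table, the helper name and the 4KB page size are all assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12  /* assumption: 4 KB pages */

struct ram_range {
    unsigned long start;  /* first byte, inclusive */
    unsigned long end;    /* last byte, inclusive */
};

/* The "System RAM" entries from the /proc/iomem output quoted above. */
static const struct ram_range ram_ranges[] = {
    { 0x00000000UL, 0x0009fbffUL },
    { 0x00100000UL, 0x2dfeffffUL },
};

/* True if the whole page frame lies inside a single RAM range. */
static bool toy_page_is_ram(unsigned long pfn)
{
    unsigned long first, last;
    size_t i;

    assert(pfn <= (~0UL >> PAGE_SHIFT));  /* the shift below must not overflow */
    first = pfn << PAGE_SHIFT;
    last = first + (1UL << PAGE_SHIFT) - 1;

    for (i = 0; i < sizeof(ram_ranges) / sizeof(ram_ranges[0]); i++) {
        if (first >= ram_ranges[i].start && last <= ram_ranges[i].end)
            return true;
    }
    return false;
}
```

The walk is linear in the number of ranges, which is the list-walking cost mentioned above. Note that pfn 0x9f is rejected because the first range ends at 0x9fbff, partway through that page.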
-Kame
* Re : sparsemem usage
@ 2006-08-10 14:46 moreau francis
2006-08-10 15:05 ` KAMEZAWA Hiroyuki
2006-08-10 15:21 ` Andy Whitcroft
0 siblings, 2 replies; 15+ messages in thread
From: moreau francis @ 2006-08-10 14:46 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: apw, alan, linux-kernel
KAMEZAWA Hiroyuki wrote:
> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>>> BTW, ioresouce information (see kernel/resouce.c)
>>>
>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>> 00000000-0009fbff : System RAM
>>> 000a0000-000bffff : Video RAM area
>>> 00100000-2dfeffff : System RAM
>>>
>>> is not enough ?
>>>
>> well actually you show that to get a really simple information, ie does
>> a page exist ?, we need to parse some kernel data structures like
>> ioresource (which is, IMHO, hackish) or duplicate in each architecture
>> some data to keep track of existing pages.
>>
>
> becasue memory map from e820(x86) or efi(ia64) are registered to iomem_resource,
> we should avoid duplicates that information. kdump and memory hotplug uses
> this information. (memory hotplug updates this iomem_resource.)
>
> Implementing "page_is_exist" function based on ioresouce is one of generic
> and rubust way to go, I think.
> (if performance of list walking is problem, enhancing ioresouce code is
> better.)
>
Why not implement page_exist() by simply using mem_map[]? When
allocating mem_map[], we can just fill it with a special value. And
then when registering a memory area, we replace this special value with
the "reserved" value. Hence for the flatmem model, we can have:
#define page_exist(pfn) (mem_map[pfn] != SPECIAL_VALUE)
and it should work for sparsemem too and other models that will use
mem_map[].
Another point: is page_exist() going to replace page_valid()?
I mean page_exist() is going to be something more accurate than
page_valid(). All tests on page_valid() _only_ will be fine testing
page_exist() instead. But all tests such as:
if (page_valid(x) && page_is_ram(x))
can be replaced by
if (page_exist(x))
So, again, why not simply improve the page_valid() definition rather
than introduce a new service?
Francis
* Re: Re : sparsemem usage
2006-08-10 14:46 Re : sparsemem usage moreau francis
@ 2006-08-10 15:05 ` KAMEZAWA Hiroyuki
2006-08-10 15:23 ` Re : " moreau francis
2006-08-10 15:21 ` Andy Whitcroft
1 sibling, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10 15:05 UTC (permalink / raw)
To: moreau francis; +Cc: apw, alan, linux-kernel
On Thu, 10 Aug 2006 14:46:01 +0000 (GMT)
moreau francis <francis_moreau2000@yahoo.fr> wrote:
>
> Why not implementing page_exist() by simply using mem_map[] ? When
> allocating mem_map[], we can just fill it with a special value. And
> then when registering memory area, we clear this special value with
> the "reserved" value. Hence for flatmem model, we can have:
>
> #define page_exist(pfn) (mem_map[pfn] != SPECIAL_VALUE)
>
putting a special value to a page struct at mem_map + pfn ?
> and it should work for sparsemem too and other models that will use
> mem_map[].
>
> Another point, is page_exist() going to replace page_valid() ?
what is page_valid() here ? pfn_valid() (in current kernel) ?
> I mean page_exist() is going to be something more accurate than
> page_valid(). All tests on page_valid() _only_ will be fine to test
> page_exist(). But all tests such:
>
> if (page_valid(x) && page_is_ram(x))
>
> can be replaced by
>
> if (page_exist(x))
>
> So, again, why not simply improving page_valid() definition rather
> than introduce a new service ?
>
I'd welcome that if the implementation is sane.
pfn_valid() --- checks that there is a page struct
page_exist() --- checks that there is physical memory.
But discussing without a patch is not very productive. Please post your patch.
Then we can discuss more concrete things.
-Kame
* Re: Re : sparsemem usage
2006-08-10 14:46 Re : sparsemem usage moreau francis
2006-08-10 15:05 ` KAMEZAWA Hiroyuki
@ 2006-08-10 15:21 ` Andy Whitcroft
2006-08-10 15:37 ` Re : " moreau francis
1 sibling, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2006-08-10 15:21 UTC (permalink / raw)
To: moreau francis; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel
moreau francis wrote:
> KAMEZAWA Hiroyuki wrote:
>> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
>> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>>>> BTW, ioresouce information (see kernel/resouce.c)
>>>>
>>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>>> 00000000-0009fbff : System RAM
>>>> 000a0000-000bffff : Video RAM area
>>>> 00100000-2dfeffff : System RAM
>>>>
>>>> is not enough ?
>>>>
>>> well actually you show that to get a really simple information, ie does
>>> a page exist ?, we need to parse some kernel data structures like
>>> ioresource (which is, IMHO, hackish) or duplicate in each architecture
>>> some data to keep track of existing pages.
>>>
>> becasue memory map from e820(x86) or efi(ia64) are registered to iomem_resource,
>> we should avoid duplicates that information. kdump and memory hotplug uses
>> this information. (memory hotplug updates this iomem_resource.)
>>
>> Implementing "page_is_exist" function based on ioresouce is one of generic
>> and rubust way to go, I think.
>> (if performance of list walking is problem, enhancing ioresouce code is
>> better.)
>>
>
> Why not implementing page_exist() by simply using mem_map[] ? When
> allocating mem_map[], we can just fill it with a special value. And
> then when registering memory area, we clear this special value with
> the "reserved" value. Hence for flatmem model, we can have:
>
> #define page_exist(pfn) (mem_map[pfn] != SPECIAL_VALUE)
>
> and it should work for sparsemem too and other models that will use
> mem_map[].
The mem_map isn't a pointer, it's a physical structure. We have a
special value to tell you if the page is usable within that; that's
called PG_reserved. If this page is reserved the kernel can't touch it,
can't look at it.
> Another point, is page_exist() going to replace page_valid() ?
> I mean page_exist() is going to be something more accurate than
> page_valid(). All tests on page_valid() _only_ will be fine to test
> page_exist(). But all tests such:
>
> if (page_valid(x) && page_is_ram(x))
>
> can be replaced by
>
> if (page_exist(x))
>
> So, again, why not simply improving page_valid() definition rather
> than introduce a new service ?
Whilst I can understand that not knowing if a page is real or not is
perhaps unappealing, I've yet to see any case where we need or care.
Changing things to make things 'nicer' intellectually is sometimes
worthwhile. But who is the user here?
The only consumer you have shown is show_mem(), which is a debug
function, and that only dumps out the current memory counts. It's not
clear it cares to really know if a page is real or not.
-ap
* Re : Re : sparsemem usage
2006-08-10 15:05 ` KAMEZAWA Hiroyuki
@ 2006-08-10 15:23 ` moreau francis
0 siblings, 0 replies; 15+ messages in thread
From: moreau francis @ 2006-08-10 15:23 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: apw, alan, linux-kernel
KAMEZAWA Hiroyuki wrote:
> On Thu, 10 Aug 2006 14:46:01 +0000 (GMT)
> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>> Why not implementing page_exist() by simply using mem_map[] ? When
>> allocating mem_map[], we can just fill it with a special value. And
>> then when registering memory area, we clear this special value with
>> the "reserved" value. Hence for flatmem model, we can have:
>>
>> #define page_exist(pfn) (mem_map[pfn] != SPECIAL_VALUE)
>>
> putting a special value to a page struct at mem_map + pfn ?
yes
>
>> and it should work for sparsemem too and other models that will use
>> mem_map[].
>>
>> Another point, is page_exist() going to replace page_valid() ?
> what is page_valid() here ? pfn_valid() (in current kernel) ?
Sorry, I meant pfn_valid() instead of page_valid() throughout the
whole email.
>
>> I mean page_exist() is going to be something more accurate than
>> page_valid(). All tests on page_valid() _only_ will be fine to test
>> page_exist(). But all tests such:
>>
>> if (page_valid(x) && page_is_ram(x))
>>
>> can be replaced by
>>
>> if (page_exist(x))
>>
>> So, again, why not simply improving page_valid() definition rather
>> than introduce a new service ?
>>
s/page_valid/pfn_valid
> I welcome to do that if implementation is sane.
> pfn_valid() --- check there is a page struct
> page_exist() --- check there is a physical memory.
>
The new definition of pfn_valid() would be "a physical page exists". And
this definition implies the old one, "it's safe to read the page struct *".
> but discussing without patch is not very good. please post your patch.
> Then we can discuss more concrete things.
>
Since I'm not a kernel hacker, or rather a newbie one, I'm trying to make
sure that it's worth digging in that direction before working hard to
write a patch.
thanks
Francis
* Re : Re : sparsemem usage
2006-08-10 15:21 ` Andy Whitcroft
@ 2006-08-10 15:37 ` moreau francis
2006-08-11 8:26 ` Andy Whitcroft
0 siblings, 1 reply; 15+ messages in thread
From: moreau francis @ 2006-08-10 15:37 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel
Andy Whitcroft wrote:
> moreau francis wrote:
>> KAMEZAWA Hiroyuki wrote:
>>> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
>>> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>>>>> BTW, ioresouce information (see kernel/resouce.c)
>>>>>
>>>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>>>> 00000000-0009fbff : System RAM
>>>>> 000a0000-000bffff : Video RAM area
>>>>> 00100000-2dfeffff : System RAM
>>>>>
>>>>> is not enough ?
>>>>>
>>>> well actually you show that to get a really simple information, ie does
>>>> a page exist ?, we need to parse some kernel data structures like
>>>> ioresource (which is, IMHO, hackish) or duplicate in each architecture
>>>> some data to keep track of existing pages.
>>>>
>>> becasue memory map from e820(x86) or efi(ia64) are registered to
>>> iomem_resource,
>>> we should avoid duplicates that information. kdump and memory hotplug
>>> uses
>>> this information. (memory hotplug updates this iomem_resource.)
>>>
>>> Implementing "page_is_exist" function based on ioresouce is one of
>>> generic
>>> and rubust way to go, I think.
>>> (if performance of list walking is problem, enhancing ioresouce code is
>>> better.)
>>>
>>
>> Why not implementing page_exist() by simply using mem_map[] ? When
>> allocating mem_map[], we can just fill it with a special value. And
>> then when registering memory area, we clear this special value with
>> the "reserved" value. Hence for flatmem model, we can have:
>>
>> #define page_exist(pfn) (mem_map[pfn] != SPECIAL_VALUE)
>>
>> and it should work for sparsemem too and other models that will use
>> mem_map[].
>
> The mem_map isn't a pointer, its a physical structure. We have a
ok
> special value to tell you if the page is usable within that, thats
> called PG_reserved. If this page is reserved the kernel can't touch it,
> can't look at it.
can't we introduce a new special value, such as "PG_real" ?
>
>> Another point, is page_exist() going to replace page_valid() ?
>> I mean page_exist() is going to be something more accurate than
>> page_valid(). All tests on page_valid() _only_ will be fine to test
>> page_exist(). But all tests such:
>>
>> if (page_valid(x) && page_is_ram(x))
>>
>> can be replaced by
>>
>> if (page_exist(x))
>>
>> So, again, why not simply improving page_valid() definition rather
>> than introduce a new service ?
>
> Whilst I can understand that not knowing if a page is real or not is
> perhaps unappealing, I've yet to see any case where we need or care.
> Changing things to make things 'nicer' interlectually is sometimes
> worthwhile. But what is the user here.
>
> The only consumer you have shown is show_mem() which is a debug
> function, and that only dumps out the current memory counts. Its not
> clear it cares to really know if a page is real or not.
>
I understand your point of view, but even if it's a debug function,
it must exist and report correct information. And my point is that
I think it should be really easy to implement that :) by using
a new "special value". Can you confirm that it's really easy to
implement?
thanks
Francis
* Re: Re : Re : sparsemem usage
2006-08-10 15:37 ` Re : " moreau francis
@ 2006-08-11 8:26 ` Andy Whitcroft
2006-08-11 12:46 ` Re : " moreau francis
0 siblings, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2006-08-11 8:26 UTC (permalink / raw)
To: moreau francis; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel
moreau francis wrote:
> Andy Whitcroft wrote:
>> moreau francis wrote:
>>> KAMEZAWA Hiroyuki wrote:
>>>> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
>>>> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>>>>>> BTW, ioresouce information (see kernel/resouce.c)
>>>>>>
>>>>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>>>>> 00000000-0009fbff : System RAM
>>>>>> 000a0000-000bffff : Video RAM area
>>>>>> 00100000-2dfeffff : System RAM
>>>>>>
>>>>>> is not enough ?
>>>>>>
>>>>> well actually you show that to get a really simple information, ie does
>>>>> a page exist ?, we need to parse some kernel data structures like
>>>>> ioresource (which is, IMHO, hackish) or duplicate in each architecture
>>>>> some data to keep track of existing pages.
>>>>>
>>>> becasue memory map from e820(x86) or efi(ia64) are registered to
>>>> iomem_resource,
>>>> we should avoid duplicates that information. kdump and memory hotplug
>>>> uses
>>>> this information. (memory hotplug updates this iomem_resource.)
>>>>
>>>> Implementing "page_is_exist" function based on ioresouce is one of
>>>> generic
>>>> and rubust way to go, I think.
>>>> (if performance of list walking is problem, enhancing ioresouce code is
>>>> better.)
>>>>
>>> Why not implementing page_exist() by simply using mem_map[] ? When
>>> allocating mem_map[], we can just fill it with a special value. And
>>> then when registering memory area, we clear this special value with
>>> the "reserved" value. Hence for flatmem model, we can have:
>>>
>>> #define page_exist(pfn) (mem_map[pfn] != SPECIAL_VALUE)
>>>
>>> and it should work for sparsemem too and other models that will use
>>> mem_map[].
>> The mem_map isn't a pointer, its a physical structure. We have a
>
> ok
>
>> special value to tell you if the page is usable within that, thats
>> called PG_reserved. If this page is reserved the kernel can't touch it,
>> can't look at it.
>
> can't we introduce a new special value, such as "PG_real" ?
>
>>> Another point, is page_exist() going to replace page_valid() ?
>>> I mean page_exist() is going to be something more accurate than
>>> page_valid(). All tests on page_valid() _only_ will be fine to test
>>> page_exist(). But all tests such:
>>>
>>> if (page_valid(x) && page_is_ram(x))
>>>
>>> can be replaced by
>>>
>>> if (page_exist(x))
>>>
>>> So, again, why not simply improving page_valid() definition rather
>>> than introduce a new service ?
>> Whilst I can understand that not knowing if a page is real or not is
>> perhaps unappealing, I've yet to see any case where we need or care.
>> Changing things to make things 'nicer' interlectually is sometimes
>> worthwhile. But what is the user here.
>>
>> The only consumer you have shown is show_mem() which is a debug
>> function, and that only dumps out the current memory counts. Its not
>> clear it cares to really know if a page is real or not.
>>
>
> I understand your point of view, but even if it's a debug function,
> it must exist and report correct information. And my point is that
> I think it should be really easy to implement :) that by using
> a new "special value". Can you confirm that it's really easy to
> implement that ?
It does produce real numbers: it tells you how many reserved pages you
have. In the places where this is triggered we are interested in why we
have no memory left. We are not interested in how many pages are known
but reserved as against how many pages are backed by page*'s but are
really holes; they are merely pages we can't use out of the total we are
tracking. We care about how many are not reserved, and how many of
those are available.
It would be 'as simple' as adding a PG_real page bit except for two things:
1) page flag bits are in seriously short supply; there are some 24
available of which 22 are in use. Any new user of a bit would have to
be an extremely valuable change with major benefit to the kernel, and
2) if you were to try and populate a PG_real flag it would need to be
populated for _all_ architectures (and there are a lot) for it to be of
any use. As you have already noted there is no consistent way to find
out whether a page is RAM, so it would be a major exercise to get these
bits set up during boot.
I think we should take (2) as a hint here. If we don't have a
consistent interface for finding whether a page is real or not, we
obviously have no general need of that information in the kernel.
Yes, we obviously care if we can use a page, but we do not care if the
page is unusable because it contains an ACPI table or the video driver
BIOS or there is a memory hole. It's either usable (!PG_reserved) or it's
not (PG_reserved).
-apw
* Re : Re : Re : sparsemem usage
2006-08-11 8:26 ` Andy Whitcroft
@ 2006-08-11 12:46 ` moreau francis
2006-08-11 12:52 ` Andy Whitcroft
0 siblings, 1 reply; 15+ messages in thread
From: moreau francis @ 2006-08-11 12:46 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel
Andy Whitcroft wrote:
> It does produce real numbers, it tells you how many reserved pages you
> have. The places that this is triggered we are interested in why we
> have no memory left. We are not interested in how many pages are known
> but reserved as against how many pages are backed by page*'s but are
> really holes; they are mearly pages we can't use out of the total we are
> tracking. We care about how many are not reserved, and how many of
> those are available.
>
> It would be 'as simple' as adding a PG_real page bit except for two things:
>
> 1) page flags bits are seriously short supply; there are some 24
> available of which 22 are in use. Any new user of a bit would have to
> be an extremely valuable change with major benefit to the kernel, and
>
It's indeed an issue. Could we instead use a combination of flags
that can't happen together. For example PG_Free|PG_Reserved ?
> 2) if you were to try and populate a PG_real flag it would need to be
> populated for _all_ architectures (and there are a lot) for it to be of
> any use. As you have already noted there is no consistent way to find
> out whether a page is ram so it would be major exercise to get these
> bits setup during boot.
>
> I think we should take (2) as a hint here. If we don't have a
> consistent interface for finding whether a page is real or not, we
> obviously have no general need of that information in the kernel.
>
Or maybe _because_ we don't have a consistent interface for finding
whether a page is real or not, we end up with a strange thing called
page_is_ram(), which could be the same for all archs and be implemented
very simply.
BTW, can you try in a linux tree:
$ grep -r page_is_ram arch/
and see how it's implemented...
thanks
Francis
* Re: Re : Re : Re : sparsemem usage
2006-08-11 12:46 ` Re : " moreau francis
@ 2006-08-11 12:52 ` Andy Whitcroft
2006-08-16 12:56 ` Re : " moreau francis
0 siblings, 1 reply; 15+ messages in thread
From: Andy Whitcroft @ 2006-08-11 12:52 UTC (permalink / raw)
To: moreau francis; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel
moreau francis wrote:
> Andy Whitcroft wrote:
>> It does produce real numbers, it tells you how many reserved pages you
>> have. The places that this is triggered we are interested in why we
>> have no memory left. We are not interested in how many pages are known
>> but reserved as against how many pages are backed by page*'s but are
>> really holes; they are mearly pages we can't use out of the total we are
>> tracking. We care about how many are not reserved, and how many of
>> those are available.
>>
>> It would be 'as simple' as adding a PG_real page bit except for two things:
>>
>> 1) page flags bits are seriously short supply; there are some 24
>> available of which 22 are in use. Any new user of a bit would have to
>> be an extremely valuable change with major benefit to the kernel, and
>>
>
> It's indeed an issue. Could we instead use a combination of flags
> that can't happen together. For example PG_Free|PG_Reserved ?
>
You'd need to audit all other users of the bits you wanted to borrow to
check they would understand. Like if you used PG_buddy (which I assume
is what you are referring to above) then you'd get non-real memory
getting merged into your buddies. Badness.
>> 2) if you were to try and populate a PG_real flag it would need to be
>> populated for _all_ architectures (and there are a lot) for it to be of
>> any use. As you have already noted there is no consistent way to find
>> out whether a page is ram so it would be major exercise to get these
>> bits setup during boot.
>>
>> I think we should take (2) as a hint here. If we don't have a
>> consistent interface for finding whether a page is real or not, we
>> obviously have no general need of that information in the kernel.
>>
>
> or maybe _because_ we don't have a consistent interface for finding
> whether a page is real or not, we end up with a strange thing called
> page_is_ram() which could be the same for all arch and be implemented
> very simply.
>
> BTW, can you try in a linux tree:
>
> $ grep -r page_is_ram arch/
>
> and see how it's implemented...
Well it depends how you look at it. You are going to need to know which
pages are ram in each architecture to set the bits in the page*'s to
tell us later. So the problem is the same problem. We don't
necessarily have the information for all architectures. As we don't use
this for anything its questionable whether we need it.
Feel free to try and figure it out for all these architectures :).
-apw
* Re : Re : Re : Re : sparsemem usage
2006-08-11 12:52 ` Andy Whitcroft
@ 2006-08-16 12:56 ` moreau francis
0 siblings, 0 replies; 15+ messages in thread
From: moreau francis @ 2006-08-16 12:56 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel
Andy Whitcroft wrote:
> moreau francis wrote:
>>
>> It's indeed an issue. Could we instead use a combination of flags
>> that can't happen together. For example PG_Free|PG_Reserved ?
>>
>
> You'd need to audit all other users of the bits you wanted to borrow to
> check they would understand. Like if you used PG_buddy (which I assume
> is what you are referring to above) then you'd get non-real memory
> getting merged into your buddies. Badness.
>
It would be great if we could define:
#define page_is_real(p) ((p)->_count > 0 || (p)->flags != 0)
Hence mem_map[] would automatically start out full of pages without
any real memory, instead of having to be initialized with a magic value.
>>
>> or maybe _because_ we don't have a consistent interface for finding
>> whether a page is real or not, we end up with a strange thing called
>> page_is_ram() which could be the same for all arch and be implemented
>> very simply.
>>
>> BTW, can you try in a linux tree:
>>
>> $ grep -r page_is_ram arch/
>>
>> and see how it's implemented...
>
> Well it depends how you look at it. You are going to need to know which
> pages are ram in each architecture to set the bits in the page*'s to
I don't see the problem there. You can init mem_map[] each time it is
allocated with the magic value (if the above definition can't be used).
Then, as usual, archs free all zone areas by initializing all mem_map
entries with something different from the magic value. After that, all
entries of mem_map[] with the magic value can be quickly discarded
because they don't have real memory.
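The page_is_real() idea above can be modelled in a few lines. This is only a toy sketch of the proposal, with hypothetical toy_ names and a plain int standing in for the kernel's atomic _count; it does not address the objections raised earlier in the thread about reusing existing fields:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 64UL

struct toy_page {
    unsigned long flags;
    int _count;  /* plain int here; an assumption made to keep the model simple */
};

/* Static storage is zero-initialised, so every entry starts as "not real". */
static struct toy_page toy_map[TOY_NR_PAGES];

/* A page counts as real once setup code has touched either field. */
#define toy_page_is_real(p) ((p)->_count > 0 || (p)->flags != 0)

/* Stand-in for arch setup code registering real memory in [start, end). */
static void toy_register_ram(unsigned long start, unsigned long end)
{
    unsigned long pfn;

    assert(start <= end && end <= TOY_NR_PAGES);
    for (pfn = start; pfn < end; pfn++)
        toy_map[pfn]._count = 1;
}
```

After toy_register_ram(0, 32), the first 32 entries test as real and the remaining entries are still all zero, so they are treated as holes without any explicit magic value.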
Francis