* Re : Re : sparsemem usage
@ 2006-08-03  9:07 moreau francis
  2006-08-03  9:19 ` KAMEZAWA Hiroyuki
  2006-08-03  9:47 ` Andy Whitcroft
  0 siblings, 2 replies; 13+ messages in thread
From: moreau francis @ 2006-08-03  9:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, apw

Alan Cox wrote:
>
> Mapping out parts of a section is quite normal - think about the 640K to
> 1Mb hole in PC memory space.

OK, but I'm still worried. Please consider the following code:

        for (...; ...; ...) {
                [...]
                if (pfn_valid(i))
                        num_physpages++;
                [...]
        }

In that case num_physpages won't hold an accurate value. Yet the kernel
still uses it to estimate the sizes of other kernel data structures.

Francis
        





* Re: Re : Re : sparsemem usage
  2006-08-03  9:07 Re : Re : sparsemem usage moreau francis
@ 2006-08-03  9:19 ` KAMEZAWA Hiroyuki
  2006-08-03  9:47 ` Andy Whitcroft
  1 sibling, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-03  9:19 UTC (permalink / raw)
  To: moreau francis; +Cc: alan, linux-kernel, apw

On Thu, 3 Aug 2006 09:07:06 +0000 (GMT)
moreau francis <francis_moreau2000@yahoo.fr> wrote:

> Alan Cox wrote:
> >
> > Mapping out parts of a section is quite normal - think about the 640K to
> > 1Mb hole in PC memory space.
> 
> OK. But I'm still worry. Please consider the following code
> 
>        for (...; ...; ...) {
>                 [...]
>                 if (pfn_valid(i))
>                        num_physpages++;
>                 [...]
>         }
> 
> In that case num_physpages won't store an accurate value. Still it will be
> used by the kernel to make some statistic assumptions on other kernel
> data structure sizes.
> 
As I understand it, pfn_valid() only tells you whether the pfn has a
struct page. So don't use pfn_valid() to count physical pages.


-Kame



* Re: Re : Re : sparsemem usage
  2006-08-03  9:07 Re : Re : sparsemem usage moreau francis
  2006-08-03  9:19 ` KAMEZAWA Hiroyuki
@ 2006-08-03  9:47 ` Andy Whitcroft
  2006-08-03 12:46   ` Re : " moreau francis
  1 sibling, 1 reply; 13+ messages in thread
From: Andy Whitcroft @ 2006-08-03  9:47 UTC (permalink / raw)
  To: moreau francis; +Cc: Alan Cox, linux-kernel

moreau francis wrote:
> Alan Cox wrote:
>> Mapping out parts of a section is quite normal - think about the 640K to
>> 1Mb hole in PC memory space.
> 
> OK. But I'm still worry. Please consider the following code
> 
>        for (...; ...; ...) {
>                 [...]
>                 if (pfn_valid(i))
>                        num_physpages++;
>                 [...]
>         }
> 
> In that case num_physpages won't store an accurate value. Still it will be
> used by the kernel to make some statistic assumptions on other kernel
> data structure sizes.

That would be incorrect usage.  pfn_valid() does not tell you whether
there is memory backing a pfn; it merely means you can interrogate the
page* for it.  A good example of code which counts pages in a region is
in count_highmem_pages(), which has the form below:

	for (pfn = start; pfn < end; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		page = pfn_to_page(pfn);
		if (PageReserved(page))
			continue;
		num_physpages++;
	}

-apw


* Re : Re : sparsemem usage
  2006-08-02 15:36 Andy Whitcroft
@ 2006-08-03  9:56 ` moreau francis
  0 siblings, 0 replies; 13+ messages in thread
From: moreau francis @ 2006-08-03  9:56 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

Andy Whitcroft wrote:
> When allocating we do not have a problem as we simply pull a free page 
> off the appropriately sizes free list.  Its when freeing we have an 
> issue, all the allocator has to work with is the page you are freeing. 
> As MAX_ORDER is >128K we can get to the situation where all but one page 
> is free.  When we free that page we then need to merge this 128Kb page 
> with its buddy if its free.   To tell if that one is free it has to look 
> at the page* for it, so that page* must also exist for this check to work.

Maybe the sparsemem code could mark a well-chosen page as reserved when
a memory region is smaller than a MAX_ORDER block. That way the buddy
allocator would never have to free a 128 KB block...

> pfn_valid() will indeed say 'ok'.  But that is defined only to mean that 
> it is safe to look at the page* for that page.  It says nothing else 
> about the page itself.  Pages which are reserved never get freed into 
> the allocator so they are not there to be allocated so we should not be 
> refering to them.

Wouldn't it be safer to mark these pages as "invalid" rather than
"reserved", using a special value stored in mem_map[]?

> There are tradeoffs here.  The smaller the section size the better the 
> internal fragmentation will be.  However also the more of them there 
> will be, the more space that will be used tracking them, the more 
> cachelines touched with them.  Also as we have seen we can't have things 
> in the allocator bigger than the section size.  This can constrain the 
> lower bound on the section size.  Finally, on 32 bit systems the overall 
> number of sections is bounded by the available space in the fields 
> section of the page* flags field.

thanks for that.

> If your system has 256 1Gb sections and 1 128Kb section then it could 
> well make sense to have a 1GB section size or perhaps a 256Mb section 
> size as you are only wasting space in the last section.

> I read that as saying there was a major gap to 3Gb and then it was 
> contigious from there; but then I was guessing at the units :).

Here is an updated version of my mapping (start address - size); it
should be clear now:

HOLE0: 0 - 3 GB
MEM1: 0xc000 0000 - 32 MB
HOLE1: 0xc200 0000 - 224 MB
MEM2: 0xd000 0000 - 8 MB
HOLE2: 0xd080 0000 - 120 MB
MEM3: 0xd800 0000 - 128 KB
HOLE3: rest of mem

Francis





* Re : Re : Re : sparsemem usage
  2006-08-03  9:47 ` Andy Whitcroft
@ 2006-08-03 12:46   ` moreau francis
  2006-08-03 13:13     ` Andy Whitcroft
  0 siblings, 1 reply; 13+ messages in thread
From: moreau francis @ 2006-08-03 12:46 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Alan Cox, linux-kernel

Andy Whitcroft wrote:
> That would be incorrect usage.  pfn_valid() simply doesn't tell you if 
> you have memory backing a pfn, it mearly means you can interrogate the 
> page* for it.  A good example of code which counts pages in a region is 
> in count_highmem_pages() which has a form as below:
> 
>             for (pfn = start; pfn < end; pfn++) {
>                   if (!pfn_valid(pfn))
>                                          continue;
>                                  page = pfn_to_page(pfn);
>                                  if (PageReserved(page))
>                                          continue;
>                 num_physpages++;
>             }
> 
num_physpages would still not give the total number of pages in the
system. It would report a value smaller than the total size of all
memory, which can be surprising depending on how it is used. I had
assumed that it should store the number of all pages in the system
(reserved + free + ...).

Furthermore, for the flatmem model, my example that counts the number of
physical pages is valid: reserved pages really are pages in use by the
kernel. But it is no longer valid for the sparsemem model. For
consistency and code sharing, I would give pfn_valid() and
PageReserved() the same meaning in both models.

Francis




* Re: Re : Re : Re : sparsemem usage
  2006-08-03 12:46   ` Re : " moreau francis
@ 2006-08-03 13:13     ` Andy Whitcroft
  2006-08-09 14:19       ` Re : " moreau francis
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Whitcroft @ 2006-08-03 13:13 UTC (permalink / raw)
  To: moreau francis; +Cc: Alan Cox, linux-kernel

moreau francis wrote:
> Andy Whitcroft wrote:
>> That would be incorrect usage.  pfn_valid() simply doesn't tell you if 
>> you have memory backing a pfn, it mearly means you can interrogate the 
>> page* for it.  A good example of code which counts pages in a region is 
>> in count_highmem_pages() which has a form as below:
>>
>>             for (pfn = start; pfn < end; pfn++) {
>>                   if (!pfn_valid(pfn))
>>                                          continue;
>>                                  page = pfn_to_page(pfn);
>>                                  if (PageReserved(page))
>>                                          continue;
>>                 num_physpages++;
>>             }
>>
> num_physpages would still not give the right total number of pages in the
> system. It will report a value smaller than the size of all memories which can
> be suprising, depending on how it is used. In my mind I thought that it should
> store the number of all pages in the system (reserved + free + ...).
> 
> Futhermore for flatmem model, my example that count the number of physical
> pages is valid: reserved pages are really pages that are in used by the kernel.
> But it's not valid anymore for sparsemem model. For consistency and code
> sharing, I would make the same meaning of pfn_valid() and PageReserved() for
> both models.

The semantics of both pfn_valid() and PageReserved() are the same in
all three memory models; they are just not what you need them to be for
your pfn_valid() loop to tell you how many real frames there are.
I do not believe it is correct to say that your loop would give you the
number of physical pages under FLATMEM.  Any gap at all (such as the IO
space just below 1MB) will pass pfn_valid() and yet does _not_ have any
real memory associated with it.  With FLATMEM you will get pfn_valid()
passing on non-memory pages.

I have to reiterate: pfn_valid() does not mean pfn_valid_memory(), it
means pfn_valid_memmap().  If you want to know if a page is valid and
memory (at least on x86) you could use:

	if (pfn_valid(pfn) && page_is_ram(pfn)) {
	}

It is rare that you care how many real page frames there are in the
system.  You are more interested in how many usable frames there are,
for instance when sizing hashes or caches.  The reserved pages should be
excluded from this calculation.  ACPI pages, BIOS pages and the like
are simply of no interest.

I don't see anywhere in the kernel using that construct to work out how
many pages there are in the system.  Mostly we have architectural
information to tell us what real physical pages exist in the system,
such as the SRAT or e820 etc.  If we really care about real page counts
at that accuracy we have those to refer to.

Do you have a usage model in which we really care about the number of 
pages in the system to that level of accuracy?

-apw


* Re : Re : Re : Re : sparsemem usage
  2006-08-03 13:13     ` Andy Whitcroft
@ 2006-08-09 14:19       ` moreau francis
  2006-08-10  4:46         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: moreau francis @ 2006-08-09 14:19 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Alan Cox, linux-kernel

Andy Whitcroft wrote:
> 
> I have to re-iterate pfn_valid() does not mean pfn_valid_memory(), it
> means pfn_valid_memmap().  If you want to know if a page is valid and
> memory (at least on x86) you could use:
> 
>     if (pfn_valid(pfn) && page_is_ram(pfn)) {
>     }
> 
> It is rare you care how many real page frames there are in the system.
> You are more interested in how many usable frames there are.  Such as
> for use in sizing hashes or caches.  The reserved pages should be
> excluded in this calculation.  ACPI pages, BIOS pages and the like
> simply are no interest.
> 
> I don't see anywhere in the kernel using that construct to work out how
> many pages there are in the system.  Mostly we have architectual
> information to tell us what real physical pages exist in the system such
> as the srat or e820 etc.  If we really care about real page counts at
> that accuracy we have those to refer to.
> 

Not all architectures have page_is_ram(). OK, it should be easy to add,
but we would need to create new data structures to keep this info. My
point is that it is really easy for a memory model such as sparsemem to
keep this info.

> Do you have a usage model in which we really care about the number of
> pages in the system to that level of accuracy?
> 

show_mem(), which is arch specific, needs to report them. And some
implementations use only pfn_valid().

thanks

Francis




* Re: Re : Re : Re : Re : sparsemem usage
  2006-08-09 14:19       ` Re : " moreau francis
@ 2006-08-10  4:46         ` KAMEZAWA Hiroyuki
  2006-08-10 12:40           ` moreau francis
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10  4:46 UTC (permalink / raw)
  To: moreau francis; +Cc: apw, alan, linux-kernel

On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
moreau francis <francis_moreau2000@yahoo.fr> wrote:

> Not all arch have page_is_ram(). OK it should be easy to have but we will
> need create new data structures to keep this info. The point is that it's
> really easy for memory model such sparsemem to keep this info.
> 
> > Do you have a usage model in which we really care about the number of
> > pages in the system to that level of accuracy?
> > 
> 
> show_mem(), which is arch specific, needs to report them. And some
> implementations use only pfn_valid().
> 

BTW, is the ioresource information (see kernel/resource.c)

[kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-2dfeffff : System RAM

not enough?

I think kdump depends on this resource information to determine
which regions should be dumped.

-Kame




* Re : sparsemem usage
  2006-08-10  4:46         ` KAMEZAWA Hiroyuki
@ 2006-08-10 12:40           ` moreau francis
  2006-08-10 12:49             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: moreau francis @ 2006-08-10 12:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: apw, alan, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
> moreau francis <francis_moreau2000@yahoo.fr> wrote:
> 
>> Not all arch have page_is_ram(). OK it should be easy to have but we will
>> need create new data structures to keep this info. The point is that it's
>> really easy for memory model such sparsemem to keep this info.
>>
>>> Do you have a usage model in which we really care about the number of
>>> pages in the system to that level of accuracy?
>>>
>> show_mem(), which is arch specific, needs to report them. And some
>> implementations use only pfn_valid().
>>
> 
> BTW, ioresouce information (see kernel/resouce.c)
> 
> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> 00000000-0009fbff : System RAM
> 000a0000-000bffff : Video RAM area
> 00100000-2dfeffff : System RAM
> 
> is not enough ?
> 

Well, actually you show that to get a really simple piece of
information, i.e. does a page exist?, we need to parse some kernel data
structures like ioresource (which is, IMHO, hackish) or duplicate in
each architecture some data to keep track of existing pages.

Francis


* Re: Re : sparsemem usage
  2006-08-10 12:40           ` moreau francis
@ 2006-08-10 12:49             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10 12:49 UTC (permalink / raw)
  To: moreau francis; +Cc: apw, alan, linux-kernel

On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
moreau francis <francis_moreau2000@yahoo.fr> wrote: 
> > BTW, ioresouce information (see kernel/resouce.c)
> > 
> > [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> > 00000000-0009fbff : System RAM
> > 000a0000-000bffff : Video RAM area
> > 00100000-2dfeffff : System RAM
> > 
> > is not enough ?
> > 
> 
> well actually you show that to get a really simple information, ie does
> a page exist ?, we need to parse some kernel data structures like 
> ioresource (which is, IMHO, hackish) or duplicate in each architecture
> some data to keep track of existing pages.
> 

Because the memory map from e820 (x86) or EFI (ia64) is registered in
iomem_resource, we should avoid duplicating that information. kdump and
memory hotplug use this information. (Memory hotplug updates
iomem_resource.)

Implementing a "page_is_exist" function based on ioresource is one
generic and robust way to go, I think.
(If the performance of list walking is a problem, enhancing the
ioresource code is the better fix.)
 
-Kame



* Re : Re : sparsemem usage
  2006-08-10 15:05 KAMEZAWA Hiroyuki
@ 2006-08-10 15:23 ` moreau francis
  0 siblings, 0 replies; 13+ messages in thread
From: moreau francis @ 2006-08-10 15:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: apw, alan, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Thu, 10 Aug 2006 14:46:01 +0000 (GMT)
> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>> Why not implementing page_exist() by simply using mem_map[] ? When
>> allocating mem_map[], we can just fill it with a special value. And
>> then when registering memory area, we clear this special value with
>> the "reserved" value. Hence for flatmem model, we can have:
>>
>> #define page_exist(pfn)        (mem_map[pfn] != SPECIAL_VALUE)
>>  
> putting a special value to a page struct at mem_map + pfn ?

yes

> 
>> and it should work for sparsemem too and other models that will use
>> mem_map[].
>>
>> Another point, is page_exist() going to replace page_valid() ?
> what is page_valid() here ? pfn_valid() (in current kernel) ?

Sorry, I meant pfn_valid() rather than page_valid() throughout that
email.

> 
>> I mean page_exist() is going to be something more accurate than
>> page_valid(). All tests on page_valid() _only_ will be fine to test
>> page_exist(). But all tests such:
>>
>>     if (page_valid(x) && page_is_ram(x))
>>
>> can be replaced by
>>
>>     if (page_exist(x))
>>
>> So, again, why not simply improving page_valid() definition rather
>> than introduce a new service ?
>>

s/page_valid/pfn_valid

> I welcome to do that if implementation is sane.
> pfn_valid() --- check there is a page struct
> page_exist() --- check there is a physical memory.
> 

The new definition of pfn_valid() would be "a physical page exists". And
this definition implies the old one, "it's safe to read the page struct *".

> but discussing without patch is not very good. please post your patch.
> Then we can discuss more concrete things.
> 

Since I'm not a kernel hacker, or rather only a newbie one, I am trying
to make sure that it is worth digging in that direction before working
hard on a patch.

thanks

Francis





* Re : Re : sparsemem usage
  2006-08-10 15:21 Andy Whitcroft
@ 2006-08-10 15:37 ` moreau francis
  2006-08-11  8:26   ` Andy Whitcroft
  0 siblings, 1 reply; 13+ messages in thread
From: moreau francis @ 2006-08-10 15:37 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel

Andy Whitcroft wrote:
> moreau francis wrote:
>> KAMEZAWA Hiroyuki wrote:
>>> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
>>> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>>>>> BTW, ioresouce information (see kernel/resouce.c)
>>>>>
>>>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>>>> 00000000-0009fbff : System RAM
>>>>> 000a0000-000bffff : Video RAM area
>>>>> 00100000-2dfeffff : System RAM
>>>>>
>>>>> is not enough ?
>>>>>
>>>> well actually you show that to get a really simple information, ie does
>>>> a page exist ?, we need to parse some kernel data structures like
>>>> ioresource (which is, IMHO, hackish) or duplicate in each architecture
>>>> some data to keep track of existing pages.
>>>>
>>> becasue memory map from e820(x86) or efi(ia64) are registered to
>>> iomem_resource,
>>> we should avoid duplicates that information. kdump and memory hotplug
>>> uses
>>> this information. (memory hotplug updates this iomem_resource.)
>>>
>>> Implementing "page_is_exist" function based on ioresouce is one of
>>> generic
>>> and rubust way to go, I think.
>>> (if performance of list walking is problem, enhancing ioresouce code is
>>>  better.)
>>>  
>>
>> Why not implementing page_exist() by simply using mem_map[] ? When
>> allocating mem_map[], we can just fill it with a special value. And
>> then when registering memory area, we clear this special value with
>> the "reserved" value. Hence for flatmem model, we can have:
>>
>> #define page_exist(pfn)        (mem_map[pfn] != SPECIAL_VALUE)
>>
>> and it should work for sparsemem too and other models that will use
>> mem_map[].
> 
> The mem_map isn't a pointer, its a physical structure.  We have a

ok

> special value to tell you if the page is usable within that, thats
> called PG_reserved.  If this page is reserved the kernel can't touch it,
> can't look at it.

Can't we introduce a new special value, such as "PG_real"?

> 
>> Another point, is page_exist() going to replace page_valid() ?
>> I mean page_exist() is going to be something more accurate than
>> page_valid(). All tests on page_valid() _only_ will be fine to test
>> page_exist(). But all tests such:
>>
>>     if (page_valid(x) && page_is_ram(x))
>>
>> can be replaced by
>>
>>     if (page_exist(x))
>>
>> So, again, why not simply improving page_valid() definition rather
>> than introduce a new service ?
> 
> Whilst I can understand that not knowing if a page is real or not is
> perhaps unappealing, I've yet to see any case where we need or care.
> Changing things to make things 'nicer' interlectually is sometimes
> worthwhile.  But what is the user here.
> 
> The only consumer you have shown is show_mem() which is a debug
> function, and that only dumps out the current memory counts.  Its not
> clear it cares to really know if a page is real or not.
> 

I understand your point of view, but even if it's a debug function,
it must exist and report correct information. And my point is that
I think this should be really easy to implement :) by using
a new "special value". Can you confirm that it's really easy to
implement?

thanks

Francis




* Re: Re : Re : sparsemem usage
  2006-08-10 15:37 ` Re : " moreau francis
@ 2006-08-11  8:26   ` Andy Whitcroft
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Whitcroft @ 2006-08-11  8:26 UTC (permalink / raw)
  To: moreau francis; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel

moreau francis wrote:
> Andy Whitcroft wrote:
>> moreau francis wrote:
>>> KAMEZAWA Hiroyuki wrote:
>>>> On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
>>>> moreau francis <francis_moreau2000@yahoo.fr> wrote:
>>>>>> BTW, ioresouce information (see kernel/resouce.c)
>>>>>>
>>>>>> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
>>>>>> 00000000-0009fbff : System RAM
>>>>>> 000a0000-000bffff : Video RAM area
>>>>>> 00100000-2dfeffff : System RAM
>>>>>>
>>>>>> is not enough ?
>>>>>>
>>>>> well actually you show that to get a really simple information, ie does
>>>>> a page exist ?, we need to parse some kernel data structures like
>>>>> ioresource (which is, IMHO, hackish) or duplicate in each architecture
>>>>> some data to keep track of existing pages.
>>>>>
>>>> becasue memory map from e820(x86) or efi(ia64) are registered to
>>>> iomem_resource,
>>>> we should avoid duplicates that information. kdump and memory hotplug
>>>> uses
>>>> this information. (memory hotplug updates this iomem_resource.)
>>>>
>>>> Implementing "page_is_exist" function based on ioresouce is one of
>>>> generic
>>>> and rubust way to go, I think.
>>>> (if performance of list walking is problem, enhancing ioresouce code is
>>>>  better.)
>>>>  
>>> Why not implementing page_exist() by simply using mem_map[] ? When
>>> allocating mem_map[], we can just fill it with a special value. And
>>> then when registering memory area, we clear this special value with
>>> the "reserved" value. Hence for flatmem model, we can have:
>>>
>>> #define page_exist(pfn)        (mem_map[pfn] != SPECIAL_VALUE)
>>>
>>> and it should work for sparsemem too and other models that will use
>>> mem_map[].
>> The mem_map isn't a pointer, its a physical structure.  We have a
> 
> ok
> 
>> special value to tell you if the page is usable within that, thats
>> called PG_reserved.  If this page is reserved the kernel can't touch it,
>> can't look at it.
> 
> can't we introduce a new special value, such as "PG_real" ?
> 
>>> Another point, is page_exist() going to replace page_valid() ?
>>> I mean page_exist() is going to be something more accurate than
>>> page_valid(). All tests on page_valid() _only_ will be fine to test
>>> page_exist(). But all tests such:
>>>
>>>     if (page_valid(x) && page_is_ram(x))
>>>
>>> can be replaced by
>>>
>>>     if (page_exist(x))
>>>
>>> So, again, why not simply improving page_valid() definition rather
>>> than introduce a new service ?
>> Whilst I can understand that not knowing if a page is real or not is
>> perhaps unappealing, I've yet to see any case where we need or care.
>> Changing things to make things 'nicer' interlectually is sometimes
>> worthwhile.  But what is the user here.
>>
>> The only consumer you have shown is show_mem() which is a debug
>> function, and that only dumps out the current memory counts.  Its not
>> clear it cares to really know if a page is real or not.
>>
> 
> I understand your point of view, but even if it's a debug function,
> it must exist and report correct information. And my point is that
> I think it should be really easy to implement :) that by using
> a new "special value". Can you confirm that it's really easy to
> implement that ?

It does produce real numbers: it tells you how many reserved pages you
have.  In the places where this is triggered we are interested in why we
have no memory left.  We are not interested in how many pages are known
but reserved as against how many pages are backed by page*'s but are
really holes; they are merely pages we can't use out of the total we are
tracking.  We care about how many are not reserved, and how many of
those are available.

It would be 'as simple' as adding a PG_real page bit except for two things:

1) page flag bits are in seriously short supply; there are some 24
available, of which 22 are in use.  Any new user of a bit would have to
be an extremely valuable change with major benefit to the kernel, and

2) if you were to try to populate a PG_real flag it would need to be
populated for _all_ architectures (and there are a lot) for it to be of
any use.  As you have already noted there is no consistent way to find
out whether a page is RAM, so it would be a major exercise to get these
bits set up during boot.

I think we should take (2) as a hint here.  If we don't have a 
consistent interface for finding whether a page is real or not, we 
obviously have no general need of that information in the kernel.

Yes, we obviously care whether we can use a page, but we do not care
whether the page is unusable because it contains an ACPI table or the
video driver BIOS or because there is a memory hole.  It's either usable
(!PG_reserved) or it's not (PG_reserved).

-apw

