linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re : Re : sparsemem usage
@ 2006-08-03  9:07 moreau francis
  2006-08-03  9:19 ` KAMEZAWA Hiroyuki
  2006-08-03  9:47 ` Andy Whitcroft
  0 siblings, 2 replies; 11+ messages in thread
From: moreau francis @ 2006-08-03  9:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, apw

Alan Cox wrote:
>
> Mapping out parts of a section is quite normal - think about the 640K to
> 1Mb hole in PC memory space.

OK. But I'm still worry. Please consider the following code

       for (...; ...; ...) {
                [...]
                if (pfn_valid(i))
                       num_physpages++;
                [...]
        }

In that case num_physpages won't store an accurate value. Still it will be
used by the kernel to make some statistic assumptions on other kernel
data structure sizes.

Francis
        




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re : Re : sparsemem usage
  2006-08-03  9:07 Re : Re : sparsemem usage moreau francis
@ 2006-08-03  9:19 ` KAMEZAWA Hiroyuki
  2006-08-03  9:47 ` Andy Whitcroft
  1 sibling, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-03  9:19 UTC (permalink / raw)
  To: moreau francis; +Cc: alan, linux-kernel, apw

On Thu, 3 Aug 2006 09:07:06 +0000 (GMT)
moreau francis <francis_moreau2000@yahoo.fr> wrote:

> Alan Cox wrote:
> >
> > Mapping out parts of a section is quite normal - think about the 640K to
> > 1Mb hole in PC memory space.
> 
> OK. But I'm still worry. Please consider the following code
> 
>        for (...; ...; ...) {
>                 [...]
>                 if (pfn_valid(i))
>                        num_physpages++;
>                 [...]
>         }
> 
> In that case num_physpages won't store an accurate value. Still it will be
> used by the kernel to make some statistic assumptions on other kernel
> data structure sizes.
> 
In my understanding, pfn_valid() just returns "the page has page struct or not".
So, don't use pfn_valid() to count physical pages..


-Kame


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re : Re : sparsemem usage
  2006-08-03  9:07 Re : Re : sparsemem usage moreau francis
  2006-08-03  9:19 ` KAMEZAWA Hiroyuki
@ 2006-08-03  9:47 ` Andy Whitcroft
  2006-08-03 12:46   ` Re : " moreau francis
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Whitcroft @ 2006-08-03  9:47 UTC (permalink / raw)
  To: moreau francis; +Cc: Alan Cox, linux-kernel

moreau francis wrote:
> Alan Cox wrote:
>> Mapping out parts of a section is quite normal - think about the 640K to
>> 1Mb hole in PC memory space.
> 
> OK. But I'm still worry. Please consider the following code
> 
>        for (...; ...; ...) {
>                 [...]
>                 if (pfn_valid(i))
>                        num_physpages++;
>                 [...]
>         }
> 
> In that case num_physpages won't store an accurate value. Still it will be
> used by the kernel to make some statistic assumptions on other kernel
> data structure sizes.

That would be incorrect usage.  pfn_valid() simply doesn't tell you if 
you have memory backing a pfn, it mearly means you can interrogate the 
page* for it.  A good example of code which counts pages in a region is 
in count_highmem_pages() which has a form as below:

			for (pfn = start; pfn < end; pfn++) {
  				if (!pfn_valid(pfn))
                                         continue;
                                 page = pfn_to_page(pfn);
                                 if (PageReserved(page))
                                         continue;
				num_physpages++;
			}

-apw

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re : Re : Re : sparsemem usage
  2006-08-03  9:47 ` Andy Whitcroft
@ 2006-08-03 12:46   ` moreau francis
  2006-08-03 13:13     ` Andy Whitcroft
  0 siblings, 1 reply; 11+ messages in thread
From: moreau francis @ 2006-08-03 12:46 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Alan Cox, linux-kernel

Andy Whitcroft wrote:
> That would be incorrect usage.  pfn_valid() simply doesn't tell you if 
> you have memory backing a pfn, it mearly means you can interrogate the 
> page* for it.  A good example of code which counts pages in a region is 
> in count_highmem_pages() which has a form as below:
> 
>             for (pfn = start; pfn < end; pfn++) {
>                   if (!pfn_valid(pfn))
>                                          continue;
>                                  page = pfn_to_page(pfn);
>                                  if (PageReserved(page))
>                                          continue;
>                 num_physpages++;
>             }
> 
num_physpages would still not give the right total number of pages in the
system. It will report a value smaller than the size of all memories which can
be suprising, depending on how it is used. In my mind I thought that it should
store the number of all pages in the system (reserved + free + ...).

Futhermore for flatmem model, my example that count the number of physical
pages is valid: reserved pages are really pages that are in used by the kernel.
But it's not valid anymore for sparsemem model. For consistency and code
sharing, I would make the same meaning of pfn_valid() and PageReserved() for
both models.

Francis



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re : Re : Re : sparsemem usage
  2006-08-03 12:46   ` Re : " moreau francis
@ 2006-08-03 13:13     ` Andy Whitcroft
  2006-08-09 14:19       ` Re : " moreau francis
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Whitcroft @ 2006-08-03 13:13 UTC (permalink / raw)
  To: moreau francis; +Cc: Alan Cox, linux-kernel

moreau francis wrote:
> Andy Whitcroft wrote:
>> That would be incorrect usage.  pfn_valid() simply doesn't tell you if 
>> you have memory backing a pfn, it mearly means you can interrogate the 
>> page* for it.  A good example of code which counts pages in a region is 
>> in count_highmem_pages() which has a form as below:
>>
>>             for (pfn = start; pfn < end; pfn++) {
>>                   if (!pfn_valid(pfn))
>>                                          continue;
>>                                  page = pfn_to_page(pfn);
>>                                  if (PageReserved(page))
>>                                          continue;
>>                 num_physpages++;
>>             }
>>
> num_physpages would still not give the right total number of pages in the
> system. It will report a value smaller than the size of all memories which can
> be suprising, depending on how it is used. In my mind I thought that it should
> store the number of all pages in the system (reserved + free + ...).
> 
> Futhermore for flatmem model, my example that count the number of physical
> pages is valid: reserved pages are really pages that are in used by the kernel.
> But it's not valid anymore for sparsemem model. For consistency and code
> sharing, I would make the same meaning of pfn_valid() and PageReserved() for
> both models.

The semantics and meaning of both pfn_valid() and PageReserved() are the 
same in all three memory models, just not what you need them to be for 
your pfn_valid() loop to tell you how many real frames there are.
I do not believe it is correct to say that your loop would give you the 
number of physical pages under FLATMEM.  If there are any gaps at all 
(such as there is for IO space just below 1MB) that will pass 
pfn_valid(), and yet does _not_ have any real memory associated with it.
With FLATMEM you will get pfn_valid() passing on non-memory pages.

I have to re-iterate pfn_valid() does not mean pfn_valid_memory(), it 
means pfn_valid_memmap().  If you want to know if a page is valid and 
memory (at least on x86) you could use:

	if (pfn_valid(pfn) && page_is_ram(pfn)) {
	}

It is rare you care how many real page frames there are in the system. 
You are more interested in how many usable frames there are.  Such as 
for use in sizing hashes or caches.  The reserved pages should be 
excluded in this calculation.  ACPI pages, BIOS pages and the like 
simply are no interest.

I don't see anywhere in the kernel using that construct to work out how 
many pages there are in the system.  Mostly we have architectual 
information to tell us what real physical pages exist in the system such 
as the srat or e820 etc.  If we really care about real page counts at 
that accuracy we have those to refer to.

Do you have a usage model in which we really care about the number of 
pages in the system to that level of accuracy?

-apw

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re : Re : Re : Re : sparsemem usage
  2006-08-03 13:13     ` Andy Whitcroft
@ 2006-08-09 14:19       ` moreau francis
  2006-08-10  4:46         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 11+ messages in thread
From: moreau francis @ 2006-08-09 14:19 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Alan Cox, linux-kernel

Andy Whitcroft wrote:
> 
> I have to re-iterate pfn_valid() does not mean pfn_valid_memory(), it
> means pfn_valid_memmap().  If you want to know if a page is valid and
> memory (at least on x86) you could use:
> 
>     if (pfn_valid(pfn) && page_is_ram(pfn)) {
>     }
> 
> It is rare you care how many real page frames there are in the system.
> You are more interested in how many usable frames there are.  Such as
> for use in sizing hashes or caches.  The reserved pages should be
> excluded in this calculation.  ACPI pages, BIOS pages and the like
> simply are no interest.
> 
> I don't see anywhere in the kernel using that construct to work out how
> many pages there are in the system.  Mostly we have architectual
> information to tell us what real physical pages exist in the system such
> as the srat or e820 etc.  If we really care about real page counts at
> that accuracy we have those to refer to.
> 

Not all arch have page_is_ram(). OK it should be easy to have but we will
need create new data structures to keep this info. The point is that it's
really easy for memory model such sparsemem to keep this info.

> Do you have a usage model in which we really care about the number of
> pages in the system to that level of accuracy?
> 

show_mem(), which is arch specific, needs to report them. And some
implementations use only pfn_valid().

thanks

Francis



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re : Re : Re : Re : sparsemem usage
  2006-08-09 14:19       ` Re : " moreau francis
@ 2006-08-10  4:46         ` KAMEZAWA Hiroyuki
  2006-08-10 12:40           ` moreau francis
  0 siblings, 1 reply; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10  4:46 UTC (permalink / raw)
  To: moreau francis; +Cc: apw, alan, linux-kernel

On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
moreau francis <francis_moreau2000@yahoo.fr> wrote:

> Not all arch have page_is_ram(). OK it should be easy to have but we will
> need create new data structures to keep this info. The point is that it's
> really easy for memory model such sparsemem to keep this info.
> 
> > Do you have a usage model in which we really care about the number of
> > pages in the system to that level of accuracy?
> > 
> 
> show_mem(), which is arch specific, needs to report them. And some
> implementations use only pfn_valid().
> 

BTW, ioresouce information (see kernel/resouce.c)

[kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-2dfeffff : System RAM

is not enough ?

I think kdump depends on this resouce information to determine 
where should be dumped.

-Kame



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re : sparsemem usage
  2006-08-10  4:46         ` KAMEZAWA Hiroyuki
@ 2006-08-10 12:40           ` moreau francis
  2006-08-10 12:49             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 11+ messages in thread
From: moreau francis @ 2006-08-10 12:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: apw, alan, linux-kernel

KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
> moreau francis <francis_moreau2000@yahoo.fr> wrote:
> 
>> Not all arch have page_is_ram(). OK it should be easy to have but we will
>> need create new data structures to keep this info. The point is that it's
>> really easy for memory model such sparsemem to keep this info.
>>
>>> Do you have a usage model in which we really care about the number of
>>> pages in the system to that level of accuracy?
>>>
>> show_mem(), which is arch specific, needs to report them. And some
>> implementations use only pfn_valid().
>>
> 
> BTW, ioresouce information (see kernel/resouce.c)
> 
> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> 00000000-0009fbff : System RAM
> 000a0000-000bffff : Video RAM area
> 00100000-2dfeffff : System RAM
> 
> is not enough ?
> 

well actually you show that to get a really simple information, ie does
a page exist ?, we need to parse some kernel data structures like 
ioresource (which is, IMHO, hackish) or duplicate in each architecture
some data to keep track of existing pages.

Francis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re : sparsemem usage
  2006-08-10 12:40           ` moreau francis
@ 2006-08-10 12:49             ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-10 12:49 UTC (permalink / raw)
  To: moreau francis; +Cc: apw, alan, linux-kernel

On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
moreau francis <francis_moreau2000@yahoo.fr> wrote: 
> > BTW, ioresouce information (see kernel/resouce.c)
> > 
> > [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> > 00000000-0009fbff : System RAM
> > 000a0000-000bffff : Video RAM area
> > 00100000-2dfeffff : System RAM
> > 
> > is not enough ?
> > 
> 
> well actually you show that to get a really simple information, ie does
> a page exist ?, we need to parse some kernel data structures like 
> ioresource (which is, IMHO, hackish) or duplicate in each architecture
> some data to keep track of existing pages.
> 

becasue memory map from e820(x86) or efi(ia64) are registered to iomem_resource,
we should avoid duplicates that information. kdump and memory hotplug uses
this information. (memory hotplug updates this iomem_resource.)

Implementing "page_is_exist" function based on ioresouce is one of generic
and rubust way to go, I think.
(if performance of list walking is problem, enhancing ioresouce code is
 better.)
 
-Kame


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re : Re : Re : sparsemem usage
  2006-08-11  8:26 Re : " Andy Whitcroft
@ 2006-08-11 12:46 ` moreau francis
  2006-08-11 12:52   ` Andy Whitcroft
  0 siblings, 1 reply; 11+ messages in thread
From: moreau francis @ 2006-08-11 12:46 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel

Andy Whitcroft wrote:
> It does produce real numbers, it tells you how many reserved pages you
> have.  The places that this is triggered we are interested in why we
> have no memory left.  We are not interested in how many pages are known
> but reserved as against how many pages are backed by page*'s but are
> really holes; they are mearly pages we can't use out of the total we are
> tracking.  We care about how many are not reserved, and how many of
> those are available.
> 
> It would be 'as simple' as adding a PG_real page bit except for two things:
> 
> 1) page flags bits are seriously short supply; there are some 24
> available of which 22 are in use.  Any new user of a bit would have to
> be an extremely valuable change with major benefit to the kernel, and
> 

It's indeed an issue. Could we instead use a combination of flags
that can't happen together. For example PG_Free|PG_Reserved ?

> 2) if you were to try and populate a PG_real flag it would need to be
> populated for _all_ architectures (and there are a lot) for it to be of
> any use.  As you have already noted there is no consistent way to find
> out whether a page is ram so it would be major exercise to get these
> bits setup during boot.
> 
> I think we should take (2) as a hint here.  If we don't have a
> consistent interface for finding whether a page is real or not, we
> obviously have no general need of that information in the kernel.
> 

or maybe _because_ we don't have a consistent interface for finding 
whether a page is real or not, we end up with a strange thing called
page_is_ram() which could be the same for all arch and be implemented
very simply.

BTW, can you try in a linux tree:

$ grep -r page_is_ram arch/

and see how it's implemented...

thanks

Francis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re : Re : Re : sparsemem usage
  2006-08-11 12:46 ` Re : " moreau francis
@ 2006-08-11 12:52   ` Andy Whitcroft
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Whitcroft @ 2006-08-11 12:52 UTC (permalink / raw)
  To: moreau francis; +Cc: KAMEZAWA Hiroyuki, alan, linux-kernel

moreau francis wrote:
> Andy Whitcroft wrote:
>> It does produce real numbers, it tells you how many reserved pages you
>> have.  The places that this is triggered we are interested in why we
>> have no memory left.  We are not interested in how many pages are known
>> but reserved as against how many pages are backed by page*'s but are
>> really holes; they are mearly pages we can't use out of the total we are
>> tracking.  We care about how many are not reserved, and how many of
>> those are available.
>>
>> It would be 'as simple' as adding a PG_real page bit except for two things:
>>
>> 1) page flags bits are seriously short supply; there are some 24
>> available of which 22 are in use.  Any new user of a bit would have to
>> be an extremely valuable change with major benefit to the kernel, and
>>
> 
> It's indeed an issue. Could we instead use a combination of flags
> that can't happen together. For example PG_Free|PG_Reserved ?
> 

You'd need to audit all other users of the bits you wanted to borrow to 
check they would understand.  Like if you used PG_buddy (which I assume 
is what you are referring to above) then you'd get non-real memory 
getting merged into your buddies.  Badness.

>> 2) if you were to try and populate a PG_real flag it would need to be
>> populated for _all_ architectures (and there are a lot) for it to be of
>> any use.  As you have already noted there is no consistent way to find
>> out whether a page is ram so it would be major exercise to get these
>> bits setup during boot.
>>
>> I think we should take (2) as a hint here.  If we don't have a
>> consistent interface for finding whether a page is real or not, we
>> obviously have no general need of that information in the kernel.
>>
> 
> or maybe _because_ we don't have a consistent interface for finding 
> whether a page is real or not, we end up with a strange thing called
> page_is_ram() which could be the same for all arch and be implemented
> very simply.
> 
> BTW, can you try in a linux tree:
> 
> $ grep -r page_is_ram arch/
> 
> and see how it's implemented...

Well it depends how you look at it.  You are going to need to know which 
pages are ram in each architecture to set the bits in the page*'s to 
tell us later.  So the problem is the same problem.  We don't 
necessarily have the information for all architectures.  As we don't use 
this for anything its questionable whether we need it.

Feel free to try and figure it out for all these architectures :).

-apw

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-08-11 12:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-03  9:07 Re : Re : sparsemem usage moreau francis
2006-08-03  9:19 ` KAMEZAWA Hiroyuki
2006-08-03  9:47 ` Andy Whitcroft
2006-08-03 12:46   ` Re : " moreau francis
2006-08-03 13:13     ` Andy Whitcroft
2006-08-09 14:19       ` Re : " moreau francis
2006-08-10  4:46         ` KAMEZAWA Hiroyuki
2006-08-10 12:40           ` moreau francis
2006-08-10 12:49             ` KAMEZAWA Hiroyuki
  -- strict thread matches above, loose matches on Subject: below --
2006-08-11  8:26 Re : " Andy Whitcroft
2006-08-11 12:46 ` Re : " moreau francis
2006-08-11 12:52   ` Andy Whitcroft

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).