All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Richard Moser <nigelenki@comcast.net>
To: Helge Hafting <helge.hafting@aitel.hist.no>
Cc: "linux-os (Dick Johnson)" <linux-os@analogic.com>,
	linux-kernel@vger.kernel.org
Subject: Re: Huge pages and small pages. . .
Date: Wed, 18 Jan 2006 14:11:33 -0500	[thread overview]
Message-ID: <43CE92E5.6030206@comcast.net> (raw)
In-Reply-To: <43CE1A46.1020601@aitel.hist.no>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Helge Hafting wrote:
> John Richard Moser wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>>
>> linux-os (Dick Johnson) wrote:
>>  
>>
>>> On Tue, 17 Jan 2006, John Richard Moser wrote:
>>>
>>>
>>> Is there anything in the kernel that shifts the physical pages for 1024
>>> physically allocated and contiguous virtual pages together physically
>>> and remaps them as one huge page?  This would probably work well for the
>>>
>>>
>>>   
>>>
>>>> A page is something that is defined by the CPU. Perhaps you mean
>>>> "order"? When acquiring pages for DMA, they need to be contiguous if
>>>> you are going to access more than one page at a time. Therefore, one
>>>> can attempt to get two or more pages, i.e., the order or pages.
>>>>     
>>>> Since the CPU uses virtual memory always, there is no advantage to
>>>> having contiguous pages. You just map anything that's free into
>>>> what looks like contiguous memory and away you go.
>>>>     
>>
>>
>> Well, pages are typically 4KiB seen by the MMU.  If you fault across
>> them, you need to have them cached in the TLB; if the TLB runs out of
>> room, you invalidate entries; then when you hit entries not in the TLB,
>> the TLB has to searhc for the page mapping in the PTE chain.
>>
>> There are 4MiB pages, called "huge pages," that if you clump 1024
>> contiguous 4KiB pages together and draw a PTE entry up for can correlate
>> to a single TLB entry.  In this way, there's no page faulting until you
>> cross boundaries spaced 4MiB apart from eachother, and you use 1 TLB
>> entry where you would normally use 1024.
>>
>>  
>>
>>> low end of the heap, until someone figures out a way to tell the system
>>> to free intermittent pages in a big mapping (if the heap has an
>>> allocation up high, it can have huge, unused areas that are allocated).
>>>
>>>
>>>   
>>>
>>>> The actual allocation only occurs when an access happens. You can
>>>> allocate all the virtual memory in the machine and never use any
>>>> of it. When you allocate memory, the kernel just marks a promised
>>>> page 'not present'. When you attempt to access it, a page-fault
>>>> occurs and the kernel tries to find a free page to map into your
>>>> address space.
>>>>     
>>
>>
>> Yes.  The heap manager brk()s up the heap to allocate more space, all
>> mapped to the zero page; then the application writes to these pages,
>> causing them to be COW'd to real memory.  They will stay forever
>> allocated until the highest pages of the heap are unused by the program,
>> in which case the heap manager brk()s down the heap and frees them to
>> the system.
>>
>> Currently the heap manager can't seem to tell the system that a page
>> somewhere in the middle of the heap isn't really needed anymore, and can
>> be freed and mapped back to the zero page to await COW again.  So in
>> effect, you'll eventually wind out with a heap that's 20, 50, 100, or
>> 200 megs wide and probably all actually mapped to real, usable memory;
>> at this point, you can probably replace most of those entries with huge
>> pages to save on TLB entries and page faults.
>>  
>>
> This would be a nice performance win _if_ the program is
> actually using all those pages regularly. And I mean _all_.
> 
> The idea fails if we get any memory pressure, and some
> of the little-used pages gets swapped out.  You can't swap
> out part of a huge page, so you either have to break it up,
> or suffer the possibly large performance loss of having
> one megabyte less of virtual memory.   (That'll be even worse
> if several processes do this, of course.)
> 
>> When the program would try to free up "pages" in a huge page, the kernel
>> would have to recognize that the application is working in terms of 4KiB
>> small pages, and take appropriate action shattering the huge page back
>> into 1024 small pages first.
>>  
>>
> To see whether this is worthwile, I suggest instrumenting a kernel
> to detect:
> (1) When a process gets opportunity to have a huge page
>      (I.e. it has a sufficiently large and aligned memory block,
>       none of it is swapped, and rearranging into contigous
>       physical memory is possible.)
> (2) When a huge page would need to be broken up.
>      (I.e. parts of a previously identified huge page gets swapped
>       out, deallocated, shared, memprotected and so on.)
> 
> 
> Both of these detectors will be necessary anyway if automatic huge
> pages gets deployed.  In the meantime, you can check to see
> if such occations arise and for how long the huge pages remain
> viable. I believe the swapper will kill them fast once you have
> any memory pressure, and that fragmentation normally will prevent
> such pages from forming.  But I could be wrong of course.
> 

You're probably right.  On a system with optimum memory, this would be
optimal; aside from that it could get painful.

> For this to be fesible, the reduction in TLB misses would have to
> outweigh the initial cost of copying all that memory into a contigous
> region, the cost of shattering the mapping once the swapper gets
> aggressive, and of course the ongoing cost of identifying mergeable blocks.
> 
> Helge Hafting
> 

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

    Creative brains are a valuable, limited resource. They shouldn't be
    wasted on re-inventing the wheel when there are so many fascinating
    new problems waiting out there.
                                                 -- Eric Steven Raymond
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDzpLjhDd4aOud5P8RAmrAAJ0bTjc/SIrTSUECLIIEG8xCNjVnBACdF/N3
Cl/3+2m2nNm7IsGhJcFsWUs=
=Vrrl
-----END PGP SIGNATURE-----

      reply	other threads:[~2006-01-18 19:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-17 18:52 Huge pages and small pages. . John Richard Moser
2006-01-17 19:06 ` William Lee Irwin III
2006-01-17 19:41   ` John Richard Moser
2006-01-17 19:18 ` linux-os (Dick Johnson)
2006-01-17 19:40   ` John Richard Moser
2006-01-17 23:18     ` Paul Mundt
2006-01-18  5:50       ` Ian Wienand
2006-01-18 10:36     ` Helge Hafting
2006-01-18 19:11       ` John Richard Moser [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43CE92E5.6030206@comcast.net \
    --to=nigelenki@comcast.net \
    --cc=helge.hafting@aitel.hist.no \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-os@analogic.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.