* Re: larger default page sizes...
@ 2008-03-25 23:47 J.C. Pizarro
2008-03-26 15:57 ` H. Peter Anvin
0 siblings, 1 reply; 36+ messages in thread
From: J.C. Pizarro @ 2008-03-25 23:47 UTC (permalink / raw)
To: David Miller, LKML
On Tue, 25 Mar 2008 16:22:44 -0700 (PDT), David Miller wrote:
> > On Mon, 24 Mar 2008, David Miller wrote:
> >
> > > There are ways to get large pages into the process address space for
> > > compute bound tasks, without suffering the well known negative side
> > > effects of using larger pages for everything.
> >
> > These hacks have limitations. F.e. they do not deal with I/O and
> > require application changes.
>
> Transparent automatic hugepages are definitely doable, I don't know
> why you think this requires application changes.
>
> People want these larger pages for HPC apps.
But there is a general problem with larger pages on systems that
don't support them natively (in hardware), depending on how the
kernel's memory manager implements them:

"Doubling the soft page size implies
halving the TLB soft-entries on the old hardware."

"4x soft page size => 1/4 TLB soft-entries, ... and so on."

Assuming one soft double-sized page is backed by 2 real-sized pages,
replacing one soft double-sized page implies replacing the
2 TLB entries containing the 2 real-sized pages.

The TLB is very small: around 24 entries in some processors!

Assuming a soft 64 KiB page built from real 4 KiB pages => 1/16 TLB soft-entries.
If the TLB has 24 entries, then 24/16 = 1.5 soft-entries,
so the TLB effectively holds only 1 soft-entry for soft 64 KiB pages! Weird!

The usual soft sizes for non-native processors are 8 KiB or 16 KiB, not more.
So a TLB of 24 entries of real 4 KiB pages will hold 12 or 6
soft-entries respectively.
^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:47 larger default page sizes J.C. Pizarro
@ 2008-03-26 15:57 ` H. Peter Anvin
  0 siblings, 0 replies; 36+ messages in thread
From: H. Peter Anvin @ 2008-03-26 15:57 UTC (permalink / raw)
To: J.C. Pizarro; +Cc: David Miller, LKML

J.C. Pizarro wrote:
>
> But there is a general problem with larger pages on systems that
> don't support them natively (in hardware), depending on how the
> kernel's memory manager implements them:
>
> "Doubling the soft page size implies
> halving the TLB soft-entries on the old hardware."
>
> "4x soft page size => 1/4 TLB soft-entries, ... and so on."
>
> Assuming one soft double-sized page is backed by 2 real-sized pages,
> replacing one soft double-sized page implies replacing the
> 2 TLB entries containing the 2 real-sized pages.
>
> The TLB is very small: around 24 entries in some processors!
>

That's not a problem, actually, since the TLB entries can get shuffled
like any other (for software TLBs it's a little different, but it can
be dealt with there too.)

The *real* problem is ABI breakage.

	-hpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

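[To spell out the arithmetic in J.C. Pizarro's message above, here is a
minimal C sketch of how many "soft" pages a fixed hardware TLB can cover
when each soft page is backed by several native entries. The 24-entry TLB
and 4 KiB native page size are assumptions carried over from the message,
not data for any particular CPU.]

    /* Soft-page TLB coverage, per the message above (assumed numbers). */
    #include <stdio.h>

    int main(void)
    {
            const int tlb_entries = 24;       /* assumed hardware TLB size */
            const int hw_page = 4 * 1024;     /* assumed native page size  */
            int soft_page;

            for (soft_page = 8 * 1024; soft_page <= 64 * 1024; soft_page *= 2) {
                    int per_soft = soft_page / hw_page;
                    /* each soft page needs one TLB entry per hardware page */
                    printf("soft %2d KiB -> %2d hw pages -> %d soft entries fit\n",
                           soft_page / 1024, per_soft, tlb_entries / per_soft);
            }
            return 0;
    }
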
* Re: [11/14] vcompound: Fallbacks for order 1 stack allocations on IA64 and x86
@ 2008-03-21 17:40 Christoph Lameter
2008-03-21 21:57 ` David Miller
0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2008-03-21 17:40 UTC (permalink / raw)
To: David Miller; +Cc: linux-mm, linux-kernel
On Fri, 21 Mar 2008, David Miller wrote:
> I would be very careful with this especially on IA64.
>
> If the TLB miss or other low-level trap handler depends upon being
> able to dereference thread info, task struct, or kernel stack stuff
> without causing a fault outside of the linear PAGE_OFFSET area, this
> patch will cause problems.
Hmmm. Does not sound good for arches that cannot handle TLB misses in
hardware. I wonder how arch specific this is? Last time around I was told
that some arches already virtually map their stacks.
^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [11/14] vcompound: Fallbacks for order 1 stack allocations on IA64 and x86
  2008-03-21 17:40 [11/14] vcompound: Fallbacks for order 1 stack allocations on IA64 and x86 Christoph Lameter
@ 2008-03-21 21:57 ` David Miller
  2008-03-24 18:27   ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2008-03-21 21:57 UTC (permalink / raw)
To: clameter; +Cc: linux-mm, linux-kernel

From: Christoph Lameter <clameter@sgi.com>
Date: Fri, 21 Mar 2008 10:40:18 -0700 (PDT)

> On Fri, 21 Mar 2008, David Miller wrote:
>
> > I would be very careful with this especially on IA64.
> >
> > If the TLB miss or other low-level trap handler depends upon being
> > able to dereference thread info, task struct, or kernel stack stuff
> > without causing a fault outside of the linear PAGE_OFFSET area, this
> > patch will cause problems.
>
> Hmmm. Does not sound good for arches that cannot handle TLB misses in
> hardware. I wonder how arch specific this is? Last time around I was told
> that some arches already virtually map their stacks.

I'm not saying there is a problem, I'm saying "tread lightly" because
there might be one.

The thing to do is to first validate the way that IA64 handles
recursive TLB misses occurring during an initial TLB miss, and if
there are any limitations therein.

That's the kind of thing I'm talking about.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [11/14] vcompound: Fallbacks for order 1 stack allocations on IA64 and x86
  2008-03-21 21:57 ` David Miller
@ 2008-03-24 18:27 ` Christoph Lameter
  2008-03-24 20:37   ` larger default page sizes David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2008-03-24 18:27 UTC (permalink / raw)
To: David Miller; +Cc: linux-mm, linux-kernel, linux-ia64

On Fri, 21 Mar 2008, David Miller wrote:

> The thing to do is to first validate the way that IA64
> handles recursive TLB misses occurring during an initial
> TLB miss, and if there are any limitations therein.

I am familiar with that area and I am reasonably sure that this is an
issue on IA64 under some conditions (the processor decides to spill
some registers either onto the stack or into the register backing
store during tlb processing). Recursion (in the kernel context) still
expects the stack and register backing store to be available.

ccing linux-ia64 for any thoughts to the contrary.

The move to 64k page size on IA64 is another way that this issue can
be addressed though. So I think it's best to drop the IA64 portion.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* larger default page sizes...
  2008-03-24 18:27 ` Christoph Lameter
@ 2008-03-24 20:37 ` larger default page sizes David Miller
  2008-03-24 21:05   ` Christoph Lameter
                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: David Miller @ 2008-03-24 20:37 UTC (permalink / raw)
To: clameter; +Cc: linux-mm, linux-kernel, linux-ia64, torvalds

From: Christoph Lameter <clameter@sgi.com>
Date: Mon, 24 Mar 2008 11:27:06 -0700 (PDT)

> The move to 64k page size on IA64 is another way that this issue can
> be addressed though.

This is such a huge mistake I wish platforms such as powerpc and IA64
would not make such decisions so lightly.

The memory wastage is just ridiculous.

I already see several distributions moving to 64K pages for powerpc,
so I want to nip this in the bud before this monkey-see-monkey-do
thing gets any more out of hand.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-24 20:37 ` larger default page sizes David Miller
@ 2008-03-24 21:05 ` Christoph Lameter
  2008-03-24 21:43   ` David Miller
  2008-03-24 21:25 ` Luck, Tony
  2008-03-25  3:29 ` Paul Mackerras
  2 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2008-03-24 21:05 UTC (permalink / raw)
To: David Miller; +Cc: linux-mm, linux-kernel, linux-ia64, torvalds

On Mon, 24 Mar 2008, David Miller wrote:

> From: Christoph Lameter <clameter@sgi.com>
> Date: Mon, 24 Mar 2008 11:27:06 -0700 (PDT)
>
> > The move to 64k page size on IA64 is another way that this issue can
> > be addressed though.
>
> This is such a huge mistake I wish platforms such as powerpc and IA64
> would not make such decisions so lightly.

It's certainly not a light decision if your customer tells you that the box
is almost unusable with 16k page size. For our new 2k and 4k processor
systems this seems to be a requirement. Customers start hacking SLES10 to
run with 64k pages....

> The memory wastage is just ridiculous.

Well yes, if you would use such a box for kernel compiles and small files
then it's a bad move. However, if you have to process terabytes of data
then this is significantly reducing the VM and I/O overhead.

> I already see several distributions moving to 64K pages for powerpc,
> so I want to nip this in the bud before this monkey-see-monkey-do
> thing gets any more out of hand.

powerpc also runs HPC codes. They certainly see the same results
that we see.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-24 21:05 ` Christoph Lameter
@ 2008-03-24 21:43 ` David Miller
  2008-03-25 17:48   ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2008-03-24 21:43 UTC (permalink / raw)
To: clameter; +Cc: linux-mm, linux-kernel, linux-ia64, torvalds

From: Christoph Lameter <clameter@sgi.com>
Date: Mon, 24 Mar 2008 14:05:02 -0700 (PDT)

> On Mon, 24 Mar 2008, David Miller wrote:
>
> > From: Christoph Lameter <clameter@sgi.com>
> > Date: Mon, 24 Mar 2008 11:27:06 -0700 (PDT)
> >
> > > The move to 64k page size on IA64 is another way that this issue can
> > > be addressed though.
> >
> > This is such a huge mistake I wish platforms such as powerpc and IA64
> > would not make such decisions so lightly.
>
> It's certainly not a light decision if your customer tells you that the box
> is almost unusable with 16k page size. For our new 2k and 4k processor
> systems this seems to be a requirement. Customers start hacking SLES10 to
> run with 64k pages....

We should fix the underlying problems.

I'm hitting issues on 128 cpu Niagara2 boxes, and it's all fundamental
stuff like contention on the per-zone page allocator locks.

Which is very fixable, without going to larger pages.

> powerpc also runs HPC codes. They certainly see the same results
> that we see.

There are ways to get large pages into the process address space for
compute bound tasks, without suffering the well known negative side
effects of using larger pages for everything.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-24 21:43 ` David Miller
@ 2008-03-25 17:48 ` Christoph Lameter
  2008-03-25 23:22   ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2008-03-25 17:48 UTC (permalink / raw)
To: David Miller; +Cc: linux-mm, linux-kernel, linux-ia64, torvalds

On Mon, 24 Mar 2008, David Miller wrote:

> We should fix the underlying problems.
>
> I'm hitting issues on 128 cpu Niagara2 boxes, and it's all fundamental
> stuff like contention on the per-zone page allocator locks.
>
> Which is very fixable, without going to larger pages.

No, it's not fixable. You are doing linear optimizations to a slowdown
that grows exponentially. Going just one order up for page size reduces
the necessary locks and handling of the kernel by 50%.

> > powerpc also runs HPC codes. They certainly see the same results
> > that we see.
>
> There are ways to get large pages into the process address space for
> compute bound tasks, without suffering the well known negative side
> effects of using larger pages for everything.

These hacks have limitations. F.e. they do not deal with I/O and
require application changes.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 17:48 ` Christoph Lameter
@ 2008-03-25 23:22 ` David Miller
  2008-03-25 23:41   ` Peter Chubb
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2008-03-25 23:22 UTC (permalink / raw)
To: clameter; +Cc: linux-mm, linux-kernel, linux-ia64, torvalds

From: Christoph Lameter <clameter@sgi.com>
Date: Tue, 25 Mar 2008 10:48:19 -0700 (PDT)

> On Mon, 24 Mar 2008, David Miller wrote:
>
> > There are ways to get large pages into the process address space for
> > compute bound tasks, without suffering the well known negative side
> > effects of using larger pages for everything.
>
> These hacks have limitations. F.e. they do not deal with I/O and
> require application changes.

Transparent automatic hugepages are definitely doable, I don't know
why you think this requires application changes.

People want these larger pages for HPC apps.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:22 ` David Miller
@ 2008-03-25 23:41 ` Peter Chubb
  2008-03-25 23:49   ` David Miller
  2008-03-26  0:34   ` David Mosberger-Tang
  0 siblings, 2 replies; 36+ messages in thread
From: Peter Chubb @ 2008-03-25 23:41 UTC (permalink / raw)
To: David Miller; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

>>>>> "David" == David Miller <davem@davemloft.net> writes:

David> From: Christoph Lameter <clameter@sgi.com> Date: Tue, 25 Mar
David> 2008 10:48:19 -0700 (PDT)

>> On Mon, 24 Mar 2008, David Miller wrote:
>>
>> > There are ways to get large pages into the process address space
>> > for compute bound tasks, without suffering the well known
>> > negative side effects of using larger pages for everything.
>>
>> These hacks have limitations. F.e. they do not deal with I/O and
>> require application changes.

David> Transparent automatic hugepages are definitely doable, I don't
David> know why you think this requires application changes.

It's actually harder than it looks.  Ian Wienand just finished his
Master's project in this area, so we have *lots* of data.  The main
issue is that, at least on Itanium, you have to turn off the hardware
page table walker for hugepages if you want to mix superpages and
standard pages in the same region.  (The long format VHPT isn't the
panacea we'd like it to be because the hash function it uses depends
on the page size.)  This means that although you have fewer TLB misses
with larger pages, the cost of those TLB misses is three to four times
higher than with the standard pages.  In addition, setting up a large
page takes more effort... and it turns out there are few applications
where the cost is amortised enough, so on SpecCPU for example, some
tests improved performance slightly, some got slightly worse.

What we saw was essentially that we could almost eliminate DTLB
misses, other than the first, for a huge page.  For most applications,
though, the extra cost of that first miss, plus the cost of setting up
the huge page, was greater than the few hundred DTLB misses we
avoided.

I'm expecting Ian to publish the full results soon.

Other architectures (where the page size isn't tied into the hash
function, so the hardware walker can be used for superpages) will have
different tradeoffs.

--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:41 ` Peter Chubb
@ 2008-03-25 23:49 ` David Miller
  2008-03-26  0:25   ` Peter Chubb
  2008-03-26  0:34 ` David Mosberger-Tang
  1 sibling, 1 reply; 36+ messages in thread
From: David Miller @ 2008-03-25 23:49 UTC (permalink / raw)
To: peterc; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

From: Peter Chubb <peterc@gelato.unsw.edu.au>
Date: Wed, 26 Mar 2008 10:41:32 +1100

> It's actually harder than it looks.  Ian Wienand just finished his
> Master's project in this area, so we have *lots* of data.  The main
> issue is that, at least on Itanium, you have to turn off the hardware
> page table walker for hugepages if you want to mix superpages and
> standard pages in the same region.  (The long format VHPT isn't the
> panacea we'd like it to be because the hash function it uses depends
> on the page size.)  This means that although you have fewer TLB misses
> with larger pages, the cost of those TLB misses is three to four times
> higher than with the standard pages.

If the hugepage is more than 3 to 4 times larger than the base page
size, which it almost certainly is, it's still an enormous win.

> Other architectures (where the page size isn't tied into the hash
> function, so the hardware walker can be used for superpages) will have
> different tradeoffs.

Right, admittedly this is just a (one of many) strange IA64 quirk.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:49 ` David Miller
@ 2008-03-26  0:25 ` Peter Chubb
  2008-03-26  0:31   ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Chubb @ 2008-03-26 0:25 UTC (permalink / raw)
To: David Miller
Cc: peterc, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

>>>>> "David" == David Miller <davem@davemloft.net> writes:

David> From: Peter Chubb <peterc@gelato.unsw.edu.au> Date: Wed, 26 Mar
David> 2008 10:41:32 +1100

>> It's actually harder than it looks.  Ian Wienand just finished his
>> Master's project in this area, so we have *lots* of data.  The main
>> issue is that, at least on Itanium, you have to turn off the
>> hardware page table walker for hugepages if you want to mix
>> superpages and standard pages in the same region.  (The long format
>> VHPT isn't the panacea we'd like it to be because the hash function
>> it uses depends on the page size.)  This means that although you
>> have fewer TLB misses with larger pages, the cost of those TLB
>> misses is three to four times higher than with the standard pages.

David> If the hugepage is more than 3 to 4 times larger than the base
David> page size, which it almost certainly is, it's still an enormous
David> win.

That depends on the access pattern.  We measured a small win for some
workloads, and a small loss for others, using 4k base pages, and
allowing up to 4G superpages (the actual sizes used depended on the
size of the objects being allocated, and the amount of contiguous
memory available).

--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26  0:25 ` Peter Chubb
@ 2008-03-26  0:31 ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2008-03-26 0:31 UTC (permalink / raw)
To: peterc; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

From: Peter Chubb <peterc@gelato.unsw.edu.au>
Date: Wed, 26 Mar 2008 11:25:58 +1100

> That depends on the access pattern.

Absolutely.

FWIW, I bet it helps enormously for gcc which, even for small
compiles, swims around chaotically in an 8MB pool of GC'd memory.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:41 ` Peter Chubb
  2008-03-25 23:49   ` David Miller
@ 2008-03-26  0:34 ` David Mosberger-Tang
  2008-03-26  0:39   ` David Miller
  2008-03-26  0:57   ` Peter Chubb
  1 sibling, 2 replies; 36+ messages in thread
From: David Mosberger-Tang @ 2008-03-26 0:34 UTC (permalink / raw)
To: Peter Chubb
Cc: David Miller, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

On Tue, Mar 25, 2008 at 5:41 PM, Peter Chubb <peterc@gelato.unsw.edu.au> wrote:

> The main issue is that, at least on Itanium, you have to turn off the hardware
> page table walker for hugepages if you want to mix superpages and
> standard pages in the same region.  (The long format VHPT isn't the
> panacea we'd like it to be because the hash function it uses depends
> on the page size).

Why not just repeat the PTEs for super-pages?  That won't work for
huge pages, but for superpages that are a reasonable multiple (e.g.,
16-times) the base-page size, it should work nicely.

  --david
--
Mosberger Consulting LLC, http://www.mosberger-consulting.com/

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26  0:34 ` David Mosberger-Tang
@ 2008-03-26  0:39 ` David Miller
  2008-03-26  0:57 ` Peter Chubb
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2008-03-26 0:39 UTC (permalink / raw)
To: dmosberger
Cc: peterc, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

From: "David Mosberger-Tang" <dmosberger@gmail.com>
Date: Tue, 25 Mar 2008 18:34:13 -0600

> Why not just repeat the PTEs for super-pages?

This is basically how we implement hugepages in the page tables
on sparc64.

^ permalink raw reply	[flat|nested] 36+ messages in thread

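[For readers unfamiliar with the "repeat the PTEs" idea discussed above,
here is a rough, architecture-neutral sketch of it in C. It is
illustrative only: every identifier in it is invented for the example,
and it is not the actual sparc64 or ia64 code.]

    /*
     * Sketch: a superpage spanning 2^order base pages is represented by
     * that many identical leaf entries, differing only in the physical
     * frame, so a walker that only understands the base page size still
     * finds a valid translation for every address inside the superpage.
     */
    #define BASE_PAGE_SHIFT 12	/* assumed 4 KiB base pages */

    static void fill_superpage_ptes(unsigned long *ptes, unsigned long paddr,
                                    unsigned long prot_bits, unsigned int order)
    {
            unsigned long nr = 1UL << order;	/* base pages per superpage */
            unsigned long i;

            for (i = 0; i < nr; i++)
                    ptes[i] = (paddr + (i << BASE_PAGE_SHIFT)) | prot_bits;
    }
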
* Re: larger default page sizes...
  2008-03-26  0:34 ` David Mosberger-Tang
  2008-03-26  0:39   ` David Miller
@ 2008-03-26  0:57 ` Peter Chubb
  2008-03-26  4:16   ` John Marvin
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Chubb @ 2008-03-26 0:57 UTC (permalink / raw)
To: David Mosberger-Tang
Cc: Peter Chubb, David Miller, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, ianw

>>>>> "David" == David Mosberger-Tang <dmosberger@gmail.com> writes:

David> On Tue, Mar 25, 2008 at 5:41 PM, Peter Chubb
David> <peterc@gelato.unsw.edu.au> wrote:

>> The main issue is that, at least on Itanium, you have to turn off
>> the hardware page table walker for hugepages if you want to mix
>> superpages and standard pages in the same region.  (The long format
>> VHPT isn't the panacea we'd like it to be because the hash function
>> it uses depends on the page size).

David> Why not just repeat the PTEs for super-pages?  That won't work
David> for huge pages, but for superpages that are a reasonable
David> multiple (e.g., 16-times) the base-page size, it should work
David> nicely.

You end up having to repeat PTEs to fit into Linux's page table
structure *anyway* (unless we can change Linux's page table).  But
there's no place in the short format hardware-walked page table (that
reuses the leaf entries in Linux's table) for a page size.  And if you
use some of the holes in the format, the hardware walker doesn't
understand it --- so you have to turn off the hardware walker for
*any* regions where there might be a superpage.

If you use the long format VHPT, you have a choice: load the hash
table with just the translation that caused the miss, load all
possible hash entries that could have caused the miss for the page, or
preload the hash table when the page is instantiated, with all
possible entries that could hash to the huge page.  I don't remember
the details, but I seem to remember all these being bad choices for
one reason or other ... Ian, can you elaborate?

--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26  0:57 ` Peter Chubb
@ 2008-03-26  4:16 ` John Marvin
  2008-03-26  4:36   ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: John Marvin @ 2008-03-26 4:16 UTC (permalink / raw)
To: linux-ia64; +Cc: linux-mm, linux-kernel

Peter Chubb wrote:
>
> You end up having to repeat PTEs to fit into Linux's page table
> structure *anyway* (unless we can change Linux's page table).  But
> there's no place in the short format hardware-walked page table (that
> reuses the leaf entries in Linux's table) for a page size.  And if you
> use some of the holes in the format, the hardware walker doesn't
> understand it --- so you have to turn off the hardware walker for
> *any* regions where there might be a superpage.

No, you can set an illegal memory attribute in the pte for any superpage
entry, and leave the hardware walker enabled for the base page size. The
software tlb miss handler can then install the superpage tlb entry. I
posted a working prototype of Shimizu superpages working on ia64 using
short format vhpt's to the linux kernel list a while back.

>
> If you use the long format VHPT, you have a choice: load the hash
> table with just the translation that caused the miss, load all
> possible hash entries that could have caused the miss for the page, or
> preload the hash table when the page is instantiated, with all
> possible entries that could hash to the huge page.  I don't remember
> the details, but I seem to remember all these being bad choices for
> one reason or other ... Ian, can you elaborate?

When I was doing measurements of long format vs. short format, the two
main problems with long format (and why I eventually chose to stick with
short format) were:

1) There was no easy way of determining what size the long format vhpt
cache should be automatically, and changing it dynamically would be too
painful. Different workloads performed better with different size vhpt
caches.

2) Regardless of the size, the vhpt cache is duplicated information.
Using long format vhpt's significantly increased the number of cache
misses for some workloads.

Theoretically there should have been some cases where the long format
solution would have performed better than the short format solution, but
I was never able to create such a case. In many cases the performance of
the long format solution and the short format solution was essentially
the same. In other cases the short format vhpt solution outperformed the
long format solution, and in those cases there was a significant
difference in cache misses that I believe explained the performance
difference.

John

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26  4:16 ` John Marvin
@ 2008-03-26  4:36 ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2008-03-26 4:36 UTC (permalink / raw)
To: jsm; +Cc: linux-ia64, linux-mm, linux-kernel

From: John Marvin <jsm@fc.hp.com>
Date: Tue, 25 Mar 2008 22:16:00 -0600

> 1) There was no easy way of determining what size the long format vhpt cache
> should be automatically, and changing it dynamically would be too painful.
> Different workloads performed better with different size vhpt caches.

This is exactly what sparc64 does btw, dynamic TLB miss hash table
sizing based upon task RSS

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: larger default page sizes...
  2008-03-24 20:37 ` larger default page sizes David Miller
  2008-03-24 21:05   ` Christoph Lameter
@ 2008-03-24 21:25 ` Luck, Tony
  2008-03-24 21:46   ` David Miller
  2008-03-25  3:29 ` Paul Mackerras
  2 siblings, 1 reply; 36+ messages in thread
From: Luck, Tony @ 2008-03-24 21:25 UTC (permalink / raw)
To: David Miller, clameter; +Cc: linux-mm, linux-kernel, linux-ia64, torvalds

> The memory wastage is just ridiculous.

In an ideal world we'd have variable sized pages ... but since
most architectures have no h/w support for these it may be a long
time before that comes to Linux.

In a fixed page size world the right page size to use depends on
the workload and the capacity of the system.  When memory capacity
is measured in hundreds of GB, then a larger page size doesn't
look so ridiculous.

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-24 21:25 ` Luck, Tony
@ 2008-03-24 21:46 ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2008-03-24 21:46 UTC (permalink / raw)
To: tony.luck; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds

From: "Luck, Tony" <tony.luck@intel.com>
Date: Mon, 24 Mar 2008 14:25:11 -0700

> When memory capacity is measured in hundreds of GB, then
> a larger page size doesn't look so ridiculous.

We have hugepages and such for a reason.

And this can be made more dynamic and flexible, as needed.

Increasing the page size is a "stick your head in the sand" type
solution by my book.  Especially when you can make the hugepage
facility stronger and thus get what you want without the memory
wastage side effects.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-24 20:37 ` larger default page sizes David Miller
  2008-03-24 21:05   ` Christoph Lameter
  2008-03-24 21:25   ` Luck, Tony
@ 2008-03-25  3:29 ` Paul Mackerras
  2008-03-25  4:15   ` David Miller
                     ` (2 more replies)
  2 siblings, 3 replies; 36+ messages in thread
From: Paul Mackerras @ 2008-03-25 3:29 UTC (permalink / raw)
To: David Miller; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds

David Miller writes:

> From: Christoph Lameter <clameter@sgi.com>
> Date: Mon, 24 Mar 2008 11:27:06 -0700 (PDT)
>
> > The move to 64k page size on IA64 is another way that this issue can
> > be addressed though.
>
> This is such a huge mistake I wish platforms such as powerpc and IA64
> would not make such decisions so lightly.

The performance advantage of using hardware 64k pages is pretty
compelling, on a wide range of programs, and particularly on HPC apps.

> The memory wastage is just ridiculous.

Depends on the distribution of file sizes you have.

> I already see several distributions moving to 64K pages for powerpc,
> so I want to nip this in the bud before this monkey-see-monkey-do
> thing gets any more out of hand.

I just tried a kernel compile on a 4.2GHz POWER6 partition with 4
threads (2 cores) and 2GB of RAM, with two kernels.  One was
configured with 4kB pages and the other with 64kB pages, but they
were otherwise identically configured.  Here are the times for the
same kernel compile (total time across all threads, for a fairly
full-featured config):

4kB pages:  444.051s user + 34.406s system time
64kB pages: 419.963s user + 16.869s system time

That's nearly 10% faster with 64kB pages -- on a kernel compile.

Yes, the fragmentation in the page cache can be a pain in some
circumstances, but on the whole I think the performance advantage is
worth that pain, particularly for the sort of applications that
people will tend to be running on RHEL on Power boxes.

Regards,
Paul.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25  3:29 ` Paul Mackerras
@ 2008-03-25  4:15 ` David Miller
  2008-03-25 11:50   ` Paul Mackerras
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2008-03-25 4:15 UTC (permalink / raw)
To: paulus; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds

From: Paul Mackerras <paulus@samba.org>
Date: Tue, 25 Mar 2008 14:29:55 +1100

> The performance advantage of using hardware 64k pages is pretty
> compelling, on a wide range of programs, and particularly on HPC apps.

Please read the rest of my responses in this thread, you
can have your HPC cake and eat it too.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25  4:15 ` David Miller
@ 2008-03-25 11:50 ` Paul Mackerras
  2008-03-25 23:32   ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Mackerras @ 2008-03-25 11:50 UTC (permalink / raw)
To: David Miller; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds

David Miller writes:

> From: Paul Mackerras <paulus@samba.org>
> Date: Tue, 25 Mar 2008 14:29:55 +1100
>
> > The performance advantage of using hardware 64k pages is pretty
> > compelling, on a wide range of programs, and particularly on HPC apps.
>
> Please read the rest of my responses in this thread, you
> can have your HPC cake and eat it too.

It's not just HPC, as I pointed out, it's pretty much everything,
including kernel compiles.  And "use hugepages" is a pretty inadequate
answer given the restrictions of hugepages and the difficulty of using
them.  How do I get gcc to use hugepages, for instance?

Using 64k pages gives us a performance boost for almost everything
without the user having to do anything.

If the hugepage stuff was in a state where it enabled large pages to
be used for mapping an existing program, where possible, without any
changes to the executable, then I would agree with you.  But it isn't,
it's a long way from that, and (as I understand it) Linus has in the
past opposed the suggestion that we should move in that direction.

Paul.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 11:50 ` Paul Mackerras
@ 2008-03-25 23:32 ` David Miller
  2008-03-25 23:49   ` Luck, Tony
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2008-03-25 23:32 UTC (permalink / raw)
To: paulus; +Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds

From: Paul Mackerras <paulus@samba.org>
Date: Tue, 25 Mar 2008 22:50:00 +1100

> How do I get gcc to use hugepages, for instance?

Implementing transparent automatic usage of hugepages has been
discussed many times, it's definitely doable and other OSs have
implemented this for years.

This is what I was implying.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: larger default page sizes...
  2008-03-25 23:32 ` David Miller
@ 2008-03-25 23:49 ` Luck, Tony
  2008-03-26  0:16   ` David Miller
  2008-03-26 15:54   ` Nish Aravamudan
  0 siblings, 2 replies; 36+ messages in thread
From: Luck, Tony @ 2008-03-25 23:49 UTC (permalink / raw)
To: David Miller, paulus
Cc: clameter, linux-mm, linux-kernel, linux-ia64, torvalds

> > How do I get gcc to use hugepages, for instance?
>
> Implementing transparent automatic usage of hugepages has been
> discussed many times, it's definitely doable and other OSs have
> implemented this for years.
>
> This is what I was implying.

"large" pages, or "super" pages perhaps ... but Linux "huge" pages
seem pretty hard to adapt for generic use by applications.  They
are generally somewhere between a bit too big (2MB on X86) and
way too big (64MB, 256MB, 1GB or 4GB on ia64) for general use.

Right now they also suffer from making the sysadmin pick at
boot time how much memory to allocate as huge pages (while it
is possible to break huge pages into normal pages, going in
the reverse direction requires a memory defragmenter that
doesn't exist).

Making an application use huge pages as heap may be simple
(just link with a different library to provide a different
version of malloc()) ... code, stack, mmap'd files are all
a lot harder to do transparently.

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:49 ` Luck, Tony
@ 2008-03-26  0:16 ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2008-03-26 0:16 UTC (permalink / raw)
To: tony.luck; +Cc: paulus, clameter, linux-mm, linux-kernel, linux-ia64, torvalds

From: "Luck, Tony" <tony.luck@intel.com>
Date: Tue, 25 Mar 2008 16:49:23 -0700

> Making an application use huge pages as heap may be simple
> (just link with a different library to provide a different
> version of malloc()) ... code, stack, mmap'd files are all
> a lot harder to do transparently.

The kernel should be able to do this transparently, at the very
least for the anonymous page case.

It should also be able to handle just fine chips that provide
multiple page size support, as many do.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 23:49 ` Luck, Tony
  2008-03-26  0:16   ` David Miller
@ 2008-03-26 15:54 ` Nish Aravamudan
  2008-03-26 17:05   ` Luck, Tony
  1 sibling, 1 reply; 36+ messages in thread
From: Nish Aravamudan @ 2008-03-26 15:54 UTC (permalink / raw)
To: Luck, Tony
Cc: David Miller, paulus, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, agl, Mel Gorman

On 3/25/08, Luck, Tony <tony.luck@intel.com> wrote:
> > > How do I get gcc to use hugepages, for instance?
> >
> > Implementing transparent automatic usage of hugepages has been
> > discussed many times, it's definitely doable and other OSs have
> > implemented this for years.
> >
> > This is what I was implying.
>
> "large" pages, or "super" pages perhaps ... but Linux "huge" pages
> seem pretty hard to adapt for generic use by applications.  They
> are generally somewhere between a bit too big (2MB on X86) and
> way too big (64MB, 256MB, 1GB or 4GB on ia64) for general use.
>
> Right now they also suffer from making the sysadmin pick at
> boot time how much memory to allocate as huge pages (while it
> is possible to break huge pages into normal pages, going in
> the reverse direction requires a memory defragmenter that
> doesn't exist).

That's not entirely true. We have a dynamic pool now, thanks to Adam
Litke [added to Cc], which can be treated as a high watermark for the
hugetlb pool (and the static pool value serves as a low watermark).
Unless by hugepages you mean something other than what I think (but
referring to a 2M size on x86 implies you are not). And with the
antifragmentation improvements, hugepage pool changes at run-time are
more likely to succeed [added Mel to Cc].

> Making an application use huge pages as heap may be simple
> (just link with a different library to provide a different
> version of malloc()) ... code, stack, mmap'd files are all
> a lot harder to do transparently.

I feel like I should promote libhugetlbfs here. We're trying to make
things easier for applications to use. You can back the heap by
hugepages via LD_PRELOAD. But even that isn't always simple (what
happens when something is already allocated on the heap?, which we've
seen happen even in our constructor in the library, for instance).
We're working on hugepage stack support. Text/BSS/Data segment
remapping exists now, too, but does require relinking to be more
successful. We have a mode that allows libhugetlbfs to try to fit the
segments into hugepages, or even just those parts that might fit --
but we have limitations on power and IA64, for instance, where
hugepages are restricted in their placement (either depending on the
process' existing mappings or generally).

libhugetlbfs has, at least, been tested a bit on IA64 to validate the
heap backing (IIRC) and the various kernel tests. We also have basic
sparc support -- however, I don't have any boxes handy to test on
(working on getting them added to our testing grid and then will
revisit them), and then one box I used before gave me semi-spurious
soft-lockups (old bug, unclear if it is software or just buggy
hardware).

In any case, my point is people are trying to work on this from
various angles. Both making hugepages more available at run-time (in
a dynamic fashion, based upon need) and making them easier to use for
applications. Is it easy? Not necessarily. Is it guaranteed to work?
I like to think we make a best effort. But as others have pointed
out, it doesn't seem like we're going to get mainline transparent
hugepage support anytime soon.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 36+ messages in thread

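[For context on the explicit route that libhugetlbfs automates, here is a
minimal sketch of an application mapping hugepages by hand through
hugetlbfs, roughly what users had to do themselves at the time of this
thread. The mount point /mnt/huge and the 16 MiB mapping size are
assumptions made for the example, not values taken from the discussion.]

    /* Explicit hugepage use via a hugetlbfs file (illustrative sketch). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LENGTH (16UL * 1024 * 1024)   /* must be a hugepage multiple */

    int main(void)
    {
            int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
            void *p;

            if (fd < 0) {
                    perror("open (is hugetlbfs mounted at /mnt/huge?)");
                    return 1;
            }
            p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror("mmap (are enough hugepages reserved?)");
                    return 1;
            }
            /* Memory at p is now backed by hugepages. */
            munmap(p, LENGTH);
            close(fd);
            unlink("/mnt/huge/example");
            return 0;
    }
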
* RE: larger default page sizes...
  2008-03-26 15:54 ` Nish Aravamudan
@ 2008-03-26 17:05 ` Luck, Tony
  2008-03-26 18:54   ` Mel Gorman
  0 siblings, 1 reply; 36+ messages in thread
From: Luck, Tony @ 2008-03-26 17:05 UTC (permalink / raw)
To: Nish Aravamudan
Cc: David Miller, paulus, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, agl, Mel Gorman

> That's not entirely true. We have a dynamic pool now, thanks to Adam
> Litke [added to Cc], which can be treated as a high watermark for the
> hugetlb pool (and the static pool value serves as a low watermark).
> Unless by hugepages you mean something other than what I think (but
> referring to a 2M size on x86 implies you are not). And with the
> antifragmentation improvements, hugepage pool changes at run-time are
> more likely to succeed [added Mel to Cc].

Things are better than I thought ... though the phrase "more likely
to succeed" doesn't fill me with confidence.  Instead I imagine a
system where an occasional spike in memory load causes some memory
fragmentation that can't be handled, and so from that point many of
the applications that relied on huge pages take a 10% performance
hit.  This results in sysadmins scheduling regular reboots to unjam
things.  [Reminds me of the instructions that came with my first
flatbed scanner that recommended rebooting the system before and
after each use :-( ]

> I feel like I should promote libhugetlbfs here.

This is also better than I thought ... sounds like some really
good things have already happened here.

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26 17:05 ` Luck, Tony
@ 2008-03-26 18:54 ` Mel Gorman
  0 siblings, 0 replies; 36+ messages in thread
From: Mel Gorman @ 2008-03-26 18:54 UTC (permalink / raw)
To: Luck, Tony
Cc: Nish Aravamudan, David Miller, paulus, clameter, linux-mm, linux-kernel, linux-ia64, torvalds, agl

On (26/03/08 10:05), Luck, Tony didst pronounce:
> > That's not entirely true. We have a dynamic pool now, thanks to Adam
> > Litke [added to Cc], which can be treated as a high watermark for the
> > hugetlb pool (and the static pool value serves as a low watermark).
> > Unless by hugepages you mean something other than what I think (but
> > referring to a 2M size on x86 implies you are not). And with the
> > antifragmentation improvements, hugepage pool changes at run-time are
> > more likely to succeed [added Mel to Cc].
>
> Things are better than I thought ... though the phrase "more likely
> to succeed" doesn't fill me with confidence.

It's a lot more likely to succeed since 2.6.24 than it has in the past.
On workloads where it is mainly user data that is occupying memory, the
chances are even better. If min_free_kbytes is
hugepage_size*num_online_nodes(), it becomes harder again to fragment
memory.

> Instead I imagine a
> system where an occasional spike in memory load causes some memory
> fragmentation that can't be handled, and so from that point many of
> the applications that relied on huge pages take a 10% performance
> hit.

If it was found to be a problem and normal anti-frag is not coping for
hugepage pool resizes, then specify
movablecore=MAX_POSSIBLE_POOL_SIZE_YOU_WOULD_NEED on the command-line
and the hugepage pool will be able to expand to that size independent
of workload.  This would avoid the need to schedule regular reboots.

> This results in sysadmins scheduling regular reboots to unjam
> things.  [Reminds me of the instructions that came with my first
> flatbed scanner that recommended rebooting the system before and
> after each use :-( ]
>
> > I feel like I should promote libhugetlbfs here.
>
> This is also better than I thought ... sounds like some really
> good things have already happened here.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25  3:29 ` Paul Mackerras
  2008-03-25  4:15   ` David Miller
@ 2008-03-25 12:05 ` Andi Kleen
  2008-03-25 21:27   ` Paul Mackerras
  2008-03-26  5:24   ` Paul Mackerras
  2 siblings, 2 replies; 36+ messages in thread
From: Andi Kleen @ 2008-03-25 12:05 UTC (permalink / raw)
To: Paul Mackerras
Cc: David Miller, clameter, linux-mm, linux-kernel, linux-ia64, torvalds

Paul Mackerras <paulus@samba.org> writes:
>
> 4kB pages:  444.051s user + 34.406s system time
> 64kB pages: 419.963s user + 16.869s system time
>
> That's nearly 10% faster with 64kB pages -- on a kernel compile.

Do you have some idea where the improvement mainly comes from?
Is it TLB misses or reduced in-kernel overhead? Ok, I assume both
play together, but which part of the equation is more important?

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 12:05 ` Andi Kleen
@ 2008-03-25 21:27 ` Paul Mackerras
  0 siblings, 0 replies; 36+ messages in thread
From: Paul Mackerras @ 2008-03-25 21:27 UTC (permalink / raw)
To: Andi Kleen
Cc: David Miller, clameter, linux-mm, linux-kernel, linux-ia64, torvalds

Andi Kleen writes:

> Paul Mackerras <paulus@samba.org> writes:
> >
> > 4kB pages:  444.051s user + 34.406s system time
> > 64kB pages: 419.963s user + 16.869s system time
> >
> > That's nearly 10% faster with 64kB pages -- on a kernel compile.
>
> Do you have some idea where the improvement mainly comes from?
> Is it TLB misses or reduced in-kernel overhead? Ok, I assume both
> play together, but which part of the equation is more important?

I think that to a first approximation, the improvement in user time
(24 seconds) is due to the increased TLB reach and reduced TLB misses,
and the improvement in system time (18 seconds) is due to the reduced
number of page faults and reductions in other kernel overheads.

As Dave Hansen points out, I can separate the two effects by having
the kernel use 64k pages at the VM level but 4k pages in the hardware
page table, which is easy since we have support for 64k base page
size on machines that don't have hardware 64k page support.  I'll do
that today.

Paul.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25 12:05 ` Andi Kleen
  2008-03-25 21:27   ` Paul Mackerras
@ 2008-03-26  5:24 ` Paul Mackerras
  2008-03-26 15:59   ` Linus Torvalds
  2008-03-26 17:56   ` Christoph Lameter
  1 sibling, 2 replies; 36+ messages in thread
From: Paul Mackerras @ 2008-03-26 5:24 UTC (permalink / raw)
To: Andi Kleen
Cc: David Miller, clameter, linux-mm, linux-kernel, linux-ia64, torvalds

Andi Kleen writes:

> Paul Mackerras <paulus@samba.org> writes:
> >
> > 4kB pages:  444.051s user + 34.406s system time
> > 64kB pages: 419.963s user + 16.869s system time
> >
> > That's nearly 10% faster with 64kB pages -- on a kernel compile.
>
> Do you have some idea where the improvement mainly comes from?
> Is it TLB misses or reduced in-kernel overhead? Ok, I assume both
> play together, but which part of the equation is more important?

With the kernel configured for a 64k page size, but using 4k pages in
the hardware page table, I get:

64k/4k: 441.723s user + 27.258s system time

So the improvement in the user time is almost all due to the reduced
TLB misses (as one would expect).  For the system time, using 64k
pages in the VM reduces it by about 21%, and using 64k hardware pages
reduces it by another 30%.  So the reduction in kernel overhead is
significant but not as large as the impact of reducing TLB misses.

Paul.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26  5:24 ` Paul Mackerras
@ 2008-03-26 15:59 ` Linus Torvalds
  2008-03-27  1:08   ` Paul Mackerras
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-03-26 15:59 UTC (permalink / raw)
To: Paul Mackerras
Cc: Andi Kleen, David Miller, clameter, linux-mm, linux-kernel, linux-ia64

On Wed, 26 Mar 2008, Paul Mackerras wrote:
>
> So the improvement in the user time is almost all due to the reduced
> TLB misses (as one would expect).  For the system time, using 64k
> pages in the VM reduces it by about 21%, and using 64k hardware pages
> reduces it by another 30%.  So the reduction in kernel overhead is
> significant but not as large as the impact of reducing TLB misses.

I realize that getting the POWER people to accept that they have been
total morons when it comes to VM for the last three decades is hard, but
somebody in the POWER hardware design camp should (a) be told and (b) be
really ashamed of themselves.

Is this a POWER6 or what? Because 21% overhead from TLB handling on
something like gcc shows that some piece of hardware is absolute crap.

May I suggest people inside IBM try to fix this some day, and in the
meantime people outside should probably continue to buy Intel/AMD CPU's
until the others can get their act together.

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26 15:59 ` Linus Torvalds
@ 2008-03-27  1:08 ` Paul Mackerras
  0 siblings, 0 replies; 36+ messages in thread
From: Paul Mackerras @ 2008-03-27 1:08 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andi Kleen, David Miller, clameter, linux-mm, linux-kernel, linux-ia64

Linus Torvalds writes:

> On Wed, 26 Mar 2008, Paul Mackerras wrote:
> >
> > So the improvement in the user time is almost all due to the reduced
> > TLB misses (as one would expect).  For the system time, using 64k
> > pages in the VM reduces it by about 21%, and using 64k hardware pages
> > reduces it by another 30%.  So the reduction in kernel overhead is
> > significant but not as large as the impact of reducing TLB misses.
>
> I realize that getting the POWER people to accept that they have been
> total morons when it comes to VM for the last three decades is hard, but
> somebody in the POWER hardware design camp should (a) be told and (b) be
> really ashamed of themselves.
>
> Is this a POWER6 or what? Because 21% overhead from TLB handling on
> something like gcc shows that some piece of hardware is absolute crap.

You have misunderstood the 21% number.  That number has *nothing* to
do with hardware TLB miss handling, and everything to do with how long
the generic Linux virtual memory code spends doing its thing (page
faults, setting up and tearing down Linux page tables, etc.).

It doesn't even have anything to do with the hash table (hardware page
table), because both cases are using 4k hardware pages.  Thus in both
cases the TLB misses and hash-table misses would have been the same.
The *only* difference between the cases is the page size that the
generic Linux virtual memory code is using.  With the 64k page size
our architecture-independent kernel code runs 21% faster.

Thus the 21% is not about the TLB or any hardware thing at all, it's
about the larger per-byte overhead of our kernel code when using the
smaller page size.

The thing you were ranting about -- hardware TLB handling overhead --
comes in at 5%, comparing 4k hardware pages to 64k hardware pages
(444 seconds vs. 420 seconds user time for the kernel compile).

And yes, it's a POWER6.

Paul.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26  5:24 ` Paul Mackerras
  2008-03-26 15:59   ` Linus Torvalds
@ 2008-03-26 17:56 ` Christoph Lameter
  2008-03-26 23:21   ` David Miller
  2008-03-27  3:00   ` Paul Mackerras
  1 sibling, 2 replies; 36+ messages in thread
From: Christoph Lameter @ 2008-03-26 17:56 UTC (permalink / raw)
To: Paul Mackerras
Cc: Andi Kleen, David Miller, linux-mm, linux-kernel, linux-ia64, torvalds

On Wed, 26 Mar 2008, Paul Mackerras wrote:

> So the improvement in the user time is almost all due to the reduced
> TLB misses (as one would expect).  For the system time, using 64k
> pages in the VM reduces it by about 21%, and using 64k hardware pages
> reduces it by another 30%.  So the reduction in kernel overhead is
> significant but not as large as the impact of reducing TLB misses.

One should emphasize that this test was a kernel compile which is not
a load that gains much from larger pages.  4k pages are mostly okay for
loads that use large amounts of small files.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26 17:56 ` Christoph Lameter
@ 2008-03-26 23:21 ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2008-03-26 23:21 UTC (permalink / raw)
To: clameter; +Cc: paulus, andi, linux-mm, linux-kernel, linux-ia64, torvalds

From: Christoph Lameter <clameter@sgi.com>
Date: Wed, 26 Mar 2008 10:56:17 -0700 (PDT)

> One should emphasize that this test was a kernel compile which is not
> a load that gains much from larger pages.

Actually, ever since gcc went to a garbage collecting allocator, I've
found it to be a TLB thrasher.

It will repeatedly randomly walk over a GC pool of at least 8MB in
size, which to fit fully in the TLB with 4K pages requires a TLB with
2048 entries, assuming gcc touches no other data which is of course a
false assumption.

For some compiles this GC pool is more than 100MB in size.

GCC does not fit into any modern TLB using its base page size.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-26 17:56 ` Christoph Lameter
  2008-03-26 23:21   ` David Miller
@ 2008-03-27  3:00 ` Paul Mackerras
  1 sibling, 0 replies; 36+ messages in thread
From: Paul Mackerras @ 2008-03-27 3:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andi Kleen, David Miller, linux-mm, linux-kernel, linux-ia64, torvalds

Christoph Lameter writes:

> One should emphasize that this test was a kernel compile which is not
> a load that gains much from larger pages.  4k pages are mostly okay for
> loads that use large amounts of small files.

It's also worth emphasizing that 1.5% of the total time, or 21% of the
system time, is pure software overhead in the Linux kernel that has
nothing to do with the TLB or with gcc's memory access patterns.
That's the cost of handling memory in small (i.e. 4kB) chunks inside
the generic Linux VM code, rather than bigger chunks.

Paul.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: larger default page sizes...
  2008-03-25  3:29 ` Paul Mackerras
  2008-03-25  4:15   ` David Miller
  2008-03-25 12:05   ` Andi Kleen
@ 2008-03-25 18:27 ` Dave Hansen
  2 siblings, 0 replies; 36+ messages in thread
From: Dave Hansen @ 2008-03-25 18:27 UTC (permalink / raw)
To: Paul Mackerras
Cc: David Miller, clameter, linux-mm, linux-kernel, linux-ia64, torvalds

On Tue, 2008-03-25 at 14:29 +1100, Paul Mackerras wrote:
> 4kB pages:  444.051s user + 34.406s system time
> 64kB pages: 419.963s user + 16.869s system time
>
> That's nearly 10% faster with 64kB pages -- on a kernel compile.

Can you do the same thing with the 4k MMU pages and 64k PAGE_SIZE?
Wouldn't that easily break out whether the advantage is from the TLB
or from less kernel overhead?

-- Dave

^ permalink raw reply	[flat|nested] 36+ messages in thread
