Linux MIPS Architecture development
* TLB dimensioning
@ 2004-09-01 10:07 Emmanuel Michon
  2004-09-01 13:28 ` Dominic Sweetman
  0 siblings, 1 reply; 7+ messages in thread
From: Emmanuel Michon @ 2004-09-01 10:07 UTC (permalink / raw)
  To: linux-mips

Hi,

regarding the hardware implementation of a 4KE (r4k style mmu
if I remember) I'm wondering about the performance difference
when the TLB has 16 pairs of entries (covering 128KBytes of
data) or 32 pairs (covering 256KBytes).

Does anyone have useful advice regarding the `sweet spot'
for TLB size?
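
The coverage figures above follow from each R4k-style TLB entry mapping a pair of adjacent pages. A minimal sketch of the arithmetic, assuming 4 KB pages (the message does not state the page size, and the function name is only illustrative):

```python
def tlb_coverage_bytes(dual_entries, page_size=4096):
    """Bytes mapped by a TLB whose entries each hold an even/odd page pair."""
    pages_per_entry = 2  # R4k-style entries map two adjacent pages
    return dual_entries * pages_per_entry * page_size


for n in (16, 32):
    print(f"{n} dual entries -> {tlb_coverage_bytes(n) // 1024} KB")
```

With larger page sizes the same number of entries covers proportionally more memory, so the figures are a lower bound rather than a fixed property of the TLB.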

Sincerely yours,

E.M.


* Re: TLB dimensioning
  2004-09-01 10:07 TLB dimensioning Emmanuel Michon
@ 2004-09-01 13:28 ` Dominic Sweetman
  2004-09-01 23:35   ` Ralf Baechle
  2004-09-02 10:19   ` Johannes Stezenbach
  0 siblings, 2 replies; 7+ messages in thread
From: Dominic Sweetman @ 2004-09-01 13:28 UTC (permalink / raw)
  To: Emmanuel Michon; +Cc: linux-mips


Emmanuel,

> regarding the hardware implementation of a 4KE (r4k style mmu
> if I remember) I'm wondering about the performance difference
> when the TLB has 16 pairs of entries (covering 128KBytes of
> data) or 32 pairs (covering 256KBytes).
> 
> Does anyone have useful advice regarding the `sweet spot'
> for TLB size?

As you expected, there is no really simple answer.  The TLB is a
relatively large piece of logic, so it often isn't a trivial decision.

Applications - particularly embedded applications, which I suspect is
what you mean - vary a lot in the size of the mapped, user-space
working set.  Some Linux-powered embedded devices do nearly all their
work in the kernel...

However, the measurements we've done at MIPS suggest that for
moderate-size workloads where the user-space programs are working
hard, a 16-entry TLB can thrash quite badly, making a significant dent
in performance.

So the advice I'd give is that if:

1. Your application has a non-trivial user space of any size;

2. The performance of userland code is significant;

then you should pick a 32-entry TLB, until and unless you have
measurements of your own application to show you don't need it.
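
To get a rough feel for the thrashing effect before measuring real hardware, a toy model can help. The sketch below is not the MIPS measurement mentioned above; it assumes a fully associative TLB with random replacement, and the workload (a cyclic sweep over 24 page pairs) and all names are hypothetical choices made only for illustration:

```python
import random


def count_tlb_misses(tlb_entries, working_set, accesses, seed=0):
    """Count misses for a cyclic sweep over `working_set` page pairs.

    Models a fully associative TLB refilled by random replacement.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    slots = []                 # page-pair numbers currently held in the TLB
    present = set()            # same contents, for fast lookup
    misses = 0
    for i in range(accesses):
        pair = i % working_set
        if pair in present:
            continue
        misses += 1
        if len(slots) < tlb_entries:
            slots.append(pair)              # still filling empty slots
        else:
            victim = rng.randrange(tlb_entries)
            present.discard(slots[victim])  # evict a random entry
            slots[victim] = pair
        present.add(pair)
    return misses


for entries in (16, 32):
    misses = count_tlb_misses(entries, working_set=24, accesses=10_000)
    print(f"{entries} entries: {misses} misses in 10000 accesses")
```

When the working set fits, only the initial fills miss; when it does not, at least a third of the accesses in this particular sweep miss. Real workloads have more locality, so this only shows the shape of the problem, not its size.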

-- 
Dominic Sweetman
MIPS Technologies


* Re: TLB dimensioning
  2004-09-01 13:28 ` Dominic Sweetman
@ 2004-09-01 23:35   ` Ralf Baechle
  2004-09-02 12:25     ` Dominic Sweetman
  2004-09-02 10:19   ` Johannes Stezenbach
  1 sibling, 1 reply; 7+ messages in thread
From: Ralf Baechle @ 2004-09-01 23:35 UTC (permalink / raw)
  To: Dominic Sweetman; +Cc: Emmanuel Michon, linux-mips

On Wed, Sep 01, 2004 at 02:28:30PM +0100, Dominic Sweetman wrote:

> 1. Your application has a non-trivial user space of any size;
> 
> 2. The performance of userland code is significant;

The kernel's performance also relies on TLB performance.

The Wired register makes it easy to test the performance of kernel and
application with a reduced-size TLB; maybe I should make that a kernel
feature.
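
A sketch of why that works, assuming R4k-style behaviour in which random-replacement refills only pick entries from Wired up to the last entry, and assuming the wired slots are loaded with mappings that never match. The helper names are hypothetical, not kernel functions:

```python
def wired_for_effective_size(total_entries, effective_entries):
    """Wired value that leaves `effective_entries` slots for refills."""
    if not 1 <= effective_entries <= total_entries:
        raise ValueError("effective_entries must be in 1 .. total_entries")
    return total_entries - effective_entries


def refill_candidates(total_entries, wired):
    """Entry indices a random-replacement refill may overwrite."""
    if not 0 <= wired < total_entries:
        raise ValueError("wired must be in 0 .. total_entries - 1")
    return range(wired, total_entries)


wired = wired_for_effective_size(total_entries=32, effective_entries=16)
slots = refill_candidates(32, wired)
print(f"Wired = {wired}: refills use entries {slots.start}..{slots.stop - 1}")
```

So a 32-entry part can stand in for a 16-entry one by wiring down half of the entries, which allows both configurations to be timed on the same hardware.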

  Ralf


* Re: TLB dimensioning
  2004-09-01 13:28 ` Dominic Sweetman
  2004-09-01 23:35   ` Ralf Baechle
@ 2004-09-02 10:19   ` Johannes Stezenbach
  2004-09-02 10:31     ` Ralf Baechle
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Stezenbach @ 2004-09-02 10:19 UTC (permalink / raw)
  To: Dominic Sweetman; +Cc: Emmanuel Michon, linux-mips

Dominic Sweetman wrote:
> 
> Emmanuel,
> 
> > regarding the hardware implementation of a 4KE (r4k style mmu
> > if I remember) I'm wondering about the performance difference
> > when the TLB has 16 pairs of entries (covering 128KBytes of
> > data) or 32 pairs (covering 256KBytes).
> > 
> > Does anyone have useful advice regarding the `sweet spot'
> > for TLB size?
...
> However, the measurements we've done at MIPS suggest that for
> moderate-size workloads where the user-space programs are working
> hard, a 16-entry TLB can thrash quite badly, making a significant dent
> in performance.
> 
> So the advice I'd give is that if:
> 
> 1. Your application has a non-trivial user space of any size;
> 
> 2. The performance of userland code is significant;
> 
> then you should pick a 32-entry TLB, until and unless you have
> measurements of your own application to show you don't need it.

Hm, the MIPS32 4K Processor Core Family Software User's Manual says:

"...the 4Kc core contains a 3-entry instruction TLB (ITLB), a 3-entry
data TLB (DTLB), and a 16 dual-entry joint TLB (JTLB) with variable page
sizes."

What exactly does that mean, and how does it rate performancewise?
I'm just curious ;-)

Johannes


* Re: TLB dimensioning
  2004-09-02 10:19   ` Johannes Stezenbach
@ 2004-09-02 10:31     ` Ralf Baechle
  2004-09-02 11:53       ` Dominic Sweetman
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf Baechle @ 2004-09-02 10:31 UTC (permalink / raw)
  To: Johannes Stezenbach, Dominic Sweetman, Emmanuel Michon,
	linux-mips

On Thu, Sep 02, 2004 at 12:19:57PM +0200, Johannes Stezenbach wrote:

> Hm, the MIPS32 4K Processor Core Family Software User's Manual says:
> 
> "...the 4Kc core contains a 3-entry instruction TLB (ITLB), a 3-entry
> data TLB (DTLB), and a 16 dual-entry joint TLB (JTLB) with variable page
> sizes."
> 
> What exactly does that mean, and how does it rate performancewise?
> I'm just curious ;-)

The idea behind the ITLB and DTLB is to allow instruction and data
translations to be looked up in parallel without having to make the
JTLB dual-ported, or even more heavily ported.  The ITLB and DTLB are
managed entirely in hardware, so they are not visible [1] to OS software
and are not part of the architecture; only the JTLB is, and it is what is
usually meant when documentation, or we on this list, speak of the TLB.
Probably most MIPS implementations since at least the R4600 had ITLB and
DTLB.

  Ralf

[1] Except possibly during hazards, but you're supposed to avoid them :-)


* Re: TLB dimensioning
  2004-09-02 10:31     ` Ralf Baechle
@ 2004-09-02 11:53       ` Dominic Sweetman
  0 siblings, 0 replies; 7+ messages in thread
From: Dominic Sweetman @ 2004-09-02 11:53 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Johannes Stezenbach, Dominic Sweetman, Emmanuel Michon,
	linux-mips


Johannes asked

> > "...the 4Kc core contains a 3-entry instruction TLB (ITLB), a 3-entry
> > data TLB (DTLB), and a 16 dual-entry joint TLB (JTLB) with variable page
> > sizes."
> > 
> > What exactly does that mean, and how does it rate performancewise?
> > I'm just curious ;-)

I'd like to believe that if the manual mentions the ITLB and DTLB it
also says, somewhere, what they do...

But as Ralf says they're tiny caches of translation entries,
automatically refilled from the main TLB when required.  They work
faster than the main TLB (being smaller) and prevent translations for
loads/stores getting in the way of translations for instruction
fetches.  Usually there's a mysterious 1-clock extra delay when the
translation you need isn't in the ITLB/DTLB, but it's only one clock
and doesn't happen very often, so the performance effect is usually
somewhere between unmeasurable and tiny.
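
The size of that effect can be estimated with a small model. This is only a sketch: the LRU replacement policy and the synthetic access trace are assumptions (the hardware's actual policy is not described here), while the 3-entry size and the one-clock refill penalty are taken from the discussion above:

```python
from collections import OrderedDict


def micro_tlb_extra_clocks(page_trace, entries=3, penalty_clocks=1):
    """Extra clocks spent refilling a micro-TLB from the main TLB.

    LRU replacement is an assumption; real hardware may differ.
    """
    utlb = OrderedDict()
    extra = 0
    for page in page_trace:
        if page in utlb:
            utlb.move_to_end(page)    # mark as most recently used
            continue
        extra += penalty_clocks       # refill from the main TLB
        utlb[page] = True
        if len(utlb) > entries:
            utlb.popitem(last=False)  # evict the least recently used
    return extra


# Synthetic trace: 1000 accesses to each of 8 pages, one page at a time.
trace = [page for page in range(8) for _ in range(1000)]
extra = micro_tlb_extra_clocks(trace)
print(f"{extra} extra clocks over {len(trace)} accesses")
```

With this much locality the penalty is 8 clocks in 8000 accesses. A trace that cycles through more pages than the micro-TLB holds would pay the penalty on every access, but that pattern is unusual in practice.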

> Probably most MIPS implementations since at least the R4600 had ITLB
> and DTLB.

Even the very first MIPS architecture chip (R2000) had an I-side
"uTLB".  It had just one entry, but then instructions tend to be
sequential...
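
A quick back-of-envelope check of why one entry can be enough for straight-line code, assuming 4 KB pages, 4-byte instructions, and a fetch stream that starts on a page boundary and never branches (the function name is illustrative):

```python
def sequential_fetch_utlb_misses(instructions, page_size=4096, insn_size=4):
    """Misses of a one-entry instruction micro-TLB on straight-line code."""
    insns_per_page = page_size // insn_size
    # One miss each time the fetch stream enters a new page (ceiling division).
    return -(-instructions // insns_per_page)


print(sequential_fetch_utlb_misses(1_000_000))  # 977 misses per million fetches
```

That is roughly one miss per 1024 instructions; branches and calls into other pages raise the figure, which is where the later multi-entry ITLBs help.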

--
Dominic Sweetman
MIPS Technologies


* Re: TLB dimensioning
  2004-09-01 23:35   ` Ralf Baechle
@ 2004-09-02 12:25     ` Dominic Sweetman
  0 siblings, 0 replies; 7+ messages in thread
From: Dominic Sweetman @ 2004-09-02 12:25 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Dominic Sweetman, Emmanuel Michon, linux-mips


Ralf Baechle (ralf@linux-mips.org) writes:

> The kernel's performance also relies on TLB performance.
> 
> The Wired register makes it easy to test the performance of kernel and
> application with a reduced-size TLB; maybe I should make that a kernel
> feature.

An excellent suggestion.

