linuxppc-dev.lists.ozlabs.org archive mirror
* kernel mapping
@ 2001-01-15 23:13 Dan Malek
  2001-01-16  3:07 ` Frank Rowand
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Malek @ 2001-01-15 23:13 UTC
  To: linuxppc-dev


How come we don't use iopa() and friends for all kernel mapping
information?  It is only defined for CONFIG_APUS, but is the right
thing to use on 8xx and 4xx, and probably all processors.  The
virt_to_bus/bus_to_virt contain the quickie arithmetic hack with
KERNELBASE, but that isn't the right thing to do for any kmalloc()
or valloc() space or if you don't have BAT mapping.
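
A minimal sketch of the distinction, not the kernel's actual code. The
KERNELBASE value is assumed, and struct mapping, arith_virt_to_phys() and
lookup_virt_to_phys() are hypothetical stand-ins for the arithmetic
shortcut and for an iopa()-style lookup of the real mapping:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNELBASE 0xC0000000u   /* assumed value, for illustration only */
#define PAGE_SIZE  4096u
#define NO_MAPPING UINT32_MAX

/* Hypothetical stand-in for one page-table entry. */
struct mapping {
    uint32_t vpage;  /* page-aligned virtual address */
    uint32_t ppage;  /* page-aligned physical address */
};

/* The arithmetic shortcut: only correct inside the linear kernel mapping. */
static uint32_t arith_virt_to_phys(uint32_t vaddr)
{
    return vaddr - KERNELBASE;
}

/* iopa()-style lookup: consult the actual mapping, whatever it happens to be. */
static uint32_t lookup_virt_to_phys(const struct mapping *tbl, size_t n,
                                    uint32_t vaddr)
{
    uint32_t vpage = vaddr & ~(PAGE_SIZE - 1);

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].vpage == vpage)
            return tbl[i].ppage | (vaddr & (PAGE_SIZE - 1));
    }
    return NO_MAPPING;
}
```

For an address in the linear mapping the two functions agree. For a page
mapped anywhere else, only the lookup returns the physical address that
was actually mapped.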

I am considering making these functions more generic, removing the
#ifdefs, and implementing "simulated" BAT mapping for processors
like the 8xx and 4xx that don't have BATs (not for 2.4, of course :-).

Why shouldn't I do this?

	-- Dan

--

	I like MMUs because I don't have a real life.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: kernel mapping
  2001-01-15 23:13 kernel mapping Dan Malek
@ 2001-01-16  3:07 ` Frank Rowand
  2001-01-16  3:55   ` Dan Malek
  2001-01-16 11:37   ` Ralph Blach
  0 siblings, 2 replies; 14+ messages in thread
From: Frank Rowand @ 2001-01-16  3:07 UTC
  To: Dan Malek; +Cc: linuxppc-dev


Dan Malek wrote:
>
> How come we don't use iopa() and friends for all kernel mapping
> information?  It is only defined for CONFIG_APUS, but is the right
> thing to use on 8xx and 4xx, and probably all processors.  The
> virt_to_bus/bus_to_virt contain the quickie arithmetic hack with
> KERNELBASE, but that isn't the right thing to do for any kmalloc()
> or valloc() space or if you don't have BAT mapping.
>
> I am considering making these functions more generic, removing the
> #ifdefs, and implementing "simulated" BAT mapping for processors
> like the 8xx and 4xx that don't have BATs (not for 2.4, of course :-).
>
> Why shouldn't I do this?
>
>         -- Dan

For the 405 I had to use iopa() for virt_to_bus() because there are
cases where I create a virtual address for IO buffers that is
uncached, and that virtual address is not (physical address + KERNELBASE).
I also have the beginnings of simulated BAT mapping for the 405
(not quite there, but part way).

-Frank
--
Frank Rowand <frank_rowand@mvista.com>
MontaVista Software, Inc


* Re: kernel mapping
  2001-01-16  3:07 ` Frank Rowand
@ 2001-01-16  3:55   ` Dan Malek
  2001-01-16 11:37   ` Ralph Blach
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Malek @ 2001-01-16  3:55 UTC
  To: frowand; +Cc: linuxppc-dev


Frank Rowand wrote:

> I also have the beginnings of simulated BAT mapping for the 405
> (not quite there, but part way).

I know.  I'm currently re-writing it all to be generic.  All of
the 4xx specific pinned entry stuff is gone.  I've been thinking
about this too long for the 8xx and now have a reason to do it.
Some of the logic is in the iopa and mm_ptov functions now, which
will also just work fine on the 6xx/7xx/7xxx.


	-- Dan


* Re: kernel mapping
  2001-01-16  3:07 ` Frank Rowand
  2001-01-16  3:55   ` Dan Malek
@ 2001-01-16 11:37   ` Ralph Blach
  2001-01-16 16:50     ` Dan Malek
  1 sibling, 1 reply; 14+ messages in thread
From: Ralph Blach @ 2001-01-16 11:37 UTC
  To: frowand; +Cc: linuxppc-dev


Why do we need simulated BAT registers?

Chip

Frank Rowand wrote:
>
> Dan Malek wrote:
> >
> > How come we don't use iopa() and friends for all kernel mapping
> > information?  It is only defined for CONFIG_APUS, but is the right
> > thing to use on 8xx and 4xx, and probably all processors.  The
> > virt_to_bus/bus_to_virt contain the quickie arithmetic hack with
> > KERNELBASE, but that isn't the right thing to do for any kmalloc()
> > or valloc() space or if you don't have BAT mapping.
> >
> > I am considering making these functions more generic, removing the
> > #ifdefs, and implementing "simulated" BAT mapping for processors
> > like the 8xx and 4xx that don't have BATs (not for 2.4, of course :-).
> >
> > Why shouldn't I do this?
> >
> >         -- Dan
>
> For the 405 I had to use iopa() for virt_to_bus() because there are
> cases where I create a virtual address for IO buffers that is
> uncached, and that virtual address is not (physical address + KERNELBASE).
> I also have the beginnings of simulated BAT mapping for the 405
> (not quite there, but part way).
>
> -Frank
> --
> Frank Rowand <frank_rowand@mvista.com>
> MontaVista Software, Inc
>


* Re: kernel mapping
  2001-01-16 11:37   ` Ralph Blach
@ 2001-01-16 16:50     ` Dan Malek
  2001-01-16 17:10       ` Ralph Blach
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dan Malek @ 2001-01-16 16:50 UTC
  To: Ralph Blach; +Cc: frowand, linuxppc-dev


Ralph Blach wrote:
>
> Why do we need simulated BAT registers?

To improve performance.  Right now, on the 4xx there is the
concept of "pinned" TLB entries to reduce/eliminate TLB misses
on large mapped areas (like kernel text/data or I/O).  The 8xx
does this in some custom applications as well.  These are just
hacks that are headed down a disastrous maintenance path that
need to be stopped now for a more generic solution.

I have been experimenting with many different methods of using
the "large" page table sizes through the generic memory management
methods that already exist in the kernel.  I believe I can wrap
the concept of the pinned TLB entries into the same logic as BAT
register management on the bigger processors.  Hence, I call them
simulated BAT registers....the semantics aren't quite the same.
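
One way to picture the idea is sketched below. This is an illustration
under stated assumptions, not the code being written for the merge:
struct sim_bat, struct tlb_entry, tlb_reload() and translate() are all
hypothetical names, and the fallback path identity-maps a small page
where a real handler would walk the page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE 4096u

/* Hypothetical "simulated BAT": one naturally aligned block mapping. */
struct sim_bat {
    uint32_t vbase;  /* virtual base, aligned to size */
    uint32_t pbase;  /* physical base, aligned to size */
    uint32_t size;   /* block size in bytes, a power of two */
};

/* What the miss handler would load into the TLB. */
struct tlb_entry {
    uint32_t vbase;
    uint32_t pbase;
    uint32_t size;
};

/*
 * Build the TLB entry for a faulting address.  A hit in the block table
 * yields one large entry covering the whole block; otherwise fall back
 * to a single small page (identity-mapped here, standing in for a real
 * page-table walk).
 */
static struct tlb_entry tlb_reload(const struct sim_bat *bats, size_t n,
                                   uint32_t vaddr)
{
    struct tlb_entry e;

    for (size_t i = 0; i < n; i++) {
        uint32_t size = bats[i].size;

        assert(size != 0 && (size & (size - 1)) == 0);
        if ((vaddr & ~(size - 1)) == bats[i].vbase) {
            e.vbase = bats[i].vbase;
            e.pbase = bats[i].pbase;
            e.size = size;
            return e;
        }
    }
    e.vbase = vaddr & ~(SMALL_PAGE - 1);
    e.pbase = e.vbase;
    e.size = SMALL_PAGE;
    return e;
}

/* Apply a loaded entry to an address it covers. */
static uint32_t translate(struct tlb_entry e, uint32_t vaddr)
{
    return e.pbase + (vaddr - e.vbase);
}
```

Nothing is pinned in this model: the large entry is simply what the
reload produces whenever the faulting address falls inside a block, and
it can be evicted like any other entry.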

The BAT registers are a really good thing, and although the large
page size TLB entries are more flexible, they require more software
overhead.  I would like to make some generic Linux MM modifications
to help us support variable page sizes, but I suspect that will
never happen.


	-- Dan


* Re: kernel mapping
  2001-01-16 16:50     ` Dan Malek
@ 2001-01-16 17:10       ` Ralph Blach
  2001-01-16 17:47       ` David Edelsohn
  2001-01-16 19:56       ` Frank Rowand
  2 siblings, 0 replies; 14+ messages in thread
From: Ralph Blach @ 2001-01-16 17:10 UTC
  To: Dan Malek; +Cc: frowand, linuxppc-dev


Dan,

Thanks for the info.  I agree that pinned TLBs could be a maintenance
headache, with each 4xx/8xx chip requiring a different set of pinned TLBs.

Chip

Dan Malek wrote:
>
> Ralph Blach wrote:
> >
> > Why do we need simulated BAT registers?
>
> To improve performance.  Right now, on the 4xx there is the
> concept of "pinned" TLB entries to reduce/eliminate TLB misses
> on large mapped areas (like kernel text/data or I/O).  The 8xx
> does this in some custom applications as well.  These are just
> hacks that are headed down a disastrous maintenance path that
> need to be stopped now for a more generic solution.
>
> I have been experimenting with many different methods of using
> the "large" page table sizes through the generic memory management
> methods that already exist in the kernel.  I believe I can wrap
> the concept of the pinned TLB entries into the same logic as BAT
> register management on the bigger processors.  Hence, I call them
> simulated BAT registers....the semantics aren't quite the same.
>
> The BAT registers are a really good thing, and although the large
> page size TLB entries are more flexible, they require more software
> overhead.  I would like to make some generic Linux MM modifications
> to help us support variable page sizes, but I suspect that will
> never happen.
>
>         -- Dan
>


* Re: kernel mapping
  2001-01-16 16:50     ` Dan Malek
  2001-01-16 17:10       ` Ralph Blach
@ 2001-01-16 17:47       ` David Edelsohn
  2001-01-16 21:57         ` Dan Malek
  2001-01-17 10:51         ` Gabriel Paubert
  2001-01-16 19:56       ` Frank Rowand
  2 siblings, 2 replies; 14+ messages in thread
From: David Edelsohn @ 2001-01-16 17:47 UTC
  To: Dan Malek; +Cc: Ralph Blach, frowand, linuxppc-dev


>>>>> Dan Malek writes:

Dan> I have been experimenting with many different methods of using
Dan> the "large" page table sizes through the generic memory management
Dan> methods that already exist in the kernel.  I believe I can wrap
Dan> the concept of the pinned TLB entries into the same logic as BAT
Dan> register management on the bigger processors.  Hence, I call them
Dan> simulated BAT registers....the semantics aren't quite the same.

	Note that forthcoming 64-bit PowerPC chips from IBM utilize
multiple page sizes and no longer provide BAT registers.  "BAT register
management on the bigger processors" is a misnomer.

David


* Re: kernel mapping
  2001-01-16 16:50     ` Dan Malek
  2001-01-16 17:10       ` Ralph Blach
  2001-01-16 17:47       ` David Edelsohn
@ 2001-01-16 19:56       ` Frank Rowand
  2001-01-16 22:13         ` Dan Malek
  2 siblings, 1 reply; 14+ messages in thread
From: Frank Rowand @ 2001-01-16 19:56 UTC
  To: Dan Malek; +Cc: Ralph Blach, frowand, linuxppc-dev


Dan Malek wrote:
>
> Ralph Blach wrote:
> >
> > Why do we need simulated BAT registers?
>
> To improve performance.  Right now, on the 4xx there is the
> concept of "pinned" TLB entries to reduce/eliminate TLB misses
> on large mapped areas (like kernel text/data or I/O).  The 8xx
> does this in some custom applications as well.  These are just
> hacks that are headed down a disastrous maintenance path that
> need to be stopped now for a more generic solution.

At the moment, the 405 processors _require_ kernel memory to be
pinned because the tlb miss handlers use virtual addresses.  When
I started the 405 port I planned to move the TLB handlers into
assembly running in real mode.  Then when I started seeing info
about the 440 I backed away from that plan because the 440 always
runs with the MMU enabled.  I'm still thinking about the 440...

I pinned some IO ranges as a convenience when I was first porting
to the 405gp but plan to remove those pins.  Though I'm somewhat
tempted to leave a pin in place for the on-chip ethernet device
if performance measurements show a significant gain.


> I have been experimenting with many different methods of using
> the "large" page table sizes through the generic memory management
> methods that already exist in the kernel.  I believe I can wrap
> the concept of the pinned TLB entries into the same logic as BAT
> register management on the bigger processors.  Hence, I call them
> simulated BAT registers....the semantics aren't quite the same.

I think that's a good idea.  If you do so, please provide a way to
force an entry to be locked in the tlb.


> The BAT registers are a really good thing, and although the large
> page size TLB entries are more flexible, they require more software
> overhead.  I would like to make some generic Linux MM modifications
> to help us support variable page sizes, but I suspect that will
> never happen.
>
>         -- Dan

I've toyed with the variable page sizes idea too, and it just hasn't
moved up high enough on my priority list.  I'm not sure I'm quite as
pessimistic as you about whether it will ever happen because several
other architectures support variable page sizes (including PA-RISC
and (I think) IA-64).

-Frank
--
Frank Rowand <frank_rowand@mvista.com>
MontaVista Software, Inc


* Re: kernel mapping
  2001-01-16 17:47       ` David Edelsohn
@ 2001-01-16 21:57         ` Dan Malek
  2001-01-17 10:51         ` Gabriel Paubert
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Malek @ 2001-01-16 21:57 UTC
  To: David Edelsohn; +Cc: Ralph Blach, frowand, linuxppc-dev


David Edelsohn wrote:

>         Note that forthcoming 64-bit PowerPC chips from IBM utilize
> multiple page sizes and no longer provide BAT registers.  "BAT register
> management on the bigger processors" is a misnomer.

No problem...I'm preparing.......Thanks for the info.


	-- Dan

--

	I like MMUs because I don't have a real life.


* Re: kernel mapping
  2001-01-16 19:56       ` Frank Rowand
@ 2001-01-16 22:13         ` Dan Malek
  2001-01-17  0:04           ` Frank Rowand
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Malek @ 2001-01-16 22:13 UTC
  To: frowand; +Cc: Ralph Blach, linuxppc-dev


Frank Rowand wrote:

> At the moment, the 405 processors _require_ kernel memory to be
> pinned because the tlb miss handlers use virtual addresses.

I changed that, too.  It works like the other processors, in
particular, the 8xx.

> .........  Then when I started seeing info
> about the 440 I backed away from that plan because the 440 always
> runs with the MMU enabled.  I'm still thinking about the 440...


No way....Dammit can't you IBM guys follow your own rules :-).

> I pinned some IO ranges as a convenience when I was first porting
> to the 405gp but plan to remove those pins.

Those are actually performance advantages, and I am doing that
on some 8xx applications.  The difference now is we don't have
to actually allocate specific "pinned" entries, the large mapping
will just happen as part of the TLB reload.

> I think that's a good idea.  If you do so, please provide a way to
> force an entry to be locked in the tlb.

Nope.  I don't want to do that.  Then you have to make processor
specific trade offs, or incur high management overhead like the
405 does now.  For example, some of the processors allow a fixed
number of locked entries, but you have to trade off what you will
put there against losing TLB entries.  Or, you do like the 405
does and create a "software" locking, losing the use of some
very functional TLB management instructions.

By not locking entries and using large page table entries you don't
need to have processor unique configurations that are cumbersome
or unworkable on lesser featured processors.  You also let the
system operation find the best distribution of TLB entries.  Yes,
there is a clearly visible latency concern with loading TLBs, but
considering the amount of context we are switching these days a
single large page TLB miss is insignificant.

> I've toyed with the variable page sizes idea too, and it just hasn't
> moved up high enough on my priority list.  I'm not sure I'm quite as
> pessimistic as you about whether it will ever happen


It's going to happen with the 405 merge.  It has to because I have
already screwed up and coded myself into a corner, and I want the
same features on the 8xx already as well.


	-- Dan

--

	I like MMUs because I don't have a real life.


* Re: kernel mapping
  2001-01-16 22:13         ` Dan Malek
@ 2001-01-17  0:04           ` Frank Rowand
  2001-01-17  7:02             ` Dan Malek
  0 siblings, 1 reply; 14+ messages in thread
From: Frank Rowand @ 2001-01-17  0:04 UTC
  To: Dan Malek; +Cc: frowand, linuxppc-dev


Dan Malek wrote:
>
> Frank Rowand wrote:
>

I have only a small amount of performance instrumentation and measurements.
Some of what I have to say is based on observation, inference, and
conjecture...


> > I pinned some IO ranges as a convenience when I was first porting
> > to the 405gp but plan to remove those pins.
>
> Those are actually performance advantages, and I am doing that
> on some 8xx applications.  The difference now is we don't have
> to actually allocate specific "pinned" entries, the large mapping
> will just happen as part of the TLB reload.

The IO ranges that I pinned were all just a 4k page (except the 64k
"page" for PCI IO space, which shouldn't be accessed much except for
PCI device initialization).  So the only performance advantage I
gained was avoiding TLB misses, not from large pages.


> > I think that's a good idea.  If you do so, please provide a way to
> > force an entry to be locked in the tlb.
>
> Nope.  I don't want to do that.  Then you have to make processor
> specific trade offs, or incur high management overhead like the
> 405 does now.  For example, some of the processors allow a fixed
> number of locked entries, but you have to trade off what you will
> put there against losing TLB entries.  Or, you do like the 405
> does and create a "software" locking, losing the use of some
> very functional TLB management instructions.


The 405 core (and thus the many processors based on it) has a 64 entry
tlb.  While debugging via a JTAG debugger I have observed that the
tlb very quickly gets filled with entries for the current context.  It
is extremely rare to see entries for a different context left over.
From this, I infer that the tlb is not large enough to hold a working
set.  (If I was still working as a performance geek, I would find this
an interesting area to instrument.)  Locking a few kernel entries in
the tlb means that the majority of the kernel's working set _is_ in
the tlb at all times.  Here is a simple measurement of tlb misses
(running a simple load of copying nfs mounted files around, etc):

  dtlb  misses:   34679326        <--- data tlb
  itlb  misses:   33075725        <--- instruction tlb
  d + i misses:   67755051
  ktlb  misses:     233683        <--- kernel addresses
  utlb  misses:   67521368        <--- user space addresses
  k + u misses:   67755051


If you want to repeat the measurement with other workloads, just
cat /proc/ppc_htab in my kernel to get the above data.
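
As a cross-check on the figures above, a small sketch that verifies the
two totals agree and works out the kernel share of all misses. Only the
four counter values come from the table; the struct and function names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Counters copied from the measurement above. */
struct tlb_miss_counts {
    uint64_t dtlb;  /* data tlb misses */
    uint64_t itlb;  /* instruction tlb misses */
    uint64_t ktlb;  /* misses on kernel addresses */
    uint64_t utlb;  /* misses on user space addresses */
};

static const struct tlb_miss_counts sample = {
    34679326u, 33075725u, 233683u, 67521368u
};

/* Total misses; the data/instruction and kernel/user splits must agree. */
static uint64_t total_misses(const struct tlb_miss_counts *c)
{
    assert(c->dtlb + c->itlb == c->ktlb + c->utlb);
    return c->dtlb + c->itlb;
}

/* Kernel misses per 10,000 total misses, rounded down. */
static uint64_t kernel_share_per_10000(const struct tlb_miss_counts *c)
{
    return c->ktlb * 10000u / total_misses(c);
}
```

With these numbers both splits sum to 67755051, and kernel addresses
account for 34 of every 10,000 misses, or roughly 0.34%.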

For the 405, the only tlb management instruction I sacrificed was
the tlbia (invalidate the entire tlb) that I would have used for
PPC4xx_tlb_flush_all(), which is used by flush_tlb_all(), which
is only called from:

  ppc_htab_write()
  mmu_context_overflow()
  vmfree_area_pages()
  vmalloc_area_pages()
  flush_all_zero_pkmaps()

Which doesn't seem to be much of a sacrifice for a large gain.
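
The cost of giving up tlbia can be sketched as follows. This is a
software model only, not PPC4xx_tlb_flush_all() itself: struct tlb_slot
and flush_all_except_pinned() are hypothetical, and a real
implementation would invalidate hardware entries rather than clear
flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_ENTRIES 64   /* the 405 tlb size quoted above */

/* Hypothetical software model of one tlb slot. */
struct tlb_slot {
    bool valid;
    bool pinned;   /* "software" lock: a flush must leave it alone */
};

/*
 * Flush everything except pinned slots.  A single invalidate-all
 * operation cannot express "all but these", so the flush walks the
 * slots one by one.  Returns the number of slots invalidated.
 */
static size_t flush_all_except_pinned(struct tlb_slot *tlb, size_t n)
{
    size_t flushed = 0;

    assert(n <= TLB_ENTRIES);
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && !tlb[i].pinned) {
            tlb[i].valid = false;
            flushed++;
        }
    }
    return flushed;
}
```

The loop is bounded by the tlb size, so the extra work is paid only on
the flush-all paths listed above.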


> By not locking entries and using large page table entries you don't
> need to have processor unique configurations that are cumbersome
> or unworkable on lesser featured processors.  You also let the
> system operation find the best distribution of TLB entries.  Yes,

The tlb is not large enough to accumulate a working set in the tlb,
so system operation never finds the best distribution of tlb entries.
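
To put rough numbers on the capacity argument, a small sketch of how
much address space a tlb can map at once. It assumes every entry maps
the same page size; tlb_reach_bytes() is a hypothetical helper, and the
16 MB figure in the note below is only an illustrative large-page size.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a tlb can map at once if every entry uses the same page size. */
static uint64_t tlb_reach_bytes(uint32_t entries, uint64_t page_size)
{
    /* Page sizes are powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (uint64_t)entries * page_size;
}
```

Sixty-four entries of 4k pages reach 262144 bytes (256 KB) in total. A
single entry mapping an illustrative 16 MB page would cover 64 times
that on its own.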


> there is a clearly visible latency concern with loading TLBs, but
> considering the amount of context we are switching these days a
> single large page TLB miss is insignificant.

It will be nice to have large page TLB implemented.

-Frank
--
Frank Rowand <frank_rowand@mvista.com>
MontaVista Software, Inc


* Re: kernel mapping
  2001-01-17  0:04           ` Frank Rowand
@ 2001-01-17  7:02             ` Dan Malek
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Malek @ 2001-01-17  7:02 UTC
  To: frowand; +Cc: linuxppc-dev


Frank Rowand wrote:

> The 405 core (and thus the many processors based on it) has a 64 entry
> tlb.  While debugging via a JTAG debugger I have observed that the
> tlb very quickly gets filled with entries for the current context.

Well, that's because we force it to do that.  Programs that have
a huge working set and run for an extended period will fill the
TLB.  Programs with small working sets will not if we actually
use the contexts properly, but we don't.  The way contexts are
used today, it is effectively flushing the TLB on every switch.
We do the same thing with VSIDs on the "bigger" processors, and
this isn't right either.  The Linux VM properly manages memory
contexts, and we should extend this into the PowerPC specific
software.

I have an LRU context algorithm for the 8xx, and I want to extend
this into the 4xx and even the other processors.  The idea is right,
but I need something that will scale beyond the small 16 contexts
of the 8xx.  I just don't have that yet.
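
The general shape of LRU context reuse can be sketched as below. This
is not the 8xx algorithm referred to above, only an illustration of the
idea with hypothetical names (struct ctx_table, ctx_init(),
ctx_switch_to()) and a linear scan that would not scale much past 16
contexts.

```c
#include <assert.h>
#include <stdint.h>

#define NR_HW_CONTEXTS 16   /* the 8xx limit mentioned above */
#define NO_OWNER       (-1)

/* Hypothetical bookkeeping: which address space owns each hardware context. */
struct ctx_table {
    int      owner[NR_HW_CONTEXTS];      /* address-space id, or NO_OWNER */
    uint64_t last_used[NR_HW_CONTEXTS];  /* logical clock at last switch-in */
    uint64_t clock;
    unsigned steals;                     /* times a live context was recycled */
};

static void ctx_init(struct ctx_table *t)
{
    for (int i = 0; i < NR_HW_CONTEXTS; i++) {
        t->owner[i] = NO_OWNER;
        t->last_used[i] = 0;
    }
    t->clock = 0;
    t->steals = 0;
}

/*
 * Return the hardware context for address space mm.  If mm already
 * holds one, reuse it so its TLB entries stay valid.  Otherwise take a
 * never-used context, or steal the least recently used one; only a
 * steal would require flushing that context's TLB entries.
 */
static int ctx_switch_to(struct ctx_table *t, int mm)
{
    int victim = 0;

    assert(mm >= 0);
    t->clock++;
    for (int i = 0; i < NR_HW_CONTEXTS; i++) {
        if (t->owner[i] == mm) {
            t->last_used[i] = t->clock;
            return i;
        }
        if (t->last_used[i] < t->last_used[victim])
            victim = i;
    }
    if (t->owner[victim] != NO_OWNER)
        t->steals++;
    t->owner[victim] = mm;
    t->last_used[victim] = t->clock;
    return victim;
}
```

In this model a switch between address spaces that both hold a context
flushes nothing, which is the behaviour the current scheme gives up by
effectively flushing on every switch.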


	-- Dan

--

	I like MMUs because I don't have a real life.


* Re: kernel mapping
  2001-01-16 17:47       ` David Edelsohn
  2001-01-16 21:57         ` Dan Malek
@ 2001-01-17 10:51         ` Gabriel Paubert
  2001-01-17 17:45           ` David Edelsohn
  1 sibling, 1 reply; 14+ messages in thread
From: Gabriel Paubert @ 2001-01-17 10:51 UTC
  To: David Edelsohn; +Cc: linuxppc-dev


On Tue, 16 Jan 2001, David Edelsohn wrote:

> 	Note that forthcoming 64-bit PowerPC chips from IBM utilize
> multiple page sizes and no longer provide BAT registers.  "BAT register
> management on the bigger processors" is a misnomer.

How is it implemented?  Is there any documentation available on the web?

	Regards,
	Gabriel.



* Re: kernel mapping
  2001-01-17 10:51         ` Gabriel Paubert
@ 2001-01-17 17:45           ` David Edelsohn
  0 siblings, 0 replies; 14+ messages in thread
From: David Edelsohn @ 2001-01-17 17:45 UTC
  To: Gabriel Paubert; +Cc: linuxppc-dev


>>>>> Gabriel Paubert writes:

Gabriel> How is it implemented?  Is there any documentation available on the web?

	See Paul DeMone's write-up on the Real World Technologies website,
the papers from Microprocessor Forum 1999 and 2000, Microprocessor Report
1999, and discussions in comp.arch on USENET.

David


end of thread

Thread overview: 14+ messages
2001-01-15 23:13 kernel mapping Dan Malek
2001-01-16  3:07 ` Frank Rowand
2001-01-16  3:55   ` Dan Malek
2001-01-16 11:37   ` Ralph Blach
2001-01-16 16:50     ` Dan Malek
2001-01-16 17:10       ` Ralph Blach
2001-01-16 17:47       ` David Edelsohn
2001-01-16 21:57         ` Dan Malek
2001-01-17 10:51         ` Gabriel Paubert
2001-01-17 17:45           ` David Edelsohn
2001-01-16 19:56       ` Frank Rowand
2001-01-16 22:13         ` Dan Malek
2001-01-17  0:04           ` Frank Rowand
2001-01-17  7:02             ` Dan Malek
