Linux MIPS Architecture development
* User-mode drivers and TLB
@ 2003-09-22 20:19 Finney, Steve
  2003-09-22 20:19 ` Finney, Steve
  2003-09-23 11:35 ` Dominic Sweetman
  0 siblings, 2 replies; 7+ messages in thread
From: Finney, Steve @ 2003-09-22 20:19 UTC (permalink / raw)
  To: linux-mips

I am working on an app where I want to give one or more 
user processes access to a largish range of physical 
address space (specifically, this is a Broadcom 1125 
running a 32 bit kernel, and for now the region is 
accessible via KSEG0/1 (physical address < 512 MB)). 
mmap() on /dev/mem does this just fine, and setting 
(or not setting) O_SYNC on open seems to control caching. 
But I just realized a disadvantage to doing this in user 
space: the user process accesses have to be mapped (since a
user process can't, I believe, use KSEG0 or KSEG1 addresses),
so you have to go through the (64 entry) TLB, and if 
you had significant non-locality of reference, you'd
possibly risk thrashing the TLB (which doesn't happen
in kernel space, since the region can be directly 
accessed). One approach would be to wire a TLB entry 
to handle the large region so you never get a TLB miss, 
but this might not work well for multi-process access,
since (normally) you can't guarantee that the multiple
processes doing mmap's will get the same virtual address.

Is this correct? Is there some other clever approach I
haven't thought of? Should I even be worrying about TLB usage?
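
For illustration, the /dev/mem mapping described above might look
roughly like the sketch below. The helper name map_phys() and its
parameters are hypothetical, and the O_SYNC behaviour is taken from
the observation above rather than from any guarantee; treat it as a
minimal sketch, not a definitive implementation.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes starting at offset `phys` of `path` (normally
 * "/dev/mem"). If `uncached` is nonzero the file is opened with O_SYNC,
 * which is reported above to select uncached access (an observation,
 * not a documented guarantee).
 * Returns the mapped address, or MAP_FAILED on error.
 */
static void *map_phys(const char *path, off_t phys, size_t len, int uncached)
{
    int flags = O_RDWR | (uncached ? O_SYNC : 0);
    int fd = open(path, flags);
    if (fd < 0)
        return MAP_FAILED;

    /* `phys` must be page-aligned, otherwise mmap() fails with EINVAL. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);

    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p;
}
```

A caller would pass "/dev/mem", the physical base of the region and
its length, then access the result through a volatile pointer. The
kernel picks the virtual address, so two processes making the same
call may well see the region at different addresses.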

Thanks,
sf

* Re: User-mode drivers and TLB
  2003-09-22 20:19 User-mode drivers and TLB Finney, Steve
  2003-09-22 20:19 ` Finney, Steve
@ 2003-09-23 11:35 ` Dominic Sweetman
  2003-09-23 11:35   ` Dominic Sweetman
  2003-09-24  6:47   ` Ralf Baechle
  1 sibling, 2 replies; 7+ messages in thread
From: Dominic Sweetman @ 2003-09-23 11:35 UTC (permalink / raw)
  To: Finney, Steve; +Cc: linux-mips


Steve,

> I am working on an app where I want to give one or more user
> processes access to a largish range of physical address space
> (specifically, this is a Broadcom 1125 running a 32 bit kernel, and
> for now the region is accessible via KSEG0/1 (physical address < 512
> MB)). mmap() on /dev/mem does this just fine, and setting (or not
> setting) O_SYNC on open seems to control caching. But I just
> realized a disadvantage to doing this in user space: the user
> process accesses have to be mapped (since a user process can't, I
> believe, use KSEG0 or KSEG1 addresses), so you have to go through
> the (64 entry) TLB, and if you had significant non-locality of
> reference, you'd possibly risk thrashing the TLB (which doesn't
> happen in kernel space, since the region can be directly accessed).

As usual, I guess the first thing is to try doing it the standard way
and then try to measure how much time is being spent in extra TLB misses
generated by your application.  Some MIPS CPUs have "performance
counters" which might be able to count TLB misses, but you'll more
likely have to instrument the TLB miss code.

If it does turn out that TLB replacement is a big drain:

Most MIPS CPU hardware allows you to map large chunks of memory with a
single TLB entry: often up to 16 MB at a time.  But I don't know
how you'd persuade Linux to do that.
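
At the hardware level the page size of an entry is chosen through the
PageMask register. As a sketch, assuming the R4000-style encoding
(4 KB base page, mask field starting at bit 13, sizes growing in
powers of four), the value can be computed as below. The function
name and the max_size parameter are illustrative only, and the
supported maximum depends on the CPU.

```c
#include <stdint.h>

/*
 * PageMask value for `page_size`, assuming the R4000-style encoding:
 * 4 KB base page, mask bits starting at bit 13, and page sizes in the
 * series 4 KB, 16 KB, 64 KB, ... up to `max_size`.
 * Returns 0xffffffff if `page_size` is not in that series.
 */
static uint32_t r4k_pagemask(uint32_t page_size, uint32_t max_size)
{
    /* `s != 0` stops the loop if the shift overflows 32 bits. */
    for (uint32_t s = 4096; s != 0 && s <= max_size; s <<= 2)
        if (s == page_size)
            return (page_size / 4096 - 1) << 13;
    return 0xffffffffu;
}
```

With max_size set to 16 MB this yields 0 for 4 KB pages, 0x00006000
for 16 KB and 0x01ffe000 for 16 MB.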

--
Dominic Sweetman
MIPS Technologies.

* RE: User-mode drivers and TLB
@ 2003-09-23 19:41 Finney, Steve
  2003-09-23 19:41 ` Finney, Steve
  0 siblings, 1 reply; 7+ messages in thread
From: Finney, Steve @ 2003-09-23 19:41 UTC (permalink / raw)
  To: Dominic Sweetman; +Cc: linux-mips

> 
> Most MIPS CPU hardware allows you to map large chunks of memory with a
> single TLB entry: often up to 16 MB at a time.  But I don't know
> how you'd persuade Linux to do that.
> 
> --
> Dominic Sweetman
> MIPS Technologies.

Thanks: for what it's worth, the Broadcom/SiByte apparently allows a
single TLB entry to map 128 MB (the maximum page size is 64 MB, but
each TLB entry maps a pair of pages). And supposedly the MIPS kernel
tree was recently updated with some support for Linux to use wired TLB
entries on the Broadcom, though I haven't tried this.
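
The arithmetic behind those figures, as a small sketch (the function
names are made up for the example, and the 4 KB figure assumes the
kernel's usual base page size):

```c
#include <stdint.h>

/* Bytes covered by one paired TLB entry: an even page plus an odd page. */
static uint64_t entry_span(uint64_t page_size)
{
    return 2 * page_size;
}

/* Total bytes mappable without a miss if every entry uses `page_size`. */
static uint64_t tlb_reach(unsigned entries, uint64_t page_size)
{
    return (uint64_t)entries * entry_span(page_size);
}
```

entry_span(64 MB) gives the 128 MB quoted above, while
tlb_reach(64, 4 KB) is only 512 KB, which is why scattered accesses
over a large region can miss so often.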

sf

* Re: User-mode drivers and TLB
  2003-09-23 11:35 ` Dominic Sweetman
  2003-09-23 11:35   ` Dominic Sweetman
@ 2003-09-24  6:47   ` Ralf Baechle
  1 sibling, 0 replies; 7+ messages in thread
From: Ralf Baechle @ 2003-09-24  6:47 UTC (permalink / raw)
  To: Dominic Sweetman; +Cc: Finney, Steve, linux-mips

On Tue, Sep 23, 2003 at 12:35:44PM +0100, Dominic Sweetman wrote:

> As usual, I guess the first thing is to try doing it the standard way
> and then try to measure how much time is being spent in extra TLB misses
> generated by your application.  Some MIPS CPUs have "performance
> counters" which might be able to count TLB misses, but you'll more
> likely have to instrument the TLB miss code.
> 
> If it does turn out that TLB replacement is a big drain:
>
> Most MIPS CPU hardware allows you to map large chunks of memory with a
> single TLB entry: often up to 16 MB at a time.  But I don't know
> how you'd persuade Linux to do that.

As an indication of how effective large page size support can be for
applications, take a look at the two USENIX 98 papers: "General
Purpose Operating System Support for Multiple Page Sizes" by SGI, about
IRIX, and "Implementation of Multiple Page Size support in HP-UX",
presented at the same conference.  Given that we have what QED once
called the slowest TLB reload handler they've ever seen, the impact
could be even stronger than demonstrated in these two papers.  The
implementation described has been condemned by Linus as stupid and
unacceptable.  I expect a conceptually different optimization on MIPS
late this year.

In any case the papers show how costly TLB exception handlers can be;
that is the reason why I yell at just about everybody who mentions the
phrase "wired tlb entries".

For the time being Linux has large page support for the kernel - read
KSEG0 / KSEGX.  Another optimization is the use of the global bit for
all kernel mappings, and for 2.6, support for hugetlbfs on MIPS should
also be fairly easy.

Btw, again and again the MIPS r4k-style TLBs are a bit of a pain because
each entry maps a pair of pages which share some of their attributes ...
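
To make the pairing concrete: in an r4k-style TLB the virtual address
bit just above the page offset selects the even or the odd page of an
entry, and everything above it identifies the pair. The match fields
(VPN2, ASID, PageMask) and in effect the global bit are shared, while
each half has its own frame number and cache, dirty and valid bits.
A sketch of the address split, with hypothetical names:

```c
#include <stdint.h>

struct tlb_pair_index {
    uint32_t pair;  /* index of the page pair, i.e. what one entry matches */
    unsigned odd;   /* 0: even page (EntryLo0), 1: odd page (EntryLo1) */
};

/* Split a 32-bit virtual address for a given (power-of-two) page size. */
static struct tlb_pair_index split_vaddr(uint32_t vaddr, uint32_t page_size)
{
    struct tlb_pair_index r;

    r.pair = vaddr / (2 * page_size);
    r.odd  = (vaddr / page_size) & 1u;
    return r;
}
```

With 4 KB pages, address 0x1000 falls in pair 0 on the odd page and
0x2000 in pair 1 on the even page, so a single large mapping has to
be aligned to twice the page size to use both halves of an entry.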

  Ralf
