* about kmap_high function
@ 2001-06-29 7:06 michaelc
2001-07-03 9:38 ` Stephen C. Tweedie
0 siblings, 1 reply; 7+ messages in thread
From: michaelc @ 2001-06-29 7:06 UTC (permalink / raw)
To: linux-kernel
I found that the kmap_high function doesn't call __flush_tlb_one()
when it maps a highmem page successfully, and I think this may cause
a problem: the TLB may keep obsolete page table entries. But the
kmap_atomic() function does call __flush_tlb_one(). Can someone tell
me the difference between kmap_atomic and kmap_high, other than that
kmap_atomic can be used in IRQ contexts? Thanks in advance.
--
Best regards,
michaelc mailto:michaelc@turbolinux.com.cn
* Re: about kmap_high function
  2001-06-29  7:06 about kmap_high function michaelc
@ 2001-07-03  9:38 ` Stephen C. Tweedie
  2001-07-03 12:47   ` Paul Mackerras
  2001-07-05  2:28   ` Re[2]: " michaelc
  0 siblings, 2 replies; 7+ messages in thread

From: Stephen C. Tweedie @ 2001-07-03 9:38 UTC (permalink / raw)
To: michaelc; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Fri, Jun 29, 2001 at 03:06:01PM +0800, michaelc wrote:
> I found that the kmap_high function doesn't call __flush_tlb_one()
> when it maps a highmem page successfully, and I think this may cause
> a problem: the TLB may keep obsolete page table entries. But the
> kmap_atomic() function does call __flush_tlb_one(). Can someone tell
> me the difference between kmap_atomic and kmap_high, other than that
> kmap_atomic can be used in IRQ contexts? Thanks in advance.

kmap_high is intended to be called routinely for access to highmem
pages.  It is coded to be as fast as possible as a result.  TLB
flushes are expensive, especially on SMP, so kmap_high tries hard to
avoid unnecessary flushes.

The way it does it is to do only a single, complete TLB flush of the
whole kmap VA range once every time the kmap address ring cycles.
That's what flush_all_zero_pkmaps() does --- it evicts old, unused
kmap mappings and flushes the whole TLB range, so that we are
guaranteed that there is a TLB flush between any two different uses of
any given kmap virtual address.

That way, we can avoid the cost of having to flush the TLB for every
single kmap mapping we create.

Cheers,
 Stephen
* Re: about kmap_high function
  2001-07-03  9:38 ` Stephen C. Tweedie
@ 2001-07-03 12:47 ` Paul Mackerras
  2001-07-03 15:34   ` Stephen C. Tweedie
  2001-07-05  2:28   ` Re[2]: " michaelc
  1 sibling, 1 reply; 7+ messages in thread

From: Paul Mackerras @ 2001-07-03 12:47 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie writes:

> kmap_high is intended to be called routinely for access to highmem
> pages.  It is coded to be as fast as possible as a result.  TLB
> flushes are expensive, especially on SMP, so kmap_high tries hard to
> avoid unnecessary flushes.

The code assumes that flushing a single TLB entry is expensive on SMP,
while flushing the whole TLB is relatively cheap - certainly cheaper
than flushing several individual entries.  And that assumption is of
course true on i386.

On PPC it is a bit different.  Flushing a single TLB entry is
relatively cheap - the hardware broadcasts the TLB invalidation on the
bus (in most implementations) so there are no cross-calls required.
But flushing the whole TLB is expensive because we (strictly speaking)
have to flush the whole of the MMU hash table as well.

The MMU gets its PTEs from a hash table (which can be very large), and
we use the hash table as a kind of level-2 cache of PTEs, which means
that the flush_tlb_* routines have to flush entries from the MMU hash
table as well.  The hash table can store PTEs from many contexts, so
it can have a lot of PTEs in it at any given time.  So flushing the
whole TLB would imply going through every single entry in the hash
table and clearing it.

In fact, currently we cheat - flush_tlb_all actually only flushes the
kernel portion of the address space, which is all that is required in
the three places where flush_tlb_all is called at the moment.

This is not a criticism, rather a request that we expand the
interfaces so that the architecture-specific code can make the
decisions about when and how to flush TLB entries.

For example, I would like to get rid of flush_tlb_all and define a
flush_tlb_kernel_range instead.  In all the places where flush_tlb_all
is currently used, we do actually know the range of addresses which
are affected, and having that information would let us do things a lot
more efficiently on PPC.  On other platforms we could define
flush_tlb_kernel_range to just flush the whole TLB, or whatever.  Note
that there is already a flush_tlb_range which could be used, but some
architectures assume that it is only used on user addresses.

Regards,
Paul.
* Re: about kmap_high function
  2001-07-03 12:47 ` Paul Mackerras
@ 2001-07-03 15:34 ` Stephen C. Tweedie
  2001-07-04 11:48   ` Paul Mackerras
  0 siblings, 1 reply; 7+ messages in thread

From: Stephen C. Tweedie @ 2001-07-03 15:34 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Stephen C. Tweedie, linux-kernel

Hi,

On Tue, Jul 03, 2001 at 10:47:20PM +1000, Paul Mackerras wrote:

> On PPC it is a bit different.  Flushing a single TLB entry is
> relatively cheap - the hardware broadcasts the TLB invalidation on the
> bus (in most implementations) so there are no cross-calls required.
> But flushing the whole TLB is expensive because we (strictly speaking)
> have to flush the whole of the MMU hash table as well.

How much difference is there?  We only flush once per kmap sweep, and
we have 1024 entries in the global kmap pool, so the single tlb flush
would have to be more than a thousand times less expensive overall
than the global flush for that change to be worthwhile.

If the page flush really is _that_ much faster, then sure, this
decision can easily be made per-architecture: the kmap_high code
already has all of the locking and refcounting to know when a per-page
tlb flush would be safe.

Cheers,
 Stephen
* Re: about kmap_high function
  2001-07-03 15:34 ` Stephen C. Tweedie
@ 2001-07-04 11:48 ` Paul Mackerras
  0 siblings, 0 replies; 7+ messages in thread

From: Paul Mackerras @ 2001-07-04 11:48 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie writes:

> On Tue, Jul 03, 2001 at 10:47:20PM +1000, Paul Mackerras wrote:
> > On PPC it is a bit different.  Flushing a single TLB entry is
> > relatively cheap - the hardware broadcasts the TLB invalidation on the
> > bus (in most implementations) so there are no cross-calls required.
> > But flushing the whole TLB is expensive because we (strictly speaking)
> > have to flush the whole of the MMU hash table as well.
>
> How much difference is there?

Between flushing a single TLB entry and flushing the whole TLB, or
between flushing a single entry and flushing a range?

Flushing the whole TLB (including the MMU hash table) would be
extremely expensive.  Consider a machine with 1GB of RAM.  The
recommended MMU hash table size would be 16MB (1024MB/64), although we
generally run with much less, maybe a quarter of that.  That's still
4MB of memory we have to scan through in order to find and clear all
the entries in the hash table, which is what would be required for
flushing the whole hash table.

What we do at present is (a) have a bit in the linux page tables which
indicates whether there is a corresponding entry in the MMU hash
table, and (b) only flush the kernel portion of the address space
(0xc0000000 - 0xffffffff) in flush_tlb_all().  We have a single page
table tree for kernel addresses, shared between all processes.  That
all helps, but we still have to scan through all the page table pages
for kernel addresses to do a flush_tlb_all().

I just did some measurements on a 400MHz POWER3 machine with 1GB of
RAM.  This is a 64-bit machine but running a 32-bit kernel (so both
the kernel and userspace run in 32-bit mode).  It is a 1-cpu machine
and I am running an SMP kernel with highmem enabled, with 512MB of
lowmem and 512MB of highmem.  The MMU hash table is 4MB.

The time taken inside a single flush_tlb_page call depends on whether
the linux PTE indicates that there is a hardware PTE in the hash
table.  If not, it takes about 110ns; if it does, it takes 1us (I
measured 998.5ns but I rounded it :).  A call to flush_tlb_range for
1024 pages from flush_all_zero_pkmaps (replacing the flush_tlb_all
call) takes around 1080us, which is pretty much linear.  The time for
flush_tlb_page was measured inside the procedure whereas the time for
flush_tlb_range was measured in the caller, so the flush_tlb_range
number includes procedure call and loop overhead which the
flush_tlb_page number doesn't.  I expect that almost all the PTEs in
the pkmap range would have a corresponding hash table entry, since we
would almost always touch a page that we have kmap'd.

> We only flush once per kmap sweep, and
> we have 1024 entries in the global kmap pool, so the single tlb flush
> would have to be more than a thousand times less expensive overall
> than the global flush for that change to be worthwhile.

The time for doing a flush_tlb_all call in flush_all_zero_pkmaps was
3280us.  That is for the version which only flushes the kernel portion
of the address space.  Just doing a memset to 0 on the hash table
takes over 11ms (the memset goes at around 360MB/s but there is 4MB to
clear).  Clearing out the hash table properly would take much longer,
since you are supposed to synchronize with the hardware when changing
each entry in the hash table, and the memset is certainly not doing
that.  So yes, the ratio is more than 1024 to 1.

> If the page flush really is _that_ much faster, then sure, this
> decision can easily be made per-architecture: the kmap_high code
> already has all of the locking and refcounting to know when a per-page
> tlb flush would be safe.

My preference would be for architectures to be able to make this
decision.  I don't mind whether it is a flush call per page inside the
loop in flush_all_zero_pkmaps or a flush_tlb_range call at the end of
the loop.  I counted the average number of pages needing to be flushed
in the loop in flush_all_zero_pkmaps - it was 1023.9 for the workload
I was using, which was a kernel compile.  Using flush_tlb_range would
be fine on PPC, but as I noted before some architectures assume that
flush_tlb_range is only used on user addresses at the moment.

Paul.
* Re[2]: about kmap_high function
  2001-07-03  9:38 ` Stephen C. Tweedie
  2001-07-03 12:47   ` Paul Mackerras
@ 2001-07-05  2:28 ` michaelc
  2001-07-05 10:41   ` Stephen C. Tweedie
  1 sibling, 1 reply; 7+ messages in thread

From: michaelc @ 2001-07-05 2:28 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel

Hi,

Tuesday, July 03, 2001, 5:38:09 PM, you wrote:

SCT> kmap_high is intended to be called routinely for access to highmem
SCT> pages.  It is coded to be as fast as possible as a result.  TLB
SCT> flushes are expensive, especially on SMP, so kmap_high tries hard to
SCT> avoid unnecessary flushes.

SCT> The way it does it is to do only a single, complete TLB flush of the
SCT> whole kmap VA range once every time the kmap address ring cycles.
SCT> That's what flush_all_zero_pkmaps() does --- it evicts old, unused
SCT> kmap mappings and flushes the whole TLB range, so that we are
SCT> guaranteed that there is a TLB flush between any two different uses of
SCT> any given kmap virtual address.

SCT> That way, we can avoid the cost of having to flush the TLB for every
SCT> single kmap mapping we create.

Thank you very much for your kind guidance.  I have two questions.

The first question: is kmap_high intended to be called only in user
context, so that the highmem pages are mapped into the user process's
page tables, and so that on SMP other processes (kernel or user)
running on another CPU do not need that kmap virtual address?

The second question: when the kernel evicts old, unused kmap mappings
and flushes the whole TLB range (by calling flush_all_zero_pkmaps),
the TLB no longer keeps those mappings.  After that, when a user
process calls kmap_high to get a new kmap mapping and then accesses
that virtual address, will the MMU fetch the page directory and page
table from memory instead of the TLB to translate the virtual address
into a physical address?
--
Best regards,
 Michael Chen                          mailto:michaelc@turbolinux.com.cn
* Re: about kmap_high function
  2001-07-05  2:28 ` Re[2]: " michaelc
@ 2001-07-05 10:41 ` Stephen C. Tweedie
  0 siblings, 0 replies; 7+ messages in thread

From: Stephen C. Tweedie @ 2001-07-05 10:41 UTC (permalink / raw)
To: michaelc; +Cc: Stephen C. Tweedie, linux-kernel

Hi,

On Thu, Jul 05, 2001 at 10:28:35AM +0800, michaelc wrote:

> The first question: is kmap_high intended to be called only in user
> context, so that the highmem pages are mapped into the user process's
> page tables, and so that on SMP other processes (kernel or user)
> running on another CPU do not need that kmap virtual address?

No.  In user context, at least for user data pages, the highmem pages
can be mapped into the local process's user page tables and we don't
need kmap to access them at all.  kmap is only needed for pages which
are not already in the user page tables, such as when accessing the
page cache in read or write syscalls.

> The second question: when the kernel evicts old, unused kmap mappings
> and flushes the whole TLB range (by calling flush_all_zero_pkmaps),
> the TLB no longer keeps those mappings.  After that, when a user
> process calls kmap_high to get a new kmap mapping and then accesses
> that virtual address, will the MMU fetch the page directory and page
> table from memory instead of the TLB to translate the virtual address
> into a physical address?

No, user processes never access kmap addresses.  They have direct page
table access to highmem pages in their address space.  Only the kernel
uses kmap, and only for pages which are not in the calling process's
local page tables already.  So we don't have to worry about keeping
kmap and page tables consistent --- they are totally different address
spaces, and the kmap virtual addresses are not visible to user
processes.

Cheers,
 Stephen
end of thread, other threads:[~2001-07-05 10:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-29  7:06 about kmap_high function michaelc
2001-07-03  9:38 ` Stephen C. Tweedie
2001-07-03 12:47   ` Paul Mackerras
2001-07-03 15:34     ` Stephen C. Tweedie
2001-07-04 11:48       ` Paul Mackerras
2001-07-05  2:28   ` Re[2]: " michaelc
2001-07-05 10:41     ` Stephen C. Tweedie