* about kmap_high function
@ 2001-06-29 7:06 michaelc
2001-07-03 9:38 ` Stephen C. Tweedie
0 siblings, 1 reply; 7+ messages in thread
From: michaelc @ 2001-06-29 7:06 UTC (permalink / raw)
To: linux-kernel
I found that the kmap_high function doesn't call __flush_tlb_one()
when it maps a highmem page successfully, and I think this may cause
a problem: the TLB may keep obsolete page table entries. But the
kmap_atomic() function does call __flush_tlb_one(). Can someone tell
me the difference between kmap_atomic and kmap_high, other than that
kmap_atomic can be used in IRQ contexts? Thanks in advance.
--
Best regards,
michaelc mailto:michaelc@turbolinux.com.cn
* Re: about kmap_high function
  2001-06-29  7:06 about kmap_high function michaelc
@ 2001-07-03  9:38 ` Stephen C. Tweedie
  2001-07-03 12:47   ` Paul Mackerras
  2001-07-05  2:28   ` Re[2]: " michaelc
  0 siblings, 2 replies; 7+ messages in thread

From: Stephen C. Tweedie @ 2001-07-03 9:38 UTC (permalink / raw)
To: michaelc; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Fri, Jun 29, 2001 at 03:06:01PM +0800, michaelc wrote:
> I found that the kmap_high function doesn't call __flush_tlb_one()
> when it maps a highmem page successfully, and I think this may cause
> a problem: the TLB may keep obsolete page table entries. But the
> kmap_atomic() function does call __flush_tlb_one(). Can someone tell
> me the difference between kmap_atomic and kmap_high, other than that
> kmap_atomic can be used in IRQ contexts? Thanks in advance.

kmap_high is intended to be called routinely for access to highmem
pages.  It is coded to be as fast as possible as a result.  TLB
flushes are expensive, especially on SMP, so kmap_high tries hard to
avoid unnecessary flushes.

The way it does it is to do only a single, complete TLB flush of the
whole kmap VA range once every time the kmap address ring cycles.
That's what flush_all_zero_pkmaps() does --- it evicts old, unused
kmap mappings and flushes the whole TLB range, so that we are
guaranteed that there is a TLB flush between any two different uses of
any given kmap virtual address.

That way, we can avoid the cost of having to flush the TLB for every
single kmap mapping we create.

Cheers,
 Stephen
* Re: about kmap_high function
  2001-07-03  9:38 ` Stephen C. Tweedie
@ 2001-07-03 12:47 ` Paul Mackerras
  2001-07-03 15:34   ` Stephen C. Tweedie
  2001-07-05  2:28   ` Re[2]: " michaelc
  1 sibling, 1 reply; 7+ messages in thread

From: Paul Mackerras @ 2001-07-03 12:47 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie writes:

> kmap_high is intended to be called routinely for access to highmem
> pages.  It is coded to be as fast as possible as a result.  TLB
> flushes are expensive, especially on SMP, so kmap_high tries hard to
> avoid unnecessary flushes.

The code assumes that flushing a single TLB entry is expensive on SMP,
while flushing the whole TLB is relatively cheap - certainly cheaper
than flushing several individual entries.  And that assumption is of
course true on i386.

On PPC it is a bit different.  Flushing a single TLB entry is
relatively cheap - the hardware broadcasts the TLB invalidation on the
bus (in most implementations) so there are no cross-calls required.
But flushing the whole TLB is expensive because we (strictly speaking)
have to flush the whole of the MMU hash table as well.

The MMU gets its PTEs from a hash table (which can be very large), and
we use the hash table as a kind of level-2 cache of PTEs, which means
that the flush_tlb_* routines have to flush entries from the MMU hash
table as well.  The hash table can store PTEs from many contexts, so
it can have a lot of PTEs in it at any given time.  So flushing the
whole TLB would imply going through every single entry in the hash
table and clearing it.

In fact, currently we cheat - flush_tlb_all actually only flushes the
kernel portion of the address space, which is all that is required in
the three places where flush_tlb_all is called at the moment.

This is not a criticism, rather a request that we expand the
interfaces so that the architecture-specific code can make the
decisions about when and how to flush TLB entries.

For example, I would like to get rid of flush_tlb_all and define a
flush_tlb_kernel_range instead.  In all the places where flush_tlb_all
is currently used, we do actually know the range of addresses which
are affected, and having that information would let us do things a lot
more efficiently on PPC.  On other platforms we could define
flush_tlb_kernel_range to just flush the whole TLB, or whatever.  Note
that there is already a flush_tlb_range which could be used, but some
architectures assume that it is only used on user addresses.

Regards,
Paul.
* Re: about kmap_high function
  2001-07-03 12:47 ` Paul Mackerras
@ 2001-07-03 15:34 ` Stephen C. Tweedie
  2001-07-04 11:48   ` Paul Mackerras
  0 siblings, 1 reply; 7+ messages in thread

From: Stephen C. Tweedie @ 2001-07-03 15:34 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Stephen C. Tweedie, linux-kernel

Hi,

On Tue, Jul 03, 2001 at 10:47:20PM +1000, Paul Mackerras wrote:

> On PPC it is a bit different.  Flushing a single TLB entry is
> relatively cheap - the hardware broadcasts the TLB invalidation on the
> bus (in most implementations) so there are no cross-calls required.
> But flushing the whole TLB is expensive because we (strictly speaking)
> have to flush the whole of the MMU hash table as well.

How much difference is there?  We only flush once per kmap sweep, and
we have 1024 entries in the global kmap pool, so the single tlb flush
would have to be more than a thousand times less expensive overall
than the global flush for that change to be worthwhile.

If the page flush really is _that_ much faster, then sure, this
decision can easily be made per-architecture: the kmap_high code
already has all of the locking and refcounting to know when a per-page
tlb flush would be safe.

Cheers,
 Stephen
* Re: about kmap_high function
  2001-07-03 15:34 ` Stephen C. Tweedie
@ 2001-07-04 11:48 ` Paul Mackerras
  0 siblings, 0 replies; 7+ messages in thread

From: Paul Mackerras @ 2001-07-04 11:48 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel

Stephen C. Tweedie writes:

> On Tue, Jul 03, 2001 at 10:47:20PM +1000, Paul Mackerras wrote:
> > On PPC it is a bit different.  Flushing a single TLB entry is
> > relatively cheap - the hardware broadcasts the TLB invalidation on the
> > bus (in most implementations) so there are no cross-calls required.
> > But flushing the whole TLB is expensive because we (strictly speaking)
> > have to flush the whole of the MMU hash table as well.
>
> How much difference is there?

Between flushing a single TLB entry and flushing the whole TLB, or
between flushing a single entry and flushing a range?

Flushing the whole TLB (including the MMU hash table) would be
extremely expensive.  Consider a machine with 1GB of RAM.  The
recommended MMU hash table size would be 16MB (1024MB/64), although we
generally run with much less, maybe a quarter of that.  That's still
4MB of memory we have to scan through in order to find and clear all
the entries in the hash table, which is what would be required for
flushing the whole hash table.

What we do at present is (a) have a bit in the linux page tables which
indicates whether there is a corresponding entry in the MMU hash
table, and (b) only flush the kernel portion of the address space
(0xc0000000 - 0xffffffff) in flush_tlb_all().  We have a single page
table tree for kernel addresses, shared between all processes.  That
all helps, but we still have to scan through all the page table pages
for kernel addresses to do a flush_tlb_all().

I just did some measurements on a 400MHz POWER3 machine with 1GB of
RAM.  This is a 64-bit machine but running a 32-bit kernel (so both
the kernel and userspace run in 32-bit mode).  It is a 1-cpu machine
and I am running an SMP kernel with highmem enabled, with 512MB of
lowmem and 512MB of highmem.  The MMU hash table is 4MB.

The time taken inside a single flush_tlb_page call depends on whether
the linux PTE indicates that there is a hardware PTE in the hash
table.  If not, it takes about 110ns; if it does, it takes 1us (I
measured 998.5ns but I rounded it :).  A call to flush_tlb_range for
1024 pages from flush_all_zero_pkmaps (replacing the flush_tlb_all
call) takes around 1080us, which is pretty much linear.  The time for
flush_tlb_page was measured inside the procedure whereas the time for
flush_tlb_range was measured in the caller, so the flush_tlb_range
number includes procedure call and loop overhead which the
flush_tlb_page number doesn't.  I expect that almost all the PTEs in
the pkmap range would have a corresponding hash table entry, since we
would almost always touch a page that we have kmap'd.

> We only flush once per kmap sweep, and
> we have 1024 entries in the global kmap pool, so the single tlb flush
> would have to be more than a thousand times less expensive overall
> than the global flush for that change to be worthwhile.

The time for doing a flush_tlb_all call in flush_all_zero_pkmaps was
3280us.  That is for the version which only flushes the kernel portion
of the address space.  Just doing a memset to 0 on the hash table
takes over 11ms (the memset goes at around 360MB/s but there is 4MB to
clear).  Clearing out the hash table properly would take much longer,
since you are supposed to synchronize with the hardware when changing
each entry in the hash table, and the memset is certainly not doing
that.  So yes, the ratio is more than 1024 to 1.

> If the page flush really is _that_ much faster, then sure, this
> decision can easily be made per-architecture: the kmap_high code
> already has all of the locking and refcounting to know when a per-page
> tlb flush would be safe.

My preference would be for architectures to be able to make this
decision.  I don't mind whether it is a flush call per page inside the
loop in flush_all_zero_pkmaps or a flush_tlb_range call at the end of
the loop.  I counted the average number of pages needing to be flushed
in the loop in flush_all_zero_pkmaps - it was 1023.9 for the workload
I was using, which was a kernel compile.  Using flush_tlb_range would
be fine on PPC, but as I noted before some architectures assume that
flush_tlb_range is only used on user addresses at the moment.

Paul.
* Re[2]: about kmap_high function
  2001-07-03  9:38 ` Stephen C. Tweedie
  2001-07-03 12:47   ` Paul Mackerras
@ 2001-07-05  2:28 ` michaelc
  2001-07-05 10:41   ` Stephen C. Tweedie
  1 sibling, 1 reply; 7+ messages in thread

From: michaelc @ 2001-07-05 2:28 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-kernel

Hi,

Tuesday, July 03, 2001, 5:38:09 PM, you wrote:

SCT> kmap_high is intended to be called routinely for access to highmem
SCT> pages.  It is coded to be as fast as possible as a result.  TLB
SCT> flushes are expensive, especially on SMP, so kmap_high tries hard to
SCT> avoid unnecessary flushes.

SCT> The way it does it is to do only a single, complete TLB flush of the
SCT> whole kmap VA range once every time the kmap address ring cycles.
SCT> That's what flush_all_zero_pkmaps() does --- it evicts old, unused
SCT> kmap mappings and flushes the whole TLB range, so that we are
SCT> guaranteed that there is a TLB flush between any two different uses of
SCT> any given kmap virtual address.

SCT> That way, we can avoid the cost of having to flush the TLB for every
SCT> single kmap mapping we create.

Thank you very much for your kind guidance.  I have two questions.

The first question: is kmap_high intended to be called only in user
context, so that the highmem pages are mapped into the user process's
page tables, and so that on SMP other processes (kernel or user)
running on another CPU do not need that kmap virtual address?

The second question: when the kernel evicts old, unused kmap mappings
and flushes the whole TLB range (by calling flush_all_zero_pkmaps),
the TLB no longer keeps those mappings.  After that, when a user
process calls kmap_high to get a new kmap mapping and then accesses
that virtual address, will the MMU fetch the page directory and page
table from memory instead of the TLB to translate the virtual address
into a physical address?
--
Best regards,
 Michael Chen                          mailto:michaelc@turbolinux.com.cn
* Re: about kmap_high function
  2001-07-05  2:28 ` Re[2]: " michaelc
@ 2001-07-05 10:41 ` Stephen C. Tweedie
  0 siblings, 0 replies; 7+ messages in thread

From: Stephen C. Tweedie @ 2001-07-05 10:41 UTC (permalink / raw)
To: michaelc; +Cc: Stephen C. Tweedie, linux-kernel

Hi,

On Thu, Jul 05, 2001 at 10:28:35AM +0800, michaelc wrote:

> The first question: is kmap_high intended to be called only in user
> context, so that the highmem pages are mapped into the user process's
> page tables, and so that on SMP other processes (kernel or user)
> running on another CPU do not need that kmap virtual address?

No.  In user context, at least for user data pages, the highmem pages
can be mapped into the local process's user page tables and we don't
need kmap to access them at all.  kmap is only needed for pages which
are not already in the user page tables, such as when accessing the
page cache in read or write syscalls.

> The second question: when the kernel evicts old, unused kmap mappings
> and flushes the whole TLB range (by calling flush_all_zero_pkmaps),
> the TLB no longer keeps those mappings.  After that, when a user
> process calls kmap_high to get a new kmap mapping and then accesses
> that virtual address, will the MMU fetch the page directory and page
> table from memory instead of the TLB to translate the virtual address
> into a physical address?

No, user processes never access kmap addresses.  They have direct page
table access to highmem pages in their address space.  Only the kernel
uses kmap, and only for pages which are not in the calling process's
local page tables already.  So we don't have to worry about keeping
kmap and page tables consistent --- they are totally different address
spaces, and the kmap virtual addresses are not visible to user
processes.

Cheers,
 Stephen
end of thread, other threads:[~2001-07-05 10:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-29  7:06 about kmap_high function michaelc
2001-07-03  9:38 ` Stephen C. Tweedie
2001-07-03 12:47   ` Paul Mackerras
2001-07-03 15:34     ` Stephen C. Tweedie
2001-07-04 11:48       ` Paul Mackerras
2001-07-05  2:28   ` Re[2]: " michaelc
2001-07-05 10:41     ` Stephen C. Tweedie