* [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
@ 2005-05-05 17:20 Marcelo Tosatti
2005-05-06 13:04 ` Jason McMullan
2005-05-06 16:43 ` Dan Malek
0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-05 17:20 UTC
To: Dan Malek, linux-ppc-embedded
Hi,
As can be seen from the BDI output in previous messages, the pinned 8Mbyte
TLB entry is not actually being used.
The manual says, in section "9.3.2 Translation Enabled" (MMU section):
"A TLB hit in multiple entries is avoided when a TLB is being reloaded.
When TLB logic detects that a new effective page number (EPN) overlaps
one in the TLB (when taking into account page sizes, subpage validity,
user/supervisor state, address space ID, and the SH values of the TLB
entries), the new EPN is written and the old one is invalidated."
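A minimal C model of the rule quoted above, assuming only the effective address range of each entry matters: subpage validity, user/supervisor state, ASID and the SH bits are ignored, and reserved (pinned) entries are not modelled. The names tlb_entry, tlb_overlaps() and tlb_reload() are hypothetical and do not correspond to kernel or hardware interfaces. Under those assumptions, reloading a 4Kbyte entry for 0xc0003000 invalidates an 8Mbyte entry based at 0xc0000000, because their ranges intersect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry: only the effective address range is
 * modelled (no subpage validity, ASID, SH bits or reserved entries). */
struct tlb_entry {
    uint32_t epn;   /* first effective address covered by the entry */
    uint32_t size;  /* page size in bytes, e.g. 0x1000 or 0x800000 */
    bool valid;
};

/* True if the two entries cover at least one common effective address. */
static bool tlb_overlaps(const struct tlb_entry *a, const struct tlb_entry *b)
{
    uint64_t a_end = (uint64_t)a->epn + a->size;
    uint64_t b_end = (uint64_t)b->epn + b->size;
    return a->epn < b_end && b->epn < a_end;
}

/* Reload rule as paraphrased from the manual: every valid entry that
 * overlaps the new one is invalidated, then the new entry is written into
 * slot 'victim'. Returns how many overlapping entries were invalidated. */
static int tlb_reload(struct tlb_entry *tlb, int n, int victim,
                      struct tlb_entry new_entry)
{
    int invalidated = 0;

    assert(victim >= 0 && victim < n);
    for (int i = 0; i < n; i++) {
        if (tlb[i].valid && tlb_overlaps(&tlb[i], &new_entry)) {
            tlb[i].valid = false;
            invalidated++;
        }
    }
    tlb[victim] = new_entry;
    tlb[victim].valid = true;
    return invalidated;
}
```

For example, with tlb[0] holding {0xc0000000, 0x800000, valid}, reloading {0xc0003000, 0x1000} into slot 1 returns 1 and leaves tlb[0] invalid.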
The following patch changes "mmu_mapin_ram" (hook used by mapin_ram), to
begin creation of pagetables after the first 8Megs, preserving the
8Mbyte TLB entry.
This changes the assumption that DMA allocations can start at the first
kernel address, given that those need to be marked uncached due to DMA
cache coherency issues.
The bootmem allocator, used to allocate DMA regions at bootup, uses
MAX_DMA_ADDRESS as its goal parameter. The algorithm searches for
pages above 'goal' first, and only then searches lower pages.
So change MAX_DMA_ADDRESS to avoid bootmem collisions with lower 8Megs.
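The goal-first search described above can be sketched as a toy allocator over an array of 4Kbyte pages. This is not the real bootmem code (which works on per-node bitmaps and also handles alignment); alloc_page_goal() and the page-index units are assumptions for illustration only. A goal of 0 can hand out pages covered by the pinned entry, while a goal at or above 8Mbytes (page 2048) steers allocations higher, as long as free pages exist there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical goal-first page allocator (not the real bootmem code).
 * used[i] marks 4Kbyte page i as allocated. Pages [goal, npages) are
 * searched first, then [0, goal). Returns a page index, or -1 if full. */
static long alloc_page_goal(bool *used, size_t npages, size_t goal)
{
    assert(used != NULL);
    if (goal > npages)
        goal = npages;          /* RAM ends below the goal: fall back low */

    for (size_t i = goal; i < npages; i++) {
        if (!used[i]) {
            used[i] = true;
            return (long)i;
        }
    }
    for (size_t i = 0; i < goal; i++) {
        if (!used[i]) {
            used[i] = true;
            return (long)i;
        }
    }
    return -1;
}
```

If RAM ends below the goal, the sketch clamps the goal and falls back to the low pages, so on a board with only 8Mbytes the first 8Mbytes are still handed out.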
Drivers which allocate directly from __get_free_pages() and tweak the
PTEs directly also need to be fixed. For example:
Panto: the FEC driver currently does
mem_addr = __get_free_page(GFP_KERNEL);
cbd_base = (cbd_t *)mem_addr;
/* XXX: missing check for allocation failure */
fec_uncache(mem_addr);
That needs to be changed to avoid the lower 8Megs.
We are still using the v2.4 FEC driver, so this fixed it:
// mem_addr = __get_free_page(GFP_KERNEL);
mem_addr = dma_alloc_coherent(NULL, PAGE_SIZE, &physaddr,
GFP_KERNEL);
cbd_base = (cbd_t *)mem_addr;
This allocates from the coherent DMA memory region, which, I suppose, sits
after the initial 8Megs in all configurations (it always should).
TLB miss stat output now looks like this on 2.6.11:
[root@CAS root]# time dd if=/dev/zero of=file bs=4k count=3840
3840+0 records in
3840+0 records out
real 0m3.723s
user 0m0.150s
sys 0m3.560s
I-TLB userspace misses: 1904
I-TLB kernel misses: 0
D-TLB userspace misses: 160272
D-TLB kernel misses: 135098
instead of
[root@CAS root]# time dd if=/dev/zero of=file bs=4k count=3840
3840+0 records in
3840+0 records out
real 0m4.328s
user 0m0.128s
sys 0m4.170s
I-TLB userspace misses: 162651
I-TLB kernel misses: 138100
D-TLB userspace misses: 255294
D-TLB kernel misses: 238129
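To put the two dd runs side by side, the helper below just computes percentage reductions from the figures already shown; no new measurements are involved, and tlb_stats is a hypothetical container for those numbers.

```c
#include <assert.h>

/* Figures copied from the two dd runs above (times in seconds). */
struct tlb_stats {
    double real_s, sys_s;
    long itlb_user, itlb_kernel, dtlb_user, dtlb_kernel;
};

static const struct tlb_stats with_patch = {
    3.723, 3.560, 1904, 0, 160272, 135098
};
static const struct tlb_stats without_patch = {
    4.328, 4.170, 162651, 138100, 255294, 238129
};

/* Percentage reduction going from 'before' to 'after'. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For these runs that comes to roughly 14% less real time, about 14.6% less system time, no kernel I-TLB misses at all, and about 43% fewer kernel D-TLB misses.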
Dan: Maybe the pinning should be mandatory, getting rid of CONFIG_PIN_TLB?
diff -Nur --show-c-function linux-2.6.12-rc3.orig/arch/ppc/mm/mmu_decl.h linux-2.6.12-rc3/arch/ppc/mm/mmu_decl.h
--- linux-2.6.12-rc3.orig/arch/ppc/mm/mmu_decl.h 2005-05-05 17:21:55.000000000 -0300
+++ linux-2.6.12-rc3/arch/ppc/mm/mmu_decl.h 2005-05-05 17:31:20.000000000 -0300
@@ -49,7 +49,8 @@ extern unsigned long Hash_size, Hash_mas
#if defined(CONFIG_8xx)
#define flush_HPTE(X, va, pg) _tlbie(va)
#define MMU_init_hw() do { } while(0)
-#define mmu_mapin_ram() (0UL)
+/* There is a 8Mbyte pinned TLB entry covering the first 8Megs, so skip it */
+#define mmu_mapin_ram() (0x00800000)
#elif defined(CONFIG_4xx)
#define flush_HPTE(X, va, pg) _tlbie(va)
diff -Nur --show-c-function linux-2.6.12-rc3.orig/include/asm-ppc/dma.h linux-2.6.12-rc3/include/asm-ppc/dma.h
--- linux-2.6.12-rc3.orig/include/asm-ppc/dma.h 2005-05-05 17:21:59.000000000 -0300
+++ linux-2.6.12-rc3/include/asm-ppc/dma.h 2005-05-05 17:53:07.000000000 -0300
@@ -32,9 +32,16 @@
#define MAX_DMA_CHANNELS 8
#endif
+#ifdef CONFIG_8xx
+/* DMA pages are uncached on 8xx due to cache coherency issues.
+* Avoid bootmem from trying to allocate pages from first 8Megs.
+*/
+#define MAX_DMA_ADDRESS (KERNELBASE + 0x01000000)
+#else
/* The maximum address that we can perform a DMA transfer to on this platform */
/* Doesn't really apply... */
#define MAX_DMA_ADDRESS 0xFFFFFFFF
+#endif
/* in arch/ppc/kernel/setup.c -- Cort */
extern unsigned long DMA_MODE_WRITE, DMA_MODE_READ;
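For context on the first hunk, assume the generic mapin_ram() (arch/ppc/mm/pgtable.c) starts creating 4Kbyte PTEs at the offset returned by mmu_mapin_ram() and continues up to total_lowmem. The userspace model below is built on exactly that assumption; count_mapped_ptes() and the two hook stand-ins are hypothetical. It counts how many 4Kbyte PTEs would be created with and without the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u

/* Hypothetical stand-ins for the 8xx mmu_mapin_ram() before and after the
 * patch: the return value is how much of RAM is already mapped. */
static uint32_t mmu_mapin_ram_unpatched(void) { return 0; }
static uint32_t mmu_mapin_ram_patched(void)   { return 0x00800000u; }

/* Simplified model of the mapin_ram() loop (assumed behaviour): create one
 * 4K PTE per page from the hook's offset up to total_lowmem.
 * Returns the number of PTEs that would be created. */
static uint32_t count_mapped_ptes(uint32_t (*mapin_hook)(void),
                                  uint32_t total_lowmem)
{
    uint32_t count = 0;

    assert(mapin_hook != NULL);
    for (uint32_t s = mapin_hook(); s < total_lowmem; s += PAGE_SIZE_4K)
        count++;
    return count;
}
```

With 32Mbytes of lowmem, the patched hook skips the 2048 PTEs that would cover the first 8Mbytes; with only 8Mbytes of RAM it creates none at all, which is the case Jason raises in this thread.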
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 13:04 ` Jason McMullan
@ 2005-05-06 11:39 ` Marcelo Tosatti
0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-06 11:39 UTC
To: Jason McMullan; +Cc: linux-ppc-embedded
On Fri, May 06, 2005 at 09:04:24AM -0400, Jason McMullan wrote:
> On Thu, 2005-05-05 at 14:20 -0300, Marcelo Tosatti wrote:
> > [snip snip]
> >
> > Allocateing from the coherent memory DMA region. Which sits at, I suppose,
> > after initial 8Megs in all configurations (should be always).
> >
>
>
> What if your board (ie the MPC885ADS) only has 8Mb? Soldered on.
Jason,
Oops, in that case you can't pin the 8Mbyte entry.
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-05 17:20 [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries Marcelo Tosatti
@ 2005-05-06 13:04 ` Jason McMullan
2005-05-06 11:39 ` Marcelo Tosatti
2005-05-06 16:43 ` Dan Malek
1 sibling, 1 reply; 20+ messages in thread
From: Jason McMullan @ 2005-05-06 13:04 UTC
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
On Thu, 2005-05-05 at 14:20 -0300, Marcelo Tosatti wrote:
> [snip snip]
>
> Allocateing from the coherent memory DMA region. Which sits at, I suppose,
> after initial 8Megs in all configurations (should be always).
>
What if your board (i.e. the MPC885ADS) only has 8MB? Soldered on.
--
Jason McMullan <jason.mcmullan@timesys.com>
TimeSys Corporation
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 16:43 ` Dan Malek
@ 2005-05-06 13:38 ` Marcelo Tosatti
2005-05-06 22:49 ` Dan Malek
0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-06 13:38 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
> >The following patch changes "mmu_mapin_ram" (hook used by mapin_ram), to
> >begin creation of pagetables after the first 8Megs, preserving the
> >8Mbyte TLB entry.
>
> Please don't do this. It isn't necessary.
Why is it not necessary?
Have you read the section of the manual which I pasted here?
> >This changes the assumption that DMA allocations can start at the first
> >kernel address, given that those need to be marked uncached due to DMA
> >cache coherency issues.
>
> VM space for uncached DMA has always been allocated using vmalloc(),
> the location of the physical pages backing this space is irrelevant.
>
> The only thing you have to ensure is the virtual address is outside
> of the pinned entry.
What you replied to is:
"This changes the assumption that DMA allocations can start at the first
kernel address, given that those need to be marked uncached due to DMA
cache coherency issues."
I think we mean the same, yes?
> If something about the way the VM space is structured in 2.6 is
> different, we need to fix that in general.
>
> >Panto: FEC currently does
> >
> > mem_addr = __get_free_page(GFP_KERNEL);
> > cbd_base = (cbd_t *)mem_addr;
>
> This is just plain broken and it shouldn't do this.
>
> >We are still using v2.4 FEC driver, so this fixed it:
> >
> >// mem_addr = __get_free_page(GFP_KERNEL);
> > mem_addr = dma_alloc_coherent(NULL, PAGE_SIZE, &physaddr,
> > GFP_KERNEL);
>
> This is the proper way, and should be moved to the equivalent in 2.6.
>
> >Allocateing from the coherent memory DMA region. Which sits at, I suppose,
> >after initial 8Megs in all configurations (should be always).
>
> You are making this too complicated :-) All we have to do is use the
> proper dma allocators and make sure the TLBs are pinned properly.
> That is all.
Sorry, but, what is too complicated?
The patch I sent does two things (pretty much the same thing you suggest
after stating that it's "too complicated", AFAICS):
1) avoids the creation of pte tables in the 8Mbyte range, thus preserving
the pinned TLB entry.
2) restricts bootmem to above 8Mbyte region
And last thing is:
3) Memory for DMA pages must not be in the pinned region, i.e. drivers
should not allocate memory directly for DMA purposes.
Dan, I would really enjoy having access to some of your precious 8xx
knowledge: share it, along with the correct way to fix this and the
other pending issues.
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-05 17:20 [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries Marcelo Tosatti
2005-05-06 13:04 ` Jason McMullan
@ 2005-05-06 16:43 ` Dan Malek
2005-05-06 13:38 ` Marcelo Tosatti
1 sibling, 1 reply; 20+ messages in thread
From: Dan Malek @ 2005-05-06 16:43 UTC
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
On May 5, 2005, at 1:20 PM, Marcelo Tosatti wrote:
> As can be seen by BDI output from previous messages, the 8Mbyte TLB
> pinned entry is not being actually used.
We know it's not, we know it's broken, I'm working on it :-)
> The following patch changes "mmu_mapin_ram" (hook used by mapin_ram), to
> begin creation of pagetables after the first 8Megs, preserving the
> 8Mbyte TLB entry.
Please don't do this. It isn't necessary.
> This changes the assumption that DMA allocations can start at the first
> kernel address, given that those need to be marked uncached due to DMA
> cache coherency issues.
VM space for uncached DMA has always been allocated using vmalloc(),
the location of the physical pages backing this space is irrelevant.
The only thing you have to ensure is the virtual address is outside
of the pinned entry. If something about the way the VM space is
structured in 2.6 is different, we need to fix that in general.
> Panto: FEC currently does
>
> mem_addr = __get_free_page(GFP_KERNEL);
> cbd_base = (cbd_t *)mem_addr;
This is just plain broken and it shouldn't do this.
> We are still using v2.4 FEC driver, so this fixed it:
>
> // mem_addr = __get_free_page(GFP_KERNEL);
> mem_addr = dma_alloc_coherent(NULL, PAGE_SIZE, &physaddr,
> GFP_KERNEL);
This is the proper way, and should be moved to the equivalent in 2.6.
> Allocateing from the coherent memory DMA region. Which sits at, I suppose,
> after initial 8Megs in all configurations (should be always).
You are making this too complicated :-) All we have to do is use the
proper dma allocators and make sure the TLBs are pinned properly.
That is all.
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 22:49 ` Dan Malek
@ 2005-05-06 20:03 ` Marcelo Tosatti
2005-05-07 3:09 ` Dan Malek
2005-05-07 5:27 ` Dan Malek
2005-05-06 23:10 ` Dan Malek
1 sibling, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-06 20:03 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
Hi Dan,
On Fri, May 06, 2005 at 06:49:11PM -0400, Dan Malek wrote:
>
> On May 6, 2005, at 9:38 AM, Marcelo Tosatti wrote:
>
> >1) avoids the creation of pte tables in the 8Mbyte range, thus preserving
> >the pinned TLB entry.
>
> This has nothing to do with "preserving" the pinned TLB entry.
> The pinned entries are placed into the reserved portion of the TLB,
> and are never evicted.
OK.
> We never get a fault on these pages, so we never look up an
> entry in the page table.
The data I have tells me otherwise. I have seen the I-TLB entries
getting created for kernel space.
I did the following:
- insert a break at the beginning of start_kernel, another break at
the end of start_kernel.
- boot, BDI stops at start_kernel.
- dump I-TLB contents, no entries for "start_kernel" pages on I-TLB.
- "go".
- BDI stops at the end of start_kernel.
- dump I-TLB contents, see the 4kb I-TLB entries for the "start_kernel" code
there (i.e. we _do_ get faults on these pages).
Check it out.
If your setup is not working yet I can get the data for you tomorrow.
> We need to create the
> page tables for informational purposes, so software or debugger
> lookups will do the right thing.
Can't the BDI work on the 8Mbyte page? Same for other software
or debuggers...
Any in-kernel algorithm which relies on direct pte manipulation
looks fragile...
i386 and some (?) other architectures do use big pages for the first
kernel addresses, right?
> >2) restricts bootmem to above 8Mbyte region
>
> Why is this necessary?
void __init
m8xx_setup_arch(void)
{
int cpm_page;
cpm_page = (int) alloc_bootmem_pages(PAGE_SIZE);
/* Reset the Communication Processor Module.
*/
m8xx_cpm_reset(cpm_page);
...
void
m8xx_cpm_reset(uint bootpage)
{
...
/* get the PTE for the bootpage */
if (!get_pteptr(&init_mm, bootpage, &pte))
panic("get_pteptr failed\n");
/* and make it uncachable */
pte_val(*pte) |= _PAGE_NO_CACHE;
_tlbie(bootpage);
host_buffer = bootpage;
host_end = host_buffer + PAGE_SIZE;
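Both sides of this thread agree that while the pinned 8Mbyte entry is live, accesses in that range take no TLB misses, so the page tables are not consulted. Under that assumption, setting _PAGE_NO_CACHE on a bootpage that lies in the first 8Mbytes would not change how it is accessed. The model below is a hypothetical illustration (access_is_uncached() and the constants are not kernel code) and assumes the pinned entry itself is mapped cacheable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration: kernel base and one 8Mbyte pinned entry. */
#define KERNELBASE_MODEL 0xc0000000u
#define PINNED_SIZE      0x00800000u

/* Hypothetical model of where the cache attribute comes from.
 * Inside a live pinned entry there is no TLB miss, so the PTE is never
 * read and the pinned entry's (assumed cacheable) attribute applies.
 * Elsewhere a miss reloads the TLB from the PTE. */
static bool access_is_uncached(uint32_t vaddr, bool pinned_live,
                               bool pte_no_cache)
{
    bool in_pinned = vaddr >= KERNELBASE_MODEL &&
                     vaddr - KERNELBASE_MODEL < PINNED_SIZE;

    if (pinned_live && in_pinned)
        return false;
    return pte_no_cache;
}
```

In this model the PTE flag only takes effect for addresses outside the pinned range, or when the pinned entry is not live.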
> >3) Memory for DMA pages must not be in the pinned region. ie. drivers
> >should not allocate memory directly for DMA purposes.
>
> Why not?
Because DMA pages need to have their PTEs marked as uncached, which in turn
means their TLB entries need to be marked as uncached.
> It doesn't matter if we cover a VM space with a bunch of 4K
> entries or a single 8M entry. The physical pages are always going to
> be multiple mapped, either through the mapin_ram() space or a single 8M
> entry, and also through the vmalloc() space. You just have to ensure,
> in any case, that you don't access the pages through both VM spaces.
I don't think you can have multiple overlapping TLB entries.
How is the MMU supposed to decide between multiple mappings
for the same address?
> >Dan, I would really enjoy having access to some of your precious 8xx
> >knowledge: share it, along with the correct way to fix this and the
> >other pending issues.
>
> The correct fix is rather simple, just make sure you configure the TLB
> to reserve entries, and get the pinned entries into those reserved
> entries.
That is how it is now. See previous posts with detailed TLB debugging.
> I know I had it right once, I don't know what happened :-)
Maybe you thought you got it right because the initial 8Mbyte
mapping works?
Unfortunately that mapping is trashed after overlapping
PTEs are created.
> Just hang on and I'll get you some code to test ....
Sure...
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 13:38 ` Marcelo Tosatti
@ 2005-05-06 22:49 ` Dan Malek
2005-05-06 20:03 ` Marcelo Tosatti
2005-05-06 23:10 ` Dan Malek
0 siblings, 2 replies; 20+ messages in thread
From: Dan Malek @ 2005-05-06 22:49 UTC
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
On May 6, 2005, at 9:38 AM, Marcelo Tosatti wrote:
> 1) avoids the creation of pte tables in the 8Mbyte range, thus preserving
> the pinned TLB entry.
This has nothing to do with "preserving" the pinned TLB entry.
The pinned entries are placed into the reserved portion of the TLB,
and are never evicted. We never get a fault on these pages, so we
never look up an entry in the page table. We need to create the
page tables for informational purposes, so software or debugger
lookups will do the right thing.
> 2) restricts bootmem to above 8Mbyte region
Why is this necessary?
> 3) Memory for DMA pages must not be in the pinned region. ie. drivers
> should not allocate memory directly for DMA purposes.
Why not? It doesn't matter if we cover a VM space with a bunch of 4K
entries or a single 8M entry. The physical pages are always going to
be multiple mapped, either through the mapin_ram() space or a single 8M
entry, and also through the vmalloc() space. You just have to ensure,
in any case, that you don't access the pages through both VM spaces.
> Dan, I would really enjoy having access to some of your precious 8xx
> knowledge: share it, along with the correct way to fix this and the
> other pending issues.
The correct fix is rather simple, just make sure you configure the TLB
to reserve entries, and get the pinned entries into those reserved
entries. I know I had it right once, I don't know what happened :-)
Just hang on and I'll get you some code to test ....
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 3:09 ` Dan Malek
@ 2005-05-06 23:05 ` Marcelo Tosatti
2005-05-07 4:39 ` Dan Malek
2005-05-07 14:59 ` Wolfgang Denk
0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-06 23:05 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
On Fri, May 06, 2005 at 11:09:15PM -0400, Dan Malek wrote:
>
> On May 6, 2005, at 4:03 PM, Marcelo Tosatti wrote:
>
> >The data I have tells me otherwise. I have seen the I-TLB entries
> >getting created for kernel space.
>
> Of course. That's because the pinned entries aren't working :-)
>
> >Can't the BDI work on the 8Mbyte page? Same for other software
> >or debuggers...
>
> The BDI can, but other software functions will walk the page
> tables looking for PTE information.
Do you have any practical example which you are certain is going
to break?
I don't remember any, and I don't think any software should be walking
kernel PTEs directly...
It is not possible to have the 8Mbyte pinned TLB and 4kb pagetables
mapping the same kernel virtual addresses.
> > /* get the PTE for the bootpage */
> > if (!get_pteptr(&init_mm, bootpage, &pte))
> > panic("get_pteptr failed\n");
> >
> > /* and make it uncachable */
> > pte_val(*pte) |= _PAGE_NO_CACHE;
> > _tlbie(bootpage);
>
> This is a bad hack (that I wrote) that needs to get fixed.
>
> >Because DMA pages need to have their PTE's marked as uncached, which in turn
> >means their TLB's need to be marked as uncached.
>
> Right, but these are allocated from the vmalloc() space, far away
> from the pinned entries.
>
> >I dont think you can have multiple overlapping TLB entries.
>
> Sure you can, we do it all of the time. The kernel maps all of
> memory, and then user applications do it again. The only time
> it causes a problem is when you have different cache attributes
> for the same physical page. In this case, you need to ensure
> you only use one mapping. You can't have the same virtual
> address twice in the TLB (iirc, the 8xx automatically invalidates
> an existing one if you do this), but you can have the same
> physical page mapped multiple times.
You can't have both a 4kb page and an 8Mbyte page mapping the virtual
address KERNELBASE + 0.
Do you agree?
> >How is the MMU supposed to decide between multiple mappings
> >for the same address ?
>
> You are thinking backward. The MMU maps the virtual address
> accessed, there is only one valid at a time. You can have multiple
> VM addresses accessing the same physical page.
Right - I'm talking about kernel virtual addresses: in this specific case,
we can't have more than one mapping for the first page at KERNELBASE.
> >That is how it is now. See previous posts with detailed TLB debugging.
>
> Something isn't correct if it isn't working.
>
> >Maybe you thought you got it right because the initial 8Mbyte
> >mapping works?
>
> No, this is required to work for some execute in place from rom
> systems I have done. It was adapted from that. The initial 8M
> mapping must be evicted when the mapin_ram() is done. It's
> supposed to happen that way.
>
> >Unfortunately that mapping is trashed after overlapping
> >pte's are created.
>
> Right, that is supposed to happen unless TLB pinning
> is configured.
OK, we seem to be on the same page now.
So you do agree that PTEs should not be created for the first
8MBytes if CONFIG_PIN_TLB is set? :)
Should I send an updated patch, or do you plan to do that?
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 22:49 ` Dan Malek
2005-05-06 20:03 ` Marcelo Tosatti
@ 2005-05-06 23:10 ` Dan Malek
1 sibling, 0 replies; 20+ messages in thread
From: Dan Malek @ 2005-05-06 23:10 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
On May 6, 2005, at 6:49 PM, Dan Malek wrote:
>> 3) Memory for DMA pages must not be in the pinned region. ie. drivers
>> should not allocate memory directly for DMA purposes.
>
> Why not?
Having now read this again, the "Why not?" was for the first
sentence :-)
Drivers should always use the proper dma allocation functions.
Unpredictable and sometimes exciting results can happen if
they don't.
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 20:03 ` Marcelo Tosatti
@ 2005-05-07 3:09 ` Dan Malek
2005-05-06 23:05 ` Marcelo Tosatti
2005-05-07 5:27 ` Dan Malek
1 sibling, 1 reply; 20+ messages in thread
From: Dan Malek @ 2005-05-07 3:09 UTC
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
On May 6, 2005, at 4:03 PM, Marcelo Tosatti wrote:
> The data I have tells me otherwise. I have seen the I-TLB entries
> getting created for kernel space.
Of course. That's because the pinned entries aren't working :-)
> Can't the BDI work on the 8Mbyte page? Same for other software
> or debuggers...
The BDI can, but other software functions will walk the page
tables looking for PTE information.
> /* get the PTE for the bootpage */
> if (!get_pteptr(&init_mm, bootpage, &pte))
> panic("get_pteptr failed\n");
>
> /* and make it uncachable */
> pte_val(*pte) |= _PAGE_NO_CACHE;
> _tlbie(bootpage);
This is a bad hack (that I wrote) that needs to get fixed.
> Because DMA pages need to have their PTE's marked as uncached, which in turn
> means their TLB's need to be marked as uncached.
Right, but these are allocated from the vmalloc() space, far away
from the pinned entries.
> I dont think you can have multiple overlapping TLB entries.
Sure you can, we do it all of the time. The kernel maps all of
memory, and then user applications do it again. The only time
it causes a problem is when you have different cache attributes
for the same physical page. In this case, you need to ensure
you only use one mapping. You can't have the same virtual
address twice in the TLB (iirc, the 8xx automatically invalidates
an existing one if you do this), but you can have the same
physical page mapped multiple times.
> How is the MMU supposed to decide between multiple mappings
> for the same address ?
You are thinking backward. The MMU maps the virtual address
accessed, there is only one valid at a time. You can have multiple
VM addresses accessing the same physical page.
> That is how it is now. See previous posts with detailed TLB debugging.
Something isn't correct if it isn't working.
> Maybe you thought you got it right because the initial 8Mbyte
> mapping works?
No, this is required to work for some execute in place from rom
systems I have done. It was adapted from that. The initial 8M
mapping must be evicted when the mapin_ram() is done. It's
supposed to happen that way.
> Unfortunately that mapping is trashed after overlapping
> pte's are created.
Right, that is supposed to happen unless TLB pinning
is configured.
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 23:05 ` Marcelo Tosatti
@ 2005-05-07 4:39 ` Dan Malek
2005-05-07 5:16 ` Dan Malek
` (2 more replies)
2005-05-07 14:59 ` Wolfgang Denk
1 sibling, 3 replies; 20+ messages in thread
From: Dan Malek @ 2005-05-07 4:39 UTC
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
On May 6, 2005, at 7:05 PM, Marcelo Tosatti wrote:
> Do you have any practical example which you are certain is going
> to break?
Not at the moment, but that doesn't mean we shouldn't maintain
consistency for anyone that wants to do so.
> I dont remember any, and I dont think any software should be walking
> kernel pte's directly...
Anyone can call get_pteptr and should get the proper information.
> It is not possible to have the 8Mbyte pinned TLB and 4kb pagetables
> mapping the same kernel virtual addresses.
I know, but we don't do that. Like I said, if the 8M pinned entry is
in the TLB, we don't get exceptions for this space and we don't look
up PTEs and replace them.
> You can't have both a 4kb page and a 8Mbyte page mapping the virtual
> address KERNELBASE + 0.
>
> Do you agree?
Yes, but that isn't what we are doing. We can have the 8M page
mapping virtual address 0xc0000000 to physical 0x00000000, and also another
4k page at, say, 0xd0000000 mapping the same 0x00000000 physical page.
There are many circumstances when we have a kernel VM address
and a user VM address map the same physical page. This is also
what we do to get uncached VM addresses for DMA.
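Dan's point is that translation is keyed by virtual address: two different virtual addresses can map the same physical page, while a single virtual address must be covered by only one valid entry. The sketch below uses his example addresses; struct mapping and translate() are hypothetical, and the linear search only stands in for the TLB's associative lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation entry mapping [va, va + size) to [pa, pa + size). */
struct mapping {
    uint32_t va, pa, size;
};

/* Look up vaddr by virtual address only (a linear search standing in for
 * the TLB's associative match). Returns 0 and stores the physical address
 * in *pa_out, or returns -1 if no entry covers vaddr. */
static int translate(const struct mapping *m, size_t n, uint32_t vaddr,
                     uint32_t *pa_out)
{
    assert(pa_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (vaddr >= m[i].va && vaddr - m[i].va < m[i].size) {
            *pa_out = m[i].pa + (vaddr - m[i].va);
            return 0;
        }
    }
    return -1;
}
```

With {0xc0000000 to 0x00000000, 8Mbytes} and {0xd0000000 to 0x00000000, 4Kbytes}, both 0xc0000000 and 0xd0000000 translate to physical 0x00000000; what the 8xx rejects is two valid entries for the same virtual address.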
> Right - I'm talking about kernel virtual addresses: in this specific case,
> we can't have more than one mapping for the first page at KERNELBASE.
You can't do that in any case for anything, and I'm confused why you
keep mentioning this :-)
> So you do agree that pte's should not be created for the first
> 8MBytes if CONFIG_PIN_TLB is set? :)
NO. Just leave that code alone. I don't understand why you think
doing this will have any effect on the system operation. If you are
able to run a system without creating these tables, then the pinned
TLBs must be working. If pinned TLBs weren't working, the kernel
would crash.
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 4:39 ` Dan Malek
@ 2005-05-07 5:16 ` Dan Malek
2005-05-07 13:16 ` Marcelo Tosatti
2005-05-07 14:05 ` Marcelo Tosatti
2 siblings, 0 replies; 20+ messages in thread
From: Dan Malek @ 2005-05-07 5:16 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
The following patch is needed to properly wire the TLB
entries on the newer 8xx processors. I think it will work
on all of them with sufficient entries to allow the pinning.
Don't do this on an 823 or 850.
-- Dan
--- linux-2.6.11.5/arch/ppc/kernel/head_8xx.S 2005-03-19 01:34:56.000000000 -0500
+++ linux-2.6-tlbpin/arch/ppc/kernel/head_8xx.S 2005-05-07 00:57:32.000000000 -0400
@@ -663,7 +663,7 @@
tlbia /* Invalidate all TLB entries */
#ifdef CONFIG_PIN_TLB
lis r8, MI_RSV4I@h
- ori r8, r8, 0x1c00
+ ori r8, r8, 0x1f00
#else
li r8, 0
#endif
@@ -671,7 +671,7 @@
#ifdef CONFIG_PIN_TLB
lis r10, (MD_RSV4I | MD_RESETVAL)@h
- ori r10, r10, 0x1c00
+ ori r10, r10, 0x1f00
mr r8, r10
#else
lis r10, MD_RESETVAL@h
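The patch changes only the low bits ORed into the instruction and data MMU control register values. A small decoder makes the change visible, assuming MI_RSV4I is 0x08000000 and that the TLB index field is selected by MI_IDXMASK = 0x00001f00, as in the 8xx MMU header (worth checking against your tree). What the MMU does with the index after each TLB write is not modelled here, and ctr_tlb_index()/ctr_rsv4i() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the 8xx MMU header definitions. */
#define MI_RSV4I   0x08000000u  /* reserve 4 TLB entries */
#define MI_IDXMASK 0x00001f00u  /* TLB index field */

_Static_assert((MI_IDXMASK >> 8) == 0x1fu, "index field is 5 bits at bit 8");

/* Extract the TLB index field from a control register value. */
static unsigned ctr_tlb_index(uint32_t ctr)
{
    return (unsigned)((ctr & MI_IDXMASK) >> 8);
}

/* Non-zero if the RSV4I bit is set. */
static int ctr_rsv4i(uint32_t ctr)
{
    return (ctr & MI_RSV4I) != 0;
}
```

Decoded this way, the initial index moves from 28 to 31, i.e. from one end of the four entries the 8xx documentation describes as reserved by RSV4I (28 to 31) to the other. The same 0x1c00 to 0x1f00 change is applied to the data-side value in r10; this sketch assumes its index field uses the same mask.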
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 20:03 ` Marcelo Tosatti
2005-05-07 3:09 ` Dan Malek
@ 2005-05-07 5:27 ` Dan Malek
2005-05-07 5:55 ` Dan Malek
1 sibling, 1 reply; 20+ messages in thread
From: Dan Malek @ 2005-05-07 5:27 UTC
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
The last patch I just sent isn't quite sufficient. We still have
to fix this:
On May 6, 2005, at 4:03 PM, Marcelo Tosatti wrote:
> /* get the PTE for the bootpage */
> if (!get_pteptr(&init_mm, bootpage, &pte))
> panic("get_pteptr failed\n");
>
> /* and make it uncachable */
> pte_val(*pte) |= _PAGE_NO_CACHE;
> _tlbie(bootpage);
One of the things that was corrected in linuxppc-2.4 but never made
it forward: I did a late consistent_alloc() on the first call to
hostmem_alloc(). I'm looking for a similar solution in 2.6.
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 5:27 ` Dan Malek
@ 2005-05-07 5:55 ` Dan Malek
0 siblings, 0 replies; 20+ messages in thread
From: Dan Malek @ 2005-05-07 5:55 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
> On May 6, 2005, at 4:03 PM, Marcelo Tosatti wrote:
>
>> /* get the PTE for the bootpage */
>> if (!get_pteptr(&init_mm, bootpage, &pte))
>> panic("get_pteptr failed\n");
>>
>> /* and make it uncachable */
>> pte_val(*pte) |= _PAGE_NO_CACHE;
>> _tlbie(bootpage);
Can someone explain to me why this was necessary,
along with the weird hacks in the serial driver to
hostmem_alloc() if we are using the console and
dma_alloc_consistent() if we aren't?
This bootmem page stuff should not be necessary,
the cpm_reset() doesn't need to allocate the host
buffer, and it should be done the first time hostmem_alloc()
is called.
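Dan's proposal is a lazy-initialisation pattern: set up the host buffer on the first hostmem_alloc() call rather than in cpm_reset(). A userspace sketch follows. hostmem_alloc_lazy(), coherent_alloc_stub() and HOSTMEM_SIZE are hypothetical; calloc() stands in for the kernel's coherent allocator (dma_alloc_coherent() or consistent_alloc()), and the real hostmem_alloc() signature may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HOSTMEM_SIZE 4096u   /* hypothetical: one page, as in the current code */

static uintptr_t host_buffer, host_end;

/* Hypothetical stand-in for a coherent DMA allocator; plain calloc() here. */
static void *coherent_alloc_stub(size_t size)
{
    return calloc(1, size);
}

/* Bump allocator that creates its backing buffer on first use instead of
 * relying on cpm_reset() having done it at boot. 'align' must be a power
 * of two. Returns NULL if the buffer cannot be created or is exhausted. */
static void *hostmem_alloc_lazy(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (host_buffer == 0) {
        void *p = coherent_alloc_stub(HOSTMEM_SIZE);
        if (p == NULL)
            return NULL;
        host_buffer = (uintptr_t)p;
        host_end = host_buffer + HOSTMEM_SIZE;
    }

    uintptr_t start = (host_buffer + align - 1) & ~(uintptr_t)(align - 1);
    if (start > host_end || size > host_end - start)
        return NULL;
    host_buffer = start + size;
    return (void *)start;
}
```

Because the buffer comes from a coherent allocator, no PTE has to be patched by hand. The sketch ignores concurrency; a kernel version would need to serialise the first call.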
I don't have an 8xx handy. Can someone remove all of this:
/* get the PTE for the bootpage */
if (!get_pteptr(&init_mm, bootpage, &pte))
panic("get_pteptr failed\n");
/* and make it uncachable */
pte_val(*pte) |= _PAGE_NO_CACHE;
_tlbie(bootpage);
host_buffer = bootpage;
host_end = host_buffer + PAGE_SIZE;
from arch/ppc/8xx_io/commproc.c and let me know
if the system still works?
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 4:39 ` Dan Malek
2005-05-07 5:16 ` Dan Malek
@ 2005-05-07 13:16 ` Marcelo Tosatti
2005-05-07 20:02 ` Dan Malek
2005-05-07 14:05 ` Marcelo Tosatti
2 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-07 13:16 UTC
To: Dan Malek; +Cc: linux-ppc-embedded
> >So you do agree that pte's should not be created for the first
> >8MBytes if CONFIG_PIN_TLB is set? :)
>
> NO. Just leave that code alone. I don't understand why you think
> doing this will have any effect on the system operation.
>
> If you are able to run a system without creating these tables, then
> the pinned TLBs must be working. If pinned TLBs weren't working,
> the kernel would crash.
Not creating 4kb mappings for the first 8Mbytes of kernel
virtual addresses fixed the problem for me.
Break at first start_kernel instruction (0xc02284a4).
Dump TLB contents to "itlb-before".
[marcelo@dmt ~]$ grep SPR itlb-before | grep 816
SPR 816 : 0x10002080 268443776
SPR 816 : 0x10001080 268439680
SPR 816 : 0x0ff79080 267882624
SPR 816 : 0x0ff261c0 267542976
SPR 816 : 0x0ff521c0 267723200
SPR 816 : 0x100121c0 268509632
SPR 816 : 0x100011c0 268440000
SPR 816 : 0x0ffdd1c0 268292544
SPR 816 : 0x0ffdb1c0 268284352
SPR 816 : 0x0fef51c0 267342272
SPR 816 : 0x0fef91c0 267358656
SPR 816 : 0x0fe0b1c0 266383808
SPR 816 : 0x0fef71c0 267350464
SPR 816 : 0x0fef61c0 267346368
SPR 816 : 0x0ffee1c0 268362176
SPR 816 : 0x0ffdc1c0 268288448
SPR 816 : 0x0fef21c0 267329984
SPR 816 : 0x0fef11c0 267325888
SPR 816 : 0x0fe071c0 266367424
SPR 816 : 0x0ffc61c0 268198336
SPR 816 : 0x0fe0c1c0 266387904
SPR 816 : 0x0ffc51c0 268194240
SPR 816 : 0x0fe091c0 266375616
SPR 816 : 0x0ffea080 268345472
SPR 816 : 0x0ff20080 267518080
SPR 816 : 0x0ff81080 267915392
SPR 816 : 0x1001c080 268550272
SPR 816 : 0x10008080 268468352
SPR 816 : 0x100021e0 268444128
SPR 816 : 0x100241e0 268583392
SPR 816 : 0x100301e0 268632544
SPR 816 : 0xc0000e1f -1073738209 <----- VALID 8MB TLB ENTRY
"go"
BDI breaks at
BDI>i
Target state : debug mode
Debug entry cause : instruction breakpoint
Current PC : 0xc0228544
BDI>
0xc0228538 <start_kernel+148>: bl 0xc023107c <pidhash_init>
0xc022853c <start_kernel+152>: bl 0xc0230f1c <init_timers>
0xc0228540 <start_kernel+156>: bl 0xc0230cf4 <softirq_init>
0xc0228544 <start_kernel+160>: bl 0xc022ead0 <time_init>
0xc0228548 <start_kernel+164>: bl 0xc02354b0 <console_init>
0xc022854c <start_kernel+168>: lis r9,-16348
[marcelo@dmt ~]$ grep SPR itlb-2 | grep 816
SPR 816 : 0x10001100 268439808
SPR 816 : 0x0ffdd100 268292352
SPR 816 : 0x0ffdb100 268284160
SPR 816 : 0x0fef5100 267342080
SPR 816 : 0x0fef9100 267358464
SPR 816 : 0x0fe0b100 266383616
SPR 816 : 0x0fef7100 267350272
SPR 816 : 0x0fef6100 267346176
SPR 816 : 0x0ffee100 268361984
SPR 816 : 0x0ffdc100 268288256
SPR 816 : 0xc0038110 -1073512176 <---------
SPR 816 : 0xc0063110 -1073336048
SPR 816 : 0xc0024110 -1073594096
SPR 816 : 0xc0017110 -1073647344
SPR 816 : 0xc000e110 -1073684208
SPR 816 : 0xc0003110 -1073729264
SPR 816 : 0xc0002110 -1073733360
SPR 816 : 0xc000d110 -1073688304
SPR 816 : 0xc0004110 -1073725168
SPR 816 : 0xc0012110 -1073667824
SPR 816 : 0xc01a1110 -1072033520
SPR 816 : 0xc01a2110 -1072029424
SPR 816 : 0xc000a110 -1073700592
SPR 816 : 0xc001c110 -1073626864
SPR 816 : 0xc001b110 -1073630960 <---------
SPR 816 : 0x0ff26100 267542784
SPR 816 : 0x0ff52100
SPR 816 : 0x10012100 268509440
SPR 816 : 0x100021e0 268444128
SPR 816 : 0x100241e0 268583392
SPR 816 : 0x100301e0 268632544
SPR 816 : 0xc0000e1f -1073738209
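The sketch below compares the two dumps mechanically. It parses the "SPR 816 : 0x... <decimal>" lines and flags any value whose upper 20 bits fall in the first 8 Mbytes above 0xc0000000. It assumes, as the arrows above suggest, that the upper 20 bits carry the effective page number and that KERNELBASE is 0xc0000000. When the decimal column is present, it is simply the same 32 bits read as a signed integer. The function names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumptions for illustration only. */
#define ASSUMED_KERNELBASE 0xc0000000u
#define PINNED_BYTES       0x00800000u   /* 8 Mbytes */

/* Parse one "SPR 816 : 0x<hex> [<decimal>]" line from a BDI dump.
 * Returns 1 and stores the 32-bit value on success. Returns 0 on a
 * parse error or when the decimal column disagrees with the hex one. */
int parse_spr816_line(const char *line, uint32_t *value)
{
    unsigned int hex;
    int dec;
    int fields = sscanf(line, " SPR 816 : 0x%x %d", &hex, &dec);

    if (fields < 1)
        return 0;
    /* The decimal column is the signed reading of the same bits, e.g.
     * 0xc0000e1f -> -1073738209. One itlb-2 line omits it. */
    if (fields == 2 && (int32_t)hex != dec)
        return 0;
    *value = (uint32_t)hex;
    return 1;
}

/* Nonzero if the upper 20 bits of value lie in
 * [ASSUMED_KERNELBASE, ASSUMED_KERNELBASE + 8 Mbytes). */
int in_pinned_window(uint32_t value)
{
    uint32_t page = value & 0xfffff000u;
    /* Unsigned subtraction wraps for pages below the base, so a single
     * comparison checks both ends of the window. */
    return page - ASSUMED_KERNELBASE < PINNED_BYTES;
}
```

With the itlb-before lines as input, only the pinned entry 0xc0000e1f lands in the window. With the itlb-2 lines, the fifteen entries between the arrows land there too.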
(gdb) disassemble start_kernel
Dump of assembler code for function start_kernel:
0xc02284a4 <start_kernel+0>: lis r3,-16358
0xc02284a8 <start_kernel+4>: stwu r1,-32(r1)
0xc02284ac <start_kernel+8>: mflr r0
0xc02284b0 <start_kernel+12>: addi r3,r3,21832
0xc02284b4 <start_kernel+16>: stw r0,36(r1)
0xc02284b8 <start_kernel+20>: stmw r29,20(r1)
0xc02284bc <start_kernel+24>: bl 0xc0012130 <printk>
0xc02284c0 <start_kernel+28>: addi r3,r1,8
0xc02284c4 <start_kernel+32>: bl 0xc022ee28 <setup_arch>
0xc02284c8 <start_kernel+36>: bl 0xc0230548 <sched_init>
0xc02284cc <start_kernel+40>: bl 0xc02321e8 <build_all_zonelists>
0xc02284d0 <start_kernel+44>: bl 0xc02326f4 <page_alloc_init>
0xc02284d4 <start_kernel+48>: lis r4,-16348
0xc02284d8 <start_kernel+52>: lis r3,-16355
0xc02284dc <start_kernel+56>: addi r4,r4,-5804
0xc02284e0 <start_kernel+60>: addi r3,r3,-6768
0xc02284e4 <start_kernel+64>: bl 0xc0012130 <printk>
0xc02284e8 <start_kernel+68>: bl 0xc022842c <parse_early_param>
0xc02284ec <start_kernel+72>: lis r5,-16353
0xc02284f0 <start_kernel+76>: lis r6,-16353
0xc02284f4 <start_kernel+80>: addi r5,r5,4580
0xc02284f8 <start_kernel+84>: addi r6,r6,5060
0xc02284fc <start_kernel+88>: subf r6,r5,r6
0xc0228500 <start_kernel+92>: lis r0,-13108
0xc0228504 <start_kernel+96>: ori r0,r0,52429
0xc0228508 <start_kernel+100>: srawi r6,r6,2
0xc022850c <start_kernel+104>: mullw r6,r6,r0
0xc0228510 <start_kernel+108>: lwz r4,8(r1)
0xc0228514 <start_kernel+112>: lis r7,-16349
0xc0228518 <start_kernel+116>: lis r3,-16355
0xc022851c <start_kernel+120>: addi r7,r7,-32404
0xc0228520 <start_kernel+124>: addi r3,r3,-6740
0xc0228524 <start_kernel+128>: bl 0xc0024dac <parse_args>
0xc0228528 <start_kernel+132>: bl 0xc0231220 <sort_main_extable>
0xc022852c <start_kernel+136>: bl 0xc022eaa0 <trap_init>
0xc0228530 <start_kernel+140>: bl 0xc02311f0 <rcu_init>
0xc0228534 <start_kernel+144>: bl 0xc022eaa4 <init_IRQ>
0xc0228538 <start_kernel+148>: bl 0xc023107c <pidhash_init>
0xc022853c <start_kernel+152>: bl 0xc0230f1c <init_timers>
0xc0228540 <start_kernel+156>: bl 0xc0230cf4 <softirq_init>
0xc0228544 <start_kernel+160>: bl 0xc022ead0 <time_init>
0xc0228548 <start_kernel+164>: bl 0xc02354b0 <console_init>
0xc022854c <start_kernel+168>: lis r9,-16348
0xc0228550 <start_kernel+172>: lwz r3,-8180(r9)
0xc0228554 <start_kernel+176>: cmpwi r3,0
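The lis r0,-13108 / ori r0,r0,52429 / srawi / mullw sequence at start_kernel+92..+104 divides a byte difference without using a divide instruction. r0 ends up as 0xcccccccd, the multiplicative inverse of 5 modulo 2^32. As a result, (diff >> 2) * 0xcccccccd equals diff / 20 whenever diff is an exact multiple of 20. That would fit start_kernel() passing __stop___param - __start___param to parse_args(), assuming struct kernel_param is 20 bytes in this 32-bit build. Below is a sketch of the arithmetic. lis_ori() and div20_exact() are hypothetical names:

```c
#include <stdint.h>

/* lis r0,-13108 ; ori r0,r0,52429 -> r0 = 0xcccc << 16 | 0xcccd */
uint32_t lis_ori(int16_t hi, uint16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | lo;
}

/* 0xcccccccd * 5 = 0x400000001, which is 1 modulo 2^32, so multiplying
 * by it undoes a multiplication by 5 for exact multiples of 5. */
#define INV5_MOD_2_32 0xcccccccdu

/* Divide a byte count by 20 the way the listing does. First shift
 * right by 2 (srawi; for the non-negative differences here an
 * arithmetic and a logical shift agree). Then multiply by the inverse
 * of 5, keeping the low 32 bits (mullw). Exact only when bytes is a
 * multiple of 20. */
uint32_t div20_exact(uint32_t bytes)
{
    return (bytes >> 2) * INV5_MOD_2_32;
}
```

start_kernel+72..+84 builds r5 = 0xc01f11e4 and r6 = 0xc01f13c4, so the difference is 480 bytes. div20_exact(480) is 24, which would be the entry count passed to parse_args() in this build under the 20-byte assumption.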
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 4:39 ` Dan Malek
2005-05-07 5:16 ` Dan Malek
2005-05-07 13:16 ` Marcelo Tosatti
@ 2005-05-07 14:05 ` Marcelo Tosatti
2005-05-09 6:09 ` Pantelis Antoniou
2 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-07 14:05 UTC (permalink / raw)
To: Dan Malek; +Cc: linux-ppc-embedded
> NO. Just leave that code alone. I don't understand why you think
> doing this will have any effect on the system operation. If you are
> able to run a system without creating these tables, then the pinned
> TLBs must be working. If pinned TLBs weren't working, the kernel
> would crash.
I just booted a kernel with 4KB PTE mappings at KERNELBASE, and
the pinned TLB entry was not trashed.
So, I was talking nonsense. :)
The only remaining problem is DMA users who don't use the dma_alloc_coherent() API.
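The first message in this thread raises a specific concern. A driver that takes a page from __get_free_page() and then uncaches it by editing its PTE (as FEC's fec_uncache() does) relies on that PTE actually being used. Inside the 8 Mbytes covered by the pinned entry, the pinned translation may be used instead, and dma_alloc_coherent() avoids the issue. Below is a minimal sketch of the address check such a driver implicitly depends on. It assumes KERNELBASE is 0xc0000000, as in the dumps above, and a single pinned 8 Mbyte window. overlaps_pinned_window() is a hypothetical helper, not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration only. */
#define ASSUMED_KERNELBASE 0xc0000000u
#define PINNED_BYTES       0x00800000u   /* 8 Mbytes */

/* True if any byte of [vaddr, vaddr + len) lies in the pinned window,
 * where a per-page PTE change (e.g. marking a page uncached) could be
 * bypassed by the pinned mapping. */
bool overlaps_pinned_window(uint32_t vaddr, uint32_t len)
{
    const uint32_t win_start = ASSUMED_KERNELBASE;
    const uint32_t win_end = ASSUMED_KERNELBASE + PINNED_BYTES; /* exclusive */
    uint64_t end;

    if (len == 0)
        return false;
    /* Compute the end in 64 bits so buffers near 4 Gbytes don't wrap. */
    end = (uint64_t)vaddr + len;
    return vaddr < win_end && end > win_start;
}
```

A page returned at 0xc07ff000 would overlap the window, while one at 0xc0800000 would not.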
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-06 23:05 ` Marcelo Tosatti
2005-05-07 4:39 ` Dan Malek
@ 2005-05-07 14:59 ` Wolfgang Denk
1 sibling, 0 replies; 20+ messages in thread
From: Wolfgang Denk @ 2005-05-07 14:59 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
In message <20050506230523.GA15908@logos.cnet> you wrote:
>
> > The BDI can, but other software functions will walk the page
> > tables looking for PTE information.
>
> Do you have any practical example which you are certain is going
> to break?
I think the BDM4GDB BDM debugger depends on this, and maybe some
other tools, too.
> I don't remember any, and I don't think any software should be walking
> kernel pte's directly...
Maybe not regular software, but debug tools that live outside the
kernel.
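The point here is that tools outside the kernel, such as a BDM debugger, translate kernel addresses by walking the page tables themselves rather than asking the MMU. Below is a minimal model of such a walk. It assumes the usual 32-bit two-level layout with 4 Kbyte pages: 10 bits of first-level index, 10 bits of second-level index, and 12 bits of offset. The types and walk_model() are hypothetical and do not reproduce the exact 8xx PTE format:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 32-bit addresses, 4 Kbyte pages, two levels of
 * 1024 entries each. Illustration constants, not kernel headers. */
#define MODEL_PAGE_SHIFT  12
#define MODEL_PGDIR_SHIFT 22
#define MODEL_PTRS        1024u

typedef uint32_t model_pte;       /* hypothetical: physical page | flags */

struct model_pgd {                /* hypothetical first-level table */
    model_pte *tables[MODEL_PTRS];
};

/* Walk the model tables as an external debugger would. Bits 31..22
 * index the first level and bits 21..12 index the second. Returns the
 * PTE slot for vaddr, or NULL if no second-level table is present. */
model_pte *walk_model(const struct model_pgd *pgd, uint32_t vaddr)
{
    uint32_t pgd_idx = vaddr >> MODEL_PGDIR_SHIFT;
    uint32_t pte_idx = (vaddr >> MODEL_PAGE_SHIFT) & (MODEL_PTRS - 1);
    model_pte *table = pgd->tables[pgd_idx];

    if (table == NULL)
        return NULL;
    return &table[pte_idx];
}
```

If no PTEs were built for the region the pinned entry covers, this kind of walk would return NULL for an address in the first 8 Mbytes, even though the CPU can still reach that address through the pinned TLB entry.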
Best regards,
Wolfgang Denk
--
Software Engineering: Embedded and Realtime Systems, Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Testing can show the presence of bugs, but not their absence.
-- Edsger Dijkstra
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 20:02 ` Dan Malek
@ 2005-05-07 15:47 ` Marcelo Tosatti
0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2005-05-07 15:47 UTC (permalink / raw)
To: Dan Malek; +Cc: linux-ppc-embedded
On Sat, May 07, 2005 at 04:02:34PM -0400, Dan Malek wrote:
>
> On May 7, 2005, at 9:16 AM, Marcelo Tosatti wrote:
>
> >Not creating 4kb mappings for the first 8Mbytes of kernel
> >virtual addresses fixed the problem for me.
>
> Fixed what problem?
Page faults on the initial 8Mbytes of the kernel virtual map.
> In the TLB dump, you replaced the initial 8M entry with
> a bunch of 4K page entries, just as I would have expected
> to happen. Since it was able to run and load these, the
> complete PTE tables must have been created.
Right, that was a dump of a "problematic" kernel (i.e. one taking 4KB
page faults on the 8Mbyte pinned region).
> How did you "not create" the 4K mappings?
I told mapin_ram() to start at KERNELBASE + 8Mbytes.
But, as you said, that's not necessary.
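Starting mapin_ram() at KERNELBASE + 8Mbytes removes the first 2048 4-Kbyte PTEs (8 Mbytes / 4 Kbytes) from the mapping loop. Below is a userspace model of that effect only, not the kernel's mapin_ram(). ptes_created() and the 8 Mbyte constant are assumptions for illustration:

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE    4096u
#define MODEL_PINNED_BYTES 0x00800000u   /* 8 Mbytes, assumed */

/* Count the 4 Kbyte PTEs a mapin_ram()-style loop would create for
 * total_ram bytes of lowmem, optionally starting past the pinned
 * 8 Mbytes. A userspace model only: no page tables are touched. */
uint32_t ptes_created(uint32_t total_ram, int skip_pinned)
{
    uint32_t start = skip_pinned ? MODEL_PINNED_BYTES : 0u;
    uint32_t count = 0;

    for (uint32_t off = start; off < total_ram; off += MODEL_PAGE_SIZE)
        count++;   /* the real loop would set up one PTE here */
    return count;
}
```

For 32 Mbytes of lowmem, ptes_created(32u << 20, 0) is 8192 and ptes_created(32u << 20, 1) is 6144. The skipped entries are the ones Wolfgang notes that external debug tools may rely on.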
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 13:16 ` Marcelo Tosatti
@ 2005-05-07 20:02 ` Dan Malek
2005-05-07 15:47 ` Marcelo Tosatti
0 siblings, 1 reply; 20+ messages in thread
From: Dan Malek @ 2005-05-07 20:02 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
On May 7, 2005, at 9:16 AM, Marcelo Tosatti wrote:
> Not creating 4kb mappings for the first 8Mbytes of kernel
> virtual addresses fixed the problem for me.
Fixed what problem?
In the TLB dump, you replaced the initial 8M entry with
a bunch of 4K page entries, just as I would have expected
to happen. Since it was able to run and load these, the
complete PTE tables must have been created.
How did you "not create" the 4K mappings?
Thanks.
-- Dan
* Re: [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries
2005-05-07 14:05 ` Marcelo Tosatti
@ 2005-05-09 6:09 ` Pantelis Antoniou
0 siblings, 0 replies; 20+ messages in thread
From: Pantelis Antoniou @ 2005-05-09 6:09 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-ppc-embedded
Marcelo Tosatti wrote:
>>NO. Just leave that code alone. I don't understand why you think
>>doing this will have any effect on the system operation. If you are
>>able to run a system without creating these tables, then the pinned
>>TLBs must be working. If pinned TLBs weren't working, the kernel
>>would crash.
>
>
> I just booted a kernel with 4kb PTE mappings at KERNELBASE and
> the pinned TLB was not trashed.
>
> So, I was talking nonsense. :)
>
> The only remaining problem is DMA users who don't use the dma_alloc_coherent() API.
Perhaps I'm jumping in too late, but in my 8xx trees all my drivers
use the correct API. So this will not be a problem after we fix the
drivers :)
Regards
Pantelis
Thread overview: 20+ messages
2005-05-05 17:20 [PATCH] 8xx: fix usage of pinned 8Mbyte TLB entries Marcelo Tosatti
2005-05-06 13:04 ` Jason McMullan
2005-05-06 11:39 ` Marcelo Tosatti
2005-05-06 16:43 ` Dan Malek
2005-05-06 13:38 ` Marcelo Tosatti
2005-05-06 22:49 ` Dan Malek
2005-05-06 20:03 ` Marcelo Tosatti
2005-05-07 3:09 ` Dan Malek
2005-05-06 23:05 ` Marcelo Tosatti
2005-05-07 4:39 ` Dan Malek
2005-05-07 5:16 ` Dan Malek
2005-05-07 13:16 ` Marcelo Tosatti
2005-05-07 20:02 ` Dan Malek
2005-05-07 15:47 ` Marcelo Tosatti
2005-05-07 14:05 ` Marcelo Tosatti
2005-05-09 6:09 ` Pantelis Antoniou
2005-05-07 14:59 ` Wolfgang Denk
2005-05-07 5:27 ` Dan Malek
2005-05-07 5:55 ` Dan Malek
2005-05-06 23:10 ` Dan Malek