* More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-03 7:53 UTC
To: linuxppc-embedded; +Cc: Paul Mackerras

To narrow down the cause of the problem I tried copying a 1M file of
random data through the loopback interface (using netcat). The file
was corrupted: specifically, certain sections became zero in the
destination copy. Oddly enough the zeroed chunks were always 2684
bytes long and the next byte after the affected region was always at
a multiple of 32k (e.g. bytes 0xc7584-0xc8000 zeroed).

--
David Gibson                    | For every complex problem there is a
david@gibson.dropbear.id.au     | solution which is simple, neat and
                                | wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
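The corruption pattern described above can be located mechanically by comparing the source file with the received copy. The following is only a minimal sketch in Python: the helper name `zeroed_runs` is hypothetical and not taken from the thread, and it assumes both files fit in memory. It reports each run of zero bytes in the destination that does not match the source.

```python
def zeroed_runs(src: bytes, dst: bytes):
    """Return (start, end) offsets of zero-filled runs in dst that differ from src.

    'end' is exclusive, i.e. the offset of the first byte after the run.
    A zero byte that is genuinely present in src right next to a corrupted
    region will be counted as part of the run, so lengths can be off by a
    byte or two with random data.
    """
    runs = []
    i, n = 0, min(len(src), len(dst))
    while i < n:
        if dst[i] != 0:
            i += 1
            continue
        start = i
        while i < n and dst[i] == 0:
            i += 1
        # Only report runs where the source actually held other data.
        if src[start:i] != dst[start:i]:
            runs.append((start, i))
    return runs


if __name__ == "__main__":
    # Synthetic stand-in for the 1M test file (no zero bytes in the source).
    src = bytes((i * 7) % 255 + 1 for i in range(1 << 20))
    dst = bytearray(src)
    dst[0xC7584:0xC8000] = bytes(0xC8000 - 0xC7584)  # the example from the report
    for start, end in zeroed_runs(src, bytes(dst)):
        print(f"{start:#x}-{end:#x}: {end - start} bytes, "
              f"ends on 32k boundary: {end % 0x8000 == 0}")
```

For the example range in the report, 0xc8000 - 0xc7584 = 0xa7c = 2684 bytes, and 0xc8000 is 25 * 32k, which matches the numbers quoted above.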
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Paul Mackerras @ 2002-06-04 12:54 UTC
To: David Gibson; +Cc: linuxppc-embedded

Looks like Ben and I have found the problem; Ben added an isync and a
sync to set_context() after setting the PID register and that seems to
have fixed it. It makes sense, as isync invalidates the shadow DTLB
and ITLB. (The sync may be unnecessary.)

Paul.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-04 14:39 UTC
To: Paul Mackerras; +Cc: linuxppc-embedded, Benjamin Herrenschmidt

On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
> Looks like Ben and I have found the problem; Ben added an isync and a
> sync to set_context() after setting the PID register and that seems to
> have fixed it. It makes sense, as isync invalidates the shadow DTLB
> and ITLB. (The sync may be unnecessary.)

Aha, that makes some sense. I hadn't thought of this, partly because
I was assuming that the Shadow [ID]TLB entries would act more-or-less
like full UTLB entries, and so obey the PID etc. However on
re-examination the manual says that an isync (or rfi) should be
performed after any change to translations - including tlbwe, of
course, but also changes to PID, ZPR and MSR.

Presumably without large pages the context switch itself was (nearly
always) hitting enough kernel pages to flush the shadow TLBs (it would
only need 4 ITLB and 4 DTLB misses).

We never change ZPR after its initial setup, but we should check for
any problems with changing MSR. Usually this won't be an issue since
mostly we only change RI and DI with an rfi, which implicitly flushes
the shadow TLBs. However there might be one or two spots (critical
exception exit in 2.5?) where we use mtmsr and may need an explicit
isync.

--
David Gibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Matt Porter @ 2002-06-04 16:57 UTC
To: Paul Mackerras; +Cc: David Gibson, linuxppc-embedded

On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
> Looks like Ben and I have found the problem; Ben added an isync and a
> sync to set_context() after setting the PID register and that seems to
> have fixed it. It makes sense, as isync invalidates the shadow DTLB
> and ITLB. (The sync may be unnecessary.)

Makes sense, I was telling some folks at work that it had to be a
40x specific code problem. The 440 has had an isync in set_context()
and doesn't see this problem (and by default uses pinned TLBs).

Changing the PID is a context changing event that requires a context
synchronization. The sync shouldn't be necessary per the UM, and that
seems to be true in practice from my 440 testing.

Regards,
--
Matt Porter
porter@cox.net
This is Linux Country. On a quiet night, you can hear Windows reboot.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-05 0:02 UTC
To: Matt Porter; +Cc: Paul Mackerras, linuxppc-embedded

On Tue, Jun 04, 2002 at 09:57:20AM -0700, Matt Porter wrote:
> On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
> >
> > Looks like Ben and I have found the problem; Ben added an isync and a
> > sync to set_context() after setting the PID register and that seems to
> > have fixed it. It makes sense, as isync invalidates the shadow DTLB
> > and ITLB. (The sync may be unnecessary.)
>
> Makes sense, I was telling some folks at work that it had to be a
> 40x specific code problem. The 440 has had an isync in set_context()
> and doesn't see this problem (and by default uses pinned TLBs).

Heh, well not only uses them by default, but must use them since
translation is always on.

> Changing the PID is a context changing event that requires a context
> synchronization. The sync shouldn't be necessary per the UM and
> seems to be true in practice from my 440 testing.

--
David Gibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Matt Porter @ 2002-06-05 0:34 UTC
To: david, Paul Mackerras, linuxppc-embedded

On Wed, Jun 05, 2002 at 10:02:39AM +1000, David Gibson wrote:
> On Tue, Jun 04, 2002 at 09:57:20AM -0700, Matt Porter wrote:
> >
> > On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
> > >
> > > Looks like Ben and I have found the problem; Ben added an isync and a
> > > sync to set_context() after setting the PID register and that seems to
> > > have fixed it. It makes sense, as isync invalidates the shadow DTLB
> > > and ITLB. (The sync may be unnecessary.)
> >
> > Makes sense, I was telling some folks at work that it had to be a
> > 40x specific code problem. The 440 has had an isync in set_context()
> > and doesn't see this problem (and by default uses pinned TLBs).
>
> Heh, well not only uses them by default, but must use them since
> translation is always on.

Heh, I should have said, "and by design, must use . . .". ;)

Regards,
--
Matt Porter
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Dan Malek @ 2002-06-04 17:04 UTC
To: Paul Mackerras; +Cc: David Gibson, linuxppc-embedded

Paul Mackerras wrote:
> Looks like Ben and I have found the problem;

Cool. I know this works OK on 8xx, I just haven't finished a working
tlb miss handler that will work regardless of the page size.

Thanks.

	-- Dan
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Benjamin Herrenschmidt @ 2002-06-04 16:43 UTC
To: Dan Malek, Paul Mackerras; +Cc: David Gibson, linuxppc-embedded

>> Looks like Ben and I have found the problem;
>
>Cool. I know this works OK on 8xx, I just haven't finished a working
>tlb miss handler that will work regardless of the page size.

From my understanding, it seems the problem on 4xx is that the
shadow TLBs aren't keeping the PID. Thus the following scenario
would break (entirely in kernel, no rfi, no interrupt):

 - copy_tofrom_user
 - context switch
 - copy_tofrom_user

In that case, the PID is changed, but stale DTLB entries are still
around, thus screwing up the second copy_tofrom_user.

The isync;sync I added fixes it by clearing the shadow DTLB.
I haven't yet tested without the sync; the 405 doc is unclear about
which instructions flush the shadow DTLB, unlike for the shadow
ITLB. The isync may be enough.

Ben.
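The failure scenario above can be illustrated with a toy model. This is a sketch under loud assumptions, not a description of the real 405 hardware: the class `ToyMMU` and all its names are hypothetical, the shadow DTLB is modelled as an untagged cache with unlimited capacity, and the isync is modelled simply as clearing that cache. It shows only why a PID change without a shadow-TLB flush lets a stale translation from the previous context leak into the next one.

```python
class ToyMMU:
    """Toy model: a PID-tagged main TLB fronted by an untagged shadow DTLB."""

    def __init__(self):
        self.pid = 0
        self.utlb = {}         # (pid, virtual page) -> physical page
        self.shadow_dtlb = {}  # virtual page -> physical page (no PID tag)

    def map(self, pid, vpage, ppage):
        """Install a translation for one context in the main TLB."""
        self.utlb[(pid, vpage)] = ppage

    def translate(self, vpage):
        """Translate a data access, preferring the shadow DTLB."""
        if vpage in self.shadow_dtlb:
            return self.shadow_dtlb[vpage]      # PID is never checked here
        ppage = self.utlb[(self.pid, vpage)]
        self.shadow_dtlb[vpage] = ppage
        return ppage

    def set_context(self, pid, flush_shadow):
        """Change the PID; flush_shadow stands in for the isync."""
        self.pid = pid
        if flush_shadow:
            self.shadow_dtlb.clear()


if __name__ == "__main__":
    mmu = ToyMMU()
    mmu.map(1, 0x10000, 0xA000)   # process 1's user buffer
    mmu.map(2, 0x10000, 0xB000)   # process 2 uses the same virtual page

    mmu.set_context(1, flush_shadow=True)
    print(hex(mmu.translate(0x10000)))   # first copy_tofrom_user, process 1

    mmu.set_context(2, flush_shadow=False)
    print(hex(mmu.translate(0x10000)))   # second copy, still process 1's page

    mmu.set_context(2, flush_shadow=True)
    print(hex(mmu.translate(0x10000)))   # with the flush, process 2's page
```

In the model, the second access without the flush returns process 1's physical page even though the PID now names process 2, which is the kind of cross-context data access that would corrupt a copy to or from user space.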
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-05 3:22 UTC
To: Benjamin Herrenschmidt; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded

On Tue, Jun 04, 2002 at 06:43:44PM +0200, Benjamin Herrenschmidt wrote:
> >> Looks like Ben and I have found the problem;
> >
> >Cool. I know this works OK on 8xx, I just haven't finished a working
> >tlb miss handler that will work regardless of the page size.
>
> From my understanding, it seems the problem on 4xx is that the
> shadow TLBs aren't keeping the PID. Thus the following scenario
> would break (entirely in kernel, no rfi, no interrupt):
>
>  - copy_tofrom_user
>  - context switch
>  - copy_tofrom_user

That situation looks consistent with the sorts of corruption we were
seeing.

> In that case, the PID is changed, but stale DTLB entries are still
> around, thus screwing up the second copy_tofrom_user.
>
> The isync;sync I added fixes it by clearing the shadow DTLB.
> I haven't yet tested without the sync, the 405 doc is unclear about
> what instruction flush the shadow DTLB, unlike it does for the shadow
> ITLB. The isync may be enough.

I've committed a patch to use 'sync' before changing the PID (to flush
any loads/stores through the MMU before we change the context) and
'isync' afterwards to flush the shadow TLBs. I'm guessing that isync
flushes both shadow TLBs, not just the ITLB, and that the missing
information is a documentation error. I've sent some email to the IBM
PPC support people to check.

--
David Gibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: benh @ 2002-06-04 19:31 UTC
To: David Gibson; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded

>I've committed a patch to use 'sync' before changing the PID (to flush
>any loads/stores through the MMU before we change the context) and
>'isync' afterwards to flush the shadow TLBs. I'm guessing that isync
>flushes both shadow TLBs, not just the ITLB, and that the missing
>information is a documentation error. I've sent some email to the IBM
>PPC support people to check.

Sounds good. In my local tree, I also replaced the icbi's in
flush_dcache_icache & flush_instruction_cache with iccci's to
make sure we don't have stale aliases. I haven't yet checked
glibc for this though.

Ben.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-06 1:42 UTC
To: benh; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded

On Tue, Jun 04, 2002 at 09:31:30PM +0200, Benjamin Herrenschmidt wrote:
> >I've committed a patch to use 'sync' before changing the PID (to flush
> >any loads/stores through the MMU before we change the context) and
> >'isync' afterwards to flush the shadow TLBs. I'm guessing that isync
> >flushes both shadow TLBs, not just the ITLB, and that the missing
> >information is a documentation error. I've sent some email to the IBM
> >PPC support people to check.
>
> Sounds good. In my local tree, I also replaced the icbi's in
> flush_dcache_icache & flush_instruction_cache with iccci's to
> make sure we don't have stale aliases. I haven't yet checked
> glibc for this though.

I heard back from the PPCSUPP people, and apparently isync (or any
context synchronising instruction) does the right thing and flushes
the shadow DTLB.

flush_instruction_cache() is already an iccci on 4xx (iccci flushes
the entire ICU). flush_dcache_icache() should be fixed though. We
could either replace the entire icache flushing loop with a single
iccci, or we could replace each icbi with two icbis, on the address
and the address XORed with 0x00001000 (which is the only possible
alias with 4kB pages).

--
David Gibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Benjamin Herrenschmidt @ 2002-06-05 22:15 UTC
To: David Gibson; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded

>flush_instruction_cache() is already an iccci on 4xx (iccci flushes
>the entire ICU). flush_dcache_icache() should be fixed though. We
>could either replace the entire icache flushing loop with a single
>iccci, or we could replace each icbi with two icbis, on the address
>and the address XORed with 0x00001000 (which is the only possible
>alias with 4kB pages).

Provided that address ^ 0x1000 is mapped or we do that with
translation off (but then we must get to the physical address
of the first page).

Looks simpler to do an iccci.

Ben.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-06 7:52 UTC
To: Benjamin Herrenschmidt; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded

On Thu, Jun 06, 2002 at 12:15:42AM +0200, Benjamin Herrenschmidt wrote:
> >flush_instruction_cache() is already an iccci on 4xx (iccci flushes
> >the entire ICU). flush_dcache_icache() should be fixed though. We
> >could either replace the entire icache flushing loop with a single
> >iccci, or we could replace each icbi with two icbis, on the address
> >and the address XORed with 0x00001000 (which is the only possible
> >alias with 4kB pages).
>
> Provided that address ^ 0x1000 is mapped or we do that with
> translation off (but then we must get to the physical address
> of the first page).

Blah. Good point.

> Looks simpler to do an iccci.

Agreed.

--
David Gibson
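The two-icbi alternative discussed above comes down to simple address arithmetic. The sketch below only enumerates the addresses such a loop would have to touch; it does not flush anything. The names `alias_of` and `icbi_targets` are hypothetical, the 32-byte line size is an assumption not stated in the thread, and the single alias bit 0x1000 is taken from the message above (presumably one cache index bit lies just above the 4kB page offset, though the thread does not spell this out).

```python
ALIAS_BIT = 0x1000   # from the thread: the only possible alias with 4kB pages
LINE_SIZE = 32       # assumption: 32-byte instruction cache lines


def alias_of(addr: int) -> int:
    """Return the one other address that may hold an alias of addr's line."""
    return addr ^ ALIAS_BIT


def icbi_targets(start: int, length: int):
    """List every address a per-line flush of [start, start + length)
    would need to icbi: each cache line plus its alias."""
    first = start & ~(LINE_SIZE - 1)      # round down to a line boundary
    targets = []
    for line in range(first, start + length, LINE_SIZE):
        targets.append(line)
        targets.append(alias_of(line))
    return targets


if __name__ == "__main__":
    for addr in icbi_targets(0x10000010, 0x40):
        print(hex(addr))
```

Every alias address lies in the neighbouring 4kB page, which is why the objection above applies: that page may not be mapped, so the loop would either fault or need to run with translation off. That is the cost the single iccci avoids.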