* More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-03 7:53 UTC
To: linuxppc-embedded; +Cc: Paul Mackerras
To narrow down the cause of the problem I tried copying a 1M file of
random data through the loopback interface (using netcat).
The file was corrupted: certain sections became zero in the
destination copy. Oddly enough, the zeroed chunks were always 2684
bytes long, and the first byte after each zeroed region fell at an
offset that was a multiple of 32k (e.g. bytes 0xc7584-0xc8000 zeroed).
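The numbers in that example can be checked with a few lines of Python. This is only a sketch: it assumes "32k" means 32768 bytes and reads the quoted range as the half-open interval [0xc7584, 0xc8000); the function name is made up for illustration.

```python
def describe_zeroed_region(start, end, boundary=32 * 1024):
    """Return (length, ends_on_boundary) for the half-open byte range [start, end)."""
    length = end - start
    return length, end % boundary == 0


length, aligned = describe_zeroed_region(0xC7584, 0xC8000)
print(length, aligned)
```

Both reported properties hold for the quoted range: the length comes out as 2684 bytes and the end offset is 32k-aligned.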
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Paul Mackerras @ 2002-06-04 12:54 UTC
To: David Gibson; +Cc: linuxppc-embedded
Looks like Ben and I have found the problem; Ben added an isync and a
sync to set_context() after setting the PID register and that seems to
have fixed it. It makes sense, as isync invalidates the shadow DTLB
and ITLB. (The sync may be unnecessary.)
Paul.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-04 14:39 UTC
To: Paul Mackerras; +Cc: linuxppc-embedded, Benjamin Herrenschmidt
On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
>
> Looks like Ben and I have found the problem; Ben added an isync and a
> sync to set_context() after setting the PID register and that seems to
> have fixed it. It makes sense, as isync invalidates the shadow DTLB
> and ITLB. (The sync may be unnecessary.)
Aha, that makes some sense. I hadn't thought of this, partly because
I was assuming that the Shadow [ID]TLB entries would act more-or-less
like full UTLB entries, and so obey the PID etc.
However on re-examination the manual says that an isync (or rfi)
should be performed after any change to translations - including
tlbwe, of course, but also changes to PID, ZPR and MSR. Presumably
without large pages the context switch itself was (nearly always)
hitting enough kernel pages to flush the shadow TLBs (it would only
need 4 ITLB and 4 DTLB misses).
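The eviction argument can be illustrated with a toy model. This is only a sketch: it assumes a 4-entry shadow TLB with first-in-first-out replacement (the count of 4 misses comes from the paragraph above, but the replacement policy is an assumption), and the class and function names are invented for illustration, not taken from the kernel.

```python
from collections import deque


class ShadowTLB:
    """Toy model: a tiny translation cache whose entries carry no PID tag."""

    def __init__(self, entries=4):
        # deque(maxlen=n) drops the oldest entry when a new one is appended
        # to a full deque, which gives FIFO replacement (an assumption).
        self.pages = deque(maxlen=entries)

    def access(self, page):
        """Touch a page; return True on a hit, False on a miss (which refills)."""
        if page in self.pages:
            return True
        self.pages.append(page)
        return False


def stale_after(tlb, stale_page, kernel_pages):
    """Report whether stale_page is still cached after touching kernel_pages."""
    tlb.access(stale_page)
    for page in kernel_pages:
        tlb.access(page)
    return stale_page in tlb.pages
```

In this model, touching four distinct kernel pages, as in `stale_after(ShadowTLB(), 0x10, [1, 2, 3, 4])`, pushes the stale entry out, while touching one pinned large page four times, as in `stale_after(ShadowTLB(), 0x10, [1, 1, 1, 1])`, leaves it in place.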
We never change ZPR after its initial setup, but we should check for
any problems with changing MSR. Usually this won't be an issue since
mostly we only change RI and DI with an rfi, which implicitly flushes
the shadow TLBs. However there might be one or two spots (critical
exception exit in 2.5?) where we use mtmsr and may need an explicit
isync.
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Benjamin Herrenschmidt @ 2002-06-04 16:43 UTC
To: Dan Malek, Paul Mackerras; +Cc: David Gibson, linuxppc-embedded
>> Looks like Ben and I have found the problem;
>
>Cool. I know this works OK on 8xx, I just haven't finished a working
>tlb miss handler that will work regardless of the page size.
From my understanding, it seems the problem on 4xx is that the
shadow TLBs aren't keeping the PID. Thus the following scenario
would break (entirely in kernel, no rfi, no interrupt):
- copy_tofrom_user
- context switch
- copy_tofrom_user
In that case, the PID is changed, but stale DTLB entries are still
around, thus screwing up the second copy_tofrom_user.
The isync;sync I added fixes it by clearing the shadow DTLB.
I haven't yet tested without the sync; the 405 doc is unclear about
which instructions flush the shadow DTLB, although it does say for the
shadow ITLB. The isync may be enough.
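The scenario can be sketched as a toy model. This is an illustration under stated assumptions, not the 405 hardware or the kernel code: the main TLB is modelled as a dictionary keyed by (PID, virtual page), the shadow DTLB as a dictionary keyed by virtual page alone, and the isync as a complete clear of the shadow entries. All names and page numbers are hypothetical.

```python
class Mmu:
    """Toy MMU: a PID-tagged main TLB plus a shadow DTLB that ignores the PID."""

    def __init__(self, utlb):
        self.utlb = utlb      # {(pid, virtual_page): physical_page}
        self.shadow = {}      # {virtual_page: physical_page}, no PID tag
        self.pid = 0

    def translate(self, vpage):
        # A shadow entry is used without checking which PID installed it.
        if vpage not in self.shadow:
            self.shadow[vpage] = self.utlb[(self.pid, vpage)]
        return self.shadow[vpage]

    def set_context(self, pid, isync=True):
        self.pid = pid
        if isync:
            # Stand-in for the isync: discard every shadow entry.
            self.shadow.clear()


def second_copy_target(isync):
    """Run copy, context switch, copy; return the page the second copy hits."""
    mmu = Mmu({(1, 0x10): 0xA0, (2, 0x10): 0xB0})
    mmu.set_context(1)
    mmu.translate(0x10)               # first copy_tofrom_user, process 1
    mmu.set_context(2, isync=isync)   # context switch
    return mmu.translate(0x10)        # second copy_tofrom_user, process 2
```

Without the flush, `second_copy_target(False)` returns process 1's physical page (0xA0) even though process 2 is running; with it, `second_copy_target(True)` returns process 2's page (0xB0).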
Ben.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Matt Porter @ 2002-06-04 16:57 UTC
To: Paul Mackerras; +Cc: David Gibson, linuxppc-embedded
On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
>
> Looks like Ben and I have found the problem; Ben added an isync and a
> sync to set_context() after setting the PID register and that seems to
> have fixed it. It makes sense, as isync invalidates the shadow DTLB
> and ITLB. (The sync may be unnecessary.)
Makes sense. I was telling some folks at work that it had to be a
40x-specific code problem. The 440 has had an isync in set_context()
and doesn't see this problem (and by default uses pinned TLBs).
Changing the PID is a context-changing event that requires a context
synchronization. The sync shouldn't be necessary per the UM, and that
seems to hold in practice in my 440 testing.
Regards,
--
Matt Porter
porter@cox.net
This is Linux Country. On a quiet night, you can hear Windows reboot.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Dan Malek @ 2002-06-04 17:04 UTC
To: Paul Mackerras; +Cc: David Gibson, linuxppc-embedded
Paul Mackerras wrote:
> Looks like Ben and I have found the problem;
Cool. I know this works OK on 8xx, I just haven't finished a working
tlb miss handler that will work regardless of the page size.
Thanks.
-- Dan
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: benh @ 2002-06-04 19:31 UTC
To: David Gibson; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded
>I've committed a patch to use 'sync' before changing the PID (to flush
>any loads/stores through the MMU before we change the context) and
>'isync' afterwards to flush the shadow TLBs. I'm guessing that isync
>flushes both shadow TLBs, not just the ITLB, and that the missing
>information is a documentation error. I've sent some email to the IBM
>PPC support people to check.
Sounds good. In my local tree, I also replaced the icbi's in
flush_dcache_icache & flush_instruction_cache with iccci's to
make sure we don't have stale aliases. I haven't yet checked
glibc for this though.
Ben.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-05 0:02 UTC
To: Matt Porter; +Cc: Paul Mackerras, linuxppc-embedded
On Tue, Jun 04, 2002 at 09:57:20AM -0700, Matt Porter wrote:
>
> On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
> >
> > Looks like Ben and I have found the problem; Ben added an isync and a
> > sync to set_context() after setting the PID register and that seems to
> > have fixed it. It makes sense, as isync invalidates the shadow DTLB
> > and ITLB. (The sync may be unnecessary.)
>
> Makes sense, I was telling some folks at work that it had to be a
> 40x specific code problem. The 440 has had an isync in set_context()
> and doesn't see this problem (and by default uses pinned TLBs).
Heh, well not only uses them by default, but must use them since
translation is always on.
> Changing the PID is a context changing event that requires a context
> synchronization. The sync shouldn't be necessary per the UM and
> seems to be true in practice from my 440 testing.
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Matt Porter @ 2002-06-05 0:34 UTC
To: david, Paul Mackerras, linuxppc-embedded
On Wed, Jun 05, 2002 at 10:02:39AM +1000, David Gibson wrote:
>
> On Tue, Jun 04, 2002 at 09:57:20AM -0700, Matt Porter wrote:
> >
> > On Tue, Jun 04, 2002 at 10:54:51PM +1000, Paul Mackerras wrote:
> > >
> > > Looks like Ben and I have found the problem; Ben added an isync and a
> > > sync to set_context() after setting the PID register and that seems to
> > > have fixed it. It makes sense, as isync invalidates the shadow DTLB
> > > and ITLB. (The sync may be unnecessary.)
> >
> > Makes sense, I was telling some folks at work that it had to be a
> > 40x specific code problem. The 440 has had an isync in set_context()
> > and doesn't see this problem (and by default uses pinned TLBs).
>
> Heh, well not only uses them by default, but must use them since
> translation is always on.
Heh, I should have said, "and by design, must use . . .". ;)
Regards,
--
Matt Porter
porter@cox.net
This is Linux Country. On a quiet night, you can hear Windows reboot.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-05 3:22 UTC
To: Benjamin Herrenschmidt; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded
On Tue, Jun 04, 2002 at 06:43:44PM +0200, Benjamin Herrenschmidt wrote:
>
> >> Looks like Ben and I have found the problem;
> >
> >Cool. I know this works OK on 8xx, I just haven't finished a working
> >tlb miss handler that will work regardless of the page size.
>
> From my understanding, it seems the problem on 4xx is that the
> shadow TLBs aren't keeping the PID. Thus the following scenario
> would break (entirely in kernel, no rfi, no interrupt) :
>
> - copy_tofrom_user
> - context switch
> - copy_tofrom_user
That situation looks consistent with the sorts of corruption we were
seeing.
> In that case, the PID is changed, but stale DTLB entries are still
> around, thus screwing up the second copy_tofrom_user.
>
> The isync;sync I added fixes it by clearing the shadow DTLB.
> I haven't yet tested without the sync; the 405 doc is unclear about
> which instructions flush the shadow DTLB, although it does say for the
> shadow ITLB. The isync may be enough.
I've committed a patch to use 'sync' before changing the PID (to flush
any loads/stores through the MMU before we change the context) and
'isync' afterwards to flush the shadow TLBs. I'm guessing that isync
flushes both shadow TLBs, not just the ITLB, and that the missing
information is a documentation error. I've sent some email to the IBM
PPC support people to check.
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: Benjamin Herrenschmidt @ 2002-06-05 22:15 UTC
To: David Gibson; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded
>flush_instruction_cache() is already an iccci on 4xx (iccci flushes
>the entire ICU). flush_dcache_icache() should be fixed though. We
>could either replace the entire icache flushing loop with a single
>iccci, or we could replace each icbi with two icbis, on the address
>and the address XORed with 0x00001000 (which is the only possible
>alias with 4kb pages).
Provided that address ^ 0x1000 is mapped or we do that with
translation off (but then we must get to the physical address
of the first page).
Looks simpler to do an iccci.
Ben.
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-06 1:42 UTC
To: benh; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded
On Tue, Jun 04, 2002 at 09:31:30PM +0200, Benjamin Herrenschmidt wrote:
>
> >I've committed a patch to use 'sync' before changing the PID (to flush
> >any loads/stores through the MMU before we change the context) and
> >'isync' afterwards to flush the shadow TLBs. I'm guessing that isync
> >flushes both shadow TLBs, not just the ITLB, and that the missing
> >information is a documentation error. I've sent some email to the IBM
> >PPC support people to check.
>
> Sounds good. In my local tree, I also replaced the icbi's in
> flush_dcache_icache & flush_instruction_cache with iccci's to
> make sure we don't have stale aliases. I haven't yet checked
> glibc for this though.
I heard back from the PPCSUPP people, and apparently isync (or any
context synchronising instruction) does the right thing and flushes the
shadow DTLB.
flush_instruction_cache() is already an iccci on 4xx (iccci flushes
the entire ICU). flush_dcache_icache() should be fixed though. We
could either replace the entire icache flushing loop with a single
iccci, or we could replace each icbi with two icbis, on the address
and the address XORed with 0x00001000 (which is the only possible
alias with 4kb pages).
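The second option can be sketched as an address computation. This is only an illustration: it assumes a 32-byte instruction cache line (the thread does not state the line size), and the function name is hypothetical. It lists the addresses that would each need an icbi; it does not issue any cache instructions.

```python
ALIAS_BIT = 0x00001000  # the XOR mask given above for 4kb pages


def icbi_addresses(start, stop, line_size=32):
    """List every address to invalidate for [start, stop), each with its alias."""
    first_line = start & ~(line_size - 1)   # round down to a line boundary
    addresses = []
    for addr in range(first_line, stop, line_size):
        addresses.append(addr)              # the line itself
        addresses.append(addr ^ ALIAS_BIT)  # its only possible alias
    return addresses
```

For example, `icbi_addresses(0x2000, 0x2040)` yields 0x2000, 0x3000, 0x2020 and 0x3020. Each aliased address would also have to be mapped (or the flush done with translation off), which is the objection raised elsewhere in the thread and the reason a single iccci looks simpler.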
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson
* Re: More details on the telnet with CONFIG_PIN_TLB problems
From: David Gibson @ 2002-06-06 7:52 UTC
To: Benjamin Herrenschmidt; +Cc: Dan Malek, Paul Mackerras, linuxppc-embedded
On Thu, Jun 06, 2002 at 12:15:42AM +0200, Benjamin Herrenschmidt wrote:
>
> >flush_instruction_cache() is already an iccci on 4xx (iccci flushes
> >the entire ICU). flush_dcache_icache() should be fixed though. We
> >could either replace the entire icache flushing loop with a single
> >iccci, or we could replace each icbi with two icbis, on the address
> >and the address XORed with 0x00001000 (which is the only possible
> >alias with 4kb pages).
>
> Provided that address ^ 0x1000 is mapped or we do that with
> translation off (but then we must get to the physical address
> of the first page).
Blah. Good point.
> Looks simpler to do an iccci.
Agreed.
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson