* CONFIG_PIN_TLB and telnet problems
@ 2002-05-31 7:15 David Gibson
2002-05-31 7:27 ` David Gibson
2002-06-04 0:54 ` Dan Malek
0 siblings, 2 replies; 6+ messages in thread
From: David Gibson @ 2002-05-31 7:15 UTC
To: linuxppc-embedded
Ok, I think I've managed to reproduce the telnet problems with
CONFIG_PIN_TLB that Tom Rini and others have reported. I encountered
problems telnetting to localhost on a 16MB EP405 (PVR 40110145) and on
a 32MB Walnut (PVR 401100C4). I was unable to reproduce the problem
on any of the other 405GP machines I have, each of which has at least
64MB of RAM.
Interestingly, if I alter the CONFIG_PIN_TLB code so that it only pins
one entry (i.e. maps 16MB of RAM instead of 32MB) I can still
reproduce the problem on the EP405, but not on the Walnut. It would
seem the problem only bites if the pinned allocation is as large as or
larger than physical memory.
It's vaguely understandable that with two pinned entries it would
break on the 16MB machine: creating a mapping for more RAM than exists
is certainly a bug. I don't yet know why there is a problem when the
mapping is exactly as large as RAM though.
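The observations so far can be checked with a little arithmetic. The sketch below is only an illustration: it assumes each pinned entry maps 16MB (as in the one-entry versus two-entry description above), it treats the "at least 64MB" machines as exactly 64MB, and the function name and table are made up for this note rather than taken from the kernel.

```python
PINNED_ENTRY_MB = 16  # size mapped by one pinned TLB entry, per the description above


def pinned_covers_all_ram(ram_mb, pinned_entries):
    """Return True if the pinned mapping is as large as or larger than RAM."""
    return pinned_entries * PINNED_ENTRY_MB >= ram_mb


# (board, RAM in MB, pinned entries, problem reproduced?) as reported above.
# "other 405GP" stands in for the machines with at least 64MB (assumed 64 here).
observations = [
    ("EP405", 16, 2, True),
    ("EP405", 16, 1, True),
    ("Walnut", 32, 2, True),
    ("Walnut", 32, 1, False),
    ("other 405GP", 64, 2, False),
]

for board, ram_mb, entries, reproduced in observations:
    predicted = pinned_covers_all_ram(ram_mb, entries)
    print(f"{board:12s} {ram_mb:3d}MB RAM, {entries * PINNED_ENTRY_MB:2d}MB pinned: "
          f"predicted={predicted} observed={reproduced}")
```

The simple "pinned size >= RAM size" rule matches every case reported here, which is what suggests the rule in the first place; it says nothing about the underlying cause.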
--
David Gibson | For every complex problem there is a
david@gibson.dropbear.id.au | solution which is simple, neat and
| wrong. -- H.L. Mencken
http://www.ozlabs.org/people/dgibson
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* Re: CONFIG_PIN_TLB and telnet problems
From: David Gibson @ 2002-05-31 7:27 UTC
To: linuxppc-embedded
On Fri, May 31, 2002 at 05:15:50PM +1000, David Gibson wrote:
>
> Ok, I think I've managed to reproduce the telnet problems with
> CONFIG_PIN_TLB that Tom Rini and others have reported. I encountered
> problems telnetting to localhost on a 16MB EP405 (PVR 40110145) and on
> a 32MB Walnut (PVR 401100C4). I was unable to reproduce the problem
> on any of the other 405GP machines I have, each of which has at least
> 64MB of RAM.
>
> Interestingly, if I alter the CONFIG_PIN_TLB code so that it only pins
> one entry (i.e. maps 16MB of RAM instead of 32MB) I can still
> reproduce the problem on the EP405, but not on the Walnut. It would
> seem the problem only bites if the pinned allocation is as large as or
> larger than physical memory.
>
> It's vaguely understandable that with two pinned entries it would
> break on the 16MB machine: creating a mapping for more RAM than exists
> is certainly a bug. I don't yet know why there is a problem when the
> mapping is exactly as large as RAM though.
Update: I've now also reproduced the problem on a 64MB EP405PC board,
by using mem=16M or mem=32M.
* Re: CONFIG_PIN_TLB and telnet problems
From: Dan Malek @ 2002-06-04 0:54 UTC
To: David Gibson; +Cc: linuxppc-embedded
David Gibson wrote:
> ... I was unable to reproduce the problem
> on any of the other 405GP machines I have, each of which has at least
> 64MB of RAM.
As I mentioned in the first message, I suspect the problem is with the
multiple mapping/access of data in the pinned and remapped areas. Linux
tends to allocate memory from the high end down, so if you consistent_alloc()
some space on large memory systems, you are just remapping the attributes
of a page. If you do this on memory that is also covered by a large page,
sometimes you will get the access through this large page, and others through
an alternate mapping, which I believe confuses the MMU/cache with different
attributes (which I was assured wouldn't cause problems on 4xx).
So, if you use large pages for the first 16 or 32M, and you have only that
much memory, you will encounter mapping aliases. If you only use large
pages for the first 16 or 32M, but have 64M or more total, then the upper
pages usually allocated to skbufs aren't subject to the large page mapping alias.
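To make the aliasing argument concrete, here is a rough sketch of the reasoning only. It assumes the large pages map physical memory starting at address 0, that small pages are 4KB, and that allocations come from the top of RAM as described above; the function name is hypothetical. It models this hypothesis and nothing else.

```python
MB = 1024 * 1024
PAGE_SIZE = 4096  # assumed small-page size


def top_page_is_aliased(ram_bytes, large_page_bytes):
    """True if the highest page of RAM also lies inside the large-page mapping.

    The large-page mapping is assumed to cover physical [0, large_page_bytes).
    A page allocated from the top of RAM and then remapped with different
    attributes would be reachable through both mappings in that case.
    """
    highest_page_addr = ram_bytes - PAGE_SIZE
    return highest_page_addr < large_page_bytes


for ram_mb, large_mb in [(16, 16), (16, 32), (32, 32), (64, 16), (64, 32)]:
    aliased = top_page_is_aliased(ram_mb * MB, large_mb * MB)
    print(f"{ram_mb}MB RAM, {large_mb}MB in large pages: top page aliased = {aliased}")
```

With 64MB or more of RAM the top of memory sits above the large-page region, so pages taken from there have only one mapping.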
-- Dan
* Re: CONFIG_PIN_TLB and telnet problems
From: Paul Mackerras @ 2002-06-04 1:26 UTC
To: Dan Malek; +Cc: David Gibson, linuxppc-embedded
Dan Malek writes:
> As I mentioned in the first message, I suspect the problem is with the
> multiple mapping/access of data in the pinned and remapped areas. Linux
> tends to allocate memory from the high end down, so if you consistent_alloc()
> some space on large memory systems, you are just remapping the attributes
> of a page. If you do this on memory that is also covered by a large page,
> sometimes you will get the access through this large page, and others through
> an alternate mapping, which I believe confuses the MMU/cache with different
> attributes (which I was assured wouldn't cause problems on 4xx).
We have reproduced the problem using a ramdisk root and loopback, with
the ethernet disabled, so the only I/O device that is active is the
serial port, which doesn't use DMA. So it doesn't look like it is
anything to do with DMA or with consistent_alloc.
Paul.
* Re: CONFIG_PIN_TLB and telnet problems
From: benh @ 2002-06-04 12:20 UTC
To: Paul Mackerras, Dan Malek; +Cc: David Gibson, linuxppc-embedded
>> As I mentioned in the first message, I suspect the problem is with the
>> multiple mapping/access of data in the pinned and remapped areas. Linux
>> tends to allocate memory from the high end down, so if you consistent_alloc()
>> some space on large memory systems, you are just remapping the attributes
>> of a page. If you do this on memory that is also covered by a large page,
>> sometimes you will get the access through this large page, and others through
>> an alternate mapping, which I believe confuses the MMU/cache with different
>> attributes (which I was assured wouldn't cause problems on 4xx).
>
>We have reproduced the problem using a ramdisk root and loopback, with
>the ethernet disabled, so the only I/O device that is active is the
>serial port, which doesn't use DMA. So it doesn't look like it is
>anything to do with DMA or with consistent_alloc.
To add to these comments, I can reproduce the problem as well on a
unix socket shared either between two processes, or read & written
by a single process.
After doing various tests, the problem appears rarely and randomly
with half the RAM mapped with fixed TLB entries, and very reproducibly
with all the RAM mapped this way. So it seems that reducing the
kernel pressure on the TLB, thus allowing userland TLB entries to live
much longer, exposes the problem.
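A toy simulation of that effect, purely as an illustration: it assumes a 64-slot TLB with round-robin replacement and made-up page counts, and it ignores the slots the pinned entries themselves would occupy. None of these numbers or names come from the kernel source.

```python
def surviving_user_entries(tlb_slots, user_pages, kernel_pages_touched, kernel_pinned):
    """Load some user entries, then touch distinct kernel pages.

    Returns how many of the user entries are still in the TLB afterwards.
    Replacement is round-robin. When the kernel is covered by pinned
    entries, kernel accesses never miss, so they evict nothing.
    """
    tlb = [None] * tlb_slots
    next_victim = 0

    def load(tag):
        nonlocal next_victim
        tlb[next_victim] = tag
        next_victim = (next_victim + 1) % tlb_slots

    for page in range(user_pages):
        load(("user", page))

    if not kernel_pinned:
        for page in range(kernel_pages_touched):
            load(("kernel", page))

    return sum(1 for entry in tlb if entry is not None and entry[0] == "user")


for pinned in (False, True):
    left = surviving_user_entries(64, 8, 100, kernel_pinned=pinned)
    print(f"kernel pinned={pinned}: {left} of 8 user entries survive")
```

In this model heavy kernel TLB traffic flushes old user entries as a side effect, while a fully pinned kernel leaves them in place indefinitely, which would give a stale user entry far more opportunity to be used.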
I tried adding a call to _tlbia (not the instruction but our tlbwe
based implementation) in set_context to make sure I only ever have
one userland context loaded in the TLB, and this appears to kill the
problem (I'm currently running 2 offending test programs simultaneously
on the box and neither has failed yet after a few Gb transferred).
Ben.
* Re: CONFIG_PIN_TLB and telnet problems
From: Benjamin Herrenschmidt @ 2002-06-04 12:37 UTC
To: Paul Mackerras, Dan Malek; +Cc: David Gibson, linuxppc-embedded
>
>To add to these comments, I can reproduce the problem as well on a
>unix socket shared either between two processes, or read & written
>by a single process.
>
>After doing various tests, the problem appears rarely and randomly
>with half the RAM mapped with fixed TLB entries, and very reproducibly
>with all the RAM mapped this way. So it seems that reducing the
>kernel pressure on the TLB, thus allowing userland TLB entries to live
>much longer, exposes the problem.
>
>I tried adding a call to _tlbia (not the instruction but our tlbwe
>based implementation) in set_context to make sure I only ever have
>one userland context loaded in the TLB, and this appears to kill the
>problem (I'm currently running 2 offending test programs simultaneously
>on the box and neither has failed yet after a few Gb transferred).
Hrm... I added isync/sync (actually, the sync is probably too much)
to set_context() in head_4xx.S in order to invalidate the shadow TLBs
and it seems to work! I'll test for a few hours and let you know.
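A toy model of the failure mode this change is aimed at, offered as a sketch and not as a description of the 405 hardware. It assumes a small translation cache in front of the main tables that carries no context tag, so it must be emptied whenever the context changes. The class, its methods, and the page numbers are all hypothetical.

```python
class ToyMMU:
    """Per-context page tables plus an untagged 'shadow' translation cache."""

    def __init__(self):
        self.page_tables = {}  # context id -> {virtual page: physical page}
        self.context = None
        self.shadow = {}       # virtual page -> physical page, no context tag

    def map(self, context, vpage, ppage):
        self.page_tables.setdefault(context, {})[vpage] = ppage

    def set_context(self, context, invalidate_shadow):
        self.context = context
        if invalidate_shadow:
            # Stands in for the synchronisation added to set_context().
            self.shadow.clear()

    def translate(self, vpage):
        # A shadow hit is returned without checking which context filled it.
        if vpage not in self.shadow:
            self.shadow[vpage] = self.page_tables[self.context][vpage]
        return self.shadow[vpage]


mmu = ToyMMU()
mmu.map(1, 0x10, 0xA0)  # process 1: virtual page 0x10 -> physical 0xA0
mmu.map(2, 0x10, 0xB0)  # process 2: same virtual page, different physical page

mmu.set_context(1, invalidate_shadow=False)
first = mmu.translate(0x10)

mmu.set_context(2, invalidate_shadow=False)
stale = mmu.translate(0x10)   # still process 1's translation

mmu.set_context(2, invalidate_shadow=True)
fixed = mmu.translate(0x10)   # refilled from process 2's tables

print(hex(first), hex(stale), hex(fixed))
```

In the model, skipping the invalidation lets the new context read through the old context's translation, which is the kind of cross-process corruption a socket test would show up.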
Ben.