* Re: Improved copy_page() function, about 30% speed up for mpc860!
@ 2003-03-02 17:50 Joakim Tjernlund
2003-03-03 21:18 ` Dan Malek
0 siblings, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-02 17:50 UTC (permalink / raw)
To: linuxppc-dev; +Cc: drow
> > > I can't tell you what revs they were, but all of the MPC860's I could
> > > get my hands on here the last time I tried to use dcbz on them were
> > > faulty. You may just not be triggering the bug.
> >
> > hmm, what boards was this?
> > I am planning to do a larger test here with all our custom mpc860 and mpc862 boards. We have them in
> > 100, 80 and 50 MHz variants.
> >
> > Maybe the bug is related to board design? Is there an official errata from Motorola
> > regarding this bug? I can't find any.
> >
> > Anyhow I had a flaw in my testprogram, so you can throw this version of copy_page() away.
> > But enabling the use of dcbz in the current version still gives me 30%+ performance increase.
> >
> > See the embedded list for details.
> >
> > Jocke
I found a link that may be relevant to the dcbz problem; can anybody confirm this?
http://www.uwsg.iu.edu/hypermail/linux/kernel/0012.0/0529.html
Jocke
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-02 17:50 Improved copy_page() function, about 30% speed up for mpc860! Joakim Tjernlund
@ 2003-03-03 21:18 ` Dan Malek
2003-03-03 23:16 ` Joakim Tjernlund
0 siblings, 1 reply; 27+ messages in thread
From: Dan Malek @ 2003-03-03 21:18 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, drow
Joakim Tjernlund wrote:
> I found a link that may have relevance regarding the dcbz problem , can anybody confirm this?
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0012.0/0529.html
This is just one of the many different things that happen with various
silicon versions and the use of the cache instructions on the 8xx.
In this particular case, we have fixed this "bug" because it affects
other load/store instructions under some conditions (i.e. this isn't
unique to the cache instructions).
-- Dan
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-03 21:18 ` Dan Malek
@ 2003-03-03 23:16 ` Joakim Tjernlund
2003-03-04 0:43 ` Dan Malek
[not found] ` <1046737789.885.15.camel@zion.wanadoo.fr>
0 siblings, 2 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-03 23:16 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-dev, drow
> Joakim Tjernlund wrote:
>
> > I found a link that may have relevance regarding the dcbz problem , can anybody confirm this?
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0012.0/0529.html
>
> This is just one of the many different things that happen with various
> silicon versions and the use of the cache instructions on the 8xx.
> In this particular case, we have fixed this "bug" because it affects
> other load/store instructions under some conditions (i.e. this isn't
> unique to the cache instructions).
OK, so this is not it then, but what is it? Are you 100% sure that
the bug (whatever it may be) is present on the mpc860, rev D4 or later?
How can I make it bite me on kernel space memory?
I can't find that info in the archives; if it's there, please give me a hint.
I have enabled all kernel functions that use dcbz for 8xx as well. I even split copy_tofrom_user
into copy_from_myuser and copy_to_myuser, and enabled dcbz in copy_from_myuser,
and still everything is working just fine.
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-03 23:16 ` Joakim Tjernlund
@ 2003-03-04 0:43 ` Dan Malek
2003-03-04 0:54 ` Daniel Jacobowitz
[not found] ` <1046737789.885.15.camel@zion.wanadoo.fr>
1 sibling, 1 reply; 27+ messages in thread
From: Dan Malek @ 2003-03-04 0:43 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, drow
Joakim Tjernlund wrote:
> OK so this is not it then, but what is it then? Are you 100% that
> the bug(whatever this may be) is present for mpc860, rev D4 or later?
> How can I make it bite me on kernel space memory?
> I can't find that info in the archives, if it's there please give me a hint.
I guess once again I don't understand the questions or failed to express
my responses properly.
In the previous message, you included a link that indicated a problem
setting the DAR, cache instructions, and asked if this was a problem.
In this case, due to the way we map kernel space and use the cache instructions,
we have "fixed" this bug. I believe it was found because of the way we
copy instructions in the C library, and it affected user applications.
The specific problem described in this link is not a problem with Linux.
In general, over the many years and many different revisions of 8xx silicon,
there was always a common problem with cache instructions used on virtual
addresses that missed in the TLB. They either never caused an exception,
caused the wrong one, or didn't set status or other registers as expected.
It was possible to trigger the proper sequence of events with Linux because
we heavily use the TLB in a dynamic manner and use a variety of different
cache modes on a page resolution. To discover the error required a specific
code path with the TLB in a certain state and data accessed in a particular
cache line. It was often configuration specific, as adding/deleting options
moved code in the kernel.
> I have enabled all kernel functions that uses dcbz for 8xx as well. I even split copy_tofrom_user
> into copy_from_myuser resp. copy_to_myuser and enabled dcbz in copy_from_myuser
> and still everything is working just fine.
Maybe it works in your silicon. Maybe it works because the code path and
data access doesn't trigger a problem. Maybe it works because the cache
instructions aren't doing anything and the data didn't need to get zeroed
out to start with (this happens in lots of cases). We know there are lots
of cases where the instructions don't work, and you are the first to claim
they do. I'd suggest posting a patch someplace, let people use it if they
wish, and try to accumulate real application information to see if we can
collect some stability data. I hope the only change needed to the code
is modification of the 'ifdef CONFIG_8xx' directives in the assembler code
that implements the clear/copy functions.
Thanks.
-- Dan
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 0:43 ` Dan Malek
@ 2003-03-04 0:54 ` Daniel Jacobowitz
2003-03-04 3:38 ` Dan Malek
0 siblings, 1 reply; 27+ messages in thread
From: Daniel Jacobowitz @ 2003-03-04 0:54 UTC (permalink / raw)
To: Dan Malek; +Cc: Joakim Tjernlund, linuxppc-dev
On Mon, Mar 03, 2003 at 07:43:08PM -0500, Dan Malek wrote:
> Joakim Tjernlund wrote:
>
> >OK so this is not it then, but what is it then? Are you 100% that
> >the bug(whatever this may be) is present for mpc860, rev D4 or later?
> >How can I make it bite me on kernel space memory?
> >I can't find that info in the archives, if it's there please give me a
> >hint.
>
> I guess once again I don't understand the questions or failed to express
> my responses properly.
>
> In the previous message, you included a link that indicated a problem
> setting the DAR, cache instructions, and asked if this was a problem.
> In this case, due to the way we map kernel space and use the cache
> instructions,
> we have "fixed" this bug. I believe it was found because of the way we
> copy instructions in the C library, and it affected user applications.
> The specific problem described in this link is not a problem with Linux.
That is, in fact, incorrect. It's fatal from userland. I experimented
with it just a couple of months ago.
Only way to sort it out would be to do instruction decoding on the
faulting address, which is hideous beyond words.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 0:54 ` Daniel Jacobowitz
@ 2003-03-04 3:38 ` Dan Malek
2003-03-04 8:29 ` Joakim Tjernlund
0 siblings, 1 reply; 27+ messages in thread
From: Dan Malek @ 2003-03-04 3:38 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: Joakim Tjernlund, linuxppc-dev
Daniel Jacobowitz wrote:
> That is, in fact, incorrect. It's fatal from userland. I experimented
> with it just a couple of months ago.
Uh oh....was this discussed and I missed it?
Will you fill me in on what you were trying to do and how it failed?
Thanks.
-- Dan
* RE: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 3:38 ` Dan Malek
@ 2003-03-04 8:29 ` Joakim Tjernlund
2003-03-04 13:33 ` Dan Malek
0 siblings, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-04 8:29 UTC (permalink / raw)
To: Dan Malek, Daniel Jacobowitz; +Cc: linuxppc-dev
> Daniel Jacobowitz wrote:
>
> > That is, in fact, incorrect. It's fatal from userland. I experimented
> > with it just a couple of months ago.
>
> Uh oh....was this discussed and I missed it?
>
> Will you fill me in on what you were trying to do and how it failed?
Daniel and I discussed it, mostly in private. It fails horribly for
me too in user space (init hangs, I don't know any details), and that's why
I split copy_tofrom_user into two functions (see yesterday's mail) as a test. With that
change my system is stable.
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 8:29 ` Joakim Tjernlund
@ 2003-03-04 13:33 ` Dan Malek
2003-03-04 15:24 ` Joakim Tjernlund
0 siblings, 1 reply; 27+ messages in thread
From: Dan Malek @ 2003-03-04 13:33 UTC (permalink / raw)
To: joakim.tjernlund; +Cc: Daniel Jacobowitz, linuxppc-dev
Joakim Tjernlund wrote:
> I and Daniel discussed it, mostly in private. It fails horribly for
> me too in user space(init hangs, don't know any details) and that's why
> I split copy_tofrom_user into 2 functions( see yesterdays mail) as a test. With that
> change my system is stable.
Well, I suspect it's luck more than stable. :-)
I suggest you debug the failure cases and determine what is really
wrong. We know from past history that the cache instructions on 8xx
are troublesome and if we avoid them the system is truly stable. The
execution of the cache instructions is identical whether you are using
them on kernel or user pages, the main difference is you are more likely
to hit TLB refill/update cases when using user space pages, exactly
one of the problem triggers. If it's working on kernel pages and not
user pages, or some other combinations, you are just being lucky. The
cache instructions will do the right thing if the mapping is present
in the TLB (and you don't get a write/update miss) and the page is
cached. If you don't have the page cached or you get any TLB exception
the results are unpredictable and the result varies depending upon
silicon revision.
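The conditions listed above can be restated as a small predicate. This is only a toy model of the description in this message, not of any documented silicon behaviour; the struct, its field names and the function name are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what a cache instruction "sees".
 * Field names are illustrative and not taken from the kernel. */
struct line_state {
    bool tlb_hit;      /* translation is already present in the TLB */
    bool write_miss;   /* access would raise a write/update miss */
    bool page_cached;  /* page is mapped cacheable */
};

/* Returns true only for the one case described as reliable:
 * TLB hit, no write/update miss, and a cached page. Every other
 * combination is treated as "unpredictable". */
static bool cache_op_predictable(const struct line_state *s)
{
    assert(s != NULL);
    return s->tlb_hit && !s->write_miss && s->page_cached;
}
```

Under this model, a user page that has just been faulted in (no TLB entry yet) falls outside the reliable case even if the same code path works on a pinned kernel page.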
This is something that is difficult to debug and we can't dismiss this
with a solution of different copy functions. The clear/copy functions
for the 8xx should be identical to all other PowerPC cores, and if they
don't work that way we need to determine why. At least you have the
knowledge that these instructions are troublesome. It took me many months
to discover this the first time, and perhaps they still misbehave.
Thanks.
-- Dan
* RE: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 13:33 ` Dan Malek
@ 2003-03-04 15:24 ` Joakim Tjernlund
2003-03-04 17:00 ` Dan Malek
0 siblings, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-04 15:24 UTC (permalink / raw)
To: Dan Malek; +Cc: Daniel Jacobowitz, linuxppc-dev
> Well, I suspect it's luck more than stable. :-)
Maybe :-)
>
> I suggest you debug the failure cases and determine what is really
> wrong. We know from past history that the cache instructions on 8xx
> are troublesome and if we avoid them the system is truly stable. The
> execution of the cache instructions is identical whether you are using
> them on kernel or user pages, the main difference is you are more likely
> to hit TLB refill/update cases when using user space pages, exactly
> one of the problem triggers. If it's working on kernel pages and not
> user pages, or some other combinations, you are just being lucky. The
> cache instructions will do the right thing if the mapping is present
> in the TLB (and you don't get a write/update miss) and the page is
> cached. If you don't have the page cached or you get any TLB exception
> the results are unpredictable and the result varies depending upon
> silicon revision.
hmm, I found this comment in head_8xx.S:
/* The EA of a data TLB miss is automatically stored in the MD_EPN
* register. The EA of a data TLB error is automatically stored in
* the DAR, but not the MD_EPN register. We must copy the 20 most
* significant bits of the EA from the DAR to MD_EPN before we
* start walking the page tables. We also need to copy the CASID
* value from the M_CASID register.
* Addendum: The EA of a data TLB error is _supposed_ to be stored
* in DAR, but it seems that this doesn't happen in some cases, such
* as when the error is due to a dcbi instruction to a page with a
* TLB that doesn't have the changed bit set. In such cases, there
* does not appear to be any way to recover the EA of the error
* since it is neither in DAR nor MD_EPN. As a workaround, the
* _PAGE_HWWRITE bit is set for all kernel data pages when the PTEs
* are initialized in mapin_ram(). This will avoid the problem,
* assuming we only use the dcbi instruction on kernel addresses.
*/
Does this workaround also work for dcbz on kernel addresses?
Also, will the Pinned TLB feature for 860 help here?
It is the DataTLBError exception that is causing the problem (if I have
understood it correctly), so if the kernel always has a TLB entry for kernel
space addresses, will dcbz and friends work correctly for kernel addresses?
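The workaround described in the head_8xx.S comment can be sketched roughly as follows. This is a simplified model, assuming placeholder flag values and helper names (everything prefixed SKETCH_ or sketch_ is hypothetical); it is not the real 8xx PTE layout or the real mapin_ram() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values, NOT the real 8xx PTE encoding. */
#define SKETCH_PAGE_PRESENT 0x1u
#define SKETCH_PAGE_RW      0x2u
#define SKETCH_PAGE_HWWRITE 0x4u  /* stands in for the "changed" bit */

/* Initialise n kernel data PTEs with the changed bit already set, so a
 * later store or cache instruction never needs a DataTLBError just to
 * set that bit. */
static void sketch_mapin_ram(uint32_t *pte, size_t n)
{
    assert(pte != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        pte[i] = SKETCH_PAGE_PRESENT | SKETCH_PAGE_RW | SKETCH_PAGE_HWWRITE;
}

/* Would a write-type access to this page go through the changed-bit
 * TLB error path, where the DAR may not be valid? */
static int sketch_needs_changed_bit_fault(uint32_t pte)
{
    return (pte & SKETCH_PAGE_HWWRITE) == 0;
}
```

The point of the sketch is only that the error path is avoided up front for kernel mappings; it says nothing about user pages, which are not set up this way.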
>
> This is something that is difficult to debug and we can't dismiss this
> with a solution of different copy functions. The clear/copy functions
> for the 8xx should be identical to all other PowerPC cores, and if they
> don't work that way we need to determine why. At least you have the
> knowledge that these instructions are troublesome. It took me many months
> to discover this the first time, and perhaps they still misbehave.
Yes, I am sure it's very hard to debug and probably over my head :-(
I will try a little bit more though and see where it takes me.
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 15:24 ` Joakim Tjernlund
@ 2003-03-04 17:00 ` Dan Malek
2003-03-04 22:01 ` Joakim Tjernlund
0 siblings, 1 reply; 27+ messages in thread
From: Dan Malek @ 2003-03-04 17:00 UTC (permalink / raw)
To: joakim.tjernlund; +Cc: Daniel Jacobowitz, linuxppc-dev
Joakim Tjernlund wrote:
> Does this workaround also work for dcbz on kernel addresses?
It may, depending upon the silicon revision. :-)
IIRC, the real reason for this code is to work around errata in
some silicon versions where the DAR isn't loaded properly. In
addition, this also helps when we use kernel debuggers for setting
breakpoints when the text area is write enabled.
> Also, will the Pinned TLB feature for 860 help here?
It will only for the static data area of the kernel. If you have
vmalloc()'ed pages you can stumble across the same errors.
> Yes I am sure it's very hard to debug and probably over my head :-(
> I will try a little bit more though and see where it takes me.
If nothing else, you are learning more than you bargained for. ;-)
-- Dan
* RE: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 17:00 ` Dan Malek
@ 2003-03-04 22:01 ` Joakim Tjernlund
2003-03-04 22:41 ` Dan Malek
2003-03-04 23:35 ` Tom Rini
0 siblings, 2 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-04 22:01 UTC (permalink / raw)
To: Dan Malek, Tom Rini; +Cc: Daniel Jacobowitz, linuxppc-dev
>
>
> Joakim Tjernlund wrote:
>
> > Does this workaround also work for dcbz on kernel addresses?
>
> It may, depending upon the silicon revision. :-)
>
> IIRC, the real reason for this code is to work around errata in
> some silicon versions where the DAR isn't loaded properly. In
> addition, this also helps when we use kernel debuggers for setting
> breakpoints when the text area is write enabled.
I can't find any further info about this bug/workaround. There is nothing in
Linux that indicates that the workaround is dependent on any revision.
Further, dcbi and dcbz are pretty similar in behavior w.r.t. TLB errors, so
I suspect that if this works for dcbi it also works for dcbz, though only
on kernel addresses.
Maybe user space can also be fixed with instruction decoding,
as Daniel said, even if it's ugly. Then you would not have to worry about
apps/libs using dcbz; at least it would work, but perhaps the performance sucks.
Tom, can you shed some light on this? The BK log lists you as the one
who committed this change.
> If nothing else, you are learning more than you bargained for. ;-)
Definitely :-)
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 22:01 ` Joakim Tjernlund
@ 2003-03-04 22:41 ` Dan Malek
2003-03-04 23:20 ` Joakim Tjernlund
2003-03-04 23:35 ` Tom Rini
1 sibling, 1 reply; 27+ messages in thread
From: Dan Malek @ 2003-03-04 22:41 UTC (permalink / raw)
To: joakim.tjernlund; +Cc: Tom Rini, Daniel Jacobowitz, linuxppc-dev
Joakim Tjernlund wrote:
> I can't find any further info about this bug/workaround. There is nothing in
> Linux that indicates that the workaround is dependent on any revision.
That's because we couldn't determine which ones in particular may be
broken. Remember there are several different sources of cores, the 823/850
core is different from the 860 core. We don't distinguish among these
in the kernel.
> Further, dcbi and dcbz are pretty similar in behavior w.r.t TLB error so
> I suspect that if this works for dcbi it also works for dcbz, only
> on kernel addresses though.
Maybe.
> Maybe user space also can be fixed with instruction decoding
No, user space has always been "fixed" by providing a unique version
of C libraries for the 8xx. This has to be done for more reasons
than just the cache instructions. It also includes supporting a
different sized cache line for dynamic relocation and often the
performance advantage of using soft-float instead of kernel emulated
floating point. It's also perfectly suited for the IBM 403, BTW :-)
> as Daniel said even if it's ugly. Then you would not have to worry about
> apps/libs using dcbz, at least it would work but perhaps the performance sucks
I'm trying to discourage you guys from wasting time here. These are
all known problems that have been solved. If we want to use this processor
we have to accept some limitations and get on with using it. :-) People
have been successfully deploying products for years using Linux and 8xx.
Leave the cache instructions alone. :-)
This discussion has been moved to a very low priority in my mailbox.
Don't expect timely or any further replies, and please don't be sending me
any dcbz kernel patches! :-)
Thanks.
-- Dan
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 22:41 ` Dan Malek
@ 2003-03-04 23:20 ` Joakim Tjernlund
0 siblings, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-04 23:20 UTC (permalink / raw)
To: Dan Malek; +Cc: Tom Rini, Daniel Jacobowitz, linuxppc-dev
> I'm trying to discourage you guys from wasting time here. These are
> all known problems that have been solved. If we want to use this processor
> we have to accept some limitations and get on with using it. :-) People
> have been successfully deploying products for years using Linux and 8xx.
> Leave the cache instructions alone. :-)
Might be useful for copy_tofrom_user() though, but for now I will leave it alone :-)
> This discussion has been moved to a very low priority in my mailbox.
> Don't expect timely or any further replies, and please don't be sending me
> any dcbz kernel patches! :-)
OK I won't, I'll just send them to the embedded list instead :-)
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 22:01 ` Joakim Tjernlund
2003-03-04 22:41 ` Dan Malek
@ 2003-03-04 23:35 ` Tom Rini
2003-03-04 23:45 ` Joakim Tjernlund
2003-03-05 17:15 ` Dan Malek
1 sibling, 2 replies; 27+ messages in thread
From: Tom Rini @ 2003-03-04 23:35 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Dan Malek, Daniel Jacobowitz, linuxppc-dev
On Tue, Mar 04, 2003 at 11:01:59PM +0100, Joakim Tjernlund wrote:
> Maybe user space also can be fixed with instruction decoding
> as Daniel said even if it's ugly. Then you would not have to worry about
> apps/libs using dcbz, at least it would work but perhaps the performance sucks
>
> Tom, can you shed some light on this? The BK log lists you as the one
> who committed this change.
Which change in particular are you talking about? But I think as Dan
has said, it's best to just accept this and move along. :) It's known
to be horribly broken in some cases, but not admitted to by Motorola, and
distinguishing between 8xx's at runtime is not trivial.
--
Tom Rini
http://gate.crashing.org/~trini/
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 23:35 ` Tom Rini
@ 2003-03-04 23:45 ` Joakim Tjernlund
2003-03-05 0:05 ` Tom Rini
2003-03-05 17:15 ` Dan Malek
1 sibling, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-04 23:45 UTC (permalink / raw)
To: Tom Rini; +Cc: Dan Malek, Daniel Jacobowitz, linuxppc-dev
> Which change in particular are you talking about?
This, in head_8xx.S:
* Addendum: The EA of a data TLB error is _supposed_ to be stored
* in DAR, but it seems that this doesn't happen in some cases, such
* as when the error is due to a dcbi instruction to a page with a
* TLB that doesn't have the changed bit set. In such cases, there
* does not appear to be any way to recover the EA of the error
* since it is neither in DAR nor MD_EPN. As a workaround, the
* _PAGE_HWWRITE bit is set for all kernel data pages when the PTEs
* are initialized in mapin_ram(). This will avoid the problem,
* assuming we only use the dcbi instruction on kernel addresses.
from http://lists.linuxppc.org/linuxppc-dev/200303/msg00022.html
> But I think as Dan
> has said, it's best to just accept this and move along. :) It's known
> to be horribly broken in some cases, but not admitted to by Motorola, and
> distinguishing between 8xx's at runtime is not trivial.
Regarding dcbz on user space, yes, but not kernel space. I will do some more digging/testing and
maybe something could go into 2.5 eventually.
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 23:45 ` Joakim Tjernlund
@ 2003-03-05 0:05 ` Tom Rini
2003-03-05 0:19 ` Joakim Tjernlund
0 siblings, 1 reply; 27+ messages in thread
From: Tom Rini @ 2003-03-05 0:05 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Dan Malek, Daniel Jacobowitz, linuxppc-dev
On Wed, Mar 05, 2003 at 12:45:14AM +0100, Joakim Tjernlund wrote:
> > Which change in particular are you talking about?
> This, in head_8xx.S:
> * Addendum: The EA of a data TLB error is _supposed_ to be stored
> * in DAR, but it seems that this doesn't happen in some cases, such
> * as when the error is due to a dcbi instruction to a page with a
> * TLB that doesn't have the changed bit set. In such cases, there
> * does not appear to be any way to recover the EA of the error
> * since it is neither in DAR nor MD_EPN. As a workaround, the
> * _PAGE_HWWRITE bit is set for all kernel data pages when the PTEs
> * are initialized in mapin_ram(). This will avoid the problem,
> * assuming we only use the dcbi instruction on kernel addresses.
>
> from http://lists.linuxppc.org/linuxppc-dev/200303/msg00022.html
Well, the whole change is at:
http://ppc.bkbits.net:8080/linuxppc_2_4_devel/cset@1.118.1.428?nav=index.html|ChangeSet@-1d
So, that should give it more context and make it clearer, I hope..
> > But I think as Dan
> > has said, it's best to just accept this and move along. :) It's known
> > to be horribly broken in some cases, but not admitted to by Motorola, and
> > distinguishing between 8xx's at runtime is not trivial.
>
> Regarding dcbz on user space yes, but not kernel space.
I have this feeling Dan is right and it's luck, not a lack of a real
issue.
--
Tom Rini
http://gate.crashing.org/~trini/
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-05 0:05 ` Tom Rini
@ 2003-03-05 0:19 ` Joakim Tjernlund
2003-03-05 17:12 ` Tom Rini
0 siblings, 1 reply; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-05 0:19 UTC (permalink / raw)
To: Tom Rini; +Cc: Dan Malek, Daniel Jacobowitz, linuxppc-dev
> > from http://lists.linuxppc.org/linuxppc-dev/200303/msg00022.html
>
> Well, the whole change is at:
> http://ppc.bkbits.net:8080/linuxppc_2_4_devel/cset@1.118.1.428?nav=index.html|ChangeSet@-1d
>
> So, that should give it more context and make it clearer, I hope..
Yes, but how did you come up with the fix in the first place? Trial and error, or did Motorola help you?
If they did, then I would like to know what they said.
>
> I have this feeling Dan is right and it's luck, not a lack of a real
> issue.
Would not surprise me.
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-05 0:19 ` Joakim Tjernlund
@ 2003-03-05 17:12 ` Tom Rini
2003-03-05 17:50 ` Joakim Tjernlund
0 siblings, 1 reply; 27+ messages in thread
From: Tom Rini @ 2003-03-05 17:12 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Dan Malek, Daniel Jacobowitz, linuxppc-dev
On Wed, Mar 05, 2003 at 01:19:04AM +0100, Joakim Tjernlund wrote:
> > > from http://lists.linuxppc.org/linuxppc-dev/200303/msg00022.html
> >
> > Well, the whole change is at:
> > http://ppc.bkbits.net:8080/linuxppc_2_4_devel/cset@1.118.1.428?nav=index.html|ChangeSet@-1d
> >
> > So, that should give it more context and make it clearer, I hope..
>
> Yes, but how did you come up with the fix in the first place? Trial and error, or did Motorola help you?
It wasn't me personally, but it was trial and error I believe.
--
Tom Rini
http://gate.crashing.org/~trini/
* RE: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-05 17:12 ` Tom Rini
@ 2003-03-05 17:50 ` Joakim Tjernlund
0 siblings, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-05 17:50 UTC (permalink / raw)
To: Tom Rini; +Cc: Dan Malek, Daniel Jacobowitz, linuxppc-dev
> On Wed, Mar 05, 2003 at 01:19:04AM +0100, Joakim Tjernlund wrote:
> > > > from http://lists.linuxppc.org/linuxppc-dev/200303/msg00022.html
> > >
> > > Well, the whole change is at:
> > > http://ppc.bkbits.net:8080/linuxppc_2_4_devel/cset@1.118.1.428?nav=index.html|ChangeSet@-1d
> > >
> > > So, that should give it more context and make it clearer, I hope..
FYI, I undid the above patch and then dcbz failed horribly. Linux
just locks up. Using my BDI2000 I tried to single step over the offending dcbz
and it never got past it.
Jocke
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-04 23:35 ` Tom Rini
2003-03-04 23:45 ` Joakim Tjernlund
@ 2003-03-05 17:15 ` Dan Malek
1 sibling, 0 replies; 27+ messages in thread
From: Dan Malek @ 2003-03-05 17:15 UTC (permalink / raw)
To: Tom Rini; +Cc: Joakim Tjernlund, Daniel Jacobowitz, linuxppc-dev
Tom Rini wrote:
> .... It's known
> to be horribly broken in some cases, but not admitted to by Motorola, and
> distinguishing between 8xx's at runtime is not trivial.
Well, to be fair, Motorola will admit to the errata. I don't understand
their testing process and it is very difficult to uncover the problem.
I was able to get very early silicon many years ago when this was
discovered. I would tell them the sequence of events that would cause
it, but when tested in newer silicon it wouldn't always be found, but
there was a different sequence that would uncover similar problems.
Certain combinations of TLB exception, cache line status, and instruction
stream would trigger failures. The worst failure was the dcbz instruction
didn't really cause the proper effect, or affected the wrong cache line.
These took forever to discover. So, the easiest solution was to just
not use it. The kernel may still be littered with some workarounds
that aren't necessary once we decided to stop using the instruction.
Thanks.
-- Dan
[parent not found: <1046737789.885.15.camel@zion.wanadoo.fr>]
* Re: Improved copy_page() function, about 30% speed up for mpc860!
[not found] ` <1046737789.885.15.camel@zion.wanadoo.fr>
@ 2003-03-04 0:51 ` Dan Malek
0 siblings, 0 replies; 27+ messages in thread
From: Dan Malek @ 2003-03-04 0:51 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Joakim Tjernlund, linuxppc-dev, drow
Benjamin Herrenschmidt wrote:
> If you know precisely what version of the chip has this bug fixed,
> then you can define a CPU feature bit, and enclose the dcbz in
> a CPU feature conditional section. That way, they will get nop'ed
> out on faulty CPUs.
The problem is we don't know because it was never a documented
problem. It was very difficult to find, so I'm not surprised it
didn't always show up on the silicon errata. Further, there may
not be enough information in the cpu identification registers to
determine this level of silicon revision.
We just can't nop the dcbz, we have to add explicit instructions
to perform the function if dcbz or other cache instructions don't
work properly. It's also in an area where we have to be sensitive
to cache utilization. We could end up with a situation where there
are lots of nops (using cpu features) or branches where reloading
the i-cache could kill all of the optimization we gained by making
the d-cache more efficient.
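A minimal C sketch of the point above, under stated assumptions: when the fast cache-line zeroing cannot be trusted, it has to be replaced by explicit stores rather than skipped. FEATURE_WORKING_DCBZ is a hypothetical feature bit, LINE_SIZE assumes a 16-byte 8xx data cache line, and memset() only stands in for dcbz so the sketch runs on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16u                /* assumed 8xx data cache line size */
#define FEATURE_WORKING_DCBZ 0x1u    /* hypothetical CPU feature bit */

/* Stand-in for dcbz; on real hardware this would be one instruction. */
static void line_zero_fast(uint8_t *line)
{
    memset(line, 0, LINE_SIZE);
}

/* Explicit replacement: one 32-bit store per word of the line. */
static void line_zero_stores(uint8_t *line)
{
    uint32_t zero = 0;

    for (size_t off = 0; off < LINE_SIZE; off += sizeof zero)
        memcpy(line + off, &zero, sizeof zero);
}

/* Zero one cache line; returns 1 if the fast path was used, 0 otherwise. */
static int zero_line(uint8_t *line, unsigned cpu_features)
{
    if (cpu_features & FEATURE_WORKING_DCBZ) {
        line_zero_fast(line);
        return 1;
    }
    line_zero_stores(line);
    return 0;
}
```

Either path leaves the line zeroed. The fallback costs extra instructions, which is the i-cache concern raised above.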
Thanks.
-- Dan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Improved copy_page() function, about 30% speed up for mpc860!
@ 2003-02-27 13:08 Joakim Tjernlund
2003-02-27 15:45 ` Joakim Tjernlund
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-02-27 13:08 UTC (permalink / raw)
To: Linuxppc-Embedded@Lists. Linuxppc. Org
Hi all
I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
and gained about 30% speed up for my mpc860, rev D4 MHz.
This is what I did:
- Use dcbz on 8xx but clear ahead one cache line (performance is really crappy
if I don't clear ahead). This is the biggest improvement.
- Use prefetch for 8xx as well.
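The "clear ahead one cache line" ordering can be modelled in portable C. This is only a rough sketch of the loop structure, not the kernel routine: memset() stands in for dcbz, memcpy() for the COPY_16_BYTES macro, prefetching is left out, and LINE_SIZE and PAGE_SIZE_BYTES are assumed values (16-byte lines, 4096-byte pages).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */
#define LINE_SIZE 16u           /* assumed 8xx cache line size */

/*
 * Copy one page line by line. The destination line that will be written
 * in the NEXT iteration is zeroed before the current line is copied.
 * Returns the number of destination lines that were zeroed.
 */
static unsigned copy_page_model(uint8_t *dst, const uint8_t *src)
{
    unsigned lines = PAGE_SIZE_BYTES / LINE_SIZE;
    unsigned zeroed = 0;

    memset(dst, 0, LINE_SIZE);              /* first dcbz, before the loop */
    zeroed++;

    /* All but the last line: zero one line ahead, then copy this line. */
    for (unsigned i = 0; i < lines - 1; i++) {
        memset(dst + (size_t)(i + 1) * LINE_SIZE, 0, LINE_SIZE);
        zeroed++;
        memcpy(dst + (size_t)i * LINE_SIZE, src + (size_t)i * LINE_SIZE, LINE_SIZE);
    }

    /* Last line: nothing left to zero ahead, so just copy it. */
    memcpy(dst + (size_t)(lines - 1) * LINE_SIZE,
           src + (size_t)(lines - 1) * LINE_SIZE, LINE_SIZE);
    return zeroed;
}
```

Every destination line is zeroed exactly once and never past the end of the page, which is why the loop runs for one line fewer than the page holds.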
I know that dcbz is buggy for some 8xx CPUs but I don't know which ones.
For me it works just fine, except in copy_tofrom_user() (don't know why).
I would like to get some feedback & test results both for 8xx and non 8xx.
Please include exact CPU and revision.
Thanks
Jocke
_GLOBAL(copy_page)
	addi	r3,r3,-4
	addi	r4,r4,-4
	li	r5,4
#if MAX_COPY_PREFETCH > 1
	/* This will prefetch past end of page, does not seem to be a problem? */
	li	r0,MAX_COPY_PREFETCH
	li	r11,4
	mtctr	r0
11:	dcbt	r11,r4
	addi	r11,r11,L1_CACHE_LINE_SIZE
	bdnz	11b
#else /* MAX_COPY_PREFETCH == 1 */
	dcbt	r5,r4
	li	r11,L1_CACHE_LINE_SIZE+4
#endif /* MAX_COPY_PREFETCH */
	dcbz	r5,r3	/* older 8xx CPUs may have buggy dcbz instructions, if so try "dcbt r5,r3" instead */
	addi	r5,r5,L1_CACHE_LINE_SIZE
	li	r0,4096/L1_CACHE_LINE_SIZE-1	/* All but the last cache line of data, due to the dcbz below */
	mtctr	r0
1:
	dcbt	r11,r4
	dcbz	r5,r3	/* zero the cache line after the one that is being copied
			 * older 8xx CPUs may have buggy dcbz instructions, if so try "dcbt r5,r3" instead */
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 32
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 64
	COPY_16_BYTES
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 128
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
#endif
#endif
#endif
	bdnz	1b
	/* Copy the last cache line of data */
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 32
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 64
	COPY_16_BYTES
	COPY_16_BYTES
#if L1_CACHE_LINE_SIZE >= 128
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
	COPY_16_BYTES
#endif
#endif
#endif
	blr
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: Improved copy_page() function, about 30% speed up for mpc860!
2003-02-27 13:08 Joakim Tjernlund
@ 2003-02-27 15:45 ` Joakim Tjernlund
2003-02-28 17:31 ` Joakim Tjernlund
2003-03-03 21:28 ` Dan Malek
2 siblings, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-02-27 15:45 UTC (permalink / raw)
To: Linuxppc-Embedded@Lists. Linuxppc. Org
> Hi all
>
> I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
> and gained about 30% speed up for my mpc860, rev D4 MHz.
>
> This is what I did:
> - Use dcbz on 8xx but clear ahead one cache line (performance is really crappy
> if I don't clear ahead). This is the biggest improvement.
> - Use prefetch for 8xx as well.
>
> I know that dcbz is buggy for some 8xx CPUs but I don't know which ones.
> For me it works just fine, except in copy_tofrom_user() (don't know why).
Hmm, I split copy_tofrom_user() into two versions, copy_from_user() and
copy_to_user(), and modified asm/uaccess.h to reflect this.
Then I modified copy_from_user() to use dcbz on the destination area.
I booted and started our app, and it works just fine!
Jocke
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: Improved copy_page() function, about 30% speed up for mpc860!
2003-02-27 13:08 Joakim Tjernlund
2003-02-27 15:45 ` Joakim Tjernlund
@ 2003-02-28 17:31 ` Joakim Tjernlund
2003-03-03 21:28 ` Dan Malek
2 siblings, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-02-28 17:31 UTC (permalink / raw)
To: Linuxppc-Embedded@Lists. Linuxppc. Org
Forget the copy_page below. I used a non-cache-aligned buffer :-(
However, if I enable the use of "dcbz" and remove "dcbt" in the
original copy_page() and use a cache-aligned test buffer,
I still get a speedup of 30% or more on my mpc860 board.
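For anyone repeating the measurement from user space, a cache-aligned test buffer could be obtained roughly as follows. This is a sketch only: LINE_SIZE assumes a 16-byte cache line, both helper names are made up for illustration, and C11 aligned_alloc() is used, which needs the size to be a multiple of the alignment.

```c
#include <stdint.h>
#include <stdlib.h>

#define LINE_SIZE 16u   /* assumed 8xx cache line size */

/* Nonzero if p sits on a cache line boundary. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % LINE_SIZE) == 0;
}

/* Allocate a line-aligned buffer; the size is rounded up to whole lines. */
static void *alloc_test_buffer(size_t bytes)
{
    size_t rounded = (bytes + LINE_SIZE - 1) / LINE_SIZE * LINE_SIZE;

    return aligned_alloc(LINE_SIZE, rounded);
}
```

Checking is_line_aligned() on both source and destination before timing would catch the kind of mistake described above. The buffer is released with free().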
I think a new CONFIG option is appropriate where one can turn
on the use of "dcbz" for 8xx. OK?
Jocke
> Hi all
>
> I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
> and gained about 30% speed up for my mpc860, rev D4 MHz.
>
> This is what I did:
> - Use dcbz on 8xx but clear ahead one cache line (performance is really crappy
> if I don't clear ahead). This is the biggest improvement.
> - Use prefetch for 8xx as well.
>
> I know that dcbz is buggy for some 8xx CPUs but I don't know which ones.
> For me it works just fine, except in copy_tofrom_user() (don't know why).
>
> I would like to get some feedback & test results both for 8xx and non 8xx.
> Please include exact CPU and revision.
>
> Thanks
> Jocke
>
> _GLOBAL(copy_page)
> addi r3,r3,-4
> addi r4,r4,-4
> li r5,4
> #if MAX_COPY_PREFETCH > 1
> /* This will prefetch past end of page, does not seem to be a problem? */
> li r0,MAX_COPY_PREFETCH
> li r11,4
> mtctr r0
> 11: dcbt r11,r4
> addi r11,r11,L1_CACHE_LINE_SIZE
> bdnz 11b
> #else /* MAX_L1_COPY_PREFETCH == 1 */
> dcbt r5,r4
> li r11,L1_CACHE_LINE_SIZE+4
> #endif /* MAX_L1_COPY_PREFETCH */
> dcbz r5,r3 /* older 8xx CPUs may have buggy dcbz instructions, if so try "dcbt r5,r3" instead */
> addi r5,r5,L1_CACHE_LINE_SIZE
> li r0,4096/L1_CACHE_LINE_SIZE-1 /* All, but the last cache line of data due dcbz below */
> mtctr r0
> 1:
> dcbt r11,r4
> dcbz r5,r3 /* zero the cache line after the one that is beeing copied
> * older 8xx CPUs may have buggy dcbz instructions, if so try "dcbt r5,r3" instead */
> COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 32
> COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 64
> COPY_16_BYTES
> COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 128
> COPY_16_BYTES
> COPY_16_BYTES
> COPY_16_BYTES
> COPY_16_BYTES
> #endif
> #endif
> #endif
> bdnz 1b
> /* Copy the last cache line of data */
> COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 32
> COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 64
> COPY_16_BYTES
> COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 128
> COPY_16_BYTES
> COPY_16_BYTES
> COPY_16_BYTES
> COPY_16_BYTES
> #endif
> #endif
> #endif
> blr
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-02-27 13:08 Joakim Tjernlund
2003-02-27 15:45 ` Joakim Tjernlund
2003-02-28 17:31 ` Joakim Tjernlund
@ 2003-03-03 21:28 ` Dan Malek
2003-03-04 0:09 ` Joakim Tjernlund
2003-03-04 0:19 ` Paul Mackerras
2 siblings, 2 replies; 27+ messages in thread
From: Dan Malek @ 2003-03-03 21:28 UTC (permalink / raw)
To: joakim.tjernlund; +Cc: Linuxppc-Embedded@Lists. Linuxppc. Org
Joakim Tjernlund wrote:
> I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
> and gained about 30% speed up for my mpc860, rev D4 MHz.
Have you found the discussion in linuxppc-dev about the work Paul has done
on this in general for PowerPC? It may help avoid repeating some work and
provide some guidance.....
And don't forget....many applications aren't heavily 'copy-centric' and it
may be beneficial to not blow away the caches in those cases. That is, if
you apply systems engineering methods to your testing instead of just focusing
on such a low level detail, you may discover you are wasting your time and
from an overall system application you may be providing little benefit or
even an overall degradation in system performance.
Thanks.
-- Dan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-03 21:28 ` Dan Malek
@ 2003-03-04 0:09 ` Joakim Tjernlund
2003-03-04 0:19 ` Paul Mackerras
1 sibling, 0 replies; 27+ messages in thread
From: Joakim Tjernlund @ 2003-03-04 0:09 UTC (permalink / raw)
To: Dan Malek; +Cc: Linuxppc-Embedded@Lists. Linuxppc. Org
> Joakim Tjernlund wrote:
>
> > I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
> > and gained about 30% speed up for my mpc860, rev D4 MHz.
>
> Have you found the discussion in linuxppc-dev about the work Paul has done
> on this in general for PowerPC? It may help avoid repeating some work and
> provide some guidance.....
I have searched but I did not find anything conclusive. Pointers?
> And don't forget....many applications aren't heavily 'copy-centric' and it
> may be beneficial to not blow away the caches in those cases. That is, if
If you are referring to the copy_page() that I attached in my first mail, then
yes it uses more icache, but if you have seen my later post where I took it
back and stated that just enabling dcbz in the existing version of copy_page()
would give the same speed up, I don't follow you. How am I wasting caches?
In the end I would like to modify copy_tofrom_user() so that dcbz is used on kernel space
addresses but not on user space without adding a lot of code. Ideas welcome.
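One cheap way to make that distinction might be a single compare against the start of kernel space. The sketch below is C rather than assembly and rests on assumptions: KERNEL_BASE is a hypothetical constant set to 0xC0000000, and the copy is assumed not to wrap around the end of the 32-bit address space.

```c
#include <stdint.h>

#define KERNEL_BASE 0xC0000000u   /* assumed start of kernel addresses */

/*
 * Nonzero if the destination starts in kernel space. If the copy does not
 * wrap around the address space, the whole destination is then in kernel
 * space and dcbz could be considered for it.
 */
static int dest_is_kernel(uint32_t dst_addr)
{
    return dst_addr >= KERNEL_BASE;
}
```

In assembly this would be one compare and one branch at the top of the routine, so the added code stays small.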
> you apply systems engineering methods to your testing instead of just focusing
> on such a low level detail, you may discover you are wasting your time and
> from an overall system application you may be providing little benefit or
> even a overall degradation in system performance.
I am just trying to make 8xx perform a little better and I focus on areas I know something
about such as crc32, the enet.c driver and in this case various memory copy stuff. Hopefully
the end result will be useful to me and others.
Jocke
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Improved copy_page() function, about 30% speed up for mpc860!
2003-03-03 21:28 ` Dan Malek
2003-03-04 0:09 ` Joakim Tjernlund
@ 2003-03-04 0:19 ` Paul Mackerras
1 sibling, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2003-03-04 0:19 UTC (permalink / raw)
To: Dan Malek; +Cc: joakim.tjernlund, linuxppc-embedded
Dan Malek writes:
> And don't forget....many applications aren't heavily 'copy-centric' and it
> may be beneficial to not blow away the caches in those cases. That is, if
Using dcbz on the destination won't blow away the caches any more than
doing the copy without dcbz would anyway.
The thing you have to be careful of when using dcbz, particularly when
you are dcbz'ing one or more cache lines ahead, is that you only dcbz
cache lines that are completely contained within the destination area.
That introduces extra complexity and makes the code bigger, so you
have to be careful that you don't make small copies slower.
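The "completely contained" condition can be written down as address arithmetic. The following C sketch assumes a 16-byte cache line (LINE_SIZE) and uses a made-up helper name. It rounds the start of the destination up and the end down to line boundaries, and only the lines in between would be safe to dcbz.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 16u   /* assumed cache line size, must be a power of two */

/*
 * Count the cache lines lying entirely inside [dst, dst + len).
 * The address of the first such line is stored in *first.
 */
static size_t full_lines(uintptr_t dst, size_t len, uintptr_t *first)
{
    uintptr_t mask  = ~(uintptr_t)(LINE_SIZE - 1);
    uintptr_t start = (dst + LINE_SIZE - 1) & mask;   /* round start up */
    uintptr_t end   = (dst + len) & mask;             /* round end down */

    *first = start;
    return end > start ? (end - start) / LINE_SIZE : 0;
}
```

For example, a 40-byte copy to address 0x1004 covers 0x1004 through 0x102B, so only the line at 0x1010 is fully contained. The partial lines at either end must be written with ordinary stores, because dcbz there would wipe neighbouring data.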
I did some measurements once and found that almost all of the copies
in the kernel (memcpy and copy_tofrom_user) were either relatively
small, i.e. less than 256 bytes, or were page-sized and page-aligned.
In the optimized copy routines I did in the ppc64 kernel.
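The size distribution described above suggests a simple dispatch: small copies take a short path with no cache tricks, and page-sized, page-aligned copies take the path where dcbz pays off. The C sketch below only illustrates that classification and is not the ppc64 code. The names, the 256-byte limit as a hard threshold, and the 4096-byte page size are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u         /* assumed page size */
#define SMALL_COPY_LIMIT 256u    /* "relatively small" per the measurements */

enum copy_kind { COPY_SMALL, COPY_PAGE, COPY_OTHER };

/* Pick a copy strategy from the length and alignment of the request. */
static enum copy_kind classify_copy(uintptr_t dst, uintptr_t src, size_t len)
{
    if (len < SMALL_COPY_LIMIT)
        return COPY_SMALL;       /* keep this path short and branch-light */
    if (len == PAGE_BYTES && dst % PAGE_BYTES == 0 && src % PAGE_BYTES == 0)
        return COPY_PAGE;        /* whole aligned page: dcbz-friendly */
    return COPY_OTHER;           /* rare case: generic copy loop */
}
```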
Paul.
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2003-03-05 17:50 UTC | newest]
Thread overview: 27+ messages
2003-03-02 17:50 Improved copy_page() function, about 30% speed up for mpc860! Joakim Tjernlund
2003-03-03 21:18 ` Dan Malek
2003-03-03 23:16 ` Joakim Tjernlund
2003-03-04 0:43 ` Dan Malek
2003-03-04 0:54 ` Daniel Jacobowitz
2003-03-04 3:38 ` Dan Malek
2003-03-04 8:29 ` Joakim Tjernlund
2003-03-04 13:33 ` Dan Malek
2003-03-04 15:24 ` Joakim Tjernlund
2003-03-04 17:00 ` Dan Malek
2003-03-04 22:01 ` Joakim Tjernlund
2003-03-04 22:41 ` Dan Malek
2003-03-04 23:20 ` Joakim Tjernlund
2003-03-04 23:35 ` Tom Rini
2003-03-04 23:45 ` Joakim Tjernlund
2003-03-05 0:05 ` Tom Rini
2003-03-05 0:19 ` Joakim Tjernlund
2003-03-05 17:12 ` Tom Rini
2003-03-05 17:50 ` Joakim Tjernlund
2003-03-05 17:15 ` Dan Malek
[not found] ` <1046737789.885.15.camel@zion.wanadoo.fr>
2003-03-04 0:51 ` Dan Malek
-- strict thread matches above, loose matches on Subject: below --
2003-02-27 13:08 Joakim Tjernlund
2003-02-27 15:45 ` Joakim Tjernlund
2003-02-28 17:31 ` Joakim Tjernlund
2003-03-03 21:28 ` Dan Malek
2003-03-04 0:09 ` Joakim Tjernlund
2003-03-04 0:19 ` Paul Mackerras