* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-17 20:12 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-17 20:12 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips
> Fix the 74K D-cache alias erratum workaround so that it actually works.
> Our current code sets MIPS_CACHE_VTAG for the D-cache, but that flag
> only has any effect for the I-cache. Additionally MIPS_CACHE_PINDEX is
> set for the D-cache if CP0.Config7.AR is also set for an affected
> processor, leading to confusing information in the bootstrap log (the
> flag isn't used beyond that).
> So delete the setting of MIPS_CACHE_VTAG and rely on MIPS_CACHE_ALIASES,
> set in a common place, removing I-cache coherency issues seen in GDB
> testing with software breakpoints, gdbserver and ptrace(2), on affected
> systems.
> While at it add a little piece of explanation of what CP0.Config6.SYND
> is so that people do not have to chase documentation.
This shift to MIPS_CACHE_ALIASES is not needed, a use of MIPS_CACHE_VTAG
in dcache is actually a way how to prevent some very specific situations
in 74K(E77)/1074K(E17) cache handling. It is not a case of cache
aliasing and name VTAG is used because it is related with virtual
address conversion tagging. I reused MIPS_CACHE_VTAG just to save some
spare bits in cpu_info.options and because D-cache never had virtual
tagging like I-cache.
The setting d-cache aliases then CPU hasn't it is a significant
performance loss and should be avoided.
Please don't use this patch.
- Leonid.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-17 20:12 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-17 20:12 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips
> Fix the 74K D-cache alias erratum workaround so that it actually works.
> Our current code sets MIPS_CACHE_VTAG for the D-cache, but that flag
> only has any effect for the I-cache. Additionally MIPS_CACHE_PINDEX is
> set for the D-cache if CP0.Config7.AR is also set for an affected
> processor, leading to confusing information in the bootstrap log (the
> flag isn't used beyond that).
> So delete the setting of MIPS_CACHE_VTAG and rely on MIPS_CACHE_ALIASES,
> set in a common place, removing I-cache coherency issues seen in GDB
> testing with software breakpoints, gdbserver and ptrace(2), on affected
> systems.
> While at it add a little piece of explanation of what CP0.Config6.SYND
> is so that people do not have to chase documentation.
This shift to MIPS_CACHE_ALIASES is not needed, a use of MIPS_CACHE_VTAG
in dcache is actually a way how to prevent some very specific situations
in 74K(E77)/1074K(E17) cache handling. It is not a case of cache
aliasing and name VTAG is used because it is related with virtual
address conversion tagging. I reused MIPS_CACHE_VTAG just to save some
spare bits in cpu_info.options and because D-cache never had virtual
tagging like I-cache.
The setting d-cache aliases then CPU hasn't it is a significant
performance loss and should be avoided.
Please don't use this patch.
- Leonid.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-17 23:29 ` Maciej W. Rozycki
0 siblings, 0 replies; 14+ messages in thread
From: Maciej W. Rozycki @ 2014-11-17 23:29 UTC (permalink / raw)
To: Leonid Yegoshin, Ralf Baechle; +Cc: linux-mips
On Mon, 17 Nov 2014, Leonid Yegoshin wrote:
> > Fix the 74K D-cache alias erratum workaround so that it actually works.
> > Our current code sets MIPS_CACHE_VTAG for the D-cache, but that flag
> > only has any effect for the I-cache. Additionally MIPS_CACHE_PINDEX is
> > set for the D-cache if CP0.Config7.AR is also set for an affected
> > processor, leading to confusing information in the bootstrap log (the
> > flag isn't used beyond that).
>
> > So delete the setting of MIPS_CACHE_VTAG and rely on MIPS_CACHE_ALIASES,
> > set in a common place, removing I-cache coherency issues seen in GDB
> > testing with software breakpoints, gdbserver and ptrace(2), on affected
> > systems.
>
> > While at it add a little piece of explanation of what CP0.Config6.SYND
> > is so that people do not have to chase documentation.
>
> This shift to MIPS_CACHE_ALIASES is not needed, a use of MIPS_CACHE_VTAG in
> dcache is actually a way how to prevent some very specific situations in
> 74K(E77)/1074K(E17) cache handling. It is not a case of cache aliasing and
> name VTAG is used because it is related with virtual address conversion
> tagging. I reused MIPS_CACHE_VTAG just to save some spare bits in
> cpu_info.options and because D-cache never had virtual tagging like I-cache.
>
> The setting d-cache aliases then CPU hasn't it is a significant performance
> loss and should be avoided.
>
> Please don't use this patch.
I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
flag for the D-cache, there's no single piece of code throughout the
kernel that would check the presence of this flag once it has been set
and this erratum workaround piece is the only place where the flag is
set for the D-cache. The flag is completely ignored for the D-cache and
the only existing uses of this flag are for the I-cache.
Leonid, I spent several hours chasing cache coherency issues util I
realised the workaround in its current form does not do anything, so
unless you propose an alternative fix, this change is the only way known
to bring systems affected to sanity.
Ralf, please apply the change for now, if Leonid provides us with a
better fix later on, then my patch can be reverted. Thanks.
Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-17 23:29 ` Maciej W. Rozycki
0 siblings, 0 replies; 14+ messages in thread
From: Maciej W. Rozycki @ 2014-11-17 23:29 UTC (permalink / raw)
To: Leonid Yegoshin, Ralf Baechle; +Cc: linux-mips
On Mon, 17 Nov 2014, Leonid Yegoshin wrote:
> > Fix the 74K D-cache alias erratum workaround so that it actually works.
> > Our current code sets MIPS_CACHE_VTAG for the D-cache, but that flag
> > only has any effect for the I-cache. Additionally MIPS_CACHE_PINDEX is
> > set for the D-cache if CP0.Config7.AR is also set for an affected
> > processor, leading to confusing information in the bootstrap log (the
> > flag isn't used beyond that).
>
> > So delete the setting of MIPS_CACHE_VTAG and rely on MIPS_CACHE_ALIASES,
> > set in a common place, removing I-cache coherency issues seen in GDB
> > testing with software breakpoints, gdbserver and ptrace(2), on affected
> > systems.
>
> > While at it add a little piece of explanation of what CP0.Config6.SYND
> > is so that people do not have to chase documentation.
>
> This shift to MIPS_CACHE_ALIASES is not needed, a use of MIPS_CACHE_VTAG in
> dcache is actually a way how to prevent some very specific situations in
> 74K(E77)/1074K(E17) cache handling. It is not a case of cache aliasing and
> name VTAG is used because it is related with virtual address conversion
> tagging. I reused MIPS_CACHE_VTAG just to save some spare bits in
> cpu_info.options and because D-cache never had virtual tagging like I-cache.
>
> The setting d-cache aliases then CPU hasn't it is a significant performance
> loss and should be avoided.
>
> Please don't use this patch.
I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
flag for the D-cache, there's no single piece of code throughout the
kernel that would check the presence of this flag once it has been set
and this erratum workaround piece is the only place where the flag is
set for the D-cache. The flag is completely ignored for the D-cache and
the only existing uses of this flag are for the I-cache.
Leonid, I spent several hours chasing cache coherency issues util I
realised the workaround in its current form does not do anything, so
unless you propose an alternative fix, this change is the only way known
to bring systems affected to sanity.
Ralf, please apply the change for now, if Leonid provides us with a
better fix later on, then my patch can be reverted. Thanks.
Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 0:11 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-18 0:11 UTC (permalink / raw)
To: Maciej W. Rozycki, Ralf Baechle; +Cc: linux-mips
On 11/17/2014 03:29 PM, Maciej W. Rozycki wrote:
> On Mon, 17 Nov 2014, Leonid Yegoshin wrote:
>
>>> Fix the 74K D-cache alias erratum workaround so that it actually works.
>>> Our current code sets MIPS_CACHE_VTAG for the D-cache, but that flag
>>> only has any effect for the I-cache. Additionally MIPS_CACHE_PINDEX is
>>> set for the D-cache if CP0.Config7.AR is also set for an affected
>>> processor, leading to confusing information in the bootstrap log (the
>>> flag isn't used beyond that).
>>> So delete the setting of MIPS_CACHE_VTAG and rely on MIPS_CACHE_ALIASES,
>>> set in a common place, removing I-cache coherency issues seen in GDB
>>> testing with software breakpoints, gdbserver and ptrace(2), on affected
>>> systems.
>>> While at it add a little piece of explanation of what CP0.Config6.SYND
>>> is so that people do not have to chase documentation.
>> This shift to MIPS_CACHE_ALIASES is not needed, a use of MIPS_CACHE_VTAG in
>> dcache is actually a way how to prevent some very specific situations in
>> 74K(E77)/1074K(E17) cache handling. It is not a case of cache aliasing and
>> name VTAG is used because it is related with virtual address conversion
>> tagging. I reused MIPS_CACHE_VTAG just to save some spare bits in
>> cpu_info.options and because D-cache never had virtual tagging like I-cache.
>>
>> The setting d-cache aliases then CPU hasn't it is a significant performance
>> loss and should be avoided.
>>
>> Please don't use this patch.
> I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
> flag for the D-cache, there's no single piece of code throughout the
> kernel that would check the presence of this flag once it has been set
> and this erratum workaround piece is the only place where the flag is
> set for the D-cache. The flag is completely ignored for the D-cache and
> the only existing uses of this flag are for the I-cache.
Please look into http://patchwork.linux-mips.org/patch/8459/
Look into cpu_has_vtag_dcache usage.
It is a second or 2rd (or 3rd etc) attempt to put using of that
information upstream for last 1.5 year.
>
> Leonid, I spent several hours chasing cache coherency issues util I
> realised the workaround in its current form does not do anything, so
> unless you propose an alternative fix, this change is the only way known
> to bring systems affected to sanity.
You may ask me, I work last 3 years in MIPS (now IMG) on it and did a
most of coherency fixes all that time.
- Leonid.
>
> Ralf, please apply the change for now, if Leonid provides us with a
> better fix later on, then my patch can be reverted. Thanks.
>
> Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 0:11 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-18 0:11 UTC (permalink / raw)
To: Maciej W. Rozycki, Ralf Baechle; +Cc: linux-mips
On 11/17/2014 03:29 PM, Maciej W. Rozycki wrote:
> On Mon, 17 Nov 2014, Leonid Yegoshin wrote:
>
>>> Fix the 74K D-cache alias erratum workaround so that it actually works.
>>> Our current code sets MIPS_CACHE_VTAG for the D-cache, but that flag
>>> only has any effect for the I-cache. Additionally MIPS_CACHE_PINDEX is
>>> set for the D-cache if CP0.Config7.AR is also set for an affected
>>> processor, leading to confusing information in the bootstrap log (the
>>> flag isn't used beyond that).
>>> So delete the setting of MIPS_CACHE_VTAG and rely on MIPS_CACHE_ALIASES,
>>> set in a common place, removing I-cache coherency issues seen in GDB
>>> testing with software breakpoints, gdbserver and ptrace(2), on affected
>>> systems.
>>> While at it add a little piece of explanation of what CP0.Config6.SYND
>>> is so that people do not have to chase documentation.
>> This shift to MIPS_CACHE_ALIASES is not needed, a use of MIPS_CACHE_VTAG in
>> dcache is actually a way how to prevent some very specific situations in
>> 74K(E77)/1074K(E17) cache handling. It is not a case of cache aliasing and
>> name VTAG is used because it is related with virtual address conversion
>> tagging. I reused MIPS_CACHE_VTAG just to save some spare bits in
>> cpu_info.options and because D-cache never had virtual tagging like I-cache.
>>
>> The setting d-cache aliases then CPU hasn't it is a significant performance
>> loss and should be avoided.
>>
>> Please don't use this patch.
> I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
> flag for the D-cache, there's no single piece of code throughout the
> kernel that would check the presence of this flag once it has been set
> and this erratum workaround piece is the only place where the flag is
> set for the D-cache. The flag is completely ignored for the D-cache and
> the only existing uses of this flag are for the I-cache.
Please look into http://patchwork.linux-mips.org/patch/8459/
Look into cpu_has_vtag_dcache usage.
It is a second or 2rd (or 3rd etc) attempt to put using of that
information upstream for last 1.5 year.
>
> Leonid, I spent several hours chasing cache coherency issues util I
> realised the workaround in its current form does not do anything, so
> unless you propose an alternative fix, this change is the only way known
> to bring systems affected to sanity.
You may ask me, I work last 3 years in MIPS (now IMG) on it and did a
most of coherency fixes all that time.
- Leonid.
>
> Ralf, please apply the change for now, if Leonid provides us with a
> better fix later on, then my patch can be reverted. Thanks.
>
> Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 1:16 ` Maciej W. Rozycki
0 siblings, 0 replies; 14+ messages in thread
From: Maciej W. Rozycki @ 2014-11-18 1:16 UTC (permalink / raw)
To: Leonid Yegoshin; +Cc: Ralf Baechle, linux-mips
On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
> > I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
> > flag for the D-cache, there's no single piece of code throughout the
> > kernel that would check the presence of this flag once it has been set
> > and this erratum workaround piece is the only place where the flag is
> > set for the D-cache. The flag is completely ignored for the D-cache and
> > the only existing uses of this flag are for the I-cache.
>
> Please look into http://patchwork.linux-mips.org/patch/8459/
> Look into cpu_has_vtag_dcache usage.
> It is a second or 2rd (or 3rd etc) attempt to put using of that information
> upstream for last 1.5 year.
That doesn't appear to have anything to do with ptrace(2) and
cache-coherency issues seen around software breakpoints, and the buggy
74K is the only system of all that we have that shows that problem. And
the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
actually needed here, or maybe a more lightweight alternative fix, but
an item that addresses HIGHMEM support is clearly irrelevant.
This is easily reproduced by running the GDB test suite, I'd have to
double-check old logs from before my fix, but it may be more prominent
with MIPS16 code for some reason. So I suggest that you run the GDB
test suite, for the MIPS16 multilib and on the upstream kernel as it is,
on a good and a bad 74K processor and compare results, and then see if
any of your outstanding fixes makes any changes.
The symptoms are either spurious SIGTRAP signals delivered (if a stale
breakpoint remaining in the I-cache was hit) or breakpoint misses (if a
stale original instruction remaining in the I-cache was executed).
Please note that these failures are not extensive, there'll be a couple
across the whole test suite run (several thousands tests) -- it's not
like software breakpoints are unusable.
> > Leonid, I spent several hours chasing cache coherency issues util I
> > realised the workaround in its current form does not do anything, so
> > unless you propose an alternative fix, this change is the only way known
> > to bring systems affected to sanity.
>
> You may ask me, I work last 3 years in MIPS (now IMG) on it and did a most of
> coherency fixes all that time.
Well, I-cache coherency issues around ptrace(2), specifically the
PTRACE_POKETEXT request, appear to persist -- how often do you verify
your code by getting the GDB test suite run on it (e.g. by your
toolchain team)? Which so far is I believe the best way to trigger any
I-cache coherency issues there, and I remember still seeing issues with
software breakpoints in heavily threaded code across all processors I
could check with (and GDB's Remote Serial Protocol logs indicated they
were correctly propagated down to ptrace(2), so it was all up to the
kernel to sort out), at least a 24Kc, a 74Kc and an M14Kc.
Please note that I test using gdbserver with GDB connecting remotely
from another system, I don't normally run native testing on these
boards, they are too weak for that. It is an important detail as native
testing implies much higher cache pressure (gdbserver is a lightweight
agent while GDB itself is a large blob) where incoherent cache lines are
much more likely to cure by themselves in the course of line replacement
on cache fills.
Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 1:16 ` Maciej W. Rozycki
0 siblings, 0 replies; 14+ messages in thread
From: Maciej W. Rozycki @ 2014-11-18 1:16 UTC (permalink / raw)
To: Leonid Yegoshin; +Cc: Ralf Baechle, linux-mips
On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
> > I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
> > flag for the D-cache, there's no single piece of code throughout the
> > kernel that would check the presence of this flag once it has been set
> > and this erratum workaround piece is the only place where the flag is
> > set for the D-cache. The flag is completely ignored for the D-cache and
> > the only existing uses of this flag are for the I-cache.
>
> Please look into http://patchwork.linux-mips.org/patch/8459/
> Look into cpu_has_vtag_dcache usage.
> It is a second or 2rd (or 3rd etc) attempt to put using of that information
> upstream for last 1.5 year.
That doesn't appear to have anything to do with ptrace(2) and
cache-coherency issues seen around software breakpoints, and the buggy
74K is the only system of all that we have that shows that problem. And
the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
actually needed here, or maybe a more lightweight alternative fix, but
an item that addresses HIGHMEM support is clearly irrelevant.
This is easily reproduced by running the GDB test suite, I'd have to
double-check old logs from before my fix, but it may be more prominent
with MIPS16 code for some reason. So I suggest that you run the GDB
test suite, for the MIPS16 multilib and on the upstream kernel as it is,
on a good and a bad 74K processor and compare results, and then see if
any of your outstanding fixes makes any changes.
The symptoms are either spurious SIGTRAP signals delivered (if a stale
breakpoint remaining in the I-cache was hit) or breakpoint misses (if a
stale original instruction remaining in the I-cache was executed).
Please note that these failures are not extensive, there'll be a couple
across the whole test suite run (several thousands tests) -- it's not
like software breakpoints are unusable.
> > Leonid, I spent several hours chasing cache coherency issues util I
> > realised the workaround in its current form does not do anything, so
> > unless you propose an alternative fix, this change is the only way known
> > to bring systems affected to sanity.
>
> You may ask me, I work last 3 years in MIPS (now IMG) on it and did a most of
> coherency fixes all that time.
Well, I-cache coherency issues around ptrace(2), specifically the
PTRACE_POKETEXT request, appear to persist -- how often do you verify
your code by getting the GDB test suite run on it (e.g. by your
toolchain team)? Which so far is I believe the best way to trigger any
I-cache coherency issues there, and I remember still seeing issues with
software breakpoints in heavily threaded code across all processors I
could check with (and GDB's Remote Serial Protocol logs indicated they
were correctly propagated down to ptrace(2), so it was all up to the
kernel to sort out), at least a 24Kc, a 74Kc and an M14Kc.
Please note that I test using gdbserver with GDB connecting remotely
from another system, I don't normally run native testing on these
boards, they are too weak for that. It is an important detail as native
testing implies much higher cache pressure (gdbserver is a lightweight
agent while GDB itself is a large blob) where incoherent cache lines are
much more likely to cure by themselves in the course of line replacement
on cache fills.
Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 3:56 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-18 3:56 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips
On 11/17/2014 05:16 PM, Maciej W. Rozycki wrote:
> On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
>
>>> I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
>>> flag for the D-cache, there's no single piece of code throughout the
>>> kernel that would check the presence of this flag once it has been set
>>> and this erratum workaround piece is the only place where the flag is
>>> set for the D-cache. The flag is completely ignored for the D-cache and
>>> the only existing uses of this flag are for the I-cache.
>> Please look into http://patchwork.linux-mips.org/patch/8459/
>> Look into cpu_has_vtag_dcache usage.
>> It is a second or 2rd (or 3rd etc) attempt to put using of that information
>> upstream for last 1.5 year.
> That doesn't appear to have anything to do with ptrace(2) and
> cache-coherency issues seen around software breakpoints, and the buggy
> 74K is the only system of all that we have that shows that problem. And
> the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
> actually needed here, or maybe a more lightweight alternative fix, but
> an item that addresses HIGHMEM support is clearly irrelevant.
Again, the 74K errata has deal only if there is two mappings and one of
that mappings switches to the same physaddr. In other words - TLB change
is needed (some another conditions are also needed). It has nothing with
generic cache aliasing. Switching ON cache aliasing support just masks
the issue behind massive cache flushes.
If you have problem with ptrace(2) then it points to incorrect result of
copy_to_user_page(), and most probably - with kmap() work. That HIGHMEM
patch there takes care about address aliasings, so I assumed I took that
case into account too but something may changes. I think it has sense to
put the check of cpu_has_vtag_dcache in copy_to_user_page() - it
definitely will enforce cache flashing after ptrace() write aka
__access_remote_vm(..., true) and it doesn't harm the rest of system.
And retest.
But it is needed to do after http://patchwork.linux-mips.org/patch/8459/
fix, without it it is futile.
> This is easily reproduced by running the GDB test suite, I'd have to
> double-check old logs from before my fix, but it may be more prominent
> with MIPS16 code for some reason. So I suggest that you run the GDB
> test suite, for the MIPS16 multilib and on the upstream kernel as it is,
> on a good and a bad 74K processor and compare results, and then see if
> any of your outstanding fixes makes any changes.
>
> The symptoms are either spurious SIGTRAP signals delivered (if a stale
> breakpoint remaining in the I-cache was hit) or breakpoint misses (if a
> stale original instruction remaining in the I-cache was executed).
> Please note that these failures are not extensive, there'll be a couple
> across the whole test suite run (several thousands tests) -- it's not
> like software breakpoints are unusable.
>
>>> Leonid, I spent several hours chasing cache coherency issues util I
>>> realised the workaround in its current form does not do anything, so
>>> unless you propose an alternative fix, this change is the only way known
>>> to bring systems affected to sanity.
>> You may ask me, I work last 3 years in MIPS (now IMG) on it and did a most of
>> coherency fixes all that time.
> Well, I-cache coherency issues around ptrace(2), specifically the
> PTRACE_POKETEXT request, appear to persist -- how often do you verify
> your code by getting the GDB test suite run on it (e.g. by your
> toolchain team)? Which so far is I believe the best way to trigger any
> I-cache coherency issues there, and I remember still seeing issues with
> software breakpoints in heavily threaded code across all processors I
> could check with (and GDB's Remote Serial Protocol logs indicated they
> were correctly propagated down to ptrace(2), so it was all up to the
> kernel to sort out), at least a 24Kc, a 74Kc and an M14Kc.
>
> Please note that I test using gdbserver with GDB connecting remotely
> from another system, I don't normally run native testing on these
> boards, they are too weak for that. It is an important detail as native
> testing implies much higher cache pressure (gdbserver is a lightweight
> agent while GDB itself is a large blob) where incoherent cache lines are
> much more likely to cure by themselves in the course of line replacement
> on cache fills.
Excuse me, I need more detailed instructions "how to", I am not GDB guru.
- Leonid Yegoshin.
>
> Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 3:56 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-18 3:56 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips
On 11/17/2014 05:16 PM, Maciej W. Rozycki wrote:
> On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
>
>>> I repeat, there is no use in the current kernel of the MIPS_CACHE_VTAG
>>> flag for the D-cache, there's no single piece of code throughout the
>>> kernel that would check the presence of this flag once it has been set
>>> and this erratum workaround piece is the only place where the flag is
>>> set for the D-cache. The flag is completely ignored for the D-cache and
>>> the only existing uses of this flag are for the I-cache.
>> Please look into http://patchwork.linux-mips.org/patch/8459/
>> Look into cpu_has_vtag_dcache usage.
>> It is a second or 2rd (or 3rd etc) attempt to put using of that information
>> upstream for last 1.5 year.
> That doesn't appear to have anything to do with ptrace(2) and
> cache-coherency issues seen around software breakpoints, and the buggy
> 74K is the only system of all that we have that shows that problem. And
> the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
> actually needed here, or maybe a more lightweight alternative fix, but
> an item that addresses HIGHMEM support is clearly irrelevant.
Again, the 74K errata has deal only if there is two mappings and one of
that mappings switches to the same physaddr. In other words - TLB change
is needed (some another conditions are also needed). It has nothing with
generic cache aliasing. Switching ON cache aliasing support just masks
the issue behind massive cache flushes.
If you have problem with ptrace(2) then it points to incorrect result of
copy_to_user_page(), and most probably - with kmap() work. That HIGHMEM
patch there takes care about address aliasings, so I assumed I took that
case into account too but something may changes. I think it has sense to
put the check of cpu_has_vtag_dcache in copy_to_user_page() - it
definitely will enforce cache flashing after ptrace() write aka
__access_remote_vm(..., true) and it doesn't harm the rest of system.
And retest.
But it is needed to do after http://patchwork.linux-mips.org/patch/8459/
fix, without it it is futile.
> This is easily reproduced by running the GDB test suite, I'd have to
> double-check old logs from before my fix, but it may be more prominent
> with MIPS16 code for some reason. So I suggest that you run the GDB
> test suite, for the MIPS16 multilib and on the upstream kernel as it is,
> on a good and a bad 74K processor and compare results, and then see if
> any of your outstanding fixes makes any changes.
>
> The symptoms are either spurious SIGTRAP signals delivered (if a stale
> breakpoint remaining in the I-cache was hit) or breakpoint misses (if a
> stale original instruction remaining in the I-cache was executed).
> Please note that these failures are not extensive, there'll be a couple
> across the whole test suite run (several thousands tests) -- it's not
> like software breakpoints are unusable.
>
>>> Leonid, I spent several hours chasing cache coherency issues util I
>>> realised the workaround in its current form does not do anything, so
>>> unless you propose an alternative fix, this change is the only way known
>>> to bring systems affected to sanity.
>> You may ask me, I work last 3 years in MIPS (now IMG) on it and did a most of
>> coherency fixes all that time.
> Well, I-cache coherency issues around ptrace(2), specifically the
> PTRACE_POKETEXT request, appear to persist -- how often do you verify
> your code by getting the GDB test suite run on it (e.g. by your
> toolchain team)? Which so far is I believe the best way to trigger any
> I-cache coherency issues there, and I remember still seeing issues with
> software breakpoints in heavily threaded code across all processors I
> could check with (and GDB's Remote Serial Protocol logs indicated they
> were correctly propagated down to ptrace(2), so it was all up to the
> kernel to sort out), at least a 24Kc, a 74Kc and an M14Kc.
>
> Please note that I test using gdbserver with GDB connecting remotely
> from another system, I don't normally run native testing on these
> boards, they are too weak for that. It is an important detail as native
> testing implies much higher cache pressure (gdbserver is a lightweight
> agent while GDB itself is a large blob) where incoherent cache lines are
> much more likely to cure by themselves in the course of line replacement
> on cache fills.
Excuse me, I need more detailed instructions "how to", I am not GDB guru.
- Leonid Yegoshin.
>
> Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 4:56 ` Maciej W. Rozycki
0 siblings, 0 replies; 14+ messages in thread
From: Maciej W. Rozycki @ 2014-11-18 4:56 UTC (permalink / raw)
To: Leonid Yegoshin; +Cc: Ralf Baechle, linux-mips
On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
> > That doesn't appear to have anything to do with ptrace(2) and
> > cache-coherency issues seen around software breakpoints, and the buggy
> > 74K is the only system of all that we have that shows that problem. And
> > the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
> > actually needed here, or maybe a more lightweight alternative fix, but
> > an item that addresses HIGHMEM support is clearly irrelevant.
>
> Again, the 74K errata has deal only if there is two mappings and one of that
> mappings switches to the same physaddr. In other words - TLB change is needed
> (some another conditions are also needed). It has nothing with generic cache
> aliasing. Switching ON cache aliasing support just masks the issue behind
> massive cache flushes.
For the erratum to trigger do the two mappings absolutely have to go
through the TLB or can one of them be via the TLB and the other one via
a fixed mapping by the means of KSEG0? This will be the case here.
And I understand what the consequence of setting MIPS_CACHE_ALIASES is,
but I also insist that correctness is more important than performance,
so we need to make sure that the kernel performs reliably even if that
comes at a cost. And the cost may be higher than necessary at the
beginning, but that will be the right starting point for further
improvements.
> If you have problem with ptrace(2) then it points to incorrect result of
> copy_to_user_page(), and most probably - with kmap() work. That HIGHMEM patch
> there takes care about address aliasings, so I assumed I took that case into
> account too but something may changes. I think it has sense to put the check
> of cpu_has_vtag_dcache in copy_to_user_page() - it definitely will enforce
> cache flashing after ptrace() write aka __access_remote_vm(..., true) and it
> doesn't harm the rest of system. And retest.
>
> But it is needed to do after http://patchwork.linux-mips.org/patch/8459/ fix,
> without it it is futile.
I think burying fixes for ordinary accesses among HIGHMEM pieces does
not really help, would you be able to split off pieces relevant for
non-HIGHMEM configurations, such as those for the 74K workaround, from
your big change? I think it would make it easier to get this part
accepted, and the remaining pieces would then shrink too, also making
them easier to review and accept.
Also please note that writing a good change description is important
for patch acceptance, sometimes it may even be more important than the
change itself, that can be small, but unobvious. The reviewer has to be
able to infer from the explanation and understand what problem the
change addresses, what the consequences of the change are and why the
problem has been addressed in this way and not another, possibly down to
the individual code fragments if necessary. In my opinion your change
seems somewhat lacking here I am afraid.
Consider this like attending an exam where the patch is your solution
to a problem you have been given to solve: what would you do to persuade
your examiner to give you a good grade? Here the patch reviewer is your
examiner.
And I have just noticed this 1/8 part is further buried in a series
supposedly addressing XPA support on r5 cores which has even less to do
with the 74K and its erratum than it might seem by looking at this
single part only. :(
> > Well, I-cache coherency issues around ptrace(2), specifically the
> > PTRACE_POKETEXT request, appear to persist -- how often do you verify
> > your code by getting the GDB test suite run on it (e.g. by your
> > toolchain team)? Which so far is I believe the best way to trigger any
> > I-cache coherency issues there, and I remember still seeing issues with
> > software breakpoints in heavily threaded code across all processors I
> > could check with (and GDB's Remote Serial Protocol logs indicated they
> > were correctly propagated down to ptrace(2), so it was all up to the
> > kernel to sort out), at least a 24Kc, a 74Kc and an M14Kc.
> >
> > Please note that I test using gdbserver with GDB connecting remotely
> > from another system, I don't normally run native testing on these
> > boards, they are too weak for that. It is an important detail as native
> > testing implies much higher cache pressure (gdbserver is a lightweight
> > agent while GDB itself is a large blob) where incoherent cache lines are
> > much more likely to cure by themselves in the course of line replacement
> > on cache fills.
>
> Excuse me, I need more detailed instructions "how to", I am not GDB guru.
Understood, but in this case you'll have to consult your toolchain team
I am afraid, either to instruct you or to run tests for you. Running
the GDB test suite on a remote system is not straightforward (as e.g.
`make check' would be) and will depend on how your infrastructure has
been wired. And I don't know what you have available. Sorry I couldn't
help much with this.
Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 4:56 ` Maciej W. Rozycki
0 siblings, 0 replies; 14+ messages in thread
From: Maciej W. Rozycki @ 2014-11-18 4:56 UTC (permalink / raw)
To: Leonid Yegoshin; +Cc: Ralf Baechle, linux-mips
On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
> > That doesn't appear to have anything to do with ptrace(2) and
> > cache-coherency issues seen around software breakpoints, and the buggy
> > 74K is the only system of all that we have that shows that problem. And
> > the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
> > actually needed here, or maybe a more lightweight alternative fix, but
> > an item that addresses HIGHMEM support is clearly irrelevant.
>
> Again, the 74K errata has deal only if there is two mappings and one of that
> mappings switches to the same physaddr. In other words - TLB change is needed
> (some another conditions are also needed). It has nothing with generic cache
> aliasing. Switching ON cache aliasing support just masks the issue behind
> massive cache flushes.
For the erratum to trigger do the two mappings absolutely have to go
through the TLB or can one of them be via the TLB and the other one via
a fixed mapping by the means of KSEG0? This will be the case here.
And I understand what the consequence of setting MIPS_CACHE_ALIASES is,
but I also insist that correctness is more important than performance,
so we need to make sure that the kernel performs reliably even if that
comes at a cost. And the cost may be higher than necessary at the
beginning, but that will be the right starting point for further
improvements.
> If you have problem with ptrace(2) then it points to incorrect result of
> copy_to_user_page(), and most probably - with kmap() work. That HIGHMEM patch
> there takes care about address aliasings, so I assumed I took that case into
> account too but something may changes. I think it has sense to put the check
> of cpu_has_vtag_dcache in copy_to_user_page() - it definitely will enforce
> cache flashing after ptrace() write aka __access_remote_vm(..., true) and it
> doesn't harm the rest of system. And retest.
>
> But it is needed to do after http://patchwork.linux-mips.org/patch/8459/ fix,
> without it it is futile.
I think burying fixes for ordinary accesses among HIGHMEM pieces does
not really help, would you be able to split off pieces relevant for
non-HIGHMEM configurations, such as those for the 74K workaround, from
your big change? I think it would make it easier to get this part
accepted, and the remaining pieces would then shrink too, also making
them easier to review and accept.
Also please note that writing a good change description is important
for patch acceptance, sometimes it may even be more important than the
change itself, that can be small, but unobvious. The reviewer has to be
able to infer from the explanation and understand what problem the
change addresses, what the consequences of the change are and why the
problem has been addressed in this way and not another, possibly down to
the individual code fragments if necessary. In my opinion your change
seems somewhat lacking here I am afraid.
Consider this like attending an exam where the patch is your solution
to a problem you have been given to solve: what would you do to persuade
your examiner to give you a good grade? Here the patch reviewer is your
examiner.
And I have just noticed this 1/8 part is further buried in a series
supposedly addressing XPA support on r5 cores which has even less to do
with the 74K and its erratum than it might seem by looking at this
single part only. :(
> > Well, I-cache coherency issues around ptrace(2), specifically the
> > PTRACE_POKETEXT request, appear to persist -- how often do you verify
> > your code by getting the GDB test suite run on it (e.g. by your
> > toolchain team)? Which so far is I believe the best way to trigger any
> > I-cache coherency issues there, and I remember still seeing issues with
> > software breakpoints in heavily threaded code across all processors I
> > could check with (and GDB's Remote Serial Protocol logs indicated they
> > were correctly propagated down to ptrace(2), so it was all up to the
> > kernel to sort out), at least a 24Kc, a 74Kc and an M14Kc.
> >
> > Please note that I test using gdbserver with GDB connecting remotely
> > from another system, I don't normally run native testing on these
> > boards, they are too weak for that. It is an important detail as native
> > testing implies much higher cache pressure (gdbserver is a lightweight
> > agent while GDB itself is a large blob) where incoherent cache lines are
> > much more likely to cure by themselves in the course of line replacement
> > on cache fills.
>
> Excuse me, I need more detailed instructions "how to", I am not GDB guru.
Understood, but in this case you'll have to consult your toolchain team
I am afraid, either to instruct you or to run tests for you. Running
the GDB test suite on a remote system is not straightforward (as e.g.
`make check' would be) and will depend on how your infrastructure has
been wired. And I don't know what you have available. Sorry I couldn't
help much with this.
Maciej
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 19:17 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-18 19:17 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips
On 11/17/2014 08:56 PM, Maciej W. Rozycki wrote:
> On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
>
>>> That doesn't appear to have anything to do with ptrace(2) and
>>> cache-coherency issues seen around software breakpoints, and the buggy
>>> 74K is the only system of all that we have that shows that problem. And
>>> the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
>>> actually needed here, or maybe a more lightweight alternative fix, but
>>> an item that addresses HIGHMEM support is clearly irrelevant.
>> Again, the 74K errata has deal only if there is two mappings and one of that
>> mappings switches to the same physaddr. In other words - TLB change is needed
>> (some another conditions are also needed). It has nothing with generic cache
>> aliasing. Switching ON cache aliasing support just masks the issue behind
>> massive cache flushes.
> For the erratum to trigger do the two mappings absolutely have to go
> through the TLB or can one of them be via the TLB and the other one via
> a fixed mapping by the means of KSEG0?
One of mapping can be KSEG0 fixed.
> This will be the case here.
It was considered as not - KSEG0 mapping is not changed and user mapping
actually is stable and is not changed during ptrace(2).
But change of one TLB is required (with accesses via both and with race
condition with other mapping).
It is a very specific case and there are only two places which may have
impact - CoW and page clearing before it is submitted to user.
>
> And I understand what the consequence of setting MIPS_CACHE_ALIASES is,
> but I also insist that correctness is more important than performance,
> so we need to make sure that the kernel performs reliably even if that
> comes at a cost. And the cost may be higher than necessary at the
> beginning, but that will be the right starting point for further
> improvements.
I suspect you may have a problem with something else but not 74K errata
and switching on cache aliasing flushes just hides a problem. Why you
see it on 74K only - because it is out-of-order CPU. Did you test on
proAptiv or newest P5600, it has a similar out-of-order design?
>> If you have problem with ptrace(2) then it points to incorrect result of
>> copy_to_user_page(), and most probably - with kmap() work. That HIGHMEM patch
>> there takes care about address aliasings, so I assumed I took that case into
>> account too but something may changes. I think it has sense to put the check
>> of cpu_has_vtag_dcache in copy_to_user_page() - it definitely will enforce
>> cache flashing after ptrace() write aka __access_remote_vm(..., true) and it
>> doesn't harm the rest of system. And retest.
>>
>> But it is needed to do after http://patchwork.linux-mips.org/patch/8459/ fix,
>> without it it is futile.
> I think burying fixes for ordinary accesses among HIGHMEM pieces does
> not really help, would you be able to split off pieces relevant for
> non-HIGHMEM configurations, such as those for the 74K workaround, from
> your big change? I think it would make it easier to get this part
> accepted, and the remaining pieces would then shrink too, also making
> them easier to review and accept.
It is related with HIGHMEM because (at least 3 years ago) the HIGHMEM
had a high possibility to hit it.
However, it was a series of some stuff which was sequentially ported
from 2.6.32.15 to 2.6.32.9 and so on but never passes upstream and
because of that it suffers a serious change in time.
- Leonid.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround
@ 2014-11-18 19:17 ` Leonid Yegoshin
0 siblings, 0 replies; 14+ messages in thread
From: Leonid Yegoshin @ 2014-11-18 19:17 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, linux-mips
On 11/17/2014 08:56 PM, Maciej W. Rozycki wrote:
> On Tue, 18 Nov 2014, Leonid Yegoshin wrote:
>
>>> That doesn't appear to have anything to do with ptrace(2) and
>>> cache-coherency issues seen around software breakpoints, and the buggy
>>> 74K is the only system of all that we have that shows that problem. And
>>> the problem goes away with my fix. So perhaps MIPS_CACHE_ALIASES is
>>> actually needed here, or maybe a more lightweight alternative fix, but
>>> an item that addresses HIGHMEM support is clearly irrelevant.
>> Again, the 74K errata has deal only if there is two mappings and one of that
>> mappings switches to the same physaddr. In other words - TLB change is needed
>> (some another conditions are also needed). It has nothing with generic cache
>> aliasing. Switching ON cache aliasing support just masks the issue behind
>> massive cache flushes.
> For the erratum to trigger do the two mappings absolutely have to go
> through the TLB or can one of them be via the TLB and the other one via
> a fixed mapping by the means of KSEG0?
One of mapping can be KSEG0 fixed.
> This will be the case here.
It was considered as not - KSEG0 mapping is not changed and user mapping
actually is stable and is not changed during ptrace(2).
But change of one TLB is required (with accesses via both and with race
condition with other mapping).
It is a very specific case and there are only two places which may have
impact - CoW and page clearing before it is submitted to user.
>
> And I understand what the consequence of setting MIPS_CACHE_ALIASES is,
> but I also insist that correctness is more important than performance,
> so we need to make sure that the kernel performs reliably even if that
> comes at a cost. And the cost may be higher than necessary at the
> beginning, but that will be the right starting point for further
> improvements.
I suspect you may have a problem with something else but not 74K errata
and switching on cache aliasing flushes just hides a problem. Why you
see it on 74K only - because it is out-of-order CPU. Did you test on
proAptiv or newest P5600, it has a similar out-of-order design?
>> If you have problem with ptrace(2) then it points to incorrect result of
>> copy_to_user_page(), and most probably - with kmap() work. That HIGHMEM patch
>> there takes care about address aliasings, so I assumed I took that case into
>> account too but something may changes. I think it has sense to put the check
>> of cpu_has_vtag_dcache in copy_to_user_page() - it definitely will enforce
>> cache flashing after ptrace() write aka __access_remote_vm(..., true) and it
>> doesn't harm the rest of system. And retest.
>>
>> But it is needed to do after http://patchwork.linux-mips.org/patch/8459/ fix,
>> without it it is futile.
> I think burying fixes for ordinary accesses among HIGHMEM pieces does
> not really help, would you be able to split off pieces relevant for
> non-HIGHMEM configurations, such as those for the 74K workaround, from
> your big change? I think it would make it easier to get this part
> accepted, and the remaining pieces would then shrink too, also making
> them easier to review and accept.
It is related with HIGHMEM because (at least 3 years ago) the HIGHMEM
had a high possibility to hit it.
However, it was a series of some stuff which was sequentially ported
from 2.6.32.15 to 2.6.32.9 and so on but never passes upstream and
because of that it suffers a serious change in time.
- Leonid.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-11-18 19:17 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-17 20:12 MIPS: c-r4k.c: Fix the 74K D-cache alias erratum workaround Leonid Yegoshin
2014-11-17 20:12 ` Leonid Yegoshin
2014-11-17 23:29 ` Maciej W. Rozycki
2014-11-17 23:29 ` Maciej W. Rozycki
2014-11-18 0:11 ` Leonid Yegoshin
2014-11-18 0:11 ` Leonid Yegoshin
2014-11-18 1:16 ` Maciej W. Rozycki
2014-11-18 1:16 ` Maciej W. Rozycki
2014-11-18 3:56 ` Leonid Yegoshin
2014-11-18 3:56 ` Leonid Yegoshin
2014-11-18 4:56 ` Maciej W. Rozycki
2014-11-18 4:56 ` Maciej W. Rozycki
2014-11-18 19:17 ` Leonid Yegoshin
2014-11-18 19:17 ` Leonid Yegoshin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.