Query about: ARM11 MPCore: preemption/task migration cache coherency

linux-arm-kernel.lists.infradead.org archive mirror

* Query about: ARM11 MPCore: preemption/task migration cache coherency
@ 2012-05-09  9:11 bill4carson
  2012-05-11  8:51 ` Will Deacon
  2012-06-05  4:06 ` George G. Davis
  0 siblings, 2 replies; 29+ messages in thread
From: bill4carson @ 2012-05-09  9:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, All

I'm using an ARM11 MPCore on linux-2.6.34 and, unfortunately, I see random
panics and segmentation faults related to task migration. I noticed a patch
set [ARM11 MPCore: preemption/task migration cache coherency fixups] that is
meant to fix such issues here:

http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html

There seem to be no follow-ups. Is there an official patch to fix such
issues?

thanks

-- 
Love each day!

--bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-09  9:11 Query about: ARM11 MPCore: preemption/task migration cache coherency bill4carson
@ 2012-05-11  8:51 ` Will Deacon
  2012-05-11  9:53   ` bill4carson
  2012-05-29  5:28   ` bill4carson
  2012-06-05  4:06 ` George G. Davis
  1 sibling, 2 replies; 29+ messages in thread
From: Will Deacon @ 2012-05-11  8:51 UTC (permalink / raw)
  To: linux-arm-kernel

Bill,

On Wed, May 09, 2012 at 10:11:05AM +0100, bill4carson wrote:
> I'm using ARM11 MPCore on linux-2.6.34, unfortunately I have random
> panic/segment fault with task migration. I noticed a patch set
> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
> such issues in here:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
> 
> It seems there is no follow ups, is there official patch to fix such
> issues?

Let's be honest: you haven't given us a lot to go on here. Perhaps you could
answer the following?

(1) Do you experience the same issues with a more recent kernel?
(2) If you apply the patches linked to above, does it fix your problem?
(3) If you can reproduce on current mainline, do you have a testcase?
(4) Does disabling CONFIG_PREEMPT make the problem disappear?

That should provide us with some information about the problem.

Thanks,

Will

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-11  8:51 ` Will Deacon
@ 2012-05-11  9:53   ` bill4carson
  2012-05-29  5:28   ` bill4carson
  1 sibling, 0 replies; 29+ messages in thread
From: bill4carson @ 2012-05-11  9:53 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-11 16:51, Will Deacon wrote:
> Bill,
>
> On Wed, May 09, 2012 at 10:11:05AM +0100, bill4carson wrote:
>> I'm using ARM11 MPCore on linux-2.6.34, unfortunately I have random
>> panic/segment fault with task migration. I noticed a patch set
>> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
>> such issues in here:
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
>>
>> It seems there is no follow ups, is there official patch to fix such
>> issues?
>
> Let's be honest: you haven't given us a lot to go on here. Perhaps you could
> answer the following?
>

Hi, Will

First, thanks for your reply.


> (1) Do you experience the same issues with a more recent kernel?

This ARM11 MPCore based SoC is not merged into the mainline kernel. To make
things worse, we do not have any other ARM11 MPCore based SoC to verify
whether this issue is BSP related or a generic ARM core code issue.


> (2) If you apply the patches linked to above, does it fix your problem?
No. We applied the v4 patch set, plus the minor fixes that followed in the
discussion, and the problem still exists.

> (3) If you can reproduce on current mainline, do you have a testcase?
We are trying to contact the vendor to verify this, but that is a bit
slow, so I sent this noisy query to the mailing list in the hope of finding
a way forward on this issue.

> (4) Does disabling CONFIG_PREEMPT make the problem disappear?

No, the problem still exists.

For now, I can only forward some error messages.

Steps to reproduce:

# cat /opt/test_yaffs1.sh
#!/bin/sh

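for i in `seq $1`
do
	sync
	# drop page cache, dentries and inodes so the files are read again
	echo 3 > /proc/sys/vm/drop_caches
	# read four files in parallel so the readers can be migrated
	cat /cat >/dev/null &
	cat /wrsv-lttctl >/dev/null &
	cat /wrsv-lttd >/dev/null &
	cat /busybox >/dev/null &
	wait
	echo debug ---"$i"---
done

# echo 0 > /proc/sys/kernel/sched_migration_cost
# /opt/test_yaffs1.sh 10000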


Some of the different errors:

*** glibc detected *** cat: munmap_chunk(): invalid pointer: 0xbeaa0f13 ***

[10870.314465] Unhandled fault: alignment exception (0x821) at 0xebfffee6
[10870.315347] Unhandled fault: alignment exception (0x821) at 0xebfffee6
[10870.354917] Unhandled fault: alignment exception (0x821) at 0xebfffee6
------------------------------------------------------------------------

[13758.255564] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.256459] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.256522] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
[13758.319272] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.340860] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
[13758.361612] Unhandled fault: alignment exception (0x811) at 0x0019f1aa




When migration is disabled with:

echo -1 > /proc/sys/kernel/sched_migration_cost

all the error messages are gone.


-------------------------------------------------------



>
> That should provide us with some information about the problem.
>




> Thanks,
>
> Will
>

-- 
Love each day!

--bill

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-11  8:51 ` Will Deacon
  2012-05-11  9:53   ` bill4carson
@ 2012-05-29  5:28   ` bill4carson
  2012-05-30  6:38     ` Will Deacon
  1 sibling, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-05-29  5:28 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-11 16:51, Will Deacon wrote:
> Bill,
>
> On Wed, May 09, 2012 at 10:11:05AM +0100, bill4carson wrote:
>> I'm using ARM11 MPCore on linux-2.6.34, unfortunately I have random
>> panic/segment fault with task migration. I noticed a patch set
>> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
>> such issues in here:
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
>>
>> It seems there is no follow ups, is there official patch to fix such
>> issues?
>
> Let's be honest: you haven't given us a lot to go on here. Perhaps you could
> answer the following?
>
> (1) Do you experience the same issues with a more recent kernel?
> (2) If you apply the patches linked to above, does it fix your problem?
> (3) If you can reproduce on current mainline, do you have a testcase?
> (4) Does disabling CONFIG_PREEMPT make the problem disappear?
>
> That should provide us with some information about the problem.
>



Based on the limitation of CP15 of ARM11 MPCore:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0228a/index.html

The ARM11 MPCore SCU does not handle coherency consequences of CP15 
cache operations like clean and invalidate. If these operations are 
performed on one CPU, they do not affect the state of a cache line on a 
different CPU.  This can result in unexpected behavior if, say, a line 
is cleaned/invalidated but a subsequent access hits a stale copy in 
another CPU's L1 through snooping the "coherency domain".

The kernel option CONFIG_DMA_CACHE_RWFO was introduced to work around this.
For details see arch/arm/mm/cache-v6.S:
...
#ifdef CONFIG_DMA_CACHE_RWFO
         ldr     r2, [r0]                        @ read for ownership
         str     r2, [r0]                        @ write for ownership
#endif
...
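
To illustrate why the read/write-for-ownership pair matters, here is a toy
Python model. It is not kernel code: the class and method names are made up
for this sketch, and it assumes only what is described above, namely that the
SCU keeps ordinary loads and stores coherent between the L1 D-caches while
CP15 maintenance touches the local cache only.

```python
class ToyMPCore:
    """Toy model: per-CPU write-back D-caches, coherent for loads/stores,
    with cache maintenance that only affects the issuing CPU."""

    def __init__(self, ncpus=2):
        self.memory = {}                               # addr -> value in RAM
        self.dcache = [dict() for _ in range(ncpus)]   # per-CPU dirty lines

    def store(self, cpu, addr, value):
        # A store takes ownership: every other CPU's copy is dropped.
        for other, cache in enumerate(self.dcache):
            if other != cpu:
                cache.pop(addr, None)
        self.dcache[cpu][addr] = value

    def load(self, cpu, addr):
        # A load looks in the local cache, then snoops the others, then RAM.
        for cache in [self.dcache[cpu]] + self.dcache:
            if addr in cache:
                return cache[addr]
        return self.memory.get(addr, 0)

    def clean_inv_local(self, cpu, addr):
        # CP15 clean & invalidate: acts only on a line held by *this* CPU.
        if addr in self.dcache[cpu]:
            self.memory[addr] = self.dcache[cpu].pop(addr)

    def clean_inv_rwfo(self, cpu, addr):
        # Read then write for ownership pulls the line into the local
        # cache, so the local maintenance operation can then write it back.
        self.store(cpu, addr, self.load(cpu, addr))
        self.clean_inv_local(cpu, addr)


soc = ToyMPCore()
soc.store(0, 0x1000, 42)             # CPU0 dirties the line
soc.clean_inv_local(1, 0x1000)       # CPU1 flushes: RAM stays stale
stale = soc.memory.get(0x1000, 0)
soc.clean_inv_rwfo(1, 0x1000)        # with RWFO the dirty data reaches RAM
fresh = soc.memory[0x1000]
```

In this model `stale` is the old RAM content and `fresh` is the value CPU0
wrote, which is the behaviour the two extra instructions are meant to buy.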

I think:
1) Similar protection was not added to the other data cache handlers, such
as v6_coherent_kern_range and v6_flush_kern_cache_all.


Here I modified v6_flush_kern_dcache_area as follows:
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
  ENTRY(v6_flush_kern_dcache_area)
         add     r1, r0, r1
  1:
+#ifdef CONFIG_SMP
+       ldr     r2, [r0]                        @ read for ownership
+       str     r2, [r0]                        @ write for ownership
+#endif /* CONFIG_SMP */
  #ifdef HARVARD_CACHE
         mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
  #else

But I have no idea how to do the same for v6_flush_kern_cache_all;
maybe an IPI is needed?

Any opinions?

With the little patch above, a strange issue on my side
*disappeared*. The issue is that, with task migration enabled, strange
errors may occur, such as segmentation faults and:

*** glibc detected *** cat: munmap_chunk(): invalid pointer: 0xbeaa0f13 ***

[10870.314465] Unhandled fault: alignment exception (0x821) at 0xebfffee6
[10870.315347] Unhandled fault: alignment exception (0x821) at 0xebfffee6
[10870.354917] Unhandled fault: alignment exception (0x821) at 0xebfffee6

cat: invalid number 'cat'
cat: invalid number 'cat'
cat: invalid number 'cat'
cat: invalid number 'cat'

cat: \x13\x1f??\x17\x1f??: No such file or directory
cat: \x17??^[??: No such file or directory

cat: can't open ' ?*?D?_?f?o?|???????????': No such file or directory
cat: applet not found

cat: (null): Invalid argument

cat: (null): Bad address
cat: (null): Bad address

cat: \x11o??\x15o??: No such file or directory
cat: applet not found
cat: ???\x13???: No such file or directory

cat: unknown user /busybox

cat: \x17???^[???: No such file or directory
cat: applet not found

cat: ???$???: Bad address
cat: \x1a???
???: Bad address
cat: \x18\x1f??
\x1f??: Bad address

[13758.255564] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.256459] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.256522] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
[13758.319272] Alignment trap: not handling instruction e1a00001 at
[<00033e38>]
[13758.340860] Unhandled fault: alignment exception (0x811) at 0x0019f1aa
[13758.361612] Unhandled fault: alignment exception (0x811) at 0x0019f1aa




-- 
Love each day!

--bill

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-29  5:28   ` bill4carson
@ 2012-05-30  6:38     ` Will Deacon
  2012-05-30 10:01       ` bill4carson
  0 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2012-05-30  6:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
> --- a/arch/arm/mm/cache-v6.S
> +++ b/arch/arm/mm/cache-v6.S
> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>   ENTRY(v6_flush_kern_dcache_area)
>          add     r1, r0, r1
>   1:
> +#ifdef CONFIG_SMP
> +       ldr     r2, [r0]                        @ read for ownership
> +       str     r2, [r0]                        @ write for ownership
> +#endif /* CONFIG_SMP */
>   #ifdef HARVARD_CACHE
>          mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
>   #else

I don't think the invalidation is needed here, so you probably don't need to
hack this function at all.

> But I have no idea on how to accomplish the v6_flush_kern_cache_all, 
> maybe IPI is needed?

We could add an IPI to invalidate the I-caches on the other cores; however,
I haven't checked whether we could instead do something on the CPU
migration path which avoids the need for the broadcasting.

Will

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-30  6:38     ` Will Deacon
@ 2012-05-30 10:01       ` bill4carson
  2012-05-31  3:00         ` Catalin Marinas
  0 siblings, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-05-30 10:01 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, Will

First of all, thanks for your attention to this issue.


On 2012-05-30 14:38, Will Deacon wrote:
> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
>> --- a/arch/arm/mm/cache-v6.S
>> +++ b/arch/arm/mm/cache-v6.S
>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>>    ENTRY(v6_flush_kern_dcache_area)
>>           add     r1, r0, r1
>>    1:
>> +#ifdef CONFIG_SMP
>> +       ldr     r2, [r0]                        @ read for ownership
>> +       str     r2, [r0]                        @ write for ownership
>> +#endif /* CONFIG_SMP */
>>    #ifdef HARVARD_CACHE
>>           mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
>>    #else
>
> I don't think the invalidation is needed here, so you probably don't need to
> hack this function at all.


     CPU0                        CPU1
+--------------------+        +--------------------+   +----------+
| L1 Dcache   part_a |        | L1 Dcache   part_b |   |  Icache  |
+--------------------+        +--------------------+   +----------+
             ^                                            |
             |                   +------------------------+
             |                   |
             |         SCU       V
     +-------|-------------------|---------+
     |       +------- ??? -------+         |
     |                 |                   |
     +-----------------|-------------------+
                       |
                       V
              Main   Memory
     +-------------------------------------+
     |       stale part_a       part_b     |
     +-------------------------------------+


1. Task t runs on CPU0. It executes a program stored in NAND flash,
    so task t first reads *part* of this program into its local D-cache;
    let's call it part_a.

2. Task t migrates from CPU0 to CPU1, where it reads the rest of the
    program into the local D-cache; label it part_b.

3. On CPU1, task t flushes the whole address range of this program.
    On ARM11 MPCore this flush only takes effect locally, so after the
    flush part_a still resides in CPU0's L1 D-cache.

4. When the program is executed, CPU1 fetches its instructions through
    the I-cache, and the SCU will notice this action.

    Question:
    At this point, is it the stale part_a in main memory *or* the L1 D-cache
    part_a on CPU0 that is routed to CPU1 as instructions?



>
>> But I have no idea on how to accomplish the v6_flush_kern_cache_all,
>> maybe IPI is needed?
>
> We could add an IPI to invalidate the I-caches on the other cores, however
> I haven't checked to see if we could instead do something on the CPU
> migration path which avoid the need for the broadcasting.
>


Another workaround is to mark task migrations in the function "pull_task"
and then, in the function "context_switch", check whether any tasks have
migrated onto the current CPU. If there are such tasks, flush the entire data
cache on the remote core (the source core of the migration) and wait for the
operation to complete. This has also been verified. But from my point of
view, this is just a temporary workaround rather than a real solution.
The little patch above should be the right fix for this bug:
make flush_kern_dcache_area globally effective.
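
The bookkeeping behind that workaround can be sketched as a toy Python model.
This is only an illustration, not the code that was tested: the class name and
the flush_remote_dcache callback are hypothetical, and the callback stands in
for "send an IPI to the source core, flush its D-cache, and wait".

```python
class MigrationFlushTracker:
    """Toy bookkeeping: remember which cores tasks were pulled from."""

    def __init__(self, ncpus):
        # pending[dst] = set of source CPUs whose D-cache still needs a flush
        self.pending = [set() for _ in range(ncpus)]

    def pull_task(self, src_cpu, dst_cpu):
        # Called when a task is moved from src_cpu to dst_cpu.
        if src_cpu != dst_cpu:
            self.pending[dst_cpu].add(src_cpu)

    def context_switch(self, cpu, flush_remote_dcache):
        # Before running the next task on `cpu`, flush each recorded
        # source core once and forget the record.
        sources = sorted(self.pending[cpu])
        self.pending[cpu].clear()
        for src in sources:
            flush_remote_dcache(src)
        return sources


tracker = MigrationFlushTracker(ncpus=2)
flushed = []
tracker.pull_task(src_cpu=0, dst_cpu=1)
tracker.context_switch(1, flushed.append)   # flushes CPU0
tracker.context_switch(1, flushed.append)   # nothing pending any more
print(flushed)                              # prints [0]
```

The cost is visible even in the sketch: every migration turns into a full
remote D-cache flush, which is why it looks like a stopgap.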


I would be very pleased if you could share your insights on this.


> Will
>

-- 
Love each day!

--bill

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-30 10:01       ` bill4carson
@ 2012-05-31  3:00         ` Catalin Marinas
  2012-05-31  3:11           ` bill4carson
  0 siblings, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31  3:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
> On 2012-05-30 14:38, Will Deacon wrote:
> > On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
> >> --- a/arch/arm/mm/cache-v6.S
> >> +++ b/arch/arm/mm/cache-v6.S
> >> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
> >>    ENTRY(v6_flush_kern_dcache_area)
> >>           add     r1, r0, r1
> >>    1:
> >> +#ifdef CONFIG_SMP
> >> +       ldr     r2, [r0]                        @ read for ownership
> >> +       str     r2, [r0]                        @ write for ownership
> >> +#endif /* CONFIG_SMP */
> >>    #ifdef HARVARD_CACHE
> >>           mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
> >>    #else
> >
> > I don't think the invalidation is needed here, so you probably don't need to
> > hack this function at all.
...
> 1. task t runs on CPU 0, it executes one program in nand flash,
>     so first task t read *part* of this program into its local Dcache,
>     let's call it part_a;
> 
> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
>     program into its local Dcache, label it part_b;
> 
> 3. on CPU1, task t flush the whole address range of this program,
>     As with ARM11 MPCore, this flush is locally effective, after flush
>     part_a still resides in CPU0 L1 Dcache.
> 
> 4. When this program gets executed, CPU 1 fetch its instructions from
>     Icache, SCU will notice this action,
> 
>     Question:
>     At this point, is stale part_a in main memory *or* L1 Dcache part_a
>     in CPU0 routed into CPU1 as instruction?

This has been discussed in the past. The first thing is to disable
CONFIG_PREEMPT to make sure you are not preempted between page copying
and task execution. The code doing the page copy needs to call
flush_dcache_page() on the CPU where the copy occurred and Linux does a
non-lazy D-cache flush. But there are many situations where Linux
doesn't do this (you can google for flush_dcache_page and my name to
find a few places where I tried to fix this).

AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full
I-cache invalidation during task migration.

-- 
Catalin

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  3:00         ` Catalin Marinas
@ 2012-05-31  3:11           ` bill4carson
  2012-05-31  3:12             ` Catalin Marinas
  2012-05-31  3:19             ` Catalin Marinas
  0 siblings, 2 replies; 29+ messages in thread
From: bill4carson @ 2012-05-31  3:11 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-31 11:00, Catalin Marinas wrote:
> Hi,
>
> On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
>> On 2012-05-30 14:38, Will Deacon wrote:
>>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
>>>> --- a/arch/arm/mm/cache-v6.S
>>>> +++ b/arch/arm/mm/cache-v6.S
>>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>>>>     ENTRY(v6_flush_kern_dcache_area)
>>>>            add     r1, r0, r1
>>>>     1:
>>>> +#ifdef CONFIG_SMP
>>>> +       ldr     r2, [r0]                        @ read for ownership
>>>> +       str     r2, [r0]                        @ write for ownership
>>>> +#endif /* CONFIG_SMP */
>>>>     #ifdef HARVARD_CACHE
>>>>            mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
>>>>     #else
>>>
>>> I don't think the invalidation is needed here, so you probably don't need to
>>> hack this function at all.
> ...
>> 1. task t runs on CPU 0, it executes one program in nand flash,
>>      so first task t read *part* of this program into its local Dcache,
>>      let's call it part_a;
>>
>> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
>>      program into its local Dcache, label it part_b;
>>
>> 3. on CPU1, task t flush the whole address range of this program,
>>      As with ARM11 MPCore, this flush is locally effective, after flush
>>      part_a still resides in CPU0 L1 Dcache.
>>
>> 4. When this program gets executed, CPU 1 fetch its instructions from
>>      Icache, SCU will notice this action,
>>
>>      Question:
>>      At this point, is stale part_a in main memory *or* L1 Dcache part_a
>>      in CPU0 routed into CPU1 as instruction?
>
> This has been discussed in the past. The first thing is to disable
> CONFIG_PREEMPT to make sure you are not preempted between page copying
> and task execution. The code doing the page copy needs to call
> flush_dcache_page() on the CPU where the copy occurred and Linux does a
> non-lazy D-cache flush. But there are many situations where Linux
> doesn't do this (you can google for flush_dcache_page and my name to
> find a few places where I tried to fix this).
>
Thanks:)


> AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full
> I-cache invalidation during task migration.

                             ^^^^^^^^^^^^
Could you please point me to it?




-- 
Love each day!

--bill

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  3:11           ` bill4carson
@ 2012-05-31  3:12             ` Catalin Marinas
  2012-05-31  3:19             ` Catalin Marinas
  1 sibling, 0 replies; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31  3:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote:
> On 2012-05-31 11:00, Catalin Marinas wrote:
> > AFAIK, The SCU only snoops the D-cache, not the I-cache. We have a full
> > I-cache invalidation during task migration.
> 
>                              ^^^^^^^^^^^^
> Could you please point it to me?

There is a check in switch_mm() in the asm/mmu_context.h file.
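
As a rough picture of what such a migration check does, here is a toy Python
version. It is a sketch and not a quote of the kernel source: the exact
condition should be read from asm/mmu_context.h in the tree in use, and the
set-based mm_cpumask and the flush_icache_all callback are stand-ins chosen
for this example.

```python
def switch_mm(mm_cpumask, cpu, flush_icache_all):
    """Toy migration check on the context-switch path.

    mm_cpumask: set of CPUs this address space has already run on.
    """
    # A non-empty mask that does not contain this CPU means the mm ran
    # somewhere else before: treat that as a migration and invalidate
    # the whole local I-cache.
    if mm_cpumask and cpu not in mm_cpumask:
        flush_icache_all()
    mm_cpumask.add(cpu)


events = []
mask = set()
switch_mm(mask, 0, lambda: events.append("flush on first run"))
switch_mm(mask, 1, lambda: events.append("flush after migration"))
print(events)   # prints ['flush after migration']
```

Note that this only covers the I-cache of the destination CPU; it does
nothing about dirty lines left in the D-cache of the source CPU.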

-- 
Catalin

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  3:11           ` bill4carson
  2012-05-31  3:12             ` Catalin Marinas
@ 2012-05-31  3:19             ` Catalin Marinas
  2012-05-31  3:38               ` bill4carson
  1 sibling, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31  3:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote:
> On 2012-05-31 11:00, Catalin Marinas wrote:
> > On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
> >> On 2012-05-30 14:38, Will Deacon wrote:
> >>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
> >>>> --- a/arch/arm/mm/cache-v6.S
> >>>> +++ b/arch/arm/mm/cache-v6.S
> >>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
> >>>>     ENTRY(v6_flush_kern_dcache_area)
> >>>>            add     r1, r0, r1
> >>>>     1:
> >>>> +#ifdef CONFIG_SMP
> >>>> +       ldr     r2, [r0]                        @ read for ownership
> >>>> +       str     r2, [r0]                        @ write for ownership
> >>>> +#endif /* CONFIG_SMP */
> >>>>     #ifdef HARVARD_CACHE
> >>>>            mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
> >>>>     #else
> >>>
> >>> I don't think the invalidation is needed here, so you probably don't need to
> >>> hack this function at all.
> > ...
> >> 1. task t runs on CPU 0, it executes one program in nand flash,
> >>      so first task t read *part* of this program into its local Dcache,
> >>      let's call it part_a;
> >>
> >> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
> >>      program into its local Dcache, label it part_b;
> >>
> >> 3. on CPU1, task t flush the whole address range of this program,
> >>      As with ARM11 MPCore, this flush is locally effective, after flush
> >>      part_a still resides in CPU0 L1 Dcache.
> >>
> >> 4. When this program gets executed, CPU 1 fetch its instructions from
> >>      Icache, SCU will notice this action,
> >>
> >>      Question:
> >>      At this point, is stale part_a in main memory *or* L1 Dcache part_a
> >>      in CPU0 routed into CPU1 as instruction?
> >
> > This has been discussed in the past. The first thing is to disable
> > CONFIG_PREEMPT to make sure you are not preempted between page copying
> > and task execution. The code doing the page copy needs to call
> > flush_dcache_page() on the CPU where the copy occurred and Linux does a
> > non-lazy D-cache flush. But there are many situations where Linux
> > doesn't do this (you can google for flush_dcache_page and my name to
> > find a few places where I tried to fix this).
>
> Thanks:)

BTW, see this as a starting point (and a hack):

http://article.gmane.org/gmane.linux.ports.arm.kernel/51556

-- 
Catalin

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  3:19             ` Catalin Marinas
@ 2012-05-31  3:38               ` bill4carson
  2012-05-31  3:58                 ` Catalin Marinas
  0 siblings, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-05-31  3:38 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-31 11:19, Catalin Marinas wrote:
> On Thu, May 31, 2012 at 04:11:46AM +0100, bill4carson wrote:
>> On 2012-05-31 11:00, Catalin Marinas wrote:
>>> On Wed, May 30, 2012 at 11:01:59AM +0100, bill4carson wrote:
>>>> On 2012-05-30 14:38, Will Deacon wrote:
>>>>> On Tue, May 29, 2012 at 06:28:11AM +0100, bill4carson wrote:
>>>>>> --- a/arch/arm/mm/cache-v6.S
>>>>>> +++ b/arch/arm/mm/cache-v6.S
>>>>>> @@ -170,6 +170,10 @@ ENDPROC(v6_coherent_kern_range)
>>>>>>      ENTRY(v6_flush_kern_dcache_area)
>>>>>>             add     r1, r0, r1
>>>>>>      1:
>>>>>> +#ifdef CONFIG_SMP
>>>>>> +       ldr     r2, [r0]                        @ read for ownership
>>>>>> +       str     r2, [r0]                        @ write for ownership
>>>>>> +#endif /* CONFIG_SMP */
>>>>>>      #ifdef HARVARD_CACHE
>>>>>>             mcr     p15, 0, r0, c7, c14, 1          @ clean & invalidate D line
>>>>>>      #else
>>>>>
>>>>> I don't think the invalidation is needed here, so you probably don't need to
>>>>> hack this function at all.
>>> ...
>>>> 1. task t runs on CPU 0, it executes one program in nand flash,
>>>>       so first task t read *part* of this program into its local Dcache,
>>>>       let's call it part_a;
>>>>
>>>> 2. task t migrates from CPU0 onto CPU1, in there reads the rest of
>>>>       program into its local Dcache, label it part_b;
>>>>
>>>> 3. on CPU1, task t flush the whole address range of this program,
>>>>       As with ARM11 MPCore, this flush is locally effective, after flush
>>>>       part_a still resides in CPU0 L1 Dcache.
>>>>
>>>> 4. When this program gets executed, CPU 1 fetch its instructions from
>>>>       Icache, SCU will notice this action,
>>>>
>>>>       Question:
>>>>       At this point, is stale part_a in main memory *or* L1 Dcache part_a
>>>>       in CPU0 routed into CPU1 as instruction?
>>>
>>> This has been discussed in the past. The first thing is to disable
>>> CONFIG_PREEMPT to make sure you are not preempted between page copying
>>> and task execution. The code doing the page copy needs to call
>>> flush_dcache_page() on the CPU where the copy occurred and Linux does a
>>> non-lazy D-cache flush. But there are many situations where Linux
>>> doesn't do this (you can google for flush_dcache_page and my name to
>>> find a few places where I tried to fix this).
>>
>> Thanks:)
>
> BTW, see this as a starting point (and a hack):
>
> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556


I think we have a misunderstanding here :(

As for v6_flush_kern_dcache_area/v6_flush_kern_dcache_all:
these two hooks are supposed to be globally effective, *not*
just locally!

Hence, there should be a fix like the one below to make them globally
effective.

+#ifdef CONFIG_SMP
+       ldr     r2, [r0]                        @ read for ownership
+       str     r2, [r0]                        @ write for ownership
+#endif /* CONFIG_SMP */

Or there may be some other, better solution for this issue.


thanks

-- 
Love each day!

--bill

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  3:38               ` bill4carson
@ 2012-05-31  3:58                 ` Catalin Marinas
  2012-05-31  5:06                   ` bill4carson
  0 siblings, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31  3:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 31 May 2012 11:38, bill4carson <bill4carson@gmail.com> wrote:
> On 2012-05-31 11:19, Catalin Marinas wrote:
>> BTW, see this as a starting point (and a hack):
>>
>> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556
>
> I think we has some mis-understanding here :(
>
> As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all
> these two hooks is supposed to be globally effective, *not*
> locally!
>
> Hence, there should below fix to make it as globally effective.
>
>
> +#ifdef CONFIG_SMP
> +       ldr     r2, [r0]                        @ read for ownership
> +       str     r2, [r0]                        @ write for ownership
> +#endif /* CONFIG_SMP */
>
> Or there maybe some other better solution for this issue.

I still didn't fully understand what the problem is. So, to make sure,
if you run some applications from flash using a yaffs filesystem, you
get random crashes. Is this correct? If yes, a solution is to actually
call flush_dcache_page() on the CPU that does the page copying from
flash into RAM, which could be the yaffs filesystem.

-- 
Catalin

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  3:58                 ` Catalin Marinas
@ 2012-05-31  5:06                   ` bill4carson
  2012-05-31  5:19                     ` Catalin Marinas
  0 siblings, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-05-31  5:06 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-31 11:58, Catalin Marinas wrote:
> On 31 May 2012 11:38, bill4carson<bill4carson@gmail.com>  wrote:
>> On 2012-05-31 11:19, Catalin Marinas wrote:
>>> BTW, see this as a starting point (and a hack):
>>>
>>> http://article.gmane.org/gmane.linux.ports.arm.kernel/51556
>>
>> I think we has some mis-understanding here :(
>>
>> As for:v6_flush_kern_dcache_area/v6_flush_kern_dcache_all
>> these two hooks is supposed to be globally effective, *not*
>> locally!
>>
>> Hence, there should below fix to make it as globally effective.
>>
>>
>> +#ifdef CONFIG_SMP
>> +       ldr     r2, [r0]                        @ read for ownership
>> +       str     r2, [r0]                        @ write for ownership
>> +#endif /* CONFIG_SMP */
>>
>> Or there maybe some other better solution for this issue.
>
> I still didn't fully understand what the problem is. So, to make sure,
> if you run some applications from flash using a yaffs filesystem, you
> get random crashes. Is this correct? If yes, a solution is to actually
> call flush_dcache_page() on the CPU that does the page copying from
> flash into RAM, which could be the yaffs filesystem.
>

The story goes like this:
the function "flush_dcache_page" should be globally effective,
but on the ARMv6 MPCore it is not; it is only locally effective due
to the hardware design. This may cause errors in some cases, for example:

1) A task running on Core-0 is loading a text section into memory.
    It is preempted and then migrates to Core-1;
2) On Core-1, the task continues loading and then calls
    "flush_dcache_page" to make sure the loaded text section is written
    to main memory.
3) The task jumps to the loaded text section and runs it.

If "flush_dcache_page" is not globally effective,
there may be data still in Core-0's data cache that has not been written
to main memory. Thus in step 3 wrong instructions may be
fetched, causing strange errors.

If I'm missing something here, please let me know.

thanks

-- 
Love each day!

--bill

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  5:06                   ` bill4carson
@ 2012-05-31  5:19                     ` Catalin Marinas
  2012-05-31  5:51                       ` bill4carson
  0 siblings, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31  5:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
> On 2012-05-31 11:58, Catalin Marinas wrote:
>> I still didn't fully understand what the problem is. So, to make sure,
>> if you run some applications from flash using a yaffs filesystem, you
>> get random crashes. Is this correct? If yes, a solution is to actually
>> call flush_dcache_page() on the CPU that does the page copying from
>> flash into RAM, which could be the yaffs filesystem.
>
> The story goes like this:
> function "flush_dcache_page" should be global effective
> but in ARMv6 MPCore, it was not, it was just local effective due
> to hardware design.

Yes, I know this.

> This may cause error in some cases for example:
>
> 1) Task running on Core-0 loading text section into memory.
>    It was preempted and then migrate into Core-1;

BTW, do you have CONFIG_PREEMPT enabled?

To be clear - is your application reading some data from flash and
trying to execute or it's the kernel doing the load via the
page/prefetch abort mechanism?

If the latter, task running on core 0 gets a prefetch abort when
trying to execute some code. The kernel reads the page from flash (via
mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
long as it calls flush_dcache_page on the same CPU that copied the
data.

No matter where the task was running or migrated to, if the code doing
the copy also called flush_dcache_page() on the same core, there is no
data left in the D-cache for that page.

> 2) On Core-1, this task continue loading it and then
>    "flush_dcache_page" to make sure the loaded text section write
>    into main memory.

The flush_dcache_page() must be called by the code doing the copy. If
that copy happened on core 0, the call is done there and not where the
task migrated. We don't do lazy flushing on ARM11MPCore.

> 3) Task tend to the loaded text section and running it.
>
> If the "flush_dcache_page" was not global effective,
> there maybe data still in Core-0's data cache, not write
> into main memory. Thus in step 3, error instruction maybe
> fetched thus cause strange error.

This can only happen if you have either preempt enabled (so that the
kernel code doing the copy is migrated) or the mtd driver or fs do not
call flush_dcache_page().

-- 
Catalin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  5:19                     ` Catalin Marinas
@ 2012-05-31  5:51                       ` bill4carson
  2012-05-31  6:56                         ` Catalin Marinas
  0 siblings, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-05-31  5:51 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-31 13:19, Catalin Marinas wrote:
> On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
>> On 2012-05-31 11:58, Catalin Marinas wrote:
>>> I still didn't fully understand what the problem is. So, to make sure,
>>> if you run some applications from flash using a yaffs filesystem, you
>>> get random crashes. Is this correct? If yes, a solution is to actually
>>> call flush_dcache_page() on the CPU that does the page copying from
>>> flash into RAM, which could be the yaffs filesystem.
>>
>> The story goes like this:
>> function "flush_dcache_page" should be global effective
>> but in ARMv6 MPCore, it was not, it was just local effective due
>> to hardware design.
>
> Yes, I know this.
>
Then, why not fix "flush_dcache_page" to make it globally effective?


>> This may cause error in some cases for example:
>>
>> 1) Task running on Core-0 loading text section into memory.
>>    It was preempted and then migrate into Core-1;
>
> BTW, do you have CONFIG_PREEMPT enabled?
>
Yes, CONFIG_PREEMPT is enabled, which is why the task was preempted. :-)


> To be clear - is your application reading some data from flash and
> trying to execute or it's the kernel doing the load via the
> page/prefetch abort mechanism?
>

In my current case, it is a yaffs root file system.

> If the latter, task running on core 0 gets a prefetch abort when
> trying to execute some code. The kernel reads the page from flash (via
> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
> long as it calls flush_dcache_page on the same CPU that copied the
> data.
>
> No matter where the task was running or migrated to, if the code doing
> the copy also called flush_dcache_page() on the same core, there is no
> data left in the D-cache for that page.

Yes, I agree with that.
But how can we flush the data cache on the same core with PREEMPT enabled?
As far as I can tell from the design, there is no operation that
guarantees it.


>
>> 2) On Core-1, this task continue loading it and then
>>    "flush_dcache_page" to make sure the loaded text section write
>>    into main memory.
>
> The flush_dcache_page() must be called by the code doing the copy. If
> that copy happened on core 0, the call is done there and not where the
> task migrated. We don't do lazy flushing on ARM11MPCore.
>
WHY?

>> 3) Task tend to the loaded text section and running it.
>>
>> If the "flush_dcache_page" was not global effective,
>> there maybe data still in Core-0's data cache, not write
>> into main memory. Thus in step 3, error instruction maybe
>> fetched thus cause strange error.
>
> This can only happen if you have either preempt enabled (so that the
> kernel code doing the copy is migrated) or the mtd driver or fs do not
> call flush_dcache_page().
>


PREEMPT + MTD root file system. :-)

But anyway, I think flush_dcache_page should be globally effective.
If ARMv6 MPCore doesn't provide that, we should try to accomplish it ourselves.



-- 
Love each day!

--bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  5:51                       ` bill4carson
@ 2012-05-31  6:56                         ` Catalin Marinas
  2012-05-31  7:21                           ` bill4carson
  0 siblings, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31  6:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
> On 2012-05-31 13:19, Catalin Marinas wrote:
> > On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
> >> On 2012-05-31 11:58, Catalin Marinas wrote:
> >>> I still didn't fully understand what the problem is. So, to make sure,
> >>> if you run some applications from flash using a yaffs filesystem, you
> >>> get random crashes. Is this correct? If yes, a solution is to actually
> >>> call flush_dcache_page() on the CPU that does the page copying from
> >>> flash into RAM, which could be the yaffs filesystem.
> >>
> >> The story goes like this:
> >> function "flush_dcache_page" should be global effective
> >> but in ARMv6 MPCore, it was not, it was just local effective due
> >> to hardware design.
> >
> > Yes, I know this.
>
> Then, why not fix "flush_dcache_page" to make it globally effective?

Performance?

And it's also just ARM11MPCore microarchitecture specific.

> >> This may cause error in some cases for example:
> >>
> >> 1) Task running on Core-0 loading text section into memory.
> >>    It was preempted and then migrate into Core-1;
> >
> > BTW, do you have CONFIG_PREEMPT enabled?
> >
> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)

I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :).

> > To be clear - is your application reading some data from flash and
> > trying to execute or it's the kernel doing the load via the
> > page/prefetch abort mechanism?
> 
> In my current case, it is yaffs root file system.
> 
> > If the latter, task running on core 0 gets a prefetch abort when
> > trying to execute some code. The kernel reads the page from flash (via
> > mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
> > long as it calls flush_dcache_page on the same CPU that copied the
> > data.
> >
> > No matter where the task was running or migrated to, if the code doing
> > the copy also called flush_dcache_page() on the same core, there is no
> > data left in the D-cache for that page.
> 
> Yes, I agree with it.
> But how to flush the data cache on the same core with PREEMPT enabled?

That's not easily possible. But you may get better results with
VOLUNTARY_PREEMPT.

> And, I think according to the design, there is no such operation that
> guarantee it.

RFO/WFO tricks only work on ARM11MPCore.

> But any way, I think flush_dcache_page should be global effective.
> If ARMv6 MPCore didn't make it, we should try to accomplish it.

Do some performance tests first.

-- 
Catalin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  6:56                         ` Catalin Marinas
@ 2012-05-31  7:21                           ` bill4carson
  2012-05-31  7:46                             ` snakky.zhang at gmail.com
  0 siblings, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-05-31  7:21 UTC (permalink / raw)
  To: linux-arm-kernel



On 2012-05-31 14:56, Catalin Marinas wrote:
> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>> On 2012-05-31 13:19, Catalin Marinas wrote:
>>> On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
>>>> On 2012-05-31 11:58, Catalin Marinas wrote:
>>>>> I still didn't fully understand what the problem is. So, to make sure,
>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>> get random crashes. Is this correct? If yes, a solution is to actually
>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>
>>>> The story goes like this:
>>>> function "flush_dcache_page" should be global effective
>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>> to hardware design.
>>>
>>> Yes, I know this.
>>
>> Then, why not fix "flush_dcache_page" to make it globally effective?
>
> Performance?
>
> And it's also just ARM11MPCore microarchitecture specific.
>
>>>> This may cause error in some cases for example:
>>>>
>>>> 1) Task running on Core-0 loading text section into memory.
>>>>     It was preempted and then migrate into Core-1;
>>>
>>> BTW, do you have CONFIG_PREEMPT enabled?
>>>
>> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)
>
> I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :).

Point!

Would it be better to add a comment somewhere, such as "PREEMPT is not
supported on ARM11MPCore" (so far I haven't found such a place)?
Then customers would be alerted by the notice when they try to switch
to PREEMPT.

And again, thanks for your patience with me :)


>
>>> To be clear - is your application reading some data from flash and
>>> trying to execute or it's the kernel doing the load via the
>>> page/prefetch abort mechanism?
>>
>> In my current case, it is yaffs root file system.
>>
>>> If the latter, task running on core 0 gets a prefetch abort when
>>> trying to execute some code. The kernel reads the page from flash (via
>>> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
>>> long as it calls flush_dcache_page on the same CPU that copied the
>>> data.
>>>
>>> No matter where the task was running or migrated to, if the code doing
>>> the copy also called flush_dcache_page() on the same core, there is no
>>> data left in the D-cache for that page.
>>
>> Yes, I agree with it.
>> But how to flush the data cache on the same core with PREEMPT enabled?
>
> That's not easily possible. But you may get better results with
> VOLUNTARY_PREEMPT.
>
>> And, I think according to the design, there is no such operation that
>> guarantee it.
>
> RFO/WFO tricks only work on ARM11MPCore.
>
>> But any way, I think flush_dcache_page should be global effective.
>> If ARMv6 MPCore didn't make it, we should try to accomplish it.
>
> Do some performance tests first.
>

-- 
Love each day!

--bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  7:21                           ` bill4carson
@ 2012-05-31  7:46                             ` snakky.zhang at gmail.com
  2012-05-31 16:04                               ` Catalin Marinas
  0 siblings, 1 reply; 29+ messages in thread
From: snakky.zhang at gmail.com @ 2012-05-31  7:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 2012-05-31 15:21, bill4carson wrote:
>
>
> On 2012-05-31 14:56, Catalin Marinas wrote:
>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>>> On 2012-05-31 13:19, Catalin Marinas wrote:
>>>> On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
>>>>> On 2012-05-31 11:58, Catalin Marinas wrote:
>>>>>> I still didn't fully understand what the problem is. So, to make sure,
>>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>>> get random crashes. Is this correct? If yes, a solution is to actually
>>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>>
>>>>> The story goes like this:
>>>>> function "flush_dcache_page" should be global effective
>>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>>> to hardware design.
>>>>
>>>> Yes, I know this.
>>>
>>> Then, why not fix "flush_dcache_page" to make it globally effective?
>>
>> Performance?
>>
>> And it's also just ARM11MPCore microarchitecture specific.
>>
Yes, it seems newer CPUs have no such limitation, so this function is
naturally globally effective there. :-)

Also, I found that MIPS's c-r4k has the same issue, but it uses IPIs to
make the flush global. Details are in arch/mips/mm/c-r4k.c.

From my point of view, PREEMPT should not depend on the CPU type. So if
this type of CPU does not support PREEMPT for performance reasons, can we
add something to the Documentation and the related Kconfig to mark it?

Or maybe disabling task migration is also an option. :-)

Thanks
Xiao
>>>>> This may cause error in some cases for example:
>>>>>
>>>>> 1) Task running on Core-0 loading text section into memory.
>>>>>     It was preempted and then migrate into Core-1;
>>>>
>>>> BTW, do you have CONFIG_PREEMPT enabled?
>>>>
>>> Yes, CONFIG_PREEMPT enabled. Thus cause the task was preempted. :-)
>>
>> I told you that CONFIG_PREEMPT is not supported on ARM11MPCore :).
>
> Point!
>
> Is it better to add comment, such as "PREEMPT is not supported for
> ARM11MPCore" in somewhere(for now, I don't find such place)?
> then custom will be alerted with such notice when they trying change
> to PREEMPT.
>
> And again, thanks for your patience with me :)
>
>
>>
>>>> To be clear - is your application reading some data from flash and
>>>> trying to execute or it's the kernel doing the load via the
>>>> page/prefetch abort mechanism?
>>>
>>> In my current case, it is yaffs root file system.
>>>
>>>> If the latter, task running on core 0 gets a prefetch abort when
>>>> trying to execute some code. The kernel reads the page from flash (via
>>>> mtd, block layer, VFS) and copies it into RAM. It can be on any CPU as
>>>> long as it calls flush_dcache_page on the same CPU that copied the
>>>> data.
>>>>
>>>> No matter where the task was running or migrated to, if the code doing
>>>> the copy also called flush_dcache_page() on the same core, there is no
>>>> data left in the D-cache for that page.
>>>
>>> Yes, I agree with it.
>>> But how to flush the data cache on the same core with PREEMPT enabled?
>>
>> That's not easily possible. But you may get better results with
>> VOLUNTARY_PREEMPT.
>>
>>> And, I think according to the design, there is no such operation that
>>> guarantee it.
>>
>> RFO/WFO tricks only work on ARM11MPCore.
>>
>>> But any way, I think flush_dcache_page should be global effective.
>>> If ARMv6 MPCore didn't make it, we should try to accomplish it.
>>
>> Do some performance tests first.
>>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31  7:46                             ` snakky.zhang at gmail.com
@ 2012-05-31 16:04                               ` Catalin Marinas
  2012-06-01  1:11                                 ` snakky.zhang at gmail.com
  2012-06-01  1:34                                 ` snakky.zhang at gmail.com
  0 siblings, 2 replies; 29+ messages in thread
From: Catalin Marinas @ 2012-05-31 16:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 31 May 2012 15:46,  <snakky.zhang@gmail.com> wrote:
> On 2012-05-31 15:21, bill4carson wrote:
>> On 2012-05-31 14:56, Catalin Marinas wrote:
>>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>>>> On 2012-05-31 13:19, Catalin Marinas wrote:
>>>>> On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
>>>>>> On 2012-05-31 11:58, Catalin Marinas wrote:
>>>>>>> I still didn't fully understand what the problem is. So, to make sure,
>>>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>>>> get random crashes. Is this correct? If yes, a solution is to actually
>>>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>>>
>>>>>>
>>>>>> The story goes like this:
>>>>>> function "flush_dcache_page" should be global effective
>>>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>>>> to hardware design.
>>>>>
>>>>>
>>>>> Yes, I know this.
>>>>
>>>>
>>>> Then, why not fix "flush_dcache_page" to make it globally effective?
>>>
>>>
>>> Performance?
>>>
>>> And it's also just ARM11MPCore microarchitecture specific.
>>>
> Yes, seems newer CPUs has no such limitation thus this function is global
> effective naturally. :-)
>
> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
> Details in arch/mips/mm/c-r4k.c.

Rather than IPI we would better use the read-for-ownership trick like
in this patch to make flush_dcache_page global (no need for
write-for-ownership):

http://dchs.spinics.net/lists/arm-kernel/msg125075.html

(it may no longer apply, I haven't checked it for some time).

That's the first thing. Secondly, you still need preemption disabled so
that the code is not preempted between the RFO and the actual cache cleaning.

-- 
Catalin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31 16:04                               ` Catalin Marinas
@ 2012-06-01  1:11                                 ` snakky.zhang at gmail.com
  2012-06-01  3:25                                   ` Catalin Marinas
  2012-06-01  1:34                                 ` snakky.zhang at gmail.com
  1 sibling, 1 reply; 29+ messages in thread
From: snakky.zhang at gmail.com @ 2012-06-01  1:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 2012-06-01 00:04, Catalin Marinas wrote:
> On 31 May 2012 15:46, <snakky.zhang@gmail.com> wrote:
>> On 2012-05-31 15:21, bill4carson wrote:
>>> On 2012-05-31 14:56, Catalin Marinas wrote:
>>>> On Thu, May 31, 2012 at 06:51:22AM +0100, bill4carson wrote:
>>>>> On 2012-05-31 13:19, Catalin Marinas wrote:
>>>>>> On 31 May 2012 13:06, bill4carson <bill4carson@gmail.com> wrote:
>>>>>>> On 2012-05-31 11:58, Catalin Marinas wrote:
>>>>>>>> I still didn't fully understand what the problem is. So, to make
>>>>>>>> sure,
>>>>>>>> if you run some applications from flash using a yaffs filesystem, you
>>>>>>>> get random crashes. Is this correct? If yes, a solution is to
>>>>>>>> actually
>>>>>>>> call flush_dcache_page() on the CPU that does the page copying from
>>>>>>>> flash into RAM, which could be the yaffs filesystem.
>>>>>>>
>>>>>>> The story goes like this:
>>>>>>> function "flush_dcache_page" should be global effective
>>>>>>> but in ARMv6 MPCore, it was not, it was just local effective due
>>>>>>> to hardware design.
>>>>>>
>>>>>> Yes, I know this.
>>>>>
>>>>> Then, why not fix "flush_dcache_page" to make it globally effective?
>>>>
>>>> Performance?
>>>>
>>>> And it's also just ARM11MPCore microarchitecture specific.
>>>>
>> Yes, seems newer CPUs has no such limitation thus this function is global
>> effective naturally. :-)
>>
>> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
>> Details in arch/mips/mm/c-r4k.c.
> Rather than IPI we would better use the read-for-ownership trick like
> in this patch to make flush_dcache_page global (no need for
> write-for-ownership):

I think write-for-ownership is necessary for flush_dcache_xxx: the read
guarantees that the local data cache gets the newest data, while the
write guarantees that the data can be flushed to memory.

See Section 7.1 of the ARM11 MPCore Processor Technical Reference
Manual (Revision: r2p0), P146/728:
======
Clean Applies to write-back data caches. If the cache line targeted by
the Clean operation contains stored data that has not yet been written
out to main memory, it is written to main memory, and the line is
marked as clean.
======

So I am afraid that without the write, the later "clean & invalidate"
will not write the data back to main memory.

Another question: why do the flush_kern_dcache_xxx functions in
arch/arm/mm/cache-v6 use "clean & invalidate" instead of just "clean"?
It seems clean would be enough here.

Please correct me if I have misunderstood something.

>
> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>
> (it may no longer apply, I haven't checked it for some time).
>
> That's the first thing. Secondly you still need preemption disable so
> that it is not preempted between RFO and the actual cache cleaning.
>
PREEMPT. :-)

Got it. But currently I can't find anything saying that ARMv6 MPCore
conflicts with PREEMPT. So is it also necessary to add something to the
Documentation and the related Kconfig to describe this and make sure
PREEMPT can't be enabled on such CPUs?

Thanks
Xiao

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-31 16:04                               ` Catalin Marinas
  2012-06-01  1:11                                 ` snakky.zhang at gmail.com
@ 2012-06-01  1:34                                 ` snakky.zhang at gmail.com
  2012-06-01  3:29                                   ` Catalin Marinas
  1 sibling, 1 reply; 29+ messages in thread
From: snakky.zhang at gmail.com @ 2012-06-01  1:34 UTC (permalink / raw)
  To: linux-arm-kernel


>> Yes, seems newer CPUs has no such limitation thus this function is global
>> effective naturally. :-)
>>
>> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
>> Details in arch/mips/mm/c-r4k.c.
> Rather than IPI we would better use the read-for-ownership trick like
> in this patch to make flush_dcache_page global (no need for
> write-for-ownership):
>
> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>
> (it may no longer apply, I haven't checked it for some time).
>
> That's the first thing. Secondly you still need preemption disable so
> that it is not preempted between RFO and the actual cache cleaning.
>
Another point of confusion about PREEMPT: even if we disable preemption,
with a locally effective flush_dcache_xxx it is still possible to
reproduce the issue (similar to the previous case):

1) A task running on Core-0 is loading a text section into memory.
     It is scheduled out and then migrated to Core-1;
2) On Core-1, the task continues loading and then calls
     "flush_dcache_page" to make sure the loaded text section is
     written to main memory.
3) The task turns to the loaded text section and runs it.

This is similar to the previous case; the difference lies in step 1,
where the task is interrupted by a timer interrupt. It can still be
switched out and then migrated to another core, so in steps 2 and 3 the
issue may still be reproduced. Disabling preemption therefore only
lowers the probability of this issue but can't avoid it.

If I have misunderstood something, please correct me.

Thanks
Xiao

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-01  1:11                                 ` snakky.zhang at gmail.com
@ 2012-06-01  3:25                                   ` Catalin Marinas
  2012-06-01  5:21                                     ` snakky.zhang at gmail.com
  0 siblings, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-06-01  3:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 01, 2012 at 02:11:59AM +0100, snakky.zhang at gmail.com wrote:
> On 2012-06-01 00:04, Catalin Marinas wrote:
> > Rather than IPI we would better use the read-for-ownership trick like
> > in this patch to make flush_dcache_page global (no need for
> > write-for-ownership):
> 
> I think write for ownership is necessary for flush_dcache_xxx: Read
> guarantee local data cache get newest data, at the same time write
> guarantee the data can be flushed into memory.
> 
> See Section 7.1 of ARM11 MPCore. Processor Technical Reference 
> Manual(Revision: r2p0) P146/728:
> ======
> Clean Applies to write-back data caches. If the cache line targeted by
> the Clean operation contains stored data that has not yet been written
> out to main memory, it is written to main memory, and the line is
> marked as clean.
> ======
> 
> So I am afraid without the write action, the "clean & invalidate"
> action later will not write data back to main memory.

If there is a dirty cache line, it will be written to memory by the
clean&invalidate operation. If the data in the cache line is in a clean
state, it means that it is identical to the main memory (or L2 if
present).

With just a read, clean&invalidate would not invalidate (remove) the
cache lines from the other CPUs. Doing a write forces the cache line to
only be present on the current CPU (though automatically invalidating it
on the other CPUs).
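
The difference between the read and the write can be illustrated with a small textbook MESI toy model. This is only a sketch under stated assumptions: it uses generic MESI transitions (a read of a line that is Modified elsewhere writes it back and leaves both copies Shared), which may not match the exact behaviour of the ARM11 MPCore SCU, and every name in it (struct toy, toy_read, toy_write, toy_clean_inv_local, copies_left_after_flush) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 2

enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

/* One cache line tracked across all cores, plus its backing RAM word. */
struct toy {
    int ram;
    enum mesi st[NCPUS];
    int data[NCPUS];
};

/* Read: on a miss, a Modified copy elsewhere is written back and shared. */
static void toy_read(struct toy *t, int cpu)
{
    bool shared = false;

    assert(cpu >= 0 && cpu < NCPUS);
    if (t->st[cpu] != INVALID)
        return;                             /* cache hit, nothing changes */
    for (int o = 0; o < NCPUS; o++) {
        if (o == cpu || t->st[o] == INVALID)
            continue;
        if (t->st[o] == MODIFIED)
            t->ram = t->data[o];            /* write back the dirty copy */
        t->st[o] = SHARED;
        shared = true;
    }
    t->data[cpu] = t->ram;
    t->st[cpu] = shared ? SHARED : EXCLUSIVE;
}

/* Write: every other copy is invalidated; the local copy becomes Modified. */
static void toy_write(struct toy *t, int cpu, int val)
{
    assert(cpu >= 0 && cpu < NCPUS);
    for (int o = 0; o < NCPUS; o++)
        if (o != cpu)
            t->st[o] = INVALID;
    t->data[cpu] = val;
    t->st[cpu] = MODIFIED;
}

/* Local clean & invalidate: only the calling core's copy is affected. */
static void toy_clean_inv_local(struct toy *t, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    if (t->st[cpu] == MODIFIED)
        t->ram = t->data[cpu];
    t->st[cpu] = INVALID;
}

/*
 * Core 0 dirties the line, then core 1 does read (and optionally write)
 * for ownership followed by a local clean & invalidate.
 * Returns how many cores still hold the line; *ram_out gets the RAM value.
 */
static int copies_left_after_flush(bool write_for_ownership, int *ram_out)
{
    struct toy t = { .ram = 0, .st = { INVALID, INVALID }, .data = { 0, 0 } };
    int copies = 0;

    toy_write(&t, 0, 42);                   /* dirty data sits on core 0 */
    toy_read(&t, 1);                        /* read for ownership on core 1 */
    if (write_for_ownership)
        toy_write(&t, 1, t.data[1]);        /* store the same value back */
    toy_clean_inv_local(&t, 1);

    for (int c = 0; c < NCPUS; c++)
        if (t.st[c] != INVALID)
            copies++;
    *ram_out = t.ram;
    return copies;
}
```

In this model RAM ends up holding the new value either way, but with the read alone core 0 keeps a (clean) copy of the line, whereas the additional write leaves no copy on any core after the flush.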

> Another question here:  Why the flush_kern_dcache_xxx in
> arch/arm/mm/cache-v6 use "clean & invalidate" progress instead of
> "clean"? Seems clean is enough here.

I think in the context of VIPT caches clean would be enough.

> > http://dchs.spinics.net/lists/arm-kernel/msg125075.html
> >
> > (it may no longer apply, I haven't checked it for some time).
> >
> > That's the first thing. Secondly you still need preemption disable so
> > that it is not preempted between RFO and the actual cache cleaning.
>
> PREEMPT. :-)
> 
> Get it. But currently, I can't find anything related to ARMv6 MPCore
> conflict with PREEMPT. So if it is also necessary to add something in
> Documentation and related Kconfig to describe it and make sure PREEMPT
> can't been enabled on such CPUs?

Well, we either get it to work or, if not possible, we add a comment.
Let's try the former option first :)

-- 
Catalin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-01  1:34                                 ` snakky.zhang at gmail.com
@ 2012-06-01  3:29                                   ` Catalin Marinas
  2012-06-03 11:34                                     ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Catalin Marinas @ 2012-06-01  3:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang at gmail.com wrote:
> >> Yes, seems newer CPUs has no such limitation thus this function is global
> >> effective naturally. :-)
> >>
> >> And , I find Mips's c-r4k also has this issue but it use IPI to make it.
> >> Details in arch/mips/mm/c-r4k.c.
> > Rather than IPI we would better use the read-for-ownership trick like
> > in this patch to make flush_dcache_page global (no need for
> > write-for-ownership):
> >
> > http://dchs.spinics.net/lists/arm-kernel/msg125075.html
> >
> > (it may no longer apply, I haven't checked it for some time).
> >
> > That's the first thing. Secondly you still need preemption disable so
> > that it is not preempted between RFO and the actual cache cleaning.
> >
> And, another confusion for PREEMPT: Even if we disable preempt, with locally
> effective flush_dcache_xxx, there is still possibility to reproduce such
> issue(Similar with the previous case):
> 
> 1) Task running on Core-0 loading text section into memory.
>      It was schedule out and then migrate into Core-1;
> 2) On Core-1, this task continue loading it and then
>      "flush_dcache_page" to make sure the loaded text section write
>      into main memory.
> 3) Task tend to the loaded text section and running it.
> 
> Similar as the previous case, the difference lies in step1 that the
> task was interrupted by timer interrupt. Thus it still can be switch
> out and then been migrate to another core. Thus in step2 and 3, this
> issue may still been reproduced.  So, disable preempt can only lower
> the possibility of this issue but can't avoid it.

It would work as long as the data copying into the text area (done by
the driver and VFS layer) and the flush_dcache_page() sequence are not
preemptible. A timer interrupt between data copying and
flush_dcache_page() would interrupt a kernel routine which is not
preemptible.
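
The point about a non-preemptible copy-plus-flush sequence can be sketched in userspace as follows. This is a toy model only: struct toy_task, toy_preempt_disable(), toy_preempt_enable() and toy_timer_tick() are hypothetical stand-ins, not the kernel's implementations, and the "timer tick" simply tries to move the task to another core, which is honoured only while the task is preemptible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of task state that matter here. */
struct toy_task {
    int cpu;            /* core the task is currently running on */
    int preempt_count;  /* > 0 means the task must not be preempted */
};

static void toy_preempt_disable(struct toy_task *t) { t->preempt_count++; }

static void toy_preempt_enable(struct toy_task *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
}

/* A timer interrupt that would like to migrate the task to new_cpu. */
static void toy_timer_tick(struct toy_task *t, int new_cpu)
{
    if (t->preempt_count == 0)      /* only preemptible code can be moved */
        t->cpu = new_cpu;
}

/*
 * Models "copy the page, then flush it" with a timer interrupt arriving
 * in between. Returns true if the flush ran on the core that did the copy.
 * guard == true models a non-preemptible section around the sequence.
 */
static bool copy_then_flush_same_cpu(bool guard)
{
    struct toy_task t = { .cpu = 0, .preempt_count = 0 };
    int copy_cpu, flush_cpu;

    if (guard)
        toy_preempt_disable(&t);
    copy_cpu = t.cpu;               /* the page copy would happen here */
    toy_timer_tick(&t, 1);          /* interrupt between copy and flush */
    flush_cpu = t.cpu;              /* flush_dcache_page() would run here */
    if (guard)
        toy_preempt_enable(&t);

    return copy_cpu == flush_cpu;
}
```

With the guard in place the flush runs on the copying core even though the interrupt fires; without it, the model lets the task move and the flush runs on the wrong core.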

-- 
Catalin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-01  3:25                                   ` Catalin Marinas
@ 2012-06-01  5:21                                     ` snakky.zhang at gmail.com
  0 siblings, 0 replies; 29+ messages in thread
From: snakky.zhang at gmail.com @ 2012-06-01  5:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 2012-06-01 11:25, Catalin Marinas wrote:
> On Fri, Jun 01, 2012 at 02:11:59AM +0100, snakky.zhang at gmail.com wrote:
>> On 2012-06-01 00:04, Catalin Marinas wrote:
>>> Rather than IPI we would better use the read-for-ownership trick like
>>> in this patch to make flush_dcache_page global (no need for
>>> write-for-ownership):
>> I think write for ownership is necessary for flush_dcache_xxx: Read
>> guarantee local data cache get newest data, at the same time write
>> guarantee the data can be flushed into memory.
>>
>> See Section 7.1 of ARM11 MPCore. Processor Technical Reference
>> Manual(Revision: r2p0) P146/728:
>> ======
>> Clean Applies to write-back data caches. If the cache line targeted by
>> the Clean operation contains stored data that has not yet been written
>> out to main memory, it is written to main memory, and the line is
>> marked as clean.
>> ======
>>
>> So I am afraid without the write action, the "clean&  invalidate"
>> action later will not write data back to main memory.
> If there is a dirty cache line, it will be written to memory by the
> clean&invalidate operation. If the data in the cache line is in a clean
> state, it means that it is identical to the main memory (or L2 if
> present).
>
> With just a read, clean&invalidate would not invalidate (remove) the
> cache lines from the other CPUs. Doing a write forces the cache line to
> only be present on the current CPU (though automatically invalidating it
> on the other CPUs).
>
>> Another question here:  Why the flush_kern_dcache_xxx in
>> arch/arm/mm/cache-v6 use "clean&  invalidate" progress instead of
>> "clean"? Seems clean is enough here.
> I think in the context of VIPT caches clean would be enough.
>
>>> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>>>
>>> (it may no longer apply, I haven't checked it for some time).
>>>
>>> That's the first thing. Secondly you still need preemption disable so
>>> that it is not preempted between RFO and the actual cache cleaning.
>> PREEMPT. :-)
>>
>> Got it. But currently I can't find anything documenting that ARMv6 MPCore
>> conflicts with PREEMPT. So is it also necessary to add something to
>> Documentation and the related Kconfig to describe this and make sure
>> PREEMPT can't be enabled on such CPUs?
> Well, we either get it to work or, if not possible, we add a comment.
> Let's try the former option first :)
>
Thanks for your patient explanation. :-)

Thanks a lot!
Xiao
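
[Illustrative sketch, not part of the original message.] The
read-for-ownership versus write-for-ownership point quoted above can be
tried out on a toy model. The Python below is only a sketch under stated
assumptions: `ToySMP`, `flush_after_migration` and the "M"/"S" states are
hypothetical names for a simplified MSI-style protocol, not the real SCU
behaviour and not kernel code. The one property taken from the thread is
that ordinary loads and stores are coherent across cores while
clean & invalidate only acts on the local core's cache.

```python
class ToySMP:
    """Toy two-core system: coherent loads/stores, core-local maintenance."""

    def __init__(self, ncores=2):
        self.memory = {}                               # addr -> value
        self.caches = [dict() for _ in range(ncores)]  # addr -> [state, value]

    def load(self, cpu, addr):
        local = self.caches[cpu]
        if addr not in local:
            # Assumed snoop behaviour: a dirty copy elsewhere is written
            # back to memory and downgraded to shared (clean).
            for other in self.caches:
                line = other.get(addr)
                if other is not local and line and line[0] == "M":
                    self.memory[addr] = line[1]
                    line[0] = "S"
            local[addr] = ["S", self.memory.get(addr)]
        return local[addr][1]

    def store(self, cpu, addr, value):
        # A store takes exclusive ownership: every other copy is dropped.
        for i, other in enumerate(self.caches):
            if i != cpu:
                other.pop(addr, None)
        self.caches[cpu][addr] = ["M", value]

    def clean_invalidate_local(self, cpu, addr):
        # Maintenance by address only touches the calling core's cache.
        line = self.caches[cpu].pop(addr, None)
        if line and line[0] == "M":
            self.memory[addr] = line[1]

    def holders(self, addr):
        return [i for i, cache in enumerate(self.caches) if addr in cache]


def flush_after_migration(mode):
    """CPU0 dirties a line, the task migrates, CPU1 flushes using `mode`."""
    smp = ToySMP()
    addr = 0x1000
    smp.store(0, addr, 42)             # data written while running on CPU0
    if mode in ("rfo", "wfo"):
        value = smp.load(1, addr)      # read-for-ownership on CPU1
        if mode == "wfo":
            smp.store(1, addr, value)  # write the same value back
    smp.clean_invalidate_local(1, addr)
    return smp.memory.get(addr), smp.holders(addr)


for mode in ("local", "rfo", "wfo"):
    print(mode, flush_after_migration(mode))
```

In this model the purely local flush leaves memory without the data and
the dirty line still sitting on CPU0; adding the read gets the data to
memory but CPU0 keeps a clean copy; adding the write leaves no cached copy
anywhere. The model also fixes the CPU between the ownership step and the
flush, which is what disabling preemption is meant to guarantee on a real
kernel.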

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-01  3:29                                   ` Catalin Marinas
@ 2012-06-03 11:34                                     ` Russell King - ARM Linux
  2012-06-04  9:20                                       ` snakky.zhang at gmail.com
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2012-06-03 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 01, 2012 at 11:29:13AM +0800, Catalin Marinas wrote:
> On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang at gmail.com wrote:
> > >> Yes, it seems newer CPUs have no such limitation, so this function is
> > >> naturally globally effective. :-)
> > >>
> > >> And I find that MIPS's c-r4k also has this issue, but it uses IPIs to
> > >> make the operation global. Details are in arch/mips/mm/c-r4k.c.
> > > Rather than IPI we would better use the read-for-ownership trick like
> > > in this patch to make flush_dcache_page global (no need for
> > > write-for-ownership):
> > >
> > > http://dchs.spinics.net/lists/arm-kernel/msg125075.html
> > >
> > > (it may no longer apply, I haven't checked it for some time).
> > >
> > > That's the first thing. Secondly you still need preemption disable so
> > > that it is not preempted between RFO and the actual cache cleaning.
> > >
> > > And another point of confusion about PREEMPT: even if we disable
> > > preemption, with a locally effective flush_dcache_xxx it is still
> > > possible to reproduce this issue (similar to the previous case):
> > > 
> > > 1) A task running on Core-0 is loading a text section into memory.
> > >      It is scheduled out and then migrated to Core-1;
> > > 2) On Core-1, the task continues loading and then calls
> > >      "flush_dcache_page" to make sure the loaded text section is
> > >      written to main memory.
> > > 3) The task turns to the loaded text section and runs it.
> > > 
> > > This is similar to the previous case; the difference lies in step 1,
> > > where the task is interrupted by a timer interrupt. It can therefore
> > > still be switched out and then migrated to another core, so in steps 2
> > > and 3 the issue may still be reproduced. So disabling preemption can
> > > only lower the probability of this issue but can't avoid it.
> 
> It would work as long as the data copying into the text area (done by
> the driver and VFS layer) and the flush_dcache_page() sequence are not
> preemptible. A timer interrupt between data copying and
> flush_dcache_page() would interrupt a kernel routine which is not
> preemptible.

And that doesn't matter because on a non-preemptible kernel, timer ticks
do _not_ cause the threads to be rescheduled while in kernel mode.  If
they did, it would be a _preemptible_ kernel.

The only things on a non-preemptible kernel which cause a schedule point
are functions which may sleep, so semaphores, waiting for events, etc.
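
[Illustrative sketch, not part of the original message.] The scheduling
rule described above can be written down as a tiny model. This is a
sketch only: `run_kernel_path` and the step names "copy", "tick", "sleep"
and "flush" are hypothetical, and the model pessimistically assumes that
every schedule point moves the task to the next CPU.

```python
def run_kernel_path(steps, preemptible, start_cpu=0, ncpus=2):
    """Return (step, cpu) for each piece of work in a kernel-mode path.

    "tick"  : timer interrupt while the task is in kernel mode
    "sleep" : call to a function that may sleep (semaphore, wait, ...)
    Any other step is ordinary work executed on the current CPU.
    """
    cpu = start_cpu
    trace = []
    for step in steps:
        if step == "tick":
            # Only a preemptible kernel reschedules on a tick in kernel mode.
            if preemptible:
                cpu = (cpu + 1) % ncpus
        elif step == "sleep":
            # A sleeping call is a schedule point on any kernel.
            cpu = (cpu + 1) % ncpus
        else:
            trace.append((step, cpu))
    return trace


print(run_kernel_path(["copy", "tick", "flush"], preemptible=False))
print(run_kernel_path(["copy", "tick", "flush"], preemptible=True))
print(run_kernel_path(["copy", "sleep", "flush"], preemptible=False))
```

With a tick between the copy and the flush, the non-preemptible run keeps
both steps on CPU 0 while the preemptible run performs the flush on CPU 1.
The last call shows the remaining window on a non-preemptible kernel: a
sleeping call between the two steps is still a schedule point.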

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-03 11:34                                     ` Russell King - ARM Linux
@ 2012-06-04  9:20                                       ` snakky.zhang at gmail.com
  0 siblings, 0 replies; 29+ messages in thread
From: snakky.zhang at gmail.com @ 2012-06-04  9:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 2012-06-03 19:34, Russell King - ARM Linux wrote:
> On Fri, Jun 01, 2012 at 11:29:13AM +0800, Catalin Marinas wrote:
>> On Fri, Jun 01, 2012 at 02:34:12AM +0100, snakky.zhang at gmail.com wrote:
>>>>> Yes, it seems newer CPUs have no such limitation, so this function is
>>>>> naturally globally effective. :-)
>>>>>
>>>>> And I find that MIPS's c-r4k also has this issue, but it uses IPIs to
>>>>> make the operation global. Details are in arch/mips/mm/c-r4k.c.
>>>> Rather than IPI we would better use the read-for-ownership trick like
>>>> in this patch to make flush_dcache_page global (no need for
>>>> write-for-ownership):
>>>>
>>>> http://dchs.spinics.net/lists/arm-kernel/msg125075.html
>>>>
>>>> (it may no longer apply, I haven't checked it for some time).
>>>>
>>>> That's the first thing. Secondly you still need preemption disable so
>>>> that it is not preempted between RFO and the actual cache cleaning.
>>>>
>>> And another point of confusion about PREEMPT: even if we disable
>>> preemption, with a locally effective flush_dcache_xxx it is still
>>> possible to reproduce this issue (similar to the previous case):
>>>
>>> 1) A task running on Core-0 is loading a text section into memory.
>>>       It is scheduled out and then migrated to Core-1;
>>> 2) On Core-1, the task continues loading and then calls
>>>       "flush_dcache_page" to make sure the loaded text section is
>>>       written to main memory.
>>> 3) The task turns to the loaded text section and runs it.
>>>
>>> This is similar to the previous case; the difference lies in step 1,
>>> where the task is interrupted by a timer interrupt. It can therefore
>>> still be switched out and then migrated to another core, so in steps 2
>>> and 3 the issue may still be reproduced. So disabling preemption can
>>> only lower the probability of this issue but can't avoid it.
>> It would work as long as the data copying into the text area (done by
>> the driver and VFS layer) and the flush_dcache_page() sequence are not
>> preemptible. A timer interrupt between data copying and
>> flush_dcache_page() would interrupt a kernel routine which is not
>> preemptible.
> And that doesn't matter because on a non-preemptible kernel, timer ticks
> do _not_ cause the threads to be rescheduled while in kernel mode.  If
> they did, it would be a _preemptible_ kernel.
>
> The only things on a non-preemptible kernel which cause a schedule point
> are functions which may sleep, so semaphores, waiting for events, etc.
>
>
Thanks for your description. :-)

Thanks
Xiao

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-05-09  9:11 Query about: ARM11 MPCore: preemption/task migration cache coherency bill4carson
  2012-05-11  8:51 ` Will Deacon
@ 2012-06-05  4:06 ` George G. Davis
  2012-06-05  4:50   ` bill4carson
  1 sibling, 1 reply; 29+ messages in thread
From: George G. Davis @ 2012-06-05  4:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On May 9, 2012, at 5:11 AM, bill4carson wrote:

> Hi, All
> 
> I'm using ARM11 MPCore on linux-2.6.34; unfortunately I get random
> panics/segmentation faults with task migration. I noticed a patch set
> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
> such issues here:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
> 
> It seems there were no follow-ups; is there an official patch to fix such
> issues?

Apologies for the delayed reply.  Yes, I believe those patches are still needed
for ARM11 MPCore, even when not using PREEMPT.  I vaguely recall at least
two cases where problems occurred w/o those patches, 1) LTP stress would
randomly fail, and 2) parallel module loading would invariably oops with a
random fault, with results similar to what you've reported later in this thread.
The most reliable test case to reproduce the failure was the parallel module
loading.  LTP stress test runs took a long time to trigger the errors.

I'm dusting off the patches and trying to resurrect the test cases to see if I can
reproduce the errors on latest kernel.org.  I'll followup with results in a day or
so.  FWIW,   I've got LTP stress running on latest kernel.org now but probably
should have resurrected the parallel module loading test case instead, as
LTP stress may take awhile to reproduce the error, if at all.

--
Regards,
George

> 
> 
> thanks
> 
> -- 
> Love each day!
> 
> --bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-05  4:06 ` George G. Davis
@ 2012-06-05  4:50   ` bill4carson
  2012-06-06  6:18     ` Andrew Yan-Pai Chen
  0 siblings, 1 reply; 29+ messages in thread
From: bill4carson @ 2012-06-05  4:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi, George

thanks for your updates.


On 2012-06-05 12:06, George G. Davis wrote:
> Hello,
>
> On May 9, 2012, at 5:11 AM, bill4carson wrote:
>
>> Hi, All
>>
>> I'm using ARM11 MPCore on linux-2.6.34; unfortunately I get random
>> panics/segmentation faults with task migration. I noticed a patch set
>> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
>> such issues here:
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
>>
>> It seems there were no follow-ups; is there an official patch to fix such
>> issues?
>
> Apologies for the delayed reply.  Yes, I believe those patches are still needed
> for ARM11 MPCore, even when not using PREEMPT.  I vaguely recall at least
> two cases where problems occurred w/o those patches, 1) LTP stress would
> randomly fail, and 2) parallel module loading would invariably oops with a
> random fault, with results similar to what you've reported later in this thread.
> The most reliable test case to reproduce the failure was the parallel module
> loading.  LTP stress test runs took a long time to trigger the errors.
>
> I'm dusting off the patches and trying to resurrect the test cases to see if I can
> reproduce the errors on latest kernel.org.  I'll followup with results in a day or
> so.  FWIW,   I've got LTP stress running on latest kernel.org now but probably
> should have resurrected the parallel module loading test case instead, as
> LTP stress may take awhile to reproduce the error, if at all.
>
> --
> Regards,
> George
>
>>
>>
>> thanks
>>
>> --
>> Love each day!
>>
>> --bill
>
>

-- 
Love each day!

--bill

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Query about: ARM11 MPCore: preemption/task migration cache coherency
  2012-06-05  4:50   ` bill4carson
@ 2012-06-06  6:18     ` Andrew Yan-Pai Chen
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Yan-Pai Chen @ 2012-06-06  6:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 5, 2012 at 12:50 PM, bill4carson <bill4carson@gmail.com> wrote:
> Hi, George
>
> thanks for your updates.
>
>
>
> On 2012-06-05 12:06, George G. Davis wrote:
>>
>> Hello,
>>
>> On May 9, 2012, at 5:11 AM, bill4carson wrote:
>>
>>> Hi, All
>>>
>>> I'm using ARM11 MPCore on linux-2.6.34; unfortunately I get random
>>> panics/segmentation faults with task migration. I noticed a patch set
>>> [ARM11 MPCore: preemption/task migration cache coherency fixups] to fix
>>> such issues here:
>>>
>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-October/069851.html
>>>
>>> It seems there were no follow-ups; is there an official patch to fix such
>>> issues?
>>
>>
>> Apologies for the delayed reply.  Yes, I believe those patches are still needed
>> for ARM11 MPCore, even when not using PREEMPT.  I vaguely recall at least
>> two cases where problems occurred w/o those patches, 1) LTP stress would
>> randomly fail, and 2) parallel module loading would invariably oops with a
>> random fault, with results similar to what you've reported later in this thread.
>> The most reliable test case to reproduce the failure was the parallel module
>> loading.  LTP stress test runs took a long time to trigger the errors.
>>
>> I'm dusting off the patches and trying to resurrect the test cases to see if I can
>> reproduce the errors on latest kernel.org.  I'll followup with results in a day or
>> so.  FWIW, I've got LTP stress running on latest kernel.org now but probably
>> should have resurrected the parallel module loading test case instead, as
>> LTP stress may take awhile to reproduce the error, if at all.
>>
>> --
>> Regards,
>> George
>>
>>>
>>>
>>> thanks
>>>
>>> --
>>> Love each day!
>>>
>>> --bill
>>
>>
>>
>
> --
> Love each day!
>
> --bill
>

Hi all,

When PREEMPT is disabled, is it possible for a kernel thread which performs
DMA transfers (such as dmatest) to be migrated to another CPU while it is
executing the DMA cache maintenance operations?
If so, isn't it necessary to prevent task migration during DMA cache
maintenance operations (or in dma_map_single()) on ARMv6 MPCore?

--
Regards,
Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

Thread overview: 29+ messages
2012-05-09  9:11 Query about: ARM11 MPCore: preemption/task migration cache coherency bill4carson
2012-05-11  8:51 ` Will Deacon
2012-05-11  9:53   ` bill4carson
2012-05-29  5:28   ` bill4carson
2012-05-30  6:38     ` Will Deacon
2012-05-30 10:01       ` bill4carson
2012-05-31  3:00         ` Catalin Marinas
2012-05-31  3:11           ` bill4carson
2012-05-31  3:12             ` Catalin Marinas
2012-05-31  3:19             ` Catalin Marinas
2012-05-31  3:38               ` bill4carson
2012-05-31  3:58                 ` Catalin Marinas
2012-05-31  5:06                   ` bill4carson
2012-05-31  5:19                     ` Catalin Marinas
2012-05-31  5:51                       ` bill4carson
2012-05-31  6:56                         ` Catalin Marinas
2012-05-31  7:21                           ` bill4carson
2012-05-31  7:46                             ` snakky.zhang at gmail.com
2012-05-31 16:04                               ` Catalin Marinas
2012-06-01  1:11                                 ` snakky.zhang at gmail.com
2012-06-01  3:25                                   ` Catalin Marinas
2012-06-01  5:21                                     ` snakky.zhang at gmail.com
2012-06-01  1:34                                 ` snakky.zhang at gmail.com
2012-06-01  3:29                                   ` Catalin Marinas
2012-06-03 11:34                                     ` Russell King - ARM Linux
2012-06-04  9:20                                       ` snakky.zhang at gmail.com
2012-06-05  4:06 ` George G. Davis
2012-06-05  4:50   ` bill4carson
2012-06-06  6:18     ` Andrew Yan-Pai Chen
