Linux MIPS Architecture development
* RM7k cache_flush_sigtramp
@ 2003-07-31  1:56 Fuxin Zhang
  2003-07-31 11:46 ` Ralf Baechle
  0 siblings, 1 reply; 22+ messages in thread
From: Fuxin Zhang @ 2003-07-31  1:56 UTC (permalink / raw)
  To: MAKE FUN PRANK CALLS

hi,

r4k_flush_cache_sigtramp does not seem to be enough for RM7000 CPUs, because
there is a write buffer between the L1 dcache and the L2 cache, so the
written-back block may not be seen by the icache. This small patch fixes
crashes of my X server on the ev64240.


--- r4kcache.h.ori 2003-07-31 09:51:01.000000000 +0800
+++ r4kcache.h 2003-07-31 09:51:57.000000000 +0800
@@ -94,6 +94,9 @@
 	".set noreorder\n\t"
 	".set mips3\n"
 	"1:\tcache %0,(%1)\n"
+#ifdef CONFIG_CPU_RM7000
+	"sync\n\t"
+#endif
 	"2:\t.set mips0\n\t"
 	".set reorder\n\t"
 	".section\t__ex_table,\"a\"\n\t"


* Re: RM7k cache_flush_sigtramp
  2003-07-31  1:56 Fuxin Zhang
@ 2003-07-31 11:46 ` Ralf Baechle
  2003-07-31 12:57   ` Fuxin Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2003-07-31 11:46 UTC (permalink / raw)
  To: Fuxin Zhang; +Cc: MAKE FUN PRANK CALLS

On Thu, Jul 31, 2003 at 09:56:08AM +0800, Fuxin Zhang wrote:
> Date:	Thu, 31 Jul 2003 09:56:08 +0800
> From:	Fuxin Zhang <fxzhang@ict.ac.cn>
> To:	MAKE FUN PRANK CALLS <linux-mips@linux-mips.org>
        ^^^^^^^^^^^^^^^^^^^^

Funny name for the list :-)

> r4k_flush_cache_sigtramp does not seem to be enough for RM7000 CPUs, because
> there is a write buffer between the L1 dcache and the L2 cache, so the
> written-back block may not be seen by the icache. This small patch fixes
> crashes of my X server on the ev64240.

It would seem a similar fix is also needed in other places then?

  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-07-31 11:46 ` Ralf Baechle
@ 2003-07-31 12:57   ` Fuxin Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Fuxin Zhang @ 2003-07-31 12:57 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: MAKE FUN PRANK CALLS



Ralf Baechle wrote:

>It would seem a similar fix is also needed in other places then?
>
I have not thought about it further, but:
  1. I implemented wb_flush for this board using a sync and an uncached
     read, just in case the many buffers on the CPU and system bridge
     surprise me.
  2. There are still occasional oopses, especially during I/O activity,
     e.g. when fscking a disk.

What would you suggest looking at? Some flushes go through all levels of
cache; I think those should be safe. I will check later.

Thanks.

* RE: RM7k cache_flush_sigtramp
@ 2003-07-31 16:50 Adam Kiepul
  2003-08-01  0:40 ` Fuxin Zhang
  2003-08-01  7:51 ` Dominic Sweetman
  0 siblings, 2 replies; 22+ messages in thread
From: Adam Kiepul @ 2003-07-31 16:50 UTC (permalink / raw)
  To: 'Ralf Baechle', Fuxin Zhang; +Cc: MAKE FUN PRANK CALLS

Hi,

If this is just to ensure the I Cache coherency for modified code then the following should be sufficient:

cache Hit_Writeback_D, offset(base_register)
cache Hit_Invalidate_I, offset(base_register)

The ordering does matter however since the Hit_Invalidate_I makes sure the write buffer is flushed.

Kind Regards,

_______________________________

Adam Kiepul
Sr. Applications Engineer

PMC-Sierra, Microprocessor Division
Mission Towers One
3975 Freedom Circle
Santa Clara, CA 95054, USA
Direct: 408 239 8124
Fax: 408 492 9462




* Re: RM7k cache_flush_sigtramp
  2003-07-31 16:50 Adam Kiepul
@ 2003-08-01  0:40 ` Fuxin Zhang
  2003-08-01  3:01   ` Ralf Baechle
  2003-08-01  7:51 ` Dominic Sweetman
  1 sibling, 1 reply; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-01  0:40 UTC (permalink / raw)
  To: Adam Kiepul, MAKE FUN PRANK CALLS


Adam Kiepul wrote:

>Hi,
>
>If this is just to ensure the I Cache coherency for modified code then the following should be sufficient:
>
>cache Hit_Writeback_D, offset(base_register)
>cache Hit_Invalidate_I, offset(base_register)
>  
>
The current Linux code does exactly this. But without a sync I was seeing
all kinds of faults occurring around the sigreturn point on the stack,
and adding a sync greatly improves the stability.

* Re: RM7k cache_flush_sigtramp
  2003-08-01  0:40 ` Fuxin Zhang
@ 2003-08-01  3:01   ` Ralf Baechle
  2003-08-01  4:59     ` Fuxin Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2003-08-01  3:01 UTC (permalink / raw)
  To: Fuxin Zhang; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS

Adam,

On Fri, Aug 01, 2003 at 08:40:14AM +0800, Fuxin Zhang wrote:

> The current Linux code does exactly this. But without a sync I was seeing
> all kinds of faults occurring around the sigreturn point on the stack,
> and adding a sync greatly improves the stability.
> 
> >The ordering does matter however since the Hit_Invalidate_I makes sure the 
> >write buffer is flushed.

could there be an erratum explaining Fuxin's findings?

Fuxin, what version are you running?

  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-08-01  3:01   ` Ralf Baechle
@ 2003-08-01  4:59     ` Fuxin Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-01  4:59 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS

I am using a slightly modified 2.4.21-pre4, based on CVS from early this
month(?).
We have merged with the latest CVS; I will run it and report the result tonight.



* RE: RM7k cache_flush_sigtramp
  2003-07-31 16:50 Adam Kiepul
  2003-08-01  0:40 ` Fuxin Zhang
@ 2003-08-01  7:51 ` Dominic Sweetman
  2003-08-01  7:51   ` Dominic Sweetman
  2003-08-01  9:26   ` Ralf Baechle
  1 sibling, 2 replies; 22+ messages in thread
From: Dominic Sweetman @ 2003-08-01  7:51 UTC (permalink / raw)
  To: Adam Kiepul; +Cc: 'Ralf Baechle', Fuxin Zhang, linux-mips


> If this is just to ensure the I Cache coherency for modified code
> then the following should be sufficient:
> 
> cache Hit_Writeback_D, offset(base_register)
> cache Hit_Invalidate_I, offset(base_register)
> 
> The ordering does matter however since the Hit_Invalidate_I makes
> sure the write buffer is flushed.

I'm probably jumping into the middle of something, sorry... 

The MIPS32/MIPS64 release 2 architecture includes a useful instruction
SYNCI which does the whole job (repeat on each affected cache line)
and is legal in user mode; this will take a while to spread but I'd
recommend it as a model worth following.

So I hope that kernels will provide one function for "I've just
written instructions and now I want to execute them", and not export
the separate writeback-D/invalidate-I interface.
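
A minimal sketch of such a single "I've just written instructions and now
I want to execute them" entry point, assuming a GCC- or Clang-compatible
compiler. __builtin___clear_cache is a real compiler builtin that emits
whatever the target needs (possibly nothing on machines with coherent
instruction fetch). The names sync_icache_range and lines_touched are made
up for this illustration, and the 32-byte line size in the usage note is
only an example value, not a statement about any particular CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many cache lines does [addr, addr + len) touch?
 * This is the number of times a per-line operation such as SYNCI would
 * have to be repeated. line_size must be a power of two. */
static size_t lines_touched(uintptr_t addr, size_t len, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;
    uintptr_t first = addr & ~(uintptr_t)(line_size - 1);
    uintptr_t last  = (addr + len - 1) & ~(uintptr_t)(line_size - 1);
    return (size_t)((last - first) / line_size) + 1;
}

/* Hypothetical wrapper defined by purpose: make freshly written
 * instructions in [start, start + len) safe to execute. The caller never
 * sees separate writeback-D and invalidate-I steps. */
static void sync_icache_range(void *start, size_t len)
{
    char *begin = (char *)start;
    __builtin___clear_cache(begin, begin + len);
}
```

With 32-byte lines, an 8-byte trampoline that starts 4 bytes before a line
boundary touches two lines, so a per-line loop would run twice; a caller of
sync_icache_range does not need to know that.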

--
Dominic Sweetman
MIPS Technologies
dom@mips.com


* Re: RM7k cache_flush_sigtramp
  2003-08-01  7:51 ` Dominic Sweetman
  2003-08-01  7:51   ` Dominic Sweetman
@ 2003-08-01  9:26   ` Ralf Baechle
  2003-08-01 14:18     ` Fuxin Zhang
  2003-08-04  8:45     ` Dominic Sweetman
  1 sibling, 2 replies; 22+ messages in thread
From: Ralf Baechle @ 2003-08-01  9:26 UTC (permalink / raw)
  To: Dominic Sweetman; +Cc: Adam Kiepul, Fuxin Zhang, linux-mips

On Fri, Aug 01, 2003 at 08:51:39AM +0100, Dominic Sweetman wrote:

> The MIPS32/MIPS64 release 2 architecture includes a useful instruction
> SYNCI which does the whole job (repeat on each affected cache line)
> and is legal in user mode; this will take a while to spread but I'd
> recommend it as a model worth following.

> So I hope that kernels will provide one function for "I've just
> written instructions and now I want to execute them", and not export
> the separate writeback-D/invalidate-I interface.

Linux supports the traditional MIPS UNIX cacheflush(2) syscall through
a libc interface.  Since I've not seen any other use for the call than
I/D-cache synchronization, I'd just make cacheflush(3) use SYNCI where
available (or maybe one of the other vendor-specific mechanisms ...) and
fall back to cacheflush(2) otherwise.  GCC would be another place
to teach about SYNCI, for its trampolines.

  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-08-01  9:26   ` Ralf Baechle
@ 2003-08-01 14:18     ` Fuxin Zhang
  2003-08-02 17:02       ` Ralf Baechle
  2003-08-04  8:45     ` Dominic Sweetman
  1 sibling, 1 reply; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-01 14:18 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Dominic Sweetman, Adam Kiepul, linux-mips

I just ran a fresh 2.4.21 kernel on my board; no luck.  The problem
remains.  But I notice that my hardware may have some problems,
especially with the add-on IDE card. The headache continues...

As to the discussion of SYNC, I can't help wondering whether cache
management should be totally hidden from programmers. People tend to
write the "safest" code because of all kinds of brain-damaged, differing
hardware, which leads to inefficient code. And this cancels out the
potential speed benefit of simpler hardware. Also, today's hardware does
not seem as expensive as it was before...



* RE: RM7k cache_flush_sigtramp
@ 2003-08-01 15:42 Adam Kiepul
  2003-08-04  3:38 ` Fuxin Zhang
  2003-08-06 11:00 ` Fuxin Zhang
  0 siblings, 2 replies; 22+ messages in thread
From: Adam Kiepul @ 2003-08-01 15:42 UTC (permalink / raw)
  To: 'Fuxin Zhang', Ralf Baechle; +Cc: MAKE FUN PRANK CALLS

Hi Fuxin,

Could you please provide me with the _original_ Kernel code disassembly snippet around the point where your SYNC patch applies?
Also, can you check what RM7000 part revision is on your board? You can find it out by reading the PrID register.

I will check if there is an erratum that the code could trigger.

By the way, are you aware of any other ev64240 board that would exhibit the same behavior?

I would be quite careful about drawing any conclusions at the moment, since we cannot rule out the possibility that it is simply a "bad CPU on the board" case. Please note that the SYNC instruction changes a lot about how things physically happen in the CPU, so it can often mask various problems, such as a bad part.

Thank you,

Adam



* Re: RM7k cache_flush_sigtramp
  2003-08-01 14:18     ` Fuxin Zhang
@ 2003-08-02 17:02       ` Ralf Baechle
  0 siblings, 0 replies; 22+ messages in thread
From: Ralf Baechle @ 2003-08-02 17:02 UTC (permalink / raw)
  To: Fuxin Zhang; +Cc: Dominic Sweetman, Adam Kiepul, linux-mips

On Fri, Aug 01, 2003 at 10:18:50PM +0800, Fuxin Zhang wrote:

> I just ran a fresh 2.4.21 kernel on my board; no luck.  The problem
> remains.  But I notice that my hardware may have some problems,
> especially with the add-on IDE card. The headache continues...
> 
> As to the discussion of SYNC, I can't help wondering whether cache
> management should be totally hidden from programmers. People tend to
> write the "safest" code because of all kinds of brain-damaged, differing
> hardware, which leads to inefficient code. And this cancels out the
> potential speed benefit of simpler hardware. Also, today's hardware does
> not seem as expensive as it was before...

Cache management needs to be hidden from programmers as well as
possible - the average programmer has no clue about how caches work.
We've come up with an API that hides the actual functioning of caches
pretty well for DMA devices; see Documentation/DMA-mapping.txt and, in
2.6, a more generalized version documented in Documentation/DMA-API.txt.

  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-08-01 15:42 RM7k cache_flush_sigtramp Adam Kiepul
@ 2003-08-04  3:38 ` Fuxin Zhang
  2003-08-06 11:00 ` Fuxin Zhang
  1 sibling, 0 replies; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-04  3:38 UTC (permalink / raw)
  To: Adam Kiepul; +Cc: Ralf Baechle, MAKE FUN PRANK CALLS


Hi Adam,
   
   My CPU's PRID is 0x2732; it runs at 133x2 MHz.
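
For readers who want to split that PRID value into its fields, here is a
small sketch. It assumes the classic MIPS PRId layout (implementation in
bits 15:8, revision in bits 7:0) and the mask and RM7000 constants as the
Linux MIPS headers define them; the helper function names are made up for
this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PRID_IMP_MASK   0xff00u  /* implementation field, bits 15:8 */
#define PRID_REV_MASK   0x00ffu  /* revision field, bits 7:0 */
#define PRID_IMP_RM7000 0x2700u  /* implementation value used for the RM7000 */

/* Hypothetical helpers: pull the two fields out of a raw PRId value. */
static unsigned prid_imp(uint32_t prid) { return prid & PRID_IMP_MASK; }
static unsigned prid_rev(uint32_t prid) { return prid & PRID_REV_MASK; }

/* Returns nonzero if the implementation field identifies an RM7000. */
static int prid_is_rm7000(uint32_t prid)
{
    return prid_imp(prid) == PRID_IMP_RM7000;
}
```

Applied to 0x2732, the implementation field is 0x2700 and the revision
field is 0x32.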

Disassembly before the patch:

ffffffff8010f2fc <r4k_flush_cache_sigtramp>:
ffffffff8010f2fc:	3c03802c 	lui	v1,0x802c
ffffffff8010f300:	246367ec 	addiu	v1,v1,26604
ffffffff8010f304:	94620010 	lhu	v0,16(v1)
ffffffff8010f308:	94650000 	lhu	a1,0(v1)
ffffffff8010f30c:	00021023 	negu	v0,v0
ffffffff8010f310:	00821024 	and	v0,a0,v0
ffffffff8010f314:	bc550000 	cache	0x15,0(v0)
ffffffff8010f318:	00052823 	negu	a1,a1
ffffffff8010f31c:	00852024 	and	a0,a0,a1
ffffffff8010f320:	bc900000 	cache	0x10,0(a0)
ffffffff8010f324:	03e00008 	jr	ra
ffffffff8010f328:	00000000 	nop

Disassembly after the patch:
ffffffff8010ceb0 <r4k_flush_cache_sigtramp>:
ffffffff8010ceb0:	3c03802f 	lui	v1,0x802f
ffffffff8010ceb4:	2463e3ac 	addiu	v1,v1,-7252
ffffffff8010ceb8:	94620010 	lhu	v0,16(v1)
ffffffff8010cebc:	94650000 	lhu	a1,0(v1)
ffffffff8010cec0:	00021023 	negu	v0,v0
ffffffff8010cec4:	00821024 	and	v0,a0,v0
ffffffff8010cec8:	bc550000 	cache	0x15,0(v0)
ffffffff8010cecc:	0000000f 	sync
ffffffff8010ced0:	00052823 	negu	a1,a1
ffffffff8010ced4:	00852024 	and	a0,a0,a1
ffffffff8010ced8:	bc900000 	cache	0x10,0(a0)
ffffffff8010cedc:	03e00008 	jr	ra
ffffffff8010cee0:	00000000 	nop
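
The two cache words in the listings can be decoded by hand. The sketch
below assumes the standard MIPS CACHE encoding (major opcode in bits 31:26,
base register in bits 25:21, operation in bits 20:16) and the operation
values as named in Linux's cacheops.h; the function names are made up for
this illustration. It also shows what the negu/and pair in front of each
cache instruction computes.

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_OPCODE_CACHE   0x2fu  /* major opcode field, bits 31:26 */

/* Operation codes, named as in Linux's cacheops.h. */
#define Hit_Invalidate_I    0x10u
#define Hit_Writeback_Inv_D 0x15u
#define Hit_Writeback_D     0x19u

/* Field extractors for a 32-bit CACHE instruction word. */
static int is_cache_insn(uint32_t insn)       { return (insn >> 26) == MIPS_OPCODE_CACHE; }
static unsigned cache_base_reg(uint32_t insn) { return (insn >> 21) & 0x1fu; }
static unsigned cache_op(uint32_t insn)       { return (insn >> 16) & 0x1fu; }

/* negu followed by and computes addr & -line_size, which rounds addr down
 * to the start of its cache line when line_size is a power of two. */
static uint32_t line_base(uint32_t addr, uint32_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & (0u - line_size);
}
```

Decoded this way, bc550000 is operation 0x15 (Hit_Writeback_Inv_D) on
register 2 (v0) and bc900000 is operation 0x10 (Hit_Invalidate_I) on
register 4 (a0), so the writeback in this build is the
writeback-and-invalidate form rather than plain Hit_Writeback_D (0x19).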


We do have more than one ev64240 board and RM7k CPU, but it will take
some time for me to get another one to test. I will tell you the result
once I do.

Thank you.
 


* Re: RM7k cache_flush_sigtramp
  2003-08-01  9:26   ` Ralf Baechle
  2003-08-01 14:18     ` Fuxin Zhang
@ 2003-08-04  8:45     ` Dominic Sweetman
  2003-08-04 11:51       ` Maciej W. Rozycki
  1 sibling, 1 reply; 22+ messages in thread
From: Dominic Sweetman @ 2003-08-04  8:45 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Dominic Sweetman, Adam Kiepul, Fuxin Zhang, linux-mips


Ralf,

> Linux supports the traditional MIPS UNIX cacheflush(2) syscall through
> a libc interface.  Since I've not seen any other use for the call than
> I/D-cache synchronization.  I'd just make cacheflush(3) use SYNCI where
> available...

SYNCI just does what's required to execute code you just wrote: that's
a D-cache writeback and an I-cache invalidate.  It doesn't invalidate
the D-cache afterwards, which is required by the definition of
cacheflush(3).

I think it would be better to provide cache manipulation calls defined
top-down (by their purpose); but so long as we are stuck with calls
which are defined as performing particular low-level actions, it's
surely dangerous to guess that we know what they are used for so we
can trim the functions accordingly...

--
Dominic Sweetman
MIPS Technologies


* Re: RM7k cache_flush_sigtramp
  2003-08-04  8:45     ` Dominic Sweetman
@ 2003-08-04 11:51       ` Maciej W. Rozycki
  0 siblings, 0 replies; 22+ messages in thread
From: Maciej W. Rozycki @ 2003-08-04 11:51 UTC (permalink / raw)
  To: Dominic Sweetman; +Cc: Ralf Baechle, Adam Kiepul, Fuxin Zhang, linux-mips

Dominic,

> I think it would be better to provide cache manipulation calls defined
> top-down (by their purpose); but so long as we are stuck with calls
> which are defined as performing particular low-level actions, it's
> surely dangerous to guess that we know what they are used for so we
> can trim the functions accordingly...

 The API is not cast in stone -- if there's a justifiable benefit,
appropriate functions can be added; either completely new ones (possibly
inlined) or as an extension to cacheflush() (which still has 30 bits
freely available).
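
To make the "30 bits" remark concrete, here is a small sketch. ICACHE,
DCACHE and BCACHE carry the values from the Linux MIPS <asm/cachectl.h>
header; CF_EXEC_AFTER_WRITE, CF_KNOWN_FLAGS and both functions are purely
hypothetical, invented here to show what a purpose-defined extension bit
could look like. The sketch also assumes a 32-bit flag word.

```c
#include <assert.h>

/* Existing cacheflush() selectors, as defined for Linux/MIPS. */
#define ICACHE (1u << 0)
#define DCACHE (1u << 1)
#define BCACHE (ICACHE | DCACHE)

/* Hypothetical, not part of any real ABI: "I wrote code here and want to
 * execute it", leaving the choice of mechanism to the implementation. */
#define CF_EXEC_AFTER_WRITE (1u << 2)

#define CF_KNOWN_FLAGS (BCACHE | CF_EXEC_AFTER_WRITE)

/* Returns 1 if flags is non-empty and uses only bits this sketch knows. */
static int cacheflush_flags_valid(unsigned int flags)
{
    return flags != 0 && (flags & ~CF_KNOWN_FLAGS) == 0;
}

/* Counts the bits of a 32-bit flag word that used_mask leaves unassigned. */
static unsigned free_flag_bits(unsigned int used_mask)
{
    unsigned n = 0;
    for (unsigned bit = 0; bit < 32; bit++)
        if (!(used_mask & (1u << bit)))
            n++;
    return n;
}
```

With only ICACHE and DCACHE assigned, free_flag_bits(BCACHE) gives the 30
spare bits mentioned above.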

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: RM7k cache_flush_sigtramp
  2003-08-01 15:42 RM7k cache_flush_sigtramp Adam Kiepul
  2003-08-04  3:38 ` Fuxin Zhang
@ 2003-08-06 11:00 ` Fuxin Zhang
  2003-08-06 11:55   ` Ralf Baechle
  1 sibling, 1 reply; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-06 11:00 UTC (permalink / raw)
  To: Adam Kiepul; +Cc: Ralf Baechle, MAKE FUN PRANK CALLS

[-- Attachment #1: Type: text/plain, Size: 4266 bytes --]

Hi,
   These days I have performed more experiments on our ev64240 board.

   It now seems I have at least two problems: the sigtramp flush and the
L3 cache.

   Let me describe the symptoms first.
      1. fsck on /dev/hda4 (a 10G partition of a 40G IDE disk on a PCI
         add-on card that uses an Intel PIIX4 chip) frequently fails with
         oopses in various places: __remove_inode_queue, free_buffers,
         vmscan:359, etc.
      2. Occasionally other applications fail with a segmentation fault
         or bus error.
      3. The X Window System is extremely unstable; both the applications
         and the X server may fail with SIGILL/SIGSEGV/SIGBUS, etc.

   To address the problems, I modified arch/mips/signal.c to make the
kernel dump core unconditionally (even if a user handler is installed) for
SIGILL/SIGSEGV/SIGBUS. This way I got many core files for XFree86, and I
found that they all look quite similar: all around the point of the
kernel-generated sigreturn code (two examples are attached). Days ago I
added a 'sync' after the writeback and the situation was much better. But
I still see this kind of failure even with the 'sync'. I had to go further
and use 'Writeback_SD'; so far there have been no more such faults. But
just as Adam pointed out, it may just mask another error. I have added
code to r4k_flush_cache_sigtramp and sigreturn, and when the X server
fails I do observe that there is a flush for the faulting point, but no
sigreturn is executed. So I am at my wit's end :(. Maybe some complex
scheduling or re-entrancy problem? Or even a potential bug in context
management (e.g. we are using another's stack)?

   Using Writeback_SD only helps the X server problem; the other problems
look cache related. So I tried running with the L3 cache disabled. That
helps greatly: no oopses now. With a little tweak to the IDE code, the
'lost interrupt' problem seems to be gone too. But with only L3 disabled,
the X server problem remains.

  I am doing a stress test now. I hope it won't give me more surprises.

  And here I have a question for Adam: the original Linux code uses
'Writeback_Inv_D' and 'Hit_Invalidate_I', not 'Writeback_D' and
'Hit_Invalidate_I'. Could that lead to the problem?

 BTW, a silly question: how can I make my email show up prettier? I find
that the mailing list often breaks my lines very badly. I feel guilty
about that :) I am using Mozilla Composer, and the original line breaks
are inserted manually (I hit Enter when I feel a line is long enough).

Thank you for any help.
 

[-- Attachment #2: out --]
[-- Type: text/plain, Size: 3968 bytes --]

GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "mipsel-linux"...(no debugging symbols found)...
Core was generated by `/usr/bin/X11/X -dpi 100 -nolisten tcp'.
Program terminated with signal 4, Illegal instruction.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/ld.so.1
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...
done.
Loaded symbols for /lib/libnss_files.so.2

    GDB is unable to find the start of the function at 0x7fff7600
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x7fff7600 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
#0  0x7fff7600 in ?? ()
(gdb) where
#0  0x7fff7600 in ?? ()
(gdb) disass 0x7fff7580 0x7fff7680
Dump of assembler code from 0x7fff7580 to 0x7fff7680:
0x7fff7580:	nop
0x7fff7584:	nop
0x7fff7588:	nop
0x7fff758c:	nop
0x7fff7590:	nop
0x7fff7594:	nop
0x7fff7598:	nop
0x7fff759c:	nop
0x7fff75a0:	nop
0x7fff75a4:	nop
0x7fff75a8:	nop
0x7fff75ac:	nop
0x7fff75b0:	nop
0x7fff75b4:	nop
0x7fff75b8:	nop
0x7fff75bc:	nop
0x7fff75c0:	nop
0x7fff75c4:	nop
0x7fff75c8:	nop
0x7fff75cc:	nop
0x7fff75d0:	nop
0x7fff75d4:	beq	at,t3,0x80003ef8
0x7fff75d8:	sllv	zero,zero,zero
0x7fff75dc:	0xe
0x7fff75e0:	beq	zero,gp,0x7ffef244
0x7fff75e4:	beq	zero,t0,0x800012e8
0x7fff75e8:	0x7fff7600
0x7fff75ec:	beq	zero,t2,0x7ffd98b0
0x7fff75f0:	sd	ra,-1(ra)
0x7fff75f4:	sd	ra,-1(ra)
0x7fff75f8:	slti	sp,s6,-25040
0x7fff75fc:	beq	zero,t2,0x7ffd9cc0
0x7fff7600:	li	v0,4119
0x7fff7604:	syscall
0x7fff7608:	0x12c
0x7fff760c:	lb	a0,-19437(zero)
0x7fff7610:	slti	s1,s6,17812
0x7fff7614:	nop
0x7fff7618:	nop
0x7fff761c:	nop
0x7fff7620:	0xcf9210
0x7fff7624:	nop
0x7fff7628:	mfhi	zero
0x7fff762c:	nop
0x7fff7630:	beq	zero,at,0x8000caa4
0x7fff7634:	nop
0x7fff7638:	0xe
0x7fff763c:	nop
0x7fff7640:	slti	v0,k1,28680
0x7fff7644:	nop
0x7fff7648:	0x1228
0x7fff764c:	nop
0x7fff7650:	nop
0x7fff7654:	nop
0x7fff7658:	multu	zero,zero
0x7fff765c:	nop
0x7fff7660:	nop
0x7fff7664:	nop
0x7fff7668:	slti	t9,s7,17756
0x7fff766c:	nop
0x7fff7670:	beq	at,t3,0x7fff6f04
0x7fff7674:	nop
0x7fff7678:	0x12c
0x7fff767c:	nop
End of assembler dump.
(gdb) info reg
          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 b004b400 ffffffff ffffffff 00000001 7fff73f8 00000000 00000000 
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   0000b400 00000000 00000000 00000000 00000000 822b2880 822b2900 00000000 
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  00000000 102b3248 00000004 0000000e 101cdf18 102b1820 00000001 00000000 
            t8       t9       k0       k1       gp       sp       s8       ra
 R24  00000000 006c86ec 00000000 00000000 10082740 7fff75f0 7fff7d08 7fff7600 
            sr       lo       hi      bad    cause       pc
      a004b413 00000002 00000000 8009c6a0 00000028 7fff7600 
           fsr      fir       fp
      00800004 00000000 00000000 
(gdb) quit

[-- Attachment #3: out1 --]
[-- Type: text/plain, Size: 3412 bytes --]

GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "mipsel-linux"...(no debugging symbols found)...
Core was generated by `/bin/sh /usr/bin/X11/startx'.
Program terminated with signal 4, Illegal instruction.

    GDB is unable to find the start of the function at 0x7fff75b8
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x7fff75b8 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
#0  0x7fff75b8 in ?? ()
(gdb) info reg
          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 2ad918f0 2ad918f0 0000000a 00000012 7fff7538 00000001 00000001 
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   0000000a 2aca6394 00000000 00000004 00000000 00000000 00000000 07200720 
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  00000000 00000004 00000080 7fff7878 00000003 ffffffff 1000f0f8 00000001 
            t8       t9       k0       k1       gp       sp       s8       ra
 R24  00000000 00000000 00000000 00000000 1000d880 7fff7590 00000003 7fff75a0 
            sr       lo       hi      bad    cause       pc
      a004f413 000001b0 00000000 8009c6a0 80000028 7fff75b8 
           fsr      fir       fp
      00000000 00000000 00000000 
(gdb) disass 0x7fff7500 0x7fff7600
Dump of assembler code from 0x7fff7500 to 0x7fff7600:
0x7fff7500:	0xc2009d
0x7fff7504:	0x10000e8
0x7fff7508:	0x11a0110
0x7fff750c:	0x990121
0x7fff7510:	slti	t9,s6,32304
0x7fff7514:	tltu	a0,t9,0x2
0x7fff7518:	slti	t9,s6,32304
0x7fff751c:	0x442c88
0x7fff7520:	nop
0x7fff7524:	nop
0x7fff7528:	nop
0x7fff752c:	nop
0x7fff7530:	b	0x7ffed734
0x7fff7534:	nop
0x7fff7538:	nop
0x7fff753c:	slti	t8,s6,-8108
0x7fff7540:	nop
0x7fff7544:	sllv	zero,zero,zero
0x7fff7548:	sll	zero,zero,0x2
0x7fff754c:	0x7fff7878
0x7fff7550:	sra	zero,zero,0x0
0x7fff7554:	sd	ra,-1(ra)
0x7fff7558:	b	0x7fff393c
0x7fff755c:	b	0x7ffed760
0x7fff7560:	teq	v0,a0,0xa9
0x7fff7564:	nop
0x7fff7568:	nop
0x7fff756c:	nop
0x7fff7570:	nop
0x7fff7574:	nop
0x7fff7578:	b	0x7ffed77c
0x7fff757c:	nop
0x7fff7580:	nop
0x7fff7584:	b	0x7ffed788
0x7fff7588:	0x7fff75a0
0x7fff758c:	0x1
0x7fff7590:	b	0x7fff2174
0x7fff7594:	0x7fff7804
0x7fff7598:	slti	t9,s6,32304
0x7fff759c:	0x475718
0x7fff75a0:	li	v0,4119
0x7fff75a4:	syscall
0x7fff75a8:	slti	t8,s6,-8108
0x7fff75ac:	lb	a0,-3053(zero)
0x7fff75b0:	slti	t5,s6,9620
0x7fff75b4:	nop
0x7fff75b8:	nop
0x7fff75bc:	nop
0x7fff75c0:	b	0x7fff8cb4
0x7fff75c4:	nop
0x7fff75c8:	sllv	zero,zero,zero
0x7fff75cc:	nop
0x7fff75d0:	nop
0x7fff75d4:	nop
0x7fff75d8:	sra	zero,zero,0x0
0x7fff75dc:	nop
0x7fff75e0:	0x7fff7878
0x7fff75e4:	nop
0x7fff75e8:	sll	zero,zero,0x2
0x7fff75ec:	nop
0x7fff75f0:	0x1
0x7fff75f4:	nop
0x7fff75f8:	mult	zero,zero
0x7fff75fc:	nop
End of assembler dump.
(gdb) quit


* Re: RM7k cache_flush_sigtramp
  2003-08-06 11:00 ` Fuxin Zhang
@ 2003-08-06 11:55   ` Ralf Baechle
  2003-08-06 12:52     ` Fuxin Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2003-08-06 11:55 UTC (permalink / raw)
  To: Fuxin Zhang; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS

On Wed, Aug 06, 2003 at 07:00:07PM +0800, Fuxin Zhang wrote:

> And here I have a question for Mr. Adam: the original Linux code uses
> "Writeback_Inv_D" and "Hit_Invalidate_I", not "Writeback_D" and
> "Hit_Invalidate_I". Could that lead to the problem?

No.  To synchronize the D-cache and I-cache it's irrelevant if you
invalidate the D-cache or not.
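
[Editorial illustration, not part of the original mail.] The point above
can be pictured with a toy C model of a single cache line. It is only a
sketch under simplifying assumptions: every name in it (toy_machine,
toy_install_and_fetch, and so on) is hypothetical, "memory" stands for
whatever level the I-cache refills from, and the RM7000 write buffer
discussed earlier in the thread is not modelled. It shows that the I-cache
sees the new instruction whether the D-cache line is merely written back
or written back and invalidated, and that it sees the stale one when
nothing is flushed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one line in the D-cache, one in the I-cache, one in memory. */
struct toy_machine {
    uint32_t memory;        /* level the I-cache refills from (L2/RAM)  */
    uint32_t dcache_data;
    bool     dcache_valid;
    bool     dcache_dirty;
    uint32_t icache_data;
    bool     icache_valid;
};

enum toy_flush { TOY_NO_FLUSH, TOY_WRITEBACK_D, TOY_WRITEBACK_INV_D };

/* A store lands in the D-cache only (write-back cache). */
static void toy_store(struct toy_machine *m, uint32_t insn)
{
    m->dcache_data = insn;
    m->dcache_valid = true;
    m->dcache_dirty = true;
}

/* Write a dirty line back; optionally drop it from the D-cache as well. */
static void toy_writeback_d(struct toy_machine *m, bool also_invalidate)
{
    if (m->dcache_valid && m->dcache_dirty) {
        m->memory = m->dcache_data;
        m->dcache_dirty = false;
    }
    if (also_invalidate)
        m->dcache_valid = false;
}

static void toy_invalidate_i(struct toy_machine *m)
{
    m->icache_valid = false;
}

/* Instruction fetch: hit in the I-cache, else refill from memory. */
static uint32_t toy_fetch(struct toy_machine *m)
{
    if (!m->icache_valid) {
        m->icache_data = m->memory;
        m->icache_valid = true;
    }
    return m->icache_data;
}

/* Start with 'stale' cached for execution, store 'fresh', flush, fetch. */
static uint32_t toy_install_and_fetch(enum toy_flush how,
                                      uint32_t stale, uint32_t fresh)
{
    struct toy_machine m = {
        .memory = stale, .icache_data = stale, .icache_valid = true
    };

    toy_store(&m, fresh);
    if (how != TOY_NO_FLUSH) {
        toy_writeback_d(&m, how == TOY_WRITEBACK_INV_D);
        toy_invalidate_i(&m);
    }
    return toy_fetch(&m);
}
```

In this model toy_install_and_fetch() returns the fresh value for both
TOY_WRITEBACK_D and TOY_WRITEBACK_INV_D; the two differ only in whether
the D-cache still holds the line afterwards.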

> BTW, a silly question: how can I make my email show up prettier? I find
> that the mailing list often breaks my lines very badly. I feel guilty
> about that :) I am using Mozilla Composer; the original line breaks are
> inserted manually (I hit Enter when I feel the line is long enough).

Format your email with hard breaks to about 75 columns.  75 columns
because god made vt100 with 80 columns so that leaves a bit of space for
quoting your mail nicely.

Now for your register dumps and information:

> (gdb) info reg
[...]
>            t8       t9       k0       k1       gp       sp       s8       ra
> R24  00000000 00000000 00000000 00000000 1000d880 7fff7590 00000003 7fff75a0
>            sr       lo       hi      bad    cause       pc
>      a004f413 000001b0 00000000 8009c6a0 80000028 7fff75b8
[...]

> 0x7fff75a0:     li      v0,4119
> 0x7fff75a4:     syscall
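
[Editorial illustration, not part of the original mail.] For readers
following the dump, the small C helpers below decode the values quoted
above. They are a sketch with hypothetical names; the facts they rely on
are that the MIPS Cause register keeps ExcCode in bits 6..2 and the
branch-delay flag in bit 31, that ExcCode 10 means Reserved Instruction,
and that o32 sigreturn is syscall 4000 + 119 = 4119. The "li v0,imm"
encoding is assumed to be addiu v0,zero,imm.

```c
#include <stdint.h>

#define TOY_MIPS_NR_LINUX_O32   4000u                          /* o32 syscall base   */
#define TOY_MIPS_NR_SIGRETURN   (TOY_MIPS_NR_LINUX_O32 + 119u) /* 4119              */
#define TOY_MIPS_EXC_RI         10u                            /* Reserved Instruction */
#define TOY_MIPS_SYSCALL_INSN   0x0000000cu                    /* "syscall"          */

/* ExcCode lives in Cause bits 6..2. */
static unsigned toy_cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}

/* BD (Cause bit 31) is set when the exception hit a branch delay slot. */
static int toy_cause_in_delay_slot(uint32_t cause)
{
    return (int)((cause >> 31) & 1u);
}

/* Encode "li v0, nr" as addiu v0, zero, nr (16-bit immediate). */
static uint32_t toy_li_v0(uint32_t nr)
{
    return 0x24020000u | (nr & 0xffffu);
}
```

With the cause value 80000028 from the dump, toy_cause_exccode() gives 10
(Reserved Instruction, matching the "signal 4, Illegal instruction" in the
core file) and the BD flag is set. toy_li_v0(TOY_MIPS_NR_SIGRETURN) gives
0x24021017, which together with the syscall word is the two-instruction
sequence shown at 0x7fff75a0.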

So the pc is pointing just after the trampoline, which looks suspiciously
like the return of an old bug.  Could your application be doing something
unusual such as forking from a signal handler or similar?  The scenario
goes roughly like this:

 - kernel installs signal trampoline on stack
 - kernel forks.  Now the signal trampoline installed in the first step
   resides on a copy-on-write page.
 - newly created process touches the cow page, thereby resulting in
   breaking of the cow page.  Now parent and child have their own copy
   of the page.  BUT: flush_cache_page() doesn't properly flush this page.
 - Parent executes again on the copy of the page for which caches have
   not been flushed properly in the previous step, thereby failing to
   execute the trampoline - crash.
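
[Editorial illustration, not part of the original mail.] The sequence in
the list above could be exercised from user space with something like the
following C sketch. It is an assumption-laden test idea, not a known
reproducer for this crash: the function names are hypothetical, and it
only matters on kernels that place the signal trampoline on the user
stack, as the kernels discussed in this thread do. The handler forks, the
child writes to the shared stack page and exits, and the parent then
returns from the handler, which is the point where the trampoline has to
execute correctly.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t toy_handler_ran;

/* Signal handler that forks while the signal frame is live on the stack. */
static void toy_fork_in_handler(int sig)
{
    volatile char scratch[64];
    pid_t pid;
    size_t i;

    (void)sig;
    pid = fork();
    if (pid == 0) {
        /* Child: dirty the stack page it shares copy-on-write. */
        for (i = 0; i < sizeof scratch; i++)
            scratch[i] = 0x5a;
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);   /* let the child touch the page first */
    toy_handler_ran = 1;
    /* Returning from here runs the sigreturn path in the parent. */
}

/* Returns 0 if the parent survived returning from the handler. */
static int toy_run_fork_in_handler_once(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = toy_fork_in_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    toy_handler_ran = 0;
    raise(SIGUSR1);              /* returns only after the handler has run */
    sigaction(SIGUSR1, &old, NULL);
    return toy_handler_ran ? 0 : -1;
}
```

Calling toy_run_fork_in_handler_once() in a loop from main() would be one
way to stress the path; on an affected system the expected failure mode is
the process dying with SIGILL or SIGSEGV rather than a non-zero return.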

  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-08-06 11:55   ` Ralf Baechle
@ 2003-08-06 12:52     ` Fuxin Zhang
  2003-08-06 14:45       ` Ralf Baechle
  0 siblings, 1 reply; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-06 12:52 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS



Ralf Baechle wrote:

>On Wed, Aug 06, 2003 at 07:00:07PM +0800, Fuxin Zhang wrote:
>
>> And here I have a question for Mr. Adam: the original Linux code uses
>> "Writeback_Inv_D" and "Hit_Invalidate_I", not "Writeback_D" and
>> "Hit_Invalidate_I". Could that lead to the problem?
>
>No.  To synchronize the D-cache and I-cache it's irrelevant if you
>invalidate the D-cache or not.
>
I think so too. I asked just in case the hardware is doing something
strange :)

>> BTW, a silly question: how can I make my email show up prettier? I find
>> that the mailing list often breaks my lines very badly. I feel guilty
>> about that :) I am using Mozilla Composer; the original line breaks are
>> inserted manually (I hit Enter when I feel the line is long enough).
>
>Format your email with hard breaks to about 75 columns.  75 columns
>because god made vt100 with 80 columns so that leaves a bit of space for
>quoting your mail nicely.
>
Thanks :)

>
>Now for your register dumps and information:
>
>>           sr       lo       hi      bad    cause       pc
>>     a004f413 000001b0 00000000 8009c6a0 80000028 7fff75b8
>[...]
>>0x7fff75a0:     li      v0,4119
>>0x7fff75a4:     syscall
>
>So the pc is pointing just after the trampoline which suspiciously looks
>like the return of an old bug.  Could your application be doing something
>unusual such as forking from a signal handler or similar?  The scenario
>
I am not sure. It is the standard X distribution from Debian woody. The
crash is fairly easy to reproduce: just move the mouse around and click
here and there, and it dies. I will check this later, but I think such a
giant as the X server won't fork frequently.

>is about
>
> - kernel installs signal trampoline on stack
> - kernel forks.  Now the signal trampoline installed in the first step
>   resides on a copy-on-write page.
> - newly created process touches the cow page, thereby resulting in
>   breaking of the cow page.  Now parent and child have their own copy
>   of the page.  BUT: flush_cache_page() doesn't properly flush this page
> - Parent executes again on the copy of the page for which caches have
>
If the new process touches the cow page first, shouldn't it get a new page
and leave the original page for the parent? If so, the parent should be
able to see the trampoline content from the icache anyway (either L2 or
memory should have the value), though the child may not?

>   not been flushed properly in the previous step, thereby failing to
>   execute the trampoline - crash.
>
The RM7000 has 16k 4-way set-associative primary caches, which are
supposed to have no cache aliasing problem.

Bad news:

oops again :(  while true; do fsck -y -f /dev/hda4 ; done

after about 5 successful runs.
So there are still some problems lurking somewhere.

It seems I have to switch some hardware...

>
>  Ralf
>


* Re: RM7k cache_flush_sigtramp
  2003-08-06 12:52     ` Fuxin Zhang
@ 2003-08-06 14:45       ` Ralf Baechle
  2003-08-06 15:04         ` Fuxin Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ralf Baechle @ 2003-08-06 14:45 UTC (permalink / raw)
  To: Fuxin Zhang; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS

On Wed, Aug 06, 2003 at 08:52:46PM +0800, Fuxin Zhang wrote:

> I am not sure. It is the standard X distribution from Debian woody. The
> crash is fairly easy to reproduce: just move the mouse around and click
> here and there, and it dies. I will check this later, but I think such a
> giant as the X server won't fork frequently.

The scenario I was describing was just how we originally discovered the
bug.  Supposedly that was fixed, but your register dump and disassembly
show the exact fingerprint of that old problem, so I thought I should
describe it in the hope it's going to help you.

> If the new process touches the cow page first, shouldn't it get a new page
> and leave the original page for the parent? If so, the parent should be
> able to see the trampoline content from the icache anyway (either L2 or
> memory should have the value), though the child may not?

RM7000 has a physically indexed cache.  That means if the copy of the
page wasn't explicitly or implicitly written back to L2, whichever
process ends up with the copy of the page might fetch stale instructions
from memory - boom.

> >  not been flushed properly in the previous step, thereby failing to
> >  execute the trampoline - crash.
> >
> The RM7000 has 16k 4-way set-associative primary caches, which are
> supposed to have no cache aliasing problem.

The described scenario is not an aliasing problem; it's the case where the
copy of the cow page hasn't properly been flushed at all.  When we
isolated the bug, it turned out that neither flush_page_to_ram() nor
flush_cache_page() were flushing the cache.  I suspect your case must be
something fairly similar.

  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-08-06 14:45       ` Ralf Baechle
@ 2003-08-06 15:04         ` Fuxin Zhang
  2003-08-06 22:30           ` Ralf Baechle
  0 siblings, 1 reply; 22+ messages in thread
From: Fuxin Zhang @ 2003-08-06 15:04 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS



Ralf Baechle wrote:

>>If the new process touches the cow page first, shouldn't it get a new page
>>and leave the original page for the parent? If so, the parent should be
>>able to see the trampoline content from the icache anyway (either L2 or
>>memory should have the value), though the child may not?
>
>RM7000 has a physically indexed cache.  That means if the copy of the
>page wasn't explicitly or implicitly written back to L2, whichever
>process ends up with the copy of the page might fetch stale instructions
>from memory - boom.
>
>>> not been flushed properly in the previous step, thereby failing to
>>> execute the trampoline - crash.
>>>
>>The RM7000 has 16k 4-way set-associative primary caches, which are
>>supposed to have no cache aliasing problem.
>
>The described scenario is not an aliasing problem; it's the case where the
>copy of the cow page hasn't properly been flushed at all.  When we
>isolated the bug, it turned out that neither flush_page_to_ram() nor
>flush_cache_page() were flushing the cache.  I suspect your case must be
>something fairly
>
After the cache rewrite, flush_page_to_ram() is a no-op, and in this case
flush_cache_page() does nothing for a stack page (it flushes only when
has_dc_aliases or the exec flag is set). So whichever process uses the new
copy will have a problem?! Am I missing something?

Thank you very much, great Ralf:).

>similar.
>
>  Ralf


* Re: RM7k cache_flush_sigtramp
  2003-08-06 15:04         ` Fuxin Zhang
@ 2003-08-06 22:30           ` Ralf Baechle
  0 siblings, 0 replies; 22+ messages in thread
From: Ralf Baechle @ 2003-08-06 22:30 UTC (permalink / raw)
  To: Fuxin Zhang; +Cc: Adam Kiepul, MAKE FUN PRANK CALLS

On Wed, Aug 06, 2003 at 11:04:19PM +0800, Fuxin Zhang wrote:

> After the cache rewrite, flush_page_to_ram() is a no-op, and in this case
> flush_cache_page() does nothing for a stack page (it flushes only when
> has_dc_aliases or the exec flag is set). So whichever process uses the new
> copy will have a problem?! Am I missing something?

The stack page contains the trampoline so it must be marked executable,
so on an RM7000 flush_dcache_page must flush both the D-cache and
I-cache.
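
[Editorial illustration, not part of the original mail.] The decision
being discussed can be written down as a tiny C model. This is a sketch
of the rule as stated in this exchange (flush only when the CPU has
D-cache aliases or the page is executable, and an executable page needs
both caches handled); it is not the kernel's actual code, and all names
other than has_dc_aliases are hypothetical.

```c
#include <stdbool.h>

struct toy_cpu {
    bool has_dc_aliases;    /* true if the D-cache can alias */
};

struct toy_flush_plan {
    bool flush_dcache;
    bool flush_icache;
};

/* Decide what a page flush has to do for a page on the given CPU. */
static struct toy_flush_plan
toy_flush_cache_page_plan(const struct toy_cpu *cpu, bool page_is_exec)
{
    struct toy_flush_plan plan = { false, false };

    /* Nothing to do unless the D-cache aliases or the page is executable. */
    if (cpu->has_dc_aliases || page_is_exec)
        plan.flush_dcache = true;

    /* Executable pages must also be made visible to instruction fetch. */
    if (page_is_exec)
        plan.flush_icache = true;

    return plan;
}
```

For an RM7000-like CPU (has_dc_aliases false) the model returns an empty
plan for a non-executable page and a D-cache plus I-cache flush for an
executable one, so the outcome for the trampoline page hinges on whether
the stack mapping carries the exec flag.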

  Ralf


end of thread, other threads:[~2003-08-06 22:30 UTC | newest]

Thread overview: 22+ messages
2003-08-01 15:42 RM7k cache_flush_sigtramp Adam Kiepul
2003-08-04  3:38 ` Fuxin Zhang
2003-08-06 11:00 ` Fuxin Zhang
2003-08-06 11:55   ` Ralf Baechle
2003-08-06 12:52     ` Fuxin Zhang
2003-08-06 14:45       ` Ralf Baechle
2003-08-06 15:04         ` Fuxin Zhang
2003-08-06 22:30           ` Ralf Baechle
  -- strict thread matches above, loose matches on Subject: below --
2003-07-31 16:50 Adam Kiepul
2003-08-01  0:40 ` Fuxin Zhang
2003-08-01  3:01   ` Ralf Baechle
2003-08-01  4:59     ` Fuxin Zhang
2003-08-01  7:51 ` Dominic Sweetman
2003-08-01  7:51   ` Dominic Sweetman
2003-08-01  9:26   ` Ralf Baechle
2003-08-01 14:18     ` Fuxin Zhang
2003-08-02 17:02       ` Ralf Baechle
2003-08-04  8:45     ` Dominic Sweetman
2003-08-04 11:51       ` Maciej W. Rozycki
2003-07-31  1:56 Fuxin Zhang
2003-07-31 11:46 ` Ralf Baechle
2003-07-31 12:57   ` Fuxin Zhang
