Linux MIPS Architecture development
* Memory corruption
@ 1999-06-22  1:39 Ulf Carlsson
  1999-06-30  1:01 ` William J. Earl
  0 siblings, 1 reply; 14+ messages in thread
From: Ulf Carlsson @ 1999-06-22  1:39 UTC (permalink / raw)
  To: linux

Hi,

The compiler may stop working sometimes on certain files, giving bogus error
messages which I don't understand (the compiler is probably not the only
application affected).  Running this program I just wrote forces the corrupted
caches to be flushed or something and ``fixes'' the problems:

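#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	unsigned long tot = 0;
	unsigned long i = 1UL << 20;
	void *p;
	int failures = 0;

	/*
	 * Allocate and touch memory (deliberately never freed) until
	 * malloc() keeps failing, halving the request size after
	 * repeated failures.
	 */
	while (i) {
		p = malloc(i);
		if (!p) {
			if (failures++ < 10)
				continue;
			i = i >> 1;
			failures = 0;
			continue;
		}
		memset(p, 0, i);
		tot += i;
	}
	printf("Total memory set: %lu kb\n", tot >> 10);
	return 0;
}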

Maybe I should put this in my crontab along with sync :-)

Does anyone else notice these problems?

- Ulf


* Re: Memory corruption
  1999-06-22  1:39 Memory corruption Ulf Carlsson
@ 1999-06-30  1:01 ` William J. Earl
  1999-06-30  2:47   ` Ulf Carlsson
  0 siblings, 1 reply; 14+ messages in thread
From: William J. Earl @ 1999-06-30  1:01 UTC (permalink / raw)
  To: Ulf Carlsson; +Cc: linux

Ulf Carlsson writes:
 > Hi,
 > 
 > The compiler may stop working sometimes on certain files, giving bogus error
 > messages which I don't understand (the compiler is probably not the only
 > application affected).  Running this program I just wrote forces the corrupted
 > caches to be flushed or something and ``fixes'' the problems:
...

      This problem sounds like a cache flushing problem.  Do you also
get SIGILL, SIGBUS, and SIGSEGV failures?  One possibility is that the icache
is not being flushed on a page fault, when a page is read in from disk,
and the icache still has old data in it.  This could lead to a cache line
of bogus instructions being executed.

      What model of CPU do you have in your machine?


* Re: Memory corruption
  1999-06-30  1:01 ` William J. Earl
@ 1999-06-30  2:47   ` Ulf Carlsson
  1999-06-30 22:01     ` William J. Earl
  0 siblings, 1 reply; 14+ messages in thread
From: Ulf Carlsson @ 1999-06-30  2:47 UTC (permalink / raw)
  To: William J. Earl; +Cc: linux

>  > The compiler may stop working sometimes on certain files, giving bogus
>  > error messages which I don't understand (the compiler is probably not the
>  > only application affected).  Running this program I just wrote forces the
>  > corrupted caches to be flushed or something and ``fixes'' the problems:
> ...
> 
>       This problem sounds like a cache flushing problem.  Do you also get
>       SIGILL, SIGBUS, and SIGSEGV failures?  One possibility is that the
>       icache is not being flushed on a page fault, when a page is read in from
>       disk, and the icache still has old data in it.  This could lead to a
>       cache line of bogus instructions being executed.

Sometimes when this happens I think I only get a SIGSEGV or a SIGBUS, otherwise
I get internal compiler errors.  It's hard to say since these problems are very
hard to reproduce, and I forget what happens from time to time.  I have
unfortunately not written down the results.  It sounds like this may be the
cause of the type of file corruption I have when only a little part of the file
is damaged (sounds like the problem covers both icache and dcache).  That type
of file corruption goes away after reboot.  I haven't had a chance to try this
with my discard-disk-cache program since this happens very seldom..

>       What model of CPU do you have in your machine?

I have a 133 MHz R4600 with 512kb board cache, 16kb dcache and 16kb icache.

Regards,
Ulf


* Re: Memory corruption
  1999-06-30  2:47   ` Ulf Carlsson
@ 1999-06-30 22:01     ` William J. Earl
  1999-07-01  0:23       ` Ralf Baechle
  0 siblings, 1 reply; 14+ messages in thread
From: William J. Earl @ 1999-06-30 22:01 UTC (permalink / raw)
  To: Ulf Carlsson; +Cc: linux, ralf

Ulf Carlsson writes:
...
 > Sometimes when this happens I think I only get a SIGSEGV or a SIGBUS, otherwise
 > I get internal compiler errors.  It's hard to say since these problems are very
 > hard to reproduce, and I forget what happens from time to time.  I have
 > unfortunately not written down the results.  It sounds like this may be the
 > cause of the type of file corruption I have when only a little part of the file
 > is damaged (sounds like the problem covers both icache and dcache).  That type
 > of file corruption goes away after reboot.  I haven't had a chance to try this
 > with my discard-disk-cache program since this happens very seldom..
 > 
 > >       What model of CPU do you have in your machine?
 > 
 > I have a 133 MHz R4600 with 512kb board cache, 16kb dcache and 16kb icache.

     I have been looking at the fault handling and the cache flushing routines
for the R4600.  In do_no_page() in mm/memory.c, we have:

	flush_page_to_ram(page);

I don't see where any code invalidates the icache, which might have
cached lines from a previous incarnation of the page.
flush_page_to_ram(), for the R4600, essentially does a writeback of
the dcache, if I understand the code correctly.  I believe that an
icache invalidate is also needed, at least for executable pages
(including any page for which mprotect() with PROT_EXEC has been
called, not just for text pages from an executable file).  Also,
unless something has changed, my understanding is that conflicting
virtual aliases (in the dcache) are still possible, which will also
lead to data corruption when it happens.

     In particular, if process A mmaps a file page at virtual index
0 and process B happens to mmap the same file page at virtual index
1, they will in general corrupt each other's view of the data.
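The aliasing condition described above can be sketched in C. This is only an illustration: the helper names `cache_color()` and `aliases_conflict()` are hypothetical, and the geometry passed in the usage note (4 KB pages, 8 KB per cache way) is an assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the "virtual index" (color) of an address is its
 * page number modulo the number of pages that fit in one cache way.
 */
static unsigned cache_color(uintptr_t vaddr, uintptr_t page_size,
			    uintptr_t way_size)
{
	assert(page_size != 0 && way_size >= page_size &&
	       way_size % page_size == 0);
	return (unsigned)((vaddr / page_size) % (way_size / page_size));
}

/*
 * Two mappings of the same physical page can hold conflicting cache
 * lines only when their colors differ.
 */
static int aliases_conflict(uintptr_t vaddr_a, uintptr_t vaddr_b,
			    uintptr_t page_size, uintptr_t way_size)
{
	return cache_color(vaddr_a, page_size, way_size) !=
	       cache_color(vaddr_b, page_size, way_size);
}
```

With the assumed geometry there are two colors, so `aliases_conflict(0x10000000, 0x20001000, 4096, 8192)` reports a conflict, while two mappings at even page numbers do not conflict.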

     There is a comment in memory.c that a non-present page shouldn't
be cached, but it is not yet clear to me that this is guaranteed for
the icache.  Also, the flush_page_to_ram() slows down processing on
machines with physical cache tags, for cases where the virtual
index used by the kernel and the virtual index used by the application
are the same.  It should have an extra argument of the intended user virtual
address, so that it can decide whether to flush or not on architectures
such as MIPS.

    Handling the virtual index conflicts requires dynamic ownership
switching (including cache flushing), which means that we have to record
those hardware-valid PTEs currently referencing the page, so that we can
invalidate the PTEs and flush the cache when a fault happens for a mapping
of a different color.  We could take a brute-force approach, and record
just one mapping, forcing a fault on each use of a different mapping,
which would allow us to keep the reverse map in an array parallel to mem_map,
or we could use some more complex structure to record mappings.  Also,
to reduce the frequency of conflicts, address assignment in do_mmap()
should take cache color into account on machines with virtually indexed
caches which lack hardware cache coherency (such as the R4000PC, R4600,
and R5000).

    


* Re: Memory corruption
  1999-06-30 22:01     ` William J. Earl
@ 1999-07-01  0:23       ` Ralf Baechle
  1999-07-01  0:53         ` William J. Earl
  0 siblings, 1 reply; 14+ messages in thread
From: Ralf Baechle @ 1999-07-01  0:23 UTC (permalink / raw)
  To: William J. Earl; +Cc: Ulf Carlsson, linux

On Wed, Jun 30, 1999 at 03:01:27PM -0700, William J. Earl wrote:

>      I have been looking at the fault handling and the cache flushing routines
> for the R4600.  In do_no_page() in mm/memory.c, we have:
> 
> 	flush_page_to_ram(page);
> 
> I don't see where any code invalidates the icache, which might have
> cached lines from a previous incarnation of the page.
> flush_page_to_ram(), for the R4600, essentially does a writeback of
> the dcache, if I understand the code correctly.  I believe that an
> icache invalidate is also needed, at least for executable pages
> (including any page for which mprotect() with PROT_EXEC has been
> called, not just for text pages from an executable file).  Also,
> unless something has changed, my understanding is that conflicting
> virtual aliases (in the dcache) are still possible, which will also
> lead to data corruption when it happens.

The particular flush_page_to_ram() call is only necessary because the
call to vma->vm_ops->nopage() may have brought the page into the
primary cache under its KSEG0 address.

>      In particular, if process A mmaps a file page at virtual index
> 0 and process B happens to mmap the same file page at virtual index
> 1, they will in general corrupt each other's view of the data.

Oh, the common case is either shared r/o mappings or SysV SHM which per
ABI is 64kb aligned, so the hairy case doesn't hit us.  Usually ...

Especially I don't see why anything should corrupt executable pages
which are r/o mapped.

>      There is a comment in memory.c that a non-present page shouldn't
> be cached, but it is not yet clear to me that this is guaranteed for
> the icache.

Flushing the caches for pages which are being unmapped is done by
flush_cache_page and takes care of the VM_EXEC flag.

On exec, fork or exit we flush the entire cache so that problems shouldn't
hit us either.

Actually we're pretty generous with our cache flushes; we flush more than we
should.

> Also, the flush_page_to_ram() slows down processing on
> machines with physical cache tags, for cases where the virtual
> index used by the kernel and the virtual index used by the application
> are the same.  It should have an extra argument of the intended user virtual
> address, so that it can decide whether to flush or not on architectures
> such as MIPS.

For R3000 and R6000 flush_page_to_ram() is a no-op, see arch/mips/mm/r2300.c
and arch/mips/mm/r6000.c.

For virtual indexed CPUs something like change_page_colour(oldvaddr, newvaddr)
would usually do a more efficient job than always flushing the page to
memory especially when combined with an allocator which takes the vaddr where
the page will be mapped as a hint.

  Ralf


* Re: Memory corruption
  1999-07-01  0:23       ` Ralf Baechle
@ 1999-07-01  0:53         ` William J. Earl
  1999-07-01 11:25           ` Harald Koerfgen
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: William J. Earl @ 1999-07-01  0:53 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: William J. Earl, Ulf Carlsson, linux

Ralf Baechle writes:
...
 > >      In particular, if process A mmaps a file page at virtual index
 > > 0 and process B happens to mmap the same file page at virtual index
 > > 1, they will in general corrupt each other's view of the data.
 > 
 > Oh, the common case is either shared r/o mappings or SysV SHM which per
 > ABI is 64kb aligned, so the hairy case doesn't hit us.  Usually ...
 > 
 > Especially I don't see why anything should corrupt executable pages
 > which are r/o mapped.

     Suppose physical page X has been used as logical page 100 of executable
file ABC, and is then freed, but is still partially in the icache at 
virtual index 0.  Then suppose the page X is reused as logical page 200 of
executable DEF, at virtual index 0.  The writeback of the data cache is
good, but there are still cache lines from file ABC in the icache.  If
nothing flushes the icache (and there is no reason to flush the icache
when reusing a page for data), the icache will have stale data with respect
to the new identity of page X as logical page 200 of executable DEF.
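The page-reuse scenario above can be modelled with a toy instruction cache. This is a deliberately simplified sketch, not the R4600's behaviour: the cache here is direct-mapped and indexed by physical address, and every name in it (`toy_icache`, `toy_fetch()`, `toy_invalidate()`, `toy_stale_after_reuse()`) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define LINE_SIZE 32u
#define NUM_LINES 4u

/*
 * Toy direct-mapped icache: each line remembers the physical line
 * number it was filled from and keeps a private copy of the bytes.
 */
struct toy_icache {
	int valid[NUM_LINES];
	unsigned long tag[NUM_LINES];
	unsigned char data[NUM_LINES][LINE_SIZE];
};

/* Fetch one byte through the cache; a hit never looks at memory again. */
static unsigned char toy_fetch(struct toy_icache *ic,
			       const unsigned char *mem, unsigned long paddr)
{
	unsigned long line = paddr / LINE_SIZE;
	unsigned idx = (unsigned)(line % NUM_LINES);

	assert(ic != NULL && mem != NULL);
	if (!ic->valid[idx] || ic->tag[idx] != line) {
		memcpy(ic->data[idx], mem + line * LINE_SIZE, LINE_SIZE);
		ic->tag[idx] = line;
		ic->valid[idx] = 1;
	}
	return ic->data[idx][paddr % LINE_SIZE];
}

/* Drop every cached line. */
static void toy_invalidate(struct toy_icache *ic)
{
	memset(ic->valid, 0, sizeof ic->valid);
}

/*
 * Returns 1 if a byte from the page's old contents is still fetched
 * after the physical page has been reused for a different file.
 */
static int toy_stale_after_reuse(int invalidate_first)
{
	struct toy_icache ic;
	unsigned char mem[LINE_SIZE * NUM_LINES];

	memset(&ic, 0, sizeof ic);
	memset(mem, 0xAB, sizeof mem);          /* page holds file ABC */
	(void)toy_fetch(&ic, mem, 0);           /* ABC code gets cached */
	memset(mem, 0xDE, sizeof mem);          /* page reused for DEF */
	if (invalidate_first)
		toy_invalidate(&ic);
	return toy_fetch(&ic, mem, 0) == 0xAB;  /* stale if ABC comes back */
}
```

In this model the stale byte is returned only when the invalidate is skipped, which is the failure mode being described.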

     Also, if there are incompatible aliases for a page, and there are
dirty lines left in the cache when the mapping for, say, virtual index
1 is released, and then the mapping for virtual index 0 is also released,
and the page, which has KSEG0 virtual index 0 is used for I/O, the normal
flushing will flush only virtual index 0.  A later victim writeback of
the dirty lines for virtual index 1 will overwrite the new data with
stale data, even if the new data is instructions.  This case can apply
even if the one alias is a kernel KSEG0 alias and the other is a 
user alias.  For regular file I/O, this is not a problem, but it is a problem
with mmap(), particularly since Linux mmap() makes no attempt to keep multiple
mappings of the same page of a file color-congruent.  (mmap() addresses
are essentially arbitrary.)  

     The icache issue applies to all processors.  The dcache issue applies only
to the R4000PC, R4600, and R5000.

 > >      There is a comment in memory.c that a non-present page shouldn't
 > > be cached, but it is not yet clear to me that this is guaranteed for
 > > the icache.
 > 
 > Flushing the caches for pages which are being unmapped is done by
 > flush_cache_page and takes care of the VM_EXEC flag.
 > 
 > On exec, fork or exit we flush the entire cache so that problems shouldn't
 > hit us either.

      It is not clear this works as expected if the page is stolen by
vmscan.

 > Actually we're pretty generous with our cache flushes; we flush more than we
 > should.

     Yes, but it is not clear that all paths are covered.

 > > Also, the flush_page_to_ram() slows down processing on
 > > machines with physical cache tags, for cases where the virtual
 > > index used by the kernel and the virtual index used by the application
 > > are the same.  It should have an extra argument of the intended user virtual
 > > address, so that it can decide whether to flush or not on architectures
 > > such as MIPS.
 > 
 > For R3000 and R6000 flush_page_to_ram() is a no-op, see arch/mips/mm/r2300.c
 > and arch/mips/mm/r6000.c.

    Yes, since those have write-through caches.  The icache
invalidation is still an issue, if there are any paths, such as
try_to_swap_out(), which break a virtual-to-physical mapping without
flushing the icache.

 > For virtual indexed CPUs something like change_page_colour(oldvaddr, newvaddr)
 > would usually do a more efficient job than always flushing the page to
 > memory especially when combined with an allocator which takes the vaddr where
 > the page will be mapped as a hint.

      Right.  Also, for IRIX and RISCos, I had mmap prefer an mmap
address for which color(address) == color(file_offset), so that
applications not using MAP_FIXED would always map a given file page at
the same virtual color, and I had the kernel use page_mapin() to make
a page addressable, so that I could have page_mapin() create a KSEG2
mapping of the appropriate color if it were different from the KSEG0
color of the page (for cases where the allocator could not allocate a
page with KSEG0 color to match the desired virtual color).
page_mapin() would of course return the KSEG0 address if the KSEG0
color matched the virtual color.  The color changing code is still
needed to deal with MAP_FIXED and so on, but it is much less
performance-critical.
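The address preference color(address) == color(file_offset) can be sketched as follows. This is not the IRIX or RISCos code, which is not shown in this thread; `color_matched_addr()` is a hypothetical helper and the page and cache-way sizes are caller-supplied assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest page-aligned address at or
 * above hint whose cache color equals the color of file_offset.
 */
static uintptr_t color_matched_addr(uintptr_t hint, uint64_t file_offset,
				    uintptr_t page_size, uintptr_t way_size)
{
	uintptr_t colors, want, have, addr;

	assert(page_size != 0 && way_size >= page_size &&
	       way_size % page_size == 0);
	colors = way_size / page_size;

	/* Round the hint up to a page boundary. */
	addr = (hint + page_size - 1) / page_size * page_size;

	want = (uintptr_t)((file_offset / page_size) % colors);
	have = (addr / page_size) % colors;

	/* Advance by the number of pages needed to reach the wanted color. */
	addr += ((want + colors - have) % colors) * page_size;
	return addr;
}
```

An allocator using such a helper never moves the mapping by more than one cache way, so the cost in address space is small compared with the flushing it avoids.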


* Re: Memory corruption
  1999-07-01  0:53         ` William J. Earl
@ 1999-07-01 11:25           ` Harald Koerfgen
  1999-07-02 22:41           ` Ralf Baechle
  1999-07-06 13:05           ` Ralf Baechle
  2 siblings, 0 replies; 14+ messages in thread
From: Harald Koerfgen @ 1999-07-01 11:25 UTC (permalink / raw)
  To: William J. Earl; +Cc: linux, Ulf Carlsson, Ralf Baechle, linux-mips


On 01-Jul-99 William J. Earl wrote:
> Ralf Baechle writes:
[...]
>  > Actually we're pretty generous with our cache flushes; we flush more than we
>  > should.
> 
>      Yes, but it is not clear that all paths are covered.
> 
>  > > Also, the flush_page_to_ram() slows down processing on
>  > > machines with physical cache tags, for cases where the virtual
>  > > index used by the kernel and the virtual index used by the application
>  > > are the same.  It should have an extra argument of the intended user virtual
>  > > address, so that it can decide whether to flush or not on architectures
>  > > such as MIPS.
>  > 
>  > For R3000 and R6000 flush_page_to_ram() is a no-op, see arch/mips/mm/r2300.c
>  > and arch/mips/mm/r6000.c.
> 
>     Yes, since those have write-through caches.  The icache
> invalidation is still an issue, if there are any paths, such as
> try_to_swap_out(), which break a virtual-to-physical mapping without
> flushing the icache.

A good point. That seems to be exactly the problem R3k DECstations have. Processes
are dying with SIGABRT, SIGBUS or SIGSEGV shortly after swapping occurs. Trying to
hunt that down I removed all optimisations from the cacheflushing routines and 
replaced them with flush_cache_all() but that didn't help.

---
Regards,
Harald


* Re: Memory corruption
  1999-07-01  0:53         ` William J. Earl
  1999-07-01 11:25           ` Harald Koerfgen
@ 1999-07-02 22:41           ` Ralf Baechle
  1999-07-06 13:05           ` Ralf Baechle
  2 siblings, 0 replies; 14+ messages in thread
From: Ralf Baechle @ 1999-07-02 22:41 UTC (permalink / raw)
  To: William J. Earl; +Cc: Ulf Carlsson, linux

On Wed, Jun 30, 1999 at 05:53:58PM -0700, William J. Earl wrote:

>      Suppose physical page X has been used as logical page 100 of executable
> file ABC, and is then freed, but is still partially in the icache at 
> virtual index 0.  Then suppose the page X is reused as logical page 200 of
> executable DEF, at virtual index 0.  The writeback of the data cache is
> good, but there are still cache lines from file ABC in the icache.  If
> nothing flushes the icache (and there is no reason to flush the icache
> when reusing a page for data), the icache will have stale data with respect
> to the new identity of page X as logical page 200 of executable DEF.

Ok, yes, that can happen in theory if code has been executed in a page
which was not marked PROT_EXEC but was executed anyway.  Fixing that makes things
quite a bit slower, we'll have to flush the icache on every flush_cache_page.
flush_cache_range() already does this.

Hmm...  Maybe a my-software-behaves-properly-and-I-know-this-is-dangerous-
sysctl() which reestablishes the current i-cache flushing behaviour if
VM_EXEC is unset?

I herewith order an execution protection bit for the next generation MIPS
and while we're at it an integer add with carry for faster IP checksums.

> The icache issue applies to all processors.  The dcache issue applies only
> to the R4000PC, R4600, and R5000.

And R41xx, R42xx, R43xx, R4700, Nevada, Kronus, Sony Playstation II CPU ...

>  > Flushing the caches for pages which are being unmapped is done by
>  > flush_cache_page and takes care of the VM_EXEC flag.
>  > 
>  > On exec, fork or exit we flush the entire cache so that problems shouldn't
>  > hit us either.
>
> It is not clear this works as expected if the page is stolen by vmscan.

The thing is that as I already mentioned above a page might be in the
icache even though it isn't marked as VM_EXEC.

>  > Actually we're pretty generous with our cache flushes; we flush more
>  > than we should.
>
> Yes, but it is not clear that all paths are covered.
>
>  > > Also, the flush_page_to_ram() slows down processing on
>  > > machines with physical cache tags, for cases where the virtual
>  > > index used by the kernel and the virtual index used by the application
>  > > are the same.  It should have an extra argument of the intended user
>  > > virtual address, so that it can decide whether to flush or not on
>  > > architectures such as MIPS.
>  > 
>  > For R3000 and R6000 flush_page_to_ram() is a no-op, see
>  >  arch/mips/mm/r2300.c and arch/mips/mm/r6000.c.
>
> Yes, since those have write-through caches.

The cache write policy doesn't matter in that case.

> The icache invalidation is still an issue, if there are any paths, such
> as try_to_swap_out(), which break a virtual-to-physical mapping without
> flushing the icache.

>  > For virtual indexed CPUs something like change_page_colour(oldvaddr,
>  > newvaddr) would usually do a more efficient job than always flushing the
>  > page to memory especially when combined with an allocator which takes the
>  > vaddr where the page will be mapped as a hint.
>
>       Right.  Also, for IRIX and RISCos, I had mmap prefer an mmap
> address for which color(address) == color(file_offset), so that
> applications not using MAP_FIXED would always map a given file page at
> the same virtual color, and I had the kernel use page_mapin() to make
> a page addressable, so that I could have page_mapin() create a KSEG2
> mapping of the appropriate color if it were different from the KSEG0
> color of the page (for cases where the allocator could not allocate a
> page with KSEG0 color to match the desired virtual color).
> page_mapin() would of course return the KSEG0 address if the KSEG0
> color matched the virtual color.  The color changing code is still
> needed to deal with MAP_FIXED and so on, but it is much less
> performance-critical.

That will also deal efficiently with the way ld.so loads ELF binaries.

  Ralf


* Re: Memory corruption
  1999-07-01  0:53         ` William J. Earl
  1999-07-01 11:25           ` Harald Koerfgen
  1999-07-02 22:41           ` Ralf Baechle
@ 1999-07-06 13:05           ` Ralf Baechle
  1999-07-07 21:08             ` Harald Koerfgen
  2 siblings, 1 reply; 14+ messages in thread
From: Ralf Baechle @ 1999-07-06 13:05 UTC (permalink / raw)
  To: William J. Earl; +Cc: Ulf Carlsson, linux, linux-mips, linux-mips

I've received a report from some person who is working on his own R3081
port.  He also observes data corruption and suspects reading of swapped
pages is causing that.

Sigh,

  Ralf


* Re: Memory corruption
  1999-07-06 13:05           ` Ralf Baechle
@ 1999-07-07 21:08             ` Harald Koerfgen
  1999-07-08  1:51               ` Warner Losh
  1999-07-08 10:39               ` Ralf Baechle
  0 siblings, 2 replies; 14+ messages in thread
From: Harald Koerfgen @ 1999-07-07 21:08 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, linux-mips, linux, Ulf Carlsson, William J. Earl


On 06-Jul-99 Ralf Baechle wrote:
> I've received a report from some person who is working on his own R3081
> port.  He also observes data corruption and suspects reading of swapped
> pages is causing that.

That's definitely true for R3k DECstations, and no, flushing the icache in
flush_tlb_page() does not help. I have added cacheflushing to all tlb routines,
copy_page and even rw_swap_page_base() and swap_after_unlock_page() without
success.

Any ideas?
---
Regards,
Harald

P.S.: I'll be on vacation until July 18th so this has to wait a little bit :-)


* Re: Memory corruption
  1999-07-07 21:08             ` Harald Koerfgen
@ 1999-07-08  1:51               ` Warner Losh
  1999-07-08  3:12                 ` William J. Earl
  1999-07-08 10:39               ` Ralf Baechle
  1 sibling, 1 reply; 14+ messages in thread
From: Warner Losh @ 1999-07-08  1:51 UTC (permalink / raw)
  To: Harald Koerfgen
  Cc: Ralf Baechle, linux-mips, linux-mips, linux, Ulf Carlsson,
	William J. Earl

In message <XFMail.990707230857.Harald.Koerfgen@home.ivm.de> Harald Koerfgen writes:
: That's definitely true for R3k DECstations, and no, flushing the icache in
: flush_tlb_page() does not help. I have added cacheflushing to all tlb routines,
: copy_page and even rw_swap_page_base() and swap_after_unlock_page() without
: success.

Don't you want to flush the dcache as well?  I think that you can run
into problems when you have a dirty dcache and then dma into the pages
that are dirty.  Instant karma corruption, no?  Or am I thinking of
some other problem?

Warner


* Re: Memory corruption
  1999-07-08  1:51               ` Warner Losh
@ 1999-07-08  3:12                 ` William J. Earl
       [not found]                   ` <37846EE7.EADD9E32@niisi.msk.ru>
  0 siblings, 1 reply; 14+ messages in thread
From: William J. Earl @ 1999-07-08  3:12 UTC (permalink / raw)
  To: Warner Losh
  Cc: Harald Koerfgen, Ralf Baechle, linux-mips, linux-mips, linux,
	Ulf Carlsson, William J. Earl

Warner Losh writes:
 > In message <XFMail.990707230857.Harald.Koerfgen@home.ivm.de> Harald Koerfgen writes:
 > : That's definitely true for R3k DECstations, and no, flushing the icache in
 > : flush_tlb_page() does not help. I have added cacheflushing to all tlb routines,
 > : copy_page and even rw_swap_page_base() and swap_after_unlock_page() without
 > : success.
 > 
 > Don't you want to flush the dcache as well?  I think that you can run
 > into problems when you have a dirty dcache and then dma into the pages
 > that are dirty.  Instant karma corruption, no?  Or am I thinking of
 > some other problem?

      The R3000 has a write-through cache, so there cannot be dirty cache
lines, although you do have to flush the write buffers to be completely
correct (in the case of a DMA device writing to memory VERY quickly after
the register write which starts it up, on some hardware). 


* Re: Memory corruption
  1999-07-07 21:08             ` Harald Koerfgen
  1999-07-08  1:51               ` Warner Losh
@ 1999-07-08 10:39               ` Ralf Baechle
  1 sibling, 0 replies; 14+ messages in thread
From: Ralf Baechle @ 1999-07-08 10:39 UTC (permalink / raw)
  To: Harald Koerfgen
  Cc: linux-mips, linux-mips, linux, Ulf Carlsson, William J. Earl

On Wed, Jul 07, 1999 at 11:08:57PM +0200, Harald Koerfgen wrote:

> On 06-Jul-99 Ralf Baechle wrote:
> > I've received a report from some person who is working on his own R3081
> > port.  He also observes data corruption and suspects reading of swapped
> > pages is causing that.
> 
> That's definitely true for R3k DECstations, and no, flushing the icache in
> flush_tlb_page() does not help. I have added cacheflushing to all tlb routines,
> copy_page and even rw_swap_page_base() and swap_after_unlock_page() without
> success.

Note that on R3000 with its physically indexed caches there is no way that
cache problems should be able to crash the whole system.  At least under the
provision that DMA drivers get their cacheflushing right.

I recently tried to put our memcpy / memmove from the kernel into libc
and as result ended up with a libc which was almost unusable.  Also, a
part of memmove is disabled by #if 0; it was demonstrated to cause data
corruption.  Time to fix that bastard.  The whole file is a big mess, btw.
because the code tries to share as much code as possible between memcpy,
memmove and __copy_{to,from}_user.  So put on your peril sensitive
glasses ;-)

> P.S.: I'll be on vacation until July 18th so this has to wait a little bit :-)

s/.*/P.S.: I have plenty of time for hacking during my vacation :-)/p ;-)

  Ralf


* Re: Memory corruption
       [not found]                   ` <37846EE7.EADD9E32@niisi.msk.ru>
@ 1999-07-08 17:56                     ` William J. Earl
  0 siblings, 0 replies; 14+ messages in thread
From: William J. Earl @ 1999-07-08 17:56 UTC (permalink / raw)
  To: Gleb O. Raiko
  Cc: William J. Earl, Warner Losh, Harald Koerfgen, Ralf Baechle,
	linux-mips, linux-mips, linux, Ulf Carlsson

Gleb O. Raiko writes:
 > "William J. Earl" wrote:
...
 > >       The R3000 has a write-through cache, so there cannot be dirty cache
 > > lines, although you do have to flush the write buffers to be completely
 > > correct (in the case of a DMA device writing to memory VERY quickly after
 > > the register write which starts it up, on some hardware).
 > 
 > You must flush d-cache after dma. While some cache controllers are able
 > to watch the bus and flush the data that are invalidated due to DMA
 > transfers, I think, most r3k boxes don't have such beasts. Flushing
 > d-cache wasn't implemented at the same time as the cache stuff because
 > we hadn't boxes with DMA devices.

     Most R3000 (and many R4000/R4600/R5000) boxes do not have
cache-coherent I/O, and Linux/MIPS does do cache flushing.  If
everything is well-organized, one can flush the d-cache only before an
I/O.  On an R3000, it does not much matter which approach you take,
since the caches are write-through (aside from the need to flush the
write-buffer before initiating a DMA).  For later processors, you must
flush the d-cache BEFORE a DMA, since victim writebacks of dirty lines
after a DMA into memory has updated memory will lead to I/O data
corruption, and failure to flush dirty lines before a DMA from memory
will lead to stale data being written to disk.  If it is possible for
the CPU to access the buffer during the DMA, then you must invalidate
the cache for the buffer after a DMA into memory as well, but a
well-constructed system should never do that.  
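The ordering rule for write-back caches (flush before a DMA into memory) can be demonstrated with a toy model. This is a sketch under strong simplifications: the "system" is a single cache line of memory with a one-line write-back cache, and all names (`toy_sys`, `toy_store()`, `toy_writeback_invalidate()`, `toy_dma_into_memory()`, `toy_dma_data_survives()`) are hypothetical and unrelated to any Linux/MIPS interface.

```c
#include <assert.h>
#include <string.h>

#define LINE 32u

/* Toy system: one line of memory and a one-line write-back dcache. */
struct toy_sys {
	unsigned char mem[LINE];
	unsigned char cache[LINE];
	int valid, dirty;
};

/* Bring the line into the cache if it is not already there. */
static void toy_fill(struct toy_sys *s)
{
	if (!s->valid) {
		memcpy(s->cache, s->mem, LINE);
		s->valid = 1;
		s->dirty = 0;
	}
}

/* CPU store: updates the cache only and marks the line dirty. */
static void toy_store(struct toy_sys *s, unsigned off, unsigned char v)
{
	assert(off < LINE);
	toy_fill(s);
	s->cache[off] = v;
	s->dirty = 1;
}

static unsigned char toy_load(struct toy_sys *s, unsigned off)
{
	assert(off < LINE);
	toy_fill(s);
	return s->cache[off];
}

/*
 * Write the line back if dirty, then drop it.  Used here both for an
 * explicit flush and for a later victim writeback.
 */
static void toy_writeback_invalidate(struct toy_sys *s)
{
	if (s->valid && s->dirty)
		memcpy(s->mem, s->cache, LINE);
	s->valid = 0;
	s->dirty = 0;
}

/* The device writes straight to memory, bypassing the cache. */
static void toy_dma_into_memory(struct toy_sys *s, unsigned char v)
{
	memset(s->mem, v, LINE);
}

/* Returns 1 if the CPU sees the DMA data afterwards. */
static int toy_dma_data_survives(int flush_before_dma)
{
	struct toy_sys s;

	memset(&s, 0, sizeof s);
	toy_store(&s, 0, 0x11);              /* leaves a dirty line */
	if (flush_before_dma)
		toy_writeback_invalidate(&s);
	toy_dma_into_memory(&s, 0x99);
	toy_writeback_invalidate(&s);        /* later victim writeback */
	return toy_load(&s, 0) == 0x99;
}
```

Without the flush, the victim writeback copies the old dirty line over the freshly transferred data, which is the I/O corruption described above.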

    If you have a buffer which is not cache-line-aligned (which is
possible with the general case of raw or direct I/O, although not in
unmodified Linux at the moment), then, for DMA into memory, you must
use temporary buffers for any portion of the buffer which occupies
just part of a cache line, and copy the data from the temporary buffer
to the real buffer after the DMA completes, to account for the
possibility of a separate thread modifying data outside the buffer in
the shared cache line, leading to a victim writeback (or a
writethrough on the R3000).  This could apply even to the R3000, depending
on how the compiler generates code for a partial-word update, although
it is unlikely.
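One way to decide which parts of an unaligned buffer need temporary buffers is to split it at cache-line boundaries. The following is a minimal sketch, assuming a power-of-two line size; `struct dma_split` and `split_for_dma()` are hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dma_split {
	size_t head;   /* bytes sharing the first cache line (bounce) */
	size_t middle; /* bytes in whole cache lines (DMA directly) */
	size_t tail;   /* bytes sharing the last cache line (bounce) */
};

/*
 * Split [start, start + len) into a partial leading line, a run of
 * whole cache lines, and a partial trailing line.
 */
static struct dma_split split_for_dma(uintptr_t start, size_t len,
				      size_t line_size)
{
	struct dma_split s = {0, 0, 0};
	size_t misalign;

	/* line_size must be a non-zero power of two. */
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	misalign = (size_t)(start & (line_size - 1));
	if (misalign != 0) {
		s.head = line_size - misalign;
		if (s.head > len)
			s.head = len;   /* buffer ends inside the first line */
	}
	s.middle = (len - s.head) & ~(line_size - 1);
	s.tail = len - s.head - s.middle;
	return s;
}
```

For a 100-byte buffer starting 4 bytes into a 32-byte line, this yields a 28-byte head, 64 bytes of whole lines, and an 8-byte tail; only the head and tail would go through temporary buffers.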
