* dcache_blast() bug?
@ 2001-06-04 17:34 Ian Thompson
  2001-06-04 19:18   ` Kevin D. Kissell
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Thompson @ 2001-06-04 17:34 UTC (permalink / raw)
  To: linux-mips


Hi all,

I'm seeing some odd memory behavior around the time when blast_dcache()
is called, leading me to think that the method may be a little buggy. 
It appears that memory is being corrupted (consistently so) over the
course of flushing the dcache.  This happens to my command line argument
string - arcs_cmdline.  Before the blast_dcache() call, it is
"console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
call, the corrupted data is "ttyS0 ra0".  I take it this isn't supposed
to happen?  any ideas of why the writeback_invalidate_d cache operation
may be losing data?

thanks,
-ian


-- 
----------------------------------------
Ian Thompson           tel: 408.952.2023
Firmware Engineer      fax: 408.570.0910
Palmchip Corporation   www.palmchip.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dcache_blast() bug?
@ 2001-06-04 19:18   ` Kevin D. Kissell
  0 siblings, 0 replies; 9+ messages in thread
From: Kevin D. Kissell @ 2001-06-04 19:18 UTC (permalink / raw)
  To: Ian Thompson, linux-mips

What processor are you running?

            Kevin K.

----- Original Message ----- 
From: "Ian Thompson" <iant@palmchip.com>
To: <linux-mips@oss.sgi.com>
Sent: Monday, June 04, 2001 7:34 PM
Subject: dcache_blast() bug?


> 
> Hi all,
> 
> I'm seeing some odd memory behavior around the time when blast_dcache()
> is called, leading me to think that the method may be a little buggy. 
> It appears that memory is being corrupted (consistently so) over the
> course of flushing the dcache.  This happens to my command line argument
> string - arcs_cmdline.  Before the blast_dcache() call, it is
> "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> call, the corrupted data is "ttyS0 ra0".  I take it this isn't supposed
> to happen?  any ideas of why the writeback_invalidate_d cache operation
> may be losing data?
> 
> thanks,
> -ian
> 
> 
> -- 
> ----------------------------------------
> Ian Thompson           tel: 408.952.2023
> Firmware Engineer      fax: 408.570.0910
> Palmchip Corporation   www.palmchip.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dcache_blast() bug?
  2001-06-04 19:18   ` Kevin D. Kissell
  (?)
@ 2001-06-04 20:27   ` Ian Thompson
  2001-06-04 21:21       ` Kevin D. Kissell
  -1 siblings, 1 reply; 9+ messages in thread
From: Ian Thompson @ 2001-06-04 20:27 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

oops sorry i meant to mention that.  running a mips 4kc.



"Kevin D. Kissell" wrote:
> 
> What processor are you running?
> 
>             Kevin K.
> 
> ----- Original Message -----
> From: "Ian Thompson" <iant@palmchip.com>
> To: <linux-mips@oss.sgi.com>
> Sent: Monday, June 04, 2001 7:34 PM
> Subject: dcache_blast() bug?
> 
> >
> > Hi all,
> >
> > I'm seeing some odd memory behavior around the time when blast_dcache()
> > is called, leading me to think that the method may be a little buggy.
> > It appears that memory is being corrupted (consistently so) over the
> > course of flushing the dcache.  This happens to my command line argument
> > string - arcs_cmdline.  Before the blast_dcache() call, it is
> > "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> > call, the corrupted data is "ttyS0 ra0".  I take it this isn't supposed
> > to happen?  any ideas of why the writeback_invalidate_d cache operation
> > may be losing data?
> >
> > thanks,
> > -ian
> >
> >
> > --
> > ----------------------------------------
> > Ian Thompson           tel: 408.952.2023
> > Firmware Engineer      fax: 408.570.0910
> > Palmchip Corporation   www.palmchip.com

-- 
----------------------------------------
Ian Thompson           tel: 408.952.2023
Firmware Engineer      fax: 408.570.0910
Palmchip Corporation   www.palmchip.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dcache_blast() bug?
@ 2001-06-04 21:21       ` Kevin D. Kissell
  0 siblings, 0 replies; 9+ messages in thread
From: Kevin D. Kissell @ 2001-06-04 21:21 UTC (permalink / raw)
  To: Ian Thompson; +Cc: linux-mips

Interesting, in that the 4Kc has write-through caches,
which make it a good deal more difficult to get into
the kind of trouble you describe.  Are you running one
of the 4Kc "lead vehicle" chips, or some other part?
Which version of the kernel are you running, and are
the CPU type and cache organization being reported
correctly during boot-up?

The output you report from dumping the command line
is interesting.  The corruption seems to be in
8-byte chunks - the first 8 have disappeared, as has
the third 8.  Lord knows where the "0" in "ra0" comes
from.  Can you confirm that (a) the command line string
is stored at an 8-byte aligned boundary, and (b) whether
the data is actually being moved, or if the missing
characters are simply being replaced with nulls or other
non-printable characters?  I know it's not pretty, but
can you dump the same memory addresses as seen
through non-cacheable kseg1 (0xa0000000-0xbfffffff),
and are the cache and memory consistent?
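The two checks requested above (8-byte alignment, and a look at the same
location through uncached kseg1) come down to simple address arithmetic.
The following is only a sketch: the helper names are hypothetical, the
segment constants are the standard MIPS32 ones, and a real kernel would
normally use the macros from its own address-space header instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS32 unmapped segments. */
#define KSEG0_BASE     0x80000000u  /* cached, unmapped   */
#define KSEG1_BASE     0xa0000000u  /* uncached, unmapped */
#define KSEG_PHYS_MASK 0x1fffffffu  /* low 512 MB of physical address space */

/* Hypothetical helper: map a kseg0 (or kseg1) address to its kseg1 alias. */
static inline uint32_t to_kseg1(uint32_t addr)
{
    return (addr & KSEG_PHYS_MASK) | KSEG1_BASE;
}

/* Hypothetical helper: true if addr sits on an 8-byte (doubleword) boundary. */
static inline bool is_aligned8(uint32_t addr)
{
    return (addr & 7u) == 0;
}
```

For instance, the kseg0 address 0x9fcf0000 from the command line above
has the uncached alias 0xbfcf0000.  Dumping the string through both
addresses after the flush shows whether memory or the cache holds the
bad data.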

If the failure is happening on 8-byte, doubleword aligned
chunks, I suspect a hardware problem more than a
kernel bug.  If it were my system, I'd re-seat the RAM
and CPU modules to make sure I'm not simply getting
screwed by a bad connection when the memory interface
suddenly gets hit with a lot of traffic following the flush.
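To make the 8-byte pattern concrete, the original command line can be cut
into 8-byte chunks.  This is an illustrative sketch only; chunk_at() is a
hypothetical helper, and it assumes the string starts on an 8-byte
boundary, which has not been confirmed at this point in the thread.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 8  /* doubleword size in bytes */

/*
 * Copy the n-th 8-byte chunk of s into out as a NUL-terminated string.
 * Returns the number of bytes copied (0 if the chunk is past the end).
 */
static size_t chunk_at(const char *s, size_t n, char out[CHUNK + 1])
{
    size_t len = strlen(s);
    size_t off = n * CHUNK;
    size_t count = 0;

    if (off < len) {
        count = len - off;
        if (count > CHUNK)
            count = CHUNK;
        memcpy(out, s + off, count);
    }
    out[count] = '\0';
    return count;
}
```

Applied to the string in the original report, chunk 0 is "console=",
chunk 1 is "ttyS0 ra" and chunk 2 is "mdisk_st".  Only chunk 1 appears
in the corrupted output, which matches the observation that the first
and third doublewords went missing.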

            Kevin K.

----- Original Message -----
From: "Ian Thompson" <iant@palmchip.com>
To: "Kevin D. Kissell" <kevink@mips.com>
Cc: <linux-mips@oss.sgi.com>
Sent: Monday, June 04, 2001 10:27 PM
Subject: Re: dcache_blast() bug?


> oops sorry i meant to mention that.  running a mips 4kc.
>
>
>
> "Kevin D. Kissell" wrote:
> >
> > What processor are you running?
> >
> >             Kevin K.
> >
> > ----- Original Message -----
> > From: "Ian Thompson" <iant@palmchip.com>
> > To: <linux-mips@oss.sgi.com>
> > Sent: Monday, June 04, 2001 7:34 PM
> > Subject: dcache_blast() bug?
> >
> > >
> > > Hi all,
> > >
> > > > I'm seeing some odd memory behavior around the time when blast_dcache()
> > > > is called, leading me to think that the method may be a little buggy.
> > > > It appears that memory is being corrupted (consistently so) over the
> > > > course of flushing the dcache.  This happens to my command line argument
> > > > string - arcs_cmdline.  Before the blast_dcache() call, it is
> > > > "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> > > > call, the corrupted data is "ttyS0 ra0".  I take it this isn't supposed
> > > > to happen?  any ideas of why the writeback_invalidate_d cache operation
> > > > may be losing data?
> > >
> > > thanks,
> > > -ian
> > >
> > >
> > > --
> > > ----------------------------------------
> > > Ian Thompson           tel: 408.952.2023
> > > Firmware Engineer      fax: 408.570.0910
> > > Palmchip Corporation   www.palmchip.com
>
> --
> ----------------------------------------
> Ian Thompson           tel: 408.952.2023
> Firmware Engineer      fax: 408.570.0910
> Palmchip Corporation   www.palmchip.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dcache_blast() bug?
  2001-06-04 21:21       ` Kevin D. Kissell
  (?)
@ 2001-06-04 23:33       ` Ian Thompson
  2001-06-05  8:50           ` Kevin D. Kissell
  -1 siblings, 1 reply; 9+ messages in thread
From: Ian Thompson @ 2001-06-04 23:33 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

Thanks for your help Kevin.  It may be possible that this is a hardware
bug.  I am using one of the lead vehicle chips with 16k d$ & i$,
although there is some custom hardware which may be causing trouble as
well.  Oh, and this is the 2.4.1 kernel.

It appears that when I copy the arguments into the command line
variable, they are 8-byte aligned, and the destination is also 8-byte
aligned.  However, there is an inconsistency between the data in the
cache and in memory after the blast_dcache() call.  Could it also be
possible that the cache write buffer is not quite empty, and the data in
it is being lost on the blast call?  Should some implementation of
wbflush be called before the cache ops are done?

I just wanted to see if this could be a problem before I start trying to
track down bugs in hardware...

Thanks,
-ian

"Kevin D. Kissell" wrote:
> 
> Interesting, in that the 4Kc has write-through caches,
> which make it a good deal more difficult to get into
> the kind of trouble you describe.  Are you running one
> of the 4Kc "lead vehicle" chips, or some other part?
> Which version of the kernel are you running, and are
> the CPU type and cache organization being reported
> correctly during boot-up?
> 
> The output you report from dumping the command line
> is interesting.  The corruption seems to be in
> 8-byte chunks - the first 8 have disappeared, as has
> the third 8.  Lord knows where the "0" in "ra0" comes
> from.  Can you confirm that (a) the command line string
> is stored at an 8-byte aligned boundary, and (b) whether
> the data is actually being moved, or if the missing
> characters are simply being replaced with nulls or other
> non-printable characters?  I know it's not pretty, but
> can you dump the same memory addresses as seen
> through non-cacheable kseg1 (0xa0000000-0xbfffffff),
> and are the cache and memory consistent?
> 
> If the failure is happening on 8-byte, doubleword aligned
> chunks, I suspect a hardware problem more than a
> kernel bug.  If it were my system, I'd re-seat the RAM
> and CPU modules to make sure I'm not simply getting
> screwed by a bad connection when the memory interface
> suddenly gets hit with a lot of traffic following the flush.
> 
>             Kevin K.
> 
> ----- Original Message -----
> From: "Ian Thompson" <iant@palmchip.com>
> To: "Kevin D. Kissell" <kevink@mips.com>
> Cc: <linux-mips@oss.sgi.com>
> Sent: Monday, June 04, 2001 10:27 PM
> Subject: Re: dcache_blast() bug?
> 
> > oops sorry i meant to mention that.  running a mips 4kc.
> >
> >
> >
> > "Kevin D. Kissell" wrote:
> > >
> > > What processor are you running?
> > >
> > >             Kevin K.
> > >
> > > ----- Original Message -----
> > > From: "Ian Thompson" <iant@palmchip.com>
> > > To: <linux-mips@oss.sgi.com>
> > > Sent: Monday, June 04, 2001 7:34 PM
> > > Subject: dcache_blast() bug?
> > >
> > > >
> > > > Hi all,
> > > >
> > > > > I'm seeing some odd memory behavior around the time when blast_dcache()
> > > > > is called, leading me to think that the method may be a little buggy.
> > > > > It appears that memory is being corrupted (consistently so) over the
> > > > > course of flushing the dcache.  This happens to my command line argument
> > > > > string - arcs_cmdline.  Before the blast_dcache() call, it is
> > > > > "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> > > > > call, the corrupted data is "ttyS0 ra0".  I take it this isn't supposed
> > > > > to happen?  any ideas of why the writeback_invalidate_d cache operation
> > > > > may be losing data?
> > > >
> > > > thanks,
> > > > -ian
> > > >
> > > >
> > > > --
> > > > ----------------------------------------
> > > > Ian Thompson           tel: 408.952.2023
> > > > Firmware Engineer      fax: 408.570.0910
> > > > Palmchip Corporation   www.palmchip.com
> >
> > --
> > ----------------------------------------
> > Ian Thompson           tel: 408.952.2023
> > Firmware Engineer      fax: 408.570.0910
> > Palmchip Corporation   www.palmchip.com

-- 
----------------------------------------
Ian Thompson           tel: 408.952.2023
Firmware Engineer      fax: 408.570.0910
Palmchip Corporation   www.palmchip.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dcache_blast() bug?
@ 2001-06-05  8:50           ` Kevin D. Kissell
  0 siblings, 0 replies; 9+ messages in thread
From: Kevin D. Kissell @ 2001-06-05  8:50 UTC (permalink / raw)
  To: Ian Thompson; +Cc: linux-mips


> Thanks for your help Kevin.  It may be possible that this is a hardware
> bug.  I am using one of the lead vehicle chips with 16k d$ & i$,
> although there is some custom hardware which may be causing trouble as
> well.  Oh, and this is the 2.4.1 kernel.

I'm also running a 4Kc lead vehicle, on a MIPS Malta board,
with the 2.4.1 kernel.  I run the system moderately hard, and have
never seen any behavior like what you describe.

> It appears that when I copy the arguments into the command line
> variable, they are 8-byte aligned, and the destination is also 8-byte
> aligned.  However, there is an inconsistency between the data in the
> cache and in memory after the blast_dcache() call.

Am I correct in taking this to mean that the contents of
memory are correct, but that the cache is in error when
you read the data back in cacheable space?  That suggests
that writes are working fine, but that either the blast_dcache()
isn't correctly clearing the tags, or the refill from memory
is getting trashed on the way to the cache.  The former
could result from misbehavior in the 4Kc lead vehicle chip
itself (possibly provoked by some kind of marginal
clock or power supply input); the latter could result from
any one of several problems in the path between the RAM
array and the lead vehicle cache.  I favor the latter theory.
See below.

> Could it also be
> possible that the cache write buffer is not quite empty, and the data in
> it is being lost on the blast call?

I know of no software mechanism that will cause the contents
of the write buffer to be lost.  I think a bus error indication
from the system might cause it to be thrown away, but that's
about it.  The SYNC instruction forces its contents to be
written to memory, not discarded.

> Should some implementation of
> wbflush be called before the cache ops are done?

The write buffers are part of the BIU which is on the
"far side" of the cache.  Since the cache is write-through,
the cache operations should not result in any interaction
with the write buffer at all - the cache tags should get
invalidated, and that's all.

The reason that the 8-byte granularity of error suggests
a hardware problem at the memory interface is that, while
writes to memory will be 1, 2, or 4 bytes (byte, halfword,
and word stores), and the cache line size and write buffer
size are both 16 bytes, the 4Kc lead vehicle has a 64-bit
memory interface, and reads 8 bytes at a time when doing
cache fills.  A botched RAM cycle during a cache fill would
cause 8-byte blocks within the 16-byte cache lines to be
trashed - which seems to be exactly what you are seeing.
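One way to test this theory in software, before reaching for a logic
analyser, is to compare a cached snapshot of the region against an
uncached one in 8-byte units.  The sketch below is an assumption-laden
illustration, not kernel code: mismatched_beats() is a hypothetical
helper, and it works on two ordinary buffers that the caller has already
filled (for example from the cacheable and kseg1 views of the string).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BEAT_BYTES 8  /* one 64-bit transfer on the memory interface */

/*
 * Compare two buffers in 8-byte beats.  Bit i of the result is set when
 * beat i differs.  Only the first 32 whole beats are examined.
 */
static uint32_t mismatched_beats(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t beats = len / BEAT_BYTES;
    uint32_t mask = 0;

    if (beats > 32)
        beats = 32;
    for (size_t i = 0; i < beats; i++) {
        if (memcmp(pa + i * BEAT_BYTES, pb + i * BEAT_BYTES, BEAT_BYTES) != 0)
            mask |= (uint32_t)1 << i;
    }
    return mask;
}
```

With 16-byte lines, beats 2k and 2k+1 belong to the same cache line.  A
result in which whole beats are wrong while their neighbours in the same
line are intact would be consistent with a bad RAM cycle during the
fill rather than with a software error.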

I strongly suggest that you double check all mechanical connections
(CPU socket and memory slots), and if that doesn't help, check your
RAM timing, your supply voltage, and the symmetry and cleanliness
of your clocks.  It sounds like the problem is highly reproducible,
so a next step might be to stick a logic analyser on the CPU/Memory
interface and watch the fill operation on the address, following the flush.

            Regards,

            Kevin K.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2001-06-05  8:50 UTC | newest]

2001-06-04 17:34 dcache_blast() bug? Ian Thompson
2001-06-04 19:18 ` Kevin D. Kissell
2001-06-04 19:18   ` Kevin D. Kissell
2001-06-04 20:27   ` Ian Thompson
2001-06-04 21:21     ` Kevin D. Kissell
2001-06-04 21:21       ` Kevin D. Kissell
2001-06-04 23:33       ` Ian Thompson
2001-06-05  8:50         ` Kevin D. Kissell
2001-06-05  8:50           ` Kevin D. Kissell
