* dcache_blast() bug? @ 2001-06-04 17:34 Ian Thompson 2001-06-04 19:18 ` Kevin D. Kissell 0 siblings, 1 reply; 9+ messages in thread
From: Ian Thompson @ 2001-06-04 17:34 UTC (permalink / raw)
To: linux-mips

Hi all,

I'm seeing some odd memory behavior around the time when blast_dcache()
is called, leading me to think that the method may be a little buggy.
It appears that memory is being corrupted (consistently so) over the
course of flushing the dcache. This happens to my command line argument
string - arcs_cmdline. Before the blast_dcache() call, it is
"console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
call, the corrupted data is "ttyS0 ra0". I take it this isn't supposed
to happen? any ideas of why the writeback_invalidate_d cache operation
may be losing data?

thanks,
-ian

--
----------------------------------------
Ian Thompson           tel: 408.952.2023
Firmware Engineer      fax: 408.570.0910
Palmchip Corporation   www.palmchip.com

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dcache_blast() bug? @ 2001-06-04 19:18 ` Kevin D. Kissell 0 siblings, 0 replies; 9+ messages in thread
From: Kevin D. Kissell @ 2001-06-04 19:18 UTC (permalink / raw)
To: Ian Thompson, linux-mips

What processor are you running?

Kevin K.

----- Original Message -----
From: "Ian Thompson" <iant@palmchip.com>
To: <linux-mips@oss.sgi.com>
Sent: Monday, June 04, 2001 7:34 PM
Subject: dcache_blast() bug?

>
> Hi all,
>
> I'm seeing some odd memory behavior around the time when blast_dcache()
> is called, leading me to think that the method may be a little buggy.
> It appears that memory is being corrupted (consistently so) over the
> course of flushing the dcache. This happens to my command line argument
> string - arcs_cmdline. Before the blast_dcache() call, it is
> "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> call, the corrupted data is "ttyS0 ra0". I take it this isn't supposed
> to happen? any ideas of why the writeback_invalidate_d cache operation
> may be losing data?
>
> thanks,
> -ian
>
>
> --
> ----------------------------------------
> Ian Thompson           tel: 408.952.2023
> Firmware Engineer      fax: 408.570.0910
> Palmchip Corporation   www.palmchip.com

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dcache_blast() bug? 2001-06-04 19:18 ` Kevin D. Kissell (?) @ 2001-06-04 20:27 ` Ian Thompson 2001-06-04 21:21 ` Kevin D. Kissell -1 siblings, 1 reply; 9+ messages in thread
From: Ian Thompson @ 2001-06-04 20:27 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips

oops sorry i meant to mention that. running a mips 4kc.

"Kevin D. Kissell" wrote:
>
> What processor are you running?
>
> Kevin K.
>
> ----- Original Message -----
> From: "Ian Thompson" <iant@palmchip.com>
> To: <linux-mips@oss.sgi.com>
> Sent: Monday, June 04, 2001 7:34 PM
> Subject: dcache_blast() bug?
>
> >
> > Hi all,
> >
> > I'm seeing some odd memory behavior around the time when blast_dcache()
> > is called, leading me to think that the method may be a little buggy.
> > It appears that memory is being corrupted (consistently so) over the
> > course of flushing the dcache. This happens to my command line argument
> > string - arcs_cmdline. Before the blast_dcache() call, it is
> > "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> > call, the corrupted data is "ttyS0 ra0". I take it this isn't supposed
> > to happen? any ideas of why the writeback_invalidate_d cache operation
> > may be losing data?
> >
> > thanks,
> > -ian
> >
> >
> > --
> > ----------------------------------------
> > Ian Thompson           tel: 408.952.2023
> > Firmware Engineer      fax: 408.570.0910
> > Palmchip Corporation   www.palmchip.com

--
----------------------------------------
Ian Thompson           tel: 408.952.2023
Firmware Engineer      fax: 408.570.0910
Palmchip Corporation   www.palmchip.com

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dcache_blast() bug? @ 2001-06-04 21:21 ` Kevin D. Kissell 0 siblings, 0 replies; 9+ messages in thread
From: Kevin D. Kissell @ 2001-06-04 21:21 UTC (permalink / raw)
To: Ian Thompson; +Cc: linux-mips

Interesting, in that the 4Kc has write-through caches,
which make it a good deal more difficult to get into
the kind of trouble you describe. Are you running one
of the 4Kc "lead vehicle" chips, or some other part?
Which version of the kernel are you running, and are
the CPU type and cache organization being reported
correctly during boot-up?

The output you report from dumping the command line
is interesting. The corruption seems to be in
8-byte chunks - the first 8 have disappeared, as has
the third 8. Lord knows where the "0" in "ra0" comes
from. Can you confirm that (a) the command line string
is stored at an 8-byte aligned boundary, and (b) whether
the data is actually being moved, or if the missing
characters are simply being replaced with nulls or other
non-printable characters? I know it's not pretty, but
can you dump the same memory addresses as seen
through non-cacheable kseg1 (0xa0000000-0xbfffffff),
and are the cache and memory consistent?

If the failure is happening on 8-byte, doubleword aligned
chunks, I suspect a hardware problem more than a
kernel bug. If it were my system, I'd re-seat the RAM
and CPU modules to make sure I'm not simply getting
screwed by a bad connection when the memory interface
suddenly gets hit with a lot of traffic following the flush.

Kevin K.

----- Original Message -----
From: "Ian Thompson" <iant@palmchip.com>
To: "Kevin D. Kissell" <kevink@mips.com>
Cc: <linux-mips@oss.sgi.com>
Sent: Monday, June 04, 2001 10:27 PM
Subject: Re: dcache_blast() bug?

> oops sorry i meant to mention that. running a mips 4kc.
>
>
>
> "Kevin D. Kissell" wrote:
> >
> > What processor are you running?
> >
> > Kevin K.
> >
> > ----- Original Message -----
> > From: "Ian Thompson" <iant@palmchip.com>
> > To: <linux-mips@oss.sgi.com>
> > Sent: Monday, June 04, 2001 7:34 PM
> > Subject: dcache_blast() bug?
> >
> > >
> > > Hi all,
> > >
> > > I'm seeing some odd memory behavior around the time when blast_dcache()
> > > is called, leading me to think that the method may be a little buggy.
> > > It appears that memory is being corrupted (consistently so) over the
> > > course of flushing the dcache. This happens to my command line argument
> > > string - arcs_cmdline. Before the blast_dcache() call, it is
> > > "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> > > call, the corrupted data is "ttyS0 ra0". I take it this isn't supposed
> > > to happen? any ideas of why the writeback_invalidate_d cache operation
> > > may be losing data?
> > >
> > > thanks,
> > > -ian
> > >
> > >
> > > --
> > > ----------------------------------------
> > > Ian Thompson           tel: 408.952.2023
> > > Firmware Engineer      fax: 408.570.0910
> > > Palmchip Corporation   www.palmchip.com
>
> --
> ----------------------------------------
> Ian Thompson           tel: 408.952.2023
> Firmware Engineer      fax: 408.570.0910
> Palmchip Corporation   www.palmchip.com

^ permalink raw reply [flat|nested] 9+ messages in thread
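For readers following along, the two checks requested in this message (is the string 8-byte aligned, and what does the same memory look like through uncached kseg1) come down to simple address arithmetic on a 32-bit MIPS. The following is only a sketch: the helper names are hypothetical and are not taken from the kernel source, and it assumes the standard MIPS32 segment layout with cached kseg0 at 0x80000000 and its uncached alias kseg1 at 0xa0000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS32 fixed-mapping segments: kseg0 is cached, kseg1 is the
 * uncached alias of the same low 512 MB of physical memory. */
#define KSEG0_BASE 0x80000000u
#define KSEG1_BASE 0xa0000000u
#define KSEG_SIZE  0x20000000u

/* Hypothetical helper: true if vaddr lies in the cached kseg0 window. */
static inline bool in_kseg0(uint32_t vaddr)
{
    return vaddr >= KSEG0_BASE && vaddr < KSEG0_BASE + KSEG_SIZE;
}

/* Hypothetical helper: uncached (kseg1) alias of a kseg0 address. */
static inline uint32_t kseg0_to_kseg1(uint32_t vaddr)
{
    assert(in_kseg0(vaddr));
    return (vaddr - KSEG0_BASE) + KSEG1_BASE;
}

/* Hypothetical helper: true if vaddr starts on an 8-byte boundary. */
static inline bool is_dword_aligned(uint32_t vaddr)
{
    return (vaddr & 7u) == 0;
}
```

As an illustration, the kseg0 address 0x9fcf0000 that appears in the command line has the uncached alias 0xbfcf0000. On the target, reading the string once through its kseg0 address and once through the kseg1 alias shows whether the cache and memory agree.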
* Re: dcache_blast() bug? 2001-06-04 21:21 ` Kevin D. Kissell (?) @ 2001-06-04 23:33 ` Ian Thompson 2001-06-05 8:50 ` Kevin D. Kissell -1 siblings, 1 reply; 9+ messages in thread
From: Ian Thompson @ 2001-06-04 23:33 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips

Thanks for your help Kevin. It may be possible that this is a hardware
bug. I am using one of the lead vehicle chips with 16k d$ & i$,
although there is some custom hardware which may be causing trouble as
well. Oh, and this is the 2.4.1 kernel.

It appears that when I copy the arguments into the command line
variable, they are 8-byte aligned, and the destination is also 8-byte
aligned. However, there is an inconsistency between the data in the
cache and in memory after the blast_dcache() call. Could it also be
possible that the cache write buffer is not quite empty, and the data in
it is being lost on the blast call? Should some implementation of
wbflush be called before the cache ops are done? I just wanted to see
if this could be a problem before I start trying to track down bugs in
hardware...

Thanks,
-ian

"Kevin D. Kissell" wrote:
>
> Interesting, in that the 4Kc has write-through caches,
> which make it a good deal more difficult to get into
> the kind of trouble you describe. Are you running one
> of the 4Kc "lead vehicle" chips, or some other part?
> Which version of the kernel are you running, and are
> the CPU type and cache organization being reported
> correctly during boot-up?
>
> The output you report from dumping the command line
> sounds is interesting. The corruption seems to be in
> 8-byte chunks - the first 8 have disappeared, as has
> the third 8. Lord knows where the "0" in "ra0" comes
> from. Can you confirm that (a) the command line string
> is stored at an 8-byte aligned boundary, and (b) whether
> the data is actually being moved, or if the missing
> characters are simply being replaced with nulls or other
> non-printable characters? I know it's not pretty, but
> can you dump the same memory addresses as seen
> through non-cacheable kseg1 (0xa0000000-0xbfffffff),
> and are the cache and memory consistent?
>
> If the failure is happening on 8-byte, doubleword aligned
> chunks, I suspect a hardware problem more than a
> kernel bug. If it were my system, I'd re-seat the RAM
> and CPU modules to make sure I'm not simply getting
> screwed by a bad connection when the memory interface
> suddenly gets hit with a lot of traffic following the flush.
>
> Kevin K.
>
> ----- Original Message -----
> From: "Ian Thompson" <iant@palmchip.com>
> To: "Kevin D. Kissell" <kevink@mips.com>
> Cc: <linux-mips@oss.sgi.com>
> Sent: Monday, June 04, 2001 10:27 PM
> Subject: Re: dcache_blast() bug?
>
> > oops sorry i meant to mention that. running a mips 4kc.
> >
> >
> >
> > "Kevin D. Kissell" wrote:
> > >
> > > What processor are you running?
> > >
> > > Kevin K.
> > >
> > > ----- Original Message -----
> > > From: "Ian Thompson" <iant@palmchip.com>
> > > To: <linux-mips@oss.sgi.com>
> > > Sent: Monday, June 04, 2001 7:34 PM
> > > Subject: dcache_blast() bug?
> > >
> > > >
> > > > Hi all,
> > > >
> > > > I'm seeing some odd memory behavior around the time when blast_dcache()
> > > > is called, leading me to think that the method may be a little buggy.
> > > > It appears that memory is being corrupted (consistently so) over the
> > > > course of flushing the dcache. This happens to my command line argument
> > > > string - arcs_cmdline. Before the blast_dcache() call, it is
> > > > "console=ttyS0 ramdisk_start=0x9fcf0000 load_ramdisk=1", and after the
> > > > call, the corrupted data is "ttyS0 ra0". I take it this isn't supposed
> > > > to happen? any ideas of why the writeback_invalidate_d cache operation
> > > > may be losing data?
> > > >
> > > > thanks,
> > > > -ian
> > > >
> > > >
> > > > --
> > > > ----------------------------------------
> > > > Ian Thompson           tel: 408.952.2023
> > > > Firmware Engineer      fax: 408.570.0910
> > > > Palmchip Corporation   www.palmchip.com
> >
> > --
> > ----------------------------------------
> > Ian Thompson           tel: 408.952.2023
> > Firmware Engineer      fax: 408.570.0910
> > Palmchip Corporation   www.palmchip.com

--
----------------------------------------
Ian Thompson           tel: 408.952.2023
Firmware Engineer      fax: 408.570.0910
Palmchip Corporation   www.palmchip.com

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dcache_blast() bug? @ 2001-06-05 8:50 ` Kevin D. Kissell 0 siblings, 0 replies; 9+ messages in thread
From: Kevin D. Kissell @ 2001-06-05 8:50 UTC (permalink / raw)
To: Ian Thompson; +Cc: linux-mips

> Thanks for your help Kevin. It may be possible that this is a hardware
> bug. I am using one of the lead vehicle chips with 16k d$ & i$,
> although there is some custom hardware which may be causing trouble as
> well. Oh, and this is the 2.4.1 kernel.

I'm also running a 4Kc lead vehicle, on a MIPS Malta board,
with the 2.4.1 kernel, run the system moderately hard, and
have never seen any behavior like that you describe.

> It appears that when I copy the arguments into the command line
> variable, they are 8-byte aligned, and the destination is also 8-byte
> aligned. However, there is an inconsistency between the data in the
> cache and in memory after the blast_dcache() call.

Am I correct in taking this to mean that the contents of
memory are correct, but that the cache is in error when you
read the data back in cacheable space? That suggests that
writes are working fine, but that either the blast_dcache()
isn't correctly clearing the tags, or the refill from memory
is getting trashed on the way to the cache. The former could
result from misbehavior in the 4Kc lead vehicle chip itself
(possibly provoked by some kind of marginal clock or power
supply input), the latter could result from any one of several
problems in the path between the RAM array and the lead
vehicle cache. I favor the latter theory. See below.

> Could it also be
> possible that the cache write buffer is not quite empty, and the data in
> it is being lost on the blast call?

I know of no software mechanism that will cause the contents
of the write buffer to be lost. I think a bus error indication
from the system might cause it to be thrown away, but that's
about it. The SYNC instruction forces its contents to be
written to memory, not discarded.

> Should some implementation of
> wbflush be called before the cache ops are done?

The write buffers are part of the BIU which is on the "far side"
of the cache. Since the cache is write-through, the cache
operations should not result in any interaction with the write
buffer at all - the cache tags should get invalidated, and
that's all.

The reason that the 8-byte granularity of error suggests a
hardware problem at the memory interface is that, while
writes to memory will be 1, 2, or 4 bytes (byte, halfword,
and word stores), and the cache line size and write buffer
size are both 16 bytes, the 4Kc lead vehicle has a 64-bit
memory interface, and reads 8 bytes at a time when doing
cache fills. A botched RAM cycle during a cache fill would
cause 8-byte blocks within the 16-byte cache lines to be
trashed - which seems to be exactly what you are seeing.

I strongly suggest that you double check all mechanical
connections (CPU socket and memory slots), and if that
doesn't help, check your RAM timing, your supply voltage,
and the symmetry and cleanliness of your clocks. It sounds
like the problem is highly reproducible, so a next step might
be to stick a logic analyser on the CPU/Memory interface
and watch the fill operation on the address, following the
flush.

Regards,

Kevin K.

^ permalink raw reply [flat|nested] 9+ messages in thread
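The granularity argument above can be checked mechanically: compare a known-good copy of the buffer with the suspect one in 8-byte units and see which doublewords differ. The following is a minimal sketch only; `diff_dwords` is a hypothetical helper, not a kernel function, and the 8-byte unit is an assumption based on the 64-bit memory interface described in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed unit: one 64-bit memory-interface transfer, as described above. */
#define DWORD_BYTES 8u

/*
 * Hypothetical helper: compare two views of the same buffer (for example
 * a cached read and an uncached read) in 8-byte units. Bit i of the
 * result is set if doubleword i differs. len must be a multiple of 8 and
 * cover at most 32 doublewords.
 */
static inline uint32_t diff_dwords(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    uint32_t mask = 0;

    assert(len % DWORD_BYTES == 0 && len / DWORD_BYTES <= 32);
    for (size_t i = 0; i < len / DWORD_BYTES; i++) {
        if (memcmp(pa + i * DWORD_BYTES, pb + i * DWORD_BYTES,
                   DWORD_BYTES) != 0)
            mask |= (uint32_t)1 << i;
    }
    return mask;
}
```

With this helper, damage confined to the first and third doublewords of a buffer, the pattern suggested earlier in the thread, would show up as the mask 0x5. Damage that does not line up with 8-byte units would set bits for partially affected doublewords as well, so the mask alone cannot prove the granularity; dumping the differing bytes is still needed.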
end of thread, other threads:[~2001-06-05 8:50 UTC | newest]

Thread overview: 9+ messages
2001-06-04 17:34 dcache_blast() bug? Ian Thompson
2001-06-04 19:18 ` Kevin D. Kissell
2001-06-04 19:18 ` Kevin D. Kissell
2001-06-04 20:27 ` Ian Thompson
2001-06-04 21:21 ` Kevin D. Kissell
2001-06-04 21:21 ` Kevin D. Kissell
2001-06-04 23:33 ` Ian Thompson
2001-06-05 8:50 ` Kevin D. Kissell
2001-06-05 8:50 ` Kevin D. Kissell