From: Nikolay Dimitrov <picmaster@mail.bg>
To: Doug Schwanke <doug.schwanke@firstviewconsultants.com>
Cc: "meta-freescale@yoctoproject.org" <meta-freescale@yoctoproject.org>
Subject: Re: imx6 silent memory corruption
Date: Tue, 27 Jan 2015 10:40:04 +0200
Message-ID: <54C74EE4.8070308@mail.bg>
In-Reply-To: <BLUPR01MB469A14FC5385237923B46B58F350@BLUPR01MB469.prod.exchangelabs.com>

Hi Doug,

On 01/26/2015 04:40 PM, Doug Schwanke wrote:
>> -----Original Message-----
>> From: meta-freescale-bounces@yoctoproject.org [mailto:meta-freescale-bounces@yoctoproject.org] On Behalf Of Nikolay Dimitrov
>> Sent: Friday, January 23, 2015 3:11 PM
>> To: Fabio Estevam
>> Cc: meta-freescale@yoctoproject.org
>> Subject: Re: [meta-freescale] imx6 silent memory corruption
>>
>> Hi Fabio,
>>
>> On 01/23/2015 12:25 AM, Fabio Estevam wrote:
>>> On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov <picmaster@mail.bg> wrote:
>>>
>>>> I would appreciate it if you could share ideas about what could be wrong
>>>> with this setup, and I'd also be happy to hear your suggestions for
>>>> similar simple tests of system reliability.
>>>
>>> Maybe you could try running the 'memtester' utility to see how your
>>> board behaves.
>>
>> Thanks for the idea. I ran the tool and it also reports errors, but this
>> happens rarely (just like the hash test) and I'm still looking for a way to
>> reproduce the issue easily. Here's an example of a memory error:
>>
>>
>> # memtester 64M 100
>> memtester version 4.1.3 (32-bit)
>> Copyright (C) 2010 Charles Cazabon.
>> Licensed under the GNU General Public License version 2 (only).
>>
>> pagesize is 4096
>> pagesizemask is 0xfffff000
>> want 64MB (67108864 bytes)
>> got  64MB (67108864 bytes), trying mlock ...locked.
>> Loop 1/100:
>>     Stuck Address       : ok
>>     Random Value        : ok
>> FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
>>     Compare XOR         :   Compare SUB         : ok
>>     Compare MUL         : ok
>>     Compare DIV         : ok
>>     Compare OR          : ok
>>     Compare AND         : ok
>>     Sequential Increment: ok
>>     Solid Bits          : ok
>>     Block Sequential    : ok
>>     Checkerboard        : ok
>>     Bit Spread          : ok
>>     Bit Flip            : ok
>>     Walking Ones        : ok
>>     Walking Zeroes      : ok
>>
>>
>> Memtester can run for hours without finding an issue, and sometimes it
>> reports a memory error after only a few minutes.
>>
>> I found another tool, stressapptest
>> (http://stressapptest.googlecode.com/svn/trunk/), which also seems to
>> trigger the issue. Here's an example of the memory errors it reports:
>>
>>
>> # ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
>> Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
>> Stats: SAT revision 1.0.7_autoconf, 32 bit binary
>> Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open source release
>> Log: 1 nodes, 2 cpus.
>> Log: Defaulting to 2 copy threads
>> Log: Flooring memory allocation to multiple of 4: 64MB
>> Log: Prefer plain malloc memory allocation.
>> Log: Using mmap() allocation at 0x72430000.
>> Stats: Starting SAT, 64M, 300 seconds
>> Log: region number 1 exceeds region count 1
>> Log: Region mask: 0x1
>> Log: Seconds remaining: 240
>> Log: Seconds remaining: 180
>> Report Error: miscompare : DIMM Unknown : 1 : 134s
>> Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
>> Report Error: miscompare : DIMM Unknown : 1 : 136s
>> Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
>> Log: Seconds remaining: 120
>> Log: Seconds remaining: 60
>> Report Error: miscompare : DIMM Unknown : 1 : 266s
>> Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
>> Report Error: miscompare : DIMM Unknown : 1 : 274s
>> Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
>> Log: Thread 1 found 3 hardware incidents
>> Log: Thread 2 found 1 hardware incidents
>> Stats: Found 4 hardware incidents
>> Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware incidents, 0 errors
>> Stats: Memory Copy: 256346.00M at 854.46MB/s
>> Stats: File Copy: 0.00M at 0.00MB/s
>> Stats: Net Copy: 0.00M at 0.00MB/s
>> Stats: Data Check: 0.00M at 0.00MB/s
>> Stats: Invert Data: 0.00M at 0.00MB/s
>> Stats: Disk: 0.00M at 0.00MB/s
>>
>> Status: FAIL - test discovered HW problems
>>
>>
>> I plan to run the FSL DDR stress test again to see whether it
>> detects issues with my DDR memory. My board uses a DDR3 SO-DIMM, and I
>> was also thinking of trying another SO-DIMM module to see whether
>> there's any difference.
>>
>> Thanks for the ideas so far. This is a major problem for me so I need
>> to resolve it before doing anything else on this board.
>>
>
> Have you read ERR005198 in the Chip Errata for the i.MX 6Dual/6Quad?
> http://cache.freescale.com/files/32bit/doc/errata/IMX6DQCE.pdf

The issue is observed even when PL310 is disabled in the kernel
configuration.
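
For reference, the miscompares in the logs quoted above can be decoded by
XOR-ing each pair of values and listing the bits that differ. Below is a
minimal Python sketch of that check. The function and variable names are
only illustrative and are not part of memtester or stressapptest; the
values are copied from the quoted output.

```python
# (observed, expected) pairs copied from the stressapptest output quoted above.
STRESSAPPTEST = [
    (0xaaaaaaaaaaaaaa8a, 0xaaaaaaaaaaaaaaaa),  # reported at 134s, CPU 1
    (0xffffffbfffffffbe, 0xffffffbfffffffbf),  # reported at 136s, CPU 0
    (0x0000001000000000, 0x0000001000000010),  # reported at 266s, CPU 0
    (0x0000001000000000, 0x0000001000000010),  # reported at 274s, CPU 0
]

# memtester prints "A != B" without saying which side is the expected value,
# so only the differing bit position is meaningful for this pair.
MEMTESTER = [(0xc3909006, 0xc3909007)]


def flipped_bits(a, b):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def direction(expected, bit):
    """Describe how `bit` changed, given its value in the expected word."""
    return "1->0" if (expected >> bit) & 1 else "0->1"


if __name__ == "__main__":
    for a, b in MEMTESTER:
        print(f"memtester: differing bits {flipped_bits(a, b)}")
    for observed, expected in STRESSAPPTEST:
        for bit in flipped_bits(observed, expected):
            print(f"stressapptest: bit {bit} flipped {direction(expected, bit)}")
```

Every pair differs in exactly one bit, and in all four stressapptest
reports the bit was expected to be 1 and was read back as 0. The read and
reread values also match in each report. This only summarizes the logs
and does not by itself point to a root cause.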

Regards,
Nikolay

