From: Nikolay Dimitrov <picmaster@mail.bg>
To: Fabio Estevam <festevam@gmail.com>
Cc: "meta-freescale@yoctoproject.org" <meta-freescale@yoctoproject.org>
Subject: Re: imx6 silent memory corruption
Date: Fri, 23 Jan 2015 23:11:28 +0200
Message-ID: <54C2B900.8000807@mail.bg>
In-Reply-To: <CAOMZO5AMXXxn-M-yx90E2hwY72nT118U8_b0MF8JdBp74Bq3fg@mail.gmail.com>

Hi Fabio,

On 01/23/2015 12:25 AM, Fabio Estevam wrote:
> On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov <picmaster@mail.bg> wrote:
>
>> I will appreciate if you can share ideas what could be wrong with this
>> setup, and also I'll be happy to hear from you suggestions for similar
>> simple tests for system reliability.
>
> Maybe you could try to run the 'memtester' utility and see it how your
> board behaves.

Thanks for the idea. I ran the tool and it also reports errors, but
only rarely (just like the hash test), and I am still looking for a way
to reproduce the issue easily. Here's an example of a memory error:


# memtester 64M 100
memtester version 4.1.3 (32-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 64MB (67108864 bytes)
got  64MB (67108864 bytes), trying mlock ...locked.
Loop 1/100:
   Stuck Address       : ok
   Random Value        : ok
FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
   Compare XOR         :   Compare SUB         : ok
   Compare MUL         : ok
   Compare DIV         : ok
   Compare OR          : ok
   Compare AND         : ok
   Sequential Increment: ok
   Solid Bits          : ok
   Block Sequential    : ok
   Checkerboard        : ok
   Bit Spread          : ok
   Bit Flip            : ok
   Walking Ones        : ok
   Walking Zeroes      : ok


Memtester can run for hours without finding an issue, and sometimes it
runs for several minutes and reports a memory error.
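
Since the failures are rare, it may help to extract them from captured
logs automatically. The following is a minimal sketch, not part of
memtester: it assumes the FAILURE line format shown in the log above, and
the names FAILURE_RE, flipped_bits and parse_failures are made up for
illustration. It reports which bit positions differ between the two
values memtester compared.

```python
import re

# Assumed format, taken from the single FAILURE line in the log above.
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) at offset (0x[0-9a-fA-F]+)"
)


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def parse_failures(text):
    """Collect offset and differing bit positions for every FAILURE line."""
    results = []
    for match in FAILURE_RE.finditer(text):
        first, second, offset = (int(g, 16) for g in match.groups())
        results.append({"offset": offset, "bits": flipped_bits(first, second)})
    return results


sample = "FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac."
failures = parse_failures(sample)
for failure in failures:
    print(hex(failure["offset"]), failure["bits"])
```

For the failure above, the two values differ only in bit 0, i.e. a
single-bit difference.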

I found another tool, stressapptest
(http://stressapptest.googlecode.com/svn/trunk/), which also seems to
trigger the issue. Here is another example of a memory error:


# ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Stats: SAT revision 1.0.7_autoconf, 32 bit binary
Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open source release
Log: 1 nodes, 2 cpus.
Log: Defaulting to 2 copy threads
Log: Flooring memory allocation to multiple of 4: 64MB
Log: Prefer plain malloc memory allocation.
Log: Using mmap() allocation at 0x72430000.
Stats: Starting SAT, 64M, 300 seconds
Log: region number 1 exceeds region count 1
Log: Region mask: 0x1
Log: Seconds remaining: 240
Log: Seconds remaining: 180
Report Error: miscompare : DIMM Unknown : 1 : 134s
Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : DIMM Unknown : 1 : 136s
Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
Log: Seconds remaining: 120
Log: Seconds remaining: 60
Report Error: miscompare : DIMM Unknown : 1 : 266s
Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Report Error: miscompare : DIMM Unknown : 1 : 274s
Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Log: Thread 1 found 3 hardware incidents
Log: Thread 2 found 1 hardware incidents
Stats: Found 4 hardware incidents
Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware incidents, 0 errors
Stats: Memory Copy: 256346.00M at 854.46MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: FAIL - test discovered HW problems
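
To see whether the miscompares follow a pattern, the read/expected pairs
can be XORed and the differing bit positions tallied. The sketch below is
only an illustration under assumptions: it relies on the
"read:... reread:... expected:..." layout seen in the log above, the
helper name tally_flipped_bits is hypothetical, and the input is the four
value triples copied from that log.

```python
import re
from collections import Counter

# Assumed layout of the stressapptest miscompare values; \s* tolerates
# the line wrapping introduced by the mail client.
MISCOMPARE_RE = re.compile(
    r"read:(0x[0-9a-f]+),\s*reread:(0x[0-9a-f]+)\s*expected:(0x[0-9a-f]+)"
)


def tally_flipped_bits(log_text):
    """Return (Counter of differing bit positions, count where read == reread)."""
    counts = Counter()
    stable = 0
    for match in MISCOMPARE_RE.finditer(log_text):
        read, reread, expected = (int(g, 16) for g in match.groups())
        if read == reread:
            stable += 1  # the wrong value was seen again on the second read
        diff = read ^ expected
        for bit in range(64):
            if (diff >> bit) & 1:
                counts[bit] += 1
    return counts, stable


log = """
read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a
expected:0xaaaaaaaaaaaaaaaa
read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe
expected:0xffffffbfffffffbf
read:0x0000001000000000, reread:0x0000001000000000
expected:0x0000001000000010
read:0x0000001000000000, reread:0x0000001000000000
expected:0x0000001000000010
"""

counts, stable = tally_flipped_bits(log)
for bit, n in sorted(counts.items()):
    print(f"bit {bit}: {n} miscompare(s)")
print(f"{stable} of {sum(counts.values())} differing bits came from stable rereads")
```

In this run every miscompare is a single-bit difference (bit 0 once,
bit 4 twice, bit 5 once), and each reread returned the same wrong value
as the first read.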


I plan to rerun the FSL DDR stress test to see whether it detects
issues with my DDR memory. My board uses a DDR3 SO-DIMM, and I am also
thinking of trying another SO-DIMM module to see whether it makes any
difference.

Thanks for the ideas so far. This is a major problem for me so I need
to resolve it before doing anything else on this board.

Kind regards,
Nikolay

