From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54C74EE4.8070308@mail.bg>
Date: Tue, 27 Jan 2015 10:40:04 +0200
From: Nikolay Dimitrov
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0)
 Gecko/20100101 Icedove/31.4.0
MIME-Version: 1.0
To: Doug Schwanke
References: <54C16AB6.20605@mail.bg> <54C2B900.8000807@mail.bg>
In-Reply-To:
Cc: "meta-freescale@yoctoproject.org"
Subject: Re: imx6 silent memory corruption
X-BeenThere: meta-freescale@yoctoproject.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Usage and development list for the meta-fsl-* layers
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 27 Jan 2015 08:40:11 -0000
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

Hi Doug,

On 01/26/2015 04:40 PM, Doug Schwanke wrote:
>> -----Original Message-----
>> From: meta-freescale-bounces@yoctoproject.org [mailto:meta-freescale-
>> bounces@yoctoproject.org] On Behalf Of Nikolay Dimitrov
>> Sent: Friday, January 23, 2015 3:11 PM
>> To: Fabio Estevam
>> Cc: meta-freescale@yoctoproject.org
>> Subject: Re: [meta-freescale] imx6 silent memory corruption
>>
>> Hi Fabio,
>>
>> On 01/23/2015 12:25 AM, Fabio Estevam wrote:
>>> On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov
>> wrote:
>>>
>>>> I will appreciate if you can share ideas what could be wrong with
>>>> this setup, and also I'll be happy to hear from you suggestions for
>>>> similar simple tests for system reliability.
>>>
>>> Maybe you could try to run the 'memtester' utility and see it how your
>>> board behaves.
>>
>> Thanks for the idea. I ran the tool and it also reports errors, but this happens
>> rarely (just like the hash test) and I still looking for how to easily reproduce
>> the issue. Here's an example of memory error:
>>
>>
>> # memtester 64M 100
>> memtester version 4.1.3 (32-bit)
>> Copyright (C) 2010 Charles Cazabon.
>> Licensed under the GNU General Public License version 2 (only).
>>
>> pagesize is 4096
>> pagesizemask is 0xfffff000
>> want 64MB (67108864 bytes)
>> got 64MB (67108864 bytes), trying mlock ...locked.
>> Loop 1/100:
>> Stuck Address : ok
>> Random Value : ok
>> FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
>> Compare XOR : Compare SUB : ok
>> Compare MUL : ok
>> Compare DIV : ok
>> Compare OR : ok
>> Compare AND : ok
>> Sequential Increment: ok
>> Solid Bits : ok
>> Block Sequential : ok
>> Checkerboard : ok
>> Bit Spread : ok
>> Bit Flip : ok
>> Walking Ones : ok
>> Walking Zeroes : ok
>>
>>
>> Memtester can run for hours without finding an issue, and sometimes it runs
>> for several minutes and reports a memory error.
>>
>> Found another tool, stresstestapp (http://stressapptest.googlecode.com
>> /svn/trunk/) which again seems to trigger the issue. Here's again an example
>> of memory error:
>>
>>
>> # ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
>> Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64
>> -s 300
>> Stats: SAT revision 1.0.7_autoconf, 32 bit binary
>> Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open
>> source release
>> Log: 1 nodes, 2 cpus.
>> Log: Defaulting to 2 copy threads
>> Log: Flooring memory allocation to multiple of 4: 64MB
>> Log: Prefer plain malloc memory allocation.
>> Log: Using mmap() allocation at 0x72430000.
>> Stats: Starting SAT, 64M, 300 seconds
>> Log: region number 1 exceeds region count 1
>> Log: Region mask: 0x1
>> Log: Seconds remaining: 240
>> Log: Seconds remaining: 180
>> Report Error: miscompare : DIMM Unknown : 1 : 134s
>> Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM
>> Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a
>> expected:0xaaaaaaaaaaaaaaaa
>> Report Error: miscompare : DIMM Unknown : 1 : 136s
>> Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM
>> Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe
>> expected:0xffffffbfffffffbf
>> Log: Seconds remaining: 120
>> Log: Seconds remaining: 60
>> Report Error: miscompare : DIMM Unknown : 1 : 266s
>> Hardware Error: miscompare on CPU 0(0x1) at
>> 0x74b979d0(0x358ae9d0:DIMM
>> Unknown): read:0x0000001000000000, reread:0x0000001000000000
>> expected:0x0000001000000010
>> Report Error: miscompare : DIMM Unknown : 1 : 274s
>> Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM
>> Unknown): read:0x0000001000000000, reread:0x0000001000000000
>> expected:0x0000001000000010
>> Log: Thread 1 found 3 hardware incidents
>> Log: Thread 2 found 1 hardware incidents
>> Stats: Found 4 hardware incidents
>> Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware
>> incidents, 0 errors
>> Stats: Memory Copy: 256346.00M at 854.46MB/s
>> Stats: File Copy: 0.00M at 0.00MB/s
>> Stats: Net Copy: 0.00M at 0.00MB/s
>> Stats: Data Check: 0.00M at 0.00MB/s
>> Stats: Invert Data: 0.00M at 0.00MB/s
>> Stats: Disk: 0.00M at 0.00MB/s
>>
>> Status: FAIL - test discovered HW problems
>>
>>
>> I plan to run again the FSL DDR stress test to see whether it
>> detects issues with my DDR memory. My board uses a SO-DIMM DDR3, and I
>> was also thinking to try with another SO-DIMM module to see whether
>> there's any difference.
>>
>> Thanks for the ideas so far.
>> This is a major problem for me so I need
>> to resolve it before doing anything else on this board.
>>
>
> Have you read ERR005198 of the Chip Errata for the i.MX 6Dual/6Quad
> http://cache.freescale.com/files/32bit/doc/errata/IMX6DQCE.pdf

The issue is observed even when PL310 is disabled in the kernel
configuration.

Regards,
Nikolay
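
For reference, the miscompares in the stressapptest log quoted above can be decoded by XOR-ing each "read" word with its "expected" word, which shows which bits differ and in which direction. The following is only a minimal Python sketch of that arithmetic; the helper name `flipped_bits` is hypothetical and not part of any tool mentioned in this thread, and the values are copied from the quoted log.

```python
def flipped_bits(read, expected):
    """Return the bit positions (0 = least significant) where the two words differ."""
    diff = read ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# (read, expected) pairs copied from the stressapptest log quoted in this mail.
miscompares = [
    (0xaaaaaaaaaaaaaa8a, 0xaaaaaaaaaaaaaaaa),
    (0xffffffbfffffffbe, 0xffffffbfffffffbf),
    (0x0000001000000000, 0x0000001000000010),
    (0x0000001000000000, 0x0000001000000010),
]

for read, expected in miscompares:
    bits = flipped_bits(read, expected)
    # A bit that is set in `expected` but clear in `read` went from 1 to 0.
    directions = ["1->0" if (expected >> b) & 1 else "0->1" for b in bits]
    print(f"read={read:#018x} expected={expected:#018x} bits={bits} {directions}")
```

With these four pairs, each miscompare differs in a single bit (bit 5, bit 0, bit 4 and bit 4), and in each case a 1 was read back as 0. The memtester failure (0xc3909006 vs 0xc3909007) also differs only in bit 0, but its output does not say which of the two values was the expected one. Four samples are too few to draw a conclusion from, so this is only a way to look at the data.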