Message-ID: <54C2B900.8000807@mail.bg>
Date: Fri, 23 Jan 2015 23:11:28 +0200
From: Nikolay Dimitrov
To: Fabio Estevam
Cc: "meta-freescale@yoctoproject.org"
Subject: Re: imx6 silent memory corruption
References: <54C16AB6.20605@mail.bg>
List-Id: Usage and development list for the meta-fsl-* layers

Hi Fabio,

On 01/23/2015 12:25 AM, Fabio Estevam wrote:
> On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov wrote:
>
>> I will appreciate if you can share ideas what could be wrong with this
>> setup, and also I'll be happy to hear from you suggestions for similar
>> simple tests for system reliability.
>
> Maybe you could try to run the 'memtester' utility and see it how your
> board behaves.

Thanks for the idea. I ran the tool and it also reports errors, but this
happens rarely (just like the hash test) and I am still looking for a way
to reproduce the issue easily.

Here's an example of a memory error:

# memtester 64M 100
memtester version 4.1.3 (32-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 64MB (67108864 bytes)
got 64MB (67108864 bytes), trying mlock ...locked.
Loop 1/100:
  Stuck Address       : ok
  Random Value        : ok
FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
  Compare XOR         :
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok

memtester can run for hours without finding an issue, yet sometimes it
reports a memory error after only a few minutes.

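One way to read the FAILURE line is to XOR the two words, which shows
exactly which bits differ. A minimal Python sketch follows; the helper
name differing_bits is made up for illustration and is not part of
memtester:

```python
def differing_bits(a, b):
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# The two words from the memtester FAILURE line above.
print(differing_bits(0xc3909006, 0xc3909007))
```

For the pair above this prints [0], i.e. the two words differ only in
the least significant bit.
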
I found another tool, stressapptest
(http://stressapptest.googlecode.com/svn/trunk/), which also seems to
trigger the issue. Here's another example of a memory error:

# ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Stats: SAT revision 1.0.7_autoconf, 32 bit binary
Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open source release
Log: 1 nodes, 2 cpus.
Log: Defaulting to 2 copy threads
Log: Flooring memory allocation to multiple of 4: 64MB
Log: Prefer plain malloc memory allocation.
Log: Using mmap() allocation at 0x72430000.
Stats: Starting SAT, 64M, 300 seconds
Log: region number 1 exceeds region count 1
Log: Region mask: 0x1
Log: Seconds remaining: 240
Log: Seconds remaining: 180
Report Error: miscompare : DIMM Unknown : 1 : 134s
Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : DIMM Unknown : 1 : 136s
Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
Log: Seconds remaining: 120
Log: Seconds remaining: 60
Report Error: miscompare : DIMM Unknown : 1 : 266s
Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Report Error: miscompare : DIMM Unknown : 1 : 274s
Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Log: Thread 1 found 3 hardware incidents
Log: Thread 2 found 1 hardware incidents
Stats: Found 4 hardware incidents
Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware incidents, 0 errors
Stats: Memory Copy: 256346.00M at 854.46MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: FAIL - test discovered HW problems

I plan to run the FSL DDR stress test again to see whether it detects
issues with my DDR memory. My board uses a SO-DIMM DDR3, and I was also
thinking of trying another SO-DIMM module to see whether there's any
difference.

Thanks for the ideas so far. This is a major problem for me, so I need
to resolve it before doing anything else on this board.

Kind regards,
Nikolay
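
The miscompare lines in the stressapptest log above can be reduced to
the bit positions that differ between the read and expected words. A
minimal Python sketch, assuming the "Hardware Error" line format shown
in that run; the regular expression and the summarize helper are
illustrative only and are not part of stressapptest:

```python
import re

# Captures the read and expected words from a "Hardware Error: miscompare" line.
MISCOMPARE = re.compile(
    r"read:0x([0-9a-f]+), reread:0x[0-9a-f]+ expected:0x([0-9a-f]+)"
)


def summarize(log):
    """Yield (bit_position, direction) for each differing bit in each miscompare."""
    for match in MISCOMPARE.finditer(log):
        read = int(match.group(1), 16)
        expected = int(match.group(2), 16)
        diff = read ^ expected
        for bit in range(diff.bit_length()):
            if (diff >> bit) & 1:
                # Direction is taken from the expected value of that bit.
                direction = "1->0" if (expected >> bit) & 1 else "0->1"
                yield bit, direction


# The four miscompare lines copied from the log above.
LOG = """\
Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a expected:0xaaaaaaaaaaaaaaaa
Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe expected:0xffffffbfffffffbf
Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM Unknown): read:0x0000001000000000, reread:0x0000001000000000 expected:0x0000001000000010
"""

for bit, direction in summarize(LOG):
    print("bit", bit, direction)
```

For the four miscompares above this reports bit 5, bit 0, bit 4 and
bit 4, each one a bit that was expected to be 1 and was read back as 0.
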