From: Bill Pringlemeir
To: "Wiedemer, Thorsten (Lawo AG)"
Cc: Richard Weinberger, "linux-mtd@lists.infradead.org"
Subject: Re: AW: UBI leb_write_unlock NULL pointer Oops (continuation)
Date: Tue, 11 Feb 2014 10:25:44 -0500

>> On 04.02.2014 18:01, Wiedemer, Thorsten (Lawo AG) wrote:
>>> I made a "hardcore test" with:
>>> $ while [ 1 ]; do cp  tmp/; sync; done &
>>> $ while [ 1 ]; do cp  tmp/; sync; done &
>>> $ while [ 1 ]; do cp  tmp/; sync; done &
>>> It took about 2-3 hours until I had an error (two times):
>> -----Original Message-----
>> From: Richard Weinberger [mailto:richard@nod.at]
>> This test ran overnight without any error on my imx51 board. :-\
>> Thorsten, can you please enable CONFIG_DEBUG_LIST?
>> Also try whether you can trigger the issue with lock debugging
>> enabled.

On 11 Feb 2014, Thorsten.Wiedemer@lawo.com wrote:
> short update (I was out of office the rest of last week). I compiled
> the kernel with the debug flags for debug list and lock alloc. The
> kernel compiled with gcc-4.8.2 didn't start (no output on serial
> console and reboot of the system). I didn't try (yet) to find out
> what happens at startup.

You don't need to enable the 'lock alloc' debugging; just the 'debug list'
option, as Richard suggested. Enabling one at a time would work and could
give clues if you can reproduce it.

> I compiled the same kernel (and same config) with gcc-4.4.4. The write
> test runs now for over 16 hours without error. Next step is to find
> out whether this is due to a changed timing because of the debug flags
> or if it's the compiler.

I ran the test above on an IMX25; mxc_nand has taken 448179139
interrupts, with about 6 bit flips and one torture test. It has been
running for about four days. I am using gcc 4.7.3 (crosstool-ng) with
backports to 2.6.36.

I don't think the issue is directly related to an MTD driver or to
UBI/UBIFS. It is more likely an architecture issue, and maybe some API
inconsistency. It could be compiler related; however, many people seem
to have seen the issue on various ARM926 systems (different Linux
versions, different compilers, and different MTD drivers).

User space tasks running in parallel with the test may play a role. Did
you turn CONFIG_PREEMPT off? I think memory pressure and other effects
(not related to UBI/UBIFS) may be needed to trigger the issue. We don't
normally see this on our systems. The one time it happened, an
application developer ran an 'ls -R' or 'find .' in parallel with a
file-intensive feature in our application. I haven't found a test that
reproduces it reliably.

Fwiw,
Bill Pringlemeir.