Date: Thu, 14 Jan 2021 16:42:24 +0100
From: Miquel Raynal
To: Adam Ford
Cc: Richard Weinberger, Vignesh Raghavendra, Tudor Ambarus,
 linux-mtd@lists.infradead.org, Julien Su, ycllin@mxic.com.tw,
 Thomas Petazzoni, Linux-OMAP
Subject: Re: [PATCH 04/20] mtd: nand: ecc-bch: Stop exporting the private structure
Message-ID: <20210114164224.5d21c170@xps13>
References: <20200929230124.31491-1-miquel.raynal@bootlin.com>
 <20200929230124.31491-5-miquel.raynal@bootlin.com>
 <20210111112027.7cbda0ba@xps13>
 <20210112153534.5ba93cde@xps13>
Organization: Bootlin
X-Mailing-List: linux-omap@vger.kernel.org

Hi Adam,

Adam Ford wrote on Tue, 12 Jan 2021 11:20:24 -0600:

> On Tue, Jan 12, 2021 at 10:01 AM Adam Ford wrote:
> >
> > On Tue, Jan 12, 2021 at 8:35 AM Miquel Raynal wrote:
> > >
> > > Hi Adam,
> > >
> > > Miquel Raynal wrote on Mon, 11 Jan 2021
> > > 11:20:27 +0100:
> > >
> > > > Hi Adam,
> > > >
> > > > Adam Ford wrote on Sat, 9 Jan 2021 08:46:44 -0600:
> > > >
> > > > > On Tue, Sep 29, 2020 at 6:09 PM Miquel Raynal wrote:
> > > > > >
> > > > > > The NAND BCH control structure has nothing to do outside of this
> > > > > > driver, all users of the nand_bch_init/free() functions just save it
> > > > > > to chip->ecc.priv so do it in this driver directly and return a
> > > > > > regular error code instead.
> > > > > >
> > > > > > Signed-off-by: Miquel Raynal
> > > > > > ---
> > > > >
> > > > > Starting with this commit: 3c0fe36abebe, the kernel either doesn't
> > > > > build or returns errors on some omap2plus devices with the following
> > > > > error:
> > > > >
> > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> > > > > nand: Micron MT29F4G16ABBDA3W
> > > > > nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> > > > > Invalid ECC layout
> > > > > omap2-nand 30000000.nand: unable to use BCH library
> > > > > omap2-nand: probe of 30000000.nand failed with error -22
> > > > > 8<--- cut here ---
> > > > >
> > > > > There are few commits using git bisect that have build errors, so it
> > > > > wasn't possible for me to determine the exact commit that broke the
> > > > > ECC. If the build failed, I marked it as 'bad' to git bisect.
> > > >
> > > > I am sorry to hear that, I regularly rebase with a make run between each
> > > > pick and push my branches to a 0-day repository to have robots check
> > > > for such errors, but sometimes I fail.
> > > >
> > > > > Newer commits have remedied the build issue, but the Invalid ECC
> > > > > layout error still exists as of 5.11-RC2.
> > > >
> > > > Ok so let's focus on these.
> > > >
> > > > > Do you have any suggestions on what I can do to remedy this? I am
> > > > > willing to try and test.
> > > >
> > > > Glad to hear that.
> > > >
> > > > Can you share the NAND controller DT node you are using?
> > > >
> > > > Also, can you please add a few printk's like below and give me the
> > > > output?
> > >
> > > Will you have the time to check these soon? I am ready to help and
> > > would like to fix it asap.
> >
> > Sorry for the delay, I have to split my time with 3 different
> > projects. I am hoping to get you data later today.
> >
> Miquel,
>
> Here is the dump from my boot sequence:
>
> [ 2.629089] omap2-nand 30000000.nand: GPIO lookup for consumer rb
> [ 2.635253] omap2-nand 30000000.nand: using device tree for GPIO lookup
> [ 2.642150] of_get_named_gpiod_flags: parsed 'rb-gpios' property of node '/o)
> [ 2.653900] gpio gpiochip6: Persistence not supported for GPIO 0
> [ 2.660339] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> [ 2.666900] nand: Micron MT29F4G16ABBDA3W
> [ 2.670959] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB si4
> [ 2.678710] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> [ 2.684234] writesize 2048, step_size 512, nsteps 4
> [ 2.689300] strength 8, step size 512, code_size 13

Up to here, everything looks fine.

> [ 2.696807] count eccbytes 0

This is the cause of the error: the MTD OOB layout reports no ECC
bytes. Can you please check that we actually call the large page
helpers (in particular nand_ooblayout_ecc_lp())?
I bet this function returns -ERANGE on its first call, which reduces
the eccbytes variable above to zero. What is strange is that the only
reason I can see for this is nand->ecc.ctx.total being 0. Can you
please check its actual value? I do not see why it would be 0, because
nand->ecc.ctx.total is set to nsteps (4) * code_size (13), i.e. 52,
right before calling mtd_ooblayout_count_eccbytes().

Can you please verify my assumptions and perhaps track down the root
cause of this issue? Please do not hesitate to ask questions, I'll do
my best to help because this is a critical section and, unfortunately,
the breakage is not limited to OMAP boards.

Thanks,
Miquèl
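
For reference, the failure path suspected above can be modelled in a few lines of plain C. This is only a user-space sketch under stated assumptions, not the kernel code: every `sketch_*` name is hypothetical, `ecc_total` stands in for nand->ecc.ctx.total, and the layout helper is assumed to expose a single ECC section at the end of the OOB area and to answer -ERANGE when the total is 0, which is the behaviour the reasoning above relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an OOB region descriptor. */
struct sketch_oob_region {
	unsigned int offset;
	unsigned int length;
};

/* Hypothetical stand-in for the few fields the layout code reads. */
struct sketch_nand {
	unsigned int oobsize;   /* OOB bytes per page, 64 in the log above */
	unsigned int ecc_total; /* models nand->ecc.ctx.total */
};

/*
 * Simplified model of a large page ECC layout helper: one single section
 * holding all the ECC bytes at the end of the OOB area. Assumption: a
 * total of 0 is reported as "no such section" with -ERANGE.
 */
static int sketch_ooblayout_ecc_lp(const struct sketch_nand *nand, int section,
				   struct sketch_oob_region *region)
{
	if (section || !nand->ecc_total)
		return -ERANGE;

	region->length = nand->ecc_total;
	region->offset = nand->oobsize - region->length;

	return 0;
}

/* Sum the length of every ECC section until the helper reports -ERANGE. */
static int sketch_count_eccbytes(const struct sketch_nand *nand)
{
	struct sketch_oob_region region = { 0, 0 };
	int section = 0, count = 0, ret;

	assert(nand != NULL);

	while (!(ret = sketch_ooblayout_ecc_lp(nand, section++, &region)))
		count += region.length;

	/* -ERANGE only marks the end of the section list. */
	return ret == -ERANGE ? count : ret;
}
```

With oobsize = 64 and ecc_total = 4 * 13 = 52, sketch_count_eccbytes() returns 52 and the single section starts at offset 12. With ecc_total = 0 the very first call to the helper fails and the count stays at 0, which is what the "count eccbytes 0" trace would look like in this model.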