From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mdfmta004.mxout.tbr.inty.net ([91.221.168.45] helo=smtp.demon.co.uk) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1XbQUF-0000ip-HV for linux-mtd@lists.infradead.org; Tue, 07 Oct 2014 08:51:16 +0000 Date: Tue, 7 Oct 2014 09:46:08 +0100 (BST) From: steve To: Artem Bityutskiy Subject: Re: UBI: Question about error checking in ubi_eba_write_leb() In-Reply-To: <1412349293.3795.91.camel@sauron.fi.intel.com> Message-ID: References: <1412349293.3795.91.camel@sauron.fi.intel.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: linux-mtd@lists.infradead.org List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Thank you very much for your reply Artem, that makes sense to me now. On Fri, 3 Oct 2014, Artem Bityutskiy wrote: > On Tue, 2014-09-30 at 12:00 +0100, steve wrote: >> This is my first post to this mailing list, I think this is the right place for >> my question. >> >> I think the error checking in ubi_eba_write_leb() may be wrong and may cause >> incorrect behaviour when extra checking is enabled in debugfs, here's the >> scenario I have a problem with: >> >> ubi_eba_write_leb() calls ubi_io_write_data() -> ubi_io_write() -> >> ubi_dbg_check_all_ff() >> >> ubi_dbg_check_all_ff() will perform an mtd_read() which can fail with EBADMSG >> (if there is a failure to correct using ECC as in my case). > > Yep, this is possible. > >> This error code >> will bubble back up to ubi_eba_write_leb() > > And the question is - is this the right thing to do? Original intention > was to catch I/O problems and report about them promptly by dumping the > information to the log and returning an error. > > The error we return is -EINVAL exactly in because we did not want the > upper layers try to 'deal' with the problem. All we wanted to do is to > catch the problems. > >> , but recover_peb() will only be >> called if the error code was EIO. > > Yeah, and we never meant the recover_peb() to actually try doing > anything about that. > >> The question is should recover_peb() be >> called in this case as well? > > Good question - this was not meant to be. > > So see, you are now talking about the debugging code. It was written > basically to make developers' live simpler. This is just a tool to catch > bugs during the R&D. > > So if during the R&D phase we see that the driver returns an "ECC > error" (-EBADMSG) when we read the data we just wrote - this is a > problem, and we _do not_ want to hide it by starting the recovery > mechanism - we want the write request to actually fail and the developer > to actually notice this. > > The non-debug case is a completely different thing. Here we actually > would be interested to handle the issue transparently - move the data to > a different PEB, and mark the failing PEB as bad. > > But the assumption is that the write() path never returns -EBADMSG. If a > driver internally does write-read-verify, we assume it should return > -EIO if there is an ECC error. But not -EBADMSG. > > This is why we do not handle this error code in eba.c. > > Now, what I see is that you are looking at some old code. In the new > code the function is named 'ubi_self_check_all_ff()', and if I read the > code correctly, it will return -EINVAL if mtd_read() returns -EBADMSG: > > > err = mtd_read(ubi->mtd, addr, len, &read, buf); > if (err && !mtd_is_bitflip(err)) { > ubi_err("error %d while reading %d bytes from PEB %d:%d, read %zd bytes", > err, len, pnum, offset, read); > goto error; > >