From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mga02.intel.com ([134.134.136.20]) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1Xa4Zt-000078-Mo for linux-mtd@lists.infradead.org; Fri, 03 Oct 2014 15:15:30 +0000 Message-ID: <1412349293.3795.91.camel@sauron.fi.intel.com> Subject: Re: UBI: Question about error checking in ubi_eba_write_leb() From: Artem Bityutskiy Reply-To: dedekind1@gmail.com To: steve Date: Fri, 03 Oct 2014 18:14:53 +0300 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Cc: linux-mtd@lists.infradead.org List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On Tue, 2014-09-30 at 12:00 +0100, steve wrote: > This is my first post to this mailing list, I think this is the right place for > my question. > > I think the error checking in ubi_eba_write_leb() may be wrong and may cause > incorrect behaviour when extra checking is enabled in debugfs, here's the > scenario I have a problem with: > > ubi_eba_write_leb() calls ubi_io_write_data() -> ubi_io_write() -> > ubi_dbg_check_all_ff() > > ubi_dbg_check_all_ff() will perform an mtd_read() which can fail with EBADMSG > (if there is a failure to correct using ECC as in my case). Yep, this is possible. > This error code > will bubble back up to ubi_eba_write_leb() And the question is - is this the right thing to do? Original intention was to catch I/O problems and report about them promptly by dumping the information to the log and returning an error. The error we return is -EINVAL exactly in because we did not want the upper layers try to 'deal' with the problem. All we wanted to do is to catch the problems. > , but recover_peb() will only be > called if the error code was EIO. Yeah, and we never meant the recover_peb() to actually try doing anything about that. > The question is should recover_peb() be > called in this case as well? Good question - this was not meant to be. So see, you are now talking about the debugging code. It was written basically to make developers' live simpler. This is just a tool to catch bugs during the R&D. So if during the R&D phase we see that the driver returns an "ECC error" (-EBADMSG) when we read the data we just wrote - this is a problem, and we _do not_ want to hide it by starting the recovery mechanism - we want the write request to actually fail and the developer to actually notice this. The non-debug case is a completely different thing. Here we actually would be interested to handle the issue transparently - move the data to a different PEB, and mark the failing PEB as bad. But the assumption is that the write() path never returns -EBADMSG. If a driver internally does write-read-verify, we assume it should return -EIO if there is an ECC error. But not -EBADMSG. This is why we do not handle this error code in eba.c. Now, what I see is that you are looking at some old code. In the new code the function is named 'ubi_self_check_all_ff()', and if I read the code correctly, it will return -EINVAL if mtd_read() returns -EBADMSG: err = mtd_read(ubi->mtd, addr, len, &read, buf); if (err && !mtd_is_bitflip(err)) { ubi_err("error %d while reading %d bytes from PEB %d:%d, read %zd bytes", err, len, pnum, offset, read); goto error;