From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.transmode.se ([31.15.61.139]) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1ZuiKr-0007dW-Hf for linux-mtd@lists.infradead.org; Fri, 06 Nov 2015 14:49:51 +0000 From: Joakim Tjernlund To: "markmarshall14@gmail.com" , "linux-mtd@lists.infradead.org" CC: "dwmw2@infradead.org" Subject: Re: [PATCH] mtd: Add locking to cfi_cmdset_001:do_getlockstatus_oneblock Date: Fri, 6 Nov 2015 14:49:23 +0000 Message-ID: <1446821363.21216.191.camel@transmode.se> References: <563CAFD7.7060803@gmail.com> In-Reply-To: <563CAFD7.7060803@gmail.com> Content-Language: en-US Content-Type: text/plain; charset="iso-8859-15" Content-ID: <2A32E194A749DC44AAFE49731C939C8F@transmode.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On Fri, 2015-11-06 at 14:49 +0100, Mark Marshall wrote: > The function (do_getlockstatus_oneblock) switches the flash out of > array-read mode and into query mode. It should not run in parallel to > another function that uses array-read mode. We therefore need to > acquire the chip mutex and call get_chip(..., FL_JEDEC_QUERY). >=20 > I also assume that we should invalidate the cache in the same way that > we do for do_otp_read. >=20 > Signed-off-by: Mark Marshall > --- >=20 > Hi all, >=20 > We have had a device in production (and operation) for a long time > running Linux and using a CFI NOR flash chip. We have recently > started to get devices come back from the field with strange errors in > the JFFS2 file system that is on this flash. It appeared that blocks > of files on the image were being dropped (not flash blocks, just 4K > sections from the middle of the files were being replaced with 0's). >=20 > After some debugging I found that sometimes, when the JFFS2 FS was > reading the JFFS2 inode block it would get an CRC error. After adding > some debug code I could see that in the error cases the "second half" > of the read data was replaced with a fixed repeating pattern, and that > this pattern was part of the CFI "QRY" data (It contained the flash > manufacturer and device ID's). >=20 > This led me to think that maybe the problem was that some part of the > MTD code was switching the device out of the normal "array-read" mode > and into the query mode. After looking through the mtd code I found > that the function 'do_getlockstatus_oneblock' does switch the mode of > the flash without taking the chip mutex. Adding code to take the > mutex has fixed our problem. >=20 > I am slightly surprised that no one else has seen this bug - the only > thing I can think of is that reading the lock status of the flash is > not something that happens in normal operation very often? We have a > process on the device that generates a hardware check report at boot > up, and one of the things that it does is report on the 'locked' > status of the flash. This is always running while the JFFS2 code is > performing it's initial scan. We have! Been looking at it for a while now, thinking it is an bug in erase suspend for small blocks. We even contacted Micron for confirmatio= n. Even got at few patches to address som other issues we found during trouble= shooting this. I will test your patch=20 >=20 > The bug is fairly easy to reproduce if you have a system with the > correct type of flash. In one terminal run something like: >=20 > while true ; do ( hd -n 65536 /dev/mtd0 | md5sum ) ; done >=20 > And in a second terminal run a program or utility that will check the > lock status of the flash (mtdinfo should work). Assuming nothing else > is writing to the flash partition mtd0 the md5sum will remain > constant. When the bug bites you can see the md5sum change (but then > recover). >=20 > Best Regards, >=20 > Mark Marshall >=20 > drivers/mtd/chips/cfi_cmdset_0001.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) >=20 > diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c > b/drivers/mtd/chips/cfi_cmdset_0001.c > index 286b97a..d675efb 100644 > --- a/drivers/mtd/chips/cfi_cmdset_0001.c > +++ b/drivers/mtd/chips/cfi_cmdset_0001.c > @@ -2051,13 +2051,32 @@ static int __xipram > do_getlockstatus_oneblock(struct map_info *map, > { > struct cfi_private *cfi =3D map->fldrv_priv; > int status, ofs_factor =3D cfi->interleave * cfi->device_type; > + int ret; >=20 > adr +=3D chip->start; > + > + mutex_lock(&chip->mutex); > + ret =3D get_chip(map, chip, adr, FL_JEDEC_QUERY); > + if (ret) { > + mutex_unlock(&chip->mutex); > + return ret; > + } > + > + /* let's ensure we're not reading back cached data from array mode */ > + INVALIDATE_CACHED_RANGE(map, adr+(2*ofs_factor), 1); > + > xip_disable(map, chip, adr+(2*ofs_factor)); > map_write(map, CMD(0x90), adr+(2*ofs_factor)); > chip->state =3D FL_JEDEC_QUERY; > status =3D cfi_read_query(map, adr+(2*ofs_factor)); > xip_enable(map, chip, 0); > + > + /* then ensure we don't keep query data in the cache */ > + INVALIDATE_CACHED_RANGE(map, adr+(2*ofs_factor), 1); > + > + put_chip(map, chip, adr); > + mutex_unlock(&chip->mutex); > + > return status; > } >=20