* mtd locking @ 2016-05-29 13:38 Matthias Auchmann 2016-05-29 14:23 ` Richard Weinberger 0 siblings, 1 reply; 6+ messages in thread From: Matthias Auchmann @ 2016-05-29 13:38 UTC (permalink / raw) To: linux-mtd@lists.infradead.org Hi all! I have a question - sorry if it turns out to be a beginner's question. I'm using kernel version 3.17, but as far as I can tell my question still applies. My question is how locking is ensured with MTD. I understand that usually there would only be one file system attached to one mtd partition, but what if I concurrently ran multiple instances of nanddump from userspace? Is MTD thread-safe? I'm asking since I had an issue where when I called nanddump on two partitions of the same NAND flash simultaneously, the ECC errors would count up for both nanddumps although only one partition had ECC errors. Experimentally adding a mutex to mtdpart.c's part_read() function solved the issue. I then noticed that quite frequently in the MTD code, ecc_stats are saved, then some reads are performed, and then ecc_stats are compared. No mutex or spinlock or whatever protection is used. Given that the ECC errors can result in -EUCLEAN or a positive bitflip count (then resulting in scrubbing for UBI for example)... I started to feel a little uncomfortable about the whole MTD locking issue. This is where the question comes from... could anyone shed light on this? Thanks! ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mtd locking 2016-05-29 13:38 mtd locking Matthias Auchmann @ 2016-05-29 14:23 ` Richard Weinberger 2016-05-29 15:53 ` Matthias Auchmann 0 siblings, 1 reply; 6+ messages in thread From: Richard Weinberger @ 2016-05-29 14:23 UTC (permalink / raw) To: Matthias Auchmann; +Cc: linux-mtd@lists.infradead.org On Sun, May 29, 2016 at 3:38 PM, Matthias Auchmann <m.auchmann@artech.at> wrote: > Hi all! > > I have a question - sorry if it turns out to be a beginner's question. I'm using kernel version 3.17, but as far as I can tell my question still applies. > > My question is how locking is ensured with MTD. I understand that usually there would only be one file system attached to one mtd partition, but what if I concurrently ran multiple instances of nanddump from userspace? Is MTD thread-safe? > > I'm asking since I had an issue where when I called nanddump on two partitions of the same NAND flash simultaneously, the ECC errors would count up for both nanddumps although only one partition had ECC errors. Experimentally adding a mutex to mtdpart.c's part_read() function solved the issue. IIRC ECC error counting is not thread safe at all. So, yes the behavior is indented. ...at least nobody cared enough to make it better. -- Thanks, //richard ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: mtd locking 2016-05-29 14:23 ` Richard Weinberger @ 2016-05-29 15:53 ` Matthias Auchmann 2016-05-29 16:17 ` Richard Weinberger 0 siblings, 1 reply; 6+ messages in thread From: Matthias Auchmann @ 2016-05-29 15:53 UTC (permalink / raw) To: Richard Weinberger; +Cc: linux-mtd@lists.infradead.org By intended you mean it's wrong and we know, but it's still wrong, right? What about thread safety in general? Is the rest of the mtd system thread safe? Can it happen that e.g. a command is issued to NAND, and before the buffer is read, another command is issued (and similar scenarios)? Or is there some kind of protection in place for that? I'm willing to contribute and fix the ECC part, but I first need to understand what other mechanisms or thread safety issues there might still be... Thanks! BR, Matthias -----Original Message----- From: Richard Weinberger [mailto:richard.weinberger@gmail.com] Sent: Sunday, May 29, 2016 4:24 PM To: Matthias Auchmann Cc: linux-mtd@lists.infradead.org Subject: Re: mtd locking On Sun, May 29, 2016 at 3:38 PM, Matthias Auchmann <m.auchmann@artech.at> wrote: > Hi all! > > I have a question - sorry if it turns out to be a beginner's question. I'm using kernel version 3.17, but as far as I can tell my question still applies. > > My question is how locking is ensured with MTD. I understand that usually there would only be one file system attached to one mtd partition, but what if I concurrently ran multiple instances of nanddump from userspace? Is MTD thread-safe? > > I'm asking since I had an issue where when I called nanddump on two partitions of the same NAND flash simultaneously, the ECC errors would count up for both nanddumps although only one partition had ECC errors. Experimentally adding a mutex to mtdpart.c's part_read() function solved the issue. IIRC ECC error counting is not thread safe at all. So, yes the behavior is indented. ...at least nobody cared enough to make it better. -- Thanks, //richard ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mtd locking 2016-05-29 15:53 ` Matthias Auchmann @ 2016-05-29 16:17 ` Richard Weinberger 2016-05-30 9:24 ` Boris Brezillon 2016-05-31 18:11 ` Brian Norris 0 siblings, 2 replies; 6+ messages in thread From: Richard Weinberger @ 2016-05-29 16:17 UTC (permalink / raw) To: Matthias Auchmann; +Cc: linux-mtd@lists.infradead.org Hi! Am 29.05.2016 um 17:53 schrieb Matthias Auchmann: > By intended you mean it's wrong and we know, but it's still wrong, right? I'd say lazy. Sometimes counters don't need to be exact. Having exact counting is expensive. See network stack. I'm not sure what the exact reasons in the ECC error counting case are. Maybe Brian or Boris can tell more. > What about thread safety in general? Is the rest of the mtd system thread safe? Can it happen that e.g. a command is issued to NAND, and before the buffer is read, another command is issued (and similar scenarios)? Or is there some kind of protection in place for that? NAND chip access is strictly serialized. :-) See nand_get_device(). Thanks, //richard ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mtd locking 2016-05-29 16:17 ` Richard Weinberger @ 2016-05-30 9:24 ` Boris Brezillon 2016-05-31 18:11 ` Brian Norris 1 sibling, 0 replies; 6+ messages in thread From: Boris Brezillon @ 2016-05-30 9:24 UTC (permalink / raw) To: Matthias Auchmann Cc: Richard Weinberger, linux-mtd@lists.infradead.org, Brian Norris, Sascha Hauer +Brian and Sasha Hi Matthias, On Sun, 29 May 2016 18:17:02 +0200 Richard Weinberger <richard@nod.at> wrote: > Hi! > > Am 29.05.2016 um 17:53 schrieb Matthias Auchmann: > > By intended you mean it's wrong and we know, but it's still wrong, right? > > I'd say lazy. Sometimes counters don't need to be exact. > Having exact counting is expensive. See network stack. > > I'm not sure what the exact reasons in the ECC error counting > case are. > Maybe Brian or Boris can tell more. > > > What about thread safety in general? Is the rest of the mtd system thread safe? Can it happen that e.g. a command is issued to NAND, and before the buffer is read, another command is issued (and similar scenarios)? Or is there some kind of protection in place for that? > > NAND chip access is strictly serialized. :-) > See nand_get_device(). Yep, read/write accesses are correctly protected. The statistics update are also protected, but it does not prevent tools like nanddump to retrieve incorrect information. Actually, nanddump uses the following sequence: 1/ get stats 2/ read stuff 3/ get new stats and compare with information retrieved in #1 Since we can't lock access to the MTD device for the whole sequence, the statistics might change because of other read accesses on the same MTD device. I recently suggested the addition of a new ioctl to address that [1]. Regards, Boris [1]http://thread.gmane.org/gmane.linux.drivers.mtd/66834 -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mtd locking 2016-05-29 16:17 ` Richard Weinberger 2016-05-30 9:24 ` Boris Brezillon @ 2016-05-31 18:11 ` Brian Norris 1 sibling, 0 replies; 6+ messages in thread From: Brian Norris @ 2016-05-31 18:11 UTC (permalink / raw) To: Richard Weinberger Cc: Matthias Auchmann, linux-mtd@lists.infradead.org, Sascha Hauer, Boris Brezillon Hi, On Sun, May 29, 2016 at 06:17:02PM +0200, Richard Weinberger wrote: > Am 29.05.2016 um 17:53 schrieb Matthias Auchmann: > > By intended you mean it's wrong and we know, but it's still wrong, right? > > I'd say lazy. Sometimes counters don't need to be exact. > Having exact counting is expensive. See network stack. > > I'm not sure what the exact reasons in the ECC error counting > case are. > Maybe Brian or Boris can tell more. Boris correctly explained things. And several people have noticed, but few solutions have been attempted. The only thing I'd add: I suspect the MTD char device APIs haven't gotten as much love because they aren't as useful. Most users have found better utility by working through JFFS2, UBI/UBIFS, etc., since there's just so much that a raw MTD doesn't really solve for you, especially when you start talking about the type of device where you care about bitflips (i.e., NAND). You're more likely to get away fine with raw MTD on NOR flash, but then you don't care about ECCGETSTATS there anyway. I think Boris has found use cases where he just wants to talk raw NAND/MTD, so maybe some of this dynamic is changing. (And BTW, I realized this is a potential impediment to moving the MTD tests entirely to user space. They can still work fine in single-threaded mode, but once you start running multiple instances, you'll start getting inaccurate stats reports, which will make some tests think they've failed.) Brian ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-05-31 18:11 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-05-29 13:38 mtd locking Matthias Auchmann 2016-05-29 14:23 ` Richard Weinberger 2016-05-29 15:53 ` Matthias Auchmann 2016-05-29 16:17 ` Richard Weinberger 2016-05-30 9:24 ` Boris Brezillon 2016-05-31 18:11 ` Brian Norris
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).