* Re: Race to power off harming SATA SSDs [not found] ` <20170410235206.GA28603@wtj.duckdns.org> @ 2017-05-07 20:40 ` Pavel Machek 2017-05-08 7:21 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Pavel Machek @ 2017-05-07 20:40 UTC (permalink / raw) To: Tejun Heo Cc: Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd, dwmw2 Hi! > > However, *IN PRACTICE*, SATA STANDBY IMMEDIATE command completion > > [often?] only indicates that the device is now switching to the target > > power management state, not that it has reached the target state. Any > > further device status inquires would return that it is in STANDBY mode, > > even if it is still entering that state. > > > > The kernel then continues the shutdown path while the SSD is still > > preparing itself to be powered off, and it becomes a race. When the > > kernel + firmware wins, platform power is cut before the SSD has > > finished (i.e. the SSD is subject to an unclean power-off). > > At that point, the device is fully flushed and in terms of data > integrity should be fine with losing power at any point anyway. Actually, no, that is not how it works. "Fully flushed" is one thing, surviving power loss is different. Explanation below. > > NOTE: unclean SSD power-offs are dangerous and may brick the device in > > the worst case, or otherwise harm it (reduce longevity, damage flash > > blocks). It is also not impossible to get data corruption. > > I get that the incrementing counters might not be pretty but I'm a bit > skeptical about this being an actual issue. Because if that were > true, the device would be bricking itself from any sort of power > losses be that an actual power loss, battery rundown or hard power off > after crash. And that's exactly what users see. If you do enough power fails on a SSD, you usually brick it, some die sooner than others. There was some test results published, some are here http://lkcl.net/reports/ssd_analysis.html, I believe I seen some others too. It is very hard for a NAND to work reliably in face of power failures. In fact, not even Linux MTD + UBIFS works well in that regards. See http://www.linux-mtd.infradead.org/faq/ubi.html. (Unfortunately, its down now?!). If we can't get it right, do you believe SSD manufactures do? [Issue is, if you powerdown during erase, you get "weakly erased" page, which will contain expected 0xff's, but you'll get bitflips there quickly. Similar issue exists for writes. It is solveable in software, just hard and slow... and we don't do it.] Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-07 20:40 ` Race to power off harming SATA SSDs Pavel Machek @ 2017-05-08 7:21 ` David Woodhouse 2017-05-08 7:38 ` Ricard Wanderlof 2017-05-08 9:28 ` Pavel Machek 0 siblings, 2 replies; 28+ messages in thread From: David Woodhouse @ 2017-05-08 7:21 UTC (permalink / raw) To: Pavel Machek, Tejun Heo Cc: Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd [-- Attachment #1: Type: text/plain, Size: 1874 bytes --] On Sun, 2017-05-07 at 22:40 +0200, Pavel Machek wrote: > > > NOTE: unclean SSD power-offs are dangerous and may brick the device in > > > the worst case, or otherwise harm it (reduce longevity, damage flash > > > blocks). It is also not impossible to get data corruption. > > > I get that the incrementing counters might not be pretty but I'm a bit > > skeptical about this being an actual issue. Because if that were > > true, the device would be bricking itself from any sort of power > > losses be that an actual power loss, battery rundown or hard power off > > after crash. > > And that's exactly what users see. If you do enough power fails on a > SSD, you usually brick it, some die sooner than others. There was some > test results published, some are here > http://lkcl.net/reports/ssd_analysis.html, I believe I seen some > others too. > > It is very hard for a NAND to work reliably in face of power > failures. In fact, not even Linux MTD + UBIFS works well in that > regards. See > http://www.linux-mtd.infradead.org/faq/ubi.html. (Unfortunately, its > down now?!). If we can't get it right, do you believe SSD manufactures > do? > > [Issue is, if you powerdown during erase, you get "weakly erased" > page, which will contain expected 0xff's, but you'll get bitflips > there quickly. Similar issue exists for writes. It is solveable in > software, just hard and slow... and we don't do it.] It's not that hard. We certainly do it in JFFS2. I was fairly sure that it was also part of the design considerations for UBI — it really ought to be right there too. I'm less sure about UBIFS but I would have expected it to be OK. SSDs however are often crap; power fail those at your peril. And of course there's nothing you can do when they do fail, whereas we accept patches for things which are implemented in Linux. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 7:21 ` David Woodhouse @ 2017-05-08 7:38 ` Ricard Wanderlof 2017-05-08 8:13 ` David Woodhouse 2017-05-08 9:28 ` Pavel Machek 1 sibling, 1 reply; 28+ messages in thread From: Ricard Wanderlof @ 2017-05-08 7:38 UTC (permalink / raw) To: David Woodhouse Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, Hans de Goede, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh On Mon, 8 May 2017, David Woodhouse wrote: > > [Issue is, if you powerdown during erase, you get "weakly erased" > > page, which will contain expected 0xff's, but you'll get bitflips > > there quickly. Similar issue exists for writes. It is solveable in > > software, just hard and slow... and we don't do it.] > > It's not that hard. We certainly do it in JFFS2. I was fairly sure that > it was also part of the design considerations for UBI ? it really ought > to be right there too. I'm less sure about UBIFS but I would have > expected it to be OK. I've got a problem with the underlying mechanism. How long does it take to erase a NAND block? A couple of milliseconds. That means that for an erase to be "weak" du to a power fail, the host CPU must issue an erase command, and then the power to the NAND must drop within those milliseconds. However, in most systems there will be a power monitor which will essentially reset the CPU as soon as the power starts dropping. So in practice, by the time the voltage is too low to successfully supply the NAND chip, the CPU has already been reset, hence, no reset command will have been given by the time NAND runs out of steam. Sure, with switchmode power supplies, we don't have those large capacitors in the power supply which can keep the power going for a second or more, but still, I would think that the power wouldn't die fast enough for this to be an issue. But I could very well be wrong and I haven't had experience with that many NAND flash systems. But then please tell me where the above reasoning is flawed. /Ricard -- Ricard Wolf Wanderlöf ricardw(at)axis.com Axis Communications AB, Lund, Sweden www.axis.com Phone +46 46 272 2016 Fax +46 46 13 61 30 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 7:38 ` Ricard Wanderlof @ 2017-05-08 8:13 ` David Woodhouse 2017-05-08 8:36 ` Ricard Wanderlof 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2017-05-08 8:13 UTC (permalink / raw) To: Ricard Wanderlof Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, Hans de Goede, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 1823 bytes --] On Mon, 2017-05-08 at 09:38 +0200, Ricard Wanderlof wrote: > On Mon, 8 May 2017, David Woodhouse wrote: > > > > > > > > > [Issue is, if you powerdown during erase, you get "weakly erased" > > > page, which will contain expected 0xff's, but you'll get bitflips > > > there quickly. Similar issue exists for writes. It is solveable in > > > software, just hard and slow... and we don't do it.] > > It's not that hard. We certainly do it in JFFS2. I was fairly sure that > > it was also part of the design considerations for UBI ? it really ought > > to be right there too. I'm less sure about UBIFS but I would have > > expected it to be OK. > I've got a problem with the underlying mechanism. How long does it take to > erase a NAND block? A couple of milliseconds. That means that for an erase > to be "weak" du to a power fail, the host CPU must issue an erase command, > and then the power to the NAND must drop within those milliseconds. > However, in most systems there will be a power monitor which will > essentially reset the CPU as soon as the power starts dropping. So in > practice, by the time the voltage is too low to successfully supply the > NAND chip, the CPU has already been reset, hence, no reset command will > have been given by the time NAND runs out of steam. > > Sure, with switchmode power supplies, we don't have those large capacitors > in the power supply which can keep the power going for a second or more, > but still, I would think that the power wouldn't die fast enough for this > to be an issue. > > But I could very well be wrong and I haven't had experience with that many > NAND flash systems. But then please tell me where the above reasoning is > flawed. Our empirical testing trumps your "can never happen" theory :) [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 8:13 ` David Woodhouse @ 2017-05-08 8:36 ` Ricard Wanderlof 2017-05-08 8:54 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Ricard Wanderlof @ 2017-05-08 8:36 UTC (permalink / raw) To: David Woodhouse Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, Hans de Goede, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh On Mon, 8 May 2017, David Woodhouse wrote: > > I've got a problem with the underlying mechanism. How long does it take to > > erase a NAND block? A couple of milliseconds. That means that for an erase > > to be "weak" du to a power fail, the host CPU must issue an erase command, > > and then the power to the NAND must drop within those milliseconds. > > However, in most systems there will be a power monitor which will > > essentially reset the CPU as soon as the power starts dropping. So in > > practice, by the time the voltage is too low to successfully supply the > > NAND chip, the CPU has already been reset, hence, no reset command will > > have been given by the time NAND runs out of steam. > > > > Sure, with switchmode power supplies, we don't have those large capacitors > > in the power supply which can keep the power going for a second or more, > > but still, I would think that the power wouldn't die fast enough for this > > to be an issue. > > > Our empirical testing trumps your "can never happen" theory :) I'm sure it does. But what is the explanation then? Has anyone analyzed what is going on using an oscilloscope to verify relationship between erase command and supply voltage drop? /Ricard -- Ricard Wolf Wanderlöf ricardw(at)axis.com Axis Communications AB, Lund, Sweden www.axis.com Phone +46 46 272 2016 Fax +46 46 13 61 30 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 8:36 ` Ricard Wanderlof @ 2017-05-08 8:54 ` David Woodhouse 2017-05-08 9:06 ` Ricard Wanderlof 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2017-05-08 8:54 UTC (permalink / raw) To: Ricard Wanderlof Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, Hans de Goede, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 794 bytes --] On Mon, 2017-05-08 at 10:36 +0200, Ricard Wanderlof wrote: > On Mon, 8 May 2017, David Woodhouse wrote: > > Our empirical testing trumps your "can never happen" theory :) > > I'm sure it does. But what is the explanation then? Has anyone analyzed > what is going on using an oscilloscope to verify relationship between > erase command and supply voltage drop? Not that I'm aware of. Once we have reached the "it does happen and we have to cope" there was not a lot of point in working out *why* it happened. In fact, the only examples I *personally* remember were on NOR flash, which takes longer to erase. So it's vaguely possible that it doesn't happen on NAND. But really, it's not something we should be depending on and the software mechanisms have to remain in place. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 8:54 ` David Woodhouse @ 2017-05-08 9:06 ` Ricard Wanderlof 2017-05-08 9:09 ` Hans de Goede 2017-05-08 10:12 ` David Woodhouse 0 siblings, 2 replies; 28+ messages in thread From: Ricard Wanderlof @ 2017-05-08 9:06 UTC (permalink / raw) To: David Woodhouse Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, Hans de Goede, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh On Mon, 8 May 2017, David Woodhouse wrote: > > On Mon, 8 May 2017, David Woodhouse wrote: > > > Our empirical testing trumps your "can never happen" theory :) > > > > I'm sure it does. But what is the explanation then? Has anyone analyzed > > what is going on using an oscilloscope to verify relationship between > > erase command and supply voltage drop? > > Not that I'm aware of. Once we have reached the "it does happen and we > have to cope" there was not a lot of point in working out *why* it > happened. > > In fact, the only examples I *personally* remember were on NOR flash, > which takes longer to erase. So it's vaguely possible that it doesn't > happen on NAND. But really, it's not something we should be depending > on and the software mechanisms have to remain in place. My point is really that say that the problem is in fact not that the erase is cut short due to the power fail, but that the software issues a second command before the first erase command has completed, for instance, or some other situation. Then we'd have a concrete situation which we can resolve (i.e., fix the bug), rather than assuming that it's the hardware's fault and implement various software workarounds. On the other hand, making the software resilient to erase problems essentially makes the system more robust in any case, so it's not a bad thing of course. It's just that I've seen this "we're software guys, and it must be the hardware's fault" (and vice versa) enough times to cause a small warning bell to off here. /Ricard -- Ricard Wolf Wanderlöf ricardw(at)axis.com Axis Communications AB, Lund, Sweden www.axis.com Phone +46 46 272 2016 Fax +46 46 13 61 30 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 9:06 ` Ricard Wanderlof @ 2017-05-08 9:09 ` Hans de Goede 2017-05-08 10:13 ` David Woodhouse 2017-05-08 10:12 ` David Woodhouse 1 sibling, 1 reply; 28+ messages in thread From: Hans de Goede @ 2017-05-08 9:09 UTC (permalink / raw) To: Ricard Wanderlof, David Woodhouse Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh Hi, On 08-05-17 11:06, Ricard Wanderlof wrote: > > On Mon, 8 May 2017, David Woodhouse wrote: > >>> On Mon, 8 May 2017, David Woodhouse wrote: >>>> Our empirical testing trumps your "can never happen" theory :) >>> >>> I'm sure it does. But what is the explanation then? Has anyone analyzed >>> what is going on using an oscilloscope to verify relationship between >>> erase command and supply voltage drop? >> >> Not that I'm aware of. Once we have reached the "it does happen and we >> have to cope" there was not a lot of point in working out *why* it >> happened. >> >> In fact, the only examples I *personally* remember were on NOR flash, >> which takes longer to erase. So it's vaguely possible that it doesn't >> happen on NAND. But really, it's not something we should be depending >> on and the software mechanisms have to remain in place. > > My point is really that say that the problem is in fact not that the erase > is cut short due to the power fail, but that the software issues a second > command before the first erase command has completed, for instance, or > some other situation. Then we'd have a concrete situation which we can > resolve (i.e., fix the bug), rather than assuming that it's the hardware's > fault and implement various software workarounds. You're forgetting that the SSD itself (this thread is about SSDs) also has a major software component which is doing housekeeping all the time, so even if the main CPU gets reset the SSD's controller may still happily be erasing blocks. Regards, Hans ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 9:09 ` Hans de Goede @ 2017-05-08 10:13 ` David Woodhouse 2017-05-08 11:50 ` Boris Brezillon 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2017-05-08 10:13 UTC (permalink / raw) To: Hans de Goede, Ricard Wanderlof Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 425 bytes --] On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote: > You're forgetting that the SSD itself (this thread is about SSDs) also has > a major software component which is doing housekeeping all the time, so even > if the main CPU gets reset the SSD's controller may still happily be erasing > blocks. We're not really talking about SSDs at all any more; we're talking about real flash with real maintainable software. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 10:13 ` David Woodhouse @ 2017-05-08 11:50 ` Boris Brezillon 2017-05-08 15:40 ` David Woodhouse 2017-05-08 16:43 ` Pavel Machek 0 siblings, 2 replies; 28+ messages in thread From: Boris Brezillon @ 2017-05-08 11:50 UTC (permalink / raw) To: David Woodhouse Cc: Hans de Goede, Ricard Wanderlof, Pavel Machek, Tejun Heo, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh On Mon, 08 May 2017 11:13:10 +0100 David Woodhouse <dwmw2@infradead.org> wrote: > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote: > > You're forgetting that the SSD itself (this thread is about SSDs) also has > > a major software component which is doing housekeeping all the time, so even > > if the main CPU gets reset the SSD's controller may still happily be erasing > > blocks. > > We're not really talking about SSDs at all any more; we're talking > about real flash with real maintainable software. It's probably a good sign that this new discussion should take place in a different thread :-). ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 11:50 ` Boris Brezillon @ 2017-05-08 15:40 ` David Woodhouse 2017-05-08 21:36 ` Pavel Machek 2017-05-08 16:43 ` Pavel Machek 1 sibling, 1 reply; 28+ messages in thread From: David Woodhouse @ 2017-05-08 15:40 UTC (permalink / raw) To: Boris Brezillon Cc: Hans de Goede, Ricard Wanderlof, Pavel Machek, Tejun Heo, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 830 bytes --] On Mon, 2017-05-08 at 13:50 +0200, Boris Brezillon wrote: > On Mon, 08 May 2017 11:13:10 +0100 > David Woodhouse <dwmw2@infradead.org> wrote: > > > > > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote: > > > > > > You're forgetting that the SSD itself (this thread is about SSDs) also has > > > a major software component which is doing housekeeping all the time, so even > > > if the main CPU gets reset the SSD's controller may still happily be erasing > > > blocks. > > We're not really talking about SSDs at all any more; we're talking > > about real flash with real maintainable software. > > It's probably a good sign that this new discussion should take place in > a different thread :-). Well, maybe. But it was a silly thread in the first place. SATA SSDs aren't *expected* to be reliable. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 15:40 ` David Woodhouse @ 2017-05-08 21:36 ` Pavel Machek 0 siblings, 0 replies; 28+ messages in thread From: Pavel Machek @ 2017-05-08 21:36 UTC (permalink / raw) To: David Woodhouse Cc: Boris Brezillon, Hans de Goede, Ricard Wanderlof, Tejun Heo, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh On Mon 2017-05-08 16:40:11, David Woodhouse wrote: > On Mon, 2017-05-08 at 13:50 +0200, Boris Brezillon wrote: > > On Mon, 08 May 2017 11:13:10 +0100 > > David Woodhouse <dwmw2@infradead.org> wrote: > > > > > > > > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote: > > > > > > > > You're forgetting that the SSD itself (this thread is about SSDs) also has > > > > a major software component which is doing housekeeping all the time, so even > > > > if the main CPU gets reset the SSD's controller may still happily be erasing > > > > blocks. > > > We're not really talking about SSDs at all any more; we're talking > > > about real flash with real maintainable software. > > > > It's probably a good sign that this new discussion should take place in > > a different thread :-). > > Well, maybe. But it was a silly thread in the first place. SATA SSDs > aren't *expected* to be reliable. Citation needed? I'm pretty sure SATA SSDs are expected to be reliable, up to maximum amount of gigabytes written (specified by manufacturer), as long as you don't cut power without warning. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 11:50 ` Boris Brezillon 2017-05-08 15:40 ` David Woodhouse @ 2017-05-08 16:43 ` Pavel Machek 2017-05-08 17:43 ` Tejun Heo 2017-05-08 18:29 ` Atlant Schmidt 1 sibling, 2 replies; 28+ messages in thread From: Pavel Machek @ 2017-05-08 16:43 UTC (permalink / raw) To: Boris Brezillon Cc: David Woodhouse, Hans de Goede, Ricard Wanderlof, Tejun Heo, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 1227 bytes --] On Mon 2017-05-08 13:50:05, Boris Brezillon wrote: > On Mon, 08 May 2017 11:13:10 +0100 > David Woodhouse <dwmw2@infradead.org> wrote: > > > On Mon, 2017-05-08 at 11:09 +0200, Hans de Goede wrote: > > > You're forgetting that the SSD itself (this thread is about SSDs) also has > > > a major software component which is doing housekeeping all the time, so even > > > if the main CPU gets reset the SSD's controller may still happily be erasing > > > blocks. > > > > We're not really talking about SSDs at all any more; we're talking > > about real flash with real maintainable software. > > It's probably a good sign that this new discussion should take place in > a different thread :-). Well, you are right.. and I'm responsible. What I was trying to point out was that storage people try to treat SSDs as HDDs... and SSDs are very different. Harddrives mostly survive powerfails (with emergency parking), while it is very, very difficult to make SSD survive random powerfail, and we have to make sure we always powerdown SSDs "cleanly". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 16:43 ` Pavel Machek @ 2017-05-08 17:43 ` Tejun Heo 2017-05-08 18:56 ` Pavel Machek 2017-05-08 18:29 ` Atlant Schmidt 1 sibling, 1 reply; 28+ messages in thread From: Tejun Heo @ 2017-05-08 17:43 UTC (permalink / raw) To: Pavel Machek Cc: Boris Brezillon, David Woodhouse, Hans de Goede, Ricard Wanderlof, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh Hello, On Mon, May 08, 2017 at 06:43:22PM +0200, Pavel Machek wrote: > What I was trying to point out was that storage people try to treat > SSDs as HDDs... and SSDs are very different. Harddrives mostly survive > powerfails (with emergency parking), while it is very, very difficult > to make SSD survive random powerfail, and we have to make sure we > always powerdown SSDs "cleanly". We do. The issue raised is that some SSDs still increment the unexpected power loss count even after clean shutdown sequence and that the kernel should wait for some secs before powering off. We can do that for select devices but I want something more than "this SMART counter is getting incremented" before doing that. Thanks. -- tejun ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 17:43 ` Tejun Heo @ 2017-05-08 18:56 ` Pavel Machek 2017-05-08 19:04 ` Tejun Heo 0 siblings, 1 reply; 28+ messages in thread From: Pavel Machek @ 2017-05-08 18:56 UTC (permalink / raw) To: Tejun Heo Cc: Boris Brezillon, David Woodhouse, Hans de Goede, Ricard Wanderlof, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 1448 bytes --] On Mon 2017-05-08 13:43:03, Tejun Heo wrote: > Hello, > > On Mon, May 08, 2017 at 06:43:22PM +0200, Pavel Machek wrote: > > What I was trying to point out was that storage people try to treat > > SSDs as HDDs... and SSDs are very different. Harddrives mostly survive > > powerfails (with emergency parking), while it is very, very difficult > > to make SSD survive random powerfail, and we have to make sure we > > always powerdown SSDs "cleanly". > > We do. > > The issue raised is that some SSDs still increment the unexpected > power loss count even after clean shutdown sequence and that the > kernel should wait for some secs before powering off. > > We can do that for select devices but I want something more than "this > SMART counter is getting incremented" before doing that. Well... the SMART counter tells us that the device was not shut down correctly. Do we have reason to believe that it is _not_ telling us truth? It is more than one device. SSDs die when you power them without warning: http://lkcl.net/reports/ssd_analysis.html What kind of data would you like to see? "I have been using linux and my SSD died"? We have had such reports. "I have killed 10 SSDs in a week then I added one second delay, and this SSD survived 6 months"? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 18:56 ` Pavel Machek @ 2017-05-08 19:04 ` Tejun Heo 0 siblings, 0 replies; 28+ messages in thread From: Tejun Heo @ 2017-05-08 19:04 UTC (permalink / raw) To: Pavel Machek Cc: Boris Brezillon, David Woodhouse, Hans de Goede, Ricard Wanderlof, linux-scsi, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh Hello, On Mon, May 08, 2017 at 08:56:15PM +0200, Pavel Machek wrote: > Well... the SMART counter tells us that the device was not shut down > correctly. Do we have reason to believe that it is _not_ telling us > truth? It is more than one device. It also finished power off command successfully. > SSDs die when you power them without warning: > http://lkcl.net/reports/ssd_analysis.html > > What kind of data would you like to see? "I have been using linux and > my SSD died"? We have had such reports. "I have killed 10 SSDs in a > week then I added one second delay, and this SSD survived 6 months"? Repeating shutdown cycles and showing that the device actually is in trouble would be great. It doesn't have to reach full-on device failure. Showing some sign of corruption would be enough - increase in CRC failure counts, bad block counts (a lot of devices report remaining reserve or lifetime in one way or the other) and so on. Right now, it might as well be just the SMART counter being funky. Thanks. -- tejun ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: Race to power off harming SATA SSDs 2017-05-08 16:43 ` Pavel Machek 2017-05-08 17:43 ` Tejun Heo @ 2017-05-08 18:29 ` Atlant Schmidt 1 sibling, 0 replies; 28+ messages in thread From: Atlant Schmidt @ 2017-05-08 18:29 UTC (permalink / raw) To: Pavel Machek, Boris Brezillon Cc: Ricard Wanderlof, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org, Hans de Goede, linux-mtd@lists.infradead.org, Henrique de Moraes Holschuh, Tejun Heo, David Woodhouse > Well, you are right.. and I'm responsible. > > What I was trying to point out was that storage people try to treat SSDs as HDDs... > and SSDs are very different. Harddrives mostly survive powerfails (with emergency > parking), while it is very, very difficult to make SSD survive random powerfail, > and we have to make sure we always powerdown SSDs "cleanly". It all depends on the class of SSD that we're discussing. "Enterprise class" SSDs will often use either ultracapacitors or batteries to allow them to successfully complete all of the necessary operations upon a power cut. This e-mail and the information, including any attachments it contains, are intended to be a confidential communication only to the person or entity to whom it is addressed and may contain information that is privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender and destroy the original message. Thank you. Please consider the environment before printing this email. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 9:06 ` Ricard Wanderlof 2017-05-08 9:09 ` Hans de Goede @ 2017-05-08 10:12 ` David Woodhouse 1 sibling, 0 replies; 28+ messages in thread From: David Woodhouse @ 2017-05-08 10:12 UTC (permalink / raw) To: Ricard Wanderlof Cc: Pavel Machek, Tejun Heo, boris.brezillon, linux-scsi, Hans de Goede, linux-kernel, linux-ide, linux-mtd, Henrique de Moraes Holschuh [-- Attachment #1: Type: text/plain, Size: 1046 bytes --] On Mon, 2017-05-08 at 11:06 +0200, Ricard Wanderlof wrote: > > My point is really that say that the problem is in fact not that the erase > is cut short due to the power fail, but that the software issues a second > command before the first erase command has completed, for instance, or > some other situation. Then we'd have a concrete situation which we can > resolve (i.e., fix the bug), rather than assuming that it's the hardware's > fault and implement various software workarounds. On NOR flash we have *definitely* seen it during powerfail testing. A block looks like it's all 0xFF when you read it back on mount, but if you read it repeatedly, you may see bit flips because it wasn't completely erased. And even if you read it ten times and 'trust' that it's properly erased, it could start to show those bit flips when you start to program it. It was very repeatable, and that's when we implemented the 'clean markers' written after a successful erase, rather than trusting a block that "looks empty". [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 7:21 ` David Woodhouse 2017-05-08 7:38 ` Ricard Wanderlof @ 2017-05-08 9:28 ` Pavel Machek 2017-05-08 9:34 ` David Woodhouse 2017-05-08 9:51 ` Richard Weinberger 1 sibling, 2 replies; 28+ messages in thread From: Pavel Machek @ 2017-05-08 9:28 UTC (permalink / raw) To: David Woodhouse Cc: Tejun Heo, Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd On Mon 2017-05-08 08:21:34, David Woodhouse wrote: > On Sun, 2017-05-07 at 22:40 +0200, Pavel Machek wrote: > > > > NOTE: unclean SSD power-offs are dangerous and may brick the device in > > > > the worst case, or otherwise harm it (reduce longevity, damage flash > > > > blocks). It is also not impossible to get data corruption. > > > > > I get that the incrementing counters might not be pretty but I'm a bit > > > skeptical about this being an actual issue. Because if that were > > > true, the device would be bricking itself from any sort of power > > > losses be that an actual power loss, battery rundown or hard power off > > > after crash. > > > > And that's exactly what users see. If you do enough power fails on a > > SSD, you usually brick it, some die sooner than others. There was some > > test results published, some are here > > http://lkcl.net/reports/ssd_analysis.html, I believe I seen some > > others too. > > > > It is very hard for a NAND to work reliably in face of power > > failures. In fact, not even Linux MTD + UBIFS works well in that > > regards. See > > http://www.linux-mtd.infradead.org/faq/ubi.html. (Unfortunately, its > > down now?!). If we can't get it right, do you believe SSD manufactures > > do? > > > > [Issue is, if you powerdown during erase, you get "weakly erased" > > page, which will contain expected 0xff's, but you'll get bitflips > > there quickly. Similar issue exists for writes. It is solveable in > > software, just hard and slow... and we don't do it.] > > It's not that hard. We certainly do it in JFFS2. I was fairly sure that > it was also part of the design considerations for UBI — it really ought > to be right there too. I'm less sure about UBIFS but I would have > expected it to be OK. Are you sure you have it right in JFFS2? Do you journal block erases? Apparently, that was pretty much non-issue on older flashes. https://web-beta.archive.org/web/20160923094716/http://www.linux-mtd.infradead.org:80/doc/ubifs.html#L_unstable_bits > SSDs however are often crap; power fail those at your peril. And of > course there's nothing you can do when they do fail, whereas we accept > patches for things which are implemented in Linux. Agreed. If the SSD indiciates unexpected powerdown, it is a problem and we need to fix it. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 9:28 ` Pavel Machek @ 2017-05-08 9:34 ` David Woodhouse 2017-05-08 10:49 ` Pavel Machek 2017-05-08 9:51 ` Richard Weinberger 1 sibling, 1 reply; 28+ messages in thread From: David Woodhouse @ 2017-05-08 9:34 UTC (permalink / raw) To: Pavel Machek Cc: Tejun Heo, Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd [-- Attachment #1: Type: text/plain, Size: 809 bytes --] On Mon, 2017-05-08 at 11:28 +0200, Pavel Machek wrote: > > Are you sure you have it right in JFFS2? Do you journal block erases? > Apparently, that was pretty much non-issue on older flashes. It isn't necessary in JFFS2. It is a *purely* log-structured file system (which is why it doesn't scale well past the 1GiB or so that we made it handle for OLPC). So we don't erase a block until all its contents are obsolete. And if we fail to complete the erase... well the contents are either going to fail a CRC check, or... still be obsoleted by later entries elsewhere. And even if it *looks* like an erase has completed and the block is all 0xFF, we erase it again and write a 'clean marker' to it to indicate that the erase was completed successfully. Because otherwise it can't be trusted. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 9:34 ` David Woodhouse @ 2017-05-08 10:49 ` Pavel Machek 2017-05-08 11:06 ` Richard Weinberger 2017-05-08 11:09 ` David Woodhouse 0 siblings, 2 replies; 28+ messages in thread From: Pavel Machek @ 2017-05-08 10:49 UTC (permalink / raw) To: David Woodhouse Cc: Tejun Heo, Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd On Mon 2017-05-08 10:34:08, David Woodhouse wrote: > On Mon, 2017-05-08 at 11:28 +0200, Pavel Machek wrote: > > > > Are you sure you have it right in JFFS2? Do you journal block erases? > > Apparently, that was pretty much non-issue on older flashes. > > It isn't necessary in JFFS2. It is a *purely* log-structured file > system (which is why it doesn't scale well past the 1GiB or so that we > made it handle for OLPC). > > So we don't erase a block until all its contents are obsolete. And if > we fail to complete the erase... well the contents are either going to > fail a CRC check, or... still be obsoleted by later entries elsewhere. > > And even if it *looks* like an erase has completed and the block is all > 0xFF, we erase it again and write a 'clean marker' to it to indicate > that the erase was completed successfully. Because otherwise it can't > be trusted. Aha, nice, so it looks like ubifs is a step back here. 'clean marker' is a good idea... empty pages have plenty of space. How do you handle the issue during regular write? Always ignore last successfully written block? Do you handle "paired pages" problem on MLC? Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 10:49 ` Pavel Machek @ 2017-05-08 11:06 ` Richard Weinberger 2017-05-08 11:48 ` Boris Brezillon 2017-05-08 11:09 ` David Woodhouse 1 sibling, 1 reply; 28+ messages in thread From: Richard Weinberger @ 2017-05-08 11:06 UTC (permalink / raw) To: Pavel Machek Cc: David Woodhouse, Boris Brezillon, linux-scsi@vger.kernel.org, Hans de Goede, LKML, linux-ide, linux-mtd@lists.infradead.org, Henrique de Moraes Holschuh, Tejun Heo On Mon, May 8, 2017 at 12:49 PM, Pavel Machek <pavel@ucw.cz> wrote: > Aha, nice, so it looks like ubifs is a step back here. > > 'clean marker' is a good idea... empty pages have plenty of space. If UBI (not UBIFS) faces an empty block, it also re-erases it. The EC header is uses as clean marker. > How do you handle the issue during regular write? Always ignore last > successfully written block? The last page of a block is inspected and allowed to be corrupted. > Do you handle "paired pages" problem on MLC? Nope, no MLC support in mainline so far. -- Thanks, //richard ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 11:06 ` Richard Weinberger @ 2017-05-08 11:48 ` Boris Brezillon 2017-05-08 11:55 ` Boris Brezillon 2017-05-08 12:13 ` Richard Weinberger 0 siblings, 2 replies; 28+ messages in thread From: Boris Brezillon @ 2017-05-08 11:48 UTC (permalink / raw) To: Richard Weinberger Cc: Pavel Machek, David Woodhouse, linux-scsi@vger.kernel.org, Hans de Goede, LKML, linux-ide, linux-mtd@lists.infradead.org, Henrique de Moraes Holschuh, Tejun Heo On Mon, 8 May 2017 13:06:17 +0200 Richard Weinberger <richard.weinberger@gmail.com> wrote: > On Mon, May 8, 2017 at 12:49 PM, Pavel Machek <pavel@ucw.cz> wrote: > > Aha, nice, so it looks like ubifs is a step back here. > > > > 'clean marker' is a good idea... empty pages have plenty of space. > > If UBI (not UBIFS) faces an empty block, it also re-erases it. Unfortunately, that's not the case, though UBI can easily be patched to do that (see below). > The EC header is uses as clean marker. That is true. If the EC header has been written to a block, that means this block has been correctly erased. > > > How do you handle the issue during regular write? Always ignore last > > successfully written block? I guess UBIFS can know what was written last, because of the log-based approach + the seqnum stored along with FS nodes, but I'm pretty sure UBIFS does not re-write the last written block in case of an unclean mount. Richard, am I wrong? > > The last page of a block is inspected and allowed to be corrupted. Actually, it's not really about corrupted pages, it's about pages that might become unreadable after a few reads. > > > Do you handle "paired pages" problem on MLC? > > Nope, no MLC support in mainline so far. Richard and I have put a lot of effort to reliably support MLC NANDs in mainline, unfortunately this projects has been paused. You can access the last version of our work here [1] if you're interested (it's clearly not in a shippable state ;-)). [1]https://github.com/bbrezillon/linux-sunxi/commits/bb/4.7/ubi-mlc --->8--- diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c index 93ceea4f27d5..3d76941c9570 100644 --- a/drivers/mtd/ubi/attach.c +++ b/drivers/mtd/ubi/attach.c @@ -1121,21 +1121,20 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai, return err; goto adjust_mean_ec; case UBI_IO_FF_BITFLIPS: + case UBI_IO_FF: + /* + * Always erase the block if the EC header is empty, even if + * no bitflips were reported because otherwise we might + * expose ourselves to the 'unstable bits' issue described + * here: + * + * http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits + */ err = add_to_list(ai, pnum, UBI_UNKNOWN, UBI_UNKNOWN, ec, 1, &ai->erase); if (err) return err; goto adjust_mean_ec; - case UBI_IO_FF: - if (ec_err || bitflips) - err = add_to_list(ai, pnum, UBI_UNKNOWN, - UBI_UNKNOWN, ec, 1, &ai->erase); - else - err = add_to_list(ai, pnum, UBI_UNKNOWN, - UBI_UNKNOWN, ec, 0, &ai->free); - if (err) - return err; - goto adjust_mean_ec; default: ubi_err(ubi, "'ubi_io_read_vid_hdr()' returned unknown code %d", err); ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 11:48 ` Boris Brezillon @ 2017-05-08 11:55 ` Boris Brezillon 2017-05-08 12:13 ` Richard Weinberger 1 sibling, 0 replies; 28+ messages in thread From: Boris Brezillon @ 2017-05-08 11:55 UTC (permalink / raw) To: Richard Weinberger Cc: Pavel Machek, David Woodhouse, linux-scsi@vger.kernel.org, Hans de Goede, LKML, linux-ide, linux-mtd@lists.infradead.org, Henrique de Moraes Holschuh, Tejun Heo On Mon, 8 May 2017 13:48:07 +0200 Boris Brezillon <boris.brezillon@free-electrons.com> wrote: > On Mon, 8 May 2017 13:06:17 +0200 > Richard Weinberger <richard.weinberger@gmail.com> wrote: > > > On Mon, May 8, 2017 at 12:49 PM, Pavel Machek <pavel@ucw.cz> wrote: > > > Aha, nice, so it looks like ubifs is a step back here. > > > > > > 'clean marker' is a good idea... empty pages have plenty of space. > > > > If UBI (not UBIFS) faces an empty block, it also re-erases it. > > Unfortunately, that's not the case, though UBI can easily be patched > to do that (see below). Sorry for the noise, I was wrong, UBI already re-erases empty blocks [1]. [1]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/ubi/attach.c#L983 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 11:48 ` Boris Brezillon 2017-05-08 11:55 ` Boris Brezillon @ 2017-05-08 12:13 ` Richard Weinberger 1 sibling, 0 replies; 28+ messages in thread From: Richard Weinberger @ 2017-05-08 12:13 UTC (permalink / raw) To: Boris Brezillon Cc: Pavel Machek, David Woodhouse, linux-scsi@vger.kernel.org, Hans de Goede, LKML, linux-ide, linux-mtd@lists.infradead.org, Henrique de Moraes Holschuh, Tejun Heo Boris, Am 08.05.2017 um 13:48 schrieb Boris Brezillon: >>> How do you handle the issue during regular write? Always ignore last >>> successfully written block? > > I guess UBIFS can know what was written last, because of the log-based > approach + the seqnum stored along with FS nodes, but I'm pretty sure > UBIFS does not re-write the last written block in case of an unclean > mount. Richard, am I wrong? Yes. UBIFS has the machinery but uses it differently. When it faces ECC errors while replying the journal it can recover good data from the LEB. It assumes that an interrupted write leads always to ECC errors. >> >> The last page of a block is inspected and allowed to be corrupted. > > Actually, it's not really about corrupted pages, it's about pages that > might become unreadable after a few reads. As stated before, it assumes an ECC error from an interrupted read. We could automatically re-write everything in UBIFS that was written last but we don't have this information for data UBI itself wrote since UBI has no journal. If unstable bit can be triggered with current systems we can think of a clever trick to deal with that. So far nobody was able to show me the problem. Thanks, //richard ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 10:49 ` Pavel Machek 2017-05-08 11:06 ` Richard Weinberger @ 2017-05-08 11:09 ` David Woodhouse 2017-05-08 12:32 ` Pavel Machek 1 sibling, 1 reply; 28+ messages in thread From: David Woodhouse @ 2017-05-08 11:09 UTC (permalink / raw) To: Pavel Machek Cc: Tejun Heo, Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd [-- Attachment #1: Type: text/plain, Size: 2072 bytes --] On Mon, 2017-05-08 at 12:49 +0200, Pavel Machek wrote: > On Mon 2017-05-08 10:34:08, David Woodhouse wrote: > > > > On Mon, 2017-05-08 at 11:28 +0200, Pavel Machek wrote: > > > > > > > > > Are you sure you have it right in JFFS2? Do you journal block erases? > > > Apparently, that was pretty much non-issue on older flashes. > > It isn't necessary in JFFS2. It is a *purely* log-structured file > > system (which is why it doesn't scale well past the 1GiB or so that we > > made it handle for OLPC). > > > > So we don't erase a block until all its contents are obsolete. And if > > we fail to complete the erase... well the contents are either going to > > fail a CRC check, or... still be obsoleted by later entries elsewhere. > > > > And even if it *looks* like an erase has completed and the block is all > > 0xFF, we erase it again and write a 'clean marker' to it to indicate > > that the erase was completed successfully. Because otherwise it can't > > be trusted. > Aha, nice, so it looks like ubifs is a step back here. > > 'clean marker' is a good idea... empty pages have plenty of space. Well... you lose that space permanently. Although I suppose you could do things differently and erase a block immediately prior to using it. But in that case why ever write the cleanmarker? Just maintain a set of blocks that you *will* erase and re-use. > How do you handle the issue during regular write? Always ignore last > successfully written block? Log nodes have a CRC. If you get interrupted during a write, that CRC should fail. > Do you handle "paired pages" problem on MLC? No. It would theoretically be possible, by not considering a write to the first page "committed" until the second page of the pair is also written. Essentially, it's not far off expanding the existing 'wbuf' which we use to gather writes into full pages for NAND, to cover the *whole* of the set of pages which are affected by MLC. But we mostly consider JFFS2 to be obsolete these days, in favour of UBI/UBIFS or other approaches. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4938 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 11:09 ` David Woodhouse @ 2017-05-08 12:32 ` Pavel Machek 0 siblings, 0 replies; 28+ messages in thread From: Pavel Machek @ 2017-05-08 12:32 UTC (permalink / raw) To: David Woodhouse Cc: Tejun Heo, Henrique de Moraes Holschuh, linux-kernel, linux-scsi, linux-ide, Hans de Goede, boris.brezillon, linux-mtd Hi! > > 'clean marker' is a good idea... empty pages have plenty of space. > > Well... you lose that space permanently. Although I suppose you could > do things differently and erase a block immediately prior to using it. > But in that case why ever write the cleanmarker? Just maintain a set of > blocks that you *will* erase and re-use. Yes, but erase is slow so that would hurt performance...? > > How do you handle the issue during regular write? Always ignore last > > successfully written block? > > Log nodes have a CRC. If you get interrupted during a write, that CRC > should fail. Umm. That is not what "unstable bits" issue is about, right? If you are interrupted during write, you can get into state where readback will be correct on next boot (CRC, ECC ok), but then the bits will go back few hours after that. You can't rely on checksums to detect that.. because the bits will have the right values -- for a while. > > Do you handle "paired pages" problem on MLC? > > No. It would theoretically be possible, by not considering a write to > the first page "committed" until the second page of the pair is also > written. Essentially, it's not far off expanding the existing 'wbuf' > which we use to gather writes into full pages for NAND, to cover the > *whole* of the set of pages which are affected by MLC. > > But we mostly consider JFFS2 to be obsolete these days, in favour of > UBI/UBIFS or other approaches. Yes, I guess MLC NAND chips are mostly too big for jjfs2. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Race to power off harming SATA SSDs 2017-05-08 9:28 ` Pavel Machek 2017-05-08 9:34 ` David Woodhouse @ 2017-05-08 9:51 ` Richard Weinberger 1 sibling, 0 replies; 28+ messages in thread From: Richard Weinberger @ 2017-05-08 9:51 UTC (permalink / raw) To: Pavel Machek Cc: David Woodhouse, Tejun Heo, Henrique de Moraes Holschuh, LKML, linux-scsi@vger.kernel.org, linux-ide, Hans de Goede, Boris Brezillon, linux-mtd@lists.infradead.org Pavel, On Mon, May 8, 2017 at 11:28 AM, Pavel Machek <pavel@ucw.cz> wrote: > Are you sure you have it right in JFFS2? Do you journal block erases? > Apparently, that was pretty much non-issue on older flashes. This is what the website says, yes. Do you have hardware where you can trigger it? If so, I'd love to get access to it. So far I never saw the issue, sometimes people claim to suffer from it but when I inspect the problems in detail it is always something else. -- Thanks, //richard ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2017-05-08 21:36 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20170410232118.GA4816@khazad-dum.debian.net>
[not found] ` <20170410235206.GA28603@wtj.duckdns.org>
2017-05-07 20:40 ` Race to power off harming SATA SSDs Pavel Machek
2017-05-08 7:21 ` David Woodhouse
2017-05-08 7:38 ` Ricard Wanderlof
2017-05-08 8:13 ` David Woodhouse
2017-05-08 8:36 ` Ricard Wanderlof
2017-05-08 8:54 ` David Woodhouse
2017-05-08 9:06 ` Ricard Wanderlof
2017-05-08 9:09 ` Hans de Goede
2017-05-08 10:13 ` David Woodhouse
2017-05-08 11:50 ` Boris Brezillon
2017-05-08 15:40 ` David Woodhouse
2017-05-08 21:36 ` Pavel Machek
2017-05-08 16:43 ` Pavel Machek
2017-05-08 17:43 ` Tejun Heo
2017-05-08 18:56 ` Pavel Machek
2017-05-08 19:04 ` Tejun Heo
2017-05-08 18:29 ` Atlant Schmidt
2017-05-08 10:12 ` David Woodhouse
2017-05-08 9:28 ` Pavel Machek
2017-05-08 9:34 ` David Woodhouse
2017-05-08 10:49 ` Pavel Machek
2017-05-08 11:06 ` Richard Weinberger
2017-05-08 11:48 ` Boris Brezillon
2017-05-08 11:55 ` Boris Brezillon
2017-05-08 12:13 ` Richard Weinberger
2017-05-08 11:09 ` David Woodhouse
2017-05-08 12:32 ` Pavel Machek
2017-05-08 9:51 ` Richard Weinberger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox