* Re: Memory replacement [not found] ` <320717C0-7117-462E-9227-7966EE6941D7@laptop.org> @ 2011-03-12 22:51 ` Arnd Bergmann 2011-03-13 1:01 ` C. Scott Ananian 0 siblings, 1 reply; 15+ messages in thread From: Arnd Bergmann @ 2011-03-12 22:51 UTC (permalink / raw) To: John Watlington; +Cc: devel, Kevin Gordon, linux-mmc On Friday 11 March 2011 18:28:49 John Watlington wrote: > > On Mar 11, 2011, at 5:35 AM, Arnd Bergmann wrote: > > > I've tested around a dozen media from them, and while you are right > > that they use rather different algorithms and NAND chips inside, all > > of them can write to at least 5 erase blocks before getting into > > garbage collection, which is really needed for ext3 file systems. > > > > Contrast this with Kingston cards, which all use the same algorithm > > and can only write data linearly to one erase block at a time, resulting > > in one or two orders of magnitude higher internal write amplification. > > > > Most other vendors are somewhere in between, and you sometimes get > > fake cards that don't do what you expect, such as a bunch of Samsung > > microSDHC cards that I have which are labeled Sandisk on the outside. > > Those aren't fakes. That is what I'm trying to get across. I've had four cards with a Sandisk label that had unusual characteristics and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") and one unknown ("BE", possibly Lexar). In all cases, Sandisk support has confirmed from photos that the cards are definitely fake. They also explained that all authentic cards (possibly fake ones as well, but I have not seen them) will be labeled "Made in China", not "Made in Korea" or "Made in Taiwan" as my fake ones are, and that authentic microSD cards have the serial number on the front side, not on the back. > > I've also seen some really cheap noname cards outperform similarly spec'd > > Sandisk cards, both regarding maximum throughput and the garbage collection > > algorithms, but you can't rely on that. 
> > > My point is that you can't rely on Sandisk either. > > I've been in discussion with both Sandisk and Adata about these issues, > as well as constantly testing batches of new SD cards from all major > vendors. > > Unless you pay a lot extra and order at least 100K, you have no > control over what they give you. They don't just change NAND chips, > they change the controller chip and its firmware. Frequently. > And they don't update either the SKU number, part marking or the > identification fields available to software. The manufacturing batch > number printed on the outside is the only thing that changes. I agree that you cannot rely on specific behavior to stay the same with any vendor. One thing I noticed for instance is that many new Sandisk cards are using TLC (triple-level cell) NAND, which is inherently slower and cheaper than the regular two-bit-per-cell MLC used in older cards or those from other vendors. However, they have apparently managed to make them work well for random access by using some erase blocks as SLC (writing only the pages that carry the most significant bit in each cell) and by doing log-structured writes in there, something that apparently others have not figured out yet. Also, as I mentioned, they consistently use a relatively large number of open erase blocks. I've measured both effects on SD cards and USB sticks. I believe you can get this level of sophistication only from companies that make the NAND flash, the controller and the card: Sandisk, Samsung and Toshiba. Other brands that just get the controllers and the flash chips from whoever sells them cheaply (Kingston, Adata, Panasonic, Transcend, ...) apparently don't get the really good stuff. > How we deal with this is constant testing and getting notification from > the manufacturer that they are changing the internals (unfortunately, > we aren't willing to pay the premium to have a special SKU). Do you have test results somewhere publicly available? 
We are currently discussing adding some tweaks to the linux mmc drivers to detect cards with certain features, and to do some optimizations in the block layer for common ones. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-12 22:51 ` Memory replacement Arnd Bergmann @ 2011-03-13 1:01 ` C. Scott Ananian 2011-03-13 12:57 ` Andrei Warkentin 2011-03-13 17:21 ` Arnd Bergmann 0 siblings, 2 replies; 15+ messages in thread From: C. Scott Ananian @ 2011-03-13 1:01 UTC (permalink / raw) To: Arnd Bergmann; +Cc: John Watlington, Kevin Gordon, devel, linux-mmc On Sat, Mar 12, 2011 at 5:51 PM, Arnd Bergmann <arnd@arndb.de> wrote: > I've had four cards with a Sandisk label that had unusual characteristics > and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") > and one unknown ("BE", possibly lexar). In all cases, the Sandisk support > has confirmed from photos that the cards are definitely fake. They also Please see the blog post I cited in the email immediately prior to yours, which discusses this situation precisely. Often the cards are not actually "fake" -- they may even be produced on the exact same equipment as the usual cards, but "off the books" during hours when the factory is officially closed. This sort of thing is very very widespread, and fakes can come even via official distribution channels. (Discussed in bunnie's post.) > However, they have apparently managed to make them work well > for random access by using some erase blocks as SLC (writing only > the pages that carry the most significant bit in each cell) and > by doing log structured writes in there, something that apparently > others have not figured out yet. Also, as I mentioned, they > consistenly use a relatively large number of open erase blocks. > I've measured both effects on SD cards and USB sticks. You've been lucky. > I believe you can get this level of sophistication only from > companies that make the nand flash, the controller and the card: > Sandisk, Samsung and Toshiba. > Other brands that just get the controllers and the flash chips > from whoever sells them cheaply (kingston, adata, panasonic, > transcend, ...) apparently don't get the really good stuff. 
You're giving the OEMs too much credit. As John says, unless you arrange for a special SKU, even the "first source" companies will give you whatever they've got cheap that day. >> How we deal with this is constant testing and getting notification from >> the manufacturer that they are changing the internals (unfortunately, >> we aren't willing to pay the premium to have a special SKU). > > Do you have test results somewhere publically available? We are currently > discussing adding some tweaks to the linux mmc drivers to detect cards > with certain features, and to do some optimizations in the block layer > for common ones. http://wiki.laptop.org/go/NAND_Testing But the testing wad is talking about is really *on the factory floor*: Regular sampling of chips as they come into the factory to ensure that the chips *you are actually about to put into the XOs* are consistent. Relying on manufacturing data reported by the chips is not reliable. --scott -- ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 1:01 ` C. Scott Ananian @ 2011-03-13 12:57 ` Andrei Warkentin 2011-03-13 17:00 ` C. Scott Ananian 2011-03-13 17:21 ` Arnd Bergmann 1 sibling, 1 reply; 15+ messages in thread From: Andrei Warkentin @ 2011-03-13 12:57 UTC (permalink / raw) To: C. Scott Ananian Cc: Arnd Bergmann, John Watlington, Kevin Gordon, devel, linux-mmc On Sat, Mar 12, 2011 at 7:01 PM, C. Scott Ananian <cscott@laptop.org> wrote: > On Sat, Mar 12, 2011 at 5:51 PM, Arnd Bergmann <arnd@arndb.de> wrote: >> I've had four cards with a Sandisk label that had unusual characteristics >> and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") >> and one unknown ("BE", possibly lexar). In all cases, the Sandisk support >> has confirmed from photos that the cards are definitely fake. They also > > Please see the blog post I cited in the email immediately prior to > yours, which discusses this situation precisely. Often the cards are > not actually "fake" -- they may even be produced on the exact same > equipment as the usual cards, but "off the books" during hours when > the factory is officially closed. This sort of thing is very very > widespread, and fakes can come even via official distribution > channels. (Discussed in bunnie's post.) > >> However, they have apparently managed to make them work well >> for random access by using some erase blocks as SLC (writing only >> the pages that carry the most significant bit in each cell) and >> by doing log structured writes in there, something that apparently >> others have not figured out yet. Also, as I mentioned, they >> consistenly use a relatively large number of open erase blocks. >> I've measured both effects on SD cards and USB sticks. > > You've been lucky. > >> I believe you can get this level of sophistication only from >> companies that make the nand flash, the controller and the card: >> Sandisk, Samsung and Toshiba. 
>> Other brands that just get the controllers and the flash chips >> from whoever sells them cheaply (kingston, adata, panasonic, >> transcend, ...) apparently don't get the really good stuff. > > You're giving the OEMs too much credit. As John says, unless you > arrange for a special SKU, even the "first source" companies will give > you whatever they've got cheap that day. > >>> How we deal with this is constant testing and getting notification from >>> the manufacturer that they are changing the internals (unfortunately, >>> we aren't willing to pay the premium to have a special SKU). >> >> Do you have test results somewhere publically available? We are currently >> discussing adding some tweaks to the linux mmc drivers to detect cards >> with certain features, and to do some optimizations in the block layer >> for common ones. > > http://wiki.laptop.org/go/NAND_Testing > > But the testing wad is talking about is really *on the factory floor*: > Regular sampling of chips as they come into the factory to ensure > that the chips *you are actually about to put into the XOs* are > consistent. Relying on manufacturing data reported by the chips is > not reliable. > --scott Sorry to butt in, I think I'm missing most of the context here....nevertheless... I'm curious, ignoring outer packaging and product names, if you look at cards with the "same" CID (i.e. same manfid/oemid/date/firmware and hw rev), do you get same performance characteristics? Anyway, if you're curious about optimizing performance for certain cards, I'm curious to see your results, your tests and (if any) vendor recommendations. I'm collecting data and trying to re-validate some of the vendor suggestions for Toshiba eMMCs... in particular - splitting unaligned writes into an unaligned and aligned part. The only thing I can say now is that the more data I collect the less it makes sense :-). I'm resubmitting a change to MMC layer that allows creating block MMC quirks... 
Skipping the actual quirks as I'm trying to revalidate data taken by others and provide data I'm confident about, but you might be interested in the overall quirks support if you're thinking about adding your own. A ^ permalink raw reply [flat|nested] 15+ messages in thread
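[Editorial note: the unaligned-write splitting Andrei mentions for Toshiba eMMCs can be sketched as below. This is an illustrative model of the policy (split the request at the next alignment boundary), not the actual MMC-layer patch; the function name and pure-Python form are hypothetical.]

```python
def split_unaligned(offset, length, align):
    """Split a write request (offset, length) into an unaligned head
    reaching up to the next align boundary, then the aligned remainder."""
    parts = []
    head = (-offset) % align        # distance to the next boundary
    if head:
        head = min(head, length)    # request may end before the boundary
        parts.append((offset, head))
        offset += head
        length -= head
    if length:
        parts.append((offset, length))
    return parts

# e.g. a 20-sector write starting at sector 5, with 8-sector alignment:
print(split_unaligned(5, 20, 8))    # [(5, 3), (8, 17)]
```

The idea behind the vendor suggestion is that the aligned part can then be written without forcing the controller to do a read-modify-write of a whole page or allocation unit.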
* Re: Memory replacement 2011-03-13 12:57 ` Andrei Warkentin @ 2011-03-13 17:00 ` C. Scott Ananian 2011-03-13 17:06 ` C. Scott Ananian 0 siblings, 1 reply; 15+ messages in thread From: C. Scott Ananian @ 2011-03-13 17:00 UTC (permalink / raw) To: Andrei Warkentin Cc: Arnd Bergmann, John Watlington, Kevin Gordon, devel, linux-mmc On Sun, Mar 13, 2011 at 8:57 AM, Andrei Warkentin <andreiw@motorola.com> wrote: > Sorry to butt in, I think I'm missing most of the context > here....nevertheless... I'm curious, ignoring outer packaging and > product names, if you look at cards with the "same" CID (i.e. same > manfid/oemid/date/firmware and hw rev), do you get same performance > characteristics? No. --scott -- ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 17:00 ` C. Scott Ananian @ 2011-03-13 17:06 ` C. Scott Ananian 0 siblings, 0 replies; 15+ messages in thread From: C. Scott Ananian @ 2011-03-13 17:06 UTC (permalink / raw) To: Andrei Warkentin Cc: Arnd Bergmann, John Watlington, Kevin Gordon, devel, linux-mmc On Sun, Mar 13, 2011 at 1:00 PM, C. Scott Ananian <cscott@laptop.org> wrote: > On Sun, Mar 13, 2011 at 8:57 AM, Andrei Warkentin <andreiw@motorola.com> wrote: >> Sorry to butt in, I think I'm missing most of the context >> here....nevertheless... I'm curious, ignoring outer packaging and >> product names, if you look at cards with the "same" CID (i.e. same >> manfid/oemid/date/firmware and hw rev), do you get same performance >> characteristics? > > No. To elaborate: see bunnie's blog post (cited above) on how the CID is often forged or wrong. I've also personally witnessed a manufacturer's rep come to the factory floor to reprogram a compact flash card's internal microcontroller with new firmware. This did not update any externally visible information reported by the chip. I had to convince the manufacturer to leave their proprietary hardware on the factory floor in order to be able to verify that future units would have the correct firmware. (Granted, this was not an MMC unit, but I would be surprised if MMC vendors were significantly different in this regard.) If you've spent any time working with Chinese/Taiwanese OEMs, you will notice that version control methodologies are (in general) disappointingly lax. --scott -- ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 15+ messages in thread
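[Editorial note: for anyone who wants to inspect the identity fields being discussed, Linux exposes the raw 128-bit CID register via sysfs (e.g. /sys/block/mmcblk0/device/cid), and the fields decode per the SD card specification. A rough decoder sketch follows; the sample CID is fabricated for illustration, not from a real card, and as the thread points out these fields are themselves often forged.]

```python
def parse_sd_cid(cid_hex):
    """Decode the 128-bit SD card CID register (32 hex digits, big-endian)."""
    raw = bytes.fromhex(cid_hex)
    mdt = ((raw[13] & 0x0F) << 8) | raw[14]    # manufacturing date, bits 19:8
    return {
        'manfid': raw[0],                               # MID, bits 127:120
        'oemid': raw[1:3].decode('ascii', 'replace'),   # OID, 2 ASCII chars
        'name': raw[3:8].decode('ascii', 'replace'),    # PNM, 5 ASCII chars
        'rev': f"{raw[8] >> 4}.{raw[8] & 0xF}",         # PRV, BCD major.minor
        'serial': int.from_bytes(raw[9:13], 'big'),     # PSN
        'date': f"{2000 + (mdt >> 4)}-{mdt & 0xF:02d}", # year-month
    }

# fabricated example: manfid 0x41, OEM id "42", product name "SD04G"
print(parse_sd_cid("41343253443034478000001234011200"))
```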
* Re: Memory replacement 2011-03-13 1:01 ` C. Scott Ananian 2011-03-13 12:57 ` Andrei Warkentin @ 2011-03-13 17:21 ` Arnd Bergmann 2011-03-13 21:31 ` Richard A. Smith 1 sibling, 1 reply; 15+ messages in thread From: Arnd Bergmann @ 2011-03-13 17:21 UTC (permalink / raw) To: C. Scott Ananian; +Cc: John Watlington, Kevin Gordon, devel, linux-mmc On Sunday 13 March 2011 02:01:22 C. Scott Ananian wrote: > On Sat, Mar 12, 2011 at 5:51 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > I've had four cards with a Sandisk label that had unusual characteristics > > and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") > > and one unknown ("BE", possibly lexar). In all cases, the Sandisk support > > has confirmed from photos that the cards are definitely fake. They also > > Please see the blog post I cited in the email immediately prior to > yours, which discusses this situation precisely. Often the cards are > not actually "fake" -- they may even be produced on the exact same > equipment as the usual cards, but "off the books" during hours when > the factory is officially closed. This sort of thing is very very > widespread, and fakes can come even via official distribution > channels. (Discussed in bunnie's post.) I am very familiar with bunnie's research, and have referenced it from my own page on the linaro wiki. I have also found Kingston cards with the exact same symptoms that triggered his original interest (very slow, manfid 0x41, oemid "42", low serial number). Another interesting case of a fake card I found had a Sandisk label and "LEXAR" in its MMC name field. Moreover, it actually contained copyrighted software that Lexar ships in their real cards. So what I'd assume is happening here is that the factory that produces the cards or Lexar had a graveyard shift where they were just printing Sandisk labels on the cards. > You're giving the OEMs too much credit. 
As John says, unless you > arrange for a special SKU, even the "first source" companies will give > you whatever they've got cheap that day. It's pretty clear that they are moving to cheaper NAND chips when possible, and I also mentioned that. For the controller chips, I don't understand how they would save money by buying them on the spot market. On the contrary, using the smart controllers that Sandisk themselves make allows them to use even slower NAND chips and still qualify for a better nominal speed grade, while companies that don't have access to decent controllers need to either use chips that are fast enough to make up for the bad GC algorithm or lie about their speed grades. > >> How we deal with this is constant testing and getting notification from > >> the manufacturer that they are changing the internals (unfortunately, > >> we aren't willing to pay the premium to have a special SKU). > > > > Do you have test results somewhere publicly available? We are currently > > discussing adding some tweaks to the linux mmc drivers to detect cards > > with certain features, and to do some optimizations in the block layer > > for common ones. > > http://wiki.laptop.org/go/NAND_Testing Ok, so the "testing" essentially means you create an ext2/3/4 file system and run tests on the file system until the card wears out, right? It does seem a bit crude, because many cards are not really suitable for this kind of file system when their wear leveling is purely optimized for the accesses defined in the SD card file system specification. If you did this on e.g. a typical Kingston card, it can have a write amplification 100 times higher than normal (FAT32, nilfs2, ...), so it gets painfully slow and wears out very quickly. I had hoped that someone already correlated the GC algorithms with the requirements of specific file systems to allow a more systematic approach. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 17:21 ` Arnd Bergmann @ 2011-03-13 21:31 ` Richard A. Smith 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 17:32 ` Arnd Bergmann 0 siblings, 2 replies; 15+ messages in thread From: Richard A. Smith @ 2011-03-13 21:31 UTC (permalink / raw) To: Arnd Bergmann; +Cc: C. Scott Ananian, Kevin Gordon, devel, linux-mmc On 03/13/2011 01:21 PM, Arnd Bergmann wrote: >>> Do you have test results somewhere publicly available? We are currently >>> discussing adding some tweaks to the linux mmc drivers to detect cards >>> with certain features, and to do some optimizations in the block layer >>> for common ones. >> >> http://wiki.laptop.org/go/NAND_Testing > > Ok, so the "testing" essentially means you create an ext2/3/4 file system > and run tests on the file system until the card wears out, right? The qualifying test is that the card must pass 3TB of writes with no errors. We run that on samples from the various mfg's. There's a 2nd round of test(s) that runs during the manufacturing and burn-in phases. One is a simple firmware test to see if you can talk to the card at all and then one runs at burn in. It doesn't have a minimum write size criterion but during the run there must not be any bit errors. > It does seem a bit crude, because many cards are not really suitable > for this kind of file system when their wear leveling is purely optimized > to the accesses defined in the sd card file system specification. > > If you did this on e.g. a typical Kingston card, it can have a write > amplification 100 times higher than normal (FAT32, nilfs2, ...), so > it gets painfully slow and wears out very quickly. Crude as they are they have been useful tests for us. Our top criterion is reliability. We want to ship the machines with an SD card that's going to last for the 5 year design life using the filesystem we ship. We tried to create an access pattern that was the worst possible and put the highest stress on the wear leveling system. 
If a card passes the 3TB abuse test then we are pretty certain it's going to meet that goal. There were many cards that died very quickly. The tests have also helped expose other issues with things like sudden power off. In one case an SPO during a write would corrupt the card so badly it became useless. You could only recover them via a super secret tool from the manufacturer. > I had hoped that someone already correlated the GC algorithms with > the requirements of specific file systems to allow a more systematic > approach. At the time we started doing this testing none of the log-structured filesystems were deemed to be mature enough for us to ship. So we didn't bother to try and torture test using them. If more precise tests were created that still allowed us to make a reasonable estimate of data write lifetime we would be happy to start using them. -- Richard A. Smith <richard@laptop.org> One Laptop per Child ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 21:31 ` Richard A. Smith @ 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann ` (2 more replies) 2011-03-14 17:32 ` Arnd Bergmann 1 sibling, 3 replies; 15+ messages in thread From: Mikus Grinbergs @ 2011-03-13 22:34 UTC (permalink / raw) To: Richard A. Smith; +Cc: Arnd Bergmann, devel, linux-mmc > The tests have also helped expose other issues with things like sudden > power off. In one case a SPO during a write would corrupt the card so > badly it became useless. You could only recover them via a super secret > tool from the manufacturer. Is there any "sledgehammer" process available to users without a super secret tool ? I've encountered SD cards which will be recognized as a device when plugged in to a running XO-1 (though 'ls' of a filesystem on that SD card is corrupt) -- but 'fdisk' is ineffective when I want to write a new partition table (and 'fsck' appears to loop). Since otherwise I'd just have to throw the card away, I'd be willing to apply EXTREME measures to get such a card into a reusable ("blank slate") condition. mikus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann 2011-03-14 14:17 ` Richard A. Smith 2011-03-14 18:50 ` John Watlington 2 siblings, 0 replies; 15+ messages in thread From: Arnd Bergmann @ 2011-03-14 14:01 UTC (permalink / raw) To: mikus; +Cc: Richard A. Smith, devel, linux-mmc On Sunday 13 March 2011, Mikus Grinbergs wrote: > > The tests have also helped expose other issues with things like sudden > > power off. In one case a SPO during a write would corrupt the card so > > badly it became useless. You could only recover them via a super secret > > tool from the manufacturer. > > Is there any "sledgehammer" process available to users without a super > secret tool ? You can recover some cards by issuing an erase on the full drive. Unfortunately, this requires a patch to the SDHCI device driver, which is only now going into the kernel; I think it will be in 2.6.39. Issuing an erase (ioctl BLKDISCARD) also helps recover the performance on cards that get slower with increased internal fragmentation, but most cards use GC algorithms far too simple to get into that problem in the first place. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
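[Editorial note: for reference, a full-device discard from userspace looks roughly like the sketch below. It is only a sketch: it destroys all data on the device, needs root, and requires the kernel-side erase support Arnd mentions; the ioctl numbers are the usual values from <linux/fs.h> on x86-64.]

```python
import fcntl
import struct

# ioctl numbers from <linux/fs.h>:
BLKDISCARD = (0x12 << 8) | 119                             # _IO(0x12, 119)
BLKGETSIZE64 = (2 << 30) | (8 << 16) | (0x12 << 8) | 114   # _IOR(0x12, 114, size_t)

def discard_whole_device(path):
    """Tell the card's controller that every sector is unused.
    DESTROYS ALL DATA; needs root and driver-side erase support."""
    with open(path, 'r+b') as dev:
        buf = bytearray(8)
        fcntl.ioctl(dev, BLKGETSIZE64, buf)   # device size in bytes
        size = struct.unpack('=Q', bytes(buf))[0]
        # BLKDISCARD takes a (start, length) pair of u64s
        fcntl.ioctl(dev, BLKDISCARD, struct.pack('=QQ', 0, size))
```

After a full discard the controller can treat the whole medium as free space again, which is why it recovers both bricked mapping tables (on some cards) and fragmentation-induced slowness.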
* Re: Memory replacement 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann @ 2011-03-14 14:17 ` Richard A. Smith 2011-03-14 18:50 ` John Watlington 2 siblings, 0 replies; 15+ messages in thread From: Richard A. Smith @ 2011-03-14 14:17 UTC (permalink / raw) To: mikus; +Cc: devel, linux-mmc On 03/13/2011 06:34 PM, Mikus Grinbergs wrote: >> The tests have also helped expose other issues with things like sudden >> power off. In one case a SPO during a write would corrupt the card so >> badly it became useless. You could only recover them via a super secret >> tool from the manufacturer. > > Is there any "sledgehammer" process available to users without a super > secret tool ? Wasn't just secret to users. They would not give us the info on how to do it either. It was vendor specific so not really worth the effort of trying to reverse engineer. -- Richard A. Smith <richard@laptop.org> One Laptop per Child ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann 2011-03-14 14:17 ` Richard A. Smith @ 2011-03-14 18:50 ` John Watlington 2011-03-14 19:18 ` Arnd Bergmann 2 siblings, 1 reply; 15+ messages in thread From: John Watlington @ 2011-03-14 18:50 UTC (permalink / raw) To: mikus; +Cc: Richard A. Smith, devel, linux-mmc On Mar 13, 2011, at 6:34 PM, Mikus Grinbergs wrote: >> The tests have also helped expose other issues with things like sudden >> power off. In one case a SPO during a write would corrupt the card so >> badly it became useless. You could only recover them via a super secret >> tool from the manufacturer. > > Is there any "sledgehammer" process available to users without a super > secret tool ? No. Such software does exist for every controller, but it doesn't necessarily use the SD interface as SD. > I've encountered SD cards which will be recognized as a device when > plugged in to a running XO-1 (though 'ls' of a filesystem on that SD > card is corrupt) -- but 'fdisk' is ineffective when I want to write a > new partition table (and 'fsck' appears to loop). Since otherwise I'd > just have to throw the card away, I'd be willing to apply EXTREME > measures to get such a card into a reusable ("blank slate") condition. Cards that are in the state you describe are most likely dead due to running out of spare blocks. There is nothing that can be done to rehabilitate them, even using the manufacturer's secret code. In a disturbing trend, most of the cards I've returned for failure analysis in the past year have been worn out (and not just trashed meta-data due to a firmware error). Bummer, wad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-14 18:50 ` John Watlington @ 2011-03-14 19:18 ` Arnd Bergmann 2011-03-15 0:29 ` John Watlington 0 siblings, 1 reply; 15+ messages in thread From: Arnd Bergmann @ 2011-03-14 19:18 UTC (permalink / raw) To: John Watlington; +Cc: mikus, Richard A. Smith, devel, linux-mmc On Monday 14 March 2011 19:50:27 John Watlington wrote: > Cards that are in the state you describe are most likely dead due to > running out of spare blocks. There is nothing that can be done to > rehabilitate them, even using the manufacturer's secret code. > In a disturbing trend, most of the cards I've returned for failure analysis > in the past year have been worn out (and not just trashed meta-data > due to a firmware error). Part of the explanation for this could be the fact that erase block sizes have rapidly increased. AFAIK, the original XO built-in flash had 128KB erase blocks, which is also a common size for 1GB SD and CF cards. Cards made in 2010 or later typically have erase blocks of 2 MB, and combine two of them into an allocation unit of 4 MB. This means that in the worst case (random access over the whole medium), the write amplification has increased by a factor of 32. Another effect is that the page size has increased by a factor of 8, from 2 or 4 KB to 16 or 32 KB. Writing data that is smaller than a page is more likely to get you into the worst case mentioned above. This is part of why FAT32 with 32 KB clusters still works reasonably well, but ext3 with 4 KB blocks has regressed so much. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
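[Editorial note: the factor-of-32 figure above follows directly from the erase-block growth Arnd describes:]

```python
KB = 1024
MB = 1024 * KB

old_unit = 128 * KB      # erase block on the original XO-era flash / 1GB cards
new_unit = 2 * (2 * MB)  # two 2 MB erase blocks fused into a 4 MB allocation unit

# Worst case (random small writes over the whole medium): every write
# forces the controller to garbage-collect a whole allocation unit,
# so the amplification scales with the allocation unit size.
print(new_unit // old_unit)  # 32
```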
* Re: Memory replacement 2011-03-14 19:18 ` Arnd Bergmann @ 2011-03-15 0:29 ` John Watlington 2011-03-15 8:42 ` Arnd Bergmann 0 siblings, 1 reply; 15+ messages in thread From: John Watlington @ 2011-03-15 0:29 UTC (permalink / raw) To: Arnd Bergmann; +Cc: John Watlington, mikus, Richard A. Smith, devel, linux-mmc On Mar 14, 2011, at 3:18 PM, Arnd Bergmann wrote: > On Monday 14 March 2011 19:50:27 John Watlington wrote: >> Cards that are in the state you describe are most likely dead due to >> running out of spare blocks. There is nothing that can be done to >> rehabilitate them, even using the manufacturer's secret code. >> In a disturbing trend, most of the cards I've returned for failure analysis >> in the past year have been worn out (and not just trashed meta-data >> due to a firmware error). > > Part of the explanation for this could be the fact that erase block > sizes have rapidly increased. AFAIK, the original XO builtin flash > had 128KB erase blocks, which is also a common size for 1GB SD and > CF cards. > Cards made in 2010 or later typically have erase blocks of 2 MB, and > combine two of them into an allocation unit of 4 MB. This means that > in the worst case (random access over the whole medium), the write > amplification has increased by a factor of 32. > > Another effect is that the page size has increased by a factor of 8, > from 2 or 4 KB to 16 or 32 KB. Writing data that as smaller than > a page is more likely to get you into the worst case mentioned > above. This is part of why FAT32 with 32 KB clusters still works > reasonably well, but ext3 with 4 KB blocks has regressed so much. The explanation is simple: manufacturers moved to two-bit/cell (MLC) NAND Flash over a year ago, and six months ago moved to three-bit/cell (TLC) NAND Flash. Reliability went down, then went through the floor (I cannot recommend TLC for anything but write-once devices). 
You might have noticed this as an increase in the size of the erase block, as it doubled or more with the change. Cheers, wad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-15 0:29 ` John Watlington @ 2011-03-15 8:42 ` Arnd Bergmann 0 siblings, 0 replies; 15+ messages in thread From: Arnd Bergmann @ 2011-03-15 8:42 UTC (permalink / raw) To: John Watlington; +Cc: mikus, Richard A. Smith, devel, linux-mmc On Tuesday 15 March 2011 01:29:19 John Watlington wrote: > On Mar 14, 2011, at 3:18 PM, Arnd Bergmann wrote: > > > Another effect is that the page size has increased by a factor of 8, > > from 2 or 4 KB to 16 or 32 KB. Writing data that is smaller than > > a page is more likely to get you into the worst case mentioned > > above. This is part of why FAT32 with 32 KB clusters still works > > reasonably well, but ext3 with 4 KB blocks has regressed so much. > > The explanation is simple: manufacturers moved to two-bit/cell (MLC) NAND Flash > over a year ago, and six months ago moved to three-bit/cell (TLC) NAND Flash. > Reliability went down, then went through the floor (I cannot recommend TLC for > anything but write-once devices). You might have noticed this as an increase in > the size of the erase block, as it doubled or more with the change. That, and the move to smaller structures (down to 25 nm) has of course reduced reliability further, down to 2000 or so erase cycles per block, but that effect is unrelated to the file system being used. 
My point was that even if the card was done perfectly for FAT32 (maybe a write amplification of 2), the changes I described are pessimising ext3 (data from my head, easily off by an order of magnitude):

              drive    block   page   erase    w-amplftn    expected life
              size     size    size   cycles   FAT   ext3   FAT     ext3
  2005 SLC    256 MB   64 KB   1 KB   100000   2     8      13 TB   3.2 TB
  2005 MLC    512 MB   128 KB  2 KB   10000    2     16     2.5 TB  640 GB
  2011 SLC    4 GB     2 MB    8 KB   50000    2     512    100 TB  200 GB
  2011 MLC    8 GB     4 MB    16 KB  5000     2     1024   20 TB   40 GB
  2011 TLC    16 GB    4 MB    16 KB  2000     2     1024   16 TB   32 GB

The manufacturers have probably mitigated this slightly by using more spare blocks, better ECC and better GC over the years, but essentially your measurements are matching the theory. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
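[Editorial note: the life estimates in Arnd's table follow the simple model "expected life = drive size × erase cycles ÷ write amplification", which can be checked row by row:]

```python
def expected_life_gb(drive_gb, erase_cycles, write_amp):
    """Total host-writable data (GB) before wear-out, under the crude
    model: capacity * erase cycles / write amplification."""
    return drive_gb * erase_cycles / write_amp

# 2011-era 8 GB MLC card, ~5000 erase cycles per block:
print(expected_life_gb(8, 5000, 2))     # FAT-style access: 20000.0 GB = 20 TB
print(expected_life_gb(8, 5000, 1024))  # ext3 worst case: ~39 GB (~40 GB in the table)
```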
* Re: Memory replacement 2011-03-13 21:31 ` Richard A. Smith 2011-03-13 22:34 ` Mikus Grinbergs @ 2011-03-14 17:32 ` Arnd Bergmann 1 sibling, 0 replies; 15+ messages in thread From: Arnd Bergmann @ 2011-03-14 17:32 UTC (permalink / raw) To: Richard A. Smith; +Cc: C. Scott Ananian, Kevin Gordon, devel, linux-mmc On Sunday 13 March 2011, Richard A. Smith wrote: > On 03/13/2011 01:21 PM, Arnd Bergmann wrote: > There's a 2nd round of test(s) that runs during the manufacturing and > burn-in phases. One is a simple firmware test to see if you can talk the > card at all and then one runs at burn in. It doesn't have a minimum > write size criteria but during the run there must not be any bit errors. ok. > > It does seem a bit crude, because many cards are not really suitable > > for this kind of file system when their wear leveling is purely optimized > > to the accesses defined in the sd card file system specification. > > > > If you did this on e.g. a typical Kingston card, it can have a write > > amplification 100 times higher than normal (FAT32, nilfs2, ...), so > > it gets painfully slow and wears out very quickly. > > Crude as they are they have been useful tests for us. Our top criteria > is reliability. We want to ship the machines with a SD card thats going > to last for the 5 year design life using the filesystem we ship. We > tried to create an access pattern was the worst possible and the highest > stress on the wear leveling system. I see. Using the 2 KB block size on ext3 as described in the Wiki should certainly do that, even on old cards that use 4 KB pages. I typically misalign the partition by a few sectors to get a similar effect, doubling the amount of internal garbage collection. I guess the real images use a higher block size, right? > > I had hoped that someone already correlated the GC algorithms with > > the requirements of specific file systems to allow a more systematic > > approach. 
> At the time we started doing this testing, none of the log-structured
> filesystems were deemed mature enough for us to ship, so we didn't
> bother to torture test using them.
>
> If more precise tests were created that still allowed us to make a
> reasonable estimate of data write lifetime, we would be happy to start
> using them.

The tool that I'm working on is

	git://git.linaro.org/people/arnd/flashbench.git

It can be used to characterize a card in terms of its erase block size,
number of open erase blocks, FAT-optimized sections of the card, and
possible access patterns inside of erase blocks, all by doing raw block
I/O. Using it is currently a more manual process than I'd hope to make
it before giving it to regular users.

It also needs to be correlated to block access patterns from the file
system. When you have that, it should be possible to accurately predict
the amount of write amplification, which directly relates to how long
the card ends up living.

What I cannot determine right now is whether the card does static wear
leveling. I have a Panasonic card that is advertised as doing it, but I
haven't been able to pin down when that happens using timing attacks.

Another thing you might be interested in is my other work on a block
remapper that is designed to reduce garbage collection by writing data
in a log-structured way, similar to how some SSDs work internally. This
will also do static wear leveling, as a way to improve the expected life
by multiple orders of magnitude in some cases.

https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Projects/FlashDeviceMapper

lists some concepts I want to use, but I have made a lot of changes to
the design that are not yet reflected in the Wiki. I need to talk to
more people at the Embedded Linux Conference and the Storage/FS summit
in San Francisco to make sure I get that right.

	Arnd
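The log-structured remapping idea can be illustrated in miniature. This is a toy sketch under my own naming, not the Linaro device-mapper design: every logical write goes to the next free page of the currently open erase block, a map tracks where the newest copy of each logical page lives, and the victim for the next erase is the least-erased block (static wear leveling). Garbage-collecting live pages out of a victim block is omitted for brevity.

```python
# Toy model of a log-structured block remapper: the flash only ever
# sees linear writes inside one open erase block at a time, which is
# the access pattern even simple card controllers handle well.
# Illustrative sketch only, not the actual FlashDeviceMapper design.

class LogRemapper:
    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.map = {}                        # logical page -> (block, page)
        self.erase_counts = [0] * num_blocks
        self.current = 0                     # open erase block
        self.next_page = 0                   # next free page in it

    def write(self, logical, data):
        if self.next_page == self.pages_per_block:
            self._open_new_block()
        self.flash[self.current][self.next_page] = data
        self.map[logical] = (self.current, self.next_page)
        self.next_page += 1

    def read(self, logical):
        block, page = self.map[logical]
        return self.flash[block][page]

    def _open_new_block(self):
        # Static wear leveling: erase the least-used block that holds
        # no live pages. A real remapper must first garbage-collect
        # live data out of a victim block; that step is omitted here,
        # so this sketch assumes some block is free of live data.
        live = {b for (b, _) in self.map.values()}
        candidates = [b for b in range(len(self.flash)) if b not in live]
        self.current = min(candidates, key=lambda b: self.erase_counts[b])
        self.erase_counts[self.current] += 1
        self.flash[self.current] = [None] * self.pages_per_block
        self.next_page = 0

# Overwriting logical page 0 leaves a stale copy behind in the old
# block; reads always follow the map to the newest version.
r = LogRemapper(num_blocks=4, pages_per_block=2)
for logical, data in [(0, 'a'), (1, 'b'), (2, 'c'), (0, 'A')]:
    r.write(logical, data)
print(r.read(0))    # prints A
```

Because the card never sees random writes into the middle of an erase block, even a controller that can only write one erase block linearly (the Kingston case above) avoids the 100x internal amplification.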
Thread overview: 15+ messages
-- links below jump to the message on this page --
[not found] <AANLkTikQOVa8qfU-R6uqQXKhAVxqeAmxKONjPmmqLCr8@mail.gmail.com>
[not found] ` <201103111135.01394.arnd@arndb.de>
[not found] ` <320717C0-7117-462E-9227-7966EE6941D7@laptop.org>
2011-03-12 22:51 ` Memory replacement Arnd Bergmann
2011-03-13 1:01 ` C. Scott Ananian
2011-03-13 12:57 ` Andrei Warkentin
2011-03-13 17:00 ` C. Scott Ananian
2011-03-13 17:06 ` C. Scott Ananian
2011-03-13 17:21 ` Arnd Bergmann
2011-03-13 21:31 ` Richard A. Smith
2011-03-13 22:34 ` Mikus Grinbergs
2011-03-14 14:01 ` Arnd Bergmann
2011-03-14 14:17 ` Richard A. Smith
2011-03-14 18:50 ` John Watlington
2011-03-14 19:18 ` Arnd Bergmann
2011-03-15 0:29 ` John Watlington
2011-03-15 8:42 ` Arnd Bergmann
2011-03-14 17:32 ` Arnd Bergmann