* Re: Memory replacement [not found] ` <320717C0-7117-462E-9227-7966EE6941D7@laptop.org> @ 2011-03-12 22:51 ` Arnd Bergmann 2011-03-13 1:01 ` C. Scott Ananian 0 siblings, 1 reply; 15+ messages in thread From: Arnd Bergmann @ 2011-03-12 22:51 UTC (permalink / raw) To: John Watlington; +Cc: devel, Kevin Gordon, linux-mmc On Friday 11 March 2011 18:28:49 John Watlington wrote: > > On Mar 11, 2011, at 5:35 AM, Arnd Bergmann wrote: > > > I've tested around a dozen media from them, and while you are right > > that they use rather different algorithms and NAND chips inside, all > > of them can write to at least 5 erase blocks before getting into > > garbage collection, which is really needed for ext3 file systems. > > > > Contrast this with Kingston cards, which all use the same algorithm > > and can only write data linearly to one erase block at a time, resulting > > in one or two orders of magnitude higher internal write amplification. > > > > Most other vendors are somewhere in between, and you sometimes get > > fake cards that don't do what you expect, such as a bunch of Samsung > > microSDHC cards that I have which are labeled Sandisk on the outside. > > Those aren't fakes. That is what I'm trying to get across. I've had four cards with a Sandisk label that had unusual characteristics and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") and one unknown ("BE", possibly Lexar). In all cases, Sandisk support has confirmed from photos that the cards are definitely fake. They also explained that all authentic cards (possibly fake ones as well, but I have not seen them) will be labeled "Made in China", not "Made in Korea" or "Made in Taiwan" as my fake ones are, and that authentic microSD cards have the serial number on the front side, not on the back. > > I've also seen some really cheap noname cards outperform similarly spec'd > > Sandisk cards, both regarding maximum throughput and the garbage collection > > algorithms, but you can't rely on that. 
> > > My point is that you can't rely on Sandisk either. > > I've been in discussion with both Sandisk and Adata about these issues, > as well as constantly testing batches of new SD cards from all major > vendors. > > Unless you pay a lot extra and order at least 100K, you have no > control over what they give you. They don't just change NAND chips, > they change the controller chip and its firmware. Frequently. > And they don't update either the SKU number, part marking or the > identification fields available to software. The manufacturing batch > number printed on the outside is the only thing that changes. I agree that you cannot rely on specific behavior to stay the same with any vendor. One thing I noticed for instance is that many new Sandisk cards are using TLC (triple-level cell) NAND, which is inherently slower and cheaper than the regular two-bit-per-cell MLC used in older cards or those from other vendors. However, they have apparently managed to make them work well for random access by using some erase blocks as SLC (writing only the pages that carry the most significant bit in each cell) and by doing log-structured writes in there, something that apparently others have not figured out yet. Also, as I mentioned, they consistently use a relatively large number of open erase blocks. I've measured both effects on SD cards and USB sticks. I believe you can get this level of sophistication only from companies that make the NAND flash, the controller and the card: Sandisk, Samsung and Toshiba. Other brands that just get the controllers and the flash chips from whoever sells them cheaply (Kingston, Adata, Panasonic, Transcend, ...) apparently don't get the really good stuff. > How we deal with this is constant testing and getting notification from > the manufacturer that they are changing the internals (unfortunately, > we aren't willing to pay the premium to have a special SKU). Do you have test results somewhere publicly available? 
We are currently discussing adding some tweaks to the linux mmc drivers to detect cards with certain features, and to do some optimizations in the block layer for common ones. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-12 22:51 ` Memory replacement Arnd Bergmann @ 2011-03-13 1:01 ` C. Scott Ananian 2011-03-13 12:57 ` Andrei Warkentin 2011-03-13 17:21 ` Arnd Bergmann 0 siblings, 2 replies; 15+ messages in thread From: C. Scott Ananian @ 2011-03-13 1:01 UTC (permalink / raw) To: Arnd Bergmann; +Cc: John Watlington, Kevin Gordon, devel, linux-mmc On Sat, Mar 12, 2011 at 5:51 PM, Arnd Bergmann <arnd@arndb.de> wrote: > I've had four cards with a Sandisk label that had unusual characteristics > and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") > and one unknown ("BE", possibly lexar). In all cases, the Sandisk support > has confirmed from photos that the cards are definitely fake. They also Please see the blog post I cited in the email immediately prior to yours, which discusses this situation precisely. Often the cards are not actually "fake" -- they may even be produced on the exact same equipment as the usual cards, but "off the books" during hours when the factory is officially closed. This sort of thing is very very widespread, and fakes can come even via official distribution channels. (Discussed in bunnie's post.) > However, they have apparently managed to make them work well > for random access by using some erase blocks as SLC (writing only > the pages that carry the most significant bit in each cell) and > by doing log structured writes in there, something that apparently > others have not figured out yet. Also, as I mentioned, they > consistenly use a relatively large number of open erase blocks. > I've measured both effects on SD cards and USB sticks. You've been lucky. > I believe you can get this level of sophistication only from > companies that make the nand flash, the controller and the card: > Sandisk, Samsung and Toshiba. > Other brands that just get the controllers and the flash chips > from whoever sells them cheaply (kingston, adata, panasonic, > transcend, ...) apparently don't get the really good stuff. 
You're giving the OEMs too much credit. As John says, unless you arrange for a special SKU, even the "first source" companies will give you whatever they've got cheap that day. >> How we deal with this is constant testing and getting notification from >> the manufacturer that they are changing the internals (unfortunately, >> we aren't willing to pay the premium to have a special SKU). > > Do you have test results somewhere publically available? We are currently > discussing adding some tweaks to the linux mmc drivers to detect cards > with certain features, and to do some optimizations in the block layer > for common ones. http://wiki.laptop.org/go/NAND_Testing But the testing wad is talking about is really *on the factory floor*: Regular sampling of chips as they come into the factory to ensure that the chips *you are actually about to put into the XOs* are consistent. Relying on manufacturing data reported by the chips is not reliable. --scott -- ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 1:01 ` C. Scott Ananian @ 2011-03-13 12:57 ` Andrei Warkentin 2011-03-13 17:00 ` C. Scott Ananian 2011-03-13 17:21 ` Arnd Bergmann 1 sibling, 1 reply; 15+ messages in thread From: Andrei Warkentin @ 2011-03-13 12:57 UTC (permalink / raw) To: C. Scott Ananian Cc: Arnd Bergmann, John Watlington, Kevin Gordon, devel, linux-mmc On Sat, Mar 12, 2011 at 7:01 PM, C. Scott Ananian <cscott@laptop.org> wrote: > On Sat, Mar 12, 2011 at 5:51 PM, Arnd Bergmann <arnd@arndb.de> wrote: >> I've had four cards with a Sandisk label that had unusual characteristics >> and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") >> and one unknown ("BE", possibly lexar). In all cases, the Sandisk support >> has confirmed from photos that the cards are definitely fake. They also > > Please see the blog post I cited in the email immediately prior to > yours, which discusses this situation precisely. Often the cards are > not actually "fake" -- they may even be produced on the exact same > equipment as the usual cards, but "off the books" during hours when > the factory is officially closed. This sort of thing is very very > widespread, and fakes can come even via official distribution > channels. (Discussed in bunnie's post.) > >> However, they have apparently managed to make them work well >> for random access by using some erase blocks as SLC (writing only >> the pages that carry the most significant bit in each cell) and >> by doing log structured writes in there, something that apparently >> others have not figured out yet. Also, as I mentioned, they >> consistenly use a relatively large number of open erase blocks. >> I've measured both effects on SD cards and USB sticks. > > You've been lucky. > >> I believe you can get this level of sophistication only from >> companies that make the nand flash, the controller and the card: >> Sandisk, Samsung and Toshiba. 
>> Other brands that just get the controllers and the flash chips >> from whoever sells them cheaply (kingston, adata, panasonic, >> transcend, ...) apparently don't get the really good stuff. > > You're giving the OEMs too much credit. As John says, unless you > arrange for a special SKU, even the "first source" companies will give > you whatever they've got cheap that day. > >>> How we deal with this is constant testing and getting notification from >>> the manufacturer that they are changing the internals (unfortunately, >>> we aren't willing to pay the premium to have a special SKU). >> >> Do you have test results somewhere publically available? We are currently >> discussing adding some tweaks to the linux mmc drivers to detect cards >> with certain features, and to do some optimizations in the block layer >> for common ones. > > http://wiki.laptop.org/go/NAND_Testing > > But the testing wad is talking about is really *on the factory floor*: > Regular sampling of chips as they come into the factory to ensure > that the chips *you are actually about to put into the XOs* are > consistent. Relying on manufacturing data reported by the chips is > not reliable. > --scott Sorry to butt in, I think I'm missing most of the context here....nevertheless... I'm curious, ignoring outer packaging and product names, if you look at cards with the "same" CID (i.e. same manfid/oemid/date/firmware and hw rev), do you get same performance characteristics? Anyway, if you're curious about optimizing performance for certain cards, I'm curious to see your results, your tests and (if any) vendor recommendations. I'm collecting data and trying to re-validate some of the vendor suggestions for Toshiba eMMCs... in particular - splitting unaligned writes into an unaligned and aligned part. The only thing I can say now is that the more data I collect the less it makes sense :-). I'm resubmitting a change to MMC layer that allows creating block MMC quirks... 
Skipping the actual quirks as I'm trying to revalidate data taken by others and provide data I'm confident about, but you might be interested in the overall quirks support if you're thinking about adding your own. A ^ permalink raw reply [flat|nested] 15+ messages in thread
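[Editorial note: the unaligned-write splitting Andrei mentions for Toshiba eMMCs can be sketched as below. This is an illustrative model of the policy (split the request at the next alignment boundary), not the actual MMC-layer patch; the function name and pure-Python form are hypothetical.]

```python
def split_unaligned(offset, length, align):
    """Split a write request (offset, length) into an unaligned head
    reaching up to the next align boundary, then the aligned remainder."""
    parts = []
    head = (-offset) % align        # distance to the next boundary
    if head:
        head = min(head, length)    # request may end before the boundary
        parts.append((offset, head))
        offset += head
        length -= head
    if length:
        parts.append((offset, length))
    return parts

# e.g. a 20-sector write starting at sector 5, with 8-sector alignment:
print(split_unaligned(5, 20, 8))    # [(5, 3), (8, 17)]
```

The idea behind the vendor suggestion is that the aligned part can then be written without forcing the controller to do a read-modify-write of a whole page or allocation unit.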
* Re: Memory replacement 2011-03-13 12:57 ` Andrei Warkentin @ 2011-03-13 17:00 ` C. Scott Ananian 2011-03-13 17:06 ` C. Scott Ananian 0 siblings, 1 reply; 15+ messages in thread From: C. Scott Ananian @ 2011-03-13 17:00 UTC (permalink / raw) To: Andrei Warkentin Cc: Arnd Bergmann, John Watlington, Kevin Gordon, devel, linux-mmc On Sun, Mar 13, 2011 at 8:57 AM, Andrei Warkentin <andreiw@motorola.com> wrote: > Sorry to butt in, I think I'm missing most of the context > here....nevertheless... I'm curious, ignoring outer packaging and > product names, if you look at cards with the "same" CID (i.e. same > manfid/oemid/date/firmware and hw rev), do you get same performance > characteristics? No. --scott -- ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 17:00 ` C. Scott Ananian @ 2011-03-13 17:06 ` C. Scott Ananian 0 siblings, 0 replies; 15+ messages in thread From: C. Scott Ananian @ 2011-03-13 17:06 UTC (permalink / raw) To: Andrei Warkentin Cc: Arnd Bergmann, John Watlington, Kevin Gordon, devel, linux-mmc On Sun, Mar 13, 2011 at 1:00 PM, C. Scott Ananian <cscott@laptop.org> wrote: > On Sun, Mar 13, 2011 at 8:57 AM, Andrei Warkentin <andreiw@motorola.com> wrote: >> Sorry to butt in, I think I'm missing most of the context >> here....nevertheless... I'm curious, ignoring outer packaging and >> product names, if you look at cards with the "same" CID (i.e. same >> manfid/oemid/date/firmware and hw rev), do you get same performance >> characteristics? > > No. To elaborate: see bunnie's blog post (cited above) on how the CID is often forged or wrong. I've also personally witnessed a manufacturer's rep come to the factory floor to reprogram a compact flash card's internal microcontroller with new firmware. This did not update any externally visible information reported by the chip. I had to convince the manufacturer to leave their proprietary hardware on the factory floor in order to be able to verify that future units would have the correct firmware. (Granted, this was not an MMC unit, but I would be surprised if MMC vendors were significantly different in this regard.) If you've spent any time working with Chinese/Taiwanese OEMs, you will notice that version control methodologies are (in general) disappointingly lax. --scott -- ( http://cscott.net/ ) ^ permalink raw reply [flat|nested] 15+ messages in thread
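[Editorial note: for anyone who wants to inspect the identity fields being discussed, Linux exposes the raw 128-bit CID register via sysfs (e.g. /sys/block/mmcblk0/device/cid), and the fields decode per the SD card specification. A rough decoder sketch follows; the sample CID is fabricated for illustration, not from a real card, and as the thread points out these fields are themselves often forged.]

```python
def parse_sd_cid(cid_hex):
    """Decode the 128-bit SD card CID register (32 hex digits, big-endian)."""
    raw = bytes.fromhex(cid_hex)
    mdt = ((raw[13] & 0x0F) << 8) | raw[14]    # manufacturing date, bits 19:8
    return {
        'manfid': raw[0],                               # MID, bits 127:120
        'oemid': raw[1:3].decode('ascii', 'replace'),   # OID, 2 ASCII chars
        'name': raw[3:8].decode('ascii', 'replace'),    # PNM, 5 ASCII chars
        'rev': f"{raw[8] >> 4}.{raw[8] & 0xF}",         # PRV, BCD major.minor
        'serial': int.from_bytes(raw[9:13], 'big'),     # PSN
        'date': f"{2000 + (mdt >> 4)}-{mdt & 0xF:02d}", # year-month
    }

# fabricated example: manfid 0x41, OEM id "42", product name "SD04G"
print(parse_sd_cid("41343253443034478000001234011200"))
```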
* Re: Memory replacement 2011-03-13 1:01 ` C. Scott Ananian 2011-03-13 12:57 ` Andrei Warkentin @ 2011-03-13 17:21 ` Arnd Bergmann 2011-03-13 21:31 ` Richard A. Smith 1 sibling, 1 reply; 15+ messages in thread From: Arnd Bergmann @ 2011-03-13 17:21 UTC (permalink / raw) To: C. Scott Ananian; +Cc: John Watlington, Kevin Gordon, devel, linux-mmc On Sunday 13 March 2011 02:01:22 C. Scott Ananian wrote: > On Sat, Mar 12, 2011 at 5:51 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > I've had four cards with a Sandisk label that had unusual characteristics > > and manufacturer/OEM IDs that refer to other companies, three Samsung ("SM") > > and one unknown ("BE", possibly lexar). In all cases, the Sandisk support > > has confirmed from photos that the cards are definitely fake. They also > > Please see the blog post I cited in the email immediately prior to > yours, which discusses this situation precisely. Often the cards are > not actually "fake" -- they may even be produced on the exact same > equipment as the usual cards, but "off the books" during hours when > the factory is officially closed. This sort of thing is very very > widespread, and fakes can come even via official distribution > channels. (Discussed in bunnie's post.) I am very familiar with bunnie's research, and have referenced it from my own page on the linaro wiki. I have also found Kingston cards with the exact same symptoms that triggered his original interest (very slow, manfid 0x41, oemid "42", low serial number). Another interesting case of a fake card I found had a Sandisk label and "LEXAR" in its MMC name field. Moreover, it actually contained copyrighted software that Lexar ships in their real cards. So what I'd assume is happening here is that the factory that produces the cards or Lexar had a graveyard shift where they were just printing Sandisk labels on the cards. > You're giving the OEMs too much credit. 
As John says, unless you > arrange for a special SKU, even the "first source" companies will give > you whatever they've got cheap that day. It's pretty clear that they are moving to cheaper NAND chips when possible, and I also mentioned that. For the controller chips, I don't understand how they would save money by buying them on the spot market. On the contrary, using the smart controllers that Sandisk themselves make allows them to use even slower NAND chips and still qualify for a better nominal speed grade, while companies that don't have access to decent controllers need to either use chips that are fast enough to make up for the bad GC algorithm or lie about their speed grades. > >> How we deal with this is constant testing and getting notification from > >> the manufacturer that they are changing the internals (unfortunately, > >> we aren't willing to pay the premium to have a special SKU). > > > > Do you have test results somewhere publicly available? We are currently > > discussing adding some tweaks to the linux mmc drivers to detect cards > > with certain features, and to do some optimizations in the block layer > > for common ones. > > http://wiki.laptop.org/go/NAND_Testing Ok, so the "testing" essentially means you create an ext2/3/4 file system and run tests on the file system until the card wears out, right? It does seem a bit crude, because many cards are not really suitable for this kind of file system when their wear leveling is purely optimized for the accesses defined in the SD card file system specification. If you did this on e.g. a typical Kingston card, it can have a write amplification 100 times higher than normal (FAT32, nilfs2, ...), so it gets painfully slow and wears out very quickly. I had hoped that someone already correlated the GC algorithms with the requirements of specific file systems to allow a more systematic approach. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 17:21 ` Arnd Bergmann @ 2011-03-13 21:31 ` Richard A. Smith 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 17:32 ` Arnd Bergmann 0 siblings, 2 replies; 15+ messages in thread From: Richard A. Smith @ 2011-03-13 21:31 UTC (permalink / raw) To: Arnd Bergmann; +Cc: C. Scott Ananian, Kevin Gordon, devel, linux-mmc On 03/13/2011 01:21 PM, Arnd Bergmann wrote: >>> Do you have test results somewhere publicly available? We are currently >>> discussing adding some tweaks to the linux mmc drivers to detect cards >>> with certain features, and to do some optimizations in the block layer >>> for common ones. >> >> http://wiki.laptop.org/go/NAND_Testing > > Ok, so the "testing" essentially means you create an ext2/3/4 file system > and run tests on the file system until the card wears out, right? The qualifying test is that the card must pass 3TB of writes with no errors. We run that on samples from the various mfg's. There's a 2nd round of test(s) that runs during the manufacturing and burn-in phases. One is a simple firmware test to see if you can talk to the card at all and then one runs at burn in. It doesn't have a minimum write size criterion but during the run there must not be any bit errors. > It does seem a bit crude, because many cards are not really suitable > for this kind of file system when their wear leveling is purely optimized > to the accesses defined in the sd card file system specification. > > If you did this on e.g. a typical Kingston card, it can have a write > amplification 100 times higher than normal (FAT32, nilfs2, ...), so > it gets painfully slow and wears out very quickly. Crude as they are they have been useful tests for us. Our top criterion is reliability. We want to ship the machines with an SD card that's going to last for the 5 year design life using the filesystem we ship. We tried to create an access pattern that was the worst possible and put the highest stress on the wear leveling system. 
If a card passes the 3TB abuse test then we are pretty certain it's going to meet that goal. There were many cards that died very quickly. The tests have also helped expose other issues with things like sudden power off. In one case an SPO during a write would corrupt the card so badly it became useless. You could only recover them via a super secret tool from the manufacturer. > I had hoped that someone already correlated the GC algorithms with > the requirements of specific file systems to allow a more systematic > approach. At the time we started doing this testing none of the log-structured filesystems were deemed to be mature enough for us to ship. So we didn't bother to try and torture test using them. If more precise tests were created that still allowed us to make a reasonable estimate of data write lifetime we would be happy to start using them. -- Richard A. Smith <richard@laptop.org> One Laptop per Child ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 21:31 ` Richard A. Smith @ 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann ` (2 more replies) 2011-03-14 17:32 ` Arnd Bergmann 1 sibling, 3 replies; 15+ messages in thread From: Mikus Grinbergs @ 2011-03-13 22:34 UTC (permalink / raw) To: Richard A. Smith; +Cc: Arnd Bergmann, devel, linux-mmc > The tests have also helped expose other issues with things like sudden > power off. In one case a SPO during a write would corrupt the card so > badly it became useless. You could only recover them via a super secret > tool from the manufacturer. Is there any "sledgehammer" process available to users without a super secret tool ? I've encountered SD cards which will be recognized as a device when plugged in to a running XO-1 (though 'ls' of a filesystem on that SD card is corrupt) -- but 'fdisk' is ineffective when I want to write a new partition table (and 'fsck' appears to loop). Since otherwise I'd just have to throw the card away, I'd be willing to apply EXTREME measures to get such a card into a reusable ("blank slate") condition. mikus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann 2011-03-14 14:17 ` Richard A. Smith 2011-03-14 18:50 ` John Watlington 2 siblings, 0 replies; 15+ messages in thread From: Arnd Bergmann @ 2011-03-14 14:01 UTC (permalink / raw) To: mikus; +Cc: Richard A. Smith, devel, linux-mmc On Sunday 13 March 2011, Mikus Grinbergs wrote: > > The tests have also helped expose other issues with things like sudden > > power off. In one case a SPO during a write would corrupt the card so > > badly it became useless. You could only recover them via a super secret > > tool from the manufacturer. > > Is there any "sledgehammer" process available to users without a super > secret tool ? You can recover some cards by issuing an erase on the full drive. Unfortunately, this requires a patch to the SDHCI device driver, which is only now going into the kernel; I think it will be in 2.6.39. Issuing an erase (ioctl BLKDISCARD) also helps recover the performance on cards that get slower with increased internal fragmentation, but most cards use GC algorithms far too simple to get into that problem in the first place. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
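[Editorial note: for reference, a full-device discard from userspace looks roughly like the sketch below. It is only a sketch: it destroys all data on the device, needs root, and requires the kernel-side erase support Arnd mentions; the ioctl numbers are the usual values from <linux/fs.h> on x86-64.]

```python
import fcntl
import struct

# ioctl numbers from <linux/fs.h>:
BLKDISCARD = (0x12 << 8) | 119                             # _IO(0x12, 119)
BLKGETSIZE64 = (2 << 30) | (8 << 16) | (0x12 << 8) | 114   # _IOR(0x12, 114, size_t)

def discard_whole_device(path):
    """Tell the card's controller that every sector is unused.
    DESTROYS ALL DATA; needs root and driver-side erase support."""
    with open(path, 'r+b') as dev:
        buf = bytearray(8)
        fcntl.ioctl(dev, BLKGETSIZE64, buf)   # device size in bytes
        size = struct.unpack('=Q', bytes(buf))[0]
        # BLKDISCARD takes a (start, length) pair of u64s
        fcntl.ioctl(dev, BLKDISCARD, struct.pack('=QQ', 0, size))
```

After a full discard the controller can treat the whole medium as free space again, which is why it recovers both bricked mapping tables (on some cards) and fragmentation-induced slowness.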
* Re: Memory replacement 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann @ 2011-03-14 14:17 ` Richard A. Smith 2011-03-14 18:50 ` John Watlington 2 siblings, 0 replies; 15+ messages in thread From: Richard A. Smith @ 2011-03-14 14:17 UTC (permalink / raw) To: mikus; +Cc: devel, linux-mmc On 03/13/2011 06:34 PM, Mikus Grinbergs wrote: >> The tests have also helped expose other issues with things like sudden >> power off. In one case a SPO during a write would corrupt the card so >> badly it became useless. You could only recover them via a super secret >> tool from the manufacturer. > > Is there any "sledgehammer" process available to users without a super > secret tool ? Wasn't just secret to users. They would not give us the info on how to do it either. It was vendor specific so not really worth the effort of trying to reverse engineer. -- Richard A. Smith <richard@laptop.org> One Laptop per Child ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-13 22:34 ` Mikus Grinbergs 2011-03-14 14:01 ` Arnd Bergmann 2011-03-14 14:17 ` Richard A. Smith @ 2011-03-14 18:50 ` John Watlington 2011-03-14 19:18 ` Arnd Bergmann 2 siblings, 1 reply; 15+ messages in thread From: John Watlington @ 2011-03-14 18:50 UTC (permalink / raw) To: mikus; +Cc: Richard A. Smith, devel, linux-mmc On Mar 13, 2011, at 6:34 PM, Mikus Grinbergs wrote: >> The tests have also helped expose other issues with things like sudden >> power off. In one case a SPO during a write would corrupt the card so >> badly it became useless. You could only recover them via a super secret >> tool from the manufacturer. > > Is there any "sledgehammer" process available to users without a super > secret tool ? No. Such software does exist for every controller, but it doesn't necessarily use the SD interface as SD. > I've encountered SD cards which will be recognized as a device when > plugged in to a running XO-1 (though 'ls' of a filesystem on that SD > card is corrupt) -- but 'fdisk' is ineffective when I want to write a > new partition table (and 'fsck' appears to loop). Since otherwise I'd > just have to throw the card away, I'd be willing to apply EXTREME > measures to get such a card into a reusable ("blank slate") condition. Cards that are in the state you describe are most likely dead due to running out of spare blocks. There is nothing that can be done to rehabilitate them, even using the manufacturer's secret code. In a disturbing trend, most of the cards I've returned for failure analysis in the past year have been worn out (and not just trashed meta-data due to a firmware error). Bummer, wad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-14 18:50 ` John Watlington @ 2011-03-14 19:18 ` Arnd Bergmann 2011-03-15 0:29 ` John Watlington 0 siblings, 1 reply; 15+ messages in thread From: Arnd Bergmann @ 2011-03-14 19:18 UTC (permalink / raw) To: John Watlington; +Cc: mikus, Richard A. Smith, devel, linux-mmc On Monday 14 March 2011 19:50:27 John Watlington wrote: > Cards that are in the state you describe are most likely dead due to > running out of spare blocks. There is nothing that can be done to > rehabilitate them, even using the manufacturer's secret code. > In a disturbing trend, most of the cards I've returned for failure analysis > in the past year have been worn out (and not just trashed meta-data > due to a firmware error). Part of the explanation for this could be the fact that erase block sizes have rapidly increased. AFAIK, the original XO built-in flash had 128KB erase blocks, which is also a common size for 1GB SD and CF cards. Cards made in 2010 or later typically have erase blocks of 2 MB, and combine two of them into an allocation unit of 4 MB. This means that in the worst case (random access over the whole medium), the write amplification has increased by a factor of 32. Another effect is that the page size has increased by a factor of 8, from 2 or 4 KB to 16 or 32 KB. Writing data that is smaller than a page is more likely to get you into the worst case mentioned above. This is part of why FAT32 with 32 KB clusters still works reasonably well, but ext3 with 4 KB blocks has regressed so much. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
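[Editorial note: the factor-of-32 figure above follows directly from the erase-block growth Arnd describes:]

```python
KB = 1024
MB = 1024 * KB

old_unit = 128 * KB      # erase block on the original XO-era flash / 1GB cards
new_unit = 2 * (2 * MB)  # two 2 MB erase blocks fused into a 4 MB allocation unit

# Worst case (random small writes over the whole medium): every write
# forces the controller to garbage-collect a whole allocation unit,
# so the amplification scales with the allocation unit size.
print(new_unit // old_unit)  # 32
```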
* Re: Memory replacement 2011-03-14 19:18 ` Arnd Bergmann @ 2011-03-15 0:29 ` John Watlington 2011-03-15 8:42 ` Arnd Bergmann 0 siblings, 1 reply; 15+ messages in thread From: John Watlington @ 2011-03-15 0:29 UTC (permalink / raw) To: Arnd Bergmann; +Cc: John Watlington, mikus, Richard A. Smith, devel, linux-mmc On Mar 14, 2011, at 3:18 PM, Arnd Bergmann wrote: > On Monday 14 March 2011 19:50:27 John Watlington wrote: >> Cards that are in the state you describe are most likely dead due to >> running out of spare blocks. There is nothing that can be done to >> rehabilitate them, even using the manufacturer's secret code. >> In a disturbing trend, most of the cards I've returned for failure analysis >> in the past year have been worn out (and not just trashed meta-data >> due to a firmware error). > > Part of the explanation for this could be the fact that erase block > sizes have rapidly increased. AFAIK, the original XO builtin flash > had 128KB erase blocks, which is also a common size for 1GB SD and > CF cards. > Cards made in 2010 or later typically have erase blocks of 2 MB, and > combine two of them into an allocation unit of 4 MB. This means that > in the worst case (random access over the whole medium), the write > amplification has increased by a factor of 32. > > Another effect is that the page size has increased by a factor of 8, > from 2 or 4 KB to 16 or 32 KB. Writing data that as smaller than > a page is more likely to get you into the worst case mentioned > above. This is part of why FAT32 with 32 KB clusters still works > reasonably well, but ext3 with 4 KB blocks has regressed so much. The explanation is simple: manufacturers moved to two-bit/cell (MLC) NAND Flash over a year ago, and six months ago moved to three-bit/cell (TLC) NAND Flash. Reliability went down, then went through the floor (I cannot recommend TLC for anything but write-once devices). 
You might have noticed this as an increase in the size of the erase block, as it doubled or more with the change. Cheers, wad ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Memory replacement 2011-03-15 0:29 ` John Watlington @ 2011-03-15 8:42 ` Arnd Bergmann 0 siblings, 0 replies; 15+ messages in thread From: Arnd Bergmann @ 2011-03-15 8:42 UTC (permalink / raw) To: John Watlington; +Cc: mikus, Richard A. Smith, devel, linux-mmc On Tuesday 15 March 2011 01:29:19 John Watlington wrote: > On Mar 14, 2011, at 3:18 PM, Arnd Bergmann wrote: > > > Another effect is that the page size has increased by a factor of 8, > > from 2 or 4 KB to 16 or 32 KB. Writing data that is smaller than > > a page is more likely to get you into the worst case mentioned > > above. This is part of why FAT32 with 32 KB clusters still works > > reasonably well, but ext3 with 4 KB blocks has regressed so much. > > The explanation is simple: manufacturers moved to two-bit/cell (MLC) NAND Flash > over a year ago, and six months ago moved to three-bit/cell (TLC) NAND Flash. > Reliability went down, then went through the floor (I cannot recommend TLC for > anything but write-once devices). You might have noticed this as an increase in > the size of the erase block, as it doubled or more with the change. That, and the move to smaller structures (down to 25 nm) has of course reduced reliability further, down to 2000 or so erase cycles per block, but that effect is unrelated to the file system being used. 
My point was that even if the card was done perfectly for FAT32 (maybe a write amplification of 2), the changes I described are pessimising ext3 (data from my head, easily off by an order of magnitude):

              drive    block   page   erase    w-amplftn    expected life
              size     size    size   cycles   FAT   ext3   FAT     ext3
  2005 SLC    256 MB   64 KB   1 KB   100000   2     8      13 TB   3.2 TB
  2005 MLC    512 MB   128 KB  2 KB   10000    2     16     2.5 TB  640 GB
  2011 SLC    4 GB     2 MB    8 KB   50000    2     512    100 TB  200 GB
  2011 MLC    8 GB     4 MB    16 KB  5000     2     1024   20 TB   40 GB
  2011 TLC    16 GB    4 MB    16 KB  2000     2     1024   16 TB   32 GB

The manufacturers have probably mitigated this slightly by using more spare blocks, better ECC and better GC over the years, but essentially your measurements are matching the theory. Arnd ^ permalink raw reply [flat|nested] 15+ messages in thread
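[Editorial note: the life estimates in Arnd's table follow the simple model "expected life = drive size × erase cycles ÷ write amplification", which can be checked row by row:]

```python
def expected_life_gb(drive_gb, erase_cycles, write_amp):
    """Total host-writable data (GB) before wear-out, under the crude
    model: capacity * erase cycles / write amplification."""
    return drive_gb * erase_cycles / write_amp

# 2011-era 8 GB MLC card, ~5000 erase cycles per block:
print(expected_life_gb(8, 5000, 2))     # FAT-style access: 20000.0 GB = 20 TB
print(expected_life_gb(8, 5000, 1024))  # ext3 worst case: ~39 GB (~40 GB in the table)
```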
* Re: Memory replacement 2011-03-13 21:31 ` Richard A. Smith 2011-03-13 22:34 ` Mikus Grinbergs @ 2011-03-14 17:32 ` Arnd Bergmann 1 sibling, 0 replies; 15+ messages in thread From: Arnd Bergmann @ 2011-03-14 17:32 UTC (permalink / raw) To: Richard A. Smith; +Cc: C. Scott Ananian, Kevin Gordon, devel, linux-mmc On Sunday 13 March 2011, Richard A. Smith wrote: > On 03/13/2011 01:21 PM, Arnd Bergmann wrote: > There's a 2nd round of test(s) that runs during the manufacturing and > burn-in phases. One is a simple firmware test to see if you can talk the > card at all and then one runs at burn in. It doesn't have a minimum > write size criteria but during the run there must not be any bit errors. ok. > > It does seem a bit crude, because many cards are not really suitable > > for this kind of file system when their wear leveling is purely optimized > > to the accesses defined in the sd card file system specification. > > > > If you did this on e.g. a typical Kingston card, it can have a write > > amplification 100 times higher than normal (FAT32, nilfs2, ...), so > > it gets painfully slow and wears out very quickly. > > Crude as they are they have been useful tests for us. Our top criteria > is reliability. We want to ship the machines with a SD card thats going > to last for the 5 year design life using the filesystem we ship. We > tried to create an access pattern was the worst possible and the highest > stress on the wear leveling system. I see. Using the 2 KB block size on ext3 as described in the Wiki should certainly do that, even on old cards that use 4 KB pages. I typically misalign the partition by a few sectors to get a similar effect, doubling the amount of internal garbage collection. I guess the real images use a higher block size, right? > > I had hoped that someone already correlated the GC algorithms with > > the requirements of specific file systems to allow a more systematic > > approach. 
> At the time we started doing this testing, none of the log-structured
> filesystems were deemed mature enough for us to ship, so we didn't
> bother to torture test using them.
>
> If more precise tests were created that still allowed us to make a
> reasonable estimate of data write lifetime, we would be happy to start
> using them.

The tool that I'm working on is

	git://git.linaro.org/people/arnd/flashbench.git

It can be used to characterize a card in terms of its erase block size,
number of open erase blocks, FAT-optimized sections of the card, and
possible access patterns inside of erase blocks, all by doing raw block
I/O. Using it is currently a more manual process than I'd hope to make
it before giving it to regular users.

It also needs to be correlated to block access patterns from the file
system. When you have that, it should be possible to accurately predict
the amount of write amplification, which directly relates to how long
the card ends up living.

What I cannot determine right now is whether the card does static wear
leveling. I have a Panasonic card that is advertised as doing it, but I
haven't been able to pin down when that happens using timing attacks.

Another thing you might be interested in is my other work on a block
remapper that is designed to reduce garbage collection by writing data
in a log-structured way, similar to how some SSDs work internally. This
will also do static wear leveling, as a way to improve the expected life
by multiple orders of magnitude in some cases.

https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Projects/FlashDeviceMapper

lists some concepts I want to use, but I have made a lot of changes to
the design that are not yet reflected in the Wiki. I need to talk to
more people at the Embedded Linux Conference and the Storage/FS summit
in San Francisco to make sure I get that right.

	Arnd
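The log-structured remapping idea can be illustrated in miniature. This is a toy sketch under my own naming, not the Linaro device-mapper design: every logical write goes to the next free page of the currently open erase block, a map tracks where the newest copy of each logical page lives, and the victim for the next erase is the least-erased block (static wear leveling). Garbage-collecting live pages out of a victim block is omitted for brevity.

```python
# Toy model of a log-structured block remapper: the flash only ever
# sees linear writes inside one open erase block at a time, which is
# the access pattern even simple card controllers handle well.
# Illustrative sketch only, not the actual FlashDeviceMapper design.

class LogRemapper:
    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.map = {}                        # logical page -> (block, page)
        self.erase_counts = [0] * num_blocks
        self.current = 0                     # open erase block
        self.next_page = 0                   # next free page in it

    def write(self, logical, data):
        if self.next_page == self.pages_per_block:
            self._open_new_block()
        self.flash[self.current][self.next_page] = data
        self.map[logical] = (self.current, self.next_page)
        self.next_page += 1

    def read(self, logical):
        block, page = self.map[logical]
        return self.flash[block][page]

    def _open_new_block(self):
        # Static wear leveling: erase the least-used block that holds
        # no live pages. A real remapper must first garbage-collect
        # live data out of a victim block; that step is omitted here,
        # so this sketch assumes some block is free of live data.
        live = {b for (b, _) in self.map.values()}
        candidates = [b for b in range(len(self.flash)) if b not in live]
        self.current = min(candidates, key=lambda b: self.erase_counts[b])
        self.erase_counts[self.current] += 1
        self.flash[self.current] = [None] * self.pages_per_block
        self.next_page = 0

# Overwriting logical page 0 leaves a stale copy behind in the old
# block; reads always follow the map to the newest version.
r = LogRemapper(num_blocks=4, pages_per_block=2)
for logical, data in [(0, 'a'), (1, 'b'), (2, 'c'), (0, 'A')]:
    r.write(logical, data)
print(r.read(0))    # prints A
```

Because the card never sees random writes into the middle of an erase block, even a controller that can only write one erase block linearly (the Kingston case above) avoids the 100x internal amplification.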
Thread overview: 15+ messages
-- links below jump to the message on this page --
[not found] <AANLkTikQOVa8qfU-R6uqQXKhAVxqeAmxKONjPmmqLCr8@mail.gmail.com>
[not found] ` <201103111135.01394.arnd@arndb.de>
[not found] ` <320717C0-7117-462E-9227-7966EE6941D7@laptop.org>
2011-03-12 22:51 ` Memory replacement Arnd Bergmann
2011-03-13 1:01 ` C. Scott Ananian
2011-03-13 12:57 ` Andrei Warkentin
2011-03-13 17:00 ` C. Scott Ananian
2011-03-13 17:06 ` C. Scott Ananian
2011-03-13 17:21 ` Arnd Bergmann
2011-03-13 21:31 ` Richard A. Smith
2011-03-13 22:34 ` Mikus Grinbergs
2011-03-14 14:01 ` Arnd Bergmann
2011-03-14 14:17 ` Richard A. Smith
2011-03-14 18:50 ` John Watlington
2011-03-14 19:18 ` Arnd Bergmann
2011-03-15 0:29 ` John Watlington
2011-03-15 8:42 ` Arnd Bergmann
2011-03-14 17:32 ` Arnd Bergmann