public inbox for linux-kernel@vger.kernel.org
From: "Michael H. Warfield" <mhw@wittsend.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	mhw@wittsend.com
Subject: Re: Sync option destroys flash!
Date: Fri, 13 May 2005 21:05:34 -0400
Message-ID: <1116032735.5461.46.camel@localhost.localdomain>
In-Reply-To: <1116009619.9371.494.camel@localhost.localdomain>


Hey Alan...

On Fri, 2005-05-13 at 19:40 +0100, Alan Cox wrote:
> > 	What happens, with the sync option on a VFAT file system, is that the
> > FAT tables are getting pounded and over-written over and over and over
> > again as each and every block/cluster is allocated while a new file is
> > written out.  This constant overwriting eventually wears out the first
> > block or two of the flash drive.

> All non-shite quality flash keys have an on media log structured file
> system and will take 100,000+ writes per sector or so. The decent ones
> also map out bad blocks and have spares. The "wear out the same sector"
> stuff is a myth except on ultra-crap devices.

	Yah know...  I've been thinking about this...  In a former life, we used
to do something very similar with a virtual memory system on some really
early (80's vintage) networked VM workstations (back when memory was
actually valuable and scarce).

	So...  This would have to work with a list or pool of "spares" that are
not allocated to the "visible" file system.  We used a "least used"
algorithm for that VM system.  This would seem to be a "replace as
rewritten" algorithm.  Each time you write to the file system, it grabs
a block off the head of the spares list, writes your data to it, and
then adds the old block to the tail of the list.  Pretty basic stuff and
it doesn't have to track what kind of high level file system you are
using or know anything about its structure.  Cool...
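
	A minimal sketch of that "replace as rewritten" scheme, in Python.  This
is only a toy model of what a flash controller might do, not any real
device's firmware; the class name, method names, and erase counter are all
made up for illustration.

```python
from collections import deque


class SparePoolFTL:
    """Toy remapping layer (hypothetical): logical blocks map to physical
    blocks, and every rewrite swaps in the spare at the head of a FIFO list."""

    def __init__(self, logical_blocks, spare_blocks):
        total = logical_blocks + spare_blocks
        # Logical block i starts out on physical block i.
        self.mapping = list(range(logical_blocks))
        # The remaining physical blocks are the hidden spares.
        self.spares = deque(range(logical_blocks, total))
        # Assumed bookkeeping: one counter per physical block.
        self.erase_counts = [0] * total

    def write(self, logical):
        """Rewrite one logical block; return the physical block now used."""
        old = self.mapping[logical]
        new = self.spares.popleft()      # grab a block off the head
        self.erase_counts[new] += 1      # the new block takes the wear
        self.mapping[logical] = new
        self.spares.append(old)          # old block goes to the tail
        return new
```

	Note that write() never looks at what the logical block contains, which
is the point: the scheme works the same for FAT, ext2, or raw data.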

	That makes sense.  But...  How big might this "list" be?  Maybe an
additional 10% of the entire drive capacity?  That's quite a bit...  But
now you're beating on that FAT table pretty heavy.  For each block
allocated and written to, we're rewriting the FAT table (actually TWO
FAT tables if you count the back up FAT).  Ok...  One data block, two
FAT table rewrites.  So a FAT table block gets added back to the list
and a block is grabbed off the list.  Seems like there would be a pretty
high percentage of old FAT table blocks sitting there circulating on the
spares list.  That would make the probability of grabbing an old FAT
table block and rewriting it again pretty high.  Then it would get added
back to the list again, in turn.

	Because of this systematic thumping of the FAT tables, these old FAT
blocks are going to be circulating in that spares list at a pretty high
density.  The wear leveling is not going to be nearly as effective
BECAUSE of the thumping.  I'm not certain if that will be better or
worse if there are more blocks in the spares list.  Seems like you are
going to end up with 50% - 60% (WAG) of the blocks in the spares list
being old FAT table blocks and end up with a number of them that just
keep recirculating until they burn out.  I would think that they'll burn
out faster if that spares list is small and they get reused more
frequently (note to follow).
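
	The guess above can be checked against a toy model.  The sketch below
assumes a FIFO spares list, a FAT and backup FAT that each fit in a single
block, and a sync write pattern of exactly one data block followed by both
FAT blocks.  Those are simplifying assumptions, not measurements of any real
drive, and the function name and parameters are hypothetical.

```python
from collections import deque


def simulate_sync_writes(data_blocks=64, spare_blocks=30, clusters=1000):
    """Return (share of spares last used as FAT, per-block erase counts)."""
    FAT1, FAT2 = 0, 1                    # assumed: each FAT is one block
    logical_total = 2 + data_blocks
    total = logical_total + spare_blocks
    mapping = list(range(logical_total))
    # Each spare remembers the role it last played before being retired.
    spares = deque((p, "unused") for p in range(logical_total, total))
    erases = [0] * total

    def write(logical, role):
        new, _ = spares.popleft()        # take a block from the head
        erases[new] += 1
        spares.append((mapping[logical], role))  # retire the old block
        mapping[logical] = new

    for i in range(clusters):
        write(2 + i % data_blocks, "data")   # one data cluster...
        write(FAT1, "fat")                   # ...then both FAT copies
        write(FAT2, "fat")

    fat_share = sum(1 for _, role in spares if role == "fat") / len(spares)
    return fat_share, erases
```

	Under these assumptions two of every three retired blocks are FAT
copies, so the FAT share of the spares list settles at about two thirds once
the list has turned over.  The erase counts show how that wear is actually
distributed, which depends heavily on how many data blocks are being
rewritten versus sitting static.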

	The up side is that, once a beat-up old FAT table block does get
allocated to a file data block, it gets to retire in comfort and not get
rewritten until the file gets rewritten.  But...  That's reducing the
pool of circulating blocks in the allocated file system...  So, a file
system that's full is going to rotate through its spare and free blocks
faster as well...  Some pluses...  Some minuses...

	It would seem like this would work well for something like a camera (or
a Mars Rover) where you are periodically removing almost everything from
the flash memory and all the blocks have a chance to return to the
spares list.  But I see lots of possibilities for degrading the wear
leveling in other cases...

	Now...  Flaw recovery could be a big help there.  Write the block but
notice that the old one is now bad and don't add it back or the new one
failed and you grab another.  But then your spares list shrinks.
Failure occurs on the first failure where the spares list hits zero.
Probability (in the FAT thumping case with the sync option) is that it's
going to be a FAT block that takes the hit and takes the entire drive
out.
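
	A rough sketch of the shrinking spares list, again in Python.  It
assumes every block has the same fixed endurance and models only the first
case above: a retired block that has reached its limit is dropped instead
of being put back on the list.  The names are hypothetical and real
controllers will differ.

```python
from collections import deque


class DeviceDead(Exception):
    """Raised when a write needs a spare and the list is empty."""


class RetiringPool:
    def __init__(self, logical_blocks, spare_blocks, endurance):
        total = logical_blocks + spare_blocks
        self.mapping = list(range(logical_blocks))
        self.spares = deque(range(logical_blocks, total))
        self.erases = [0] * total
        self.endurance = endurance       # assumed identical for all blocks

    def write(self, logical):
        if not self.spares:
            # First write that finds no spare takes the whole device out.
            raise DeviceDead(f"no spares left writing logical block {logical}")
        new = self.spares.popleft()
        self.erases[new] += 1
        old = self.mapping[logical]
        self.mapping[logical] = new
        if self.erases[old] < self.endurance:
            self.spares.append(old)      # still healthy: back on the tail
        # Otherwise the old block is worn out and the list shrinks by one.
        return new
```

	The exception message records which logical block was being written
when the list ran dry, so driving this with a sync-style write pattern
shows which kind of block takes the final hit.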

	Am I seeing this correctly?  Seems to me that the wear leveling is not
going to be nearly as effective as it should be in the case where we are
beating up on the FAT simply because of this systematic bias the sync
option introduces into the write patterns on a FAT file system.  And
that will be aggravated by a significant load of static data.  If
anything, the "sync" option almost appears to be defeating the wear
leveling logic on FAT and VFAT file systems.

> > 	I'm also going to file a couple of bug reports in bugzilla at RedHat
> > but this seems to be a more fundamental problem than a RedHat specific
> > problem.  But, IMHO, they should never be setting that damn sync flag
> > arbitrarily.
> 
> It sounds like you need to find a vendor who makes decent keys. For
> that matter several vendors now offer life time guarantees with their
> USB flash media.
> 
> Sync gets set by RH because it seemed the right thing to do to handle
> random user device pulls. Now O_SYNC works so excessively well on
> fat/vfat that needs looking at - and as you say likewise perhaps the
> nature of the FAT rewriting.
> 
> However it's not a media issue, it's primarily a performance issue.
> 
> Alan

	Mike
-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com  
  /\/\|=mhw=|\/\/       |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!
