From: Chris Snook <csnook@redhat.com>
To: "Jörn Engel" <joern@logfs.org>
Cc: Stefan Monnier <monnier@iro.umontreal.ca>, linux-kernel@vger.kernel.org
Subject: Re: Filesystem for block devices using flash storage?
Date: Mon, 13 Oct 2008 13:30:29 -0400
Message-ID: <48F385B5.1040503@redhat.com>
In-Reply-To: <20081012143505.GA15799@logfs.org>

Jörn Engel wrote:
> On Wed, 8 October 2008 16:51:46 -0400, Chris Snook wrote:
>> Stefan Monnier wrote:
>>
>> Writes to magnetic disks are functionally atomic at the sector level. With
>> SSDs, writing requires an erase followed by rewriting the sectors that
>> aren't changing. This means that an ill-timed power loss can corrupt an
>> entire erase block, which could be up to 256k on some MLC flash. Unless
>
> What makes you think that? The standard mode of operation in El Cheapo
> devices is to write to a new eraseblock first, then delete the old one.
> An ill-timed power loss results in either the old or the new block being
> valid as a whole. This has been the standard ever since you could buy
> 4MB compactflash cards.
>
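For reference, here is a minimal sketch of the write-new-then-delete-old scheme described above. It is only a toy model, not any real controller's firmware: the TinyFlash class, its two-block layout, and the power_loss flag are hypothetical names made up for illustration, and the mapping switch is simply assumed to be atomic.

```python
ERASED = 0xFF  # erased flash cells read back as all ones


class TinyFlash:
    """Hypothetical device: one logical block plus spare eraseblocks."""

    def __init__(self, nblocks=2, block_size=8):
        self.blocks = [bytearray([ERASED] * block_size) for _ in range(nblocks)]
        self.live = 0                        # physical block holding valid data
        self.free = list(range(1, nblocks))  # erased spare blocks

    def read(self):
        return bytes(self.blocks[self.live])

    def update(self, offset, data, power_loss=False):
        if offset + len(data) > len(self.blocks[self.live]):
            raise ValueError("write does not fit in the eraseblock")
        new = self.free.pop()
        # Copy the unchanged data and merge the new bytes into the spare block.
        merged = bytearray(self.blocks[self.live])
        merged[offset:offset + len(data)] = data
        self.blocks[new][:] = merged
        if power_loss:
            # The mapping was never switched, so the old block is still valid
            # as a whole. Recovery erases the orphaned copy and frees it.
            self.blocks[new][:] = bytes([ERASED] * len(merged))
            self.free.append(new)
            return
        old, self.live = self.live, new      # switch the mapping (assumed atomic)
        self.blocks[old][:] = bytes([ERASED] * len(merged))  # erase the old block
        self.free.append(old)


dev = TinyFlash()
dev.update(0, b"ab")
dev.update(2, b"cd", power_loss=True)  # interrupted update
after_crash = dev.read()               # still the old contents, intact
dev.update(2, b"cd")                   # retried update completes
after_retry = dev.read()
```

In this model an interrupted update leaves either the old block or the new block valid, never a mix of the two. Whether a given device really behaves this way depends on its controller.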
>> logfs tries to solve the write amplification problem by forcing all write
>> activity to be sequential. I'm not sure how mature it is.
>
> Still under development. What exactly do you mean by the write
> amplification problem?

Write amplification is when a 512-byte write turns into a 128k write,
because the device has to rewrite the entire erase block.

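As a worked example of that definition, the factor is simply the bytes the flash actually writes divided by the bytes the host asked to write. The function name and constants below are illustrative only, and the sketch assumes the worst case where every sector write rewrites one full 128k erase block.

```python
def write_amplification(host_bytes, flash_bytes):
    """Ratio of bytes physically written to flash vs. bytes the host wrote."""
    return flash_bytes / host_bytes


SECTOR = 512                # bytes the host writes
ERASE_BLOCK = 128 * 1024    # bytes rewritten (assumed erase block size)

waf = write_amplification(SECTOR, ERASE_BLOCK)
print(waf)  # 256.0
```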
>>> Or is there some hope for SSDs to provide access to the MTD layer in the
>>> not too distant future?
>> I hope not. The proper fix is to have the devices report their physical
>> topology via SCSI/ATA commands. This allows dumb software to function
>> correctly, albeit inefficiently, and allows smart software to optimize
>> itself. This technique also helps with RAID arrays, large-sector disks, etc.
>
> Having access to the actual flash would provide a large number of
> benefits. It just isn't a safe default choice at the moment.
>
>> I suspect that in the long run, the problem will go away. Erase blocks are
>> a relic of the days when flash was used primarily for low-power,
>> read-mostly applications. As the SSD market heats up, the flash vendors
>> will move to smaller erase blocks, possibly as small as the sector size.
>
> Do you have any information to back this claim? AFAICT smaller erase
> blocks would require more chip area per bit, making devices more
> expensive. If anything, I can see a trend towards bigger erase blocks.

Intel is claiming a write amplification factor of 1.1. Either they're
using very small erase blocks, or they're doing something very smart in
the controller.
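
To put numbers on those two possibilities, here is a rough sketch. Both models are my own assumptions, not anything Intel has described: the first assumes every 512-byte write rewrites exactly one erase block, and the second assumes a remapping controller that appends each sector once and relocates some extra sectors during garbage collection. The gc_sectors value is picked purely to reproduce the 1.1 figure.

```python
SECTOR = 512  # bytes per host write


def implied_erase_block(waf, sector=SECTOR):
    """Erase block size if each sector write rewrote one whole erase block."""
    return waf * sector


def remapping_waf(host_sectors, gc_sectors, sector=SECTOR):
    """WAF for a hypothetical controller that appends every host sector once
    and additionally copies gc_sectors sectors during garbage collection."""
    host_bytes = host_sectors * sector
    flash_bytes = (host_sectors + gc_sectors) * sector
    return flash_bytes / host_bytes


# Possibility 1: very small erase blocks, roughly 563 bytes each.
tiny_block = implied_erase_block(1.1)

# Possibility 2: a smart controller relocating 100 sectors per 1000 written.
smart = remapping_waf(host_sectors=1000, gc_sectors=100)
```

Under the first model the erase block would have to be barely larger than a sector. Under the second, the erase block size drops out of the ratio entirely and only the garbage collection overhead matters.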
-- Chris