public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Arnd Bergmann <arnd@arndb.de>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Andrei Warkentin <andreiw@motorola.com>,
	linux-mmc@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [RFC 4/5] MMC: Adjust unaligned write accesses.
Date: Tue, 22 Mar 2011 14:56:31 +0100	[thread overview]
Message-ID: <201103221456.32151.arnd@arndb.de> (raw)
In-Reply-To: <7BE97618-725C-4BFA-9FE5-59C893BDA097@dilger.ca>

On Tuesday 22 March 2011, Andreas Dilger wrote:
> On 2011-03-21, at 8:05 PM, Arnd Bergmann wrote:
> > On Monday 21 March 2011 19:03:09 Andreas Dilger wrote:
> >> Note that mballoc was specifically designed to handle allocation
> >> requests that are aligned on RAID stripe boundaries, so it should
> >> be able to handle this for MMC as well.  What is needed is to tell
> >> the filesystem what the underlying alignment is.  That can be done
> >> at format time with mke2fs or afterward with tune2fs by using the
> >> "-E stripe_width" option.
> > 
> > Ah, that sounds useful. So would I set the stripe_width to the
> > erase block size, and the block group size to a multiple of that?
> 
> When you write "block group size" do you mean the ext4 block group? 

Yes.

> Then yes it would help.  You could also consider setting the flex_bg
> size to a multiple of this, so that the bitmap blocks are grouped as
> a multiple of this size.  However, they may not be aligned correctly,
> which needs extra effort that isn't obvious.  
> 
> I think it would be nice to have mke2fs take the stripe_width and/or
> flex_bg factor into account when sizing/aligning the bitmaps, but it
> doesn't yet.

A few more questions: 

* On cards that can only write to a single erase block at a time,
should I make the block group size the same as the as the erase
block? I suppose writing both block bitmaps, inode and data to
separate erase blocks would create multiple eraseblock
read-modify-write cycles for every single file otherwise.

* Is it guaranteed that inode bitmap, inode, block bitmap and
blocks are always written in low-to-high sector order within
one ext4 block group? A lot of the drives will do a garbage-collect
step (adding hundreds of miliseconds) every time you move back
inside of the eraseblock.

* Is there any way to make ext4 use effective blocks larger
than 4 KB? The most common size for a NAND flash page is 16
KB right (effectively, ignoring what the hardware does), so
it would be good to never write smaller.

* Calling TRIM on SD cards is probably counterproductive unless
you trim entire erase blocks. Is that even possible with ext4,
assuming that we use block group == erase block?

* Is there a way to put the journal into specific parts of the
drive? Almost all SD cards have an area in the second 4 MB
(more for larger cards) that can be written using random access
without forcing garbage collection on other parts.

> > Does this also work in (rare) cases where the erase block size is
> > not a power of two?
> 
> It does (or is supposed to), but that isn't code that is exercised
> very much (most installations use a power-of-two size).

Ok. Recently, cheap TLC (three-level cell, 3-bit MLC) NAND is
becoming popular. I've seen erase block sizes of 6 MiB, 1376 KiB
(4096 / 3, rounded up) and 4128 KiB (1376 * 3) because of this, in
place of the common 4096 KiB. The SD card standard specifies
values of 12 MB and 24 MB aside from the usual power-of-two values
up to 64 MB for large cards (>32GB), while smaller cards are allowed
only up to 4 MB erase blocks and need to be power-of-two. Many
cards do not use the size they claim in their registers.

	Arnd

  reply	other threads:[~2011-03-22 13:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1299718449-15172-1-git-send-email-andreiw@motorola.com>
     [not found] ` <AANLkTimV-no9Wk4wbS4gQGdSgq-2L=ims6SXDFrdEZAe@mail.gmail.com>
     [not found]   ` <AANLkTinEQEwa2SqEwTnbe3kcYuDoM-ZbzWE7X+V3B+zV@mail.gmail.com>
2011-03-21 14:21     ` [RFC 4/5] MMC: Adjust unaligned write accesses Arnd Bergmann
2011-03-21 14:41       ` Andrei Warkentin
2011-03-21 18:03         ` Andreas Dilger
2011-03-21 19:05           ` Arnd Bergmann
2011-03-21 23:58             ` Andreas Dilger
2011-03-22 13:56               ` Arnd Bergmann [this message]
2011-03-22 15:02                 ` Andreas Dilger
2011-03-22 15:44                   ` Arnd Bergmann
2011-03-21 14:27 Fwd: " Arnd Bergmann
2011-03-21 23:45 ` Andreas Dilger
2011-03-22  7:18   ` Andrei Warkentin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201103221456.32151.arnd@arndb.de \
    --to=arnd@arndb.de \
    --cc=adilger@dilger.ca \
    --cc=andreiw@motorola.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-mmc@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox