From: Arnd Bergmann <arnd.bergmann@linaro.org>
To: "Ted Ts'o" <tytso@mit.edu>
Cc: Alex Lemberg <Alex.Lemberg@sandisk.com>,
HYOJIN JEONG <syr.jeong@samsung.com>,
Saugata Das <saugata.das@linaro.org>,
Artem Bityutskiy <dedekind1@gmail.com>,
Saugata Das <saugata.das@stericsson.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mmc@vger.kernel.org, patches@linaro.org, venkat@linaro.org,
"Luca Porzio (lporzio)" <lporzio@micron.com>
Subject: Re: [PATCH 2/3] ext4: Context support
Date: Sat, 16 Jun 2012 17:41:23 +0000
Message-ID: <201206161741.23900.arnd.bergmann@linaro.org>
In-Reply-To: <20120616134923.GA12140@thunk.org>
On Saturday 16 June 2012, Ted Ts'o wrote:
> On Sat, Jun 16, 2012 at 07:26:07AM +0000, Arnd Bergmann wrote:
> > > Oh, that's cool. And I don't think that's hard to do. We could just
> > > keep a flag in the in-core inode indicating whether it is in "large
> > > unit" mode. If it is in large unit mode, we can make the fs writeback
> > > function make sure that we adhere to the restrictions of the large
> > > unit mode, and if at any point we need to do something that might
> > > violate the constraints, the file system would simply close the
> > > context.
> >
> > Really? I actually had expected this to be a major issue, to the
> > point that I thought we would only ever do large contexts in
> > special eMMC-optimized file systems.
>
> Yeah, it's easy for file systems (like ext4) which have delayed
> allocation. It's always faster to write in large contiguous chunks,
> so we do a lot of work to make sure that can happen. Take a look
> at a blktrace of ext4 when writing a large set of files; most of
> the I/O will be in contiguous, large chunks. So it's just a matter
> of telling the block device layer when we are about to do that large
> write. We could probably do some tuning to make the chunks larger
> and adjust some parameters in the block allocation, but that's easy.
>
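(For reference, the access pattern Ted describes is easy to see with
blktrace on a live device; the device node here is just an example:

    blktrace -d /dev/mmcblk0 -o - | blkparse -i -

Most of the writeback shows up as long runs of back-to-back write
requests.)
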
> One thing which is going to be tricky is that ext4 currently uses a
> buddy allocator, so it will work well for erase block sizes that are
> powers of two. You mentioned some devices might have erase block
> sizes of 3*2**N, so that might require reworking the block allocator
> somewhat, if we need to align writes on erase block boundaries.
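To illustrate the alignment point: with a power-of-two erase block,
rounding to a boundary is a simple mask operation, but with 3*2**N it
takes a real division. A minimal standalone sketch (plain C, nothing
ext4-specific):

    #include <stdio.h>

    /* Round a block number down/up to an erase-block boundary.
     * For power-of-two sizes this reduces to bit masking; for
     * 3*2**N sizes it needs an actual modulo. */
    static unsigned long long align_down(unsigned long long block,
                                         unsigned long long ebs)
    {
            return block - (block % ebs);
    }

    static unsigned long long align_up(unsigned long long block,
                                       unsigned long long ebs)
    {
            return align_down(block + ebs - 1, ebs);
    }

    int main(void)
    {
            unsigned long long ebs = 3ULL << 11; /* 6144 blocks, 3*2**11 */

            printf("%llu %llu\n", align_down(10000, ebs),
                   align_up(10000, ebs)); /* prints: 6144 12288 */
            return 0;
    }
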
What about the other restrictions I mentioned, though? If we use
large-unit write-only contexts, it is not just about writing the entire
erase block from start to end; we also have to follow these rules (a
small model of the resulting state machine follows the list):
* We cannot read from a write-only large-unit context, so we have to
  do one of these:
  a) ensure we never drop any pages from the page cache between writing
     them to the large context and closing that context
  b) if we need to read some data that we have just written to the
     large-unit context, close that context and open a new rw context
     without the large-unit flag set (or write in the default context)
* All writes to the large-unit context have to be done in superpage-sized
  chunks, typically between 8 and 32 KiB, i.e. larger than the underlying
  fs block size
* We can only start the large unit at the start of an erase block. If
  we unmount the drive and later continue writing, we have to continue
  without the large-unit flag at first, until we hit an erase block
  boundary.
* If we run out of contexts in the block device, we might have to
close a large-unit context before getting to the end of it.
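To make these rules concrete, here is a minimal, self-contained model
of the state machine a file system would have to implement; all the
names are made up for illustration, none of this is code from the
patch set:

    #include <stdio.h>

    enum ctx_mode { CTX_DEFAULT, CTX_LARGE_WRONLY };

    struct fs_ctx {
            enum ctx_mode mode;
            unsigned long long next_pos;  /* next expected sequential offset */
            unsigned long long superpage; /* e.g. 16384 bytes */
    };

    /* Writes may stay in the large-unit context only while they are
     * superpage-sized and strictly sequential; anything else forces
     * the context closed, falling back to the default context. */
    static void ctx_write(struct fs_ctx *c, unsigned long long pos,
                          unsigned long long len)
    {
            if (c->mode == CTX_LARGE_WRONLY &&
                (len % c->superpage || pos != c->next_pos))
                    c->mode = CTX_DEFAULT;
            c->next_pos = pos + len;
    }

    /* No reads from a write-only large-unit context: reading back
     * freshly written data closes it first (cases a/b above). */
    static void ctx_read(struct fs_ctx *c)
    {
            if (c->mode == CTX_LARGE_WRONLY)
                    c->mode = CTX_DEFAULT;
    }

    int main(void)
    {
            struct fs_ctx c = { CTX_LARGE_WRONLY, 0, 16384 };

            ctx_write(&c, 0, 16384); /* sequential superpage: stays open */
            ctx_read(&c);            /* read-back: must fall back */
            printf("mode after read-back: %d\n", c.mode); /* 0 == CTX_DEFAULT */
            return 0;
    }
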
> > > Well, I'm interested in getting something upstream, which is useful
> > > not just for the consumer-grade eMMC devices in handsets, but which
> > > might also be extensible to SSD's, and all the way up to PCIe-attached
> > > flash devices that might be used in large data centers.
> > >
> >
> > I am not aware of any actual SSD technology that would take advantage
> > of it, but at least the upcoming UFS standard that is supposed to
> > replace eMMC should do it, and it's somewhere in between an eMMC and
> > an SSD in many ways.
>
> I'm not aware that anything has been announced, but this is one of
> those things which the high-end folks have *got* to be thinking about.
> The issues involved aren't just for eMMC, you know... :-)
My impression was always that the high-end storage folks try to make
everything behave nicely whatever the access patterns are, and they
can do it because an SSD controller has vast amounts of cache (megabytes,
not kilobytes) and processing power (e.g. a 1 GHz ARMv5 instead of a
50 MHz 8051) to handle it, and they also make use of tagged command
queuing to let the device have multiple outstanding requests.
Arnd