From: Matt Mackall <mpm@selenic.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>,
Artem Bityutskiy <dedekind@infradead.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Frank Haverkamp <haver@vnet.ibm.com>,
Christoph Hellwig <hch@infradead.org>,
David Woodhouse <dwmw2@infradead.org>
Subject: Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
Date: Mon, 19 Mar 2007 17:32:05 -0500
Message-ID: <20070319223205.GZ4892@waste.org>
In-Reply-To: <1174338329.13341.633.camel@localhost.localdomain>
On Mon, Mar 19, 2007 at 10:05:29PM +0100, Thomas Gleixner wrote:
> On Mon, 2007-03-19 at 14:54 -0500, Matt Mackall wrote:
> > > (UBI also has static volumes which LVM doesn't but that is an aside.)
> >
> > If a static volume is simply a non-dynamic volume, then device mapper
> > can do that too. And countless other things. Which is not an aside.
> > UBI growing to do all the things that device mapper does is exactly
> > the thing we should be seeking to avoid.
>
> No it can't and device mapper sits on top of block devices. FLASH is no
> block device. Period.
Which of the following two properties does it lack?
- discrete blocks
- non-sequential access to blocks
When you do the obvious s/blocks/eraseblocks/, flash appears to have
both.
Saying "but I can't do I/O smaller than the blocksize" doesn't change
this any more than it would for disks.
Saying "but I can do smaller I/O efficiently in some circumstances"
also doesn't change it.
In historical UNIX, some tapes were block devices too. Because they
supported seek().
> Device mapper cannot provide a simple, easy-to-decode scheme for boot
> loaders. We need to be able to boot out of 512 - 2048 bytes of NAND FLASH
> and be able to find the kernel or second stage boot loader in this
> unordered device.
>
> And no, fixed addresses do not work. Do you want to implement device
> mapper in your Initial Bootloader stage?
This is exactly the same problem as booting on a desktop PC. But
somehow LILO manages. My first Linux box had a hell of a lot less disk
than the platform I bootstrapped (and wrote NAND drivers for) last
month had in NAND.
> > > That's why I suggested fixing the MTD layers that present block devices
> > > first in the part of my reply that you cut off. It seems to me that
> > > you're really after getting flash to look like a block device, which
> > > would enable device mapper to be used for something similar to UBI.
> > > That's fine, but until someone does that work UBI fills a need, has
> > > users, and has an existing implementation.
> >
> > False starts that get mainlined delay or prevent things getting done
> > right. The question is and remains "is UBI the right way to do
> > things?" Not "is UBI the easiest way to do things?" or "is UBI
> > something people have already adopted?"
> >
> > If the right way is instead to extend the block layer and device
> > mapper to encompass the quirks of NAND in a sensible fashion, then UBI
> > should not go in.
>
> No, block layer on top of FLASH needs 80% of the functionality of UBI in
> the first place.
Incorrect. A block-based filesystem on top of flash needs this
functionality. But a block device suitable to device mapper layering
(which then provides the functionality) does not.
> You need to implement a clever journalling block device
> emulator in order to keep the data alive and the FLASH not worn out
> within no time. You need the wear levelling, otherwise you can throw
> away your FLASH in no time.
And that's why it's in my picture.
> > Let me draw a picture so we have something to argue about:
> >
> >                                iSCSI/nbd(6)
> >                                     |
> >  filesystem    {  swap  |  ext3    ext3     jffs2
> >                     \   |   |        |       /
> >                 /    \  |  dm-crypt->snapshot(5)  /
> >  device mapper -|     \  \  |                   /
> >                 |     partitioning             /
> >                 |          |     partitioning(4)
> >                 |     wear leveling(3)   /
> >                 |          |            /
> >                 |     block concatenation
> >                 |       |   |   |   |
> >                  \    bad block remapping(2)
> >                         |   |   |   |
> >  MTD raw block  {   raw block devices with no smarts(1)
> >                       /    |    \    \
> >  hardware       {   NAND  NAND  NAND  NAND
> >
> > Notes:
> > 1. This would provide a block device that allowed writing pages and
> > a secondary method for erasing whole blocks as well as a method for
> > querying/setting out of band information.
>
> Forget about OOB data. OOB data is reserved for ECC. Please read the
> recommendations of the NAND FLASH manufacturers. NAND gets less reliable
> with higher density devices and smaller processes.
>
> > 2. This would hide erase blocks either by using an embedded table or
> > out of band info. This could stack on top of block concatenation if
> > desired.
>
> Hide erase blocks ? UBI does not hide anything. It maps logical
> eraseblocks, which are exposed to the clients to arbitrary physical
> eraseblocks on the FLASH device in order to provide across device wear
> levelling.
Sorry, I meant hiding bad blocks here. That's why this layer was
labeled "bad block remapping".
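
To make that layer concrete, here is a minimal sketch of bad block
remapping, in Python for brevity rather than kernel C. Everything in it
is an assumption for illustration: the names (BadBlockRemap, n_spare),
the policy of reserving the last few eraseblocks as spares, and the
in-memory table. A real layer would have to persist the table, either
embedded on flash or in out of band info.

```python
class BadBlockRemap:
    """Present a contiguous run of good eraseblocks on top of a raw device.

    Hypothetical model: the last n_spare physical blocks are held in
    reserve and substituted for blocks that go bad.
    """

    def __init__(self, n_physical, n_spare):
        if not 0 <= n_spare < n_physical:
            raise ValueError("n_spare must be in [0, n_physical)")
        self.n_logical = n_physical - n_spare
        # Spare physical blocks, handed out in ascending order.
        self.spares = list(range(self.n_logical, n_physical))
        # logical block -> replacement physical block (only for remapped ones)
        self.remap = {}

    def _check(self, logical):
        if not 0 <= logical < self.n_logical:
            raise IndexError(f"logical block {logical} out of range")

    def mark_bad(self, logical):
        """Redirect a logical block whose current physical block failed."""
        self._check(logical)
        if not self.spares:
            raise RuntimeError("out of spare eraseblocks")
        self.remap[logical] = self.spares.pop(0)

    def resolve(self, logical):
        """Return the physical eraseblock currently backing a logical one."""
        self._check(logical)
        return self.remap.get(logical, logical)
```

Layers above only ever see logical block numbers 0..n_logical-1, which
is the whole point: nothing higher in the stack needs to know which
physical blocks are bad.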
> > 3. This would provide wear leveling, and probably simultaneously
> > provide relatively efficient and safe access to write sector
> > and page-sized I/O. Below this level, things had better be
> > comfortable with the limitations of NAND if they want to work well.
>
> I don't see how this provides across device wear levelling.
Because the layer immediately beneath it ("block concatenation") takes
N devices and presents one logical device.
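
A rough sketch of why that is enough, again in Python and again with
made-up names (Concat, locate, least_worn): once concatenation has
flattened N devices into one logical block space, a wear leveler that
simply picks the least-erased logical block is already leveling across
devices. The erase counts below are invented numbers, and the "pick the
minimum" policy is a stand-in for whatever a real wear leveler would do.

```python
from bisect import bisect_right
from itertools import accumulate


class Concat:
    """Present N devices (sizes in eraseblocks) as one logical device."""

    def __init__(self, sizes):
        if not sizes or any(s <= 0 for s in sizes):
            raise ValueError("need at least one device, all sizes > 0")
        # ends[i] is the first logical block *after* device i.
        self.ends = list(accumulate(sizes))

    @property
    def total(self):
        return self.ends[-1]

    def locate(self, logical):
        """Map a logical block to (device index, block within device)."""
        if not 0 <= logical < self.total:
            raise IndexError(f"logical block {logical} out of range")
        dev = bisect_right(self.ends, logical)
        start = self.ends[dev - 1] if dev else 0
        return dev, logical - start


def least_worn(erase_counts):
    """Index of the logical block with the fewest erases (first on ties)."""
    return min(range(len(erase_counts)), key=erase_counts.__getitem__)


# Three chips of 4, 4 and 2 eraseblocks; the leveler sees 10 blocks.
concat = Concat([4, 4, 2])
counts = [5, 5, 5, 5, 2, 5, 5, 5, 1, 5]
target = concat.locate(least_worn(counts))  # lands on the third chip
```

The leveler never asks which chip a block lives on; the layer beneath
answers that question for it.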
> > 4. JFFS2 has its own wear-leveling scheme, as do several other
> > filesystems, so they probably want to bypass this piece of the stack.
>
> JFFS2 on top of UBI delegates the wear levelling to UBI, as JFFS2's own
> wear levelling sucks.
Ok, fine. How about LogFS, then?
> > 5. We don't reimplement higher pieces of the stack (dm-crypt,
> > snapshot, etc.).
>
> Why should we reimplement that ?
So that you can get encryption and snapshot, etc.?
> > 6. We make some things possible that simply aren't otherwise.
> >
> > And this picture isn't even interesting yet. Imagine a dm-cache layer
> > that caches data read from disks in high-speed flash. Or using
> > dm-mirror to mirror writes to local flash over NBD or to a USB drive.
> > Neither of these can be done 'right' in a stack split between device
> > mapper and UBI.
>
> Err. Implement a clever block layer on top of UBI and use all the
> goodies you want including device mapper.
If I wanted to have both device mapper and device mapper's little
brother in my kernel, I wouldn't have started this thread.
--
Mathematics is the supreme nostalgia of our time.