public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Matt Mackall <mpm@selenic.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>,
	Artem Bityutskiy <dedekind@infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Frank Haverkamp <haver@vnet.ibm.com>,
	Christoph Hellwig <hch@infradead.org>,
	David Woodhouse <dwmw2@infradead.org>
Subject: Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
Date: Mon, 19 Mar 2007 17:32:05 -0500	[thread overview]
Message-ID: <20070319223205.GZ4892@waste.org> (raw)
In-Reply-To: <1174338329.13341.633.camel@localhost.localdomain>

On Mon, Mar 19, 2007 at 10:05:29PM +0100, Thomas Gleixner wrote:
> On Mon, 2007-03-19 at 14:54 -0500, Matt Mackall wrote:
> > > (UBI also has static volumes which LVM doesn't but that is an aside.)
> > 
> > If a static volume is simply a non-dynamic volume, then device mapper
> > can do that too. And countless other things. Which is not an aside.
> > UBI growing to do all the things that device mapper does is exactly
> > the thing we should be seeking to avoid.
> 
> No it can't and device mapper sits on top of block devices. FLASH is no
> block device. Period.

Which of the following two properties does it lack?

- discrete blocks
- non-sequential access to blocks

When you do the obvious s/blocks/eraseblocks/, this appears to be
true.

Saying "but I can't do I/O smaller than the blocksize" doesn't change
this any more than it would for disks.

Saying "but I can do smaller I/O efficiently in some circumstances"
also doesn't change it.

In historical UNIX, some tapes were block devices too. Because they
supported seek().

> Device mapper can not provide a simple easy to decode scheme for boot
> loaders. We need to be able to boot out of 512 - 2048 byte of NAND FLASH
> and be able to find the kernel or second stage boot loader in this
> unordered device.
> 
> And no, fixed addresses do not work. Do you want to implement device
> mapper into your Initialial Bootloader stage ?

This is exactly the same problem as booting on a desktop PC. But
somehow LILO manages. My first Linux box had a hell of a lot less disk
than the platform I bootstrapped (and wrote NAND drivers for) last
month had in NAND.

> > > That's why I suggested fixing the MTD layers that present block devices
> > > first in the part of my reply that you cut off.  It seems to me that
> > > you're really after getting flash to look like a block device, which
> > > would enable device mapper to be used for something similar to UBI.
> > > That's fine, but until someone does that work UBI fills a need, has
> > > users, and has an existing implementation.
> > 
> > False starts that get mainlined delay or prevent things getting done
> > right. The question is and remains "is UBI the right way to do
> > things?" Not "is UBI the easiest way to do things?" or "is UBI
> > something people have already adopted?"
> > 
> > If the right way is instead to extend the block layer and device
> > mapper to encompass the quirks of NAND in a sensible fashion, then UBI
> > should not go in.
> 
> No, block layer on top of FLASH needs 80% of the functionality of UBI in
> the first place.

Incorrect. A block-based filesystem on top of flash needs this
functionality. But a block device suitable to device mapper layering
(which then provides the functionality) does not.

> You need to implement a clever journalling block device
> emulator in order to keep the data alive and the FLASH not weared out
> within no time. You need the wear levelling, otherwise you can throw
> away your FLASH in no time.

And that's why it's in my picture.

> > Let me draw a picture so we have something to argue about:
> > 
> >                      iSCSI/nbd(6)
> >                           |
> > filesystem {        swap  |  ext3        ext3     jffs2
> >                       \   |   |            |       /
> >                /       \  | dm-crypt->snapshot(5) /
> > device mapper -|        \ \   |                  /
> >                |         partitioning           /
> >                |              |          partitioning(4)
> >                |        wear leveling(3)  /
> >                |              |          /
> >                |      block concatenation
> >                |       |    |    |     |
> >                \      bad block remapping(2)   
> >                        |    |    |     |
> > MTD raw block {     raw block devices with no smarts(1)
> >                       /     |     \      \
> > hardware {         NAND    NAND   NAND   NAND
> > 
> > Notes:
> > 1. This would provide a block device that allowed writing pages and
> >    a secondary method for erasing whole blocks as well as a method for
> >    querying/setting out of band information.
> 
> Forget about OOB data. OOB data is reserved for ECC. Please read the
> recommendations of the NAND FLASH manufacturers. NAND gets less reliable
> with higher density devices and smaller processes.
> 
> > 2. This would hide erase blocks either by using an embedded table or
> >    out of band info. This could stack on top of block concatenation if
> >    desired.
> 
> Hide erase blocks ? UBI does not hide anything. It maps logical
> eraseblocks, which are exposed to the clients to arbitrary physical
> eraseblocks on the FLASH device in order to provide across device wear
> levelling.

Sorry, I meant hiding bad blocks here. That's why this layer was
labeled "bad block remapping".

> > 3. This would provide wear leveling, and probably simultaneously
> >    provide relatively efficient and safe access to write sector 
> >    and page-sized I/O. Below this level, things had better be
> >    comfortable with the limitations of NAND if they want to work well.
> 
> I don't see how this provides across device wear levelling.

Because the layer immediately beneath it ("block concatenation") takes
N devices and presents one logical device.

> > 4. JFFS2 has its own wear-leving scheme, as do several other
> >    filesystems, so they probably want to bypass this piece of the stack.
> 
> JFFS2 on top of UBI delegates the wear levelling to UBI, as JFFS2s own
> wear levelling sucks. 

Ok, fine. How about LogFS, then?
 
> > 5. We don't reimplement higher pieces of the stack (dm-crypt,
> >    snapshot, etc.).
> 
> Why should we reimplement that ?

So that you can get encryption and snapshot, etc.?

> > 6. We make some things possible that simply aren't otherwise.
> >
> > And this picture isn't even interesting yet. Imagine a dm-cache layer
> > that caches data read from disks in high-speed flash. Or using
> > dm-mirror to mirror writes to local flash over NBD or to a USB drive.
> > Neither of these can be done 'right' in a stack split between device
> > mapper and UBI.
> 
> Err. Implement a clever block layer on top of UBI and use all the
> goodies you want including device mapper.

If I wanted to have both device mapper and device mapper's little
brother in my kernel, I wouldn't have started this thread.

-- 
Mathematics is the supreme nostalgia of our time.

  reply	other threads:[~2007-03-19 22:45 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-14 15:19 [PATCH 00/22 take 3] UBI: Unsorted Block Images Artem Bityutskiy
2007-03-14 15:19 ` [PATCH 01/22 take 3] UBI: on-flash data structures header Artem Bityutskiy
2007-03-14 15:19 ` [PATCH 02/22 take 3] UBI: user-space API header Artem Bityutskiy
2007-03-14 15:19 ` [PATCH 03/22 take 3] UBI: kernel-space " Artem Bityutskiy
2007-03-14 15:19 ` [PATCH 04/22 take 3] UBI: internal header Artem Bityutskiy
2007-03-14 15:19 ` [PATCH 05/22 take 3] UBI: startup code Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 06/22 take 3] UBI: scanning unit Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 07/22 take 3] UBI: I/O unit Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 08/22 take 3] UBI: volume table unit Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 09/22 take 3] UBI: wear-leveling unit Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 10/22 take 3] UBI: EBA unit Artem Bityutskiy
2007-03-15 19:07   ` Andrew Morton
2007-03-15 21:24     ` Randy Dunlap
2007-03-15 23:29       ` Josh Boyer
2007-03-16  1:49         ` Randy Dunlap
2007-03-16 10:23           ` Artem Bityutskiy
2007-03-16 10:21       ` Artem Bityutskiy
2007-03-16 14:55         ` Randy Dunlap
2007-03-16 10:14     ` Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 11/22 take 3] UBI: user-interfaces unit Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 12/22 take 3] UBI: update functionality Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 13/22 take 3] UBI: accounting unit Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 14/22 take 3] UBI: volume management functionality Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 15/22 take 3] UBI: sysfs functionality Artem Bityutskiy
2007-03-14 15:20 ` [PATCH 16/22 take 3] UBI: character devices functionality Artem Bityutskiy
2007-03-14 15:21 ` [PATCH 17/22 take 3] UBI: gluebi functionality Artem Bityutskiy
2007-03-14 15:21 ` [PATCH 18/22 take 3] UBI: misc stuff Artem Bityutskiy
2007-03-14 15:21 ` [PATCH 19/22 take 3] UBI: debugging stuff Artem Bityutskiy
2007-03-14 15:21 ` [PATCH 20/22 take 3] UBI: JFFS2 UBI support Artem Bityutskiy
2007-03-14 15:21 ` [PATCH 21/22 take 3] UBI: update MAINTAINERS Artem Bityutskiy
2007-03-14 15:21 ` [PATCH 22/22 take 3] UBI: Linux build integration Artem Bityutskiy
2007-03-18 16:27 ` [PATCH 00/22 take 3] UBI: Unsorted Block Images Matt Mackall
2007-03-18 16:49   ` Artem Bityutskiy
2007-03-18 19:18     ` Matt Mackall
2007-03-18 20:31       ` Josh Boyer
2007-03-19 17:08         ` Matt Mackall
2007-03-19 18:16           ` Josh Boyer
2007-03-19 19:54             ` Matt Mackall
2007-03-19 20:18               ` Artem Bityutskiy
2007-03-19 21:05               ` Thomas Gleixner
2007-03-19 22:32                 ` Matt Mackall [this message]
2007-03-20  0:42                   ` Thomas Gleixner
2007-03-20  1:05                     ` Matt Mackall
2007-03-20  6:28                       ` Thomas Gleixner
2007-03-21 11:05                     ` Jörn Engel
2007-03-21 11:25                       ` Thomas Gleixner
2007-03-21 11:35                         ` Jörn Engel
2007-03-21 11:57                           ` Thomas Gleixner
2007-03-21 12:31                             ` Jörn Engel
2007-03-21 12:39                               ` Artem Bityutskiy
2007-03-21 11:36                         ` Artem Bityutskiy
2007-03-25 20:08                         ` Jörn Engel
2007-03-25 21:49                           ` David Lang
2007-03-25 22:55                             ` Jörn Engel
2007-03-25 23:46                               ` David Woodhouse
2007-03-26  0:01                                 ` Jörn Engel
2007-03-26  0:21                                   ` David Woodhouse
2007-03-26  1:04                                     ` Jörn Engel
2007-03-26  9:45                                       ` David Woodhouse
2007-03-26  9:51                                         ` Jörn Engel
2007-03-26 10:07                                           ` David Woodhouse
2007-03-26 10:02                                         ` Thomas Gleixner
2007-03-26 10:49                           ` Artem Bityutskiy
2007-03-26 11:30                             ` Jörn Engel
2007-03-19 21:06               ` Artem Bityutskiy
2007-03-19 21:36                 ` Matt Mackall
2007-03-20  0:43                   ` Thomas Gleixner
2007-03-20 12:25                   ` Artem Bityutskiy
2007-03-20 13:52                     ` Theodore Tso
2007-03-20 15:14                       ` Artem Bityutskiy
2007-03-20 15:59                       ` Josh Boyer
2007-03-20 18:58                         ` David Lang
2007-03-20 20:05                           ` Artem Bityutskiy
2007-03-20 21:36                             ` David Woodhouse
2007-03-21  8:54                               ` Artem Bityutskiy
2007-03-20 21:32                           ` David Woodhouse
2007-03-21 13:03                             ` Jörn Engel
2007-03-20 22:03                         ` Theodore Tso
2007-03-21  8:44                           ` Artem Bityutskiy
2007-03-21 13:50                             ` Theodore Tso
2007-03-21 13:59                               ` Josh Boyer
2007-03-21 14:02                               ` Artem Bityutskiy
2007-03-21 15:38                               ` Frank Haverkamp
2007-03-21 20:26                                 ` David Lang
2007-03-20 12:13               ` Josh Boyer
2007-03-19 19:03           ` Thomas Gleixner
2007-03-19 20:12             ` Matt Mackall
2007-03-19 21:04               ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070319223205.GZ4892@waste.org \
    --to=mpm@selenic.com \
    --cc=dedekind@infradead.org \
    --cc=dwmw2@infradead.org \
    --cc=haver@vnet.ibm.com \
    --cc=hch@infradead.org \
    --cc=jwboyer@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox