From: Matt Mackall <mpm@selenic.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>,
	Artem Bityutskiy <dedekind@infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Frank Haverkamp <haver@vnet.ibm.com>,
	Christoph Hellwig <hch@infradead.org>,
	David Woodhouse <dwmw2@infradead.org>
Subject: Re: [PATCH 00/22 take 3] UBI: Unsorted Block Images
Date: Mon, 19 Mar 2007 17:32:05 -0500
Message-ID: <20070319223205.GZ4892@waste.org>
In-Reply-To: <1174338329.13341.633.camel@localhost.localdomain>

On Mon, Mar 19, 2007 at 10:05:29PM +0100, Thomas Gleixner wrote:
> On Mon, 2007-03-19 at 14:54 -0500, Matt Mackall wrote:
> > > (UBI also has static volumes which LVM doesn't but that is an aside.)
> > 
> > If a static volume is simply a non-dynamic volume, then device mapper
> > can do that too. And countless other things. Which is not an aside.
> > UBI growing to do all the things that device mapper does is exactly
> > the thing we should be seeking to avoid.
> 
> No it can't and device mapper sits on top of block devices. FLASH is no
> block device. Period.

Which of the following two properties does it lack?

- discrete blocks
- non-sequential access to blocks

Once you do the obvious s/blocks/eraseblocks/, flash appears to have
both.

Saying "but I can't do I/O smaller than the blocksize" doesn't change
this any more than it would for disks.

Saying "but I can do smaller I/O efficiently in some circumstances"
also doesn't change it.

In historical UNIX, some tapes were block devices too. Because they
supported seek().

> Device mapper can not provide a simple easy to decode scheme for boot
> loaders. We need to be able to boot out of 512 - 2048 bytes of NAND FLASH
> and be able to find the kernel or second stage boot loader in this
> unordered device.
> 
> And no, fixed addresses do not work. Do you want to implement device
> mapper into your Initial Bootloader stage ?

This is exactly the same problem as booting on a desktop PC. But
somehow LILO manages. My first Linux box had a hell of a lot less disk
than the platform I bootstrapped (and wrote NAND drivers for) last
month had in NAND.

> > > That's why I suggested fixing the MTD layers that present block devices
> > > first in the part of my reply that you cut off.  It seems to me that
> > > you're really after getting flash to look like a block device, which
> > > would enable device mapper to be used for something similar to UBI.
> > > That's fine, but until someone does that work UBI fills a need, has
> > > users, and has an existing implementation.
> > 
> > False starts that get mainlined delay or prevent things getting done
> > right. The question is and remains "is UBI the right way to do
> > things?" Not "is UBI the easiest way to do things?" or "is UBI
> > something people have already adopted?"
> > 
> > If the right way is instead to extend the block layer and device
> > mapper to encompass the quirks of NAND in a sensible fashion, then UBI
> > should not go in.
> 
> No, block layer on top of FLASH needs 80% of the functionality of UBI in
> the first place.

Incorrect. A block-based filesystem on top of flash needs this
functionality. But a block device suitable to device mapper layering
(which then provides the functionality) does not.

> You need to implement a clever journalling block device
> emulator in order to keep the data alive and the FLASH not worn out
> within no time. You need the wear levelling, otherwise you can throw
> away your FLASH in no time.

And that's why it's in my picture.

> > Let me draw a picture so we have something to argue about:
> > 
> >                      iSCSI/nbd(6)
> >                           |
> > filesystem {        swap  |  ext3        ext3     jffs2
> >                       \   |   |            |       /
> >                /       \  | dm-crypt->snapshot(5) /
> > device mapper -|        \ \   |                  /
> >                |         partitioning           /
> >                |              |          partitioning(4)
> >                |        wear leveling(3)  /
> >                |              |          /
> >                |      block concatenation
> >                |       |    |    |     |
> >                \      bad block remapping(2)   
> >                        |    |    |     |
> > MTD raw block {     raw block devices with no smarts(1)
> >                       /     |     \      \
> > hardware {         NAND    NAND   NAND   NAND
> > 
> > Notes:
> > 1. This would provide a block device that allowed writing pages and
> >    a secondary method for erasing whole blocks as well as a method for
> >    querying/setting out of band information.
> 
> Forget about OOB data. OOB data is reserved for ECC. Please read the
> recommendations of the NAND FLASH manufacturers. NAND gets less reliable
> with higher density devices and smaller processes.
> 
> > 2. This would hide erase blocks either by using an embedded table or
> >    out of band info. This could stack on top of block concatenation if
> >    desired.
> 
> Hide erase blocks ? UBI does not hide anything. It maps logical
> eraseblocks, which are exposed to the clients to arbitrary physical
> eraseblocks on the FLASH device in order to provide across device wear
> levelling.

Sorry, I meant hiding bad blocks here. That's why this layer was
labeled "bad block remapping".
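
To make that layer concrete, here is a minimal sketch in C of one way
such a remapping could work, assuming a simple static table built from
a bad-block scan. The names remap_build and REMAP_NONE are hypothetical
and are not taken from MTD, UBI, or device mapper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Marker for table slots that have no good physical block behind them. */
#define REMAP_NONE ((unsigned int)-1)

/*
 * Build a logical-to-physical table that skips bad blocks.
 * is_bad[p] is true when physical eraseblock p is marked bad.
 * After the call, table[i] holds the i-th good physical block.
 * Returns the number of usable logical blocks.
 */
static size_t remap_build(const bool *is_bad, size_t nr_phys,
                          unsigned int *table)
{
    size_t logical = 0;

    assert(is_bad != NULL && table != NULL);

    for (size_t p = 0; p < nr_phys; p++) {
        if (!is_bad[p])
            table[logical++] = (unsigned int)p;
    }
    /* Slots past the last good block are explicitly unmapped. */
    for (size_t i = logical; i < nr_phys; i++)
        table[i] = REMAP_NONE;

    return logical;
}
```

This only covers blocks known to be bad at scan time. Blocks that go
bad later would need spare blocks and a persistent table, which the
sketch leaves out.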

> > 3. This would provide wear leveling, and probably simultaneously
> >    provide relatively efficient and safe access to write sector 
> >    and page-sized I/O. Below this level, things had better be
> >    comfortable with the limitations of NAND if they want to work well.
> 
> I don't see how this provides across device wear levelling.

Because the layer immediately beneath it ("block concatenation") takes
N devices and presents one logical device.
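
As a rough illustration of the concatenation step, here is a minimal
sketch in C, assuming plain linear concatenation of eraseblocks. The
struct and function names (raw_dev, concat_loc, concat_map) are
hypothetical and do not come from any existing kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one underlying raw flash device. */
struct raw_dev {
    unsigned int nr_eraseblocks;   /* eraseblocks exposed by this device */
};

/* Where a logical eraseblock of the concatenated device really lives. */
struct concat_loc {
    size_t dev;            /* index into the array of devices */
    unsigned int block;    /* eraseblock number within that device */
};

/*
 * Linear concatenation: the first devs[0].nr_eraseblocks logical blocks
 * live on device 0, the next devs[1].nr_eraseblocks on device 1, etc.
 * Returns 0 on success, -1 if the logical block is out of range.
 */
static int concat_map(const struct raw_dev *devs, size_t nr_devs,
                      unsigned int logical, struct concat_loc *out)
{
    assert(devs != NULL && out != NULL);

    for (size_t i = 0; i < nr_devs; i++) {
        if (logical < devs[i].nr_eraseblocks) {
            out->dev = i;
            out->block = logical;
            return 0;
        }
        /* Not on this device: skip past its blocks. */
        logical -= devs[i].nr_eraseblocks;
    }
    return -1;
}
```

A wear-leveling layer stacked on top of this sees a single range of
eraseblocks, so it can move data between chips without knowing where
one chip ends and the next begins.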

> > 4. JFFS2 has its own wear-leveling scheme, as do several other
> >    filesystems, so they probably want to bypass this piece of the stack.
> 
> JFFS2 on top of UBI delegates the wear levelling to UBI, as JFFS2's own
> wear levelling sucks.

Ok, fine. How about LogFS, then?
 
> > 5. We don't reimplement higher pieces of the stack (dm-crypt,
> >    snapshot, etc.).
> 
> Why should we reimplement that ?

So that you can get encryption and snapshot, etc.?

> > 6. We make some things possible that simply aren't otherwise.
> >
> > And this picture isn't even interesting yet. Imagine a dm-cache layer
> > that caches data read from disks in high-speed flash. Or using
> > dm-mirror to mirror writes to local flash over NBD or to a USB drive.
> > Neither of these can be done 'right' in a stack split between device
> > mapper and UBI.
> 
> Err. Implement a clever block layer on top of UBI and use all the
> goodies you want including device mapper.

If I wanted to have both device mapper and device mapper's little
brother in my kernel, I wouldn't have started this thread.

-- 
Mathematics is the supreme nostalgia of our time.

