Support of removable MTD devices and other advanced features (follow-up from lkml)

public inbox for linux-mtd@lists.infradead.org

* Support of removable MTD devices and other advanced features (follow-up from lkml)
@ 2008-05-20 13:59 Alex Dubov
  2008-05-21  6:47 ` Artem Bityutskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Alex Dubov @ 2008-05-20 13:59 UTC (permalink / raw)
  To: linux-mtd

Greetings.

I was working on SmartMedia and MemoryStick support and found that some
desirable features are missing from MTD core. As the thread on lkml died out I
would like to continue it here.

First, I want to say that I'm not aiming for immediate inclusion of my code. I,
however, want to work toward a target that will be acceptable to all MTD
developers.

Second, I don't want to start a flame war. I just want to hear some
suggestions, if possible.

And third, the architectural switch I'm proposing should not require modification
to low-level chip drivers or to higher-level functionality of the UBI kind. They
can be glued together with relative ease (the callback architecture is quite
flexible). My target is device management and FTL drivers.

Therefore, I propose (and intend to implement) a new architecture for MTD core,
modeled after the block device API. The "alpha" version of it is here:

http://gentoo-wiki.com/User:Oakad/mtd_proposal


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-20 13:59 Support of removable MTD devices and other advanced features (follow-up from lkml) Alex Dubov
@ 2008-05-21  6:47 ` Artem Bityutskiy
  2008-05-21  8:41 ` Jörn Engel
  2008-05-21  9:06 ` David Woodhouse
  2 siblings, 0 replies; 26+ messages in thread
From: Artem Bityutskiy @ 2008-05-21  6:47 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

Hi,

I do not know much about SmartMedia and so on, so I cannot really
comment. I am replying mainly because I am afraid your mail would
otherwise go unanswered. Probably you should keep CCing LKML, not sure.

On Tue, 2008-05-20 at 06:59 -0700, Alex Dubov wrote:
> And third, architectural switch I'm proposing should not require modification
> to low level chip drivers or higher level functionality of UBI kind. They can
> be glued together with relative ease (callback architecture is quite flexible).
> My target is device management and FTL drivers.
> 
> Therefore, I propose (and intend to implement) a new architecture for MTD core,
> modeled after the block device API. The "alpha" version of it is here:
> 
> http://gentoo-wiki.com/User:Oakad/mtd_proposal

I've briefly read through this. I think it is nice having an
asynchronous interface based on requests. Probably UBI/UBIFS would
benefit from it as well. Just a few thoughts:

1. You will probably need to solve the horrible MTD problem - it does
not support devices larger than 4GiB because it uses absolute 32-bit
offsets.
2. You will need to make MTD sysfs-aware.
3. Removable devices support is a separate task as well.
4. IMO you should preserve the low-level flash access interface and the
request-based infrastructure should use it as the back-end. Is this
possible? Or is my understanding of smartmedia incorrect? I thought it
is accessible as bare NAND and you have to support its FTL in software.
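
A minimal sketch of point 1, in userspace C. It assumes a hypothetical
`struct flash_addr` (eraseblock number plus offset, not an existing MTD type)
and only illustrates why an absolute 32-bit offset wraps past 4GiB while a
block/offset pair converted to 64 bits does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address type: eraseblock number plus offset inside it. */
struct flash_addr {
	uint32_t block_no;
	uint32_t block_ofs;
};

/* Convert to a linear byte offset; 64 bits wide, so it cannot wrap here. */
static uint64_t flash_addr_to_linear(struct flash_addr a, uint32_t erasesize)
{
	assert(a.block_ofs < erasesize);
	return (uint64_t)a.block_no * erasesize + a.block_ofs;
}

/* Same computation truncated to 32 bits, as an absolute u32 offset would be. */
static uint32_t flash_addr_to_linear32(struct flash_addr a, uint32_t erasesize)
{
	return (uint32_t)flash_addr_to_linear(a, erasesize);
}
```

With a 128KiB erase size, block 40000 already lies beyond 4GiB, so the
truncated variant silently aliases a much lower address.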

Probably there are other tasks. I think you should work on these smaller
tasks separately and make them go in one by one, instead of trying to
put all together.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-20 13:59 Support of removable MTD devices and other advanced features (follow-up from lkml) Alex Dubov
  2008-05-21  6:47 ` Artem Bityutskiy
@ 2008-05-21  8:41 ` Jörn Engel
  2008-05-22  1:30   ` Alex Dubov
  2008-05-21  9:06 ` David Woodhouse
  2 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-21  8:41 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Tue, 20 May 2008 06:59:18 -0700, Alex Dubov wrote:
> 
> Therefore, I propose (and intend to implement) a new architecture for MTD core,
> modeled after the block device API. The "alpha" version of it is here:
> 
> http://gentoo-wiki.com/User:Oakad/mtd_proposal

Excellent!  I was just about to write my own proposal for some of this
today.  In particular I need asynchronous reads, writes and erases.

So here is a lightly modified version of your mtd_request bits.  Apart
from reformatting and adding some documentation, the changes are:

- No flag for MTD_DATA, as this should be the default
- flags argument becomes int for natural alignment
- struct mtd_address introduced
- added length fields and data pointer

enum mtd_command {
	MTD_READ,
	MTD_WRITE,
	MTD_ERASE,
	MTD_COPY,
	MTD_INVALIDATE
};

/**
 * @block_no:	physical eraseblock number
 * @block_ofs:	offset within physical eraseblock
 */
struct mtd_address {
	u32	block_no;
	u32	block_ofs;
};

#define MTD_FLAG_OOB	0x01
/**
 * @mtd:	underlying memory technology device
 * @command:	read, write, erase, etc.
 * @flags:	additional flags to modify commands
 * @dst:	destination address
 * @buf:	data buffer for read/write
 * @len:	length for read/write
 * @src:	source address - only used for MTD_COPY
 */
struct mtd_request {
	struct mtd_device *mtd;
	enum mtd_command command;
	int		flags;

	u32		dst;
	void		*buf;
	u32		len;
	u32		src;
};
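
A sketch of how a caller might fill in such a request. The helpers
`mtd_prep_read()` and `mtd_prep_copy()` are hypothetical, `struct mtd_device`
is left opaque, and the reading that `dst` carries the flash address for
everything except MTD_COPY is an assumption drawn from the field comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
struct mtd_device;		/* opaque in this sketch */

enum mtd_command { MTD_READ, MTD_WRITE, MTD_ERASE, MTD_COPY, MTD_INVALIDATE };

struct mtd_request {
	struct mtd_device *mtd;
	enum mtd_command command;
	int		flags;
	u32		dst;
	void		*buf;
	u32		len;
	u32		src;
};

/* Hypothetical helper: read len bytes at flash address addr into buf. */
static void mtd_prep_read(struct mtd_request *req, struct mtd_device *mtd,
			  u32 addr, void *buf, u32 len)
{
	assert(req != NULL && buf != NULL && len > 0);
	memset(req, 0, sizeof(*req));
	req->mtd = mtd;
	req->command = MTD_READ;
	req->dst = addr;	/* assumption: dst is the flash address */
	req->buf = buf;
	req->len = len;
}

/* Hypothetical helper: on-device copy, the only command that uses src. */
static void mtd_prep_copy(struct mtd_request *req, struct mtd_device *mtd,
			  u32 src, u32 dst, u32 len)
{
	assert(req != NULL && len > 0);
	memset(req, 0, sizeof(*req));
	req->mtd = mtd;
	req->command = MTD_COPY;
	req->src = src;
	req->dst = dst;
	req->len = len;		/* no data buffer: nothing crosses the bus */
}
```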


Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-20 13:59 Support of removable MTD devices and other advanced features (follow-up from lkml) Alex Dubov
  2008-05-21  6:47 ` Artem Bityutskiy
  2008-05-21  8:41 ` Jörn Engel
@ 2008-05-21  9:06 ` David Woodhouse
  2008-05-21  9:29   ` Jörn Engel
  2 siblings, 1 reply; 26+ messages in thread
From: David Woodhouse @ 2008-05-21  9:06 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Tue, 2008-05-20 at 06:59 -0700, Alex Dubov wrote:
> Therefore, I propose (and intend to implement) a new architecture for MTD core,
> modeled after the block device API. The "alpha" version of it is here:
> 
> http://gentoo-wiki.com/User:Oakad/mtd_proposal

That looks very interesting, and is fairly similar to what we were
thinking. Some of the outstanding problems we need to solve when we
change are:

 - Support for devices larger than 4GiB.
 - Clarifying whether buffers can be used for DMA
 - Partitioning (which is a bit of a hack right now) & concatenation.
 - Removable devices
 - Sysfs presence

Here's one way I think we can get started on this...

 1. Turn all calls to functions like mtd->read(mtd...) into calls to 
    core functions 'mtd_read()' which we can later play with.
 2. Introduce a wrapper which acts on mtd_requests by making calls to
    the existing direct driver functions (much like mtd_blkdevs.c does
    for block requests).
 3. Make our core 'mtd_read()' et al. functions use the mtd_request API.
 4. Shift partitioning into the new layer.
 5. Make it possible for devices to provide the new API directly instead
    of the old one being invoked through the wrappers -- and for user
    modules to use it directly. This is the point at which we fix sysfs.
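
Steps 1 to 3 could look roughly like the following userspace mock-up. It is
only a sketch: `struct mtd_info` is cut down to a single read method with a
32-bit offset, `mtd_submit_compat()` and `ram_read()` are hypothetical names,
and write/erase are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the old-style direct driver interface. */
struct mtd_info {
	uint8_t *mem;		/* backing store of this toy device */
	size_t size;
	int (*read)(struct mtd_info *mtd, uint32_t from, size_t len,
		    size_t *retlen, uint8_t *buf);
};

enum mtd_command { MTD_READ, MTD_WRITE, MTD_ERASE };

struct mtd_request {
	struct mtd_info *mtd;
	enum mtd_command command;
	uint32_t dst;
	void *buf;
	uint32_t len;
};

/* Step 2: service a request by calling the existing driver method. */
static int mtd_submit_compat(struct mtd_request *req)
{
	size_t retlen = 0;
	int err;

	assert(req != NULL && req->mtd != NULL);
	switch (req->command) {
	case MTD_READ:
		err = req->mtd->read(req->mtd, req->dst, req->len, &retlen,
				     req->buf);
		if (err)
			return err;
		return retlen == req->len ? 0 : -1;
	default:
		return -1;	/* write/erase omitted from this sketch */
	}
}

/* Steps 1 and 3: users call mtd_read(), which sits on the request API. */
static int mtd_read(struct mtd_info *mtd, uint32_t from, size_t len,
		    uint8_t *buf)
{
	struct mtd_request req = { mtd, MTD_READ, from, buf, (uint32_t)len };

	return mtd_submit_compat(&req);
}

/* Toy old-style driver: a RAM-backed device. */
static int ram_read(struct mtd_info *mtd, uint32_t from, size_t len,
		    size_t *retlen, uint8_t *buf)
{
	if (from > mtd->size || len > mtd->size - from)
		return -1;
	memcpy(buf, mtd->mem + from, len);
	*retlen = len;
	return 0;
}
```

Users only ever see mtd_read(), so the layer underneath can later be swapped
for a native request-based driver (step 5) without touching them.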

-- 
dwmw2


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-21  9:06 ` David Woodhouse
@ 2008-05-21  9:29   ` Jörn Engel
  2008-05-21 15:20     ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-21  9:29 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd, Alex Dubov

On Wed, 21 May 2008 10:06:07 +0100, David Woodhouse wrote:
> 
>  - Support for devices larger than 4GiB.
>  - Clarifying whether buffers can be used for DMA
>  - Partitioning (which is a bit of a hack right now) & concatenation.

Been there, tried that.  You should still have my patches from 2001
somewhere in your attic. ;)

One of the problems with partitioning is device numbering.  I decided to
use 0 for the main device, 1..15 for the partitions, 16 for the next
main device, etc.  I also removed the read-only devices, which is a
real no-no, as it breaks the current userland interface.

But even the raw partitioning is a surprise at the least.  Given two
devices with two partitions each, you would expect to see something
like:
ls /dev
...
mtda0
mtda1
mtda2
mtdb0
mtdb1
mtdb2

But what you'll get is:
mtd0
mtd0ro
mtd1
mtd1ro
mtd2
mtd2ro
mtd16
mtd16ro
mtd17
mtd17ro
mtd18
mtd18ro

So unless someone has a bright idea or we all decide that it's worth
breaking stuff to fix the current mess, partitioning will remain a mess.
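
The numbering scheme above can be sketched as follows. The helper names and
the MTD_PARTS_PER_DEV constant are hypothetical; the sketch only reproduces
the 16-slots-per-device arithmetic and the resulting node names.

```c
#include <assert.h>
#include <stdio.h>

#define MTD_PARTS_PER_DEV 16	/* slot 0: whole device, 1..15: partitions */

/* Index under the scheme above; -1 if the partition is out of range. */
static int mtd_index(unsigned int dev, unsigned int part)
{
	if (part >= MTD_PARTS_PER_DEV)
		return -1;
	return (int)(dev * MTD_PARTS_PER_DEV + part);
}

/* Format the node name userland would see, e.g. "mtd17" or "mtd17ro". */
static int mtd_node_name(char *out, size_t n, unsigned int dev,
			 unsigned int part, int ro)
{
	int idx = mtd_index(dev, part);

	assert(out != NULL && n > 0);
	if (idx < 0)
		return -1;
	return snprintf(out, n, "mtd%d%s", idx, ro ? "ro" : "");
}
```

The first partition of the second device thus comes out as mtd17, which is
exactly the gap in the numbering shown in the listing.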

>  - Removable devices
>  - Sysfs presence

I'm not entirely sure what this is good for.  But maybe I've just become
too disillusioned with sysfs altogether and wish we'd never digressed
from sysctl.  At least back then we had one crappy interface instead of
three or four.

> Here's one way I think we can get started on this...
> 
>  1. Turn all calls to functions like mtd->read(mtd...) into calls to 
>     core functions 'mtd_read()' which we can later play with.

You mean in fs/jffs2 and similar?  Not sure how useful that would be.
In the end we will need to deal with four combinations:
1. driver uses old interface, user uses old interface
2. driver uses new interface, user uses old interface
3. driver uses old interface, user uses new interface
4. driver uses new interface, user uses new interface

1 is what we have right now, 4 is where we want to be in the future.  In
the meantime we either have to deal with 2 and 3 or go through all
drivers/users and fix them up.  Wrappers are mainly useful to avoid doing
the fixup, which means we'll end up with something akin to arch/ppc/ -
old cruddy code staying in limbo for decades to come.

What I wanted to do is basically add the asynchronous interface, convert
roughly one driver and one user (logfs and ramtd) to play around with
the interface and get some experience.  Once that proves useful we can
decide whether we need plumbing for 2 and 3 or just crawl through all
the dusty code and change it.

Jörn

-- 
It is the mark of an educated mind to be able to entertain a thought
without accepting it.
-- Aristotle


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-21  9:29   ` Jörn Engel
@ 2008-05-21 15:20     ` Alex Dubov
  2008-05-21 15:22       ` David Woodhouse
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-21 15:20 UTC (permalink / raw)
  To: linux-mtd; +Cc: Jörn Engel, David Woodhouse

Thanks for the comments. At this point it's enough for me to see that you're ok
with the idea.

I'll try to implement a working prototype of the proposal, supporting block
device emulation over smartmedia and memorystick (these are the only mtd
devices I have). Hopefully, it should take no more than a couple of weeks. Then
we will be able to play with the working prototype to see what features it must
actually have.

I'll try to take care of the device node mess, to the possible extent.


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-21 15:20     ` Alex Dubov
@ 2008-05-21 15:22       ` David Woodhouse
  2008-05-21 15:41         ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: David Woodhouse @ 2008-05-21 15:22 UTC (permalink / raw)
  To: Alex Dubov; +Cc: Jörn Engel, linux-mtd

On Wed, 2008-05-21 at 08:20 -0700, Alex Dubov wrote:
> Thanks for the comments. At this point it's enough for me to see that you're ok
> with the idea.
> 
> I'll try to implement a working prototype of the proposal, supporting block
> device emulation over smartmedia and memorystick (these are the only mtd
> devices I have). Hopefully, it should take no more than couple of weeks. Then
> we will be able to play with the working prototype to see what features it must
> actually have.

Block device emulation is a separate thing to the core MTD device API.
We already have an implementation of SSFDC, don't we?
      
-- 
dwmw2


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-21 15:22       ` David Woodhouse
@ 2008-05-21 15:41         ` Alex Dubov
  2008-05-21 20:45           ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-21 15:41 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Jörn Engel, linux-mtd


--- David Woodhouse <dwmw2@infradead.org> wrote:

> On Wed, 2008-05-21 at 08:20 -0700, Alex Dubov wrote:
> > Thanks for the comments. At this point it's enough for me to see that you're ok
> > with the idea.
> > 
> > I'll try to implement a working prototype of the proposal, supporting block
> > device emulation over smartmedia and memorystick (these are the only mtd
> > devices I have). Hopefully, it should take no more than couple of weeks. Then
> > we will be able to play with the working prototype to see what features it must
> > actually have.
> 
> Block device emulation is a separate thing to the core MTD device API.
> We already have an implementation of SSFDC, don't we?
>       

What I meant is: my prototype will include mtd_block, ftl for ssfdc, ftl for
memorystick, and backends for both adapted to work with fully callback driven
architecture. This is my current priority.

Everything else can be bolted on later.

(Current ssfdc.ko implementation is junk, sorry. Joern's driver has its own
implementation built-in).


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-21 15:41         ` Alex Dubov
@ 2008-05-21 20:45           ` Jörn Engel
  0 siblings, 0 replies; 26+ messages in thread
From: Jörn Engel @ 2008-05-21 20:45 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd, David Woodhouse

On Wed, 21 May 2008 08:41:38 -0700, Alex Dubov wrote:
> 
> (Current ssfdc.ko implementation is junk, sorry. Joern's driver has its own
> implementation built-in).

Mine doesn't, I ripped it all out.  But the original driver from Daniel
Drake does, along with several others in drivers/usb/storage/.  And
those are almost as bad as vendor code - from which they likely evolved.

Jörn

-- 
Fools ignore complexity.  Pragmatists suffer it.
Some can avoid it.  Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept.  1982


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-21  8:41 ` Jörn Engel
@ 2008-05-22  1:30   ` Alex Dubov
  2008-05-22 15:10     ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-22  1:30 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

> So here is a lightly modified version of your mtd_request bits.  Apart
> from reformatting and adding some documentation, the changes are:
> 
> - No flag for MTD_DATA, as this should be the default

# Nope. What about the case where you only want to read the oob (block scan) or
have no data to transfer (block invalidate)? Of course, this "flags" field may
be omitted outright, and the drivers could rely on error returned from
mtd_get_buf/mtd_get_oob, but this may turn out awkward.

> 
> enum mtd_command {
> 	MTD_READ,
> 	MTD_WRITE,
> 	MTD_ERASE,
> 	MTD_COPY,
> 	MTD_INVALIDATE
> };
> 
> /**
>  * @block_no:	physical eraseblock number
>  * @block_ofs:	offset within physical eraseblock
>  */
> struct mtd_address {
> 	u32	block_no;
> 	u32	block_ofs;
> };
> 
> #define MTD_FLAG_OOB	0x01
> /**
>  * @mtd:	underlying memory technology device
>  * @command:	read, write, erase, etc.
>  * @flags:	additional flags to modify commands
>  * @dst:	destination address
>  * @len:	length for read/write
>  * @src:	source address - only used for MTD_COPY
>  */
> struct mtd_request {
> 	struct mtd_device  *mtd;
> 	enum mtd_command   command;
> 	int		   flags;
>

        u32                log_block;

	struct mtd_address dst;

> 	u32		   len;

 	struct mtd_address src;
> };
> 

# You have to retain the logical block address when using FTLs. This also makes
"struct mtd_address" confusing - the offset applies equally to the physical
and the logical block. A user-level driver may want to fill in only the logical
block address + offset; the FTL will put in the actual physical block.
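
A sketch of that division of labour, assuming a toy FTL with a flat
logical-to-physical table. `struct ftl_map`, `ftl_resolve()` and FTL_UNMAPPED
are hypothetical names; the request is reduced to the fields that matter here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FTL_UNMAPPED UINT32_MAX

struct mtd_address {
	uint32_t block_no;
	uint32_t block_ofs;
};

struct mtd_request {
	uint32_t log_block;		/* set by the user-level driver */
	struct mtd_address dst;		/* block_no filled in by the FTL */
	uint32_t len;
};

/* Toy FTL state: logical eraseblock -> physical eraseblock. */
struct ftl_map {
	const uint32_t *l2p;
	uint32_t nblocks;
};

/*
 * Resolve the physical block. The logical block is left in the request,
 * so layers below the FTL can still see it.
 */
static int ftl_resolve(const struct ftl_map *map, struct mtd_request *req)
{
	assert(map != NULL && req != NULL);
	if (req->log_block >= map->nblocks)
		return -1;
	if (map->l2p[req->log_block] == FTL_UNMAPPED)
		return -1;
	req->dst.block_no = map->l2p[req->log_block];
	return 0;
}
```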

# And it's not clear to me why you would need a pointer to data in the request
struct. Data is delivered through an additional (quite often more than one)
call to mtd_get_buf. I think it's imperative to support fragmented buffers -
a block device often submits requests that are larger than an eraseblock (can
be read/written in one go) but fragmented across several memory buffers.

By the way, my bias toward block device emulation is caused by the obvious fact
that the overwhelming majority of flash devices in existence are used as such,
and with considerable success (I don't see many people complaining about their
usb sticks and mmc cards, and those may run linux inside one day).


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-22  1:30   ` Alex Dubov
@ 2008-05-22 15:10     ` Jörn Engel
  2008-05-23  2:47       ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-22 15:10 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Wed, 21 May 2008 18:30:44 -0700, Alex Dubov wrote:
> 
> > So here is a lightly modified version of your mtd_request bits.  Apart
> > from reformatting and adding some documentation, the changes are:
> > 
> > - No flag for MTD_DATA, as this should be the default
> 
> # Nope. What about the case where you only want to read the oob (block scan) or
> have no data to transfer (block invalidate)? Of course, this "flags" field may
> be omitted outright, and the drivers could rely on error returned from
> mtd_get_buf/mtd_get_oob, but this may turn out awkward.

Fair enough.

> # You have to retain logical block address when using FTLs. This also makes
> "struct mtd_address" confusing - the offset applies the same both to physical
> and logical block. User level driver may want to fill in only logical block
> address + offset, FTL will put in the actual physical block.

But nothing below the FTL should ever know about a logical block at all.
Why pass it on?

> # And it's not clear to me why would you need pointer to data in the request
> struct. Data is delivered through the additional (quite often more than one)
> call to mtd_get_buf. I think, it's imperative to support fragmented buffers -
> block device often submits requests that are larger than eraseblock (can be
> read/written in one go) but fragmented across several memory buffers.

We can replace the data pointer with a struct bio_vec.  In fact, I am
wondering whether we could just use a struct bio instead of struct
mtd_request.

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-22 15:10     ` Jörn Engel
@ 2008-05-23  2:47       ` Alex Dubov
  2008-05-23  5:50         ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-23  2:47 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

> > # You have to retain logical block address when using FTLs. This also makes
> > "struct mtd_address" confusing - the offset applies the same both to physical
> > and logical block. User level driver may want to fill in only logical block
> > address + offset, FTL will put in the actual physical block.
> 
> But nothing below the FTL should ever know about a logical block at all.
> Why pass it on?

Media class specific backend must know it, to populate media's oob structure.
TI's smartmedia adapter, for example, wants to know it itself (it has a
hardware register for it).

> 
> > # And it's not clear to me why would you need pointer to data in the request
> > struct. Data is delivered through the additional (quite often more than one)
> > call to mtd_get_buf. I think, it's imperative to support fragmented buffers -
> > block device often submits requests that are larger than eraseblock (can be
> > read/written in one go) but fragmented across several memory buffers.
> 
> We can replace the data pointer with a struct bio_vec.  In fact, I am
> wondering whether we could just use a struct bio instead of struct
> mtd_request.
> 

Passing the whole struct bio is overkill for what we want to support. It
has plenty of functionality that, in my opinion, would never be supported by
dumb flash controllers.

I thought something simple would be enough, as long as it can handle buffer
fragmentation and give the backend access to the actual struct page for the
mapping operation. It doesn't matter if this is a scatterlist or bio_vec (they
are mostly the same). What matters is the ability to obtain several chunks of
the buffer in some not too obtrusive way.

Host controller will get a buffer chunk, and set up a dma into it. When it
catches dma boundary event, it'll get another chunk, while still in the
interrupt handler (as much as needed). If controller happens to have a real sg
dma, it can call mtd_get_buf_sg several times in advance to populate the sg
table.

Controllers lacking dma capability can do exactly the same using mtd_get_buf.
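
To make the chunk-at-a-time idea concrete, here is a userspace sketch of the
non-DMA case. The signature given to mtd_get_buf() here is an assumption (the
proposal's real prototype may differ), and `struct buf_frag`,
`struct frag_request` and `backend_read_pio()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct buf_frag {
	unsigned char *data;
	size_t len;
};

struct frag_request {
	struct buf_frag *frags;
	size_t nfrags;
	size_t next;		/* index of the next fragment to hand out */
};

/* Assumed shape of mtd_get_buf(): next chunk, or 0 when exhausted. */
static size_t mtd_get_buf(struct frag_request *req, unsigned char **data)
{
	assert(req != NULL && data != NULL);
	if (req->next >= req->nfrags)
		return 0;
	*data = req->frags[req->next].data;
	return req->frags[req->next++].len;
}

/* Backend without DMA: fill each chunk in turn from the media (src). */
static size_t backend_read_pio(struct frag_request *req,
			       const unsigned char *src, size_t avail)
{
	unsigned char *chunk;
	size_t n, done = 0;

	while (done < avail && (n = mtd_get_buf(req, &chunk)) != 0) {
		if (n > avail - done)
			n = avail - done;
		memcpy(chunk, src + done, n);
		done += n;
	}
	return done;
}
```

A DMA-capable backend would make the same mtd_get_buf() call from its
interrupt handler and program the next transfer instead of calling memcpy().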


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-23  2:47       ` Alex Dubov
@ 2008-05-23  5:50         ` Jörn Engel
  2008-05-23  9:33           ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-23  5:50 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Thu, 22 May 2008 19:47:02 -0700, Alex Dubov wrote:
> 
> > 
> > But nothing below the FTL should ever know about a logical block at all.
> > Why pass it on?
> 
> Media class specific backend must know it, to populate media's oob structure.
> TI's smartmedia adapter, for example, wants to know it itself (it has a
> hardware register for it).

Interesting.  Do you have a spec for that?

I would have expected two kinds of controllers.  "Smart" ones that
essentially export a block device interface and do all FTL work
themselves and dumb ones that allow raw flash access and require an FTL
in software.  Didn't know there were mixtures of the two.

> > We can replace the data pointer with a struct bio_vec.  In fact, I am
> > wondering whether we could just use a struct bio instead of struct
> > mtd_request.
> > 
> 
> Passing the whole struct bio is an overshoot for what we want to support. It
> has plenty of functionality that to my opinion would never be supported by dumb
> flash controllers.

After having a closer look at it I tend to agree.  Nevertheless there
are tons of similarities between block devices and mtd and I would like
them to become as similar as reasonably possible.  So right now I'd
create a struct fio (flash io) and copy any fields that make sense for
both from struct bio. 

> I thought, something simple would be enough, as long as it can handle buffer
> fragmentation and give the backend access to the actual struct page for mapping
> operation. It doesn't matter if this is a scatterlist or bio_vec (they are
> mostly the same). What's matter is an ability to obtain several chunks of the
> buffer in some not too obtrusive way.
> 
> Host controller will get a buffer chunk, and set up a dma into it. When it
> catches dma boundary event, it'll get another chunk, while still in the
> interrupt handler (as much as needed). If controller happens to have a real sg
> dma, it can call mtd_get_buf_sg several times in advance to populate the sg
> table.
> 
> Controllers lacking dma capability can do exactly the same using mtd_get_buf.

Makes sense.  And one hopes that most controllers don't require busy
waits and either send an interrupt or allow setting a timer as a poor
man's interrupt.

Jörn

-- 
Homo Sapiens is a goal, not a description.
-- unknown


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-23  5:50         ` Jörn Engel
@ 2008-05-23  9:33           ` Alex Dubov
  2008-05-23  9:59             ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-23  9:33 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

--- Jörn Engel <joern@logfs.org> wrote:

> On Thu, 22 May 2008 19:47:02 -0700, Alex Dubov wrote:
> > 
> > > 
> > > But nothing below the FTL should ever know about a logical block at all.
> > > Why pass it on?
> > 
> > > Media class specific backend must know it, to populate media's oob structure.
> > > TI's smartmedia adapter, for example, wants to know it itself (it has a
> > > hardware register for it).
> 
> Interesting.  Do you have a spec for that?
> 
> I would have expected two kinds of controllers.  "Smart" ones that
> essentially export a block device interface and do all FTL work
> themselves and dumb ones that allow raw flash access and require and FTL
> in software.  Didn't know there were mixtures of the two.

I don't have a datasheet for the TI, but it appears that it calculates a parity
for the "block address" field in smartmedia oob itself, so it will overwrite
the submitted oob with the value of the appropriate register.

Memorystick cards do not expose the oob at all - the logical address is written
to the oob through a media-side register.

> 
> > > We can replace the data pointer with a struct bio_vec.  In fact, I am
> > > wondering whether we could just use a struct bio instead of struct
> > > mtd_request.
> > > 
> > 
> > > Passing the whole struct bio is an overshoot for what we want to support. It
> > > has plenty of functionality that to my opinion would never be supported by dumb
> > > flash controllers.
> 
> After having a closer look at it I tend to agree.  Nevertheless there
> are tons of similarities between block devices and mtd and I would like
> them to become as similar as reasonably possible.  So right now I'd
> create a struct fio (flash io) and copy any fields that make sense for
> both from struct bio. 

The question is: do you really think something like this is needed at all?
Block device layer uses all kinds of assumptions irrelevant to MTD:

1. Most backends are very (NCQ) intelligent and very fast.
2. Failure rates are vanishingly small and mostly handled in hardware.

On the MTD side:

1. Backends are dumb.
2. Protocols are even dumber.
3. Failures happen all the time.
4. Fully zero-copy approach is not possible (because of the occasional
read-merge-erase-write).

That's why MTD will hardly benefit from request queues or fancy IO management
schemes.
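
The read-merge-erase-write cycle from point 4 can be sketched on a simulated
eraseblock as below. ERASESIZE and all function names are illustrative only,
and a real FTL would more likely write the merged data to a fresh block; the
point is the unavoidable bounce-buffer copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ERASESIZE 16u	/* tiny eraseblock, for illustration only */

/* Simulated flash: erase sets all bits, programming can only clear them. */
static void flash_erase(uint8_t *blk)
{
	memset(blk, 0xff, ERASESIZE);
}

static void flash_program(uint8_t *blk, const uint8_t *src)
{
	for (unsigned int i = 0; i < ERASESIZE; i++)
		blk[i] &= src[i];
}

/* Update len bytes at ofs inside one eraseblock via a bounce buffer. */
static void block_update(uint8_t *blk, unsigned int ofs,
			 const uint8_t *data, unsigned int len)
{
	uint8_t bounce[ERASESIZE];

	assert(ofs <= ERASESIZE && len <= ERASESIZE - ofs);
	memcpy(bounce, blk, ERASESIZE);		/* read */
	memcpy(bounce + ofs, data, len);	/* merge */
	flash_erase(blk);			/* erase */
	flash_program(blk, bounce);		/* write */
}
```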

> 
> > > I thought, something simple would be enough, as long as it can handle buffer
> > > fragmentation and give the backend access to the actual struct page for mapping
> > > operation. It doesn't matter if this is a scatterlist or bio_vec (they are
> > > mostly the same). What's matter is an ability to obtain several chunks of the
> > > buffer in some not too obtrusive way.
> > > 
> > > Host controller will get a buffer chunk, and set up a dma into it. When it
> > > catches dma boundary event, it'll get another chunk, while still in the
> > > interrupt handler (as much as needed). If controller happens to have a real sg
> > > dma, it can call mtd_get_buf_sg several times in advance to populate the sg
> > > table.
> > > 
> > > Controllers lacking dma capability can do exactly the same using mtd_get_buf.
> 
> Makes sense.  And one hopes that most controllers don't require busy
> waits and either send an interrupt or allow setting a timer as a poor
> man's interrupt.

There's nothing wrong with a backend busy-waiting to complete a request. At
worst, your audio will skip here and there. Just don't use that driver for an
mp3 player project.

(That's why I put as a requirement that new requests must be advertised
asynchronously, by firing a tasklet in the backend, for example. I'm following
this approach in my xd_card driver).


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-23  9:33           ` Alex Dubov
@ 2008-05-23  9:59             ` Jörn Engel
  2008-05-23 12:49               ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-23  9:59 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Fri, 23 May 2008 02:33:30 -0700, Alex Dubov wrote:
> 
> The question is: do you really think something like this is needed at all?
> Block device layer uses all kinds of assumptions irrelevant to MTD:
> 
> 1. Most backends are very (NCQ) intelligent and very fast.
> 2. Failure rates are diminishingly small and mostly handled in hardware.
> 
> On the MTD side:
> 
> 1. Backends are dumb.
> 2. Protocols are even dumber.
> 3. Failures happen all the time.
> 4. Fully zero-copy approach is not possible (because of the occasional
> read-merge-erase-write).
> 
> That's why MTD will hardly benefit from request queues or fancy IO management
> schemes.

I have a need to spread reads/writes over N chips, with N approaching
large numbers.  And I would like to deal with such a beast as a single
mtd entity, not have a filesystem on 12-odd devices at once.  So this
device should do something similar to mtdconcat and support having at
least as many outstanding requests as there are chips.
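
One way such a device could spread work is to interleave eraseblocks across
the chips, sketched below. This is an assumption about the layout, not
something mtdconcat does today (as far as I know it appends devices end to
end); `struct chip_addr` and `stripe_map()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct chip_addr {
	uint32_t chip;		/* which chip services the block */
	uint32_t block_no;	/* eraseblock number within that chip */
};

/* Round-robin interleave: consecutive blocks land on different chips. */
static struct chip_addr stripe_map(uint32_t global_block, uint32_t nchips)
{
	struct chip_addr a;

	assert(nchips > 0);
	a.chip = global_block % nchips;
	a.block_no = global_block / nchips;
	return a;
}
```

With N chips, any N consecutive global blocks map to N distinct chips, so up
to N requests can be in flight without two of them contending for one chip.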

For read requests there can be literally thousands outstanding, as the
read path in a filesystem should be either lockless or extremely
fine-grained.  The only throttling mechanisms are data dependencies,
which depend on your workload and the total amount of memory in the
system.  Or the number of threads, if reads are blocking.

An elevator is clearly pointless.  But fairness may well become an
issue, so some sort of scheduler may even make sense.  Once you think
about several gigabytes of storage attached through mtd, the
similarities to the block device layer increase.  In spite of all your
arguments being valid. :)

> There's nothing wrong with backend busy waiting to complete request. Maximum,
> your audio will jump here and there. Just don't use that driver for mp3 player
> project.

:)

> (That's why I put as a requirement that new requests must be advertised
> asynchronously, by firing a tasklet in the backend, for example. I'm following
> this approach in my xd_card driver).

I guess anything reading/writing more than 512 bytes at once can take
longer than two schedule events.  Flash may be fast compared to spinning
rust, but it's still horribly slow when compared to the cpu or even RAM.

Jörn

-- 
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-23  9:59             ` Jörn Engel
@ 2008-05-23 12:49               ` Alex Dubov
  2008-05-23 13:28                 ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-23 12:49 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd


--- Jörn Engel <joern@logfs.org> wrote:

> On Fri, 23 May 2008 02:33:30 -0700, Alex Dubov wrote:
> > 
> > The question is: do you really think something like this is needed at all?
> > Block device layer uses all kinds of assumptions irrelevant to MTD:
> > 
> > 1. Most backends are very (NCQ) intelligent and very fast.
> > 2. Failure rates are diminishingly small and mostly handled in hardware.
> > 
> > On the MTD side:
> > 
> > 1. Backends are dumb.
> > 2. Protocols are even dumber.
> > 3. Failures happen all the time.
> > 4. Fully zero-copy approach is not possible (because of the occasional
> > read-merge-erase-write).
> > 
> > That's why MTD will hardly benefit from request queues or fancy IO
> > management schemes.
> 
> I have a need to spread reads/writes over N chips, with N approaching
> large numbers.  And I would like to deal with such a beast as a single
> mtd entity, not have a filesystem on 12-odd devices at once.  So this
> device should do something similar to mtdconcat and support having at
> least as many outstanding requests as there are chips.
> 
> For read requests there can be literally thousands outstanding, as the
> read path in a filesystem should be either lockless or extremely
> fine-grained.  The only throttling mechanisms are data dependencies,
> which depend on your workload and the total amount of memory in the
> system.  Or the number of threads, if reads are blocking.
> 
> An elevator is clearly pointless.  But fairness may well become an
> issue, so some sort of scheduler may even make sense.  Once you think
> about several gigabytes of storage attached through mtd, the
> similarities to the block device layer increase.  In spite of all your
> arguments being valid. :)
> 

We are talking about somewhat different things here. Userspace visible devices
(highest layer in mtd stack) must support something complex. Lower levels in
the mtd stack - not necessarily so.

Highest level "raw mtd" devices can be normal block devices with support for
custom commands. Intermediate and low level modules can do with a simple
interface.

Then, there should be a layer akin to "md" that will allow creation of flash
RAIDs. After all, we are not limited in the number of intermediate devices
present in the kernel. We don't have to create userspace visible nodes for them.



      


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-23 12:49               ` Alex Dubov
@ 2008-05-23 13:28                 ` Jörn Engel
  2008-05-24 13:12                   ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-23 13:28 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Fri, 23 May 2008 05:49:48 -0700, Alex Dubov wrote:
> 
> We are talking about somewhat different things here. Userspace visible devices
> (highest layer in mtd stack) must support something complex. Lower levels in
> the mtd stack - not necessarily so.
> 
> Highest level "raw mtd" devices can be normal block devices with support for
> custom commands. Intermediate and low level modules can do with simple
> interface.

Either I misunderstand you or you are forgetting that filesystems deal
with raw mtd devices, which are the lowest levels in your stack afaics.
JFFS2 and LogFS will deal directly with the chip driver and bypass any
intermediate layers.  UBIFS will talk to UBI as a middle layer, which
again talks directly to the chip driver.

> Then, there should be a layer akin to "md" that will allow creation of flash
> raids. After all, we are not limited in number of intermediate devices present
> in the kernel. We don't have to create userspace visible nodes for them.

True.  For me pure concatenation would be enough.  All I need is some
extra geometry information so I can decide which block belongs to which
chip.  In the simplest case something like "13 chips of 123MiB each,
linearly concatenated".  Different chip sizes are fine, different
erasesize and writesize may still work within reason.

So mtdconcat.c plus extended geometry information would be good enough.
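
To make the "extra geometry information" concrete, here is a minimal sketch of mapping a linear offset to a chip for linearly concatenated chips of possibly different sizes. The names struct chip_extent, concat_layout() and concat_locate() are made up for illustration; this is not the mtdconcat.c interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry record: one entry per concatenated chip. */
struct chip_extent {
	uint64_t start; /* first byte of this chip in the concatenated space */
	uint64_t size;  /* chip size in bytes */
};

/* Fill in start offsets for chips laid out back to back. */
static void concat_layout(struct chip_extent *chips, size_t nchips)
{
	uint64_t start = 0;

	for (size_t i = 0; i < nchips; i++) {
		chips[i].start = start;
		start += chips[i].size;
	}
}

/*
 * Map a linear offset to a chip index and an offset within that chip.
 * Returns 0 on success, -1 if the offset lies beyond the last chip.
 */
static int concat_locate(const struct chip_extent *chips, size_t nchips,
			 uint64_t ofs, size_t *chip, uint64_t *local)
{
	size_t lo = 0, hi = nchips;

	assert(chip != NULL && local != NULL);

	/* Binary search for the extent that contains ofs. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ofs < chips[mid].start) {
			hi = mid;
		} else if (ofs - chips[mid].start >= chips[mid].size) {
			lo = mid + 1;
		} else {
			*chip = mid;
			*local = ofs - chips[mid].start;
			return 0;
		}
	}
	return -1;
}
```

With a table like this, a filesystem could group eraseblocks by chip and keep at least one request outstanding per chip. Differing erasesize or writesize per chip would need extra fields, which this sketch leaves out.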

Jörn

-- 
All art is but imitation of nature.
-- Lucius Annaeus Seneca


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-23 13:28                 ` Jörn Engel
@ 2008-05-24 13:12                   ` Alex Dubov
  2008-05-24 17:56                     ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-24 13:12 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd


--- Jörn Engel <joern@logfs.org> wrote:

> On Fri, 23 May 2008 05:49:48 -0700, Alex Dubov wrote:
> > 
> > We are talking about somewhat different things here. Userspace visible
> > devices (highest layer in mtd stack) must support something complex.
> > Lower levels in the mtd stack - not necessarily so.
> > 
> > Highest level "raw mtd" devices can be normal block devices with support
> > for custom commands. Intermediate and low level modules can do with simple
> > interface.
> 
> Either I misunderstand you or you are forgetting that filesystems deal
> with raw mtd devices, which are the lowest levels in your stack afaics.
> JFFS2 and LogFS will deal directly with the chip driver and bypass any
> intermediate layers.  UBIFS will talk to UBI as a middle layer, which
> again talks directly to the chip driver.
> 

Do UBI and JFFS always operate in terms of whole eraseblocks, or may they
attempt partial block writes? Different flash chips have different
capabilities in regard to writing and this can be used to some advantage.




      


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-24 13:12                   ` Alex Dubov
@ 2008-05-24 17:56                     ` Jörn Engel
  2008-05-25  3:41                       ` Alex Dubov
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-24 17:56 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Sat, 24 May 2008 06:12:23 -0700, Alex Dubov wrote:
> 
> Do UBI and JFFS always operate in terms of whole eraseblocks or they may
> attempt  partial block writes? Different flash chips have different
> capabilities in regard to writing and this can be used to some advantage.

Writes happen in multiples of mtd->writesize.  Which for NAND is
pagesize.  There are also special cases with subpage writes.  AFAIK only
UBI exploits that feature.
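
As a rough illustration of the "multiples of writesize" rule, a check along these lines could be applied before a write is issued. The helper names are invented for this sketch and the writesize value is passed in directly rather than read from a real mtd structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a write of len bytes at ofs consists of whole writesize units. */
static bool write_is_aligned(uint64_t ofs, uint64_t len, uint32_t writesize)
{
	assert(writesize != 0);
	return len != 0 && ofs % writesize == 0 && len % writesize == 0;
}

/* Round a length up to the next multiple of writesize, e.g. to pad a buffer. */
static uint64_t round_up_to_writesize(uint64_t len, uint32_t writesize)
{
	assert(writesize != 0);
	return (len + writesize - 1) / writesize * writesize;
}
```

For a 2k-page NAND chip, a 512-byte write fails this check unless the chip and driver support subpage writes, which is the special case mentioned above.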

Jörn

-- 
Joern's library part 15:
http://www.knosof.co.uk/cbook/accu06a.pdf


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-24 17:56                     ` Jörn Engel
@ 2008-05-25  3:41                       ` Alex Dubov
  2008-05-25  7:25                         ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Dubov @ 2008-05-25  3:41 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

--- Jörn Engel <joern@logfs.org> wrote:

> On Sat, 24 May 2008 06:12:23 -0700, Alex Dubov wrote:
> > 
> > Do UBI and JFFS always operate in terms of whole eraseblocks or they may
> > attempt  partial block writes? Different flash chips have different
> > capabilities in regard to writing and this can be used to some advantage.
> 
> Writes happen in multiples of mtd->writesize.  Which for NAND is
> pagesize.  There are also special cases with subpage writes.  AFAIK only
> UBI exploits that feature.
> 

Most xd cards can only be written a whole PEB at a time (can be handled with
appropriate writesize, I suppose).

Memorystick cards can be written page at a time, but only in progressive
fashion - only if all pages at lower offsets to the current page were written
before. This can be made to work as a useful optimization.
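
A minimal sketch of how the "progressive" rule could be tracked per eraseblock: a write is accepted only if it starts at the first page not yet written since the last erase. struct block_cursor and both helpers are hypothetical names, and the pages-per-block count is whatever the card reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-eraseblock state. */
struct block_cursor {
	uint32_t pages_per_block;
	uint32_t next_page; /* lowest page not yet written since last erase */
};

/* After an erase every page may be written again, starting from page 0. */
static void block_erased(struct block_cursor *b)
{
	b->next_page = 0;
}

/*
 * Accept a write of count pages starting at page only if all lower pages
 * were already written and the run fits into the block.
 */
static bool block_write_pages(struct block_cursor *b, uint32_t page,
			      uint32_t count)
{
	assert(b->next_page <= b->pages_per_block);

	if (count == 0 || page != b->next_page)
		return false;
	if (count > b->pages_per_block - b->next_page)
		return false;

	b->next_page += count;
	return true;
}
```

A rejected write would then fall back to the usual read, merge, erase and write cycle on a fresh block.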

Are there any special tricks with subpage writes, or does it all amount to
"read block" -> "merge changes" -> "write block"?


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-25  3:41                       ` Alex Dubov
@ 2008-05-25  7:25                         ` Jörn Engel
  2008-05-25 13:30                           ` Jamie Lokier
  2008-05-26  2:12                           ` Alex Dubov
  0 siblings, 2 replies; 26+ messages in thread
From: Jörn Engel @ 2008-05-25  7:25 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-mtd

On Sat, 24 May 2008 20:41:17 -0700, Alex Dubov wrote:
> --- Jörn Engel <joern@logfs.org> wrote:
> 
> > Writes happen in multiples of mtd->writesize.  Which for NAND is
> > pagesize.  There are also special cases with subpage writes.  AFAIK only
> > UBI exploits that feature.
> 
> Most xd cards can only be written a whole PEB in a time (can be handled with
> appropriate writesize, I suppose).

Writing a page at a time worked for me up to 256M.  At 1G it no longer
worked.  Whether this was due to your requirement above or something
else I didn't work out.

> Memorystick cards can be written page at a time, but only in progressive
> fashion - only if all pages at lower offsets to the current page were written
> before. This can be made to work as a useful optimization.

The same limitation is true for some raw NAND chips as well.  Afaik all
of current MTD honors that limitation.

> Are there any special tricks with subpage writes or it all amounts to "read
> block" -> "merge changes" -> "write block"?

Subpage writes are special.  On some chips less than a page can be
written at once, e.g. 512 bytes for a 2k page chip.  But there are
additional limitations on the numbers of writes.  The 2k chip may limit
you to 3 writes.  So you have to do something like 512, 512, 1k.

Unless you really really need this, you'd better forget about it.
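
To illustrate the bookkeeping this implies, here is a sketch that tracks partial programs of one page, using the example numbers above (2k page, 512-byte subpages, at most 3 writes). The struct and function names are invented, and the sketch assumes subpages are filled in ascending order, which real chips may or may not require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state since the last erase. */
struct page_state {
	uint32_t page_size;    /* e.g. 2048 */
	uint32_t subpage_size; /* e.g. 512 */
	uint32_t max_programs; /* e.g. 3 partial writes per page */
	uint32_t programs;     /* partial writes done so far */
	uint32_t filled;       /* bytes written so far, counted from offset 0 */
};

/* Accept a partial write only if it is subpage-sized, contiguous and within the limit. */
static bool page_program(struct page_state *p, uint32_t ofs, uint32_t len)
{
	assert(p->subpage_size != 0 && p->page_size % p->subpage_size == 0);

	if (len == 0 || len % p->subpage_size != 0)
		return false;
	if (ofs != p->filled || len > p->page_size - p->filled)
		return false;
	if (p->programs >= p->max_programs)
		return false;

	p->programs++;
	p->filled += len;
	return true;
}
```

With these numbers the sequence 512, 512, 1k fills the page in three writes, while four separate 512-byte writes are rejected at the fourth.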

Jörn

-- 
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-25  7:25                         ` Jörn Engel
@ 2008-05-25 13:30                           ` Jamie Lokier
  2008-05-25 16:24                             ` Jörn Engel
  2008-05-26  2:12                           ` Alex Dubov
  1 sibling, 1 reply; 26+ messages in thread
From: Jamie Lokier @ 2008-05-25 13:30 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd, Alex Dubov

Jörn Engel wrote:
> > Memorystick cards can be written page at a time, but only in progressive
> > fashion - only if all pages at lower offsets to the current page were written
> > before. This can be made to work as a useful optimization.
> 
> The same limitation is true for some raw NAND chips as well.  Afaik all
> of current MTD honors that limitation.

That seems like it would cause uneven wear-levelling in some
situations.  Specifically, each time you need to erase a block which
has not been completely written, because you need to write a
contiguous record larger than the remaining space.

I didn't know it was a physical requirement of some chips, thanks for
clarifying.

-- Jamie


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-25 13:30                           ` Jamie Lokier
@ 2008-05-25 16:24                             ` Jörn Engel
  2008-05-25 16:35                               ` Jamie Lokier
  0 siblings, 1 reply; 26+ messages in thread
From: Jörn Engel @ 2008-05-25 16:24 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd, Alex Dubov

On Sun, 25 May 2008 14:30:29 +0100, Jamie Lokier wrote:
> Jörn Engel wrote:
> > > Memorystick cards can be written page at a time, but only in progressive
> > > fashion - only if all pages at lower offsets to the current page were written
> > > before. This can be made to work as a useful optimization.
> > 
> > The same limitation is true for some raw NAND chips as well.  Afaik all
> > of current MTD honors that limitation.
> 
> That seems like it would cause uneven wear-levelling in some
> situations.  Specifically, each time you need to erase a block which
> has not been completely written, because you need to write a
> contiguous record larger than the remaining space.
> 
> I don't know it was a physical requirement of some chips, thanks for
> clarifying.

The "progressive" limitation only concerns writes _within_ eraseblocks.
It has no impact on wear leveling.

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-25 16:24                             ` Jörn Engel
@ 2008-05-25 16:35                               ` Jamie Lokier
  2008-05-25 16:55                                 ` Jörn Engel
  0 siblings, 1 reply; 26+ messages in thread
From: Jamie Lokier @ 2008-05-25 16:35 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd, Alex Dubov

Jörn Engel wrote:
> > That seems like it would cause uneven wear-levelling in some
> > situations.  Specifically, each time you need to erase a block which
> > has not been completely written, because you need to write a
> > contiguous record larger than the remaining space.
> > 
> > I don't know it was a physical requirement of some chips, thanks for
> > clarifying.
> 
> The "progressive" limitation only concerns writes _within_ eraseblocks.
> It has no impact on wear leveling.

I understood that.

I mean that progressive writing may cause more wear towards the
beginning of _each_ eraseblock, because you'll write more often at the
start of each eraseblock than the end.  That's if wear is at all a
function of writes, and not solely erase operations.

There's another curious thought: do individual flash bit cells wear
out more quickly when written to "0" or left at "1"?  I doubt it, but
if it did make a difference, it would make a case for xoring data with
predictable pseudo-random bits.
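
Purely to illustrate the xor idea, and without any claim that it helps wear, here is a sketch that whitens a buffer with a reproducible pseudo-random stream. The xorshift32 generator and the whiten() helper are choices made for this example; the seed could be derived from the page number so the same stream can be regenerated on read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of the 32-bit xorshift generator (shift constants 13, 17, 5). */
static uint32_t xorshift32(uint32_t x)
{
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return x;
}

/*
 * XOR buf with a pseudo-random stream derived from seed.  Applying the
 * same call twice restores the original data, so one function serves
 * both the write path and the read path.
 */
static void whiten(uint8_t *buf, size_t len, uint32_t seed)
{
	uint32_t state = seed ? seed : 1; /* xorshift must not start from 0 */

	assert(buf != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		state = xorshift32(state);
		buf[i] ^= (uint8_t)(state >> 24);
	}
}
```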

-- Jamie


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-25 16:35                               ` Jamie Lokier
@ 2008-05-25 16:55                                 ` Jörn Engel
  0 siblings, 0 replies; 26+ messages in thread
From: Jörn Engel @ 2008-05-25 16:55 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd, Alex Dubov

On Sun, 25 May 2008 17:35:56 +0100, Jamie Lokier wrote:
> 
> I mean that progressive writing may cause more wear towards the
> beginning of _each_ eraseblock, because you'll write more often at the
> start of each eraseblock than the end.  That's if wear is at all a
> function of writes, and not solely erase operations.

I don't have any numbers on that.  Iirc on NOR flash, erase voltage is
significantly higher than write voltage, so erases likely dominate the
wear.  NAND flash may behave differently.

> There's another curious thought: do individual flash bit cells wear
> out more quickly when written to "0" or left at "1"?  I doubt it, but
> if it did make a difference, it would make a case for xoring data with
> predictable pseudo-random bits.

Might make a nice article for the April edition of some magazine. ;)

Jörn

-- 
Anything that can go wrong, will.
-- Finagle's Law


* Re: Support of removable MTD devices and other advanced features (follow-up from lkml)
  2008-05-25  7:25                         ` Jörn Engel
  2008-05-25 13:30                           ` Jamie Lokier
@ 2008-05-26  2:12                           ` Alex Dubov
  1 sibling, 0 replies; 26+ messages in thread
From: Alex Dubov @ 2008-05-26  2:12 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd


--- Jörn Engel <joern@logfs.org> wrote:

> On Sat, 24 May 2008 20:41:17 -0700, Alex Dubov wrote:
> > --- Jörn Engel <joern@logfs.org> wrote:
> > 
> > > Writes happen in multiples of mtd->writesize.  Which for NAND is
> > > pagesize.  There are also special cases with subpage writes.  AFAIK only
> > > UBI exploits that feature.
> > 
> > Most xd cards can only be written a whole PEB in a time (can be handled
> > with appropriate writesize, I suppose).
> 
> Up to 256M writing pages at a time worked for me.  1G didn't work
> anymore.  Whether this was due to your requirement above or something
> else I didn't work out.
> 

"Smartmedia" spec kind of allows (but recommends against) single page
programming. "xD card" spec prohibits it outright starting from version 1.0.
And in fact I had some really nasty problems recently with two Olympus cards
when not writing full blocks.



      


end of thread, other threads:[~2008-05-26  2:12 UTC | newest]

Thread overview: 26+ messages
2008-05-20 13:59 Support of removable MTD devices and other advanced features (follow-up from lkml) Alex Dubov
2008-05-21  6:47 ` Artem Bityutskiy
2008-05-21  8:41 ` Jörn Engel
2008-05-22  1:30   ` Alex Dubov
2008-05-22 15:10     ` Jörn Engel
2008-05-23  2:47       ` Alex Dubov
2008-05-23  5:50         ` Jörn Engel
2008-05-23  9:33           ` Alex Dubov
2008-05-23  9:59             ` Jörn Engel
2008-05-23 12:49               ` Alex Dubov
2008-05-23 13:28                 ` Jörn Engel
2008-05-24 13:12                   ` Alex Dubov
2008-05-24 17:56                     ` Jörn Engel
2008-05-25  3:41                       ` Alex Dubov
2008-05-25  7:25                         ` Jörn Engel
2008-05-25 13:30                           ` Jamie Lokier
2008-05-25 16:24                             ` Jörn Engel
2008-05-25 16:35                               ` Jamie Lokier
2008-05-25 16:55                                 ` Jörn Engel
2008-05-26  2:12                           ` Alex Dubov
2008-05-21  9:06 ` David Woodhouse
2008-05-21  9:29   ` Jörn Engel
2008-05-21 15:20     ` Alex Dubov
2008-05-21 15:22       ` David Woodhouse
2008-05-21 15:41         ` Alex Dubov
2008-05-21 20:45           ` Jörn Engel
