mtd_info->size again (lengthy)

public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed

* mtd_info->size again (lengthy)
@ 2008-06-08  5:38 Bruce_Leonard
  2008-06-08 12:10 ` Jörn Engel
  0 siblings, 1 reply; 17+ messages in thread
From: Bruce_Leonard @ 2008-06-08  5:38 UTC (permalink / raw)
  To: linux-mtd

There's been some chatter on the list in the last couple of months (mostly 
by me) about the limit on MTD devices of no more than 2GiB because struct 
mtd_info->size is a 32-bit value.  A few weeks ago I asked for ideas on 
how to get around this limit, the most popular on being that I should 
change the meaning of mtd_info->size from "the size of the device in 
bytes" to "the size of the device in kilobytes".  I took the suggestion to 
heart and when I started working on this last week, that was the direction 
I took.  Now I'm in deep trouble.  I'm having multiple problems with this 
solution.

A quick grep of mtd->info turns up on the order of 225 hits in various and 
assorted drivers and core code, not counting other structures that have an 
mtd_info buried inside them.  A lot of those hits are simple things like 
"if(off + len > mtd->info)".  So if I change the meaning of mtd_info->size 
to KiB, then I have to change everyplace that touches/checkes/uses 
mtd_info->size to use KiB.  A duanting task, especially as there are other 
places where the value of mtd_info->size is used to initialize other 
structures (JFFS2 leeps to mind).  Would I then have to go through those 
other places and ripple the change in meaning there as well?  This idea 
was getting out of hand in a hurry.

So I stepped back and rethought this.  Maybe just leaving the meaning as 
bytes and changing the type to 64-bits wasn't such a bad idea after all. 
As I said, a sizable percentage of the 225 hits are simple compares, the 
compiler would take care of my problem by promotion.  Plus places like the 
NAND layer that does calculations by shifting mtd_info->size could safely 
be left alone.  However, I'm still left with the problem of ripple effect 
into other layers like JFFS2 and some of the MTD map files.

Plus I run into this problem with making it a 64-bit value: there are a 
few places (add_mtd_device() for example) where mtd_info->size is passed 
as a parameter of type size_t, which in my case on an mpc8347 is a 32-bit 
value.  This is where I really start to get bogged down.  size_t (please 
correct me if I'm wrong because I probably am) is defined to be a type 
that will tell you the largest item (in bytes) that can be accessed by 
that particular processor.  In other words, the biggest structure you 
could access in the RAM on the local bus of an mpc8347 would be 4GiB, or 
0x0 - 0xffffffff, a 32-bit value, and thus size_t on an 8347 is defined to 
be an unsigned long.  That makes my poor little HW engineer brain hurt, 
though I think I get it.  But wait....now I have 8GiB of NAND flash, 
connected to a PCI controller, handled by a single PCI driver and thus 
presented to the kernel as a single item, 8GiB in size which doesn't fit 
in size_t.  So now, my call to mtd->unlock() in the function 
add_mtd_device() passes a length of 0 because mtd->size gets truncated to 
32-bits!  Ouch, my poor little HW engineer brain just had an aneurysm :(.

We then thought, why don't we just break the NAND controller up into four 
controllers (we can do that becuase we rolled our own in an FPGA) with 
2GiB of flash on each controller.  Then we can just concatenate the four 
together in the MTD layer.  Great idea.  Not!  Same problem.  When you 
concatenate devices together in MTD, each devices size gets added to 
mtd_info->size, and again I've overflowed my 32-bit size.

Then we thought, okay just have four 2GiB devices on /dev/mtd0-3.  After 
all, we can mount as much stuff on the root directory as we want.  Except 
my SW folks are telling me that, due to archiving requirements we will 
probably end up with sub-directories in excess of 2GiB which will make 
file management a real nightmare.  Well, fine, so I'm back to trying to 
get the MTD layer to support >2GiB devices.  And to be honest, I really 
believe it's the best solution for all.  As flash densities continue to 
grow, I think there are going to be more and more embedded designs where 
using Compact Flash or some other SSD hidden behind an IDE interface is 
not going to be acceptible.  This is a good thing to change and now is a 
good time.

But folks, I'm stuck.  I'm just a dumb HW engineer trying to get my box 
running.  I'm willing to make the change and I want to make the change, 
but I need help.  I hate asking, I know that the etiquette on the lists 
frowns on asking for specific help.  And usually after a few days of 
poking I can come up with my own solution.  It's not pretty but it works 
:).  But right now I'm totally lost in the subtle complexeties and the 
shear size of MTD.  It's way beyond me.  What I need is a concrete code 
example from someone on what and how to go about expanding MTD to support 
large sizes.  Can someone please help me out?  If I can get a good solid 
idea of how to fix it, I'm more than happy to do the work of making all 
the code changes and putting together a patch.  The open source community 
has been good to me and I'd love nothing more than being able to 
contribute something back.

Thanks for listening.

Bruce

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-08  5:38 mtd_info->size again (lengthy) Bruce_Leonard
@ 2008-06-08 12:10 ` Jörn Engel
  2008-06-08 12:13   ` Jörn Engel
                     ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Jörn Engel @ 2008-06-08 12:10 UTC (permalink / raw)
  To: Bruce_Leonard; +Cc: linux-mtd

On Sat, 7 June 2008 22:38:21 -0700, Bruce_Leonard@selinc.com wrote:
> 
> But folks, I'm stuck.

Sometimes the best answer comes in diff -u form, see below.

This has been on my TODO list for a while.  Long-term we want to get rid
of read, write and erase in drivers, replacing them with submit_fio (or
some equivalent function, if people dislike the name).  Submit_fio has
the advantage of allowing asynchronous operations, queueing,
out-of-order completion and all those goodies.  If you have several
chips in a device, I'm sure you want those as well.

And while at it, replace the 32bit size with a 64bit blockno and 32bit
offset.  Unless someone is stupid enough to create devices with 4G
erasesize, that should last for a human lifetime or something close.

It still leaves the problem of converting all driver.  I've only started
on mtdram - because it is simple and everyone can test it.  Other
drivers can follow over time.  Users like jffs2 could be converted as
well, but I'd rather leave them for now.  We want synchronous wrappers
around submit_fio anyway, as those are useful for development and for
trivial performance-agnostic code.  So for the time being, users can
simply use the wrappers.

Which leaves four quadrants:
          old driver  new driver
old user      1           2
new user      3           4

1 is fine, 4 is fine.  2 needs to deal with the synchronous wrappers and
3 will also need some plumbing.

Comments?

Jörn

-- 
Don't patch bad code, rewrite it.
-- Kernigham and Pike, according to Rusty

diff --git a/drivers/mtd/devices/mtdram.c b/drivers/mtd/devices/mtdram.c
index 0399be1..6ddefa7 100644
--- a/drivers/mtd/devices/mtdram.c
+++ b/drivers/mtd/devices/mtdram.c
@@ -90,6 +90,50 @@ static int ram_write(struct mtd_info *mtd, loff_t to, size_t len,
 	return 0;
 }
 
+static int ram_submit_fio(struct mtd_info *mtd, struct fio *fio)
+{
+	int i, err = 0;
+	u32 ofs;
+	size_t *retlen;
+	const u_char *buf;
+	struct fio_vec *vec;
+
+	/* This driver cannot deal with device >32bit yet */
+	if (fio->fi_blockno * mtd->erasesize + fio->block_ofs + fio->fi_len
+			> 0xffffffff)
+		return -EIO;
+
+	ofs = fio->fi_blockno * mtd->erasesize + fio->block_ofs;
+
+	if (fio->fi_command == FI_ERASE) {
+		memset(mtd->priv + ofs, 0xff, fio->fi_len);
+		goto out;
+	}
+
+	for (i = 0; i < fio->fi_vcnt; i++) {
+		vec = fio->fi_io_vec + i;
+		buf = kmap_atomic(vec->fv_page, KM_USER0);
+		switch (fio->fi_command) {
+		case FI_READ:
+			err = ram_read(mtd, ofs, vec->fv_len, &retlen,
+					buf + vec->fv_offset);
+			break;
+		case FI_WRITE:
+			err = ram_write(mtd, ofs, vec->fv_len, &retlen,
+					buf + vec->fv_offset);
+		default:
+			BUG();
+		}
+		kunmap_atomic(buf, KM_USER0);
+		ofs += vec->fv_len;
+		if (err)
+			goto out;
+	}
+out:
+	fio->fi_end_io(fio, err);
+	return 0;
+}
+
 static void __exit cleanup_mtdram(void)
 {
 	if (mtd_info) {
@@ -109,8 +153,10 @@ int mtdram_init_device(struct mtd_info *mtd, void *mapped_address,
 	mtd->type = MTD_RAM;
 	mtd->flags = MTD_CAP_RAM;
 	mtd->size = size;
-	mtd->writesize = 1;
+	mtd->no_eraseblocks = size;
+	do_div(mtd->no_eraseblocks, MTDRAM_ERASE_SIZE);
 	mtd->erasesize = MTDRAM_ERASE_SIZE;
+	mtd->writesize = 1;
 	mtd->priv = mapped_address;
 
 	mtd->owner = THIS_MODULE;
diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
index 245f909..9729a17 100644
--- a/include/linux/mtd/mtd.h
+++ b/include/linux/mtd/mtd.h
@@ -98,10 +98,51 @@ struct mtd_oob_ops {
 	uint8_t		*oobbuf;
 };
 
+enum fi_command {
+	FI_READ,
+	FI_WRITE,
+	FI_ERASE,
+};
+
+struct fio_vec {
+	struct page	*fv_page;
+	unsigned int	fv_len;
+	unsigned int	fv_offset;
+};
+
+typedef void (fio_end_io_t) (struct fio *, int);
+/*
+ * struct fio (flash io) is modeled after struct bio and where possible the
+ * fields have been kept identical.  Some fields have gone missing and others
+ * been added.
+ */
+struct fio {
+	/* Fields shared with struct bio */
+	struct fio	*fi_next;
+	unsigned long	fi_flags;
+	unsigned long	fi_command;
+	unsigned short	fi_vcnt;
+	struct fio_vec	fi_io_vec;
+	fio_end_io_t	fi_end_io;
+	void		*fi_private;
+	/* Fields specific to flash */
+	u64		fi_blockno;
+	u32		fi_blockofs;
+	u32		fi_len;
+};
+
 struct mtd_info {
 	u_char type;
 	u_int32_t flags;
-	u_int32_t size;	 // Total size of the MTD
+	/*
+	 * size is becoming redundant, as it can be calculated from
+	 * no_eraseblocks * erasesize.  There is a fair chance it will die
+	 * in the near future.  For flashes with variable erase sizes,
+	 * no_eraseblocks signifies the "normal" size, so above calculation
+	 * remains valid.
+	 */
+	u_int64_t size;	 // Total size of the MTD
+	u_int64_t no_eraseblocks;
 
 	/* "Major" erase size for the device. Naïve users may take this
 	 * to be the only erase size available, or may use the more detailed
@@ -151,6 +192,7 @@ struct mtd_info {
 	void (*unpoint) (struct mtd_info *mtd, loff_t from, size_t len);
 
 
+	int (*submit_bio) (struct mtd_info *mtd, struct fio *fio);
 	int (*read) (struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen, u_char *buf);
 	int (*write) (struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen, const u_char *buf);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-08 12:10 ` Jörn Engel
@ 2008-06-08 12:13   ` Jörn Engel
  2008-06-09  2:51   ` Bruce_Leonard
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Jörn Engel @ 2008-06-08 12:13 UTC (permalink / raw)
  To: Bruce_Leonard; +Cc: linux-mtd

On Sun, 8 June 2008 14:10:20 +0200, Jörn Engel wrote:
> 
> Comments?

And before anyone bothers complaining, the patch is completely untested,
I haven't even compiled it.

Jörn

-- 
One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-08 12:10 ` Jörn Engel
  2008-06-08 12:13   ` Jörn Engel
@ 2008-06-09  2:51   ` Bruce_Leonard
  2008-06-09  5:21     ` Jörn Engel
  2008-06-09  4:43   ` Bruce_Leonard
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Bruce_Leonard @ 2008-06-09  2:51 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

Jörn Engel <joern@logfs.org> wrote on 06/08/2008 05:10:20 AM:

> On Sat, 7 June 2008 22:38:21 -0700, Bruce_Leonard@selinc.com wrote:
> > 
> > But folks, I'm stuck.
> 
> Sometimes the best answer comes in diff -u form, see below.
> 
> This has been on my TODO list for a while.  Long-term we want to get rid

Jorn,

Thanks so much for the feedback.  I think I get what you're saying, but 
let me paraphrase it back to make sure.  My approach should be to make a 
new interface that is 64-bit clean, using one or more functions (I'm not 
sure I should shove ALL the functions that exist in mtd_info into a single 
submit_fio(), but I don't know).  For my driver I should write the new 
function(s) to be 64-bit clean and just use the new interface.  I can then 
change the core code that impacts my specific case to take advantage of 
the new interface (being carefull to make sure I don't break the old 
interface).  Existing drivers can continue to use the existing interface 
but will eventually migrate to the new interface.  If they can't be made 
64-bit clean for whatever reason, the new interface just becomes a wrapper 
to the old functions.  Sooner or later everything will be on the new 
interface and the old one can be removed.  Does that sound about right?

Just a few questions about the details of your (admitidley un-compiled and 
un-tested :)  ) solution:

> 
> +static int ram_submit_fio(struct mtd_info *mtd, struct fio *fio)
> +{

I gather fio is something that's to be filled in each time ram_submit_fio 
is called rather than being a static thing in mtd_info?  Sort of like 
struct mtd_oob_ops in the NAND layer gets filled in based on what's being 
called?

> +   int i, err = 0;
> +   u32 ofs;
> +   size_t *retlen;
> +   const u_char *buf;
> +   struct fio_vec *vec;
> +
> +   /* This driver cannot deal with device >32bit yet */
> +   if (fio->fi_blockno * mtd->erasesize + fio->block_ofs + fio->fi_len
> +         > 0xffffffff)
> +      return -EIO;
> +
> +   ofs = fio->fi_blockno * mtd->erasesize + fio->block_ofs;
> +
> +   if (fio->fi_command == FI_ERASE) {
> +      memset(mtd->priv + ofs, 0xff, fio->fi_len);
> +      goto out;
> +   }
> +
> +   for (i = 0; i < fio->fi_vcnt; i++) {
> +      vec = fio->fi_io_vec + i;
> +      buf = kmap_atomic(vec->fv_page, KM_USER0);
> +      switch (fio->fi_command) {
> +      case FI_READ:
> +         err = ram_read(mtd, ofs, vec->fv_len, &retlen,
> +               buf + vec->fv_offset);

I'm not famliar enough with block I/O to know what advantages or 
disadvantages is has, but since all the upper layers already allocate a 
buffer for putting date into or taking data out of, wouldn't it make more 
sense/be simpler to just keep using it?  For example, mtd_read() in 
mtdchar.c does a kmalloc to get a buffer that is then passed to 
mtd->read().  Wouldn't it be simpler to just keep using that rather than 
switching to struct page?  Or is there something that I'm missing that 
struct page brings to the game?

> +         break;
> +      case FI_WRITE:
> +         err = ram_write(mtd, ofs, vec->fv_len, &retlen,
> +               buf + vec->fv_offset);
> +      default:
> +         BUG();
> +      }
> +      kunmap_atomic(buf, KM_USER0);
> +      ofs += vec->fv_len;
> +      if (err)
> +         goto out;
> +   }
> +out:
> +   fio->fi_end_io(fio, err);

Not sure I follow the logic of the typedef here and what it is you're 
trying to accomplish.

<snip>

> +
> +typedef void (fio_end_io_t) (struct fio *, int);
> +/*

<snip>

>  struct mtd_info {
>     u_char type;
>     u_int32_t flags;
> -   u_int32_t size;    // Total size of the MTD
> +   /*
> +    * size is becoming redundant, as it can be calculated from
> +    * no_eraseblocks * erasesize.  There is a fair chance it will die
> +    * in the near future.  For flashes with variable erase sizes,
> +    * no_eraseblocks signifies the "normal" size, so above calculation
> +    * remains valid.
> +    */

Which calculation are you refering to?

> +   u_int64_t size;    // Total size of the MTD
> +   u_int64_t no_eraseblocks;
> 

In my original email I mentioned that I ran into a few problems makeing 
size a 64-bit value.  Did you have any problems with it?

Thanks again for the feedback.  It's given me something I think I can sink 
my teeth into and finally make some progress.  At least I hope so :)

Bruce

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-09  2:51   ` Bruce_Leonard
@ 2008-06-09  5:21     ` Jörn Engel
  0 siblings, 0 replies; 17+ messages in thread
From: Jörn Engel @ 2008-06-09  5:21 UTC (permalink / raw)
  To: Bruce_Leonard; +Cc: linux-mtd

On Sun, 8 June 2008 19:51:47 -0700, Bruce_Leonard@selinc.com wrote:
> Jörn Engel <joern@logfs.org> wrote on 06/08/2008 05:10:20 AM:
> 
> Thanks so much for the feedback.  I think I get what you're saying, but 
> let me paraphrase it back to make sure.  My approach should be to make a 
> new interface that is 64-bit clean, using one or more functions (I'm not 
> sure I should shove ALL the functions that exist in mtd_info into a single 
> submit_fio(), but I don't know).  For my driver I should write the new 
> function(s) to be 64-bit clean and just use the new interface.  I can then 
> change the core code that impacts my specific case to take advantage of 
> the new interface (being carefull to make sure I don't break the old 
> interface).  Existing drivers can continue to use the existing interface 
> but will eventually migrate to the new interface.

Pretty much.

> If they can't be made 
> 64-bit clean for whatever reason, the new interface just becomes a wrapper 
> to the old functions.

Everything can be made 64bit clean.  The only reason against it is a
crufty driver for obsolete hardware that noone wants to touch with a 10'
pole.  So such plumbing is an indication of a social problem (laziness)
rather than a technical one. ;)

> Sooner or later everything will be on the new 
> interface and the old one can be removed.  Does that sound about right?

Yes.

> > +static int ram_submit_fio(struct mtd_info *mtd, struct fio *fio)
> > +{
> 
> I gather fio is something that's to be filled in each time ram_submit_fio 
> is called rather than being a static thing in mtd_info?  Sort of like 
> struct mtd_oob_ops in the NAND layer gets filled in based on what's being 
> called?

Yes.

> > +   int i, err = 0;
> > +   u32 ofs;
> > +   size_t *retlen;
> > +   const u_char *buf;
> > +   struct fio_vec *vec;
> > +
> > +   /* This driver cannot deal with device >32bit yet */
> > +   if (fio->fi_blockno * mtd->erasesize + fio->block_ofs + fio->fi_len
> > +         > 0xffffffff)
> > +      return -EIO;
> > +
> > +   ofs = fio->fi_blockno * mtd->erasesize + fio->block_ofs;
> > +
> > +   if (fio->fi_command == FI_ERASE) {
> > +      memset(mtd->priv + ofs, 0xff, fio->fi_len);
> > +      goto out;
> > +   }
> > +
> > +   for (i = 0; i < fio->fi_vcnt; i++) {
> > +      vec = fio->fi_io_vec + i;
> > +      buf = kmap_atomic(vec->fv_page, KM_USER0);
> > +      switch (fio->fi_command) {
> > +      case FI_READ:
> > +         err = ram_read(mtd, ofs, vec->fv_len, &retlen,
> > +               buf + vec->fv_offset);
> 
> I'm not famliar enough with block I/O to know what advantages or 
> disadvantages is has, but since all the upper layers already allocate a 
> buffer for putting date into or taking data out of, wouldn't it make more 
> sense/be simpler to just keep using it?  For example, mtd_read() in 
> mtdchar.c does a kmalloc to get a buffer that is then passed to 
> mtd->read().  Wouldn't it be simpler to just keep using that rather than 
> switching to struct page?  Or is there something that I'm missing that 
> struct page brings to the game?

Performance.  If you want the same path to deal with data from both
userspace and kernelspace, the only alternative would be memcpy, which
isn't exactly known to aid in benchmarks.

> > +         break;
> > +      case FI_WRITE:
> > +         err = ram_write(mtd, ofs, vec->fv_len, &retlen,
> > +               buf + vec->fv_offset);
> > +      default:
> > +         BUG();
> > +      }
> > +      kunmap_atomic(buf, KM_USER0);
> > +      ofs += vec->fv_len;
> > +      if (err)
> > +         goto out;
> > +   }
> > +out:
> > +   fio->fi_end_io(fio, err);
> 
> Not sure I follow the logic of the typedef here and what it is you're 
> trying to accomplish.

Asynchronous completion.  mtdram is a simple driver.  But if you have a
piece of hardware that actually takes time to do things, preferrably
with a dma engine and generating interrupts, you have to queue those
requests.

An even better example is a device consisting of four independent chips.
Here you want four independent queues.  When a new request comes in, you
decide which chip it belongs to and queue it for that chip.  If one chip
is busy and three are idle, one queue fills up and the odd request for
the other three is handled fairly quickly.  So io requests and io
completion are in a different order.

You may even want to have two queues per chip and always handle reads
before dealing with any writes or erases.

All this gives you extra performance.  It just doesn't matter much for
the stupid driver I used as an example.

> > +
> > +typedef void (fio_end_io_t) (struct fio *, int);
> > +/*

Or do you ask why I used a typedef instead of directly using the type in
struct fio?  No reason, really.  bio works that way and I simply copied
it for consistency.

> >  struct mtd_info {
> >     u_char type;
> >     u_int32_t flags;
> > -   u_int32_t size;    // Total size of the MTD
> > +   /*
> > +    * size is becoming redundant, as it can be calculated from
> > +    * no_eraseblocks * erasesize.  There is a fair chance it will die
> > +    * in the near future.  For flashes with variable erase sizes,
> > +    * no_eraseblocks signifies the "normal" size, so above calculation
> > +    * remains valid.
> > +    */
> 
> Which calculation are you refering to?

no_eraseblocks * erasesize

> > +   u_int64_t size;    // Total size of the MTD
> > +   u_int64_t no_eraseblocks;
> 
> In my original email I mentioned that I ran into a few problems makeing 
> size a 64-bit value.  Did you have any problems with it?

I didn't even compile. ;)

Basically, you could pick either of the approaches in your mail - the
result would be the same.  You need to work your way through every
single driver and every single user and fix things up.  All work and no
play.

If we are going through the trouble anyway, we might as well add
asynchronous stuff while at it.  Which I want, badly.  Plus, by having a
new function (or set of functions, doesn't matter), we can easily check
at runtime whether we're dealing with an old driver or a new one.

Jörn

-- 
Joern's library part 2:
http://www.art.net/~hopkins/Don/unix-haters/tirix/embarrassing-memo.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-08 12:10 ` Jörn Engel
  2008-06-08 12:13   ` Jörn Engel
  2008-06-09  2:51   ` Bruce_Leonard
@ 2008-06-09  4:43   ` Bruce_Leonard
  2008-06-09  5:28     ` Jörn Engel
  2008-06-09 13:21   ` Jamie Lokier
  2008-06-11  0:09   ` Bruce_Leonard
  4 siblings, 1 reply; 17+ messages in thread
From: Bruce_Leonard @ 2008-06-09  4:43 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

Jörn Engel <joern@logfs.org> wrote on 06/08/2008 05:10:20 AM:

> 
> And while at it, replace the 32bit size with a 64bit blockno and 32bit
> offset.  Unless someone is stupid enough to create devices with 4G
> erasesize, that should last for a human lifetime or something close.
> 

What kind of block is blockno?  An erase block (which is 128K in my case) 
or a write block (2K)?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-09  4:43   ` Bruce_Leonard
@ 2008-06-09  5:28     ` Jörn Engel
  0 siblings, 0 replies; 17+ messages in thread
From: Jörn Engel @ 2008-06-09  5:28 UTC (permalink / raw)
  To: Bruce_Leonard; +Cc: linux-mtd

On Sun, 8 June 2008 21:43:33 -0700, Bruce_Leonard@selinc.com wrote:
> Jörn Engel <joern@logfs.org> wrote on 06/08/2008 05:10:20 AM:
> 
> > And while at it, replace the 32bit size with a 64bit blockno and 32bit
> > offset.  Unless someone is stupid enough to create devices with 4G
> > erasesize, that should last for a human lifetime or something close.
> 
> What kind of block is blockno?  An erase block (which is 128K in my case) 
> or a write block (2K)?

eraseblock

Jörn

-- 
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-08 12:10 ` Jörn Engel
                     ` (2 preceding siblings ...)
  2008-06-09  4:43   ` Bruce_Leonard
@ 2008-06-09 13:21   ` Jamie Lokier
  2008-06-09 14:36     ` Jörn Engel
  2008-06-11  0:09   ` Bruce_Leonard
  4 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-06-09 13:21 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd, Bruce_Leonard

Jörn Engel wrote:
> replacing them with submit_fio (or some equivalent function, if
> people dislike the name).

May I suggest submit_mtdio?  One concept to one name, you know.

F means flash, sure.  But we don't call the devices /dev/flash0, /dev/flash1.
Nor do we call it the Flash subsystem.

Or, maybe we should change the name to flash everywhere.  Most people
recognise that nowadays - flash is a common term.

> Submit_fio has the advantage of allowing asynchronous operations,
> queueing, out-of-order completion and all those goodies.  If you
> have several chips in a device, I'm sure you want those as well.

Yes!  I always wondered why it wasn't done like that from the beginning.

> And while at it, replace the 32bit size with a 64bit blockno and 32bit
> offset.  Unless someone is stupid enough to create devices with 4G
> erasesize, that should last for a human lifetime or something close.

Although a single chip is unlikely to have 4GiB erasesize for a while,
a cluster of chips might, and it should be possible to represent them
as a single device.  I strongly suggest supporting erasesizes
comparable with the device size.  That doesn't mean writesize has to
be that big - writesize should fit in size_t (actually ssize_t), just
like sys_write().

> > It still leaves the problem of converting all driver.  I've only started
> on mtdram - because it is simple and everyone can test it.  Other
> drivers can follow over time.  Users like jffs2 could be converted as
> well, but I'd rather leave them for now.

> We want synchronous wrappers around submit_fio anyway, as those are
> useful for development and for trivial performance-agnostic code.
> Comments?

I'm thinking we want asynchronous wrappers around the synchronous
methods too, using a dispatch thread.  So that drivers can be
converted one by one, and old drivers don't need to be.  Something like
generic_submit_fio.

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-09 13:21   ` Jamie Lokier
@ 2008-06-09 14:36     ` Jörn Engel
  2008-06-09 16:31       ` Jamie Lokier
  0 siblings, 1 reply; 17+ messages in thread
From: Jörn Engel @ 2008-06-09 14:36 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd, Bruce_Leonard

On Mon, 9 June 2008 14:21:17 +0100, Jamie Lokier wrote:
> Jörn Engel wrote:
> > replacing them with submit_fio (or some equivalent function, if
> > people dislike the name).
> 
> May I suggest submit_mtdio?  One concept to one name, you know.
> 
> F means flash, sure.  But we don't call the devices /dev/flash0, /dev/flash1.
> Nor do we call it the Flash subsystem.

I never liked the name "mtd".  It is unpronouncable and reeks of the
kind of premature overgeneralization big corporations usually come up
with.  "One day this will cover all memory, including hard disks, cpu
caches and those five bytes in the rtc."  Or some such madness.

But mtdio is even worse.  Dude, what are you smoking? ;)

> Or, maybe we should change the name to flash everywhere.  Most people
> recognise that nowadays - flash is a common term.

The old name - as stupid as it is - has become part of the eternal
interfaces.  You'll never hunt it to extinction.

> > And while at it, replace the 32bit size with a 64bit blockno and 32bit
> > offset.  Unless someone is stupid enough to create devices with 4G
> > erasesize, that should last for a human lifetime or something close.
> 
> Although a single chip is unlikely to have 4GiB erasesize for a while,
> a cluster of chips might, and it should be possible to represent them
> as a single device.

When you have a cluster of chips, striping them and increasing writesize
and erasesize (and having each bad block kill perfectly good blocks in
the other stripes) may be the first idea you have.  But I hope it won't
be the last.

Part of the reason for asynchronous interfaces is to make parallel chips
work fast without striping them.  Or rather, make them work even faster
than stripes.

> I'm thinking we want asynchronous wrappers around the synchronous
> methods too, using a dispatch thread.  So that drivers can be
> converted one by one, and old drivers don't need to be.  Something like
> generic_submit_fio.

Maybe.  That depends on when and how the users will take advantage of
the new interface.  I have plans for logfs, but don't intend to convert
jffs2, mtdchar, mtdblock or ubi.  If others do, please send patches.

Jörn

-- 
Public Domain  - Free as in Beer
General Public - Free as in Speech
BSD License    - Free as in Enterprise
Shared Source  - Free as in "Work will make you..."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-09 14:36     ` Jörn Engel
@ 2008-06-09 16:31       ` Jamie Lokier
  2008-06-09 17:22         ` Jörn Engel
  0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-06-09 16:31 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd, Bruce_Leonard

Jörn Engel wrote:
> > May I suggest submit_mtdio?  One concept to one name, you know.
> > 
> > F means flash, sure.  But we don't call the devices /dev/flash0, /dev/flash1.
> > Nor do we call it the Flash subsystem.
> 
> I never liked the name "mtd".

Me neither.

> But mtdio is even worse.  Dude, what are you smoking? ;)

submit_fio suggests "file I/O" to me....  Even if I knew better, it
looks just that bit too much like submit_bio for something file-ish.

submit_mtd_request or async_mtd, then! :-)

> > > And while at it, replace the 32bit size with a 64bit blockno and 32bit
> > > offset.  Unless someone is stupid enough to create devices with 4G
> > > erasesize, that should last for a human lifetime or something close.
> > 
> > Although a single chip is unlikely to have 4GiB erasesize for a while,
> > a cluster of chips might, and it should be possible to represent them
> > as a single device.
> 
> When you have a cluster of chips, striping them and increasing writesize
> and erasesize (and having each bad block kill perfectly good blocks in
> the other stripes) may be the first idea you have.  But I hope it won't
> be the last.

I didn't say it's a _good_ idea!  Just that I won't be surprised when
someone produces hardware like that.  And, there will probably be a
reason you haven't thought of why they did it that way.

> Part of the reason for asynchronous interfaces is to make parallel chips
> work fast without striping them.  Or rather, make them work even faster
> than stripes.

And erasing simultaneous with streaming writes, let the CPU get on
with something else, etc.

What should the behaviour be with devices which block the CPU during
synchronous writes and erases?  Right now, you can be sure the
temporary CPU blockage (if there is one) happens when you request the
I/O.  If it's asynchronous, depending on how it's implemented, the CPU
could block at an unexpected time later.

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-09 16:31       ` Jamie Lokier
@ 2008-06-09 17:22         ` Jörn Engel
  2008-06-10 10:56           ` Jamie Lokier
  0 siblings, 1 reply; 17+ messages in thread
From: Jörn Engel @ 2008-06-09 17:22 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd, Bruce_Leonard

On Mon, 9 June 2008 17:31:49 +0100, Jamie Lokier wrote:
> 
> > But mtdio is even worse.  Dude, what are you smoking? ;)
> 
> submit_fio suggests "file I/O" to me....  Even if I knew better, it
> looks just that bit too much like submit_bio for something file-ish.
> 
> submit_mtd_request or async_mtd, then! :-)

No, the similarities to submit_bio are _not_ accidental.  In a perfect
world, I would have used submit_bio.  In our less than perfect world, I
want to retain as many similarities as possible.  We have two storage
subsystems in the kernel that have lived their own seperate lives for
all too long.  They should get combined by some brave soul.  And while I
am not that brave soul, at least I can make his/her work a little
easier.

> I didn't say it's a _good_ idea!  Just that I won't be surprised when
> someone produces hardware like that.  And, there will probably be a
> reason you haven't thought of why they did it that way.

If someone is stupid enough to create such hardware, I don't mind if
they suffer.  Quite the contrary.  Send them the darwin award of
hardware engineering!

> > Part of the reason for asynchronous interfaces is to make parallel chips
> > work fast without striping them.  Or rather, make them work even faster
> > than stripes.
> 
> And erasing simultaneous with streaming writes, let the CPU get on
> with something else, etc.

Same with asynchronous erases on all chips - except that the blocks
being erased don't need any special relationship.

> What should the behaviour be with devices which block the CPU during
> synchronous writes and erases?  Right now, you can be sure the
> temporary CPU blockage (if there is one) happens when you request the
> I/O.  If it's asynchronous, depending on how it's implemented, the CPU
> could block at an unexpected time later.

Good devices should have a dma engine and an interrupt line.  Lesser
devices can set a timer in the driver.  If the cpu has to actively do
some work during erase, you won't win any benchmark crowns with it
anyway and I care fairly little.

If you are designing hardware and want it to be fast, make sure it can
handle at least one request autonomously.  Better yet add a small queue
with - say - 16 entries and simply handle them in order.  Reads/writes
should dma the data to/from main memory.  Each completed request yields
an interrupt.

This is not rocket science.  If you have any trouble implementing
something like this, just talk to some network/ide/scsi person.  Crap
design like dual plane chips will disappear over time.  Once you get to
64 parallel planes, it is obvious that the same approach no longer
works.

Jörn

-- 
Doubt is not a pleasant condition, but certainty is an absurd one.
-- Voltaire

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-09 17:22         ` Jörn Engel
@ 2008-06-10 10:56           ` Jamie Lokier
  2008-06-10 11:46             ` Jörn Engel
  0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-06-10 10:56 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd, Bruce_Leonard

Jörn Engel wrote:
> > I didn't say it's a _good_ idea!  Just that I won't be surprised when
> > someone produces hardware like that.  And, there will probably be a
> > reason you haven't thought of why they did it that way.
> 
> If someone is stupid enough to create such hardware, I don't mind if
> they suffer.  Quite the contrary.  Send them the darwin award of
> hardware engineering!

Erm, I think a good reason will crop up in a few years that you
haven't thought of.

Like a fast system with 128TiB molecular flash where the erase power
circuit is 10^7 x larger than each storage bit.

I just don't see any advantage to assuming "nobody will ever use 4GiB
erase blocks", when changing the API with larger sizes.

> > What should the behaviour be with devices which block the CPU during
> > synchronous writes and erases?  Right now, you can be sure the
> > temporary CPU blockage (if there is one) happens when you request the
> > I/O.  If it's asynchronous, depending on how it's implemented, the CPU
> > could block at an unexpected time later.
> 
> Good devices should have a dma engine and an interrupt line.  Lesser
> devices can set a timer in the driver.  If the cpu has to actively do
> some work during erase, you won't win any benchmark crowns with it
> anyway and I care fairly little.
>
> If you are designing hardware ...

Back in the real world, we want to run current kernels on hardware
which is already built, or which is not designed for current kernels.

We also want to run current kernels on hardware designed by people who
care about minimising hardware cost, or design time, or using the
cheapest available old chips.

Fine, the async interface isn't designed for that.  If it's only used
for high-performance chips, that's fine with me.

But if you want to make the async interface the _central_ interface
for kernel flash API in future, around which everything should
revolve, please at least specify what behaviour to expect with
non-async devices - particularly, how should chip drivers behave.

There are plenty of slow, crappy parts in use for good reasons, cost,
size, and compatibility with other software or reference designs, primarily.

I have a particularly crappy one where erase blocks the CPU if the CPU
reads from the chip during the erase - so the cfi_cmdset_0002.c file
needs a patch to avoid polling the status register for this board.
This is topic for another post, though.

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-10 10:56           ` Jamie Lokier
@ 2008-06-10 11:46             ` Jörn Engel
  0 siblings, 0 replies; 17+ messages in thread
From: Jörn Engel @ 2008-06-10 11:46 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-mtd, Bruce_Leonard

On Tue, 10 June 2008 11:56:48 +0100, Jamie Lokier wrote:
> 
> I just don't see any advantage to assuming "nobody will ever use 4GiB
> erase blocks", when changing the API with larger sizes.

*shrug*

I use u64 for the number of eraseblocks because it doesn't take a lot of
imagination to see u32 overflow before I retire.  It didn't take a lot
of imagination to see the u32 size problem coming either.  Relying on "a
good reason that I haven't thought of" is a bit of a stretch.

Let me put it this way: if you spend the time and create all the
patches, you can use u128 for all I care.

> I have a particularly crappy one where erase blocks the CPU if the CPU
> reads from the chip during the erase - so the cfi_cmdset_0002.c file
> needs a patch to avoid polling the status register for this board.
> This is topic for another post, though.

Nothing can prevent bad drivers.  At least they are easy to fix.

Jörn

-- 
"Security vulnerabilities are here to stay."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-08 12:10 ` Jörn Engel
                     ` (3 preceding siblings ...)
  2008-06-09 13:21   ` Jamie Lokier
@ 2008-06-11  0:09   ` Bruce_Leonard
  2008-06-11  7:18     ` Jörn Engel
  4 siblings, 1 reply; 17+ messages in thread
From: Bruce_Leonard @ 2008-06-11  0:09 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

Jorn,

Well I'm glad I started a lively discussion :).  Most of what you and 
Jamie are talking about are beyond me though so I don't think I'll try and 
contribute.  I'll just keep asking questions to try and understand.

I think I'm starting to glean some things and one of those things is that 
I think I'm in over my head :(.  Are you suggesting that the direction we 
want to move is to change the MTD layer over to function the same way as 
the Block I/O layer?  With all of it's attendant infrastructure?  I've 
read through the 'Block Device Drivers' chapter of 'Understanding the 
Linux Kernal'.  It talks about the I/O scheduler and queues of requests 
consisting of multiple bios which can consist of multiple segments 
(bio_vec?), and a WHOLE lot of 'stuff'.  If that's what we're talking 
about then it's WAY beyond my scope or my abilities.  I'm trying to solve 
a very narrow problem (i.e., I want to access 8GiB of flash as a single 
device using UBI/UBIFS).

So here's what I suggest I can do for you and you can tell me if it'll be 
of any help.  I can continue to ask questions about the code you posted so 
I understand both the plumbing end and the calling end (I'm still a bit 
confused on the struct page part and how to use it), implement basically 
just what you showed me (not anything like I've read about BIO), and try 
to make it work for my UBI/MTD/NAND case.  That would be a start to the 
changes without being beyond my abilities and I can do it in a way that 
should't break any existing code.  Sound resonable or just a waste of 
your's and the list's time?

On to some questions:

> +static int ram_submit_fio(struct mtd_info *mtd, struct fio *fio)
> +{
> +   int i, err = 0;
> +   u32 ofs;
> +   size_t *retlen;
> +   const u_char *buf;
> +   struct fio_vec *vec;
> +
> +   /* This driver cannot deal with device >32bit yet */
> +   if (fio->fi_blockno * mtd->erasesize + fio->block_ofs + fio->fi_len
> +         > 0xffffffff)
> +      return -EIO;
> +
> +   ofs = fio->fi_blockno * mtd->erasesize + fio->block_ofs;
> +
> +   if (fio->fi_command == FI_ERASE) {
> +      memset(mtd->priv + ofs, 0xff, fio->fi_len);
> +      goto out;
> +   }
> +
> +   for (i = 0; i < fio->fi_vcnt; i++) {
> +      vec = fio->fi_io_vec + i;
> +      buf = kmap_atomic(vec->fv_page, KM_USER0);
> +      switch (fio->fi_command) {
> +      case FI_READ:
> +         err = ram_read(mtd, ofs, vec->fv_len, &retlen,
> +               buf + vec->fv_offset);

I don't understand 'struct page' which is probably where my confusion 
comes from.  'ofs' passed to ram_read() is calculated from fi_blockno and 
fi_blockofs.  This makes sense as ram_read() is going to use this as the 
starting location in the "mtd" device to read from.  'buf' is calculated 
from vec->fv_page and vec->fv_offset.  I think this makes sense, as this 
is used by ram_read() is the starting location in fv_page to write to. 
'len' confuses me a bit as it comes from vec->fv_len rather than fi_len. 
Why use the length from the vector rather than the struct fio?  Should 
they be initialized to the same thing?  I would think to use fi_len as the 
calling function should have loaded that at the same time it figured out 
where to start from (i.e., when it loaded fi_blockno and fi_blockofs). I'm 
sure I'm missing something because I've never used struct page before and 
it's probably obvious to someone with more kernel experiance.

Does vec->fv_len represent the remaining length of vec->fv_page?  And 
vec->fv_offset the offset to start accessing vec->fv_page from?  Sorry to 
be asking dumb questions.

Bruce

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-11  0:09   ` Bruce_Leonard
@ 2008-06-11  7:18     ` Jörn Engel
  2008-06-11  7:46       ` Bruce_Leonard
  0 siblings, 1 reply; 17+ messages in thread
From: Jörn Engel @ 2008-06-11  7:18 UTC (permalink / raw)
  To: Bruce_Leonard; +Cc: linux-mtd

On Tue, 10 June 2008 17:09:35 -0700, Bruce_Leonard@selinc.com wrote:
> 
> I think I'm starting to glean some things and one of those things is that 
> I think I'm in over my head :(.  Are you suggesting that the direction we 
> want to move is to change the MTD layer over to function the same way as 
> the Block I/O layer?

Either that or move the block io layer to function the same way as mtd.
Effectively the two should meet somewhere in between, but it's going to
be a lot closer to current block io than current mtd.  We want all the
goodies of both worlds, and block io simply has more.

> With all of it's attendant infrastructure?  I've 
> read through the 'Block Device Drivers' chapter of 'Understanding the 
> Linux Kernal'.  It talks about the I/O scheduler and queues of requests 
> consisting of multiple bios which can consist of multiple segments 
> (bio_vec?), and a WHOLE lot of 'stuff'.  If that's what we're talking 
> about then it's WAY beyond my scope or my abilities.

That's ok.  I don't expect you to do it all.  What I would like is to
solve your problem in a way that brings us closer to block io instead of
farther away.

> > +static int ram_submit_fio(struct mtd_info *mtd, struct fio *fio)
> > +{
> > +   int i, err = 0;
> > +   u32 ofs;
> > +   size_t *retlen;
> > +   const u_char *buf;
> > +   struct fio_vec *vec;
> > +
> > +   /* This driver cannot deal with device >32bit yet */
> > +   if (fio->fi_blockno * mtd->erasesize + fio->block_ofs + fio->fi_len
> > +         > 0xffffffff)
> > +      return -EIO;
> > +
> > +   ofs = fio->fi_blockno * mtd->erasesize + fio->block_ofs;
> > +
> > +   if (fio->fi_command == FI_ERASE) {
> > +      memset(mtd->priv + ofs, 0xff, fio->fi_len);
> > +      goto out;
> > +   }
> > +
> > +   for (i = 0; i < fio->fi_vcnt; i++) {
> > +      vec = fio->fi_io_vec + i;
> > +      buf = kmap_atomic(vec->fv_page, KM_USER0);
> > +      switch (fio->fi_command) {
> > +      case FI_READ:
> > +         err = ram_read(mtd, ofs, vec->fv_len, &retlen,
> > +               buf + vec->fv_offset);
> 
> 'len' confuses me a bit as it comes from vec->fv_len rather than fi_len. 
> Why use the length from the vector rather than the struct fio?  Should 
> they be initialized to the same thing?  I would think to use fi_len as the 
> calling function should have loaded that at the same time it figured out 
> where to start from (i.e., when it loaded fi_blockno and fi_blockofs). I'm 
> sure I'm missing something because I've never used struct page before and 
> it's probably obvious to someone with more kernel experiance.

An io-vector is used to scatter-gather io.  Basically you tell a driver
to write those 4 meg of data in one huge chunk to the device.  And, by
the way, the data is currently scattered all over memory and the
individual bits are here, here, here, here, here,...

That's why we have a loop and within each iteration of the loop only one
fragment is processed.  Each fragment consists of the page, starting
offset within the page and length.  The sum of all fragments length
fields should equal fio->fi_len.  (And it might be a good idea to check
that to catch bugs.)

> Does vec->fv_len represent the remaining length of vec->fv_page?  And 
> vec->fv_offset the offset to start accessing vec->fv_page from?  Sorry to 
> be asking dumb questions.

Just keep asking.  Your questions are far from dumb.  And lack of
experience is often a good thing.  Sometimes all the adults strongly
believe the emperor is wearing clothes and it takes a child to point out
the obvious facts. :)

Jörn

-- 
Homo Sapiens is a goal, not a description.
-- unknown

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-11  7:18     ` Jörn Engel
@ 2008-06-11  7:46       ` Bruce_Leonard
  2008-06-11  9:13         ` Jörn Engel
  0 siblings, 1 reply; 17+ messages in thread
From: Bruce_Leonard @ 2008-06-11  7:46 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd

>
> > I think I'm in over my head :(.  Are you suggesting that the direction 
we 
> > want to move is to change the MTD layer over to function the same way 
as 
> > the Block I/O layer?
> 
> Either that or move the block io layer to function the same way as mtd.
> Effectively the two should meet somewhere in between, but it's going to
> be a lot closer to current block io than current mtd.  We want all the
> goodies of both worlds, and block io simply has more.
>

Okay, that makes sense.  I think I'm actually starting to understand some 
of this :\.  How tied up in the block layer is the I/O scheduler?  With my 
limited understanding, it seems that is going to be the real sticking 
point in moving block and mtd io towards each other.  If the I/O scheduler 
is largely decoupled from bio it may make this easier than I think.

> 
> That's ok.  I don't expect you to do it all.  What I would like is to
> solve your problem in a way that brings us closer to block io instead of
> farther away.
> 

Fortunately, I think I'm well on my way to doing that.  I've taken your 
code snippets (okay pretty much stolen them wholesale, I hope that's okay) 
and started makeing changes based on them.  The changes aren't really 
radical, I don't make use of the struct fio_vec and I'm just making 
submit_fio() a wrapper around existing NAND functions, simple stuff like 
that for now.  That will at least get me working and maybe some proof of 
concept.  I hope to have a patch set in the next day or two for the list 
to look over.

Thanks again for all the help.

Bruce

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mtd_info->size again (lengthy)
  2008-06-11  7:46       ` Bruce_Leonard
@ 2008-06-11  9:13         ` Jörn Engel
  0 siblings, 0 replies; 17+ messages in thread
From: Jörn Engel @ 2008-06-11  9:13 UTC (permalink / raw)
  To: Bruce_Leonard; +Cc: linux-mtd

On Wed, 11 June 2008 00:46:30 -0700, Bruce_Leonard@selinc.com wrote:
> 
> Okay, that makes sense.  I think I'm actually starting to understand some 
> of this :\.  How tied up in the block layer is the I/O scheduler?  With my 
> limited understanding, it seems that is going to be the real sticking 
> point in moving block and mtd io towards each other.  If the I/O scheduler 
> is largely decoupled from bio it may make this easier than I think.

The block io schedulers do two things.  First they implement some kind
of elevator, combining operations and reordering them for better disk
head utilization.  That is fairly useless in flash context.  Some
schedulers also add policy, like giving all process a fair share of io.
That might become useful for flash as well.

> Fortunately, I think I'm well on my way to doing that.  I've taken your 
> code snippets (okay pretty much stolen them wholesale, I hope that's okay) 
> and started makeing changes based on them.  The changes aren't really 
> radical, I don't make use of the struct fio_vec and I'm just making 
> submit_fio() a wrapper around existing NAND functions, simple stuff like 
> that for now.  That will at least get me working and maybe some proof of 
> concept.  I hope to have a patch set in the next day or two for the list 
> to look over.

Cool.  If you solve your problem, don't break anything existing and the
infrastructure allows to handle existing drivers one by one, I am happy.
Bonus points for changing existing drivers, of course.  But I think a
one-by-one migration is better than a flag-day.  If the conversion
introduces bugs to individual drivers, we can simply revert those bits
and redo them at our leasure, instead of dealing with everything at
once.

Jörn

-- 
I don't understand it. Nobody does.
-- Richard P. Feynman

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-06-11  9:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-08  5:38 mtd_info->size again (lengthy) Bruce_Leonard
2008-06-08 12:10 ` Jörn Engel
2008-06-08 12:13   ` Jörn Engel
2008-06-09  2:51   ` Bruce_Leonard
2008-06-09  5:21     ` Jörn Engel
2008-06-09  4:43   ` Bruce_Leonard
2008-06-09  5:28     ` Jörn Engel
2008-06-09 13:21   ` Jamie Lokier
2008-06-09 14:36     ` Jörn Engel
2008-06-09 16:31       ` Jamie Lokier
2008-06-09 17:22         ` Jörn Engel
2008-06-10 10:56           ` Jamie Lokier
2008-06-10 11:46             ` Jörn Engel
2008-06-11  0:09   ` Bruce_Leonard
2008-06-11  7:18     ` Jörn Engel
2008-06-11  7:46       ` Bruce_Leonard
2008-06-11  9:13         ` Jörn Engel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox