public inbox for linux-fsdevel@vger.kernel.org
* [RFC PATCH] forget block layer request for FTLs
@ 2005-11-24 13:21 Sean Young
  2005-11-25  0:36 ` Charles Manning
  2005-11-25 10:55 ` Jörn Engel
  0 siblings, 2 replies; 5+ messages in thread
From: Sean Young @ 2005-11-24 13:21 UTC
  To: linux-fsdevel

Any flash device which appears as a normal block device to user space 
has a flash translation layer (FTL) which from time to time garbage
collects, i.e. moves data around to make room for an erase. Data might
be moved from one erase unit to another, so a full erase unit can be 
erased without losing data.

The FTL is unaware of any "unused" sectors; e.g. if a file is
unlinked the sectors where the data of that (former) file resides will
still be considered for garbage collection, even though those sectors 
are no longer relevant.

The purpose of this patch is to make it possible for file system drivers 
to inform the block device of "staleness" of no-longer-used sectors.  This 
patch introduces another barrier request, REQ_FORGET. The semantics
of this request are that no read() is expected before the next write(),
i.e. read() may return garbage in the time between forget() and the
next write().
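
To make these semantics concrete, here is a minimal user-space sketch in
plain C. It is not kernel code and not part of the patch: toy_dev,
toy_write(), toy_forget() and toy_read() are hypothetical names used only
for illustration. The model just tracks whether a sector's contents are
defined, so a read after forget() and before the next write() reports
"undefined" rather than returning data.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 8

/* Hypothetical toy device: sector data plus one "contents defined" flag each. */
struct toy_dev {
	unsigned char data[NUM_SECTORS][SECTOR_SIZE];
	bool valid[NUM_SECTORS];
};

void toy_init(struct toy_dev *d)
{
	memset(d, 0, sizeof(*d));	/* nothing written yet: all undefined */
}

void toy_write(struct toy_dev *d, unsigned int s, const unsigned char *buf)
{
	assert(s < NUM_SECTORS);
	memcpy(d->data[s], buf, SECTOR_SIZE);
	d->valid[s] = true;		/* a write makes the sector defined again */
}

void toy_forget(struct toy_dev *d, unsigned int s)
{
	assert(s < NUM_SECTORS);
	d->valid[s] = false;		/* the device may now discard the data */
}

/* Returns true and fills buf only if the sector contents are defined. */
bool toy_read(const struct toy_dev *d, unsigned int s, unsigned char *buf)
{
	assert(s < NUM_SECTORS);
	if (!d->valid[s])
		return false;		/* a real device could return garbage here */
	memcpy(buf, d->data[s], SECTOR_SIZE);
	return true;
}
```

A real FTL would not report an error on such a read; it would simply be
free to return anything, which is why the sketch refuses to fill the buffer.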

This has advantages for speed, durability and energy consumption. Erases
cost time and energy, and each one wears the flash out a little more.
Moving data costs cycles. Since less data has to be garbage collected,
fewer erases will be needed. Erases can also be done ahead of time, which
further increases write throughput.

Note that this patch does not make the possibility of "forgetting" 
sectors accessible from user-space. This could have advantages when 
user-space accesses a FTLed flash directly, e.g. mkfs or writing a file 
system image. 

Not only the in-kernel FTLs can use this. The CompactFlash specification
includes a function "erase sectors". This command instructs the
CompactFlash to forget the sector and also make sure there is enough
free space for a write without an erase. This is a little bit more than
just forget(), but still advantageous. Normally, once a CompactFlash has
had every sector written, no sectors are ever erased, which will cause
lots of garbage collection to go on; after erasing all unused sectors,
garbage collection should not happen until the flash is full once
again.


Sean
--- 
diff -uprN linux-2.6.14/block/ll_rw_blk.c linux-forget/block/ll_rw_blk.c
--- linux-2.6.14/block/ll_rw_blk.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/block/ll_rw_blk.c	2005-11-23 19:43:40.000000000 +0100
@@ -1056,6 +1056,7 @@ static char *rq_flags[] = {
 	"REQ_DRIVE_TASK",
 	"REQ_DRIVE_TASKFILE",
 	"REQ_PREEMPT",
+	"REQ_FORGET",
 	"REQ_PM_SUSPEND",
 	"REQ_PM_RESUME",
 	"REQ_PM_SHUTDOWN",
@@ -2377,6 +2378,44 @@ int blkdev_issue_flush(struct block_devi
 
 EXPORT_SYMBOL(blkdev_issue_flush);
 
+/**
+ * blkdev_issue_forget() - inform block device of stale sectors
+ * @bdev: block device to issue forget for
+ * @sector: first sector that can be forgotten
+ * @size: amount of bytes that can be forgotten
+ */
+
+void blkdev_issue_forget(struct block_device *bdev, sector_t sector, int size)
+{
+	request_queue_t *q;
+	struct request *rq;
+
+	q = bdev_get_queue(bdev);
+
+	rq = blk_get_request(q, WRITE, __GFP_WAIT);
+	rq->errors = 0;
+	rq->rq_disk = bdev->bd_disk;
+	rq->bio = NULL;
+	rq->buffer = NULL;
+	rq->timeout = 60*HZ;
+	rq->data = NULL;
+	rq->data_len = 0;
+	rq->flags |= REQ_FORGET | REQ_SOFTBARRIER | REQ_NOMERGE;
+	rq->sector = sector;
+	rq->current_nr_sectors = size >> 9;
+
+	if (bdev != bdev->bd_contains) {
+		struct hd_struct *p = bdev->bd_part;
+
+		rq->sector += p->start_sect;
+		rq->rq_disk = bdev->bd_contains;
+	}
+
+	elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 1);
+}
+
+EXPORT_SYMBOL(blkdev_issue_forget);
+
 static void drive_stat_acct(struct request *rq, int nr_sectors, int new_io)
 {
 	int rw = rq_data_dir(rq);
diff -uprN linux-2.6.14/drivers/mtd/mtd_blkdevs.c linux-forget/drivers/mtd/mtd_blkdevs.c
--- linux-2.6.14/drivers/mtd/mtd_blkdevs.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/drivers/mtd/mtd_blkdevs.c	2005-11-23 19:46:14.000000000 +0100
@@ -46,6 +46,18 @@ static int do_blktrans_request(struct mt
 	nsect = req->current_nr_sectors;
 	buf = req->buffer;
 
+	if (blk_barrier_forget(req)) {
+		if (tr->forgetsect) {
+			for (; nsect > 0; nsect--) 
+				if (tr->forgetsect(dev, block++))
+					return 0;
+
+			return 1;
+		}
+
+		return 0;
+	}
+
 	if (!(req->flags & REQ_CMD))
 		return 0;
 
diff -uprN linux-2.6.14/drivers/mtd/rfd_ftl.c linux-forget/drivers/mtd/rfd_ftl.c
--- linux-2.6.14/drivers/mtd/rfd_ftl.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/drivers/mtd/rfd_ftl.c	2005-11-23 19:43:40.000000000 +0100
@@ -621,6 +621,19 @@ err:
 	return rc;
 }
 
+static int rfd_ftl_forgetsect(struct mtd_blktrans_dev *dev, u_long sector)
+{
+	struct partition *part = (struct partition*)dev;
+
+	u_long addr = part->sector_map[sector];
+	if (addr == -1)
+		return 0;
+
+	part->sector_map[sector] = -1;
+
+	return mark_sector_deleted(part, addr);
+}
+
 static int find_free_sector(const struct partition *part, const struct block *block)
 {
 	int i, stop;
@@ -830,10 +843,11 @@ struct mtd_blktrans_ops rfd_ftl_tr = {
 	.part_bits	= PART_BITS,
 	.readsect	= rfd_ftl_readsect,
 	.writesect	= rfd_ftl_writesect,
+	.forgetsect	= rfd_ftl_forgetsect,
 	.getgeo		= rfd_ftl_getgeo,
 	.add_mtd	= rfd_ftl_add_mtd,
 	.remove_dev	= rfd_ftl_remove_dev,
-	.owner		= THIS_MODULE,
+	.owner		= THIS_MODULE
 };
 
 static int __init init_rfd_ftl(void)
diff -uprN linux-2.6.14/fs/fat/fatent.c linux-forget/fs/fat/fatent.c
--- linux-2.6.14/fs/fat/fatent.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/fs/fat/fatent.c	2005-11-23 19:43:40.000000000 +0100
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/msdos_fs.h>
+#include <linux/blkdev.h>
 
 struct fatent_operations {
 	void (*ent_blocknr)(struct super_block *, int, int *, sector_t *);
@@ -530,6 +531,10 @@ int fat_free_clusters(struct inode *inod
 	fatent_init(&fatent);
 	lock_fat(sbi);
 	do {
+		blkdev_issue_forget(sb->s_bdev, sbi->data_start +
+				((cluster - FAT_START_ENT) << (sbi->cluster_bits - 9)),
+				sbi->cluster_size);
+
 		cluster = fat_ent_read(inode, &fatent, cluster);
 		if (cluster < 0) {
 			err = cluster;
diff -uprN linux-2.6.14/include/linux/blkdev.h linux-forget/include/linux/blkdev.h
--- linux-2.6.14/include/linux/blkdev.h	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/include/linux/blkdev.h	2005-11-23 19:43:40.000000000 +0100
@@ -226,6 +226,7 @@ enum rq_flag_bits {
 	__REQ_DRIVE_TASK,
 	__REQ_DRIVE_TASKFILE,
 	__REQ_PREEMPT,		/* set for "ide_preempt" requests */
+	__REQ_FORGET,		/* forget sectors request */
 	__REQ_PM_SUSPEND,	/* suspend request */
 	__REQ_PM_RESUME,	/* resume request */
 	__REQ_PM_SHUTDOWN,	/* shutdown request */
@@ -256,6 +257,7 @@ enum rq_flag_bits {
 #define REQ_DRIVE_TASK	(1 << __REQ_DRIVE_TASK)
 #define REQ_DRIVE_TASKFILE	(1 << __REQ_DRIVE_TASKFILE)
 #define REQ_PREEMPT	(1 << __REQ_PREEMPT)
+#define REQ_FORGET	(1 << __REQ_FORGET)
 #define REQ_PM_SUSPEND	(1 << __REQ_PM_SUSPEND)
 #define REQ_PM_RESUME	(1 << __REQ_PM_RESUME)
 #define REQ_PM_SHUTDOWN	(1 << __REQ_PM_SHUTDOWN)
@@ -467,6 +469,7 @@ enum {
 #define blk_barrier_rq(rq)	((rq)->flags & REQ_HARDBARRIER)
 #define blk_barrier_preflush(rq)	((rq)->flags & REQ_BAR_PREFLUSH)
 #define blk_barrier_postflush(rq)	((rq)->flags & REQ_BAR_POSTFLUSH)
+#define blk_barrier_forget(rq)	((rq)->flags & REQ_FORGET)
 
 #define list_entry_rq(ptr)	list_entry((ptr), struct request, queuelist)
 
@@ -693,6 +696,7 @@ extern long blk_congestion_wait(int rw, 
 
 extern void blk_rq_bio_prep(request_queue_t *, struct request *, struct bio *);
 extern int blkdev_issue_flush(struct block_device *, sector_t *);
+extern void blkdev_issue_forget(struct block_device *, sector_t, int);
 
 #define MAX_PHYS_SEGMENTS 128
 #define MAX_HW_SEGMENTS 128
diff -uprN linux-2.6.14/include/linux/mtd/blktrans.h linux-forget/include/linux/mtd/blktrans.h
--- linux-2.6.14/include/linux/mtd/blktrans.h	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/include/linux/mtd/blktrans.h	2005-11-23 19:43:40.000000000 +0100
@@ -42,6 +42,7 @@ struct mtd_blktrans_ops {
 		    unsigned long block, char *buffer);
 	int (*writesect)(struct mtd_blktrans_dev *dev,
 		     unsigned long block, char *buffer);
+	int (*forgetsect)(struct mtd_blktrans_dev *dev, unsigned long block);
 
 	/* Block layer ioctls */
 	int (*getgeo)(struct mtd_blktrans_dev *dev, struct hd_geometry *geo);


* Re: [RFC PATCH] forget block layer request for FTLs
  2005-11-24 13:21 [RFC PATCH] forget block layer request for FTLs Sean Young
@ 2005-11-25  0:36 ` Charles Manning
  2005-11-25  1:42   ` Sean Young
  2005-11-27 17:54   ` Sean Young
  2005-11-25 10:55 ` Jörn Engel
  1 sibling, 2 replies; 5+ messages in thread
From: Charles Manning @ 2005-11-25  0:36 UTC
  To: Sean Young; +Cc: linux-fsdevel

On Friday 25 November 2005 02:21, Sean Young wrote:
> Any flash device which appears as a normal block device to user space
> has a flash translation layer (FTL) which from time to time garbage
> collects, i.e. moves data around to make room for an erase. Data might
> be moved from one erase unit to another, so a full erase unit can be
> erased without losing data.
>
> The FTL is unaware of any "unused" sectors; e.g. if a file is
> unlinked the sectors where the data of that (former) file resides will
> still be considered for garbage collection, even though those sectors
> are no longer relevant.
>
> The purpose of this patch is to make it possible for file system drivers
> to inform the block device of "staleness" of no-longer-used sectors.  This
> patch introduces another barrier request, REQ_FORGET. The semantics
> of this request are that no read() is expected before the next write(),
> i.e. read() may return garbage in the time between forget() and the
> next write().

Being a bit of a flash-head, I find this RFC quite interesting. I am 
struggling though to see exactly where this is going to be used. 

From what I understand, you're going to have file knowledge passed into the 
block driver and need a special fs to exploit this. This is starting to break 
down the whole point of a block driver, isn't it?

Perhaps if you hacked a fs to use this that would show the benefits.

-- Charles




* Re: [RFC PATCH] forget block layer request for FTLs
  2005-11-25  0:36 ` Charles Manning
@ 2005-11-25  1:42   ` Sean Young
  2005-11-27 17:54   ` Sean Young
  1 sibling, 0 replies; 5+ messages in thread
From: Sean Young @ 2005-11-25  1:42 UTC
  To: Charles Manning; +Cc: linux-fsdevel

On Fri, Nov 25, 2005 at 01:36:41PM +1300, Charles Manning wrote:
> On Friday 25 November 2005 02:21, Sean Young wrote:
> > Any flash device which appears as a normal block device to user space
> > has a flash translation layer (FTL) which from time to time garbage
> > collects, i.e. moves data around to make room for an erase. Data might
> > be moved from one erase unit to another, so a full erase unit can be
> > erased without losing data.
> >
> > The FTL is unaware of any "unused" sectors; e.g. if a file is
> > unlinked the sectors where the data of that (former) file resides will
> > still be considered for garbage collection, even though those sectors
> > are no longer relevant.
> >
> > The purpose of this patch is to make it possible for file system drivers
> > to inform the block device of "staleness" of no-longer-used sectors.  This
> > patch introduces another barrier request, REQ_FORGET. The semantics
> > of this request are that no read() is expected before the next write(),
> > i.e. read() may return garbage in the time between forget() and the
> > next write().
> 
> Being a bit of a flash-head, I find this RFC quite interesting. I am 
> struggling though to see exactly where this is going to be used. 
> 
> From what I understand, you're going to have file knowledge passed into the 
> block driver and need a special fs to exploit this. This is starting to break 
> down the whole point of a block driver, isn't it?
> 
> Perhaps if you hacked a fs to use this that would show the benefits.

In the patch (see previous mail), the FAT file system calls
blkdev_issue_forget() in fat_free_clusters(). So, on file unlink, the
block device gets a forget request for the data sectors where the
file previously resided.
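
The sector range covered by one cluster can be worked out roughly as
follows. This is a stand-alone sketch, not the code from the patch:
cluster_first_sector() and cluster_size_bytes() are hypothetical helpers,
and the sketch assumes 512-byte sectors with data_start counted in those
sectors. FAT numbers its data clusters starting at 2, so the sketch
subtracts 2 before scaling.

```c
#include <assert.h>
#include <stdint.h>

/* FAT data clusters are numbered from 2; entries 0 and 1 are reserved. */
#define FIRST_DATA_CLUSTER 2

/*
 * First 512-byte sector of a data cluster.
 * data_start:   first sector of the data area (assumed 512-byte units)
 * cluster_bits: log2 of the cluster size in bytes
 */
uint64_t cluster_first_sector(uint64_t data_start, uint32_t cluster,
			      unsigned int cluster_bits)
{
	assert(cluster >= FIRST_DATA_CLUSTER);
	assert(cluster_bits >= 9);
	return data_start +
	       ((uint64_t)(cluster - FIRST_DATA_CLUSTER) << (cluster_bits - 9));
}

/* Number of bytes to forget for one cluster. */
uint32_t cluster_size_bytes(unsigned int cluster_bits)
{
	return (uint32_t)1 << cluster_bits;
}
```

For example, with 4 KiB clusters (cluster_bits = 12) and a data area
starting at sector 100, cluster 5 starts at sector 100 + 3 * 8 = 124.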

Also, the patch augments the Resident Flash Disk (RFD) FTL to use this
request; those sectors are simply marked as deleted, and if an entire
erase unit is full of "deleted" (or overwritten) sectors, it is erased.
I've tested this and it really does reduce the number of erases in certain
circumstances.

This same request could be handled in other FTLs and CFA ATA.

So, no special file system is required nor is file system knowledge present
in the block device driver. Of course, this only helps in case there is an
FTL. Flash-aware file systems like jffs have far smarter ways of dealing
with this. However, FTLs are used in a great many devices nowadays.


Sean


* Re: [RFC PATCH] forget block layer request for FTLs
  2005-11-24 13:21 [RFC PATCH] forget block layer request for FTLs Sean Young
  2005-11-25  0:36 ` Charles Manning
@ 2005-11-25 10:55 ` Jörn Engel
  1 sibling, 0 replies; 5+ messages in thread
From: Jörn Engel @ 2005-11-25 10:55 UTC
  To: Sean Young; +Cc: linux-fsdevel

On Thu, 24 November 2005 14:21:57 +0100, Sean Young wrote:
> 
> [...]

Patch looks fine to me.

Jörn

-- 
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu


* Re: [RFC PATCH] forget block layer request for FTLs
  2005-11-25  0:36 ` Charles Manning
  2005-11-25  1:42   ` Sean Young
@ 2005-11-27 17:54   ` Sean Young
  1 sibling, 0 replies; 5+ messages in thread
From: Sean Young @ 2005-11-27 17:54 UTC
  To: Charles Manning; +Cc: linux-fsdevel

On Fri, Nov 25, 2005 at 01:36:41PM +1300, Charles Manning wrote:
> Being a bit of a flash-head, I find this RFC quite interesting. I am 
> struggling though to see exactly where this is going to be used. 

I don't think I explained it very well. Allow me to try again.

A normal block device like a hard disk can overwrite sectors in place;
flash, however, has erase units larger than the sector size (512 bytes),
which makes this impossible. This problem is handled by the flash
translation layer (FTL). It adds a layer of indirection: sectors are
stored at arbitrary locations on the flash, and a table maps each sector
number to its address on the flash memory chip. When a sector is
overwritten, the FTL saves the new data at a different location and
updates the table. When no free space remains on the flash, it needs to
make an entire erase unit available for erase. To achieve this, it moves
the still-valid sectors around, updating the table to stay consistent.

The problem I'm trying to solve is that the FTL is unaware of sectors
which are no longer relevant, sectors of deleted files for example. Those
sectors do not need to be moved, they can simply be "forgotten", causing
the flash memory to fill up less quickly before another erase is
necessary.

Also, by "forgetting" sectors an erase unit might become available for
erase without requiring moving of sectors, allowing the erase to happen
before the flash memory is full. So, it makes FTL faster and the flash
last longer.
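
As a rough illustration of the above, here is a toy FTL in plain
user-space C. It is only a sketch under simplifying assumptions and not the
rfd_ftl code: struct toy_ftl and all function names are hypothetical, the
chip has two erase units of four sector slots each, and garbage collection
rewrites the surviving sectors back into the unit it just erased instead of
moving them elsewhere. demo_copies() counts how many sectors garbage
collection has to move, with and without forgetting a deleted file.

```c
#include <assert.h>
#include <string.h>

#define UNITS          2	/* erase units on the toy chip */
#define SLOTS_PER_UNIT 4	/* sector slots per erase unit */
#define SLOTS          (UNITS * SLOTS_PER_UNIT)
#define LOGICAL        6	/* logical sectors exposed upward */
#define UNMAPPED       (-1)

enum slot_state { SLOT_FREE, SLOT_LIVE, SLOT_DEAD };

struct toy_ftl {
	int map[LOGICAL];		/* logical sector -> slot, or UNMAPPED */
	enum slot_state state[SLOTS];
	int owner[SLOTS];		/* logical sector held by a live slot */
	unsigned long copies;		/* sectors moved by garbage collection */
};

int find_free(const struct toy_ftl *f)
{
	for (int s = 0; s < SLOTS; s++)
		if (f->state[s] == SLOT_FREE)
			return s;
	return -1;
}

void place(struct toy_ftl *f, int sector, int slot)
{
	f->state[slot] = SLOT_LIVE;
	f->owner[slot] = sector;
	f->map[sector] = slot;
}

/* Drop the mapping; the old slot no longer needs to be preserved. */
void ftl_forget(struct toy_ftl *f, int sector)
{
	assert(sector >= 0 && sector < LOGICAL);
	if (f->map[sector] == UNMAPPED)
		return;
	f->state[f->map[sector]] = SLOT_DEAD;
	f->map[sector] = UNMAPPED;
}

/* Erase the unit with the fewest live slots, rewriting its live sectors. */
int ftl_gc(struct toy_ftl *f)
{
	int victim = -1, best = SLOTS_PER_UNIT, keep[SLOTS_PER_UNIT], n = 0;
	for (int u = 0; u < UNITS; u++) {
		int live = 0;
		for (int i = 0; i < SLOTS_PER_UNIT; i++)
			live += f->state[u * SLOTS_PER_UNIT + i] == SLOT_LIVE;
		if (live < best) { best = live; victim = u; }
	}
	if (victim < 0)
		return -1;	/* every unit is fully live: nothing to gain */
	for (int i = 0; i < SLOTS_PER_UNIT; i++) {
		int s = victim * SLOTS_PER_UNIT + i;
		if (f->state[s] == SLOT_LIVE)
			keep[n++] = f->owner[s];
		f->state[s] = SLOT_FREE;	/* the erase itself */
	}
	for (int i = 0; i < n; i++, f->copies++)
		place(f, keep[i], victim * SLOTS_PER_UNIT + i);
	return 0;
}

/* Out-of-place write: retire the old copy, then use any free slot. */
int ftl_write(struct toy_ftl *f, int sector)
{
	int slot;
	ftl_forget(f, sector);
	slot = find_free(f);
	if (slot < 0 && ftl_gc(f) == 0)
		slot = find_free(f);
	if (slot < 0)
		return -1;
	place(f, sector, slot);
	return 0;
}

/* Write all sectors, "delete" 0..3, then rewrite sectors 4, 5, 4. */
unsigned long demo_copies(int use_forget)
{
	struct toy_ftl f;
	memset(&f, 0, sizeof(f));	/* every slot starts as SLOT_FREE (0) */
	for (int i = 0; i < LOGICAL; i++)
		f.map[i] = UNMAPPED;
	for (int s = 0; s < LOGICAL; s++)
		ftl_write(&f, s);
	for (int s = 0; use_forget && s < 4; s++)
		ftl_forget(&f, s);
	for (int i = 0; i < 3; i++)
		ftl_write(&f, 4 + (i & 1));
	return f.copies;
}
```

In this small workload demo_copies(0) has to move one sector during garbage
collection, while demo_copies(1) moves none, because the erase unit that
held the deleted sectors contains nothing live and can be erased outright.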

I hope this clears things up.


Sean
