From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sean Young
Subject: [RFC PATCH] forget block layer request for FTLs
Date: Thu, 24 Nov 2005 14:21:57 +0100
Message-ID: <20051124132157.GA16439@atlantis.8hz.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from atlantis.8hz.com ([212.129.237.78]:62662 "EHLO atlantis.8hz.com") by vger.kernel.org with ESMTP id S1750904AbVKXNV7 (ORCPT ); Thu, 24 Nov 2005 08:21:59 -0500
Received: from atlantis.8hz.com (localhost [127.0.0.1]) by atlantis.8hz.com (Postfix) with ESMTP id 11FA7BB0C for ; Thu, 24 Nov 2005 14:21:58 +0100 (CET)
Received: (from sean@localhost) by atlantis.8hz.com (8.13.1/8.13.1/Submit) id jAODLvvX016466 for linux-fsdevel@vger.kernel.org; Thu, 24 Nov 2005 14:21:57 +0100 (CET) (envelope-from sean@mess.org)
To: linux-fsdevel@vger.kernel.org
Content-Disposition: inline
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Any flash device which appears as a normal block device to user space
has a flash translation layer (FTL) which from time to time garbage
collects, i.e. moves data around to make room for an erase. Data might
be moved from one erase unit to another, so that a full erase unit can
be erased without losing data.

The FTL is unaware of any "unused" sectors; e.g. if a file is unlinked,
the sectors where the data of that (former) file reside will still be
considered for garbage collection, even though those sectors are no
longer relevant. The purpose of this patch is to make it possible for
file system drivers to inform the block device of the "staleness" of
no-longer-used sectors.

This patch introduces another barrier request, REQ_FORGET. The
semantics of this request are that no read() is expected before the
next write(), i.e. read() may return garbage in the time between
forget() and the next write().

This has advantages for speed, durability and energy consumption.
Erases cost time and energy, and each one is another nail in the coffin.
Moving data costs cycles. Since less data has to be garbage collected,
fewer erases will be needed. Erases can be done ahead of time, which
further increases write throughput.

Note that this patch does not make the possibility of "forgetting"
sectors accessible from user space. This could have advantages when
user space accesses an FTLed flash directly, e.g. mkfs or writing a
file system image.

Not only the in-kernel FTLs can use this. The CompactFlash
specification includes a function "erase sectors". This command
instructs the CompactFlash to forget the sector and also to make sure
there is enough free space for a write without an erase. This is a
little bit more than just forget(), but still advantageous. Normally,
once a CompactFlash has had every sector written, no sectors are ever
erased, which causes lots of garbage collection to go on; after erasing
all unused sectors, garbage collection should not happen until the
flash is full once again.


Sean
---
diff -uprN linux-2.6.14/block/ll_rw_blk.c linux-forget/block/ll_rw_blk.c
--- linux-2.6.14/block/ll_rw_blk.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/block/ll_rw_blk.c	2005-11-23 19:43:40.000000000 +0100
@@ -1056,6 +1056,7 @@ static char *rq_flags[] = {
 	"REQ_DRIVE_TASK",
 	"REQ_DRIVE_TASKFILE",
 	"REQ_PREEMPT",
+	"REQ_FORGET",
 	"REQ_PM_SUSPEND",
 	"REQ_PM_RESUME",
 	"REQ_PM_SHUTDOWN",
@@ -2377,6 +2378,44 @@ int blkdev_issue_flush(struct block_devi
 
 EXPORT_SYMBOL(blkdev_issue_flush);
 
+/**
+ * blkdev_issue_forget() - inform block device of stale sectors
+ * @bdev:	block device to issue forget for
+ * @sector:	first sector that can be forgotten
+ * @size:	amount of bytes that can be forgotten
+ */
+
+void blkdev_issue_forget(struct block_device *bdev, sector_t sector, int size)
+{
+	request_queue_t *q;
+	struct request *rq;
+
+	q = bdev_get_queue(bdev);
+
+	rq = blk_get_request(q, WRITE, __GFP_WAIT);
+	rq->errors = 0;
+	rq->rq_disk = bdev->bd_disk;
+	rq->bio = NULL;
+	rq->buffer = NULL;
+	rq->timeout = 60*HZ;
+	rq->data = NULL;
+	rq->data_len = 0;
+	rq->flags |= REQ_FORGET | REQ_SOFTBARRIER | REQ_NOMERGE;
+	rq->sector = sector;
+	rq->current_nr_sectors = size >> 9;
+
+	if (bdev != bdev->bd_contains) {
+		struct hd_struct *p = bdev->bd_part;
+
+		rq->sector += p->start_sect;
+		rq->rq_disk = bdev->bd_contains->bd_disk;
+	}
+
+	elv_add_request(q, rq, ELEVATOR_INSERT_BACK, 1);
+}
+
+EXPORT_SYMBOL(blkdev_issue_forget);
+
 static void drive_stat_acct(struct request *rq, int nr_sectors, int new_io)
 {
 	int rw = rq_data_dir(rq);
diff -uprN linux-2.6.14/drivers/mtd/mtd_blkdevs.c linux-forget/drivers/mtd/mtd_blkdevs.c
--- linux-2.6.14/drivers/mtd/mtd_blkdevs.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/drivers/mtd/mtd_blkdevs.c	2005-11-23 19:46:14.000000000 +0100
@@ -46,6 +46,18 @@ static int do_blktrans_request(struct mt
 	nsect = req->current_nr_sectors;
 	buf = req->buffer;
 
+	if (blk_barrier_forget(req)) {
+		if (tr->forgetsect) {
+			for (; nsect > 0; nsect--)
+				if (tr->forgetsect(dev, block++))
+					return 0;
+
+			return 1;
+		}
+
+		return 0;
+	}
+
 	if (!(req->flags & REQ_CMD))
 		return 0;
 
diff -uprN linux-2.6.14/drivers/mtd/rfd_ftl.c linux-forget/drivers/mtd/rfd_ftl.c
--- linux-2.6.14/drivers/mtd/rfd_ftl.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/drivers/mtd/rfd_ftl.c	2005-11-23 19:43:40.000000000 +0100
@@ -621,6 +621,19 @@ err:
 	return rc;
 }
 
+static int rfd_ftl_forgetsect(struct mtd_blktrans_dev *dev, u_long sector)
+{
+	struct partition *part = (struct partition*)dev;
+
+	u_long addr = part->sector_map[sector];
+	if (addr == -1)
+		return 0;
+
+	part->sector_map[sector] = -1;
+
+	return mark_sector_deleted(part, addr);
+}
+
 static int find_free_sector(const struct partition *part, const struct block *block)
 {
 	int i, stop;
@@ -830,10 +843,11 @@ struct mtd_blktrans_ops rfd_ftl_tr = {
 	.part_bits	= PART_BITS,
 	.readsect	= rfd_ftl_readsect,
 	.writesect	= rfd_ftl_writesect,
+	.forgetsect	= rfd_ftl_forgetsect,
 	.getgeo		= rfd_ftl_getgeo,
 	.add_mtd	= rfd_ftl_add_mtd,
 	.remove_dev	= rfd_ftl_remove_dev,
-	.owner		= THIS_MODULE,
+	.owner		= THIS_MODULE
 };
 
 static int __init init_rfd_ftl(void)
diff -uprN linux-2.6.14/fs/fat/fatent.c linux-forget/fs/fat/fatent.c
--- linux-2.6.14/fs/fat/fatent.c	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/fs/fat/fatent.c	2005-11-23 19:43:40.000000000 +0100
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 struct fatent_operations {
 	void (*ent_blocknr)(struct super_block *, int, int *, sector_t *);
@@ -530,6 +531,10 @@ int fat_free_clusters(struct inode *inod
 	fatent_init(&fatent);
 	lock_fat(sbi);
 	do {
+		blkdev_issue_forget(sb->s_bdev, sbi->data_start +
+			((cluster - FAT_START_ENT) << (sbi->cluster_bits - 9)),
+			sbi->cluster_size);
+
 		cluster = fat_ent_read(inode, &fatent, cluster);
 		if (cluster < 0) {
 			err = cluster;
diff -uprN linux-2.6.14/include/linux/blkdev.h linux-forget/include/linux/blkdev.h
--- linux-2.6.14/include/linux/blkdev.h	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/include/linux/blkdev.h	2005-11-23 19:43:40.000000000 +0100
@@ -226,6 +226,7 @@ enum rq_flag_bits {
 	__REQ_DRIVE_TASK,
 	__REQ_DRIVE_TASKFILE,
 	__REQ_PREEMPT,		/* set for "ide_preempt" requests */
+	__REQ_FORGET,		/* forget sectors request */
 	__REQ_PM_SUSPEND,	/* suspend request */
 	__REQ_PM_RESUME,	/* resume request */
 	__REQ_PM_SHUTDOWN,	/* shutdown request */
@@ -256,6 +257,7 @@ enum rq_flag_bits {
 #define REQ_DRIVE_TASK	(1 << __REQ_DRIVE_TASK)
 #define REQ_DRIVE_TASKFILE	(1 << __REQ_DRIVE_TASKFILE)
 #define REQ_PREEMPT	(1 << __REQ_PREEMPT)
+#define REQ_FORGET	(1 << __REQ_FORGET)
 #define REQ_PM_SUSPEND	(1 << __REQ_PM_SUSPEND)
 #define REQ_PM_RESUME	(1 << __REQ_PM_RESUME)
 #define REQ_PM_SHUTDOWN	(1 << __REQ_PM_SHUTDOWN)
@@ -467,6 +469,7 @@ enum {
 #define blk_barrier_rq(rq)	((rq)->flags & REQ_HARDBARRIER)
 #define blk_barrier_preflush(rq)	((rq)->flags & REQ_BAR_PREFLUSH)
 #define blk_barrier_postflush(rq)	((rq)->flags & REQ_BAR_POSTFLUSH)
+#define blk_barrier_forget(rq)	((rq)->flags & REQ_FORGET)
 
 #define list_entry_rq(ptr)	list_entry((ptr), struct request, queuelist)
 
@@ -693,6 +696,7 @@ extern long blk_congestion_wait(int rw, 
 
 extern void blk_rq_bio_prep(request_queue_t *, struct request *, struct bio *);
 extern int blkdev_issue_flush(struct block_device *, sector_t *);
+extern void blkdev_issue_forget(struct block_device *, sector_t, int);
 
 #define MAX_PHYS_SEGMENTS 128
 #define MAX_HW_SEGMENTS 128
diff -uprN linux-2.6.14/include/linux/mtd/blktrans.h linux-forget/include/linux/mtd/blktrans.h
--- linux-2.6.14/include/linux/mtd/blktrans.h	2005-11-20 18:05:05.000000000 +0100
+++ linux-forget/include/linux/mtd/blktrans.h	2005-11-23 19:43:40.000000000 +0100
@@ -42,6 +42,7 @@ struct mtd_blktrans_ops {
 		    unsigned long block, char *buffer);
 	int (*writesect)(struct mtd_blktrans_dev *dev,
 		     unsigned long block, char *buffer);
+	int (*forgetsect)(struct mtd_blktrans_dev *dev, unsigned long block);
 
 	/* Block layer ioctls */
 	int (*getgeo)(struct mtd_blktrans_dev *dev, struct hd_geometry *geo);
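
For anyone who wants to play with the forget() semantics outside the
kernel, here is a minimal user-space sketch. It is not part of the
patch and it is not how rfd_ftl is implemented; it only models the idea
that a forgotten sector drops out of the set of sectors garbage
collection has to preserve. All names (toy_ftl, toy_init, toy_forget,
toy_forget_range, TOY_NSECT, TOY_UNMAPPED) are made up for
illustration, and the byte-to-sector conversion assumes 512-byte
sectors, as "size >> 9" does in blkdev_issue_forget().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSECT	8	/* logical sectors in the toy device (assumed) */
#define TOY_UNMAPPED	(-1L)	/* marker: no physical copy to preserve */

/* Hypothetical stand-in for an FTL's logical-to-physical sector map. */
struct toy_ftl {
	long sector_map[TOY_NSECT];
	int live;	/* sectors garbage collection would still have to copy */
};

/* Start with every logical sector mapped, i.e. a fully written device. */
static void toy_init(struct toy_ftl *f)
{
	int i;

	assert(f != NULL);
	for (i = 0; i < TOY_NSECT; i++)
		f->sector_map[i] = i;
	f->live = TOY_NSECT;
}

/* Forget one sector; forgetting an already unmapped sector is a no-op. */
static int toy_forget(struct toy_ftl *f, unsigned long sector)
{
	if (sector >= TOY_NSECT)
		return -1;	/* out of range, nothing is changed */
	if (f->sector_map[sector] == TOY_UNMAPPED)
		return 0;
	f->sector_map[sector] = TOY_UNMAPPED;
	f->live--;
	return 0;
}

/* Forget a byte range starting at sector 'first' (512-byte sectors). */
static int toy_forget_range(struct toy_ftl *f, unsigned long first, int size)
{
	int nsect = size >> 9;

	for (; nsect > 0; nsect--)
		if (toy_forget(f, first++))
			return -1;
	return 0;
}
```

After toy_init(), calling toy_forget_range(&f, 2, 3 * 512) unmaps
sectors 2, 3 and 4 and leaves f.live at 5, so a garbage collector built
on this map would have three fewer sectors to move. Forgetting sector 2
a second time changes nothing.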