linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ric Wheeler <ricwheeler@gmail.com>,
	linux-fsdevel@vger.kernel.org, gilad@codefidence.com
Subject: Re: [PATCH 1/5] [BLOCK] Add 'discard' request handling
Date: Fri, 8 Aug 2008 13:44:09 +0200	[thread overview]
Message-ID: <20080808114409.GR20055@kernel.dk> (raw)
In-Reply-To: <1218194997.12232.124.camel@pmac.infradead.org>

On Fri, Aug 08 2008, David Woodhouse wrote:
> On Fri, 2008-08-08 at 13:18 +0200, Jens Axboe wrote:
> > Actually, my memory is a bit shot, since I got rid of ->issue_flush_fn
> > in the 2.6.24 cycle. So it's already gone
> 
> It was q->prepare_flush_fn which the discard_fn is based on, and that's
> still there in the for-2.6.28 tree you just pointed me at.
> 
> http://git.kernel.dk/?p=linux-2.6-block.git;a=blob;f=block/blk-barrier.c;h=a09ead19f9c5702a1ad76d709c54969176fe9e94;hb=6dc2b733b2f9605b48fdb7692fce5a3eafe241e4#l149

Sigh indeed, ->issue_flush_fn() was the actual issuer, not the preparer.
Let me send a new diff. This adds the ->prepare_discard_fn() to do the
transformation, and also extends blkdev_issue_flush() to return error if
the IO was never queued because of some device in the stack not
supporting it.

Until we have overlap detection, I think we should make the discard
request a explicit barrier. Otherwise we could have problems with a
discard being passed by a write request and such.

diff --git a/block/blk-barrier.c b/block/blk-barrier.c
index a09ead1..b2add39 100644
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -315,3 +315,78 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
+
+static void bio_discard_end_io(struct bio *bio, int err)
+{
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
+		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+	}
+
+	bio_put(bio);
+}
+
+/**
+ * blkdev_issue_discard - queue a discard
+ * @bdev:	blockdev to issue discard for
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to discard
+ * @end_io:	end_io function (or %NULL)
+ *
+ * Description:
+ *    Issue a discard request for the sectors in question. Caller can pass an
+ *    end_io function (which must call bio_put()), if they really want to see
+ *    the outcome. Most callers probably won't, and can just pass %NULL.
+ */
+int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+			 unsigned nr_sects, bio_end_io_t end_io)
+{
+	struct request_queue *q;
+	struct bio *bio;
+	int ret;
+
+	if (bdev->bd_disk == NULL)
+		return -ENXIO;
+
+	q = bdev_get_queue(bdev);
+	if (!q)
+		return -ENXIO;
+
+	if (!q->prepare_discard_fn)
+		return -EOPNOTSUPP;
+
+	bio = bio_alloc(GFP_KERNEL, 0);
+	if (!bio)
+		return -ENOMEM;
+
+	/*
+	 * Many callers won't care at all about the outcome. After all,
+	 * this is just a hint to the underlying device. They'll just
+	 * ignore errors completely. So the end_io function can be just
+	 * a call to bio_put()
+	 */
+	if (end_io)
+		bio->bi_end_io = end_io;
+	else
+		bio->bi_end_io = bio_discard_end_io;
+
+	bio->bi_bdev = bdev;
+	bio->bi_sector = sector;
+	bio->bi_size = nr_sects << 9;
+	bio_get(bio);
+	submit_bio(1 << BIO_RW_DISCARD, bio);
+
+	/*
+	 * Check if it was failed before being submitted
+	 */
+	ret = 0;
+	if (bio_flagged(bio, BIO_EOPNOTSUPP))
+		ret = -EOPNOTSUPP;
+	else if (!bio_flagged(bio, BIO_UPTODATE))
+		ret = -EIO;
+
+	bio_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_discard);
diff --git a/block/blk-core.c b/block/blk-core.c
index ff7ec49..ad519c9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1068,7 +1068,10 @@ EXPORT_SYMBOL(blk_put_request);
 
 void init_request_from_bio(struct request *req, struct bio *bio)
 {
-	req->cmd_type = REQ_TYPE_FS;
+	if (bio_discard(bio))
+		req->cmd_type = REQ_TYPE_DISCARD;
+	else
+		req->cmd_type = REQ_TYPE_FS;
 
 	/*
 	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
@@ -1407,7 +1410,8 @@ end_io:
 
 		if (bio_check_eod(bio, nr_sectors))
 			goto end_io;
-		if (bio_empty_barrier(bio) && !q->prepare_flush_fn) {
+		if ((bio_empty_barrier(bio) && !q->prepare_flush_fn) ||
+		    (bio_discard(bio) && !q->prepare_discard_fn)) {
 			err = -EOPNOTSUPP;
 			goto end_io;
 		}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index dfc7701..c783d84 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -32,6 +32,23 @@ void blk_queue_prep_rq(struct request_queue *q, prep_rq_fn *pfn)
 }
 EXPORT_SYMBOL(blk_queue_prep_rq);
 
+ /**
+ * blk_queue_setdiscard - set a discard_sectors function for queue
+ * @q:		queue
+ * @dfn:	discard function
+ *
+ * It's possible for a queue to register a discard callback which is used
+ * to transform a discard request into the appropriate type for the
+ * hardware. If none is registered, then discard requests are failed
+ * with EOPNOTSUPP.
+ *
+ */
+void blk_queue_set_discard(struct request_queue *q, prepare_discard_fn *dfn)
+{
+	q->prepare_discard_fn = dfn;
+}
+EXPORT_SYMBOL(blk_queue_set_discard);
+
 /**
  * blk_queue_merge_bvec - set a merge_bvec function for queue
  * @q:		queue
diff --git a/block/elevator.c b/block/elevator.c
index ed6f8f3..e704086 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -75,6 +75,13 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
 		return 0;
 
 	/*
+	 * Don't merge file system and non-file system requests
+	 */
+	if ((!bio_has_data(bio) && bio_has_data(rq->bio)) ||
+	    (bio_has_data(bio) && !bio_has_data(rq->bio)))
+		return 0;
+
+	/*
 	 * different data direction or already started, don't merge
 	 */
 	if (bio_data_dir(bio) != rq_data_dir(rq))
@@ -607,7 +614,6 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		break;
 
 	case ELEVATOR_INSERT_SORT:
-		BUG_ON(!blk_fs_request(rq));
 		rq->cmd_flags |= REQ_SORTED;
 		q->nr_sorted++;
 		if (rq_mergeable(rq)) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index dbeb66f..cc678d4 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -149,6 +149,8 @@ struct bio {
  * bit 2 -- barrier
  * bit 3 -- fail fast, don't want low level driver retries
  * bit 4 -- synchronous I/O hint: the block layer will unplug immediately
+ * bit 5 -- metadata request
+ * bit 6 -- discard sectors request
  */
 #define BIO_RW		0
 #define BIO_RW_AHEAD	1
@@ -156,6 +158,7 @@ struct bio {
 #define BIO_RW_FAILFAST	3
 #define BIO_RW_SYNC	4
 #define BIO_RW_META	5
+#define BIO_RW_DISCARD	6
 
 /*
  * upper 16 bits of bi_rw define the io priority of this bio
@@ -186,13 +189,18 @@ struct bio {
 #define bio_rw_ahead(bio)	((bio)->bi_rw & (1 << BIO_RW_AHEAD))
 #define bio_rw_meta(bio)	((bio)->bi_rw & (1 << BIO_RW_META))
 #define bio_empty_barrier(bio)	(bio_barrier(bio) && !bio_has_data(bio))
+#define bio_discard(bio)	((bio)->bi_rw & (1 << BIO_RW_DISCARD))
 
 static inline unsigned int bio_cur_sectors(struct bio *bio)
 {
+	/*
+	 * Discard requests don't have an bio_vec list, yet they still
+	 * refer to a range of data. Just use the given bio size in that case.
+	 */
 	if (bio->bi_vcnt)
 		return bio_iovec(bio)->bv_len >> 9;
-
-	return 0;
+	else
+		return bio_sectors(bio);
 }
 
 static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e61f22b..1df6889 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -56,6 +56,7 @@ enum rq_cmd_type_bits {
 	REQ_TYPE_PM_SHUTDOWN,		/* shutdown request */
 	REQ_TYPE_FLUSH,			/* flush request */
 	REQ_TYPE_SPECIAL,		/* driver defined type */
+	REQ_TYPE_DISCARD,		/* discard sectors request */
 	REQ_TYPE_LINUX_BLOCK,		/* generic block layer message */
 	/*
 	 * for ATA/ATAPI devices. this really doesn't belong here, ide should
@@ -263,6 +264,7 @@ struct bvec_merge_data {
 typedef int (merge_bvec_fn) (struct request_queue *, struct bvec_merge_data *,
 			     struct bio_vec *);
 typedef void (prepare_flush_fn) (struct request_queue *, struct request *);
+typedef void (prepare_discard_fn) (struct request_queue *, struct request *);
 typedef void (softirq_done_fn)(struct request *);
 typedef int (dma_drain_needed_fn)(struct request *);
 
@@ -300,6 +302,7 @@ struct request_queue
 	unplug_fn		*unplug_fn;
 	merge_bvec_fn		*merge_bvec_fn;
 	prepare_flush_fn	*prepare_flush_fn;
+	prepare_discard_fn	*prepare_discard_fn;
 	softirq_done_fn		*softirq_done_fn;
 	dma_drain_needed_fn	*dma_drain_needed;
 
@@ -522,6 +525,7 @@ enum {
 #define blk_pc_request(rq)	((rq)->cmd_type == REQ_TYPE_BLOCK_PC)
 #define blk_special_request(rq)	((rq)->cmd_type == REQ_TYPE_SPECIAL)
 #define blk_sense_request(rq)	((rq)->cmd_type == REQ_TYPE_SENSE)
+#define blk_discard_request(rq)	((rq)->cmd_type == REQ_TYPE_DISCARD)
 
 #define blk_noretry_request(rq)	((rq)->cmd_flags & REQ_FAILFAST)
 #define blk_rq_started(rq)	((rq)->cmd_flags & REQ_STARTED)
@@ -788,6 +792,7 @@ extern void blk_queue_update_dma_alignment(struct request_queue *, int);
 extern void blk_queue_softirq_done(struct request_queue *, softirq_done_fn *);
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
 extern int blk_queue_ordered(struct request_queue *, unsigned, prepare_flush_fn *);
+extern void blk_queue_set_discard(struct request_queue *, prepare_discard_fn *);
 extern int blk_do_ordered(struct request_queue *, struct request **);
 extern unsigned blk_ordered_cur_seq(struct request_queue *);
 extern unsigned blk_ordered_req_seq(struct request *);
@@ -829,6 +834,8 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 }
 
 extern int blkdev_issue_flush(struct block_device *, sector_t *);
+extern int blkdev_issue_discard(struct block_device *, sector_t,
+				unsigned int, bio_end_io_t *);
 
 /*
 * command filter functions

-- 
Jens Axboe


  reply	other threads:[~2008-08-08 11:44 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <488B7281.4020007@gmail.com>
     [not found] ` <20080726130200.f541e604.akpm@linux-foundation.org>
2008-08-05  1:45   ` [RFC] 'discard sectors' block request David Woodhouse
2008-08-05  1:59     ` Andrew Morton
2008-08-05  9:01       ` David Woodhouse
2008-08-05 11:42     ` Jens Axboe
2008-08-05 13:32       ` David Woodhouse
2008-08-05 14:21         ` Ric Wheeler
2008-08-05 16:29       ` David Woodhouse
2008-08-05 17:25         ` David Woodhouse
2008-08-06  9:25           ` [PATCH 1/5] [BLOCK] Add 'discard' request handling David Woodhouse
2008-08-06 16:19             ` OGAWA Hirofumi
2008-08-06 18:18               ` David Woodhouse
2008-08-06 19:28                 ` OGAWA Hirofumi
2008-08-07 16:32             ` [PATCH 1/5, v2] " David Woodhouse
2008-08-07 18:41             ` [PATCH 1/5] " Jens Axboe
2008-08-08  9:33               ` David Woodhouse
2008-08-08 10:30                 ` Jens Axboe
2008-08-08 10:49                   ` Jens Axboe
2008-08-08 11:04                     ` David Woodhouse
2008-08-08 11:20                       ` Jens Axboe
2008-08-08 10:52                   ` David Woodhouse
2008-08-08 11:09                     ` Jens Axboe
2008-08-08 11:18                       ` Jens Axboe
2008-08-08 11:29                         ` David Woodhouse
2008-08-08 11:44                           ` Jens Axboe [this message]
2008-08-08 11:47                             ` Jens Axboe
2008-08-08 12:05                             ` David Woodhouse
2008-08-08 12:13                               ` Jens Axboe
2008-08-08 12:32                                 ` David Woodhouse
2008-08-08 12:37                                   ` Jens Axboe
2008-08-08 12:49                                     ` David Woodhouse
2008-08-10  1:05                                       ` Jamie Lokier
2008-08-08 15:32                             ` David Woodhouse
2008-08-08 14:22                     ` Matthew Wilcox
2008-08-08 14:27                       ` David Woodhouse
2008-08-08 14:34                         ` Matthew Wilcox
2008-08-06  9:25           ` [PATCH 2/5] [FAT] Let the block device know when sectors can be discarded David Woodhouse
2008-08-06 16:40             ` OGAWA Hirofumi
2008-08-06 17:14               ` OGAWA Hirofumi
2008-08-06 18:11               ` David Woodhouse
2008-08-06 19:10                 ` OGAWA Hirofumi
2008-08-06 19:50                   ` OGAWA Hirofumi
2008-08-06 20:10                     ` OGAWA Hirofumi
2008-08-06 21:37                   ` David Woodhouse
2008-08-07 16:09                     ` David Woodhouse
2008-08-07 16:33             ` [PATCH 2/5, v2] " David Woodhouse
2008-08-06  9:25           ` [PATCH 3/5] [MTD] Support 'discard sectors' operation in translation layer support core David Woodhouse
2008-08-06  9:25           ` [PATCH 4/5] [MTD] [FTL] Support 'discard sectors' operation David Woodhouse
2008-08-06  9:25           ` [PATCH 5/5] [BLOCK] Fix up comments about matching flags between bio and rq David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080808114409.GR20055@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=dwmw2@infradead.org \
    --cc=gilad@codefidence.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=ricwheeler@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).