* [PATCH 1/7] [BLOCK] Fix typo causing compile error in blk_queue_bounce()
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
@ 2008-08-09 16:29 ` David Woodhouse
2008-08-09 16:30 ` [PATCH 2/7] [BLOCK] Fix up comments about matching flags between bio and rq David Woodhouse
` (12 subsequent siblings)
13 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:29 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
mm/bounce.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/bounce.c b/mm/bounce.c
index f6efc47..06722c4 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -267,7 +267,7 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
/*
* Data-less bio, nothing to bounce
*/
- if (!bio_has_data(*orig_bio))
+ if (!bio_has_data(*bio_orig))
return;
/*
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* [PATCH 2/7] [BLOCK] Fix up comments about matching flags between bio and rq
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
2008-08-09 16:29 ` [PATCH 1/7] [BLOCK] Fix typo causing compile error in blk_queue_bounce() David Woodhouse
@ 2008-08-09 16:30 ` David Woodhouse
2008-08-09 16:30 ` [PATCH 3/7] [BLOCK] Add 'discard' request handling David Woodhouse
` (11 subsequent siblings)
13 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
block/blk-core.c | 7 ++-----
include/linux/bio.h | 4 ++--
include/linux/blkdev.h | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index ff7ec49..9dc5732 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -622,10 +622,6 @@ blk_alloc_request(struct request_queue *q, int rw, int priv, gfp_t gfp_mask)
blk_rq_init(q, rq);
- /*
- * first three bits are identical in rq->cmd_flags and bio->bi_rw,
- * see bio.h and blkdev.h
- */
rq->cmd_flags = rw | REQ_ALLOCED;
if (priv) {
@@ -2010,7 +2006,8 @@ EXPORT_SYMBOL_GPL(blk_end_request_callback);
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio)
{
- /* first two bits are identical in rq->cmd_flags and bio->bi_rw */
+ /* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw, and
+ we want BIO_RW_AHEAD (bit 1) to imply REQ_FAILFAST (bit 1). */
rq->cmd_flags |= (bio->bi_rw & 3);
rq->nr_phys_segments = bio_phys_segments(q, bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index dbeb66f..17f1fbd 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -150,8 +150,8 @@ struct bio {
* bit 3 -- fail fast, don't want low level driver retries
* bit 4 -- synchronous I/O hint: the block layer will unplug immediately
*/
-#define BIO_RW 0
-#define BIO_RW_AHEAD 1
+#define BIO_RW 0 /* Must match RW in req flags (blkdev.h) */
+#define BIO_RW_AHEAD 1 /* Must match FAILFAST in req flags */
#define BIO_RW_BARRIER 2
#define BIO_RW_FAILFAST 3
#define BIO_RW_SYNC 4
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e61f22b..a07659d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -84,7 +84,7 @@ enum {
};
/*
- * request type modified bits. first three bits match BIO_RW* bits, important
+ * request type modified bits. first two bits match BIO_RW* bits, important
*/
enum rq_flag_bits {
__REQ_RW, /* not set, read. set, write */
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* [PATCH 3/7] [BLOCK] Add 'discard' request handling
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
2008-08-09 16:29 ` [PATCH 1/7] [BLOCK] Fix typo causing compile error in blk_queue_bounce() David Woodhouse
2008-08-09 16:30 ` [PATCH 2/7] [BLOCK] Fix up comments about matching flags between bio and rq David Woodhouse
@ 2008-08-09 16:30 ` David Woodhouse
2008-08-09 20:39 ` OGAWA Hirofumi
2008-08-09 16:31 ` [PATCH 4/7] [FAT] Let the block device know when sectors can be discarded David Woodhouse
` (10 subsequent siblings)
13 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Some block devices benefit from a hint that they can forget the contents
of certain sectors. Add basic support for this to the block core, along
with a 'blkdev_issue_discard()' helper function which issues such
requests.
Although blkdev_issue_discard() can take an end_io function, it's
acceptable to leave that as NULL and in that case the allocated bio will
be automatically freed. Most of the time, it's expected that callers
won't care about when, or even _if_, the request completes. It's only a
hint to the device anyway. By definition, the file system doesn't _care_
about these sectors any more.
[With feedback from OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> and
Jens Axboe <jens.axboe@oracle.com]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
block/blk-barrier.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++
block/blk-core.c | 28 ++++++++++++-----
block/blk-settings.c | 17 ++++++++++
block/elevator.c | 8 ++++-
include/linux/bio.h | 8 ++++-
include/linux/blkdev.h | 17 ++++++++++
6 files changed, 144 insertions(+), 11 deletions(-)
diff --git a/block/blk-barrier.c b/block/blk-barrier.c
index a09ead1..0351148 100644
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -315,3 +315,80 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
return ret;
}
EXPORT_SYMBOL(blkdev_issue_flush);
+
+static void blkdev_discard_end_io(struct bio *bio, int err)
+{
+ bio_end_io_t *end_io = bio->bi_private;
+
+ if (err) {
+ if (err == -EOPNOTSUPP)
+ set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
+ clear_bit(BIO_UPTODATE, &bio->bi_flags);
+ }
+
+ if (end_io)
+ end_io(bio, err);
+ else
+ bio_put(bio);
+}
+
+/**
+ * blkdev_issue_discard - queue a discard
+ * @bdev: blockdev to issue discard for
+ * @sector: start sector
+ * @nr_sects: number of sectors to discard
+ * @end_io: end_io function (or %NULL)
+ *
+ * Description:
+ * Issue a discard request for the sectors in question. Caller can pass an
+ * end_io function (which must call bio_put()), if they really want to see
+ * the outcome. Most callers probably won't, and can just pass %NULL.
+ */
+int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+ unsigned nr_sects, bio_end_io_t end_io)
+{
+ struct request_queue *q;
+ struct bio *bio;
+ int ret = 0;
+
+ if (bdev->bd_disk == NULL)
+ return -ENXIO;
+
+ q = bdev_get_queue(bdev);
+ if (!q)
+ return -ENXIO;
+
+ if (!q->prepare_discard_fn)
+ return -EOPNOTSUPP;
+
+ while (nr_sects && !ret) {
+ bio = bio_alloc(GFP_KERNEL, 0);
+ if (!bio)
+ return -ENOMEM;
+
+ bio->bi_private = (void *)end_io;
+ bio->bi_end_io = blkdev_discard_end_io;
+ bio->bi_bdev = bdev;
+
+ bio->bi_sector = sector;
+
+ if (nr_sects > q->max_hw_sectors) {
+ bio->bi_size = q->max_hw_sectors << 9;
+ nr_sects -= q->max_hw_sectors;
+ } else {
+ bio->bi_size = nr_sects << 9;
+ nr_sects = 0;
+ }
+ bio_get(bio);
+ submit_bio(1 << BIO_RW_DISCARD, bio);
+
+ /* Check if it failed immediately */
+ if (bio_flagged(bio, BIO_EOPNOTSUPP))
+ ret = -EOPNOTSUPP;
+ else if (!bio_flagged(bio, BIO_UPTODATE))
+ ret = -EIO;
+ bio_put(bio);
+ }
+ return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_discard);
diff --git a/block/blk-core.c b/block/blk-core.c
index 9dc5732..2442fb7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1077,6 +1077,10 @@ void init_request_from_bio(struct request *req, struct bio *bio)
*/
if (unlikely(bio_barrier(bio)))
req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
+ if (unlikely(bio_discard(bio))) {
+ req->cmd_flags |= (REQ_SOFTBARRIER | REQ_DISCARD);
+ req->q->prepare_discard_fn(req->q, req);
+ }
if (bio_sync(bio))
req->cmd_flags |= REQ_RW_SYNC;
@@ -1093,7 +1097,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
static int __make_request(struct request_queue *q, struct bio *bio)
{
struct request *req;
- int el_ret, nr_sectors, barrier, err;
+ int el_ret, nr_sectors, barrier, discard, err;
const unsigned short prio = bio_prio(bio);
const int sync = bio_sync(bio);
int rw_flags;
@@ -1113,6 +1117,12 @@ static int __make_request(struct request_queue *q, struct bio *bio)
goto end_io;
}
+ discard = bio_discard(bio);
+ if (unlikely(discard) && !q->prepare_discard_fn) {
+ err = -EOPNOTSUPP;
+ goto end_io;
+ }
+
spin_lock_irq(q->queue_lock);
if (unlikely(barrier) || elv_queue_empty(q))
@@ -1403,7 +1413,8 @@ end_io:
if (bio_check_eod(bio, nr_sectors))
goto end_io;
- if (bio_empty_barrier(bio) && !q->prepare_flush_fn) {
+ if ((bio_empty_barrier(bio) && !q->prepare_flush_fn) ||
+ (bio_discard(bio) && !q->prepare_discard_fn)) {
err = -EOPNOTSUPP;
goto end_io;
}
@@ -1485,7 +1496,6 @@ void submit_bio(int rw, struct bio *bio)
* go through the normal accounting stuff before submission.
*/
if (bio_has_data(bio)) {
-
if (rw & WRITE) {
count_vm_events(PGPGOUT, count);
} else {
@@ -1879,7 +1889,7 @@ static int blk_end_io(struct request *rq, int error, unsigned int nr_bytes,
struct request_queue *q = rq->q;
unsigned long flags = 0UL;
- if (bio_has_data(rq->bio)) {
+ if (bio_has_data(rq->bio) || blk_discard_rq(rq)) {
if (__end_that_request_first(rq, error, nr_bytes))
return 1;
@@ -1937,7 +1947,7 @@ EXPORT_SYMBOL_GPL(blk_end_request);
**/
int __blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
{
- if (bio_has_data(rq->bio) &&
+ if ((bio_has_data(rq->bio) || blk_discard_rq(rq)) &&
__end_that_request_first(rq, error, nr_bytes))
return 1;
@@ -2010,12 +2020,14 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
we want BIO_RW_AHEAD (bit 1) to imply REQ_FAILFAST (bit 1). */
rq->cmd_flags |= (bio->bi_rw & 3);
- rq->nr_phys_segments = bio_phys_segments(q, bio);
- rq->nr_hw_segments = bio_hw_segments(q, bio);
+ if (bio_has_data(bio)) {
+ rq->nr_phys_segments = bio_phys_segments(q, bio);
+ rq->nr_hw_segments = bio_hw_segments(q, bio);
+ rq->buffer = bio_data(bio);
+ }
rq->current_nr_sectors = bio_cur_sectors(bio);
rq->hard_cur_sectors = rq->current_nr_sectors;
rq->hard_nr_sectors = rq->nr_sectors = bio_sectors(bio);
- rq->buffer = bio_data(bio);
rq->data_len = bio->bi_size;
rq->bio = rq->biotail = bio;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index dfc7701..539d873 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -33,6 +33,23 @@ void blk_queue_prep_rq(struct request_queue *q, prep_rq_fn *pfn)
EXPORT_SYMBOL(blk_queue_prep_rq);
/**
+ * blk_queue_set_discard - set a discard_sectors function for queue
+ * @q: queue
+ * @dfn: prepare_discard function
+ *
+ * It's possible for a queue to register a discard callback which is used
+ * to transform a discard request into the appropriate type for the
+ * hardware. If none is registered, then discard requests are failed
+ * with %EOPNOTSUPP.
+ *
+ */
+void blk_queue_set_discard(struct request_queue *q, prepare_discard_fn *dfn)
+{
+ q->prepare_discard_fn = dfn;
+}
+EXPORT_SYMBOL(blk_queue_set_discard);
+
+/**
* blk_queue_merge_bvec - set a merge_bvec function for queue
* @q: queue
* @mbfn: merge_bvec_fn
diff --git a/block/elevator.c b/block/elevator.c
index ed6f8f3..17ae417 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -75,6 +75,12 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
return 0;
/*
+ * Don't merge file system and non-file system requests
+ */
+ if (bio_has_data(bio) != bio_has_data(rq->bio))
+ return 0;
+
+ /*
* different data direction or already started, don't merge
*/
if (bio_data_dir(bio) != rq_data_dir(rq))
@@ -607,7 +613,7 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
break;
case ELEVATOR_INSERT_SORT:
- BUG_ON(!blk_fs_request(rq));
+ BUG_ON(!blk_fs_request(rq) && !blk_discard_rq(rq));
rq->cmd_flags |= REQ_SORTED;
q->nr_sorted++;
if (rq_mergeable(rq)) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 17f1fbd..1fdfc56 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -149,6 +149,8 @@ struct bio {
* bit 2 -- barrier
* bit 3 -- fail fast, don't want low level driver retries
* bit 4 -- synchronous I/O hint: the block layer will unplug immediately
+ * bit 5 -- metadata request
+ * bit 6 -- discard sectors
*/
#define BIO_RW 0 /* Must match RW in req flags (blkdev.h) */
#define BIO_RW_AHEAD 1 /* Must match FAILFAST in req flags */
@@ -156,6 +158,7 @@ struct bio {
#define BIO_RW_FAILFAST 3
#define BIO_RW_SYNC 4
#define BIO_RW_META 5
+#define BIO_RW_DISCARD 6
/*
* upper 16 bits of bi_rw define the io priority of this bio
@@ -186,13 +189,14 @@ struct bio {
#define bio_rw_ahead(bio) ((bio)->bi_rw & (1 << BIO_RW_AHEAD))
#define bio_rw_meta(bio) ((bio)->bi_rw & (1 << BIO_RW_META))
#define bio_empty_barrier(bio) (bio_barrier(bio) && !bio_has_data(bio))
+#define bio_discard(bio) ((bio)->bi_rw & (1 << BIO_RW_DISCARD))
static inline unsigned int bio_cur_sectors(struct bio *bio)
{
if (bio->bi_vcnt)
return bio_iovec(bio)->bv_len >> 9;
-
- return 0;
+ else /* dataless requests such as discard */
+ return bio->bi_size >> 9;
}
static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a07659d..8a65702 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -89,6 +89,7 @@ enum {
enum rq_flag_bits {
__REQ_RW, /* not set, read. set, write */
__REQ_FAILFAST, /* no low level driver retries */
+ __REQ_DISCARD, /* request to discard sectors */
__REQ_SORTED, /* elevator knows about this request */
__REQ_SOFTBARRIER, /* may not be passed by ioscheduler */
__REQ_HARDBARRIER, /* may not be passed by drive either */
@@ -111,6 +112,7 @@ enum rq_flag_bits {
};
#define REQ_RW (1 << __REQ_RW)
+#define REQ_DISCARD (1 << __REQ_DISCARD)
#define REQ_FAILFAST (1 << __REQ_FAILFAST)
#define REQ_SORTED (1 << __REQ_SORTED)
#define REQ_SOFTBARRIER (1 << __REQ_SOFTBARRIER)
@@ -252,6 +254,7 @@ typedef void (request_fn_proc) (struct request_queue *q);
typedef int (make_request_fn) (struct request_queue *q, struct bio *bio);
typedef int (prep_rq_fn) (struct request_queue *, struct request *);
typedef void (unplug_fn) (struct request_queue *);
+typedef int (prepare_discard_fn) (struct request_queue *, struct request *);
struct bio_vec;
struct bvec_merge_data {
@@ -298,6 +301,7 @@ struct request_queue
make_request_fn *make_request_fn;
prep_rq_fn *prep_rq_fn;
unplug_fn *unplug_fn;
+ prepare_discard_fn *prepare_discard_fn;
merge_bvec_fn *merge_bvec_fn;
prepare_flush_fn *prepare_flush_fn;
softirq_done_fn *softirq_done_fn;
@@ -536,6 +540,7 @@ enum {
#define blk_sorted_rq(rq) ((rq)->cmd_flags & REQ_SORTED)
#define blk_barrier_rq(rq) ((rq)->cmd_flags & REQ_HARDBARRIER)
#define blk_fua_rq(rq) ((rq)->cmd_flags & REQ_FUA)
+#define blk_discard_rq(rq) ((rq)->cmd_flags & REQ_DISCARD)
#define blk_bidi_rq(rq) ((rq)->next_rq != NULL)
#define blk_empty_barrier(rq) (blk_barrier_rq(rq) && blk_fs_request(rq) && !(rq)->hard_nr_sectors)
/* rq->queuelist of dequeued request must be list_empty() */
@@ -786,6 +791,7 @@ extern void blk_queue_merge_bvec(struct request_queue *, merge_bvec_fn *);
extern void blk_queue_dma_alignment(struct request_queue *, int);
extern void blk_queue_update_dma_alignment(struct request_queue *, int);
extern void blk_queue_softirq_done(struct request_queue *, softirq_done_fn *);
+extern void blk_queue_set_discard(struct request_queue *, prepare_discard_fn *);
extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
extern int blk_queue_ordered(struct request_queue *, unsigned, prepare_flush_fn *);
extern int blk_do_ordered(struct request_queue *, struct request **);
@@ -829,6 +835,17 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
}
extern int blkdev_issue_flush(struct block_device *, sector_t *);
+extern int blkdev_issue_discard(struct block_device *, sector_t sector,
+ unsigned nr_sects, bio_end_io_t *);
+
+static inline int sb_issue_discard(struct super_block *sb,
+ sector_t block, unsigned nr_blocks,
+ bio_end_io_t *endio)
+{
+ block <<= (sb->s_blocksize_bits - 9);
+ nr_blocks <<= (sb->s_blocksize_bits - 9);
+ return blkdev_issue_discard(sb->s_bdev, block, nr_blocks, endio);
+}
/*
* command filter functions
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 3/7] [BLOCK] Add 'discard' request handling
2008-08-09 16:30 ` [PATCH 3/7] [BLOCK] Add 'discard' request handling David Woodhouse
@ 2008-08-09 20:39 ` OGAWA Hirofumi
2008-08-09 21:37 ` David Woodhouse
2008-08-10 6:32 ` David Woodhouse
0 siblings, 2 replies; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-09 20:39 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
David Woodhouse <dwmw2@infradead.org> writes:
> + * Don't merge file system and non-file system requests
> + */
> + if (bio_has_data(bio) != bio_has_data(rq->bio))
> + return 0;
Maybe, this is bio_discard(bio) != bio_discard(rq->bio)? Because it
seems to say bio_discard(bio) == blk_empty_barrier(rq) can merge,
rq_mergeable(rq) already checked barrier before this though...
> +
> + /*
> * different data direction or already started, don't merge
> */
> if (bio_data_dir(bio) != rq_data_dir(rq))
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 3/7] [BLOCK] Add 'discard' request handling
2008-08-09 20:39 ` OGAWA Hirofumi
@ 2008-08-09 21:37 ` David Woodhouse
2008-08-10 6:32 ` David Woodhouse
1 sibling, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 21:37 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Sun, 2008-08-10 at 05:39 +0900, OGAWA Hirofumi wrote:
> David Woodhouse <dwmw2@infradead.org> writes:
>
> > + * Don't merge file system and non-file system requests
> > + */
> > + if (bio_has_data(bio) != bio_has_data(rq->bio))
> > + return 0;
>
> Maybe, this is bio_discard(bio) != bio_discard(rq->bio)? Because it
> seems to say bio_discard(bio) == blk_empty_barrier(rq) can merge,
> rq_mergeable(rq) already checked barrier before this though...
Yeah, either would actually do the job just fine, but I should probably
change it to bio_discard().
Thanks.
--
dwmw2
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 3/7] [BLOCK] Add 'discard' request handling
2008-08-09 20:39 ` OGAWA Hirofumi
2008-08-09 21:37 ` David Woodhouse
@ 2008-08-10 6:32 ` David Woodhouse
1 sibling, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 6:32 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Sun, 2008-08-10 at 05:39 +0900, OGAWA Hirofumi wrote:
> Maybe, this is bio_discard(bio) != bio_discard(rq->bio)? Because it
> seems to say bio_discard(bio) == blk_empty_barrier(rq) can merge,
> rq_mergeable(rq) already checked barrier before this though...
Yeah. It doesn't matter at the moment, as you say, but it's probably
cleaner to make it bio_discard(). That's what I was thinking as I typed
it anyway, I think. I've changed it in the git tree.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* [PATCH 4/7] [FAT] Let the block device know when sectors can be discarded
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (2 preceding siblings ...)
2008-08-09 16:30 ` [PATCH 3/7] [BLOCK] Add 'discard' request handling David Woodhouse
@ 2008-08-09 16:31 ` David Woodhouse
2008-08-09 16:32 ` [PATCH 5/7] [MTD] Support 'discard sectors' operation in translation layer support core David Woodhouse
` (9 subsequent siblings)
13 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:31 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
[hirofumi@mail.parknet.co.jp: discard _after_ checking for corrupt chains]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---
fs/fat/fatent.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index 302e95c..a8b10ac 100644
--- a/fs/fat/fatent.c
+++ b/fs/fat/fatent.c
@@ -6,6 +6,7 @@
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/msdos_fs.h>
+#include <linux/blkdev.h>
struct fatent_operations {
void (*ent_blocknr)(struct super_block *, int, int *, sector_t *);
@@ -551,6 +552,10 @@ int fat_free_clusters(struct inode *inode, int cluster)
goto error;
}
+ /* Issue discard for the sectors we no longer care about */
+ sb_issue_discard(sb, fat_clus_to_blknr(sbi, fatent.entry),
+ sbi->sec_per_clus, NULL);
+
ops->ent_put(&fatent, FAT_ENT_FREE);
if (sbi->free_clusters != -1) {
sbi->free_clusters++;
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* [PATCH 5/7] [MTD] Support 'discard sectors' operation in translation layer support core
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (3 preceding siblings ...)
2008-08-09 16:31 ` [PATCH 4/7] [FAT] Let the block device know when sectors can be discarded David Woodhouse
@ 2008-08-09 16:32 ` David Woodhouse
2008-08-09 16:33 ` [PATCH 6/7] [MTD] [FTL] Support 'discard sectors' operation David Woodhouse
` (8 subsequent siblings)
13 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:32 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
drivers/mtd/mtd_blkdevs.c | 16 ++++++++++++++++
include/linux/blkdev.h | 1 +
include/linux/mtd/blktrans.h | 2 ++
3 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 9ff007c..681d5ac 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -32,6 +32,14 @@ struct mtd_blkcore_priv {
spinlock_t queue_lock;
};
+static int blktrans_discard_request(struct request_queue *q,
+ struct request *req)
+{
+ req->cmd_type = REQ_TYPE_LINUX_BLOCK;
+ req->cmd[0] = REQ_LB_OP_DISCARD;
+ return 0;
+}
+
static int do_blktrans_request(struct mtd_blktrans_ops *tr,
struct mtd_blktrans_dev *dev,
struct request *req)
@@ -44,6 +52,10 @@ static int do_blktrans_request(struct mtd_blktrans_ops *tr,
buf = req->buffer;
+ if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
+ req->cmd[0] == REQ_LB_OP_DISCARD)
+ return !tr->discard(dev, block, nsect);
+
if (!blk_fs_request(req))
return 0;
@@ -367,6 +379,10 @@ int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
tr->blkcore_priv->rq->queuedata = tr;
blk_queue_hardsect_size(tr->blkcore_priv->rq, tr->blksize);
+ if (tr->discard)
+ blk_queue_set_discard(tr->blkcore_priv->rq,
+ blktrans_discard_request);
+
tr->blkshift = ffs(tr->blksize) - 1;
tr->blkcore_priv->thread = kthread_run(mtd_blktrans_thread, tr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8a65702..f8957c5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -81,6 +81,7 @@ enum {
*/
REQ_LB_OP_EJECT = 0x40, /* eject request */
REQ_LB_OP_FLUSH = 0x41, /* flush device */
+ REQ_LB_OP_DISCARD = 0x42, /* discard sectors */
};
/*
diff --git a/include/linux/mtd/blktrans.h b/include/linux/mtd/blktrans.h
index 310e616..8b4aa05 100644
--- a/include/linux/mtd/blktrans.h
+++ b/include/linux/mtd/blktrans.h
@@ -41,6 +41,8 @@ struct mtd_blktrans_ops {
unsigned long block, char *buffer);
int (*writesect)(struct mtd_blktrans_dev *dev,
unsigned long block, char *buffer);
+ int (*discard)(struct mtd_blktrans_dev *dev,
+ unsigned long block, unsigned nr_blocks);
/* Block layer ioctls */
int (*getgeo)(struct mtd_blktrans_dev *dev, struct hd_geometry *geo);
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* [PATCH 6/7] [MTD] [FTL] Support 'discard sectors' operation.
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (4 preceding siblings ...)
2008-08-09 16:32 ` [PATCH 5/7] [MTD] Support 'discard sectors' operation in translation layer support core David Woodhouse
@ 2008-08-09 16:33 ` David Woodhouse
2008-08-09 16:33 ` [PATCH 7/7] [BLOCK] Allow elevators to sort/merge discard requests David Woodhouse
` (7 subsequent siblings)
13 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
We can benefit from knowing that the file system no longer cares about
the contents of certain sectors, by throwing them away immediately and
then never having to garbage collect them, and using the extra free
space to make our operations more efficient. Do so.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
drivers/mtd/ftl.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/drivers/mtd/ftl.c b/drivers/mtd/ftl.c
index f34f20c..9bf581c 100644
--- a/drivers/mtd/ftl.c
+++ b/drivers/mtd/ftl.c
@@ -1005,6 +1005,29 @@ static int ftl_writesect(struct mtd_blktrans_dev *dev,
return ftl_write((void *)dev, buf, block, 1);
}
+static int ftl_discardsect(struct mtd_blktrans_dev *dev,
+ unsigned long sector, unsigned nr_sects)
+{
+ partition_t *part = (void *)dev;
+ uint32_t bsize = 1 << part->header.EraseUnitSize;
+
+ DEBUG(1, "FTL erase sector %ld for %d sectors\n",
+ sector, nr_sects);
+
+ while (nr_sects) {
+ uint32_t old_addr = part->VirtualBlockMap[sector];
+ if (old_addr != 0xffffffff) {
+ part->VirtualBlockMap[sector] = 0xffffffff;
+ part->EUNInfo[old_addr/bsize].Deleted++;
+ if (set_bam_entry(part, old_addr, 0))
+ return -EIO;
+ }
+ nr_sects--;
+ sector++;
+ }
+
+ return 0;
+}
/*====================================================================*/
static void ftl_freepart(partition_t *part)
@@ -1069,6 +1092,7 @@ static struct mtd_blktrans_ops ftl_tr = {
.blksize = SECTOR_SIZE,
.readsect = ftl_readsect,
.writesect = ftl_writesect,
+ .discard = ftl_discardsect,
.getgeo = ftl_getgeo,
.add_mtd = ftl_add_mtd,
.remove_dev = ftl_remove_dev,
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* [PATCH 7/7] [BLOCK] Allow elevators to sort/merge discard requests
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (5 preceding siblings ...)
2008-08-09 16:33 ` [PATCH 6/7] [MTD] [FTL] Support 'discard sectors' operation David Woodhouse
@ 2008-08-09 16:33 ` David Woodhouse
2008-10-03 20:29 ` Andrew Morton
2008-08-09 22:48 ` [PATCH 0/7] Discard requests, v2 OGAWA Hirofumi
` (6 subsequent siblings)
13 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-09 16:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
block/blk-core.c | 2 +-
block/blk-merge.c | 12 +++++++-----
block/elevator.c | 4 +++-
include/linux/blkdev.h | 5 +++--
4 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 2442fb7..0c8ed97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1078,7 +1078,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
if (unlikely(bio_barrier(bio)))
req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
if (unlikely(bio_discard(bio))) {
- req->cmd_flags |= (REQ_SOFTBARRIER | REQ_DISCARD);
+ req->cmd_flags |= REQ_DISCARD;
req->q->prepare_discard_fn(req->q, req);
}
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5efc9e7..58a0ea3 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -11,7 +11,7 @@
void blk_recalc_rq_sectors(struct request *rq, int nsect)
{
- if (blk_fs_request(rq)) {
+ if (blk_fs_request(rq) || blk_discard_rq(rq)) {
rq->hard_sector += nsect;
rq->hard_nr_sectors -= nsect;
@@ -317,8 +317,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
if (!bio_flagged(bio, BIO_SEG_VALID))
blk_recount_segments(q, bio);
len = req->biotail->bi_hw_back_size + bio->bi_hw_front_size;
- if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(req->biotail), __BVEC_START(bio))
- && !BIOVEC_VIRT_OVERSIZE(len)) {
+ if (!bio_has_data(bio) ||
+ (BIOVEC_VIRT_MERGEABLE(__BVEC_END(req->biotail), __BVEC_START(bio))
+ && !BIOVEC_VIRT_OVERSIZE(len))) {
int mergeable = ll_new_mergeable(q, req, bio);
if (mergeable) {
@@ -356,8 +357,9 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
blk_recount_segments(q, bio);
if (!bio_flagged(req->bio, BIO_SEG_VALID))
blk_recount_segments(q, req->bio);
- if (BIOVEC_VIRT_MERGEABLE(__BVEC_END(bio), __BVEC_START(req->bio)) &&
- !BIOVEC_VIRT_OVERSIZE(len)) {
+ if (!bio_has_data(bio) ||
+ (BIOVEC_VIRT_MERGEABLE(__BVEC_END(bio), __BVEC_START(req->bio)) &&
+ !BIOVEC_VIRT_OVERSIZE(len))) {
int mergeable = ll_new_mergeable(q, req, bio);
if (mergeable) {
diff --git a/block/elevator.c b/block/elevator.c
index 17ae417..4e27daa 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -444,6 +444,8 @@ void elv_dispatch_sort(struct request_queue *q, struct request *rq)
list_for_each_prev(entry, &q->queue_head) {
struct request *pos = list_entry_rq(entry);
+ if (blk_discard_rq(rq) != blk_discard_rq(pos))
+ break;
if (rq_data_dir(rq) != rq_data_dir(pos))
break;
if (pos->cmd_flags & stop_flags)
@@ -698,7 +700,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where,
* this request is scheduling boundary, update
* end_sector
*/
- if (blk_fs_request(rq)) {
+ if (blk_fs_request(rq) || blk_discard_rq(rq)) {
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f8957c5..b9cb7e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -531,7 +531,7 @@ enum {
#define blk_noretry_request(rq) ((rq)->cmd_flags & REQ_FAILFAST)
#define blk_rq_started(rq) ((rq)->cmd_flags & REQ_STARTED)
-#define blk_account_rq(rq) (blk_rq_started(rq) && blk_fs_request(rq))
+#define blk_account_rq(rq) (blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq)))
#define blk_pm_suspend_request(rq) ((rq)->cmd_type == REQ_TYPE_PM_SUSPEND)
#define blk_pm_resume_request(rq) ((rq)->cmd_type == REQ_TYPE_PM_RESUME)
@@ -588,7 +588,8 @@ static inline void blk_clear_queue_full(struct request_queue *q, int rw)
#define RQ_NOMERGE_FLAGS \
(REQ_NOMERGE | REQ_STARTED | REQ_HARDBARRIER | REQ_SOFTBARRIER)
#define rq_mergeable(rq) \
- (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && blk_fs_request((rq)))
+ (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
+ (blk_discard_rq(rq) || blk_fs_request((rq))))
/*
* q->prep_rq_fn return values
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 7/7] [BLOCK] Allow elevators to sort/merge discard requests
2008-08-09 16:33 ` [PATCH 7/7] [BLOCK] Allow elevators to sort/merge discard requests David Woodhouse
@ 2008-10-03 20:29 ` Andrew Morton
2008-10-07 12:07 ` Jens Axboe
0 siblings, 1 reply; 88+ messages in thread
From: Andrew Morton @ 2008-10-03 20:29 UTC (permalink / raw)
To: David Woodhouse; +Cc: jens.axboe, ricwheeler, linux-fsdevel, gilad, matthew
On Sat, 09 Aug 2008 17:33:46 +0100
David Woodhouse <dwmw2@infradead.org> wrote:
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> block/blk-core.c | 2 +-
> block/blk-merge.c | 12 +++++++-----
> block/elevator.c | 4 +++-
> include/linux/blkdev.h | 5 +++--
> 4 files changed, 14 insertions(+), 9 deletions(-)
I seem to be having a grumpier day.
ERROR: trailing whitespace
#96: FILE: block/blk-merge.c:320:
+^Iif (!bio_has_data(bio) || $
ERROR: trailing whitespace
#108: FILE: block/blk-merge.c:360:
+^Iif (!bio_has_data(bio) || $
ERROR: trailing whitespace
#145: FILE: include/linux/blkdev.h:534:
+#define blk_account_rq(rq)^I(blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq))) $
WARNING: line over 80 characters
#145: FILE: include/linux/blkdev.h:534:
+#define blk_account_rq(rq) (blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq)))
total: 3 errors, 1 warnings, 71 lines checked
Do we think there were tangible benefits to adding new trailing
whitespace?
> -#define blk_account_rq(rq) (blk_rq_started(rq) && blk_fs_request(rq))
> +#define blk_account_rq(rq) (blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq)))
>
> #define blk_pm_suspend_request(rq) ((rq)->cmd_type == REQ_TYPE_PM_SUSPEND)
> #define blk_pm_resume_request(rq) ((rq)->cmd_type == REQ_TYPE_PM_RESUME)
> @@ -588,7 +588,8 @@ static inline void blk_clear_queue_full(struct request_queue *q, int rw)
> #define RQ_NOMERGE_FLAGS \
> (REQ_NOMERGE | REQ_STARTED | REQ_HARDBARRIER | REQ_SOFTBARRIER)
> #define rq_mergeable(rq) \
> - (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && blk_fs_request((rq)))
> + (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
> + (blk_discard_rq(rq) || blk_fs_request((rq))))
Can we please stop writing the kernel in cpp?
Not only are these things completely foul and stupid from a readbility
POV, they are flat out buggy if passed expression-with-side-effects and
will generate large amounts of additional text if called with _any_
expression other than a local variable or something which by some
miracle GCC can 100% CSE.
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 7/7] [BLOCK] Allow elevators to sort/merge discard requests
2008-10-03 20:29 ` Andrew Morton
@ 2008-10-07 12:07 ` Jens Axboe
0 siblings, 0 replies; 88+ messages in thread
From: Jens Axboe @ 2008-10-07 12:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Woodhouse, ricwheeler, linux-fsdevel, gilad, matthew
On Fri, Oct 03 2008, Andrew Morton wrote:
> On Sat, 09 Aug 2008 17:33:46 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
>
> > Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> > ---
> > block/blk-core.c | 2 +-
> > block/blk-merge.c | 12 +++++++-----
> > block/elevator.c | 4 +++-
> > include/linux/blkdev.h | 5 +++--
> > 4 files changed, 14 insertions(+), 9 deletions(-)
>
> I seem to be having a grumpier day.
>
> ERROR: trailing whitespace
> #96: FILE: block/blk-merge.c:320:
> +^Iif (!bio_has_data(bio) || $
>
> ERROR: trailing whitespace
> #108: FILE: block/blk-merge.c:360:
> +^Iif (!bio_has_data(bio) || $
>
> ERROR: trailing whitespace
> #145: FILE: include/linux/blkdev.h:534:
> +#define blk_account_rq(rq)^I(blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq))) $
>
> WARNING: line over 80 characters
> #145: FILE: include/linux/blkdev.h:534:
> +#define blk_account_rq(rq) (blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq)))
>
> total: 3 errors, 1 warnings, 71 lines checked
>
>
> Do we think there were tangible benefits to adding new trailing
> whitespace?
I fixed that up in a later patch.
> > -#define blk_account_rq(rq) (blk_rq_started(rq) && blk_fs_request(rq))
> > +#define blk_account_rq(rq) (blk_rq_started(rq) && (blk_fs_request(rq) || blk_discard_rq(rq)))
> >
> > #define blk_pm_suspend_request(rq) ((rq)->cmd_type == REQ_TYPE_PM_SUSPEND)
> > #define blk_pm_resume_request(rq) ((rq)->cmd_type == REQ_TYPE_PM_RESUME)
> > @@ -588,7 +588,8 @@ static inline void blk_clear_queue_full(struct request_queue *q, int rw)
> > #define RQ_NOMERGE_FLAGS \
> > (REQ_NOMERGE | REQ_STARTED | REQ_HARDBARRIER | REQ_SOFTBARRIER)
> > #define rq_mergeable(rq) \
> > - (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && blk_fs_request((rq)))
> > + (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
> > + (blk_discard_rq(rq) || blk_fs_request((rq))))
>
> Can we please stop writing the kernel in cpp?
>
> Not only are these things completely foul and stupid from a readbility
> POV, they are flat out buggy if passed expression-with-side-effects and
> will generate large amounts of additional text if called with _any_
> expression other than a local variable or something which by some
> miracle GCC can 100% CSE.
It's not real pretty, but I'd be hard pressed to think of a code example
that has ever passed 'rq' as an expression with side effects...
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (6 preceding siblings ...)
2008-08-09 16:33 ` [PATCH 7/7] [BLOCK] Allow elevators to sort/merge discard requests David Woodhouse
@ 2008-08-09 22:48 ` OGAWA Hirofumi
2008-08-10 10:25 ` David Woodhouse
2008-08-10 10:29 ` [PATCH 8/7] blktrace: support discard requests David Woodhouse
` (5 subsequent siblings)
13 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-09 22:48 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
David Woodhouse <dwmw2@infradead.org> writes:
> This time on top of the for-2.6.28 branch of
> git://git.kernel.dk/linux-2.6-block.git
>
> I've made it cope with merging and sorting discard requests, still as a
> separate patch at the end of the sequence. I don't think we have a
> problem with discards passing writes in the queue, any more than we
> _already_ had a problem with writes passing writes.
>
> I've at least made elv_dispatch_sort() prevent discard requests from
> passing read/write requests, just as it already prevented read and write
> requests from passing each other.
I wonder about I/O schedule impact...
Currently it seems we schedule data direction as READ (no flags).
Although I'm not sure, we might want to schedule it as WRITE? (I guess
WRITE can be allowed delay more than READ...)
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-09 22:48 ` [PATCH 0/7] Discard requests, v2 OGAWA Hirofumi
@ 2008-08-10 10:25 ` David Woodhouse
2008-08-10 16:37 ` Jamie Lokier
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 10:25 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Sun, 2008-08-10 at 07:48 +0900, OGAWA Hirofumi wrote:
> I wonder about I/O schedule impact...
>
> Currently it seems we schedule data direction as READ (no flags).
> Although I'm not sure, we might want to schedule it as WRITE? (I guess
> WRITE can be allowed delay more than READ...)
I'm not sure it matters much -- this is mostly going to be used on
devices which aren't affected by I/O scheduling at all and should
probably be using the no-op scheduler.
But we could do it fairly easily -- we just have to make sure it's done
before blk_alloc_request() shows it to elv_set_request(), which means we
can't just do it in init_request_from_bio() because that's too late.
diff --git a/block/blk-core.c b/block/blk-core.c
index 0c8ed97..b92e8aa 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1489,6 +1489,8 @@ void submit_bio(int rw, struct bio *bio)
{
int count = bio_sectors(bio);
+ if (rw & (1 << BIO_RW_DISCARD))
+ rw |= WRITE;
bio->bi_rw |= rw;
/*
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-10 10:25 ` David Woodhouse
@ 2008-08-10 16:37 ` Jamie Lokier
2008-08-10 17:55 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: Jamie Lokier @ 2008-08-10 16:37 UTC (permalink / raw)
To: David Woodhouse
Cc: OGAWA Hirofumi, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
David Woodhouse wrote:
> > Currently it seems we schedule data direction as READ (no flags).
> > Although I'm not sure, we might want to schedule it as WRITE? (I guess
> > WRITE can be allowed delay more than READ...)
>
> I'm not sure it matters much -- this is mostly going to be used on
> devices which aren't affected by I/O scheduling at all and should
> probably be using the no-op scheduler.
One case where it might need I/O scheduling is an
non-volatile-memory-backed ATA/SCSI controller. (E.g. typical RAID
card with battery, or the newer disks with flash).
You want the schedule, because there are disks behind the controller.
But you want discard requests too, so they can discard from the
non-volatile memory before it hits the disk.
Also, on flash devices there are no seek delays, but even small writes
can take a while, when their internal queues reach a synchronous block
erase state. If there's lots of WRITES or DISCARDS queued, you
wouldn't want them to affect latency-sensitive READ.
For barriers they're asymmetric. DISCARD must come after barriers,
but keeping DISCARD before barriers doesn't seem to matter. Treating
them like WRITE is stronger than necessary but fine.
-- Jamie
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-10 16:37 ` Jamie Lokier
@ 2008-08-10 17:55 ` OGAWA Hirofumi
2008-08-10 20:07 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-10 17:55 UTC (permalink / raw)
To: Jamie Lokier
Cc: David Woodhouse, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
Jamie Lokier <jamie@shareable.org> writes:
> David Woodhouse wrote:
>> > Currently it seems we schedule data direction as READ (no flags).
>> > Although I'm not sure, we might want to schedule it as WRITE? (I guess
>> > WRITE can be allowed delay more than READ...)
>>
>> I'm not sure it matters much -- this is mostly going to be used on
>> devices which aren't affected by I/O scheduling at all and should
>> probably be using the no-op scheduler.
[...]
> you wouldn't want them to affect latency-sensitive READ.
Yes. My concern is like it. If it's READ and it's passed to device
immediately like sync, I guess there is impact to userland throughput.
[...]
@@ -1489,6 +1489,8 @@ void submit_bio(int rw, struct bio *bio)
{
int count = bio_sectors(bio);
+ if (rw & (1 << BIO_RW_DISCARD))
+ rw |= WRITE;
bio->bi_rw |= rw;
Or another options is caller should do? e.g. (WRITE | (1 << BIO_RW_DISCARD))
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-10 17:55 ` OGAWA Hirofumi
@ 2008-08-10 20:07 ` David Woodhouse
2008-08-10 21:40 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 20:07 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jamie Lokier, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
On Mon, 2008-08-11 at 02:55 +0900, OGAWA Hirofumi wrote:
> Or another options is caller should do? e.g. (WRITE | (1 <<
> BIO_RW_DISCARD))
Yeah, I thought about that. Figured it was better just to sort it out in
submit_bio() rather than relying on the callers. Although admittedly
there is precisely _one_ caller right now...
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-10 20:07 ` David Woodhouse
@ 2008-08-10 21:40 ` OGAWA Hirofumi
2008-08-11 9:40 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-10 21:40 UTC (permalink / raw)
To: David Woodhouse
Cc: Jamie Lokier, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
David Woodhouse <dwmw2@infradead.org> writes:
> On Mon, 2008-08-11 at 02:55 +0900, OGAWA Hirofumi wrote:
>> Or another options is caller should do? e.g. (WRITE | (1 <<
>> BIO_RW_DISCARD))
>
> Yeah, I thought about that. Figured it was better just to sort it out in
> submit_bio() rather than relying on the callers. Although admittedly
> there is precisely _one_ caller right now...
Although I don't have strong opinion, however, I've noticed
WRITE_BARRIER seems to be already doing similar thing. So, I just
imaged it like following... [I don't know whether blkdev_issue_flush()
really want to use (1 << BIO_RW_BARRIER) or not.]
diff -puN include/linux/fs.h~test include/linux/fs.h
--- linux-2.6/include/linux/fs.h~test 2008-08-11 06:22:38.000000000 +0900
+++ linux-2.6-hirofumi/include/linux/fs.h 2008-08-11 06:24:23.000000000 +0900
@@ -86,7 +86,8 @@ extern int dir_notify_enable;
#define READ_META (READ | (1 << BIO_RW_META))
#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC))
#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC))
-#define WRITE_BARRIER ((1 << BIO_RW) | (1 << BIO_RW_BARRIER))
+#define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
+#define WRITE_DISCARD (WRITE | (1 << BIO_RW_DISCARD))
#define SEL_IN 1
#define SEL_OUT 2
diff -puN block/blk-barrier.c~test block/blk-barrier.c
--- linux-2.6/block/blk-barrier.c~test 2008-08-11 06:29:13.000000000 +0900
+++ linux-2.6-hirofumi/block/blk-barrier.c 2008-08-11 06:30:16.000000000 +0900
@@ -293,7 +293,7 @@ int blkdev_issue_flush(struct block_devi
bio->bi_end_io = bio_end_empty_barrier;
bio->bi_private = &wait;
bio->bi_bdev = bdev;
- submit_bio(1 << BIO_RW_BARRIER, bio);
+ submit_bio(WRITE_BARRIER, bio);
wait_for_completion(&wait);
_
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-10 21:40 ` OGAWA Hirofumi
@ 2008-08-11 9:40 ` David Woodhouse
2008-08-11 10:25 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-11 9:40 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jamie Lokier, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
On Mon, 2008-08-11 at 06:40 +0900, OGAWA Hirofumi wrote:
> Although I don't have strong opinion, however, I've noticed
> WRITE_BARRIER seems to be already doing similar thing. So, I just
> imaged it like following...
OK, that makes sense. I'll change it to be consistent with that then.
Thank you.
> [I don't know whether blkdev_issue_flush()
> really want to use (1 << BIO_RW_BARRIER) or not.]
I think it seems reasonable, but let's do it in a separate patch. Do you
want me to put it in my tree, or will you send it to Jens directly?
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-11 9:40 ` David Woodhouse
@ 2008-08-11 10:25 ` OGAWA Hirofumi
2008-08-11 13:17 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-11 10:25 UTC (permalink / raw)
To: David Woodhouse
Cc: Jamie Lokier, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
David Woodhouse <dwmw2@infradead.org> writes:
>> [I don't know whether blkdev_issue_flush()
>> really want to use (1 << BIO_RW_BARRIER) or not.]
>
> I think it seems reasonable, but let's do it in a separate patch. Do you
> want me to put it in my tree, or will you send it to Jens directly?
Thanks. Could you put it in your tree?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-11 10:25 ` OGAWA Hirofumi
@ 2008-08-11 13:17 ` David Woodhouse
2008-08-11 14:21 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-11 13:17 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Jamie Lokier, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
On Mon, 2008-08-11 at 19:25 +0900, OGAWA Hirofumi wrote:
> David Woodhouse <dwmw2@infradead.org> writes:
>
> >> [I don't know whether blkdev_issue_flush()
> >> really want to use (1 << BIO_RW_BARRIER) or not.]
> >
> > I think it seems reasonable, but let's do it in a separate patch. Do you
> > want me to put it in my tree, or will you send it to Jens directly?
>
> Thanks. Could you put it in your tree
Will do. Can I have it with a signed-off-by?
Now I'm happy that request merging works, I'll stress it a little less,
too...
diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index a8b10ac..0942a54 100644
--- a/fs/fat/fatent.c
+++ b/fs/fat/fatent.c
@@ -536,6 +536,7 @@ int fat_free_clusters(struct inode *inode, int cluster)
struct fat_entry fatent;
struct buffer_head *bhs[MAX_BUF_PER_PAGE];
int i, err, nr_bhs;
+ int first_cl = cluster;
nr_bhs = 0;
fatent_init(&fatent);
@@ -552,9 +553,15 @@ int fat_free_clusters(struct inode *inode, int cluster)
goto error;
}
- /* Issue discard for the sectors we no longer care about */
- sb_issue_discard(sb, fat_clus_to_blknr(sbi, fatent.entry),
- sbi->sec_per_clus, NULL);
+ /* Issue discard for the sectors we no longer care about,
+ batching contiguous clusters into one request */
+ if (cluster != fatent.entry + 1) {
+ int nr_clus = fatent.entry - first_cl + 1;
+
+ sb_issue_discard(sb, fat_clus_to_blknr(sbi, first_cl),
+ nr_clus * sbi->sec_per_clus, NULL);
+ first_cl = cluster;
+ }
ops->ent_put(&fatent, FAT_ENT_FREE);
if (sbi->free_clusters != -1) {
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-11 13:17 ` David Woodhouse
@ 2008-08-11 14:21 ` OGAWA Hirofumi
0 siblings, 0 replies; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-11 14:21 UTC (permalink / raw)
To: David Woodhouse
Cc: Jamie Lokier, Jens Axboe, Andrew Morton, Ric Wheeler,
linux-fsdevel, gilad, matthew
David Woodhouse <dwmw2@infradead.org> writes:
> On Mon, 2008-08-11 at 19:25 +0900, OGAWA Hirofumi wrote:
>> David Woodhouse <dwmw2@infradead.org> writes:
>>
>> >> [I don't know whether blkdev_issue_flush()
>> >> really want to use (1 << BIO_RW_BARRIER) or not.]
>> >
>> > I think it seems reasonable, but let's do it in a separate patch. Do you
>> > want me to put it in my tree, or will you send it to Jens directly?
>>
>> Thanks. Could you put it in your tree
>
> Will do. Can I have it with a signed-off-by?
Ah, of course.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> @@ -552,9 +553,15 @@ int fat_free_clusters(struct inode *inode, int cluster)
> goto error;
> }
>
> - /* Issue discard for the sectors we no longer care about */
> - sb_issue_discard(sb, fat_clus_to_blknr(sbi, fatent.entry),
> - sbi->sec_per_clus, NULL);
> + /* Issue discard for the sectors we no longer care about,
> + batching contiguous clusters into one request */
> + if (cluster != fatent.entry + 1) {
> + int nr_clus = fatent.entry - first_cl + 1;
> +
> + sb_issue_discard(sb, fat_clus_to_blknr(sbi, first_cl),
> + nr_clus * sbi->sec_per_clus, NULL);
Looks good to me. But, could you use following comment style for FAT?
/*
* Issue discard for the sectors we no longer care about,
* batching contiguous clusters into one request
*/
Then, Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* [PATCH 8/7] blktrace: support discard requests
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (7 preceding siblings ...)
2008-08-09 22:48 ` [PATCH 0/7] Discard requests, v2 OGAWA Hirofumi
@ 2008-08-10 10:29 ` David Woodhouse
2008-08-10 10:35 ` [USERSPACE PATCH] " David Woodhouse
2008-08-10 10:41 ` [PATCH 8/7] " David Woodhouse
2008-08-10 11:48 ` [PATCH 9/7] blktrace: simplify flags handling in __blk_add_trace David Woodhouse
` (4 subsequent siblings)
13 siblings, 2 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 10:29 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Tidy up the bio_act[] array a little while we're at it. We should
probably be using ilog2() for this anyway. It can't be _that_ much of a
performance hit.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
block/blktrace.c | 11 ++++++++++-
include/linux/blktrace_api.h | 4 ++++
2 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/block/blktrace.c b/block/blktrace.c
index eb9651c..7495a84 100644
--- a/block/blktrace.c
+++ b/block/blktrace.c
@@ -114,7 +114,13 @@ static u32 ddir_act[2] __read_mostly = { BLK_TC_ACT(BLK_TC_READ), BLK_TC_ACT(BLK
/*
* Bio action bits of interest
*/
-static u32 bio_act[9] __read_mostly = { 0, BLK_TC_ACT(BLK_TC_BARRIER), BLK_TC_ACT(BLK_TC_SYNC), 0, BLK_TC_ACT(BLK_TC_AHEAD), 0, 0, 0, BLK_TC_ACT(BLK_TC_META) };
+static u32 bio_act[17] __read_mostly = {
+ [1] = BLK_TC_ACT(BLK_TC_BARRIER),
+ [2] = BLK_TC_ACT(BLK_TC_SYNC),
+ [4] = BLK_TC_ACT(BLK_TC_AHEAD),
+ [8] = BLK_TC_ACT(BLK_TC_META),
+ [16] = BLK_TC_ACT(BLK_TC_DISCARD)
+};
/*
* More could be added as needed, taking care to increment the decrementer
@@ -128,6 +134,8 @@ static u32 bio_act[9] __read_mostly = { 0, BLK_TC_ACT(BLK_TC_BARRIER), BLK_TC_AC
(((rw) & (1 << BIO_RW_AHEAD)) << (2 - BIO_RW_AHEAD))
#define trace_meta_bit(rw) \
(((rw) & (1 << BIO_RW_META)) >> (BIO_RW_META - 3))
+#define trace_discard_bit(rw) \
+ (((rw) & (1 << BIO_RW_DISCARD)) >> (BIO_RW_DISCARD - 4))
/*
* The worker for the various blk_add_trace*() types. Fills out a
@@ -151,6 +159,7 @@ void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
what |= bio_act[trace_sync_bit(rw)];
what |= bio_act[trace_ahead_bit(rw)];
what |= bio_act[trace_meta_bit(rw)];
+ what |= bio_act[trace_discard_bit(rw)];
pid = tsk->pid;
if (unlikely(act_log_check(bt, what, sector, pid)))
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index d084b8d..27da2cc 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -21,6 +21,7 @@ enum blktrace_cat {
BLK_TC_NOTIFY = 1 << 10, /* special message */
BLK_TC_AHEAD = 1 << 11, /* readahead */
BLK_TC_META = 1 << 12, /* metadata */
+ BLK_TC_DISCARD = 1 << 13, /* discard requests */
BLK_TC_END = 1 << 15, /* only 16-bits, reminder */
};
@@ -195,6 +196,9 @@ static inline void blk_add_trace_rq(struct request_queue *q, struct request *rq,
if (likely(!bt))
return;
+ if (blk_discard_rq(rq))
+ rw |= (1 << BIO_RW_DISCARD);
+
if (blk_pc_request(rq)) {
what |= BLK_TC_ACT(BLK_TC_PC);
__blk_add_trace(bt, 0, rq->data_len, rw, what, rq->errors, sizeof(rq->cmd), rq->cmd);
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* [USERSPACE PATCH] blktrace: support discard requests
2008-08-10 10:29 ` [PATCH 8/7] blktrace: support discard requests David Woodhouse
@ 2008-08-10 10:35 ` David Woodhouse
2008-08-15 8:43 ` Jens Axboe
2008-08-10 10:41 ` [PATCH 8/7] " David Woodhouse
1 sibling, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 10:35 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Add support for discard requests to blktrace userspace tools.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
diff --git a/act_mask.c b/act_mask.c
index 40a4050..6632320 100644
--- a/act_mask.c
+++ b/act_mask.c
@@ -25,6 +25,7 @@ static struct mask_map mask_maps[] = {
DECLARE_MASK_MAP(PC),
DECLARE_MASK_MAP(AHEAD),
DECLARE_MASK_MAP(META),
+ DECLARE_MASK_MAP(DISCARD),
};
int find_mask_map(char *string)
diff --git a/blkparse_fmt.c b/blkparse_fmt.c
index 364f27c..83a8504 100644
--- a/blkparse_fmt.c
+++ b/blkparse_fmt.c
@@ -57,9 +57,12 @@ static inline void fill_rwbs(char *rwbs, struct blk_io_trace *t)
int b = t->action & BLK_TC_ACT(BLK_TC_BARRIER);
int s = t->action & BLK_TC_ACT(BLK_TC_SYNC);
int m = t->action & BLK_TC_ACT(BLK_TC_META);
+ int d = t->action & BLK_TC_ACT(BLK_TC_DISCARD);
int i = 0;
- if (w)
+ if (d)
+ rwbs[i++] = 'D';
+ else if (w)
rwbs[i++] = 'W';
else if (t->bytes)
rwbs[i++] = 'R';
diff --git a/blkrawverify.c b/blkrawverify.c
index 82a435e..984dbff 100644
--- a/blkrawverify.c
+++ b/blkrawverify.c
@@ -49,6 +49,7 @@ static struct trace_info traces[] = {
TRACE_TO_STRING( BLK_TC_PC ),
TRACE_TO_STRING( BLK_TC_AHEAD ),
TRACE_TO_STRING( BLK_TC_META ),
+ TRACE_TO_STRING( BLK_TC_DISCARD ),
};
#define N_TRACES (sizeof(traces) / sizeof(struct trace_info))
diff --git a/blktrace_api.h b/blktrace_api.h
index 67720de..992850c 100644
--- a/blktrace_api.h
+++ b/blktrace_api.h
@@ -20,6 +20,7 @@ enum {
BLK_TC_NOTIFY = 1 << 10, /* special message */
BLK_TC_AHEAD = 1 << 11, /* readahead */
BLK_TC_META = 1 << 12, /* metadata */
+ BLK_TC_DISCARD = 1 << 13, /* discard requests */
BLK_TC_END = 1 << 15, /* only 16-bits, reminder */
};
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [USERSPACE PATCH] blktrace: support discard requests
2008-08-10 10:35 ` [USERSPACE PATCH] " David Woodhouse
@ 2008-08-15 8:43 ` Jens Axboe
2008-08-15 9:01 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-15 8:43 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Sun, Aug 10 2008, David Woodhouse wrote:
> Add support for discard requests to blktrace userspace tools.
Please remember to document the 'D' value for data direction, there's
plenty of documentation in blktrace/parse. Otherwise people have no idea
what it means.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [USERSPACE PATCH] blktrace: support discard requests
2008-08-15 8:43 ` Jens Axboe
@ 2008-08-15 9:01 ` David Woodhouse
2008-08-15 9:08 ` Jens Axboe
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-15 9:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Fri, 2008-08-15 at 10:43 +0200, Jens Axboe wrote:
> On Sun, Aug 10 2008, David Woodhouse wrote:
> > Add support for discard requests to blktrace userspace tools.
>
> Please remember to document the 'D' value for data direction, there's
> plenty of documentation in blktrace/parse. Otherwise people have no idea
> what it means.
Added to the documentation of the RWBS field in blktrace.tex and
blkparse.1.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
diff --git a/act_mask.c b/act_mask.c
index 40a4050..6632320 100644
--- a/act_mask.c
+++ b/act_mask.c
@@ -25,6 +25,7 @@ static struct mask_map mask_maps[] = {
DECLARE_MASK_MAP(PC),
DECLARE_MASK_MAP(AHEAD),
DECLARE_MASK_MAP(META),
+ DECLARE_MASK_MAP(DISCARD),
};
int find_mask_map(char *string)
diff --git a/blkparse_fmt.c b/blkparse_fmt.c
index 364f27c..83a8504 100644
--- a/blkparse_fmt.c
+++ b/blkparse_fmt.c
@@ -57,9 +57,12 @@ static inline void fill_rwbs(char *rwbs, struct blk_io_trace *t)
int b = t->action & BLK_TC_ACT(BLK_TC_BARRIER);
int s = t->action & BLK_TC_ACT(BLK_TC_SYNC);
int m = t->action & BLK_TC_ACT(BLK_TC_META);
+ int d = t->action & BLK_TC_ACT(BLK_TC_DISCARD);
int i = 0;
- if (w)
+ if (d)
+ rwbs[i++] = 'D';
+ else if (w)
rwbs[i++] = 'W';
else if (t->bytes)
rwbs[i++] = 'R';
diff --git a/blkrawverify.c b/blkrawverify.c
index 82a435e..984dbff 100644
--- a/blkrawverify.c
+++ b/blkrawverify.c
@@ -49,6 +49,7 @@ static struct trace_info traces[] = {
TRACE_TO_STRING( BLK_TC_PC ),
TRACE_TO_STRING( BLK_TC_AHEAD ),
TRACE_TO_STRING( BLK_TC_META ),
+ TRACE_TO_STRING( BLK_TC_DISCARD ),
};
#define N_TRACES (sizeof(traces) / sizeof(struct trace_info))
diff --git a/blktrace_api.h b/blktrace_api.h
index 67720de..992850c 100644
--- a/blktrace_api.h
+++ b/blktrace_api.h
@@ -20,6 +20,7 @@ enum {
BLK_TC_NOTIFY = 1 << 10, /* special message */
BLK_TC_AHEAD = 1 << 11, /* readahead */
BLK_TC_META = 1 << 12, /* metadata */
+ BLK_TC_DISCARD = 1 << 13, /* discard requests */
BLK_TC_END = 1 << 15, /* only 16-bits, reminder */
};
diff --git a/doc/blkparse.1 b/doc/blkparse.1
index 825ca36..e97581c 100644
--- a/doc/blkparse.1
+++ b/doc/blkparse.1
@@ -391,9 +391,9 @@ Split
.SH "RWBS DESCRIPTION"
-This is a small string containing at least one character ('R' for read, 'W' for
-write operation), and optionally either a 'B' (for barrier operations) or 'S' (for
-synchronous operations).
+This is a small string containing at least one character ('R' for read, 'W'
+for write, or 'D' for block discard operation), and optionally either
+a 'B' (for barrier operations) or 'S' (for synchronous operations).
.SH "DEFAULT OUTPUT"
diff --git a/doc/blktrace.tex b/doc/blktrace.tex
index 82c35e8..1e69240 100644
--- a/doc/blktrace.tex
+++ b/doc/blktrace.tex
@@ -645,8 +645,8 @@ X & Split \\ \hline
\subsubsection{\label{sec:act-table}RWBS Description}
This is a small string containing at least one character ('R' for read,
-'W' for write operation), and optionally either a 'B' (for barrier
-operations) or 'S' (for synchronous operations).
+'W' for write, or 'D' for block discard operation), and optionally either
+a 'B' (for barrier operations) or 'S' (for synchronous operations).
\subsubsection{\label{sec:default-output}Default output}
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [USERSPACE PATCH] blktrace: support discard requests
2008-08-15 9:01 ` David Woodhouse
@ 2008-08-15 9:08 ` Jens Axboe
0 siblings, 0 replies; 88+ messages in thread
From: Jens Axboe @ 2008-08-15 9:08 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Fri, Aug 15 2008, David Woodhouse wrote:
> On Fri, 2008-08-15 at 10:43 +0200, Jens Axboe wrote:
> > On Sun, Aug 10 2008, David Woodhouse wrote:
> > > Add support for discard requests to blktrace userspace tools.
> >
> > Please remember to document the 'D' value for data direction, there's
> > plenty of documentation in blktrace/parse. Otherwise people have no idea
> > what it means.
>
> Added to the documentation of the RWBS field in blktrace.tex and
> blkparse.1.
Thanks David, I already applied your previous patch, so I'll just dig
out the docu bits from this one.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 8/7] blktrace: support discard requests
2008-08-10 10:29 ` [PATCH 8/7] blktrace: support discard requests David Woodhouse
2008-08-10 10:35 ` [USERSPACE PATCH] " David Woodhouse
@ 2008-08-10 10:41 ` David Woodhouse
2008-08-13 11:17 ` Jens Axboe
1 sibling, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 10:41 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
While doing this, I note blktrace doesn't handle partial completion
particularly well. Since our FTL implementations currently don't care
about merged requests and just call end_request(), we see this (note the
sector counts in the completion events, which _should_ all be '+ 16')...
44,0 0 500 1010.573373912 2759 D D 1843 + 96 [ftld]
44,0 0 501 1010.581654276 2759 C D 1843 + 96 [0]
44,0 0 502 1010.589936163 2759 C D 1859 + 80 [0]
44,0 0 503 1010.598221556 2759 C D 1875 + 64 [0]
44,0 0 504 1010.606505959 2759 C D 1891 + 48 [0]
44,0 0 505 1010.614787227 2759 C D 1907 + 32 [0]
44,0 0 506 1010.623061067 2759 C D 1923 + 16 [0]
Do we want to pass nr_bytes into blk_add_trace_rq() from
__end_that_request_first()?
It's also a bit suboptimal that 'blktrace /dev/ftla -o- | blkparse -i-'
stalls when it hits the first message packet, and then starts storing
requests to be sorted, never printing anything more until you hit ^C.
I just removed the '&& bit->action != BLK_TN_MESSAGE' from line 2197 of
blkparse.c but that means the output isn't in the right order. Some kind
of partial sort might be nice, if we can do it -- wait for a pause or
some kind of marker in the stream, then sort and print what we have
batched, then continue waiting for new events...
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 8/7] blktrace: support discard requests
2008-08-10 10:41 ` [PATCH 8/7] " David Woodhouse
@ 2008-08-13 11:17 ` Jens Axboe
0 siblings, 0 replies; 88+ messages in thread
From: Jens Axboe @ 2008-08-13 11:17 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Sun, Aug 10 2008, David Woodhouse wrote:
> While doing this, I note blktrace doesn't handle partial completion
> particularly well. Since our FTL implementations currently don't care
> about merged requests and just call end_request(), we see this (note the
> sector counts in the completion events, which _should_ all be '+ 16')...
> 44,0 0 500 1010.573373912 2759 D D 1843 + 96 [ftld]
> 44,0 0 501 1010.581654276 2759 C D 1843 + 96 [0]
> 44,0 0 502 1010.589936163 2759 C D 1859 + 80 [0]
> 44,0 0 503 1010.598221556 2759 C D 1875 + 64 [0]
> 44,0 0 504 1010.606505959 2759 C D 1891 + 48 [0]
> 44,0 0 505 1010.614787227 2759 C D 1907 + 32 [0]
> 44,0 0 506 1010.623061067 2759 C D 1923 + 16 [0]
>
> Do we want to pass nr_bytes into blk_add_trace_rq() from
> __end_that_request_first()?
Yep, that does indeed look like a bug!
> It's also a bit suboptimal that 'blktrace /dev/ftla -o- | blkparse -i-'
> stalls when it hits the first message packet, and then starts storing
> requests to be sorted, never printing anything more until you hit ^C.
>
> I just removed the '&& bit->action != BLK_TN_MESSAGE' from line 2197 of
> blkparse.c but that means the output isn't in the right order. Some kind
> of partial sort might be nice, if we can do it -- wait for a pause or
> some kind of marker in the stream, then sort and print what we have
> batched, then continue waiting for new events...
I'll take patches, it's a pretty tricky area... Live tracing really
isn't that well supported, the only way to guarantee "perfect" traces is
to do post-processing.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* [PATCH 9/7] blktrace: simplify flags handling in __blk_add_trace
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (8 preceding siblings ...)
2008-08-10 10:29 ` [PATCH 8/7] blktrace: support discard requests David Woodhouse
@ 2008-08-10 11:48 ` David Woodhouse
2008-08-10 11:50 ` David Woodhouse
2008-08-11 15:11 ` [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors David Woodhouse
` (3 subsequent siblings)
13 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 11:48 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
Let the compiler see what's going on, and it can all get a lot simpler.
On PPC64 this reduces the size of the code calculating these bits by
about 60%. On x86_64 it's less of a win -- only 40%.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
block/blktrace.c | 38 ++++++++------------------------------
1 files changed, 8 insertions(+), 30 deletions(-)
diff --git a/block/blktrace.c b/block/blktrace.c
index 7495a84..9e0212c 100644
--- a/block/blktrace.c
+++ b/block/blktrace.c
@@ -111,31 +111,9 @@ static int act_log_check(struct blk_trace *bt, u32 what, sector_t sector,
*/
static u32 ddir_act[2] __read_mostly = { BLK_TC_ACT(BLK_TC_READ), BLK_TC_ACT(BLK_TC_WRITE) };
-/*
- * Bio action bits of interest
- */
-static u32 bio_act[17] __read_mostly = {
- [1] = BLK_TC_ACT(BLK_TC_BARRIER),
- [2] = BLK_TC_ACT(BLK_TC_SYNC),
- [4] = BLK_TC_ACT(BLK_TC_AHEAD),
- [8] = BLK_TC_ACT(BLK_TC_META),
- [16] = BLK_TC_ACT(BLK_TC_DISCARD)
-};
-
-/*
- * More could be added as needed, taking care to increment the decrementer
- * to get correct indexing
- */
-#define trace_barrier_bit(rw) \
- (((rw) & (1 << BIO_RW_BARRIER)) >> (BIO_RW_BARRIER - 0))
-#define trace_sync_bit(rw) \
- (((rw) & (1 << BIO_RW_SYNC)) >> (BIO_RW_SYNC - 1))
-#define trace_ahead_bit(rw) \
- (((rw) & (1 << BIO_RW_AHEAD)) << (2 - BIO_RW_AHEAD))
-#define trace_meta_bit(rw) \
- (((rw) & (1 << BIO_RW_META)) >> (BIO_RW_META - 3))
-#define trace_discard_bit(rw) \
- (((rw) & (1 << BIO_RW_DISCARD)) >> (BIO_RW_DISCARD - 4))
+/* The ilog2() calls fall out because they're constant */
+#define MASK_TC_BIT(rw, __name) ( (rw & (1 << BIO_RW_ ## __name)) << \
+ (ilog2(BLK_TC_ ## __name) + BLK_TC_SHIFT - BIO_RW_ ## __name) )
/*
* The worker for the various blk_add_trace*() types. Fills out a
@@ -155,11 +133,11 @@ void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
return;
what |= ddir_act[rw & WRITE];
- what |= bio_act[trace_barrier_bit(rw)];
- what |= bio_act[trace_sync_bit(rw)];
- what |= bio_act[trace_ahead_bit(rw)];
- what |= bio_act[trace_meta_bit(rw)];
- what |= bio_act[trace_discard_bit(rw)];
+ what |= MASK_TC_BIT(rw, BARRIER);
+ what |= MASK_TC_BIT(rw, SYNC);
+ what |= MASK_TC_BIT(rw, AHEAD);
+ what |= MASK_TC_BIT(rw, META);
+ what |= MASK_TC_BIT(rw, DISCARD);
pid = tsk->pid;
if (unlikely(act_log_check(bt, what, sector, pid)))
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 9/7] blktrace: simplify flags handling in __blk_add_trace
2008-08-10 11:48 ` [PATCH 9/7] blktrace: simplify flags handling in __blk_add_trace David Woodhouse
@ 2008-08-10 11:50 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-10 11:50 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Sun, 2008-08-10 at 12:48 +0100, David Woodhouse wrote:
> On PPC64 this reduces the size of the code calculating these bits by
> about 60%.
(gdb) disass make_what_old
Dump of assembler code for function make_what_old:
0x0000000000000000 <make_what_old+0>: std r30,-16(r1)
0x0000000000000004 <make_what_old+4>: ld r30,0(r2)
0x0000000000000008 <make_what_old+8>: rlwinm r9,r3,29,30,30
0x000000000000000c <make_what_old+12>: rlwinm r10,r3,1,29,29
0x0000000000000010 <make_what_old+16>: extsw r9,r9
0x0000000000000014 <make_what_old+20>: rlwinm r7,r3,0,29,29
0x0000000000000018 <make_what_old+24>: ld r8,-32768(r30)
0x000000000000001c <make_what_old+28>: rldicr r9,r9,2,61
0x0000000000000020 <make_what_old+32>: rlwinm r11,r3,30,28,28
0x0000000000000024 <make_what_old+36>: ld r30,-16(r1)
0x0000000000000028 <make_what_old+40>: extsw r10,r10
0x000000000000002c <make_what_old+44>: rlwinm r0,r3,30,27,27
0x0000000000000030 <make_what_old+48>: rldicr r10,r10,2,61
0x0000000000000034 <make_what_old+52>: extsw r11,r11
0x0000000000000038 <make_what_old+56>: lwzx r9,r8,r9
0x000000000000003c <make_what_old+60>: lwzx r3,r8,r7
0x0000000000000040 <make_what_old+64>: extsw r0,r0
0x0000000000000044 <make_what_old+68>: rldicr r11,r11,2,61
0x0000000000000048 <make_what_old+72>: or r3,r3,r9
0x000000000000004c <make_what_old+76>: rldicr r0,r0,2,61
0x0000000000000050 <make_what_old+80>: lwzx r9,r8,r10
0x0000000000000054 <make_what_old+84>: or r3,r3,r4
0x0000000000000058 <make_what_old+88>: lwzx r11,r8,r11
0x000000000000005c <make_what_old+92>: or r3,r3,r9
0x0000000000000060 <make_what_old+96>: lwzx r0,r8,r0
0x0000000000000064 <make_what_old+100>: or r3,r3,r11
0x0000000000000068 <make_what_old+104>: or r3,r3,r0
0x000000000000006c <make_what_old+108>: clrldi r3,r3,32
0x0000000000000070 <make_what_old+112>: blr
End of assembler dump.
(gdb) disass make_what_new
Dump of assembler code for function make_what_new:
0x0000000000000074 <make_what_new+0>: mr r0,r3
0x0000000000000078 <make_what_new+4>: rlwinm r3,r3,16,13,13
0x000000000000007c <make_what_new+8>: rlwinm r9,r0,15,12,12
0x0000000000000080 <make_what_new+12>: rlwinm r11,r0,26,4,4
0x0000000000000084 <make_what_new+16>: or r3,r3,r9
0x0000000000000088 <make_what_new+20>: rlwinm r9,r0,23,3,3
0x000000000000008c <make_what_new+24>: or r3,r3,r4
0x0000000000000090 <make_what_new+28>: rlwinm r0,r0,23,2,2
0x0000000000000094 <make_what_new+32>: or r3,r3,r11
0x0000000000000098 <make_what_new+36>: or r3,r3,r9
0x000000000000009c <make_what_new+40>: or r3,r3,r0
0x00000000000000a0 <make_what_new+44>: clrldi r3,r3,32
0x00000000000000a4 <make_what_new+48>: blr
End of assembler dump.
> On x86_64 it's less of a win -- only 40%.
(gdb) disass make_what_old
Dump of assembler code for function make_what_old:
0x0000000000000000 <make_what_old+0>: mov %edi,%edx
0x0000000000000002 <make_what_old+2>: mov %rdi,%rax
0x0000000000000005 <make_what_old+5>: push %rbp
0x0000000000000006 <make_what_old+6>: and $0x10,%edx
0x0000000000000009 <make_what_old+9>: and $0x4,%eax
0x000000000000000c <make_what_old+12>: sar $0x3,%edx
0x000000000000000f <make_what_old+15>: mov 0x0(%rax),%eax
0x0000000000000015 <make_what_old+21>: mov %rsp,%rbp
0x0000000000000018 <make_what_old+24>: movslq %edx,%rdx
0x000000000000001b <make_what_old+27>: or 0x0(,%rdx,4),%eax
0x0000000000000022 <make_what_old+34>: mov %edi,%edx
0x0000000000000024 <make_what_old+36>: and $0x2,%edx
0x0000000000000027 <make_what_old+39>: add %edx,%edx
0x0000000000000029 <make_what_old+41>: movslq %edx,%rdx
0x000000000000002c <make_what_old+44>: or %esi,%eax
0x000000000000002e <make_what_old+46>: or 0x0(,%rdx,4),%eax
0x0000000000000035 <make_what_old+53>: mov %edi,%edx
0x0000000000000037 <make_what_old+55>: and $0x20,%edx
0x000000000000003a <make_what_old+58>: and $0x40,%edi
0x000000000000003d <make_what_old+61>: sar $0x2,%edx
0x0000000000000040 <make_what_old+64>: sar $0x2,%edi
0x0000000000000043 <make_what_old+67>: movslq %edx,%rdx
0x0000000000000046 <make_what_old+70>: movslq %edi,%rdi
0x0000000000000049 <make_what_old+73>: or 0x0(,%rdx,4),%eax
0x0000000000000050 <make_what_old+80>: or 0x0(,%rdi,4),%eax
0x0000000000000057 <make_what_old+87>: leaveq
0x0000000000000058 <make_what_old+88>: retq
End of assembler dump.
(gdb) disass make_what_new
Dump of assembler code for function make_what_new:
0x0000000000000059 <make_what_new+0>: mov %edi,%eax
0x000000000000005b <make_what_new+2>: mov %edi,%edx
0x000000000000005d <make_what_new+4>: push %rbp
0x000000000000005e <make_what_new+5>: and $0x4,%eax
0x0000000000000061 <make_what_new+8>: and $0x10,%edx
0x0000000000000064 <make_what_new+11>: shl $0xf,%edx
0x0000000000000067 <make_what_new+14>: shl $0x10,%eax
0x000000000000006a <make_what_new+17>: mov %rsp,%rbp
0x000000000000006d <make_what_new+20>: or %edx,%eax
0x000000000000006f <make_what_new+22>: mov %edi,%edx
0x0000000000000071 <make_what_new+24>: and $0x2,%edx
0x0000000000000074 <make_what_new+27>: or %esi,%eax
0x0000000000000076 <make_what_new+29>: shl $0x1a,%edx
0x0000000000000079 <make_what_new+32>: leaveq
0x000000000000007a <make_what_new+33>: or %edx,%eax
0x000000000000007c <make_what_new+35>: mov %edi,%edx
0x000000000000007e <make_what_new+37>: and $0x40,%edi
0x0000000000000081 <make_what_new+40>: and $0x20,%edx
0x0000000000000084 <make_what_new+43>: shl $0x17,%edi
0x0000000000000087 <make_what_new+46>: shl $0x17,%edx
0x000000000000008a <make_what_new+49>: or %edx,%eax
0x000000000000008c <make_what_new+51>: or %edi,%eax
0x000000000000008e <make_what_new+53>: retq
End of assembler dump.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (9 preceding siblings ...)
2008-08-10 11:48 ` [PATCH 9/7] blktrace: simplify flags handling in __blk_add_trace David Woodhouse
@ 2008-08-11 15:11 ` David Woodhouse
2008-08-11 18:27 ` Matthew Wilcox
2008-08-11 20:52 ` David Woodhouse
2008-08-12 9:14 ` [PATCH 0/7] Discard requests, v2 Jens Axboe
` (2 subsequent siblings)
13 siblings, 2 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-11 15:11 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
We may well want mkfs tools to use this to mark the whole device as
unwanted before they format it, for example.
The ioctl takes a pair of uint64_ts, which are start offset and length
in _bytes_. Although at the moment it might make sense for them both to
be in 512-byte sectors, I don't want to limit the ABI to that.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
block/compat_ioctl.c | 1 +
block/ioctl.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 1 +
3 files changed, 76 insertions(+), 0 deletions(-)
diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index c23177e..1e559fb 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -788,6 +788,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
return compat_hdio_getgeo(disk, bdev, compat_ptr(arg));
case BLKFLSBUF:
case BLKROSET:
+ case BLKDISCARD:
/*
* the ones below are implemented in blkdev_locked_ioctl,
* but we call blkdev_ioctl, which gets the lock for us
diff --git a/block/ioctl.c b/block/ioctl.c
index 77185e5..8193fbb 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -111,6 +111,69 @@ static int blkdev_reread_part(struct block_device *bdev)
return res;
}
+static void blk_ioc_discard_endio(struct bio *bio, int err)
+{
+ if (err) {
+ if (err == -EOPNOTSUPP)
+ set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
+ clear_bit(BIO_UPTODATE, &bio->bi_flags);
+ }
+ complete(bio->bi_private);
+}
+
+static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
+ uint64_t len)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+ int ret = 0;
+
+ if (start & 511)
+ return -EINVAL;
+ if (len & 511)
+ return -EINVAL;
+ start >>= 9;
+ len >>= 9;
+
+ if (start + len > (bdev->bd_inode->i_size >> 9))
+ return -EINVAL;
+
+ if (!q->prepare_discard_fn)
+ return -EOPNOTSUPP;
+
+ while (len && !ret) {
+ DECLARE_COMPLETION_ONSTACK(wait);
+ struct bio *bio;
+
+ bio = bio_alloc(GFP_KERNEL, 0);
+ if (!bio)
+ return -ENOMEM;
+
+ bio->bi_end_io = blk_ioc_discard_endio;
+ bio->bi_bdev = bdev;
+ bio->bi_private = &wait;
+ bio->bi_sector = start;
+
+ if (len > q->max_hw_sectors) {
+ bio->bi_size = q->max_hw_sectors << 9;
+ len -= q->max_hw_sectors;
+ start += q->max_hw_sectors;
+ } else {
+ bio->bi_size = len << 9;
+ len = 0;
+ }
+ submit_bio(WRITE_DISCARD, bio);
+
+ wait_for_completion(&wait);
+
+ if (bio_flagged(bio, BIO_EOPNOTSUPP))
+ ret = -EOPNOTSUPP;
+ else if (!bio_flagged(bio, BIO_UPTODATE))
+ ret = -EIO;
+ bio_put(bio);
+ }
+ return ret;
+}
+
static int put_ushort(unsigned long arg, unsigned short val)
{
return put_user(val, (unsigned short __user *)arg);
@@ -258,6 +321,17 @@ int blkdev_ioctl(struct inode *inode, struct file *file, unsigned cmd,
set_device_ro(bdev, n);
unlock_kernel();
return 0;
+
+ case BLKDISCARD: {
+ uint64_t start, len;
+
+ if (get_user(start, (uint64_t __user *)(arg)) ||
+ get_user(len, (uint64_t __user *)(arg+8)))
+ return -EFAULT;
+
+ return blk_ioctl_discard(bdev, start, len);
+ }
+
case HDIO_GETGEO: {
struct hd_geometry geo;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eb01313..88358ca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -223,6 +223,7 @@ extern int dir_notify_enable;
#define BLKTRACESTART _IO(0x12,116)
#define BLKTRACESTOP _IO(0x12,117)
#define BLKTRACETEARDOWN _IO(0x12,118)
+#define BLKDISCARD _IO(0x12,119)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors
2008-08-11 15:11 ` [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors David Woodhouse
@ 2008-08-11 18:27 ` Matthew Wilcox
2008-08-11 20:52 ` David Woodhouse
1 sibling, 0 replies; 88+ messages in thread
From: Matthew Wilcox @ 2008-08-11 18:27 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad
On Mon, Aug 11, 2008 at 04:11:12PM +0100, David Woodhouse wrote:
> We may well want mkfs tools to use this to mark the whole device as
> unwanted before they format it, for example.
>
> The ioctl takes a pair of uint64_ts, which are start offset and length
> in _bytes_. Although at the moment it might make sense for them both to
> be in 512-byte sectors, I don't want to limit the ABI to that.
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=man&fname=/usr/share/catman/p_man/cat2/standard/fcntl.z
Their F_FREESP does pretty much exactly this.
I also like the idea of extending MADV_REMOVE to work on devices which
support PUNCH/TRIM/DISCARD, and adding a LINUX_FADV_REMOVE to fadvise.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors
2008-08-11 15:11 ` [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors David Woodhouse
2008-08-11 18:27 ` Matthew Wilcox
@ 2008-08-11 20:52 ` David Woodhouse
1 sibling, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-11 20:52 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Mon, 2008-08-11 at 16:11 +0100, David Woodhouse wrote:
> + if (get_user(start, (uint64_t __user *)(arg)) ||
> + get_user(len, (uint64_t __user *)(arg+8)))
Hm, I don't think that works on any 32-bit platform except PowerPC.
Although the patch at http://david.woodhou.se/i386-getuser8.patch might
fix it for i386, if someone with access to a suitable box wants to test
it for me.
Still, this is probably a better plan...
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -323,13 +323,12 @@ int blkdev_ioctl(struct inode *inode, struct file
*file, u
return 0;
case BLKDISCARD: {
- uint64_t start, len;
+ uint64_t range[2];
- if (get_user(start, (uint64_t __user *)(arg)) ||
- get_user(len, (uint64_t __user *)(arg+8)))
+ if (copy_from_user(range, arg, sizeof(range)))
return -EFAULT;
- return blk_ioctl_discard(bdev, start, len);
+ return blk_ioctl_discard(bdev, range[0], range[1]);
}
case HDIO_GETGEO: {
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (10 preceding siblings ...)
2008-08-11 15:11 ` [PATCH 10/7] [BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors David Woodhouse
@ 2008-08-12 9:14 ` Jens Axboe
2008-08-12 10:00 ` David Woodhouse
` (2 more replies)
2008-08-13 11:39 ` [PATCH 11/7] Kill REQ_TYPE_FLUSH David Woodhouse
2008-08-16 17:08 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) Pierre Ossman
13 siblings, 3 replies; 88+ messages in thread
From: Jens Axboe @ 2008-08-12 9:14 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Sat, Aug 09 2008, David Woodhouse wrote:
> This time on top of the for-2.6.28 branch of
> git://git.kernel.dk/linux-2.6-block.git
>
> I've made it cope with merging and sorting discard requests, still as a
> separate patch at the end of the sequence. I don't think we have a
> problem with discards passing writes in the queue, any more than we
> _already_ had a problem with writes passing writes.
But we don't already have this problem, that is the point. For page
cache writes, the page cache nicely solves this issue for us - a write
that comes in later gets to wait on the page lock before proceeding. So
at least it's ordered. For O_DIRECT, the issuer is on his own.
I think this is a serious problem and that we must ensure that an
overlapping write doesn't pass a previously issued discard request. So
in that sense, discards must be considered soft barriers.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 9:14 ` [PATCH 0/7] Discard requests, v2 Jens Axboe
@ 2008-08-12 10:00 ` David Woodhouse
2008-08-12 10:54 ` Jens Axboe
2008-08-12 18:10 ` Jamie Lokier
2008-08-12 11:42 ` Matthew Wilcox
2008-08-12 19:53 ` OGAWA Hirofumi
2 siblings, 2 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-12 10:00 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, 2008-08-12 at 11:14 +0200, Jens Axboe wrote:
> On Sat, Aug 09 2008, David Woodhouse wrote:
> > This time on top of the for-2.6.28 branch of
> > git://git.kernel.dk/linux-2.6-block.git
> >
> > I've made it cope with merging and sorting discard requests, still as a
> > separate patch at the end of the sequence. I don't think we have a
> > problem with discards passing writes in the queue, any more than we
> > _already_ had a problem with writes passing writes.
>
> But we don't already have this problem, that is the point. For page
> cache writes, the page cache nicely solves this issue for us - a write
> that comes in later gets to wait on the page lock before proceeding. So
> at least it's ordered. For O_DIRECT, the issuer is on his own.
>
> I think this is a serious problem and that we must ensure that an
> overlapping write doesn't pass a previously issued discard request. So
> in that sense, discards must be considered soft barriers.
OK, that makes sense. Enabling merges is now the last commit in
git.infradead.org/users/dwmw2/discard-2.6.git so we can easily drop it
for now. Now that I've updated the FAT code to merge discard requests
for contiguous chains, it's not a massive issue.
I'd still quite like to make it work though -- file systems that care
can always use explicit barriers to ensure that the DISCARD requests are
handled before any subsequent reallocation and write to the same
sectors.
If we want to allow that option for those who submit bios directly, is
it sufficient to set BIO_RW_SYNC to the flags in blkdev_issue_discard,
or do we really need REQ_SOFTBARRIER? There doesn't seem to be a way to
indicate that in ->bi_rw.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 10:00 ` David Woodhouse
@ 2008-08-12 10:54 ` Jens Axboe
2008-08-12 11:16 ` David Woodhouse
2008-08-12 18:10 ` Jamie Lokier
1 sibling, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-12 10:54 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, Aug 12 2008, David Woodhouse wrote:
> On Tue, 2008-08-12 at 11:14 +0200, Jens Axboe wrote:
> > On Sat, Aug 09 2008, David Woodhouse wrote:
> > > This time on top of the for-2.6.28 branch of
> > > git://git.kernel.dk/linux-2.6-block.git
> > >
> > > I've made it cope with merging and sorting discard requests, still as a
> > > separate patch at the end of the sequence. I don't think we have a
> > > problem with discards passing writes in the queue, any more than we
> > > _already_ had a problem with writes passing writes.
> >
> > But we don't already have this problem, that is the point. For page
> > cache writes, the page cache nicely solves this issue for us - a write
> > that comes in later gets to wait on the page lock before proceeding. So
> > at least it's ordered. For O_DIRECT, the issuer is on his own.
> >
> > I think this is a serious problem and that we must ensure that an
> > overlapping write doesn't pass a previously issued discard request. So
> > in that sense, discards must be considered soft barriers.
>
> OK, that makes sense. Enabling merges is now the last commit in
> git.infradead.org/users/dwmw2/discard-2.6.git so we can easily drop it
> for now. Now that I've updated the FAT code to merge discard requests
> for contiguous chains, it's not a massive issue.
You can leave the merging, if we make it a barrier then that'll prevent
it from being merged anyway.
I'm fine with providing both a BIO_RW_DISCARD and
BIO_RW_DISCARD_NOBARRIER (or something like that) so that we at least
now the default is safe. If the file systems takes care of the ordering,
then it can use the less restricted BIO_RW_DISCARD_NOBARRIER instead.
> I'd still quite like to make it work though -- file systems that care
> can always use explicit barriers to ensure that the DISCARD requests are
> handled before any subsequent reallocation and write to the same
> sectors.
Agree, we should cater to those as well.
> If we want to allow that option for those who submit bios directly, is
> it sufficient to set BIO_RW_SYNC to the flags in blkdev_issue_discard,
> or do we really need REQ_SOFTBARRIER? There doesn't seem to be a way to
> indicate that in ->bi_rw.
BIO_RW_SYNC wont help you much, it'll just ensure that it wont go
through a plug cycle. We've never had bio softbarriers before, hence
there's just the one flag for barriers. The easy fix is would just be to
change init_request_from_bio() to do something ala:
if (bio_barrier(bio)) {
if (bio_has_data(bio))
req->cmd_flags |= REQ_HARDBARRIER;
else
req->cmd_flags |= REQ_SOFTBARRIER;
}
since there's no real data dependency when you don't have data attached.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 10:54 ` Jens Axboe
@ 2008-08-12 11:16 ` David Woodhouse
2008-08-12 12:19 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-12 11:16 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, 2008-08-12 at 12:54 +0200, Jens Axboe wrote:
> The easy fix is would just be to
> change init_request_from_bio() to do something ala:
>
> if (bio_barrier(bio)) {
> if (bio_has_data(bio))
> req->cmd_flags |= REQ_HARDBARRIER;
> else
> req->cmd_flags |= REQ_SOFTBARRIER;
> }
>
> since there's no real data dependency when you don't have data
> attached.
There's a check in __make_request() which will return -EOPNOTSUPP for
bios with BIO_RW_BARRIER set, in some circumstances. Is that OK too?
I was thinking of using the READ/WRITE bit for it -- just
(1<<BIO_RW_DISCARD) (i.e. read) would imply a barrier, while
having the WRITE bit also set would mean no barrier, and it gets
scheduled like a normal write.
Something like...
diff --git a/block/blk-barrier.c b/block/blk-barrier.c
index e544813..988b634 100644
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -372,7 +372,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
nr_sects = 0;
}
bio_get(bio);
- submit_bio(WRITE_DISCARD, bio);
+ submit_bio(DISCARD_BARRIER, bio);
/* Check if it failed immediately */
if (bio_flagged(bio, BIO_EOPNOTSUPP))
diff --git a/block/blk-core.c b/block/blk-core.c
index 0c8ed97..be2fe52 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1079,6 +1079,13 @@ void init_request_from_bio(struct request *req, struct bio *bio)
req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
if (unlikely(bio_discard(bio))) {
req->cmd_flags |= REQ_DISCARD;
+ /*
+ * READ discard requests are soft barriers; WRITE
+ * indicates the file system will take care of that
+ * for itself.
+ */
+ if (!bio_data_dir(bio))
+ req->cmd_flags |= REQ_SOFTBARRIER;
req->q->prepare_discard_fn(req->q, req);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 88358ca..5cfacb3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,7 +87,8 @@ extern int dir_notify_enable;
#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC))
#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC))
#define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
-#define WRITE_DISCARD (WRITE | (1 << BIO_RW_DISCARD))
+#define DISCARD_NOBARRIER (WRITE | (1 << BIO_RW_DISCARD))
+#define DISCARD_BARRIER (1 << BIO_RW_DISCARD)
#define SEL_IN 1
#define SEL_OUT 2
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 11:16 ` David Woodhouse
@ 2008-08-12 12:19 ` David Woodhouse
2008-08-12 12:53 ` Jens Axboe
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-12 12:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, 2008-08-12 at 12:16 +0100, David Woodhouse wrote:
> There's a check in __make_request() which will return -EOPNOTSUPP for
> bios with BIO_RW_BARRIER set, in some circumstances. Is that OK too?
A: No, it breaks submission of DISCARD|BARRIER requests. And I have to
change blk_empty_barrier() to explicitly omit discards too, to avoid the
check at line 1419 (in __generic_make_request()) returning -EOPNOTSUPP
before we even get there.
Using BIO_RW_BARRIER seems conceptually cleaner, but in practice it's
messier. I think this version will work, but I'm inclined to prefer the
previous one I posted, using the R/W bit.
diff --git a/block/blk-barrier.c b/block/blk-barrier.c
index e544813..988b634 100644
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -372,7 +372,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
nr_sects = 0;
}
bio_get(bio);
- submit_bio(WRITE_DISCARD, bio);
+ submit_bio(DISCARD_BARRIER, bio);
/* Check if it failed immediately */
if (bio_flagged(bio, BIO_EOPNOTSUPP))
diff --git a/block/blk-core.c b/block/blk-core.c
index 0c8ed97..e7f2419 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1075,12 +1075,13 @@ void init_request_from_bio(struct request *req, struct bio *bio)
/*
* REQ_BARRIER implies no merging, but lets make it explicit
*/
- if (unlikely(bio_barrier(bio)))
- req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
if (unlikely(bio_discard(bio))) {
req->cmd_flags |= REQ_DISCARD;
+ if (bio_barrier(bio))
+ req->cmd_flags |= REQ_SOFTBARRIER;
req->q->prepare_discard_fn(req->q, req);
- }
+ } else if (unlikely(bio_barrier(bio)))
+ req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
if (bio_sync(bio))
req->cmd_flags |= REQ_RW_SYNC;
@@ -1112,12 +1113,13 @@ static int __make_request(struct request_queue *q, struct bio *bio)
blk_queue_bounce(q, &bio);
barrier = bio_barrier(bio);
- if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
+ discard = bio_discard(bio);
+ if (unlikely(barrier) && !discard &&
+ (q->next_ordered == QUEUE_ORDERED_NONE)) {
err = -EOPNOTSUPP;
goto end_io;
}
- discard = bio_discard(bio);
if (unlikely(discard) && !q->prepare_discard_fn) {
err = -EOPNOTSUPP;
goto end_io;
diff --git a/block/ioctl.c b/block/ioctl.c
index 890f003..6f34c6c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -161,7 +161,7 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
bio->bi_size = len << 9;
len = 0;
}
- submit_bio(WRITE_DISCARD, bio);
+ submit_bio(DISCARD_NOBARRIER, bio);
wait_for_completion(&wait);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1fdfc56..33c3947 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -188,8 +188,8 @@ struct bio {
#define bio_failfast(bio) ((bio)->bi_rw & (1 << BIO_RW_FAILFAST))
#define bio_rw_ahead(bio) ((bio)->bi_rw & (1 << BIO_RW_AHEAD))
#define bio_rw_meta(bio) ((bio)->bi_rw & (1 << BIO_RW_META))
-#define bio_empty_barrier(bio) (bio_barrier(bio) && !bio_has_data(bio))
#define bio_discard(bio) ((bio)->bi_rw & (1 << BIO_RW_DISCARD))
+#define bio_empty_barrier(bio) (bio_barrier(bio) && !bio_has_data(bio) && !bio_discard(bio))
static inline unsigned int bio_cur_sectors(struct bio *bio)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 88358ca..860689f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,7 +87,8 @@ extern int dir_notify_enable;
#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC))
#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC))
#define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
-#define WRITE_DISCARD (WRITE | (1 << BIO_RW_DISCARD))
+#define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
+#define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))
#define SEL_IN 1
#define SEL_OUT 2
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 12:19 ` David Woodhouse
@ 2008-08-12 12:53 ` Jens Axboe
2008-08-12 13:04 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-12 12:53 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, Aug 12 2008, David Woodhouse wrote:
> On Tue, 2008-08-12 at 12:16 +0100, David Woodhouse wrote:
> > There's a check in __make_request() which will return -EOPNOTSUPP for
> > bios with BIO_RW_BARRIER set, in some circumstances. Is that OK too?
>
> A: No, it breaks submission of DISCARD|BARRIER requests. And I have to
> change blk_empty_barrier() to explicitly omit discards too, to avoid the
> check at line 1419 (in __generic_make_request()) returning -EOPNOTSUPP
> before we even get there.
>
> Using BIO_RW_BARRIER seems conceptually cleaner, but in practice it's
> messier. I think this version will work, but I'm inclined to prefer the
> previous one I posted, using the R/W bit.
Or just match the check before -EOPNOTSUPP with bio_has_data(), since it
only applies to a barrier that carries data.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 12:53 ` Jens Axboe
@ 2008-08-12 13:04 ` David Woodhouse
2008-08-12 15:47 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-12 13:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, 2008-08-12 at 14:53 +0200, Jens Axboe wrote:
> On Tue, Aug 12 2008, David Woodhouse wrote:
> > On Tue, 2008-08-12 at 12:16 +0100, David Woodhouse wrote:
> > > There's a check in __make_request() which will return -EOPNOTSUPP for
> > > bios with BIO_RW_BARRIER set, in some circumstances. Is that OK too?
> >
> > A: No, it breaks submission of DISCARD|BARRIER requests. And I have to
> > change blk_empty_barrier() to explicitly omit discards too, to avoid the
> > check at line 1419 (in __generic_make_request()) returning -EOPNOTSUPP
> > before we even get there.
> >
> > Using BIO_RW_BARRIER seems conceptually cleaner, but in practice it's
> > messier. I think this version will work, but I'm inclined to prefer the
> > previous one I posted, using the R/W bit.
>
> Or just match the check before -EOPNOTSUPP with bio_has_data(), since it
> only applies to a barrier that carries data.
Like this, you mean? Empty barriers don't get to there?
I still prefer the R/W version, but I'll commit it this way if you
want...
diff --git a/block/blk-core.c b/block/blk-core.c
index be2fe52..1b38994 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1075,19 +1075,13 @@ void init_request_from_bio(struct request *req, struct bio *bio)
/*
* REQ_BARRIER implies no merging, but lets make it explicit
*/
- if (unlikely(bio_barrier(bio)))
- req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
if (unlikely(bio_discard(bio))) {
req->cmd_flags |= REQ_DISCARD;
- /*
- * READ discard requests are soft barriers; WRITE
- * indicates the file system will take care of that
- * for itself.
- */
- if (!bio_data_dir(bio))
+ if (bio_barrier(bio))
req->cmd_flags |= REQ_SOFTBARRIER;
req->q->prepare_discard_fn(req->q, req);
- }
+ } else if (unlikely(bio_barrier(bio)))
+ req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
if (bio_sync(bio))
req->cmd_flags |= REQ_RW_SYNC;
@@ -1119,7 +1113,8 @@ static int __make_request(struct request_queue *q, struct bio *bio)
blk_queue_bounce(q, &bio);
barrier = bio_barrier(bio);
- if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
+ if (unlikely(barrier) && bio_has_data(bio) &&
+ (q->next_ordered == QUEUE_ORDERED_NONE)) {
err = -EOPNOTSUPP;
goto end_io;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1fdfc56..33c3947 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -188,8 +188,8 @@ struct bio {
#define bio_failfast(bio) ((bio)->bi_rw & (1 << BIO_RW_FAILFAST))
#define bio_rw_ahead(bio) ((bio)->bi_rw & (1 << BIO_RW_AHEAD))
#define bio_rw_meta(bio) ((bio)->bi_rw & (1 << BIO_RW_META))
-#define bio_empty_barrier(bio) (bio_barrier(bio) && !bio_has_data(bio))
#define bio_discard(bio) ((bio)->bi_rw & (1 << BIO_RW_DISCARD))
+#define bio_empty_barrier(bio) (bio_barrier(bio) && !bio_has_data(bio) && !bio_discard(bio))
static inline unsigned int bio_cur_sectors(struct bio *bio)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5cfacb3..860689f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,8 +87,8 @@ extern int dir_notify_enable;
#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNC))
#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNC))
#define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
-#define DISCARD_NOBARRIER (WRITE | (1 << BIO_RW_DISCARD))
-#define DISCARD_BARRIER (1 << BIO_RW_DISCARD)
+#define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
+#define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))
#define SEL_IN 1
#define SEL_OUT 2
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 13:04 ` David Woodhouse
@ 2008-08-12 15:47 ` David Woodhouse
2008-08-12 18:04 ` Jamie Lokier
2008-08-13 11:15 ` Jens Axboe
0 siblings, 2 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-12 15:47 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, 2008-08-12 at 14:04 +0100, David Woodhouse wrote:
> > Or just match the check before -EOPNOTSUPP with bio_has_data(),
> > since it only applies to a barrier that carries data.
>
> Like this, you mean? Empty barriers don't get to there?
Seems to work, although I'm somewhat dubious about it. Still, if you're
happy that it's correct and you really prefer it that way, then I can
commit it.
Anything else I need to address before you pull the tree? Any comment on
the BLKDISCARD ioctl? I've left that one using the non-barrier version
since it's waiting for it anyway, and shouldn't be happening
concurrently with anything else.
(http://,git://} git.infradead.org/users/dwmw2/discard-2.6.git
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 15:47 ` David Woodhouse
@ 2008-08-12 18:04 ` Jamie Lokier
2008-08-13 10:22 ` David Woodhouse
2008-08-13 11:15 ` Jens Axboe
1 sibling, 1 reply; 88+ messages in thread
From: Jamie Lokier @ 2008-08-12 18:04 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
David Woodhouse wrote:
> Anything else I need to address before you pull the tree? Any comment on
> the BLKDISCARD ioctl? I've left that one using the non-barrier version
> since it's waiting for it anyway, and shouldn't be happening
> concurrently with anything else.
What makes you think a userspace filesystem or database wouldn't call
BLKDISCARD concurrently with anything else?
-- Jamie
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 18:04 ` Jamie Lokier
@ 2008-08-13 10:22 ` David Woodhouse
2008-08-13 12:19 ` Jamie Lokier
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 10:22 UTC (permalink / raw)
To: Jamie Lokier
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Tue, 2008-08-12 at 19:04 +0100, Jamie Lokier wrote:
> David Woodhouse wrote:
> > Anything else I need to address before you pull the tree? Any comment on
> > the BLKDISCARD ioctl? I've left that one using the non-barrier version
> > since it's waiting for it anyway, and shouldn't be happening
> > concurrently with anything else.
>
> What makes you think a userspace filesystem or database wouldn't call
> BLKDISCARD concurrently with anything else?
The BLKDISCARD ioctl is synchronous -- I haven't done an aio version of
it. Issuing a write to the same sectors before it finishes would be kind
of silly.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 10:22 ` David Woodhouse
@ 2008-08-13 12:19 ` Jamie Lokier
2008-08-13 12:26 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: Jamie Lokier @ 2008-08-13 12:19 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
David Woodhouse wrote:
> On Tue, 2008-08-12 at 19:04 +0100, Jamie Lokier wrote:
> > David Woodhouse wrote:
> > > Anything else I need to address before you pull the tree? Any comment on
> > > the BLKDISCARD ioctl? I've left that one using the non-barrier version
> > > since it's waiting for it anyway, and shouldn't be happening
> > > concurrently with anything else.
> >
> > What makes you think a userspace filesystem or database wouldn't call
> > BLKDISCARD concurrently with anything else?
>
> The BLKDISCARD ioctl is synchronous -- I haven't done an aio version of
> it. Issuing a write to the same sectors before it finishes would be kind
> of silly.
You're right, as long as it's not aio, there's no way to sensibly
order it with concurrent writes; I take it back.
(Aside: You would issue writes to *different* sectors before
BLKDISCARD, and they must be done in that order. That is how you
implement journalling.)
-- Jamie
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 12:19 ` Jamie Lokier
@ 2008-08-13 12:26 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 12:26 UTC (permalink / raw)
To: Jamie Lokier
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Wed, 2008-08-13 at 13:19 +0100, Jamie Lokier wrote:
> You're right, as long as it's not aio, there's no way to sensibly
> order it with concurrent writes; I take it back.
>
> (Aside: You would issue writes to *different* sectors before
> BLKDISCARD, and they must be done in that order. That is how you
> implement journalling.)
Yes, but you implement flushes or barriers to handle that case _anyway_.
Normal writes can cross each other too, when they're on different pages.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 15:47 ` David Woodhouse
2008-08-12 18:04 ` Jamie Lokier
@ 2008-08-13 11:15 ` Jens Axboe
2008-08-13 11:23 ` David Woodhouse
1 sibling, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-13 11:15 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Tue, Aug 12 2008, David Woodhouse wrote:
> On Tue, 2008-08-12 at 14:04 +0100, David Woodhouse wrote:
> > > Or just match the check before -EOPNOTSUPP with bio_has_data(),
> > > since it only applies to a barrier that carries data.
> >
> > Like this, you mean? Empty barriers don't get to there?
>
> Seems to work, although I'm somewhat dubious about it. Still, if you're
> happy that it's correct and you really prefer it that way, then I can
> commit it.
Looks ok to me.
> Anything else I need to address before you pull the tree? Any comment on
> the BLKDISCARD ioctl? I've left that one using the non-barrier version
> since it's waiting for it anyway, and shouldn't be happening
> concurrently with anything else.
>
> (http://,git://} git.infradead.org/users/dwmw2/discard-2.6.git
I'm with Jamie on using the safer version for the ioctl, unless you
ensure that the block device isn't mounted before allowing it.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 11:15 ` Jens Axboe
@ 2008-08-13 11:23 ` David Woodhouse
2008-08-13 11:32 ` Jens Axboe
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 11:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Wed, 2008-08-13 at 13:15 +0200, Jens Axboe wrote:
> On Tue, Aug 12 2008, David Woodhouse wrote:
> > On Tue, 2008-08-12 at 14:04 +0100, David Woodhouse wrote:
> > > > Or just match the check before -EOPNOTSUPP with bio_has_data(),
> > > > since it only applies to a barrier that carries data.
> > >
> > > Like this, you mean? Empty barriers don't get to there?
> >
> > Seems to work, although I'm somewhat dubious about it. Still, if you're
> > happy that it's correct and you really prefer it that way, then I can
> > commit it.
>
> Looks ok to me.
OK, that version is now in the git tree.
> > Anything else I need to address before you pull the tree? Any comment on
> > the BLKDISCARD ioctl? I've left that one using the non-barrier version
> > since it's waiting for it anyway, and shouldn't be happening
> > concurrently with anything else.
> >
> > (http://,git://} git.infradead.org/users/dwmw2/discard-2.6.git
>
> I'm with Jamie on using the safer version for the ioctl, unless you
> ensure that the block device isn't mounted before allowing it.
We don't ensure that the block device isn't mounted before we allow
reads/writes -- why should we do so before we allow discard?
If the userspace tool 'owns' the block device, that's a different story
-- and that's OK too, because the BLKDISCARD ioctl is synchronous. It
won't return until it's actually _complete_, and userspace really
shouldn't be trying to write to the same sectors until that happens.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 11:23 ` David Woodhouse
@ 2008-08-13 11:32 ` Jens Axboe
2008-08-13 11:34 ` David Woodhouse
2008-08-14 7:25 ` David Woodhouse
0 siblings, 2 replies; 88+ messages in thread
From: Jens Axboe @ 2008-08-13 11:32 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Wed, Aug 13 2008, David Woodhouse wrote:
> On Wed, 2008-08-13 at 13:15 +0200, Jens Axboe wrote:
> > On Tue, Aug 12 2008, David Woodhouse wrote:
> > > On Tue, 2008-08-12 at 14:04 +0100, David Woodhouse wrote:
> > > > > Or just match the check before -EOPNOTSUPP with bio_has_data(),
> > > > > since it only applies to a barrier that carries data.
> > > >
> > > > Like this, you mean? Empty barriers don't get to there?
> > >
> > > Seems to work, although I'm somewhat dubious about it. Still, if you're
> > > happy that it's correct and you really prefer it that way, then I can
> > > commit it.
> >
> > Looks ok to me.
>
> OK, that version is now in the git tree.
Alright, I'll pull it it.
> > > Anything else I need to address before you pull the tree? Any comment on
> > > the BLKDISCARD ioctl? I've left that one using the non-barrier version
> > > since it's waiting for it anyway, and shouldn't be happening
> > > concurrently with anything else.
> > >
> > > (http://,git://} git.infradead.org/users/dwmw2/discard-2.6.git
> >
> > I'm with Jamie on using the safer version for the ioctl, unless you
> > ensure that the block device isn't mounted before allowing it.
>
> We don't ensure that the block device isn't mounted before we allow
> reads/writes -- why should we do so before we allow discard?
>
> If the userspace tool 'owns' the block device, that's a different story
> -- and that's OK too, because the BLKDISCARD ioctl is synchronous. It
> won't return until it's actually _complete_, and userspace really
> shouldn't be trying to write to the same sectors until that happens.
Still seems a little unsafe. I guess you could make a case for making
the ioctl privileged. We should at least ensure that the user has write
access to the device before allowing a discard operation.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 11:32 ` Jens Axboe
@ 2008-08-13 11:34 ` David Woodhouse
2008-08-13 12:07 ` David Woodhouse
2008-08-14 7:49 ` Jens Axboe
2008-08-14 7:25 ` David Woodhouse
1 sibling, 2 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 11:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Wed, 2008-08-13 at 13:32 +0200, Jens Axboe wrote:
> On Wed, Aug 13 2008, David Woodhouse wrote:
> > On Wed, 2008-08-13 at 13:15 +0200, Jens Axboe wrote:
> > > On Tue, Aug 12 2008, David Woodhouse wrote:
> > > > On Tue, 2008-08-12 at 14:04 +0100, David Woodhouse wrote:
> > > > > > Or just match the check before -EOPNOTSUPP with bio_has_data(),
> > > > > > since it only applies to a barrier that carries data.
> > > > >
> > > > > Like this, you mean? Empty barriers don't get to there?
> > > >
> > > > Seems to work, although I'm somewhat dubious about it. Still, if you're
> > > > happy that it's correct and you really prefer it that way, then I can
> > > > commit it.
> > >
> > > Looks ok to me.
> >
> > OK, that version is now in the git tree.
>
> Alright, I'll pull it it.
>
> > > > Anything else I need to address before you pull the tree? Any comment on
> > > > the BLKDISCARD ioctl? I've left that one using the non-barrier version
> > > > since it's waiting for it anyway, and shouldn't be happening
> > > > concurrently with anything else.
> > > >
> > > > (http://,git://} git.infradead.org/users/dwmw2/discard-2.6.git
> > >
> > > I'm with Jamie on using the safer version for the ioctl, unless you
> > > ensure that the block device isn't mounted before allowing it.
> >
> > We don't ensure that the block device isn't mounted before we allow
> > reads/writes -- why should we do so before we allow discard?
> >
> > If the userspace tool 'owns' the block device, that's a different story
> > -- and that's OK too, because the BLKDISCARD ioctl is synchronous. It
> > won't return until it's actually _complete_, and userspace really
> > shouldn't be trying to write to the same sectors until that happens.
>
> Still seems a little unsafe. I guess you could make a case for making
> the ioctl privileged. We should at least ensure that the user has write
> access to the device before allowing a discard operation.
Er, yes. That would be a good idea, wouldn't it? :)
Hold off on pulling the tree for a moment; I'll do that.
--
dwmw2
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 11:34 ` David Woodhouse
@ 2008-08-13 12:07 ` David Woodhouse
2008-08-14 7:49 ` Jens Axboe
1 sibling, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 12:07 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Wed, 2008-08-13 at 12:34 +0100, David Woodhouse wrote:
> Er, yes. That would be a good idea, wouldn't it? :)
>
> Hold off on pulling the tree for a moment; I'll do that.
Merged this into the appropriate commit:
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -325,6 +325,9 @@ int blkdev_ioctl(struct inode *inode, struct file *file, uns
case BLKDISCARD: {
uint64_t range[2];
+ if (!(file->f_mode & FMODE_WRITE))
+ return -EBADF;
+
if (copy_from_user(range, (void __user *)arg, sizeof(range)))
return -EFAULT;
Please pull (I've left out the REQ_TYPE_FLUSH patch for now)...
The following changes since commit 6dc2b733b2f9605b48fdb7692fce5a3eafe241e4:
Jens Axboe (1):
highmem: use bio_has_data() in the bounce path
are available in the git repository at:
git://git.infradead.org/users/dwmw2/discard-2.6.git master
David Woodhouse (10):
[BLOCK] Fix typo causing compile error in blk_queue_bounce()
[BLOCK] Fix up comments about matching flags between bio and rq
[BLOCK] Add 'discard' request handling
[FAT] Let the block device know when sectors can be discarded
[MTD] Support 'discard sectors' operation in translation layer support core
[MTD] [FTL] Support 'discard sectors' operation.
blktrace: support discard requests
blktrace: simplify flags handling in __blk_add_trace
[BLOCK] Add BLKDISCARD ioctl to allow userspace to discard sectors
[BLOCK] Allow elevators to sort/merge discard requests
OGAWA Hirofumi (1):
[BLOCK] Use WRITE_BARRIER in blkdev_issue_flush(), not (1<<BIO_RW_BARRIER)
block/blk-barrier.c | 71 ++++++++++++++++++++++++++++++++++++++-
block/blk-core.c | 41 ++++++++++++++--------
block/blk-merge.c | 27 +++++++++-----
block/blk-settings.c | 17 +++++++++
block/blktrace.c | 29 ++++-----------
block/compat_ioctl.c | 1 +
block/elevator.c | 12 +++++-
block/ioctl.c | 76 ++++++++++++++++++++++++++++++++++++++++++
drivers/mtd/ftl.c | 24 +++++++++++++
drivers/mtd/mtd_blkdevs.c | 16 +++++++++
fs/fat/fatent.c | 14 ++++++++
include/linux/bio.h | 14 +++++---
include/linux/blkdev.h | 24 +++++++++++--
include/linux/blktrace_api.h | 4 ++
include/linux/fs.h | 5 ++-
include/linux/mtd/blktrans.h | 2 +
mm/bounce.c | 2 +-
17 files changed, 320 insertions(+), 59 deletions(-)
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 11:34 ` David Woodhouse
2008-08-13 12:07 ` David Woodhouse
@ 2008-08-14 7:49 ` Jens Axboe
2008-08-14 7:52 ` David Woodhouse
1 sibling, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-14 7:49 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Wed, Aug 13 2008, David Woodhouse wrote:
> On Wed, 2008-08-13 at 13:32 +0200, Jens Axboe wrote:
> > On Wed, Aug 13 2008, David Woodhouse wrote:
> > > On Wed, 2008-08-13 at 13:15 +0200, Jens Axboe wrote:
> > > > On Tue, Aug 12 2008, David Woodhouse wrote:
> > > > > On Tue, 2008-08-12 at 14:04 +0100, David Woodhouse wrote:
> > > > > > > Or just match the check before -EOPNOTSUPP with bio_has_data(),
> > > > > > > since it only applies to a barrier that carries data.
> > > > > >
> > > > > > Like this, you mean? Empty barriers don't get to there?
> > > > >
> > > > > Seems to work, although I'm somewhat dubious about it. Still, if you're
> > > > > happy that it's correct and you really prefer it that way, then I can
> > > > > commit it.
> > > >
> > > > Looks ok to me.
> > >
> > > OK, that version is now in the git tree.
> >
> > Alright, I'll pull it it.
> >
> > > > > Anything else I need to address before you pull the tree? Any comment on
> > > > > the BLKDISCARD ioctl? I've left that one using the non-barrier version
> > > > > since it's waiting for it anyway, and shouldn't be happening
> > > > > concurrently with anything else.
> > > > >
> > > > > (http://,git://} git.infradead.org/users/dwmw2/discard-2.6.git
> > > >
> > > > I'm with Jamie on using the safer version for the ioctl, unless you
> > > > ensure that the block device isn't mounted before allowing it.
> > >
> > > We don't ensure that the block device isn't mounted before we allow
> > > reads/writes -- why should we do so before we allow discard?
> > >
> > > If the userspace tool 'owns' the block device, that's a different story
> > > -- and that's OK too, because the BLKDISCARD ioctl is synchronous. It
> > > won't return until it's actually _complete_, and userspace really
> > > shouldn't be trying to write to the same sectors until that happens.
> >
> > Still seems a little unsafe. I guess you could make a case for making
> > the ioctl privileged. We should at least ensure that the user has write
> > access to the device before allowing a discard operation.
>
> Er, yes. That would be a good idea, wouldn't it? :)
I'd think so :-)
> Hold off on pulling the tree for a moment; I'll do that.
I've pulled it this morning.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-14 7:49 ` Jens Axboe
@ 2008-08-14 7:52 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-14 7:52 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Thu, 2008-08-14 at 09:49 +0200, Jens Axboe wrote:
> > Hold off on pulling the tree for a moment; I'll do that.
>
> I've pulled it this morning.
OK, I'll do another tree for the miscellaneous file system and device
driver bits then. Thanks.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-13 11:32 ` Jens Axboe
2008-08-13 11:34 ` David Woodhouse
@ 2008-08-14 7:25 ` David Woodhouse
2008-08-14 7:33 ` Stephen Rothwell
1 sibling, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-14 7:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew
On Wed, 2008-08-13 at 13:32 +0200, Jens Axboe wrote:
> > > Looks ok to me.
> >
> > OK, that version is now in the git tree.
>
> Alright, I'll pull it it.
I'm guessing (mostly from the typo which has prevented it compiling for
the last 5 days) that git://git.kernel.dk/linux-2.6-block.git#for-2.6.28
isn't in linux-next?
Do you mind if I ask Stephen to add my tree? I think I've mostly
finished playing with the block core now, and I'll round up patches for
new file systems to use this, and driver support (Pierre has SD/MMC
almost ready).
Should we merge your final commit and my first, while we're at it? It's
best not to leave a non-compiling version in the history if we can avoid
it.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-14 7:25 ` David Woodhouse
@ 2008-08-14 7:33 ` Stephen Rothwell
2008-08-14 7:37 ` David Woodhouse
2008-08-14 7:42 ` Jens Axboe
0 siblings, 2 replies; 88+ messages in thread
From: Stephen Rothwell @ 2008-08-14 7:33 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
[-- Attachment #1: Type: text/plain, Size: 486 bytes --]
Hi David,
On Thu, 14 Aug 2008 08:25:25 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
>
> I'm guessing (mostly from the typo which has prevented it compiling for
> the last 5 days) that git://git.kernel.dk/linux-2.6-block.git#for-2.6.28
> isn't in linux-next?
linux-next has the for-next branch of that tree (which is currently empty
relative to Linus' tree).
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-14 7:33 ` Stephen Rothwell
@ 2008-08-14 7:37 ` David Woodhouse
2008-08-14 7:42 ` Jens Axboe
1 sibling, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-14 7:37 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Thu, 2008-08-14 at 17:33 +1000, Stephen Rothwell wrote:
> Hi David,
>
> On Thu, 14 Aug 2008 08:25:25 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > I'm guessing (mostly from the typo which has prevented it compiling for
> > the last 5 days) that git://git.kernel.dk/linux-2.6-block.git#for-2.6.28
> > isn't in linux-next?
>
> linux-next has the for-next branch of that tree (which is currently empty
> relative to Linus' tree).
Argh. Too many branches :)
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-14 7:33 ` Stephen Rothwell
2008-08-14 7:37 ` David Woodhouse
@ 2008-08-14 7:42 ` Jens Axboe
2008-08-14 7:46 ` Stephen Rothwell
1 sibling, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-14 7:42 UTC (permalink / raw)
To: Stephen Rothwell
Cc: David Woodhouse, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Thu, Aug 14 2008, Stephen Rothwell wrote:
> Hi David,
>
> On Thu, 14 Aug 2008 08:25:25 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > I'm guessing (mostly from the typo which has prevented it compiling for
> > the last 5 days) that git://git.kernel.dk/linux-2.6-block.git#for-2.6.28
> > isn't in linux-next?
>
> linux-next has the for-next branch of that tree (which is currently empty
> relative to Linus' tree).
Patience, it'll be pushed later today when I have it integrated and
tested.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-14 7:42 ` Jens Axboe
@ 2008-08-14 7:46 ` Stephen Rothwell
0 siblings, 0 replies; 88+ messages in thread
From: Stephen Rothwell @ 2008-08-14 7:46 UTC (permalink / raw)
To: Jens Axboe
Cc: David Woodhouse, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
[-- Attachment #1: Type: text/plain, Size: 556 bytes --]
Hi Jens,
On Thu, 14 Aug 2008 09:42:46 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > linux-next has the for-next branch of that tree (which is currently empty
> > relative to Linus' tree).
>
> Patience, it'll be pushed later today when I have it integrated and
> tested.
That is as it should be, thanks. I wasn't criticising, just informing
David :-)
The next linux-next build will not begin for 15 hours, so no rush :-)
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 10:00 ` David Woodhouse
2008-08-12 10:54 ` Jens Axboe
@ 2008-08-12 18:10 ` Jamie Lokier
2008-08-13 10:20 ` David Woodhouse
1 sibling, 1 reply; 88+ messages in thread
From: Jamie Lokier @ 2008-08-12 18:10 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
David Woodhouse wrote:
> I'd still quite like to make it work though -- file systems that care
> can always use explicit barriers to ensure that the DISCARD requests are
> handled before any subsequent reallocation and write to the same
> sectors.
Wouldn't an explicit barrier be (a) slow if it's a hard-barrier, and
(b) as a soft-barrier, overly constrain scheduling, because the only
ordering required is against a subsequent overlapping write?
Explicit soft or hard barrier *before* DISCARD does need to be an
option. Think of journalling: Step 1 = commit "deleted file" to
journal, 2 = hard barrier, 3 = DISCARD file data. Barrier after does
not seem to be required.
-- Jamie
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 18:10 ` Jamie Lokier
@ 2008-08-13 10:20 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 10:20 UTC (permalink / raw)
To: Jamie Lokier
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
On Tue, 2008-08-12 at 19:10 +0100, Jamie Lokier wrote:
> Wouldn't an explicit barrier be (a) slow if it's a hard-barrier, and
> (b) as a soft-barrier, overly constrain scheduling, because the only
> ordering required is against a subsequent overlapping write?
Yes, a soft barrier is more than we need -- but unless we do a lot of
work in the schedulers to ensure that writes can't cross discards,
that's the best option we have.
> Explicit soft or hard barrier *before* DISCARD does need to be an
> option. Think of journalling: Step 1 = commit "deleted file" to
> journal, 2 = hard barrier, 3 = DISCARD file data. Barrier after does
> not seem to be required.
The barrier afterwards is to protect against immediate reallocation of
these blocks to a new inode, and the associated data write getting
scheduled _before_ the discard.
--
dwmw2
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 9:14 ` [PATCH 0/7] Discard requests, v2 Jens Axboe
2008-08-12 10:00 ` David Woodhouse
@ 2008-08-12 11:42 ` Matthew Wilcox
2008-08-12 11:46 ` David Woodhouse
2008-08-12 19:53 ` OGAWA Hirofumi
2 siblings, 1 reply; 88+ messages in thread
From: Matthew Wilcox @ 2008-08-12 11:42 UTC (permalink / raw)
To: Jens Axboe
Cc: David Woodhouse, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad
On Tue, Aug 12, 2008 at 11:14:47AM +0200, Jens Axboe wrote:
> But we don't already have this problem, that is the point. For page
> cache writes, the page cache nicely solves this issue for us - a write
> that comes in later gets to wait on the page lock before proceeding. So
> at least it's ordered. For O_DIRECT, the issuer is on his own.
>
> I think this is a serious problem and that we must ensure that an
> overlapping write doesn't pass a previously issued discard request. So
> in that sense, discards must be considered soft barriers.
Would it make sense for discards to lock the page as if it were a write?
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 11:42 ` Matthew Wilcox
@ 2008-08-12 11:46 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-12 11:46 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad
On Tue, 2008-08-12 at 05:42 -0600, Matthew Wilcox wrote:
> Would it make sense for discards to lock the page as if it were a
> write?
In general, there is no page any more -- not in the file system's inode,
at least. There's the page cache for the block device itself, but
there's no point in locking that is there?
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 9:14 ` [PATCH 0/7] Discard requests, v2 Jens Axboe
2008-08-12 10:00 ` David Woodhouse
2008-08-12 11:42 ` Matthew Wilcox
@ 2008-08-12 19:53 ` OGAWA Hirofumi
2008-08-12 20:11 ` OGAWA Hirofumi
2 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-12 19:53 UTC (permalink / raw)
To: Jens Axboe
Cc: David Woodhouse, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
Jens Axboe <jens.axboe@oracle.com> writes:
> But we don't already have this problem, that is the point. For page
> cache writes, the page cache nicely solves this issue for us - a write
> that comes in later gets to wait on the page lock before proceeding. So
> at least it's ordered. For O_DIRECT, the issuer is on his own.
>
> I think this is a serious problem and that we must ensure that an
> overlapping write doesn't pass a previously issued discard request. So
> in that sense, discards must be considered soft barriers.
Um.., if blocks used by data (page cache) is reused as directory (meta
data), it can happen?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/7] Discard requests, v2
2008-08-12 19:53 ` OGAWA Hirofumi
@ 2008-08-12 20:11 ` OGAWA Hirofumi
0 siblings, 0 replies; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-12 20:11 UTC (permalink / raw)
To: Jens Axboe
Cc: David Woodhouse, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:
> Jens Axboe <jens.axboe@oracle.com> writes:
>
>> But we don't already have this problem, that is the point. For page
>> cache writes, the page cache nicely solves this issue for us - a write
>> that comes in later gets to wait on the page lock before proceeding. So
>> at least it's ordered. For O_DIRECT, the issuer is on his own.
>>
>> I think this is a serious problem and that we must ensure that an
>> overlapping write doesn't pass a previously issued discard request. So
>> in that sense, discards must be considered soft barriers.
>
> Um.., if blocks used by data (page cache) is reused as directory (meta
> data), it can happen?
Ugh, sorry, can't happen. forget it. FWIW, it would happen, if someone
directly access to device mounted by fs, it can.. But it's user's fault
already.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* [PATCH 11/7] Kill REQ_TYPE_FLUSH
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (11 preceding siblings ...)
2008-08-12 9:14 ` [PATCH 0/7] Discard requests, v2 Jens Axboe
@ 2008-08-13 11:39 ` David Woodhouse
2008-08-13 11:58 ` Geert Uytterhoeven
2008-08-13 15:40 ` Jens Axboe
2008-08-16 17:08 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) Pierre Ossman
13 siblings, 2 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 11:39 UTC (permalink / raw)
To: Jens Axboe
Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew,
FUJITA Tomonori, Geert Uytterhoeven
It was only used by ps3disk, and it should probably have been
REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
drivers/block/ps3disk.c | 9 ++++++---
include/linux/blkdev.h | 1 -
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index d797e20..4b0d6c7 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -199,7 +199,8 @@ static void ps3disk_do_request(struct ps3_storage_device *dev,
if (blk_fs_request(req)) {
if (ps3disk_submit_request_sg(dev, req))
break;
- } else if (req->cmd_type == REQ_TYPE_FLUSH) {
+ } else if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
+ req->cmd[0] == REQ_LB_OP_FLUSH) {
if (ps3disk_submit_flush_request(dev, req))
break;
} else {
@@ -257,7 +258,8 @@ static irqreturn_t ps3disk_interrupt(int irq, void *data)
return IRQ_HANDLED;
}
- if (req->cmd_type == REQ_TYPE_FLUSH) {
+ if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
+ req->cmd[0] == REQ_LB_OP_FLUSH) {
read = 0;
num_sectors = req->hard_cur_sectors;
op = "flush";
@@ -405,7 +407,8 @@ static void ps3disk_prepare_flush(struct request_queue *q, struct request *req)
dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
- req->cmd_type = REQ_TYPE_FLUSH;
+ req->cmd_type = REQ_TYPE_LINUX_BLOCK;
+ req->cmd[0] = REQ_LB_OP_FLUSH;
}
static unsigned long ps3disk_mask;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 293a71a..a0fa413 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -54,7 +54,6 @@ enum rq_cmd_type_bits {
REQ_TYPE_PM_SUSPEND, /* suspend request */
REQ_TYPE_PM_RESUME, /* resume request */
REQ_TYPE_PM_SHUTDOWN, /* shutdown request */
- REQ_TYPE_FLUSH, /* flush request */
REQ_TYPE_SPECIAL, /* driver defined type */
REQ_TYPE_LINUX_BLOCK, /* generic block layer message */
/*
--
1.5.5.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 11/7] Kill REQ_TYPE_FLUSH
2008-08-13 11:39 ` [PATCH 11/7] Kill REQ_TYPE_FLUSH David Woodhouse
@ 2008-08-13 11:58 ` Geert Uytterhoeven
2008-08-13 12:43 ` David Woodhouse
2008-08-13 15:40 ` Jens Axboe
1 sibling, 1 reply; 88+ messages in thread
From: Geert Uytterhoeven @ 2008-08-13 11:58 UTC (permalink / raw)
To: David Woodhouse
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew, FUJITA Tomonori, Linux Kernel Development
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2964 bytes --]
On Wed, 13 Aug 2008, David Woodhouse wrote:
> It was only used by ps3disk, and it should probably have been
> REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
I used REQ_TYPE_FLUSH after discussing with the block experts.
Note that REQ_LB_OP_FLUSH is also never used. Actually its definition in
include/linux/blkdev.h has:
| enum {
| /*
| * just examples for now
^^^^^^^^^^^^^^^^^^^^^
| */
| REQ_LB_OP_EJECT = 0x40, /* eject request */
| REQ_LB_OP_FLUSH = 0x41, /* flush device */
| };
which makes me a bit reluctant to make this change.
BTW, how to test all this flushing behavior, if there are no real users?
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> drivers/block/ps3disk.c | 9 ++++++---
> include/linux/blkdev.h | 1 -
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
> index d797e20..4b0d6c7 100644
> --- a/drivers/block/ps3disk.c
> +++ b/drivers/block/ps3disk.c
> @@ -199,7 +199,8 @@ static void ps3disk_do_request(struct ps3_storage_device *dev,
> if (blk_fs_request(req)) {
> if (ps3disk_submit_request_sg(dev, req))
> break;
> - } else if (req->cmd_type == REQ_TYPE_FLUSH) {
> + } else if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
> + req->cmd[0] == REQ_LB_OP_FLUSH) {
> if (ps3disk_submit_flush_request(dev, req))
> break;
> } else {
> @@ -257,7 +258,8 @@ static irqreturn_t ps3disk_interrupt(int irq, void *data)
> return IRQ_HANDLED;
> }
>
> - if (req->cmd_type == REQ_TYPE_FLUSH) {
> + if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
> + req->cmd[0] == REQ_LB_OP_FLUSH) {
> read = 0;
> num_sectors = req->hard_cur_sectors;
> op = "flush";
> @@ -405,7 +407,8 @@ static void ps3disk_prepare_flush(struct request_queue *q, struct request *req)
>
> dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
>
> - req->cmd_type = REQ_TYPE_FLUSH;
> + req->cmd_type = REQ_TYPE_LINUX_BLOCK;
> + req->cmd[0] = REQ_LB_OP_FLUSH;
> }
>
> static unsigned long ps3disk_mask;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 293a71a..a0fa413 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -54,7 +54,6 @@ enum rq_cmd_type_bits {
> REQ_TYPE_PM_SUSPEND, /* suspend request */
> REQ_TYPE_PM_RESUME, /* resume request */
> REQ_TYPE_PM_SHUTDOWN, /* shutdown request */
> - REQ_TYPE_FLUSH, /* flush request */
> REQ_TYPE_SPECIAL, /* driver defined type */
> REQ_TYPE_LINUX_BLOCK, /* generic block layer message */
> /*
With kind regards,
Geert Uytterhoeven
Software Architect
Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/
A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis 293-0376800-10 GEBA-BE-BB
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 11/7] Kill REQ_TYPE_FLUSH
2008-08-13 11:58 ` Geert Uytterhoeven
@ 2008-08-13 12:43 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 12:43 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Jens Axboe, Andrew Morton, Ric Wheeler, linux-fsdevel, gilad,
matthew, FUJITA Tomonori, Linux Kernel Development
On Wed, 2008-08-13 at 13:58 +0200, Geert Uytterhoeven wrote:
> Note that REQ_LB_OP_FLUSH is also never used.
That can be fixed...
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 681d5ac..8663f3f 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -32,6 +32,13 @@ struct mtd_blkcore_priv {
spinlock_t queue_lock;
};
+static void blktrans_prepare_flush(struct request_queue *q,
+ struct request *req)
+{
+ req->cmd_type = REQ_TYPE_LINUX_BLOCK;
+ req->cmd[0] = REQ_LB_OP_FLUSH;
+}
+
static int blktrans_discard_request(struct request_queue *q,
struct request *req)
{
@@ -53,6 +60,14 @@ static int do_blktrans_request(struct mtd_blktrans_ops *tr,
buf = req->buffer;
if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
+ req->cmd[0] == REQ_LB_OP_FLUSH) {
+ if (tr->flush)
+ return !tr->flush(dev);
+ else
+ return 1;
+ }
+
+ if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
req->cmd[0] == REQ_LB_OP_DISCARD)
return !tr->discard(dev, block, nsect);
@@ -383,6 +398,9 @@ int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
blk_queue_set_discard(tr->blkcore_priv->rq,
blktrans_discard_request);
+ blk_queue_ordered(tr->blkcore_priv->rq, QUEUE_ORDERED_DRAIN_FLUSH,
+ blktrans_prepare_flush);
+
tr->blkshift = ffs(tr->blksize) - 1;
tr->blkcore_priv->thread = kthread_run(mtd_blktrans_thread, tr,
> Actually its definition in include/linux/blkdev.h has:
>
> | enum {
> | /*
> | * just examples for now
> ^^^^^^^^^^^^^^^^^^^^^
> | */
> | REQ_LB_OP_EJECT = 0x40, /* eject request */
> | REQ_LB_OP_FLUSH = 0x41, /* flush device */
> | };
>
> which makes me a bit reluctant to make this change.
That can be fixed too...
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 293a71a..7ef582f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -76,11 +75,8 @@ enum rq_cmd_type_bits {
*
*/
enum {
- /*
- * just examples for now
- */
REQ_LB_OP_EJECT = 0x40, /* eject request */
- REQ_LB_OP_FLUSH = 0x41, /* flush device */
+ REQ_LB_OP_FLUSH = 0x41, /* flush request */
REQ_LB_OP_DISCARD = 0x42, /* discard sectors */
};
> BTW, how to test all this flushing behavior, if there are no real
> users?
btrfs uses it.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply related [flat|nested] 88+ messages in thread
* Re: [PATCH 11/7] Kill REQ_TYPE_FLUSH
2008-08-13 11:39 ` [PATCH 11/7] Kill REQ_TYPE_FLUSH David Woodhouse
2008-08-13 11:58 ` Geert Uytterhoeven
@ 2008-08-13 15:40 ` Jens Axboe
2008-08-13 15:46 ` David Woodhouse
1 sibling, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-13 15:40 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew,
FUJITA Tomonori, Geert Uytterhoeven
On Wed, Aug 13 2008, David Woodhouse wrote:
> It was only used by ps3disk, and it should probably have been
> REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> drivers/block/ps3disk.c | 9 ++++++---
> include/linux/blkdev.h | 1 -
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
> index d797e20..4b0d6c7 100644
> --- a/drivers/block/ps3disk.c
> +++ b/drivers/block/ps3disk.c
> @@ -199,7 +199,8 @@ static void ps3disk_do_request(struct ps3_storage_device *dev,
> if (blk_fs_request(req)) {
> if (ps3disk_submit_request_sg(dev, req))
> break;
> - } else if (req->cmd_type == REQ_TYPE_FLUSH) {
> + } else if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
> + req->cmd[0] == REQ_LB_OP_FLUSH) {
> if (ps3disk_submit_flush_request(dev, req))
> break;
> } else {
> @@ -257,7 +258,8 @@ static irqreturn_t ps3disk_interrupt(int irq, void *data)
> return IRQ_HANDLED;
> }
>
> - if (req->cmd_type == REQ_TYPE_FLUSH) {
> + if (req->cmd_type == REQ_TYPE_LINUX_BLOCK &&
> + req->cmd[0] == REQ_LB_OP_FLUSH) {
> read = 0;
> num_sectors = req->hard_cur_sectors;
> op = "flush";
> @@ -405,7 +407,8 @@ static void ps3disk_prepare_flush(struct request_queue *q, struct request *req)
>
> dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
>
> - req->cmd_type = REQ_TYPE_FLUSH;
> + req->cmd_type = REQ_TYPE_LINUX_BLOCK;
> + req->cmd[0] = REQ_LB_OP_FLUSH;
> }
>
> static unsigned long ps3disk_mask;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 293a71a..a0fa413 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -54,7 +54,6 @@ enum rq_cmd_type_bits {
> REQ_TYPE_PM_SUSPEND, /* suspend request */
> REQ_TYPE_PM_RESUME, /* resume request */
> REQ_TYPE_PM_SHUTDOWN, /* shutdown request */
> - REQ_TYPE_FLUSH, /* flush request */
> REQ_TYPE_SPECIAL, /* driver defined type */
> REQ_TYPE_LINUX_BLOCK, /* generic block layer message */
> /*
> --
> 1.5.5.1
That looks good, thanks David!
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 11/7] Kill REQ_TYPE_FLUSH
2008-08-13 15:40 ` Jens Axboe
@ 2008-08-13 15:46 ` David Woodhouse
0 siblings, 0 replies; 88+ messages in thread
From: David Woodhouse @ 2008-08-13 15:46 UTC (permalink / raw)
To: Jens Axboe
Cc: Andrew Morton, Ric Wheeler, linux-fsdevel, gilad, matthew,
FUJITA Tomonori, Geert Uytterhoeven
On Wed, 2008-08-13 at 17:40 +0200, Jens Axboe wrote:
> That looks good, thanks David!
OK, I've added it (and the subsequent mtd_blkdevs patch) to the git
tree.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-09 16:26 [PATCH 0/7] Discard requests, v2 David Woodhouse
` (12 preceding siblings ...)
2008-08-13 11:39 ` [PATCH 11/7] Kill REQ_TYPE_FLUSH David Woodhouse
@ 2008-08-16 17:08 ` Pierre Ossman
2008-08-16 17:11 ` [PATCH 1/2] mmc_block: factor out the mmc request handling Pierre Ossman
` (3 more replies)
13 siblings, 4 replies; 88+ messages in thread
From: Pierre Ossman @ 2008-08-16 17:08 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Woodhouse, Jens Axboe
[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]
I've cooked up some patches that maps David's new discard requests to
erase operations on MMC and SD cards. I'm not entirely sure these are
something to keep though as I've been unable to see any performance
increase in keeping blocks erased. Do we have any other reason to keep
it?
During development and test I noticed one issue though; the discard
requests are chopped up into smaller pieces. As the erase commands have
a granularity, sometimes this can mean that parts of the flash are
missed because the discard request isn't split in alignment with the
erase boundaries.
For this reason, I propose that discard requests do not respect the
regular queue restrictions. Those are generally for expressing
limitations in the devices' DMA engines. Since the discard request
carries no data, the DMA engine will never get involved.
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread* [PATCH 1/2] mmc_block: factor out the mmc request handling
2008-08-16 17:08 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) Pierre Ossman
@ 2008-08-16 17:11 ` Pierre Ossman
2008-08-16 17:12 ` [PATCH 2/2] mmc_block: erase discarded blocks Pierre Ossman
` (2 subsequent siblings)
3 siblings, 0 replies; 88+ messages in thread
From: Pierre Ossman @ 2008-08-16 17:11 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Woodhouse, Jens Axboe
From: Pierre Ossman <drzeus@drzeus.cx>
Move the handling of the actual MMC request to its own function so
that the main request handler can be extended to handle other types
of requests than simple reads and writes.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 86dbb36..3140e92 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -208,195 +208,215 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
return blocks;
}
-static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
+static int mmc_blk_xfer_rq(struct mmc_blk_data *md,
+ struct request *req, unsigned int *bytes_xfered)
{
- struct mmc_blk_data *md = mq->data;
- struct mmc_card *card = md->queue.card;
+ struct mmc_card *card;
+
struct mmc_blk_request brq;
- int ret = 1, data_size, i;
+ int ret, data_size, i;
struct scatterlist *sg;
+ u32 readcmd, writecmd;
- mmc_claim_host(card->host);
+ BUG_ON(!bytes_xfered);
- do {
- struct mmc_command cmd;
- u32 readcmd, writecmd;
-
- memset(&brq, 0, sizeof(struct mmc_blk_request));
- brq.mrq.cmd = &brq.cmd;
- brq.mrq.data = &brq.data;
-
- brq.cmd.arg = req->sector;
- if (!mmc_card_blockaddr(card))
- brq.cmd.arg <<= 9;
- brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
- brq.data.blksz = 1 << md->block_bits;
- brq.stop.opcode = MMC_STOP_TRANSMISSION;
- brq.stop.arg = 0;
- brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
- brq.data.blocks = req->nr_sectors >> (md->block_bits - 9);
- if (brq.data.blocks > card->host->max_blk_count)
- brq.data.blocks = card->host->max_blk_count;
-
- if (brq.data.blocks > 1) {
- /* SPI multiblock writes terminate using a special
- * token, not a STOP_TRANSMISSION request.
- */
- if (!mmc_host_is_spi(card->host)
- || rq_data_dir(req) == READ)
- brq.mrq.stop = &brq.stop;
- readcmd = MMC_READ_MULTIPLE_BLOCK;
- writecmd = MMC_WRITE_MULTIPLE_BLOCK;
- } else {
- brq.mrq.stop = NULL;
- readcmd = MMC_READ_SINGLE_BLOCK;
- writecmd = MMC_WRITE_BLOCK;
- }
-
- if (rq_data_dir(req) == READ) {
- brq.cmd.opcode = readcmd;
- brq.data.flags |= MMC_DATA_READ;
- } else {
- brq.cmd.opcode = writecmd;
- brq.data.flags |= MMC_DATA_WRITE;
- }
+ card = md->queue.card;
- mmc_set_data_timeout(&brq.data, card);
+ memset(&brq, 0, sizeof(struct mmc_blk_request));
+ brq.mrq.cmd = &brq.cmd;
+ brq.mrq.data = &brq.data;
- brq.data.sg = mq->sg;
- brq.data.sg_len = mmc_queue_map_sg(mq);
-
- mmc_queue_bounce_pre(mq);
+ brq.cmd.arg = req->sector;
+ if (!mmc_card_blockaddr(card))
+ brq.cmd.arg <<= 9;
+ brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
+ brq.data.blksz = 1 << md->block_bits;
+ brq.stop.opcode = MMC_STOP_TRANSMISSION;
+ brq.stop.arg = 0;
+ brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+ brq.data.blocks = req->nr_sectors >> (md->block_bits - 9);
+ if (brq.data.blocks > card->host->max_blk_count)
+ brq.data.blocks = card->host->max_blk_count;
+ if (brq.data.blocks > 1) {
/*
- * Adjust the sg list so it is the same size as the
- * request.
+ * SPI multiblock writes terminate using a special
+ * token, not a STOP_TRANSMISSION request.
*/
- if (brq.data.blocks !=
- (req->nr_sectors >> (md->block_bits - 9))) {
- data_size = brq.data.blocks * brq.data.blksz;
- for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) {
- data_size -= sg->length;
- if (data_size <= 0) {
- sg->length += data_size;
- i++;
- break;
- }
- }
- brq.data.sg_len = i;
- }
+ if (!mmc_host_is_spi(card->host)
+ || rq_data_dir(req) == READ)
+ brq.mrq.stop = &brq.stop;
+ readcmd = MMC_READ_MULTIPLE_BLOCK;
+ writecmd = MMC_WRITE_MULTIPLE_BLOCK;
+ } else {
+ brq.mrq.stop = NULL;
+ readcmd = MMC_READ_SINGLE_BLOCK;
+ writecmd = MMC_WRITE_BLOCK;
+ }
- mmc_wait_for_req(card->host, &brq.mrq);
+ if (rq_data_dir(req) == READ) {
+ brq.cmd.opcode = readcmd;
+ brq.data.flags |= MMC_DATA_READ;
+ } else {
+ brq.cmd.opcode = writecmd;
+ brq.data.flags |= MMC_DATA_WRITE;
+ }
- mmc_queue_bounce_post(mq);
+ mmc_set_data_timeout(&brq.data, card);
- /*
- * Check for errors here, but don't jump to cmd_err
- * until later as we need to wait for the card to leave
- * programming mode even when things go wrong.
- */
- if (brq.cmd.error) {
- printk(KERN_ERR "%s: error %d sending read/write command\n",
- req->rq_disk->disk_name, brq.cmd.error);
- }
+ brq.data.sg = md->queue.sg;
+ brq.data.sg_len = mmc_queue_map_sg(&md->queue);
- if (brq.data.error) {
- printk(KERN_ERR "%s: error %d transferring data\n",
- req->rq_disk->disk_name, brq.data.error);
- }
+ mmc_queue_bounce_pre(&md->queue);
- if (brq.stop.error) {
- printk(KERN_ERR "%s: error %d sending stop command\n",
- req->rq_disk->disk_name, brq.stop.error);
+ /*
+ * Adjust the sg list so it is the same size as the
+ * request.
+ */
+ if (brq.data.blocks !=
+ (req->nr_sectors >> (md->block_bits - 9))) {
+ data_size = brq.data.blocks * brq.data.blksz;
+ for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) {
+ data_size -= sg->length;
+ if (data_size <= 0) {
+ sg->length += data_size;
+ i++;
+ break;
+ }
}
+ brq.data.sg_len = i;
+ }
- if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
- do {
- int err;
-
- cmd.opcode = MMC_SEND_STATUS;
- cmd.arg = card->rca << 16;
- cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
- err = mmc_wait_for_cmd(card->host, &cmd, 5);
- if (err) {
- printk(KERN_ERR "%s: error %d requesting status\n",
- req->rq_disk->disk_name, err);
- goto cmd_err;
- }
- /*
- * Some cards mishandle the status bits,
- * so make sure to check both the busy
- * indication and the card state.
- */
- } while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
- (R1_CURRENT_STATE(cmd.resp[0]) == 7));
-
-#if 0
- if (cmd.resp[0] & ~0x00000900)
- printk(KERN_ERR "%s: status = %08x\n",
- req->rq_disk->disk_name, cmd.resp[0]);
- if (mmc_decode_status(cmd.resp))
- goto cmd_err;
-#endif
- }
+ mmc_wait_for_req(card->host, &brq.mrq);
- if (brq.cmd.error || brq.data.error || brq.stop.error)
- goto cmd_err;
+ mmc_queue_bounce_post(&md->queue);
- /*
- * A block was successfully transferred.
- */
- spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
- spin_unlock_irq(&md->lock);
- } while (ret);
+ ret = 0;
+ *bytes_xfered = brq.data.bytes_xfered;
- mmc_release_host(card->host);
+ if (brq.cmd.error) {
+ ret = brq.cmd.error;
+ printk(KERN_ERR "%s: error %d sending read/write command\n",
+ req->rq_disk->disk_name, brq.cmd.error);
+ }
- return 1;
+ if (brq.data.error) {
+ ret = brq.cmd.error;
+ printk(KERN_ERR "%s: error %d transferring data\n",
+ req->rq_disk->disk_name, brq.data.error);
+ }
- cmd_err:
- /*
- * If this is an SD card and we're writing, we can first
- * mark the known good sectors as ok.
- *
- * If the card is not SD, we can still ok written sectors
- * as reported by the controller (which might be less than
- * the real number of written sectors, but never more).
- *
- * For reads we just fail the entire chunk as that should
- * be safe in all cases.
+ if (brq.stop.error) {
+ ret = brq.cmd.error;
+ printk(KERN_ERR "%s: error %d sending stop command\n",
+ req->rq_disk->disk_name, brq.stop.error);
+ }
+
+ /*
+ * We need to wait for the card to leave programming mode
+ * even when things go wrong.
*/
- if (rq_data_dir(req) != READ) {
- if (mmc_card_sd(card)) {
+ if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
+ int cmd_ret;
+ struct mmc_command cmd;
+
+ do {
+ cmd.opcode = MMC_SEND_STATUS;
+ cmd.arg = card->rca << 16;
+ cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
+ cmd_ret = mmc_wait_for_cmd(card->host, &cmd, 5);
+ if (cmd_ret) {
+ printk(KERN_ERR "%s: error %d requesting status\n",
+ req->rq_disk->disk_name, cmd_ret);
+ ret = cmd_ret;
+ break;
+ }
+ /*
+ * Some cards mishandle the status bits,
+ * so make sure to check both the busy
+ * indication and the card state.
+ */
+ } while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
+ (R1_CURRENT_STATE(cmd.resp[0]) == 7));
+ }
+
+ /*
+ * Adjust the number of bytes transferred if there has been
+ * an error...
+ */
+ if (ret) {
+ /*
+ * For reads we just fail the entire chunk as that should
+ * be safe in all cases.
+ *
+ * If this is an SD card and we're writing, we can ask the
+ * card for known good sectors.
+ *
+ * If the card is not SD, we can still ok written sectors
+ * as reported by the controller (which might be less than
+ * the real number of written sectors, but never more).
+ */
+ if (rq_data_dir(req) == READ)
+ *bytes_xfered = 0;
+ else if (mmc_card_sd(card)) {
u32 blocks;
- unsigned int bytes;
blocks = mmc_sd_num_wr_blocks(card);
- if (blocks != (u32)-1) {
+ if (blocks == (u32)-1)
+ *bytes_xfered = 0;
+ else {
if (card->csd.write_partial)
- bytes = blocks << md->block_bits;
+ *bytes_xfered = blocks << md->block_bits;
else
- bytes = blocks << 9;
- spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, bytes);
- spin_unlock_irq(&md->lock);
+ *bytes_xfered = blocks << 9;
}
- } else {
+ }
+ }
+
+ return ret;
+}
+
+static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
+{
+ struct mmc_blk_data *md = mq->data;
+ struct mmc_card *card = md->queue.card;
+ int ret, err, bytes_xfered;
+
+ mmc_claim_host(card->host);
+
+ do {
+ err = mmc_blk_xfer_rq(md, req, &bytes_xfered);
+
+ /*
+ * First handle the sectors that got transferred
+ * successfully...
+ */
+ spin_lock_irq(&md->lock);
+ ret = __blk_end_request(req, 0, bytes_xfered);
+ spin_unlock_irq(&md->lock);
+
+ /*
+ * ...then check if things went south.
+ */
+ if (err) {
+ mmc_release_host(card->host);
+
+ /*
+ * Kill of the rest of the request...
+ */
spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+ while (ret)
+ ret = __blk_end_request(req, -EIO,
+ blk_rq_cur_bytes(req));
spin_unlock_irq(&md->lock);
+
+ return 0;
}
- }
+ } while (ret);
mmc_release_host(card->host);
- spin_lock_irq(&md->lock);
- while (ret)
- ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req));
- spin_unlock_irq(&md->lock);
-
- return 0;
+ return 1;
}
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
^ permalink raw reply related [flat|nested] 88+ messages in thread* [PATCH 2/2] mmc_block: erase discarded blocks
2008-08-16 17:08 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) Pierre Ossman
2008-08-16 17:11 ` [PATCH 1/2] mmc_block: factor out the mmc request handling Pierre Ossman
@ 2008-08-16 17:12 ` Pierre Ossman
2008-08-16 17:38 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) David Woodhouse
2008-08-22 9:24 ` Jens Axboe
3 siblings, 0 replies; 88+ messages in thread
From: Pierre Ossman @ 2008-08-16 17:12 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Woodhouse, Jens Axboe
From: Pierre Ossman <drzeus@drzeus.cx>
Erase blocks when upper layers discards them as that will increase
performance in subsequent writes to those areas.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 3140e92..41d3000 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -376,6 +376,119 @@ static int mmc_blk_xfer_rq(struct mmc_blk_data *md,
return ret;
}
+static int mmc_blk_erase_rq(struct mmc_blk_data *md,
+ struct request *req, unsigned int *bytes_xfered)
+{
+ struct mmc_card *card;
+ struct mmc_command cmd;
+ int ret;
+
+ uint64_t start, end, blocks;
+
+ BUG_ON(!bytes_xfered);
+
+ *bytes_xfered = 0;
+
+ card = md->queue.card;
+
+ BUG_ON(mmc_card_blockaddr(card) && (card->csd.erase_size % 512));
+
+ start = req->sector << 9;
+ end = start + (req->nr_sectors << 9);
+
+ /*
+ * The specs talk about the card removing the least
+ * significant bits, but the erase sizes are not guaranteed
+ * to be a power of two, so do a proper calculation.
+ */
+ blocks = start;
+ if (do_div(blocks, card->csd.erase_size)) /* start % erase_size */
+ start = (blocks + 1) * card->csd.erase_size; /* roundup() */
+ blocks = end;
+ if (do_div(blocks, card->csd.erase_size))
+ end = blocks * card->csd.erase_size;
+
+ if (start == end)
+ goto out;
+
+ /*
+ * The MMC spec isn't entirely clear that this should be done,
+ * but it would be impossible to erase the entire card if the
+ * addresses aren't sector based.
+ */
+ if (mmc_card_blockaddr(card)) {
+ start >>= 9;
+ end >>= 9;
+ }
+
+ if (mmc_card_sd(card))
+ cmd.opcode = SD_ERASE_WR_BLK_START;
+ else
+ cmd.opcode = MMC_ERASE_GROUP_START;
+ cmd.arg = start;
+ cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
+
+ ret = mmc_wait_for_cmd(card->host, &cmd, 0);
+ if (ret) {
+ printk(KERN_ERR "%s: error %d setting block erase start address\n",
+ req->rq_disk->disk_name, ret);
+ return ret;
+ }
+
+ if (mmc_card_sd(card))
+ cmd.opcode = SD_ERASE_WR_BLK_END;
+ else
+ cmd.opcode = MMC_ERASE_GROUP_END;
+ cmd.arg = end - 1; /* the span is inclusive */
+ cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
+
+ ret = mmc_wait_for_cmd(card->host, &cmd, 0);
+ if (ret) {
+ printk(KERN_ERR "%s: error %d setting block erase end address\n",
+ req->rq_disk->disk_name, ret);
+ return ret;
+ }
+
+ cmd.opcode = MMC_ERASE;
+ cmd.arg = 0;
+ cmd.flags = MMC_RSP_R1B | MMC_CMD_AC;
+
+ ret = mmc_wait_for_cmd(card->host, &cmd, 0);
+ if (ret) {
+ printk(KERN_ERR "%s: error %d starting block erase\n",
+ req->rq_disk->disk_name, ret);
+ return ret;
+ }
+
+ /*
+ * Wait for the card to finish the erase request...
+ */
+ if (!mmc_host_is_spi(card->host)) {
+ do {
+ cmd.opcode = MMC_SEND_STATUS;
+ cmd.arg = card->rca << 16;
+ cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
+ ret = mmc_wait_for_cmd(card->host, &cmd, 5);
+ if (ret) {
+ printk(KERN_ERR "%s: error %d requesting status\n",
+ req->rq_disk->disk_name, ret);
+ return ret;
+ }
+ /*
+ * Some cards mishandle the status bits,
+ * so make sure to check both the busy
+ * indication and the card state.
+ */
+ } while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
+ (R1_CURRENT_STATE(cmd.resp[0]) == 7));
+ }
+
+out:
+ *bytes_xfered = req->nr_sectors << 9;
+
+ return 0;
+}
+
static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
{
struct mmc_blk_data *md = mq->data;
@@ -385,7 +498,10 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
mmc_claim_host(card->host);
do {
- err = mmc_blk_xfer_rq(md, req, &bytes_xfered);
+ if (blk_discard_rq(req))
+ err = mmc_blk_erase_rq(md, req, &bytes_xfered);
+ else
+ err = mmc_blk_xfer_rq(md, req, &bytes_xfered);
/*
* First handle the sectors that got transferred
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 3dee97e..236a8e8 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -17,6 +17,7 @@
#include <linux/mmc/card.h>
#include <linux/mmc/host.h>
+#include <linux/mmc/mmc.h>
#include "queue.h"
#define MMC_QUEUE_BOUNCESZ 65536
@@ -41,6 +42,12 @@ static int mmc_prep_request(struct request_queue *q, struct request *req)
return BLKPREP_OK;
}
+static int mmc_prep_discard(struct request_queue *q, struct request *req)
+{
+ /* This is just a dummy function to indicate erase support */
+ return BLKPREP_OK;
+}
+
static int mmc_queue_thread(void *d)
{
struct mmc_queue *mq = d;
@@ -132,6 +139,17 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, spinlock_t *lock
blk_queue_prep_rq(mq->queue, mmc_prep_request);
+ if (card->csd.cmdclass & CCC_ERASE)
+ blk_queue_set_discard(mq->queue, mmc_prep_discard);
+
+ /*
+ * Calculating a correct span is way to messy if this
+ * assumption is broken, so remove the erase support
+ */
+ if (unlikely(mmc_card_blockaddr(card) &&
+ (card->csd.erase_size % 512)))
+ blk_queue_set_discard(mq->queue, NULL);
+
#ifdef CONFIG_MMC_BLOCK_BOUNCE
if (host->max_hw_segs == 1) {
unsigned int bouncesz;
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index fdd7c76..5e09e36 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -113,7 +113,7 @@ static int mmc_decode_cid(struct mmc_card *card)
static int mmc_decode_csd(struct mmc_card *card)
{
struct mmc_csd *csd = &card->csd;
- unsigned int e, m, csd_struct;
+ unsigned int e, m, a, b, csd_struct;
u32 *resp = card->raw_csd;
/*
@@ -150,6 +150,11 @@ static int mmc_decode_csd(struct mmc_card *card)
csd->write_blkbits = UNSTUFF_BITS(resp, 22, 4);
csd->write_partial = UNSTUFF_BITS(resp, 21, 1);
+ a = UNSTUFF_BITS(resp, 42, 5);
+ b = UNSTUFF_BITS(resp, 37, 5);
+ csd->erase_size = (a + 1) * (b + 1);
+ csd->erase_size <<= csd->write_blkbits;
+
return 0;
}
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index 26fc098..3024798 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -118,6 +118,13 @@ static int mmc_decode_csd(struct mmc_card *card)
csd->r2w_factor = UNSTUFF_BITS(resp, 26, 3);
csd->write_blkbits = UNSTUFF_BITS(resp, 22, 4);
csd->write_partial = UNSTUFF_BITS(resp, 21, 1);
+
+ if (UNSTUFF_BITS(resp, 46, 1))
+ csd->erase_size = 512;
+ else {
+ csd->erase_size = UNSTUFF_BITS(resp, 39, 7) + 1;
+ csd->erase_size <<= csd->write_blkbits;
+ }
break;
case 1:
/*
@@ -146,6 +153,7 @@ static int mmc_decode_csd(struct mmc_card *card)
csd->r2w_factor = 4; /* Unused */
csd->write_blkbits = 9;
csd->write_partial = 0;
+ csd->erase_size = 512;
break;
default:
printk(KERN_ERR "%s: unrecognised CSD structure version %d\n",
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index ee6e822..8abdf72 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -30,6 +30,7 @@ struct mmc_csd {
unsigned int tacc_ns;
unsigned int r2w_factor;
unsigned int max_dtr;
+ unsigned int erase_size;
unsigned int read_blkbits;
unsigned int write_blkbits;
unsigned int capacity;
diff --git a/include/linux/mmc/sd.h b/include/linux/mmc/sd.h
index f310062..1f6f316 100644
--- a/include/linux/mmc/sd.h
+++ b/include/linux/mmc/sd.h
@@ -21,6 +21,10 @@
/* class 10 */
#define SD_SWITCH 6 /* adtc [31:0] See below R1 */
+ /* class 5 */
+#define SD_ERASE_WR_BLK_START 32 /* ac [31:0] data addr R1 */
+#define SD_ERASE_WR_BLK_END 33 /* ac [31:0] data addr R1 */
+
/* Application commands */
#define SD_APP_SET_BUS_WIDTH 6 /* ac [1:0] bus width R1 */
#define SD_APP_SEND_NUM_WR_BLKS 22 /* adtc R1 */
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-16 17:08 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) Pierre Ossman
2008-08-16 17:11 ` [PATCH 1/2] mmc_block: factor out the mmc request handling Pierre Ossman
2008-08-16 17:12 ` [PATCH 2/2] mmc_block: erase discarded blocks Pierre Ossman
@ 2008-08-16 17:38 ` David Woodhouse
2008-08-16 17:51 ` Pierre Ossman
2008-08-22 9:24 ` Jens Axboe
3 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-16 17:38 UTC (permalink / raw)
To: Pierre Ossman; +Cc: linux-fsdevel, Jens Axboe
On Sat, 2008-08-16 at 19:08 +0200, Pierre Ossman wrote:
> I've cooked up some patches that maps David's new discard requests to
> erase operations on MMC and SD cards. I'm not entirely sure these are
> something to keep though as I've been unable to see any performance
> increase in keeping blocks erased. Do we have any other reason to keep
> it?
When you fill a file system completely, then delete all files -- then do
you see a performance improvement when you subsequently write to it?
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-16 17:38 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) David Woodhouse
@ 2008-08-16 17:51 ` Pierre Ossman
0 siblings, 0 replies; 88+ messages in thread
From: Pierre Ossman @ 2008-08-16 17:51 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-fsdevel, Jens Axboe
[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]
On Sat, 16 Aug 2008 18:38:41 +0100
David Woodhouse <dwmw2@infradead.org> wrote:
> On Sat, 2008-08-16 at 19:08 +0200, Pierre Ossman wrote:
> > I've cooked up some patches that maps David's new discard requests to
> > erase operations on MMC and SD cards. I'm not entirely sure these are
> > something to keep though as I've been unable to see any performance
> > increase in keeping blocks erased. Do we have any other reason to keep
> > it?
>
> When you fill a file system completely, then delete all files -- then do
> you see a performance improvement when you subsequently write to it?
>
When testing the new version, I've first erased the entire device using
the new ioctl. So the tests start with a completely erased device.
Basically the following:
./discard /dev/mmcblk0 0 `cat /sys/block/mmcblk0/size`
mkfs -t vfat /dev/mmcblk0
mount /dev/mmcblk0 /media/tmp
bonnie++ -d /media/tmp
The comparison is against a driver without erase support and a card
prepared by filling it with /dev/zero.
It might be that my testing methodology is flawed, so suggestions are
welcome (and people testing on their own for that matter).
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-16 17:08 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) Pierre Ossman
` (2 preceding siblings ...)
2008-08-16 17:38 ` [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2) David Woodhouse
@ 2008-08-22 9:24 ` Jens Axboe
2008-08-22 9:45 ` David Woodhouse
2008-08-22 11:13 ` Pierre Ossman
3 siblings, 2 replies; 88+ messages in thread
From: Jens Axboe @ 2008-08-22 9:24 UTC (permalink / raw)
To: Pierre Ossman; +Cc: linux-fsdevel, David Woodhouse
On Sat, Aug 16 2008, Pierre Ossman wrote:
> I've cooked up some patches that maps David's new discard requests to
> erase operations on MMC and SD cards. I'm not entirely sure these are
> something to keep though as I've been unable to see any performance
> increase in keeping blocks erased. Do we have any other reason to keep
> it?
>
> During development and test I noticed one issue though; the discard
> requests are chopped up into smaller pieces. As the erase commands have
> a granularity, sometimes this can mean that parts of the flash are
> missed because the discard request isn't split in alignment with the
> erase boundaries.
>
> For this reason, I propose that discard requests do not respect the
> regular queue restrictions. Those are generally for expressing
> limitations in the devices' DMA engines. Since the discard request
> carries no data, the DMA engine will never get involved.
I agree with that, the thought did cross my mind earlier as well. I
committed something like the below (in two patches). Do you want me to
queue up your mmc patches as well?
diff --git a/block/blk-core.c b/block/blk-core.c
index d785466..52824c0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1381,7 +1381,8 @@ end_io:
break;
}
- if (unlikely(nr_sectors > q->max_hw_sectors)) {
+ if (unlikely(bio_has_data(bio) &&
+ nr_sectors > q->max_hw_sectors)) {
printk(KERN_ERR "bio too big device %s (%u > %u)\n",
bdevname(bio->bi_bdev, b),
bio_sectors(bio),
diff --git a/block/ioctl.c b/block/ioctl.c
index 375c579..ec6fa5d 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -125,52 +125,41 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
uint64_t len)
{
struct request_queue *q = bdev_get_queue(bdev);
+ DECLARE_COMPLETION_ONSTACK(wait);
+ struct bio *bio;
int ret = 0;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
- if (start + len > (bdev->bd_inode->i_size >> 9))
+ if (start + len > bdev->bd_inode->i_size)
return -EINVAL;
if (!q->prepare_discard_fn)
return -EOPNOTSUPP;
- while (len && !ret) {
- DECLARE_COMPLETION_ONSTACK(wait);
- struct bio *bio;
-
- bio = bio_alloc(GFP_KERNEL, 0);
- if (!bio)
- return -ENOMEM;
-
- bio->bi_end_io = blk_ioc_discard_endio;
- bio->bi_bdev = bdev;
- bio->bi_private = &wait;
- bio->bi_sector = start;
-
- if (len > q->max_hw_sectors) {
- bio->bi_size = q->max_hw_sectors << 9;
- len -= q->max_hw_sectors;
- start += q->max_hw_sectors;
- } else {
- bio->bi_size = len << 9;
- len = 0;
- }
- submit_bio(DISCARD_NOBARRIER, bio);
-
- wait_for_completion(&wait);
-
- if (bio_flagged(bio, BIO_EOPNOTSUPP))
- ret = -EOPNOTSUPP;
- else if (!bio_flagged(bio, BIO_UPTODATE))
- ret = -EIO;
- bio_put(bio);
- }
+ bio = bio_alloc(GFP_KERNEL, 0);
+ if (!bio)
+ return -ENOMEM;
+
+ bio->bi_end_io = blk_ioc_discard_endio;
+ bio->bi_bdev = bdev;
+ bio->bi_private = &wait;
+ bio->bi_sector = start >> 9;
+ bio->bi_size = len;
+
+ submit_bio(DISCARD_NOBARRIER, bio);
+
+ wait_for_completion(&wait);
+
+ if (bio_flagged(bio, BIO_EOPNOTSUPP))
+ ret = -EOPNOTSUPP;
+ else if (!bio_flagged(bio, BIO_UPTODATE))
+ ret = -EIO;
+
+ bio_put(bio);
return ret;
}
--
Jens Axboe
^ permalink raw reply related [flat|nested] 88+ messages in thread* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 9:24 ` Jens Axboe
@ 2008-08-22 9:45 ` David Woodhouse
2008-08-22 10:50 ` Jens Axboe
2008-08-22 11:13 ` Pierre Ossman
1 sibling, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-22 9:45 UTC (permalink / raw)
To: Jens Axboe; +Cc: Pierre Ossman, linux-fsdevel
On Fri, 2008-08-22 at 11:24 +0200, Jens Axboe wrote:
> I agree with that, the thought did cross my mind earlier as well. I
> committed something like the below (in two patches).
But there _are_ limits on how many sectors can be discarded in a single
operation. For ATA 'Trim' I think it's 65536? Do we want to force the
drivers to translate a single request into multiple actual commands?
I suspect we need a separate 'max_discard_sectors' field.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 9:45 ` David Woodhouse
@ 2008-08-22 10:50 ` Jens Axboe
2008-08-22 10:58 ` David Woodhouse
0 siblings, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-22 10:50 UTC (permalink / raw)
To: David Woodhouse; +Cc: Pierre Ossman, linux-fsdevel
On Fri, Aug 22 2008, David Woodhouse wrote:
> On Fri, 2008-08-22 at 11:24 +0200, Jens Axboe wrote:
> > I agree with that, the thought did cross my mind earlier as well. I
> > committed something like the below (in two patches).
>
> But there _are_ limits on how many sectors can be discarded in a single
> operation. For ATA 'Trim' I think it's 65536? Do we want to force the
> drivers to translate a single request into multiple actual commands?
Hmm right, I guess the sector limit in the command of course still
applies.
> I suspect we need a separate 'max_discard_sectors' field.
No, that is what max_hw_sectors is for.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 10:50 ` Jens Axboe
@ 2008-08-22 10:58 ` David Woodhouse
2008-08-22 11:11 ` Pierre Ossman
0 siblings, 1 reply; 88+ messages in thread
From: David Woodhouse @ 2008-08-22 10:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Pierre Ossman, linux-fsdevel
On Fri, 2008-08-22 at 12:50 +0200, Jens Axboe wrote:
> > I suspect we need a separate 'max_discard_sectors' field.
>
> No, that is what max_hw_sectors is for.
I thought Pierre's problem is that he has _different_ limits for
discards vs. read/writes?
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 10:58 ` David Woodhouse
@ 2008-08-22 11:11 ` Pierre Ossman
2008-08-22 11:19 ` Jens Axboe
0 siblings, 1 reply; 88+ messages in thread
From: Pierre Ossman @ 2008-08-22 11:11 UTC (permalink / raw)
To: David Woodhouse; +Cc: Jens Axboe, linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 944 bytes --]
On Fri, 22 Aug 2008 11:58:16 +0100
David Woodhouse <dwmw2@infradead.org> wrote:
> On Fri, 2008-08-22 at 12:50 +0200, Jens Axboe wrote:
> > > I suspect we need a separate 'max_discard_sectors' field.
> >
> > No, that is what max_hw_sectors is for.
>
> I thought Pierre's problem is that he has _different_ limits for
> discards vs. read/writes?
>
Indeed. max_hw_sectors is _not_ used for "maximum number of sectors" in
most cases, but for "maximum number of bytes the device can transfer".
The difference between the two is irrelevant for data requests, but is
evident here.
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 11:11 ` Pierre Ossman
@ 2008-08-22 11:19 ` Jens Axboe
0 siblings, 0 replies; 88+ messages in thread
From: Jens Axboe @ 2008-08-22 11:19 UTC (permalink / raw)
To: Pierre Ossman; +Cc: David Woodhouse, linux-fsdevel
On Fri, Aug 22 2008, Pierre Ossman wrote:
> On Fri, 22 Aug 2008 11:58:16 +0100
> David Woodhouse <dwmw2@infradead.org> wrote:
>
> > On Fri, 2008-08-22 at 12:50 +0200, Jens Axboe wrote:
> > > > I suspect we need a separate 'max_discard_sectors' field.
> > >
> > > No, that is what max_hw_sectors is for.
> >
> > I thought Pierre's problem is that he has _different_ limits for
> > discards vs. read/writes?
> >
>
> Indeed. max_hw_sectors is _not_ used for "maximum number of sectors" in
> most cases, but for "maximum number of bytes the device can transfer".
> The difference between the two is irrelevant for data requests, but is
> evident here.
OK, I didn't notice the distinction for this case.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 9:24 ` Jens Axboe
2008-08-22 9:45 ` David Woodhouse
@ 2008-08-22 11:13 ` Pierre Ossman
2008-08-22 11:20 ` Jens Axboe
1 sibling, 1 reply; 88+ messages in thread
From: Pierre Ossman @ 2008-08-22 11:13 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-fsdevel, David Woodhouse
[-- Attachment #1: Type: text/plain, Size: 740 bytes --]
On Fri, 22 Aug 2008 11:24:48 +0200
Jens Axboe <jens.axboe@oracle.com> wrote:
>
> I agree with that, the thought did cross my mind earlier as well. I
> committed something like the below (in two patches). Do you want me to
> queue up your mmc patches as well?
>
As it stands, no. If nobody can find a measurable improvement then this
code is just more things that can go wrong.
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 11:13 ` Pierre Ossman
@ 2008-08-22 11:20 ` Jens Axboe
2008-08-22 14:49 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: Jens Axboe @ 2008-08-22 11:20 UTC (permalink / raw)
To: Pierre Ossman; +Cc: linux-fsdevel, David Woodhouse
On Fri, Aug 22 2008, Pierre Ossman wrote:
> On Fri, 22 Aug 2008 11:24:48 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
>
> >
> > I agree with that, the thought did cross my mind earlier as well. I
> > committed something like the below (in two patches). Do you want me to
> > queue up your mmc patches as well?
> >
>
> As it stands, no. If nobody can find a measurable improvement then this
> code is just more things that can go wrong.
It could just be that the device you tested with doesn't do anything
with the discard information. But I agree, until there's a verified
benefit to adding the code, we can leave it be.
--
Jens Axboe
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 11:20 ` Jens Axboe
@ 2008-08-22 14:49 ` OGAWA Hirofumi
2008-08-22 23:02 ` Pierre Ossman
0 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-22 14:49 UTC (permalink / raw)
To: Jens Axboe; +Cc: Pierre Ossman, linux-fsdevel, David Woodhouse
Jens Axboe <jens.axboe@oracle.com> writes:
> On Fri, Aug 22 2008, Pierre Ossman wrote:
>> On Fri, 22 Aug 2008 11:24:48 +0200
>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>
>> >
>> > I agree with that, the thought did cross my mind earlier as well. I
>> > committed something like the below (in two patches). Do you want me to
>> > queue up your mmc patches as well?
>> >
>>
>> As it stands, no. If nobody can find a measurable improvement then this
>> code is just more things that can go wrong.
>
> It could just be that the device you tested with doesn't do anything
> with the discard information. But I agree, until there's a verified
> benefit to adding the code, we can leave it be.
That doesn't have improvement? E.g. it means the results
dd if=/dev/zero of=/dev/xxx
and
./discard_all /dev/xxx
dd if=/dev/zero of=/dev/xxx
are same? Or FAT doesn't have?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 14:49 ` OGAWA Hirofumi
@ 2008-08-22 23:02 ` Pierre Ossman
2008-08-22 23:59 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: Pierre Ossman @ 2008-08-22 23:02 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: Jens Axboe, linux-fsdevel, David Woodhouse
[-- Attachment #1: Type: text/plain, Size: 817 bytes --]
On Fri, 22 Aug 2008 23:49:07 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> That doesn't have improvement? E.g. it means the results
>
> dd if=/dev/zero of=/dev/xxx
> and
> ./discard_all /dev/xxx
> dd if=/dev/zero of=/dev/xxx
>
> are same? Or FAT doesn't have?
I have tried those two and others, and I cannot see any noticeable
difference in performance. Please perform your own tests though as I
might have overlooked something.
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 23:02 ` Pierre Ossman
@ 2008-08-22 23:59 ` OGAWA Hirofumi
2008-08-24 11:23 ` Pierre Ossman
0 siblings, 1 reply; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-22 23:59 UTC (permalink / raw)
To: Pierre Ossman; +Cc: Jens Axboe, linux-fsdevel, David Woodhouse
Pierre Ossman <drzeus-list@drzeus.cx> writes:
> On Fri, 22 Aug 2008 23:49:07 +0900
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
>>
>> That doesn't have improvement? E.g. it means the results
>>
>> dd if=/dev/zero of=/dev/xxx
>> and
>> ./discard_all /dev/xxx
>> dd if=/dev/zero of=/dev/xxx
>>
>> are same? Or FAT doesn't have?
>
> I have tried those two and others, and I cannot see any noticeable
> difference in performance. Please perform your own tests though as I
> might have overlooked something.
I see, sounds like erase commands of sd/mmc was ignored. It's
interesting results, and I'd like to test and see the detail of it,
however, unfortunately I don't have proper devices now.
Anyway, thanks for info.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-22 23:59 ` OGAWA Hirofumi
@ 2008-08-24 11:23 ` Pierre Ossman
2008-08-24 13:39 ` OGAWA Hirofumi
0 siblings, 1 reply; 88+ messages in thread
From: Pierre Ossman @ 2008-08-24 11:23 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: Jens Axboe, linux-fsdevel, David Woodhouse
[-- Attachment #1: Type: text/plain, Size: 1098 bytes --]
On Sat, 23 Aug 2008 08:59:11 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
> Pierre Ossman <drzeus-list@drzeus.cx> writes:
> >
> > I have tried those two and others, and I cannot see any noticeable
> > difference in performance. Please perform your own tests though as I
> > might have overlooked something.
>
> I see, sounds like erase commands of sd/mmc was ignored.
If only. :)
The device does take some time to perform the command, and the data is
indeed gone after the command has completed. So there is no doubt that
the sector gets erased.
It might simply be that so few (no?) SD/MMC readers actually implement
these commands, so the card vendors haven't bothered implementing any
decent handling of them.
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 88+ messages in thread* Re: [PATCH 0/2] MMC discard support (was [PATCH 0/7] Discard requests, v2)
2008-08-24 11:23 ` Pierre Ossman
@ 2008-08-24 13:39 ` OGAWA Hirofumi
0 siblings, 0 replies; 88+ messages in thread
From: OGAWA Hirofumi @ 2008-08-24 13:39 UTC (permalink / raw)
To: Pierre Ossman; +Cc: Jens Axboe, linux-fsdevel, David Woodhouse
Pierre Ossman <drzeus-list@drzeus.cx> writes:
> On Sat, 23 Aug 2008 08:59:11 +0900
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
>> Pierre Ossman <drzeus-list@drzeus.cx> writes:
>> >
>> > I have tried those two and others, and I cannot see any noticeable
>> > difference in performance. Please perform your own tests though as I
>> > might have overlooked something.
>>
>> I see, sounds like erase commands of sd/mmc was ignored.
>
> If only. :)
Yes :)
> The device does take some time to perform the command, and the data is
> indeed gone after the command has completed. So there is no doubt that
> the sector gets erased.
>
> It might simply be that so few (no?) SD/MMC readers actually implement
> these commands, so the card vendors haven't bothered implementing any
> decent handling of them.
I see. Umm... Or. The pre-erased card has no improvement. So, the card
may be doing wear-leveling (ERASE_BLK_EN=1 may imply?), and write
bandwidth is slow enough, it might hide the slowness of erase well...
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 88+ messages in thread