* [PATCH 0/2] scsi discard support @ 2009-10-29 15:08 Christoph Hellwig 2009-10-29 15:08 ` [PATCH 1/2] block: add support for discard limits Christoph Hellwig 2009-10-29 15:08 ` [PATCH 2/2] sd: add support for WRITE SAME (16) with unmap bit Christoph Hellwig 0 siblings, 2 replies; 14+ messages in thread From: Christoph Hellwig @ 2009-10-29 15:08 UTC (permalink / raw) To: linux-scsi; +Cc: mkp, axboe, matthew Time for proper scsi discard support. The first patch is for the block layer to implement the discard granularity limits properly. The second is a SPC3r20 compliant implementation sending WRITE SAME with the unmap bit from sd. ATA support will come later now that I finally obtained a TRIM capable OCZ SSD. So far only tested using Martin's scsi_debug TP support and my experimental qemu TP support, but I'll be able to test it on some real arrays soon. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] block: add support for discard limits 2009-10-29 15:08 [PATCH 0/2] scsi discard support Christoph Hellwig @ 2009-10-29 15:08 ` Christoph Hellwig 2009-10-29 19:06 ` Jens Axboe 2009-10-30 3:19 ` Martin K. Petersen 2009-10-29 15:08 ` [PATCH 2/2] sd: add support for WRITE SAME (16) with unmap bit Christoph Hellwig 1 sibling, 2 replies; 14+ messages in thread From: Christoph Hellwig @ 2009-10-29 15:08 UTC (permalink / raw) To: linux-scsi; +Cc: mkp, axboe, matthew [-- Attachment #1: discard-add-limits --] [-- Type: text/plain, Size: 6999 bytes --] To properly support discard on SCSI Arrays we need to take the discard granularity and the first aligned discard LBA into account. This patch adds block limits for both of them, and trims down dicard requests to fit into these limits in blkdev_issue_discard. We also make sure the alignment offset is properly adjust for partitions and add sysfs files to expose the limits to userspaced. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/block/blk-sysfs.c =================================================================== --- linux-2.6.orig/block/blk-sysfs.c 2009-10-29 15:46:06.567004210 +0100 +++ linux-2.6/block/blk-sysfs.c 2009-10-29 15:47:33.660022475 +0100 @@ -126,6 +126,18 @@ static ssize_t queue_io_opt_show(struct return queue_var_show(queue_io_opt(q), page); } +static ssize_t queue_discard_granularity_show(struct request_queue *q, + char *page) +{ + return queue_var_show(q->limits.discard_granularity, page); +} + +static ssize_t queue_discard_alignment_show(struct request_queue *q, + char *page) +{ + return queue_var_show(q->limits.discard_alignment, page); +} + static ssize_t queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) { @@ -293,6 +305,16 @@ static struct queue_sysfs_entry queue_io .show = queue_io_opt_show, }; +static struct queue_sysfs_entry queue_discard_granularity_entry = { + .attr = {.name = "discard_granularity", .mode = S_IRUGO }, + .show = queue_discard_granularity_show, +}; + +static struct queue_sysfs_entry queue_discard_alignment_entry = { + .attr = {.name = "discard_alignment", .mode = S_IRUGO }, + .show = queue_discard_alignment_show, +}; + static struct queue_sysfs_entry queue_nonrot_entry = { .attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR }, .show = queue_nonrot_show, @@ -328,6 +350,8 @@ static struct attribute *default_attrs[] &queue_physical_block_size_entry.attr, &queue_io_min_entry.attr, &queue_io_opt_entry.attr, + &queue_discard_granularity_entry.attr, + &queue_discard_alignment_entry.attr, &queue_nonrot_entry.attr, &queue_nomerges_entry.attr, &queue_rq_affinity_entry.attr, Index: linux-2.6/include/linux/blkdev.h =================================================================== --- linux-2.6.orig/include/linux/blkdev.h 2009-10-29 15:46:06.551004197 +0100 +++ linux-2.6/include/linux/blkdev.h 2009-10-29 15:47:33.661024838 +0100 @@ -312,6 +312,8 @@ struct queue_limits { unsigned int io_min; unsigned int io_opt; unsigned int max_discard_sectors; + unsigned int discard_granularity; + unsigned int discard_alignment; unsigned short logical_block_size; unsigned short max_hw_segments; Index: linux-2.6/block/blk-barrier.c =================================================================== --- linux-2.6.orig/block/blk-barrier.c 2009-10-29 15:46:06.573022714 +0100 +++ linux-2.6/block/blk-barrier.c 2009-10-29 15:47:33.663024324 +0100 @@ -355,6 +355,20 @@ static void blkdev_discard_end_io(struct bio_put(bio); } +/* + * Many implementation of block dicard are very limited. Most implementations + * enforce a granularity limit for the discard requests and we have to trim + * down the request to match it. In addition to that for some implementation + * the start of this granularity is misaligned vs block 0, so we need to take + * that into account aswell. + */ +static sector_t discard_offset(sector_t sector, unsigned int granularity, + unsigned int alignment) +{ + return ((sector - alignment + granularity - 1) & + ~(granularity - 1)) - (sector - alignment); +} + /** * blkdev_issue_discard - queue a discard * @bdev: blockdev to issue discard for @@ -371,10 +385,12 @@ int blkdev_issue_discard(struct block_de { DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *q = bdev_get_queue(bdev); + struct hd_struct *part = bdev->bd_part; int type = flags & DISCARD_FL_BARRIER ? DISCARD_BARRIER : DISCARD_NOBARRIER; struct bio *bio; struct page *page; + sector_t offset; int ret = 0; if (!q) @@ -383,6 +399,18 @@ int blkdev_issue_discard(struct block_de if (!blk_queue_discard(q)) return -EOPNOTSUPP; + /* + * We need to respect the discard granularity that is supported by the + * device. Round up the start block to the nearest multiple and round + * down the length to the nearest multiple of that granularity. + */ + offset = discard_offset(sector, q->limits.discard_granularity, + part->discard_alignment); + + sector += offset; + nr_sects = (nr_sects - offset - 1) & + ~(q->limits.discard_granularity - 1); + while (nr_sects && !ret) { unsigned int sector_size = q->limits.logical_block_size; unsigned int max_discard_sectors = Index: linux-2.6/fs/partitions/check.c =================================================================== --- linux-2.6.orig/fs/partitions/check.c 2009-10-29 15:46:06.580003701 +0100 +++ linux-2.6/fs/partitions/check.c 2009-10-29 15:47:33.667023647 +0100 @@ -226,6 +226,13 @@ ssize_t part_alignment_offset_show(struc return sprintf(buf, "%llu\n", (unsigned long long)p->alignment_offset); } +ssize_t part_discard_alignment_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct hd_struct *p = dev_to_part(dev); + return sprintf(buf, "%u\n", p->discard_alignment); +} + ssize_t part_stat_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -288,6 +295,8 @@ static DEVICE_ATTR(partition, S_IRUGO, p static DEVICE_ATTR(start, S_IRUGO, part_start_show, NULL); static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL); static DEVICE_ATTR(alignment_offset, S_IRUGO, part_alignment_offset_show, NULL); +static DEVICE_ATTR(discard_alignment, S_IRUGO, part_discard_alignment_show, + NULL); static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL); static DEVICE_ATTR(inflight, S_IRUGO, part_inflight_show, NULL); #ifdef CONFIG_FAIL_MAKE_REQUEST @@ -300,6 +309,7 @@ static struct attribute *part_attrs[] = &dev_attr_start.attr, &dev_attr_size.attr, &dev_attr_alignment_offset.attr, + &dev_attr_discard_alignment.attr, &dev_attr_stat.attr, &dev_attr_inflight.attr, #ifdef CONFIG_FAIL_MAKE_REQUEST @@ -403,6 +413,8 @@ struct hd_struct *add_partition(struct g p->start_sect = start; p->alignment_offset = queue_sector_alignment_offset(disk->queue, start); + p->discard_alignment = (disk->queue->limits.discard_alignment + start) & + (disk->queue->limits.discard_granularity - 1); p->nr_sects = len; p->partno = partno; p->policy = get_disk_ro(disk); Index: linux-2.6/include/linux/genhd.h =================================================================== --- linux-2.6.orig/include/linux/genhd.h 2009-10-29 15:46:06.557028148 +0100 +++ linux-2.6/include/linux/genhd.h 2009-10-29 15:47:33.672256087 +0100 @@ -91,6 +91,7 @@ struct hd_struct { sector_t start_sect; sector_t nr_sects; sector_t alignment_offset; + unsigned int discard_alignment; struct device __dev; struct kobject *holder_dir; int policy, partno; ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-10-29 15:08 ` [PATCH 1/2] block: add support for discard limits Christoph Hellwig @ 2009-10-29 19:06 ` Jens Axboe 2009-10-30 5:03 ` Christoph Hellwig 2009-10-30 3:19 ` Martin K. Petersen 1 sibling, 1 reply; 14+ messages in thread From: Jens Axboe @ 2009-10-29 19:06 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-scsi, mkp, matthew On Thu, Oct 29 2009, Christoph Hellwig wrote: > To properly support discard on SCSI Arrays we need to take the discard > granularity and the first aligned discard LBA into account. This patch > adds block limits for both of them, and trims down dicard requests to fit > into these limits in blkdev_issue_discard. We also make sure the alignment > offset is properly adjust for partitions and add sysfs files to expose > the limits to userspaced. Except for a few typos, this looks good to me. I'll add it for 2.6.33. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-10-29 19:06 ` Jens Axboe @ 2009-10-30 5:03 ` Christoph Hellwig 2009-11-02 12:39 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2009-10-30 5:03 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, linux-scsi, mkp, matthew On Thu, Oct 29, 2009 at 08:06:59PM +0100, Jens Axboe wrote: > On Thu, Oct 29 2009, Christoph Hellwig wrote: > > To properly support discard on SCSI Arrays we need to take the discard > > granularity and the first aligned discard LBA into account. This patch > > adds block limits for both of them, and trims down dicard requests to fit > > into these limits in blkdev_issue_discard. We also make sure the alignment > > offset is properly adjust for partitions and add sysfs files to expose > > the limits to userspaced. > > Except for a few typos, this looks good to me. I'll add it for 2.6.33. It would be good if we could queue up this (or mkp's version) with the sd bits through the scsi tree - otherwise getting testing for the whole stack will be a pain. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-10-30 5:03 ` Christoph Hellwig @ 2009-11-02 12:39 ` Jens Axboe 0 siblings, 0 replies; 14+ messages in thread From: Jens Axboe @ 2009-11-02 12:39 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-scsi, mkp, matthew On Fri, Oct 30 2009, Christoph Hellwig wrote: > On Thu, Oct 29, 2009 at 08:06:59PM +0100, Jens Axboe wrote: > > On Thu, Oct 29 2009, Christoph Hellwig wrote: > > > To properly support discard on SCSI Arrays we need to take the discard > > > granularity and the first aligned discard LBA into account. This patch > > > adds block limits for both of them, and trims down dicard requests to fit > > > into these limits in blkdev_issue_discard. We also make sure the alignment > > > offset is properly adjust for partitions and add sysfs files to expose > > > the limits to userspaced. > > > > Except for a few typos, this looks good to me. I'll add it for 2.6.33. > > It would be good if we could queue up this (or mkp's version) with the > sd bits through the scsi tree - otherwise getting testing for the whole > stack will be a pain. Sure, fine with me. Feel free to add my acked-by. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-10-29 15:08 ` [PATCH 1/2] block: add support for discard limits Christoph Hellwig 2009-10-29 19:06 ` Jens Axboe @ 2009-10-30 3:19 ` Martin K. Petersen 2009-10-30 5:05 ` Christoph Hellwig 1 sibling, 1 reply; 14+ messages in thread From: Martin K. Petersen @ 2009-10-30 3:19 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-scsi, axboe, matthew >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: Christoph> To properly support discard on SCSI Arrays we need to take Christoph> the discard granularity and the first aligned discard LBA Christoph> into account. This patch adds block limits for both of them, Christoph> and trims down dicard requests to fit into these limits in Christoph> blkdev_issue_discard. We also make sure the alignment offset Christoph> is properly adjust for partitions and add sysfs files to Christoph> expose the limits to userspaced. Do we know for sure that this is really needed? The reason I'm asking is that in my first attempt I also exposed all this up the stack. However, it sucks to keep this in sync and would require the same level of topology stacking magic as the rest of the queue limits. My brain melted at the thought of LVM/MD volumes striped or mirrored between devices with different unmap granularity and alignment. So I talked to a variety of array folks. And regardless of whether they implement WRITE SAME or UNMAP they all appear to handle misaligned requests just fine. I.e. there may be some mapped residue at the beginning and end of an unmap request but any full allocation units in between will be correctly unmapped. They all told me the alignment and granularity parameters were purely informational and intended as hints for laying out data. Not as hard limits for issuing I/O. Consequently I gutted all the alignment stuff from my thin provisioning tree. I also changed mt scsi_debug code to unmap allocations similar to the way the arrays work (my first attempt completely ignored misaligned requests). ACS-2 doesn't currently have a notion of unmap granularity and all array vendors I have talked to appear to happily process any unmap request we throw at them. So I'm leaning towards keeping things simple and just sending things down verbatim... -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-10-30 3:19 ` Martin K. Petersen @ 2009-10-30 5:05 ` Christoph Hellwig 2009-11-02 13:29 ` Martin K. Petersen 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2009-10-30 5:05 UTC (permalink / raw) To: Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi, axboe, matthew On Thu, Oct 29, 2009 at 11:19:43PM -0400, Martin K. Petersen wrote: > ACS-2 doesn't currently have a notion of unmap granularity and all array > vendors I have talked to appear to happily process any unmap request we > throw at them. So I'm leaning towards keeping things simple and just > sending things down verbatim... We have all the granularity and alignment information avilable and keeping track of it is simple enough. Yes md/dm will need to update the first aligned unmap sector, but they'll also need to update the first aligned LBA for I/O purposes and adding more more is simple enough. Given that these bits are in the standard vendors will use them sooner or later and I'd rather be prepared if it's simple enough. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-10-30 5:05 ` Christoph Hellwig @ 2009-11-02 13:29 ` Martin K. Petersen 2009-11-02 15:39 ` James Bottomley ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Martin K. Petersen @ 2009-11-02 13:29 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Martin K. Petersen, linux-scsi, axboe, matthew >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: >> ACS-2 doesn't currently have a notion of unmap granularity and all >> array vendors I have talked to appear to happily process any unmap >> request we throw at them. So I'm leaning towards keeping things >> simple and just sending things down verbatim... Christoph> We have all the granularity and alignment information Christoph> avilable and keeping track of it is simple enough. That's not the issue. Nothing prevents you from submitting a misaligned read/write request to the array. If you do, the array may be slower in servicing the request. But the entire request will still be serviced. Similarly, nothing prevents you from submitting a misaligned unmap request. If you do, the array will just ignore the portions that do not constitute entire allocation units. Your code is taking what is a hint and turning it into a hard limit. Note that it's called OPTIMAL UNMAP GRANULARITY, not REQUIRED UNMAP GRANULARITY. Every vendor I have talked to have asked us to always unmap the *entire* LBA range we're interested in freeing. No exceptions. We don't throw away the beginning/end of a read/write request because it's not properly aligned either, do we? Christoph> Yes md/dm will need to update the first aligned unmap sector, Christoph> but they'll also need to update the first aligned LBA for I/O Christoph> purposes and adding more more is simple enough. But with md and dm you can have devices with different unmap alignment and granularity. With read/write alignment we can throw our hands in the air and say that things will be slow. But we'll still submit all I/O. With heterogeneous md and dm devices there may not be a meaningful value to put in the discard granularity field. Then what? -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-11-02 13:29 ` Martin K. Petersen @ 2009-11-02 15:39 ` James Bottomley 2009-11-02 18:16 ` Martin K. Petersen 2009-11-02 16:02 ` Boaz Harrosh 2009-11-03 15:16 ` Christoph Hellwig 2 siblings, 1 reply; 14+ messages in thread From: James Bottomley @ 2009-11-02 15:39 UTC (permalink / raw) To: Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi, axboe, matthew On Mon, 2009-11-02 at 08:29 -0500, Martin K. Petersen wrote: > >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: > > >> ACS-2 doesn't currently have a notion of unmap granularity and all > >> array vendors I have talked to appear to happily process any unmap > >> request we throw at them. So I'm leaning towards keeping things > >> simple and just sending things down verbatim... > > Christoph> We have all the granularity and alignment information > Christoph> avilable and keeping track of it is simple enough. > > That's not the issue. > > Nothing prevents you from submitting a misaligned read/write request to > the array. If you do, the array may be slower in servicing the request. > But the entire request will still be serviced. > > Similarly, nothing prevents you from submitting a misaligned unmap > request. If you do, the array will just ignore the portions that do not > constitute entire allocation units. Your code is taking what is a hint > and turning it into a hard limit. Note that it's called OPTIMAL UNMAP > GRANULARITY, not REQUIRED UNMAP GRANULARITY. > > Every vendor I have talked to have asked us to always unmap the *entire* > LBA range we're interested in freeing. No exceptions. > > We don't throw away the beginning/end of a read/write request because > it's not properly aligned either, do we? Agree entirely for this with UNMAP, since the array will just silently discard what it doesn't want. Don't necessarily agree with this for WRITE_SAME. From the implementations I've read, it sounds like the array will discard all sectors within the correct boundary and alignment but it will have to write zeros to all sectors outside of the aligned region ... this sounds like a performance hit unless we do some judicious aligning of the discards. James ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-11-02 15:39 ` James Bottomley @ 2009-11-02 18:16 ` Martin K. Petersen 0 siblings, 0 replies; 14+ messages in thread From: Martin K. Petersen @ 2009-11-02 18:16 UTC (permalink / raw) To: James Bottomley Cc: Martin K. Petersen, Christoph Hellwig, linux-scsi, axboe, matthew >>>>> "James" == James Bottomley <James.Bottomley@suse.de> writes: James> Agree entirely for this with UNMAP, since the array will just James> silently discard what it doesn't want. James> Don't necessarily agree with this for WRITE_SAME. From the James> implementations I've read, it sounds like the array will discard James> all sectors within the correct boundary and alignment but it will James> have to write zeros to all sectors outside of the aligned region James> ... this sounds like a performance hit unless we do some James> judicious aligning of the discards. Depends whether the array firmware tracks partial allocation units or not. Some do (and will report allocation units in the block limits VPD because it's a hint to FS/DB allocators). Since all vendors I've talked to treat unmaps (whether UNMAP or WRITE SAME with the UNMAP bit set) asynchronously, I doubt it's a problem. And they all seem to think that the overhead is minimal. They say we can worry about loss of a queue tag, but that we shouldn't worry about processing time on the back end at all. Should the problem arise I vote for doing the alignment correction in the WRITE SAME codepath in sd.c. Potentially backed by some heuristic. That solves the problem of information being lost on the way down due to stacking and whatnot... -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-11-02 13:29 ` Martin K. Petersen 2009-11-02 15:39 ` James Bottomley @ 2009-11-02 16:02 ` Boaz Harrosh 2009-11-02 18:18 ` Martin K. Petersen 2009-11-03 15:16 ` Christoph Hellwig 2 siblings, 1 reply; 14+ messages in thread From: Boaz Harrosh @ 2009-11-02 16:02 UTC (permalink / raw) To: Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi, axboe, matthew On 11/02/2009 03:29 PM, Martin K. Petersen wrote: > > With heterogeneous md and dm devices there may not be a meaningful value > to put in the discard granularity field. Then what? > These alignment masks may only be base-2, no? If so then the max alignment of all devices will work for any device. I think Boaz ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-11-02 16:02 ` Boaz Harrosh @ 2009-11-02 18:18 ` Martin K. Petersen 0 siblings, 0 replies; 14+ messages in thread From: Martin K. Petersen @ 2009-11-02 18:18 UTC (permalink / raw) To: Boaz Harrosh Cc: Martin K. Petersen, Christoph Hellwig, linux-scsi, axboe, matthew >>>>> "Boaz" == Boaz Harrosh <bharrosh@panasas.com> writes: Boaz> On 11/02/2009 03:29 PM, Martin K. Petersen wrote: >> >> With heterogeneous md and dm devices there may not be a meaningful >> value to put in the discard granularity field. Then what? >> Boaz> These alignment masks may only be base-2, no? Nope. Could be the good old sector 63 offset vs. naturally aligned. Boaz> If so then the max alignment of all devices will work for any Boaz> device. But even then we'd waste space on the array with smaller granularity. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] block: add support for discard limits 2009-11-02 13:29 ` Martin K. Petersen 2009-11-02 15:39 ` James Bottomley 2009-11-02 16:02 ` Boaz Harrosh @ 2009-11-03 15:16 ` Christoph Hellwig 2 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2009-11-03 15:16 UTC (permalink / raw) To: Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi, axboe, matthew On Mon, Nov 02, 2009 at 08:29:08AM -0500, Martin K. Petersen wrote: > Similarly, nothing prevents you from submitting a misaligned unmap > request. If you do, the array will just ignore the portions that do not > constitute entire allocation units. Your code is taking what is a hint > and turning it into a hard limit. Note that it's called OPTIMAL UNMAP > GRANULARITY, not REQUIRED UNMAP GRANULARITY. > > Every vendor I have talked to have asked us to always unmap the *entire* > LBA range we're interested in freeing. No exceptions. > > We don't throw away the beginning/end of a read/write request because > it's not properly aligned either, do we? read/write is different. If we throw parts of it away we lose data, we don't do for unmap requests. I'll resend a patch series that moves the trimming of the range into sd.c as suggested later and make it tunable. That might not be what we want to keep long term, but it'll allow us benchmarking both variants on the real life arrays. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] sd: add support for WRITE SAME (16) with unmap bit 2009-10-29 15:08 [PATCH 0/2] scsi discard support Christoph Hellwig 2009-10-29 15:08 ` [PATCH 1/2] block: add support for discard limits Christoph Hellwig @ 2009-10-29 15:08 ` Christoph Hellwig 1 sibling, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2009-10-29 15:08 UTC (permalink / raw) To: linux-scsi; +Cc: mkp, axboe, matthew [-- Attachment #1: discard-add-scsi-write-same-support --] [-- Type: text/plain, Size: 3973 bytes --] Send WRITE SAME request with the unmap bit set to the device if it advertises thin provisioning support. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: linux-2.6/drivers/scsi/sd.c =================================================================== --- linux-2.6.orig/drivers/scsi/sd.c 2009-10-29 15:46:55.754006357 +0100 +++ linux-2.6/drivers/scsi/sd.c 2009-10-29 15:49:05.825279362 +0100 @@ -398,6 +398,35 @@ static void sd_prot_op(struct scsi_cmnd scsi_set_prot_type(scmd, dif); } +static void sd_prepare_discard(struct request_queue *q, struct request *rq) +{ + struct bio *bio = rq->bio; + + rq->cmd_type = REQ_TYPE_BLOCK_PC; + rq->timeout = SD_TIMEOUT; + rq->cmd[0] = WRITE_SAME_16; + rq->cmd[1] = 0x8; /* UNMAP bit */ + rq->cmd[2] = sizeof(bio->bi_sector) > 4 ? + (unsigned char) (bio->bi_sector >> 56) & 0xff : 0; + rq->cmd[3] = sizeof(bio->bi_sector) > 4 ? + (unsigned char) (bio->bi_sector >> 48) & 0xff : 0; + rq->cmd[4] = sizeof(bio->bi_sector) > 4 ? + (unsigned char) (bio->bi_sector >> 40) & 0xff : 0; + rq->cmd[5] = sizeof(bio->bi_sector) > 4 ? + (unsigned char) (bio->bi_sector >> 32) & 0xff : 0; + rq->cmd[6] = (unsigned char) (bio->bi_sector >> 24) & 0xff; + rq->cmd[7] = (unsigned char) (bio->bi_sector >> 16) & 0xff; + rq->cmd[8] = (unsigned char) (bio->bi_sector >> 8) & 0xff; + rq->cmd[9] = (unsigned char) bio->bi_sector & 0xff; + rq->cmd[10] = (unsigned char) (bio_sectors(bio) >> 24) & 0xff; + rq->cmd[11] = (unsigned char) (bio_sectors(bio) >> 16) & 0xff; + rq->cmd[12] = (unsigned char) (bio_sectors(bio) >> 8) & 0xff; + rq->cmd[13] = (unsigned char) bio_sectors(bio) & 0xff; + rq->cmd[14] = 0; + rq->cmd[15] = 0; + rq->cmd_len = 16; +} + /** * sd_init_command - build a scsi (read or write) command from * information in the request structure. @@ -418,6 +447,13 @@ static int sd_prep_fn(struct request_que int ret, host_dif; unsigned char protect; + /* + * Discard request come in as REQ_TYPE_FS but we turn them into + * block PC requests to make life easier. + */ + if (blk_discard_rq(rq)) + sd_prepare_discard(q, rq); + if (rq->cmd_type == REQ_TYPE_BLOCK_PC) { ret = scsi_setup_blk_pc_cmnd(sdp, rq); goto out; @@ -425,6 +461,7 @@ static int sd_prep_fn(struct request_que ret = BLKPREP_KILL; goto out; } + ret = scsi_setup_fs_cmnd(sdp, rq); if (ret != BLKPREP_OK) goto out; @@ -1432,6 +1469,9 @@ static int read_capacity_16(struct scsi_ sd_printk(KERN_NOTICE, sdkp, "physical block alignment offset: %u\n", alignment); + if (buffer[14] & 0x80) + sdkp->thin_provisioning = 1; + sdkp->capacity = lba + 1; return sector_size; } @@ -1877,6 +1917,17 @@ static void sd_read_block_limits(struct blk_queue_io_opt(sdkp->disk->queue, get_unaligned_be32(&buffer[12]) * sector_sz); + if (sdkp->thin_provisioning) { + sdkp->disk->queue->limits.discard_granularity = + get_unaligned_be32(&buffer[28]); + + if (buffer[32] & 0x80) { + buffer[32] &= ~0x80; + sdkp->disk->queue->limits.discard_alignment = + get_unaligned_be32(&buffer[32]); + } + } + kfree(buffer); } @@ -1979,6 +2030,11 @@ static int sd_revalidate_disk(struct gen blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush); + if (sdkp->thin_provisioning) { + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, sdkp->disk->queue); + blk_queue_max_discard_sectors(sdp->request_queue, UINT_MAX); + } + set_capacity(disk, sdkp->capacity); kfree(buffer); Index: linux-2.6/drivers/scsi/sd.h =================================================================== --- linux-2.6.orig/drivers/scsi/sd.h 2009-10-29 15:49:00.890254050 +0100 +++ linux-2.6/drivers/scsi/sd.h 2009-10-29 15:49:05.826278860 +0100 @@ -60,6 +60,7 @@ struct scsi_disk { unsigned RCD : 1; /* state of disk RCD bit, unused */ unsigned DPOFUA : 1; /* state of disk DPOFUA bit */ unsigned first_scan : 1; + unsigned thin_provisioning : 1; }; #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev) ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-11-03 15:16 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-10-29 15:08 [PATCH 0/2] scsi discard support Christoph Hellwig 2009-10-29 15:08 ` [PATCH 1/2] block: add support for discard limits Christoph Hellwig 2009-10-29 19:06 ` Jens Axboe 2009-10-30 5:03 ` Christoph Hellwig 2009-11-02 12:39 ` Jens Axboe 2009-10-30 3:19 ` Martin K. Petersen 2009-10-30 5:05 ` Christoph Hellwig 2009-11-02 13:29 ` Martin K. Petersen 2009-11-02 15:39 ` James Bottomley 2009-11-02 18:16 ` Martin K. Petersen 2009-11-02 16:02 ` Boaz Harrosh 2009-11-02 18:18 ` Martin K. Petersen 2009-11-03 15:16 ` Christoph Hellwig 2009-10-29 15:08 ` [PATCH 2/2] sd: add support for WRITE SAME (16) with unmap bit Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox