* Thin provisioning fixes (take 3)
@ 2009-11-26 17:00 Martin K. Petersen
2009-11-26 17:00 ` [PATCH 1/5] block: Allow devices to indicate whether discarded blocks are zeroed Martin K. Petersen
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:00 UTC (permalink / raw)
To: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide
Changes since last kit:
- Patch 1 (block): Wire up an ioctl that reports whether the
block device zeroes after discard
- Patch 2 (sd): Tweak max_discard_sectors reporting
- Patch 3 (scsi, new): Retry temporary TP write errors
- Patch 4 (libata): No change
- Patch 5 (libata): Switch to a bytes-in, bytes-out convention
for ata_set_lba_range_entries() to make things more clear
I think we need a bit R&D wrt. discard coalescing and I doubt
we'll have that ready for the merge window. So I suggest we
stick with the current loop-over-max-discard-sectors approach for
2.6.32.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/5] block: Allow devices to indicate whether discarded blocks are zeroed
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
@ 2009-11-26 17:00 ` Martin K. Petersen
2009-11-26 17:00 ` [PATCH 2/5] sd: WRITE SAME(16) / UNMAP support Martin K. Petersen
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:00 UTC (permalink / raw)
To: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide
Cc: Martin K. Petersen
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down. However, not all devices return zeroed
blocks after a discard. Some drives return stale data, potentially
containing old superblocks. It is therefore important to know whether
discarded blocks are properly zeroed.
Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation. Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/blk-settings.c | 2 ++
block/blk-sysfs.c | 11 +++++++++++
block/compat_ioctl.c | 2 ++
block/ioctl.c | 2 ++
include/linux/blkdev.h | 14 ++++++++++++++
include/linux/fs.h | 1 +
6 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 7f986ca..1027e30 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -100,6 +100,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->discard_granularity = 0;
lim->discard_alignment = 0;
lim->discard_misaligned = 0;
+ lim->discard_zeroes_data = -1;
lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
lim->alignment_offset = 0;
@@ -543,6 +544,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->io_min = max(t->io_min, b->io_min);
t->no_cluster |= b->no_cluster;
+ t->discard_zeroes_data &= b->discard_zeroes_data;
/* Bottom device offset aligned? */
if (offset &&
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 3147145..8606c95 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -136,6 +136,11 @@ static ssize_t queue_discard_max_show(struct request_queue *q, char *page)
return queue_var_show(q->limits.max_discard_sectors << 9, page);
}
+static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
+{
+ return queue_var_show(queue_discard_zeroes_data(q), page);
+}
+
static ssize_t
queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
{
@@ -313,6 +318,11 @@ static struct queue_sysfs_entry queue_discard_max_entry = {
.show = queue_discard_max_show,
};
+static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {
+ .attr = {.name = "discard_zeroes_data", .mode = S_IRUGO },
+ .show = queue_discard_zeroes_data_show,
+};
+
static struct queue_sysfs_entry queue_nonrot_entry = {
.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
.show = queue_nonrot_show,
@@ -350,6 +360,7 @@ static struct attribute *default_attrs[] = {
&queue_io_opt_entry.attr,
&queue_discard_granularity_entry.attr,
&queue_discard_max_entry.attr,
+ &queue_discard_zeroes_data_entry.attr,
&queue_nonrot_entry.attr,
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index 9bd086c..4eb8e9e 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -747,6 +747,8 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
return compat_put_uint(arg, bdev_io_opt(bdev));
case BLKALIGNOFF:
return compat_put_int(arg, bdev_alignment_offset(bdev));
+ case BLKDISCARDZEROES:
+ return compat_put_uint(arg, bdev_discard_zeroes_data(bdev));
case BLKFLSBUF:
case BLKROSET:
case BLKDISCARD:
diff --git a/block/ioctl.c b/block/ioctl.c
index 1f4d1de..be48ea5 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -280,6 +280,8 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
return put_uint(arg, bdev_io_opt(bdev));
case BLKALIGNOFF:
return put_int(arg, bdev_alignment_offset(bdev));
+ case BLKDISCARDZEROES:
+ return put_uint(arg, bdev_discard_zeroes_data(bdev));
case BLKSECTGET:
return put_ushort(arg, queue_max_sectors(bdev_get_queue(bdev)));
case BLKRASET:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3b67221..bb92f15 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -322,6 +322,7 @@ struct queue_limits {
unsigned char misaligned;
unsigned char discard_misaligned;
unsigned char no_cluster;
+ signed char discard_zeroes_data;
};
struct request_queue
@@ -1152,6 +1153,19 @@ static inline int queue_sector_discard_alignment(struct request_queue *q,
& (q->limits.discard_granularity - 1);
}
+static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
+{
+ if (q->limits.discard_zeroes_data == 1)
+ return 1;
+
+ return 0;
+}
+
+static inline unsigned int bdev_discard_zeroes_data(struct block_device *bdev)
+{
+ return queue_discard_zeroes_data(bdev_get_queue(bdev));
+}
+
static inline int queue_dma_alignment(struct request_queue *q)
{
return q ? q->dma_alignment : 511;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..6962d0d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -304,6 +304,7 @@ struct inodes_stat_t {
#define BLKIOOPT _IO(0x12,121)
#define BLKALIGNOFF _IO(0x12,122)
#define BLKPBSZGET _IO(0x12,123)
+#define BLKDISCARDZEROES _IO(0x12,124)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
--
1.6.0.6
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/5] sd: WRITE SAME(16) / UNMAP support
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
2009-11-26 17:00 ` [PATCH 1/5] block: Allow devices to indicate whether discarded blocks are zeroed Martin K. Petersen
@ 2009-11-26 17:00 ` Martin K. Petersen
2009-11-26 17:00 ` [PATCH 3/5] scsi: Correctly handle thin provisioning write error Martin K. Petersen
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:00 UTC (permalink / raw)
To: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide
Cc: Martin K. Petersen
Implement a function for handling discard requests that sends either
WRITE SAME(16) or UNMAP(10) depending on parameters indicated by the
device in the block limits VPD.
Extract unmap constraints and report them to the block layer.
Based in part by a patch by Christoph Hellwig <hch@lst.de>.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/scsi/sd.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++
drivers/scsi/sd.h | 2 +
2 files changed, 109 insertions(+), 0 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9093c72..255da53 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -264,6 +264,15 @@ sd_show_app_tag_own(struct device *dev, struct device_attribute *attr,
return snprintf(buf, 20, "%u\n", sdkp->ATO);
}
+static ssize_t
+sd_show_thin_provisioning(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+ return snprintf(buf, 20, "%u\n", sdkp->thin_provisioning);
+}
+
static struct device_attribute sd_disk_attrs[] = {
__ATTR(cache_type, S_IRUGO|S_IWUSR, sd_show_cache_type,
sd_store_cache_type),
@@ -274,6 +283,7 @@ static struct device_attribute sd_disk_attrs[] = {
sd_store_manage_start_stop),
__ATTR(protection_type, S_IRUGO, sd_show_protection_type, NULL),
__ATTR(app_tag_own, S_IRUGO, sd_show_app_tag_own, NULL),
+ __ATTR(thin_provisioning, S_IRUGO, sd_show_thin_provisioning, NULL),
__ATTR_NULL,
};
@@ -399,6 +409,57 @@ static void sd_prot_op(struct scsi_cmnd *scmd, unsigned int dif)
}
/**
+ * sd_prepare_discard - unmap blocks on thinly provisioned device
+ * @rq: Request to prepare
+ *
+ * Will issue either UNMAP or WRITE SAME(16) depending on preference
+ * indicated by target device.
+ **/
+static int sd_prepare_discard(struct request *rq)
+{
+ struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+ struct bio *bio = rq->bio;
+ sector_t sector = bio->bi_sector;
+ unsigned int num = bio_sectors(bio);
+
+ if (sdkp->device->sector_size == 4096) {
+ sector >>= 3;
+ num >>= 3;
+ }
+
+ rq->cmd_type = REQ_TYPE_BLOCK_PC;
+ rq->timeout = SD_TIMEOUT;
+
+ memset(rq->cmd, 0, rq->cmd_len);
+
+ if (sdkp->unmap) {
+ char *buf = kmap_atomic(bio_page(bio), KM_USER0);
+
+ rq->cmd[0] = UNMAP;
+ rq->cmd[8] = 24;
+ rq->cmd_len = 10;
+
+ /* Ensure that data length matches payload */
+ rq->__data_len = bio->bi_size = bio->bi_io_vec->bv_len = 24;
+
+ put_unaligned_be16(6 + 16, &buf[0]);
+ put_unaligned_be16(16, &buf[2]);
+ put_unaligned_be64(sector, &buf[8]);
+ put_unaligned_be32(num, &buf[16]);
+
+ kunmap_atomic(buf, KM_USER0);
+ } else {
+ rq->cmd[0] = WRITE_SAME_16;
+ rq->cmd[1] = 0x8; /* UNMAP */
+ put_unaligned_be64(sector, &rq->cmd[2]);
+ put_unaligned_be32(num, &rq->cmd[10]);
+ rq->cmd_len = 16;
+ }
+
+ return BLKPREP_OK;
+}
+
+/**
* sd_init_command - build a scsi (read or write) command from
* information in the request structure.
* @SCpnt: pointer to mid-level's per scsi command structure that
@@ -418,6 +479,13 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
int ret, host_dif;
unsigned char protect;
+ /*
+ * Discard request come in as REQ_TYPE_FS but we turn them into
+ * block PC requests to make life easier.
+ */
+ if (blk_discard_rq(rq))
+ ret = sd_prepare_discard(rq);
+
if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
ret = scsi_setup_blk_pc_cmnd(sdp, rq);
goto out;
@@ -1432,6 +1500,19 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
sd_printk(KERN_NOTICE, sdkp,
"physical block alignment offset: %u\n", alignment);
+ if (buffer[14] & 0x80) { /* TPE */
+ struct request_queue *q = sdp->request_queue;
+
+ sdkp->thin_provisioning = 1;
+ q->limits.discard_granularity = sdkp->hw_sector_size;
+ q->limits.max_discard_sectors = 0xffffffff;
+
+ if (buffer[14] & 0x40) /* TPRZ */
+ q->limits.discard_zeroes_data = 1;
+
+ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+ }
+
sdkp->capacity = lba + 1;
return sector_size;
}
@@ -1863,6 +1944,7 @@ void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
*/
static void sd_read_block_limits(struct scsi_disk *sdkp)
{
+ struct request_queue *q = sdkp->disk->queue;
unsigned int sector_sz = sdkp->device->sector_size;
char *buffer;
@@ -1877,6 +1959,31 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
blk_queue_io_opt(sdkp->disk->queue,
get_unaligned_be32(&buffer[12]) * sector_sz);
+ /* Thin provisioning enabled and page length indicates TP support */
+ if (sdkp->thin_provisioning && buffer[3] == 0x3c) {
+ unsigned int lba_count, desc_count, granularity;
+
+ lba_count = get_unaligned_be32(&buffer[20]);
+ desc_count = get_unaligned_be32(&buffer[24]);
+
+ if (lba_count) {
+ q->limits.max_discard_sectors =
+ lba_count * sector_sz >> 9;
+
+ if (desc_count)
+ sdkp->unmap = 1;
+ }
+
+ granularity = get_unaligned_be32(&buffer[28]);
+
+ if (granularity)
+ q->limits.discard_granularity = granularity * sector_sz;
+
+ if (buffer[32] & 0x80)
+ q->limits.discard_alignment =
+ get_unaligned_be32(&buffer[32]) & ~(1 << 31);
+ }
+
kfree(buffer);
}
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index e374804..43d3caf 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -60,6 +60,8 @@ struct scsi_disk {
unsigned RCD : 1; /* state of disk RCD bit, unused */
unsigned DPOFUA : 1; /* state of disk DPOFUA bit */
unsigned first_scan : 1;
+ unsigned thin_provisioning : 1;
+ unsigned unmap : 1;
};
#define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
--
1.6.0.6
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/5] scsi: Correctly handle thin provisioning write error
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
2009-11-26 17:00 ` [PATCH 1/5] block: Allow devices to indicate whether discarded blocks are zeroed Martin K. Petersen
2009-11-26 17:00 ` [PATCH 2/5] sd: WRITE SAME(16) / UNMAP support Martin K. Petersen
@ 2009-11-26 17:00 ` Martin K. Petersen
2009-11-26 17:00 ` [PATCH 4/5] libata: Report zeroed read after Trim and max discard size Martin K. Petersen
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:00 UTC (permalink / raw)
To: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide
Cc: Martin K. Petersen
A thin provisioned device may temporarily be out of sufficient
allocation units to fulfill a write request. In that case it will
return a space allocation in progress error. Wait a bit and retry the
write.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/scsi/scsi_lib.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5987da8..dd12183 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -859,6 +859,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
case 0x07: /* operation in progress */
case 0x08: /* Long write in progress */
case 0x09: /* self test in progress */
+ case 0x14: /* space allocation in progress */
action = ACTION_DELAYED_RETRY;
break;
default:
--
1.6.0.6
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/5] libata: Report zeroed read after Trim and max discard size
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
` (2 preceding siblings ...)
2009-11-26 17:00 ` [PATCH 3/5] scsi: Correctly handle thin provisioning write error Martin K. Petersen
@ 2009-11-26 17:00 ` Martin K. Petersen
2009-11-27 1:55 ` Mark Lord
2009-11-26 17:00 ` [PATCH 5/5] libata: Clarify ata_set_lba_range_entries function Martin K. Petersen
2009-11-26 17:09 ` Thin provisioning fixes (take 3) James Bottomley
5 siblings, 1 reply; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:00 UTC (permalink / raw)
To: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide
Cc: Martin K. Petersen
Our current Trim payload is a single sector that can accommodate 64 *
65535 blocks being unmapped. Report this value in the Block Limits
Maximum Unmap LBA count field.
If a storage device supports Trim and the DRAT and RZAT bits are set,
report TPRZ=1 in Read Capacity(16).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/ata/libata-scsi.c | 12 +++++++++---
include/linux/ata.h | 10 ++++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e0995c4..08d4ab7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2116,8 +2116,10 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
* that we support some form of unmap - in thise case via WRITE SAME
* with the unmap bit set.
*/
- if (ata_id_has_trim(args->id))
+ if (ata_id_has_trim(args->id)) {
+ put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
put_unaligned_be32(1, &rbuf[28]);
+ }
return 0;
}
@@ -2412,8 +2414,12 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
rbuf[14] = (lowest_aligned >> 8) & 0x3f;
rbuf[15] = lowest_aligned;
- if (ata_id_has_trim(args->id))
- rbuf[14] |= 0x80;
+ if (ata_id_has_trim(args->id)) {
+ rbuf[14] |= 0x80; /* TPE */
+
+ if (ata_id_has_zero_after_trim(args->id))
+ rbuf[14] |= 0x40; /* TPRZ */
+ }
}
return 0;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index e2595e8..b18b2bb 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -75,6 +75,7 @@ enum {
ATA_ID_EIDE_DMA_TIME = 66,
ATA_ID_EIDE_PIO = 67,
ATA_ID_EIDE_PIO_IORDY = 68,
+ ATA_ID_ADDITIONAL_SUPP = 69,
ATA_ID_QUEUE_DEPTH = 75,
ATA_ID_MAJOR_VER = 80,
ATA_ID_COMMAND_SET_1 = 82,
@@ -816,6 +817,15 @@ static inline int ata_id_has_trim(const u16 *id)
return 0;
}
+static inline int ata_id_has_zero_after_trim(const u16 *id)
+{
+ /* DSM supported, deterministic read, and read zero after trim set */
+ if (ata_id_has_trim(id) && id[ATA_ID_ADDITIONAL_SUPP] & 0x4020)
+ return 1;
+
+ return 0;
+}
+
static inline int ata_id_current_chs_valid(const u16 *id)
{
/* For ATA-1 devices, if the INITIALIZE DEVICE PARAMETERS command
--
1.6.0.6
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/5] libata: Clarify ata_set_lba_range_entries function
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
` (3 preceding siblings ...)
2009-11-26 17:00 ` [PATCH 4/5] libata: Report zeroed read after Trim and max discard size Martin K. Petersen
@ 2009-11-26 17:00 ` Martin K. Petersen
2009-12-03 23:02 ` Jeff Garzik
2009-11-26 17:09 ` Thin provisioning fixes (take 3) James Bottomley
5 siblings, 1 reply; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:00 UTC (permalink / raw)
To: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide
Cc: Martin K. Petersen
ata_set_lba_range_entries used the variable max for two different things
which was confusing. Make the function take a buffer size in bytes as
argument and return the used buffer size upon completion.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/ata/libata-scsi.c | 2 +-
include/linux/ata.h | 20 ++++++++++----------
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 08d4ab7..e06ee7b 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2973,7 +2973,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
goto invalid_fld;
buf = page_address(sg_page(scsi_sglist(scmd)));
- size = ata_set_lba_range_entries(buf, 512 / 8, block, n_block);
+ size = ata_set_lba_range_entries(buf, 512, block, n_block);
tf->protocol = ATA_PROT_DMA;
tf->hob_feature = 0;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index b18b2bb..54a8ffb 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -981,17 +981,17 @@ static inline void ata_id_to_hd_driveid(u16 *id)
}
/*
- * Write up to 'max' LBA Range Entries to the buffer that will cover the
- * extent from sector to sector + count. This is used for TRIM and for
- * ADD LBA(S) TO NV CACHE PINNED SET.
+ * Write LBA Range Entries to the buffer that will cover the extent from
+ * sector to sector + count. This is used for TRIM and for ADD LBA(S)
+ * TO NV CACHE PINNED SET.
*/
-static inline unsigned ata_set_lba_range_entries(void *_buffer, unsigned max,
- u64 sector, unsigned long count)
+static inline unsigned ata_set_lba_range_entries(void *_buffer,
+ unsigned buf_size, u64 sector, unsigned long count)
{
__le64 *buffer = _buffer;
- unsigned i = 0;
+ unsigned i = 0, used_bytes;
- while (i < max) {
+ while (i < buf_size / 8 ) { /* 6-byte LBA + 2-byte range per entry */
u64 entry = sector |
((u64)(count > 0xffff ? 0xffff : count) << 48);
buffer[i++] = __cpu_to_le64(entry);
@@ -1001,9 +1001,9 @@ static inline unsigned ata_set_lba_range_entries(void *_buffer, unsigned max,
sector += 0xffff;
}
- max = ALIGN(i * 8, 512);
- memset(buffer + i, 0, max - i * 8);
- return max;
+ used_bytes = ALIGN(i * 8, 512);
+ memset(buffer + i, 0, used_bytes - i * 8);
+ return used_bytes;
}
static inline int is_multi_taskfile(struct ata_taskfile *tf)
--
1.6.0.6
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: Thin provisioning fixes (take 3)
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
` (4 preceding siblings ...)
2009-11-26 17:00 ` [PATCH 5/5] libata: Clarify ata_set_lba_range_entries function Martin K. Petersen
@ 2009-11-26 17:09 ` James Bottomley
2009-11-26 17:27 ` Martin K. Petersen
5 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2009-11-26 17:09 UTC (permalink / raw)
To: Martin K. Petersen
Cc: jens.axboe, hch, willy, jgarzik, sandeen, rwheeler, linux-ide,
linux-scsi
On Thu, 2009-11-26 at 12:00 -0500, Martin K. Petersen wrote:
> Changes since last kit:
>
> - Patch 1 (block): Wire up an ioctl that reports whether the
> block device zeroes after discard
>
> - Patch 2 (sd): Tweak max_discard_sectors reporting
>
> - Patch 3 (scsi, new): Retry temporary TP write errors
>
> - Patch 4 (libata): No change
>
> - Patch 5 (libata): Switch to a bytes-in, bytes-out convention
> for ata_set_lba_range_entries() to make things more clear
>
> I think we need a bit R&D wrt. discard coalescing and I doubt
> we'll have that ready for the merge window. So I suggest we
> stick with the current loop-over-max-discard-sectors approach for
> 2.6.32.
So you forgot to specify ordering. I think the right ordering is that
actually, there isn't any. Jens can take 1, I can take 2,3 and Jeff can
take 4,5 without running any postmerge trees ... is that correct?
James
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Thin provisioning fixes (take 3)
2009-11-26 17:09 ` Thin provisioning fixes (take 3) James Bottomley
@ 2009-11-26 17:27 ` Martin K. Petersen
0 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-26 17:27 UTC (permalink / raw)
To: James Bottomley
Cc: Martin K. Petersen, jens.axboe, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide, linux-scsi
>>>>> "James" == James Bottomley <James.Bottomley@suse.de> writes:
James> So you forgot to specify ordering. I think the right ordering is
James> that actually, there isn't any. Jens can take 1, I can take 2,3
James> and Jeff can take 4,5 without running any postmerge trees ... is
James> that correct?
The only dependency is that the sd patch (#2) requires "block: Expose
discard granularity" (86b37281411cf1e9bc0a6b5406c45edb7bd9ea5d) in Jens'
for-2.6.33 tree.
So we will need either a rebase or a postmerge tree.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 4/5] libata: Report zeroed read after Trim and max discard size
2009-11-26 17:00 ` [PATCH 4/5] libata: Report zeroed read after Trim and max discard size Martin K. Petersen
@ 2009-11-27 1:55 ` Mark Lord
2009-11-27 3:46 ` Martin K. Petersen
0 siblings, 1 reply; 13+ messages in thread
From: Mark Lord @ 2009-11-27 1:55 UTC (permalink / raw)
To: Martin K. Petersen
Cc: jens.axboe, james.bottomley, hch, willy, jgarzik, sandeen,
rwheeler, linux-ide, linux-scsi
Martin K. Petersen wrote:
..
> If a storage device supports Trim and the DRAT and RZAT bits are set,
> report TPRZ=1 in Read Capacity(16).
..
> +static inline int ata_id_has_zero_after_trim(const u16 *id)
> +{
> + /* DSM supported, deterministic read, and read zero after trim set */
> + if (ata_id_has_trim(id) && id[ATA_ID_ADDITIONAL_SUPP] & 0x4020)
> + return 1;
..
Is that correct, or should it be this:
if (ata_id_has_trim(id) && (id[ATA_ID_ADDITIONAL_SUPP] & 0x4020) == 0x4020)
??
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 4/5] libata: Report zeroed read after Trim and max discard size
2009-11-27 1:55 ` Mark Lord
@ 2009-11-27 3:46 ` Martin K. Petersen
2009-12-03 7:57 ` Jeff Garzik
2009-12-03 23:02 ` Jeff Garzik
0 siblings, 2 replies; 13+ messages in thread
From: Martin K. Petersen @ 2009-11-27 3:46 UTC (permalink / raw)
To: Mark Lord
Cc: Martin K. Petersen, jens.axboe, james.bottomley, hch, willy,
jgarzik, sandeen, rwheeler, linux-ide, linux-scsi
>>>>> "Mark" == Mark Lord <kernel@teksavvy.com> writes:
Mark> Is that correct, or should it be this:
Mark> if (ata_id_has_trim(id) && (id[ATA_ID_ADDITIONAL_SUPP] &
Mark> 0x4020) == 0x4020)
Good spotting! You are right, I need both bits to be set...
libata: Report zeroed read after Trim and max discard size
Our current Trim payload is a single sector that can accommodate 64 *
65535 blocks being unmapped. Report this value in the Block Limits
Maximum Unmap LBA count field.
If a storage device supports Trim and the DRAT and RZAT bits are set,
report TPRZ=1 in Read Capacity(16).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e0995c4..08d4ab7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2116,8 +2116,10 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
* that we support some form of unmap - in thise case via WRITE SAME
* with the unmap bit set.
*/
- if (ata_id_has_trim(args->id))
+ if (ata_id_has_trim(args->id)) {
+ put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
put_unaligned_be32(1, &rbuf[28]);
+ }
return 0;
}
@@ -2412,8 +2414,12 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
rbuf[14] = (lowest_aligned >> 8) & 0x3f;
rbuf[15] = lowest_aligned;
- if (ata_id_has_trim(args->id))
- rbuf[14] |= 0x80;
+ if (ata_id_has_trim(args->id)) {
+ rbuf[14] |= 0x80; /* TPE */
+
+ if (ata_id_has_zero_after_trim(args->id))
+ rbuf[14] |= 0x40; /* TPRZ */
+ }
}
return 0;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index e2595e8..dfa2298 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -75,6 +75,7 @@ enum {
ATA_ID_EIDE_DMA_TIME = 66,
ATA_ID_EIDE_PIO = 67,
ATA_ID_EIDE_PIO_IORDY = 68,
+ ATA_ID_ADDITIONAL_SUPP = 69,
ATA_ID_QUEUE_DEPTH = 75,
ATA_ID_MAJOR_VER = 80,
ATA_ID_COMMAND_SET_1 = 82,
@@ -816,6 +817,16 @@ static inline int ata_id_has_trim(const u16 *id)
return 0;
}
+static inline int ata_id_has_zero_after_trim(const u16 *id)
+{
+ /* DSM supported, deterministic read, and read zero after trim set */
+ if (ata_id_has_trim(id) &&
+ (id[ATA_ID_ADDITIONAL_SUPP] & 0x4020) == 0x4020)
+ return 1;
+
+ return 0;
+}
+
static inline int ata_id_current_chs_valid(const u16 *id)
{
/* For ATA-1 devices, if the INITIALIZE DEVICE PARAMETERS command
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 4/5] libata: Report zeroed read after Trim and max discard size
2009-11-27 3:46 ` Martin K. Petersen
@ 2009-12-03 7:57 ` Jeff Garzik
2009-12-03 23:02 ` Jeff Garzik
1 sibling, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2009-12-03 7:57 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Mark Lord, jens.axboe, james.bottomley, hch, willy, sandeen,
rwheeler, linux-ide, linux-scsi
On 11/26/2009 10:46 PM, Martin K. Petersen wrote:
>>>>>> "Mark" == Mark Lord<kernel@teksavvy.com> writes:
>
> Mark> Is that correct, or should it be this:
>
> Mark> if (ata_id_has_trim(id)&& (id[ATA_ID_ADDITIONAL_SUPP]&
> Mark> 0x4020) == 0x4020)
>
> Good spotting! You are right, I need both bits to be set...
>
>
> libata: Report zeroed read after Trim and max discard size
>
> Our current Trim payload is a single sector that can accommodate 64 *
> 65535 blocks being unmapped. Report this value in the Block Limits
> Maximum Unmap LBA count field.
>
> If a storage device supports Trim and the DRAT and RZAT bits are set,
> report TPRZ=1 in Read Capacity(16).
>
> Signed-off-by: Martin K. Petersen<martin.petersen@oracle.com>
The two libata patches look good. I need to review the thread one more
time, and then will apply these...
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 4/5] libata: Report zeroed read after Trim and max discard size
2009-11-27 3:46 ` Martin K. Petersen
2009-12-03 7:57 ` Jeff Garzik
@ 2009-12-03 23:02 ` Jeff Garzik
1 sibling, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2009-12-03 23:02 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Mark Lord, jens.axboe, james.bottomley, hch, willy, sandeen,
rwheeler, linux-ide, linux-scsi
On 11/26/2009 10:46 PM, Martin K. Petersen wrote:
>>>>>> "Mark" == Mark Lord<kernel@teksavvy.com> writes:
>
> Mark> Is that correct, or should it be this:
>
> Mark> if (ata_id_has_trim(id)&& (id[ATA_ID_ADDITIONAL_SUPP]&
> Mark> 0x4020) == 0x4020)
>
> Good spotting! You are right, I need both bits to be set...
>
>
> libata: Report zeroed read after Trim and max discard size
>
> Our current Trim payload is a single sector that can accommodate 64 *
> 65535 blocks being unmapped. Report this value in the Block Limits
> Maximum Unmap LBA count field.
>
> If a storage device supports Trim and the DRAT and RZAT bits are set,
> report TPRZ=1 in Read Capacity(16).
>
> Signed-off-by: Martin K. Petersen<martin.petersen@oracle.com>
applied
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 5/5] libata: Clarify ata_set_lba_range_entries function
2009-11-26 17:00 ` [PATCH 5/5] libata: Clarify ata_set_lba_range_entries function Martin K. Petersen
@ 2009-12-03 23:02 ` Jeff Garzik
0 siblings, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2009-12-03 23:02 UTC (permalink / raw)
To: Martin K. Petersen
Cc: jens.axboe, james.bottomley, hch, willy, sandeen, rwheeler,
linux-ide, linux-scsi
On 11/26/2009 12:00 PM, Martin K. Petersen wrote:
> ata_set_lba_range_entries used the variable max for two different things
> which was confusing. Make the function take a buffer size in bytes as
> argument and return the used buffer size upon completion.
>
> Signed-off-by: Martin K. Petersen<martin.petersen@oracle.com>
> ---
> drivers/ata/libata-scsi.c | 2 +-
> include/linux/ata.h | 20 ++++++++++----------
> 2 files changed, 11 insertions(+), 11 deletions(-)
applied
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-12-03 23:02 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-26 17:00 Thin provisioning fixes (take 3) Martin K. Petersen
2009-11-26 17:00 ` [PATCH 1/5] block: Allow devices to indicate whether discarded blocks are zeroed Martin K. Petersen
2009-11-26 17:00 ` [PATCH 2/5] sd: WRITE SAME(16) / UNMAP support Martin K. Petersen
2009-11-26 17:00 ` [PATCH 3/5] scsi: Correctly handle thin provisioning write error Martin K. Petersen
2009-11-26 17:00 ` [PATCH 4/5] libata: Report zeroed read after Trim and max discard size Martin K. Petersen
2009-11-27 1:55 ` Mark Lord
2009-11-27 3:46 ` Martin K. Petersen
2009-12-03 7:57 ` Jeff Garzik
2009-12-03 23:02 ` Jeff Garzik
2009-11-26 17:00 ` [PATCH 5/5] libata: Clarify ata_set_lba_range_entries function Martin K. Petersen
2009-12-03 23:02 ` Jeff Garzik
2009-11-26 17:09 ` Thin provisioning fixes (take 3) James Bottomley
2009-11-26 17:27 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).