* [PATCH v5 0/2] Add support for SCT Write Same
@ 2016-08-10 1:00 Shaun Tancheff
2016-08-10 1:00 ` [PATCH v5 1/2] Use kmap_atomic when rewriting attached page Shaun Tancheff
2016-08-10 1:00 ` [PATCH v5 2/2] Add support for SCT Write Same Shaun Tancheff
0 siblings, 2 replies; 16+ messages in thread
From: Shaun Tancheff @ 2016-08-10 1:00 UTC (permalink / raw)
To: linux-ide, linux-kernel
Cc: Shaun Tancheff, Christoph Hellwig, Tejun Heo, Josh Bingaman
At some point the method of issuing Write Same for ATA drives changed.
Currently write same is commonly available via SCT so expose the SCT
capabilities and use SCT Write Same when it is available.
This is useful for zoned based media that prefers to support discard
with lbprz set, aka discard zeroes data by mapping discard operations to
reset write pointer operations. Conventional zones that do not support
reset write pointer can still honor the discard zeroes data by issuing
a write same over the zone.
It may also be nice to know if various controllers that currently
disable WRITE SAME will work with the SCT Write Same code path:
aacraid, arcmsr, megaraid, 3w-9xxx, 3w-sas, 3w-xxxx, gdth, hpsa, ips,
megaraid, pmcraid, storvsc_drv
This patch against v4.8-rc1 is also at
https://github.com/stancheff/linux/tree/v4.8-rc1+ws.v5
git@github.com:stancheff/linux.git v4.8-rc1+ws.v5
Shaun Tancheff (2):
Use kmap_atomic when rewriting attached page
Add support for SCT Write Same
drivers/ata/libata-scsi.c | 240 ++++++++++++++++++++++++++++++++++++++++------
include/linux/ata.h | 69 ++++++++-----
2 files changed, 252 insertions(+), 57 deletions(-)
--
2.8.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 1/2] Use kmap_atomic when rewriting attached page
2016-08-10 1:00 [PATCH v5 0/2] Add support for SCT Write Same Shaun Tancheff
@ 2016-08-10 1:00 ` Shaun Tancheff
2016-08-10 10:56 ` Tom Yan
2016-08-11 14:13 ` Christoph Hellwig
2016-08-10 1:00 ` [PATCH v5 2/2] Add support for SCT Write Same Shaun Tancheff
1 sibling, 2 replies; 16+ messages in thread
From: Shaun Tancheff @ 2016-08-10 1:00 UTC (permalink / raw)
To: linux-ide, linux-kernel
Cc: Shaun Tancheff, Christoph Hellwig, Tejun Heo, Josh Bingaman,
Shaun Tancheff
The current SATL for WRITE_SAME does not protect against misaligned
pages. Additionally the associated page should also kmap'd when
being modified.
Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v5: Added prep patch to work with non-page aligned scatterlist pages
and use kmap_atomic() to lock page during modification.
drivers/ata/libata-scsi.c | 53 ++++++++++++++++++++++++++++++++++++++++++-----
include/linux/ata.h | 26 -----------------------
2 files changed, 48 insertions(+), 31 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e207b33..a71067a 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3282,16 +3282,60 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
return 1;
}
+/**
+ * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
+ * @sg: Scatter / Gather list attached to command.
+ * @num: Maximum number of entries (nominally 64).
+ * @sector: Starting sector
+ * @count: Total Range of request
+ *
+ * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
+ * descriptor.
+ *
+ * Upto 64 entries of the format:
+ * 63:48 Range Length
+ * 47:0 LBA
+ *
+ * Range Length of 0 is ignored.
+ * LBA's should be sorted order and not overlap.
+ *
+ * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
+ */
+static unsigned int ata_format_dsm_trim_descr(struct scatterlist *sg, u32 num,
+ u64 sector, u32 count)
+{
+ void *ptr = kmap_atomic(sg_page(sg));
+ __le64 *buffer = ptr + sg->offset;
+ u32 i = 0, used_bytes;
+
+ while (i < num) {
+ u64 entry = sector |
+ ((u64)(count > 0xffff ? 0xffff : count) << 48);
+ buffer[i++] = __cpu_to_le64(entry);
+ if (count <= 0xffff)
+ break;
+ count -= 0xffff;
+ sector += 0xffff;
+ }
+
+ used_bytes = ALIGN(i * 8, 512);
+ memset(buffer + i, 0, used_bytes - i * 8);
+
+ kunmap_atomic(ptr);
+ return used_bytes;
+}
+
static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
{
struct ata_taskfile *tf = &qc->tf;
struct scsi_cmnd *scmd = qc->scsicmd;
struct ata_device *dev = qc->dev;
const u8 *cdb = scmd->cmnd;
+ struct scatterlist *sg;
u64 block;
u32 n_block;
+ const u32 trmax = ATA_MAX_TRIM_RNUM;
u32 size;
- void *buf;
u16 fp;
u8 bp = 0xff;
@@ -3319,10 +3363,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
if (!scsi_sg_count(scmd))
goto invalid_param_len;
- buf = page_address(sg_page(scsi_sglist(scmd)));
-
- if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
- size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
+ sg = scsi_sglist(scmd);
+ if (n_block <= 0xffff * cmax) {
+ size = ata_format_dsm_trim_descr(sg, trmax, block, n_block);
} else {
fp = 2;
goto invalid_fld;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index adbc812..45a1d71 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
#endif
}
-/*
- * Write LBA Range Entries to the buffer that will cover the extent from
- * sector to sector + count. This is used for TRIM and for ADD LBA(S)
- * TO NV CACHE PINNED SET.
- */
-static inline unsigned ata_set_lba_range_entries(void *_buffer,
- unsigned num, u64 sector, unsigned long count)
-{
- __le64 *buffer = _buffer;
- unsigned i = 0, used_bytes;
-
- while (i < num) {
- u64 entry = sector |
- ((u64)(count > 0xffff ? 0xffff : count) << 48);
- buffer[i++] = __cpu_to_le64(entry);
- if (count <= 0xffff)
- break;
- count -= 0xffff;
- sector += 0xffff;
- }
-
- used_bytes = ALIGN(i * 8, 512);
- memset(buffer + i, 0, used_bytes - i * 8);
- return used_bytes;
-}
-
static inline bool ata_ok(u8 status)
{
return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 1:00 [PATCH v5 0/2] Add support for SCT Write Same Shaun Tancheff
2016-08-10 1:00 ` [PATCH v5 1/2] Use kmap_atomic when rewriting attached page Shaun Tancheff
@ 2016-08-10 1:00 ` Shaun Tancheff
2016-08-10 6:31 ` Tom Yan
1 sibling, 1 reply; 16+ messages in thread
From: Shaun Tancheff @ 2016-08-10 1:00 UTC (permalink / raw)
To: linux-ide, linux-kernel
Cc: Shaun Tancheff, Christoph Hellwig, Tejun Heo, Josh Bingaman,
Shaun Tancheff
SATA drives may support write same via SCT. This is useful
for setting the drive contents to a specific pattern (0's).
Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
an SCT Write Same command.
Based on the UNMAP flag:
- When set translate to DSM TRIM
- When not set translate to SCT Write Same
Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v5:
- Addressed review comments
- Report support for ZBC only for zoned devices.
- kmap page during rewrite
- Fix unmap set to require trim or error, if not unmap then sct write
same or error.
v4:
- Added partial MAINTENANCE_IN opcode simulation
- Dropped all changes in drivers/scsi/*
- Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
v3:
- Demux UNMAP/TRIM from WRITE SAME
v2:
- Remove fugly ata hacking from sd.c
---
drivers/ata/libata-scsi.c | 189 +++++++++++++++++++++++++++++++++++++++-------
include/linux/ata.h | 43 +++++++++++
2 files changed, 205 insertions(+), 27 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a71067a..99b0e6c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
{
sdev->use_10_for_rw = 1;
sdev->use_10_for_ms = 1;
- sdev->no_report_opcodes = 1;
- sdev->no_write_same = 1;
/* Schedule policy is determined by ->qc_defer() callback and
* it needs to see every deferred qc. Set dev_blocked to 1 to
@@ -3325,6 +3323,41 @@ static unsigned int ata_format_dsm_trim_descr(struct scatterlist *sg, u32 num,
return used_bytes;
}
+/**
+ * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
+ * @sg: Scatter / Gather list attached to command.
+ * @lba: Starting sector
+ * @num: Number of bytes to be zero'd.
+ *
+ * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
+ * descriptor.
+ * NOTE: Writes a pattern (0's) in the foreground.
+ * Large write-same requents can timeout.
+ */
+static void ata_format_sct_write_same(struct scatterlist *sg, u64 lba, u64 num)
+{
+ void *ptr = kmap_atomic(sg_page(sg));
+ u16 *sctpg = ptr + sg->offset;
+
+ put_unaligned_le16(0x0002, &sctpg[0]); /* SCT_ACT_WRITE_SAME */
+ put_unaligned_le16(0x0101, &sctpg[1]); /* WRITE PTRN FG */
+ put_unaligned_le64(lba, &sctpg[2]);
+ put_unaligned_le64(num, &sctpg[6]);
+ put_unaligned_le32(0u, &sctpg[10]);
+
+ kunmap_atomic(ptr);
+}
+
+/**
+ * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
+ * @qc: Command to be translated
+ *
+ * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
+ * an SCT Write Same command.
+ * Based on WRITE SAME has the UNMAP flag
+ * When set translate to DSM TRIM
+ * When clear translate to SCT Write Same
+ */
static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
{
struct ata_taskfile *tf = &qc->tf;
@@ -3338,6 +3371,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
u32 size;
u16 fp;
u8 bp = 0xff;
+ u8 unmap = cdb[1] & 0x8;
/* we may not issue DMA commands if no DMA mode is set */
if (unlikely(!dev->dma_mode))
@@ -3350,10 +3384,23 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
scsi_16_lba_len(cdb, &block, &n_block);
/* for now we only support WRITE SAME with the unmap bit set */
- if (unlikely(!(cdb[1] & 0x8))) {
- fp = 1;
- bp = 3;
- goto invalid_fld;
+ if (unmap) {
+ if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
+ !ata_id_has_trim(dev->id)) {
+ fp = 1;
+ bp = 3;
+ goto invalid_fld;
+ }
+ if (n_block > 0xffff * trmax) {
+ fp = 2;
+ goto invalid_fld;
+ }
+ } else {
+ if (!ata_id_sct_write_same(dev->id)) {
+ fp = 1;
+ bp = 3;
+ goto invalid_fld;
+ }
}
/*
@@ -3364,30 +3411,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
goto invalid_param_len;
sg = scsi_sglist(scmd);
- if (n_block <= 0xffff * cmax) {
+ if (unmap) {
size = ata_format_dsm_trim_descr(sg, trmax, block, n_block);
+ if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
+ /* Newer devices support queued TRIM commands */
+ tf->protocol = ATA_PROT_NCQ;
+ tf->command = ATA_CMD_FPDMA_SEND;
+ tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
+ tf->nsect = qc->tag << 3;
+ tf->hob_feature = (size / 512) >> 8;
+ tf->feature = size / 512;
+
+ tf->auxiliary = 1;
+ } else {
+ tf->protocol = ATA_PROT_DMA;
+ tf->hob_feature = 0;
+ tf->feature = ATA_DSM_TRIM;
+ tf->hob_nsect = (size / 512) >> 8;
+ tf->nsect = size / 512;
+ tf->command = ATA_CMD_DSM;
+ }
} else {
- fp = 2;
- goto invalid_fld;
- }
-
- if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
- /* Newer devices support queued TRIM commands */
- tf->protocol = ATA_PROT_NCQ;
- tf->command = ATA_CMD_FPDMA_SEND;
- tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
- tf->nsect = qc->tag << 3;
- tf->hob_feature = (size / 512) >> 8;
- tf->feature = size / 512;
+ ata_format_sct_write_same(sg, block, n_block);
- tf->auxiliary = 1;
- } else {
- tf->protocol = ATA_PROT_DMA;
tf->hob_feature = 0;
- tf->feature = ATA_DSM_TRIM;
- tf->hob_nsect = (size / 512) >> 8;
- tf->nsect = size / 512;
- tf->command = ATA_CMD_DSM;
+ tf->feature = 0;
+ tf->hob_nsect = 0;
+ tf->nsect = 1;
+ tf->lbah = 0;
+ tf->lbam = 0;
+ tf->lbal = ATA_CMD_STANDBYNOW1;
+ tf->hob_lbah = 0;
+ tf->hob_lbam = 0;
+ tf->hob_lbal = 0;
+ tf->device = ATA_CMD_STANDBYNOW1;
+ tf->protocol = ATA_PROT_DMA;
+ tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
}
tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
@@ -3411,6 +3470,76 @@ invalid_opcode:
}
/**
+ * ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
+ * @args: device MAINTENANCE_IN data / SCSI command of interest.
+ * @rbuf: Response buffer, to which simulated SCSI cmd output is sent.
+ *
+ * Yields a subset to satisfy scsi_report_opcode()
+ *
+ * LOCKING:
+ * spin_lock_irqsave(host lock)
+ */
+static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
+{
+ struct ata_device *dev = args->dev;
+ u8 *cdb = args->cmd->cmnd;
+ u8 supported = 0;
+ unsigned int err = 0;
+
+ if (cdb[2] != 1) {
+ ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
+ err = 2;
+ goto out;
+ }
+ switch (cdb[3]) {
+ case INQUIRY:
+ case MODE_SENSE:
+ case MODE_SENSE_10:
+ case READ_CAPACITY:
+ case SERVICE_ACTION_IN_16:
+ case REPORT_LUNS:
+ case REQUEST_SENSE:
+ case SYNCHRONIZE_CACHE:
+ case REZERO_UNIT:
+ case SEEK_6:
+ case SEEK_10:
+ case TEST_UNIT_READY:
+ case SEND_DIAGNOSTIC:
+ case MAINTENANCE_IN:
+ case READ_6:
+ case READ_10:
+ case READ_16:
+ case WRITE_6:
+ case WRITE_10:
+ case WRITE_16:
+ case ATA_12:
+ case ATA_16:
+ case VERIFY:
+ case VERIFY_16:
+ case MODE_SELECT:
+ case MODE_SELECT_10:
+ case START_STOP:
+ supported = 3;
+ break;
+ case WRITE_SAME_16:
+ if (ata_id_sct_write_same(dev->id))
+ supported = 3;
+ break;
+ case ZBC_IN:
+ case ZBC_OUT:
+ if (ata_id_zoned_cap(dev->id) ||
+ dev->class == ATA_DEV_ZAC)
+ supported = 3;
+ break;
+ default:
+ break;
+ }
+out:
+ rbuf[1] = supported; /* supported */
+ return err;
+}
+
+/**
* ata_scsi_report_zones_complete - convert ATA output
* @qc: command structure returning the data
*
@@ -4190,6 +4319,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
ata_scsi_invalid_field(dev, cmd, 1);
break;
+ case MAINTENANCE_IN:
+ if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
+ ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
+ else
+ ata_scsi_invalid_field(dev, cmd, 1);
+ break;
+
/* all other commands */
default:
ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
@@ -4222,7 +4358,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
shost->max_lun = 1;
shost->max_channel = 1;
shost->max_cmd_len = 16;
- shost->no_write_same = 1;
/* Schedule policy is determined by ->qc_defer()
* callback and it needs to see every deferred qc.
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 45a1d71..fdb1803 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -105,6 +105,7 @@ enum {
ATA_ID_CFA_KEY_MGMT = 162,
ATA_ID_CFA_MODES = 163,
ATA_ID_DATA_SET_MGMT = 169,
+ ATA_ID_SCT_CMD_XPORT = 206,
ATA_ID_ROT_SPEED = 217,
ATA_ID_PIO4 = (1 << 1),
@@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
}
/**
+ *
+ * Word: 206 - SCT Command Transport
+ * 15:12 - Vendor Specific
+ * 11:6 - Reserved
+ * 5 - SCT Command Transport Data Tables supported
+ * 4 - SCT Command Transport Features Control supported
+ * 3 - SCT Command Transport Error Recovery Control supported
+ * 2 - SCT Command Transport Write Same supported
+ * 1 - SCT Command Transport Long Sector Access supported
+ * 0 - SCT Command Transport supported
+ */
+static inline bool ata_id_sct_data_tables(const u16 *id)
+{
+ return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
+}
+
+static inline bool ata_id_sct_features_ctrl(const u16 *id)
+{
+ return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
+}
+
+static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
+{
+ return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
+}
+
+static inline bool ata_id_sct_write_same(const u16 *id)
+{
+ return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
+}
+
+static inline bool ata_id_sct_long_sector_access(const u16 *id)
+{
+ return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
+}
+
+static inline bool ata_id_sct_supported(const u16 *id)
+{
+ return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
+}
+
+/**
* ata_id_major_version - get ATA level of drive
* @id: Identify data
*
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 1:00 ` [PATCH v5 2/2] Add support for SCT Write Same Shaun Tancheff
@ 2016-08-10 6:31 ` Tom Yan
2016-08-10 11:30 ` Tom Yan
0 siblings, 1 reply; 16+ messages in thread
From: Tom Yan @ 2016-08-10 6:31 UTC (permalink / raw)
To: Shaun Tancheff
Cc: linux-ide, linux-kernel, Christoph Hellwig, Tejun Heo,
Josh Bingaman, Shaun Tancheff
I don't really know about SCT Write Same but there is one concern I
could I think of.
libata's SATL would report a maximum write same length base on the
number of sectors a one-block TRIM payload can describe at most, which
is 65535 * 64 = 4194240 (see ata_scsiop_inq_b0 in libata-scsi.c). If
the drive does not support TRIM, it will not report such length. That
is technically fine, as per SBC standard, and I suppose the SCSI disk
driver would use SD_MAX_WS16_BLOCKS = 0x7fffff (8388607).
However, are both of the limits fine for SCT Write Same? Any alignment
concern? Also, is such "inconsistency" acceptable?
On 10 August 2016 at 01:00, Shaun Tancheff <shaun@tancheff.com> wrote:
> SATA drives may support write same via SCT. This is useful
> for setting the drive contents to a specific pattern (0's).
>
> Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
> an SCT Write Same command.
>
> Based on the UNMAP flag:
> - When set translate to DSM TRIM
> - When not set translate to SCT Write Same
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v5:
> - Addressed review comments
> - Report support for ZBC only for zoned devices.
> - kmap page during rewrite
> - Fix unmap set to require trim or error, if not unmap then sct write
> same or error.
> v4:
> - Added partial MAINTENANCE_IN opcode simulation
> - Dropped all changes in drivers/scsi/*
> - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
> v3:
> - Demux UNMAP/TRIM from WRITE SAME
> v2:
> - Remove fugly ata hacking from sd.c
> ---
> drivers/ata/libata-scsi.c | 189 +++++++++++++++++++++++++++++++++++++++-------
> include/linux/ata.h | 43 +++++++++++
> 2 files changed, 205 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index a71067a..99b0e6c 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
> {
> sdev->use_10_for_rw = 1;
> sdev->use_10_for_ms = 1;
> - sdev->no_report_opcodes = 1;
> - sdev->no_write_same = 1;
>
> /* Schedule policy is determined by ->qc_defer() callback and
> * it needs to see every deferred qc. Set dev_blocked to 1 to
> @@ -3325,6 +3323,41 @@ static unsigned int ata_format_dsm_trim_descr(struct scatterlist *sg, u32 num,
> return used_bytes;
> }
>
> +/**
> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
> + * @sg: Scatter / Gather list attached to command.
> + * @lba: Starting sector
> + * @num: Number of bytes to be zero'd.
> + *
> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * descriptor.
> + * NOTE: Writes a pattern (0's) in the foreground.
> + * Large write-same requents can timeout.
> + */
> +static void ata_format_sct_write_same(struct scatterlist *sg, u64 lba, u64 num)
> +{
> + void *ptr = kmap_atomic(sg_page(sg));
> + u16 *sctpg = ptr + sg->offset;
> +
> + put_unaligned_le16(0x0002, &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> + put_unaligned_le16(0x0101, &sctpg[1]); /* WRITE PTRN FG */
> + put_unaligned_le64(lba, &sctpg[2]);
> + put_unaligned_le64(num, &sctpg[6]);
> + put_unaligned_le32(0u, &sctpg[10]);
> +
> + kunmap_atomic(ptr);
> +}
> +
> +/**
> + * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
> + * @qc: Command to be translated
> + *
> + * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
> + * an SCT Write Same command.
> + * Based on WRITE SAME has the UNMAP flag
> + * When set translate to DSM TRIM
> + * When clear translate to SCT Write Same
> + */
> static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> {
> struct ata_taskfile *tf = &qc->tf;
> @@ -3338,6 +3371,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> u32 size;
> u16 fp;
> u8 bp = 0xff;
> + u8 unmap = cdb[1] & 0x8;
>
> /* we may not issue DMA commands if no DMA mode is set */
> if (unlikely(!dev->dma_mode))
> @@ -3350,10 +3384,23 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> scsi_16_lba_len(cdb, &block, &n_block);
>
> /* for now we only support WRITE SAME with the unmap bit set */
> - if (unlikely(!(cdb[1] & 0x8))) {
> - fp = 1;
> - bp = 3;
> - goto invalid_fld;
> + if (unmap) {
> + if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
> + !ata_id_has_trim(dev->id)) {
> + fp = 1;
> + bp = 3;
> + goto invalid_fld;
> + }
> + if (n_block > 0xffff * trmax) {
> + fp = 2;
> + goto invalid_fld;
> + }
> + } else {
> + if (!ata_id_sct_write_same(dev->id)) {
> + fp = 1;
> + bp = 3;
> + goto invalid_fld;
> + }
> }
>
> /*
> @@ -3364,30 +3411,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> goto invalid_param_len;
>
> sg = scsi_sglist(scmd);
> - if (n_block <= 0xffff * cmax) {
> + if (unmap) {
> size = ata_format_dsm_trim_descr(sg, trmax, block, n_block);
> + if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> + /* Newer devices support queued TRIM commands */
> + tf->protocol = ATA_PROT_NCQ;
> + tf->command = ATA_CMD_FPDMA_SEND;
> + tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> + tf->nsect = qc->tag << 3;
> + tf->hob_feature = (size / 512) >> 8;
> + tf->feature = size / 512;
> +
> + tf->auxiliary = 1;
> + } else {
> + tf->protocol = ATA_PROT_DMA;
> + tf->hob_feature = 0;
> + tf->feature = ATA_DSM_TRIM;
> + tf->hob_nsect = (size / 512) >> 8;
> + tf->nsect = size / 512;
> + tf->command = ATA_CMD_DSM;
> + }
> } else {
> - fp = 2;
> - goto invalid_fld;
> - }
> -
> - if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> - /* Newer devices support queued TRIM commands */
> - tf->protocol = ATA_PROT_NCQ;
> - tf->command = ATA_CMD_FPDMA_SEND;
> - tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> - tf->nsect = qc->tag << 3;
> - tf->hob_feature = (size / 512) >> 8;
> - tf->feature = size / 512;
> + ata_format_sct_write_same(sg, block, n_block);
>
> - tf->auxiliary = 1;
> - } else {
> - tf->protocol = ATA_PROT_DMA;
> tf->hob_feature = 0;
> - tf->feature = ATA_DSM_TRIM;
> - tf->hob_nsect = (size / 512) >> 8;
> - tf->nsect = size / 512;
> - tf->command = ATA_CMD_DSM;
> + tf->feature = 0;
> + tf->hob_nsect = 0;
> + tf->nsect = 1;
> + tf->lbah = 0;
> + tf->lbam = 0;
> + tf->lbal = ATA_CMD_STANDBYNOW1;
> + tf->hob_lbah = 0;
> + tf->hob_lbam = 0;
> + tf->hob_lbal = 0;
> + tf->device = ATA_CMD_STANDBYNOW1;
> + tf->protocol = ATA_PROT_DMA;
> + tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
> }
>
> tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
> @@ -3411,6 +3470,76 @@ invalid_opcode:
> }
>
> /**
> + * ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
> + * @args: device MAINTENANCE_IN data / SCSI command of interest.
> + * @rbuf: Response buffer, to which simulated SCSI cmd output is sent.
> + *
> + * Yields a subset to satisfy scsi_report_opcode()
> + *
> + * LOCKING:
> + * spin_lock_irqsave(host lock)
> + */
> +static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
> +{
> + struct ata_device *dev = args->dev;
> + u8 *cdb = args->cmd->cmnd;
> + u8 supported = 0;
> + unsigned int err = 0;
> +
> + if (cdb[2] != 1) {
> + ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
> + err = 2;
> + goto out;
> + }
> + switch (cdb[3]) {
> + case INQUIRY:
> + case MODE_SENSE:
> + case MODE_SENSE_10:
> + case READ_CAPACITY:
> + case SERVICE_ACTION_IN_16:
> + case REPORT_LUNS:
> + case REQUEST_SENSE:
> + case SYNCHRONIZE_CACHE:
> + case REZERO_UNIT:
> + case SEEK_6:
> + case SEEK_10:
> + case TEST_UNIT_READY:
> + case SEND_DIAGNOSTIC:
> + case MAINTENANCE_IN:
> + case READ_6:
> + case READ_10:
> + case READ_16:
> + case WRITE_6:
> + case WRITE_10:
> + case WRITE_16:
> + case ATA_12:
> + case ATA_16:
> + case VERIFY:
> + case VERIFY_16:
> + case MODE_SELECT:
> + case MODE_SELECT_10:
> + case START_STOP:
> + supported = 3;
> + break;
> + case WRITE_SAME_16:
> + if (ata_id_sct_write_same(dev->id))
> + supported = 3;
> + break;
> + case ZBC_IN:
> + case ZBC_OUT:
> + if (ata_id_zoned_cap(dev->id) ||
> + dev->class == ATA_DEV_ZAC)
> + supported = 3;
> + break;
> + default:
> + break;
> + }
> +out:
> + rbuf[1] = supported; /* supported */
> + return err;
> +}
> +
> +/**
> * ata_scsi_report_zones_complete - convert ATA output
> * @qc: command structure returning the data
> *
> @@ -4190,6 +4319,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
> ata_scsi_invalid_field(dev, cmd, 1);
> break;
>
> + case MAINTENANCE_IN:
> + if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
> + ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
> + else
> + ata_scsi_invalid_field(dev, cmd, 1);
> + break;
> +
> /* all other commands */
> default:
> ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
> @@ -4222,7 +4358,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
> shost->max_lun = 1;
> shost->max_channel = 1;
> shost->max_cmd_len = 16;
> - shost->no_write_same = 1;
>
> /* Schedule policy is determined by ->qc_defer()
> * callback and it needs to see every deferred qc.
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index 45a1d71..fdb1803 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -105,6 +105,7 @@ enum {
> ATA_ID_CFA_KEY_MGMT = 162,
> ATA_ID_CFA_MODES = 163,
> ATA_ID_DATA_SET_MGMT = 169,
> + ATA_ID_SCT_CMD_XPORT = 206,
> ATA_ID_ROT_SPEED = 217,
> ATA_ID_PIO4 = (1 << 1),
>
> @@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
> }
>
> /**
> + *
> + * Word: 206 - SCT Command Transport
> + * 15:12 - Vendor Specific
> + * 11:6 - Reserved
> + * 5 - SCT Command Transport Data Tables supported
> + * 4 - SCT Command Transport Features Control supported
> + * 3 - SCT Command Transport Error Recovery Control supported
> + * 2 - SCT Command Transport Write Same supported
> + * 1 - SCT Command Transport Long Sector Access supported
> + * 0 - SCT Command Transport supported
> + */
> +static inline bool ata_id_sct_data_tables(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_features_ctrl(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_write_same(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_long_sector_access(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_supported(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
> +}
> +
> +/**
> * ata_id_major_version - get ATA level of drive
> * @id: Identify data
> *
> --
> 2.8.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 1/2] Use kmap_atomic when rewriting attached page
2016-08-10 1:00 ` [PATCH v5 1/2] Use kmap_atomic when rewriting attached page Shaun Tancheff
@ 2016-08-10 10:56 ` Tom Yan
2016-08-10 12:26 ` Shaun Tancheff
2016-08-11 14:13 ` Christoph Hellwig
1 sibling, 1 reply; 16+ messages in thread
From: Tom Yan @ 2016-08-10 10:56 UTC (permalink / raw)
To: Shaun Tancheff
Cc: linux-ide, linux-kernel, Christoph Hellwig, Tejun Heo,
Josh Bingaman, Shaun Tancheff
On 10 August 2016 at 09:00, Shaun Tancheff <shaun@tancheff.com> wrote:
> static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> {
> struct ata_taskfile *tf = &qc->tf;
> struct scsi_cmnd *scmd = qc->scsicmd;
> struct ata_device *dev = qc->dev;
> const u8 *cdb = scmd->cmnd;
> + struct scatterlist *sg;
> u64 block;
> u32 n_block;
> + const u32 trmax = ATA_MAX_TRIM_RNUM;
> u32 size;
> - void *buf;
> u16 fp;
> u8 bp = 0xff;
>
> @@ -3319,10 +3363,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> if (!scsi_sg_count(scmd))
> goto invalid_param_len;
>
> - buf = page_address(sg_page(scsi_sglist(scmd)));
> -
> - if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
> - size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
> + sg = scsi_sglist(scmd);
> + if (n_block <= 0xffff * cmax) {
Although this got moved and corrected in the next patch, but perhaps
you should still correct the `cmax` here, which should be `trmax`.
> + size = ata_format_dsm_trim_descr(sg, trmax, block, n_block);
> } else {
> fp = 2;
> goto invalid_fld;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 6:31 ` Tom Yan
@ 2016-08-10 11:30 ` Tom Yan
2016-08-10 14:34 ` Shaun Tancheff
0 siblings, 1 reply; 16+ messages in thread
From: Tom Yan @ 2016-08-10 11:30 UTC (permalink / raw)
To: Shaun Tancheff
Cc: linux-ide, linux-kernel, Christoph Hellwig, Tejun Heo,
Josh Bingaman, Shaun Tancheff
On 10 August 2016 at 14:31, Tom Yan <tom.ty89@gmail.com> wrote:
> I don't really know about SCT Write Same but there is one concern I
> could I think of.
>
> libata's SATL would report a maximum write same length base on the
> number of sectors a one-block TRIM payload can describe at most, which
> is 65535 * 64 = 4194240 (see ata_scsiop_inq_b0 in libata-scsi.c). If
> the drive does not support TRIM, it will not report such length. That
> is technically fine, as per SBC standard, and I suppose the SCSI disk
> driver would use SD_MAX_WS16_BLOCKS = 0x7fffff (8388607).
Actually it will use SD_MAX_WS10_BLOCKS = 0xffff (65535) in such case.
See sd_config_write_same() in sd.c. So if the device support TRIM,
each SCT Write Same will cover 4194240 logical sectors; if it does
not, each will cover 65535 logical sectors. In that case, perhaps we
should report the same value in the Maximum Write Same Length field
when (only) SCT Write Same is supported? (Not the Optimal Unmap
Granularity field though).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 1/2] Use kmap_atomic when rewriting attached page
2016-08-10 10:56 ` Tom Yan
@ 2016-08-10 12:26 ` Shaun Tancheff
0 siblings, 0 replies; 16+ messages in thread
From: Shaun Tancheff @ 2016-08-10 12:26 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Christoph Hellwig, Tejun Heo,
Josh Bingaman
On Wed, Aug 10, 2016 at 5:56 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 10 August 2016 at 09:00, Shaun Tancheff <shaun@tancheff.com> wrote:
>> static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>> {
>> struct ata_taskfile *tf = &qc->tf;
>> struct scsi_cmnd *scmd = qc->scsicmd;
>> struct ata_device *dev = qc->dev;
>> const u8 *cdb = scmd->cmnd;
>> + struct scatterlist *sg;
>> u64 block;
>> u32 n_block;
>> + const u32 trmax = ATA_MAX_TRIM_RNUM;
>> u32 size;
>> - void *buf;
>> u16 fp;
>> u8 bp = 0xff;
>>
>> @@ -3319,10 +3363,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>> if (!scsi_sg_count(scmd))
>> goto invalid_param_len;
>>
>> - buf = page_address(sg_page(scsi_sglist(scmd)));
>> -
>> - if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
>> - size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
>> + sg = scsi_sglist(scmd);
>> + if (n_block <= 0xffff * cmax) {
>
> Although this got moved and corrected in the next patch, but perhaps
> you should still correct the `cmax` here, which should be `trmax`.
You are correct. I will fix this up. Thanks!
>> + size = ata_format_dsm_trim_descr(sg, trmax, block, n_block);
>> } else {
>> fp = 2;
>> goto invalid_fld;
--
Shaun Tancheff
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 11:30 ` Tom Yan
@ 2016-08-10 14:34 ` Shaun Tancheff
2016-08-10 16:50 ` Tom Yan
2016-08-11 1:33 ` Martin K. Petersen
0 siblings, 2 replies; 16+ messages in thread
From: Shaun Tancheff @ 2016-08-10 14:34 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Christoph Hellwig, Tejun Heo,
Josh Bingaman
On Wed, Aug 10, 2016 at 6:30 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 10 August 2016 at 14:31, Tom Yan <tom.ty89@gmail.com> wrote:
>> I don't really know about SCT Write Same but there is one concern I
>> could I think of.
>>
>> libata's SATL would report a maximum write same length base on the
>> number of sectors a one-block TRIM payload can describe at most, which
>> is 65535 * 64 = 4194240 (see ata_scsiop_inq_b0 in libata-scsi.c). If
>> the drive does not support TRIM, it will not report such length. That
>> is technically fine, as per SBC standard, and I suppose the SCSI disk
>> driver would use SD_MAX_WS16_BLOCKS = 0x7fffff (8388607).
>
> Actually it will use SD_MAX_WS10_BLOCKS = 0xffff (65535) in such case.
> See sd_config_write_same() in sd.c. So if the device support TRIM,
> each SCT Write Same will cover 4194240 logical sectors; if it does
> not, each will cover 65535 logical sectors. In that case, perhaps we
> should report the same value in the Maximum Write Same Length field
> when (only) SCT Write Same is supported? (Not the Optimal Unmap
> Granularity field though).
You are correct in that we can advertise the larger limit in
ata_scsi_dev_config() when only SCT write same is supported
rather than fall back to WS10.
TRIM is bound by an interface maximum. You can only stuff 64 entries
of a 16 bit length followed by 48 bit lba into a 512 byte block.
SCT is not restricted (you can wipe an entire drive) however there
is a practical limit in that I have coded the SCT to operate
in the foreground so the command could timeout depending
on how fast the media can write.
On my machine the default timeout is 30s so to clear 4194240 (16G):
30s -> 547 MB/s
60s -> 274 MB/s
90s -> 183 MB/s
120s -> 137 MB/s
So for my drives 8G and 30s or 16G and 60s is fine.
For older or slow drives 4G and 30s should be fine.
I really am not sure what would be considered the correct
solution though. I believe that the WRITE SAME defaults
are currently being chosen around physical limits.
We could reduce the trim to 16 entries when SCT is available and
bump SCT to the same 16 * 63335 maximum?
I think we can also bump the command timeout for WRITE SAME?
Suggestions are welcome.
--
Shaun Tancheff
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 14:34 ` Shaun Tancheff
@ 2016-08-10 16:50 ` Tom Yan
2016-08-10 18:29 ` Shaun Tancheff
2016-08-11 1:47 ` Martin K. Petersen
2016-08-11 1:33 ` Martin K. Petersen
1 sibling, 2 replies; 16+ messages in thread
From: Tom Yan @ 2016-08-10 16:50 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Christoph Hellwig, Tejun Heo,
Josh Bingaman
On 10 August 2016 at 14:34, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>
> You are correct in that we can advertise the larger limit in
> ata_scsi_dev_config() when only SCT write same is supported
> rather than fall back to WS10.
ata_scsi_dev_config()? Not sure if I follow. We should only need to
report Maximum Write Same Length in the Block Limit VPD
(ata_scsiop_inq_b0).
>
> TRIM is bound by an interface maximum. You can only stuff 64 entries
> of a 16 bit length followed by 48 bit lba into a 512 byte block.
Well that is actually the minimum. Modern SSDs often support more than
one-block payload (e.g. 8, 16...). It's just our SCSI disk driver
statically limit it to the minimum. Though it allows only 0xffffffff /
512 = 8388607 (SD_MAX_WS16_BLOCKS) blocks per WRITE SAME (16) command
anyway, so we can at most allow only a 2-block (well, or 3-block)
payload.
>
> SCT is not restricted (you can wipe an entire drive) however there
> is a practical limit in that I have coded the SCT to operate
> in the foreground so the command could timeout depending
> on how fast the media can write.
>
> On my machine the default timeout is 30s so to clear 4194240 (16G):
You are talking about an AF 4Kn drive I suppose? For a 512e drive it
should be only ~2G.
> 30s -> 547 MB/s
> 60s -> 274 MB/s
> 90s -> 183 MB/s
> 120s -> 137 MB/s
>
> So for my drives 8G and 30s or 16G and 60s is fine.
> For older or slow drives 4G and 30s should be fine.
>
> I really am not sure what would be considered the correct
> solution though. I believe that the WRITE SAME defaults
> are currently being chosen around physical limits.
Not sure about what WRITE SAME defaults and physical limits you are
referring to.
>
> We could reduce the trim to 16 entries when SCT is available and
> bump SCT to the same 16 * 63335 maximum?
I am not sure if that's a good idea. Small TRIM payloads (hence more
TRIM commands) could lead to noticeable overhead in my experience. But
if 4194240 blocks is really too many for SCT Write Same in any case, I
guess we will have to compromise, since the Maximum Write Same Length
field is shared. (Now it feels unfortunate that we decided to switch
from UNMAP -> TRIM to WRITE SAME (16) -> TRIM long ago.) The question
is, do we want the value to stay at 4194240 when SCT Write Same is not
available?
I have no idea what the value should be. But, given the fact sector
size seems to matter much in the SCT case, perhaps at the very least,
we would want to derive the multiplier from that?
>
> I think we can also bump the command timeout for WRITE SAME?
I have no idea where the timeout comes from. Is it even a thing in the
kernel (instead of one in the firmware of the drive or the ACS
standard)?
>
> Suggestions are welcome.
> --
> Shaun Tancheff
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 16:50 ` Tom Yan
@ 2016-08-10 18:29 ` Shaun Tancheff
2016-08-11 8:04 ` Tom Yan
2016-08-11 1:47 ` Martin K. Petersen
1 sibling, 1 reply; 16+ messages in thread
From: Shaun Tancheff @ 2016-08-10 18:29 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Christoph Hellwig, Tejun Heo,
Josh Bingaman
On Wed, Aug 10, 2016 at 11:50 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 10 August 2016 at 14:34, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>
>> You are correct in that we can advertise the larger limit in
>> ata_scsi_dev_config() when only SCT write same is supported
>> rather than fall back to WS10.
>
> ata_scsi_dev_config()? Not sure if I follow. We should only need to
> report Maximum Write Same Length in the Block Limit VPD
> (ata_scsiop_inq_b0).
>
>>
>> TRIM is bound by an interface maximum. You can only stuff 64 entries
>> of a 16 bit length followed by 48 bit lba into a 512 byte block.
>
> Well that is actually the minimum. Modern SSDs often support more than
> one-block payload (e.g. 8, 16...). It's just our SCSI disk driver
> statically limit it to the minimum. Though it allows only 0xffffffff /
> 512 = 8388607 (SD_MAX_WS16_BLOCKS) blocks per WRITE SAME (16) command
> anyway, so we can at most allow only a 2-block (well, or 3-block)
> payload.
Ah. Thanks for the clarification.
>> SCT is not restricted (you can wipe an entire drive) however there
>> is a practical limit in that I have coded the SCT to operate
>> in the foreground so the command could timeout depending
>> on how fast the media can write.
>>
>> On my machine the default timeout is 30s so to clear 4194240 (16G):
>
> You are talking about an AF 4Kn drive I suppose? For a 512e drive it
> should be only ~2G.
I stand corrected. Since all the kernel math is 512 byte sectors you are
absolutely correct and this isn't an issue at all.
We should report SD_MAX_WS16_BLOCKS when only SCT
is available and 4194240 when TRIM is available.
You can safely ignore the remainder of my pointless rambling.
Thanks for you patience none the less.
>> 30s -> 547 MB/s
>> 60s -> 274 MB/s
>> 90s -> 183 MB/s
>> 120s -> 137 MB/s
>>
>> So for my drives 8G and 30s or 16G and 60s is fine.
>> For older or slow drives 4G and 30s should be fine.
>>
>> I really am not sure what would be considered the correct
>> solution though. I believe that the WRITE SAME defaults
>> are currently being chosen around physical limits.
>
> Not sure about what WRITE SAME defaults and physical limits you are
> referring to.
Just that the WRITE SAME limit SD_MAX_WS16_BLOCKS is derived
from the request interface as opposed to some other arbitrary limit.
>>
>> We could reduce the trim to 16 entries when SCT is available and
>> bump SCT to the same 16 * 63335 maximum?
>
> I am not sure if that's a good idea. Small TRIM payloads (hence more
> TRIM commands) could lead to noticeable overhead in my experience. But
> if 4194240 blocks is really too many for SCT Write Same in any case, I
> guess we will have to compromise, since the Maximum Write Same Length
> field is shared. (Now it feels unfortunate that we decided to switch
> from UNMAP -> TRIM to WRITE SAME (16) -> TRIM long ago.) The question
> is, do we want the value to stay at 4194240 when SCT Write Same is not
> available?
>
> I have no idea what the value should be. But, given the fact sector
> size seems to matter much in the SCT case, perhaps at the very least,
> we would want to derive the multiplier from that?
>
>>
>> I think we can also bump the command timeout for WRITE SAME?
>
> I have no idea where the timeout comes from. Is it even a thing in the
> kernel (instead of one in the firmware of the drive or the ACS
> standard)?
Oh ... the timeout I was thinking of is this (or defaulted from):
/sys/block/sdX/device/timeout
It's the struct request_queue's 'rq_timeout'
At least that's where my line of thought was going.
I will update the patch to use SD_MAX_WS16_BLOCKS and 4194240 as appropriate.
Regards,
Shaun
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 14:34 ` Shaun Tancheff
2016-08-10 16:50 ` Tom Yan
@ 2016-08-11 1:33 ` Martin K. Petersen
1 sibling, 0 replies; 16+ messages in thread
From: Martin K. Petersen @ 2016-08-11 1:33 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Tom Yan, Shaun Tancheff, linux-ide, LKML, Christoph Hellwig,
Tejun Heo, Josh Bingaman
>>>>> "Shaun" == Shaun Tancheff <shaun.tancheff@seagate.com> writes:
Shaun,
Shaun> You are correct in that we can advertise the larger limit in
Shaun> ata_scsi_dev_config() when only SCT write same is supported
Shaun> rather than fall back to WS10.
I deliberately capped WRITE SAME to 64K blocks unless otherwise reported
by the device because:
a) Several older drives supported the WRITE SAME(16) command but
ignored the upper bytes of the transfer length effectively
turning it into a WRITE SAME(10).
b) 64K blocks was the sweet spot for older drives as well, a
size commonly used by RAID array firmwares that were the only
commonplace users of the WRITE SAME family.
Shaun> I really am not sure what would be considered the correct
Shaun> solution though. I believe that the WRITE SAME defaults are
Shaun> currently being chosen around physical limits.
Lacking a solid reporting facility in the spec, I suggest you just leave
it unset and let the SCSI defaults apply.
Shaun> I think we can also bump the command timeout for WRITE SAME?
The default WRITE SAME timeout is 120s.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 16:50 ` Tom Yan
2016-08-10 18:29 ` Shaun Tancheff
@ 2016-08-11 1:47 ` Martin K. Petersen
2016-08-11 8:46 ` Tom Yan
1 sibling, 1 reply; 16+ messages in thread
From: Martin K. Petersen @ 2016-08-11 1:47 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, Shaun Tancheff, linux-ide, LKML,
Christoph Hellwig, Tejun Heo, Josh Bingaman
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
Tom> Well that is actually the minimum. Modern SSDs often support more
Tom> than one-block payload (e.g. 8, 16...). It's just our SCSI disk
Tom> driver statically limit it to the minimum. Though it allows only
Tom> 0xffffffff / 512 = 8388607 (SD_MAX_WS16_BLOCKS) blocks per WRITE
Tom> SAME (16) command anyway,
Yes, because that's the biggest command we can express in the block
layer.
Tom> so we can at most allow only a 2-block (well, or 3-block) payload.
We tried turning on multi block payloads and it was a massive disaster.
Many drives reported that they supported 8 block payloads but actually
didn't. Instead of playing the blacklist game we capped it at a single
sector.
Many drives from different vendors were affected by this. So we'd have
to make multi block payloads an explicit opt-in like we did for
discard_zeroes_data. However, given that "big" discards are mainly done
synchronously when creating filesystems, I am not sure there is any real
benefit to this.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-10 18:29 ` Shaun Tancheff
@ 2016-08-11 8:04 ` Tom Yan
0 siblings, 0 replies; 16+ messages in thread
From: Tom Yan @ 2016-08-11 8:04 UTC (permalink / raw)
To: Shaun Tancheff, martin.petersen
Cc: Shaun Tancheff, linux-ide, LKML, Christoph Hellwig, Tejun Heo,
Josh Bingaman
On 11 August 2016 at 02:29, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>
>> You are talking about an AF 4Kn drive I suppose? For a 512e drive it
>> should be only ~2G.
>
> I stand corrected. Since all the kernel math is 512 byte sectors you are
> absolutely correct and this isn't an issue at all.
>
> We should report SD_MAX_WS16_BLOCKS when only SCT
> is available and 4194240 when TRIM is available.
Why would you come up with such decision/conclusion? I thought
SD_MAX_WS16_BLOCKS could even be too big for 512e drive (~4G per WRITE
SAME command)?
If 4194240 (~2G per command) isn't too big for SCT Write Same, then we
shoud probably stick with it in both cases (when only SCT Write Same
is available / when TRIM is also available) to maintain consistency.
Also Maximum Write Same Length should be multiplied by the actual
logical sector size of the drive, which would be 4096 in the case of
4Kn drives. (If the kernel isn't doing that, it's simply a bug.)
Therefore, if I haven't missed anything, we'll need to divide
ATA_MAX_TRIM_RNUM by (logical sector size / 512) anyway, otherwise
discard_max_bytes and write_same_max_bytes would overflow with 4Kn
drive (i.e. Maximum Write Same Length needs to be <= 0xffffffff /
4096). 4Kn SSDs may not be a thing on the market yet, but apparently
it's a different story for traditional HDDs, and SCT Write Same isn't
only available on SSDs. The division should not change the current
behaviour on drives with 512-byte logical sectors.
I'll be sending a patch on that.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-11 1:47 ` Martin K. Petersen
@ 2016-08-11 8:46 ` Tom Yan
2016-08-12 1:18 ` Martin K. Petersen
0 siblings, 1 reply; 16+ messages in thread
From: Tom Yan @ 2016-08-11 8:46 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Shaun Tancheff, Shaun Tancheff, linux-ide, LKML,
Christoph Hellwig, Tejun Heo, Josh Bingaman
On 11 August 2016 at 09:47, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
> Tom> so we can at most allow only a 2-block (well, or 3-block) payload.
>
> We tried turning on multi block payloads and it was a massive disaster.
> Many drives reported that they supported 8 block payloads but actually
> didn't. Instead of playing the blacklist game we capped it at a single
> sector.
I don't know, apparently Windows does multi block payloads though (at
least that's how it advertise on the simulated VPD).
What I meant was it will not make a big difference in our case anyway.
Given the 32-bit representation limitation, we could be at best using
a full 2-block payload. So let's not do 2 but 1? Fine :P
>
> Many drives from different vendors were affected by this. So we'd have
> to make multi block payloads an explicit opt-in like we did for
> discard_zeroes_data. However, given that "big" discards are mainly done
> synchronously when creating filesystems, I am not sure there is any real
> benefit to this.
>
Probably. Perhaps it could make a difference upon deletion of some
really big files (though the logical sectors used may not be
continuous anyway).
> --
> Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 1/2] Use kmap_atomic when rewriting attached page
2016-08-10 1:00 ` [PATCH v5 1/2] Use kmap_atomic when rewriting attached page Shaun Tancheff
2016-08-10 10:56 ` Tom Yan
@ 2016-08-11 14:13 ` Christoph Hellwig
1 sibling, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2016-08-11 14:13 UTC (permalink / raw)
To: Shaun Tancheff
Cc: linux-ide, linux-kernel, Christoph Hellwig, Tejun Heo,
Josh Bingaman, Shaun Tancheff
> +static unsigned int ata_format_dsm_trim_descr(struct scatterlist *sg, u32 num,
> + u64 sector, u32 count)
> +{
> + void *ptr = kmap_atomic(sg_page(sg));
Please use sg_copy_from_buffer to be safe against all corner cases.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 2/2] Add support for SCT Write Same
2016-08-11 8:46 ` Tom Yan
@ 2016-08-12 1:18 ` Martin K. Petersen
0 siblings, 0 replies; 16+ messages in thread
From: Martin K. Petersen @ 2016-08-12 1:18 UTC (permalink / raw)
To: Tom Yan
Cc: Martin K. Petersen, Shaun Tancheff, Shaun Tancheff, linux-ide,
LKML, Christoph Hellwig, Tejun Heo, Josh Bingaman
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>> Many drives from different vendors were affected by this. So we'd
>> have to make multi block payloads an explicit opt-in like we did for
>> discard_zeroes_data. However, given that "big" discards are mainly
>> done synchronously when creating filesystems, I am not sure there is
>> any real benefit to this.
Tom> Probably. Perhaps it could make a difference upon deletion of some
Tom> really big files (though the logical sectors used may not be
Tom> continuous anyway).
Nope. And you've got the bio size limit getting in the way as well.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-08-12 1:19 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-10 1:00 [PATCH v5 0/2] Add support for SCT Write Same Shaun Tancheff
2016-08-10 1:00 ` [PATCH v5 1/2] Use kmap_atomic when rewriting attached page Shaun Tancheff
2016-08-10 10:56 ` Tom Yan
2016-08-10 12:26 ` Shaun Tancheff
2016-08-11 14:13 ` Christoph Hellwig
2016-08-10 1:00 ` [PATCH v5 2/2] Add support for SCT Write Same Shaun Tancheff
2016-08-10 6:31 ` Tom Yan
2016-08-10 11:30 ` Tom Yan
2016-08-10 14:34 ` Shaun Tancheff
2016-08-10 16:50 ` Tom Yan
2016-08-10 18:29 ` Shaun Tancheff
2016-08-11 8:04 ` Tom Yan
2016-08-11 1:47 ` Martin K. Petersen
2016-08-11 8:46 ` Tom Yan
2016-08-12 1:18 ` Martin K. Petersen
2016-08-11 1:33 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox