* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Shaun Tancheff @ 2016-08-22 22:00 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEnMS+wYWiwLxoYsTDOoVb9qJ=gZbcQmhsZumiO+wbQi4A@mail.gmail.com>
On Mon, Aug 22, 2016 at 4:20 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>> Safely overwriting the attached page to ATA format from the SCSI formatted
>> variant.
>>
>> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>> ---
>> v6:
>> - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>> - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
>> v5:
>> - Added prep patch to work with non-page aligned scatterlist pages
>> and use kmap_atomic() to lock page during modification.
>>
>> drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>> include/linux/ata.h | 26 ----------------------
>> 2 files changed, 51 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index e207b33..7990cb2 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>> return 1;
>> }
>>
>> +/**
>> + * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>> + * @cmd: SCSI command being translated
>> + * @num: Maximum number of entries (nominally 64).
>> + * @sector: Starting sector
>> + * @count: Total Range of request
>> + *
>> + * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>> + * descriptor.
>> + *
>> + * Upto 64 entries of the format:
>> + * 63:48 Range Length
>> + * 47:0 LBA
>> + *
>> + * Range Length of 0 is ignored.
>> + * LBA's should be sorted order and not overlap.
>> + *
>> + * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
>> + */
>> +static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>> + u64 sector, u32 count)
>> +{
>> + __le64 *buffer;
>> + u32 i = 0, used_bytes;
>> + unsigned long flags;
>> +
>> + BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
>
> Why do we want to check "512" (a literal number) against
> ATA_SCSI_RBUF_SIZE here?
Because this is re-using the response buffer in a new way.
Such reuse could be a surprise to someone refactoring that
code, although it's pretty unlikely. The build bug on
gives some level of confidence that it won't go unnoticed.
It also codifies the assumption that we can write 512 bytes
to the global ata_scsi_rbuf buffer.
As to using a literal 512, I was just following what was here
before.
It looks like there is a ATA_SECT_SIZE that can replace most
of the literal 512's in here though. That might be a nice cleanup
overall. Not sure it belongs here though.
>> +
>> + spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>> + buffer = ((void *)ata_scsi_rbuf);
>> + while (i < num) {
>> + u64 entry = sector |
>> + ((u64)(count > 0xffff ? 0xffff : count) << 48);
>> + buffer[i++] = __cpu_to_le64(entry);
>> + if (count <= 0xffff)
>> + break;
>> + count -= 0xffff;
>> + sector += 0xffff;
>> + }
>> +
>> + used_bytes = ALIGN(i * 8, 512);
>> + memset(buffer + i, 0, used_bytes - i * 8);
>> + sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>> + spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>> +
>> + return used_bytes;
>> +}
>> +
>> static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>> {
>> struct ata_taskfile *tf = &qc->tf;
>> @@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>> const u8 *cdb = scmd->cmnd;
>> u64 block;
>> u32 n_block;
>> + const u32 trmax = ATA_MAX_TRIM_RNUM;
>> u32 size;
>> - void *buf;
>> u16 fp;
>> u8 bp = 0xff;
>>
>> @@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>> if (!scsi_sg_count(scmd))
>> goto invalid_param_len;
>>
>> - buf = page_address(sg_page(scsi_sglist(scmd)));
>> -
>> - if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
>> - size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
>> + if (n_block <= 0xffff * trmax) {
>> + size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>> } else {
>> fp = 2;
>> goto invalid_fld;
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index adbc812..45a1d71 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>> #endif
>> }
>>
>> -/*
>> - * Write LBA Range Entries to the buffer that will cover the extent from
>> - * sector to sector + count. This is used for TRIM and for ADD LBA(S)
>> - * TO NV CACHE PINNED SET.
>> - */
>> -static inline unsigned ata_set_lba_range_entries(void *_buffer,
>> - unsigned num, u64 sector, unsigned long count)
>> -{
>> - __le64 *buffer = _buffer;
>> - unsigned i = 0, used_bytes;
>> -
>> - while (i < num) {
>> - u64 entry = sector |
>> - ((u64)(count > 0xffff ? 0xffff : count) << 48);
>> - buffer[i++] = __cpu_to_le64(entry);
>> - if (count <= 0xffff)
>> - break;
>> - count -= 0xffff;
>> - sector += 0xffff;
>> - }
>> -
>> - used_bytes = ALIGN(i * 8, 512);
>> - memset(buffer + i, 0, used_bytes - i * 8);
>> - return used_bytes;
>> -}
>> -
>> static inline bool ata_ok(u8 status)
>> {
>> return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
>> --
>> 2.9.3
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Shaun Tancheff @ 2016-08-22 22:07 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEnZe0iipr2hO=QDSq6s7+5JoevXvPNes=WP7gwToFzHsw@mail.gmail.com>
On Mon, Aug 22, 2016 at 3:14 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 03:43, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>
>> Why would we enforce upper level limits on something that doesn't
>> have any?
>
> If we advertise a limit in our SATL, it makes sense that we should
> make sure the behaviour is consistent when we issue a write same
> through the block layer / ioctl and when we issue a SCSI Write Same
> command directly (e.g. with sg_write_same). IMHO that's pretty much
> why SBC would mandate such behaviour as well.
Breaking would be advertising a limit that is too high and failing.
Advertising a lower limit and succeeding may not be ideal for all
possible use cases, but it's not breaking behaviour.
>>
>> If the upper level, or SG_IO, chooses to set a timeout of 10 hours and
>> wipe a whole disk it should be free to do so.
>>
>
> That's why I said, "if you are going to advertise an Maximum Write Same Length".
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Tom Yan @ 2016-08-22 23:09 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszAHcuyLoSJwmKvK5vgM8aLqfQJsaw=Vx7K66wMEaqmu_A@mail.gmail.com>
I am not sure about what you mean here. Rejecting SCSI Write Same
commands that has its "number of blocks" field set to a value higher
than the device's reported Maximum Write Same Length is only natural
and mandated by SBC. We have no reason (even if it is practically not
a must) not to do it while we are implementing a SCSI-ATA Translation
Layer here as long as we advertise Maximum Write Same Length. It does
not matter here whether the command ends up being translated to SCT
Write Same or TRIM.
How high or how lower the limit should be advertised has nothing to do
with the checking.
FWIW, letting the SCSI/block layer fall back with SD_MAX_WS10_BLOCKS
does NOT count as advertising Maximum Write Same Length, that's why we
may or may not (in terms of SBC) check n_block against it if we are
really gonna leave ata_scsiop_inq_b0 in libata-scsi untouched.
On 22 August 2016 at 22:07, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 3:14 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 23 August 2016 at 03:43, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>>
>>> Why would we enforce upper level limits on something that doesn't
>>> have any?
>>
>> If we advertise a limit in our SATL, it makes sense that we should
>> make sure the behaviour is consistent when we issue a write same
>> through the block layer / ioctl and when we issue a SCSI Write Same
>> command directly (e.g. with sg_write_same). IMHO that's pretty much
>> why SBC would mandate such behaviour as well.
>
> Breaking would be advertising a limit that is too high and failing.
> Advertising a lower limit and succeeding may not be ideal for all
> possible use cases, but it's not breaking behaviour.
>
>>>
>>> If the upper level, or SG_IO, chooses to set a timeout of 10 hours and
>>> wipe a whole disk it should be free to do so.
>>>
>>
>> That's why I said, "if you are going to advertise an Maximum Write Same Length".
>
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Tom Yan @ 2016-08-22 23:49 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszBVXftz9hqmOgWG5DvxW7cQjFPUFaGygWcj_exgNwT4=w@mail.gmail.com>
The only 512 I can see in the old code is the one in:
> - used_bytes = ALIGN(i * 8, 512);
where the alignment is necessary because 'used_bytes' will be returned
as 'size', which will be used for setting the number of 512-byte block
payload when we construct the ATA taskfile / command. It may NOT be a
good idea to replace it with ATA_SECT_SIZE. Some comment could be
nice.
So I don't think it makes any sense to check ATA_SCSI_RBUF_SIZE
against 512 here.
On 22 August 2016 at 22:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> Because this is re-using the response buffer in a new way.
> Such reuse could be a surprise to someone refactoring that
> code, although it's pretty unlikely. The build bug on
> gives some level of confidence that it won't go unnoticed.
> It also codifies the assumption that we can write 512 bytes
> to the global ata_scsi_rbuf buffer.
>
> As to using a literal 512, I was just following what was here
> before.
>
> It looks like there is a ATA_SECT_SIZE that can replace most
> of the literal 512's in here though. That might be a nice cleanup
> overall. Not sure it belongs here though.
>
>>> + used_bytes = ALIGN(i * 8, 512);
>>> + memset(buffer + i, 0, used_bytes - i * 8);
>>> + sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
And I don't think you have a reason to use 512 here either. It appears
to me that you should use ATA_SCSI_RBUF_SIZE instead (see
ata_scsi_rbuf_put in libata-scsi). If not, it should probably be a
_derived_ value (e.g. from `i`) that tells the _actual_ size of
`buffer`.
Again, note that even when we prefer to stick with _one_ 512-byte
block TRIM payload, we should probably NEVER _hard code_ such limit
(for it's really ugly and unnecessary) in functions like this. All we
need to do is to advertise the limit (or lower) as Maximum Write
Length, and make sure our SATL works as per SBC that it will reject
SCSI Write Same commands that is "larger" than that.
>>> + spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>> +
>>> + return used_bytes;
>>> +}
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Shaun Tancheff @ 2016-08-23 0:36 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEm5Ur-7eCj-_mJsTeDwxhkGHPERAodebDS7rRg0AvGn3Q@mail.gmail.com>
On Mon, Aug 22, 2016 at 6:09 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> I am not sure about what you mean here. Rejecting SCSI Write Same
> commands that has its "number of blocks" field set to a value higher
> than the device's reported Maximum Write Same Length is only natural
> and mandated by SBC. We have no reason (even if it is practically not
> a must) not to do it while we are implementing a SCSI-ATA Translation
> Layer here as long as we advertise Maximum Write Same Length. It does
> not matter here whether the command ends up being translated to SCT
> Write Same or TRIM.
>
> How high or how lower the limit should be advertised has nothing to do
> with the checking.
>
> FWIW, letting the SCSI/block layer fall back with SD_MAX_WS10_BLOCKS
> does NOT count as advertising Maximum Write Same Length, that's why we
> may or may not (in terms of SBC) check n_block against it if we are
> really gonna leave ata_scsiop_inq_b0 in libata-scsi untouched.
Sorry I'm still a bit confused.
SCT Write Same does not have a limit ... it's a u64 of logical sectors.
Any limit specified is smaller based on other parts of the stack.
The SATL code being used to emulating SCSI Write Same which does
have a limited number of sectors .. so falling back to the SCSI limit
seems reasonable.
So the limit that is being applied is either the current TRIM limit,
or the SCSI Write Same limit.
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Shaun Tancheff @ 2016-08-23 1:01 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSE=t1qg5ukA7btSwD=AaQZB6W6bmF9HMC_K2JsCqoSNGVg@mail.gmail.com>
On Mon, Aug 22, 2016 at 6:49 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> The only 512 I can see in the old code is the one in:
>
>> - used_bytes = ALIGN(i * 8, 512);
>
> where the alignment is necessary because 'used_bytes' will be returned
> as 'size', which will be used for setting the number of 512-byte block
> payload when we construct the ATA taskfile / command. It may NOT be a
> good idea to replace it with ATA_SECT_SIZE. Some comment could be
> nice.
Not sure I agree with that analysis. Could just as well assign it directly,
as ALIGN() is just superfluous here.
Later the count is always 512/512 -> 1. Always.
"i" is used here to limit the number of bytes that need to be memset()
after filling to payload. Personally I think memset is fast enough that
it is better to do before rather than later but I figure if the code works
let it be.
> So I don't think it makes any sense to check ATA_SCSI_RBUF_SIZE
> against 512 here.
Again, not sure I agree, but I don't really care on that point. Just many
years of defensive coding.
> On 22 August 2016 at 22:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> Because this is re-using the response buffer in a new way.
>> Such reuse could be a surprise to someone refactoring that
>> code, although it's pretty unlikely. The build bug on
>> gives some level of confidence that it won't go unnoticed.
>> It also codifies the assumption that we can write 512 bytes
>> to the global ata_scsi_rbuf buffer.
>>
>> As to using a literal 512, I was just following what was here
>> before.
>>
>> It looks like there is a ATA_SECT_SIZE that can replace most
>> of the literal 512's in here though. That might be a nice cleanup
>> overall. Not sure it belongs here though.
>>
>
>>>> + used_bytes = ALIGN(i * 8, 512);
>>>> + memset(buffer + i, 0, used_bytes - i * 8);
>>>> + sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>
> And I don't think you have a reason to use 512 here either. It appears
> to me that you should use ATA_SCSI_RBUF_SIZE instead (see
> ata_scsi_rbuf_put in libata-scsi). If not, it should probably be a
> _derived_ value (e.g. from `i`) that tells the _actual_ size of
> `buffer`.
Nope. We *must* copy the whole 512 byte payload into the sglist.
Otherwise what was the point of the memset to clear out an cruft?
Failing to move the whole payload into place could leave whatever
garbage is in the buffer to be interpreted as an actual trim and do
real damage. I certainly can't use ATA_SCSI_RBUF_SIZE because
the payload attached to the cmd need only be 512 bytes. Trying
to copy in 4k is going to cause bad things when you check
the return from sg_copy_from_buffer() and notice you failed to
copy in you payload.
> Again, note that even when we prefer to stick with _one_ 512-byte
> block TRIM payload, we should probably NEVER _hard code_ such limit
> (for it's really ugly and unnecessary) in functions like this. All we
The interface requires well formed 512 byte chunks so we have to
at least have to do enough to ensure that we work in multiples of
512. Since 512 is all over the spec for this type of thing I think
it would be reasonable to have a define or enum if you don't think
reusing ATA_SECT_SIZE is good maybe something
like ATA_CMD_XFER_SIZE ?
> need to do is to advertise the limit (or lower) as Maximum Write
> Length, and make sure our SATL works as per SBC that it will reject
> SCSI Write Same commands that is "larger" than that.
>>>> + spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>>> +
>>>> + return used_bytes;
>>>> +}
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Tom Yan @ 2016-08-23 5:26 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszBP_JLgKUsknSmG+AuCVW8VEB4VUuaTNQ-MrE9B0nuXBg@mail.gmail.com>
On 23 August 2016 at 01:01, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 6:49 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> The only 512 I can see in the old code is the one in:
>>
>>> - used_bytes = ALIGN(i * 8, 512);
>>
>> where the alignment is necessary because 'used_bytes' will be returned
>> as 'size', which will be used for setting the number of 512-byte block
>> payload when we construct the ATA taskfile / command. It may NOT be a
>> good idea to replace it with ATA_SECT_SIZE. Some comment could be
>> nice.
>
> Not sure I agree with that analysis. Could just as well assign it directly,
> as ALIGN() is just superfluous here.
> Later the count is always 512/512 -> 1. Always.
It is always 1 merely because we prefer sticking to that. Say we want
to enable multi-block payload now, it won't be 1 anymore.
Also note that the payload is NOT always fully-filled. Say, I issue a
discard request of 1M or a SCSI Write Same (16) commands of 2048
blocks. If you do not round up used_bytes, the result of the division
will be 0 in those cases.
Basically that's the mathematical version of the story of why we need
the following memset().
>
> "i" is used here to limit the number of bytes that need to be memset()
> after filling to payload. Personally I think memset is fast enough that
> it is better to do before rather than later but I figure if the code works
> let it be.
Not sure what you mean by "fast enough"/"do before". What memset() do
here is to locate the offset/location to starting filling the memory
(buffer + i) and fill zero until the payload is 512-byte aligned
(used_bytes - i * 8, where used_bytes is an aligned/rounded-up-to-512
version of i * 8). Again, it does NOT and should NOT limit the
allocated memory to _one_ 512-byte block.
>
>> So I don't think it makes any sense to check ATA_SCSI_RBUF_SIZE
>> against 512 here.
>
> Again, not sure I agree, but I don't really care on that point. Just many
> years of defensive coding.
The thing is you can't even tell what exactly you are defending here.
This would only make the perhaps already obfuscated code more
obfuscated, without actually preventing any bad thing to happen.
The only case you would want to check ATA_SCSI_RBUF_SIZE here is, you
somehow want to use it as buflen parameter in the following
sg_copy_from_buffer call. Then you may say "I want to make sure that
we wouldn't carelessly decrease the currently 4096 ATA_SCSI_RBUF_SIZE
to a value smaller than 512 someday, which cannot even accommodate a
single full-block payload." And even that's a weak reason.
>
>> On 22 August 2016 at 22:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> Because this is re-using the response buffer in a new way.
>>> Such reuse could be a surprise to someone refactoring that
>>> code, although it's pretty unlikely. The build bug on
>>> gives some level of confidence that it won't go unnoticed.
>>> It also codifies the assumption that we can write 512 bytes
>>> to the global ata_scsi_rbuf buffer.
>>>
>>> As to using a literal 512, I was just following what was here
>>> before.
>>>
>>> It looks like there is a ATA_SECT_SIZE that can replace most
>>> of the literal 512's in here though. That might be a nice cleanup
>>> overall. Not sure it belongs here though.
>>>
>>
>>>>> + used_bytes = ALIGN(i * 8, 512);
>>>>> + memset(buffer + i, 0, used_bytes - i * 8);
>>>>> + sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>>
>> And I don't think you have a reason to use 512 here either. It appears
>> to me that you should use ATA_SCSI_RBUF_SIZE instead (see
>> ata_scsi_rbuf_put in libata-scsi). If not, it should probably be a
>> _derived_ value (e.g. from `i`) that tells the _actual_ size of
>> `buffer`.
>
> Nope. We *must* copy the whole 512 byte payload into the sglist.
> Otherwise what was the point of the memset to clear out an cruft?
> Failing to move the whole payload into place could leave whatever
> garbage is in the buffer to be interpreted as an actual trim and do
> real damage. I certainly can't use ATA_SCSI_RBUF_SIZE because
> the payload attached to the cmd need only be 512 bytes. Trying
> to copy in 4k is going to cause bad things when you check
> the return from sg_copy_from_buffer() and notice you failed to
> copy in you payload.
I don't really know how scatter/gather list and the related functions
work to be honest. You should probably use used_bytes then, since
used_bytes is exactly the final size of buffer (after memset, and it
is 512-byte aligned, but not always 512 in terms of logic).
>
>> Again, note that even when we prefer to stick with _one_ 512-byte
>> block TRIM payload, we should probably NEVER _hard code_ such limit
>> (for it's really ugly and unnecessary) in functions like this. All we
>
> The interface requires well formed 512 byte chunks so we have to
> at least have to do enough to ensure that we work in multiples of
> 512. Since 512 is all over the spec for this type of thing I think
> it would be reasonable to have a define or enum if you don't think
> reusing ATA_SECT_SIZE is good maybe something
> like ATA_CMD_XFER_SIZE ?
I have no idea. Perhaps it is fine to use ATA_SECT_SIZE if what you
described ("512 is all over the spec for this type of thing") is true.
It's just personally I don't see any benefit in terms of readability
introducing macro like that. If anything should be done, I would say
it is to add comment above ALIGN() and the ATA taskfiles pointing out
512 is used because ACS specifically pointed out that "TRIM payload
blocks are always 512-byte, but not the logical sector size or so".
>
>> need to do is to advertise the limit (or lower) as Maximum Write
>> Length, and make sure our SATL works as per SBC that it will reject
>> SCSI Write Same commands that is "larger" than that.
>>>>> + spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>>>> +
>>>>> + return used_bytes;
>>>>> +}
>
>
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Tom Yan @ 2016-08-23 5:55 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszC_MfPtGTR05p=qSHjY_wu4ziMMAqgra7Pu0tQYoxa-Kg@mail.gmail.com>
On 23 August 2016 at 00:36, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 6:09 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> I am not sure about what you mean here. Rejecting SCSI Write Same
>> commands that has its "number of blocks" field set to a value higher
>> than the device's reported Maximum Write Same Length is only natural
>> and mandated by SBC. We have no reason (even if it is practically not
>> a must) not to do it while we are implementing a SCSI-ATA Translation
>> Layer here as long as we advertise Maximum Write Same Length. It does
>> not matter here whether the command ends up being translated to SCT
>> Write Same or TRIM.
>>
>> How high or how lower the limit should be advertised has nothing to do
>> with the checking.
>>
>> FWIW, letting the SCSI/block layer fall back with SD_MAX_WS10_BLOCKS
>> does NOT count as advertising Maximum Write Same Length, that's why we
>> may or may not (in terms of SBC) check n_block against it if we are
>> really gonna leave ata_scsiop_inq_b0 in libata-scsi untouched.
>
> Sorry I'm still a bit confused.
It's alright. I guess I am not good at expressing my ideas. (And poor English)
>
> SCT Write Same does not have a limit ... it's a u64 of logical sectors.
As I've said, it is not really about SCT Write Same or TRIM. The check
here is added just to make our SATL (which is in some sense a SCSI
device) behave as how we advertised it (Maximum Write Same Length /
rbuf[36] set in ata_scsiop_inq_b0, that's why the value of the check
should always be the same as that).
So for both cases (TRIM and SCT Write Same), _if_ we advertise Maximum
Write Same Length, we should make our SATL (or in other word, our
"SCSI device") reject SCSI Write Same commands as per what we "told"
the users (and the SCSI/block layer).
The SCSI disk driver or so would NOT reject Write Same commands for us
as per the Maximum Write Same Length we reported, because it is the
responsibility of the SCSI device itself (as per SBC). The limit will
only be used to tell the block layer how large at max should each
split of the discard / write same request be.
> Any limit specified is smaller based on other parts of the stack.
> The SATL code being used to emulating SCSI Write Same which does
> have a limited number of sectors .. so falling back to the SCSI limit
> seems reasonable.
I am really not going to tell the whole story here again. We have a
really long discussion on whether we should advertise Maximum Write
Same Length for SCT Write Same, and the value we should advertise.
(Didn't we come to an conclusion on that as well?)
The check should be added here is just a knee-jerk reaction for the
Maximum Write Same Length we advertise. If you end up not advertising
one for SCT Write Same, you don't need to add the check here.
>
> So the limit that is being applied is either the current TRIM limit,
> or the SCSI Write Same limit.
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Shaun Tancheff @ 2016-08-23 6:11 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEmLq0qcg8_LZBCqFYJohZORBSe58SSjC3sK7cCf=0g9EQ@mail.gmail.com>
On Tue, Aug 23, 2016 at 12:55 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> I am really not going to tell the whole story here again. We have a
> really long discussion on whether we should advertise Maximum Write
> Same Length for SCT Write Same, and the value we should advertise.
> (Didn't we come to an conclusion on that as well?)
I had thought the conclusion was leave b0 as is and let the WS10 limit
take effect per Martin.
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Shaun Tancheff @ 2016-08-23 6:20 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEmsgp93X85-bgFyV0CdQbvdMHZsWj6e=VNH_7dUDUewHA@mail.gmail.com>
On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> It is always 1 merely because we prefer sticking to that. Say we want
> to enable multi-block payload now, it won't be 1 anymore.
Sorry, I though that DSM TRIM is using 512 bytes here because
WRITE_SAME_16 has a payload of a single logical sector.
> Also note that the payload is NOT always fully-filled.
Oh, I was considering filling the remainder of the buffer
with 0's using memset() as you described to be "filly-filled"
in this case. Sorry for the confusion.
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Shaun Tancheff @ 2016-08-23 7:30 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAGnHSEm18o-8ZRhyFnyQ22rUm+gJLWwJyxoA-iSJnkM3k4oHHQ@mail.gmail.com>
On Mon, Aug 22, 2016 at 3:53 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 20:32, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Mon, Aug 22, 2016 at 3:07 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>>> I don't see how that's possible. count / n_block will always be
>>> smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention
>>> that isn't even a "buffer limit" anyway. By SG_IO do you mean like
>>> SCSI Write Same commands that issued with sg_write_same or so? If
>>> that's the case, that's what exactly commit 5c79097a28c2
>>> ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds
>>> limit") is for.
>>
>> Ah, I see. You are guarding the only user of ata_set_lba_range_entries().
>
> Yup. It is the only right thing to do anyway, that we leave the
> function "open" and guard per context when we use it. Say if
> ata_set_lba_range_entries() is gonna be a function that is shared by
> others, it would only make this commit more important. As I said, we
> did not guard it with a certain buffer limit, but merely redundantly
> guard it with a ("humanized") limit that applies to TRIM only.
But the "humanized" limit is the one you just added and proceeded to
change ata_set_lba_range_entries(). You changed from a buffer size
to use "num" instead and now you want to remove the protection
entirely?
Why not just change to put this in front of ata_set_lba_range_entries()
if (n_block > 65535 * ATA_MAX_TRIM_RNUM) {
fp = 2;
goto invalid_fld;
}
And then restore ata_set_lba_range_entries() to how it looked
before you changed it in commit:
2983860c7 (libata-scsi: avoid repeated calculation of number of TRIM ranges)
Then you can have ata_set_lba_range_entries() take the buffer size ...
something like the following would be fine:
size = ata_set_lba_range_entries(buf, scmd->device->sector_size,
block, n_block);
Now things are happily protected against both exceeding the b0 limit(s) and
overflowing the sglist buffer.
--
Shaun
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Tom Yan @ 2016-08-23 7:53 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszD8HDqQMxyAnU=Yh6UDA+KE13u7Mq6s248uguRf-uUkYg@mail.gmail.com>
On 23 August 2016 at 06:20, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>
>> It is always 1 merely because we prefer sticking to that. Say we want
>> to enable multi-block payload now, it won't be 1 anymore.
>
> Sorry, I though that DSM TRIM is using 512 bytes here because
> WRITE_SAME_16 has a payload of a single logical sector.
Nope, SCSI Write Same commands does not have payload (or in SCSI
terms, parameter list / data-out buffer).
>
>> Also note that the payload is NOT always fully-filled.
>
> Oh, I was considering filling the remainder of the buffer
> with 0's using memset() as you described to be "filly-filled"
> in this case. Sorry for the confusion.
Well yeah _after_ memset() it should always be "fully-filled" (i.e.
512-byte aligned).
>
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Tom Yan @ 2016-08-23 7:57 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszCU63Dahdu6ZoKGgbWwwma=Mq5zji4j8ya=57NCwFUB_w@mail.gmail.com>
Hmm,
On 22 August 2016 at 18:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>
> Timeout for WS is 120 seconds so we should be fine there.
>
> The number to look for is the:
> Max. Sustained Transfer Rate OD (MB/s): 190 8TB (180 5TB)
>
> Which means the above drives should complete a 2G write in
> about 10 to 11 seconds.
>
> If these were 4Kn drives and we allowed a 16G max then it
> would be 80-90 seconds, assuming the write speed didn't
> get any better.
>
> So holding the maximum to around 2G is probably the best
> overall, in my opinion.
>
> --
> Shaun Tancheff
On 23 August 2016 at 06:11, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>
> I had thought the conclusion was leave b0 as is and let the WS10 limit
> take effect per Martin.
If that's your final call then you do not need to add the check for
SCT Write Same. (But please make sure that the check for TRIM does not
go away)
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Tom Yan @ 2016-08-23 8:37 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAJ48U8XDb7S2WPSDBtn+VUoRb70u6SmgbPFhm-NC2jx6CmE2mw@mail.gmail.com>
On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
>
> But the "humanized" limit is the one you just added and proceeded to
> change ata_set_lba_range_entries(). You changed from a buffer size
> to use "num" instead and now you want to remove the protection
> entirely?
I am not sure about what you are talking about here. What I did is
just to avoid setting an (inappropriate and unnecessary) hard-coded
limit (with the original while-condition) for i (the unaligned buffer
size) in this general (in the sense that TRIM payload is _not always_
of one 512-byte block, even if we may want to forever practice that in
our kernel) function.
If we really want/need to avoid hitting some real buffer limit (e.g.
maximum length of scatter/gather list?), then we should in some way
check n_block against that. If it is too large we then return
used_bytes = 0 (optionally with some follow-up to add a response to
such return value or so).
We advertise 4194240 as Maximum Write Same Length so that the
SCSI/block layer will know how to split the discard request, and then
we make sure that the SATL reject SCSI commands (that is not issued
through the discard / write same ioctl but with SG_IO /
sg_write_same). That's all we really need to do, end of story.
>
> Why not just change to put this in front of ata_set_lba_range_entries()
>
> if (n_block > 65535 * ATA_MAX_TRIM_RNUM) {
> fp = 2;
> goto invalid_fld;
> }
Well you can change the if-else clause to that. Apparently you've been
doing that in your patch series. But that has nothing to do with what
I am fixing here. Neither does it affect the actual behavior.
>
> And then restore ata_set_lba_range_entries() to how it looked
> before you changed it in commit:
>
> 2983860c7 (libata-scsi: avoid repeated calculation of number of TRIM ranges)
I don't see how it is relevant at all. Though I should have introduced
_this_ patch before that to save some work. Unfortunately I wasn't
aware how ugly ata_set_lba_range_entries was.
>
> Then you can have ata_set_lba_range_entries() take the buffer size ...
> something like the following would be fine:
>
> size = ata_set_lba_range_entries(buf, scmd->device->sector_size,
> block, n_block);
>
> Now things are happily protected against both exceeding the b0 limit(s) and
> overflowing the sglist buffer.
We have no reason at all to bring the logical sector size in here.
Only if in future ACS standards it is stated that payload block are
counted in logical block size instead of statically 512, we would only
need to make the now static "512" in ALIGN() in to 4096.
For possible sglist buffer overflow, see what I mentioned above about
checking n_block and return 0. We would not want to do it with the
original while-way anyway. All it does would be to truncate larger
request and partially complete it silently.
Can you tell exactly how the size of the sglist buffer is limited?
AFAIK it has nothing to do with logical sector size. I doubt that we
will ever cause an overflow, given how conservative we have been on
the size of TRIM payload. We would probably ignore the reported value
once it is larger than 16 or so, if we would ever enable multi-block
payload in the future.
Realistically speaking, I sent this patch not really to get the
function ready for multi-block payload (neither I have mentioned that
in the commit message), but just to make the code proper and avoid
silly effort (and confusion caused) you and I spent in our patches.
>
> --
> Shaun
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Tom Yan @ 2016-08-23 8:41 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAGnHSEn470B0ecDdMSY29W4Or3jD_NPoUAku_09o=X9Tn6vSpw@mail.gmail.com>
On 23 August 2016 at 08:37, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
>>
>> But the "humanized" limit is the one you just added and proceeded to
>> change ata_set_lba_range_entries(). You changed from a buffer size
>> to use "num" instead and now you want to remove the protection
>> entirely?
>
> I am not sure about what you are talking about here. What I did is
> just to avoid setting an (inappropriate and unnecessary) hard-coded
> limit (with the original while-condition) for i (the unaligned buffer
> size) in this general (in the sense that TRIM payload is _not always_
> of one 512-byte block, even if we may want to forever practice that in
> our kernel) function.
>
> If we really want/need to avoid hitting some real buffer limit (e.g.
> maximum length of scatter/gather list?), then we should in some way
> check n_block against that. If it is too large we then return
> used_bytes = 0 (optionally with some follow-up to add a response to
> such return value or so).
>
> We advertise 4194240 as Maximum Write Same Length so that the
> SCSI/block layer will know how to split the discard request, and then
> we make sure that the SATL reject SCSI commands (that is not issued
> through the discard / write same ioctl but with SG_IO /
> sg_write_same). That's all we really need to do, end of story.
>
>>
>> Why not just change to put this in front of ata_set_lba_range_entries()
>>
>> if (n_block > 65535 * ATA_MAX_TRIM_RNUM) {
>> fp = 2;
>> goto invalid_fld;
>> }
>
> Well you can change the if-else clause to that. Apparently you've been
> doing that in your patch series. But that has nothing to do with what
> I am fixing here. Neither does it affect the actual behavior.
>
>>
>> And then restore ata_set_lba_range_entries() to how it looked
>> before you changed it in commit:
>>
>> 2983860c7 (libata-scsi: avoid repeated calculation of number of TRIM ranges)
>
> I don't see how it is relevant at all. Though I should have introduced
> _this_ patch before that to save some work. Unfortunately I wasn't
> aware how ugly ata_set_lba_range_entries was.
>
>>
>> Then you can have ata_set_lba_range_entries() take the buffer size ...
>> something like the following would be fine:
>>
>> size = ata_set_lba_range_entries(buf, scmd->device->sector_size,
>> block, n_block);
>>
>> Now things are happily protected against both exceeding the b0 limit(s) and
>> overflowing the sglist buffer.
>
> We have no reason at all to bring the logical sector size in here.
> Only if in future ACS standards it is stated that payload block are
> counted in logical block size instead of statically 512, we would only
> need to make the now static "512" in ALIGN() in to 4096.
typo. I meant from static "512" to logical sector size, and I really
think we should do it ONLY if the later ACS changes its statement.
>
> For possible sglist buffer overflow, see what I mentioned above about
> checking n_block and return 0. We would not want to do it with the
> original while-way anyway. All it does would be to truncate larger
> request and partially complete it silently.
>
> Can you tell exactly how the size of the sglist buffer is limited?
> AFAIK it has nothing to do with logical sector size. I doubt that we
> will ever cause an overflow, given how conservative we have been on
> the size of TRIM payload. We would probably ignore the reported value
> once it is larger than 16 or so, if we would ever enable multi-block
> payload in the future.
>
> Realistically speaking, I sent this patch not really to get the
> function ready for multi-block payload (neither I have mentioned that
> in the commit message), but just to make the code proper and avoid
> silly effort (and confusion caused) you and I spent in our patches.
>
>>
>> --
>> Shaun
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Shaun Tancheff @ 2016-08-23 8:42 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEn7M9Kmse+_JFFCgvZY5TqZt1TDzXGhKaqimog4OKiktg@mail.gmail.com>
On Tue, Aug 23, 2016 at 2:53 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 06:20, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>
>>> It is always 1 merely because we prefer sticking to that. Say we want
>>> to enable multi-block payload now, it won't be 1 anymore.
>>
>> Sorry, I though that DSM TRIM is using 512 bytes here because
>> WRITE_SAME_16 has a payload of a single logical sector.
>
> Nope, SCSI Write Same commands does not have payload (or in SCSI
> terms, parameter list / data-out buffer).
Hmm. Not sure about T10, but that's not how I read
sd_setup_write_same_cmnd(), it always setups up a transfer of
sector_size for scsi_init_io().
As I understand things, this is how the cmd's sglist is populated and why
this should be using sg_copy_from_buffer() rather than open coding
to the page.
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Tom Yan @ 2016-08-23 9:04 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAJVOszDu8WfoSysar=a5vvKb8ADtGb2faR=dGJm9vtvC_OSeUQ@mail.gmail.com>
There's comment about `rq->__data_len` in sd.c (and blkdev.h,
definition of struct `request`). Apparently it is "internal only" (in
terms of the kernel).
As for `cmd->transfersize`, it's probably talking about the CDB
itself. Although different commands have CDB of various lengths (much
shorter than 512-byte), but I guess when the actual command is sent to
the device, we will still transfer it in a full logical block for some
reason.
Anyway these will not have anything to do with how we form the TRIM
payload. The SATL simply read the LBA (starting offset) and NUMBER OF
BLOCK (to TRIM) field in the CDB and pack ranges according to that.
All we care is the actual TRIM limit of the ATA device (and
conservative concern on bogus ones).
On 23 August 2016 at 08:42, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 2:53 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 23 August 2016 at 06:20, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>>
>>>> It is always 1 merely because we prefer sticking to that. Say we want
>>>> to enable multi-block payload now, it won't be 1 anymore.
>>>
>>> Sorry, I though that DSM TRIM is using 512 bytes here because
>>> WRITE_SAME_16 has a payload of a single logical sector.
>>
>> Nope, SCSI Write Same commands does not have payload (or in SCSI
>> terms, parameter list / data-out buffer).
>
> Hmm. Not sure about T10, but that's not how I read
> sd_setup_write_same_cmnd(), it always setups up a transfer of
> sector_size for scsi_init_io().
>
> As I understand things, this is how the cmd's sglist is populated and why
> this should be using sg_copy_from_buffer() rather than open coding
> to the page.
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Tom Yan @ 2016-08-23 9:36 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAJVOszB7cyUcE+A_EUw0OnvdO3iGFW9sG45R2Ng1Ea_Q7YxaQA@mail.gmail.com>
On 23 August 2016 at 09:18, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
>
>> If we really want/need to avoid hitting some real buffer limit (e.g.
>> maximum length of scatter/gather list?), then we should in some way
>> check n_block against that. If it is too large we then return
>> used_bytes = 0 (optionally with some follow-up to add a response to
>> such return value or so).
>
> Yes there is a real buffer limit, I can think of these two options:
> 1- Assume the setups from sd_setup_discard_cmnd() and/
> or sd_setup_write_same_cmnd() are providing an sglist of
> sdp->sector_size via scsi_init_io()
That sounds completely wrong. The scatter/gather list we are talking
about here has nothing to do with the SCSI or block layer anymore. The
SATL has _already_ parsed the SCSI Write Same (16) command and is
packing ranges/payload according to that in this stage. If there is
any limit it would probably the max_segment allowed by the host driver
(e.g. ahci).
It doesn't seem to make sense to me either that we would need to
prevent sglist overflow in such level. Doesn't that mean we would need
to do the same checking (specifically, as in hard coding checks in all
kinds of procedures) in every use of scatter/gather list? That doesn't
sound right at all.
>
> 2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
> sglist and calculate the available buffer size.
>
> #2 sounds like more fun but I'm not sure it's what people would prefer to see.
No idea if such thing exists / makes sense at all.
>
> --
> Shaun
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Shaun Tancheff @ 2016-08-23 9:18 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAGnHSEn470B0ecDdMSY29W4Or3jD_NPoUAku_09o=X9Tn6vSpw@mail.gmail.com>
On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
> If we really want/need to avoid hitting some real buffer limit (e.g.
> maximum length of scatter/gather list?), then we should in some way
> check n_block against that. If it is too large we then return
> used_bytes = 0 (optionally with some follow-up to add a response to
> such return value or so).
Yes there is a real buffer limit, I can think of these two options:
1- Assume the setups from sd_setup_discard_cmnd() and/
or sd_setup_write_same_cmnd() are providing an sglist of
sdp->sector_size via scsi_init_io()
2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
sglist and calculate the available buffer size.
#2 sounds like more fun but I'm not sure it's what people would prefer to see.
--
Shaun
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Tom Yan @ 2016-08-23 10:17 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAGnHSE=Lsb-L-nd8OHQpVPDe1JkURjnXoA26ZQ7bH1GBEPAAtA@mail.gmail.com>
Wait a minute. I think you missed or misunderstood something when you
listen to someone's opinion in that we should switch to sglist buffer.
I think the danger people referred to is exactly what is revealed when
the ugly code is removed in this commit (it doesn't mean that the code
should be kept though).
The original buffer appears to be open:
buf = page_address(sg_page(scsi_sglist(scmd)));
While the new buffer you adopted in in ata_format_sct_write_same() and
ata_format_dsm_trim_descr() is of fixed size:
buffer = ((void *)ata_scsi_rbuf);
sctpg = ((void *)ata_scsi_rbuf);
because:
#define ATA_SCSI_RBUF_SIZE 4096
...
static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
So the sglist buffer is always 4096 bytes.
And hence you can probably safely use ATA_SCSI_RBUF_SIZE as the buflen
param in the sg_copy_from_buffer() calls (at least in the case of
ata_format_sct_write_same()). However, the return value of
ata_format_dsm_trim_descr() should still always be used_bytes since
that is needed by the ata taskfile construction.
You may want to check (n_block / 65535 * 8 > ATA_SCSI_RBUF_SIZE). If
it is true, then perhaps we may want to return 0, and make the SATL
response with invalid CDB field if we catch that.
Though IMHO this is really NOT a reason that is strong enough to
prevent this patch from entering the repo first.
On 23 August 2016 at 09:36, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 09:18, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
>>
>>> If we really want/need to avoid hitting some real buffer limit (e.g.
>>> maximum length of scatter/gather list?), then we should in some way
>>> check n_block against that. If it is too large we then return
>>> used_bytes = 0 (optionally with some follow-up to add a response to
>>> such return value or so).
>>
>> Yes there is a real buffer limit, I can think of these two options:
>> 1- Assume the setups from sd_setup_discard_cmnd() and/
>> or sd_setup_write_same_cmnd() are providing an sglist of
>> sdp->sector_size via scsi_init_io()
>
> That sounds completely wrong. The scatter/gather list we are talking
> about here has nothing to do with the SCSI or block layer anymore. The
> SATL has _already_ parsed the SCSI Write Same (16) command and is
> packing ranges/payload according to that in this stage. If there is
> any limit it would probably the max_segment allowed by the host driver
> (e.g. ahci).
>
> It doesn't seem to make sense to me either that we would need to
> prevent sglist overflow in such level. Doesn't that mean we would need
> to do the same checking (specifically, as in hard coding checks in all
> kinds of procedures) in every use of scatter/gather list? That doesn't
> sound right at all.
>
>>
>> 2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
>> sglist and calculate the available buffer size.
>>
>> #2 sounds like more fun but I'm not sure it's what people would prefer to see.
>
> No idea if such thing exists / makes sense at all.
>
>>
>> --
>> Shaun
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Tom Yan @ 2016-08-23 10:37 UTC (permalink / raw)
To: Shaun Tancheff
Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke, Shaun Tancheff
In-Reply-To: <20160822042321.24367-3-shaun@tancheff.com>
On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
> SATA drives may support write same via SCT. This is useful
> for setting the drive contents to a specific pattern (0's).
>
> Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM
> command or an SCT Write Same command.
>
> Based on the UNMAP flag:
> - When set translate to DSM TRIM
> - When not set translate to SCT Write Same
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
> - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
> - Addressed review comments
> - Report support for ZBC only for zoned devices.
> - kmap page during rewrite
> - Fix unmap set to require trim or error, if not unmap then sct write
> same or error.
> v4:
> - Added partial MAINTENANCE_IN opcode simulation
> - Dropped all changes in drivers/scsi/*
> - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
> v3:
> - Demux UNMAP/TRIM from WRITE SAME
> v2:
> - Remove fugly ata hacking from sd.c
>
> drivers/ata/libata-scsi.c | 199 +++++++++++++++++++++++++++++++++++++++-------
> include/linux/ata.h | 43 ++++++++++
> 2 files changed, 213 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 7990cb2..ebf1a04 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
> {
> sdev->use_10_for_rw = 1;
> sdev->use_10_for_ms = 1;
> - sdev->no_report_opcodes = 1;
> - sdev->no_write_same = 1;
>
> /* Schedule policy is determined by ->qc_defer() callback and
> * it needs to see every deferred qc. Set dev_blocked to 1 to
> @@ -3287,7 +3285,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
> * @cmd: SCSI command being translated
> * @num: Maximum number of entries (nominally 64).
> * @sector: Starting sector
> - * @count: Total Range of request
> + * @count: Total Range of request in logical sectors
> *
> * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
> * descriptor.
> @@ -3330,6 +3328,45 @@ static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
> return used_bytes;
> }
>
> +/**
> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
> + * @cmd: SCSI command being translated
> + * @lba: Starting sector
> + * @num: Number of logical sectors to be zero'd.
> + *
> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * descriptor.
> + * NOTE: Writes a pattern (0's) in the foreground.
> + * Large write-same requents can timeout.
> + */
> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
> +{
> + u16 *sctpg;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> + sctpg = ((void *)ata_scsi_rbuf);
Because ata_scsi_rbuf is of a fixed size of ATA_SCSI_RBUF_SIZE.
#define ATA_SCSI_RBUF_SIZE 4096
...
static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
> +
> + put_unaligned_le16(0x0002, &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> + put_unaligned_le16(0x0101, &sctpg[1]); /* WRITE PTRN FG */
> + put_unaligned_le64(lba, &sctpg[2]);
> + put_unaligned_le64(num, &sctpg[6]);
> + put_unaligned_le32(0u, &sctpg[10]);
> +
> + sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
You have no reason to use 512 here instead of ATA_SCSI_RBUF_SIZE this time.
> + spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +}
> +
> +/**
> + * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
> + * @qc: Command to be translated
> + *
> + * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
> + * an SCT Write Same command.
> + * Based on WRITE SAME has the UNMAP flag
> + * When set translate to DSM TRIM
> + * When clear translate to SCT Write Same
> + */
> static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> {
> struct ata_taskfile *tf = &qc->tf;
> @@ -3342,6 +3379,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> u32 size;
> u16 fp;
> u8 bp = 0xff;
> + u8 unmap = cdb[1] & 0x8;
>
> /* we may not issue DMA commands if no DMA mode is set */
> if (unlikely(!dev->dma_mode))
> @@ -3353,11 +3391,26 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> }
> scsi_16_lba_len(cdb, &block, &n_block);
>
> - /* for now we only support WRITE SAME with the unmap bit set */
> - if (unlikely(!(cdb[1] & 0x8))) {
> - fp = 1;
> - bp = 3;
> - goto invalid_fld;
> + if (unmap) {
> + /* If trim is not enabled the cmd is invalid. */
> + if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
> + !ata_id_has_trim(dev->id)) {
> + fp = 1;
> + bp = 3;
> + goto invalid_fld;
> + }
> + /* If the request is too large the cmd is invalid */
> + if (n_block > 0xffff * trmax) {
> + fp = 2;
> + goto invalid_fld;
> + }
> + } else {
> + /* If write same is not available the cmd is invalid */
> + if (!ata_id_sct_write_same(dev->id)) {
> + fp = 1;
> + bp = 3;
> + goto invalid_fld;
> + }
> }
>
> /*
> @@ -3367,30 +3420,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> if (!scsi_sg_count(scmd))
> goto invalid_param_len;
>
> - if (n_block <= 0xffff * trmax) {
> + if (unmap) {
> size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
> + if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> + /* Newer devices support queued TRIM commands */
> + tf->protocol = ATA_PROT_NCQ;
> + tf->command = ATA_CMD_FPDMA_SEND;
> + tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> + tf->nsect = qc->tag << 3;
> + tf->hob_feature = (size / 512) >> 8;
> + tf->feature = size / 512;
> +
> + tf->auxiliary = 1;
> + } else {
> + tf->protocol = ATA_PROT_DMA;
> + tf->hob_feature = 0;
> + tf->feature = ATA_DSM_TRIM;
> + tf->hob_nsect = (size / 512) >> 8;
> + tf->nsect = size / 512;
> + tf->command = ATA_CMD_DSM;
> + }
> } else {
> - fp = 2;
> - goto invalid_fld;
> - }
> -
> - if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> - /* Newer devices support queued TRIM commands */
> - tf->protocol = ATA_PROT_NCQ;
> - tf->command = ATA_CMD_FPDMA_SEND;
> - tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> - tf->nsect = qc->tag << 3;
> - tf->hob_feature = (size / 512) >> 8;
> - tf->feature = size / 512;
> + ata_format_sct_write_same(scmd, block, n_block);
>
> - tf->auxiliary = 1;
> - } else {
> - tf->protocol = ATA_PROT_DMA;
> tf->hob_feature = 0;
> - tf->feature = ATA_DSM_TRIM;
> - tf->hob_nsect = (size / 512) >> 8;
> - tf->nsect = size / 512;
> - tf->command = ATA_CMD_DSM;
> + tf->feature = 0;
> + tf->hob_nsect = 0;
> + tf->nsect = 1;
> + tf->lbah = 0;
> + tf->lbam = 0;
> + tf->lbal = ATA_CMD_STANDBYNOW1;
> + tf->hob_lbah = 0;
> + tf->hob_lbam = 0;
> + tf->hob_lbal = 0;
> + tf->device = ATA_CMD_STANDBYNOW1;
> + tf->protocol = ATA_PROT_DMA;
> + tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
> }
>
> tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
> @@ -3414,6 +3479,76 @@ invalid_opcode:
> }
>
> /**
> + * ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
> + * @args: device MAINTENANCE_IN data / SCSI command of interest.
> + * @rbuf: Response buffer, to which simulated SCSI cmd output is sent.
> + *
> + * Yields a subset to satisfy scsi_report_opcode()
> + *
> + * LOCKING:
> + * spin_lock_irqsave(host lock)
> + */
> +static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
> +{
> + struct ata_device *dev = args->dev;
> + u8 *cdb = args->cmd->cmnd;
> + u8 supported = 0;
> + unsigned int err = 0;
> +
> + if (cdb[2] != 1) {
> + ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
> + err = 2;
> + goto out;
> + }
> + switch (cdb[3]) {
> + case INQUIRY:
> + case MODE_SENSE:
> + case MODE_SENSE_10:
> + case READ_CAPACITY:
> + case SERVICE_ACTION_IN_16:
> + case REPORT_LUNS:
> + case REQUEST_SENSE:
> + case SYNCHRONIZE_CACHE:
> + case REZERO_UNIT:
> + case SEEK_6:
> + case SEEK_10:
> + case TEST_UNIT_READY:
> + case SEND_DIAGNOSTIC:
> + case MAINTENANCE_IN:
> + case READ_6:
> + case READ_10:
> + case READ_16:
> + case WRITE_6:
> + case WRITE_10:
> + case WRITE_16:
> + case ATA_12:
> + case ATA_16:
> + case VERIFY:
> + case VERIFY_16:
> + case MODE_SELECT:
> + case MODE_SELECT_10:
> + case START_STOP:
> + supported = 3;
> + break;
> + case WRITE_SAME_16:
> + if (ata_id_sct_write_same(dev->id))
> + supported = 3;
> + break;
> + case ZBC_IN:
> + case ZBC_OUT:
> + if (ata_id_zoned_cap(dev->id) ||
> + dev->class == ATA_DEV_ZAC)
> + supported = 3;
> + break;
> + default:
> + break;
> + }
> +out:
> + rbuf[1] = supported; /* supported */
> + return err;
> +}
> +
> +/**
> * ata_scsi_report_zones_complete - convert ATA output
> * @qc: command structure returning the data
> *
> @@ -4193,6 +4328,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
> ata_scsi_invalid_field(dev, cmd, 1);
> break;
>
> + case MAINTENANCE_IN:
> + if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
> + ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
> + else
> + ata_scsi_invalid_field(dev, cmd, 1);
> + break;
> +
> /* all other commands */
> default:
> ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
> @@ -4225,7 +4367,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
> shost->max_lun = 1;
> shost->max_channel = 1;
> shost->max_cmd_len = 16;
> - shost->no_write_same = 1;
>
> /* Schedule policy is determined by ->qc_defer()
> * callback and it needs to see every deferred qc.
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index 45a1d71..fdb1803 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -105,6 +105,7 @@ enum {
> ATA_ID_CFA_KEY_MGMT = 162,
> ATA_ID_CFA_MODES = 163,
> ATA_ID_DATA_SET_MGMT = 169,
> + ATA_ID_SCT_CMD_XPORT = 206,
> ATA_ID_ROT_SPEED = 217,
> ATA_ID_PIO4 = (1 << 1),
>
> @@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
> }
>
> /**
> + *
> + * Word: 206 - SCT Command Transport
> + * 15:12 - Vendor Specific
> + * 11:6 - Reserved
> + * 5 - SCT Command Transport Data Tables supported
> + * 4 - SCT Command Transport Features Control supported
> + * 3 - SCT Command Transport Error Recovery Control supported
> + * 2 - SCT Command Transport Write Same supported
> + * 1 - SCT Command Transport Long Sector Access supported
> + * 0 - SCT Command Transport supported
> + */
> +static inline bool ata_id_sct_data_tables(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_features_ctrl(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_write_same(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_long_sector_access(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_supported(const u16 *id)
> +{
> + return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
> +}
> +
> +/**
> * ata_id_major_version - get ATA level of drive
> * @id: Identify data
> *
> --
> 2.9.3
>
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Shaun Tancheff @ 2016-08-23 10:45 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAGnHSEmZdVy2Xp_wK8n+89mD18NPAqcy5UBhhDEy5RAAC7c+vA@mail.gmail.com>
On Tue, Aug 23, 2016 at 5:17 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> Wait a minute. I think you missed or misunderstood something when you
> listen to someone's opinion in that we should switch to sglist buffer.
No, I think I can trust Christoph Hellwig <hch@lst.de>
> I think the danger people referred to is exactly what is revealed when
> the ugly code is removed in this commit (it doesn't mean that the code
> should be kept though).
>
> The original buffer appears to be open:
> buf = page_address(sg_page(scsi_sglist(scmd)));
Which is unsafe.
> While the new buffer you adopted in in ata_format_sct_write_same() and
> ata_format_dsm_trim_descr() is of fixed size:
Yes ... it is a temporary response buffer for simulated commands used to
copy data to and from the command sg_list so as not to hold irqs while
modifying the buffer.
> buffer = ((void *)ata_scsi_rbuf);
>
> sctpg = ((void *)ata_scsi_rbuf);
>
> because:
>
> #define ATA_SCSI_RBUF_SIZE 4096
> ...
> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>
> So the sglist buffer is always 4096 bytes.
No. The sglist buffer attached to the write same / trim command
is always sdp->sector_size
> And hence you can probably safely use ATA_SCSI_RBUF_SIZE as the buflen
> param in the sg_copy_from_buffer() calls (at least in the case of
> ata_format_sct_write_same()).
No. SCT Write Same has a fixed single 512 byte transfer.
However, the return value of
> ata_format_dsm_trim_descr() should still always be used_bytes since
> that is needed by the ata taskfile construction.
So long as it does not exceed its sglist/sector_size buffer.
> You may want to check (n_block / 65535 * 8 > ATA_SCSI_RBUF_SIZE). If
> it is true, then perhaps we may want to return 0, and make the SATL
> response with invalid CDB field if we catch that.
No that is not quite right you need to check if you are
overflowing either RBUF or sdp->sector_size.
> Though IMHO this is really NOT a reason that is strong enough to
> prevent this patch from entering the repo first.
> On 23 August 2016 at 09:36, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 23 August 2016 at 09:18, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>>> On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
>>>
>>>> If we really want/need to avoid hitting some real buffer limit (e.g.
>>>> maximum length of scatter/gather list?), then we should in some way
>>>> check n_block against that. If it is too large we then return
>>>> used_bytes = 0 (optionally with some follow-up to add a response to
>>>> such return value or so).
>>>
>>> Yes there is a real buffer limit, I can think of these two options:
>>> 1- Assume the setups from sd_setup_discard_cmnd() and/
>>> or sd_setup_write_same_cmnd() are providing an sglist of
>>> sdp->sector_size via scsi_init_io()
>>
>> That sounds completely wrong. The scatter/gather list we are talking
>> about here has nothing to do with the SCSI or block layer anymore. The
>> SATL has _already_ parsed the SCSI Write Same (16) command and is
>> packing ranges/payload according to that in this stage. If there is
>> any limit it would probably the max_segment allowed by the host driver
>> (e.g. ahci).
>>
>> It doesn't seem to make sense to me either that we would need to
>> prevent sglist overflow in such level. Doesn't that mean we would need
>> to do the same checking (specifically, as in hard coding checks in all
>> kinds of procedures) in every use of scatter/gather list? That doesn't
>> sound right at all.
>>
>>>
>>> 2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
>>> sglist and calculate the available buffer size.
>>>
>>> #2 sounds like more fun but I'm not sure it's what people would prefer to see.
>>
>> No idea if such thing exists / makes sense at all.
>>
>>>
>>> --
>>> Shaun
--
Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 2/4] Add support for SCT Write Same
From: Shaun Tancheff @ 2016-08-23 10:56 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEmVbkJ30uNbW1O-Wh4E7RKbf+OojxRYjWanMYH_OrvHnA@mail.gmail.com>
On Tue, Aug 23, 2016 at 5:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>> +/**
>> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>> + * @cmd: SCSI command being translated
>> + * @lba: Starting sector
>> + * @num: Number of logical sectors to be zero'd.
>> + *
>> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
>> + * descriptor.
>> + * NOTE: Writes a pattern (0's) in the foreground.
>> + * Large write-same requents can timeout.
>> + */
>> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>> +{
>> + u16 *sctpg;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>> + sctpg = ((void *)ata_scsi_rbuf);
>
> Because ata_scsi_rbuf is of a fixed size of ATA_SCSI_RBUF_SIZE.
>
> #define ATA_SCSI_RBUF_SIZE 4096
> ...
> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>
>> +
>> + put_unaligned_le16(0x0002, &sctpg[0]); /* SCT_ACT_WRITE_SAME */
>> + put_unaligned_le16(0x0101, &sctpg[1]); /* WRITE PTRN FG */
>> + put_unaligned_le64(lba, &sctpg[2]);
>> + put_unaligned_le64(num, &sctpg[6]);
>> + put_unaligned_le32(0u, &sctpg[10]);
>> +
>> + sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
>
> You have no reason to use 512 here instead of ATA_SCSI_RBUF_SIZE this time.
Ah .. because SCT Write Same is a fixed 512 byte transfer?
Ah .. because I only have 512 bytes to copy?
>> + spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>> +}
^ permalink raw reply
* Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Tom Yan @ 2016-08-23 11:16 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Shaun Tancheff, Tejun Heo, Martin K . Petersen, linux-ide,
linux-scsi
In-Reply-To: <CAJVOszBcuNouJk28nr1smPYJfEhE_9LjD0UgOiG31ApgrXzGJg@mail.gmail.com>
On 23 August 2016 at 10:45, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 5:17 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> Wait a minute. I think you missed or misunderstood something when you
>> listen to someone's opinion in that we should switch to sglist buffer.
>
> No, I think I can trust Christoph Hellwig <hch@lst.de>
I didn't say that you cannot trust him, but I just wonder you might
have misinterpreted what he said. I haven't really follow your patch
series earlier though.
>
>> I think the danger people referred to is exactly what is revealed when
>> the ugly code is removed in this commit (it doesn't mean that the code
>> should be kept though).
>>
>> The original buffer appears to be open:
>> buf = page_address(sg_page(scsi_sglist(scmd)));
>
> Which is unsafe.
Yup, but the while loop is not the right way to make it safe. It
probably prevent any overflow, but does not handle / response to the
request (n_block) properly.
It would only be practically unsafe if we would ever call it with a
request that needs a multi-block payload.
>
>> While the new buffer you adopted in in ata_format_sct_write_same() and
>> ata_format_dsm_trim_descr() is of fixed size:
>
> Yes ... it is a temporary response buffer for simulated commands used to
> copy data to and from the command sg_list so as not to hold irqs while
> modifying the buffer.
>
>> buffer = ((void *)ata_scsi_rbuf);
>>
>> sctpg = ((void *)ata_scsi_rbuf);
>>
>> because:
>>
>> #define ATA_SCSI_RBUF_SIZE 4096
>> ...
>> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>>
>> So the sglist buffer is always 4096 bytes.
>
> No. The sglist buffer attached to the write same / trim command
Don't you think it is in appropriate to use ata_scsi_rbuf here then?
> is always sdp->sector_size
I can hardly see how sdp->sector_size is relevant to the sglist buffer
(especially the sglist here is facing to the ATA device).
Scatter/gather list could have multiple entries of a certain size
(that is not necessarily, or unlikely, logical sector size). Although
I don't really know how scatter/gather list works, it hardly seems
making any sense by saying that the sglist buffer is always logical
sector size.
AFAIK, for example, USB Attached SCSI driver does not limit the
maximum number of entries (max_segment) so the kernel fall back to
SG_MAX_SEGMENTS (2048), while it reports a max_segment_size of 4096 in
spite of the logical sector size with dma_get_max_seg_size().
For AHCI, it reports a max_segment of 168, while it does not seem to
report any max_segment_size so dma_get_max_seg_size() fallback to
65536 (though practically it seems 4096 will still be used in normal
writing).
>
>> And hence you can probably safely use ATA_SCSI_RBUF_SIZE as the buflen
>> param in the sg_copy_from_buffer() calls (at least in the case of
>> ata_format_sct_write_same()).
>
> No. SCT Write Same has a fixed single 512 byte transfer.
Never mind on that. I have presumption that you should always fully
copy ata_scsi_rbuf to the sglist.
>
> However, the return value of
>> ata_format_dsm_trim_descr() should still always be used_bytes since
>> that is needed by the ata taskfile construction.
>
> So long as it does not exceed its sglist/sector_size buffer.
>
>> You may want to check (n_block / 65535 * 8 > ATA_SCSI_RBUF_SIZE). If
>> it is true, then perhaps we may want to return 0, and make the SATL
>> response with invalid CDB field if we catch that.
>
> No that is not quite right you need to check if you are
> overflowing either RBUF or sdp->sector_size.
Again, this does not make any sense. See above. Apparently you really
have no reason to use ata_scsi_rbuf.
>
>> Though IMHO this is really NOT a reason that is strong enough to
>> prevent this patch from entering the repo first.
>
>> On 23 August 2016 at 09:36, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 23 August 2016 at 09:18, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>>> On Tue, Aug 23, 2016 at 3:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>>>> On 23 August 2016 at 07:30, Shaun Tancheff <shaun@tancheff.com> wrote:
>>>>
>>>>> If we really want/need to avoid hitting some real buffer limit (e.g.
>>>>> maximum length of scatter/gather list?), then we should in some way
>>>>> check n_block against that. If it is too large we then return
>>>>> used_bytes = 0 (optionally with some follow-up to add a response to
>>>>> such return value or so).
>>>>
>>>> Yes there is a real buffer limit, I can think of these two options:
>>>> 1- Assume the setups from sd_setup_discard_cmnd() and/
>>>> or sd_setup_write_same_cmnd() are providing an sglist of
>>>> sdp->sector_size via scsi_init_io()
>>>
>>> That sounds completely wrong. The scatter/gather list we are talking
>>> about here has nothing to do with the SCSI or block layer anymore. The
>>> SATL has _already_ parsed the SCSI Write Same (16) command and is
>>> packing ranges/payload according to that in this stage. If there is
>>> any limit it would probably the max_segment allowed by the host driver
>>> (e.g. ahci).
>>>
>>> It doesn't seem to make sense to me either that we would need to
>>> prevent sglist overflow in such level. Doesn't that mean we would need
>>> to do the same checking (specifically, as in hard coding checks in all
>>> kinds of procedures) in every use of scatter/gather list? That doesn't
>>> sound right at all.
>>>
>>>>
>>>> 2- Find (or write) a suitable sg_get_size(sgl, nents) to walk the
>>>> sglist and calculate the available buffer size.
>>>>
>>>> #2 sounds like more fun but I'm not sure it's what people would prefer to see.
>>>
>>> No idea if such thing exists / makes sense at all.
>>>
>>>>
>>>> --
>>>> Shaun
>
>
>
> --
> Shaun Tancheff
^ permalink raw reply
* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
From: Martin K. Petersen @ 2016-08-24 3:33 UTC (permalink / raw)
To: Tom Yan
Cc: Shaun Tancheff, Shaun Tancheff, linux-ide, LKML, Tejun Heo,
Christoph Hellwig, Martin K . Petersen, Damien Le Moal,
Hannes Reinecke, Josh Bingaman, Hannes Reinecke
In-Reply-To: <CAGnHSEn7M9Kmse+_JFFCgvZY5TqZt1TDzXGhKaqimog4OKiktg@mail.gmail.com>
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
Tom> Nope, SCSI Write Same commands does not have payload (or in SCSI
Tom> terms, parameter list / data-out buffer).
WRITE SAME has a a payload of 1 logical block (unless NDOB is set but we
have had no good reason to support that yet).
UNMAP has a payload that varies based on the number of range
descriptors. The SCSI disk driver only ever issues a single descriptor
but since libata doesn't support UNMAP this doesn't really come into
play.
Ideally there would be a way to distinguish between device limits for
WRITE SAME with the UNMAP bit and for "regular" WRITE SAME. One way to
do that would be to transition the libata discard implementation over to
single-range UNMAP, fill out the relevant VPD page B0 fields and leave
the WRITE SAME bits for writing zeroes.
One reason that has not been particularly compelling is that the WRITE
SAME payload buffer does double duty to hold the ATA DSM TRIM range
descriptors and matches the required ATA payload size. Whereas the UNMAP
command would only provide 24 bytes of TRIM range space.
Also, please be careful with transfer lengths, __data_len, etc. As
mentioned, the transfer length WRITE SAME is typically 512 bytes and
that's the number of bytes that need to be DMA'ed and transferred over
the wire by the controller. But from a command completion perspective we
need to complete however many bytes the command acted upon. Unlike reads
and writes there is not a 1:1 mapping between the transfer length and
the affected area. So we do a bit of magic after the buffer has been
mapped to ensure that the completion byte count matches the number of
blocks that were affected by the command rather than the size of the
data buffer in bytes.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox