From: Douglas Gilbert <dgilbert@interlog.com>
To: michaelc@cs.wisc.edu, linux-scsi@vger.kernel.org,
target-devel@vger.kernel.org, ceph-devel@vger.kernel.org,
axboe@kernel.dk
Cc: Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH 0/5] block/scsi/lio support for COMPARE_AND_WRITE
Date: Thu, 16 Oct 2014 22:01:37 +0200
Message-ID: <54402421.8060808@interlog.com>
In-Reply-To: <543FA05C.6060200@interlog.com>

On 14-10-16 12:39 PM, Douglas Gilbert wrote:
> On 14-10-16 07:37 AM, michaelc@cs.wisc.edu wrote:
>> The following patches implement the SCSI command COMPARE_AND_WRITE as a new
>> bio/request type REQ_CMP_AND_WRITE. COMPARE_AND_WRITE is defined in the
>> SCSI SBC (SCSI block command) specs as:
>>
>> The COMPARE AND WRITE command requests that the device server perform the
>> following as an uninterrupted series of actions:
>>
>> 1) perform the following operations:
>> A) read the specified logical blocks; and
>> B) transfer the specified number of logical blocks from the Data-Out
>> Buffer (i.e., the verify instance of the data is transferred from the
>> Data-Out Buffer);
>>
>> 2) compare the data read from the specified logical blocks with the verify
>> instance of the data; and
>> 3) If the compared data matches, then perform the following operations:
>> 1) transfer the specified number of logical blocks from the Data-Out
>> Buffer (i.e., the write instance of the data transferred from the
>> Data-Out Buffer); and
>> 2) write those logical blocks.
>>
>> The most common use of this command today is in VMware ESX, where it is used
>> for locking. See
>> http://blogs.vmware.com/vsphere/2012/05/vmfs-locking-uncovered.html
>> [in ESX it is called ATS (atomic test and set)] for more VMware info.
>> Linux fits into this use, because its SCSI target layer (LIO) is commonly
>> used as storage for ESX VMs.
>>
>> Currently, to support this command in LIO we emulate it by taking a lock,
>> doing a read, comparing it, then doing a write. The problem this patchset
>> tries to solve is that in many cases it is more efficient to pass the one
>> COMPARE_AND_WRITE request directly to the device where it might have
>> optimized locking and also will require fewer requests to/from the target
>> and backing storage device.
>>
>> I am also bugging the ceph-devel list, because I am working on LIO + ceph
>> support. I am interested in using ceph's rbd device for the backing
>> storage for LIO, and I was thinking this request could be implemented similar
>> to how REQ_DISCARD (unmap/trim) is going to be, and I wanted to get some early
>> feedback. I know the scsi layer better, so I have only added support in sd in
>> this patchset.
>>
>> The following patches were made over the target-pending for-next branch but
>> also apply to Linus's tree.
>
> As I found when I implemented this command in sg3_utils,
> my library's support for handling and reporting the
> MISCOMPARE sense key needed to be strengthened. [A sense
> buffer with a MISCOMPARE sense key is what results when
> the compare in step 2) is unequal.]
>
> Since it was relatively rare prior to VMware's use of
> the COMPARE AND WRITE command, MISCOMPARE is often forgotten
> in sense key handling. Also it should not be considered
> as an error and definitely should not lead to the command
> being retried.
>
> The COMPARE AND WRITE command may fail for other reasons
> such as a transport problem or a Unit Attention, so the
> SCSI eh logic may need to know about it.

Elaborating ...

Hannes will enjoy this one: say a COMPARE AND WRITE (CAW) fails
due to a transport error or timeout. What should the EH do? ***

Answer: read the LBA(s) to see whether the command succeeded
(i.e. wrote the new data)! If it did, do nothing; if it didn't,
repeat the CAW command. And naturally that second CAW may
yield a MISCOMPARE.
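
The rule above can be sketched roughly as follows. This is only an
illustration in Python against an in-memory stand-in for a logical unit,
not kernel EH code; FakeDisk, TransportError, Miscompare, the "fault"
switch and recover_caw() are all hypothetical names made up for the
example.

```python
class Miscompare(Exception):
    """Device reported sense key MISCOMPARE."""

class TransportError(Exception):
    """Command outcome unknown: lost command, lost status, or timeout."""

class FakeDisk:
    """In-memory stand-in for a logical unit (hypothetical)."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.fault = None  # None, "lost-command" or "lost-status"

    def read(self, lba):
        return self.blocks[lba]

    def compare_and_write(self, lba, verify, write):
        fault, self.fault = self.fault, None
        if fault == "lost-command":
            raise TransportError()   # command never reached the device
        if self.blocks[lba] != verify:
            raise Miscompare()
        self.blocks[lba] = write
        if fault == "lost-status":
            raise TransportError()   # written, but status never came back

def recover_caw(disk, lba, verify, write):
    """Issue CAW; on an unknown outcome, read back before deciding to retry."""
    try:
        disk.compare_and_write(lba, verify, write)
        return "done"
    except TransportError:
        if disk.read(lba) == write:
            return "done-after-check"    # the original CAW took effect
        # The original CAW did not take effect: repeat it.
        # This second CAW may itself raise Miscompare.
        disk.compare_and_write(lba, verify, write)
        return "done-after-retry"
```

Note that the read-back is not atomic with the retry, so another
initiator could change the block in between; the sketch ignores that,
and it also cannot tell the two cases apart when the verify and write
instances of the data are identical.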

Mike proposes using ECANCELED as the errno corresponding to
MISCOMPARE. I'm not wild about that but can't see anything better,
and it is definitely much better than EIO.
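
To make the mapping concrete, here is a small Python sketch of how a
caller might classify the sense data from a failed CAW. The sense key
values (0x6 UNIT ATTENTION, 0xE MISCOMPARE) and the byte positions for
fixed and descriptor format sense data are from SPC. The classify()
helper and its (errno, retry) return convention are hypothetical, and
retrying on UNIT ATTENTION is an assumption of this sketch, not
something the patchset defines.

```python
import errno

SENSE_KEY_UNIT_ATTENTION = 0x6
SENSE_KEY_MISCOMPARE = 0xE

def sense_key(sense):
    """Extract the sense key from fixed or descriptor format sense data."""
    code = sense[0] & 0x7F
    if code in (0x70, 0x71):        # fixed format: key in byte 2, low nibble
        return sense[2] & 0x0F
    if code in (0x72, 0x73):        # descriptor format: key in byte 1, low nibble
        return sense[1] & 0x0F
    raise ValueError("unrecognized sense response code")

def classify(sense):
    """Return (errno or None, retry) for a CHECK CONDITION on CAW."""
    key = sense_key(sense)
    if key == SENSE_KEY_MISCOMPARE:
        # Proposed mapping: report ECANCELED and never retry.
        return errno.ECANCELED, False
    if key == SENSE_KEY_UNIT_ATTENTION:
        # Assumption for this sketch: no errno yet, reissue the command.
        return None, True
    return errno.EIO, False
```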

I checked with FreeBSD and this issue has not come up there yet.
If ESX uses a Unix-like kernel, it would be interesting to know
which errno (if any) they use.

Doug Gilbert

*** The EH has other options:
  - send the transport error or timeout indication back so
    the application is alerted to do a "read to check if done".
  - retry the CAW blindly; that might yield a MISCOMPARE
    when the command actually succeeded (due to the original CAW
    being acted on), so the application then needs to be aware
    that ECANCELED may not mean miscompare.
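
The blind-retry ambiguity in the second option can be shown with a toy
model. This is a hypothetical in-memory sketch in Python (the
compare_and_write() function and the dict of blocks are invented for
illustration); it only models the case where the device acted on the
first CAW but its status was lost in transport.

```python
def compare_and_write(blocks, lba, verify, write):
    """Return True on success, False on MISCOMPARE (in-memory model)."""
    if blocks[lba] != verify:
        return False
    blocks[lba] = write
    return True

blocks = {0: b"free"}

# The device acts on the first CAW, but the initiator never sees the status.
first_applied = compare_and_write(blocks, 0, b"free", b"mine")

# A blind retry of the identical command now compares against the new data,
# so it reports a miscompare even though the original command succeeded.
retry_ok = compare_and_write(blocks, 0, b"free", b"mine")
```

Here first_applied is True, retry_ok is False, and block 0 already
holds the new data, which is why an application seeing ECANCELED after
a blind retry cannot assume that somebody else won the race.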