From: Paolo Bonzini <pbonzini@redhat.com>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"Martin K. Petersen" <mkp@mkp.net>,
Jeff Moyer <jmoyer@redhat.com>, Tejun Heo <tj@kernel.org>,
Mike Snitzer <snitzer@redhat.com>,
"Black, David" <david.black@emc.com>,
"Elliott, Robert (Server Storage)" <Elliott@hp.com>,
"Knight, Frederick" <Frederick.Knight@netapp.com>
Subject: Re: T10 WCE interpretation in Linux & device level access
Date: Wed, 24 Apr 2013 14:57:38 +0200 [thread overview]
Message-ID: <5177D6C2.6080705@redhat.com> (raw)
In-Reply-To: <5177CFB6.9070105@redhat.com>
Il 24/04/2013 14:27, Ric Wheeler ha scritto:
>> The point is to _avoid_ hitting the disk. :)
>
> The point is to have a crash-proof version of the data acknowledged by
> the target device while letting data sit in volatile state as long as
> possible. To be even clearer, we would love to do this for a sub-range
> of the device but currently use a "big hammer" to flush the entire cache
> (possibly for multiple file systems on one target storage device).
>
> Once we use the SYNCHRONIZE_CACHE (or CACHE_FLUSH_EXT) command, we want
> the data on that target device to be there if someone loses power.
>
> If the device can promise this, we don't care (and don't know) how it
> manages that promise. It can leave the data on battery backed DRAM, can
> archive it to flash or any other scheme that works.
That's exactly the point of SYNC_NV=1.
> Just as importantly, we don't want to "destage" data to the back end
> drives if that is not required since it is really, really slow.
>
> The confusion here is that various storage devices have used the
> standard bits in arbitrary ways which makes it very hard to have one
> clear set of rules.
Also that we have ignored the problem for long, and it's worked
surprisingly well. :)
> Even harder to explain to end users when to use a work around (like
> mount -o nobarrier) or the proposed "ignore flushes" block level call :)
Hoping Thunderbird doesn't mangle the patch too badly, code might be
worth a thousand words... see after the sig, compile-tested only. I have
no access to these controllers, neither the good ones nor the bad ones. :)
Paolo
--------------------- 8< -------------------------
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH] scsi: only make REQ_FLUSH flush to non-volatile cache
The point of REQ_FLUSH is to have a crash-proof version of the data
acknowledged by the target device. We want the data on that target
device to be there if someone loses power, but we don't care (and don't
want to know) how it manages that promise. It can leave the data on
battery backed DRAM, can archive it to flash or any other scheme that
works.
This is exactly what SYNC_NV=1 does. Instead, SYNC_NV=0 should flush
the data to the medium, which is not desirable when we have a non-volatile
cache (except perhaps if the medium is removable).
NOT-Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..97ecfd9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -800,9 +800,17 @@ static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
static int scsi_setup_flush_cmnd(struct scsi_device *sdp, struct request *rq)
{
+ struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+
rq->timeout = SD_FLUSH_TIMEOUT;
rq->retries = SD_MAX_RETRIES;
rq->cmd[0] = SYNCHRONIZE_CACHE;
+
+ /* No need to synchronize to medium if we have a non-volatile cache,
+ * but be safe if the medium could just go away.
+ */
+ if (sdkp->nv_sup && !sdp->removable)
+ rq->cmd[1] |= 4; /* SYNC_NV */
rq->cmd_len = 10;
return scsi_setup_blk_pc_cmnd(sdp, rq);
@@ -2511,6 +2519,26 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
}
/**
+ * sd_read_extended_inquiry - Query disk device for non-volatile cache.
+ * @disk: disk to query
+ */
+static void sd_read_extended_inquiry(struct scsi_disk *sdkp)
+{
+ const int vpd_len = 64;
+ unsigned char *buffer = kmalloc(vpd_len, GFP_KERNEL);
+
+ if (!buffer ||
+ /* Block Limits VPD */
+ scsi_get_vpd_page(sdkp->device, 0x86, buffer, vpd_len))
+ goto out;
+
+ sdkp->nv_sup = (buffer[6] & 0x02) != 0;
+
+ out:
+ kfree(buffer);
+}
+
+/**
* sd_read_block_limits - Query disk device for preferred I/O sizes.
* @disk: disk to query
*/
@@ -2684,6 +2712,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_read_capacity(sdkp, buffer);
if (sd_try_extended_inquiry(sdp)) {
+ sd_read_extended_inquiry(sdkp);
sd_read_block_provisioning(sdkp);
sd_read_block_limits(sdkp);
sd_read_block_characteristics(sdkp);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 74a1e4c..6334dfe 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -84,6 +84,7 @@ struct scsi_disk {
unsigned lbpws10 : 1;
unsigned lbpvpd : 1;
unsigned ws16 : 1;
+ unsigned nv_sup : 1;
};
#define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
> Regards,
>
> Ric
>
next prev parent reply other threads:[~2013-04-24 12:58 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-23 19:41 T10 WCE interpretation in Linux & device level access Ric Wheeler
2013-04-23 20:07 ` James Bottomley
2013-04-23 22:39 ` Jeremy Linton
2013-04-24 5:44 ` Elliott, Robert (Server Storage)
2013-04-24 11:00 ` Ric Wheeler
2013-04-27 16:09 ` James Bottomley
2013-04-24 11:17 ` Paolo Bonzini
2013-04-24 12:07 ` Hannes Reinecke
2013-04-24 12:08 ` Paolo Bonzini
2013-04-24 12:12 ` Hannes Reinecke
2013-04-24 12:23 ` Paolo Bonzini
2013-04-24 12:27 ` Mike Snitzer
2013-04-24 12:27 ` Ric Wheeler
2013-04-24 12:57 ` Paolo Bonzini [this message]
2013-04-24 14:35 ` Jeremy Linton
2013-04-24 18:20 ` Black, David
2013-04-24 20:41 ` Ric Wheeler
2013-04-24 21:02 ` James Bottomley
2013-04-24 21:54 ` Paolo Bonzini
2013-04-24 22:09 ` James Bottomley
2013-04-24 22:36 ` Ric Wheeler
2013-04-24 22:46 ` James Bottomley
2013-04-25 11:35 ` Ric Wheeler
2013-04-25 14:12 ` James Bottomley
2013-04-25 1:32 ` Martin K. Petersen
2013-04-27 6:03 ` Paolo Bonzini
2013-04-24 11:30 ` Hannes Reinecke
2013-04-23 20:28 ` Douglas Gilbert
2013-04-24 15:40 ` Douglas Gilbert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5177D6C2.6080705@redhat.com \
--to=pbonzini@redhat.com \
--cc=Elliott@hp.com \
--cc=Frederick.Knight@netapp.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=david.black@emc.com \
--cc=hare@suse.de \
--cc=jmoyer@redhat.com \
--cc=linux-scsi@vger.kernel.org \
--cc=mkp@mkp.net \
--cc=rwheeler@redhat.com \
--cc=snitzer@redhat.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).