Re: T10 WCE interpretation in Linux & device level access

linux-scsi.vger.kernel.org archive mirror

From: Ric Wheeler <rwheeler@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"Martin K. Petersen" <mkp@mkp.net>,
	Jeff Moyer <jmoyer@redhat.com>, Tejun Heo <tj@kernel.org>,
	Mike Snitzer <snitzer@redhat.com>,
	"Black, David" <david.black@emc.com>,
	"Elliott, Robert (Server Storage)" <Elliott@hp.com>,
	"Knight, Frederick" <Frederick.Knight@netapp.com>
Subject: Re: T10 WCE interpretation in Linux & device level access
Date: Wed, 24 Apr 2013 08:27:34 -0400
Message-ID: <5177CFB6.9070105@redhat.com>
In-Reply-To: <5177CB23.5090802@redhat.com>

On 04/24/2013 08:08 AM, Paolo Bonzini wrote:
> On 24/04/2013 14:07, Hannes Reinecke wrote:
>> On 04/24/2013 01:17 PM, Paolo Bonzini wrote:
>>> On 23/04/2013 22:07, James Bottomley wrote:
>>>> On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
>>>>> For many years, we have used WCE as an indication that a device has a volatile
>>>>> write cache (not just a write cache) and used this as a trigger to send down
>>>>> SYNCHRONIZE_CACHE commands as needed.
>>>>>
>>>>> Some arrays with non-volatile cache seem to have WCE set and simply ignore the
>>>>> command.
>>>> I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
>>>> which if unset (which it is in our implementation) means only sync your
>>>> non-NV cache.  For a device with all NV, that equates to nop.
>>> Isn't it the other way round?
>>>
>>> SYNC_NV = 0 means "sync all your caches to the medium", and it's what we do.
>>>
>>> SYNC_NV = 1 means "sync volatile to non-volatile", and it's what Ric wants.
>>>
>>> So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
>>> is non-removable just to err on the safe side.
>> Or use 'WRITE_AND_VERIFY' here; that's guaranteed to hit the disk.
>> Plus it even has a guarantee about data consistency on the disk,
>> which the normal WRITE command has not.
> The point is to _avoid_ hitting the disk. :)
>
> Paolo
>

The point is to have a crash-proof version of the data acknowledged by the
target device while letting data sit in a volatile state as long as possible. To
be even clearer, we would love to do this for a sub-range of the device but 
currently use a "big hammer" to flush the entire cache (possibly for multiple 
file systems on one target storage device).
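For reference, the SYNCHRONIZE CACHE(10) CDB already carries a starting LBA and a block count, so a ranged flush can at least be expressed on the wire; a count of zero means "from that LBA to the end of the medium", so LBA 0 with count 0 is the whole-device big hammer. A minimal sketch of filling in the CDB, assuming the SBC-2 layout (SYNC_NV in byte 1, bit 2); later SBC revisions may treat that bit differently, so check the revision the target claims. The function and macro names are hypothetical, and whether an array actually honours the range or SYNC_NV is exactly what is in question here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names; values follow the SBC-2 SYNCHRONIZE CACHE(10) layout. */
#define SYNCHRONIZE_CACHE_10 0x35
#define SC_SYNC_NV 0x04 /* byte 1, bit 2 */

/* Fill a 10-byte SYNCHRONIZE CACHE(10) CDB.
 * nblocks == 0 asks for everything from lba to the end of the medium,
 * so lba = 0, nblocks = 0 flushes the whole device. */
void build_sync_cache10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks,
                        int sync_nv)
{
    memset(cdb, 0, 10);
    cdb[0] = SYNCHRONIZE_CACHE_10;
    if (sync_nv)
        cdb[1] |= SC_SYNC_NV;
    /* LOGICAL BLOCK ADDRESS, big-endian, bytes 2..5 */
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    /* byte 6: GROUP NUMBER, left at 0 */
    /* NUMBER OF LOGICAL BLOCKS, big-endian, bytes 7..8 */
    cdb[7] = (uint8_t)(nblocks >> 8);
    cdb[8] = (uint8_t)nblocks;
    /* byte 9: CONTROL, left at 0 */
}
```

A sub-range flush of 256 blocks at LBA 0x12345678 would be `build_sync_cache10(cdb, 0x12345678, 256, 0)`.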

Once a SYNCHRONIZE_CACHE (or ATA FLUSH CACHE EXT) command completes, we want the
data on that target device to survive a power loss.

If the device can promise this, we don't care (and don't know) how it keeps
that promise. It can keep the data in battery-backed DRAM, destage it to
flash, or use any other scheme that works.
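Paolo's suggestion upthread (set SYNC_NV=1 when NV_SUP is set, and only for non-removable media) could be expressed as a small policy function. A sketch, assuming WCE is read from byte 2, bit 2 of the Caching mode page (08h) and NV_SUP from byte 6, bit 1 of the Extended INQUIRY Data VPD page (86h); struct cache_caps, parse_caps() and choose_flush() are hypothetical names, not existing sd.c code, and the policy encodes Paolo's reading of SYNC_NV rather than James's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the cache-related bits a driver can read. */
struct cache_caps {
    bool wce;       /* Caching mode page (08h), byte 2 bit 2 */
    bool nv_sup;    /* Extended INQUIRY VPD page (86h), byte 6 bit 1 */
    bool removable; /* RMB bit from standard INQUIRY data */
};

enum flush_policy {
    FLUSH_NONE,      /* WCE=0: treated as "no volatile write cache" */
    FLUSH_TO_MEDIUM, /* SYNCHRONIZE CACHE with SYNC_NV=0 (today's behaviour) */
    FLUSH_TO_NV      /* SYNCHRONIZE CACHE with SYNC_NV=1 */
};

/* caching_page points at the mode page itself (byte 0 = page code 08h);
 * ext_inq_vpd points at the start of the VPD page (byte 1 = 86h). */
static struct cache_caps parse_caps(const uint8_t *caching_page,
                                    const uint8_t *ext_inq_vpd,
                                    bool removable)
{
    struct cache_caps c;
    c.wce = (caching_page[2] & 0x04) != 0;
    c.nv_sup = (ext_inq_vpd[6] & 0x02) != 0;
    c.removable = removable;
    return c;
}

static enum flush_policy choose_flush(const struct cache_caps *c)
{
    if (!c->wce)
        return FLUSH_NONE;
    /* Relax to "volatile -> non-volatile" only when the device claims a
     * non-volatile cache and the medium is not removable. */
    if (c->nv_sup && !c->removable)
        return FLUSH_TO_NV;
    return FLUSH_TO_MEDIUM;
}
```

This only helps if devices report these bits honestly, which is the underlying problem described here.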

Just as importantly, we don't want to "destage" data to the back-end drives if
that is not required, since destaging is really, really slow.

The confusion here is that various storage devices have used the standard bits
in arbitrary ways, which makes it very hard to have one clear set of rules.

It is even harder to explain to end users when to use a workaround (like mount -o
nobarrier) or the proposed "ignore flushes" block-level call :)

Regards,

Ric


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5177CFB6.9070105@redhat.com \
    --to=rwheeler@redhat.com \
    --cc=Elliott@hp.com \
    --cc=Frederick.Knight@netapp.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=david.black@emc.com \
    --cc=hare@suse.de \
    --cc=jmoyer@redhat.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mkp@mkp.net \
    --cc=pbonzini@redhat.com \
    --cc=snitzer@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).