linux-scsi.vger.kernel.org archive mirror
From: Mike Snitzer <snitzer@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"Martin K. Petersen" <mkp@mkp.net>,
	Jeff Moyer <jmoyer@redhat.com>, Tejun Heo <tj@kernel.org>
Subject: Re: T10 WCE interpretation in Linux & device level access
Date: Wed, 24 Apr 2013 08:27:09 -0400
Message-ID: <20130424122709.GA12155@redhat.com>
In-Reply-To: <5177CC31.7090700@suse.de>

On Wed, Apr 24 2013 at  8:12am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 04/24/2013 02:08 PM, Paolo Bonzini wrote:
> > On 24/04/2013 14:07, Hannes Reinecke wrote:
> >> On 04/24/2013 01:17 PM, Paolo Bonzini wrote:
> >>> On 23/04/2013 22:07, James Bottomley wrote:
> >>>> On Tue, 2013-04-23 at 15:41 -0400, Ric Wheeler wrote:
> >>>>> For many years, we have used WCE as an indication that a device has a volatile 
> >>>>> write cache (not just a write cache) and used this as a trigger to send down 
> >>>>> SYNCHRONIZE_CACHE commands as needed.
> >>>>>
> >>>>> Some arrays with non-volatile cache seem to have WCE set and simply ignore the 
> >>>>> command.
> >>>>
> >>>> I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
> >>>> which if unset (which it is in our implementation) means only sync your
> >>>> non-NV cache.  For a device with all NV, that equates to nop.
> >>>
> >>> Isn't it the other way round?
> >>>
> >>> SYNC_NV = 0 means "sync all your caches to the medium", and it's what we do.
> >>>
> >>> SYNC_NV = 1 means "sync volatile to non-volatile", and it's what Ric wants.
> >>>
> >>> So we should set SYNC_NV=1 if NV_SUP is set, perhaps only if the medium
> >>> is non-removable just to err on the safe side.
> >>
> >> Or use 'WRITE_AND_VERIFY' here; that's guaranteed to hit the disk.
> >> Plus it even has a guarantee about data consistency on the disk,
> >> which the normal WRITE command has not.
> > 
> > The point is to _avoid_ hitting the disk. :)
> > 
> Ah. Really?
> 
> Why do we discuss SYNCHRONIZE CACHE then?
> I was under the impression that we're talking various 'barriers'
> (or rather 'flush' nowadays) implementations.
> Which require that some data needs to be written to disk before
> continuing.
> 
> Or did I somehow misread the thread?

This thread was motivated by the fact that the storage is reporting
WCE=1 and OracleDB (with ASM) is issuing SYNCHRONIZE CACHE (via
REQ_FLUSH) which the array in question handles _very_ slowly (even
though it is battery backed).

So the question Ric had is: should we expose a new knob that allows
admins to impose WCE=0 behavior (avoiding the SYNCHRONIZE CACHE)?

I'm concerned such a knob will be abused for the benefit of speed and
all data integrity caution will get thrown to the wind (much like the
nobarrier FS mount option).
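
To make the options in this thread concrete, here is a minimal sketch of
the decision being discussed.  It is illustrative only, not kernel code:
FlushAction, flush_action() and the force_write_through parameter are
hypothetical names standing in for the proposed knob, and the SYNC_NV=1
branch follows Paolo's suggestion quoted above (NV_SUP set and
non-removable medium).

```python
from enum import Enum


class FlushAction(Enum):
    """What the initiator does when a flush (REQ_FLUSH) arrives."""
    NONE = "no SYNCHRONIZE CACHE sent"
    SYNC_TO_NV = "SYNCHRONIZE CACHE with SYNC_NV=1"
    SYNC_TO_MEDIUM = "SYNCHRONIZE CACHE with SYNC_NV=0"


def flush_action(wce, nv_sup=False, removable=False,
                 force_write_through=False):
    """Pick a flush action from the reported device state.

    wce                  -- device reports WCE=1
    nv_sup               -- device reports NV_SUP (non-volatile cache)
    removable            -- medium is removable
    force_write_through  -- hypothetical admin knob imposing WCE=0 behavior
    """
    # The proposed knob: treat the device as if it reported WCE=0.
    if force_write_through or not wce:
        return FlushAction.NONE
    # Paolo's suggestion: only sync volatile cache to non-volatile cache
    # when the device advertises NV_SUP and the medium is non-removable.
    if nv_sup and not removable:
        return FlushAction.SYNC_TO_NV
    # Current behavior as described in the thread: sync all the way
    # to the medium.
    return FlushAction.SYNC_TO_MEDIUM


if __name__ == "__main__":
    # The array in question: WCE=1, battery backed, no knob set.
    print(flush_action(wce=True).value)
    # Same array with the proposed knob turned on.
    print(flush_action(wce=True, force_write_through=True).value)
```

The last case is the one that worries me: with the knob set, nothing
in this logic checks whether the cache really is non-volatile.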

Mike
