From: Douglas Gilbert <dgilbert@interlog.com>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"Martin K. Petersen" <mkp@mkp.net>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Jeff Moyer <jmoyer@redhat.com>, Tejun Heo <tj@kernel.org>,
Mike Snitzer <snitzer@redhat.com>
Subject: Re: T10 WCE interpretation in Linux & device level access
Date: Wed, 24 Apr 2013 11:40:12 -0400
Message-ID: <5177FCDC.6010304@interlog.com>
In-Reply-To: <5176E3E8.3000809@redhat.com>

On 13-04-23 03:41 PM, Ric Wheeler wrote:
>
> For many years, we have used WCE as an indication that a device has a volatile
> write cache (not just a write cache) and used this as a trigger to send down
> SYNCHRONIZE_CACHE commands as needed.
>
> Some arrays with non-volatile cache seem to have WCE set and simply ignore the
> command.
>
> Some arrays with non-volatile cache seem to not set WCE.
>
> Other arrays with non-volatile cache - our problem arrays - set WCE and do
> something horrible and slow when sent the SYNCHRONIZE_CACHE commands.
>
> Note that for file systems, you can override this behavior by mounting with
> barriers disabled (mount -o nobarrier .....). There is currently no way to
> disable this for anything using the device directly, not through the file system.
>
> Some applications run against block devices - not through a file system - and
> do not want to slow to a crawl when they have an array in my problem set.
>
> Giving them a hook to ignore WCE seems to be a hack, but one that would resolve
> issues with users who don't want to wait months (years?) for us to convince the
> array vendors.
>
> Is this a hook worth doing?
>
> Have we hashed this out in the T10 committee?

Naturally I'm biased, but I tend to think that user space
is usually smarter than the kernel. That assumes skilled
users.

So if user space issues a SYNCHRONIZE_CACHE with the IMMED
bit set, covering the whole disk, then the user should have
a way of forcing that command to be issued. The assumption
here is that the skilled user is about to power down that
array or pull some disks or SSDs *.
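
For concreteness, here is a minimal sketch (Python, standard
library only) of the command described above: a SYNCHRONIZE
CACHE(10) CDB with IMMED set, whose LBA and NUMBER OF LOGICAL
BLOCKS fields are both zero, which SBC defines as "from that
LBA to the end of the medium", i.e. the whole disk. The
function name is made up for illustration, and how the bytes
reach the device (SG_IO on sg, bsg, a tool) is left out.

```python
import struct

SYNCHRONIZE_CACHE_10 = 0x35  # SBC opcode for SYNCHRONIZE CACHE(10)
IMMED = 0x02                 # IMMED is bit 1 of CDB byte 1


def sync_cache10_cdb(lba: int = 0, num_blocks: int = 0, immed: bool = True) -> bytes:
    """Build a 10-byte SYNCHRONIZE CACHE(10) CDB (hypothetical helper).

    lba=0 with num_blocks=0 requests a sync of every block from
    LBA 0 to the end of the medium.
    """
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in 32 bits; use SYNCHRONIZE CACHE(16)")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("block count does not fit in 16 bits")
    flags = IMMED if immed else 0
    # opcode, flags, 4-byte LBA, group number, 2-byte block count, control
    return struct.pack(">BBIBHB", SYNCHRONIZE_CACHE_10, flags, lba, 0, num_blocks, 0)


if __name__ == "__main__":
    print(sync_cache10_cdb().hex(" "))
```

The same ten bytes could, for example, be handed to sg3_utils'
sg_raw as "sg_raw /dev/sg1 35 02 00 00 00 00 00 00 00 00"
(the device node is an assumption).
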

The more questionable cases are when a file system or the
block layer issues a barrier (or similar) that translates
to a SYNCHRONIZE_CACHE. In some of the cases already
discussed in this thread, that command should be ignored.

While working with SoCs I have noticed an interesting
technique. Subsystem-sized sections of the memory-mapped
IO space (e.g. a bank of GPIOs) can be write protected by
writing a simple ASCII sequence **. Attempts to change
configuration registers after write protection are ignored
and an error is noted (if anyone cares). The same ASCII
sequence can be used to remove write protection from those
subsystem configuration registers. Typically on a SoC, if
the GPIOs are randomly reconfigured, it's game over.
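
A toy model of that SoC scheme, as a sketch only: the key
b"LOCK", the register count, and "a full match of the key
toggles protection" are assumptions. Real SoCs differ in
where and how the sequence is written (often to a dedicated
lock register).

```python
class ProtectedRegisterBank:
    """Toy write-protectable bank of configuration registers (hypothetical)."""

    def __init__(self, key: bytes, size: int = 8):
        self._key = key
        self._locked = False
        self._pending = b""
        self.config = [0] * size
        self.rejected_writes = 0  # "an error is noted (if anyone cares)"

    def write_key_byte(self, b: int) -> None:
        # Keep a sliding window of the last len(key) bytes written;
        # a full match toggles write protection on or off.
        self._pending = (self._pending + bytes([b]))[-len(self._key):]
        if self._pending == self._key:
            self._locked = not self._locked
            self._pending = b""

    def write_config(self, index: int, value: int) -> bool:
        if self._locked:
            self.rejected_writes += 1  # write ignored, error counted
            return False
        self.config[index] = value
        return True
```

Only the configuration registers are modelled; data writes
to the GPIOs themselves would bypass the lock.
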

Back to the SCSI world: a better solution might be for an
LLD to be informed of the reason a SCSI control command is
being issued (a sort of "come from" field). Failing that,
or in addition to it, a sysfs interface could be added to
filter out "dangerous" SCSI commands:
echo "SC" > /sys/class/scsi_device/8:0:0:0/device/filter
cat /sys/class/scsi_device/8:0:0:0/device/filter
FU SC
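
One way to picture how a "come from" field and such a filter
attribute could interact. Everything here is hypothetical:
the Origin enum, the mnemonic table (reading "FU" as FORMAT
UNIT is a guess), and the policy that lets a user-issued
SYNCHRONIZE CACHE through while dropping one generated by a
file system or the block layer.

```python
from enum import Enum


class Origin(Enum):
    """Hypothetical "come from" tag attached to each command."""
    FILESYSTEM = "filesystem"    # e.g. a barrier/flush from a mounted fs
    BLOCK_LAYER = "block"
    PASSTHROUGH = "passthrough"  # e.g. a CDB injected via sg or bsg


# Hypothetical mnemonics accepted by the proposed "filter" attribute.
MNEMONIC_OPCODES = {
    "SC": {0x35, 0x91},  # SYNCHRONIZE CACHE(10) and (16)
    "FU": {0x04},        # assumed: FORMAT UNIT
}
# Assumed policy: these may still be forced by a pass-through user.
USER_OVERRIDABLE = MNEMONIC_OPCODES["SC"]


def parse_filter(text: str) -> set:
    """Turn e.g. "FU SC" into the set of opcodes to filter."""
    opcodes = set()
    for token in text.split():
        opcodes |= MNEMONIC_OPCODES[token.upper()]  # KeyError if unknown
    return opcodes


def should_issue(cdb: bytes, origin: Origin, filtered: set) -> bool:
    """Return True if the command should be passed on to the LLD."""
    opcode = cdb[0]
    if opcode not in filtered:
        return True
    return origin is Origin.PASSTHROUGH and opcode in USER_OVERRIDABLE
```
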

If, for whatever reason, we did ignore a SYNCHRONIZE_CACHE
command, we could use vendor specific sense data (vendor=Linux)
to indicate that the command had been ignored. That could be
extended to all SCSI commands that are filtered out ***;
that would be better than EIO, EACCES, etc.
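
A sketch of what "ignored, reported via vendor specific sense
data" might look like, using descriptor-format sense (response
code 0x72). The sense key, the zero ASC/ASCQ, descriptor type
0x80 and the "LINUX" payload layout are all assumptions; SPC
only reserves descriptor types 0x80 to 0xFF for vendor use.

```python
import struct

RECOVERED_ERROR = 0x1    # assumption: no sense key is specified above
VENDOR_DESC_TYPE = 0x80  # hypothetical; 0x80-0xFF are vendor specific
VENDOR_TAG = b"LINUX".ljust(8)  # hypothetical 8-byte vendor tag


def ignored_command_sense(opcode: int) -> bytes:
    """Build descriptor-format sense saying `opcode` was ignored (sketch)."""
    payload = VENDOR_TAG + bytes([opcode])
    descriptor = bytes([VENDOR_DESC_TYPE, len(payload)]) + payload
    # response code, sense key, ASC, ASCQ, 3 reserved bytes, additional length
    header = struct.pack(">BBBB3xB", 0x72, RECOVERED_ERROR, 0x00, 0x00,
                         len(descriptor))
    return header + descriptor


def was_filtered_by_linux(sense: bytes) -> bool:
    """Walk the sense descriptors looking for the hypothetical tag."""
    if len(sense) < 8 or sense[0] not in (0x72, 0x73):
        return False
    end = min(8 + sense[7], len(sense))
    i = 8
    while i + 2 <= end:
        dtype, dlen = sense[i], sense[i + 1]
        if dtype == VENDOR_DESC_TYPE and sense[i + 2:i + 10] == VENDOR_TAG:
            return True
        i += 2 + dlen
    return False
```

A pass-through user could then tell "filtered by Linux" apart
from a genuine device error instead of seeing a bare EIO.
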

Doug Gilbert

* and if Linux doesn't permit this, then the user might be
advised to run another, more obedient, host OS with
Linux running as a VM. A "pass-by" rather than a
"pass-through" ...

** only the configuration registers are write protected, so
data can still be written to the GPIOs

*** like me, many pass-through users cannot see why SCSI
commands injected into the SCSI subsystem (e.g. via
sg or bsg) are silently filtered out by the block layer.