From: Douglas Gilbert <dgilbert@interlog.com>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"Martin K. Petersen" <mkp@mkp.net>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Jeff Moyer <jmoyer@redhat.com>, Tejun Heo <tj@kernel.org>,
Mike Snitzer <snitzer@redhat.com>
Subject: Re: T10 WCE interpretation in Linux & device level access
Date: Wed, 24 Apr 2013 11:40:12 -0400
Message-ID: <5177FCDC.6010304@interlog.com>
In-Reply-To: <5176E3E8.3000809@redhat.com>

On 13-04-23 03:41 PM, Ric Wheeler wrote:
>
> For many years, we have used WCE as an indication that a device has a volatile
> write cache (not just a write cache) and used this as a trigger to send down
> SYNCHRONIZE_CACHE commands as needed.
>
> Some arrays with non-volatile cache seem to have WCE set and simply ignore the
> command.
>
> Some arrays with non-volatile cache seem to not set WCE.
>
> Other arrays with non-volatile cache - our problem arrays - set WCE and do
> something horrible and slow when sent the SYNCHRONIZE_CACHE commands.
>
> Note that for file systems, you can override this behavior by mounting with
> barriers disabled (mount -o nobarrier .....). There is currently no way to
> disable this for anything using the device directly, not through the file system.
>
> Some applications run against block devices - not through a file system - and
> do not want to slow to a crawl when they have an array in my problem set.
>
> Giving them a hook to ignore WCE seems to be a hack, but one that would resolve
> issues with users who don't want to wait months (years?) for us to convince the
> array vendors.
>
> Is this a hook worth doing?
>
> Have we hashed this out in the T10 committee?

Naturally I'm biased, but I tend to think that user space
is usually smarter than the kernel. That assumes skilled
users.

So if user space issues a SYNCHRONIZE_CACHE with the
IMMED bit set and covering the whole disk, then the user
should have a way of forcing that command to be issued.
The assumption here is that the skilled user is about to
power down that array or pull some disks or SSDs *.

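For reference, a minimal sketch in C of the command described above: a SYNCHRONIZE CACHE (10) CDB (opcode 0x35) with the IMMED bit (byte 1, bit 1) set, and with both the starting LBA and the NUMBER OF LOGICAL BLOCKS field set to zero, which SBC treats as "from that LBA to the end of the medium", i.e. the whole disk. The helper name build_sync_cache10() is hypothetical, and actually submitting the CDB (e.g. via the SG_IO ioctl on an sg device) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SYNC_CACHE10_OPCODE 0x35
#define SYNC_CACHE10_IMMED  0x02 /* byte 1, bit 1 */

/*
 * HYPOTHETICAL helper: fill a 10-byte SYNCHRONIZE CACHE (10) CDB.
 * With lba == 0 and num_blocks == 0 the whole medium is covered.
 */
static void build_sync_cache10(uint8_t cdb[10], int immed,
                               uint32_t lba, uint16_t num_blocks)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);
    cdb[0] = SYNC_CACHE10_OPCODE;
    if (immed)
        cdb[1] |= SYNC_CACHE10_IMMED;   /* status may return before the flush completes */
    cdb[2] = (uint8_t)(lba >> 24);      /* LOGICAL BLOCK ADDRESS, big-endian */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    /* cdb[6]: GROUP NUMBER, left zero */
    cdb[7] = (uint8_t)(num_blocks >> 8); /* NUMBER OF LOGICAL BLOCKS, big-endian */
    cdb[8] = (uint8_t)num_blocks;
    /* cdb[9]: CONTROL, left zero */
}
```

Calling build_sync_cache10(cdb, 1, 0, 0) yields the bytes 35 02 00 00 00 00 00 00 00 00.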
The more questionable cases are when a file system or the
block layer issues a barrier, or something similar, that
translates to a SYNCHRONIZE_CACHE. Those commands should
be ignored in some of the cases already discussed in this
thread.

While working with SoCs I have noticed an interesting
technique. Sub-system sized sections of the memory-mapped
IO space (e.g. a bank of GPIOs) can be write protected by
a simple ASCII sequence **. Attempts to change configuration
registers after write protection is enabled are ignored and
an error is noted (if anyone cares). The same ASCII sequence
can be used to remove write protection from those sub-system
configuration registers. Typically on a SoC, if the GPIOs are
randomly re-configured, it's game over.

Back to the SCSI world: a better solution might be for an
LLD to be informed of the reason a SCSI control command
is being issued (a sort of "come from" field). Failing that,
or in addition to it, a sysfs interface could be added to
filter out "dangerous" SCSI commands:
echo "SC" > /sys/class/scsi_device/8:0:0:0/device/filter
cat /sys/class/scsi_device/8:0:0:0/device/filter
FU SC
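A minimal sketch of how the proposed filter attribute might be parsed and consulted. No such sysfs file exists, and the token meanings are assumptions: FU is taken to mean FORMAT UNIT (opcode 0x04) and SC to cover SYNCHRONIZE CACHE (10) and (16) (opcodes 0x35 and 0x91). Because the example shows "FU SC" after only "SC" was written, writes are assumed to add to the existing set rather than replace it. struct cmd_filter, cmd_filter_add() and cmd_filter_blocks() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL token table for the proposed "filter" attribute. */
struct filter_token {
    const char *name;
    uint8_t opcodes[2];
    size_t n_opcodes;
};

static const struct filter_token filter_tokens[] = {
    { "FU", { 0x04 }, 1 },       /* FORMAT UNIT (assumed meaning of FU) */
    { "SC", { 0x35, 0x91 }, 2 }, /* SYNCHRONIZE CACHE (10) and (16) */
};

/* One flag per SCSI operation code; true means "do not pass on". */
struct cmd_filter {
    bool blocked[256];
};

static void cmd_filter_init(struct cmd_filter *f)
{
    memset(f, 0, sizeof(*f));
}

/*
 * Add the tokens in text (e.g. "SC\n" as written by echo) to f.
 * Returns 0 on success, -1 at the first unknown token; tokens before
 * the unknown one have already been applied.
 */
static int cmd_filter_add(struct cmd_filter *f, const char *text)
{
    assert(f != NULL && text != NULL);
    for (;;) {
        text += strspn(text, " \t\n");      /* skip separators */
        if (*text == '\0')
            return 0;
        size_t len = strcspn(text, " \t\n"); /* length of this token */
        const struct filter_token *tok = NULL;
        for (size_t i = 0; i < sizeof(filter_tokens) / sizeof(filter_tokens[0]); i++) {
            if (strlen(filter_tokens[i].name) == len &&
                strncmp(filter_tokens[i].name, text, len) == 0) {
                tok = &filter_tokens[i];
                break;
            }
        }
        if (tok == NULL)
            return -1;
        for (size_t j = 0; j < tok->n_opcodes; j++)
            f->blocked[tok->opcodes[j]] = true;
        text += len;
    }
}

/* Would a command with this operation code be filtered out? */
static bool cmd_filter_blocks(const struct cmd_filter *f, uint8_t opcode)
{
    return f->blocked[opcode];
}
```

Writing "FU" and later "SC\n" leaves opcodes 0x04, 0x35 and 0x91 blocked, matching the "FU SC" shown by cat above, while a READ (10) (0x28) still passes.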

If, for whatever reason, we did ignore a SYNCHRONIZE_CACHE
command, we could use vendor-specific sense data (vendor=Linux)
to indicate that the command had been ignored. That could be
extended to all SCSI commands that are filtered out ***;
better that than EIO, EACCES, etc.

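A sketch of what such sense data could look like, using descriptor-format sense data (response code 0x72) and the ranges SPC leaves for vendor use: ASC values 80h-FFh and sense data descriptor types 80h-FFh. The specific ASC (0x80), descriptor type (0x80), descriptor layout (the filtered opcode followed by an 8-byte vendor identification "LINUX   ") and the function name build_filtered_sense() are all hypothetical; the sense key is left to the caller since choosing it is exactly what would need discussing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SENSE_DESC_FORMAT_CURRENT 0x72 /* current error, descriptor format */
#define HYPO_ASC_CMD_FILTERED     0x80 /* HYPOTHETICAL: vendor-specific ASC */
#define HYPO_DESC_TYPE_LINUX      0x80 /* HYPOTHETICAL: vendor-specific descriptor type */
#define FILTERED_SENSE_LEN        20   /* 8-byte header + 12-byte descriptor */

/*
 * Fill buf with descriptor-format sense data reporting that the command
 * with operation code `opcode` was filtered out rather than sent.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t build_filtered_sense(uint8_t *buf, size_t buf_len,
                                   uint8_t sense_key, uint8_t opcode)
{
    static const char vendor_id[8] = { 'L', 'I', 'N', 'U', 'X', ' ', ' ', ' ' };

    assert(buf != NULL);
    if (buf_len < FILTERED_SENSE_LEN)
        return 0;
    memset(buf, 0, FILTERED_SENSE_LEN);

    buf[0] = SENSE_DESC_FORMAT_CURRENT;
    buf[1] = sense_key & 0x0f;       /* SENSE KEY */
    buf[2] = HYPO_ASC_CMD_FILTERED;  /* ASC */
    buf[3] = 0x00;                   /* ASCQ */
    buf[7] = FILTERED_SENSE_LEN - 8; /* ADDITIONAL SENSE LENGTH */

    /* Vendor-specific descriptor: type, additional length, payload. */
    buf[8] = HYPO_DESC_TYPE_LINUX;
    buf[9] = 10;                     /* descriptor bytes following byte 9 */
    buf[10] = opcode;                /* the command that was not sent */
    /* buf[11]: reserved */
    memcpy(&buf[12], vendor_id, sizeof(vendor_id));
    return FILTERED_SENSE_LEN;
}
```

A caller might pass RECOVERED ERROR (0x01) if a filtered command should look harmless, or ILLEGAL REQUEST (0x05) if it should fail; either way the descriptor tells user space why.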
Doug Gilbert

* and if Linux doesn't permit this, then the user might be
advised to run another, more obedient, host OS with
Linux running as a VM. A "pass-by" rather than a
"pass-through" ...
** only the configuration registers are write protected, so
data can still be written to the GPIOs
*** like me, many pass-through users cannot see why SCSI
commands injected into the SCSI subsystem (e.g. via
sg or bsg) are silently filtered out by the block layer.