From: James Bottomley <jbottomley@parallels.com>
To: "Elliott@hp.com" <Elliott@hp.com>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
"michaelc@cs.wisc.edu" <michaelc@cs.wisc.edu>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"ohering@suse.com" <ohering@suse.com>,
"hch@infradead.org" <hch@infradead.org>,
"apw@canonical.com" <apw@canonical.com>,
"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
Subject: Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout
Date: Fri, 18 Jul 2014 15:39:25 +0000 [thread overview]
Message-ID: <1405697964.605.27.camel@jarvis.lan> (raw)
In-Reply-To: <94D0CD8314A33A4D9D801C0FE68B402958BAC6DC@G9W0745.americas.hpqcorp.net>
On Fri, 2014-07-18 at 00:51 +0000, Elliott, Robert (Server Storage)
wrote:
> In sd_sync_cache:
> rq->timeout *= SD_FLUSH_TIMEOUT_MULTIPLIER;
>
> Regardless of the baseline for the multiplication, a magic
> number of 2 is too arbitrary. That might work for an
> individual drive, but could be far too short for a RAID
> controller that runs into worst case error handling for
> the drives to which it is flushing (e.g., if its cache
> is volatile and the drives all have recoverable errors
> during writes). That time goes up with a bigger cache and
> with more fragmented write data in the cache requiring
> more individual WRITE commands.
RAID devices with gigabytes of cache are usually battery backed and lie
about their cache type (they usually claim write through). This
behaviour is exactly what we want because the flush is used to enforce
write barriers for filesystem consistency, so we only want to flush
volatile caches.
The rare case you cite, where the RAID device is volatile and having
difficulty flushing, we do want a failure because otherwise the
filesystem will become unusable and the system will live lock ...
barriers are sent down frequently under normal filesystem operation.
The SCSI standards committees have been struggling with defining and
detecting the difference between volatile and non-volatile cache and the
heuristics we'd have to use to avoid annoying USB devices with detecting
this don't look pretty. We always zero the SYNC_NV bit anyway, so even
if the devices stopped lying, we'd be safe.
> A better value would be the Recommended Command Timeout field
> value reported in the REPORT SUPPORTED OPERATION CODES command,
> if reported by the device server. That is supposed to account
> for the worst case.
>
> For cases where that is not reported, exposing the multiplier
> in sysfs would let the timeout for simple devices be set to
> smaller values than complex devices.
>
> Also, in both sd_setup_flush_cmnd and sd_sync_cache:
> cmd->cmnd[0] = SYNCHRONIZE_CACHE;
> cmd->cmd_len = 10;
>
> SYNCHRONIZE CACHE (16) should be favored over SYNCHRONIZE
> CACHE (10) unless SYNCHRONIZE CACHE (10) is not supported.
For what reason. We usually go for the safe alternatives, which is 10
byte commands because they have the widest testing and greatest level of
support. We don't do range flushes currently, so there doesn't seem to
be a practical different. If we did support range flushes, we'd likely
only use SC(16) on >2TB devices.
James
next prev parent reply other threads:[~2014-07-18 15:39 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-04 16:33 [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout K. Y. Srinivasan
2014-06-04 17:02 ` James Bottomley
2014-06-04 17:15 ` KY Srinivasan
2014-06-06 1:32 ` Mike Christie
2014-06-06 2:53 ` KY Srinivasan
2014-06-06 17:18 ` Mike Christie
2014-06-06 17:52 ` James Bottomley
2014-06-06 18:22 ` Jens Axboe
2014-06-20 21:36 ` KY Srinivasan
2014-07-17 23:53 ` KY Srinivasan
2014-07-18 0:51 ` Elliott, Robert (Server Storage)
2014-07-18 15:11 ` Christoph Hellwig (hch@infradead.org)
2014-07-18 15:39 ` James Bottomley [this message]
2014-07-18 17:17 ` Elliott, Robert (Server Storage)
2014-07-18 17:41 ` James Bottomley
2014-07-18 18:16 ` Douglas Gilbert
2014-07-18 15:10 ` Christoph Hellwig (hch@infradead.org)
2014-07-18 16:44 ` KY Srinivasan
2014-07-18 16:57 ` James Bottomley
2014-07-18 17:01 ` hch
2014-07-18 17:05 ` KY Srinivasan
2014-07-18 17:12 ` hch
2014-07-18 17:15 ` hch
2014-07-18 17:00 ` Christoph Hellwig
2014-07-18 17:03 ` James Bottomley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1405697964.605.27.camel@jarvis.lan \
--to=jbottomley@parallels.com \
--cc=Elliott@hp.com \
--cc=apw@canonical.com \
--cc=axboe@kernel.dk \
--cc=devel@linuxdriverproject.org \
--cc=gregkh@linuxfoundation.org \
--cc=hch@infradead.org \
--cc=jasowang@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=michaelc@cs.wisc.edu \
--cc=ohering@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).