linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Tejun Heo <tj@kernel.org>
Cc: jaxboe@fusionio.com, linux-fsdevel@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	hch@lst.de, James.Bottomley@suse.de, tytso@mit.edu,
	chris.mason@oracle.com, swhiteho@redhat.com,
	konishi.ryusuke@lab.ntt.co.jp, dm-devel@redhat.com, vst@vlnb.net,
	jack@suse.cz, rwheeler@redhat.com, hare@suse.de
Subject: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush
Date: Fri, 20 Aug 2010 15:22:14 +0200	[thread overview]
Message-ID: <20100820132214.GA6184@lst.de> (raw)
In-Reply-To: <1281616891-5691-1-git-send-email-tj@kernel.org>

FYI: here's a little writeup to document the new cache flushing scheme,
intended to replace Documentation/block/barriers.txt.  Any good
suggestion for a filename in the kernel tree?

---

Explicit volatile write cache control
=====================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches.  That means the devices signal I/O completion to the
operating system before data actually has hit the physical medium.  This
behavior obviously speeds up various workloads, but it means the operating
system needs to force data out to the physical medium when it performs
a data integrity operation like fsync, sync or an unmount.

The Linux block layer provides a two simple mechanism that lets filesystems
control the caching behavior of the storage device.  These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.


Explicit cache flushes
----------------------

The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from the
filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  The explicit
guarantees write requests that have completed before the bio was submitted
actually are on the physical medium before this request has started.
In addition the REQ_FLUSH flag can be set on an otherwise empty bio
structure, which causes only an explicit cache flush without any dependent
I/O.  It is recommend to use the blkdev_issue_flush() helper for a pure
cache flush.


Forced Unit Access
-----------------

The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this requests is not
signaled before the data has made it to non-volatile storage on the
physical medium.


Implementation details for filesystems
--------------------------------------

Filesystem can simply set the REQ_FLUSH and REQ_FUA bits and do not have to
worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_FLUSH and REQ_FUA flags
may both be set on a single bio.


Implementation details for make_request_fn based block drivers
--------------------------------------------------------------

These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bits needs to be propagate to underlying devices, and a global flush needs
to be implemented for bios with the REQ_FLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_FLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_FLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.


Implementation details for request_fn based block drivers
--------------------------------------------------------------

For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_FLUSH requests before
entering the driver and strips off the REQ_FLUSH and REQ_FUA bits from
requests that have a payload.  For device with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing:

	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH);

and handle empty REQ_FLUSH requests in it's prep_fn/request_fn.  Note that
REQ_FLUSH requests with a payload are automatically turned into a sequence
of empty REQ_FLUSH and the actual write by the block layer.  For devices
that also support the FUA bit the block layer needs to be told to pass
through that bit using:

	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA);

and handle write requests that have the REQ_FUA bit set properly in it's
prep_fn/request_fn.  If the FUA bit is not natively supported the block
layer turns it into an empty REQ_FLUSH requests after the actual write.

  parent reply	other threads:[~2010-08-20 13:22 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-12 12:41 [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Tejun Heo
2010-08-12 12:41 ` [PATCH 01/11] block/loop: queue ordered mode should be DRAIN_FLUSH Tejun Heo
2010-08-12 12:41 ` [PATCH 02/11] block: kill QUEUE_ORDERED_BY_TAG Tejun Heo
2010-08-13 12:56   ` Vladislav Bolkhovitin
2010-08-13 13:06     ` Christoph Hellwig
2010-08-12 12:41 ` [PATCH 03/11] block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() Tejun Heo
2010-08-14  1:07   ` Jeremy Fitzhardinge
2010-08-14  9:42     ` hch
2010-08-16 20:38       ` Jeremy Fitzhardinge
2010-08-12 12:41 ` [PATCH 04/11] block: remove spurious uses of REQ_HARDBARRIER Tejun Heo
2010-08-12 12:41 ` [PATCH 05/11] block: misc cleanups in barrier code Tejun Heo
2010-08-12 12:41 ` [PATCH 06/11] block: drop barrier ordering by queue draining Tejun Heo
2010-08-12 12:41 ` [PATCH 07/11] block: rename blk-barrier.c to blk-flush.c Tejun Heo
2010-08-12 12:41 ` [PATCH 08/11] block: rename barrier/ordered to flush Tejun Heo
2010-08-17 13:26   ` Christoph Hellwig
2010-08-17 16:23     ` Tejun Heo
2010-08-17 17:08       ` Christoph Hellwig
2010-08-18  6:23         ` Tejun Heo
2010-08-12 12:41 ` [PATCH 09/11] block: implement REQ_FLUSH/FUA based interface for FLUSH/FUA requests Tejun Heo
2010-08-12 12:41 ` [PATCH 10/11] fs, block: propagate REQ_FLUSH/FUA interface to upper layers Tejun Heo
2010-08-12 21:24   ` Jan Kara
2010-08-13  7:19     ` Tejun Heo
2010-08-13  7:47       ` Christoph Hellwig
2010-08-16 16:33   ` [PATCH UPDATED " Tejun Heo
2010-08-12 12:41 ` [PATCH 11/11] block: use REQ_FLUSH in blkdev_issue_flush() Tejun Heo
2010-08-13 11:48 ` [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Christoph Hellwig
2010-08-13 13:48   ` Tejun Heo
2010-08-13 14:38     ` Christoph Hellwig
2010-08-13 14:51       ` Tejun Heo
2010-08-14 10:36         ` Christoph Hellwig
2010-08-17  9:59           ` Tejun Heo
2010-08-17 13:19             ` Christoph Hellwig
2010-08-17 16:41               ` Tejun Heo
2010-08-17 16:59                 ` Christoph Hellwig
2010-08-18  6:35                   ` Tejun Heo
2010-08-18  8:11                     ` Tejun Heo
2010-08-20  8:26                   ` Kiyoshi Ueda
2010-08-23 12:14                     ` Tejun Heo
2010-08-23 14:17                       ` Mike Snitzer
2010-08-24 10:24                         ` Kiyoshi Ueda
2010-08-24 16:59                           ` Tejun Heo
2010-08-24 17:52                             ` Mike Snitzer
2010-08-24 18:14                               ` Tejun Heo
2010-08-25  8:00                             ` Kiyoshi Ueda
2010-08-25 15:28                               ` Mike Snitzer
2010-08-27  9:47                                 ` Kiyoshi Ueda
2010-08-27 13:49                                   ` Mike Snitzer
2010-08-30  6:13                                     ` Kiyoshi Ueda
2010-09-01  0:55                                       ` safety of retrying SYNCHRONIZE CACHE [was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush] Mike Snitzer
2010-09-01  7:32                                         ` Hannes Reinecke
2010-09-01  7:38                                           ` Hannes Reinecke
2010-08-25 15:59                               ` [RFC] training mpath to discern between SCSI errors (was: Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush) Mike Snitzer
2010-08-25 19:15                                 ` [RFC] training mpath to discern between SCSI errors Mike Christie
2010-08-30 11:38                                 ` Hannes Reinecke
2010-08-30 12:07                                   ` Sergei Shtylyov
2010-08-30 12:39                                     ` Hannes Reinecke
2010-08-30 14:52                                       ` [dm-devel] " Hannes Reinecke
2010-10-18  8:09                                         ` Jun'ichi Nomura
2010-10-18 11:55                                           ` Hannes Reinecke
2010-10-19  4:03                                             ` Jun'ichi Nomura
2010-11-19  3:11                                             ` [dm-devel] " Malahal Naineni
2010-11-30 22:59                                               ` Mike Snitzer
2010-12-07 23:16                                                 ` [RFC PATCH 0/3] differentiate between I/O errors Mike Snitzer
2010-12-07 23:16                                                   ` [RFC PATCH v2 1/3] scsi: Detailed " Mike Snitzer
2010-12-07 23:16                                                   ` [RFC PATCH v2 2/3] dm mpath: propagate target errors immediately Mike Snitzer
2010-12-07 23:16                                                   ` [RFC PATCH 3/3] block: improve detail in I/O error messages Mike Snitzer
2010-12-08 11:28                                                     ` Sergei Shtylyov
2010-12-08 15:05                                                       ` [PATCH v2 " Mike Snitzer
2010-12-10 23:40                                                   ` [RFC PATCH 0/3] differentiate between I/O errors Malahal Naineni
2011-01-14  1:15                                                     ` Mike Snitzer
2010-12-17  9:47                                                 ` training mpath to discern between SCSI errors Hannes Reinecke
2010-12-17 14:06                                                   ` Mike Snitzer
2011-01-14  1:09                                                     ` Mike Snitzer
2011-01-14  7:45                                                       ` Hannes Reinecke
2011-01-14 13:59                                                         ` Mike Snitzer
2010-08-24 17:11                       ` [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush Vladislav Bolkhovitin
2010-08-24 23:14                         ` Alan Cox
2010-08-13 12:55 ` Vladislav Bolkhovitin
2010-08-13 13:17   ` Christoph Hellwig
2010-08-18 19:29     ` Vladislav Bolkhovitin
2010-08-13 13:21   ` Tejun Heo
2010-08-18 19:30     ` Vladislav Bolkhovitin
2010-08-19  9:51       ` Tejun Heo
2010-08-30  9:54         ` Hannes Reinecke
2010-08-30 20:34           ` Vladislav Bolkhovitin
2010-08-18  9:46 ` Christoph Hellwig
2010-08-19  9:57   ` Tejun Heo
2010-08-19 10:20     ` Christoph Hellwig
2010-08-19 10:22       ` Tejun Heo
2010-08-20 13:22 ` Christoph Hellwig [this message]
2010-08-20 15:18   ` Ric Wheeler
2010-08-20 16:00     ` Chris Mason
2010-08-20 16:02       ` Ric Wheeler
2010-08-23 12:30     ` Tejun Heo
2010-08-23 12:48       ` Christoph Hellwig
2010-08-23 13:58         ` Ric Wheeler
2010-08-23 14:01           ` Jens Axboe
2010-08-23 14:08             ` Christoph Hellwig
2010-08-23 14:13               ` Tejun Heo
2010-08-23 14:19                 ` Christoph Hellwig
2010-08-25 11:31               ` Jens Axboe
2010-08-30 10:04               ` Hannes Reinecke
2010-08-23 15:19             ` Ric Wheeler
2010-08-23 16:45               ` Sergey Vlasov
2010-08-23 16:49                 ` [dm-devel] " Ric Wheeler
2010-08-23 12:36   ` Tejun Heo
2010-08-23 14:05     ` Christoph Hellwig
2010-08-23 14:15 ` [PATCH] block: simplify queue_next_fseq Christoph Hellwig
2010-08-23 16:28   ` OT grammar nit " John Robinson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100820132214.GA6184@lst.de \
    --to=hch@lst.de \
    --cc=James.Bottomley@suse.de \
    --cc=chris.mason@oracle.com \
    --cc=dm-devel@redhat.com \
    --cc=hare@suse.de \
    --cc=jack@suse.cz \
    --cc=jaxboe@fusionio.com \
    --cc=konishi.ryusuke@lab.ntt.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=rwheeler@redhat.com \
    --cc=swhiteho@redhat.com \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    --cc=vst@vlnb.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).