From: Nikhilesh Reddy <reddyn@codeaurora.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: Using Cache barriers in lieu of REQ_FLUSH | REQ_FUA for emmc 5.1 (jdec spec JESD84-B51)
Date: Mon, 28 Sep 2015 15:28:16 -0700 [thread overview]
Message-ID: <5609BF00.5000502@codeaurora.org> (raw)
In-Reply-To: <20150920034248.GB2909@thunk.org>
On Sat 19 Sep 2015 08:42:48 PM PDT, Theodore Ts'o wrote:
> On Tue, Sep 15, 2015 at 04:17:46PM -0700, Nikhilesh Reddy wrote:
>>
>> The eMMC 5.1 spec defines cache "barrier" capability of the eMMC device as
>> defined in JESD84-B51
>>
>> I was wondering if there were any downsides to replacing the
>> WRITE_FLUSH_FUA with the cache barrier?
>>
>> I understand that REQ_FLUSH is used to ensure that the current cache be
>> flushed to prevent any reordering but I dont seem to be clear on why
>> REQ_FUA is used.
>> Can someone please help me understand this part?
>>
>> I know there there was a big decision in 2010
>> https://lwn.net/Articles/400541/
>> and http://lwn.net/Articles/399148/
>> to remove the software based barrier support... but with the hardware
>> supporting "barriers" is there a downside to using them to replace the
>> flushes?
>
> OK, so a couple of things here.
>
> There is queuing happening at two different layers in the system;
> once at the block device layer, and one at the storage device layer.
> (Possibly more if you have a hardware RAID card, etc., but for this
> discussion, what's important is the queuing which is happening inside
> the kernel, and that which is happening below the kernel.
>
> The transition in 2010 is referring to how we handle barriers at the
> block device layer, and was inspired by the fact that at that time,
> the vast majority of the storage devices only supported "cache flush"
> at the storage layer, and a few devices would support FUA (Force Unit
> Attention) requests. But it can support devices which have a true
> cache barrier function.
>
> So when we say REQ_FLUSH, what we mean is that the writes are flushed
> from the block layer command queues to the storage device, and that
> subsequent writes will not be reordered before the flush. Since most
> devices don't support a cache barrier command, this is implemented in
> practice as a FLUSH CACHE, but if the device supports cache barrier
> command, that would be sufficient.
>
> The FUA write command is the command that actually has temporal
> meaning; the device is not supported to signal completion until that
> particular write has been committed to stable store. And if you
> combine that with a flush command, as in WRITE_FLUSH_FUA, then that
> implies a cache barrier, followed by a write that should not return
> until write (FUA), and all preceeding writes, have been committed to
> stable store (implied by the cache barrier).
>
> For devices that support a cache barrier, a REQ_FLUSH can be
> implemented using a cache barrier. If the storage device does not
> support a cache barrier, the much stronger FLUSH CACHE command will
> also work, and in practice, that's what gets used in for most storage
> devices today.
>
> For devices that don't support a FUA write, this can be simulated
> using the (overly strong) combination of a write followed by a FLUSH
> CACHE command. (Note, due to regressions caused by buggy hardware,
> the libata driver does not enable FUA by default. Interestingly,
> apparently Windows 2012 and newer no longer tries to use FUA either;
> maybe Microsoft has run into consumer-grade storage devices with
> crappy firmware? That being said, if you are using SATA drives which
> in a JBOD which is has a SAS expander, you *are* using FUA --- but
> presumably people who are doing this are at bigger shops who can do
> proper HDD validation and can lean on their storage vendors to make
> sure any firmware bugs they find get fixed.)
>
> So for ext4, when we do a journal commit, first we write the journal
> blocks, then a REQ_FLUSH, and then we FUA write the commit block ---
> which for commodity SATA drives, gets translated to write the journal
> blocks, FLUSH CACHE, write the commit block, FLUSH CACHE.
>
> If your storage device has support for a barrier command and FUA, then
> this could also be translated to write the journal blocks, CACHE
> BARRIER, FUA WRITE the commit block.
>
> And of course if you don't have FUA support, but you do have the
> barrier command, then this could also get translated to write the
> journal blocks, CACHE BARRIER, write the commit block, FLUSH CACHE.
>
> All of these scenarios should work just fine.
>
> Hope this helps,
>
> - Ted
Thanks so much !!
This was really helpful!
--
Thanks
Nikhilesh Reddy
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum,
a Linux Foundation Collaborative Project.
next prev parent reply other threads:[~2015-09-28 22:28 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-15 23:17 Using Cache barriers in lieu of REQ_FLUSH | REQ_FUA for emmc 5.1 (jdec spec JESD84-B51) Nikhilesh Reddy
2015-09-20 3:42 ` Theodore Ts'o
2015-09-28 22:28 ` Nikhilesh Reddy [this message]
2015-10-23 6:33 ` Running XFS tests on qemu Nikhilesh Reddy
2015-10-23 9:34 ` Theodore Ts'o
2015-10-27 17:55 ` Nikhilesh Reddy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5609BF00.5000502@codeaurora.org \
--to=reddyn@codeaurora.org \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).