From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Set not only O_DIRECT but also O_DSYNC to BDEV_AIO
Date: Thu, 16 Nov 2017 17:59:47 +0000
Message-ID: <1510855185.2692.15.camel@intel.com>
In-Reply-To: FB0B9576-F099-447A-A86C-848CFC2507B3@intel.com
On Thu, 2017-11-16 at 15:33 +0000, Harris, James R wrote:
> It still seems like O_DSYNC should be a nop on NVMe SSDs that do not have a
> volatile write cache. Certainly, adding O_DSYNC should not improve
> performance as Cunyin showed in his latest data. I suspect some of these
> differences must still be related to preconditioning.
I agree - there has to be something else going on here. Sending flushes after
every I/O should make the drive slower (or have no impact), not faster.
> On 11/15/17, 5:39 PM, "SPDK on behalf of 松本周平 / MATSUMOTO,SHUUHEI"
> <spdk-bounces(a)lists.01.org on behalf of shuhei.matsumoto.xt(a)hitachi.com> wrote:
> I want to add our team's experience with AIO when only O_DIRECT is set.
> We have observed that
> - if only O_DIRECT is set, every I/O succeeded even after the HDD was hot removed.
> - if both O_DIRECT and O_DSYNC are set, every I/O failed once the HDD was hot removed.
Is this experience based on experiments with HDDs? I assume those HDDs have a
volatile write cache, since most do. Another possible explanation is that AIO
reported the I/O complete after the write to the HDD completed successfully,
and the drive was then hot removed before a flush was sent. In that case, the
data would be lost even though the write I/O completed successfully. This would
also explain why O_DIRECT plus O_DSYNC fixes the problem - it first sends the
write, which succeeds, but then immediately tries to send a flush, which fails
because the device was hot removed between the two commands.
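To make the two-step sequence above concrete, here is a minimal sketch in plain POSIX C (not SPDK's bdev_aio code) that issues the write and the flush as separate calls and reports which one failed. The names `durable_write` and `dw_status` are hypothetical, and using `fdatasync()` to stand in for the flush that O_DSYNC would trigger is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical status codes: which of the two steps failed. */
enum dw_status {
    DW_OK = 0,
    DW_WRITE_FAILED = 1,
    DW_FLUSH_FAILED = 2
};

/*
 * Write a buffer, then explicitly ask the kernel to flush it to stable media.
 * The caller is assumed to have opened fd itself, for example with
 * open(path, O_RDWR | O_DIRECT), and to pass a suitably aligned buffer when
 * O_DIRECT is in use.
 */
static enum dw_status
durable_write(int fd, const void *buf, size_t len, off_t offset)
{
    ssize_t n;

    assert(buf != NULL);

    /* Step 1: the write. With a volatile write cache on the device this can
     * complete successfully before the data is actually durable. */
    n = pwrite(fd, buf, len, offset);
    if (n < 0 || (size_t)n != len) {
        return DW_WRITE_FAILED;
    }

    /* Step 2: the flush. If the device disappeared after step 1, this is
     * the call that would be expected to report the error. */
    if (fdatasync(fd) != 0) {
        return DW_FLUSH_FAILED;
    }

    return DW_OK;
}
```

With only O_DIRECT and no flush, a caller sees just the result of step 1. A return of `DW_FLUSH_FAILED` here would correspond to the case where the write succeeded but the device was removed before the flush.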
I really would like to avoid adding O_DSYNC if we can find a way to make the
code strictly correct in terms of data integrity. That option should have a
large negative impact on performance for devices that have a volatile write
cache (HDDs, consumer SSDs).
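Whether a given device falls into the "volatile write cache" group can be checked on Linux through the block layer's `queue/write_cache` sysfs attribute, which reads `write back` or `write through`. The sketch below is only an illustration of that check, not something proposed in this thread. The helper names are hypothetical, and the attribute reflects the kernel's view of the device, which an administrator can override.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Interpret the text read from /sys/block/<dev>/queue/write_cache.
 * "write back" means the kernel believes the device has a volatile cache. */
static bool
write_cache_is_volatile(const char *sysfs_value)
{
    return strncmp(sysfs_value, "write back", strlen("write back")) == 0;
}

/* Returns 1 if the device reports a volatile write cache, 0 if it reports
 * write-through, and -1 if the attribute could not be read.
 * dev_name is a block device name such as "sda" or "nvme0n1". */
static int
device_has_volatile_cache(const char *dev_name)
{
    char path[256];
    char value[32];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/write_cache", dev_name);

    f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    if (fgets(value, sizeof(value), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    return write_cache_is_volatile(value) ? 1 : 0;
}
```

Running this against the HDDs used in the hot-remove experiment would help confirm the assumption that they have a volatile write cache.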