From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Set not only O_DIRECT but also O_DSYNC to BDEV_AIO
Date: Tue, 14 Nov 2017 18:01:08 +0000
Message-ID: <1510682467.2168.26.camel@intel.com>
In-Reply-To: KAWPR01MB1282CEE4464EC5353E0CD61AA2280@KAWPR01MB1282.jpnprd01.prod.outlook.com
On Tue, 2017-11-14 at 07:41 +0000, 松本周平 / MATSUMOTO,SHUUHEI wrote:
> Hi,
>
> Current BDEV AIO uses O_DIRECT to avoid IO cache effects but does not use
> O_DSYNC.
> O_DSYNC assures that IO is written to persistent storage.
>
> Regarding the difference in the I/O command sequence:
> for a SCSI disk, O_DSYNC issues an extra I/O command, which may affect I/O
> performance, but for an NVMe SSD, O_DSYNC issues no extra I/O command.
> Hence I suspect this is why there is no difference in performance.
>
O_DIRECT bypasses the operating system's caches in system memory. O_DSYNC
bypasses the volatile caches in the device itself: it sets the FUA (Force Unit
Access) bit on each I/O and, on devices that report a volatile write cache,
additionally issues a SCSI synchronize cache command after each I/O. That is why
you see an extra command for SCSI devices: they report that they have a
volatile write cache, which is very common for SAS/SATA HDDs.

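As a rough illustration of the two flag choices discussed here, the following is a minimal sketch in plain POSIX C. It is not the SPDK aio module's code; the helper name `aio_open_flags` is hypothetical and only builds the open(2) flag set for each case.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdbool.h>

/* Fallback so the sketch still compiles where O_DIRECT is not exposed. */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

/*
 * Hypothetical helper (not part of SPDK): build open(2) flags for a block
 * device or file.
 *   O_DIRECT -> bypass the operating system's caches in system memory
 *   O_DSYNC  -> each write completes only once the data is on stable storage
 */
int aio_open_flags(bool want_dsync)
{
    int flags = O_RDWR | O_DIRECT;

    if (want_dsync) {
        flags |= O_DSYNC;
    }
    return flags;
}
```

The returned value would be passed to open(2). Note that I/O on an O_DIRECT descriptor generally needs suitably aligned buffers, offsets, and lengths.
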
The SPDK API provides the user with a mechanism to query whether a block device
has a volatile write cache (spdk_bdev_has_write_cache) and an API to instruct
the device to make data in its volatile caches persistent (spdk_bdev_flush). The
existence of volatile write caches and the semantics around flushing are well
established in traditional block stacks and provide significant performance
benefits to some types of devices (particularly lower-end, consumer-grade
devices), so we've chosen to provide those same semantics in SPDK. Altering the
SPDK bdev aio module to always specify O_DSYNC would greatly reduce performance
on these types of devices, so I think using just O_DIRECT, without O_DSYNC, is
the correct choice of flags. This gives users the traditional block device
flushing semantics that they expect.

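The "write without O_DSYNC, flush explicitly when durability is needed" pattern can be sketched with plain POSIX calls. This is only an analogy to the spdk_bdev_flush semantics, not SPDK code: the helper `write_batch_then_flush` is hypothetical, fdatasync(2) stands in for the explicit flush, and O_DIRECT is left out so that no buffer alignment is required.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue `count` writes of `len` bytes each WITHOUT
 * O_DSYNC, then make the whole batch durable with one explicit flush.
 * Returns the total number of bytes written, or -1 on any error.
 */
long write_batch_then_flush(const char *path, const void *buf,
                            size_t len, int count)
{
    long total = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0) {
        return -1;
    }

    for (int i = 0; i < count; i++) {
        /* These writes may sit in a volatile cache until the flush below. */
        ssize_t n = pwrite(fd, buf, len, (off_t)i * (off_t)len);
        if (n != (ssize_t)len) {
            close(fd);
            return -1;
        }
        total += n;
    }

    /* One flush for the whole batch instead of a sync per write. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0) {
        return -1;
    }
    return total;
}
```

With O_DSYNC, every write in the loop would carry the cost of the flush. Here the caller pays it once, at the point where persistence actually matters.
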
Note that the Intel P3700 does not have a volatile write cache, so sending flush
requests does nothing (it has a write cache, it just isn't volatile). Jim was
asking about preconditioning in another branch of this thread because neither
of us expects to see any performance difference on the Intel P3700 when the
O_DSYNC flag is added, regardless of workload. If there is a difference even
after preconditioning, then it certainly warrants investigation.

Thanks Shuhei!
Ben