From: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
To: Michael Kerrisk <mtk-manpages-hi6Y0CQ0nG0@public.gmane.org>
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH] open.2: improve description of O_DIRECT
Date: Wed, 23 Jan 2008 17:36:14 +1100
Message-ID: <4796E05E.1000508@melbourne.sgi.com>
Against man-pages-2.76. Update the description of the O_DIRECT flag in open(2)
- to document the behaviour of O_DIRECT with NFS,
- to be clearer about the O_DIRECT alignment restriction
  mess in Linux, and
- to recommend that application writers exercise caution.
The information comes from reading the NFS and XFS source and from talking to XFS folks.
Signed-off-by: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Reviewed-by: David Chinner <dgc-sJ/iWh9BUns@public.gmane.org>
Reviewed-by: Jeremy Higdon <jeremy-sJ/iWh9BUns@public.gmane.org>
References: SGI:PV975946
---
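For illustration only (not part of the patch): a minimal C sketch of the
advice in the new NOTES text, namely to keep O_DIRECT off by default, to
fall back to buffered I/O when open(2) fails with EINVAL, and to align the
buffer, length and file offset. The helper names open_maybe_direct() and
write_aligned() are hypothetical, and DIO_ALIGN (4096) is an assumed value;
as the text below says, there is no filesystem-independent way to query
the real requirement.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment: a guess, not a value reported by the kernel.
 * 4096 is a multiple of 512 and of many common block sizes. */
#define DIO_ALIGN 4096

/* Hypothetical helper: open path for writing, trying O_DIRECT only when
 * the caller asks for it. If the filesystem rejects the flag with EINVAL,
 * retry with ordinary buffered I/O. *is_direct reports which mode was
 * obtained. Returns a file descriptor, or -1 with errno set. */
int open_maybe_direct(const char *path, int want_direct, int *is_direct)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    int fd;

    *is_direct = 0;
    if (want_direct) {
        fd = open(path, flags | O_DIRECT, 0600);
        if (fd >= 0) {
            *is_direct = 1;
            return fd;
        }
        if (errno != EINVAL)
            return -1;          /* a real error, not "flag unsupported" */
    }
    return open(path, flags, 0600);
}

/* Hypothetical helper: write len bytes of the value fill at file offset 0
 * from a buffer aligned to DIO_ALIGN. len must be a non-zero multiple of
 * DIO_ALIGN so that address, length and offset are all aligned.
 * Returns the pwrite(2) result, or -1 if the buffer cannot be allocated. */
ssize_t write_aligned(int fd, size_t len, int fill)
{
    void *buf = NULL;
    ssize_t n;

    assert(len > 0 && len % DIO_ALIGN == 0);
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return -1;
    memset(buf, fill, len);
    n = pwrite(fd, buf, len, 0);
    free(buf);
    return n;
}
```

A caller would pass want_direct = 0 unless the user has explicitly enabled
direct I/O, and should still be prepared for the write itself to fail with
EINVAL on a filesystem whose alignment requirement is stricter than the
assumed DIO_ALIGN.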
man2/open.2 | 81 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 63 insertions(+), 18 deletions(-)
Index: man-pages-2.76/man2/open.2
===================================================================
--- man-pages-2.76.orig/man2/open.2 2008-01-18 13:04:11.523554019 +1100
+++ man-pages-2.76/man2/open.2 2008-01-21 20:31:52.206769981 +1100
@@ -197,14 +197,9 @@ The I/O is synchronous, that is, at the
.BR read (2)
or
.BR write (2),
-data is guaranteed to have been transferred.
-Under Linux 2.4 transfer sizes, and the alignment of user buffer
-and file offset must all be multiples of the logical block size
-of the file system.
-Under Linux 2.6 alignment to 512-byte boundaries
-suffices.
-.\" Alignment should satisfy requirements for the underlying device
-.\" There may be coherency problems.
+data is guaranteed to have been transferred. See
+.B NOTES
+below for further discussion.
.sp
A semantically similar (but deprecated) interface for block devices
is described in
@@ -587,17 +582,67 @@ On many systems the file is actually tru
.LP
The
.B O_DIRECT
-flag was introduced in SGI IRIX, where it has alignment restrictions
-similar to those of Linux 2.4.
-IRIX has also a fcntl(2) call to
-query appropriate alignments, and sizes.
-FreeBSD 4.x introduced
-a flag of same name, but without alignment restrictions.
-Support was added under Linux in kernel version 2.4.10.
+flag may impose alignment restrictions on the length and address
+of userspace buffers and the file offset of I/Os. In Linux, alignment
+restrictions vary by filesystem and kernel version and might be
+absent entirely. However, there is currently no filesystem\-independent
+interface for an application to discover these restrictions for a given
+file or filesystem. Some filesystems provide their own interfaces
+for doing so, for example the
+.B XFS_IOC_DIOINFO
+operation in
+.BR xfsctl (3).
+.LP
+Under Linux 2.4, transfer sizes and the alignment of the user buffer
+and file offset must all be multiples of the logical block size
+of the file system. Under Linux 2.6, alignment to 512-byte boundaries
+suffices. The flag was introduced in SGI IRIX, where it has alignment
+restrictions similar to those of Linux 2.4. IRIX also has an fcntl(2)
+call to query appropriate alignments and sizes. FreeBSD 4.x introduced
+a flag of the same name, but without alignment restrictions.
+.LP
+.B O_DIRECT
+support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag.
-One may have to define the
-.B _GNU_SOURCE
-macro to get its definition.
+Some filesystems may not implement the flag and
+.B open
+will fail with EINVAL if it is used.
+.LP
+Applications should avoid mixing
+.B O_DIRECT
+and normal I/O to the same
+file, and especially to overlapping byte regions in the same file.
+Even when the filesystem correctly handles the coherency issues in
+this situation, overall I/O throughput is likely to be slower than
+using either mode alone. Likewise, applications should avoid mixing
+.BR mmap (2)
+of files with direct I/O to the same files.
+.LP
+The behaviour of
+.B O_DIRECT
+with NFS will differ from that on local filesystems. Older kernels, or
+kernels configured in certain ways, may not support this combination.
+The NFS protocol does not support passing the flag to the server, so
+.B O_DIRECT
+I/O will only bypass the page cache on the client; the server may
+still cache the I/O. The client asks the server to make the I/O
+synchronous to preserve the synchronous semantics of
+.BR O_DIRECT .
+Some servers will perform poorly under these circumstances, especially
+if the I/O size is small. Some servers may also be configured to
+lie to clients about the I/O having reached stable storage; this
+will avoid the performance penalty at some risk to data integrity
+in the event of server power failure. The Linux NFS client places
+no alignment restrictions on
+.B O_DIRECT
+I/O.
+.PP
+In summary,
+.B O_DIRECT
+is a potentially powerful tool that should be used with caution. It
+is recommended that applications treat use of
+.B O_DIRECT
+as a performance option which is disabled by default.
.PP
There are many infelicities in the protocol underlying NFS, affecting
amongst others
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.