From: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
To: Michael Kerrisk <mtk-manpages-hi6Y0CQ0nG0@public.gmane.org>
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH] open.2: improve description of O_DIRECT
Date: Wed, 23 Jan 2008 17:36:14 +1100
Message-ID: <4796E05E.1000508@melbourne.sgi.com>
Against man-pages-2.76. Update the description of the O_DIRECT flag to open(2):
- to document the behaviour of O_DIRECT with NFS, and
- to be clearer about the O_DIRECT alignment restriction mess in Linux, and
- to recommend that application writers exercise caution.
Information from reading NFS & XFS source and talking to XFS folks.
Signed-off-by: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Reviewed-by: David Chinner <dgc-sJ/iWh9BUns@public.gmane.org>
Reviewed-by: Jeremy Higdon <jeremy-sJ/iWh9BUns@public.gmane.org>
References: SGI:PV975946
---
man2/open.2 | 81 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 63 insertions(+), 18 deletions(-)
Index: man-pages-2.76/man2/open.2
===================================================================
--- man-pages-2.76.orig/man2/open.2 2008-01-18 13:04:11.523554019 +1100
+++ man-pages-2.76/man2/open.2 2008-01-21 20:31:52.206769981 +1100
@@ -197,14 +197,9 @@ The I/O is synchronous, that is, at the
.BR read (2)
or
.BR write (2),
-data is guaranteed to have been transferred.
-Under Linux 2.4 transfer sizes, and the alignment of user buffer
-and file offset must all be multiples of the logical block size
-of the file system.
-Under Linux 2.6 alignment to 512-byte boundaries
-suffices.
-.\" Alignment should satisfy requirements for the underlying device
-.\" There may be coherency problems.
+data is guaranteed to have been transferred. See
+.B NOTES
+below for further discussion.
.sp
A semantically similar (but deprecated) interface for block devices
is described in
@@ -587,17 +582,67 @@ On many systems the file is actually tru
.LP
The
.B O_DIRECT
-flag was introduced in SGI IRIX, where it has alignment restrictions
-similar to those of Linux 2.4.
-IRIX has also a fcntl(2) call to
-query appropriate alignments, and sizes.
-FreeBSD 4.x introduced
-a flag of same name, but without alignment restrictions.
-Support was added under Linux in kernel version 2.4.10.
+flag may impose alignment restrictions on the length and address
+of userspace buffers and the file offset of I/Os. In Linux, alignment
+restrictions vary by filesystem and kernel version and might be
+absent entirely. However, there is currently no filesystem\-independent
+interface for an application to discover these restrictions for a given
+file or filesystem. Some filesystems provide their own interfaces
+for doing so, for example the
+.B XFS_IOC_DIOINFO
+operation in
+.BR xfsctl (3).
+.LP
+Under Linux 2.4, transfer sizes and the alignment of the user buffer
+and the file offset must all be multiples of the logical block size
+of the filesystem. Under Linux 2.6, alignment to 512-byte boundaries
+suffices. The flag was introduced in SGI IRIX, where it has alignment
+restrictions similar to those of Linux 2.4. IRIX also has an fcntl(2)
+call to query appropriate alignments and sizes. FreeBSD 4.x introduced
+a flag of the same name, but without alignment restrictions.
+.LP
+.B O_DIRECT
+support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag.
-One may have to define the
-.B _GNU_SOURCE
-macro to get its definition.
+Some filesystems may not implement the flag, in which case
+.BR open ()
+fails with the error EINVAL if it is used.
+.LP
+Applications should avoid mixing
+.B O_DIRECT
+and normal I/O to the same
+file, and especially to overlapping byte regions in the same file.
+Even when the filesystem correctly handles the coherency issues in
+this situation, overall I/O throughput is likely to be slower than
+using either mode alone. Likewise, applications should avoid mixing
+.BR mmap (2)
+of files with direct I/O to the same files.
+.LP
+The behaviour of
+.B O_DIRECT
+with NFS differs from that on local filesystems. Older kernels, or
+kernels configured in certain ways, may not support this combination.
+The NFS protocol does not support passing the flag to the server, so
+.B O_DIRECT
+I/O will only bypass the page cache on the client; the server may
+still cache the I/O. The client asks the server to make the I/O
+synchronous to preserve the synchronous semantics of
+.BR O_DIRECT .
+Some servers will perform poorly under these circumstances, especially
+if the I/O size is small. Some servers may also be configured to
+lie to clients about the I/O having reached stable storage; this
+will avoid the performance penalty at some risk to data integrity
+in the event of server power failure. The Linux NFS client places
+no alignment restrictions on
+.B O_DIRECT
+I/O.
+.PP
+In summary,
+.B O_DIRECT
+is a potentially powerful tool that should be used with caution. It
+is recommended that applications treat use of
+.B O_DIRECT
+as a performance option which is disabled by default.
.PP
There are many infelicities in the protocol underlying NFS, affecting
amongst others
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.