public inbox for linux-man@vger.kernel.org
* [PATCH] open.2: improve description of O_DIRECT
@ 2008-01-23  6:36 Greg Banks
From: Greg Banks @ 2008-01-23  6:36 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

Against man-pages-2.76.  Update the description of the O_DIRECT flag in open(2):

 - to document the behaviour of O_DIRECT with NFS,

 - to be clearer about the O_DIRECT alignment restriction
   mess in Linux, and

 - to recommend that application writers exercise caution.

The information comes from reading the NFS and XFS source and from talking to XFS folks.

Signed-off-by: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Reviewed-by: David Chinner <dgc-sJ/iWh9BUns@public.gmane.org>
Reviewed-by: Jeremy Higdon <jeremy-sJ/iWh9BUns@public.gmane.org>
References: SGI:PV975946
---
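Not part of the patch: a rough sketch of what "treat O_DIRECT as a
performance option" and the alignment advice in the new NOTES text might
look like in an application.  The helper names (open_prefer_direct,
dio_round_up, dio_alloc) and the 4096-byte DIO_ALIGN value are made up for
illustration; 4096 is only a guess that happens to satisfy many
filesystems, since there is no filesystem-independent way to query the
real restriction.

```c
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this set */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* ASSUMPTION: an illustrative alignment, not a value obtained from the kernel. */
#define DIO_ALIGN 4096

/*
 * Hypothetical helper: try O_DIRECT first and fall back to ordinary
 * buffered I/O if the filesystem rejects the flag with EINVAL.
 * *direct is set to 1 only if the returned descriptor uses O_DIRECT.
 */
int open_prefer_direct(const char *path, int flags, mode_t mode, int *direct)
{
    int fd = open(path, flags | O_DIRECT, mode);

    *direct = (fd >= 0);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, mode);   /* same flags, without O_DIRECT */
    return fd;
}

/* Hypothetical helper: round a transfer length up to a multiple of DIO_ALIGN. */
size_t dio_round_up(size_t len)
{
    return (len + DIO_ALIGN - 1) & ~(size_t)(DIO_ALIGN - 1);
}

/*
 * Hypothetical helper: allocate a zeroed buffer whose address and size
 * are both multiples of DIO_ALIGN.  Returns NULL on failure.
 */
void *dio_alloc(size_t len)
{
    void *buf = NULL;
    size_t size = dio_round_up(len);

    /* posix_memalign() requires a power-of-two alignment. */
    assert((DIO_ALIGN & (DIO_ALIGN - 1)) == 0);
    if (posix_memalign(&buf, DIO_ALIGN, size) != 0)
        return NULL;
    memset(buf, 0, size);
    return buf;
}
```

A caller would open with open_prefer_direct(), allocate with dio_alloc(),
and issue reads and writes whose length is dio_round_up(len) at file
offsets that are also multiples of DIO_ALIGN.  On XFS the real values
could instead be taken from XFS_IOC_DIOINFO, which this sketch does not
attempt.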

 man2/open.2 |   81 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 63 insertions(+), 18 deletions(-)

Index: man-pages-2.76/man2/open.2
===================================================================
--- man-pages-2.76.orig/man2/open.2	2008-01-18 13:04:11.523554019 +1100
+++ man-pages-2.76/man2/open.2	2008-01-21 20:31:52.206769981 +1100
@@ -197,14 +197,9 @@ The I/O is synchronous, that is, at the 
 .BR read (2)
 or
 .BR write (2),
-data is guaranteed to have been transferred.
-Under Linux 2.4 transfer sizes, and the alignment of user buffer
-and file offset must all be multiples of the logical block size
-of the file system.
-Under Linux 2.6 alignment to 512-byte boundaries
-suffices.
-.\" Alignment should satisfy requirements for the underlying device
-.\" There may be coherency problems.
+data is guaranteed to have been transferred.  See
+.B NOTES
+below for further discussion.
 .sp
 A semantically similar (but deprecated) interface for block devices
 is described in
@@ -587,17 +582,67 @@ On many systems the file is actually tru
 .LP
 The
 .B O_DIRECT
-flag was introduced in SGI IRIX, where it has alignment restrictions
-similar to those of Linux 2.4.
-IRIX has also a fcntl(2) call to
-query appropriate alignments, and sizes.
-FreeBSD 4.x introduced
-a flag of same name, but without alignment restrictions.
-Support was added under Linux in kernel version 2.4.10.
+flag may impose alignment restrictions on the length and address
+of userspace buffers and the file offset of I/Os.  In Linux, alignment
+restrictions vary by filesystem and kernel version and might be
+absent entirely.  However, there is currently no filesystem\-independent
+interface for an application to discover these restrictions for a given
+file or filesystem.  Some filesystems provide their own interfaces
+for doing so, for example the
+.B XFS_IOC_DIOINFO
+operation in
+.BR xfsctl (3).
+.LP
+Under Linux 2.4, transfer sizes, and the alignment of user buffer
+and file offset must all be multiples of the logical block size
+of the file system.  Under Linux 2.6, alignment to 512-byte boundaries
+suffices.  The flag was introduced in SGI IRIX, where it has alignment
+restrictions similar to those of Linux 2.4.  IRIX has also a fcntl(2)
+call to query appropriate alignments, and sizes.  FreeBSD 4.x introduced
+a flag of the same name, but without alignment restrictions.
+.LP
+.B O_DIRECT
+support was added under Linux in kernel version 2.4.10.
 Older Linux kernels simply ignore this flag.
-One may have to define the
-.B _GNU_SOURCE
-macro to get its definition.
+Some filesystems may not implement the flag, in which case
+.BR open ()
+fails with EINVAL if the flag is used.
+.LP
+Applications should avoid mixing
+.B O_DIRECT
+and normal I/O to the same
+file, and especially to overlapping byte regions in the same file.
+Even when the filesystem correctly handles the coherency issues in
+this situation, overall I/O throughput is likely to be slower than
+using either mode alone.  Likewise, applications should avoid mixing
+.BR mmap (2)
+of files with direct I/O to the same files.
+.LP
+The behaviour of
+.B O_DIRECT
+with NFS will differ from local filesystems.  Older kernels, or
+kernels configured in certain ways, may not support this combination.
+The NFS protocol does not support passing the flag to the server, so
+.B O_DIRECT
+I/O will only bypass the page cache on the client; the server may
+still cache the I/O.  The client asks the server to make the I/O
+synchronous to preserve the synchronous semantics of
+.BR O_DIRECT .
+Some servers will perform poorly under these circumstances, especially
+if the I/O size is small.  Some servers may also be configured to
+lie to clients about the I/O having reached stable storage; this
+will avoid the performance penalty at some risk to data integrity
+in the event of server power failure.  The Linux NFS client places
+no alignment restrictions on
+.B O_DIRECT
+I/O.
+.PP
+In summary,
+.B O_DIRECT
+is a potentially powerful tool that should be used with caution.  It
+is recommended that applications treat use of
+.B O_DIRECT
+as a performance option which is disabled by default.
 .PP
 There are many infelicities in the protocol underlying NFS, affecting
 amongst others

-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

Thread overview: 2+ messages
2008-01-23  6:36 [PATCH] open.2: improve description of O_DIRECT Greg Banks
2008-02-11 16:29   ` Michael Kerrisk
