From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: adding proper O_SYNC/O_DSYNC, was Re: O_DIRECT and barriers
Date: Fri, 28 Aug 2009 17:44:32 +0100
Message-ID: <20090828164432.GA8036@shareable.org>
In-Reply-To: <20090828154647.GA15808@infradead.org>

Christoph Hellwig wrote:
> On Thu, Aug 27, 2009 at 10:24:28AM -0700, Ulrich Drepper wrote:
> > The problem with O_* extensions is that the syscall doesn't fail if the
> > flag is not handled. This is a problem in the open implementation which
> > can only be fixed with a new syscall.
> >
> > Why can't we just go on and say we interpret O_SYNC like O_SYNC and
> > O_SYNC|O_DSYNC like O_DSYNC? The POSIX spec explicitly requires that
> > the latter be handled like O_SYNC.
> >
> > We could handle it by allocating two bits, only one is handled in the
> > kernel. If the O_DSYNC definition for userlevel would be different from
> > the kernel definition then the kernel could interpret O_SYNC|O_DSYNC
> > like O_DSYNC. The libc would then have to translate the userlevel
> > O_DSYNC into the kernel O_DSYNC. If the libc is too old for the kernel
> > and the application, the userlevel flag would be passed to the kernel
> > and nothing bad happens.
>
> What about the following variant:
>
>  - given that our current O_SYNC really is and always has been actually
>    POSIX O_DSYNC, keep the numerical value and rename it to O_DSYNC in
>    the headers.
>  - Add a new O_SYNC definition:
>
>	#define O_SYNC		(O_DSYNC|O_REALLY_SYNC)
>
>    and do full O_SYNC handling in new kernels if O_REALLY_SYNC is
>    present.

That looks good for the kernel.

However, for userspace, there's an issue with applications which were
compiled with an old libc and used O_SYNC. Most of them probably
expected O_SYNC behaviour, but all they got was O_DSYNC, because Linux
didn't do it right.

But they *didn't know* that.

When using a newer kernel which actually implements O_SYNC behaviour,
I'm thinking those applications which asked for O_SYNC should get it,
even though they're still linked with an old libc.

That's because this thread is the first time I've heard that Linux
O_SYNC was really the weaker O_DSYNC in disguise, and judging from the
many Googlings I've done about O_SYNC in applications and on different
OSes, it'll be news to other people too.

(I always thought the "#define O_DSYNC O_SYNC" was because Linux
didn't implement the weaker O_DSYNC.)

(Oh, and Ulrich: Why is there a "#define O_RSYNC O_SYNC" in the glibc
headers? That doesn't make sense: O_RSYNC has nothing to do with
writing.)

To achieve that, libc could implement two versions of open() at the
same time as it updates header files. The new libc's __old_open() would
do:

    /* Only O_DSYNC is set for apps built against old libc which
       were compiled with O_SYNC. */
    if (flags & O_DSYNC)
        flags |= O_SYNC;
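
A minimal sketch of that fix-up as a standalone function, assuming the
composite O_SYNC definition Christoph proposed above. The bit values
and the names DEMO_O_DSYNC, DEMO_O_REALLY_SYNC, DEMO_O_SYNC and
old_open_fixup_flags() are hypothetical, picked only for illustration;
they are not the real kernel or glibc definitions.

```c
/* Hypothetical bit values, for illustration only; the real
 * definitions live in the kernel and libc headers. */
#define DEMO_O_DSYNC        0x001000  /* stands in for the historical O_SYNC value */
#define DEMO_O_REALLY_SYNC  0x100000  /* assumed new bit requesting full O_SYNC */
#define DEMO_O_SYNC         (DEMO_O_DSYNC | DEMO_O_REALLY_SYNC)

/*
 * Flag fix-up that an __old_open() compatibility entry point might
 * apply before calling the kernel.  A binary built against the old
 * headers can only ever set the DEMO_O_DSYNC bit, so that bit is
 * promoted to the full composite DEMO_O_SYNC.
 */
int old_open_fixup_flags(int flags)
{
	if (flags & DEMO_O_DSYNC)
		flags |= DEMO_O_SYNC;
	return flags;
}
```

With these assumed values, old_open_fixup_flags(DEMO_O_DSYNC) yields
DEMO_O_SYNC, and flags without the sync bit pass through unchanged.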

I'm not exactly sure how symbol versioning works, but perhaps the
header file in the new libc would need __REDIRECT_NTH to map open() to
__new_open(), which just calls the kernel. This is to ensure .o and
.a files built with an old libc's headers but then linked to a new
libc will get __old_open().

Although libc's __new_open() could have this:

    /* Old kernels only look at O_DSYNC. It's better than nothing. */
    if (flags & O_SYNC)
        flags |= O_DSYNC;

Imho, it's better to not do that, and instead have

    #define O_SYNC (O_DSYNC|__O_SYNC_KERNEL)

as Chris suggests, in the libc header the same as the kernel header,
because that way applications which use the syscall() function or have
to invoke a syscall directly (I've seen clone-using code doing it)
won't spontaneously start losing their O_SYNCness on older kernels.

Unless there is some reason why "flags &= ~O_SYNC" is not permitted to
clear the O_DSYNC flag, or some other reason why they must be separate
flags.
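
To make the two consequences of a composite O_SYNC concrete, here is a
rough sketch. The bit values and every DEMO_* and function name are
hypothetical, and the "old kernel" is only modelled as code that acts
on the one bit it knows about and silently ignores the rest.

```c
/* Hypothetical bit values, for illustration only. */
#define DEMO_O_DSYNC        0x001000  /* the only sync bit an old kernel knows */
#define DEMO_O_REALLY_SYNC  0x100000  /* assumed new bit, unknown to old kernels */
#define DEMO_O_SYNC         (DEMO_O_DSYNC | DEMO_O_REALLY_SYNC)

/* Model of an old kernel: unknown flag bits are ignored, not rejected. */
int old_kernel_sync_bits(int flags)
{
	return flags & DEMO_O_DSYNC;
}

/* Clearing the composite O_SYNC clears the O_DSYNC bit as well. */
int clear_sync(int flags)
{
	return flags & ~DEMO_O_SYNC;
}

/* Testing for full O_SYNC must compare against the whole mask;
 * a plain "flags & DEMO_O_SYNC" is also true for O_DSYNC alone. */
int wants_full_sync(int flags)
{
	return (flags & DEMO_O_SYNC) == DEMO_O_SYNC;
}
```

Under these assumptions, a raw syscall passing DEMO_O_SYNC to the
modelled old kernel still gets O_DSYNC behaviour, whereas passing
DEMO_O_REALLY_SYNC on its own would get no syncing at all. The price is
that clear_sync() drops O_DSYNC too, which is the open question above.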

-- Jamie