From: Rik van Riel <riel@redhat.com>
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Cc: Ray Strode <rstrode@redhat.com>, elb@psg.com
Subject: RFC: O_PONIES semantics (well O_REWRITE)
Date: Wed, 10 Jun 2009 21:03:25 -0400
Message-ID: <4A3057DD.1050703@redhat.com>
The ext4 automatic-fsync-on-rename discussion has shown that
many applications simply Do It Wrong when it comes to rewriting
configuration files.
Some of the common failures are:
- program overwrites the old config file in place
- program writes a new file, but forgets to fsync before rename
- program writes the new file in /tmp, so the rename fails on
  systems where /tmp is a different filesystem
- program writes a new file and fsyncs, but forgets to give the
  new file the same ownership, permissions and/or extended
  attributes as the old file
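For comparison, here is a minimal userspace sketch of a rewrite that avoids the failures listed above. It is an illustration only, not part of the proposal: the function name replace_file and the ".rewrite-" prefix are made up for this example, ownership is copied on a best-effort basis, and extended attributes are not copied at all.

```python
import os
import tempfile


def replace_file(path, data):
    """Replace `path` with `data` (bytes) via write, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target rather than in /tmp,
    # so the rename never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            try:
                old = os.stat(path)
            except FileNotFoundError:
                old = None
            if old is not None:
                # Best effort: an unprivileged process usually cannot
                # give a file away to another user.
                try:
                    os.fchown(tmp.fileno(), old.st_uid, old.st_gid)
                except PermissionError:
                    pass
                # chmod after chown, since chown may clear setuid/setgid.
                os.fchmod(tmp.fileno(), old.st_mode & 0o7777)
            tmp.write(data)
            tmp.flush()
            # Push the new contents to disk before the rename.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as replace_file("/etc/example.conf", b"key=value\n"), where the path is hypothetical. The amount of code needed for one config file is a fair measure of why applications tend to skip steps.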
Magically taking care of filesystem semantics for every use may
not be possible (no O_PONIES for you!), but I believe we can
help the applications that just want to completely rewrite a
file and atomically replace it.
The semantics for O_REWRITE would be:
1) When opening a file with O_REWRITE, the file handle points at
   a freshly allocated, empty file. The original file is
   still available to programs that open the file without
   O_REWRITE.
2) O_REWRITE can only be used in conjunction with O_WRONLY,
   because the file descriptor is not associated with the
   original file (which has data), but with an empty inode.
3) The code that implements O_REWRITE (kernel? glibc?)
   makes sure that:
   - the new file is on the same filesystem as the original file
   - the new file is not linked (so it is automatically freed
     after a process or system crash)
   - the new file's ownership, permissions and extended attributes
     match those of the original file
4) The application that opens a file with O_REWRITE is required
   to rewrite the entire file.
5) On close(), the code that implements O_REWRITE makes sure that
   the new file is atomically renamed over the original, so that if
   a system crash happens, the user will see either the old or the
   new file contents, but never an empty file.
6) After close(), processes that open the file will get the new
   content. Processes that previously opened the file will hold
   on to the old inode and get old contents.
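The visibility rules in points 1 and 6 can be observed today with a plain rename. The sketch below is only an approximation of the proposed behaviour: demo_rewrite_visibility and the file name app.conf are invented for the example, and the new file is linked under a temporary name rather than being unlinked as point 3 asks.

```python
import os
import tempfile


def demo_rewrite_visibility(directory):
    """Show which contents early and late readers see around a rename."""
    path = os.path.join(directory, "app.conf")
    with open(path, "w") as f:
        f.write("old\n")

    # A reader that opens the file before the rewrite starts.
    early_reader = open(path)
    old_inode = os.fstat(early_reader.fileno()).st_ino

    # The writer fills a separate, new file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as new_file:
        new_file.write("new\n")
        new_file.flush()
        os.fsync(new_file.fileno())

    # Point 1: until the rename, opening the path still gives the original.
    with open(path) as f:
        before_rename = f.read()

    # Stand-in for close() on an O_REWRITE descriptor (point 5).
    os.replace(tmp_path, path)

    # Point 6: new opens see the new inode, the early reader keeps the old.
    with open(path) as late_reader:
        late_contents = late_reader.read()
        new_inode = os.fstat(late_reader.fileno()).st_ino
    early_contents = early_reader.read()
    early_reader.close()

    return before_rename, early_contents, late_contents, old_inode != new_inode
```

Run against an empty scratch directory, the first two results are the old contents, the third is the new contents, and the last is True because the path now names a different inode.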
Here are my questions:
- Are these semantics useful for programs that want to replace
  config (or other) files with new content?
- Are these semantics sane?
- What would be the best place to implement these semantics?
  Relying on application developers to get it right does not
  seem to have worked out well, so I'm thinking kernel or glibc.
  Glibc has the advantage of not being in the kernel, but
  implementing it in-kernel might give us the opportunity for
  performance enhancements, like reducing step (5) to merely
  enforcing ordering between filesystem operations, instead
  of requiring an fsync.
--
All rights reversed.