linux-fsdevel.vger.kernel.org archive mirror
* RFC: O_PONIES semantics (well O_REWRITE)
@ 2009-06-11  1:03 Rik van Riel
From: Rik van Riel @ 2009-06-11  1:03 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Ray Strode, elb

The ext4 automatic-fsync-on-rename discussion has shown that
many applications simply Do It Wrong when it comes to rewriting
configuration files.

Some of the common failures are:
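- program overwrites the old config file in place, so a crash
   mid-write leaves a truncated or partial file
- program writes a new file, but forgets to fsync before rename
- program writes the new file in /tmp, so the rename fails on
   systems where /tmp is on a different filesystem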
- program writes a new file and fsyncs, but forgets to give the
   new file the same file ownership, permission and/or extended
   attributes as the old file
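
For reference, the sequence these applications are expected to get right today can be sketched in userspace roughly as follows. This is only an illustration, not a proposed implementation: the helper name atomic_replace and the ".rewrite-" prefix are made up for this example, ownership is copied only when the caller is allowed to do so, and extended attributes are not copied at all.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes).

    Hypothetical helper: after a crash the file should hold either
    the old or the new contents, never an empty or partial file.
    """
    path = os.path.abspath(path)
    dirname = os.path.dirname(path)
    old = os.stat(path)

    # Create the temporary file next to the target, not in /tmp, so
    # that rename() never has to cross a filesystem boundary.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as f:
            # Carry over ownership first (chown may clear setuid bits),
            # then the permission bits.
            try:
                os.fchown(f.fileno(), old.st_uid, old.st_gid)
            except PermissionError:
                pass  # unprivileged caller: keep our own uid/gid
            os.fchmod(f.fileno(), old.st_mode & 0o7777)

            f.write(data)
            f.flush()
            # Data must be on disk before the rename makes it visible.
            os.fsync(f.fileno())
        os.rename(tmp, path)
    except BaseException:
        # Do not leave the temporary file behind on failure.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise

    # Persist the directory entry change itself.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Each of the failures listed above corresponds to leaving out one of these steps.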

Magically taking care of filesystem semantics for every use may
not be possible (no O_PONIES for you!), but I believe we can
help the applications that just want to completely rewrite a
file and atomically replace it.

The semantics for O_REWRITE would be:

1) When opening a file with O_REWRITE, the file handle points
    at a freshly allocated, empty file.  The original file is
    still available to programs that open the file without
    O_REWRITE.

2) O_REWRITE can only be used in conjunction with O_WRONLY,
    because the file descriptor is not associated with the
    original file (which has data), but with an empty inode.

3) The code that implements O_REWRITE (kernel?  glibc?)
    makes sure that:
    - the new file is on the same filesystem as the original file
    - the new file is not linked (so it is automatically freed
      after a process or system crash)
    - the new file's ownership, permissions and extended attributes
      match those of the original file

4) The application that opens a file with O_REWRITE is required
    to rewrite the entire file.

5) On close(), the code that implements O_REWRITE makes sure that
    the file is atomically renamed, so that if a system crash happens,
    the user will see either the old or the new file contents, but
    never an empty file.

6) After close(), processes that open the file will get the new
    content.  Processes that previously opened the file will hold
    on to the old inode and get old contents.
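
To make the six points above concrete, here is a rough userspace approximation. It is a sketch under stated assumptions, not the proposed implementation: the class name Rewrite and its methods are invented for illustration, the temporary file is a named file created with mkstemp (so unlike point 3 it is not freed automatically after a crash), and extended attributes are copied on a best-effort basis only.

```python
import os
import tempfile


class Rewrite:
    """Hypothetical userspace stand-in for open(path, O_WRONLY|O_REWRITE)."""

    def __init__(self, path):
        self.path = os.path.abspath(path)
        old = os.stat(self.path)
        # (1), (3): a fresh, empty file on the same filesystem as the
        # original.  Assumption: a named temp file in the same directory.
        self.fd, self.tmp = tempfile.mkstemp(dir=os.path.dirname(self.path))
        # (3): match ownership and permissions of the original.
        try:
            os.fchown(self.fd, old.st_uid, old.st_gid)
        except PermissionError:
            pass  # unprivileged caller: keep our own uid/gid
        os.fchmod(self.fd, old.st_mode & 0o7777)
        # (3): best-effort copy of extended attributes (Linux only).
        if hasattr(os, "listxattr"):
            try:
                names = os.listxattr(self.path)
            except OSError:
                names = []
            for name in names:
                try:
                    os.setxattr(self.fd, name, os.getxattr(self.path, name))
                except OSError:
                    pass  # e.g. attribute not settable by this caller

    def write(self, data):
        # (2), (4): write-only interface; the caller supplies the
        # entire new contents.
        return os.write(self.fd, data)

    def close(self):
        # (5): flush the data, then atomically replace the original.
        os.fsync(self.fd)
        os.close(self.fd)
        os.rename(self.tmp, self.path)

    def abort(self):
        # Drop the new file and leave the original untouched.
        os.close(self.fd)
        os.unlink(self.tmp)
```

Usage would look like this: a reader that opened the file before close() keeps seeing the old inode, as in point 6.

```python
reader = os.open("app.conf", os.O_RDONLY)   # holds the old inode
rw = Rewrite("app.conf")
rw.write(b"key = new\n")
rw.close()                                  # new openers see new data
```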

Here are my questions:

- Are these semantics useful for programs that want to replace
   config (or other) files with new content?

- Are these semantics sane?

- What would be the best place to implement these semantics?

Relying on application developers to get it right does not
seem to have worked out well, so I'm thinking kernel or glibc.
Glibc has the advantage of it not being in the kernel, but
implementing it in-kernel might give us the opportunity for
performance enhancements, like reducing step (5) to merely
enforcing ordering between filesystem operations, instead
of requiring an fsync.

-- 
All rights reversed.

