linux-fsdevel.vger.kernel.org archive mirror
* RFC: O_PONIES semantics (well O_REWRITE)
@ 2009-06-11  1:03 Rik van Riel
  2009-06-11  5:53 ` Andreas Dilger
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Rik van Riel @ 2009-06-11  1:03 UTC
  To: linux-fsdevel; +Cc: Ray Strode, elb

The ext4 automatic-fsync-on-rename discussion has shown that
many applications simply Do It Wrong when it comes to rewriting
configuration files.

Some of the common failures are:
- program overwrites the old config file
- program writes a new file, but forgets to fsync before rename
- program writes the new file in /tmp, so the rename fails on
   some systems
- program writes a new file and fsyncs, but forgets to give the
   new file the same file ownership, permission and/or extended
   attributes as the old file

Magically taking care of filesystem semantics for every use may
not be possible (no O_PONIES for you!), but I believe we can
help the applications that just want to completely rewrite a
file and atomically replace it.

The semantics for O_REWRITE would be:

1) When opening a file O_REWRITE, the file handle points at
    a freshly allocated, empty file.  The original file is
    still available to programs that open the file without
    O_REWRITE.

2) O_REWRITE can only be used in conjunction with O_WRONLY,
    because the file descriptor is not associated with the
    original file (which has data), but with an empty inode.

3) The code that implements O_REWRITE (kernel?  glibc?)
    makes sure that:
    - the new file is on the same filesystem as the original file
    - the new file is not linked (so it is automatically freed
      after a process or system crash)
    - the new file's ownership, permissions and extended attributes
      match that of the original file

4) The application that opens a file O_REWRITE is required
    to rewrite the entire file.

5) On close(), the code that implements O_REWRITE makes sure that
    the file is atomically renamed, so that if a system crash happens,
    the user will see either the old or the new file contents, but
    never an empty file.

6) After close(), processes that open the file will get the new
    content.  Processes that previously opened the file will hold
    on to the old inode and get old contents.
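
For comparison, steps 1 to 6 can be approximated in user space today. The Python sketch below is only an illustration under stated assumptions: the name `rewrite_file` and the `.rewrite-` prefix are made up, extended attributes are not copied, the directory fsync is an addition not discussed above, and unlike item 3 the temporary file has a name, so a crash can leave it behind.

```python
import os
import stat
import tempfile


def rewrite_file(path, data):
    """Replace the contents of `path` with `data` (bytes), all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    old = os.stat(path)  # metadata to carry over to the new file (item 3)

    # Create the new file next to the original, so both are on the same
    # filesystem and the rename cannot fail for crossing a mount (item 3).
    fd, tmp = tempfile.mkstemp(prefix=".rewrite-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as new:
            try:
                os.fchown(new.fileno(), old.st_uid, old.st_gid)
            except PermissionError:
                pass  # an unprivileged caller may not be allowed to chown
            os.fchmod(new.fileno(), stat.S_IMODE(old.st_mode))
            new.write(data)          # the whole file is rewritten (item 4)
            new.flush()
            os.fsync(new.fileno())   # contents reach disk before the rename
        os.rename(tmp, path)         # atomic replacement of the name (item 5)
    except BaseException:
        os.unlink(tmp)               # do not leave the temporary file behind
        raise

    # Assumption: also fsync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as `rewrite_file("/path/to/app.conf", b"key=value\n")`. Readers that opened the file earlier keep the old inode, as in item 6.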

Here are my questions:

- Are these semantics useful for programs that want to replace
   config (or other) files with new content?

- Are these semantics sane?

- What would be the best place to implement these semantics?

Relying on application developers to get it right seems to
not have worked out well, so I'm thinking kernel or glibc.
Glibc has the advantage of it not being in the kernel, but
implementing it in-kernel might give us the opportunity for
performance enhancements, like reducing step (5) to merely
enforcing ordering between filesystem operations, instead
of requiring an fsync.

-- 
All rights reversed.

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11  1:03 RFC: O_PONIES semantics (well O_REWRITE) Rik van Riel
@ 2009-06-11  5:53 ` Andreas Dilger
  2009-06-11 14:06   ` Rik van Riel
  2009-06-11  9:51 ` Artem Bityutskiy
  2009-06-12  2:07 ` Jamie Lokier
  2 siblings, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2009-06-11  5:53 UTC
  To: Rik van Riel; +Cc: linux-fsdevel, Ray Strode, elb

On Jun 10, 2009  21:03 -0400, Rik van Riel wrote:
> The semantics for O_REWRITE would be:
>
> 1) When opening a file O_REWRITE, the file handle points at
>    a freshly allocated, empty file.  The original file is
>    still available to programs that open the file without
>    O_REWRITE.
>
> 2) O_REWRITE can only be used in conjunction with O_WRONLY,
>    because the file descriptor is not associated with the
>    original file (which has data), but with an empty inode.
>
> 3) The code that implements O_REWRITE (kernel?  glibc?)
>    makes sure that:
>    - the new file is on the same filesystem as the original file
>    - the new file is not linked (so it is automatically freed
>      after a process or system crash)
>    - the new file's ownership, permissions and extended attributes
>      match that of the original file
>
> 4) The application that opens a file O_REWRITE is required
>    to rewrite the entire file.

This is all essentially open(O_CREAT|O_TRUNC|O_WRONLY)

> 5) On close(), the code that implements O_REWRITE makes sure that
>    the file is atomically renamed, so that if a system crash happens,
>    the user will see either the old or the new file contents, but
>    never an empty file.

This would be possible if the kernel set the i_size=0, but didn't
send the filesystem the truncate until the file was closed and
being flushed.

> 6) After close(), processes that open the file will get the new
>    content.  Processes that previously opened the file will hold
>    on to the old inode and get old contents.

What is the benefit of (6)?  Of all these semantics this is the
one that would cause the most confusion I think.

> Here are my questions:
>
> - Are these semantics useful for programs that want to replace
>   config (or other) files with new content?
>
> - Are these semantics sane?
>
> - What would be the best place to implement these semantics?

The main question is: would any applications use O_REWRITE at
all, or would it make more sense to have a helper function in
glibc (like e.g. mktemp) that handles the "atomic update of
config file" properly?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11  1:03 RFC: O_PONIES semantics (well O_REWRITE) Rik van Riel
  2009-06-11  5:53 ` Andreas Dilger
@ 2009-06-11  9:51 ` Artem Bityutskiy
  2009-06-12  2:07 ` Jamie Lokier
  2 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2009-06-11  9:51 UTC
  To: Rik van Riel; +Cc: linux-fsdevel, Ray Strode, elb

Hi,

Rik van Riel wrote:
> Here are my questions:
> 
> - Are these semantics useful for programs that want to replace
>   config (or other) files with new content?
> 
> - Are these semantics sane?
> 
> - What would be the best place to implement these semantics?

IMO, people won't use this. And IMO, if we want to fix broken
applications, we should do something transparent to them, i.e.
something similar to what ext4 did.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11  5:53 ` Andreas Dilger
@ 2009-06-11 14:06   ` Rik van Riel
  2009-06-11 14:23     ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Rik van Riel @ 2009-06-11 14:06 UTC
  To: Andreas Dilger; +Cc: linux-fsdevel, Ray Strode, elb

Andreas Dilger wrote:
> On Jun 10, 2009  21:03 -0400, Rik van Riel wrote:
>> The semantics for O_REWRITE would be:
>>
>> 1) When opening a file O_REWRITE, the file handle points at
>>    a freshly allocated, empty file.  The original file is
>>    still available to programs that open the file without
>>    O_REWRITE.
>>
>> 2) O_REWRITE can only be used in conjunction with O_WRONLY,
>>    because the file descriptor is not associated with the
>>    original file (which has data), but with an empty inode.
>>
>> 3) The code that implements O_REWRITE (kernel?  glibc?)
>>    makes sure that:
>>    - the new file is on the same filesystem as the original file
>>    - the new file is not linked (so it is automatically freed
>>      after a process or system crash)
>>    - the new file's ownership, permissions and extended attributes
>>      match that of the original file
>>
>> 4) The application that opens a file O_REWRITE is required
>>    to rewrite the entire file.
> 
> This is all essentially open(O_CREAT|O_TRUNC|O_WRONLY)

With one big difference.

open(~/.myprog/myprog.conf, O_REWRITE|O_WRONLY) does not
truncate the inode which myprog.conf lives in currently,
but it opens a new inode instead.

This means that the old ~/.myprog/myprog.conf still exists,
and will continue to exist until the program closes the
O_REWRITE file handle - at which point the new contents
will be attached to the filename.
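
The difference can be observed from user space. The following Python sketch is illustrative only: it uses a temporary file plus rename() as a stand-in for the proposed O_REWRITE, which does not exist, and the function and file names are made up. It shows what a reader that already has the file open sees in each case.

```python
import os
import tempfile


def _make_config(directory, name):
    """Create a small file containing "old" and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("old\n")
    return path


def reader_view_after_truncate(directory):
    """What an existing reader sees once the file is opened with O_TRUNC."""
    path = _make_config(directory, "truncated.conf")
    with open(path) as reader:  # opened before the rewrite starts
        # Same inode as the reader's; it is emptied immediately.
        fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_WRONLY)
        seen = reader.read()
        os.close(fd)
    return seen


def reader_view_after_rename(directory):
    """What an existing reader sees when a new inode replaces the name."""
    path = _make_config(directory, "renamed.conf")
    with open(path) as reader:  # opened before the rewrite starts
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as new:
            new.write("new\n")
        os.rename(tmp, path)    # the name now points at the new inode
        seen = reader.read()    # the reader still holds the old inode
    return seen
```

With O_TRUNC the reader gets an empty file; with the rename it still gets the old contents, while any later open() of the name gets the new ones.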

-- 
All rights reversed.

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11 14:06   ` Rik van Riel
@ 2009-06-11 14:23     ` Trond Myklebust
  2009-06-11 14:32       ` Ray Strode
  2009-06-17 13:52       ` Rik van Riel
  0 siblings, 2 replies; 10+ messages in thread
From: Trond Myklebust @ 2009-06-11 14:23 UTC
  To: Rik van Riel; +Cc: Andreas Dilger, linux-fsdevel, Ray Strode, elb

On Thu, 2009-06-11 at 10:06 -0400, Rik van Riel wrote:
> Andreas Dilger wrote:
> > On Jun 10, 2009  21:03 -0400, Rik van Riel wrote:
> >> The semantics for O_REWRITE would be:
> >>
> >> 1) When opening a file O_REWRITE, the file handle points at
> >>    a freshly allocated, empty file.  The original file is
> >>    still available to programs that open the file without
> >>    O_REWRITE.
> >>
> >> 2) O_REWRITE can only be used in conjunction with O_WRONLY,
> >>    because the file descriptor is not associated with the
> >>    original file (which has data), but with an empty inode.
> >>
> >> 3) The code that implements O_REWRITE (kernel?  glibc?)
> >>    makes sure that:
> >>    - the new file is on the same filesystem as the original file
> >>    - the new file is not linked (so it is automatically freed
> >>      after a process or system crash)
> >>    - the new file's ownership, permissions and extended attributes
> >>      match that of the original file
> >>
> >> 4) The application that opens a file O_REWRITE is required
> >>    to rewrite the entire file.
> > 
> > This is all essentially open(O_CREAT|O_TRUNC|O_WRONLY)
> 
> With one big difference.
> 
> open(~/.myprog/myprog.conf, O_REWRITE|O_WRONLY) does not
> truncate the inode which myprog.conf lives in currently,
> but it opens a new inode instead.
> 
> This means that the old ~/.myprog/myprog.conf still exists,
> and will continue to exist until the program closes the
> O_REWRITE file handle - at which point the new contents
> will be attached to the filename.

How is this any different than just having your application use
mkostemp() to create a temporary dot file, then renaming it when done
writing?

Trond


* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11 14:23     ` Trond Myklebust
@ 2009-06-11 14:32       ` Ray Strode
  2009-06-17 13:52       ` Rik van Riel
  1 sibling, 0 replies; 10+ messages in thread
From: Ray Strode @ 2009-06-11 14:32 UTC
  To: Trond Myklebust; +Cc: Rik van Riel, Andreas Dilger, linux-fsdevel, elb

Hi,

> How is this any different than just having your application use
> mkostemp() to create a temporary dot file, then renaming it when done
> writing?
Creating a temporary file and renaming is what applications do now, but it has problems that Rik already mentioned:

- Apps forget to fsync() before rename()
- Apps forget to copy permissions over
- Apps forget to copy ACLs, SELinux labels, and other xattrs over
- Apps create the temporary file in /tmp and then the rename() fails on systems where /tmp is a separate mount
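
Two of these pitfalls can be checked for explicitly. The Python sketch below is a rough illustration, not a complete solution: the helper names are made up, the xattr calls are Linux-only, and it assumes that ACLs and SELinux labels are exposed as extended attributes, as they are on Linux.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both paths are on the same mounted filesystem (same st_dev).

    rename() between paths for which this is False fails with EXDEV.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def copy_xattrs(src, dst):
    """Copy extended attributes from src to dst.

    Returns the names of attributes that could not be copied, for example
    because the caller lacks the privilege to set them.
    """
    failed = []
    try:
        names = os.listxattr(src)
    except OSError:
        return failed  # filesystem without xattr support: nothing to copy
    for name in names:
        try:
            os.setxattr(dst, name, os.getxattr(src, name))
        except OSError:
            failed.append(name)
    return failed
```

An application would call `same_filesystem()` on the target directory and the directory it intends to create the temporary file in, and `copy_xattrs(old_path, tmp_path)` before the rename.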

It's not that it's not possible to do, it's that it's hard to do right.  There are a huge number of hoops to jump through just to properly save a file.
And apps get it wrong.  Most apps get it wrong in some way or another.

Also if it's in the kernel then filesystems could potentially hook into the process and prevent the early sync.

--Ray

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11  1:03 RFC: O_PONIES semantics (well O_REWRITE) Rik van Riel
  2009-06-11  5:53 ` Andreas Dilger
  2009-06-11  9:51 ` Artem Bityutskiy
@ 2009-06-12  2:07 ` Jamie Lokier
  2009-06-12  2:20   ` Matthew Wilcox
  2 siblings, 1 reply; 10+ messages in thread
From: Jamie Lokier @ 2009-06-12  2:07 UTC
  To: Rik van Riel; +Cc: linux-fsdevel, Ray Strode, elb

Rik van Riel wrote:
> The ext4 automatic-fsync-on-rename discussion has shown that
> many applications simply Do It Wrong when it comes to rewriting
> configuration files.

I got the impression ext4 has
automatic-fsync-on-rename-only-if-the-old-file-exists, which is a bit
less reliable.

By the way, the kernel has some generic support for O_SYNC and
O_DSYNC, and generic MS_SYNC mount option.

So I guess it could also have generic support for mount options
"sync_on_rename" and "sync_on_close", instead of only doing it with ext4.

For example, this came up recently on the linux-mtd list which deals
with flash filesystems.  The ext4-like behaviour is being considered in
a flash filesystem.  So if it's that important, maybe it would be even
better to make it a generic VFS mount option for all filesystems.

> Some of the common failures are:
> - program overwrites the old config file
> - program writes a new file, but forgets to fsync before rename
> - program writes the new file in /tmp, so the rename fails on
>   some systems
> - program writes a new file and fsyncs, but forgets to give the
>   new file the same file ownership, permission and/or extended
>   attributes as the old file

It's also really hard to do those things from shell scripts, so they
are almost never done there.

> Glibc has the advantage of it not being in the kernel, but
> implementing it in-kernel might give us the opportunity for
> performance enhancements, like reducing step (5) to merely
> enforcing ordering between filesystem operations, instead
> of requiring an fsync.

The performance enhancement from order-without-sync might be
useful, though I'm not sure.  If it is, it would be worth having
for more than just this operation, which is still quite specialised.

-- Jamie

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-12  2:07 ` Jamie Lokier
@ 2009-06-12  2:20   ` Matthew Wilcox
  2009-06-12 17:06     ` Ray Strode
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2009-06-12  2:20 UTC
  To: Jamie Lokier; +Cc: Rik van Riel, linux-fsdevel, Ray Strode, elb

On Fri, Jun 12, 2009 at 03:07:38AM +0100, Jamie Lokier wrote:
> > Some of the common failures are:
> > - program overwrites the old config file
> > - program writes a new file, but forgets to fsync before rename
> > - program writes the new file in /tmp, so the rename fails on
> >   some systems
> > - program writes a new file and fsyncs, but forgets to give the
> >   new file the same file ownership, permission and/or extended
> >   attributes as the old file
> 
> It's also really hard to do those things from shell scripts, so they
> are almost never done there.

That's a good point, but O_(PONIES|REWRITE) doesn't fix that problem,
since shell scripts can't specify it, and shells can't do it automatically
for all files created.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-12  2:20   ` Matthew Wilcox
@ 2009-06-12 17:06     ` Ray Strode
  0 siblings, 0 replies; 10+ messages in thread
From: Ray Strode @ 2009-06-12 17:06 UTC
  To: Matthew Wilcox; +Cc: Jamie Lokier, Rik van Riel, linux-fsdevel, elb

Hi,
> On Fri, Jun 12, 2009 at 03:07:38AM +0100, Jamie Lokier wrote:
> > > Some of the common failures are:
> > > - program overwrites the old config file
> > > - program writes a new file, but forgets to fsync before rename
> > > - program writes the new file in /tmp, so the rename fails on
> > >   some systems
> > > - program writes a new file and fsyncs, but forgets to give the
> > >   new file the same file ownership, permission and/or extended
> > >   attributes as the old file
> > 
> > It's also really hard to do those things from shell scripts, so they
> > are almost never done there.
> 
> That's a good point, but O_(PONIES|REWRITE) doesn't fix that problem,
> since shell scripts can't specify it, and shells can't do it automatically
> for all files created.
Well, it's not really the shells that need to be able to do it, it's the helper utilities that shell scripts use.

For instance, sed should use O_REWRITE if -i is passed.

Still, you could imagine a new type of redirection operator that exposed O_REWRITE, e.g.,

cat >>> config-file.cfg << EOF
[config]
Value=foo
EOF
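
For a helper utility, what "sed -i" style editing on top of such a primitive might look like can be sketched in Python. This is an assumption-laden illustration: `edit_in_place` is a made-up name, it uses a temporary file plus rename() in place of the proposed O_REWRITE, and it copies only the permission bits.

```python
import os
import stat
import tempfile


def edit_in_place(path, transform):
    """Rewrite `path` line by line through `transform`, replacing it atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(prefix=".edit-", dir=directory)
    try:
        # The original stays readable under its name while the new copy is built.
        with os.fdopen(fd, "w") as dst, open(path) as src:
            mode = stat.S_IMODE(os.fstat(src.fileno()).st_mode)
            os.fchmod(dst.fileno(), mode)
            for line in src:
                dst.write(transform(line))
            dst.flush()
            os.fsync(dst.fileno())  # new contents on disk before the rename
        os.rename(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

For example, `edit_in_place("config-file.cfg", lambda line: line.replace("foo", "bar"))` turns `Value=foo` into `Value=bar` without the file ever being visible in a half-written state.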

--Ray

* Re: RFC: O_PONIES semantics (well O_REWRITE)
  2009-06-11 14:23     ` Trond Myklebust
  2009-06-11 14:32       ` Ray Strode
@ 2009-06-17 13:52       ` Rik van Riel
  1 sibling, 0 replies; 10+ messages in thread
From: Rik van Riel @ 2009-06-17 13:52 UTC
  To: Trond Myklebust; +Cc: Andreas Dilger, linux-fsdevel, Ray Strode, elb

Trond Myklebust wrote:

> How is this any different than just having your application use
> mkostemp() to create a temporary dot file, then renaming it when done
> writing?

That is exactly what it is.  There are reasons for implementing
it at a lower level in the system, though.

Implementing it in glibc:
- means applications get it right (today, many don't)
- allows for a performance optimization, moving the fsync
   into a specially spawned off temporary thread, so the
   main application doesn't stall
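
The thread-based variant can be sketched in user space as follows. This is one possible reading of the idea, written in Python rather than as glibc code: `rewrite_in_background` is a made-up name, metadata copying is omitted, and the rename is deferred to the worker as well, because it must not happen before the fsync completes.

```python
import os
import tempfile
import threading


def rewrite_in_background(path, data):
    """Write `data` (bytes) to a temporary file, then fsync and rename it
    over `path` in a worker thread. Returns the thread so callers can join it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(prefix=".rewrite-", dir=directory)
    new = os.fdopen(fd, "wb")
    try:
        new.write(data)
        new.flush()  # data is handed to the kernel, not necessarily on disk
    except BaseException:
        new.close()
        os.unlink(tmp)
        raise

    def commit():
        try:
            os.fsync(new.fileno())  # the slow part, off the caller's thread
            os.rename(tmp, path)    # only after the data is on disk
        except OSError:
            os.unlink(tmp)
            raise
        finally:
            new.close()

    worker = threading.Thread(target=commit)
    worker.start()
    return worker
```

The caller continues immediately. The new contents become visible under `path` only once the worker has finished, so code that needs them must `join()` the returned thread first.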

Implementing it in the kernel allows for some further
performance and power optimizations, most notably:
- the sync could be turned into an ordering requirement,
   meaning it can be postponed and even obsoleted by a
   future version of the file
- the ability to postpone the write allows for better
   power saving

-- 
All rights reversed.
