From: Ric Wheeler <rwheeler@redhat.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: Artem Bityutskiy <dedekind@yandex.ru>,
	Artem Bityutskiy <Artem.Bityutskiy@nokia.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: replace() system call needed (was Re: EXT4-ish "fixes" in UBIFS)
Date: Mon, 30 Mar 2009 13:19:26 -0400
Message-ID: <49D0FF1E.3010804@redhat.com>
In-Reply-To: <20090329124959.GD15492@elf.ucw.cz>

Pavel Machek wrote:
>>>> We have a problem that user-space people do not want to
>>>> use 'fsync()', even when they are pointed to their code
>>>> which is doing create/write/rename/close without fsync().
>>>>         
>>> Well... they really don't want to spin the disk up for the
>>> fsync(). I'm not sure if fsync() is really sensible operation to use
>>> there.
>>>       
>> I'm personally concerned about hand-held, and in case of UBIFS
>> fsync is not too expensive - we work on flash and on fsync() we
>> write back only the stuff belonging to inode in question, and
>> nothing else.
>>     
>
> Well, I'm more concerned about spinning disks, having one even in my
> zaurus. And I do believe that fsync() will write more data than
> necessary even in flash case.
>
>   
>>>> 1. truncate/write/close leads to empty files
>>>>         
>>> this is buggy.
>>>       
>> In FS, or in application?
>>     
>
> Application is buggy; no way kernel can help there.
>
>   
>>>> 2. create/write/rename leads to empty files
>>>>         
>>> ..but this should not be. If we want to make that explicit, we should
>>> provide "replace()" operation; where replace is rename that makes sure
>>> that source file is completely on media before committing the rename.
>>>       
>> Well, OK, we can fsync() before rename, we just need clean rules
>> for this, so that all Linux FSes would follow them. Would be nice
>> to have final agreement on all this stuff.
>>     
>
> My proposal is 
>
> rename() stays.
>
> replace(src, bar) is rename that ensures that bar will contain valid
> data after powerfail.
>   

Surely the only way to "ensure" this is to spin up the drive, write the 
meta-data and data back and make sure that it is not held in volatile 
write cache?

Why would calling this replace be better or more power efficient than 
what you need to do today?

ric

>   
>>> It is somehow similar to fsync()/rename(), but does not force disk
>>> spin up immediately -- it only inserts "barrier" between data blocks
>>> and rename. (And yes, it should be implemented as fsync()+rename() for
>>> filesystems like xfs. It can be implemented as plain rename for ext3
>>> and ext4 after the fixes...)
>>>       
>> Right. But I guess only few file-systems would really implement
>> this, because this is complex.
>>     
>
> Complex yes, but at least ext3+ext4+btrfs should, and they really have
> 90% of "market share" :-). ext3 and ext4 implementations are already
> done :-).
> 								Pavel
>   
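
For reference, the user-space sequence that the thread calls
fsync()+rename() can be sketched in C roughly as below. This is only an
illustrative sketch, not code from any poster and not the proposed
replace() call: replace_file(), write_all() and the ".tmp" suffix are
hypothetical names, and the final fsync() on the parent directory is an
added assumption about what is needed to make the rename itself durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` with `len` bytes of `data` using
 * create/write/fsync/rename. Returns 0 on success, -1 on failure.
 */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096], dircopy[4096];
    int fd, dfd, rc, n;

    assert(path != NULL && data != NULL);

    /* Temporary name in the same directory, so rename() stays on one FS. */
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    n = snprintf(dircopy, sizeof dircopy, "%s", path);
    if (n < 0 || (size_t)n >= sizeof dircopy)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The fsync() here is the step applications tend to skip. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Assumption: also flush the directory entry changed by rename(). */
    dfd = open(dirname(dircopy), O_RDONLY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

A call such as replace_file("settings.conf", buf, len) is meant to leave
the name pointing at either the complete old contents or the complete
new contents after a power failure. The fsync() on the temporary file is
exactly the step that forces the write-back (and the spin-up on a
sleeping disk), which is the cost being argued about above.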

