linux-btrfs.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@oracle.com>
To: Olaf van der Spek <olafvdspek@gmail.com>
Cc: Hubert Kario <hka@qbs.com.pl>, linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Atomic file data replace API
Date: Wed, 26 Jan 2011 14:30:24 -0500
Message-ID: <1296069448-sup-2841@think>
In-Reply-To: <AANLkTim8wQuskQT1DsRX-1LY6yx91Ltd+hDz9OsYvjOP@mail.gmail.com>

Excerpts from Olaf van der Spek's message of 2011-01-26 13:30:08 -0500:
> On Sat, Jan 8, 2011 at 3:40 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> > On Fri, Jan 7, 2011 at 8:29 PM, Chris Mason <chris.mason@oracle.com> wrote:
> >> The exact amount of tracking is going to vary.  The reason why is that
> >> actually doing the truncate is an O(size of the file) operation and so
> >> you can't just flip a switch when the write or the close comes in.  You
> >> have to run through all the metadata of the file and do something
> >> temporary with each part that is only completed when the file IO is
> >> actually done.
> >
> > That's true. Maybe the proper way, via O_ATOMIC, is better.
> >
> >> Honestly, there are many different ways to solve this in the application.
> >> Requiring high speed atomic replacement of individual file contents is a
> >> recipe for frustration.
> >
> > Did you see Massimo's message? That'd be the ideal way from an app
> > point of view.
> > Not solving this properly in the FS moves the problem to userspace
> > where it's even harder to solve and is not as performant.
> >
> > Replacing file data is a common operation that IMO the FS should
> > support in a safe way.
>
> Chris?
>

My answer hasn't really changed ;)  Replacing file data is a common
operation, but it is still surprisingly complex.  Again, the truncate is
O(size of the file) and it is actually impossible to do this atomically
in most filesystems.

You don't notice this because xfs/ext3/ext4/btrfs (and many others) have
code that makes sure a truncate is restarted if you crash.  So, it
appears to be atomic even though we're really just restarting the
operation.  In order to have a truncate + replacement of data operation,
we'd have to do a disk format change that includes both the truncate and
the new data.
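
The restart behaviour described above can be pictured with a toy model.
This is only a sketch under stated assumptions: ToyFS, pending_truncate
and the Crash exception are hypothetical names, and no real filesystem
is claimed to work exactly this way.  It shows why a truncate that frees
extents one by one can still look atomic after a crash: the intent is
recorded first, and recovery finishes the job.

```python
from dataclasses import dataclass, field


class Crash(Exception):
    """Simulated power loss in the middle of an operation."""


@dataclass
class ToyFS:
    extents: list = field(default_factory=list)  # blocks still owned by the file
    pending_truncate: bool = False               # durable "finish this later" record

    def truncate(self, crash_after=None):
        # Step 1: record the intent before freeing anything.
        self.pending_truncate = True
        freed = 0
        # Step 2: free extents one at a time, O(size of the file) work.
        while self.extents:
            if crash_after is not None and freed == crash_after:
                raise Crash()
            self.extents.pop()
            freed += 1
        # Step 3: everything is freed, so the intent record can be cleared.
        self.pending_truncate = False

    def recover(self):
        # After a crash, restart any truncate that was left half done.
        if self.pending_truncate:
            self.truncate()


fs = ToyFS(extents=list(range(8)))
try:
    fs.truncate(crash_after=3)   # crash part-way through
except Crash:
    pass
fs.recover()                     # the restart completes the truncate
```

Between the crash and the recovery the file is in a half-truncated
state; a reader only never sees that state because recovery runs before
the file is accessible again.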

It would look a lot like echo data > file.new ; truncate file ; mv
file.new file, but recorded in the FS metadata.
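
For comparison, the userspace form of that sequence is the familiar
write-to-temporary-then-rename pattern.  The sketch below is not the
in-filesystem operation discussed here, only the application-level
workaround; replace_file_data is a hypothetical helper name, and the
directory fsync assumes a POSIX system.  The explicit "truncate file"
step is not needed in userspace because the rename drops the old file
contents along with the old name.

```python
import os
import tempfile


def replace_file_data(path: str, data: bytes) -> None:
    """Replace the contents of `path`; readers see the old or the new data."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new data to stable storage first
        os.replace(tmp_path, path)  # rename(2): atomic swap of the name
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that the new file gets the temporary file's ownership and
permissions, not those of the file it replaces, which is one of the
reasons this pattern is awkward for applications.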

I don't have this in the btrfs roadmap.  It would be nice but most
people use databases for things that require atomic operations.  I
think what ext4 and btrfs do today falls into the category of best
effort and least surprise, and I think it is as good as we can get
without huge performance penalties for normal use.

Now, if you want to talk about atomic replacement of file data without
changing the file size, that's much easier.  At least it's easier for
those of us with cows in our pockets.
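
One plausible reading of why the same-size case is easier with
copy-on-write, as a toy model only: new data is written to fresh blocks,
and a single pointer update switches the file over, so no O(size)
truncate has to be made atomic.  ToyCowFile and its fields are
hypothetical names and do not describe the btrfs on-disk format.

```python
class ToyCowFile:
    """A file is a tuple of block ids; `root` is the only pointer updated in place."""

    def __init__(self, data: bytes, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}   # block id -> bytes, stands in for disk space
        self.next_id = 0
        self.root = self._write_blocks(data)

    def _write_blocks(self, data: bytes) -> tuple:
        # New data always goes to freshly allocated blocks, never over old ones.
        ids = []
        for off in range(0, len(data), self.block_size):
            self.blocks[self.next_id] = data[off:off + self.block_size]
            ids.append(self.next_id)
            self.next_id += 1
        return tuple(ids)

    def read(self) -> bytes:
        return b"".join(self.blocks[i] for i in self.root)

    def replace_same_size(self, data: bytes) -> None:
        if len(data) != len(self.read()):
            raise ValueError("this sketch only covers same-size replacement")
        new_root = self._write_blocks(data)        # old blocks are still intact
        old_root, self.root = self.root, new_root  # the single switch-over point
        for i in old_root:                         # old blocks are freed afterwards
            del self.blocks[i]


f = ToyCowFile(b"aaaabbbb")
f.replace_same_size(b"ccccdddd")
```

A crash before the pointer update leaves the old contents fully
readable; a crash after it leaves the new contents, with at worst some
unreferenced old blocks to clean up.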

-chris