linux-btrfs.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@oracle.com>
To: Olaf van der Spek <olafvdspek@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Atomic file data replace API
Date: Fri, 07 Jan 2011 10:05:15 -0500
Message-ID: <1294412553-sup-9058@think>
In-Reply-To: <AANLkTi=Z1RCwMKSMRzg4fwN55ZQs4F91nNjdEfv6veyb@mail.gmail.com>

Excerpts from Olaf van der Spek's message of 2011-01-07 10:01:59 -0500:
> On Fri, Jan 7, 2011 at 3:58 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
> >> Hi,
> >>
> >> Does btrfs support atomic file data replaces? Basically, the atomic
> >> variant of this:
> >> // old state
> >> open(O_TRUNC)
> >> write() // 0+ times
> >> close()
> >> // new state
> >
> > Yes and no.  We have a best effort mechanism where we try to guess that
> > since you've done this truncate and the write that you want the writes
> > to show up quickly.  But it's a guess.
> >
> > The problem is the write() // 0+ times.  The kernel has no idea what
> > new result you want the file to contain because the application isn't
> > telling us.
>
> Isn't it safe for the kernel to wait until the first write or close
> before writing anything to disk?

I'm afraid not.  Picture an application that opens a thousand files,
writes 1MB to each of them, and then doesn't close any.  If we waited
until close, you'd have 1GB of memory pinned or staged somehow.
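
To make the cost concrete, here is a minimal sketch of such an
application.  The helper name, the file naming scheme and the fds[] array
are assumptions made up for illustration; nothing here is btrfs-specific.
It writes 1MB to each file, keeps every descriptor open, and returns the
number of bytes a "wait until close" policy would have to hold back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)  /* 1MB per file, as in the example above */

/*
 * Hypothetical helper: open nfiles files under dir, write CHUNK bytes to
 * each, and leave them all open.  The descriptors are stored in fds[].
 * Returns the total number of bytes written, or -1 on error.
 */
long long write_without_close(const char *dir, int nfiles, int *fds)
{
    long long pending = 0;
    char *buf = malloc(CHUNK);

    if (!buf)
        return -1;
    memset(buf, 'x', CHUNK);

    for (int i = 0; i < nfiles; i++) {
        char path[4096];
        ssize_t n;

        snprintf(path, sizeof path, "%s/file-%d", dir, i);
        fds[i] = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fds[i] < 0) {
            free(buf);
            return -1;
        }

        n = write(fds[i], buf, CHUNK);
        if (n < 0) {
            free(buf);
            return -1;
        }
        pending += n;  /* never closed, so this would stay held back */
    }

    free(buf);
    return pending;
}
```

With nfiles = 1000 the return value is 1000 * 1MB, which is the roughly
1GB mentioned above, and nothing in the program ever tells the kernel
that it may let go of it.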

>
> > What btrfs can do (but we haven't yet implemented) is make sure that the
> > results of a single write call are on disk atomically, even if they are
> > replacing existing bytes in the file.
> >
> > Because we cow and because we don't update metadata pointers until the
> > IO is complete, we can wait until all the IO for a given write call is
> > on disk before we update any of the metadata.
> >
> > This isn't hard, it's on my TODO list.
>
> What about a new flag: O_ATOMIC that'd take the guesswork out of the kernel?

We can't guess beyond a single write call.  Otherwise we get into
the problem above where an application can force the kernel to wait
forever.  I'm not against O_ATOMIC to enable the new btrfs
functionality, but it will still be limited to one write.
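
As a rough sketch of what "limited to one write" would mean for an
application: the complete new contents have to be assembled in memory and
handed over in a single call.  The helper name below is an assumption for
illustration, and O_ATOMIC is deliberately not used because it is only a
proposal.  The code uses plain POSIX calls; it would only become atomic on
a filesystem that actually implements the per-write guarantee described
above, which btrfs has not done yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of path using exactly one
 * write call.  Returns 0 on success, -1 on error or on a short write.
 */
int replace_with_single_write(const char *path, const void *data, size_t len)
{
    ssize_t n;
    /* No O_TRUNC: truncating first would expose an empty file. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    /* The whole new contents go down in one call, at offset 0. */
    n = pwrite(fd, data, len, 0);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /*
     * Drop any tail left over from a longer old file.  This is a second
     * operation and would not be covered by a per-write guarantee.
     */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

The ftruncate() step shows the limit of the approach: when the new
contents are shorter than the old ones, one write call alone cannot
express the whole replacement.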

-chris
