From: Chris Mason <chris.mason@oracle.com>
To: Olaf van der Spek <olafvdspek@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Atomic file data replace API
Date: Fri, 07 Jan 2011 10:05:15 -0500
Message-ID: <1294412553-sup-9058@think>
In-Reply-To: <AANLkTi=Z1RCwMKSMRzg4fwN55ZQs4F91nNjdEfv6veyb@mail.gmail.com>

Excerpts from Olaf van der Spek's message of 2011-01-07 10:01:59 -0500:
> On Fri, Jan 7, 2011 at 3:58 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
> >> Hi,
> >>
> >> Does btrfs support atomic file data replaces? Basically, the atomic
> >> variant of this:
> >> // old state
> >> open(O_TRUNC)
> >> write() // 0+ times
> >> close()
> >> // new state
> >
> > Yes and no.  We have a best effort mechanism where we try to guess that
> > since you've done this truncate and the write that you want the writes
> > to show up quickly.  But it's a guess.
> >
> > The problem is the write() // 0+ times.  The kernel has no idea what
> > new result you want the file to contain because the application isn't
> > telling us.
>
> Isn't it safe for the kernel to wait until the first write or close
> before writing anything to disk?

I'm afraid not.  Picture an application that opens a thousand files,
writes 1MB to each of them, and then doesn't close any.  If we waited
until close, you'd have 1GB of memory pinned or staged somehow.
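
To make the arithmetic concrete, here is a minimal toy model of a
"hold everything until close" policy. It is only a sketch: the class
DeferUntilClose and the function demo are hypothetical names invented
for illustration, and nothing here reflects how the kernel page cache
or btrfs actually tracks dirty data.

```python
class DeferUntilClose:
    """Toy model: keep every written byte in memory until close()."""

    def __init__(self):
        self.pending = {}      # handle -> bytearray not yet "on disk"
        self.on_disk = {}      # handle -> bytes flushed at close time
        self.next_handle = 0

    def open(self):
        handle = self.next_handle
        self.next_handle += 1
        self.pending[handle] = bytearray()
        return handle

    def write(self, handle, data):
        # Nothing reaches disk here; the data only accumulates in memory.
        self.pending[handle] += data

    def close(self, handle):
        # Only now is the data flushed and the memory released.
        self.on_disk[handle] = bytes(self.pending.pop(handle))

    def pinned_bytes(self):
        # Total bytes held in memory for files that are still open.
        return sum(len(buf) for buf in self.pending.values())


def demo(num_files=1000, bytes_per_file=1024):
    """Return (bytes pinned while all files are open, bytes pinned after close)."""
    fs = DeferUntilClose()
    handles = [fs.open() for _ in range(num_files)]
    for h in handles:
        fs.write(h, b"x" * bytes_per_file)
    pinned_while_open = fs.pinned_bytes()
    for h in handles:
        fs.close(h)
    return pinned_while_open, fs.pinned_bytes()
```

The default is scaled down to 1 KiB per file so the demo stays cheap to
run. The pinned total grows as num_files * bytes_per_file, so a thousand
files at 1MB each gives the 1GB figure above, and none of it can be
released until the application decides to call close.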

>
> > What btrfs can do (but we haven't yet implemented) is make sure that the
> > results of a single file write are on disk atomically, even if they are
> > replacing existing bytes in the file.
> >
> > Because we COW and because we don't update metadata pointers until the
> > IO is complete, we can wait until all the IO for a given write call is
> > on disk before we update any of the metadata.
> >
> > This isn't hard, it's on my TODO list.
>
> What about a new flag: O_ATOMIC that'd take the guesswork out of the kernel?

We can't guess beyond a single write call.  Otherwise we get into
the problem above where an application can force the kernel to wait
forever.  I'm not against O_ATOMIC to enable the new btrfs
functionality, but it will still be limited to one write.
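
The single-write limit can be illustrated with a toy copy-on-write
model. This is a sketch under stated assumptions, not btrfs code: the
class CowFile, its fields, and the crash_before_commit parameter are
all hypothetical, and the whole file is treated as one extent to keep
the example short. Each write puts the new data in a fresh location and
then flips one pointer, so a crash before the flip leaves the old
contents intact.

```python
class CowFile:
    """Toy copy-on-write file: immutable extents plus one live pointer."""

    def __init__(self, data=b""):
        self.extents = {0: bytes(data)}  # extent id -> immutable contents
        self.live = 0                    # "metadata pointer" to current contents
        self.next_id = 1

    def read(self):
        return self.extents[self.live]

    def write(self, offset, data, crash_before_commit=False):
        """Apply one write call; return True if it was committed."""
        old = self.extents[self.live]
        # Zero-fill if the write starts past the current end of file.
        padded = old.ljust(offset, b"\0")
        new = padded[:offset] + data + padded[offset + len(data):]

        # Step 1: the new data goes to a fresh extent; the old one is untouched.
        new_id = self.next_id
        self.next_id += 1
        self.extents[new_id] = new

        # Simulated crash after the data IO but before the metadata update.
        if crash_before_commit:
            return False

        # Step 2: a single pointer update makes the whole write visible at once.
        self.live = new_id
        return True
```

In this model one write is all-or-nothing: after a simulated crash the
reader still sees the old bytes. Two separate writes are two separate
commits, so a crash between them exposes the first without the second,
which is the reason the guarantee stops at one write call.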

-chris
