From: Chris Mason <chris.mason@oracle.com>
To: Olaf van der Spek <olafvdspek@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Atomic file data replace API
Date: Fri, 07 Jan 2011 11:12:11 -0500
Message-ID: <1294416073-sup-5254@think>
In-Reply-To: <AANLkTi=o4w9wDoh6f8iaJF=vPQ-Xs1XBnJMbym08cOa-@mail.gmail.com>
Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
> On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@oracle.com> wrote:
> >> That's not what I asked. ;)
> >> I asked to wait until the first write (or close). That way, you don't
> >> get unintentional empty files.
> >> One step further, you don't have to keep the data in memory, you're
> >> free to write them to disk. You just wouldn't update the meta-data
> >> (yet).
> >
> > Sorry ;) Picture an application that truncates 1024 files without closing any
> > of them. Basically any operation that includes the kernel waiting for
> > applications because they promise to do something soon is a denial of
> > service attack, or a really easy way to run out of memory on the box.
>
> I'm not sure why you would run out of memory in that case.
Well, let's make sure I've got a good handle on the proposed interface:
1) fd = open(some_file, O_ATOMIC)
2) ftruncate(fd, 0)
3) write(fd, new data)
The semantics are that we promise not to let the truncate hit the disk
until the application does the write.
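For illustration only, the three-step sequence can be sketched in Python using the os module's thin wrappers around the same system calls. O_ATOMIC is the proposed flag and does not exist in Linux or in Python; it is defined as 0 here, so the sketch runs with today's ordinary semantics. The `observe` hook and the `demo` helper are made-up names that exist only to show the window between the truncate and the write that the proposal wants to hide.

```python
import os
import tempfile

# Hypothetical flag from the proposal. It is NOT a real open(2) flag;
# defining it as 0 means the calls below behave non-atomically, as they do today.
O_ATOMIC = 0


def replace_contents(path, new_data, observe=None):
    """Run the open / truncate / write sequence from the message.

    `observe` is an illustrative hook called after the truncate and
    before the write, i.e. inside the window the proposal wants to hide.
    """
    fd = os.open(path, os.O_WRONLY | O_ATOMIC)
    try:
        os.ftruncate(fd, 0)          # step 2: file is now empty
        if observe is not None:
            observe(path)
        os.write(fd, new_data)       # step 3: new contents arrive
    finally:
        os.close(fd)


def demo():
    """Return the file sizes seen inside the window and the final contents."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "some_file")
        with open(path, "wb") as f:
            f.write(b"old data")
        sizes = []
        replace_contents(path, b"new data!",
                         lambda p: sizes.append(os.path.getsize(p)))
        with open(path, "rb") as f:
            return sizes, f.read()
```

Without real O_ATOMIC support, the observer sees an empty file between steps 2 and 3. Under the proposed semantics that intermediate state would never reach the disk.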
We have a few choices on how we do this:
1) Leave the disk untouched, but keep something in memory that says this
inode is really truncated
2) Record on disk that we've done our atomic truncate but it is still
pending. We'd need some way to remove or invalidate this record after a
crash.
3) Go ahead and do the operation but don't allow the transaction to
commit until the write is done.
option #1: keep something in memory. Well, any time we have a
requirement to pin something in memory until userland decides to do a
write, we risk oom.
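A toy model of option #1 makes the concern concrete. This is not kernel code; the class and method names are invented for illustration. Each deferred truncate leaves an entry pinned in memory, and only the application's later write releases it.

```python
class PendingTruncates:
    """Toy model: in-memory records of truncates that have not hit disk yet."""

    def __init__(self):
        # inode number -> size the file should have once the truncate is applied
        self.pending = {}

    def truncate(self, ino, size=0):
        # The disk is left untouched; we only remember the truncate in memory.
        self.pending[ino] = size

    def write(self, ino):
        # The first write lets the truncate reach disk, so the record can go.
        self.pending.pop(ino, None)

    def pinned(self):
        # Number of records that cannot be dropped until userland acts.
        return len(self.pending)


# An application truncates 1024 files and never writes to any of them.
table = PendingTruncates()
for ino in range(1024):
    table.truncate(ino)
```

Nothing in this model lets the system shrink `pending` on its own. The amount of pinned state is controlled entirely by what the application chooses to do next.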
option #2: disk format change. Actually somewhat complex because if we
haven't crashed, we need to be able to read the inode in again without
invalidating the record but if we do crash, we have to invalidate the
record. Not impossible, but not trivial.
option #3: Pin the whole transaction. Depending on the FS this may be
impossible. Certain operations require us to commit the transaction to
reclaim space, and we cannot allow userland to put that on hold without
deadlocking.
What most people don't realize about the crash-safe filesystems is they
don't have fine-grained transactions. There is one single transaction
for all the operations done. This is mostly because it is less complex
and much faster, but it also makes any 'pin the whole transaction' type
system unusable.
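The interaction between option #3 and a single shared transaction can be sketched with another toy model. All names here are hypothetical and the model ignores everything a real filesystem does except the one rule under discussion: the transaction cannot commit while any atomic truncate is still waiting for its write.

```python
class GlobalTransaction:
    """Toy model of one filesystem-wide transaction shared by every operation."""

    def __init__(self):
        self.ops = []         # operations in the running transaction
        self.pins = set()     # (app, inode) pairs still waiting for a write
        self.committed = []   # operations that have reached disk

    def atomic_truncate(self, app, ino):
        # Option #3: do the truncate now, but hold the commit until the write.
        self.ops.append(("truncate", ino))
        self.pins.add((app, ino))

    def write(self, app, ino):
        self.ops.append(("write", ino))
        self.pins.discard((app, ino))

    def log(self, op):
        # Any unrelated operation joins the same transaction.
        self.ops.append(op)

    def try_commit(self):
        # A commit is all or nothing: one outstanding pin blocks everything.
        if self.pins:
            return False
        self.committed.extend(self.ops)
        self.ops = []
        return True
```

In this model, an unrelated operation such as `("unlink", 2)` that needs a commit to reclaim space stays stuck until the pinning application finally writes, which is the deadlock risk described above.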
-chris