From: Josef Bacik <josef@redhat.com>
To: Mike Fedyk <mfedyk@mikefedyk.com>
Cc: Josef Bacik <josef@redhat.com>, Chris Ball <cjb@laptop.org>,
    Nickolai Zeldovich <nickolai@csail.mit.edu>,
    linux-btrfs@vger.kernel.org
Subject: Re: zero-length files in snapshots
Date: Fri, 12 Feb 2010 11:22:08 -0500
Message-ID: <20100212162207.GB4191@localhost.localdomain>
In-Reply-To: <93cdabd21002120818g4c47e2b6k3083a368286651e5@mail.gmail.com>
On Fri, Feb 12, 2010 at 08:18:01AM -0800, Mike Fedyk wrote:
> On Fri, Feb 12, 2010 at 7:19 AM, Josef Bacik <josef@redhat.com> wrote:
> > On Thu, Feb 11, 2010 at 08:50:48PM -0800, Mike Fedyk wrote:
> >> On Thu, Feb 11, 2010 at 7:11 PM, Chris Ball <cjb@laptop.org> wrote:
> >> >   > echo x1 > /mnt/x/d/foo.txt || exit 2
> >> >   > btrfsctl -s /mnt/x/snap /mnt/x/d
> >> >
> >> > You're just missing a sync/fsync() between these two lines.
> >> >
> >> > We argued on IRC a while ago about whether this is a sensible default;
> >> > cmason wants the no-sync version of snapshot creation to be available,
> >> > but was amenable to the idea of changing the default to be sync before
> >> > snapshot, since it was pointed out that no-one other than him had
> >> > understood we were supposed to be running sync first.
> >> >
> >> You're saying that it only snapshots the on-disk data structures and
> >> not the in-memory versions?  That can only lead to pain.  What do you
> >> do if something else during this race condition?  What would a sync do
> >> to solve this?  Have the semantics of sync been changed in btrfs from
> >> "sync everything that hasn't been written yet" to "sync this
> >> subvolume"?
> >>
> >
> > Welcome to delalloc.  You either get fast writes or you get all of your data on
> > the disk every 5 seconds.  If you don't like delalloc, use ext3.  The data
> > you've written to memory doesn't go down to disk unless explicitly told to, such
> > as
> >
> > 1) fsync - this is obvious
> > 2) vm - the vm has decided that this dirty page has been sitting around long
> > enough and should be written back to the disk, could happen now, could happen 10
> > years from now.
> > 3) sync - this is not as obvious.  sync doesn't mean anything other than "start
> > writing back dirty data to the fs", and returns before it's done.  For btrfs
> > what that means is we run through _every_ inode that has delalloc pages
> > associated with them and start writeback on them.  This will get most of your
> > data into the current transaction, which is when the snapshot happens.
> >
> > If you don't want empty files, do something like this
> >
> > btrfsctl -c /dir/to/volume
> > btrfsctl -s /dir/to/volume/snapshotname /dir/to/volume
> >
> > this is what we do with yum and its rollback plugin, and it works out quite
> > well.  Thanks,
> >
>
> Then you broke your ordering guarantee. If the data isn't there, the
> meta-data shouldn't be there either. So the snapshots made before the
> data hits a transaction shouldn't have the file at all.

Nope, what is happening is

fd = creat("file") <- this is metadata that needs to be written
write(fd, buf) <- because of delalloc there is no metadata that is created
                  for this operation, therefore it doesn't need to be written out.
close(fd)

so the file has metadata created for it, which needs to be written out. Because
of delalloc there are no extents created or anything for the data, therefore
there is nothing to write. Thanks,

Josef
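
The creat/write/close sequence described in this message can be illustrated
with a small toy model. This is only a sketch under stated assumptions: the
ToyFs class and its tree/dirty fields are hypothetical names invented for
illustration, they do not mirror btrfs internals, and flush() is merely a
stand-in for the writeback that fsync or sync would trigger. The model shows
why a snapshot taken before writeback contains the file but none of its data.

```python
# Toy model only: ToyFs is hypothetical and does not mirror btrfs internals.
from dataclasses import dataclass, field


@dataclass
class ToyFs:
    # State that is part of the current transaction: name -> allocated data.
    tree: dict = field(default_factory=dict)
    # Delalloc state: name -> bytes buffered in memory, no extents yet.
    dirty: dict = field(default_factory=dict)

    def creat(self, name):
        # Creating a file is a metadata change, so it is recorded immediately.
        self.tree[name] = b""

    def write(self, name, data):
        # With delayed allocation a write only buffers data in memory.
        self.dirty[name] = self.dirty.get(name, b"") + data

    def flush(self):
        # Stand-in for writeback: move the buffered data into the tree.
        for name, data in self.dirty.items():
            self.tree[name] = self.tree.get(name, b"") + data
        self.dirty.clear()

    def snapshot(self):
        # A snapshot copies only the tree, never the buffered data.
        return dict(self.tree)


def demo():
    fs = ToyFs()
    fs.creat("foo.txt")
    fs.write("foo.txt", b"x1\n")
    before = fs.snapshot()  # taken without any writeback
    fs.flush()
    after = fs.snapshot()   # taken after writeback
    return before, after


before, after = demo()
print(len(before["foo.txt"]), len(after["foo.txt"]))  # prints "0 3"
```

In this model the first snapshot holds a zero-length foo.txt, because only
the creat() reached the tree, while the second snapshot holds the three
bytes that were written once the buffered data had been flushed.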