From: Chris Mason <chris.mason@oracle.com>
To: Sage Weil <sage@newdream.net>
Cc: Yan Zheng <yanzheng@21cn.com>, linux-btrfs@vger.kernel.org
Subject: Re: inode data not getting included in commits?
Date: Fri, 19 Dec 2008 14:07:24 -0500
Message-ID: <1229713644.6695.48.camel@think.oraclecorp.com>
In-Reply-To: <Pine.LNX.4.64.0812191028020.29416@cobra.newdream.net>

On Fri, 2008-12-19 at 10:48 -0800, Sage Weil wrote:
> On Fri, 19 Dec 2008, Chris Mason wrote:
> > On Thu, 2008-12-18 at 21:21 -0800, Sage Weil wrote:
> > > On Fri, 19 Dec 2008, Yan Zheng wrote:
> > > > > I noticed some data and metadata getting out of sync on disk, despite
> > > > > wrapping my writes with btrfs transactions. After digging into it a bit,
> > > > > it appears to be a larger problem with inode size/data getting written
> > > > > during a regular commit.
> > > > > [...]
> > > >
> > > > This is the desired behaviour of data=ordered. Btrfs transaction commits
> > > > don't flush data, and metadata won't get updated until data IO completes.
> > > >
> > > > http://article.gmane.org/gmane.comp.file-systems.btrfs/869/match=new+data+ordered+code
> > >
> > > Ah, right, so it is.
> > >
> > > I think what I'm looking for then is a mount mode to get the old behavior,
> > > such that each commit flushes previously written data. Probably a call to
> > > btrfs_wait_ordered_extents() in btrfs_commit_transaction(), or something
> > > along those lines...
> >
> > Could you describe the end goal a bit? I'm happy to make modes where
> > it'll do what you need.
>
> The end goal is for data to flush and commit with the transaction that was
> running when the write() occurred.
>
> So, after a sequence like
>
>     write A
>     setxattr B
>     <crash>
>
> you should always see A if you see B.
>
> And after a sequence like
>
>     ioctl(fd, BTRFS_IOC_TRANS_START)
>     write A
>     setxattr B
>     close(fd)
>     <crash>
>
> you should see either both A and B or neither A nor B.
>
> fsync() isn't really appropriate since it forces a commit (or a tree log
> entry?), and it would still be better to roll lots of operations up
> together. Either a mount mode that includes dirty data in each
> transaction commit (and probably disables the tree log?), or a per-file
> fsync-like operation that commits an individual file's dirty data to the
> running transaction would do the trick.

A third option is a different type of xattr operation that doesn't go to
disk until the metadata updates are done at IO end time.

From a performance point of view, it'll be much faster than slowing down
commit with data writes.

Can that work for you?

-chris
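
To make the three behaviours in this thread concrete, here is a toy crash
model in Python. It is only a sketch under stated assumptions: ToyFS,
setxattr_at_io_end, io_complete, and wait_for_data are hypothetical names
invented for illustration, not btrfs interfaces, and the model tracks nothing
beyond which items would be visible after a crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Toy crash model; every name here is hypothetical, not a btrfs API."""
    disk: set = field(default_factory=set)         # what survives a crash
    running: set = field(default_factory=set)      # metadata in the running transaction
    in_flight: set = field(default_factory=set)    # data written, IO not finished
    deferred: dict = field(default_factory=dict)   # xattr name -> data it waits on

    def write(self, data):
        # data=ordered: metadata for the data is not updated until IO completes.
        self.in_flight.add(data)

    def setxattr(self, name):
        # Ordinary xattr: joins the running transaction right away.
        self.running.add(name)

    def setxattr_at_io_end(self, name, data):
        # Assumed "third option": hold the xattr until the data's IO has ended.
        if data in self.in_flight:
            self.deferred[name] = data
        else:
            self.running.add(name)

    def io_complete(self, data):
        # IO end: the data's metadata and any xattrs waiting on it
        # join the running transaction together.
        self.in_flight.discard(data)
        self.running.add(data)
        for name in [n for n, d in self.deferred.items() if d == data]:
            del self.deferred[name]
            self.running.add(name)

    def commit(self, wait_for_data=False):
        # wait_for_data=True models the requested "flush data on commit" mode.
        if wait_for_data:
            for data in list(self.in_flight):
                self.io_complete(data)
        self.disk |= self.running
        self.running = set()


def run(use_deferred_xattr, wait_for_data=False):
    """write A, set xattr B, commit before A's IO ends, then crash."""
    fs = ToyFS()
    fs.write("A")
    if use_deferred_xattr:
        fs.setxattr_at_io_end("B", "A")
    else:
        fs.setxattr("B")
    fs.commit(wait_for_data=wait_for_data)
    return set(fs.disk)  # state visible after the crash


print(sorted(run(False)))                      # ['B']  B without A
print(sorted(run(False, wait_for_data=True)))  # ['A', 'B']
print(sorted(run(True)))                       # []  neither A nor B
```

In this model the ordinary xattr can reach disk without A, the flush-on-commit
mode gets both at the cost of making commit wait for data IO, and the deferred
xattr never appears before A while leaving commit itself untouched.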