From: Chris Mason <chris.mason@oracle.com>
To: Sage Weil <sage@newdream.net>
Cc: Yan Zheng <yanzheng@21cn.com>, linux-btrfs@vger.kernel.org
Subject: Re: inode data not getting included in commits?
Date: Fri, 19 Dec 2008 14:07:24 -0500
Message-ID: <1229713644.6695.48.camel@think.oraclecorp.com>
In-Reply-To: <Pine.LNX.4.64.0812191028020.29416@cobra.newdream.net>

On Fri, 2008-12-19 at 10:48 -0800, Sage Weil wrote:
> On Fri, 19 Dec 2008, Chris Mason wrote:
> > On Thu, 2008-12-18 at 21:21 -0800, Sage Weil wrote:
> > > On Fri, 19 Dec 2008, Yan Zheng wrote:
> > > > > I noticed some data and metadata getting out of sync on disk, despite
> > > > > wrapping my writes with btrfs transactions.  After digging into it a bit,
> > > > > it appears to be a larger problem with inode size/data getting written
> > > > > during a regular commit.
> > > > > [...]
> > > > 
> > > > This is the desired behaviour of data=ordered. Btrfs transaction commits
> > > > don't flush data, and metadata won't get updated until data IO completes.
> > > > 
> > > > http://article.gmane.org/gmane.comp.file-systems.btrfs/869/match=new+data+ordered+code
> > > 
> > > Ah, right, so it is.
> > > 
> > > I think what I'm looking for then is a mount mode to get the old behavior, 
> > > such that each commit flushes previously written data.  Probably a call to 
> > > btrfs_wait_ordered_extents() in btrfs_commit_transaction(), or something 
> > > along those lines...
> > 
> > Could you describe the end goal a bit?  I'm happy to make modes where
> > it'll do what you need.
> 
> The end goal is for data to flush and commit with the transaction that was 
> running when the write() occurred.
> 
> So, after a sequence like
>  write A
>  setxattr B
>  <crash>
> you should always see A if you see B.
> 
> And after a sequence like
>  ioctl(fd, BTRFS_IOC_TRANS_START)
>  write A
>  setxattr B
>  close(fd)
>  <crash>
> you should see either both A and B or neither A nor B.
> 
> fsync() isn't really appropriate since it forces a commit (or a tree log 
> entry?), and it would still be better to roll lots of operations up 
> together.  Either a mount mode that includes dirty data in each 
> transaction commit (and probably disables the tree log?), or a per-file 
> fsync-like operation that commits an individual file's dirty data to the 
> running transaction would do the trick.

A third option is a different type of xattr operation that doesn't go to
disk until the metadata updates are done at IO end time.

From a performance point of view, it'll be much faster than slowing down
commit with data writes.
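
To make the ordering concrete, here is a rough toy model of that third
option. It is only a sketch and not btrfs code: ToyInode, setxattr_ordered(),
complete_io() and commit() are hypothetical names invented for illustration,
and the model assumes a commit persists only the metadata that has already
been applied, while data becomes visible only once its IO has completed.

```python
class ToyInode:
    """Toy model of one inode under data=ordered. Not btrfs code."""

    def __init__(self):
        self.dirty_data = None      # written, but data IO not finished yet
        self.disk_data = None       # data whose IO has completed
        self.metadata = {}          # metadata the next commit will include
        self.deferred_xattrs = {}   # xattrs held back until data IO ends

    def write(self, data):
        # Data is only buffered here; nothing reaches the metadata yet.
        self.dirty_data = data

    def setxattr(self, name, value):
        # Ordinary xattr: joins the running transaction immediately.
        self.metadata[name] = value

    def setxattr_ordered(self, name, value):
        # Hypothetical variant: wait for outstanding data IO, if any.
        if self.dirty_data is None:
            self.metadata[name] = value
        else:
            self.deferred_xattrs[name] = value

    def complete_io(self):
        # IO end time: size/extent metadata and deferred xattrs are
        # applied together, so a commit sees both or neither.
        if self.dirty_data is not None:
            self.disk_data = self.dirty_data
            self.dirty_data = None
            self.metadata["size"] = len(self.disk_data)
        self.metadata.update(self.deferred_xattrs)
        self.deferred_xattrs.clear()


def commit(inode):
    """Return the state that would survive a crash right after commit."""
    return {"data": inode.disk_data, "metadata": dict(inode.metadata)}


# Usage: write A, set B with the deferred variant, commit before IO ends.
inode = ToyInode()
inode.write(b"A")
inode.setxattr_ordered("user.b", b"B")
before_io = commit(inode)   # B is held back along with A
inode.complete_io()
after_io = commit(inode)    # A and B become visible together
```

With the ordinary setxattr() in the same model, a commit taken before
complete_io() would include B without A, which is the case Sage describes.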

Can that work for you?

-chris


