linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe David Manana <fdmanana@gmail.com>
To: Miao Xie <miaox@cn.fujitsu.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
Date: Mon, 23 Sep 2013 10:11:42 +0100	[thread overview]
Message-ID: <CAL3q7H4X_KD2RsRoQ7T9ZwigbGeofFcE_wxE8BOnZDObjvP_=Q@mail.gmail.com> (raw)
In-Reply-To: <523F99CA.5000703@cn.fujitsu.com>

On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
>
> On      sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > wait for delayed work to finish before returning success to the
> > caller. This change fixes this, ensuring that there's no data loss
> > if a power failure happens right after fs sync returns success to
> > the caller and before the next commit happens.
> >
> > Steps to reproduce the data loss issue:
> >
> > $ mkfs.btrfs -f /dev/sdb3
> > $ mount /dev/sdb3 /mnt/btrfs
> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> >
> > Right after the btrfs fi sync command (a second or 2 for example), power
> > off the machine and reboot it. The file will be empty, as it can be verified
> > after mounting the filesystem and through btrfs-debug-tree:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >
> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >                 location key (257 INODE_ITEM 0) type FILE
> >                 namelen 6 datalen 0 name: foobar
> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >                 inode ref index 2 namelen 6 name: foobar
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > uuid tree key (UUID_TREE ROOT_ITEM 0)
> >
> > After this patch, the data loss no longer happens after a power failure and
> > btrfs-debug-tree shows:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >       item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >               location key (257 INODE_ITEM 0) type FILE
> >               namelen 6 datalen 0 name: foobar
> >       item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >               inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> >       item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >               inode ref index 2 namelen 6 name: foobar
> >       item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> >               extent data disk byte 12845056 nr 8192
> >               extent data offset 0 nr 8192 ram 8192
> >               extent compression 0
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> >
> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > ---
> >  fs/btrfs/super.c |    5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 6ab0df5..557e38f 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >       struct btrfs_trans_handle *trans;
> >       struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> >       struct btrfs_root *root = fs_info->tree_root;
> > +     int ret;
> >
> >       trace_btrfs_sync_fs(wait);
> >
> > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >               return 0;
> >       }
> >
> > +     ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > +     if (ret)
> > +             return ret;
> > +
>
> I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.

Where is that do_sync() function exactly? I'm not finding any with
that exact name in fs/btrfs/* nor fs/*
I used this approach because (besides working) it's what is done in
btrfs_commit_transaction() (via btrfs_start_delalloc_flush and
btrfs_wait_delalloc_flush). Why can it be like that in the transaction
commit and not in btrfs_sync_fs() ?

>
> Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this problem.

Just tested it, and it works that way too. Uploading a new patch.

Thanks for the feedback/review Miao.

>
> Thanks
> Miao
>
> >       btrfs_wait_all_ordered_extents(fs_info);
> >
> >       trans = btrfs_attach_transaction_barrier(root);
> >
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

  reply	other threads:[~2013-09-23  9:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
2013-09-23  1:30 ` Miao Xie
2013-09-23  9:11   ` Filipe David Manana [this message]
2013-09-23  9:51     ` Liu Bo
2013-09-23  9:23 ` [PATCH v2] " Filipe David Borba Manana
2013-09-23  9:53   ` Filipe David Manana
2013-09-23  9:59     ` Liu Bo
2013-09-23 10:06       ` Filipe David Manana
2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
2013-09-24  1:31   ` Miao Xie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H4X_KD2RsRoQ7T9ZwigbGeofFcE_wxE8BOnZDObjvP_=Q@mail.gmail.com' \
    --to=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=miaox@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).