From: Chao Yu <chao2.yu@samsung.com>
To: 'Jaegeuk Kim' <jaegeuk@kernel.org>,
'huang ying' <huang.ying.caritas@gmail.com>
Cc: 'Huang Ying' <ying.huang@intel.com>,
'LKML' <linux-kernel@vger.kernel.org>,
linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
Date: Thu, 11 Sep 2014 18:47:02 +0800 [thread overview]
Message-ID: <003601cfcdad$d64a8d00$82dfa700$@samsung.com> (raw)
In-Reply-To: <20140911053716.GA24315@jaegeuk-mac02.hsd1.ca.comcast.net>
> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Thursday, September 11, 2014 1:37 PM
> To: huang ying
> Cc: linux-f2fs-devel@lists.sourceforge.net; LKML; Huang Ying
> Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
>
> On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > Hi Huang,
> > > > > > >
> > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > inode is not checkpointed. The non-inode dnode may be written
> > > before
> > > > > > > > inode. So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > recovery fail.
> > > > > > > >
> > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > nid
> > > > > of
> > > > > > > > inode < nid of non-inode dnode. But it is possible for the
> > > reverse.
> > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > >
> > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > find_fsync_dnodes.
> > > > > > > >
> > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > patch, that is, from big number to small number.
> > > > > > > >
> > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > ---
> > > > > > > > fs/f2fs/recovery.c | 7 ++++---
> > > > > > > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > FI_INC_LINK);
> > > > > > > > - } else {
> > > > > > > > - if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > >
> > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > data blocks.
> > > > > >
> > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > find_fsync_dnodes checking dnodes? Per my understanding, any
> > > changes to
> > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > that
> > > > > > we will write inode block. Is it right? If so, the solution in this
> > > > > > patch should work too.
> > > > >
> > > > > Your description says that f2fs_iget will fail, which causes the
> > > recovery
> > > > > fail.
> > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > directly.
> > > > >
> > > >
> > > > Yes. That is another way to fix the issue.
> > > >
> > > >
> > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > For exmaple,
> > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > 2. f2fs_sync_file writes its dnode.
> > > > >
> > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > the
> > > > > inode
> > > > > has not fsync_mark.
> > > > >
> > > >
> > > > I think your solution is better here, but does not fix all scenarios. If
> > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > information may be not up to date. But if the inode is not checkpointed,
> > > > f2fs_iget will fail too and recover will fail.
> > >
> > > Ok, let me consider your scenarios.
> > >
> > > Term: F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Lose the latest inode(x). Need to fix.
> > >
> > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> > > 3. CP | inode(x) | dnode(F)
> > > -> Need to write inode(DF) in f2fs_sync_file.
> > >
> > > 4. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next.
> > >
> > > 5. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > scenario.
> > > Drop this dnode(F).
> > >
> > > Indeed, there were some missing scenarios.
> > >
> > > So, how about this patch?
> > >
> > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > >
> > > We can summarize the roll forward recovery scenarios as follows.
> > >
> > > [Term] F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > >
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > >
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> >
> > I think this is possible. If f2fs_sync_file runs concurrently with
> > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
>
> If the inode(x) was written, f2fs_sync_file will do write the inode again with
> fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
>
> In f2fs_sync_file,
> ...
> while (!sync_node_pages(sbi, ino, &wbc)) {
> if (fsync_mark_done(sbi, ino))
> goto out;
> mark_inode_dirty_sync(inode);
> ret = f2fs_write_inode(inode, NULL);
> if (ret)
> goto out;
> }
> ...
>
> >
> >
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > >
> > > 5. CP | inode(x) | dnode(F)
> > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > >
> >
> > I think this is possible. If inode(x) is written by writeback and then
> > dnode(F) is written by f2fs_sync_file.
>
> -> The inode(DF) was missing. Should drop this dnode(F).
> Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> This is fixed by the below updated patch.
>
> >
> >
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > >
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > > Drop this dnode(F).
> > >
> >
> > I think this is possible. as 3.
>
> ditto.
>
> >
> >
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > > fs/f2fs/file.c | 10 ++++++++
> > > fs/f2fs/recovery.c | 70
> > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > > 2 files changed, 69 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 5cde363..2660af2 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > loff_t end, int datasync)
> > > up_write(&fi->i_sem);
> > > }
> > > } else {
> > > + /*
> > > + * CP | inode(x) | dnode(F)
> > > + * We need to remain inode(DF) for roll-forward recovery.
> > > + */
> > > + if (fsync_mark_done(sbi, inode->i_ino) &&
> > >
> >
> > How can this happen? We have not called sync_node_pages(sbi, inode->i_ino,
> > &wbc). Where is the fsync mark set?
> >
> >
> > > + !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > {
> > >
> >
> > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > checkpoint. Right?
>
> Yes, whole conditions were wrong.
> Please, refer the following patch.
> Thanks,
>
> >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
>
> We can summarize the roll forward recovery scenarios as follows.
>
> [Term] F: fsync_mark, D: dentry_mark
>
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
>
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
>
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Impossible, but recover latest dnode(F).
>
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
>
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
>
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
>
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> But it will fail due to no inode(DF).
>
> So, this patch adds some missing points such as #1, #5, #7, and #8.
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/file.c | 10 ++++++++
> fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 69 insertions(+), 11 deletions(-)
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..e2a2f38 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int
> datasync)
> up_write(&fi->i_sem);
> }
> } else {
> + /*
> + * CP | inode(x) | dnode(F)
> + * We need to remain inode(DF) for roll-forward recovery.
> + */
> + if (!fsync_mark_done(sbi, inode->i_ino) &&
> + !is_checkpointed_node(sbi, inode->i_ino)) {
> + mark_inode_dirty_sync(inode);
> + f2fs_write_inode(inode, NULL);
> + }
This un-checkpointed non-fsynced inode page might be writeback by
kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.
So how about modifying as below:
while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
(!fsync_mark_done(sbi, inode->i_ino) &&
!is_checkpointed_node(sbi, inode->i_ino))) {
to guarantee the inode(DF) is be written out to disk?
Thanks,
Yu
> +
> /* if there is no written node page, write its inode page */
> while (!sync_node_pages(sbi, ino, &wbc)) {
> if (fsync_mark_done(sbi, ino))
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..eeb3785 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
> #include "node.h"
> #include "segment.h"
>
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Impossible, but recover latest dnode(F)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + * But it will fail due to no inode(DF).
> + */
> static struct kmem_cache *fsync_entry_slab;
>
> bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
> return err;
> }
>
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
> {
> - struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> + struct f2fs_inode *raw = F2FS_INODE(page);
> +
> + inode->i_mode = le16_to_cpu(raw->i_mode);
> + i_size_write(inode, le64_to_cpu(raw->i_size));
> + inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> + inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> + inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> + inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> + inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> + inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
>
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
> if (!IS_INODE(node_page))
> return 0;
>
> - inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> - i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> - inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> - inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> - inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> - inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> - inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> - inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> + __recover_inode(inode, node_page);
>
> if (is_dent_dnode(node_page))
> return recover_dentry(node_page, inode);
>
> f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> - ino_of_node(node_page), raw_inode->i_name);
> + ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> return 0;
> }
>
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
> break;
> }
>
> + /*
> + * CP | dnode(F) | inode(DF)
> + * For this case, we should not give up now.
> + */
> entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> if (IS_ERR(entry->inode)) {
> err = PTR_ERR(entry->inode);
> kmem_cache_free(fsync_entry_slab, entry);
> + if (err == -ENOENT)
> + goto next;
> break;
> }
> list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> entry = get_fsync_inode(head, ino_of_node(page));
> if (!entry)
> goto next;
> + /*
> + * inode(x) | CP | inode(x) | dnode(F)
> + * In this case, we can lose the latest inode(x).
> + * So, call __recover_inode for the inode update.
> + */
> + if (IS_INODE(page))
> + __recover_inode(entry->inode, page);
>
> err = do_recover_data(sbi, entry->inode, page, blkaddr);
> if (err)
> --
> 1.8.5.2 (Apple Git-48)
>
>
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
WARNING: multiple messages have this Message-ID (diff)
From: Chao Yu <chao2.yu@samsung.com>
To: "'Jaegeuk Kim'" <jaegeuk@kernel.org>,
"'huang ying'" <huang.ying.caritas@gmail.com>
Cc: linux-f2fs-devel@lists.sourceforge.net,
"'LKML'" <linux-kernel@vger.kernel.org>,
"'Huang Ying'" <ying.huang@intel.com>
Subject: RE: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
Date: Thu, 11 Sep 2014 18:47:02 +0800 [thread overview]
Message-ID: <003601cfcdad$d64a8d00$82dfa700$@samsung.com> (raw)
In-Reply-To: <20140911053716.GA24315@jaegeuk-mac02.hsd1.ca.comcast.net>
> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Thursday, September 11, 2014 1:37 PM
> To: huang ying
> Cc: linux-f2fs-devel@lists.sourceforge.net; LKML; Huang Ying
> Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode
>
> On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> >
> > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > Hi Huang,
> > > > > > >
> > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > inode is not checkpointed. The non-inode dnode may be written
> > > before
> > > > > > > > inode. So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > recovery fail.
> > > > > > > >
> > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > nid
> > > > > of
> > > > > > > > inode < nid of non-inode dnode. But it is possible for the
> > > reverse.
> > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > >
> > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > find_fsync_dnodes.
> > > > > > > >
> > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > patch, that is, from big number to small number.
> > > > > > > >
> > > > > > > > Signed-off-by: Huang, Ying <ying.huang@intel.com>
> > > > > > > > ---
> > > > > > > > fs/f2fs/recovery.c | 7 ++++---
> > > > > > > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > FI_INC_LINK);
> > > > > > > > - } else {
> > > > > > > > - if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > >
> > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > data blocks.
> > > > > >
> > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > find_fsync_dnodes checking dnodes? Per my understanding, any
> > > changes to
> > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > that
> > > > > > we will write inode block. Is it right? If so, the solution in this
> > > > > > patch should work too.
> > > > >
> > > > > Your description says that f2fs_iget will fail, which causes the
> > > recovery
> > > > > fail.
> > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > directly.
> > > > >
> > > >
> > > > Yes. That is another way to fix the issue.
> > > >
> > > >
> > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > For exmaple,
> > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > 2. f2fs_sync_file writes its dnode.
> > > > >
> > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > the
> > > > > inode
> > > > > has not fsync_mark.
> > > > >
> > > >
> > > > I think your solution is better here, but does not fix all scenarios. If
> > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > information may be not up to date. But if the inode is not checkpointed,
> > > > f2fs_iget will fail too and recover will fail.
> > >
> > > Ok, let me consider your scenarios.
> > >
> > > Term: F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Lose the latest inode(x). Need to fix.
> > >
> > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> > > 3. CP | inode(x) | dnode(F)
> > > -> Need to write inode(DF) in f2fs_sync_file.
> > >
> > > 4. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next.
> > >
> > > 5. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > scenario.
> > > Drop this dnode(F).
> > >
> > > Indeed, there were some missing scenarios.
> > >
> > > So, how about this patch?
> > >
> > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > >
> > > We can summarize the roll forward recovery scenarios as follows.
> > >
> > > [Term] F: fsync_mark, D: dentry_mark
> > >
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > >
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > >
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Impossible, but recover latest dnode(F)
> > >
> >
> > I think this is possible. If f2fs_sync_file runs concurrently with
> > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
>
> If the inode(x) was written, f2fs_sync_file will do write the inode again with
> fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
>
> In f2fs_sync_file,
> ...
> while (!sync_node_pages(sbi, ino, &wbc)) {
> if (fsync_mark_done(sbi, ino))
> goto out;
> mark_inode_dirty_sync(inode);
> ret = f2fs_write_inode(inode, NULL);
> if (ret)
> goto out;
> }
> ...
>
> >
> >
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > >
> > > 5. CP | inode(x) | dnode(F)
> > > -> Impossible. Write inode(DF) with dnode(F) by f2fs_sync_file.
> > >
> >
> > I think this is possible. If inode(x) is written by writeback and then
> > dnode(F) is written by f2fs_sync_file.
>
> -> The inode(DF) was missing. Should drop this dnode(F).
> Again, CP | inode(x) | dnode(F) | inode(DF) should be shown.
> This is fixed by the below updated patch.
>
> >
> >
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > >
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next. But, this is an impossible scenario.
> > > Drop this dnode(F).
> > >
> >
> > I think this is possible. as 3.
>
> ditto.
>
> >
> >
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > > fs/f2fs/file.c | 10 ++++++++
> > > fs/f2fs/recovery.c | 70
> > > +++++++++++++++++++++++++++++++++++++++++++++---------
> > > 2 files changed, 69 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 5cde363..2660af2 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -207,6 +207,16 @@ int f2fs_sync_file(struct file *file, loff_t start,
> > > loff_t end, int datasync)
> > > up_write(&fi->i_sem);
> > > }
> > > } else {
> > > + /*
> > > + * CP | inode(x) | dnode(F)
> > > + * We need to remain inode(DF) for roll-forward recovery.
> > > + */
> > > + if (fsync_mark_done(sbi, inode->i_ino) &&
> > >
> >
> > How can this happen? We have not called sync_node_pages(sbi, inode->i_ino,
> > &wbc). Where is the fsync mark set?
> >
> >
> > > + !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
> > > {
> > >
> >
> > if !is_checkpointed_node(sbi, F2FS_I(inode)->i_pino), we need to do
> > checkpoint. Right?
>
> Yes, whole conditions were wrong.
> Please, refer the following patch.
> Thanks,
>
> >From 2e70059dd680830c030f59813d4e9837cb65ecb1 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Wed, 10 Sep 2014 00:16:34 -0700
> Subject: [PATCH 1/2] f2fs: fix roll-forward missing scenarios
>
> We can summarize the roll forward recovery scenarios as follows.
>
> [Term] F: fsync_mark, D: dentry_mark
>
> 1. inode(x) | CP | inode(x) | dnode(F)
> -> Update the latest inode(x).
>
> 2. inode(x) | CP | inode(F) | dnode(F)
> -> No problem.
>
> 3. inode(x) | CP | dnode(F) | inode(x)
> -> Impossible, but recover latest dnode(F).
>
> 4. inode(x) | CP | dnode(F) | inode(F)
> -> No problem.
>
> 5. CP | inode(x) | dnode(F)
> -> The inode(DF) was missing. Should drop this dnode(F).
>
> 6. CP | inode(DF) | dnode(F)
> -> No problem.
>
> 7. CP | dnode(F) | inode(DF)
> -> If f2fs_iget fails, then goto next to find inode(DF).
>
> 8. CP | dnode(F) | inode(x)
> -> If f2fs_iget fails, then goto next to find inode(DF).
> But it will fail due to no inode(DF).
>
> So, this patch adds some missing points such as #1, #5, #7, and #8.
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> fs/f2fs/file.c | 10 ++++++++
> fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 69 insertions(+), 11 deletions(-)
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index e7681c3..e2a2f38 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -206,6 +206,16 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int
> datasync)
> up_write(&fi->i_sem);
> }
> } else {
> + /*
> + * CP | inode(x) | dnode(F)
> + * We need to remain inode(DF) for roll-forward recovery.
> + */
> + if (!fsync_mark_done(sbi, inode->i_ino) &&
> + !is_checkpointed_node(sbi, inode->i_ino)) {
> + mark_inode_dirty_sync(inode);
> + f2fs_write_inode(inode, NULL);
> + }
This un-checkpointed non-fsynced inode page might be writeback by
kswapd/bdi-flusher here. Then CP | inode(x) | dnode(F) occur.
So how about modifying as below:
while (!sync_node_pages(sbi, inode->i_ino, &wbc) ||
(!fsync_mark_done(sbi, inode->i_ino) &&
!is_checkpointed_node(sbi, inode->i_ino))) {
to guarantee the inode(DF) is be written out to disk?
Thanks,
Yu
> +
> /* if there is no written node page, write its inode page */
> while (!sync_node_pages(sbi, ino, &wbc)) {
> if (fsync_mark_done(sbi, ino))
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 6c5a74a..eeb3785 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -14,6 +14,36 @@
> #include "node.h"
> #include "segment.h"
>
> +/*
> + * Roll forward recovery scenarios.
> + *
> + * [Term] F: fsync_mark, D: dentry_mark
> + *
> + * 1. inode(x) | CP | inode(x) | dnode(F)
> + * -> Update the latest inode(x).
> + *
> + * 2. inode(x) | CP | inode(F) | dnode(F)
> + * -> No problem.
> + *
> + * 3. inode(x) | CP | dnode(F) | inode(x)
> + * -> Impossible, but recover latest dnode(F)
> + *
> + * 4. inode(x) | CP | dnode(F) | inode(F)
> + * -> No problem.
> + *
> + * 5. CP | inode(x) | dnode(F)
> + * -> The inode(DF) was missing. Should drop this dnode(F).
> + *
> + * 6. CP | inode(DF) | dnode(F)
> + * -> No problem.
> + *
> + * 7. CP | dnode(F) | inode(DF)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + *
> + * 8. CP | dnode(F) | inode(x)
> + * -> If f2fs_iget fails, then goto next to find inode(DF).
> + * But it will fail due to no inode(DF).
> + */
> static struct kmem_cache *fsync_entry_slab;
>
> bool space_for_roll_forward(struct f2fs_sb_info *sbi)
> @@ -110,27 +140,32 @@ out:
> return err;
> }
>
> -static int recover_inode(struct inode *inode, struct page *node_page)
> +static void __recover_inode(struct inode *inode, struct page *page)
> {
> - struct f2fs_inode *raw_inode = F2FS_INODE(node_page);
> + struct f2fs_inode *raw = F2FS_INODE(page);
> +
> + inode->i_mode = le16_to_cpu(raw->i_mode);
> + i_size_write(inode, le64_to_cpu(raw->i_size));
> + inode->i_atime.tv_sec = le64_to_cpu(raw->i_mtime);
> + inode->i_ctime.tv_sec = le64_to_cpu(raw->i_ctime);
> + inode->i_mtime.tv_sec = le64_to_cpu(raw->i_mtime);
> + inode->i_atime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> + inode->i_ctime.tv_nsec = le32_to_cpu(raw->i_ctime_nsec);
> + inode->i_mtime.tv_nsec = le32_to_cpu(raw->i_mtime_nsec);
> +}
>
> +static int recover_inode(struct inode *inode, struct page *node_page)
> +{
> if (!IS_INODE(node_page))
> return 0;
>
> - inode->i_mode = le16_to_cpu(raw_inode->i_mode);
> - i_size_write(inode, le64_to_cpu(raw_inode->i_size));
> - inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> - inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
> - inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
> - inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> - inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
> - inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
> + __recover_inode(inode, node_page);
>
> if (is_dent_dnode(node_page))
> return recover_dentry(node_page, inode);
>
> f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
> - ino_of_node(node_page), raw_inode->i_name);
> + ino_of_node(node_page), F2FS_INODE(node_page)->i_name);
> return 0;
> }
>
> @@ -186,10 +221,16 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head
> *head)
> break;
> }
>
> + /*
> + * CP | dnode(F) | inode(DF)
> + * For this case, we should not give up now.
> + */
> entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
> if (IS_ERR(entry->inode)) {
> err = PTR_ERR(entry->inode);
> kmem_cache_free(fsync_entry_slab, entry);
> + if (err == -ENOENT)
> + goto next;
> break;
> }
> list_add_tail(&entry->list, head);
> @@ -416,6 +457,13 @@ static int recover_data(struct f2fs_sb_info *sbi,
> entry = get_fsync_inode(head, ino_of_node(page));
> if (!entry)
> goto next;
> + /*
> + * inode(x) | CP | inode(x) | dnode(F)
> + * In this case, we can lose the latest inode(x).
> + * So, call __recover_inode for the inode update.
> + */
> + if (IS_INODE(page))
> + __recover_inode(entry->inode, page);
>
> err = do_recover_data(sbi, entry->inode, page, blkaddr);
> if (err)
> --
> 1.8.5.2 (Apple Git-48)
>
>
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
next prev parent reply other threads:[~2014-09-11 10:48 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-08 11:38 [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode Huang Ying
2014-09-08 11:38 ` Huang Ying
2014-09-09 5:23 ` Jaegeuk Kim
2014-09-09 5:23 ` Jaegeuk Kim
2014-09-09 5:39 ` Huang Ying
2014-09-09 7:09 ` Jaegeuk Kim
2014-09-09 7:09 ` Jaegeuk Kim
[not found] ` <CAC=cRTMA9AqmHjQqK3=5pAs8yqi25Rzmz8MaZ8=oTDaxHAXU+A@mail.gmail.com>
2014-09-10 7:21 ` Jaegeuk Kim
2014-09-10 7:21 ` Jaegeuk Kim
[not found] ` <CAC=cRTMGJfD_SZ=bE8pi5U6n5W1MF17emLC3VZDbpLSQtbNcKg@mail.gmail.com>
2014-09-11 5:37 ` Jaegeuk Kim
2014-09-11 5:37 ` Jaegeuk Kim
2014-09-11 10:47 ` Chao Yu [this message]
2014-09-11 10:47 ` [f2fs-dev] " Chao Yu
2014-09-11 12:31 ` Huang Ying
2014-09-11 12:31 ` [f2fs-dev] " Huang Ying
2014-09-11 12:25 ` Huang Ying
2014-09-11 12:25 ` Huang Ying
2014-09-12 5:13 ` Jaegeuk Kim
2014-09-12 7:34 ` Huang Ying
2014-09-12 7:34 ` Huang Ying
2014-09-13 14:23 ` Huang Ying
2014-09-13 14:23 ` Huang Ying
2014-09-14 7:48 ` Jaegeuk Kim
2014-09-14 7:48 ` Jaegeuk Kim
2014-09-15 5:14 ` Huang Ying
2014-09-15 5:14 ` Huang Ying
2014-09-18 5:47 ` Jaegeuk Kim
2014-09-14 7:38 ` Jaegeuk Kim
2014-09-14 7:38 ` Jaegeuk Kim
2014-09-15 1:32 ` Huang Ying
2014-09-15 1:32 ` Huang Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='003601cfcdad$d64a8d00$82dfa700$@samsung.com' \
--to=chao2.yu@samsung.com \
--cc=huang.ying.caritas@gmail.com \
--cc=jaegeuk@kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.