All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
To: Ryusuke Konishi
	<konishi.ryusuke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: linux-nilfs <linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH] nilfs2: Fix race condition that causes file system corruption
Date: Sun, 29 Oct 2017 10:46:35 +0100	[thread overview]
Message-ID: <20171029104629.7ce6cebb@terok> (raw)
In-Reply-To: <CAKFNMomWEWyPZ4jSvJLk9AdzuDxn1T4-gxsOpbgRoZ-=6jH1Tg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Ryusuke,

I ran my tests again and I can reproduce the issue within half an hour:

[ 9823.056522] WARNING: CPU: 5 PID: 18461 at fs/nilfs2//dat.c:160
nilfs_dat_prepare_end+0xd3/0xe0 [nilfs2]

...

[ 9823.056570] Call Trace:
[ 9823.056616]  nilfs_dat_prepare_update+0x20/0x60 [nilfs2]
[ 9823.056620]  nilfs_direct_propagate+0x71/0xd0 [nilfs2]
[ 9823.056623]  nilfs_bmap_propagate+0x2d/0x70 [nilfs2]
[ 9823.056626]  nilfs_collect_file_data+0x23/0x50 [nilfs2]
[ 9823.056629]  ? nilfs_collect_dat_data+0x50/0x50 [nilfs2]
[ 9823.056636]  nilfs_segctor_apply_buffers+0x7b/0xf0 [nilfs2]
[ 9823.056640]  nilfs_segctor_scan_file+0x135/0x190 [nilfs2]
[ 9823.056643]  nilfs_segctor_do_construct+0xb7a/0x2100 [nilfs2]
[ 9823.056647]  nilfs_segctor_construct+0x1c9/0x300 [nilfs2]
[ 9823.056650]  ? nilfs_segctor_construct+0x1c9/0x300 [nilfs2]
[ 9823.056653]  nilfs_segctor_thread+0x10a/0x390 [nilfs2]
[ 9823.056659]  kthread+0x109/0x140
[ 9823.056662]  ? nilfs_segctor_construct+0x300/0x300 [nilfs2]
[ 9823.056663]  ? kthread_create_on_node+0x70/0x70
[ 9823.056665]  ret_from_fork+0x25/0x30

The result is a broken file system, and the kernel log is spammed with
the above warnings and stack traces.

The file system usage is at 71%:

$ LANG=C df /dev/sdb1 
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1      104849404 70205436  29401088  71% /mnt

The patch for the race condition definitely resolves the problem. With
it, I can run the bugsimulator for hours without any problems.

However, I used quite an old kernel 4.12.0. I will update to the
latest version and test again.

Best regards,

Andreas

Am Sun, 29 Oct 2017 14:09:40 +0900
schrieb Ryusuke Konishi <konishi.ryusuke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:

> Unfortunately the patch didn't fix this issue:
> 
> [  612.614615] NILFS (sda1): segctord starting. Construction interval
> = 5 seconds, CP frequency < 30 seconds
> [ 4298.841376] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting
> data buffer: ino=1155, cno=11, offset=1, blocknr=1216036,
> vblocknr=1198436
> [ 4298.841877] NILFS (sda1): error -17 preparing GC: cannot read
> source blocks
> 
> Look like the bugsimulator hit other bug.
> 
> When this happened, the disk usage was 100%:
> 
> $ df /dev/sda1
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/sda1      104849404 99655676         0 100% /test
> 
> And, the error killed cleanerd.   However, I could cleanly unmount the
> partition, and
> I didn't see any errors after remount the partition.
> 
> As you commented, I have changed some parameters
> in /etc/nilfs_cleanerd.conf:
> ------------------------------------------- mc_nsegments_per_clean  16
> nsegments_per_clean     16
> max_clean_segments              30%
> cleaning_interval       0.1
> mc_cleaning_interval    0.1
> -------------------------------------------
> 
> I will look into this issue with bugsimulator.
> 
> Anyone can reproduce the original corruption issue in other way?
> 
> Regards,
> Ryusuke Konishi
> 
> 
> 2017-10-29 2:16 GMT+09:00 Ryusuke Konishi <konishi.ryusuke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> > Hi,
> >
> > Sorry for my late reponse.
> > I could reproduce a corruption issue with the bug simulator.
> >
> > -----
> > [88805.039545] NILFS (sda1): segctord starting. Construction
> > interval = 5 second s, CP frequency < 30 seconds
> > [92817.941644] NILFS (sda1): nilfs_ioctl_move_inode_block:
> > conflicting data buff er: ino=10846, cno=92, offset=2,
> > blocknr=11425709, vblocknr=11227714 [92817.941936] NILFS (sda1):
> > error -17 preparing GC: cannot read source blocks -----
> >
> > Now I'm testing the patch.   I will send it to upstream after it
> > turned out to suppress the issue.
> >
> > The patch itself looks good to me though I guess there may be
> > similar bugs. What makes things complicated seems that inode block
> > is not marked as dirty just when "propagate" function carries a
> > dirty state from a dirty node/data block.
> >
> > Anyway, I hasten to apply this patch.
> >
> > Thanks,
> > Ryusuke Konishi
> >
> >
> > 2017-07-13 15:44 GMT+09:00 Andreas Rohner
> > <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>:  
> >> Hi,
> >>
> >> Am Wed, 12 Jul 2017 09:08:57 +0900
> >> schrieb Ryusuke Konishi <konishi.ryusuke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >>  
> >>> Hi,
> >>>
> >>> 2017-07-11 15:19 GMT+09:00 Andreas Rohner
> >>> <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>:  
> >>> >
> >>> > There is a race condition between the function
> >>> > nilfs_dirty_inode() and nilfs_set_file_dirty().
> >>> >
> >>> > When a file is opened, nilfs_dirty_inode() is called to update
> >>> > the access timestamp in the inode. It calls
> >>> > __nilfs_mark_inode_dirty() in a separate transaction.
> >>> > __nilfs_mark_inode_dirty() caches the ifile buffer_head in the
> >>> > i_bh field of the inode info structure and marks it as dirty.
> >>> >
> >>> > After some data was written to the file in another transaction,
> >>> > the function nilfs_set_file_dirty() is called, which adds the
> >>> > inode to the ns_dirty_files list.
> >>> >
> >>> > Then the segment construction calls
> >>> > nilfs_segctor_collect_dirty_files(), which goes through the
> >>> > ns_dirty_files list and checks the i_bh field. If there is a
> >>> > cached buffer_head in i_bh it is not marked as dirty again.
> >>> >
> >>> > Since nilfs_dirty_inode() and nilfs_set_file_dirty() use
> >>> > separate transactions, it is possible that a segment
> >>> > construction that writes out the ifile occurs in-between the
> >>> > two. If this happens the inode is not on the ns_dirty_files
> >>> > list, but its ifile block is still marked as dirty and written
> >>> > out.
> >>> >
> >>> > In the next segment construction, the data for the file is
> >>> > written out and nilfs_bmap_propagate() updates the b-tree.
> >>> > Eventually the bmap root is written into the i_bh block, which
> >>> > is not dirty, because it was written out in another segment
> >>> > construction.
> >>> >
> >>> > As a result the bmap update can be lost, which leads to file
> >>> > system corruption. Either the virtual block address points to an
> >>> > unallocated DAT block, or the DAT entry will be reused for
> >>> > something different.
> >>> >
> >>> > The error can remain undetected for a long time. A typical error
> >>> > message would be one of the "bad btree" errors or a warning
> >>> > that a DAT entry could not be found.
> >>> >
> >>> > This bug can be reproduced reliably by a simple benchmark that
> >>> > creates and overwrites millions of 4k files.
> >>> >
> >>> > Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>  
> >>>
> >>> Will review, please wait for a while.  
> >>
> >> Thank you, I have uploaded my benchmark on GitHub:
> >>
> >> https://github.com/zeitgeist87/bugsimulator
> >>
> >> The benchmark is very simple. It writes out 13107200 4k files in a
> >> two layer directory tree, so that every directory contains about
> >> 1000 files. Then it overwrites these files in an infinite loop.
> >>
> >> You need quite aggressive cleaner settings for it to keep up with
> >> the bugsimulator:
> >>
> >> mc_nsegments_per_clean  16
> >> nsegments_per_clean     16
> >> max_clean_segments              30%
> >> cleaning_interval       0.1
> >> mc_cleaning_interval    0.1
> >>
> >> If you run this benchmark on a fresh 100GB NILFS2 partition with
> >> the above cleaner settings, the file system will be corrupt in
> >> about half an hour. At least on my machine.
> >>
> >> With the patch for the race condition, it can run indefinetly.
> >>
> >> Regards,
> >> Andreas
> >>  
> >>> Regards,
> >>> Ryusuke Konishi
> >>>  
> >>> >
> >>> > ---
> >>> >  fs/nilfs2/segment.c | 6 ++++--
> >>> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >>> >
> >>> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >>> > index 70ded52dc1dd..50e12956c737 100644
> >>> > --- a/fs/nilfs2/segment.c
> >>> > +++ b/fs/nilfs2/segment.c
> >>> > @@ -1958,8 +1958,6 @@ static int
> >>> > nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
> >>> > err, ii->vfs_inode.i_ino); return err;
> >>> >                         }
> >>> > -                       mark_buffer_dirty(ibh);
> >>> > -                       nilfs_mdt_mark_dirty(ifile);
> >>> >                         spin_lock(&nilfs->ns_inode_lock);
> >>> >                         if (likely(!ii->i_bh))
> >>> >                                 ii->i_bh = ibh;
> >>> > @@ -1968,6 +1966,10 @@ static int
> >>> > nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
> >>> > goto retry; }
> >>> >
> >>> > +               // Always redirty the buffer to avoid race
> >>> > condition
> >>> > +               mark_buffer_dirty(ii->i_bh);
> >>> > +               nilfs_mdt_mark_dirty(ifile);
> >>> > +
> >>> >                 clear_bit(NILFS_I_QUEUED, &ii->i_state);
> >>> >                 set_bit(NILFS_I_BUSY, &ii->i_state);
> >>> >                 list_move_tail(&ii->i_dirty,
> >>> > &sci->sc_dirty_files); --
> >>> > 2.13.2
> >>> >
> >>> > --
> >>> > To unsubscribe from this list: send the line "unsubscribe
> >>> > linux-nilfs" in the body of a message to
> >>> > majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> >>> > http://vger.kernel.org/majordomo-info.html  
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe
> >>> linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>> More majordomo info at
> >>> http://vger.kernel.org/majordomo-info.html  
> >>  
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2017-10-29  9:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11  6:19 [PATCH] nilfs2: Fix race condition that causes file system corruption Andreas Rohner
     [not found] ` <20170711061907.19259-1-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2017-07-12  0:08   ` Ryusuke Konishi
     [not found]     ` <CAKFNMonzhYBKkjotNiNY0x0DnWxzHYF8WvO7f7xjGwr-8a=aRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-07-13  6:44       ` Andreas Rohner
2017-10-28 17:16         ` Ryusuke Konishi
     [not found]           ` <CAKFNMomD0qE3Q-bSfH=6-A82LaEVh2fK8Asp6emgYMa9BstQKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-29  5:09             ` Ryusuke Konishi
     [not found]               ` <CAKFNMomWEWyPZ4jSvJLk9AdzuDxn1T4-gxsOpbgRoZ-=6jH1Tg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-29  5:24                 ` Ryusuke Konishi
2017-10-29  8:07                 ` Andreas Rohner
2017-10-29 12:53                   ` Ryusuke Konishi
     [not found]                     ` <CAKFNMo=6uOSA-bdkxN6dYnPo-7Svkm2vMXbFLzcP_eEp9ra_jg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-29 14:06                       ` Andreas Rohner
2017-10-29 16:29                         ` Ryusuke Konishi
     [not found]                           ` <CAKFNMonWJtOHxaHQO8SNPGOUx6H=ghp7EQ+L1s3nbjMZt-Zr_Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-29 22:19                             ` Ryusuke Konishi
     [not found]                               ` <CAKFNMo=NnxYgxXhBASBY7QVO9phf=p3RgXCpCb6qWR9N99YedA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-30 11:25                                 ` Ryusuke Konishi
2017-10-29  9:46                 ` Andreas Rohner [this message]
2017-10-29 10:23                   ` Clemens Eisserer
2017-10-24  3:18       ` Clemens Eisserer
     [not found]         ` <CAFvQSYR6vXuyGzGjeogKdg1Xzg=4QDE4TM7r4G-UnZ7hLniDkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-24 17:06           ` Viacheslav Dubeyko
     [not found]             ` <1508864769.501.5.camel-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2017-10-24 19:29               ` Clemens Eisserer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171029104629.7ce6cebb@terok \
    --to=andreas.rohner-hi6y0cq0ng0@public.gmane.org \
    --cc=konishi.ryusuke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.