Re: f2fs Crash Consistency Problem

From: Raouf Rokhjavan <rokhjavan.r@gmail.com>
To: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Subject: Re: f2fs Crash Consistency Problem
Date: Wed, 17 May 2017 22:13:34 +0430
Message-ID: <a5d7e55f-c246-8c5d-c32c-de299f5d3e03@gmail.com>
In-Reply-To: <20170512001431.GA9575@jaegeuk.local>

On 05/12/17 04:44, Jaegeuk Kim wrote:
> Hi,
>
> On 05/10, Raouf Rokhjavan wrote:
> ...
>
>> As you suggested using a snapshot mechanism to keep the ckpt number from
>> changing after each mount, I reran the generic tests of the xfstests
>> framework on top of the log-writes target with the f2fs file system. To
>> automate reporting of inconsistencies, I added a parameter to fsck.f2fs
>> that makes it return -1 when the c.bug_on condition is met. To evaluate how
>> f2fs behaves with respect to crash consistency, I replay each log and check
>> the consistency of f2fs with my own modified version of fsck.f2fs.
>> Accordingly, all tests passed smoothly except these:
>>
>> [FAIL] Running generic/013 failed. (consistency_single)
> Could you check whether any IO made by mkfs was added in the replay log?
> If so, fsck.f2fs should be failed when replaying them.
>
>> [FAIL] Running generic/070 failed. (consistency_single)
>> [FAIL] Running generic/113 failed. (consistency_single)
> I added a mark to replay in the beginning of generic/113, and ran the test.
> But, I couldn't find any error given test_dev as a log_dev. (I tested this
> in the latest f2fs/dev-test branch.)
>
>> [FAIL] Running generic/241 failed. (consistency_single)
>>
>> In other words, in these tests, c.bug_on() was true. Would you please
>> describe why they become inconsistent?
>>
>> Besides, I ran sysbench for database benchmark with 1 thread, 1000 records,
>> and 100 transactions on top of log-writes target with f2fs. Interestingly, I
>> encountered a weird inconsistency. After replaying about 100 logs, fsck.f2fs
>> complains about inconsistency with the following messages:
> Can you share the parameter for sysbench?
Hi,

Since I want to make sure that my system, which runs a database 
application, stays operational after a power failure, I am testing the 
database on top of f2fs. I use sysbench and dm-log-writes for this 
purpose. I took advantage of the Lua scripting facility in sysbench to 
implement a write-only database workload:

# sysbench \
    --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua \
    --db-driver=mysql --oltp-table-size=1000 --mysql-db=sysbench \
    --mysql-user=sysbench --mysql-password=password --max-requests=100 \
    --num-threads=1 \
    --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock run

I ran this test on 3 configurations:
1- ext4 (ordered, noatime) - success 15/15
2- ext4 (norecovery, noatime) - success 0/15
3- f2fs (noatime) - success 3/15

Success here means that the file system is operational after a replay 
without running fsck to repair it.
As the results show, ext4 with ordered journaling passed every run, but, 
as expected, ext4 without journaling, like ext2, needs fsck to recover 
the file system after a simulated power loss.
The surprising part of this test is f2fs. Since f2fs always maintains a 
stable checkpoint of the file system and, according to its FAST paper, 
always rolls back to that checkpoint after a power loss, I didn't expect 
fsck.f2fs to report f2fs in an inconsistent state after replaying the 
logs. (I should mention that we check the consistency of f2fs after 
mkfs.f2fs; the ext4 results support this approach.)
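
The success criterion above can be sketched as a small tally over replay 
points. This is only an illustration, not the harness actually used: the 
names fsck_exit_code, tally and the check callback are hypothetical, the 
replay step itself is left to the caller, and it assumes the checker 
exits with status 0 only when it finds no inconsistency (as with the 
modified fsck.f2fs that returns -1 when c.bug_on is set).

```python
import subprocess
from typing import Callable, Sequence, Tuple


def fsck_exit_code(device: str, fsck_cmd: Sequence[str] = ("fsck.f2fs",)) -> int:
    """Run the checker on a device and return its exit status.

    Assumption: a non-zero status means an inconsistency was reported.
    """
    completed = subprocess.run(
        [*fsck_cmd, device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return completed.returncode


def tally(replay_points: Sequence[int], check: Callable[[int], int]) -> Tuple[int, int]:
    """Count replay points whose check returned 0.

    `check` is a hypothetical callback that replays the log up to the
    given point and returns the checker's exit status for that state.
    """
    passed = sum(1 for point in replay_points if check(point) == 0)
    return passed, len(replay_points)
```

A caller would pass a check callback that first replays the log to the 
given point and then calls fsck_exit_code on the replay device; a result 
of (3, 15) would correspond to the "3/15" figure reported for f2fs.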

Unfortunately, the results are not reproducible: the inconsistency 
occurs at different logs from run to run, and fsck.f2fs occasionally 
passes the test.
To give more accurate information, I uploaded the output of fsck.f2fs to 
Google Drive:

https://drive.google.com/open?id=0BxdqCs3G6wd3UWtDTmRGbFBiYmc
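
Because the failing logs differ between runs, it may help to compare the 
"[FSCK]" summary lines of two fsck.f2fs outputs mechanically. The sketch 
below is an assumption-laden illustration: the line format is taken only 
from the output quoted in this message, and parse_fsck_summary and 
differing_checks are hypothetical helper names, not part of f2fs-tools.

```python
import re
from typing import Dict, List

# Matches lines such as:
#   [FSCK] Unreachable nat entries      [Fail] [0x2]
#   [FSCK] SIT valid block bitmap checking      [Ok..]
_FSCK_LINE = re.compile(
    r"^\[FSCK\]\s+(?P<check>.+?)\s+\[(?P<status>Ok\.\.|Fail)\]"
    r"(?:\s+\[(?P<value>0x[0-9a-fA-F]+)\])?\s*$"
)


def parse_fsck_summary(text: str) -> Dict[str, str]:
    """Map each summary check name to its status ("Ok.." or "Fail")."""
    results: Dict[str, str] = {}
    for line in text.splitlines():
        # Drop email quote markers and surrounding whitespace, if any.
        line = line.lstrip("> ").rstrip()
        match = _FSCK_LINE.match(line)
        if match:
            results[match.group("check")] = match.group("status")
    return results


def differing_checks(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Return the checks present in both runs whose status differs."""
    return sorted(k for k in first.keys() & second.keys() if first[k] != second[k])
```

Lines without a status field, such as "[FSCK] fixing SIT types", are 
skipped by the pattern rather than treated as failures.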

Regards,
>
> Thanks,
>
>> Info: Segments per section = 1
>> Info: Sections per zone = 1
>> Info: sector size = 512
>> Info: total sectors = 2097152 (1024 MB)
>> Info: MKFS version
>>    "Linux version 4.9.8 (roraoof@desktopr.example.com) (gcc version 4.8.5
>> 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
>> Info: FSCK version
>>    from "Linux version 4.9.8 (roraoof@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>>      to "Linux version 4.9.8 (roraoof@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>> Info: superblock features = 0 :
>> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
>> Info: total FS sectors = 2097152 (1024 MB)
>> Info: CKPT version = 2b59c128
>> Info: checkpoint state = 44 :  compacted_summary sudden-power-off
>> [ASSERT] (sanity_check_nid: 388)  --> nid[0x6] nat_entry->ino[0x6]
>> footer.ino[0x0]
>>
>> NID[0x6] is unreachable
>> NID[0x7] is unreachable
>> [FSCK] Unreachable nat entries                        [Fail] [0x2]
>> [FSCK] SIT valid block bitmap checking                [Fail]
>> [FSCK] Hard link checking for regular file            [Ok..] [0x0]
>> [FSCK] valid_block_count matching with CP             [Fail] [0x6dc9]
>> [FSCK] valid_node_count matcing with CP (de lookup)   [Fail] [0xe3]
>> [FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xe5]
>> [FSCK] valid_inode_count matched with CP              [Fail] [0x63]
>> [FSCK] free segment_count matched with CP             [Ok..] [0x1c6]
>> [FSCK] next block offset is free                      [Ok..]
>> [FSCK] fixing SIT types
>> [FSCK] other corrupted bugs                           [Fail]
>>
>> After canceling the test by using Ctrl-C without answering any YES/NO
>> questions, on another terminal I run fsck.f2fs again, but the output is
>> completely different:
>> [root@localhost CrashConsistencyTest]# ./locals/usr/local/sbin/fsck.f2fs
>> /dev/sdc
>> Info: [/dev/sdc] Disk Model: VMware Virtual S1.0
>> Info: Segments per section = 1
>> Info: Sections per zone = 1
>> Info: sector size = 512
>> Info: total sectors = 2097152 (1024 MB)
>> Info: MKFS version
>>    "Linux version 4.9.8 (roraoof@desktopr.example.com) (gcc version 4.8.5
>> 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
>> Info: FSCK version
>>    from "Linux version 4.9.8 (roraoof@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>>      to "Linux version 4.9.8 (roraoof@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>> Info: superblock features = 0 :
>> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
>> Info: total FS sectors = 2097152 (1024 MB)
>> Info: CKPT version = 2b59c128
>> Info: checkpoint state = 44 :  compacted_summary sudden-power-off
>>
>> [FSCK] Unreachable nat entries                        [Ok..] [0x0]
>> [FSCK] SIT valid block bitmap checking                [Ok..]
>> [FSCK] Hard link checking for regular file            [Ok..] [0x0]
>> [FSCK] valid_block_count matching with CP             [Ok..] [0x6dcf]
>> [FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0xe5]
>> [FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xe5]
>> [FSCK] valid_inode_count matched with CP              [Ok..] [0x64]
>> [FSCK] free segment_count matched with CP             [Ok..] [0x1c6]
>> [FSCK] next block offset is free                      [Ok..]
>> [FSCK] fixing SIT types
>> [FSCK] other corrupted bugs                           [Ok..]
>>
>> This situation raises a couple of questions:
>> 1. How  does an inconsistent file system turn into a consistent one in this
>> case?
>> 2. Why does an inconsistency occur in different log numbers; in other words,
>> why is it unpredictable?  Does ordering of logs have to do with disk
>> controller and I/O scheduler?
>>
>> I do appreciate your help.
>> Regards

