Re: [PATCH 5/5] update i_disksize coherently with block allocation


From: Dmitry Monakhov <dmonakhov@openvz.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, lczerner@redhat.com
Subject: Re: [PATCH 5/5] update i_disksize coherently with block allocation
Date: Sun, 14 Sep 2014 16:38:45 +0400
Message-ID: <8761gqpeay.fsf@openvz.org>
In-Reply-To: <87ha119f03.fsf@openvz.org>

On Mon, 25 Aug 2014 11:59:08 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> On Sat, 23 Aug 2014 18:00:29 -0400, "Theodore Ts'o" <tytso@mit.edu> wrote:
> > On Fri, Aug 22, 2014 at 03:32:27PM +0400, Dmitry Monakhov wrote:
> > > The writeback call trace looks as follows:
> > > ext4_writepages
> > >  while(nr_pages)
> > >  ->journal_start
> > >  ->mpage_map_and_submit_extent -> may alloc some blocks
> > >    ->mpage_map_one_extent
> > >  ->journal_stop
> > > In the delalloc case i_disksize may be less than i_size, so we have to
> > > update i_disksize each time we allocate and submit blocks beyond
> > > i_disksize. And we MUST update it in the same transaction, otherwise this
> > > results in fs inconsistency in case of a subsequent power failure.
> > > 
> > > Another possible way to fix this issue is to insert the inode into the
> > > orphan list on entry to ext4_writepages.
> > > 
> > > testcase: xfstest generic/019
> > > 
> > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > 
> > Hi Dmitry, were you seeing generic/019 fail before this patch series?
> > I've been trying to build a kernel with CONFIG_FAIL_MAKE_REQUEST and I
> > haven't been able to get generic/019 to fail on me.  Is there
> > something else we need in order to reliably trigger the test fail?
> As usual, this kind of test is not 100% reliable; I've seen failures from
> time to time. But I had assumed that it was a side effect of incorrect
> error detection in e2fsck introduced by d3f32c2db8f11. This week I've
> rechecked e2fsck and found that the condition was fixed and it is correct.
> In order to speed up testing I use a ram dev:
> options brd rd_nr=4 rd_size=10485760 part_show=1
> TEST_DEV=/dev/ram0
> SCRATCH_DEV=/dev/ram1
> And run several rounds for this test:
> for ((i=0;i<20;i++));do ./check generic/019 || break ;done
> 
> You also can increase probability by playing with fsstress options
> --- a/tests/generic/019
> +++ b/tests/generic/019
> @@ -135,7 +135,7 @@ FSSTRESS_AVOID="$FSSTRESS_AVOID -ffsync=0 -fsync=0 -ffdatasync=0 -f setattr=1"
>  _workout()
>  {
>         out=$SCRATCH_MNT/fsstress.$$
> -       args=`_scale_fsstress_args -p 1 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
> +       args=`_scale_fsstress_args -p 8 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
>         echo ""
>         echo "Start fsstress.."
>         echo ""
> 
> And finally, the cherry on top of this cake: I've found that this test
> provokes orphan list corruption or dangling inodes after failure.
> fsck 1.43-WIP (09-Jul-2014)
> e2fsck 1.43-WIP (09-Jul-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Deleted inode 43792 has zero dtime.  Fix<y>? no
> Inodes that were part of a corrupted orphan linked list found.  Fix<y>? no
> Inode 493817 was part of the orphaned inode list.  IGNORED.
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences:  -148712 -148714
> Fix<y>? no
> Inode bitmap differences:  -43792 -493817
> Fix<y>? no
> 
> /dev/ram1: ********** WARNING: Filesystem still has errors **********
> 
> /dev/ram1: 201244/655360 files (0.0% non-contiguous), 409632/10485760 blocks
> [root@ts105 xfstests-dev.git2]# INO=493817
> [root@ts105 xfstests-dev.git2]# debugfs /dev/ram1 -R "ex <$INO>" ; \
>             debugfs /dev/ram1 -R "stat <$INO>" ; debugfs /dev/ram1 -R "ncheck $INO"
> debugfs 1.43-WIP (09-Jul-2014)
> Level Entries       Logical            Physical Length Flags
>  0/ 0   1/  1     0 -     0   148712 -   148712      1 
> debugfs 1.43-WIP (09-Jul-2014)
> Inode: 493817   Type: symlink    Mode:  0777   Flags: 0x80000
> Generation: 4038911591    Version: 0x00000000:00000001
> User:     0   Group:     0   Size: 638
> File ACL: 0    Directory ACL: 0
> Links: 0   Blockcount: 2
> Fragment:  Address: 0    Number: 0    Size: 0
>  ctime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
>  atime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
>  mtime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
> crtime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
> dtime: 0x0000ab10 -- Thu Jan  1 15:09:52 1970
> Size of extra inode fields: 28
> EXTENTS:
> (0):148712
> debugfs 1.43-WIP (09-Jul-2014)
> Inode   Pathname
> 
> I saw this effect with different file types (symlink, chdev, regfile).
> From my findings we lose the newly created inode during creation.
> The code is actually very simple, but at the moment I cannot find why and
> where this happens.
I've had plenty of time to brainstorm this issue :).
In fact it is a very simple issue related to the test environment.
Once we force make_request failures for all new IO requests,
ext4_error will tag the on-disk SB state with EXT4_ERROR_FS. In a normal
situation this update should not reach permanent storage, but in our
case the updated EXT4_SB(sb)->s_sbh may already be under writeback, so the
ERROR_FS flag will be visible on the next mount and orphan_list cleanup will
be skipped due to ERROR_FS. The latter behavior is 100% correct.
It looks like we have to fix the test by using another failure technique.
At this moment I think that a faulty bcache may work for us.
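
To make the sequence above concrete, here is a rough user-space toy model
in C. It is only a sketch of the reasoning, not kernel code: struct toy_sb
and the toy_* functions are hypothetical names invented for illustration,
and the writeback race is reduced to a single boolean. The only values
taken from ext4 itself are the EXT4_VALID_FS and EXT4_ERROR_FS state bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On-disk s_state bits, as defined in fs/ext4/ext4.h. */
#define EXT4_VALID_FS 0x0001
#define EXT4_ERROR_FS 0x0002

/* Hypothetical toy superblock: a cached copy and the copy on storage. */
struct toy_sb {
	uint16_t mem_state;   /* state in the cached superblock buffer */
	uint16_t disk_state;  /* state that survives the simulated failure */
	uint32_t last_orphan; /* head of the on-disk orphan list, 0 if empty */
};

/* Toy stand-in for ext4_error(): tag the cached superblock with ERROR_FS. */
static void toy_error(struct toy_sb *sb)
{
	assert(sb != NULL);
	sb->mem_state |= EXT4_ERROR_FS;
}

/*
 * Toy stand-in for the injected IO failure. New requests fail, so the
 * error is raised. If the superblock buffer was already under writeback
 * (assumption: modeled as a plain flag), that in-flight write carries
 * the updated state to storage anyway.
 */
static void toy_fail_io(struct toy_sb *sb, bool sb_under_writeback)
{
	toy_error(sb);
	if (sb_under_writeback)
		sb->disk_state = sb->mem_state;
}

/* Toy stand-in for the next mount: orphan cleanup is skipped on ERROR_FS. */
static bool toy_mount_runs_orphan_cleanup(const struct toy_sb *sb)
{
	assert(sb != NULL);
	if (sb->disk_state & EXT4_ERROR_FS)
		return false;
	return sb->last_orphan != 0;
}
```

Calling toy_fail_io(&sb, false) on a clean toy_sb with a non-empty orphan
list leaves disk_state clean, so the mount model runs orphan cleanup. With
toy_fail_io(&sb, true) the ERROR_FS bit reaches disk_state and cleanup is
skipped, which is the state e2fsck then reports as a corrupted orphan list.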


