All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiao Yang <yangx.jy@cn.fujitsu.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ashlie Martinez <ashmrtn@utexas.edu>,
	Amir Goldstein <amir73il@gmail.com>, Eryu Guan <eguan@redhat.com>,
	Josef Bacik <jbacik@fb.com>, fstests <fstests@vger.kernel.org>,
	Ext4 <linux-ext4@vger.kernel.org>,
	Vijay Chidambaram <vvijay03@gmail.com>
Subject: Re: [PATCH] ext4: fix interaction between i_size, fallocate, and delalloc after a crash
Date: Wed, 11 Oct 2017 19:11:03 +0800	[thread overview]
Message-ID: <59DDFC47.3050300@cn.fujitsu.com> (raw)
In-Reply-To: <20171007032917.bntgnubthdstmrrt@thunk.org>

On 2017/10/07 11:29, Theodore Ts'o wrote:
> On Thu, Oct 05, 2017 at 07:34:10PM -0500, Ashlie Martinez wrote:
>>>> It almost seems that way, though to be honest, I don't think I know
>>>> enough about 1. the setup used by Amir and 2. all the internal working
>>>> of KVM+virtio to say for sure.
>>> I believe you misread my email.
>>> The problem was NOT reproduced on KVM+virtio.
>>> The problem *is* reproduced on a 10G LVM volume over SSD on
>>> Ubuntu 16.04 with latest kernel and latest e2fsprogs.
> I was able to reproduce it using both kvm-xfstests[1] and gce-xfstests[2].
>
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
> [2] https://thunk.org/gce-xfstests
>
> It did turn out to be timing related, and it requires that a journal
> commit take place after fsstress runs, but it can *not* be triggered
> by a sync/fsync (as this would cause the delayed allocation writes to
> be forced out to disk, and that makes the problem go away).
>
> I initially tried using xfs_io as a replacement for fsstress (since it
> is more flexible and would allow me to more easily run experiments),
> but it turns out xfs_io was too fast/efficient, and so using xfs_io to
> execute the same system calls (verified by strace) would not replicate
> the problem.
>
>>> Now you have a broken file system image and the exact set of operations
>>> that led to it. That's should be a pretty big lead for investigation.
> It was indeed a big lead for investigation (thanks, Amir!), but it
> still took me several hours before I was finally able to figure out
> the problem.  The patch and the commit description should explain what
> was going on.
>
> I'll leave it to Ashlie and Vijay to investigate how to improve Crash
> Monkey so it can better find problems like this automatically.  Since
> you now have a clear reproducer (you can use generic/456 and run it on
> gce-xfstests, using is a standard cloud VM configuration) and an
> explanation of the bug and the four-line fix, I suspect this might be
> good grist for follow-on research after your Hot Storage '17 workshop
> paper.  :-)
>
> Best regards,
>
> 					- Ted
>
>
> commit 3912e7b44cf77e9452d4d0cb6c1da9c7043bb7f1
> Author: Theodore Ts'o<tytso@mit.edu>
> Date:   Fri Oct 6 23:09:55 2017 -0400
>
>      ext4: fix interaction between i_size, fallocate, and delalloc after a crash
Hi Theodore,

After applying your patch, generic/456 always passes on my system which 
just triggers the output[2].
So i could believe this two different outputs[1][2] are triggered by 
different environments, but they
are caused by the same bug which your patch fixes.  Is this right?

[1] Inode 12, end of extent exceeds allowed value(logical block 33, 
physical block 33441, len 7)Clear? no
       Inode 12, i_blocks is 184, should be 128. Fix? no
[2] Inode 12, i_size is 147456, should be 163840. Fix? no

Sorry, i am not familiar with ext4.

Thanks,
Xiao Yang
>
>      If there are pending writes subject to delayed allocation, then i_size
>      will show size after the writes have completed, while i_disksize
>      contains the value of i_size on the disk (since the writes have not
>      been persisted to disk).
>
>      If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
>      with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
>      after the fallocate(2) is between i_size and i_disksize, then after a
>      crash, if a journal commit has resulted in the changes made by the
>      fallocate() call to be persisted after a crash, but the delayed
>      allocation write has not resolved itself, i_size would not be updated,
>      and this would cause the following e2fsck complaint:
>
>      Inode 12, end of extent exceeds allowed value
>              (logical block 33, physical block 33441, len 7)
>
>      This can only take place on a sparse file, where the fallocate(2) call
>      is allocating blocks in a range which is before a pending delayed
>      allocation write which is extending i_size.  Since this situation is
>      quite rare, and the window in which the crash must take place is
>      typically<  30 seconds, in practice this condition will rarely happen.
>
>      Nevertheless, it can be triggered in testing, and in particular by
>      xfstests generic/456.
>
>      Signed-off-by: Theodore Ts'o<tytso@mit.edu>
>      Reported-by: Amir Goldstein<amir73il@gmail.com>
>      Cc: stable@vger.kernel.org
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 97f0fd06728d..07bca11749d4 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4794,7 +4794,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>   	}
>
>   	if (!(mode&  FALLOC_FL_KEEP_SIZE)&&
> -	     offset + len>  i_size_read(inode)) {
> +	    (offset + len>  i_size_read(inode) ||
> +	     offset + len>  EXT4_I(inode)->i_disksize)) {
>   		new_size = offset + len;
>   		ret = inode_newsize_ok(inode, new_size);
>   		if (ret)
> @@ -4965,7 +4966,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   	}
>
>   	if (!(mode&  FALLOC_FL_KEEP_SIZE)&&
> -	     offset + len>  i_size_read(inode)) {
> +	    (offset + len>  i_size_read(inode) ||
> +	     offset + len>  EXT4_I(inode)->i_disksize)) {
>   		new_size = offset + len;
>   		ret = inode_newsize_ok(inode, new_size);
>   		if (ret)
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> .
>




  parent reply	other threads:[~2017-10-11 11:11 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-27 10:44 [RFC][PATCH] fstest: regression test for ext4 crash consistency bug Amir Goldstein
2017-09-25  9:49 ` Xiao Yang
2017-09-25  9:49   ` Xiao Yang
2017-09-25 10:53   ` Amir Goldstein
2017-09-26 10:45     ` Xiao Yang
2017-09-26 11:48       ` Amir Goldstein
2017-09-30 14:15     ` Ashlie Martinez
2017-10-05  7:27       ` Xiao Yang
2017-10-05 15:04         ` Ashlie Martinez
2017-10-05 19:10           ` Amir Goldstein
2017-10-06  0:34             ` Ashlie Martinez
2017-10-07  3:29               ` [PATCH] ext4: fix interaction between i_size, fallocate, and delalloc after a crash Theodore Ts'o
2017-10-07  5:54                 ` Amir Goldstein
2017-10-07 18:32                   ` Theodore Ts'o
2017-10-09  0:37                 ` Ashlie Martinez
2017-10-11 11:11                 ` Xiao Yang [this message]
2017-10-11 13:17                   ` Ashlie Martinez
2017-10-11 13:34                     ` Amir Goldstein
2017-10-16 19:32                       ` Ashlie Martinez
2017-10-16 21:11                         ` Amir Goldstein
2017-10-17  0:09                           ` Theodore Ts'o
2017-10-17  1:02                             ` Vijay Chidambaram
     [not found]                             ` <CAPaz=E+jFuOmRk8+EmVhNawwogNzW3VkciFrCc0Fk23OfGbwuA@mail.gmail.com>
2017-10-17  7:15                               ` Amir Goldstein
2017-10-17 14:41                               ` Theodore Ts'o
2017-10-17 23:16                                 ` Vijay Chidambaram
2017-10-12 14:38                 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59DDFC47.3050300@cn.fujitsu.com \
    --to=yangx.jy@cn.fujitsu.com \
    --cc=amir73il@gmail.com \
    --cc=ashmrtn@utexas.edu \
    --cc=eguan@redhat.com \
    --cc=fstests@vger.kernel.org \
    --cc=jbacik@fb.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=vvijay03@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.