public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Eric Whitney <enwlinux@gmail.com>
To: JeffleXu <jefflexu@linux.alibaba.com>
Cc: Eric Whitney <enwlinux@gmail.com>,
	linux-ext4@vger.kernel.org, tytso@mit.edu,
	joseph.qi@linux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when splitting extent
Date: Thu, 21 May 2020 17:26:19 -0400	[thread overview]
Message-ID: <20200521212619.GA10473@localhost.localdomain> (raw)
In-Reply-To: <9b526ae9-cba6-35dd-0424-61e8fa5ab016@linux.alibaba.com>

* JeffleXu <jefflexu@linux.alibaba.com>:
> 
> On 5/19/20 6:08 AM, Eric Whitney wrote:
> > Hi, Jeffle:
> > 
> > What kernel were you running when you observed your failures?  Does your
> > patch resolve all observed failures, or do any remain?  Do you have a
> > simple test script that reproduces the bug?
> > 
> > I've made almost 1000 runs of shared/298 on various bigalloc configurations
> > using Ted's test appliance on 5.7-rc5 and have not observed a failure.
> > Several auto group runs have also passed without failures.  Ideally, I'd
> > like to be able to reproduce your failure to be sure we fully understand
> > what's going on.  It's still the case that the "2" is wrong, but I think
> > that code in rm_leaf may be involved in an unexpected way.
> > 
> > Thanks,
> > Eric
> 
> Hi Eric,
> 
> Following on is my test environment.
> 
> 
> kernel: 5.7-rc4-git-eb24fdd8e6f5c6bb95129748a1801c6476492aba
> 
> e2fsprog: latest release version 1.45.6 (20-Mar-2020)
> 
> xfstests: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git, master
> branch, latest commit
> 
> 
> 1. Test device
> 
> I run the test in a VM and the VM is setup by qemu. The size of vdb is 1G,
> 
> ```
> 
> #lsblk
> 
> NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> vdb    254:16   0   1G  0 disk
> 
> ```
> 
> 
> and is initialized by:
> 
> ```
> 
> qemu-img create -f qcow2 /XX/disk1.qcow2 1G
> 
> qemu-kvm -drive file=/XX/disk1.qcow2,if=virtio,format=qcow2 ...
> 
> ```
> 
> 
> 2. Test script
> 
> 
> local.config of xfstests is like:
> 
> export TEST_DEV=/dev/vdb
> export TEST_DIR=/mnt/test
> export SCRATCH_DEV=/dev/vdc
> export SCRATCH_MNT=/mnt/scratch
> 
> 
> Following on is an example script to reproduce the failure:
> 
> ```sh
> 
> #!/bin/bash
> 
> for i in `seq 100`; do
>         echo y | mkfs.ext4 -O bigalloc -C 16K /dev/vdb
> 
>         ./check shared/298
>         status=$?
> 
>         if [[ $status == 1 ]]; then
>                 echo "$i exit"
>                 exit
>         fi
> done
> 
> ```
> 
> 
> Indeed the failure occurs occasionally. Sometimes the script stops at
> iteration 4, or sometimes
> 
> at iteration 2, 7, 24.
> 
> 
> The failure occurs with the following dmesg report:
> 
> ```
> 
> [  387.471876] EXT4-fs error (device vdb): mb_free_blocks:1457: group 1,
> block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
> [  387.473729] EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group
> 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
> 
> ```
> 
> 
> 3. About the applied patch
> 
> The applied patch does fix the failure in my test environment. At least the
> failure doesn't occur after running the full 100 iterations.
> 
> 
> Thanks
> 
> Jeffle
> 
> 
>

Hi, Jeffle:

Thanks for that information.  I'm still unable to reproduce your failure,
but by inspection your patch clearly fixes a bug, and of course, you're seeing
that.  I suspect the code in rm_leaf that also sets the partial cluster nofree
state is masking the bug in my testing.  In your case, my best guess is that
your testing is occasionally getting into the retry loop for EAGAIN in
remove_space.  This would effectively expose the bug again and could lead to
the failure you've described.

Your patch has survived all the heavy testing I've thrown at it.  So, please
repost your RFC patch as a fix, and feel free to add:
Reviewed-by: Eric Whitney <enwlinux@gmail.com>

This points out that the cluster freeing code really needs to be cleaned up,
so I'm working on a patch series that does that.

Thanks for your patience,
Eric


  reply	other threads:[~2020-05-21 21:26 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-14  8:14 [PATCH RFC] ext4: fix partial cluster initialization when splitting extent Jeffle Xu
2020-05-14 22:21 ` Eric Whitney
2020-05-18 22:08   ` Eric Whitney
2020-05-19  3:29     ` JeffleXu
2020-05-21 21:26       ` Eric Whitney [this message]
2020-05-22  3:09         ` JeffleXu
  -- strict thread matches above, loose matches on Subject: below --
2020-05-22  4:02 Jeffle Xu
2020-05-22  4:15 ` JeffleXu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200521212619.GA10473@localhost.localdomain \
    --to=enwlinux@gmail.com \
    --cc=jefflexu@linux.alibaba.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox