From: Omar Sandoval <osandov@osandov.com>
To: Chris J Arges <chris.j.arges@canonical.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
"Andreas Dilger" <adilger.kernel@dilger.ca>,
"Lukáš Czerner" <lczerner@redhat.com>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] ext4: fix indirect punch hole corruption
Date: Mon, 9 Feb 2015 13:28:05 -0800 [thread overview]
Message-ID: <20150209212805.GA10892@mew.cs.washington.edu> (raw)
In-Reply-To: <54D920BC.2040402@canonical.com>
[-- Attachment #1: Type: text/plain, Size: 3332 bytes --]
On Mon, Feb 09, 2015 at 03:03:56PM -0600, Chris J Arges wrote:
> On 02/09/2015 12:21 PM, Chris J Arges wrote:
> > On 02/08/2015 06:15 AM, Omar Sandoval wrote:
> >> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> >> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> >> mapping. However, there are a bugs in a few cases.
> >>
> >> In the case where the punch happens within one level of indirection, we
> >> expect the start and end shared branches to converge on an indirect
> >> block. However, because the branches returned from ext4_find_shared do
> >> not necessarily start at the same level (e.g., the partial2 chain will
> >> be shallower if the last block occurs at the beginning of an indirect
> >> group), the walk of the two chains can end up "missing" each other and
> >> freeing a bunch of extra blocks in the process. This mismatch can be
> >> handled by first making sure that the chains are at the same level, then
> >> walking them together until they converge.
> >>
> >> In the case that a punch spans different levels of indirection, the
> >> original code skips freeing the intermediate indirect trees if the last
> >> block is the first triply-indirected block because it returns instead of
> >> jumping to do_indirects. Additionally, a non-zero nr2 does not mean that
> >> there's nothing else to free at the level of partial2: consider the case
> >> where the all_zeroes in ext4_find_shared backed up the shared branch.
> >>
> >> Signed-off-by: Omar Sandoval <osandov@osandov.com>
> >
> > Omar,
> > With this patch I no longer seem to be getting the original corruption I
> > detected with my test case; however eventually I do get errors when
> > trying to delete qcow2 snapshots. After getting these errors if I run
> > 'qemu-img check <image>' I see the following errors:
> >
> > ERROR OFLAG_COPIED data cluster: l2_entry=800000018f7f0000 refcount=0
> > ERROR OFLAG_COPIED data cluster: l2_entry=800000018f800000 refcount=0
> > ERROR OFLAG_COPIED data cluster: l2_entry=800000018f810000 refcount=0
> >
> > 16941 errors were found on the image.
> > Data may be corrupted, or further writes to the image may corrupt it.
> >
> > 60459 leaked clusters were found on the image.
> > This means waste of disk space, but no harm to data.
> > 88629/262144 = 33.81% allocated, 9.57% fragmented, 0.00% compressed clusters
> > Image end offset: 10438180864
> >
> > So this patch seems to have moved the problem. I can collect additional
> > logs if necessary.
> >
> > Thanks,
> > --chris j arges
> >
>
> After ignoring snapshot deletion errors, I've hit the original
> corruption problem with your patch still. I'll continue debugging this.
> --chris j arges
>
Chris,
Thanks for testing, and sorry about this game of whack-a-mole. Looks
like there's at least one more bug in the n == n2 case (if nr != 0, we
sometimes, but not always, need to free the subtree referred to by it.)
I'll send you another patch as soon as I have the proper fix.
P.S. The fpunch xfstests don't cover these bugs. I'm attaching the
slightly convoluted script I've been using for testing. It's a Python
script that generates a shell script which I then run on the test
machine. The interesting stuff is in known_bugs(), the first three of
which are fixed by the last patch I sent out.
--
Omar
[-- Attachment #2: punchtests.py --]
[-- Type: text/x-python, Size: 6790 bytes --]
#!/usr/bin/env python3
import hashlib
NDIR = 12
BLOCK_SIZE = 1024
ADDRS = (BLOCK_SIZE // 4)
ALLOCED_BLOCK = b'\xcd' * BLOCK_SIZE
HOLE_BLOCK = b'\x00' * BLOCK_SIZE
TEST_CASE = 0
def test_case(file_blocks, *args):
global TEST_CASE
test_file = '/mnt/test/test.%d' % TEST_CASE
TEST_CASE += 1
print('rm -f %s' % test_file)
print("xfs_io -f -c 'pwrite 0 %d' %s > /dev/null" % (file_blocks * BLOCK_SIZE, test_file))
blocks = [True] * file_blocks
for arg in args:
punch_start, punch_end = arg
offset = punch_start * BLOCK_SIZE
length = punch_end * BLOCK_SIZE - offset
print("xfs_io -c 'fpunch %d %d' %s" % (offset, length, test_file))
for i in range(punch_start, punch_end):
blocks[i] = False
print('umount /mnt/test')
print('mount /mnt/test')
print('diff - <(_fiemap %s) << "EOF"' % test_file)
m = hashlib.md5()
num = 0
extent_start = None
extent_type = None
for i, alloced in enumerate(blocks):
m.update(ALLOCED_BLOCK if alloced else HOLE_BLOCK)
if alloced != extent_type:
if extent_type is not None:
start = extent_start * (BLOCK_SIZE // 512)
end = i * (BLOCK_SIZE // 512) - 1
print('%d: [%d..%d]: %s' % (num, start, end, 'extent' if extent_type else 'hole'))
num += 1
extent_start = i
extent_type = alloced
if extent_type is not None:
if extent_start > 0 or extent_type:
start = extent_start * (BLOCK_SIZE // 512)
end = len(blocks) * (BLOCK_SIZE // 512) - 1
print('%d: [%d..%d]: %s' % (num, start, end, 'extent' if extent_type else 'hole'))
print('EOF')
print('diff - <(md5sum %s) << "EOF"' % test_file)
print('%s %s' % (m.hexdigest(), test_file))
print('EOF')
print()
def simple_tests():
# Punch within direct level.
print("echo 'n = 1, n2 = 1'")
test_case(NDIR, (2, NDIR - 2))
test_case(NDIR, (0, NDIR // 2))
# Punch from direct level to starts of indirect levels.
print("echo 'n2 > n'")
print("echo ' n = 1'")
test_case(NDIR + 1, (0, NDIR)) # Start of direct.
test_case(NDIR + ADDRS + 1, (0, NDIR + ADDRS))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (0, NDIR + ADDRS + ADDRS * ADDRS))
test_case(NDIR + 1, (2, NDIR)) # Middle of direct.
test_case(NDIR + ADDRS + 1, (2, NDIR + ADDRS))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (2, NDIR + ADDRS + ADDRS * ADDRS))
# Punch from direct level into indirect levels.
test_case(NDIR + 2, (0, NDIR + 1)) # Start of direct.
test_case(NDIR + ADDRS + 2, (0, NDIR + ADDRS + 1))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (0, NDIR + ADDRS + ADDRS * ADDRS + 1))
test_case(NDIR + 2, (2, NDIR + 1)) # Middle of direct
test_case(NDIR + ADDRS + 2, (2, NDIR + ADDRS + 1))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (2, NDIR + ADDRS + ADDRS * ADDRS + 1))
# Punch from indirect level into another indirect level.
print("echo ' n = 2'")
# IND -> DIND, IND -> TIND, start of end.
test_case(NDIR + ADDRS + 1, (NDIR, NDIR + ADDRS)) # Start of indirect.
test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR, NDIR + ADDRS + ADDRS * ADDRS))
test_case(NDIR + ADDRS + 1, (NDIR + 2, NDIR + ADDRS)) # Middle of indirect.
test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR + 2, NDIR + ADDRS + ADDRS * ADDRS))
# Middle of end.
test_case(NDIR + ADDRS + 2, (NDIR, NDIR + ADDRS + 1)) # Start of indirect.
test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR, NDIR + ADDRS + ADDRS * ADDRS + 1))
test_case(NDIR + ADDRS + 2, (NDIR + 2, NDIR + ADDRS + 1)) # Middle of indirect.
test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR + 2, NDIR + ADDRS + ADDRS * ADDRS + 1))
# DIND -> TIND
print("echo ' n = 3'")
test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR + ADDRS, NDIR + ADDRS + ADDRS * ADDRS)) # Start of indirect.
test_case(NDIR + ADDRS + ADDRS * ADDRS + 1, (NDIR + ADDRS + 2, NDIR + ADDRS + ADDRS * ADDRS))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR + ADDRS, NDIR + ADDRS + ADDRS * ADDRS + 1)) # Middle of indirect.
test_case(NDIR + ADDRS + ADDRS * ADDRS + 2, (NDIR + ADDRS + 2, NDIR + ADDRS + ADDRS * ADDRS + 1))
# Punch within the same indirect level.
print("echo 'n = 2, n2 = 2'")
# IND
test_case(NDIR + ADDRS, (NDIR, NDIR + ADDRS - 1))
test_case(NDIR + ADDRS, (NDIR + 4, NDIR + ADDRS - 1))
# DIND
print("echo 'n = 3, n2 = 3'")
test_case(NDIR + ADDRS + ADDRS * ADDRS, (NDIR + ADDRS, NDIR + ADDRS + ADDRS * ADDRS - 1))
test_case(NDIR + ADDRS + ADDRS * ADDRS, (NDIR + ADDRS + 4, NDIR + ADDRS + ADDRS * ADDRS - 1))
# TIND
print("echo 'n = 4, n2 = 4'")
test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS, NDIR + ADDRS + ADDRS * ADDRS + 20))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS + 4, NDIR + ADDRS + ADDRS * ADDRS + 20))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS, NDIR + ADDRS + ADDRS * ADDRS + 19))
test_case(NDIR + ADDRS + ADDRS * ADDRS + 20, (NDIR + ADDRS + ADDRS * ADDRS + 4, NDIR + ADDRS + ADDRS * ADDRS + 19))
def known_bugs():
print("echo 'n == n2, uneven partials'")
test_case(NDIR + ADDRS + 2 * ADDRS, (NDIR + ADDRS + ADDRS // 2, NDIR + ADDRS + ADDRS))
print("echo 'n <= 2, n2 == 4, end is first triply-indirect block'")
test_case(NDIR + ADDRS + ADDRS * ADDRS + 4, (0, NDIR + ADDRS + ADDRS * ADDRS))
print("echo 'n < n2, nr2 != 0 because all_zeroes'")
test_case(NDIR + ADDRS + 2 * ADDRS + 5, (NDIR + ADDRS + 2 * ADDRS, NDIR + ADDRS + 2 * ADDRS + 4),
(0, NDIR + ADDRS + 2 * ADDRS + 4))
print("echo 'n == n2, nr != 0 because all_zeroes'")
test_case(NDIR + ADDRS + 4 * ADDRS, (NDIR + ADDRS + ADDRS, NDIR + ADDRS + ADDRS + 1),
(NDIR + ADDRS + ADDRS + 1, NDIR + ADDRS + 3 * ADDRS))
if __name__ == '__main__':
print('#!/bin/bash')
print(r"""
_coalesce_extents () {
awk -F: '
{
range = $2;
type = $3;
split(range, bounds, "[\\[ \\.\\]]");
low = bounds[3];
high = bounds[5];
if (type != prev_type) {
if (prev_type != "")
printf("%u]:%s\n", low - 1, prev_type);
printf("%u: [%u..", out_count++, low);
prev_type = type;
}
}
END {
if (prev_type != "")
printf("%u]:%s\n", high, prev_type);
}'
}
_filter_hole_fiemap () {
awk --posix '
$3 ~ /hole/ {
print $1, $2, $3;
next;
}
$5 ~ /0x[[:xdigit:]]+/ {
print $1, $2, "extent";
}' |
_coalesce_extents
}
_fiemap () {
xfs_io -c 'fiemap -v' "$1" | _filter_hole_fiemap
}
""")
# simple_tests()
known_bugs()
next prev parent reply other threads:[~2015-02-09 21:28 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-05 20:50 [PATCH] ext4: fix indirect punch hole corruption Omar Sandoval
2015-02-05 20:56 ` Omar Sandoval
2015-02-05 21:30 ` Chris J Arges
2015-02-05 21:41 ` Omar Sandoval
2015-02-07 0:28 ` Omar Sandoval
2015-02-07 0:35 ` Chris J Arges
2015-02-07 21:57 ` [PATCH v2] " Omar Sandoval
2015-02-08 12:15 ` [PATCH v3] " Omar Sandoval
2015-02-09 18:21 ` Chris J Arges
2015-02-09 21:03 ` Chris J Arges
2015-02-09 21:28 ` Omar Sandoval [this message]
2015-02-10 21:44 ` [PATCH v4] " Omar Sandoval
2015-02-11 2:59 ` Chris J Arges
2015-02-11 3:37 ` Omar Sandoval
2015-02-17 18:59 ` Omar Sandoval
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150209212805.GA10892@mew.cs.washington.edu \
--to=osandov@osandov.com \
--cc=adilger.kernel@dilger.ca \
--cc=chris.j.arges@canonical.com \
--cc=lczerner@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).