From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Zorro Lang <zlang@redhat.com>
Cc: fstests@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfstests/shared: dedup integrity test by duperemove
Date: Tue, 29 May 2018 10:42:38 -0700
Message-ID: <20180529174238.GN4910@magnolia>
In-Reply-To: <20180529171344.GD4893@hp-dl360g9-06.rhts.eng.pek2.redhat.com>
On Wed, May 30, 2018 at 01:13:45AM +0800, Zorro Lang wrote:
> On Tue, May 29, 2018 at 09:30:23AM -0700, Darrick J. Wong wrote:
> > On Wed, May 30, 2018 at 12:13:04AM +0800, Zorro Lang wrote:
> > > On Tue, May 29, 2018 at 08:07:59AM -0700, Darrick J. Wong wrote:
> > > > On Mon, May 28, 2018 at 12:54:27PM +0800, Zorro Lang wrote:
> > > > > Duperemove is a tool for finding duplicated extents and submitting
> > > > > them for deduplication, and it supports XFS. This case tries to
> > > > > verify the integrity of XFS after running duperemove.
> > > > >
> > > > > Signed-off-by: Zorro Lang <zlang@redhat.com>
> > > > > ---
> > > > >
> > > > > Hi,
> > > > >
> > > > > There aren't many tools that support XFS dedup now; duperemove is a rare one.
> > > > > So I wrote this case using duperemove.
> > > > >
> > > > > I use fsstress to create many files with random data, but I don't know if there's
> > > > > anything better I could use. Because fsstress only writes '0xff' into files, maybe
> > > > > I should add an option so that fsstress can write random characters?
> > > >
> > > > Heh. But you probably don't want totally random contents because then
> > > > duperemove doesn't do much.
> > >
> > > No matter how random the contents are, I still copy them once :)
> >
> > I suppose so. Once fsstress pokes reflink enough there ought to be a
> > fair number of easy targets for dedupe... on the other hand I think it's
> > a useful test for "literally everything in this fs is identical, dedupe
> > everything" :)
>
> Do you mean dd a big file (with the same char) to fill the whole scratch_dev, then
> dedupe?
I was mostly thinking one test where the entire fs is dedupable (because
we only ever write 0xff or whatever), and a second one where we make
duperemove hunt for things.
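Roughly what I have in mind for the "everything is identical" case -- just a
sketch, the file count and sizes here are arbitrary:

    # Fill the scratch fs with files that all contain the same byte pattern,
    # then ask duperemove to collapse everything into shared extents.
    for ((i = 0; i < 100; i++)); do
        $XFS_IO_PROG -f -c "pwrite -S 0xff 0 1m" $SCRATCH_MNT/file$i >> $seqres.full
    done
    $DUPEREMOVE_PROG -dr $SCRATCH_MNT/ >> $seqres.full 2>&1
    # Every file should still read back identically after dedupe.
    md5sum $SCRATCH_MNT/file* | sort | uniq -c -w 32 >> $seqres.full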
> Hmm... how about an iteration test like the one below?
>
> # fsstress -d $mnt/dir0 ...
> # for ((i=1; i<100; i++)); do
>       cp -a $mnt/dir$((i-1)) $mnt/dir$i
>       find $mnt/dir$i -type f -exec md5sum {} \; > $TEST_DIR/${seq}md5.sum$i
>       duperemove -dr $mnt/
>       md5sum -c $TEST_DIR/${seq}md5.sum$i
> done
> # _scratch_cycle_mount
> # for ((i=1; i<100; i++)); do
>       md5sum -c $TEST_DIR/${seq}md5.sum$i
> done
>
> But this will cost lots of test time. To save time, we'd need to reduce the file
> count and size.
Constrict the fs size?
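e.g. something like this (a sketch -- the 512M figure is arbitrary):

    # Make a small scratch fs so the copy/dedupe/md5sum loop stays quick.
    _scratch_mkfs_sized $((512 * 1024 * 1024)) >> $seqres.full 2>&1
    _scratch_mount >> $seqres.full 2>&1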
> Hmm... if we have different ways to test dedupe, maybe we can write more than
> one case.
>
> >
> > > >
> > > > >
> > > > > Please tell me, if you have better ideas:)
> > > > >
> > > > > PS: This case passed on XFS (with reflink=1) and btrfs, and duperemove
> > > > > can reclaim some space in the test, see below:
> > > > >
> > > > > Before duperemove
> > > > > Filesystem 1K-blocks Used Available Use% Mounted on
> > > > > /dev/mapper/xxxx-xfscratch 31441920K 583692K 30858228K 2% /mnt/scratch
> > > > >
> > > > > After duperemove
> > > > > Filesystem 1K-blocks Used Available Use% Mounted on
> > > > > /dev/mapper/xxxx-xfscratch 31441920K 345728K 31096192K 2% /mnt/scratch
> > > > >
> > > > > Thanks,
> > > > > Zorro
> > > > >
> > > > > common/config | 1 +
> > > > > tests/shared/008 | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > tests/shared/008.out | 2 ++
> > > > > tests/shared/group | 1 +
> > > > > 4 files changed, 92 insertions(+)
> > > > > create mode 100755 tests/shared/008
> > > > > create mode 100644 tests/shared/008.out
> > > > >
> > > > > diff --git a/common/config b/common/config
> > > > > index 02c378a9..def559c1 100644
> > > > > --- a/common/config
> > > > > +++ b/common/config
> > > > > @@ -207,6 +207,7 @@ export SQLITE3_PROG="`set_prog_path sqlite3`"
> > > > > export TIMEOUT_PROG="`set_prog_path timeout`"
> > > > > export SETCAP_PROG="`set_prog_path setcap`"
> > > > > export GETCAP_PROG="`set_prog_path getcap`"
> > > > > +export DUPEREMOVE_PROG="`set_prog_path duperemove`"
> > > > >
> > > > > # use 'udevadm settle' or 'udevsettle' to wait for lv to be settled.
> > > > > # newer systems have udevadm command but older systems like RHEL5 don't.
> > > > > diff --git a/tests/shared/008 b/tests/shared/008
> > > > > new file mode 100755
> > > > > index 00000000..dace5429
> > > > > --- /dev/null
> > > > > +++ b/tests/shared/008
> > > > > @@ -0,0 +1,88 @@
> > > > > +#! /bin/bash
> > > > > +# FS QA Test 008
> > > > > +#
> > > > > +# Dedup integrity test by duperemove
> > > > > +#
> > > > > +#-----------------------------------------------------------------------
> > > > > +# Copyright (c) 2018 Red Hat Inc. All Rights Reserved.
> > > > > +#
> > > > > +# This program is free software; you can redistribute it and/or
> > > > > +# modify it under the terms of the GNU General Public License as
> > > > > +# published by the Free Software Foundation.
> > > > > +#
> > > > > +# This program is distributed in the hope that it would be useful,
> > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > > > > +# GNU General Public License for more details.
> > > > > +#
> > > > > +# You should have received a copy of the GNU General Public License
> > > > > +# along with this program; if not, write the Free Software Foundation,
> > > > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> > > > > +#-----------------------------------------------------------------------
> > > > > +#
> > > > > +
> > > > > +seq=`basename $0`
> > > > > +seqres=$RESULT_DIR/$seq
> > > > > +echo "QA output created by $seq"
> > > > > +
> > > > > +here=`pwd`
> > > > > +tmp=/tmp/$$
> > > > > +status=1 # failure is the default!
> > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > > > +
> > > > > +_cleanup()
> > > > > +{
> > > > > + cd /
> > > > > + rm -f $tmp.*
> > > > > +}
> > > > > +
> > > > > +# get standard environment, filters and checks
> > > > > +. ./common/rc
> > > > > +. ./common/filter
> > > > > +. ./common/reflink
> > > > > +
> > > > > +# remove previous $seqres.full before test
> > > > > +rm -f $seqres.full
> > > > > +
> > > > > +# real QA test starts here
> > > > > +
> > > > > +# duperemove only supports btrfs and xfs (with the reflink feature).
> > > > > +# Add other filesystems here if duperemove supports more later.
> > > > > +_supported_fs xfs btrfs
> > > > > +_supported_os Linux
> > > >
> > > > _require_command "$DUPEREMOVE_PROG" duperemove ?
> > >
> > > Yes, it would be better to use this helper instead of checking
> > > [ "$DUPEREMOVE_PROG" = "" ].
> > >
> > > >
> > > > > +_require_scratch_reflink
> > > >
> > > > _require_scratch_dedupe
> > >
> > > Yes, I should check XFS_IOC_FILE_EXTENT_SAME, not XFS_IOC_CLONE*.
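(For reference, with both of those changes the preamble would end up looking
something like this -- a sketch using the existing fstests helpers:)

    # _require_scratch_dedupe checks that the scratch fs supports dedupe,
    # and _require_command makes the test _notrun when duperemove is missing.
    _require_scratch_dedupe
    _require_command "$DUPEREMOVE_PROG" duperemove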
> > >
> > > >
> > > > > +
> > > > > +[ "$DUPEREMOVE_PROG" = "" ] && _notrun "duperemove not found"
> > > > > +_scratch_mkfs > $seqres.full 2>&1
> > > > > +_scratch_mount >> $seqres.full 2>&1
> > > > > +
> > > > > +testdir=$SCRATCH_MNT/test-$seq
> > > > > +mkdir $testdir
> > > > > +
> > > > > +fsstress_opts="-w -r -f mknod=0"
> > > > > +# Create some files and make a duplicate
> > > > > +$FSSTRESS_PROG $fsstress_opts -d $testdir \
> > > > > + -n $((500 * LOAD_FACTOR)) -p 10 >/dev/null 2>&1
> > > > > +duptestdir=${testdir}.dup
> > > > > +cp -a $testdir $duptestdir
> > > > > +
> > > > > +# Make some difference in two directories
> > > > > +$FSSTRESS_PROG $fsstress_opts -d $testdir -n 200 -p 5 >/dev/null 2>&1
> > > > > +$FSSTRESS_PROG $fsstress_opts -d $duptestdir -n 200 -p 5 >/dev/null 2>&1
> > > > > +
> > > > > +# Record all files' md5 checksum
> > > > > +find $testdir -type f -exec md5sum {} \; > $TEST_DIR/${seq}md5.sum
> > > > > +find $duptestdir -type f -exec md5sum {} \; > $TEST_DIR/dup${seq}md5.sum
> > > > > +
> > > > > +# Dedup
> > > > > +echo "== Duperemove output ==" >> $seqres.full
> > > > > +$DUPEREMOVE_PROG -dr $SCRATCH_MNT/ >>$seqres.full 2>&1
> > > > > +
> > > > > +# Verify all files' integrity
> > > > > +md5sum -c --quiet $TEST_DIR/${seq}md5.sum
> > > > > +md5sum -c --quiet $TEST_DIR/dup${seq}md5.sum
> > > >
> > > > Can we _scratch_cycle_mount and md5sum -c again so that we test that the
> > > > pagecache contents don't mutate and a fresh read from the disk also
> > > > doesn't show mutations?
> > >
> > > If so, is the md5sum data safe? Should I do a cycle_mount before getting the
> > > md5 checksums? What 'fresh read' do you mean, from the above duperemove
> > > processes? Or do you hope to read all files once before cycle_mount?
> >
> > Since this is dedupe, the md5sum should never change. The existing
> > md5sum -c check makes sure that the dedupe operation doesn't
> > mishandle/corrupt the page cache such that it suddenly starts returning
> > incorrect contents; and the post-cycle md5sum -c check I propose would
> > flush the page cache and make sure that the on-disk contents also have
> > not changed.
>
> Makes sense, I'll do a cycle_mount and md5sum -c check again.
Ok.
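A sketch of what I mean, using the same checksum files the test already
writes (assuming the usual _scratch_cycle_mount helper):

    # First pass: make sure dedupe didn't corrupt the page cache.
    md5sum -c --quiet $TEST_DIR/${seq}md5.sum
    md5sum -c --quiet $TEST_DIR/dup${seq}md5.sum

    # Cycle the mount to drop cached data, then verify that the on-disk
    # contents survived the dedupe as well.
    _scratch_cycle_mount
    md5sum -c --quiet $TEST_DIR/${seq}md5.sum
    md5sum -c --quiet $TEST_DIR/dup${seq}md5.sum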
--D
> >
> > --D
> >
> > > Thanks,
> > > Zorro
> > >
> > > >
> > > > --D
> > > >
> > > > > +
> > > > > +echo "Silence is golden"
> > > > > +
> > > > > +status=0
> > > > > +exit
> > > > > diff --git a/tests/shared/008.out b/tests/shared/008.out
> > > > > new file mode 100644
> > > > > index 00000000..dd68d5a4
> > > > > --- /dev/null
> > > > > +++ b/tests/shared/008.out
> > > > > @@ -0,0 +1,2 @@
> > > > > +QA output created by 008
> > > > > +Silence is golden
> > > > > diff --git a/tests/shared/group b/tests/shared/group
> > > > > index b3663a03..de7fe79f 100644
> > > > > --- a/tests/shared/group
> > > > > +++ b/tests/shared/group
> > > > > @@ -10,6 +10,7 @@
> > > > > 005 dangerous_fuzzers
> > > > > 006 auto enospc
> > > > > 007 dangerous_fuzzers
> > > > > +008 auto quick dedupe
> > > > > 032 mkfs auto quick
> > > > > 272 auto enospc rw
> > > > > 289 auto quick
> > > > > --
> > > > > 2.14.3
> > > > >