Random corruption test for e2fsck

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Random corruption test for e2fsck
@ 2007-07-10 13:07 Kalpak Shah
  2007-07-10 14:58 ` Theodore Tso
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kalpak Shah @ 2007-07-10 13:07 UTC (permalink / raw)
  To: linux-ext4; +Cc: TheodoreTso

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]

Hi,

This is a random corruption test which can be included in the e2fsprogs
regression tests. It does the following:
1) Create an test fs and format it with ext2/3/4 and random selection of
features.
2) Mount it and copy data into it.
3) Move around the blocks of the filesystem randomly causing corruption.
Also overwrite some random blocks with garbage from /dev/urandom. Create
a copy of this corrupted filesystem.
4) Unmount and run e2fsck. If the first run of e2fsck produces any
errors like uncorrected errors, library error, segfault, usage error,
etc. then it is deemed a bug. But in any case, a second run of e2fsck is
done to check if it renders the filesystem clean. 
5) If the test went by without any errors the test image is deleted and
in case of any errors the user is notified that the log of this test run
should be mailed to linux-ext4@ and the image should be preserved.

Any comments are welcome.

---
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>


Thanks,
Kalpak.


[-- Attachment #2: e2fsprogs-tests-f_random_corruption.patch --]
[-- Type: text/x-patch, Size: 9743 bytes --]

Index: e2fsprogs-regression/tests/f_random_corruption/script
===================================================================
--- /dev/null
+++ e2fsprogs-regression/tests/f_random_corruption/script
@@ -0,0 +1,277 @@
+# This is to make sure that if this test fails other tests can still be run
+# instead of doing an exit. We break before the end of the loop.
+while (( 1 )); do
+
+# choose block and inode sizes randomly
+BLK_SIZES=(1024 2048 4096)
+INODE_SIZES=(128 256 512 1024)
+
+SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
+RANDOM=$SEED
+
+IMAGE=${IMAGE:-$TMPFILE}
+DATE=`date '+%Y%m%d%H%M%S'`
+ARCHIVE=$IMAGE.$DATE
+SIZE=${SIZE:-$(( 192000 + RANDOM + RANDOM )) }
+FS_TYPE=${FS_TYPE:-ext3}
+BLK_SIZE=${BLK_SIZES[(( $RANDOM % ${#BLK_SIZES[*]} ))]}
+INODE_SIZE=${INODE_SIZES[(( $RANDOM % ${#INODE_SIZES[*]} ))]}
+DEF_FEATURES="sparse_super,filetype,resize_inode,dir_index"
+FEATURES=${FEATURES:-$DEF_FEATURES}
+MOUNT_OPTS="-o loop"
+MNTPT=$test_dir/temp
+OUT=$test_name.$DATE.log
+
+# Do you want to try and mount the filesystem?
+MOUNT_AFTER_CORRUPTION=${MOUNT_AFTER_CORRUPTION:-"no"}
+# Do you want to remove the files from the mounted filesystem? Ideally use it
+# only in test environment.
+REMOVE_FILES=${REMOVE_FILES:-"no"}
+
+# In KB
+CORRUPTION_SIZE=${CORRUPTION_SIZE:-16}
+CORRUPTION_ITERATIONS=${CORRUPTION_ITERATIONS:-5}
+
+MKFS=../misc/mke2fs
+E2FSCK=../e2fsck/e2fsck
+FIRST_FSCK_OPTS="-fyv"
+SECOND_FSCK_OPTS="-fyv"
+
+# Lets check if the image can fit in the current filesystem.
+BASE_BS=`stat -f . | grep "Block size:" | cut -d " " -f3`
+BASE_AVAIL_BLOCKS=`stat -f . | grep "Available:" | cut -d ":" -f5`
+
+if (( BASE_BS * BASE_AVAIL_BLOCKS < NUM_BLKS * BLK_SIZE )); then
+	echo "The base filesystem does not have enough space to accomodate the"
+	echo "test image. Aborting test...."
+	break;
+fi
+
+# Lets have a journal more times than not.
+HAVE_JOURNAL=$(( $RANDOM % 12 ))
+if (( HAVE_JOURNAL == 0 )); then
+	FS_TYPE="ext2"
+	HAVE_JOURNAL=""
+else
+	HAVE_JOURNAL="-j"
+fi
+
+# Experimental features should not be used too often.
+LAZY_BG=$(( $RANDOM % 12 ))
+if (( LAZY_BG == 0 )); then
+       	FEATURES=$FEATURES,lazy_bg
+fi
+META_BG=$(( $RANDOM % 12 ))
+if (( META_BG == 0 )); then
+       	FEATURES=$FEATURES,meta_bg
+fi
+
+modprobe ext4 2> /dev/null
+modprobe ext4dev 2> /dev/null
+
+# If ext4 is present in the kernel then we can play with ext4 options
+EXT4=`grep ext4 /proc/filesystems`
+if [ -n "$EXT4" ]; then
+	USE_EXT4=$(( $RANDOM % 2 ))
+	if (( USE_EXT4 == 1 )); then
+		FS_TYPE="ext4dev"
+	fi
+fi
+
+if [ "$FS_TYPE" = "ext4dev" ]; then
+	UNINIT_GROUPS=$(( $RANDOM % 12 ))
+	if (( UNINIT_GROUPS == 0 )); then
+		FEATURES=$FEATURES,uninit_groups
+	fi
+	EXPAND_ESIZE=$(( $RANDOM % 12 ))
+	if (( EXPAND_EISIZE == 0 )); then
+		FIRST_FSCK_OPTS=$FIRST_FSCK_OPTS," -E expand_extra_isize"
+	fi
+fi
+
+MKFS_OPTS=" $HAVE_JOURNAL -b $BLK_SIZE -I $INODE_SIZE -O $FEATURES"
+
+NUM_BLKS=$(( (SIZE * 1024) / BLK_SIZE ))
+
+unset_vars()
+{
+	unset IMAGE DATE ARCHIVE FS_TYPE SIZE BLK_SIZE MKFS_OPTS MOUNT_OPTS
+	unset E2FSCK FIRST_FSCK_OPTS SECOND_FSCK_OPTS OUT
+}
+
+cleanup()
+{
+	echo "Error occured..." >> $OUT.failed
+        umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+	echo " failed"
+	echo "*** This appears to be a bug in e2fsprogs ***"
+	echo "Please contact linux-ext4 for further assistance."
+	echo "Include $OUT as an attachment, and save $ARCHIVE locally for future reference."
+	unset_vars
+	break;
+}
+
+echo -n "Random corruption test for e2fsck:"
+# Truncate the output log file
+> $OUT
+
+get_random_location()
+{
+	total=$1
+
+	tmp=$(( (RANDOM * 32768) % total ))
+
+	# Try and have more corruption in metadata at the start of the
+	# filesystem.
+	if (( tmp % 3 == 0 || tmp % 5 == 0 || tmp % 7 == 0 )); then
+		tmp=$(( $tmp % 32768 ))
+	fi
+
+	echo $tmp
+}
+
+make_fs_dirty()
+{
+        MAX_BLKS_TO_DIRTY=${1:-NUM_BLKS}
+        from=$(( (RANDOM * RANDOM) % NUM_BLKS ))
+
+        # Number of blocks to write garbage into should be within fs and should
+        # not be too many.
+        num_blks_to_dirty=$(( RANDOM % MAX_BLKS_TO_DIRTY ))
+
+        # write garbage into the selected blocks
+        dd if=/dev/urandom of=$IMAGE seek=$from conv=notrunc count=$num_blks_to_dirty bs=$BLK_SIZE >> $OUT 2>&1
+}
+
+
+touch $IMAGE
+echo "Format the filesystem image..." >> $OUT
+echo >> $OUT
+# Write some garbage blocks into the filesystem to make sure e2fsck has to do
+# a more difficult job than checking blocks of zeroes.
+echo "Copy some random data into filesystem image...." >> $OUT
+make_fs_dirty
+echo "$MKFS $MKFS_OPTS -F $IMAGE  >> $OUT" >> $OUT
+$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT 2>&1
+if [ $? -ne 0 ]
+then
+	zero_size=`grep "Device size reported to be zero" $OUT`
+	short_write=`grep "Attempt to write block from filesystem resulted in short write" $OUT`
+
+	if (( zero_size != 0 || short_write != 0 )); then
+		echo "mkfs failed due to device size of 0 or a short write. This is harmless and need not be reported."
+	else
+		echo "mkfs failed - internal error during operation. Aborting random regression test..."
+		cleanup;
+	fi
+fi
+
+mkdir -p $MNTPT
+if [ $? -ne 0 ]
+then
+	echo "Failed to create or find mountpoint...." >> $OUT
+fi
+
+mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT > /dev/null 2>&1 | tee -a $OUT
+if [ $? -ne 0 ]
+then
+	echo "Unable to mount file system - skipped" >> $OUT
+else
+	df -h >> $OUT
+	echo "Copying data into the test filesystem..." >> $OUT
+
+	cp -r ../ $MNTPT >> $OUT 2>&1
+	sync
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+fi
+
+echo "Corrupt the image by moving around blocks of data..." >> $OUT
+echo >> $OUT
+for (( i = 0; i < $CORRUPTION_ITERATIONS; i++ ))
+do
+	from=`get_random_location $NUM_BLKS`
+	to=`get_random_location $NUM_BLKS`
+
+	echo "Moving $CORRUPTION_SIZE KB data from $(($from * $BLK_SIZE)) " >> $OUT
+	echo " to $(($to * $BLK_SIZE))." >> $OUT
+	dd if=$IMAGE of=$IMAGE bs=1k count=$CORRUPTION_SIZE conv=notrunc skip=$from seek=$to >> $OUT 2>&1
+
+	# more corruption by overwriting blocks from within the filesystem.
+	make_fs_dirty $(( NUM_BLKS / 256 ))
+done
+
+# Copy the image for reproducing the bug.
+cp --sparse=auto $IMAGE $ARCHIVE >> $OUT 2>&1
+
+echo "First pass of fsck..." >> $OUT
+$E2FSCK $FIRST_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+CORRECTED=$(( $RET & 1 ))
+REBOOT=$(( $RET & 2 ))
+UNCORRECTED=$(( $RET & 4 ))
+OPERROR=$(( $RET & 8 ))
+USEERROR=$(( $RET & 16 ))
+CANCELED=$(( $RET & 32 ))
+LIBERROR=$(( $RET & 128 ))
+
+# Run e2fsck for the second time and check if the problem gets solved. After
+# we can report error with pass1.
+export PASS1_ERROR
+PASS1_ERROR="no"
+[ $CORRECTED == 0 ] || { echo "The first fsck corrected errors" >> $OUT; }
+[ $REBOOT == 0 ] || { echo "The first fsck wants a reboot" >> $OUT.failed; PASS1_ERROR="yes"; }
+[ $UNCORRECTED == 0 ] || { echo "The first fsck left uncorrected errors" >> $OUT.failed; PASS1_ERROR="yes"; }
+[ $OPERROR == 0 ] || { echo "The first fsck claims there was an operational error" >> $OUT.failed; PASS1_ERROR="yes"; }
+[ $USEERROR == 0 ] || { echo "The first fsck claims there was a usage error" >> $OUT.failed; PASS1_ERROR="yes"; }
+[ $CANCELED == 0 ] || { echo "The first fsck claims it was canceled" >> $OUT.failed; PASS1_ERROR="yes"; }
+[ $LIBERROR == 0 ] || { echo "The first fsck claims there was a library error" >> $OUT.failed; PASS1_ERROR="yes"; }
+
+echo --------------------------------------------------------- >> $OUT
+
+echo "Second pass of fsck..." >> $OUT
+$E2FSCK $SECOND_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+CORRECTED=$(( $RET & 1 ))
+REBOOT=$(( $RET & 2 ))
+UNCORRECTED=$(( $RET & 4 ))
+OPERROR=$(( $RET & 8 ))
+USEERROR=$(( $RET & 16 ))
+CANCELED=$(( $RET & 32 ))
+LIBERROR=$(( $RET & 128 ))
+[ $CORRECTED == 0 ] || { echo "The second fsck claimed to correct errors!" >> $OUT.failed; cleanup; }
+[ $REBOOT == 0 ] || { echo "The second fsck wants a reboot" >> $OUT.failed; cleanup; }
+[ $UNCORRECTED == 0 ] || { echo "The second fsck left uncorrected errors" >> $OUT.failed; cleanup; }
+[ $OPERROR == 0 ] || { echo "The second fsck claims there was an operational error" >> $OUT.failed; cleanup; }
+[ $USEERROR == 0 ] || { echo "The second fsck claims there was a usage error" >> $OUT.failed;  cleanup; }
+[ $CANCELED == 0 ] || { echo "The second fsck claims it was canceled" >> $OUT.failed;  cleanup; }
+[ $LIBERROR == 0 ] || { echo "The second fsck claims there was a library error" >> $OUT.failed; cleanup; }
+
+if [ "PASS1_ERROR" = "yes" ]; then
+	cleanup;
+fi
+
+if [ "$MOUNT_AFTER_CORRUPTION" = "yes" ]; then
+	mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+	if [ $? -ne 0 ]
+	then
+		echo "Unable to mount file system - skipped" >> $OUT
+	fi
+
+	if [ "$REMOVE_FILES" = "yes" ]; then
+		rm -rf $MNTPT/* >> $OUT
+	fi
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+fi
+
+rm -f	$ARCHIVE
+rm -f	$OUT.failed
+
+# Report success
+echo "ok"
+echo "Succeeded..." > $OUT.ok
+
+unset_vars
+
+break;
+
+done
Index: e2fsprogs-regression/tests/Makefile.in
===================================================================
--- e2fsprogs-regression.orig/tests/Makefile.in
+++ e2fsprogs-regression/tests/Makefile.in
@@ -24,6 +24,8 @@ test_script: test_script.in Makefile
 	@chmod +x test_script
 
 check:: test_script
+	@echo "Removing remnants of earlier tests..."
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img2*
 	@echo "Running e2fsprogs test suite..."
 	@echo " "
 	@./test_script
@@ -63,7 +65,7 @@ testend: test_script ${TDIR}/image
 	@echo "If all is well, edit ${TDIR}/name and rename ${TDIR}."
 
 clean::
-	$(RM) -f *~ *.log *.new *.failed *.ok test.img test_script
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img* test_script
 
 distclean:: clean
 	$(RM) -f Makefile

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 13:07 Random corruption test for e2fsck Kalpak Shah
@ 2007-07-10 14:58 ` Theodore Tso
  2007-07-10 15:42   ` Eric Sandeen
                     ` (3 more replies)
  2007-07-10 15:47 ` Eric Sandeen
  2007-07-11 15:20 ` Andi Kleen
  2 siblings, 4 replies; 15+ messages in thread
From: Theodore Tso @ 2007-07-10 14:58 UTC (permalink / raw)
  To: Kalpak Shah; +Cc: linux-ext4

On Tue, Jul 10, 2007 at 06:37:40PM +0530, Kalpak Shah wrote:
> Hi,
> 
> This is a random corruption test which can be included in the e2fsprogs
> regression tests. 
> 1) Create an test fs and format it with ext2/3/4 and random selection of
> features.
> 2) Mount it and copy data into it.

This requires root privileges in order to mount the loop filesystem.
Any chance you could change it to use debugfs to populate the
filesystem, so we don't need root privs in order to mount it.

This will increase the number of people that will actually run the
test, and more importantly not encourage people from running "make
check" as root.

> 3) Move around the blocks of the filesystem randomly causing corruption.
> Also overwrite some random blocks with garbage from /dev/urandom. Create
> a copy of this corrupted filesystem.
>
> 4) Unmount and run e2fsck. If the first run of e2fsck produces any
> errors like uncorrected errors, library error, segfault, usage error,
> etc. then it is deemed a bug. But in any case, a second run of e2fsck is
> done to check if it renders the filesystem clean. 

Err, you do unmount the filesystem first before you start corrupting
it, right?  (Checking script; sure looks like it.)

> 5) If the test went by without any errors the test image is deleted and
> in case of any errors the user is notified that the log of this test run
> should be mailed to linux-ext4@ and the image should be preserved.

I certainly like the general concept!!

I wonder if the code to create a random filesystem and corrupting it
should be kept as separate shell script, since it can be reused in
another of interesting ways.  One thought would be to write a test
script that mounts corrupted filesystems using UML and then does some
exercises on it (tar cvf on the filesyste, random renames on the
filesystem, rm -rf of all of the contents of the filesystems), to see
whether we can provoke a kernel oops.

Regards,

							- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 14:58 ` Theodore Tso
@ 2007-07-10 15:42   ` Eric Sandeen
  2007-07-11  7:03   ` Kalpak Shah
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2007-07-10 15:42 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Kalpak Shah, linux-ext4

Theodore Tso wrote:
>> 5) If the test went by without any errors the test image is deleted and
>> in case of any errors the user is notified that the log of this test run
>> should be mailed to linux-ext4@ and the image should be preserved.
> 
> I certainly like the general concept!!
> 
> I wonder if the code to create a random filesystem and corrupting it
> should be kept as separate shell script, since it can be reused in
> another of interesting ways.  One thought would be to write a test
> script that mounts corrupted filesystems using UML and then does some
> exercises on it (tar cvf on the filesyste, random renames on the
> filesystem, rm -rf of all of the contents of the filesystems), to see
> whether we can provoke a kernel oops.

FWIW, that's what fsfuzzer does, in an fs-agnostic way.

-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 13:07 Random corruption test for e2fsck Kalpak Shah
  2007-07-10 14:58 ` Theodore Tso
@ 2007-07-10 15:47 ` Eric Sandeen
  2007-07-11 16:03   ` Andreas Dilger
  2007-07-11 15:20 ` Andi Kleen
  2 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2007-07-10 15:47 UTC (permalink / raw)
  To: Kalpak Shah; +Cc: linux-ext4, TheodoreTso

Kalpak Shah wrote:
> Hi,
> 
> This is a random corruption test which can be included in the e2fsprogs
> regression tests. It does the following:
> 1) Create an test fs and format it with ext2/3/4 and random selection of
> features.
> 2) Mount it and copy data into it.
> 3) Move around the blocks of the filesystem randomly causing corruption.
> Also overwrite some random blocks with garbage from /dev/urandom. Create
> a copy of this corrupted filesystem.
> 4) Unmount and run e2fsck. If the first run of e2fsck produces any
> errors like uncorrected errors, library error, segfault, usage error,
> etc. then it is deemed a bug. But in any case, a second run of e2fsck is
> done to check if it renders the filesystem clean. 
> 5) If the test went by without any errors the test image is deleted and
> in case of any errors the user is notified that the log of this test run
> should be mailed to linux-ext4@ and the image should be preserved.
> 
> Any comments are welcome.

Seems like a pretty good idea.  I had played with such a thing using
fsfuzzer... fsfuzzer always seemed at least as useful useful as a fsck
tester than a kernel code tester anyway.  (OOC, did you look at fsfuzzer
when you did this?)

My only concern is that since it's introducing random corruption, new
things will probably pop up from time to time; when we do an rpm build
for Fedora/RHEL, it automatically runs make check:

%check
make check

which seems like a reasonably good idea to me.  However, I'd rather not
have last-minute build failures introduced by new random collection of
bits that have never been seen before.  Maybe "make RANDOM=0 check" as
an option would be a good idea for automated builds...?

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 14:58 ` Theodore Tso
  2007-07-10 15:42   ` Eric Sandeen
@ 2007-07-11  7:03   ` Kalpak Shah
       [not found]   ` <20070711094410.GM6417@schatzie.adilger.int>
  2007-07-12  5:52   ` Andreas Dilger
  3 siblings, 0 replies; 15+ messages in thread
From: Kalpak Shah @ 2007-07-11  7:03 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4

On Tue, 2007-07-10 at 10:58 -0400, Theodore Tso wrote:
> On Tue, Jul 10, 2007 at 06:37:40PM +0530, Kalpak Shah wrote:
> > Hi,
> > 
> > This is a random corruption test which can be included in the e2fsprogs
> > regression tests. 
> > 1) Create an test fs and format it with ext2/3/4 and random selection of
> > features.
> > 2) Mount it and copy data into it.
> 
> This requires root privileges in order to mount the loop filesystem.
> Any chance you could change it to use debugfs to populate the
> filesystem, so we don't need root privs in order to mount it.
> 
> This will increase the number of people that will actually run the
> test, and more importantly not encourage people from running "make
> check" as root.

That is a good idea. With this script, the mount would just fail without
root privileges and the test would be done on an empty filesystem. I
will make this change and post it.


> > 3) Move around the blocks of the filesystem randomly causing corruption.
> > Also overwrite some random blocks with garbage from /dev/urandom. Create
> > a copy of this corrupted filesystem.
> >
> > 4) Unmount and run e2fsck. If the first run of e2fsck produces any
> > errors like uncorrected errors, library error, segfault, usage error,
> > etc. then it is deemed a bug. But in any case, a second run of e2fsck is
> > done to check if it renders the filesystem clean. 
> 
> Err, you do unmount the filesystem first before you start corrupting
> it, right?  (Checking script; sure looks like it.)
> 

Yes, the filesystem is unmounted before the corruption begins.

> > 5) If the test went by without any errors the test image is deleted and
> > in case of any errors the user is notified that the log of this test run
> > should be mailed to linux-ext4@ and the image should be preserved.
> 
> I certainly like the general concept!!
> 
> I wonder if the code to create a random filesystem and corrupting it
> should be kept as separate shell script, since it can be reused in
> another of interesting ways.  One thought would be to write a test
> script that mounts corrupted filesystems using UML and then does some
> exercises on it (tar cvf on the filesyste, random renames on the
> filesystem, rm -rf of all of the contents of the filesystems), to see
> whether we can provoke a kernel oops.

Well, there is a MOUNT_AFTER_CORRUPTION option in the script which can
be enhanced to do this.

Thanks,
Kalpak.

> Regards,
> 
> 							- Ted
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 13:07 Random corruption test for e2fsck Kalpak Shah
  2007-07-10 14:58 ` Theodore Tso
  2007-07-10 15:47 ` Eric Sandeen
@ 2007-07-11 15:20 ` Andi Kleen
  2007-07-12  5:19   ` Andreas Dilger
  2 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2007-07-11 15:20 UTC (permalink / raw)
  To: Kalpak Shah; +Cc: linux-ext4, TheodoreTso

Kalpak Shah <kalpak@clusterfs.com> writes:

> regression tests. It does the following:
> 1) Create an test fs and format it with ext2/3/4 and random selection of
> features.
> 2) Mount it and copy data into it.
> 3) Move around the blocks of the filesystem randomly causing corruption.
> Also overwrite some random blocks with garbage from /dev/urandom. Create
> a copy of this corrupted filesystem.

If you use a normal pseudo random number generator and print the seed
(e.g. create from the time) initially the image can be easily recreated 
later without shipping it around. /dev/urandom
is not really needed for this since you don't need cryptographic
strength randomness. Besides urandom data is precious and it's 
a pity to use it up needlessly.

bash has $RANDOM built in for this purpose.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 15:47 ` Eric Sandeen
@ 2007-07-11 16:03   ` Andreas Dilger
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Dilger @ 2007-07-11 16:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Kalpak Shah, linux-ext4, TheodoreTso

[-- Attachment #1: Type: text/plain, Size: 1568 bytes --]

On Jul 10, 2007  10:47 -0500, Eric Sandeen wrote:
> Seems like a pretty good idea.  I had played with such a thing using
> fsfuzzer... fsfuzzer always seemed at least as useful useful as a fsck
> tester than a kernel code tester anyway.  (OOC, did you look at fsfuzzer
> when you did this?)

The person who originally started it had looked at fsfuzzer, but I haven't
myself.

> My only concern is that since it's introducing random corruption, new
> things will probably pop up from time to time; when we do an rpm build
> for Fedora/RHEL, it automatically runs make check:
> 
> %check
> make check

Yes, we added this to our .spec file also, though I didn't realize rpm
had a %check stanza in it.  We just added it into the %build stanza,
but this is something that should be pushed upstream, since it really
makes sense to ensure e2fsprogs is built & running correctly.

> which seems like a reasonably good idea to me.  However, I'd rather not
> have last-minute build failures introduced by new random collection of
> bits that have never been seen before.  Maybe "make RANDOM=0 check" as
> an option would be a good idea for automated builds...?

I've added this to the updated version:
$ f_random_corruption=skip ./test_script f_random_corruption
f_random_corruption: skipped

I wonder if it makes sense to add this as a generic functionality to
test_script, something like:

	[ `eval \$$test_name` = "skip" ] && echo "skipped"

Latest version of the script is attached.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


[-- Attachment #2: e2fsprogs-tests-f_random_corruption.patch --]
[-- Type: text/plain, Size: 9028 bytes --]

Index: e2fsprogs-cfs/tests/f_random_corruption/script
===================================================================
--- /dev/null
+++ e2fsprogs-cfs/tests/f_random_corruption/script
@@ -0,0 +1,268 @@
+# This is to make sure that if this test fails other tests can still be run
+# instead of doing an exit. We break before the end of the loop.
+while (( 1 )); do
+[ "$f_random_corruption" = "skip" ] && echo "skipped" && break
+
+# choose block and inode sizes randomly
+BLK_SIZES=(1024 2048 4096)
+INODE_SIZES=(128 256 512 1024)
+
+SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
+RANDOM=$SEED
+
+IMAGE=${IMAGE:-$TMPFILE}
+DATE=`date '+%Y%m%d%H%M%S'`
+ARCHIVE=$IMAGE.$DATE
+SIZE=${SIZE:-$(( 192000 + RANDOM + RANDOM )) }
+FS_TYPE=${FS_TYPE:-ext3}
+BLK_SIZE=${BLK_SIZES[(( $RANDOM % ${#BLK_SIZES[*]} ))]}
+INODE_SIZE=${INODE_SIZES[(( $RANDOM % ${#INODE_SIZES[*]} ))]}
+DEF_FEATURES="sparse_super,filetype,resize_inode,dir_index"
+FEATURES=${FEATURES:-$DEF_FEATURES}
+MOUNT_OPTS="-o loop"
+MNTPT=$test_dir/temp
+OUT=$test_name.log
+FAILED=$test_name.failed
+OKFILE=$test_name.ok
+
+# Do you want to try and mount the filesystem?
+MOUNT_AFTER_CORRUPTION=${MOUNT_AFTER_CORRUPTION:-"no"}
+# Do you want to remove the files from the mounted filesystem?
+# Ideally use it only in test environment.
+REMOVE_FILES=${REMOVE_FILES:-"no"}
+
+# In KB
+CORRUPTION_SIZE=${CORRUPTION_SIZE:-64}
+CORRUPTION_ITERATIONS=${CORRUPTION_ITERATIONS:-5}
+
+MKFS=../misc/mke2fs
+E2FSCK=../e2fsck/e2fsck
+FIRST_FSCK_OPTS="-fyv"
+SECOND_FSCK_OPTS="-fyv"
+
+# Lets check if the image can fit in the current filesystem.
+BASE_DIR=`dirname $IMAGE`
+BASE_AVAIL_BLOCKS=`df -P -k $BASE_DIR  | awk '/%/ { print $4 }'`
+
+if (( BASE_AVAIL_BLOCKS < NUM_BLKS * (BLK_SIZE / 1024) )); then
+	echo "$BASE_DIR does not have enough space to accomodate test image."
+	echo "Skipping test...."
+	break;
+fi
+
+# Lets have a journal more times than not.
+HAVE_JOURNAL=$((RANDOM % 12 ))
+if (( HAVE_JOURNAL == 0 )); then
+	FS_TYPE="ext2"
+	HAVE_JOURNAL=""
+else
+	HAVE_JOURNAL="-j"
+fi
+
+# Experimental features should not be used too often.
+LAZY_BG=$(( $RANDOM % 12 ))
+if (( LAZY_BG == 0 )); then
+       	FEATURES=$FEATURES,lazy_bg
+fi
+META_BG=$(( $RANDOM % 12 ))
+if (( META_BG == 0 )); then
+       	FEATURES=$FEATURES,meta_bg
+fi
+
+modprobe ext4 2> /dev/null
+modprobe ext4dev 2> /dev/null
+
+# If ext4 is present in the kernel then we can play with ext4 options
+EXT4=`grep ext4 /proc/filesystems`
+if [ -n "$EXT4" ]; then
+	USE_EXT4=$((RANDOM % 2 ))
+	if (( USE_EXT4 == 1 )); then
+		FS_TYPE="ext4dev"
+	fi
+fi
+
+if [ "$FS_TYPE" = "ext4dev" ]; then
+	UNINIT_GROUPS=$((RANDOM % 12 ))
+	if (( UNINIT_GROUPS == 0 )); then
+		FEATURES=$FEATURES,uninit_groups
+	fi
+	EXPAND_ESIZE=$((RANDOM % 12 ))
+	if (( EXPAND_EISIZE == 0 )); then
+		FIRST_FSCK_OPTS=$FIRST_FSCK_OPTS," -E expand_extra_isize"
+	fi
+fi
+
+MKFS_OPTS=" $HAVE_JOURNAL -b $BLK_SIZE -I $INODE_SIZE -O $FEATURES"
+
+NUM_BLKS=$(( (SIZE * 1024) / BLK_SIZE ))
+
+log()
+{
+	[ "$VERBOSE" ] && echo "$*"
+	echo "$*" >> $OUT
+}
+
+error()
+{
+	log "$*"
+	echo "$*" >> $FAILED
+}
+
+unset_vars()
+{
+	unset IMAGE DATE ARCHIVE FS_TYPE SIZE BLK_SIZE MKFS_OPTS MOUNT_OPTS
+	unset E2FSCK FIRST_FSCK_OPTS SECOND_FSCK_OPTS OUT FAILED OKFILE
+}
+
+cleanup()
+{
+	[ "$1" ] && error "$*" || error "Error occured..."
+        umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+	cp $OUT $OUT.$DATE
+	echo " failed"
+	echo "*** This appears to be a bug in e2fsprogs ***"
+	echo "Please contact linux-ext4@vger.kernel.org for further assistance."
+	echo "Include $OUT.$DATE, and save $ARCHIVE locally for reference."
+	unset_vars
+	break;
+}
+
+echo -n "Random corruption test for e2fsck:"
+# Truncate the output log file
+rm -f $FAILED $OKFILE
+> $OUT
+
+get_random_location()
+{
+	total=$1
+
+	tmp=$(((RANDOM * 32768) % total))
+
+	# Try and have more corruption in metadata at the start of the
+	# filesystem.
+	if ((tmp % 3 == 0 || tmp % 5 == 0 || tmp % 7 == 0)); then
+		tmp=$((tmp % 32768))
+	fi
+
+	echo $tmp
+}
+
+make_fs_dirty()
+{
+        from=`get_random_location $NUM_BLKS`
+
+        # Number of blocks to write garbage into should be within fs and should
+        # not be too many.
+        num_blks_to_dirty=$((RANDOM % $1))
+
+        # write garbage into the selected blocks
+	[ ! -c /dev/urandom ] && return
+	log "writing ${num_blks_to_dirty}kB random garbage at offset ${from}kB"
+        dd if=/dev/urandom of=$IMAGE bs=1kB seek=$from conv=notrunc \
+		count=$num_blks_to_dirty bs=$BLK_SIZE >> $OUT 2>&1
+}
+
+
+touch $IMAGE
+log "Format the filesystem image..."
+log
+# Write some garbage blocks into the filesystem to make sure e2fsck has to do
+# a more difficult job than checking blocks of zeroes.
+log "Copy some random data into filesystem image...."
+make_fs_dirty 32768
+log "$MKFS $MKFS_OPTS -F $IMAGE  >> $OUT"
+$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT 2>&1
+if [ $? -ne 0 ]
+then
+	zero_size=`grep "Device size reported to be zero" $OUT`
+	short_write=`grep "Attempt to write block from filesystem resulted in short write" $OUT`
+
+	if (( zero_size != 0 || short_write != 0 )); then
+		echo "mkfs failed due to device size of 0 or a short write. This is harmless and need not be reported."
+	else
+		cleanup "mkfs failed - internal error during operation. Aborting random regression test..."
+	fi
+fi
+
+mkdir -p $MNTPT
+if [ $? -ne 0 ]; then
+	log "Failed to create or find mountpoint...."
+else
+	mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT |\
+		grep -v "only root can do that"
+	if [ $? -ne 0 ]; then
+		log "Unable to mount file system - skipped"
+	else
+		df -h $MNTPT >> $OUT
+		df -i $MNTPT >> $OUT
+		log "Copying data into the test filesystem..."
+
+		cp -r ../ $MNTPT >> $OUT 2>&1
+		sync
+		umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+	fi
+fi
+
+log "Corrupt the image by moving around blocks of data..."
+log
+for (( i = 0; i < $CORRUPTION_ITERATIONS; i++ )); do
+	from=`get_random_location $NUM_BLKS`
+	to=`get_random_location $NUM_BLKS`
+
+	log "Moving ${CORRUPTION_SIZE}kB from block ${from}kB to ${to}kB"
+	dd if=$IMAGE of=$IMAGE bs=1k count=$CORRUPTION_SIZE conv=notrunc skip=$from seek=$to >> $OUT 2>&1
+
+	# more corruption by overwriting blocks from within the filesystem.
+	make_fs_dirty $CORRUPTION_SIZE
+done
+
+# Copy the image for reproducing the bug.
+cp --sparse=always $IMAGE $ARCHIVE >> $OUT 2>&1
+
+log "First pass of fsck..."
+$E2FSCK $FIRST_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+
+# Run e2fsck for the second time and check if the problem gets solved.
+# After we can report error with pass1.
+[ $((RET & 1)) == 0 ] || log "The first fsck corrected errors"
+[ $((RET & 2)) == 0 ] || error "The first fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || error "The first fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || error "The first fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || error "The first fsck reports there was a usage error"
+[ $((RET & 32)) == 0 ] || error "The first fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || error "The first fsck reports a library error"
+
+log "---------------------------------------------------------"
+
+log "Second pass of fsck..."
+$E2FSCK $SECOND_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+[ $((RET & 1)) == 0 ] || cleanup "The second fsck corrected errors!"
+[ $((RET & 2)) == 0 ] || cleanup "The second fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || cleanup "The second fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || cleanup "The second fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || cleanup "The second fsck reports a usage error"
+[ $((RET & 32)) == 0 ] || cleanup "The second fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || cleanup "The second fsck reports a library error"
+
+[ -f $FAILED ] && cleanup
+
+if [ "$MOUNT_AFTER_CORRUPTION" = "yes" ]; then
+	mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+	[ $? -ne 0 ] && log "Unable to mount file system - skipped"
+
+	[ "$REMOVE_FILES" = "yes" ] && rm -rf $MNTPT/* >> $OUT
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+fi
+
+rm -f $ARCHIVE
+
+# Report success
+echo "ok"
+echo "Succeeded..." > $OKFILE
+
+unset_vars
+
+break; # this breaks out of the while(1) wrapping this test
+done
Index: e2fsprogs-cfs/tests/Makefile.in
===================================================================
--- e2fsprogs-cfs.orig/tests/Makefile.in
+++ e2fsprogs-cfs/tests/Makefile.in
@@ -24,6 +24,8 @@ test_script: test_script.in Makefile
 	@chmod +x test_script
 
 check:: test_script
+	@echo "Removing remnants of earlier tests..."
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img2*
 	@echo "Running e2fsprogs test suite..."
 	@echo " "
 	@./test_script
@@ -63,7 +65,7 @@ testend: test_script ${TDIR}/image
 	@echo "If all is well, edit ${TDIR}/name and rename ${TDIR}."
 
 clean::
-	$(RM) -f *~ *.log *.new *.failed *.ok test.img test_script
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img* test_script
 
 distclean:: clean
 	$(RM) -f Makefile

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
       [not found]   ` <20070711094410.GM6417@schatzie.adilger.int>
@ 2007-07-11 17:43     ` Theodore Tso
  2007-07-12  5:15       ` Andreas Dilger
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Tso @ 2007-07-11 17:43 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Kalpak Shah, linux-ext4

On Wed, Jul 11, 2007 at 03:44:11AM -0600, Andreas Dilger wrote:
> I've already found some kind of memory corruption in e2fsck as a result
> of running this as a regular user.  It segfaults in qsort() when freeing
> memory.  The image that causes this problem is attached, and it happens
> with the unpatched 1.39-WIP Mercurial tree of 2007-05-22.  Unfortunately,
> I don't have any decent memory debugging tools handy, so it isn't easy to
> see what is happening.  This is on an FC3 i686 system, in case it matters.

Thanks for sending me the test case!  Here's the patch, which will
probably cause me to do a 1.40.2 release sooner rather than later...

						- Ted

commit 5e9ba85c2694926eb784531d81ba107200cf1a51
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Wed Jul 11 13:42:43 2007 -0400

    Fix e2fsck segfault on very badly damaged filesystems
    
    A recent change to e2fsck_add_dir_info() to use tdb files to check
    filesystems with a very large number of filesystems had a typo which
    caused us to resize the wrong data structure.  This would cause a
    array overrun leading to malloc pointer corruptions.  Since we
    normally can very accurately predict how big the the dirinfo array
    needs to be, this bug only got triggered on very badly corrupted
    filesystems.
    
    Thanks to Andreas Dilger for submitting the test case which discovered
    this problem, and to Kalpak Shah for writing a random testing script
    which created the test case.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/e2fsck/dirinfo.c b/e2fsck/dirinfo.c
index aaa4d09..f583c62 100644
--- a/e2fsck/dirinfo.c
+++ b/e2fsck/dirinfo.c
@@ -126,7 +126,7 @@ void e2fsck_add_dir_info(e2fsck_t ctx, ext2_ino_t ino, ext2_ino_t parent)
 		ctx->dir_info->size += 10;
 		retval = ext2fs_resize_mem(old_size, ctx->dir_info->size *
 					   sizeof(struct dir_info),
-					   &ctx->dir_info);
+					   &ctx->dir_info->array);
 		if (retval) {
 			ctx->dir_info->size -= 10;
 			return;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-11 17:43     ` Theodore Tso
@ 2007-07-12  5:15       ` Andreas Dilger
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Dilger @ 2007-07-12  5:15 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Kalpak Shah, linux-ext4

On Jul 11, 2007  13:43 -0400, Theodore Tso wrote:
>     Fix e2fsck segfault on very badly damaged filesystems
>     
> --- a/e2fsck/dirinfo.c
> +++ b/e2fsck/dirinfo.c
> @@ -126,7 +126,7 @@ void e2fsck_add_dir_info(e2fsck_t ctx, ext2_ino_t ino, ext2_ino_t parent)
>  		ctx->dir_info->size += 10;
>  		retval = ext2fs_resize_mem(old_size, ctx->dir_info->size *
>  					   sizeof(struct dir_info),
> -					   &ctx->dir_info);
> +					   &ctx->dir_info->array);
>  		if (retval) {
>  			ctx->dir_info->size -= 10;
>  			return;

This appears to fix the problem.  I was previously able to crash e2fsck
within a couple of runs, now it is running in a loop w/o problems.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-11 15:20 ` Andi Kleen
@ 2007-07-12  5:19   ` Andreas Dilger
  2007-07-12 11:09     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Dilger @ 2007-07-12  5:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kalpak Shah, linux-ext4, TheodoreTso

On Jul 11, 2007  17:20 +0200, Andi Kleen wrote:
> If you use a normal pseudo random number generator and print the seed
> (e.g. create from the time) initially the image can be easily recreated 
> later without shipping it around. /dev/urandom
> is not really needed for this since you don't need cryptographic
> strength randomness. Besides urandom data is precious and it's 
> a pity to use it up needlessly.
> 
> bash has $RANDOM built in for this purpose.

Except it is a lot more efficient and easy to do
"dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
random numbers from bash.  We would also be at the mercy of the shell
being identical on the user and debugger's systems.

I don't think that running this test once in a blue moon on some
system is going to be a source of problems.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-10 14:58 ` Theodore Tso
                     ` (2 preceding siblings ...)
       [not found]   ` <20070711094410.GM6417@schatzie.adilger.int>
@ 2007-07-12  5:52   ` Andreas Dilger
  3 siblings, 0 replies; 15+ messages in thread
From: Andreas Dilger @ 2007-07-12  5:52 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Kalpak Shah, linux-ext4

I've got another one, but it isn't a show stopper I think.

If you format a filesystem with both resize_inode and meta_bg you get an
unfixable filesystem.  The bad news is that it appears that running
e2fsck on the filesystem is actually _causing_ the corruption in this case
(trying to rebuild the resize inode)?  For now, the answer is "don't do that".
The resize inode was never intended to be in use when meta_bg is enabled,
so we should just prevent this from the start.

mke2fs  -j -b 4096 -I 512 -O sparse_super,filetype,resize_inode,dir_index,lazy_bg,meta_bg -F /tmp/test.img 57852

I can run e2fsck on this repeatedly and it always complains the same way:

$ e2fsck -fy /tmp/test.img
e2fsck 1.39.cfs9 (7-Apr-2007)
Resize inode not valid.  Recreate? yes

Pass 1: Checking inodes, blocks, and sizes
Inode 8, i_blocks is 0, should be 32816.  Fix? yes

Reserved inode 9 (<Reserved inode 9>) has invalid mode.  Clear? yes
Deleted inode 17 has zero dtime.  Fix? yes
Deleted inode 25 has zero dtime.  Fix? yes
Deleted inode 33 has zero dtime.  Fix? yes
Deleted inode 41 has zero dtime.  Fix? yes
Deleted inode 49 has zero dtime.  Fix? yes
Deleted inode 57 has zero dtime.  Fix? yes
Deleted inode 65 has zero dtime.  Fix? yes
Deleted inode 73 has zero dtime.  Fix? yes
Deleted inode 81 has zero dtime.  Fix? yes
Deleted inode 89 has zero dtime.  Fix? yes

Pass 2: Checking directory structure
Inode 2 (???) has invalid mode (00).
Clear? yes

Entry '..' in ??? (2) has deleted/unused inode 2.  Clear? yes

Inode 11 (???) has invalid mode (00).
Clear? yes

Pass 3: Checking directory connectivity
Root inode not allocated.  Allocate? yes

/lost+found not found.  Create? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +0 +(2--14) +(16--3619) +(3621--3622) +(3625--7727)
Fix? yes

Free blocks count wrong for group #0 (25039, counted=25041).
Fix? yes

Free blocks count wrong (46503, counted=46505).
Fix? yes

Inode bitmap differences:  +(3--10) -16
Fix? yes

Free inodes count wrong for group #0 (28918, counted=28917).
Fix? yes

Directories count wrong for group #0 (3, counted=2).
Fix? yes

Free inodes count wrong (57846, counted=57845).
Fix? yes

/tmp/test.img: ***** FILE SYSTEM WAS MODIFIED *****
/tmp/test.img: 11/57856 files (9.1% non-contiguous), 11347/57852 blocks

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-12  5:19   ` Andreas Dilger
@ 2007-07-12 11:09     ` Andi Kleen
  2007-07-12 22:16       ` Andreas Dilger
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2007-07-12 11:09 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Andi Kleen, Kalpak Shah, linux-ext4, TheodoreTso

On Wed, Jul 11, 2007 at 11:19:38PM -0600, Andreas Dilger wrote:
> On Jul 11, 2007  17:20 +0200, Andi Kleen wrote:
> > If you use a normal pseudo random number generator and print the seed
> > (e.g. create from the time) initially the image can be easily recreated 
> > later without shipping it around. /dev/urandom
> > is not really needed for this since you don't need cryptographic
> > strength randomness. Besides urandom data is precious and it's 
> > a pity to use it up needlessly.
> > 
> > bash has $RANDOM built in for this purpose.
> 
> Except it is a lot more efficient and easy to do

Ah you chose to only address one sentence in my reply.
I thought only Linus liked to to do that.

If you're worried about efficiency it's trivial to
write a C program that generates bulk pseudo random numbers using
random(3) 

> "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> random numbers from bash.  We would also be at the mercy of the shell
> being identical on the user and debugger's systems.

With /dev/urandom you have the guarantee you'll never ever reproduce
it again. 

Andrea A. used to rant about people who use srand(time(NULL)) 
in benchmarks and it's sad these mistakes get repeated again and again.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-12 11:09     ` Andi Kleen
@ 2007-07-12 22:16       ` Andreas Dilger
  2007-07-12 22:24         ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Dilger @ 2007-07-12 22:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kalpak Shah, linux-ext4, TheodoreTso

On Jul 12, 2007  13:09 +0200, Andi Kleen wrote:
> > "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> > random numbers from bash.  We would also be at the mercy of the shell
> > being identical on the user and debugger's systems.
> 
> With /dev/urandom you have the guarantee you'll never ever reproduce
> it again. 

That is kind of the point of this testing - getting new test images for
each user that runs "make check" or "make rpm".  I'm We also save the
generated image before e2fsck touches it so that it can be used for
debugging if needed.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-12 22:16       ` Andreas Dilger
@ 2007-07-12 22:24         ` Andi Kleen
  2007-07-13  7:12           ` Kalpak Shah
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2007-07-12 22:24 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Andi Kleen, Kalpak Shah, linux-ext4, TheodoreTso

On Thu, Jul 12, 2007 at 04:16:24PM -0600, Andreas Dilger wrote:
> On Jul 12, 2007  13:09 +0200, Andi Kleen wrote:
> > > "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> > > random numbers from bash.  We would also be at the mercy of the shell
> > > being identical on the user and debugger's systems.
> > 
> > With /dev/urandom you have the guarantee you'll never ever reproduce
> > it again. 
> 
> That is kind of the point of this testing - getting new test images for
> each user that runs "make check" or "make rpm".  I'm We also save the
> generated image before e2fsck touches it so that it can be used for
> debugging if needed.

If you seed a good pseudo RNG with the time (or even a few bytes from 
/dev/urandom; although the time tends to work as well) you'll also effectively 
get a new image every time.

But the advantage is if you print out the seed the image
can be easily recreated just by re-running the fuzzer
with the same seed. No need to ship potentially huge images
around.

You can essentially compress your whole image into a single
number this way.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Random corruption test for e2fsck
  2007-07-12 22:24         ` Andi Kleen
@ 2007-07-13  7:12           ` Kalpak Shah
  0 siblings, 0 replies; 15+ messages in thread
From: Kalpak Shah @ 2007-07-13  7:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andreas Dilger, linux-ext4, TheodoreTso

On Fri, 2007-07-13 at 00:24 +0200, Andi Kleen wrote:
> On Thu, Jul 12, 2007 at 04:16:24PM -0600, Andreas Dilger wrote:
> > On Jul 12, 2007  13:09 +0200, Andi Kleen wrote:
> > > > "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> > > > random numbers from bash.  We would also be at the mercy of the shell
> > > > being identical on the user and debugger's systems.
> > > 
> > > With /dev/urandom you have the guarantee you'll never ever reproduce
> > > it again. 
> > 
> > That is kind of the point of this testing - getting new test images for
> > each user that runs "make check" or "make rpm".  I'm We also save the
> > generated image before e2fsck touches it so that it can be used for
> > debugging if needed.
> 
> If you seed a good pseudo RNG with the time (or even a few bytes from 
> /dev/urandom; although the time tends to work as well) you'll also effectively 
> get a new image every time.
> 
> But the advantage is if you print out the seed the image
> can be easily recreated just by re-running the fuzzer
> with the same seed. No need to ship potentially huge images
> around.
> 
> You can essentially compress your whole image into a single
> number this way.

Firstly the filesystem is populated with files from the e2fsprogs source
directory. The filesystem is also corrupted by copying blocks in the
filesystem to some arbitary locations within it.

	from=`get_random_location $NUM_BLKS`
	to=`get_random_location $NUM_BLKS`
	dd if=$IMAGE of=$IMAGE bs=1k count=$CORRUPTION_SIZE conv=notrunc skip=
$from seek=$to >> $OUT 2>&1

Then the filesystem also undergoes corruption with /dev/urandom. To be
able to recreate the exact same filesystem with the seed, the filesystem
would need to allocate the _same_ blocks and metadata on both the
clients and the testers machine, which is obviously not possible.

Thanks,
Kalpak.

> 
> -Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2007-07-13  7:11 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-10 13:07 Random corruption test for e2fsck Kalpak Shah
2007-07-10 14:58 ` Theodore Tso
2007-07-10 15:42   ` Eric Sandeen
2007-07-11  7:03   ` Kalpak Shah
     [not found]   ` <20070711094410.GM6417@schatzie.adilger.int>
2007-07-11 17:43     ` Theodore Tso
2007-07-12  5:15       ` Andreas Dilger
2007-07-12  5:52   ` Andreas Dilger
2007-07-10 15:47 ` Eric Sandeen
2007-07-11 16:03   ` Andreas Dilger
2007-07-11 15:20 ` Andi Kleen
2007-07-12  5:19   ` Andreas Dilger
2007-07-12 11:09     ` Andi Kleen
2007-07-12 22:16       ` Andreas Dilger
2007-07-12 22:24         ` Andi Kleen
2007-07-13  7:12           ` Kalpak Shah

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).