* Strange data corruption with RW snapshots
@ 2005-10-01 11:20 Eduard Bloch
2005-10-05 15:49 ` Kevin Corry
0 siblings, 1 reply; 4+ messages in thread
From: Eduard Bloch @ 2005-10-01 11:20 UTC (permalink / raw)
To: dm-devel
Hello people,
I am trying to create a modified KNOPPIX using devmapper RW snapshots
instead of its unionfs solution. The problem is: I get corrupted data
and cannot see where it comes from. The relevant part of setup code
is attached below. The resulting snapshot volume looks OK at first
glance (dmsetup status) and can be mounted (ext2); the directory tree is
displayed, but most files return corrupted data. I cannot see the
problem, a simulation with the same setup method in another environment
(kernel 2.6.13) did work as expected. Also Ubuntu uses that method for
their Live CD and there it works as well.
I looked for a good and up-to-date documentation of the snapshot target
but could not find anything. I hope someone can give me a clue, I cannot
see how I could solve the problem now.
# state: KNOPDEV is /dev/cloop, a read-only block device mounted on
# /KNOPPIX; DMSETUP is a statically linked build from devmapper-1.01.04;
# kernel is 2.6.12 (.4 or so)
COW_NAME=COW
CHUNK_SIZE=8
SNAPSHOT_NAME=knop_rw
VOL_SIZE=$(blockdev --getsize $KNOPDEV)
# create space for the cow data and assign it to a free loop device COWDEV
echo | dd bs=1 of=/ramdisk/$COW_NAME seek=`expr 512 \* $VOL_SIZE` count=1 2>/dev/null
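# losetup succeeds on a device that is already attached, so the first
# device on which it fails is taken as free
for x in /dev/loop* ; do
    if losetup $x >/dev/null 2>&1 ; then
        continue
    else
        COWDEV=$x
        break
    fi
done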
losetup $COWDEV /ramdisk/$COW_NAME
umount /KNOPPIX
# critical part, no /KNOPPIX/* system tools for now
$DMSETUP mknodes
echo "0 $VOL_SIZE linear /dev/cloop 0" | $DMSETUP create knop_ro
# this was expected to be a workaround, mapping the loop device to a
# devmapper volume. It did also fail when using pure $COWDEV
echo "0 $VOL_SIZE linear $COWDEV 0" | $DMSETUP create cow
echo "0 $VOL_SIZE snapshot /dev/mapper/knop_ro /dev/mapper/cow p $CHUNK_SIZE" | $DMSETUP create $SNAPSHOT_NAME
$DMSETUP mknodes $SNAPSHOT_NAME
#remount
/static/mount /dev/mapper/$SNAPSHOT_NAME /KNOPPIX
# we are back, rewritable
bash
# And this bash command running on /KNOPPIX already fails with obscure
# errors.
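
For reference, the size arithmetic and the snapshot table line used above can be sketched in isolation. This is only a minimal sketch: make_sparse_cow and snapshot_table are hypothetical helper names, not part of dmsetup, and the sparse-file trick assumes GNU dd. Sizes are in 512-byte sectors, as reported by blockdev --getsize, and the chunk size is in sectors too (8 sectors = 4 KiB).

```shell
# Hypothetical helpers for illustration; not taken from the original script.

# Create a sparse file of $2 sectors (512 bytes each) at path $1.
# With count=0, GNU dd copies nothing and only extends the file to the
# seek offset, so no data blocks are allocated.
make_sparse_cow() {
    dd if=/dev/zero of="$1" bs=512 seek="$2" count=0 2>/dev/null
}

# Print a device-mapper table line of the form
#   <start> <length> snapshot <origin> <cow device> <persistent?> <chunk size>
# $1 = length in sectors, $2 = origin, $3 = COW device, $4 = chunk size
# in sectors (defaults to 8, the value used in this thread).
snapshot_table() {
    echo "0 $1 snapshot $2 $3 p ${4:-8}"
}

# Example use (needs root and real devices, so it is left commented out):
# VOL_SIZE=$(blockdev --getsize /dev/cloop)
# make_sparse_cow /ramdisk/COW "$VOL_SIZE"
# snapshot_table "$VOL_SIZE" /dev/cloop "$COWDEV" 8 | dmsetup create knop_rw
```

Printing the table line before piping it to dmsetup makes it easy to check that the length matches the origin device exactly.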
Thanks,
Eduard.
--
<Getty> LOL, the phone number of the Arbeitsamt Mönchengladbach is really 404-0?
<Getty> Is that supposed to be a bad joke?
* Re: Strange data corruption with RW snapshots
2005-10-01 11:20 Strange data corruption with RW snapshots Eduard Bloch
@ 2005-10-05 15:49 ` Kevin Corry
2005-10-05 16:16 ` Eduard Bloch
0 siblings, 1 reply; 4+ messages in thread
From: Kevin Corry @ 2005-10-05 15:49 UTC (permalink / raw)
To: dm-devel
Hi Eduard,
On Sat October 1 2005 6:20 am, Eduard Bloch wrote:
> I am trying to create a modified KNOPPIX using devmapper RW snapshots
> instead of its unionfs solution. The problem is: I get corrupted data
> and cannot see where it comes from. The relevant part of setup code
> is attached below. The resulting snapshot volume looks OK at first
> glance (dmsetup status) and can be mounted (ext2); the directory tree is
> displayed, but most files return corrupted data. I cannot see the
> problem, a simulation with the same setup method in another environment
> (kernel 2.6.13) did work as expected. Also Ubuntu uses that method for
> their Live CD and there it works as well.
>
> I looked for a good and up-to-date documentation of the snapshot target
> but could not find anything. I hope someone can give me a clue, I cannot
> see how I could solve the problem now.
>
> # state: KNOPDEV is /dev/cloop, a read-only block device mounted on
> # /KNOPPIX; DMSETUP is a statically linked build from devmapper-1.01.04;
> # kernel is 2.6.12 (.4 or so)
> COW_NAME=COW
> CHUNK_SIZE=8
> SNAPSHOT_NAME=knop_rw
> VOL_SIZE=$(blockdev --getsize $KNOPDEV)
>
> # create space for the cow data and assign it to a free loop device COWDEV
> echo | dd bs=1 of=/ramdisk/$COW_NAME seek=`expr 512 \* $VOL_SIZE` count=1 \
> 2>/dev/null
> for x in /dev/loop* ; do
> if losetup $x >/dev/null 2>&1 ; then
> continue
> else
> COWDEV=$x
> break
> fi
> done
> losetup $COWDEV /ramdisk/$COW_NAME
>
> umount /KNOPPIX
> # critical part, no /KNOPPIX/* system tools for now
> $DMSETUP mknodes
> echo "0 $VOL_SIZE linear /dev/cloop 0" | $DMSETUP create knop_ro
If /dev/cloop is read-only (and apparently unmounted anyway), you shouldn't
need to create the knop_ro device. You can just use /dev/cloop directly.
> # this was expected to be a workaround, mapping the loop device to a
> # devmapper volume. It did also fail when using pure $COWDEV
> echo "0 $VOL_SIZE linear $COWDEV 0" | $DMSETUP create cow
This also should not be necessary. What kind of failure do you get if you use
$COWDEV directly?
> echo "0 $VOL_SIZE snapshot /dev/mapper/knop_ro /dev/mapper/cow p \
> $CHUNK_SIZE" | $DMSETUP create $SNAPSHOT_NAME
Based on what I mentioned above, this could be:
echo "0 $VOL_SIZE snapshot /dev/cloop $COWDEV p $CHUNK_SIZE" | \
$DMSETUP create $SNAPSHOT_NAME
> $DMSETUP mknodes $SNAPSHOT_NAME
>
> #remount
> /static/mount /dev/mapper/$SNAPSHOT_NAME /KNOPPIX
> # we are back, rewritable
> bash
> # And this bash command running on /KNOPPIX already fails with obscure
> # errors.
I just ran a similar test (on a 2.6.12 kernel):
# Create loop devices
dd if=/dev/zero of=loop_file0 bs=1M count=1 seek=1024
dd if=/dev/zero of=loop_file1 bs=1M count=1 seek=1024
losetup /dev/loop0 loop_file0
losetup /dev/loop1 loop_file1
# Create filesystem on first loop device and copy some data to it.
mkfs.ext3 /dev/loop0
mount /dev/loop0 /mnt/loop0/
cp -a /usr/src/linux-2.6.12 /mnt/loop0/
umount /mnt/loop0
# Create the snapshot.
echo "0 `blockdev --getsize /dev/loop0` snapshot /dev/loop0 /dev/loop1 p 8" \
| dmsetup create loop_snap
# Mount the snapshot. Compare data and modify data.
mount /dev/mapper/loop_snap /mnt/loop_snap
cd /mnt/loop_snap
diff -Naur --brief linux-2.6.12 /usr/src/linux-2.6.12
cp -a /usr/src/linux-2.6.11 .
rm -rf linux-2.6.12
diff -Naur --brief linux-2.6.11 /usr/src/linux-2.6.11
cd /
# Unmount the snapshot. Deactivate and reactivate the snapshot. Mount again
# and compare and modify data again.
umount /mnt/loop_snap
dmsetup remove loop_snap
echo "0 `blockdev --getsize /dev/loop0` snapshot /dev/loop0 /dev/loop1 p 8" \
| dmsetup create loop_snap
mount /dev/mapper/loop_snap /mnt/loop_snap
cd /mnt/loop_snap
diff -Naur --brief linux-2.6.11 /usr/src/linux-2.6.11
rm -rf linux-2.6.11
I have not seen any kind of errors at all. What specific errors are you
getting? Anything in the kernel log? Can you test again without the extra
dm-linear devices?
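
The compare steps in the test above can be wrapped so that they yield a single number to report. A minimal sketch, assuming GNU diff; count_tree_diffs is a hypothetical helper name, and it deliberately ignores diff's exit status because an exit status of 1 only means that differences were found.

```shell
# Hypothetical helper: print the number of files that differ between
# the directory trees $1 and $2.
# -N treats a file missing on one side as empty, -r recurses, and
# --brief prints one line per differing file instead of the full diff.
count_tree_diffs() {
    { diff -Nr --brief "$1" "$2" 2>/dev/null || true; } | wc -l
}

# Example use against the mounted snapshot (left commented out):
# count_tree_diffs /usr/src/linux-2.6.12 /mnt/loop_snap/linux-2.6.12
```

A result of 0 on a freshly created snapshot is what the test above observed; anything else points at the origin or COW device.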
--
Kevin Corry
kevcorry@us.ibm.com
http://www.ibm.com/linux/
http://evms.sourceforge.net/
* Re: Strange data corruption with RW snapshots
2005-10-05 15:49 ` Kevin Corry
@ 2005-10-05 16:16 ` Eduard Bloch
2005-10-06 16:59 ` Eduard Bloch
0 siblings, 1 reply; 4+ messages in thread
From: Eduard Bloch @ 2005-10-05 16:16 UTC (permalink / raw)
To: dm-devel
#include <hallo.h>
* Kevin Corry [Wed, Oct 05 2005, 10:49:30AM]:
> > # this was expected to be a workaround, mapping the loop device to a
> > # devmapper volume. It did also fail when using pure $COWDEV
> > echo "0 $VOL_SIZE linear $COWDEV 0" | $DMSETUP create cow
>
> This also should not be necessary. What kind of failure do you get if you use
> $COWDEV directly?
As said, I did also use COWDEV directly and got the same errors.
> > #remount
> > /static/mount /dev/mapper/$SNAPSHOT_NAME /KNOPPIX
> > # we are back, rewritable
> > bash
> > # And this bash command running on /KNOPPIX already fails with obscure
> > # errors.
>
> I just ran a similar test (on a 2.6.12 kernel):
>
> # Create loop devices
> dd if=/dev/zero of=loop_file0 bs=1M count=1 seek=1024
> dd if=/dev/zero of=loop_file1 bs=1M count=1 seek=1024
> losetup /dev/loop0 loop_file0
> losetup /dev/loop1 loop_file1
It is similar, but not the same. In the meantime, I found out that this
mysterious data corruption happens only if you use a cloop as the origin
device and a rewritable snapshot. Exactly the same problem has been
reported here, in
http://www.redhat.com/archives/dm-devel/2005-August/msg00081.html with
no useful results.
And I could reproduce it with kernel 2.6.13 as well, but not when using the
loop driver as the backend. Using snapshot-origin or another "linear" mapped
device as the cow device does not solve it. To reproduce the problem, do
the following:
Install the cloop driver and its utils from
http://ftp.de.debian.org/debian/pool/main/c/cloop/cloop_2.02.1+eb.8.tar.gz ,
unpack, do "make" and "mknod /dev/cloop0 b 240 0".
# Create filesystem image and cow file
dd if=/dev/zero of=loop_file0 bs=1M count=1 seek=1024
dd if=/dev/zero of=loop_file1 bs=1M count=1 seek=1024
# create the filesystem, mount the image, copy data, unmount it
# (releasing the loop device), then compress it for cloop
mke2fs -F loop_file0
mkdir tmp
mount loop_file0 tmp -oloop
cp -a /bin tmp/
umount tmp
create_compressed_fs loop_file0 loop_file0.cloop
# attach the compressed image to the cloop device and set up the
# snapshot
losetup /dev/cloop0 loop_file0.cloop
losetup /dev/loop0 loop_file1
echo "0 `blockdev --getsize /dev/cloop0` snapshot /dev/cloop0 /dev/loop0 p 8" \
| dmsetup create loop_snap
# Mount the snapshot. Compare data
mount /dev/mapper/loop_snap /mnt/loop_snap
diff -Nr /bin /mnt/loop_snap/bin
You will get different data in almost every file. Although the filesystem
mounts without problems, the data returned on random reads
(file access) looks like a salad of randomly permuted blocks.
In addition, a
# cmp loop_file0 /dev/mapper/loop_snap
reports no differences until the read reaches the last blocks of the
device. There cloop begins to print "funny" kernel messages about
decoding errors.
So I must assume there is something wrong with the cloop driver, and
dm-snapshot is the only way to reproduce that. The symptoms look similar
to non-reentrant programs, but I didn't try to inspect the exact
behaviour of dm-snapshot when reading from cloop device yet.
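
One way to probe the difference between sequential and random reads is to compare the reference image and the device block by block in shuffled order. What follows is only a rough sketch, assuming GNU coreutils (seq, shuf, md5sum); random_block_compare is a hypothetical helper and not part of cloop or device-mapper. The size is taken from the reference file with wc -c, so the reference must be a regular file such as loop_file0.

```shell
# Hypothetical helper: compare $1 (reference file) and $2 (file or block
# device) in blocks of $3 bytes (default 4096), visiting the blocks in
# random order. Prints "block N differs" for each mismatch and returns
# non-zero if any block differs.
random_block_compare() {
    rbc_bs=${3:-4096}
    rbc_size=$(wc -c < "$1")
    rbc_blocks=$(( (rbc_size + rbc_bs - 1) / rbc_bs ))
    rbc_bad=0
    for rbc_i in $(seq 0 $((rbc_blocks - 1)) | shuf); do
        # Read one block from each side at the same offset and hash it.
        rbc_a=$(dd if="$1" bs="$rbc_bs" skip="$rbc_i" count=1 2>/dev/null | md5sum)
        rbc_b=$(dd if="$2" bs="$rbc_bs" skip="$rbc_i" count=1 2>/dev/null | md5sum)
        if [ "$rbc_a" != "$rbc_b" ]; then
            echo "block $rbc_i differs"
            rbc_bad=1
        fi
    done
    return $rbc_bad
}

# Example use with the reproduction above (left commented out):
# random_block_compare loop_file0 /dev/mapper/loop_snap 4096
```

If the shuffled comparison reports mismatches where a plain cmp does not, that would support the suspicion that the read order matters. Note that repeated reads may be served from the page cache, so results can vary between runs.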
Eduard.
--
Keep calm: in a hundred years it will all be over.
-- Ralph Waldo Emerson
* Re: Strange data corruption with RW snapshots
2005-10-05 16:16 ` Eduard Bloch
@ 2005-10-06 16:59 ` Eduard Bloch
0 siblings, 0 replies; 4+ messages in thread
From: Eduard Bloch @ 2005-10-06 16:59 UTC (permalink / raw)
To: dm-devel
#include <hallo.h>
* Eduard Bloch [Wed, Oct 05 2005, 06:16:10PM]:
> #include <hallo.h>
> * Kevin Corry [Wed, Oct 05 2005, 10:49:30AM]:
> So I must assume there is something wrong with the cloop driver, and
> dm-snapshot is the only way to reproduce that. The symptoms look similar
> to non-reentrant programs, but I didn't try to inspect the exact
> behaviour of dm-snapshot when reading from cloop device yet.
The problem was in the cloop driver. It had also been found and fixed by
the Ubuntu guys, without telling anyone else. The new cloop packages
will solve the problem. Sorry for the noise.
Eduard.
--
He who talks a lot has less time to think.
-- Indian proverb