linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Behrens <sbehrens@giantdisaster.de>
To: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Testing permanent IO errors with btrfs
Date: Tue, 03 Jul 2012 18:26:40 +0200	[thread overview]
Message-ID: <4FF31D40.6080103@giantdisaster.de> (raw)
In-Reply-To: <4FF2FB5B.70909@giantdisaster.de>

On Tue, 03 Jul 2012 16:02:03 +0200, Stefan Behrens wrote:
On 7/3/2012 4:02 PM, Stefan Behrens wrote:
> On Mon, 2 Jul 2012 22:57:01 +0300, Alex Lyakas wrote:
>> Hi everybody,
>> I am interested to test how btrfs behaves when the underlying block
>> device starts returning permanent IO errors. To test this, I set up a
>> linear device-mapper, mapped to the block device and start IOs. At
>> some point, I switch the device-mapper's table to "error" table (using
>> "dmsetup reload" and "dmsetup resume").
>> With older version of btrfs, I experienced kernel panics and sometimes
>> the IO processes would not terminate.
> 
> Did you fully switch to the "error" table or to a mixture of linear and
> error? And how did you write to the disk?
> 
> I tried several times with the following script and everything was fine:
> the filesystem was forced readonly, the fs fell back to an consistent
> state and was mountable.
> 
> Therefore, more details how to reproduce the issue would be helpful, and
> if you have the kernel logs for the issue, that would be helpful, too.

The issue is that during unmount, the super blocks are written which
contain a root that was not written. This is done for the case that the
filesystem was flagged with BTRFS_SUPER_FLAG_ERROR before due to an
fatal I/O error. This means, the transaction aborting is not correct.

Call trace:
write_dev_supers+0x2e6/0x2f0 [btrfs]
write_all_supers+0x69a/0x870 [btrfs]
btrfs_error_commit_super+0x74/0x80 [btrfs]
close_ctree+0x2c0/0x330 [btrfs]

It is reproducible like this (with today's btrfs-next):

#!/bin/sh
echo 0 25165824 linear /dev/sdm 0 | dmsetup create foo
ls -alLF /dev/mapper/foo
mkfs.btrfs /dev/mapper/foo
mount /dev/mapper/foo /mnt
sync
# switch into I/O error mode:
echo 0 25165824 error | dmsetup reload foo
dmsetup resume foo
ls -alF /mnt
touch /mnt/1
ls -alF /mnt
sleep 35
# during the sleep 35, the "btrfs is forced readonly" and
# "btrfs: Transaction aborted" appears.
# switch dm back into I/O good mode:
echo 0 25165824 linear /dev/sdm 0 | dmsetup reload foo
dmsetup resume foo
sleep 1
umount /mnt
# now write_all_supers() is called and points to something
# that is never written to disk
btrfsck /dev/mapper/foo


> 
> #!/bin/sh
> set -x
> DISK=/dev/sdm
> # 12GB
> echo 0 25165824 linear $DISK 0 | dmsetup create foo
> dmsetup info foo
> ls -alLF /dev/mapper/foo
> dd if=/dev/zero of=/dev/mapper/foo bs=1M
> mkfs.btrfs /dev/mapper/foo
> mount /dev/mapper/foo /mnt
> dd if=/dev/zero of=/mnt/30G bs=1M count=12000 &
> sleep 10
> sync
> sleep 5
> echo 0 25165824 error | dmsetup reload foo
> dmsetup resume foo
> wait
> ls -alF /mnt
> umount /mnt
> sleep 1
> echo 0 25165824 linear $DISK 0 | dmsetup reload foo
> dmsetup resume foo
> sleep 1
> mount /dev/mapper/foo /mnt
> ls -alF /mnt
> umount /mnt
> btrfsck /dev/mapper/foo
> dmsetup remove foo
> 
> 
>>
>> Today I tested with
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git up
>> to commit 782b7bca60c5021213e87ab26bbf94deb7654b62 (Merge branch
>> 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
>> into for-linus). Things look better - kernel issued a warning about
>> aborting the transaction, and all IO processes exited with error.
>> However, after I unmounted (got some kernel warnings on unmount) and
>> mounted again, mount failed. Using btrfs-debug-tree, I followed the
>> __open_ctree_fd() path and discovered that:
>>
>> Problem 1: superblock tells transaction id (13) higher than the chunk
>> tree root (5). (The chunk tree itself has only one leaf, which is also
>> the root.)
>> Problem 2: when trying to read the block of the root of tree roots
>> (read_tree_block()), using bytenr stored at superblock->root and the
>> chunk tree (with low transaction id) to map the block, brings
>> corrupted nodes from both locations (DUP).
>>
>> At this point btrfs-debug-tree aborts, because the root of tree roots
>> is not available.
>>
>> Below is some information, pls let me know if any additional info is needed.
>>
>> Thanks,
>> Alex.
>>
>> The superblock structure:
>> $11 = {csum = "Ø\017\336", '\000' <repeats 27 times>, fsid =
>> "??CC\330\344M\327\272\237\003\363\065\266\274", <incomplete sequence
>> \372>, bytenr = 65536, flags = 1, magic = 5575266562640200287,
>> generation = 13, root = 29515776, chunk_root = 20971520, log_root = 0,
>> log_root_transid = 0, total_bytes = 21474836480, bytes_used =
>> 3674767360, root_dir_objectid = 6, num_devices = 1, sectorsize = 4096,
>> nodesize = 4096, leafsize = 4096, stripesize = 4096,
>> sys_chunk_array_size = 226, chunk_root_generation = 13, compat_flags =
>> 0, compat_ro_flags = 0, incompat_flags = 1, csum_type = 0, root_level
>> = 0 '\000', chunk_root_level = 0 '\000', log_root_level = 0 '\000',
>> dev_item = {devid = 1, total_bytes = 21474836480, bytes_used =
>> 7553941504, io_align = 4096, io_width = 4096, sector_size = 4096, type
>> = 0, generation = 0, start_offset = 4294967312, dev_group = 0,
>> seek_speed = 1 '\001', bandwidth = 0 '\000', uuid =
>> "\206\201\355\"\311\330L\234\221\305\016d\362\241\276", <incomplete
>> sequence \356>, fsid =
>> "??CC\330\344M\327\272\237\003\363\065\266\274", <incomplete sequence
>> \372>}, label = '\000' <repeats 255 times>, cache_generation = 0,
>> reserved = {0 <repeats 31 times>}, sys_chunk_array =
>> "\000\001\000\000\000\000\000\000\344\000\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\002\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\002\000\000\000\000\000\000\000\000\020\000\000\000\020\000\000\000\020\000\000\001\000\000\000\001",
>> '\000' <repeats 15 times>"\206,
>> \201\355\"\311\330L\234\221\305\016d\362\241\276\356\000\001\000\000\000\000\000\000\344\000\000@\001\000\000\000\000\000\000\200\000\000\000\000\000\002\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\"\000\000\000\000\000\000\000\000\000\001\000\000\000\001\000\000\020\000\000\002\000\000\000\001\000\000\000\000\000\000\000\000\000@\001\000\000\000\000\206\201\355\"\311\330L\234\221\305\016d\362\241\276\356\001\000\000\000\000\000\000\000\000\000\300\001\000\000\000\000\206\201\355\"\311\330L\234\221\305\016d\362\241\276\356",
>> '\000' <repeats 1821 times>, super_roots = {{tree_root = 29425664,
>> tree_root_gen = 12, chunk_root = 20975616, chunk_root_gen = 11,
>> extent_root = 29417472, extent_root_gen = 12, fs_root = 29429760,
>> fs_root_gen = 13, dev_root = 29372416, dev_root_gen = 11, csum_root =
>> 29376512, csum_root_gen = 5, total_bytes = 21474836480, bytes_used =
>> 2252435456, num_devices = 1, unsed_64 = {0, 0, 0, 0}, tree_root_level
>> = 0 '\000', chunk_root_level = 0 '\000', extent_root_level = 0 '\000',
>> fs_root_level = 1 '\001', dev_root_level = 0 '\000', csum_root_level =
>> 0 '\000', unused_8 = "\000\000\000\000\000\000\000\000\000"},
>> {tree_root = 29515776, tree_root_gen = 13, chunk_root = 20971520,
>> chunk_root_gen = 13, extent_root = 29507584, extent_root_gen = 13,
>> fs_root = 29519872, fs_root_gen = 14, dev_root = 29503488,
>> dev_root_gen = 13, csum_root = 29376512, csum_root_gen = 5,
>> total_bytes = 21474836480, bytes_used = 3674767360, num_devices = 1,
>> unsed_64 = {0, 0, 0, 0}, tree_root_level = 0 '\000', chunk_root_level
>> = 0 '\000', extent_root_level = 1 '\001', fs_root_level = 1 '\001',
>> dev_root_level = 0 '\000', csum_root_level = 0 '\000', unused_8 =
>> "\000\000\000\000\000\000\000\000\000"}, {tree_root = 29405184,
>> tree_root_gen = 10, chunk_root = 20971520, chunk_root_gen = 5,
>> extent_root = 29409280, extent_root_gen = 10, fs_root = 29360128,
>> fs_root_gen = 5, dev_root = 29392896, dev_root_gen = 8, csum_root =
>> 29376512, csum_root_gen = 5, total_bytes = 21474836480, bytes_used =
>> 28672,       num_devices = 1, unsed_64 = {0, 0, 0, 0}, tree_root_level
>> = 0 '\000', chunk_root_level = 0 '\000', extent_root_level = 0 '\000',
>>       fs_root_level = 0 '\000', dev_root_level = 0 '\000',
>> csum_root_level = 0 '\000', unused_8 =
>> "\000\000\000\000\000\000\000\000\000"}, { tree_root = 29396992,
>> tree_root_gen = 11, chunk_root = 20975616, chunk_root_gen = 11,
>> extent_root = 29417472, extent_root_gen = 12, fs_root = 29401088,
>> fs_root_gen = 12, dev_root = 29372416, dev_root_gen = 11, csum_root =
>> 29376512, csum_root_gen = 5, total_bytes = 21474836480, bytes_used =
>> 2125430784, num_devices = 1, unsed_64 = {0, 0, 0, 0}, tree_root_level
>> = 0 '\000', chunk_root_level = 0 '\000', extent_root_level = 0 '\000',
>> fs_root_level = 1 '\001', dev_root_level = 0 '\000', csum_root_level =
>> 0 '\000', unused_8 = "\000\000\000\000\000\000\000\000\000"}}}
>>
>> Chunk tree:
>> chunk tree
>> leaf 20971520 items 6 free space 3283 generation 5 owner 3
>> fs uuid 3f3f4343-d8e4-4dd7-ba9f-03f335b6bcfa
>> chunk uuid a872f18c-9a34-4875-809b-eb56f400bcd1
>> 	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
>> 		dev item devid 1 gen 0 total_bytes 21474836480 bytes used 2185232384
>> 	item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3817 itemsize 80
>> 		chunk length 4194304 owner 2 type 2 num_stripes 1
>> 			stripe 0 devid 1 offset 0
>> 	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3737 itemsize 80
>> 		chunk length 8388608 owner 2 type 4 num_stripes 1
>> 			stripe 0 devid 1 offset 4194304
>> 	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3657 itemsize 80
>> 		chunk length 8388608 owner 2 type 1 num_stripes 1
>> 			stripe 0 devid 1 offset 12582912
>> 	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 3545 itemsize 112
>> 		chunk length 8388608 owner 2 type 34 num_stripes 2
>> 			stripe 0 devid 1 offset 20971520
>> 			stripe 1 devid 1 offset 29360128
>> 	item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 3433 itemsize 112
>> 		chunk length 1073741824 owner 2 type 36 num_stripes 2
>> 			stripe 0 devid 1 offset 37748736
>> 			stripe 1 devid 1 offset 1111490560
>> total bytes 21474836480
>> bytes used 3674767360
>> uuid 3f3f4343-d8e4-4dd7-ba9f-03f335b6bcfa
>>
>> both attempts to read the root block:
>> $84 = {csum = "\006\000\000\000\006\000\000\000\002\000\000\000\002\000\000\000\016\000\000\000\016\000\000\000\002\000\000\000\002\000\000",
>>  fsid = "\006\000\000\000\006\000\000\000\002\000\000\000\002\000\000",
>> bytenr = 128849018910, flags = 8589934594,  chunk_tree_uuid =
>> "\006\000\000\000\006\000\000\000\002\000\000\000\002\000\000",
>> generation = 60129542158, owner = 8589934594, nritems = 6,  level = 6
>> '\006'}
>>
>> $90 = {csum = "\aYZ9\aYZ:\aYZ;\aYZ<\aYZ=\aYZ>\aYZ?\aYZ@", fsid =
>> "\aYZA\aYZB\aYZC\aYZD", bytenr = 5069462218322106631, flags =
>> 5213577406431516935, chunk_tree_uuid = "\aYZI\aYZJ\aYZK\aYZL",
>> generation = 5645922970759747847, owner = 5790038158869158151, nritems
>> = 1364875527, level = 7 '\a'}
>>
>> btrfs-find-root output:
>> parent transid verify failed on 20971520 wanted 13 found 5
>> parent transid verify failed on 20971520 wanted 13 found 5
>> parent transid verify failed on 20971520 wanted 13 found 5
>> parent transid verify failed on 20971520 wanted 13 found 5
>> Ignoring transid failure
>> Super think's the tree root is at 29515776, chunk root 20971520
>> Well block 4194304 seems great, but generation doesn't match, have=3, want=13
>> Well block 4206592 seems great, but generation doesn't match, have=4, want=13
>> Well block 29396992 seems great, but generation doesn't match, have=11, want=13
>> Well block 29405184 seems great, but generation doesn't match, have=10, want=13
>> Well block 29425664 seems great, but generation doesn't match, have=12, want=13
>> No more metdata to scan, exiting

  reply	other threads:[~2012-07-03 16:26 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-02 19:57 Testing permanent IO errors with btrfs Alex Lyakas
2012-07-03 14:02 ` Stefan Behrens
2012-07-03 16:26   ` Stefan Behrens [this message]
2012-07-03 16:29     ` Alex Lyakas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FF31D40.6080103@giantdisaster.de \
    --to=sbehrens@giantdisaster.de \
    --cc=alex.bolshoy.btrfs@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).