* Corrupt file in subvolume
From: dima @ 2011-10-10 2:14 UTC
To: linux-btrfs
Hello,
Somehow my subvolume with /home got corrupted. When I booted the machine this
morning (after a perfectly normal shutdown), it gave me a bunch of kernel errors.
I found out that if I comment out my /home entry in fstab, it boots OK, so / is
not corrupted. I then booted from the live CD and set "clear_cache" for /home
instead of "inode_cache,space_cache":
/dev/disk/by-label/btrfs-root / btrfs defaults,noatime,inode_cache,space_cache 0 0
/dev/disk/by-label/btrfs-root /var/lib/btrfs-root btrfs defaults,noatime,subvolid=0 0 0
#/dev/disk/by-label/btrfs-root /home btrfs defaults,noatime,subvol=__home-new,inode_cache,space_cache 0 0
/dev/disk/by-label/btrfs-root /home btrfs defaults,noatime,subvol=__home-new,clear_cache 0 0
/var/lib/btrfs-root/boot /boot none bind 0 0
After that, I could mount the /home subvolume.
I also found the corrupted file:
? -????????? ? ? ? ? ? 13.4.4.40.js
Whenever I try to access it, I get an Input/output error and the following
error in kernel.log:
Oct 10 10:38:03 yukikaze kernel: [34592.275080] parent transid verify failed on 105930436608 wanted 58565 found 134248
Oct 10 10:38:03 yukikaze kernel: [34592.275161] BUG: scheduling while atomic: ls/2545/0x00000002
Oct 10 10:38:03 yukikaze kernel: [34592.275166] Modules linked in: ipv6 loop usb_storage uas radeon snd_hda_codec_hdmi ttm snd_hda_codec_via drm_kms_helper ppdev sg snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd edac_core soundcore sp5100_tco r8169 drm firewire_ohci firewire_core i2c_algo_bit i2c_piix4 i2c_core edac_mce_amd parport_pc shpchp parport pci_hotplug pcspkr evdev mii serio_raw k10temp psmouse asus_atk0110 snd_page_alloc crc_itu_t wmi button powernow_k8 processor mperf sr_mod cdrom sd_mod pata_acpi usbhid hid ohci_hcd pata_atiixp ahci libahci libata ehci_hcd scsi_mod usbcore
Oct 10 10:38:03 yukikaze kernel: [34592.275268] Pid: 2545, comm: ls Not tainted 3.0.6-aya1 #3
Oct 10 10:38:03 yukikaze kernel: [34592.275273] Call Trace:
Oct 10 10:38:03 yukikaze kernel: [34592.275288] [<ffffffff8143fd33>] __schedule_bug+0x5f/0x64
Oct 10 10:38:03 yukikaze kernel: [34592.275298] [<ffffffff81447c89>] __schedule+0x7c9/0x980
Oct 10 10:38:03 yukikaze kernel: [34592.275310] [<ffffffff812705e7>] ? submit_bio+0x87/0x110
Oct 10 10:38:03 yukikaze kernel: [34592.275320] [<ffffffff81009e29>] ? read_tsc+0x9/0x20
Oct 10 10:38:03 yukikaze kernel: [34592.275329] [<ffffffff8107e7bd>] ? ktime_get_ts+0xad/0xe0
Oct 10 10:38:03 yukikaze kernel: [34592.275338] [<ffffffff810eb550>] ? __lock_page+0x70/0x70
Oct 10 10:38:03 yukikaze kernel: [34592.275346] [<ffffffff8104ac6f>] schedule+0x3f/0x60
Oct 10 10:38:03 yukikaze kernel: [34592.275354] [<ffffffff81447fbf>] io_schedule+0x8f/0xd0
Oct 10 10:38:03 yukikaze kernel: [34592.275362] [<ffffffff810eb55e>] sleep_on_page+0xe/0x20
Oct 10 10:38:03 yukikaze kernel: [34592.275370] [<ffffffff8144876f>] __wait_on_bit+0x5f/0x90
Oct 10 10:38:03 yukikaze kernel: [34592.275379] [<ffffffff810eb748>] wait_on_page_bit+0x78/0x80
Oct 10 10:38:03 yukikaze kernel: [34592.275388] [<ffffffff81074140>] ? autoremove_wake_function+0x40/0x40
Oct 10 10:38:03 yukikaze kernel: [34592.275397] [<ffffffff81210902>] read_extent_buffer_pages+0x412/0x480
Oct 10 10:38:03 yukikaze kernel: [34592.275405] [<ffffffff811e4410>] ? verify_parent_transid+0x240/0x240
Oct 10 10:38:03 yukikaze kernel: [34592.275414] [<ffffffff811e529a>] btree_read_extent_buffer_pages.isra.61+0x8a/0xc0
Oct 10 10:38:03 yukikaze kernel: [34592.275422] [<ffffffff811e6bf1>] read_tree_block+0x41/0x60
Oct 10 10:38:03 yukikaze kernel: [34592.275431] [<ffffffff811cbaab>] read_block_for_search.isra.33+0x1fb/0x500
Oct 10 10:38:03 yukikaze kernel: [34592.275439] [<ffffffff811cb0bd>] ? generic_bin_search.constprop.35+0x17d/0x1f0
Oct 10 10:38:03 yukikaze kernel: [34592.275447] [<ffffffff811cb214>] ? bin_search+0xe4/0x130
Oct 10 10:38:03 yukikaze kernel: [34592.275454] [<ffffffff811ceb48>] btrfs_search_slot+0x358/0x900
Oct 10 10:38:03 yukikaze kernel: [34592.275464] [<ffffffff811e310f>] btrfs_lookup_inode+0x2f/0xa0
Oct 10 10:38:03 yukikaze kernel: [34592.275473] [<ffffffff811f6e38>] btrfs_iget+0x108/0x4d0
Oct 10 10:38:03 yukikaze kernel: [34592.275482] [<ffffffff811e0b7f>] ? btrfs_lookup_dir_item+0xdf/0x110
Oct 10 10:38:03 yukikaze kernel: [34592.275491] [<ffffffff811f78f3>] btrfs_lookup_dentry+0x383/0x480
Oct 10 10:38:03 yukikaze kernel: [34592.275499] [<ffffffff811367b9>] ? kmem_cache_alloc+0x149/0x160
Oct 10 10:38:03 yukikaze kernel: [34592.275508] [<ffffffff811f7a06>] btrfs_lookup+0x16/0x30
Oct 10 10:38:03 yukikaze kernel: [34592.275515] [<ffffffff811561d5>] d_alloc_and_lookup+0x45/0x90
Oct 10 10:38:03 yukikaze kernel: [34592.275524] [<ffffffff811632b5>] ? d_lookup+0x35/0x60
Oct 10 10:38:03 yukikaze kernel: [34592.275531] [<ffffffff81157a3e>] do_lookup+0x29e/0x310
Oct 10 10:38:03 yukikaze kernel: [34592.275538] [<ffffffff811586bc>] path_lookupat+0x11c/0x700
Oct 10 10:38:03 yukikaze kernel: [34592.275546] [<ffffffff81158cd1>] do_path_lookup+0x31/0xc0
Oct 10 10:38:03 yukikaze kernel: [34592.275553] [<ffffffff8115a909>] user_path_at+0x59/0xa0
Oct 10 10:38:03 yukikaze kernel: [34592.275561] [<ffffffff8102f8f0>] ? do_page_fault+0x1c0/0x4d0
Oct 10 10:38:03 yukikaze kernel: [34592.275570] [<ffffffff8114fd64>] vfs_fstatat+0x44/0x70
Oct 10 10:38:03 yukikaze kernel: [34592.275578] [<ffffffff810677fd>] ? do_sigaction+0x12d/0x1f0
Oct 10 10:38:03 yukikaze kernel: [34592.275586] [<ffffffff8114fdcb>] vfs_stat+0x1b/0x20
Oct 10 10:38:03 yukikaze kernel: [34592.275593] [<ffffffff8114ff0a>] sys_newstat+0x1a/0x40
Oct 10 10:38:03 yukikaze kernel: [34592.275601] [<ffffffff81067bcd>] ? sys_rt_sigaction+0x8d/0xc0
Oct 10 10:38:03 yukikaze kernel: [34592.275610] [<ffffffff8144b055>] ? page_fault+0x25/0x30
Oct 10 10:38:03 yukikaze kernel: [34592.275617] [<ffffffff8144b602>] system_call_fastpath+0x16/0x1b
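The call trace above starts from a plain stat(2) (sys_newstat -> vfs_stat -> ... -> btrfs_lookup -> btrfs_iget -> read_tree_block), so the directory entry itself can still be read, but loading the inode it points to fails. That is why ls can print the file name while every inode-derived column is a question mark. A minimal userspace sketch of that behaviour follows; format_ls_entry() is a hypothetical helper, not GNU ls source, and it hardcodes the '-' type character on the assumption that the entry is a regular file (real ls can take the type from the directory entry).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: format an "ls -li"-like line for `name`, found at
 * `path`. Returns 0 on success or -errno if the inode cannot be loaded. */
int format_ls_entry(const char *path, const char *name, char *buf, size_t len)
{
    struct stat st;

    if (lstat(path, &st) != 0) {
        int err = errno; /* EIO for the damaged file in this thread */
        /* The name is known from the directory listing, but every field
         * that needs the inode is unknown. '-' is hardcoded (assumption). */
        snprintf(buf, len, "? -????????? ? ? ? ? ? %s", name);
        return -err;
    }
    /* Simplified success case: inode number, type character, size, name. */
    snprintf(buf, len, "%llu %c %lld %s",
             (unsigned long long)st.st_ino,
             S_ISDIR(st.st_mode) ? 'd' : '-',
             (long long)st.st_size, name);
    return 0;
}
```

A nonexistent path fails with ENOENT rather than EIO, but it takes the same question-mark branch.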
My question: is it possible to delete this rogue file somehow, or to repair it?
I tried to delete the directory that contained it, but got the same Input/output
error.
Any help is appreciated.
I should mention that I had the very same error a couple of months ago, when
about 30 files in my /home got corrupted this way. I had to create a new
subvolume for /home (__home-new) and restore the missing files from backup.
When I tried to delete the corrupted subvolume, it gave me a bunch of kernel
errors, but when I repeated the command, it completed OK. However, after a
reboot the space from this subvolume was not recovered. I tried running a
balance after that, but after a couple of hours I am still only getting this
message about 22 extents in my kernel.log:
Oct 10 11:03:22 yukikaze kernel: [36111.396313] btrfs: found 22 extents
Oct 10 11:03:27 yukikaze kernel: [36116.922236] btrfs: found 22 extents
Oct 10 11:03:33 yukikaze kernel: [36122.922488] btrfs: found 22 extents
and no relocation messages, so I think it got stuck :(
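The reasoning above (the same "found 22 extents" count repeating with no relocation messages in between) can be written down as a small log heuristic. This is a hypothetical helper, not a btrfs tool, and it assumes that the kernel's relocation progress message contains the text "relocating block group"; the threshold of repeats is an arbitrary parameter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical heuristic: returns 1 if `lines` contains at least
 * `min_repeats` consecutive "btrfs: found N extents" messages with the same
 * N and no "relocating block group" message between them, else 0. */
int balance_looks_stuck(const char *const *lines, size_t n, int min_repeats)
{
    long last = -1;
    int repeats = 0;

    for (size_t i = 0; i < n; i++) {
        const char *p;
        long count;

        if (strstr(lines[i], "relocating block group")) {
            /* Progress was made: start counting again. */
            last = -1;
            repeats = 0;
            continue;
        }
        p = strstr(lines[i], "btrfs: found ");
        if (p && sscanf(p, "btrfs: found %ld extents", &count) == 1) {
            repeats = (count == last) ? repeats + 1 : 1;
            last = count;
            if (repeats >= min_repeats)
                return 1;
        }
    }
    return 0;
}
```

Fed the three kernel.log lines quoted above with min_repeats = 3, this flags the balance as stuck.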
thanks
~dima
---
archlinux
Linux yukikaze 3.0.6-aya1 #3 SMP PREEMPT Sat Oct 8 19:01:41 JST 2011 x86_64 AMD
Athlon(tm) II X4 635 Processor AuthenticAMD GNU/Linux
the latest btrfs-tools
* Re: Corrupt file in subvolume
From: David Sterba @ 2011-10-10 10:37 UTC
To: dima; +Cc: linux-btrfs
Hi,
On Mon, Oct 10, 2011 at 02:14:26AM +0000, dima wrote:
> Somehow my subvolume with /home got corrupted. When I booted the machine this
> morning (after perfectly normal shutdown) it gave me a bunch of kernel errors.
That's very strange. If it was a perfectly normal shutdown, I don't see how
this could happen. External disk damage or bad RAM would seem a convenient
excuse :)
> I found out that if I comment out my /home entry in fstab, it would
> boot ok. So the / is not corrupted. I then booted from the live CD and
> set "clear_cache" for /home instead of "inode_cache,space_cache"
>
> /dev/disk/by-label/btrfs-root / btrfs defaults,noatime,inode_cache,space_cache 0 0
> /dev/disk/by-label/btrfs-root /var/lib/btrfs-root btrfs defaults,noatime,subvolid=0 0 0
> #/dev/disk/by-label/btrfs-root /home btrfs defaults,noatime,subvol=__home-new,inode_cache,space_cache 0 0
> /dev/disk/by-label/btrfs-root /home btrfs defaults,noatime,subvol=__home-new,clear_cache 0 0
> /var/lib/btrfs-root/boot /boot none bind 0 0
>
> Then I could mount the /home subvolume.
>
> I also found the corrupted file
> ? -????????? ? ? ? ? ? 13.4.4.40.js
Chromium cache? Somebody recently reported a problem there. I wonder
what this browser does to the filesystem ... :)
> Whenever I try to access it I am getting Input/output error and the following
> error in the kernel.log
>
>
> Oct 10 10:38:03 yukikaze kernel: [34592.275080] parent transid verify failed on 105930436608 wanted 58565 found 134248
> Oct 10 10:38:03 yukikaze kernel: [34592.275161] BUG: scheduling while atomic: ls/2545/0x00000002
This bug is in most cases only a consequence of some btrfs BUG_ON; please
try to find it in your logs or reproduce the problem. The 'parent transid
verify' problem may cause a BUG_ON further up the call stack.
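For readers unfamiliar with the message: a btrfs parent node stores, next to each child block pointer, the generation (transaction id) the child is expected to carry, and the block read from disk records the generation that actually wrote it. "wanted" is the parent's expectation and "found" is what the block says. The sketch below models that comparison in userspace; the struct and function names are hypothetical and only loosely modeled on verify_parent_transid() from the trace, not the kernel's code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified structures (not the kernel's). */
struct key_ptr {
    uint64_t blockptr;   /* logical address of the child tree block */
    uint64_t generation; /* transid the parent expects the child to have */
};

struct tree_block_header {
    uint64_t bytenr;     /* logical address recorded in the block itself */
    uint64_t generation; /* transid that actually wrote this block */
};

/* Returns 0 if the block read is the one the parent expects, -1 otherwise,
 * after printing a message shaped like the kernel's. */
int check_parent_transid(const struct key_ptr *ptr,
                         const struct tree_block_header *hdr)
{
    if (hdr->generation == ptr->generation)
        return 0;
    fprintf(stderr, "parent transid verify failed on %llu wanted %llu found %llu\n",
            (unsigned long long)ptr->blockptr,
            (unsigned long long)ptr->generation,
            (unsigned long long)hdr->generation);
    return -1;
}
```

In the report, "found" (134248) is much newer than "wanted" (58565). One plausible reading, not confirmed in this thread, is that the block at that address was later reused and rewritten while a parent still pointed at its old contents.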
> My question - is it possible to delete this rogue file somehow or repair it?
> I tried to delete the directory that contained it, but got the same Input/output
> error.
Fsck to the rescue! Alternatively, you can try Josef's repair [1] proggy to
retrieve the data from the volume (AFAIK it should work around the parent
transid problem). If all the other files are fine, you can rebuild /home from
that.
david
[1] git://github.com/josefbacik/btrfs-progs.git
* Re: Corrupt file in subvolume
From: dima @ 2011-10-10 11:03 UTC
To: linux-btrfs
Thanks David,
The last shutdown was clean, but I had to power-cycle several times this month.
I am also using a swapfile via a loop device, so maybe that adds to the
instability.
The corrupt file is a firefox source file
(mozilla-central/js/src/tests/e4x/XML/13.4.4.40.js). Interestingly, I had not
touched this file or rebuilt firefox for about 3-4 days, so I have no idea why
it suddenly got corrupted.
When trying to remove the directory containing this file, I get:
Oct 10 14:03:13 yukikaze kernel: [ 9836.993172] ------------[ cut here ]------------
Oct 10 14:03:13 yukikaze kernel: [ 9836.993261] kernel BUG at fs/btrfs/inode.c:3024!
Oct 10 14:03:13 yukikaze kernel: [ 9836.993340] invalid opcode: 0000 [#1] PREEMPT SMP
Oct 10 14:03:13 yukikaze kernel: [ 9836.993438] CPU 0
Oct 10 14:03:13 yukikaze kernel: [ 9836.993474] Modules linked in: reiserfs usb_storage uas ipv6 loop snd_hda_codec_hdmi snd_hda_codec_via sg snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd sp5100_tco i2c_piix4 radeon ttm drm_kms_helper drm i2c_algo_bit firewire_ohci psmouse ppdev shpchp evdev serio_raw pcspkr firewire_core pci_hotplug i2c_core edac_core soundcore snd_page_alloc asus_atk0110 k10temp edac_mce_amd parport_pc parport crc_itu_t r8169 mii button wmi powernow_k8 processor mperf usbhid hid sr_mod cdrom sd_mod pata_acpi ohci_hcd ehci_hcd pata_atiixp ahci libahci libata scsi_mod usbcore
Oct 10 14:03:13 yukikaze kernel: [ 9836.994630]
Oct 10 14:03:13 yukikaze kernel: [ 9836.994662] Pid: 3043, comm: rm Not tainted 3.0.6-aya1 #3 System manufacturer System Product Name/M4A785TD-V EVO
Oct 10 14:03:13 yukikaze kernel: [ 9836.994840] RIP: 0010:[<ffffffff811f5221>] [<ffffffff811f5221>] btrfs_unlink+0xd1/0xe0
Oct 10 14:03:13 yukikaze kernel: [ 9836.994983] RSP: 0018:ffff8800a616fe28 EFLAGS: 00010282
Oct 10 14:03:13 yukikaze kernel: [ 9836.995070] RAX: 00000000fffffffe RBX: ffff8801178f6240 RCX: 000000000331d8c0
Oct 10 14:03:13 yukikaze kernel: [ 9836.995185] RDX: 000000000331d880 RSI: 0000000000018dc0 RDI: ffffea0003d28130
Oct 10 14:03:13 yukikaze kernel: [ 9836.995301] RBP: ffff8800a616fe58 R08: ffffffff811c7dda R09: 0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.995416] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000fffffffe
Oct 10 14:03:13 yukikaze kernel: [ 9836.995530] R13: ffff880096fb05c8 R14: ffff8801186ad800 R15: ffff8800426bbf88
Oct 10 14:03:13 yukikaze kernel: [ 9836.995646] FS: 00007f54a0d6e700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.995777] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 10 14:03:13 yukikaze kernel: [ 9836.995870] CR2: 0000000001ddf0b8 CR3: 00000001081d9000 CR4: 00000000000006f0
Oct 10 14:03:13 yukikaze kernel: [ 9836.995984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.996099] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Process rm (pid: 3043, threadinfo ffff8800a616e000, task ffff8800967e1d00)
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Stack:
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] 0000000000000000 ffff880012f8b300 0000000000000000 ffff880096fb05c8
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] 0000000000000000 0000000000000003 ffff8800a616fe88 ffffffff8115a42f
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] ffff8800a616fe88 ffff880012f8b300 ffff8800426bbf88 0000000000000000
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Call Trace:
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8115a42f>] vfs_unlink+0x9f/0x110
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8115a63a>] do_unlinkat+0x19a/0x1c0
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff811496b6>] ? filp_close+0x66/0x90
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8115b332>] sys_unlinkat+0x22/0x40
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] [<ffffffff8144b602>] system_call_fastpath+0x16/0x1b
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] Code: 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 4c 89 fe 48 89 df e8 e5 cd ff ff 85 c0 74 b8 0f 0b <0f> 0b 41 89 c4 eb c9 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] RIP [<ffffffff811f5221>] btrfs_unlink+0xd1/0xe0
Oct 10 14:03:13 yukikaze kernel: [ 9836.996113] RSP <ffff8800a616fe28>
Oct 10 14:03:13 yukikaze kernel: [ 9837.023860] ---[ end trace 771cebd6df5534bd ]---
I ran btrfsck with the latest btrfs-tools.
After
item 33 key (150121906176 EXTENT_ITEM 4096) itemoff 2234 itemsize 51
extent refs 1 gen 33099 flags 2
tree block key (1215402 1 0) level 0
tree block backref root 257
(i.e. very early, about 4-5 seconds after I started the check)
it gave me an error:
failed to find block number 150121762816
Unless I touch this file, the FS is fully functional.
Yes, I could of course create a new subvolume, but as I mentioned before, there
is a good chance that the corrupted one will not be deleted cleanly and my disk
will get bloated even more with junk data I can do nothing about.
thanks
~dima
* Re: Corrupt file in subvolume
From: David Sterba @ 2011-10-10 11:29 UTC
To: dima; +Cc: linux-btrfs
On Mon, Oct 10, 2011 at 11:03:34AM +0000, dima wrote:
> The last shutdown was clean, but I had to powercycle several times this month.
> I am also mounting a swapfile via loop device, so maybe this also adds up to
> instability.
>
> The corrupt file is a firefox source file
> (mozilla-central/js/src/tests/e4x/XML/13.4.4.40.js). Interesting thing that I
> did not touch this file or rebuild firefox for about 3-4 days, so I do not have
> any idea why it got corrupted suddenly.
>
> When trying to remove the directory containing this file I am getting:
>
> Oct 10 14:03:13 yukikaze kernel: [ 9836.993172] ------------[ cut here ]------------
> Oct 10 14:03:13 yukikaze kernel: [ 9836.993261] kernel BUG at fs/btrfs/inode.c:3024!
fixed by:
commit b532402e4d147e4f409c4e7f50d4413e8450101d
Author: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Date: Tue Jul 19 07:27:20 2011 +0000
Btrfs: return error to caller when btrfs_unlink() failes
When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink()
are error, the error code is returned to the caller instead of
BUG_ON().
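The difference the fix makes can be sketched in userspace: the old code treats any error from the lower-level unlink step as fatal, while the fixed pattern hands it back so rm sees an ordinary error instead of an oops. Function names here are hypothetical and abort() stands in for BUG_ON(). The sketch simulates -ENOENT because RAX and R12 in dima's oops both hold 0xfffffffe, which read as a 32-bit int is -2, i.e. -ENOENT on Linux; that this was the value that tripped the BUG_ON is a guess, not something the thread confirms.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lower-level unlink step (in the kernel,
 * btrfs_unlink_inode() or btrfs_orphan_add()); fails with -ENOENT on demand. */
static int unlink_inode_step(int simulate_failure)
{
    return simulate_failure ? -ENOENT : 0;
}

/* Old pattern: any error is fatal. In the kernel this is BUG_ON(ret),
 * which produced "kernel BUG at fs/btrfs/inode.c:3024!". */
int unlink_old(int simulate_failure)
{
    int ret = unlink_inode_step(simulate_failure);
    if (ret)
        abort(); /* userspace stand-in for BUG_ON(ret) */
    return 0;
}

/* Pattern described in the commit: return the error to the caller,
 * so the VFS can report it to rm as a normal errno. */
int unlink_new(int simulate_failure)
{
    int ret = unlink_inode_step(simulate_failure);
    if (ret)
        return ret;
    return 0;
}
```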
david
* Re: Corrupt file in subvolume
From: Kai Krakow @ 2011-10-10 11:55 UTC
To: linux-btrfs
David Sterba wrote:
>> Then I could mount the /home subvolume.
>>
>> I also found the corrupted file
>> ? -????????? ? ? ? ? ? 13.4.4.40.js
>
> Chromium cache? Somebody recently reported a problem there. I wonder
> what this browser does to the filesystem ... :)
If you meant me by "somebody": no, my problem was not related to chromium
usage. The problems only showed up there because of a previous "cp --reflink"
issue while I continued browsing. ;-)
So it is pure coincidence: browser caches are a likely destination for write
access, and my system came to a complete halt during a file operation
unrelated to browsing (cp --reflink).
Regards,
Kai
* Re: Corrupt file in subvolume
From: dima @ 2011-10-10 12:20 UTC
To: linux-btrfs
Oh, I see. The fix is not in 3.0.x but on the master branch. I will need the
latest 3.1 RC.
I will try this.
Thanks David
* Re: Corrupt file in subvolume
From: dima @ 2011-10-11 13:30 UTC
To: linux-btrfs
I have upgraded to 3.1 rc8.
I created a new subvolume for /home, copied the files there from the old
subvolume and deleted the old subvolume. It looks like the space has been
reclaimed fine.
However, btrfsck still reports the same error:
failed to find block number 150121762816