* btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
@ 2012-04-09 13:24 Leho Kraav
0 siblings, 0 replies; 10+ messages in thread
From: Leho Kraav @ 2012-04-09 13:24 UTC (permalink / raw)
To: linux-btrfs
Hi all
$ uname -a
Gentoo Linux s9 3.3.1-pf #2 SMP PREEMPT Mon Apr 9 00:35:28 EEST 2012
i686 Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz GenuineIntel GNU/Linux
I was running stuff for the past year or so on 4 partitions:
/dev/sda1 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda2 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda3 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
/dev/sda4 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
Both filesystems mounted with "noatime,nodiratime,ssd,discard,compress=lzo"
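For the record, the whole stack was assembled more or less like this
(commands from memory, mapper names are made up; ROOT on sda1/sda2 was
done the same way):
# open the two dm-crypt containers, create a two-device raid0 btrfs,
# then mount it with the options above
cryptsetup luksOpen /dev/sda3 home3
cryptsetup luksOpen /dev/sda4 home4
mkfs.btrfs -L HOME -d raid0 /dev/mapper/home3 /dev/mapper/home4
btrfs device scan
mount -o noatime,nodiratime,ssd,discard,compress=lzo /dev/mapper/home3 /home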
I set that multi-partition monster up back in the 2.6.36-ish days, when
dm-crypt either wasn't capable of utilizing multiple cores on a single
partition, or I possibly didn't know that it already could. At one point
it definitely couldn't.
So over time HOME started filling up, and at the point of last night's
baby eating, "df -hT" showed 1.7G free. Yes, I know free space is
complicated in btrfs. Space had not been an issue, so I didn't think to
check regularly with any better tools, such as "btrfs fi show" I guess.
I upgraded my 3.2.2-pf to 3.3.1-pf* and proceeded to launch my regular
apps: Firefox, TB, office, etc. Except they all hung. Checking my
/var/log/messages window revealed what was happening:
* pf-sources => http://pf.natalenko.name/
...
Apr 8 02:45:52 s9 sudo: leho : TTY=pts/0 ; PWD=/home/leho ;
USER=root ; COMMAND=/bin/tail -
f /home/leho/.tail/awesome-leho /home/leho/.tail/messages
/home/leho/.tail/openvpn.log
Apr 8 02:45:52 s9 sudo: pam_unix(sudo:session): session opened for user
root by (uid=0)
Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end
of device
Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976,
limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.691792] attempt to access beyond end
of device
Apr 8 02:46:11 s9 kernel: [ 189.691795] dm-3: rw=129, want=27556216,
limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.691799] attempt to access beyond end
of device
...
Apr 8 02:46:11 s9 kernel: [ 189.691869] attempt to access beyond end
of device
Apr 8 02:46:11 s9 kernel: [ 189.691874] dm-3: rw=129, want=69498616,
limit=20967424
...
Apr 8 02:46:11 s9 kernel: [ 189.692233] attempt to access beyond end
of device
Apr 8 02:46:11 s9 kernel: [ 189.692237] dm-3: rw=129, want=228879736,
limit=20967424
(thousands of lines of this; as you can see, "want" keeps getting bigger)
And it was all downhill from there. The result is a majorly corrupted
filesystem that seems to be beyond repair. Hard rebooting started giving
csum errors in various spots, and any modification to the filesystem,
even deleting files, would start another flood of "attempt to access
beyond end of device", totally messing up syslog-ng. With the blazing
speeds of an SSD that probably isn't a surprise.
So searching around, I found out about the ENOSPC thing which is
possibly still an issue in 3.3. Is there any useful info I could provide
for this? I now have some bigger partitions and probably won't run out
of space again for a while.
I also discovered the btrfs "restore" binary, although it was possibly
too late, since I had already hard rebooted a few times and done some
more damage to HOME. It returned a whole bunch of "ret is -3" messages
and 0-byte files. Occasionally files came out intact as well, but the
majority of them seem to be corrupt. Is this a reasonable result to
expect when running out of space?
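For reference, this is roughly what I ran against the saved images (loop
devices and paths are from memory, and the exact restore invocation
depends on the btrfs-progs build):
# set up both raid0 members as loop devices, let btrfs find them, then
# try to pull files out read-only
losetup /dev/loop3 sda3-home.img
losetup /dev/loop4 sda4-home.img
btrfs device scan
btrfs restore -v /dev/loop3 /tmp/restored/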
"btrfs scrub" reported uncorrectable errors count in the millions. At
least thousands of csum mismatch errors visible in dmesg.
"btrfs balance" would bomb the machine with the same "access beyond end
of device".
I made images of the two btrfs partitions on sda3 and sda4 for future
diagnosis. I do think they are pretty corrupt, though. Or could there be
some magic poke or offset that would make more stuff magically
"restore"-able? :>
So in conclusion:
* is filesystem-wide corruption like this made more likely by running on
top of dm-crypt or btrfs multi-device? dm-crypt is definitely staying for
me, but I did consolidate the partitions down to just 2.
* what exactly should happen when an out-of-space scenario like the above
occurs?
* I guess I should keep an eye on "btrfs fi show" regularly? (see the
sketch below)
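The sketch I have in mind for that last point, roughly (mount points and
schedule are placeholders, would go in a daily cron job):
# report chunk allocation vs. device size for both filesystems
btrfs fi show
btrfs fi df /
btrfs fi df /home
# when "used" in 'fi show' creeps up towards the device size, rebalance
# or free space before ENOSPC has a chance to bite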
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
@ 2012-04-09 14:35 Daniel J Blueman
2012-04-09 14:44 ` Leho Kraav
0 siblings, 1 reply; 10+ messages in thread
From: Daniel J Blueman @ 2012-04-09 14:35 UTC (permalink / raw)
To: Leho Kraav; +Cc: Linux BTRFS, Liu Bo
Leho Kraav <leho <at> kraav.com> writes:
[]
> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end
> of device
> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976,
> limit=20967424
I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
which tests out fine here. The workaround is to not mount with
'discard' until eg ~3.4-rc3 or later.
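On a typical setup that just means dropping the option from fstab and
remounting; roughly (mount point and remaining options are only an
example, taken from your setup):
# remove 'discard' from the fstab line for the filesystem, then
mount -o remount,noatime,nodiratime,ssd,compress=lzo /home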
Thanks,
Daniel
[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
[2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
--
Daniel J Blueman
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 14:35 btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice? Daniel J Blueman
@ 2012-04-09 14:44 ` Leho Kraav
2012-04-09 14:54 ` Daniel J Blueman
0 siblings, 1 reply; 10+ messages in thread
From: Leho Kraav @ 2012-04-09 14:44 UTC (permalink / raw)
To: Daniel J Blueman; +Cc: Linux BTRFS, Liu Bo
On 09.04.2012 17:35, Daniel J Blueman wrote:
> Leho Kraav<leho<at> kraav.com> writes:
> []
>> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end
>> of device
>> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976,
>> limit=20967424
>
> I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
> which tests out fine here. The workaround is to not mount with
> 'discard' until eg ~3.4-rc3 or later.
>
> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
Oh wow, thanks. This sounds exactly like what happened. I got the
livelock post off my search results, but the patch post doesn't seem to
have any of the keywords I was looking for, since I had no idea it could
be related to discards.
So can this become a problem earlier too, not only when the space used
is approaching limits? If not, I think I should be good until 3.4:
$ sudo btrfs fi show
Label: 'S9-HOME' uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
Total devices 1 FS bytes used 12.93GB
devid 1 size 60.00GB used 20.04GB path /dev/dm-0
Label: 'S9-ROOT' uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
Total devices 1 FS bytes used 8.75GB
devid 1 size 30.00GB used 18.29GB path /dev/dm-1
I think I'd like to keep using "discard" for SSD still, unless a smart
person says it's not particularly useful anyway.
So while I'm on 3.3, is the patch from gmane:16649 good enough to
eliminate immediate dangers?
And is the previous filesystem still hosed for good then? Or might
mounting the images without "discard" help?
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 14:44 ` Leho Kraav
@ 2012-04-09 14:54 ` Daniel J Blueman
2012-04-09 19:07 ` Martin Steigerwald
2012-04-09 20:58 ` Leho Kraav
0 siblings, 2 replies; 10+ messages in thread
From: Daniel J Blueman @ 2012-04-09 14:54 UTC (permalink / raw)
To: Leho Kraav; +Cc: Linux BTRFS, Liu Bo
On 9 April 2012 22:44, Leho Kraav <leho@kraav.com> wrote:
> On 09.04.2012 17:35, Daniel J Blueman wrote:
>>
>> Leho Kraav <leho <at> kraav.com> writes:
>> []
>>>
>>> Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
>>> of device
>>> Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
>>> limit=20967424
>>
>>
>> I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
>> which tests out fine here. The workaround is to not mount with
>> 'discard' until eg ~3.4-rc3 or later.
>>
>> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
>> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
>
> Oh wow, thanks. This sounds exactly like what happened. I got the livelock
> post off my search results, but the patch post doesn't seem to have any of
> the keywords I was looking for, since I had no idea it could be related to
> discards.
>
> So can this become a problem earlier too, not only when the space used is
> approaching limits? If not, I think I should be good until 3.4:
Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.
> $ sudo btrfs fi show
> Label: 'S9-HOME'  uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
>        Total devices 1 FS bytes used 12.93GB
>        devid    1 size 60.00GB used 20.04GB path /dev/dm-0
>
> Label: 'S9-ROOT'  uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
>        Total devices 1 FS bytes used 8.75GB
>        devid    1 size 30.00GB used 18.29GB path /dev/dm-1
>
> I think I'd like to keep using "discard" for SSD still, unless a smart
> person says it's not particularly useful anyway.
If your SSD has background garbage collection and there are disk idle
periods, the synchronous discards will have little benefit.
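If you still want trims, an occasional batched trim gives much the same
result without the per-delete cost; a rough sketch only (whether fstrim
avoids this particular bug is a separate question):
# e.g. from a weekly cron job, during an idle period
fstrim -v /home
fstrim -v /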
> So while I'm on 3.3, is the patch from gmane:16649 good enough to eliminate
> immediate dangers?
Yes.
> And is the previous filesystem still hosed for good then? Or mounting the
> images with -discard might help?
It seems like the kernel caught and prevented the discard after the
end of the partition, so the data should be fine; scrubbing will tell
you.
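Something along these lines, once it mounts (mount point is just an
example):
btrfs scrub start /mnt/home
btrfs scrub status /mnt/home   # progress plus csum / uncorrectable error counts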
Daniel
--
Daniel J Blueman
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 14:54 ` Daniel J Blueman
@ 2012-04-09 19:07 ` Martin Steigerwald
2012-04-09 20:58 ` Leho Kraav
1 sibling, 0 replies; 10+ messages in thread
From: Martin Steigerwald @ 2012-04-09 19:07 UTC (permalink / raw)
To: linux-btrfs; +Cc: Daniel J Blueman, Leho Kraav, Liu Bo
On Monday, 9 April 2012, Daniel J Blueman wrote:
> On 9 April 2012 22:44, Leho Kraav <leho@kraav.com> wrote:
> > On 09.04.2012 17:35, Daniel J Blueman wrote:
> >> Leho Kraav<leho<at> kraav.com> writes:
> >> []
> >>
> >>> Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond
> >>> end of device
> >>> Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129,
> >>> want=23361976, limit=20967424
> >>
> >> I recently bumped into this too [1]. Liu Bo posted a patch for it
> >> [2], which tests out fine here. The workaround is to not mount with
> >> 'discard' until eg ~3.4-rc3 or later.
> >>
> >> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
> >> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
> >
> > Oh wow, thanks. This sounds exactly like what happened. I got the
> > livelock post off my search results, but the patch post doesn't seem
> > to have any of the keywords I was looking for, since I had no idea
> > it could be related to discards.
> >
> > So can this become a problem earlier too, not only when the space
> > used is approaching limits? If not, I think I should be good until 3.4:
>
> Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.
Is offline discard via fstrim also affected?
I used fstrim a few times on my / BTRFS with the 3.3.0-trunk Debian
kernel (should be 3.3.0) and
martin@merkaba:~> zgrep "beyond" /var/log/syslog*
martin@merkaba:~#1>
Seems I am safe.
But I think I won't use fstrim on any BTRFS partition for now, until I
have some confirmation that it is safe.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 14:54 ` Daniel J Blueman
2012-04-09 19:07 ` Martin Steigerwald
@ 2012-04-09 20:58 ` Leho Kraav
2012-04-09 21:32 ` Leho Kraav
1 sibling, 1 reply; 10+ messages in thread
From: Leho Kraav @ 2012-04-09 20:58 UTC (permalink / raw)
To: Daniel J Blueman; +Cc: Linux BTRFS, Liu Bo
On 09.04.2012 17:54, Daniel J Blueman wrote:
> On 9 April 2012 22:44, Leho Kraav<leho@kraav.com> wrote:
>
>> And is the previous filesystem still hosed for good then? Or mounting the
>> images with -discard might help?
>
> It seems like the kernel caught and prevented the discard after the
> end of the partition, so the data should be fine; scrubbing will tell
> you.
Without the patch at least, it's BUG time. This is what happens when
mounting the image.
...
[171555.937706] device label HOME devid 1 transid 370409 /dev/loop3
[171555.956786] device label HOME devid 2 transid 370409 /dev/loop4
[171647.077501] device label HOME devid 2 transid 370409 /dev/loop4
[171647.196262] btrfs: continuing balance
[171650.826278] btrfs: relocating block group 18278776832 flags 9
[171651.218444] btrfs csum failed ino 257 off 262144 csum 3439556781
private 289331560
[171651.226455] btrfs csum failed ino 257 off 196608 csum 3957169907
private 1046207033
[171651.227070] btrfs csum failed ino 257 off 196608 csum 3957169907
private 1046207033
[171652.484666] ------------[ cut here ]------------
[171652.484669] kernel BUG at fs/btrfs/volumes.c:2487!
[171652.484671] invalid opcode: 0000 [#1] PREEMPT SMP
[171652.484673] Modules linked in: btrfs zlib_deflate lrw gf128mul
vboxnetadp(O) vboxnetflt(O) vboxdrv(O) coretemp it87 hwmon_vid hwmon nfs
autofs4 nfsd lockd nfs_acl auth_rpcgss sunrpc iptable_mangle ipt_ULOG
xt_recent xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack iptable_filter ip_tables x_tables squashfs imon rfcomm bnep
ext4 jbd2 snd_dummy loop fuse crc32c_intel nvidia(PO)
snd_hda_codec_realtek dvb_usb_dib0700 dib7000p dib0070 dvb_usb dvb_core
snd_hda_intel snd_hda_codec snd_pcm rc_core btusb bluetooth snd_timer
r8168(O) processor skge snd rtc_cmos sg snd_page_alloc button
dibx000_common i2c_i801 hid_logitech_dj hid_logitech usbhid sr_mod cdrom
firewire_ohci firewire_core crc_itu_t pata_jmicron uhci_hcd [last
unloaded: imon]
[171652.484703]
[171652.484705] Pid: 21206, comm: btrfs-balance Tainted: P O
3.3.1-vs2.3.3.2+pf #1 Gigabyte Technology Co., Ltd. P55M-UD2/P55M-UD2
[171652.484708] EIP: 0060:[<fa252ec9>] EFLAGS: 00010282 CPU: 1
[171652.484718] EIP is at btrfs_balance+0xe79/0xed0 [btrfs]
[171652.484719] EAX: fffffffb EBX: d0e58e00 ECX: 80240022 EDX: 80240023
[171652.484721] ESI: 7fd00000 EDI: 00000002 EBP: cc046068 ESP: dc5a3ef0
[171652.484722] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[171652.484724] Process btrfs-balance (pid: 21206, ti=dc5a2000
task=c40ee750 task.ti=dc5a2000)
[171652.484725] Stack:
[171652.484726] 00000096 c14d806c c1048ebb 00000046 00000046 7fe00000
00000002 df15d800
[171652.484729] 00000000 c8efc000 c1580e3d 00000030 00000000 00000246
00000000 ec85a800
[171652.484733] 00029e7f 0002fea6 dc5a3f52 00000010 00000000 00000003
00000246 00000000
[171652.484736] Call Trace:
[171652.484740] [<c1048ebb>] ? up+0xb/0x40
[171652.484743] [<c104dbfe>] ? try_to_wake_up+0x6e/0x100
[171652.484745] [<c104dc98>] ? default_wake_function+0x8/0x10
[171652.484752] [<fa252f7f>] ? balance_kthread+0x5f/0xa0 [btrfs]
[171652.484759] [<fa252f20>] ? btrfs_balance+0xed0/0xed0 [btrfs]
[171652.484761] [<c104381e>] ? kthread+0x6e/0x80
[171652.484763] [<c10437b0>] ? kthread_freezable_should_stop+0x50/0x50
[171652.484771] [<c13c0fb6>] ? kernel_thread_helper+0x6/0xd
[171652.484772] Code: 00 00 83 ea 02 83 c7 02 e9 ee fe ff ff c6 07 00 66
ba ff 03 8b 7c 24 60 83 c7 01 e9 cf fe ff ff 31 db e9 70 fe ff ff 0f 0b
0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 8b 74 24 7c c7 04 24 5c 47 28 fa 89 74
[171652.484793] EIP: [<fa252ec9>] btrfs_balance+0xe79/0xed0 [btrfs]
SS:ESP 0068:dc5a3ef0
[171652.484802] ---[ end trace 15f25988d7f952de ]---
...
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 20:58 ` Leho Kraav
@ 2012-04-09 21:32 ` Leho Kraav
2012-04-09 23:19 ` David Sterba
0 siblings, 1 reply; 10+ messages in thread
From: Leho Kraav @ 2012-04-09 21:32 UTC (permalink / raw)
To: Daniel J Blueman; +Cc: Linux BTRFS, Liu Bo
On 09.04.2012 23:58, Leho Kraav wrote:
> On 09.04.2012 17:54, Daniel J Blueman wrote:
>> On 9 April 2012 22:44, Leho Kraav<leho@kraav.com> wrote:
>>
>>> And is the previous filesystem still hosed for good then? Or mounting
>>> the
>>> images with -discard might help?
>>
>> It seems like the kernel caught and prevented the discard after the
>> end of the partition, so the data should be fine; scrubbing will tell
>> you.
>
> Without the patch at least, it's BUG time. This is what happens when
> mounting the image.
>
It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi
balance HOME" gives us:
Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for
user root by (uid=1000)
Apr 10 00:24:18 server kernel: [ 363.839105] ------------[ cut here
]------------
Apr 10 00:24:18 server kernel: [ 363.839163] kernel BUG at
fs/btrfs/volumes.c:2733!
Apr 10 00:24:18 server kernel: [ 363.839220] invalid opcode: 0000 [#1]
PREEMPT SMP
Apr 10 00:24:18 server kernel: [ 363.839258] Modules linked in: btrfs
zlib_deflate rfcomm bnep ext4 jbd2 snd_dummy loop fuse crc32c_intel
nvidia(PO) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm
dvb_usb_dib0700 dvb_usb dib0070 dib7000p dibx000_common imon dvb_core
hid_logitech_dj btusb bluetooth hid_logitech rc_core skge snd_page_alloc
snd_timer processor snd r8168(O) button i2c_i801 rtc_cmos usbhid sr_mod
cdrom firewire_ohci firewire_core crc_itu_t uhci_hcd pata_jmicron
Apr 10 00:24:18 server kernel: [ 363.839609]
Apr 10 00:24:18 server kernel: [ 363.839619] Pid: 4682, comm: btrfs
Tainted: P O 3.3.1-vs2.3.3.2+pf #1 Gigabyte Technology Co.,
Ltd. P55M-UD2/P55M-UD2
Apr 10 00:24:18 server kernel: [ 363.839677] EIP: 0060:[<f4f4deff>]
EFLAGS: 00210246 CPU: 1
Apr 10 00:24:18 server kernel: [ 363.839709] EIP is at
btrfs_balance+0xe7f/0xed0 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.839732] EAX: ffffff00 EBX:
ffffffef ECX: 00000003 EDX: 00000303
Apr 10 00:24:18 server kernel: [ 363.839758] ESI: eb868e00 EDI:
00000000 EBP: 00000000 ESP: e8ebbdd8
Apr 10 00:24:18 server kernel: [ 363.839785] DS: 007b ES: 007b FS:
00d8 GS: 0033 SS: 0068
Apr 10 00:24:18 server kernel: [ 363.839809] Process btrfs (pid: 4682,
ti=e8eba000 task=eb2c8ab0 task.ti=e8eba000)
Apr 10 00:24:18 server kernel: [ 363.839839] Stack:
Apr 10 00:24:18 server kernel: [ 363.839850] 00000040 00000001
00000000 00000000 00000000 e8ebbe30 e8ece000 ec713bb4
Apr 10 00:24:18 server kernel: [ 363.839914] 00000097 eb945000
00000097 0000000e 00000000 00000002 00000000 f2f2d3b0
Apr 10 00:24:18 server kernel: [ 363.839987] ec0cdd34 f3153b00
e9240600 ec0cde00 c1094152 c10cbb6b eac98b00 00000001
Apr 10 00:24:18 server kernel: [ 363.840090] Call Trace:
Apr 10 00:24:18 server kernel: [ 363.840109] [<c1094152>] ?
filemap_fault+0x82/0x420
Apr 10 00:24:18 server kernel: [ 363.840132] [<c10cbb6b>] ?
__mem_cgroup_try_charge+0x28b/0x4c0
Apr 10 00:24:18 server kernel: [ 363.840160] [<c10aa619>] ?
__do_fault+0x3c9/0x510
Apr 10 00:24:18 server kernel: [ 363.840183] [<c10c2865>] ?
kmem_cache_alloc+0x75/0x90
Apr 10 00:24:18 server kernel: [ 363.840212] [<f4f53f19>] ?
btrfs_ioctl_balance.isra.52+0x379/0x390 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840246] [<f4f56230>] ?
update_ioctl_balance_args+0x2e0/0x2e0 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840280] [<f4f568a1>] ?
btrfs_ioctl+0x671/0x1200 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840306] [<c10ae1b4>] ?
handle_mm_fault+0x124/0x260
Apr 10 00:24:18 server kernel: [ 363.840334] [<f4f56230>] ?
update_ioctl_balance_args+0x2e0/0x2e0 [btrfs]
Apr 10 00:24:18 server kernel: [ 363.840363] [<c10e01ea>] ?
do_vfs_ioctl+0x7a/0x580
Apr 10 00:24:18 server kernel: [ 363.840386] [<c1020a10>] ?
vmalloc_sync_all+0x10/0x10
Apr 10 00:24:18 server kernel: [ 363.840409] [<c1020b95>] ?
do_page_fault+0x185/0x3d0
Apr 10 00:24:18 server kernel: [ 363.840432] [<c10cf15f>] ?
do_sys_open+0x15f/0x1b0
Apr 10 00:24:18 server kernel: [ 363.840453] [<c10df1e2>] ?
do_fcntl+0x232/0x470
Apr 10 00:24:18 server kernel: [ 363.840475] [<c10e071e>] ?
sys_ioctl+0x2e/0x60
Apr 10 00:24:18 server kernel: [ 363.840497] [<c13c0a4c>] ?
sysenter_do_call+0x12/0x22
Apr 10 00:24:18 server kernel: [ 363.840519] Code: c7 02 e9 ee fe ff ff
c6 07 00 66 ba ff 03 8b 7c 24 60 83 c7 01 e9 cf fe ff ff 31 db e9 70 fe
ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 8b 74 24 7c c7 04 24 9c f7
f7 f4 89 74 24 04 e8 dc ce 46
Apr 10 00:24:18 server kernel: [ 363.840933] EIP: [<f4f4deff>]
btrfs_balance+0xe7f/0xed0 [btrfs] SS:ESP 0068:e8ebbdd8
Apr 10 00:24:18 server kernel: [ 363.841023] ---[ end trace
8be1f61ebfe6132a ]---
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 21:32 ` Leho Kraav
@ 2012-04-09 23:19 ` David Sterba
2012-04-10 9:07 ` Ilya Dryomov
0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2012-04-09 23:19 UTC (permalink / raw)
To: Leho Kraav; +Cc: Daniel J Blueman, Linux BTRFS, Liu Bo, idryomov
On Tue, Apr 10, 2012 at 12:32:00AM +0300, Leho Kraav wrote:
> It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance
> HOME" gives us:
>
> Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for
> user root by (uid=1000)
> Apr 10 00:24:18 server kernel: [ 363.839105] ------------[ cut here
> ]------------
> Apr 10 00:24:18 server kernel: [ 363.839163] kernel BUG at
> fs/btrfs/volumes.c:2733!
that's
2732         if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
2733                 BUG_ON(ret == -EEXIST);
                     ^^^^
2734                 set_balance_control(bctl);
2735         } else {
2736                 BUG_ON(ret != -EEXIST);
2737                 spin_lock(&fs_info->balance_lock);
2738                 update_balance_args(bctl);
2739                 spin_unlock(&fs_info->balance_lock);
2740         }
IIRC somebody reported similar problem recently. It basically means
there's an inconsistent balance state. Adding Ilya to CC.
david
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-09 23:19 ` David Sterba
@ 2012-04-10 9:07 ` Ilya Dryomov
2012-04-10 15:31 ` Leho Kraav
0 siblings, 1 reply; 10+ messages in thread
From: Ilya Dryomov @ 2012-04-10 9:07 UTC (permalink / raw)
To: Leho Kraav, Daniel J Blueman, Linux BTRFS, Liu Bo
On Tue, Apr 10, 2012 at 01:19:54AM +0200, David Sterba wrote:
> On Tue, Apr 10, 2012 at 12:32:00AM +0300, Leho Kraav wrote:
> > It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance
> > HOME" gives us:
> >
> > Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for
> > user root by (uid=1000)
> > Apr 10 00:24:18 server kernel: [ 363.839105] ------------[ cut here
> > ]------------
> > Apr 10 00:24:18 server kernel: [ 363.839163] kernel BUG at
> > fs/btrfs/volumes.c:2733!
>
> that's
>
> 2732         if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
> 2733                 BUG_ON(ret == -EEXIST);
>                      ^^^^
> 2734                 set_balance_control(bctl);
> 2735         } else {
> 2736                 BUG_ON(ret != -EEXIST);
> 2737                 spin_lock(&fs_info->balance_lock);
> 2738                 update_balance_args(bctl);
> 2739                 spin_unlock(&fs_info->balance_lock);
> 2740         }
>
> IIRC somebody reported similar problem recently. It basically means
> there's an inconsistent balance state. Adding Ilya to CC.
Leho, so you just mount with the discard patch applied and run
'btrfs fi balance <mnt>', correct?
The problem is that you have balance state on disk (from trying to run
balance earlier w/o discard patch) but we are failing to pick it up on
mount.
Could you please post the entire dmesg and the output of
'btrfs-debug-tree -d <dev>' somewhere ?
Could you also apply the debug patch below, mount your fs and send me
dmesg output (no need to run balance, just mount) ?
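Something like this should do (loop devices and file names are only
examples):
# set up the image files, dump the device tree, then mount with the
# debug patch applied and capture dmesg
losetup /dev/loop3 sda3-home.img
losetup /dev/loop4 sda4-home.img
btrfs device scan /dev/loop3 /dev/loop4
btrfs-debug-tree -d /dev/loop3 > debug-tree.txt
mount /dev/loop3 /mnt/img
dmesg > dmesg-after-mount.txt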
Thanks,
Ilya
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 20196f4..86fa082 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1867,6 +1867,7 @@ int open_ctree(struct super_block *sb,
csum_root = fs_info->csum_root = btrfs_alloc_root(fs_info);
chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
dev_root = fs_info->dev_root = btrfs_alloc_root(fs_info);
+printk("open_ctree\n");
if (!tree_root || !extent_root || !csum_root ||
!chunk_root || !dev_root) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a872b48..2e39348 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2834,6 +2834,7 @@ static int balance_kthread(void *data)
mutex_lock(&fs_info->balance_mutex);
set_balance_control(bctl);
+printk("balance_kthread: flags %llu\n", (unsigned long long)bctl->flags);
if (btrfs_test_opt(fs_info->tree_root, SKIP_BALANCE)) {
printk(KERN_INFO "btrfs: force skipping balance\n");
@@ -2858,6 +2859,7 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
struct btrfs_key key;
int ret;
+printk("recover_balance\n");
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -2872,7 +2874,11 @@ int btrfs_recover_balance(struct btrfs_root *tree_root)
key.type = BTRFS_BALANCE_ITEM_KEY;
key.offset = 0;
+printk("key.obj %llu\n", (unsigned long long)key.objectid);
+printk("key.type %d\n", key.type);
+printk("key.off %llu\n", (unsigned long long)key.offset);
ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
+printk("search ret %d\n", ret);
if (ret < 0)
goto out_bctl;
if (ret > 0) { /* ret = -ENOENT; */
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
2012-04-10 9:07 ` Ilya Dryomov
@ 2012-04-10 15:31 ` Leho Kraav
0 siblings, 0 replies; 10+ messages in thread
From: Leho Kraav @ 2012-04-10 15:31 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: Daniel J Blueman, Linux BTRFS, Liu Bo
On 10.04.2012 12:07, Ilya Dryomov wrote:
> On Tue, Apr 10, 2012 at 01:19:54AM +0200, David Sterba wrote:
>>
>> IIRC somebody reported similar problem recently. It basically means
>> there's an inconsistent balance state. Adding Ilya to CC.
>
> Leho, so you just mount with discard patch and run 'btrfs fi balance
> <mnt>', correct ?
>
> The problem is that you have balance state on disk (from trying to run
> balance earlier w/o discard patch) but we are failing to pick it up on
> mount.
>
> Could you please post the entire dmesg and the output of
> 'btrfs-debug-tree -d<dev>' somewhere ?
>
> Could you also apply the debug patch below, mount your fs and send me
> dmesg output (no need to run balance, just mount) ?
>
Your understanding of the situation based on the above is correct.
Yes, I can do all of that, but it's going to take until next week,
since I'm travelling and can't risk BUG-ing my server remotely.