linux-btrfs.vger.kernel.org archive mirror
* Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
@ 2012-04-09 14:35 Daniel J Blueman
  2012-04-09 14:44 ` Leho Kraav
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel J Blueman @ 2012-04-09 14:35 UTC (permalink / raw)
  To: Leho Kraav; +Cc: Linux BTRFS, Liu Bo

Leho Kraav <leho <at> kraav.com> writes:
[]
> Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
> of device
> Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
> limit=20967424

I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
which tests out fine here. Until that fix lands (roughly 3.4-rc3 or
later), the workaround is to not mount with 'discard'.
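
To see whether any mounted btrfs filesystem is currently using
'discard', the options column of /proc/mounts can be inspected. A
minimal Python sketch, assuming the usual /proc/mounts field layout;
the function name and the sample text are made up for illustration:

```python
def btrfs_mounts_with_discard(mounts_text):
    """Return mount points of btrfs filesystems mounted with 'discard'.

    Assumes the usual /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not btrfs.
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        options = fields[3].split(",")
        if "discard" in options:
            hits.append(fields[1])
    return hits


# Hypothetical sample in /proc/mounts format (not taken from the report).
sample = (
    "/dev/mapper/home /home btrfs rw,noatime,ssd,discard,compress=lzo 0 0\n"
    "/dev/mapper/root / btrfs rw,noatime,ssd,compress=lzo 0 0\n"
)
for mount_point in btrfs_mounts_with_discard(sample):
    print("mounted with discard:", mount_point)
```

On a live system the same function could be fed the contents of
/proc/mounts instead of the sample string.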

Thanks,
  Daniel

[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
[2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
-- 
Daniel J Blueman

* btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?
@ 2012-04-09 13:24 Leho Kraav
  0 siblings, 0 replies; 10+ messages in thread
From: Leho Kraav @ 2012-04-09 13:24 UTC (permalink / raw)
  To: linux-btrfs

Hi all

$ uname -a
Gentoo Linux s9 3.3.1-pf #2 SMP PREEMPT Mon Apr 9 00:35:28 EEST 2012 
i686 Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz GenuineIntel GNU/Linux

I was running stuff for the past year or so on 4 partitions:

/dev/sda1 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda2 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda3 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
/dev/sda4 -> dm-crypt -> btrfs raid 0 HOME 10.0GB

Both filesystems mounted with "noatime,nodiratime,ssd,discard,compress=lzo"

I set that multi-partition monster up back in the 2.6.36ish days, when 
dm-crypt either was not capable of utilizing multicores on a single 
partition or I possibly didn't know that it already could. At one point 
it definitely couldn't.

So over time HOME started filling up and at the point of last night's 
baby eating "df -hT" showed 1.7G free. Yes I know free space is 
complicated in btrfs. Space had not been an issue so I didn't think to 
use any better tools regularly to check, such as "btrfs fi show" I guess.

I upgraded my 3.2.2-pf to 3.3.1-pf* and proceeded to launch my
regular apps (Firefox, TB, office, etc.). Except they all hung. Checking
my /var/log/messages window revealed what was happening:

* pf-sources => http://pf.natalenko.name/

...
Apr  8 02:45:52 s9 sudo:     leho : TTY=pts/0 ; PWD=/home/leho ; 
USER=root ; COMMAND=/bin/tail -
f /home/leho/.tail/awesome-leho /home/leho/.tail/messages 
/home/leho/.tail/openvpn.log
Apr  8 02:45:52 s9 sudo: pam_unix(sudo:session): session opened for user 
root by (uid=0)
Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976, 
limit=20967424
Apr  8 02:46:11 s9 kernel: [  189.691792] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.691795] dm-3: rw=129, want=27556216, 
limit=20967424
Apr  8 02:46:11 s9 kernel: [  189.691799] attempt to access beyond end 
of device
...
Apr  8 02:46:11 s9 kernel: [  189.691869] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.691874] dm-3: rw=129, want=69498616, 
limit=20967424
...
Apr  8 02:46:11 s9 kernel: [  189.692233] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.692237] dm-3: rw=129, want=228879736, 
limit=20967424
(thousands of lines of this, as we can see "want" gets bigger all the time)
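
To put the gap between "want" and "limit" in perspective, a rough
Python sketch follows. It assumes both values are counted in 512-byte
sectors, and it tolerates the line wrapping seen in the log above. The
names SECTOR_BYTES, LINE_RE and overshoots are illustrative only:

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are 512-byte sectors

# Matches e.g. "dm-3: rw=129, want=23361976, limit=20967424", also when
# the line is wrapped (or wrapped and quoted with ">") before "limit=".
LINE_RE = re.compile(
    r"(?P<dev>[\w-]+): rw=\d+, want=(?P<want>\d+),[\s>]*limit=(?P<limit>\d+)"
)


def overshoots(log_text):
    """Return (device, want, limit, bytes_past_end) for each log entry."""
    results = []
    for m in LINE_RE.finditer(log_text):
        want = int(m.group("want"))
        limit = int(m.group("limit"))
        results.append(
            (m.group("dev"), want, limit, (want - limit) * SECTOR_BYTES)
        )
    return results


sample = (
    "Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976, \n"
    "limit=20967424\n"
)
for dev, want, limit, past in overshoots(sample):
    print(dev, "device size:", limit * SECTOR_BYTES, "bytes;",
          "request ends", past, "bytes past the end")
```

Under that assumption, limit=20967424 works out to 20967424 * 512 =
10,735,321,088 bytes, which matches the 10.0GB partitions listed above,
while the largest "want" shown (228879736) is more than ten times the
device size.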

And it was all downhill from there. The result is a majorly corrupted
filesystem that seems to be beyond repair. After a hard reboot I started
getting csum errors in various spots, and any modification to the
filesystem, even deleting files, would start another flood of "attempt
to access beyond end of device", totally messing up syslog-ng. With
the blazing speeds of an SSD that probably isn't a surprise.

So searching around, I found out about the ENOSPC thing which is 
possibly still an issue in 3.3. Is there any useful info I could provide 
for this? I now have some bigger partitions and probably won't run out 
of space again for a while.

I also discovered the btrfs "restore" binary, although possibly it was 
too late, since I had already hard rebooted a few times and done some 
more damage to HOME. This thing returned a whole bunch of "ret is -3" 
messages, and 0 byte files. Occasionally files were good as well. But
the majority of the files seem to be corrupt. When running out of space
happens, is this a reasonable result to expect?

"btrfs scrub" reported an uncorrectable error count in the millions, and
at least thousands of csum mismatch errors are visible in dmesg.

"btrfs balance" would bomb the machine with the same "access beyond end 
of device".

I made images of the two btrfs partitions on sda3 and sda4 for future 
diagnosis. I do think they are pretty corrupt though. Or could there be 
some magic poke or offset that would make more stuff magically 
"restore"-able :>

So in conclusion:

  * is filesystem-wide corruption like this made more likely by running
on top of dm-crypt or btrfs multi-device? dm-crypt is definitely staying
for me, but I did consolidate partitions now to just 2.
  * what exactly should happen when an out of space scenario like the 
above happens?
  * I guess I should keep an eye on "btrfs fi show" on the regular?
