linux-btrfs.vger.kernel.org archive mirror
* OOPS on 3.11.6
From: Andy Lutomirski @ 2013-11-04 23:11 UTC
  To: linux-btrfs

(This is Fedora's kernel 3.11.6-200.fc19.x86_64)

I have a file on my btrfs filesystem.  Reading it results in:

[  170.261789] general protection fault: 0000 [#1] SMP
[  170.261950] Modules linked in: rfcomm fuse xt_CHECKSUM
nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE
ip6t_REJECT tun bnep bluetooth rfkill xt_conntrack ebtable_nat
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_security iptable_raw f71882fg vfat fat
snd_hda_codec_hdmi snd_hda_codec_realtek btrfs zlib_deflate raid6_pq
libcrc32c xor x86_pkg_temp_thermal coretemp kvm_intel iTCO_wdt joydev
kvm iTCO_vendor_support snd_hda_intel mxm_wmi snd_hda_codec snd_hwdep
snd_seq snd_seq_device snd_pcm microcode sb_edac i2c_i801 serio_raw
edac_core e1000e snd_page_alloc
[  170.264416]  ntb snd_timer mei_me snd ptp mei lpc_ich soundcore
shpchp pps_core wmi mfd_core mperf uinput binfmt_misc dm_crypt radeon
hid_logitech_dj crc32_pclmul i2c_algo_bit drm_kms_helper crc32c_intel
ttm ghash_clmulni_intel drm firewire_ohci firewire_core i2c_core
crc_itu_t
[  170.265260] CPU: 0 PID: 2947 Comm: thg Tainted: G        W
3.11.6-200.fc19.x86_64 #1
[  170.265503] Hardware name: MSI MS-7760/X79A-GD65 (8D) (MS-7760),
BIOS V1.8 10/18/2012
[  170.265745] task: ffff88042ee48f40 ti: ffff8803fdb30000 task.ti:
ffff8803fdb30000
[  170.265975] RIP: 0010:[<ffffffff81307d7d>]  [<ffffffff81307d7d>]
memcpy+0xd/0x110
[  170.266209] RSP: 0018:ffff8803fdb31960  EFLAGS: 00010202
[  170.266373] RAX: ffff88040369ede9 RBX: 000000000000006c RCX: 000000000000000d
[  170.266592] RDX: 0000000000000004 RSI: 0005080000000000 RDI: ffff88040369ede9
[  170.266812] RBP: ffff8803fdb31998 R08: 0000000000001000 R09: ffff88040369e000
[  170.267031] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803eac91390
[  170.267251] R13: 0000160000000000 R14: ffff88040369ee55 R15: 000000000000006c
[  170.267471] FS:  00007f223b42c740(0000) GS:ffff88045fc00000(0000)
knlGS:0000000000000000
[  170.267721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  170.267898] CR2: 00000000031a1024 CR3: 0000000412d6a000 CR4: 00000000000407f0
[  170.268117] Stack:
[  170.268179]  ffffffffa04bf26c 0000000000001000 ffff8804477de000
0000000000000000
[  170.268424]  ffff88041688a900 ffff8803eaca2210 ffff8803f0fcda18
ffff8803fdb31a58
[  170.268666]  ffffffffa04a4376 0000000000000000 000000000000019d
00000000ffffffff
[  170.268909] Call Trace:
[  170.268994]  [<ffffffffa04bf26c>] ? read_extent_buffer+0xbc/0x110 [btrfs]
[  170.269202]  [<ffffffffa04a4376>] btrfs_get_extent+0x926/0x9b0 [btrfs]
[  170.269403]  [<ffffffffa04bc53e>] __extent_read_full_page+0x2ee/0x700 [btrfs]
[  170.269622]  [<ffffffffa04a3a50>] ? btrfs_submit_direct+0x660/0x660 [btrfs]
[  170.269832]  [<ffffffff81159f53>] ? __inc_zone_page_state+0x33/0x40
[  170.270028]  [<ffffffffa04a3a50>] ? btrfs_submit_direct+0x660/0x660 [btrfs]
[  170.270243]  [<ffffffffa04bd945>] extent_readpages+0x195/0x200 [btrfs]
[  170.270440]  [<ffffffff81183129>] ? alloc_pages_current+0xa9/0x170
[  170.270635]  [<ffffffffa04a136f>] btrfs_readpages+0x1f/0x30 [btrfs]
[  170.270824]  [<ffffffff811484fe>] __do_page_cache_readahead+0x1ae/0x240
[  170.271027]  [<ffffffff811489c6>] ondemand_readahead+0x126/0x250
[  170.271212]  [<ffffffff81148b23>] page_cache_sync_readahead+0x33/0x50
[  170.271410]  [<ffffffff8113da45>] generic_file_aio_read+0x4b5/0x700
[  170.271604]  [<ffffffff811a7ab0>] do_sync_read+0x80/0xb0
[  170.271766]  [<ffffffff811a80de>] vfs_read+0x9e/0x170
[  170.271921]  [<ffffffff811a8c09>] SyS_read+0x49/0xa0
[  170.272074]  [<ffffffff810e6496>] ? __audit_syscall_exit+0x1f6/0x2a0
[  170.272271]  [<ffffffff81656e99>] system_call_fastpath+0x16/0x1b
[  170.272458] Code: 43 4e 5b 5d c3 66 0f 1f 84 00 00 00 00 00 e8 fb
fb ff ff eb e2 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9
03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56
10 4c
[  170.273297] RIP  [<ffffffff81307d7d>] memcpy+0xd/0x110
[  170.273457]  RSP <ffff8803fdb31960>
[  170.348204] ---[ end trace 7d04a6835a0093fd ]---
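
One detail in the register dump is worth spelling out.  The faulting bytes
<f3> 48 a5 decode to a `rep movsq`, which reads through RSI, and RSI here
is 0005080000000000.  On x86-64 with 48-bit virtual addresses (an
assumption; that is the usual layout for hardware of this era), an address
is canonical only if bits 63..47 are all equal, and touching a
non-canonical address raises a general protection fault rather than a page
fault.  The Python sketch below checks the source and destination pointers
from the dump, and also that RCX and RDX add up to the 0x6c that appears in
RBX and R15.  The helper name is illustrative only, not anything from the
kernel.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of a 64-bit address are all equal.

    va_bits=48 is an assumption (4-level paging).
    """
    top = addr >> (va_bits - 1)                 # bit 47 and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


# Register values copied from the oops above.
RSI = 0x0005080000000000   # memcpy source
RDI = 0xFFFF88040369EDE9   # memcpy destination
RCX = 0xD                  # remaining 8-byte words for rep movsq
RDX = 0x4                  # trailing bytes (length & 7)
RBX = 0x6C

# memcpy computed rcx = len >> 3 and edx = len & 7 just before faulting,
# so the requested length can be rebuilt from the two registers.
copy_len = RCX * 8 + RDX

print("source canonical:     ", is_canonical(RSI))
print("destination canonical:", is_canonical(RDI))
print("copy length:          ", hex(copy_len))
```

If that reading is right, the destination is an ordinary kernel pointer and
the source pointer itself is garbage, which would point at whatever computed
the source address in the caller rather than at memcpy.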


This issue has survived a reboot.

(The taint flag is due to a bogus BGRT table in my EFI BIOS.  It's not
corrupting any kernel data structures.)


--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: OOPS on 3.11.6
From: Duncan @ 2013-11-05  8:30 UTC
  To: linux-btrfs

Andy Lutomirski posted on Mon, 04 Nov 2013 15:11:44 -0800 as excerpted:

> (This is Fedora's kernel 3.11.6-200.fc19.x86_64)
> 
> I have a file on my btrfs filesystem.  Reading it results in:
> 
> [  170.261789] general protection fault: 0000 [#1] SMP

I had a similar case recently (running 3.12-rc5+ at the time, I believe).

Unfortunately sometimes my storage takes longer to stabilize after resume 
from suspend-to-ram than the kernel is willing to wait (and again 
unfortunately I know of no knob for that, I already have "wait forever" 
set for boot, but the kernel apparently doesn't use the same knob for 
resume), and occasionally one of the devices drops out of my btrfs raid1 
configuration, with the resulting kernel and btrfs mayhem.

Root is never remounted read-write by default, only for system updates, 
so it remains consistent.  But my (separate btrfs) log and home 
partitions cannot be remounted read-only for the suspend due to files 
being in-use, so they remain read-write mounted thru the suspend, and 
when the device drops they go inconsistent.

Fortunately, most of the time a scrub after reboot seems to fix things up 
just fine, but the last time it happened, two files, my user's 
~/.bash_history and ~/.xsession_errors files, were apparently corrupted 
beyond what scrub could fix.  Despite scrub saying it fixed everything 
(and a rescrub resulting in no errors), any attempt to read those files 
resulted in a hung task.  Since one of them was ~/.bash_history, that 
naturally meant I couldn't log in at the console, and after fixing that, I 
still couldn't startx due to the ~/.xsession_errors problem.

I tried various ways (cat, etc) to read the files to see what the problem 
was, but that had the same result, so ultimately I simply blew them away 
with an rm, and let bash and X recreate them.

I've toyed with the idea of bind-mounting a couple of tmpfs files over 
the two (as I already do with $TMPDIR and $KDETMP except they're not 
bindmounts, just pointed at the appropriate tmpfs), since they're 
basically cached history/errors in any case and losing them isn't a big 
deal, but what if a more critical file happened to be being written when 
I suspended?  I suppose I could work thru my routinely open-write files 
one at a time, bindmounting tmpfs, until I could pre-suspend read-only 
mount /home in the routine case, and refuse to suspend if I couldn't 
read-only mount, but that is beyond the ability of most users and 
/shouldn't/ be necessary.
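
As a rough illustration of the "refuse to suspend if I couldn't read-only
mount" idea, here is a minimal Python sketch of the check a pre-suspend
hook might run.  It only parses text in the /proc/mounts format (device,
mount point, filesystem type, options, ...) and reports whether every
listed mount point is currently read-only.  The function names, the sample
device names, and the idea of wiring this into a suspend hook are all
assumptions for illustration; the remount itself and the hook mechanism are
not shown.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not mounted.

    The last matching line wins, since later mounts shadow earlier ones.
    """
    opts = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            opts = fields[3].split(",")
    return opts


def safe_to_suspend(mounts_text, mount_points):
    """True only if every mounted entry in mount_points is read-only."""
    for mp in mount_points:
        opts = mount_options(mounts_text, mp)
        if opts is not None and "ro" not in opts:
            return False
    return True


# Hypothetical sample in /proc/mounts format; device names are made up.
SAMPLE = """\
/dev/sda2 / btrfs ro,relatime,space_cache 0 0
/dev/sda3 /home btrfs rw,relatime,space_cache 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

if __name__ == "__main__":
    # On a real system the text would come from open("/proc/mounts").read().
    print(safe_to_suspend(SAMPLE, ["/", "/home"]))
```

A hook built on this would first try the read-only remount, then run the
check, and abort the suspend if it fails.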

What really bothers me is that scrub supposedly fixed all the errors, yet 
these files were still corrupt to the point that even a cat of the 
affected file would hang the system -- so obviously the filesystem wasn't 
in a consistent state despite scrub's claims.  What would it take for 
btrfs in raid1 mode to atomically update one copy at a time, so a scrub 
would consistently recreate either the pre-write or the post-write copy, 
and the file would never be corrupted by a crash at the wrong moment 
beyond what scrub could recover to either one or the other, 
consistently?  Isn't atomic COW supposed to already do just that?

But with a read-only mounted root, at least I should always have full 
recovery tools available to me. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


