From: Tomasz Chmielewski <tch@virtall.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: kernel crashes with btrfs and busy database IO - how to debug?
Date: Thu, 11 Jun 2015 20:33:41 +0900
Message-ID: <ae9b9ca434c98509ca6c1ba6dbd84b63@admin.virtall.com>

I have a server where I've installed a couple of LXC guests on btrfs,
which makes it easy to test things with snapshots. Or so it seemed.

Unfortunately, the box crashes under heavy IO load. "Heavy" here means
these two running at the same time:

- a quite busy MySQL database (up to 100% IO wait when running alone)
- a busy mongo database (up to 100% IO wait when running alone)

With both mongo and mysql running at the same time, it crashes after 1-2 
days (tried kernels 4.0.4, 4.0.5, 4.1-rc7 from Ubuntu "kernel-ppa"). It 
does not crash if I only run mongo, or only mysql. There is plenty of 
memory available (just around 2-4 GB used out of 32 GB) when it crashes.

As the box is only reachable remotely, I can't capture the full crash
output. Sometimes part of it gets printed to a remote SSH session, like
here:

[162276.341030] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000008
[162276.341069] IP: [<ffffffff810c06cd>] 
prepare_to_wait_event+0xcd/0x100
[162276.341096] PGD 80a15e067 PUD 6e08c2067 PMD 0
[162276.341116] Oops: 0002 [#1] SMP
[162276.341133] Modules linked in: xfs libcrc32c xt_conntrack veth 
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc 
intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp 
kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
aesni_intel aes_x86_64 lrw eeepc_wmi gf128mul asus_wmi glue_helper 
sparse_keymap ablk_helper cryptd ie31200_edac shpchp lpc_ich edac_core 
mac_hid 8250_fintek tpm_infineon wmi serio_raw video lp parport btrfs 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq e1000e raid1 raid0 ahci ptp libahci multipath 
pps_core linear [last unloaded: xfs]
[162276.341394] CPU: 6 PID: 12853 Comm: mysqld Not tainted 
4.1.0-040100rc7-generic #201506080035
[162276.341428] Hardware name: System manufacturer System Product 
Name/P8B WS, BIOS 0904 10/24/2011
[162276.341463] task: ffff8800730d8a10 ti: ffff88047a0f8000 task.ti: 
ffff88047a0f8000
[162276.341495] RIP: 0010:[<ffffffff810c06cd>]  [<ffffffff810c06cd>] 
prepare_to_wait_event+0xcd/0x100
[162276.341532] RSP: 0018:ffff88047a0fbcd8  EFLAGS: 00010046
[162276.341583] RDX: ffff88047a0fbd48 RSI: ffff8800730d8a10 RDI: 
ffff8801e2f96ee8
[162276.341615] RBP: ffff88047a0fbd08 R08: 0000000000000000 R09: 
0000000000000001
[162276.341646] R10: 0000000000000001 R11: 0000000000000000 R12: 
ffff8801e2f96ee8
[162276.341678] R13: 0000000000000002 R14: ffff8801e2f96e60 R15: 
ffff8806b513f248
[162276.341709] FS:  00007f9f2bbd3700(0000) GS:ffff88082fb80000(0000) 
knlGS:0000000000000000
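
For anyone reading the header of the trace above: the number after
"Oops:" is the x86 page-fault error code, printed in hex. Below is a
minimal Python sketch of decoding it. The names PF_BITS and
decode_oops_code are made up for illustration; only the meanings of the
five low bits are standard x86.

```python
# Low bits of the x86 page-fault error code shown after "Oops:".
# Each entry: (bit mask, meaning if set, meaning if clear or None).
# PF_BITS and decode_oops_code are illustrative names, not kernel APIs.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Return human-readable meanings for a hex error code string."""
    code = int(code_hex, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


if __name__ == "__main__":
    # "0002" is the value from the "Oops: 0002 [#1] SMP" line above.
    print(decode_oops_code("0002"))
```

For this crash, 0002 decodes to a kernel-mode write to a page that is
not present. That fits the reported NULL pointer dereference at address
8, which is presumably a store to a field at offset 8 of a NULL
structure pointer (an inference, not something the trace states).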

Remote syslog does not capture anything.

The trace above does not point directly at btrfs, but the box does not
crash when the same tests are run on ext4. The box passes memtest and is
generally stable otherwise.

How can I debug this further?


In the 4.1-rc7 kernel source, "prepare_to_wait_event" appears in these
places:

include/linux/wait.h:           long __int = prepare_to_wait_event(&wq, 
&__wait, state);\
include/linux/wait.h:long prepare_to_wait_event(wait_queue_head_t *q, 
wait_queue_t *wait, int state);
kernel/sched/wait.c:long prepare_to_wait_event(wait_queue_head_t *q, 
wait_queue_t *wait, int state)
kernel/sched/wait.c:EXPORT_SYMBOL(prepare_to_wait_event);
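
To narrow the crash down to a spot inside that function, the
"symbol+0xoffset/0xsize" part of the IP/RIP line is the useful piece.
A minimal Python sketch of pulling it out of a trace line follows; the
regular expression and the name parse_location are hypothetical helpers
written for this example, not part of any kernel tooling.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" as printed on the IP/RIP lines.
# SYM_RE and parse_location are illustrative names for this sketch.
SYM_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_location(text):
    """Return (symbol, offset, size) from an oops line, or None."""
    match = SYM_RE.search(text)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


if __name__ == "__main__":
    line = "IP: [<ffffffff810c06cd>] prepare_to_wait_event+0xcd/0x100"
    print(parse_location(line))
```

Here the fault is at byte 0xcd (205) of a 0x100 (256) byte function.
If a vmlinux with debug info for this exact kernel build is available,
something like gdb's "list *(prepare_to_wait_event+0xcd)" should map
that offset to a source line.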



-- 
Tomasz Chmielewski
http://wpkg.org

