* [Bug 113041] mbcache NULL pointer dereference
2016-02-24 10:05 [Bug 113041] New: mbcache NULL pointer dereference bugzilla-daemon
@ 2016-03-15 2:11 ` bugzilla-daemon
2016-03-15 13:07 ` bugzilla-daemon
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: bugzilla-daemon @ 2016-03-15 2:11 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
nickkrause@sympatico.ca changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nickkrause@sympatico.ca
--- Comment #1 from nickkrause@sympatico.ca ---
Have you tried a newer release candidate to see whether this bug has been fixed?
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 113041] mbcache NULL pointer dereference
From: bugzilla-daemon @ 2016-03-15 13:07 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
--- Comment #2 from Johnny <johnny+bugzilla@appdata.biz> ---
Unfortunately not, as I don't know how to reproduce the issue and it happened
in a production environment where we rely on the version that the CoreOS
distribution provides.
* [Bug 113041] mbcache NULL pointer dereference
From: bugzilla-daemon @ 2016-03-15 15:47 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
--- Comment #3 from nickkrause@sympatico.ca ---
If you tell me what Cassandra was doing, I may be able to find the issue by
reading the code carefully, but I would still like to test it to make sure.
* [Bug 113041] mbcache NULL pointer dereference
From: bugzilla-daemon @ 2016-04-11 13:21 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
--- Comment #4 from Johnny <johnny+bugzilla@appdata.biz> ---
Another crash today with a similar trace output:
```
[511806.488629] general protection fault: 0000 [#1] SMP
[511806.489335] Modules linked in: xt_conntrack ipt_MASQUERADE
nf_nat_masquerade_ipv4 vxlan ip6_udp_tunnel udp_tunnel iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter
br_netfilter nf_nat nf_conntrack bridge stp llc xfs libcrc32c nls_ascii
nls_cp437 vfat fat xenfs xen_privcmd ext4 crc16 mbcache jbd2 crc32c_intel hmac
drbg aesni_intel ata_piix aes_x86_64 glue_helper libata lrw mousedev gf128mul
ablk_helper cryptd xen_blkfront microcode i2c_piix4 firmware_class scsi_mod
psmouse i2c_core ixgbevf evdev acpi_cpufreq button sch_fq_codel ip_tables
autofs4
[511806.520082] CPU: 2 PID: 57829 Comm: java Not tainted 4.2.2-coreos-r2 #2
[511806.529094] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[511806.529094] task: ffff8801636e0000 ti: ffff88015aaf0000 task.ti:
ffff88015aaf0000
[511806.529094] RIP: 0010:[<ffffffff812c3bf9>] [<ffffffff812c3bf9>]
strnlen+0x9/0x40
[511806.529094] RSP: 0018:ffff88015aaf3128 EFLAGS: 00010086
[511806.529094] RAX: ffffffff817c48ce RBX: ffffffff8356e003 RCX:
0000000000000000
[511806.529094] RDX: 017fff0000080078 RSI: ffffffffffffffff RDI:
017fff0000080078
[511806.529094] RBP: ffff88015aaf3128 R08: 000000000000ffff R09:
000000000000ffff
[511806.529094] R10: ffff880770658f80 R11: ffff88072d51e888 R12:
017fff0000080078
[511806.529094] R13: ffffffff8356e3a0 R14: 00000000ffffffff R15:
0000000000000000
[511806.529094] FS: 00007ff4a85f8700(0000) GS:ffff880770640000(0000)
knlGS:0000000000000000
[511806.529094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[511806.529094] CR2: 00007ff65800e000 CR3: 00000006e0ff1000 CR4:
00000000001406e0
[511806.529094] Stack:
[511806.529094] ffff88015aaf3168 ffffffff812c58ff 0000000000000296
ffffffff8356e003
[511806.529094] ffffffff8356e3a0 ffff88015aaf32b0 ffffffff817c9288
ffffffff817c9288
[511806.529094] ffff88015aaf31e8 ffffffff812c73b3 ffff88015aaf31b8
ffffffff81154868
[511806.529094] Call Trace:
[511806.529094] [<ffffffff812c58ff>] string.isra.4+0x3f/0xd0
[511806.529094] [<ffffffff812c73b3>] vsnprintf+0x163/0x510
[511806.529094] [<ffffffff81154868>] ? free_hot_cold_page_list+0x48/0xa0
[511806.529094] [<ffffffff812c7771>] vscnprintf+0x11/0x40
[511806.529094] [<ffffffff810bd548>] vprintk_emit+0x128/0x530
[511806.529094] [<ffffffff810bda9f>] vprintk_default+0x1f/0x30
[511806.529094] [<ffffffff815250d3>] printk+0x46/0x48
[511806.529094] [<ffffffff811a318a>] kmem_cache_free+0x13a/0x1f0
[511806.529094] [<ffffffff810afc01>] ?
__raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[511806.529094] [<ffffffffa003f0ce>] 0xffffffffa003f0ce
[511806.529094] [<ffffffffa003ffac>] mb_cache_shrink+0x2bc/0x3a0 [mbcache]
[511806.529094] [<ffffffff8115ef3d>] shrink_slab+0x1ed/0x370
[511806.529094] [<ffffffff8109cd00>] ? enqueue_entity+0x3e0/0xdc0
[511806.529094] [<ffffffff81163283>] shrink_zone+0x283/0x290
[511806.529094] [<ffffffff811633ec>] do_try_to_free_pages+0x15c/0x430
[511806.529094] [<ffffffff8116377a>] try_to_free_pages+0xba/0x130
[511806.529094] [<ffffffff8115658a>] __alloc_pages_nodemask+0x56a/0x970
[511806.529094] [<ffffffff81199221>] alloc_pages_current+0x91/0x100
[511806.529094] [<ffffffff811a3d9c>] new_slab+0x34c/0x440
[511806.529094] [<ffffffff810afc01>] ?
__raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[511806.529094] [<ffffffff811a4239>] __slab_alloc+0x3a9/0x490
[511806.529094] [<ffffffffa017aa6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[511806.529094] [<ffffffff8126818c>] ? hashtab_search+0x5c/0x80
[511806.529094] [<ffffffff81274787>] ? mls_level_isvalid+0x57/0x60
[511806.529094] [<ffffffffa017aa6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[511806.529094] [<ffffffff811a44b1>] kmem_cache_alloc+0x191/0x1f0
[511806.529094] [<ffffffffa017aa6f>] ext4_orphan_del+0x47ff/0xda20 [ext4]
[511806.529094] [<ffffffff811d7a9d>] alloc_inode+0x1d/0x90
[511806.529094] [<ffffffff811d98a1>] new_inode_pseudo+0x11/0x60
[511806.529094] [<ffffffff811d990b>] new_inode+0x1b/0x40
[511806.529094] [<ffffffffa0163c7f>] __ext4_new_inode+0x7f/0x1190 [ext4]
[511806.529094] [<ffffffffa017463c>] ext4_insert_dentry+0x188c/0x1900 [ext4]
[511806.529094] [<ffffffff811c9e2a>] vfs_create+0xca/0x130
[511806.529094] [<ffffffff8123c748>] ovl_create_real+0xb8/0x230
[511806.529094] [<ffffffff8123d9ab>] ovl_create_or_link+0x10b/0x500
[511806.529094] [<ffffffff8123dddd>] ovl_create_object+0x3d/0x60
[511806.529094] [<ffffffff8125d533>] ? selinux_inode_create+0x13/0x20
[511806.529094] [<ffffffff8123deb1>] ovl_create+0x21/0x30
[511806.529094] [<ffffffff811c9e2a>] vfs_create+0xca/0x130
[511806.529094] [<ffffffff811cc3f1>] path_openat+0xab1/0x13e0
[511806.529094] [<ffffffff811cce9b>] ? putname+0x5b/0x60
[511806.529094] [<ffffffff81090f6f>] ? wake_up_q+0x2f/0x70
[511806.529094] [<ffffffff811a4499>] ? kmem_cache_alloc+0x179/0x1f0
[511806.529094] [<ffffffff811cdddb>] do_filp_open+0x7b/0xe0
[511806.529094] [<ffffffff811daeb9>] ? __alloc_fd+0x89/0x110
[511806.529094] [<ffffffff811bd27c>] do_sys_open+0x12c/0x210
[511806.529094] [<ffffffff81021b4f>] ? syscall_trace_enter_phase1+0xff/0x150
[511806.529094] [<ffffffff811bd37e>] SyS_open+0x1e/0x20
[511806.529094] [<ffffffff8152bbae>] entry_SYSCALL_64_fastpath+0x12/0x71
[511806.529094] Code: 00 00 80 3f 00 55 48 89 e5 74 11 48 89 f8 48 83 c0 01 80
38 00 75 f7 48 29 f8 5d c3 31 c0 5d c3 66 90 55 48 85 f6 48 89 e5 74 2d <80> 3f
00 74 28 48 8d 47 01 48 01 fe eb 0a 48 83 c0 01 80 78 ff
[511806.529094] RIP [<ffffffff812c3bf9>] strnlen+0x9/0x40
[511806.529094] RSP <ffff88015aaf3128>
[511806.529094] ---[ end trace 045dada6ce1782d4 ]---
[511806.529094] Kernel panic - not syncing: Fatal exception
[511806.529094] Kernel Offset: disabled
```
It could possibly be related to making backups of the Cassandra data files
at the same time. As there are no logs from Cassandra at the moment of the
crash, it's hard to know exactly what it was trying to do.
A general observation is that both traces mention deleting files on ext4 (here,
the ext4_orphan_del frames), while the Cassandra storage is supposed to use xfs
according to our mount table. Also, Cassandra is doing file compactions, moving
data around pretty much all the time, but the disk statistics show no
extraordinary readings at the time of the crash.
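One way to double-check the mount-table observation is to group the mounts by filesystem type and see which ones are really on ext4. The following is only a minimal sketch, assuming input in the usual `/proc/mounts` layout (device, mount point, type, options). The helper name `mounts_by_fstype` and the sample devices and paths are hypothetical, not taken from the affected hosts. Since the trace above passes through `ovl_create` before reaching ext4 code, an overlay mount backed by ext4 may be worth looking for, though that is a guess.

```python
from collections import defaultdict


def mounts_by_fstype(mounts_text):
    """Group (device, mount point) pairs by filesystem type.

    Expects text in the /proc/mounts layout: device, mount point,
    filesystem type, options, and two trailing numeric fields.
    """
    grouped = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fstype = fields[:3]
        grouped[fstype].append((device, mount_point))
    return dict(grouped)


# Hypothetical sample; devices and paths are made up for illustration.
SAMPLE_MOUNTS = """\
/dev/xvda9 / ext4 rw,relatime 0 0
/dev/xvdb /var/lib/cassandra xfs rw,noatime 0 0
overlay /var/lib/docker/overlay/abc/merged overlay rw,relatime 0 0
"""

for fstype, entries in sorted(mounts_by_fstype(SAMPLE_MOUNTS).items()):
    for device, mount_point in entries:
        print(f"{fstype:8} {device:12} {mount_point}")
```

On a live host the same function could be fed the contents of `/proc/mounts` instead of the sample string.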
An additional note: the Cassandra version is 2.1.11-1, not .12 as
previously mentioned.
Also, the Linux version is higher this time:
Linux version 4.3.6-coreos (buildbot@ip-10-204-3-57) (gcc version 4.9.3 (Gentoo
Hardened 4.9.3 p1.3, pie-0.6.3) ) #2 SMP Tue Apr 5 10:32:16 UTC 2016
* [Bug 113041] mbcache NULL pointer dereference
From: bugzilla-daemon @ 2016-04-27 7:26 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
--- Comment #5 from Johnny <johnny+bugzilla@appdata.biz> ---
And another crash, with Cassandra 2.1.13 and again kernel 4.3.6:
```
[121437.908906] general protection fault: 0000 [#1] SMP
[121437.912476] Modules linked in: veth xt_conntrack ipt_MASQUERADE
nf_nat_masquerade_ipv4 vxlan ip6_udp_tunnel udp_tunnel iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter nf_nat
nf_conntrack br_netfilter bridge stp llc overlay xfs libcrc32c crc32c_generic
nls_ascii nls_cp437 vfat fat xenfs xen_privcmd ext4 crc16 mbcache jbd2
crc32c_intel hmac drbg aesni_intel aes_x86_64 glue_helper lrw gf128mul
ablk_helper ata_piix cryptd mousedev libata xen_blkfront microcode
firmware_class psmouse evdev scsi_mod i2c_piix4 ixgbevf i2c_core acpi_cpufreq
tpm_tis tpm button sch_fq_codel ip_tables autofs4
[121437.936337] CPU: 2 PID: 66 Comm: kswapd0 Not tainted 4.3.6-coreos #2
[121437.936337] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[121437.936337] task: ffff8803bde79d00 ti: ffff8803b9ef0000 task.ti:
ffff8803b9ef0000
[121437.936337] RIP: 0010:[<ffffffffad1ac714>] [<ffffffffad1ac714>]
kmem_cache_free+0x74/0x1e0
[121437.936337] RSP: 0018:ffff8803b9ef3bf8 EFLAGS: 00010246
[121437.936337] RAX: 017fff0000000080 RBX: ffff8803819614e0 RCX:
000000010027001b
[121437.936337] RDX: 000077ff80000000 RSI: ffff8803819614e0 RDI:
ffffea0006c579a0
[121437.936337] RBP: ffff8803b9ef3c10 R08: 0000000081961401 R09:
ffffffffc02a001a
[121437.936337] R10: ffffea000934e060 R11: ffffea000e065840 R12:
ffffea0006c579a0
[121437.936337] R13: 0000000000000059 R14: 0000000000000080 R15:
ffffffffc02a3000
[121437.936337] FS: 0000000000000000(0000) GS:ffff8803cfc40000(0000)
knlGS:0000000000000000
[121437.936337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[121437.936337] CR2: 00007f3084000000 CR3: 000000002da0b000 CR4:
00000000001406e0
[121437.936337] Stack:
[121437.936337] ffffea000934e040 ffff8803b9ef3c38 0000000000000059
ffff8803b9ef3c28
[121437.936337] ffffffffc02a001a ffff880381961270 ffff8803b9ef3c68
ffffffffc02a103f
[121437.936337] ffff880128c393a8 ffff8802663be680 00000000a184fd14
0000000000000088
[121437.936337] Call Trace:
[121437.936337] [<ffffffffc02a001a>] 0xffffffffc02a001a
[121437.936337] [<ffffffffc02a103f>] mb_cache_entry_find_next+0x17f/0x270
[mbcache]
[121437.936337] [<ffffffffad1669ae>] shrink_slab.part.42+0x1de/0x370
[121437.936337] [<ffffffffad16aa8d>] shrink_zone+0x28d/0x2d0
[121437.936337] [<ffffffffad16ba91>] kswapd+0x551/0x9e0
[121437.936337] [<ffffffffad16b540>] ? mem_cgroup_shrink_node_zone+0x190/0x190
[121437.936337] [<ffffffffad08e178>] kthread+0xd8/0xf0
[121437.936337] [<ffffffffad08e0a0>] ? kthread_park+0x60/0x60
[121437.936337] [<ffffffffad54749f>] ret_from_fork+0x3f/0x70
[121437.936337] [<ffffffffad08e0a0>] ? kthread_park+0x60/0x60
[121437.936337] Code: 01 d8 48 0f 42 15 1d 59 86 00 4c 8b 4d 08 48 01 d0 48 c1
e8 0c 48 c1 e0 06 49 01 c3 49 8b 03 f6 c4 80 0f 85 56 01 00 00 4c 8b 17 <65> 49
8b 52 08 65 4c 03 15 e7 d9 e5 52 4d 3b 5a 10 0f 85 29 01
[121437.936337] RIP [<ffffffffad1ac714>] kmem_cache_free+0x74/0x1e0
[121437.936337] RSP <ffff8803b9ef3bf8>
[121438.134600] ---[ end trace 199019257ae805c3 ]---
[121438.137849] Kernel panic - not syncing: Fatal exception
[121438.138841] Kernel Offset: 0x2c000000 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffffbfffffff)
```
* [Bug 113041] mbcache NULL pointer dereference
From: bugzilla-daemon @ 2016-04-27 7:36 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
--- Comment #6 from Johnny <johnny+bugzilla@appdata.biz> ---
What's new in the latest trace is that the comm is kswapd0 rather than java.
* [Bug 113041] mbcache NULL pointer dereference
From: bugzilla-daemon @ 2016-04-27 21:35 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=113041
Theodore Tso <tytso@mit.edu> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu
--- Comment #7 from Theodore Tso <tytso@mit.edu> ---
The comm field doesn't really matter all that much. The crash is in the
mbcache slab shrinker, which gets called from the VM when the system is under
memory pressure.
It looks like the crash is in the extended attribute cache which is in turn
triggered by SELinux. (As far as I know Cassandra doesn't use extended
attributes.)
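To compare the traces in this bug side by side, a small script can pull out the comm, the faulting RIP symbol, and the module-tagged frames. This is a minimal sketch, assuming oops text in the format pasted in this thread. The function name `summarize_oops` and the regular expressions are illustrative only, not part of any kernel tooling. The excerpts are copied from the two traces in comments #4 and #5.

```python
import re

# "Comm: java" / "Comm: kswapd0" on the "CPU: ... PID: ..." line.
COMM_RE = re.compile(r"\bComm: (\S+)")
# Final "RIP [<addr>] symbol+0x..." line (not the "RIP: 0010:" register line).
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")
# Frames carrying a "[module]" tag; \s+ also spans the mail's line wrap.
MODULE_FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]"
)


def summarize_oops(text):
    """Return the comm, RIP symbol and module-tagged frames of one oops."""
    comm = COMM_RE.search(text)
    rip = RIP_RE.search(text)
    return {
        "comm": comm.group(1) if comm else None,
        "rip": rip.group(1) if rip else None,
        "module_frames": MODULE_FRAME_RE.findall(text),
    }


# Excerpt from the trace in comment #4.
OOPS_JAVA = """\
[511806.520082] CPU: 2 PID: 57829 Comm: java Not tainted 4.2.2-coreos-r2 #2
[511806.529094] [<ffffffff811a318a>] kmem_cache_free+0x13a/0x1f0
[511806.529094] [<ffffffffa003ffac>] mb_cache_shrink+0x2bc/0x3a0 [mbcache]
[511806.529094] [<ffffffff8115ef3d>] shrink_slab+0x1ed/0x370
[511806.529094] RIP [<ffffffff812c3bf9>] strnlen+0x9/0x40
"""

# Excerpt from the trace in comment #5, with the mail's line wrap kept.
OOPS_KSWAPD = """\
[121437.936337] CPU: 2 PID: 66 Comm: kswapd0 Not tainted 4.3.6-coreos #2
[121437.936337] [<ffffffffc02a103f>] mb_cache_entry_find_next+0x17f/0x270
[mbcache]
[121437.936337] [<ffffffffad1669ae>] shrink_slab.part.42+0x1de/0x370
[121437.936337] RIP [<ffffffffad1ac714>] kmem_cache_free+0x74/0x1e0
"""

for oops in (OOPS_JAVA, OOPS_KSWAPD):
    print(summarize_oops(oops))
```

For these two excerpts the comm differs, while each has an mbcache frame directly above `shrink_slab` in the call trace.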
Note that the 4.3.x kernel is not a long-term supported kernel, and it's no
longer automatically getting bug fixes ported to it, at least not in the
upstream. If CoreOS is providing their own security updates, then you should
really ask them for support, because this would be a distro kernel that has
changes not seen or supported by upstream developers.