* 3.10.20-rt17, BUG and Oops
@ 2013-11-30 7:56 Fernando Lopez-Lezcano
2013-11-30 16:40 ` Carsten Emde
0 siblings, 1 reply; 11+ messages in thread
From: Fernando Lopez-Lezcano @ 2013-11-30 7:56 UTC
To: linux-rt-users; +Cc: nando, LKML, Steven Rostedt
Hi all,
Just got this on 3.10.20-rt17 on a ThinkPad T510 running Fedora 19 (I think
it has happened a few times before). The machine is not completely dead:
the mouse pointer still moves around, but otherwise the display does not
update and the keyboard does not respond.
-- Fernando
--------
Nov 29 23:17:52 localhost kernel: [50532.638944] BUG: unable to handle
kernel NULL pointer dereference at 00000000000002c7
Nov 29 23:17:52 localhost kernel: [50532.638951] IP:
[<ffffffff81361e9a>] advance_transaction+0x60/0x121
Nov 29 23:17:52 localhost kernel: [50532.638953] PGD 1db141067 PUD
228703067 PMD 0
Nov 29 23:17:52 localhost kernel: [50532.638955] Oops: 0000 [#1] PREEMPT
SMP
Nov 29 23:17:52 localhost kernel: [50532.638983] Modules linked in:
snd_hrtimer snd_seq_midi snd_seq_midi_event snd_seq_dummy snd_hdsp
snd_rawmidi fuse xt_CHECKSUM tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security rfcomm iptable_raw
bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi acpi_cpufreq mperf
coretemp kvm_intel kvm crc32_pclmul uvcvideo crc32c_intel
ghash_clmulni_intel videobuf2_vmalloc videobuf2_memops videobuf2_core
microcode videodev media serio_raw snd_hda_codec_conexant intel_ips
btusb i2c_i801 bluetooth arc4 iwldvm mac80211 snd_hda_intel
snd_hda_codec iwlwifi snd_hwdep sdhci_pci snd_seq sdhci snd_seq_device
cfg80211 mmc_core snd_pcm lpc_ich mfd_core e1000e snd_page_alloc ptp
mei_me snd_timer pps_core mei thinkpad_acpi snd soundcore rfkill shpchp
uinput nouveau i2c_algo_bit firewire_ohci drm_kms_helper firewire_core
crc_itu_t ttm drm i2c_core mxm_wmi video wmi
Nov 29 23:17:52 localhost kernel: [50532.639006] CPU: 0 PID: 45 Comm:
irq/9-acpi Not tainted 3.10.20-200.rt17.1.fc19.ccrma.x86_64.rt #1
Nov 29 23:17:52 localhost kernel: [50532.639007] Hardware name: LENOVO
4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Nov 29 23:17:52 localhost kernel: [50532.639008] task: ffff880229bc8000
ti: ffff880229bac000 task.ti: ffff880229bac000
Nov 29 23:17:52 localhost kernel: [50532.639011] RIP:
0010:[<ffffffff81361e9a>] [<ffffffff81361e9a>]
advance_transaction+0x60/0x121
Nov 29 23:17:52 localhost kernel: [50532.639012] RSP:
0018:ffff880229badd50 EFLAGS: 00010246
Nov 29 23:17:52 localhost kernel: [50532.639013] RAX: 0000000000000081
RBX: ffff88011339f990 RCX: 0000000000000082
Nov 29 23:17:52 localhost kernel: [50532.639013] RDX: 0000000000000246
RSI: 0000000000000001 RDI: ffff880229b78eb0
Nov 29 23:17:52 localhost kernel: [50532.639014] RBP: ffff880229badd70
R08: 0000000000000000 R09: 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.639015] R10: 0000000000000001
R11: 000000000000102a R12: ffff880229b78e00
Nov 29 23:17:52 localhost kernel: [50532.639016] R13: 0000000000000001
R14: ffff880229b78eb0 R15: ffff880229b78d36
Nov 29 23:17:52 localhost kernel: [50532.639017] FS:
0000000000000000(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.639018] CS: 0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Nov 29 23:17:52 localhost kernel: [50532.639019] CR2: 00000000000002c7
CR3: 00000001e8198000 CR4: 00000000000007f0
Nov 29 23:17:52 localhost kernel: [50532.639020] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.639021] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 29 23:17:52 localhost kernel: [50532.639021] Stack:
Nov 29 23:17:52 localhost kernel: [50532.639024] ffff880229b78e00
0000000000000001 ffff88022983b000 0000000000000001
Nov 29 23:17:52 localhost kernel: [50532.639026] ffff880229badd90
ffffffff8136258e ffff880229bc0198 0000000000000011
Nov 29 23:17:52 localhost kernel: [50532.639028] ffff880229baddb8
ffffffff8136c3a3 ffff880229b8a660 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.639028] Call Trace:
Nov 29 23:17:52 localhost kernel: [50532.639032] [<ffffffff8136258e>]
acpi_ec_gpe_handler+0x48/0xc9
Nov 29 23:17:52 localhost kernel: [50532.639036] [<ffffffff8136c3a3>]
acpi_ev_gpe_dispatch+0xb6/0x126
Nov 29 23:17:52 localhost kernel: [50532.639037] [<ffffffff8136c4d3>]
acpi_ev_gpe_detect+0xc0/0x111
Nov 29 23:17:52 localhost kernel: [50532.639043] [<ffffffff810f46b0>] ?
irq_thread_fn+0x50/0x50
Nov 29 23:17:52 localhost kernel: [50532.639044] [<ffffffff8136e3cf>]
acpi_ev_sci_xrupt_handler+0x1f/0x25
Nov 29 23:17:52 localhost kernel: [50532.639048] [<ffffffff8135b12f>]
acpi_irq+0x16/0x31
Nov 29 23:17:52 localhost kernel: [50532.639050] [<ffffffff810f46d3>]
irq_forced_thread_fn+0x23/0x70
Nov 29 23:17:52 localhost kernel: [50532.639051] [<ffffffff810f4c7f>]
irq_thread+0x10f/0x150
Nov 29 23:17:52 localhost kernel: [50532.639053] [<ffffffff810f4770>] ?
wake_threads_waitq+0x50/0x50
Nov 29 23:17:52 localhost kernel: [50532.639054] [<ffffffff810f4b70>] ?
irq_thread_check_affinity+0x90/0x90
Nov 29 23:17:52 localhost kernel: [50532.639058] [<ffffffff810865e2>]
kthread+0xb2/0xc0
Nov 29 23:17:52 localhost kernel: [50532.639059] [<ffffffff81086530>] ?
kthread_worker_fn+0x180/0x180
Nov 29 23:17:52 localhost kernel: [50532.639063] [<ffffffff816628ec>]
ret_from_fork+0x7c/0xb0
Nov 29 23:17:52 localhost kernel: [50532.639064] [<ffffffff81086530>] ?
kthread_worker_fn+0x180/0x180
Nov 29 23:17:52 localhost kernel: [50532.639076] Code: 0f 84 d0 00 00 00
0f b6 43 13 8a 4b 15 38 c1 76 44 41 f6 c5 02 0f 85 9c 00 00 00 8d 48 01
48 8b 13 88 4b 13 f6 05 98 2a 98 00 04 <8a> 1c 02 74 18 0f b6 d3 48 c7
c6 bf aa a2 81 48 c7 c7 10 49 ce
Nov 29 23:17:52 localhost kernel: [50532.639078] RIP
[<ffffffff81361e9a>] advance_transaction+0x60/0x121
Nov 29 23:17:52 localhost kernel: [50532.639079] RSP <ffff880229badd50>
Nov 29 23:17:52 localhost kernel: [50532.639079] CR2: 00000000000002c7
Nov 29 23:17:52 localhost kernel: [50532.652572] ---[ end trace
0000000000000002 ]---
Nov 29 23:17:52 localhost kernel: [50532.652622] BUG: unable to handle
kernel paging request at ffffffffffffffd8
Nov 29 23:17:52 localhost kernel: [50532.652635] IP:
[<ffffffff81086900>] kthread_data+0x10/0x20
Nov 29 23:17:52 localhost kernel: [50532.652637] PGD 1c12067 PUD 1c14067
PMD 0
Nov 29 23:17:52 localhost kernel: [50532.652640] Oops: 0000 [#2] PREEMPT
SMP
Nov 29 23:17:52 localhost kernel: [50532.652680] Modules linked in:
snd_hrtimer snd_seq_midi snd_seq_midi_event snd_seq_dummy snd_hdsp
snd_rawmidi fuse xt_CHECKSUM tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security rfcomm iptable_raw
bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi acpi_cpufreq mperf
coretemp kvm_intel kvm crc32_pclmul uvcvideo crc32c_intel
ghash_clmulni_intel videobuf2_vmalloc videobuf2_memops videobuf2_core
microcode videodev media serio_raw snd_hda_codec_conexant intel_ips
btusb i2c_i801 bluetooth arc4 iwldvm mac80211 snd_hda_intel
snd_hda_codec iwlwifi snd_hwdep sdhci_pci snd_seq sdhci snd_seq_device
cfg80211 mmc_core snd_pcm lpc_ich mfd_core e1000e snd_page_alloc ptp
mei_me snd_timer pps_core mei thinkpad_acpi snd soundcore rfkill shpchp
uinput nouveau i2c_algo_bit firewire_ohci drm_kms_helper firewire_core
crc_itu_t ttm drm i2c_core mxm_wmi video wmi
Nov 29 23:17:52 localhost kernel: [50532.652706] CPU: 0 PID: 45 Comm:
irq/9-acpi Tainted: G D 3.10.20-200.rt17.1.fc19.ccrma.x86_64.rt #1
Nov 29 23:17:52 localhost kernel: [50532.652707] Hardware name: LENOVO
4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Nov 29 23:17:52 localhost kernel: [50532.652708] task: ffff880229bc8000
ti: ffff880229bac000 task.ti: ffff880229bac000
Nov 29 23:17:52 localhost kernel: [50532.652713] RIP:
0010:[<ffffffff81086900>] [<ffffffff81086900>] kthread_data+0x10/0x20
Nov 29 23:17:52 localhost kernel: [50532.652713] RSP:
0018:ffff880229bad9e8 EFLAGS: 00010202
Nov 29 23:17:52 localhost kernel: [50532.652714] RAX: 0000000000000000
RBX: 0000000000000000 RCX: 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.652715] RDX: ffff880229bade90
RSI: 0000000000000000 RDI: ffff880229bc8000
Nov 29 23:17:52 localhost kernel: [50532.652716] RBP: ffff880229bad9e8
R08: 0000000000000000 R09: 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.652717] R10: 0000000040000004
R11: 0000000000000000 R12: ffff880229bc8000
Nov 29 23:17:52 localhost kernel: [50532.652718] R13: ffff880229bc85a8
R14: ffff880229bc8000 R15: ffff880229bc8000
Nov 29 23:17:52 localhost kernel: [50532.652720] FS:
0000000000000000(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.652720] CS: 0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Nov 29 23:17:52 localhost kernel: [50532.652721] CR2: ffffffffffffffd8
CR3: 00000001e8198000 CR4: 00000000000007f0
Nov 29 23:17:52 localhost kernel: [50532.652723] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.652723] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 29 23:17:52 localhost kernel: [50532.652724] Stack:
Nov 29 23:17:52 localhost kernel: [50532.652727] ffff880229bada10
ffffffff810f4799 0000000000000070 0000000000000000
Nov 29 23:17:52 localhost kernel: [50532.652729] ffffffff81f441a0
ffff880229bada40 ffffffff81082c0f 0000000000000001
Nov 29 23:17:52 localhost kernel: [50532.652731] 000000000000002d
0000000000000246 0000000000000000 ffff880229badac8
Nov 29 23:17:52 localhost kernel: [50532.652732] Call Trace:
Nov 29 23:17:52 localhost kernel: [50532.652741] [<ffffffff810f4799>]
irq_thread_dtor+0x29/0xc0
Nov 29 23:17:52 localhost kernel: [50532.652749] [<ffffffff81082c0f>]
task_work_run+0x9f/0xe0
Nov 29 23:17:52 localhost kernel: [50532.652753] [<ffffffff810656f6>]
do_exit+0x2c6/0xad0
Nov 29 23:17:52 localhost kernel: [50532.652759] [<ffffffff81650935>] ?
printk+0x67/0x69
Nov 29 23:17:52 localhost kernel: [50532.652766] [<ffffffff81062dc1>] ?
kmsg_dump+0xc1/0xd0
Nov 29 23:17:52 localhost kernel: [50532.652771] [<ffffffff8165c1e1>]
oops_end+0xa1/0xe0
Nov 29 23:17:52 localhost kernel: [50532.652775] [<ffffffff8165026a>]
no_context+0x263/0x270
Nov 29 23:17:52 localhost kernel: [50532.652779] [<ffffffff816502ea>]
__bad_area_nosemaphore+0x73/0x1cc
Nov 29 23:17:52 localhost kernel: [50532.652781] [<ffffffff81650456>]
bad_area_nosemaphore+0x13/0x15
Nov 29 23:17:52 localhost kernel: [50532.652786] [<ffffffff8165e5c4>]
__do_page_fault+0xf4/0x5c0
Nov 29 23:17:52 localhost kernel: [50532.652793] [<ffffffff810135ae>] ?
__switch_to+0x13e/0x4f0
Nov 29 23:17:52 localhost kernel: [50532.652800] [<ffffffff81091569>] ?
finish_task_switch+0x49/0xf0
Nov 29 23:17:52 localhost kernel: [50532.652804] [<ffffffff81659368>] ?
__schedule+0x2e8/0x700
Nov 29 23:17:52 localhost kernel: [50532.652809] [<ffffffff8165ea9e>]
do_page_fault+0xe/0x10
Nov 29 23:17:52 localhost kernel: [50532.652814] [<ffffffff8165b658>]
page_fault+0x28/0x30
Nov 29 23:17:52 localhost kernel: [50532.652819] [<ffffffff81361e9a>] ?
advance_transaction+0x60/0x121
Nov 29 23:17:52 localhost kernel: [50532.652821] [<ffffffff8136258e>]
acpi_ec_gpe_handler+0x48/0xc9
Nov 29 23:17:52 localhost kernel: [50532.652826] [<ffffffff8136c3a3>]
acpi_ev_gpe_dispatch+0xb6/0x126
Nov 29 23:17:52 localhost kernel: [50532.652828] [<ffffffff8136c4d3>]
acpi_ev_gpe_detect+0xc0/0x111
Nov 29 23:17:52 localhost kernel: [50532.652830] [<ffffffff810f46b0>] ?
irq_thread_fn+0x50/0x50
Nov 29 23:17:52 localhost kernel: [50532.652832] [<ffffffff8136e3cf>]
acpi_ev_sci_xrupt_handler+0x1f/0x25
Nov 29 23:17:52 localhost kernel: [50532.652835] [<ffffffff8135b12f>]
acpi_irq+0x16/0x31
Nov 29 23:17:52 localhost kernel: [50532.652837] [<ffffffff810f46d3>]
irq_forced_thread_fn+0x23/0x70
Nov 29 23:17:52 localhost kernel: [50532.652840] [<ffffffff810f4c7f>]
irq_thread+0x10f/0x150
Nov 29 23:17:52 localhost kernel: [50532.652842] [<ffffffff810f4770>] ?
wake_threads_waitq+0x50/0x50
Nov 29 23:17:52 localhost kernel: [50532.652844] [<ffffffff810f4b70>] ?
irq_thread_check_affinity+0x90/0x90
Nov 29 23:17:52 localhost kernel: [50532.652846] [<ffffffff810865e2>]
kthread+0xb2/0xc0
Nov 29 23:17:52 localhost kernel: [50532.652848] [<ffffffff81086530>] ?
kthread_worker_fn+0x180/0x180
Nov 29 23:17:52 localhost kernel: [50532.652852] [<ffffffff816628ec>]
ret_from_fork+0x7c/0xb0
Nov 29 23:17:52 localhost kernel: [50532.652854] [<ffffffff81086530>] ?
kthread_worker_fn+0x180/0x180
Nov 29 23:17:52 localhost kernel: [50532.652871] Code: 00 48 89 e5 5d 48
8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
66 90 48 8b 87 d8 02 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 66 66 66 90
Nov 29 23:17:52 localhost kernel: [50532.652873] RIP
[<ffffffff81086900>] kthread_data+0x10/0x20
Nov 29 23:17:52 localhost kernel: [50532.652874] RSP <ffff880229bad9e8>
Nov 29 23:17:52 localhost kernel: [50532.652875] CR2: ffffffffffffffd8
Nov 29 23:17:52 localhost kernel: [50532.746096] ---[ end trace
0000000000000003 ]---
Nov 29 23:17:52 localhost kernel: [50532.746098] Fixing recursive fault
but reboot is needed!
--------
* Re: 3.10.20-rt17, BUG and Oops
2013-11-30 7:56 3.10.20-rt17, BUG and Oops Fernando Lopez-Lezcano
@ 2013-11-30 16:40 ` Carsten Emde
2013-11-30 20:39 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 11+ messages in thread
From: Carsten Emde @ 2013-11-30 16:40 UTC
To: Fernando Lopez-Lezcano, linux-rt-users; +Cc: Steven Rostedt
On 11/30/2013 08:56 AM, Fernando Lopez-Lezcano wrote:
> Just got this on 3.10.20-rt17, ThinkPad T510 running Fedora 19 (I think
> it has happened a few times before). The machine is not completely dead,
> the mouse pointer moves around but otherwise display updates and
> keyboard response are nil.
Same behavior and same manufacturer here, but a newer kernel (3.12.0-rt2).
This is the second crash of its kind. About four days after this fault,
the system stopped working and needed a reboot - as predicted. This
crash already made it into the OSADL Hall of Shame
(https://www.osadl.org/?id=1817&system=r1s6) where more details are
available.
While I can't remember crashes in advance_transaction(), calls to
acpi_ev_gpe_detect() and friends have already appeared a couple of times
in crash dumps. It is probably important to note that the problem is
ACPI-related and occurred on two different systems from the same board
manufacturer (and probably the same BIOS/ACPI supplier).
-Carsten.
----------
# addr2line -e vmlinux 0xffffffff81298301
/usr/src/kernels/linux-3.12.0-rt2/drivers/acpi/ec.c:186
if (t->wlen > t->wi) {
if ((status & ACPI_EC_FLAG_IBF) == 0)
acpi_ec_write_data(ec,
----> t->wdata[t->wi++]);
else
goto err;
----------
[805457.455978] BUG: unable to handle kernel paging request at
000000000000809b
[805457.455989] IP: [<ffffffff81298301>] advance_transaction+0x5b/0x12e
[805457.455994] PGD 72551067 PUD 79d02067 PMD 0
[805457.455998] Oops: 0000 [#1] PREEMPT SMP
[805457.456049] Modules linked in: nfsv4 nfs fscache lockd cpufreq_stats
eeprom rpcsec_gss_krb5 auth_rpcgss bnep oid_registry bluetooth sunrpc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
coretemp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel
snd_hda_codec arc4 snd_hwdep rtl8192ce snd_seq rtl8192c_common
snd_seq_device rtl_pci rtlwifi mac80211 snd_pcm iTCO_wdt
iTCO_vendor_support cfg80211 snd_timer snd lpc_ich microcode soundcore
mfd_core pcspkr i2c_i801 r8169 serio_raw mii rfkill snd_page_alloc ipv6
autofs4 radeon ttm drm_kms_helper drm i2c_algo_bit [last unloaded:
speedstep_lib]
[805457.456055] CPU: 1 PID: 45 Comm: irq/9-acpi Not tainted 3.12.0-rt2 #9
[805457.456057] Hardware name: LENOVO 10087&3110/Tiger Hill, BIOS
E6KT11AUS 11/17/2011
[805457.456060] task: ffff88007cb59800 ti: ffff88007c41c000 task.ti:
ffff88007c41c000
[805457.456067] RIP: 0010:[<ffffffff81298301>] [<ffffffff81298301>]
advance_transaction+0x5b/0x12e
[805457.456069] RSP: 0018:ffff88007c41dce8 EFLAGS: 00010202
[805457.456071] RAX: 000000000000007c RBX: ffff88007bc27d98 RCX:
0000000000008020
[805457.456074] RDX: 000000000000007b RSI: 0000000000000009 RDI:
ffff88007cb7a0b0
[805457.456076] RBP: ffff88007c41dd18 R08: 00000000000004d0 R09:
000000000000042e
[805457.456078] R10: ffff88007c41c000 R11: ffff88007c41c000 R12:
ffff88007cb7a000
[805457.456081] R13: 0000000000000011 R14: 0000000000000009 R15:
ffff88007cb7a0b0
[805457.456084] FS: 0000000000000000(0000) GS:ffff88007f680000(0000)
knlGS:0000000000000000
[805457.456086] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[805457.456088] CR2: 000000000000809b CR3: 000000007985a000 CR4:
00000000000007e0
[805457.456090] Stack:
[805457.456097] ffff88007ca8c909 ffff88007cb7a000 ffff88007ca8c909
0000000000000011
[805457.456102] 0000000000000002 0000000000000000 ffff88007c41dd38
ffffffff81298ed8
[805457.456108] 0000000000000001 ffff88007ca8c998 ffff88007c41dd68
ffffffff812a1e11
[805457.456109] Call Trace:
[805457.456116] [<ffffffff81298ed8>] acpi_ec_gpe_handler+0x47/0xc6
[805457.456122] [<ffffffff812a1e11>] acpi_ev_gpe_dispatch+0xc5/0x13e
[805457.456127] [<ffffffff812a1f3f>] acpi_ev_gpe_detect+0xb5/0x10a
[805457.456135] [<ffffffff81077e15>] ? irq_thread_fn+0x41/0x41
[805457.456140] [<ffffffff812a3e46>] acpi_ev_sci_xrupt_handler+0x22/0x2b
[805457.456146] [<ffffffff81290f92>] acpi_irq+0x16/0x31
[805457.456152] [<ffffffff81077e39>] irq_forced_thread_fn+0x24/0x5a
[805457.456159] [<ffffffff81077ba7>] irq_thread+0x8c/0x174
[805457.456166] [<ffffffff81077d2e>] ?
irq_finalize_oneshot.part.5+0x9f/0x9f
[805457.456173] [<ffffffff81077b1b>] ? wake_threads_waitq+0x44/0x44
[805457.456179] [<ffffffff81077b1b>] ? wake_threads_waitq+0x44/0x44
[805457.456186] [<ffffffff81051c71>] kthread+0x8d/0x95
[805457.456192] [<ffffffff81051be4>] ?
rcu_read_unlock_sched_notrace+0x44/0x44
[805457.456199] [<ffffffff814d327c>] ret_from_fork+0x7c/0xb0
[805457.456205] [<ffffffff81051be4>] ?
rcu_read_unlock_sched_notrace+0x44/0x44
[805457.456264] Code: ff e8 36 59 23 00 48 85 db 0f 84 d6 00 00 00 8a 53
15 8a 43 13 38 c2 76 49 41 f6 c6 02 0f 85 a1 00 00 00 48 8b 0b 0f b6 d0
ff c0 <44> 8a 2c 11 88 43 13 f6 05 bb 5b 80 00 04 74 19 41 0f b6 d5 48
[805457.456269] RIP [<ffffffff81298301>] advance_transaction+0x5b/0x12e
[805457.456270] RSP <ffff88007c41dce8>
[805457.456272] CR2: 000000000000809b
[805458.068444] ---[ end trace 0000000000000002 ]---
[805458.068475] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[805458.068484] IP: [<ffffffff81051f6e>] kthread_data+0x10/0x16
[805458.068488] PGD 1a0f067 PUD 1a11067 PMD 0
[805458.068491] Oops: 0000 [#2] PREEMPT SMP
[805458.068542] Modules linked in: nfsv4 nfs fscache lockd cpufreq_stats
eeprom rpcsec_gss_krb5 auth_rpcgss bnep oid_registry bluetooth sunrpc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
coretemp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel
snd_hda_codec arc4 snd_hwdep rtl8192ce snd_seq rtl8192c_common
snd_seq_device rtl_pci rtlwifi mac80211 snd_pcm iTCO_wdt
iTCO_vendor_support cfg80211 snd_timer snd lpc_ich microcode soundcore
mfd_core pcspkr i2c_i801 r8169 serio_raw mii rfkill snd_page_alloc ipv6
autofs4 radeon ttm drm_kms_helper drm i2c_algo_bit [last unloaded:
speedstep_lib]
[805458.068546] CPU: 1 PID: 45 Comm: irq/9-acpi Tainted: G D
3.12.0-rt2 #9
[805458.068548] Hardware name: LENOVO 10087&3110/Tiger Hill, BIOS
E6KT11AUS 11/17/2011
[805458.068550] task: ffff88007cb59800 ti: ffff88007c41c000 task.ti:
ffff88007c41c000
[805458.068556] RIP: 0010:[<ffffffff81051f6e>] [<ffffffff81051f6e>]
kthread_data+0x10/0x16
[805458.068558] RSP: 0018:ffff88007c41d928 EFLAGS: 00010202
[805458.068560] RAX: 0000000000000000 RBX: ffff88007cb59800 RCX:
0000000000000000
[805458.068561] RDX: ffff88007c41de70 RSI: 0000000000000000 RDI:
ffff88007cb59800
[805458.068563] RBP: ffff88007c41d928 R08: 0000000000000000 R09:
0000000000000001
[805458.068565] R10: 0000000000000001 R11: ffff88007cb59800 R12:
ffff88007cb59800
[805458.068566] R13: ffffffff81c51f20 R14: 0000000000000246 R15:
0000000000000001
[805458.068569] FS: 0000000000000000(0000) GS:ffff88007f680000(0000)
knlGS:0000000000000000
[805458.068571] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[805458.068572] CR2: ffffffffffffffd8 CR3: 000000007985a000 CR4:
00000000000007e0
[805458.068574] Stack:
[805458.068579] ffff88007c41d948 ffffffff81077d76 ffff88007cb59800
0000000000000000
[805458.068583] ffff88007c41d978 ffffffff8104efe9 ffff88007cb59800
ffff88007cb59800
[805458.068587] 0000000000000000 ffff88007c41dc38 ffff88007c41d9f8
ffffffff81036fa0
[805458.068588] Call Trace:
[805458.068594] [<ffffffff81077d76>] irq_thread_dtor+0x48/0xa6
[805458.068599] [<ffffffff8104efe9>] task_work_run+0x83/0x9a
[805458.068604] [<ffffffff81036fa0>] do_exit+0x3f0/0x969
[805458.068609] [<ffffffff81076127>] ? kmsg_dump+0xac/0xb3
[805458.068613] [<ffffffff814cf03c>] oops_end+0xba/0xc2
[805458.068619] [<ffffffff814c5ae1>] no_context+0x1f9/0x208
[805458.068624] [<ffffffff81008d40>] ? __cycles_2_ns+0xe/0x4c
[805458.068629] [<ffffffff814c5cc5>] __bad_area_nosemaphore+0x1d5/0x1f6
[805458.068634] [<ffffffff8105d4c4>] ? get_parent_ip+0xe/0x3e
[805458.068638] [<ffffffff814ce4d9>] ? restore_args+0x30/0x30
[805458.068643] [<ffffffff814c5cf9>] bad_area_nosemaphore+0x13/0x15
[805458.068647] [<ffffffff814d110d>] __do_page_fault+0x370/0x3b0
[805458.068651] [<ffffffff810b8fb4>] ? buffer_ftrace_now+0x39/0x45
[805458.068656] [<ffffffff810016ee>] ? __switch_to+0x1f9/0x409
[805458.068660] [<ffffffff814cdeb7>] ? _raw_spin_unlock_irq+0x47/0x54
[805458.068665] [<ffffffff8105c1d2>] ? finish_task_switch+0x8c/0xdb
[805458.068669] [<ffffffff8105c585>] ? need_resched+0x38/0x44
[805458.068673] [<ffffffff814cca3f>] ? __schedule+0x614/0x62b
[805458.068677] [<ffffffff8105d4c4>] ? get_parent_ip+0xe/0x3e
[805458.068681] [<ffffffff814d115b>] do_page_fault+0xe/0x10
[805458.068685] [<ffffffff814ce6c2>] page_fault+0x22/0x30
[805458.068691] [<ffffffff81298301>] ? advance_transaction+0x5b/0x12e
[805458.068696] [<ffffffff812982dc>] ? advance_transaction+0x36/0x12e
[805458.068703] [<ffffffff81298ed8>] acpi_ec_gpe_handler+0x47/0xc6
[805458.068710] [<ffffffff812a1e11>] acpi_ev_gpe_dispatch+0xc5/0x13e
[805458.068716] [<ffffffff812a1f3f>] acpi_ev_gpe_detect+0xb5/0x10a
[805458.068724] [<ffffffff81077e15>] ? irq_thread_fn+0x41/0x41
[805458.068728] [<ffffffff812a3e46>] acpi_ev_sci_xrupt_handler+0x22/0x2b
[805458.068732] [<ffffffff81290f92>] acpi_irq+0x16/0x31
[805458.068737] [<ffffffff81077e39>] irq_forced_thread_fn+0x24/0x5a
[805458.068741] [<ffffffff81077ba7>] irq_thread+0x8c/0x174
[805458.068746] [<ffffffff81077d2e>] ?
irq_finalize_oneshot.part.5+0x9f/0x9f
[805458.068750] [<ffffffff81077b1b>] ? wake_threads_waitq+0x44/0x44
[805458.068754] [<ffffffff81077b1b>] ? wake_threads_waitq+0x44/0x44
[805458.068758] [<ffffffff81051c71>] kthread+0x8d/0x95
[805458.068764] [<ffffffff81051be4>] ?
rcu_read_unlock_sched_notrace+0x44/0x44
[805458.068768] [<ffffffff814d327c>] ret_from_fork+0x7c/0xb0
[805458.068773] [<ffffffff81051be4>] ?
rcu_read_unlock_sched_notrace+0x44/0x44
[805458.068820] Code: 40 98 00 00 48 8b 80 c8 03 00 00 48 89 e5 5d 48 8b
40 c8 48 c1 e8 02 83 e0 01 c3 0f 1f 44 00 00 48 8b 87 c8 03 00 00 55 48
89 e5 <48> 8b 40 d8 5d c3 0f 1f 44 00 00 55 ba 08 00 00 00 48 89 e5 48
[805458.068824] RIP [<ffffffff81051f6e>] kthread_data+0x10/0x16
[805458.068825] RSP <ffff88007c41d928>
[805458.068826] CR2: ffffffffffffffd8
[805458.912113] ---[ end trace 0000000000000003 ]---
[805458.912115] Fixing recursive fault but reboot is needed!
* Re: 3.10.20-rt17, BUG and Oops
2013-11-30 16:40 ` Carsten Emde
@ 2013-11-30 20:39 ` Sebastian Andrzej Siewior
2013-11-30 22:47 ` Carsten Emde
0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-11-30 20:39 UTC
To: Fernando Lopez-Lezcano, Carsten Emde; +Cc: linux-rt-users, Steven Rostedt
* Carsten Emde | 2013-11-30 17:40:49 [+0100]:
># addr2line -e vmlinux 0xffffffff81298301
>/usr/src/kernels/linux-3.12.0-rt2/drivers/acpi/ec.c:186
>
> if (t->wlen > t->wi) {
> if ((status & ACPI_EC_FLAG_IBF) == 0)
> acpi_ec_write_data(ec,
>----> t->wdata[t->wi++]);
> else
> goto err;
Based on the assembly, I *think* this is
t->wdata[X]
where X is outside of wdata's range. But then the pointer is almost
NULL. Could one of you check with the ACPI folks whether they know any
tricks for debugging this ACPI thingy?
Is this any help?
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index a06d983..d3add07 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -175,16 +175,19 @@ static void start_transaction(struct acpi_ec *ec)
 static void advance_transaction(struct acpi_ec *ec, u8 status)
 {
 	unsigned long flags;
-	struct transaction *t = ec->curr;
+	struct transaction *t;
 
 	spin_lock_irqsave(&ec->lock, flags);
+	t = ec->curr;
 	if (!t)
 		goto unlock;
 	if (t->wlen > t->wi) {
-		if ((status & ACPI_EC_FLAG_IBF) == 0)
+		if ((status & ACPI_EC_FLAG_IBF) == 0) {
+			if (WARN_ON_ONCE(!t->wdata))
+				goto err;
 			acpi_ec_write_data(ec,
 				t->wdata[t->wi++]);
-		else
+		} else
 			goto err;
 	} else if (t->rlen > t->ri) {
 		if ((status & ACPI_EC_FLAG_OBF) == 1) {
Sebastian
* Re: 3.10.20-rt17, BUG and Oops
2013-11-30 20:39 ` Sebastian Andrzej Siewior
@ 2013-11-30 22:47 ` Carsten Emde
2013-12-02 8:27 ` Sebastian Andrzej Siewior
2013-12-15 14:53 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 11+ messages in thread
From: Carsten Emde @ 2013-11-30 22:47 UTC
To: Sebastian Andrzej Siewior, Fernando Lopez-Lezcano
Cc: linux-rt-users, Steven Rostedt
Sebastian,
>> # addr2line -e vmlinux 0xffffffff81298301
>> /usr/src/kernels/linux-3.12.0-rt2/drivers/acpi/ec.c:186
>>
>> if (t->wlen > t->wi) {
>> if ((status & ACPI_EC_FLAG_IBF) == 0)
>> acpi_ec_write_data(ec,
>> ----> t->wdata[t->wi++]);
>> else
>> goto err;
>
> based on the assembly, I *think* this is
> t->wdata[x]
>
> where X is outside of wdata's range. But then the pointer is almost
> NULL.
Note the offending addresses of the two crashes:
2013-11-26-23.28
unable to handle kernel paging request at 000000000000809b
2013-11-12-08.15
unable to handle kernel NULL pointer dereference at 000000000000007a
It looks like the write data pointer t->wdata was overwritten: in the
first case with 0x8000 and in the second case with 0.
> Is this any help?
>
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index a06d983..d3add07 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -175,16 +175,19 @@ static void start_transaction(struct acpi_ec *ec)
> static void advance_transaction(struct acpi_ec *ec, u8 status)
> {
> unsigned long flags;
> - struct transaction *t = ec->curr;
> + struct transaction *t;
>
> spin_lock_irqsave(&ec->lock, flags);
> + t = ec->curr;
Looks like a race - did you find a place where ec->curr->wdata could be
overwritten? The small size of the potential race window may explain why
it took a couple of days to trigger it.
Will apply the fix and the warning - let's see.
Thanks.
-Carsten.
* Re: 3.10.20-rt17, BUG and Oops
2013-11-30 22:47 ` Carsten Emde
@ 2013-12-02 8:27 ` Sebastian Andrzej Siewior
2013-12-15 14:53 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-12-02 8:27 UTC
To: Carsten Emde; +Cc: Fernando Lopez-Lezcano, linux-rt-users, Steven Rostedt
On 11/30/2013 11:47 PM, Carsten Emde wrote:
> Sebastian,
Hi Carsten,
>> index a06d983..d3add07 100644
>> --- a/drivers/acpi/ec.c
>> +++ b/drivers/acpi/ec.c
>> @@ -175,16 +175,19 @@ static void start_transaction(struct acpi_ec *ec)
>> static void advance_transaction(struct acpi_ec *ec, u8 status)
>> {
>> unsigned long flags;
>
>> - struct transaction *t = ec->curr;
>> + struct transaction *t;
>>
>> spin_lock_irqsave(&ec->lock, flags);
>> + t = ec->curr;
> Looks like a race - did you find a place where ec->curr->wdata could be
> overwritten? The small size of the potential race window may explain why
> it took a couple of days to trigger it.
The pointer (ec->curr) is assigned under the lock, but here it is read
before the lock is taken. That looks racy, and this might be the bug.
> Will apply the fix and the warning - let's see.
>
> Thanks.
> -Carsten.
Sebastian
* Re: 3.10.20-rt17, BUG and Oops
2013-11-30 22:47 ` Carsten Emde
2013-12-02 8:27 ` Sebastian Andrzej Siewior
@ 2013-12-15 14:53 ` Sebastian Andrzej Siewior
2013-12-15 23:50 ` Carsten Emde
2013-12-17 19:40 ` Fernando Lopez-Lezcano
1 sibling, 2 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-12-15 14:53 UTC
To: Carsten Emde; +Cc: Fernando Lopez-Lezcano, linux-rt-users, Steven Rostedt
* Carsten Emde | 2013-11-30 23:47:25 [+0100]:
>Will apply the fix and the warning - let's see.
I do not want to rush anyone; this is just a gentle ping to satisfy my
curiosity about the patch.
Fernando, any updates from you?
>
>Thanks.
> -Carsten.
Sebastian
* Re: 3.10.20-rt17, BUG and Oops
2013-12-15 14:53 ` Sebastian Andrzej Siewior
@ 2013-12-15 23:50 ` Carsten Emde
2013-12-16 8:09 ` Sebastian Andrzej Siewior
2013-12-17 19:40 ` Fernando Lopez-Lezcano
1 sibling, 1 reply; 11+ messages in thread
From: Carsten Emde @ 2013-12-15 23:50 UTC
To: Sebastian Andrzej Siewior
Cc: Fernando Lopez-Lezcano, linux-rt-users, Steven Rostedt
On 12/15/2013 03:53 PM, Sebastian Andrzej Siewior wrote:
> * Carsten Emde | 2013-11-30 23:47:25 [+0100]:
>> Will apply the fix and the warning - let's see.
> I do not want to rush anyone; this is just a gentle ping to satisfy my
> curiosity about the patch.
I have applied the patch to a number of systems, including the one that
regularly crashed after less than a week. No system has crashed so far,
and the uptime of the one that I initially monitored for the crashes is
now 15 days. Please allow for another week of monitoring. If all patched
systems are still up and running by next weekend, I will submit the
patch along with a description and a short history of when and how the
race made it into the code.
-Carsten.
* Re: 3.10.20-rt17, BUG and Oops
2013-12-15 23:50 ` Carsten Emde
@ 2013-12-16 8:09 ` Sebastian Andrzej Siewior
2013-12-21 21:23 ` Carsten Emde
0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-12-16 8:09 UTC
To: Carsten Emde; +Cc: Fernando Lopez-Lezcano, linux-rt-users, Steven Rostedt
On 12/16/2013 12:50 AM, Carsten Emde wrote:
> On 12/15/2013 03:53 PM, Sebastian Andrzej Siewior wrote:
>> * Carsten Emde | 2013-11-30 23:47:25 [+0100]:
>>> Will apply the fix and the warning - let's see.
>> I do not want to rush anyone; this is just a gentle ping to satisfy my
>> curiosity about the patch.
> I have applied the patch to a number of systems, including the one that
> regularly crashed after less than a week. No system has crashed so far,
> and the uptime of the one that I initially monitored for the crashes is
> now 15 days. Please allow for another week of monitoring. If all patched
> systems are still up and running by next weekend, I will submit the
> patch along with a description and a short history of when and how the
> race made it into the code.
Sounds great. Thanks.
>
> -Carsten.
Sebastian
* Re: 3.10.20-rt17, BUG and Oops
From: Fernando Lopez-Lezcano @ 2013-12-17 19:40 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Carsten Emde
Cc: nando, linux-rt-users, Steven Rostedt
On 12/15/2013 06:53 AM, Sebastian Andrzej Siewior wrote:
> * Carsten Emde | 2013-11-30 23:47:25 [+0100]:
>
>> Will apply the fix and the warning - let's see.
>
> I do not want to urge anyone but just gentle ping to satisfy my
> curiosity about the patch.
> Fernando, any updates from you?
Hi, I have not seen this problem again, but I have not had a chance to
stress the system. Usually this would pop up when running jackd + audio
applications for some time. But so far so good...
-- Fernando
* Re: 3.10.20-rt17, BUG and Oops
From: Sebastian Andrzej Siewior @ 2013-12-17 19:42 UTC (permalink / raw)
To: Fernando Lopez-Lezcano; +Cc: Carsten Emde, linux-rt-users, Steven Rostedt
On 12/17/2013 08:40 PM, Fernando Lopez-Lezcano wrote:
>> Fernando, any updates from you?
>
> Hi, I have not seen this problem again, but I have not had a chance to
> stress the system. Usually this would pop up when running jackd + audio
> applications for some time. But so far so good...
Okay, but you are using the patch I posted, correct? And without the
patch you were able to reproduce it more than once?
>
> -- Fernando
Sebastian
* Re: 3.10.20-rt17, BUG and Oops
From: Carsten Emde @ 2013-12-21 21:23 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Fernando Lopez-Lezcano, linux-rt-users, Steven Rostedt
On 12/16/2013 09:09 AM, Sebastian Andrzej Siewior wrote:
> On 12/16/2013 12:50 AM, Carsten Emde wrote:
>> On 12/15/2013 03:53 PM, Sebastian Andrzej Siewior wrote:
>>> * Carsten Emde | 2013-11-30 23:47:25 [+0100]:
>>>> Will apply the fix and the warning - let's see.
>>> I do not want to urge anyone but just gentle ping to satisfy my
>>> curiosity about the patch.
>> I have applied the patch to a number of systems, including the one that
>> regularly crashed in less than a week. No system has crashed so far, and
>> the uptime of the one that I initially monitored for the crashes is now
>> 15 days. Please allow for another week of monitoring. If by next weekend
>> all patched systems are still up and running, I will submit the patch
>> along with a description and a short history of when and how the race
>> made it into the code.
> Sounds great. Thanks.
Uptime is now 20 days, with no further problems.
I was about to write down the whole story and send the patch to LKML
when I saw that the patch had already made it into vanilla 3.12.5 and,
thus, into RT. It is marked for 3.8+ stable, so all is good now.
The mainline patch is:
ACPI / EC: Ensure lock is acquired before accessing ec struct members,
commit 36b15875a7819a2ec4cb5748ff7096ad7bd86cbb.
-Carsten.
Thread overview: 11+ messages
2013-11-30 7:56 3.10.20-rt17, BUG and Oops Fernando Lopez-Lezcano
2013-11-30 16:40 ` Carsten Emde
2013-11-30 20:39 ` Sebastian Andrzej Siewior
2013-11-30 22:47 ` Carsten Emde
2013-12-02 8:27 ` Sebastian Andrzej Siewior
2013-12-15 14:53 ` Sebastian Andrzej Siewior
2013-12-15 23:50 ` Carsten Emde
2013-12-16 8:09 ` Sebastian Andrzej Siewior
2013-12-21 21:23 ` Carsten Emde
2013-12-17 19:40 ` Fernando Lopez-Lezcano
2013-12-17 19:42 ` Sebastian Andrzej Siewior