public inbox for linux-kernel@vger.kernel.org
From: Thierry Moreau <thierry.moreau@connotech.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Spurious Fatal exception in interrupt, 4.1.43
Date: Mon, 04 Sep 2017 21:41:00 +0000
Message-ID: <59ADC86C.4040307@connotech.com>


Let me report a difficulty with the Linux kernel -- unknown root cause.
Basically, this is an FYI message, unless someone sees fit to
investigate -- thanks in advance.

This is an on-line server with services like DNS, postfix mail, Apache, 
local wifi, IP forwarding. No GUI. Distribution is Crux, which means a 
customized installation (e.g. kernel manually configured).

The system ran for two years without significant problems (except for an
episode of instability in disk access). Then, about two months ago, it
started to crash, and it now does so once or twice a week. It is
difficult to pinpoint any environmental factor from the occurrence
pattern.

Here is the trace I get from the console.

blk_done_softirq+0x73/0x90
__do_softirq+0xd4/0x1e0
irq_exit+0x7e/0xa0
do_IRQ+0x4b/0xe0
common_interrupt+0x6e/0x6e
<EOI>
lapic_next_deadline+0x2b/0x40
cpuidle_enter_state+0x9e/0x150
cpuidle_enter_state+0x94/0x150
cpu_startup_entry+0x221/0x2b0
start_kernel+0x405/0x410
set_init_arg+0x4e/0x4e
early_init_idt_handler_array+0x120/0x120
early_init_idt_handler_array+0x120/0x120
x86_64_start_kernel+0xe5/0xf2
Code: ff e0 0f 1f 80 00 00 00 00 48 8b 5f 50 89 74 24 0c e8 c3 fc ff ff 
8b 74 24 84 00 00 00 00 00 f0 ff 47 44 e9 57 ff
RIP [<ffffffff8137dd62>] bio_endio+0x92/0xa0
RSP <ffff88021ea03dc8>
---[ end trace b293c5209809c889 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt
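
For anyone wishing to look up these entries against a kernel image, here
is a minimal sketch (not part of the console output) that splits each
trace entry into symbol, offset and function size. The names
parse_trace_entry and parse_trace, and the regular expression itself,
are assumptions made for illustration; the pattern is inferred only from
the symbol+0xOFFSET/0xSIZE form of the entries shown above.

```python
import re

# Matches entries of the form symbol+0xOFFSET/0xSIZE, as seen in the trace.
# The pattern is an assumption based on the lines shown above.
TRACE_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_trace_entry(line):
    """Return (symbol, offset, size) for one trace line, or None if it has no entry."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


def parse_trace(text):
    """Parse a whole console trace, skipping lines that carry no entry."""
    entries = []
    for line in text.splitlines():
        entry = parse_trace_entry(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

Assuming a vmlinux with debugging information built from the same
source and configuration is available, the symbol and offset taken from
the RIP line (bio_endio, 0x92) could then be given to gdb, e.g.
"list *(bio_endio+0x92)", to locate the faulting source line.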

I upgraded the kernel from 4.1.3 to 4.1.43 and got exactly the same 
trace after a few days of up-time.

Here are some options I see for the next step:

(A) Upgrade once more to 4.4.x, 4.9.x, or 4.12.x (or 4.13).

(B) Investigate kernel configuration for reliability-impacting options.

(C) Review the system BIOS configuration (e.g. under the hypothesis that 
interrupt processing calls for a special memory/cache access cycle that 
is borderline with the current BIOS configuration).

(D) Remove the wifi service (the only "option" from an operational 
perspective).

(E) Provision a replacement system and remove other services in order to 
isolate environmental factors (likely to confirm that a useless system 
is working fine generally!).

(F) Give up on this hardware (and thus deprive the Linux community
of a possible improvement, should some kernel issue be identified in
troubleshooting ???).

Any suggestion?

- Thierry Moreau
