From: Adrien Mahieux <adrien.mahieux@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: [RFC] NMI: Generic per-code NMI handler (panic/kdump)
Date: Sun, 7 May 2017 01:05:28 +0200
Message-ID: <20170506230315.GA5553@gmail.com>

Hello,


I'm new to the LKML, so if I make mistakes, please point them out along with
the correct way to do things (or the doc I've read but forgotten).

I've written a small module to manage NMI events based on their code, so a
sysadmin can drop them (avoid console messages) or panic the kernel (kdump).
https://github.com/Saruspete/nmimgr/blob/master/nmimgr.c
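
To illustrate the per-code idea, here is a minimal userspace sketch of a
lookup table mapping an NMI reason code to an action. It is not taken from
nmimgr.c: the type and function names are hypothetical, and it assumes the
reason code fits in one byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical actions; the names are illustrative, not from nmimgr.c. */
enum nmi_action {
    NMI_ACT_PASS = 0,  /* leave the event to the next handler */
    NMI_ACT_DROP,      /* swallow the event silently (no console message) */
    NMI_ACT_PANIC      /* panic so that kdump can capture a dump */
};

#define NMI_CODE_COUNT 256  /* assumption: the reason code fits in one byte */

struct nmi_policy {
    enum nmi_action by_code[NMI_CODE_COUNT];
};

/* Start with every code passed through untouched. */
static void nmi_policy_init(struct nmi_policy *p)
{
    size_t i;

    assert(p != NULL);
    for (i = 0; i < NMI_CODE_COUNT; i++)
        p->by_code[i] = NMI_ACT_PASS;
}

/* Bind one reason code to an action. */
static void nmi_policy_set(struct nmi_policy *p, unsigned char code,
                           enum nmi_action act)
{
    assert(p != NULL);
    p->by_code[code] = act;
}

/* Constant-time lookup, suitable for a handler that must not block. */
static enum nmi_action nmi_policy_lookup(const struct nmi_policy *p,
                                         unsigned char code)
{
    assert(p != NULL);
    return p->by_code[code];
}
```

A caller would initialise the table once, bind the codes seen on its own
hardware, and consult nmi_policy_lookup() from the handler. A flat array is
used here because it needs no locking or allocation on the lookup path.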

So far it has been working as expected in large-scale production, with
different kernels.


As a newbie, I have some questions I couldn't find answers to in the docs:

- My code supports multiple kernel versions with the KERNEL_VERSION macro, but
  I read that this is not recommended and that it should just compile against
  master's head. May I leave this as is to ease the distribution maintainers'
  work?

- In what subsystem/file should it go?
  arch/x86/kernel/nmi.c (but should be for all archs)
  kernel/watchdog.c     (but not a watchdog)
  drivers/char/ipmi     (but not an IPMI nor a driver)

- How do I know where to place its Kconfig menus? It's easy for drivers, but
  what about this one?

- If someone has time to review the code and point out cases I didn't think
  of, I would be happy to fix them.
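
For the first question above, the version-gating pattern in question looks
roughly like the sketch below. It is a userspace stand-in, not an excerpt
from nmimgr.c: the enum and function names are hypothetical, the fallback
LINUX_VERSION_CODE is an assumed build target, and the 3.2 threshold for
register_nmi_handler() is stated from memory and should be checked against
the kernel history.

```c
#include <assert.h>

/*
 * Userspace stand-ins so the sketch compiles outside a kernel tree.
 * In a real module both macros come from <linux/version.h>.
 */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 11, 0) /* assumed build target */
#endif

/* Hypothetical names, not taken from nmimgr.c. */
enum nmi_compat_api {
    NMI_COMPAT_DIE_NOTIFIER, /* older kernels: register_die_notifier() */
    NMI_COMPAT_NMI_HANDLER   /* newer kernels: register_nmi_handler() */
};

/* Assumption: register_nmi_handler() is available from 3.2 onwards. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0)
#define NMI_COMPAT_API NMI_COMPAT_NMI_HANDLER
#else
#define NMI_COMPAT_API NMI_COMPAT_DIE_NOTIFIER
#endif

/* Report which registration path was selected at compile time. */
static enum nmi_compat_api nmi_compat_selected(void)
{
    /* Sanity check on the encoding: later releases compare greater. */
    assert(KERNEL_VERSION(3, 2, 0) < KERNEL_VERSION(4, 11, 0));
    return NMI_COMPAT_API;
}
```

An in-tree version would keep only the branch matching the current tree,
which is why such conditionals are usually limited to out-of-tree builds.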



Here are some real-life uses of this module:

- When my servers are frozen, I generate an NMI from IPMI "power diag". But the
  event code changes between hardware vendors (even between generations from
  the same vendor), and I have some specific hardware (like FPGAs) that
  generates NMIs as well, plus near-dead parts that generate some too, so I
  can't use the *nmi_panic sysctls.

- The hpwdt module registers an equivalent of panic upon any NMI event. I
  still want the watchdog, but only upon iLO and ASR NMIs, not all others.

- During a kdump, some servers may take a long time to dump memory. If the
  server receives another NMI, it will reboot and lose the current dump. By
  dropping all NMIs, the module acts as a fence during the kdump.

To ease usage, I've added a "setup.sh" to the repo that builds and configures
the kmod with the NMI events matching the current hardware (HP, Dell, IBM,
VirtualBox...).



Thanks for your guidance.

Adrien.
