Date: Sun, 7 May 2017 01:05:28 +0200
From: Adrien Mahieux
To: linux-kernel@vger.kernel.org
Subject: [RFC] NMI: Generic per-code NMI handler (panic/kdump)
Message-ID: <20170506230315.GA5553@gmail.com>

Hello,

I'm new to the LKML, so if I make mistakes, please tell me, along with
the correct way to do things (or the doc I should have read but forgot).

I've written a small module to manage NMI events based on their code, so
a sysadmin can either drop them (avoiding console messages) or panic the
kernel (triggering kdump).

https://github.com/Saruspete/nmimgr/blob/master/nmimgr.c

So far it works as expected in large-scale production, on different
kernels.

As a newbie, I have some questions I could not find answers to in the
docs:

- My code supports multiple kernel versions through the KERNEL_VERSION
  macro, but I read that this is not recommended and that code should
  simply compile against master's HEAD. May I leave it as is, to ease
  the work of distribution maintainers?

- Which subsystem/file should it go in?
    arch/x86/kernel/nmi.c (but it should be for all archs)
    kernel/watchdog.c     (but it is not a watchdog)
    drivers/char/ipmi     (but it is neither IPMI nor a driver)

- How do I know where to place its Kconfig menus? It's easy for drivers,
  but what about this one?

- If someone has time to review the code and point out cases I didn't
  think of, I would be happy to fix them.

Here are some real-life uses of this module:

- When my servers are frozen, I generate an NMI with IPMI "power diag".
  But the event code differs between hardware vendors (and even between
  generations from the same vendor). I also have specific hardware (like
  FPGAs) that generates NMIs, and near-dead parts that generate some
  too, so I can't use the *nmi_panic sysctls.

- The hpwdt module, when loaded, registers the equivalent of a panic on
  any NMI event. I still want the watchdog, but only for ILO and ASR
  NMIs, not for all the others.

- During a kdump, some servers may take a long time to dump memory. If
  the server receives another NMI in the meantime, it will reboot and
  lose the current dump. By dropping all NMIs, the module acts as a
  fence during the kdump.

To make it easier to use, I've added a "setup.sh" to the repo that
builds and configures the kmod with the NMI events matching the current
hardware (HP, Dell, IBM, VirtualBox...).

Thanks for your guidance.

Adrien.
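
For readers who prefer code, here is a minimal userspace sketch of the per-code policy described in the use cases above. It is not the code from nmimgr.c: the names (nmi_rule, nmi_lookup, in_kdump), the three actions, and the reason-code values 0x2c and 0x3c are all hypothetical and chosen only for illustration. The in_kdump flag is likewise an assumption about how the "fence" behaviour could be modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical actions a sysadmin could attach to an NMI reason code. */
enum nmi_action {
    NMI_ACT_IGNORE, /* no rule matched: leave the event to other handlers */
    NMI_ACT_DROP,   /* swallow the event silently (no console message)   */
    NMI_ACT_PANIC   /* panic the kernel so that kdump can capture a dump */
};

/* One rule: "when this reason code is seen, take this action". */
struct nmi_rule {
    unsigned char code;
    enum nmi_action action;
};

/* Example table; the codes are made up, real ones depend on the hardware. */
static const struct nmi_rule example_rules[] = {
    { 0x2c, NMI_ACT_PANIC }, /* e.g. the code sent by IPMI "power diag" */
    { 0x3c, NMI_ACT_DROP  }, /* e.g. a noisy device we want to silence  */
};
#define EXAMPLE_RULES_LEN (sizeof(example_rules) / sizeof(example_rules[0]))

/*
 * Decide what to do with an NMI carrying the given reason code.
 * When in_kdump is non-zero, every NMI is dropped so that a dump in
 * progress is not interrupted (the "fence" behaviour).
 */
static enum nmi_action nmi_lookup(const struct nmi_rule *rules, size_t n,
                                  unsigned char code, int in_kdump)
{
    size_t i;

    assert(rules != NULL || n == 0);

    if (in_kdump)
        return NMI_ACT_DROP;

    for (i = 0; i < n; i++) {
        if (rules[i].code == code)
            return rules[i].action;
    }
    return NMI_ACT_IGNORE;
}
```

With this table, nmi_lookup(example_rules, EXAMPLE_RULES_LEN, 0x2c, 0) selects the panic action, while the same call with in_kdump set to 1 drops the event. A linear scan is used because such a table would hold only a handful of entries per machine.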