From: ebiederm@xmission.com (Eric W. Biederman)
To: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	kexec@lists.infradead.org, davem@davemloft.net,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	ganeshgr@chelsio.com, nirranjan@chelsio.com,
	indranil@chelsio.com
Subject: Re: [RFC 0/2] kernel: add support to collect hardware logs in panic
Date: Fri, 02 Mar 2018 07:22:45 -0600
Message-ID: <87lgfad32y.fsf@xmission.com>
In-Reply-To: <cover.1519911559.git.rahul.lakkireddy@chelsio.com> (Rahul Lakkireddy's message of "Fri, 2 Mar 2018 17:49:56 +0530")

Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On production servers running variety of workloads over time, kernel
> panic can happen sporadically after days or even months. It is
> important to collect as much debug logs as possible to root cause
> and fix the problem, that may not be easy to reproduce. Snapshot of
> underlying hardware/firmware state (like register dump, firmware
> logs, adapter memory, etc.), at the time of kernel panic will be very
> helpful while debugging the culprit device driver.
>
> This series of patches add new generic framework that enable device
> drivers to collect device specific snapshot of the hardware/firmware
> state of the underlying device at the time of kernel panic. The
> collected logs are appended to vmcore along with details, such as
> start address and length of the logs, which are required for
> extraction during post-analysis.
>
> Device drivers can use crash_driver_dump_register() to register their
> callback that collects underlying device specific hardware/firmware
> logs during kernel panic (i.e. before booting into the second kernel).
> Drivers can unregister with crash_driver_dump_unregister().
>
> To extract the device specific hardware/firmware logs using crash:
>
> crash> help -D | grep DRIVERDUMP
> DRIVERDUMP=(cxgb4_0000:02:00.4, ffffb131090bd000, 37782968)
>
> crash> rd ffffb131090bd000 37782968 -r hardware.log
> 37782968 bytes copied from 0xffffb131090bd000 to hardware.log
>
> Patch 1 adds API to allow drivers to register callback to
> collect the device specific hardware/firmware logs.
>
> Patch 2 shows a cxgb4 driver example using the API to collect
> hardware/firmware logs during kernel panic.
>
> Suggestions and feedback will be much appreciated.

I strongly suggest you figure out how to run this code in the
crash recovery kernel before your hardware is initialized.
That will give you a known good kernel to perform your collection from.

Every line of code we add to the kexec on panic code path tends to add
to its fragility and increase the chance you won't get any information
at all.

When the assumption is that something wrong with your driver/hardware
caused the crash, calling into your driver is a very bad idea,
especially when that means running code that does callbacks and all
kinds of other cute things.

Doing this as the crash recovery kernel boots up, before much if any
hardware is initialized, seems like a fine thing to do, and just
needs a little coordination with userspace to ensure the information
gets saved when a vmcore is computed.
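
The ordering suggested above can be sketched roughly as follows. This is
a userspace model, not kernel code, and every name in it (fake_adapter,
fake_collect_snapshot, fake_probe) is hypothetical. The in_crash_kernel
flag stands in for a check such as the kernel's is_kdump_kernel(); how a
real driver would hand the snapshot to userspace is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FAKE_REG_COUNT 16

/* Hypothetical stand-in for a device whose state survives the panic. */
struct fake_adapter {
	bool initialized;
	unsigned char regs[FAKE_REG_COUNT];
};

/* Copy the state left behind by the crashed kernel; returns bytes saved. */
size_t fake_collect_snapshot(const struct fake_adapter *a,
			     unsigned char *buf, size_t len)
{
	size_t n = len < sizeof(a->regs) ? len : sizeof(a->regs);

	assert(buf != NULL);
	memcpy(buf, a->regs, n);
	return n;
}

/*
 * Hypothetical probe path. When running as the crash recovery kernel,
 * take the snapshot first, while the device still holds the state from
 * the crashed kernel, and only then run the normal initialization.
 */
size_t fake_probe(struct fake_adapter *a, bool in_crash_kernel,
		  unsigned char *buf, size_t len)
{
	size_t saved = 0;

	if (in_crash_kernel)
		saved = fake_collect_snapshot(a, buf, len);

	/* Normal initialization resets the device and wipes the old state. */
	memset(a->regs, 0, sizeof(a->regs));
	a->initialized = true;
	return saved;
}
```

The point of the model is only the order of operations: nothing runs in
the panic path of the crashed kernel, and the collection happens from a
known good kernel before the reset destroys the evidence.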

Eric
