From: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
To: Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Indranil Choudhury
<indranil-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>,
"netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Nirranjan Kirubaharan
<nirranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org"
<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
"davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org"
<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>,
"stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org"
<stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>,
Ganesh GR <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>,
"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org"
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
"torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org"
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
"kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
<kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
"ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org"
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
Date: Wed, 18 Apr 2018 18:01:16 +0530
Message-ID: <20180418123114.GA19159@chelsio.com>
In-Reply-To: <20180418061546.GA4551-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
On Wednesday, April 18, 2018 at 11:45:46 +0530, Dave Young wrote:
> Hi Rahul,
> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> > On production servers running a variety of workloads over time, a kernel
> > panic can happen sporadically after days or even months. It is
> > important to collect as many debug logs as possible to root-cause
> > and fix a problem that may not be easy to reproduce. A snapshot of
> > the underlying hardware/firmware state (register dump, firmware
> > logs, adapter memory, etc.) at the time of the kernel panic is very
> > helpful when debugging the culprit device driver.
> >
> > This series of patches adds a new generic framework that enables device
> > drivers to collect a device-specific snapshot of the hardware/firmware
> > state of the underlying device in the crash recovery kernel. In the crash
> > recovery kernel, the collected logs are added as elf notes to
> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> >
> > The sequence of actions performed by device drivers to append their
> > device-specific hardware/firmware logs to /proc/vmcore is as follows:
> >
> > 1. During probe (before hardware is initialized), device drivers
> > register with the vmcore module (via vmcore_add_device_dump()),
> > passing a callback function along with the buffer size and log name
> > needed for firmware/hardware log collection.
>
> I assumed the elf notes info should be prepared during the
> kexec_[file_]load phase. I did not read the earlier comments, so I am
> not sure whether this has already been discussed.
>
We must not collect dumps in the crashing kernel. Adding more work to
the crash dump path risks not collecting the vmcore at all. Eric
discussed this in more detail at:

https://lkml.org/lkml/2018/3/24/319

It is safe to collect dumps in the second kernel. Each device dump
will be exported as an elf note in /proc/vmcore.
> If this is done in the 2nd kernel, one question is that the driver can
> be loaded later than vmcore init.
Yes, drivers will add their device dumps after vmcore init.
> How to guarantee the function works if vmcore reading happens before
> the driver is loaded?
>
> Also it is possible that the kdump initramfs does not contain the
> driver module.
>
> Am I missing something?
>
Yes, the driver must be in the kdump initramfs if it is to collect and
add its device dump to /proc/vmcore in the second kernel.
> >
> > 2. vmcore module allocates the buffer with requested size. It adds
> > an elf note and invokes the device driver's registered callback
> > function.
> >
> > 3. Device driver collects all hardware/firmware logs into the buffer
> > and returns control back to vmcore module.
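
The three steps quoted above can be sketched as a small user-space
model. This is only an illustration of the register/allocate/callback
flow, not kernel code: the cover letter names vmcore_add_device_dump()
but does not show its signature, so DeviceDumpRequest, VmcoreModel,
add_device_dump() and fake_collect() are all hypothetical names, and
the choice to skip the note when the callback fails is an assumption
of this model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeviceDumpRequest:
    """What a driver hands over at probe time (step 1). Hypothetical type."""
    name: str                             # log name, e.g. "cxgb4_0000:02:00.4"
    size: int                             # buffer size the driver needs, in bytes
    collect: Callable[[bytearray], int]   # callback: fill buffer, return 0 on success


@dataclass
class DeviceDumpNote:
    """One collected dump, as it would later be exposed as an elf note."""
    owner: str
    data: bytes


class VmcoreModel:
    """Toy stand-in for the vmcore module; it only mimics the call order."""

    def __init__(self) -> None:
        self.notes: List[DeviceDumpNote] = []

    def add_device_dump(self, req: DeviceDumpRequest) -> int:
        buf = bytearray(req.size)   # step 2: allocate the requested size
        ret = req.collect(buf)      # steps 2-3: driver fills the buffer
        if ret != 0:
            return ret              # assumption: a failed collection adds no note
        self.notes.append(DeviceDumpNote("VMCOREDD_" + req.name, bytes(buf)))
        return 0


def fake_collect(buf: bytearray) -> int:
    """Hypothetical driver callback: writes a marker instead of real registers."""
    payload = b"REGDUMP"
    buf[:len(payload)] = payload
    return 0


vmcore = VmcoreModel()
status = vmcore.add_device_dump(
    DeviceDumpRequest("cxgb4_0000:02:00.4", 16, fake_collect))
```

The owner string is built with the same "VMCOREDD_" prefix that appears
in the readelf output of the cover letter; everything else is
illustrative.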
> >
> > The device specific hardware/firmware logs can be seen as elf notes:
> >
> > # readelf -n /proc/vmcore
> >
> > Displaying notes found at file offset 0x00001000 with length 0x04003288:
> >   Owner                        Data size   Description
> >   VMCOREDD_cxgb4_0000:02:00.4  0x02000fd8  Unknown note type: (0x00000700)
> >   VMCOREDD_cxgb4_0000:04:00.4  0x02000fd8  Unknown note type: (0x00000700)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   CORE                         0x00000150  NT_PRSTATUS (prstatus structure)
> >   VMCOREINFO                   0x0000074f  Unknown note type: (0x00000000)
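
For post-analysis scripts that want to pick the device dumps out of the
notes shown above without readelf, a minimal sketch of walking an ELF
note area follows. It relies only on the standard note layout (three
32-bit words for name size, descriptor size and type, then name and
descriptor, each padded to 4 bytes). Little-endian byte order is
assumed, and iter_notes() and make_note() are hypothetical helper
names. The buffer here is synthetic; a real script would first read the
note segment from /proc/vmcore at the offset readelf reports.

```python
import struct
from typing import Iterator, Tuple

# ELF note header: namesz, descsz, type (little-endian assumed here)
NHDR = struct.Struct("<III")


def align4(n: int) -> int:
    """Round n up to the next multiple of 4, as ELF notes require."""
    return (n + 3) & ~3


def iter_notes(blob: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (owner, type, descriptor) for each note in a note area."""
    off = 0
    while off + NHDR.size <= len(blob):
        namesz, descsz, ntype = NHDR.unpack_from(blob, off)
        off += NHDR.size
        name = blob[off:off + namesz].rstrip(b"\0").decode("ascii", "replace")
        off += align4(namesz)
        desc = blob[off:off + descsz]
        off += align4(descsz)
        yield name, ntype, desc


def make_note(name: str, ntype: int, desc: bytes) -> bytes:
    """Build one note for the synthetic example below."""
    raw = name.encode("ascii") + b"\0"
    return (NHDR.pack(len(raw), len(desc), ntype)
            + raw.ljust(align4(len(raw)), b"\0")
            + desc.ljust(align4(len(desc)), b"\0"))


# Synthetic note area: one device dump (type 0x700 as in the output
# above) followed by one NT_PRSTATUS-style "CORE" note (type 1).
blob = (make_note("VMCOREDD_cxgb4_0000:02:00.4", 0x700, b"\x01\x02\x03")
        + make_note("CORE", 1, b"\0" * 8))

device_dumps = [(owner, len(desc))
                for owner, ntype, desc in iter_notes(blob)
                if owner.startswith("VMCOREDD")]
```

Filtering on the "VMCOREDD" owner prefix matches the names in the
readelf output; whether later revisions of the series keep that naming
is not something this sketch assumes.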
> >
> > Patch 1 adds API to vmcore module to allow drivers to register callback
> > to collect the device specific hardware/firmware logs. The logs will
> > be added to /proc/vmcore as elf notes.
> >
> > Patch 2 updates read and mmap logic to append device specific hardware/
> > firmware logs as elf notes.
> >
> > Patch 3 shows a cxgb4 driver example using the API to collect
> > hardware/firmware logs in crash recovery kernel, before hardware is
> > initialized.
> >
> > Thanks,
> > Rahul
> >
> > RFC v1: https://lkml.org/lkml/2018/3/2/542
> > RFC v2: https://lkml.org/lkml/2018/3/16/326
> >
[...]
Thanks,
Rahul