linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Indranil Choudhury <indranil@chelsio.com>,
	Nirranjan Kirubaharan <nirranjan@chelsio.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	Ganesh GR <ganeshgr@chelsio.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
Date: Wed, 18 Apr 2018 20:37:07 +0530	[thread overview]
Message-ID: <20180418150707.GA27638@chelsio.com> (raw)
In-Reply-To: <871sfcy4ge.fsf@xmission.com>

On Wednesday, April 04/18/18, 2018 at 19:58:01 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> 
> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> >> Hi Rahul,
> >> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> > On production servers running variety of workloads over time, kernel
> >> > panic can happen sporadically after days or even months. It is
> >> > important to collect as much debug logs as possible to root cause
> >> > and fix the problem, that may not be easy to reproduce. Snapshot of
> >> > underlying hardware/firmware state (like register dump, firmware
> >> > logs, adapter memory, etc.), at the time of kernel panic will be very
> >> > helpful while debugging the culprit device driver.
> >> > 
> >> > This series of patches add new generic framework that enable device
> >> > drivers to collect device specific snapshot of the hardware/firmware
> >> > state of the underlying device in the crash recovery kernel. In crash
> >> > recovery kernel, the collected logs are added as elf notes to
> >> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> >> > 
> >> > The sequence of actions done by device drivers to append their device
> >> > specific hardware/firmware logs to /proc/vmcore are as follows:
> >> > 
> >> > 1. During probe (before hardware is initialized), device drivers
> >> > register to the vmcore module (via vmcore_add_device_dump()), with
> >> > callback function, along with buffer size and log name needed for
> >> > firmware/hardware log collection.
> >> 
> >> I assumed the elf notes info should be prepared while kexec_[file_]load
> >> phase. But I did not read the old comment, not sure if it has been discussed
> >> or not.
> >> 
> >
> > We must not collect dumps in crashing kernel. Adding more things in
> > crash dump path risks not collecting vmcore at all. Eric had
> > discussed this in more detail at:
> >
> > https://lkml.org/lkml/2018/3/24/319
> >
> > We are safe to collect dumps in the second kernel. Each device dump
> > will be exported as an elf note in /proc/vmcore.
> 
> It just occurred to me there is one variation that is worth
> considering.
> 
> Is the area you are looking at dumping part of a huge mmio area?
> I think someone said 2GB?
> 
> If that is the case it could be worth it to simply add the needed
> addresses to the range of memory we need to dump, and simply having a
> elf note saying that is what happened.
> 

We are _not_ dumping mmio area. However, one part of the dump
collection involves reading 2 GB on-chip memory via PIO access,
which is compressed and stored.

Thanks,
Rahul

  reply	other threads:[~2018-04-18 15:07 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17  7:44 [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel Rahul Lakkireddy
2018-04-17  7:44 ` [PATCH net-next v4 1/3] vmcore: add API to collect hardware dump in second kernel Rahul Lakkireddy
2018-04-19  8:24   ` Greg KH
2018-04-19 14:56     ` Rahul Lakkireddy
2018-04-17  7:44 ` [PATCH net-next v4 2/3] vmcore: append device dumps to vmcore as elf notes Rahul Lakkireddy
2018-04-17  7:44 ` [PATCH net-next v4 3/3] cxgb4: collect hardware dump in second kernel Rahul Lakkireddy
2018-04-18  6:15 ` [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel Dave Young
2018-04-18 12:31   ` Rahul Lakkireddy
2018-04-18 14:28     ` Eric W. Biederman
2018-04-18 15:07       ` Rahul Lakkireddy [this message]
2018-04-19  1:40     ` Dave Young
2018-04-19 14:27       ` Rahul Lakkireddy
2018-04-19 14:53         ` Eric W. Biederman
2018-04-20 13:06           ` Rahul Lakkireddy
2018-04-20 13:36             ` Eric W. Biederman
2018-04-20 14:51               ` Rahul Lakkireddy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180418150707.GA27638@chelsio.com \
    --to=rahul.lakkireddy@chelsio.com \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=dyoung@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=ganeshgr@chelsio.com \
    --cc=indranil@chelsio.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nirranjan@chelsio.com \
    --cc=stephen@networkplumber.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).