From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
To: Jakub Kicinski <kubakici@wp.pl>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
David Miller <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Ganesh GR <ganeshgr@chelsio.com>,
Nirranjan Kirubaharan <nirranjan@chelsio.com>,
Indranil Choudhury <indranil@chelsio.com>
Subject: Re: [PATCH net-next] cxgb4: append firmware dump to vmcore in kernel panic
Date: Wed, 21 Feb 2018 20:55:17 +0530
Message-ID: <20180221152516.GA12148@chelsio.com>
In-Reply-To: <20180220170420.25d6e391@cakuba.netronome.com>
On Tuesday, 02/20/18 at 17:04:20 -0800, Jakub Kicinski wrote:
> On Tue, 20 Feb 2018 16:51:03 -0800, Florian Fainelli wrote:
> > On 02/20/2018 04:43 PM, Jakub Kicinski wrote:
> > > On Mon, 19 Feb 2018 18:04:17 +0530, Rahul Lakkireddy wrote:
> > >> Our requirement is to analyze the state of firmware/hardware at the
> > >> time of kernel panic.
> > >
> > > I was wondering about this since you posted the patch and I can't come
> > > up with any specific scenario where kernel crash would correlate
> > > clearly with device state in non-trivial way.
> > >
> > > Perhaps there is something about cxgb4 HW/FW that makes this useful.
> > > Could you explain? Could you give a real life example of a bug?
> > > Is it related to the TOE-looking TLS offload Atul is posting?
> > >
> > > Is the panic you're targeting here real or manually triggered from user
> > > space to get a full dump of kernel and FW?
> > >
> > > That's me trying to guess what you're doing.. :)
> >
This is not related to the TLS offload that Atul posted. It is about
general field diagnostics.
When a kernel panic happens on a critical production server, it may
not be reproducible, and the server may not be allowed any downtime
for debugging. Currently, the vmcore generated after a panic contains
only a snapshot of the driver state, not the hardware/firmware state at
the time of the panic. If the complete state and logs of the underlying
NIC hardware/firmware (in fact, of all hardware components) are
collected, they will be very helpful for post-mortem analysis.
For example, suppose the driver programs hardware memory incorrectly
because of a race condition, and this indirectly causes a kernel panic.
A dump of hardware memory collected during the panic will help to
root-cause and fix the issue.
> > One case where this might be helpful is if you are chasing down DMA
> > corruption and you would like to get a nearly instant capture of both
> > the kernel's memory and the adapter which may be responsible for that.
> > This is not probably 100% proof because there is a timing window during
> > which the dumps of both contexts are going to happen, and that alone
> > might be influencing the captured memory view. Just guessing of course.
>
> Perhaps this is what you mean with the timing window - but with random
> corruptions by the time kernel hits the corrupted memory 40/100Gb
> adapter has likely forgotten all about those DMAs.. And IOMMUs are
> pretty good at catching corruptions on big iron CPUs (i.e. it's easy to
> catch them in testing, even if production environment runs iommu=pt).
> At least that's my gut feeling/experience ;)
Thanks,
Rahul