From: ebiederm@xmission.com (Eric W. Biederman)
To: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: Indranil Choudhury <indranil@chelsio.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Nirranjan Kirubaharan <nirranjan@chelsio.com>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stephen@networkplumber.org" <stephen@networkplumber.org>,
"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
Ganesh GR <ganeshgr@chelsio.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Dave Young <dyoung@redhat.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
Date: Fri, 20 Apr 2018 08:36:09 -0500
Message-ID: <87po2uhueu.fsf@xmission.com>
In-Reply-To: <20180420130632.GA32304@chelsio.com> (Rahul Lakkireddy's message of "Fri, 20 Apr 2018 18:36:34 +0530")

Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On Thursday, April 04/19/18, 2018 at 20:23:37 +0530, Eric W. Biederman wrote:
>> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
>>
>> > On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
>> >> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
>> >> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
>> >> > > Hi Rahul,
>> >> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
>> >> > > > On production servers running variety of workloads over time, kernel
>> >> > > > panic can happen sporadically after days or even months. It is
>> >> > > > important to collect as much debug logs as possible to root cause
>> >> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
>> >> > > > underlying hardware/firmware state (like register dump, firmware
>> >> > > > logs, adapter memory, etc.), at the time of kernel panic will be very
>> >> > > > helpful while debugging the culprit device driver.
>> >> > > >
>> >> > > > This series of patches add new generic framework that enable device
>> >> > > > drivers to collect device specific snapshot of the hardware/firmware
>> >> > > > state of the underlying device in the crash recovery kernel. In crash
>> >> > > > recovery kernel, the collected logs are added as elf notes to
>> >> > > > /proc/vmcore, which is copied by user space scripts for post-analysis.
>> >> > > >
>> >> > > > The sequence of actions done by device drivers to append their device
>> >> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
>> >> > > >
>> >> > > > 1. During probe (before hardware is initialized), device drivers
>> >> > > > register to the vmcore module (via vmcore_add_device_dump()), with
>> >> > > > callback function, along with buffer size and log name needed for
>> >> > > > firmware/hardware log collection.
>> >> > >
>> >> > > I assumed the elf notes info should be prepared while kexec_[file_]load
>> >> > > phase. But I did not read the old comment, not sure if it has been discussed
>> >> > > or not.
>> >> > >
>> >> >
>> >> > We must not collect dumps in crashing kernel. Adding more things in
>> >> > crash dump path risks not collecting vmcore at all. Eric had
>> >> > discussed this in more detail at:
>> >> >
>> >> > https://lkml.org/lkml/2018/3/24/319
>> >> >
>> >> > We are safe to collect dumps in the second kernel. Each device dump
>> >> > will be exported as an elf note in /proc/vmcore.
>> >>
>> >> I understand that we should avoid adding anything in crash path. And I also
>> >> agree to collect device dump in second kernel. I just assumed device
>> >> dump use some memory area to store the debug info and the memory
>> >> is persistent so that this can be done in 2 steps, first register the
>> >> address in elf header in kexec_load, then collect the dump in 2nd
>> >> kernel. But it seems the driver is doing some other logic to collect
>> >> the info instead of just that simple like I thought.
>> >>
>> >
>> > It seems simpler, but I'm concerned with waste of memory area, if
>> > there are no device dumps being collected in second kernel. In
>> > approach proposed in these series, we dynamically allocate memory
>> > for the device dumps from second kernel's available memory.
>>
>> Don't count that kernel having more than about 128MiB.
>>
>
> If large dump is expected, Administrator can increase the memory
> allocated to the second kernel (using crashkernel boot param), to
> ensure device dumps get collected.

Except 128MiB is already a huge amount to reserve.  I typically have run
crash dumps with 16MiB of memory and thought it was overkill.  Looking
below, 32MiB seems a bit high, but it is small enough that it is still
doable.  I am baffled at how 2GiB can be guaranteed to fit in 32MiB
(sparse register space?), but if it works reliably, fine.

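For scale, here is a back-of-the-envelope check of what that guarantee
requires.  This is only a sketch: min_compression_ratio() is a
hypothetical helper, not anything from the patch series.  Fitting a
2GiB raw dump into a 32MiB budget needs every dump to compress at least
64:1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest whole-number compression ratio that
 * lets raw_bytes of dump data fit in budget_bytes (rounded up).
 */
static uint64_t min_compression_ratio(uint64_t raw_bytes,
				      uint64_t budget_bytes)
{
	assert(budget_bytes != 0);
	return (raw_bytes + budget_bytes - 1) / budget_bytes;
}
```

Calling min_compression_ratio(2ULL << 30, 32ULL << 20) gives 64, which
is why I suspect the register space is sparse.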
>> For that reason if for no other it would be nice if it was possible to
>> have the driver to not initialize the device and just stand there
>> handing out the data a piece at a time as it is read from /proc/vmcore.
>>
>
> Since cxgb4 is a network driver, it can be used to transfer the dumps
> over the network. So we must ensure the dumps get collected and
> stored, before device gets initialized to transfer dumps over
> the network.

Good point.  For some reason I was thinking it was an InfiniBand device
and not a 10Gb ethernet device.

>> The 2GiB number I read earlier concerns me for working in a limited
>> environment.
>>
>
> All dumps, including the 2GB on-chip memory dump, is compressed by
> the cxgb4 driver as they are collected. The overall compressed dump
> comes out at max 32 MB.
>
>> It might even make sense to separate this into a completely separate
>> module (depended upon the main driver if it makes sense to share
>> the functionality) so that people performing crash dumps would not
>> hesitate to include the code in their initramfs images.
>>
>> I can see splitting a device up into a portion only to be used in case
>> of a crash dump and a normal portion like we do for main memory but I
>> doubt that makes sense in practice.
>>
>
> This is not required, especially in case of network drivers, which
> must collect underlying device dump and initialize the device to
> transfer dumps over the network.

I have a practical concern.  What happens if the previous kernel left
the device in such a bad state that the driver cannot successfully
initialize it?

Does failure to initialize cxgb4 after a crash now mean that you cannot
capture the crash dump to see the crazy state the device was in?

Typically the initramfs for a crash dump does not include unnecessary
drivers, so that hardware in states the drivers can't handle won't
prevent taking a crash dump.

I understand that if you are taking the dump over your 10Gb ethernet
this is a moot point.  But if you are writing your dump to disk, or
writing it over a management gigabit ethernet, then it is still an issue.

Is there a decoupling so that a totally b0rked device can't prevent
taking its own dump?

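To make the question concrete, here is a minimal user-space sketch of
the kind of decoupling I am asking about.  It is not cxgb4 code:
struct fake_dev, collect_dump(), init_device() and probe() are all
hypothetical names, and it assumes the snapshot can be read without
bringing the device up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device; not a real driver structure. */
struct fake_dev {
	bool wedged;		/* crashed kernel left it unusable */
	unsigned char regs[16];	/* state readable without init */
};

struct dump_buf {
	unsigned char data[16];
	size_t len;
};

/* Step 1: read-only snapshot that does not depend on init succeeding. */
static int collect_dump(const struct fake_dev *dev, struct dump_buf *out)
{
	memcpy(out->data, dev->regs, sizeof(dev->regs));
	out->len = sizeof(dev->regs);
	return 0;
}

/* Step 2: full bring-up, which may fail on a wedged device. */
static int init_device(struct fake_dev *dev)
{
	return dev->wedged ? -1 : 0;
}

/* The snapshot is taken first, so an init failure cannot lose it. */
static int probe(struct fake_dev *dev, struct dump_buf *out)
{
	int err = collect_dump(dev, out);

	if (err)
		return err;
	assert(out->len <= sizeof(out->data));
	return init_device(dev);
}
```

With a wedged device, probe() returns -1 but the dump buffer is still
populated.  That is the property I would like to see.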
Eric