From: Jakub Kicinski <kuba@kernel.org>
To: Michael Chan <michael.chan@broadcom.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
    pabeni@redhat.com
Subject: Re: [PATCH net-next 0/3] eth: bnxt: handle invalid Tx completions more gracefully
Date: Mon, 10 Jul 2023 17:38:58 -0700
Message-ID: <20230710173858.75bc590e@kernel.org>
In-Reply-To: <CACKFLikt=1U5fB2Xe=KfsvjfrXmgQuR2PH4iWCESWcpZBf-8Qg@mail.gmail.com>
On Mon, 10 Jul 2023 14:44:31 -0700 Michael Chan wrote:
> > bnxt trusts the events generated by the device which may lead to kernel
> > crashes. These are extremely rare but they do happen. For a while
> > I thought crashing may be intentional, because device reporting invalid
> > completions should never happen, and having a core dump could be useful
> > if it does. But in practice I haven't found any clues in the core dumps,
> > and panic_on_warn exists.
>
> Indeed, it was intentional to crash the kernel so that we could
> analyze the rings in the core dump. Typically, we would find a bad
> completion in one of the rings and we would debug it with the hardware
> team during early chip testing. Either the bug is fixed or some
> suitable workaround is implemented. Ideally, this should never happen
> once the chip goes into production.
I was suspecting bad HW, but some new platforms seem to be hitting it,
too. Which now makes me suspect a PXE -> Linux handoff problem?
Or multi-host? Hard to tell..

Hopefully once it's not crashing it will be easier to do more analysis -
crashes within softirq during boot don't propagate too well into
monitoring systems :(
> I suppose in a large enough deployment, this NULL SKB crash can
> happen. I will review your patchset later today. Thanks.
Thanks!