From: Jakub Kicinski <kuba@kernel.org>
To: Michael Chan <michael.chan@broadcom.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com
Subject: Re: [PATCH net-next 0/3] eth: bnxt: handle invalid Tx completions more gracefully
Date: Mon, 10 Jul 2023 17:38:58 -0700
Message-ID: <20230710173858.75bc590e@kernel.org>
In-Reply-To: <CACKFLikt=1U5fB2Xe=KfsvjfrXmgQuR2PH4iWCESWcpZBf-8Qg@mail.gmail.com>

On Mon, 10 Jul 2023 14:44:31 -0700 Michael Chan wrote:
> > bnxt trusts the events generated by the device which may lead to kernel
> > crashes. These are extremely rare but they do happen. For a while
> > I thought crashing may be intentional, because device reporting invalid
> > completions should never happen, and having a core dump could be useful
> > if it does. But in practice I haven't found any clues in the core dumps,
> > and panic_on_warn exists.  
> 
> Indeed, it was intentional to crash the kernel so that we could
> analyze the rings in the core dump.  Typically, we would find a bad
> completion in one of the rings and we would debug it with the hardware
> team during early chip testing.  Either the bug is fixed or some
> suitable workaround is implemented.  Ideally, this should never happen
> once the chip goes into production.

I was suspecting bad HW, but some new platforms seem to be hitting it,
too. Which now makes me suspect a PXE -> Linux hand-off problem?
Or multi-host? Hard to tell.
Hopefully once it's not crashing it will be easier to do more analysis -
crashes within softirq during boot don't propagate too well into
monitoring systems :(

> I suppose in a large enough deployment, this NULL SKB crash can
> happen.  I will review your patchset later today.  Thanks.

Thanks!
