From: Jeremy Kerr <jk@codeconstruct.com.au>
To: "William A. Kennington III" <william@wkennington.com>,
	Matt Johnston <matt@codeconstruct.com.au>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>, Wolfram Sang <wsa@kernel.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mctp i2c: check packet length before marking flow active
Date: Fri, 24 Apr 2026 12:16:21 +0800
Message-ID: <1651e98dcc86f38a0b39679b1a6f9ef604e0812a.camel@codeconstruct.com.au>
In-Reply-To: <c94fc7c9-279b-4b4b-92f3-7f1b88bc0c64@wkennington.com>

Hi William,

> > Out of curiosity though, how did you hit the hdr_byte_count mismatch in
> > the first place?
> 
> Our current theory is that we have known buggy firmware on our NVME MCTP
> devices and we are seeing some kind of corruption on the bus that we are
> going to fix on the firmware side.

OK, sounds good for the overall fix, but I don't think that would be
causing the path that you're addressing here. The fix is definitely
valid, but can't be hit through any RX data corruption (we're in the
TX path).

The header byte count is populated during header construction, so a
mismatch here would indicate modification of the skb between that point
and the actual xmit. Do you see the "Bad TX len" warning in these cases?
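
To illustrate the invariant involved, here is a minimal userspace sketch.
It is not the driver code: the struct and function names are made up for
illustration, and it assumes the usual SMBus block-write convention where
the byte count covers everything after the byte count field itself (the
source address byte plus the payload).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four-byte MCTP-over-I2C header. */
struct model_i2c_hdr {
	uint8_t dest_slave;
	uint8_t command;
	uint8_t byte_count;
	uint8_t source_slave;
};

/* Hypothetical stand-in for an skb: the header plus the total length
 * as tracked by the buffer itself.
 */
struct model_pkt {
	struct model_i2c_hdr hdr;
	size_t len;	/* header + payload, in bytes */
};

/* Header construction: byte_count covers source_slave plus the payload.
 * Assumes payload_len fits in the one-byte field (at most 254).
 */
static void model_header_create(struct model_pkt *pkt, size_t payload_len)
{
	pkt->hdr.byte_count = (uint8_t)(payload_len + 1);
	pkt->len = sizeof(pkt->hdr) + payload_len;
}

/* Transmit-time check: the three bytes up to and including byte_count,
 * plus byte_count itself, must equal the buffer length.
 */
static bool model_tx_len_ok(const struct model_pkt *pkt)
{
	return (size_t)pkt->hdr.byte_count + 3 == pkt->len;
}
```

In this model the check can only fail if something changes the length or
the header after model_header_create() has run, which is why RX-side
corruption alone would not explain a mismatch.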

> We started also seeing kernel 
> crashes along with the bad firmware symptoms, walked through ~110 kdumps 
> and found i2c locks that were held by 2 owners (eeprom reading and the 
> MCTP TX queue).

Just to clarify my understanding of the state: "being held by two
owners" would indicate a violation of the lock itself. Or is it that
there are two threads blocked waiting to acquire the mutex?

For NVMe-MI, you're likely using manual tag allocation, where the tag
allocation (and hence flow state) is entirely controlled by userspace.
It may be that the NVMe protocol-level errors are causing those tags to
be held for long durations, perhaps?

Cheers,


Jeremy
