All of lore.kernel.org
 help / color / mirror / Atom feed
From: "William A. Kennington III" <william@wkennington.com>
To: Jeremy Kerr <jk@codeconstruct.com.au>,
	Matt Johnston <matt@codeconstruct.com.au>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Wolfram Sang <wsa@kernel.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mctp i2c: check packet length before marking flow active
Date: Wed, 29 Apr 2026 02:04:27 -0700	[thread overview]
Message-ID: <0c396f24-366b-49ed-ae84-9f1982866a99@wkennington.com> (raw)
In-Reply-To: <1651e98dcc86f38a0b39679b1a6f9ef604e0812a.camel@codeconstruct.com.au>

On 4/23/26 21:16, Jeremy Kerr wrote:

> Hi William,
>
>>> Out of curiosity though, how did you hit the hdr_byte_count mismatch in
>>> the first place?
>> Our current theory is that we have known buggy firmware on our NVME MCTP
>> devices and we are seeing some kind of corruption on the bus that we are
>> going to fix in on the firmware side.
> OK, sounds good for the overall fix, but I don't think that would be
> causing the path that you're addressing here. The fix is definitely
> valid, but can't be hit through any RX data corruption (we're in the
> TX path).
Yeah I think you might be right, the hard part is reproducing this is so 
infrequent for us that it takes a long time to iterate on testing these 
changes.
>
> The header byte count is populated during header construction, so a
> mismatch here would indicate modification of the skb between that point
> at the actual xmit. Do you see the "Bad TX len" warning in these cases?

I double checked and so far I can’t find evidence of it. Probably we 
still want to keep this change, but it’s not the root of our problems.

>> We started also seeing kernel
>> crashes along with the bad firmware symptoms, walked through ~110 kdumps
>> and found i2c locks that were held by 2 owners (eeprom reading and the
>> MCTP TX queue).
> Just to clarify my understanding of the state: "being held by two
> owners" would indicate a violation of the lock itself. Or is it that
> there are two threads blocked waiting to acquire the mutex?
I think it’s actually this, 2 threads are waiting on acquiring the lock. 
There was a theory that it was a lock underflow that allowed 2 threads 
to acquire the lock that lead to this patch.
> For NVMe-MI, you're likely using manual tag allocation, where the tag
> allocation (and hence flow state) is entirely controlled by userspace.
> It may be that the NVMe protocol-level errors are causing that tags to
> be held for long durations, perhaps?

Yeah, this is very plausible given the device(s) stop responding 
correctly. I imagine we are getting stuck with manual allocations and 
not releasing locks. Can we reset the state machine back to NEW instead 
of holding the lock?

>
> Cheers,
>
>
> Jeremy

  reply	other threads:[~2026-04-29  9:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23  0:15 [PATCH] mctp i2c: check packet length before marking flow active William A. Kennington III
2026-04-23  3:47 ` Jeremy Kerr
2026-04-23  6:55   ` William A. Kennington III
2026-04-24  4:16     ` Jeremy Kerr
2026-04-29  9:04       ` William A. Kennington III [this message]
2026-05-06  8:01         ` Jeremy Kerr
2026-05-07  7:50           ` William A. Kennington III
2026-04-23  7:46 ` [PATCH net v2] net: mctp i2c: check " William A. Kennington III
2026-04-28 11:03   ` Paolo Abeni
2026-04-28 11:20   ` patchwork-bot+netdevbpf
2026-04-29  1:23     ` Jeremy Kerr
2026-04-29  7:13       ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c396f24-366b-49ed-ae84-9f1982866a99@wkennington.com \
    --to=william@wkennington.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jk@codeconstruct.com.au \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matt@codeconstruct.com.au \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=wsa@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.