The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* Re: [PATCH] mctp i2c: check packet length before marking flow active
       [not found]       ` <0c396f24-366b-49ed-ae84-9f1982866a99@wkennington.com>
@ 2026-05-06  8:01         ` Jeremy Kerr
  2026-05-07  7:50           ` William A. Kennington III
  0 siblings, 1 reply; 2+ messages in thread
From: Jeremy Kerr @ 2026-05-06  8:01 UTC (permalink / raw)
  To: William A. Kennington III, Matt Johnston, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Wolfram Sang
  Cc: netdev, linux-kernel

Hi William,

> > Just to clarify my understanding of the state: "being held by two
> > owners" would indicate a violation of the lock itself. Or is it that
> > there are two threads blocked waiting to acquire the mutex?
> I think it’s actually this, 2 threads are waiting on acquiring the lock. 

OK, that's good news!

> There was a theory that it was a lock underflow that allowed 2 threads 
> to acquire the lock that lead to this patch.
>
> > For NVMe-MI, you're likely using manual tag allocation, where the tag
> > allocation (and hence flow state) is entirely controlled by userspace.
> > It may be that the NVMe protocol-level errors are causing that tags to
> > be held for long durations, perhaps?
> 
> Yeah, this is very plausible given the device(s) stop responding 
> correctly. I imagine we are getting stuck with manual allocations and
> not releasing locks. Can we reset the state machine back to NEW instead 
> of holding the lock?

Not sure what you're referring to here; if the userspace application is
not releasing the tag, we have to keep the i2c bus locked, otherwise we
may not receive a response from the device.

The one case I can think of (in upstream infrastructure, at least) is
that this might be triggered by the device reporting a long MPRT value,
and then a response gets lost. libnvme is respecting the MPRT, and not
releasing the tag for that (excessive) duration.

However, the tag -> i2c lock associations are only useful if you have
muxes in the i2c topology. Is that the case on your platform? If not,
perhaps we could elide all the bus locking when we can detect that...

Cheers,


Jeremy

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] mctp i2c: check packet length before marking flow active
  2026-05-06  8:01         ` [PATCH] mctp i2c: check packet length before marking flow active Jeremy Kerr
@ 2026-05-07  7:50           ` William A. Kennington III
  0 siblings, 0 replies; 2+ messages in thread
From: William A. Kennington III @ 2026-05-07  7:50 UTC (permalink / raw)
  To: Jeremy Kerr, Matt Johnston, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Wolfram Sang
  Cc: netdev, linux-kernel


On 5/6/26 01:01, Jeremy Kerr wrote:
> Hi William,
>
>>> Just to clarify my understanding of the state: "being held by two
>>> owners" would indicate a violation of the lock itself. Or is it that
>>> there are two threads blocked waiting to acquire the mutex?
>> I think it’s actually this, 2 threads are waiting on acquiring the lock.
> OK, that's good news!
>
>> There was a theory that it was a lock underflow that allowed 2 threads
>> to acquire the lock that lead to this patch.
>>
>>> For NVMe-MI, you're likely using manual tag allocation, where the tag
>>> allocation (and hence flow state) is entirely controlled by userspace.
>>> It may be that the NVMe protocol-level errors are causing that tags to
>>> be held for long durations, perhaps?
>> Yeah, this is very plausible given the device(s) stop responding
>> correctly. I imagine we are getting stuck with manual allocations and
>> not releasing locks. Can we reset the state machine back to NEW instead
>> of holding the lock?
> Not sure what you're referring to here; if the userspace application is
> not releasing the tag, we have to keep the i2c bus locked, otherwise we
> may not receive a response from the device.

Isn't this inherently an approach asking for trouble, where a 
potentially buggy userspace can starve out other applications which need 
to access the bus. For us we have FRU devices on the that are 
periodically rescanned or accessed for various reasons alongside the 
MCTP endpoint on the NVME device.

> The one case I can think of (in upstream infrastructure, at least) is
> that this might be triggered by the device reporting a long MPRT value,
> and then a response gets lost. libnvme is respecting the MPRT, and not
> releasing the tag for that (excessive) duration.
Yeah, I'll have to look at the specific firmware bug more but I don't 
think it's been oot caused it fully yet.
> However, the tag -> i2c lock associations are only useful if you have
> muxes in the i2c topology. Is that the case on your platform? If not,
> perhaps we could elide all the bus locking when we can detect that...

We have at least 1 layer of mux before each NVME and FRU device.

> Cheers,
>
>
> Jeremy

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-07  7:50 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260423001517.79219-1-william@wkennington.com>
     [not found] ` <61bb1b3838609996600f46ccb2c4ff89d085ee6f.camel@codeconstruct.com.au>
     [not found]   ` <c94fc7c9-279b-4b4b-92f3-7f1b88bc0c64@wkennington.com>
     [not found]     ` <1651e98dcc86f38a0b39679b1a6f9ef604e0812a.camel@codeconstruct.com.au>
     [not found]       ` <0c396f24-366b-49ed-ae84-9f1982866a99@wkennington.com>
2026-05-06  8:01         ` [PATCH] mctp i2c: check packet length before marking flow active Jeremy Kerr
2026-05-07  7:50           ` William A. Kennington III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox