Message-ID: <83fddb43-72ff-4176-aeac-3c65faac811e@wkennington.com>
Date: Thu, 7 May 2026 00:50:34 -0700
Subject: Re: [PATCH] mctp i2c: check packet length before marking flow active
To: Jeremy Kerr, Matt Johnston, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Wolfram Sang
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20260423001517.79219-1-william@wkennington.com>
 <61bb1b3838609996600f46ccb2c4ff89d085ee6f.camel@codeconstruct.com.au>
 <1651e98dcc86f38a0b39679b1a6f9ef604e0812a.camel@codeconstruct.com.au>
 <0c396f24-366b-49ed-ae84-9f1982866a99@wkennington.com>
 <88cb1f245485d991b845886827876f27f11792ae.camel@codeconstruct.com.au>
From: "William A. Kennington III"
In-Reply-To: <88cb1f245485d991b845886827876f27f11792ae.camel@codeconstruct.com.au>

On 5/6/26 01:01, Jeremy Kerr wrote:
> Hi William,
>
>>> Just to clarify my understanding of the state: "being held by two
>>> owners" would indicate a violation of the lock itself. Or is it that
>>> there are two threads blocked waiting to acquire the mutex?
>> I think it’s actually this, 2 threads are waiting on acquiring the lock.
> OK, that's good news!
>
>> There was a theory that it was a lock underflow that allowed 2 threads
>> to acquire the lock that lead to this patch.
>>
>>> For NVMe-MI, you're likely using manual tag allocation, where the tag
>>> allocation (and hence flow state) is entirely controlled by userspace.
>>> It may be that the NVMe protocol-level errors are causing that tags to
>>> be held for long durations, perhaps?
>> Yeah, this is very plausible given the device(s) stop responding
>> correctly. I imagine we are getting stuck with manual allocations and
>> not releasing locks. Can we reset the state machine back to NEW instead
>> of holding the lock?
> Not sure what you're referring to here; if the userspace application is
> not releasing the tag, we have to keep the i2c bus locked, otherwise we
> may not receive a response from the device.

Isn't this approach inherently asking for trouble? A potentially buggy
userspace application can starve out other applications that need to
access the bus. In our case, we have FRU devices on the same bus that
are periodically rescanned or accessed for various reasons, alongside
the MCTP endpoint on the NVMe device.

> The one case I can think of (in upstream infrastructure, at least) is
> that this might be triggered by the device reporting a long MPRT value,
> and then a response gets lost. libnvme is respecting the MPRT, and not
> releasing the tag for that (excessive) duration.

Yeah, I'll have to look into the specific firmware bug more, but I
don't think it has been fully root-caused yet.

> However, the tag -> i2c lock associations are only useful if you have
> muxes in the i2c topology. Is that the case on your platform? If not,
> perhaps we could elide all the bus locking when we can detect that...

We have at least one layer of mux in front of each NVMe and FRU device.

> Cheers,
>
>
> Jeremy
Yeah, I'll have to look at the specific firmware bug more but I don't think it's been oot caused it fully yet. > However, the tag -> i2c lock associations are only useful if you have > muxes in the i2c topology. Is that the case on your platform? If not, > perhaps we could elide all the bus locking when we can detect that... We have at least 1 layer of mux before each NVME and FRU device. > Cheers, > > > Jeremy