From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D5C134DCCD; Fri, 15 May 2026 07:27:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.148.174 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778830056; cv=none; b=PrxrNbsLuPq4SckL1X9IGVdOZ6eWfHJNeVv5jfDhyh+/9KG+DTlfRGRRv9l61HB4WNSw6hmkYrXimSAiNaqyCSEMpnrVVr5YzhWI+OC/5xZtQ9a7XBRpWolgHE4kskHcfbQLPGgeZ6/PmoCXFWasW4aUrzvEkJnY0gCoBDxw698= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778830056; c=relaxed/simple; bh=CtXcKLY2TdeHy6fGq+PAmYUpChkHygIAlqMAIJMvYts=; h=Date:From:To:CC:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=RkIVtuHP/M7v8DdYp091X5ceO3eRnupIy6mdWO8Qukzp30Uk9UdnRcTif33v6RM7JcsZ9i5xbap2FYJurk7LPPYphZhVRf/i16ax7Ktqvkh51GXZ/SWACyHMoeYo6RW2tjZfnak1qL2olyvI+bbarPdV8GIgvP5+WTunXwTmDiw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=marvell.com; spf=pass smtp.mailfrom=marvell.com; dkim=pass (2048-bit key) header.d=marvell.com header.i=@marvell.com header.b=g5stKuKJ; arc=none smtp.client-ip=67.231.148.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=marvell.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=marvell.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=marvell.com header.i=@marvell.com header.b="g5stKuKJ" Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 64F7B9CL1033243; Fri, 15 May 2026 00:27:03 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h= cc:content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to; s=pfpt0220; bh=y/HL4MaF3LtfbjmzG/YvFN4KE f18kgCJMeKCaLgm1VA=; b=g5stKuKJ6ocFJvgzqmxAzcARolm6hXN7U76l4T7Z+ SdS/WpfqmEFdKpMYBx0ihLciRJyWTGP8TAo8bqkdYZJnKOaS80dDQjfs4fvQwveD Bd/qO8X5PVf73nZX9vEdn2IBR6BUimQ9Ags460Y61MzSWl5LrYI9d21FYm4+Fh2U WGWMYu1ILjOmb0NmQb6onWTrrBtEl/teEObxK9blN1vQI59m5SeD5SLI5EXfPlEd cPRIq8sAzAK3PdUpm2y2CXEmCyn8PbDUhBVHyIpo54t9H3f3kn4x7JAr3Ww3Kzbh mE3TgUs8xluPtM7J47yTjBnM1YzG5ceb1IlyBk++evuwQ== Received: from dc5-exch05.marvell.com ([199.233.59.128]) by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 4e5m3whmvu-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 15 May 2026 00:27:03 -0700 (PDT) Received: from DC5-EXCH05.marvell.com (10.69.176.209) by DC5-EXCH05.marvell.com (10.69.176.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.25; Fri, 15 May 2026 00:27:00 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH05.marvell.com (10.69.176.209) with Microsoft SMTP Server id 15.2.1544.25 via Frontend Transport; Fri, 15 May 2026 00:27:00 -0700 Received: from rkannoth-OptiPlex-7090 (unknown [10.28.36.165]) by maili.marvell.com (Postfix) with SMTP id 1E8573F7057; Fri, 15 May 2026 00:26:49 -0700 (PDT) Date: Fri, 15 May 2026 12:56:49 +0530 From: Ratheesh Kannoth To: , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , Subject: Re: [PATCH v14 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Message-ID: References: <20260514062537.3813802-1-rkannoth@marvell.com> <20260514062537.3813802-6-rkannoth@marvell.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20260514062537.3813802-6-rkannoth@marvell.com> X-Authority-Analysis: v=2.4 cv=YMqvDxGx c=1 sm=1 tr=0 ts=6a06cac7 cx=c_pps a=rEv8fa4AjpPjGxpoe8rlIQ==:117 a=rEv8fa4AjpPjGxpoe8rlIQ==:17 a=kj9zAlcOel0A:10 a=NGcC8JguVDcA:10 a=VkNPw1HP01LnGYTKEx00:22 a=l0iWHRpgs5sLHlkKQ1IR:22 a=EAYMVhzMl8SCOHhVQcBL:22 a=M5GUcnROAAAA:8 a=KvYP_omU_0Oirzp_N2YA:9 a=CjuIK1q_8ugA:10 a=OBjm3rFKGHvpk9ecZwUJ:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTE1MDA3MyBTYWx0ZWRfXyhvkg8VxWlgY WuwT2vSdqVjiRegaWMkJt1WW1qFcw+lwREXl56N86zZq9Zo7Isue8tbvSoc+bnYVPa+cf3nUSM3 OvytXS/T9akmOyY2Qn764RWuAGYlK2peII6iJhR92zg+i06H5Dj5r4XZLneUf9itMmymWutKjuy QWay358evg1Sfe7ksNDZFOki0VpyKyP36+xU9QUn1/afdujovm1Zm81qn9a/RybjJ9aKW8V6Ojh 2wgZNly52zpMoQcPZFfoR/wz8Rqf2DQ9myerbVZGKyo6oGdh92owovPec2LLKum+07/Wqu+Y91G LdBdKvFAF0I+OhBHuhZRyvharmKx65cd41fjmld3JVwLk9SOfV3yHfnK0kHFDt9Jukm9IDVfZ7z yBp+RTia91XXSEZVS3AMCQ9hKZJdN2A1u2TLeaKygkNRlQV1eweudK/QDxGtqNyFrQB3X+cHMwF 8Xs0mU958Ah4JAhwWCQ== X-Proofpoint-GUID: 6wjn7lM1EhK2N6T11qfwWx0mj6MAhQdD X-Proofpoint-ORIG-GUID: 6wjn7lM1EhK2N6T11qfwWx0mj6MAhQdD X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-05-15_01,2026-05-13_01,2025-10-01_01 On 2026-05-14 at 11:55:33, Ratheesh Kannoth (rkannoth@marvell.com) wrote: > CN20K NPC MCAM is split into 32 subbanks that are searched in a > predefined order during allocation. Lower-numbered subbanks have > higher priority than higher-numbered ones. > > Add a runtime "srch_order" to control the order in which > subbanks are searched during MCAM allocation. > > >> +int npc_cn20k_search_order_set(struct rvu *rvu, >> + u64 narr[MAX_NUM_SUB_BANKS], int cnt) >> +{ >> + struct npc_mcam *mcam = &rvu->hw->mcam; >> + struct npc_subbank *sb; >> + struct xarray *xa; >> + int prio, rc; >> + int sb_idx; >> + >> + if (cnt != npc_priv.num_subbanks) { >> + dev_err(rvu->dev, "Number of entries(%u) != %u\n", >> + cnt, npc_priv.num_subbanks); >> + return -EINVAL; >> + } >> + >> + mutex_lock(&mcam->lock); >> + npc_lock_all_subbank(); >Does exposing npc_lock_all_subbank() to a user-triggerable devlink callback >cause a lockdep violation? >npc_lock_all_subbank() acquires mutexes for all subbanks sequentially. >Since these mutexes share the same lock class, lockdep might detect this >as a possible recursive lock. With up to 32 subbanks, this also exceeds >the maximum lockdep nesting limit of 8 subclasses. The 32 subbank mutexes are physically distinct locks protecting independent hardware queues, meaning no actual runtime deadlock path exists. To satisfy lockdep's verification engine and prevent tracking limit exhaustion during sequential allocation, the driver uses a singular, overarching top-level domain lock (mcam->lock) to serialize the entire configuration sequence. Because mcam->lock acts as the primary synchronization barrier for the devlink context, concurrent cross-subbank inversion paths are structurally impossible. To permanently silence the tracking constraint warning, i can post a patch (post merging this series) to either mutex_lock_nested() with an explicit index state or disables lockdep tracking entirely for this specialized hardware control loop using the lockdep_set_novalidate_class() macro >> + restrict_valid = false; >> + >> + for (prio = 0; prio < cnt; prio++) { >> + sb_idx = narr[prio]; >> + sb = &npc_priv.sb[sb_idx]; >> + sb->arr_idx = prio; >> + >> + /* erase entry if exist */ >> + xa_erase(&npc_priv.xa_sb_free, sb->arr_idx); >> + xa_erase(&npc_priv.xa_sb_used, sb->arr_idx); >> + >> + xa = &npc_priv.xa_sb_free; >> + if (sb->flags & NPC_SUBBANK_FLAG_USED) >> + xa = &npc_priv.xa_sb_used; >> + >> + /* if xa_store() fails due to memory pressure, there is >> + * no point in rolling back, as roll back may also fail >> + */ >> + rc = xa_err(xa_store(xa, sb->arr_idx, >> + xa_mk_value(sb_idx), GFP_KERNEL)); >> + if (rc) { >> + dev_err(rvu->dev, >> + "Setting arr_idx=%d for sb=%d failed\n", >> + sb->arr_idx, sb_idx); >> + goto fail; >> + } >> + } >Does this error path leak subbank resources and leave the xarrays in a >corrupted state? >If xa_store() fails, the function jumps to the fail label and aborts >without restoring the original state. Because the entry at the current >priority index was already erased via xa_erase(), but the new subbank >failed to store, that subbank is permanently lost. >Additionally, the subbank that failed to store is left with a modified >arr_idx pointing to an empty slot, potentially breaking future allocations. >Could this function preallocate resources (such as using xa_reserve()) or >build temporary xarrays to ensure the changes can be applied atomically >without a destructive intermediate state? >> + restrict_valid = false; >> + >> + for (prio = 0; prio < cnt; prio++) { >> + sb_idx = narr[prio]; >> + sb = &npc_priv.sb[sb_idx]; >> + sb->arr_idx = prio; >> + >> + /* erase entry if exist */ >> + xa_erase(&npc_priv.xa_sb_free, sb->arr_idx); >> + xa_erase(&npc_priv.xa_sb_used, sb->arr_idx); >> + >> + xa = &npc_priv.xa_sb_free; >> + if (sb->flags & NPC_SUBBANK_FLAG_USED) >> + xa = &npc_priv.xa_sb_used; >> + >> + /* if xa_store() fails due to memory pressure, there is >> + * no point in rolling back, as roll back may also fail >> + */ >> + rc = xa_err(xa_store(xa, sb->arr_idx, >> + xa_mk_value(sb_idx), GFP_KERNEL)); >> + if (rc) { >> + dev_err(rvu->dev, >> + "Setting arr_idx=%d for sb=%d failed\n", >> + sb->arr_idx, sb_idx); >> + goto fail; >> + } >> + } >Does this error path leak subbank resources and leave the xarrays in a >corrupted state? >If xa_store() fails, the function jumps to the fail label and aborts >without restoring the original state. Because the entry at the current >priority index was already erased via xa_erase(), but the new subbank >failed to store, that subbank is permanently lost. >Additionally, the subbank that failed to store is left with a modified >arr_idx pointing to an empty slot, potentially breaking future allocations. >Could this function preallocate resources (such as using xa_reserve()) or >build temporary xarrays to ensure the changes can be applied atomically >without a destructive intermediate state? The omission of a transactional rollback path is intentional due to the systemic implications of a GFP_KERNEL allocation failure inside the xarray core. If xa_store() returns an ENOMEM error, the underlying slab allocator is already exhausted, meaning any programmatic attempt to erase or restore original indices would also fail, compounding the unpredictable state. Because this devlink callback operates under the top-level mcam->lock mutex, no concurrent allocation or translation worker can read the partially modified xarrays while the driver is in this degraded state. An xa_store() failure indicates a fatal, system-wide memory leak or exhaustion event. At this threshold, the driver prints a critical error payload to dmesg to allow deterministic post-mortem debugging of the host's global memory subsystem, rather than attempting an unreliable in-kernel rollback.