From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF82E3BB677; Fri, 5 Jun 2026 02:38:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780627127; cv=none; b=DyPb98y+6kv60UfQifKYcbjyx1TrTaNqu4zAxrVImrNYXizY9ZfP+XMoQYANAEa+kh6KeJdw2ERGdyegrNUg+WYY5XJACUxndSO71hdTA/P9oOcRXCQrAX8Chr9Iitj9Xtk1+OaxRQwOvp12H09PUelfrX5MOp0LsNFGkxrw3tA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780627127; c=relaxed/simple; bh=LtCEhE69OZJMIcZijTbLlsQmWABBDJJa4xvGMBjZVwE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ehs9zpDDQVoNA5Anv3nlO1UURiiPt59qH0npwxe3rTEyfPivHETaIMfEVaCSJ383dyU/6TeaKSVbuTc142UHhVqiN64TCKmOAOvbMNLokBovdxTpm5ByShFNopFzVnV4sV62b8SlgCCWRBKxU2XsTvIxp3f64NtpKQ4PTBwVi8M= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=eNAAwSOY; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="eNAAwSOY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2B911F00899; Fri, 5 Jun 2026 02:38:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780627117; bh=6QbMMTu+jWsY7A9wTECEYRZhP1wOEcjM9sT+lCNHMvQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=eNAAwSOYmLUFwVpM0NAJF+gh8Gb4hpQd407JIu/5Pps4+lcubloiSNrqcnBQpugBv jAxSN2OM1kUO4K1ME1IvMw6edMYFBJIKQ3rCXaiL0AzdGl6pg9T1xcCK6fgBx9cdBP Na+pzk+ngKsHB027nd6jEBVu74ZRCRMqa2qPFvQ7n7CipQvjslJCULYZ7IKq1p/kIk QsIA7g7B+96iz7/CTKAHRZrkEQqQppca7RPvF2Fz4uThi/Lf/xgK/WRC1OnFpCvItz A2RtdC4olCLS7yNTUFsEQJ1J77Cf7mFf8W3mpDOTr1+dcH5CYmi3wXrAZRhciSkl6o IQDtCtrqZ/gUw== From: Jakub Kicinski To: sukhdeeps@marvell.com Cc: Jakub Kicinski , netdev@vger.kernel.org, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, linux-kernel@vger.kernel.org, horms@kernel.org, vadim.fedorenko@linux.dev Subject: Re: [PATCH net-next v4 7/12] net: atlantic: implement AQC113 L2/L3/L4 RX filter ops Date: Thu, 4 Jun 2026 19:38:33 -0700 Message-ID: <20260605023833.3629131-1-kuba@kernel.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260602135452.516-8-sukhdeeps@marvell.com> References: <20260602135452.516-8-sukhdeeps@marvell.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is an AI-generated review of your patch. The human sending this email has considered the AI review valid, or at least plausible. --- net: atlantic: implement AQC113 L2/L3/L4 RX filter ops This patch adds tag-based L2/L3/L4 RX filter management for AQC113, including reference-counted L3 and L4 filter slots, an ART-based L3L4 configuration helper, and L2 ethertype set/clear, and wires the new ops into hw_atl2_ops. [Low] The commit message describes hw_atl2_filter_tag_get/put as: Add tag-based filter policy with reference-counted sharing (hw_atl2_filter_tag_get/put), allowing multiple filter rules to share the same L3 or L4 hardware filter when their match criteria are identical. In the patch, hw_atl2_filter_tag_get/put operate only on priv->etype_policy[] and are called only from hw_atl2_hw_fl2_set/clear. The L3 and L4 sharing is implemented separately via the usage field on hw_atl2_l3_filter / hw_atl2_l4_filter and hw_atl2_rxf_l3_get/put / hw_atl2_rxf_l4_get/put. Could the description be reworded so it does not conflate the etype tag policy with the L3/L4 filter refcounts? > diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c > index 27a62fffbc29..a07a3dbddefe 100644 > --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c > +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c > @@ -87,6 +87,38 @@ const struct aq_hw_caps_s hw_atl2_caps_aqc116c = { > AQ_NIC_RATE_10M, > }; > > +/* Find tag with the same action or new free tag > + * top - top inclusive tag value > + * action - action for ActionResolverTable > + */ > +static int hw_atl2_filter_tag_get(struct hw_atl2_tag_policy *tags, > + int top, u16 action) > +{ > + int i; > + > + for (i = 1; i <= top; i++) > + if (tags[i].usage > 0 && tags[i].action == action) { > + tags[i].usage++; > + return i; > + } > + > + for (i = 1; i <= top; i++) > + if (tags[i].usage == 0) { > + tags[i].usage = 1; > + tags[i].action = action; > + return i; > + } > + > + return -1; [Low] Sibling helpers introduced in this patch (hw_atl2_new_fl3l4_find_l3 / _find_l4) return -ENOSPC on the no-free-slot path, but this helper returns the bare value -1. The current caller hw_atl2_hw_fl2_set() translates it into -ENOSPC manually, but if a future caller propagates the value unchanged it becomes -EPERM. Would returning -ENOSPC here directly be more consistent? > +} [ ... ] > @@ -823,6 +855,489 @@ static struct aq_stats_s *hw_atl2_utils_get_hw_stats(struct aq_hw_s *self) > return &self->curr_stats; > } > > +static bool hw_atl2_rxf_l3_is_equal(struct hw_atl2_l3_filter *f1, > + struct hw_atl2_l3_filter *f2) > +{ > + if (f1->cmd != f2->cmd) > + return false; > + > + if (f1->cmd & HW_ATL2_RPF_L3_CMD_SA_EN) > + if (f1->srcip[0] != f2->srcip[0]) > + return false; > + > + if (f1->cmd & HW_ATL2_RPF_L3_CMD_DA_EN) > + if (f1->dstip[0] != f2->dstip[0]) > + return false; > + > + if (f1->cmd & (HW_ATL2_RPF_L3_CMD_PROTO_EN | > + HW_ATL2_RPF_L3_V6_CMD_PROTO_EN)) > + if (f1->proto != f2->proto) > + return false; [Low] Where is l3.proto ever populated? In hw_atl2_new_fl3l4_configure() the local l3 is memset to 0 and the protocol value is encoded into the upper bits of l3.cmd via: l3.cmd |= IPPROTO_TCP << (data->is_ipv6 ? 0x18 : 8); hw_atl2_rxf_l3_get() then copies _l3->proto (=0) into the tracked slot, so this proto comparison appears to be 0 == 0 in all cases. The actual protocol comparison happens implicitly via the f1->cmd != f2->cmd check above. Should l3.proto be removed, or populated consistently from the cmd encoding, so future code that begins setting one but not the other does not silently produce false negatives in the sharing check? > + > + if (f1->cmd & HW_ATL2_RPF_L3_V6_CMD_SA_EN) > + if (memcmp(f1->srcip, f2->srcip, 16)) > + return false; > + > + if (f1->cmd & HW_ATL2_RPF_L3_V6_CMD_DA_EN) > + if (memcmp(f1->dstip, f2->dstip, 16)) > + return false; > + > + return true; > +} [ ... ] > +static void hw_atl2_rxf_l3_get(struct aq_hw_s *self, > + struct hw_atl2_l3_filter *l3, int idx, > + const struct hw_atl2_l3_filter *_l3) > +{ > + int i; > + > + l3->usage++; > + if (l3->usage == 1) { > + l3->cmd = _l3->cmd; > + for (i = 0; i < 4; i++) { > + l3->srcip[i] = _l3->srcip[i]; > + l3->dstip[i] = _l3->dstip[i]; > + } > + l3->proto = _l3->proto; > + > + if (l3->cmd & HW_ATL2_RPF_L3_CMD_EN) { > + hw_atl2_rpf_l3_v4_cmd_set(self, l3->cmd, idx); > + hw_atl2_rpf_l3_v4_tag_set(self, idx + 1, idx); > + hw_atl2_rpf_l3_v4_dest_addr_set(self, > + idx, > + l3->dstip[0]); > + hw_atl2_rpf_l3_v4_src_addr_set(self, > + idx, > + l3->srcip[0]); > + } else { > + hw_atl2_rpf_l3_v6_cmd_set(self, l3->cmd, idx); > + hw_atl2_rpf_l3_v6_tag_set(self, idx + 1, idx); > + hw_atl2_rpf_l3_v6_dest_addr_set(self, > + idx, > + l3->dstip); > + hw_atl2_rpf_l3_v6_src_addr_set(self, > + idx, > + l3->srcip); > + } > + } > +} [Medium] Is the ordering of the MMIO writes here intentional? The cmd word containing the EN/SA_EN/DA_EN/PROTO_EN bits is committed first via hw_atl2_rpf_l3_v4_cmd_set() (the v4 mask is 0xFFFF and includes the EN bit), and only afterwards are the destination and source address registers programmed. Between the cmd write and the address writes, the filter is enabled while its DA/SA registers still hold whatever value they had before (stale, zero, or a previously programmed filter's addresses). The IPv6 path is worse because hw_atl2_rpf_l3_v6_dest_addr_set() and _src_addr_set() each issue four separate register writes (one per dword), so the filter is enabled with a partially-programmed address for several MMIO cycles. The same pattern is in hw_atl2_rxf_l4_get() below. Would it be safer to program the addresses (or ports) and tag first and write the cmd word with EN last? [ ... ] > +static void hw_atl2_rxf_l4_get(struct aq_hw_s *self, > + struct hw_atl2_l4_filter *l4, int idx, > + const struct hw_atl2_l4_filter *_l4) > +{ > + l4->usage++; > + if (l4->usage == 1) { > + l4->cmd = _l4->cmd; > + l4->sport = _l4->sport; > + l4->dport = _l4->dport; > + > + hw_atl2_rpf_l4_cmd_set(self, l4->cmd, idx); > + hw_atl2_rpf_l4_tag_set(self, idx + 1, idx); > + hw_atl_rpf_l4_spd_set(self, l4->sport, idx); > + hw_atl_rpf_l4_dpd_set(self, l4->dport, idx); > + } > +} > + > +static int hw_atl2_new_fl3l4_configure(struct aq_hw_s *self, > + struct aq_rx_filter_l3l4 *data) > +{ [ ... ] > + /* convert legacy filter to new */ > + if (data->cmd & HW_ATL_RX_ENABLE_CMP_PROT_L4) { > + l3.cmd |= data->is_ipv6 ? HW_ATL2_RPF_L3_V6_CMD_PROTO_EN : > + HW_ATL2_RPF_L3_CMD_PROTO_EN; > + l3.cmd |= data->is_ipv6 ? HW_ATL2_RPF_L3_V6_CMD_EN : > + HW_ATL2_RPF_L3_CMD_EN; > + switch (data->cmd & 0x7) { > + case HW_ATL_RX_TCP: > + l3.cmd |= IPPROTO_TCP << (data->is_ipv6 ? 0x18 : 8); > + break; > + case HW_ATL_RX_UDP: > + l3.cmd |= IPPROTO_UDP << (data->is_ipv6 ? 0x18 : 8); > + break; > + case HW_ATL_RX_SCTP: > + l3.cmd |= IPPROTO_SCTP << (data->is_ipv6 ? 0x18 : 8); > + break; > + case HW_ATL_RX_ICMP: > + l3.cmd |= IPPROTO_ICMP << (data->is_ipv6 ? 0x18 : 8); > + break; [Low] For the IPv6 branch of HW_ATL_RX_ICMP, should the next-header value be IPPROTO_ICMPV6 (58) rather than IPPROTO_ICMP (1)? Real ICMPv6 traffic uses next-header 58, so this branch will not match ICMPv6 packets. The aq_set_data_fl3l4() helper in aq_filters.c does not currently set HW_ATL_RX_ICMP for any flow type, so the case is not reachable from the in-tree ethtool ntuple path today, but it is written explicitly here. [Low] The shift values 0x18 and 8 are repeated four times here, while the matching enable bits are named constants (HW_ATL2_RPF_L3_CMD_PROTO_EN and HW_ATL2_RPF_L3_V6_CMD_PROTO_EN). Would a pair of named shift macros (for example HW_ATL2_RPF_L3_CMD_PROTO_SHIFT / HW_ATL2_RPF_L3_V6_CMD_PROTO_SHIFT) make the relationship between the enable bit and the protocol byte position clearer and harder to mis-edit? > + } > + } [ ... ] > + /* find L3 and L4 filters */ > + if (l3.cmd & (HW_ATL2_RPF_L3_CMD_EN | HW_ATL2_RPF_L3_V6_CMD_EN)) { > + l3_idx = hw_atl2_new_fl3l4_find_l3(self, &l3); > + if (l3_idx < 0) > + return l3_idx; > + > + if (l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN) > + l3_filters = priv->l3_v6_filters; > + else > + l3_filters = priv->l3_v4_filters; > + > + if (priv->l3l4_filters[data->location].l3_index != l3_idx || > + old_ipv6 != !!(l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN)) > + hw_atl2_rxf_l3_get(self, &l3_filters[l3_idx], > + l3_idx, &l3); > + } > + > + if (old_l3_index != -1) { > + if (old_ipv6) > + l3_filters = priv->l3_v6_filters; > + else > + l3_filters = priv->l3_v4_filters; > + > + if (!(hw_atl2_rxf_l3_is_equal(&l3, > + &l3_filters[old_l3_index]))) { > + hw_atl2_rxf_l3_put(self, > + &l3_filters[old_l3_index], > + old_l3_index); > + } > + } > + if (l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN) > + priv->l3l4_filters[data->location].ipv6 = 1; > + else > + priv->l3l4_filters[data->location].ipv6 = 0; > + priv->l3l4_filters[data->location].l3_index = l3_idx; > + > + if (l4.cmd & HW_ATL2_RPF_L4_CMD_EN) { > + l4_idx = hw_atl2_new_fl3l4_find_l4(self, &l4); > + if (l4_idx < 0) > + return -EINVAL; [High, Low] By the time this L4 search runs, hw_atl2_new_fl3l4_configure() has already mutated state for the new request: - hw_atl2_rxf_l3_get() programmed the new L3 hardware filter and bumped its refcount, - the old L3 filter has typically been released via hw_atl2_rxf_l3_put() (clearing its HW slot if it was the last user, since match criteria differ), and - priv->l3l4_filters[location].l3_index and .ipv6 have been overwritten to point at the new L3 slot. When this branch returns, hw_atl2_hw_fl3l4_set() simply propagates the error and skips the ART write, so the device is left with the new L3 filter enabled (refcount 1) but no ART entry referencing its tag, the old ART entry still pointing at the now-cleared old L3 tag, and the user's previously-installed working rule effectively destroyed. Userspace receives an error and will not call the disable path, so the new L3 slot stays leaked. With only HW_ATL2_RPF_L3L4_FILTERS slots per family, repeated failures rapidly exhaust the pool. Could this path roll back the L3-side state (and refcounts) before returning? As a separate question, the L3 path uses: if (l3_idx < 0) return l3_idx; while the L4 path overwrites the more specific value with -EINVAL: if (l4_idx < 0) return -EINVAL; Should the L4 path also propagate l4_idx so userspace can distinguish -ENOSPC from -EINVAL? > + > + if (priv->l3l4_filters[data->location].l4_index != l4_idx) > + hw_atl2_rxf_l4_get(self, &priv->l4_filters[l4_idx], > + l4_idx, &l4); > + } [ ... ] > +static int hw_atl2_hw_fl2_set(struct aq_hw_s *self, > + struct aq_rx_filter_l2 *data) > +{ [ ... ] > + tag = hw_atl2_filter_tag_get(priv->etype_policy, > + priv->etype_filter_tag_top, > + action); > + > + if (tag < 0) > + return -ENOSPC; > + > + req_tag |= tag << HW_ATL2_RPF_TAG_ET_OFFSET; > + hw_atl2_rpf_etht_flr_tag_set(self, tag, location); > + index = priv->art_base_index + HW_ATL2_RPF_ET_PCP_USER_INDEX + > + data->location; > + hw_atl2_act_rslvr_table_set(self, index, req_tag, mask, action); > + > + hw_atl_rpf_etht_flr_en_set(self, 1U, location); > + > + return aq_hw_err_from_flags(self); > +} [Medium] Once hw_atl2_filter_tag_get() succeeds, the etype_policy slot's usage count has been incremented. After that point this function still has two ways to return an error without releasing that tag: - hw_atl2_act_rslvr_table_set() may return an error (semaphore poll timeout) — its return value is ignored here, but - aq_hw_err_from_flags(self) at the end can return -ENXIO or -EIO if AQ_HW_FLAG_ERR_UNPLUG / AQ_HW_FLAG_ERR_HW have been set during the register writes above. Because HW_ATL2_RPF_ETYPE_TAGS is small (7), repeated transient HW errors would gradually permanently exhaust the etype tag pool until no L2 filter can be installed. Should a failure path here call hw_atl2_filter_tag_put() before returning?