From: Jakub Kicinski <kuba@kernel.org>
To: sukhdeeps@marvell.com
Cc: Jakub Kicinski <kuba@kernel.org>,
netdev@vger.kernel.org, andrew+netdev@lunn.ch,
davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
linux-kernel@vger.kernel.org, horms@kernel.org,
vadim.fedorenko@linux.dev
Subject: Re: [PATCH net-next v4 7/12] net: atlantic: implement AQC113 L2/L3/L4 RX filter ops
Date: Thu, 4 Jun 2026 19:38:33 -0700 [thread overview]
Message-ID: <20260605023833.3629131-1-kuba@kernel.org> (raw)
In-Reply-To: <20260602135452.516-8-sukhdeeps@marvell.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: atlantic: implement AQC113 L2/L3/L4 RX filter ops
This patch adds tag-based L2/L3/L4 RX filter management for AQC113,
including reference-counted L3 and L4 filter slots, an ART-based L3L4
configuration helper, and L2 ethertype set/clear, and wires the new
ops into hw_atl2_ops.
[Low]
The commit message describes hw_atl2_filter_tag_get/put as:
Add tag-based filter policy with reference-counted sharing
(hw_atl2_filter_tag_get/put), allowing multiple filter rules to share
the same L3 or L4 hardware filter when their match criteria are
identical.
In the patch, hw_atl2_filter_tag_get/put operate only on
priv->etype_policy[] and are called only from hw_atl2_hw_fl2_set/clear.
The L3 and L4 sharing is implemented separately via the usage field on
hw_atl2_l3_filter / hw_atl2_l4_filter and hw_atl2_rxf_l3_get/put /
hw_atl2_rxf_l4_get/put.
Could the description be reworded so it does not conflate the etype tag
policy with the L3/L4 filter refcounts?
> diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c
> index 27a62fffbc29..a07a3dbddefe 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl2/hw_atl2.c
> @@ -87,6 +87,38 @@ const struct aq_hw_caps_s hw_atl2_caps_aqc116c = {
> AQ_NIC_RATE_10M,
> };
>
> +/* Find tag with the same action or new free tag
> + * top - top inclusive tag value
> + * action - action for ActionResolverTable
> + */
> +static int hw_atl2_filter_tag_get(struct hw_atl2_tag_policy *tags,
> + int top, u16 action)
> +{
> + int i;
> +
> + for (i = 1; i <= top; i++)
> + if (tags[i].usage > 0 && tags[i].action == action) {
> + tags[i].usage++;
> + return i;
> + }
> +
> + for (i = 1; i <= top; i++)
> + if (tags[i].usage == 0) {
> + tags[i].usage = 1;
> + tags[i].action = action;
> + return i;
> + }
> +
> + return -1;
[Low]
Sibling helpers introduced in this patch (hw_atl2_new_fl3l4_find_l3 /
_find_l4) return -ENOSPC on the no-free-slot path, but this helper
returns the bare value -1.
The current caller hw_atl2_hw_fl2_set() translates it into -ENOSPC
manually, but if a future caller propagates the value unchanged it
becomes -EPERM. Would returning -ENOSPC here directly be more
consistent?
> +}
[ ... ]
> @@ -823,6 +855,489 @@ static struct aq_stats_s *hw_atl2_utils_get_hw_stats(struct aq_hw_s *self)
> return &self->curr_stats;
> }
>
> +static bool hw_atl2_rxf_l3_is_equal(struct hw_atl2_l3_filter *f1,
> + struct hw_atl2_l3_filter *f2)
> +{
> + if (f1->cmd != f2->cmd)
> + return false;
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_CMD_SA_EN)
> + if (f1->srcip[0] != f2->srcip[0])
> + return false;
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_CMD_DA_EN)
> + if (f1->dstip[0] != f2->dstip[0])
> + return false;
> +
> + if (f1->cmd & (HW_ATL2_RPF_L3_CMD_PROTO_EN |
> + HW_ATL2_RPF_L3_V6_CMD_PROTO_EN))
> + if (f1->proto != f2->proto)
> + return false;
[Low]
Where is l3.proto ever populated? In hw_atl2_new_fl3l4_configure() the
local l3 is memset to 0 and the protocol value is encoded into the upper
bits of l3.cmd via:
l3.cmd |= IPPROTO_TCP << (data->is_ipv6 ? 0x18 : 8);
hw_atl2_rxf_l3_get() then copies _l3->proto (=0) into the tracked slot,
so this proto comparison appears to be 0 == 0 in all cases. The actual
protocol comparison happens implicitly via the f1->cmd != f2->cmd check
above.
Should l3.proto be removed, or populated consistently from the cmd
encoding, so future code that begins setting one but not the other does
not silently produce false negatives in the sharing check?
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_V6_CMD_SA_EN)
> + if (memcmp(f1->srcip, f2->srcip, 16))
> + return false;
> +
> + if (f1->cmd & HW_ATL2_RPF_L3_V6_CMD_DA_EN)
> + if (memcmp(f1->dstip, f2->dstip, 16))
> + return false;
> +
> + return true;
> +}
[ ... ]
> +static void hw_atl2_rxf_l3_get(struct aq_hw_s *self,
> + struct hw_atl2_l3_filter *l3, int idx,
> + const struct hw_atl2_l3_filter *_l3)
> +{
> + int i;
> +
> + l3->usage++;
> + if (l3->usage == 1) {
> + l3->cmd = _l3->cmd;
> + for (i = 0; i < 4; i++) {
> + l3->srcip[i] = _l3->srcip[i];
> + l3->dstip[i] = _l3->dstip[i];
> + }
> + l3->proto = _l3->proto;
> +
> + if (l3->cmd & HW_ATL2_RPF_L3_CMD_EN) {
> + hw_atl2_rpf_l3_v4_cmd_set(self, l3->cmd, idx);
> + hw_atl2_rpf_l3_v4_tag_set(self, idx + 1, idx);
> + hw_atl2_rpf_l3_v4_dest_addr_set(self,
> + idx,
> + l3->dstip[0]);
> + hw_atl2_rpf_l3_v4_src_addr_set(self,
> + idx,
> + l3->srcip[0]);
> + } else {
> + hw_atl2_rpf_l3_v6_cmd_set(self, l3->cmd, idx);
> + hw_atl2_rpf_l3_v6_tag_set(self, idx + 1, idx);
> + hw_atl2_rpf_l3_v6_dest_addr_set(self,
> + idx,
> + l3->dstip);
> + hw_atl2_rpf_l3_v6_src_addr_set(self,
> + idx,
> + l3->srcip);
> + }
> + }
> +}
[Medium]
Is the ordering of the MMIO writes here intentional?
The cmd word containing the EN/SA_EN/DA_EN/PROTO_EN bits is committed
first via hw_atl2_rpf_l3_v4_cmd_set() (the v4 mask is 0xFFFF and
includes the EN bit), and only afterwards are the destination and
source address registers programmed.
Between the cmd write and the address writes, the filter is enabled
while its DA/SA registers still hold whatever value they had before
(stale, zero, or a previously programmed filter's addresses).
The IPv6 path is worse because hw_atl2_rpf_l3_v6_dest_addr_set() and
_src_addr_set() each issue four separate register writes (one per
dword), so the filter is enabled with a partially-programmed address
for several MMIO cycles.
The same pattern is in hw_atl2_rxf_l4_get() below. Would it be safer to
program the addresses (or ports) and tag first and write the cmd word
with EN last?
[ ... ]
> +static void hw_atl2_rxf_l4_get(struct aq_hw_s *self,
> + struct hw_atl2_l4_filter *l4, int idx,
> + const struct hw_atl2_l4_filter *_l4)
> +{
> + l4->usage++;
> + if (l4->usage == 1) {
> + l4->cmd = _l4->cmd;
> + l4->sport = _l4->sport;
> + l4->dport = _l4->dport;
> +
> + hw_atl2_rpf_l4_cmd_set(self, l4->cmd, idx);
> + hw_atl2_rpf_l4_tag_set(self, idx + 1, idx);
> + hw_atl_rpf_l4_spd_set(self, l4->sport, idx);
> + hw_atl_rpf_l4_dpd_set(self, l4->dport, idx);
> + }
> +}
> +
> +static int hw_atl2_new_fl3l4_configure(struct aq_hw_s *self,
> + struct aq_rx_filter_l3l4 *data)
> +{
[ ... ]
> + /* convert legacy filter to new */
> + if (data->cmd & HW_ATL_RX_ENABLE_CMP_PROT_L4) {
> + l3.cmd |= data->is_ipv6 ? HW_ATL2_RPF_L3_V6_CMD_PROTO_EN :
> + HW_ATL2_RPF_L3_CMD_PROTO_EN;
> + l3.cmd |= data->is_ipv6 ? HW_ATL2_RPF_L3_V6_CMD_EN :
> + HW_ATL2_RPF_L3_CMD_EN;
> + switch (data->cmd & 0x7) {
> + case HW_ATL_RX_TCP:
> + l3.cmd |= IPPROTO_TCP << (data->is_ipv6 ? 0x18 : 8);
> + break;
> + case HW_ATL_RX_UDP:
> + l3.cmd |= IPPROTO_UDP << (data->is_ipv6 ? 0x18 : 8);
> + break;
> + case HW_ATL_RX_SCTP:
> + l3.cmd |= IPPROTO_SCTP << (data->is_ipv6 ? 0x18 : 8);
> + break;
> + case HW_ATL_RX_ICMP:
> + l3.cmd |= IPPROTO_ICMP << (data->is_ipv6 ? 0x18 : 8);
> + break;
[Low]
For the IPv6 branch of HW_ATL_RX_ICMP, should the next-header value be
IPPROTO_ICMPV6 (58) rather than IPPROTO_ICMP (1)? Real ICMPv6 traffic
uses next-header 58, so this branch will not match ICMPv6 packets.
The aq_set_data_fl3l4() helper in aq_filters.c does not currently set
HW_ATL_RX_ICMP for any flow type, so the case is not reachable from the
in-tree ethtool ntuple path today, but it is written explicitly here.
[Low]
The shift values 0x18 and 8 are repeated four times here, while the
matching enable bits are named constants (HW_ATL2_RPF_L3_CMD_PROTO_EN
and HW_ATL2_RPF_L3_V6_CMD_PROTO_EN). Would a pair of named shift
macros (for example HW_ATL2_RPF_L3_CMD_PROTO_SHIFT /
HW_ATL2_RPF_L3_V6_CMD_PROTO_SHIFT) make the relationship between the
enable bit and the protocol byte position clearer and harder to
mis-edit?
> + }
> + }
[ ... ]
> + /* find L3 and L4 filters */
> + if (l3.cmd & (HW_ATL2_RPF_L3_CMD_EN | HW_ATL2_RPF_L3_V6_CMD_EN)) {
> + l3_idx = hw_atl2_new_fl3l4_find_l3(self, &l3);
> + if (l3_idx < 0)
> + return l3_idx;
> +
> + if (l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN)
> + l3_filters = priv->l3_v6_filters;
> + else
> + l3_filters = priv->l3_v4_filters;
> +
> + if (priv->l3l4_filters[data->location].l3_index != l3_idx ||
> + old_ipv6 != !!(l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN))
> + hw_atl2_rxf_l3_get(self, &l3_filters[l3_idx],
> + l3_idx, &l3);
> + }
> +
> + if (old_l3_index != -1) {
> + if (old_ipv6)
> + l3_filters = priv->l3_v6_filters;
> + else
> + l3_filters = priv->l3_v4_filters;
> +
> + if (!(hw_atl2_rxf_l3_is_equal(&l3,
> + &l3_filters[old_l3_index]))) {
> + hw_atl2_rxf_l3_put(self,
> + &l3_filters[old_l3_index],
> + old_l3_index);
> + }
> + }
> + if (l3.cmd & HW_ATL2_RPF_L3_V6_CMD_EN)
> + priv->l3l4_filters[data->location].ipv6 = 1;
> + else
> + priv->l3l4_filters[data->location].ipv6 = 0;
> + priv->l3l4_filters[data->location].l3_index = l3_idx;
> +
> + if (l4.cmd & HW_ATL2_RPF_L4_CMD_EN) {
> + l4_idx = hw_atl2_new_fl3l4_find_l4(self, &l4);
> + if (l4_idx < 0)
> + return -EINVAL;
[High, Low]
By the time this L4 search runs, hw_atl2_new_fl3l4_configure() has
already mutated state for the new request:
- hw_atl2_rxf_l3_get() programmed the new L3 hardware filter and
bumped its refcount,
- the old L3 filter has typically been released via
hw_atl2_rxf_l3_put() (clearing its HW slot if it was the last
user, since match criteria differ), and
- priv->l3l4_filters[location].l3_index and .ipv6 have been
overwritten to point at the new L3 slot.
When this branch returns, hw_atl2_hw_fl3l4_set() simply propagates
the error and skips the ART write, so the device is left with the new
L3 filter enabled (refcount 1) but no ART entry referencing its tag,
the old ART entry still pointing at the now-cleared old L3 tag, and
the user's previously-installed working rule effectively destroyed.
Userspace receives an error and will not call the disable path, so
the new L3 slot stays leaked. With only HW_ATL2_RPF_L3L4_FILTERS
slots per family, repeated failures rapidly exhaust the pool.
Could this path roll back the L3-side state (and refcounts) before
returning?
As a separate question, the L3 path uses:
if (l3_idx < 0)
return l3_idx;
while the L4 path overwrites the more specific value with -EINVAL:
if (l4_idx < 0)
return -EINVAL;
Should the L4 path also propagate l4_idx so userspace can distinguish
-ENOSPC from -EINVAL?
> +
> + if (priv->l3l4_filters[data->location].l4_index != l4_idx)
> + hw_atl2_rxf_l4_get(self, &priv->l4_filters[l4_idx],
> + l4_idx, &l4);
> + }
[ ... ]
> +static int hw_atl2_hw_fl2_set(struct aq_hw_s *self,
> + struct aq_rx_filter_l2 *data)
> +{
[ ... ]
> + tag = hw_atl2_filter_tag_get(priv->etype_policy,
> + priv->etype_filter_tag_top,
> + action);
> +
> + if (tag < 0)
> + return -ENOSPC;
> +
> + req_tag |= tag << HW_ATL2_RPF_TAG_ET_OFFSET;
> + hw_atl2_rpf_etht_flr_tag_set(self, tag, location);
> + index = priv->art_base_index + HW_ATL2_RPF_ET_PCP_USER_INDEX +
> + data->location;
> + hw_atl2_act_rslvr_table_set(self, index, req_tag, mask, action);
> +
> + hw_atl_rpf_etht_flr_en_set(self, 1U, location);
> +
> + return aq_hw_err_from_flags(self);
> +}
[Medium]
Once hw_atl2_filter_tag_get() succeeds, the etype_policy slot's usage
count has been incremented. After that point this function still has
two ways to return an error without releasing that tag:
- hw_atl2_act_rslvr_table_set() may return an error (semaphore poll
timeout) — its return value is ignored here, but
- aq_hw_err_from_flags(self) at the end can return -ENXIO or -EIO if
AQ_HW_FLAG_ERR_UNPLUG / AQ_HW_FLAG_ERR_HW have been set during the
register writes above.
Because HW_ATL2_RPF_ETYPE_TAGS is small (7), repeated transient HW
errors would gradually permanently exhaust the etype tag pool until
no L2 filter can be installed. Should a failure path here call
hw_atl2_filter_tag_put() before returning?
next prev parent reply other threads:[~2026-06-05 2:38 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-02 13:54 [PATCH net-next v4 0/12] net: atlantic: add PTP support for AQC113 (Antigua) sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 1/12] net: atlantic: correct L3L4 filter flow_type masking and IPv6 handling sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 2/12] net: atlantic: move active_ipv4/ipv6 bitmap updates after HW write sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 3/12] net: atlantic: decouple aq_set_data_fl3l4() from driver internals sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 4/12] net: atlantic: add AQC113 hardware register definitions and accessors sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 5/12] net: atlantic: add AQC113 filter data structures and firmware query sukhdeeps
2026-06-05 2:38 ` Jakub Kicinski
2026-06-02 13:54 ` [PATCH net-next v4 6/12] net: atlantic: fix AQC113 HW init: ART, L2 filter slot, MAC address sukhdeeps
2026-06-05 2:38 ` Jakub Kicinski
2026-06-02 13:54 ` [PATCH net-next v4 7/12] net: atlantic: implement AQC113 L2/L3/L4 RX filter ops sukhdeeps
2026-06-05 2:38 ` Jakub Kicinski [this message]
2026-06-02 13:54 ` [PATCH net-next v4 8/12] net: atlantic: add AQC113 PTP traffic class and TX path setup sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 9/12] net: atlantic: extend hw_ops and TX descriptor for AQC113 PTP sukhdeeps
2026-06-02 13:54 ` [PATCH net-next v4 10/12] net: atlantic: add AQC113 PTP hardware ops in hw_atl2 sukhdeeps
2026-06-05 2:38 ` Jakub Kicinski
2026-06-02 13:54 ` [PATCH net-next v4 11/12] net: atlantic: add AQC113 TX timestamp polling and PTP TX classification sukhdeeps
2026-06-05 2:38 ` Jakub Kicinski
2026-06-02 13:54 ` [PATCH net-next v4 12/12] net: atlantic: add AQC113 PTP support in aq_ptp and driver core sukhdeeps
2026-06-05 2:38 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260605023833.3629131-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sukhdeeps@marvell.com \
--cc=vadim.fedorenko@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox