From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Wen Gu <guwen@linux.alibaba.com>,
Philo Lu <lulie@linux.alibaba.com>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>,
Dong Yibo <dong100@mucse.com>,
Mingyu Wang <25181214217@stu.xidian.edu.cn>,
Heiner Kallweit <hkallweit1@gmail.com>,
Dust Li <dust.li@linux.alibaba.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH net-next v43 5/8] eea: implement packet receive logic
Date: Mon, 18 May 2026 10:34:07 +0800 [thread overview]
Message-ID: <1779071647.763595-5-xuanzhuo@linux.alibaba.com> (raw)
In-Reply-To: <20260514095138.80680-6-xuanzhuo@linux.alibaba.com>
From sashiko:
commit 142959c3285196e5751c147c32a4a954a804d736
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
eea: implement packet receive logic
This commit implements the EEA RX path: page-pool buffer management,
per-descriptor metadata handling, packet assembly via build_skb()/
napi_alloc_skb(), and submission through napi_gro_receive(). It also
introduces a one-shot hardware-timestamp offset captured at probe time.
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> index cfb18a07e296..bf8581bc27ed 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> @@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
> return enet;
> }
>
> +static void eea_update_ts_off(struct eea_device *edev, struct eea_net *enet)
> +{
> + u64 ts;
> +
> + ts = eea_pci_device_ts(edev);
> +
> + enet->hw_ts_offset = ktime_get_real() - ts;
> +}
Is the choice of ktime_get_real() as the anchor for hardware RX
timestamps intentional?
Consumers of skb_hwtstamps()->hwtstamp typically expect a stable
monotonic timeline; tying the offset to CLOCK_REALTIME means any later
settimeofday(), NTP step, leap-second adjustment, or a probe that
happens before NTP sync will permanently bias every reported hwtstamp.
There is also no ptp_clock_register()/PHC and no periodic resync, so
the offset captured here is the one used forever.
Is the capture itself meant to be atomic with respect to scheduling?
ts = eea_pci_device_ts(edev);
enet->hw_ts_offset = ktime_get_real() - ts;
eea_pci_device_ts() performs an MMIO read across PCIe and is then
followed by ktime_get_real() with no preempt/IRQ disabling, so any
transit or preemption delay between the two reads is permanently
baked into hw_ts_offset for the lifetime of the device.
Does this code handle 48-bit wraparound of the device timestamp
counter?
EEA_DESC_TS(desc) masks desc->ts to 48 bits via
EEA_DESC_TS_MASK = GENMASK_ULL(47, 0), and hw_ts_offset is taken
exactly once here at probe/reprobe. After roughly 3.26 days of
device uptime (2^48 ns) the per-packet computation
EEA_DESC_TS(desc) + enet->hw_ts_offset will jump backwards by ~2^48
ns relative to wall time, with no wrap detector, no resync, and no
cyclecounter/timecounter wrapper.
There is also a width inconsistency between this capture and the
per-packet path:
drivers/net/ethernet/alibaba/eea/eea_pci.c:eea_pci_device_ts() {
return cfg_read64(ep_dev->reg, hw_ts);
}
returns the full 64-bit register, while EEA_DESC_TS(desc) masks the
descriptor field to 48 bits. If bits 48..63 of the hw_ts register
are ever non-zero, every reported hwtstamp would be off by at least
2^48 ns.
> @@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev)
> if (err)
> goto err_reset_dev;
>
> + eea_update_ts_off(edev, enet);
> +
> netdev_dbg(enet->netdev, "eea probe success.\n");
>
> /* Queue TX/RX implementation is still in progress. register_netdev is
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> index b1265048fbc3..9c71a7cf950e 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
[ ... ]
> +static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
> +{
> + struct eea_net *enet = rx->enet;
> + struct sk_buff *skb = NULL;
> +
> + if (eea_harden_check_size(ctx, enet))
> + goto err;
> +
> + rx->pkt.data_valid = ctx->flags & EEA_DESC_F_DATA_VALID;
Is the EEA_DESC_F_DATA_VALID bit (the source of CHECKSUM_UNNECESSARY)
meant to be sampled only on the first descriptor of a multi-descriptor
packet?
process_first_buf() is the only place rx->pkt.data_valid is assigned,
and process_remain_buf() does not touch it. Meanwhile in
eea_submit_skb() the timestamp comes from the EOP descriptor:
if (rx->pkt.data_valid)
skb->ip_summed = CHECKSUM_UNNECESSARY;
if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
enet->hw_ts_offset;
If the device places per-packet metadata on the EOP descriptor (a
common NIC layout), every multi-segment frame would silently fall
back to CHECKSUM_NONE. Could the descriptor used for the checksum
flag and for the timestamp be documented or made consistent?
> +
> + if (ctx->hdr_len)
> + skb = eea_rx_build_split_hdr_skb(rx, ctx);
> + else
> + skb = eea_rx_build_skb(rx, ctx);
> +
> + if (unlikely(!skb))
> + goto err;
> +
> + rx->pkt.head_skb = skb;
> +
> + return;
> +
> +err:
> + rx->pkt.do_drop = true;
> +}
> +
> +static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb,
> + struct eea_rx_cdesc *desc)
> +{
> + struct eea_net *enet = rx->enet;
> +
> + if (rx->pkt.data_valid)
> + skb->ip_summed = CHECKSUM_UNNECESSARY;
> +
> + if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
> + skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
> + enet->hw_ts_offset;
Is the strict equality with HWTSTAMP_FILTER_ALL intentional?
The usual driver convention for SIOCSHWTSTAMP is to either reject
unsupported filters or upgrade them to ALL, so that consumers
requesting more specific filters (PTP_V2_*, NTP_ALL, etc.) still
get timestamps on matching frames. With the strict comparison
above, anything other than HWTSTAMP_FILTER_ALL yields no timestamps.
There is no SIOCSHWTSTAMP setter in this series, so the branch is
effectively unreachable today, but it may be worth revisiting
together with whatever setter lands later.
> +
> + skb_record_rx_queue(skb, rx->index);
> + skb->protocol = eth_type_trans(skb, enet->netdev);
> +
> + napi_gro_receive(rx->napi, skb);
> +}
[ ... ]
> +static int eea_cleanrx(struct eea_net_rx *rx, int budget,
> + struct eea_rx_ctx *ctx)
> +{
> + struct eea_rx_cdesc *desc;
> + struct eea_rx_meta *meta;
> + int recv, err;
> +
> + for (recv = 0; recv < budget; ) {
> + desc = eea_ering_cq_get_desc(rx->ering);
> + if (!desc)
> + break;
> +
> + err = eea_rx_desc_to_ctx(rx, ctx, desc);
> + if (unlikely(err)) {
> + if (ctx->meta)
> + eea_rx_meta_put(rx, ctx->meta);
> +
> + if (rx->pkt.head_skb)
> + dev_kfree_skb(rx->pkt.head_skb);
> +
> + /* A hardware error occurred; we are attempting to
> + * mitigate the impact. Subsequent packets may be
> + * corrupted.
> + */
> + ctx->more = false;
> + goto ack;
> + }
Can this recovery path forward arbitrary mid-payload bytes to the
network stack as a fresh L2 frame?
When eea_rx_desc_to_ctx() fails mid-stream (invalid id, or len
greater than meta->len), this branch frees rx->pkt.head_skb, forces
ctx->more = false, and jumps to ack. The tail of the loop then
runs:
if (!ctx->more) {
memset(&rx->pkt, 0, sizeof(rx->pkt));
++recv;
}
so no "still inside a device-side packet" indicator survives. The
next descriptor from the device — which may be a continuation
buffer of the original frame — will be handled by process_first_buf()
because rx->pkt.idx == 0. If that continuation satisfies
ctx->len >= ETH_HLEN, an skb is built from raw mid-payload bytes
and submitted via eth_type_trans()/napi_gro_receive().
Would setting rx->pkt.do_drop = true (and only clearing rx->pkt on
the next observed !more) be a safer way to mitigate, given the
existing comment already notes "subsequent packets may be
corrupted"?
> +
> + meta = ctx->meta;
> +
> + if (unlikely(rx->pkt.do_drop))
> + goto skip;
> +
> + eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len);
> +
> + rx->pkt.recv_len += ctx->len;
> + rx->pkt.recv_len += ctx->hdr_len;
> +
> + if (!rx->pkt.idx)
> + process_first_buf(rx, ctx);
> + else
> + process_remain_buf(rx, ctx);
> +
> + ++rx->pkt.idx;
> +
> + if (!ctx->more && rx->pkt.head_skb)
> + eea_submit_skb(rx, rx->pkt.head_skb, desc);
> +
> +skip:
> + eea_rx_meta_put(rx, meta);
> +ack:
> + eea_ering_cq_ack_desc(rx->ering, 1);
> +
> + if (!ctx->more) {
> + memset(&rx->pkt, 0, sizeof(rx->pkt));
> + ++recv;
> + }
> + }
> +
> + return recv;
> +}
next prev parent reply other threads:[~2026-05-18 2:40 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
2026-05-18 3:06 ` Xuan Zhuo
2026-05-18 3:07 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures Xuan Zhuo
2026-05-18 2:57 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq Xuan Zhuo
2026-05-18 1:41 ` Xuan Zhuo
2026-05-18 1:41 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop Xuan Zhuo
2026-05-18 1:18 ` Xuan Zhuo
2026-05-18 1:24 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 5/8] eea: implement packet receive logic Xuan Zhuo
2026-05-18 2:34 ` Xuan Zhuo [this message]
2026-05-18 2:35 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 6/8] eea: implement packet transmit logic Xuan Zhuo
2026-05-18 2:47 ` Xuan Zhuo
2026-05-18 2:48 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 7/8] eea: introduce ethtool support Xuan Zhuo
2026-05-18 2:56 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev Xuan Zhuo
2026-05-18 3:01 ` Xuan Zhuo
2026-05-15 6:40 ` [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
2026-05-19 10:33 ` Paolo Abeni
2026-05-19 10:40 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1779071647.763595-5-xuanzhuo@linux.alibaba.com \
--to=xuanzhuo@linux.alibaba.com \
--cc=25181214217@stu.xidian.edu.cn \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dong100@mucse.com \
--cc=dust.li@linux.alibaba.com \
--cc=edumazet@google.com \
--cc=guwen@linux.alibaba.com \
--cc=hkallweit1@gmail.com \
--cc=kuba@kernel.org \
--cc=lulie@linux.alibaba.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=vadim.fedorenko@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox