From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Wen Gu <guwen@linux.alibaba.com>,
Philo Lu <lulie@linux.alibaba.com>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>,
Dong Yibo <dong100@mucse.com>,
Mingyu Wang <25181214217@stu.xidian.edu.cn>,
Heiner Kallweit <hkallweit1@gmail.com>,
Dust Li <dust.li@linux.alibaba.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH net-next v43 5/8] eea: implement packet receive logic
Date: Mon, 18 May 2026 10:35:09 +0800 [thread overview]
Message-ID: <1779071709.526365-6-xuanzhuo@linux.alibaba.com> (raw)
In-Reply-To: <1779071647.763595-5-xuanzhuo@linux.alibaba.com>
On Mon, 18 May 2026 10:34:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> From sashiko:
>
>
> commit 142959c3285196e5751c147c32a4a954a804d736
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> eea: implement packet receive logic
> This commit implements the EEA RX path: page-pool buffer management,
> per-descriptor metadata handling, packet assembly via build_skb()/
> napi_alloc_skb(), and submission through napi_gro_receive(). It also
> introduces a one-shot hardware-timestamp offset captured at probe time.
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > index cfb18a07e296..bf8581bc27ed 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > @@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
> > return enet;
> > }
> >
> > +static void eea_update_ts_off(struct eea_device *edev, struct eea_net *enet)
> > +{
> > + u64 ts;
> > +
> > + ts = eea_pci_device_ts(edev);
> > +
> > + enet->hw_ts_offset = ktime_get_real() - ts;
> > +}
> Is the choice of ktime_get_real() as the anchor for hardware RX
> timestamps intentional?
> Consumers of skb_hwtstamps()->hwtstamp typically expect a stable
> monotonic timeline; tying the offset to CLOCK_REALTIME means any later
> settimeofday(), NTP step, leap-second adjustment, or a probe that
> happens before NTP sync will permanently bias every reported hwtstamp.
> There is also no ptp_clock_register()/PHC and no periodic resync, so
> the offset captured here is the one used forever.
> Is the capture itself meant to be atomic with respect to scheduling?
> ts = eea_pci_device_ts(edev);
> enet->hw_ts_offset = ktime_get_real() - ts;
> eea_pci_device_ts() performs an MMIO read across PCIe and is then
> followed by ktime_get_real() with no preempt/IRQ disabling, so any
> transit or preemption delay between the two reads is permanently
> baked into hw_ts_offset for the lifetime of the device.
> Does this code handle 48-bit wraparound of the device timestamp
> counter?
> EEA_DESC_TS(desc) masks desc->ts to 48 bits via
> EEA_DESC_TS_MASK = GENMASK_ULL(47, 0), and hw_ts_offset is taken
> exactly once here at probe/reprobe. After roughly 3.26 days of
> device uptime (2^48 ns) the per-packet computation
> EEA_DESC_TS(desc) + enet->hw_ts_offset will jump backwards by ~2^48
> ns relative to wall time, with no wrap detector, no resync, and no
> cyclecounter/timecounter wrapper.
> There is also a width inconsistency between this capture and the
> per-packet path:
> drivers/net/ethernet/alibaba/eea/eea_pci.c:eea_pci_device_ts() {
> return cfg_read64(ep_dev->reg, hw_ts);
> }
> returns the full 64-bit register, while EEA_DESC_TS(desc) masks the
> descriptor field to 48 bits. If bits 48..63 of the hw_ts register
> are ever non-zero, every reported hwtstamp would be off by at least
> 2^48 ns.
Indeed, this part is currently just a basic architectural framework. I have not
included a full PTP implementation in this version, as it is slated for future
development. I acknowledge that this implementation is currently incomplete.
> > @@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev)
> > if (err)
> > goto err_reset_dev;
> >
> > + eea_update_ts_off(edev, enet);
> > +
> > netdev_dbg(enet->netdev, "eea probe success.\n");
> >
> > /* Queue TX/RX implementation is still in progress. register_netdev is
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> > index b1265048fbc3..9c71a7cf950e 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> [ ... ]
> > +static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
> > +{
> > + struct eea_net *enet = rx->enet;
> > + struct sk_buff *skb = NULL;
> > +
> > + if (eea_harden_check_size(ctx, enet))
> > + goto err;
> > +
> > + rx->pkt.data_valid = ctx->flags & EEA_DESC_F_DATA_VALID;
> Is the EEA_DESC_F_DATA_VALID bit (the source of CHECKSUM_UNNECESSARY)
> meant to be sampled only on the first descriptor of a multi-descriptor
> packet?
> process_first_buf() is the only place rx->pkt.data_valid is assigned,
> and process_remain_buf() does not touch it. Meanwhile in
> eea_submit_skb() the timestamp comes from the EOP descriptor:
> if (rx->pkt.data_valid)
> skb->ip_summed = CHECKSUM_UNNECESSARY;
> if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
> skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
> enet->hw_ts_offset;
> If the device places per-packet metadata on the EOP descriptor (a
> common NIC layout), every multi-segment frame would silently fall
> back to CHECKSUM_NONE. Could the descriptor used for the checksum
> flag and for the timestamp be documented or made consistent?
On the RX path, for multi-descriptor packets, the information contained in all
descriptors is essentially the same. Since they all describe the same packet, we
only need to sample EEA_DESC_F_DATA_VALID once. As for the hardware timestamp,
we simply retrieve it from the last descriptor as a matter of convenience.
> > +
> > + if (ctx->hdr_len)
> > + skb = eea_rx_build_split_hdr_skb(rx, ctx);
> > + else
> > + skb = eea_rx_build_skb(rx, ctx);
> > +
> > + if (unlikely(!skb))
> > + goto err;
> > +
> > + rx->pkt.head_skb = skb;
> > +
> > + return;
> > +
> > +err:
> > + rx->pkt.do_drop = true;
> > +}
> > +
> > +static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb,
> > + struct eea_rx_cdesc *desc)
> > +{
> > + struct eea_net *enet = rx->enet;
> > +
> > + if (rx->pkt.data_valid)
> > + skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +
> > + if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
> > + skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
> > + enet->hw_ts_offset;
> Is the strict equality with HWTSTAMP_FILTER_ALL intentional?
> The usual driver convention for SIOCSHWTSTAMP is to either reject
> unsupported filters or upgrade them to ALL, so that consumers
> requesting more specific filters (PTP_V2_*, NTP_ALL, etc.) still
> get timestamps on matching frames. With the strict comparison
> above, anything other than HWTSTAMP_FILTER_ALL yields no timestamps.
> There is no SIOCSHWTSTAMP setter in this series, so the branch is
> effectively unreachable today, but it may be worth revisiting
> together with whatever setter lands later.
> > +
> > + skb_record_rx_queue(skb, rx->index);
> > + skb->protocol = eth_type_trans(skb, enet->netdev);
> > +
> > + napi_gro_receive(rx->napi, skb);
> > +}
> [ ... ]
> > +static int eea_cleanrx(struct eea_net_rx *rx, int budget,
> > + struct eea_rx_ctx *ctx)
> > +{
> > + struct eea_rx_cdesc *desc;
> > + struct eea_rx_meta *meta;
> > + int recv, err;
> > +
> > + for (recv = 0; recv < budget; ) {
> > + desc = eea_ering_cq_get_desc(rx->ering);
> > + if (!desc)
> > + break;
> > +
> > + err = eea_rx_desc_to_ctx(rx, ctx, desc);
> > + if (unlikely(err)) {
> > + if (ctx->meta)
> > + eea_rx_meta_put(rx, ctx->meta);
> > +
> > + if (rx->pkt.head_skb)
> > + dev_kfree_skb(rx->pkt.head_skb);
> > +
> > + /* A hardware error occurred; we are attempting to
> > + * mitigate the impact. Subsequent packets may be
> > + * corrupted.
> > + */
> > + ctx->more = false;
> > + goto ack;
> > + }
> Can this recovery path forward arbitrary mid-payload bytes to the
> network stack as a fresh L2 frame?
> When eea_rx_desc_to_ctx() fails mid-stream (invalid id, or len
> greater than meta->len), this branch frees rx->pkt.head_skb, forces
> ctx->more = false, and jumps to ack. The tail of the loop then
> runs:
> if (!ctx->more) {
> memset(&rx->pkt, 0, sizeof(rx->pkt));
> ++recv;
> }
> so no "still inside a device-side packet" indicator survives. The
> next descriptor from the device — which may be a continuation
> buffer of the original frame — will be handled by process_first_buf()
> because rx->pkt.idx == 0. If that continuation satisfies
> ctx->len >= ETH_HLEN, an skb is built from raw mid-payload bytes
> and submitted via eth_type_trans()/napi_gro_receive().
> Would setting rx->pkt.do_drop = true (and only clearing rx->pkt on
> the next observed !more) be a safer way to mitigate, given the
> existing comment already notes "subsequent packets may be
> corrupted"?
During multi-fragment processing—specifically, whether a mid-stream
exception might cause the driver to misidentify subsequent fragments as the
start of a new packet. However, from a driver perspective, there is little
we can do in such scenarios; if the hardware state becomes corrupted in
this way, it constitutes a critical hardware error that is effectively
beyond the driver's recovery capability.
> > +
> > + meta = ctx->meta;
> > +
> > + if (unlikely(rx->pkt.do_drop))
> > + goto skip;
> > +
> > + eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len);
> > +
> > + rx->pkt.recv_len += ctx->len;
> > + rx->pkt.recv_len += ctx->hdr_len;
> > +
> > + if (!rx->pkt.idx)
> > + process_first_buf(rx, ctx);
> > + else
> > + process_remain_buf(rx, ctx);
> > +
> > + ++rx->pkt.idx;
> > +
> > + if (!ctx->more && rx->pkt.head_skb)
> > + eea_submit_skb(rx, rx->pkt.head_skb, desc);
> > +
> > +skip:
> > + eea_rx_meta_put(rx, meta);
> > +ack:
> > + eea_ering_cq_ack_desc(rx->ering, 1);
> > +
> > + if (!ctx->more) {
> > + memset(&rx->pkt, 0, sizeof(rx->pkt));
> > + ++recv;
> > + }
> > + }
> > +
> > + return recv;
> > +}
next prev parent reply other threads:[~2026-05-18 2:44 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
2026-05-18 3:06 ` Xuan Zhuo
2026-05-18 3:07 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures Xuan Zhuo
2026-05-18 2:57 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq Xuan Zhuo
2026-05-18 1:41 ` Xuan Zhuo
2026-05-18 1:41 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop Xuan Zhuo
2026-05-18 1:18 ` Xuan Zhuo
2026-05-18 1:24 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 5/8] eea: implement packet receive logic Xuan Zhuo
2026-05-18 2:34 ` Xuan Zhuo
2026-05-18 2:35 ` Xuan Zhuo [this message]
2026-05-14 9:51 ` [PATCH net-next v43 6/8] eea: implement packet transmit logic Xuan Zhuo
2026-05-18 2:47 ` Xuan Zhuo
2026-05-18 2:48 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 7/8] eea: introduce ethtool support Xuan Zhuo
2026-05-18 2:56 ` Xuan Zhuo
2026-05-14 9:51 ` [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev Xuan Zhuo
2026-05-18 3:01 ` Xuan Zhuo
2026-05-15 6:40 ` [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
2026-05-19 10:33 ` Paolo Abeni
2026-05-19 10:40 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1779071709.526365-6-xuanzhuo@linux.alibaba.com \
--to=xuanzhuo@linux.alibaba.com \
--cc=25181214217@stu.xidian.edu.cn \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dong100@mucse.com \
--cc=dust.li@linux.alibaba.com \
--cc=edumazet@google.com \
--cc=guwen@linux.alibaba.com \
--cc=hkallweit1@gmail.com \
--cc=kuba@kernel.org \
--cc=lulie@linux.alibaba.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=vadim.fedorenko@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox