Netdev List
 help / color / mirror / Atom feed
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Wen Gu <guwen@linux.alibaba.com>,
	Philo Lu <lulie@linux.alibaba.com>,
	Vadim Fedorenko <vadim.fedorenko@linux.dev>,
	Dong Yibo <dong100@mucse.com>,
	Mingyu Wang <25181214217@stu.xidian.edu.cn>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Dust Li <dust.li@linux.alibaba.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net-next v43 5/8] eea: implement packet receive logic
Date: Mon, 18 May 2026 10:35:09 +0800	[thread overview]
Message-ID: <1779071709.526365-6-xuanzhuo@linux.alibaba.com> (raw)
In-Reply-To: <1779071647.763595-5-xuanzhuo@linux.alibaba.com>

On Mon, 18 May 2026 10:34:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> From sashiko:
>
>
> commit 142959c3285196e5751c147c32a4a954a804d736
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> eea: implement packet receive logic
> This commit implements the EEA RX path: page-pool buffer management,
> per-descriptor metadata handling, packet assembly via build_skb()/
> napi_alloc_skb(), and submission through napi_gro_receive().  It also
> introduces a one-shot hardware-timestamp offset captured at probe time.
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > index cfb18a07e296..bf8581bc27ed 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > @@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
> >  	return enet;
> >  }
> >
> > +static void eea_update_ts_off(struct eea_device *edev, struct eea_net *enet)
> > +{
> > +	u64 ts;
> > +
> > +	ts = eea_pci_device_ts(edev);
> > +
> > +	enet->hw_ts_offset = ktime_get_real() - ts;
> > +}
> Is the choice of ktime_get_real() as the anchor for hardware RX
> timestamps intentional?
> Consumers of skb_hwtstamps()->hwtstamp typically expect a stable
> monotonic timeline; tying the offset to CLOCK_REALTIME means any later
> settimeofday(), NTP step, leap-second adjustment, or a probe that
> happens before NTP sync will permanently bias every reported hwtstamp.
> There is also no ptp_clock_register()/PHC and no periodic resync, so
> the offset captured here is the one used forever.
> Is the capture itself meant to be atomic with respect to scheduling?
> 	ts = eea_pci_device_ts(edev);
> 	enet->hw_ts_offset = ktime_get_real() - ts;
> eea_pci_device_ts() performs an MMIO read across PCIe and is then
> followed by ktime_get_real() with no preempt/IRQ disabling, so any
> transit or preemption delay between the two reads is permanently
> baked into hw_ts_offset for the lifetime of the device.
> Does this code handle 48-bit wraparound of the device timestamp
> counter?
> EEA_DESC_TS(desc) masks desc->ts to 48 bits via
> EEA_DESC_TS_MASK = GENMASK_ULL(47, 0), and hw_ts_offset is taken
> exactly once here at probe/reprobe.  After roughly 3.26 days of
> device uptime (2^48 ns) the per-packet computation
> EEA_DESC_TS(desc) + enet->hw_ts_offset will jump backwards by ~2^48
> ns relative to wall time, with no wrap detector, no resync, and no
> cyclecounter/timecounter wrapper.
> There is also a width inconsistency between this capture and the
> per-packet path:
> drivers/net/ethernet/alibaba/eea/eea_pci.c:eea_pci_device_ts() {
>     return cfg_read64(ep_dev->reg, hw_ts);
> }
> returns the full 64-bit register, while EEA_DESC_TS(desc) masks the
> descriptor field to 48 bits.  If bits 48..63 of the hw_ts register
> are ever non-zero, every reported hwtstamp would be off by at least
> 2^48 ns.

Indeed, this part is currently just a basic architectural framework. I have not
included a full PTP implementation in this version, as it is slated for future
development. I acknowledge that this implementation is currently incomplete.


> > @@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev)
> >  	if (err)
> >  		goto err_reset_dev;
> >
> > +	eea_update_ts_off(edev, enet);
> > +
> >  	netdev_dbg(enet->netdev, "eea probe success.\n");
> >
> >  	/* Queue TX/RX implementation is still in progress. register_netdev is
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> > index b1265048fbc3..9c71a7cf950e 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> [ ... ]
> > +static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
> > +{
> > +	struct eea_net *enet = rx->enet;
> > +	struct sk_buff *skb = NULL;
> > +
> > +	if (eea_harden_check_size(ctx, enet))
> > +		goto err;
> > +
> > +	rx->pkt.data_valid = ctx->flags & EEA_DESC_F_DATA_VALID;
> Is the EEA_DESC_F_DATA_VALID bit (the source of CHECKSUM_UNNECESSARY)
> meant to be sampled only on the first descriptor of a multi-descriptor
> packet?
> process_first_buf() is the only place rx->pkt.data_valid is assigned,
> and process_remain_buf() does not touch it.  Meanwhile in
> eea_submit_skb() the timestamp comes from the EOP descriptor:
>     if (rx->pkt.data_valid)
>         skb->ip_summed = CHECKSUM_UNNECESSARY;
>     if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
>         skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
>             enet->hw_ts_offset;
> If the device places per-packet metadata on the EOP descriptor (a
> common NIC layout), every multi-segment frame would silently fall
> back to CHECKSUM_NONE.  Could the descriptor used for the checksum
> flag and for the timestamp be documented or made consistent?

On the RX path, for multi-descriptor packets, the information contained in all
descriptors is essentially the same. Since they all describe the same packet, we
only need to sample EEA_DESC_F_DATA_VALID once. As for the hardware timestamp,
we simply retrieve it from the last descriptor as a matter of convenience.

> > +
> > +	if (ctx->hdr_len)
> > +		skb = eea_rx_build_split_hdr_skb(rx, ctx);
> > +	else
> > +		skb = eea_rx_build_skb(rx, ctx);
> > +
> > +	if (unlikely(!skb))
> > +		goto err;
> > +
> > +	rx->pkt.head_skb = skb;
> > +
> > +	return;
> > +
> > +err:
> > +	rx->pkt.do_drop = true;
> > +}
> > +
> > +static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb,
> > +			   struct eea_rx_cdesc *desc)
> > +{
> > +	struct eea_net *enet = rx->enet;
> > +
> > +	if (rx->pkt.data_valid)
> > +		skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +
> > +	if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
> > +		skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
> > +			enet->hw_ts_offset;
> Is the strict equality with HWTSTAMP_FILTER_ALL intentional?
> The usual driver convention for SIOCSHWTSTAMP is to either reject
> unsupported filters or upgrade them to ALL, so that consumers
> requesting more specific filters (PTP_V2_*, NTP_ALL, etc.) still
> get timestamps on matching frames.  With the strict comparison
> above, anything other than HWTSTAMP_FILTER_ALL yields no timestamps.
> There is no SIOCSHWTSTAMP setter in this series, so the branch is
> effectively unreachable today, but it may be worth revisiting
> together with whatever setter lands later.
> > +
> > +	skb_record_rx_queue(skb, rx->index);
> > +	skb->protocol = eth_type_trans(skb, enet->netdev);
> > +
> > +	napi_gro_receive(rx->napi, skb);
> > +}
> [ ... ]
> > +static int eea_cleanrx(struct eea_net_rx *rx, int budget,
> > +		       struct eea_rx_ctx *ctx)
> > +{
> > +	struct eea_rx_cdesc *desc;
> > +	struct eea_rx_meta *meta;
> > +	int recv, err;
> > +
> > +	for (recv = 0; recv < budget; ) {
> > +		desc = eea_ering_cq_get_desc(rx->ering);
> > +		if (!desc)
> > +			break;
> > +
> > +		err = eea_rx_desc_to_ctx(rx, ctx, desc);
> > +		if (unlikely(err)) {
> > +			if (ctx->meta)
> > +				eea_rx_meta_put(rx, ctx->meta);
> > +
> > +			if (rx->pkt.head_skb)
> > +				dev_kfree_skb(rx->pkt.head_skb);
> > +
> > +			/* A hardware error occurred; we are attempting to
> > +			 * mitigate the impact. Subsequent packets may be
> > +			 * corrupted.
> > +			 */
> > +			ctx->more = false;
> > +			goto ack;
> > +		}
> Can this recovery path forward arbitrary mid-payload bytes to the
> network stack as a fresh L2 frame?
> When eea_rx_desc_to_ctx() fails mid-stream (invalid id, or len
> greater than meta->len), this branch frees rx->pkt.head_skb, forces
> ctx->more = false, and jumps to ack.  The tail of the loop then
> runs:
>     if (!ctx->more) {
>         memset(&rx->pkt, 0, sizeof(rx->pkt));
>         ++recv;
>     }
> so no "still inside a device-side packet" indicator survives.  The
> next descriptor from the device — which may be a continuation
> buffer of the original frame — will be handled by process_first_buf()
> because rx->pkt.idx == 0.  If that continuation satisfies
> ctx->len >= ETH_HLEN, an skb is built from raw mid-payload bytes
> and submitted via eth_type_trans()/napi_gro_receive().
> Would setting rx->pkt.do_drop = true (and only clearing rx->pkt on
> the next observed !more) be a safer way to mitigate, given the
> existing comment already notes "subsequent packets may be
> corrupted"?


During multi-fragment processing—specifically, whether a mid-stream
exception might cause the driver to misidentify subsequent fragments as the
start of a new packet. However, from a driver perspective, there is little
we can do in such scenarios; if the hardware state becomes corrupted in
this way, it constitutes a critical hardware error that is effectively
beyond the driver's recovery capability.


> > +
> > +		meta = ctx->meta;
> > +
> > +		if (unlikely(rx->pkt.do_drop))
> > +			goto skip;
> > +
> > +		eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len);
> > +
> > +		rx->pkt.recv_len += ctx->len;
> > +		rx->pkt.recv_len += ctx->hdr_len;
> > +
> > +		if (!rx->pkt.idx)
> > +			process_first_buf(rx, ctx);
> > +		else
> > +			process_remain_buf(rx, ctx);
> > +
> > +		++rx->pkt.idx;
> > +
> > +		if (!ctx->more && rx->pkt.head_skb)
> > +			eea_submit_skb(rx, rx->pkt.head_skb, desc);
> > +
> > +skip:
> > +		eea_rx_meta_put(rx, meta);
> > +ack:
> > +		eea_ering_cq_ack_desc(rx->ering, 1);
> > +
> > +		if (!ctx->more) {
> > +			memset(&rx->pkt, 0, sizeof(rx->pkt));
> > +			++recv;
> > +		}
> > +	}
> > +
> > +	return recv;
> > +}

  reply	other threads:[~2026-05-18  2:44 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
2026-05-18  3:06   ` Xuan Zhuo
2026-05-18  3:07     ` Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures Xuan Zhuo
2026-05-18  2:57   ` Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq Xuan Zhuo
2026-05-18  1:41   ` Xuan Zhuo
2026-05-18  1:41     ` Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop Xuan Zhuo
2026-05-18  1:18   ` Xuan Zhuo
2026-05-18  1:24     ` Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 5/8] eea: implement packet receive logic Xuan Zhuo
2026-05-18  2:34   ` Xuan Zhuo
2026-05-18  2:35     ` Xuan Zhuo [this message]
2026-05-14  9:51 ` [PATCH net-next v43 6/8] eea: implement packet transmit logic Xuan Zhuo
2026-05-18  2:47   ` Xuan Zhuo
2026-05-18  2:48     ` Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 7/8] eea: introduce ethtool support Xuan Zhuo
2026-05-18  2:56   ` Xuan Zhuo
2026-05-14  9:51 ` [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev Xuan Zhuo
2026-05-18  3:01   ` Xuan Zhuo
2026-05-15  6:40 ` [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
2026-05-19 10:33 ` Paolo Abeni
2026-05-19 10:40 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1779071709.526365-6-xuanzhuo@linux.alibaba.com \
    --to=xuanzhuo@linux.alibaba.com \
    --cc=25181214217@stu.xidian.edu.cn \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=dong100@mucse.com \
    --cc=dust.li@linux.alibaba.com \
    --cc=edumazet@google.com \
    --cc=guwen@linux.alibaba.com \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lulie@linux.alibaba.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=vadim.fedorenko@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox