From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out30-97.freemail.mail.aliyun.com (out30-97.freemail.mail.aliyun.com [115.124.30.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 436DD405C33 for ; Mon, 18 May 2026 02:40:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.97 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779072006; cv=none; b=DBFn5aiw/AvkERTL3vmV9AgVww3FPjJXeeRhqnHkZDaKA5EJxr/PwKDyP5fg+w0heg5IjcrftU/YdOtBkwKp0fWkZkJyk+Iz0ca2e9B2fCiwIEqVzn7yHTBWAZtGbV21xyj+p4ET1P/Md/o1Sl855m+aoQOvqLhp5WuHzhCkrB8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779072006; c=relaxed/simple; bh=mSf1a1qPo31k9eMiOFBpG8zk6sfSWpqTrVUxcl9QrTU=; h=Message-ID:Subject:Date:From:To:Cc:References:In-Reply-To: Content-Type; b=i7aPt6luQqb0MW+SHCHS1p8fdxcpy8vJxZnje1C8X96yuY83MJpRTm1wQSz+t3pk7/FzURPslPKzWrONFQXXfPt3c55ZLjSm9hj7UnStZQv6KzwI0O6HwOQN2bS2U4c4V8al4ady94t6ujVaO7hMxVv2Rq9j1lq2rlJ0nlwyNXI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=FYRNQ166; arc=none smtp.client-ip=115.124.30.97 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="FYRNQ166" DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1779071995; h=Message-ID:Subject:Date:From:To:Content-Type; bh=PiG9qQycqxm2lcFZ87Us6MTtBKCnJAug+XbE4OfwzIM=; b=FYRNQ166+70wEiJs6vo49bf4iBpnTFeU/QtaNHKRSMYcylGdUlUzj3QrNQ7Rhs712kmOtmZyjYYNMwiKpDqZMbjQvM8KP4xKDK67i/7pzUsViJanLTz+9MGOMtCFFoj0HZcCGrtKAh4tutSuqUjt0rnx+ISlGe5MT6+fBd0kUQ4= X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R351e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=maildocker-contentspam033037033178;MF=xuanzhuo@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0X32xHIv_1779071677; Received: from localhost(mailfrom:xuanzhuo@linux.alibaba.com fp:SMTPD_---0X32xHIv_1779071677 cluster:ay36) by smtp.aliyun-inc.com; Mon, 18 May 2026 10:34:37 +0800 Message-ID: <1779071647.763595-5-xuanzhuo@linux.alibaba.com> Subject: Re: [PATCH net-next v43 5/8] eea: implement packet receive logic Date: Mon, 18 May 2026 10:34:07 +0800 From: Xuan Zhuo To: Xuan Zhuo Cc: Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Wen Gu , Philo Lu , Vadim Fedorenko , Dong Yibo , Mingyu Wang <25181214217@stu.xidian.edu.cn>, Heiner Kallweit , Dust Li , netdev@vger.kernel.org References: <20260514095138.80680-1-xuanzhuo@linux.alibaba.com> <20260514095138.80680-6-xuanzhuo@linux.alibaba.com> In-Reply-To: <20260514095138.80680-6-xuanzhuo@linux.alibaba.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: >From sashiko: commit 142959c3285196e5751c147c32a4a954a804d736 Author: Xuan Zhuo eea: implement packet receive logic This commit implements the EEA RX path: page-pool buffer management, per-descriptor metadata handling, packet assembly via build_skb()/ napi_alloc_skb(), and submission through napi_gro_receive(). It also introduces a one-shot hardware-timestamp offset captured at probe time. > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/eth= ernet/alibaba/eea/eea_net.c > index cfb18a07e296..bf8581bc27ed 100644 > --- a/drivers/net/ethernet/alibaba/eea/eea_net.c > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c > @@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_d= evice *edev, u32 pairs) > return enet; > } > > +static void eea_update_ts_off(struct eea_device *edev, struct eea_net *e= net) > +{ > + u64 ts; > + > + ts =3D eea_pci_device_ts(edev); > + > + enet->hw_ts_offset =3D ktime_get_real() - ts; > +} Is the choice of ktime_get_real() as the anchor for hardware RX timestamps intentional? Consumers of skb_hwtstamps()->hwtstamp typically expect a stable monotonic timeline; tying the offset to CLOCK_REALTIME means any later settimeofday(), NTP step, leap-second adjustment, or a probe that happens before NTP sync will permanently bias every reported hwtstamp. There is also no ptp_clock_register()/PHC and no periodic resync, so the offset captured here is the one used forever. Is the capture itself meant to be atomic with respect to scheduling? ts =3D eea_pci_device_ts(edev); enet->hw_ts_offset =3D ktime_get_real() - ts; eea_pci_device_ts() performs an MMIO read across PCIe and is then followed by ktime_get_real() with no preempt/IRQ disabling, so any transit or preemption delay between the two reads is permanently baked into hw_ts_offset for the lifetime of the device. Does this code handle 48-bit wraparound of the device timestamp counter? EEA_DESC_TS(desc) masks desc->ts to 48 bits via EEA_DESC_TS_MASK =3D GENMASK_ULL(47, 0), and hw_ts_offset is taken exactly once here at probe/reprobe. After roughly 3.26 days of device uptime (2^48 ns) the per-packet computation EEA_DESC_TS(desc) + enet->hw_ts_offset will jump backwards by ~2^48 ns relative to wall time, with no wrap detector, no resync, and no cyclecounter/timecounter wrapper. There is also a width inconsistency between this capture and the per-packet path: drivers/net/ethernet/alibaba/eea/eea_pci.c:eea_pci_device_ts() { return cfg_read64(ep_dev->reg, hw_ts); } returns the full 64-bit register, while EEA_DESC_TS(desc) masks the descriptor field to 48 bits. If bits 48..63 of the hw_ts register are ever non-zero, every reported hwtstamp would be off by at least 2^48 ns. > @@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev) > if (err) > goto err_reset_dev; > > + eea_update_ts_off(edev, enet); > + > netdev_dbg(enet->netdev, "eea probe success.\n"); > > /* Queue TX/RX implementation is still in progress. register_netdev is [ ... ] > diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethe= rnet/alibaba/eea/eea_rx.c > index b1265048fbc3..9c71a7cf950e 100644 > --- a/drivers/net/ethernet/alibaba/eea/eea_rx.c > +++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c [ ... ] > +static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *= ctx) > +{ > + struct eea_net *enet =3D rx->enet; > + struct sk_buff *skb =3D NULL; > + > + if (eea_harden_check_size(ctx, enet)) > + goto err; > + > + rx->pkt.data_valid =3D ctx->flags & EEA_DESC_F_DATA_VALID; Is the EEA_DESC_F_DATA_VALID bit (the source of CHECKSUM_UNNECESSARY) meant to be sampled only on the first descriptor of a multi-descriptor packet? process_first_buf() is the only place rx->pkt.data_valid is assigned, and process_remain_buf() does not touch it. Meanwhile in eea_submit_skb() the timestamp comes from the EOP descriptor: if (rx->pkt.data_valid) skb->ip_summed =3D CHECKSUM_UNNECESSARY; if (enet->cfg.ts_cfg.rx_filter =3D=3D HWTSTAMP_FILTER_ALL) skb_hwtstamps(skb)->hwtstamp =3D EEA_DESC_TS(desc) + enet->hw_ts_offset; If the device places per-packet metadata on the EOP descriptor (a common NIC layout), every multi-segment frame would silently fall back to CHECKSUM_NONE. Could the descriptor used for the checksum flag and for the timestamp be documented or made consistent? > + > + if (ctx->hdr_len) > + skb =3D eea_rx_build_split_hdr_skb(rx, ctx); > + else > + skb =3D eea_rx_build_skb(rx, ctx); > + > + if (unlikely(!skb)) > + goto err; > + > + rx->pkt.head_skb =3D skb; > + > + return; > + > +err: > + rx->pkt.do_drop =3D true; > +} > + > +static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb, > + struct eea_rx_cdesc *desc) > +{ > + struct eea_net *enet =3D rx->enet; > + > + if (rx->pkt.data_valid) > + skb->ip_summed =3D CHECKSUM_UNNECESSARY; > + > + if (enet->cfg.ts_cfg.rx_filter =3D=3D HWTSTAMP_FILTER_ALL) > + skb_hwtstamps(skb)->hwtstamp =3D EEA_DESC_TS(desc) + > + enet->hw_ts_offset; Is the strict equality with HWTSTAMP_FILTER_ALL intentional? The usual driver convention for SIOCSHWTSTAMP is to either reject unsupported filters or upgrade them to ALL, so that consumers requesting more specific filters (PTP_V2_*, NTP_ALL, etc.) still get timestamps on matching frames. With the strict comparison above, anything other than HWTSTAMP_FILTER_ALL yields no timestamps. There is no SIOCSHWTSTAMP setter in this series, so the branch is effectively unreachable today, but it may be worth revisiting together with whatever setter lands later. > + > + skb_record_rx_queue(skb, rx->index); > + skb->protocol =3D eth_type_trans(skb, enet->netdev); > + > + napi_gro_receive(rx->napi, skb); > +} [ ... ] > +static int eea_cleanrx(struct eea_net_rx *rx, int budget, > + struct eea_rx_ctx *ctx) > +{ > + struct eea_rx_cdesc *desc; > + struct eea_rx_meta *meta; > + int recv, err; > + > + for (recv =3D 0; recv < budget; ) { > + desc =3D eea_ering_cq_get_desc(rx->ering); > + if (!desc) > + break; > + > + err =3D eea_rx_desc_to_ctx(rx, ctx, desc); > + if (unlikely(err)) { > + if (ctx->meta) > + eea_rx_meta_put(rx, ctx->meta); > + > + if (rx->pkt.head_skb) > + dev_kfree_skb(rx->pkt.head_skb); > + > + /* A hardware error occurred; we are attempting to > + * mitigate the impact. Subsequent packets may be > + * corrupted. > + */ > + ctx->more =3D false; > + goto ack; > + } Can this recovery path forward arbitrary mid-payload bytes to the network stack as a fresh L2 frame? When eea_rx_desc_to_ctx() fails mid-stream (invalid id, or len greater than meta->len), this branch frees rx->pkt.head_skb, forces ctx->more =3D false, and jumps to ack. The tail of the loop then runs: if (!ctx->more) { memset(&rx->pkt, 0, sizeof(rx->pkt)); ++recv; } so no "still inside a device-side packet" indicator survives. The next descriptor from the device =E2=80=94 which may be a continuation buffer of the original frame =E2=80=94 will be handled by process_first_buf= () because rx->pkt.idx =3D=3D 0. If that continuation satisfies ctx->len >=3D ETH_HLEN, an skb is built from raw mid-payload bytes and submitted via eth_type_trans()/napi_gro_receive(). Would setting rx->pkt.do_drop =3D true (and only clearing rx->pkt on the next observed !more) be a safer way to mitigate, given the existing comment already notes "subsequent packets may be corrupted"? > + > + meta =3D ctx->meta; > + > + if (unlikely(rx->pkt.do_drop)) > + goto skip; > + > + eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len); > + > + rx->pkt.recv_len +=3D ctx->len; > + rx->pkt.recv_len +=3D ctx->hdr_len; > + > + if (!rx->pkt.idx) > + process_first_buf(rx, ctx); > + else > + process_remain_buf(rx, ctx); > + > + ++rx->pkt.idx; > + > + if (!ctx->more && rx->pkt.head_skb) > + eea_submit_skb(rx, rx->pkt.head_skb, desc); > + > +skip: > + eea_rx_meta_put(rx, meta); > +ack: > + eea_ering_cq_ack_desc(rx->ering, 1); > + > + if (!ctx->more) { > + memset(&rx->pkt, 0, sizeof(rx->pkt)); > + ++recv; > + } > + } > + > + return recv; > +}