From: Dong Yibo
To: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, danishanwar@ti.com, vadim.fedorenko@linux.dev, horms@kernel.org, u.kleine-koenig@baylibre.com
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, dong100@mucse.com, yaojun@mucse.com
Subject: [PATCH net-next v7 0/4] net: rnpgbe: Add TX/RX and link status support
Date: Thu, 11 Jun 2026 18:00:32 +0800
Message-Id: <20260611100036.36370-1-dong100@mucse.com>
X-Mailer: git-send-email 2.25.1
X-Mailing-List: netdev@vger.kernel.org

Hi maintainers,

This patch series adds the packet transmission, reception, and link status management features to the RNPGBE driver, building upon the previously introduced
mailbox communication and basic driver infrastructure.

The series introduces:
- MSI-X/MSI interrupt handling with NAPI support
- TX path with scatter-gather DMA and completion handling
- RX path with page pool buffer management
- Link status monitoring and carrier management

These changes enable the RNPGBE driver to support basic TX/RX network operations.

Changelog:
v6 -> v7:
[patch 2/4]:
1. Fix 'frag_idx' error in rnpgbe_tx_map. (Sashiko-gemini)
[patch 3/4]:
1. Fix skb leak in invalid size path in rnpgbe_clean_rx_irq. (Sashiko-gemini)
2. Fix invalid size range check for rxdesc. (Sashiko-gemini)
[patch 4/4]:
1. Fix 'data race on the reply payload'. (Sashiko-gemini)
2. Fix 'asymmetric behaviour' when reporting up/down. (andrew)

links:
---
v1: https://lore.kernel.org/netdev/20260325091204.94015-1-dong100@mucse.com/
v2: https://lore.kernel.org/netdev/20260403025713.527841-1-dong100@mucse.com/
v3: https://lore.kernel.org/netdev/20260507081539.171844-1-dong100@mucse.com/
v4: https://lore.kernel.org/netdev/20260526033539.164061-1-dong100@mucse.com/
v5: https://lore.kernel.org/netdev/20260528023150.239532-1-dong100@mucse.com/
v6: https://lore.kernel.org/netdev/20260604112750.769215-1-dong100@mucse.com/

Additional Notes:

1. Sashiko:
>  static int rnpgbe_open(struct net_device *netdev)
>  {
> +        struct mucse *mucse = netdev_priv(netdev);
> +        int err;
> +
> +        err = rnpgbe_request_irq(mucse);
> +        if (err)
> +                return err;
> +
> +        err = netif_set_real_num_queues(netdev, mucse->num_tx_queues,
> +                                        mucse->num_rx_queues);
> +        if (err)
> +                goto err_free_irqs;
> +
> +        rnpgbe_up_complete(mucse);

Could this sequence lead to a queue stall? If an interrupt arrives immediately after rnpgbe_request_irq(), the handler will mask the interrupt and call napi_schedule_irqoff(). Since NAPI is not yet enabled, the scheduling request is silently dropped. When rnpgbe_up_complete() later enables NAPI, the interrupt remains masked and the queue stalls. Should NAPI be enabled before requesting the IRQ?
Answer:
The sequence is safe because the hardware cannot generate any Rx/Tx or link interrupts until rnpgbe_up_complete() enables DMA engines and interrupt sources. There is no race window where an interrupt can arrive before NAPI is ready. Therefore no stall can occur.

2. Sashiko:
> +static bool rnpgbe_cleanup_headers(struct sk_buff *skb)
> +{
> +        if (IS_ERR(skb))
> +                return true;
> +        /* place header in linear portion of buffer */
> +        if (!skb_headlen(skb))
> +                rnpgbe_pull_tail(skb);
> +        /* if eth_skb_pad returns an error the skb was freed */
> +        if (eth_skb_pad(skb))
> +                return true;
> +
> +        return false;
> +}

If the first hardware descriptor provides a size between 1 and 13 bytes, skb_headlen(skb) will be greater than 0, causing this to skip the rnpgbe_pull_tail() call. When eth_type_trans() is called later, it assumes a full 14 byte header and calls __skb_pull(skb, ETH_HLEN). If the packet spans multiple fragments, could subtracting 14 from skb->len cause it to drop below skb->data_len, triggering the BUG_ON in __skb_pull() and crashing the kernel?

Answer:
This issue is not reproducible: the hardware enforces a minimum packet size of 33 bytes, so descriptor sizes of 1-13 bytes do not occur. Multiple descriptors are only used for frames larger than 1536 bytes (M_DEFAULT_SG is 96); all small packets use a single descriptor (no skb fragmentation). Thus the problematic __skb_pull() condition cannot occur.

3. Sashiko:
> +static int rnpgbe_clean_rx_irq(struct mucse_q_vector *q_vector,
> +                               struct mucse_ring *rx_ring,
> +                               int budget)
> +{
> +        unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> +        u16 cleaned_count = mucse_desc_unused_rx(rx_ring);
> +        bool fail_alloc = false;
> +
> +        while (likely(total_rx_packets < budget)) {
> +                struct mucse_rx_buffer *rx_buffer;
> +                union rnpgbe_rx_desc *rx_desc;
> +                struct sk_buff *skb;
> +                unsigned int size;
[ ... ]
> +                /* This memory barrier is needed to keep us from reading
> +                 * any other fields out of the rx_desc until we know the
> +                 * descriptor has been written back
> +                 */
> +                dma_rmb();
> +                size = le16_to_cpu(rx_desc->wb.len);
> +                rx_buffer = rnpgbe_get_buffer(rx_ring, &skb, size);

Is the size value validated against the allocated page size anywhere? If a hardware device provides an oversized length, such as 65535, could this cause dma_sync_single_range_for_cpu() to operate out of bounds? Additionally, since size is passed to rnpgbe_build_skb() which uses __skb_put(), could this cause skb->tail and skb->len to advance beyond the allocated linear buffer?

Answer:
Cannot happen: the hardware splits data across multiple descriptors, with 1536 bytes (M_DEFAULT_SG is 96) per descriptor.

4. Sashiko:
> +                if (!size)
> +                        break;

If size is zero, does this cause an infinite loop? By breaking here, the driver doesn't advance rx_ring->next_to_clean or clear the descriptor's DD bit. On subsequent NAPI polls, it will examine the exact same descriptor, see the DD bit set and size 0, and immediately break again, locking up the CPU in softirq context.

Answer:
No, an infinite loop cannot occur here. The hardware guarantees that the descriptor size is always non-zero when it writes back the DD (Descriptor Done) bit.

5. Sashiko:
> +static void rnpgbe_configure_rx_ring(struct mucse *mucse,
> +                                     struct mucse_ring *ring)
> +{
[ ... ]
> +        mucse_ring_wr32(ring, RNPGBE_RX_LEN, ring->count);
> +        ring->tail = ring->ring_addr + RNPGBE_RX_TAIL;
> +        ring->next_to_clean = mucse_ring_rd32(ring, RNPGBE_RX_HEAD);
> +        ring->next_to_use = ring->next_to_clean;

Can RNPGBE_RX_HEAD contain a value greater than or equal to ring->count? If the device register holds an out-of-bounds value, it looks like it will be directly used as an index into rx_ring->rx_buffer_info[] and by M_RX_DESC(), causing an out-of-bounds memory access.

Answer:
No out-of-bounds risk: RNPGBE_RX_HEAD is 0 after hardware reset (always valid).
This read only syncs the driver with the hardware across a queue down/up (no full reset). If ring->count changes, a full reset sets the head back to 0. The head value always stays within the valid index range.

6. Sashiko:
> +static void mucse_mbx_fw_req_handler(struct mucse_hw *hw,
> +                                     struct mbx_fw_cmd_req *req)
> +{
> +        struct mucse *mucse = container_of(hw, struct mucse, hw);
> +        u32 magic = le32_to_cpu(req->link_stat.port_magic);
> +        unsigned long flags;
> +
> +        if (le16_to_cpu(req->opcode) == LINK_CHANGE_EVT) {
> +                spin_lock_irqsave(&mucse->link_lock, flags);
> +
> +                if (le16_to_cpu(req->link_stat.port_status))
> +                        hw->link = true;

Does this correctly handle multi-port link events? Since req->link_stat.port_status is a bitmask of changed ports, checking it as a simple boolean might cause this port to process events meant for other ports on the same NIC. Additionally, this unconditionally reads from st[0], which would cause all ports to adopt the speed and duplex of port 0 whenever any port changes state.

Answer:
This driver follows a 1:1 mapping: one PF (Physical Function) corresponds to exactly one port. The multi-port bitmask (port_status) and the multi-port structure definitions in the firmware command exist only for forward compatibility of the driver-firmware communication interface, so that it can support both multi-port and single-port designs.

7. Sashiko:
> +void rnpgbe_set_rx(struct mucse_hw *hw, bool enable)
> +{
> +        u32 value = mucse_hw_rd32(hw, GMAC_CONTROL);
> +
> +        if (enable)
> +                value |= GMAC_CONTROL_RE;
> +        else
> +                value &= ~GMAC_CONTROL_RE;
> +
> +        mucse_hw_wr32(hw, GMAC_CONTROL, value);
> +
> +        value = mucse_hw_rd32(hw, GMAC_FRAME_FILTER);
> +        mucse_hw_wr32(hw, GMAC_FRAME_FILTER, value | BIT(0));

Does this lockless read-modify-write race with standard network operations like ndo_set_rx_mode?

Answer:
No race here: ndo_set_rx_mode is not implemented yet. Even in the future, GMAC_FRAME_FILTER will not be controlled from ndo_set_rx_mode; the hardware has a separate register to control promiscuous mode.

8. Sashiko:
> +/**
> + * register_mbx_irq - Register mbx routine
> + * @mucse: pointer to private structure
> + *
> + * @return: 0 on success, negative on failure
> + **/
> +int register_mbx_irq(struct mucse *mucse)
> +{
> +        struct pci_dev *pdev = mucse->pdev;
> +        int err = 0;
> +
> +        snprintf(mucse->mbx_name, sizeof(mucse->mbx_name),
> +                 "rnpgbe-mbx:%s", pci_name(pdev));
> +
> +        if (mucse->flags & M_FLAG_MSIX_EN) {
> +                err = request_irq(pci_irq_vector(pdev, 0),
> +                                  rnpgbe_msix_other, 0, mucse->mbx_name,
> +                                  mucse);
> +        }
> +
> +        return err;
> +}

Is it intentional that register_mbx_irq() only installs a handler when M_FLAG_MSIX_EN is set? In MSIX_SINGLE_EN and MSI modes, the mailbox is serviced only through the multiplexed rnpgbe_int_single() that rnpgbe_open() registers and rnpgbe_close() tears down. Between the end of probe and the first open, and again between every close and the next open, there appears to be no handler attached to the interrupt that firmware can signal through. Does that leave a window where a firmware-initiated mailbox interrupt in those modes has no registered handler? Combined with the empty body of mucse_fw_irq_handler(), how does this match the commit message wording:

> Add comprehensive interrupt handling for the RNPGBE driver:

Answer:
The firmware only triggers this IRQ after the driver sets mucse_fw_link_report_en to true (in rnpgbe_up_complete()), so the 'no handler attached' condition does not arise.

9. Sashiko:
> +static int rnpgbe_set_interrupt_capability(struct mucse *mucse)
> +{
> +        int v_budget;
> +
> +        v_budget = min_t(int, mucse->num_tx_queues, mucse->num_rx_queues);
> +        v_budget = min_t(int, v_budget, MAX_Q_VECTORS);
> +        v_budget = min_t(int, v_budget, num_online_cpus());
> +        /* add one vector for mbx */
> +        v_budget += 1;
> +        v_budget = pci_alloc_irq_vectors(mucse->pdev, 1, v_budget,
> +                                         PCI_IRQ_MSI | PCI_IRQ_MSIX);
> +        if (v_budget < 0)
> +                return v_budget;
> +
> +        if (mucse->pdev->msix_enabled) {
> +                /* q_vector not include mbx */
> +                if (v_budget > 1) {
> +                        mucse->flags |= M_FLAG_MSIX_EN;
> +                        mucse->num_q_vectors = v_budget - 1;
> +                } else {
> +                        mucse->flags |= M_FLAG_MSIX_SINGLE_EN;
> +                        mucse->num_q_vectors = 1;
> +                }
> +        } else {
> +                /* msi use only 1 irq */
> +                mucse->num_q_vectors = 1;
> +                mucse->flags |= M_FLAG_MSI_EN;
> +        }
> +
> +        return 0;
> +}

If the PCI subsystem allocates multiple MSI vectors because v_budget > 1 and the device supports Multiple Message MSI, does this code leak the unused vectors? It looks like num_q_vectors is forced to 1 and the remaining allocated vectors are ignored.

Answer:
The hardware only supports one MSI vector, so MSI mode always falls back to a single vector.

10. Sashiko:
> +static void rnpgbe_configure_rx_ring(struct mucse *mucse,
> +                                     struct mucse_ring *ring)
> +{
> +        struct mucse_hw *hw = &mucse->hw;
> +
> +        /* disable queue to avoid issues while updating state */
> +        mucse_ring_wr32(ring, RNPGBE_RX_START, 0);
[ ... ]
> +        mucse_ring_wr32(ring, RNPGBE_RX_INT_PKTCNT, M_DEFAULT_RX_INT_PKTCNT);
> +        rnpgbe_alloc_rx_buffers(ring, mucse_desc_unused_rx(ring));
> +}

The return value of rnpgbe_alloc_rx_buffers() is discarded here. If the very first population fails under memory pressure, the ring is left partially populated (the remaining descriptors keep the zeroed contents from dma_alloc_coherent()), and rnpgbe_up_complete() then writes RNPGBE_RX_START=1 unconditionally.
When hardware walks into one of those zeroed descriptors, it will DMA an incoming packet to an invalid address (just fun_id, or 0 if pfvfnum == 0). Because rnpgbe_configure_rx() returns void, rnpgbe_open() returns success and the failure is invisible to callers. Should the allocation failure propagate up and unwind RX resources?

Answer:
Cannot happen: the hardware only starts fetching RX descriptors after the driver updates rx_tail, and rx_tail is only updated after a descriptor has been written correctly following a successful memory allocation.

11. Sashiko:
> +static void rnpgbe_pull_tail(struct sk_buff *skb)
> +{
> +        skb_frag_t *frag = &skb_shinfo(skb)->frags[0];
> +        unsigned int pull_len;
> +        unsigned char *va;
> +
> +        va = skb_frag_address(frag);
> +        pull_len = eth_get_headlen(skb->dev, va, M_RX_HDR_SIZE);

Because rnpgbe_build_skb() places the first descriptor's data into the linear SKB area, skb_headlen(skb) equals the descriptor's size. This means rnpgbe_pull_tail() is only called from rnpgbe_cleanup_headers() if size is 0. If a 0-length descriptor with EOP is received, nr_frags will be 0. Can this code then dereference skb_shinfo(skb)->frags[0], leading to an out-of-bounds read on uninitialized memory? This could result in a severe length underflow in skb_frag_size_sub() since it passes a hardcoded M_RX_HDR_SIZE (256) to eth_get_headlen().

Answer:
Cannot happen: the hardware never writes back a 0-size descriptor.

12. Sashiko:
> +void rnpgbe_set_rx(struct mucse_hw *hw, bool enable)
> +{
> +        u32 value = mucse_hw_rd32(hw, GMAC_CONTROL);
> +
> +        if (enable)
> +                value |= GMAC_CONTROL_RE;
> +        else
> +                value &= ~GMAC_CONTROL_RE;
> +
> +        mucse_hw_wr32(hw, GMAC_CONTROL, value);
> +
> +        if (enable)
> +                mucse_hw_wr32(hw, GMAC_FRAME_FILTER, GMAC_RX_ALL);
> +        else
> +                mucse_hw_wr32(hw, GMAC_FRAME_FILTER, 0);
> +}

On the DesignWare GMAC block this driver is using (GMAC_CONTROL at MAC offset 0x0, GMAC_FRAME_FILTER at 0x4), bit 31 of the frame filter is RA (Receive All) and bit 0 is PR (Promiscuous Mode).
Given:

    #define GMAC_RX_ALL (BIT(31) | BIT(0))

every link-up transition forces both bits on, with no destination-address filtering applied at all. Is that the intended steady-state receive configuration for this NIC? Also, because rnpgbe_watchdog_link_is_up() re-applies this filter on every link change, any future ndo_set_rx_mode() would be overwritten on the next carrier flap. Would a perfect-match filter driven by netdev->flags and the uc/mc lists be more appropriate?

Answer:
This is intentional for the initial submission. The hardware has a separate register set for MAC/promiscuous filtering beyond the standard DW GMAC frame filter; full Rx mode management will be added together with ndo_set_rx_mode() in a follow-up series. The current GMAC_RX_ALL ensures basic connectivity works without packet loss.

13. Sashiko:
> diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe.h b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe.h
> index 5b024f9f7e17..cbe60f168346 100644
> --- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe.h
> +++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe.h
> @@ -35,21 +39,63 @@ enum {
>
>  struct mucse_hw {
>          void __iomem *hw_addr;
> +        void __iomem *ring_msix_base;
>          struct pci_dev *pdev;
>          struct mucse_mbx_info mbx;
>          int port;
>          u8 pfvfnum;
>  };
>
> +struct mucse_ring {
> +        struct mucse_ring *next;
> +        struct mucse_q_vector *q_vector;
> +        void __iomem *ring_addr;
> +        void __iomem *irq_mask;
> +        void __iomem *trig;
> +        u8 queue_index;
> +        /* hw ring idx */
> +        u8 rnpgbe_queue_idx;
> +} ____cacheline_internodealigned_in_smp;

This isn't a bug, but the driver mixes the mucse_ and rnpgbe_ prefixes throughout. The Makefile and module are named rnpgbe, yet the core structures are mucse_hw, mucse_ring, mucse_q_vector, and the helpers mucse_add_ring(), mucse_for_each_ring, mucse_fw_irq_handler(), mucse_hw_rd32() coexist with rnpgbe_poll(), rnpgbe_request_irq(), rnpgbe_set_ring_vector(), etc.
Is there a planned partitioning between a shared mucse common layer and rnpgbe-specific code, or would it be preferable to settle on a single prefix to avoid future collisions if another mucse PCIe driver appears?

Answer:
The mucse_ and rnpgbe_ prefixes follow a layered convention:

mucse_: Hardware abstraction layer. These operate on the mucse_hw / mucse_ring / mucse_q_vector structures and contain logic that is shared across Mucse product lines (e.g., ring traversal, register RMW, mailbox protocol). They are not specific to the RNPGBE device and are expected to be reused by future Mucse PCIe drivers.

rnpgbe_: Driver-level operations. These implement the ndo callbacks, interrupt policy, NAPI poll, and other Linux networking stack integration that is specific to the rnpgbe driver instance.

This is an intentional separation: the mucse layer is the cross-product hardware library, and the rnpgbe layer is the Linux driver that uses it.

14. Sashiko:
> +static void rnpgbe_irq_enable_queues(struct mucse_q_vector *q_vector)
> +{
> +        struct mucse_ring *ring;
> +
> +        /* tx/rx use one register, different bit */
> +        mucse_for_each_ring(ring, q_vector->tx) {
> +                writel(0, ring->irq_mask);
> +                writel(INT_VALID | TX_INT_MASK | RX_INT_MASK, ring->trig);

Does this write to ring->trig during NAPI unmasking risk stalling queues or causing an interrupt storm? If ring->trig is an Interrupt Status/ACK register (W1C), writing to it will clear any pending interrupts that arrived between the end of the NAPI poll and this unmasking step, resulting in a lost interrupt. If it is a software interrupt trigger register, writing to it will unconditionally force an interrupt, leading to an endless interrupt storm. Should interrupt status clearance happen only in the hardirq handler instead?

Answer:
The RNPGBE_DMA_INT_TRIG register is neither a W1C status register nor a software interrupt trigger.
It is a hardware lost-interrupt recovery mechanism: writing to it instructs the hardware to check whether any interrupt events were missed while the queue was masked. If the hardware detects a lost event, it re-issues the interrupt after a short delay.

15. Sashiko:
> diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c
> index 70a2b0082ba86..9e541c2419232 100644
> --- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c
> +++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c
> @@ -32,11 +33,28 @@ static struct pci_device_id rnpgbe_pci_tbl[] = {
>   * The open entry point is called when a network interface is made
>   * active by the system (IFF_UP).
>   *
> - * Return: 0
> + * Return: 0 on success, negative value on failure
>   **/
>  static int rnpgbe_open(struct net_device *netdev)
>  {
> +        struct mucse *mucse = netdev_priv(netdev);
> +        int err;
> +
> +        err = rnpgbe_request_irq(mucse);
> +        if (err)
> +                return err;
> +
> +        err = netif_set_real_num_queues(netdev, mucse->num_tx_queues,
> +                                        mucse->num_rx_queues);
> +        if (err)
> +                goto err_free_irqs;
> +
> +        rnpgbe_up_complete(mucse);

Is it safe to call rnpgbe_request_irq() before napi_enable() (which is called inside rnpgbe_up_complete())? This exposes a race window where an interrupt can fire, mask the queues, and call napi_schedule_irqoff(). Because napi_enable() has not been called yet, the NAPI state still has NAPI_STATE_SCHED set, causing napi_schedule_prep() to fail and napi_schedule_irqoff() to be a no-op. When napi_enable() is subsequently called, it unconditionally clears the NAPI state, losing the missed schedule. For edge-triggered interrupts, the condition won't re-assert by unmasking, leading to permanently stalled queues. Could NAPI be fully initialized and enabled before registering the IRQ handler?
Answer:
For the per-ring MSI-X handlers (rnpgbe_msix_clean_rings), this is safe because the hardware interrupt source remains masked until rnpgbe_irq_enable(), which runs after napi_enable_all(). No interrupt can fire between request_irq() and napi_enable(), so no race exists.

For the single-interrupt mode (rnpgbe_int_single), the handler is registered at probe time and persists across open/close cycles. Here we guard with __MUCSE_DOWN:

    rnpgbe_int_single():
        if (test_bit(__MUCSE_DOWN, &mucse->state))
            return IRQ_HANDLED; // device not ready, discard

__MUCSE_DOWN is set at probe (before any handler is registered) and only cleared after NAPI is fully enabled:

    rnpgbe_up_complete():
        rnpgbe_napi_enable_all(mucse);           // NAPI ready first
        clear_bit(__MUCSE_DOWN, &mucse->state);  // handler can proceed
        rnpgbe_irq_enable(mucse);                // hw unmasked last

The invariant is: clear_bit(__MUCSE_DOWN) happens strictly after napi_enable_all() and before rnpgbe_irq_enable(), so by the time the handler sees DOWN=0 and proceeds to napi_schedule_irqoff(), NAPI is already enabled. On the teardown side, rnpgbe_down() sets the bit then calls synchronize_irq(), guaranteeing in-flight handlers observe the transition.

Both paths are safe, just via different mechanisms.
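The ordering argument for the single-interrupt path can be illustrated with a small user-space model. This is only a sketch under stated assumptions: model_dev, model_init(), model_irq(), and model_up() are hypothetical names that do not exist in the driver, the atomics stand in for the __MUCSE_DOWN bit and the NAPI enable state, and interrupts are injected synchronously at each step boundary rather than arriving concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device state; not the driver's structures. */
struct model_dev {
	atomic_bool down;         /* stands in for __MUCSE_DOWN */
	atomic_bool napi_enabled; /* stands in for the NAPI enable state */
	int scheduled;            /* interrupts that reached an enabled NAPI */
	int lost;                 /* interrupts that reached a disabled NAPI */
};

static void model_init(struct model_dev *d)
{
	assert(d != NULL);
	atomic_init(&d->down, true); /* DOWN is set at probe time */
	atomic_init(&d->napi_enabled, false);
	d->scheduled = 0;
	d->lost = 0;
}

/* Models the shared handler: discard the interrupt while DOWN is set. */
static void model_irq(struct model_dev *d)
{
	if (atomic_load(&d->down))
		return; /* device not ready, discard */
	if (atomic_load(&d->napi_enabled))
		d->scheduled++;
	else
		d->lost++; /* the stall scenario raised in the review */
}

/*
 * Bring the model up while injecting one interrupt at every step
 * boundary. napi_first selects the ordering described in the answer;
 * passing false swaps the first two steps to show the failure mode.
 */
static void model_up(struct model_dev *d, bool napi_first)
{
	model_irq(d); /* before any step: always discarded */
	if (napi_first) {
		atomic_store(&d->napi_enabled, true);
		model_irq(d); /* still DOWN: discarded */
		atomic_store(&d->down, false);
	} else {
		atomic_store(&d->down, false);
		model_irq(d); /* DOWN cleared but NAPI disabled: lost */
		atomic_store(&d->napi_enabled, true);
	}
	model_irq(d); /* hardware unmasked last: scheduled normally */
}
```

With napi_first set to true, no injected interrupt ever reaches a disabled NAPI instance in this model; with the first two steps swapped, exactly one does. The model does not cover the teardown side, where the answer relies on synchronize_irq().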
Dong Yibo (4):
  net: rnpgbe: Add interrupt handling
  net: rnpgbe: Add basic TX packet transmission support
  net: rnpgbe: Add RX packet reception support
  net: rnpgbe: Add link status handling support

 drivers/net/ethernet/mucse/Kconfig            |    1 +
 drivers/net/ethernet/mucse/rnpgbe/Makefile    |    3 +-
 drivers/net/ethernet/mucse/rnpgbe/rnpgbe.h    |  208 +-
 .../net/ethernet/mucse/rnpgbe/rnpgbe_chip.c   |   45 +-
 drivers/net/ethernet/mucse/rnpgbe/rnpgbe_hw.h |   19 +
 .../net/ethernet/mucse/rnpgbe/rnpgbe_lib.c    | 2156 +++++++++++++++++
 .../net/ethernet/mucse/rnpgbe/rnpgbe_lib.h    |   87 +
 .../net/ethernet/mucse/rnpgbe/rnpgbe_main.c   |   99 +-
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c    |   24 +
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h    |    1 +
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c |  228 +-
 .../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.h |   39 +
 12 files changed, 2887 insertions(+), 23 deletions(-)
 create mode 100644 drivers/net/ethernet/mucse/rnpgbe/rnpgbe_lib.c
 create mode 100644 drivers/net/ethernet/mucse/rnpgbe/rnpgbe_lib.h

--
2.25.1