[PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor

* [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
@ 2026-05-14  9:51 Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Add a driver framework for EEA that will be available in the future.

This driver is currently quite minimal, implementing only fundamental
core functionalities. Key features include: I/O queue management via
adminq, basic PCI-layer operations, and essential RX/TX data
communication capabilities. It also supports the creation,
initialization, and management of network devices (netdev). Furthermore,
the ring structures for both I/O queues and adminq have been abstracted
into a simple, unified, and reusable library implementation,
facilitating future extension and maintenance.

v43:
    1. fix the comments from sashiko

v42:
    1. fix the comments from Jakub
    2. fix the comments from sashiko

v41:
    1. make https://sashiko.dev/ happy

v40:
    1. fix comments from https://sashiko.dev/

v39:
    1. fix comments from https://sashiko.dev/#/patchset/20260407122001.22265-1-xuanzhuo%40linux.alibaba.com

v38:
    1. fix comments from https://sashiko.dev/#/patchset/20260404135144.33166-1-xuanzhuo%40linux.alibaba.com

v36:
    1. fix comments from https://sashiko.dev/#/patchset/20260323074441.91691-1-xuanzhuo%40linux.alibaba.com

v35:
    1. fix comments from https://netdev-ai.bots.linux.dev/ai-review.html?id=24371ffc-a1ea-46e1-a6b3-b8cbcbb52efb

v34:
    1. fix the comments from https://sashiko.dev/#/patchset/20260317015257.79494-1-xuanzhuo%40linux.alibaba.com

v33:
    1. fix the comments. The old version refers to enet_bind_new_q_and_cfg,
       which has been renamed to eea_bind_q_and_cfg.

v32:
    1. remove unused parameters from eea_net_ha_reset_remove and eea_create_pp
    2. call skb_mark_for_recycle for new skb for frag_list skbs

v31:
    1. remove unused parameter "enet" from eex_rx_post()

v30:
   1. fix some small problems

v29:
    Address AI-generated review suggestions from the previous version.

v28:
    Address AI-generated review suggestions from the previous version.

v27:
    Address AI-generated review suggestions from the previous version.

v26:
    Adopting suggestions from the previous AI review, another significant
    change is the introduction of an IRQ block to implement IRQ proxying. With
    this design, when an IRQ occurs, we no longer invoke the RX data structure
    directly -- instead, the IRQ block serves as an intermediary proxy. This
    approach offers several advantages: IRQ resources no longer need to be
    reallocated during reset operations, and IRQs are decoupled from RX
    structures. Consequently, when certain errors occur, we can fall back and
    safely reuse the original memory resources.
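
The IRQ-block idea described for v26 can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the names
irq_blk, rx_queue, irq_blk_bind and irq_blk_fire are hypothetical and do not
appear in this series, and a real driver would go through request_irq() and
NAPI rather than plain function calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RX queue state; recreated on every reset. */
struct rx_queue {
	unsigned int polls;	/* times this queue was scheduled */
};

/* Hypothetical IRQ block; allocated once and kept across resets. */
struct irq_blk {
	int vector;		/* stands in for the MSI-X vector */
	struct rx_queue *rx;	/* NULL while no RX queue is bound */
	unsigned int spurious;	/* interrupts seen with no queue bound */
};

/* Bind or unbind (rx == NULL) the RX queue behind the proxy. */
static void irq_blk_bind(struct irq_blk *blk, struct rx_queue *rx)
{
	assert(blk != NULL);
	blk->rx = rx;
}

/* Model of the handler: it only ever dereferences the IRQ block. */
static void irq_blk_fire(struct irq_blk *blk)
{
	assert(blk != NULL);

	if (!blk->rx) {
		blk->spurious++;
		return;
	}

	blk->rx->polls++;
}
```

In this model a reset only swaps the rx pointer; the vector stored in the
block is never released or requested again, which mirrors the decoupling
described above.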

v25:
    I have adopted most of the suggestions from the AI's feedback. However, I
    believe some of the feedback is incorrect. I have already replied in the
    previous thread. http://lore.kernel.org/all/1770002612.3297296-2-xuanzhuo@linux.alibaba.com

v24:
    1. Add null checks for enet->rx and enet->tx in eea_get_ethtool_stat to
       prevent errors when reading rx = enet->rx[i] in case enet->rx is null.
       tx is similar. With rtnl protection in place, this check is sufficient.
    2. Use 'received' as the return value in eea_poll.

v23:
    I have moved netif_set_real_num_queues() out of eea_start_rxtx(), so
    eea_start_rxtx() is now a void function. I believe enet_bind_new_q_and_cfg()
    is a more suitable place to include netif_set_real_num_queues(). In
    eea_active_ring_and_irq(), I first execute request_irq() before interacting
    with the hardware to create queues. Therefore, during the NIC setup process,
    all driver-internal operations (memory allocation, IRQ initialization, sysfs
    configuration, etc.) will be completed before the final notification to the
    hardware.

v22:
    1. Use the budget from the NAPI poll function as the parameter for
       napi_consume_skb.
    2. Stop the TX queue when the remaining ring slots cannot hold an SKB.
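
The second v22 item follows a common pattern that can be sketched in
userspace as shown here. Everything in the sketch is an assumption made for
illustration: the tx_ring_model type, its helpers, and the worst case of 18
descriptors per packet (one linear part plus 17 fragments) are hypothetical
and are not taken from eea_tx.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed worst case: one descriptor for the linear part plus 17 frags. */
#define MODEL_MAX_DESCS_PER_PKT (1 + 17)

struct tx_ring_model {
	unsigned int num;	/* total descriptors in the ring */
	unsigned int used;	/* descriptors currently owned by hardware */
	bool stopped;		/* stands in for a stopped netdev TX queue */
};

static unsigned int tx_ring_free(const struct tx_ring_model *r)
{
	return r->num - r->used;
}

/* Queue a packet that needs ndesc descriptors; returns false if refused. */
static bool tx_ring_xmit(struct tx_ring_model *r, unsigned int ndesc)
{
	assert(ndesc >= 1 && ndesc <= MODEL_MAX_DESCS_PER_PKT);

	if (r->stopped || tx_ring_free(r) < ndesc)
		return false;

	r->used += ndesc;

	/* Stop early so the next worst-case packet can never overflow. */
	if (tx_ring_free(r) < MODEL_MAX_DESCS_PER_PKT)
		r->stopped = true;

	return true;
}

/* Completion path: reclaim descriptors and wake the queue if possible. */
static void tx_ring_complete(struct tx_ring_model *r, unsigned int ndesc)
{
	assert(ndesc <= r->used);

	r->used -= ndesc;
	if (r->stopped && tx_ring_free(r) >= MODEL_MAX_DESCS_PER_PKT)
		r->stopped = false;
}
```

In a real driver the stopped flag would correspond to calls such as
netif_tx_stop_queue() and netif_tx_wake_queue() on the matching TX queue.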

v21:
    Fix two issues from the previous version:
    1. A DMA unmap operation was missing.
    2. RCU APIs were not used in eea_stats. Although the standard practice when
       using RCU would require adding the __rcu annotation to both the rx and
       tx fields, in many cases these fields are read without needing RCU
       protection. Therefore, I do not want to add the __rcu annotation.
       Instead, I use a spin lock to protect modifications to rx and tx.

v20:
    Fix the partially initialized structure passed to db. @Jakub
    http://lore.kernel.org/all/20260113172353.2ae6ef81@kernel.org

v19:
    fix the comments from @Simon Horman

v18:
    v17 with [PATCH] prefix.

v17:
    1. In `eea_adminq_dev_status`, uniformly use `enet->cfg.rx_ring_num`.
    2. Add a `struct eea_net_cfg *cfg` parameter to `eea_free_rx` and
        `eea_free_tx`. When called in the normal path, pass `enet->cfg` as
        the argument; when called during initialization, pass the temporary
        `cfg` instead.
    3. Move the `.ndo_get_stats64` callback into `eea_net.c`.
    4. In the `.ndo_get_stats64` callback, add a comment explaining how the TX
        and RX statistics are protected by RCU.

       /* This function is protected by RCU. Here uses enet->tx and enet->rx
        * to check whether the TX and RX structures are safe to access. In
        * eea_free_rxtx_q_mem, before freeing the TX and RX resources, enet->rx
        * and enet->tx are set to NULL, and synchronize_net is called.
        */


v16:
    1. follow the advice from @ALOK TIWARI
       http://lore.kernel.org/all/5ff95a71-69e5-4cb6-9b2a-5224c983bdc2@oracle.com

v15:
    1. remove 'default m' from eea kconfig
    2. free the resources when open failed.

v14:
    1. some tiny fixes

v13:
    1. some tiny fixes @Simon

v12:
    I encountered some issues with sending the v11 patches, as they were quite
    messy. Therefore, I'm resending them as v12.

v11:
    1. remove auto clean __free(kfree)
    2. some tiny fixes

v10:
    1. name the jump labels after the target @Jakub
    2. rm __GFP_ZERO from dma_alloc_coherent @Jakub

v9:
    1. some fixes for ethtool from http://lore.kernel.org/all/20251027183754.52fe2a2c@kernel.org

v8: 1. rename eea_net_tmp to eea_net_init_ctx
    2. rm code that allocs memory to destroy queues
    3. some other minor changes

v7: 1. remove the irrelevant code from the ethtool commit
    2. build every commit with W=12

v6: Split the one big commit into five commits
v5: Thanks for the comments from Kalesh Anakkur Purayil, ALOK TIWARI
v4: Thanks for the comments from Troy Mitchell, Przemek Kitszel, Andrew Lunn, Kalesh Anakkur Purayil
v3: Thanks for the comments from Paolo Abeni
v2: Thanks for the comments from Simon Horman and Andrew Lunn
v1: Thanks for the comments from Simon Horman and Andrew Lunn

Xuan Zhuo (8):
  eea: introduce PCI framework
  eea: introduce ring and descriptor structures
  eea: probe the netdevice and create adminq
  eea: create/destroy rx,tx queues for netdevice open and stop
  eea: implement packet receive logic
  eea: implement packet transmit logic
  eea: introduce ethtool support
  eea: introduce callback for ndo_get_stats64 and register netdev

 MAINTAINERS                                   |   8 +
 drivers/net/ethernet/Kconfig                  |   1 +
 drivers/net/ethernet/Makefile                 |   1 +
 drivers/net/ethernet/alibaba/Kconfig          |  28 +
 drivers/net/ethernet/alibaba/Makefile         |   5 +
 drivers/net/ethernet/alibaba/eea/Makefile     |   9 +
 drivers/net/ethernet/alibaba/eea/eea_adminq.c | 542 +++++++++++
 drivers/net/ethernet/alibaba/eea/eea_adminq.h |  83 ++
 drivers/net/ethernet/alibaba/eea/eea_desc.h   | 138 +++
 .../net/ethernet/alibaba/eea/eea_ethtool.c    | 273 ++++++
 .../net/ethernet/alibaba/eea/eea_ethtool.h    |  48 +
 drivers/net/ethernet/alibaba/eea/eea_net.c    | 887 ++++++++++++++++++
 drivers/net/ethernet/alibaba/eea/eea_net.h    | 198 ++++
 drivers/net/ethernet/alibaba/eea/eea_pci.c    | 744 +++++++++++++++
 drivers/net/ethernet/alibaba/eea/eea_pci.h    |  73 ++
 drivers/net/ethernet/alibaba/eea/eea_ring.c   | 249 +++++
 drivers/net/ethernet/alibaba/eea/eea_ring.h   |  99 ++
 drivers/net/ethernet/alibaba/eea/eea_rx.c     | 814 ++++++++++++++++
 drivers/net/ethernet/alibaba/eea/eea_tx.c     | 500 ++++++++++
 19 files changed, 4700 insertions(+)
 create mode 100644 drivers/net/ethernet/alibaba/Kconfig
 create mode 100644 drivers/net/ethernet/alibaba/Makefile
 create mode 100644 drivers/net/ethernet/alibaba/eea/Makefile
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_adminq.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_adminq.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_desc.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ethtool.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ethtool.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_net.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_net.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_pci.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_pci.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ring.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ring.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_rx.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_tx.c

--
2.32.0.3.g01195cf9f



* [PATCH net-next v43 1/8] eea: introduce PCI framework
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  3:06   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures Xuan Zhuo
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit implements the EEA PCI probe functionality.
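
As a worked example of the doorbell checks that the probe path in eea_pci.c
performs, the following userspace sketch recomputes the doorbell region
length and the offset acceptance rule. The constants are copied from this
patch; the db_layout type and the helper names are hypothetical, and the
reading of the extra "+ 1" block as belonging to the admin queue is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from eea_pci.c in this patch. */
#define EEA_PCI_DB_OFFSET   4096
#define EEA_PCI_DB_MIN_SIZE 8
#define EEA_PCI_DB_MAX_SIZE 512
#define EEA_PCI_Q_MAX_NUM   1000

/* Hypothetical container for the values read from the device. */
struct db_layout {
	uint32_t rx_num;
	uint32_t tx_num;
	uint32_t db_blk_size;
};

/* Same limits as the checks in eea_pci_setup(). */
static bool db_layout_valid(const struct db_layout *l)
{
	if (l->rx_num > EEA_PCI_Q_MAX_NUM || l->tx_num > EEA_PCI_Q_MAX_NUM)
		return false;

	if (l->db_blk_size % 8 ||
	    l->db_blk_size < EEA_PCI_DB_MIN_SIZE ||
	    l->db_blk_size > EEA_PCI_DB_MAX_SIZE)
		return false;

	return true;
}

/* One block per RX queue, one per TX queue, plus one more block
 * (assumed here to be for the admin queue).
 */
static uint32_t db_len(const struct db_layout *l)
{
	assert(db_layout_valid(l));
	return l->db_blk_size * (l->rx_num + l->tx_num + 1);
}

/* Minimum BAR 0 length that covers the registers and all doorbells. */
static uint64_t db_min_bar_len(const struct db_layout *l)
{
	return (uint64_t)EEA_PCI_DB_OFFSET + db_len(l);
}

/* Same acceptance rule as eea_pci_db_addr(): 8-byte aligned, in range. */
static bool db_off_valid(const struct db_layout *l, uint32_t off)
{
	if (off % 8)
		return false;

	return off <= db_len(l) - l->db_blk_size;
}
```

With 4 RX queues, 4 TX queues and a 64-byte doorbell block, the region is
64 * 9 = 576 bytes long, BAR 0 must be at least 4096 + 576 = 4672 bytes, and
the largest accepted offset is 576 - 64 = 512.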

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 MAINTAINERS                                |   8 +
 drivers/net/ethernet/Kconfig               |   1 +
 drivers/net/ethernet/Makefile              |   1 +
 drivers/net/ethernet/alibaba/Kconfig       |  28 ++
 drivers/net/ethernet/alibaba/Makefile      |   5 +
 drivers/net/ethernet/alibaba/eea/Makefile  |   3 +
 drivers/net/ethernet/alibaba/eea/eea_pci.c | 487 +++++++++++++++++++++
 drivers/net/ethernet/alibaba/eea/eea_pci.h |  50 +++
 8 files changed, 583 insertions(+)
 create mode 100644 drivers/net/ethernet/alibaba/Kconfig
 create mode 100644 drivers/net/ethernet/alibaba/Makefile
 create mode 100644 drivers/net/ethernet/alibaba/eea/Makefile
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_pci.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_pci.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5bbbbde6b907..829015eaa3ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -802,6 +802,14 @@ S:	Maintained
 F:	Documentation/i2c/busses/i2c-ali1563.rst
 F:	drivers/i2c/busses/i2c-ali1563.c
 
+ALIBABA ELASTIC ETHERNET ADAPTER DRIVER
+M:	Xuan Zhuo <xuanzhuo@linux.alibaba.com>
+M:	Wen Gu <guwen@linux.alibaba.com>
+R:	Philo Lu <lulie@linux.alibaba.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/alibaba/eea
+
 ALIBABA ELASTIC RDMA DRIVER
 M:	Cheng Xu <chengyou@linux.alibaba.com>
 M:	Kai Shen <kaishen@linux.alibaba.com>
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index b8f70e2a1763..78c79ad7bba5 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -22,6 +22,7 @@ source "drivers/net/ethernet/aeroflex/Kconfig"
 source "drivers/net/ethernet/agere/Kconfig"
 source "drivers/net/ethernet/airoha/Kconfig"
 source "drivers/net/ethernet/alacritech/Kconfig"
+source "drivers/net/ethernet/alibaba/Kconfig"
 source "drivers/net/ethernet/allwinner/Kconfig"
 source "drivers/net/ethernet/altera/Kconfig"
 source "drivers/net/ethernet/amazon/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 57344fec6ce0..bba55d9af387 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_NET_VENDOR_ADI) += adi/
 obj-$(CONFIG_NET_VENDOR_AGERE) += agere/
 obj-$(CONFIG_NET_VENDOR_AIROHA) += airoha/
 obj-$(CONFIG_NET_VENDOR_ALACRITECH) += alacritech/
+obj-$(CONFIG_NET_VENDOR_ALIBABA) += alibaba/
 obj-$(CONFIG_NET_VENDOR_ALLWINNER) += allwinner/
 obj-$(CONFIG_ALTERA_TSE) += altera/
 obj-$(CONFIG_NET_VENDOR_AMAZON) += amazon/
diff --git a/drivers/net/ethernet/alibaba/Kconfig b/drivers/net/ethernet/alibaba/Kconfig
new file mode 100644
index 000000000000..b8fd3c92fa4c
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/Kconfig
@@ -0,0 +1,28 @@
+#
+# Alibaba network device configuration
+#
+
+config NET_VENDOR_ALIBABA
+	bool "Alibaba Devices"
+	default y
+	help
+	  If you have a network (Ethernet) device belonging to this class, say Y.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Alibaba devices. If you say Y, you will be asked
+	  for your specific device in the following questions.
+
+if NET_VENDOR_ALIBABA
+
+config ALIBABA_EEA
+	tristate "Alibaba Elastic Ethernet Adapter support"
+	depends on PCI_MSI
+	depends on 64BIT
+	select PAGE_POOL
+	help
+	  This driver supports Alibaba Elastic Ethernet Adapter.
+
+	  To compile this driver as a module, choose M here.
+
+endif #NET_VENDOR_ALIBABA
diff --git a/drivers/net/ethernet/alibaba/Makefile b/drivers/net/ethernet/alibaba/Makefile
new file mode 100644
index 000000000000..63a527e79ba7
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Alibaba network device drivers.
+#
+
+obj-$(CONFIG_ALIBABA_EEA) += eea/
diff --git a/drivers/net/ethernet/alibaba/eea/Makefile b/drivers/net/ethernet/alibaba/eea/Makefile
new file mode 100644
index 000000000000..a07f3019ec1f
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/Makefile
@@ -0,0 +1,3 @@
+
+obj-$(CONFIG_ALIBABA_EEA) += eea.o
+eea-y := eea_pci.o
diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
new file mode 100644
index 000000000000..65a0ceb73b35
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
@@ -0,0 +1,487 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/iopoll.h>
+
+#include "eea_pci.h"
+
+#define EEA_PCI_DB_OFFSET 4096
+#define EEA_PCI_DB_MIN_SIZE 8
+#define EEA_PCI_DB_MAX_SIZE 512
+#define EEA_PCI_Q_MAX_NUM 1000
+
+struct eea_pci_cfg {
+	__le32 reserve0;
+	__le32 reserve1;
+	__le32 drv_f_idx;
+	__le32 drv_f;
+
+#define EEA_S_INIT         (BIT(0) | BIT(1))
+#define EEA_S_OK           BIT(2)
+#define EEA_S_FEATURE_DONE BIT(3)
+#define EEA_S_FAILED       BIT(7)
+	u8   device_status;
+	u8   reserved[7];
+
+	__le32 rx_num_max;
+	__le32 tx_num_max;
+	__le32 db_blk_size;
+
+	/* admin queue cfg */
+	__le16 aq_size;
+	__le16 aq_msix_vector;
+	__le32 aq_db_off;
+
+	__le32 aq_sq_addr;
+	__le32 aq_sq_addr_hi;
+	__le32 aq_cq_addr;
+	__le32 aq_cq_addr_hi;
+
+	__le32 reserved1;
+	__le64 hw_ts;
+};
+
+struct eea_pci_device {
+	struct eea_device edev;
+	struct pci_dev *pci_dev;
+
+	u32 msix_vec_n;
+	u32 db_len;
+
+	void __iomem *reg;
+	void __iomem *db_base;
+	void __iomem *db_end;
+
+	bool shutdown;
+};
+
+#define cfg_pointer(reg, item) \
+	((void __iomem *)((reg) + offsetof(struct eea_pci_cfg, item)))
+
+#define cfg_write8(reg, item, val) iowrite8(val, cfg_pointer(reg, item))
+#define cfg_write32(reg, item, val) iowrite32(val, cfg_pointer(reg, item))
+
+#define cfg_read8(reg, item) ioread8(cfg_pointer(reg, item))
+#define cfg_read32(reg, item) ioread32(cfg_pointer(reg, item))
+#define cfg_read64(reg, item) ioread64(cfg_pointer(reg, item))
+
+const char *eea_pci_name(struct eea_device *edev)
+{
+	return pci_name(edev->ep_dev->pci_dev);
+}
+
+int eea_pci_domain_nr(struct eea_device *edev)
+{
+	return pci_domain_nr(edev->ep_dev->pci_dev->bus);
+}
+
+u16 eea_pci_bdf(struct eea_device *edev)
+{
+	return pci_dev_id(edev->ep_dev->pci_dev);
+}
+
+static void eea_pci_io_set_status(struct eea_device *edev, u8 status)
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+
+	cfg_write8(ep_dev->reg, device_status, status);
+}
+
+static u8 eea_pci_io_get_status(struct eea_device *edev)
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+
+	return cfg_read8(ep_dev->reg, device_status);
+}
+
+static void eea_add_status(struct eea_device *dev, u32 status)
+{
+	eea_pci_io_set_status(dev, eea_pci_io_get_status(dev) | status);
+}
+
+#define EEA_RESET_TIMEOUT_US (60 * 1000 * 1000)
+
+int eea_device_reset(struct eea_device *edev)
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+	int err;
+	u8 val;
+
+	eea_pci_io_set_status(edev, 0);
+
+	/* We are no longer waiting for device ack during the shutdown flow. */
+	if (ep_dev->shutdown)
+		return 0;
+
+	/* A longer timeout is set here to handle edge cases, though it should
+	 * return promptly in most scenarios.
+	 *
+	 * In our case, all replies are handled by the DPU software, so there is
+	 * no race condition between the hardware processes and the register.
+	 */
+	err = read_poll_timeout(cfg_read8, val, (!val || val == 0xFF), 20,
+				EEA_RESET_TIMEOUT_US,
+				false, ep_dev->reg, device_status);
+
+	/* Surprise PCIe Removal */
+	if (val == 0xFF)
+		return -EINVAL;
+
+	return err;
+}
+
+int eea_pci_set_aq_up(struct eea_device *edev)
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+	u8 status = eea_pci_io_get_status(edev);
+	int err;
+	u8 val;
+
+	eea_pci_io_set_status(edev, status | EEA_S_OK);
+
+	/* A longer timeout is set here to handle edge cases, though it should
+	 * return promptly in most scenarios.
+	 *
+	 * In our case, all replies are handled by the DPU software, so there is
+	 * no race condition between the hardware processes and the register.
+	 */
+	err = read_poll_timeout(cfg_read8, val,
+				val & (EEA_S_OK | EEA_S_FAILED),
+				20, EEA_RESET_TIMEOUT_US,
+				false, ep_dev->reg, device_status);
+
+	/* Surprise PCIe Removal */
+	if (val == 0xFF)
+		return -EINVAL;
+
+	/* device fail */
+	if (val & EEA_S_FAILED)
+		return -EINVAL;
+
+	return err;
+}
+
+static int eea_negotiate(struct eea_device *edev)
+{
+	struct eea_pci_device *ep_dev;
+	u32 status;
+
+	ep_dev = edev->ep_dev;
+
+	edev->features = 0;
+
+	cfg_write32(ep_dev->reg, drv_f_idx, 0);
+	cfg_write32(ep_dev->reg, drv_f, lower_32_bits(edev->features));
+	cfg_write32(ep_dev->reg, drv_f_idx, 1);
+	cfg_write32(ep_dev->reg, drv_f, upper_32_bits(edev->features));
+
+	eea_add_status(edev, EEA_S_FEATURE_DONE);
+	status = eea_pci_io_get_status(edev);
+
+	/* Surprise PCIe Removal */
+	if (status == 0xFF)
+		return -EINVAL;
+
+	if (!(status & EEA_S_FEATURE_DONE))
+		return -ENODEV;
+
+	return 0;
+}
+
+static void eea_pci_release_resource(struct eea_pci_device *ep_dev)
+{
+	struct pci_dev *pci_dev = ep_dev->pci_dev;
+
+	if (ep_dev->reg) {
+		pci_iounmap(pci_dev, ep_dev->reg);
+		ep_dev->reg = NULL;
+	}
+
+	if (ep_dev->msix_vec_n) {
+		ep_dev->msix_vec_n = 0;
+		pci_free_irq_vectors(ep_dev->pci_dev);
+	}
+
+	pci_clear_master(pci_dev);
+	pci_release_regions(pci_dev);
+	pci_disable_device(pci_dev);
+}
+
+static int eea_pci_setup(struct pci_dev *pci_dev, struct eea_pci_device *ep_dev)
+{
+	int err, n, ret, len;
+
+	ep_dev->pci_dev = pci_dev;
+
+	err = pci_enable_device(pci_dev);
+	if (err)
+		return err;
+
+	err = pci_request_regions(pci_dev, "EEA");
+	if (err)
+		goto err_disable_dev;
+
+	if (pci_resource_len(pci_dev, 0) < EEA_PCI_DB_OFFSET) {
+		dev_err(&pci_dev->dev, "Bar 0 is too small %llu\n",
+			(u64)pci_resource_len(pci_dev, 0));
+		err = -EINVAL;
+		goto err_release_regions;
+	}
+
+	ep_dev->reg = pci_iomap(pci_dev, 0, 0);
+	if (!ep_dev->reg) {
+		dev_err(&pci_dev->dev, "Failed to map pci bar!\n");
+		err = -ENOMEM;
+		goto err_release_regions;
+	}
+
+	err = eea_device_reset(&ep_dev->edev);
+	if (err) {
+		dev_err(&pci_dev->dev, "Failed to reset device for setup!\n");
+		goto err_unmap_reg;
+	}
+
+	err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
+	if (err) {
+		dev_warn(&pci_dev->dev, "Failed to enable 64-bit DMA.\n");
+		goto err_unmap_reg;
+	}
+
+	pci_set_master(pci_dev);
+
+	ep_dev->edev.rx_num = cfg_read32(ep_dev->reg, rx_num_max);
+	ep_dev->edev.tx_num = cfg_read32(ep_dev->reg, tx_num_max);
+
+	if (ep_dev->edev.rx_num > EEA_PCI_Q_MAX_NUM ||
+	    ep_dev->edev.tx_num > EEA_PCI_Q_MAX_NUM) {
+		dev_err(&pci_dev->dev, "Invalid queue num %u %u\n",
+			ep_dev->edev.rx_num,
+			ep_dev->edev.tx_num);
+		err = -EINVAL;
+		goto err_clear_master;
+	}
+
+	ep_dev->edev.db_blk_size = cfg_read32(ep_dev->reg, db_blk_size);
+	if (!IS_ALIGNED(ep_dev->edev.db_blk_size, 8) ||
+	    ep_dev->edev.db_blk_size > EEA_PCI_DB_MAX_SIZE ||
+	    ep_dev->edev.db_blk_size < EEA_PCI_DB_MIN_SIZE) {
+		dev_err(&pci_dev->dev, "Invalid db size %u\n",
+			ep_dev->edev.db_blk_size);
+		err = -EINVAL;
+		goto err_clear_master;
+	}
+
+	ep_dev->db_len = ep_dev->edev.db_blk_size * (ep_dev->edev.rx_num +
+						     ep_dev->edev.tx_num + 1);
+	ep_dev->db_base = ep_dev->reg + EEA_PCI_DB_OFFSET;
+	ep_dev->db_end = ep_dev->db_base + ep_dev->db_len;
+
+	len = ep_dev->db_end - ep_dev->reg;
+
+	if (pci_resource_len(pci_dev, 0) < len) {
+		dev_err(&pci_dev->dev, "Bar 0 is too small %llu\n",
+			(u64)pci_resource_len(pci_dev, 0));
+		err = -EINVAL;
+		goto err_clear_master;
+	}
+
+	/* In our design, the number of hardware interrupts matches the maximum
+	 * number of queues. If pci_alloc_irq_vectors failed, return directly.
+	 *
+	 * 2: adminq, error handle
+	 */
+	n = ep_dev->edev.rx_num + 2;
+	ret = pci_alloc_irq_vectors(ep_dev->pci_dev, n, n, PCI_IRQ_MSIX);
+	if (ret != n) {
+		err = ret;
+		goto err_clear_master;
+	}
+
+	ep_dev->msix_vec_n = ret;
+
+	return 0;
+
+err_clear_master:
+	pci_clear_master(pci_dev);
+
+err_unmap_reg:
+	pci_iounmap(pci_dev, ep_dev->reg);
+	ep_dev->reg = NULL;
+
+err_release_regions:
+	pci_release_regions(pci_dev);
+
+err_disable_dev:
+	pci_disable_device(pci_dev);
+
+	return err;
+}
+
+void __iomem *eea_pci_db_addr(struct eea_device *edev, u32 off)
+{
+	u32 max_off;
+
+	if (!IS_ALIGNED(off, 8))
+		return NULL;
+
+	max_off = edev->ep_dev->db_len - edev->db_blk_size;
+
+	if (off > max_off)
+		return NULL;
+
+	return edev->ep_dev->db_base + off;
+}
+
+u64 eea_pci_device_ts(struct eea_device *edev)
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+
+	return cfg_read64(ep_dev->reg, hw_ts);
+}
+
+static int eea_init_device(struct eea_device *edev)
+{
+	int err;
+
+	err = eea_device_reset(edev);
+	if (err)
+		return err;
+
+	eea_pci_io_set_status(edev, EEA_S_INIT);
+
+	err = eea_negotiate(edev);
+	if (err)
+		goto err;
+
+	/* do net device probe ... */
+
+	return 0;
+err:
+	eea_add_status(edev, EEA_S_FAILED);
+	return err;
+}
+
+static int __eea_pci_probe(struct pci_dev *pci_dev,
+			   struct eea_pci_device *ep_dev)
+{
+	int err;
+
+	pci_set_drvdata(pci_dev, ep_dev);
+
+	err = eea_pci_setup(pci_dev, ep_dev);
+	if (err)
+		return err;
+
+	err = eea_init_device(&ep_dev->edev);
+	if (err)
+		goto err_pci_rel;
+
+	return 0;
+
+err_pci_rel:
+	eea_pci_release_resource(ep_dev);
+	return err;
+}
+
+static void __eea_pci_remove(struct pci_dev *pci_dev)
+{
+	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
+	struct device *dev = get_device(&ep_dev->pci_dev->dev);
+
+	eea_pci_release_resource(ep_dev);
+
+	put_device(dev);
+}
+
+static int eea_pci_probe(struct pci_dev *pci_dev,
+			 const struct pci_device_id *id)
+{
+	struct eea_pci_device *ep_dev;
+	struct eea_device *edev;
+	int err;
+
+	ep_dev = kzalloc(sizeof(*ep_dev), GFP_KERNEL);
+	if (!ep_dev)
+		return -ENOMEM;
+
+	edev = &ep_dev->edev;
+
+	edev->ep_dev = ep_dev;
+	edev->dma_dev = &pci_dev->dev;
+
+	ep_dev->pci_dev = pci_dev;
+
+	err = __eea_pci_probe(pci_dev, ep_dev);
+	if (err) {
+		pci_set_drvdata(pci_dev, NULL);
+		kfree(ep_dev);
+	}
+
+	return err;
+}
+
+static void eea_pci_remove(struct pci_dev *pci_dev)
+{
+	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
+
+	eea_device_reset(&ep_dev->edev);
+
+	__eea_pci_remove(pci_dev);
+
+	pci_set_drvdata(pci_dev, NULL);
+	kfree(ep_dev);
+}
+
+static void eea_pci_shutdown(struct pci_dev *pci_dev)
+{
+	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
+	struct eea_device *edev;
+
+	edev = &ep_dev->edev;
+
+	ep_dev->shutdown = true;
+
+	/* do net device stop and clear. */
+
+	eea_device_reset(edev);
+
+	pci_clear_master(pci_dev);
+}
+
+static const struct pci_device_id eea_pci_id_table[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_ALIBABA, 0x500B) },
+	{ 0 }
+};
+
+MODULE_DEVICE_TABLE(pci, eea_pci_id_table);
+
+static struct pci_driver eea_pci_driver = {
+	.name            = "alibaba_eea",
+	.id_table        = eea_pci_id_table,
+	.probe           = eea_pci_probe,
+	.remove          = eea_pci_remove,
+	.shutdown        = eea_pci_shutdown,
+	.sriov_configure = pci_sriov_configure_simple,
+};
+
+static __init int eea_pci_init(void)
+{
+	return pci_register_driver(&eea_pci_driver);
+}
+
+static __exit void eea_pci_exit(void)
+{
+	pci_unregister_driver(&eea_pci_driver);
+}
+
+module_init(eea_pci_init);
+module_exit(eea_pci_exit);
+
+MODULE_DESCRIPTION("Driver for Alibaba Elastic Ethernet Adapter");
+MODULE_AUTHOR("Xuan Zhuo <xuanzhuo@linux.alibaba.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.h b/drivers/net/ethernet/alibaba/eea/eea_pci.h
new file mode 100644
index 000000000000..746cce4dd68e
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_pci.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#ifndef __EEA_PCI_H__
+#define __EEA_PCI_H__
+
+#include <linux/pci.h>
+
+struct eea_pci_cap {
+	__u8 cap_vndr;
+	__u8 cap_next;
+	__u8 cap_len;
+	__u8 cfg_type;
+};
+
+struct eea_pci_reset_reg {
+	struct eea_pci_cap cap;
+	__le16 driver;
+	__le16 device;
+};
+
+struct eea_pci_device;
+
+struct eea_device {
+	struct eea_pci_device *ep_dev;
+	struct device         *dma_dev;
+	struct eea_net        *enet;
+
+	u64 features;
+
+	u32 rx_num;
+	u32 tx_num;
+	u32 db_blk_size;
+};
+
+const char *eea_pci_name(struct eea_device *edev);
+int eea_pci_domain_nr(struct eea_device *edev);
+u16 eea_pci_bdf(struct eea_device *edev);
+
+int eea_device_reset(struct eea_device *dev);
+int eea_pci_set_aq_up(struct eea_device *dev);
+
+u64 eea_pci_device_ts(struct eea_device *edev);
+
+void __iomem *eea_pci_db_addr(struct eea_device *edev, u32 off);
+#endif
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  2:57   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq Xuan Zhuo
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit introduces the ring and descriptor implementations.

These structures and ring APIs are used by the RX, TX, and admin queues.
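
To illustrate how the doorbell offsets defined in eea_desc.h are combined,
here is a userspace sketch of the 64-bit value that eea_ering_irq_active()
writes. The macro values are copied from this patch; the helper name
model_db_irq_unmask_value is hypothetical, and the 16-bit width of the
hardware indexes is an assumption, since struct eea_ring is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from eea_desc.h in this patch. */
#define EEA_DB_FLAGS_OFF      0
#define EEA_DB_TX_CQ_HEAD_OFF (4 * 8)
#define EEA_DB_RX_CQ_HEAD_OFF (6 * 8)
#define EEA_IRQ_UNMASK        (1ULL << 2)

/* Build the doorbell value that unmasks the IRQ and reports both CQ heads.
 * Assumption: the hardware indexes are 16 bits wide.
 */
static uint64_t model_db_irq_unmask_value(uint16_t rx_idx, uint16_t tx_idx)
{
	uint64_t value = 0;

	/* The RX head occupies the top 16 bits of the 64-bit doorbell. */
	assert(EEA_DB_RX_CQ_HEAD_OFF + 16 <= 64);

	value |= EEA_IRQ_UNMASK << EEA_DB_FLAGS_OFF;
	value |= (uint64_t)tx_idx << EEA_DB_TX_CQ_HEAD_OFF;
	value |= (uint64_t)rx_idx << EEA_DB_RX_CQ_HEAD_OFF;

	return value;
}
```

For rx_idx = 0x1234 and tx_idx = 0xabcd the result is 0x1234abcd00000004:
the flags sit in the low bits, the TX completion head in bits 32-47 and the
RX completion head in bits 48-63.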

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/ethernet/alibaba/eea/Makefile   |   3 +-
 drivers/net/ethernet/alibaba/eea/eea_desc.h | 138 +++++++++++
 drivers/net/ethernet/alibaba/eea/eea_ring.c | 249 ++++++++++++++++++++
 drivers/net/ethernet/alibaba/eea/eea_ring.h |  99 ++++++++
 4 files changed, 488 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_desc.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ring.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ring.h

diff --git a/drivers/net/ethernet/alibaba/eea/Makefile b/drivers/net/ethernet/alibaba/eea/Makefile
index a07f3019ec1f..7d8e7e8c2f3a 100644
--- a/drivers/net/ethernet/alibaba/eea/Makefile
+++ b/drivers/net/ethernet/alibaba/eea/Makefile
@@ -1,3 +1,4 @@
 
 obj-$(CONFIG_ALIBABA_EEA) += eea.o
-eea-y := eea_pci.o
+eea-y :=  eea_ring.o \
+	eea_pci.o
diff --git a/drivers/net/ethernet/alibaba/eea/eea_desc.h b/drivers/net/ethernet/alibaba/eea/eea_desc.h
new file mode 100644
index 000000000000..8d94a0f0f237
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_desc.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#ifndef __EEA_DESC_H__
+#define __EEA_DESC_H__
+
+#define EEA_DESC_TS_MASK GENMASK_ULL(47, 0)
+#define EEA_DESC_TS(desc) (le64_to_cpu((desc)->ts) & EEA_DESC_TS_MASK)
+
+struct eea_aq_desc {
+	__le16 flags;
+	__le16 id;
+	__le16 reserved;
+	u8 classid;
+	u8 command;
+	__le64 data_addr;
+	__le64 reply_addr;
+	__le32 data_len;
+	__le32 reply_len;
+};
+
+struct eea_aq_cdesc {
+	__le16 flags;
+	__le16 id;
+#define EEA_OK     0
+#define EEA_ERR    0xffffffff
+	__le32 status;
+	__le32 reply_len;
+	__le32 reserved1;
+
+	__le64 reserved2;
+	__le64 reserved3;
+};
+
+struct eea_rx_desc_no_hdr {
+	__le16 flags;
+	__le16 id;
+	__le16 len;
+	__le16 reserved1;
+
+	__le64 addr;
+};
+
+struct eea_rx_desc {
+	__le16 flags;
+	__le16 id;
+	__le16 len;
+	__le16 reserved1;
+
+	__le64 addr;
+
+	__le64 hdr_addr;
+	__le32 reserved2;
+	__le32 reserved3;
+};
+
+#define EEA_RX_CDESC_HDR_LEN_MASK GENMASK_ULL(9, 0)
+
+struct eea_rx_cdesc {
+#define EEA_DESC_F_DATA_VALID	BIT(6)
+#define EEA_DESC_F_SPLIT_HDR	BIT(5)
+	__le16 flags;
+	__le16 id;
+	__le16 len;
+#define EEA_NET_PT_NONE      0
+#define EEA_NET_PT_IPv4      1
+#define EEA_NET_PT_TCPv4     2
+#define EEA_NET_PT_UDPv4     3
+#define EEA_NET_PT_IPv6      4
+#define EEA_NET_PT_TCPv6     5
+#define EEA_NET_PT_UDPv6     6
+#define EEA_NET_PT_IPv6_EX   7
+#define EEA_NET_PT_TCPv6_EX  8
+#define EEA_NET_PT_UDPv6_EX  9
+	/* [9:0] is packet type. */
+	__le16 type;
+
+	/* hw timestamp [0:47]: ts */
+	__le64 ts;
+
+	__le32 hash;
+
+	/* 0-9: hdr_len  split header
+	 * 10-15: reserved1
+	 */
+	__le16 len_ex;
+	__le16 reserved2;
+
+	__le32 reserved3;
+	__le32 reserved4;
+};
+
+#define EEA_TX_GSO_NONE   0
+#define EEA_TX_GSO_TCPV4  1
+#define EEA_TX_GSO_TCPV6  4
+#define EEA_TX_GSO_UDP_L4 5
+#define EEA_TX_GSO_ECN    0x80
+
+struct eea_tx_desc {
+#define EEA_DESC_F_DO_CSUM	BIT(6)
+	__le16 flags;
+	__le16 id;
+	__le16 len;
+	__le16 reserved1;
+
+	__le64 addr;
+
+	__le16 csum_start;
+	__le16 csum_offset;
+	u8 gso_type;
+	u8 reserved2;
+	__le16 gso_size;
+	__le64 reserved3;
+};
+
+struct eea_tx_cdesc {
+	__le16 flags;
+	__le16 id;
+	__le16 len;
+	__le16 reserved1;
+
+	/* hw timestamp: bits [47:0] of ts */
+	__le64 ts;
+};
+
+#define EEA_DB_FLAGS_OFF      0
+#define EEA_DB_IDX_OFF        (2 * 8)
+#define EEA_DB_TX_CQ_HEAD_OFF (4 * 8)
+#define EEA_DB_RX_CQ_HEAD_OFF (6 * 8)
+
+#define EEA_IDX_PRESENT   BIT(0)
+#define EEA_IRQ_MASK      BIT(1)
+#define EEA_IRQ_UNMASK    BIT(2)
+#endif
diff --git a/drivers/net/ethernet/alibaba/eea/eea_ring.c b/drivers/net/ethernet/alibaba/eea/eea_ring.c
new file mode 100644
index 000000000000..99dcabd094b8
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_ring.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include "eea_pci.h"
+#include "eea_ring.h"
+
+void eea_ering_irq_active(struct eea_ring *ering, struct eea_ring *tx_ering)
+{
+	u64 value = 0, rx_idx, tx_idx;
+
+	tx_idx = (u64)tx_ering->cq.hw_idx;
+	rx_idx = (u64)ering->cq.hw_idx;
+
+	value |= EEA_IRQ_UNMASK << EEA_DB_FLAGS_OFF;
+	value |= tx_idx << EEA_DB_TX_CQ_HEAD_OFF;
+	value |= rx_idx << EEA_DB_RX_CQ_HEAD_OFF;
+
+	writeq(value, ering->db);
+}
+
+void *eea_ering_cq_get_desc(const struct eea_ring *ering)
+{
+	u8 phase;
+	u8 *desc;
+
+	desc = ering->cq.desc + (ering->cq.head << ering->cq.desc_size_shift);
+
+	phase = READ_ONCE(*(u8 *)(desc + ering->cq.desc_size - 1));
+
+	if ((phase & EEA_RING_DESC_F_CQ_PHASE) == ering->cq.phase) {
+		dma_rmb();
+		return desc;
+	}
+
+	return NULL;
+}
+
+/* sq api */
+void *eea_ering_sq_alloc_desc(struct eea_ring *ering, u16 id, bool is_last,
+			      u16 flags)
+{
+	struct eea_ring_sq *sq = &ering->sq;
+	struct eea_common_desc *desc;
+
+	if (!sq->shadow_num) {
+		sq->shadow_idx = sq->head;
+		sq->shadow_id = cpu_to_le16(id);
+	}
+
+	if (!is_last)
+		flags |= EEA_RING_DESC_F_MORE;
+
+	desc = sq->desc + (sq->shadow_idx << sq->desc_size_shift);
+
+	desc->flags = cpu_to_le16(flags);
+	desc->id = sq->shadow_id;
+
+	if (unlikely(++sq->shadow_idx >= ering->num))
+		sq->shadow_idx = 0;
+
+	++sq->shadow_num;
+
+	return desc;
+}
+
+/* Allocation API for the admin queue. Each admin queue command allocates
+ * exactly one descriptor.
+ */
+void *eea_ering_aq_alloc_desc(struct eea_ring *ering)
+{
+	struct eea_ring_sq *sq = &ering->sq;
+	struct eea_common_desc *desc;
+
+	if (!sq->shadow_num)
+		sq->shadow_idx = sq->head;
+
+	desc = sq->desc + (sq->shadow_idx << sq->desc_size_shift);
+
+	if (unlikely(++sq->shadow_idx >= ering->num))
+		sq->shadow_idx = 0;
+
+	++sq->shadow_num;
+
+	return desc;
+}
+
+void eea_ering_sq_commit_desc(struct eea_ring *ering)
+{
+	struct eea_ring_sq *sq = &ering->sq;
+	int num;
+
+	num = sq->shadow_num;
+
+	ering->num_free -= num;
+
+	sq->head       = sq->shadow_idx;
+	sq->hw_idx     += num;
+	sq->shadow_num = 0;
+}
+
+void eea_ering_sq_cancel(struct eea_ring *ering)
+{
+	ering->sq.shadow_num = 0;
+}
+
+/* cq api */
+void eea_ering_cq_ack_desc(struct eea_ring *ering, u32 num)
+{
+	struct eea_ring_cq *cq = &ering->cq;
+
+	cq->head += num;
+	cq->hw_idx += num;
+
+	if (unlikely(cq->head >= ering->num)) {
+		cq->head -= ering->num;
+		cq->phase ^= EEA_RING_DESC_F_CQ_PHASE;
+	}
+
+	ering->num_free += num;
+}
+
+/* notify */
+void eea_ering_kick(struct eea_ring *ering)
+{
+	u64 value = 0, idx;
+
+	idx = (u64)ering->sq.hw_idx;
+
+	value |= EEA_IDX_PRESENT << EEA_DB_FLAGS_OFF;
+	value |= idx << EEA_DB_IDX_OFF;
+
+	writeq(value, ering->db);
+}
+
+/* ering alloc/free */
+static void ering_free_queue(struct eea_device *edev, size_t size,
+			     void *queue, dma_addr_t dma_handle)
+{
+	dma_free_coherent(edev->dma_dev, size, queue, dma_handle);
+}
+
+static void *ering_alloc_queue(struct eea_device *edev, size_t size,
+			       dma_addr_t *dma_handle)
+{
+	gfp_t flags = GFP_KERNEL | __GFP_NOWARN;
+
+	return dma_alloc_coherent(edev->dma_dev, size, dma_handle, flags);
+}
+
+static int ering_alloc_queues(struct eea_ring *ering, struct eea_device *edev,
+			      size_t num, u8 sq_desc_size, u8 cq_desc_size)
+{
+	dma_addr_t addr;
+	size_t size;
+	void *ring;
+
+	size = num * sq_desc_size;
+
+	ring = ering_alloc_queue(edev, size, &addr);
+	if (!ring)
+		return -ENOMEM;
+
+	ering->sq.desc     = ring;
+	ering->sq.dma_addr = addr;
+	ering->sq.dma_size = size;
+	ering->sq.desc_size = sq_desc_size;
+	ering->sq.desc_size_shift = fls(sq_desc_size) - 1;
+
+	size = num * cq_desc_size;
+
+	ring = ering_alloc_queue(edev, size, &addr);
+	if (!ring)
+		goto err_free_sq;
+
+	ering->cq.desc     = ring;
+	ering->cq.dma_addr = addr;
+	ering->cq.dma_size = size;
+	ering->cq.desc_size = cq_desc_size;
+	ering->cq.desc_size_shift = fls(cq_desc_size) - 1;
+
+	ering->num = num;
+
+	return 0;
+
+err_free_sq:
+	ering_free_queue(ering->edev, ering->sq.dma_size,
+			 ering->sq.desc, ering->sq.dma_addr);
+	return -ENOMEM;
+}
+
+static void ering_init(struct eea_ring *ering)
+{
+	ering->cq.phase = EEA_RING_DESC_F_CQ_PHASE;
+	ering->num_free = ering->num;
+}
+
+struct eea_ring *eea_ering_alloc(u32 index, u32 num, struct eea_device *edev,
+				 u8 sq_desc_size, u8 cq_desc_size,
+				 const char *name)
+{
+	struct eea_ring *ering;
+
+	if (num > EEA_NET_IO_HW_RING_DEPTH_MAX ||
+	    num < EEA_NET_IO_RING_DEPTH_MIN)
+		return NULL;
+
+	if (!is_power_of_2(num))
+		return NULL;
+
+	if (!sq_desc_size || !is_power_of_2(sq_desc_size))
+		return NULL;
+
+	if (!cq_desc_size || !is_power_of_2(cq_desc_size))
+		return NULL;
+
+	ering = kzalloc(sizeof(*ering), GFP_KERNEL);
+	if (!ering)
+		return NULL;
+
+	ering->edev = edev;
+	ering->name = name;
+	ering->index = index;
+
+	if (ering_alloc_queues(ering, edev, num, sq_desc_size, cq_desc_size))
+		goto err_free;
+
+	ering_init(ering);
+
+	return ering;
+
+err_free:
+	kfree(ering);
+	return NULL;
+}
+
+void eea_ering_free(struct eea_ring *ering)
+{
+	ering_free_queue(ering->edev, ering->cq.dma_size,
+			 ering->cq.desc, ering->cq.dma_addr);
+
+	ering_free_queue(ering->edev, ering->sq.dma_size,
+			 ering->sq.desc, ering->sq.dma_addr);
+
+	kfree(ering);
+}
diff --git a/drivers/net/ethernet/alibaba/eea/eea_ring.h b/drivers/net/ethernet/alibaba/eea/eea_ring.h
new file mode 100644
index 000000000000..a7ce465943a5
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_ring.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#ifndef __EEA_RING_H__
+#define __EEA_RING_H__
+
+#include <linux/dma-mapping.h>
+#include "eea_desc.h"
+
+#define EEA_RING_DESC_F_MORE		BIT(0)
+#define EEA_RING_DESC_F_CQ_PHASE	BIT(7)
+
+/* These two values define the bounds for the queue depth returned by the
+ * hardware.
+ */
+#define EEA_NET_IO_HW_RING_DEPTH_MAX (32 * 1024)
+#define EEA_NET_IO_HW_RING_DEPTH_MIN 128
+
+/* This value constrains the minimum queue depth that the driver configures for
+ * the hardware, which typically applies to user-provided settings. Naturally,
+ * the configured depth must also not exceed the maximum capacity supported by
+ * the hardware.
+ */
+#define EEA_NET_IO_RING_DEPTH_MIN 64
+
+struct eea_common_desc {
+	__le16 flags;
+	__le16 id;
+};
+
+struct eea_device;
+
+struct eea_ring_sq {
+	void *desc;
+
+	u16 head;
+	u16 hw_idx;
+
+	u16 shadow_idx;
+	__le16 shadow_id;
+	u16 shadow_num;
+
+	u8 desc_size;
+	u8 desc_size_shift;
+
+	dma_addr_t dma_addr;
+	u32 dma_size;
+};
+
+struct eea_ring_cq {
+	void *desc;
+
+	u16 head;
+	u16 hw_idx;
+
+	u8 phase;
+	u8 desc_size_shift;
+	u8 desc_size;
+
+	dma_addr_t dma_addr;
+	u32 dma_size;
+};
+
+struct eea_ring {
+	const char *name;
+	struct eea_device *edev;
+	u32 index;
+	void __iomem *db;
+	u16 msix_vec;
+
+	u32 num;
+
+	u32 num_free;
+
+	struct eea_ring_sq sq;
+	struct eea_ring_cq cq;
+};
+
+struct eea_ring *eea_ering_alloc(u32 index, u32 num, struct eea_device *edev,
+				 u8 sq_desc_size, u8 cq_desc_size,
+				 const char *name);
+void eea_ering_free(struct eea_ring *ering);
+void eea_ering_kick(struct eea_ring *ering);
+
+void *eea_ering_sq_alloc_desc(struct eea_ring *ering, u16 id,
+			      bool is_last, u16 flags);
+void *eea_ering_aq_alloc_desc(struct eea_ring *ering);
+void eea_ering_sq_commit_desc(struct eea_ring *ering);
+void eea_ering_sq_cancel(struct eea_ring *ering);
+
+void eea_ering_cq_ack_desc(struct eea_ring *ering, u32 num);
+
+void eea_ering_irq_active(struct eea_ring *ering, struct eea_ring *tx_ering);
+void *eea_ering_cq_get_desc(const struct eea_ring *ering);
+#endif
-- 
2.32.0.3.g01195cf9f
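
Note on the completion-queue phase bit used by eea_ering_cq_get_desc() and
eea_ering_cq_ack_desc() above: the consumer treats a descriptor as valid when
its phase bit matches the phase it currently expects, and flips that
expectation each time the head wraps. The user-space sketch below models that
handshake. It is not driver code. The demo_* names and the 4-entry ring are
hypothetical, and the device-side behaviour (completions written in order,
with the device flipping its own phase on wrap) is an assumption, since the
patch only shows the consumer side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CQ_PHASE 0x80u /* mirrors EEA_RING_DESC_F_CQ_PHASE, BIT(7) */
#define DEMO_RING_NUM 4u    /* hypothetical ring size, power of two */

struct demo_cdesc {
	uint16_t id;
	uint8_t pad;
	uint8_t flags; /* last byte of the descriptor carries the phase bit */
};

struct demo_cq {
	struct demo_cdesc desc[DEMO_RING_NUM];
	uint16_t head;      /* consumer position */
	uint8_t phase;      /* phase the consumer expects */
	uint16_t dev_tail;  /* assumed device write position */
	uint8_t dev_phase;  /* assumed device phase */
};

static void demo_cq_init(struct demo_cq *cq)
{
	/* Ring memory starts zeroed; the first lap is marked with phase 1. */
	memset(cq, 0, sizeof(*cq));
	cq->phase = DEMO_CQ_PHASE;
	cq->dev_phase = DEMO_CQ_PHASE;
}

/* Assumed device behaviour: write one completion, flip phase on wrap. */
static void demo_dev_complete(struct demo_cq *cq, uint16_t id)
{
	struct demo_cdesc *d = &cq->desc[cq->dev_tail];

	d->id = id;
	d->flags = cq->dev_phase;

	if (++cq->dev_tail >= DEMO_RING_NUM) {
		cq->dev_tail = 0;
		cq->dev_phase ^= DEMO_CQ_PHASE;
	}
}

/* Same test as eea_ering_cq_get_desc(): valid only if the phase matches. */
static struct demo_cdesc *demo_cq_get_desc(struct demo_cq *cq)
{
	struct demo_cdesc *d = &cq->desc[cq->head];

	if ((d->flags & DEMO_CQ_PHASE) == cq->phase)
		return d;

	return NULL;
}

/* Same bookkeeping as eea_ering_cq_ack_desc(): flip phase on wrap. */
static void demo_cq_ack_desc(struct demo_cq *cq, uint16_t num)
{
	assert(num <= DEMO_RING_NUM);

	cq->head += num;
	if (cq->head >= DEMO_RING_NUM) {
		cq->head -= DEMO_RING_NUM;
		cq->phase ^= DEMO_CQ_PHASE;
	}
}
```

After a full lap the stale descriptors still carry the old phase, so
demo_cq_get_desc() returns NULL until the (assumed) device overwrites the
slot with the flipped phase. No separate valid flag or ring clearing is needed.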



* [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  1:41   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop Xuan Zhuo
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

Allocate the netdevice after PCI probe and initialize the admin queue
used to send commands to the device. register_netdev() is deferred until
the TX/RX queue implementation lands in subsequent commits.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
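For reviewers, a user-space sketch of what one admin queue submission encodes
in eea_adminq_submit(): the 16-bit command is split into classid and command,
the descriptor flags carry the current adminq phase, and the doorbell value
written by eea_ering_kick() packs EEA_IDX_PRESENT with the SQ hardware index.
The constants mirror values in this series. The demo_* names are hypothetical,
and the closed-form phase calculation is a restatement of the counter-based
toggle in the patch, not code taken from it.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_AQ_PHASE_INIT   0x0080u /* aq->phase = BIT(7) at creation */
#define DEMO_AQ_PHASE_TOGGLE 0x8080u /* EEA_RING_DESC_F_AQ_PHASE */
#define DEMO_IDX_PRESENT     0x1ull  /* EEA_IDX_PRESENT, BIT(0) */
#define DEMO_DB_IDX_OFF      16      /* EEA_DB_IDX_OFF, 2 * 8 */

struct demo_aq_cmd {
	uint8_t classid;
	uint8_t command;
};

/* Split a command word the same way eea_adminq_submit() fills the desc. */
static struct demo_aq_cmd demo_aq_split_cmd(uint16_t cmd)
{
	struct demo_aq_cmd c;

	c.classid = (uint8_t)(cmd >> 8);
	c.command = (uint8_t)(cmd & 0xff);

	return c;
}

/* Flags used by the n-th submission (0-based) on a ring of ring_num
 * entries: the phase is toggled once per completed lap of the ring.
 */
static uint16_t demo_aq_flags(uint32_t n, uint32_t ring_num)
{
	uint16_t phase = DEMO_AQ_PHASE_INIT;

	assert(ring_num != 0);

	if ((n / ring_num) & 1)
		phase ^= DEMO_AQ_PHASE_TOGGLE;

	return phase;
}

/* Doorbell value for a kick: flags in the low bits, index at bit 16. */
static uint64_t demo_db_kick_value(uint16_t sq_hw_idx)
{
	return DEMO_IDX_PRESENT | ((uint64_t)sq_hw_idx << DEMO_DB_IDX_OFF);
}
```

With the 64-entry adminq ring, the flags alternate between 0x0080 and 0x8000
every 64 submissions, because the toggle mask flips bit 7 and bit 15 together.
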
 drivers/net/ethernet/alibaba/eea/Makefile     |   6 +-
 drivers/net/ethernet/alibaba/eea/eea_adminq.c | 542 ++++++++++++++++++
 drivers/net/ethernet/alibaba/eea/eea_adminq.h |  83 +++
 drivers/net/ethernet/alibaba/eea/eea_net.c    | 249 ++++++++
 drivers/net/ethernet/alibaba/eea/eea_net.h    | 137 +++++
 drivers/net/ethernet/alibaba/eea/eea_pci.c    |  35 +-
 drivers/net/ethernet/alibaba/eea/eea_pci.h    |   3 +
 7 files changed, 1047 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_adminq.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_adminq.h
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_net.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_net.h
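
eea_update_cfg() rejects device-reported ring depths unless they lie within
the hardware bounds and are powers of two. A stand-alone sketch of that check
follows. The demo_* names are hypothetical, the bounds are copied from
EEA_NET_IO_HW_RING_DEPTH_MIN/MAX in patch 2/8, and the sketch validates one
default/max pair where the driver checks RX and TX together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HW_DEPTH_MAX (32u * 1024u) /* EEA_NET_IO_HW_RING_DEPTH_MAX */
#define DEMO_HW_DEPTH_MIN 128u          /* EEA_NET_IO_HW_RING_DEPTH_MIN */

static_assert((DEMO_HW_DEPTH_MAX & (DEMO_HW_DEPTH_MAX - 1)) == 0,
	      "upper bound must itself be a power of two");

/* User-space stand-in for the kernel's is_power_of_2(). */
static bool demo_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Check one direction (RX or TX): max within the hardware bounds,
 * default within [min, max], both powers of two.
 */
static bool demo_depths_ok(uint32_t def, uint32_t max)
{
	if (max > DEMO_HW_DEPTH_MAX || max < DEMO_HW_DEPTH_MIN)
		return false;

	if (def > max || def < DEMO_HW_DEPTH_MIN)
		return false;

	return demo_is_power_of_2(def) && demo_is_power_of_2(max);
}
```

For example, a default of 256 with a maximum of 1024 passes, while a default
of 96 fails on both the lower bound and the power-of-two test.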

diff --git a/drivers/net/ethernet/alibaba/eea/Makefile b/drivers/net/ethernet/alibaba/eea/Makefile
index 7d8e7e8c2f3a..a842ac416ae8 100644
--- a/drivers/net/ethernet/alibaba/eea/Makefile
+++ b/drivers/net/ethernet/alibaba/eea/Makefile
@@ -1,4 +1,6 @@
 
 obj-$(CONFIG_ALIBABA_EEA) += eea.o
-eea-y :=  eea_ring.o \
-	eea_pci.o
+eea-y := eea_ring.o \
+	eea_net.o \
+	eea_pci.o \
+	eea_adminq.o
diff --git a/drivers/net/ethernet/alibaba/eea/eea_adminq.c b/drivers/net/ethernet/alibaba/eea/eea_adminq.c
new file mode 100644
index 000000000000..dfad1bdbc44d
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_adminq.c
@@ -0,0 +1,542 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/iopoll.h>
+#include <linux/utsname.h>
+#include <linux/version.h>
+
+#include "eea_adminq.h"
+#include "eea_net.h"
+#include "eea_pci.h"
+#include "eea_ring.h"
+
+#define EEA_AQ_CMD_CFG_QUERY         ((0 << 8) | 0)
+
+#define EEA_AQ_CMD_QUEUE_CREATE      ((1 << 8) | 0)
+#define EEA_AQ_CMD_QUEUE_DESTROY_ALL ((1 << 8) | 1)
+
+#define EEA_AQ_CMD_HOST_INFO         ((2 << 8) | 0)
+
+#define EEA_AQ_CMD_DEV_STATUS        ((3 << 8) | 0)
+
+#define EEA_RING_DESC_F_AQ_PHASE     (BIT(15) | BIT(7))
+
+#define EEA_QUEUE_FLAGS_HW_SPLIT_HDR BIT(0)
+#define EEA_QUEUE_FLAGS_SQCQ         BIT(1)
+#define EEA_QUEUE_FLAGS_HWTS         BIT(2)
+
+struct eea_aq_create {
+	__le32 flags;
+	/* queue index.
+	 * rx: 0 == qidx % 2
+	 * tx: 1 == qidx % 2
+	 */
+	__le16 qidx;
+	/* the depth of the queue */
+	__le16 depth;
+	/*  0: without SPLIT HDR
+	 *  1: 128B
+	 *  2: 256B
+	 *  3: 512B
+	 */
+	u8 hdr_buf_size;
+	u8 sq_desc_size;
+	u8 cq_desc_size;
+	u8 reserve0;
+	/* The vector for the irq. rx,tx share the same vector */
+	__le16 msix_vector;
+	__le16 reserve;
+	/* sq ring cfg. */
+	__le32 sq_addr_low;
+	__le32 sq_addr_high;
+	/* cq ring cfg. Only valid when flags include EEA_QUEUE_FLAGS_SQCQ. */
+	__le32 cq_addr_low;
+	__le32 cq_addr_high;
+};
+
+struct eea_aq_queue_drv_status {
+	__le16 qidx;
+
+	__le16 sq_head;
+	__le16 cq_head;
+	__le16 reserved;
+};
+
+#define EEA_OS_DISTRO		0
+#define EEA_DRV_TYPE		0
+#define EEA_OS_LINUX		1
+#define EEA_SPEC_VER_MAJOR	1
+#define EEA_SPEC_VER_MINOR	0
+
+struct eea_aq_host_info_cfg {
+	__le16	os_type;
+	__le16	os_dist;
+	__le16	drv_type;
+
+	__le16	kern_ver_major;
+	__le16	kern_ver_minor;
+	__le16	kern_ver_sub_minor;
+
+	__le16	drv_ver_major;
+	__le16	drv_ver_minor;
+	__le16	drv_ver_sub_minor;
+
+	__le16	spec_ver_major;
+	__le16	spec_ver_minor;
+	__le16	pci_bdf;
+	__le32	pci_domain;
+
+	u8      os_ver_str[64];
+	u8      isa_str[64];
+};
+
+#define EEA_HINFO_MAX_REP_LEN	1024
+#define EEA_HINFO_REP_BAD	2
+
+struct eea_aq_host_info_rep {
+	u8	op_code;
+	u8	has_reply;
+	u8	reply_str[EEA_HINFO_MAX_REP_LEN];
+};
+
+static struct eea_ring *qid_to_ering(struct eea_net *enet, u32 qid)
+{
+	struct eea_ring *ering;
+
+	if (qid % 2 == 0)
+		ering = enet->rx[qid / 2]->ering;
+	else
+		ering = enet->tx[qid / 2].ering;
+
+	return ering;
+}
+
+#define EEA_AQ_TIMEOUT_US (60 * 1000 * 1000)
+
+static void eea_device_broken(struct eea_net *enet)
+{
+	if (enet->adminq.broken)
+		return;
+
+	eea_device_reset(enet->edev);
+	enet->adminq.broken = true;
+}
+
+static int eea_adminq_submit(struct eea_net *enet, u16 cmd,
+			     dma_addr_t req_addr, dma_addr_t res_addr,
+			     u32 req_size, u32 res_size, u32 *reply_len)
+{
+	struct eea_aq_cdesc *cdesc;
+	struct eea_aq_desc *desc;
+	int ret;
+
+	if (enet->adminq.broken)
+		return -EIO;
+
+	desc = eea_ering_aq_alloc_desc(enet->adminq.ring);
+
+	desc->classid = cmd >> 8;
+	desc->command = cmd & 0xff;
+
+	desc->data_addr = cpu_to_le64(req_addr);
+	desc->data_len = cpu_to_le32(req_size);
+
+	desc->reply_addr = cpu_to_le64(res_addr);
+	desc->reply_len = cpu_to_le32(res_size);
+
+	/* Make the descriptor body visible before publishing the flags. */
+	dma_wmb();
+
+	desc->flags = cpu_to_le16(enet->adminq.phase);
+
+	eea_ering_sq_commit_desc(enet->adminq.ring);
+
+	eea_ering_kick(enet->adminq.ring);
+
+	++enet->adminq.num;
+
+	if ((enet->adminq.num % enet->adminq.ring->num) == 0)
+		enet->adminq.phase ^= EEA_RING_DESC_F_AQ_PHASE;
+
+	ret = read_poll_timeout(eea_ering_cq_get_desc, cdesc, cdesc, 10,
+				EEA_AQ_TIMEOUT_US, false, enet->adminq.ring);
+	if (ret) {
+		netdev_err(enet->netdev,
+			   "adminq exec timeout. cmd: %d, resetting device.\n",
+			   cmd);
+		/* The device must be reset before unmapping buffers to avoid
+		 * potential DMA writes after the memory is freed.
+		 */
+		eea_device_broken(enet);
+		return ret;
+	}
+
+	/* Returns 0 on success, or a negative error code on failure. */
+	ret = le32_to_cpu(cdesc->status);
+
+	eea_ering_cq_ack_desc(enet->adminq.ring, 1);
+
+	if (ret)
+		netdev_err(enet->netdev,
+			   "adminq exec failed. cmd: %d ret %d\n", cmd, ret);
+	else
+		*reply_len = le32_to_cpu(cdesc->reply_len);
+
+	return ret;
+}
+
+static int eea_adminq_exec(struct eea_net *enet, u16 cmd,
+			   void *req, u32 req_size,
+			   void *res, u32 res_size,
+			   u32 *reply)
+{
+	dma_addr_t req_addr = 0, res_addr = 0;
+	struct device *dma;
+	u32 reply_len = 0;
+	int ret;
+
+	if (reply)
+		*reply = 0;
+
+	dma = enet->edev->dma_dev;
+
+	if (req) {
+		req_addr = dma_map_single(dma, req, req_size, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(dma, req_addr)))
+			return -ENOMEM;
+	}
+
+	if (res) {
+		res_addr = dma_map_single(dma, res, res_size, DMA_FROM_DEVICE);
+		if (unlikely(dma_mapping_error(dma, res_addr))) {
+			ret = -ENOMEM;
+			goto err_unmap_req;
+		}
+	}
+
+	mutex_lock(&enet->adminq.lock);
+	ret = eea_adminq_submit(enet, cmd, req_addr, res_addr,
+				req_size, res_size, &reply_len);
+	mutex_unlock(&enet->adminq.lock);
+	if (res) {
+		dma_unmap_single(dma, res_addr, res_size, DMA_FROM_DEVICE);
+
+		if (ret)
+			memset(res, 0, res_size);
+		else if (res_size > reply_len)
+			memset(res + reply_len, 0, res_size - reply_len);
+
+		if (reply)
+			*reply = reply_len;
+	}
+
+err_unmap_req:
+	if (req)
+		dma_unmap_single(dma, req_addr, req_size, DMA_TO_DEVICE);
+
+	return ret;
+}
+
+void eea_destroy_adminq(struct eea_net *enet)
+{
+	struct eea_aq *aq;
+
+	aq = &enet->adminq;
+
+	if (aq->ring) {
+		eea_ering_free(aq->ring);
+		aq->ring = NULL;
+		aq->phase = 0;
+	}
+
+	kfree(aq->q_req_buf);
+	kfree(aq->q_res_buf);
+
+	aq->q_req_buf = NULL;
+	aq->q_res_buf = NULL;
+}
+
+int eea_create_adminq(struct eea_net *enet, u32 qid)
+{
+	u32 db_size, q_size, num;
+	struct eea_ring *ering;
+	struct eea_aq *aq;
+	int err = -ENOMEM;
+
+	num = enet->edev->rx_num + enet->edev->tx_num;
+	aq = &enet->adminq;
+
+	ering = eea_ering_alloc(qid, 64, enet->edev, sizeof(struct eea_aq_desc),
+				sizeof(struct eea_aq_cdesc), "adminq");
+	if (!ering)
+		return -ENOMEM;
+
+	aq->ring = ering;
+
+	err = eea_pci_active_aq(ering, qid / 2 + 1);
+	if (err)
+		goto err;
+
+	aq->phase = BIT(7);
+	aq->num = 0;
+
+	q_size = sizeof(*aq->q_req_buf) * num;
+	db_size = sizeof(*aq->q_res_buf) * num;
+
+	aq->q_req_size = q_size;
+	aq->q_res_size = db_size;
+
+	err = -ENOMEM;
+
+	aq->q_req_buf = kzalloc(q_size, GFP_KERNEL);
+	if (!aq->q_req_buf)
+		goto err;
+
+	aq->q_res_buf = kzalloc(db_size, GFP_KERNEL);
+	if (!aq->q_res_buf)
+		goto err;
+
+	/* Before we set up the AQ, the device remains in an inactive state, so
+	 * there will be no DMA operations. If the 'set up AQ' process fails, we
+	 * can safely free the DMA-related memory.
+	 */
+	err = eea_pci_set_aq_up(enet->edev);
+	if (err)
+		goto err;
+
+	aq->broken = false;
+
+	mutex_init(&aq->lock);
+
+	return 0;
+
+err:
+	eea_destroy_adminq(enet);
+	return err;
+}
+
+int eea_adminq_query_cfg(struct eea_net *enet, struct eea_aq_cfg *cfg)
+{
+	return eea_adminq_exec(enet, EEA_AQ_CMD_CFG_QUERY, NULL, 0, cfg,
+			       sizeof(*cfg), NULL);
+}
+
+static void qcfg_fill(struct eea_aq_create *qcfg, struct eea_ring *ering,
+		      u32 flags)
+{
+	qcfg->flags = cpu_to_le32(flags);
+	qcfg->qidx = cpu_to_le16(ering->index);
+	qcfg->depth = cpu_to_le16(ering->num);
+
+	qcfg->hdr_buf_size = flags & EEA_QUEUE_FLAGS_HW_SPLIT_HDR ? 1 : 0;
+	qcfg->sq_desc_size = ering->sq.desc_size;
+	qcfg->cq_desc_size = ering->cq.desc_size;
+	qcfg->msix_vector = cpu_to_le16(ering->msix_vec);
+
+	qcfg->sq_addr_low = cpu_to_le32(lower_32_bits(ering->sq.dma_addr));
+	qcfg->sq_addr_high = cpu_to_le32(upper_32_bits(ering->sq.dma_addr));
+
+	qcfg->cq_addr_low = cpu_to_le32(lower_32_bits(ering->cq.dma_addr));
+	qcfg->cq_addr_high = cpu_to_le32(upper_32_bits(ering->cq.dma_addr));
+}
+
+int eea_adminq_create_q(struct eea_net *enet, u32 num, u32 flags)
+{
+	int i, db_size, q_size, err = -ENOMEM;
+	struct eea_net_cfg *cfg;
+	struct eea_ring *ering;
+	struct eea_aq *aq;
+	u32 reply_len;
+
+	cfg = &enet->cfg;
+	aq = &enet->adminq;
+
+	if (cfg->split_hdr)
+		flags |= EEA_QUEUE_FLAGS_HW_SPLIT_HDR;
+
+	flags |= EEA_QUEUE_FLAGS_SQCQ;
+	flags |= EEA_QUEUE_FLAGS_HWTS;
+
+	q_size = sizeof(*aq->q_req_buf) * num;
+	db_size = sizeof(*aq->q_res_buf) * num;
+
+	for (i = 0; i < num; i++) {
+		ering = qid_to_ering(enet, i);
+		qcfg_fill(aq->q_req_buf + i, ering, flags);
+	}
+
+	err = eea_adminq_exec(enet, EEA_AQ_CMD_QUEUE_CREATE,
+			      aq->q_req_buf, q_size,
+			      aq->q_res_buf, db_size,
+			      &reply_len);
+	if (err)
+		return err;
+
+	if (reply_len != db_size) {
+		eea_adminq_destroy_all_q(enet);
+		netdev_err(enet->netdev, "invalid reply len %u\n", reply_len);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < num; i++) {
+		ering = qid_to_ering(enet, i);
+		ering->db = eea_pci_db_addr(ering->edev,
+					    le32_to_cpu(aq->q_res_buf[i]));
+		if (!ering->db) {
+			netdev_err(enet->netdev, "invalid db off %u\n",
+				   le32_to_cpu(aq->q_res_buf[i]));
+			goto err;
+		}
+	}
+
+	return err;
+
+err:
+	eea_adminq_destroy_all_q(enet);
+	for (i = 0; i < num; i++) {
+		ering = qid_to_ering(enet, i);
+		ering->db = NULL;
+	}
+
+	return -EIO;
+}
+
+int eea_adminq_destroy_all_q(struct eea_net *enet)
+{
+	int err;
+
+	err = eea_adminq_exec(enet, EEA_AQ_CMD_QUEUE_DESTROY_ALL, NULL, 0,
+			      NULL, 0, NULL);
+	if (err) {
+		/* The device must be reset before unmapping buffers to avoid
+		 * potential DMA writes after the memory is freed.
+		 */
+		mutex_lock(&enet->adminq.lock);
+		eea_device_broken(enet);
+		mutex_unlock(&enet->adminq.lock);
+
+		netdev_err(enet->netdev, "QUEUE_DESTROY fail: reset device.\n");
+	}
+
+	return err;
+}
+
+/* The caller must ensure that both the 'rx' and 'tx' arrays are valid. */
+int eea_adminq_dev_status(struct eea_net *enet,
+			  struct eea_aq_dev_status *dstatus)
+{
+	struct eea_aq_queue_drv_status *drv_status;
+	struct __eea_aq_dev_status *dev_status;
+	int err, i, io_num, size, q_num;
+	struct eea_ring *ering;
+	void *rep, *req;
+
+	q_num = enet->cfg.rx_ring_num + enet->cfg.tx_ring_num + 1;
+	io_num = enet->cfg.rx_ring_num + enet->cfg.tx_ring_num;
+
+	req = kcalloc(q_num, sizeof(struct eea_aq_queue_drv_status),
+		      GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	size = struct_size(dev_status, q_status, q_num);
+
+	rep = kzalloc(size, GFP_KERNEL);
+	if (!rep) {
+		kfree(req);
+		return -ENOMEM;
+	}
+
+	drv_status = req;
+	for (i = 0; i < io_num; ++i, ++drv_status) {
+		ering = qid_to_ering(enet, i);
+		drv_status->qidx = cpu_to_le16(i);
+		drv_status->cq_head = cpu_to_le16(ering->cq.head);
+		drv_status->sq_head = cpu_to_le16(ering->sq.head);
+	}
+
+	drv_status->qidx = cpu_to_le16(i);
+	drv_status->cq_head = cpu_to_le16(enet->adminq.ring->cq.head);
+	drv_status->sq_head = cpu_to_le16(enet->adminq.ring->sq.head);
+
+	err = eea_adminq_exec(enet, EEA_AQ_CMD_DEV_STATUS, req,
+			      q_num * sizeof(struct eea_aq_queue_drv_status),
+			      rep, size, NULL);
+	kfree(req);
+	if (err) {
+		kfree(rep);
+		return err;
+	}
+
+	dstatus->num = q_num;
+	dstatus->status = rep;
+
+	return 0;
+}
+
+void eea_adminq_config_host_info(struct eea_net *enet)
+{
+	struct device *dev = enet->edev->dma_dev;
+	struct eea_aq_host_info_cfg *cfg;
+	struct eea_aq_host_info_rep *rep;
+	int rc = -ENOMEM;
+
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
+	if (!cfg)
+		return;
+
+	rep = kzalloc(sizeof(*rep), GFP_KERNEL);
+	if (!rep)
+		goto err_free_cfg;
+
+	cfg->os_type            = cpu_to_le16(EEA_OS_LINUX);
+	cfg->os_dist            = cpu_to_le16(EEA_OS_DISTRO);
+	cfg->drv_type           = cpu_to_le16(EEA_DRV_TYPE);
+
+	cfg->kern_ver_major     = cpu_to_le16(LINUX_VERSION_MAJOR);
+	cfg->kern_ver_minor     = cpu_to_le16(LINUX_VERSION_PATCHLEVEL);
+	cfg->kern_ver_sub_minor = cpu_to_le16(LINUX_VERSION_SUBLEVEL);
+
+	cfg->drv_ver_major      = cpu_to_le16(EEA_VER_MAJOR);
+	cfg->drv_ver_minor      = cpu_to_le16(EEA_VER_MINOR);
+	cfg->drv_ver_sub_minor  = cpu_to_le16(EEA_VER_SUB_MINOR);
+
+	cfg->spec_ver_major     = cpu_to_le16(EEA_SPEC_VER_MAJOR);
+	cfg->spec_ver_minor     = cpu_to_le16(EEA_SPEC_VER_MINOR);
+
+	cfg->pci_bdf            = cpu_to_le16(eea_pci_bdf(enet->edev));
+	cfg->pci_domain         = cpu_to_le32(eea_pci_domain_nr(enet->edev));
+
+	strscpy(cfg->os_ver_str, utsname()->release, sizeof(cfg->os_ver_str));
+	strscpy(cfg->isa_str, utsname()->machine, sizeof(cfg->isa_str));
+
+	rc = eea_adminq_exec(enet, EEA_AQ_CMD_HOST_INFO,
+			     cfg, sizeof(*cfg), rep, sizeof(*rep), NULL);
+
+	if (!rc) {
+		if (rep->op_code == EEA_HINFO_REP_BAD)
+			dev_warn(dev, "The hardware-driven state validation may be abnormal.\n");
+
+		if (rep->has_reply) {
+			char buf[EEA_HINFO_MAX_REP_LEN] = {0};
+
+			rep->reply_str[EEA_HINFO_MAX_REP_LEN - 1] = '\0';
+
+			string_escape_str(rep->reply_str, buf, sizeof(buf),
+					  ESCAPE_NP, NULL);
+
+			buf[EEA_HINFO_MAX_REP_LEN - 1] = '\0';
+
+			dev_warn(dev, "Device replied: %s\n", buf);
+		}
+	}
+
+	kfree(rep);
+err_free_cfg:
+	kfree(cfg);
+}
diff --git a/drivers/net/ethernet/alibaba/eea/eea_adminq.h b/drivers/net/ethernet/alibaba/eea/eea_adminq.h
new file mode 100644
index 000000000000..0182f5641fcf
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_adminq.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#ifndef __EEA_ADMINQ_H__
+#define __EEA_ADMINQ_H__
+
+struct eea_aq_cfg {
+	__le32 rx_depth_max;
+	__le32 rx_depth_def;
+
+	__le32 tx_depth_max;
+	__le32 tx_depth_def;
+
+	__le32 max_tso_size;
+	__le32 max_tso_segs;
+
+	u8 mac[ETH_ALEN];
+	__le16 status;
+
+	__le16 mtu;
+	__le16 reserved0;
+	__le16 reserved1;
+	u8 reserved2;
+	u8 reserved3;
+
+	__le16 reserved4;
+	__le16 reserved5;
+	__le16 reserved6;
+};
+
+struct eea_aq_queue_status {
+	__le16 qidx;
+#define EEA_QUEUE_STATUS_OK 0
+#define EEA_QUEUE_STATUS_NEED_RESET 1
+	__le16 status;
+};
+
+struct __eea_aq_dev_status {
+#define EEA_LINK_DOWN_STATUS  0
+#define EEA_LINK_UP_STATUS    1
+	__le16 link_status;
+	__le16 reserved;
+
+	struct eea_aq_queue_status q_status[];
+};
+
+struct eea_aq_dev_status {
+	u32 num;
+	struct __eea_aq_dev_status *status;
+};
+
+struct eea_aq {
+	struct eea_ring *ring;
+	u32 num;
+	bool broken;
+	u16 phase;
+
+	/* lock for adminq exec */
+	struct mutex lock;
+
+	u32 q_req_size;
+	u32 q_res_size;
+	struct eea_aq_create *q_req_buf;
+	__le32 *q_res_buf;
+};
+
+struct eea_net;
+
+int eea_create_adminq(struct eea_net *enet, u32 qid);
+void eea_destroy_adminq(struct eea_net *enet);
+
+int eea_adminq_query_cfg(struct eea_net *enet, struct eea_aq_cfg *cfg);
+
+int eea_adminq_create_q(struct eea_net *enet, u32 num, u32 flags);
+int eea_adminq_destroy_all_q(struct eea_net *enet);
+int eea_adminq_dev_status(struct eea_net *enet,
+			  struct eea_aq_dev_status *dstatus);
+void eea_adminq_config_host_info(struct eea_net *enet);
+#endif
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
new file mode 100644
index 000000000000..bb8a49f8c6df
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <net/netdev_queues.h>
+
+#include "eea_adminq.h"
+#include "eea_net.h"
+#include "eea_pci.h"
+#include "eea_ring.h"
+
+#define EEA_SPLIT_HDR_SIZE ALIGN(128, L1_CACHE_BYTES)
+
+static int eea_update_cfg(struct eea_net *enet,
+			  struct eea_device *edev,
+			  struct eea_aq_cfg *hwcfg)
+{
+	u32 rx_max = le32_to_cpu(hwcfg->rx_depth_max);
+	u32 tx_max = le32_to_cpu(hwcfg->tx_depth_max);
+	u32 rx_def = le32_to_cpu(hwcfg->rx_depth_def);
+	u32 tx_def = le32_to_cpu(hwcfg->tx_depth_def);
+
+	/* For now, require the RX ring count to equal the TX ring count. */
+	if (edev->rx_num != edev->tx_num) {
+		dev_err(edev->dma_dev, "Inconsistent ring num: RX %u, TX %u\n",
+			edev->rx_num, edev->tx_num);
+		return -EINVAL;
+	}
+
+	if (rx_max > EEA_NET_IO_HW_RING_DEPTH_MAX ||
+	    rx_max < EEA_NET_IO_HW_RING_DEPTH_MIN ||
+	    tx_max > EEA_NET_IO_HW_RING_DEPTH_MAX ||
+	    tx_max < EEA_NET_IO_HW_RING_DEPTH_MIN) {
+		dev_err(edev->dma_dev, "Invalid HW max depth: RX %u, TX %u\n",
+			rx_max, tx_max);
+		return -EINVAL;
+	}
+
+	if (rx_def > rx_max ||
+	    tx_def > tx_max ||
+	    rx_def < EEA_NET_IO_HW_RING_DEPTH_MIN ||
+	    tx_def < EEA_NET_IO_HW_RING_DEPTH_MIN) {
+		dev_err(edev->dma_dev, "Invalid default depth: RX %u (max %u), TX %u (max %u)\n",
+			rx_def, rx_max, tx_def, tx_max);
+		return -EINVAL;
+	}
+
+	if (!is_power_of_2(rx_max) || !is_power_of_2(tx_max) ||
+	    !is_power_of_2(rx_def) || !is_power_of_2(tx_def)) {
+		dev_err(edev->dma_dev, "Ring depth must be power of 2\n");
+		return -EINVAL;
+	}
+
+	enet->cfg_hw.rx_ring_depth = rx_max;
+	enet->cfg_hw.tx_ring_depth = tx_max;
+	enet->cfg_hw.rx_ring_num = edev->rx_num;
+	enet->cfg_hw.tx_ring_num = edev->tx_num;
+	enet->cfg_hw.split_hdr = EEA_SPLIT_HDR_SIZE;
+
+	enet->cfg.rx_ring_depth = rx_def;
+	enet->cfg.tx_ring_depth = tx_def;
+	enet->cfg.rx_ring_num = edev->rx_num;
+	enet->cfg.tx_ring_num = edev->tx_num;
+
+	return 0;
+}
+
+static int eea_netdev_init_features(struct net_device *netdev,
+				    struct eea_net *enet,
+				    struct eea_device *edev)
+{
+	struct eea_aq_cfg *cfg;
+	int err;
+	u32 mtu;
+
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
+	if (!cfg)
+		return -ENOMEM;
+
+	err = eea_adminq_query_cfg(enet, cfg);
+	if (err)
+		goto err_free;
+
+	mtu = le16_to_cpu(cfg->mtu);
+	if (mtu < ETH_MIN_MTU) {
+		dev_err(edev->dma_dev, "Device reported invalid MTU %u (minimum %u), aborting initialization\n",
+			mtu, ETH_MIN_MTU);
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = eea_update_cfg(enet, edev, cfg);
+	if (err)
+		goto err_free;
+
+	netdev->priv_flags |= IFF_UNICAST_FLT;
+	netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+
+	netdev->hw_features |= NETIF_F_HW_CSUM;
+	netdev->hw_features |= NETIF_F_GRO_HW;
+	netdev->hw_features |= NETIF_F_SG;
+	netdev->hw_features |= NETIF_F_TSO;
+	netdev->hw_features |= NETIF_F_TSO_ECN;
+	netdev->hw_features |= NETIF_F_TSO6;
+	netdev->hw_features |= NETIF_F_GSO_UDP_L4;
+
+	netdev->features |= NETIF_F_HIGHDMA;
+	netdev->features |= NETIF_F_HW_CSUM;
+	netdev->features |= NETIF_F_SG;
+	netdev->features |= NETIF_F_GSO_ROBUST;
+	netdev->features |= netdev->hw_features & NETIF_F_ALL_TSO;
+	netdev->features |= NETIF_F_RXCSUM;
+	netdev->features |= NETIF_F_GRO_HW;
+
+	netdev->vlan_features = netdev->features;
+
+	if (!is_valid_ether_addr(cfg->mac)) {
+		dev_err(edev->dma_dev, "The device gave invalid mac %pM\n",
+			cfg->mac);
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	eth_hw_addr_set(netdev, cfg->mac);
+
+	enet->speed = SPEED_UNKNOWN;
+	enet->duplex = DUPLEX_UNKNOWN;
+
+	netdev->min_mtu = ETH_MIN_MTU;
+
+	netdev->mtu = mtu;
+
+	/* If jumbo frames are already enabled, then the returned MTU will be a
+	 * jumbo MTU, and the driver will automatically enable jumbo frame
+	 * support by default.
+	 */
+	netdev->max_mtu = mtu;
+
+err_free:
+	kfree(cfg);
+	return err;
+}
+
+static const struct net_device_ops eea_netdev = {
+	.ndo_validate_addr  = eth_validate_addr,
+	.ndo_features_check = passthru_features_check,
+};
+
+static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
+{
+	struct net_device *netdev;
+	struct eea_net *enet;
+
+	netdev = alloc_etherdev_mq(sizeof(struct eea_net), pairs);
+	if (!netdev) {
+		dev_err(edev->dma_dev,
+			"alloc_etherdev_mq failed with pairs %u\n", pairs);
+		return NULL;
+	}
+
+	netdev->netdev_ops = &eea_netdev;
+	SET_NETDEV_DEV(netdev, edev->dma_dev);
+
+	enet = netdev_priv(netdev);
+	enet->netdev = netdev;
+	enet->edev = edev;
+	edev->enet = enet;
+
+	return enet;
+}
+
+int eea_net_probe(struct eea_device *edev)
+{
+	struct eea_net *enet;
+	int err = -ENOMEM;
+
+	enet = eea_netdev_alloc(edev, edev->rx_num);
+	if (!enet)
+		return -ENOMEM;
+
+	err = eea_create_adminq(enet, edev->rx_num + edev->tx_num);
+	if (err)
+		goto err_free_netdev;
+
+	eea_adminq_config_host_info(enet);
+
+	err = eea_netdev_init_features(enet->netdev, enet, edev);
+	if (err)
+		goto err_reset_dev;
+
+	netdev_dbg(enet->netdev, "eea probe success.\n");
+
+	/* Queue TX/RX implementation is still in progress. register_netdev is
+	 * deferred until these are completed in subsequent commits.
+	 */
+
+	return 0;
+
+err_reset_dev:
+	eea_device_reset(edev);
+	eea_destroy_adminq(enet);
+
+err_free_netdev:
+	free_netdev(enet->netdev);
+	return err;
+}
+
+void eea_net_remove(struct eea_device *edev)
+{
+	struct net_device *netdev;
+	struct eea_net *enet;
+
+	enet = edev->enet;
+	netdev = enet->netdev;
+
+	netdev_dbg(enet->netdev, "eea removed.\n");
+
+	eea_device_reset(edev);
+
+	eea_destroy_adminq(enet);
+
+	free_netdev(netdev);
+}
+
+void eea_net_shutdown(struct eea_device *edev)
+{
+	struct net_device *netdev;
+	struct eea_net *enet;
+
+	enet = edev->enet;
+	netdev = enet->netdev;
+
+	rtnl_lock();
+
+	netif_device_detach(netdev);
+
+	eea_device_reset(edev);
+
+	eea_destroy_adminq(enet);
+
+	rtnl_unlock();
+}
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
new file mode 100644
index 000000000000..fa0eec8af21b
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#ifndef __EEA_NET_H__
+#define __EEA_NET_H__
+
+#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+
+#include "eea_adminq.h"
+#include "eea_ring.h"
+
+#define EEA_VER_MAJOR		1
+#define EEA_VER_MINOR		0
+#define EEA_VER_SUB_MINOR	0
+
+struct eea_net_tx {
+	struct eea_net *enet;
+
+	struct eea_ring *ering;
+
+	struct eea_tx_meta *meta;
+	struct eea_tx_meta *free;
+
+	struct device *dma_dev;
+
+	u32 index;
+
+	char name[16];
+};
+
+struct eea_rx_meta {
+	struct eea_rx_meta *next;
+
+	struct page *page;
+	dma_addr_t dma;
+	u32 offset;
+	u32 frags;
+
+	struct page *hdr_page;
+	void *hdr_addr;
+	dma_addr_t hdr_dma;
+
+	u32 id;
+
+	u32 truesize;
+	u32 headroom;
+	u32 tailroom;
+
+	u32 len;
+};
+
+struct eea_net_rx_pkt_ctx {
+	u16 idx;
+
+	bool data_valid;
+	bool do_drop;
+
+	struct sk_buff *head_skb;
+};
+
+struct eea_net_rx {
+	struct eea_net *enet;
+
+	struct eea_ring *ering;
+
+	struct eea_rx_meta *meta;
+	struct eea_rx_meta *free;
+
+	struct device *dma_dev;
+
+	u32 index;
+
+	u32 flags;
+
+	u32 headroom;
+
+	struct napi_struct *napi;
+
+	char name[16];
+
+	struct eea_net_rx_pkt_ctx pkt;
+
+	struct page_pool *pp;
+};
+
+struct eea_net_cfg {
+	u32 rx_ring_depth;
+	u32 tx_ring_depth;
+	u32 rx_ring_num;
+	u32 tx_ring_num;
+
+	u8 rx_sq_desc_size;
+	u8 rx_cq_desc_size;
+	u8 tx_sq_desc_size;
+	u8 tx_cq_desc_size;
+
+	u32 split_hdr;
+};
+
+enum {
+	EEA_LINK_ERR_NONE,
+	EEA_LINK_ERR_HA_RESET_DEV,
+	EEA_LINK_ERR_LINK_DOWN,
+};
+
+struct eea_net {
+	struct eea_device *edev;
+	struct net_device *netdev;
+
+	struct eea_aq adminq;
+
+	struct eea_net_tx *tx;
+	struct eea_net_rx **rx;
+
+	struct eea_net_cfg cfg;
+	struct eea_net_cfg cfg_hw;
+
+	u32 link_err;
+
+	bool started;
+
+	u8 duplex;
+	u32 speed;
+
+	u64 hw_ts_offset;
+};
+
+int eea_net_probe(struct eea_device *edev);
+void eea_net_remove(struct eea_device *edev);
+void eea_net_shutdown(struct eea_device *edev);
+
+#endif
diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
index 65a0ceb73b35..bef0ebd6bb20 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_pci.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
@@ -8,6 +8,7 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/iopoll.h>
 
+#include "eea_net.h"
 #include "eea_pci.h"
 
 #define EEA_PCI_DB_OFFSET 4096
@@ -64,7 +65,9 @@ struct eea_pci_device {
 	((void __iomem *)((reg) + offsetof(struct eea_pci_cfg, item)))
 
 #define cfg_write8(reg, item, val) iowrite8(val, cfg_pointer(reg, item))
+#define cfg_write16(reg, item, val) iowrite16(val, cfg_pointer(reg, item))
 #define cfg_write32(reg, item, val) iowrite32(val, cfg_pointer(reg, item))
+#define cfg_write64(reg, item, val) iowrite64_lo_hi(val, cfg_pointer(reg, item))
 
 #define cfg_read8(reg, item) ioread8(cfg_pointer(reg, item))
 #define cfg_read32(reg, item) ioread32(cfg_pointer(reg, item))
@@ -337,6 +340,25 @@ void __iomem *eea_pci_db_addr(struct eea_device *edev, u32 off)
 	return edev->ep_dev->db_base + off;
 }
 
+int eea_pci_active_aq(struct eea_ring *ering, int msix_vec)
+{
+	struct eea_pci_device *ep_dev = ering->edev->ep_dev;
+
+	cfg_write16(ep_dev->reg, aq_size, ering->num);
+	cfg_write16(ep_dev->reg, aq_msix_vector, msix_vec);
+
+	cfg_write64(ep_dev->reg, aq_sq_addr, ering->sq.dma_addr);
+	cfg_write64(ep_dev->reg, aq_cq_addr, ering->cq.dma_addr);
+
+	ering->db = eea_pci_db_addr(ering->edev,
+				    cfg_read32(ep_dev->reg, aq_db_off));
+
+	if (!ering->db)
+		return -EIO;
+
+	return 0;
+}
+
 u64 eea_pci_device_ts(struct eea_device *edev)
 {
 	struct eea_pci_device *ep_dev = edev->ep_dev;
@@ -358,7 +380,9 @@ static int eea_init_device(struct eea_device *edev)
 	if (err)
 		goto err;
 
-	/* do net device probe ... */
+	err = eea_net_probe(edev);
+	if (err)
+		goto err;
 
 	return 0;
 err:
@@ -392,6 +416,9 @@ static void __eea_pci_remove(struct pci_dev *pci_dev)
 {
 	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
 	struct device *dev = get_device(&ep_dev->pci_dev->dev);
+	struct eea_device *edev = &ep_dev->edev;
+
+	eea_net_remove(edev);
 
 	eea_pci_release_resource(ep_dev);
 
@@ -429,8 +456,6 @@ static void eea_pci_remove(struct pci_dev *pci_dev)
 {
 	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
 
-	eea_device_reset(&ep_dev->edev);
-
 	__eea_pci_remove(pci_dev);
 
 	pci_set_drvdata(pci_dev, NULL);
@@ -446,9 +471,7 @@ static void eea_pci_shutdown(struct pci_dev *pci_dev)
 
 	ep_dev->shutdown = true;
 
-	/* do net device stop and clear. */
-
-	eea_device_reset(edev);
+	eea_net_shutdown(edev);
 
 	pci_clear_master(pci_dev);
 }
diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.h b/drivers/net/ethernet/alibaba/eea/eea_pci.h
index 746cce4dd68e..cfd278e2efde 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_pci.h
+++ b/drivers/net/ethernet/alibaba/eea/eea_pci.h
@@ -10,6 +10,8 @@
 
 #include <linux/pci.h>
 
+#include "eea_ring.h"
+
 struct eea_pci_cap {
 	__u8 cap_vndr;
 	__u8 cap_next;
@@ -43,6 +45,7 @@ u16 eea_pci_bdf(struct eea_device *edev);
 
 int eea_device_reset(struct eea_device *dev);
 int eea_pci_set_aq_up(struct eea_device *dev);
+int eea_pci_active_aq(struct eea_ring *ering, int msix_vec);
 
 u64 eea_pci_device_ts(struct eea_device *edev);
 
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (2 preceding siblings ...)
  2026-05-14  9:51 ` [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  1:18   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 5/8] eea: implement packet receive logic Xuan Zhuo
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Add basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit implements the netdevice open and stop callbacks, which
create and destroy the RX and TX queues.

It also introduces HA handling to restore the device when an error
occurs. In HA scenarios, however, the driver cannot guarantee that the
previous state is restored correctly.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
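For review, the HA teardown and re-probe bookkeeping added here
(wait_pci_ready, ha_reset_netdev_running, link_err) can be modelled in
user space roughly as below. This is a simplified sketch under
assumptions, not driver code: struct model, release_hw(), ha_remove(),
ha_reprobe() and final_remove() are hypothetical names that stand in for
eea_net_remove(edev, true), eea_net_reprobe() and eea_net_remove(edev,
false). The adminq and irq blocks are reduced to a single flag.

```c
/* User-space model of the wait_pci_ready bookkeeping. Every name below is a
 * hypothetical stand-in; nothing here is driver API.
 */
#include <stdbool.h>

enum link_err { LINK_ERR_NONE, LINK_ERR_HA_RESET_DEV };

struct model {
	bool admin_up;        /* netif_running(): the user has the netdev up */
	bool started;         /* queues exist and traffic is flowing */
	bool hw_res;          /* adminq and irq blocks are held */
	bool wait_pci_ready;  /* HA teardown done, re-probe not finished */
	bool ha_was_running;  /* models edev->ha_reset_netdev_running */
	enum link_err link_err;
	int double_free;      /* counts releases of already released resources */
};

static void release_hw(struct model *m)
{
	if (!m->hw_res)
		m->double_free++;
	m->hw_res = false;
}

/* Models eea_net_remove(edev, true). */
static void ha_remove(struct model *m)
{
	if (m->wait_pci_ready)
		return;

	m->ha_was_running = m->admin_up;
	if (m->admin_up)
		m->started = false;

	/* Block ndo_open until the re-probe clears the error. */
	m->link_err = LINK_ERR_HA_RESET_DEV;
	release_hw(m);
	m->wait_pci_ready = true;
}

/* Models eea_net_reprobe(); open_ok fakes the result of the re-open. */
static int ha_reprobe(struct model *m, bool open_ok)
{
	m->hw_res = true;
	m->link_err = LINK_ERR_NONE;

	if (m->ha_was_running && m->admin_up) {
		if (!open_ok) {
			m->link_err = LINK_ERR_HA_RESET_DEV;
			release_hw(m);
			return -1;	/* wait_pci_ready stays set */
		}
		m->started = true;
	}

	m->wait_pci_ready = false;
	return 0;
}

/* Models eea_net_remove(edev, false), the real PCI remove. */
static void final_remove(struct model *m)
{
	if (!m->wait_pci_ready)
		release_hw(m);
}
```

In this model, a failed re-probe leaves wait_pci_ready set, so the later
PCI remove skips the release step instead of repeating it.
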
 drivers/net/ethernet/alibaba/eea/Makefile  |   4 +-
 drivers/net/ethernet/alibaba/eea/eea_net.c | 563 ++++++++++++++++++++-
 drivers/net/ethernet/alibaba/eea/eea_net.h |  50 +-
 drivers/net/ethernet/alibaba/eea/eea_pci.c | 244 ++++++++-
 drivers/net/ethernet/alibaba/eea/eea_pci.h |  20 +
 drivers/net/ethernet/alibaba/eea/eea_rx.c  | 264 ++++++++++
 drivers/net/ethernet/alibaba/eea/eea_tx.c  | 102 ++++
 7 files changed, 1231 insertions(+), 16 deletions(-)
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_rx.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_tx.c
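
The error handling in eea_reset_hw_resources() is the least obvious part
of this patch, so here is a user-space sketch of it. It is a simplified
model under assumptions, not driver code: struct qset, struct dev,
update_queues(), hw_activate() and dev_reset_queues() are hypothetical
names, and the two max_* fields exist only to inject failures. The new
queue set is allocated before the running one is touched. If the stack
rejects the new queue count, the old set is rebound and reactivated. If
the hardware refuses to activate a set, both sets are released and the
device is marked as not started.

```c
/* User-space model of the swap-with-rollback flow. All names are
 * hypothetical stand-ins; nothing here is driver API.
 */
#include <stdbool.h>
#include <stdlib.h>

struct qset {			/* stands in for struct eea_net_init_ctx */
	int nqueues;
	int *mem;		/* stands in for rings, meta, page pools */
};

struct dev {			/* stands in for struct eea_net */
	struct qset live;
	int cfg_n;		/* configured queue count */
	bool started;
	int max_stack_queues;	/* update_queues() fails above this */
	int max_hw_queues;	/* hw_activate() fails above this */
};

static int qset_alloc(struct qset *q, int n)
{
	q->mem = calloc(n, sizeof(*q->mem));
	if (!q->mem)
		return -1;
	q->nqueues = n;
	return 0;
}

static void qset_free(struct qset *q)
{
	free(q->mem);
	q->mem = NULL;
	q->nqueues = 0;
}

static int update_queues(const struct dev *d)
{
	return d->live.nqueues > d->max_stack_queues ? -1 : 0;
}

static int hw_activate(const struct dev *d)
{
	return d->live.nqueues > d->max_hw_queues ? -1 : 0;
}

static int dev_reset_queues(struct dev *d, int new_n)
{
	struct qset fresh = {0}, old;

	if (!d->started) {		/* nothing live: only record the config */
		d->cfg_n = new_n;
		return 0;
	}

	if (qset_alloc(&fresh, new_n))	/* old set keeps running on failure */
		return -1;

	old = d->live;			/* unbind old, bind new */
	d->live = fresh;

	if (update_queues(d)) {		/* roll back to the old set */
		d->live = old;
		qset_free(&fresh);
		if (hw_activate(d) == 0)
			return -1;	/* request failed, device still usable */
		qset_free(&d->live);
		d->started = false;
		return -1;
	}

	if (hw_activate(d)) {		/* hardware error: release everything */
		qset_free(&d->live);
		qset_free(&old);
		d->started = false;
		return -1;
	}

	qset_free(&old);
	d->cfg_n = new_n;
	return 0;
}
```

A caller would set up a struct dev with a live queue set and call
dev_reset_queues() with the requested count. A non-zero return with
started still true means the old queues are running again.
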

diff --git a/drivers/net/ethernet/alibaba/eea/Makefile b/drivers/net/ethernet/alibaba/eea/Makefile
index a842ac416ae8..5f0961002e57 100644
--- a/drivers/net/ethernet/alibaba/eea/Makefile
+++ b/drivers/net/ethernet/alibaba/eea/Makefile
@@ -3,4 +3,6 @@ obj-$(CONFIG_ALIBABA_EEA) += eea.o
 eea-y := eea_ring.o \
 	eea_net.o \
 	eea_pci.o \
-	eea_adminq.o
+	eea_adminq.o \
+	eea_tx.o \
+	eea_rx.o
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
index bb8a49f8c6df..cfb18a07e296 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
@@ -18,6 +18,460 @@
 
 #define EEA_SPLIT_HDR_SIZE ALIGN(128, L1_CACHE_BYTES)
 
+static irqreturn_t eea_irq_handler(int irq, void *data)
+{
+	struct eea_irq_blk *blk = data;
+
+	napi_schedule_irqoff(&blk->napi);
+
+	return IRQ_HANDLED;
+}
+
+static void eea_free_irq_blk(struct eea_net *enet)
+{
+	struct eea_irq_blk *blk;
+	u32 num;
+	int i;
+
+	if (!enet->irq_blks)
+		return;
+
+	num = enet->edev->rx_num;
+
+	for (i = 0; i < num; i++) {
+		blk = &enet->irq_blks[i];
+
+		if (blk->ready)
+			eea_pci_free_irq(blk);
+
+		blk->ready = false;
+	}
+
+	kvfree(enet->irq_blks);
+	enet->irq_blks = NULL;
+}
+
+/* The driver always allocates IRQ blocks for the maximum possible number of
+ * queues.
+ */
+static int eea_alloc_irq_blks(struct eea_net *enet)
+{
+	struct eea_device *edev = enet->edev;
+	struct eea_irq_blk *blk, *irq_blks;
+	int i, err, num;
+
+	num = enet->edev->rx_num;
+
+	irq_blks = kvcalloc(num, sizeof(*blk), GFP_KERNEL);
+	if (!irq_blks)
+		return -ENOMEM;
+
+	enet->irq_blks = irq_blks;
+
+	for (i = 0; i < num; i++) {
+		blk = &irq_blks[i];
+		blk->idx = i;
+
+		/* vec 0 is reserved for error notification. */
+		blk->msix_vec = i + 1;
+
+		err = eea_pci_request_irq(edev, blk, eea_irq_handler);
+		if (err)
+			goto err_free_irq_blk;
+
+		blk->ready = true;
+	}
+
+	return 0;
+
+err_free_irq_blk:
+	eea_free_irq_blk(enet);
+	return err;
+}
+
+static int eea_update_queues(struct eea_net *enet)
+{
+	return netif_set_real_num_queues(enet->netdev, enet->cfg.tx_ring_num,
+					 enet->cfg.rx_ring_num);
+}
+
+void eea_init_ctx(struct eea_net *enet, struct eea_net_init_ctx *ctx)
+{
+	memset(ctx, 0, sizeof(*ctx));
+
+	ctx->netdev = enet->netdev;
+	ctx->edev = enet->edev;
+	ctx->cfg = enet->cfg;
+}
+
+static void eea_bind_q_and_cfg(struct eea_net *enet,
+			       struct eea_net_init_ctx *ctx)
+{
+	struct eea_irq_blk *blk;
+	struct eea_net_rx *rx;
+	struct eea_net_tx *tx;
+	int i;
+
+	enet->cfg = ctx->cfg;
+	enet->rx = ctx->rx;
+	enet->tx = ctx->tx;
+
+	for (i = 0; i < ctx->cfg.rx_ring_num; i++) {
+		blk = &enet->irq_blks[i];
+
+		rx = ctx->rx[i];
+		tx = &ctx->tx[i];
+
+		rx->enet = enet;
+		rx->napi = &blk->napi;
+		rx->ering->msix_vec = blk->msix_vec;
+
+		tx->enet = enet;
+		tx->ering->msix_vec = blk->msix_vec;
+
+		blk->rx = rx;
+	}
+}
+
+static void eea_unbind_q_and_cfg(struct eea_net *enet,
+				 struct eea_net_init_ctx *ctx)
+{
+	struct eea_irq_blk *blk;
+	struct eea_net_rx *rx;
+	int i;
+
+	ctx->cfg = enet->cfg;
+	ctx->rx = enet->rx;
+	ctx->tx = enet->tx;
+
+	enet->rx = NULL;
+	enet->tx = NULL;
+
+	for (i = 0; i < ctx->cfg.rx_ring_num; i++) {
+		blk = &enet->irq_blks[i];
+
+		rx = ctx->rx[i];
+
+		rx->napi = NULL;
+
+		blk->rx = NULL;
+	}
+}
+
+static void eea_free_rxtx_q_mem(struct eea_net_init_ctx *ctx)
+{
+	struct eea_net_rx *rx;
+	struct eea_net_tx *tx;
+	int i;
+
+	for (i = 0; i < ctx->cfg.rx_ring_num; i++) {
+		rx = ctx->rx[i];
+		tx = &ctx->tx[i];
+
+		eea_free_rx(rx, &ctx->cfg);
+		eea_free_tx(tx, &ctx->cfg);
+	}
+
+	kvfree(ctx->rx);
+	kvfree(ctx->tx);
+}
+
+/* alloc tx/rx: struct, ring, meta, pp, napi */
+static int eea_alloc_rxtx_q_mem(struct eea_net_init_ctx *ctx)
+{
+	struct eea_net_rx *rx;
+	struct eea_net_tx *tx;
+	int err, i;
+
+	ctx->tx = kvcalloc(ctx->cfg.tx_ring_num, sizeof(*ctx->tx), GFP_KERNEL);
+	if (!ctx->tx)
+		return -ENOMEM;
+
+	ctx->rx = kvcalloc(ctx->cfg.rx_ring_num, sizeof(*ctx->rx), GFP_KERNEL);
+	if (!ctx->rx)
+		goto err_free_tx;
+
+	ctx->cfg.rx_sq_desc_size = sizeof(struct eea_rx_desc);
+	ctx->cfg.rx_cq_desc_size = sizeof(struct eea_rx_cdesc);
+	ctx->cfg.tx_sq_desc_size = sizeof(struct eea_tx_desc);
+	ctx->cfg.tx_cq_desc_size = sizeof(struct eea_tx_cdesc);
+
+	/* ethtool may config this. */
+	if (!ctx->cfg.split_hdr)
+		ctx->cfg.rx_sq_desc_size = sizeof(struct eea_rx_desc_no_hdr);
+
+	for (i = 0; i < ctx->cfg.rx_ring_num; i++) {
+		rx = eea_alloc_rx(ctx, i);
+		if (!rx)
+			goto err_free;
+
+		ctx->rx[i] = rx;
+
+		tx = ctx->tx + i;
+		err = eea_alloc_tx(ctx, tx, i);
+		if (err)
+			goto err_free;
+	}
+
+	return 0;
+
+err_free:
+	for (i = 0; i < ctx->cfg.rx_ring_num; i++) {
+		rx = ctx->rx[i];
+		tx = ctx->tx + i;
+
+		eea_free_rx(rx, &ctx->cfg);
+		eea_free_tx(tx, &ctx->cfg);
+	}
+
+	kvfree(ctx->rx);
+
+err_free_tx:
+	kvfree(ctx->tx);
+	return -ENOMEM;
+}
+
+static int eea_hw_active_ring(struct eea_net *enet)
+{
+	return eea_adminq_create_q(enet, enet->cfg.rx_ring_num
+				   + enet->cfg.tx_ring_num, 0);
+}
+
+static int eea_hw_unactive_ring(struct eea_net *enet)
+{
+	int err;
+
+	err = eea_adminq_destroy_all_q(enet);
+	if (err)
+		netdev_warn(enet->netdev, "unactive rxtx ring failed.\n");
+
+	return err;
+}
+
+/* stop rx napi, stop tx queue. */
+static void eea_stop_rxtx(struct net_device *netdev)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	int i;
+
+	netif_tx_disable(netdev);
+
+	for (i = 0; i < enet->cfg.rx_ring_num; i++)
+		enet_rx_stop(enet->rx[i]);
+
+	netif_carrier_off(netdev);
+}
+
+static void eea_start_rxtx(struct eea_net *enet)
+{
+	int i;
+
+	for (i = 0; i < enet->cfg.rx_ring_num; i++)
+		enet_rx_start(enet->rx[i]);
+
+	netif_tx_start_all_queues(enet->netdev);
+	netif_carrier_on(enet->netdev);
+
+	enet->started = true;
+}
+
+static int eea_netdev_stop(struct net_device *netdev)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	struct eea_net_init_ctx ctx;
+
+	/* This function can be called during device anomaly recovery. To
+	 * prevent duplicate stop operations, the `started` flag is introduced
+	 * for checking.
+	 */
+
+	if (!enet->started) {
+		netdev_warn(netdev, "eea netdev stop: but dev is not started.\n");
+		return 0;
+	}
+
+	eea_init_ctx(enet, &ctx);
+
+	eea_stop_rxtx(netdev);
+	eea_hw_unactive_ring(enet);
+	eea_unbind_q_and_cfg(enet, &ctx);
+	eea_free_rxtx_q_mem(&ctx);
+
+	enet->started = false;
+
+	return 0;
+}
+
+static int eea_netdev_open(struct net_device *netdev)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	struct eea_net_init_ctx ctx;
+	int err;
+
+	if (enet->link_err) {
+		netdev_err(netdev, "netdev open err, because link error: %d\n",
+			   enet->link_err);
+		return -EBUSY;
+	}
+
+	eea_init_ctx(enet, &ctx);
+
+	err = eea_alloc_rxtx_q_mem(&ctx);
+	if (err)
+		goto err_done;
+
+	eea_bind_q_and_cfg(enet, &ctx);
+
+	err = eea_update_queues(enet);
+	if (err)
+		goto err_free_q;
+
+	err = eea_hw_active_ring(enet);
+	if (err)
+		goto err_free_q;
+
+	eea_start_rxtx(enet);
+
+	return 0;
+
+err_free_q:
+	eea_unbind_q_and_cfg(enet, &ctx);
+	eea_free_rxtx_q_mem(&ctx);
+
+err_done:
+	return err;
+}
+
+/* resources: ring, buffers, irq */
+int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx)
+{
+	struct eea_net_init_ctx ctx_old = {0};
+	int err, error;
+
+	if (!netif_running(enet->netdev) || !enet->started) {
+		enet->cfg = ctx->cfg;
+		return 0;
+	}
+
+	err = eea_alloc_rxtx_q_mem(ctx);
+	if (err) {
+		netdev_warn(enet->netdev,
+			    "eea reset: alloc q failed. stop reset. err %d\n",
+			    err);
+		return err;
+	}
+
+	eea_stop_rxtx(enet->netdev);
+	eea_hw_unactive_ring(enet);
+
+	eea_unbind_q_and_cfg(enet, &ctx_old);
+	eea_bind_q_and_cfg(enet, ctx);
+
+	err = eea_update_queues(enet);
+	if (err) {
+		netdev_err(enet->netdev,
+			   "eea reset: set real num queues failed. err %d\n",
+			   err);
+		goto err_bind_old;
+	}
+
+	err = eea_hw_active_ring(enet);
+	if (err) {
+		netdev_err(enet->netdev, "eea reset: active new ring. err %d\n",
+			   err);
+		eea_unbind_q_and_cfg(enet, ctx);
+		goto err_free_q;
+	}
+
+	eea_start_rxtx(enet);
+	eea_free_rxtx_q_mem(&ctx_old);
+	return 0;
+
+err_bind_old:
+	eea_unbind_q_and_cfg(enet, ctx);
+	eea_bind_q_and_cfg(enet, &ctx_old);
+	error = eea_hw_active_ring(enet);
+	if (error) {
+		netdev_err(enet->netdev, "eea reset: active old ring. err %d\n",
+			   error);
+		eea_unbind_q_and_cfg(enet, &ctx_old);
+		err = error;
+		goto err_free_q;
+	}
+
+	eea_start_rxtx(enet);
+	eea_free_rxtx_q_mem(ctx);
+	return err;
+
+err_free_q:
+
+	/* An exception occurred at the hardware level, and there's not much we
+	 * can do about it -- we can only release the resources first.
+	 */
+	eea_free_rxtx_q_mem(ctx);
+	eea_free_rxtx_q_mem(&ctx_old);
+	enet->started = false;
+	return err;
+}
+
+int eea_queues_check_and_reset(struct eea_device *edev)
+{
+	struct eea_aq_dev_status dstatus = {0};
+	struct eea_aq_queue_status *qstatus;
+	struct eea_aq_queue_status *qs;
+	struct eea_net_init_ctx ctx;
+	bool need_reset = false;
+	int i, err = 0;
+
+	rtnl_lock();
+
+	if (!netif_running(edev->enet->netdev))
+		goto err_unlock;
+
+	/* Maybe stopped by ha. */
+	if (!edev->enet->started || edev->enet->link_err)
+		goto err_unlock;
+
+	err = eea_adminq_dev_status(edev->enet, &dstatus);
+	if (err) {
+		netdev_warn(edev->enet->netdev, "query queue status failed.\n");
+		goto err_unlock;
+	}
+
+	if (le16_to_cpu(dstatus.status->link_status) == EEA_LINK_DOWN_STATUS) {
+		/* The device is broken and cannot be brought up. */
+		eea_netdev_stop(edev->enet->netdev);
+		edev->enet->link_err = EEA_LINK_ERR_LINK_DOWN;
+		netdev_warn(edev->enet->netdev, "device link is down. stop device.\n");
+		goto err_free;
+	}
+
+	qstatus = dstatus.status->q_status;
+
+	for (i = 0; i < dstatus.num; ++i) {
+		qs = &qstatus[i];
+
+		if (le16_to_cpu(qs->status) == EEA_QUEUE_STATUS_NEED_RESET) {
+			netdev_warn(edev->enet->netdev,
+				    "queue status: queue %u needs to reset\n",
+				    le16_to_cpu(qs->qidx));
+			need_reset = true;
+		}
+	}
+
+	if (need_reset) {
+		eea_init_ctx(edev->enet, &ctx);
+		err = eea_reset_hw_resources(edev->enet, &ctx);
+	}
+
+err_free:
+	kfree(dstatus.status);
+
+err_unlock:
+	rtnl_unlock();
+	return err;
+}
+
 static int eea_update_cfg(struct eea_net *enet,
 			  struct eea_device *edev,
 			  struct eea_aq_cfg *hwcfg)
@@ -149,6 +603,9 @@ static int eea_netdev_init_features(struct net_device *netdev,
 }
 
 static const struct net_device_ops eea_netdev = {
+	.ndo_open           = eea_netdev_open,
+	.ndo_stop           = eea_netdev_stop,
+	.ndo_start_xmit     = eea_tx_xmit,
 	.ndo_validate_addr  = eth_validate_addr,
 	.ndo_features_check = passthru_features_check,
 };
@@ -157,6 +614,7 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
 {
 	struct net_device *netdev;
 	struct eea_net *enet;
+	int err;
 
 	netdev = alloc_etherdev_mq(sizeof(struct eea_net), pairs);
 	if (!netdev) {
@@ -173,14 +631,72 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
 	enet->edev = edev;
 	edev->enet = enet;
 
+	err = eea_alloc_irq_blks(enet);
+	if (err) {
+		dev_err(edev->dma_dev,
+			"eea_alloc_irq_blks failed with pairs %d\n", pairs);
+		free_netdev(netdev);
+		return NULL;
+	}
+
 	return enet;
 }
 
+static int eea_net_reprobe(struct eea_device *edev)
+{
+	struct eea_net *enet = edev->enet;
+	int err = 0;
+
+	enet->edev = edev;
+
+	if (!enet->adminq.ring) {
+		err = eea_create_adminq(enet, edev->rx_num + edev->tx_num);
+		if (err)
+			return err;
+	}
+
+	err = eea_alloc_irq_blks(enet);
+	if (err)
+		goto err_destroy_aq;
+
+	rtnl_lock();
+
+	enet->link_err = 0;
+	if (edev->ha_reset_netdev_running &&
+	    netif_running(edev->enet->netdev)) {
+		err = eea_netdev_open(enet->netdev);
+		if (err) {
+			enet->link_err = EEA_LINK_ERR_HA_RESET_DEV;
+			rtnl_unlock();
+			goto err_free_irq_blks;
+		}
+	}
+
+	rtnl_unlock();
+
+	enet->wait_pci_ready = false;
+	return 0;
+
+err_free_irq_blks:
+	eea_free_irq_blk(enet);
+
+err_destroy_aq:
+	eea_destroy_adminq(enet);
+
+	return err;
+}
+
 int eea_net_probe(struct eea_device *edev)
 {
 	struct eea_net *enet;
 	int err = -ENOMEM;
 
+	/* If edev->enet is not null, then this is called from ha reset worker.
+	 * Call eea_net_reprobe() directly.
+	 */
+	if (edev->enet)
+		return eea_net_reprobe(edev);
+
 	enet = eea_netdev_alloc(edev, edev->rx_num);
 	if (!enet)
 		return -ENOMEM;
@@ -208,11 +724,34 @@ int eea_net_probe(struct eea_device *edev)
 	eea_destroy_adminq(enet);
 
 err_free_netdev:
+	eea_free_irq_blk(enet);
 	free_netdev(enet->netdev);
 	return err;
 }
 
-void eea_net_remove(struct eea_device *edev)
+static void eea_net_ha_reset_remove(struct eea_net *enet,
+				    struct eea_device *edev)
+{
+	rtnl_lock();
+	edev->ha_reset_netdev_running = false;
+	if (netif_running(enet->netdev)) {
+		eea_netdev_stop(enet->netdev);
+		edev->ha_reset_netdev_running = true;
+	}
+
+	/* Prevent the user from bringing up the net device. */
+	enet->link_err = EEA_LINK_ERR_HA_RESET_DEV;
+
+	rtnl_unlock();
+
+	eea_device_reset(edev);
+	eea_destroy_adminq(enet);
+	eea_free_irq_blk(enet);
+
+	enet->wait_pci_ready = true;
+}
+
+void eea_net_remove(struct eea_device *edev, bool ha)
 {
 	struct net_device *netdev;
 	struct eea_net *enet;
@@ -220,11 +759,19 @@ void eea_net_remove(struct eea_device *edev)
 	enet = edev->enet;
 	netdev = enet->netdev;
 
-	netdev_dbg(enet->netdev, "eea removed.\n");
+	if (ha) {
+		if (enet->wait_pci_ready)
+			return;
 
-	eea_device_reset(edev);
+		eea_net_ha_reset_remove(enet, edev);
+		return;
+	}
 
-	eea_destroy_adminq(enet);
+	if (!enet->wait_pci_ready) {
+		eea_device_reset(edev);
+		eea_destroy_adminq(enet);
+		eea_free_irq_blk(enet);
+	}
 
 	free_netdev(netdev);
 }
@@ -241,9 +788,11 @@ void eea_net_shutdown(struct eea_device *edev)
 
 	netif_device_detach(netdev);
 
-	eea_device_reset(edev);
-
-	eea_destroy_adminq(enet);
+	if (!enet->wait_pci_ready) {
+		eea_device_reset(edev);
+		eea_destroy_adminq(enet);
+		eea_free_irq_blk(enet);
+	}
 
 	rtnl_unlock();
 }
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
index fa0eec8af21b..9639f0c5c618 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.h
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
@@ -18,6 +18,8 @@
 #define EEA_VER_MINOR		0
 #define EEA_VER_SUB_MINOR	0
 
+struct eea_tx_meta;
+
 struct eea_net_tx {
 	struct eea_net *enet;
 
@@ -100,6 +102,18 @@ struct eea_net_cfg {
 	u8 tx_cq_desc_size;
 
 	u32 split_hdr;
+
+	struct hwtstamp_config ts_cfg;
+};
+
+struct eea_net_init_ctx {
+	struct eea_net_cfg cfg;
+
+	struct eea_net_tx *tx;
+	struct eea_net_rx **rx;
+
+	struct net_device *netdev;
+	struct eea_device *edev;
 };
 
 enum {
@@ -108,6 +122,17 @@ enum {
 	EEA_LINK_ERR_LINK_DOWN,
 };
 
+struct eea_irq_blk {
+	struct napi_struct napi;
+	u16 msix_vec;
+	bool ready;
+	struct eea_net_rx *rx;
+	char irq_name[32];
+	int irq;
+	int idx;
+
+};
+
 struct eea_net {
 	struct eea_device *edev;
 	struct net_device *netdev;
@@ -120,18 +145,37 @@ struct eea_net {
 	struct eea_net_cfg cfg;
 	struct eea_net_cfg cfg_hw;
 
+	struct eea_irq_blk *irq_blks;
+
 	u32 link_err;
 
 	bool started;
+	bool wait_pci_ready;
 
 	u8 duplex;
 	u32 speed;
-
-	u64 hw_ts_offset;
 };
 
 int eea_net_probe(struct eea_device *edev);
-void eea_net_remove(struct eea_device *edev);
+void eea_net_remove(struct eea_device *edev, bool ha);
 void eea_net_shutdown(struct eea_device *edev);
 
+int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx);
+void eea_init_ctx(struct eea_net *enet, struct eea_net_init_ctx *ctx);
+int eea_queues_check_and_reset(struct eea_device *edev);
+
+/* rx apis */
+void enet_rx_stop(struct eea_net_rx *rx);
+void enet_rx_start(struct eea_net_rx *rx);
+
+void eea_free_rx(struct eea_net_rx *rx, struct eea_net_cfg *cfg);
+struct eea_net_rx *eea_alloc_rx(struct eea_net_init_ctx *ctx, u32 idx);
+
+/* tx apis */
+int eea_poll_tx(struct eea_net_tx *tx, int budget);
+netdev_tx_t eea_tx_xmit(struct sk_buff *skb, struct net_device *netdev);
+
+void eea_free_tx(struct eea_net_tx *tx, struct eea_net_cfg *cfg);
+int eea_alloc_tx(struct eea_net_init_ctx *ctx, struct eea_net_tx *tx, u32 idx);
+
 #endif
diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
index bef0ebd6bb20..9872e360dd5d 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_pci.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
@@ -16,6 +16,9 @@
 #define EEA_PCI_DB_MAX_SIZE 512
 #define EEA_PCI_Q_MAX_NUM 1000
 
+#define EEA_PCI_CAP_RESET_DEVICE 0xFA
+#define EEA_PCI_CAP_RESET_FLAG BIT(1)
+
 struct eea_pci_cfg {
 	__le32 reserve0;
 	__le32 reserve1;
@@ -58,6 +61,13 @@ struct eea_pci_device {
 	void __iomem *db_base;
 	void __iomem *db_end;
 
+	int ha_irq;
+
+	struct work_struct ha_handle_work;
+	char ha_irq_name[32];
+	int reset_pos;
+	bool ha_ready;
+
 	bool shutdown;
 };
 
@@ -73,6 +83,11 @@ struct eea_pci_device {
 #define cfg_read32(reg, item) ioread32(cfg_pointer(reg, item))
 #define cfg_read64(reg, item) ioread64(cfg_pointer(reg, item))
 
+/* Due to circular references, forward declarations are needed here. */
+static int __eea_pci_probe(struct pci_dev *pci_dev,
+			   struct eea_pci_device *ep_dev, bool pci_probe);
+static void __eea_pci_remove(struct pci_dev *pci_dev, bool pci_remove);
+
 const char *eea_pci_name(struct eea_device *edev)
 {
 	return pci_name(edev->ep_dev->pci_dev);
@@ -199,6 +214,12 @@ static int eea_negotiate(struct eea_device *edev)
 static void eea_pci_release_resource(struct eea_pci_device *ep_dev)
 {
 	struct pci_dev *pci_dev = ep_dev->pci_dev;
+	struct eea_device *edev;
+
+	edev = &ep_dev->edev;
+
+	if (edev->status < EEA_PCI_STATUS_READY)
+		return;
 
 	if (ep_dev->reg) {
 		pci_iounmap(pci_dev, ep_dev->reg);
@@ -213,12 +234,16 @@ static void eea_pci_release_resource(struct eea_pci_device *ep_dev)
 	pci_clear_master(pci_dev);
 	pci_release_regions(pci_dev);
 	pci_disable_device(pci_dev);
+
+	edev->status = EEA_PCI_STATUS_NONE;
 }
 
 static int eea_pci_setup(struct pci_dev *pci_dev, struct eea_pci_device *ep_dev)
 {
 	int err, n, ret, len;
 
+	ep_dev->edev.status = EEA_PCI_STATUS_ERR;
+
 	ep_dev->pci_dev = pci_dev;
 
 	err = pci_enable_device(pci_dev);
@@ -307,6 +332,8 @@ static int eea_pci_setup(struct pci_dev *pci_dev, struct eea_pci_device *ep_dev)
 
 	ep_dev->msix_vec_n = ret;
 
+	ep_dev->edev.status = EEA_PCI_STATUS_READY;
+
 	return 0;
 
 err_clear_master:
@@ -359,6 +386,178 @@ int eea_pci_active_aq(struct eea_ring *ering, int msix_vec)
 	return 0;
 }
 
+void eea_pci_free_irq(struct eea_irq_blk *blk)
+{
+	irq_update_affinity_hint(blk->irq, NULL);
+	free_irq(blk->irq, blk);
+}
+
+int eea_pci_request_irq(struct eea_device *edev, struct eea_irq_blk *blk,
+			irqreturn_t (*callback)(int irq, void *data))
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+	int irq;
+
+	snprintf(blk->irq_name, sizeof(blk->irq_name), "eea-q%d@%s", blk->idx,
+		 pci_name(ep_dev->pci_dev));
+
+	irq = pci_irq_vector(ep_dev->pci_dev, blk->msix_vec);
+
+	blk->irq = irq;
+
+	return request_irq(irq, callback, IRQF_NO_AUTOEN, blk->irq_name, blk);
+}
+
+static void eea_ha_handle_reset(struct eea_pci_device *ep_dev)
+{
+	struct eea_device *edev;
+	struct pci_dev *pci_dev;
+	u16 reset;
+	int err;
+
+	if (!ep_dev->reset_pos) {
+		eea_queues_check_and_reset(&ep_dev->edev);
+		return;
+	}
+
+	edev = &ep_dev->edev;
+
+	pci_read_config_word(ep_dev->pci_dev, ep_dev->reset_pos, &reset);
+
+	/* Clear bits using 0xFFFF and ignore all previous messages. */
+	pci_write_config_word(ep_dev->pci_dev, ep_dev->reset_pos, 0xFFFF);
+
+	if (reset & EEA_PCI_CAP_RESET_FLAG) {
+		dev_warn(&ep_dev->pci_dev->dev, "recv device reset request.\n");
+
+		pci_dev = ep_dev->pci_dev;
+
+		/* The pci remove callback may hold this lock. If the
+		 * pci remove callback is called, then we can ignore the
+		 * ha interrupt.
+		 */
+		if (mutex_trylock(&edev->ha_lock)) {
+			if (edev->status != EEA_PCI_STATUS_DONE) {
+				dev_err(&ep_dev->pci_dev->dev, "ha: reset device: pci status is %d. skip it.\n",
+					edev->status);
+
+				mutex_unlock(&edev->ha_lock);
+				return;
+			}
+
+			__eea_pci_remove(pci_dev, false);
+			err = __eea_pci_probe(pci_dev, ep_dev, false);
+			if (err)
+				/* PCI initialization or network device
+				 * re-probing has failed. Nothing more can be
+				 * done here; wait for the PCI subsystem to
+				 * call the remove callback, which releases
+				 * the remaining resources.
+				 */
+				dev_err(&ep_dev->pci_dev->dev,
+					"ha: re-setup failed.\n");
+
+			mutex_unlock(&edev->ha_lock);
+		} else {
+			/* Device removal is in progress, so return directly. */
+			dev_warn(&ep_dev->pci_dev->dev,
+				 "ha device reset: trylock failed.\n");
+		}
+		return;
+	}
+
+	eea_queues_check_and_reset(&ep_dev->edev);
+}
+
+/* ha handle code */
+static void eea_ha_handle_work(struct work_struct *work)
+{
+	struct eea_pci_device *ep_dev;
+
+	ep_dev = container_of(work, struct eea_pci_device, ha_handle_work);
+
+	/* The HA interrupt was triggered, so an error may have occurred and
+	 * the device or some queues may need to be reset.
+	 */
+	dev_warn(&ep_dev->pci_dev->dev, "recv ha interrupt.\n");
+
+	eea_ha_handle_reset(ep_dev);
+}
+
+static irqreturn_t eea_pci_ha_handle(int irq, void *data)
+{
+	struct eea_device *edev = data;
+
+	schedule_work(&edev->ep_dev->ha_handle_work);
+
+	return IRQ_HANDLED;
+}
+
+static void eea_pci_free_ha_irq(struct eea_device *edev)
+{
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+	int irq;
+
+	if (ep_dev->ha_ready) {
+		irq = pci_irq_vector(ep_dev->pci_dev, 0);
+		free_irq(irq, edev);
+		ep_dev->ha_ready = false;
+	}
+}
+
+static int eea_pci_ha_init(struct eea_device *edev, struct pci_dev *pci_dev,
+			   bool pci_probe)
+{
+	int pos, cfg_type_off, cfg_drv_off, cfg_dev_off;
+	struct eea_pci_device *ep_dev = edev->ep_dev;
+	int irq, err;
+	u8 type;
+
+	snprintf(ep_dev->ha_irq_name, sizeof(ep_dev->ha_irq_name), "eea-ha@%s",
+		 pci_name(ep_dev->pci_dev));
+
+	irq = pci_irq_vector(ep_dev->pci_dev, 0);
+
+	if (pci_probe)
+		INIT_WORK(&ep_dev->ha_handle_work, eea_ha_handle_work);
+
+	/* This irq is not used only for ha, so always request it. */
+	err = request_irq(irq, eea_pci_ha_handle, IRQF_NO_AUTOEN,
+			  ep_dev->ha_irq_name, edev);
+	if (err)
+		return err;
+
+	ep_dev->ha_irq = irq;
+
+	ep_dev->ha_ready = true;
+	ep_dev->reset_pos = 0;
+
+	cfg_type_off = offsetof(struct eea_pci_cap, cfg_type);
+	cfg_drv_off = offsetof(struct eea_pci_reset_reg, driver);
+	cfg_dev_off = offsetof(struct eea_pci_reset_reg, device);
+
+	for (pos = pci_find_capability(pci_dev, PCI_CAP_ID_VNDR);
+	     pos > 0;
+	     pos = pci_find_next_capability(pci_dev, pos, PCI_CAP_ID_VNDR)) {
+		pci_read_config_byte(pci_dev, pos + cfg_type_off, &type);
+
+		if (type == EEA_PCI_CAP_RESET_DEVICE) {
+			/* Tell the device the driver supports this feature. */
+			pci_write_config_word(pci_dev, pos + cfg_drv_off,
+					      EEA_PCI_CAP_RESET_FLAG);
+			pci_write_config_word(pci_dev, pos + cfg_dev_off,
+					      0xFFFF);
+
+			edev->ep_dev->reset_pos = pos + cfg_dev_off;
+			return 0;
+		}
+	}
+
+	/* No reset cap: the irq is used for event notification only. */
+	dev_warn(&edev->ep_dev->pci_dev->dev, "reset cap not found.\n");
+	return 0;
+}
+
 u64 eea_pci_device_ts(struct eea_device *edev)
 {
 	struct eea_pci_device *ep_dev = edev->ep_dev;
@@ -391,12 +590,16 @@ static int eea_init_device(struct eea_device *edev)
 }
 
 static int __eea_pci_probe(struct pci_dev *pci_dev,
-			   struct eea_pci_device *ep_dev)
+			   struct eea_pci_device *ep_dev,
+			   bool pci_probe)
 {
+	struct eea_device *edev;
 	int err;
 
 	pci_set_drvdata(pci_dev, ep_dev);
 
+	edev = &ep_dev->edev;
+
 	err = eea_pci_setup(pci_dev, ep_dev);
 	if (err)
 		return err;
@@ -405,20 +608,36 @@ static int __eea_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto err_pci_rel;
 
+	err = eea_pci_ha_init(edev, pci_dev, pci_probe);
+	if (err)
+		goto err_net_rm;
+
+	edev->status = EEA_PCI_STATUS_DONE;
+
+	enable_irq(ep_dev->ha_irq);
+
 	return 0;
 
+err_net_rm:
+	eea_net_remove(edev, !pci_probe);
+
 err_pci_rel:
 	eea_pci_release_resource(ep_dev);
 	return err;
 }
 
-static void __eea_pci_remove(struct pci_dev *pci_dev)
+static void __eea_pci_remove(struct pci_dev *pci_dev, bool pci_remove)
 {
 	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
 	struct device *dev = get_device(&ep_dev->pci_dev->dev);
 	struct eea_device *edev = &ep_dev->edev;
 
-	eea_net_remove(edev);
+	eea_pci_free_ha_irq(edev);
+
+	if (pci_remove)
+		flush_work(&ep_dev->ha_handle_work);
+
+	eea_net_remove(edev, !pci_remove);
 
 	eea_pci_release_resource(ep_dev);
 
@@ -443,8 +662,11 @@ static int eea_pci_probe(struct pci_dev *pci_dev,
 
 	ep_dev->pci_dev = pci_dev;
 
-	err = __eea_pci_probe(pci_dev, ep_dev);
+	mutex_init(&edev->ha_lock);
+
+	err = __eea_pci_probe(pci_dev, ep_dev, true);
 	if (err) {
+		mutex_destroy(&edev->ha_lock);
 		pci_set_drvdata(pci_dev, NULL);
 		kfree(ep_dev);
 	}
@@ -455,10 +677,17 @@ static int eea_pci_probe(struct pci_dev *pci_dev,
 static void eea_pci_remove(struct pci_dev *pci_dev)
 {
 	struct eea_pci_device *ep_dev = pci_get_drvdata(pci_dev);
+	struct eea_device *edev;
+
+	edev = &ep_dev->edev;
 
-	__eea_pci_remove(pci_dev);
+	mutex_lock(&edev->ha_lock);
+	__eea_pci_remove(pci_dev, true);
+	mutex_unlock(&edev->ha_lock);
 
 	pci_set_drvdata(pci_dev, NULL);
+
+	mutex_destroy(&edev->ha_lock);
 	kfree(ep_dev);
 }
 
@@ -471,6 +700,11 @@ static void eea_pci_shutdown(struct pci_dev *pci_dev)
 
 	ep_dev->shutdown = true;
 
+	mutex_lock(&edev->ha_lock);
+	eea_pci_free_ha_irq(edev);
+	flush_work(&ep_dev->ha_handle_work);
+	mutex_unlock(&edev->ha_lock);
+
 	eea_net_shutdown(edev);
 
 	pci_clear_master(pci_dev);
diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.h b/drivers/net/ethernet/alibaba/eea/eea_pci.h
index cfd278e2efde..6f6ba7d5514d 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_pci.h
+++ b/drivers/net/ethernet/alibaba/eea/eea_pci.h
@@ -10,8 +10,18 @@
 
 #include <linux/pci.h>
 
+#include "eea_net.h"
 #include "eea_ring.h"
 
+enum eea_pci_status {
+	EEA_PCI_STATUS_NONE,
+	EEA_PCI_STATUS_ERR,
+	EEA_PCI_STATUS_READY,
+	EEA_PCI_STATUS_DONE,
+};
+
+struct eea_irq_blk;
+
 struct eea_pci_cap {
 	__u8 cap_vndr;
 	__u8 cap_next;
@@ -34,6 +44,12 @@ struct eea_device {
 
 	u64 features;
 
+	enum eea_pci_status status;
+	bool ha_reset_netdev_running;
+
+	/* Serializes the HA work against PCI remove. */
+	struct mutex ha_lock;
+
 	u32 rx_num;
 	u32 tx_num;
 	u32 db_blk_size;
@@ -47,6 +63,10 @@ int eea_device_reset(struct eea_device *dev);
 int eea_pci_set_aq_up(struct eea_device *dev);
 int eea_pci_active_aq(struct eea_ring *ering, int msix_vec);
 
+int eea_pci_request_irq(struct eea_device *edev, struct eea_irq_blk *blk,
+			irqreturn_t (*callback)(int irq, void *data));
+void eea_pci_free_irq(struct eea_irq_blk *blk);
+
 u64 eea_pci_device_ts(struct eea_device *edev);
 
 void __iomem *eea_pci_db_addr(struct eea_device *edev, u32 off);
diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
new file mode 100644
index 000000000000..b1265048fbc3
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include <net/netdev_rx_queue.h>
+#include <net/page_pool/helpers.h>
+
+#include "eea_adminq.h"
+#include "eea_net.h"
+#include "eea_pci.h"
+#include "eea_ring.h"
+
+#define EEA_ENABLE_F_NAPI        BIT(0)
+
+#define EEA_PAGE_FRAGS_NUM 1024
+
+static void eea_free_rx_buffer(struct eea_net_rx *rx, struct eea_rx_meta *meta,
+			       bool allow_direct)
+{
+	u32 drain_count;
+
+	drain_count = EEA_PAGE_FRAGS_NUM - meta->frags;
+
+	if (page_pool_unref_page(meta->page, drain_count) == 0)
+		page_pool_put_unrefed_page(rx->pp, meta->page, -1,
+					   allow_direct);
+
+	meta->page = NULL;
+}
+
+static void eea_free_rx_hdr(struct eea_net_rx *rx, struct eea_net_cfg *cfg)
+{
+	struct eea_rx_meta *meta;
+	int i;
+
+	for (i = 0; i < cfg->rx_ring_depth; ++i) {
+		meta = &rx->meta[i];
+		meta->hdr_addr = NULL;
+
+		if (!meta->hdr_page)
+			continue;
+
+		dma_unmap_page(rx->dma_dev, meta->hdr_dma, PAGE_SIZE,
+			       DMA_FROM_DEVICE);
+		put_page(meta->hdr_page);
+
+		meta->hdr_page = NULL;
+	}
+}
+
+static int eea_alloc_rx_hdr(struct eea_net_init_ctx *ctx, struct eea_net_rx *rx)
+{
+	struct page *hdr_page = NULL;
+	struct eea_rx_meta *meta;
+	u32 offset = 0, hdrsize;
+	struct device *dmadev;
+	dma_addr_t dma;
+	int i;
+
+	dmadev = ctx->edev->dma_dev;
+	hdrsize = ctx->cfg.split_hdr;
+
+	for (i = 0; i < ctx->cfg.rx_ring_depth; ++i) {
+		meta = &rx->meta[i];
+		meta->hdr_page = NULL;
+
+		if (!hdr_page || offset + hdrsize > PAGE_SIZE) {
+			hdr_page = alloc_page(GFP_KERNEL);
+			if (!hdr_page)
+				goto err;
+
+			dma = dma_map_page(dmadev, hdr_page, 0, PAGE_SIZE,
+					   DMA_FROM_DEVICE);
+
+			if (unlikely(dma_mapping_error(dmadev, dma))) {
+				put_page(hdr_page);
+				goto err;
+			}
+
+			offset = 0;
+			meta->hdr_page = hdr_page;
+		}
+
+		meta->hdr_dma = dma + offset;
+		meta->hdr_addr = page_address(hdr_page) + offset;
+		offset += hdrsize;
+	}
+
+	return 0;
+
+err:
+	eea_free_rx_hdr(rx, &ctx->cfg);
+	return -ENOMEM;
+}
+
+static int eea_poll(struct napi_struct *napi, int budget)
+{
+	/* Empty function; will be implemented in a subsequent commit. */
+	return 0;
+}
+
+static void eea_free_rx_buffers(struct eea_net_rx *rx, struct eea_net_cfg *cfg)
+{
+	struct eea_rx_meta *meta;
+	u32 i;
+
+	for (i = 0; i < cfg->rx_ring_depth; ++i) {
+		meta = &rx->meta[i];
+		if (!meta->page)
+			continue;
+
+		eea_free_rx_buffer(rx, meta, false);
+	}
+}
+
+static struct page_pool *eea_create_pp(struct eea_net_init_ctx *ctx, u32 idx)
+{
+	struct page_pool_params pp_params = {0};
+
+	pp_params.order     = 0;
+	pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+	pp_params.pool_size = ctx->cfg.rx_ring_depth;
+	pp_params.nid       = dev_to_node(ctx->edev->dma_dev);
+	pp_params.dev       = ctx->edev->dma_dev;
+	pp_params.netdev    = ctx->netdev;
+	pp_params.dma_dir   = DMA_FROM_DEVICE;
+	pp_params.max_len   = PAGE_SIZE;
+	pp_params.queue_idx = idx;
+
+	return page_pool_create(&pp_params);
+}
+
+static void eea_destroy_page_pool(struct eea_net_rx *rx)
+{
+	if (rx->pp)
+		page_pool_destroy(rx->pp);
+}
+
+void enet_rx_stop(struct eea_net_rx *rx)
+{
+	if (rx->flags & EEA_ENABLE_F_NAPI) {
+		rx->flags &= ~EEA_ENABLE_F_NAPI;
+
+		disable_irq(rx->enet->irq_blks[rx->index].irq);
+		napi_disable(rx->napi);
+
+		page_pool_disable_direct_recycling(rx->pp);
+		netif_napi_del(rx->napi);
+	}
+}
+
+void enet_rx_start(struct eea_net_rx *rx)
+{
+	netif_napi_add(rx->enet->netdev, rx->napi, eea_poll);
+
+	page_pool_enable_direct_recycling(rx->pp, rx->napi);
+
+	napi_enable(rx->napi);
+
+	rx->flags |= EEA_ENABLE_F_NAPI;
+
+	local_bh_disable();
+	napi_schedule(rx->napi);
+	local_bh_enable();
+
+	enable_irq(rx->enet->irq_blks[rx->index].irq);
+}
+
+/* May be called before eea_bind_q_and_cfg(), so the cfg must be passed. */
+void eea_free_rx(struct eea_net_rx *rx, struct eea_net_cfg *cfg)
+{
+	if (!rx)
+		return;
+
+	if (rx->ering) {
+		eea_ering_free(rx->ering);
+		rx->ering = NULL;
+	}
+
+	if (rx->meta) {
+		eea_free_rx_buffers(rx, cfg);
+		eea_free_rx_hdr(rx, cfg);
+		kvfree(rx->meta);
+		rx->meta = NULL;
+	}
+
+	if (rx->pp) {
+		eea_destroy_page_pool(rx);
+		rx->pp = NULL;
+	}
+
+	kfree(rx);
+}
+
+static void eea_rx_meta_init(struct eea_net_rx *rx, u32 num)
+{
+	struct eea_rx_meta *meta;
+	int i;
+
+	rx->free = NULL;
+
+	for (i = 0; i < num; ++i) {
+		meta = &rx->meta[i];
+		meta->id = i;
+		meta->next = rx->free;
+		rx->free = meta;
+	}
+}
+
+struct eea_net_rx *eea_alloc_rx(struct eea_net_init_ctx *ctx, u32 idx)
+{
+	struct eea_ring *ering;
+	struct eea_net_rx *rx;
+	int err;
+
+	rx = kzalloc(sizeof(*rx), GFP_KERNEL);
+	if (!rx)
+		return rx;
+
+	rx->index = idx;
+	snprintf(rx->name, sizeof(rx->name), "rx.%u", idx);
+
+	/* ering */
+	ering = eea_ering_alloc(idx * 2, ctx->cfg.rx_ring_depth, ctx->edev,
+				ctx->cfg.rx_sq_desc_size,
+				ctx->cfg.rx_cq_desc_size,
+				rx->name);
+	if (!ering)
+		goto err_free_rx;
+
+	rx->ering = ering;
+
+	rx->dma_dev = ctx->edev->dma_dev;
+
+	/* meta */
+	rx->meta = kvcalloc(ctx->cfg.rx_ring_depth,
+			    sizeof(*rx->meta), GFP_KERNEL);
+	if (!rx->meta)
+		goto err_free_rx;
+
+	eea_rx_meta_init(rx, ctx->cfg.rx_ring_depth);
+
+	if (ctx->cfg.split_hdr) {
+		err = eea_alloc_rx_hdr(ctx, rx);
+		if (err)
+			goto err_free_rx;
+	}
+
+	rx->pp = eea_create_pp(ctx, idx);
+	if (IS_ERR(rx->pp)) {
+		err = PTR_ERR(rx->pp);
+		rx->pp = NULL;
+		goto err_free_rx;
+	}
+
+	return rx;
+
+err_free_rx:
+	eea_free_rx(rx, &ctx->cfg);
+	return NULL;
+}
diff --git a/drivers/net/ethernet/alibaba/eea/eea_tx.c b/drivers/net/ethernet/alibaba/eea/eea_tx.c
new file mode 100644
index 000000000000..5a8babfce5de
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_tx.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include <net/netdev_queues.h>
+
+#include "eea_net.h"
+#include "eea_pci.h"
+#include "eea_ring.h"
+
+struct eea_tx_meta {
+	struct eea_tx_meta *next;
+
+	u32 id;
+
+	union {
+		struct sk_buff *skb;
+		void *data;
+	};
+
+	u32 num;
+
+	dma_addr_t dma_addr;
+	struct eea_tx_desc *desc;
+	u32 dma_len;
+};
+
+int eea_poll_tx(struct eea_net_tx *tx, int budget)
+{
+	/* Empty function; will be implemented in a subsequent commit. */
+	return budget;
+}
+
+netdev_tx_t eea_tx_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+	/* Empty function; will be implemented in a subsequent commit. */
+	dev_kfree_skb_any(skb);
+	return NETDEV_TX_OK;
+}
+
+static void eea_free_meta(struct eea_net_tx *tx, struct eea_net_cfg *cfg)
+{
+	kvfree(tx->meta);
+	tx->meta = NULL;
+}
+
+/* May be called before eea_bind_q_and_cfg(), so the cfg must be passed. */
+void eea_free_tx(struct eea_net_tx *tx, struct eea_net_cfg *cfg)
+{
+	if (!tx)
+		return;
+
+	if (tx->ering) {
+		eea_ering_free(tx->ering);
+		tx->ering = NULL;
+	}
+
+	if (tx->meta)
+		eea_free_meta(tx, cfg);
+}
+
+int eea_alloc_tx(struct eea_net_init_ctx *ctx, struct eea_net_tx *tx, u32 idx)
+{
+	struct eea_tx_meta *meta;
+	struct eea_ring *ering;
+	u32 i;
+
+	snprintf(tx->name, sizeof(tx->name), "tx.%u", idx);
+
+	ering = eea_ering_alloc(idx * 2 + 1, ctx->cfg.tx_ring_depth, ctx->edev,
+				ctx->cfg.tx_sq_desc_size,
+				ctx->cfg.tx_cq_desc_size,
+				tx->name);
+	if (!ering)
+		goto err_free_tx;
+
+	tx->ering = ering;
+	tx->index = idx;
+	tx->dma_dev = ctx->edev->dma_dev;
+
+	/* meta */
+	tx->meta = kvcalloc(ctx->cfg.tx_ring_depth,
+			    sizeof(*tx->meta), GFP_KERNEL);
+	if (!tx->meta)
+		goto err_free_tx;
+
+	for (i = 0; i < ctx->cfg.tx_ring_depth; ++i) {
+		meta = &tx->meta[i];
+		meta->id = i;
+		meta->next = tx->free;
+		tx->free = meta;
+	}
+
+	return 0;
+
+err_free_tx:
+	eea_free_tx(tx, &ctx->cfg);
+	return -ENOMEM;
+}
-- 
2.32.0.3.g01195cf9f


* [PATCH net-next v43 5/8] eea: implement packet receive logic
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (3 preceding siblings ...)
  2026-05-14  9:51 ` [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  2:34   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 6/8] eea: implement packet transmit logic Xuan Zhuo
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Implement the EEA RX path: post page-pool buffers to the device,
validate completion descriptors (id, length, header length), and build
skbs, optionally with split headers, for delivery through NAPI GRO.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/ethernet/alibaba/eea/eea_net.c |  13 +
 drivers/net/ethernet/alibaba/eea/eea_net.h |   7 +
 drivers/net/ethernet/alibaba/eea/eea_rx.c  | 532 ++++++++++++++++++++-
 3 files changed, 550 insertions(+), 2 deletions(-)
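
meta_align_offset() in this patch advances a buffer's page offset so
that offset plus headroom lands on an EEA_RX_BUF_ALIGN boundary. The
arithmetic can be sketched in user-space C as below. This is only an
illustration: the sketch_ names are hypothetical, and the headroom of
64 used in the comments is an assumed value, not one taken from the
driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's EEA_RX_BUF_ALIGN (128). */
#define SKETCH_RX_BUF_ALIGN 128u

/* Round x up to the next multiple of a; a must be a power of two. */
static uint32_t sketch_align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Same idea as meta_align_offset(): return the smallest offset that is
 * not below the given one and for which offset + headroom is a multiple
 * of SKETCH_RX_BUF_ALIGN. The data area handed to the device therefore
 * starts on an aligned address within the page.
 *
 * With an assumed headroom of 64: offset 0 becomes 64 (data at 128),
 * and offset 1601 becomes 1728 (data at 1792).
 */
static uint32_t sketch_align_offset(uint32_t offset, uint32_t headroom)
{
	uint32_t data_start;

	data_start = sketch_align_up(offset + headroom, SKETCH_RX_BUF_ALIGN);

	return data_start - headroom;
}
```

Because the alignment applies to offset + headroom rather than to the
offset itself, the bytes skipped before each buffer depend on the
headroom in use.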

diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
index cfb18a07e296..bf8581bc27ed 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
@@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
 	return enet;
 }
 
+static void eea_update_ts_off(struct eea_device *edev, struct eea_net *enet)
+{
+	u64 ts;
+
+	ts = eea_pci_device_ts(edev);
+
+	enet->hw_ts_offset = ktime_get_real() - ts;
+}
+
 static int eea_net_reprobe(struct eea_device *edev)
 {
 	struct eea_net *enet = edev->enet;
@@ -659,6 +668,8 @@ static int eea_net_reprobe(struct eea_device *edev)
 	if (err)
 		goto err_destroy_aq;
 
+	eea_update_ts_off(edev, enet);
+
 	rtnl_lock();
 
 	enet->link_err = 0;
@@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev)
 	if (err)
 		goto err_reset_dev;
 
+	eea_update_ts_off(edev, enet);
+
 	netdev_dbg(enet->netdev, "eea probe success.\n");
 
 	/* Queue TX/RX implementation is still in progress. register_netdev is
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
index 9639f0c5c618..1ff9ef1a8fbe 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.h
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
@@ -41,6 +41,7 @@ struct eea_rx_meta {
 	struct page *page;
 	dma_addr_t dma;
 	u32 offset;
+	u32 sync_for_cpu;
 	u32 frags;
 
 	struct page *hdr_page;
@@ -54,6 +55,8 @@ struct eea_rx_meta {
 	u32 tailroom;
 
 	u32 len;
+
+	bool in_use;
 };
 
 struct eea_net_rx_pkt_ctx {
@@ -62,6 +65,7 @@ struct eea_net_rx_pkt_ctx {
 	bool data_valid;
 	bool do_drop;
 
+	u32 recv_len;
 	struct sk_buff *head_skb;
 };
 
@@ -154,6 +158,8 @@ struct eea_net {
 
 	u8 duplex;
 	u32 speed;
+
+	u64 hw_ts_offset;
 };
 
 int eea_net_probe(struct eea_device *edev);
@@ -165,6 +171,7 @@ void eea_init_ctx(struct eea_net *enet, struct eea_net_init_ctx *ctx);
 int eea_queues_check_and_reset(struct eea_device *edev);
 
 /* rx apis */
+
 void enet_rx_stop(struct eea_net_rx *rx);
 void enet_rx_start(struct eea_net_rx *rx);
 
diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
index b1265048fbc3..9c71a7cf950e 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
@@ -17,6 +17,39 @@
 
 #define EEA_PAGE_FRAGS_NUM 1024
 
+#define EEA_RX_BUF_ALIGN 128
+
+#define EEA_RX_BUF_MAX_LEN (10 * 1024)
+
+struct eea_rx_ctx {
+	u32 len;
+	u32 hdr_len;
+
+	u16 flags;
+	bool more;
+
+	struct eea_rx_meta *meta;
+};
+
+static struct eea_rx_meta *eea_rx_meta_get(struct eea_net_rx *rx)
+{
+	struct eea_rx_meta *meta;
+
+	if (!rx->free)
+		return NULL;
+
+	meta = rx->free;
+	rx->free = meta->next;
+
+	return meta;
+}
+
+static void eea_rx_meta_put(struct eea_net_rx *rx, struct eea_rx_meta *meta)
+{
+	meta->next = rx->free;
+	rx->free = meta;
+}
+
 static void eea_free_rx_buffer(struct eea_net_rx *rx, struct eea_rx_meta *meta,
 			       bool allow_direct)
 {
@@ -31,6 +64,89 @@ static void eea_free_rx_buffer(struct eea_net_rx *rx, struct eea_rx_meta *meta,
 	meta->page = NULL;
 }
 
+static void eea_rx_meta_dma_sync_for_device(struct eea_net_rx *rx,
+					    struct eea_rx_meta *meta)
+{
+	u32 len;
+
+	if (meta->sync_for_cpu <= meta->offset + rx->headroom)
+		return;
+
+	len = meta->sync_for_cpu - meta->offset - rx->headroom;
+
+	dma_sync_single_for_device(rx->enet->edev->dma_dev,
+				   meta->dma + meta->offset + rx->headroom,
+				   len, DMA_FROM_DEVICE);
+	meta->sync_for_cpu = 0;
+}
+
+static void meta_align_offset(struct eea_net_rx *rx, struct eea_rx_meta *meta)
+{
+	int h, b;
+
+	h = rx->headroom;
+	b = meta->offset + h;
+
+	/* For better performance, we align the buffer address to
+	 * EEA_RX_BUF_ALIGN, as required by the device design.
+	 */
+	b = ALIGN(b, EEA_RX_BUF_ALIGN);
+
+	meta->offset = b - h;
+}
+
+static int eea_alloc_rx_buffer(struct eea_net_rx *rx, struct eea_rx_meta *meta)
+{
+	struct page *page;
+
+	if (meta->page) {
+		eea_rx_meta_dma_sync_for_device(rx, meta);
+		return 0;
+	}
+
+	page = page_pool_dev_alloc_pages(rx->pp);
+	if (!page)
+		return -ENOMEM;
+
+	page_pool_fragment_page(page, EEA_PAGE_FRAGS_NUM);
+
+	meta->page = page;
+	meta->dma = page_pool_get_dma_addr(page);
+	meta->offset = 0;
+	meta->frags = 0;
+	meta->sync_for_cpu = 0;
+
+	meta_align_offset(rx, meta);
+
+	return 0;
+}
+
+static u32 eea_consume_rx_buffer(struct eea_net_rx *rx,
+				 struct eea_rx_meta *meta,
+				 u32 consumed)
+{
+	u32 offset;
+	int min;
+
+	offset = meta->offset;
+
+	meta->offset += consumed;
+	++meta->frags;
+
+	min = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	min += rx->headroom;
+	min += SKB_DATA_ALIGN(ETH_DATA_LEN);
+
+	meta_align_offset(rx, meta);
+
+	if (min + meta->offset > PAGE_SIZE) {
+		eea_free_rx_buffer(rx, meta, true);
+		return PAGE_SIZE - offset;
+	}
+
+	return meta->offset - offset;
+}
+
 static void eea_free_rx_hdr(struct eea_net_rx *rx, struct eea_net_cfg *cfg)
 {
 	struct eea_rx_meta *meta;
@@ -96,17 +212,429 @@ static int eea_alloc_rx_hdr(struct eea_net_init_ctx *ctx, struct eea_net_rx *rx)
 	return -ENOMEM;
 }
 
-static int eea_poll(struct napi_struct *napi, int budget)
+static void eea_rx_meta_dma_sync_for_cpu(struct eea_net_rx *rx,
+					 struct eea_rx_meta *meta, u32 len)
+{
+	dma_sync_single_for_cpu(rx->enet->edev->dma_dev,
+				meta->dma + meta->offset + meta->headroom,
+				len, DMA_FROM_DEVICE);
+	meta->sync_for_cpu = meta->offset + meta->headroom + len;
+}
+
+static int eea_harden_check_overflow(struct eea_rx_ctx *ctx,
+				     struct eea_net *enet)
+{
+	u32 max_len;
+
+	max_len = ctx->meta->truesize - ctx->meta->headroom -
+		ctx->meta->tailroom;
+
+	if (unlikely(ctx->len > max_len)) {
+		pr_debug("%s: rx error: len %u exceeds truesize %u\n",
+			 enet->netdev->name, ctx->len, max_len);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int eea_harden_check_size(struct eea_rx_ctx *ctx, struct eea_net *enet)
 {
-	/* Empty function; will be implemented in a subsequent commit. */
+	int err;
+
+	err = eea_harden_check_overflow(ctx, enet);
+	if (err)
+		return err;
+
+	if (ctx->hdr_len) {
+		if (unlikely(ctx->hdr_len < ETH_HLEN)) {
+			pr_debug("%s: short hdr %u\n", enet->netdev->name,
+				 ctx->hdr_len);
+			return -EINVAL;
+		}
+
+		if (unlikely(ctx->hdr_len > enet->cfg.split_hdr)) {
+			pr_debug("%s: rx error: hdr len %u exceeds hdr buffer size %u\n",
+				 enet->netdev->name, ctx->hdr_len,
+				 enet->cfg.split_hdr);
+			return -EINVAL;
+		}
+
+		return 0;
+	}
+
+	if (unlikely(ctx->len < ETH_HLEN)) {
+		pr_debug("%s: short packet %u\n", enet->netdev->name, ctx->len);
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
+static struct sk_buff *eea_build_skb(void *buf, u32 buflen, u32 headroom,
+				     u32 len)
+{
+	struct sk_buff *skb;
+
+	skb = build_skb(buf, buflen);
+	if (unlikely(!skb))
+		return NULL;
+
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+
+	return skb;
+}
+
+static struct sk_buff *eea_rx_build_split_hdr_skb(struct eea_net_rx *rx,
+						  struct eea_rx_ctx *ctx)
+{
+	struct eea_rx_meta *meta = ctx->meta;
+	u32 truesize, offset;
+	struct sk_buff *skb;
+	struct page *page;
+
+	dma_sync_single_for_cpu(rx->enet->edev->dma_dev, meta->hdr_dma,
+				ctx->hdr_len, DMA_FROM_DEVICE);
+
+	skb = napi_alloc_skb(rx->napi, ctx->hdr_len);
+	if (unlikely(!skb))
+		return NULL;
+
+	skb_put_data(skb, ctx->meta->hdr_addr, ctx->hdr_len);
+
+	if (ctx->len) {
+		page = meta->page;
+		offset = meta->offset + meta->headroom;
+
+		truesize = eea_consume_rx_buffer(rx, meta,
+						 meta->headroom + ctx->len);
+
+		skb_add_rx_frag(skb, 0, page, offset, ctx->len, truesize);
+	}
+
+	skb_mark_for_recycle(skb);
+
+	return skb;
+}
+
+static struct sk_buff *eea_rx_build_skb(struct eea_net_rx *rx,
+					struct eea_rx_ctx *ctx)
+{
+	struct eea_rx_meta *meta = ctx->meta;
+	u32 shinfo_size, bufsize, truesize;
+	struct sk_buff *skb;
+	struct page *page;
+	void *buf;
+
+	page = meta->page;
+
+	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+	buf = page_address(page) + meta->offset;
+	bufsize = meta->headroom + SKB_DATA_ALIGN(ctx->len) + shinfo_size;
+
+	skb = eea_build_skb(buf, bufsize, meta->headroom, ctx->len);
+	if (unlikely(!skb))
+		return NULL;
+
+	truesize = eea_consume_rx_buffer(rx, meta, bufsize);
+	skb_mark_for_recycle(skb);
+
+	skb->truesize += truesize - bufsize;
+
+	return skb;
+}
+
+static void process_remain_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
+{
+	struct eea_net *enet = rx->enet;
+	struct sk_buff *head_skb;
+	u32 offset, truesize, nr_frags;
+	struct page *page;
+
+	if (eea_harden_check_overflow(ctx, enet))
+		goto err;
+
+	head_skb = rx->pkt.head_skb;
+
+	nr_frags = skb_shinfo(head_skb)->nr_frags;
+	if (unlikely(nr_frags >= MAX_SKB_FRAGS))
+		goto err;
+
+	offset = ctx->meta->offset + ctx->meta->headroom;
+	page = ctx->meta->page;
+	truesize = eea_consume_rx_buffer(rx, ctx->meta,
+					 ctx->meta->headroom + ctx->len);
+
+	skb_add_rx_frag(head_skb, nr_frags, page, offset, ctx->len, truesize);
+
+	return;
+
+err:
+	dev_kfree_skb(rx->pkt.head_skb);
+	rx->pkt.do_drop = true;
+	rx->pkt.head_skb = NULL;
+}
+
+static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
+{
+	struct eea_net *enet = rx->enet;
+	struct sk_buff *skb = NULL;
+
+	if (eea_harden_check_size(ctx, enet))
+		goto err;
+
+	rx->pkt.data_valid = ctx->flags & EEA_DESC_F_DATA_VALID;
+
+	if (ctx->hdr_len)
+		skb = eea_rx_build_split_hdr_skb(rx, ctx);
+	else
+		skb = eea_rx_build_skb(rx, ctx);
+
+	if (unlikely(!skb))
+		goto err;
+
+	rx->pkt.head_skb = skb;
+
+	return;
+
+err:
+	rx->pkt.do_drop = true;
+}
+
+static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb,
+			   struct eea_rx_cdesc *desc)
+{
+	struct eea_net *enet = rx->enet;
+
+	if (rx->pkt.data_valid)
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+	if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
+		skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
+			enet->hw_ts_offset;
+
+	skb_record_rx_queue(skb, rx->index);
+	skb->protocol = eth_type_trans(skb, enet->netdev);
+
+	napi_gro_receive(rx->napi, skb);
+}
+
+static int eea_rx_desc_to_ctx(struct eea_net_rx *rx,
+			      struct eea_rx_ctx *ctx,
+			      struct eea_rx_cdesc *desc)
+{
+	u16 id;
+
+	ctx->meta = NULL;
+
+	id = le16_to_cpu(desc->id);
+	if (unlikely(id >= rx->ering->num)) {
+		if (net_ratelimit())
+			netdev_err(rx->enet->netdev, "rx invalid id %d\n", id);
+		return -EINVAL;
+	}
+
+	ctx->meta = &rx->meta[id];
+	if (!ctx->meta->in_use) {
+		if (net_ratelimit())
+			netdev_err(rx->enet->netdev, "rx id %d not in use\n", id);
+		ctx->meta = NULL;
+		return -EINVAL;
+	}
+
+	ctx->meta->in_use = false;
+
+	ctx->len = le16_to_cpu(desc->len);
+	if (unlikely(ctx->len > ctx->meta->len)) {
+		if (net_ratelimit())
+			netdev_err(rx->enet->netdev, "rx invalid len(%d) id:%d\n",
+				   ctx->len, id);
+		return -EINVAL;
+	}
+
+	ctx->flags = le16_to_cpu(desc->flags);
+
+	ctx->hdr_len = 0;
+	if (ctx->flags & EEA_DESC_F_SPLIT_HDR) {
+		ctx->hdr_len = le16_to_cpu(desc->len_ex) &
+			EEA_RX_CDESC_HDR_LEN_MASK;
+	}
+
+	ctx->more = ctx->flags & EEA_RING_DESC_F_MORE;
+
+	return 0;
+}
+
+static int eea_cleanrx(struct eea_net_rx *rx, int budget,
+		       struct eea_rx_ctx *ctx)
+{
+	struct eea_rx_cdesc *desc;
+	struct eea_rx_meta *meta;
+	int recv, err;
+
+	for (recv = 0; recv < budget; ) {
+		desc = eea_ering_cq_get_desc(rx->ering);
+		if (!desc)
+			break;
+
+		err = eea_rx_desc_to_ctx(rx, ctx, desc);
+		if (unlikely(err)) {
+			if (ctx->meta)
+				eea_rx_meta_put(rx, ctx->meta);
+
+			if (rx->pkt.head_skb)
+				dev_kfree_skb(rx->pkt.head_skb);
+
+			/* A hardware error occurred; we are attempting to
+			 * mitigate the impact. Subsequent packets may be
+			 * corrupted.
+			 */
+			ctx->more = false;
+			goto ack;
+		}
+
+		meta = ctx->meta;
+
+		if (unlikely(rx->pkt.do_drop))
+			goto skip;
+
+		eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len);
+
+		rx->pkt.recv_len += ctx->len;
+		rx->pkt.recv_len += ctx->hdr_len;
+
+		if (!rx->pkt.idx)
+			process_first_buf(rx, ctx);
+		else
+			process_remain_buf(rx, ctx);
+
+		++rx->pkt.idx;
+
+		if (!ctx->more && rx->pkt.head_skb)
+			eea_submit_skb(rx, rx->pkt.head_skb, desc);
+
+skip:
+		eea_rx_meta_put(rx, meta);
+ack:
+		eea_ering_cq_ack_desc(rx->ering, 1);
+
+		if (!ctx->more) {
+			memset(&rx->pkt, 0, sizeof(rx->pkt));
+			++recv;
+		}
+	}
+
+	return recv;
+}
+
+static void eea_rx_dma_sync_hdr(struct eea_net_rx *rx, dma_addr_t addr)
+{
+	dma_sync_single_for_device(rx->dma_dev, addr,
+				   rx->enet->cfg.split_hdr,
+				   DMA_FROM_DEVICE);
+}
+
+/* Must only be called from NAPI context. */
+static void eea_rx_post(struct eea_net_rx *rx)
+{
+	u32 tailroom, headroom, room, len;
+	struct eea_rx_meta *meta;
+	struct eea_rx_desc *desc;
+	int err = 0, num = 0;
+	dma_addr_t addr;
+
+	tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	headroom = rx->headroom;
+	room = headroom + tailroom;
+
+	while (true) {
+		meta = eea_rx_meta_get(rx);
+		if (!meta)
+			break;
+
+		err = eea_alloc_rx_buffer(rx, meta);
+		if (err) {
+			eea_rx_meta_put(rx, meta);
+			break;
+		}
+
+		len = min_t(u32, PAGE_SIZE - meta->offset - room,
+			    EEA_RX_BUF_MAX_LEN);
+
+		len = ALIGN_DOWN(len, SMP_CACHE_BYTES);
+
+		addr = meta->dma + meta->offset + headroom;
+
+		desc = eea_ering_sq_alloc_desc(rx->ering, meta->id, true, 0);
+		desc->addr = cpu_to_le64(addr);
+		desc->len = cpu_to_le16(len);
+
+		if (meta->hdr_addr) {
+			eea_rx_dma_sync_hdr(rx, meta->hdr_dma);
+			desc->hdr_addr = cpu_to_le64(meta->hdr_dma);
+		}
+
+		eea_ering_sq_commit_desc(rx->ering);
+
+		meta->truesize = len + room;
+		meta->headroom = headroom;
+		meta->tailroom = tailroom;
+		meta->len = len;
+		meta->in_use = true;
+		++num;
+	}
+
+	if (num)
+		eea_ering_kick(rx->ering);
+}
+
+static int eea_poll(struct napi_struct *napi, int budget)
+{
+	struct eea_irq_blk *blk = container_of(napi, struct eea_irq_blk, napi);
+	struct eea_net_rx *rx = blk->rx;
+	struct eea_net_tx *tx = &rx->enet->tx[rx->index];
+	struct eea_rx_ctx ctx = {};
+	bool busy = false;
+	u32 received;
+
+	busy |= eea_poll_tx(tx, budget) >= budget;
+
+	received = eea_cleanrx(rx, budget, &ctx);
+
+	if (rx->ering->num_free > budget) {
+		/* Due to the hardware design, there is no notification when
+		 * buffers are exhausted. Therefore, we should proactively
+		 * pre-fill the buffers to avoid starvation.
+		 */
+		eea_rx_post(rx);
+
+		if (rx->ering->num - rx->ering->num_free < budget)
+			busy = true;
+	}
+
+	busy |= received >= budget;
+
+	if (busy)
+		return budget;
+
+	if (napi_complete_done(napi, received))
+		eea_ering_irq_active(rx->ering, tx->ering);
+
+	return received;
+}
+
 static void eea_free_rx_buffers(struct eea_net_rx *rx, struct eea_net_cfg *cfg)
 {
 	struct eea_rx_meta *meta;
 	u32 i;
 
+	if (rx->pkt.head_skb) {
+		dev_kfree_skb(rx->pkt.head_skb);
+		rx->pkt.head_skb = NULL;
+	}
+
 	for (i = 0; i < cfg->rx_ring_depth; ++i) {
 		meta = &rx->meta[i];
 		if (!meta->page)
-- 
2.32.0.3.g01195cf9f


* [PATCH net-next v43 6/8] eea: implement packet transmit logic
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (4 preceding siblings ...)
  2026-05-14  9:51 ` [PATCH net-next v43 5/8] eea: implement packet receive logic Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  2:47   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 7/8] eea: introduce ethtool support Xuan Zhuo
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

Implement the core logic for transmitting packets in the EEA TX path,
including packet preparation and submission to the underlying transport.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/ethernet/alibaba/eea/eea_tx.c | 384 +++++++++++++++++++++-
 1 file changed, 380 insertions(+), 4 deletions(-)
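
The TX metadata in this patch lives on a singly linked free list:
eea_tx_meta_get() pops one entry, and eea_tx_meta_put_and_unmap()
returns a whole chain of entries (one per descriptor of a packet) in a
single splice. A minimal user-space sketch of that list handling
follows. The sketch_ names are hypothetical, DMA unmapping is left out,
and clearing next on pop is a simplification (the driver clears it when
filling the descriptor).

```c
#include <assert.h>
#include <stddef.h>

struct sketch_meta {
	struct sketch_meta *next;
	unsigned int id;
};

struct sketch_pool {
	struct sketch_meta *free;
};

/* Push every slot onto the free list, as eea_alloc_tx() does. */
static void sketch_pool_init(struct sketch_pool *pool,
			     struct sketch_meta *slots, unsigned int n)
{
	unsigned int i;

	pool->free = NULL;

	for (i = 0; i < n; ++i) {
		slots[i].id = i;
		slots[i].next = pool->free;
		pool->free = &slots[i];
	}
}

/* Pop one entry, or return NULL when the list is empty. */
static struct sketch_meta *sketch_meta_get(struct sketch_pool *pool)
{
	struct sketch_meta *meta = pool->free;

	if (!meta)
		return NULL;

	pool->free = meta->next;
	meta->next = NULL;

	return meta;
}

/* Walk to the tail of the chain, then splice the chain onto the list. */
static void sketch_meta_put_chain(struct sketch_pool *pool,
				  struct sketch_meta *head)
{
	struct sketch_meta *tail = head;

	assert(head != NULL);

	while (tail->next)
		tail = tail->next;

	tail->next = pool->free;
	pool->free = head;
}
```

Returning the chain costs one walk to find the tail and two pointer
writes, regardless of how many entries are already on the free list.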

diff --git a/drivers/net/ethernet/alibaba/eea/eea_tx.c b/drivers/net/ethernet/alibaba/eea/eea_tx.c
index 5a8babfce5de..8b7fc75ead8a 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_tx.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_tx.c
@@ -11,6 +11,11 @@
 #include "eea_pci.h"
 #include "eea_ring.h"
 
+struct eea_sq_free_stats {
+	u64 packets;
+	u64 bytes;
+};
+
 struct eea_tx_meta {
 	struct eea_tx_meta *next;
 
@@ -26,23 +31,394 @@ struct eea_tx_meta {
 	dma_addr_t dma_addr;
 	struct eea_tx_desc *desc;
 	u32 dma_len;
+	bool unmap;
+	bool unmap_single;
 };
 
+static struct eea_tx_meta *eea_tx_meta_get(struct eea_net_tx *tx)
+{
+	struct eea_tx_meta *meta;
+
+	if (!tx->free)
+		return NULL;
+
+	meta = tx->free;
+	tx->free = meta->next;
+
+	return meta;
+}
+
+static void eea_tx_meta_put_and_unmap(struct eea_net_tx *tx,
+				      struct eea_tx_meta *meta)
+{
+	struct eea_tx_meta *head;
+
+	head = meta;
+
+	while (true) {
+		if (meta->unmap) {
+			if (meta->unmap_single)
+				dma_unmap_single(tx->dma_dev, meta->dma_addr,
+						 meta->dma_len, DMA_TO_DEVICE);
+			else
+				dma_unmap_page(tx->dma_dev, meta->dma_addr,
+					       meta->dma_len, DMA_TO_DEVICE);
+		}
+
+		if (meta->next) {
+			meta = meta->next;
+			continue;
+		}
+
+		break;
+	}
+
+	meta->next = tx->free;
+	tx->free = head;
+}
+
+static void eea_meta_free_xmit(struct eea_net_tx *tx,
+			       struct eea_tx_meta *meta,
+			       int budget,
+			       struct eea_tx_cdesc *desc,
+			       struct eea_sq_free_stats *stats)
+{
+	struct sk_buff *skb = meta->skb;
+
+	if (unlikely((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) && desc)) {
+		struct skb_shared_hwtstamps ts = {};
+
+		ts.hwtstamp = EEA_DESC_TS(desc) + tx->enet->hw_ts_offset;
+		skb_tstamp_tx(skb, &ts);
+	}
+
+	++stats->packets;
+	napi_consume_skb(meta->skb, budget);
+
+	meta->data = NULL;
+}
+
+static int eea_clean_tx(struct eea_net_tx *tx, int budget)
+{
+	struct eea_sq_free_stats stats = {0};
+	struct eea_tx_cdesc *desc;
+	struct eea_tx_meta *meta;
+	int desc_n;
+	u16 id;
+
+	while (stats.packets < budget) {
+		desc = eea_ering_cq_get_desc(tx->ering);
+		if (!desc)
+			break;
+
+		id = le16_to_cpu(desc->id);
+		if (unlikely(id >= tx->ering->num)) {
+			if (net_ratelimit())
+				netdev_err(tx->enet->netdev, "tx invalid id %d\n",
+					   id);
+			eea_ering_cq_ack_desc(tx->ering, 1);
+			continue;
+		}
+
+		meta = &tx->meta[id];
+
+		if (meta->data) {
+			eea_tx_meta_put_and_unmap(tx, meta);
+			eea_meta_free_xmit(tx, meta, budget, desc, &stats);
+			desc_n = meta->num;
+		} else {
+			if (net_ratelimit())
+				netdev_err(tx->enet->netdev,
+					   "tx meta->data is null. id %d num: %d\n",
+					   meta->id, meta->num);
+			desc_n = 1;
+		}
+
+		eea_ering_cq_ack_desc(tx->ering, desc_n);
+	}
+
+	return stats.packets;
+}
+
 int eea_poll_tx(struct eea_net_tx *tx, int budget)
 {
-	/* Empty function; will be implemented in a subsequent commit. */
-	return budget;
+	struct eea_net *enet = tx->enet;
+	u32 index = tx - enet->tx;
+	struct netdev_queue *txq;
+	int num;
+
+	txq = netdev_get_tx_queue(enet->netdev, index);
+
+	__netif_tx_lock(txq, smp_processor_id());
+
+	num = eea_clean_tx(tx, budget);
+
+	if (netif_tx_queue_stopped(txq) &&
+	    tx->ering->num_free >= MAX_SKB_FRAGS + 2)
+		netif_tx_wake_queue(txq);
+
+	__netif_tx_unlock(txq);
+
+	return num;
+}
+
+static int eea_fill_desc_from_skb(const struct sk_buff *skb,
+				  struct eea_tx_desc *desc)
+{
+	if (skb_is_gso(skb)) {
+		struct skb_shared_info *sinfo = skb_shinfo(skb);
+
+		desc->gso_size = cpu_to_le16(sinfo->gso_size);
+		if (sinfo->gso_type & SKB_GSO_TCPV4)
+			desc->gso_type = EEA_TX_GSO_TCPV4;
+
+		else if (sinfo->gso_type & SKB_GSO_TCPV6)
+			desc->gso_type = EEA_TX_GSO_TCPV6;
+
+		else if (sinfo->gso_type & SKB_GSO_UDP_L4)
+			desc->gso_type = EEA_TX_GSO_UDP_L4;
+
+		else
+			return -EINVAL;
+
+		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
+			desc->gso_type |= EEA_TX_GSO_ECN;
+	} else {
+		desc->gso_type = EEA_TX_GSO_NONE;
+	}
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		desc->csum_start = cpu_to_le16(skb_checksum_start_offset(skb));
+		desc->csum_offset = cpu_to_le16(skb->csum_offset);
+	}
+
+	return 0;
+}
+
+static struct eea_tx_meta *__eea_tx_desc_fill(struct eea_net_tx *tx,
+					      struct eea_tx_meta *head_meta,
+					      dma_addr_t addr, u32 data_len,
+					      u32 dma_len, bool last,
+					      void *data, u16 flags,
+					      bool unmap)
+{
+	struct eea_tx_meta *meta;
+	struct eea_tx_desc *desc;
+
+	meta = eea_tx_meta_get(tx);
+
+	desc = eea_ering_sq_alloc_desc(tx->ering, meta->id, last, flags);
+	desc->addr = cpu_to_le64(addr);
+	desc->len = cpu_to_le16(data_len);
+
+	meta->next     = NULL;
+	meta->dma_len  = dma_len;
+	meta->dma_addr = addr;
+	meta->data     = data;
+	meta->num      = 1;
+	meta->desc     = desc;
+	meta->unmap    = unmap;
+	meta->unmap_single = false;
+
+	if (head_meta) {
+		meta->next = head_meta->next;
+		head_meta->next = meta;
+		++head_meta->num;
+	}
+
+	return meta;
+}
+
+static struct eea_tx_meta *eea_tx_desc_fill(struct eea_net_tx *tx,
+					    struct eea_tx_meta *head_meta,
+					    dma_addr_t addr, u32 length,
+					    bool is_last, void *data, u16 flags)
+{
+	struct eea_tx_meta *meta;
+	u16 len, last;
+
+	WARN_ON_ONCE(length >= 2 * USHRT_MAX);
+
+	/* Since eea does not support BIG TCP, the maximum GSO size is capped at
+	 * 64KB. Consequently, a single skb buffer (head or fragment) will not
+	 * require more than two descriptors.
+	 */
+	if (length > USHRT_MAX) {
+		len = USHRT_MAX;
+		last = false;
+	} else {
+		len = length;
+		last = is_last;
+	}
+
+	meta = __eea_tx_desc_fill(tx, head_meta, addr, len, length,
+				  last, data, flags, true);
+
+	if (length > USHRT_MAX) {
+		if (!head_meta)
+			head_meta = meta;
+
+		addr += USHRT_MAX;
+		len = length - USHRT_MAX;
+
+		__eea_tx_desc_fill(tx, head_meta, addr, len, 0, is_last,
+				   NULL, 0, false);
+	}
+
+	return meta;
+}
+
+static int eea_tx_add_skb_frag(struct eea_net_tx *tx,
+			       struct eea_tx_meta *head_meta,
+			       const skb_frag_t *frag, bool is_last)
+{
+	u32 len = skb_frag_size(frag);
+	dma_addr_t addr;
+
+	addr = skb_frag_dma_map(tx->dma_dev, frag, 0, len, DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(tx->dma_dev, addr)))
+		return -ENOMEM;
+
+	eea_tx_desc_fill(tx, head_meta, addr, len, is_last, NULL, 0);
+
+	return 0;
+}
+
+static int eea_tx_post_skb(struct eea_net_tx *tx, struct sk_buff *skb)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	u32 hlen = skb_headlen(skb);
+	struct eea_tx_meta *meta;
+	const skb_frag_t *frag;
+	dma_addr_t addr;
+	u32 len = hlen;
+	int i, err;
+	u16 flags;
+	bool last;
+
+	if (len) {
+		addr = dma_map_single(tx->dma_dev, skb->data, len,
+				      DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx->dma_dev, addr)))
+			return -ENOMEM;
+
+		last = !shinfo->nr_frags;
+		i = 0;
+	} else {
+		/* The net stack will never submit an skb with an skb->len of
+		 * 0. If the head len is 0, the number of frags must be greater
+		 * than 0.
+		 */
+		frag = &shinfo->frags[0];
+		len = skb_frag_size(frag);
+
+		addr = skb_frag_dma_map(tx->dma_dev, frag, 0, len,
+					DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx->dma_dev, addr)))
+			return -ENOMEM;
+
+		last = shinfo->nr_frags == 1;
+		i = 1;
+	}
+
+	flags = skb->ip_summed == CHECKSUM_PARTIAL ? EEA_DESC_F_DO_CSUM : 0;
+
+	meta = eea_tx_desc_fill(tx, NULL, addr, len, last, skb, flags);
+	meta->unmap_single = !!hlen;
+
+	err = eea_fill_desc_from_skb(skb, meta->desc);
+	if (err)
+		goto err_cancel;
+
+	for (; i < shinfo->nr_frags; i++) {
+		frag = &shinfo->frags[i];
+		bool is_last = i == (shinfo->nr_frags - 1);
+
+		err = eea_tx_add_skb_frag(tx, meta, frag, is_last);
+		if (err)
+			goto err_cancel;
+	}
+
+	eea_ering_sq_commit_desc(tx->ering);
+
+	return 0;
+
+err_cancel:
+	eea_ering_sq_cancel(tx->ering);
+	eea_tx_meta_put_and_unmap(tx, meta);
+	meta->data = NULL;
+	return err;
+}
+
+static void eea_tx_kick(struct eea_net_tx *tx)
+{
+	eea_ering_kick(tx->ering);
+}
+
+static int eea_tx_check_free_num(struct eea_net_tx *tx,
+				 struct netdev_queue *txq)
+{
+	int n;
+
+	/* MAX_SKB_FRAGS + 1: Covers the skb linear head and all paged fragments
+	 * 1: Extra slot for a head or fragment that exceeds 64KB.
+	 */
+	n = MAX_SKB_FRAGS + 2;
+	return netif_txq_maybe_stop(txq, tx->ering->num_free, n, n);
 }
 
 netdev_tx_t eea_tx_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
-	/* Empty function; will be implemented in a subsequent commit. */
-	dev_kfree_skb_any(skb);
+	struct eea_net *enet = netdev_priv(netdev);
+	int qnum = skb_get_queue_mapping(skb);
+	struct eea_net_tx *tx = &enet->tx[qnum];
+	struct netdev_queue *txq;
+	int err, enable;
+
+	txq = netdev_get_tx_queue(netdev, qnum);
+
+	enable = eea_tx_check_free_num(tx, txq);
+	if (!enable)
+		return NETDEV_TX_BUSY;
+
+	err = eea_tx_post_skb(tx, skb);
+	if (unlikely(err)) {
+		dev_kfree_skb_any(skb);
+	} else {
+		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+			skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+		skb_tx_timestamp(skb);
+	}
+
+	/* NETDEV_TX_BUSY is expensive, so stop the TX queue in advance. */
+	eea_tx_check_free_num(tx, txq);
+
+	if (!netdev_xmit_more() || netif_xmit_stopped(txq))
+		eea_tx_kick(tx);
+
 	return NETDEV_TX_OK;
 }
 
 static void eea_free_meta(struct eea_net_tx *tx, struct eea_net_cfg *cfg)
 {
+	struct eea_sq_free_stats stats = {0};
+	struct eea_tx_meta *meta;
+	int i;
+
+	while ((meta = eea_tx_meta_get(tx)))
+		meta->skb = NULL;
+
+	for (i = 0; i < cfg->tx_ring_depth; i++) {
+		meta = &tx->meta[i];
+
+		if (!meta->skb)
+			continue;
+
+		eea_tx_meta_put_and_unmap(tx, meta);
+
+		eea_meta_free_xmit(tx, meta, 0, NULL, &stats);
+	}
+
 	kvfree(tx->meta);
 	tx->meta = NULL;
 }
-- 
2.32.0.3.g01195cf9f
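
A reader's note on the 64KB split in eea_tx_desc_fill(): the sketch below
mirrors the arithmetic in plain userspace C. It assumes only what the patch
shows, namely a 16-bit descriptor length and buffers shorter than
2 * USHRT_MAX. struct demo_desc and demo_split() are hypothetical names for
illustration, not driver code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TX descriptor; not the driver's layout. */
struct demo_desc {
	uint64_t addr;
	uint16_t len;	/* 16-bit length, so one descriptor holds <= USHRT_MAX */
	int last;	/* nonzero on the final descriptor of the packet */
};

/*
 * Describe one mapped buffer with one or two descriptors.
 * Returns the number of descriptors written, or 0 when the buffer is too
 * long to fit in two (the driver only warns in that case).
 */
static int demo_split(uint64_t addr, uint32_t length, int is_last,
		      struct demo_desc out[2])
{
	assert(out != NULL);

	if (length >= 2u * USHRT_MAX)
		return 0;

	if (length <= USHRT_MAX) {
		out[0] = (struct demo_desc){ addr, (uint16_t)length, is_last };
		return 1;
	}

	/* The first part is never "last"; the remainder inherits is_last. */
	out[0] = (struct demo_desc){ addr, USHRT_MAX, 0 };
	out[1] = (struct demo_desc){ addr + USHRT_MAX,
				     (uint16_t)(length - USHRT_MAX), is_last };
	return 2;
}
```

With USHRT_MAX equal to 65535, a 70000-byte buffer becomes descriptors of
65535 and 4465 bytes, which is the case the extra slot in
eea_tx_check_free_num() is reserved for.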



* [PATCH net-next v43 7/8] eea: introduce ethtool support
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (5 preceding siblings ...)
  2026-05-14  9:51 ` [PATCH net-next v43 6/8] eea: implement packet transmit logic Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  2:56   ` Xuan Zhuo
  2026-05-14  9:51 ` [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev Xuan Zhuo
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li, Andrew Lunn

Add a basic driver framework for the Alibaba Elastic Ethernet Adapter (EEA).

This commit introduces ethtool support.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/ethernet/alibaba/eea/Makefile     |   1 +
 .../net/ethernet/alibaba/eea/eea_ethtool.c    | 273 ++++++++++++++++++
 .../net/ethernet/alibaba/eea/eea_ethtool.h    |  48 +++
 drivers/net/ethernet/alibaba/eea/eea_net.c    |   1 +
 drivers/net/ethernet/alibaba/eea/eea_net.h    |   5 +
 drivers/net/ethernet/alibaba/eea/eea_rx.c     |  32 +-
 drivers/net/ethernet/alibaba/eea/eea_tx.c     |  22 ++
 7 files changed, 377 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ethtool.c
 create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ethtool.h
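
A reader's note on the stats table in eea_ethtool.c: the name/offset pairs
built with offsetof() let one loop export any subset of counters. The sketch
below shows the same idea in userspace C. All demo_* names are hypothetical,
plain uint64_t stands in for u64_stats_t, and the u64_stats_sync retry loop
is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-queue counters; the driver uses u64_stats_t fields. */
struct demo_stats {
	uint64_t descs;
	uint64_t packets;
	uint64_t kicks;
};

struct demo_stat_desc {
	const char *name;
	size_t offset;
};

#define DEMO_STAT(m)	{ #m, offsetof(struct demo_stats, m) }

/* Only the counters listed here are exported, in this order. */
static const struct demo_stat_desc demo_desc[] = {
	DEMO_STAT(descs),
	DEMO_STAT(kicks),
};

#define DEMO_STATS_LEN	(sizeof(demo_desc) / sizeof(demo_desc[0]))

/* Copy the exported counters of one queue into data[idx..]; return new idx. */
static size_t demo_fill(const struct demo_stats *s, uint64_t *data, size_t idx)
{
	const unsigned char *base = (const unsigned char *)s;
	size_t i;

	for (i = 0; i < DEMO_STATS_LEN; i++) {
		uint64_t v;

		memcpy(&v, base + demo_desc[i].offset, sizeof(v));
		data[idx + i] = v;
	}

	return idx + DEMO_STATS_LEN;
}
```

Calling demo_fill() once per queue and carrying idx forward gives the same
flat layout that eea_get_strings() names as rx0_descs, rx0_kicks, and so on.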

diff --git a/drivers/net/ethernet/alibaba/eea/Makefile b/drivers/net/ethernet/alibaba/eea/Makefile
index 5f0961002e57..c38db22cca34 100644
--- a/drivers/net/ethernet/alibaba/eea/Makefile
+++ b/drivers/net/ethernet/alibaba/eea/Makefile
@@ -4,5 +4,6 @@ eea-y := eea_ring.o \
 	eea_net.o \
 	eea_pci.o \
 	eea_adminq.o \
+	eea_ethtool.o \
 	eea_tx.o \
 	eea_rx.o
diff --git a/drivers/net/ethernet/alibaba/eea/eea_ethtool.c b/drivers/net/ethernet/alibaba/eea/eea_ethtool.c
new file mode 100644
index 000000000000..479779aa2dd8
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_ethtool.c
@@ -0,0 +1,273 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#include <linux/ethtool.h>
+#include <linux/ethtool_netlink.h>
+#include <linux/rtnetlink.h>
+
+#include "eea_adminq.h"
+#include "eea_net.h"
+#include "eea_pci.h"
+
+struct eea_stat_desc {
+	char desc[ETH_GSTRING_LEN];
+	size_t offset;
+};
+
+#define EEA_TX_STAT(m)	{#m, offsetof(struct eea_tx_stats, m)}
+#define EEA_RX_STAT(m)	{#m, offsetof(struct eea_rx_stats, m)}
+
+static const struct eea_stat_desc eea_rx_stats_desc[] = {
+	EEA_RX_STAT(descs),
+	EEA_RX_STAT(kicks),
+};
+
+static const struct eea_stat_desc eea_tx_stats_desc[] = {
+	EEA_TX_STAT(descs),
+	EEA_TX_STAT(kicks),
+};
+
+#define EEA_TX_STATS_LEN	ARRAY_SIZE(eea_tx_stats_desc)
+#define EEA_RX_STATS_LEN	ARRAY_SIZE(eea_rx_stats_desc)
+
+static void eea_get_drvinfo(struct net_device *netdev,
+			    struct ethtool_drvinfo *info)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	struct eea_device *edev = enet->edev;
+
+	strscpy(info->driver,   KBUILD_MODNAME,     sizeof(info->driver));
+	strscpy(info->bus_info, eea_pci_name(edev), sizeof(info->bus_info));
+}
+
+static void eea_get_ringparam(struct net_device *netdev,
+			      struct ethtool_ringparam *ring,
+			      struct kernel_ethtool_ringparam *kernel_ring,
+			      struct netlink_ext_ack *extack)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+
+	ring->rx_max_pending = enet->cfg_hw.rx_ring_depth;
+	ring->tx_max_pending = enet->cfg_hw.tx_ring_depth;
+	ring->rx_pending = enet->cfg.rx_ring_depth;
+	ring->tx_pending = enet->cfg.tx_ring_depth;
+
+	kernel_ring->tcp_data_split = enet->cfg.split_hdr ?
+				      ETHTOOL_TCP_DATA_SPLIT_ENABLED :
+				      ETHTOOL_TCP_DATA_SPLIT_DISABLED;
+}
+
+static int eea_set_ringparam(struct net_device *netdev,
+			     struct ethtool_ringparam *ring,
+			     struct kernel_ethtool_ringparam *kernel_ring,
+			     struct netlink_ext_ack *extack)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	struct eea_net_init_ctx ctx;
+	bool need_update = false;
+	struct eea_net_cfg *cfg;
+	bool sh;
+
+	if (ring->rx_pending < EEA_NET_IO_RING_DEPTH_MIN ||
+	    ring->tx_pending < EEA_NET_IO_RING_DEPTH_MIN)
+		return -EINVAL;
+
+	if (!is_power_of_2(ring->rx_pending) ||
+	    !is_power_of_2(ring->tx_pending))
+		return -EINVAL;
+
+	eea_init_ctx(enet, &ctx);
+
+	cfg = &ctx.cfg;
+
+	if (ring->rx_pending != cfg->rx_ring_depth)
+		need_update = true;
+
+	if (ring->tx_pending != cfg->tx_ring_depth)
+		need_update = true;
+
+	sh = false;
+
+	switch (kernel_ring->tcp_data_split) {
+	case ETHTOOL_TCP_DATA_SPLIT_ENABLED:
+		sh = true;
+		break;
+
+	case ETHTOOL_TCP_DATA_SPLIT_DISABLED:
+		sh = false;
+		break;
+
+	case ETHTOOL_TCP_DATA_SPLIT_UNKNOWN:
+		sh = !!cfg->split_hdr;
+		break;
+	}
+
+	if (sh != !!(cfg->split_hdr))
+		need_update = true;
+
+	if (!need_update)
+		return 0;
+
+	cfg->rx_ring_depth = ring->rx_pending;
+	cfg->tx_ring_depth = ring->tx_pending;
+
+	/* By default, enet->cfg_hw.split_hdr is 128. */
+	cfg->split_hdr = sh ? enet->cfg_hw.split_hdr : 0;
+
+	return eea_reset_hw_resources(enet, &ctx);
+}
+
+static int eea_set_channels(struct net_device *netdev,
+			    struct ethtool_channels *channels)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	u16 queue_pairs = channels->combined_count;
+	struct eea_net_init_ctx ctx;
+	struct eea_net_cfg *cfg;
+
+	eea_init_ctx(enet, &ctx);
+
+	cfg = &ctx.cfg;
+
+	cfg->rx_ring_num = queue_pairs;
+	cfg->tx_ring_num = queue_pairs;
+
+	return eea_reset_hw_resources(enet, &ctx);
+}
+
+static void eea_get_channels(struct net_device *netdev,
+			     struct ethtool_channels *channels)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+
+	channels->combined_count = enet->cfg.rx_ring_num;
+	channels->max_combined   = enet->cfg_hw.rx_ring_num;
+}
+
+static void eea_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	u8 *p = data;
+	u32 i, j;
+
+	if (stringset != ETH_SS_STATS)
+		return;
+
+	for (i = 0; i < enet->cfg.rx_ring_num; i++) {
+		for (j = 0; j < EEA_RX_STATS_LEN; j++)
+			ethtool_sprintf(&p, "rx%u_%s", i,
+					eea_rx_stats_desc[j].desc);
+	}
+
+	for (i = 0; i < enet->cfg.tx_ring_num; i++) {
+		for (j = 0; j < EEA_TX_STATS_LEN; j++)
+			ethtool_sprintf(&p, "tx%u_%s", i,
+					eea_tx_stats_desc[j].desc);
+	}
+}
+
+static int eea_get_sset_count(struct net_device *netdev, int sset)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+
+	if (sset != ETH_SS_STATS)
+		return -EOPNOTSUPP;
+
+	return enet->cfg.rx_ring_num * EEA_RX_STATS_LEN +
+		enet->cfg.tx_ring_num * EEA_TX_STATS_LEN;
+}
+
+static void eea_stats_fill_for_q(struct u64_stats_sync *syncp, u32 num,
+				 const struct eea_stat_desc *desc,
+				 u64 *data, u32 idx)
+{
+	void *stats_base = syncp;
+	u32 start, i;
+
+	do {
+		start = u64_stats_fetch_begin(syncp);
+		for (i = 0; i < num; i++)
+			data[idx + i] =
+				u64_stats_read(stats_base + desc[i].offset);
+
+	} while (u64_stats_fetch_retry(syncp, start));
+
+	BUILD_BUG_ON(offsetof(struct eea_tx_stats, syncp));
+	BUILD_BUG_ON(offsetof(struct eea_rx_stats, syncp));
+}
+
+static void eea_get_ethtool_stats(struct net_device *netdev,
+				  struct ethtool_stats *stats, u64 *data)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	u32 i, idx = 0;
+
+	ASSERT_RTNL();
+
+	if (enet->rx) {
+		for (i = 0; i < enet->cfg.rx_ring_num; i++) {
+			struct eea_net_rx *rx = enet->rx[i];
+
+			eea_stats_fill_for_q(&rx->stats.syncp, EEA_RX_STATS_LEN,
+					     eea_rx_stats_desc, data, idx);
+
+			idx += EEA_RX_STATS_LEN;
+		}
+	}
+
+	if (enet->tx) {
+		for (i = 0; i < enet->cfg.tx_ring_num; i++) {
+			struct eea_net_tx *tx = &enet->tx[i];
+
+			eea_stats_fill_for_q(&tx->stats.syncp, EEA_TX_STATS_LEN,
+					     eea_tx_stats_desc, data, idx);
+
+			idx += EEA_TX_STATS_LEN;
+		}
+	}
+}
+
+void eea_update_rx_stats(struct eea_rx_stats *rx_stats,
+			 struct eea_rx_ctx_stats *stats)
+{
+	u64_stats_update_begin(&rx_stats->syncp);
+	u64_stats_add(&rx_stats->descs,             stats->descs);
+	u64_stats_add(&rx_stats->packets,           stats->packets);
+	u64_stats_add(&rx_stats->bytes,             stats->bytes);
+	u64_stats_add(&rx_stats->drops,             stats->drops);
+	u64_stats_add(&rx_stats->split_hdr_bytes,   stats->split_hdr_bytes);
+	u64_stats_add(&rx_stats->split_hdr_packets, stats->split_hdr_packets);
+	u64_stats_add(&rx_stats->length_errors,     stats->length_errors);
+	u64_stats_add(&rx_stats->kicks,             stats->kicks);
+	u64_stats_update_end(&rx_stats->syncp);
+}
+
+static int eea_get_link_ksettings(struct net_device *netdev,
+				  struct ethtool_link_ksettings *cmd)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+
+	cmd->base.speed  = enet->speed;
+	cmd->base.duplex = enet->duplex;
+	cmd->base.port   = PORT_OTHER;
+
+	return 0;
+}
+
+const struct ethtool_ops eea_ethtool_ops = {
+	.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
+	.get_drvinfo        = eea_get_drvinfo,
+	.get_link           = ethtool_op_get_link,
+	.get_ringparam      = eea_get_ringparam,
+	.set_ringparam      = eea_set_ringparam,
+	.set_channels       = eea_set_channels,
+	.get_channels       = eea_get_channels,
+	.get_strings        = eea_get_strings,
+	.get_sset_count     = eea_get_sset_count,
+	.get_ethtool_stats  = eea_get_ethtool_stats,
+	.get_link_ksettings = eea_get_link_ksettings,
+};
diff --git a/drivers/net/ethernet/alibaba/eea/eea_ethtool.h b/drivers/net/ethernet/alibaba/eea/eea_ethtool.h
new file mode 100644
index 000000000000..08c824a116de
--- /dev/null
+++ b/drivers/net/ethernet/alibaba/eea/eea_ethtool.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Alibaba Elastic Ethernet Adapter.
+ *
+ * Copyright (C) 2025 Alibaba Inc.
+ */
+
+#ifndef __EEA_ETHTOOL_H__
+#define __EEA_ETHTOOL_H__
+
+struct eea_tx_stats {
+	struct u64_stats_sync syncp;
+	u64_stats_t descs;
+	u64_stats_t packets;
+	u64_stats_t bytes;
+	u64_stats_t drops;
+	u64_stats_t kicks;
+};
+
+struct eea_rx_ctx_stats {
+	u64 descs;
+	u64 packets;
+	u64 bytes;
+	u64 drops;
+	u64 split_hdr_bytes;
+	u64 split_hdr_packets;
+	u64 kicks;
+	u64 length_errors;
+};
+
+struct eea_rx_stats {
+	struct u64_stats_sync syncp;
+	u64_stats_t descs;
+	u64_stats_t packets;
+	u64_stats_t bytes;
+	u64_stats_t drops;
+	u64_stats_t kicks;
+	u64_stats_t split_hdr_bytes;
+	u64_stats_t split_hdr_packets;
+
+	u64_stats_t length_errors;
+};
+
+void eea_update_rx_stats(struct eea_rx_stats *rx_stats,
+			 struct eea_rx_ctx_stats *stats);
+
+extern const struct ethtool_ops eea_ethtool_ops;
+#endif
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
index bf8581bc27ed..f4505eed8bdd 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
@@ -624,6 +624,7 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
 	}
 
 	netdev->netdev_ops = &eea_netdev;
+	netdev->ethtool_ops = &eea_ethtool_ops;
 	SET_NETDEV_DEV(netdev, edev->dma_dev);
 
 	enet = netdev_priv(netdev);
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
index 1ff9ef1a8fbe..6a5b2f908c76 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.h
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
@@ -12,6 +12,7 @@
 #include <linux/netdevice.h>
 
 #include "eea_adminq.h"
+#include "eea_ethtool.h"
 #include "eea_ring.h"
 
 #define EEA_VER_MAJOR		1
@@ -33,6 +34,8 @@ struct eea_net_tx {
 	u32 index;
 
 	char name[16];
+
+	struct eea_tx_stats stats;
 };
 
 struct eea_rx_meta {
@@ -87,6 +90,8 @@ struct eea_net_rx {
 
 	struct napi_struct *napi;
 
+	struct eea_rx_stats stats;
+
 	char name[16];
 
 	struct eea_net_rx_pkt_ctx pkt;
diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
index 9c71a7cf950e..a3f0d2a79ad8 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
@@ -29,6 +29,8 @@ struct eea_rx_ctx {
 	bool more;
 
 	struct eea_rx_meta *meta;
+
+	struct eea_rx_ctx_stats stats;
 };
 
 static struct eea_rx_meta *eea_rx_meta_get(struct eea_net_rx *rx)
@@ -232,6 +234,7 @@ static int eea_harden_check_overflow(struct eea_rx_ctx *ctx,
 	if (unlikely(ctx->len > max_len)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %u\n",
 			 enet->netdev->name, ctx->len, max_len);
+		++ctx->stats.length_errors;
 		return -EINVAL;
 	}
 
@@ -250,6 +253,7 @@ static int eea_harden_check_size(struct eea_rx_ctx *ctx, struct eea_net *enet)
 		if (unlikely(ctx->hdr_len < ETH_HLEN)) {
 			pr_debug("%s: short hdr %u\n", enet->netdev->name,
 				 ctx->hdr_len);
+			++ctx->stats.length_errors;
 			return -EINVAL;
 		}
 
@@ -257,6 +261,7 @@ static int eea_harden_check_size(struct eea_rx_ctx *ctx, struct eea_net *enet)
 			pr_debug("%s: rx error: hdr len %u exceeds hdr buffer size %u\n",
 				 enet->netdev->name, ctx->hdr_len,
 				 enet->cfg.split_hdr);
+			++ctx->stats.length_errors;
 			return -EINVAL;
 		}
 
@@ -265,6 +270,7 @@ static int eea_harden_check_size(struct eea_rx_ctx *ctx, struct eea_net *enet)
 
 	if (unlikely(ctx->len < ETH_HLEN)) {
 		pr_debug("%s: short packet %u\n", enet->netdev->name, ctx->len);
+		++ctx->stats.length_errors;
 		return -EINVAL;
 	}
 
@@ -373,6 +379,7 @@ static void process_remain_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
 
 err:
 	dev_kfree_skb(rx->pkt.head_skb);
+	++ctx->stats.drops;
 	rx->pkt.do_drop = true;
 	rx->pkt.head_skb = NULL;
 }
@@ -400,6 +407,7 @@ static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
 	return;
 
 err:
+	++ctx->stats.drops;
 	rx->pkt.do_drop = true;
 }
 
@@ -460,6 +468,8 @@ static int eea_rx_desc_to_ctx(struct eea_net_rx *rx,
 	if (ctx->flags & EEA_DESC_F_SPLIT_HDR) {
 		ctx->hdr_len = le16_to_cpu(desc->len_ex) &
 			EEA_RX_CDESC_HDR_LEN_MASK;
+		ctx->stats.split_hdr_bytes += ctx->hdr_len;
+		++ctx->stats.split_hdr_packets;
 	}
 
 	ctx->more = ctx->flags & EEA_RING_DESC_F_MORE;
@@ -484,8 +494,10 @@ static int eea_cleanrx(struct eea_net_rx *rx, int budget,
 			if (ctx->meta)
 				eea_rx_meta_put(rx, ctx->meta);
 
-			if (rx->pkt.head_skb)
+			if (rx->pkt.head_skb) {
 				dev_kfree_skb(rx->pkt.head_skb);
+				++ctx->stats.drops;
+			}
 
 			/* A hardware error occurred; we are attempting to
 			 * mitigate the impact. Subsequent packets may be
@@ -512,13 +524,17 @@ static int eea_cleanrx(struct eea_net_rx *rx, int budget,
 
 		++rx->pkt.idx;
 
-		if (!ctx->more && rx->pkt.head_skb)
+		if (!ctx->more && rx->pkt.head_skb) {
 			eea_submit_skb(rx, rx->pkt.head_skb, desc);
+			ctx->stats.bytes += rx->pkt.recv_len;
+			++ctx->stats.packets;
+		}
 
 skip:
 		eea_rx_meta_put(rx, meta);
 ack:
 		eea_ering_cq_ack_desc(rx->ering, 1);
+		++ctx->stats.descs;
 
 		if (!ctx->more) {
 			memset(&rx->pkt, 0, sizeof(rx->pkt));
@@ -537,7 +553,7 @@ static void eea_rx_dma_sync_hdr(struct eea_net_rx *rx, dma_addr_t addr)
 }
 
 /* Only be called from napi. */
-static void eea_rx_post(struct eea_net_rx *rx)
+static void eea_rx_post(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
 {
 	u32 tailroom, headroom, room, len;
 	struct eea_rx_meta *meta;
@@ -586,8 +602,10 @@ static void eea_rx_post(struct eea_net_rx *rx)
 		++num;
 	}
 
-	if (num)
+	if (num) {
 		eea_ering_kick(rx->ering);
+		++ctx->stats.kicks;
+	}
 }
 
 static int eea_poll(struct napi_struct *napi, int budget)
@@ -608,12 +626,14 @@ static int eea_poll(struct napi_struct *napi, int budget)
 		 * buffers are exhausted. Therefore, we should proactively
 		 * pre-fill the buffers to avoid starvation.
 		 */
-		eea_rx_post(rx);
+		eea_rx_post(rx, &ctx);
 
 		if (rx->ering->num - rx->ering->num_free < budget)
 			busy = true;
 	}
 
+	eea_update_rx_stats(&rx->stats, &ctx.stats);
+
 	busy |= received >= budget;
 
 	if (busy)
@@ -751,6 +771,8 @@ struct eea_net_rx *eea_alloc_rx(struct eea_net_init_ctx *ctx, u32 idx)
 	rx->index = idx;
 	snprintf(rx->name, sizeof(rx->name), "rx.%u", idx);
 
+	u64_stats_init(&rx->stats.syncp);
+
 	/* ering */
 	ering = eea_ering_alloc(idx * 2, ctx->cfg.rx_ring_depth, ctx->edev,
 				ctx->cfg.rx_sq_desc_size,
diff --git a/drivers/net/ethernet/alibaba/eea/eea_tx.c b/drivers/net/ethernet/alibaba/eea/eea_tx.c
index 8b7fc75ead8a..85fb0e9ca5ba 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_tx.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_tx.c
@@ -93,6 +93,7 @@ static void eea_meta_free_xmit(struct eea_net_tx *tx,
 	}
 
 	++stats->packets;
+	stats->bytes += meta->skb->len;
 	napi_consume_skb(meta->skb, budget);
 
 	meta->data = NULL;
@@ -137,6 +138,13 @@ static int eea_clean_tx(struct eea_net_tx *tx, int budget)
 		eea_ering_cq_ack_desc(tx->ering, desc_n);
 	}
 
+	if (stats.packets) {
+		u64_stats_update_begin(&tx->stats.syncp);
+		u64_stats_add(&tx->stats.bytes, stats.bytes);
+		u64_stats_add(&tx->stats.packets, stats.packets);
+		u64_stats_update_end(&tx->stats.syncp);
+	}
+
 	return stats.packets;
 }
 
@@ -341,6 +349,10 @@ static int eea_tx_post_skb(struct eea_net_tx *tx, struct sk_buff *skb)
 
 	eea_ering_sq_commit_desc(tx->ering);
 
+	u64_stats_update_begin(&tx->stats.syncp);
+	u64_stats_add(&tx->stats.descs, meta->num);
+	u64_stats_update_end(&tx->stats.syncp);
+
 	return 0;
 
 err_cancel:
@@ -353,6 +365,10 @@ static int eea_tx_post_skb(struct eea_net_tx *tx, struct sk_buff *skb)
 static void eea_tx_kick(struct eea_net_tx *tx)
 {
 	eea_ering_kick(tx->ering);
+
+	u64_stats_update_begin(&tx->stats.syncp);
+	u64_stats_inc(&tx->stats.kicks);
+	u64_stats_update_end(&tx->stats.syncp);
 }
 
 static int eea_tx_check_free_num(struct eea_net_tx *tx,
@@ -383,6 +399,10 @@ netdev_tx_t eea_tx_xmit(struct sk_buff *skb, struct net_device *netdev)
 
 	err = eea_tx_post_skb(tx, skb);
 	if (unlikely(err)) {
+		u64_stats_update_begin(&tx->stats.syncp);
+		u64_stats_inc(&tx->stats.drops);
+		u64_stats_update_end(&tx->stats.syncp);
+
 		dev_kfree_skb_any(skb);
 	} else {
 		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
@@ -444,6 +464,8 @@ int eea_alloc_tx(struct eea_net_init_ctx *ctx, struct eea_net_tx *tx, u32 idx)
 	struct eea_ring *ering;
 	u32 i;
 
+	u64_stats_init(&tx->stats.syncp);
+
 	snprintf(tx->name, sizeof(tx->name), "tx.%u", idx);
 
 	ering = eea_ering_alloc(idx * 2 + 1, ctx->cfg.tx_ring_depth, ctx->edev,
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (6 preceding siblings ...)
  2026-05-14  9:51 ` [PATCH net-next v43 7/8] eea: introduce ethtool support Xuan Zhuo
@ 2026-05-14  9:51 ` Xuan Zhuo
  2026-05-18  3:01   ` Xuan Zhuo
  2026-05-15  6:40 ` [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-14  9:51 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xuan Zhuo, Wen Gu, Philo Lu, Vadim Fedorenko,
	Dong Yibo, Mingyu Wang, Heiner Kallweit, Dust Li

This commit adds support for ndo_get_stats64 to provide accurate
interface statistics. With the TX and RX data paths now fully functional,
it is appropriate to register the netdevice and expose the interface to
userspace.

Register the network device via register_netdev(), and add the matching
unregister_netdev() call to the remove path and dev_close() call to the
shutdown path so that teardown stays in step with registration.

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/ethernet/alibaba/eea/eea_net.c | 83 ++++++++++++++++++++--
 drivers/net/ethernet/alibaba/eea/eea_net.h |  5 ++
 2 files changed, 84 insertions(+), 4 deletions(-)
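
A reader's note on how eea_stats() folds per-queue counters into the
interface totals: the sketch below shows the RX half in userspace C. All
demo_* names are hypothetical, plain uint64_t stands in for u64_stats_t, and
the stats_lock and u64_stats retry loop used by the patch are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue RX counters. */
struct demo_rx_q {
	uint64_t packets;
	uint64_t bytes;
	uint64_t drops;
	uint64_t length_errors;
};

/* Hypothetical subset of the interface totals. */
struct demo_totals {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t rx_dropped;
	uint64_t rx_length_errors;
	uint64_t rx_errors;
};

/* Add every queue's counters to tot; a NULL queue array adds nothing. */
static void demo_sum_rx(struct demo_rx_q *const *rx, size_t n,
			struct demo_totals *tot)
{
	size_t i;

	if (!rx)
		return;

	for (i = 0; i < n; i++) {
		const struct demo_rx_q *q = rx[i];

		tot->rx_packets       += q->packets;
		tot->rx_bytes         += q->bytes;
		tot->rx_dropped       += q->drops;
		/* Length errors are counted in both fields, as in the patch. */
		tot->rx_length_errors += q->length_errors;
		tot->rx_errors        += q->length_errors;
	}
}
```

The NULL check corresponds to the window in which the queues are unbound
during a reset; in that window the totals simply receive no contribution.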

diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
index f4505eed8bdd..63e68580de94 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.c
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
@@ -112,6 +112,11 @@ static void eea_bind_q_and_cfg(struct eea_net *enet,
 	struct eea_net_tx *tx;
 	int i;
 
+	/* Since 'ndo_get_stats64' is not called in softirq context, there is no
+	 * need to use 'spin_lock_bh'.
+	 */
+	spin_lock(&enet->stats_lock);
+
 	enet->cfg = ctx->cfg;
 	enet->rx = ctx->rx;
 	enet->tx = ctx->tx;
@@ -131,6 +136,8 @@ static void eea_bind_q_and_cfg(struct eea_net *enet,
 
 		blk->rx = rx;
 	}
+
+	spin_unlock(&enet->stats_lock);
 }
 
 static void eea_unbind_q_and_cfg(struct eea_net *enet,
@@ -140,6 +147,8 @@ static void eea_unbind_q_and_cfg(struct eea_net *enet,
 	struct eea_net_rx *rx;
 	int i;
 
+	spin_lock(&enet->stats_lock);
+
 	ctx->cfg = enet->cfg;
 	ctx->rx = enet->rx;
 	ctx->tx = enet->tx;
@@ -156,6 +165,8 @@ static void eea_unbind_q_and_cfg(struct eea_net *enet,
 
 		blk->rx = NULL;
 	}
+
+	spin_unlock(&enet->stats_lock);
 }
 
 static void eea_free_rxtx_q_mem(struct eea_net_init_ctx *ctx)
@@ -342,6 +353,60 @@ static int eea_netdev_open(struct net_device *netdev)
 	return err;
 }
 
+/* Statistics may be reset to zero upon device reset. This is expected behavior
+ * for now and will be addressed in the future.
+ */
+static void eea_stats(struct net_device *netdev, struct rtnl_link_stats64 *tot)
+{
+	struct eea_net *enet = netdev_priv(netdev);
+	u64 packets, bytes, drop, lerr;
+	u32 start;
+	int i;
+
+	spin_lock(&enet->stats_lock);
+
+	if (enet->rx) {
+		for (i = 0; i < enet->cfg.rx_ring_num; i++) {
+			struct eea_net_rx *rx = enet->rx[i];
+
+			do {
+				start = u64_stats_fetch_begin(&rx->stats.syncp);
+				packets = u64_stats_read(&rx->stats.packets);
+				bytes = u64_stats_read(&rx->stats.bytes);
+				drop = u64_stats_read(&rx->stats.drops);
+				lerr = u64_stats_read(&rx->stats.length_errors);
+			} while (u64_stats_fetch_retry(&rx->stats.syncp,
+						       start));
+
+			tot->rx_packets       += packets;
+			tot->rx_bytes         += bytes;
+			tot->rx_dropped       += drop;
+			tot->rx_length_errors += lerr;
+			tot->rx_errors        += lerr;
+		}
+	}
+
+	if (enet->tx) {
+		for (i = 0; i < enet->cfg.tx_ring_num; i++) {
+			struct eea_net_tx *tx = &enet->tx[i];
+
+			do {
+				start = u64_stats_fetch_begin(&tx->stats.syncp);
+				packets = u64_stats_read(&tx->stats.packets);
+				bytes = u64_stats_read(&tx->stats.bytes);
+				drop = u64_stats_read(&tx->stats.drops);
+			} while (u64_stats_fetch_retry(&tx->stats.syncp,
+						       start));
+
+			tot->tx_packets += packets;
+			tot->tx_bytes   += bytes;
+			tot->tx_dropped += drop;
+		}
+	}
+
+	spin_unlock(&enet->stats_lock);
+}
+
 /* resources: ring, buffers, irq */
 int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx)
 {
@@ -349,7 +414,9 @@ int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx)
 	int err, error;
 
 	if (!netif_running(enet->netdev) || !enet->started) {
+		spin_lock(&enet->stats_lock);
 		enet->cfg = ctx->cfg;
+		spin_unlock(&enet->stats_lock);
 		return 0;
 	}
 
@@ -607,6 +674,7 @@ static const struct net_device_ops eea_netdev = {
 	.ndo_stop           = eea_netdev_stop,
 	.ndo_start_xmit     = eea_tx_xmit,
 	.ndo_validate_addr  = eth_validate_addr,
+	.ndo_get_stats64    = eea_stats,
 	.ndo_features_check = passthru_features_check,
 };
 
@@ -640,6 +708,8 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
 		return NULL;
 	}
 
+	spin_lock_init(&enet->stats_lock);
+
 	return enet;
 }
 
@@ -725,11 +795,13 @@ int eea_net_probe(struct eea_device *edev)
 
 	eea_update_ts_off(edev, enet);
 
-	netdev_dbg(enet->netdev, "eea probe success.\n");
+	netif_carrier_off(enet->netdev);
 
-	/* Queue TX/RX implementation is still in progress. register_netdev is
-	 * deferred until these are completed in subsequent commits.
-	 */
+	err = register_netdev(enet->netdev);
+	if (err)
+		goto err_reset_dev;
+
+	netdev_dbg(enet->netdev, "eea probe success.\n");
 
 	return 0;
 
@@ -781,6 +853,8 @@ void eea_net_remove(struct eea_device *edev, bool ha)
 		return;
 	}
 
+	unregister_netdev(netdev);
+
 	if (!enet->wait_pci_ready) {
 		eea_device_reset(edev);
 		eea_destroy_adminq(enet);
@@ -801,6 +875,7 @@ void eea_net_shutdown(struct eea_device *edev)
 	rtnl_lock();
 
 	netif_device_detach(netdev);
+	dev_close(netdev);
 
 	if (!enet->wait_pci_ready) {
 		eea_device_reset(edev);
diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
index 6a5b2f908c76..848bb90bccf8 100644
--- a/drivers/net/ethernet/alibaba/eea/eea_net.h
+++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
@@ -165,6 +165,11 @@ struct eea_net {
 	u32 speed;
 
 	u64 hw_ts_offset;
+
+	/* Protect the tx and rx of struct eea_net, when eea_stats accesses the
+	 * stats from rx and tx queues.
+	 */
+	spinlock_t stats_lock;
 };
 
 int eea_net_probe(struct eea_device *edev);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (7 preceding siblings ...)
  2026-05-14  9:51 ` [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev Xuan Zhuo
@ 2026-05-15  6:40 ` Xuan Zhuo
  2026-05-19 10:33 ` Paolo Abeni
  2026-05-19 10:40 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-15  6:40 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

Hi Paolo,

Could you please clarify if this patch set is under embargo?

Also, I noticed that this version is very similar to the previous one. Could
you advise on how you would like me to proceed?

Link: https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260514095138.80680-1-xuanzhuo%40linux.alibaba.com

Thanks


On Thu, 14 May 2026 17:51:30 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> Add a driver framework for EEA that will be available in the future.
>
> This driver is currently quite minimal, implementing only fundamental
> core functionalities. Key features include: I/O queue management via
> adminq, basic PCI-layer operations, and essential RX/TX data
> communication capabilities. It also supports the creation,
> initialization, and management of network devices (netdev). Furthermore,
> the ring structures for both I/O queues and adminq have been abstracted
> into a simple, unified, and reusable library implementation,
> facilitating future extension and maintenance.
>
> v43:
>     1. fix the comments from sashiko
>
> v42:
>     1. fix the comments from Jakub
>     2. fix the comments from sashiko
>
> v41:
>     1. make https://sashiko.dev/ happy
>
> v40:
>     1. fix comments from https://sashiko.dev/
>
> v39:
>     1. fix comments from https://sashiko.dev/#/patchset/20260407122001.22265-1-xuanzhuo%40linux.alibaba.com
>
> v38:
>     1. fix comments from https://sashiko.dev/#/patchset/20260404135144.33166-1-xuanzhuo%40linux.alibaba.com
>
> v36:
>     1. fix comments from https://sashiko.dev/#/patchset/20260323074441.91691-1-xuanzhuo%40linux.alibaba.com
>
> v35:
>     1. fix comments from https://netdev-ai.bots.linux.dev/ai-review.html?id=24371ffc-a1ea-46e1-a6b3-b8cbcbb52efb
>
> v34:
>     1. fix the comments from https://sashiko.dev/#/patchset/20260317015257.79494-1-xuanzhuo%40linux.alibaba.com
>
> v33:
>     1. fix the comments. The old version refers to enet_bind_new_q_and_cfg,
>        but that is changed to eea_bind_q_and_cfg.
>
> v32:
>     1. remove unused parameters from eea_net_ha_reset_remove and eea_create_pp
>     2. call skb_mark_for_recycle for new skb for frag_list skbs
>
> v31:
>     1. remove unused parameter "enet" from eex_rx_post()
>
> v30:
>    1. fix some small problems
>
> v29:
>     Address AI-generated review suggestions from the previous version.
>
> v28:
>     Address AI-generated review suggestions from the previous version.
>
> v27:
>     Address AI-generated review suggestions from the previous version.
>
> v26:
>     Adopting suggestions from the previous AI review, another significant
>     change is the introduction of an IRQ block to implement IRQ proxying. With
>     this design, when an IRQ occurs, we no longer invoke the RX data structure
>     directly -- instead, the IRQ block serves as an intermediary proxy. This
>     approach offers several advantages: IRQ resources no longer need to be
>     reallocated during reset operations, and IRQs are decoupled from RX
>     structures. Consequently, when certain errors occur, we can fall back and
>     safely reuse the original memory resources.
>
> v25:
>     I have adopted most of the suggestions from the AI's feedback. However, I
>     believe some of the feedback is incorrect. I have already replied in the
>     previous thread. http://lore.kernel.org/all/1770002612.3297296-2-xuanzhuo@linux.alibaba.com
>
> v24:
>     1. Add null checks for enet->rx and enet->tx in eea_get_ethtool_stat to
>        prevent errors when reading rx = enet->rx[i] in case enet->rx is null.
>        tx is similar. With rtnl protection in place, this check is sufficient.
>     2. Use 'received' as the return value in eea_poll.
>
> v23:
>     I have moved netif_set_real_num_queues() out of eea_start_rxtx(), so
>     eea_start_rxtx() is now a void function. I believe enet_bind_new_q_and_cfg()
>     is a more suitable place to include netif_set_real_num_queues(). In
>     eea_active_ring_and_irq(), I first execute request_irq() before interacting
>     with the hardware to create queues. Therefore, during the NIC setup process,
>     all driver-internal operations (memory allocation, IRQ initialization, sysfs
>     configuration, etc.) will be completed before the final notification to the
>     hardware.
>
> v22:
>     1. Use the budget from the NAPI poll function as the parameter for
>        napi_consume_skb.
>     2. Stop the TX queue when the remaining ring slots cannot hold an SKB.
>
> v21:
>     Fix two issues from the previous version:
>     1. a DMA unmap operation was missing.
>     2. RCU APIs were not used in eea_stats. Although the standard practice when
>         using RCU would require adding the __rcu annotation to both the rx and
>         tx fields, in many cases these fields are read without needing RCU
>         protection.  Therefore, I do not want to add the __rcu annotation.
>         Instead, I use a spin lock to protect modifications to rx and tx.
>
> v20:
>     Fix the partially initialized structure passed to db. @Jakub
>     http://lore.kernel.org/all/20260113172353.2ae6ef81@kernel.org
>
> v19:
>     fix the comments from @Simon Horman
>
> v18:
>     v17 with [PATCH] prefix.
>
> v17:
>     1. In `eea_adminq_dev_status`, uniformly use `enet->cfg.rx_ring_num`.
>     2. Add a `struct eea_net_cfg *cfg` parameter to `eea_free_rx` and
>         `eea_free_tx`. When called in the normal path, pass `enet->cfg` as
>         the argument; when called during initialization, pass the temporary
>         `cfg` instead.
>     3. Move the `.ndo_get_stats64` callback into `eea_net.c`.
>     4. In the `.ndo_get_stats64` callback, add a comment explaining how the TX
>         and RX statistics are protected by RCU.
>
>        /* This function is protected by RCU. Here uses enet->tx and enet->rx
>         * to check whether the TX and RX structures are safe to access. In
>         * eea_free_rxtx_q_mem, before freeing the TX and RX resources, enet->rx
>         * and enet->tx are set to NULL, and synchronize_net is called.
>         */
>
>
> v16:
>     1. follow the advice from @ALOK TIWARI
>        http://lore.kernel.org/all/5ff95a71-69e5-4cb6-9b2a-5224c983bdc2@oracle.com
>
> v15:
>     1. remove 'default m' from eea kconfig
>     2. free the resources when open failed.
>
> v14:
>     1. some tiny fixes
>
> v13:
>     1. some tiny fixes @Simon
>
> v12:
>     I encountered some issues with sending the v11 patches, as they were quite
>     messy. Therefore, I'm resending them as v12.
>
> v11:
>     1. remove auto clean __free(kfree)
>     2. some tiny fixes
>
> v10:
>     1. name the jump labels after the target @Jakub
>     2. rm __GFP_ZERO from dma_alloc_coherent @Jakub
>
> v9:
>     1. some fixes for ethtool from http://lore.kernel.org/all/20251027183754.52fe2a2c@kernel.org
>
> v8: 1. rename eea_net_tmp to eea_net_init_ctx
>     2. rm code that allocs memory to destroy queues
>     3. some other minor changes
>
> v7: 1. remove the unrelated code from the ethtool commit
>     2. build every commit with W12
>
> v6: Split the single large commit into five commits
> v5: Thanks for the comments from Kalesh Anakkur Purayil, ALOK TIWARI
> v4: Thanks for the comments from Troy Mitchell, Przemek Kitszel, Andrew Lunn, Kalesh Anakkur Purayil
> v3: Thanks for the comments from Paolo Abeni
> v2: Thanks for the comments from Simon Horman and Andrew Lunn
> v1: Thanks for the comments from Simon Horman and Andrew Lunn
>
> Xuan Zhuo (8):
>   eea: introduce PCI framework
>   eea: introduce ring and descriptor structures
>   eea: probe the netdevice and create adminq
>   eea: create/destroy rx,tx queues for netdevice open and stop
>   eea: implement packet receive logic
>   eea: implement packet transmit logic
>   eea: introduce ethtool support
>   eea: introduce callback for ndo_get_stats64 and register netdev
>
>  MAINTAINERS                                   |   8 +
>  drivers/net/ethernet/Kconfig                  |   1 +
>  drivers/net/ethernet/Makefile                 |   1 +
>  drivers/net/ethernet/alibaba/Kconfig          |  28 +
>  drivers/net/ethernet/alibaba/Makefile         |   5 +
>  drivers/net/ethernet/alibaba/eea/Makefile     |   9 +
>  drivers/net/ethernet/alibaba/eea/eea_adminq.c | 542 +++++++++++
>  drivers/net/ethernet/alibaba/eea/eea_adminq.h |  83 ++
>  drivers/net/ethernet/alibaba/eea/eea_desc.h   | 138 +++
>  .../net/ethernet/alibaba/eea/eea_ethtool.c    | 273 ++++++
>  .../net/ethernet/alibaba/eea/eea_ethtool.h    |  48 +
>  drivers/net/ethernet/alibaba/eea/eea_net.c    | 887 ++++++++++++++++++
>  drivers/net/ethernet/alibaba/eea/eea_net.h    | 198 ++++
>  drivers/net/ethernet/alibaba/eea/eea_pci.c    | 744 +++++++++++++++
>  drivers/net/ethernet/alibaba/eea/eea_pci.h    |  73 ++
>  drivers/net/ethernet/alibaba/eea/eea_ring.c   | 249 +++++
>  drivers/net/ethernet/alibaba/eea/eea_ring.h   |  99 ++
>  drivers/net/ethernet/alibaba/eea/eea_rx.c     | 814 ++++++++++++++++
>  drivers/net/ethernet/alibaba/eea/eea_tx.c     | 500 ++++++++++
>  19 files changed, 4700 insertions(+)
>  create mode 100644 drivers/net/ethernet/alibaba/Kconfig
>  create mode 100644 drivers/net/ethernet/alibaba/Makefile
>  create mode 100644 drivers/net/ethernet/alibaba/eea/Makefile
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_adminq.c
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_adminq.h
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_desc.h
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ethtool.c
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ethtool.h
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_net.c
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_net.h
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_pci.c
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_pci.h
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ring.c
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_ring.h
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_rx.c
>  create mode 100644 drivers/net/ethernet/alibaba/eea/eea_tx.c
>
> --
> 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop
  2026-05-14  9:51 ` [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop Xuan Zhuo
@ 2026-05-18  1:18   ` Xuan Zhuo
  2026-05-18  1:24     ` Xuan Zhuo
  0 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  1:18 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

> diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> index bb8a49f8c6df..cfb18a07e296 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
[ ... ]
> +/* resources: ring, buffers, irq */
> +int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx)
> +{
> +	struct eea_net_init_ctx ctx_old = {0};
> +	int err, error;
> +
> +	if (!netif_running(enet->netdev) || !enet->started) {
> +		enet->cfg = ctx->cfg;
> +		return 0;
> +	}
> +
> +	err = eea_alloc_rxtx_q_mem(ctx);
> +	if (err) {
> +		netdev_warn(enet->netdev,
> +			    "eea reset: alloc q failed. stop reset. err %d\n",
> +			    err);
> +		return err;
> +	}
> +
> +	eea_stop_rxtx(enet->netdev);
> +	eea_hw_unactive_ring(enet);
> +
> +	eea_unbind_q_and_cfg(enet, &ctx_old);
> +	eea_bind_q_and_cfg(enet, ctx);
> +
> +	err = eea_update_queues(enet);
> +	if (err) {
> +		netdev_err(enet->netdev,
> +			   "eea reset: set real num queues failed. err %d\n",
> +			   err);
> +		goto err_bind_old;
> +	}
> +
> +	err = eea_hw_active_ring(enet);
> +	if (err) {
> +		netdev_err(enet->netdev, "eea reset: active new ring. err %d\n",
> +			   err);
> +		eea_unbind_q_and_cfg(enet, ctx);
> +		goto err_free_q;
> +	}
On the eea_hw_active_ring() failure branch above, can the netdev be
left with mismatched cached state?
The sequence executed by then is:
    eea_unbind_q_and_cfg(enet, &ctx_old);   /* saves OLD cfg into ctx_old */
    eea_bind_q_and_cfg(enet, ctx);          /* enet->cfg = ctx->cfg (NEW) */
    err = eea_update_queues(enet);          /* sets real_num_{rx,tx}_queues
                                               to NEW values, succeeded */
    err = eea_hw_active_ring(enet);         /* failed */
    eea_unbind_q_and_cfg(enet, ctx);
    goto err_free_q;
err_free_q then sets enet->started = false but never restores
enet->cfg or netdev->real_num_{rx,tx}_queues to the old values, and
eea_unbind_q_and_cfg() only nulls enet->rx / enet->tx without
reverting cfg.
Looking at eea_unbind_q_and_cfg():
    ctx->cfg = enet->cfg;
    ctx->rx = enet->rx;
    ctx->tx = enet->tx;
    enet->rx = NULL;
    enet->tx = NULL;
If the user later calls ndo_open, eea_netdev_open() reads enet->cfg
which still holds the NEW values that the device just rejected. Was
a rollback to ctx_old intended on this path as well, similar to the
err_bind_old branch?
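If a rollback were wanted on this path, it might look roughly like the
userspace sketch below. It is illustrative only: struct cfg, struct dev,
hw_activate() and apply_cfg() are hypothetical stand-ins and not the driver's
API. The only point shown is saving the old cached configuration before
overwriting it and restoring it when activation fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's cached state; not eea code. */
struct cfg {
	unsigned int rx_ring_num;
	unsigned int tx_ring_num;
};

struct dev {
	struct cfg cfg;       /* configuration a later open would read */
	bool started;
	bool hw_accepts_new;  /* knob for the sketch: does activation succeed? */
};

/* Pretend hardware activation; fails when the knob says so. */
static int hw_activate(const struct dev *d)
{
	return d->hw_accepts_new ? 0 : -1;
}

/* Apply a new configuration, restoring the old one if activation fails. */
static int apply_cfg(struct dev *d, const struct cfg *new_cfg)
{
	struct cfg old;
	int err;

	assert(d && new_cfg);

	old = d->cfg;         /* save before overwriting */
	d->cfg = *new_cfg;

	err = hw_activate(d);
	if (err) {
		d->cfg = old; /* roll back the cached state */
		d->started = false;
		return err;
	}

	d->started = true;
	return 0;
}
```

Whether the driver should behave this way is a design decision; retaining the
new configuration for a later retry is the alternative.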
[ ... ]
> +static void eea_net_ha_reset_remove(struct eea_net *enet,
> +				    struct eea_device *edev)
> +{
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
> index fa0eec8af21b..9639f0c5c618 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_net.h
> +++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
[ ... ]
> +/* rx apis */
> +void enet_rx_stop(struct eea_net_rx *rx);
> +void enet_rx_start(struct eea_net_rx *rx);
Every other symbol in this driver uses the eea_ prefix, but
enet_rx_start / enet_rx_stop use enet_. Was the inconsistency
intentional, or should these be renamed to eea_rx_start /
eea_rx_stop for consistency with the rest of the eea_ namespace?
> +
> +void eea_free_rx(struct eea_net_rx *rx, struct eea_net_cfg *cfg);
> +struct eea_net_rx *eea_alloc_rx(struct eea_net_init_ctx *ctx, u32 idx);
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> index bef0ebd6bb20..9872e360dd5d 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_pci.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
[ ... ]
> +static void eea_ha_handle_reset(struct eea_pci_device *ep_dev)
> +{
[ ... ]
> +	if (reset & EEA_PCI_CAP_RESET_FLAG) {
> +		dev_warn(&ep_dev->pci_dev->dev, "recv device reset request.\n");
> +
> +		pci_dev = ep_dev->pci_dev;
> +
> +		/* The pci remove callback may hold this lock. If the
> +		 * pci remove callback is called, then we can ignore the
> +		 * ha interrupt.
> +		 */
> +		if (mutex_trylock(&edev->ha_lock)) {
Is the mutex_trylock() with a silent-skip fallback below the right
pattern here?
> +			if (edev->status != EEA_PCI_STATUS_DONE) {
> +				dev_err(&ep_dev->pci_dev->dev, "ha: reset device: pci status is %d. skip it.\n",
> +					edev->status);
> +
> +				mutex_unlock(&edev->ha_lock);
> +				return;
> +			}
> +
> +			__eea_pci_remove(pci_dev, false);
> +			err = __eea_pci_probe(pci_dev, ep_dev, false);
> +			if (err)
[ ... ]
> +			mutex_unlock(&edev->ha_lock);
> +		} else {
> +			/* Device removal is in progress, so return directly. */
> +			dev_warn(&ep_dev->pci_dev->dev,
> +				 "ha device reset: trylock failed.\n");
> +		}
> +		return;
> +	}
When the trylock fails, the work simply logs a warning and returns
without rescheduling. The networking subsystem guidance lists this
specific pattern (trylock with fallback to skip the work entirely) as
a likely bug, with the narrow exception of work items that retry via
schedule_work() on failure.
The trylock also fails for any holder of ha_lock, not only pci
remove, so a HA reset request that arrives during any concurrent
holder is dropped silently and the device is left without recovery.
Would it be more appropriate to either reschedule the work via
schedule_work() on trylock failure, or to redesign the locking so
that the remove path coordinates with the worker via cancel/flush
rather than a shared mutex that the worker must avoid?
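The first option can be sketched in userspace as follows. This is an
assumption-laden illustration, not driver code: struct ha_ctx and its helpers
are hypothetical, a C11 atomic_flag stands in for the mutex, and the
reset_pending flag stands in for what would be a schedule_work() call in the
kernel. The handler records the request instead of dropping it when the lock
is busy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context; the atomic_flag stands in for ha_lock. */
struct ha_ctx {
	atomic_flag lock;
	atomic_bool reset_pending;  /* request kept for a later retry */
	unsigned int resets_done;
};

static bool ha_trylock(struct ha_ctx *c)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&c->lock);
}

static void ha_unlock(struct ha_ctx *c)
{
	atomic_flag_clear(&c->lock);
}

/* Returns true when the reset ran, false when it was deferred. */
static bool ha_handle_reset(struct ha_ctx *c)
{
	assert(c);

	if (!ha_trylock(c)) {
		/* In the kernel this is where the work would be requeued. */
		atomic_store(&c->reset_pending, true);
		return false;
	}

	atomic_store(&c->reset_pending, false);
	c->resets_done++;           /* stands in for remove + probe */
	ha_unlock(c);
	return true;
}
```

A caller that sees reset_pending set would run ha_handle_reset() again once
the lock holder has finished.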
[ ... ]
> +static int eea_pci_ha_init(struct eea_device *edev, struct pci_dev *pci_dev,
> +			   bool pci_probe)
> +{
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> +static int eea_net_reprobe(struct eea_device *edev)
> +{
> +	struct eea_net *enet = edev->enet;
> +	int err = 0;
> +
> +	enet->edev = edev;
> +
> +	if (!enet->adminq.ring) {
> +		err = eea_create_adminq(enet, edev->rx_num + edev->tx_num);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = eea_alloc_irq_blks(enet);
> +	if (err)
> +		goto err_destroy_aq;
> +
> +	rtnl_lock();
> +
> +	enet->link_err = 0;
> +	if (edev->ha_reset_netdev_running &&
> +	    netif_running(edev->enet->netdev)) {
> +		err = eea_netdev_open(enet->netdev);
On the HA reprobe path, can edev->rx_num change relative to the
value used at original probe?
eea_pci_setup() re-reads rx_num/tx_num from device registers each
time, and eea_alloc_irq_blks() sizes enet->irq_blks based on the
new edev->rx_num:
    num = enet->edev->rx_num;
    irq_blks = kvcalloc(num, sizeof(*blk), GFP_KERNEL);
However, the netdev itself was allocated once with
alloc_etherdev_mq(..., pairs) using the original count, and
enet->cfg.rx_ring_num is preserved from the previous probe.
If the device reports a smaller rx_num after the HA reset,
eea_bind_q_and_cfg() will iterate enet->cfg.rx_ring_num entries
(OLD) over enet->irq_blks[] (NEW, smaller):
    for (i = 0; i < ctx->cfg.rx_ring_num; i++) {
        blk = &enet->irq_blks[i];
        ...
    }
Should eea_net_reprobe() validate that edev->rx_num and
edev->tx_num match the values used at the original alloc_etherdev_mq()
and refuse to reprobe (or reinitialise cfg) if they have changed?
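A check of that kind could be as small as the sketch below. The names are
hypothetical (struct dev_caps and reprobe_check_caps() do not exist in the
patch), and returning -EINVAL is an assumption about which errno would be
chosen.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical snapshot of the queue counts read from the device. */
struct dev_caps {
	unsigned int rx_num;
	unsigned int tx_num;
};

/*
 * Compare the counts recorded at the original probe with the ones read
 * back after the HA reset; refuse to continue if they differ.
 */
static int reprobe_check_caps(const struct dev_caps *orig,
			      const struct dev_caps *now)
{
	assert(orig && now);

	if (now->rx_num != orig->rx_num || now->tx_num != orig->tx_num)
		return -EINVAL;

	return 0;
}
```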
[ ... ]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 4/8] eea: create/destroy rx,tx queues for netdevice open and stop
  2026-05-18  1:18   ` Xuan Zhuo
@ 2026-05-18  1:24     ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  1:24 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

On Mon, 18 May 2026 09:18:34 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > index bb8a49f8c6df..cfb18a07e296 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> [ ... ]
> > +/* resources: ring, buffers, irq */
> > +int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx)
> > +{
> > +	struct eea_net_init_ctx ctx_old = {0};
> > +	int err, error;
> > +
> > +	if (!netif_running(enet->netdev) || !enet->started) {
> > +		enet->cfg = ctx->cfg;
> > +		return 0;
> > +	}
> > +
> > +	err = eea_alloc_rxtx_q_mem(ctx);
> > +	if (err) {
> > +		netdev_warn(enet->netdev,
> > +			    "eea reset: alloc q failed. stop reset. err %d\n",
> > +			    err);
> > +		return err;
> > +	}
> > +
> > +	eea_stop_rxtx(enet->netdev);
> > +	eea_hw_unactive_ring(enet);
> > +
> > +	eea_unbind_q_and_cfg(enet, &ctx_old);
> > +	eea_bind_q_and_cfg(enet, ctx);
> > +
> > +	err = eea_update_queues(enet);
> > +	if (err) {
> > +		netdev_err(enet->netdev,
> > +			   "eea reset: set real num queues failed. err %d\n",
> > +			   err);
> > +		goto err_bind_old;
> > +	}
> > +
> > +	err = eea_hw_active_ring(enet);
> > +	if (err) {
> > +		netdev_err(enet->netdev, "eea reset: active new ring. err %d\n",
> > +			   err);
> > +		eea_unbind_q_and_cfg(enet, ctx);
> > +		goto err_free_q;
> > +	}
> On the eea_hw_active_ring() failure branch above, can the netdev be
> left with mismatched cached state?
> The sequence executed by then is:
>     eea_unbind_q_and_cfg(enet, &ctx_old);   /* saves OLD cfg into ctx_old */
>     eea_bind_q_and_cfg(enet, ctx);          /* enet->cfg = ctx->cfg (NEW) */
>     err = eea_update_queues(enet);          /* sets real_num_{rx,tx}_queues
>                                                to NEW values, succeeded */
>     err = eea_hw_active_ring(enet);         /* failed */
>     eea_unbind_q_and_cfg(enet, ctx);
>     goto err_free_q;
> err_free_q then sets enet->started = false but never restores
> enet->cfg or netdev->real_num_{rx,tx}_queues to the old values, and
> eea_unbind_q_and_cfg() only nulls enet->rx / enet->tx without
> reverting cfg.
> Looking at eea_unbind_q_and_cfg():
>     ctx->cfg = enet->cfg;
>     ctx->rx = enet->rx;
>     ctx->tx = enet->tx;
>     enet->rx = NULL;
>     enet->tx = NULL;
> If the user later calls ndo_open, eea_netdev_open() reads enet->cfg
> which still holds the NEW values that the device just rejected. Was
> a rollback to ctx_old intended on this path as well, similar to the
> err_bind_old branch?

All configurations that reach this point are additive in nature. The failure to
apply them to the hardware does not imply that the configuration itself is
invalid; rather, it is typically due to hardware issues or memory allocation
failures. Therefore, we retain the configuration with the expectation that it
may be successfully applied upon the next attempt.

> [ ... ]
> > +static void eea_net_ha_reset_remove(struct eea_net *enet,
> > +				    struct eea_device *edev)
> > +{
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.h b/drivers/net/ethernet/alibaba/eea/eea_net.h
> > index fa0eec8af21b..9639f0c5c618 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_net.h
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.h
> [ ... ]
> > +/* rx apis */
> > +void enet_rx_stop(struct eea_net_rx *rx);
> > +void enet_rx_start(struct eea_net_rx *rx);
> Every other symbol in this driver uses the eea_ prefix, but
> enet_rx_start / enet_rx_stop use enet_. Was the inconsistency
> intentional, or should these be renamed to eea_rx_start /
> eea_rx_stop for consistency with the rest of the eea_ namespace?

The enet prefix indicates that this code is not part of the PCI or adminq layers,
but is specifically used for eea netdev handling.


> > +
> > +void eea_free_rx(struct eea_net_rx *rx, struct eea_net_cfg *cfg);
> > +struct eea_net_rx *eea_alloc_rx(struct eea_net_init_ctx *ctx, u32 idx);
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> > index bef0ebd6bb20..9872e360dd5d 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_pci.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> [ ... ]
> > +static void eea_ha_handle_reset(struct eea_pci_device *ep_dev)
> > +{
> [ ... ]
> > +	if (reset & EEA_PCI_CAP_RESET_FLAG) {
> > +		dev_warn(&ep_dev->pci_dev->dev, "recv device reset request.\n");
> > +
> > +		pci_dev = ep_dev->pci_dev;
> > +
> > +		/* The pci remove callback may hold this lock. If the
> > +		 * pci remove callback is called, then we can ignore the
> > +		 * ha interrupt.
> > +		 */
> > +		if (mutex_trylock(&edev->ha_lock)) {
> Is the mutex_trylock() with a silent-skip fallback below the right
> pattern here?
> > +			if (edev->status != EEA_PCI_STATUS_DONE) {
> > +				dev_err(&ep_dev->pci_dev->dev, "ha: reset device: pci status is %d. skip it.\n",
> > +					edev->status);
> > +
> > +				mutex_unlock(&edev->ha_lock);
> > +				return;
> > +			}
> > +
> > +			__eea_pci_remove(pci_dev, false);
> > +			err = __eea_pci_probe(pci_dev, ep_dev, false);
> > +			if (err)
> [ ... ]
> > +			mutex_unlock(&edev->ha_lock);
> > +		} else {
> > +			/* Device removal is in progress, so return directly. */
> > +			dev_warn(&ep_dev->pci_dev->dev,
> > +				 "ha device reset: trylock failed.\n");
> > +		}
> > +		return;
> > +	}
> When the trylock fails, the work simply logs a warning and returns
> without rescheduling. The networking subsystem guidance lists this
> specific pattern (trylock with fallback to skip the work entirely) as
> a likely bug, with the narrow exception of work items that retry via
> schedule_work() on failure.
> The trylock also fails for any holder of ha_lock, not only pci
> remove, so a HA reset request that arrives during any concurrent
> holder is dropped silently and the device is left without recovery.
> Would it be more appropriate to either reschedule the work via
> schedule_work() on trylock failure, or to redesign the locking so
> that the remove path coordinates with the worker via cancel/flush
> rather than a shared mutex that the worker must avoid?


If the system has already entered the PCI remove path, the HA work is no longer
necessary, so returning directly after mutex_trylock failure is correct
behavior. Furthermore, our design guarantees that an HA interrupt will not be
triggered again until the current one is fully processed. If the hardware were
to encounter such a demand (i.e., multiple concurrent HA requests), our design
strategy is to let the hardware stop functioning entirely rather than attempting
to recover again.


> [ ... ]
> > +static int eea_pci_ha_init(struct eea_device *edev, struct pci_dev *pci_dev,
> > +			   bool pci_probe)
> > +{
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > +static int eea_net_reprobe(struct eea_device *edev)
> > +{
> > +	struct eea_net *enet = edev->enet;
> > +	int err = 0;
> > +
> > +	enet->edev = edev;
> > +
> > +	if (!enet->adminq.ring) {
> > +		err = eea_create_adminq(enet, edev->rx_num + edev->tx_num);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	err = eea_alloc_irq_blks(enet);
> > +	if (err)
> > +		goto err_destroy_aq;
> > +
> > +	rtnl_lock();
> > +
> > +	enet->link_err = 0;
> > +	if (edev->ha_reset_netdev_running &&
> > +	    netif_running(edev->enet->netdev)) {
> > +		err = eea_netdev_open(enet->netdev);
> On the HA reprobe path, can edev->rx_num change relative to the
> value used at original probe?

This won't happen. According to our hardware design specification, the device
parameters (including rx_num and tx_num) are guaranteed to remain constant
throughout the entire HA reset process. Therefore, we don't need to perform an
explicit validation check for these values during eea_net_reprobe().



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq
  2026-05-14  9:51 ` [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq Xuan Zhuo
@ 2026-05-18  1:41   ` Xuan Zhuo
  2026-05-18  1:41     ` Xuan Zhuo
  0 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  1:41 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev


Sashiko ai review:

commit b9e7d5d9497cc9075fff4db2f0e1c143dbb8d2d8
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
eea: probe the netdevice and create adminq
This patch adds the basic driver framework for the Alibaba Elastic
Ethernet Adapter, creating the netdevice after PCI probe and
initialising the admin queue used to issue commands to the device.
A few questions on the code below.
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_adminq.c b/drivers/net/ethernet/alibaba/eea/eea_adminq.c
> new file mode 100644
> index 000000000000..dfad1bdbc44d
> --- /dev/null
> +++ b/drivers/net/ethernet/alibaba/eea/eea_adminq.c
[ ... ]
> +static int eea_adminq_submit(struct eea_net *enet, u16 cmd,
> +			     dma_addr_t req_addr, dma_addr_t res_addr,
> +			     u32 req_size, u32 res_size, u32 *reply_len)
> +{
> +	struct eea_aq_cdesc *cdesc;
> +	struct eea_aq_desc *desc;
> +	int ret;
[ ... ]
> +	/* Returns 0 on success, or a negative error code on failure. */
> +	ret = le32_to_cpu(cdesc->status);
> +
> +	eea_ering_cq_ack_desc(enet->adminq.ring, 1);
> +
> +	if (ret)
> +		netdev_err(enet->netdev,
> +			   "adminq exec failed. cmd: %d ret %d\n", cmd, ret);
> +	else
> +		*reply_len = le32_to_cpu(cdesc->reply_len);
> +
> +	return ret;
> +}
The wire protocol header defines:
  #define EEA_OK     0
  #define EEA_ERR    0xffffffff
cdesc->status is a __le32, but it is stored into a signed int here:
  ret = le32_to_cpu(cdesc->status);
When the device reports failure (0xffffffff), ret becomes -1, which is
-EPERM. That value then propagates up through eea_adminq_exec() into
eea_netdev_init_features() and ultimately out of probe.
Should the device's protocol-level error be translated to a more
appropriate kernel errno such as -EIO before being returned to the
generic layers, given that the wire protocol defines only EEA_OK and
EEA_ERR rather than a kernel errno space?
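One possible shape for such a translation, as a self-contained userspace
sketch: the two status values are the ones quoted above, while
aq_status_to_errno() is a hypothetical helper name and the choice of -EIO is
only the suggestion made here, not something the patch defines.

```c
#include <errno.h>
#include <stdint.h>

/* Values quoted from the wire protocol header in the patch. */
#define EEA_OK  0
#define EEA_ERR 0xffffffff

/* Hypothetical helper: map a device status word to a kernel-style errno. */
static int aq_status_to_errno(uint32_t status)
{
	if (status == EEA_OK)
		return 0;

	/* Any non-zero status, including EEA_ERR, becomes a generic I/O error. */
	return -EIO;
}
```

Keeping the status in an unsigned 32-bit variable until it is translated
avoids the implicit conversion of 0xffffffff to -1.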
[ ... ]
> +int eea_create_adminq(struct eea_net *enet, u32 qid)
> +{
> +	u32 db_size, q_size, num;
> +	struct eea_ring *ering;
> +	struct eea_aq *aq;
> +	int err = -ENOMEM;
> +
> +	num = enet->edev->rx_num + enet->edev->tx_num;
> +	aq = &enet->adminq;
> +
> +	ering = eea_ering_alloc(qid, 64, enet->edev, sizeof(struct eea_aq_desc),
> +				sizeof(struct eea_aq_cdesc), "adminq");
> +	if (!ering)
> +		return -ENOMEM;
> +
> +	aq->ring = ering;
> +
> +	err = eea_pci_active_aq(ering, qid / 2 + 1);
> +	if (err)
> +		goto err;
> +
> +	aq->phase = BIT(7);
> +	aq->num = 0;
> +
> +	q_size = sizeof(*aq->q_req_buf) * num;
> +	db_size = sizeof(*aq->q_res_buf) * num;
> +
> +	aq->q_req_size = q_size;
> +	aq->q_res_size = db_size;
> +
> +	err = -ENOMEM;
> +
> +	aq->q_req_buf = kzalloc(q_size, GFP_KERNEL);
> +	if (!aq->q_req_buf)
> +		goto err;
> +
> +	aq->q_res_buf = kzalloc(db_size, GFP_KERNEL);
> +	if (!aq->q_res_buf)
> +		goto err;
> +
> +	/* Before we set up the AQ, the device remains in an inactive state, so
> +	 * there will be no DMA operations. If the 'set up AQ' process fails, we
> +	 * can safely free the DMA-related memory.
> +	 */
> +	err = eea_pci_set_aq_up(enet->edev);
> +	if (err)
> +		goto err;
> +
> +	aq->broken = false;
> +
> +	mutex_init(&aq->lock);
> +
> +	return 0;
> +
> +err:
> +	eea_destroy_adminq(enet);
> +	return err;
> +}
By the time eea_pci_set_aq_up() is called, eea_pci_active_aq() has
already programmed the SQ/CQ DMA addresses into the device registers
and eea_pci_set_aq_up() has signalled EEA_S_OK. If the
read_poll_timeout() inside eea_pci_set_aq_up() times out, can the
device be assumed to still be quiescent?
The err: label here jumps directly to eea_destroy_adminq(), which calls
eea_ering_free() -> dma_free_coherent() without first calling
eea_device_reset(). Every other teardown site in this commit resets
first, for example eea_adminq_destroy_all_q():
  if (err) {
      /* The device must be reset before unmapping buffers to avoid
       * potential DMA writes after the memory is freed.
       */
      mutex_lock(&enet->adminq.lock);
      eea_device_broken(enet);
      mutex_unlock(&enet->adminq.lock);
      ...
  }
and eea_adminq_submit()'s timeout path (via eea_device_broken()), and
the err_reset_dev label in eea_net_probe(). Should this err: label
also call eea_device_reset(enet->edev) before eea_destroy_adminq() to
avoid a DMA-after-free window into the freed SQ/CQ rings?
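The ordering asked about above can be sketched in a small userspace model. Everything here is hypothetical (struct fake_dev, fake_reset(), fake_ring_free(), fake_create_adminq()); it only illustrates "quiesce first, free second" and is not the driver's code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a device that may still DMA into a ring. */
struct fake_dev {
	bool dma_armed;		/* ring addresses programmed, device may write */
	void *ring;		/* stands in for the coherent SQ/CQ memory */
	bool freed_while_armed;	/* set if memory was released under a live device */
};

/* Stand-in for a device reset: after this the device no longer writes. */
static void fake_reset(struct fake_dev *d)
{
	d->dma_armed = false;
}

/* Stand-in for the ring free; records whether the device was still armed. */
static void fake_ring_free(struct fake_dev *d)
{
	if (d->dma_armed)
		d->freed_while_armed = true;
	free(d->ring);
	d->ring = NULL;
}

/* Error unwind in the order the review proposes: reset, then free. */
static int fake_create_adminq(struct fake_dev *d, bool set_up_fails)
{
	d->ring = malloc(64);
	if (!d->ring)
		return -ENOMEM;

	d->dma_armed = true;	/* addresses written, status signalled */

	if (set_up_fails)
		goto err_reset;

	return 0;

err_reset:
	fake_reset(d);
	fake_ring_free(d);
	return -ETIMEDOUT;
}
```

With this ordering the freed_while_armed flag can never be set on the failure path, whatever state the modelled device was left in by the timeout.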
mutex_init(&aq->lock) is called only after the last goto err point,
and eea_destroy_adminq() never calls mutex_destroy() on it. Should
eea_destroy_adminq() pair the init with mutex_destroy(), and should
mutex_init() move earlier so the lock is always initialised on the
teardown path?
[ ... ]
> +void eea_adminq_config_host_info(struct eea_net *enet)
> +{
> +	struct device *dev = enet->edev->dma_dev;
> +	struct eea_aq_host_info_cfg *cfg;
> +	struct eea_aq_host_info_rep *rep;
> +	int rc = -ENOMEM;
[ ... ]
> +	cfg->os_type            = cpu_to_le16(EEA_OS_LINUX);
> +	cfg->os_dist            = cpu_to_le16(EEA_OS_DISTRO);
> +	cfg->drv_type           = cpu_to_le16(EEA_DRV_TYPE);
> +
> +	cfg->kern_ver_major     = cpu_to_le16(LINUX_VERSION_MAJOR);
> +	cfg->kern_ver_minor     = cpu_to_le16(LINUX_VERSION_PATCHLEVEL);
> +	cfg->kern_ver_sub_minor = cpu_to_le16(LINUX_VERSION_SUBLEVEL);
> +
> +	cfg->drv_ver_major      = cpu_to_le16(EEA_VER_MAJOR);
> +	cfg->drv_ver_minor      = cpu_to_le16(EEA_VER_MINOR);
> +	cfg->drv_ver_sub_minor  = cpu_to_le16(EEA_VER_SUB_MINOR);
The driver carries a private version triplet (EEA_VER_MAJOR/MINOR/
SUB_MINOR = 1.0.0) and reports it to the device. For an upstream
driver, the canonical version is the kernel version itself; private
driver version numbers tend to fall out of date and have historically
been pushed back on by netdev maintainers.
Could this field be dropped, or derived from the kernel version,
rather than freezing 1.0.0 into the device-side ABI?
> +
> +	cfg->spec_ver_major     = cpu_to_le16(EEA_SPEC_VER_MAJOR);
> +	cfg->spec_ver_minor     = cpu_to_le16(EEA_SPEC_VER_MINOR);
> +
> +	cfg->pci_bdf            = cpu_to_le16(eea_pci_bdf(enet->edev));
> +	cfg->pci_domain         = cpu_to_le32(eea_pci_domain_nr(enet->edev));
> +
> +	strscpy(cfg->os_ver_str, utsname()->release, sizeof(cfg->os_ver_str));
> +	strscpy(cfg->isa_str, utsname()->machine, sizeof(cfg->isa_str));
On every probe the driver unconditionally sends utsname()->release,
utsname()->machine, the full Linux kernel version, os_dist, drv_type,
the PCI BDF and PCI domain to the device. The kernel release string
is set by distros and can include build identifiers.
Could the commit message explain what the firmware does with these
strings, and is there a reason this needs to be unconditional rather
than gated on a module parameter or kernel option?
EEA_OS_DISTRO and EEA_DRV_TYPE are both hardcoded to 0 with no
documented enumeration of valid values. Together with EEA_OS_LINUX=1,
these become part of the wire ABI introduced in this commit.
If these fields are intended to grow (for example to distinguish
upstream vs out-of-tree builds, or kernel module vs userspace
consumers), can the meaning of value 0 and the reserved range be
documented now? If they are not needed, can they be removed before
firmware ships against this layout?
> +
> +	rc = eea_adminq_exec(enet, EEA_AQ_CMD_HOST_INFO,
> +			     cfg, sizeof(*cfg), rep, sizeof(*rep), NULL);
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> new file mode 100644
> index 000000000000..bb8a49f8c6df
> --- /dev/null
> +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
[ ... ]
> +static int eea_netdev_init_features(struct net_device *netdev,
> +				    struct eea_net *enet,
> +				    struct eea_device *edev)
> +{
> +	struct eea_aq_cfg *cfg;
> +	int err;
> +	u32 mtu;
[ ... ]
> +	netdev->min_mtu = ETH_MIN_MTU;
> +
> +	netdev->mtu = mtu;
> +
> +	/* If jumbo frames are already enabled, then the returned MTU will be a
> +	 * jumbo MTU, and the driver will automatically enable jumbo frame
> +	 * support by default.
> +	 */
> +	netdev->max_mtu = mtu;
netdev->max_mtu is set to the same value as netdev->mtu, where mtu
came from cfg->mtu in the device's reply to EEA_AQ_CMD_CFG_QUERY.
The eea_aq_cfg structure has only a single __le16 mtu field and no
separate maximum, and the comment above describes mtu as the current
configured MTU rather than the device-supported maximum.
With max_mtu == mtu, can a user ever raise the MTU above the value the
device happened to advertise at probe time via "ip link set dev X
mtu N"? Should the wire ABI being introduced here grow a separate
mtu_max field, or should the driver derive a maximum (for example a
fixed jumbo cap) for max_mtu?
> +
> +err_free:
> +	kfree(cfg);
> +	return err;
> +}
[ ... ]
> +int eea_net_probe(struct eea_device *edev)
> +{
> +	struct eea_net *enet;
> +	int err = -ENOMEM;
> +
> +	enet = eea_netdev_alloc(edev, edev->rx_num);
> +	if (!enet)
> +		return -ENOMEM;
> +
> +	err = eea_create_adminq(enet, edev->rx_num + edev->tx_num);
> +	if (err)
> +		goto err_free_netdev;
> +
> +	eea_adminq_config_host_info(enet);
> +
> +	err = eea_netdev_init_features(enet->netdev, enet, edev);
> +	if (err)
> +		goto err_reset_dev;
> +
> +	netdev_dbg(enet->netdev, "eea probe success.\n");
> +
> +	/* Queue TX/RX implementation is still in progress. register_netdev is
> +	 * deferred until these are completed in subsequent commits.
> +	 */
> +
> +	return 0;
After this commit, eea_net_probe() returns success without calling
register_netdev(). A bisect that lands exactly on this commit will
see a probed PCI driver that has consumed IRQs and DMA buffers and
already issued the HOST_INFO admin command to firmware, but no netdev
visible to userspace.
Could this patch be squashed with the later patches that complete the
TX/RX paths and call register_netdev(), so a bisected kernel either
has a working interface or no driver activation at all?
> +
> +err_reset_dev:
> +	eea_device_reset(edev);
> +	eea_destroy_adminq(enet);
> +
> +err_free_netdev:
> +	free_netdev(enet->netdev);
> +	return err;
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 3/8] eea: probe the netdevice and create adminq
  2026-05-18  1:41   ` Xuan Zhuo
@ 2026-05-18  1:41     ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  1:41 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

On Mon, 18 May 2026 09:41:08 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Sashiko ai review:
>
> commit b9e7d5d9497cc9075fff4db2f0e1c143dbb8d2d8
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> eea: probe the netdevice and create adminq
> This patch adds the basic driver framework for the Alibaba Elastic
> Ethernet Adapter, creating the netdevice after PCI probe and
> initialising the admin queue used to issue commands to the device.
> A few questions on the code below.
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_adminq.c b/drivers/net/ethernet/alibaba/eea/eea_adminq.c
> > new file mode 100644
> > index 000000000000..dfad1bdbc44d
> > --- /dev/null
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_adminq.c
> [ ... ]
> > +static int eea_adminq_submit(struct eea_net *enet, u16 cmd,
> > +			     dma_addr_t req_addr, dma_addr_t res_addr,
> > +			     u32 req_size, u32 res_size, u32 *reply_len)
> > +{
> > +	struct eea_aq_cdesc *cdesc;
> > +	struct eea_aq_desc *desc;
> > +	int ret;
> [ ... ]
> > +	/* Returns 0 on success, or a negative error code on failure. */
> > +	ret = le32_to_cpu(cdesc->status);
> > +
> > +	eea_ering_cq_ack_desc(enet->adminq.ring, 1);
> > +
> > +	if (ret)
> > +		netdev_err(enet->netdev,
> > +			   "adminq exec failed. cmd: %d ret %d\n", cmd, ret);
> > +	else
> > +		*reply_len = le32_to_cpu(cdesc->reply_len);
> > +
> > +	return ret;
> > +}
> The wire protocol header defines:
>   #define EEA_OK     0
>   #define EEA_ERR    0xffffffff
> cdesc->status is a __le32, but it is stored into a signed int here:
>   ret = le32_to_cpu(cdesc->status);
> When the device reports failure (0xffffffff), ret becomes -1, which is
> -EPERM. That value then propagates up through eea_adminq_exec() into
> eea_netdev_init_features() and ultimately out of probe.
> Should the device's protocol-level error be translated to a more
> appropriate kernel errno such as -EIO before being returned to the
> generic layers, given that the wire protocol defines only EEA_OK and
> EEA_ERR rather than a kernel errno space?

Regarding the status design: the device reports failures using the standard
Linux kernel error codes, so the value returned here is actually -EIO.
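To make both readings concrete, here is a minimal userspace sketch of how a 32-bit status word decodes into a signed int. The helper names status_as_int() and status_to_errno() and the 4095 bound are assumptions for illustration only, not part of the patch:

```c
#include <errno.h>
#include <stdint.h>

/* Wire constants as quoted in the review. */
#define EEA_OK	0u
#define EEA_ERR	0xffffffffu

/* Hypothetical bound on valid errno values, mirroring the kernel's MAX_ERRNO. */
#define SKETCH_MAX_ERRNO 4095

/* Models "ret = le32_to_cpu(cdesc->status)": the 32-bit value read as a
 * two's-complement signed int, written without implementation-defined casts.
 */
static int status_as_int(uint32_t status)
{
	if (status > (uint32_t)INT32_MAX)
		return -(int)(~status) - 1;
	return (int)status;
}

/* Hypothetical translation: pass through values that look like a negative
 * errno, map the generic EEA_ERR and anything else unexpected to -EIO.
 */
static int status_to_errno(uint32_t status)
{
	int ret = status_as_int(status);

	if (status == EEA_OK)
		return 0;
	if (status == EEA_ERR)
		return -EIO;
	if (ret < 0 && ret >= -SKETCH_MAX_ERRNO)
		return ret;
	return -EIO;
}
```

Under this model a status of 0xffffffff reads back as -1 (numerically -EPERM), while a device that writes the two's-complement encoding of -EIO reads back as -EIO unchanged.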

> [ ... ]
> > +int eea_create_adminq(struct eea_net *enet, u32 qid)
> > +{
> > +	u32 db_size, q_size, num;
> > +	struct eea_ring *ering;
> > +	struct eea_aq *aq;
> > +	int err = -ENOMEM;
> > +
> > +	num = enet->edev->rx_num + enet->edev->tx_num;
> > +	aq = &enet->adminq;
> > +
> > +	ering = eea_ering_alloc(qid, 64, enet->edev, sizeof(struct eea_aq_desc),
> > +				sizeof(struct eea_aq_cdesc), "adminq");
> > +	if (!ering)
> > +		return -ENOMEM;
> > +
> > +	aq->ring = ering;
> > +
> > +	err = eea_pci_active_aq(ering, qid / 2 + 1);
> > +	if (err)
> > +		goto err;
> > +
> > +	aq->phase = BIT(7);
> > +	aq->num = 0;
> > +
> > +	q_size = sizeof(*aq->q_req_buf) * num;
> > +	db_size = sizeof(*aq->q_res_buf) * num;
> > +
> > +	aq->q_req_size = q_size;
> > +	aq->q_res_size = db_size;
> > +
> > +	err = -ENOMEM;
> > +
> > +	aq->q_req_buf = kzalloc(q_size, GFP_KERNEL);
> > +	if (!aq->q_req_buf)
> > +		goto err;
> > +
> > +	aq->q_res_buf = kzalloc(db_size, GFP_KERNEL);
> > +	if (!aq->q_res_buf)
> > +		goto err;
> > +
> > +	/* Before we set up the AQ, the device remains in an inactive state, so
> > +	 * there will be no DMA operations. If the 'set up AQ' process fails, we
> > +	 * can safely free the DMA-related memory.
> > +	 */
> > +	err = eea_pci_set_aq_up(enet->edev);
> > +	if (err)
> > +		goto err;
> > +
> > +	aq->broken = false;
> > +
> > +	mutex_init(&aq->lock);
> > +
> > +	return 0;
> > +
> > +err:
> > +	eea_destroy_adminq(enet);
> > +	return err;
> > +}
> By the time eea_pci_set_aq_up() is called, eea_pci_active_aq() has
> already programmed the SQ/CQ DMA addresses into the device registers
> and eea_pci_set_aq_up() has signalled EEA_S_OK. If the
> read_poll_timeout() inside eea_pci_set_aq_up() times out, can the
> device be assumed to still be quiescent?
> The err: label here jumps directly to eea_destroy_adminq(), which calls
> eea_ering_free() -> dma_free_coherent() without first calling
> eea_device_reset(). Every other teardown site in this commit resets
> first, for example eea_adminq_destroy_all_q():
>   if (err) {
>       /* The device must be reset before unmapping buffers to avoid
>        * potential DMA writes after the memory is freed.
>        */
>       mutex_lock(&enet->adminq.lock);
>       eea_device_broken(enet);
>       mutex_unlock(&enet->adminq.lock);
>       ...
>   }
> and eea_adminq_submit()'s timeout path (via eea_device_broken()), and
> the err_reset_dev label in eea_net_probe(). Should this err: label
> also call eea_device_reset(enet->edev) before eea_destroy_adminq() to
> avoid a DMA-after-free window into the freed SQ/CQ rings?
> mutex_init(&aq->lock) is called only after the last goto err point,
> and eea_destroy_adminq() never calls mutex_destroy() on it. Should
> eea_destroy_adminq() pair the init with mutex_destroy(), and should
> mutex_init() move earlier so the lock is always initialised on the
> teardown path?
> [ ... ]

Since this is still during the initialization process, a mutex is not required.
Furthermore, even if EEA_S_OK has been signaled to the device, it is still safe
to proceed with the release if the device times out for some reason. The AdminQ
is entirely driven by the driver side; the device will not read or write to the
rings until the driver issues a request via the doorbell.


> > +void eea_adminq_config_host_info(struct eea_net *enet)
> > +{
> > +	struct device *dev = enet->edev->dma_dev;
> > +	struct eea_aq_host_info_cfg *cfg;
> > +	struct eea_aq_host_info_rep *rep;
> > +	int rc = -ENOMEM;
> [ ... ]
> > +	cfg->os_type            = cpu_to_le16(EEA_OS_LINUX);
> > +	cfg->os_dist            = cpu_to_le16(EEA_OS_DISTRO);
> > +	cfg->drv_type           = cpu_to_le16(EEA_DRV_TYPE);
> > +
> > +	cfg->kern_ver_major     = cpu_to_le16(LINUX_VERSION_MAJOR);
> > +	cfg->kern_ver_minor     = cpu_to_le16(LINUX_VERSION_PATCHLEVEL);
> > +	cfg->kern_ver_sub_minor = cpu_to_le16(LINUX_VERSION_SUBLEVEL);
> > +
> > +	cfg->drv_ver_major      = cpu_to_le16(EEA_VER_MAJOR);
> > +	cfg->drv_ver_minor      = cpu_to_le16(EEA_VER_MINOR);
> > +	cfg->drv_ver_sub_minor  = cpu_to_le16(EEA_VER_SUB_MINOR);
> The driver carries a private version triplet (EEA_VER_MAJOR/MINOR/
> SUB_MINOR = 1.0.0) and reports it to the device. For an upstream
> driver, the canonical version is the kernel version itself; private
> driver version numbers tend to fall out of date and have historically
> been pushed back on by netdev maintainers.
> Could this field be dropped, or derived from the kernel version,
> rather than freezing 1.0.0 into the device-side ABI?
> > +
> > +	cfg->spec_ver_major     = cpu_to_le16(EEA_SPEC_VER_MAJOR);
> > +	cfg->spec_ver_minor     = cpu_to_le16(EEA_SPEC_VER_MINOR);
> > +
> > +	cfg->pci_bdf            = cpu_to_le16(eea_pci_bdf(enet->edev));
> > +	cfg->pci_domain         = cpu_to_le32(eea_pci_domain_nr(enet->edev));
> > +
> > +	strscpy(cfg->os_ver_str, utsname()->release, sizeof(cfg->os_ver_str));
> > +	strscpy(cfg->isa_str, utsname()->machine, sizeof(cfg->isa_str));
> On every probe the driver unconditionally sends utsname()->release,
> utsname()->machine, the full Linux kernel version, os_dist, drv_type,
> the PCI BDF and PCI domain to the device. The kernel release string
> is set by distros and can include build identifiers.
> Could the commit message explain what the firmware does with these
> strings, and is there a reason this needs to be unconditional rather
> than gated on a module parameter or kernel option?
> EEA_OS_DISTRO and EEA_DRV_TYPE are both hardcoded to 0 with no
> documented enumeration of valid values. Together with EEA_OS_LINUX=1,
> these become part of the wire ABI introduced in this commit.
> If these fields are intended to grow (for example to distinguish
> upstream vs out-of-tree builds, or kernel module vs userspace
> consumers), can the meaning of value 0 and the reserved range be
> documented now? If they are not needed, can they be removed before
> firmware ships against this layout?

Regarding the EEA_OS_DISTRO, EEA_DRV_TYPE, and the version triplet
(EEA_VER_MAJOR/MINOR/SUB_MINOR), these are already part of our maintenance plan,
and this has been previously discussed in our email thread; please disregard
these concerns for now.

To elaborate: unlike standard physical NICs delivered directly to customers, our
cloud NICs operate within our own cloud environment. Based on our operational
experience, we can implement hardware-based mitigations if a specific driver
version is known to have bugs. By reporting the kernel and driver versions to
the device, the hardware can identify problematic driver versions and apply
workarounds or fallback mechanisms to ensure at least basic functionality,
rather than failing entirely. This is a critical design requirement for our
cloud infrastructure.


> > +
> > +	rc = eea_adminq_exec(enet, EEA_AQ_CMD_HOST_INFO,
> > +			     cfg, sizeof(*cfg), rep, sizeof(*rep), NULL);
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > new file mode 100644
> > index 000000000000..bb8a49f8c6df
> > --- /dev/null
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> [ ... ]
> > +static int eea_netdev_init_features(struct net_device *netdev,
> > +				    struct eea_net *enet,
> > +				    struct eea_device *edev)
> > +{
> > +	struct eea_aq_cfg *cfg;
> > +	int err;
> > +	u32 mtu;
> [ ... ]
> > +	netdev->min_mtu = ETH_MIN_MTU;
> > +
> > +	netdev->mtu = mtu;
> > +
> > +	/* If jumbo frames are already enabled, then the returned MTU will be a
> > +	 * jumbo MTU, and the driver will automatically enable jumbo frame
> > +	 * support by default.
> > +	 */
> > +	netdev->max_mtu = mtu;
> netdev->max_mtu is set to the same value as netdev->mtu, where mtu
> came from cfg->mtu in the device's reply to EEA_AQ_CMD_CFG_QUERY.
> The eea_aq_cfg structure has only a single __le16 mtu field and no
> separate maximum, and the comment above describes mtu as the current
> configured MTU rather than the device-supported maximum.
> With max_mtu == mtu, can a user ever raise the MTU above the value the
> device happened to advertise at probe time via "ip link set dev X
> mtu N"? Should the wire ABI being introduced here grow a separate
> mtu_max field, or should the driver derive a maximum (for example a
> fixed jumbo cap) for max_mtu?


The MTU value here represents the maximum MTU. We default to providing the user
with the largest possible MTU. In network environments that do not support jumbo
frames, we will not return a jumbo MTU; however, if the network environment does
support it, we enable it by default so the user's NIC can utilize its full
capacity.
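As a rough illustration of what max_mtu == mtu means for userspace, the sketch below models the range check applied to an MTU change request. validate_mtu() is a simplified, hypothetical stand-in for the core's check, not a kernel function:

```c
#include <errno.h>

#define ETH_MIN_MTU 68	/* same value as the kernel's ETH_MIN_MTU */

/* Simplified model of the range check on an MTU change request:
 * a request outside [min_mtu, max_mtu] is rejected with -EINVAL.
 * A max_mtu of 0 is treated as "no upper bound".
 */
static int validate_mtu(unsigned int min_mtu, unsigned int max_mtu,
			unsigned int new_mtu)
{
	if (new_mtu < min_mtu)
		return -EINVAL;
	if (max_mtu > 0 && new_mtu > max_mtu)
		return -EINVAL;
	return 0;
}
```

If the device reports 1500, validate_mtu(ETH_MIN_MTU, 1500, 9000) fails, so users can only lower the MTU. If it reports a jumbo value such as 9000, every smaller MTU down to ETH_MIN_MTU remains selectable.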


> > +
> > +err_free:
> > +	kfree(cfg);
> > +	return err;
> > +}
> [ ... ]
> > +int eea_net_probe(struct eea_device *edev)
> > +{
> > +	struct eea_net *enet;
> > +	int err = -ENOMEM;
> > +
> > +	enet = eea_netdev_alloc(edev, edev->rx_num);
> > +	if (!enet)
> > +		return -ENOMEM;
> > +
> > +	err = eea_create_adminq(enet, edev->rx_num + edev->tx_num);
> > +	if (err)
> > +		goto err_free_netdev;
> > +
> > +	eea_adminq_config_host_info(enet);
> > +
> > +	err = eea_netdev_init_features(enet->netdev, enet, edev);
> > +	if (err)
> > +		goto err_reset_dev;
> > +
> > +	netdev_dbg(enet->netdev, "eea probe success.\n");
> > +
> > +	/* Queue TX/RX implementation is still in progress. register_netdev is
> > +	 * deferred until these are completed in subsequent commits.
> > +	 */
> > +
> > +	return 0;
> After this commit, eea_net_probe() returns success without calling
> register_netdev(). A bisect that lands exactly on this commit will
> see a probed PCI driver that has consumed IRQs and DMA buffers and
> already issued the HOST_INFO admin command to firmware, but no netdev
> visible to userspace.
> Could this patch be squashed with the later patches that complete the
> TX/RX paths and call register_netdev(), so a bisected kernel either
> has a working interface or no driver activation at all?

The RX and TX paths, along with several other features, are not yet fully
implemented. Enabling the netdev prematurely would likely lead to functional
issues. The code is structured in separate patches to facilitate the review
process. Furthermore, since the netdev will be registered within this same patch
series, this is not a concern.

> > +
> > +err_reset_dev:
> > +	eea_device_reset(edev);
> > +	eea_destroy_adminq(enet);
> > +
> > +err_free_netdev:
> > +	free_netdev(enet->netdev);
> > +	return err;
> > +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 5/8] eea: implement packet receive logic
  2026-05-14  9:51 ` [PATCH net-next v43 5/8] eea: implement packet receive logic Xuan Zhuo
@ 2026-05-18  2:34   ` Xuan Zhuo
  2026-05-18  2:35     ` Xuan Zhuo
  0 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  2:34 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

From sashiko:


commit 142959c3285196e5751c147c32a4a954a804d736
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
eea: implement packet receive logic
This commit implements the EEA RX path: page-pool buffer management,
per-descriptor metadata handling, packet assembly via build_skb()/
napi_alloc_skb(), and submission through napi_gro_receive().  It also
introduces a one-shot hardware-timestamp offset captured at probe time.
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> index cfb18a07e296..bf8581bc27ed 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> @@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
>  	return enet;
>  }
>
> +static void eea_update_ts_off(struct eea_device *edev, struct eea_net *enet)
> +{
> +	u64 ts;
> +
> +	ts = eea_pci_device_ts(edev);
> +
> +	enet->hw_ts_offset = ktime_get_real() - ts;
> +}
Is the choice of ktime_get_real() as the anchor for hardware RX
timestamps intentional?
Consumers of skb_hwtstamps()->hwtstamp typically expect a stable
monotonic timeline; tying the offset to CLOCK_REALTIME means any later
settimeofday(), NTP step, leap-second adjustment, or a probe that
happens before NTP sync will permanently bias every reported hwtstamp.
There is also no ptp_clock_register()/PHC and no periodic resync, so
the offset captured here is the one used forever.
Is the capture itself meant to be atomic with respect to scheduling?
	ts = eea_pci_device_ts(edev);
	enet->hw_ts_offset = ktime_get_real() - ts;
eea_pci_device_ts() performs an MMIO read across PCIe and is then
followed by ktime_get_real() with no preempt/IRQ disabling, so any
transit or preemption delay between the two reads is permanently
baked into hw_ts_offset for the lifetime of the device.
Does this code handle 48-bit wraparound of the device timestamp
counter?
EEA_DESC_TS(desc) masks desc->ts to 48 bits via
EEA_DESC_TS_MASK = GENMASK_ULL(47, 0), and hw_ts_offset is taken
exactly once here at probe/reprobe.  After roughly 3.26 days of
device uptime (2^48 ns) the per-packet computation
EEA_DESC_TS(desc) + enet->hw_ts_offset will jump backwards by ~2^48
ns relative to wall time, with no wrap detector, no resync, and no
cyclecounter/timecounter wrapper.
There is also a width inconsistency between this capture and the
per-packet path:
drivers/net/ethernet/alibaba/eea/eea_pci.c:eea_pci_device_ts() {
    return cfg_read64(ep_dev->reg, hw_ts);
}
returns the full 64-bit register, while EEA_DESC_TS(desc) masks the
descriptor field to 48 bits.  If bits 48..63 of the hw_ts register
are ever non-zero, every reported hwtstamp would be off by at least
2^48 ns.
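For reference, 2^48 ns is 281474976710656 ns, or about 3.26 days. One common way to cope with a truncated counter is to extend each 48-bit sample against a recent full-width reading. The sketch below is a generic userspace illustration of that technique; ts48_extend() and the idea of a periodically refreshed reference are assumptions, not something this series implements:

```c
#include <stdint.h>

#define TS_BITS	48
#define TS_MASK	((UINT64_C(1) << TS_BITS) - 1)

/* Extend a 48-bit timestamp to 64 bits using a recent 64-bit reference.
 *
 * The result is the 64-bit value closest to 'ref' whose low 48 bits equal
 * 'ts48'. This is only correct while the true timestamp is within 2^47 ns
 * (about 1.6 days) of the reference, so 'ref' must be refreshed regularly.
 */
static uint64_t ts48_extend(uint64_t ref, uint64_t ts48)
{
	/* Forward distance from ref to ts48, modulo 2^48. */
	uint64_t delta = (ts48 - ref) & TS_MASK;

	/* Distances of half the range or more mean ts48 is behind ref. */
	if (delta >= (UINT64_C(1) << (TS_BITS - 1)))
		return ref - ((UINT64_C(1) << TS_BITS) - delta);

	return ref + delta;
}
```

The same masking also addresses the width mismatch noted above: if the offset were computed from a reference handled this way, bits 48..63 of the register would be accounted for rather than silently dropped.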
> @@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev)
>  	if (err)
>  		goto err_reset_dev;
>
> +	eea_update_ts_off(edev, enet);
> +
>  	netdev_dbg(enet->netdev, "eea probe success.\n");
>
>  	/* Queue TX/RX implementation is still in progress. register_netdev is
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> index b1265048fbc3..9c71a7cf950e 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
[ ... ]
> +static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
> +{
> +	struct eea_net *enet = rx->enet;
> +	struct sk_buff *skb = NULL;
> +
> +	if (eea_harden_check_size(ctx, enet))
> +		goto err;
> +
> +	rx->pkt.data_valid = ctx->flags & EEA_DESC_F_DATA_VALID;
Is the EEA_DESC_F_DATA_VALID bit (the source of CHECKSUM_UNNECESSARY)
meant to be sampled only on the first descriptor of a multi-descriptor
packet?
process_first_buf() is the only place rx->pkt.data_valid is assigned,
and process_remain_buf() does not touch it.  Meanwhile in
eea_submit_skb() the timestamp comes from the EOP descriptor:
    if (rx->pkt.data_valid)
        skb->ip_summed = CHECKSUM_UNNECESSARY;
    if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
        skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
            enet->hw_ts_offset;
If the device places per-packet metadata on the EOP descriptor (a
common NIC layout), every multi-segment frame would silently fall
back to CHECKSUM_NONE.  Could the descriptor used for the checksum
flag and for the timestamp be documented or made consistent?
> +
> +	if (ctx->hdr_len)
> +		skb = eea_rx_build_split_hdr_skb(rx, ctx);
> +	else
> +		skb = eea_rx_build_skb(rx, ctx);
> +
> +	if (unlikely(!skb))
> +		goto err;
> +
> +	rx->pkt.head_skb = skb;
> +
> +	return;
> +
> +err:
> +	rx->pkt.do_drop = true;
> +}
> +
> +static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb,
> +			   struct eea_rx_cdesc *desc)
> +{
> +	struct eea_net *enet = rx->enet;
> +
> +	if (rx->pkt.data_valid)
> +		skb->ip_summed = CHECKSUM_UNNECESSARY;
> +
> +	if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
> +		skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
> +			enet->hw_ts_offset;
Is the strict equality with HWTSTAMP_FILTER_ALL intentional?
The usual driver convention for SIOCSHWTSTAMP is to either reject
unsupported filters or upgrade them to ALL, so that consumers
requesting more specific filters (PTP_V2_*, NTP_ALL, etc.) still
get timestamps on matching frames.  With the strict comparison
above, anything other than HWTSTAMP_FILTER_ALL yields no timestamps.
There is no SIOCSHWTSTAMP setter in this series, so the branch is
effectively unreachable today, but it may be worth revisiting
together with whatever setter lands later.
> +
> +	skb_record_rx_queue(skb, rx->index);
> +	skb->protocol = eth_type_trans(skb, enet->netdev);
> +
> +	napi_gro_receive(rx->napi, skb);
> +}
[ ... ]
> +static int eea_cleanrx(struct eea_net_rx *rx, int budget,
> +		       struct eea_rx_ctx *ctx)
> +{
> +	struct eea_rx_cdesc *desc;
> +	struct eea_rx_meta *meta;
> +	int recv, err;
> +
> +	for (recv = 0; recv < budget; ) {
> +		desc = eea_ering_cq_get_desc(rx->ering);
> +		if (!desc)
> +			break;
> +
> +		err = eea_rx_desc_to_ctx(rx, ctx, desc);
> +		if (unlikely(err)) {
> +			if (ctx->meta)
> +				eea_rx_meta_put(rx, ctx->meta);
> +
> +			if (rx->pkt.head_skb)
> +				dev_kfree_skb(rx->pkt.head_skb);
> +
> +			/* A hardware error occurred; we are attempting to
> +			 * mitigate the impact. Subsequent packets may be
> +			 * corrupted.
> +			 */
> +			ctx->more = false;
> +			goto ack;
> +		}
Can this recovery path forward arbitrary mid-payload bytes to the
network stack as a fresh L2 frame?
When eea_rx_desc_to_ctx() fails mid-stream (invalid id, or len
greater than meta->len), this branch frees rx->pkt.head_skb, forces
ctx->more = false, and jumps to ack.  The tail of the loop then
runs:
    if (!ctx->more) {
        memset(&rx->pkt, 0, sizeof(rx->pkt));
        ++recv;
    }
so no "still inside a device-side packet" indicator survives.  The
next descriptor from the device — which may be a continuation
buffer of the original frame — will be handled by process_first_buf()
because rx->pkt.idx == 0.  If that continuation satisfies
ctx->len >= ETH_HLEN, an skb is built from raw mid-payload bytes
and submitted via eth_type_trans()/napi_gro_receive().
Would setting rx->pkt.do_drop = true (and only clearing rx->pkt on
the next observed !more) be a safer way to mitigate, given the
existing comment already notes "subsequent packets may be
corrupted"?
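The difference between the two recovery policies can be shown with a small userspace model. The types fake_desc and pkt_state and the function clean_rx() are hypothetical simplifications of the loop quoted above; they only count how many frames would reach the stack:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion descriptor: 'bad' models a failed
 * eea_rx_desc_to_ctx(), 'more' models "further buffers follow".
 */
struct fake_desc {
	bool bad;
	bool more;
};

/* Hypothetical per-packet state, loosely modelled on rx->pkt. */
struct pkt_state {
	bool do_drop;
	unsigned int idx;
};

/* Count frames that would be handed to the stack.
 * drop_until_eop == false: forget all state on a bad descriptor.
 * drop_until_eop == true:  keep dropping until a trusted end of packet.
 */
static unsigned int clean_rx(const struct fake_desc *descs, size_t n,
			     bool drop_until_eop)
{
	struct pkt_state st = { false, 0 };
	unsigned int delivered = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct fake_desc *d = &descs[i];

		if (d->bad) {
			if (drop_until_eop) {
				st.do_drop = true;
			} else {
				st.do_drop = false;
				st.idx = 0;
			}
			continue;
		}

		if (!st.do_drop) {
			st.idx++;
			if (!d->more)
				delivered++;
		}

		/* A trusted end of packet closes the current frame. */
		if (!d->more) {
			st.do_drop = false;
			st.idx = 0;
		}
	}

	return delivered;
}
```

For a stream of a first buffer, a bad descriptor, two continuation buffers and then one intact single-buffer frame, the reset-everything policy delivers two frames (one built from continuation bytes), while the drop-until-EOP policy delivers only the intact one. The trade-off is that if the bad descriptor was itself the last buffer of its frame, the following intact frame is dropped as well.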
> +
> +		meta = ctx->meta;
> +
> +		if (unlikely(rx->pkt.do_drop))
> +			goto skip;
> +
> +		eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len);
> +
> +		rx->pkt.recv_len += ctx->len;
> +		rx->pkt.recv_len += ctx->hdr_len;
> +
> +		if (!rx->pkt.idx)
> +			process_first_buf(rx, ctx);
> +		else
> +			process_remain_buf(rx, ctx);
> +
> +		++rx->pkt.idx;
> +
> +		if (!ctx->more && rx->pkt.head_skb)
> +			eea_submit_skb(rx, rx->pkt.head_skb, desc);
> +
> +skip:
> +		eea_rx_meta_put(rx, meta);
> +ack:
> +		eea_ering_cq_ack_desc(rx->ering, 1);
> +
> +		if (!ctx->more) {
> +			memset(&rx->pkt, 0, sizeof(rx->pkt));
> +			++recv;
> +		}
> +	}
> +
> +	return recv;
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 5/8] eea: implement packet receive logic
  2026-05-18  2:34   ` Xuan Zhuo
@ 2026-05-18  2:35     ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  2:35 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

On Mon, 18 May 2026 10:34:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> From sashiko:
>
>
> commit 142959c3285196e5751c147c32a4a954a804d736
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> eea: implement packet receive logic
> This commit implements the EEA RX path: page-pool buffer management,
> per-descriptor metadata handling, packet assembly via build_skb()/
> napi_alloc_skb(), and submission through napi_gro_receive().  It also
> introduces a one-shot hardware-timestamp offset captured at probe time.
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_net.c b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > index cfb18a07e296..bf8581bc27ed 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_net.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_net.c
> > @@ -642,6 +642,15 @@ static struct eea_net *eea_netdev_alloc(struct eea_device *edev, u32 pairs)
> >  	return enet;
> >  }
> >
> > +static void eea_update_ts_off(struct eea_device *edev, struct eea_net *enet)
> > +{
> > +	u64 ts;
> > +
> > +	ts = eea_pci_device_ts(edev);
> > +
> > +	enet->hw_ts_offset = ktime_get_real() - ts;
> > +}
> Is the choice of ktime_get_real() as the anchor for hardware RX
> timestamps intentional?
> Consumers of skb_hwtstamps()->hwtstamp typically expect a stable
> monotonic timeline; tying the offset to CLOCK_REALTIME means any later
> settimeofday(), NTP step, leap-second adjustment, or a probe that
> happens before NTP sync will permanently bias every reported hwtstamp.
> There is also no ptp_clock_register()/PHC and no periodic resync, so
> the offset captured here is the one used forever.
> Is the capture itself meant to be atomic with respect to scheduling?
> 	ts = eea_pci_device_ts(edev);
> 	enet->hw_ts_offset = ktime_get_real() - ts;
> eea_pci_device_ts() performs an MMIO read across PCIe and is then
> followed by ktime_get_real() with no preempt/IRQ disabling, so any
> transit or preemption delay between the two reads is permanently
> baked into hw_ts_offset for the lifetime of the device.
> Does this code handle 48-bit wraparound of the device timestamp
> counter?
> EEA_DESC_TS(desc) masks desc->ts to 48 bits via
> EEA_DESC_TS_MASK = GENMASK_ULL(47, 0), and hw_ts_offset is taken
> exactly once here at probe/reprobe.  After roughly 3.26 days of
> device uptime (2^48 ns) the per-packet computation
> EEA_DESC_TS(desc) + enet->hw_ts_offset will jump backwards by ~2^48
> ns relative to wall time, with no wrap detector, no resync, and no
> cyclecounter/timecounter wrapper.
> There is also a width inconsistency between this capture and the
> per-packet path:
> drivers/net/ethernet/alibaba/eea/eea_pci.c:eea_pci_device_ts() {
>     return cfg_read64(ep_dev->reg, hw_ts);
> }
> returns the full 64-bit register, while EEA_DESC_TS(desc) masks the
> descriptor field to 48 bits.  If bits 48..63 of the hw_ts register
> are ever non-zero, every reported hwtstamp would be off by at least
> 2^48 ns.

Indeed, this part is only a basic framework for now. A full PTP implementation
is not included in this version and is planned as future work, so I acknowledge
that the timestamp handling here is incomplete.


> > @@ -711,6 +722,8 @@ int eea_net_probe(struct eea_device *edev)
> >  	if (err)
> >  		goto err_reset_dev;
> >
> > +	eea_update_ts_off(edev, enet);
> > +
> >  	netdev_dbg(enet->netdev, "eea probe success.\n");
> >
> >  	/* Queue TX/RX implementation is still in progress. register_netdev is
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_rx.c b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> > index b1265048fbc3..9c71a7cf950e 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_rx.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_rx.c
> [ ... ]
> > +static void process_first_buf(struct eea_net_rx *rx, struct eea_rx_ctx *ctx)
> > +{
> > +	struct eea_net *enet = rx->enet;
> > +	struct sk_buff *skb = NULL;
> > +
> > +	if (eea_harden_check_size(ctx, enet))
> > +		goto err;
> > +
> > +	rx->pkt.data_valid = ctx->flags & EEA_DESC_F_DATA_VALID;
> Is the EEA_DESC_F_DATA_VALID bit (the source of CHECKSUM_UNNECESSARY)
> meant to be sampled only on the first descriptor of a multi-descriptor
> packet?
> process_first_buf() is the only place rx->pkt.data_valid is assigned,
> and process_remain_buf() does not touch it.  Meanwhile in
> eea_submit_skb() the timestamp comes from the EOP descriptor:
>     if (rx->pkt.data_valid)
>         skb->ip_summed = CHECKSUM_UNNECESSARY;
>     if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
>         skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
>             enet->hw_ts_offset;
> If the device places per-packet metadata on the EOP descriptor (a
> common NIC layout), every multi-segment frame would silently fall
> back to CHECKSUM_NONE.  Could the descriptor used for the checksum
> flag and for the timestamp be documented or made consistent?

On the RX path, for multi-descriptor packets, the information contained in all
descriptors is essentially the same. Since they all describe the same packet, we
only need to sample EEA_DESC_F_DATA_VALID once. As for the hardware timestamp,
we simply retrieve it from the last descriptor as a matter of convenience.

> > +
> > +	if (ctx->hdr_len)
> > +		skb = eea_rx_build_split_hdr_skb(rx, ctx);
> > +	else
> > +		skb = eea_rx_build_skb(rx, ctx);
> > +
> > +	if (unlikely(!skb))
> > +		goto err;
> > +
> > +	rx->pkt.head_skb = skb;
> > +
> > +	return;
> > +
> > +err:
> > +	rx->pkt.do_drop = true;
> > +}
> > +
> > +static void eea_submit_skb(struct eea_net_rx *rx, struct sk_buff *skb,
> > +			   struct eea_rx_cdesc *desc)
> > +{
> > +	struct eea_net *enet = rx->enet;
> > +
> > +	if (rx->pkt.data_valid)
> > +		skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +
> > +	if (enet->cfg.ts_cfg.rx_filter == HWTSTAMP_FILTER_ALL)
> > +		skb_hwtstamps(skb)->hwtstamp = EEA_DESC_TS(desc) +
> > +			enet->hw_ts_offset;
> Is the strict equality with HWTSTAMP_FILTER_ALL intentional?
> The usual driver convention for SIOCSHWTSTAMP is to either reject
> unsupported filters or upgrade them to ALL, so that consumers
> requesting more specific filters (PTP_V2_*, NTP_ALL, etc.) still
> get timestamps on matching frames.  With the strict comparison
> above, anything other than HWTSTAMP_FILTER_ALL yields no timestamps.
> There is no SIOCSHWTSTAMP setter in this series, so the branch is
> effectively unreachable today, but it may be worth revisiting
> together with whatever setter lands later.
> > +
> > +	skb_record_rx_queue(skb, rx->index);
> > +	skb->protocol = eth_type_trans(skb, enet->netdev);
> > +
> > +	napi_gro_receive(rx->napi, skb);
> > +}
> [ ... ]
> > +static int eea_cleanrx(struct eea_net_rx *rx, int budget,
> > +		       struct eea_rx_ctx *ctx)
> > +{
> > +	struct eea_rx_cdesc *desc;
> > +	struct eea_rx_meta *meta;
> > +	int recv, err;
> > +
> > +	for (recv = 0; recv < budget; ) {
> > +		desc = eea_ering_cq_get_desc(rx->ering);
> > +		if (!desc)
> > +			break;
> > +
> > +		err = eea_rx_desc_to_ctx(rx, ctx, desc);
> > +		if (unlikely(err)) {
> > +			if (ctx->meta)
> > +				eea_rx_meta_put(rx, ctx->meta);
> > +
> > +			if (rx->pkt.head_skb)
> > +				dev_kfree_skb(rx->pkt.head_skb);
> > +
> > +			/* A hardware error occurred; we are attempting to
> > +			 * mitigate the impact. Subsequent packets may be
> > +			 * corrupted.
> > +			 */
> > +			ctx->more = false;
> > +			goto ack;
> > +		}
> Can this recovery path forward arbitrary mid-payload bytes to the
> network stack as a fresh L2 frame?
> When eea_rx_desc_to_ctx() fails mid-stream (invalid id, or len
> greater than meta->len), this branch frees rx->pkt.head_skb, forces
> ctx->more = false, and jumps to ack.  The tail of the loop then
> runs:
>     if (!ctx->more) {
>         memset(&rx->pkt, 0, sizeof(rx->pkt));
>         ++recv;
>     }
> so no "still inside a device-side packet" indicator survives.  The
> next descriptor from the device — which may be a continuation
> buffer of the original frame — will be handled by process_first_buf()
> because rx->pkt.idx == 0.  If that continuation satisfies
> ctx->len >= ETH_HLEN, an skb is built from raw mid-payload bytes
> and submitted via eth_type_trans()/napi_gro_receive().
> Would setting rx->pkt.do_drop = true (and only clearing rx->pkt on
> the next observed !more) be a safer way to mitigate, given the
> existing comment already notes "subsequent packets may be
> corrupted"?


The concern here is that, during multi-fragment processing, a mid-stream
error might cause the driver to misidentify subsequent fragments as the
start of a new packet. However, from the driver's perspective there is
little we can do in such a scenario: if the hardware state becomes
corrupted in this way, it is a critical hardware error that is effectively
beyond the driver's ability to recover.
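
For reference, the mitigation suggested in the review (keep dropping until
the device signals the end of the current packet) can be modelled in
userspace as below. This is only a sketch of the state machine, not driver
code: struct pkt_state, rx_step() and the error/more inputs are hypothetical
stand-ins for rx->pkt, the eea_cleanrx() loop body and ctx->more.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the per-packet RX state. */
struct pkt_state {
	unsigned int idx;	/* descriptors seen for the current packet */
	bool do_drop;		/* discard the rest of the current packet */
};

/* Process one completion. Returns true if it would start a new skb. */
static bool rx_step(struct pkt_state *pkt, bool error, bool more)
{
	bool starts_skb = false;

	assert(pkt);

	/* Remember the failure instead of resetting the state right away. */
	if (error)
		pkt->do_drop = true;

	if (!pkt->do_drop && pkt->idx == 0)
		starts_skb = true;

	pkt->idx++;

	/* Only the end of the packet clears the per-packet state. */
	if (!more)
		memset(pkt, 0, sizeof(*pkt));

	return starts_skb;
}
```

Note that this model assumes the "more" flag of a bad descriptor can still be
trusted, which may not hold once the hardware state is corrupted.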


> > +
> > +		meta = ctx->meta;
> > +
> > +		if (unlikely(rx->pkt.do_drop))
> > +			goto skip;
> > +
> > +		eea_rx_meta_dma_sync_for_cpu(rx, meta, ctx->len);
> > +
> > +		rx->pkt.recv_len += ctx->len;
> > +		rx->pkt.recv_len += ctx->hdr_len;
> > +
> > +		if (!rx->pkt.idx)
> > +			process_first_buf(rx, ctx);
> > +		else
> > +			process_remain_buf(rx, ctx);
> > +
> > +		++rx->pkt.idx;
> > +
> > +		if (!ctx->more && rx->pkt.head_skb)
> > +			eea_submit_skb(rx, rx->pkt.head_skb, desc);
> > +
> > +skip:
> > +		eea_rx_meta_put(rx, meta);
> > +ack:
> > +		eea_ering_cq_ack_desc(rx->ering, 1);
> > +
> > +		if (!ctx->more) {
> > +			memset(&rx->pkt, 0, sizeof(rx->pkt));
> > +			++recv;
> > +		}
> > +	}
> > +
> > +	return recv;
> > +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 6/8] eea: implement packet transmit logic
  2026-05-14  9:51 ` [PATCH net-next v43 6/8] eea: implement packet transmit logic Xuan Zhuo
@ 2026-05-18  2:47   ` Xuan Zhuo
  2026-05-18  2:48     ` Xuan Zhuo
  0 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  2:47 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

From sashiko:


commit cffb539ce8b8ad9935febc105007a1405d95169b
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
eea: implement packet transmit logic
This commit fills in the previously stubbed eea_tx_xmit() and
eea_poll_tx() with the full TX path: building descriptors from skbs,
handling DMA mapping/unmapping, GSO and checksum offload setup, and
completion processing.
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_tx.c b/drivers/net/ethernet/alibaba/eea/eea_tx.c
> index 5a8babfce5de..8b7fc75ead8a 100644
> --- a/drivers/net/ethernet/alibaba/eea/eea_tx.c
> +++ b/drivers/net/ethernet/alibaba/eea/eea_tx.c
[ ... ]
> +static int eea_fill_desc_from_skb(const struct sk_buff *skb,
> +				  struct eea_tx_desc *desc)
> +{
> +	if (skb_is_gso(skb)) {
> +		struct skb_shared_info *sinfo = skb_shinfo(skb);
> +
> +		desc->gso_size = cpu_to_le16(sinfo->gso_size);
> +		if (sinfo->gso_type & SKB_GSO_TCPV4)
> +			desc->gso_type = EEA_TX_GSO_TCPV4;
> +
> +		else if (sinfo->gso_type & SKB_GSO_TCPV6)
> +			desc->gso_type = EEA_TX_GSO_TCPV6;
> +
> +		else if (sinfo->gso_type & SKB_GSO_UDP_L4)
> +			desc->gso_type = EEA_TX_GSO_UDP_L4;
> +
> +		else
> +			return -EINVAL;
> +
> +		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> +			desc->gso_type |= EEA_TX_GSO_ECN;
> +	} else {
> +		desc->gso_type = EEA_TX_GSO_NONE;
> +	}
> +
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		desc->csum_start = cpu_to_le16(skb_checksum_start_offset(skb));
> +		desc->csum_offset = cpu_to_le16(skb->csum_offset);
> +	}
> +
> +	return 0;
> +}
Can stale fields in a reused descriptor slot be sent to the device?
The SQ ring slots are reused as the ring wraps. Several eea_tx_desc
fields are only conditionally written:
  - csum_start and csum_offset are only set when
    skb->ip_summed == CHECKSUM_PARTIAL
  - gso_size is only set when skb_is_gso(skb)
  - gso_type is set here for the head descriptor, but
    eea_tx_add_skb_frag() and the >USHRT_MAX split path in
    eea_tx_desc_fill() never write gso_type, gso_size, csum_start
    or csum_offset on continuation descriptors
  - reserved1, reserved2 and reserved3 are never written
After the first ring wrap each reused slot will carry whatever values
the previous occupant wrote. Does correctness here rely on the device
gating each optional field on EEA_DESC_F_DO_CSUM and on the
"more descriptors" flag, and ignoring all reserved bits?
If so, would it be worth either zeroing the descriptor at fill time
(for example a memset before populating it) or adding a comment that
documents the hardware contract being relied on?
What is the silent drop signal when an unsupported gso_type appears?
When skb_is_gso(skb) is true but the type is none of TCPV4, TCPV6 or
UDP_L4, this returns -EINVAL, which propagates up through
eea_tx_post_skb() into err_cancel and ultimately to
dev_kfree_skb_any() in eea_tx_xmit(). There is no kfree_skb_reason(),
no per-queue counter and no log message.
Today the four supported types match what is advertised in
hw_features and features, so this branch should be unreachable. If a
future feature flag is advertised without updating this switch (for
example a tunnel GSO type), would the resulting drops be observable
from userspace at all?
Would kfree_skb_reason() with a specific drop reason, or restructuring
so the unsupported case is structurally unreachable, make this easier
to debug?
> +static int eea_tx_check_free_num(struct eea_net_tx *tx,
> +				 struct netdev_queue *txq)
> +{
> +	int n;
> +
> +	/* MAX_SKB_FRAGS + 1: Covers the skb linear head and all paged fragments
> +	 * 1: Extra slot for a head or fragment that exceeds 64KB.
> +	 */
> +	n = MAX_SKB_FRAGS + 2;
> +	return netif_txq_maybe_stop(txq, tx->ering->num_free, n, n);
>  }
>
>  netdev_tx_t eea_tx_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
> -	/* Empty function; will be implemented in a subsequent commit. */
> -	dev_kfree_skb_any(skb);
> +	struct eea_net *enet = netdev_priv(netdev);
> +	int qnum = skb_get_queue_mapping(skb);
> +	struct eea_net_tx *tx = &enet->tx[qnum];
> +	struct netdev_queue *txq;
> +	int err, enable;
> +
> +	txq = netdev_get_tx_queue(netdev, qnum);
> +
> +	enable = eea_tx_check_free_num(tx, txq);
> +	if (!enable)
> +		return NETDEV_TX_BUSY;
> +
> +	err = eea_tx_post_skb(tx, skb);
> +	if (unlikely(err)) {
> +		dev_kfree_skb_any(skb);
> +	} else {
> +		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
> +			skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> +		skb_tx_timestamp(skb);
> +	}
> +
> +	/* NETDEV_TX_BUSY is expensive. So stop advancing the TX queue. */
> +	eea_tx_check_free_num(tx, txq);
> +
> +	if (!netdev_xmit_more() || netif_xmit_stopped(txq))
> +		eea_tx_kick(tx);
> +
>  	return NETDEV_TX_OK;
>  }
Is BQL intended to be wired up here?
eea_tx_check_free_num() uses netif_txq_maybe_stop(), which is the
BQL-aware helper, but the xmit path never calls
netdev_tx_sent_queue() and the completion path in eea_clean_tx() /
eea_poll_tx() never calls netdev_tx_completed_queue(). Without the
paired byte updates, BQL has no view of in-flight bytes.
Also, the stop and start thresholds passed to netif_txq_maybe_stop()
are both MAX_SKB_FRAGS + 2, so there is no hysteresis between
queue-stop and queue-wake. eea_poll_tx() uses the same threshold for
its wake check:
	if (netif_tx_queue_stopped(txq) &&
	    tx->ering->num_free >= MAX_SKB_FRAGS + 2)
		netif_tx_wake_queue(txq);
Was a stop/start gap (e.g. start at 2 * stop) considered to dampen
wake/stop oscillation under sustained pressure?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 6/8] eea: implement packet transmit logic
  2026-05-18  2:47   ` Xuan Zhuo
@ 2026-05-18  2:48     ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  2:48 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

On Mon, 18 May 2026 10:47:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> From sashiko:
>
>
> commit cffb539ce8b8ad9935febc105007a1405d95169b
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> eea: implement packet transmit logic
> This commit fills in the previously stubbed eea_tx_xmit() and
> eea_poll_tx() with the full TX path: building descriptors from skbs,
> handling DMA mapping/unmapping, GSO and checksum offload setup, and
> completion processing.
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_tx.c b/drivers/net/ethernet/alibaba/eea/eea_tx.c
> > index 5a8babfce5de..8b7fc75ead8a 100644
> > --- a/drivers/net/ethernet/alibaba/eea/eea_tx.c
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_tx.c
> [ ... ]
> > +static int eea_fill_desc_from_skb(const struct sk_buff *skb,
> > +				  struct eea_tx_desc *desc)
> > +{
> > +	if (skb_is_gso(skb)) {
> > +		struct skb_shared_info *sinfo = skb_shinfo(skb);
> > +
> > +		desc->gso_size = cpu_to_le16(sinfo->gso_size);
> > +		if (sinfo->gso_type & SKB_GSO_TCPV4)
> > +			desc->gso_type = EEA_TX_GSO_TCPV4;
> > +
> > +		else if (sinfo->gso_type & SKB_GSO_TCPV6)
> > +			desc->gso_type = EEA_TX_GSO_TCPV6;
> > +
> > +		else if (sinfo->gso_type & SKB_GSO_UDP_L4)
> > +			desc->gso_type = EEA_TX_GSO_UDP_L4;
> > +
> > +		else
> > +			return -EINVAL;
> > +
> > +		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> > +			desc->gso_type |= EEA_TX_GSO_ECN;
> > +	} else {
> > +		desc->gso_type = EEA_TX_GSO_NONE;
> > +	}
> > +
> > +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> > +		desc->csum_start = cpu_to_le16(skb_checksum_start_offset(skb));
> > +		desc->csum_offset = cpu_to_le16(skb->csum_offset);
> > +	}
> > +
> > +	return 0;
> > +}
> Can stale fields in a reused descriptor slot be sent to the device?
> The SQ ring slots are reused as the ring wraps. Several eea_tx_desc
> fields are only conditionally written:
>   - csum_start and csum_offset are only set when
>     skb->ip_summed == CHECKSUM_PARTIAL
>   - gso_size is only set when skb_is_gso(skb)
>   - gso_type is set here for the head descriptor, but
>     eea_tx_add_skb_frag() and the >USHRT_MAX split path in
>     eea_tx_desc_fill() never write gso_type, gso_size, csum_start
>     or csum_offset on continuation descriptors
>   - reserved1, reserved2 and reserved3 are never written
> After the first ring wrap each reused slot will carry whatever values
> the previous occupant wrote. Does correctness here rely on the device
> gating each optional field on EEA_DESC_F_DO_CSUM and on the
> "more descriptors" flag, and ignoring all reserved bits?
> If so, would it be worth either zeroing the descriptor at fill time
> (for example a memset before populating it) or adding a comment that
> documents the hardware contract being relied on?

This isn't an issue; the device parses the flags first to determine which fields
to process.
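
For reference, the alternative raised in the review (clearing the slot before
filling it) would look roughly like the sketch below. The layout of
struct tx_desc_model and the DESC_F_DO_CSUM value are hypothetical stand-ins,
not the real struct eea_tx_desc or EEA_DESC_F_DO_CSUM:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DESC_F_DO_CSUM 0x1	/* hypothetical flag value */

/* Hypothetical descriptor layout, for illustration only. */
struct tx_desc_model {
	uint16_t flags;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

static void fill_desc(struct tx_desc_model *desc, bool csum,
		      uint16_t csum_start, uint16_t csum_offset)
{
	assert(desc);

	/* Clear whatever the previous occupant of this ring slot left. */
	memset(desc, 0, sizeof(*desc));

	/* Optional fields are written only when their flag is set. */
	if (csum) {
		desc->flags |= DESC_F_DO_CSUM;
		desc->csum_start = csum_start;
		desc->csum_offset = csum_offset;
	}
}
```

Since the device gates each field on the flags, the memset is not required
for correctness; it only trades a few stores per descriptor for not having to
rely on that contract.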

> What is the silent drop signal when an unsupported gso_type appears?
> When skb_is_gso(skb) is true but the type is none of TCPV4, TCPV6 or
> UDP_L4, this returns -EINVAL, which propagates up through
> eea_tx_post_skb() into err_cancel and ultimately to
> dev_kfree_skb_any() in eea_tx_xmit(). There is no kfree_skb_reason(),
> no per-queue counter and no log message.
> Today the four supported types match what is advertised in
> hw_features and features, so this branch should be unreachable. If a


Yes, this branch should be unreachable, so the function simply returns the
error directly.


> future feature flag is advertised without updating this switch (for
> example a tunnel GSO type), would the resulting drops be observable
> from userspace at all?
> Would kfree_skb_reason() with a specific drop reason, or restructuring
> so the unsupported case is structurally unreachable, make this easier
> to debug?
> > +static int eea_tx_check_free_num(struct eea_net_tx *tx,
> > +				 struct netdev_queue *txq)
> > +{
> > +	int n;
> > +
> > +	/* MAX_SKB_FRAGS + 1: Covers the skb linear head and all paged fragments
> > +	 * 1: Extra slot for a head or fragment that exceeds 64KB.
> > +	 */
> > +	n = MAX_SKB_FRAGS + 2;
> > +	return netif_txq_maybe_stop(txq, tx->ering->num_free, n, n);
> >  }
> >
> >  netdev_tx_t eea_tx_xmit(struct sk_buff *skb, struct net_device *netdev)
> >  {
> > -	/* Empty function; will be implemented in a subsequent commit. */
> > -	dev_kfree_skb_any(skb);
> > +	struct eea_net *enet = netdev_priv(netdev);
> > +	int qnum = skb_get_queue_mapping(skb);
> > +	struct eea_net_tx *tx = &enet->tx[qnum];
> > +	struct netdev_queue *txq;
> > +	int err, enable;
> > +
> > +	txq = netdev_get_tx_queue(netdev, qnum);
> > +
> > +	enable = eea_tx_check_free_num(tx, txq);
> > +	if (!enable)
> > +		return NETDEV_TX_BUSY;
> > +
> > +	err = eea_tx_post_skb(tx, skb);
> > +	if (unlikely(err)) {
> > +		dev_kfree_skb_any(skb);
> > +	} else {
> > +		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
> > +			skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> > +		skb_tx_timestamp(skb);
> > +	}
> > +
> > +	/* NETDEV_TX_BUSY is expensive. So stop advancing the TX queue. */
> > +	eea_tx_check_free_num(tx, txq);
> > +
> > +	if (!netdev_xmit_more() || netif_xmit_stopped(txq))
> > +		eea_tx_kick(tx);
> > +
> >  	return NETDEV_TX_OK;
> >  }
> Is BQL intended to be wired up here?
> eea_tx_check_free_num() uses netif_txq_maybe_stop(), which is the
> BQL-aware helper, but the xmit path never calls
> netdev_tx_sent_queue() and the completion path in eea_clean_tx() /
> eea_poll_tx() never calls netdev_tx_completed_queue(). Without the
> paired byte updates, BQL has no view of in-flight bytes.
> Also, the stop and start thresholds passed to netif_txq_maybe_stop()
> are both MAX_SKB_FRAGS + 2, so there is no hysteresis between
> queue-stop and queue-wake. eea_poll_tx() uses the same threshold for
> its wake check:
> 	if (netif_tx_queue_stopped(txq) &&
> 	    tx->ering->num_free >= MAX_SKB_FRAGS + 2)
> 		netif_tx_wake_queue(txq);
> Was a stop/start gap (e.g. start at 2 * stop) considered to dampen
> wake/stop oscillation under sustained pressure?

BQL is not supported in this version.
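
For reference, the stop/start gap mentioned in the review can be illustrated
with the userspace model below. struct txq_model and txq_update() are
hypothetical; the real driver would keep using netif_txq_maybe_stop() and the
wake check in eea_poll_tx(), only with different threshold arguments:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a TX queue with separate stop/start thresholds. */
struct txq_model {
	bool stopped;
	unsigned int stop_thrs;		/* stop when free slots drop below this */
	unsigned int start_thrs;	/* wake only once this many are free */
};

/* Update the queue state for the current number of free slots. */
static void txq_update(struct txq_model *q, unsigned int num_free)
{
	assert(q->start_thrs >= q->stop_thrs);

	if (!q->stopped && num_free < q->stop_thrs)
		q->stopped = true;
	else if (q->stopped && num_free >= q->start_thrs)
		q->stopped = false;
}
```

With start_thrs equal to stop_thrs (as in this version) a single completed
descriptor can wake the queue again; a larger start_thrs delays the wake until
more room is available.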

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 7/8] eea: introduce ethtool support
  2026-05-14  9:51 ` [PATCH net-next v43 7/8] eea: introduce ethtool support Xuan Zhuo
@ 2026-05-18  2:56   ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  2:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, Andrew Lunn, netdev


No regressions from Sashiko.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures
  2026-05-14  9:51 ` [PATCH net-next v43 2/8] eea: introduce ring and descriptor structures Xuan Zhuo
@ 2026-05-18  2:57   ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  2:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

No regressions from Sashiko.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev
  2026-05-14  9:51 ` [PATCH net-next v43 8/8] eea: introduce callback for ndo_get_stats64 and register netdev Xuan Zhuo
@ 2026-05-18  3:01   ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  3:01 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

https://idealab.alibaba-inc.com/ideaTalk#/ideaTALK/chat?sessionId=252ca245-ee4a-45d8-a859-d20100fe3f91

Sashiko commented on the linear growth in the stats counters.

However, I plan to introduce qstats (queue-specific statistics) in a future
update, which will be a more appropriate place for such tracking. I intend
to defer this feature until that implementation; it will not be included in
the current version.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v43 1/8] eea: introduce PCI framework
  2026-05-14  9:51 ` [PATCH net-next v43 1/8] eea: introduce PCI framework Xuan Zhuo
@ 2026-05-18  3:06   ` Xuan Zhuo
  2026-05-18  3:07     ` Xuan Zhuo
  0 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  3:06 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev


From https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260514095138.80680-1-xuanzhuo%40linux.alibaba.com

commit 46941a5c3a95ebc5ea79193eb1c2ceab7183da1d
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
eea: introduce PCI framework
This patch adds a basic PCI driver framework for the Alibaba Elastic
Ethernet Adapter (EEA), implementing PCI probe, remove, shutdown, the
device-status handshake helpers, and an MMIO config layout.
A few questions inline below.
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> new file mode 100644
> index 000000000000..65a0ceb73b35
> --- /dev/null
> +++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> @@ -0,0 +1,487 @@
[ ... ]
> +int eea_device_reset(struct eea_device *edev)
> +{
> +	struct eea_pci_device *ep_dev = edev->ep_dev;
> +	int err;
> +	u8 val;
> +
> +	eea_pci_io_set_status(edev, 0);
> +
> +	/* We are no longer waiting for device ack during the shutdown flow. */
> +	if (ep_dev->shutdown)
> +		return 0;
> +
> +	/* A longer timeout is set here to handle edge cases, though it should
> +	 * return promptly in most scenarios.
> +	 *
> +	 * In our case, all replies are handled by the DPU software, so there is
> +	 * no race condition between the hardware processes and the register.
> +	 */
> +	err = read_poll_timeout(cfg_read8, val, (!val || val == 0xFF), 20,
> +				EEA_RESET_TIMEOUT_US,
> +				false, ep_dev->reg, device_status);
On the shutdown path, ep_dev->shutdown is set to true and eea_device_reset()
then returns immediately after writing device_status = 0, skipping the
read_poll_timeout() that would otherwise wait for the device to acknowledge
the reset:
eea_pci_shutdown():
    ep_dev->shutdown = true;
    /* do net device stop and clear. */
    eea_device_reset(edev);
    pci_clear_master(pci_dev);
After that, eea_pci_shutdown() calls pci_clear_master(), which only clears
the bus-master enable bit and does not drain in-flight or posted DMA writes.
Without a flush read of an MMIO register from this function or a synchronous
device-side ack, is there a guarantee that the device has stopped issuing
DMA before pci_clear_master() runs?
On kexec or warm-reboot, can in-flight DMA after eea_pci_shutdown() returns
land in pages that the next kernel has reused?
> +
> +	/* Surprise PCIe Removal */
> +	if (val == 0xFF)
> +		return -EINVAL;
> +
> +	return err;
> +}
> +
> +int eea_pci_set_aq_up(struct eea_device *edev)
> +{
> +	struct eea_pci_device *ep_dev = edev->ep_dev;
> +	u8 status = eea_pci_io_get_status(edev);
> +	int err;
> +	u8 val;
> +
> +	eea_pci_io_set_status(edev, status | EEA_S_OK);
> +
> +	/* A longer timeout is set here to handle edge cases, though it should
> +	 * return promptly in most scenarios.
> +	 *
> +	 * In our case, all replies are handled by the DPU software, so there is
> +	 * no race condition between the hardware processes and the register.
> +	 */
> +	err = read_poll_timeout(cfg_read8, val,
> +				val & (EEA_S_OK | EEA_S_FAILED),
> +				20, EEA_RESET_TIMEOUT_US,
> +				false, ep_dev->reg, device_status);
> +
> +	/* Surprise PCIe Removal */
> +	if (val == 0xFF)
> +		return -EINVAL;
> +
> +	/* device fail */
> +	if (val & EEA_S_FAILED)
> +		return -EINVAL;
> +
> +	return err;
> +}
Can this handshake actually observe device acknowledgment?
The driver writes status | EEA_S_OK to device_status, then read_poll_timeout()
waits for the readback to satisfy (val & (EEA_S_OK | EEA_S_FAILED)).
read_poll_timeout() with sleep_before_read = false performs the op() once
before sleeping, so the very first cfg_read8() of device_status will return
the value that was just written, which already has EEA_S_OK set. The
condition is met on iteration zero, the 60-second EEA_RESET_TIMEOUT_US never
elapses, and EEA_S_FAILED never gets a chance to be observed.
For the device to ever indicate failure, would it need to first clear
EEA_S_OK and then either re-assert EEA_S_OK on success or set EEA_S_FAILED
on failure? Or use a separate ack/done bit that is distinct from the
driver-written request bit? As written, the EEA_S_FAILED check below the
poll appears unreachable on a healthy bus.
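To illustrate the point, here is a userspace model of the poll condition. The
bit values of S_OK, S_FAILED and S_ACK are hypothetical (S_ACK is the separate
done bit suggested above and does not exist in this series), and the "device"
is simulated as a plain variable that never reacts on its own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define S_OK		0x04	/* written by the driver as the request */
#define S_FAILED	0x80	/* set by the device on failure */
#define S_ACK		0x40	/* hypothetical: set only by the device */

/* Poll the status up to max_polls times; true once cond_mask matches. */
static bool poll_status(const volatile uint8_t *status, uint8_t cond_mask,
			unsigned int max_polls)
{
	unsigned int i;

	assert(status);

	for (i = 0; i < max_polls; i++) {
		if (*status & cond_mask)
			return true;
	}

	return false;
}
```

Polling for S_OK | S_FAILED right after writing S_OK succeeds on the first
read without any device involvement, while polling for S_ACK | S_FAILED only
succeeds once the device side has actually updated the register.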
> +
> +static int eea_negotiate(struct eea_device *edev)
> +{
> +	struct eea_pci_device *ep_dev;
> +	u32 status;
> +
> +	ep_dev = edev->ep_dev;
> +
> +	edev->features = 0;
> +
> +	cfg_write32(ep_dev->reg, drv_f_idx, 0);
> +	cfg_write32(ep_dev->reg, drv_f, lower_32_bits(edev->features));
> +	cfg_write32(ep_dev->reg, drv_f_idx, 1);
> +	cfg_write32(ep_dev->reg, drv_f, upper_32_bits(edev->features));
> +
> +	eea_add_status(edev, EEA_S_FEATURE_DONE);
> +	status = eea_pci_io_get_status(edev);
> +
> +	/* Surprise PCIe Removal */
> +	if (status == 0xFF)
> +		return -EINVAL;
> +
> +	if (!(status & EEA_S_FEATURE_DONE))
> +		return -ENODEV;
> +
> +	return 0;
> +}
Is this status check tautological?
eea_add_status() reads device_status, ORs in EEA_S_FEATURE_DONE, and writes
it back. The very next line reads device_status again, so unless the device
clears the bit before the readback (which there is no synchronisation for),
the readback will always have EEA_S_FEATURE_DONE set, except in the surprise
removal case where the value is 0xFF.
Can this code detect a device that legitimately rejects the offered
features, given there is no poll, delay, or wait for the device to react?
The asymmetry with eea_pci_set_aq_up(), which does call read_poll_timeout()
albeit with the issue noted above, suggests the protocol contract for these
two transitions is different. Could the expected device-side behaviour for
EEA_S_FEATURE_DONE be spelled out so reviewers can check the handshake?
> +
[ ... ]
> +static int eea_pci_setup(struct pci_dev *pci_dev, struct eea_pci_device *ep_dev)
> +{
> +	int err, n, ret, len;
> +
> +	ep_dev->pci_dev = pci_dev;
> +
> +	err = pci_enable_device(pci_dev);
> +	if (err)
> +		return err;
> +
> +	err = pci_request_regions(pci_dev, "EEA");
> +	if (err)
> +		goto err_disable_dev;
[ ... ]
> +	ep_dev->reg = pci_iomap(pci_dev, 0, 0);
> +	if (!ep_dev->reg) {
> +		dev_err(&pci_dev->dev, "Failed to map pci bar!\n");
> +		err = -ENOMEM;
> +		goto err_release_regions;
> +	}
> +
> +	err = eea_device_reset(&ep_dev->edev);
> +	if (err) {
> +		dev_err(&pci_dev->dev, "Failed to reset device for setup!\n");
> +		goto err_unmap_reg;
> +	}
[ ... ]
> +	ep_dev->edev.rx_num = cfg_read32(ep_dev->reg, rx_num_max);
> +	ep_dev->edev.tx_num = cfg_read32(ep_dev->reg, tx_num_max);
> +
> +	if (ep_dev->edev.rx_num > EEA_PCI_Q_MAX_NUM ||
> +	    ep_dev->edev.tx_num > EEA_PCI_Q_MAX_NUM) {
> +		dev_err(&pci_dev->dev, "Invalid queue num %u %u\n",
> +			ep_dev->edev.rx_num,
> +			ep_dev->edev.tx_num);
> +		err = -EINVAL;
> +		goto err_clear_master;
> +	}
[ ... ]
> +	/* In our design, the number of hardware interrupts matches the maximum
> +	 * number of queues. If pci_alloc_irq_vectors failed, return directly.
> +	 *
> +	 * 2: adminq, error handle
> +	 */
> +	n = ep_dev->edev.rx_num + 2;
> +	ret = pci_alloc_irq_vectors(ep_dev->pci_dev, n, n, PCI_IRQ_MSIX);
> +	if (ret != n) {
> +		err = ret;
> +		goto err_clear_master;
> +	}
The vector count uses rx_num only, but tx_num is also read from the device
and validated independently against EEA_PCI_Q_MAX_NUM. If a device ever
advertises tx_num > rx_num, would later patches that bind a TX-queue IRQ at
an index above rx_num + 1 fail because no vector was allocated?
Should this be max(rx_num, tx_num) + 2, or should there be an explicit
invariant rejecting tx_num > rx_num here so that the design assumption from
the comment is enforced?
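A minimal sketch of the two options, with hypothetical helper names
(eea_vec_count() and eea_queue_nums_valid() are not in the patch, and
EXTRA_VECS just names the "+ 2" from the comment):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXTRA_VECS 2u	/* adminq + error handling, per the patch comment */

/* Option 1: size the vector table for the larger of the two queue counts. */
static uint32_t eea_vec_count(uint32_t rx_num, uint32_t tx_num)
{
	uint32_t qmax = rx_num > tx_num ? rx_num : tx_num;

	/* Guard against overflow of the sum. */
	assert(qmax <= UINT32_MAX - EXTRA_VECS);

	return qmax + EXTRA_VECS;
}

/* Option 2: enforce the design assumption and reject tx_num > rx_num. */
static bool eea_queue_nums_valid(uint32_t rx_num, uint32_t tx_num)
{
	return tx_num <= rx_num;
}
```

Either form would make the relationship between the queue counts and the
allocated vectors explicit at the point where the vectors are requested.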
> +
> +	ep_dev->msix_vec_n = ret;
> +
> +	return 0;
[ ... ]
> +static int eea_init_device(struct eea_device *edev)
> +{
> +	int err;
> +
> +	err = eea_device_reset(edev);
> +	if (err)
> +		return err;
> +
> +	eea_pci_io_set_status(edev, EEA_S_INIT);
> +
> +	err = eea_negotiate(edev);
> +	if (err)
> +		goto err;
> +
> +	/* do net device probe ... */
> +
> +	return 0;
> +err:
> +	eea_add_status(edev, EEA_S_FAILED);
> +	return err;
> +}
> +
> +static int __eea_pci_probe(struct pci_dev *pci_dev,
> +			   struct eea_pci_device *ep_dev)
> +{
> +	int err;
> +
> +	pci_set_drvdata(pci_dev, ep_dev);
> +
> +	err = eea_pci_setup(pci_dev, ep_dev);
> +	if (err)
> +		return err;
> +
> +	err = eea_init_device(&ep_dev->edev);
> +	if (err)
> +		goto err_pci_rel;
> +
> +	return 0;
The probe path calls eea_device_reset() twice in succession:
__eea_pci_probe()
  -> eea_pci_setup()
       -> eea_device_reset()  /* first reset */
  -> eea_init_device()
       -> eea_device_reset()  /* second reset */
EEA_RESET_TIMEOUT_US is 60 seconds, so a misbehaving or slow device can
stall probe for up to 120 seconds. Is the second reset intentional, and if
so, could the rationale be added as a comment? Otherwise, would it make
sense to drop one of them so that probe has only one reset point?
> +
> +err_pci_rel:
> +	eea_pci_release_resource(ep_dev);
> +	return err;
> +}
[ ... ]
> diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.h b/drivers/net/ethernet/alibaba/eea/eea_pci.h
> new file mode 100644
> index 000000000000..746cce4dd68e
> --- /dev/null
> +++ b/drivers/net/ethernet/alibaba/eea/eea_pci.h
> @@ -0,0 +1,50 @@
[ ... ]
> +struct eea_pci_cap {
> +	__u8 cap_vndr;
> +	__u8 cap_next;
> +	__u8 cap_len;
> +	__u8 cfg_type;
> +};
> +
> +struct eea_pci_reset_reg {
> +	struct eea_pci_cap cap;
> +	__le16 driver;
> +	__le16 device;
> +};
> +
> +struct eea_pci_device;
> +
> +struct eea_device {
> +	struct eea_pci_device *ep_dev;
> +	struct device         *dma_dev;
> +	struct eea_net        *enet;
> +
> +	u64 features;
> +
> +	u32 rx_num;
> +	u32 tx_num;
> +	u32 db_blk_size;
> +};
> +
> +const char *eea_pci_name(struct eea_device *edev);
> +int eea_pci_domain_nr(struct eea_device *edev);
> +u16 eea_pci_bdf(struct eea_device *edev);
> +
> +int eea_device_reset(struct eea_device *dev);
> +int eea_pci_set_aq_up(struct eea_device *dev);
> +
> +u64 eea_pci_device_ts(struct eea_device *edev);
> +
> +void __iomem *eea_pci_db_addr(struct eea_device *edev, u32 off);
> +#endif
A number of symbols introduced here have no callers in this commit:
  eea_pci_name()
  eea_pci_domain_nr()
  eea_pci_bdf()
  eea_pci_set_aq_up()
  eea_pci_db_addr()
  eea_pci_device_ts()
  struct eea_pci_cap
  struct eea_pci_reset_reg
struct eea_device also references struct eea_net via the enet field, but
struct eea_net is not declared anywhere in this commit.
Because these helpers are non-static, the compiler will not warn about
unused functions, so the contracts they establish (for example the
eea_pci_set_aq_up() handshake noted above) cannot be checked against a real
caller in this patch.
Could the series be reorganised so that each helper or type is introduced
together with its first user, per the usual kernel convention?


* Re: [PATCH net-next v43 1/8] eea: introduce PCI framework
  2026-05-18  3:06   ` Xuan Zhuo
@ 2026-05-18  3:07     ` Xuan Zhuo
  0 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2026-05-18  3:07 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo,
	Mingyu Wang, Heiner Kallweit, Dust Li, netdev

On Mon, 18 May 2026 11:06:55 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> From https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260514095138.80680-1-xuanzhuo%40linux.alibaba.com
>
> commit 46941a5c3a95ebc5ea79193eb1c2ceab7183da1d
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> eea: introduce PCI framework
> This patch adds a basic PCI driver framework for the Alibaba Elastic
> Ethernet Adapter (EEA), implementing PCI probe, remove, shutdown, the
> device-status handshake helpers, and an MMIO config layout.
> A few questions inline below.
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.c b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> > new file mode 100644
> > index 000000000000..65a0ceb73b35
> > --- /dev/null
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_pci.c
> > @@ -0,0 +1,487 @@
> [ ... ]
> > +int eea_device_reset(struct eea_device *edev)
> > +{
> > +	struct eea_pci_device *ep_dev = edev->ep_dev;
> > +	int err;
> > +	u8 val;
> > +
> > +	eea_pci_io_set_status(edev, 0);
> > +
> > +	/* We are no longer waiting for device ack during the shutdown flow. */
> > +	if (ep_dev->shutdown)
> > +		return 0;
> > +
> > +	/* A longer timeout is set here to handle edge cases, though it should
> > +	 * return promptly in most scenarios.
> > +	 *
> > +	 * In our case, all replies are handled by the DPU software, so there is
> > +	 * no race condition between the hardware processes and the register.
> > +	 */
> > +	err = read_poll_timeout(cfg_read8, val, (!val || val == 0xFF), 20,
> > +				EEA_RESET_TIMEOUT_US,
> > +				false, ep_dev->reg, device_status);
> On the shutdown path, ep_dev->shutdown is set to true and eea_device_reset()
> then returns immediately after writing device_status = 0, skipping the
> read_poll_timeout() that would otherwise wait for the device to acknowledge
> the reset:
> eea_pci_shutdown():
>     ep_dev->shutdown = true;
>     /* do net device stop and clear. */
>     eea_device_reset(edev);
>     pci_clear_master(pci_dev);
> After that, eea_pci_shutdown() calls pci_clear_master(), which only clears
> the bus-master enable bit and does not drain in-flight or posted DMA writes.
> Without a flush read of an MMIO register from this function or a synchronous
> device-side ack, is there a guarantee that the device has stopped issuing
> DMA before pci_clear_master() runs?
> On kexec or warm-reboot, can in-flight DMA after eea_pci_shutdown() returns
> land in pages that the next kernel has reused?

We prefer not to block for long during shutdown, which is why
read_poll_timeout() is skipped here. For the probe phase, DMA safety is ensured
by resetting the device before bus mastering is enabled.
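
The behaviour described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the driver code:
struct sim_reset_dev, sim_read_status() and sim_device_reset() are
hypothetical names, the device is modelled as acknowledging the reset after a
fixed number of status reads, and a read budget stands in for the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the device side; not part of the eea driver. */
struct sim_reset_dev {
	uint8_t status;         /* value returned on a status read */
	unsigned int ack_after; /* reads before the device reports 0 */
	bool removed;           /* surprise removal: reads return 0xFF */
	bool shutdown;          /* shutdown flow: do not wait for the ack */
	unsigned int reads;     /* number of status reads performed */
};

static uint8_t sim_read_status(struct sim_reset_dev *d)
{
	d->reads++;
	if (d->removed)
		return 0xFF;
	if (d->ack_after == 0)
		d->status = 0;  /* device has completed the reset */
	else
		d->ack_after--;
	return d->status;
}

/* Returns 0 on success, -1 on surprise removal, -2 on timeout. */
static int sim_device_reset(struct sim_reset_dev *d, unsigned int max_reads)
{
	uint8_t val = 0xAA; /* non-zero so an empty poll counts as a timeout */
	unsigned int i;

	assert(max_reads > 0);

	/* The reset request itself (a write of 0) is not modelled here. */
	if (d->shutdown)
		return 0;   /* no polling on the shutdown path */

	for (i = 0; i < max_reads; i++) {
		val = sim_read_status(d);
		if (!val || val == 0xFF)
			break;
	}

	if (val == 0xFF)
		return -1;
	return val ? -2 : 0;
}
```

With shutdown set, sim_device_reset() returns without a single status read,
which is the trade-off discussed above: shutdown stays short, and the reset
issued at the next probe is what is relied on to quiesce the device.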

> > +
> > +	/* Surprise PCIe Removal */
> > +	if (val == 0xFF)
> > +		return -EINVAL;
> > +
> > +	return err;
> > +}
> > +
> > +int eea_pci_set_aq_up(struct eea_device *edev)
> > +{
> > +	struct eea_pci_device *ep_dev = edev->ep_dev;
> > +	u8 status = eea_pci_io_get_status(edev);
> > +	int err;
> > +	u8 val;
> > +
> > +	eea_pci_io_set_status(edev, status | EEA_S_OK);
> > +
> > +	/* A longer timeout is set here to handle edge cases, though it should
> > +	 * return promptly in most scenarios.
> > +	 *
> > +	 * In our case, all replies are handled by the DPU software, so there is
> > +	 * no race condition between the hardware processes and the register.
> > +	 */
> > +	err = read_poll_timeout(cfg_read8, val,
> > +				val & (EEA_S_OK | EEA_S_FAILED),
> > +				20, EEA_RESET_TIMEOUT_US,
> > +				false, ep_dev->reg, device_status);
> > +
> > +	/* Surprise PCIe Removal */
> > +	if (val == 0xFF)
> > +		return -EINVAL;
> > +
> > +	/* device fail */
> > +	if (val & EEA_S_FAILED)
> > +		return -EINVAL;
> > +
> > +	return err;
> > +}
> Can this handshake actually observe device acknowledgment?
> The driver writes status | EEA_S_OK to device_status, then read_poll_timeout()
> waits for the readback to satisfy (val & (EEA_S_OK | EEA_S_FAILED)).
> read_poll_timeout() with sleep_before_read = false performs the op() once
> before sleeping, so the very first cfg_read8() of device_status will return
> the value that was just written, which already has EEA_S_OK set. The
> condition is met on iteration zero, the 60-second EEA_RESET_TIMEOUT_US never
> elapses, and EEA_S_FAILED never gets a chance to be observed.
> For the device to ever indicate failure, would it need to first clear
> EEA_S_OK and then either re-assert EEA_S_OK on success or set EEA_S_FAILED
> on failure? Or use a separate ack/done bit that is distinct from the
> driver-written request bit? As written, the EEA_S_FAILED check below the
> poll appears unreachable on a healthy bus.

This is a DPU implementation, so all PCI TLP messages are handled and
acknowledged by software. Even for PCI registers, the read operation does not
simply return the last written value.
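
As an illustration of a status register whose reads are served by device
software rather than echoing the last write, here is a minimal user-space
sketch. Everything in it is an assumption made for the example: the sim_*
names, the bit values of SIM_S_OK and SIM_S_FAILED, and the idea that the
device answers after a fixed number of reads. It is not the DPU
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_S_OK     0x08u /* hypothetical bit value */
#define SIM_S_FAILED 0x80u /* hypothetical bit value */

/* Hypothetical device model: writes are requests, reads are answers. */
struct sim_dev {
	uint8_t visible;      /* value returned to the driver on read */
	uint8_t requested;    /* last value written by the driver */
	unsigned int latency; /* reads before the device answers */
	bool will_fail;       /* whether the device rejects the request */
};

static void sim_write_status(struct sim_dev *d, uint8_t val)
{
	/* The write is only recorded; it is not visible on read yet. */
	d->requested = val;
}

static uint8_t sim_read_status(struct sim_dev *d)
{
	if (d->latency) {
		d->latency--;
		return d->visible; /* request still pending */
	}
	if (d->will_fail)
		d->visible = (uint8_t)((d->requested & ~SIM_S_OK) | SIM_S_FAILED);
	else
		d->visible = d->requested;
	return d->visible;
}

/* Returns 0 on OK, -1 on device failure, -2 on timeout. */
static int sim_set_aq_up(struct sim_dev *d, unsigned int max_reads)
{
	uint8_t val = 0;
	unsigned int i;

	assert(max_reads > 0);
	sim_write_status(d, (uint8_t)(d->visible | SIM_S_OK));

	for (i = 0; i < max_reads; i++) {
		val = sim_read_status(d);
		if (val & (SIM_S_OK | SIM_S_FAILED))
			break;
	}

	if (val & SIM_S_FAILED)
		return -1;
	if (!(val & SIM_S_OK))
		return -2;
	return 0;
}
```

In this model the first read after the write does not contain SIM_S_OK, so
the poll can observe either outcome. Whether the real device behaves this way
depends on the DPU software, as stated above.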


> > +
> > +static int eea_negotiate(struct eea_device *edev)
> > +{
> > +	struct eea_pci_device *ep_dev;
> > +	u32 status;
> > +
> > +	ep_dev = edev->ep_dev;
> > +
> > +	edev->features = 0;
> > +
> > +	cfg_write32(ep_dev->reg, drv_f_idx, 0);
> > +	cfg_write32(ep_dev->reg, drv_f, lower_32_bits(edev->features));
> > +	cfg_write32(ep_dev->reg, drv_f_idx, 1);
> > +	cfg_write32(ep_dev->reg, drv_f, upper_32_bits(edev->features));
> > +
> > +	eea_add_status(edev, EEA_S_FEATURE_DONE);
> > +	status = eea_pci_io_get_status(edev);
> > +
> > +	/* Surprise PCIe Removal */
> > +	if (status == 0xFF)
> > +		return -EINVAL;
> > +
> > +	if (!(status & EEA_S_FEATURE_DONE))
> > +		return -ENODEV;
> > +
> > +	return 0;
> > +}
> Is this status check tautological?
> eea_add_status() reads device_status, ORs in EEA_S_FEATURE_DONE, and writes
> it back. The very next line reads device_status again, so unless the device
> clears the bit before the readback (which there is no synchronisation for),
> the readback will always have EEA_S_FEATURE_DONE set, except in the surprise
> removal case where the value is 0xFF.
> Can this code detect a device that legitimately rejects the offered
> features, given there is no poll, delay, or wait for the device to react?
> The asymmetry with eea_pci_set_aq_up(), which does call read_poll_timeout()
> albeit with the issue noted above, suggests the protocol contract for these
> two transitions is different. Could the expected device-side behaviour for
> EEA_S_FEATURE_DONE be spelled out so reviewers can check the handshake?


This is just a framework; in this version there are no features to negotiate.


> > +
> [ ... ]
> > +static int eea_pci_setup(struct pci_dev *pci_dev, struct eea_pci_device *ep_dev)
> > +{
> > +	int err, n, ret, len;
> > +
> > +	ep_dev->pci_dev = pci_dev;
> > +
> > +	err = pci_enable_device(pci_dev);
> > +	if (err)
> > +		return err;
> > +
> > +	err = pci_request_regions(pci_dev, "EEA");
> > +	if (err)
> > +		goto err_disable_dev;
> [ ... ]
> > +	ep_dev->reg = pci_iomap(pci_dev, 0, 0);
> > +	if (!ep_dev->reg) {
> > +		dev_err(&pci_dev->dev, "Failed to map pci bar!\n");
> > +		err = -ENOMEM;
> > +		goto err_release_regions;
> > +	}
> > +
> > +	err = eea_device_reset(&ep_dev->edev);
> > +	if (err) {
> > +		dev_err(&pci_dev->dev, "Failed to reset device for setup!\n");
> > +		goto err_unmap_reg;
> > +	}
> [ ... ]
> > +	ep_dev->edev.rx_num = cfg_read32(ep_dev->reg, rx_num_max);
> > +	ep_dev->edev.tx_num = cfg_read32(ep_dev->reg, tx_num_max);
> > +
> > +	if (ep_dev->edev.rx_num > EEA_PCI_Q_MAX_NUM ||
> > +	    ep_dev->edev.tx_num > EEA_PCI_Q_MAX_NUM) {
> > +		dev_err(&pci_dev->dev, "Invalid queue num %u %u\n",
> > +			ep_dev->edev.rx_num,
> > +			ep_dev->edev.tx_num);
> > +		err = -EINVAL;
> > +		goto err_clear_master;
> > +	}
> [ ... ]
> > +	/* In our design, the number of hardware interrupts matches the maximum
> > +	 * number of queues. If pci_alloc_irq_vectors failed, return directly.
> > +	 *
> > +	 * 2: adminq, error handle
> > +	 */
> > +	n = ep_dev->edev.rx_num + 2;
> > +	ret = pci_alloc_irq_vectors(ep_dev->pci_dev, n, n, PCI_IRQ_MSIX);
> > +	if (ret != n) {
> > +		err = ret;
> > +		goto err_clear_master;
> > +	}
> The vector count uses rx_num only, but tx_num is also read from the device
> and validated independently against EEA_PCI_Q_MAX_NUM. If a device ever
> advertises tx_num > rx_num, would later patches that bind a TX-queue IRQ at
> an index above rx_num + 1 fail because no vector was allocated?
> Should this be max(rx_num, tx_num) + 2, or should there be an explicit
> invariant rejecting tx_num > rx_num here so that the design assumption from
> the comment is enforced?

For the current design, these issues are out of scope.
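
For reference, the two alternatives raised in the quoted question could look
roughly like the following user-space sketch. The helper names and
SIM_EXTRA_VECS are hypothetical, the queue limit is passed in as a parameter
because the value of EEA_PCI_Q_MAX_NUM is not shown in this thread, and
neither helper is part of the current design.

```c
#include <assert.h>
#include <stdint.h>

/* adminq + error handling, per the comment in the quoted code */
#define SIM_EXTRA_VECS 2u

/* Option 1: size the allocation for the larger of the two queue counts.
 * The caller is assumed to have range-checked rx_num and tx_num already.
 */
static uint32_t sim_vecs_for_max(uint32_t rx_num, uint32_t tx_num)
{
	uint32_t q = rx_num > tx_num ? rx_num : tx_num;

	return q + SIM_EXTRA_VECS;
}

/* Option 2: keep rx_num + 2, but reject tx_num > rx_num explicitly.
 * Returns the vector count, or -1 if the queue numbers are invalid.
 */
static int64_t sim_vecs_strict(uint32_t rx_num, uint32_t tx_num,
			       uint32_t q_max)
{
	assert(q_max <= UINT32_MAX - SIM_EXTRA_VECS);

	if (rx_num > q_max || tx_num > q_max || tx_num > rx_num)
		return -1;

	return (int64_t)rx_num + SIM_EXTRA_VECS;
}
```

Option 2 makes the "interrupts match the maximum number of queues" assumption
an enforced invariant, while option 1 tolerates a device that advertises more
TX than RX queues.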

> > +
> > +	ep_dev->msix_vec_n = ret;
> > +
> > +	return 0;
> [ ... ]
> > +static int eea_init_device(struct eea_device *edev)
> > +{
> > +	int err;
> > +
> > +	err = eea_device_reset(edev);
> > +	if (err)
> > +		return err;
> > +
> > +	eea_pci_io_set_status(edev, EEA_S_INIT);
> > +
> > +	err = eea_negotiate(edev);
> > +	if (err)
> > +		goto err;
> > +
> > +	/* do net device probe ... */
> > +
> > +	return 0;
> > +err:
> > +	eea_add_status(edev, EEA_S_FAILED);
> > +	return err;
> > +}
> > +
> > +static int __eea_pci_probe(struct pci_dev *pci_dev,
> > +			   struct eea_pci_device *ep_dev)
> > +{
> > +	int err;
> > +
> > +	pci_set_drvdata(pci_dev, ep_dev);
> > +
> > +	err = eea_pci_setup(pci_dev, ep_dev);
> > +	if (err)
> > +		return err;
> > +
> > +	err = eea_init_device(&ep_dev->edev);
> > +	if (err)
> > +		goto err_pci_rel;
> > +
> > +	return 0;
> The probe path calls eea_device_reset() twice in succession:
> __eea_pci_probe()
>   -> eea_pci_setup()
>        -> eea_device_reset()  /* first reset */
>   -> eea_init_device()
>        -> eea_device_reset()  /* second reset */
> EEA_RESET_TIMEOUT_US is 60 seconds, so a misbehaving or slow device can
> stall probe for up to 120 seconds. Is the second reset intentional, and if
> so, could the rationale be added as a comment? Otherwise, would it make
> sense to drop one of them so that probe has only one reset point?

The first reset is intended to clear any potential DMA remnants from scenarios
like kexec. The second reset is performed to ensure the device is in a valid
state immediately before the actual probe. While the 60s timeout is a worst-case
safeguard, the actual device response should typically complete within 1 second.


> > +
> > +err_pci_rel:
> > +	eea_pci_release_resource(ep_dev);
> > +	return err;
> > +}
> [ ... ]
> > diff --git a/drivers/net/ethernet/alibaba/eea/eea_pci.h b/drivers/net/ethernet/alibaba/eea/eea_pci.h
> > new file mode 100644
> > index 000000000000..746cce4dd68e
> > --- /dev/null
> > +++ b/drivers/net/ethernet/alibaba/eea/eea_pci.h
> > @@ -0,0 +1,50 @@
> [ ... ]
> > +struct eea_pci_cap {
> > +	__u8 cap_vndr;
> > +	__u8 cap_next;
> > +	__u8 cap_len;
> > +	__u8 cfg_type;
> > +};
> > +
> > +struct eea_pci_reset_reg {
> > +	struct eea_pci_cap cap;
> > +	__le16 driver;
> > +	__le16 device;
> > +};
> > +
> > +struct eea_pci_device;
> > +
> > +struct eea_device {
> > +	struct eea_pci_device *ep_dev;
> > +	struct device         *dma_dev;
> > +	struct eea_net        *enet;
> > +
> > +	u64 features;
> > +
> > +	u32 rx_num;
> > +	u32 tx_num;
> > +	u32 db_blk_size;
> > +};
> > +
> > +const char *eea_pci_name(struct eea_device *edev);
> > +int eea_pci_domain_nr(struct eea_device *edev);
> > +u16 eea_pci_bdf(struct eea_device *edev);
> > +
> > +int eea_device_reset(struct eea_device *dev);
> > +int eea_pci_set_aq_up(struct eea_device *dev);
> > +
> > +u64 eea_pci_device_ts(struct eea_device *edev);
> > +
> > +void __iomem *eea_pci_db_addr(struct eea_device *edev, u32 off);
> > +#endif
> A number of symbols introduced here have no callers in this commit:
>   eea_pci_name()
>   eea_pci_domain_nr()
>   eea_pci_bdf()
>   eea_pci_set_aq_up()
>   eea_pci_db_addr()
>   eea_pci_device_ts()
>   struct eea_pci_cap
>   struct eea_pci_reset_reg
> struct eea_device also references struct eea_net via the enet field, but
> struct eea_net is not declared anywhere in this commit.
> Because these helpers are non-static, the compiler will not warn about
> unused functions, so the contracts they establish (for example the
> eea_pci_set_aq_up() handshake noted above) cannot be checked against a real
> caller in this patch.
> Could the series be reorganised so that each helper or type is introduced
> together with its first user, per the usual kernel convention?


If there is a follow-up version, I will do this.


* Re: [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (8 preceding siblings ...)
  2026-05-15  6:40 ` [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
@ 2026-05-19 10:33 ` Paolo Abeni
  2026-05-19 10:40 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 25+ messages in thread
From: Paolo Abeni @ 2026-05-19 10:33 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Wen Gu, Philo Lu, Vadim Fedorenko, Dong Yibo, Mingyu Wang,
	Heiner Kallweit, Dust Li

On 5/14/26 11:51 AM, Xuan Zhuo wrote:
> Add a driver framework for EEA that will be available in the future.
> 
> This driver is currently quite minimal, implementing only fundamental
> core functionalities. Key features include: I/O queue management via
> adminq, basic PCI-layer operations, and essential RX/TX data
> communication capabilities. It also supports the creation,
> initialization, and management of network devices (netdev). Furthermore,
> the ring structures for both I/O queues and adminq have been abstracted
> into a simple, unified, and reusable library implementation,
> facilitating future extension and maintenance.

I went over all the sashiko feedback and your replies, and I think
pending concerns are better handled as follow-up as needed.

Thanks,

Paolo



* Re: [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
  2026-05-14  9:51 [PATCH net-next v43 0/8] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor Xuan Zhuo
                   ` (9 preceding siblings ...)
  2026-05-19 10:33 ` Paolo Abeni
@ 2026-05-19 10:40 ` patchwork-bot+netdevbpf
  10 siblings, 0 replies; 25+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-19 10:40 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, guwen,
	lulie, vadim.fedorenko, dong100, 25181214217, hkallweit1, dust.li

Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Thu, 14 May 2026 17:51:30 +0800 you wrote:
> Add a driver framework for EEA that will be available in the future.
> 
> This driver is currently quite minimal, implementing only fundamental
> core functionalities. Key features include: I/O queue management via
> adminq, basic PCI-layer operations, and essential RX/TX data
> communication capabilities. It also supports the creation,
> initialization, and management of network devices (netdev). Furthermore,
> the ring structures for both I/O queues and adminq have been abstracted
> into a simple, unified, and reusable library implementation,
> facilitating future extension and maintenance.
> 
> [...]

Here is the summary with links:
  - [net-next,v43,1/8] eea: introduce PCI framework
    https://git.kernel.org/netdev/net-next/c/4f68ffc45e94
  - [net-next,v43,2/8] eea: introduce ring and descriptor structures
    https://git.kernel.org/netdev/net-next/c/039ce329dfe6
  - [net-next,v43,3/8] eea: probe the netdevice and create adminq
    https://git.kernel.org/netdev/net-next/c/4c45a51e5bf5
  - [net-next,v43,4/8] eea: create/destroy rx,tx queues for netdevice open and stop
    https://git.kernel.org/netdev/net-next/c/aa8bca4cf200
  - [net-next,v43,5/8] eea: implement packet receive logic
    https://git.kernel.org/netdev/net-next/c/df9cad6baf43
  - [net-next,v43,6/8] eea: implement packet transmit logic
    https://git.kernel.org/netdev/net-next/c/aef5055e9764
  - [net-next,v43,7/8] eea: introduce ethtool support
    https://git.kernel.org/netdev/net-next/c/5f4f7bc0ed11
  - [net-next,v43,8/8] eea: introduce callback for ndo_get_stats64 and register netdev
    https://git.kernel.org/netdev/net-next/c/4e88fb3234c8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


