netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net-next v1 0/3] bnxt: implement queue api
@ 2024-04-30  1:07 David Wei
  2024-04-30  1:07 ` [RFC PATCH net-next v1 1/3] queue_api: define " David Wei
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: David Wei @ 2024-04-30  1:07 UTC (permalink / raw)
  To: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek, Mina Almasry,
	Shailend Chand
  Cc: Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni

Both TCP devmem and io_uring ZC Rx need a way to change the page pool
for an Rx queue and reset it. As discussed in [1], until netdev core
takes a greater role in owning queue mem and ndo_open()/stop() calls
the queue API to alloc/free queue mem, we will have the driver
allocate queue mem in netdev_queue_mgmt_ops.

Rather than keeping the queue restart function in netmem, move it to
netdev core in netdev_rx_queue.h, since io_uring will also need to call
this as well. In the future, we'll have another API function that
changes the page pool memory provider for a given queue, then restarts
it.

The bnxt implementation is minimal for now, with
ndo_queue_mem_alloc()/free() doing nothing. Therefore queue mem is
allocated after the queue has been stopped, instead of before.
Implementing this properly for bnxt is more complex so before spending
that time I would like to get some feedback first on the viability of
this patchset.

The ndo_queue_stop()/start() steps are basically the same as
bnxt_rx_ring_reset(). It is done outside of sp_task, since the caller
netdev_rx_queue_restart() is in the task context already, with rtnl_lock
taken.

[1]: https://lore.kernel.org/netdev/20240419202535.5c5097fe@kernel.org/

David Wei (2):
  bnxt: implement queue api
  netdev: add netdev_rx_queue_restart()

Mina Almasry (1):
  queue_api: define queue api

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 62 +++++++++++++++++++++++
 include/linux/netdevice.h                 |  3 ++
 include/net/netdev_queues.h               | 27 ++++++++++
 include/net/netdev_rx_queue.h             |  3 ++
 net/core/Makefile                         |  1 +
 net/core/netdev_rx_queue.c                | 58 +++++++++++++++++++++
 6 files changed, 154 insertions(+)
 create mode 100644 net/core/netdev_rx_queue.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next v1 1/3] queue_api: define queue api
  2024-04-30  1:07 [RFC PATCH net-next v1 0/3] bnxt: implement queue api David Wei
@ 2024-04-30  1:07 ` David Wei
  2024-04-30 18:00   ` Mina Almasry
  2024-04-30  1:07 ` [RFC PATCH net-next v1 2/3] bnxt: implement " David Wei
  2024-04-30  1:07 ` [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart() David Wei
  2 siblings, 1 reply; 15+ messages in thread
From: David Wei @ 2024-04-30  1:07 UTC (permalink / raw)
  To: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek, Mina Almasry,
	Shailend Chand
  Cc: Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni

From: Mina Almasry <almasrymina@google.com>

This API enables the net stack to reset the queues used for devmem TCP.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/linux/netdevice.h   |  3 +++
 include/net/netdev_queues.h | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f849e7d110ed..6a58ec73c5e8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1957,6 +1957,7 @@ enum netdev_reg_state {
  *	@sysfs_rx_queue_group:	Space for optional per-rx queue attributes
  *	@rtnl_link_ops:	Rtnl_link_ops
  *	@stat_ops:	Optional ops for queue-aware statistics
+ *	@queue_mgmt_ops:	Optional ops for queue management
  *
  *	@gso_max_size:	Maximum size of generic segmentation offload
  *	@tso_max_size:	Device (as in HW) limit on the max TSO request size
@@ -2340,6 +2341,8 @@ struct net_device {
 
 	const struct netdev_stat_ops *stat_ops;
 
+	const struct netdev_queue_mgmt_ops *queue_mgmt_ops;
+
 	/* for setting kernel sock attribute on TCP connection setup */
 #define GSO_MAX_SEGS		65535u
 #define GSO_LEGACY_MAX_SIZE	65536u
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 1ec408585373..337df0860ae6 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -60,6 +60,33 @@ struct netdev_stat_ops {
 			       struct netdev_queue_stats_tx *tx);
 };
 
+/**
+ * struct netdev_queue_mgmt_ops - netdev ops for queue management
+ *
+ * @ndo_queue_mem_alloc: Allocate memory for an RX queue. The memory returned
+ *			 in the form of a void* can be passed to
+ *			 ndo_queue_mem_free() for freeing or to ndo_queue_start
+ *			 to create an RX queue with this memory.
+ *
+ * @ndo_queue_mem_free:	Free memory from an RX queue.
+ *
+ * @ndo_queue_start:	Start an RX queue at the specified index.
+ *
+ * @ndo_queue_stop:	Stop the RX queue at the specified index.
+ */
+struct netdev_queue_mgmt_ops {
+	void *			(*ndo_queue_mem_alloc)(struct net_device *dev,
+						       int idx);
+	void			(*ndo_queue_mem_free)(struct net_device *dev,
+						      void *queue_mem);
+	int			(*ndo_queue_start)(struct net_device *dev,
+						   int idx,
+						   void *queue_mem);
+	int			(*ndo_queue_stop)(struct net_device *dev,
+						  int idx,
+						  void **out_queue_mem);
+};
+
 /**
  * DOC: Lockless queue stopping / waking helpers.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next v1 2/3] bnxt: implement queue api
  2024-04-30  1:07 [RFC PATCH net-next v1 0/3] bnxt: implement queue api David Wei
  2024-04-30  1:07 ` [RFC PATCH net-next v1 1/3] queue_api: define " David Wei
@ 2024-04-30  1:07 ` David Wei
  2024-04-30  1:07 ` [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart() David Wei
  2 siblings, 0 replies; 15+ messages in thread
From: David Wei @ 2024-04-30  1:07 UTC (permalink / raw)
  To: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek, Mina Almasry,
	Shailend Chand
  Cc: Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni

Implement the bare minimum queue API for bnxt. I'm essentially breaking
up the existing bnxt_rx_ring_reset() into two steps:

1. bnxt_queue_stop()
2. bnxt_queue_start()

The other two ndos are left as no-ops for now, so the queue mem is
allocated after the queue has been stopped. Doing this before stopping
the queue is a lot more work, so I'm looking for some feedback first.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 62 +++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 06b7a963bbbd..788c87271eb1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -14810,6 +14810,67 @@ static const struct netdev_stat_ops bnxt_stat_ops = {
 	.get_base_stats		= bnxt_get_base_stats,
 };
 
+static void *bnxt_queue_mem_alloc(struct net_device *dev, int idx)
+{
+	struct bnxt *bp = netdev_priv(dev);
+
+	return &bp->rx_ring[idx];
+}
+
+static void bnxt_queue_mem_free(struct net_device *dev, void *qmem)
+{
+}
+
+static int bnxt_queue_start(struct net_device *dev, int idx, void *qmem)
+{
+	struct bnxt_rx_ring_info *rxr = qmem;
+	struct bnxt *bp = netdev_priv(dev);
+
+	bnxt_alloc_one_rx_ring(bp, idx);
+
+	if (bp->flags & BNXT_FLAG_AGG_RINGS)
+		bnxt_db_write(bp, &rxr->rx_agg_db, rxr->rx_agg_prod);
+	bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
+
+	if (bp->flags & BNXT_FLAG_TPA)
+		bnxt_set_tpa(bp, true);
+
+	return 0;
+}
+
+static int bnxt_queue_stop(struct net_device *dev, int idx, void **out_qmem)
+{
+	struct bnxt *bp = netdev_priv(dev);
+	struct bnxt_rx_ring_info *rxr;
+	struct bnxt_cp_ring_info *cpr;
+	int rc;
+
+	rc = bnxt_hwrm_rx_ring_reset(bp, idx);
+	if (rc)
+		return rc;
+
+	bnxt_free_one_rx_ring_skbs(bp, idx);
+	rxr = &bp->rx_ring[idx];
+	rxr->rx_prod = 0;
+	rxr->rx_agg_prod = 0;
+	rxr->rx_sw_agg_prod = 0;
+	rxr->rx_next_cons = 0;
+
+	cpr = &rxr->bnapi->cp_ring;
+	cpr->sw_stats.rx.rx_resets++;
+
+	*out_qmem = rxr;
+
+	return 0;
+}
+
+static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
+	.ndo_queue_mem_alloc	= bnxt_queue_mem_alloc,
+	.ndo_queue_mem_free	= bnxt_queue_mem_free,
+	.ndo_queue_start	= bnxt_queue_start,
+	.ndo_queue_stop		= bnxt_queue_stop,
+};
+
 static void bnxt_remove_one(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
@@ -15275,6 +15336,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->stat_ops = &bnxt_stat_ops;
 	dev->watchdog_timeo = BNXT_TX_TIMEOUT;
 	dev->ethtool_ops = &bnxt_ethtool_ops;
+	dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
 	pci_set_drvdata(pdev, dev);
 
 	rc = bnxt_alloc_hwrm_resources(bp);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-04-30  1:07 [RFC PATCH net-next v1 0/3] bnxt: implement queue api David Wei
  2024-04-30  1:07 ` [RFC PATCH net-next v1 1/3] queue_api: define " David Wei
  2024-04-30  1:07 ` [RFC PATCH net-next v1 2/3] bnxt: implement " David Wei
@ 2024-04-30  1:07 ` David Wei
  2024-04-30 18:15   ` Mina Almasry
  2 siblings, 1 reply; 15+ messages in thread
From: David Wei @ 2024-04-30  1:07 UTC (permalink / raw)
  To: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek, Mina Almasry,
	Shailend Chand
  Cc: Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni

From: Mina Almasry <almasrymina@google.com>

Add netdev_rx_queue_restart() function to netdev_rx_queue.h. This is
taken from Mina's work in [1] with a slight modification of taking
rtnl_lock() during the queue stop and start ops.

For bnxt specifically, if the firmware doesn't support
BNXT_RST_RING_SP_EVENT, then ndo_queue_stop() returns -EOPNOTSUPP and
the whole restart fails. Unlike bnxt_rx_ring_reset(), there is no
attempt to reset the whole device.

[1]: https://lore.kernel.org/linux-kernel/20240403002053.2376017-6-almasrymina@google.com/#t

Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/netdev_rx_queue.h |  3 ++
 net/core/Makefile             |  1 +
 net/core/netdev_rx_queue.c    | 58 +++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+)
 create mode 100644 net/core/netdev_rx_queue.c

diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index aa1716fb0e53..e78ca52d67fb 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -54,4 +54,7 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
 	return index;
 }
 #endif
+
+int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
+
 #endif
diff --git a/net/core/Makefile b/net/core/Makefile
index 21d6fbc7e884..f2aa63c167a3 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
 
 obj-y += net-sysfs.o
 obj-y += hotdata.o
+obj-y += netdev_rx_queue.o
 obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
 obj-$(CONFIG_PROC_FS) += net-procfs.o
 obj-$(CONFIG_NET_PKTGEN) += pktgen.o
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
new file mode 100644
index 000000000000..9633fb36f6d1
--- /dev/null
+++ b/net/core/netdev_rx_queue.c
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/netdevice.h>
+#include <net/netdev_queues.h>
+#include <net/netdev_rx_queue.h>
+
+int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq)
+{
+	void *new_mem;
+	void *old_mem;
+	int err;
+
+	if (!dev->queue_mgmt_ops->ndo_queue_stop ||
+	    !dev->queue_mgmt_ops->ndo_queue_mem_free ||
+	    !dev->queue_mgmt_ops->ndo_queue_mem_alloc ||
+	    !dev->queue_mgmt_ops->ndo_queue_start)
+		return -EOPNOTSUPP;
+
+	new_mem = dev->queue_mgmt_ops->ndo_queue_mem_alloc(dev, rxq);
+	if (!new_mem)
+		return -ENOMEM;
+
+	rtnl_lock();
+	err = dev->queue_mgmt_ops->ndo_queue_stop(dev, rxq, &old_mem);
+	if (err)
+		goto err_free_new_mem;
+
+	err = dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, new_mem);
+	if (err)
+		goto err_start_queue;
+	rtnl_unlock();
+
+	dev->queue_mgmt_ops->ndo_queue_mem_free(dev, old_mem);
+
+	return 0;
+
+err_start_queue:
+	/* Restarting the queue with old_mem should be successful as we haven't
+	 * changed any of the queue configuration, and there is not much we can
+	 * do to recover from a failure here.
+	 *
+	 * WARN if the we fail to recover the old rx queue, and at least free
+	 * old_mem so we don't also leak that.
+	 */
+	if (dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, old_mem)) {
+		WARN(1,
+		     "Failed to restart old queue in error path. RX queue %d may be unhealthy.",
+		     rxq);
+		dev->queue_mgmt_ops->ndo_queue_mem_free(dev, &old_mem);
+	}
+
+err_free_new_mem:
+	dev->queue_mgmt_ops->ndo_queue_mem_free(dev, new_mem);
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(netdev_rx_queue_restart);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 1/3] queue_api: define queue api
  2024-04-30  1:07 ` [RFC PATCH net-next v1 1/3] queue_api: define " David Wei
@ 2024-04-30 18:00   ` Mina Almasry
  2024-05-02  1:00     ` David Wei
  2024-05-02  2:44     ` David Wei
  0 siblings, 2 replies; 15+ messages in thread
From: Mina Almasry @ 2024-04-30 18:00 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On Mon, Apr 29, 2024 at 6:07 PM David Wei <dw@davidwei.uk> wrote:
>
> From: Mina Almasry <almasrymina@google.com>
>
> This API enables the net stack to reset the queues used for devmem TCP.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: David Wei <dw@davidwei.uk>
> ---
>  include/linux/netdevice.h   |  3 +++
>  include/net/netdev_queues.h | 27 +++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f849e7d110ed..6a58ec73c5e8 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1957,6 +1957,7 @@ enum netdev_reg_state {
>   *     @sysfs_rx_queue_group:  Space for optional per-rx queue attributes
>   *     @rtnl_link_ops: Rtnl_link_ops
>   *     @stat_ops:      Optional ops for queue-aware statistics
> + *     @queue_mgmt_ops:        Optional ops for queue management
>   *
>   *     @gso_max_size:  Maximum size of generic segmentation offload
>   *     @tso_max_size:  Device (as in HW) limit on the max TSO request size
> @@ -2340,6 +2341,8 @@ struct net_device {
>
>         const struct netdev_stat_ops *stat_ops;
>
> +       const struct netdev_queue_mgmt_ops *queue_mgmt_ops;
> +
>         /* for setting kernel sock attribute on TCP connection setup */
>  #define GSO_MAX_SEGS           65535u
>  #define GSO_LEGACY_MAX_SIZE    65536u
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index 1ec408585373..337df0860ae6 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -60,6 +60,33 @@ struct netdev_stat_ops {
>                                struct netdev_queue_stats_tx *tx);
>  };
>
> +/**
> + * struct netdev_queue_mgmt_ops - netdev ops for queue management
> + *
> + * @ndo_queue_mem_alloc: Allocate memory for an RX queue. The memory returned
> + *                      in the form of a void* can be passed to
> + *                      ndo_queue_mem_free() for freeing or to ndo_queue_start
> + *                      to create an RX queue with this memory.
> + *
> + * @ndo_queue_mem_free:        Free memory from an RX queue.
> + *
> + * @ndo_queue_start:   Start an RX queue at the specified index.
> + *
> + * @ndo_queue_stop:    Stop the RX queue at the specified index.
> + */
> +struct netdev_queue_mgmt_ops {
> +       void *                  (*ndo_queue_mem_alloc)(struct net_device *dev,
> +                                                      int idx);
> +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
> +                                                     void *queue_mem);
> +       int                     (*ndo_queue_start)(struct net_device *dev,
> +                                                  int idx,
> +                                                  void *queue_mem);
> +       int                     (*ndo_queue_stop)(struct net_device *dev,
> +                                                 int idx,
> +                                                 void **out_queue_mem);
> +};

Sorry, I think we raced a bit, we updated our interface definition
based on your/Jakub's feedback to expose the size of the struct for
core to allocate, so it looks like this for us now:

+struct netdev_queue_mgmt_ops {
+       size_t                  ndo_queue_mem_size;
+       int                     (*ndo_queue_mem_alloc)(struct net_device *dev,
+                                                      void *per_queue_mem,
+                                                      int idx);
+       void                    (*ndo_queue_mem_free)(struct net_device *dev,
+                                                     void *per_queue_mem);
+       int                     (*ndo_queue_start)(struct net_device *dev,
+                                                  void *per_queue_mem,
+                                                  int idx);
+       int                     (*ndo_queue_stop)(struct net_device *dev,
+                                                 void *per_queue_mem,
+                                                 int idx);
+};
+

The idea is that ndo_queue_mem_size is the size of the memory
per_queue_mem points to.

The rest of the functions are slightly modified to allow core to
allocate the memory and pass in the pointer for the driver to fill
in/us. I think Shailend is close to posting the patches, let us know
if you see any issues.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-04-30  1:07 ` [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart() David Wei
@ 2024-04-30 18:15   ` Mina Almasry
  2024-05-02  1:04     ` David Wei
  0 siblings, 1 reply; 15+ messages in thread
From: Mina Almasry @ 2024-04-30 18:15 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On Mon, Apr 29, 2024 at 6:07 PM David Wei <dw@davidwei.uk> wrote:
>
> From: Mina Almasry <almasrymina@google.com>
>
> Add netdev_rx_queue_restart() function to netdev_rx_queue.h. This is
> taken from Mina's work in [1] with a slight modification of taking
> rtnl_lock() during the queue stop and start ops.
>
> For bnxt specifically, if the firmware doesn't support
> BNXT_RST_RING_SP_EVENT, then ndo_queue_stop() returns -EOPNOTSUPP and
> the whole restart fails. Unlike bnxt_rx_ring_reset(), there is no
> attempt to reset the whole device.
>
> [1]: https://lore.kernel.org/linux-kernel/20240403002053.2376017-6-almasrymina@google.com/#t
>
> Signed-off-by: David Wei <dw@davidwei.uk>
> ---
>  include/net/netdev_rx_queue.h |  3 ++
>  net/core/Makefile             |  1 +
>  net/core/netdev_rx_queue.c    | 58 +++++++++++++++++++++++++++++++++++
>  3 files changed, 62 insertions(+)
>  create mode 100644 net/core/netdev_rx_queue.c
>
> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> index aa1716fb0e53..e78ca52d67fb 100644
> --- a/include/net/netdev_rx_queue.h
> +++ b/include/net/netdev_rx_queue.h
> @@ -54,4 +54,7 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
>         return index;
>  }
>  #endif
> +
> +int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
> +
>  #endif
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 21d6fbc7e884..f2aa63c167a3 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -19,6 +19,7 @@ obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
>
>  obj-y += net-sysfs.o
>  obj-y += hotdata.o
> +obj-y += netdev_rx_queue.o
>  obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
>  obj-$(CONFIG_PROC_FS) += net-procfs.o
>  obj-$(CONFIG_NET_PKTGEN) += pktgen.o
> diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
> new file mode 100644
> index 000000000000..9633fb36f6d1
> --- /dev/null
> +++ b/net/core/netdev_rx_queue.c
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/netdevice.h>
> +#include <net/netdev_queues.h>
> +#include <net/netdev_rx_queue.h>
> +
> +int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq)
> +{
> +       void *new_mem;
> +       void *old_mem;
> +       int err;
> +
> +       if (!dev->queue_mgmt_ops->ndo_queue_stop ||
> +           !dev->queue_mgmt_ops->ndo_queue_mem_free ||
> +           !dev->queue_mgmt_ops->ndo_queue_mem_alloc ||
> +           !dev->queue_mgmt_ops->ndo_queue_start)
> +               return -EOPNOTSUPP;
> +
> +       new_mem = dev->queue_mgmt_ops->ndo_queue_mem_alloc(dev, rxq);
> +       if (!new_mem)
> +               return -ENOMEM;
> +
> +       rtnl_lock();

FWIW in my case this function is called from a netlink API which
already has rtnl_lock, so maybe a check of rtnl_is_locked is good here
rather than a relock.

> +       err = dev->queue_mgmt_ops->ndo_queue_stop(dev, rxq, &old_mem);
> +       if (err)
> +               goto err_free_new_mem;
> +
> +       err = dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, new_mem);
> +       if (err)
> +               goto err_start_queue;
> +       rtnl_unlock();
> +
> +       dev->queue_mgmt_ops->ndo_queue_mem_free(dev, old_mem);
> +
> +       return 0;
> +
> +err_start_queue:
> +       /* Restarting the queue with old_mem should be successful as we haven't
> +        * changed any of the queue configuration, and there is not much we can
> +        * do to recover from a failure here.
> +        *
> +        * WARN if the we fail to recover the old rx queue, and at least free
> +        * old_mem so we don't also leak that.
> +        */
> +       if (dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, old_mem)) {
> +               WARN(1,
> +                    "Failed to restart old queue in error path. RX queue %d may be unhealthy.",
> +                    rxq);
> +               dev->queue_mgmt_ops->ndo_queue_mem_free(dev, &old_mem);
> +       }
> +
> +err_free_new_mem:
> +       dev->queue_mgmt_ops->ndo_queue_mem_free(dev, new_mem);
> +       rtnl_unlock();
> +
> +       return err;
> +}
> +EXPORT_SYMBOL_GPL(netdev_rx_queue_restart);

Does stuff outside of core need this? I don't think so, right? I think
you can drop EXPORT_SYMBOL_GPL.

At that point the compiler may complain about an unused function, I
think, so maybe  __attribute__((unused)) would help there.

I also think it's fine for this patch series to only add the ndos and
to leave it to the devmem series to introduce this function, but I'm
fine either way.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 1/3] queue_api: define queue api
  2024-04-30 18:00   ` Mina Almasry
@ 2024-05-02  1:00     ` David Wei
  2024-05-02 16:58       ` Mina Almasry
  2024-05-02  2:44     ` David Wei
  1 sibling, 1 reply; 15+ messages in thread
From: David Wei @ 2024-05-02  1:00 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On 2024-04-30 11:00 am, Mina Almasry wrote:
> On Mon, Apr 29, 2024 at 6:07 PM David Wei <dw@davidwei.uk> wrote:
>>
>> From: Mina Almasry <almasrymina@google.com>
>>
>> This API enables the net stack to reset the queues used for devmem TCP.
>>
>> Signed-off-by: Mina Almasry <almasrymina@google.com>
>> Signed-off-by: David Wei <dw@davidwei.uk>
>> ---
>>  include/linux/netdevice.h   |  3 +++
>>  include/net/netdev_queues.h | 27 +++++++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index f849e7d110ed..6a58ec73c5e8 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1957,6 +1957,7 @@ enum netdev_reg_state {
>>   *     @sysfs_rx_queue_group:  Space for optional per-rx queue attributes
>>   *     @rtnl_link_ops: Rtnl_link_ops
>>   *     @stat_ops:      Optional ops for queue-aware statistics
>> + *     @queue_mgmt_ops:        Optional ops for queue management
>>   *
>>   *     @gso_max_size:  Maximum size of generic segmentation offload
>>   *     @tso_max_size:  Device (as in HW) limit on the max TSO request size
>> @@ -2340,6 +2341,8 @@ struct net_device {
>>
>>         const struct netdev_stat_ops *stat_ops;
>>
>> +       const struct netdev_queue_mgmt_ops *queue_mgmt_ops;
>> +
>>         /* for setting kernel sock attribute on TCP connection setup */
>>  #define GSO_MAX_SEGS           65535u
>>  #define GSO_LEGACY_MAX_SIZE    65536u
>> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
>> index 1ec408585373..337df0860ae6 100644
>> --- a/include/net/netdev_queues.h
>> +++ b/include/net/netdev_queues.h
>> @@ -60,6 +60,33 @@ struct netdev_stat_ops {
>>                                struct netdev_queue_stats_tx *tx);
>>  };
>>
>> +/**
>> + * struct netdev_queue_mgmt_ops - netdev ops for queue management
>> + *
>> + * @ndo_queue_mem_alloc: Allocate memory for an RX queue. The memory returned
>> + *                      in the form of a void* can be passed to
>> + *                      ndo_queue_mem_free() for freeing or to ndo_queue_start
>> + *                      to create an RX queue with this memory.
>> + *
>> + * @ndo_queue_mem_free:        Free memory from an RX queue.
>> + *
>> + * @ndo_queue_start:   Start an RX queue at the specified index.
>> + *
>> + * @ndo_queue_stop:    Stop the RX queue at the specified index.
>> + */
>> +struct netdev_queue_mgmt_ops {
>> +       void *                  (*ndo_queue_mem_alloc)(struct net_device *dev,
>> +                                                      int idx);
>> +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
>> +                                                     void *queue_mem);
>> +       int                     (*ndo_queue_start)(struct net_device *dev,
>> +                                                  int idx,
>> +                                                  void *queue_mem);
>> +       int                     (*ndo_queue_stop)(struct net_device *dev,
>> +                                                 int idx,
>> +                                                 void **out_queue_mem);
>> +};
> 
> Sorry, I think we raced a bit, we updated our interface definition
> based on your/Jakub's feedback to expose the size of the struct for
> core to allocate, so it looks like this for us now:
> 
> +struct netdev_queue_mgmt_ops {
> +       size_t                  ndo_queue_mem_size;
> +       int                     (*ndo_queue_mem_alloc)(struct net_device *dev,
> +                                                      void *per_queue_mem,
> +                                                      int idx);
> +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
> +                                                     void *per_queue_mem);
> +       int                     (*ndo_queue_start)(struct net_device *dev,
> +                                                  void *per_queue_mem,
> +                                                  int idx);
> +       int                     (*ndo_queue_stop)(struct net_device *dev,
> +                                                 void *per_queue_mem,
> +                                                 int idx);
> +};
> +
> 
> The idea is that ndo_queue_mem_size is the size of the memory
> per_queue_mem points to.

Thanks, I'll update this.

No race, I'm just working on the bnxt side at the same time because I
need feedback from Broadcom. Hope you don't mind whichever one merges
first. Full credit is given to you on the queue API + netdev queue reset
function.

> 
> The rest of the functions are slightly modified to allow core to
> allocate the memory and pass in the pointer for the driver to fill
> in/us. I think Shailend is close to posting the patches, let us know
> if you see any issues.
> 

Great, I'll take a look when it is posted.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-04-30 18:15   ` Mina Almasry
@ 2024-05-02  1:04     ` David Wei
  2024-05-02 16:46       ` Mina Almasry
  0 siblings, 1 reply; 15+ messages in thread
From: David Wei @ 2024-05-02  1:04 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On 2024-04-30 11:15 am, Mina Almasry wrote:
> On Mon, Apr 29, 2024 at 6:07 PM David Wei <dw@davidwei.uk> wrote:
>>
>> From: Mina Almasry <almasrymina@google.com>
>>
>> Add netdev_rx_queue_restart() function to netdev_rx_queue.h. This is
>> taken from Mina's work in [1] with a slight modification of taking
>> rtnl_lock() during the queue stop and start ops.
>>
>> For bnxt specifically, if the firmware doesn't support
>> BNXT_RST_RING_SP_EVENT, then ndo_queue_stop() returns -EOPNOTSUPP and
>> the whole restart fails. Unlike bnxt_rx_ring_reset(), there is no
>> attempt to reset the whole device.
>>
>> [1]: https://lore.kernel.org/linux-kernel/20240403002053.2376017-6-almasrymina@google.com/#t
>>
>> Signed-off-by: David Wei <dw@davidwei.uk>
>> ---
>>  include/net/netdev_rx_queue.h |  3 ++
>>  net/core/Makefile             |  1 +
>>  net/core/netdev_rx_queue.c    | 58 +++++++++++++++++++++++++++++++++++
>>  3 files changed, 62 insertions(+)
>>  create mode 100644 net/core/netdev_rx_queue.c
>>
>> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
>> index aa1716fb0e53..e78ca52d67fb 100644
>> --- a/include/net/netdev_rx_queue.h
>> +++ b/include/net/netdev_rx_queue.h
>> @@ -54,4 +54,7 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
>>         return index;
>>  }
>>  #endif
>> +
>> +int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
>> +
>>  #endif
>> diff --git a/net/core/Makefile b/net/core/Makefile
>> index 21d6fbc7e884..f2aa63c167a3 100644
>> --- a/net/core/Makefile
>> +++ b/net/core/Makefile
>> @@ -19,6 +19,7 @@ obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
>>
>>  obj-y += net-sysfs.o
>>  obj-y += hotdata.o
>> +obj-y += netdev_rx_queue.o
>>  obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
>>  obj-$(CONFIG_PROC_FS) += net-procfs.o
>>  obj-$(CONFIG_NET_PKTGEN) += pktgen.o
>> diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
>> new file mode 100644
>> index 000000000000..9633fb36f6d1
>> --- /dev/null
>> +++ b/net/core/netdev_rx_queue.c
>> @@ -0,0 +1,58 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#include <linux/netdevice.h>
>> +#include <net/netdev_queues.h>
>> +#include <net/netdev_rx_queue.h>
>> +
>> +int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq)
>> +{
>> +       void *new_mem;
>> +       void *old_mem;
>> +       int err;
>> +
>> +       if (!dev->queue_mgmt_ops->ndo_queue_stop ||
>> +           !dev->queue_mgmt_ops->ndo_queue_mem_free ||
>> +           !dev->queue_mgmt_ops->ndo_queue_mem_alloc ||
>> +           !dev->queue_mgmt_ops->ndo_queue_start)
>> +               return -EOPNOTSUPP;
>> +
>> +       new_mem = dev->queue_mgmt_ops->ndo_queue_mem_alloc(dev, rxq);
>> +       if (!new_mem)
>> +               return -ENOMEM;
>> +
>> +       rtnl_lock();
> 
> FWIW in my case this function is called from a netlink API which
> already has rtnl_lock, so maybe a check of rtnl_is_locked is good here
> rather than a relock.
> 
>> +       err = dev->queue_mgmt_ops->ndo_queue_stop(dev, rxq, &old_mem);
>> +       if (err)
>> +               goto err_free_new_mem;
>> +
>> +       err = dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, new_mem);
>> +       if (err)
>> +               goto err_start_queue;
>> +       rtnl_unlock();
>> +
>> +       dev->queue_mgmt_ops->ndo_queue_mem_free(dev, old_mem);
>> +
>> +       return 0;
>> +
>> +err_start_queue:
>> +       /* Restarting the queue with old_mem should be successful as we haven't
>> +        * changed any of the queue configuration, and there is not much we can
>> +        * do to recover from a failure here.
>> +        *
>> +        * WARN if the we fail to recover the old rx queue, and at least free
>> +        * old_mem so we don't also leak that.
>> +        */
>> +       if (dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, old_mem)) {
>> +               WARN(1,
>> +                    "Failed to restart old queue in error path. RX queue %d may be unhealthy.",
>> +                    rxq);
>> +               dev->queue_mgmt_ops->ndo_queue_mem_free(dev, &old_mem);
>> +       }
>> +
>> +err_free_new_mem:
>> +       dev->queue_mgmt_ops->ndo_queue_mem_free(dev, new_mem);
>> +       rtnl_unlock();
>> +
>> +       return err;
>> +}
>> +EXPORT_SYMBOL_GPL(netdev_rx_queue_restart);
> 
> Does stuff outside of core need this? I don't think so, right? I think
> you can drop EXPORT_SYMBOL_GPL.

Not sure, we intend to call this from within io_uring. Does that require
exporting or not?

Later on I'll want to add something like
netdev_rx_queue_set_memory_provider() which then calls
netdev_rx_queue_restart(). When that happens I can remove the
EXPORT_SYMBOL_GPL.

> 
> At that point the compiler may complain about an unused function, I
> think, so maybe  __attribute__((unused)) would help there.
> 
> I also think it's fine for this patch series to only add the ndos and
> to leave it to the devmem series to introduce this function, but I'm
> fine either way.
> 

I'd like to agree on the netdev public API and merge alongside the queue
api changes, separate to TCP devmem. Then that's one fewer deps between
our main patchsets.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 1/3] queue_api: define queue api
  2024-04-30 18:00   ` Mina Almasry
  2024-05-02  1:00     ` David Wei
@ 2024-05-02  2:44     ` David Wei
  2024-05-02 17:15       ` Mina Almasry
  1 sibling, 1 reply; 15+ messages in thread
From: David Wei @ 2024-05-02  2:44 UTC (permalink / raw)
  To: Mina Almasry
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On 2024-04-30 11:00 am, Mina Almasry wrote:
> On Mon, Apr 29, 2024 at 6:07 PM David Wei <dw@davidwei.uk> wrote:
>>
>> From: Mina Almasry <almasrymina@google.com>
>>
>> This API enables the net stack to reset the queues used for devmem TCP.
>>
>> Signed-off-by: Mina Almasry <almasrymina@google.com>
>> Signed-off-by: David Wei <dw@davidwei.uk>
>> ---
>>  include/linux/netdevice.h   |  3 +++
>>  include/net/netdev_queues.h | 27 +++++++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index f849e7d110ed..6a58ec73c5e8 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1957,6 +1957,7 @@ enum netdev_reg_state {
>>   *     @sysfs_rx_queue_group:  Space for optional per-rx queue attributes
>>   *     @rtnl_link_ops: Rtnl_link_ops
>>   *     @stat_ops:      Optional ops for queue-aware statistics
>> + *     @queue_mgmt_ops:        Optional ops for queue management
>>   *
>>   *     @gso_max_size:  Maximum size of generic segmentation offload
>>   *     @tso_max_size:  Device (as in HW) limit on the max TSO request size
>> @@ -2340,6 +2341,8 @@ struct net_device {
>>
>>         const struct netdev_stat_ops *stat_ops;
>>
>> +       const struct netdev_queue_mgmt_ops *queue_mgmt_ops;
>> +
>>         /* for setting kernel sock attribute on TCP connection setup */
>>  #define GSO_MAX_SEGS           65535u
>>  #define GSO_LEGACY_MAX_SIZE    65536u
>> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
>> index 1ec408585373..337df0860ae6 100644
>> --- a/include/net/netdev_queues.h
>> +++ b/include/net/netdev_queues.h
>> @@ -60,6 +60,33 @@ struct netdev_stat_ops {
>>                                struct netdev_queue_stats_tx *tx);
>>  };
>>
>> +/**
>> + * struct netdev_queue_mgmt_ops - netdev ops for queue management
>> + *
>> + * @ndo_queue_mem_alloc: Allocate memory for an RX queue. The memory returned
>> + *                      in the form of a void* can be passed to
>> + *                      ndo_queue_mem_free() for freeing or to ndo_queue_start
>> + *                      to create an RX queue with this memory.
>> + *
>> + * @ndo_queue_mem_free:        Free memory from an RX queue.
>> + *
>> + * @ndo_queue_start:   Start an RX queue at the specified index.
>> + *
>> + * @ndo_queue_stop:    Stop the RX queue at the specified index.
>> + */
>> +struct netdev_queue_mgmt_ops {
>> +       void *                  (*ndo_queue_mem_alloc)(struct net_device *dev,
>> +                                                      int idx);
>> +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
>> +                                                     void *queue_mem);
>> +       int                     (*ndo_queue_start)(struct net_device *dev,
>> +                                                  int idx,
>> +                                                  void *queue_mem);
>> +       int                     (*ndo_queue_stop)(struct net_device *dev,
>> +                                                 int idx,
>> +                                                 void **out_queue_mem);
>> +};
> 
> Sorry, I think we raced a bit, we updated our interface definition
> based on your/Jakub's feedback to expose the size of the struct for
> core to allocate, so it looks like this for us now:
> 
> +struct netdev_queue_mgmt_ops {
> +       size_t                  ndo_queue_mem_size;
> +       int                     (*ndo_queue_mem_alloc)(struct net_device *dev,
> +                                                      void *per_queue_mem,
> +                                                      int idx);
> +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
> +                                                     void *per_queue_mem);
> +       int                     (*ndo_queue_start)(struct net_device *dev,
> +                                                  void *per_queue_mem,
> +                                                  int idx);
> +       int                     (*ndo_queue_stop)(struct net_device *dev,
> +                                                 void *per_queue_mem,
> +                                                 int idx);
> +};
> +
> 
> The idea is that ndo_queue_mem_size is the size of the memory
> per_queue_mem points to.
> 
> The rest of the functions are slightly modified to allow core to
> allocate the memory and pass in the pointer for the driver to fill
> in/us. I think Shailend is close to posting the patches, let us know
> if you see any issues.
> 

Hmm. Thinking about this a bit more, are you having netdev core manage
all of the queues, i.e. alloc/free during open()/stop()? Otherwise how
can netdev core pass in the old queue mem into ndo_queue_stop(), and
where is the queue mem stored?

Or is it the new queue mem being passed into ndo_queue_stop()?

My takeaway from the discussion on Shailend's last RFC is that for the
first iteration we'll keep what we had before and have the driver
alloc/free the qmem. Implementing this for bnxt felt pretty good as
well.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-05-02  1:04     ` David Wei
@ 2024-05-02 16:46       ` Mina Almasry
  2024-05-03  0:22         ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Mina Almasry @ 2024-05-02 16:46 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On Wed, May 1, 2024 at 6:04 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-04-30 11:15 am, Mina Almasry wrote:
> > On Mon, Apr 29, 2024 at 6:07 PM David Wei <dw@davidwei.uk> wrote:
> >>
> >> +err_start_queue:
> >> +       /* Restarting the queue with old_mem should be successful as we haven't
> >> +        * changed any of the queue configuration, and there is not much we can
> >> +        * do to recover from a failure here.
> >> +        *
> >> +        * WARN if the we fail to recover the old rx queue, and at least free
> >> +        * old_mem so we don't also leak that.
> >> +        */
> >> +       if (dev->queue_mgmt_ops->ndo_queue_start(dev, rxq, old_mem)) {
> >> +               WARN(1,
> >> +                    "Failed to restart old queue in error path. RX queue %d may be unhealthy.",
> >> +                    rxq);
> >> +               dev->queue_mgmt_ops->ndo_queue_mem_free(dev, &old_mem);
> >> +       }
> >> +
> >> +err_free_new_mem:
> >> +       dev->queue_mgmt_ops->ndo_queue_mem_free(dev, new_mem);
> >> +       rtnl_unlock();
> >> +
> >> +       return err;
> >> +}
> >> +EXPORT_SYMBOL_GPL(netdev_rx_queue_restart);
> >
> > Does stuff outside of core need this? I don't think so, right? I think
> > you can drop EXPORT_SYMBOL_GPL.
>
> Not sure, we intend to call this from within io_uring. Does that require
> exporting or not?
>

I don't this this requires exporting, no.

> Later on I'll want to add something like
> netdev_rx_queue_set_memory_provider() which then calls
> netdev_rx_queue_restart(). When that happens I can remove the
> EXPORT_SYMBOL_GPL.
>

Sorry, I think if we don't need the EXPORT, then I think don't export
in the first place. Removing an EXPORT is, AFAIU, tricky. Because if
something is exported and then you unexport it could break an out of
tree module/driver that developed a dependency on it. Not sure how
much of a concern it really is.


> >
> > At that point the compiler may complain about an unused function, I
> > think, so maybe  __attribute__((unused)) would help there.
> >
> > I also think it's fine for this patch series to only add the ndos and
> > to leave it to the devmem series to introduce this function, but I'm
> > fine either way.
> >
>
> I'd like to agree on the netdev public API and merge alongside the queue
> api changes, separate to TCP devmem. Then that's one fewer deps between
> our main patchsets.

Ah, OK, sounds good.

I was thinking both efforts would use the ndos, but I think you're
thinking both efforts would use the netdev_rx_queue_restart. Yes, that
sounds reasonable.


--
Thanks,
Mina

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 1/3] queue_api: define queue api
  2024-05-02  1:00     ` David Wei
@ 2024-05-02 16:58       ` Mina Almasry
  0 siblings, 0 replies; 15+ messages in thread
From: Mina Almasry @ 2024-05-02 16:58 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On Wed, May 1, 2024 at 6:00 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-04-30 11:00 am, Mina Almasry wrote:
> >
> > Sorry, I think we raced a bit, we updated our interface definition
> > based on your/Jakub's feedback to expose the size of the struct for
> > core to allocate, so it looks like this for us now:
> >
> > +struct netdev_queue_mgmt_ops {
> > +       size_t                  ndo_queue_mem_size;
> > +       int                     (*ndo_queue_mem_alloc)(struct net_device *dev,
> > +                                                      void *per_queue_mem,
> > +                                                      int idx);
> > +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
> > +                                                     void *per_queue_mem);
> > +       int                     (*ndo_queue_start)(struct net_device *dev,
> > +                                                  void *per_queue_mem,
> > +                                                  int idx);
> > +       int                     (*ndo_queue_stop)(struct net_device *dev,
> > +                                                 void *per_queue_mem,
> > +                                                 int idx);
> > +};
> > +
> >
> > The idea is that ndo_queue_mem_size is the size of the memory
> > per_queue_mem points to.
>
> Thanks, I'll update this.
>
> No race, I'm just working on the bnxt side at the same time because I
> need feedback from Broadcom. Hope you don't mind whichever one merges
> first. Full credit is given to you on the queue API + netdev queue reset
> function.
>

Thanks! No concerns from me on what gets merged first. By 'raced' I
just meant that we were incorporating your feedback while you were
working on the older version :D

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 1/3] queue_api: define queue api
  2024-05-02  2:44     ` David Wei
@ 2024-05-02 17:15       ` Mina Almasry
  0 siblings, 0 replies; 15+ messages in thread
From: Mina Almasry @ 2024-05-02 17:15 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, Jakub Kicinski, David S. Miller, Eric Dumazet,
	Paolo Abeni

On Wed, May 1, 2024 at 7:44 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-04-30 11:00 am, Mina Almasry wrote:
> >
> > Sorry, I think we raced a bit, we updated our interface definition
> > based on your/Jakub's feedback to expose the size of the struct for
> > core to allocate, so it looks like this for us now:
> >
> > +struct netdev_queue_mgmt_ops {
> > +       size_t                  ndo_queue_mem_size;
> > +       int                     (*ndo_queue_mem_alloc)(struct net_device *dev,
> > +                                                      void *per_queue_mem,
> > +                                                      int idx);
> > +       void                    (*ndo_queue_mem_free)(struct net_device *dev,
> > +                                                     void *per_queue_mem);
> > +       int                     (*ndo_queue_start)(struct net_device *dev,
> > +                                                  void *per_queue_mem,
> > +                                                  int idx);
> > +       int                     (*ndo_queue_stop)(struct net_device *dev,
> > +                                                 void *per_queue_mem,
> > +                                                 int idx);
> > +};
> > +
> >
> > The idea is that ndo_queue_mem_size is the size of the memory
> > per_queue_mem points to.
> >
> > The rest of the functions are slightly modified to allow core to
> > allocate the memory and pass in the pointer for the driver to fill
> > in/us. I think Shailend is close to posting the patches, let us know
> > if you see any issues.
> >
>
> Hmm. Thinking about this a bit more, are you having netdev core manage
> all of the queues, i.e. alloc/free during open()/stop()?

No, we do not modify open()/stop(). I think in the future that is the
plan. However when it comes to the future direction of queue
management I think that's more Jakub's purview so I'm leaving it up to
him. For devmem TCP I'm just implementing what we need, and I'm
trusting that it is aligned with the general direction Jakub wants to
take things in eventually as he hasn't (yet) complained in the reviews
:D

> Otherwise how
> can netdev core pass in the old queue mem into ndo_queue_stop(), and
> where is the queue mem stored?
>

What we have in mind, is:

1. driver declares how much memory it needs in ndo_queue_mem_size
2. Core kzalloc's that much memory.
3. Core passes a pointer to that memory to ndo_queue_stop. The driver
fills in the memory and stops the queue.

Then, core will have a pointer to the 'old queue mem'. Core can then
free that memory if allocing/starting a new queue succeeded, or do a
ndo_queue_start(old_mem) if it wishes to start a new queue.

We do something similar for ndo_queue_mem_alloc:

1. driver declares how much memory it needs in ndo_queue_mem_size
2. Core kzallocs's that much memory.
3. Core passes that memory to ndo_queue_mem_alloc. The driver allocs
the resources for a new queue, attaches the resources to the passed
pointer, and returns.

We can also discuss over a public netdev bbb call if face to face time
makes it easier to align quickly.

> Or is it the new queue mem being passed into ndo_queue_stop()?
>
> My takeaway from the discussion on Shailend's last RFC is that for the
> first iteration we'll keep what we had before and have the driver
> alloc/free the qmem. Implementing this for bnxt felt pretty good as
> well.

We can certainly switch back to what we had before, and remove the
ndo_queue_mem_size we added if you've changed your mind, not a big
deal.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-05-02 16:46       ` Mina Almasry
@ 2024-05-03  0:22         ` Jakub Kicinski
  2024-05-03  0:49           ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-05-03  0:22 UTC (permalink / raw)
  To: Mina Almasry
  Cc: David Wei, netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, David S. Miller, Eric Dumazet, Paolo Abeni

On Thu, 2 May 2024 09:46:46 -0700 Mina Almasry wrote:
> Sorry, I think if we don't need the EXPORT, then I think don't export
> in the first place. Removing an EXPORT is, AFAIU, tricky. Because if
> something is exported and then you unexport it could break an out of
> tree module/driver that developed a dependency on it. Not sure how
> much of a concern it really is.

FWIW don't worry about out of tree code, it's not a concern.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-05-03  0:22         ` Jakub Kicinski
@ 2024-05-03  0:49           ` Jakub Kicinski
  2024-05-03  1:19             ` David Wei
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-05-03  0:49 UTC (permalink / raw)
  To: Mina Almasry
  Cc: David Wei, netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, David S. Miller, Eric Dumazet, Paolo Abeni

On Thu, 2 May 2024 17:22:41 -0700 Jakub Kicinski wrote:
> On Thu, 2 May 2024 09:46:46 -0700 Mina Almasry wrote:
> > Sorry, I think if we don't need the EXPORT, then I think don't export
> > in the first place. Removing an EXPORT is, AFAIU, tricky. Because if
> > something is exported and then you unexport it could break an out of
> > tree module/driver that developed a dependency on it. Not sure how
> > much of a concern it really is.  
> 
> FWIW don't worry about out of tree code, it's not a concern.

That said (looking at the other thread), if there's no in-tree
user, it's actually incorrect to add an export.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart()
  2024-05-03  0:49           ` Jakub Kicinski
@ 2024-05-03  1:19             ` David Wei
  0 siblings, 0 replies; 15+ messages in thread
From: David Wei @ 2024-05-03  1:19 UTC (permalink / raw)
  To: Jakub Kicinski, Mina Almasry
  Cc: netdev, Michael Chan, Pavan Chebbi, Andy Gospodarek,
	Shailend Chand, David S. Miller, Eric Dumazet, Paolo Abeni

On 2024-05-02 17:49, Jakub Kicinski wrote:
> On Thu, 2 May 2024 17:22:41 -0700 Jakub Kicinski wrote:
>> On Thu, 2 May 2024 09:46:46 -0700 Mina Almasry wrote:
>>> Sorry, I think if we don't need the EXPORT, then I think don't export
>>> in the first place. Removing an EXPORT is, AFAIU, tricky. Because if
>>> something is exported and then you unexport it could break an out of
>>> tree module/driver that developed a dependency on it. Not sure how
>>> much of a concern it really is.  
>>
>> FWIW don't worry about out of tree code, it's not a concern.
> 
> That said (looking at the other thread), if there's no in-tree
> user, it's actually incorrect to add an export.

Sorry, I forgot to remove it here, but I will do.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-03  1:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-30  1:07 [RFC PATCH net-next v1 0/3] bnxt: implement queue api David Wei
2024-04-30  1:07 ` [RFC PATCH net-next v1 1/3] queue_api: define " David Wei
2024-04-30 18:00   ` Mina Almasry
2024-05-02  1:00     ` David Wei
2024-05-02 16:58       ` Mina Almasry
2024-05-02  2:44     ` David Wei
2024-05-02 17:15       ` Mina Almasry
2024-04-30  1:07 ` [RFC PATCH net-next v1 2/3] bnxt: implement " David Wei
2024-04-30  1:07 ` [RFC PATCH net-next v1 3/3] netdev: add netdev_rx_queue_restart() David Wei
2024-04-30 18:15   ` Mina Almasry
2024-05-02  1:04     ` David Wei
2024-05-02 16:46       ` Mina Almasry
2024-05-03  0:22         ` Jakub Kicinski
2024-05-03  0:49           ` Jakub Kicinski
2024-05-03  1:19             ` David Wei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).