* [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue
@ 2024-04-05 20:09 Amritha Nambiar
  2024-04-05 20:09 ` [net-next, RFC PATCH 1/5] netdev-genl: spec: Extend netdev netlink spec in YAML for queue-set Amritha Nambiar
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Amritha Nambiar @ 2024-04-05 20:09 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala,
	amritha.nambiar

Support user configurability of the queue<->NAPI association. The netdev-genl
interface is extended with a 'queue-set' command. Currently, the set command
enables associating a NAPI ID with a queue, but it can also be extended to
support configuring other attributes. To set the NAPI attribute, the command
requires the queue identifiers and the ID of the NAPI instance that the
queue is to be associated with.

Previous discussion:
https://lore.kernel.org/netdev/32e32635-ca75-99b8-2285-1d87a29b6d89@intel.com/

Example:
$ ./cli.py --spec netdev.yaml --do queue-set  --json='{"ifindex": 12, "id": 0, "type": 0, "napi-id": 595}'
{'id': 0, 'ifindex': 12, 'napi-id': 595, 'type': 'rx'}

The queue<->NAPI association is achieved by mapping the queue to the
corresponding interrupt vector. This interface thereby exposes the
configurability of the queue<->vector mapping.

The interface is generic, so any Tx or Rx queue(s) can be associated with a
NAPI instance. It is therefore possible to have multiple queues (Tx queues
only, Rx queues only, or both Tx and Rx queues) on a single NAPI/vector. This
is a step away from the usual 1:1 queue<->vector model, although the
queue-pair association can still be retained by configuring the Tx[n]-Rx[n]
queue-pair to the same NAPI instance.

The use case for configuring NAPI pollers for queues this way from userspace
is to limit the number of pollers (by sharing multiple queues on a vector).
This frees up more cores for application threads by reducing the number of
cores needed for poller threads.

Patches 1-2 introduce the kernel interface hooks to set the NAPI ID attribute
for a queue.
Patches 3-4 prepare the ice driver for device configuration. The queue<->vector
mapping configured from userspace is not reflected in the HW unless the
queue is reset. Currently, the driver supports only resetting the entire VSI
or a queue-pair. Resetting the VSI for each queue configuration is overkill.
So, add ice driver support to reset a single vector. Also, add support to
handle vectors that become unused due to the dynamic nature of this
configurability.
Patch 5 implements the support for queue<->NAPI configuration in the ice driver.

Testing Hints:
1. Move an Rx queue from its current NAPI to a different NAPI during traffic
   (see the command sketch after this list)
2. Move an Rx queue back to its initial NAPI during traffic
3. Move all Rx queues of NAPI-A to NAPI-B during traffic
4. Repeat the tests above for Tx queues
5. Move all Tx and Rx queues of a NAPI to a new NAPI during traffic
6. Move a queue to an unused NAPI during traffic
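
A rough sketch of hint 1, reusing the cli.py invocation from the example
above and reading the association back with the existing queue-get command
(the ifindex, queue id and NAPI ID are hypothetical):

$ ./cli.py --spec netdev.yaml --do queue-set --json='{"ifindex": 12, "id": 2, "type": 0, "napi-id": 597}'
$ ./cli.py --spec netdev.yaml --do queue-get --json='{"ifindex": 12, "id": 2, "type": 0}'
{'id': 2, 'ifindex': 12, 'napi-id': 597, 'type': 'rx'}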

---

Amritha Nambiar (5):
      netdev-genl: spec: Extend netdev netlink spec in YAML for queue-set
      netdev-genl: Add netlink framework functions for queue-set NAPI
      ice: Add support to enable/disable a vector
      ice: Handle unused vectors dynamically
      ice: Add driver support for ndo_queue_set_napi


 Documentation/netlink/specs/netdev.yaml   |   20 +
 drivers/net/ethernet/intel/ice/ice.h      |   13 +
 drivers/net/ethernet/intel/ice/ice_lib.c  |  667 +++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_lib.h  |   12 +
 drivers/net/ethernet/intel/ice/ice_main.c |   15 -
 drivers/net/ethernet/intel/ice/ice_xsk.c  |   34 -
 include/linux/netdevice.h                 |    7 
 include/uapi/linux/netdev.h               |    1 
 net/core/netdev-genl-gen.c                |   15 +
 net/core/netdev-genl-gen.h                |    1 
 net/core/netdev-genl.c                    |  120 +++++
 tools/include/uapi/linux/netdev.h         |    1 
 12 files changed, 845 insertions(+), 61 deletions(-)

--

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [net-next, RFC PATCH 1/5] netdev-genl: spec: Extend netdev netlink spec in YAML for queue-set
  2024-04-05 20:09 [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Amritha Nambiar
@ 2024-04-05 20:09 ` Amritha Nambiar
  2024-04-05 20:09 ` [net-next, RFC PATCH 2/5] netdev-genl: Add netlink framework functions for queue-set NAPI Amritha Nambiar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Amritha Nambiar @ 2024-04-05 20:09 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala,
	amritha.nambiar

Add support in the netlink spec (netdev.yaml) for the queue-set command.
Currently, the set command enables associating a NAPI ID with a queue, but
it can also be extended to support configuring other attributes.
Also, add code generated from the spec.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 Documentation/netlink/specs/netdev.yaml |   20 ++++++++++++++++++++
 include/uapi/linux/netdev.h             |    1 +
 net/core/netdev-genl-gen.c              |   15 +++++++++++++++
 net/core/netdev-genl-gen.h              |    1 +
 net/core/netdev-genl.c                  |    5 +++++
 tools/include/uapi/linux/netdev.h       |    1 +
 6 files changed, 43 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 76352dbd2be4..eda45ae31077 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -457,6 +457,26 @@ operations:
           attributes:
             - ifindex
         reply: *queue-get-op
+    -
+      name: queue-set
+      doc: User configuration of queue attributes.
+           The id, type and ifindex form the queue header/identifier. For
+           example, to configure the NAPI instance associated with the queue,
+           the napi-id is the configurable attribute.
+      attribute-set: queue
+      do:
+        request:
+          attributes:
+            - ifindex
+            - type
+            - id
+            - napi-id
+        reply: &queue-set-op
+          attributes:
+            - id
+            - type
+            - napi-id
+            - ifindex
     -
       name: napi-get
       doc: Get information about NAPI instances configured on the system.
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index bb65ee840cda..80fac72da8b2 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -162,6 +162,7 @@ enum {
 	NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
 	NETDEV_CMD_PAGE_POOL_STATS_GET,
 	NETDEV_CMD_QUEUE_GET,
+	NETDEV_CMD_QUEUE_SET,
 	NETDEV_CMD_NAPI_GET,
 	NETDEV_CMD_QSTATS_GET,
 
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 8d8ace9ef87f..cb5485dc5843 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -58,6 +58,14 @@ static const struct nla_policy netdev_queue_get_dump_nl_policy[NETDEV_A_QUEUE_IF
 	[NETDEV_A_QUEUE_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
 };
 
+/* NETDEV_CMD_QUEUE_SET - do */
+static const struct nla_policy netdev_queue_set_nl_policy[NETDEV_A_QUEUE_NAPI_ID + 1] = {
+	[NETDEV_A_QUEUE_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
+	[NETDEV_A_QUEUE_TYPE] = NLA_POLICY_MAX(NLA_U32, 1),
+	[NETDEV_A_QUEUE_ID] = { .type = NLA_U32, },
+	[NETDEV_A_QUEUE_NAPI_ID] = { .type = NLA_U32, },
+};
+
 /* NETDEV_CMD_NAPI_GET - do */
 static const struct nla_policy netdev_napi_get_do_nl_policy[NETDEV_A_NAPI_ID + 1] = {
 	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
@@ -129,6 +137,13 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.maxattr	= NETDEV_A_QUEUE_IFINDEX,
 		.flags		= GENL_CMD_CAP_DUMP,
 	},
+	{
+		.cmd		= NETDEV_CMD_QUEUE_SET,
+		.doit		= netdev_nl_queue_set_doit,
+		.policy		= netdev_queue_set_nl_policy,
+		.maxattr	= NETDEV_A_QUEUE_NAPI_ID,
+		.flags		= GENL_CMD_CAP_DO,
+	},
 	{
 		.cmd		= NETDEV_CMD_NAPI_GET,
 		.doit		= netdev_nl_napi_get_doit,
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index 4db40fd5b4a9..be136c5ea5ad 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -26,6 +26,7 @@ int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
 int netdev_nl_queue_get_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_queue_get_dumpit(struct sk_buff *skb,
 			       struct netlink_callback *cb);
+int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
 int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 7004b3399c2b..d5b2e90e5709 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -674,6 +674,11 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
 	return err;
 }
 
+int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
 static int netdev_genl_netdevice_event(struct notifier_block *nb,
 				       unsigned long event, void *ptr)
 {
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index bb65ee840cda..80fac72da8b2 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -162,6 +162,7 @@ enum {
 	NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
 	NETDEV_CMD_PAGE_POOL_STATS_GET,
 	NETDEV_CMD_QUEUE_GET,
+	NETDEV_CMD_QUEUE_SET,
 	NETDEV_CMD_NAPI_GET,
 	NETDEV_CMD_QSTATS_GET,
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next, RFC PATCH 2/5] netdev-genl: Add netlink framework functions for queue-set NAPI
  2024-04-05 20:09 [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Amritha Nambiar
  2024-04-05 20:09 ` [net-next, RFC PATCH 1/5] netdev-genl: spec: Extend netdev netlink spec in YAML for queue-set Amritha Nambiar
@ 2024-04-05 20:09 ` Amritha Nambiar
  2024-04-06  0:20   ` David Wei
  2024-04-05 20:09 ` [net-next, RFC PATCH 3/5] ice: Add support to enable/disable a vector Amritha Nambiar
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Amritha Nambiar @ 2024-04-05 20:09 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala,
	amritha.nambiar

Implement the netdev netlink framework functions for associating
a queue with a NAPI ID.
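
As an illustration of how a driver might wire up the new ndo (a minimal
sketch only; struct foo_priv, foo_move_rxq() and foo_move_txq() are
hypothetical placeholders, not part of this series -- the real ice
implementation follows in patch 5):

static int foo_queue_set_napi(struct net_device *dev, u32 q_idx, u32 q_type,
			      struct napi_struct *napi)
{
	struct foo_priv *priv = netdev_priv(dev);
	int err;

	switch (q_type) {
	case NETDEV_QUEUE_TYPE_RX:
		/* re-map the Rx queue to the vector backing @napi */
		err = foo_move_rxq(priv, q_idx, napi);
		break;
	case NETDEV_QUEUE_TYPE_TX:
		/* re-map the Tx queue to the vector backing @napi */
		err = foo_move_txq(priv, q_idx, napi);
		break;
	default:
		return -EINVAL;
	}
	if (err)
		return err;

	/* report the new queue<->NAPI association back to the core */
	netif_queue_set_napi(dev, q_idx, q_type, napi);
	return 0;
}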

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/netdevice.h |    7 +++
 net/core/netdev-genl.c    |  117 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 108 insertions(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0c198620ac93..70df1cec4a60 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1351,6 +1351,10 @@ struct netdev_net_notifier {
  *			   struct kernel_hwtstamp_config *kernel_config,
  *			   struct netlink_ext_ack *extack);
  *	Change the hardware timestamping parameters for NIC device.
+ *
+ * int (*ndo_queue_set_napi)(struct net_device *dev, u32 q_idx, u32 q_type,
+ *			     struct napi_struct *napi);
+ *	Change the NAPI instance associated with the queue.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1596,6 +1600,9 @@ struct net_device_ops {
 	int			(*ndo_hwtstamp_set)(struct net_device *dev,
 						    struct kernel_hwtstamp_config *kernel_config,
 						    struct netlink_ext_ack *extack);
+	int			(*ndo_queue_set_napi)(struct net_device *dev,
+						      u32 q_idx, u32 q_type,
+						      struct napi_struct *napi);
 };
 
 /**
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index d5b2e90e5709..6b3d3165d76e 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -288,12 +288,29 @@ int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 	return err;
 }
 
+/* must be called under rtnl_lock() */
+static struct napi_struct *
+napi_get_by_queue(struct net_device *netdev, u32 q_idx, u32 q_type)
+{
+	struct netdev_rx_queue *rxq;
+	struct netdev_queue *txq;
+
+	switch (q_type) {
+	case NETDEV_QUEUE_TYPE_RX:
+		rxq = __netif_get_rx_queue(netdev, q_idx);
+		return rxq->napi;
+	case NETDEV_QUEUE_TYPE_TX:
+		txq = netdev_get_tx_queue(netdev, q_idx);
+		return txq->napi;
+	}
+	return NULL;
+}
+
 static int
 netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 			 u32 q_idx, u32 q_type, const struct genl_info *info)
 {
-	struct netdev_rx_queue *rxq;
-	struct netdev_queue *txq;
+	struct napi_struct *napi;
 	void *hdr;
 
 	hdr = genlmsg_iput(rsp, info);
@@ -305,19 +322,9 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 	    nla_put_u32(rsp, NETDEV_A_QUEUE_IFINDEX, netdev->ifindex))
 		goto nla_put_failure;
 
-	switch (q_type) {
-	case NETDEV_QUEUE_TYPE_RX:
-		rxq = __netif_get_rx_queue(netdev, q_idx);
-		if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
-					     rxq->napi->napi_id))
-			goto nla_put_failure;
-		break;
-	case NETDEV_QUEUE_TYPE_TX:
-		txq = netdev_get_tx_queue(netdev, q_idx);
-		if (txq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
-					     txq->napi->napi_id))
-			goto nla_put_failure;
-	}
+	napi = napi_get_by_queue(netdev, q_idx, q_type);
+	if (napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID, napi->napi_id))
+		goto nla_put_failure;
 
 	genlmsg_end(rsp, hdr);
 
@@ -674,9 +681,87 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
 	return err;
 }
 
+static int
+netdev_nl_queue_set_napi(struct sk_buff *rsp, struct net_device *netdev,
+			 u32 q_idx, u32 q_type, u32 napi_id,
+			 const struct genl_info *info)
+{
+	struct napi_struct *napi, *old_napi;
+	int err;
+
+	if (!(netdev->flags & IFF_UP))
+		return 0;
+
+	err = netdev_nl_queue_validate(netdev, q_idx, q_type);
+	if (err)
+		return err;
+
+	old_napi = napi_get_by_queue(netdev, q_idx, q_type);
+	if (old_napi && old_napi->napi_id == napi_id)
+		return 0;
+
+	napi = napi_by_id(napi_id);
+	if (!napi)
+		return -EINVAL;
+
+	err = netdev->netdev_ops->ndo_queue_set_napi(netdev, q_idx, q_type, napi);
+
+	return err;
+}
+
 int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info)
 {
-	return -EOPNOTSUPP;
+	u32 q_id, q_type, ifindex;
+	struct net_device *netdev;
+	struct sk_buff *rsp;
+	u32 napi_id = 0;
+	int err = 0;
+
+	if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_ID) ||
+	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_TYPE) ||
+	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_IFINDEX))
+		return -EINVAL;
+
+	q_id = nla_get_u32(info->attrs[NETDEV_A_QUEUE_ID]);
+	q_type = nla_get_u32(info->attrs[NETDEV_A_QUEUE_TYPE]);
+	ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
+
+	if (info->attrs[NETDEV_A_QUEUE_NAPI_ID]) {
+		napi_id = nla_get_u32(info->attrs[NETDEV_A_QUEUE_NAPI_ID]);
+		if (napi_id < MIN_NAPI_ID)
+			return -EINVAL;
+	}
+
+	rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!rsp)
+		return -ENOMEM;
+
+	rtnl_lock();
+
+	netdev = __dev_get_by_index(genl_info_net(info), ifindex);
+	if (netdev) {
+		if (!napi_id)
+			GENL_SET_ERR_MSG(info, "No queue parameters changed\n");
+		else
+			err = netdev_nl_queue_set_napi(rsp, netdev, q_id,
+						       q_type, napi_id, info);
+		if (!err)
+			err = netdev_nl_queue_fill_one(rsp, netdev, q_id,
+						       q_type, info);
+	} else {
+		err = -ENODEV;
+	}
+
+	rtnl_unlock();
+
+	if (err)
+		goto err_free_msg;
+
+	return genlmsg_reply(rsp, info);
+
+err_free_msg:
+	nlmsg_free(rsp);
+	return err;
 }
 
 static int netdev_genl_netdevice_event(struct notifier_block *nb,


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next, RFC PATCH 3/5] ice: Add support to enable/disable a vector
  2024-04-05 20:09 [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Amritha Nambiar
  2024-04-05 20:09 ` [net-next, RFC PATCH 1/5] netdev-genl: spec: Extend netdev netlink spec in YAML for queue-set Amritha Nambiar
  2024-04-05 20:09 ` [net-next, RFC PATCH 2/5] netdev-genl: Add netlink framework functions for queue-set NAPI Amritha Nambiar
@ 2024-04-05 20:09 ` Amritha Nambiar
  2024-04-05 20:09 ` [net-next,RFC PATCH 4/5] ice: Handle unused vectors dynamically Amritha Nambiar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Amritha Nambiar @ 2024-04-05 20:09 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala,
	amritha.nambiar

When a queue is configured from userspace, the queue has to be reset for
the configuration to take effect in hardware. Resetting a queue depends on
resetting the vector it is on.
Add the framework functions to enable/disable a single queue and hence its
vector as well. The existing support in the driver allows enabling/disabling
either a single queue-pair or an entire VSI, but not an arbitrary Tx or Rx
queue.
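
A condensed view of how the new helpers are expected to bracket the
reconfiguration of a single vector (a sketch only; the complete flow,
including the busy-state handling, is in patch 5):

	/* quiesce the vector: mask its IRQ, disable NAPI, stop its rings */
	err = ice_q_vector_dis(vsi, q_vector);
	if (err)
		return err;

	/* ... update the queue<->vector association in software ... */

	/* bring everything back: rings first, then NAPI, then the IRQ */
	err = ice_q_vector_ena(vsi, q_vector);
	if (err)
		return err;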

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |    1 
 drivers/net/ethernet/intel/ice/ice_lib.c  |  247 +++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_lib.h  |    4 
 drivers/net/ethernet/intel/ice/ice_main.c |    2 
 drivers/net/ethernet/intel/ice/ice_xsk.c  |   34 ----
 5 files changed, 253 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a7e88d797d4c..a2c91fa88e14 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -1009,4 +1009,5 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 }
 
 extern const struct xdp_metadata_ops ice_xdp_md_ops;
+void ice_init_moderation(struct ice_q_vector *q_vector);
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index d06e7c82c433..35389189af1b 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -4001,3 +4001,250 @@ ice_vsi_update_local_lb(struct ice_vsi *vsi, bool set)
 	vsi->info = ctx.info;
 	return 0;
 }
+
+/**
+ * ice_tx_queue_dis - Disable a Tx ring
+ * @vsi: VSI being configured
+ * @q_idx: Tx ring index
+ *
+ */
+static int ice_tx_queue_dis(struct ice_vsi *vsi, u16 q_idx)
+{
+	struct ice_txq_meta txq_meta = { };
+	struct ice_tx_ring *tx_ring;
+	int err;
+
+	if (q_idx >= vsi->num_txq)
+		return -EINVAL;
+
+	netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
+
+	tx_ring = vsi->tx_rings[q_idx];
+	ice_fill_txq_meta(vsi, tx_ring, &txq_meta);
+	err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
+	if (err)
+		return err;
+
+	ice_clean_tx_ring(tx_ring);
+
+	return 0;
+}
+
+/**
+ * ice_tx_queue_ena - Enable a Tx ring
+ * @vsi: VSI being configured
+ * @q_idx: Tx ring index
+ *
+ */
+static int ice_tx_queue_ena(struct ice_vsi *vsi, u16 q_idx)
+{
+	struct ice_q_vector *q_vector;
+	struct ice_tx_ring *tx_ring;
+	int err;
+
+	err = ice_vsi_cfg_single_txq(vsi, vsi->tx_rings, q_idx);
+	if (err)
+		return err;
+
+	tx_ring = vsi->tx_rings[q_idx];
+	q_vector = tx_ring->q_vector;
+	ice_cfg_txq_interrupt(vsi, tx_ring->reg_idx, q_vector->reg_idx,
+			      q_vector->tx.itr_idx);
+
+	netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
+
+	return 0;
+}
+
+/**
+ * ice_rx_ring_dis_irq - clear the queue to interrupt mapping in HW
+ * @vsi: VSI being configured
+ * @rx_ring: Rx ring that will have its IRQ disabled
+ *
+ */
+static void ice_rx_ring_dis_irq(struct ice_vsi *vsi, struct ice_rx_ring *rx_ring)
+{
+	struct ice_hw *hw = &vsi->back->hw;
+	u16 reg;
+	u32 val;
+
+	/* Clear QINT_RQCTL to clear the queue to interrupt mapping in HW */
+	reg = rx_ring->reg_idx;
+	val = rd32(hw, QINT_RQCTL(reg));
+	val &= ~QINT_RQCTL_CAUSE_ENA_M;
+	wr32(hw, QINT_RQCTL(reg), val);
+
+	ice_flush(hw);
+}
+
+/**
+ * ice_rx_queue_dis - Disable a Rx ring
+ * @vsi: VSI being configured
+ * @q_idx: Rx ring index
+ *
+ */
+static int ice_rx_queue_dis(struct ice_vsi *vsi, u16 q_idx)
+{
+	struct ice_rx_ring *rx_ring;
+	int err;
+
+	if (q_idx >= vsi->num_rxq)
+		return -EINVAL;
+
+	rx_ring = vsi->rx_rings[q_idx];
+	ice_rx_ring_dis_irq(vsi, rx_ring);
+
+	err = ice_vsi_ctrl_one_rx_ring(vsi, false, q_idx, true);
+	if (err)
+		return err;
+
+	ice_clean_rx_ring(rx_ring);
+
+	return 0;
+}
+
+/**
+ * ice_rx_queue_ena - Enable a Rx ring
+ * @vsi: VSI being configured
+ * @q_idx: Rx ring index
+ *
+ */
+static int ice_rx_queue_ena(struct ice_vsi *vsi, u16 q_idx)
+{
+	struct ice_q_vector *q_vector;
+	struct ice_rx_ring *rx_ring;
+	int err;
+
+	if (q_idx >= vsi->num_rxq)
+		return -EINVAL;
+
+	err = ice_vsi_cfg_single_rxq(vsi, q_idx);
+	if (err)
+		return err;
+
+	rx_ring = vsi->rx_rings[q_idx];
+	q_vector = rx_ring->q_vector;
+	ice_cfg_rxq_interrupt(vsi, rx_ring->reg_idx, q_vector->reg_idx,
+			      q_vector->rx.itr_idx);
+
+	err = ice_vsi_ctrl_one_rx_ring(vsi, true, q_idx, true);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+/**
+ * ice_qvec_toggle_napi - Enables/disables NAPI for a given q_vector
+ * @vsi: VSI that has netdev
+ * @q_vector: q_vector that has NAPI context
+ * @enable: true for enable, false for disable
+ */
+void
+ice_qvec_toggle_napi(struct ice_vsi *vsi, struct ice_q_vector *q_vector,
+		     bool enable)
+{
+	if (!vsi->netdev || !q_vector)
+		return;
+
+	if (enable)
+		napi_enable(&q_vector->napi);
+	else
+		napi_disable(&q_vector->napi);
+}
+
+/**
+ * ice_qvec_ena_irq - Enable IRQ for given queue vector
+ * @vsi: the VSI that contains queue vector
+ * @q_vector: queue vector
+ */
+void ice_qvec_ena_irq(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	struct ice_pf *pf = vsi->back;
+	struct ice_hw *hw = &pf->hw;
+
+	ice_irq_dynamic_ena(hw, vsi, q_vector);
+
+	ice_flush(hw);
+}
+
+/**
+ * ice_qvec_configure - Setup initial interrupt configuration
+ * @vsi: the VSI that contains queue vector
+ * @q_vector: queue vector
+ */
+static void ice_qvec_configure(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	struct ice_hw *hw = &vsi->back->hw;
+
+	ice_cfg_itr(hw, q_vector);
+	ice_init_moderation(q_vector);
+}
+
+/**
+ * ice_q_vector_dis - Disable a vector and all queues on it
+ * @vsi: the VSI that contains queue vector
+ * @q_vector: queue vector
+ */
+static int __maybe_unused
+ice_q_vector_dis(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	struct ice_hw *hw = &vsi->back->hw;
+	struct ice_rx_ring *rx_ring;
+	struct ice_tx_ring *tx_ring;
+	int err;
+
+	/* Disable the vector */
+	wr32(hw, GLINT_DYN_CTL(q_vector->reg_idx), 0);
+	ice_flush(hw);
+	synchronize_irq(q_vector->irq.virq);
+
+	ice_qvec_toggle_napi(vsi, q_vector, false);
+
+	/* Disable all rings on this vector */
+	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
+		err = ice_rx_queue_dis(vsi, rx_ring->q_index);
+		if (err)
+			return err;
+	}
+
+	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
+		err = ice_tx_queue_dis(vsi, tx_ring->q_index);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+/**
+ * ice_q_vector_ena - Enable a vector and all queues on it
+ * @vsi: the VSI that contains queue vector
+ * @q_vector: queue vector
+ */
+static int __maybe_unused
+ice_q_vector_ena(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	struct ice_rx_ring *rx_ring;
+	struct ice_tx_ring *tx_ring;
+	int err;
+
+	ice_qvec_configure(vsi, q_vector);
+
+	/* enable all rings on this vector */
+	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
+		err = ice_rx_queue_ena(vsi, rx_ring->q_index);
+		if (err)
+			return err;
+	}
+
+	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
+		err = ice_tx_queue_ena(vsi, tx_ring->q_index);
+		if (err)
+			return err;
+	}
+
+	ice_qvec_toggle_napi(vsi, q_vector, true);
+	ice_qvec_ena_irq(vsi, q_vector);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 9cd23afe5f15..00239c2efa92 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -160,4 +160,8 @@ void ice_set_feature_support(struct ice_pf *pf, enum ice_feature f);
 void ice_clear_feature_support(struct ice_pf *pf, enum ice_feature f);
 void ice_init_feature_support(struct ice_pf *pf);
 bool ice_vsi_is_rx_queue_active(struct ice_vsi *vsi);
+void ice_qvec_ena_irq(struct ice_vsi *vsi, struct ice_q_vector *q_vector);
+void
+ice_qvec_toggle_napi(struct ice_vsi *vsi, struct ice_q_vector *q_vector,
+		     bool enable);
 #endif /* !_ICE_LIB_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 9d751954782c..cd2f467fe3a0 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6508,7 +6508,7 @@ static void ice_rx_dim_work(struct work_struct *work)
  * dynamic moderation mode or not in order to make sure hardware is in a known
  * state.
  */
-static void ice_init_moderation(struct ice_q_vector *q_vector)
+void ice_init_moderation(struct ice_q_vector *q_vector)
 {
 	struct ice_ring_container *rc;
 	bool tx_dynamic, rx_dynamic;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index aa81d1162b81..f7708bbb769b 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -59,25 +59,6 @@ static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
 	ice_clean_rx_ring(vsi->rx_rings[q_idx]);
 }
 
-/**
- * ice_qvec_toggle_napi - Enables/disables NAPI for a given q_vector
- * @vsi: VSI that has netdev
- * @q_vector: q_vector that has NAPI context
- * @enable: true for enable, false for disable
- */
-static void
-ice_qvec_toggle_napi(struct ice_vsi *vsi, struct ice_q_vector *q_vector,
-		     bool enable)
-{
-	if (!vsi->netdev || !q_vector)
-		return;
-
-	if (enable)
-		napi_enable(&q_vector->napi);
-	else
-		napi_disable(&q_vector->napi);
-}
-
 /**
  * ice_qvec_dis_irq - Mask off queue interrupt generation on given ring
  * @vsi: the VSI that contains queue vector being un-configured
@@ -135,21 +116,6 @@ ice_qvec_cfg_msix(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 	ice_flush(hw);
 }
 
-/**
- * ice_qvec_ena_irq - Enable IRQ for given queue vector
- * @vsi: the VSI that contains queue vector
- * @q_vector: queue vector
- */
-static void ice_qvec_ena_irq(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
-{
-	struct ice_pf *pf = vsi->back;
-	struct ice_hw *hw = &pf->hw;
-
-	ice_irq_dynamic_ena(hw, vsi, q_vector);
-
-	ice_flush(hw);
-}
-
 /**
  * ice_qp_dis - Disables a queue pair
  * @vsi: VSI of interest


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next,RFC PATCH 4/5] ice: Handle unused vectors dynamically
  2024-04-05 20:09 [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Amritha Nambiar
                   ` (2 preceding siblings ...)
  2024-04-05 20:09 ` [net-next, RFC PATCH 3/5] ice: Add support to enable/disable a vector Amritha Nambiar
@ 2024-04-05 20:09 ` Amritha Nambiar
  2024-04-05 20:09 ` [net-next, RFC PATCH 5/5] ice: Add driver support for ndo_queue_set_napi Amritha Nambiar
  2024-04-09 23:21 ` [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Jakub Kicinski
  5 siblings, 0 replies; 13+ messages in thread
From: Amritha Nambiar @ 2024-04-05 20:09 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala,
	amritha.nambiar

When queues are moved between vectors, some vectors may become unused.
The unused vectors need to be freed. When a queue gets assigned to a
previously unused and freed vector, that vector will need to be requested
and set up again. Add the framework functions for this.
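
A condensed view of how these helpers are expected to pair up when a queue
migration empties the old vector and revives a previously idle one (a sketch
only; the full sequence is in patch 5):

	/* the old vector lost its last queue: release its IRQ resources */
	if (ice_is_q_vector_unused(old_qvec))
		ice_qvec_free(vsi, old_qvec);

	/* the new vector was previously idle: request and set up its IRQ */
	if (ice_is_q_vector_unused(new_qvec)) {
		err = ice_qvec_prep(vsi, new_qvec);
		if (err)
			return err;
	}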

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |   12 +++
 drivers/net/ethernet/intel/ice/ice_lib.c  |  117 +++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_lib.h  |    6 +
 drivers/net/ethernet/intel/ice/ice_main.c |   12 ---
 4 files changed, 136 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a2c91fa88e14..d7b67821dc21 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -1010,4 +1010,16 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 
 extern const struct xdp_metadata_ops ice_xdp_md_ops;
 void ice_init_moderation(struct ice_q_vector *q_vector);
+void
+ice_irq_affinity_notify(struct irq_affinity_notify *notify,
+			const cpumask_t *mask);
+/**
+ * ice_irq_affinity_release - Callback for affinity notifier release
+ * @ref: internal core kernel usage
+ *
+ * This is a callback function used by the irq_set_affinity_notifier function
+ * to inform the current notification subscriber that they will no longer
+ * receive notifications.
+ */
+static inline void ice_irq_affinity_release(struct kref __always_unused *ref) {}
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 35389189af1b..419d9561bc2a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -4248,3 +4248,120 @@ ice_q_vector_ena(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 
 	return 0;
 }
+
+static void
+ice_qvec_release_msix(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	struct ice_hw *hw = &vsi->back->hw;
+	struct ice_rx_ring *rx_ring;
+	struct ice_tx_ring *tx_ring;
+
+	ice_write_intrl(q_vector, 0);
+
+	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
+		ice_write_itr(&q_vector->rx, 0);
+		wr32(hw, QINT_RQCTL(vsi->rxq_map[rx_ring->q_index]), 0);
+	}
+
+	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
+		ice_write_itr(&q_vector->tx, 0);
+		wr32(hw, QINT_TQCTL(vsi->txq_map[tx_ring->q_index]), 0);
+	}
+
+	/* Disable the interrupt by writing to the register */
+	wr32(hw, GLINT_DYN_CTL(q_vector->reg_idx), 0);
+	ice_flush(hw);
+}
+
+/**
+ * ice_qvec_free - Free the MSI_X vector
+ * @vsi: the VSI that contains queue vector
+ * @q_vector: queue vector
+ */
+static void __maybe_unused
+ice_qvec_free(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	int irq_num = q_vector->irq.virq;
+	struct ice_pf *pf = vsi->back;
+
+	ice_qvec_release_msix(vsi, q_vector);
+
+#ifdef CONFIG_RFS_ACCEL
+	struct net_device *netdev = vsi->netdev;
+
+	if (netdev && netdev->rx_cpu_rmap)
+		irq_cpu_rmap_remove(netdev->rx_cpu_rmap, irq_num);
+#endif
+
+	/* clear the affinity notifier in the IRQ descriptor */
+	if (!IS_ENABLED(CONFIG_RFS_ACCEL))
+		irq_set_affinity_notifier(irq_num, NULL);
+
+	/* clear the affinity_mask in the IRQ descriptor */
+	irq_set_affinity_hint(irq_num, NULL);
+
+	synchronize_irq(irq_num);
+	devm_free_irq(ice_pf_to_dev(pf), irq_num, q_vector);
+}
+
+/**
+ * ice_qvec_prep - Request and prepare a new MSI_X vector
+ * @vsi: the VSI that contains queue vector
+ * @q_vector: queue vector
+ */
+static int __maybe_unused
+ice_qvec_prep(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+{
+	struct ice_pf *pf = vsi->back;
+	struct device *dev;
+	int err, irq_num;
+
+	dev = ice_pf_to_dev(pf);
+	irq_num = q_vector->irq.virq;
+
+	err = devm_request_irq(dev, irq_num, vsi->irq_handler, 0,
+			       q_vector->name, q_vector);
+	if (err) {
+		netdev_err(vsi->netdev, "MSIX request_irq failed, error: %d\n",
+			   err);
+		goto free_q_irqs;
+	}
+
+	/* register for affinity change notifications */
+	if (!IS_ENABLED(CONFIG_RFS_ACCEL)) {
+		struct irq_affinity_notify *affinity_notify;
+
+		affinity_notify = &q_vector->affinity_notify;
+		affinity_notify->notify = ice_irq_affinity_notify;
+		affinity_notify->release = ice_irq_affinity_release;
+		irq_set_affinity_notifier(irq_num, affinity_notify);
+	}
+
+	/* assign the mask for this irq */
+	irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+
+#ifdef CONFIG_RFS_ACCEL
+	struct net_device *netdev = vsi->netdev;
+
+	if (!netdev) {
+		err = -EINVAL;
+		goto free_q_irqs;
+	}
+
+	if (irq_cpu_rmap_add(netdev->rx_cpu_rmap, irq_num)) {
+		err = -EINVAL;
+		netdev_err(vsi->netdev, "Failed to setup CPU RMAP on irq %u: %pe\n",
+			   irq_num, ERR_PTR(err));
+		goto free_q_irqs;
+	}
+#endif
+	return 0;
+
+free_q_irqs:
+	if (!IS_ENABLED(CONFIG_RFS_ACCEL))
+		irq_set_affinity_notifier(irq_num, NULL);
+	irq_set_affinity_hint(irq_num, NULL);
+	devm_free_irq(dev, irq_num, q_vector);
+
+	return err;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 00239c2efa92..66a9709ff612 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -164,4 +164,10 @@ void ice_qvec_ena_irq(struct ice_vsi *vsi, struct ice_q_vector *q_vector);
 void
 ice_qvec_toggle_napi(struct ice_vsi *vsi, struct ice_q_vector *q_vector,
 		     bool enable);
+static inline bool
+ice_is_q_vector_unused(struct ice_q_vector *q_vector)
+{
+	return (!q_vector->num_ring_tx && !q_vector->num_ring_rx);
+}
+
 #endif /* !_ICE_LIB_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index cd2f467fe3a0..0884b53a0b01 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2476,7 +2476,7 @@ int ice_schedule_reset(struct ice_pf *pf, enum ice_reset_req reset)
  * This is a callback function used by the irq_set_affinity_notifier function
  * so that we may register to receive changes to the irq affinity masks.
  */
-static void
+void
 ice_irq_affinity_notify(struct irq_affinity_notify *notify,
 			const cpumask_t *mask)
 {
@@ -2486,16 +2486,6 @@ ice_irq_affinity_notify(struct irq_affinity_notify *notify,
 	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
-/**
- * ice_irq_affinity_release - Callback for affinity notifier release
- * @ref: internal core kernel usage
- *
- * This is a callback function used by the irq_set_affinity_notifier function
- * to inform the current notification subscriber that they will no longer
- * receive notifications.
- */
-static void ice_irq_affinity_release(struct kref __always_unused *ref) {}
-
 /**
  * ice_vsi_ena_irq - Enable IRQ for the given VSI
  * @vsi: the VSI being configured


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next, RFC PATCH 5/5] ice: Add driver support for ndo_queue_set_napi
  2024-04-05 20:09 [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Amritha Nambiar
                   ` (3 preceding siblings ...)
  2024-04-05 20:09 ` [net-next,RFC PATCH 4/5] ice: Handle unused vectors dynamically Amritha Nambiar
@ 2024-04-05 20:09 ` Amritha Nambiar
  2024-04-09 23:21 ` [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Jakub Kicinski
  5 siblings, 0 replies; 13+ messages in thread
From: Amritha Nambiar @ 2024-04-05 20:09 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala,
	amritha.nambiar

Add support in the ice driver to change the NAPI instance
associated with the queue. This is achieved by updating the
interrupt vector association for the queue.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c  |  311 +++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_lib.h  |    2 
 drivers/net/ethernet/intel/ice/ice_main.c |    1 
 3 files changed, 310 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 419d9561bc2a..3a93b53a0da0 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -4186,7 +4186,7 @@ static void ice_qvec_configure(struct ice_vsi *vsi, struct ice_q_vector *q_vecto
  * @vsi: the VSI that contains queue vector
  * @q_vector: queue vector
  */
-static int __maybe_unused
+static int
 ice_q_vector_dis(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 {
 	struct ice_hw *hw = &vsi->back->hw;
@@ -4221,7 +4221,7 @@ ice_q_vector_dis(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
  * @vsi: the VSI that contains queue vector
  * @q_vector: queue vector
  */
-static int __maybe_unused
+static int
 ice_q_vector_ena(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 {
 	struct ice_rx_ring *rx_ring;
@@ -4278,7 +4278,7 @@ ice_qvec_release_msix(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
  * @vsi: the VSI that contains queue vector
  * @q_vector: queue vector
  */
-static void __maybe_unused
+static void
 ice_qvec_free(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 {
 	int irq_num = q_vector->irq.virq;
@@ -4309,7 +4309,7 @@ ice_qvec_free(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
  * @vsi: the VSI that contains queue vector
  * @q_vector: queue vector
  */
-static int __maybe_unused
+static int
 ice_qvec_prep(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 {
 	struct ice_pf *pf = vsi->back;
@@ -4365,3 +4365,306 @@ ice_qvec_prep(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 
 	return err;
 }
+
+/**
+ * ice_vsi_rename_irq_msix
+ * @vsi: VSI being configured
+ * @basename: name for the vector
+ *
+ * Rename the vector. The default vector names assumed a 1:1 mapping between
+ * queues and vectors in a serial fashion. When the NAPI association for the
+ * queue is changed, it is possible to have multiple queues sharing a vector
+ * in a non-serial way.
+ */
+static void ice_vsi_rename_irq_msix(struct ice_vsi *vsi, char *basename)
+{
+	int q_vectors = vsi->num_q_vectors;
+	int vector, err;
+
+	for (vector = 0; vector < q_vectors; vector++) {
+		struct ice_q_vector *q_vector = vsi->q_vectors[vector];
+
+		if (q_vector->tx.tx_ring && q_vector->rx.rx_ring) {
+			err = snprintf(q_vector->name, sizeof(q_vector->name) - 1,
+				       "%s-%s", basename, "TxRx");
+		} else if (q_vector->rx.rx_ring) {
+			err = snprintf(q_vector->name, sizeof(q_vector->name) - 1,
+				       "%s-%s", basename, "rx");
+		} else if (q_vector->tx.tx_ring) {
+			err = snprintf(q_vector->name, sizeof(q_vector->name) - 1,
+				       "%s-%s", basename, "tx");
+		} else {
+			err = snprintf(q_vector->name, sizeof(q_vector->name) - 1,
+				       "%s", basename);
+		}
+		/* Catching the return quiets a Wformat-truncation complaint */
+		if (err > sizeof(q_vector->name) - 1)
+			netdev_dbg(vsi->netdev, "vector name truncated, ignore\n");
+	}
+}
+
+/**
+ * ice_tx_ring_unmap_qvec - Unmap tx ring from its current q_vector
+ * @tx_ring: Tx ring to be removed
+ *
+ * Unmap tx ring from its current vector association in SW
+ */
+static void
+ice_tx_ring_unmap_qvec(struct ice_tx_ring *tx_ring)
+{
+	struct ice_q_vector *q_vector = tx_ring->q_vector;
+	struct ice_tx_ring *prev, *ring;
+
+	/* Remove a tx ring from its corresponding vector's ring container */
+	ring = q_vector->tx.tx_ring;
+	if (!ring)
+		return;
+
+	if (tx_ring == ring) {
+		q_vector->tx.tx_ring = tx_ring->next;
+		q_vector->num_ring_tx--;
+		return;
+	}
+
+	while (ring && ring != tx_ring) {
+		prev = ring;
+		ring = ring->next;
+	}
+	if (!ring)
+		return;
+	prev->next = ring->next;
+	q_vector->num_ring_tx--;
+}
+
+/**
+ * ice_rx_ring_unmap_qvec - Unmap rx ring from its current q_vector
+ * @rx_ring: rx ring to be removed
+ *
+ * Unmap rx ring from its current vector association in SW
+ */
+static void
+ice_rx_ring_unmap_qvec(struct ice_rx_ring *rx_ring)
+{
+	struct ice_q_vector *q_vector = rx_ring->q_vector;
+	struct ice_rx_ring *prev, *ring;
+
+	/* Remove a rx ring from its corresponding vector's ring container */
+	ring = q_vector->rx.rx_ring;
+	if (!ring)
+		return;
+
+	if (rx_ring == ring) {
+		q_vector->rx.rx_ring = rx_ring->next;
+		q_vector->num_ring_rx--;
+		return;
+	}
+
+	while (ring && ring != rx_ring) {
+		prev = ring;
+		ring = ring->next;
+	}
+	if (!ring)
+		return;
+	prev->next = ring->next;
+	q_vector->num_ring_rx--;
+}
+
+static int
+ice_tx_queue_update_q_vector(struct ice_vsi *vsi, u32 q_idx,
+			     struct ice_q_vector *new_qvec)
+{
+	struct ice_q_vector *old_qvec;
+	struct ice_tx_ring *tx_ring;
+	int timeout = 50;
+	int err;
+
+	if (q_idx >= vsi->num_txq)
+		return -EINVAL;
+	tx_ring = vsi->tx_rings[q_idx];
+	if (!tx_ring)
+		return -EINVAL;
+	old_qvec = tx_ring->q_vector;
+
+	if (old_qvec->irq.virq == new_qvec->irq.virq)
+		return 0;
+
+	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
+		timeout--;
+		if (!timeout)
+			return -EBUSY;
+		usleep_range(1000, 2000);
+	}
+
+	err = ice_q_vector_dis(vsi, old_qvec);
+	if (err)
+		return err;
+
+	ice_tx_ring_unmap_qvec(tx_ring);
+
+	/* free vector if it has no queues as all of its queues are now moved */
+	if (ice_is_q_vector_unused(old_qvec))
+		ice_qvec_free(vsi, old_qvec);
+
+	/* Prepare new q_vector if it was previously unused */
+	if (ice_is_q_vector_unused(new_qvec)) {
+		err = ice_qvec_prep(vsi, new_qvec);
+		if (err)
+			return err;
+	} else {
+		err = ice_q_vector_dis(vsi, new_qvec);
+		if (err)
+			return err;
+	}
+
+	tx_ring->q_vector = new_qvec;
+	tx_ring->next = new_qvec->tx.tx_ring;
+	new_qvec->tx.tx_ring = tx_ring;
+	new_qvec->num_ring_tx++;
+
+	err = ice_q_vector_ena(vsi, new_qvec);
+	if (err)
+		return err;
+
+	if (!ice_is_q_vector_unused(old_qvec)) {
+		err = ice_q_vector_ena(vsi, old_qvec);
+		if (err)
+			return err;
+	}
+
+	clear_bit(ICE_CFG_BUSY, vsi->state);
+
+	return 0;
+}
+
+static int
+ice_rx_queue_update_q_vector(struct ice_vsi *vsi, u32 q_idx,
+			     struct ice_q_vector *new_qvec)
+{
+	struct ice_q_vector *old_qvec;
+	struct ice_rx_ring *rx_ring;
+	int timeout = 50;
+	int err;
+
+	if (q_idx >= vsi->num_rxq)
+		return -EINVAL;
+	rx_ring = vsi->rx_rings[q_idx];
+	if (!rx_ring)
+		return -EINVAL;
+
+	old_qvec = rx_ring->q_vector;
+
+	if (old_qvec->irq.virq == new_qvec->irq.virq)
+		return 0;
+
+	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
+		timeout--;
+		if (!timeout)
+			return -EBUSY;
+		usleep_range(1000, 2000);
+	}
+
+	err = ice_q_vector_dis(vsi, old_qvec);
+	if (err)
+		return err;
+
+	ice_rx_ring_unmap_qvec(rx_ring);
+
+	/* free vector if it has no queues as all of its queues are now moved */
+	if (ice_is_q_vector_unused(old_qvec))
+		ice_qvec_free(vsi, old_qvec);
+
+	/* Prepare new q_vector if it was previously unused */
+	if (ice_is_q_vector_unused(new_qvec)) {
+		err = ice_qvec_prep(vsi, new_qvec);
+		if (err)
+			return err;
+	} else {
+		err = ice_q_vector_dis(vsi, new_qvec);
+		if (err)
+			return err;
+	}
+
+	rx_ring->q_vector = new_qvec;
+	rx_ring->next = new_qvec->rx.rx_ring;
+	new_qvec->rx.rx_ring = rx_ring;
+	new_qvec->num_ring_rx++;
+
+	err = ice_q_vector_ena(vsi, new_qvec);
+	if (err)
+		return err;
+
+	if (!ice_is_q_vector_unused(old_qvec)) {
+		err = ice_q_vector_ena(vsi, old_qvec);
+		if (err)
+			return err;
+	}
+
+	clear_bit(ICE_CFG_BUSY, vsi->state);
+
+	return 0;
+}
+
+/**
+ * ice_vsi_get_vector_from_irq
+ * @vsi: the VSI being configured
+ * @irq_num: Interrupt vector number
+ *
+ * Get the q_vector from the Linux interrupt vector number
+ */
+static struct ice_q_vector *
+ice_vsi_get_vector_from_irq(struct ice_vsi *vsi, int irq_num)
+{
+	int i;
+
+	ice_for_each_q_vector(vsi, i) {
+		if (vsi->q_vectors[i]->irq.virq == irq_num)
+			return vsi->q_vectors[i];
+	}
+	return NULL;
+}
+
+/**
+ * ice_queue_change_napi - Change the NAPI instance for the queue
+ * @dev: device to which NAPI and queue belong
+ * @q_idx: Index of queue
+ * @q_type: queue type as RX or TX
+ * @napi: NAPI context for the queue
+ */
+int ice_queue_change_napi(struct net_device *dev, u32 q_idx, u32 q_type,
+			  struct napi_struct *napi)
+{
+	struct ice_netdev_priv *np = netdev_priv(dev);
+	char int_name[ICE_INT_NAME_STR_LEN];
+	struct ice_q_vector *q_vector;
+	struct ice_vsi *vsi = np->vsi;
+	struct ice_pf *pf = vsi->back;
+	int err;
+
+	q_vector = ice_vsi_get_vector_from_irq(vsi, napi->irq);
+	if (!q_vector)
+		return -EINVAL;
+
+	switch (q_type) {
+	case NETDEV_QUEUE_TYPE_RX:
+		err = ice_rx_queue_update_q_vector(vsi, q_idx, q_vector);
+		if (err)
+			return err;
+		break;
+	case NETDEV_QUEUE_TYPE_TX:
+		err = ice_tx_queue_update_q_vector(vsi, q_idx, q_vector);
+		if (err)
+			return err;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	snprintf(int_name, sizeof(int_name) - 1, "%s-%s",
+		 dev_driver_string(ice_pf_to_dev(pf)), vsi->netdev->name);
+	ice_vsi_rename_irq_msix(vsi, int_name);
+
+	/* Now report to the stack */
+	netif_queue_set_napi(dev, q_idx, q_type, napi);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 66a9709ff612..d53f399487bf 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -170,4 +170,6 @@ ice_is_q_vector_unused(struct ice_q_vector *q_vector)
 	return (!q_vector->num_ring_tx && !q_vector->num_ring_rx);
 }
 
+int ice_queue_change_napi(struct net_device *dev, u32 q_idx, u32 q_type,
+			  struct napi_struct *napi);
 #endif /* !_ICE_LIB_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 0884b53a0b01..08c20f8b17e2 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -9494,4 +9494,5 @@ static const struct net_device_ops ice_netdev_ops = {
 	.ndo_bpf = ice_xdp,
 	.ndo_xdp_xmit = ice_xdp_xmit,
 	.ndo_xsk_wakeup = ice_xsk_wakeup,
+	.ndo_queue_set_napi = ice_queue_change_napi,
 };


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [net-next, RFC PATCH 2/5] netdev-genl: Add netlink framework functions for queue-set NAPI
  2024-04-05 20:09 ` [net-next, RFC PATCH 2/5] netdev-genl: Add netlink framework functions for queue-set NAPI Amritha Nambiar
@ 2024-04-06  0:20   ` David Wei
  2024-04-08 19:45     ` Nambiar, Amritha
  0 siblings, 1 reply; 13+ messages in thread
From: David Wei @ 2024-04-06  0:20 UTC (permalink / raw)
  To: Amritha Nambiar, netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On 2024-04-05 13:09, Amritha Nambiar wrote:
> Implement the netdev netlink framework functions for associating
> a queue with NAPI ID.
> 
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> ---
>  include/linux/netdevice.h |    7 +++
>  net/core/netdev-genl.c    |  117 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 108 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0c198620ac93..70df1cec4a60 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1351,6 +1351,10 @@ struct netdev_net_notifier {
>   *			   struct kernel_hwtstamp_config *kernel_config,
>   *			   struct netlink_ext_ack *extack);
>   *	Change the hardware timestamping parameters for NIC device.
> + *
> + * int (*ndo_queue_set_napi)(struct net_device *dev, u32 q_idx, u32 q_type,
> + *			     struct napi_struct *napi);
> + *	Change the NAPI instance associated with the queue.
>   */
>  struct net_device_ops {
>  	int			(*ndo_init)(struct net_device *dev);
> @@ -1596,6 +1600,9 @@ struct net_device_ops {
>  	int			(*ndo_hwtstamp_set)(struct net_device *dev,
>  						    struct kernel_hwtstamp_config *kernel_config,
>  						    struct netlink_ext_ack *extack);
> +	int			(*ndo_queue_set_napi)(struct net_device *dev,
> +						      u32 q_idx, u32 q_type,
> +						      struct napi_struct *napi);
>  };
>  
>  /**
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index d5b2e90e5709..6b3d3165d76e 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -288,12 +288,29 @@ int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
>  	return err;
>  }
>  
> +/* must be called under rtnl_lock() */
> +static struct napi_struct *
> +napi_get_by_queue(struct net_device *netdev, u32 q_idx, u32 q_type)
> +{
> +	struct netdev_rx_queue *rxq;
> +	struct netdev_queue *txq;
> +
> +	switch (q_type) {
> +	case NETDEV_QUEUE_TYPE_RX:
> +		rxq = __netif_get_rx_queue(netdev, q_idx);
> +		return rxq->napi;
> +	case NETDEV_QUEUE_TYPE_TX:
> +		txq = netdev_get_tx_queue(netdev, q_idx);
> +		return txq->napi;
> +	}
> +	return NULL;
> +}
> +
>  static int
>  netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
>  			 u32 q_idx, u32 q_type, const struct genl_info *info)
>  {
> -	struct netdev_rx_queue *rxq;
> -	struct netdev_queue *txq;
> +	struct napi_struct *napi;
>  	void *hdr;
>  
>  	hdr = genlmsg_iput(rsp, info);
> @@ -305,19 +322,9 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
>  	    nla_put_u32(rsp, NETDEV_A_QUEUE_IFINDEX, netdev->ifindex))
>  		goto nla_put_failure;
>  
> -	switch (q_type) {
> -	case NETDEV_QUEUE_TYPE_RX:
> -		rxq = __netif_get_rx_queue(netdev, q_idx);
> -		if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
> -					     rxq->napi->napi_id))
> -			goto nla_put_failure;
> -		break;
> -	case NETDEV_QUEUE_TYPE_TX:
> -		txq = netdev_get_tx_queue(netdev, q_idx);
> -		if (txq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
> -					     txq->napi->napi_id))
> -			goto nla_put_failure;
> -	}
> +	napi = napi_get_by_queue(netdev, q_idx, q_type);
> +	if (napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID, napi->napi_id))
> +		goto nla_put_failure;
>  
>  	genlmsg_end(rsp, hdr);
>  
> @@ -674,9 +681,87 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
>  	return err;
>  }
>  
> +static int
> +netdev_nl_queue_set_napi(struct sk_buff *rsp, struct net_device *netdev,
> +			 u32 q_idx, u32 q_type, u32 napi_id,
> +			 const struct genl_info *info)
> +{
> +	struct napi_struct *napi, *old_napi;
> +	int err;
> +
> +	if (!(netdev->flags & IFF_UP))
> +		return 0;

Should this be an error code?

> +
> +	err = netdev_nl_queue_validate(netdev, q_idx, q_type);
> +	if (err)
> +		return err;
> +
> +	old_napi = napi_get_by_queue(netdev, q_idx, q_type);
> +	if (old_napi && old_napi->napi_id == napi_id)
> +		return 0;

Same as above, I think this should be an error.

> +
> +	napi = napi_by_id(napi_id);
> +	if (!napi)
> +		return -EINVAL;
> +
> +	err = netdev->netdev_ops->ndo_queue_set_napi(netdev, q_idx, q_type, napi);
> +
> +	return err;

nit: return ndo_queue_set_napi() would save two lines.

> +}
> +
>  int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info)
>  {
> -	return -EOPNOTSUPP;
> +	u32 q_id, q_type, ifindex;

nit: q_idx for consistency?

> +	struct net_device *netdev;
> +	struct sk_buff *rsp;
> +	u32 napi_id = 0;
> +	int err = 0;
> +
> +	if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_ID) ||
> +	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_TYPE) ||
> +	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_IFINDEX))
> +		return -EINVAL;
> +
> +	q_id = nla_get_u32(info->attrs[NETDEV_A_QUEUE_ID]);
> +	q_type = nla_get_u32(info->attrs[NETDEV_A_QUEUE_TYPE]);
> +	ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
> +
> +	if (info->attrs[NETDEV_A_QUEUE_NAPI_ID]) {
> +		napi_id = nla_get_u32(info->attrs[NETDEV_A_QUEUE_NAPI_ID]);
> +		if (napi_id < MIN_NAPI_ID)
> +			return -EINVAL;
> +	}
> +
> +	rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!rsp)
> +		return -ENOMEM;
> +
> +	rtnl_lock();
> +
> +	netdev = __dev_get_by_index(genl_info_net(info), ifindex);
> +	if (netdev) {
> +		if (!napi_id)
> +			GENL_SET_ERR_MSG(info, "No queue parameters changed\n");

Could this be checked earlier outside of rtnl_lock? I feel like not
setting a napi_id here is EINVAL.

> +		else
> +			err = netdev_nl_queue_set_napi(rsp, netdev, q_id,
> +						       q_type, napi_id, info);
> +		if (!err)
> +			err = netdev_nl_queue_fill_one(rsp, netdev, q_id,
> +						       q_type, info);
> +	} else {
> +		err = -ENODEV;

Could be less nesty (completely untested):

	if (!netdev) {
		err = -ENODEV;
		goto err_rtnl_unlock;
	}

	if (!napi_id) {
		GENL_SET_ERR_MSG(info, "No queue parameters changed\n");
		goto err_nonapi;
	}

	err = netdev_nl_queue_set_napi(rsp, netdev, q_id,
				       q_type, napi_id, info);
	if (err)
		goto err_rtnl_unlock;

err_nonapi:
	err = netdev_nl_queue_fill_one(rsp, netdev, q_id,
				       q_type, info);

err_rtnl_unlock:
	rtnl_unlock();

	if (!err)
		return genlmsg_reply(rsp, info);

err_free_msg:
	nlmsg_free(rsp);
	return err;

> +	}
> +
> +	rtnl_unlock();
> +
> +	if (err)
> +		goto err_free_msg;
> +
> +	return genlmsg_reply(rsp, info);
> +
> +err_free_msg:
> +	nlmsg_free(rsp);
> +	return err;
>  }
>  
>  static int netdev_genl_netdevice_event(struct notifier_block *nb,
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next, RFC PATCH 2/5] netdev-genl: Add netlink framework functions for queue-set NAPI
  2024-04-06  0:20   ` David Wei
@ 2024-04-08 19:45     ` Nambiar, Amritha
  0 siblings, 0 replies; 13+ messages in thread
From: Nambiar, Amritha @ 2024-04-08 19:45 UTC (permalink / raw)
  To: David Wei, netdev, kuba, davem
  Cc: edumazet, pabeni, ast, sdf, lorenzo, tariqt, daniel,
	anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On 4/5/2024 5:20 PM, David Wei wrote:
> On 2024-04-05 13:09, Amritha Nambiar wrote:
>> Implement the netdev netlink framework functions for associating
>> a queue with NAPI ID.
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>>   include/linux/netdevice.h |    7 +++
>>   net/core/netdev-genl.c    |  117 +++++++++++++++++++++++++++++++++++++++------
>>   2 files changed, 108 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 0c198620ac93..70df1cec4a60 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1351,6 +1351,10 @@ struct netdev_net_notifier {
>>    *			   struct kernel_hwtstamp_config *kernel_config,
>>    *			   struct netlink_ext_ack *extack);
>>    *	Change the hardware timestamping parameters for NIC device.
>> + *
>> + * int (*ndo_queue_set_napi)(struct net_device *dev, u32 q_idx, u32 q_type,
>> + *			     struct napi_struct *napi);
>> + *	Change the NAPI instance associated with the queue.
>>    */
>>   struct net_device_ops {
>>   	int			(*ndo_init)(struct net_device *dev);
>> @@ -1596,6 +1600,9 @@ struct net_device_ops {
>>   	int			(*ndo_hwtstamp_set)(struct net_device *dev,
>>   						    struct kernel_hwtstamp_config *kernel_config,
>>   						    struct netlink_ext_ack *extack);
>> +	int			(*ndo_queue_set_napi)(struct net_device *dev,
>> +						      u32 q_idx, u32 q_type,
>> +						      struct napi_struct *napi);
>>   };
>>   
>>   /**
>> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
>> index d5b2e90e5709..6b3d3165d76e 100644
>> --- a/net/core/netdev-genl.c
>> +++ b/net/core/netdev-genl.c
>> @@ -288,12 +288,29 @@ int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
>>   	return err;
>>   }
>>   
>> +/* must be called under rtnl_lock() */
>> +static struct napi_struct *
>> +napi_get_by_queue(struct net_device *netdev, u32 q_idx, u32 q_type)
>> +{
>> +	struct netdev_rx_queue *rxq;
>> +	struct netdev_queue *txq;
>> +
>> +	switch (q_type) {
>> +	case NETDEV_QUEUE_TYPE_RX:
>> +		rxq = __netif_get_rx_queue(netdev, q_idx);
>> +		return rxq->napi;
>> +	case NETDEV_QUEUE_TYPE_TX:
>> +		txq = netdev_get_tx_queue(netdev, q_idx);
>> +		return txq->napi;
>> +	}
>> +	return NULL;
>> +}
>> +
>>   static int
>>   netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
>>   			 u32 q_idx, u32 q_type, const struct genl_info *info)
>>   {
>> -	struct netdev_rx_queue *rxq;
>> -	struct netdev_queue *txq;
>> +	struct napi_struct *napi;
>>   	void *hdr;
>>   
>>   	hdr = genlmsg_iput(rsp, info);
>> @@ -305,19 +322,9 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
>>   	    nla_put_u32(rsp, NETDEV_A_QUEUE_IFINDEX, netdev->ifindex))
>>   		goto nla_put_failure;
>>   
>> -	switch (q_type) {
>> -	case NETDEV_QUEUE_TYPE_RX:
>> -		rxq = __netif_get_rx_queue(netdev, q_idx);
>> -		if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
>> -					     rxq->napi->napi_id))
>> -			goto nla_put_failure;
>> -		break;
>> -	case NETDEV_QUEUE_TYPE_TX:
>> -		txq = netdev_get_tx_queue(netdev, q_idx);
>> -		if (txq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
>> -					     txq->napi->napi_id))
>> -			goto nla_put_failure;
>> -	}
>> +	napi = napi_get_by_queue(netdev, q_idx, q_type);
>> +	if (napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID, napi->napi_id))
>> +		goto nla_put_failure;
>>   
>>   	genlmsg_end(rsp, hdr);
>>   
>> @@ -674,9 +681,87 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
>>   	return err;
>>   }
>>   
>> +static int
>> +netdev_nl_queue_set_napi(struct sk_buff *rsp, struct net_device *netdev,
>> +			 u32 q_idx, u32 q_type, u32 napi_id,
>> +			 const struct genl_info *info)
>> +{
>> +	struct napi_struct *napi, *old_napi;
>> +	int err;
>> +
>> +	if (!(netdev->flags & IFF_UP))
>> +		return 0;
> 
> Should this be an error code?

I thought I could return 0 like the _get_ functions do, since this is a 
no-op and not really an error.

> 
>> +
>> +	err = netdev_nl_queue_validate(netdev, q_idx, q_type);
>> +	if (err)
>> +		return err;
>> +
>> +	old_napi = napi_get_by_queue(netdev, q_idx, q_type);
>> +	if (old_napi && old_napi->napi_id == napi_id)
>> +		return 0;
> 
> Same as above, I think this should be an error.

I was thinking I could follow the ethtool semantics here: when there is no 
update (the parameter values are not modified), treat it as a no-op and 
proceed to display the current values instead of returning an error.

> 
>> +
>> +	napi = napi_by_id(napi_id);
>> +	if (!napi)
>> +		return -EINVAL;
>> +
>> +	err = netdev->netdev_ops->ndo_queue_set_napi(netdev, q_idx, q_type, napi);
>> +
>> +	return err;
> 
> nit: return ndo_queue_set_napi() would save two lines.

Sure, will fix in the next version.

> 
>> +}
>> +
>>   int netdev_nl_queue_set_doit(struct sk_buff *skb, struct genl_info *info)
>>   {
>> -	return -EOPNOTSUPP;
>> +	u32 q_id, q_type, ifindex;
> 
> nit: q_idx for consistency?

Sure, will fix.

> 
>> +	struct net_device *netdev;
>> +	struct sk_buff *rsp;
>> +	u32 napi_id = 0;
>> +	int err = 0;
>> +
>> +	if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_ID) ||
>> +	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_TYPE) ||
>> +	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_IFINDEX))
>> +		return -EINVAL;
>> +
>> +	q_id = nla_get_u32(info->attrs[NETDEV_A_QUEUE_ID]);
>> +	q_type = nla_get_u32(info->attrs[NETDEV_A_QUEUE_TYPE]);
>> +	ifindex = nla_get_u32(info->attrs[NETDEV_A_QUEUE_IFINDEX]);
>> +
>> +	if (info->attrs[NETDEV_A_QUEUE_NAPI_ID]) {
>> +		napi_id = nla_get_u32(info->attrs[NETDEV_A_QUEUE_NAPI_ID]);
>> +		if (napi_id < MIN_NAPI_ID)
>> +			return -EINVAL;
>> +	}
>> +
>> +	rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>> +	if (!rsp)
>> +		return -ENOMEM;
>> +
>> +	rtnl_lock();
>> +
>> +	netdev = __dev_get_by_index(genl_info_net(info), ifindex);
>> +	if (netdev) {
>> +		if (!napi_id)
>> +			GENL_SET_ERR_MSG(info, "No queue parameters changed\n");
> 
> Could this be checked earlier outside of rtnl_lock? I feel like not
> setting a napi_id here is EINVAL.
> 

I agree this part is a little ambiguous, as I was trying to handle a few 
things here that I was not really sure of. The idea was that 
netdev_nl_queue_set_doit() could be the one function that handles all set 
parameters for the queue object. Today, only napi-id is supported, but in 
the future there may be additional configurable attributes for the queue 
object beyond napi-id. It should be possible to set those parameters from 
netdev_nl_queue_set_doit() as well, which implies that not setting napi-id 
is not EINVAL, since some other parameter could be set. If none of the 
configurable parameters are set, display the message and the current 
settings for the queue.
So the code today handles it this way: if the user has given a napi-id and 
it is valid, proceed to the ndo callback. If the user has not specified a 
napi-id to update, it is not an error case; display the current settings, 
the same way ethtool set-params does (although ethtool does this in 
userspace). Ideally, this would be handled in userspace for netdev-genl 
too; maybe some tricks in the YAML could make the set command check for 
the presence of at least one configurable attribute in netdev-user.c.
I think the following in the YAML would make napi-id mandatory:
name: napi-id
checks:
   min: 1

But that's not exactly what we want for a generic set command; what I was 
aiming for was to require any one attribute from among a group of 
attributes.
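
Just to illustrate the "at least one of a group" requirement on the kernel 
side (a rough sketch only; the future attribute mentioned in the comment is 
hypothetical), a helper like this could gate the set path:

/* sketch: true if the request carries at least one settable queue attribute */
static bool netdev_nl_queue_has_set_attr(const struct genl_info *info)
{
	return info->attrs[NETDEV_A_QUEUE_NAPI_ID]
	       /* || info->attrs[<some future settable attribute>] */;
}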

>> +		else
>> +			err = netdev_nl_queue_set_napi(rsp, netdev, q_id,
>> +						       q_type, napi_id, info);
>> +		if (!err)
>> +			err = netdev_nl_queue_fill_one(rsp, netdev, q_id,
>> +						       q_type, info);
>> +	} else {
>> +		err = -ENODEV;
> 
> Could be less nesty (completely untested):
> 

Sure, thanks. I'll make this less nesty.

> 	if (!netdev) {
> 		err = -ENODEV;
> 		goto err_rtnl_unlock;
> 	}
> 
> 	if (!napi_id) {
> 		GENL_SET_ERR_MSG(info, "No queue parameters changed\n");
> 		goto err_nonapi;
> 	}
> 
> 	err = netdev_nl_queue_set_napi(rsp, netdev, q_id,
> 				       q_type, napi_id, info);
> 	if (err)
> 		goto err_rtnl_unlock;
> 
> err_nonapi:
> 	err = netdev_nl_queue_fill_one(rsp, netdev, q_id,
> 				       q_type, info);
> 
> err_rtnl_unlock:
> 	rtnl_unlock();
> 
> 	if (!err)
> 		return genlmsg_reply(rsp, info);
> 
> err_free_msg:
> 	nlmsg_free(rsp);
> 	return err;
> 
>> +	}
>> +
>> +	rtnl_unlock();
>> +
>> +	if (err)
>> +		goto err_free_msg;
>> +
>> +	return genlmsg_reply(rsp, info);
>> +
>> +err_free_msg:
>> +	nlmsg_free(rsp);
>> +	return err;
>>   }
>>   
>>   static int netdev_genl_netdevice_event(struct notifier_block *nb,
>>

* Re: [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue
  2024-04-05 20:09 [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Amritha Nambiar
                   ` (4 preceding siblings ...)
  2024-04-05 20:09 ` [net-next, RFC PATCH 5/5] ice: Add driver support for ndo_queue_set_napi Amritha Nambiar
@ 2024-04-09 23:21 ` Jakub Kicinski
  2024-04-11 22:46   ` Nambiar, Amritha
  5 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-04-09 23:21 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: netdev, davem, edumazet, pabeni, ast, sdf, lorenzo, tariqt,
	daniel, anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On Fri, 05 Apr 2024 13:09:28 -0700 Amritha Nambiar wrote:
> $ ./cli.py --spec netdev.yaml --do queue-set  --json='{"ifindex": 12, "id": 0, "type": 0, "napi-id": 595}'
> {'id': 0, 'ifindex': 12, 'napi-id': 595, 'type': 'rx'}

NAPI ID is not stable. What happens if you set the ID, bring the
device down and up again? I think we need to make NAPI IDs stable.

What happens if you change the channel count? Do we lose the config?
We try never to lose explicit user config. I think for simplicity
we should store the config in the core / common code.

How does the user know whether queue <> NAPI association is based
on driver defaults or explicit configuration? I think I mentioned
this in earlier discussions but the configuration may need to be
detached from the existing objects (for one thing they may not exist 
at all when the device is down).

Last but not least, your driver patch implements the start/stop steps
of the "queue API"; I think we should pull that out into the core.

Also, the tests now exist - take a look at the sample one in
tools/testing/selftests/drivers/net/stats.py
It would be great to have all future netdev family extensions accompanied
by tests which can run both on real HW and netdevsim.

* Re: [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue
  2024-04-09 23:21 ` [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue Jakub Kicinski
@ 2024-04-11 22:46   ` Nambiar, Amritha
  2024-04-12  1:47     ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Nambiar, Amritha @ 2024-04-11 22:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, ast, sdf, lorenzo, tariqt,
	daniel, anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On 4/9/2024 4:21 PM, Jakub Kicinski wrote:
> On Fri, 05 Apr 2024 13:09:28 -0700 Amritha Nambiar wrote:
>> $ ./cli.py --spec netdev.yaml --do queue-set  --json='{"ifindex": 12, "id": 0, "type": 0, "napi-id": 595}'
>> {'id': 0, 'ifindex': 12, 'napi-id': 595, 'type': 'rx'}
> 
> NAPI ID is not stable. What happens if you set the ID, bring the
> device down and up again? I think we need to make NAPI IDs stable.
> 

I tried this (device down/up and then checking the NAPIs) on both bnxt and 
intel/ice.
On bnxt: new NAPI IDs are created sequentially once the device is brought 
up after being taken down.
On ice: the NAPI IDs are stable and remain the same once the device is 
brought up after being taken down.

In the case of ice, device down/up executes napi_disable/napi_enable. The 
NAPI IDs are not lost because netif_napi_del is not called at IFF_DOWN. On 
IFF_DOWN, the IRQ associations with the OS are freed, but the resources 
allocated for the vectors, and hence the NAPIs for those vectors, persist 
(unless the driver is unloaded or reconfigured).
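
Roughly, the flow looks like this (just a sketch of the idea, not the 
actual ice code):

/* sketch: on IFF_DOWN only napi_disable() runs, netif_napi_del() does not,
 * so napi->napi_id is preserved across a down/up cycle */
static void sketch_ifdown(struct napi_struct *napi)
{
	napi_disable(napi);	/* stop polling, the NAPI ID is kept */
}

static void sketch_ifup(struct napi_struct *napi)
{
	napi_enable(napi);	/* resume polling with the same napi_id */
}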

> What happens if you change the channel count? Do we lose the config?
> We try never to lose explicit user config. I think for simplicity
> we should store the config in the core / common code.
> 

Yes, we lose the config when the channels are re-configured. The reconfig 
path frees the vectors and reallocates them based on the new channel 
config, so for the NAPIs associated with the vectors, netif_napi_del and 
netif_napi_add execute, creating new NAPI IDs sequentially.

Wouldn't losing the explicit user config make sense in this case? By 
changing the channel count, the user has updated the queue layout, the 
queue<>vector mappings, etc., so I think the previous queue<>NAPI config 
should be overwritten by the new config from set-channel.
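
For contrast with the down/up case, the reconfig path is roughly (again a 
sketch with illustrative names, not the actual driver code):

/* sketch: channel reconfig deletes and re-adds the NAPI,
 * so a fresh napi_id is allocated each time */
static void sketch_reconfig(struct net_device *dev, struct napi_struct *napi,
			    int (*poll)(struct napi_struct *, int))
{
	netif_napi_del(napi);			/* the old napi_id is released */
	netif_napi_add(dev, napi, poll);	/* a new napi_id is assigned */
}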

> How does the user know whether queue <> NAPI association is based
> on driver defaults or explicit configuration?

I am not sure of this. ethtool shows pre-set defaults and current 
settings, but in this case, it is tricky :(

> I think I mentioned
> this in earlier discussions but the configuration may need to be
> detached from the existing objects (for one thing they may not exist
> at all when the device is down).
> 

Yes, we did have that discussion about detaching the configuration from 
the queue/NAPI objects. But I am not sure how to accomplish that. Any 
thoughts on what other object could be used for the configuration?
WRT ice, when the device is down, the queues are still listed and exist as 
inactive queues, the NAPI IDs still exist, and the IRQ associations with 
the OS are freed.

> Last but not least your driver patch implements the start/stop steps
> of the "queue API" I think we should pull that out into the core.
> 

Agreed, it would be good to have these steps in the core, but I think the 
challenge is that we would still end up with a lot of code in the driver 
as well, due to all the hardware-centric bits in it.

> Also the tests now exist - take a look at the sample one in
> tools/testing/selftests/drivers/net/stats.py
> Would be great to have all future netdev family extensions accompanied
> by tests which can run both on real HW and netdevsim.

Okay, I will write tests for the new extensions here.

* Re: [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue
  2024-04-11 22:46   ` Nambiar, Amritha
@ 2024-04-12  1:47     ` Jakub Kicinski
  2024-04-18 21:23       ` Nambiar, Amritha
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-04-12  1:47 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, davem, edumazet, pabeni, ast, sdf, lorenzo, tariqt,
	daniel, anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On Thu, 11 Apr 2024 15:46:45 -0700 Nambiar, Amritha wrote:
> On 4/9/2024 4:21 PM, Jakub Kicinski wrote:
> > On Fri, 05 Apr 2024 13:09:28 -0700 Amritha Nambiar wrote:  
> >> $ ./cli.py --spec netdev.yaml --do queue-set  --json='{"ifindex": 12, "id": 0, "type": 0, "napi-id": 595}'
> >> {'id': 0, 'ifindex': 12, 'napi-id': 595, 'type': 'rx'}  
> > 
> > NAPI ID is not stable. What happens if you set the ID, bring the
> > device down and up again? I think we need to make NAPI IDs stable.
> 
> I tried this (device down/up and check NAPIs) on both bnxt and intel/ice.
> On bnxt: New NAPI IDs are created sequentially once the device is up 
> after turning down.
> On ice: The NAPI IDs are stable and remains the same once the device is 
> up after turning down.
> 
> In case of ice, device down/up executes napi_disable/napi_enable. The 
> NAPI IDs are not lost as netif_napi_del is not called at IFF_DOWN. On 
> IFF_DOWN, the IRQs associations with the OS are freed, but the resources 
> allocated for the vectors and hence the NAPIs for the vectors persists 
> (unless unload/reconfig).

SG! So let's just make sure we cover that in tests.

> > What happens if you change the channel count? Do we lose the config?
> > We try never to lose explicit user config. I think for simplicity
> > we should store the config in the core / common code.
> 
> Yes, we lose the config in case of re-configuring channels. The reconfig 
> path involves freeing the vectors and reallocating based on the new 
> channel config, so, for the NAPIs associated with the vectors, 
> netif_napi_del and netif_napi_add executes creating new NAPI IDs 
> sequentially.
> 
> Wouldn't losing the explicit user config make sense in this case? By 
> changing the channel count, the user has updated the queue layout, the 
> queue<>vector mappings etc., so I think, the previous configs from set 
> queue<>NAPI should be overwritten with the new config from set-channel.

We do prevent the indirection table from being reset on channel count
change. I think the same logic applies here...

> > How does the user know whether queue <> NAPI association is based
> > on driver defaults or explicit configuration?  
> 
> I am not sure of this. ethtool shows pre-set defaults and current 
> settings, but in this case, it is tricky :(

Can you say more about the use case for moving the queues around?
If you just want to have fewer NAPI vectors and more queues, but
don't care about exact mapping - we could probably come up with
a simpler API, no? Are the queues stack queues or also AF_XDP?

> > I think I mentioned
> > this in earlier discussions but the configuration may need to be
> > detached from the existing objects (for one thing they may not exist
> > at all when the device is down).
> 
> Yes, we did have that discussion about detaching queues from NAPI. But, 
> I am not sure how to accomplish that. Any thoughts on what other 
> possible object can be used for the configuration?

We could stick to the queue as the object perhaps. The "target NAPI"
would just be part of the config passed to the alloc/start callbacks.
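
Loosely, something along these lines (purely a sketch -- the struct and 
callback names below are made up, not an existing API):

/* sketch only: the desired NAPI becomes part of the per-queue config
 * handed to the queue alloc/start callbacks */
struct queue_cfg_sketch {
	u32	napi_id;	/* 0 == driver default mapping */
	/* other per-queue attributes over time */
};

struct queue_ops_sketch {
	int	(*queue_alloc)(struct net_device *dev, int idx,
			       const struct queue_cfg_sketch *cfg);
	int	(*queue_start)(struct net_device *dev, int idx,
			       const struct queue_cfg_sketch *cfg);
};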

> WRT ice, when the device is down, the queues are listed and exists as 
> inactive queues, NAPI IDs exists, IRQs associations with the OS are freed.
> 
> > Last but not least your driver patch implements the start/stop steps
> > of the "queue API" I think we should pull that out into the core.
> 
> Agree, it would be good to have these steps in the core, but I think the 
> challenge is that we would still end up with a lot of code in the driver 
> as well, due to all the hardware-centric bits in it.

For one feature I think adding code in the core is not beneficial.
But we have multiple adjacent needs, so when we add up your work,
zero copy, page pool config, maybe queue alloc.. hopefully the code
in the core will be net positive.

> > Also the tests now exist - take a look at the sample one in
> > tools/testing/selftests/drivers/net/stats.py
> > Would be great to have all future netdev family extensions accompanied
> > by tests which can run both on real HW and netdevsim.  
> 
> Okay, I will write tests for the new extensions here.

* Re: [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue
  2024-04-12  1:47     ` Jakub Kicinski
@ 2024-04-18 21:23       ` Nambiar, Amritha
  2024-04-19  1:37         ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Nambiar, Amritha @ 2024-04-18 21:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, ast, sdf, lorenzo, tariqt,
	daniel, anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On 4/11/2024 6:47 PM, Jakub Kicinski wrote:
> On Thu, 11 Apr 2024 15:46:45 -0700 Nambiar, Amritha wrote:
>> On 4/9/2024 4:21 PM, Jakub Kicinski wrote:
>>> On Fri, 05 Apr 2024 13:09:28 -0700 Amritha Nambiar wrote:
>>>> $ ./cli.py --spec netdev.yaml --do queue-set  --json='{"ifindex": 12, "id": 0, "type": 0, "napi-id": 595}'
>>>> {'id': 0, 'ifindex': 12, 'napi-id': 595, 'type': 'rx'}
>>>
>>> NAPI ID is not stable. What happens if you set the ID, bring the
>>> device down and up again? I think we need to make NAPI IDs stable.
>>
>> I tried this (device down/up and check NAPIs) on both bnxt and intel/ice.
>> On bnxt: New NAPI IDs are created sequentially once the device is up
>> after turning down.
>> On ice: The NAPI IDs are stable and remains the same once the device is
>> up after turning down.
>>
>> In case of ice, device down/up executes napi_disable/napi_enable. The
>> NAPI IDs are not lost as netif_napi_del is not called at IFF_DOWN. On
>> IFF_DOWN, the IRQs associations with the OS are freed, but the resources
>> allocated for the vectors and hence the NAPIs for the vectors persists
>> (unless unload/reconfig).
> 
> SG! So let's just make sure we cover that in tests.
> 
>>> What happens if you change the channel count? Do we lose the config?
>>> We try never to lose explicit user config. I think for simplicity
>>> we should store the config in the core / common code.
>>
>> Yes, we lose the config in case of re-configuring channels. The reconfig
>> path involves freeing the vectors and reallocating based on the new
>> channel config, so, for the NAPIs associated with the vectors,
>> netif_napi_del and netif_napi_add executes creating new NAPI IDs
>> sequentially.
>>
>> Wouldn't losing the explicit user config make sense in this case? By
>> changing the channel count, the user has updated the queue layout, the
>> queue<>vector mappings etc., so I think, the previous configs from set
>> queue<>NAPI should be overwritten with the new config from set-channel.
> 
> We do prevent indirection table from being reset on channel count
> change. I think same logic applies here..
> 

Okay. I tried this on bnxt (this may be outside the scope and secondary, 
but hopefully the additional information helps).
It looks like bnxt differentiates whether the indirection table is based 
on driver defaults or on explicit user configuration. If the indirection 
table came from driver defaults, changing the channel count to fewer 
queues is allowed. If it was based on explicit user configuration, 
changing the channel count to fewer queues is not allowed, as the 
indirection table might then point to inactive queues. So the RSS user 
configuration is preserved by blocking new channel configurations that do 
not align with it.
Applying the same logic here would mean that changing the channel count to 
fewer queues than the 'default queue for the last user-configured NAPI ID' 
would have to be prevented. This becomes difficult to track unless the 
pre-set default queue <> NAPI configs are maintained.
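
Roughly the kind of guard being described (a sketch of the idea only, not 
the actual bnxt code):

/* sketch: refuse shrinking the channel count if an explicit user RSS
 * config references a queue that would become inactive */
static int sketch_check_channel_shrink(bool rss_user_cfg, u32 max_rss_queue,
				       u32 new_rx_count)
{
	if (rss_user_cfg && max_rss_queue >= new_rx_count)
		return -EINVAL;	/* table would point at inactive queues */
	return 0;
}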

>>> How does the user know whether queue <> NAPI association is based
>>> on driver defaults or explicit configuration?
>>
>> I am not sure of this. ethtool shows pre-set defaults and current
>> settings, but in this case, it is tricky :(
> 
> Can you say more about the use case for moving the queues around?
> If you just want to have fewer NAPI vectors and more queues, but
> don't care about exact mapping - we could probably come up with
> a simpler API, no? Are the queues stack queues or also AF_XDP?
> 

I'll try to explain. The goal is to have fewer NAPI pollers. The number 
of NAPI pollers is the same as the number of active NAPIs (one kthread per 
NAPI). It is possible to limit the number of pollers by mapping multiple 
queues to an interrupt vector (fewer vectors, more queues) implicitly in 
the driver. But we are looking for a more granular approach: in our case, 
the queues are grouped into queue-groups/rss-contexts, and we would like 
to reduce the number of pollers within certain selected 
queue-groups/rss-contexts (not all of the queue-groups), hence the need 
for configurability.
This would benefit our hyper-threading use case, where a single physical 
core is used for both network and application processing. If the NAPI to 
queue association is known, we can pin the NAPI thread to a logical core 
and the application thread to the corresponding sibling logical core.

The queues are stack queues, not AF_XDP.

>>> I think I mentioned
>>> this in earlier discussions but the configuration may need to be
>>> detached from the existing objects (for one thing they may not exist
>>> at all when the device is down).
>>
>> Yes, we did have that discussion about detaching queues from NAPI. But,
>> I am not sure how to accomplish that. Any thoughts on what other
>> possible object can be used for the configuration?
> 
> We could stick to the queue as the object perhaps. The "target NAPI"
> would just be part of the config passed to the alloc/start callbacks.
> 

Okay.

>> WRT ice, when the device is down, the queues are listed and exists as
>> inactive queues, NAPI IDs exists, IRQs associations with the OS are freed.
>>
>>> Last but not least your driver patch implements the start/stop steps
>>> of the "queue API" I think we should pull that out into the core.
>>
>> Agree, it would be good to have these steps in the core, but I think the
>> challenge is that we would still end up with a lot of code in the driver
>> as well, due to all the hardware-centric bits in it.
> 
> For one feature I think adding code in the core is not beneficial.
> But we have multiple adjacent needs, so when we add up your work,
> zero copy, page pool config, maybe queue alloc.. hopefully the code
> in the core will be net positive.
> 
>>> Also the tests now exist - take a look at the sample one in
>>> tools/testing/selftests/drivers/net/stats.py
>>> Would be great to have all future netdev family extensions accompanied
>>> by tests which can run both on real HW and netdevsim.
>>
>> Okay, I will write tests for the new extensions here.

* Re: [net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue
  2024-04-18 21:23       ` Nambiar, Amritha
@ 2024-04-19  1:37         ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2024-04-19  1:37 UTC (permalink / raw)
  To: Nambiar, Amritha
  Cc: netdev, davem, edumazet, pabeni, ast, sdf, lorenzo, tariqt,
	daniel, anthony.l.nguyen, lucien.xin, hawk, sridhar.samudrala

On Thu, 18 Apr 2024 14:23:03 -0700 Nambiar, Amritha wrote:
> >> I am not sure of this. ethtool shows pre-set defaults and current
> >> settings, but in this case, it is tricky :(  
> > 
> > Can you say more about the use case for moving the queues around?
> > If you just want to have fewer NAPI vectors and more queues, but
> > don't care about exact mapping - we could probably come up with
> > a simpler API, no? Are the queues stack queues or also AF_XDP?
> 
> I'll try to explain. The goal is to have fewer NAPI pollers. The number 
> of NAPI pollers is the same as the number of active NAPIs (kthread per 
> NAPI). It is possible to limit the number of pollers by mapping 
> multiples queues on an interrupt vector (fewer vectors, more queues) 
> implicitly in the driver. But, we are looking for a more granular 
> approach, in our case, the queues are grouped into 
> queue-groups/rss-contexts. We would like to reduce the number of pollers 
> within certain selected queue-groups/rss-contexts (not all the 
> queue-groups), hence need the configurability.
> This would benefit our hyper-threading use case, where a single physical 
> core can be used for both network and application processing. If the 
> NAPI to queue association is known, we can pin the NAPI thread to the 
> logical core and the application thread to the corresponding sibling 
> logical core.

Could you provide a more detailed example of a desired configuration? 
I'm not sure I'm getting the point.

What's the point of having multiple queues if they end up in the same
NAPI?
