* [PATCH net-next RFC 0/3] net: Add support to configure SR-IOV VF queues.
@ 2018-05-09 11:21 Michael Chan
2018-05-09 11:21 ` [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
To: davem; +Cc: netdev
VF Queue resources are always limited and there is currently no
infrastructure to allow the admin. on the host to add or reduce queue
resources for any particular VF. This RFC series adds that infrastructure
and implements the functionality in the bnxt_en driver.
The "ip link set" command will subsequently be patched to support the new
operation.
Michael Chan (3):
net: Add support to configure SR-IOV VF minimum and maximum queues.
bnxt_en: Store min/max tx/rx rings for individual VFs.
bnxt_en: Implement .ndo_set_vf_queues().
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 ++
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 90 +++++++++++++++++++++++--
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 2 +
include/linux/if_link.h | 4 ++
include/linux/netdevice.h | 6 ++
include/uapi/linux/if_link.h | 9 +++
net/core/rtnetlink.c | 28 +++++++-
8 files changed, 138 insertions(+), 7 deletions(-)
--
1.8.3.1
* [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-09 11:21 [PATCH net-next RFC 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
@ 2018-05-09 11:21 ` Michael Chan
2018-05-09 23:15 ` Jakub Kicinski
2018-05-09 11:21 ` [PATCH net-next RFC 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs Michael Chan
2018-05-09 11:21 ` [PATCH net-next RFC 3/3] bnxt_en: Implement .ndo_set_vf_queues() Michael Chan
2 siblings, 1 reply; 8+ messages in thread
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
To: davem; +Cc: netdev
VF Queue resources are always limited and there is currently no
infrastructure to allow the admin. on the host to add or reduce queue
resources for any particular VF. With ever increasing number of VFs
being supported, it is desirable to allow the admin. to configure queue
resources differently for the VFs. Some VFs may require more or fewer
queues due to different bandwidth requirements or different number of
vCPUs in the VM. This patch adds the infrastructure to do that by
adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
to the net_device_ops.
Four parameters are exposed for each VF:
o min_tx_queues - Guaranteed or current tx queues assigned to the VF.
o max_tx_queues - Maximum but not necessarily guaranteed tx queues
available to the VF.
o min_rx_queues - Guaranteed or current rx queues assigned to the VF.
o max_rx_queues - Maximum but not necessarily guaranteed rx queues
available to the VF.
The "ip link set" command will subsequently be patched to support the new
operation to set the above parameters.
After the admin. makes a change to the above parameters, the corresponding
VF will have a new range of channels to set using ethtool -L.
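For illustration, here is a rough userspace sketch (not part of this series,
and separate from the planned iproute2 change) of how the new attribute can
be exercised over rtnetlink.  It nests struct ifla_vf_queues inside
IFLA_VF_INFO / IFLA_VFINFO_LIST, matching the way the existing IFLA_VF_*
attributes are sent and the way do_setvfinfo() in the patch below picks it
up.  The helper names, the "eth0" interface name and the queue counts are
made-up example values, and it assumes uapi headers that already contain
this patch:

/* Hypothetical example, not part of this series: set the queue range for
 * one VF by sending RTM_SETLINK with
 * IFLA_VFINFO_LIST -> IFLA_VF_INFO -> IFLA_VF_QUEUES.
 * Build against uapi headers that contain this patch and run with
 * CAP_NET_ADMIN; error handling is trimmed to keep the sketch short.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

#define NLMSG_TAIL(nlh) \
	((struct rtattr *)(((char *)(nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len)))

/* Open a nested attribute and account for its header in nlmsg_len. */
static struct rtattr *nest_start(struct nlmsghdr *nlh, unsigned short type)
{
	struct rtattr *nest = NLMSG_TAIL(nlh);

	nest->rta_type = type;
	nest->rta_len = RTA_LENGTH(0);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_LENGTH(0);
	return nest;
}

/* Close a nested attribute by fixing up its length. */
static void nest_end(struct nlmsghdr *nlh, struct rtattr *nest)
{
	nest->rta_len = (char *)NLMSG_TAIL(nlh) - (char *)nest;
}

/* Append one flat attribute and its payload to the message. */
static void put_attr(struct nlmsghdr *nlh, unsigned short type,
		     const void *data, unsigned short len)
{
	struct rtattr *rta = NLMSG_TAIL(nlh);

	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

int main(void)
{
	/* VF 0 of a PF named "eth0"; all values are made-up examples. */
	struct ifla_vf_queues q = {
		.vf = 0,
		.min_tx_queues = 4, .max_tx_queues = 16,
		.min_rx_queues = 4, .max_rx_queues = 16,
	};
	struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
	struct {
		struct nlmsghdr nh;
		struct ifinfomsg ifm;
		char buf[192];
	} req;
	struct nlmsghdr *nlh = &req.nh;
	struct rtattr *list, *info;
	int fd;

	memset(&req, 0, sizeof(req));
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(req.ifm));
	nlh->nlmsg_type = RTM_SETLINK;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req.ifm.ifi_family = AF_UNSPEC;
	req.ifm.ifi_index = if_nametoindex("eth0");

	list = nest_start(nlh, IFLA_VFINFO_LIST);
	info = nest_start(nlh, IFLA_VF_INFO);
	put_attr(nlh, IFLA_VF_QUEUES, &q, sizeof(q));
	nest_end(nlh, info);
	nest_end(nlh, list);

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0 || sendto(fd, nlh, nlh->nlmsg_len, 0,
			     (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
		perror("rtnetlink");
		return 1;
	}
	/* A real tool would recv() and check the NLMSG_ERROR ack here. */
	close(fd);
	return 0;
}

A real tool (and the planned "ip link set" support) would also parse the
netlink ack so that errors such as -EOPNOTSUPP from the core or -ENOBUFS
from the driver can be reported to the user.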
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
include/linux/if_link.h | 4 ++++
include/linux/netdevice.h | 6 ++++++
include/uapi/linux/if_link.h | 9 +++++++++
net/core/rtnetlink.c | 28 +++++++++++++++++++++++++---
4 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 622658d..8e81121 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -29,5 +29,9 @@ struct ifla_vf_info {
__u32 rss_query_en;
__u32 trusted;
__be16 vlan_proto;
+ __u32 min_tx_queues;
+ __u32 max_tx_queues;
+ __u32 min_rx_queues;
+ __u32 max_rx_queues;
};
#endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 03ed492..30a3caf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,8 @@ struct dev_ifalias {
* with PF and querying it may introduce a theoretical security risk.
* int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
* int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
+ * int max_txq, int min_rxq, int max_rxq);
* int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
* void *type_data);
* Called to setup any 'tc' scheduler, classifier or action on @dev.
@@ -1272,6 +1274,10 @@ struct net_device_ops {
int (*ndo_set_vf_rss_query_en)(
struct net_device *dev,
int vf, bool setting);
+ int (*ndo_set_vf_queues)(struct net_device *dev,
+ int vf,
+ int min_txq, int max_txq,
+ int min_rxq, int max_rxq);
int (*ndo_setup_tc)(struct net_device *dev,
enum tc_setup_type type,
void *type_data);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b852664..fc56a47 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -658,6 +658,7 @@ enum {
IFLA_VF_IB_NODE_GUID, /* VF Infiniband node GUID */
IFLA_VF_IB_PORT_GUID, /* VF Infiniband port GUID */
IFLA_VF_VLAN_LIST, /* nested list of vlans, option for QinQ */
+ IFLA_VF_QUEUES, /* Min and Max TX/RX queues */
__IFLA_VF_MAX,
};
@@ -748,6 +749,14 @@ struct ifla_vf_trust {
__u32 setting;
};
+struct ifla_vf_queues {
+ __u32 vf;
+ __u32 min_tx_queues;
+ __u32 max_tx_queues;
+ __u32 min_rx_queues;
+ __u32 max_rx_queues;
+};
+
/* VF ports management section
*
* Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8080254..7cf3582 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -921,7 +921,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
nla_total_size_64bit(sizeof(__u64)) +
/* IFLA_VF_STATS_TX_DROPPED */
nla_total_size_64bit(sizeof(__u64)) +
- nla_total_size(sizeof(struct ifla_vf_trust)));
+ nla_total_size(sizeof(struct ifla_vf_trust)) +
+ nla_total_size(sizeof(struct ifla_vf_queues)));
return size;
} else
return 0;
@@ -1181,6 +1182,7 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
struct ifla_vf_vlan_info vf_vlan_info;
struct ifla_vf_spoofchk vf_spoofchk;
struct ifla_vf_tx_rate vf_tx_rate;
+ struct ifla_vf_queues vf_queues;
struct ifla_vf_stats vf_stats;
struct ifla_vf_trust vf_trust;
struct ifla_vf_vlan vf_vlan;
@@ -1217,7 +1219,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
vf_spoofchk.vf =
vf_linkstate.vf =
vf_rss_query_en.vf =
- vf_trust.vf = ivi.vf;
+ vf_trust.vf =
+ vf_queues.vf = ivi.vf;
memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
vf_vlan.vlan = ivi.vlan;
@@ -1232,6 +1235,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
vf_linkstate.link_state = ivi.linkstate;
vf_rss_query_en.setting = ivi.rss_query_en;
vf_trust.setting = ivi.trusted;
+ vf_queues.min_tx_queues = ivi.min_tx_queues;
+ vf_queues.max_tx_queues = ivi.max_tx_queues;
+ vf_queues.min_rx_queues = ivi.min_rx_queues;
+ vf_queues.max_rx_queues = ivi.max_rx_queues;
vf = nla_nest_start(skb, IFLA_VF_INFO);
if (!vf)
goto nla_put_vfinfo_failure;
@@ -1249,7 +1256,9 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
sizeof(vf_rss_query_en),
&vf_rss_query_en) ||
nla_put(skb, IFLA_VF_TRUST,
- sizeof(vf_trust), &vf_trust))
+ sizeof(vf_trust), &vf_trust) ||
+ nla_put(skb, IFLA_VF_QUEUES,
+ sizeof(vf_queues), &vf_queues))
goto nla_put_vf_failure;
vfvlanlist = nla_nest_start(skb, IFLA_VF_VLAN_LIST);
if (!vfvlanlist)
@@ -1706,6 +1715,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
[IFLA_VF_TRUST] = { .len = sizeof(struct ifla_vf_trust) },
[IFLA_VF_IB_NODE_GUID] = { .len = sizeof(struct ifla_vf_guid) },
[IFLA_VF_IB_PORT_GUID] = { .len = sizeof(struct ifla_vf_guid) },
+ [IFLA_VF_QUEUES] = { .len = sizeof(struct ifla_vf_queues) },
};
static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
@@ -2208,6 +2218,18 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
return handle_vf_guid(dev, ivt, IFLA_VF_IB_PORT_GUID);
}
+ if (tb[IFLA_VF_QUEUES]) {
+ struct ifla_vf_queues *ivq = nla_data(tb[IFLA_VF_QUEUES]);
+
+ err = -EOPNOTSUPP;
+ if (ops->ndo_set_vf_queues)
+ err = ops->ndo_set_vf_queues(dev, ivq->vf,
+ ivq->min_tx_queues, ivq->max_tx_queues,
+ ivq->min_rx_queues, ivq->max_rx_queues);
+ if (err < 0)
+ return err;
+ }
+
return err;
}
--
1.8.3.1
* [PATCH net-next RFC 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs.
2018-05-09 11:21 [PATCH net-next RFC 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
2018-05-09 11:21 ` [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
@ 2018-05-09 11:21 ` Michael Chan
2018-05-09 11:21 ` [PATCH net-next RFC 3/3] bnxt_en: Implement .ndo_set_vf_queues() Michael Chan
2 siblings, 0 replies; 8+ messages in thread
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
To: davem; +Cc: netdev
With the new infrastructure to configure queues differently for each VF,
we need to store the current min/max rx/tx rings for each VF.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +++++
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 23 +++++++++++++++++++----
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 9b14eb6..2f5a23c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -837,6 +837,10 @@ struct bnxt_vf_info {
u32 func_flags; /* func cfg flags */
u32 min_tx_rate;
u32 max_tx_rate;
+ u16 min_tx_rings;
+ u16 max_tx_rings;
+ u16 min_rx_rings;
+ u16 max_rx_rings;
void *hwrm_cmd_req_addr;
dma_addr_t hwrm_cmd_req_dma_addr;
};
@@ -1351,6 +1355,7 @@ struct bnxt {
#ifdef CONFIG_BNXT_SRIOV
int nr_vfs;
struct bnxt_vf_info vf;
+ struct hwrm_func_vf_resource_cfg_input vf_resc_cfg_input;
wait_queue_head_t sriov_cfg_wait;
bool sriov_cfg;
#define BNXT_SRIOV_CFG_WAIT_TMO msecs_to_jiffies(10000)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index a649108..489e534 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -171,6 +171,10 @@ int bnxt_get_vf_config(struct net_device *dev, int vf_id,
ivi->linkstate = IFLA_VF_LINK_STATE_ENABLE;
else
ivi->linkstate = IFLA_VF_LINK_STATE_DISABLE;
+ ivi->min_tx_queues = vf->min_tx_rings;
+ ivi->max_tx_queues = vf->max_tx_rings;
+ ivi->min_rx_queues = vf->min_rx_rings;
+ ivi->max_rx_queues = vf->max_rx_rings;
return 0;
}
@@ -498,6 +502,8 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
mutex_lock(&bp->hwrm_cmd_lock);
for (i = 0; i < num_vfs; i++) {
+ struct bnxt_vf_info *vf = &pf->vf[i];
+
req.vf_id = cpu_to_le16(pf->first_vf_id + i);
rc = _hwrm_send_message(bp, &req, sizeof(req),
HWRM_CMD_TIMEOUT);
@@ -506,7 +512,11 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
break;
}
pf->active_vfs = i + 1;
- pf->vf[i].fw_fid = pf->first_vf_id + i;
+ vf->fw_fid = pf->first_vf_id + i;
+ vf->min_tx_rings = le16_to_cpu(req.min_tx_rings);
+ vf->max_tx_rings = vf_tx_rings;
+ vf->min_rx_rings = le16_to_cpu(req.min_rx_rings);
+ vf->max_rx_rings = vf_rx_rings;
}
mutex_unlock(&bp->hwrm_cmd_lock);
if (pf->active_vfs) {
@@ -521,6 +531,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
hw_resc->max_stat_ctxs -= le16_to_cpu(req.min_stat_ctx) * n;
hw_resc->max_vnics -= le16_to_cpu(req.min_vnics) * n;
+ memcpy(&bp->vf_resc_cfg_input, &req, sizeof(req));
rc = pf->active_vfs;
}
return rc;
@@ -585,6 +596,7 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
mutex_lock(&bp->hwrm_cmd_lock);
for (i = 0; i < num_vfs; i++) {
+ struct bnxt_vf_info *vf = &pf->vf[i];
int vf_tx_rsvd = vf_tx_rings;
req.fid = cpu_to_le16(pf->first_vf_id + i);
@@ -593,12 +605,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
if (rc)
break;
pf->active_vfs = i + 1;
- pf->vf[i].fw_fid = le16_to_cpu(req.fid);
- rc = __bnxt_hwrm_get_tx_rings(bp, pf->vf[i].fw_fid,
- &vf_tx_rsvd);
+ vf->fw_fid = le16_to_cpu(req.fid);
+ rc = __bnxt_hwrm_get_tx_rings(bp, vf->fw_fid, &vf_tx_rsvd);
if (rc)
break;
total_vf_tx_rings += vf_tx_rsvd;
+ vf->min_tx_rings = vf_tx_rsvd;
+ vf->max_tx_rings = vf_tx_rsvd;
+ vf->min_rx_rings = vf_rx_rings;
+ vf->max_rx_rings = vf_rx_rings;
}
mutex_unlock(&bp->hwrm_cmd_lock);
if (rc)
--
1.8.3.1
* [PATCH net-next RFC 3/3] bnxt_en: Implement .ndo_set_vf_queues().
2018-05-09 11:21 [PATCH net-next RFC 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
2018-05-09 11:21 ` [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
2018-05-09 11:21 ` [PATCH net-next RFC 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs Michael Chan
@ 2018-05-09 11:21 ` Michael Chan
2 siblings, 0 replies; 8+ messages in thread
From: Michael Chan @ 2018-05-09 11:21 UTC (permalink / raw)
To: davem; +Cc: netdev
Implement .ndo_set_vf_queues() in the PF driver to configure the queue
parameters for individual VFs. This allows the admin. on the host to
increase or decrease the number of queues for individual VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 67 +++++++++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 2 +
3 files changed, 70 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index dfa0839..2ce9779 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8373,6 +8373,7 @@ static int bnxt_swdev_port_attr_get(struct net_device *dev,
.ndo_set_vf_link_state = bnxt_set_vf_link_state,
.ndo_set_vf_spoofchk = bnxt_set_vf_spoofchk,
.ndo_set_vf_trust = bnxt_set_vf_trust,
+ .ndo_set_vf_queues = bnxt_set_vf_queues,
#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = bnxt_poll_controller,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 489e534..f0d938c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -138,6 +138,73 @@ int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trusted)
return 0;
}
+static bool bnxt_param_ok(int new, u16 curr, u16 avail)
+{
+ int delta;
+
+ if (new <= curr)
+ return true;
+
+ delta = new - curr;
+ if (delta <= avail)
+ return true;
+ return false;
+}
+
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+ int max_txq, int min_rxq, int max_rxq)
+{
+ struct hwrm_func_vf_resource_cfg_input req = {0};
+ struct bnxt *bp = netdev_priv(dev);
+ u16 avail_tx_rings, avail_rx_rings;
+ struct bnxt_hw_resc *hw_resc;
+ struct bnxt_vf_info *vf;
+ int rc;
+
+ if (bnxt_vf_ndo_prep(bp, vf_id))
+ return -EINVAL;
+
+ if (!(bp->flags & BNXT_FLAG_NEW_RM))
+ return -EOPNOTSUPP;
+
+ vf = &bp->pf.vf[vf_id];
+ hw_resc = &bp->hw_resc;
+
+ avail_tx_rings = hw_resc->max_tx_rings - bp->tx_nr_rings;
+ if (bp->flags & BNXT_FLAG_AGG_RINGS)
+ avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings * 2;
+ else
+ avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings;
+ if (!bnxt_param_ok(min_txq, vf->min_tx_rings, avail_tx_rings))
+ return -ENOBUFS;
+ if (!bnxt_param_ok(min_rxq, vf->min_rx_rings, avail_rx_rings))
+ return -ENOBUFS;
+ if (!bnxt_param_ok(max_txq, vf->max_tx_rings, avail_tx_rings))
+ return -ENOBUFS;
+ if (!bnxt_param_ok(max_rxq, vf->max_rx_rings, avail_rx_rings))
+ return -ENOBUFS;
+
+ bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_RESOURCE_CFG, -1, -1);
+ memcpy(&req, &bp->vf_resc_cfg_input, sizeof(req));
+ req.min_tx_rings = cpu_to_le16(min_txq);
+ req.min_rx_rings = cpu_to_le16(min_rxq);
+ req.max_tx_rings = cpu_to_le16(max_txq);
+ req.max_rx_rings = cpu_to_le16(max_rxq);
+ rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+ if (rc)
+ return -EIO;
+
+ hw_resc->max_tx_rings += vf->min_tx_rings;
+ hw_resc->max_rx_rings += vf->min_rx_rings;
+ vf->min_tx_rings = min_txq;
+ vf->max_tx_rings = max_txq;
+ vf->min_rx_rings = min_rxq;
+ vf->max_rx_rings = max_rxq;
+ hw_resc->max_tx_rings -= vf->min_tx_rings;
+ hw_resc->max_rx_rings -= vf->min_rx_rings;
+ return 0;
+}
+
int bnxt_get_vf_config(struct net_device *dev, int vf_id,
struct ifla_vf_info *ivi)
{
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index e9b20cd..325b412 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -35,6 +35,8 @@
int bnxt_set_vf_link_state(struct net_device *, int, int);
int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+ int max_txq, int min_rxq, int max_rxq);
int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
void bnxt_sriov_disable(struct bnxt *);
void bnxt_hwrm_exec_fwd_req(struct bnxt *);
--
1.8.3.1
* Re: [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-09 11:21 ` [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
@ 2018-05-09 23:15 ` Jakub Kicinski
2018-05-10 0:22 ` Michael Chan
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2018-05-09 23:15 UTC (permalink / raw)
To: Michael Chan, Or Gerlitz; +Cc: davem, netdev
On Wed, 9 May 2018 07:21:41 -0400, Michael Chan wrote:
> VF Queue resources are always limited and there is currently no
> infrastructure to allow the admin. on the host to add or reduce queue
> resources for any particular VF. With ever increasing number of VFs
> being supported, it is desirable to allow the admin. to configure queue
> resources differently for the VFs. Some VFs may require more or fewer
> queues due to different bandwidth requirements or different number of
> vCPUs in the VM. This patch adds the infrastructure to do that by
> adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
> to the net_device_ops.
>
> Four parameters are exposed for each VF:
>
> o min_tx_queues - Guaranteed or current tx queues assigned to the VF.
This muxing of semantics may be a little awkward and unnecessary, would
it make sense for struct ifla_vf_info to have separate fields for the
current number of queues and the admin-set guaranteed min?
Is there a real world use case for the min value or are you trying to
make the API feature complete?
> o max_tx_queues - Maximum but not necessarily guaranteed tx queues
> available to the VF.
>
> o min_rx_queues - Guaranteed or current rx queues assigned to the VF.
>
> o max_rx_queues - Maximum but not necessarily guaranteed rx queues
> available to the VF.
>
> The "ip link set" command will subsequently be patched to support the new
> operation to set the above parameters.
>
> After the admin. makes a change to the above parameters, the corresponding
> VF will have a new range of channels to set using ethtool -L.
>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
In switchdev mode we can use number of queues on the representor as a
proxy for max number of queues allowed for the ASIC port. This works
better when representors are muxed in the first place than when they
have actual queues backing them. WDYT about such a scheme, Or? A very
pleasant side-effect is that one can configure qdiscs and get stats
per-HW queue.
* Re: [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-09 23:15 ` Jakub Kicinski
@ 2018-05-10 0:22 ` Michael Chan
2018-05-10 1:10 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: Michael Chan @ 2018-05-10 0:22 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Or Gerlitz, David Miller, Netdev
On Wed, May 9, 2018 at 4:15 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Wed, 9 May 2018 07:21:41 -0400, Michael Chan wrote:
>> VF Queue resources are always limited and there is currently no
>> infrastructure to allow the admin. on the host to add or reduce queue
>> resources for any particular VF. With ever increasing number of VFs
>> being supported, it is desirable to allow the admin. to configure queue
>> resources differently for the VFs. Some VFs may require more or fewer
>> queues due to different bandwidth requirements or different number of
>> vCPUs in the VM. This patch adds the infrastructure to do that by
>> adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
>> to the net_device_ops.
>>
>> Four parameters are exposed for each VF:
>>
>> o min_tx_queues - Guaranteed or current tx queues assigned to the VF.
>
> This muxing of semantics may be a little awkward and unnecessary, would
> it make sense for struct ifla_vf_info to have separate fields for the
> current number of queues and the admin-set guaranteed min?
The loose semantics is mainly to allow some flexibility in
implementation. Sure, we can tighten the definitions or add
additional fields.
>
> Is there a real world use case for the min value or are you trying to
> make the API feature complete?
In this proposal, these parameters are mainly viewed as the bounds for
the queues that each VF can potentially allocate. The actual number
of queues chosen by the VF driver or modified by the VF user can be
any number within the bounds.
We currently need to have min and max parameters to support the
different modes we use to distribute the queue resources to the VFs.
In one mode, for example, resources are statically divided and each VF
has a small number of guaranteed queues (min = max). In a different
mode, we allow more flexible resource allocation with each VF having a
small number of guaranteed queues but a higher number of
non-guaranteed queues (min < max). Some VFs may be able to allocate
queues much higher than min when resources are still available, while
others may only be able to allocate min queues when resources are used
up.
With min and max exposed, the PF user can properly tweak the resources
for each VF described above.
>
>> o max_tx_queues - Maximum but not necessarily guaranteed tx queues
>> available to the VF.
>>
>> o min_rx_queues - Guaranteed or current rx queues assigned to the VF.
>>
>> o max_rx_queues - Maximum but not necessarily guaranteed rx queues
>> available to the VF.
>>
>> The "ip link set" command will subsequently be patched to support the new
>> operation to set the above parameters.
>>
>> After the admin. makes a change to the above parameters, the corresponding
>> VF will have a new range of channels to set using ethtool -L.
>>
>> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>
> In switchdev mode we can use number of queues on the representor as a
> proxy for max number of queues allowed for the ASIC port. This works
> better when representors are muxed in the first place than when they
> have actual queues backing them. WDYT about such a scheme, Or? A very
> pleasant side-effect is that one can configure qdiscs and get stats
> per-HW queue.
This is an interesting approach. But it doesn't have the min and max
for each VF, and also only works in switchdev mode.
* Re: [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-10 0:22 ` Michael Chan
@ 2018-05-10 1:10 ` Jakub Kicinski
2018-05-10 2:32 ` Michael Chan
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2018-05-10 1:10 UTC (permalink / raw)
To: Michael Chan; +Cc: Or Gerlitz, David Miller, Netdev
On Wed, 9 May 2018 17:22:50 -0700, Michael Chan wrote:
> On Wed, May 9, 2018 at 4:15 PM, Jakub Kicinski wrote:
> > On Wed, 9 May 2018 07:21:41 -0400, Michael Chan wrote:
> >> VF Queue resources are always limited and there is currently no
> >> infrastructure to allow the admin. on the host to add or reduce queue
> >> resources for any particular VF. With ever increasing number of VFs
> >> being supported, it is desirable to allow the admin. to configure queue
> >> resources differently for the VFs. Some VFs may require more or fewer
> >> queues due to different bandwidth requirements or different number of
> >> vCPUs in the VM. This patch adds the infrastructure to do that by
> >> adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
> >> to the net_device_ops.
> >>
> >> Four parameters are exposed for each VF:
> >>
> >> o min_tx_queues - Guaranteed or current tx queues assigned to the VF.
> >
> > This muxing of semantics may be a little awkward and unnecessary, would
> > it make sense for struct ifla_vf_info to have separate fields for the
> > current number of queues and the admin-set guaranteed min?
>
> The loose semantics is mainly to allow some flexibility in
> implementation. Sure, we can tighten the definitions or add
> additional fields.
I would appreciate that, if others don't disagree. I personally don't
see the need for flexibility (AKA per-vendor behaviour) here, quite the
opposite, min/max/current number of queues seems quite self-explanatory.
Or at least don't allow min to mean current? Otherwise the API gets a
bit asymmetrical :(
> > Is there a real world use case for the min value or are you trying to
> > make the API feature complete?
>
> In this proposal, these parameters are mainly viewed as the bounds for
> the queues that each VF can potentially allocate. The actual number
> of queues chosen by the VF driver or modified by the VF user can be
> any number within the bounds.
Perhaps you have misspoken here - these are not allowed bounds, right?
min is the guarantee that queues will be available, not a requirement.
Similar to bandwidth allocation.
IOW if the bounds are set [4, 16] the VF may still choose to use 1
queue, even though that's not within bounds.
> We currently need to have min and max parameters to support the
> different modes we use to distribute the queue resources to the VFs.
> In one mode, for example, resources are statically divided and each VF
> has a small number of guaranteed queues (min = max). In a different
> mode, we allow more flexible resource allocation with each VF having a
> small number of guaranteed queues but a higher number of
> non-guaranteed queues (min < max). Some VFs may be able to allocate
> queues much higher than min when resources are still available, while
> others may only be able to allocate min queues when resources are used
> up.
>
> With min and max exposed, the PF user can properly tweak the resources
> for each VF described above.
Right, I was just looking for a real world scenario where this
flexibility is going to be used - mainly because the switchdev model I
described below won't allow it. I'm not sure users will leave a
portion of queues to be allocated by chance.
> >> o max_tx_queues - Maximum but not necessarily guaranteed tx queues
> >> available to the VF.
> >>
> >> o min_rx_queues - Guaranteed or current rx queues assigned to the VF.
> >>
> >> o max_rx_queues - Maximum but not necessarily guaranteed rx queues
> >> available to the VF.
> >>
> >> The "ip link set" command will subsequently be patched to support the new
> >> operation to set the above parameters.
> >>
> >> After the admin. makes a change to the above parameters, the corresponding
> >> VF will have a new range of channels to set using ethtool -L.
> >>
> >> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> >
> > In switchdev mode we can use number of queues on the representor as a
> > proxy for max number of queues allowed for the ASIC port. This works
> > better when representors are muxed in the first place than when they
> > have actual queues backing them. WDYT about such a scheme, Or? A very
> > pleasant side-effect is that one can configure qdiscs and get stats
> > per-HW queue.
>
> This is an interesting approach. But it doesn't have the min and max
> for each VF, and also only works in switchdev mode.
It allows controlling all ports of the switch with the same, existing
and well known API (incl. PFs). But sadly I don't think we are at the
point where switchdev-mode solutions are considered an alternative, so
I'm only mentioning it to broaden the discussion :) I'm not opposed to
your patches :)
* Re: [PATCH net-next RFC 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-10 1:10 ` Jakub Kicinski
@ 2018-05-10 2:32 ` Michael Chan
0 siblings, 0 replies; 8+ messages in thread
From: Michael Chan @ 2018-05-10 2:32 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Or Gerlitz, David Miller, Netdev
On Wed, May 9, 2018 at 6:10 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Wed, 9 May 2018 17:22:50 -0700, Michael Chan wrote:
>> On Wed, May 9, 2018 at 4:15 PM, Jakub Kicinski wrote:
>> > On Wed, 9 May 2018 07:21:41 -0400, Michael Chan wrote:
>> >> VF Queue resources are always limited and there is currently no
>> >> infrastructure to allow the admin. on the host to add or reduce queue
>> >> resources for any particular VF. With ever increasing number of VFs
>> >> being supported, it is desirable to allow the admin. to configure queue
>> >> resources differently for the VFs. Some VFs may require more or fewer
>> >> queues due to different bandwidth requirements or different number of
>> >> vCPUs in the VM. This patch adds the infrastructure to do that by
>> >> adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
>> >> to the net_device_ops.
>> >>
>> >> Four parameters are exposed for each VF:
>> >>
>> >> o min_tx_queues - Guaranteed or current tx queues assigned to the VF.
>> >
>> > This muxing of semantics may be a little awkward and unnecessary, would
>> > it make sense for struct ifla_vf_info to have separate fields for the
>> > current number of queues and the admin-set guaranteed min?
>>
>> The loose semantics is mainly to allow some flexibility in
>> implementation. Sure, we can tighten the definitions or add
>> additional fields.
>
> I would appreciate that, if others don't disagree. I personally don't
> see the need for flexibility (AKA per-vendor behaviour) here, quite the
> opposite, min/max/current number of queues seems quite self-explanatory.
>
> Or at least don't allow min to mean current? Otherwise the API gets a
> bit asymmetrical :(
Sure, will do.
>
>> > Is there a real world use case for the min value or are you trying to
>> > make the API feature complete?
>>
>> In this proposal, these parameters are mainly viewed as the bounds for
>> the queues that each VF can potentially allocate. The actual number
>> of queues chosen by the VF driver or modified by the VF user can be
>> any number within the bounds.
>
> Perhaps you have misspoken here - these are not allowed bounds, right?
> min is the guarantee that queues will be available, not a requirement.
> Similar to bandwidth allocation.
>
> IOW if the bounds are set [4, 16] the VF may still choose to use 1
> queue, even though that's not within bounds.
Yes, you are absolutely right. The VF can allocate 1 queue. Up to
min is guaranteed. Up to max is not guaranteed.