* [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-29 8:18 [PATCH net-next 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
@ 2018-05-29 8:18 ` Michael Chan
2018-05-29 20:46 ` Samudrala, Sridhar
0 siblings, 1 reply; 11+ messages in thread
From: Michael Chan @ 2018-05-29 8:18 UTC (permalink / raw)
To: davem; +Cc: netdev
VF queue resources are always limited, and there is currently no
infrastructure to allow the administrator on the host to add or reduce
queue resources for any particular VF. With the ever-increasing number
of VFs being supported, it is desirable to allow the administrator to
configure queue resources differently for the VFs. Some VFs may require
more or fewer queues due to different bandwidth requirements or a
different number of vCPUs in the VM. This patch adds the infrastructure
to do that by adding an IFLA_VF_QUEUES netlink attribute and a new
.ndo_set_vf_queues() method to net_device_ops.
Four parameters are exposed for each VF:
o min_tx_queues - Guaranteed tx queues available to the VF.
o max_tx_queues - Maximum but not necessarily guaranteed tx queues
available to the VF.
o min_rx_queues - Guaranteed rx queues available to the VF.
o max_rx_queues - Maximum but not necessarily guaranteed rx queues
available to the VF.
The "ip link set" command will subsequently be patched to support the new
operation to set the above parameters.
After the administrator changes the above parameters, the corresponding
VF will have a new range of channels to set using ethtool -L. The VF may
have to go through an interface down/up cycle before the new queues take
effect. Up to the min values are guaranteed; up to the max values are
possible but not guaranteed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
include/linux/if_link.h | 4 ++++
include/linux/netdevice.h | 6 ++++++
include/uapi/linux/if_link.h | 9 +++++++++
net/core/rtnetlink.c | 32 +++++++++++++++++++++++++++++---
4 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 622658d..8e81121 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -29,5 +29,9 @@ struct ifla_vf_info {
__u32 rss_query_en;
__u32 trusted;
__be16 vlan_proto;
+ __u32 min_tx_queues;
+ __u32 max_tx_queues;
+ __u32 min_rx_queues;
+ __u32 max_rx_queues;
};
#endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8452f72..17f5892 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,8 @@ struct dev_ifalias {
* with PF and querying it may introduce a theoretical security risk.
* int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
* int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
+ * int max_txq, int min_rxq, int max_rxq);
* int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
* void *type_data);
* Called to setup any 'tc' scheduler, classifier or action on @dev.
@@ -1276,6 +1278,10 @@ struct net_device_ops {
int (*ndo_set_vf_rss_query_en)(
struct net_device *dev,
int vf, bool setting);
+ int (*ndo_set_vf_queues)(struct net_device *dev,
+ int vf,
+ int min_txq, int max_txq,
+ int min_rxq, int max_rxq);
int (*ndo_setup_tc)(struct net_device *dev,
enum tc_setup_type type,
void *type_data);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cf01b68..81bbc4e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -659,6 +659,7 @@ enum {
IFLA_VF_IB_NODE_GUID, /* VF Infiniband node GUID */
IFLA_VF_IB_PORT_GUID, /* VF Infiniband port GUID */
IFLA_VF_VLAN_LIST, /* nested list of vlans, option for QinQ */
+ IFLA_VF_QUEUES, /* Min and Max TX/RX queues */
__IFLA_VF_MAX,
};
@@ -749,6 +750,14 @@ struct ifla_vf_trust {
__u32 setting;
};
+struct ifla_vf_queues {
+ __u32 vf;
+ __u32 min_tx_queues; /* min guaranteed tx queues */
+ __u32 max_tx_queues; /* max non guaranteed tx queues */
+ __u32 min_rx_queues; /* min guaranteed rx queues */
+ __u32 max_rx_queues; /* max non guaranteed rx queues */
+};
+
/* VF ports management section
*
* Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8080254..e21ab8a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -921,7 +921,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
nla_total_size_64bit(sizeof(__u64)) +
/* IFLA_VF_STATS_TX_DROPPED */
nla_total_size_64bit(sizeof(__u64)) +
- nla_total_size(sizeof(struct ifla_vf_trust)));
+ nla_total_size(sizeof(struct ifla_vf_trust)) +
+ nla_total_size(sizeof(struct ifla_vf_queues)));
return size;
} else
return 0;
@@ -1181,6 +1182,7 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
struct ifla_vf_vlan_info vf_vlan_info;
struct ifla_vf_spoofchk vf_spoofchk;
struct ifla_vf_tx_rate vf_tx_rate;
+ struct ifla_vf_queues vf_queues;
struct ifla_vf_stats vf_stats;
struct ifla_vf_trust vf_trust;
struct ifla_vf_vlan vf_vlan;
@@ -1198,6 +1200,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
ivi.spoofchk = -1;
ivi.rss_query_en = -1;
ivi.trusted = -1;
+ ivi.min_tx_queues = -1;
+ ivi.max_tx_queues = -1;
+ ivi.min_rx_queues = -1;
+ ivi.max_rx_queues = -1;
/* The default value for VF link state is "auto"
* IFLA_VF_LINK_STATE_AUTO which equals zero
*/
@@ -1217,7 +1223,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
vf_spoofchk.vf =
vf_linkstate.vf =
vf_rss_query_en.vf =
- vf_trust.vf = ivi.vf;
+ vf_trust.vf =
+ vf_queues.vf = ivi.vf;
memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
vf_vlan.vlan = ivi.vlan;
@@ -1232,6 +1239,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
vf_linkstate.link_state = ivi.linkstate;
vf_rss_query_en.setting = ivi.rss_query_en;
vf_trust.setting = ivi.trusted;
+ vf_queues.min_tx_queues = ivi.min_tx_queues;
+ vf_queues.max_tx_queues = ivi.max_tx_queues;
+ vf_queues.min_rx_queues = ivi.min_rx_queues;
+ vf_queues.max_rx_queues = ivi.max_rx_queues;
vf = nla_nest_start(skb, IFLA_VF_INFO);
if (!vf)
goto nla_put_vfinfo_failure;
@@ -1249,7 +1260,9 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
sizeof(vf_rss_query_en),
&vf_rss_query_en) ||
nla_put(skb, IFLA_VF_TRUST,
- sizeof(vf_trust), &vf_trust))
+ sizeof(vf_trust), &vf_trust) ||
+ nla_put(skb, IFLA_VF_QUEUES,
+ sizeof(vf_queues), &vf_queues))
goto nla_put_vf_failure;
vfvlanlist = nla_nest_start(skb, IFLA_VF_VLAN_LIST);
if (!vfvlanlist)
@@ -1706,6 +1719,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
[IFLA_VF_TRUST] = { .len = sizeof(struct ifla_vf_trust) },
[IFLA_VF_IB_NODE_GUID] = { .len = sizeof(struct ifla_vf_guid) },
[IFLA_VF_IB_PORT_GUID] = { .len = sizeof(struct ifla_vf_guid) },
+ [IFLA_VF_QUEUES] = { .len = sizeof(struct ifla_vf_queues) },
};
static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
@@ -2208,6 +2222,18 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
return handle_vf_guid(dev, ivt, IFLA_VF_IB_PORT_GUID);
}
+ if (tb[IFLA_VF_QUEUES]) {
+ struct ifla_vf_queues *ivq = nla_data(tb[IFLA_VF_QUEUES]);
+
+ err = -EOPNOTSUPP;
+ if (ops->ndo_set_vf_queues)
+ err = ops->ndo_set_vf_queues(dev, ivq->vf,
+ ivq->min_tx_queues, ivq->max_tx_queues,
+ ivq->min_rx_queues, ivq->max_rx_queues);
+ if (err < 0)
+ return err;
+ }
+
return err;
}
--
1.8.3.1
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-29 8:18 ` [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
@ 2018-05-29 20:46 ` Samudrala, Sridhar
2018-05-30 3:19 ` Michael Chan
0 siblings, 1 reply; 11+ messages in thread
From: Samudrala, Sridhar @ 2018-05-29 20:46 UTC (permalink / raw)
To: Michael Chan, davem; +Cc: netdev
On 5/29/2018 1:18 AM, Michael Chan wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8452f72..17f5892 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1023,6 +1023,8 @@ struct dev_ifalias {
> * with PF and querying it may introduce a theoretical security risk.
> * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
> * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
> + * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
> + * int max_txq, int min_rxq, int max_rxq);
Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be extended?
Shouldn't we enable this via ethtool on the port representor netdev?
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-29 20:46 ` Samudrala, Sridhar
@ 2018-05-30 3:19 ` Michael Chan
2018-05-30 22:36 ` Jakub Kicinski
0 siblings, 1 reply; 11+ messages in thread
From: Michael Chan @ 2018-05-30 3:19 UTC (permalink / raw)
To: Samudrala, Sridhar; +Cc: David Miller, Netdev
On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> extended?
I didn't know about that.
> Shouldn't we enable this via ethtool on the port representor netdev?
>
>
We discussed this. ethtool on the VF representor will only work
in switchdev mode and also will not support min/max values.
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
@ 2018-05-30 5:56 Jakub Kicinski
2018-05-30 6:08 ` Michael Chan
0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2018-05-30 5:56 UTC (permalink / raw)
To: Michael Chan, Samudrala, Sridhar; +Cc: David Miller, Netdev, Or Gerlitz
On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> > extended?
+1 it's painful to see this feature being added to the legacy
API :( Another duplicated configuration knob.
> I didn't know about that.
>
> > Shouldn't we enable this via ethtool on the port representor netdev?
>
> We discussed this. ethtool on the VF representor will only work
> in switchdev mode and also will not support min/max values.
Ethtool channel API may be overdue a rewrite in devlink anyway, but I
feel like implementing switchdev mode and rewriting features in devlink
may be too much to ask.
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 5:56 [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Jakub Kicinski
@ 2018-05-30 6:08 ` Michael Chan
2018-05-30 6:33 ` Jakub Kicinski
0 siblings, 1 reply; 11+ messages in thread
From: Michael Chan @ 2018-05-30 6:08 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz
On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>> > extended?
>
> +1 it's painful to see this feature being added to the legacy
> API :( Another duplicated configuration knob.
>
>> I didn't know about that.
>>
>> > Shouldn't we enable this via ethtool on the port representor netdev?
>>
>> We discussed this. ethtool on the VF representor will only work
>> in switchdev mode and also will not support min/max values.
>
> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> feel like implementing switchdev mode and rewriting features in devlink
> may be too much to ask.
Totally agreed. And switchdev mode doesn't seem to be that widely
used at the moment. Do you have other suggestions besides NDO?
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 6:08 ` Michael Chan
@ 2018-05-30 6:33 ` Jakub Kicinski
2018-05-30 7:18 ` Michael Chan
2018-05-30 21:23 ` Samudrala, Sridhar
0 siblings, 2 replies; 11+ messages in thread
From: Jakub Kicinski @ 2018-05-30 6:33 UTC (permalink / raw)
To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz
On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
> Totally agreed. And switchdev mode doesn't seem to be that widely
> used at the moment. Do you have other suggestions besides NDO?
At some point you (Broadcom) were working on a whole bunch of devlink
configuration options for the PCIe side of the ASIC. The number of
queues relates to things like number of allocated MSI-X vectors, which
if memory serves me was in your devlink patch set. In an ideal world
we would try to keep all those in one place :)
For PCIe config there is always the question of what can be configured
at runtime, and what requires a HW reset. Therefore that devlink API
which could configure current as well as persistent device settings was
quite nice. I'm not sure if reallocating queues would ever require
PCIe block reset but maybe... Certainly it seems the notion of min
queues would make more sense in PCIe configuration devlink API than
ethtool channel API to me as well.
Queues are in the grey area between netdev and non-netdev constructs.
They make sense both from PCIe resource allocation perspective (i.e.
devlink PCIe settings) and netdev perspective (ethtool) because they
feed into things like qdisc offloads, maybe per-queue stats etc.
So yes... IMHO it would be nice to add this to a devlink SR-IOV config
API and/or switchdev representors. But neither of those are really an
option for you today so IDK :)
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 6:33 ` Jakub Kicinski
@ 2018-05-30 7:18 ` Michael Chan
2018-05-30 21:23 ` Samudrala, Sridhar
1 sibling, 0 replies; 11+ messages in thread
From: Michael Chan @ 2018-05-30 7:18 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz
On Tue, May 29, 2018 at 11:33 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> At some point you (Broadcom) were working on a whole bunch of devlink
> configuration options for the PCIe side of the ASIC. The number of
> queues relates to things like number of allocated MSI-X vectors, which
> if memory serves me was in your devlink patch set. In an ideal world
> we would try to keep all those in one place :)
Yeah, another colleague is now working with Mellanox on something similar.
One difference between those devlink parameters and these queue
parameters is that the former are more permanent and global settings.
For example, the number of VFs or the number of MSI-X vectors per VF
are persistent settings that survive a PCIe reset. On the other hand,
these queue settings are pure run-time settings and may be unique for
each VF. These are not stored as there is no room in NVRAM to store
128 sets or more of these parameters.
Anyway, let me discuss this with my colleague to see if there is a
natural fit for these queue parameters in the devlink infrastructure
that they are working on.
>
> For PCIe config there is always the question of what can be configured
> at runtime, and what requires a HW reset. Therefore that devlink API
> which could configure current as well as persistent device settings was
> quite nice. I'm not sure if reallocating queues would ever require
> PCIe block reset but maybe... Certainly it seems the notion of min
> queues would make more sense in PCIe configuration devlink API than
> ethtool channel API to me as well.
>
> Queues are in the grey area between netdev and non-netdev constructs.
> They make sense both from PCIe resource allocation perspective (i.e.
> devlink PCIe settings) and netdev perspective (ethtool) because they
> feed into things like qdisc offloads, maybe per-queue stats etc.
>
> So yes... IMHO it would be nice to add this to a devlink SR-IOV config
> API and/or switchdev representors. But neither of those are really an
> option for you today so IDK :)
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 6:33 ` Jakub Kicinski
2018-05-30 7:18 ` Michael Chan
@ 2018-05-30 21:23 ` Samudrala, Sridhar
2018-05-30 22:53 ` Jakub Kicinski
1 sibling, 1 reply; 11+ messages in thread
From: Samudrala, Sridhar @ 2018-05-30 21:23 UTC (permalink / raw)
To: Jakub Kicinski, Michael Chan; +Cc: David Miller, Netdev, Or Gerlitz
On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
> At some point you (Broadcom) were working on a whole bunch of devlink
> configuration options for the PCIe side of the ASIC. The number of
> queues relates to things like number of allocated MSI-X vectors, which
> if memory serves me was in your devlink patch set. In an ideal world
> we would try to keep all those in one place :)
>
> For PCIe config there is always the question of what can be configured
> at runtime, and what requires a HW reset. Therefore that devlink API
> which could configure current as well as persistent device settings was
> quite nice. I'm not sure if reallocating queues would ever require
> PCIe block reset but maybe... Certainly it seems the notion of min
> queues would make more sense in PCIe configuration devlink API than
> ethtool channel API to me as well.
>
> Queues are in the grey area between netdev and non-netdev constructs.
> They make sense both from PCIe resource allocation perspective (i.e.
> devlink PCIe settings) and netdev perspective (ethtool) because they
> feed into things like qdisc offloads, maybe per-queue stats etc.
>
> So yes... IMHO it would be nice to add this to a devlink SR-IOV config
> API and/or switchdev representors. But neither of those are really an
> option for you today so IDK :)
One reason why 'switchdev' mode is not yet widely used or enabled by default
could be due to the requirement to program the flow rules only via slow path.
Would it make sense to relax this requirement and support a mode where port
representors are created and let the PF driver implement a default policy that
adds flow rules for all the VFs to enable connectivity and let the user
add/modify the rules via port representors?
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 3:19 ` Michael Chan
@ 2018-05-30 22:36 ` Jakub Kicinski
0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2018-05-30 22:36 UTC (permalink / raw)
To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev
On Wed, 30 May 2018 00:18:39 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 11:33 PM, Jakub Kicinski wrote:
> > At some point you (Broadcom) were working on a whole bunch of devlink
> > configuration options for the PCIe side of the ASIC. The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set. In an ideal world
> > we would try to keep all those in one place :)
>
> Yeah, another colleague is now working with Mellanox on something similar.
>
> One difference between those devlink parameters and these queue
> parameters is that the former are more permanent and global settings.
> For example, number of VFs or number of MSIX per VF are persistent
> settings once they are set and after PCIe reset. On the other hand,
> these queue settings are pure run-time settings and may be unique for
> each VF. These are not stored as there is no room in NVRAM to store
> 128 sets or more of these parameters.
Indeed, I think the API must be flexible as to what is persistent and
what is not because HW will certainly differ in that regard. And
agreed, queues may be a bit of a stretch here, but worth a try.
> Anyway, let me discuss this with my colleague to see if there is a
> natural fit for these queue parameters in the devlink infrastructure
> that they are working on.
Thank you!
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 21:23 ` Samudrala, Sridhar
@ 2018-05-30 22:53 ` Jakub Kicinski
2018-05-31 3:35 ` Samudrala, Sridhar
0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2018-05-30 22:53 UTC (permalink / raw)
To: Samudrala, Sridhar; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
> One reason why 'switchdev' mode is not yet widely used or enabled by default
> could be due to the requirement to program the flow rules only via slow path.
Do you mean the fallback traffic requirement?
> Would it make sense to relax this requirement and support a mode where port
> representors are created and let the PF driver implement a default policy that
> adds flow rules for all the VFs to enable connectivity and let the user
> add/modify the rules via port representors?
I definitely share your concerns, stopping a major HW vendor from using
this new and preferred mode is not helping us make progress.
The problem is that if we allow this diversion, i.e. the driver
implementing some special policy, or pre-populating a bridge in a
configuration that suits the HW, we may condition users to expect that
as the standard Linux
behaviour. And we will be stuck with it forever even tho your next gen
HW (ice?) may support correct behaviour.
We should perhaps separate switchdev mode from TC flower/OvS offloads.
Is your objective to implement OvS offload or just switchdev mode?
For OvS without proper fallback behaviour you may struggle.
Switchdev mode could be within your reach even without changing the
default rules. What if you spawned all port netdevs (I dislike the
term representor, sorry, it's confusing people) in down state and then
refuse to bring them up unless user instantiated a bridge that would
behave in a way that your HW can support? If ports are down you won't
have fallback traffic so no problem to solve.
* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
2018-05-30 22:53 ` Jakub Kicinski
@ 2018-05-31 3:35 ` Samudrala, Sridhar
0 siblings, 0 replies; 11+ messages in thread
From: Samudrala, Sridhar @ 2018-05-31 3:35 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz
On 5/30/2018 3:53 PM, Jakub Kicinski wrote:
> On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
>> One reason why 'switchdev' mode is not yet widely used or enabled by default
>> could be due to the requirement to program the flow rules only via slow path.
> Do you mean the fallback traffic requirement?
Yes.
>
>> Would it make sense to relax this requirement and support a mode where port
>> representors are created and let the PF driver implement a default policy that
>> adds flow rules for all the VFs to enable connectivity and let the user
>> add/modify the rules via port representors?
> I definitely share your concerns, stopping a major HW vendor from using
> this new and preferred mode is not helping us make progress.
>
> The problem is that if we allow this diversion, i.e. driver to implement
> some special policy, or pre-populate a bridge in a configuration that
> suits the HW we may condition users to expect that as the standard Linux
> behaviour. And we will be stuck with it forever even tho your next gen
> HW (ice?) may support correct behaviour.
Yes. ice can support slowpath behavior as required to support OVS offload.
However, I was just wondering if we should have an option to allow switchdev
without slowpath so that the user can use alternate mechanisms to program
the flow rules instead of having to use OVS.
>
> We should perhaps separate switchdev mode from TC flower/OvS offloads.
> Is your objective to implement OvS offload or just switchdev mode?
>
> For OvS without proper fallback behaviour you may struggle.
>
> Switchdev mode could be within your reach even without changing the
> default rules. What if you spawned all port netdevs (I dislike the
> term representor, sorry, it's confusing people) in down state and then
> refuse to bring them up unless user instantiated a bridge that would
> behave in a way that your HW can support? If ports are down you won't
> have fallback traffic so no problem to solve.
If we want to use the port netdev's admin state to control the link
state of the VFs, then this will not work.
We need to disable only TX/RX; admin state and link state still need
to be supported on the port netdevs.