* [PATCH net-next v7 0/9] Implement devlink-rate API and extend it
@ 2022-10-27 13:00 Michal Wilczynski
  2022-10-27 13:00 ` [PATCH net-next v7 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Michal Wilczynski @ 2022-10-27 13:00 UTC (permalink / raw)
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

This is a follow up on:
https://lore.kernel.org/netdev/20221018123543.1210217-1-michal.wilczynski@intel.com/

This patch series implements devlink-rate for the ice driver. Unfortunately,
the current API isn't flexible enough for our use case, so it needs to be
extended. Some functions have been introduced to enable the driver to
export the current Tx scheduling configuration.

In the previous submission I made a mistake and didn't remove
internal review comments. To avoid confusion, I am not going backwards
in my versioning and am submitting this as v7.


Pasting the justification for this series from the commit implementing
devlink-rate in the ice driver (which is part of this series):

There is a need to support modification of the Tx scheduler tree in the
ice driver. This will allow the user to control the Tx settings of each
node in the internal hierarchy of nodes. As a result, the user will be able
to use Hierarchical QoS implemented entirely in hardware.

This patch implements the devlink-rate API. It also exports the initial
default hierarchy. This is mostly dictated by the fact that the tree
can't be removed entirely; all we can do is enable the user to modify
it. For example, the root node should never be removed, and nodes that
have children are also off-limits.
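
The removal rule stated above can be illustrated with a small toy model.
This is only a sketch of the rule as described in this cover letter, not
the ice driver's code; the dictionary layout, the can_remove() helper and
the node names used here are assumptions made up for the illustration.

```python
# Toy model of the removal rule described above; NOT the ice driver's code.
# The tree maps each node name to its parent (None marks the root).
# can_remove() and the sample names are hypothetical, for illustration only.

def can_remove(tree, name):
    """Return True if `name` may be deleted under the stated rule."""
    if tree[name] is None:
        return False  # the root node must never be removed
    has_children = any(parent == name for parent in tree.values())
    return not has_children  # nodes that have children are off-limits


tree = {
    "node_0": None,           # root
    "node_26": "node_0",
    "node_27": "node_26",
    "vf_1": "node_27",
    "vf_2": "node_27",
    "node_empty": "node_0",   # a node with no children
}

print(can_remove(tree, "node_0"))      # False: root
print(can_remove(tree, "node_27"))     # False: still has children
print(can_remove(tree, "node_empty"))  # True: childless, not the root
```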

Example initial tree with 2 VFs:

[root@fedora ~]# devlink port function rate show
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27


Let me visualize part of the tree:

                        +---------+
                        |  node_0 |
                        +---------+
                             |
                        +----v----+
                        | node_26 |
                        +----+----+
                             |
                        +----v----+
                        | node_27 |
                        +----+----+
                             |
                    |-----------------|
               +----v----+       +----v----+
               |   VF 1  |       |   VF 2  |
               +----+----+       +----+----+
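
The chain drawn above can also be recovered mechanically from the
'rate show' output. The sketch below parses a few lines copied from the
listing in this letter and walks from VF 1 up to the root. The helper
names (parse_rate_show, path_to_root) are made up for this example, and
the parser assumes the "handle: key value key value ..." layout seen in
the listing; it is not part of devlink or iproute2.

```python
# Sketch: rebuild the parent chain of a leaf from `rate show` text.
# parse_rate_show() and path_to_root() are hypothetical helpers; the
# line format is assumed from the listing in this cover letter.

SAMPLE = """\
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27
"""


def parse_rate_show(text):
    """Map each node or leaf name to its parent name (None for the root)."""
    parents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        handle, rest = line.split(": ", 1)       # "pci/.../node_27", "type node ..."
        name = handle.rsplit("/", 1)[1]          # last path component
        fields = rest.split()
        attrs = dict(zip(fields[::2], fields[1::2]))  # key/value pairs
        parents[name] = attrs.get("parent")
    return parents


def path_to_root(parents, name):
    """Follow parent links from `name` until a node without a parent."""
    path = [name]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


parents = parse_rate_show(SAMPLE)
print(" -> ".join(path_to_root(parents, "1")))
# prints: 1 -> node_27 -> node_26 -> node_0
```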

So at this point there are a couple of things that can be done.
For example, we could assign parameters only to the VFs.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
                 tx_max 5Gbps

This would cap the VF 1 BW to 5Gbps.

But let's say you would like to create a completely new branch.
This can be done like this:

[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
                 pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.
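
To make the effect of the three commands above concrete, here is a toy
model of the hierarchy as a name-to-parent mapping. It only mimics the
bookkeeping; rate_add() and rate_set_parent() are hypothetical stand-ins
for 'rate add' and 'rate set ... parent', not real devlink or driver
functions, and the starting tree is the fragment visualized earlier.

```python
# Toy model of the commands above; NOT devlink or ice driver code.
# rate_add() / rate_set_parent() are hypothetical stand-ins for
# `devlink port function rate add` and `... rate set <obj> parent <node>`.

def rate_add(tree, name, parent):
    """Create a new node under an existing parent."""
    if name in tree:
        raise ValueError(f"{name} already exists")
    if parent not in tree:
        raise KeyError(f"unknown parent {parent}")
    tree[name] = parent


def rate_set_parent(tree, name, parent):
    """Move an existing node or leaf under a different existing parent."""
    if name not in tree or parent not in tree:
        raise KeyError("both object and parent must exist")
    tree[name] = parent


def children(tree, name):
    return sorted(child for child, parent in tree.items() if parent == name)


# Fragment of the initial tree: node_0 -> node_26 -> node_27 -> VF 1, VF 2
tree = {"node_0": None, "node_26": "node_0", "node_27": "node_26",
        "1": "node_27", "2": "node_27"}

rate_add(tree, "node_custom", "node_0")
rate_add(tree, "node_custom_1", "node_custom")
rate_set_parent(tree, "1", "node_custom_1")

print(children(tree, "node_27"))        # ['2']
print(children(tree, "node_custom_1"))  # ['1']
```

After the three calls, VF 1 hangs off the new branch while VF 2 stays
under node_27, which matches the description above.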

Each node supports a number of parameters: tx_max, tx_share,
tx_priority and tx_weight.


V7:
- split into smaller commits
- paste justification for this series to cover letter

V6:
- replaced strncpy with strscpy
- renamed rate_vport -> rate_leaf

V5:
- removed queue support per community request
- fix division of 64bit variable with 32bit divisor by using div_u64()
- remove RDMA, ADQ exclusion as it's not necessary anymore
- changed how driver exports configuration, as queues are not supported
  anymore
- changed IDA to Xarray for unique node identification


V4:
- changed static variable counter to per port IDA to
  uniquely identify nodes

V3:
- removed shift macros, since FIELD_PREP is used
- added static_assert for struct
- removed unnecessary functions
- used tab instead of space in define

V2:
- fixed Alexandr's comments
- refactored code to fix checkpatch issues
- added mutual exclusion for RDMA, DCB



Michal Wilczynski (9):
  devlink: Introduce new parameter 'tx_priority' to devlink-rate
  devlink: Introduce new parameter 'tx_weight' to devlink-rate
  devlink: Enable creation of the devlink-rate nodes from the driver
  devlink: Allow for devlink-rate nodes parent reassignment
  devlink: Allow to set up parent in devl_rate_leaf_create()
  devlink: Allow to change priv in devlink-rate from parent_set
    callbacks
  ice: Introduce new parameters in ice_sched_node
  ice: Implement devlink-rate API
  ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler

 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   4 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |   3 +
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   4 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 478 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h  |   2 +
 drivers/net/ethernet/intel/ice/ice_idc.c      |   5 +
 drivers/net/ethernet/intel/ice/ice_repr.c     |  13 +
 drivers/net/ethernet/intel/ice/ice_sched.c    |  79 ++-
 drivers/net/ethernet/intel/ice/ice_sched.h    |  25 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   8 +
 .../mellanox/mlx5/core/esw/devlink_port.c     |   4 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c |   4 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.h |   2 +-
 drivers/net/netdevsim/dev.c                   |  10 +-
 include/net/devlink.h                         |  21 +-
 include/uapi/linux/devlink.h                  |   3 +
 net/core/devlink.c                            | 145 +++++-
 17 files changed, 778 insertions(+), 32 deletions(-)

-- 
2.37.2



end of thread, other threads:[~2022-10-28  9:37 UTC | newest]

Thread overview: 16+ messages
2022-10-27 13:00 [PATCH net-next v7 0/9] Implement devlink-rate API and extend it Michal Wilczynski
2022-10-27 13:00 ` [PATCH net-next v7 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
2022-10-27 13:00 ` [PATCH net-next v7 2/9] devlink: Introduce new parameter 'tx_weight' " Michal Wilczynski
2022-10-27 13:00 ` [PATCH net-next v7 3/9] devlink: Enable creation of the devlink-rate nodes from the driver Michal Wilczynski
2022-10-27 21:27   ` Przemek Kitszel
2022-10-28  8:51     ` Wilczynski, Michal
2022-10-27 13:00 ` [PATCH net-next v7 4/9] devlink: Allow for devlink-rate nodes parent reassignment Michal Wilczynski
2022-10-27 22:34   ` Przemek Kitszel
2022-10-28  9:35     ` Wilczynski, Michal
2022-10-27 13:00 ` [PATCH net-next v7 5/9] devlink: Allow to set up parent in devl_rate_leaf_create() Michal Wilczynski
2022-10-28  3:28   ` Jakub Kicinski
2022-10-27 13:00 ` [PATCH net-next v7 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks Michal Wilczynski
2022-10-27 13:00 ` [PATCH net-next v7 7/9] ice: Introduce new parameters in ice_sched_node Michal Wilczynski
2022-10-27 13:00 ` [PATCH net-next v7 8/9] ice: Implement devlink-rate API Michal Wilczynski
2022-10-27 13:00 ` [PATCH net-next v7 9/9] ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler Michal Wilczynski
2022-10-28  3:28   ` Jakub Kicinski
