netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: Yuval Avnery <yuvalav@mellanox.com>, jiri@mellanox.com
Cc: netdev@vger.kernel.org, saeedm@mellanox.com, leon@kernel.org,
	davem@davemloft.net, shuah@kernel.org, danielj@mellanox.com,
	parav@mellanox.com, andrew.gospodarek@broadcom.com,
	michael.chan@broadcom.com
Subject: Re: [PATCH net-next v2 00/10] devlink subdev
Date: Mon, 11 Nov 2019 10:00:04 -0800	[thread overview]
Message-ID: <20191111100004.683b7320@cakuba> (raw)
In-Reply-To: <1573229926-30040-1-git-send-email-yuvalav@mellanox.com>

On Fri,  8 Nov 2019 18:18:36 +0200, Yuval Avnery wrote:
> This patchset introduces devlink subdev.
> 
> Currently, legacy tools do not provide a comprehensive solution that can
> be used in both SmartNic and non-SmartNic mode.
> Subdev represents a device that exists on the ASIC but is not necessarily
> visible to the kernel.
> 
> Using devlink ports is not suitable because:
> 
> 1. Those devices aren't necessarily network devices (such as NVMe devices)
>    and doesn’t have E-switch representation. Therefore, there is need for
>    more generic representation of PCI VF.
> 2. Some attributes are not necessarily pure port attributes
>    (number of MSIX vectors)

PCIe attrs will require persistence. Where is that in this design?

Also MSIX vectors are configuration of the devlink port (ASIC side), the
only reason you're putting them in subdev is because some of the subdevs
don't have a port, muddying up the meaning of things.

> 3. It creates a confusing devlink topology, with multiple port flavours
>    and indices.
> 
> Subdev will be created along with flavour and attributes.
> Some network subdevs may be linked with a devlink port.
> 
> This is also aimed to replace "ip link vf" commands as they are strongly
> linked to the PCI topology and allow access only to enabled VFs.
> Even though current patchset and example is limited to MAC address
> of the VF, this interface will allow to manage PF, VF, mdev in
> SmartNic and non SmartNic modes, in unified way for networking and
> non-networking devices via devlink instance.
> 
> Use case example:
> An example system view of a networking ASIC (aka SmartNIC), can be seen in
> below diagram, where devlink eswitch instance and PCI PF and/or VFs are
> situated on two different CPU subsystems:
> 
> 
>       +------------------------------+
>       |                              |
>       |             HOST             |
>       |                              |
>       |   +----+-----+-----+-----+   |
>       |   | PF | VF0 | VF1 | VF2 |   |
>       +---+----+-----------+-----+---+
>                  PCI1|
>           +---+------------+
>               |
>      +----------------------------------------+
>      |        |         SmartNic              |
>      |   +----+-------------------------+     |
>      |   |                              |     |
>      |   |               NIC            |     |
>      |   |                              |     |
>      |   +---------------------+--------+     |
>      |                         |  PCI2        |
>      |         +-----+---------+--+           |
>      |               |                        |
>      |      +-----+--+--+--------------+      |
>      |      |     | PF  |              |      |
>      |      |     +-----+              |      |
>      |      |      Embedded CPU        |      |
>      |      |                          |      |
>      |      +--------------------------+      |
>      |                                        |
>      +----------------------------------------+
> 
> The below diagram shows an example devlink subdev topology where some
> subdevs are connected to devlink ports::
> 
> 
> 
>             (PF0)    (VF0)    (VF1)           (NVME VF2)
>          +--------------------------+         +--------+
>          | devlink| devlink| devlink|         | devlink|
>          | subdev | subdev | subdev |         | subdev |
>          |    0   |    1   |    2   |         |    3   |
>          +--------------------------+         +--------+
>               |        |        |

What is this NVME VF2 connected to? It's gotta get traffic from
somewhere? Frames come in from the uplink, then what?

>               |        |        |
>               |        |        |
>      +----------------------------------+
>      |   | devlink| devlink| devlink|   |
>      |   |  port  |  port  |  port  |   |
>      |   |    0   |    1   |    2   |   |
>      |   +--------------------------+   |
>      |                                  |
>      |                                  |
>      |           E-switch               |
>      |                                  |
>      |                                  |
>      |          +--------+              |
>      |          | uplink |              |
>      |          | devlink|              |
>      |          |  port  |              |
>      +----------------------------------+
>  
> Devlink command example:
> 
> A privileged user wants to configure a VF's hw_addr, before the VF is
> enabled.
> 
> $ devlink subdev set pci/0000:03:00.0/1 hw_addr 10:22:33:44:55:66
> 
> $ devlink subdev show pci/0000:03:00.0/1
> pci/0000:03:00.0/1: flavour pcivf pf 0 vf 0 port_index 1 hw_addr 10:22:33:44:55:66
> 
> $ devlink subdev show pci/0000:03:00.0/1 -jp
> {
>     "subdev": {
>         "pci/0000:03:00.0/1": {
>             "flavour": "pcivf",

If the flavour is pcivf what differentiates this (Ethernet) VF from 
a NVME one?

>             "pf": 0,
>             "vf": 0,
>             "port_index": 1,
>             "hw_addr": "10:22:33:44:55:66"

Since you're messing with the "hw_addr", you should at least provision
the uAPI for adding multiple addresses. Intel guys were asking for this
long time ago.

Have you considered implementing some compat code so drivers don't
have to implement the legacy ndos if they support subdevs?

>         }
>     }
> }

Okay, so you added two diagrams. I guess I was naive in thinking that
"you thought this all through in detail and have more documentation and
design docs internally".

I don't like how unconstrained this is, the only implemented use case is
weak. But since you're not seeing this you probably never will, so
seems like a waste of time to fight it.

  parent reply	other threads:[~2019-11-11 18:00 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08 16:18 [PATCH net-next v2 00/10] devlink subdev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 01/10] devlink: Introduce subdev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 02/10] devlink: Add PCI attributes support for subdev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 03/10] devlink: Add port with subdev register support Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 04/10] devlink: Support subdev HW address get Yuval Avnery
2019-11-08 18:00   ` Parav Pandit
2019-11-09 17:45     ` Yuval Avnery
2019-11-09 18:13       ` Parav Pandit
2019-11-10  8:12         ` Jiri Pirko
2019-11-08 16:18 ` [PATCH net-next v2 05/10] devlink: Support subdev HW address set Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 06/10] Documentation: Add devlink-subdev documentation Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 07/10] netdevsim: Add max_vfs to bus_dev Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 08/10] netdevsim: Add devlink subdev creation Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 09/10] netdevsim: Add devlink subdev sefltest for netdevsim Yuval Avnery
2019-11-08 16:18 ` [PATCH net-next v2 10/10] net/mlx5e: Add support for devlink subdev and subdev hw_addr set/show Yuval Avnery
2019-11-11 18:00 ` Jakub Kicinski [this message]
2019-11-11 18:46   ` [PATCH net-next v2 00/10] devlink subdev Yuval Avnery
2019-11-12 14:55     ` Andy Gospodarek
2019-11-12 18:35       ` Yuval Avnery
2019-11-12 22:19         ` Jakub Kicinski
2019-11-12 23:32           ` Yuval Avnery
2019-11-13  6:37       ` Parav Pandit
2019-11-14 16:47         ` Yuval Avnery

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191111100004.683b7320@cakuba \
    --to=jakub.kicinski@netronome.com \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=danielj@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=jiri@mellanox.com \
    --cc=leon@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=parav@mellanox.com \
    --cc=saeedm@mellanox.com \
    --cc=shuah@kernel.org \
    --cc=yuvalav@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).