netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: Arkadi Sharshevsky <arkadis@mellanox.com>
Cc: netdev@vger.kernel.org, David Miller <davem@davemloft.net>,
	ivecera@redhat.com, roopa@cumulusnetworks.com,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@savoirfairelinux.com>,
	john.fastabend@gmail.com, Andrew Lunn <andrew@lunn.ch>,
	mlxsw <mlxsw@mellanox.com>
Subject: Re: Driver profiles RFC
Date: Tue, 8 Aug 2017 15:24:11 +0200	[thread overview]
Message-ID: <20170808132411.GF1853@nanopsycho> (raw)
In-Reply-To: <6d8560fa-8346-0c43-272d-d39be65ea82f@mellanox.com>

Tue, Aug 08, 2017 at 03:15:41PM CEST, arkadis@mellanox.com wrote:
>Drivers may require driver specific information during the init stage.
>For example, memory based shared resource which should be segmented for
>different ASIC processes, such as FDB and LPM lookups.
>
>The current mlxsw implementation assumes some default values, which are
>const and cannot be changed due to lack of UAPI for its configuration
>(module params is not an option). Those values can greatly impact the
>scale of the hardware processes, such as the maximum sizes of the FDB/LPM
>tables. Furthermore, those values should be consistent between driver
>reloads.
>
>The interface called DPIPE [1] was introduced in order to provide
>abstraction of the hardware pipeline. This RFC letter suggests solving
>this problem by enhancing the DPIPE hardware abstraction model.
>
>DPIPE Resource
>==============
>
>In order to represent ASIC wide resources space a new object should be
>introduced called "resource". It was originally suggested as future
>extension in [1] in order to give the user visibility about the tables
>limitation due to some shared resource. For example FDB and LPM share
>a common hash based memory. This abstraction can be also used for
>providing static configuration for such resources.
>
>Resource
>--------
>The resource object defines generic hardware resource like memory,
>counter pool, etc. which can be described by name and size. The resource
>can be nested, for example the internal ASIC's memory can be split into
>two parts, as can be seen in the following diagram:
>
>                    +---------------+
>                    |  Internal Mem |
>                    |               |
>                    |   Size: 3M*   |
>                    +---------------+
>                      /           \
>                     /             \
>                    /               \
>                   /                 \
>                  /                   \
>         +--------------+      +--------------+
>         |    Linear    |      |     Hash     |
>         |              |      |              |
>         |   Size: 1M   |      |   Size: 2M   |
>         +--------------+      +--------------+
>
>*The number are provided as an example and do not reflect real ASIC
> resource sizes
>
>Where the hash portion is used for FDB/LPM table lookups, and the linear
>one is used by the routing adjacency table. Each resource can be described
>by a name, size and list of children. Example for dumping the described
>above structure:
>
>#devlink dpipe resource dump tree pci/0000:03:00.0 Mem
>{
>    "resource": {
>       "pci/0000:03:00.0": [{
>            "name": "Mem",
>            "size": 3M,
>            "resource": [{
>                      "name": "Mem_Linear",
>                      "size": "1M",
>                     }, {
>                      "name": "Mem_Hash",
>                      "size": "2MK",
>		     }
>              }]
>        }]

This is dumped from kernel either by list or tree using nesting.
I think that list makes more sense and userspace can assemble the tree
according to references.


>     }
>}
>
>Each DPIPE table can be connected to one resource.
>
>Driver <--> Devlink API
>=======================
>Each driver will register his resources with default values at init in
>a similar way to DPIPE table registration. In case those resources already
>exist the default values are discarded. The user will be able to dump and
>update the resources. In order for the changes to take place the user will
>need to re-initiate the driver by a specific devlink knob.
>
>The above described procedure will require extra reload of the driver.
>This can be improved as a future optimization.
>
>UAPI
>====
>The user will be able to update the resources on a per resource basis:
>
>$devlink dpipe resource set pci/0000:03:00.0 Mem_Linear 2M
>
>For some resources the size is fixed, for example the size of the internal
>memory cannot be changed. It is provided merely in order to reflect the
>nested structure of the resource and to imply the user that Mem = Linear +
>Hash, thus a set operation on it will fail.
>
>The user can dump the current resource configuration:
>
>#devlink dpipe resource dump tree pci/0000:03:00.0 Mem
>
>The user can specify 'tree' in order to show all the nested resources under
>the specified one. In case no 'resource name' is specified the TOP hierarchy
>will be dumped.
>
>After successful resource update the drivers hould be re-instantiated in
>order for the changes to take place:
>
>$devlink reload pci/0000:03:00.0
>
>User Configuration
>------------------
>Such an UAPI is very low level, and thus an average user may not know how to
>adjust this sizes according to his needs. The vendor can provide several
>tested configuration files that the user can choose from. Each config file
>will be measured in terms of: MAC addresses, L3 Neighbors (IPv4, IPv6),
>LPM entries (IPv4,IPv6) in order to provide approximate results. By this an
>average user will choose one of the provided ones. Furthermore, a more
>advanced user could play with the numbers for his personal benefit.
>
>Reference
>=========
>[1] https://netdevconf.org/2.1/papers/dpipe_netdev_2_1.odt
>

This provides great visibility and ability to tweak the ASIC in very
well defined way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>

  reply	other threads:[~2017-08-08 13:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-08 13:15 Driver profiles RFC Arkadi Sharshevsky
2017-08-08 13:24 ` Jiri Pirko [this message]
2017-08-08 13:54 ` Andrew Lunn
2017-08-08 15:44   ` Arkadi Sharshevsky
2017-08-08 16:08 ` Roopa Prabhu
2017-08-09 11:43   ` Arkadi Sharshevsky
2017-08-11 14:34     ` Roopa Prabhu
2017-08-11 21:57 ` Jakub Kicinski
2017-08-13  6:32   ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170808132411.GF1853@nanopsycho \
    --to=jiri@resnulli.us \
    --cc=andrew@lunn.ch \
    --cc=arkadis@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=ivecera@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=mlxsw@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=roopa@cumulusnetworks.com \
    --cc=vivien.didelot@savoirfairelinux.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).