From: Jiri Pirko <jiri@resnulli.us>
To: Arkadi Sharshevsky <arkadis@mellanox.com>
Cc: netdev@vger.kernel.org, David Miller <davem@davemloft.net>,
ivecera@redhat.com, roopa@cumulusnetworks.com,
Florian Fainelli <f.fainelli@gmail.com>,
Vivien Didelot <vivien.didelot@savoirfairelinux.com>,
john.fastabend@gmail.com, Andrew Lunn <andrew@lunn.ch>,
mlxsw <mlxsw@mellanox.com>
Subject: Re: Driver profiles RFC
Date: Tue, 8 Aug 2017 15:24:11 +0200
Message-ID: <20170808132411.GF1853@nanopsycho>
In-Reply-To: <6d8560fa-8346-0c43-272d-d39be65ea82f@mellanox.com>
Tue, Aug 08, 2017 at 03:15:41PM CEST, arkadis@mellanox.com wrote:
>Drivers may require driver specific information during the init stage.
>For example, a memory based shared resource may need to be segmented
>between different ASIC processes, such as FDB and LPM lookups.
>
>The current mlxsw implementation assumes some default values, which are
>const and cannot be changed due to lack of UAPI for its configuration
>(module params are not an option). Those values can greatly impact the
>scale of the hardware processes, such as the maximum sizes of the FDB/LPM
>tables. Furthermore, those values should be consistent between driver
>reloads.
>
>The interface called DPIPE [1] was introduced in order to provide
>abstraction of the hardware pipeline. This RFC letter suggests solving
>this problem by enhancing the DPIPE hardware abstraction model.
>
>DPIPE Resource
>==============
>
>In order to represent the ASIC-wide resource space, a new object called
>"resource" should be introduced. It was originally suggested as a future
>extension in [1] in order to give the user visibility into table
>limitations caused by a shared resource. For example, FDB and LPM share
>a common hash based memory. This abstraction can also be used for
>providing static configuration for such resources.
>
>Resource
>--------
>The resource object defines a generic hardware resource such as memory,
>a counter pool, etc., which can be described by name and size. A resource
>can be nested; for example, the ASIC's internal memory can be split into
>two parts, as can be seen in the following diagram:
>
>                +---------------+
>                | Internal Mem  |
>                |               |
>                |  Size: 3M*    |
>                +---------------+
>                  /           \
>                 /             \
>                /               \
>               /                 \
>              /                   \
>  +--------------+              +--------------+
>  |    Linear    |              |     Hash     |
>  |              |              |              |
>  |   Size: 1M   |              |   Size: 2M   |
>  +--------------+              +--------------+
>
>*The numbers are provided as an example and do not reflect real ASIC
> resource sizes.
>
>The hash portion is used for FDB/LPM table lookups, and the linear one
>is used by the routing adjacency table. Each resource can be described
>by a name, a size and a list of children. Example of dumping the
>structure described above:
>
>#devlink dpipe resource dump tree pci/0000:03:00.0 Mem
>{
> "resource": {
>  "pci/0000:03:00.0": [{
>   "name": "Mem",
>   "size": "3M",
>   "resource": [{
>    "name": "Mem_Linear",
>    "size": "1M"
>   }, {
>    "name": "Mem_Hash",
>    "size": "2M"
>   }]
>  }]
This is dumped from the kernel either as a list or as a tree using nesting.
I think that a list makes more sense and userspace can assemble the tree
according to references.
> }
>}
>
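To illustrate the list variant mentioned above, here is a minimal sketch in
Python of how userspace could assemble the tree from a flat dump. The field
names ("name", "size", "parent") and the shape of the flat dump are
assumptions made up for this example; they are not an existing devlink
format.

```python
def build_tree(flat):
    """Assemble nested resources from a flat list that carries parent references.

    Each entry is a dict with "name", "size" and "parent" (None for a
    top-level resource). These keys are hypothetical.
    """
    nodes = {
        r["name"]: {"name": r["name"], "size": r["size"], "resource": []}
        for r in flat
    }
    roots = []
    for r in flat:
        parent = r.get("parent")
        if parent is None:
            roots.append(nodes[r["name"]])
        else:
            # Attach the child under the resource it references.
            nodes[parent]["resource"].append(nodes[r["name"]])
    return roots


# Flat dump matching the example hierarchy quoted above.
flat_dump = [
    {"name": "Mem", "size": "3M", "parent": None},
    {"name": "Mem_Linear", "size": "1M", "parent": "Mem"},
    {"name": "Mem_Hash", "size": "2M", "parent": "Mem"},
]
tree = build_tree(flat_dump)
```

With a flat dump the kernel side stays simple, and the nesting depth is
only a userspace presentation concern.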
>Each DPIPE table can be connected to one resource.
>
>Driver <--> Devlink API
>=======================
>Each driver will register its resources with default values at init in
>a similar way to DPIPE table registration. In case those resources already
>exist, the default values are discarded. The user will be able to dump and
>update the resources. In order for the changes to take effect, the user
>will need to re-initialize the driver via a specific devlink knob.
>
>The procedure described above will require an extra reload of the driver.
>This can be improved as a future optimization.
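The registration semantics described here can be modelled roughly as
follows. This is only a Python sketch of the behaviour, not kernel code;
the class and method names are invented for illustration, and sizes are
taken to be bytes with M meaning 2^20, which is an assumption.

```python
class ResourceStore:
    """Toy model of the devlink side keeping resource sizes across reloads."""

    def __init__(self):
        # Survives driver reloads, keyed by resource name.
        self.configured = {}

    def register(self, name, default_size):
        # The default is used only if the resource is not known yet;
        # otherwise it is discarded and the stored value wins.
        return self.configured.setdefault(name, default_size)

    def set(self, name, size):
        if name not in self.configured:
            raise KeyError(name)
        self.configured[name] = size


M = 1 << 20
store = ResourceStore()

# First driver init: the default is accepted.
size_at_first_init = store.register("Mem_Linear", 1 * M)

# User update, followed by a reload in which the driver registers again.
store.set("Mem_Linear", 2 * M)
size_after_reload = store.register("Mem_Linear", 1 * M)
```

The point is that the second register() call returns the user's value, so
the configuration stays consistent between driver reloads.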
>
>UAPI
>====
>The user will be able to update the resources on a per resource basis:
>
>$devlink dpipe resource set pci/0000:03:00.0 Mem_Linear 2M
>
>For some resources the size is fixed, for example the size of the internal
>memory cannot be changed. It is provided merely in order to reflect the
>nested structure of the resource and to indicate to the user that
>Mem = Linear + Hash, thus a set operation on it will fail.
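A small Python sketch of the two rules in this paragraph: a set on a
fixed-size resource fails, and the children are expected to add up to the
parent. The Resource class, the "fixed" flag and the is_consistent() check
are hypothetical; the RFC does not say when or how the sum would actually
be validated.

```python
class Resource:
    def __init__(self, name, size, fixed=False):
        self.name = name
        self.size = size
        self.fixed = fixed
        self.children = []

    def set_size(self, size):
        # A fixed resource only reflects the hierarchy and cannot be resized.
        if self.fixed:
            raise PermissionError(f"{self.name}: size is fixed")
        self.size = size

    def is_consistent(self):
        # Assumed rule: children must add up exactly to the parent size.
        if not self.children:
            return True
        return (sum(c.size for c in self.children) == self.size
                and all(c.is_consistent() for c in self.children))


M = 1 << 20  # assuming M means 2^20 bytes
mem = Resource("Mem", 3 * M, fixed=True)
linear = Resource("Mem_Linear", 1 * M)
hash_mem = Resource("Mem_Hash", 2 * M)
mem.children = [linear, hash_mem]

linear.set_size(2 * M)
consistent_after_one_set = mem.is_consistent()

hash_mem.set_size(1 * M)
consistent_after_both = mem.is_consistent()

try:
    mem.set_size(4 * M)
    fixed_set_rejected = False
except PermissionError:
    fixed_set_rejected = True
```

Checking the sum separately from the individual set operations lets the
user change Linear and Hash one at a time and validate the result once,
for instance right before the reload.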
>
>The user can dump the current resource configuration:
>
>#devlink dpipe resource dump tree pci/0000:03:00.0 Mem
>
>The user can specify 'tree' in order to show all the nested resources under
>the specified one. In case no 'resource name' is specified the TOP hierarchy
>will be dumped.
>
>After a successful resource update the driver should be re-instantiated in
>order for the changes to take effect:
>
>$devlink reload pci/0000:03:00.0
>
>User Configuration
>------------------
>Such a UAPI is very low level, and thus an average user may not know how to
>adjust these sizes according to his needs. The vendor can provide several
>tested configuration files that the user can choose from. Each config file
>will be measured in terms of MAC addresses, L3 neighbors (IPv4, IPv6), and
>LPM entries (IPv4, IPv6) in order to provide approximate results. This way
>an average user can choose one of the provided files, while a more
>advanced user could tune the numbers for his own benefit.
>
>Reference
>=========
>[1] https://netdevconf.org/2.1/papers/dpipe_netdev_2_1.odt
>
This provides great visibility and the ability to tweak the ASIC in a very
well-defined way.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>