From: Jiri Pirko <jiri@resnulli.us>
To: Jakub Kicinski <kuba@kernel.org>
Cc: William Tu <witu@nvidia.com>,
Jacob Keller <jacob.e.keller@intel.com>,
bodong@nvidia.com, jiri@nvidia.com, netdev@vger.kernel.org,
saeedm@nvidia.com,
"aleksander.lobakin@intel.com" <aleksander.lobakin@intel.com>
Subject: Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd
Date: Fri, 2 Feb 2024 08:46:56 +0100
Message-ID: <Zbyd8Fbj8_WHP4WI@nanopsycho>
In-Reply-To: <20240201200041.241fd4c1@kernel.org>
Fri, Feb 02, 2024 at 05:00:41AM CET, kuba@kernel.org wrote:
>On Thu, 1 Feb 2024 11:13:57 +0100 Jiri Pirko wrote:
>> Thu, Feb 01, 2024 at 12:17:26AM CET, kuba@kernel.org wrote:
>> >> I guess bnxt, ice, nfp are doing tx buffer sharing?
>> >
>> >I'm not familiar with ice. I'm 90% sure bnxt shares both Rx and Tx.
>> >I'm 99.9% sure nfp does.
>>
>> Wait a sec.
>
>No, you wait a sec ;) Why do you think this belongs to devlink?
>Two months ago you were complaining bitterly when people were
>considering using devlink rate to control per-queue shapers.
>And now it's fine to add queues as a concept to devlink?
Do you have a better suggestion for how to model a common pool object
for multiple netdevices? This is the reason devlink was introduced: to
provide a platform for common/shared things for a device that contains
multiple netdevs/ports/whatever. But I may be missing something here,
for sure.
>
>> You refer to using the lower device (like PF) to actually
>> send and receive traffic of representors. That means, you share the
>> entire queues. Or maybe better term is not "share" but "use PF queues".
>>
>> The infra William is proposing is about something else. In that
>> scenario, each representor has a separate independent set of queues,
>> as well as the PF has. Currently in mlx5, all representor queues have
>> descriptors only used for the individual representor. So there is
>> a huge waste of memory, as there is often only very low traffic
>> there and the probability of hitting a traffic burst on many
>> representors at the same time is very low.
>>
>> Say you have 1 queue for a rep. 1 queue has 1k descriptors. For 1k
>> representors you end up with:
>> 1k x 1k = 1M descriptors
>
>I understand the memory waste problem:
>https://people.kernel.org/kuba/nic-memory-reserve
>
>> With this API, user can configure sharing of the descriptors.
>> So there would be a pool (or multiple pools) of descriptors and the
>> descriptors could be used by many queues/representors.
>>
>> So in the example above, for 1k representors you have only 1k
>> descriptors.
>>
>> The infra allows great flexibility in terms of configuring multiple
>> pools of different sizes and assigning queues from representors to
>> different pools. So you can have multiple "classes" of representors.
>> For example, the ones where you expect heavy traffic could have a separate pool,
>> the rest can share another pool together, etc.
>
>Well, it does not extend naturally to the design described in that blog
>post. There I only care about a netdev level pool, but every queue can
>bind multiple pools.
>
>It also does not cater naturally to a very interesting application
>of such tech to lightweight container interfaces, macvlan-offload style.
>As I said at the beginning, why is the pool a devlink thing if the only
>objects that connect to it are netdevs?
Okay. Let's model it differently, no problem. I find the devlink device
a good fit as the object to contain shared things like pools.
But perhaps there could be something else. Something new?
>
>Another netdev thing where this will be awkward is page pool
>integration. It lives in netdev genl, are we going to add devlink pool
>reference to indicate which pool a pp is feeding?
Page pool is per-netdev, isn't it? It could be extended to be bound per
devlink-pool as you suggest. It is a bit awkward, I agree.
So instead of devlink, should we add the descriptor-pool object into
netdev genl and make it possible for multiple netdevs to use it there?
I would still miss the namespace of the pool, as it naturally aligns
with the devlink device. IDK :/
>
>When memory providers finally materialize that will be another
>netdev thing that needs to somehow connect here.
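
For reference, the descriptor arithmetic quoted above (1k representors,
1 queue each, 1k descriptors per queue) can be checked with a few lines
of Python. This is only a sketch of the accounting, not of any driver or
devlink interface: the function names, the pool names "heavy" and
"default", and the 4000-descriptor size of the heavy pool are all made
up for illustration, and "1k" is taken as 1000.

```python
# Hypothetical accounting model; nothing here is an mlx5 or devlink API.

def dedicated_descriptors(num_reps, queues_per_rep, descs_per_queue):
    """Every representor queue owns its own descriptors."""
    return num_reps * queues_per_rep * descs_per_queue


def pooled_descriptors(pool_sizes):
    """Queues draw from shared pools, so only the pool sizes count."""
    return sum(pool_sizes.values())


# Numbers from the thread: 1k representors, 1 queue each, 1k descriptors.
dedicated = dedicated_descriptors(1000, 1, 1000)

# One shared pool of 1k descriptors serving all representors.
single_pool = pooled_descriptors({"shared": 1000})

# Two "classes" of representors (sizes are an assumption): a larger pool
# for the ones expected to see heavy traffic, a small one for the rest.
two_class = pooled_descriptors({"heavy": 4000, "default": 1000})

print(dedicated, single_pool, two_class)
```

The point is only that, with sharing, the total scales with the pool
sizes the user picks rather than with the number of representors.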