From: Jiri Pirko <jiri@resnulli.us>
To: Jakub Kicinski <kuba@kernel.org>
Cc: William Tu <witu@nvidia.com>,
	Jacob Keller <jacob.e.keller@intel.com>,
	bodong@nvidia.com, jiri@nvidia.com, netdev@vger.kernel.org,
	saeedm@nvidia.com,
	"aleksander.lobakin@intel.com" <aleksander.lobakin@intel.com>
Subject: Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd
Date: Fri, 2 Feb 2024 08:46:56 +0100
Message-ID: <Zbyd8Fbj8_WHP4WI@nanopsycho>
In-Reply-To: <20240201200041.241fd4c1@kernel.org>

Fri, Feb 02, 2024 at 05:00:41AM CET, kuba@kernel.org wrote:
>On Thu, 1 Feb 2024 11:13:57 +0100 Jiri Pirko wrote:
>> Thu, Feb 01, 2024 at 12:17:26AM CET, kuba@kernel.org wrote:
>> >> I guess bnxt, ice, nfp are doing tx buffer sharing?  
>> >
>> >I'm not familiar with ice. I'm 90% sure bnxt shares both Rx and Tx.
>> >I'm 99.9% sure nfp does.  
>> 
>> Wait a sec.
>
>No, you wait a sec ;) Why do you think this belongs in devlink?
>Two months ago you were complaining bitterly when people were
>considering using devlink rate to control per-queue shapers.
>And now it's fine to add queues as a concept to devlink?

Do you have a better suggestion for how to model a common pool object
for multiple netdevices? This is exactly why devlink was introduced: to
provide a platform for common/shared things on a device that contains
multiple netdevs/ports/whatever. But I may be missing something here,
for sure.


>
>> You refer to using the lower device (like the PF) to actually
>> send and receive traffic of the representors. That means you share
>> the entire queues. Or maybe the better term is not "share" but
>> "use PF queues".
>> 
>> The infra William is proposing is about something else. In that
>> scenario, each representor has a separate, independent set of queues,
>> just as the PF does. Currently in mlx5, all representor queues have
>> descriptors used only by that individual representor. That wastes a
>> huge amount of memory, because traffic on a representor is often very
>> low and the probability of hitting a traffic burst on many
>> representors at the same time is very low.
>> 
>> Say you have 1 queue for a rep. 1 queue has 1k descriptors. For 1k
>> representors you end up with:
>> 1k x 1k = 1m descriptors
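
To make that concrete, a quick back-of-the-envelope sketch; the 64-byte
descriptor size below is my assumption for illustration, not a number
from the RFC or from mlx5:

#include <stdio.h>

int main(void)
{
	const unsigned long reps = 1000;        /* representors */
	const unsigned long descs_per_q = 1000; /* descriptors per queue */
	const unsigned long desc_sz = 64;       /* assumed bytes per descriptor */

	/* Dedicated rings: every representor carries its own descriptors. */
	unsigned long dedicated = reps * descs_per_q * desc_sz;
	/* Shared pool: one set of descriptors serves all representors. */
	unsigned long shared = descs_per_q * desc_sz;

	printf("dedicated: %lu MB\n", dedicated / (1024 * 1024));
	printf("shared pool: %lu KB\n", shared / 1024);
	return 0;
}
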
>
>I understand the memory waste problem:
>https://people.kernel.org/kuba/nic-memory-reserve
>
>> With this API, the user can configure sharing of the descriptors:
>> there would be a pool (or multiple pools) of descriptors, and those
>> descriptors could be used by many queues/representors.
>> 
>> So in the example above, for 1k representors you have only 1k
>> descriptors.
>> 
>> The infra allows great flexibility in configuring multiple
>> pools of different sizes and assigning queues from representors to
>> different pools, so you can have multiple "classes" of representors.
>> For example, the ones where you expect heavy traffic could get a
>> separate pool, while the rest share another pool together, etc.
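
A hypothetical sketch of those two objects (a pool and a queue-to-pool
binding) with two pool "classes"; every name and field here is mine for
illustration, not the RFC's actual UAPI:

#include <stdint.h>
#include <stdio.h>

struct sd_pool {
	uint32_t id;         /* pool identifier within the device */
	uint32_t desc_count; /* descriptors backing all bound queues */
};

struct sd_binding {
	uint32_t ifindex;  /* representor netdev */
	uint32_t queue_id; /* rx queue on that netdev */
	uint32_t pool_id;  /* pool the queue draws descriptors from */
};

int main(void)
{
	/* Two classes: a big pool for heavy-traffic reps, and a small
	 * pool shared by everything else. */
	struct sd_pool pools[] = {
		{ .id = 0, .desc_count = 4096 },
		{ .id = 1, .desc_count = 1024 },
	};
	struct sd_binding bindings[] = {
		{ .ifindex = 10, .queue_id = 0, .pool_id = 0 },
		{ .ifindex = 11, .queue_id = 0, .pool_id = 1 },
		{ .ifindex = 12, .queue_id = 0, .pool_id = 1 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(bindings) / sizeof(bindings[0]); i++)
		printf("ifindex %u queue %u -> pool %u (%u descs)\n",
		       bindings[i].ifindex, bindings[i].queue_id,
		       bindings[i].pool_id,
		       pools[bindings[i].pool_id].desc_count);
	return 0;
}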
>
>Well, it does not extend naturally to the design described in that
>blog post. There I only care about a netdev-level pool, but every
>queue can bind to multiple pools.
>
>It also does not cater naturally to a very interesting application
>of such tech to lightweight container interfaces, macvlan-offload style.
>As I said at the beginning, why is the pool a devlink thing if the only
>objects that connect to it are netdevs?

Okay, let's model it differently, no problem. I find the devlink device
a good fit as the object to contain shared things like pools, but
perhaps there could be something else. Something new?


>
>Another netdev thing where this will be awkward is page pool
>integration. It lives in netdev genl; are we going to add a devlink
>pool reference to indicate which pool a pp is feeding?

The page pool is per-netdev, isn't it? It could be extended to be bound
per devlink-pool as you suggest. It is a bit awkward, I agree.

So instead of devlink, should we add the descriptor-pool object into
netdev genl and make it possible for multiple netdevs to use it there?
I would still miss the namespacing of the pool, which naturally aligns
with the devlink device. IDK :/
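
To illustrate that namespacing concern, a hypothetical attribute-layout
sketch (all names here are made up, not an existing or proposed UAPI):
if the pool object moved into netdev genl, its identity would
presumably still need to carry a device handle so that pool IDs are
scoped per device rather than system-global, which is exactly what the
devlink instance gives for free:

/* Hypothetical netdev genl attributes for a descriptor pool. */
enum {
	NETDEV_A_DPOOL_ID,           /* pool id, scoped to one device */
	NETDEV_A_DPOOL_SIZE,         /* number of descriptors in the pool */
	/* Without the two attributes below, pool IDs would have to be
	 * system-global; with them, the pool is naturally scoped to a
	 * device, much as a devlink handle would scope it. */
	NETDEV_A_DPOOL_DEV_BUS_NAME, /* e.g. "pci" */
	NETDEV_A_DPOOL_DEV_NAME,     /* e.g. "0000:08:00.0" */
};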


>
>When memory providers finally materialize, that will be another
>netdev thing that needs to somehow connect here.

