From: Jiri Pirko <jiri@resnulli.us>
To: Jakub Kicinski <kuba@kernel.org>
Cc: William Tu <witu@nvidia.com>,
	Jacob Keller <jacob.e.keller@intel.com>,
	bodong@nvidia.com, jiri@nvidia.com, netdev@vger.kernel.org,
	saeedm@nvidia.com,
	"aleksander.lobakin@intel.com" <aleksander.lobakin@intel.com>
Subject: Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd
Date: Fri, 2 Feb 2024 08:46:56 +0100
Message-ID: <Zbyd8Fbj8_WHP4WI@nanopsycho>
In-Reply-To: <20240201200041.241fd4c1@kernel.org>

Fri, Feb 02, 2024 at 05:00:41AM CET, kuba@kernel.org wrote:
>On Thu, 1 Feb 2024 11:13:57 +0100 Jiri Pirko wrote:
>> Thu, Feb 01, 2024 at 12:17:26AM CET, kuba@kernel.org wrote:
>> >> I guess bnxt, ice, nfp are doing tx buffer sharing?  
>> >
>> >I'm not familiar with ice. I'm 90% sure bnxt shares both Rx and Tx.
>> >I'm 99.9% sure nfp does.  
>> 
>> Wait a sec.
>
>No, you wait a sec ;) Why do you think this belongs to devlink?
>Two months ago you were complaining bitterly when people were
>considering using devlink rate to control per-queue shapers.
>And now it's fine to add queues as a concept to devlink?

Do you have a better suggestion for how to model a common pool object
shared by multiple netdevices? This is the reason devlink was introduced:
to provide a platform for common/shared things for a device that contains
multiple netdevs/ports/whatever. But I may be missing something here,
for sure.


>
>> You refer to using the lower device (like PF) to actually
>> send and receive traffic of representors. That means, you share the
>> entire queues. Or maybe better term is not "share" but "use PF queues".
>> 
>> The infra William is proposing is about something else. In that
>> scenario, each representor has a separate independent set of queues,
>> as well as the PF has. Currently in mlx5, all representor queues have
>> descriptors only used for the individual representor. So there is
>> a huge waste of memory for that, as often there is only very low traffic
>> there and probability of hitting traffic burst on many representors at
>> the same time is very low.
>> 
>> Say you have 1 queue for a rep. 1 queue has 1k descriptors. For 1k
>> representors you end up with:
>> 1k x 1k = 1m descriptors
>
>I understand the memory waste problem:
>https://people.kernel.org/kuba/nic-memory-reserve
>
>> With this API, user can configure sharing of the descriptors.
>> So there would be a pool (or multiple pools) of descriptors and the
>> descriptors could be used by many queues/representors.
>> 
>> So in the example above, for 1k representors you have only 1k
>> descriptors.
>> 
>> The infra allows great flexibility in terms of configuring multiple
>> pools of different sizes and assigning queues from representors to
>> different pools. So you can have multiple "classes" of representors.
>> For example the ones you expect heavy traffic could have a separate pool,
>> the rest can share another pool together, etc.
>
>Well, it does not extend naturally to the design described in that blog
>post. There I only care about a netdev level pool, but every queue can
>bind multiple pools.
>
>It also does not cater naturally to a very interesting application
>of such tech to lightweight container interfaces, macvlan-offload style.
>As I said at the beginning, why is the pool a devlink thing if the only
>objects that connect to it are netdevs?

Okay. Let's model it differently, no problem. I find a devlink device
a good fit for an object that contains shared things like pools.
But perhaps there could be something else. Something new?


>
>Another netdev thing where this will be awkward is page pool
>integration. It lives in netdev genl, are we going to add devlink pool
>reference to indicate which pool a pp is feeding?

Page pool is per-netdev, isn't it? It could be extended to be bound per
devlink-pool as you suggest. It is a bit awkward, I agree.

So instead of devlink, should we add the descriptor-pool object to
netdev genl and make it possible for multiple netdevs to use it there?
I would still miss a namespace for the pool, as it naturally aligns
with the devlink device. IDK :/


>
>When memory providers finally materialize that will be another
>netdev thing that needs to somehow connect here.
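
The descriptor arithmetic quoted earlier in this message (1k representors,
1 queue each, 1k descriptors per queue) can be sketched as below. This is
only an illustrative model, not devlink or mlx5 code: the Pool class, the
function names, the "heavy"/"light" split and the 4000-descriptor pool size
are all hypothetical, and "1k" is taken as 1000 to match the "1m" figure.

```python
from dataclasses import dataclass, field

DESCS_PER_QUEUE = 1000  # "1k descriptors" per queue (assumed decimal k)


def dedicated_descriptors(num_reps, queues_per_rep=1,
                          descs_per_queue=DESCS_PER_QUEUE):
    """Descriptors needed when every representor queue owns its own ring."""
    return num_reps * queues_per_rep * descs_per_queue


@dataclass
class Pool:
    """Hypothetical shared descriptor pool; queues borrow from it on demand."""
    name: str
    size: int                      # descriptors held by the pool
    queues: list = field(default_factory=list)  # queues bound to this pool


def shared_descriptors(pools):
    """Descriptors needed when queues draw from shared pools."""
    return sum(p.size for p in pools)


# 1k representors, one queue each.
reps = [f"rep{i}" for i in range(1000)]

# Today: each representor queue has its own descriptors.
dedicated = dedicated_descriptors(len(reps))

# One pool shared by all representor queues.
single_pool = [Pool("default", DESCS_PER_QUEUE, list(reps))]

# Two "classes": a few heavy-traffic representors get a larger pool
# (size chosen arbitrarily for illustration), the rest share a small one.
classes = [
    Pool("heavy", 4000, reps[:16]),
    Pool("light", 1000, reps[16:]),
]

print(dedicated)                        # 1000000
print(shared_descriptors(single_pool))  # 1000
print(shared_descriptors(classes))      # 5000
```

The model ignores what happens when a pool runs dry under a burst; it only
shows how the descriptor count scales with the number of pools rather than
with the number of representors.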
