From: Jakub Kicinski <kuba@kernel.org>
To: William Tu <witu@nvidia.com>
Cc: <netdev@vger.kernel.org>, <jiri@nvidia.com>, <bodong@nvidia.com>,
<tariqt@nvidia.com>, <yossiku@nvidia.com>
Subject: Re: [PATCH RFC v2 net-next 1/2] devlink: Add shared descriptor eswitch attr
Date: Mon, 4 Mar 2024 20:37:58 -0800
Message-ID: <20240304203758.2fd0f6be@kernel.org>
In-Reply-To: <20240301011119.3267-1-witu@nvidia.com>

On Fri, 1 Mar 2024 03:11:18 +0200 William Tu wrote:
> Add two eswitch attrs: shrdesc_mode and shrdesc_count.
>
> 1. shrdesc_mode: to enable a shared memory buffer for
> representors' rx buffers,
Let's narrow down the terminology. "Shared memory buffer"
and "shared memory pool" and "shrdesc" all refer to the same
thing. Let's stick to shared pool?
> and 2. shrdesc_count: to control the
> number of buffers in this shared memory pool.
_default_ number of buffers in shared pool used by representors?
If/when the API to configure shared pools becomes real it will
presumably take precedence over this default?
> When using switchdev mode, the representor ports handle the slow path
> traffic: traffic that can't be offloaded is redirected to the
> representor port for processing. Memory consumption of the representor
> ports' rx buffers can grow to several GB when scaling to 1K VF reps.
> For example, in the mlx5 driver, each RQ, with a typical 1K descriptors,
> consumes 3MB of DMA memory for packet buffers in WQEs, so with four
> channels, 1K reps consume 4 * 3MB * 1024 = 12GB of memory. And since rep
> ports only handle slow path traffic, most of this rx DMA memory sits idle.
>
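Just to spell out the arithmetic here (numbers taken as-is from the
commit message, nothing new):

# 4 channels per rep, ~3MB of packet buffer per RQ, 1024 reps
$ echo $(( 4 * 3 * 1024 ))MB
12288MB

i.e. roughly 12GB of rx DMA memory that mostly sits idle on rep ports.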
> Add the shrdesc_mode configuration, allowing multiple representors
> to share an rx memory buffer pool. When enabled, an individual
> representor doesn't need to allocate its own dedicated rx buffer; it
> simply points its RQ at the memory pool, so the memory is better
> utilized. The shrdesc_count is the number of rx ring entries (same
> meaning as in ethtool -g) shared across the representors. Users adjust
> it based on the number of reps, total system memory, or performance
> expectations.
Can we use bytes as the unit? Like the page pool. Descriptors don't
mean much to the user.
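Something along these lines, purely to illustrate -- "shrdesc-bytes" is
a made-up name, nothing in this patch implements it:

# hypothetical byte-based knob: size the shared pool in bytes (3MB here)
# instead of in descriptors
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev shrdesc-bytes 3145728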
> The two params are also useful for other vendors, such as the Intel
> ice and Broadcom drivers, which also have representor ports for
> slow path traffic.
>
> An example use case:
> $ devlink dev eswitch show pci/0000:08:00.0
> pci/0000:08:00.0: mode legacy inline-mode none encap-mode basic \
> shrdesc-mode none shrdesc-count 0
> $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev \
> shrdesc-mode basic shrdesc-count 1024
> $ devlink dev eswitch show pci/0000:08:00.0
> pci/0000:08:00.0: mode switchdev inline-mode none encap-mode basic \
> shrdesc-mode basic shrdesc-count 1024
>
> Note that the new configurations are set while in legacy mode, and take
> effect in switchdev mode.
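If I read that note right, the intended flow is something like the
following (hypothetical two-step sequence; the example above does it in
a single set):

# stage the shared pool settings while the eswitch is still in legacy mode
$ devlink dev eswitch set pci/0000:08:00.0 shrdesc-mode basic shrdesc-count 1024
# they take effect once the eswitch is moved to switchdev
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev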
> Documentation/netlink/specs/devlink.yaml | 17 ++++++++++
> include/net/devlink.h | 8 +++++
> include/uapi/linux/devlink.h | 7 ++++
> net/devlink/dev.c | 43 ++++++++++++++++++++++++
> net/devlink/netlink_gen.c | 6 ++--
> 5 files changed, 79 insertions(+), 2 deletions(-)
ENODOCS
> diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
> index cf6eaa0da821..58f31d99b8b3 100644
> --- a/Documentation/netlink/specs/devlink.yaml
> +++ b/Documentation/netlink/specs/devlink.yaml
> @@ -119,6 +119,14 @@ definitions:
>          name: none
>        -
>          name: basic
> +  -
> +    type: enum
> +    name: eswitch-shrdesc-mode
> +    entries:
> +      -
> +        name: none
> +      -
> +        name: basic
Do we need this knob?
Can we not assume that shared-pool-count == 0 means disabled?
We can always add the knob later if needed, right now it's
just on / off with some less direct names.
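IOW the uAPI could be just the count (sketch only, assuming the mode
attribute is dropped -- none of this exists in the patch as posted):

# 0 (the default) would mean no shared pool
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev shrdesc-count 0
# any non-zero count would enable it, no separate shrdesc-mode knob
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev shrdesc-count 1024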
>    -
>      type: enum
>      name: dpipe-header-id
> @@ -429,6 +437,13 @@ attribute-sets:
>          name: eswitch-encap-mode
>          type: u8
>          enum: eswitch-encap-mode
> +      -
> +        name: eswitch-shrdesc-mode
> +        type: u8
u32, netlink rounds sizes up to 4B, anyway
> +        enum: eswitch-shrdesc-mode
> +      -
> +        name: eswitch-shrdesc-count
> +        type: u32
>        -
>          name: resource-list
>          type: nest