From: Jakub Kicinski <kuba@kernel.org>
To: William Tu <witu@nvidia.com>
Cc: <netdev@vger.kernel.org>, <jiri@nvidia.com>, <bodong@nvidia.com>,
	<tariqt@nvidia.com>, <yossiku@nvidia.com>
Subject: Re: [PATCH RFC v2 net-next 1/2] devlink: Add shared descriptor eswitch attr
Date: Mon, 4 Mar 2024 20:37:58 -0800
Message-ID: <20240304203758.2fd0f6be@kernel.org>
In-Reply-To: <20240301011119.3267-1-witu@nvidia.com>

On Fri, 1 Mar 2024 03:11:18 +0200 William Tu wrote:
> Add two eswitch attrs: shrdesc_mode and shrdesc_count.
> 
> 1. shrdesc_mode: to enable a sharing memory buffer for
> representor's rx buffer, 

Let's narrow down the terminology. "Shared memory buffer"
and "shared memory pool" and "shrdesc" all refer to the same
thing. Let's stick to shared pool?

> and 2. shrdesc_count: to control the
> number of buffers in this shared memory pool.

_default_ number of buffers in shared pool used by representors?

If/when the API to configure shared pools becomes real it will
presumably take precedence over this default?

> When using switchdev mode, the representor ports handles the slow path
> traffic, the traffic that can't be offloaded will be redirected to the
> representor port for processing. Memory consumption of the representor
> port's rx buffer can grow to several GB when scaling to 1k VFs reps.
> For example, in mlx5 driver, each RQ, with a typical 1K descriptors,
> consumes 3MB of DMA memory for packet buffer in WQEs, and with four
> channels, it consumes 4 * 3MB * 1024 = 12GB of memory. And since rep
> ports are for slow path traffic, most of these rx DMA memory are idle.
> 
> Add shrdesc_mode configuration, allowing multiple representors
> to share a rx memory buffer pool. When enabled, individual representor
> doesn't need to allocate its dedicated rx buffer, but just pointing
> its rq to the memory pool. This could make the memory being better
> utilized. The shrdesc_count represents the number of rx ring
> entries, e.g., same meaning as ethtool -g, that's shared across other
> representors. Users adjust it based on how many reps, total system
> memory, or performance expectation.

Can we use bytes as the unit? Like the page pool. Descriptors don't
mean much to the user.
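
To make the unit question concrete, here is a rough sketch of the
conversion, using only the figures from the quoted commit message (3MB
per RQ of 1K descriptors, four channels, 1024 reps). The helper names
are made up for illustration, and "MB" is assumed to mean MiB; the real
per-descriptor size depends on the driver and MTU.

```python
# Illustrative arithmetic only; constants come from the quoted commit
# message, helper names are hypothetical and not part of any driver.
MIB = 1024 * 1024
RQ_BYTES = 3 * MIB        # DMA memory per RQ (assuming MB == MiB)
DESCS_PER_RQ = 1024       # "typical 1K descriptors"


def bytes_per_descriptor() -> int:
    """Average buffer bytes backing one rx descriptor."""
    return RQ_BYTES // DESCS_PER_RQ


def dedicated_bytes(reps: int, channels: int) -> int:
    """Memory used when every rep channel owns a private RQ."""
    return reps * channels * RQ_BYTES


def descriptors_for_budget(budget_bytes: int) -> int:
    """Descriptor count a driver could derive from a byte budget."""
    return budget_bytes // bytes_per_descriptor()


if __name__ == "__main__":
    print(dedicated_bytes(1024, 4) // (1024 ** 3), "GiB without sharing")
    print(descriptors_for_budget(64 * MIB), "descriptors in a 64 MiB pool")
```

With a byte-sized knob the user states a memory budget and the driver
derives the ring size, rather than the other way around.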

> The two params are also useful for other vendors such as Intel ICE
> drivers and Broadcom's driver, which also have representor ports for
> slow path traffic.
> 
> An example use case:
> $ devlink dev eswitch show pci/0000:08:00.0
>   pci/0000:08:00.0: mode legacy inline-mode none encap-mode basic \
>   shrdesc-mode none shrdesc-count 0
> $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev \
>   shrdesc-mode basic shrdesc-count 1024
> $ devlink dev eswitch show pci/0000:08:00.0
>   pci/0000:08:00.0: mode switchdev inline-mode none encap-mode basic \
>   shrdesc-mode basic shrdesc-count 1024
> 
> Note that new configurations are set at legacy mode, and enabled at
> switchdev mode.

>  Documentation/netlink/specs/devlink.yaml | 17 ++++++++++
>  include/net/devlink.h                    |  8 +++++
>  include/uapi/linux/devlink.h             |  7 ++++
>  net/devlink/dev.c                        | 43 ++++++++++++++++++++++++
>  net/devlink/netlink_gen.c                |  6 ++--
>  5 files changed, 79 insertions(+), 2 deletions(-)

ENODOCS

> diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
> index cf6eaa0da821..58f31d99b8b3 100644
> --- a/Documentation/netlink/specs/devlink.yaml
> +++ b/Documentation/netlink/specs/devlink.yaml
> @@ -119,6 +119,14 @@ definitions:
>          name: none
>        -
>          name: basic
> +  -
> +    type: enum
> +    name: eswitch-shrdesc-mode
> +    entries:
> +      -
> +        name: none
> +      -
> +        name: basic

Do we need this knob?
Can we not assume that shared-pool-count == 0 means disabled?
We can always add the knob later if needed, right now it's
just on / off with some less direct names.
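
A minimal sketch of the single-knob semantics I have in mind, assuming
only the two modes in this patch ("none" and "basic"). The function
names are hypothetical and only show that the mode carries no
information the count doesn't already carry.

```python
# Hypothetical helpers: fold the (mode, count) pair from this patch
# into a single count where 0 means "shared pool disabled".
def to_single_knob(mode: str, count: int) -> int:
    """Map the two-attribute config onto one count attribute."""
    if mode not in ("none", "basic"):
        raise ValueError(f"unknown mode: {mode}")
    if count < 0:
        raise ValueError("count must be non-negative")
    return 0 if mode == "none" else count


def shared_pool_enabled(count: int) -> bool:
    """With a single knob, any non-zero count enables sharing."""
    return count > 0


if __name__ == "__main__":
    for mode, count in (("none", 0), ("basic", 1024)):
        knob = to_single_knob(mode, count)
        print(mode, count, "->", knob, shared_pool_enabled(knob))
```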

>    -
>      type: enum
>      name: dpipe-header-id
> @@ -429,6 +437,13 @@ attribute-sets:
>          name: eswitch-encap-mode
>          type: u8
>          enum: eswitch-encap-mode
> +      -
> +        name: eswitch-shrdesc-mode
> +        type: u8

u32; netlink rounds attribute sizes up to 4B anyway
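
For reference, a small sketch of the size math, mirroring the kernel's
NLA_ALIGN() and nla_total_size() with a 4-byte attribute header and
4-byte alignment. The Python function names are just stand-ins for
those helpers.

```python
# Mirrors the kernel's netlink attribute size helpers.
NLA_ALIGNTO = 4   # attributes are padded to 4-byte boundaries
NLA_HDRLEN = 4    # struct nlattr: u16 nla_len + u16 nla_type


def nla_align(length: int) -> int:
    """Round length up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_total_size(payload: int) -> int:
    """Bytes an attribute with this payload occupies on the wire."""
    return nla_align(NLA_HDRLEN + payload)


if __name__ == "__main__":
    print("u8 :", nla_total_size(1))
    print("u32:", nla_total_size(4))
```

A u8 and a u32 attribute take the same space in the message, so the
narrower type saves nothing.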

> +        enum: eswitch-shrdesc-mode
> +      -
> +        name: eswitch-shrdesc-count
> +        type: u32
>        -
>          name: resource-list
>          type: nest
