netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: Huy Nguyen <huyn@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, Jiri Pirko <jiri@resnulli.us>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	Parav Pandit <parav@mellanox.com>
Subject: Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
Date: Tue, 22 May 2018 11:32:02 -0700	[thread overview]
Message-ID: <20180522113202.54c43440@cakuba> (raw)
In-Reply-To: <5b0d2137-8c70-ceb4-6965-fef2c75c4c24@mellanox.com>

On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
> > On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:  
> >> From: Huy Nguyen <huyn@mellanox.com>
> >>
> >> In this patch, we add dcbnl buffer attribute to allow user
> >> change the NIC's buffer configuration such as priority
> >> to buffer mapping and buffer size of individual buffer.
> >>
> >> This attribute combined with pfc attribute allows advance user to
> >> fine tune the qos setting for specific priority queue. For example,
> >> user can give dedicated buffer for one or more prirorities or user
> >> can give large buffer to certain priorities.
> >>
> >> We present an use case scenario where dcbnl buffer attribute configured
> >> by advance user helps reduce the latency of messages of different sizes.
> >>
> >> Scenarios description:
> >> On ConnectX-5, we run latency sensitive traffic with
> >> small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
> >> traffic with large messages sizes 512KB and 1MB. We group small, medium,
> >> and large message sizes to their own pfc enables priorities as follow.
> >>    Priorities 1 & 2 (64B, 256B and 1KB)
> >>    Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
> >>    Priorities 5 & 6 (512KB and 1MB)
> >>
> >> By default, ConnectX-5 maps all pfc enabled priorities to a single
> >> lossless fixed buffer size of 50% of total available buffer space. The
> >> other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
> >> we create three equal size lossless buffers. Each buffer has 25% of total
> >> available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
> >> to lossless  buffer mappings are set as follow.
> >>    Priorities 1 & 2 on lossless buffer #1
> >>    Priorities 3 & 4 on lossless buffer #2
> >>    Priorities 5 & 6 on lossless buffer #3
> >>
> >> We observe improvements in latency for small and medium message sizes
> >> as follows. Please note that the large message sizes bandwidth performance is
> >> reduced but the total bandwidth remains the same.
> >>    256B message size (42 % latency reduction)
> >>    4K message size (21% latency reduction)
> >>    64K message size (16% latency reduction)
> >>
> >> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
> >> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>  
> > On a cursory look this bares a lot of resemblance to devlink shared
> > buffer configuration ABI.  Did you look into using that?
> >
> > Just to be clear devlink shared buffer ABIs don't require representors
> > and "switchdev mode".
> > .  
> [HQN] Dear Jakub, there are several reasons that devlink shared buffer 
> ABI cannot be used:
> 1. The devlink shared buffer ABI is written based on the switch cli 
> which you can find out more
> from this link https://community.mellanox.com/docs/DOC-2558.

Devlink API accommodates requirements of simpler (SwitchX2?) and more
advanced schemes (present in Spectrum).  The simpler/basic static
threshold configurations is exactly what you are doing here, AFAIU.

> 2. The dcbnl interfaces have been used for QoS settings.

QoS settings != shared buffer configuration.

> In NIC, the  buffer configuration are tied to priority (ETS PFC).

Some customers use DCB, a lot (most?) of them don't.  I don't think the
"this is a logical extension of a commonly used API" really stands here.

> The buffer configuration are not tied to port like switch.

It's tied to a port and TCs, you just have one port but still have 8
TCs exactly like a switch...

> 3. Shared buffer, alpha, threshold are switch specific terms.

IDK how talking about alpha is relevant, it's just one threshold type
the API supports.  As far as shared buffer and threshold I don't know
if these are switch terms (or how "switch" differs from "NIC" at that
level) - I personally find carving shared buffer into pools very
intuitive.

Could you give examples of commands/configs one can use with your new
ABI?  How does one query the total size of the buffer to be carved?

  reply	other threads:[~2018-05-22 18:32 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
2018-05-22  5:20   ` Jakub Kicinski
2018-05-22 15:36     ` Huy Nguyen
2018-05-22 18:32       ` Jakub Kicinski [this message]
2018-05-23  1:01         ` Huy Nguyen
2018-05-23  6:15           ` Or Gerlitz
2018-05-23  9:23           ` Jakub Kicinski
2018-05-23  9:33             ` Jiri Pirko
2018-05-23 15:08             ` Huy Nguyen
2018-05-23 15:27             ` Huy Nguyen
2018-05-24 17:13             ` Ido Schimmel
2018-05-23  9:43     ` Jiri Pirko
2018-05-23 13:52       ` John Fastabend
2018-05-23 15:37         ` Huy Nguyen
2018-05-23 16:03           ` John Fastabend
2018-05-23 20:28             ` Jakub Kicinski
2018-05-24 14:37             ` Huy Nguyen
2018-05-23 20:13         ` Jakub Kicinski
2018-05-23 20:19   ` Jakub Kicinski
2018-05-24 14:11     ` Huy Nguyen
2018-05-21 21:04 ` [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
2018-05-22 10:19   ` Or Gerlitz
2018-05-22 10:21     ` Or Gerlitz
2018-05-22 16:01       ` Saeed Mahameed
2018-05-24 21:21         ` Or Gerlitz
2018-05-24 21:28           ` Saeed Mahameed
2018-05-21 21:04 ` [net-next 3/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Saeed Mahameed
2018-05-21 21:05 ` [net-next 4/6] net/mlx5e: PPTB and PBMC register firmware command support Saeed Mahameed
2018-05-21 21:05 ` [net-next 5/6] net/mlx5e: Receive buffer configuration Saeed Mahameed
2018-05-21 21:05 ` [net-next 6/6] net/mlx5e: Receive buffer support for DCBX Saeed Mahameed
2018-05-22 19:38 ` [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180522113202.54c43440@cakuba \
    --to=jakub.kicinski@netronome.com \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=huyn@mellanox.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=parav@mellanox.com \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).