public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@idosch.org>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eran Ben Elisha <eranbe@mellanox.com>,
	David Miller <davem@davemloft.net>,
	saeedm@mellanox.com, netdev@vger.kernel.org, jiri@mellanox.com,
	alexander.duyck@gmail.com, helgaas@kernel.org
Subject: Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
Date: Fri, 3 Aug 2018 19:41:50 +0300	[thread overview]
Message-ID: <20180803164150.GA21806@splinter.mtl.com> (raw)
In-Reply-To: <20180802155315.48aaab19@cakuba.netronome.com>

On Thu, Aug 02, 2018 at 03:53:15PM -0700, Jakub Kicinski wrote:
> No one is requesting full RED offload here..  if someone sets the
> parameters you can't support you simply won't offload them.  And ignore
> the parameters which only make sense in software terms.  Look at the
> docs for mlxsw:
> 
> https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> 
> It says "not offloaded" in a number of places.
> 
...
> It's generally preferable to implement a subset of exiting well defined
> API than create vendor knobs, hence hardly a misuse.

Sorry for derailing the discussion, but you mentioned some points that
have been bothering me for a while.

I think we didn't do a very good job with buffer management and this is
exactly why you see some parameters marked as "not offloaded". Take the
"limit" (queue size) for example. It's configured via devlink-sb, by
setting a quota on the number of bytes that can be queued for the port
and TC (queue) that RED manages. See:

https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#pool-binding

It would have been much better and user friendly to not ignore this
parameter and have users configure the limit using existing interfaces
(tc), instead of creating a discrepancy between the software and
hardware data paths by configuring the hardware directly via devlink-sb.

I believe devlink-sb is mainly the result of Linux's short comings in
this area and our lack of perspective back then. While the qdisc layer
(Linux's shared buffers) works for end hosts, it requires enhancements
(mainly on ingress) for switches (physical/virtual) that forward
packets.

For example, switches (I'm familiar with Mellanox ASICs, but I assume
the concept is similar in other ASICs) have ingress buffers where
packets are stored while going through the pipeline. Once out of the
pipeline you know from which port and queue the packet should egress. In
case you have both lossless and lossy traffic in your network you
probably want to classify it into different ingress buffers and mark the
buffers where the lossless traffic is stored as such, so that PFC frames
would be emitted above a certain threshold.

This is currently configured using dcbnl, but it lacks a software model
which means that packets that are forwarded by the kernel don't get the
same treatment (e.g., skb priority isn't set). It also means that when
you want to limit the number of packets that are queued *from* a certain
port and ingress buffer you resort to tools such as devlink-sb that end
up colliding with existing tools (tc).

I was thinking (not too much...) about modelling the above using ingress
qdiscs. They don't do any queueing, but more of accounting. Once the
egress qdisc dequeues the packet, you give credit back to the ingress
qdisc from which the packet came from. I believe that modelling these
buffers using the qdisc layer is the right abstraction.

Would appreciate hearing your thoughts on the above.

  reply	other threads:[~2018-08-03 18:38 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-01 21:52 [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Saeed Mahameed
2018-08-01 21:52 ` [net-next 01/10] devlink: Fix param set handling for string type Saeed Mahameed
2018-08-01 22:33   ` Jakub Kicinski
2018-08-01 21:52 ` [net-next 02/10] devlink: Fix param cmode driverinit " Saeed Mahameed
2018-08-01 21:52 ` [net-next 03/10] devlink: Add helper function for safely copy string param Saeed Mahameed
2018-08-01 21:52 ` [net-next 04/10] devlink: Add extack messages support to param set Saeed Mahameed
2018-08-01 21:52 ` [net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c Saeed Mahameed
2018-08-01 21:52 ` [net-next 06/10] net/mlx5: Add MPEGC register configuration functionality Saeed Mahameed
2018-08-01 21:52 ` [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink Saeed Mahameed
2018-08-01 22:18   ` Alexander Duyck
2018-08-01 21:52 ` [net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway Saeed Mahameed
2018-08-01 21:52 ` [net-next 09/10] net/mlx5: Add Crdump FW snapshot support Saeed Mahameed
2018-08-01 21:52 ` [net-next 10/10] net/mlx5: Use devlink region_snapshot parameter Saeed Mahameed
2018-08-01 22:34 ` [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31 Alexander Duyck
2018-08-01 23:13   ` Saeed Mahameed
2018-08-02  0:36     ` Alexander Duyck
     [not found]       ` <2d84340e-0703-0bc7-4917-3b18979b2aa5@mellanox.com>
2018-08-29 15:42         ` Alex Vesker
2018-08-29 17:04           ` Alexander Duyck
     [not found]             ` <5206dd74-432d-3342-2a48-3cdd1be8b5cb@mellanox.com>
2018-08-30 15:39               ` Alexander Duyck
2018-08-02  6:15     ` Jiri Pirko
2018-08-02  0:00 ` Jakub Kicinski
2018-08-02  1:40   ` David Miller
2018-08-02  8:29     ` Petr Machata
2018-08-02 17:11       ` Jakub Kicinski
2018-08-02 18:04         ` David Miller
2018-08-02 20:10           ` Petr Machata
2018-08-02 15:07     ` Eran Ben Elisha
2018-08-02 22:53       ` Jakub Kicinski
2018-08-03 16:41         ` Ido Schimmel [this message]
2018-08-04  4:59           ` Jakub Kicinski
2018-08-06 13:01         ` Eran Ben Elisha
2018-08-07  0:49           ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180803164150.GA21806@splinter.mtl.com \
    --to=idosch@idosch.org \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=eranbe@mellanox.com \
    --cc=helgaas@kernel.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox