From: Petr Machata <petrm@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	<netdev@vger.kernel.org>
Cc: Ido Schimmel <idosch@nvidia.com>, Petr Machata <petrm@nvidia.com>,
	"David Ahern" <dsahern@kernel.org>, <mlxsw@nvidia.com>
Subject: [PATCH net-next 0/7] Support for nexthop group statistics
Date: Tue, 27 Feb 2024 19:17:25 +0100
Message-ID: <cover.1709057158.git.petrm@nvidia.com>

ECMP is a fundamental component in L3 designs. However, it's fragile. Many
factors influence whether an ECMP group will operate as intended: hash
policy (i.e. the set of fields that contribute to ECMP hash calculation),
neighbor validity, hash seed (which might lead to polarization) or the type
of ECMP group used (hash-threshold or resilient).

At the same time, collecting statistics that would help an operator
determine whether the group performs as desired is difficult.

A solution that we present in this patchset is to add counters to next hop
group entries. For SW-datapath deployments, this will on its own allow
collection and evaluation of relevant statistics. For HW-datapath
deployments, we further add a way to request that HW counters be installed
for a given group, in-kernel interfaces to collect the HW statistics, and
netlink interfaces to query them.

For example:

    # ip nexthop replace id 4000 group 4001/4002 hw_stats on

    # ip -s -d nexthop show id 4000
    id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on
      stats:
        id 4001 packets 5002 packets_hw 5000
        id 4002 packets 4999 packets_hw 4999

The point of the patchset is visibility of ECMP balance, and that is
influenced by packet headers, not their payload. Correspondingly, we only
include packet counters in the statistics, not byte counters.
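
As a rough illustration of how the per-entry packet counters could be used
to judge ECMP balance, the sketch below parses output shaped like the
example above and computes each group entry's share of packets. It is not
part of the patchset. The helper names (parse_group_stats, packet_shares)
are hypothetical, the output format is assumed to match the example
exactly, and treating packets_hw as optional is an assumption.

```python
import re

# Sample output copied from the example in this cover letter.
SAMPLE = """\
id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on
  stats:
    id 4001 packets 5002 packets_hw 5000
    id 4002 packets 4999 packets_hw 4999
"""

# One per-entry stats line. packets_hw is treated as optional (assumption:
# it may be absent when HW stats are not enabled for the group).
ENTRY_RE = re.compile(r"^\s*id (\d+) packets (\d+)(?: packets_hw (\d+))?\s*$")


def parse_group_stats(text):
    """Return {nexthop_id: {"packets": int, "packets_hw": int or None}}."""
    stats = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if not match:
            continue  # skips the group header line and the "stats:" line
        nh_id, packets, packets_hw = match.groups()
        stats[int(nh_id)] = {
            "packets": int(packets),
            "packets_hw": None if packets_hw is None else int(packets_hw),
        }
    return stats


def packet_shares(stats):
    """Return each entry's fraction of the group's total packet count."""
    total = sum(entry["packets"] for entry in stats.values())
    if total == 0:
        return {nh_id: 0.0 for nh_id in stats}
    return {nh_id: entry["packets"] / total for nh_id, entry in stats.items()}


if __name__ == "__main__":
    for nh_id, share in packet_shares(parse_group_stats(SAMPLE)).items():
        print(f"nexthop {nh_id}: {share:.2%} of packets")
```

For a two-way group with equal weights, shares far from 50% each would
hint at a hash policy or polarization problem worth investigating.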

We also decided to model HW statistics as a nexthop group attribute, not an
arbitrary nexthop one. The latter would count any traffic going through a
given nexthop, regardless of which ECMP group it is in, or whether it is in
any group at all. The reason is again that the point of the patchset is
ECMP balance visibility, not arbitrary inspection of how busy a particular
nexthop is. Implementation of individual-nexthop statistics is certainly
possible, and could well follow the general approach we are taking in this
patchset. For resilient groups, per-bucket statistics could be done in a
similar manner as well.

This patchset contains the core code. mlxsw support will be sent in a
follow-up patchset.

This patchset progresses as follows:

- Patches #1 and #2 add support for a new next-hop object attribute,
  NHA_OP_FLAGS. That is meant to carry various op-specific signaling, in
  particular whether SW- and HW-collected nexthop stats should be part of
  the get or dump response. The idea is to avoid wasting message space, and
  time for collection of HW statistics, when the values are not needed.

- Patches #3 and #4 add SW-datapath stats and corresponding UAPI.

- Patches #5, #6 and #7 add support for HW-datapath stats and UAPI.
  Individual drivers still need to contribute the appropriate HW-specific
  support code.

Ido Schimmel (5):
  net: nexthop: Add nexthop group entry stats
  net: nexthop: Expose nexthop group stats to user space
  net: nexthop: Add hardware statistics notifications
  net: nexthop: Add ability to enable / disable hardware statistics
  net: nexthop: Expose nexthop group HW stats to user space

Petr Machata (2):
  net: nexthop: Adjust netlink policy parsing for a new attribute
  net: nexthop: Add NHA_OP_FLAGS

 include/net/nexthop.h        |  29 ++++
 include/uapi/linux/nexthop.h |  47 ++++++
 net/ipv4/nexthop.c           | 314 ++++++++++++++++++++++++++++++-----
 3 files changed, 350 insertions(+), 40 deletions(-)

-- 
2.43.0


Thread overview: 22+ messages
2024-02-27 18:17 Petr Machata [this message]
2024-02-27 18:17 ` [PATCH net-next 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute Petr Machata
2024-02-27 18:17 ` [PATCH net-next 2/7] net: nexthop: Add NHA_OP_FLAGS Petr Machata
2024-02-28  3:34   ` Jakub Kicinski
2024-02-28 10:50     ` Petr Machata
2024-02-28 14:48       ` Jakub Kicinski
2024-02-28 15:16         ` Jakub Kicinski
2024-02-28 15:58           ` Petr Machata
2024-02-28 16:42             ` Jakub Kicinski
2024-02-29 14:03               ` Petr Machata
2024-02-27 18:17 ` [PATCH net-next 3/7] net: nexthop: Add nexthop group entry stats Petr Machata
2024-02-28 14:30   ` Simon Horman
2024-02-28 15:57     ` Petr Machata
2024-02-27 18:17 ` [PATCH net-next 4/7] net: nexthop: Expose nexthop group stats to user space Petr Machata
2024-02-28  3:35   ` Jakub Kicinski
2024-02-28 11:24     ` Petr Machata
2024-02-27 18:17 ` [PATCH net-next 5/7] net: nexthop: Add hardware statistics notifications Petr Machata
2024-02-27 18:17 ` [PATCH net-next 6/7] net: nexthop: Add ability to enable / disable hardware statistics Petr Machata
2024-02-27 18:17 ` [PATCH net-next 7/7] net: nexthop: Expose nexthop group HW stats to user space Petr Machata
2024-02-28  3:39   ` Jakub Kicinski
2024-02-28 17:16     ` Petr Machata
2024-02-28  3:56   ` Kees Cook
