From: Saeed Mahameed <saeedm@nvidia.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	ttoukan.linux@gmail.com, gal@nvidia.com, tariqt@nvidia.com,
	leon@kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH v2 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors
Date: Fri, 15 Nov 2024 00:01:50 -0800
Message-ID: <Zzb_7hXRPgYMACb9@x130>
In-Reply-To: <CALOAHbBJ2xWKZ5frzR5wKq1D7-mzS62QkWpxB5Q-A7dR-Djhnw@mail.gmail.com>

On 15 Nov 13:50, Yafang Shao wrote:
>On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote:
>> > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
>> > > > - *   Not recommended for use in drivers for high speed interfaces.
>> > >
>> > > I thought I suggested we provide clear guidance on this counter being
>> > > related to processing pipeline being too slow, vs host backpressure.
>> > > Just deleting the line that says "don't use" is not going to cut it :|
>> >
>> > Hello Jakub,
>> >
>> > After investigating other network drivers, I found that they all
>> > report this metric to rx_missed_errors:
>> >
>> > - i40e
>> >   The corresponding ethtool metric is port.rx_discards, which was
>> > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
>> > rx_missed_errors for buffer exhaustion").
>> >
>> > - broadcom
>> >   The equivalent metric is rx_total_discard_pkts, reported as
>> > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
>> > ethernet driver")
>> >
>> > Given this, it seems we should align with the standard practice and
>> > report this metric to rx_missed_errors.
>> >
>> > Tariq, what are your thoughts?
>>
>> mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very
>> different kind of drops than the drops reported as 'missed'.
>> The distinction is useful in production in my experience working with
>> mlx5 devices.

Yes, rx_missed_errors means a lack of software descriptors; please don't mix
it with hardware pipeline FIFO discards.

FYI: mlx5 reports more pipeline-related discards, see below,
especially for per PF/VF buffers. When these are advancing, they usually
indicate congestion control issues, for example pause frames being off.


rx_prio[p]_buf_discard	
The number of packets discarded by device due to lack of per host receive buffers.

rx_prio[p]_cong_discard	
The number of packets discarded by device due to per host congestion.

rx_prio[p]_discard (rx_discards_phy is the sum of all rx_prio[p]_discard)
The number of packets discarded by device due to lack of receive buffers.
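
As a rough illustration of the relationship above, here is a minimal Python
sketch that parses `ethtool -S <dev>` style output, sums the per-priority
rx_prio[p]_discard counters, and reports which counters advanced between two
snapshots. The helper names and the sample values are made up for
illustration, and the exact counter spelling (with or without a trailing "s")
is an assumption that may differ between driver versions.

```python
import re

# Matches per-priority discard counters such as "rx_prio3_discard".
# The exact spelling is an assumption; the pattern also accepts a
# trailing "s" in case a driver version spells it that way.
PRIO_DISCARD = re.compile(r"^rx_prio(\d+)_discards?$")


def parse_ethtool_stats(text):
    """Parse `ethtool -S <dev>` style "name: value" lines into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip headers such as "NIC statistics:" and non-numeric values.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def sum_prio_discards(stats):
    """Sum rx_prio[p]_discard over all priorities p."""
    return sum(v for k, v in stats.items() if PRIO_DISCARD.match(k))


def advancing(before, after, names):
    """Return {name: delta} for counters that grew between two snapshots."""
    deltas = {}
    for name in names:
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


# Made-up sample values for illustration only, not real measurements.
SAMPLE_T0 = """NIC statistics:
     rx_discards_phy: 30
     rx_prio0_discard: 10
     rx_prio1_discard: 20
     rx_prio0_buf_discard: 4
     rx_prio0_cong_discard: 0
"""
SAMPLE_T1 = """NIC statistics:
     rx_discards_phy: 45
     rx_prio0_discard: 25
     rx_prio1_discard: 20
     rx_prio0_buf_discard: 4
     rx_prio0_cong_discard: 7
"""

if __name__ == "__main__":
    t0 = parse_ethtool_stats(SAMPLE_T0)
    t1 = parse_ethtool_stats(SAMPLE_T1)
    print("sum of rx_prio[p]_discard:", sum_prio_discards(t1))
    print("rx_discards_phy:", t1["rx_discards_phy"])
    watched = ["rx_discards_phy", "rx_prio0_buf_discard",
               "rx_prio0_cong_discard"]
    print("advancing:", advancing(t0, t1, watched))
```

On a real system the two snapshots would come from running `ethtool -S` on
the interface twice, some seconds apart, rather than from the embedded
sample strings.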

That being said, these are not errors, so reporting them via rx_xyz_error
is very misleading. rx_missed_errors is a special case, though, so let's
keep it that way.

>
>From the manual [0], it says :
>
>The number of received packets dropped due to lack of buffers on a
>physical port. If this counter is increasing, it implies that the
>adapter is congested and cannot absorb the traffic coming from the
>network.
>
>Would it be possible to add this description to if_link.h?
>
>Frankly, it doesn’t make much difference to end users like me whether
>this is reported to rx_missed_errors or rx_fifo_errors; the main goal
>is simply to monitor this metric to flag any issues...
>

Not rx_missed_errors, please; it is exclusively for a software-side lack of
buffers.

Please have a look at the ethtool_eth_XXX_stats IEEE ethnl_stats; if you need
to extend, this is the place.

RFC 2863 [1] defines this type of discard as ifInDiscards, so let's add
it to the ethtool std stats. mlx5 already reports most of them via the
driver-custom ethtool -S counters.

[1] https://datatracker.ietf.org/doc/html/rfc2863

- Saeed

>[0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters
>
>
>--
>Regards
>Yafang
