From: Saeed Mahameed <saeedm@nvidia.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
ttoukan.linux@gmail.com, gal@nvidia.com, tariqt@nvidia.com,
leon@kernel.org, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org
Subject: Re: [PATCH v2 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors
Date: Fri, 15 Nov 2024 00:01:50 -0800
Message-ID: <Zzb_7hXRPgYMACb9@x130>
In-Reply-To: <CALOAHbBJ2xWKZ5frzR5wKq1D7-mzS62QkWpxB5Q-A7dR-Djhnw@mail.gmail.com>
On 15 Nov 13:50, Yafang Shao wrote:
>On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote:
>> > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
>> > > > - * Not recommended for use in drivers for high speed interfaces.
>> > >
>> > > I thought I suggested we provide clear guidance on this counter being
>> > > related to processing pipeline being too slow, vs host backpressure.
>> > > Just deleting the line that says "don't use" is not going to cut it :|
>> >
>> > Hello Jakub,
>> >
>> > After investigating other network drivers, I found that they all
>> > report this metric to rx_missed_errors:
>> >
>> > - i40e
>> > The corresponding ethtool metric is port.rx_discards, which was
>> > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
>> > rx_missed_errors for buffer exhaustion").
>> >
>> > - broadcom
>> > The equivalent metric is rx_total_discard_pkts, reported as
>> > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
>> > ethernet driver")
>> >
>> > Given this, it seems we should align with the standard practice and
>> > report this metric to rx_missed_errors.
>> >
>> > Tariq, what are your thoughts?
>>
>> mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very
>> different kind of drops than the drops reported as 'missed'.
>> The distinction is useful in production in my experience working with
>> mlx5 devices.
Yes, rx_missed_errors reports a lack of software descriptors; please don't
mix it with hardware pipeline FIFO discards.
FYI: mlx5 reports more pipeline-related discards, see below,
especially for per-PF/VF buffers. When these are advancing, they usually
indicate congestion control issues, for example pause frames being off.
rx_prio[p]_buf_discard
The number of packets discarded by device due to lack of per host receive buffers.
rx_prio[p]_cong_discard
The number of packets discarded by device due to per host congestion.
rx_prio[p]_discard (rx_discards_phy is the sum of all rx_prio[p]_discard)
The number of packets discarded by device due to lack of receive buffers.
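The relationship between the per-priority counters and the port-level total can be checked from userspace. The following is a minimal sketch, not an official tool: it assumes the counters appear in `ethtool -S`-style "name: value" lines under the names used above (the exact spelling on a given driver version may differ), and the helper name and sample text are hypothetical.

```python
import re

# Hypothetical sample in "ethtool -S" style; the values are made up for illustration.
SAMPLE = """NIC statistics:
     rx_discards_phy: 12
     rx_prio0_discard: 5
     rx_prio0_buf_discard: 7
     rx_prio3_discard: 7
"""

def sum_prio_discards(ethtool_text):
    """Return (sum of rx_prio[p]_discard, rx_discards_phy or None if absent)."""
    total = 0
    phy = None
    for line in ethtool_text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        name = name.strip()
        try:
            count = int(value)
        except ValueError:
            continue  # header lines such as "NIC statistics:" carry no number
        # Count only the plain per-priority discards, not the
        # _buf_discard / _cong_discard variants.
        if re.fullmatch(r"rx_prio\d+_discard", name):
            total += count
        elif name == "rx_discards_phy":
            phy = count
    return total, phy

if __name__ == "__main__":
    print(sum_prio_discards(SAMPLE))
```

On a live system the text would come from something like `ethtool -S <ifname>`; a mismatch between the two returned values would suggest the counter names assumed here do not match the driver in use.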
That being said, these are not errors, so reporting them via rx_xyz_error
is very misleading. rx_missed_errors is a special case, though, and let's
keep it that way.
>
>From the manual [0], it says :
>
>The number of received packets dropped due to lack of buffers on a
>physical port. If this counter is increasing, it implies that the
>adapter is congested and cannot absorb the traffic coming from the
>network.
>
>Would it be possible to add this description to if_link.h?
>
>Frankly, it doesn’t make much difference to end users like me whether
>this is reported to rx_missed_errors or rx_fifo_errors; the main goal
>is simply to monitor this metric to flag any issues...
>
Not rx_missed_errors, please; it is exclusively for a software lack of buffers.
Please have a look at ethtool_eth_XXX_stats IEEE ethnl_stats; if you need to
extend, this is the place.
RFC 2863 [1] defines this type of discard as ifInDiscards, so let's add
it to the ethtool std stats. mlx5 already reports most of them via the
driver's custom ethtool -S counters.
[1] https://datatracker.ietf.org/doc/html/rfc2863
- Saeed
>[0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters
>
>
>--
>Regards
>Yafang