From: Jakub Kicinski <kuba@kernel.org>
To: Andrew Lunn <andrew@lunn.ch>
Cc: netdev@vger.kernel.org, linux@armlinux.org.uk, olteanv@gmail.com,
	hkallweit1@gmail.com, f.fainelli@gmail.com, saeedm@nvidia.com,
	michael.chan@broadcom.com
Subject: Re: [RFC net-next] net: track locally triggered link loss
Date: Sat, 21 May 2022 11:26:46 -0700
Message-ID: <20220521112646.0d3c0a8a@kernel.org>
In-Reply-To: <Yoj11Kv55HX3k/Ou@lunn.ch>

On Sat, 21 May 2022 16:23:16 +0200 Andrew Lunn wrote:
> > For a system which wants to monitor link quality on the local end =>
> > i.e. whether physical hardware has to be replaced - differentiating
> > between (1) and (2) doesn't really matter, they are both non-events.  
> 
> Maybe data centres should learn something from the automotive world.
> It seems like most T1 PHYs have a signal quality value, which is
> exposed via netlink in the link info message. And it is non-invasive.

There were attempts at this (also on the PCIe side of the NIC)
but AFAIU there is no general standard for the measurement or the
quality metric, so it's hard to generalize.

> Many PHYs also have counters of receive errors, framing errors
> etc. These can be reported via ethtool --phy-statistics.

Ack, they are, I've added the APIs already and we use those.
Symbol errors during carrier and FEC corrected/uncorrected blocks.
Basic FCS errors, too.
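
As an illustration of consuming such counters from a monitoring script,
here is a minimal sketch that parses the "name: value" text printed by
ethtool -S and reports which counters grew between two snapshots. The
function names and the sample counter names are hypothetical; real
counter names differ from driver to driver.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines (as printed by `ethtool -S <dev>`) into a dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so counter names containing ':' survive.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Header lines such as "NIC statistics:" have no numeric value.
            continue
    return stats


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    # Hypothetical sample output; counter names are assumptions.
    first = "NIC statistics:\n     rx_fcs_errors: 3\n     rx_symbol_errors: 0\n"
    second = "NIC statistics:\n     rx_fcs_errors: 7\n     rx_symbol_errors: 0\n"
    print(counter_deltas(parse_ethtool_stats(first), parse_ethtool_stats(second)))
```

Comparing deltas rather than absolute values avoids alerting on errors
that accumulated long before monitoring started.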

IDK what the relative false-positive rates of the different sources of
information are, to be honest. The monitoring team asked me about
the link flaps and the situation in Linux is indeed less than ideal.
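
For context, a minimal sketch of how link flaps can be sampled today
from the per-netdev sysfs counters (carrier_changes, carrier_up_count,
carrier_down_count). The function name and the sysfs_root parameter are
assumptions made for illustration and testing. As far as I can tell
these counters do not say whether a transition was triggered locally
or by the link partner.

```python
from pathlib import Path

# Carrier transition counters exposed under /sys/class/net/<iface>/.
COUNTERS = ("carrier_changes", "carrier_up_count", "carrier_down_count")


def read_carrier_counters(iface, sysfs_root="/sys/class/net"):
    """Return the carrier counters for iface; None where a file is unreadable."""
    counters = {}
    for name in COUNTERS:
        path = Path(sysfs_root) / iface / name
        try:
            counters[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Older kernels or virtual devices may not expose every file.
            counters[name] = None
    return counters


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    print(read_carrier_counters("eth0"))
```

Sampling this periodically and diffing carrier_down_count gives a flap
rate per interval.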

> SFPs expose SNR in their module data, transmit and receive
> powers etc, via ethtool -m and hwmon.
> 
> There is also ethtool --cable-test. It is invasive, in that it
> requires the link to go down, but it should tell you about broken
> pairs. However, you probably know that already, a monitoring system
> which has not noticed the link dropping to 100Mbps so it only uses two
> pairs is not worth the money you paid for it.

Last hop in DC is all copper DACs. Not sure there's a standard
--cable-test for DACs :S

> Now, it seems like very few, if any, firmware driven Ethernet cards
> actually make use of these features. You need cards where Linux is
> actually driving the hardware. But these APIs are available for
> anybody to use. Don't data centre users have enough purchasing power
> that they can influence firmware/driver writers to actually use these
> APIs? And I think the results would be better than trying to count
> link up/down.

Let's separate new and old devices.

For new products customers can stipulate requirements and they usually
get implemented. I'd love to add more requirements for signal quality
and error reporting. It'd need to be based on standards because each
vendor cooking their own units does not scale. Please send pointers
my way!

Old products are a different ball game, and that's where we care about
basic info like link flaps. Vendors EOL a product and you're lucky to
get bug fixes. Servers live longer and longer and obviously age
correlates with failure rates so we need to monitor those devices.
