From: Jakub Kicinski <kuba@kernel.org>
To: Edward Cree <ecree.xilinx@gmail.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, michael.chan@broadcom.com
Subject: Re: [PATCH net-next 01/11] net: ethtool: let drivers remove lost RSS contexts
Date: Wed, 3 Jul 2024 06:43:47 -0700
Message-ID: <20240703064347.1929a75b@kernel.org>
In-Reply-To: <c22f9b2b-cbcd-d77a-2a9a-cf62c2af8882@gmail.com>

On Wed, 3 Jul 2024 12:08:36 +0100 Edward Cree wrote:
> On 03/07/2024 00:47, Jakub Kicinski wrote:
> > RSS contexts may get lost from a device, in various extreme circumstances.
> > Specifically if the firmware leaks resources and resets, or crashes and
> > either recovers in partially working state or the crash causes a
> > different FW version to run - creating the context again may fail.
>
> So, I deliberately *didn't* do this, on the grounds that if the user
> fixed things by updating FW and resetting again, their contexts could
> get restored. I suppose big users like Meta will have orchestration
> doing all that work anyway so it doesn't matter.

"We" don't reset FW while workload is running. I'm speculating why bnxt
may lose the contexts. From my perspective if contexts get lost the
machine should get taken out of production and at least power cycled.

> > Drivers should do their absolute best to prevent this from happening.
> > When it does, however, telling user that a context exists, when it can't
> > possibly be used any more is counter productive. Add a helper for
> > drivers to discard contexts. Print an error, in the future netlink
> > notification will also be sent.
>
> Possibility of a netlink notification makes the idea of a broken flag
> a bit more workable imho. But it's up to you which way to go.

Oh, have we talked about this? Now that you mention the broken flag
I recall talking about devlink health reporter.. a while back.
I don't have a preference on how we deal with the lost contexts.
The more obvious we make it to orchestration that the machine is broken
the better. Can you point me to the discussion / describe the broken
flag?