From: Saeed Mahameed <saeedm@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>,
Leon Romanovsky <leon@kernel.org>,
Linux regressions mailing list <regressions@lists.linux.dev>,
Saeed Mahameed <saeed@kernel.org>, Shay Drory <shayd@nvidia.com>,
netdev@vger.kernel.org, selinux@vger.kernel.org,
Tariq Toukan <tariqt@nvidia.com>
Subject: Re: Potential regression/bug in net/mlx5 driver
Date: Fri, 14 Apr 2023 21:40:35 -0700
Message-ID: <ZDoqw8x7+UHOTCyM@x130>
In-Reply-To: <20230414173445.0800b7cf@kernel.org>
On 14 Apr 17:34, Jakub Kicinski wrote:
>On Fri, 14 Apr 2023 15:20:01 -0700 Saeed Mahameed wrote:
>> >> Officially we test only 3 GA FWs back. The fact that mlx5 is a generic CX
>> >> driver makes it really hard to test all the possible combinations, so we
>> >> need to be strict about how far back we want to officially support and
>> >> test old generations.
>> >
>> >Would you be able to pull the datapoints for what 3 GA FWs means
>> >in case of CX4? Release number and date when it was released?
>>
>> https://network.nvidia.com/files/related-docs/eol/LCR-000821.pdf
>>
>> Since CX4 was EOL last year, it is going to be hard to find this info but
>> let me check my email archive..
>>
>> 12.28.2006  27-Sep-2020 - recommended version
>> 12.26.xxxx  12-Dec-2019
>> 12.24.1000  02-Dec-2018
>
>That's basically 3 years of support. Seems fairly reasonable.
>
>> >> Upgrade FW when possible, it is always easier than upgrading the kernel.
>> >> Anyway, this was a very rare FW/Arch bug. We should've exposed an
>> >> explicit cap for this new type of PF when we had the chance, now it's too
>> >> late since a proper fix will require FW and Driver upgrades and breaking
>> >> the current solution we have over other OSes as well.
>> >>
>> >> Yes I can craft an if condition to explicitly check for chip id and FW
>> >> version for this corner case, which has no precedent in mlx5, but I prefer
>> >> to ask to upgrade FW first, and if that's an acceptable solution, I would
>> >> like to keep the mlx5 clean and device agnostic as much as possible.
>> >
>> >IMO you either need a fully fleshed out FW update story, with advanced
>> >warnings for a few releases, distributing the FW via linux-firmware or
>> >fwupdmgr or such. Or deal with the corner cases in the driver :(
>>
>> Completely agree, I will start an internal discussion ..
>>
>> >We can get Paul to update, sure, but if he noticed so quickly the
>> >question remains how many people out in the wild will get affected
>> >and not know what the cause is?
>>
>> Right, I will make sure this is addressed and will let you know how we
>> will handle it. I will try to post a patch early next cycle, but I will
>> need to work with Arch and release managers on this, so it will take a
>> couple of weeks to formalize a proper solution.
>
>What do we do now, tho? If the main side effect of a revert is that
>users of a newfangled device with an order of magnitude lower
>deployment continue to see a warning/error in the logs - I'm leaning
>towards applying it :(
I tend to agree with you, but let me check with the FW architect what he has
to offer: either we provide a FW version check, or another, more accurate
FW cap test that could solve the issue for everyone. If I don't come up with
a solution by next Wednesday, I will repost your revert in my net PR that
day. You can mark it awaiting-upstream for now, if that works for
you.
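
As a rough illustration of the "FW version check" option mentioned above, here
is a minimal sketch in plain C. It is not mlx5 code: the struct, the helper
names, and the idea of a single minimum-version threshold are all hypothetical,
and 12.24.1000 is used below only because it is the oldest of the three GA
releases listed earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of a "major.minor.sub" FW version string. */
struct fw_version {
	uint16_t major;
	uint16_t minor;
	uint16_t sub;
};

/* Returns <0, 0 or >0 if a is older than, equal to, or newer than b. */
static int fw_version_cmp(struct fw_version a, struct fw_version b)
{
	if (a.major != b.major)
		return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor)
		return a.minor < b.minor ? -1 : 1;
	if (a.sub != b.sub)
		return a.sub < b.sub ? -1 : 1;
	return 0;
}

/*
 * Assumed policy for this sketch: FW older than min_fw takes the
 * fallback (pre-change) path, anything else takes the new path.
 */
static bool fw_needs_workaround(struct fw_version running,
				struct fw_version min_fw)
{
	return fw_version_cmp(running, min_fw) < 0;
}
```

With a threshold of 12.24.1000, a device running 12.23.x would take the
fallback path, while 12.28.2006 would not. A capability bit reported by the FW
would avoid hard-coding version numbers like this, which is why the cap test is
the other option being considered.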