* [PATCH 0/1] net/mlx5: query_mcia_reg fail logging at debug severity @ 2025-10-28 19:40 Matthew W Carlis 2025-10-28 19:40 ` [PATCH 1/1] " Matthew W Carlis 2025-10-28 20:27 ` [PATCH 0/1] " Gal Pressman 0 siblings, 2 replies; 13+ messages in thread From: Matthew W Carlis @ 2025-10-28 19:40 UTC (permalink / raw) To: netdev; +Cc: saeedm, tariqt, mbloch, ashishk, msaggi, adailey, Matthew W Carlis The commit message pretty much has most of my thoughts on this one. We see this message generated all the time because users are trying to determine if there is an SFP module installed in a port or when trying to get module info from a port that doesn't have an SFP module. This is not really an error at all & therefore the message is undesirable. Downgrading the log severity for this message to dbg allows users to control it via dyndbg & therefore "up to the user to decide" if they care about it. Matthew W Carlis (1): net/mlx5: query_mcia_reg fail logging at debug severity drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- 2.46.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 19:40 [PATCH 0/1] net/mlx5: query_mcia_reg fail logging at debug severity Matthew W Carlis @ 2025-10-28 19:40 ` Matthew W Carlis 2025-10-28 20:28 ` Gal Pressman 2025-10-28 22:43 ` Jakub Kicinski 2025-10-28 20:27 ` [PATCH 0/1] " Gal Pressman 1 sibling, 2 replies; 13+ messages in thread From: Matthew W Carlis @ 2025-10-28 19:40 UTC (permalink / raw) To: netdev; +Cc: saeedm, tariqt, mbloch, ashishk, msaggi, adailey, Matthew W Carlis Whenever a user or automation runs ethtool -m <eth> or an equivalent to mlx5 device & there is not any SFP module in that device the kernel log is spammed with ""query_mcia_reg failed: status:" which is really not that informative to the user who already knows that their command failed. Since the severity is logged at error severity the log message cannot be disabled via dyndbg etc... Signed-off-by: Matthew W Carlis <mattc@purestorage.com> 100.0% drivers/net/ethernet/mellanox/mlx5/core/ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index aa9f2b0a77d3..e1c93a96e479 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -310,7 +310,7 @@ static int mlx5_query_module_id(struct mlx5_core_dev *dev, int module_num, status = MLX5_GET(mcia_reg, out, status); if (status) { - mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n", + mlx5_core_dbg(dev, "query_mcia_reg failed: status: 0x%x\n", status); return -EIO; } @@ -394,7 +394,7 @@ static int mlx5_query_mcia(struct mlx5_core_dev *dev, status = MLX5_GET(mcia_reg, out, status); if (status) { - mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n", + mlx5_core_dbg(dev, "query_mcia_reg failed: status: 0x%x\n", status); return -EIO; } -- 2.46.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 19:40 ` [PATCH 1/1] " Matthew W Carlis @ 2025-10-28 20:28 ` Gal Pressman 2025-10-28 21:48 ` Matthew W Carlis 2025-10-28 22:43 ` Jakub Kicinski 1 sibling, 1 reply; 13+ messages in thread From: Gal Pressman @ 2025-10-28 20:28 UTC (permalink / raw) To: Matthew W Carlis, netdev; +Cc: saeedm, tariqt, mbloch, ashishk, msaggi, adailey On 28/10/2025 21:40, Matthew W Carlis wrote: > Whenever a user or automation runs ethtool -m <eth> or an equivalent > to mlx5 device & there is not any SFP module in that device the > kernel log is spammed with ""query_mcia_reg failed: status:" which > is really not that informative to the user who already knows that > their command failed. Assuming the user knows why the command failed is a big assumption, most users don't. These status errors are indicative of more than one type of error. And if he knows, I would expect him to not run the command again? BTW, we have a bug that returns success in case of an error in module EEPROM query so the user doesn't even know that the command failed other than seeing the error message. We'll push a fix for that in the next few days. Since the severity is logged at error severity > the log message cannot be disabled via dyndbg etc... > > Signed-off-by: Matthew W Carlis <mattc@purestorage.com> > > 100.0% drivers/net/ethernet/mellanox/mlx5/core/ > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c > index aa9f2b0a77d3..e1c93a96e479 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c > @@ -310,7 +310,7 @@ static int mlx5_query_module_id(struct mlx5_core_dev *dev, int module_num, > > status = MLX5_GET(mcia_reg, out, status); > if (status) { > - mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n", > + mlx5_core_dbg(dev, "query_mcia_reg failed: status: 0x%x\n", > status); > return -EIO; > } > @@ -394,7 +394,7 @@ static int mlx5_query_mcia(struct mlx5_core_dev *dev, > > status = MLX5_GET(mcia_reg, out, status); > if (status) { > - mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n", > + mlx5_core_dbg(dev, "query_mcia_reg failed: status: 0x%x\n", > status); > return -EIO; > } Both of these functions are called from mlx5e_get_module_eeprom() which has its own error print regardless, so this doesn't really prevent the "spam". ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 20:28 ` Gal Pressman @ 2025-10-28 21:48 ` Matthew W Carlis 2025-10-29 7:22 ` Gal Pressman 0 siblings, 1 reply; 13+ messages in thread From: Matthew W Carlis @ 2025-10-28 21:48 UTC (permalink / raw) To: gal; +Cc: adailey, ashishk, mattc, mbloch, msaggi, netdev, saeedm, tariqt Tue, 28 Oct 2025 22:27:39, Gal Pressman wrote: > And if he knows, I would expect him to not run the command again? Sometimes a user is a script or an inventory automation tool. > It is an error, as evident by the fact that you only changed the log > level, not the error return value. I don't know of any strict convention in terms of when error return codes should have associated log messages. I wonder if there is something more targeted that could be done. For example, if there is a "physical presence" mechanism & a module is not present simply skip the logging. Cheers! - Matt ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 21:48 ` Matthew W Carlis @ 2025-10-29 7:22 ` Gal Pressman 0 siblings, 0 replies; 13+ messages in thread From: Gal Pressman @ 2025-10-29 7:22 UTC (permalink / raw) To: Matthew W Carlis; +Cc: adailey, ashishk, mbloch, msaggi, netdev, saeedm, tariqt On 28/10/2025 23:48, Matthew W Carlis wrote: > Tue, 28 Oct 2025 22:27:39, Gal Pressman wrote: >> And if he knows, I would expect him to not run the command again? > > Sometimes a user is a script or an inventory automation tool. > >> It is an error, as evident by the fact that you only changed the log >> level, not the error return value. > > I don't know of any strict convention in terms of when error return codes > should have associated log messages. I wonder if there is something more targeted > that could be done. For example, if there is a "physical presence" mechanism > & a module is not present simply skip the logging. Maybe the problem is that because we returned zero on error the script would wrongly continue to try the queries? Does it stop when an error is detected? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 19:40 ` [PATCH 1/1] " Matthew W Carlis 2025-10-28 20:28 ` Gal Pressman @ 2025-10-28 22:43 ` Jakub Kicinski 2025-10-29 7:33 ` Gal Pressman 1 sibling, 1 reply; 13+ messages in thread From: Jakub Kicinski @ 2025-10-28 22:43 UTC (permalink / raw) To: Matthew W Carlis; +Cc: netdev, saeedm, tariqt, mbloch, ashishk, msaggi, adailey On Tue, 28 Oct 2025 13:40:11 -0600 Matthew W Carlis wrote: > Whenever a user or automation runs ethtool -m <eth> or an equivalent > to mlx5 device & there is not any SFP module in that device the > kernel log is spammed with ""query_mcia_reg failed: status:" which > is really not that informative to the user who already knows that > their command failed. Since the severity is logged at error severity > the log message cannot be disabled via dyndbg etc... +1 from me FWIW. I wonder if we're hitting this on the same class of systems but recently this started hitting at Meta as well. Millions of log entries a day because some ports don't have an SFP plugged in :| ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 22:43 ` Jakub Kicinski @ 2025-10-29 7:33 ` Gal Pressman 2025-10-29 16:49 ` Matthew W Carlis 0 siblings, 1 reply; 13+ messages in thread From: Gal Pressman @ 2025-10-29 7:33 UTC (permalink / raw) To: Jakub Kicinski, Matthew W Carlis Cc: netdev, saeedm, tariqt, mbloch, ashishk, msaggi, adailey On 29/10/2025 0:43, Jakub Kicinski wrote: > On Tue, 28 Oct 2025 13:40:11 -0600 Matthew W Carlis wrote: >> Whenever a user or automation runs ethtool -m <eth> or an equivalent >> to mlx5 device & there is not any SFP module in that device the >> kernel log is spammed with ""query_mcia_reg failed: status:" which >> is really not that informative to the user who already knows that >> their command failed. Since the severity is logged at error severity >> the log message cannot be disabled via dyndbg etc... > > +1 from me FWIW. I wonder if we're hitting this on the same class > of systems but recently this started hitting at Meta as well. > Millions of log entries a day because some ports don't have an SFP > plugged in :| > Allow me to split the discussion to two questions: 1. Is this an error? 2. Should it be logged? Do we agree that the answer to #1 is yes? For #2, I think it should, but we can probably improve the situation with extack instead of a print. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-29 7:33 ` Gal Pressman @ 2025-10-29 16:49 ` Matthew W Carlis 2025-10-29 23:33 ` Jakub Kicinski 0 siblings, 1 reply; 13+ messages in thread From: Matthew W Carlis @ 2025-10-29 16:49 UTC (permalink / raw) To: gal; +Cc: adailey, ashishk, kuba, mattc, mbloch, msaggi, netdev, saeedm, tariqt On Wed, 29 Oct 2025, Gal Pressman wrote: > Allow me to split the discussion to two questions: > 1. Is this an error? > 2. Should it be logged? > > Do we agree that the answer to #1 is yes? > > For #2, I think it should, but we can probably improve the situation > with extack instead of a print. I think its an 'expected error' if the module is not present. I agree. For 2 I think if the user runs "ethtool -m" on a port with no module, they received an error message stating something along the lines of "module not present" and the kernel didn't have any log messages about it that would be near to 'the best' solution. I specifically suffer pain from the case of an unused port (no module installed). I took a peek at the SFP+ and QSFP connector specifications. It looks like they do have pins reserved for module presence so I'm wondering if the firmware exposes a mechanism for the mlx driver to check the module presence. This would clear up essentially every issue with this particular case because I have curious people seeing the existing message & then questioning me about whether is the cause of some other problem (much of the time not even a networking problem) when in the port is simply 'unused'. Cheers! -Matt ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-29 16:49 ` Matthew W Carlis @ 2025-10-29 23:33 ` Jakub Kicinski 2025-10-30 14:48 ` Gal Pressman 0 siblings, 1 reply; 13+ messages in thread From: Jakub Kicinski @ 2025-10-29 23:33 UTC (permalink / raw) To: Matthew W Carlis Cc: gal, adailey, ashishk, mbloch, msaggi, netdev, saeedm, tariqt On Wed, 29 Oct 2025 10:49:24 -0600 Matthew W Carlis wrote: > On Wed, 29 Oct 2025, Gal Pressman wrote: > > Allow me to split the discussion to two questions: > > 1. Is this an error? > > 2. Should it be logged? > > > > Do we agree that the answer to #1 is yes? > > > > For #2, I think it should, but we can probably improve the situation > > with extack instead of a print. > > I think its an 'expected error' if the module is not present. I agree. > > For 2 I think if the user runs "ethtool -m" on a port with no module, > they received an error message stating something along the lines of > "module not present" and the kernel didn't have any log messages about > it that would be near to 'the best' solution. I assume you mean error message specifically from the CLI or whatever API the user is exercising? If so I agree. The system logs are for fatal / unexpected conditions. AFAIU returning -EIO is the _expected_ way to find out that module is not plugged in. If there's a better API I suppose we can make ethtool call it first to avoid the error. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-29 23:33 ` Jakub Kicinski @ 2025-10-30 14:48 ` Gal Pressman 2025-10-30 20:09 ` Matthew W Carlis 0 siblings, 1 reply; 13+ messages in thread From: Gal Pressman @ 2025-10-30 14:48 UTC (permalink / raw) To: Jakub Kicinski, Matthew W Carlis Cc: adailey, ashishk, mbloch, msaggi, netdev, saeedm, tariqt On 30/10/2025 1:33, Jakub Kicinski wrote: > On Wed, 29 Oct 2025 10:49:24 -0600 Matthew W Carlis wrote: >> On Wed, 29 Oct 2025, Gal Pressman wrote: >>> Allow me to split the discussion to two questions: >>> 1. Is this an error? >>> 2. Should it be logged? >>> >>> Do we agree that the answer to #1 is yes? >>> >>> For #2, I think it should, but we can probably improve the situation >>> with extack instead of a print. >> >> I think its an 'expected error' if the module is not present. I agree. >> >> For 2 I think if the user runs "ethtool -m" on a port with no module, >> they received an error message stating something along the lines of >> "module not present" and the kernel didn't have any log messages about >> it that would be near to 'the best' solution. > > I assume you mean error message specifically from the CLI or whatever > API the user is exercising? If so I agree. > > The system logs are for fatal / unexpected conditions. AFAIU returning > -EIO is the _expected_ way to find out that module is not plugged in. > If there's a better API I suppose we can make ethtool call it first > to avoid the error. There are other cases which will return -EIO, but do not necessarily mean that the module is disconnected: unresponsive module, i2c error, disabled module, unsupported module, etc. We cannot differentiate between them without the status print. Changing the log level makes things more difficult, as most production servers will not enable the debug print, and the logs would be harder to analyze. I asked before, maybe these automatic tools that keep querying the module continue to do so because of the success return code, and that will be resolved soon? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-30 14:48 ` Gal Pressman @ 2025-10-30 20:09 ` Matthew W Carlis 2025-11-02 13:09 ` Gal Pressman 0 siblings, 1 reply; 13+ messages in thread From: Matthew W Carlis @ 2025-10-30 20:09 UTC (permalink / raw) To: gal; +Cc: adailey, ashishk, kuba, mattc, mbloch, msaggi, netdev, saeedm, tariqt On Thu, 30 Oct 2025 16:48:54, Gal Pressman wrote: > I asked before, maybe these automatic tools that keep querying the > module continue to do so because of the success return code, and that > will be resolved soon? I went and ran the command on a port with no module and agree that it does return zero. It doesn't solve the problem for me... Part of the reason automation will periodically run the command is for inventory purposes. We want to roughly know when a module was installed or removed. Then, as far as I can tell the only way to know if a module is installed is by running this command which has the consequence of generating log spam when there is not a module installed. > Changing the log level makes things more difficult, as most production > servers will not enable the debug print, and the logs would be harder to > analyze. I care less about the logging level... I suspect it should be info or dbg, but we just don't want log lines generated when we're trying to figure out if some module is installed. Cheers! -Matt ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-30 20:09 ` Matthew W Carlis @ 2025-11-02 13:09 ` Gal Pressman 0 siblings, 0 replies; 13+ messages in thread From: Gal Pressman @ 2025-11-02 13:09 UTC (permalink / raw) To: Matthew W Carlis Cc: adailey, ashishk, kuba, mbloch, msaggi, netdev, saeedm, tariqt On 30/10/2025 22:09, Matthew W Carlis wrote: > On Thu, 30 Oct 2025 16:48:54, Gal Pressman wrote: >> I asked before, maybe these automatic tools that keep querying the >> module continue to do so because of the success return code, and that >> will be resolved soon? > > I went and ran the command on a port with no module and agree that it does > return zero. > > It doesn't solve the problem for me... Part of the reason automation will > periodically run the command is for inventory purposes. We want to roughly > know when a module was installed or removed. Then, as far as I can tell the > only way to know if a module is installed is by running this command which > has the consequence of generating log spam when there is not a module installed. > >> Changing the log level makes things more difficult, as most production >> servers will not enable the debug print, and the logs would be harder to >> analyze. > > I care less about the logging level... I suspect it should be info or dbg, but > we just don't want log lines generated when we're trying to figure out if some > module is installed. I have an idea on how to make the situation better, will hopefully have something soon. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/1] net/mlx5: query_mcia_reg fail logging at debug severity 2025-10-28 19:40 [PATCH 0/1] net/mlx5: query_mcia_reg fail logging at debug severity Matthew W Carlis 2025-10-28 19:40 ` [PATCH 1/1] " Matthew W Carlis @ 2025-10-28 20:27 ` Gal Pressman 1 sibling, 0 replies; 13+ messages in thread From: Gal Pressman @ 2025-10-28 20:27 UTC (permalink / raw) To: Matthew W Carlis, netdev; +Cc: saeedm, tariqt, mbloch, ashishk, msaggi, adailey On 28/10/2025 21:40, Matthew W Carlis wrote: > The commit message pretty much has most of my thoughts on this one. We > see this message generated all the time because users are trying to > determine if there is an SFP module installed in a port or when trying > to get module info from a port that doesn't have an SFP module. This > is not really an error at all It is an error, as evident by the fact that you only changed the log level, not the error return value. We cannot query the module EEPROM if there's no module to query. > & therefore the message is undesirable. Arguable. This is a privileged command, and error messages are intended for errors. It's important to have the value of the status field in the logs in case of an error. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-11-02 13:09 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-28 19:40 [PATCH 0/1] net/mlx5: query_mcia_reg fail logging at debug severity Matthew W Carlis 2025-10-28 19:40 ` [PATCH 1/1] " Matthew W Carlis 2025-10-28 20:28 ` Gal Pressman 2025-10-28 21:48 ` Matthew W Carlis 2025-10-29 7:22 ` Gal Pressman 2025-10-28 22:43 ` Jakub Kicinski 2025-10-29 7:33 ` Gal Pressman 2025-10-29 16:49 ` Matthew W Carlis 2025-10-29 23:33 ` Jakub Kicinski 2025-10-30 14:48 ` Gal Pressman 2025-10-30 20:09 ` Matthew W Carlis 2025-11-02 13:09 ` Gal Pressman 2025-10-28 20:27 ` [PATCH 0/1] " Gal Pressman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).