* [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Rajat Khandelwal @ 2022-12-29 12:26 UTC
To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba, pabeni
Cc: netdev, Rajat Khandelwal, intel-wired-lan, linux-kernel,
rajat.khandelwal

The CPU logs get flooded with replay rollover/timeout AER errors in
the system with i225_lmvp connected, usually inside thunderbolt devices.

One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
an Intel Foxville chipset, which uses the igc driver.
On connecting ethernet, CPU logs get inundated with these errors. The point
is we shouldn't be spamming the logs with such correctible errors as it
confuses other kernel developers less familiar with PCI errors, support
staff, and users who happen to look at the logs.

Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ebff0e04045d..a3a6e8086c8d 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
 	return value;
 }

+#ifdef CONFIG_PCIEAER
+static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
+{
+	struct pci_dev *pdev = adapter->pdev;
+	u32 aer_pos, corr_mask;
+
+	if (pdev->device != IGC_DEV_ID_I225_LMVP)
+		return;
+
+	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
+	if (!aer_pos)
+		return;
+
+	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
+
+	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
+	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
+}
+#endif
+
 /**
  * igc_probe - Device Initialization Routine
  * @pdev: PCI device information struct
@@ -6236,8 +6256,6 @@ static int igc_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_pci_reg;

-	pci_enable_pcie_error_reporting(pdev);
-
 	err = pci_enable_ptm(pdev, NULL);
 	if (err < 0)
 		dev_info(&pdev->dev, "PCIe PTM not supported by PCIe bus/controller\n");
@@ -6272,6 +6290,12 @@ static int igc_probe(struct pci_dev *pdev,
 	if (!adapter->io_addr)
 		goto err_ioremap;

+#ifdef CONFIG_PCIEAER
+	igc_mask_aer_replay_correctible(adapter);
+#endif
+
+	pci_enable_pcie_error_reporting(pdev);
+
 	/* hw->hw_addr can be zeroed, so use adapter->io_addr for unmap */
 	hw->hw_addr = adapter->io_addr;

--
2.34.1
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
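
The read-modify-write that the patch performs on the AER Correctable Error Mask register can be sketched in plain userspace C. This is an illustration only and not part of the patch: `mask_replay_errors()` and `replay_errors_masked()` are hypothetical helpers, and the two bit values are the ones defined in include/uapi/linux/pci_regs.h. A set bit in the mask register suppresses reporting of that error; all other bits are left as they were read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the AER Correctable Error Mask register, with the values used in
 * include/uapi/linux/pci_regs.h. */
#define PCI_ERR_COR_REP_ROLL  0x00000100u /* REPLAY_NUM Rollover */
#define PCI_ERR_COR_REP_TIMER 0x00001000u /* Replay Timer Timeout */

/* Hypothetical helper: given the mask value read from config space, return
 * the value the patch would write back. Unrelated bits are preserved. */
static uint32_t mask_replay_errors(uint32_t corr_mask)
{
	return corr_mask | PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
}

/* Hypothetical helper: true when both replay errors are already masked. */
static bool replay_errors_masked(uint32_t corr_mask)
{
	const uint32_t both = PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;

	return (corr_mask & both) == both;
}
```

In the driver, the input would come from pci_read_config_dword() and the result would go to pci_write_config_dword(), as in the patch above.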
* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Neftin, Sasha @ 2023-01-01 7:27 UTC
To: Rajat Khandelwal, jesse.brandeburg, anthony.l.nguyen, davem, edumazet,
	kuba, pabeni, Ruinskiy, Dima, Lifshits, Vitaly, Avivi, Amir
Cc: netdev, intel-wired-lan, linux-kernel, rajat.khandelwal

On 12/29/2022 14:26, Rajat Khandelwal wrote:
> The CPU logs get flooded with replay rollover/timeout AER errors in
> the system with i225_lmvp connected, usually inside thunderbolt devices.
>
> One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> an Intel Foxville chipset, which uses the igc driver.
> On connecting ethernet, CPU logs get inundated with these errors. The point
> is we shouldn't be spamming the logs with such correctible errors as it
> confuses other kernel developers less familiar with PCI errors, support
> staff, and users who happen to look at the logs.
>
> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index ebff0e04045d..a3a6e8086c8d 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
>  	return value;
>  }
>
> +#ifdef CONFIG_PCIEAER
> +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> +{
> +	struct pci_dev *pdev = adapter->pdev;
> +	u32 aer_pos, corr_mask;
> +
> +	if (pdev->device != IGC_DEV_ID_I225_LMVP)
> +		return;
> +
> +	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> +	if (!aer_pos)
> +		return;
> +
> +	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
> +
> +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
> +}
> +#endif
> +

Hello Rajat,
May we use the privilege flag approach, give user control: and mask some
advanced errors? Although...
Why did it happen? Didn't you prefer not to investigate it or else mask it?
(I have concerns about the PCIe link over the thunderbolt tunnel)

>  /**
>   * igc_probe - Device Initialization Routine
>   * @pdev: PCI device information struct
> @@ -6236,8 +6256,6 @@ static int igc_probe(struct pci_dev *pdev,
>  	if (err)
>  		goto err_pci_reg;
>
> -	pci_enable_pcie_error_reporting(pdev);
> -
>  	err = pci_enable_ptm(pdev, NULL);
>  	if (err < 0)
>  		dev_info(&pdev->dev, "PCIe PTM not supported by PCIe bus/controller\n");
> @@ -6272,6 +6290,12 @@ static int igc_probe(struct pci_dev *pdev,
>  	if (!adapter->io_addr)
>  		goto err_ioremap;
>
> +#ifdef CONFIG_PCIEAER
> +	igc_mask_aer_replay_correctible(adapter);
> +#endif
> +
> +	pci_enable_pcie_error_reporting(pdev);
> +
>  	/* hw->hw_addr can be zeroed, so use adapter->io_addr for unmap */
>  	hw->hw_addr = adapter->io_addr;
>

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Leon Romanovsky @ 2023-01-01 8:32 UTC
To: Rajat Khandelwal
Cc: intel-wired-lan, rajat.khandelwal, jesse.brandeburg, linux-kernel,
	edumazet, anthony.l.nguyen, netdev, kuba, pabeni, davem

On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> The CPU logs get flooded with replay rollover/timeout AER errors in
> the system with i225_lmvp connected, usually inside thunderbolt devices.
>
> One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> an Intel Foxville chipset, which uses the igc driver.
> On connecting ethernet, CPU logs get inundated with these errors. The point
> is we shouldn't be spamming the logs with such correctible errors as it
> confuses other kernel developers less familiar with PCI errors, support
> staff, and users who happen to look at the logs.
>
> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index ebff0e04045d..a3a6e8086c8d 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
>  	return value;
>  }
>
> +#ifdef CONFIG_PCIEAER
> +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> +{
> +	struct pci_dev *pdev = adapter->pdev;
> +	u32 aer_pos, corr_mask;
> +
> +	if (pdev->device != IGC_DEV_ID_I225_LMVP)
> +		return;
> +
> +	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> +	if (!aer_pos)
> +		return;
> +
> +	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
> +
> +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);

Shouldn't this igc_mask_aer_replay_correctible function be implemented
in drivers/pci/quirks.c and not in igc_probe()?

Thanks

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Paul Menzel @ 2023-01-01 10:34 UTC
To: Leon Romanovsky, Rajat Khandelwal
Cc: linux-pci, netdev, rajat.khandelwal, jesse.brandeburg, linux-kernel,
	edumazet, anthony.l.nguyen, kuba, Bjorn Helgaas, intel-wired-lan,
	pabeni, davem

[Cc: +Bjorn, +linux-pci]

Dear Leon, dear Rajat,


Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
> On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
>> The CPU logs get flooded with replay rollover/timeout AER errors in
>> the system with i225_lmvp connected, usually inside thunderbolt devices.
>>
>> One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
>> an Intel Foxville chipset, which uses the igc driver.
>> On connecting ethernet, CPU logs get inundated with these errors. The point
>> is we shouldn't be spamming the logs with such correctible errors as it
>> confuses other kernel developers less familiar with PCI errors, support
>> staff, and users who happen to look at the logs.
>>
>> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
>> ---
>>  drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>> index ebff0e04045d..a3a6e8086c8d 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
>>  	return value;
>>  }
>>
>> +#ifdef CONFIG_PCIEAER
>> +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
>> +{
>> +	struct pci_dev *pdev = adapter->pdev;
>> +	u32 aer_pos, corr_mask;
>> +
>> +	if (pdev->device != IGC_DEV_ID_I225_LMVP)
>> +		return;
>> +
>> +	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
>> +	if (!aer_pos)
>> +		return;
>> +
>> +	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
>> +
>> +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
>> +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
>
> Shouldn't this igc_mask_aer_replay_correctible function be implemented
> in drivers/pci/quirks.c and not in igc_probe()?

Probably. Though I think, the PCI quirk file, is getting too big.


Kind regards,

Paul

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Leon Romanovsky @ 2023-01-03 9:54 UTC
To: Paul Menzel
Cc: Rajat Khandelwal, linux-pci, netdev, rajat.khandelwal,
	jesse.brandeburg, linux-kernel, edumazet, intel-wired-lan, kuba,
	Bjorn Helgaas, anthony.l.nguyen, pabeni, davem

On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> [Cc: +Bjorn, +linux-pci]
>
> Dear Leon, dear Rajat,
>
>
> Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
> > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > The CPU logs get flooded with replay rollover/timeout AER errors in
> > > the system with i225_lmvp connected, usually inside thunderbolt devices.
> > >
> > > One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> > > an Intel Foxville chipset, which uses the igc driver.
> > > On connecting ethernet, CPU logs get inundated with these errors. The point
> > > is we shouldn't be spamming the logs with such correctible errors as it
> > > confuses other kernel developers less familiar with PCI errors, support
> > > staff, and users who happen to look at the logs.
> > >
> > > Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
> > >  1 file changed, 26 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> > > index ebff0e04045d..a3a6e8086c8d 100644
> > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > > @@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
> > >  	return value;
> > >  }
> > > +#ifdef CONFIG_PCIEAER
> > > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> > > +{
> > > +	struct pci_dev *pdev = adapter->pdev;
> > > +	u32 aer_pos, corr_mask;
> > > +
> > > +	if (pdev->device != IGC_DEV_ID_I225_LMVP)
> > > +		return;
> > > +
> > > +	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> > > +	if (!aer_pos)
> > > +		return;
> > > +
> > > +	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
> > > +
> > > +	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> > > +	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
> >
> > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > in drivers/pci/quirks.c and not in igc_probe()?
>
> Probably. Though I think, the PCI quirk file, is getting too big.

As long as that file is right location, we should use it.
One can refactor quirk file later.

Thanks

>
>
> Kind regards,
>
> Paul

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Bjorn Helgaas @ 2023-01-03 11:54 UTC
To: Leon Romanovsky
Cc: Paul Menzel, Rajat Khandelwal, anthony.l.nguyen, netdev,
	rajat.khandelwal, jesse.brandeburg, linux-kernel, edumazet,
	intel-wired-lan, linux-pci, Bjorn Helgaas, kuba, pabeni, davem

On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
> On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> > Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
> > > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > > The CPU logs get flooded with replay rollover/timeout AER errors in
> > > > the system with i225_lmvp connected, usually inside thunderbolt devices.
> > > >
> > > > One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> > > > an Intel Foxville chipset, which uses the igc driver.
> > > > On connecting ethernet, CPU logs get inundated with these errors. The point
> > > > is we shouldn't be spamming the logs with such correctible errors as it
> > > > confuses other kernel developers less familiar with PCI errors, support
> > > > staff, and users who happen to look at the logs.

> > > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c

> > > > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)

> > > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > > in drivers/pci/quirks.c and not in igc_probe()?
> >
> > Probably. Though I think, the PCI quirk file, is getting too big.
>
> As long as that file is right location, we should use it.
> One can refactor quirk file later.

If a quirk like this is only needed when the driver is loaded,
I think the driver is a better place than drivers/pci/quirks.c.
If it's in quirks.c, either we have to replicate driver Kconfig
via #ifdefs, or the kernel contains the quirk for systems that
don't need it.

I'm generally not a fan of simply masking errors because they're
annoying. I'd prefer to figure out the root cause and fix it if
possible. Or maybe we can tone down or rate-limit the logging so
it's not so alarming.

Bjorn

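
The rate-limiting idea raised above can be pictured with a small fixed-window limiter. This is a userspace sketch under stated assumptions, not the kernel's ratelimit API and not a proposal from the thread: `struct log_ratelimit` and `log_ratelimit_allow()` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow. At most `burst` messages are allowed per `interval`; the rest are counted as suppressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window limiter state. Zero-initialise everything
 * except interval and burst. */
struct log_ratelimit {
	uint64_t interval;       /* window length, in caller-chosen time units */
	unsigned int burst;      /* messages allowed per window */
	uint64_t window_start;   /* start time of the current window */
	unsigned int emitted;    /* messages allowed in the current window */
	unsigned int suppressed; /* messages dropped so far (cumulative) */
};

/* Return true if a message arriving at time 'now' may be logged. */
static bool log_ratelimit_allow(struct log_ratelimit *rl, uint64_t now)
{
	assert(rl);

	/* Open a new window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->emitted = 0;
	}

	if (rl->emitted < rl->burst) {
		rl->emitted++;
		return true;
	}

	rl->suppressed++;
	return false;
}
```

With `interval = 5` and `burst = 2`, messages at times 0 and 1 pass, messages at times 2 and 4 are suppressed, and a message at time 5 opens a new window and passes.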
* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Leon Romanovsky @ 2023-01-03 12:00 UTC
To: Bjorn Helgaas
Cc: Paul Menzel, Rajat Khandelwal, anthony.l.nguyen, netdev,
	rajat.khandelwal, jesse.brandeburg, linux-kernel, edumazet,
	intel-wired-lan, linux-pci, Bjorn Helgaas, kuba, pabeni, davem

On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
> > On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> > > Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
> > > > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > > > The CPU logs get flooded with replay rollover/timeout AER errors in
> > > > > the system with i225_lmvp connected, usually inside thunderbolt devices.
> > > > >
> > > > > One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> > > > > an Intel Foxville chipset, which uses the igc driver.
> > > > > On connecting ethernet, CPU logs get inundated with these errors. The point
> > > > > is we shouldn't be spamming the logs with such correctible errors as it
> > > > > confuses other kernel developers less familiar with PCI errors, support
> > > > > staff, and users who happen to look at the logs.
>
> > > > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>
> > > > > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
>
> > > > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > > > in drivers/pci/quirks.c and not in igc_probe()?
> > >
> > > Probably. Though I think, the PCI quirk file, is getting too big.
> >
> > As long as that file is right location, we should use it.
> > One can refactor quirk file later.
>
> If a quirk like this is only needed when the driver is loaded,

This is always the case with PCI devices managed through kernel, isn't it?
Users don't care/aware about "broken" devices unless they start to use them.

Thanks

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Bjorn Helgaas @ 2023-01-03 14:21 UTC
To: Leon Romanovsky
Cc: Paul Menzel, Rajat Khandelwal, netdev, rajat.khandelwal,
	jesse.brandeburg, linux-kernel, kuba, edumazet, anthony.l.nguyen,
	linux-pci, Bjorn Helgaas, intel-wired-lan, pabeni, davem

On Tue, Jan 03, 2023 at 02:00:04PM +0200, Leon Romanovsky wrote:
> On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
> > > On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> > > > Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
> > > > > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > > > > The CPU logs get flooded with replay rollover/timeout AER errors in
> > > > > > the system with i225_lmvp connected, usually inside thunderbolt devices.
> > > > > >
> > > > > > One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> > > > > > an Intel Foxville chipset, which uses the igc driver.
> > > > > > On connecting ethernet, CPU logs get inundated with these errors. The point
> > > > > > is we shouldn't be spamming the logs with such correctible errors as it
> > > > > > confuses other kernel developers less familiar with PCI errors, support
> > > > > > staff, and users who happen to look at the logs.
> >
> > > > > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > > > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> >
> > > > > > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> >
> > > > > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > > > > in drivers/pci/quirks.c and not in igc_probe()?
> > > >
> > > > Probably. Though I think, the PCI quirk file, is getting too big.
> > >
> > > As long as that file is right location, we should use it.
> > > One can refactor quirk file later.
> >
> > If a quirk like this is only needed when the driver is loaded,
>
> This is always the case with PCI devices managed through kernel, isn't it?
> Users don't care/aware about "broken" devices unless they start to use them.

Indeed, that's usually the case. There's a lot of stuff in quirks.c
that could probably be in drivers instead.

Bjorn

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Leon Romanovsky @ 2023-01-03 17:16 UTC
To: Bjorn Helgaas
Cc: Paul Menzel, Rajat Khandelwal, netdev, rajat.khandelwal,
	jesse.brandeburg, linux-kernel, kuba, edumazet, anthony.l.nguyen,
	linux-pci, Bjorn Helgaas, intel-wired-lan, pabeni, davem

On Tue, Jan 03, 2023 at 08:21:04AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 03, 2023 at 02:00:04PM +0200, Leon Romanovsky wrote:
> > On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
> > > > On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
> > > > > Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
> > > > > > On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
> > > > > > > The CPU logs get flooded with replay rollover/timeout AER errors in
> > > > > > > the system with i225_lmvp connected, usually inside thunderbolt devices.
> > > > > > >
> > > > > > > One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
> > > > > > > an Intel Foxville chipset, which uses the igc driver.
> > > > > > > On connecting ethernet, CPU logs get inundated with these errors. The point
> > > > > > > is we shouldn't be spamming the logs with such correctible errors as it
> > > > > > > confuses other kernel developers less familiar with PCI errors, support
> > > > > > > staff, and users who happen to look at the logs.
> > >
> > > > > > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > > > > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > >
> > > > > > > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
> > >
> > > > > > Shouldn't this igc_mask_aer_replay_correctible function be implemented
> > > > > > in drivers/pci/quirks.c and not in igc_probe()?
> > > > >
> > > > > Probably. Though I think, the PCI quirk file, is getting too big.
> > > >
> > > > As long as that file is right location, we should use it.
> > > > One can refactor quirk file later.
> > >
> > > If a quirk like this is only needed when the driver is loaded,
> >
> > This is always the case with PCI devices managed through kernel, isn't it?
> > Users don't care/aware about "broken" devices unless they start to use them.
>
> Indeed, that's usually the case. There's a lot of stuff in quirks.c
> that could probably be in drivers instead.

NP, so or deprecate quirks.c and prohibit any change to that file or
don't allow drivers to mangle PCI in their probe routines.
Everything in-between will cause to enormous mess in long run.

Thanks

>
> Bjorn

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Leon Romanovsky @ 2023-01-04 6:31 UTC
To: Bjorn Helgaas
Cc: Paul Menzel, Rajat Khandelwal, netdev, rajat.khandelwal,
	jesse.brandeburg, linux-kernel, kuba, edumazet, anthony.l.nguyen,
	linux-pci, Bjorn Helgaas, intel-wired-lan, pabeni, davem

On Tue, Jan 03, 2023 at 07:16:58PM +0200, Leon Romanovsky wrote:
> On Tue, Jan 03, 2023 at 08:21:04AM -0600, Bjorn Helgaas wrote:

<...>

> > > > If a quirk like this is only needed when the driver is loaded,
> > >
> > > This is always the case with PCI devices managed through kernel, isn't it?
> > > Users don't care/aware about "broken" devices unless they start to use them.
> >
> > Indeed, that's usually the case. There's a lot of stuff in quirks.c
> > that could probably be in drivers instead.
>
> NP, so or deprecate quirks.c and prohibit any change to that file or
> don't allow drivers to mangle PCI in their probe routines.
> Everything in-between will cause to enormous mess in long run.

Another thing to consider: if you go with the "probe variant", users
will see behavioral differences between drivers and subsystems in how
these quirks are controlled.

As an example, see the proposal in this thread to add an ethtool private
flag to enable/disable the quirk. In other places, it will be a module
parameter, sysfs, or a tool specific to that subsystem.

Thanks

>
> Thanks
>
> >
> > Bjorn

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
From: Neftin, Sasha @ 2023-01-04 5:35 UTC
To: Bjorn Helgaas, Leon Romanovsky, Ruinskiy, Dima, Nguyen, Anthony L,
	Lifshits, Vitaly, naamax.meir, Mushayev, Nikolay, Edri, Michael,
	Neftin, Sasha
Cc: Paul Menzel, Rajat Khandelwal, pabeni, netdev, rajat.khandelwal,
	jesse.brandeburg, linux-kernel, edumazet, anthony.l.nguyen,
	linux-pci, Bjorn Helgaas, kuba, intel-wired-lan, davem

On 1/3/2023 16:21, Bjorn Helgaas wrote:
> On Tue, Jan 03, 2023 at 02:00:04PM +0200, Leon Romanovsky wrote:
>> On Tue, Jan 03, 2023 at 05:54:02AM -0600, Bjorn Helgaas wrote:
>>> On Tue, Jan 03, 2023 at 11:54:24AM +0200, Leon Romanovsky wrote:
>>>> On Sun, Jan 01, 2023 at 11:34:21AM +0100, Paul Menzel wrote:
>>>>> Am 01.01.23 um 09:32 schrieb Leon Romanovsky:
>>>>>> On Thu, Dec 29, 2022 at 05:56:40PM +0530, Rajat Khandelwal wrote:
>>>>>>> The CPU logs get flooded with replay rollover/timeout AER errors in
>>>>>>> the system with i225_lmvp connected, usually inside thunderbolt devices.
>>>>>>>
>>>>>>> One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates
>>>>>>> an Intel Foxville chipset, which uses the igc driver.
>>>>>>> On connecting ethernet, CPU logs get inundated with these errors. The point
>>>>>>> is we shouldn't be spamming the logs with such correctible errors as it
>>>>>>> confuses other kernel developers less familiar with PCI errors, support
>>>>>>> staff, and users who happen to look at the logs.
>>>
>>>>>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>>>>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>>
>>>>>>> +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter)
>>>
>>>>>> Shouldn't this igc_mask_aer_replay_correctible function be implemented
>>>>>> in drivers/pci/quirks.c and not in igc_probe()?
>>>>>
>>>>> Probably. Though I think, the PCI quirk file, is getting too big.
>>>>
>>>> As long as that file is right location, we should use it.
>>>> One can refactor quirk file later.
>>>
>>> If a quirk like this is only needed when the driver is loaded,
>>
>> This is always the case with PCI devices managed through kernel, isn't it?
>> Users don't care/aware about "broken" devices unless they start to use them.
>
> Indeed, that's usually the case. There's a lot of stuff in quirks.c
> that could probably be in drivers instead.
>
> Bjorn

Tony, please drop/recall this patch. Intel's team will investigate this
problem.
Sasha

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP 2022-12-29 12:26 [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP Rajat Khandelwal 2023-01-01 7:27 ` Neftin, Sasha 2023-01-01 8:32 ` Leon Romanovsky @ 2023-01-01 10:32 ` Paul Menzel 2023-01-02 17:38 ` Khandelwal, Rajat 2 siblings, 1 reply; 16+ messages in thread From: Paul Menzel @ 2023-01-01 10:32 UTC (permalink / raw) To: Rajat Khandelwal Cc: linux-pci, netdev, rajat.khandelwal, jesse.brandeburg, linux-kernel, edumazet, anthony.l.nguyen, intel-wired-lan, Bjorn Helgaas, kuba, pabeni, davem [Cc: +Bjorn, +linux-pci] Dear Rajat, Thank you for your patch. Am 29.12.22 um 13:26 schrieb Rajat Khandelwal: > The CPU logs get flooded with replay rollover/timeout AER errors in > the system with i225_lmvp connected, usually inside thunderbolt devices. Please add one example log message to the commit message. > One of the prominent TBT4 docks we use is HP G4 Hook2, which incorporates I couldn’t find that device. Is that the correct name? > an Intel Foxville chipset, which uses the igc driver. Please add a blank line between paragraphs. > On connecting ethernet, CPU logs get inundated with these errors. The point > is we shouldn't be spamming the logs with such correctible errors as it correctable > confuses other kernel developers less familiar with PCI errors, support > staff, and users who happen to look at the logs. Please reference the bug reports (bug tracker and mailing list), you know of, where this was reported. 
> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@linux.intel.com> > --- > drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++-- > 1 file changed, 26 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c > index ebff0e04045d..a3a6e8086c8d 100644 > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg) > return value; > } > > +#ifdef CONFIG_PCIEAER > +static void igc_mask_aer_replay_correctible(struct igc_adapter *adapter) correctable > +{ > + struct pci_dev *pdev = adapter->pdev; > + u32 aer_pos, corr_mask; Instead of using the preprocessor, use a normal C conditional. From `Documentation/process/coding-style.rst`: > Within code, where possible, use the IS_ENABLED macro to convert a Kconfig > symbol into a C boolean expression, and use it in a normal C conditional: > > .. code-block:: c > > if (IS_ENABLED(CONFIG_SOMETHING)) { > ... > } > > The compiler will constant-fold the conditional away, and include or exclude > the block of code just as with an #ifdef, so this will not add any runtime > overhead. However, this approach still allows the C compiler to see the code > inside the block, and check it for correctness (syntax, types, symbol > references, etc). Thus, you still have to use an #ifdef if the code inside the > block references symbols that will not exist if the condition is not met. 
> +
> + if (pdev->device != IGC_DEV_ID_I225_LMVP)
> + return;
> +
> + aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> + if (!aer_pos)
> + return;
> +
> + pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
> +
> + corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> + pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
> +}
> +#endif
> +
> /**
> * igc_probe - Device Initialization Routine
> * @pdev: PCI device information struct
> @@ -6236,8 +6256,6 @@ static int igc_probe(struct pci_dev *pdev,
> if (err)
> goto err_pci_reg;
>
> - pci_enable_pcie_error_reporting(pdev);
> -
> err = pci_enable_ptm(pdev, NULL);
> if (err < 0)
> dev_info(&pdev->dev, "PCIe PTM not supported by PCIe bus/controller\n");
> @@ -6272,6 +6290,12 @@ static int igc_probe(struct pci_dev *pdev,
> if (!adapter->io_addr)
> goto err_ioremap;
>
> +#ifdef CONFIG_PCIEAER
> + igc_mask_aer_replay_correctible(adapter);
> +#endif
> +
> + pci_enable_pcie_error_reporting(pdev);
> +
> /* hw->hw_addr can be zeroed, so use adapter->io_addr for unmap */
> hw->hw_addr = adapter->io_addr;
>

Kind regards,

Paul
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

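The IS_ENABLED pattern quoted from coding-style.rst can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: IS_ENABLED_SKETCH is a simplified re-creation of the preprocessor trick (it handles a symbol defined to 1, not the _MODULE variant), and CONFIG_DEMO_AER, CONFIG_DEMO_OFF, mask_replay_errors_demo() and probe_demo() are hypothetical names that do not exist in igc.

```c
/*
 * Simplified userspace model of the kernel's IS_ENABLED() idea.
 * A symbol defined to 1 expands to 1; an undefined symbol expands to 0.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED(x) IS_DEFINED_2(x)
#define IS_ENABLED_SKETCH(option) IS_DEFINED(option)

/* Pretend Kconfig enabled this option (hypothetical symbol). */
#define CONFIG_DEMO_AER 1
/* CONFIG_DEMO_OFF is intentionally left undefined. */

/* Counts how often the (stand-in) masking helper ran. */
static int calls;

/* Stand-in for the masking helper; it only records that it was called. */
static void mask_replay_errors_demo(void)
{
	calls++;
}

/* Stand-in for a probe routine using ordinary C conditionals. */
static int probe_demo(void)
{
	/*
	 * Both branches are parsed and type-checked by the compiler;
	 * the one guarded by a constant 0 can be discarded as dead code.
	 */
	if (IS_ENABLED_SKETCH(CONFIG_DEMO_AER))
		mask_replay_errors_demo();
	if (IS_ENABLED_SKETCH(CONFIG_DEMO_OFF))
		mask_replay_errors_demo();
	return calls;
}
```

Because the helper is compiled in both configurations, this form only works when every symbol it references exists regardless of the option, which is the caveat the quoted documentation ends on.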
* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
2023-01-01 10:32 ` Paul Menzel
@ 2023-01-02 17:38 ` Khandelwal, Rajat
2023-01-03 5:44 ` Neftin, Sasha
0 siblings, 1 reply; 16+ messages in thread
From: Khandelwal, Rajat @ 2023-01-02 17:38 UTC (permalink / raw)
To: Paul Menzel, Rajat Khandelwal, Neftin, Sasha
Cc: Leon Romanovsky, netdev@vger.kernel.org, linux-pci@vger.kernel.org,
Brandeburg, Jesse, linux-kernel@vger.kernel.org, edumazet@google.com,
Nguyen, Anthony L, intel-wired-lan@lists.osuosl.org, Bjorn Helgaas,
kuba@kernel.org, pabeni@redhat.com, davem@davemloft.net

Hi Paul, Sasha
Thanks for the acknowledgement!

-> Will add the example logs
-> Device: https://www.hp.com/us-en/monitors-accessories/computer-accessories/thunderbolt-G4-dock.html
-> correctible -> correctable
-> I guess that, according to the convention, I still have to use #ifdef for
my function since it references variables that won't exist if the condition
is not met. However, I have used the IS_ENABLED macro to call the function
inside igc_probe(). I hope that's okay!

-> One last thing: I was also unsure about the location of this function, but
then I noticed the netxen_mask_aer_correctable() function inside
net/ethernet/qlogic/netxen/netxen_nic_main.c, which masks the correctable
errors in its PCIe device.
Also, I don’t see a function guarded by the CONFIG_PCIEAER macro in pci/quirks.c!
I would still prefer to keep the function in igc_main.c, but I am waiting for
your judgement.

@Neftin, Sasha, my team and I prefer masking these errors rather than debugging them.
First, they are correctable and non-fatal. Second, these errors (i.e., replay
errors) are observed in many of the devices I have worked with. Maybe there is
something universal which has to be done for the thunderbolt domain regarding
these specific replay errors in the long term?
Anyhow, we would like to mask these errors for now to avoid any confusion when
ethernet gets connected to the dock. I hope that will be okay?
Waiting for your judgement :)

Let me know on any more queries and any suggestions until I roll out v2.

Thanks
Rajat

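The masking discussed in this message is a read-modify-write of the AER Correctable Error Mask register. A minimal userspace sketch of that operation follows, assuming a simulated 4 KiB config space instead of real hardware. The constants are meant to mirror include/uapi/linux/pci_regs.h and should be checked against that header; struct fake_pci_dev, cfg_read32(), cfg_write32(), find_ext_cap() and mask_replay_errors() are hypothetical stand-ins, not kernel APIs.

```c
#include <stdint.h>

/* Values intended to match the kernel's pci_regs.h; verify before reuse. */
#define PCI_CFG_SPACE_EXP_SIZE 4096
#define PCI_EXT_CAP_ID_ERR     0x01
#define PCI_ERR_COR_MASK       0x14        /* offset inside the AER capability */
#define PCI_ERR_COR_REP_ROLL   0x00000100u /* REPLAY_NUM rollover */
#define PCI_ERR_COR_REP_TIMER  0x00001000u /* replay timer timeout */

/* Hypothetical stand-in for a device: just its raw config space bytes. */
struct fake_pci_dev {
	uint8_t cfg[PCI_CFG_SPACE_EXP_SIZE];
};

/* Config space is little-endian, so assemble the dword byte by byte. */
static uint32_t cfg_read32(const struct fake_pci_dev *dev, unsigned int pos)
{
	return (uint32_t)dev->cfg[pos] |
	       (uint32_t)dev->cfg[pos + 1] << 8 |
	       (uint32_t)dev->cfg[pos + 2] << 16 |
	       (uint32_t)dev->cfg[pos + 3] << 24;
}

static void cfg_write32(struct fake_pci_dev *dev, unsigned int pos, uint32_t val)
{
	dev->cfg[pos] = val & 0xff;
	dev->cfg[pos + 1] = (val >> 8) & 0xff;
	dev->cfg[pos + 2] = (val >> 16) & 0xff;
	dev->cfg[pos + 3] = (val >> 24) & 0xff;
}

/* Walk the extended capability list, which starts at offset 0x100. */
static unsigned int find_ext_cap(const struct fake_pci_dev *dev, unsigned int cap_id)
{
	unsigned int pos = 0x100;
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - 0x100) / 8; /* bound the walk */

	while (ttl-- > 0 && pos >= 0x100 && pos <= PCI_CFG_SPACE_EXP_SIZE - 4) {
		uint32_t header = cfg_read32(dev, pos);

		if (header == 0)
			break;                  /* empty list */
		if ((header & 0xffff) == cap_id)
			return pos;             /* bits 15:0 hold the capability ID */
		pos = (header >> 20) & 0xffc;   /* bits 31:20 hold the next pointer */
	}
	return 0;
}

/* Set the two replay-related mask bits; returns -1 if AER is absent. */
static int mask_replay_errors(struct fake_pci_dev *dev)
{
	unsigned int aer_pos = find_ext_cap(dev, PCI_EXT_CAP_ID_ERR);
	uint32_t corr_mask;

	if (!aer_pos || aer_pos + PCI_ERR_COR_MASK + 4 > PCI_CFG_SPACE_EXP_SIZE)
		return -1;

	corr_mask = cfg_read32(dev, aer_pos + PCI_ERR_COR_MASK);
	corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
	cfg_write32(dev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
	return 0;
}
```

The OR keeps any mask bits that firmware or the PCI core already set, so only the two replay bits change.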
* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
2023-01-02 17:38 ` Khandelwal, Rajat
@ 2023-01-03 5:44 ` Neftin, Sasha
2023-01-03 6:58 ` Khandelwal, Rajat
0 siblings, 1 reply; 16+ messages in thread
From: Neftin, Sasha @ 2023-01-03 5:44 UTC (permalink / raw)
To: Khandelwal, Rajat, Paul Menzel, Rajat Khandelwal, Ruinskiy, Dima,
Lifshits, Vitaly, naamax.meir
Cc: Leon Romanovsky, netdev@vger.kernel.org, Mushayev, Nikolay,
linux-pci@vger.kernel.org, Brandeburg, Jesse,
linux-kernel@vger.kernel.org, edumazet@google.com, Nguyen, Anthony L,
Efrati, Nir, intel-wired-lan@lists.osuosl.org, Bjorn Helgaas,
kuba@kernel.org, pabeni@redhat.com, davem@davemloft.net

On 1/2/2023 19:38, Khandelwal, Rajat wrote:
> Hi Paul, Sasha
> Thanks for the acknowledgement!
>
> -> Will add the example logs
> -> Device: https://www.hp.com/us-en/monitors-accessories/computer-accessories/thunderbolt-G4-dock.html
> -> correctible -> correctable
> -> I guess acc to the convention, I still have to use #ifdef for my function since it
> references variables that won't exist if the condition is not met.
> However, I have used the IS_ENABLED macro to call the function inside igc_probe().
> I hope that's okay!
>
> -> One last thing, I was also skeptical on the location of this function, but then I witnessed
> netxen_mask_aer_correctable() function inside net/ethernet/qlogic/netxen/netxen_nic_main.c,
> which masks the correctable errors in its PCIe device.
> Also, I don’t see a CONFIG_PCIEAER macro enabled function in pci/quirks.c!
> I still think to keep the function in igc_main.c, but I am waiting for your judgement.
>
> @Neftin, Sasha, I and my team prefer masking these errors rather than debugging them.
> First, they are correctable and non-fatal. Second, these errors are observed in many of the devices I
> have worked with (i.e., replay errors). Maybe there is something universal which has to be done for the
> thunderbolt domain regarding these specific replay errors in the long term?
> Anyhow, we would like to mask these errors for now to avoid any confusions when ethernet gets
> connected to the dock. I hope that will be okay? Waiting for your judgement :)

I do not think this approach (masking in probe) is acceptable. Do not mask
it via .config. I suggest exporting a priv_flag to give the user control:
enabling or disabling the specific PCIe AER errors by flag via ethtool, at the
user's own responsibility. An example of a priv_flag export: 3c98cbf22a96

I am also not sure the quirks.c approach is valid for this case.

>
> Let me know on any more queries and any suggestions until I roll out v2.
>
> Thanks
> Rajat

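The private-flag idea suggested above could look roughly like the following. This is a userspace sketch under loud assumptions: DEMO_PRIV_FLAG_MASK_AER_REPLAY, struct demo_adapter and both functions are hypothetical and are not taken from igc or from the commit referenced in the message. In a real driver the equivalent logic would presumably sit behind the get_priv_flags/set_priv_flags callbacks of struct ethtool_ops, and the mask would be written to config space rather than to a struct member.

```c
#include <stdint.h>

/* Replay-related bits of the AER correctable mask (check pci_regs.h). */
#define PCI_ERR_COR_REP_ROLL   0x00000100u
#define PCI_ERR_COR_REP_TIMER  0x00001000u
#define DEMO_REPLAY_BITS (PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER)

/* Hypothetical private-flag bit invented for this sketch. */
#define DEMO_PRIV_FLAG_MASK_AER_REPLAY (1u << 0)

/* Hypothetical adapter state. */
struct demo_adapter {
	uint32_t priv_flags;   /* user-visible private flags */
	uint32_t aer_cor_mask; /* stand-in for the AER correctable mask register */
};

/* Report the current private flags to the caller. */
static uint32_t demo_get_priv_flags(const struct demo_adapter *a)
{
	return a->priv_flags;
}

/* Apply new private flags; touch the mask only when the flag toggles. */
static int demo_set_priv_flags(struct demo_adapter *a, uint32_t flags)
{
	uint32_t changed = a->priv_flags ^ flags;

	if (flags & ~DEMO_PRIV_FLAG_MASK_AER_REPLAY)
		return -1; /* reject unknown flag bits */

	if (changed & DEMO_PRIV_FLAG_MASK_AER_REPLAY) {
		if (flags & DEMO_PRIV_FLAG_MASK_AER_REPLAY)
			a->aer_cor_mask |= DEMO_REPLAY_BITS;
		else
			/* Note: this unmasks even if firmware had masked them. */
			a->aer_cor_mask &= ~DEMO_REPLAY_BITS;
	}

	a->priv_flags = flags;
	return 0;
}
```

The default stays unmasked, so errors are reported unless the user opts out, which appears to be the point of the suggestion.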
* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
2023-01-03 5:44 ` Neftin, Sasha
@ 2023-01-03 6:58 ` Khandelwal, Rajat
2023-01-03 19:13 ` Bjorn Helgaas
0 siblings, 1 reply; 16+ messages in thread
From: Khandelwal, Rajat @ 2023-01-03 6:58 UTC (permalink / raw)
To: Neftin, Sasha, Paul Menzel, Rajat Khandelwal, Ruinskiy, Dima,
Lifshits, Vitaly, naamax.meir
Cc: Leon Romanovsky, netdev@vger.kernel.org, Mushayev, Nikolay,
linux-pci@vger.kernel.org, Brandeburg, Jesse,
linux-kernel@vger.kernel.org, edumazet@google.com, Nguyen, Anthony L,
Efrati, Nir, intel-wired-lan@lists.osuosl.org, Bjorn Helgaas,
kuba@kernel.org, pabeni@redhat.com, davem@davemloft.net

Hi Sasha,
Thanks for the acknowledgement!

Ok, I get the point you are trying to make. Instead of masking inherently,
you suggest exporting a flag and giving the user control. I understand, and
it's doable.

The reason I masked inherently is that I noticed the function
netxen_mask_aer_correctable() inside
net/ethernet/qlogic/netxen/netxen_nic_main.c, which masks the correctable
errors in the corresponding PCIe device. I am just curious about the inherent
implementation in netxen!

Again, if you consider the flag-based implementation mandatory, I will do that.
Just confirming before actually doing it :)

Thanks
Rajat

* Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP
2023-01-03 6:58 ` Khandelwal, Rajat
@ 2023-01-03 19:13 ` Bjorn Helgaas
0 siblings, 0 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2023-01-03 19:13 UTC (permalink / raw)
To: Khandelwal, Rajat
Cc: Paul Menzel, Rajat Khandelwal, Leon Romanovsky, davem@davemloft.net,
netdev@vger.kernel.org, Mushayev, Nikolay, Brandeburg, Jesse,
linux-kernel@vger.kernel.org, kuba@kernel.org, edumazet@google.com,
pabeni@redhat.com, Nguyen, Anthony L, linux-pci@vger.kernel.org,
Bjorn Helgaas, Ruinskiy, Dima, Efrati, Nir,
intel-wired-lan@lists.osuosl.org

On Tue, Jan 03, 2023 at 06:58:36AM +0000, Khandelwal, Rajat wrote:
> ...
> The reason I masked inherently is I witnessed a function
> netxen_mask_aer_correctable() inside
> net/ethernet/qlogic/netxen/netxen_nic_main.c, which masks the
> correctable errors in the corresponding PCIe device.

In my opinion, netxen_mask_aer_correctable() should not exist. The
PCI core should own the PCI_ERR_COR_MASK register.

netxen_mask_aer_correctable() was added by dce87b960cf4 ("netxen: mask
correctable error") with the note that it is a "HW workaround." Maybe
it covers up some hardware defect in the device, although it doesn't
include any evidence of this.

But if we do actually need it, I would rather have the driver set a
quirk bit that the PCI core can use to mask correctable errors so the
AER configuration is all in one place.

Bjorn

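The split proposed above, where a quirk only records a request and the PCI core alone programs the mask register, might be modelled like this. It is a sketch with invented names: struct demo_pci_dev, its quirk_cor_mask field, DEMO_DEVICE_ID and both functions are hypothetical and do not correspond to existing PCI core interfaces.

```c
#include <stdint.h>

/* Replay-related bits of the AER correctable mask (check pci_regs.h). */
#define PCI_ERR_COR_REP_ROLL   0x00000100u
#define PCI_ERR_COR_REP_TIMER  0x00001000u
#define DEMO_REPLAY_BITS (PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER)

/* Placeholder device ID for the sketch, not the real I225_LMVP ID. */
#define DEMO_DEVICE_ID 0x1234u

/* Hypothetical device model. */
struct demo_pci_dev {
	uint16_t device;         /* PCI device ID */
	uint32_t aer_cor_mask;   /* stand-in for PCI_ERR_COR_MASK */
	uint32_t quirk_cor_mask; /* bits a quirk asks the core to mask */
};

/* Quirk side: record the request only, never touch the register. */
static void demo_apply_quirks(struct demo_pci_dev *dev)
{
	if (dev->device == DEMO_DEVICE_ID)
		dev->quirk_cor_mask |= DEMO_REPLAY_BITS;
}

/* Core side: the single place that programs the mask register. */
static void demo_core_aer_init(struct demo_pci_dev *dev)
{
	dev->aer_cor_mask |= dev->quirk_cor_mask;
}
```

Keeping the register write in one function is what puts the AER configuration "all in one place"; the quirk merely contributes data to it.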
end of thread, other threads:[~2023-01-04 6:31 UTC | newest]

Thread overview: 16+ messages
2022-12-29 12:26 [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP Rajat Khandelwal
2023-01-01 7:27 ` Neftin, Sasha
2023-01-01 8:32 ` Leon Romanovsky
2023-01-01 10:34 ` Paul Menzel
2023-01-03 9:54 ` Leon Romanovsky
2023-01-03 11:54 ` Bjorn Helgaas
2023-01-03 12:00 ` Leon Romanovsky
2023-01-03 14:21 ` Bjorn Helgaas
2023-01-03 17:16 ` Leon Romanovsky
2023-01-04 6:31 ` Leon Romanovsky
2023-01-04 5:35 ` Neftin, Sasha
2023-01-01 10:32 ` Paul Menzel
2023-01-02 17:38 ` Khandelwal, Rajat
2023-01-03 5:44 ` Neftin, Sasha
2023-01-03 6:58 ` Khandelwal, Rajat
2023-01-03 19:13 ` Bjorn Helgaas