* [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-05-09 (igc, igb)
@ 2023-05-09 17:09 Tony Nguyen
2023-05-09 17:09 ` [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring Tony Nguyen
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Tony Nguyen @ 2023-05-09 17:09 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, netdev; +Cc: Tony Nguyen
This series contains updates to igc and igb drivers.
Husaini clears Tx rings when interface is brought down for igc.
Vinicius disables PTM and PCI busmaster when removing igc driver.
Alex adds error check and path for NVM read error on igb.
The following are changes since commit 162bd18eb55adf464a0fa2b4144b8d61c75ff7c2:
linux/dim: Do nothing if no time delta between samples
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE
Aleksandr Loktionov (1):
igb: fix nvm.ops.read() error handling
Muhammad Husaini Zulkifli (1):
igc: Clean the TX buffer and TX descriptor ring
Vinicius Costa Gomes (1):
igc: Fix possible system crash when loading module
drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +++-
drivers/net/ethernet/intel/igc/igc_main.c | 10 ++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
--
2.38.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring
2023-05-09 17:09 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-05-09 (igc, igb) Tony Nguyen
@ 2023-05-09 17:09 ` Tony Nguyen
2023-05-11 2:14 ` Jakub Kicinski
2023-05-09 17:09 ` [PATCH net 2/3] igc: Fix possible system crash when loading module Tony Nguyen
2023-05-09 17:09 ` [PATCH net 3/3] igb: fix nvm.ops.read() error handling Tony Nguyen
2 siblings, 1 reply; 8+ messages in thread
From: Tony Nguyen @ 2023-05-09 17:09 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, netdev
Cc: Muhammad Husaini Zulkifli, anthony.l.nguyen, sasha.neftin,
Naama Meir
From: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
There could be a race condition during link down where interrupt
being generated and igc_clean_tx_irq() been called to perform the
TX completion. Properly clear the TX buffer and TX descriptor ring
to avoid those case.
Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 1c4676882082..5767e691b401 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -254,6 +254,13 @@ static void igc_clean_tx_ring(struct igc_ring *tx_ring)
/* reset BQL for queue */
netdev_tx_reset_queue(txring_txq(tx_ring));
+ /* Zero out the buffer ring */
+ memset(tx_ring->tx_buffer_info, 0,
+ sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
+
+ /* Zero out the descriptor ring */
+ memset(tx_ring->desc, 0, tx_ring->size);
+
/* reset next_to_use and next_to_clean */
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
--
2.38.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 2/3] igc: Fix possible system crash when loading module
2023-05-09 17:09 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-05-09 (igc, igb) Tony Nguyen
2023-05-09 17:09 ` [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring Tony Nguyen
@ 2023-05-09 17:09 ` Tony Nguyen
2023-05-09 17:09 ` [PATCH net 3/3] igb: fix nvm.ops.read() error handling Tony Nguyen
2 siblings, 0 replies; 8+ messages in thread
From: Tony Nguyen @ 2023-05-09 17:09 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, netdev
Cc: Vinicius Costa Gomes, anthony.l.nguyen, sasha.neftin,
Muhammad Husaini Zulkifli, Naama Meir
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Guarantee that when probe() is run again, PTM and PCI busmaster will be
in the same state as it was if the driver was never loaded.
Avoid an i225/i226 hardware issue that PTM requests can be made even
though PCI bus mastering is not enabled. These unexpected PTM requests
can crash some systems.
So, "force" disable PTM and busmastering before removing the driver,
so they can be re-enabled in the right order during probe(). This is
more like a workaround and should be applicable for i225 and i226, in
any platform.
Fixes: 1b5d73fb8624 ("igc: Enable PCIe PTM")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 5767e691b401..5213f3da1fe7 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6730,6 +6730,9 @@ static void igc_remove(struct pci_dev *pdev)
igc_ptp_stop(adapter);
+ pci_disable_ptm(pdev);
+ pci_clear_master(pdev);
+
set_bit(__IGC_DOWN, &adapter->state);
del_timer_sync(&adapter->watchdog_timer);
--
2.38.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 3/3] igb: fix nvm.ops.read() error handling
2023-05-09 17:09 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-05-09 (igc, igb) Tony Nguyen
2023-05-09 17:09 ` [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring Tony Nguyen
2023-05-09 17:09 ` [PATCH net 2/3] igc: Fix possible system crash when loading module Tony Nguyen
@ 2023-05-09 17:09 ` Tony Nguyen
2 siblings, 0 replies; 8+ messages in thread
From: Tony Nguyen @ 2023-05-09 17:09 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, netdev
Cc: Aleksandr Loktionov, anthony.l.nguyen, Pucha Himasekhar Reddy
From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Add error handling into igb_set_eeprom() function, in case
nvm.ops.read() fails just quit with error code asap.
Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 7d60da1b7bf4..99b6b21caa02 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -822,6 +822,8 @@ static int igb_set_eeprom(struct net_device *netdev,
*/
ret_val = hw->nvm.ops.read(hw, last_word, 1,
&eeprom_buff[last_word - first_word]);
+ if (ret_val)
+ goto out;
}
/* Device's eeprom is always little-endian, word addressable */
@@ -839,7 +841,7 @@ static int igb_set_eeprom(struct net_device *netdev,
/* Update the checksum if nvm write succeeded */
if (ret_val == 0)
hw->nvm.ops.update(hw);
-
+out:
igb_set_fw_version(adapter);
kfree(eeprom_buff);
return ret_val;
--
2.38.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring
2023-05-09 17:09 ` [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring Tony Nguyen
@ 2023-05-11 2:14 ` Jakub Kicinski
2023-05-12 8:51 ` Zulkifli, Muhammad Husaini
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2023-05-11 2:14 UTC (permalink / raw)
To: Tony Nguyen
Cc: davem, pabeni, edumazet, netdev, Muhammad Husaini Zulkifli,
sasha.neftin, Naama Meir
On Tue, 9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> There could be a race condition during link down where interrupt
> being generated and igc_clean_tx_irq() been called to perform the
> TX completion. Properly clear the TX buffer and TX descriptor ring
> to avoid those case.
> + /* Zero out the buffer ring */
> + memset(tx_ring->tx_buffer_info, 0,
> + sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> +
> + /* Zero out the descriptor ring */
> + memset(tx_ring->desc, 0, tx_ring->size);
Just from the diff and the commit description this does not seem
obviously correct. Race condition means the two functions can run
at the same time, and memset() is not atomic.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring
2023-05-11 2:14 ` Jakub Kicinski
@ 2023-05-12 8:51 ` Zulkifli, Muhammad Husaini
2023-05-12 15:30 ` Maciej Fijalkowski
0 siblings, 1 reply; 8+ messages in thread
From: Zulkifli, Muhammad Husaini @ 2023-05-12 8:51 UTC (permalink / raw)
To: Jakub Kicinski, Nguyen, Anthony L
Cc: davem@davemloft.net, pabeni@redhat.com, edumazet@google.com,
netdev@vger.kernel.org, Neftin, Sasha, Naama Meir
Hi Jakub,
> On Tue, 9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> > There could be a race condition during link down where interrupt being
> > generated and igc_clean_tx_irq() been called to perform the TX
> > completion. Properly clear the TX buffer and TX descriptor ring to
> > avoid those case.
>
> > + /* Zero out the buffer ring */
> > + memset(tx_ring->tx_buffer_info, 0,
> > + sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> > +
> > + /* Zero out the descriptor ring */
> > + memset(tx_ring->desc, 0, tx_ring->size);
>
> Just from the diff and the commit description this does not seem obviously
> correct. Race condition means the two functions can run at the same time,
> and memset() is not atomic.
While a link is going up or down and a lot of packets(UDP) are being sent transmitted,
we are observing some kernel panic issues. On my side, it was easily to reproduce.
It's possible that igc_clean_tx_irq() was called to complete the TX during link up/down
based on how the call trace looks. With this fix, I not observed the issue anymore.
Almost similar issue reported before in here:
https://lore.kernel.org/all/SJ1PR11MB6180CDB866753CFBC2F9AF75B8959@SJ1PR11MB6180.namprd11.prod.outlook.com/
> --
> pw-bot: cr
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring
2023-05-12 8:51 ` Zulkifli, Muhammad Husaini
@ 2023-05-12 15:30 ` Maciej Fijalkowski
2023-05-15 3:30 ` Zulkifli, Muhammad Husaini
0 siblings, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2023-05-12 15:30 UTC (permalink / raw)
To: Zulkifli, Muhammad Husaini
Cc: Jakub Kicinski, Nguyen, Anthony L, davem@davemloft.net,
pabeni@redhat.com, edumazet@google.com, netdev@vger.kernel.org,
Neftin, Sasha, Naama Meir
On Fri, May 12, 2023 at 08:51:23AM +0000, Zulkifli, Muhammad Husaini wrote:
> Hi Jakub,
>
> > On Tue, 9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> > > There could be a race condition during link down where interrupt being
> > > generated and igc_clean_tx_irq() been called to perform the TX
> > > completion. Properly clear the TX buffer and TX descriptor ring to
> > > avoid those case.
> >
> > > + /* Zero out the buffer ring */
> > > + memset(tx_ring->tx_buffer_info, 0,
> > > + sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> > > +
> > > + /* Zero out the descriptor ring */
> > > + memset(tx_ring->desc, 0, tx_ring->size);
> >
> > Just from the diff and the commit description this does not seem obviously
> > correct. Race condition means the two functions can run at the same time,
> > and memset() is not atomic.
>
> While a link is going up or down and a lot of packets(UDP) are being sent transmitted,
> we are observing some kernel panic issues. On my side, it was easily to reproduce.
> It's possible that igc_clean_tx_irq() was called to complete the TX during link up/down
> based on how the call trace looks. With this fix, I not observed the issue anymore.
then include the splat you were getting in the commit msg as well as steps
to repro.
from a brief look it looks like ndo_stop() path does not disable Tx rings
before cleaning them? This is being done when configuring xsk_pool on a
given Tx ring, though.
>
> Almost similar issue reported before in here:
> https://lore.kernel.org/all/SJ1PR11MB6180CDB866753CFBC2F9AF75B8959@SJ1PR11MB6180.namprd11.prod.outlook.com/
>
> > --
> > pw-bot: cr
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring
2023-05-12 15:30 ` Maciej Fijalkowski
@ 2023-05-15 3:30 ` Zulkifli, Muhammad Husaini
0 siblings, 0 replies; 8+ messages in thread
From: Zulkifli, Muhammad Husaini @ 2023-05-15 3:30 UTC (permalink / raw)
To: Fijalkowski, Maciej
Cc: Jakub Kicinski, Nguyen, Anthony L, davem@davemloft.net,
pabeni@redhat.com, edumazet@google.com, netdev@vger.kernel.org,
Neftin, Sasha, Naama Meir
Hi Maciej,
> On Fri, May 12, 2023 at 08:51:23AM +0000, Zulkifli, Muhammad Husaini
> wrote:
> > Hi Jakub,
> >
> > > On Tue, 9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> > > > There could be a race condition during link down where interrupt
> > > > being generated and igc_clean_tx_irq() been called to perform the
> > > > TX completion. Properly clear the TX buffer and TX descriptor ring
> > > > to avoid those case.
> > >
> > > > + /* Zero out the buffer ring */
> > > > + memset(tx_ring->tx_buffer_info, 0,
> > > > + sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> > > > +
> > > > + /* Zero out the descriptor ring */
> > > > + memset(tx_ring->desc, 0, tx_ring->size);
> > >
> > > Just from the diff and the commit description this does not seem
> > > obviously correct. Race condition means the two functions can run at
> > > the same time, and memset() is not atomic.
> >
> > While a link is going up or down and a lot of packets(UDP) are being
> > sent transmitted, we are observing some kernel panic issues. On my side, it
> was easily to reproduce.
> > It's possible that igc_clean_tx_irq() was called to complete the TX
> > during link up/down based on how the call trace looks. With this fix, I not
> observed the issue anymore.
>
> then include the splat you were getting in the commit msg as well as steps to
> repro.
>
> from a brief look it looks like ndo_stop() path does not disable Tx rings before
> cleaning them? This is being done when configuring xsk_pool on a given Tx ring,
> though.
Yes you are right. We shall disable tx queue ring as well.
I will submit the V2 to replace the igc_clean_tx_ring() to igc_disable_tx_ring() in
igc_free_tx_resources() during ndo_stop callback while maintaining the memset
to clear all the ring des/buffer during igc_clean_tx_ring().
I have tested this on my TGL setup with multiple loops iterations and no more
Kernel panic observed. Will update this in commit message as well
Thanks,
Husaini
>
> >
> > Almost similar issue reported before in here:
> >
> https://lore.kernel.org/all/SJ1PR11MB6180CDB866753CFBC2F9AF75B8959@S
> J1
> > PR11MB6180.namprd11.prod.outlook.com/
> >
> > > --
> > > pw-bot: cr
> >
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-05-15 3:32 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-09 17:09 [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2023-05-09 (igc, igb) Tony Nguyen
2023-05-09 17:09 ` [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring Tony Nguyen
2023-05-11 2:14 ` Jakub Kicinski
2023-05-12 8:51 ` Zulkifli, Muhammad Husaini
2023-05-12 15:30 ` Maciej Fijalkowski
2023-05-15 3:30 ` Zulkifli, Muhammad Husaini
2023-05-09 17:09 ` [PATCH net 2/3] igc: Fix possible system crash when loading module Tony Nguyen
2023-05-09 17:09 ` [PATCH net 3/3] igb: fix nvm.ops.read() error handling Tony Nguyen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).