netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure
@ 2016-05-04 23:55 Jon Maxwell
  2016-05-06 19:45 ` David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Jon Maxwell @ 2016-05-04 23:55 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, jmaxwell, ivecera, Jon Maxwell

We recently had a system crash in the cnic module. Vmcore analysis confirmed 
that "ip link up" was executed which failed due to an allocation failure 
because of memory fragmentation. Futher analysis revealed that the cnic irq 
vector was still allocated after the "ip link up" that failed. When 
"ip link down" was executed it called free_msi_irqs() which crashed the system 
because the cnic irq was still inuse.

PANIC: "kernel BUG at drivers/pci/msi.c:411!"

The code execution was:

cnic_netdev_event()
if (event == NETDEV_UP) {
.
.
       ▹       if (!cnic_start_hw(dev))
cnic_start_hw()
calls cnic_cm_open() which failed with -ENOMEM
cnic_start_hw() then took the err1 path:

err1:↩
       cp->free_resc(dev);↩ <---- frees resources but not irq vector
       pci_dev_put(dev->pcidev);↩
       return err;↩
}↩

This returns control back to cnic_netdev_event() but now the cnic irq vector 
is still allocated even although cnic_cm_open() failed. The next 
"ip link down" while trigger the crash. 

The cnic_start_hw() routine is not handling the allocation failure correctly. 
Fix this by checking whether CNIC_DRV_STATE_HANDLES_IRQ flag is set indicating 
that the hardware has been started in cnic_start_hw(). If it has then call 
cp->stop_hw() which frees the cnic irq vector and cnic resources. Otherwise 
just maintain the previous behaviour and free cnic resources. 

I reproduced this by injecting an ENOMEM error into cnic_cm_alloc_mem()s return
code. 

# ip link set dev enpX down
# ip link set dev enpX up <--- hit's allocation failure
# ip link set dev enpX down <--- crashes here

With this patch I confirmed there was no crash in the reproducer.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 drivers/net/ethernet/broadcom/cnic.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index b69dc58..b1d2ac8 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -5350,7 +5350,10 @@ static int cnic_start_hw(struct cnic_dev *dev)
 	return 0;
 
 err1:
-	cp->free_resc(dev);
+	if (ethdev->drv_state & CNIC_DRV_STATE_HANDLES_IRQ)
+		cp->stop_hw(dev);
+	else
+		cp->free_resc(dev);
 	pci_dev_put(dev->pcidev);
 	return err;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net-next] cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure
  2016-05-04 23:55 [PATCH net-next] cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure Jon Maxwell
@ 2016-05-06 19:45 ` David Miller
  0 siblings, 0 replies; 2+ messages in thread
From: David Miller @ 2016-05-06 19:45 UTC (permalink / raw)
  To: jmaxwell37; +Cc: netdev, linux-kernel, jmaxwell, ivecera

From: Jon Maxwell <jmaxwell37@gmail.com>
Date: Thu,  5 May 2016 09:55:51 +1000

> We recently had a system crash in the cnic module. Vmcore analysis confirmed 
> that "ip link up" was executed which failed due to an allocation failure 
> because of memory fragmentation. Futher analysis revealed that the cnic irq 
> vector was still allocated after the "ip link up" that failed. When 
> "ip link down" was executed it called free_msi_irqs() which crashed the system 
> because the cnic irq was still inuse.
 ...
> The cnic_start_hw() routine is not handling the allocation failure correctly. 
> Fix this by checking whether CNIC_DRV_STATE_HANDLES_IRQ flag is set indicating 
> that the hardware has been started in cnic_start_hw(). If it has then call 
> cp->stop_hw() which frees the cnic irq vector and cnic resources. Otherwise 
> just maintain the previous behaviour and free cnic resources. 
> 
> I reproduced this by injecting an ENOMEM error into cnic_cm_alloc_mem()s return
> code. 
> 
> # ip link set dev enpX down
> # ip link set dev enpX up <--- hit's allocation failure
> # ip link set dev enpX down <--- crashes here
> 
> With this patch I confirmed there was no crash in the reproducer.
> 
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>

Applied, thank you.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2016-05-06 19:45 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-04 23:55 [PATCH net-next] cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure Jon Maxwell
2016-05-06 19:45 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).