* mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-17 19:26 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

mlx4 fails to initialize here:

[    9.973940] mlx4_core 0000:04:00.0: irq 93 for MSI/MSI-X
[    9.983108] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    9.988209] ses 0:0:32:0: Attached scsi generic sg0 type 13
[    9.999376] sd 0:2:0:0: Attached scsi generic sg1 type 0
[   10.010024] sr 1:0:0:0: Attached scsi generic sg2 type 5
[   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
[   10.270103] mlx4_core 0000:04:00.0: Failed to initialize event queue table, aborting.
[   10.288768] mlx4_core 0000:04:00.0: PCI INT A disabled
[   10.299057] mlx4_core: probe of 0000:04:00.0 failed with error -5
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-17 22:04 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> mlx4 fails to initialize here:
>
> [   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
> [   10.270103] mlx4_core 0000:04:00.0: Failed to initialize event queue
> table, aborting.
> [   10.288768] mlx4_core 0000:04:00.0: PCI INT A disabled
> [   10.299057] mlx4_core: probe of 0000:04:00.0 failed with error -5

Thanks for the report... could you try loading mlx4_core with
debug_level=1 to see if anything interesting comes out?  The kernel log
here indicates that the device FW is giving us "internal error" when we
try to initialize event queues.

Also, what kernel is this with?  Anything unusual about the system (arch
!= x86, lots of CPUs or RAM, etc)?

One stab in the dark would be to try a423b8a0 ("mlx4_core: Allocate and
map sufficient ICM memory for EQ context") from the for-next branch of
my infiniband.git kernel.org tree.  I would only think that matters if
you have 32 or more CPUs, but maybe you do...

 - R.
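For context on that error code: -5 is -EIO.  The mlx4 command layer
translates the status byte the firmware returns with each command into a
Linux errno, roughly as below -- a sketch paraphrasing
drivers/net/mlx4/cmd.c of this era; the table entries shown are
illustrative, not a verbatim quote:

/* Sketch of mlx4's FW-status-to-errno translation (cmd.c, ~2.6.31).
 * CMD_STAT_INTERNAL_ERR is what SW2HW_EQ reports above as (-5). */
static int mlx4_status_to_errno(u8 status)
{
	static const int trans_table[] = {
		[CMD_STAT_INTERNAL_ERR] = -EIO,    /* FW "internal error" -> -5 */
		[CMD_STAT_BAD_OP]       = -EPERM,
		[CMD_STAT_BAD_PARAM]    = -EINVAL,
		/* ... remaining firmware statuses elided ... */
	};

	if (status >= ARRAY_SIZE(trans_table) || trans_table[status] == 0)
		return -EINVAL;

	return trans_table[status];
}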
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-17 22:17 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Mon, 17 Aug 2009, Roland Dreier wrote:

> Thanks for the report... could you try loading mlx4_core with
> debug_level=1 to see if anything interesting comes out?  The kernel log
> here indicates that the device FW is giving us "internal error" when we
> try to initialize event queues.

Device FW???  The log you wanted follows at the end of this message.

> Also, what kernel is this with?  Anything unusual about the system (arch
> != x86, lots of CPUs or RAM, etc)?

Dell R620, two quad-core Nehalems.  Built with the standard Debian
kernel config.

> One stab in the dark would be to try a423b8a0 ("mlx4_core: Allocate and
> map sufficient ICM memory for EQ context") from the for-next branch of
> my infiniband.git kernel.org tree.  I would only think that matters if
> you have 32 or more CPUs, but maybe you do...

We have 16 processors.

[ 7423.298136] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[ 7423.298137] mlx4_core: Initializing 0000:04:00.0
[ 7423.298147] mlx4_core 0000:04:00.0: PCI INT A -> GSI 38 (level, low) -> IRQ 38
[ 7423.298165] mlx4_core 0000:04:00.0: setting latency timer to 64
[ 7424.298240] mlx4_core 0000:04:00.0: FW version 2.6.000 (cmd intf rev 3), max commands 16
[ 7424.298242] mlx4_core 0000:04:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
[ 7424.298243] mlx4_core 0000:04:00.0: FW size 385 KB
[ 7424.298245] mlx4_core 0000:04:00.0: Clear int @ f0058, BAR 0
[ 7424.299848] mlx4_core 0000:04:00.0: Mapped 26 chunks/6168 KB for FW.
[ 7424.921833] mlx4_core 0000:04:00.0: BlueFlame available (reg size 512, regs/page 256)
[ 7424.921952] mlx4_core 0000:04:00.0: Base MM extensions: flags 00000cc0, rsvd L_Key 00000500
[ 7424.921954] mlx4_core 0000:04:00.0: Max ICM size 4294967296 MB
[ 7424.921955] mlx4_core 0000:04:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[ 7424.921957] mlx4_core 0000:04:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[ 7424.921959] mlx4_core 0000:04:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[ 7424.921960] mlx4_core 0000:04:00.0: Max EQs: 512, reserved EQs: 4, entry size: 128
[ 7424.921961] mlx4_core 0000:04:00.0: reserved MPTs: 16, reserved MTTs: 16
[ 7424.921963] mlx4_core 0000:04:00.0: Max PDs: 8388608, reserved PDs: 4, reserved UARs: 1
[ 7424.921964] mlx4_core 0000:04:00.0: Max QP/MCG: 8388608, reserved MGMs: 0
[ 7424.921966] mlx4_core 0000:04:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[ 7424.921967] mlx4_core 0000:04:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[ 7424.921969] mlx4_core 0000:04:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[ 7424.921970] mlx4_core 0000:04:00.0: Max RQ desc size: 512, max RQ S/G: 32
[ 7424.921971] mlx4_core 0000:04:00.0: Max GSO size: 131072
[ 7424.921972] mlx4_core 0000:04:00.0: DEV_CAP flags:
[ 7424.921974] mlx4_core 0000:04:00.0:     RC transport
[ 7424.921975] mlx4_core 0000:04:00.0:     UC transport
[ 7424.921976] mlx4_core 0000:04:00.0:     UD transport
[ 7424.921977] mlx4_core 0000:04:00.0:     XRC transport
[ 7424.921978] mlx4_core 0000:04:00.0:     FCoIB support
[ 7424.921979] mlx4_core 0000:04:00.0:     SRQ support
[ 7424.921980] mlx4_core 0000:04:00.0:     IPoIB checksum offload
[ 7424.921981] mlx4_core 0000:04:00.0:     P_Key violation counter
[ 7424.921982] mlx4_core 0000:04:00.0:     Q_Key violation counter
[ 7424.921983] mlx4_core 0000:04:00.0:     DPDP
[ 7424.921984] mlx4_core 0000:04:00.0:     APM support
[ 7424.921985] mlx4_core 0000:04:00.0:     Atomic ops support
[ 7424.921986] mlx4_core 0000:04:00.0:     Address vector port checking support
[ 7424.921988] mlx4_core 0000:04:00.0:     UD multicast support
[ 7424.921989] mlx4_core 0000:04:00.0:     Router support
[ 7424.921993] mlx4_core 0000:04:00.0: profile[ 0] (  CMPT): 2^26 entries @ 0x         0, size 0x 100000000
[ 7424.921995] mlx4_core 0000:04:00.0: profile[ 1] (RDMARC): 2^21 entries @ 0x 100000000, size 0x   4000000
[ 7424.921997] mlx4_core 0000:04:00.0: profile[ 2] (   MTT): 2^20 entries @ 0x 104000000, size 0x   4000000
[ 7424.921999] mlx4_core 0000:04:00.0: profile[ 3] (    QP): 2^17 entries @ 0x 108000000, size 0x   2000000
[ 7424.922001] mlx4_core 0000:04:00.0: profile[ 4] (  ALTC): 2^17 entries @ 0x 10a000000, size 0x    800000
[ 7424.922003] mlx4_core 0000:04:00.0: profile[ 5] (   SRQ): 2^16 entries @ 0x 10a800000, size 0x    800000
[ 7424.922005] mlx4_core 0000:04:00.0: profile[ 6] (    CQ): 2^16 entries @ 0x 10b000000, size 0x    800000
[ 7424.922007] mlx4_core 0000:04:00.0: profile[ 7] (  DMPT): 2^17 entries @ 0x 10b800000, size 0x    800000
[ 7424.922009] mlx4_core 0000:04:00.0: profile[ 8] (   MCG): 2^13 entries @ 0x 10c000000, size 0x    200000
[ 7424.922011] mlx4_core 0000:04:00.0: profile[ 9] (  AUXC): 2^17 entries @ 0x 10c200000, size 0x     20000
[ 7424.922013] mlx4_core 0000:04:00.0: profile[10] (    EQ): 2^06 entries @ 0x 10c220000, size 0x      2000
[ 7424.922014] mlx4_core 0000:04:00.0: HCA context memory: reserving 4393096 KB
[ 7424.922034] mlx4_core 0000:04:00.0: 4393096 KB of HCA context requires 8620 KB aux memory.
[ 7424.942888] mlx4_core 0000:04:00.0: Mapped 37 chunks/8620 KB for ICM aux.
[ 7424.943998] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 0 for ICM.
[ 7424.945080] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 40000000 for ICM.
[ 7424.946162] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 80000000 for ICM.
[ 7424.946192] mlx4_core 0000:04:00.0: Mapped 1 chunks/4 KB at c0000000 for ICM.
[ 7424.946221] mlx4_core 0000:04:00.0: Mapped page at 1a79c4000 to 10c220000 for ICM.
[ 7424.947283] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 104000000 for ICM.
[ 7424.948380] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10b800000 for ICM.
[ 7424.949441] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 108000000 for ICM.
[ 7424.949976] mlx4_core 0000:04:00.0: Mapped 1 chunks/128 KB at 10c200000 for ICM.
[ 7424.951037] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10a000000 for ICM.
[ 7424.952098] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 100000000 for ICM.
[ 7424.953159] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10b000000 for ICM.
[ 7424.954219] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10a800000 for ICM.
[ 7424.955279] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c000000 for ICM.
[ 7424.956339] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c040000 for ICM.
[ 7424.957399] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c080000 for ICM.
[ 7424.958458] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c0c0000 for ICM.
[ 7424.959519] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c100000 for ICM.
[ 7424.960581] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c140000 for ICM.
[ 7424.961641] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c180000 for ICM.
[ 7424.962702] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c1c0000 for ICM.
[ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
[ 7425.199432] mlx4_core 0000:04:00.0: irq 71 for MSI/MSI-X
[ 7425.199434] mlx4_core 0000:04:00.0: irq 72 for MSI/MSI-X
[ 7425.199436] mlx4_core 0000:04:00.0: irq 73 for MSI/MSI-X
[ 7425.199437] mlx4_core 0000:04:00.0: irq 74 for MSI/MSI-X
[ 7425.199439] mlx4_core 0000:04:00.0: irq 75 for MSI/MSI-X
[ 7425.199441] mlx4_core 0000:04:00.0: irq 76 for MSI/MSI-X
[ 7425.199443] mlx4_core 0000:04:00.0: irq 77 for MSI/MSI-X
[ 7425.199445] mlx4_core 0000:04:00.0: irq 78 for MSI/MSI-X
[ 7425.199446] mlx4_core 0000:04:00.0: irq 79 for MSI/MSI-X
[ 7425.199448] mlx4_core 0000:04:00.0: irq 80 for MSI/MSI-X
[ 7425.199450] mlx4_core 0000:04:00.0: irq 81 for MSI/MSI-X
[ 7425.199452] mlx4_core 0000:04:00.0: irq 82 for MSI/MSI-X
[ 7425.199454] mlx4_core 0000:04:00.0: irq 83 for MSI/MSI-X
[ 7425.199456] mlx4_core 0000:04:00.0: irq 84 for MSI/MSI-X
[ 7425.199457] mlx4_core 0000:04:00.0: irq 85 for MSI/MSI-X
[ 7425.199459] mlx4_core 0000:04:00.0: irq 86 for MSI/MSI-X
[ 7425.199461] mlx4_core 0000:04:00.0: irq 87 for MSI/MSI-X
[ 7425.199463] mlx4_core 0000:04:00.0: irq 88 for MSI/MSI-X
[ 7425.199464] mlx4_core 0000:04:00.0: irq 89 for MSI/MSI-X
[ 7425.199466] mlx4_core 0000:04:00.0: irq 90 for MSI/MSI-X
[ 7425.199468] mlx4_core 0000:04:00.0: irq 91 for MSI/MSI-X
[ 7425.199470] mlx4_core 0000:04:00.0: irq 92 for MSI/MSI-X
[ 7425.199472] mlx4_core 0000:04:00.0: irq 93 for MSI/MSI-X
[ 7425.199474] mlx4_core 0000:04:00.0: irq 94 for MSI/MSI-X
[ 7425.199475] mlx4_core 0000:04:00.0: irq 95 for MSI/MSI-X
[ 7425.199477] mlx4_core 0000:04:00.0: irq 96 for MSI/MSI-X
[ 7425.199479] mlx4_core 0000:04:00.0: irq 97 for MSI/MSI-X
[ 7425.199481] mlx4_core 0000:04:00.0: irq 98 for MSI/MSI-X
[ 7425.199483] mlx4_core 0000:04:00.0: irq 99 for MSI/MSI-X
[ 7425.199485] mlx4_core 0000:04:00.0: irq 100 for MSI/MSI-X
[ 7425.199487] mlx4_core 0000:04:00.0: irq 101 for MSI/MSI-X
[ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X
[ 7425.472921] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
[ 7425.476030] mlx4_core 0000:04:00.0: Failed to initialize event queue table, aborting.
[ 7425.494648] mlx4_core 0000:04:00.0: PCI INT A disabled
[ 7425.494660] mlx4_core: probe of 0000:04:00.0 failed with error -5
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-18 1:28 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> > [   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)

> Device FW???  The log you wanted follows at the end of this message.

Not sure why there are "???" there... the (-5) error code is an
"internal error" status from the device FW on the event queue
initialization command.  Anyway, I think the log shows that the problem
is exactly the one fixed in the commit I mentioned -- a423b8a0
("mlx4_core: Allocate and map sufficient ICM memory for EQ context")
from my infiniband.git tree should fix this.

The log

> [ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
...
> [ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X

shows 33 event queues being allocated (num_possible_cpus() + 1), and
that will hit the issue fixed in that commit.

Assuming this fixes it for you, I guess I should get this into 2.6.31,
since it obviously is hitting not-particularly-exotic systems in
practice.  I do wonder why num_possible_cpus() is 32 on your box (since
16 threads is really the max with Nehalem EP).

Anyway, here's the patch I mean:

commit a423b8a022d523abe834cefe67bfaf42424150a7
Author: Eli Cohen <eli@mellanox.co.il>
Date:   Fri Aug 7 11:13:13 2009 -0700

    mlx4_core: Allocate and map sufficient ICM memory for EQ context

    The current implementation allocates a single host page for EQ
    context memory, which was OK when we only allocated a few EQs.
    However, since we now allocate an EQ for each CPU core, this patch
    removes the hard-coded limit and makes the allocation depend on EQ
    entry size and the number of required EQs.

    Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index c11a052..dae6387 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -529,29 +529,36 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int ret;
+	int host_pages, icm_pages;
+	int i;
 
-	/*
-	 * We assume that mapping one page is enough for the whole EQ
-	 * context table.  This is fine with all current HCAs, because
-	 * we only use 32 EQs and each EQ uses 64 bytes of context
-	 * memory, or 1 KB total.
-	 */
+	host_pages = ALIGN(min_t(int, dev->caps.num_eqs, num_possible_cpus() + 1) *
+			   dev->caps.eqc_entry_size, PAGE_SIZE) >> PAGE_SHIFT;
+	priv->eq_table.order = order_base_2(host_pages);
 	priv->eq_table.icm_virt = icm_virt;
-	priv->eq_table.icm_page = alloc_page(GFP_HIGHUSER);
+	priv->eq_table.icm_page = alloc_pages(GFP_HIGHUSER, priv->eq_table.order);
 	if (!priv->eq_table.icm_page)
 		return -ENOMEM;
 	priv->eq_table.icm_dma  = pci_map_page(dev->pdev, priv->eq_table.icm_page, 0,
-					       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+					       PAGE_SIZE << priv->eq_table.order,
+					       PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(dev->pdev, priv->eq_table.icm_dma)) {
-		__free_page(priv->eq_table.icm_page);
+		__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 		return -ENOMEM;
 	}
 
-	ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma, icm_virt);
-	if (ret) {
-		pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
-		__free_page(priv->eq_table.icm_page);
+	icm_pages = (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order;
+	for (i = 0; i < icm_pages; ++i) {
+		ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma,
+					icm_virt + i * MLX4_ICM_PAGE_SIZE);
+		if (ret) {
+			if (i)
+				mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, i);
+			pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
+				       PCI_DMA_BIDIRECTIONAL);
+			__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
+			break;
+		}
 	}
 
 	return ret;
@@ -560,11 +567,12 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 void mlx4_unmap_eq_icm(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
+	int icm_pages = (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order;
 
-	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, 1);
-	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-		       PCI_DMA_BIDIRECTIONAL);
-	__free_page(priv->eq_table.icm_page);
+	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, icm_pages);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order, PCI_DMA_BIDIRECTIONAL);
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 }
 
 int mlx4_alloc_eq_table(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 5c1afe0..474d1f3 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -207,6 +207,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_cqes	     = dev_cap->max_cq_sz - 1;
 	dev->caps.reserved_cqs	     = dev_cap->reserved_cqs;
 	dev->caps.reserved_eqs	     = dev_cap->reserved_eqs;
+	dev->caps.eqc_entry_size     = dev_cap->eqc_entry_sz;
 	dev->caps.mtts_per_seg	     = 1 << log_mtts_per_seg;
 	dev->caps.reserved_mtts	     = DIV_ROUND_UP(dev_cap->reserved_mtts,
 						    dev->caps.mtts_per_seg);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5bd79c2..34bcc11 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -210,6 +210,7 @@ struct mlx4_eq_table {
 	dma_addr_t		icm_dma;
 	struct mlx4_icm_table	cmpt_table;
 	int			have_irq;
+	int			order;
 	u8			inta_pin;
 };
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..8923c9b 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -206,6 +206,7 @@ struct mlx4_caps {
 	int			max_cqes;
 	int			reserved_cqs;
 	int			num_eqs;
+	int			eqc_entry_size;
 	int			reserved_eqs;
 	int			num_comp_vectors;
 	int			num_mpts;
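To make the failure arithmetic concrete, here is a back-of-the-envelope
check using only numbers already visible in the thread (the dmesg above
reports an EQ context "entry size: 128", and ConnectX ICM pages are
assumed to be 4 KB):

/*
 * EQs allocated        = num_possible_cpus() + 1 = 32 + 1 = 33
 * EQ context needed    = 33 * 128 bytes          = 4224 bytes
 * contexts per 4K page = 4096 / 128              = 32
 *
 * So with only a single host page allocated and mapped for EQ context,
 * the 33rd EQ's context lands on an unmapped ICM page, and the firmware
 * fails that EQ's SW2HW_EQ command with "internal error" (-EIO, the
 * -5 seen above).
 */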
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-18 15:50 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Mon, 17 Aug 2009, Roland Dreier wrote:

> > Device FW???  The log you wanted follows at the end of this message.
>
> Not sure why there are "???" there... the (-5) error code is an
> "internal error" status from the device FW on the event queue
> initialization command.  Anyway, I think the log shows that the problem
> is exactly the one fixed in the commit I mentioned -- a423b8a0
> ("mlx4_core: Allocate and map sufficient ICM memory for EQ context")
> from my infiniband.git tree should fix this.

Never heard of device FW.

> The log
>
> > [ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
> ...
> > [ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X
>
> shows 33 event queues being allocated (num_possible_cpus() + 1), and
> that will hit the issue fixed in that commit.
>
> Assuming this fixes it for you, I guess I should get this into 2.6.31,
> since it obviously is hitting not-particularly-exotic systems in
> practice.  I do wonder why num_possible_cpus() is 32 on your box (since
> 16 threads is really the max with Nehalem EP).

The Mellanox NIC has two ports.  Could that be it?

> commit a423b8a022d523abe834cefe67bfaf42424150a7

Will get back to you shortly when we have tested this patch.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-18 16:56 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> Never heard of device FW.

I just meant firmware running on the device (in this case the ConnectX
adapter).

> The Mellanox NIC has two ports.  Could that be it?

No, the driver is basically doing

	nreq = num_possible_cpus() + 1;
	pci_enable_msix(dev->pdev, entries, nreq);

(edited a bit, but that's the code).  And nreq is ending up as 33 on
your box.

> Will get back to you shortly when we have tested this patch.

Thanks.
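Spelled out a little more, below is a simplified sketch of
mlx4_enable_msi_x() in drivers/net/mlx4/main.c around 2.6.31; the array
bound and the fallback path are condensed here, so treat the details as
approximate rather than a verbatim quote:

/* Sketch: one completion EQ per possible CPU, plus one async EQ. */
static void mlx4_enable_msi_x(struct mlx4_dev *dev)
{
	struct mlx4_priv *priv = mlx4_priv(dev);
	struct msix_entry entries[64];		/* illustrative bound */
	int nreq = num_possible_cpus() + 1;	/* 33 on the box above */
	int i;

	for (i = 0; i < nreq; ++i)
		entries[i].entry = i;

	/* pci_enable_msix() returns 0 on success; a positive return
	 * means fewer vectors than requested were available. */
	if (pci_enable_msix(dev->pdev, entries, nreq) == 0) {
		for (i = 0; i < nreq; ++i)
			priv->eq_table.eq[i].irq = entries[i].vector;
		dev->flags |= MLX4_FLAG_MSI_X;
		return;
	}

	/* otherwise fall back to a single shared INTx interrupt */
}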
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 7:03 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

By the way, my dual-socket Nehalem EP system says:

    SMP: Allowing 16 CPUs, 0 hotplug CPUs

which I think (haven't checked, but the code sure looks that way) means
that num_possible_cpus() is 16.  This is a Supermicro workstation board;
I forget the exact model.

And the fact that your system is different is not really a bug -- it
just points to slightly incorrect data somewhere, most likely ACPI
tables; having 32 possible CPUs on a system that can only ever really
have 16 CPUs just leads to some overallocations, and tickles the mlx4
bug where it can't handle more than 32 interrupts.

Just FWIW.

 - R.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 11:46 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> By the way, my dual-socket Nehalem EP system says:
>
>     SMP: Allowing 16 CPUs, 0 hotplug CPUs
>
> which I think (haven't checked, but the code sure looks that way) means
> that num_possible_cpus() is 16.  This is a Supermicro workstation board;
> I forget the exact model.

Correct.  My Dell R620 may have some weird ACPI info:

    SMP: Allowing 32 CPUs, 16 hotplug CPUs

although the box does not have any slots for "hotplugging".

> And the fact that your system is different is not really a bug -- it
> just points to slightly incorrect data somewhere, most likely ACPI
> tables; having 32 possible CPUs on a system that can only ever really
> have 16 CPUs just leads to some overallocations, and tickles the mlx4
> bug where it can't handle more than 32 interrupts.

The patch does not fix the issue so far.  We are having various hangs
and are trying to figure out what is going on.

We ran an old 2.6.26 Debian kernel on this and it worked fine.
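Where that boot-time line comes from: a simplified sketch of
prefill_possible_map() in arch/x86/kernel/smpboot.c around 2.6.31
(approximate, from memory).  disabled_cpus counts local APIC entries
that the ACPI MADT marks as disabled, which is presumably where the
extra 16 "hotplug" CPUs on the R620 originate:

static void __init prefill_possible_map(void)
{
	/* disabled_cpus = MADT LAPIC entries flagged as disabled */
	int i, possible = num_processors + disabled_cpus;

	if (possible > NR_CPUS)
		possible = NR_CPUS;

	printk(KERN_INFO "SMP: Allowing %d CPUs, %d hotplug CPUs\n",
	       possible, max_t(int, possible - num_processors, 0));

	for (i = 0; i < possible; i++)
		set_cpu_possible(i, true);
}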
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 15:29 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> Correct.  My Dell R620 may have some weird ACPI info:
>
>     SMP: Allowing 32 CPUs, 16 hotplug CPUs
>
> although the box does not have any slots for "hotplugging".

I guess you could try booting with "possible_cpus=16" and see how that
affects things...

> The patch does not fix the issue so far.  We are having various hangs
> and are trying to figure out what is going on.

Sounds like the patch lets you get farther?  What kind of hangs do you
get?  Still mlx4-related?

> We ran an old 2.6.26 Debian kernel on this and it worked fine.

The change to mlx4 to use multiple completion interrupts went in around
2.6.29, I think.  So that sort of explains why things would work with
2.6.26.

 - R.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 15:47 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> > although the box does not have any slots for "hotplugging".
>
> I guess you could try booting with "possible_cpus=16" and see how that
> affects things...

I configured the kernel with a maximum of 16 CPUs and things are fine.

> Sounds like the patch lets you get farther?  What kind of hangs do you
> get?  Still mlx4-related?

The system hangs when unloading the mlx4 driver, for example.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 19:46 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> The system hangs when unloading the mlx4 driver, for example.

I guess that means things work for a while if you get to the point of
unloading?  Or do you still get an error on load?  Any way to get a
trace or anything?

BTW, are you using mlx4_en or mlx4_ib?

I'll try to reproduce here by booting with "possible_cpus=32" and see
what happens...

 - R.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 19:58 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> I guess that means things work for a while if you get to the point of
> unloading?  Or do you still get an error on load?  Any way to get a
> trace or anything?

It hung on unload.  Playing around with BIOS configs now.  It seems that
C-states were disabled, so we also had some ACPI issues.

> BTW, are you using mlx4_en or mlx4_ib?

mlx4_ib.

> I'll try to reproduce here by booting with "possible_cpus=32" and see
> what happens...

Thanks.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 21:42 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

I took another look at the patch I sent and found a couple of bugs in
it (it seems the original authors didn't really test it on a system
with 32 CPUs).  Anyway, the patch below seems to work on a test system
with 32 possible CPUs (including unloading).  Let me know how it works
for you.

Thanks,
  Roland

commit 75e8522a04e982623d67b959d2e545974f36c323
Author: Eli Cohen <eli@mellanox.co.il>
Date:   Wed Aug 19 14:15:59 2009 -0700

    mlx4_core: Allocate and map sufficient ICM memory for EQ context

    The current implementation allocates a single host page for EQ
    context memory, which was OK when we only allocated a few EQs.
    However, since we now allocate an EQ for each CPU core, this patch
    removes the hard-coded limit and makes the allocation depend on EQ
    entry size and the number of required EQs.

    Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index c11a052..fffe1ea 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -529,31 +529,46 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int ret;
+	int host_pages;
+	unsigned off;
 
-	/*
-	 * We assume that mapping one page is enough for the whole EQ
-	 * context table.  This is fine with all current HCAs, because
-	 * we only use 32 EQs and each EQ uses 64 bytes of context
-	 * memory, or 1 KB total.
-	 */
+	host_pages = PAGE_ALIGN(min_t(int, dev->caps.num_eqs, num_possible_cpus() + 1) *
+				dev->caps.eqc_entry_size) >> PAGE_SHIFT;
+	priv->eq_table.order = order_base_2(host_pages);
 	priv->eq_table.icm_virt = icm_virt;
-	priv->eq_table.icm_page = alloc_page(GFP_HIGHUSER);
-	if (!priv->eq_table.icm_page)
-		return -ENOMEM;
+	priv->eq_table.icm_page = alloc_pages(GFP_HIGHUSER, priv->eq_table.order);
+	if (!priv->eq_table.icm_page) {
+		ret = -ENOMEM;
+		goto err;
+	}
 	priv->eq_table.icm_dma  = pci_map_page(dev->pdev, priv->eq_table.icm_page, 0,
-					       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+					       PAGE_SIZE << priv->eq_table.order,
+					       PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(dev->pdev, priv->eq_table.icm_dma)) {
-		__free_page(priv->eq_table.icm_page);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto err_free;
 	}
 
-	ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma, icm_virt);
-	if (ret) {
-		pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
-		__free_page(priv->eq_table.icm_page);
+	for (off = 0; off < PAGE_SIZE << priv->eq_table.order; off += MLX4_ICM_PAGE_SIZE) {
+		ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma + off,
+					icm_virt + off);
+		if (ret)
+			goto err_unmap;
 	}
 
+	return 0;
+
+err_unmap:
+	if (off)
+		mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, off / MLX4_ICM_PAGE_SIZE);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order,
+		       PCI_DMA_BIDIRECTIONAL);
+
+err_free:
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
+
+err:
 	return ret;
 }
@@ -561,10 +576,11 @@
 void mlx4_unmap_eq_icm(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
-	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, 1);
-	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-		       PCI_DMA_BIDIRECTIONAL);
-	__free_page(priv->eq_table.icm_page);
+	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt,
+		       (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order, PCI_DMA_BIDIRECTIONAL);
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 }
 
 int mlx4_alloc_eq_table(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 5c1afe0..474d1f3 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -207,6 +207,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_cqes	     = dev_cap->max_cq_sz - 1;
 	dev->caps.reserved_cqs	     = dev_cap->reserved_cqs;
 	dev->caps.reserved_eqs	     = dev_cap->reserved_eqs;
+	dev->caps.eqc_entry_size     = dev_cap->eqc_entry_sz;
 	dev->caps.mtts_per_seg	     = 1 << log_mtts_per_seg;
 	dev->caps.reserved_mtts	     = DIV_ROUND_UP(dev_cap->reserved_mtts,
 						    dev->caps.mtts_per_seg);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5bd79c2..34bcc11 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -210,6 +210,7 @@ struct mlx4_eq_table {
 	dma_addr_t		icm_dma;
 	struct mlx4_icm_table	cmpt_table;
 	int			have_irq;
+	int			order;
 	u8			inta_pin;
 };
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..8923c9b 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -206,6 +206,7 @@ struct mlx4_caps {
 	int			max_cqes;
 	int			reserved_cqs;
 	int			num_eqs;
+	int			eqc_entry_size;
 	int			reserved_eqs;
 	int			num_comp_vectors;
 	int			num_mpts;
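For anyone comparing the two versions of the patch posted in this
thread: beyond the restructured goto-based error unwinding, the visible
functional fix appears to be in the mapping loop itself.  A minimal
side-by-side sketch, extracted from the two diffs above:

/* v1: the host DMA address never advances, so every 4 KB ICM page of
 * EQ context is backed by the same first page of the host allocation;
 * only the ICM-side virtual address moves. */
ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma,
			icm_virt + i * MLX4_ICM_PAGE_SIZE);

/* v2: the host-side offset advances in lockstep with the ICM virtual
 * address, and the error path unmaps the full PAGE_SIZE << order
 * length rather than a single PAGE_SIZE. */
ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma + off,
			icm_virt + off);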
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 16:29 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> > We ran an old 2.6.26 Debian kernel on this and it worked fine.
>
> The change to mlx4 to use multiple completion interrupts went in around
> 2.6.29, I think.  So that sort of explains why things would work with
> 2.6.26.

I also see a 2 usec latency increase with 2.6.31 vs. 2.6.30.
Thread overview: 14 messages

2009-08-17 19:26 mlx4 2.6.31-rc5: SW2HW_EQ failed  Christoph Lameter
2009-08-17 22:04 ` Roland Dreier
2009-08-17 22:17   ` Christoph Lameter
2009-08-18  1:28     ` Roland Dreier
2009-08-18 15:50       ` Christoph Lameter
2009-08-18 16:56         ` Roland Dreier
2009-08-19  7:03       ` Roland Dreier
2009-08-19 11:46         ` Christoph Lameter
2009-08-19 15:29           ` Roland Dreier
2009-08-19 15:47             ` Christoph Lameter
2009-08-19 19:46               ` Roland Dreier
2009-08-19 19:58                 ` Christoph Lameter
2009-08-19 21:42               ` Roland Dreier
2009-08-19 16:29             ` Christoph Lameter