* mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-17 19:26 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

mlx4 fails to initialize here:

[    9.973940] mlx4_core 0000:04:00.0: irq 93 for MSI/MSI-X
[    9.983108] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    9.988209] ses 0:0:32:0: Attached scsi generic sg0 type 13
[    9.999376] sd 0:2:0:0: Attached scsi generic sg1 type 0
[   10.010024] sr 1:0:0:0: Attached scsi generic sg2 type 5
[   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
[   10.270103] mlx4_core 0000:04:00.0: Failed to initialize event queue table, aborting.
[   10.288768] mlx4_core 0000:04:00.0: PCI INT A disabled
[   10.299057] mlx4_core: probe of 0000:04:00.0 failed with error -5
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-17 22:04 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> mlx4 fails to initialize here:
>
> [   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
> [   10.270103] mlx4_core 0000:04:00.0: Failed to initialize event queue
> table, aborting.
> [   10.288768] mlx4_core 0000:04:00.0: PCI INT A disabled
> [   10.299057] mlx4_core: probe of 0000:04:00.0 failed with error -5

Thanks for the report... could you try loading mlx4_core with
debug_level=1 to see if anything interesting comes out?  The kernel log
here indicates that the device FW is giving us "internal error" when we
try to initialize event queues.

Also, what kernel is this with?  Anything unusual about the system (arch
!= x86, lots of CPUs or RAM, etc)?

One stab in the dark would be to try a423b8a0 ("mlx4_core: Allocate and
map sufficient ICM memory for EQ context") from the for-next branch of
my infiniband.git kernel.org tree.  I would only think that matters if
you have 32 or more CPUs, but maybe you do...

 - R.
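For context on that error code: -5 is -EIO.  The mlx4 command layer
translates the status byte the firmware returns with each command into a
Linux errno, roughly as below -- a sketch paraphrasing
drivers/net/mlx4/cmd.c of this era; the table entries shown are
illustrative, not a verbatim quote:

/* Sketch of mlx4's FW-status-to-errno translation (cmd.c, ~2.6.31).
 * CMD_STAT_INTERNAL_ERR is what SW2HW_EQ reports above as (-5). */
static int mlx4_status_to_errno(u8 status)
{
	static const int trans_table[] = {
		[CMD_STAT_INTERNAL_ERR] = -EIO,    /* FW "internal error" -> -5 */
		[CMD_STAT_BAD_OP]       = -EPERM,
		[CMD_STAT_BAD_PARAM]    = -EINVAL,
		/* ... remaining firmware statuses elided ... */
	};

	if (status >= ARRAY_SIZE(trans_table) || trans_table[status] == 0)
		return -EINVAL;

	return trans_table[status];
}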
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-17 22:17 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Mon, 17 Aug 2009, Roland Dreier wrote:

> Thanks for the report... could you try loading mlx4_core with
> debug_level=1 to see if anything interesting comes out?  The kernel log
> here indicates that the device FW is giving us "internal error" when we
> try to initialize event queues.

Device FW???  The log you wanted follows at the end of this message.

> Also, what kernel is this with?  Anything unusual about the system (arch
> != x86, lots of CPUs or RAM, etc)?

Dell R620, two quad-core Nehalems.  Built with the standard Debian
kernel config.

> One stab in the dark would be to try a423b8a0 ("mlx4_core: Allocate and
> map sufficient ICM memory for EQ context") from the for-next branch of
> my infiniband.git kernel.org tree.  I would only think that matters if
> you have 32 or more CPUs, but maybe you do...

We have 16 processors.

[ 7423.298136] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[ 7423.298137] mlx4_core: Initializing 0000:04:00.0
[ 7423.298147] mlx4_core 0000:04:00.0: PCI INT A -> GSI 38 (level, low) -> IRQ 38
[ 7423.298165] mlx4_core 0000:04:00.0: setting latency timer to 64
[ 7424.298240] mlx4_core 0000:04:00.0: FW version 2.6.000 (cmd intf rev 3), max commands 16
[ 7424.298242] mlx4_core 0000:04:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
[ 7424.298243] mlx4_core 0000:04:00.0: FW size 385 KB
[ 7424.298245] mlx4_core 0000:04:00.0: Clear int @ f0058, BAR 0
[ 7424.299848] mlx4_core 0000:04:00.0: Mapped 26 chunks/6168 KB for FW.
[ 7424.921833] mlx4_core 0000:04:00.0: BlueFlame available (reg size 512, regs/page 256)
[ 7424.921952] mlx4_core 0000:04:00.0: Base MM extensions: flags 00000cc0, rsvd L_Key 00000500
[ 7424.921954] mlx4_core 0000:04:00.0: Max ICM size 4294967296 MB
[ 7424.921955] mlx4_core 0000:04:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[ 7424.921957] mlx4_core 0000:04:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[ 7424.921959] mlx4_core 0000:04:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[ 7424.921960] mlx4_core 0000:04:00.0: Max EQs: 512, reserved EQs: 4, entry size: 128
[ 7424.921961] mlx4_core 0000:04:00.0: reserved MPTs: 16, reserved MTTs: 16
[ 7424.921963] mlx4_core 0000:04:00.0: Max PDs: 8388608, reserved PDs: 4, reserved UARs: 1
[ 7424.921964] mlx4_core 0000:04:00.0: Max QP/MCG: 8388608, reserved MGMs: 0
[ 7424.921966] mlx4_core 0000:04:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[ 7424.921967] mlx4_core 0000:04:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[ 7424.921969] mlx4_core 0000:04:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[ 7424.921970] mlx4_core 0000:04:00.0: Max RQ desc size: 512, max RQ S/G: 32
[ 7424.921971] mlx4_core 0000:04:00.0: Max GSO size: 131072
[ 7424.921972] mlx4_core 0000:04:00.0: DEV_CAP flags:
[ 7424.921974] mlx4_core 0000:04:00.0:     RC transport
[ 7424.921975] mlx4_core 0000:04:00.0:     UC transport
[ 7424.921976] mlx4_core 0000:04:00.0:     UD transport
[ 7424.921977] mlx4_core 0000:04:00.0:     XRC transport
[ 7424.921978] mlx4_core 0000:04:00.0:     FCoIB support
[ 7424.921979] mlx4_core 0000:04:00.0:     SRQ support
[ 7424.921980] mlx4_core 0000:04:00.0:     IPoIB checksum offload
[ 7424.921981] mlx4_core 0000:04:00.0:     P_Key violation counter
[ 7424.921982] mlx4_core 0000:04:00.0:     Q_Key violation counter
[ 7424.921983] mlx4_core 0000:04:00.0:     DPDP
[ 7424.921984] mlx4_core 0000:04:00.0:     APM support
[ 7424.921985] mlx4_core 0000:04:00.0:     Atomic ops support
[ 7424.921986] mlx4_core 0000:04:00.0:     Address vector port checking support
[ 7424.921988] mlx4_core 0000:04:00.0:     UD multicast support
[ 7424.921989] mlx4_core 0000:04:00.0:     Router support
[ 7424.921993] mlx4_core 0000:04:00.0: profile[ 0] (  CMPT): 2^26 entries @ 0x         0, size 0x 100000000
[ 7424.921995] mlx4_core 0000:04:00.0: profile[ 1] (RDMARC): 2^21 entries @ 0x 100000000, size 0x   4000000
[ 7424.921997] mlx4_core 0000:04:00.0: profile[ 2] (   MTT): 2^20 entries @ 0x 104000000, size 0x   4000000
[ 7424.921999] mlx4_core 0000:04:00.0: profile[ 3] (    QP): 2^17 entries @ 0x 108000000, size 0x   2000000
[ 7424.922001] mlx4_core 0000:04:00.0: profile[ 4] (  ALTC): 2^17 entries @ 0x 10a000000, size 0x    800000
[ 7424.922003] mlx4_core 0000:04:00.0: profile[ 5] (   SRQ): 2^16 entries @ 0x 10a800000, size 0x    800000
[ 7424.922005] mlx4_core 0000:04:00.0: profile[ 6] (    CQ): 2^16 entries @ 0x 10b000000, size 0x    800000
[ 7424.922007] mlx4_core 0000:04:00.0: profile[ 7] (  DMPT): 2^17 entries @ 0x 10b800000, size 0x    800000
[ 7424.922009] mlx4_core 0000:04:00.0: profile[ 8] (   MCG): 2^13 entries @ 0x 10c000000, size 0x    200000
[ 7424.922011] mlx4_core 0000:04:00.0: profile[ 9] (  AUXC): 2^17 entries @ 0x 10c200000, size 0x     20000
[ 7424.922013] mlx4_core 0000:04:00.0: profile[10] (    EQ): 2^06 entries @ 0x 10c220000, size 0x      2000
[ 7424.922014] mlx4_core 0000:04:00.0: HCA context memory: reserving 4393096 KB
[ 7424.922034] mlx4_core 0000:04:00.0: 4393096 KB of HCA context requires 8620 KB aux memory.
[ 7424.942888] mlx4_core 0000:04:00.0: Mapped 37 chunks/8620 KB for ICM aux.
[ 7424.943998] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 0 for ICM.
[ 7424.945080] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 40000000 for ICM.
[ 7424.946162] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 80000000 for ICM.
[ 7424.946192] mlx4_core 0000:04:00.0: Mapped 1 chunks/4 KB at c0000000 for ICM.
[ 7424.946221] mlx4_core 0000:04:00.0: Mapped page at 1a79c4000 to 10c220000 for ICM.
[ 7424.947283] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 104000000 for ICM.
[ 7424.948380] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10b800000 for ICM.
[ 7424.949441] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 108000000 for ICM.
[ 7424.949976] mlx4_core 0000:04:00.0: Mapped 1 chunks/128 KB at 10c200000 for ICM.
[ 7424.951037] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10a000000 for ICM.
[ 7424.952098] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 100000000 for ICM.
[ 7424.953159] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10b000000 for ICM.
[ 7424.954219] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10a800000 for ICM.
[ 7424.955279] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c000000 for ICM.
[ 7424.956339] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c040000 for ICM.
[ 7424.957399] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c080000 for ICM.
[ 7424.958458] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c0c0000 for ICM.
[ 7424.959519] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c100000 for ICM.
[ 7424.960581] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c140000 for ICM.
[ 7424.961641] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c180000 for ICM.
[ 7424.962702] mlx4_core 0000:04:00.0: Mapped 1 chunks/256 KB at 10c1c0000 for ICM.
[ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
[ 7425.199432] mlx4_core 0000:04:00.0: irq 71 for MSI/MSI-X
[ 7425.199434] mlx4_core 0000:04:00.0: irq 72 for MSI/MSI-X
[ 7425.199436] mlx4_core 0000:04:00.0: irq 73 for MSI/MSI-X
[ 7425.199437] mlx4_core 0000:04:00.0: irq 74 for MSI/MSI-X
[ 7425.199439] mlx4_core 0000:04:00.0: irq 75 for MSI/MSI-X
[ 7425.199441] mlx4_core 0000:04:00.0: irq 76 for MSI/MSI-X
[ 7425.199443] mlx4_core 0000:04:00.0: irq 77 for MSI/MSI-X
[ 7425.199445] mlx4_core 0000:04:00.0: irq 78 for MSI/MSI-X
[ 7425.199446] mlx4_core 0000:04:00.0: irq 79 for MSI/MSI-X
[ 7425.199448] mlx4_core 0000:04:00.0: irq 80 for MSI/MSI-X
[ 7425.199450] mlx4_core 0000:04:00.0: irq 81 for MSI/MSI-X
[ 7425.199452] mlx4_core 0000:04:00.0: irq 82 for MSI/MSI-X
[ 7425.199454] mlx4_core 0000:04:00.0: irq 83 for MSI/MSI-X
[ 7425.199456] mlx4_core 0000:04:00.0: irq 84 for MSI/MSI-X
[ 7425.199457] mlx4_core 0000:04:00.0: irq 85 for MSI/MSI-X
[ 7425.199459] mlx4_core 0000:04:00.0: irq 86 for MSI/MSI-X
[ 7425.199461] mlx4_core 0000:04:00.0: irq 87 for MSI/MSI-X
[ 7425.199463] mlx4_core 0000:04:00.0: irq 88 for MSI/MSI-X
[ 7425.199464] mlx4_core 0000:04:00.0: irq 89 for MSI/MSI-X
[ 7425.199466] mlx4_core 0000:04:00.0: irq 90 for MSI/MSI-X
[ 7425.199468] mlx4_core 0000:04:00.0: irq 91 for MSI/MSI-X
[ 7425.199470] mlx4_core 0000:04:00.0: irq 92 for MSI/MSI-X
[ 7425.199472] mlx4_core 0000:04:00.0: irq 93 for MSI/MSI-X
[ 7425.199474] mlx4_core 0000:04:00.0: irq 94 for MSI/MSI-X
[ 7425.199475] mlx4_core 0000:04:00.0: irq 95 for MSI/MSI-X
[ 7425.199477] mlx4_core 0000:04:00.0: irq 96 for MSI/MSI-X
[ 7425.199479] mlx4_core 0000:04:00.0: irq 97 for MSI/MSI-X
[ 7425.199481] mlx4_core 0000:04:00.0: irq 98 for MSI/MSI-X
[ 7425.199483] mlx4_core 0000:04:00.0: irq 99 for MSI/MSI-X
[ 7425.199485] mlx4_core 0000:04:00.0: irq 100 for MSI/MSI-X
[ 7425.199487] mlx4_core 0000:04:00.0: irq 101 for MSI/MSI-X
[ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X
[ 7425.472921] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
[ 7425.476030] mlx4_core 0000:04:00.0: Failed to initialize event queue table, aborting.
[ 7425.494648] mlx4_core 0000:04:00.0: PCI INT A disabled
[ 7425.494660] mlx4_core: probe of 0000:04:00.0 failed with error -5
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-18 1:28 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> > [   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)

> Device FW???  The log you wanted follows at the end of this message.

Not sure why there are "???" there... the (-5) error code is an
"internal error" status from the device FW on the event queue
initialization command.  Anyway, I think the log shows that the problem
is exactly the one fixed in the commit I mentioned -- a423b8a0
("mlx4_core: Allocate and map sufficient ICM memory for EQ context")
from my infiniband.git tree should fix this.

The log

> [ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
...
> [ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X

shows 33 event queues being allocated (num_possible_cpus() + 1), and
that will hit the issue fixed in that commit.

Assuming this fixes it for you, I guess I should get this into 2.6.31,
since it obviously is hitting not-particularly-exotic systems in
practice.  I do wonder why num_possible_cpus() is 32 on your box (since
16 threads is really the max with Nehalem EP).

Anyway, here's the patch I mean:

commit a423b8a022d523abe834cefe67bfaf42424150a7
Author: Eli Cohen <eli@mellanox.co.il>
Date:   Fri Aug 7 11:13:13 2009 -0700

    mlx4_core: Allocate and map sufficient ICM memory for EQ context

    The current implementation allocates a single host page for EQ
    context memory, which was OK when we only allocated a few EQs.
    However, since we now allocate an EQ for each CPU core, this patch
    removes the hard-coded limit and makes the allocation depend on EQ
    entry size and the number of required EQs.

    Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index c11a052..dae6387 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -529,29 +529,36 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int ret;
+	int host_pages, icm_pages;
+	int i;
 
-	/*
-	 * We assume that mapping one page is enough for the whole EQ
-	 * context table.  This is fine with all current HCAs, because
-	 * we only use 32 EQs and each EQ uses 64 bytes of context
-	 * memory, or 1 KB total.
-	 */
+	host_pages = ALIGN(min_t(int, dev->caps.num_eqs, num_possible_cpus() + 1) *
+			   dev->caps.eqc_entry_size, PAGE_SIZE) >> PAGE_SHIFT;
+	priv->eq_table.order = order_base_2(host_pages);
 	priv->eq_table.icm_virt = icm_virt;
-	priv->eq_table.icm_page = alloc_page(GFP_HIGHUSER);
+	priv->eq_table.icm_page = alloc_pages(GFP_HIGHUSER, priv->eq_table.order);
 	if (!priv->eq_table.icm_page)
 		return -ENOMEM;
 	priv->eq_table.icm_dma  = pci_map_page(dev->pdev, priv->eq_table.icm_page, 0,
-					       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+					       PAGE_SIZE << priv->eq_table.order,
+					       PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(dev->pdev, priv->eq_table.icm_dma)) {
-		__free_page(priv->eq_table.icm_page);
+		__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 		return -ENOMEM;
 	}
 
-	ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma, icm_virt);
-	if (ret) {
-		pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
-		__free_page(priv->eq_table.icm_page);
+	icm_pages = (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order;
+	for (i = 0; i < icm_pages; ++i) {
+		ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma,
+					icm_virt + i * MLX4_ICM_PAGE_SIZE);
+		if (ret) {
+			if (i)
+				mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, i);
+			pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
+				       PCI_DMA_BIDIRECTIONAL);
+			__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
+			break;
+		}
 	}
 
 	return ret;
@@ -560,11 +567,12 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 void mlx4_unmap_eq_icm(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
+	int icm_pages = (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order;
 
-	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, 1);
-	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-		       PCI_DMA_BIDIRECTIONAL);
-	__free_page(priv->eq_table.icm_page);
+	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, icm_pages);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order, PCI_DMA_BIDIRECTIONAL);
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 }
 
 int mlx4_alloc_eq_table(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 5c1afe0..474d1f3 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -207,6 +207,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_cqes	     = dev_cap->max_cq_sz - 1;
 	dev->caps.reserved_cqs	     = dev_cap->reserved_cqs;
 	dev->caps.reserved_eqs	     = dev_cap->reserved_eqs;
+	dev->caps.eqc_entry_size     = dev_cap->eqc_entry_sz;
 	dev->caps.mtts_per_seg	     = 1 << log_mtts_per_seg;
 	dev->caps.reserved_mtts	     = DIV_ROUND_UP(dev_cap->reserved_mtts,
 						    dev->caps.mtts_per_seg);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5bd79c2..34bcc11 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -210,6 +210,7 @@ struct mlx4_eq_table {
 	dma_addr_t		icm_dma;
 	struct mlx4_icm_table	cmpt_table;
 	int			have_irq;
+	int			order;
 	u8			inta_pin;
 };
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..8923c9b 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -206,6 +206,7 @@ struct mlx4_caps {
 	int			max_cqes;
 	int			reserved_cqs;
 	int			num_eqs;
+	int			eqc_entry_size;
 	int			reserved_eqs;
 	int			num_comp_vectors;
 	int			num_mpts;
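To make the failure arithmetic concrete, here is a back-of-the-envelope
check using only numbers already visible in the thread (the dmesg above
reports an EQ context "entry size: 128", and ConnectX ICM pages are
assumed to be 4 KB):

/*
 * EQs allocated        = num_possible_cpus() + 1 = 32 + 1 = 33
 * EQ context needed    = 33 * 128 bytes          = 4224 bytes
 * contexts per 4K page = 4096 / 128              = 32
 *
 * So with only a single host page allocated and mapped for EQ context,
 * the 33rd EQ's context lands on an unmapped ICM page, and the firmware
 * fails that EQ's SW2HW_EQ command with "internal error" (-EIO, the
 * -5 seen above).
 */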
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-18 15:50 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Mon, 17 Aug 2009, Roland Dreier wrote:

> > Device FW???  The log you wanted follows at the end of this message.
>
> Not sure why there are "???" there... the (-5) error code is an
> "internal error" status from the device FW on the event queue
> initialization command.  Anyway, I think the log shows that the problem
> is exactly the one fixed in the commit I mentioned -- a423b8a0
> ("mlx4_core: Allocate and map sufficient ICM memory for EQ context")
> from my infiniband.git tree should fix this.

Never heard of device FW.

> The log
>
> > [ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
> ...
> > [ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X
>
> shows 33 event queues being allocated (num_possible_cpus() + 1), and
> that will hit the issue fixed in that commit.
>
> Assuming this fixes it for you, I guess I should get this into 2.6.31,
> since it obviously is hitting not-particularly-exotic systems in
> practice.  I do wonder why num_possible_cpus() is 32 on your box (since
> 16 threads is really the max with Nehalem EP).

The Mellanox NIC has two ports.  Could that be it?

> commit a423b8a022d523abe834cefe67bfaf42424150a7

Will get back to you shortly when we have tested this patch.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-18 16:56 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> Never heard of device FW.

I just meant firmware running on the device (in this case the ConnectX
adapter).

> The Mellanox NIC has two ports.  Could that be it?

No, the driver is basically doing

	nreq = num_possible_cpus() + 1;
	pci_enable_msix(dev->pdev, entries, nreq);

(edited a bit, but that's the code).  And nreq is ending up as 33 on
your box.

> Will get back to you shortly when we have tested this patch.

Thanks.
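Spelled out a little more, below is a simplified sketch of
mlx4_enable_msi_x() in drivers/net/mlx4/main.c around 2.6.31; the array
bound and the fallback path are condensed here, so treat the details as
approximate rather than a verbatim quote:

/* Sketch: one completion EQ per possible CPU, plus one async EQ. */
static void mlx4_enable_msi_x(struct mlx4_dev *dev)
{
	struct mlx4_priv *priv = mlx4_priv(dev);
	struct msix_entry entries[64];		/* illustrative bound */
	int nreq = num_possible_cpus() + 1;	/* 33 on the box above */
	int i;

	for (i = 0; i < nreq; ++i)
		entries[i].entry = i;

	/* pci_enable_msix() returns 0 on success; a positive return
	 * means fewer vectors than requested were available. */
	if (pci_enable_msix(dev->pdev, entries, nreq) == 0) {
		for (i = 0; i < nreq; ++i)
			priv->eq_table.eq[i].irq = entries[i].vector;
		dev->flags |= MLX4_FLAG_MSI_X;
		return;
	}

	/* otherwise fall back to a single shared INTx interrupt */
}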
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 7:03 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

By the way, my dual-socket Nehalem EP system says:

    SMP: Allowing 16 CPUs, 0 hotplug CPUs

which I think (haven't checked, but the code sure looks that way) means
that num_possible_cpus() is 16.  This is a Supermicro workstation board;
I forget the exact model.

And the fact that your system is different is not really a bug -- it
just points to slightly incorrect data somewhere, most likely ACPI
tables; having 32 possible CPUs on a system that can only ever really
have 16 CPUs just leads to some overallocations, and tickles the mlx4
bug where it can't handle more than 32 interrupts.

Just FWIW.

 - R.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 11:46 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> By the way, my dual-socket Nehalem EP system says:
>
>     SMP: Allowing 16 CPUs, 0 hotplug CPUs
>
> which I think (haven't checked, but the code sure looks that way) means
> that num_possible_cpus() is 16.  This is a Supermicro workstation board;
> I forget the exact model.

Correct.  My Dell R620 may have some weird ACPI info:

    SMP: Allowing 32 CPUs, 16 hotplug CPUs

although the box does not have any slots for "hotplugging".

> And the fact that your system is different is not really a bug -- it
> just points to slightly incorrect data somewhere, most likely ACPI
> tables; having 32 possible CPUs on a system that can only ever really
> have 16 CPUs just leads to some overallocations, and tickles the mlx4
> bug where it can't handle more than 32 interrupts.

The patch does not fix the issue so far.  We are having various hangs
and are trying to figure out what is going on.

We ran an old 2.6.26 Debian kernel on this and it worked fine.
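Where that boot-time line comes from: a simplified sketch of
prefill_possible_map() in arch/x86/kernel/smpboot.c around 2.6.31
(approximate, from memory).  disabled_cpus counts local APIC entries
that the ACPI MADT marks as disabled, which is presumably where the
extra 16 "hotplug" CPUs on the R620 originate:

static void __init prefill_possible_map(void)
{
	/* disabled_cpus = MADT LAPIC entries flagged as disabled */
	int i, possible = num_processors + disabled_cpus;

	if (possible > NR_CPUS)
		possible = NR_CPUS;

	printk(KERN_INFO "SMP: Allowing %d CPUs, %d hotplug CPUs\n",
	       possible, max_t(int, possible - num_processors, 0));

	for (i = 0; i < possible; i++)
		set_cpu_possible(i, true);
}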
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 15:29 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> Correct.  My Dell R620 may have some weird ACPI info:
>
>     SMP: Allowing 32 CPUs, 16 hotplug CPUs
>
> although the box does not have any slots for "hotplugging".

I guess you could try booting with "possible_cpus=16" and see how that
affects things...

> The patch does not fix the issue so far.  We are having various hangs
> and are trying to figure out what is going on.

Sounds like the patch lets you get farther?  What kind of hangs do you
get?  Still mlx4-related?

> We ran an old 2.6.26 Debian kernel on this and it worked fine.

The change to mlx4 to use multiple completion interrupts went in around
2.6.29, I think.  So that sort of explains why things would work with
2.6.26.

 - R.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 15:47 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> > although the box does not have any slots for "hotplugging".
>
> I guess you could try booting with "possible_cpus=16" and see how that
> affects things...

I configured the kernel with a maximum of 16 CPUs and things are fine.

> Sounds like the patch lets you get farther?  What kind of hangs do you
> get?  Still mlx4-related?

The system hangs when unloading the mlx4 driver, for example.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 19:46 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

> The system hangs when unloading the mlx4 driver, for example.

I guess that means things work for a while if you get to the point of
unloading?  Or do you still get an error on load?  Any way to get a
trace or anything?

BTW, are you using mlx4_en or mlx4_ib?

I'll try to reproduce here by booting with "possible_cpus=32" and see
what happens...

 - R.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 19:58 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> I guess that means things work for a while if you get to the point of
> unloading?  Or do you still get an error on load?  Any way to get a
> trace or anything?

It hung on unload.  Playing around with BIOS configs now.  It seems that
C-states were disabled, so we also had some ACPI issues.

> BTW, are you using mlx4_en or mlx4_ib?

mlx4_ib.

> I'll try to reproduce here by booting with "possible_cpus=32" and see
> what happens...

Thanks.
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Roland Dreier @ 2009-08-19 21:42 UTC
To: Christoph Lameter
Cc: netdev, Yevgeny Petrilin

I took another look at the patch I sent and found a couple of bugs in
it (it seems the original authors didn't really test it on a system
with 32 CPUs).  Anyway, the patch below seems to work on a test system
with 32 possible CPUs (including unloading).  Let me know how it works
for you.

Thanks,
  Roland

commit 75e8522a04e982623d67b959d2e545974f36c323
Author: Eli Cohen <eli@mellanox.co.il>
Date:   Wed Aug 19 14:15:59 2009 -0700

    mlx4_core: Allocate and map sufficient ICM memory for EQ context

    The current implementation allocates a single host page for EQ
    context memory, which was OK when we only allocated a few EQs.
    However, since we now allocate an EQ for each CPU core, this patch
    removes the hard-coded limit and makes the allocation depend on EQ
    entry size and the number of required EQs.

    Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index c11a052..fffe1ea 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -529,31 +529,46 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int ret;
+	int host_pages;
+	unsigned off;
 
-	/*
-	 * We assume that mapping one page is enough for the whole EQ
-	 * context table.  This is fine with all current HCAs, because
-	 * we only use 32 EQs and each EQ uses 64 bytes of context
-	 * memory, or 1 KB total.
-	 */
+	host_pages = PAGE_ALIGN(min_t(int, dev->caps.num_eqs, num_possible_cpus() + 1) *
+				dev->caps.eqc_entry_size) >> PAGE_SHIFT;
+	priv->eq_table.order = order_base_2(host_pages);
 	priv->eq_table.icm_virt = icm_virt;
-	priv->eq_table.icm_page = alloc_page(GFP_HIGHUSER);
-	if (!priv->eq_table.icm_page)
-		return -ENOMEM;
+	priv->eq_table.icm_page = alloc_pages(GFP_HIGHUSER, priv->eq_table.order);
+	if (!priv->eq_table.icm_page) {
+		ret = -ENOMEM;
+		goto err;
+	}
 	priv->eq_table.icm_dma  = pci_map_page(dev->pdev, priv->eq_table.icm_page, 0,
-					       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+					       PAGE_SIZE << priv->eq_table.order,
+					       PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(dev->pdev, priv->eq_table.icm_dma)) {
-		__free_page(priv->eq_table.icm_page);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto err_free;
 	}
 
-	ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma, icm_virt);
-	if (ret) {
-		pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
-		__free_page(priv->eq_table.icm_page);
+	for (off = 0; off < PAGE_SIZE << priv->eq_table.order; off += MLX4_ICM_PAGE_SIZE) {
+		ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma + off,
+					icm_virt + off);
+		if (ret)
+			goto err_unmap;
 	}
 
+	return 0;
+
+err_unmap:
+	if (off)
+		mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, off / MLX4_ICM_PAGE_SIZE);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order,
+		       PCI_DMA_BIDIRECTIONAL);
+
+err_free:
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
+
+err:
 	return ret;
 }
@@ -561,10 +576,11 @@
 void mlx4_unmap_eq_icm(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
-	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, 1);
-	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-		       PCI_DMA_BIDIRECTIONAL);
-	__free_page(priv->eq_table.icm_page);
+	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt,
+		       (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order, PCI_DMA_BIDIRECTIONAL);
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 }
 
 int mlx4_alloc_eq_table(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 5c1afe0..474d1f3 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -207,6 +207,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_cqes	     = dev_cap->max_cq_sz - 1;
 	dev->caps.reserved_cqs	     = dev_cap->reserved_cqs;
 	dev->caps.reserved_eqs	     = dev_cap->reserved_eqs;
+	dev->caps.eqc_entry_size     = dev_cap->eqc_entry_sz;
 	dev->caps.mtts_per_seg	     = 1 << log_mtts_per_seg;
 	dev->caps.reserved_mtts	     = DIV_ROUND_UP(dev_cap->reserved_mtts,
 						    dev->caps.mtts_per_seg);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5bd79c2..34bcc11 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -210,6 +210,7 @@ struct mlx4_eq_table {
 	dma_addr_t		icm_dma;
 	struct mlx4_icm_table	cmpt_table;
 	int			have_irq;
+	int			order;
 	u8			inta_pin;
 };
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..8923c9b 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -206,6 +206,7 @@ struct mlx4_caps {
 	int			max_cqes;
 	int			reserved_cqs;
 	int			num_eqs;
+	int			eqc_entry_size;
 	int			reserved_eqs;
 	int			num_comp_vectors;
 	int			num_mpts;
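For anyone comparing the two versions of the patch posted in this
thread: beyond the restructured goto-based error unwinding, the visible
functional fix appears to be in the mapping loop itself.  A minimal
side-by-side sketch, extracted from the two diffs above:

/* v1: the host DMA address never advances, so every 4 KB ICM page of
 * EQ context is backed by the same first page of the host allocation;
 * only the ICM-side virtual address moves. */
ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma,
			icm_virt + i * MLX4_ICM_PAGE_SIZE);

/* v2: the host-side offset advances in lockstep with the ICM virtual
 * address, and the error path unmaps the full PAGE_SIZE << order
 * length rather than a single PAGE_SIZE. */
ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma + off,
			icm_virt + off);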
* Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
From: Christoph Lameter @ 2009-08-19 16:29 UTC
To: Roland Dreier
Cc: netdev, Yevgeny Petrilin

On Wed, 19 Aug 2009, Roland Dreier wrote:

> > We ran an old 2.6.26 Debian kernel on this and it worked fine.
>
> The change to mlx4 to use multiple completion interrupts went in around
> 2.6.29, I think.  So that sort of explains why things would work with
> 2.6.26.

I also see a 2 usec latency increase with 2.6.31 vs. 2.6.30.
Thread overview: 14 messages

2009-08-17 19:26 mlx4 2.6.31-rc5: SW2HW_EQ failed  Christoph Lameter
2009-08-17 22:04 ` Roland Dreier
2009-08-17 22:17   ` Christoph Lameter
2009-08-18  1:28     ` Roland Dreier
2009-08-18 15:50       ` Christoph Lameter
2009-08-18 16:56         ` Roland Dreier
2009-08-19  7:03       ` Roland Dreier
2009-08-19 11:46         ` Christoph Lameter
2009-08-19 15:29           ` Roland Dreier
2009-08-19 15:47             ` Christoph Lameter
2009-08-19 19:46               ` Roland Dreier
2009-08-19 19:58                 ` Christoph Lameter
2009-08-19 21:42               ` Roland Dreier
2009-08-19 16:29             ` Christoph Lameter