From: Roland Dreier <rdreier@cisco.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: netdev@vger.kernel.org, Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Subject: Re: mlx4 2.6.31-rc5: SW2HW_EQ failed.
Date: Mon, 17 Aug 2009 18:28:56 -0700
Message-ID: <ada3a7p3o6f.fsf@cisco.com>
In-Reply-To: <alpine.DEB.1.10.0908171814210.15956@gentwo.org> (Christoph Lameter's message of "Mon, 17 Aug 2009 18:17:57 -0400 (EDT)")


 > > [   10.256371] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)

 > Device FW??? The log you wanted follows at the end of this message.

Not sure why there are "???" there... the (-5) error code (-EIO) is an
"internal error" status from the device FW on the event queue
initialization command.  Anyway, I think the log shows that the problem
is exactly the one fixed in the commit I mentioned: a423b8a0
("mlx4_core: Allocate and map sufficient ICM memory for EQ context")
from my infiniband.git tree should fix this.

The log

 > [ 7425.199430] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
...
 > [ 7425.199488] mlx4_core 0000:04:00.0: irq 102 for MSI/MSI-X

shows 33 event queues being allocated (irqs 70 through 102, i.e.
num_possible_cpus() + 1), which will hit the issue fixed in that commit.

Assuming this fixes it for you, I guess I should get this into 2.6.31,
since it is obviously hitting not particularly exotic systems in
practice.  I do wonder why num_possible_cpus() is 32 on your box (since
16 threads is really the max with Nehalem EP).

Anyway, here's the patch I mean:

commit a423b8a022d523abe834cefe67bfaf42424150a7
Author: Eli Cohen <eli@mellanox.co.il>
Date:   Fri Aug 7 11:13:13 2009 -0700

    mlx4_core: Allocate and map sufficient ICM memory for EQ context
    
    The current implementation allocates a single host page for EQ context
    memory, which was OK when we only allocated a few EQs.  However, since
    we now allocate an EQ for each CPU core, this patch removes the
    hard-coded limit and makes the allocation depend on EQ entry size and
    the number of required EQs.
    
    Signed-off-by: Eli Cohen <eli@mellanox.co.il>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index c11a052..dae6387 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -529,29 +529,36 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int ret;
+	int host_pages, icm_pages;
+	int i;
 
-	/*
-	 * We assume that mapping one page is enough for the whole EQ
-	 * context table.  This is fine with all current HCAs, because
-	 * we only use 32 EQs and each EQ uses 64 bytes of context
-	 * memory, or 1 KB total.
-	 */
+	host_pages = ALIGN(min_t(int, dev->caps.num_eqs, num_possible_cpus() + 1) *
+			   dev->caps.eqc_entry_size, PAGE_SIZE) >> PAGE_SHIFT;
+	priv->eq_table.order = order_base_2(host_pages);
 	priv->eq_table.icm_virt = icm_virt;
-	priv->eq_table.icm_page = alloc_page(GFP_HIGHUSER);
+	priv->eq_table.icm_page = alloc_pages(GFP_HIGHUSER, priv->eq_table.order);
 	if (!priv->eq_table.icm_page)
 		return -ENOMEM;
 	priv->eq_table.icm_dma  = pci_map_page(dev->pdev, priv->eq_table.icm_page, 0,
-					       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+					       PAGE_SIZE << priv->eq_table.order,
+					       PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(dev->pdev, priv->eq_table.icm_dma)) {
-		__free_page(priv->eq_table.icm_page);
+		__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 		return -ENOMEM;
 	}
 
-	ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma, icm_virt);
-	if (ret) {
-		pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
-		__free_page(priv->eq_table.icm_page);
+	icm_pages = (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order;
+	for (i = 0; i < icm_pages; ++i) {
+		ret = mlx4_MAP_ICM_page(dev, priv->eq_table.icm_dma,
+					icm_virt + i * MLX4_ICM_PAGE_SIZE);
+		if (ret) {
+			if (i)
+				mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, i);
+			pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
+				       PCI_DMA_BIDIRECTIONAL);
+			__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
+			break;
+		}
 	}
 
 	return ret;
@@ -560,11 +567,12 @@ int mlx4_map_eq_icm(struct mlx4_dev *dev, u64 icm_virt)
 void mlx4_unmap_eq_icm(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
+	int icm_pages = (PAGE_SIZE / MLX4_ICM_PAGE_SIZE) << priv->eq_table.order;
 
-	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, 1);
-	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma, PAGE_SIZE,
-		       PCI_DMA_BIDIRECTIONAL);
-	__free_page(priv->eq_table.icm_page);
+	mlx4_UNMAP_ICM(dev, priv->eq_table.icm_virt, icm_pages);
+	pci_unmap_page(dev->pdev, priv->eq_table.icm_dma,
+		       PAGE_SIZE << priv->eq_table.order, PCI_DMA_BIDIRECTIONAL);
+	__free_pages(priv->eq_table.icm_page, priv->eq_table.order);
 }
 
 int mlx4_alloc_eq_table(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 5c1afe0..474d1f3 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -207,6 +207,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_cqes	     = dev_cap->max_cq_sz - 1;
 	dev->caps.reserved_cqs	     = dev_cap->reserved_cqs;
 	dev->caps.reserved_eqs	     = dev_cap->reserved_eqs;
+	dev->caps.eqc_entry_size     = dev_cap->eqc_entry_sz;
 	dev->caps.mtts_per_seg	     = 1 << log_mtts_per_seg;
 	dev->caps.reserved_mtts	     = DIV_ROUND_UP(dev_cap->reserved_mtts,
 						    dev->caps.mtts_per_seg);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5bd79c2..34bcc11 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -210,6 +210,7 @@ struct mlx4_eq_table {
 	dma_addr_t		icm_dma;
 	struct mlx4_icm_table	cmpt_table;
 	int			have_irq;
+	int			order;
 	u8			inta_pin;
 };
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..8923c9b 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -206,6 +206,7 @@ struct mlx4_caps {
 	int			max_cqes;
 	int			reserved_cqs;
 	int			num_eqs;
+	int			eqc_entry_size;
 	int			reserved_eqs;
 	int			num_comp_vectors;
 	int			num_mpts;
