From: Thomas Bogendoerfer <tbogendoerfer@suse.de>
To: tpearson@raptorengineering.com
Cc: rug@usm.lmu.de, linux-rdma@vger.kernel.org
Subject: Re: Infiniband crash
Date: Tue, 17 Dec 2024 09:10:42 +0100
Message-ID: <20241217091042.6a4f1759@samweis>
In-Reply-To: <fba3779b3e870b9f26bb97a9e5c5b0e4.tpearson@raptorengineering.com>

On Mon, 16 Dec 2024 12:05:39 -0600
tpearson@raptorengineering.com wrote:

> Did you ever find a solution for this?  We're running into the same problem on a highly customized aarch64 system (NXP QorIQ platform), with the same InfiniBand adapter and a very similar crash:
> 
> [    4.544159] OF: /soc/pcie@3600000: no iommu-map translation for id 0x100 on (null)
> [    4.551873] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
> [    4.558690] ib_mthca: Initializing 0000:01:00.0
> [    6.258309] ib_mthca 0000:01:00.0: HCA FW version 5.1.000 is old (5.3.000 is current).
> [    6.266272] ib_mthca 0000:01:00.0: If you have problems, try updating your HCA FW.
> [    6.393143] ib_mthca 0000:01:00.0 ibp1s0: renamed from ib0
> [    6.399038] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> [    6.407865] Mem abort info:
> [    6.410662]   ESR = 0x0000000096000004
> [    6.414419]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    6.419748]   SET = 0, FnV = 0
> [    6.422806]   EA = 0, S1PTW = 0
> [    6.425952]   FSC = 0x04: level 0 translation fault
> [    6.430842] Data abort info:
> [    6.433725]   ISV = 0, ISS = 0x00000004
> [    6.437569]   CM = 0, WnR = 0
> [    6.440540] user pgtable: 4k pages, 48-bit VAs, pgdp=0000008086f60000
> [    6.447003] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
> [    6.453819] Internal error: Oops: 0000000096000004 [#1] SMP
> [    6.459412] Modules linked in: ib_ipoib(E) ib_umad(E) rdma_ucm(E) rdma_cm(E) iw_cm(E) ib_cm(E) configfs(E) ib_mthca(E) ib_uverbs(E) ib_core(E)
> [    6.472263] CPU: 0 PID: 100 Comm: kworker/u17:0 Tainted: G            E      6.1.0+ #55
> [    6.480297] Hardware name: Freescale Layerscape 2080a RDB Board (DT)
> [    6.486670] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
> [    6.492636] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    6.499624] pc : mthca_poll_cq+0x4f0/0x9a0 [ib_mthca]
> [    6.504703] lr : mthca_poll_cq+0x1e8/0x9a0 [ib_mthca]
> 
> Since this apparently affects two different architectures, I suspect the problem is in the driver, not the arch-specific code.  I could recommend replacing the card to work around this, but given the rarity of the hardware I would rather not recommend tinkering with it, and the system may not even accept a new card in the first place.

Which kernel version is this? It looks like the bug fixed by

dc52aadbc184 RDMA/mthca: Fix crash when polling CQ for shared QPs

Thomas.

-- 
SUSE Software Solutions Germany GmbH
HRB 36809 (AG Nürnberg)
Geschäftsführer: Ivo Totev, Andrew McDonald, Werner Knoblich
