From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: "Marciniszyn, Mike" <mike.marciniszyn@cornelisnetworks.com>
Cc: Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@nvidia.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: rdmavt panic in long term stable linux-5.10.y
Date: Mon, 15 Mar 2021 14:11:51 +0100 [thread overview]
Message-ID: <YE9dF5nVp9WhSEDI@kroah.com> (raw)
In-Reply-To: <BYAPR01MB3816509E2A5824045B110099F26C9@BYAPR01MB3816.prod.exchangelabs.com>
On Mon, Mar 15, 2021 at 01:05:43PM +0000, Marciniszyn, Mike wrote:
> The following panic happens on the 5.10.20 long term stable running qperf with rdmavt/hfi1:
>
> [ 1467.730495] BUG: kernel NULL pointer dereference, address: 0000000000000268
> [ 1467.738940] #PF: supervisor read access in kernel mode
> [ 1467.745052] #PF: error_code(0x0000) - not-present page
> [ 1467.751159] PGD 0 P4D 0
> [ 1467.754350] Oops: 0000 [#1] SMP PTI
> [ 1467.758621] CPU: 43 PID: 42843 Comm: qperf Tainted: G S 5.10.17 #1
> [ 1467.767370] HISS-219ardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015
> [ 1467.779357] RIP: 0010:ib_umem_get+0x233/0x3d0 [ib_uverbs]
> [ 1467.785811] Code: 02 00 00 48 0f 46 f5 e8 9b 67 27 ca 85 c0 0f 88 40 01 00 00 4c 63 f0 4c 89 f2 4c 29 f5 48 c1 e2 0c 89 e9 48 01 d3 49 8b 14 24 <48> 8b 92 68 02 00 00 48 85 d2 0f 85 5a ff ff ff 41 b9 00 00 01 00
> [ 1467.807715] RSP: 0018:ffffb7ba87303aa8 EFLAGS: 00010206
> [ 1467.814026] RAX: 0000000000000010 RBX: 000055ad89f11000 RCX: 0000000000000000
> [ 1467.822457] RDX: 0000000000000000 RSI: 000000000000000f RDI: ffff8954bffd6000
> [ 1467.830888] RBP: 0000000000000000 R08: 0000000000031443 R09: 0000000000000000
> [ 1467.839322] R10: 0000000000031420 R11: 0000000000000022 R12: ffff894d50930000
> [ 1467.847751] R13: 0000000000000000 R14: 0000000000000010 R15: ffff894d4a2fe880
> [ 1467.856193] FS: 00007fb12f44c740(0000) GS:ffff89549fa40000(0000) knlGS:0000000000000000
> [ 1467.865721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1467.872657] CR2: 0000000000000268 CR3: 00000001c0534001 CR4: 00000000001706e0
> [ 1467.881136] Call Trace:
> [ 1467.884398] rvt_reg_user_mr+0x70/0x200 [rdmavt]
>
> The panic happens in the call to dma_get_max_seg_size() because the dma_device is NULL.
>
> Here is the stable patch that causes the issue:
>
> commit 404fa093741e15e16fd522cc76cd9f86e9ef81d2
> Author: Christoph Hellwig <hch@lst.de>
> Date: Fri Nov 6 19:19:38 2020 +0100
>
> RDMA/core: remove use of dma_virt_ops
>
> [ Upstream commit 5a7a9e038b032137ae9c45d5429f18a2ffdf7d42 ]
>
> Use the ib_dma_* helpers to skip the DMA translation instead. This
> removes the last user if dma_virt_ops and keeps the weird layering
> violation inside the RDMA core instead of burderning the DMA mapping
> subsystems with it. This also means the software RDMA drivers now don't
> have to mess with DMA parameters that are not relevant to them at all, and
> that in the future we can use PCI P2P transfers even for software RDMA, as
> there is no first fake layer of DMA mapping that the P2P DMA support.
>
> Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> The stable backport missed a prereq patch:
>
> commit b116c702791a9834e6485f67ca6267d9fdf59b87
> Author: Christoph Hellwig <hch@lst.de>
> Date: Fri Nov 6 19:19:33 2020 +0100
>
> RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
>
> RDMA ULPs must not call DMA mapping APIs directly but instead use the
> ib_dma_* wrappers.
>
> Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
> Link: https://lore.kernel.org/r/20201106181941.1878556-3-hch@lst.de
> Reported-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> The missing patch adds the necessary RDMA wrappers to handle the ib_device dma_device member being NULL.
>
> The missing patch picks clean and fixes the issue.
>
> Do you want me to send the stable request?
You just did, now queued up :)
greg k-h
prev parent reply other threads:[~2021-03-15 13:12 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-15 13:05 rdmavt panic in long term stable linux-5.10.y Marciniszyn, Mike
2021-03-15 13:11 ` Greg Kroah-Hartman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YE9dF5nVp9WhSEDI@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=hch@lst.de \
--cc=jgg@nvidia.com \
--cc=linux-rdma@vger.kernel.org \
--cc=mike.marciniszyn@cornelisnetworks.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox