* [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
From: Daisuke Matsuda @ 2025-05-23 14:35 UTC (permalink / raw)
To: linux-rdma, linux-mm, leon, jgg, akpm, jglisse
Cc: linux-kernel, linux-pci, zyjzyj2000, Daisuke Matsuda
Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
NULL device pointer, which leads to a NULL pointer dereference. Add NULL
checks so that device-specific DMA features are safely bypassed when no
device is provided.
This fixes the following kernel oops:
BUG: kernel NULL pointer dereference, address: 00000000000002fc
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1028eb067 P4D 1028eb067 PUD 105da0067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 3 UID: 1000 PID: 1854 Comm: python3 Tainted: G W 6.15.0-rc1+ #11 PREEMPT(voluntary)
Tainted: [W]=WARN
Hardware name: Trigkey Key N/Key N, BIOS KEYN101 09/02/2024
RIP: 0010:hmm_dma_map_alloc+0x25/0x100
Code: 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 d6 49 c1 e6 0c 41 55 41 54 53 49 39 ce 0f 82 c6 00 00 00 49 89 fc <f6> 87 fc 02 00 00 20 0f 84 af 00 00 00 49 89 f5 48 89 d3 49 89 cf
RSP: 0018:ffffd3d3420eb830 EFLAGS: 00010246
RAX: 0000000000001000 RBX: ffff8b727c7f7400 RCX: 0000000000001000
RDX: 0000000000000001 RSI: ffff8b727c7f74b0 RDI: 0000000000000000
RBP: ffffd3d3420eb858 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007262a622a000 R14: 0000000000001000 R15: ffff8b727c7f74b0
FS: 00007262a62a1080(0000) GS:ffff8b762ac3e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002fc CR3: 000000010a1f0004 CR4: 0000000000f72ef0
PKRU: 55555554
Call Trace:
<TASK>
ib_init_umem_odp+0xb6/0x110 [ib_uverbs]
ib_umem_odp_get+0xf0/0x150 [ib_uverbs]
rxe_odp_mr_init_user+0x71/0x170 [rdma_rxe]
rxe_reg_user_mr+0x217/0x2e0 [rdma_rxe]
ib_uverbs_reg_mr+0x19e/0x2e0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd9/0x150 [ib_uverbs]
ib_uverbs_cmd_verbs+0xd19/0xee0 [ib_uverbs]
? mmap_region+0x63/0xd0
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
ib_uverbs_ioctl+0xba/0x130 [ib_uverbs]
__x64_sys_ioctl+0xa4/0xe0
x64_sys_call+0x1178/0x2660
do_syscall_64+0x7e/0x170
? syscall_exit_to_user_mode+0x4e/0x250
? do_syscall_64+0x8a/0x170
? do_syscall_64+0x8a/0x170
? syscall_exit_to_user_mode+0x4e/0x250
? do_syscall_64+0x8a/0x170
? syscall_exit_to_user_mode+0x4e/0x250
? do_syscall_64+0x8a/0x170
? do_user_addr_fault+0x1d2/0x8d0
? irqentry_exit_to_user_mode+0x43/0x250
? irqentry_exit+0x43/0x50
? exc_page_fault+0x93/0x1d0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7262a6124ded
Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
RSP: 002b:00007fffd08c3960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fffd08c39f0 RCX: 00007262a6124ded
RDX: 00007fffd08c3a10 RSI: 00000000c0181b01 RDI: 0000000000000007
RBP: 00007fffd08c39b0 R08: 0000000014107820 R09: 00007fffd08c3b44
R10: 000000000000000c R11: 0000000000000246 R12: 00007fffd08c3b44
R13: 000000000000000c R14: 00007fffd08c3b58 R15: 0000000014107960
</TASK>
Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
---
mm/hmm.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index a8bf097677f3..311141124e67 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -638,7 +638,7 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
size_t nr_entries, size_t dma_entry_size)
{
bool dma_need_sync = false;
- bool use_iova;
+ bool use_iova = false;
if (!(nr_entries * PAGE_SIZE / dma_entry_size))
return -EINVAL;
@@ -649,9 +649,9 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
* best approximation to ensure no swiotlb buffering happens.
*/
#ifdef CONFIG_DMA_NEED_SYNC
- dma_need_sync = !dev->dma_skip_sync;
+ dma_need_sync = dev ? !dev->dma_skip_sync : false;
#endif /* CONFIG_DMA_NEED_SYNC */
- if (dma_need_sync || dma_addressing_limited(dev))
+ if (dev && (dma_need_sync || dma_addressing_limited(dev)))
return -EOPNOTSUPP;
map->dma_entry_size = dma_entry_size;
@@ -660,9 +660,11 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
if (!map->pfn_list)
return -ENOMEM;
- use_iova = dma_iova_try_alloc(dev, &map->state, 0,
- nr_entries * PAGE_SIZE);
- if (!use_iova && dma_need_unmap(dev)) {
+ if (dev)
+ use_iova = dma_iova_try_alloc(dev, &map->state, 0,
+ nr_entries * PAGE_SIZE);
+
+ if (!dev || (!use_iova && dma_need_unmap(dev))) {
map->dma_list = kvcalloc(nr_entries, sizeof(*map->dma_list),
GFP_KERNEL | __GFP_NOWARN);
if (!map->dma_list)
--
2.43.0
* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
From: Christoph Hellwig @ 2025-05-23 14:48 UTC (permalink / raw)
To: Daisuke Matsuda
Cc: linux-rdma, linux-mm, leon, jgg, akpm, jglisse, linux-kernel,
linux-pci, zyjzyj2000
On Fri, May 23, 2025 at 02:35:37PM +0000, Daisuke Matsuda wrote:
> Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
> NULL device pointer,
No, they may not. If something has no device with physical DMA
capabilities, it has no business calling into it.
* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
From: Daisuke Matsuda @ 2025-05-23 15:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-rdma, linux-mm, leon, jgg, akpm, jglisse, linux-kernel,
linux-pci, zyjzyj2000
On 2025/05/23 23:48, Christoph Hellwig wrote:
> On Fri, May 23, 2025 at 02:35:37PM +0000, Daisuke Matsuda wrote:
>> Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
>> NULL device pointer,
>
> No, they may not. If something has no device with physical DMA
> capabilities, it has no business calling into it.
>
Hi Christoph,
RXE is a software emulator of IBTA RoCEv2, designed to allow systems equipped with standard Ethernet adapters to interoperate with other RoCEv2-capable nodes.
Like other Infiniband subsystem drivers (under drivers/infiniband/{hw,sw}), RXE depends on the ib_core and ib_uverbs layers in drivers/infiniband/core. These common RDMA layers, in turn, rely on the HMM infrastructure for specific features such as On-Demand Paging.
As a result, even though RXE lacks physical DMA capabilities, it still needs to interact with hmm_dma_map_alloc() through the shared RDMA core paths. This patch ensures that such software-only use cases do not trigger unintended null pointer dereferences.
Thanks,
Daisuke
* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
From: Christoph Hellwig @ 2025-05-23 15:42 UTC (permalink / raw)
To: Daisuke Matsuda
Cc: Christoph Hellwig, linux-rdma, linux-mm, leon, jgg, akpm, jglisse,
linux-kernel, linux-pci, zyjzyj2000
Thank you very much, but I know rxe very well. And given your apparent
knowledge of the rdma subsystem you should also know pretty well that
it does not otherwise call into the dma mapping core for virtual devices
because calling into the dma mapping code is not valid for the virtual
devices.
Please fix the rdma core to not call into the hmm dma mapping helpers
for the ib_uses_virt_dma() case.
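The guard Christoph is asking for could be sketched as follows. This is a
hypothetical, self-contained illustration only, not the actual kernel code:
the struct layouts are minimal stand-ins, and the real ib_uses_virt_dma()
and hmm_dma_map_alloc() have different signatures and semantics. It only
shows the shape of the fix: the RDMA core checks for a virtual device
before ever reaching the HMM DMA-mapping helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types; all names and layouts below are
 * illustrative, not the real definitions. */
struct device;

struct ib_device {
	struct device *dma_device; /* NULL for software devices such as rxe */
};

/* In the kernel, ib_uses_virt_dma() reports whether the IB device is a
 * software-only device with no physical DMA engine. */
static bool ib_uses_virt_dma(const struct ib_device *dev)
{
	return dev->dma_device == NULL;
}

/* Stand-in for hmm_dma_map_alloc(): dereferencing a NULL device here is
 * exactly the oops from the report, so callers must never reach this
 * with a virtual device. */
static int hmm_dma_map_alloc(struct device *dev)
{
	assert(dev != NULL); /* the kernel would NULL-deref past this point */
	return 0;
}

/* Sketch of the guarded ODP init path: skip the DMA-mapping helpers
 * entirely for virtual devices instead of teaching HMM about NULL. */
static int ib_init_umem_odp(struct ib_device *ib_dev)
{
	if (ib_uses_virt_dma(ib_dev))
		return 0; /* software device: no DMA mapping to set up */
	return hmm_dma_map_alloc(ib_dev->dma_device);
}
```

With this shape, a software device such as rxe takes the early return and
the HMM helpers keep their invariant that a real device is always present.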
* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
From: Daisuke Matsuda @ 2025-05-23 15:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-rdma, linux-mm, leon, jgg, akpm, jglisse, linux-kernel,
linux-pci, zyjzyj2000
On 2025/05/24 0:42, Christoph Hellwig wrote:
> Thank you very much, but I know rxe very well. And given your apparent
> knowledge of the rdma subsystem you should also know pretty well that
> it does not otherwise call into the dma mapping core for virtual devices
> because calling into the dma mapping code is not valid for the virtual
> devices.
>
> Please fix the rdma core to not call into the hmm dma mapping helpers
> for the ib_uses_virt_dma() case.
>
Thank you for the clarification and guidance.
I'll look into updating the RDMA core to avoid calling hmm_dma_map_alloc() when ib_uses_virt_dma() is true. That should help keep the layering and responsibilities properly separated.
Thanks again,
Daisuke