linux-pci.vger.kernel.org archive mirror
* [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
@ 2025-05-23 14:35 Daisuke Matsuda
  2025-05-23 14:48 ` Christoph Hellwig
From: Daisuke Matsuda @ 2025-05-23 14:35 UTC (permalink / raw)
  To: linux-rdma, linux-mm, leon, jgg, akpm, jglisse
  Cc: linux-kernel, linux-pci, zyjzyj2000, Daisuke Matsuda

Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
NULL device pointer, which currently leads to a NULL pointer dereference.
Add NULL checks so that device-specific DMA features are safely bypassed
when no device is provided.

This fixes the following kernel oops:

 BUG: kernel NULL pointer dereference, address: 00000000000002fc
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1028eb067 P4D 1028eb067 PUD 105da0067 PMD 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 3 UID: 1000 PID: 1854 Comm: python3 Tainted: G        W           6.15.0-rc1+ #11 PREEMPT(voluntary)
 Tainted: [W]=WARN
 Hardware name: Trigkey Key N/Key N, BIOS KEYN101 09/02/2024
 RIP: 0010:hmm_dma_map_alloc+0x25/0x100
 Code: 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 d6 49 c1 e6 0c 41 55 41 54 53 49 39 ce 0f 82 c6 00 00 00 49 89 fc <f6> 87 fc 02 00 00 20 0f 84 af 00 00 00 49 89 f5 48 89 d3 49 89 cf
 RSP: 0018:ffffd3d3420eb830 EFLAGS: 00010246
 RAX: 0000000000001000 RBX: ffff8b727c7f7400 RCX: 0000000000001000
 RDX: 0000000000000001 RSI: ffff8b727c7f74b0 RDI: 0000000000000000
 RBP: ffffd3d3420eb858 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 00007262a622a000 R14: 0000000000001000 R15: ffff8b727c7f74b0
 FS:  00007262a62a1080(0000) GS:ffff8b762ac3e000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000002fc CR3: 000000010a1f0004 CR4: 0000000000f72ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  ib_init_umem_odp+0xb6/0x110 [ib_uverbs]
  ib_umem_odp_get+0xf0/0x150 [ib_uverbs]
  rxe_odp_mr_init_user+0x71/0x170 [rdma_rxe]
  rxe_reg_user_mr+0x217/0x2e0 [rdma_rxe]
  ib_uverbs_reg_mr+0x19e/0x2e0 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd9/0x150 [ib_uverbs]
  ib_uverbs_cmd_verbs+0xd19/0xee0 [ib_uverbs]
  ? mmap_region+0x63/0xd0
  ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
  ib_uverbs_ioctl+0xba/0x130 [ib_uverbs]
  __x64_sys_ioctl+0xa4/0xe0
  x64_sys_call+0x1178/0x2660
  do_syscall_64+0x7e/0x170
  ? syscall_exit_to_user_mode+0x4e/0x250
  ? do_syscall_64+0x8a/0x170
  ? do_syscall_64+0x8a/0x170
  ? syscall_exit_to_user_mode+0x4e/0x250
  ? do_syscall_64+0x8a/0x170
  ? syscall_exit_to_user_mode+0x4e/0x250
  ? do_syscall_64+0x8a/0x170
  ? do_user_addr_fault+0x1d2/0x8d0
  ? irqentry_exit_to_user_mode+0x43/0x250
  ? irqentry_exit+0x43/0x50
  ? exc_page_fault+0x93/0x1d0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7262a6124ded
 Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
 RSP: 002b:00007fffd08c3960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007fffd08c39f0 RCX: 00007262a6124ded
 RDX: 00007fffd08c3a10 RSI: 00000000c0181b01 RDI: 0000000000000007
 RBP: 00007fffd08c39b0 R08: 0000000014107820 R09: 00007fffd08c3b44
 R10: 000000000000000c R11: 0000000000000246 R12: 00007fffd08c3b44
 R13: 000000000000000c R14: 00007fffd08c3b58 R15: 0000000014107960
  </TASK>

Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
---
 mm/hmm.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index a8bf097677f3..311141124e67 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -638,7 +638,7 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 		      size_t nr_entries, size_t dma_entry_size)
 {
 	bool dma_need_sync = false;
-	bool use_iova;
+	bool use_iova = false;
 
 	if (!(nr_entries * PAGE_SIZE / dma_entry_size))
 		return -EINVAL;
@@ -649,9 +649,9 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 	 * best approximation to ensure no swiotlb buffering happens.
 	 */
 #ifdef CONFIG_DMA_NEED_SYNC
-	dma_need_sync = !dev->dma_skip_sync;
+	dma_need_sync = dev ? !dev->dma_skip_sync : false;
 #endif /* CONFIG_DMA_NEED_SYNC */
-	if (dma_need_sync || dma_addressing_limited(dev))
+	if (dev && (dma_need_sync || dma_addressing_limited(dev)))
 		return -EOPNOTSUPP;
 
 	map->dma_entry_size = dma_entry_size;
@@ -660,9 +660,11 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 	if (!map->pfn_list)
 		return -ENOMEM;
 
-	use_iova = dma_iova_try_alloc(dev, &map->state, 0,
-			nr_entries * PAGE_SIZE);
-	if (!use_iova && dma_need_unmap(dev)) {
+	if (dev)
+		use_iova = dma_iova_try_alloc(dev, &map->state, 0,
+					      nr_entries * PAGE_SIZE);
+
+	if (!dev || (!use_iova && dma_need_unmap(dev))) {
 		map->dma_list = kvcalloc(nr_entries, sizeof(*map->dma_list),
 					 GFP_KERNEL | __GFP_NOWARN);
 		if (!map->dma_list)
-- 
2.43.0



* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
  2025-05-23 14:35 [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device Daisuke Matsuda
@ 2025-05-23 14:48 ` Christoph Hellwig
  2025-05-23 15:38   ` Daisuke Matsuda
From: Christoph Hellwig @ 2025-05-23 14:48 UTC (permalink / raw)
  To: Daisuke Matsuda
  Cc: linux-rdma, linux-mm, leon, jgg, akpm, jglisse, linux-kernel,
	linux-pci, zyjzyj2000

On Fri, May 23, 2025 at 02:35:37PM +0000, Daisuke Matsuda wrote:
> Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
> NULL device pointer,

No, they may not.  If something has no device with physical DMA
capabilities, it has no business calling into it.



* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
  2025-05-23 14:48 ` Christoph Hellwig
@ 2025-05-23 15:38   ` Daisuke Matsuda
  2025-05-23 15:42     ` Christoph Hellwig
From: Daisuke Matsuda @ 2025-05-23 15:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rdma, linux-mm, leon, jgg, akpm, jglisse, linux-kernel,
	linux-pci, zyjzyj2000

On 2025/05/23 23:48, Christoph Hellwig wrote:
> On Fri, May 23, 2025 at 02:35:37PM +0000, Daisuke Matsuda wrote:
>> Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
>> NULL device pointer,
> 
> No, they may not.  If something has no device with physical DMA
> capabilities, it has no business calling into it.
> 

Hi Christoph,

RXE is a software emulator of IBTA RoCEv2, designed to allow systems equipped with standard Ethernet adapters to interoperate with other RoCEv2-capable nodes.

Like other Infiniband subsystem drivers (under drivers/infiniband/{hw,sw}), RXE depends on the ib_core and ib_uverbs layers in drivers/infiniband/core. These common RDMA layers, in turn, rely on the HMM infrastructure for specific features such as On-Demand Paging.

As a result, even though RXE lacks physical DMA capabilities, it still needs to interact with hmm_dma_map_alloc() through the shared RDMA core paths. This patch ensures that such software-only use cases do not trigger unintended null pointer dereferences.

Thanks,
Daisuke


* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
  2025-05-23 15:38   ` Daisuke Matsuda
@ 2025-05-23 15:42     ` Christoph Hellwig
  2025-05-23 15:56       ` Daisuke Matsuda
From: Christoph Hellwig @ 2025-05-23 15:42 UTC (permalink / raw)
  To: Daisuke Matsuda
  Cc: Christoph Hellwig, linux-rdma, linux-mm, leon, jgg, akpm, jglisse,
	linux-kernel, linux-pci, zyjzyj2000

Thank you very much, but I know rxe very well.  And given your apparent
knowledge of the rdma subsystem, you should also know that it does not
otherwise call into the dma mapping core for virtual devices, because
doing so is not valid for such devices.

Please fix the rdma core to not call into the hmm dma mapping helpers
for the ib_uses_virt_dma() case.



* Re: [PATCH] mm/hmm: Allow hmm_dma_map_alloc() to tolerate NULL device
  2025-05-23 15:42     ` Christoph Hellwig
@ 2025-05-23 15:56       ` Daisuke Matsuda
From: Daisuke Matsuda @ 2025-05-23 15:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rdma, linux-mm, leon, jgg, akpm, jglisse, linux-kernel,
	linux-pci, zyjzyj2000


On 2025/05/24 0:42, Christoph Hellwig wrote:
> Thank you very much, but I know rxe very well.  And given your apparent
> knowledge of the rdma subsystem, you should also know that it does not
> otherwise call into the dma mapping core for virtual devices, because
> doing so is not valid for such devices.
> 
> Please fix the rdma core to not call into the hmm dma mapping helpers
> for the ib_uses_virt_dma() case.
> 

Thank you for the clarification and guidance.

I'll look into updating the RDMA core to avoid calling hmm_dma_map_alloc() when ib_uses_virt_dma() is true. That should help keep the layering and responsibilities properly separated.

Thanks again,
Daisuke

