* [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array
2026-06-30 10:52 [PATCH 0/5] RDMA, IB: replace __get_free_pages() with kmalloc() Mike Rapoport (Microsoft)
@ 2026-06-30 10:52 ` Mike Rapoport (Microsoft)
2026-06-30 12:31 ` Jason Gunthorpe
2026-06-30 10:52 ` [PATCH 2/5] RDMA/mlx5: replace __get_free_page() with kmalloc() Mike Rapoport (Microsoft)
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport (Microsoft) @ 2026-06-30 10:52 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Dennis Dalessandro, Mike Rapoport, linux-kernel, linux-mm,
linux-rdma
ib_umem_get() allocates an array of pointers to struct page for
pin_user_pages_fast() calls during memory registration.
This array can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().
Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
drivers/infiniband/core/umem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 73498723a5d5..5c42497f32e2 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -209,7 +209,7 @@ static struct ib_umem *__ib_umem_get_va(struct ib_device *device,
mmgrab(mm);
- page_list = (struct page **) __get_free_page(GFP_KERNEL);
+ page_list = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!page_list) {
ret = -ENOMEM;
goto umem_kfree;
@@ -269,7 +269,7 @@ static struct ib_umem *__ib_umem_get_va(struct ib_device *device,
__ib_umem_release(device, umem, 0);
atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm);
out:
- free_page((unsigned long) page_list);
+ kfree(page_list);
umem_kfree:
if (ret) {
mmdrop(umem->owning_mm);
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array
2026-06-30 10:52 ` [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array Mike Rapoport (Microsoft)
@ 2026-06-30 12:31 ` Jason Gunthorpe
2026-06-30 15:00 ` Mike Rapoport
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2026-06-30 12:31 UTC (permalink / raw)
To: Mike Rapoport (Microsoft)
Cc: Leon Romanovsky, Dennis Dalessandro, linux-kernel, linux-mm,
linux-rdma
On Tue, Jun 30, 2026 at 01:52:29PM +0300, Mike Rapoport (Microsoft) wrote:
> ib_umem_get() allocates an array of pointers to struct page for
> pin_user_pages_fast() calls during memory registration.
A whole bunch of these use cases in rdma are really "give me some
temporary memory, I want it fast and as large as possible. In a
syscall context I will free it before returning back to userspace"
eg we'd be really happy to get any kind of high order page here.
So, how would you feel about a new API?
void *kmalloc_temporary(size_t min_size, size_t max_size, size_t *actual_size, gfp);
I know of a few other cases like this in the kernel at least.
The implementation could try to find an available high order page and
immediately return it, otherwise do a small reclaim allocation?
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array
2026-06-30 12:31 ` Jason Gunthorpe
@ 2026-06-30 15:00 ` Mike Rapoport
2026-06-30 15:01 ` Mike Rapoport
0 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport @ 2026-06-30 15:00 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Dennis Dalessandro, linux-kernel, linux-mm,
linux-rdma
(adding Vlastimil)
On Tue, Jun 30, 2026 at 09:31:50AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 30, 2026 at 01:52:29PM +0300, Mike Rapoport (Microsoft) wrote:
> > ib_umem_get() allocates an array of pointers to struct page for
> > pin_user_pages_fast() calls during memory registration.
>
> A whole bunch of these use cases in rdma are really "give me some
> temporary memory, I want it fast and as large as possible. In a
> syscall context I will free it before returning back to userspace"
Not sure I follow where "as large as possible" comes from. Here it's
explicitly a page.
And does "fast" mean that vmalloc() is not an option?
> eg we'd be really happy to get any kind of high order page here.
>
> So, how would you feel about a new API?
>
> void *kmalloc_temporary(size_t min_size, size_t max_size, size_t *actual_size, gfp);
>
> I know of a few other cases like this in the kernel at least.
>
> The implementation could try to find an available high order page and
> immediately return it, otherwise do a small reclaim allocation?
How do you suggest to decide how much of reclaim should happen? With the
usual semantics of gfp?
> Jason
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array
2026-06-30 15:00 ` Mike Rapoport
@ 2026-06-30 15:01 ` Mike Rapoport
2026-06-30 15:36 ` Jason Gunthorpe
0 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport @ 2026-06-30 15:01 UTC (permalink / raw)
To: Jason Gunthorpe, Vlastimil Babka
Cc: Leon Romanovsky, Dennis Dalessandro, linux-kernel, linux-mm,
linux-rdma
(actually adding Vlastimil :) )
On Tue, Jun 30, 2026 at 06:00:24PM +0300, Mike Rapoport wrote:
> (adding Vlastimil)
>
> On Tue, Jun 30, 2026 at 09:31:50AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 30, 2026 at 01:52:29PM +0300, Mike Rapoport (Microsoft) wrote:
> > > ib_umem_get() allocates an array of pointers to struct page for
> > > pin_user_pages_fast() calls during memory registration.
> >
> > A whole bunch of these use cases in rdma are really "give me some
> > temporary memory, I want it fast and as large as possible. In a
> > syscall context I will free it before returning back to userspace"
>
> Not sure I follow where "as large as possible" comes from. Here it's
> explicitly a page.
>
> And does "fast" mean that vmalloc() is not an option?
>
> > eg we'd be really happy to get any kind of high order page here.
> >
> > So, how would you feel about a new API?
> >
> > void *kmalloc_temporary(size_t min_size, size_t max_size, size_t *actual_size, gfp);
> >
> > I know of a few other cases like this in the kernel at least.
> >
> > The implementation could try to find an available high order page and
> > immediately return it, otherwise do a small reclaim allocation?
>
> How do you suggest to decide how much of reclaim should happen? With the
> usual semantics of gfp?
>
> > Jason
>
> --
> Sincerely yours,
> Mike.
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array
2026-06-30 15:01 ` Mike Rapoport
@ 2026-06-30 15:36 ` Jason Gunthorpe
0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2026-06-30 15:36 UTC (permalink / raw)
To: Mike Rapoport
Cc: Vlastimil Babka, Leon Romanovsky, Dennis Dalessandro,
linux-kernel, linux-mm, linux-rdma
On Tue, Jun 30, 2026 at 06:01:17PM +0300, Mike Rapoport wrote:
> (actually adding Vlastimil :) )
>
> On Tue, Jun 30, 2026 at 06:00:24PM +0300, Mike Rapoport wrote:
> > (adding Vlastimil)
> >
> > On Tue, Jun 30, 2026 at 09:31:50AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Jun 30, 2026 at 01:52:29PM +0300, Mike Rapoport (Microsoft) wrote:
> > > > ib_umem_get() allocates an array of pointers to struct page for
> > > > pin_user_pages_fast() calls during memory registration.
> > >
> > > A whole bunch of these use cases in rdma are really "give me some
> > > temporary memory, I want it fast and as large as possible. In a
> > > syscall context I will free it before returning back to userspace"
> >
> > Not sure I follow where "as large as possible" comes from. Here it's
> > explicitly a page.
It is a page because that is "fast".
There will be a calculation of the upper limit of memory that
this algorithm can use.
> > And does "fast" mean that vmalloc() is not an option?
Yes. The trade-off is that you do fewer iterations of some loop if you have
a bigger temporary buffer. But if the allocation takes longer than the loop
iterations it saves, then it doesn't help.
> > > So, how would you feel about a new API?
> > >
> > > void *kmalloc_temporary(size_t min_size, size_t max_size, size_t *actual_size, gfp);
> > >
> > > I know of a few other cases like this in the kernel at least.
> > >
> > > The implementation could try to find an available high order page and
> > > immediately return it, otherwise do a small reclaim allocation?
> >
> > How do you suggest to decide how much of reclaim should happen?
> > With the usual semantics of gfp?
Yeah, when all options are exhausted you do some allocation with the
usual GFP options.
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/5] RDMA/mlx5: replace __get_free_page() with kmalloc()
2026-06-30 10:52 [PATCH 0/5] RDMA, IB: replace __get_free_pages() with kmalloc() Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array Mike Rapoport (Microsoft)
@ 2026-06-30 10:52 ` Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 3/5] IB/mthca: mthca_reg_user_mr(): use kmalloc() to allocate addresses array Mike Rapoport (Microsoft)
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Mike Rapoport (Microsoft) @ 2026-06-30 10:52 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Dennis Dalessandro, Mike Rapoport, linux-kernel, linux-mm,
linux-rdma
mlx5_ib_mr_wqe_pfault_handler() allocates a scratch buffer for
parsing work queue entries during page fault handling.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().
Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
drivers/infiniband/hw/mlx5/odp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 1badec9bf527..90706ff7102a 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -38,6 +38,7 @@
#include <linux/hmm-dma.h>
#include <linux/pci-p2pdma.h>
+#include <linux/slab.h>
#include "mlx5_ib.h"
#include "cmd.h"
#include "umr.h"
@@ -1414,7 +1415,7 @@ static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
goto resolve_page_fault;
}
- wqe_start = (void *)__get_free_page(GFP_KERNEL);
+ wqe_start = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!wqe_start) {
mlx5_ib_err(dev, "Error allocating memory for IO page fault handling.\n");
goto resolve_page_fault;
@@ -1475,7 +1476,7 @@ static void mlx5_ib_mr_wqe_pfault_handler(struct mlx5_ib_dev *dev,
pfault->wqe.wq_num, resume_with_error,
pfault->type);
mlx5_core_res_put(res);
- free_page((unsigned long)wqe_start);
+ kfree(wqe_start);
}
static void mlx5_ib_mr_rdma_pfault_handler(struct mlx5_ib_dev *dev,
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] IB/mthca: mthca_reg_user_mr(): use kmalloc() to allocate addresses array
2026-06-30 10:52 [PATCH 0/5] RDMA, IB: replace __get_free_pages() with kmalloc() Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 1/5] RDMA/umem: ib_umem_get(): use kmalloc() to allocate page array Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 2/5] RDMA/mlx5: replace __get_free_page() with kmalloc() Mike Rapoport (Microsoft)
@ 2026-06-30 10:52 ` Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 4/5] IB/mthca: allocate mthca_array memory with kzalloc() Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 5/5] IB/rdmavt: use kzalloc() to allocate QPN-map pages Mike Rapoport (Microsoft)
4 siblings, 0 replies; 10+ messages in thread
From: Mike Rapoport (Microsoft) @ 2026-06-30 10:52 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Dennis Dalessandro, Mike Rapoport, linux-kernel, linux-mm,
linux-rdma
mthca_reg_user_mr() allocates an array of DMA addresses during memory
registration.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().
Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
drivers/infiniband/hw/mthca/mthca_provider.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index f90f67afc8fa..c9ec9ca0aaa6 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -895,7 +895,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
goto err_umem;
}
- pages = (u64 *) __get_free_page(GFP_KERNEL);
+ pages = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!pages) {
err = -ENOMEM;
goto err_mtt;
@@ -924,7 +924,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
if (i)
err = mthca_write_mtt(dev, mr->mtt, n, pages, i);
mtt_done:
- free_page((unsigned long) pages);
+ kfree(pages);
if (err)
goto err_mtt;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] IB/mthca: allocate mthca_array memory with kzalloc()
2026-06-30 10:52 [PATCH 0/5] RDMA, IB: replace __get_free_pages() with kmalloc() Mike Rapoport (Microsoft)
` (2 preceding siblings ...)
2026-06-30 10:52 ` [PATCH 3/5] IB/mthca: mthca_reg_user_mr(): use kmalloc() to allocate addresses array Mike Rapoport (Microsoft)
@ 2026-06-30 10:52 ` Mike Rapoport (Microsoft)
2026-06-30 10:52 ` [PATCH 5/5] IB/rdmavt: use kzalloc() to allocate QPN-map pages Mike Rapoport (Microsoft)
4 siblings, 0 replies; 10+ messages in thread
From: Mike Rapoport (Microsoft) @ 2026-06-30 10:52 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Dennis Dalessandro, Mike Rapoport, linux-kernel, linux-mm,
linux-rdma
mthca_array is essentially a sparse array of pointers and there is no
need to allocate its memory using the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().
Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
drivers/infiniband/hw/mthca/mthca_allocator.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c
index dedc301235a0..117a070e784e 100644
--- a/drivers/infiniband/hw/mthca/mthca_allocator.c
+++ b/drivers/infiniband/hw/mthca/mthca_allocator.c
@@ -126,7 +126,7 @@ int mthca_array_set(struct mthca_array *array, int index, void *value)
/* Allocate with GFP_ATOMIC because we'll be called with locks held. */
if (!array->page_list[p].page)
- array->page_list[p].page = (void **) get_zeroed_page(GFP_ATOMIC);
+ array->page_list[p].page = kzalloc(PAGE_SIZE, GFP_ATOMIC);
if (!array->page_list[p].page)
return -ENOMEM;
@@ -142,7 +142,7 @@ void mthca_array_clear(struct mthca_array *array, int index)
int p = (index * sizeof (void *)) >> PAGE_SHIFT;
if (--array->page_list[p].used == 0) {
- free_page((unsigned long) array->page_list[p].page);
+ kfree(array->page_list[p].page);
array->page_list[p].page = NULL;
} else
array->page_list[p].page[index & MTHCA_ARRAY_MASK] = NULL;
@@ -174,7 +174,7 @@ void mthca_array_cleanup(struct mthca_array *array, int nent)
int i;
for (i = 0; i < (nent * sizeof (void *) + PAGE_SIZE - 1) / PAGE_SIZE; ++i)
- free_page((unsigned long) array->page_list[i].page);
+ kfree(array->page_list[i].page);
kfree(array->page_list);
}
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] IB/rdmavt: use kzalloc() to allocate QPN-map pages
2026-06-30 10:52 [PATCH 0/5] RDMA, IB: replace __get_free_pages() with kmalloc() Mike Rapoport (Microsoft)
` (3 preceding siblings ...)
2026-06-30 10:52 ` [PATCH 4/5] IB/mthca: allocate mthca_array memory with kzalloc() Mike Rapoport (Microsoft)
@ 2026-06-30 10:52 ` Mike Rapoport (Microsoft)
4 siblings, 0 replies; 10+ messages in thread
From: Mike Rapoport (Microsoft) @ 2026-06-30 10:52 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Dennis Dalessandro, Mike Rapoport, linux-kernel, linux-mm,
linux-rdma
get_map_page() allocates bitmap pages using get_zeroed_page().
The bitmaps can be allocated with kmalloc() as there's nothing special
about them to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().
Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
drivers/infiniband/sw/rdmavt/qp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 70e7d08fdce6..c40cce69e945 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -263,7 +263,7 @@ static inline bool wss_exceeds_threshold(struct rvt_wss *wss)
static void get_map_page(struct rvt_qpn_table *qpt,
struct rvt_qpn_map *map)
{
- unsigned long page = get_zeroed_page(GFP_KERNEL);
+ void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
/*
* Free the page if someone raced with us installing it.
@@ -271,9 +271,9 @@ static void get_map_page(struct rvt_qpn_table *qpt,
spin_lock(&qpt->lock);
if (map->page)
- free_page(page);
+ kfree(page);
else
- map->page = (void *)page;
+ map->page = page;
spin_unlock(&qpt->lock);
}
@@ -343,7 +343,7 @@ static void free_qpn_table(struct rvt_qpn_table *qpt)
int i;
for (i = 0; i < ARRAY_SIZE(qpt->map); i++)
- free_page((unsigned long)qpt->map[i].page);
+ kfree(qpt->map[i].page);
}
/**
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread