public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
@ 2026-02-25  8:51 lirongqing
  2026-02-25 11:33 ` Cheng Xu
  2026-02-26  1:51 ` Cheng Xu
  0 siblings, 2 replies; 9+ messages in thread
From: lirongqing @ 2026-02-25  8:51 UTC (permalink / raw)
  To: Cheng Xu, Kai Shen, Jason Gunthorpe, Leon Romanovsky, linux-rdma,
	linux-kernel
  Cc: Li RongQing

From: Li RongQing <lirongqing@baidu.com>

Currently, MTT (Memory Translation Table) buffers are allocated without
NUMA awareness using kzalloc() and vzalloc(), which allocate memory on
the NUMA node of the calling CPU. This can lead to cross-node memory
access latencies if the erdma device is attached to a different NUMA
socket.

Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers are
allocated on the local NUMA node of the PCIe device (dev->attrs.numa_node).
This reduces latency for hardware access and improves performance.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 9f74aad..58da6ef 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -604,7 +604,7 @@ static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev,
 		return ERR_PTR(-ENOMEM);
 
 	mtt->size = size;
-	mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
+	mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL, dev->attrs.numa_node);
 	if (!mtt->buf)
 		goto err_free_mtt;
 
@@ -729,7 +729,7 @@ static struct erdma_mtt *erdma_create_scatter_mtt(struct erdma_dev *dev,
 		return ERR_PTR(-ENOMEM);
 
 	mtt->size = ALIGN(size, PAGE_SIZE);
-	mtt->buf = vzalloc(mtt->size);
+	mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
 	mtt->continuous = false;
 	if (!mtt->buf)
 		goto err_free_mtt;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-25  8:51 [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables lirongqing
@ 2026-02-25 11:33 ` Cheng Xu
  2026-02-25 11:46   ` RE: [External Mail] " Li,Rongqing(ACG CCN)
  2026-02-26  1:51 ` Cheng Xu
  1 sibling, 1 reply; 9+ messages in thread
From: Cheng Xu @ 2026-02-25 11:33 UTC (permalink / raw)
  To: lirongqing, Kai Shen, Jason Gunthorpe, Leon Romanovsky,
	linux-rdma, linux-kernel



On 2/25/26 4:51 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, MTT (Memory Translation Table) buffers are allocated without
> NUMA awareness using kzalloc() and vzalloc(), which allocate memory on
> the NUMA node of the calling CPU. This can lead to cross-node memory
> access latencies if the erdma device is attached to a different NUMA
> socket.
> 
> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers are
> allocated on the local NUMA node of the PCIe device (dev->attrs.numa_node).
> This reduces latency for hardware access and improves performance.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Hi, Li RongQing,

Thanks for the patch. However, I think it is better to keep the current
behavior, for the following reasons:

1. This path is in the control plane, so allocating memory from a remote
   NUMA node should not have a noticeable performance impact.
2. With this change, the driver may fail the allocation when the local NUMA
   node is out of memory, even if other nodes still have available memory.

Thanks,
Cheng Xu

> diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
> index 9f74aad..58da6ef 100644
> --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
> +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
> @@ -604,7 +604,7 @@ static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	mtt->size = size;
> -	mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
> +	mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL, dev->attrs.numa_node);
>  	if (!mtt->buf)
>  		goto err_free_mtt;
>  
> @@ -729,7 +729,7 @@ static struct erdma_mtt *erdma_create_scatter_mtt(struct erdma_dev *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	mtt->size = ALIGN(size, PAGE_SIZE);
> -	mtt->buf = vzalloc(mtt->size);
> +	mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
>  	mtt->continuous = false;
>  	if (!mtt->buf)
>  		goto err_free_mtt;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-25 11:33 ` Cheng Xu
@ 2026-02-25 11:46   ` Li,Rongqing(ACG CCN)
  2026-02-25 12:07     ` Li,Rongqing(ACG CCN)
  0 siblings, 1 reply; 9+ messages in thread
From: Li,Rongqing(ACG CCN) @ 2026-02-25 11:46 UTC (permalink / raw)
  To: Cheng Xu, Kai Shen, Jason Gunthorpe, Leon Romanovsky,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org


> On 2/25/26 4:51 PM, lirongqing wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > Currently, MTT (Memory Translation Table) buffers are allocated
> > without NUMA awareness using kzalloc() and vzalloc(), which allocate
> > memory on the NUMA node of the calling CPU. This can lead to
> > cross-node memory access latencies if the erdma device is attached to
> > a different NUMA socket.
> >
> > Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers are
> > allocated on the local NUMA node of the PCIe device (dev->attrs.numa_node).
> > This reduces latency for hardware access and improves performance.
> >
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> >  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> 
> Hi, Li RongQing,
> 
> Thanks for the patch. However, I think it is better to keep the current behavior,
> for the following reasons:
> 
> 1. This path is in the control plane, so allocating memory from a remote
>    NUMA node should not have a noticeable performance impact.

If a TLB miss or an internal cache miss occurs, does the HCA need to query the MTT?

[Li,Rongqing] 

> 2. With this change, the driver may fail the allocation when the local NUMA
>    node is out of memory, even if other nodes still have available memory.
> 
> Thanks,
> Cheng Xu
> 
> > diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c
> > b/drivers/infiniband/hw/erdma/erdma_verbs.c
> > index 9f74aad..58da6ef 100644
> > --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
> > +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
> > @@ -604,7 +604,7 @@ static struct erdma_mtt
> *erdma_create_cont_mtt(struct erdma_dev *dev,
> >  		return ERR_PTR(-ENOMEM);
> >
> >  	mtt->size = size;
> > -	mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
> > +	mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL,
> > +dev->attrs.numa_node);
> >  	if (!mtt->buf)
> >  		goto err_free_mtt;
> >
> > @@ -729,7 +729,7 @@ static struct erdma_mtt
> *erdma_create_scatter_mtt(struct erdma_dev *dev,
> >  		return ERR_PTR(-ENOMEM);
> >
> >  	mtt->size = ALIGN(size, PAGE_SIZE);
> > -	mtt->buf = vzalloc(mtt->size);
> > +	mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
> >  	mtt->continuous = false;
> >  	if (!mtt->buf)
> >  		goto err_free_mtt;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-25 11:46   ` RE: [External Mail] " Li,Rongqing(ACG CCN)
@ 2026-02-25 12:07     ` Li,Rongqing(ACG CCN)
  2026-02-26  1:50       ` Cheng Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Li,Rongqing(ACG CCN) @ 2026-02-25 12:07 UTC (permalink / raw)
  To: Cheng Xu, Kai Shen, Jason Gunthorpe, Leon Romanovsky,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org


> > On 2/25/26 4:51 PM, lirongqing wrote:
> > > From: Li RongQing <lirongqing@baidu.com>
> > >
> > > Currently, MTT (Memory Translation Table) buffers are allocated
> > > without NUMA awareness using kzalloc() and vzalloc(), which allocate
> > > memory on the NUMA node of the calling CPU. This can lead to
> > > cross-node memory access latencies if the erdma device is attached
> > > to a different NUMA socket.
> > >
> > > Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers
> > > are allocated on the local NUMA node of the PCIe device
> (dev->attrs.numa_node).
> > > This reduces latency for hardware access and improves performance.
> > >
> > > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > > ---
> > >  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> >
> > Hi, Li RongQing,
> >
> > Thanks for the patch. However, I think it is better to keep the
> > current behavior, for the following reasons:
> >
> > 1. This path is in the control plane, so allocating memory from a remote
> >    NUMA node should not have a noticeable performance impact.
> 
> If a TLB miss or an internal cache miss occurs, does the HCA need to query the MTT?
> 
> [Li,Rongqing]
> 
> > 2. With this change, the driver may fail the allocation when the local NUMA
> >    node is out of memory, even if other nodes still have available memory.
> >


When kmalloc_node() is called without __GFP_THISNODE and the target node
lacks sufficient memory, SLUB allocates a folio from a node other than
the requested one.

So I think this is not a problem.

[Li,Rongqing] 
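
The fallback behavior described above can be sketched as a toy userspace model. This is not SLUB code: the fixed per-node pools, their sizes, and the alloc_node() helper are all hypothetical stand-ins used only to illustrate the point that a node-preferring request without a THISNODE-style flag falls back to another node instead of failing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: two "NUMA nodes", each a small fixed pool standing in
 * for a per-node freelist. Sizes are arbitrary illustration values. */
#define NNODES 2

static char pool[NNODES][64];
static size_t used[NNODES];

/* Allocate zeroed memory, preferring `node`. If `thisnode_only` is
 * set (modeling __GFP_THISNODE), never fall back to another node. */
static void *alloc_node(size_t size, int node, int thisnode_only)
{
	int n;

	/* Try the requested node first. */
	if (used[node] + size <= sizeof(pool[node])) {
		void *p = &pool[node][used[node]];
		used[node] += size;
		return memset(p, 0, size); /* zeroing, like kzalloc_node() */
	}
	if (thisnode_only)
		return NULL; /* strict: requested node or nothing */

	/* Without the strict flag, fall back to any node with room. */
	for (n = 0; n < NNODES; n++) {
		if (n != node && used[n] + size <= sizeof(pool[n])) {
			void *p = &pool[n][used[n]];
			used[n] += size;
			return memset(p, 0, size);
		}
	}
	return NULL;
}
```

Exhausting node 0 and allocating again succeeds from node 1 unless thisnode_only is set, which mirrors the observation that plain kzalloc_node() does not make out-of-memory failures more likely.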



> > Thanks,
> > Cheng Xu
> >
> > > diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c
> > > b/drivers/infiniband/hw/erdma/erdma_verbs.c
> > > index 9f74aad..58da6ef 100644
> > > --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
> > > +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
> > > @@ -604,7 +604,7 @@ static struct erdma_mtt
> > *erdma_create_cont_mtt(struct erdma_dev *dev,
> > >  		return ERR_PTR(-ENOMEM);
> > >
> > >  	mtt->size = size;
> > > -	mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
> > > +	mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL,
> > > +dev->attrs.numa_node);
> > >  	if (!mtt->buf)
> > >  		goto err_free_mtt;
> > >
> > > @@ -729,7 +729,7 @@ static struct erdma_mtt
> > *erdma_create_scatter_mtt(struct erdma_dev *dev,
> > >  		return ERR_PTR(-ENOMEM);
> > >
> > >  	mtt->size = ALIGN(size, PAGE_SIZE);
> > > -	mtt->buf = vzalloc(mtt->size);
> > > +	mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
> > >  	mtt->continuous = false;
> > >  	if (!mtt->buf)
> > >  		goto err_free_mtt;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RE: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-25 12:07     ` Li,Rongqing(ACG CCN)
@ 2026-02-26  1:50       ` Cheng Xu
  2026-02-26  7:09         ` Leon Romanovsky
  0 siblings, 1 reply; 9+ messages in thread
From: Cheng Xu @ 2026-02-26  1:50 UTC (permalink / raw)
  To: Li,Rongqing(ACG CCN), Kai Shen, Jason Gunthorpe, Leon Romanovsky,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org



On 2/25/26 8:07 PM, Li,Rongqing(ACG CCN) wrote:
> 
>>> On 2/25/26 4:51 PM, lirongqing wrote:
>>>> From: Li RongQing <lirongqing@baidu.com>
>>>>
>>>> Currently, MTT (Memory Translation Table) buffers are allocated
>>>> without NUMA awareness using kzalloc() and vzalloc(), which allocate
>>>> memory on the NUMA node of the calling CPU. This can lead to
>>>> cross-node memory access latencies if the erdma device is attached
>>>> to a different NUMA socket.
>>>>
>>>> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers
>>>> are allocated on the local NUMA node of the PCIe device
>> (dev->attrs.numa_node).
>>>> This reduces latency for hardware access and improves performance.
>>>>
>>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>>>> ---
>>>>  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>
>>> Hi, Li RongQing,
>>>
>>> Thanks for the patch. However, I think it is better to keep the
>>> current behavior, for the following reasons:
>>>
>>> 1. This path is in the control plane, so allocating memory from a remote
>>>    NUMA node should not have a noticeable performance impact.
>>
>> If a TLB miss or an internal cache miss occurs, does the HCA need to query the MTT?
>>

This rarely happens in our chip.

>> [Li,Rongqing]
>>
>>> 2. With this change, the driver may fail the allocation when the local NUMA
>>>    node is out of memory, even if other nodes still have available memory.
>>>
> 
> 
> When kmalloc_node() is called without __GFP_THISNODE and the target node
> lacks sufficient memory, SLUB allocates a folio from a node other than
> the requested one.
> 

You are right, thank you for pointing this out.

Cheng Xu

> So I think this is not a problem.
> 
> [Li,Rongqing] 
> 
> 
> 
>>> Thanks,
>>> Cheng Xu
>>>
>>>> diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> b/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> index 9f74aad..58da6ef 100644
>>>> --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
>>>> @@ -604,7 +604,7 @@ static struct erdma_mtt
>>> *erdma_create_cont_mtt(struct erdma_dev *dev,
>>>>  		return ERR_PTR(-ENOMEM);
>>>>
>>>>  	mtt->size = size;
>>>> -	mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
>>>> +	mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL,
>>>> +dev->attrs.numa_node);
>>>>  	if (!mtt->buf)
>>>>  		goto err_free_mtt;
>>>>
>>>> @@ -729,7 +729,7 @@ static struct erdma_mtt
>>> *erdma_create_scatter_mtt(struct erdma_dev *dev,
>>>>  		return ERR_PTR(-ENOMEM);
>>>>
>>>>  	mtt->size = ALIGN(size, PAGE_SIZE);
>>>> -	mtt->buf = vzalloc(mtt->size);
>>>> +	mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
>>>>  	mtt->continuous = false;
>>>>  	if (!mtt->buf)
>>>>  		goto err_free_mtt;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-25  8:51 [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables lirongqing
  2026-02-25 11:33 ` Cheng Xu
@ 2026-02-26  1:51 ` Cheng Xu
  1 sibling, 0 replies; 9+ messages in thread
From: Cheng Xu @ 2026-02-26  1:51 UTC (permalink / raw)
  To: lirongqing, Kai Shen, Jason Gunthorpe, Leon Romanovsky
  Cc: linux-rdma@vger.kernel.org



On 2/25/26 4:51 PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Currently, MTT (Memory Translation Table) buffers are allocated without
> NUMA awareness using kzalloc() and vzalloc(), which allocate memory on
> the NUMA node of the calling CPU. This can lead to cross-node memory
> access latencies if the erdma device is attached to a different NUMA
> socket.
> 
> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers are
> allocated on the local NUMA node of the PCIe device (dev->attrs.numa_node).
> This reduces latency for hardware access and improves performance.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Acked-by: Cheng Xu <chengyou@linux.alibaba.com>

> diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
> index 9f74aad..58da6ef 100644
> --- a/drivers/infiniband/hw/erdma/erdma_verbs.c
> +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
> @@ -604,7 +604,7 @@ static struct erdma_mtt *erdma_create_cont_mtt(struct erdma_dev *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	mtt->size = size;
> -	mtt->buf = kzalloc(mtt->size, GFP_KERNEL);
> +	mtt->buf = kzalloc_node(mtt->size, GFP_KERNEL, dev->attrs.numa_node);
>  	if (!mtt->buf)
>  		goto err_free_mtt;
>  
> @@ -729,7 +729,7 @@ static struct erdma_mtt *erdma_create_scatter_mtt(struct erdma_dev *dev,
>  		return ERR_PTR(-ENOMEM);
>  
>  	mtt->size = ALIGN(size, PAGE_SIZE);
> -	mtt->buf = vzalloc(mtt->size);
> +	mtt->buf = vzalloc_node(mtt->size, dev->attrs.numa_node);
>  	mtt->continuous = false;
>  	if (!mtt->buf)
>  		goto err_free_mtt;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RE: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-26  1:50       ` Cheng Xu
@ 2026-02-26  7:09         ` Leon Romanovsky
  2026-02-26  7:59           ` Cheng Xu
  2026-02-27 14:55           ` Jason Gunthorpe
  0 siblings, 2 replies; 9+ messages in thread
From: Leon Romanovsky @ 2026-02-26  7:09 UTC (permalink / raw)
  To: Cheng Xu
  Cc: Li,Rongqing(ACG CCN), Kai Shen, Jason Gunthorpe,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Feb 26, 2026 at 09:50:00AM +0800, Cheng Xu wrote:
> 
> 
> On 2/25/26 8:07 PM, Li,Rongqing(ACG CCN) wrote:
> > 
> >>> On 2/25/26 4:51 PM, lirongqing wrote:
> >>>> From: Li RongQing <lirongqing@baidu.com>
> >>>>
> >>>> Currently, MTT (Memory Translation Table) buffers are allocated
> >>>> without NUMA awareness using kzalloc() and vzalloc(), which allocate
> >>>> memory on the NUMA node of the calling CPU. This can lead to
> >>>> cross-node memory access latencies if the erdma device is attached
> >>>> to a different NUMA socket.
> >>>>
> >>>> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers
> >>>> are allocated on the local NUMA node of the PCIe device
> >> (dev->attrs.numa_node).
> >>>> This reduces latency for hardware access and improves performance.
> >>>>
> >>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> >>>> ---
> >>>>  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
> >>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>
> >>> Hi, Li RongQing,
> >>>
> >>> Thanks for the patch. However, I think it is better to keep the
> >>> current behavior, for the following reasons:
> >>>
> >>> 1. This path is in the control plane, so allocating memory from a remote
> >>>    NUMA node should not have a noticeable performance impact.
> >>
> >> If a TLB miss or an internal cache miss occurs, does the HCA need to query the MTT?
> >>
> 
> This rarely happens in our chip.

So why do we need this patch? The xxx_node() functions are useful when you
need to force allocation on a specific NUMA node. In most cases, a plain
kmalloc() will allocate memory on the same node as 'struct erdma_dev *dev',
which typically matches the PCI device's NUMA node.

Please avoid vague phrasing like 'potentially improves performance' in the
commit message and responses. It adds no meaningful information.

Also, please remove the dev->attrs.numa_node caching from erdma and rely on
dev_to_node() instead.
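
The suggested shape can be sketched in userspace, with struct device and dev_to_node() mocked; the real definitions live in the kernel's device model, mtt_buf_alloc() is a hypothetical helper name, and calloc() stands in for kzalloc_node():

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel type; the real dev_to_node()
 * reads the NUMA node recorded in struct device. */
struct device {
	int numa_node;
};

static int dev_to_node(const struct device *dev)
{
	return dev->numa_node;
}

/* Shape suggested above: derive the node from the struct device at
 * the allocation site instead of caching it in dev->attrs.numa_node.
 * calloc() mocks kzalloc_node(); `node` is unused in this sketch. */
static void *mtt_buf_alloc(const struct device *dev, size_t size)
{
	int node = dev_to_node(dev);

	(void)node; /* a real driver would pass this to kzalloc_node() */
	return calloc(1, size);
}
```

The design point is that the node is looked up where it is needed, so the driver carries no duplicate copy of state the device model already tracks.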

Thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RE: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-26  7:09         ` Leon Romanovsky
@ 2026-02-26  7:59           ` Cheng Xu
  2026-02-27 14:55           ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Cheng Xu @ 2026-02-26  7:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Li,Rongqing(ACG CCN), Kai Shen, Jason Gunthorpe,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org



On 2/26/26 3:09 PM, Leon Romanovsky wrote:
> On Thu, Feb 26, 2026 at 09:50:00AM +0800, Cheng Xu wrote:
>>
>>
>> On 2/25/26 8:07 PM, Li,Rongqing(ACG CCN) wrote:
>>>
>>>>> On 2/25/26 4:51 PM, lirongqing wrote:
>>>>>> From: Li RongQing <lirongqing@baidu.com>
>>>>>>
>>>>>> Currently, MTT (Memory Translation Table) buffers are allocated
>>>>>> without NUMA awareness using kzalloc() and vzalloc(), which allocate
>>>>>> memory on the NUMA node of the calling CPU. This can lead to
>>>>>> cross-node memory access latencies if the erdma device is attached
>>>>>> to a different NUMA socket.
>>>>>>
>>>>>> Switch to kzalloc_node() and vzalloc_node() to ensure MTT buffers
>>>>>> are allocated on the local NUMA node of the PCIe device
>>>> (dev->attrs.numa_node).
>>>>>> This reduces latency for hardware access and improves performance.
>>>>>>
>>>>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>>>>>> ---
>>>>>>  drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++--
>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>
>>>>> Hi, Li RongQing,
>>>>>
>>>>> Thanks for the patch. However, I think it is better to keep the
>>>>> current behavior, for the following reasons:
>>>>>
>>>>> 1. This path is in the control plane, so allocating memory from a remote
>>>>>    NUMA node should not have a noticeable performance impact.
>>>>
>>>> If a TLB miss or an internal cache miss occurs, does the HCA need to query the MTT?
>>>>
>>
>> This rarely happens in our chip.
> 
> So why do we need this patch? The xxx_node() functions are useful when you
> need to force allocation on a specific NUMA node. In most cases, a plain
> kmalloc() will allocate memory on the same node as 'struct erdma_dev *dev',
> which typically matches the PCI device's NUMA node.
> 

Thanks for the detailed explanation.

> Please avoid vague phrasing like 'potentially improves performance' in the
> commit message and responses. It adds no meaningful information.
> 

Got it. 


> Also, please remove the dev->attrs.numa_node caching from erdma and rely on
> dev_to_node() instead.

OK, I will fix this.

Thanks,
Cheng Xu

> 
> Thanks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RE: [External Mail] Re: [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables
  2026-02-26  7:09         ` Leon Romanovsky
  2026-02-26  7:59           ` Cheng Xu
@ 2026-02-27 14:55           ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2026-02-27 14:55 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Cheng Xu, Li,Rongqing(ACG CCN), Kai Shen,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Feb 26, 2026 at 09:09:54AM +0200, Leon Romanovsky wrote:
> So why do we need this patch? The xxx_node() functions are useful when you
> need to force allocation on a specific NUMA node. In most cases, a plain
> kmalloc() will allocate memory on the same node as 'struct erdma_dev *dev',
> which typically matches the PCI device's NUMA node.

I think a naked kmalloc() allocates memory on the NUMA node of the
thread that calls it, which is not necessarily the dev's node.

IMHO it is best practice to allocate DMA'able memory from the NUMA
node of the struct device.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-02-27 14:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-25  8:51 [PATCH][rdma-next] RDMA/erdma: Use NUMA-aware allocation for MTT tables lirongqing
2026-02-25 11:33 ` Cheng Xu
2026-02-25 11:46   ` RE: [External Mail] " Li,Rongqing(ACG CCN)
2026-02-25 12:07     ` Li,Rongqing(ACG CCN)
2026-02-26  1:50       ` Cheng Xu
2026-02-26  7:09         ` Leon Romanovsky
2026-02-26  7:59           ` Cheng Xu
2026-02-27 14:55           ` Jason Gunthorpe
2026-02-26  1:51 ` Cheng Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox