public inbox for linux-rdma@vger.kernel.org
* Understanding the allocation size of mlx5_alloc_buf
@ 2023-07-06 12:48 Olaf.Krzikalla
  2023-07-06 16:47 ` Leon Romanovsky
  0 siblings, 1 reply; 2+ messages in thread
From: Olaf.Krzikalla @ 2023-07-06 12:48 UTC
  To: linux-rdma

Hi @all,

Creating connections via create_qp fails on our cluster with an out-of-memory error at rather small process counts (128 processes work, 256 do not). I've tracked the issue down to mlx5_alloc_buf, which allocates ~500kB per call. That seems like a lot.

heaptrack tells me the following:

34.47M peak memory consumed over 92 calls from
mlx5_alloc_buf
  in /usr/lib64/libibverbs/libmlx5-rdmav34.so
8.65M consumed over 16 calls from:
    create_qp
      in /usr/lib64/libibverbs/libmlx5-rdmav34.so
    mlx5_create_qp
      in /usr/lib64/libibverbs/libmlx5-rdmav34.so
.
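
For reference, the per-call figure can be reproduced from the heaptrack totals above. This is only a sketch of the arithmetic: it assumes heaptrack's "M" means 10^6 bytes (with 2^20 the result is about 5% larger), and the helper name avg_kb_per_call is made up for illustration.

```python
# Average allocation size per call, derived from the heaptrack totals.
# Assumption: "M" in the heaptrack output means 10**6 bytes.

def avg_kb_per_call(total_m: float, calls: int) -> float:
    """Return the mean allocation size in kB for a total given in M."""
    return total_m * 1e6 / calls / 1e3

# 8.65M over 16 calls reached via create_qp / mlx5_create_qp
qp_avg_kb = avg_kb_per_call(8.65, 16)

# 34.47M peak over 92 calls to mlx5_alloc_buf from all callers
all_avg_kb = avg_kb_per_call(34.47, 92)

print(f"create_qp path: {qp_avg_kb:.1f} kB per call")
print(f"all callers:    {all_avg_kb:.1f} kB per call")
```

These are averages only. The second total is a peak figure, so individual allocations may be larger or smaller than the mean.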

Can anyone help me understand what causes a 500kB allocation in create_qp? Maybe it is a configuration issue that I can address somehow.

Thanks for your help and best regards,
Olaf Krzikalla


System information:
CentOS Linux 7 (Core)
Linux 3.10.0-1160.88.1.el7.x86_64
CA 'mlx5_0'
        CA type: MT4123
        Number of ports: 1
        Firmware version: 20.33.1048
        Hardware version: 0


* Re: Understanding the allocation size of mlx5_alloc_buf
  2023-07-06 12:48 Understanding the allocation size of mlx5_alloc_buf Olaf.Krzikalla
@ 2023-07-06 16:47 ` Leon Romanovsky
  0 siblings, 0 replies; 2+ messages in thread
From: Leon Romanovsky @ 2023-07-06 16:47 UTC
  To: Olaf.Krzikalla; +Cc: linux-rdma

On Thu, Jul 06, 2023 at 12:48:46PM +0000, Olaf.Krzikalla@dlr.de wrote:
> Hi @all,
> 
> Creating connections via create_qp fails on our cluster with an out-of-memory error at rather small process counts (128 processes work, 256 do not). I've tracked the issue down to mlx5_alloc_buf, which allocates ~500kB per call. That seems like a lot.
> 
> heaptrack tells me the following:
> 
> 34.47M peak memory consumed over 92 calls from
> mlx5_alloc_buf
>   in /usr/lib64/libibverbs/libmlx5-rdmav34.so
> 8.65M consumed over 16 calls from:
>     create_qp
>       in /usr/lib64/libibverbs/libmlx5-rdmav34.so
>     mlx5_create_qp
>       in /usr/lib64/libibverbs/libmlx5-rdmav34.so
> .
> 
> Can anyone help me understand what causes a 500kB allocation in create_qp? Maybe it is a configuration issue that I can address somehow.
> 
> Thanks for your help and best regards,
> Olaf Krzikalla
> 
> 
> System information:
> CentOS Linux 7 (Core)
> Linux 3.10.0-1160.88.1.el7.x86_64

Please contact your Nvidia support representative. You are running a distro kernel,
not upstream Linux.

Thanks

> CA 'mlx5_0'
>         CA type: MT4123
>         Number of ports: 1
>         Firmware version: 20.33.1048
>         Hardware version: 0
> 
