* Understanding the allocation size of mlx5_alloc_buf
@ 2023-07-06 12:48 Olaf.Krzikalla
2023-07-06 16:47 ` Leon Romanovsky
From: Olaf.Krzikalla @ 2023-07-06 12:48 UTC
To: linux-rdma
Hi @all,
Creating connections via create_qp fails on our cluster with an out-of-memory error, even for rather small numbers of processes (128 works, 256 does not). I've tracked the issue down to an mlx5_alloc_buf call, which allocates ~500kB per call. That seems like a lot.
heaptrack tells me the following:
34.47M peak memory consumed over 92 calls from
mlx5_alloc_buf
  in /usr/lib64/libibverbs/libmlx5-rdmav34.so
  8.65M consumed over 16 calls from:
    create_qp
      in /usr/lib64/libibverbs/libmlx5-rdmav34.so
    mlx5_create_qp
      in /usr/lib64/libibverbs/libmlx5-rdmav34.so
.
Can anyone help me understand what causes a 500kB allocation in create_qp? Maybe it is some sort of configuration issue that I can fix on my side.
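The ~500kB figure can be cross-checked against the heaptrack totals quoted above. The following is only a small arithmetic sketch: it assumes that heaptrack's "M" means 10^6 bytes and that all calls in a group allocate about the same amount. The helper name avg_per_call is made up for this illustration.

```python
def avg_per_call(total_megabytes, calls, bytes_per_megabyte=1_000_000):
    """Average allocation size in bytes, given a total and a call count.

    bytes_per_megabyte is an assumption: heaptrack's "M" is taken as 10**6.
    """
    return total_megabytes * bytes_per_megabyte / calls


# Numbers copied from the heaptrack report in this message.
create_qp_avg = avg_per_call(8.65, 16)   # calls reached via create_qp
overall_avg = avg_per_call(34.47, 92)    # all mlx5_alloc_buf calls

print(f"create_qp path: {create_qp_avg / 1000:.1f} kB per call")
print(f"all callers:    {overall_avg / 1000:.1f} kB per call")
```

Under these assumptions the create_qp path averages roughly 540 kB per call, which is consistent with the ~500kB estimate.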
Thanks for your help and best regards,
Olaf Krzikalla
System information:
CentOS Linux 7 (Core)
Linux 3.10.0-1160.88.1.el7.x86_64
CA 'mlx5_0'
CA type: MT4123
Number of ports: 1
Firmware version: 20.33.1048
Hardware version: 0
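As background to the question, here is a rough sketch of how a queue pair's work-queue buffer can grow with the requested queue depths. This is a toy model, not the code of libmlx5 or rdma-core. The 64-byte send block, the 16-byte scatter/gather entry, the power-of-two rounding, and the sq_overhead parameter are all assumptions made for illustration; the parameter names merely mirror the cap fields of ibv_qp_init_attr. Page alignment and any other per-QP allocations are ignored.

```python
SEND_WQE_BB = 64    # assumed size of one send work-queue basic block, bytes
DATA_SEG_SIZE = 16  # assumed size of one scatter/gather entry, bytes


def roundup_pow2(n):
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def estimate_qp_buf_bytes(max_send_wr, max_recv_wr,
                          max_send_sge, max_recv_sge, sq_overhead):
    """Estimate send + receive queue bytes for one QP (toy model).

    sq_overhead is a hypothetical per-entry header size in bytes; the
    real value depends on the QP type and is not taken from the thread.
    """
    send_wqe = align_up(sq_overhead + max_send_sge * DATA_SEG_SIZE,
                        SEND_WQE_BB)
    sq_bytes = roundup_pow2(max_send_wr * send_wqe)
    recv_wqe = roundup_pow2(max_recv_sge * DATA_SEG_SIZE)
    rq_bytes = max(roundup_pow2(max_recv_wr) * recv_wqe, SEND_WQE_BB)
    return sq_bytes + rq_bytes


# Hypothetical request: 4096 send and 4096 receive entries, one SGE each.
example = estimate_qp_buf_bytes(4096, 4096, 1, 1, sq_overhead=48)
print(f"estimated buffer: {example} bytes")
```

If the model is roughly right, queue depths in the thousands are enough to reach several hundred kB per QP, so the max_send_wr and max_recv_wr values that the application or MPI layer passes to create_qp would be the first thing to check.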
* Re: Understanding the allocation size of mlx5_alloc_buf
From: Leon Romanovsky @ 2023-07-06 16:47 UTC
To: Olaf.Krzikalla; +Cc: linux-rdma
On Thu, Jul 06, 2023 at 12:48:46PM +0000, Olaf.Krzikalla@dlr.de wrote:
> Hi @all,
>
> Creating connections via create_qp fails on our cluster with an out-of-memory error, even for rather small numbers of processes (128 works, 256 does not). I've tracked the issue down to an mlx5_alloc_buf call, which allocates ~500kB per call. That seems like a lot.
>
> heaptrack tells me the following:
>
> 34.47M peak memory consumed over 92 calls from
> mlx5_alloc_buf
> in /usr/lib64/libibverbs/libmlx5-rdmav34.so
> 8.65M consumed over 16 calls from:
> create_qp
> in /usr/lib64/libibverbs/libmlx5-rdmav34.so
> mlx5_create_qp
> in /usr/lib64/libibverbs/libmlx5-rdmav34.so
> .
>
> Can anyone help me understand what causes a 500kB allocation in create_qp? Maybe it is some sort of configuration issue that I can fix on my side.
>
> Thanks for help and best regards
> Olaf Krzikalla
>
>
> System information:
> CentOS Linux 7 (Core)
> Linux 3.10.0-1160.88.1.el7.x86_64
Please contact your Nvidia support representative. You are running a distro kernel,
not upstream Linux.
Thanks
> CA 'mlx5_0'
> CA type: MT4123
> Number of ports: 1
> Firmware version: 20.33.1048
> Hardware version: 0