[PATCH net-next v7] net/mlx5: Reclaim max 50K pages at once

public inbox for linux-kernel@vger.kernel.org

* [PATCH net-next v7] net/mlx5: Reclaim max 50K pages at once
@ 2024-07-22 13:46 Anand Khoje
From: Anand Khoje @ 2024-07-22 13:46 UTC
  To: linux-rdma, linux-kernel, netdev
  Cc: saeedm, leon, tariqt, edumazet, kuba, pabeni, davem,
	rama.nichanamatlu, manjunath.b.patil

In a non-FLR context, CX-5 at times requests the release of ~8 million
FW pages. This needs a huge number of command mailboxes, which have to
be freed once the pages are reclaimed. Freeing that many mailboxes
consumes many seconds of CPU time, which on non-preemptible kernels
starves critical processes on that CPU's run queue.
On top of that, the FW does not use all of these mailbox messages, as
it releases at most 50K pages per MLX5_CMD_OP_MANAGE_PAGES +
MLX5_PAGES_TAKE device command. Allocating that many mailboxes is
therefore unnecessary overhead.
To alleviate this, this change limits the number of pages a worker
tries to reclaim in one go to at most 50K.

Our tests show a significant reduction in the time consumed by
dma_pool_free(). During a test in which the HCA raised an event to
release 1.3 million pages, the following was observed:

- Without this change:
Around 20K mailbox messages were allocated to hold the DMA addresses
of 1.3 million pages.
The average time spent in each dma_pool_free() call was between 16
and 32 usec (histogram values are in nsec):
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@                                        287
            1024 |@@@                                      1332
            2048 |@                                        656
            4096 |@@@@@                                    2599
            8192 |@@@@@@@@@@                               4755
           16384 |@@@@@@@@@@@@@@@                          7545
           32768 |@@@@@                                    2501
           65536 |                                         0

- With this change:
Around 800 mailbox messages were allocated, enough to hold the DMA
addresses of only 50K pages.
The average time spent in each dma_pool_free() call dropped to
between 1 and 2 usec:
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@@@@                       346
            1024 |@@@@@@@@@@@@@@@@@@@@@@                   435
            2048 |                                         0
            4096 |                                         0
            8192 |                                         1
           16384 |                                         0

Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index d894a88..972e8e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -608,6 +608,11 @@ enum {
 	RELEASE_ALL_PAGES_MASK = 0x4000,
 };
 
+/* This limit is based on the capability of the firmware, as it cannot
+ * release more than 50000 pages back to the host in one go.
+ */
+#define MAX_RECLAIM_NPAGES (-50000)
+
 static int req_pages_handler(struct notifier_block *nb,
 			     unsigned long type, void *data)
 {
@@ -639,7 +644,16 @@ static int req_pages_handler(struct notifier_block *nb,
 
 	req->dev = dev;
 	req->func_id = func_id;
-	req->npages = npages;
+
+	/* npages > 0 means the HCA is asking the host to allocate/give pages,
+	 * npages < 0 means the HCA is asking the host to reclaim pages it
+	 * was given earlier.
+	 * Here we restrict the maximum number of pages that can be reclaimed
+	 * in one go to MAX_RECLAIM_NPAGES. Since MAX_RECLAIM_NPAGES is a
+	 * negative value, max() (and not min()) is used to restrict
+	 * req->npages.
+	 */
+	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
 	req->ec_function = ec_function;
 	req->release_all = release_all;
 	INIT_WORK(&req->work, pages_work_handler);
-- 
1.8.3.1


* Re: [PATCH net-next v7] net/mlx5: Reclaim max 50K pages at once
@ 2024-07-23 11:03 Paolo Abeni
From: Paolo Abeni @ 2024-07-23 11:03 UTC
  To: Anand Khoje, linux-rdma, linux-kernel, netdev
  Cc: saeedm, leon, tariqt, edumazet, kuba, davem, rama.nichanamatlu,
	manjunath.b.patil

On 7/22/24 15:46, Anand Khoje wrote:
> [...]

## Form letter - net-next-closed

The merge window for v6.11 and therefore net-next is closed for new
drivers, features, code refactoring and optimizations. We are currently
accepting bug fixes only.

Please repost when net-next reopens after July 29th.

RFC patches sent for review only are obviously welcome at any time.

See:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer


* [PATCH net-next v7] net/mlx5: Reclaim max 50K pages at once
@ 2024-07-30  7:36 Anand Khoje
From: Anand Khoje @ 2024-07-30  7:36 UTC
  To: linux-rdma, linux-kernel, netdev
  Cc: saeedm, leon, tariqt, edumazet, kuba, pabeni, davem,
	rama.nichanamatlu, manjunath.b.patil

In a non-FLR context, CX-5 at times requests the release of ~8 million
FW pages. This needs a huge number of command mailboxes, which have to
be freed once the pages are reclaimed. Freeing that many mailboxes
consumes many seconds of CPU time, which on non-preemptible kernels
starves critical processes on that CPU's run queue.
On top of that, the FW does not use all of these mailbox messages, as
it releases at most 50K pages per MLX5_CMD_OP_MANAGE_PAGES +
MLX5_PAGES_TAKE device command. Allocating that many mailboxes is
therefore unnecessary overhead.
To alleviate this, this change limits the number of pages a worker
tries to reclaim in one go to at most 50K.

Our tests show a significant reduction in the time consumed by
dma_pool_free(). During a test in which the HCA raised an event to
release 1.3 million pages, the following was observed:

- Without this change:
Around 20K mailbox messages were allocated to hold the DMA addresses
of 1.3 million pages.
The average time spent in each dma_pool_free() call was between 16
and 32 usec (histogram values are in nsec):
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@                                        287
            1024 |@@@                                      1332
            2048 |@                                        656
            4096 |@@@@@                                    2599
            8192 |@@@@@@@@@@                               4755
           16384 |@@@@@@@@@@@@@@@                          7545
           32768 |@@@@@                                    2501
           65536 |                                         0

- With this change:
Around 800 mailbox messages were allocated, enough to hold the DMA
addresses of only 50K pages.
The average time spent in each dma_pool_free() call dropped to
between 1 and 2 usec:
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@@@@                       346
            1024 |@@@@@@@@@@@@@@@@@@@@@@                   435
            2048 |                                         0
            4096 |                                         0
            8192 |                                         1
           16384 |                                         0

Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index d894a88..972e8e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -608,6 +608,11 @@ enum {
 	RELEASE_ALL_PAGES_MASK = 0x4000,
 };
 
+/* This limit is based on the capability of the firmware, as it cannot
+ * release more than 50000 pages back to the host in one go.
+ */
+#define MAX_RECLAIM_NPAGES (-50000)
+
 static int req_pages_handler(struct notifier_block *nb,
 			     unsigned long type, void *data)
 {
@@ -639,7 +644,16 @@ static int req_pages_handler(struct notifier_block *nb,
 
 	req->dev = dev;
 	req->func_id = func_id;
-	req->npages = npages;
+
+	/* npages > 0 means the HCA is asking the host to allocate/give pages,
+	 * npages < 0 means the HCA is asking the host to reclaim pages it
+	 * was given earlier.
+	 * Here we restrict the maximum number of pages that can be reclaimed
+	 * in one go to MAX_RECLAIM_NPAGES. Since MAX_RECLAIM_NPAGES is a
+	 * negative value, max() (and not min()) is used to restrict
+	 * req->npages.
+	 */
+	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
 	req->ec_function = ec_function;
 	req->release_all = release_all;
 	INIT_WORK(&req->work, pages_work_handler);
-- 
1.8.3.1


* Re: [PATCH net-next v7] net/mlx5: Reclaim max 50K pages at once
@ 2024-08-01 16:10 patchwork-bot+netdevbpf
From: patchwork-bot+netdevbpf @ 2024-08-01 16:10 UTC
  To: Anand Khoje
  Cc: linux-rdma, linux-kernel, netdev, saeedm, leon, tariqt, edumazet,
	kuba, pabeni, davem, rama.nichanamatlu, manjunath.b.patil

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 30 Jul 2024 13:06:33 +0530 you wrote:
> [...]

Here is the summary with links:
  - [net-next,v7] net/mlx5: Reclaim max 50K pages at once
    https://git.kernel.org/netdev/net-next/c/501c3005f031

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


