* [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: Anand Khoje @ 2024-06-24 15:33 UTC
To: linux-rdma, linux-kernel, netdev
Cc: saeedm, leon, tariqt, edumazet, kuba, pabeni, davem
In a non-FLR context, the CX-5 at times requests release of ~8 million
FW pages. This requires a huge number of cmd mailboxes, all of which must
be freed once the pages are reclaimed. Freeing that many cmd mailboxes
consumes CPU time running into many seconds, which on non-preemptible
kernels leads to critical processes starving on that CPU's runqueue.

To alleviate this, this change restricts the number of pages a worker
will try to reclaim to a maximum of 50K in one go. The 50K limit is
aligned with the current firmware capacity/limit of releasing 50K pages
at once per MLX5_CMD_OP_MANAGE_PAGES + MLX5_PAGES_TAKE device command.

Our tests have shown a significant benefit from this change in terms of
the time consumed by dma_pool_free().
During a test in which the HCA raised an event to release 1.3 million
pages, the following observations were made:

- Without this change:
The number of mailbox messages allocated was around 20K, to accommodate
the DMA addresses of 1.3 million pages.
The average time spent per dma_pool_free() call was between 16 usec and
32 usec (distribution in nanoseconds):
value ------------- Distribution ------------- count
256 | 0
512 |@ 287
1024 |@@@ 1332
2048 |@ 656
4096 |@@@@@ 2599
8192 |@@@@@@@@@@ 4755
16384 |@@@@@@@@@@@@@@@ 7545
32768 |@@@@@ 2501
65536 | 0
- With this change:
The number of mailbox messages allocated was around 800, to accommodate
the DMA addresses of only 50K pages.
The average time spent per dma_pool_free() call in this case was between
1 usec and 2 usec (distribution in nanoseconds):
value ------------- Distribution ------------- count
256 | 0
512 |@@@@@@@@@@@@@@@@@@ 346
1024 |@@@@@@@@@@@@@@@@@@@@@@ 435
2048 | 0
4096 | 0
8192 | 1
16384 | 0
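
These mailbox counts are consistent with each cmd mailbox carrying 512
bytes of payload, i.e. 64 eight-byte page addresses (the 512-byte data
block size is an assumption about the mlx5 cmd interface, not something
stated in this patch). A quick back-of-the-envelope check:

/* Rough check of the mailbox counts quoted above. Assumes 512 bytes of
 * payload per cmd mailbox, i.e. 64 eight-byte DMA addresses per mailbox
 * (an assumption, not taken from this patch).
 */
#include <stdio.h>

int main(void)
{
	const long addrs_per_mbox = 512 / 8;		/* 64 */

	printf("%ld\n", 1300000 / addrs_per_mbox);	/* 20312 -> "around 20K" */
	printf("%ld\n", 50000 / addrs_per_mbox);	/* 781   -> "around 800" */
	return 0;
}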
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
---
Changes in v5:
- Made changes as per a suggestion from Leon.
---
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index d894a88..1fc583b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -608,6 +608,7 @@ enum {
 	RELEASE_ALL_PAGES_MASK = 0x4000,
 };
 
+#define MAX_RECLAIM_NPAGES -50000
 static int req_pages_handler(struct notifier_block *nb,
 			     unsigned long type, void *data)
 {
@@ -639,7 +640,7 @@ static int req_pages_handler(struct notifier_block *nb,
 	req->dev = dev;
 	req->func_id = func_id;
-	req->npages = npages;
+	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
 	req->ec_function = ec_function;
 	req->release_all = release_all;
 	INIT_WORK(&req->work, pages_work_handler);
--
1.8.3.1
* Re: [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: Jesse Brandeburg @ 2024-06-24 20:41 UTC
To: Anand Khoje, linux-rdma, linux-kernel, netdev
Cc: saeedm, leon, tariqt, edumazet, kuba, pabeni, davem
On 6/24/2024 8:33 AM, Anand Khoje wrote:
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> @@ -608,6 +608,7 @@ enum {
> RELEASE_ALL_PAGES_MASK = 0x4000,
> };
>
> +#define MAX_RECLAIM_NPAGES -50000
Can you please explain why this is negative? There doesn't seem to be
any reason mentioned in the commit message or code.
At the very least it's super confusing to have a MAX be negative, and at
worst it's a bug. I don't have any other context on this code besides
this patch, so an explanation would be helpful.
* Re: [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: Anand Khoje @ 2024-06-25 5:00 UTC
To: Jesse Brandeburg, linux-rdma, linux-kernel, netdev
Cc: saeedm, leon, tariqt, edumazet, kuba, pabeni, davem
On 6/25/24 02:11, Jesse Brandeburg wrote:
> On 6/24/2024 8:33 AM, Anand Khoje wrote:
>
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> @@ -608,6 +608,7 @@ enum {
>> RELEASE_ALL_PAGES_MASK = 0x4000,
>> };
>>
>> +#define MAX_RECLAIM_NPAGES -50000
> Can you please explain why this is negative? There doesn't seem to be
> any reason mentioned in the commit message or code.
>
> At the very least it's super confusing to have a MAX be negative, and at
> worst it's a bug. I don't have any other context on this code besides
> this patch, so an explanation would be helpful.
>
>
>
Hi Jesse,
The way the Mellanox ConnectX-5 driver handles 'release of allocated
pages from the HCA' or 'allocation of pages to the HCA' is by the device
sending an event to the host. This event carries the number of pages. If
the number is positive, the HCA is requesting that many pages to be
allocated; if it is negative, the HCA is indicating that that many pages
can be reclaimed by the host.

In this patch we are restricting the maximum number of pages that can be
reclaimed in one go to 50000 (effectively -50000, as it is a reclaim).
This limit is based on the capability of the firmware, which cannot
release more than 50000 pages back to the host at once.
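
A rough sketch of the convention and the clamp (illustrative only;
MAX_RECLAIM_NPAGES is from the patch, the helper wrapped around it here
is hypothetical):

/* npages decoded from the FW event is signed:
 *   npages > 0: HCA asks the host to give it that many pages
 *   npages < 0: host may reclaim -npages pages
 */
#define MAX_RECLAIM_NPAGES -50000

static s32 clamp_reclaim(s32 npages)
{
	/* A give request (positive) passes through unchanged, since any
	 * positive value is already greater than -50000. A reclaim request
	 * such as -8000000 is clamped to the less negative -50000, i.e. at
	 * most 50K pages are reclaimed per work item.
	 */
	return max_t(s32, npages, MAX_RECLAIM_NPAGES);
}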
I hope that explains.
Thanks,
Anand
* Re: [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: Zhu Yanjun @ 2024-06-25 20:19 UTC
To: Anand Khoje, Jesse Brandeburg, linux-rdma, linux-kernel, netdev
Cc: saeedm, leon, tariqt, edumazet, kuba, pabeni, davem
On 2024/6/25 13:00, Anand Khoje wrote:
>
> On 6/25/24 02:11, Jesse Brandeburg wrote:
>> On 6/24/2024 8:33 AM, Anand Khoje wrote:
>>
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>> @@ -608,6 +608,7 @@ enum {
>>> RELEASE_ALL_PAGES_MASK = 0x4000,
>>> };
>>> +#define MAX_RECLAIM_NPAGES -50000
>> Can you please explain why this is negative? There doesn't seem to be
>> any reason mentioned in the commit message or code.
>>
>> At the very least it's super confusing to have a MAX be negative, and at
>> worst it's a bug. I don't have any other context on this code besides
>> this patch, so an explanation would be helpful.
>>
>>
>>
> Hi Jesse,
>
> The way the Mellanox ConnectX-5 driver handles 'release of allocated
> pages from the HCA' or 'allocation of pages to the HCA' is by the device
> sending an event to the host. This event carries the number of pages. If
> the number is positive, the HCA is requesting that many pages to be
> allocated; if it is negative, the HCA is indicating that that many pages
> can be reclaimed by the host.
>
> In this patch we are restricting the maximum number of pages that can be
> reclaimed in one go to 50000 (effectively -50000, as it is a reclaim).
> This limit is based on the capability of the firmware, which cannot
> release more than 50000 pages back to the host at once.
>
> I hope that explains.
To be honest, it was also not obvious to me why this macro is defined as
a negative number. From the above, I can understand why. I think many
people will also wonder why it is defined as negative. IMO, it is better
to put the above explanation into the source code as a comment. When
users check the source code, the comment will tell them why it is
defined as a negative number.
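
For example, the definition could carry something like this (just a
sketch of the suggestion, not an actual patch):

/* FW page-request events carry a signed page count: positive means the
 * HCA is asking for pages, negative means the host may reclaim that many
 * pages. Cap reclaim at 50K pages per event, matching the current FW
 * limit for a single MLX5_CMD_OP_MANAGE_PAGES + MLX5_PAGES_TAKE command.
 */
#define MAX_RECLAIM_NPAGES -50000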
Thanks a lot.
Zhu Yanjun
>
> Thanks,
>
> Anand
>
* Re: [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: Leon Romanovsky @ 2024-06-26 5:34 UTC
To: Zhu Yanjun
Cc: Anand Khoje, Jesse Brandeburg, linux-rdma, linux-kernel, netdev,
saeedm, tariqt, edumazet, kuba, pabeni, davem
On Wed, Jun 26, 2024 at 04:19:17AM +0800, Zhu Yanjun wrote:
> On 2024/6/25 13:00, Anand Khoje wrote:
> >
> > On 6/25/24 02:11, Jesse Brandeburg wrote:
> > > On 6/24/2024 8:33 AM, Anand Khoje wrote:
> > >
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> > > > @@ -608,6 +608,7 @@ enum {
> > > > RELEASE_ALL_PAGES_MASK = 0x4000,
> > > > };
> > > > +#define MAX_RECLAIM_NPAGES -50000
> > > Can you please explain why this is negative? There doesn't seem to be
> > > any reason mentioned in the commit message or code.
> > >
> > > At the very least it's super confusing to have a MAX be negative, and at
> > > worst it's a bug. I don't have any other context on this code besides
> > > this patch, so an explanation would be helpful.
> > >
> > >
> > >
> > Hi Jesse,
> >
> > The way the Mellanox ConnectX-5 driver handles 'release of allocated
> > pages from the HCA' or 'allocation of pages to the HCA' is by the
> > device sending an event to the host. This event carries the number of
> > pages. If the number is positive, the HCA is requesting that many
> > pages to be allocated; if it is negative, the HCA is indicating that
> > that many pages can be reclaimed by the host.
> >
> > In this patch we are restricting the maximum number of pages that can
> > be reclaimed in one go to 50000 (effectively -50000, as it is a
> > reclaim). This limit is based on the capability of the firmware, which
> > cannot release more than 50000 pages back to the host at once.
> >
> > I hope that explains.
>
> To be honest, it was also not obvious to me why this macro is defined as
> a negative number. From the above, I can understand why. I think many
> people will also wonder why it is defined as negative. IMO, it is better
> to put the above explanation into the source code as a comment. When
> users check the source code, the comment will tell them why it is
> defined as a negative number.
I see no problem with adding a comment to the code, but I think that it
won't help anyone. The whole reclaim/give page logic inside the mlx5
driver is written with the assumption that the number of pages is
negative for reclaim and positive for give.
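
Roughly, the page work handler dispatches on that sign, along these
lines (paraphrased sketch; the real functions in pagealloc.c take more
arguments):

/* Paraphrased sketch of the sign convention; real signatures differ. */
static void pages_work_handler_sketch(struct mlx5_pages_req *req)
{
	if (req->npages < 0)		/* host takes pages back */
		reclaim_pages(req->dev, req->func_id, -req->npages);
	else if (req->npages > 0)	/* host hands pages to FW */
		give_pages(req->dev, req->func_id, req->npages);
}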
Thanks
>
> Thanks a lot.
> Zhu Yanjun
>
> >
> > Thanks,
> >
> > Anand
> >
>
>
* RE: [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: David Laight @ 2024-06-28 15:44 UTC
To: 'Anand Khoje', Jesse Brandeburg,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
Cc: saeedm@mellanox.com, leon@kernel.org, tariqt@nvidia.com,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
davem@davemloft.net
...
> The way the Mellanox ConnectX-5 driver handles 'release of allocated
> pages from the HCA' or 'allocation of pages to the HCA' is by the device
> sending an event to the host. This event carries the number of pages. If
> the number is positive, the HCA is requesting that many pages to be
> allocated; if it is negative, the HCA is indicating that that many pages
> can be reclaimed by the host.
A one-line comment would do.
Possibly even negating the be32toh() result?
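
That is, something like this at the decode point (purely illustrative;
the eqe field name is an assumption, and in kernel code the helper is
spelled be32_to_cpu()):

/* Illustrative only: decode the signed count once and flip the sign, so
 * "pages to reclaim" is a positive quantity from here on. The field name
 * below is an assumption, not verified against the mlx5 headers.
 */
s32 npages = be32_to_cpu(eqe->data.req_pages.num_pages);
bool reclaim = npages < 0;
u32 count = reclaim ? min_t(u32, -npages, 50000) : npages;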
> In this patch we are restricting the maximum number of pages that can be
> reclaimed in one go to 50000 (effectively -50000, as it is a reclaim).
> This limit is based on the capability of the firmware, which cannot
> release more than 50000 pages back to the host at once.
Hang on, why are you soft-limiting it to the hard limit?
I thought the problem was that releasing a lot of pages took a long
time and 'stuffed' other time-critical tasks.

The only way to resolve that would seem to be to defer the actual freeing
to a low (or at least normal user) priority thread.
You would definitely want to get out of 'softint' context.
(Which is out of NAPI unless it is forced to be threaded - and that only
really works if you force the threads under the RT scheduler.)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: [PATCH v5] net/mlx5: Reclaim max 50K pages at once
From: Anand Khoje @ 2024-07-01 5:39 UTC
To: David Laight, Jesse Brandeburg, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: saeedm@mellanox.com, leon@kernel.org, tariqt@nvidia.com,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
davem@davemloft.net
On 6/28/24 21:14, David Laight wrote:
> ...
>> The way the Mellanox ConnectX-5 driver handles 'release of allocated
>> pages from the HCA' or 'allocation of pages to the HCA' is by the device
>> sending an event to the host. This event carries the number of pages. If
>> the number is positive, the HCA is requesting that many pages to be
>> allocated; if it is negative, the HCA is indicating that that many pages
>> can be reclaimed by the host.
> A one-line comment would do.
> Possibly even negating the be32toh() result?
>
>> In this patch we are restricting the maximum number of pages that can be
>> reclaimed in one go to 50000 (effectively -50000, as it is a reclaim).
>> This limit is based on the capability of the firmware, which cannot
>> release more than 50000 pages back to the host at once.
> Hang on, why are you soft-limiting it to the hard limit?
> I thought the problem was that releasing a lot of pages took a long
> time and 'stuffed' other time-critical tasks.
>
> The only way to resolve that would seem to be to defer the actual freeing
> to a low (or at least normal user) priority thread.
> You would definitely want to get out of 'softint' context.
> (Which is out of NAPI unless it is forced to be threaded - and that only
> really works if you force the threads under the RT scheduler.)
>
> David
Hi David,
The issue here is, when the Mellanox device sends a huge number of pages
back to the host to reclaim, the host allocates a number of mailbox
messages (mlx5_cmd_mailbox) to hold the DMA addresses of the memory to
be reclaimed. It is the freeing of these mailbox messages that is time
consuming, not the freeing of the actual pages.
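
For context, the expensive path is the walk over the mailbox chain, one
dma_pool_free() per mailbox (paraphrased from the mlx5 cmd code; details
may differ by kernel version):

/* Paraphrased: freeing a cmd message walks its mailbox chain, calling
 * dma_pool_free() once per mailbox. With ~20K mailboxes at 16-32 usec
 * each, this loop alone runs for roughly half a second.
 */
static void mlx5_free_cmd_msg_sketch(struct mlx5_core_dev *dev,
				     struct mlx5_cmd_msg *msg)
{
	struct mlx5_cmd_mailbox *head = msg->next;
	struct mlx5_cmd_mailbox *next;

	while (head) {
		next = head->next;
		dma_pool_free(dev->cmd.pool, head->buf, head->dma);
		kfree(head);
		head = next;
	}
	kfree(msg);
}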
Now, the FW limit is that it presently frees up to 50000 pages at a
time; this limit may increase in future firmware versions. We are
limiting this in the driver because we saw optimal results with this
limit during our tests. The results indicated that the time spent
freeing mailbox messages stayed at 2 usec on average, which is tolerable
and does not require running this work in a separate (low-priority)
context.
Thanks,
Anand