* [PATCH net-next v3] page_pool: check for dma_sync_size earlier
@ 2025-01-06 3:02 Furong Xu
2025-01-06 3:15 ` Jason Xing
2025-01-09 15:44 ` Alexander Lobakin
0 siblings, 2 replies; 5+ messages in thread
From: Furong Xu @ 2025-01-06 3:02 UTC (permalink / raw)
To: netdev, linux-kernel
Cc: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Furong Xu
Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
already do so.
We can save a couple of function calls if we check dma_sync_size earlier.
This is a micro optimization: about 0.6% PPS improvement has been
observed on a single Cortex-A53 CPU core in a 64-byte UDP RX traffic
test.
Before this patch:
The average packets per second over one minute is 234026.
After this patch:
The average packets per second over one minute is 235537.
Signed-off-by: Furong Xu <0x1207@gmail.com>
---
V2 -> V3: Add more details about measurement in commit message
V2: https://lore.kernel.org/r/20250103082814.3850096-1-0x1207@gmail.com
V1 -> V2: Add measurement data about performance improvement in commit message
V1: https://lore.kernel.org/r/20241010114019.1734573-1-0x1207@gmail.com
---
net/core/page_pool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9733206d6406..9bb2d2300d0b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -458,7 +458,7 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
netmem_ref netmem,
u32 dma_sync_size)
{
- if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
+ if (pool->dma_sync && dma_dev_need_sync(pool->p.dev) && dma_sync_size)
__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v3] page_pool: check for dma_sync_size earlier
2025-01-06 3:02 [PATCH net-next v3] page_pool: check for dma_sync_size earlier Furong Xu
@ 2025-01-06 3:15 ` Jason Xing
2025-01-06 3:31 ` Furong Xu
2025-01-09 15:44 ` Alexander Lobakin
1 sibling, 1 reply; 5+ messages in thread
From: Jason Xing @ 2025-01-06 3:15 UTC (permalink / raw)
To: Furong Xu
Cc: netdev, linux-kernel, Jesper Dangaard Brouer, Ilias Apalodimas,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
>
> Setting dma_sync_size to 0 is not illegal, fec_main.c and ravb_main.c
> already did.
> We can save a couple of function calls if check for dma_sync_size earlier.
>
> This is a micro optimization, about 0.6% PPS performance improvement
> has been observed on a single Cortex-A53 CPU core with 64 bytes UDP RX
> traffic test.
>
> Before this patch:
> The average of packets per second is 234026 in one minute.
>
> After this patch:
> The average of packets per second is 235537 in one minute.
Sorry, I remain skeptical that this small improvement can be statistically
observed. What exact tool or benchmark are you using, I wonder?
Thanks,
Jason
* Re: [PATCH net-next v3] page_pool: check for dma_sync_size earlier
2025-01-06 3:15 ` Jason Xing
@ 2025-01-06 3:31 ` Furong Xu
2025-01-09 10:09 ` Paolo Abeni
0 siblings, 1 reply; 5+ messages in thread
From: Furong Xu @ 2025-01-06 3:31 UTC (permalink / raw)
To: Jason Xing
Cc: netdev, linux-kernel, Jesper Dangaard Brouer, Ilias Apalodimas,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
On Mon, 6 Jan 2025 11:15:45 +0800, Jason Xing <kerneljasonxing@gmail.com> wrote:
> On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
> >
> > Setting dma_sync_size to 0 is not illegal, fec_main.c and ravb_main.c
> > already did.
> > We can save a couple of function calls if check for dma_sync_size earlier.
> >
> > This is a micro optimization, about 0.6% PPS performance improvement
> > has been observed on a single Cortex-A53 CPU core with 64 bytes UDP RX
> > traffic test.
> >
> > Before this patch:
> > The average of packets per second is 234026 in one minute.
> >
> > After this patch:
> > The average of packets per second is 235537 in one minute.
>
> Sorry, I keep skeptical that this small improvement can be statically
> observed? What exact tool or benchmark are you using, I wonder?
An x86 PC sends out UDP packets, and the sar command from the Sysstat
package reports the PPS on the RX side:
sar -n DEV 60 1
* Re: [PATCH net-next v3] page_pool: check for dma_sync_size earlier
2025-01-06 3:31 ` Furong Xu
@ 2025-01-09 10:09 ` Paolo Abeni
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-01-09 10:09 UTC (permalink / raw)
To: Furong Xu, Jason Xing
Cc: netdev, linux-kernel, Jesper Dangaard Brouer, Ilias Apalodimas,
David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman
On 1/6/25 4:31 AM, Furong Xu wrote:
> On Mon, 6 Jan 2025 11:15:45 +0800, Jason Xing <kerneljasonxing@gmail.com> wrote:
>
>> On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
>>>
>>> Setting dma_sync_size to 0 is not illegal, fec_main.c and ravb_main.c
>>> already did.
>>> We can save a couple of function calls if check for dma_sync_size earlier.
>>>
>>> This is a micro optimization, about 0.6% PPS performance improvement
>>> has been observed on a single Cortex-A53 CPU core with 64 bytes UDP RX
>>> traffic test.
>>>
>>> Before this patch:
>>> The average of packets per second is 234026 in one minute.
>>>
>>> After this patch:
>>> The average of packets per second is 235537 in one minute.
>>
>> Sorry, I keep skeptical that this small improvement can be statically
>> observed? What exact tool or benchmark are you using, I wonder?
>
> A x86 PC send out UDP packet and the sar cmd from Sysstat package to report
> the PPS on RX side:
> sar -n DEV 60 1
I agree with Jason: in my experience this kind of delta on UDP pps tests
is quite below the noise level.
I suggest doing a micro-benchmark, measuring the CPU cycles required
for the whole page_pool_dma_sync_for_device() call via get_cycles(), on
vanilla and with your patch, assuming the arch you have handy supports it.
The delta in such testing should be significant.
Thanks,
Paolo
* Re: [PATCH net-next v3] page_pool: check for dma_sync_size earlier
2025-01-06 3:02 [PATCH net-next v3] page_pool: check for dma_sync_size earlier Furong Xu
2025-01-06 3:15 ` Jason Xing
@ 2025-01-09 15:44 ` Alexander Lobakin
1 sibling, 0 replies; 5+ messages in thread
From: Alexander Lobakin @ 2025-01-09 15:44 UTC (permalink / raw)
To: Furong Xu
Cc: netdev, linux-kernel, Jesper Dangaard Brouer, Ilias Apalodimas,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
From: Furong Xu <0x1207@gmail.com>
Date: Mon, 6 Jan 2025 11:02:25 +0800
> Setting dma_sync_size to 0 is not illegal, fec_main.c and ravb_main.c
> already did.
> We can save a couple of function calls if check for dma_sync_size earlier.
>
> This is a micro optimization, about 0.6% PPS performance improvement
> has been observed on a single Cortex-A53 CPU core with 64 bytes UDP RX
> traffic test.
>
> Before this patch:
> The average of packets per second is 234026 in one minute.
>
> After this patch:
> The average of packets per second is 235537 in one minute.
>
> Signed-off-by: Furong Xu <0x1207@gmail.com>
> ---
> V2 -> V3: Add more details about measurement in commit message
> V2: https://lore.kernel.org/r/20250103082814.3850096-1-0x1207@gmail.com
>
> V1 -> V2: Add measurement data about performance improvement in commit message
> V1: https://lore.kernel.org/r/20241010114019.1734573-1-0x1207@gmail.com
> ---
> net/core/page_pool.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 9733206d6406..9bb2d2300d0b 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -458,7 +458,7 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
> netmem_ref netmem,
> u32 dma_sync_size)
> {
> - if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
> + if (pool->dma_sync && dma_dev_need_sync(pool->p.dev) && dma_sync_size)
page_pool_dma_sync_for_device() with dma_sync_size == 0, but with
pool->dma_sync set, is a VERY uncommon case. In general, this would happen
only when the device didn't write anything to the buffer.
IOW, this "shortcut" would only help *slowpath* code a bit, while
potentially harming really hot functions. Such hot inline helpers are
designed to make the code paths which get executed 99.999% of the time
faster, while we don't care about the remaining 0.001%.
I dunno how you got this +0.6%, but if your driver makes Page Pool
call sync_for_device(0) too often, the problem is in your driver.
> __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
> }
Thanks,
Olek
end of thread, other threads:[~2025-01-09 15:46 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-06 3:02 [PATCH net-next v3] page_pool: check for dma_sync_size earlier Furong Xu
2025-01-06 3:15 ` Jason Xing
2025-01-06 3:31 ` Furong Xu
2025-01-09 10:09 ` Paolo Abeni
2025-01-09 15:44 ` Alexander Lobakin