* [PATCH bpf] xsk: fix batch alloc API on non-coherent systems
@ 2024-09-11 19:10 Maciej Fijalkowski
2024-09-12 11:04 ` Magnus Karlsson
2024-09-14 1:20 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 4+ messages in thread
From: Maciej Fijalkowski @ 2024-09-11 19:10 UTC (permalink / raw)
To: bpf, ast, daniel, andrii
Cc: netdev, magnus.karlsson, bjorn, maciej.fijalkowski,
Dries De Winter
When DMA operations need to be synchronized, xsk_buff_alloc_batch()
returns a single buffer instead of the requested count. This puts
pressure on drivers that use the batch API, as they have to check for
this corner case and handle the remaining allocations themselves, which
is counterproductive. Let us improve the core by looping over xp_alloc()
@max times when the slow path needs to be taken.

Another issue with the current interface, as spotted and fixed by Dries,
was that when a driver called xsk_buff_alloc_batch() with @max == 0, the
slow path still allocated and returned a single buffer, which should not
happen. The logic introduced above kills two birds with one stone and
addresses this problem as well.
Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool")
Reported-and-tested-by: Dries De Winter <ddewinter@synamedia.com>
Co-developed-by: Dries De Winter <ddewinter@synamedia.com>
Signed-off-by: Dries De Winter <ddewinter@synamedia.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
net/xdp/xsk_buff_pool.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 29afa880ffa0..5e2e03042ef3 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -623,20 +623,31 @@ static u32 xp_alloc_reused(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u3
 	return nb_entries;
 }
 
-u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
+static u32 xp_alloc_slow(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
+			 u32 max)
 {
-	u32 nb_entries1 = 0, nb_entries2;
+	int i;
 
-	if (unlikely(pool->dev && dma_dev_need_sync(pool->dev))) {
+	for (i = 0; i < max; i++) {
 		struct xdp_buff *buff;
 
-		/* Slow path */
 		buff = xp_alloc(pool);
-		if (buff)
-			*xdp = buff;
-		return !!buff;
+		if (unlikely(!buff))
+			return i;
+		*xdp = buff;
+		xdp++;
 	}
 
+	return max;
+}
+
+u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
+{
+	u32 nb_entries1 = 0, nb_entries2;
+
+	if (unlikely(pool->dev && dma_dev_need_sync(pool->dev)))
+		return xp_alloc_slow(pool, xdp, max);
+
 	if (unlikely(pool->free_list_cnt)) {
 		nb_entries1 = xp_alloc_reused(pool, xdp, max);
 		if (nb_entries1 == max)
--
2.34.1
* Re: [PATCH bpf] xsk: fix batch alloc API on non-coherent systems
2024-09-11 19:10 [PATCH bpf] xsk: fix batch alloc API on non-coherent systems Maciej Fijalkowski
@ 2024-09-12 11:04 ` Magnus Karlsson
2024-09-13 19:48 ` Alexei Starovoitov
2024-09-14 1:20 ` patchwork-bot+netdevbpf
1 sibling, 1 reply; 4+ messages in thread
From: Magnus Karlsson @ 2024-09-12 11:04 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, ast, daniel, andrii, netdev, magnus.karlsson, bjorn,
Dries De Winter
On Wed, 11 Sept 2024 at 21:10, Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> When DMA operations need to be synchronized, xsk_buff_alloc_batch()
> returns a single buffer instead of the requested count. This puts
> pressure on drivers that use the batch API, as they have to check for
> this corner case and handle the remaining allocations themselves, which
> is counterproductive. Let us improve the core by looping over xp_alloc()
> @max times when the slow path needs to be taken.
>
> Another issue with the current interface, as spotted and fixed by Dries,
> was that when a driver called xsk_buff_alloc_batch() with @max == 0, the
> slow path still allocated and returned a single buffer, which should not
> happen. The logic introduced above kills two birds with one stone and
> addresses this problem as well.
Thanks Maciej and Dries for finding and fixing this.
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool")
> Reported-and-tested-by: Dries De Winter <ddewinter@synamedia.com>
> Co-developed-by: Dries De Winter <ddewinter@synamedia.com>
> Signed-off-by: Dries De Winter <ddewinter@synamedia.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> net/xdp/xsk_buff_pool.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index 29afa880ffa0..5e2e03042ef3 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -623,20 +623,31 @@ static u32 xp_alloc_reused(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u3
>  	return nb_entries;
>  }
>
> -u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
> +static u32 xp_alloc_slow(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
> +			 u32 max)
>  {
> -	u32 nb_entries1 = 0, nb_entries2;
> +	int i;
>
> -	if (unlikely(pool->dev && dma_dev_need_sync(pool->dev))) {
> +	for (i = 0; i < max; i++) {
>  		struct xdp_buff *buff;
>
> -		/* Slow path */
>  		buff = xp_alloc(pool);
> -		if (buff)
> -			*xdp = buff;
> -		return !!buff;
> +		if (unlikely(!buff))
> +			return i;
> +		*xdp = buff;
> +		xdp++;
>  	}
>
> +	return max;
> +}
> +
> +u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
> +{
> +	u32 nb_entries1 = 0, nb_entries2;
> +
> +	if (unlikely(pool->dev && dma_dev_need_sync(pool->dev)))
> +		return xp_alloc_slow(pool, xdp, max);
> +
>  	if (unlikely(pool->free_list_cnt)) {
>  		nb_entries1 = xp_alloc_reused(pool, xdp, max);
>  		if (nb_entries1 == max)
> --
> 2.34.1
* Re: [PATCH bpf] xsk: fix batch alloc API on non-coherent systems
2024-09-12 11:04 ` Magnus Karlsson
@ 2024-09-13 19:48 ` Alexei Starovoitov
0 siblings, 0 replies; 4+ messages in thread
From: Alexei Starovoitov @ 2024-09-13 19:48 UTC (permalink / raw)
To: Magnus Karlsson, Jakub Kicinski
Cc: Maciej Fijalkowski, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Network Development, Karlsson, Magnus,
Björn Töpel, Dries De Winter
On Thu, Sep 12, 2024 at 4:04 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Wed, 11 Sept 2024 at 21:10, Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > When DMA operations need to be synchronized, xsk_buff_alloc_batch()
> > returns a single buffer instead of the requested count. This puts
> > pressure on drivers that use the batch API, as they have to check for
> > this corner case and handle the remaining allocations themselves, which
> > is counterproductive. Let us improve the core by looping over xp_alloc()
> > @max times when the slow path needs to be taken.
> >
> > Another issue with the current interface, as spotted and fixed by Dries,
> > was that when a driver called xsk_buff_alloc_batch() with @max == 0, the
> > slow path still allocated and returned a single buffer, which should not
> > happen. The logic introduced above kills two birds with one stone and
> > addresses this problem as well.
>
> Thanks Maciej and Dries for finding and fixing this.
>
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
We already did the last bpf and bpf-next/net PRs before the merge window,
so reassigning to netdev.
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [PATCH bpf] xsk: fix batch alloc API on non-coherent systems
2024-09-11 19:10 [PATCH bpf] xsk: fix batch alloc API on non-coherent systems Maciej Fijalkowski
2024-09-12 11:04 ` Magnus Karlsson
@ 2024-09-14 1:20 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-09-14 1:20 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, ast, daniel, andrii, netdev, magnus.karlsson, bjorn,
ddewinter
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 11 Sep 2024 21:10:19 +0200 you wrote:
> When DMA operations need to be synchronized, xsk_buff_alloc_batch()
> returns a single buffer instead of the requested count. This puts
> pressure on drivers that use the batch API, as they have to check for
> this corner case and handle the remaining allocations themselves, which
> is counterproductive. Let us improve the core by looping over xp_alloc()
> @max times when the slow path needs to be taken.
>
> [...]
Here is the summary with links:
- [bpf] xsk: fix batch alloc API on non-coherent systems
https://git.kernel.org/netdev/net/c/4144a1059b47
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html