* [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
@ 2015-10-07 8:16 Daniel Borkmann
2015-10-07 15:46 ` Alexei Starovoitov
2015-10-08 12:07 ` David Miller
0 siblings, 2 replies; 14+ messages in thread
From: Daniel Borkmann @ 2015-10-07 8:16 UTC (permalink / raw)
To: davem; +Cc: ast, edumazet, netdev, Daniel Borkmann
Similar to commit c29390c6dfee ("xps: must clear sender_cpu before
forwarding"), we also need to clear the skb->sender_cpu when moving
from RX to TX via skb_do_redirect() due to the shared location of
napi_id (used on RX) and sender_cpu (used on TX).
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
( It's also needed here in the net-next commit 27b29f63058d. )
net/core/filter.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/filter.c b/net/core/filter.c
index da3e535..8f4603c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1462,6 +1462,7 @@ int skb_do_redirect(struct sk_buff *skb)
 		return dev_forward_skb(dev, skb);
 	skb->dev = dev;
+	skb_sender_cpu_clear(skb);
 	return dev_queue_xmit(skb);
 }
--
1.9.3
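
A minimal userspace sketch of the problem the commit message describes, assuming the TX side stores the CPU as cpu + 1 and treats 0 as "not set" (inferred from the `sender_cpu == 0` check and `raw_smp_processor_id() + 1` quoted later in this thread). `struct skb_model`, `NR_CPUS_MODEL` and the `model_*` functions are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4u /* hypothetical CPU count for this model */

/* Userspace stand-in for the relevant part of struct sk_buff. */
struct skb_model {
	union {
		uint32_t napi_id;    /* written on RX */
		uint32_t sender_cpu; /* written on TX; 0 is assumed to mean "not set" */
	};
};

/* Models skb_sender_cpu_clear(): forget whatever RX left in the slot. */
static inline void model_sender_cpu_clear(struct skb_model *skb)
{
	skb->sender_cpu = 0;
}

/*
 * Models the TX side: record the current CPU (as cpu + 1) only if the
 * slot looks unset, then return the CPU index a queue lookup would use.
 */
static inline uint32_t model_pick_tx_cpu(struct skb_model *skb, uint32_t this_cpu)
{
	assert(this_cpu < NR_CPUS_MODEL);
	if (skb->sender_cpu == 0)
		skb->sender_cpu = this_cpu + 1;
	return skb->sender_cpu - 1;
}
```

In this model, an skb that still carries a napi_id from RX makes `model_pick_tx_cpu()` return a value derived from the napi_id rather than from the current CPU; calling `model_sender_cpu_clear()` first restores the expected result.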
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-07 8:16 [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit Daniel Borkmann
@ 2015-10-07 15:46 ` Alexei Starovoitov
2015-10-09 0:50 ` Devon H. O'Dell
2015-10-08 12:07 ` David Miller
1 sibling, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2015-10-07 15:46 UTC (permalink / raw)
To: Daniel Borkmann, davem; +Cc: edumazet, netdev
On 10/7/15 1:16 AM, Daniel Borkmann wrote:
> Similar to commit c29390c6dfee ("xps: must clear sender_cpu before
> forwarding"), we also need to clear the skb->sender_cpu when moving
> from RX to TX via skb_do_redirect() due to the shared location of
> napi_id (used on RX) and sender_cpu (used on TX).
>
> Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
with the amount of skb_sender_cpu_clear() all over the code base
I wonder whether there is a better solution to all of these.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-07 15:46 ` Alexei Starovoitov
@ 2015-10-09 0:50 ` Devon H. O'Dell
2015-10-09 2:35 ` Alexei Starovoitov
0 siblings, 1 reply; 14+ messages in thread
From: Devon H. O'Dell @ 2015-10-09 0:50 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Daniel Borkmann, davem, edumazet, netdev
On Wed, Oct 7, 2015 at 8:46 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 10/7/15 1:16 AM, Daniel Borkmann wrote:
>>
>> Similar to commit c29390c6dfee ("xps: must clear sender_cpu before
>> forwarding"), we also need to clear the skb->sender_cpu when moving
>> from RX to TX via skb_do_redirect() due to the shared location of
>> napi_id (used on RX) and sender_cpu (used on TX).
>>
>> Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>
>
> Acked-by: Alexei Starovoitov <ast@plumgrid.com>
>
> with the amount of skb_sender_cpu_clear() all over the code base
> I wonder whether there is a better solution to all of these.
I think there is. We found that splitting the union of sender_cpu and
napi_id solved the issue for us. In general, I think this is an OK
solution as long as the following hold:
* skbs are always allocated via kzalloc
* out -> out cloned skbs are always cloned on the same CPU
* an extra four bytes in skbuff isn't a bad thing
I think the first and last points are true, but I'm not 100% sure. I'm
also particularly unsure about the second point. If that assumption
does not hold, it could result in extra cache / bus traffic between
cores / sockets. However, that would also imply that we were already
getting some extra traffic at the point of doing the copy. So maybe
not a big deal? The other problem I could imagine is if the second
point *is* true and skbs end up being cloned multiple times, XPS might
get overworked on individual cores.
Anyway, I'm not 100% sure about any of these things: I'm really not at
all familiar with the Linux kernel, let alone the netstack -- this
just turned out to be not particularly difficult to find given
register context and call stack from the panic. I'd be happy to send a
patch to struct skbuff and toss skb_sender_cpu_clear, but I suspect
someone else on this list could validate that quicker than I. The
patch at that point is trivial.
I think it's probably a good thing to do. The need to call
skb_sender_cpu_clear() around every rx->tx copy interaction seems
brittle and likely to be problematic again in the future unless code
is always cargo culted, and assuming we've found every potential clone
site.
--dho
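
A small sketch of the size trade-off behind the "extra four bytes" point above, assuming both fields are 32-bit as the thread implies. `struct shared_slot`, `struct split_slot` and `split_cost_bytes()` are hypothetical names; the real struct sk_buff layout has many more members and its own padding rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared layout, as described in the thread: both fields overlap. */
struct shared_slot {
	union {
		uint32_t napi_id;
		uint32_t sender_cpu;
	};
};

/* Split layout: independent fields, so RX state cannot leak into TX. */
struct split_slot {
	uint32_t napi_id;
	uint32_t sender_cpu;
};

/* The union costs exactly one 32-bit slot. */
static_assert(sizeof(struct shared_slot) == sizeof(uint32_t),
	      "union should occupy a single 32-bit slot");

/* Extra bytes the split layout needs compared to the shared one. */
static inline size_t split_cost_bytes(void)
{
	return sizeof(struct split_slot) - sizeof(struct shared_slot);
}
```

The split removes the need for any clearing step in this model, at the price of a larger structure.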
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-09 0:50 ` Devon H. O'Dell
@ 2015-10-09 2:35 ` Alexei Starovoitov
2015-10-09 16:40 ` Devon H. O'Dell
2015-10-09 17:33 ` Daniel Borkmann
0 siblings, 2 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2015-10-09 2:35 UTC (permalink / raw)
To: Devon H. O'Dell; +Cc: Daniel Borkmann, davem, edumazet, netdev
On 10/8/15 5:50 PM, Devon H. O'Dell wrote:
>> with the amount of skb_sender_cpu_clear() all over the code base
>> I wonder whether there is a better solution to all of these.
> I think there is. We found that splitting the union of sender_cpu and
> napi_id solved the issue for us. In general, I think this is an OK
> solution as long as the following hold:
>
> * skbs are always allocated via kzalloc
> * out -> out cloned skbs are always cloned on the same CPU
> * an extra four bytes in skbuff isn't a bad thing
I'm pretty sure extending sk_buff for this is not acceptable.
I was thinking maybe we can use the sign bit to distinguish between
napi_id and sender_cpu.
Like:
	if ((int)skb->sender_cpu >= 0)
		skb->sender_cpu = - (raw_smp_processor_id() + 1);
and inside get_xps_queue() use it only if it's negative.
Then we can remove skb_sender_cpu_clear() from everywhere.
Adding a check to napi_hash_add() to make sure that napi_id is not
negative is probably ok too.
Thoughts?
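
A userspace sketch of the sign-bit idea above, assuming napi_id values never have the top bit set (which is what the proposed napi_hash_add() check would enforce). `struct skb_signbit_model` and the `model_*` functions are hypothetical names, and the unsigned-to-signed casts rely on two's-complement conversion as implemented by GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

struct skb_signbit_model {
	union {
		uint32_t napi_id;    /* RX: assumed non-negative when read as int */
		uint32_t sender_cpu; /* TX: stored as -(cpu + 1), so always negative */
	};
};

/* TX entry: overwrite the slot unless it already holds a tagged CPU. */
static inline void model_set_sender_cpu(struct skb_signbit_model *skb, int this_cpu)
{
	assert(this_cpu >= 0);
	if ((int32_t)skb->sender_cpu >= 0)
		skb->sender_cpu = (uint32_t)-(this_cpu + 1);
}

/* Lookup side: return the CPU index, or -1 if the slot holds no tagged CPU. */
static inline int model_get_sender_cpu(const struct skb_signbit_model *skb)
{
	int32_t v = (int32_t)skb->sender_cpu;

	return v < 0 ? -(v + 1) : -1;
}
```

A leftover napi_id reads as non-negative, so the TX side overwrites it without any explicit clear; a value that is already negative is left alone.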
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-09 2:35 ` Alexei Starovoitov
@ 2015-10-09 16:40 ` Devon H. O'Dell
2015-10-10 3:11 ` Alexei Starovoitov
2015-10-09 17:33 ` Daniel Borkmann
1 sibling, 1 reply; 14+ messages in thread
From: Devon H. O'Dell @ 2015-10-09 16:40 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Daniel Borkmann, davem, Eric Dumazet, netdev
On Thu, Oct 8, 2015 at 7:35 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 10/8/15 5:50 PM, Devon H. O'Dell wrote:
>>>
>>> with the amount of skb_sender_cpu_clear() all over the code base
>>> I wonder whether there is a better solution to all of these.
>>
>> I think there is. We found that splitting the union of sender_cpu and
>> napi_id solved the issue for us. In general, I think this is an OK
>> solution as long as the following hold:
>>
>> * skbs are always allocated via kzalloc
>> * out -> out cloned skbs are always cloned on the same CPU
>> * an extra four bytes in skbuff isn't a bad thing
>
>
> I'm pretty sure extending sk_buff for this is not acceptable.
That's unfortunate.
> I was thinking may be we can use sign bit to distinguish between
> napi_id and sender_cpu.
> Like:
> if ((int)skb->sender_cpu >= 0)
> skb->sender_cpu = - (raw_smp_processor_id() + 1);
> and inside get_xps_queue() use it only if it's negative.
I like the idea, but it seems unnecessarily magical. What about using
a bitfield? Then there's just an option bit that is either
OPTION_NAPI_ID or OPTION_SENDER_CPU. Then the check to set sender_cpu
in netdev_pick_tx becomes
if (skb->sender_napi_option == OPTION_NAPI_ID || skb->sender_cpu == 0) ...
> Then we can remove skb_sender_cpu_clear() from everywhere.
> Adding a check to napi_hash_add() to make sure that napi_id is not
> negative is probably ok too.
We could change this to check that sender_napi_option would be
OPTION_NAPI_ID with the bitfield idea.
My names are probably bad, but I think the idea is less magical (and
is effectively the same thing you are proposing).
> Thoughts?
--dho
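
A userspace sketch of the bitfield variant described above, using the placeholder names from the message (`sender_napi_option`, `OPTION_NAPI_ID`, `OPTION_SENDER_CPU`). `struct skb_bitfield_model` and the `model_*` functions are hypothetical, and the cpu + 1 encoding is an assumption carried over from the quoted sign-bit snippet.

```c
#include <assert.h>
#include <stdint.h>

enum slot_option { OPTION_NAPI_ID = 0, OPTION_SENDER_CPU = 1 };

struct skb_bitfield_model {
	unsigned int sender_napi_option : 1; /* which union member is live */
	union {
		uint32_t napi_id;
		uint32_t sender_cpu;
	};
};

/* RX side: store the napi_id and tag the slot accordingly. */
static inline void model_rx_set_napi_id(struct skb_bitfield_model *skb, uint32_t id)
{
	skb->napi_id = id;
	skb->sender_napi_option = OPTION_NAPI_ID;
}

/* TX side: claim the slot if it holds a napi_id or is still unset. */
static inline void model_pick_tx(struct skb_bitfield_model *skb, uint32_t this_cpu)
{
	assert(this_cpu < UINT32_MAX); /* cpu + 1 must not wrap to 0 */
	if (skb->sender_napi_option == OPTION_NAPI_ID || skb->sender_cpu == 0) {
		skb->sender_cpu = this_cpu + 1;
		skb->sender_napi_option = OPTION_SENDER_CPU;
	}
}
```

Note that the condition in `model_pick_tx()` reads two separate fields, whereas the sign-bit variant tests a single value.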
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-09 16:40 ` Devon H. O'Dell
@ 2015-10-10 3:11 ` Alexei Starovoitov
0 siblings, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2015-10-10 3:11 UTC (permalink / raw)
To: Devon H. O'Dell; +Cc: Daniel Borkmann, davem, Eric Dumazet, netdev
On 10/9/15 9:40 AM, Devon H. O'Dell wrote:
> I like the idea, but it seems unnecessarily magical. What about using
> a bitfield? Then there's just an option bit that is either
> OPTION_NAPI_ID or OPTION_SENDER_CPU. Then the check to set sender_cpu
> in netdev_pick_tx becomes
>
> if (skb->sender_napi_option == OPTION_NAPI_ID || skb->sender_cpu == 0) ..
It's less magical, but slower, since two loads from the skb and two
cmp/jmp pairs are needed instead of one.
This is the critical path of xmit, executed for every skb;
that's why I proposed the sign bit.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-09 2:35 ` Alexei Starovoitov
2015-10-09 16:40 ` Devon H. O'Dell
@ 2015-10-09 17:33 ` Daniel Borkmann
2015-10-10 3:19 ` Alexei Starovoitov
2015-11-16 18:07 ` Eric Dumazet
1 sibling, 2 replies; 14+ messages in thread
From: Daniel Borkmann @ 2015-10-09 17:33 UTC (permalink / raw)
To: Alexei Starovoitov, Devon H. O'Dell; +Cc: davem, edumazet, netdev
On 10/09/2015 04:35 AM, Alexei Starovoitov wrote:
> On 10/8/15 5:50 PM, Devon H. O'Dell wrote:
>>> with the amount of skb_sender_cpu_clear() all over the code base
>>> I wonder whether there is a better solution to all of these.
>> I think there is. We found that splitting the union of sender_cpu and
>> napi_id solved the issue for us. In general, I think this is an OK
>> solution as long as the following hold:
>>
>> * skbs are always allocated via kzalloc
>> * out -> out cloned skbs are always cloned on the same CPU
>> * an extra four bytes in skbuff isn't a bad thing
>
> I'm pretty sure extending sk_buff for this is not acceptable.
+1, I agree.
> I was thinking may be we can use sign bit to distinguish between
> napi_id and sender_cpu.
> Like:
> if ((int)skb->sender_cpu >= 0)
> skb->sender_cpu = - (raw_smp_processor_id() + 1);
> and inside get_xps_queue() use it only if it's negative.
> Then we can remove skb_sender_cpu_clear() from everywhere.
> Adding a check to napi_hash_add() to make sure that napi_id is not
> negative is probably ok too.
> Thoughts?
I think this doesn't make it any more maintainable.
skb_sender_cpu_clear(), one can at least git-grep to easily find
out and review call-sites in the code. There are various members
already used differently depending on the context.
Thanks,
Daniel
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-09 17:33 ` Daniel Borkmann
@ 2015-10-10 3:19 ` Alexei Starovoitov
2015-10-10 4:38 ` Eric Dumazet
2015-11-16 18:07 ` Eric Dumazet
1 sibling, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2015-10-10 3:19 UTC (permalink / raw)
To: Daniel Borkmann, Devon H. O'Dell; +Cc: davem, edumazet, netdev
On 10/9/15 10:33 AM, Daniel Borkmann wrote:
>> I was thinking may be we can use sign bit to distinguish between
>> napi_id and sender_cpu.
>> Like:
>> if ((int)skb->sender_cpu >= 0)
>> skb->sender_cpu = - (raw_smp_processor_id() + 1);
>> and inside get_xps_queue() use it only if it's negative.
>> Then we can remove skb_sender_cpu_clear() from everywhere.
>> Adding a check to napi_hash_add() to make sure that napi_id is not
>> negative is probably ok too.
>> Thoughts?
>
> I think this doesn't make it any more maintainable.
>
> skb_sender_cpu_clear(), one can at least git-grep to easily find
> out and review call-sites in the code. There are various members
> already used differently depending on the context.
Since this bug wasn't fixed in all places at once, it is evidently
hard to review _all_ the call-sites that need it.
There are 7 places that call skb_sender_cpu_clear() in net-next,
plus 2 more in net.
How many such paths from rx to tx are left?
At first glance ovs is missing one, and who knows what else.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-10 3:19 ` Alexei Starovoitov
@ 2015-10-10 4:38 ` Eric Dumazet
2015-10-10 4:55 ` Alexei Starovoitov
2015-10-10 4:56 ` Alexei Starovoitov
0 siblings, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2015-10-10 4:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Daniel Borkmann, Devon H. O'Dell, davem, edumazet, netdev
On Fri, 2015-10-09 at 20:19 -0700, Alexei Starovoitov wrote:
> since this bug wasn't fixed at once in all places, it means
> that it is hard to review _all_ needed call-sites.
> There are 7 places that call skb_sender_cpu_clear() in net-next.
> Plus 2 more in net.
> How many such paths from rx to tx left?
> On the first glance ovs is missing one and who knows what else.
Alexei, what's happening?
The original patch is 6 months old. If this issue was so urgent, how
come it took so long to catch the remaining bugs?
Just add skb_sender_cpu_clear() where needed, thanks.
Using a union is hard, but performance has a price.
skb size is absolutely critical and deserves some headaches.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-10 4:38 ` Eric Dumazet
@ 2015-10-10 4:55 ` Alexei Starovoitov
2015-10-10 4:56 ` Alexei Starovoitov
1 sibling, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2015-10-10 4:55 UTC (permalink / raw)
To: Eric Dumazet
Cc: Daniel Borkmann, Devon H. O'Dell, davem, edumazet, netdev
On 10/9/15 9:38 PM, Eric Dumazet wrote:
> On Fri, 2015-10-09 at 20:19 -0700, Alexei Starovoitov wrote:
>
>> since this bug wasn't fixed at once in all places, it means
>> that it is hard to review _all_ needed call-sites.
>> There are 7 places that call skb_sender_cpu_clear() in net-next.
>> Plus 2 more in net.
>> How many such paths from rx to tx left?
>> On the first glance ovs is missing one and who knows what else.
>
> Alexei, what's happening ?
>
> The original patch is 6 months old. If this issue was so urgent, how
> comes it took so long to catch the remaining bugs ?
no urgency at all. bpf side is clean, so I'm not worried :)
> Just add skb_sender_cpu_clear() where needed, thanks.
>
> Using union is hard, but there is a price to performance.
>
> skb size is absolutely critical and deserves some headaches.
yep. as I said it shouldn't be increased and proposed in-band sign bit.
Anyway, since you and Daniel are ok with adding skb_sender_cpu_clear()
in other places, I rest my case.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-10 4:38 ` Eric Dumazet
2015-10-10 4:55 ` Alexei Starovoitov
@ 2015-10-10 4:56 ` Alexei Starovoitov
2015-10-10 17:12 ` Eric Dumazet
1 sibling, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2015-10-10 4:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Daniel Borkmann, Devon H. O'Dell, davem, edumazet, netdev
On 10/9/15 9:38 PM, Eric Dumazet wrote:
> On Fri, 2015-10-09 at 20:19 -0700, Alexei Starovoitov wrote:
>
>> since this bug wasn't fixed at once in all places, it means
>> that it is hard to review _all_ needed call-sites.
>> There are 7 places that call skb_sender_cpu_clear() in net-next.
>> Plus 2 more in net.
>> How many such paths from rx to tx left?
>> On the first glance ovs is missing one and who knows what else.
>
> Alexei, what's happening ?
>
> The original patch is 6 months old. If this issue was so urgent, how
> comes it took so long to catch the remaining bugs ?
no urgency at all. bpf side is clean, so I'm not worried :)
> Just add skb_sender_cpu_clear() where needed, thanks.
>
> Using union is hard, but there is a price to performance.
>
> skb size is absolutely critical and deserves some headaches.
yep. as I said it shouldn't be increased and proposed in-band sign bit.
Anyway, since you and Daniel are ok with adding skb_sender_cpu_clear()
in other places, I rest my case.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-10 4:56 ` Alexei Starovoitov
@ 2015-10-10 17:12 ` Eric Dumazet
0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2015-10-10 17:12 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Daniel Borkmann, Devon H. O'Dell, davem, edumazet, netdev
On Fri, 2015-10-09 at 21:56 -0700, Alexei Starovoitov wrote:
> yep. as I said it shouldn't be increased and proposed in-band sign bit.
I believe we still would like to keep some helpers to clearly identify
when a packet crosses domains. Even if the helper is empty.
It helps a lot when a new feature must be added.
skb_scrub_packet() and similar helper are not only doing useful work,
they are easily grep-able.
So your idea sounds nice, but in reality it maintains the status quo,
where we do not really know which points need care when skbs cross a
domain.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-09 17:33 ` Daniel Borkmann
2015-10-10 3:19 ` Alexei Starovoitov
@ 2015-11-16 18:07 ` Eric Dumazet
1 sibling, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2015-11-16 18:07 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, Devon H. O'Dell, davem, edumazet, netdev
On Fri, 2015-10-09 at 19:33 +0200, Daniel Borkmann wrote:
> On 10/09/2015 04:35 AM, Alexei Starovoitov wrote:
> > On 10/8/15 5:50 PM, Devon H. O'Dell wrote:
> >>> with the amount of skb_sender_cpu_clear() all over the code base
> >>> I wonder whether there is a better solution to all of these.
> >> I think there is. We found that splitting the union of sender_cpu and
> >> napi_id solved the issue for us. In general, I think this is an OK
> >> solution as long as the following hold:
> >>
> >> * skbs are always allocated via kzalloc
> >> * out -> out cloned skbs are always cloned on the same CPU
> >> * an extra four bytes in skbuff isn't a bad thing
> >
> > I'm pretty sure extending sk_buff for this is not acceptable.
>
> +1, I agree.
>
> > I was thinking may be we can use sign bit to distinguish between
> > napi_id and sender_cpu.
> > Like:
> > if ((int)skb->sender_cpu >= 0)
> > skb->sender_cpu = - (raw_smp_processor_id() + 1);
> > and inside get_xps_queue() use it only if it's negative.
> > Then we can remove skb_sender_cpu_clear() from everywhere.
> > Adding a check to napi_hash_add() to make sure that napi_id is not
> > negative is probably ok too.
> > Thoughts?
>
> I think this doesn't make it any more maintainable.
>
> skb_sender_cpu_clear(), one can at least git-grep to easily find
> out and review call-sites in the code. There are various members
> already used differently depending on the context.
Extending busy polling support to tunnel devices actually requires
some changes in this area.
We need to keep skb->napi_id set until the skb reaches a socket, but
skb_scrub_packet() currently defeats that.
I will leave skb_sender_cpu_clear() in place, but it will be empty.
* Re: [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit
2015-10-07 8:16 [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit Daniel Borkmann
2015-10-07 15:46 ` Alexei Starovoitov
@ 2015-10-08 12:07 ` David Miller
1 sibling, 0 replies; 14+ messages in thread
From: David Miller @ 2015-10-08 12:07 UTC (permalink / raw)
To: daniel; +Cc: ast, edumazet, netdev
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 7 Oct 2015 10:16:09 +0200
> Similar to commit c29390c6dfee ("xps: must clear sender_cpu before
> forwarding"), we also need to clear the skb->sender_cpu when moving
> from RX to TX via skb_do_redirect() due to the shared location of
> napi_id (used on RX) and sender_cpu (used on TX).
>
> Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Applied, thanks.
Thread overview: 14+ messages
2015-10-07 8:16 [PATCH net-next] bpf, skb_do_redirect: clear sender_cpu before xmit Daniel Borkmann
2015-10-07 15:46 ` Alexei Starovoitov
2015-10-09 0:50 ` Devon H. O'Dell
2015-10-09 2:35 ` Alexei Starovoitov
2015-10-09 16:40 ` Devon H. O'Dell
2015-10-10 3:11 ` Alexei Starovoitov
2015-10-09 17:33 ` Daniel Borkmann
2015-10-10 3:19 ` Alexei Starovoitov
2015-10-10 4:38 ` Eric Dumazet
2015-10-10 4:55 ` Alexei Starovoitov
2015-10-10 4:56 ` Alexei Starovoitov
2015-10-10 17:12 ` Eric Dumazet
2015-11-16 18:07 ` Eric Dumazet
2015-10-08 12:07 ` David Miller