* [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
@ 2024-10-04 16:27 Breno Leitao
2024-10-04 17:01 ` David Ahern
0 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2024-10-04 16:27 UTC (permalink / raw)
To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: rmikey, kernel-team, horms, open list:NETWORKING [IPv4/IPv6],
open list
Branch annotation traces from approximately 200 IPv6-enabled hosts
revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
mispredicted. Given the increasing prevalence of IPv6 in modern networks,
this commit adjusts the function to favor the IPv6 path.
Swap the order of the conditional statements and move the 'likely'
annotation to the IPv6 case. This change aims to improve performance in
IPv6-dominant environments by reducing branch mispredictions.
This optimization aligns with the trend of IPv6 becoming the default IP
version in many deployments, and should benefit modern network
configurations.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/net/route.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 1789f1e6640b..b90b7b1effb8 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
struct net_device *dev = rt->dst.dev;
struct neighbour *neigh;
- if (likely(rt->rt_gw_family == AF_INET)) {
- neigh = ip_neigh_gw4(dev, rt->rt_gw4);
- } else if (rt->rt_gw_family == AF_INET6) {
+ if (likely(rt->rt_gw_family == AF_INET6)) {
neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
*is_v6gw = true;
+ } else if (rt->rt_gw_family == AF_INET) {
+ neigh = ip_neigh_gw4(dev, rt->rt_gw4);
} else {
neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
2024-10-04 16:27 [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw() Breno Leitao
@ 2024-10-04 17:01 ` David Ahern
2024-10-04 17:37 ` Breno Leitao
0 siblings, 1 reply; 7+ messages in thread
From: David Ahern @ 2024-10-04 17:01 UTC (permalink / raw)
To: Breno Leitao, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: rmikey, kernel-team, horms, open list:NETWORKING [IPv4/IPv6],
open list
On 10/4/24 10:27 AM, Breno Leitao wrote:
> Branch annotation traces from approximately 200 IPv6-enabled hosts
> revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> this commit adjusts the function to favor the IPv6 path.
>
> Swap the order of the conditional statements and move the 'likely'
> annotation to the IPv6 case. This change aims to improve performance in
> IPv6-dominant environments by reducing branch mispredictions.
>
> This optimization aligns with the trend of IPv6 becoming the default IP
> version in many deployments, and should benefit modern network
> configurations.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> include/net/route.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/route.h b/include/net/route.h
> index 1789f1e6640b..b90b7b1effb8 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> struct net_device *dev = rt->dst.dev;
> struct neighbour *neigh;
>
> - if (likely(rt->rt_gw_family == AF_INET)) {
> - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> - } else if (rt->rt_gw_family == AF_INET6) {
> + if (likely(rt->rt_gw_family == AF_INET6)) {
> neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> *is_v6gw = true;
> + } else if (rt->rt_gw_family == AF_INET) {
> + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> } else {
> neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> }
This is an IPv4 function allowing support for IPv6 addresses as a
nexthop. It is appropriate for IPv4 family checks to be first.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
2024-10-04 17:01 ` David Ahern
@ 2024-10-04 17:37 ` Breno Leitao
2024-10-08 10:51 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2024-10-04 17:37 UTC (permalink / raw)
To: David Ahern
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
rmikey, kernel-team, horms, open list:NETWORKING [IPv4/IPv6],
open list
Hello David,
On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> On 10/4/24 10:27 AM, Breno Leitao wrote:
> > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > this commit adjusts the function to favor the IPv6 path.
> >
> > Swap the order of the conditional statements and move the 'likely'
> > annotation to the IPv6 case. This change aims to improve performance in
> > IPv6-dominant environments by reducing branch mispredictions.
> >
> > This optimization aligns with the trend of IPv6 becoming the default IP
> > version in many deployments, and should benefit modern network
> > configurations.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > include/net/route.h | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/net/route.h b/include/net/route.h
> > index 1789f1e6640b..b90b7b1effb8 100644
> > --- a/include/net/route.h
> > +++ b/include/net/route.h
> > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > struct net_device *dev = rt->dst.dev;
> > struct neighbour *neigh;
> >
> > - if (likely(rt->rt_gw_family == AF_INET)) {
> > - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > - } else if (rt->rt_gw_family == AF_INET6) {
> > + if (likely(rt->rt_gw_family == AF_INET6)) {
> > neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > *is_v6gw = true;
> > + } else if (rt->rt_gw_family == AF_INET) {
> > + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > } else {
> > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > }
>
> This is an IPv4 function allowing support for IPv6 addresses as a
> nexthop. It is appropriate for IPv4 family checks to be first.
Right. In which case is this called on IPv6 only systems?
On my IPv6-only 200 systems, the annotated branch predictor is showing
it is mispredicted 100% of the time.
Thanks for the review
--breno
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
2024-10-04 17:37 ` Breno Leitao
@ 2024-10-08 10:51 ` Paolo Abeni
2024-10-08 14:07 ` Breno Leitao
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2024-10-08 10:51 UTC (permalink / raw)
To: Breno Leitao, David Ahern
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, rmikey,
kernel-team, horms, open list:NETWORKING [IPv4/IPv6], open list
On 10/4/24 19:37, Breno Leitao wrote:
> On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
>> On 10/4/24 10:27 AM, Breno Leitao wrote:
>>> Branch annotation traces from approximately 200 IPv6-enabled hosts
>>> revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
>>> mispredicted. Given the increasing prevalence of IPv6 in modern networks,
>>> this commit adjusts the function to favor the IPv6 path.
>>>
>>> Swap the order of the conditional statements and move the 'likely'
>>> annotation to the IPv6 case. This change aims to improve performance in
>>> IPv6-dominant environments by reducing branch mispredictions.
>>>
>>> This optimization aligns with the trend of IPv6 becoming the default IP
>>> version in many deployments, and should benefit modern network
>>> configurations.
>>>
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>> include/net/route.h | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/net/route.h b/include/net/route.h
>>> index 1789f1e6640b..b90b7b1effb8 100644
>>> --- a/include/net/route.h
>>> +++ b/include/net/route.h
>>> @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
>>> struct net_device *dev = rt->dst.dev;
>>> struct neighbour *neigh;
>>>
>>> - if (likely(rt->rt_gw_family == AF_INET)) {
>>> - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
>>> - } else if (rt->rt_gw_family == AF_INET6) {
>>> + if (likely(rt->rt_gw_family == AF_INET6)) {
>>> neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
>>> *is_v6gw = true;
>>> + } else if (rt->rt_gw_family == AF_INET) {
>>> + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
>>> } else {
>>> neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
>>> }
>>
>> This is an IPv4 function allowing support for IPv6 addresses as a
>> nexthop. It is appropriate for IPv4 family checks to be first.
>
> Right. In which case is this called on IPv6 only systems?
>
> On my IPv6-only 200 systems, the annotated branch predictor is showing
> it is mispredicted 100% of the time.
perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
perf script
should give you an hint.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
2024-10-08 10:51 ` Paolo Abeni
@ 2024-10-08 14:07 ` Breno Leitao
2024-10-08 14:15 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2024-10-08 14:07 UTC (permalink / raw)
To: Paolo Abeni
Cc: David Ahern, David S. Miller, Eric Dumazet, Jakub Kicinski,
rmikey, kernel-team, horms, open list:NETWORKING [IPv4/IPv6],
open list
Hello Paolo,
On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> On 10/4/24 19:37, Breno Leitao wrote:
> > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > this commit adjusts the function to favor the IPv6 path.
> > > >
> > > > Swap the order of the conditional statements and move the 'likely'
> > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > IPv6-dominant environments by reducing branch mispredictions.
> > > >
> > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > version in many deployments, and should benefit modern network
> > > > configurations.
> > > >
> > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > ---
> > > > include/net/route.h | 6 +++---
> > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > --- a/include/net/route.h
> > > > +++ b/include/net/route.h
> > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > struct net_device *dev = rt->dst.dev;
> > > > struct neighbour *neigh;
> > > > - if (likely(rt->rt_gw_family == AF_INET)) {
> > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > - } else if (rt->rt_gw_family == AF_INET6) {
> > > > + if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > *is_v6gw = true;
> > > > + } else if (rt->rt_gw_family == AF_INET) {
> > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > } else {
> > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > }
> > >
> > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > nexthop. It is appropriate for IPv4 family checks to be first.
> >
> > Right. In which case is this called on IPv6 only systems?
> >
> > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > it is mispredicted 100% of the time.
>
> perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> perf script
>
> should give you an hint.
Thanks. That proved to be very useful.
As I said above, all the hosts I have a webserver running, I see this
that likely mispredicted. Same for this server:
# cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
correct incorrect % Function File Line
0 17127 100 ip_neigh_for_gw route.h 393
It is mostly coming from ip_finish_output2() and tcp_v4. Important to
say that these machine has no IPv4 configured, except 127.0.0.1
(localhost).
Output of `perf script`:
curl 3284017 [020] 342043.646674: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3284017 [020] 342043.646720: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3286356 [026] 342055.690384: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3288713 [032] 342103.631991: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3288713 [032] 342103.632039: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3289126 [021] 342115.725482: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3291018 [030] 342163.627633: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3291018 [030] 342163.627673: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3291256 [031] 342175.683527: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3293421 [025] 342223.618198: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3293421 [025] 342223.618239: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3293659 [034] 342235.695019: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3295399 [012] 342283.632642: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3295399 [012] 342283.632691: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3295746 [001] 342295.712436: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3298603 [020] 342343.608814: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3298603 [020] 342343.608858: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3299252 [032] 342355.693816: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3303255 [033] 342403.616685: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3303255 [033] 342403.616729: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3304989 [011] 342415.740580: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3312952 [035] 342463.633808: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3312952 [035] 342463.633859: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3314546 [032] 342475.766762: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
curl 3321983 [006] 342523.654221: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
curl 3321983 [006] 342523.654262: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_unicast_reply+0x3fc
tcp_v4_send_reset+0x668
tcp_v4_rcv+0xcdf
ip_protocol_deliver_rcu+0x10e
ip_local_deliver_finish+0x97
ip_local_deliver+0x43
ip_rcv+0x35
process_backlog+0x1b8
__napi_poll+0x30
net_rx_action+0x180
__kprobes_text_end+0xf2
__local_bh_enable_ip+0xeb
__dev_queue_xmit+0xc18
ip_finish_output2+0x63a
ip_output+0x73
__ip_queue_xmit+0x504
__tcp_transmit_skb+0xcfe
tcp_connect+0xa1d
tcp_v4_connect+0x463
__inet_stream_connect+0x5b
inet_stream_connect+0x36
__sys_connect+0x8d
__x64_sys_connect+0x16
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_connect+0x4b (/usr/lib64/libc.so.6)
0 [unknown] ([unknown])
isc-net-0000 3323932 [025] 342535.718587: probe:ip_neigh_for_gw: ()
ip_finish_output2+0x150
ip_output+0x73
ip_send_skb+0x15
udp_send_skb+0xd2
udp_sendmsg+0xaa7
__x64_sys_sendmsg+0x338
do_syscall_64+0xc2
entry_SYSCALL_64_after_hwframe+0x4b
__libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
2024-10-08 14:07 ` Breno Leitao
@ 2024-10-08 14:15 ` Eric Dumazet
2024-10-08 14:48 ` Breno Leitao
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2024-10-08 14:15 UTC (permalink / raw)
To: Breno Leitao
Cc: Paolo Abeni, David Ahern, David S. Miller, Jakub Kicinski, rmikey,
kernel-team, horms, open list:NETWORKING [IPv4/IPv6], open list
On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Paolo,
>
> On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> > On 10/4/24 19:37, Breno Leitao wrote:
> > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > > this commit adjusts the function to favor the IPv6 path.
> > > > >
> > > > > Swap the order of the conditional statements and move the 'likely'
> > > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > >
> > > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > > version in many deployments, and should benefit modern network
> > > > > configurations.
> > > > >
> > > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > > ---
> > > > > include/net/route.h | 6 +++---
> > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > > --- a/include/net/route.h
> > > > > +++ b/include/net/route.h
> > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > > struct net_device *dev = rt->dst.dev;
> > > > > struct neighbour *neigh;
> > > > > - if (likely(rt->rt_gw_family == AF_INET)) {
> > > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > - } else if (rt->rt_gw_family == AF_INET6) {
> > > > > + if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > > *is_v6gw = true;
> > > > > + } else if (rt->rt_gw_family == AF_INET) {
> > > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > } else {
> > > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > > }
> > > >
> > > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > > nexthop. It is appropriate for IPv4 family checks to be first.
> > >
> > > Right. In which case is this called on IPv6 only systems?
> > >
> > > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > > it is mispredicted 100% of the time.
> >
> > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> > perf script
> >
> > should give you an hint.
>
> Thanks. That proved to be very useful.
>
> As I said above, all the hosts I have a webserver running, I see this
> that likely mispredicted. Same for this server:
>
> # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
> correct incorrect % Function File Line
> 0 17127 100 ip_neigh_for_gw route.h 393
>
> It is mostly coming from ip_finish_output2() and tcp_v4. Important to
> say that these machine has no IPv4 configured, except 127.0.0.1
> (localhost).
Now run the experiment on a typical server using IPv4 ?
I would advise removing the likely() if it really bothers you.
(I doubt this has any impact)
But assuming everything is IPv6 is too soon.
There are more obvious changes like :
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea
100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
*skb, struct net *net)
/* When the interface is in promisc. mode, drop all the crap
* that it receives, do not try to analyse it.
*/
- if (skb->pkt_type == PACKET_OTHERHOST) {
+ if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
drop_reason = SKB_DROP_REASON_OTHERHOST;
goto drop;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60
100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff
*skb, struct net_device *dev,
u32 pkt_len;
struct inet6_dev *idev;
- if (skb->pkt_type == PACKET_OTHERHOST) {
+ if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST);
return NULL;
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
2024-10-08 14:15 ` Eric Dumazet
@ 2024-10-08 14:48 ` Breno Leitao
0 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2024-10-08 14:48 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, David Ahern, David S. Miller, Jakub Kicinski, rmikey,
kernel-team, horms, open list:NETWORKING [IPv4/IPv6], open list
Hello Eric,
On Tue, Oct 08, 2024 at 04:15:37PM +0200, Eric Dumazet wrote:
> On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > Hello Paolo,
> >
> > On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> > > On 10/4/24 19:37, Breno Leitao wrote:
> > > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > > > this commit adjusts the function to favor the IPv6 path.
> > > > > >
> > > > > > Swap the order of the conditional statements and move the 'likely'
> > > > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > > >
> > > > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > > > version in many deployments, and should benefit modern network
> > > > > > configurations.
> > > > > >
> > > > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > > > ---
> > > > > > include/net/route.h | 6 +++---
> > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > > > --- a/include/net/route.h
> > > > > > +++ b/include/net/route.h
> > > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > > > struct net_device *dev = rt->dst.dev;
> > > > > > struct neighbour *neigh;
> > > > > > - if (likely(rt->rt_gw_family == AF_INET)) {
> > > > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > > - } else if (rt->rt_gw_family == AF_INET6) {
> > > > > > + if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > > > *is_v6gw = true;
> > > > > > + } else if (rt->rt_gw_family == AF_INET) {
> > > > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > > } else {
> > > > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > > > }
> > > > >
> > > > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > > > nexthop. It is appropriate for IPv4 family checks to be first.
> > > >
> > > > Right. In which case is this called on IPv6 only systems?
> > > >
> > > > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > > > it is mispredicted 100% of the time.
> > >
> > > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> > > perf script
> > >
> > > should give you an hint.
> >
> > Thanks. That proved to be very useful.
> >
> > As I said above, all the hosts I have a webserver running, I see this
> > that likely mispredicted. Same for this server:
> >
> > # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
> > correct incorrect % Function File Line
> > 0 17127 100 ip_neigh_for_gw route.h 393
> >
> > It is mostly coming from ip_finish_output2() and tcp_v4. Important to
> > say that these machine has no IPv4 configured, except 127.0.0.1
> > (localhost).
>
> Now run the experiment on a typical server using IPv4 ?
>
> I would advise removing the likely() if it really bothers you.
> (I doubt this has any impact)
Thanks. I am mostly concerned about likely/unlikely() that are wrong
100% the time when running production workloads in modern hardware.
I just got a few hundreds host to able to run annotated
branches enabled, and I am looking on how it can help us to understand
our code flow better.
Regarding performance impact, I agree with you that performance is
minimal (at least on x86). On other architectures, such as powerpc
things can be more evident, given that the branch hint is encoded in the
instruction itself, so, the hardware knows where to predict.
That said, I think the best approach is just to remove the likely() from
that code path.
> But assuming everything is IPv6 is too soon.
>
> There are more obvious changes like :
>
> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea
> 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
> *skb, struct net *net)
> /* When the interface is in promisc. mode, drop all the crap
> * that it receives, do not try to analyse it.
> */
> - if (skb->pkt_type == PACKET_OTHERHOST) {
> + if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
> dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
> drop_reason = SKB_DROP_REASON_OTHERHOST;
> goto drop;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60
> 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff
> *skb, struct net_device *dev,
> u32 pkt_len;
> struct inet6_dev *idev;
>
> - if (skb->pkt_type == PACKET_OTHERHOST) {
> + if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
> dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
> kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST);
> return NULL;
Agree, that would be obvious changes also.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-10-08 14:49 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-04 16:27 [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw() Breno Leitao
2024-10-04 17:01 ` David Ahern
2024-10-04 17:37 ` Breno Leitao
2024-10-08 10:51 ` Paolo Abeni
2024-10-08 14:07 ` Breno Leitao
2024-10-08 14:15 ` Eric Dumazet
2024-10-08 14:48 ` Breno Leitao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).