public inbox for linux-kernel@vger.kernel.org
From: Leone Fernando <leone4fernando@gmail.com>
To: Eric Dumazet <edumazet@google.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	dsahern@kernel.org, willemb@google.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v2 0/4] net: route: improve route hinting
Date: Wed, 15 May 2024 15:52:22 +0200
Message-ID: <c8ff8557-9e2c-4316-8642-fd7ab1553ffb@gmail.com>
In-Reply-To: <CANn89iKhHJDZZSwz1EtecZduNt7HxYW5o_1T0CJ9kqXxNbqMDA@mail.gmail.com>

> On Tue, May 7, 2024 at 2:43 PM Leone Fernando <leone4fernando@gmail.com> wrote:
>>
>> In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
>> sub-system. The hinting optimization improves performance by reusing
>> previously found dsts instead of looking them up for each skb.
>>
>> This patch series introduces a generalized version of the hinting mechanism that
>> can "remember" a larger number of dsts. This reduces the number of dst
>> lookups for frequently encountered daddrs.
>>
>> Before diving into the code and the benchmarking results, it's important
>> to address the deletion of the old route cache [2] and why
>> this solution is different. The original cache was complicated,
>> vulnerable to DOS attacks and had unstable performance.
>>
>> The new input dst_cache is much simpler thanks to its lazy approach,
>> improving performance without the overhead of the removed cache
>> implementation. Instead of using timers and GC, the deletion of invalid
>> entries is performed lazily during their lookups.
>> The dsts are stored in a simple, lightweight, static hash table. This
>> keeps the lookup times fast yet stable, preventing DOS upon cache misses.
>> The new input dst_cache implementation builds on the existing
>> dst_cache code, which provides fast, lockless per-CPU behavior.
>>
>> The measurement setup consists of 2 machines with mlx5 100Gbit NICs.
>> I sent small UDP packets with 5000 daddrs (10x the cache size) from one
>> machine to the other while also varying the saddr and the tos. I set
>> an iptables rule to drop the packets after routing. The receiving
>> machine's CPU (i9) was saturated.
>>
>> Thanks a lot to David Ahern for all the help and guidance!
>>
>> I measured the rx PPS using ifpps and the per-queue PPS using ethtool -S.
>> These are the results:
> 
> How are device dismantles taken into account?
> 
> I am currently tracking a bug in dst_cache that sometimes triggers when
> running the pmtu.sh selftest.
> 
> Apparently, dst_cache_per_cpu_dst_set() can cache dsts that have no
> dst->rt_uncached linkage.

The dst_cache_input introduced in this series caches input routes
that are owned by the FIB tree. These routes have rt_uncached linkage,
so I don't think this bug applies to dst_cache_input.
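For readers less familiar with the lazy-invalidation idea from the cover letter, here is a minimal userspace sketch; it is not the code from the series. It assumes a fixed-size table indexed by a hash of the daddr and an `obsolete` flag standing in for the kernel's dst validity check. `struct fake_dst`, `struct input_cache`, `CACHE_SLOTS` and all helper functions are hypothetical names. Stale entries are evicted only when a lookup touches them, so no timers or GC are needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a routing entry (struct dst_entry in the kernel). */
struct fake_dst {
	uint32_t daddr;   /* destination the route was resolved for */
	bool obsolete;    /* set when the route is invalidated, e.g. by a FIB change */
	int refcnt;       /* references held on this entry */
};

#define CACHE_SLOTS 512 /* hypothetical size; the table is static and never grows */

struct input_cache {
	struct fake_dst *slot[CACHE_SLOTS];
};

static unsigned int slot_index(uint32_t daddr)
{
	/* Any cheap hash works for the sketch; multiplicative hashing spreads daddrs. */
	return (unsigned int)((daddr * 2654435761u) % CACHE_SLOTS);
}

static void fake_dst_release(struct fake_dst *dst)
{
	assert(dst->refcnt > 0);
	dst->refcnt--;
}

/* Lookup with lazy deletion: a stale entry is dropped only when a lookup hits it.
 * In this sketch the caller uses the returned entry without taking a reference. */
static struct fake_dst *cache_lookup(struct input_cache *c, uint32_t daddr)
{
	unsigned int i = slot_index(daddr);
	struct fake_dst *dst = c->slot[i];

	if (!dst)
		return NULL;
	if (dst->obsolete) {
		/* No timer or GC pass: the invalid entry is evicted here. */
		c->slot[i] = NULL;
		fake_dst_release(dst);
		return NULL;
	}
	return dst->daddr == daddr ? dst : NULL;
}

/* Insert overwrites whatever occupies the slot, so memory use stays bounded. */
static void cache_set(struct input_cache *c, struct fake_dst *dst)
{
	unsigned int i = slot_index(dst->daddr);

	if (c->slot[i])
		fake_dst_release(c->slot[i]);
	dst->refcnt++;
	c->slot[i] = dst;
}
```

A miss costs one hash and one comparison regardless of the traffic pattern, which is the property the cover letter relies on to avoid the DoS exposure of the old route cache.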

> There is no cleanup (at least in vxlan) to make sure cached dsts are
> either freed or have their dst->dev changed.
> 
> 
> TEST: ipv6: cleanup of cached exceptions - nexthop objects          [ OK ]
> [ 1001.344490] vxlan: __vxlan_fdb_free calling dst_cache_destroy(ffff8f12422cbb90)
> [ 1001.345253] dst_cache_destroy dst_cache=ffff8f12422cbb90 ->cache=0000417580008d30
> [ 1001.378615] vxlan: __vxlan_fdb_free calling dst_cache_destroy(ffff8f12471e31d0)
> [ 1001.379260] dst_cache_destroy dst_cache=ffff8f12471e31d0 ->cache=0000417580008608
> [ 1011.349730] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 7
> [ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 1/6 users at
> [ 1011.350562]      dst_alloc+0x76/0x160
> [ 1011.350562]      ip6_dst_alloc+0x25/0x80
> [ 1011.350562]      ip6_pol_route+0x2a8/0x450
> [ 1011.350562]      ip6_pol_route_output+0x1f/0x30
> [ 1011.350562]      fib6_rule_lookup+0x163/0x270
> [ 1011.350562]      ip6_route_output_flags+0xda/0x190
> [ 1011.350562]      ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
> [ 1011.350562]      ip6_dst_lookup_flow+0x47/0xa0
> [ 1011.350562]      udp_tunnel6_dst_lookup+0x158/0x210
> [ 1011.350562]      vxlan_xmit_one+0x4c6/0x1550 [vxlan]
> [ 1011.350562]      vxlan_xmit+0x535/0x1500 [vxlan]
> [ 1011.350562]      dev_hard_start_xmit+0x7b/0x1e0
> [ 1011.350562]      __dev_queue_xmit+0x20c/0xe40
> [ 1011.350562]      arp_xmit+0x1d/0x50
> [ 1011.350562]      arp_send_dst+0x7f/0xa0
> [ 1011.350562]      arp_solicit+0xf6/0x2f0
> [ 1011.350562]
> [ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 3/6 users at
> [ 1011.350562]      dst_alloc+0x76/0x160
> [ 1011.350562]      ip6_dst_alloc+0x25/0x80
> [ 1011.350562]      ip6_pol_route+0x2a8/0x450
> [ 1011.350562]      ip6_pol_route_output+0x1f/0x30
> [ 1011.350562]      fib6_rule_lookup+0x163/0x270
> [ 1011.350562]      ip6_route_output_flags+0xda/0x190
> [ 1011.350562]      ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
> [ 1011.350562]      ip6_dst_lookup_flow+0x47/0xa0
> [ 1011.350562]      udp_tunnel6_dst_lookup+0x158/0x210
> [ 1011.350562]      vxlan_xmit_one+0x4c6/0x1550 [vxlan]
> [ 1011.350562]      vxlan_xmit+0x535/0x1500 [vxlan]
> [ 1011.350562]      dev_hard_start_xmit+0x7b/0x1e0
> [ 1011.350562]      __dev_queue_xmit+0x20c/0xe40
> [ 1011.350562]      ip6_finish_output2+0x2ea/0x6e0
> [ 1011.350562]      ip6_finish_output+0x143/0x320
> [ 1011.350562]      ip6_output+0x74/0x140
> [ 1011.350562]
> [ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 1/6 users at
> [ 1011.350562]      netdev_get_by_index+0xc0/0xe0
> [ 1011.350562]      fib6_nh_init+0x1a9/0xa90
> [ 1011.350562]      rtm_new_nexthop+0x6fa/0x1580
> [ 1011.350562]      rtnetlink_rcv_msg+0x155/0x3e0
> [ 1011.350562]      netlink_rcv_skb+0x61/0x110
> [ 1011.350562]      rtnetlink_rcv+0x19/0x20
> [ 1011.350562]      netlink_unicast+0x23f/0x380
> [ 1011.350562]      netlink_sendmsg+0x1fc/0x430
> [ 1011.350562]      ____sys_sendmsg+0x2ef/0x320
> [ 1011.350562]      ___sys_sendmsg+0x86/0xd0
> [ 1011.350562]      __sys_sendmsg+0x67/0xc0
> [ 1011.350562]      __x64_sys_sendmsg+0x21/0x30
> [ 1011.350562]      x64_sys_call+0x252/0x2030
> [ 1011.350562]      do_syscall_64+0x6c/0x190
> [ 1011.350562]      entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1011.350562]
> [ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 1/6 users at
> [ 1011.350562]      ipv6_add_dev+0x136/0x530
> [ 1011.350562]      addrconf_notify+0x19d/0x770
> [ 1011.350562]      notifier_call_chain+0x65/0xd0
> [ 1011.350562]      raw_notifier_call_chain+0x1a/0x20
> [ 1011.350562]      call_netdevice_notifiers_info+0x54/0x90
> [ 1011.350562]      register_netdevice+0x61e/0x790
> [ 1011.350562]      veth_newlink+0x230/0x440
> [ 1011.350562]      __rtnl_newlink+0x7d2/0xaa0
> [ 1011.350562]      rtnl_newlink+0x4c/0x70
> [ 1011.350562]      rtnetlink_rcv_msg+0x155/0x3e0
> [ 1011.350562]      netlink_rcv_skb+0x61/0x110
> [ 1011.350562]      rtnetlink_rcv+0x19/0x20
> [ 1011.350562]      netlink_unicast+0x23f/0x380
> [ 1011.350562]      netlink_sendmsg+0x1fc/0x430
> [ 1011.350562]      ____sys_sendmsg+0x2ef/0x320
> [ 1011.350562]      ___sys_sendmsg+0x86/0xd0
> [ 1011.350562]
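To illustrate why the rt_uncached linkage matters on device dismantle, here is a simplified userspace model. It is loosely based on my understanding of how the kernel re-points routes on the uncached list to blackhole_netdev when a device unregisters (rt_flush_dev() for IPv4; IPv6 has an equivalent uncached-list flush). All types and functions in it (`struct fake_dev`, `struct demo_dst`, `demo_dst_init()`, `flush_dev()`) are hypothetical. A dst that holds a device reference but was never linked on the list keeps that reference, which is the kind of leak behind the "waiting for veth_A-R1 to become free" message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: only the refcount matters here. */
struct fake_dev {
	const char *name;
	int refcnt;
};

/* Hypothetical stand-in for a dst: it holds one reference on its device. */
struct demo_dst {
	struct fake_dev *dev;
	struct demo_dst *next_uncached; /* linkage on the uncached list, if any */
};

static struct fake_dev blackhole = { "blackhole", 0 };
static struct demo_dst *uncached_head;

/* Take a device reference; optionally link the dst on the uncached list. */
static void demo_dst_init(struct demo_dst *dst, struct fake_dev *dev, int link)
{
	dst->dev = dev;
	dev->refcnt++;
	dst->next_uncached = NULL;
	if (link) {
		dst->next_uncached = uncached_head;
		uncached_head = dst;
	}
}

/* On unregister, move every listed dst (and its reference) off the dying device.
 * Unlisted dsts are never visited, so their references on dev remain. */
static void flush_dev(struct fake_dev *dev)
{
	for (struct demo_dst *d = uncached_head; d; d = d->next_uncached) {
		if (d->dev != dev)
			continue;
		dev->refcnt--;
		d->dev = &blackhole;
		blackhole.refcnt++;
	}
}
```

After flush_dev(), only the unlinked dst still pins the device, so unregistration would keep waiting until something else (for example a dst_cache destroy path) releases it.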

Thread overview: 7+ messages
2024-05-07 12:42 [PATCH net-next v2 0/4] net: route: improve route hinting Leone Fernando
2024-05-07 12:42 ` [PATCH net-next v2 1/4] net: route: expire rt if the dst it holds is expired Leone Fernando
2024-05-07 12:42 ` [PATCH net-next v2 2/4] net: dst_cache: add input_dst_cache API Leone Fernando
2024-05-07 12:42 ` [PATCH net-next v2 3/4] net: route: always compile dst_cache Leone Fernando
2024-05-07 12:42 ` [PATCH net-next v2 4/4] net: route: replace route hints with input_dst_cache Leone Fernando
2024-05-07 13:08 ` [PATCH net-next v2 0/4] net: route: improve route hinting Eric Dumazet
2024-05-15 13:52   ` Leone Fernando [this message]