From: Ido Schimmel <idosch@idosch.org>
To: Lahav Schlesinger <lschlesinger@drivenets.com>
Cc: netdev@vger.kernel.org, kuba@kernel.org, dsahern@gmail.com
Subject: Re: [PATCH net-next v4] rtnetlink: Support fine-grained netdevice bulk deletion
Date: Sun, 5 Dec 2021 11:53:03 +0200
Message-ID: <YayL/7d/hm3TYjtV@shredder>
In-Reply-To: <20211202174502.28903-1-lschlesinger@drivenets.com>

On Thu, Dec 02, 2021 at 07:45:02PM +0200, Lahav Schlesinger wrote:
> Under large scale, some routers are required to support tens of thousands
> of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
> vrfs, etc).
> At times such routers are required to delete massive amounts of devices
> at once, such as when a factory reset is performed on the router (causing
> a deletion of all devices), or when a configuration is restored after an
> upgrade, or as a request from an operator.
> 
> Currently there are 2 means of deleting devices using Netlink:
> 1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
> or by name using IFLA_IFNAME)
> 2. Deleting all devices that belong to a group (using IFLA_GROUP)
> 
> Deletion of devices one-by-one has poor performance on large scale of
> devices compared to "group deletion":
> After all devices are handled, netdev_run_todo() is called which
> calls rcu_barrier() to finish any outstanding RCU callbacks that were
> registered during the deletion of the device, then wait until the
> refcount of all the devices is 0, then perform final cleanups.
> 
> However, calling rcu_barrier() is a very costly operation, each call
> taking on the order of tens of milliseconds.
> 
> When deleting a large number of devices one-by-one, rcu_barrier()
> will be called for each device being deleted.
> As an example, following benchmark deletes 10K loopback devices,
> all of which are UP and with only IPv6 LLA being configured:
> 
> 1. Deleting one-by-one using 1 thread  : 243 seconds
> 2. Deleting one-by-one using 10 threads: 70 seconds
> 3. Deleting one-by-one using 50 threads: 54 seconds
> 4. Deleting all using "group deletion" : 30 seconds
> 
> Note that even though the deletion logic takes place under the rtnl
> lock, since the call to rcu_barrier() is outside the lock we gain
> some improvements.
> 
> But, while "group deletion" is the fastest, it is not suited for
> deleting a large number of arbitrary devices which are unknown ahead of
> time. Furthermore, moving a large number of devices to a group is also a
> costly operation.

These are the numbers I get in a VM running on my laptop.

Moving 16k dummy netdevs to a group:

# time -p ip -b group.batch 
real 1.91
user 0.04
sys 0.27
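
For reference, a batch file like group.batch could be generated along
these lines. This is only a sketch: the dummyN naming scheme, the device
count and the helper name gen_group_batch are assumptions for
illustration, not necessarily what produced the timing above.

```shell
#!/bin/sh
# Sketch only: emits one "link set" line per device, in the form that
# "ip -b FILE" reads (each line is an ip command without the leading "ip").
# The dummyN naming scheme and the helper name are assumptions.
gen_group_batch() {
    count=$1    # number of devices, e.g. 16384
    group=$2    # target group, e.g. 10
    i=1
    while [ "$i" -le "$count" ]; do
        echo "link set dev dummy$i group $group"
        i=$((i + 1))
    done
}

# Example (needs root and existing dummy1..dummyN devices):
#   gen_group_batch 16384 10 > group.batch
#   ip -b group.batch
```

The function only prints the commands, so the output can be inspected
before it is fed to ip.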

Deleting the group:

# time -p ip link del group 10
real 6.15
user 0.00
sys 3.02

IMO, these numbers do not justify a new API. Also, your user space can
be taught to create all the netdevs in the same group to begin with:

# ip link add name dummy1 group 10 type dummy
# ip link show dev dummy1
10: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group 10 qlen 1000
    link/ether 12:b6:7d:ff:48:99 brd ff:ff:ff:ff:ff:ff
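
At scale, the same "link add ... group" command can be put into a batch
file so that no separate move step is needed. A minimal sketch, assuming
dummy devices named dummy1..dummyN; the helper name gen_create_batch and
the file name create.batch are hypothetical:

```shell
#!/bin/sh
# Sketch only: emits one "link add" line per device, each placing the new
# netdev in the given group at creation time (same form as the single
# command shown above, minus the leading "ip" as expected by "ip -b").
# Device names and the helper name are assumptions.
gen_create_batch() {
    count=$1    # number of devices to create
    group=$2    # group assigned at creation, e.g. 10
    i=1
    while [ "$i" -le "$count" ]; do
        echo "link add name dummy$i group $group type dummy"
        i=$((i + 1))
    done
}

# Example (needs root):
#   gen_create_batch 16384 10 > create.batch
#   ip -b create.batch
#   ip link del group 10
```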

Moreover, unlike the list API that is specific to deletion, the group
API also lets you batch set operations:

# ip link set group 10 mtu 2000
# ip link show dev dummy1
10: dummy1: <BROADCAST,NOARP> mtu 2000 qdisc noop state DOWN mode DEFAULT group 10 qlen 1000
    link/ether 12:b6:7d:ff:48:99 brd ff:ff:ff:ff:ff:ff

If you are using namespaces, then during "factory reset" you can delete
the namespace which should trigger batch deletion of the netdevs inside
it.

> 
> This patch adds support for passing an arbitrary list of ifindex of
> devices to delete with a new IFLA_IFINDEX attribute. A single message
> may contain multiple instances of this attribute.
> This gives a more fine-grained control over which devices to delete,
> while still resulting in rcu_barrier() being called only once.
> Indeed, the timing of using this new API to delete 10K devices is
> the same as using the existing "group" deletion.
> 
> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>

