From: William Tu <witu@nvidia.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: netdev@vger.kernel.org, jiri@nvidia.com, bodong@nvidia.com,
	kuba@kernel.org
Subject: Re: [PATCH RFC net-next] net: cache the __dev_alloc_name()
Date: Wed, 8 May 2024 20:27:00 -0700
Message-ID: <e4478663-bbae-40fa-bc85-bbd75e83a37c@nvidia.com>
In-Reply-To: <20240507212436.75c799ad@hermes.local>



On 5/7/24 9:24 PM, Stephen Hemminger wrote:
> On Mon, 6 May 2024 20:32:07 +0000
> William Tu <witu@nvidia.com> wrote:
>
>> When a system has around 1000 netdevs, adding the 1001st device becomes
>> very slow. The devlink command to create an SF
>>    $ devlink port add pci/0000:03:00.0 flavour pcisf \
>>      pfnum 0 sfnum 1001
>> takes around 5 seconds, and perf with a flamegraph shows 19% of the
>> time spent in __dev_alloc_name() [1].
>>
>> The reason is that devlink first requests the next available "eth%d".
>> __dev_alloc_name then scans all existing netdevs to match on "ethN",
>> sets bit N in an 'inuse' bitmap, and finds/returns the next available
>> number, in our case eth0.
>>
>> Later, based on a udev rule, the device is renamed from eth0 to
>> "en3f0pf0sf1001", with the altname below:
>>    14: en3f0pf0sf1001: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
>>        altname enp3s0f0npf0sf1001
>>
>> So eth0 is never actually used, but as we have 1k "en3f0pf0sfN"
>> devices + 1k altnames, __dev_alloc_name spends a lot of time going
>> through all existing netdevs to build the 'inuse' bitmap for the
>> pattern 'eth%d'. The bitmap barely has any bit set, and it rescans
>> every time.
>>
>> I want to see if it makes sense to save/cache the result, or whether
>> there is a way to avoid the 'eth%d' pattern search altogether. The RFC
>> patch adds a name_pat (name pattern) hlist and saves the 'inuse' bitmap.
>> It stores patterns, e.g. "eth%d" and "veth%d", with their bitmaps, and
>> looks them up before scanning all existing netdevs.
>>
>> Note: the code works only well enough for a quick performance benchmark
>> and is still missing a lot. Using an hlist seems to be overkill, as I
>> think we only have a few patterns:
>> $ git grep alloc_netdev drivers/ net/ | grep %d
>>
>> 1. https://github.com/williamtu/net-next/issues/1
>>
>> Signed-off-by: William Tu <witu@nvidia.com>
Hi Stephen,
Thanks for your feedback.
> Actual patch is a bit of a mess, with commented out code, leftover printks,
> random whitespace changes. Please fix that.
Yes, working on it.
>
> The issue is that bitmap gets to be large and adds bloat to embedded devices.
The bitmap size is fixed (8*PAGE_SIZE bits), and set_bit is fast. The
problem is that for each new device we always rescan all existing
netdevs, set bits in the bitmap, and then free the bitmap.
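
To make the rescan cost concrete, here is a rough userspace model of the
scan described above. It is only a sketch, not the kernel code:
next_free_index() and MAX_INDEX are hypothetical names, and MAX_INDEX
assumes 4 KiB pages so that it stands in for 8*PAGE_SIZE.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INDEX 32768 /* assumption: 8*PAGE_SIZE with 4 KiB pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the lowest unused N for names of the form "<prefix>N", or -1 on
 * allocation failure or when every index is taken.
 * Hypothetical helper: a userspace model of the scan, not kernel code.
 */
int next_free_index(const char *const *names, size_t count, const char *prefix)
{
    assert(prefix != NULL);

    size_t words = MAX_INDEX / BITS_PER_WORD;
    unsigned long *inuse = calloc(words, sizeof(*inuse));
    size_t plen = strlen(prefix);
    int result = -1;

    if (!inuse)
        return -1;

    /* Every call walks all existing names, even if none match the prefix. */
    for (size_t i = 0; i < count; i++) {
        char canonical[64];
        int n;

        if (strncmp(names[i], prefix, plen) != 0)
            continue;
        if (sscanf(names[i] + plen, "%d", &n) != 1)
            continue;
        if (n < 0 || n >= MAX_INDEX)
            continue;
        /* Count only exact matches such as "eth7", not "eth7foo" or "eth07". */
        snprintf(canonical, sizeof(canonical), "%s%d", prefix, n);
        if (strcmp(canonical, names[i]) == 0)
            inuse[n / BITS_PER_WORD] |= 1UL << (n % BITS_PER_WORD);
    }

    /* Find the first clear bit. */
    for (int n = 0; n < MAX_INDEX; n++) {
        if (!(inuse[n / BITS_PER_WORD] & (1UL << (n % BITS_PER_WORD)))) {
            result = n;
            break;
        }
    }

    free(inuse); /* the bitmap is thrown away after every call */
    return result;
}
```

When every existing name looks like "en3f0pf0sfN", no bit is ever set and
the function returns 0, but only after walking the whole list.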
>
> Perhaps you could either force devlink to use the same device each time (eth0)
> if it is going to be renamed anyway.
It already works like that (with udev) in my slow environment. The new
device always gets eth0 (because the bitmap is all 0s), and udev renames
it to enp0xxx. On the next creation we rescan, and since eth0 is still
available, __dev_alloc_name returns eth0 again and udev renames it again.
Every later device creation follows the same path, so the time to rescan
gets longer and longer.
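
As a back-of-the-envelope model of that growth, the sketch below counts
how many existing names are examined in total when devices are created
one at a time and each is renamed away from eth0 right after creation.
names_scanned() and names_per_dev are hypothetical, and the model assumes
every name (primary and altname) is visited once per scan.

```c
#include <assert.h>

/*
 * Total number of existing names examined while creating `devices`
 * netdevs one by one, assuming each new device is immediately renamed
 * away from "eth0" and so never sets a bit in the 'inuse' bitmap.
 * names_per_dev is 1 for the primary name only, 2 with one altname each.
 */
unsigned long long names_scanned(unsigned int devices, unsigned int names_per_dev)
{
    assert(names_per_dev >= 1);

    unsigned long long existing = 0; /* names currently registered */
    unsigned long long total = 0;    /* names examined across all scans */

    for (unsigned int i = 0; i < devices; i++) {
        total += existing;         /* full rescan before creating device i */
        existing += names_per_dev; /* eth0 renamed to en3f0pf0sfN (+ altname) */
    }
    return total;
}
```

With 1000 devices and one altname each, names_scanned(1000, 2) gives
999000 name comparisons in total, i.e. the work grows quadratically with
the number of devices.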

Regards,
William


Thread overview: 9+ messages
2024-05-06 20:32 [PATCH RFC net-next] net: cache the __dev_alloc_name() William Tu
2024-05-07  7:26 ` Paolo Abeni
2024-05-07 18:55   ` William Tu
2024-05-09  7:46     ` Paolo Abeni
2024-05-09 13:06       ` William Tu
2024-05-08  4:24 ` Stephen Hemminger
2024-05-09  3:27   ` William Tu [this message]
2024-05-10 21:30     ` William Tu
2024-05-09  6:11 ` kernel test robot