From: "Rafał Miłecki" <zajec5@gmail.com>
To: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Network Development <netdev@vger.kernel.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
Russell King <linux@armlinux.org.uk>,
Andrew Lunn <andrew@lunn.ch>, Felix Fietkau <nbd@nbd.name>,
"openwrt-devel@lists.openwrt.org"
<openwrt-devel@lists.openwrt.org>,
Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Optimizing kernel compilation / alignments for network performance
Date: Fri, 29 Apr 2022 16:18:15 +0200
Message-ID: <9f958aae-8293-377c-6f30-743d9c3f3ce0@gmail.com>
In-Reply-To: <066fc320-dc04-11a4-476e-b0d11f3b17e6@gmail.com>
On 27.04.2022 19:31, Rafał Miłecki wrote:
> On 27.04.2022 14:56, Alexander Lobakin wrote:
>> From: Rafał Miłecki <zajec5@gmail.com>
>> Date: Wed, 27 Apr 2022 14:04:54 +0200
>>
>>> I noticed years ago that kernel changes touching code - that I don't use
>>> at all - can affect network performance for me.
>>>
>>> I work with home routers based on the Broadcom Northstar platform. Those
>>> are SoCs with two not-so-powerful ARM Cortex-A9 CPU cores. The main task
>>> of those devices is NAT masquerading, and that is what I test with iperf
>>> running on two x86 machines.
>>>
>>> ***
>>>
>>> Example of such unused code change:
>>> ce5013ff3bec ("mtd: spi-nor: Add support for XM25QH64A and XM25QH128A").
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce5013ff3bec05cf2a8a05c75fcd520d9914d92b
>>> It lowered my NAT speed from 381 Mb/s to 367 Mb/s (-3,5%).
>>>
>>> I first reported that issue in the e-mail thread:
>>> ARM router NAT performance affected by random/unrelated commits
>>> https://lkml.org/lkml/2019/5/21/349
>>> https://www.spinics.net/lists/linux-block/msg40624.html
>>>
>>> Back then it was commit 5b0890a97204 ("flow_dissector: Parse batman-adv
>>> unicast headers")
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9316a9ed6895c4ad2f0cde171d486f80c55d8283
>>> that increased my NAT speed from 741 Mb/s to 773 Mb/s (+4,3%).
>>>
>>> ***
>>>
>>> It appears Northstar CPUs have small caches, so any change in the
>>> location of kernel symbols can affect NAT performance. That explains why
>>> changing unrelated code affects anything, and it has been partially
>>> confirmed by aligning some of the cache-v7.S code.
>>>
>>> My question is: is there a way to find and force optimal symbol
>>> locations?
>>
>> Take a look at CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B[0]. I've been
>> fighting with the same issue on some Realtek MIPS boards: random
>> code changes in random kernel core parts were affecting NAT /
>> network performance. I'd say this option resolved it, at the cost
>> of slightly increased vmlinux size (almost no change in vmlinuz
>> size).
>> The only thing is that it was recently restricted to a set of
>> architectures and MIPS and ARM32 are not included now lol. So it's
>> either a matter of expanding the list (since it was restricted only
>> because `-falign-functions=` is not supported on some architectures)
>> or you can just do:
>>
>> make KCFLAGS=-falign-functions=64 # replace 64 with your I-cache line size
>>
>> The actual alignment is something to play with, I stopped on the
>> cacheline size, 32 in my case.
>> Also, this does not provide any guarantee that you won't suffer
>> from random data cacheline changes. There were some initiatives to
>> introduce debug alignment of data as well, but since functions are
>> often bigger than 32 bytes, while variables are usually much smaller,
>> it was increasing the vmlinux size by a ton (imagine each u32 variable
>> occupying 32-64 bytes instead of 4). But the chance of hitting this
>> is much lower than that of suffering from I-cache function misplacement.
>
> Thank you Alexander, this appears to be helpful! I decided to ignore
> CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B for now and just adjust CFLAGS
> manually.
>
>
> 1. Without ce5013ff3bec and with -falign-functions=32
> 387 Mb/s
>
> 2. Without ce5013ff3bec and with -falign-functions=64
> 377 Mb/s
>
> 3. With ce5013ff3bec and with -falign-functions=32
> 384 Mb/s
>
> 4. With ce5013ff3bec and with -falign-functions=64
> 377 Mb/s
>
>
> So it seems that:
> 1. -falign-functions=32 = pretty stable high speed
> 2. -falign-functions=64 = very stable slightly lower speed
>
>
> I'm going to perform tests on more commits, but if it stays as reliable
> as above, that will be a huge success for me.

So sadly that doesn't work all the time. Or maybe it just works randomly.

I tried multiple commits with both -falign-functions=32 and
-falign-functions=64. I still get speed variations, about 30 Mb/s in
total. From commit to commit the change is usually about 3%, but skipping
a few can result in up to 30 Mb/s (almost 10%).
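
To put that 30 Mb/s figure in proportion, here is a rough Python sketch. The helper names are made up for illustration, the baselines are simply the lowest and highest of the four -falign-functions results quoted above, and treating the 30 Mb/s spread as a noise band is my assumption rather than a measured property.

```python
# Hypothetical helpers; all speeds are in Mb/s and come from this thread.
NOISE_MBPS = 30  # total variation reported across commits (assumed noise band)


def spread_percent(spread_mbps, baseline_mbps):
    """Express an absolute spread as a percentage of a baseline speed."""
    return 100.0 * spread_mbps / baseline_mbps


def within_noise(before_mbps, after_mbps, noise_mbps=NOISE_MBPS):
    """True if a speed change is no larger than the assumed noise band."""
    return abs(after_mbps - before_mbps) <= noise_mbps


# Baselines: lowest and highest of the four -falign-functions results.
for baseline in (377, 387):
    pct = spread_percent(NOISE_MBPS, baseline)
    print(f"{NOISE_MBPS} Mb/s of {baseline} Mb/s = {pct:.1f}%")

# The ce5013ff3bec change (381 -> 367 Mb/s) compared against that band.
print(within_noise(381, 367))
```

On that reading, a 14 Mb/s drop like the one attributed to ce5013ff3bec is smaller than the variation seen between unrelated commits, so it cannot be told apart from layout noise by a single measurement.
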

Similarly to code changes, performance is also affected by enabling or
disabling kernel config options. I noticed that enabling
CONFIG_CRYPTO_PCRYPT may decrease *or* increase speed depending on
-falign-functions (and surely on the kernel commit too).

┌──────────────────────┬───────────┬──────────┬───────┐
│                      │ no PCRYPT │ PCRYPT=y │ diff  │
├──────────────────────┼───────────┼──────────┼───────┤
│ No -falign-functions │ 363 Mb/s  │ 370 Mb/s │ +2%   │
│ -falign-functions=32 │ 364 Mb/s  │ 370 Mb/s │ +1.7% │
│ -falign-functions=64 │ 372 Mb/s  │ 365 Mb/s │ -2%   │
└──────────────────────┴───────────┴──────────┴───────┘
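
For reference, the diff column can be recomputed from the two speed columns with a few lines of Python. This is only a sketch: the function name and the data layout are made up here, and the speeds are copied from the table.

```python
# (alignment option, speed without PCRYPT, speed with PCRYPT=y), in Mb/s
results = [
    ("no -falign-functions", 363, 370),
    ("-falign-functions=32", 364, 370),
    ("-falign-functions=64", 372, 365),
]


def diff_percent(without_mbps, with_mbps):
    """Relative speed change caused by enabling the option, in percent."""
    return 100.0 * (with_mbps - without_mbps) / without_mbps


for label, without_mbps, with_mbps in results:
    print(f"{label:<22} {diff_percent(without_mbps, with_mbps):+.1f}%")
```

The recomputed values should match the table to within rounding. The point is the sign: the same config change looks like a gain with no alignment or 32-byte alignment and like a loss with 64-byte alignment.
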
So I still don't have a reliable way of testing kernel changes for speed
regressions :(