From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C836BC433F5 for ; Fri, 29 Apr 2022 14:18:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376794AbiD2OVk (ORCPT ); Fri, 29 Apr 2022 10:21:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58830 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1358148AbiD2OVj (ORCPT ); Fri, 29 Apr 2022 10:21:39 -0400 Received: from mail-ej1-x62b.google.com (mail-ej1-x62b.google.com [IPv6:2a00:1450:4864:20::62b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 949BC8FFBB for ; Fri, 29 Apr 2022 07:18:20 -0700 (PDT) Received: by mail-ej1-x62b.google.com with SMTP id r13so15739047ejd.5 for ; Fri, 29 Apr 2022 07:18:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:from:to:cc :references:in-reply-to:content-transfer-encoding; bh=QSgzc0vyPfdM2R74h+fDAaKGxEkHW/J6GntJH/+8MqA=; b=JErjgbrGSCuxkZSPXdXs662X6qLQdm2y9B6kAp85o6xrHVLdlC8F2OKVKlasmfV0VD oUuOBZzjkjlcsO8BgbDE/PUx4+HGFxnB2WnAr6pjLfpmDf8LzN5E6VfV3+rOeb2SgKNd QpNBlHz2V8N0pwdR1FrceUuWfkw62IIZGmiWGsaDw87TVLkKhjqs14ZK1HgwqPjuZURJ cxjayRv06hYSKFJjmbz5uAioOrtkGlBunuj9XhJDW0IrEjrhvoKGjGrpzZIbmpXtQLze 8m4u/C+tDu/zqkuN4Unju+SewE7qH1UUX35amdRXdmQD6IaO8Phmu3nUIX2EbQBfT5sl r37w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :from:to:cc:references:in-reply-to:content-transfer-encoding; bh=QSgzc0vyPfdM2R74h+fDAaKGxEkHW/J6GntJH/+8MqA=; b=WqrXB/9DckzAZEVdxex8+iT3FsGn9RYFYfCd5EPErwLajREWljTHyV4Ra83mKKGFTB nAf7hO+T/twdohKQ+8z2+hpI9ufdvCh6HHLx2DzdxPq0I0EFXp/TDiv5wmTZG9G7GM4D XfhDE2BeeW7LkI/V+urokXSNZqFQe+BKljPp2Hg8Grb1Sb2teuJVTSGllsXwc4mA3JYS PHGE4h/xhdqeVoT2aGXtkdfRFGpFOqcxvHSHg0VCI6q49J2+wqE4+lg/wkbljlG1rX0h kDGKtCY70vLi1CSuPgu+PRto3k81joE7C+aBRg1JU2UFK7RDgiViQ1EGcrsm3j/ODhRt Ygpw== X-Gm-Message-State: AOAM532j+NB2Li6vj8geNixL88cRWihwygK3JsNfDKpQioTFHFy1Kg9L TncQNAnytjVfpfFV8lrOdJs= X-Google-Smtp-Source: ABdhPJyxDAQvouWAeDbrXTAnhgRx7IiZ7XBa7/bMsNyRGDs1v1dtJDx9kHoW5KZPnS3NXdeErGMdyA== X-Received: by 2002:a17:907:105d:b0:6f3:a3ea:cdfd with SMTP id oy29-20020a170907105d00b006f3a3eacdfdmr20718562ejb.704.1651241897738; Fri, 29 Apr 2022 07:18:17 -0700 (PDT) Received: from [192.168.26.149] (ip-194-187-74-233.konfederacka.maverick.com.pl. [194.187.74.233]) by smtp.googlemail.com with ESMTPSA id yo1-20020a170907136100b006f3ef214e0dsm689472ejb.115.2022.04.29.07.18.16 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 29 Apr 2022 07:18:16 -0700 (PDT) Message-ID: <9f958aae-8293-377c-6f30-743d9c3f3ce0@gmail.com> Date: Fri, 29 Apr 2022 16:18:15 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Thunderbird/96.0 Subject: Re: Optimizing kernel compilation / alignments for network performance From: =?UTF-8?B?UmFmYcWCIE1pxYJlY2tp?= To: Alexander Lobakin Cc: Network Development , linux-arm-kernel , Russell King , Andrew Lunn , Felix Fietkau , "openwrt-devel@lists.openwrt.org" , Florian Fainelli References: <84f25f73-1fab-fe43-70eb-45d25b614b4c@gmail.com> <20220427125658.3127816-1-alexandr.lobakin@intel.com> <066fc320-dc04-11a4-476e-b0d11f3b17e6@gmail.com> In-Reply-To: <066fc320-dc04-11a4-476e-b0d11f3b17e6@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 27.04.2022 19:31, Rafał Miłecki wrote: > On 27.04.2022 14:56, Alexander Lobakin wrote: >> From: Rafał Miłecki >> Date: Wed, 27 Apr 2022 14:04:54 +0200 >> >>> I noticed years ago that kernel changes touching code - that I don't use >>> at all - can affect network performance for me. >>> >>> I work with home routers based on Broadcom Northstar platform. Those >>> are SoCs with not-so-powerful 2 x ARM Cortex-A9 CPU cores. Main task of >>> those devices is NAT masquerade and that is what I test with iperf >>> running on two x86 machines. >>> >>> *** >>> >>> Example of such unused code change: >>> ce5013ff3bec ("mtd: spi-nor: Add support for XM25QH64A and XM25QH128A"). >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce5013ff3bec05cf2a8a05c75fcd520d9914d92b >>> It lowered my NAT speed from 381 Mb/s to 367 Mb/s (-3,5%). >>> >>> I first reported that issue it in the e-mail thread: >>> ARM router NAT performance affected by random/unrelated commits >>> https://lkml.org/lkml/2019/5/21/349 >>> https://www.spinics.net/lists/linux-block/msg40624.html >>> >>> Back then it was commit 5b0890a97204 ("flow_dissector: Parse batman-adv >>> unicast headers") >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9316a9ed6895c4ad2f0cde171d486f80c55d8283 >>> that increased my NAT speed from 741 Mb/s to 773 Mb/s (+4,3%). >>> >>> *** >>> >>> It appears Northstar CPUs have little cache size and so any change in >>> location of kernel symbols can affect NAT performance. That explains why >>> changing unrelated code affects anything & it has been partially proven >>> aligning some of cache-v7.S code. >>> >>> My question is: is there a way to find out & force an optimal symbols >>> locations? >> >> Take a look at CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B[0]. I've been >> fighting with the same issue on some Realtek MIPS boards: random >> code changes in random kernel core parts were affecting NAT / >> network performance. This option resolved this I'd say, for the cost >> of slightly increased vmlinux size (almost no change in vmlinuz >> size). >> The only thing is that it was recently restricted to a set of >> architectures and MIPS and ARM32 are not included now lol. So it's >> either a matter of expanding the list (since it was restricted only >> because `-falign-functions=` is not supported on some architectures) >> or you can just do: >> >> make KCFLAGS=-falign-functions=64 # replace 64 with your I-cache size >> >> The actual alignment is something to play with, I stopped on the >> cacheline size, 32 in my case. >> Also, this does not provide any guarantees that you won't suffer >> from random data cacheline changes. There were some initiatives to >> introduce debug alignment of data as well, but since function are >> often bigger than 32, while variables are usually much smaller, it >> was increasing the vmlinux size by a ton (imagine each u32 variable >> occupying 32-64 bytes instead of 4). But the chance of catching this >> is much lower than to suffer from I-cache function misplacement. > > Thank you Alexander, this appears to be helpful! I decided to ignore > CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B for now and just adjust CFLAGS > manually. > > > 1. Without ce5013ff3bec and with -falign-functions=32 > 387 Mb/s > > 2. Without ce5013ff3bec and with -falign-functions=64 > 377 Mb/s > > 3. With ce5013ff3bec and with -falign-functions=32 > 384 Mb/s > > 4. With ce5013ff3bec and with -falign-functions=64 > 377 Mb/s > > > So it seems that: > 1. -falign-functions=32 = pretty stable high speed > 2. -falign-functions=64 = very stable slightly lower speed > > > I'm going to perform tests on more commits but if it stays so reliable > as above that will be a huge success for me. So sadly that doesn't work all the time. Or maybe just works randomly. I tried multiple commits with both: -falign-functions=32 and -falign-functions=64 . I still get speed variations. About 30 Mb/s in total. From commit to commit it's usually about 3% but skipping few can result in up to 30 Mb/s (almost 10%). Similarly to code changes performance also gets affected by enabling / disabling kernel config options. I noticed that enabling CONFIG_CRYPTO_PCRYPT may decrease *or* increase speed depending on -falign-functions (and depending on kernel commit surely too). ┌──────────────────────┬───────────┬──────────┬───────┐ │ │ no PCRYPT │ PCRYPT=y │ diff │ ├──────────────────────┼───────────┼──────────┼───────┤ │ No -falign-functions │ 363 Mb/s │ 370 Mb/s │ +2% │ │ -falign-functions=32 │ 364 Mb/s │ 370 Mb/s │ +1,7% │ │ -falign-functions=64 │ 372 Mb/s │ 365 Mb/s │ -2% │ └──────────────────────┴───────────┴──────────┴───────┘ So I still don't have a reliable way of testing kernel changes for speed regressions :(