Message-ID: <0145f5db-1adf-46d1-1a2e-41230ab1e462@huawei.com>
Date: Wed, 24 Apr 2024 12:29:25 +0800
Subject: Re: [RFC PATCH v2 0/2] mm: convert mm's rss stats to use atomic mode
References: <20240418142008.2775308-1-zhangpeng362@huawei.com>
From: "zhangpeng (AS)"
In-Reply-To: <20240418142008.2775308-1-zhangpeng362@huawei.com>
Sender: owner-linux-mm@kvack.org

On 2024/4/18 22:20, Peng Zhang wrote:

Any suggestions or opinions are welcome. Could someone please review this
patch series? Thanks!

> From: ZhangPeng
>
> Since commit f1a7941243c1 ("mm: convert mm's rss stats into
> percpu_counter"), rss_stats has been converted to percpu_counter,
> which changes the error margin from (nr_threads * 64) to approximately
> (nr_cpus ^ 2).
> However, the new percpu allocation in mm_init() causes a
> performance regression on fork/exec/shell. Even after commit 14ef95be6f55
> ("kernel/fork: group allocation/free of per-cpu counters for mm struct"),
> the performance of fork/exec/shell is still poor compared to previous
> kernel versions.
>
> To mitigate the performance regression, we delay the allocation of percpu
> memory for rss_stats. To do so, we convert mm's rss stats to use
> percpu_counter atomic mode. For single-threaded processes, rss_stat stays in
> atomic mode, which reduces the memory consumption and performance
> regression caused by using percpu. For multi-threaded processes,
> rss_stat is switched to percpu mode to reduce the error margin.
> We convert rss_stats from atomic mode to percpu mode only when the
> second thread is created.
>
> In lmbench tests, we see a 2% ~ 4% performance improvement
> for lmbench fork_proc/exec_proc/shell_proc and a 6.7% performance
> improvement for lmbench page_fault (before batch mode[1]).
>
> The test results are as follows:
>               base        base+revert        base+this patch
>
> fork_proc     416.3ms     400.0ms  (3.9%)    398.6ms  (4.2%)
> exec_proc     2095.9ms    2061.1ms (1.7%)    2047.7ms (2.3%)
> shell_proc    3028.2ms    2954.7ms (2.4%)    2961.2ms (2.2%)
> page_fault    0.3603ms    0.3358ms (6.8%)    0.3361ms (6.7%)
>
> [1] https://lore.kernel.org/all/20240412064751.119015-1-wangkefeng.wang@huawei.com/
>
> ChangeLog:
> v1->v2:
> - Convert rss_stats from atomic mode to percpu mode only when
>   the second thread is created per Jan Kara.
> - Compared with v1, the performance data may be different due to
>   different test machines.
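The scheme quoted above (atomic mode for a single-threaded process, a switch to percpu mode when the second thread is created) can be sketched in userspace C roughly as follows. This is not the code from the patches. Every name here (rss_counter_sketch, rss_switch_to_percpu, SKETCH_NR_CPUS, SKETCH_BATCH) is hypothetical, the per-CPU slots are simulated with an explicit cpu index, and the sketch makes no attempt to handle concurrent updates racing with the mode switch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants chosen for this sketch only. */
#define SKETCH_NR_CPUS 4
#define SKETCH_BATCH   32

struct rss_counter_sketch {
	atomic_long count;  /* exact value in atomic mode; folded total in percpu mode */
	long *percpu;       /* NULL in atomic mode; simulated per-CPU deltas otherwise */
};

/* Start in atomic mode: nothing is allocated at "fork" time. */
void rss_init(struct rss_counter_sketch *c)
{
	atomic_init(&c->count, 0);
	c->percpu = NULL;
}

/* Models the switch made when the second thread is created. */
int rss_switch_to_percpu(struct rss_counter_sketch *c)
{
	if (c->percpu)
		return 0;   /* already in percpu mode */
	c->percpu = calloc(SKETCH_NR_CPUS, sizeof(*c->percpu));
	return c->percpu ? 0 : -1;
}

void rss_add(struct rss_counter_sketch *c, int cpu, long delta)
{
	long v;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!c->percpu) {
		/* Atomic mode: every update hits the shared counter. */
		atomic_fetch_add(&c->count, delta);
		return;
	}
	/* Percpu mode: accumulate locally, fold once the batch is reached. */
	v = c->percpu[cpu] + delta;
	if (labs(v) >= SKETCH_BATCH) {
		atomic_fetch_add(&c->count, v);
		v = 0;
	}
	c->percpu[cpu] = v;
}

/* Cheap read: exact in atomic mode, approximate in percpu mode. */
long rss_read_fast(struct rss_counter_sketch *c)
{
	return atomic_load(&c->count);
}

/* Precise read: shared counter plus all unfolded per-CPU deltas. */
long rss_sum(struct rss_counter_sketch *c)
{
	long sum = atomic_load(&c->count);

	if (c->percpu)
		for (int i = 0; i < SKETCH_NR_CPUS; i++)
			sum += c->percpu[i];
	return sum;
}

void rss_destroy(struct rss_counter_sketch *c)
{
	free(c->percpu);
	c->percpu = NULL;
}
```

In this sketch a counter that never leaves atomic mode costs no allocation and its fast read is exact. After rss_switch_to_percpu(), small updates stay in the per-CPU slot, so rss_read_fast() can lag behind rss_sum() by up to the batch size per CPU until a fold happens.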
>
> ZhangPeng (2):
>   percpu_counter: introduce atomic mode for percpu_counter
>   mm: convert mm's rss stats to use atomic mode
>
>  include/linux/mm.h             | 50 +++++++++++++++++++++++++++++-----
>  include/linux/percpu_counter.h | 43 +++++++++++++++++++++++++++--
>  include/trace/events/kmem.h    |  4 +--
>  kernel/fork.c                  | 18 +++++++-----
>  lib/percpu_counter.c           | 31 +++++++++++++++++++--
>  5 files changed, 125 insertions(+), 21 deletions(-)
>

--
Best Regards,
Peng