Date: Fri, 19 Apr 2024 10:30:24 +0800
From: Rongwei Wang <rongwei.wrw@gmail.com>
Subject: Re: [RFC PATCH v2 2/2] mm: convert mm's rss stats to use atomic mode
To: Peng Zhang <zhangpeng362@huawei.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, dennisszhou@gmail.com, shakeelb@google.com, jack@suse.cz, surenb@google.com, kent.overstreet@linux.dev, mhocko@suse.cz, vbabka@suse.cz, yuzhao@google.com, yu.ma@intel.com, wangkefeng.wang@huawei.com, sunnanyong@huawei.com
References: <20240418142008.2775308-1-zhangpeng362@huawei.com> <20240418142008.2775308-3-zhangpeng362@huawei.com>
In-Reply-To: <20240418142008.2775308-3-zhangpeng362@huawei.com>

On 2024/4/18 22:20, Peng Zhang wrote:
> From: ZhangPeng <zhangpeng362@huawei.com>
>
> Since commit f1a7941243c1 ("mm: convert mm's rss stats into
> percpu_counter"), the rss_stats have been converted into percpu_counter,
> which changes the error margin from (nr_threads * 64) to approximately
> (nr_cpus ^ 2). However, the new percpu allocation in mm_init() causes a
> performance regression on fork/exec/shell. Even after commit 14ef95be6f55
> ("kernel/fork: group allocation/free of per-cpu counters for mm struct"),
> the performance of fork/exec/shell is still poor compared to previous
> kernel versions.
>
> To mitigate this performance regression, we delay the allocation of
> percpu memory for rss_stats. Therefore, we convert mm's rss stats to use
> percpu_counter's atomic mode. For single-threaded processes, rss_stat is
> in atomic mode, which avoids the memory consumption and performance
> regression caused by using percpu. For multi-threaded processes,
> rss_stat is switched to percpu mode to reduce the error margin.
> We convert rss_stats from atomic mode to percpu mode only when the
> second thread is created.

Hi, Zhang Peng

We also found this regression in lmbench these days. I have not tested
your patch, but it seems it will help a lot.

And I see this patch does not fix the regression for multi-threaded
processes; is that because rss_stat is switched to percpu mode?
(If I'm wrong, please correct me.)

It also seems percpu_counter has a bad effect in exit_mmap(). If so, I'm
wondering if we can further improve the exit_mmap() path in the
multi-threaded scenario, e.g. by only summing over the CPUs the process
has actually run on (mm_cpumask()? I'm not sure). A rough, untested
sketch of what I mean is below.

>
> After the lmbench test, we can get a 2% ~ 4% performance improvement
> for lmbench fork_proc/exec_proc/shell_proc and a 6.7% performance
> improvement for lmbench page_fault (before batch mode [1]).
>
> The test results are as follows:
>
>              base        base+revert       base+this patch
>
> fork_proc    416.3ms     400.0ms (3.9%)    398.6ms (4.2%)
> exec_proc    2095.9ms    2061.1ms (1.7%)   2047.7ms (2.3%)
> shell_proc   3028.2ms    2954.7ms (2.4%)   2961.2ms (2.2%)
> page_fault   0.3603ms    0.3358ms (6.8%)   0.3361ms (6.7%)

I think the regression will become more obvious with more cores. How
many cores does your test machine have?
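
To make the exit_mmap() idea concrete: a minimal, completely untested
sketch of the kind of cpumask-restricted sum I have in mind. It borrows
the shape of __percpu_counter_sum(), assumes the counter is already in
percpu mode, and the helper name is made up. It is only safe if no
counts can be left behind on a CPU after it is cleared from
mm_cpumask(), which is architecture-dependent:

static s64 mm_counter_sum_cpumask(struct mm_struct *mm, int member)
{
	struct percpu_counter *fbc = &mm->rss_stat[member];
	unsigned long flags;
	s64 ret;
	int cpu;

	raw_spin_lock_irqsave(&fbc->lock, flags);
	ret = fbc->count;
	/* Walk only the CPUs this mm has run on, not every online CPU. */
	for_each_cpu(cpu, mm_cpumask(mm))
		ret += *per_cpu_ptr(fbc->counters, cpu);
	raw_spin_unlock_irqrestore(&fbc->lock, flags);

	return ret;
}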
Thanks,
-wrw

>
> [1] https://lore.kernel.org/all/20240412064751.119015-1-wangkefeng.wang@huawei.com/
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  include/linux/mm.h          | 50 +++++++++++++++++++++++++++++++------
>  include/trace/events/kmem.h |  4 +--
>  kernel/fork.c               | 18 +++++++------
>  3 files changed, 56 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d261e45bb29b..8f1bfbd54697 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2631,30 +2631,66 @@ static inline bool get_user_page_fast_only(unsigned long addr,
>   */
>  static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
>  {
> -	return percpu_counter_read_positive(&mm->rss_stat[member]);
> +	struct percpu_counter *fbc = &mm->rss_stat[member];
> +
> +	if (percpu_counter_initialized(fbc))
> +		return percpu_counter_read_positive(fbc);
> +
> +	return percpu_counter_atomic_read(fbc);
>  }
>  
>  void mm_trace_rss_stat(struct mm_struct *mm, int member);
>  
>  static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
>  {
> -	percpu_counter_add(&mm->rss_stat[member], value);
> +	struct percpu_counter *fbc = &mm->rss_stat[member];
> +
> +	if (percpu_counter_initialized(fbc))
> +		percpu_counter_add(fbc, value);
> +	else
> +		percpu_counter_atomic_add(fbc, value);
>  
>  	mm_trace_rss_stat(mm, member);
>  }
>  
>  static inline void inc_mm_counter(struct mm_struct *mm, int member)
>  {
> -	percpu_counter_inc(&mm->rss_stat[member]);
> -
> -	mm_trace_rss_stat(mm, member);
> +	add_mm_counter(mm, member, 1);
>  }
>  
>  static inline void dec_mm_counter(struct mm_struct *mm, int member)
>  {
> -	percpu_counter_dec(&mm->rss_stat[member]);
> +	add_mm_counter(mm, member, -1);
> +}
>  
> -	mm_trace_rss_stat(mm, member);
> +static inline s64 mm_counter_sum(struct mm_struct *mm, int member)
> +{
> +	struct percpu_counter *fbc = &mm->rss_stat[member];
> +
> +	if (percpu_counter_initialized(fbc))
> +		return percpu_counter_sum(fbc);
> +
> +	return percpu_counter_atomic_read(fbc);
> +}
> +
> +static inline s64 mm_counter_sum_positive(struct mm_struct *mm, int member)
> +{
> +	struct percpu_counter *fbc = &mm->rss_stat[member];
> +
> +	if (percpu_counter_initialized(fbc))
> +		return percpu_counter_sum_positive(fbc);
> +
> +	return percpu_counter_atomic_read(fbc);
> +}
> +
> +static inline int mm_counter_switch_to_pcpu_many(struct mm_struct *mm)
> +{
> +	return percpu_counter_switch_to_pcpu_many(mm->rss_stat, NR_MM_COUNTERS);
> +}
> +
> +static inline void mm_counter_destroy_many(struct mm_struct *mm)
> +{
> +	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
>  }
>  
>  /* Optimized variant when folio is already known not to be anon */
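
A side note on the error margins mentioned in the log, as I read the
generic percpu_counter code (please correct me if I'm wrong): a
percpu-mode read can drift by up to nr_cpus * batch, because each CPU
folds its local delta into fbc->count only once it reaches the batch,
and compute_batch_value() scales the batch with the number of online
CPUs, so the worst case grows roughly as nr_cpus^2. The atomic-mode
read above has no such drift. Illustration only, not real kernel code:

/* Worst-case drift of a percpu-mode percpu_counter_read(). */
static long pcpu_read_max_error(long nr_cpus, long batch)
{
	/* batch = max(32, nr_cpus * 2) in compute_batch_value() */
	return nr_cpus * batch;		/* => O(nr_cpus^2) on big machines */
}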
updated as well"); > > for (i = 0; i < NR_MM_COUNTERS; i++) { > - long x = percpu_counter_sum(&mm->rss_stat[i]); > + long x = mm_counter_sum(mm, i); > > if (unlikely(x)) > pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n", > @@ -1301,16 +1301,10 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, > if (mm_alloc_cid(mm)) > goto fail_cid; > > - if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT, > - NR_MM_COUNTERS)) > - goto fail_pcpu; > - > mm->user_ns = get_user_ns(user_ns); > lru_gen_init_mm(mm); > return mm; > > -fail_pcpu: > - mm_destroy_cid(mm); > fail_cid: > destroy_context(mm); > fail_nocontext: > @@ -1730,6 +1724,16 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk) > if (!oldmm) > return 0; > > + /* > + * For single-thread processes, rss_stat is in atomic mode, which > + * reduces the memory consumption and performance regression caused by > + * using percpu. For multiple-thread processes, rss_stat is switched to > + * the percpu mode to reduce the error margin. > + */ > + if (clone_flags & CLONE_THREAD) > + if (mm_counter_switch_to_pcpu_many(oldmm)) > + return -ENOMEM; > + > if (clone_flags & CLONE_VM) { > mmget(oldmm); > mm = oldmm;