From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Jul 2025 13:17:26 -0700
From: Namhyung Kim
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, david@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	mjguzik@gmail.com, luto@kernel.org, peterz@infradead.org,
	acme@kernel.org, tglx@linutronix.de, willy@infradead.org,
	raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH v5 07/14] perf bench mem: Allow chunking on a memory region
References: <20250710005926.1159009-1-ankur.a.arora@oracle.com>
	<20250710005926.1159009-8-ankur.a.arora@oracle.com>
In-Reply-To: <20250710005926.1159009-8-ankur.a.arora@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Wed, Jul 09, 2025 at 05:59:19PM -0700, Ankur Arora wrote:
> There can be a significant gap in memset/memcpy performance depending
> on the size of the region being operated on.
>
> With chunk-size=4kb:
>
>   $ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
>
>   $ perf bench mem memset -p 4kb -k 4kb -s 4gb -l 10 -f x86-64-stosq
>   # Running 'mem/memset' benchmark:
>   # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>   # Copying 4gb bytes ...
>
>       13.011655 GB/sec
>
> With chunk-size=1gb:
>
>   $ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
>
>   $ perf bench mem memset -p 4kb -k 1gb -s 4gb -l 10 -f x86-64-stosq
>   # Running 'mem/memset' benchmark:
>   # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>   # Copying 4gb bytes ...
>
>       21.936355 GB/sec
>
> So, allow the user to specify the chunk-size.
>
> The default value is identical to the total size of the region, which
> preserves current behaviour.
>
> Signed-off-by: Ankur Arora

Again, please update the documentation.  With that,

Reviewed-by: Namhyung Kim

Thanks,
Namhyung

> ---
>  tools/perf/bench/mem-functions.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/bench/mem-functions.c b/tools/perf/bench/mem-functions.c
> index e4d713587d45..412d18f2cb2e 100644
> --- a/tools/perf/bench/mem-functions.c
> +++ b/tools/perf/bench/mem-functions.c
> @@ -36,6 +36,7 @@
>  static const char *size_str = "1MB";
>  static const char *function_str = "all";
>  static const char *page_size_str = "4KB";
> +static const char *chunk_size_str = "0";
>  static unsigned int nr_loops = 1;
>  static bool use_cycles;
>  static int cycles_fd;
> @@ -49,6 +50,10 @@ static const struct option options[] = {
>  		    "Specify page-size for mapping memory buffers. "
>  		    "Available sizes: 4KB, 2MB, 1GB (case insensitive)"),
>
> +	OPT_STRING('k', "chunk", &chunk_size_str, "0",
> +		    "Specify the chunk-size for each invocation. "
> +		    "Available units: B, KB, MB, GB and TB (case insensitive)"),
> +
>  	OPT_STRING('f', "function", &function_str, "all",
>  		    "Specify the function to run, \"all\" runs all available functions, \"help\" lists them"),
>
> @@ -69,6 +74,7 @@ union bench_clock {
>  struct bench_params {
>  	size_t size;
>  	size_t size_total;
> +	size_t chunk_size;
>  	unsigned int nr_loops;
>  	unsigned int page_shift;
>  };
> @@ -242,6 +248,14 @@ static int bench_mem_common(int argc, const char **argv, struct bench_mem_info *
>  	}
>  	p.size_total = (size_t)p.size * p.nr_loops;
>
> +	p.chunk_size = (size_t)perf_atoll((char *)chunk_size_str);
> +	if ((s64)p.chunk_size < 0 || (s64)p.chunk_size > (s64)p.size) {
> +		fprintf(stderr, "Invalid chunk_size:%s\n", chunk_size_str);
> +		return 1;
> +	}
> +	if (!p.chunk_size)
> +		p.chunk_size = p.size;
> +
>  	page_size = (unsigned int)perf_atoll((char *)page_size_str);
>  	if (page_size != (1 << PAGE_SHIFT_4KB) &&
>  	    page_size != (1 << PAGE_SHIFT_2MB) &&
> @@ -299,7 +313,8 @@ static int do_memcpy(const struct function *r, struct bench_params *p,
>
>  	clock_get(&start);
>  	for (unsigned int i = 0; i < p->nr_loops; ++i)
> -		fn(dst, src, p->size);
> +		for (size_t off = 0; off < p->size; off += p->chunk_size)
> +			fn(dst + off, src + off, min(p->chunk_size, p->size - off));
>  	clock_get(&end);
>
>  	*rt = clock_diff(&start, &end);
> @@ -401,7 +416,8 @@ static int do_memset(const struct function *r, struct bench_params *p,
>
>  	clock_get(&start);
>  	for (unsigned int i = 0; i < p->nr_loops; ++i)
> -		fn(dst, i, p->size);
> +		for (size_t off = 0; off < p->size; off += p->chunk_size)
> +			fn(dst + off, i, min(p->chunk_size, p->size - off));
>  	clock_get(&end);
>
>  	*rt = clock_diff(&start, &end);
> --
> 2.43.5
>
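
For anyone who wants to try the chunking behaviour outside of perf, the logic in the hunks above can be sketched as a small standalone C fragment. This is only an illustration under stated assumptions, not the code in mem-functions.c: resolve_chunk_size() and chunked_memset() are hypothetical names, plain memset() stands in for the benchmarked function, and the loop advances by the length actually written rather than by the chunk size (same calls, but the offset can never overflow).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: pick the effective chunk
 * size the way the patch describes. 0 selects the whole region and a
 * chunk larger than the region is rejected. Assumes size > 0, so a
 * return value of 0 always means "invalid".
 */
size_t resolve_chunk_size(size_t chunk, size_t size)
{
	if (chunk > size)
		return 0;
	return chunk ? chunk : size;
}

/*
 * Hypothetical helper: fill `size` bytes at `dst` in pieces of at most
 * `chunk` bytes, with a shorter final piece if `size` is not a
 * multiple of `chunk`. Returns the number of memset() calls made.
 */
size_t chunked_memset(void *dst, int c, size_t size, size_t chunk)
{
	size_t calls = 0;
	size_t off = 0;

	assert(chunk > 0);	/* a zero chunk would never make progress */

	while (off < size) {
		size_t left = size - off;
		size_t len = left < chunk ? left : chunk;

		memset((char *)dst + off, c, len);
		off += len;
		calls++;
	}
	return calls;
}
```

With a 10-byte buffer and a 4-byte chunk this makes three calls (4, 4 and 2 bytes); with the chunk equal to the region size it collapses to a single call, which is the default the commit message describes.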