Date: Mon, 14 Apr 2025 08:36:18 +0200
From: Ingo Molnar
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	luto@kernel.org, peterz@infradead.org, paulmck@kernel.org,
	rostedt@goodmis.org, tglx@linutronix.de, willy@infradead.org,
	jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing
In-Reply-To: <20250414034607.762653-1-ankur.a.arora@oracle.com>
References: <20250414034607.762653-1-ankur.a.arora@oracle.com>


* Ankur Arora wrote:

> We also see performance improvement for cases where this optimization is
> unavailable (pg-sz=2MB on AMD, and pg-sz=2MB|1GB on Intel) because
> REP; STOS is typically microcoded which can now be amortized over
> larger regions and the hint allows the hardware prefetcher to do a
> better job.
>
> Milan (EPYC 7J13, boost=0, preempt=full|lazy):
>
>                mm/folio_zero_user    x86/folio_zero_user     change
>                 (GB/s +- stddev)      (GB/s +- stddev)
>
>   pg-sz=1GB     16.51 +- 0.54%        42.80 +- 3.48%        + 159.2%
>   pg-sz=2MB     11.89 +- 0.78%        16.12 +- 0.12%        +  35.5%
>
> Icelakex (Platinum 8358, no_turbo=1, preempt=full|lazy):
>
>                mm/folio_zero_user    x86/folio_zero_user     change
>                 (GB/s +- stddev)      (GB/s +- stddev)
>
>   pg-sz=1GB      8.01 +- 0.24%        11.26 +- 0.48%        +  40.57%
>   pg-sz=2MB      7.95 +- 0.30%        10.90 +- 0.26%        +  37.10%

How was this measured? Could you integrate this measurement as a new
tools/perf/bench/ subcommand so that people can try it on different
systems, etc.? There's already a 'perf bench mem' subcommand space where
this feature could be added to.

Thanks,

	Ingo