From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97E99202984 for ; Thu, 6 Mar 2025 10:59:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741258766; cv=none; b=c6uuPm0DntKPidYiHMvrgMhd9yJ5wk9mLWjYm2S7qMvcT303ZonC0b36j0TVAGfhCHtYDZtzUjxTLdu+Iuzofu3BpVqcjVVze/F6p8L5GpwSbf1b5MboPy45rJmOBm9VPojCS5gLl+gvvADcS/oaPeNewZ106k4G0vgqYkvadkI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741258766; c=relaxed/simple; bh=oiKIBMvjUFZQtuUGRvuFWZOhGe9vd7B1krTsOAEz9Qs=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=CgtxQT/WNKXMUlt19IUl0+KP2XY7Id9LOWZsgT+3BHZt3Crkh23Cuhjznu4Id86vU/dD2Bp7sBnrDenfFwmemO8VxvGG4juQLQIrdxd4bZrMf9Tx9qAzsUYNVz+owrglyqnPq7Vm5tkUhm3CoQwQX1aKSE/qEkRxuuf2szxeByc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RL4039pb; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RL4039pb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9629FC4CEE0; Thu, 6 Mar 2025 10:59:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1741258766; bh=oiKIBMvjUFZQtuUGRvuFWZOhGe9vd7B1krTsOAEz9Qs=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=RL4039pbPMJisfHedVT/4GA3Kv1NSanOT/j6XHQr9F/eaPcgEcvP2IL4DPnY2qvqt W9aphpmWNqePu4GZPzDJhASWEfQweJAvXYTmKcF8Qzoyuudq9K9iLxqJ7PBKp+uO9r n89ZIbAcToQDN92nsh+nEsZqa+qIvBK97vEbKMBH0DhAc7A0WI7C99tC8b2WQOQhDX Z1GFOURLT9rC4BdzuOnWUGM3SBhWZXhOrkoQd474ceSKg1/GtlalqZNsVEpS6n4G/f aQ81lQb8T77i1U8gO+8Hd7PdpRVtZTYgWkyoDypmrWQiyAPfS3zKXvMPpSymlU3xwR Bz4wWJzxAZ0FA== Date: Thu, 6 Mar 2025 11:59:20 +0100 From: Ingo Molnar To: Dirk Gouders , Arnaldo Carvalho de Melo , Jiri Olsa , Peter Zijlstra , Namhyung Kim Cc: Uros Bizjak , Borislav Petkov , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, Peter Zijlstra , Thomas Gleixner , Dave Hansen , "H. Peter Anvin" , Linus Torvalds Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns Message-ID: References: <20250228123825.2729925-1-ubizjak@gmail.com> <20f1af22-71dc-4d62-9615-03030012222e@intel.com> <20250301123802.GCZ8L_qsv7-WwUwqt5@fat_crate.local> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ( I've Cc:-ed some perf gents as to the measurement artifacts observed below. Full report quoted below. ) * Dirk Gouders wrote: > Hi Ingo, > > my interest comes, because I just started to try to better understand > PCL and am reading the perf manual pages. Perhaps I should therefore > keep my RO-bit permanent for some more months, but: > > > And if the benchmark is context-switching heavy, you'll want to use > > 'perf stat -a' option to not have PMU context switching costs, and the > > I'm sure you know what you are talking about so I don't doubt the above > is correct but perhaps, the manual page should also clarify -a: > > -a:: > --all-cpus:: > system-wide collection from all CPUs (default if no target is specified) > > In the last example -a is combined with -C 2 which is even more irritating when > you just started with the manual pages. > > > But the main reason why I thought it might be OK to once toggle my > RO-bit is that I tried your examples and with the first one I have way > higher numbers than yours and I thought that must be, because you just > own the faster machine (as I would have expected): > > > starship:~> perf bench sched pipe > > # Running 'sched/pipe' benchmark: > > # Executed 1000000 pipe operations between two processes > > > > Total time: 6.939 [sec] > > > > 6.939128 usecs/op > > 144110 ops/sec > > lena:~> perf bench sched pipe > # Running 'sched/pipe' benchmark: > # Executed 1000000 pipe operations between two processes > > Total time: 11.129 [sec] > > 11.129952 usecs/op > 89847 ops/sec > > And I expected this to continue throughout the examples. > > But -- to keep this short -- with the last example, my numbers are > suddenly significantly lower than yours: > > > starship:~> taskset 0x4 perf stat -a -C 2 -e instructions --repeat 5 perf bench sched pipe > > 5.808068 usecs/op > > 5.843716 usecs/op > > 5.826543 usecs/op > > 5.801616 usecs/op > > 5.793129 usecs/op > > > > Performance counter stats for 'system wide' (5 runs): > > > > 32,244,691,275 instructions ( +- 0.21% ) > > > > 5.81624 +- 0.00912 seconds time elapsed ( +- 0.16% ) > > lena:~> taskset 0x4 perf stat -a -C 2 -e instructions --repeat 5 perf bench sched pipe > 4.204444 usecs/op > 4.169279 usecs/op > 4.186812 usecs/op > 4.217039 usecs/op > 4.208538 usecs/op > > Performance counter stats for 'system wide' (5 runs): > > 14,196,762,588 instructions ( +- 0.04% ) > > 4.20203 +- 0.00854 seconds time elapsed ( +- 0.20% ) > > > > Of course, I don't want to waste anyones time if this is a so obvious > thing that only newbies don't understand. So, feel free to just ignore this. > > Regards > > Dirk