From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx10.gouders.net (mx10.gouders.net [202.61.206.94]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B3121A76BC for ; Thu, 6 Mar 2025 10:53:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.61.206.94 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741258407; cv=none; b=fY7t6I1uZCgV7jGXeh1iB3rjsK4otOpJ6EIz7MRMsqC3qpFJahilMJePajSmc1ziKKcV7nxGLZVioJDLpx9jFdnxaQzYBVmVDuP+rePKUbG5SvW4HHBF28272ovzNRFzzzhrtPz9EvPA/Tv0XTtixL3JDQH7DXodeY2iKsV8kNo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1741258407; c=relaxed/simple; bh=MQyGl26Whql4FvL52ynnLjyqxmPpjyEauG0LYmwzoVs=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=HF6q1YDojLA2MQVFnuPgr6EMwA2hw8QeHYp9iibCwF8JDqjCU68vCLYrtR/RYBNkiFukVpmZzXJ2wRqR0gdRHLKCUC/SkxIfw7HzvpLMLhNwxJvrm/wpNO9FXw1wIETGPvp8hm/YQ/VVcwkI4/OnG3F7E7oZKKk/6IreVqrHRWs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gouders.net; spf=pass smtp.mailfrom=gouders.net; dkim=pass (1024-bit key) header.d=gouders.net header.i=@gouders.net header.b=ar576irM; arc=none smtp.client-ip=202.61.206.94 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gouders.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gouders.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=gouders.net header.i=@gouders.net header.b="ar576irM" Received: from localhost (ip-109-42-177-222.web.vodafone.de [109.42.177.222]) (authenticated bits=0) by mx10.gouders.net (8.17.1.9/8.17.1.9) with ESMTPSA id 526AqD7U021659 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Thu, 6 Mar 2025 11:52:14 +0100 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gouders.net; s=gnet; t=1741258334; bh=MQyGl26Whql4FvL52ynnLjyqxmPpjyEauG0LYmwzoVs=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=ar576irMuzc2sXSuJ1W8FiDS59YIegPWWoHsyA2nKIbxpFW1Nru5QoMZGghvBWFq4 +sXkUmQz81H+RVDSHDP8F7y/Nf5SM76khqqBCE6fR8tNN2GixiEcL2XUyM7oW3uh7C YloJIAwxIeUKibq45HT+t/KNtBMcM6vOgqbHj3dY= From: Dirk Gouders To: Ingo Molnar Cc: Uros Bizjak , Borislav Petkov , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, Peter Zijlstra , Thomas Gleixner , Dave Hansen , "H. Peter Anvin" , Linus Torvalds Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns In-Reply-To: (Ingo Molnar's message of "Wed, 5 Mar 2025 21:20:05 +0100") References: <20250228123825.2729925-1-ubizjak@gmail.com> <20f1af22-71dc-4d62-9615-03030012222e@intel.com> <20250301123802.GCZ8L_qsv7-WwUwqt5@fat_crate.local> User-Agent: Gnus/5.13 (Gnus v5.13) Date: Thu, 06 Mar 2025 11:52:08 +0100 Message-ID: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain Hi Ingo, my interest comes, because I just started to try to better understand PCL and am reading the perf manual pages. Perhaps I should therefore keep my RO-bit permanent for some more months, but: > And if the benchmark is context-switching heavy, you'll want to use > 'perf stat -a' option to not have PMU context switching costs, and the I'm sure you know what you are talking about so I don't doubt the above is correct but perhaps, the manual page should also clarify -a: -a:: --all-cpus:: system-wide collection from all CPUs (default if no target is specified) In the last example -a is combined with -C 2 which is even more irritating when you just started with the manual pages. But the main reason why I thought it might be OK to once toggle my RO-bit is that I tried your examples and with the first one I have way higher numbers than yours and I thought that must be, because you just own the faster machine (as I would have expected): > starship:~> perf bench sched pipe > # Running 'sched/pipe' benchmark: > # Executed 1000000 pipe operations between two processes > > Total time: 6.939 [sec] > > 6.939128 usecs/op > 144110 ops/sec lena:~> perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 11.129 [sec] 11.129952 usecs/op 89847 ops/sec And I expected this to continue throughout the examples. But -- to keep this short -- with the last example, my numbers are suddenly significantly lower than yours: > starship:~> taskset 0x4 perf stat -a -C 2 -e instructions --repeat 5 perf bench sched pipe > 5.808068 usecs/op > 5.843716 usecs/op > 5.826543 usecs/op > 5.801616 usecs/op > 5.793129 usecs/op > > Performance counter stats for 'system wide' (5 runs): > > 32,244,691,275 instructions ( +- 0.21% ) > > 5.81624 +- 0.00912 seconds time elapsed ( +- 0.16% ) lena:~> taskset 0x4 perf stat -a -C 2 -e instructions --repeat 5 perf bench sched pipe 4.204444 usecs/op 4.169279 usecs/op 4.186812 usecs/op 4.217039 usecs/op 4.208538 usecs/op Performance counter stats for 'system wide' (5 runs): 14,196,762,588 instructions ( +- 0.04% ) 4.20203 +- 0.00854 seconds time elapsed ( +- 0.20% ) Of course, I don't want to waste anyones time if this is a so obvious thing that only newbies don't understand. So, feel free to just ignore this. Regards Dirk