From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E7A923DE for ; Sun, 16 Jun 2024 03:56:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718510177; cv=none; b=MFMKMQBRQfL+0DUmYV6sO9jfYinKEcWCDscN+t3pOizoqfiLzVXI+vbeVafbpLUusGMS40VLCYeuKZTN6DUAs0xhNoKXt+PqstgjpHz8MyzWeqdZSENNJi+7Ygq/2s+QapCndNunlBaDXKS+MzqMVgTBz+Y3QeSEjC0xdek4OME= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718510177; c=relaxed/simple; bh=qiRQdMrDaZnTKsyPqjkqTIwelaI2cKFQQ2ak3bzDeuE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Jqlvg2f/+F/d2GlGxmen49EFsOB1FhqnV3kYsKhBIWEd2P+bwiFWDb5gXFerGLg0+yenYxDuWjAMIedZEEijn+RoyxR1aDclGr5+aexyE1Iqfty7EIo3r/+Oq+KE8No0WjJbT/ShtUX32baT1zl7jVF1AvRkVUYmv5Ah/uwirv0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qJJg0+TZ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qJJg0+TZ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5BB06C2BBFC; Sun, 16 Jun 2024 03:56:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1718510176; bh=qiRQdMrDaZnTKsyPqjkqTIwelaI2cKFQQ2ak3bzDeuE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=qJJg0+TZ9Nlymq6zzY+R92tY0Bs/QE6rcQotZlDOloU6rwqwTFiB3ZQHfqpVWugNx BgKn0g+NxUQaacSj5F7IQ/XeC7y3B01CCViT/enRUPfQQbUcGuYadSlOD/u06hChmI bzTauY1BLMxv2aVkuXlJaqzQuR+37CV5cqUVn2O/P1Y+46QZtnukBiAkNrTw9kpSYK AVRyDlz8FFy/D+q6bpQsI24r0sora/UYMVWh47jH0cDX+Z50a1ZQkGYJptCBSMwvB0 DWW+yqF5gOBw8CFEWje6jqFF76edsHWVJbd+kI7mG6G8FLCZRmC3WPlwITYaeFmn/e kakBHJU4I0+hw== Date: Sat, 15 Jun 2024 20:56:14 -0700 From: Namhyung Kim To: Michael Petlan Cc: vmolnaro@redhat.com, linux-perf-users@vger.kernel.org, acme@kernel.org, acme@redhat.com Subject: Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs Message-ID: References: <20240604153111.105548-1-vmolnaro@redhat.com> Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Hello, On Thu, Jun 13, 2024 at 04:20:58PM +0200, Michael Petlan wrote: > On Tue, 4 Jun 2024, Namhyung Kim wrote: > > On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote: > > > From: Veronika Molnarova > > > > > > The test has been failing for some time when two separate runs of > > > perf benchmarks are recorded and the counts of the samples are compared, > > > while once the recording was done with option --bpf-counters and once > > > without it. It is expected that the count of the samples should within > > > a certain range, firstly the difference should have been within 10%, > > > which was then later raised to 20%. However, the test case keeps failing > > > on certain architectures as recording the same benchmark can provide > > > completely different counts samples based on the current load of the > > > system. > > > > > > Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat > > > --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t": > > > > > > Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t': > > > > > > 396782898 cycles > > > > > > 0.010051983 seconds time elapsed > > > > > > 0.008664000 seconds user > > > 0.097058000 seconds sys > > > > > > Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t': > > > > > > 1431133032 cycles > > > > > > 0.021803714 seconds time elapsed > > > > > > 0.023377000 seconds user > > > 0.349918000 seconds sys > > > > > > , which is ranging from 400mil to 1400mil samples. > > > > > > From the testing point of view, it does not make sense to compare two > > > separate runs against each other when the conditions may change > > > significantly. Remove the comparison of two separate runs and check only > > > whether the stating works as expected for the --bpf-counters option. Compare > > > the samples count only when the samples are recorded simultaneously > > > ensuring the same conditions. > > > > Hmm.. but having a test which checks if the output is sane can be > > useful. If it's a problem of dynamic changes in cpu cycles, maybe > > we can use 'instructions' event instead (probably with :u) to get > > more stable values? > > Hello. > > As far as I understand it, nowadays, the test checks two things: > > test_bpf_counters() > record $workload twice (with and without --bpf-counters) > check that there are numeric results > compare the results > > test_bpf_modifier() > record $workload once with and without modifier (which should be what > --bpf-counters switch does to the events, right?) > check that there are numeric results > compare the results > > The problem here is not only the "dynamic changes in cpu-cycles", it > is rather in the testcase design itself. A testcase that compares two > metrics should get rid of all possible variable effects that influence it. > > The second function actually compares the values correctly, since they > are measured against the same identical workload. > > So, in my opinion, a better test design would be: > > (1) check that record without --bpf-counters works as a reference run Do you mean by `perf stat record`? Or saving the output of `perf stat` in a text file? > (2) check that record with --bpf-counters works too > (if not, we may compare to (1) to find out if whole `record` is broken > or just the --bpf-counters option) Ditto. > (3) possibly run `perf evlist` to check that --bpf-counters has added the > '/b' modifier The `perf evlist` would work only for `perf stat record`. Then I think it's a completely new test case. > (4) check with versus without "b", such as current test_bpf_modifier() > function does > > I like what Veronika suggests, it is basically the above, except of (3). > > ... > > In case we want preserve two separate runs in test_bpf_counters() _and_ > also check the numbers, then we should: > - use some more predictable workload: > - in an ideal case a statically linked simple binary > - in less-than-ideal case `perf test -w something` > - use instructions instead of cycles > However, I don't like that idea very much, because of the design principles > mentioned above. I understand it's not ideal.. maybe it's fine not to check the numbers here because we do that in the test_bpf_modifier. But it'd be nice to have that. Let's try with instructions event first, change it later if it doesn't go well. Thanks, Namhyung