* perf sampling count/freq/period strange bahavior
@ 2023-10-19 8:28 Michael Petlan
2023-10-20 6:33 ` Namhyung Kim
0 siblings, 1 reply; 4+ messages in thread
From: Michael Petlan @ 2023-10-19 8:28 UTC (permalink / raw)
To: linux-perf-users; +Cc: vmolnaro
Hello all...
We have encountered a strange behavior related to sampling. Generally
said, the number of samples is usually much lower than it should be.
Let's say, for load L, `perf stat -e instructions -- L` shows around
250,000 event hits. If I understand the concept of `-c` option for
`perf record`, with `-c 250` it should make the counter overflow and
generate a sample for every 250th event hit, so, it should result into
~1000 samples.
Instead I am getting something like ~62-150 samples, for example 124.
I have checked in dmesg that kernel has not throttled anything. When
running `perf report --stdio`, I see in the header that there were 0
samples lost (so the buffer was consumed fast enough). Additionally,
`perf report --stdio` tries to estimate the actual event count:
Event count (approx.): 31000
(Which is because 250 * 124 = 31000, so perf itself considers this
formula (n_of_samples * sampling_rate = n_of_events) valid.)
But in reality, this is not the case, `perf stat`, as said, never
shows that low number of events for the same load.
What else is involved in the sampling rates and why the number of
samples is so much lower than it should be?
Thank you for ideas!
Michael
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: perf sampling count/freq/period strange bahavior
2023-10-19 8:28 Michael Petlan
@ 2023-10-20 6:33 ` Namhyung Kim
2023-10-20 20:50 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 4+ messages in thread
From: Namhyung Kim @ 2023-10-20 6:33 UTC (permalink / raw)
To: Michael Petlan; +Cc: linux-perf-users, vmolnaro
Hello,
On Thu, Oct 19, 2023 at 1:28 AM Michael Petlan <mpetlan@redhat.com> wrote:
>
> Hello all...
>
> We have encountered a strange behavior related to sampling. Generally
> said, the number of samples is usually much lower than it should be.
>
> Let's say, for load L, `perf stat -e instructions -- L` shows around
> 250,000 event hits. If I understand the concept of `-c` option for
> `perf record`, with `-c 250` it should make the counter overflow and
> generate a sample for every 250th event hit, so, it should result into
> ~1000 samples.
>
> Instead I am getting something like ~62-150 samples, for example 124.
> I have checked in dmesg that kernel has not throttled anything. When
> running `perf report --stdio`, I see in the header that there were 0
> samples lost (so the buffer was consumed fast enough). Additionally,
> `perf report --stdio` tries to estimate the actual event count:
>
> Event count (approx.): 31000
>
> (Which is because 250 * 124 = 31000, so perf itself considers this
> formula (n_of_samples * sampling_rate = n_of_events) valid.)
>
> But in reality, this is not the case, `perf stat`, as said, never
> shows that low number of events for the same load.
>
> What else is involved in the sampling rates and why the number of
> samples is so much lower than it should be?
>
> Thank you for ideas!
I don't know but it seems PMU has some internal limit
on the period and ignores lower values.
On my Intel machine,
$ perf stat -e instructions:u true
Performance counter stats for 'true':
120,024 instructions:u
0.002002822 seconds time elapsed
0.002040000 seconds user
0.000000000 seconds sys
$ perf record -e instructions:u -c 10000 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (12 samples) ]
$ perf record -e instructions:u -c 5000 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (24 samples) ]
$ perf record -e instructions:u -c 4000 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (30 samples) ]
$ perf record -e instructions:u -c 3000 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (40 samples) ]
$ perf record -e instructions:u -c 2000 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (51 samples) ]
$ perf record -e instructions:u -c 1000 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (51 samples) ]
$ perf record -e instructions:u -c 500 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (51 samples) ]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: perf sampling count/freq/period strange bahavior
2023-10-20 6:33 ` Namhyung Kim
@ 2023-10-20 20:50 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-10-20 20:50 UTC (permalink / raw)
To: Namhyung Kim; +Cc: Michael Petlan, linux-perf-users, vmolnaro
Em Thu, Oct 19, 2023 at 11:33:19PM -0700, Namhyung Kim escreveu:
> Hello,
>
> On Thu, Oct 19, 2023 at 1:28 AM Michael Petlan <mpetlan@redhat.com> wrote:
> >
> > Hello all...
> >
> > We have encountered a strange behavior related to sampling. Generally
> > said, the number of samples is usually much lower than it should be.
> >
> > Let's say, for load L, `perf stat -e instructions -- L` shows around
> > 250,000 event hits. If I understand the concept of `-c` option for
> > `perf record`, with `-c 250` it should make the counter overflow and
> > generate a sample for every 250th event hit, so, it should result into
> > ~1000 samples.
> >
> > Instead I am getting something like ~62-150 samples, for example 124.
> > I have checked in dmesg that kernel has not throttled anything. When
> > running `perf report --stdio`, I see in the header that there were 0
> > samples lost (so the buffer was consumed fast enough). Additionally,
> > `perf report --stdio` tries to estimate the actual event count:
> >
> > Event count (approx.): 31000
> >
> > (Which is because 250 * 124 = 31000, so perf itself considers this
> > formula (n_of_samples * sampling_rate = n_of_events) valid.)
> >
> > But in reality, this is not the case, `perf stat`, as said, never
> > shows that low number of events for the same load.
> >
> > What else is involved in the sampling rates and why the number of
> > samples is so much lower than it should be?
> >
> > Thank you for ideas!
>
> I don't know but it seems PMU has some internal limit
> on the period and ignores lower values.
>
> On my Intel machine,
>
> $ perf stat -e instructions:u true
>
> Performance counter stats for 'true':
>
> 120,024 instructions:u
>
> 0.002002822 seconds time elapsed
>
> 0.002040000 seconds user
> 0.000000000 seconds sys
>
>
> $ perf record -e instructions:u -c 10000 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (12 samples) ]
>
> $ perf record -e instructions:u -c 5000 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (24 samples) ]
>
> $ perf record -e instructions:u -c 4000 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (30 samples) ]
>
> $ perf record -e instructions:u -c 3000 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (40 samples) ]
>
> $ perf record -e instructions:u -c 2000 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.003 MB perf.data (51 samples) ]
>
> $ perf record -e instructions:u -c 1000 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.003 MB perf.data (51 samples) ]
>
> $ perf record -e instructions:u -c 500 true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.003 MB perf.data (51 samples) ]
Same thing here, see below.
But if you ask for a small period you'll miss lots of events, right? The
events that take place while you're processing the bigger number of perf
interrupts.
Like lets get an interrupt per instruction: :-)
[root@five ~]# perf record -c 1 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.075 MB perf.data (1687 samples) ]
[root@five ~]#
[root@five ~]# perf stat -r5 -e instructions stress-ng --quiet --cpu 1 -t 2
Performance counter stats for 'stress-ng --quiet --cpu 1 -t 2' (5 runs):
17,313,307,689 instructions ( +- 0.17% )
2.023840 +- 0.000939 seconds time elapsed ( +- 0.05% )
[root@five ~]# perf record -c 1000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.548 MB perf.data (17228 samples) ]
[root@five ~]# perf record -c 1000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.530 MB perf.data (16627 samples) ]
[root@five ~]# perf record -c 1000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.549 MB perf.data (17264 samples) ]
[root@five ~]# perf record -c 2000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.292 MB perf.data (8839 samples) ]
[root@five ~]# perf record -c 2000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.284 MB perf.data (8590 samples) ]
[root@five ~]# perf record -c 2000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.288 MB perf.data (8694 samples) ]
[root@five ~]# perf record -c 4000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.154 MB perf.data (4313 samples) ]
[root@five ~]# perf record -c 4000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.154 MB perf.data (4319 samples) ]
[root@five ~]# perf record -c 4000000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.154 MB perf.data (4312 samples) ]
[root@five ~]# perf record -c 1000 -e instructions stress-ng --quiet --cpu 1 -t 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.090 MB perf.data (2156 samples) ]
[root@five ~]#
^ permalink raw reply [flat|nested] 4+ messages in thread
* perf sampling count/freq/period strange bahavior
@ 2023-10-23 12:18 Michael Petlan
0 siblings, 0 replies; 4+ messages in thread
From: Michael Petlan @ 2023-10-23 12:18 UTC (permalink / raw)
To: linux-perf-users
Hello all...
We have encountered a strange behavior related to sampling. Generally
said, the number of samples is usually much lower than it should be.
Let's say, for load L, `perf stat -e instructions -- L` shows around
250,000 event hits. If I understand the concept of `-c` option for
`perf record`, with `-c 250` it should make the counter overflow and
generate a sample for every 250th event hit, so, it should result into
~1000 samples.
Instead I am getting something like ~62-150 samples, for example 124.
I have checked in dmesg that kernel has not throttled anything. When
running `perf report --stdio`, I see in the header that there were 0
samples lost (so the buffer was consumed fast enough). Additionally,
`perf report --stdio` tries to estimate the actual event count:
Event count (approx.): 31000
(Which is because 250 * 124 = 31000, so perf itself considers this
formula (n_of_samples * sampling_rate = n_of_events) valid.)
But in reality, this is not the case, `perf stat`, as said, never
shows that low number of events for the same load.
What else is involved in the sampling rates and why the number of
samples is so much lower than it should be?
Thank you for ideas!
Michael
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-10-23 12:18 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-23 12:18 perf sampling count/freq/period strange bahavior Michael Petlan
-- strict thread matches above, loose matches on Subject: below --
2023-10-19 8:28 Michael Petlan
2023-10-20 6:33 ` Namhyung Kim
2023-10-20 20:50 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).