* --mmap-pages option seemingly has no effect to help with LOST samples @ 2012-06-12 20:55 Maynard Johnson 2012-06-12 21:05 ` David Ahern 0 siblings, 1 reply; 14+ messages in thread From: Maynard Johnson @ 2012-06-12 20:55 UTC (permalink / raw) To: linux-perf-users Hi, On my Intel Core 2 Duo with RHEL 6.2 with the watchdog timer disabled, I'm using perf to collect a CPI profile as follows: perf record -e cycles 100000 -e instructions -c 50000 ./memcpyt 500000000 where 'memcpyt' is a test program that simply does a LOT of memcpy's -- takes about 20 seconds of real time to complete. This fails roughly half the time with: [ perf record: Woken up 11 times to write data ] [ perf record: Captured and wrote 4.540 MB perf.data (~198348 samples) ] Processed 0 events and LOST 872662! Check IO/CPU overload! I've seen some postings on this list in the past about the LOST events and the suggestion to try the --mmap-pages option. I see from the perf source that the default number of pages to use for mmap'ing the kernel's perf_events data is '8'. I tried going up to 64 pages with little noticeable effect. Additionally, sometimes when I get the LOST samples message, I'll also see the following junk pop up in all of my terminal sessions: Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... kernel:Uhhuh. NMI received for unknown reason 00 on CPU 1. Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... kernel:Do you have a strange power saving mode enabled? Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... kernel:Dazed and confused, but trying to continue (Not sure, but these syslogd messages may have occurred only when I was running as root.) I tried decreasing my sampling rate for both events by half (200000 for cycles and 100000 for instructions), but still got LOST samples, with or without the "--mmap-pages=64" option. Decreasing sampling rate by half again finally did get rid of the LOST samples. 
Questions:
1) Why doesn't the number of mmap pages seem to have the expected beneficial effect?
2) Why doesn't the kernel's throttle capabilities prevent the LOST events in the first place?
3) What's up with the weird syslogd messages? heh.

I realize none of these may be perf userspace issues, but may be perf_events kernel issues instead. But I thought I'd start out here on this list instead of wading neck-deep into LKML land.

Thanks.
-Maynard
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-12 20:55 --mmap-pages option seemingly has no effect to help with LOST samples Maynard Johnson @ 2012-06-12 21:05 ` David Ahern 2012-06-13 15:35 ` Maynard Johnson 0 siblings, 1 reply; 14+ messages in thread From: David Ahern @ 2012-06-12 21:05 UTC (permalink / raw) To: Maynard Johnson; +Cc: linux-perf-users On 6/12/12 2:55 PM, Maynard Johnson wrote: > Hi, > On my Intel Core 2 Duo with RHEL 6.2 with the watchdog timer disabled, I'm using perf to collect a CPI profile as follows: > > perf record -e cycles 100000 -e instructions -c 50000 ./memcpyt 500000000 Confused by that command line '-e cycles 100000' is not valid. Missing a -c? If so, -c 100000 followed by -c 50000 means the interval is 50000 for both events; the second one overrides the first. > where 'memcpyt' is a test program that simply does a LOT of memcpy's -- takes about 20 seconds of real time to complete. > > This fails roughly half the time with: > > [ perf record: Woken up 11 times to write data ] > [ perf record: Captured and wrote 4.540 MB perf.data (~198348 samples) ] > Processed 0 events and LOST 872662! > > Check IO/CPU overload! > > I've seen some postings on this list in the past about the LOST events and the suggestion to try the --mmap-pages option. I see from the perf source that the default number of pages to use for mmap'ing the kernel's perf_events data is '8'. I tried going up to 64 pages with little noticeable effect. Additionally, sometimes when I get the LOST samples message, I'll also see the following junk pop up in all of my terminal sessions: > > Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... > kernel:Uhhuh. NMI received for unknown reason 00 on CPU 1. > > Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... > kernel:Do you have a strange power saving mode enabled? > > Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... 
> kernel:Dazed and confused, but trying to continue I think you are killing your box with NMIs based on the low period (-c arg). I suggest increasing the period. David > > (Not sure, but these syslogd messages may have occurred only when I was running as root.) > > I tried decreasing my sampling rate for both events by half (200000 for cycles and 100000 for instructions), but still got LOST samples, with or without the "--mmap-pages=64" option. Decreasing sampling rate by half again finally did get rid of the LOST samples. > > Questions: > 1) Why doesn't the number of mmap pages seem to have the expected beneficial effect? > 2) Why doesn't the kernel's throttle capabilities prevent the LOST events in the first place? > 3) What's up with the weird syslogd messages? heh. > > I realize none of these may be perf userspace issues, but may be perf_events kernel issues instead. But I thought I'd start out here on this list instead of wading neck-deep into LKML land. > > Thanks. > -Maynard > > -- > To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
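A rough sketch of the arithmetic behind the NMI concern above: with a fixed period (-c), a counter overflows about once every `period` events, so the per-CPU sample rate for one event is roughly the event rate divided by the period. The 2.4 GHz clock below is an assumption made only for illustration (the thread does not state the laptop's frequency), and `samples_per_sec` is a hypothetical helper, not a perf command.

```shell
#!/bin/sh
# samples_per_sec EVENTS_PER_SEC PERIOD
# Rough overflow (sample) rate for a single event with a fixed -c period.
samples_per_sec() {
    echo $(( $1 / $2 ))
}

# ASSUMPTION: 2.4 GHz clock, one cycles event per clock tick. The actual
# frequency of the Core 2 Duo discussed in this thread is not given.
CLOCK_HZ=2400000000

# Periods mentioned in the thread, from highest to lowest sampling rate.
samples_per_sec "$CLOCK_HZ" 50000
samples_per_sec "$CLOCK_HZ" 100000
samples_per_sec "$CLOCK_HZ" 200000
```

Under that assumption the cycles event alone would account for tens of thousands of interrupts per second per CPU at the lowest period, and the instructions event adds its own overflows on top of that.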
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-12 21:05 ` David Ahern @ 2012-06-13 15:35 ` Maynard Johnson 2012-06-13 15:48 ` David Ahern 0 siblings, 1 reply; 14+ messages in thread From: Maynard Johnson @ 2012-06-13 15:35 UTC (permalink / raw) To: David Ahern; +Cc: linux-perf-users On 06/12/2012 04:05 PM, David Ahern wrote: > On 6/12/12 2:55 PM, Maynard Johnson wrote: >> Hi, >> On my Intel Core 2 Duo with RHEL 6.2 with the watchdog timer >> disabled, I'm using perf to collect a CPI profile as follows: >> >> perf record -e cycles 100000 -e instructions -c 50000 ./memcpyt >> 500000000 > > Confused by that command line '-e cycles 100000' is not valid. Missing > a -c? If so, -c 100000 followed by -c 50000 means the interval is > 50000 for both events; the second one overrides the first. > Yeah, right, typo. And I guess I forgot that the "-c" option is for *all* events, not per-event. Thanks for the reminder. > >> where 'memcpyt' is a test program that simply does a LOT of memcpy's >> -- takes about 20 seconds of real time to complete. >> >> This fails roughly half the time with: >> >> [ perf record: Woken up 11 times to write data ] >> [ perf record: Captured and wrote 4.540 MB perf.data (~198348 >> samples) ] >> Processed 0 events and LOST 872662! >> >> Check IO/CPU overload! > > >> >> I've seen some postings on this list in the past about the LOST >> events and the suggestion to try the --mmap-pages option. I see from >> the perf source that the default number of pages to use for mmap'ing >> the kernel's perf_events data is '8'. I tried going up to 64 pages >> with little noticeable effect. Additionally, sometimes when I get >> the LOST samples message, I'll also see the following junk pop up in >> all of my terminal sessions: >> >> Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... >> kernel:Uhhuh. NMI received for unknown reason 00 on CPU 1. >> >> Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... 
>> kernel:Do you have a strange power saving mode enabled? >> >> Message from syslogd@oc3431575272 at Jun 12 15:21:52 ... >> kernel:Dazed and confused, but trying to continue > > I think you are killing your box with NMIs based on the low period (-c > arg). I suggest increasing the period. OK, I'll buy that, as I think I only saw these messages when using the highest sampling rate. But at the mid-level sampling rate that I used (which would have been 100,000), where I still see a lot of LOST samples . . . any thoughts on why bumping up the --mmap-pages didn't help? By the way, in digging into question #2 below, it appears kernel throttling *did* occur (seeing this in the raw report data), but probably not until after some samples were already lost. Thanks. -Maynard > > > David > > >> >> (Not sure, but these syslogd messages may have occurred only when I >> was running as root.) >> >> I tried decreasing my sampling rate for both events by half (200000 >> for cycles and 100000 for instructions), but still got LOST samples, >> with or without the "--mmap-pages=64" option. Decreasing sampling >> rate by half again finally did get rid of the LOST samples. >> >> Questions: >> 1) Why doesn't the number of mmap pages seem to have the expected >> beneficial effect? >> 2) Why doesn't the kernel's throttle capabilities prevent the LOST >> events in the first place? >> 3) What's up with the weird syslogd messages? heh. >> >> I realize none of these may be perf userspace issues, but may be >> perf_events kernel issues instead. But I thought I'd start out here >> on this list instead of wading neck-deep into LKML land. >> >> Thanks. >> -Maynard >> >> -- >> To unsubscribe from this list: send the line "unsubscribe >> linux-perf-users" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: --mmap-pages option seemingly has no effect to help with LOST samples
From: David Ahern @ 2012-06-13 15:48 UTC
To: Maynard Johnson; +Cc: linux-perf-users

On 6/13/12 9:35 AM, Maynard Johnson wrote:
>> I think you are killing your box with NMIs based on the low period (-c
>> arg). I suggest increasing the period.
> OK, I'll buy that, as I think I only saw these messages when using the
> highest sampling rate. But at the mid-level sampling rate that I used
> (which would have been 100,000), where I still see a lot of LOST samples
> . . . any thoughts on why bumping up the --mmap-pages didn't help?

The default is 128 pages = 512k of RAM per CPU. If you look at pmap $(pidof perf) you will see a 516k map per CPU. My primary box is a dual socket, quad core with HT, so I have 16 of these:

    00007f7655186000    516K rw-s-    [ anon ]

If you bump the number of pages, those segments should increase. e.g., using -m 512 I get 16 segments of 2M:

    00007f804a9dd000   2052K rw-s-    [ anon ]

This is using latest perf source, not RHEL6, but I do not recall many changes for the mapped pages.

> By the way, in digging into question #2 below, it appears kernel
> throttling *did* occur (seeing this in the raw report data), but
> probably not until after some samples were already lost.

Throttling is based on interrupt rate, so it will be independent of lost samples. Default throttling kicks in at 100k:

    $ cat /proc/sys/kernel/perf_event_max_sample_rate
    100000

For my box that is too high - I've seen the PMU reset because of too many nmis.

David
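The 516K and 2052K figures in the pmap output above are consistent with N data pages plus one extra page per mapping. A small sketch of that arithmetic, assuming 4 KiB pages and assuming the extra page is the ring buffer's control page; `mmap_kib` is a hypothetical helper name, not part of perf.

```shell
#!/bin/sh
# mmap_kib PAGES [PAGE_KIB]
# Size in KiB of one per-CPU perf mapping: PAGES data pages plus one
# extra page (ASSUMED to be the control/header page of the ring buffer).
mmap_kib() {
    pages=$1
    page_kib=${2:-4}    # ASSUMPTION: 4 KiB pages, as on x86
    echo $(( (pages + 1) * page_kib ))
}

mmap_kib 128    # default reported in the message above
mmap_kib 512    # the -m 512 case
```

With 4 KiB pages this gives 516 and 2052, matching the two pmap lines. On a machine with a different page size the second argument would need to change.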
* Re: --mmap-pages option seemingly has no effect to help with LOST samples
From: Maynard Johnson @ 2012-06-22 15:59 UTC
To: David Ahern; +Cc: linux-perf-users

On 06/13/2012 10:48 AM, David Ahern wrote:
> On 6/13/12 9:35 AM, Maynard Johnson wrote:
>>> I think you are killing your box with NMIs based on the low period (-c
>>> arg). I suggest increasing the period.
>> OK, I'll buy that, as I think I only saw these messages when using the
>> highest sampling rate. But at the mid-level sampling rate that I used
>> (which would have been 100,000), where I still see a lot of LOST samples
>> . . . any thoughts on why bumping up the --mmap-pages didn't help?
>
> The default is 128 pages = 512k of RAM per CPU. If you look at pmap $(pidof perf) you will see a 516k map per CPU. My primary box is a dual socket, quad core with HT, so I have 16 of these:
> 00007f7655186000 516K rw-s- [ anon ]
>
> If you bump the number of pages, those segments should increase. e.g., using -m 512 I get 16 segments of 2M:
> 00007f804a9dd000 2052K rw-s- [ anon ]
>
> This is using latest perf source, not RHEL6, but I do not recall many changes for the mapped pages.

Hi, David,
Finally getting back to this issue after some distractions. Thanks for pointing out my error regarding the default number of mmap pages. Switching back and forth between my laptop and an IBM POWER7 in testing perf, I got the value of '8' from the POWER7 and incorrectly assumed it would be the same on all architectures. Since the default number of mmap pages for my laptop is, as you said, 128, I re-ran the testcase as follows (using a lower sampling rate to avoid the:

    perf record -e cycles -e instructions -c 500000 -m 256 ./memcpyt 500000000

and it failed with:

    Fatal: failed to mmap with 22 (Invalid argument)

Evidently, you need to either set /proc/sys/kernel/perf_event_paranoid to '-1' or run perf as root to ask for more than the default number of mmap pages. Running the test as root *without* the "-m" option, I verified that I still see the "LOST" samples message (again, perhaps about half the time). So then I tried different values for '-m', up to 512, and still occasionally (but not as often, I think) see the "LOST" samples.

The 'perf record' tool can easily handle a sampling rate of one sample per 100,000 cycles *or* instructions (i.e., one at a time), so I would have expected it to be able to handle one sample per 500,000 events when profiling on both events. Am I missing something?

Another related issue is that the number of samples recorded varies wildly when profiling on multiple events. For example, profiling on just cycles with --count=500000, 'perf report -n' reports ~87k samples. And profiling on just instructions with the same rate, I get ~102k. When profiling with both events, I get cycles/instruction sample counts ranging from a low of 6k/7k to a high of 88k/102k. Usually, I get counts around 12k/15k. The higher the count seen with 'perf report' (i.e., the closer to true values), the more likely that perf record fails with the "LOST" samples message.

Thanks in advance for any help.
-Maynard

>> By the way, in digging into question #2 below, it appears kernel
>> throttling *did* occur (seeing this in the raw report data), but
>> probably not until after some samples were already lost.
>
> Throttling is based on interrupt rate, so it will be independent of lost samples. Default throttling kicks in at 100k:
>
> $ cat /proc/sys/kernel/perf_event_max_sample_rate
> 100000
>
> For my box that is too high - I've seen the PMU reset because of too many nmis.
>
> David
* Re: --mmap-pages option seemingly has no effect to help with LOST samples
From: David Ahern @ 2012-06-22 16:16 UTC
To: Maynard Johnson; +Cc: linux-perf-users

On 6/22/12 9:59 AM, Maynard Johnson wrote:
> Hi, David,
> Finally getting back to this issue after some distractions. Thanks for the pointing out my error regarding the default number of mmap pages. Switching back and forth between my laptop and an IBM POWER7 in testing perf, I got the value of '8' from the POWER7 and incorrectly assumed it would be the same on all architectures. Since the default number of mmap pages for my laptop is, as you said, 128, I re-ran the testcase as follows (using a lower sampling rate to avoid the:
>
> perf record -e cycles -e instructions -c 500000 -m 256 ./memcpyt 500000000
> and it failed with:
> Fatal: failed to mmap with 22 (Invalid argument)
>
> Evidently, you need to either set /proc/sys/kernel/perf_event_paranoid to '-1' or run perf as root to ask for more than the default number of mmap pages. Running the test as root *without* the "-m" option, I verified that I still see the "LOST" samples message (again, perhaps about half the time). So then I tried different values for '-m', up to 512, and still occasionally (but not as often, I think) see the "LOST" samples.

Right, I keep forgetting the perf_event_paranoid. I have a development box with that set to -1.

Try adding the -r option to perf-record. It's quite possible that you are generating events so fast that perf is not getting scheduled often enough. Perhaps it is also getting blocked on disk I/O while writing the events to file. Try putting the file into a ramfs or tmpfs (without swapping) mount.

> The 'perf record' tool can easily handle a sampling rate of one sample per 100,000 cycles *or* instructions (i.e., one at a time), so I would have expected it to be handle one sample per 500,000 events when profiling on both events? Am I missing something?

Have you tried a newer OS? Perhaps a bug/limitation in the RHEL6 implementation.

> Another related issue is the number of samples being recorded varies wildly when profiling on multiple events. For example, profiling on just cycles with --count=500000, 'perf report -n' reports ~87k samples. And profiling on just instructions with the same rate, I get ~102k. When profiling with both events, I get cycles/instruction sample counts ranging from a low of 6k/7k to a high of 88k/102k. Usually, I get counts around 12k/15k. The higher the count seen with 'perf report' (i.e., the closer to true values), the more likely that perf record fails with the "LOST" samples message.

What type of CPU? And can you send me the command so I can try it on my end? A static binary for x86 is fine if you can't/don't want to share the source. I have a Core2 laptop and servers with an E5540 (Nehalem) and E5620 (Westmere) - all of which have a variety of kernel versions available.

David
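When comparing runs across machines, it may help to note the two kernel settings that have come up in this thread so far (perf_event_paranoid and perf_event_max_sample_rate). A minimal sketch; `show_knob` is a hypothetical helper, and it simply reports a file as unavailable if the running kernel does not provide it.

```shell
#!/bin/sh
# show_knob PATH
# Print "<name> = <value>" for a readable sysctl file, or
# "<name>: not available" when the file is missing or unreadable.
show_knob() {
    name=$(basename "$1")
    if [ -r "$1" ]; then
        echo "$name = $(cat "$1")"
    else
        echo "$name: not available"
    fi
}

show_knob /proc/sys/kernel/perf_event_paranoid
show_knob /proc/sys/kernel/perf_event_max_sample_rate
```

Recording these alongside each perf.data file makes it easier to tell whether two runs differ because of the kernel or because of its configuration.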
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-22 16:16 ` David Ahern @ 2012-06-22 19:20 ` Maynard Johnson 2012-06-22 19:44 ` David Ahern 0 siblings, 1 reply; 14+ messages in thread From: Maynard Johnson @ 2012-06-22 19:20 UTC (permalink / raw) To: David Ahern; +Cc: linux-perf-users, suka On 06/22/2012 11:16 AM, David Ahern wrote: > On 6/22/12 9:59 AM, Maynard Johnson wrote: >> Hi, David, >> Finally getting back to this issue after some distractions. Thanks for the pointing out my error regarding the default number of mmap pages. Switching back and forth between my laptop and an IBM POWER7 in testing perf, I got the value of '8' from the POWER7 and incorrectly assumed it would be the same on all architectures. Since the default number of mmap pages for my laptop is, as you said, 128, I re-ran the testcase as follows (using a lower sampling rate to avoid the: >> >> perf record -e cycles -e instructions -c 500000 -m 256 ./memcpyt 500000000 >> and it failed with: >> Fatal: failed to mmap with 22 (Invalid argument) >> >> Evidently, you need to either set /proc/sys/kernel/perf_event_paranoid to '-1' or run perf as root to ask for more than the default number of mmap pages. Running the test as root *without* the "-m" option, I verified that I still see the "LOST" samples message (again, perhaps about half the time). So then I tried different values for '-m', up to 512, and still occasionally (but not as often, I think) see the "LOST" samples. > > Right, I keep forgetting the perf_event_paranoid. I have a development box with that set to -1. > > Try adding the -r option to perf-record. It's quite possible that you are generating events so fast that perf is not getting scheduled often enough. Perhaps it also be getting blocked on disk I/O writing the events to file. Try putting the file into a ramfs or tmpfs (without swapping) mount. Hi, Dave, Unfortunately, neither '-r 99' nor ramfs helped. 
> >> >> The 'perf record' tool can easily handle a sampling rate of one sample per 100,000 cycles *or* instructions (i.e., one at a time), so I would have expected it to be handle one sample per 500,000 events when profiling on both events? Am I missing something? > > Have you tried a newer OS? Perhaps a bug/limitation in the RHEL6 implementation. > >> >> Another related issue is the number of samples being recorded varies wildly when profiling on multiple events. For example, profiling on just cycles with --count=500000, 'perf report -n' reports ~87k samples. And profiling on just instructions with the same rate, I get ~102k. When profiling with both events, I get cycles/instruction sample counts ranging from a low of 6k/7k to a high of 88k/102k. Usually, I get counts around 12k/15k. The higher the count seen with 'perf report' (i.e., the closer to true values), the more likely that perf record fails with the "LOST" samples message. > > What type of CPU? And can you send me the command so I can try it on my end? static binary for x86 is fine if you can't/don't want to share the source. I have a Core2 laptop and servers with an E5540 (Nehalem) and E5620 (Westmere) - all of which have a variety of kernel versions available. I was seeing this on both my Intel Core 2 Duo and an IBM POWER7 server, both running RHEL 6.2. Also tried on another POWER7 running RHEL 6.3 beta, and got the same results. I found another POWER server that had RHEL 6.2 but was temporarily booted on a 3.5 kernel and ran the test there -- the counts were good there. :-) Just to be sure, I rebooted that system to the stock RHEL 6.2 kernel and reproduced the problem. So it seems there's an upstream fix for this. Can someone help me find the commit? Thanks. -Maynard > > David > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: --mmap-pages option seemingly has no effect to help with LOST samples
From: David Ahern @ 2012-06-22 19:44 UTC
To: Maynard Johnson; +Cc: linux-perf-users, suka

On 6/22/12 1:20 PM, Maynard Johnson wrote:
> I was seeing this on both my Intel Core 2 Duo and an IBM POWER7 server, both running RHEL 6.2. Also tried on another POWER7 running RHEL 6.3 beta, and got the same results. I found another POWER server that had RHEL 6.2 but was temporarily booted on a 3.5 kernel and ran the test there -- the counts were good there. :-) Just to be sure, I rebooted that system to the stock RHEL 6.2 kernel and reproduced the problem. So it seems there's an upstream fix for this. Can someone help me find the commit?

emails crossing in the ether.

2.6.32 is real early in the perf history. I flipped a system to the Fedora 14 2.6.35.14 kernel -- and it does not handle multiple events either. With Arnaldo's last updates I did notice something curious about the events:

    Aggregated stats:
        TOTAL events: 13165
        MMAP events: 63
        COMM events: 2
        SAMPLE events: 13100
    cycles:HG stats:
        TOTAL events: 5769
        MMAP events: 63
        COMM events: 2
        SAMPLE events: 5704
    instructions:HG stats:
        TOTAL events: 7396
        SAMPLE events: 7396

The HG is wrong -- I did not put attributes on the event. So, re-running with uk:

    $ perf record -fo /tmp/perf.data -e cycles:uk -e instructions:uk -c 100000 /tmp/loop_1b_instructions

And life turned out right:

    Aggregated stats:
        TOTAL events: 17967
        MMAP events: 63
        COMM events: 2
        EXIT events: 1
        SAMPLE events: 17901
    cycles:ku stats:
        TOTAL events: 7862
        MMAP events: 63
        COMM events: 2
        EXIT events: 1
        SAMPLE events: 7796
    instructions:ku stats:
        TOTAL events: 10105
        SAMPLE events: 10105

So, try adding :uk to your events.

David
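One way to read the two runs above is against the sample count expected from the period. The sketch below assumes that /tmp/loop_1b_instructions retires about 1 billion instructions (suggested by its name, not stated in this message); `shortfall_pct` is a hypothetical helper using truncating integer arithmetic.

```shell
#!/bin/sh
# shortfall_pct EXPECTED OBSERVED
# Percentage of the expected samples that are missing (integer, truncated).
shortfall_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# ASSUMPTION: the test program retires ~1,000,000,000 instructions,
# so a period of 100000 should give about 10000 instruction samples.
expected=$(( 1000000000 / 100000 ))

shortfall_pct "$expected" 7396     # instructions:HG run
shortfall_pct "$expected" 10105    # instructions:ku run
```

Under that assumption the :HG run is missing roughly a quarter of the instruction samples, while the :ku run is within about a percent of the expected count.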
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-22 19:44 ` David Ahern @ 2012-06-22 20:23 ` Maynard Johnson 2012-06-22 20:38 ` David Ahern 0 siblings, 1 reply; 14+ messages in thread From: Maynard Johnson @ 2012-06-22 20:23 UTC (permalink / raw) To: David Ahern; +Cc: linux-perf-users, suka On 06/22/2012 02:44 PM, David Ahern wrote: > On 6/22/12 1:20 PM, Maynard Johnson wrote: >> I was seeing this on both my Intel Core 2 Duo and an IBM POWER7 server, both running RHEL 6.2. Also tried on another POWER7 running RHEL 6.3 beta, and got the same results. I found another POWER server that had RHEL 6.2 but was temporarily booted on a 3.5 kernel and ran the test there -- the counts were good there. :-) Just to be sure, I rebooted that system to the stock RHEL 6.2 kernel and reproduced the problem. So it seems there's an upstream fix for this. Can someone help me find the commit? > > emails crossing in the ether. > > 2.6.32 is real early in the perf history. I flipped a system to the Fedora 14 2.6.35.14 kernel -- and it does not handle multiple events either. With Arnaldo's last updates I did notice something curious about the events: Yes, I realize that 2.6.32 is early perf, but a lot of our customers are stuck with it. Too bad I didn't discover this before RHEL 6.3 GA'ed. Again, I'm hoping that someone in the perf community watching this list might be able to point to an upstream commit for the problem so we can push that fix into RHEL 6.4 (at least). Is there someone you would suggest to be cc'ed? -Maynard > > Aggregated stats: > TOTAL events: 13165 > MMAP events: 63 > COMM events: 2 > SAMPLE events: 13100 > cycles:HG stats: > TOTAL events: 5769 > MMAP events: 63 > COMM events: 2 > SAMPLE events: 5704 > instructions:HG stats: > TOTAL events: 7396 > SAMPLE events: 7396 > > The HG is wrong -- I did not put attributes on the event. 
So, re-running with uk: > > $ perf record -fo /tmp/perf.data -e cycles:uk -e instructions:uk -c 100000 /tmp/loop_1b_instructions > > And life turned out right: > Aggregated stats: > TOTAL events: 17967 > MMAP events: 63 > COMM events: 2 > EXIT events: 1 > SAMPLE events: 17901 > cycles:ku stats: > TOTAL events: 7862 > MMAP events: 63 > COMM events: 2 > EXIT events: 1 > SAMPLE events: 7796 > instructions:ku stats: > TOTAL events: 10105 > SAMPLE events: 10105 > > So, try adding :uk to your events. That made no noticeable difference. > > David > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-22 20:23 ` Maynard Johnson @ 2012-06-22 20:38 ` David Ahern 2012-06-25 14:10 ` Maynard Johnson 0 siblings, 1 reply; 14+ messages in thread From: David Ahern @ 2012-06-22 20:38 UTC (permalink / raw) To: Maynard Johnson; +Cc: linux-perf-users, suka On 6/22/12 2:23 PM, Maynard Johnson wrote: > On 06/22/2012 02:44 PM, David Ahern wrote: >> On 6/22/12 1:20 PM, Maynard Johnson wrote: >>> I was seeing this on both my Intel Core 2 Duo and an IBM POWER7 server, both running RHEL 6.2. Also tried on another POWER7 running RHEL 6.3 beta, and got the same results. I found another POWER server that had RHEL 6.2 but was temporarily booted on a 3.5 kernel and ran the test there -- the counts were good there. :-) Just to be sure, I rebooted that system to the stock RHEL 6.2 kernel and reproduced the problem. So it seems there's an upstream fix for this. Can someone help me find the commit? >> >> emails crossing in the ether. >> >> 2.6.32 is real early in the perf history. I flipped a system to the Fedora 14 2.6.35.14 kernel -- and it does not handle multiple events either. With Arnaldo's last updates I did notice something curious about the events: > Yes, I realize that 2.6.32 is early perf, but a lot of our customers are stuck with it. Too bad I didn't discover this before RHEL 6.3 GA'ed. Again, I'm hoping that someone in the perf community watching this list might be able to point to an upstream commit for the problem so we can push that fix into RHEL 6.4 (at least). Is there someone you would suggest to be cc'ed? > I maintain a backport to WRL3. As part of debugging your problem I found my backport version is susceptible as well. In my testing I had only looked at single events, not multiple. For my version adding :uk did not help, but :u did. I'll find some time to do a bisect and see if I can find commit(s) that fix the problem, but it will be a background task. 
David
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-22 20:38 ` David Ahern @ 2012-06-25 14:10 ` Maynard Johnson 2012-06-26 20:16 ` David Ahern 0 siblings, 1 reply; 14+ messages in thread From: Maynard Johnson @ 2012-06-25 14:10 UTC (permalink / raw) To: David Ahern; +Cc: linux-perf-users, suka On 06/22/2012 03:38 PM, David Ahern wrote: > On 6/22/12 2:23 PM, Maynard Johnson wrote: >> On 06/22/2012 02:44 PM, David Ahern wrote: >>> On 6/22/12 1:20 PM, Maynard Johnson wrote: >>>> I was seeing this on both my Intel Core 2 Duo and an IBM POWER7 server, both running RHEL 6.2. Also tried on another POWER7 running RHEL 6.3 beta, and got the same results. I found another POWER server that had RHEL 6.2 but was temporarily booted on a 3.5 kernel and ran the test there -- the counts were good there. :-) Just to be sure, I rebooted that system to the stock RHEL 6.2 kernel and reproduced the problem. So it seems there's an upstream fix for this. Can someone help me find the commit? >>> >>> emails crossing in the ether. >>> >>> 2.6.32 is real early in the perf history. I flipped a system to the Fedora 14 2.6.35.14 kernel -- and it does not handle multiple events either. With Arnaldo's last updates I did notice something curious about the events: >> Yes, I realize that 2.6.32 is early perf, but a lot of our customers are stuck with it. Too bad I didn't discover this before RHEL 6.3 GA'ed. Again, I'm hoping that someone in the perf community watching this list might be able to point to an upstream commit for the problem so we can push that fix into RHEL 6.4 (at least). Is there someone you would suggest to be cc'ed? >> > > I maintain a backport to WRL3. As part of debugging your problem I found my backport version is susceptible as well. In my testing I had only looked at single events, not multiple. For my version adding :uk did not help, but :u did. 
I'll find some time to do a bisect and see if I can find commit(s) that fix the problem, but it will be a background task. David, Just an FYI -- adding ":u" doesn't help for RHEL 6.2. Thanks much for the help so far. -Maynard > > David > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: --mmap-pages option seemingly has no effect to help with LOST samples
From: David Ahern @ 2012-06-26 20:16 UTC
To: Maynard Johnson; +Cc: linux-perf-users, suka

On 6/25/12 8:10 AM, Maynard Johnson wrote:
> Just an FYI -- adding ":u" doesn't help for RHEL 6.2. Thanks much for the help so far.

What kind of processor are you using and what is the output of:

    dmesg |grep Perf

David
* Re: --mmap-pages option seemingly has no effect to help with LOST samples 2012-06-26 20:16 ` David Ahern @ 2012-06-26 20:32 ` Maynard Johnson 0 siblings, 0 replies; 14+ messages in thread From: Maynard Johnson @ 2012-06-26 20:32 UTC (permalink / raw) To: David Ahern; +Cc: linux-perf-users, suka On 06/26/2012 03:16 PM, David Ahern wrote: > On 6/25/12 8:10 AM, Maynard Johnson wrote: > >> Just an FYI -- adding ":u" doesn't help for RHEL 6.2. Thanks much for the help so far. >> > > What kind of processor are you using and what is the output of: > dmesg |grep Perf As I mentioned, I'm seeing this symptom (incorrect counts) on both my Intel Core 2 Duo laptop and on an IBM POWER7 server. On the POWER7, 'dmesg | grep Perf' shows nothing. On my laptop, I get "Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver." -Maynard > > David > -- > To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: --mmap-pages option seemingly has no effect to help with LOST samples
  2012-06-22 15:59 ` Maynard Johnson
  2012-06-22 16:16 ` David Ahern
@ 2012-06-22 19:23 ` David Ahern
  1 sibling, 0 replies; 14+ messages in thread
From: David Ahern @ 2012-06-22 19:23 UTC (permalink / raw)
To: Maynard Johnson; +Cc: linux-perf-users

On 6/22/12 9:59 AM, Maynard Johnson wrote:
> The 'perf record' tool can easily handle a sampling rate of one sample per 100,000 cycles *or* instructions (i.e., one at a time), so I would have expected it to be able to handle one sample per 500,000 events when profiling on both events? Am I missing something?
>
> Another related issue is the number of samples being recorded varies wildly when profiling on multiple events. For example, profiling on just cycles with --count=500000, 'perf report -n' reports ~87k samples. And profiling on just instructions with the same rate, I get ~102k. When profiling with both events, I get cycles/instruction sample counts ranging from a low of 6k/7k to a high of 88k/102k. Usually, I get counts around 12k/15k. The higher the count seen with 'perf report' (i.e., the closer to true values), the more likely that perf record fails with the "LOST" samples message.

It might well be a RHEL6 problem.

Host OS is 3.4.0
perf version 3.5.rc1.87.gcb9dd4.dirty

Using Peter Z's 1 billion instruction loop, first up is instructions sampling:

$ perf record -fo /tmp/perf.data -e instructions -c 100000 ./loop_1b_instructions

$ perf report --stdio -i /tmp/perf.data -D | tail -12
Aggregated stats:
           TOTAL events:      10161
            MMAP events:        106
            COMM events:          2
            EXIT events:          2
          SAMPLE events:      10051
instructions stats:
           TOTAL events:      10161
            MMAP events:        106
            COMM events:          2
            EXIT events:          2
          SAMPLE events:      10051

1 billion instructions sampled every 100,000 = ~10,000 samples, which corresponds to the above (ish).
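As a quick sanity check of that estimate, here is a minimal Python sketch. It is not part of the original mail, and the function name expected_samples is made up for illustration. It assumes the workload retires exactly 1 billion instructions and ignores process startup and teardown, which is presumably where the handful of extra samples comes from.

```python
def expected_samples(total_events: int, sample_period: int) -> int:
    """Number of counter overflows when a sample is taken every sample_period events."""
    return total_events // sample_period


# Assumed workload size (1 billion instructions) and the period passed via -c.
estimate = expected_samples(1_000_000_000, 100_000)

# SAMPLE events reported by 'perf report -D' in the run above.
observed = 10_051

print(estimate)                                   # -> 10000
print(f"{(observed - estimate) / estimate:.2%}")  # -> 0.51%
```

The observed count is within about half a percent of the estimate, which is the "(ish)" agreement noted above.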
And now looking at cycles:

$ perf record -fo /tmp/perf.data -e cycles -c 100000 ./loop_1b_instructions

$ perf report --stdio -i /tmp/perf.data -D | tail -12
Aggregated stats:
           TOTAL events:       7961
            MMAP events:        106
            COMM events:          2
            EXIT events:          2
          SAMPLE events:       7851
cycles stats:
           TOTAL events:       7961
            MMAP events:        106
            COMM events:          2
            EXIT events:          2
          SAMPLE events:       7851

And both events combined:

$ perf record -fo /tmp/perf.data -e cycles -e instructions -c 100000 ./loop_1b_instructions

$ /tmp/pbuild/perf report --stdio -i /tmp/perf.data -D | tail -15
Aggregated stats:
           TOTAL events:      18078
            MMAP events:        106
            COMM events:          2
            EXIT events:          2
          SAMPLE events:      17968
cycles stats:
           TOTAL events:       7900
            MMAP events:          4
            COMM events:          1
            EXIT events:          2
          SAMPLE events:       7893
instructions stats:
           TOTAL events:      10075
          SAMPLE events:      10075

So we get ~10,000 samples due to the instructions event and ~7800 for the cycles event.

David

^ permalink raw reply [flat|nested] 14+ messages in thread