* NMI received for unknown reason when running perf with IBS on AMD
From: Guilherme Amadio @ 2023-10-16 12:00 UTC
  To: acme; +Cc: linux-perf-users

Hi Arnaldo,

I've been having a strange problem with perf when using IBS on an AMD
3950X processor. Whenever I use perf top or perf record with the default
event, which corresponds to cycles:P, I get these messages in dmesg output:

[443324.266243] Uhhuh. NMI received for unknown reason 3c on CPU 1.
[443324.266246] Dazed and confused, but trying to continue
[443324.290039] Uhhuh. NMI received for unknown reason 2c on CPU 9.
[443324.290042] Dazed and confused, but trying to continue
[443324.307334] Uhhuh. NMI received for unknown reason 3c on CPU 9.
[443324.307336] Dazed and confused, but trying to continue
[443324.404938] Uhhuh. NMI received for unknown reason 2c on CPU 9.
[443324.404940] Dazed and confused, but trying to continue

If I decrease the sampling frequency, the messages become less
frequent as well, but even at a low sampling frequency they eventually
start to appear. Interestingly, if I simply use cycles as the event,
the problem does not happen. However, if I use cycles:pp on a single
CPU, the problem is less frequent but still happens. Only CPUs used
for measurement show up in the error messages (i.e. if I restrict to
CPU 0, only CPU 0 shows in the messages above), and the more CPUs I
use, the more frequent the messages.
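
To compare how often the messages appear across events, sampling
frequencies, and CPU sets, the dmesg lines can be tallied per CPU and
per reason code. The following is only a minimal sketch: the function
name tally_unknown_nmis and the SAMPLE text are illustrative
assumptions (SAMPLE copies the four messages quoted above), and the
regular expression assumes the message format shown in that excerpt.

```python
import re
from collections import Counter

# Assumes the message format seen in the dmesg excerpt above:
# "NMI received for unknown reason <hex> on CPU <n>."
NMI_RE = re.compile(r"NMI received for unknown reason ([0-9a-f]+) on CPU (\d+)")


def tally_unknown_nmis(lines):
    """Count unknown-NMI messages per CPU and per reason code."""
    per_cpu = Counter()
    per_reason = Counter()
    for line in lines:
        match = NMI_RE.search(line)
        if match:
            per_reason[match.group(1)] += 1
            per_cpu[int(match.group(2))] += 1
    return per_cpu, per_reason


# Illustrative input: the four messages from the report above.
SAMPLE = """\
[443324.266243] Uhhuh. NMI received for unknown reason 3c on CPU 1.
[443324.266246] Dazed and confused, but trying to continue
[443324.290039] Uhhuh. NMI received for unknown reason 2c on CPU 9.
[443324.290042] Dazed and confused, but trying to continue
[443324.307334] Uhhuh. NMI received for unknown reason 3c on CPU 9.
[443324.307336] Dazed and confused, but trying to continue
[443324.404938] Uhhuh. NMI received for unknown reason 2c on CPU 9.
[443324.404940] Dazed and confused, but trying to continue
"""

per_cpu, per_reason = tally_unknown_nmis(SAMPLE.splitlines())
for cpu, count in sorted(per_cpu.items()):
    print(f"CPU {cpu}: {count} unknown NMI(s)")
for reason, count in sorted(per_reason.items()):
    print(f"reason {reason}: {count}")
```

Passing the lines of a saved dmesg capture (for example, an open file
object) instead of SAMPLE.splitlines() gives the same tally for a real
run, which makes it easier to compare cycles, cycles:pp, and cycles:P.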

Please let me know how I can help debug this problem further. The
output of perf report --header-only for a perf record run follows
below, along with perf version --build-options.

Best regards,
-Guilherme



$ perf report --header-only
# ========
# captured on    : Mon Oct 16 13:57:34 2023
# header version : 1
# data offset    : 664
# data size      : 884904
# feat offset    : 885568
# hostname : gentoo.cern.ch
# os release : 6.5.5-gentoo
# perf version : 6.5
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Ryzen 9 3950X 16-Core Processor
# cpuid : AuthenticAMD,23,113,0
# total memory : 32748648 kB
# cmdline : /usr/bin/perf record -a -e cycles:pp -- sleep 1 
# event : name = cycles:pp, , id = { 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824 }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ID|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1
# event : name = dummy:HG, , id = { 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840 }, type = 1 (PERF_TYPE_SOFTWARE), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ID|CPU|PERIOD, read_format = ID, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, amd_df = 13, software = 1, ibs_op = 12, power = 10, ibs_fetch = 11, uprobe = 9, amd_iommu_0 = 15, breakpoint = 5, amd_l3 = 14, tracepoint = 2, kprobe = 8, msr = 16
# CACHE info available, use -I to display
# time of first sample : 444374.952454
# time of last sample : 444375.960893
# sample duration :   1008.439 ms
# cpu pmu capabilities: max_precise=0
# missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT MEM_TOPOLOGY CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA HYBRID_TOPOLOGY 
# ========
#

$ perf version --build-options
perf version 6.5
                 dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
    dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
         syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
                libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
            debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
                libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
               libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
               libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
             libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
              libslang: [ on  ]  # HAVE_SLANG_SUPPORT
             libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
             libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
    libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
                  zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
                  lzma: [ on  ]  # HAVE_LZMA_SUPPORT
             get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
                   bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
                   aio: [ on  ]  # HAVE_AIO_SUPPORT
                  zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
               libpfm4: [ on  ]  # HAVE_LIBPFM
         libtraceevent: [ on  ]  # HAVE_LIBTRACEEVENT


Thread overview: 8+ messages
2023-10-16 12:00 NMI received for unknown reason when running perf with IBS on AMD Guilherme Amadio
2023-10-16 21:48 ` Arnaldo Carvalho de Melo
2023-10-17  8:01   ` Ravi Bangoria
2023-10-17 12:53     ` Arnaldo Carvalho de Melo
2023-10-18  3:59       ` Ravi Bangoria
2023-10-18 13:59         ` Arnaldo Carvalho de Melo
2023-10-18 15:52           ` Ravi Bangoria
2023-10-18 18:36             ` Arnaldo Carvalho de Melo
