All of lore.kernel.org
 help / color / mirror / Atom feed
* NMI received for unknown reason when running perf with IBS on AMD
@ 2023-10-16 12:00 Guilherme Amadio
  2023-10-16 21:48 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Guilherme Amadio @ 2023-10-16 12:00 UTC (permalink / raw)
  To: acme; +Cc: linux-perf-users

Hi Arnaldo,

I've been having a strange problem with perf when using IBS on an AMD
3950X processor. Whenever I use perf top or perf record with the default
event, which corresponds to cycles:P, I get these messages in dmesg output:

[443324.266243] Uhhuh. NMI received for unknown reason 3c on CPU 1.
[443324.266246] Dazed and confused, but trying to continue
[443324.290039] Uhhuh. NMI received for unknown reason 2c on CPU 9.
[443324.290042] Dazed and confused, but trying to continue
[443324.307334] Uhhuh. NMI received for unknown reason 3c on CPU 9.
[443324.307336] Dazed and confused, but trying to continue
[443324.404938] Uhhuh. NMI received for unknown reason 2c on CPU 9.
[443324.404940] Dazed and confused, but trying to continue

If I decrease the frequency I use for sampling, the messages also
decrease in frequency, but even with a low sampling frequency,
eventually they start to appear. Interestingly, if I use simply cycles
as the event, the problem does not happen. However, if I use cycles:pp
and a single CPU, it is less frequent, but does happen. Only CPUs used
for measurement show up in the error messages as well (i.e. if I
restrict to CPU 0, only CPU 0 shows in the messages above), and the more
CPUs I use, the more frequent the messages.

Please let me know how I could help to debug this problem further. The
output of perf record then perf report --header-only follows below,
along with perf version --build-options.

Best regards,
-Guilherme



$ perf report --header-only
# ========
# captured on    : Mon Oct 16 13:57:34 2023
# header version : 1
# data offset    : 664
# data size      : 884904
# feat offset    : 885568
# hostname : gentoo.cern.ch
# os release : 6.5.5-gentoo
# perf version : 6.5
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Ryzen 9 3950X 16-Core Processor
# cpuid : AuthenticAMD,23,113,0
# total memory : 32748648 kB
# cmdline : /usr/bin/perf record -a -e cycles:pp -- sleep 1 
# event : name = cycles:pp, , id = { 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824 }, type = 0 (PERF_TYPE_HARDWARE), size = 136, config = 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ID|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1
# event : name = dummy:HG, , id = { 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840 }, type = 1 (PERF_TYPE_SOFTWARE), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ID|CPU|PERIOD, read_format = ID, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, amd_df = 13, software = 1, ibs_op = 12, power = 10, ibs_fetch = 11, uprobe = 9, amd_iommu_0 = 15, breakpoint = 5, amd_l3 = 14, tracepoint = 2, kprobe = 8, msr = 16
# CACHE info available, use -I to display
# time of first sample : 444374.952454
# time of last sample : 444375.960893
# sample duration :   1008.439 ms
# cpu pmu capabilities: max_precise=0
# missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT MEM_TOPOLOGY CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA HYBRID_TOPOLOGY 
# ========
#

$ perf version --build-options
perf version 6.5
                 dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
    dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
         syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
                libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
            debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
                libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
               libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
               libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
             libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
              libslang: [ on  ]  # HAVE_SLANG_SUPPORT
             libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
             libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
    libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
                  zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
                  lzma: [ on  ]  # HAVE_LZMA_SUPPORT
             get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
                   bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
                   aio: [ on  ]  # HAVE_AIO_SUPPORT
                  zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
               libpfm4: [ on  ]  # HAVE_LIBPFM
         libtraceevent: [ on  ]  # HAVE_LIBTRACEEVENT

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-18 18:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-16 12:00 NMI received for unknown reason when running perf with IBS on AMD Guilherme Amadio
2023-10-16 21:48 ` Arnaldo Carvalho de Melo
2023-10-17  8:01   ` Ravi Bangoria
2023-10-17 12:53     ` Arnaldo Carvalho de Melo
2023-10-18  3:59       ` Ravi Bangoria
2023-10-18 13:59         ` Arnaldo Carvalho de Melo
2023-10-18 15:52           ` Ravi Bangoria
2023-10-18 18:36             ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.