All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@redhat.com>
To: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Perf loses events without reporting
Date: Thu, 1 Jul 2021 12:37:18 +0200	[thread overview]
Message-ID: <YN2a3pSvealjK6uR@krava> (raw)
In-Reply-To: <87lf6rclcm.fsf@stepbren-lnx.us.oracle.com>

On Wed, Jun 30, 2021 at 10:29:13AM -0700, Stephen Brennan wrote:
> Hi all,
> 
> I've been trying to understand the behavior of the x86_64 performance
> monitoring interrupt, specifically when IRQ is disabled. Since it's an
> NMI, it should still trigger and record events. However, I've noticed
> that when interrupts are disabled for a long time, events seem to be
> silently dropped, and I'm wondering if this is expected behavior.
> 
> To test this, I created a simple kernel module "irqoff" which creates a
> file /proc/irqoff_sleep_millis. On write, the module uses
> "spin_lock_irq()" to disable interrupts, and then issues an mdelay()
> call for whatever number of milliseconds was written. This allows us to
> busy wait with IRQ disabled. (Source for the module at the end of this
> email).
> 
> When I use perf to record a write to this file, we see the following:
> 
> $ sudo perf record -e cycles -c 100000 -- sh -c 'echo 2000 > /proc/irqoff_sleep_millis'

seems strange.. I'll check

could you see that also when monitoring the cpu? like:

  $ sudo perf record -e cycles -c 100000 -C 1 -- taskset -c 1 sh -c ..

jirka

> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.030 MB perf.data (58 samples) ]
> 
> $ sudo perf script
> # ... filtered down:
>               sh 62863 52318.991716:     100000 cycles:  ffffffff8a8237a9 delay_tsc+0x39 ([kernel.kallsyms])
>               sh 62863 52318.991740:     100000 cycles:  ffffffff8a823797 delay_tsc+0x27 ([kernel.kallsyms])
>               sh 62863 52318.991765:     100000 cycles:  ffffffff8a823797 delay_tsc+0x27 ([kernel.kallsyms])
> #                      ^ v   ~2 second gap!
>               sh 62863 52320.963900:     100000 cycles:  ffffffff8ae47417 _raw_spin_lock_irqsave+0x27 ([kernel.kallsyms])
>               sh 62863 52320.963923:     100000 cycles:  ffffffff8ae47417 _raw_spin_lock_irqsave+0x27 ([kernel.kallsyms])
>               sh 62863 52320.963948:     100000 cycles:  ffffffff8ab1db9a handle_tx_event+0x2da ([kernel.kallsyms])
> 
> The perf stat shows the following counters over a similar run:
> 
> $ sudo perf stat -- sh -c 'echo 2000 > /proc/irqoff_sleep_millis'
> 
>  Performance counter stats for 'sh -c echo 2000 > /proc/irqoff_sleep_millis':
> 
>           1,975.55 msec task-clock                #    0.999 CPUs utilized
>                  1      context-switches          #    0.001 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 61      page-faults               #    0.031 K/sec
>      7,952,267,470      cycles                    #    4.025 GHz
>        541,904,608      instructions              #    0.07  insn per cycle
>         83,406,021      branches                  #   42.219 M/sec
>             10,365      branch-misses             #    0.01% of all branches
> 
>        1.977234595 seconds time elapsed
> 
>        0.000000000 seconds user
>        1.977162000 seconds sys
> 
> According to this, we should see roughly 79k samples (7.9 billion cycles
> / 100k sample period), but perf only gets 58. What it "looks like" to
> me, is that the CPU ring buffer might run out of space after several
> events, and the perf process doesn't get scheduled soon enough to read
> the data? But in my experience, perf usually reports that it missed some
> events. So I wonder if anybody is familiar with the factors at play for
> when IRQ is disabled during a PMI? I'd appreciate any pointers to guide
> my exploration.
> 
> My test case here ran on Ubuntu distro kernel 5.11.0-22-generic, and I
> have also tested on a 5.4 based kernel. I'm happy to reproduce this on a
> mainline kernel too.
> 
> Thanks,
> Stephen
> 
> Makefile:
> <<<
> obj-m += irqoff.o
> 
> all:
> 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
> 
> clean:
> 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
> >>>
> 
> irqoff.c:
> <<<
> #include <linux/module.h>
> #include <linux/kernel.h>
> #include <linux/init.h>
> #include <linux/proc_fs.h>
> #include <linux/spinlock.h>
> #include <linux/delay.h>
> #include <linux/version.h>
> #include <linux/uaccess.h>
> 
> MODULE_LICENSE("GPL");
> MODULE_DESCRIPTION("Test module that allows to disable IRQ for configurable time");
> MODULE_AUTHOR("Stephen Brennan <stephen.s.brennan@oracle.com>");
> 
> 
> // Store the proc dir entry we can use to check status
> struct proc_dir_entry *pde = NULL;
> 
> DEFINE_SPINLOCK(irqoff_lock);
> 
> 
> static noinline void irqsoff_inirq_delay(unsigned long millis)
> {
> 	mdelay(millis);
> }
> 
> 
> static ssize_t irqsoff_write(struct file *f, const char __user *data, size_t amt, loff_t *off)
> {
> 	char buf[32];
> 	int rv;
> 	unsigned long usecs = 0;
> 
> 	if (amt > sizeof(buf) - 1)
> 		return -EFBIG;
> 
> 	if ((rv = copy_from_user(buf, data, amt)) != 0)
> 		return -EFAULT;
> 
> 	buf[amt] = '\0';
> 
> 	if (sscanf(buf, "%lu", &usecs) != 1)
> 		return -EINVAL;
> 
> 	/* We read number of milliseconds, but will convert to microseconds.
> 	   Threshold it at 5 minutes for safety. */
> 	if (usecs > 5 * 60 * 1000)
> 		return -EINVAL;
> 
> 	pr_info("[irqoff] lock for %lu millis\n", usecs);
> 	spin_lock_irq(&irqoff_lock);
> 	irqsoff_inirq_delay(usecs);
> 	spin_unlock_irq(&irqoff_lock);
> 
> 	return amt;
> }
> 
> static ssize_t irqsoff_read(struct file *f, char __user *data, size_t amt, loff_t *off)
> {
> 	return 0;
> }
> 
> #if LINUX_VERSION_CODE < KERNEL_VERSION(5,6,0)
> static const struct file_operations irqsoff_fops = {
> 	.owner                  = THIS_MODULE,
> 	.read                   = irqsoff_read,
> 	.write                  = irqsoff_write,
> };
> #else
> static const struct proc_ops irqsoff_fops = {
> 	.proc_read		= irqsoff_read,
> 	.proc_write		= irqsoff_write,
> };
> #endif
> 
> static int irqoff_init(void)
> {
> 	pde = proc_create("irqoff_sleep_millis", 0644, NULL, &irqsoff_fops);
> 	if (!pde)
> 		return -ENOENT;
> 
> 	pr_info("[irqoff] successfully initialized\n");
> 	return 0;
> }
> 
> static void irqoff_exit(void)
> {
> 	proc_remove(pde);
> 	pde = NULL;
> }
> 
> module_init(irqoff_init);
> module_exit(irqoff_exit);
> >>>
> 


  reply	other threads:[~2021-07-01 10:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-30 17:29 Perf loses events without reporting Stephen Brennan
2021-07-01 10:37 ` Jiri Olsa [this message]
2021-07-01 16:45   ` Stephen Brennan
2021-07-01 17:05     ` Stephen Brennan
2021-07-01 18:00       ` Stephen Brennan
2021-07-02 16:22         ` Jiri Olsa
2021-07-02 20:12           ` Jiri Olsa
2021-07-02 22:48             ` Stephen Brennan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YN2a3pSvealjK6uR@krava \
    --to=jolsa@redhat.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=stephen.s.brennan@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.