linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
@ 2024-11-23 20:49 Mikołaj Kołek
  2024-11-23 21:29 ` Alejandro Colomar
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mikołaj Kołek @ 2024-11-23 20:49 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 2162 bytes --]

I have found that when monitoring a file descriptor returned by
perf_event_open() with poll(), it is required to allocate an mmap ring
buffer to properly receive overflow notifications. If this is not
done, poll() keeps continuously returning POLLHUP, even when an
overflow notification should not be raised. Notably, this behavior is
different from listening for overflow notifications by setting the
O_ASYNC flag on the file descriptor - in that case, creating the mmap
ring buffer is not required to receive the SIGIO signal delivered
after the file descriptor becomes available for reading. I attach code
showcasing this behavior (the functionality is explained in the
comments).

This behavior by itself is not a problem, however, in the current
state of the perf_event_open man page, it's not documented, and in
fact, there are confusing statements that seem to contradict my
findings. In the MMAP layout section of the page, you can find this
sentence:
Before Linux 2.6.39, there is a bug that means you must allocate
an mmap ring buffer when sampling even if you do not plan to
access it.
Unless I'm somehow misunderstanding it, this statement does not seem
to be well worded, or alternatively this bug does not seem to be
fixed. I would not call simply using poll() on the file descriptor
intent to access the ring buffer (unless it's meant to be understood
that way, in which case, in my opinion, it's quite confusing).
Additionally, I cannot find any change in Linux 2.6.39 that would fit
this description (however, that is likely just due to my lack of
experience searching through the kernel changelogs and commits).

I would like to receive clarification on whether this current behavior
of perf_event_open is intentional and desired (that is why I cc'd
linux-perf-users). If it is, I could also create a patch to the man
page that lays out the requirements more clearly. In that case, it
would also be helpful to further clarify the wording of the sentence
mentioning the Linux 2.6.39 change, however I don't know if I'm
qualified to do that, because as I have previously stated, I am unable
to find what changes that sentence actually refers to.

[-- Attachment #2: mmap_demo.cpp --]
[-- Type: text/x-c++src, Size: 3351 bytes --]

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <iostream>
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
#include <cstdint>
#include <poll.h>

// Modify the value of this constant to change the variant of the program
// that is run. The possible values are:
// 1: SIGIO without mmap, 2: SIGIO with mmap, 
// 3: poll without mmap, 4: poll with mmap
// As stated in the email, varaints 1, 2 and 4 properly trigger overflow
// notifications approximately after each 1000000000 hardware instructions,
// however when the program is run with variant = 3, poll will just 
// continuously return POLLHUP, without waiting for the overflow
// 
// Also, before running any variant, make sure to set the 
// kernel.perf_event_paranoid sysctl to -1 
// (for example by running sudo sysctl kernel.perf_event_paranoid=-1)
const int variant = 1;

static long perf_event_open(struct perf_event_attr *hw_event, pid_t
	pid, int cpu, int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

volatile sig_atomic_t sigioOccurred = 0;
void sigioHandler(int signum) {
    sigioOccurred = 1;
}

uint64_t get_instructions_used(int perf_fd) {
    uint64_t result;
    ssize_t size = read(perf_fd, &result, sizeof(uint64_t));

    if (size != sizeof(result)) {
        std::cout << "read failed";
        exit(0);
    }
    if (result < 0) {
        std::cout << "read negative instructions count";
        exit(0);
    }

    return result;
}

int main() {
    struct sigaction sa;
    sa.sa_handler = sigioHandler; sa.sa_flags = 0; sigemptyset(&sa.sa_mask);
    sigaction(SIGIO, &sa, 0);

    int child = fork(), num = 2;
    if(child == 0) {
        while(true) {
            num *= 2;
        }
    }

    struct perf_event_attr attrs {}; attrs.config = PERF_COUNT_HW_INSTRUCTIONS; 
    attrs.type = PERF_TYPE_HARDWARE; attrs.sample_period = 1000000000; 
	attrs.wakeup_events = 1;
    int perf_fd = perf_event_open(&attrs, child, -1, -1, 0);

    if(variant == 2 or variant == 4) {
        void *base = mmap(NULL, getpagesize() * (8192 + 1), PROT_READ
			| PROT_WRITE, MAP_SHARED, perf_fd, 0);
		
        if (base == MAP_FAILED) {
            std::cout << "mmap err " << errno << "\n";
            return -1;
        }
    }

    if(variant == 1 or variant == 2) {
        fcntl(perf_fd, F_SETOWN, getpid());
        fcntl(perf_fd, F_SETFL, (fcntl(perf_fd, F_GETFL, 0) | O_ASYNC));
    }

    while(true) {
        if(variant == 1 or variant == 2) {
            if(sigioOccurred) {
                std::cout << "SIGIO delivered, instructions used: " <<
					get_instructions_used(perf_fd) << "\n";
				
                sigioOccurred = 0;
            }
        }

        if(variant == 3 or variant == 4) {
            struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
            int res = poll(&pfd, 1, 1000000);

            std::cout << "Poll returned ";
            if(pfd.revents == POLLHUP)
                std::cout << "POLLHUP, instructions used: " << 
					get_instructions_used(perf_fd) << "\n";
            else if(pfd.revents == POLLIN)
                std::cout << "POLLIN, instructions used: " <<
					get_instructions_used(perf_fd) << "\n";
            else
                std::cout << pfd.revents << "\n";
        }
    }

    return 0;
}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-11-23 20:49 perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications Mikołaj Kołek
@ 2024-11-23 21:29 ` Alejandro Colomar
  2024-11-26  5:42 ` Namhyung Kim
  2024-11-26 22:13 ` Vince Weaver
  2 siblings, 0 replies; 8+ messages in thread
From: Alejandro Colomar @ 2024-11-23 21:29 UTC (permalink / raw)
  To: Mikołaj Kołek; +Cc: linux-man, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 2575 bytes --]

Hi Mikołaj,

On Sat, Nov 23, 2024 at 09:49:40PM +0100, Mikołaj Kołek wrote:
> I have found that when monitoring a file descriptor returned by
> perf_event_open() with poll(), it is required to allocate an mmap ring
> buffer to properly receive overflow notifications. If this is not
> done, poll() keeps continuously returning POLLHUP, even when an
> overflow notification should not be raised. Notably, this behavior is
> different from listening for overflow notifications by setting the
> O_ASYNC flag on the file descriptor - in that case, creating the mmap
> ring buffer is not required to receive the SIGIO signal delivered
> after the file descriptor becomes available for reading. I attach code
> showcasing this behavior (the functionality is explained in the
> comments).
> 
> This behavior by itself is not a problem, however, in the current
> state of the perf_event_open man page, it's not documented, and in
> fact, there are confusing statements that seem to contradict my
> findings. In the MMAP layout section of the page, you can find this
> sentence:
> Before Linux 2.6.39, there is a bug that means you must allocate
> an mmap ring buffer when sampling even if you do not plan to
> access it.
> Unless I'm somehow misunderstanding it, this statement does not seem
> to be well worded, or alternatively this bug does not seem to be
> fixed. I would not call simply using poll() on the file descriptor
> intent to access the ring buffer (unless it's meant to be understood
> that way, in which case, in my opinion, it's quite confusing).
> Additionally, I cannot find any change in Linux 2.6.39 that would fit
> this description (however, that is likely just due to my lack of
> experience searching through the kernel changelogs and commits).
> 
> I would like to receive clarification on whether this current behavior
> of perf_event_open is intentional and desired (that is why I cc'd
> linux-perf-users). If it is, I could also create a patch to the man
> page that lays out the requirements more clearly. In that case, it
> would also be helpful to further clarify the wording of the sentence
> mentioning the Linux 2.6.39 change, however I don't know if I'm
> qualified to do that, because as I have previously stated, I am unable
> to find what changes that sentence actually refers to.

I don't know.  Kernel maintainers may be able to respond much better
than me.  I see you've CCed their list, so they'll hopefully answer.

Have a lovely night!
Alex



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-11-23 20:49 perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications Mikołaj Kołek
  2024-11-23 21:29 ` Alejandro Colomar
@ 2024-11-26  5:42 ` Namhyung Kim
  2024-11-26 10:15   ` Alejandro Colomar
  2024-11-26 22:13 ` Vince Weaver
  2 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2024-11-26  5:42 UTC (permalink / raw)
  To: Mikołaj Kołek; +Cc: Alejandro Colomar, linux-man, linux-perf-users

Hello,

On Sat, Nov 23, 2024 at 09:49:40PM +0100, Mikołaj Kołek wrote:
> I have found that when monitoring a file descriptor returned by
> perf_event_open() with poll(), it is required to allocate an mmap ring
> buffer to properly receive overflow notifications. If this is not
> done, poll() keeps continuously returning POLLHUP, even when an
> overflow notification should not be raised. Notably, this behavior is
> different from listening for overflow notifications by setting the
> O_ASYNC flag on the file descriptor - in that case, creating the mmap
> ring buffer is not required to receive the SIGIO signal delivered
> after the file descriptor becomes available for reading. I attach code
> showcasing this behavior (the functionality is explained in the
> comments).

Thanks for the report and the test code.  I agree that the current man
page is a little confusing about the overflow notification.  I can see
the following sentences in the "overflow handling" section.

  There are two ways to generate overflow notifications.

  The  first is to set a wakeup_events or wakeup_watermark value that
  will trigger if a certain number of samples or bytes have been
  written to the mmap ring buffer.  In this case, POLL_IN is indicated.

  The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl.  This
  ioctl adds to a counter that decrements each  time  the  event
  overflows.  When nonzero, POLL_IN is indicated, but once the counter
  reaches 0 POLL_HUP is indicated and the underlying event is disabled.

I think the first and the default way uses the ring buffer to determine
overflow condition so it should be allocated before calling poll(2) or
similar.  The second way doesn't seem to require ring buffers, but I
haven't tested it actually.

Maybe we can add something like this to the first section:

  If the ring buffer is not allocated, POLL_HUP is indicated.

> 
> This behavior by itself is not a problem, however, in the current
> state of the perf_event_open man page, it's not documented, and in
> fact, there are confusing statements that seem to contradict my
> findings. In the MMAP layout section of the page, you can find this
> sentence:
> Before Linux 2.6.39, there is a bug that means you must allocate
> an mmap ring buffer when sampling even if you do not plan to
> access it.

I don't remember the old kernels, but it sounds like the event was
failing if no ring buffer is available.  Maybe no samples would be
generated in that case.

Thanks,
Namhyung


> Unless I'm somehow misunderstanding it, this statement does not seem
> to be well worded, or alternatively this bug does not seem to be
> fixed. I would not call simply using poll() on the file descriptor
> intent to access the ring buffer (unless it's meant to be understood
> that way, in which case, in my opinion, it's quite confusing).
> Additionally, I cannot find any change in Linux 2.6.39 that would fit
> this description (however, that is likely just due to my lack of
> experience searching through the kernel changelogs and commits).
> 
> I would like to receive clarification on whether this current behavior
> of perf_event_open is intentional and desired (that is why I cc'd
> linux-perf-users). If it is, I could also create a patch to the man
> page that lays out the requirements more clearly. In that case, it
> would also be helpful to further clarify the wording of the sentence
> mentioning the Linux 2.6.39 change, however I don't know if I'm
> qualified to do that, because as I have previously stated, I am unable
> to find what changes that sentence actually refers to.

> #include <linux/perf_event.h>
> #include <sys/syscall.h>
> #include <sys/mman.h>
> #include <iostream>
> #include <unistd.h>
> #include <signal.h>
> #include <fcntl.h>
> #include <cstdint>
> #include <poll.h>
> 
> // Modify the value of this constant to change the variant of the program
> // that is run. The possible values are:
> // 1: SIGIO without mmap, 2: SIGIO with mmap, 
> // 3: poll without mmap, 4: poll with mmap
> // As stated in the email, varaints 1, 2 and 4 properly trigger overflow
> // notifications approximately after each 1000000000 hardware instructions,
> // however when the program is run with variant = 3, poll will just 
> // continuously return POLLHUP, without waiting for the overflow
> // 
> // Also, before running any variant, make sure to set the 
> // kernel.perf_event_paranoid sysctl to -1 
> // (for example by running sudo sysctl kernel.perf_event_paranoid=-1)
> const int variant = 1;
> 
> static long perf_event_open(struct perf_event_attr *hw_event, pid_t
> 	pid, int cpu, int group_fd, unsigned long flags) {
>     return syscall(SYS_perf_event_open, hw_event, pid, cpu, group_fd, flags);
> }
> 
> volatile sig_atomic_t sigioOccurred = 0;
> void sigioHandler(int signum) {
>     sigioOccurred = 1;
> }
> 
> uint64_t get_instructions_used(int perf_fd) {
>     uint64_t result;
>     ssize_t size = read(perf_fd, &result, sizeof(uint64_t));
> 
>     if (size != sizeof(result)) {
>         std::cout << "read failed";
>         exit(0);
>     }
>     if (result < 0) {
>         std::cout << "read negative instructions count";
>         exit(0);
>     }
> 
>     return result;
> }
> 
> int main() {
>     struct sigaction sa;
>     sa.sa_handler = sigioHandler; sa.sa_flags = 0; sigemptyset(&sa.sa_mask);
>     sigaction(SIGIO, &sa, 0);
> 
>     int child = fork(), num = 2;
>     if(child == 0) {
>         while(true) {
>             num *= 2;
>         }
>     }
> 
>     struct perf_event_attr attrs {}; attrs.config = PERF_COUNT_HW_INSTRUCTIONS; 
>     attrs.type = PERF_TYPE_HARDWARE; attrs.sample_period = 1000000000; 
> 	attrs.wakeup_events = 1;
>     int perf_fd = perf_event_open(&attrs, child, -1, -1, 0);
> 
>     if(variant == 2 or variant == 4) {
>         void *base = mmap(NULL, getpagesize() * (8192 + 1), PROT_READ
> 			| PROT_WRITE, MAP_SHARED, perf_fd, 0);
> 		
>         if (base == MAP_FAILED) {
>             std::cout << "mmap err " << errno << "\n";
>             return -1;
>         }
>     }
> 
>     if(variant == 1 or variant == 2) {
>         fcntl(perf_fd, F_SETOWN, getpid());
>         fcntl(perf_fd, F_SETFL, (fcntl(perf_fd, F_GETFL, 0) | O_ASYNC));
>     }
> 
>     while(true) {
>         if(variant == 1 or variant == 2) {
>             if(sigioOccurred) {
>                 std::cout << "SIGIO delivered, instructions used: " <<
> 					get_instructions_used(perf_fd) << "\n";
> 				
>                 sigioOccurred = 0;
>             }
>         }
> 
>         if(variant == 3 or variant == 4) {
>             struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
>             int res = poll(&pfd, 1, 1000000);
> 
>             std::cout << "Poll returned ";
>             if(pfd.revents == POLLHUP)
>                 std::cout << "POLLHUP, instructions used: " << 
> 					get_instructions_used(perf_fd) << "\n";
>             else if(pfd.revents == POLLIN)
>                 std::cout << "POLLIN, instructions used: " <<
> 					get_instructions_used(perf_fd) << "\n";
>             else
>                 std::cout << pfd.revents << "\n";
>         }
>     }
> 
>     return 0;
> }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-11-26  5:42 ` Namhyung Kim
@ 2024-11-26 10:15   ` Alejandro Colomar
  0 siblings, 0 replies; 8+ messages in thread
From: Alejandro Colomar @ 2024-11-26 10:15 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Mikołaj Kołek, linux-man, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 1374 bytes --]

Hi Namhyung,

On Mon, Nov 25, 2024 at 09:42:54PM -0800, Namhyung Kim wrote:
> Thanks for the report and the test code.  I agree that the current man
> page is a little confusing about the overflow notification.  I can see
> the following sentences in the "overflow handling" section.
> 
>   There are two ways to generate overflow notifications.
> 
>   The  first is to set a wakeup_events or wakeup_watermark value that
>   will trigger if a certain number of samples or bytes have been
>   written to the mmap ring buffer.  In this case, POLL_IN is indicated.
> 
>   The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl.  This
>   ioctl adds to a counter that decrements each  time  the  event
>   overflows.  When nonzero, POLL_IN is indicated, but once the counter
>   reaches 0 POLL_HUP is indicated and the underlying event is disabled.
> 
> I think the first and the default way uses the ring buffer to determine
> overflow condition so it should be allocated before calling poll(2) or
> similar.  The second way doesn't seem to require ring buffers, but I
> haven't tested it actually.
> 
> Maybe we can add something like this to the first section:
> 
>   If the ring buffer is not allocated, POLL_HUP is indicated.

Thanks!  Would you mind sending a patch?

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-11-23 20:49 perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications Mikołaj Kołek
  2024-11-23 21:29 ` Alejandro Colomar
  2024-11-26  5:42 ` Namhyung Kim
@ 2024-11-26 22:13 ` Vince Weaver
  2024-12-02 20:53   ` Namhyung Kim
  2 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2024-11-26 22:13 UTC (permalink / raw)
  To: Mikołaj Kołek; +Cc: Alejandro Colomar, linux-man, linux-perf-users

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

On Sat, 23 Nov 2024, Mikołaj Kołek wrote:

> findings. In the MMAP layout section of the page, you can find this
> sentence:
> Before Linux 2.6.39, there is a bug that means you must allocate
> an mmap ring buffer when sampling even if you do not plan to
> access it.
> Unless I'm somehow misunderstanding it, this statement does not seem
> to be well worded, or alternatively this bug does not seem to be
> fixed.

That text was probably written by me.

I tried looking at the 2.6.39 code, my perf_tests, and also PAPI which was 
where the problem was probably noticed but I can't find a firm reference 
for how the issue was fixed.

If I recall, the problem was if you were trying to create a sampling event 
without mmap (say you want to get a signal every 100,000 retired 
instructions, but you don't actually want any sample data).  I think 
before 2.6.39 if you tried setting that up you'd get some sort of error 
(an EINVAL?) when trying to start(?) the event.

It is possible this wasn't fixed.  I tried to be pretty good 
about putting relevant git commits as comments in the manpage but there 
doesn't seem to be one for that part of the text.  I'm guessing it was 
PeterZ doing the work on this so maybe he remembers.

Vince Weaver
vincent.weaver@maine.edu


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-11-26 22:13 ` Vince Weaver
@ 2024-12-02 20:53   ` Namhyung Kim
  2024-12-04 12:28     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2024-12-02 20:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Vince Weaver, Mikołaj Kołek, Alejandro Colomar,
	linux-man, linux-perf-users

Hi Peter and Ingo,

On Tue, Nov 26, 2024 at 05:13:45PM -0500, Vince Weaver wrote:
> On Sat, 23 Nov 2024, Mikołaj Kołek wrote:
> 
> > findings. In the MMAP layout section of the page, you can find this
> > sentence:
> > Before Linux 2.6.39, there is a bug that means you must allocate
> > an mmap ring buffer when sampling even if you do not plan to
> > access it.
> > Unless I'm somehow misunderstanding it, this statement does not seem
> > to be well worded, or alternatively this bug does not seem to be
> > fixed.
> 
> That text was probably written by me.
> 
> I tried looking at the 2.6.39 code, my perf_tests, and also PAPI which was 
> where the problem was probably noticed but I can't find a firm reference 
> for how the issue was fixed.
> 
> If I recall, the problem was if you were trying to create a sampling event 
> without mmap (say you want to get a signal every 100,000 retired 
> instructions, but you don't actually want any sample data).  I think 
> before 2.6.39 if you tried setting that up you'd get some sort of error 
> (an EINVAL?) when trying to start(?) the event.
> 
> It is possible this wasn't fixed.  I tried to be pretty good 
> about putting relevant git commits as comments in the manpage but there 
> doesn't seem to be one for that part of the text.  I'm guessing it was 
> PeterZ doing the work on this so maybe he remembers.

Do you remember what was the issue exactly on sampling events w/o mmap?

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-12-02 20:53   ` Namhyung Kim
@ 2024-12-04 12:28     ` Peter Zijlstra
  2024-12-04 19:30       ` Namhyung Kim
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2024-12-04 12:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Vince Weaver, Mikołaj Kołek,
	Alejandro Colomar, linux-man, linux-perf-users

On Mon, Dec 02, 2024 at 12:53:56PM -0800, Namhyung Kim wrote:
> Hi Peter and Ingo,
> 
> On Tue, Nov 26, 2024 at 05:13:45PM -0500, Vince Weaver wrote:
> > On Sat, 23 Nov 2024, Mikołaj Kołek wrote:
> > 
> > > findings. In the MMAP layout section of the page, you can find this
> > > sentence:
> > > Before Linux 2.6.39, there is a bug that means you must allocate
> > > an mmap ring buffer when sampling even if you do not plan to
> > > access it.
> > > Unless I'm somehow misunderstanding it, this statement does not seem
> > > to be well worded, or alternatively this bug does not seem to be
> > > fixed.
> > 
> > That text was probably written by me.
> > 
> > I tried looking at the 2.6.39 code, my perf_tests, and also PAPI which was 
> > where the problem was probably noticed but I can't find a firm reference 
> > for how the issue was fixed.
> > 
> > If I recall, the problem was if you were trying to create a sampling event 
> > without mmap (say you want to get a signal every 100,000 retired 
> > instructions, but you don't actually want any sample data).  I think 
> > before 2.6.39 if you tried setting that up you'd get some sort of error 
> > (an EINVAL?) when trying to start(?) the event.
> > 
> > It is possible this wasn't fixed.  I tried to be pretty good 
> > about putting relevant git commits as comments in the manpage but there 
> > doesn't seem to be one for that part of the text.  I'm guessing it was 
> > PeterZ doing the work on this so maybe he remembers.
> 
> Do you remember what was the issue exactly on sampling events w/o mmap?

I can barely remember last release :/ But looking at the code, I don't
see a reason that wouldn't work today.

Only the overflow handler should care about there being a buffer, and
that just NOPs out if there isn't -- expensively, if this is a common
thing, this could perhaps be optimized a little.


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4c6f6c286b2d..190b5c3cec10 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8117,6 +8117,8 @@ __perf_event_output(struct perf_event *event,
 
 	/* protect the callchain buffers */
 	rcu_read_lock();
+	if (unlikely(!event->rb))
+		goto exit;
 
 	perf_prepare_sample(data, event, regs);
 	perf_prepare_header(&header, data, event, regs);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
  2024-12-04 12:28     ` Peter Zijlstra
@ 2024-12-04 19:30       ` Namhyung Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Namhyung Kim @ 2024-12-04 19:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Vince Weaver, Mikołaj Kołek,
	Alejandro Colomar, linux-man, linux-perf-users

On Wed, Dec 04, 2024 at 01:28:01PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 02, 2024 at 12:53:56PM -0800, Namhyung Kim wrote:
> > Hi Peter and Ingo,
> > 
> > On Tue, Nov 26, 2024 at 05:13:45PM -0500, Vince Weaver wrote:
> > > On Sat, 23 Nov 2024, Mikołaj Kołek wrote:
> > > 
> > > > findings. In the MMAP layout section of the page, you can find this
> > > > sentence:
> > > > Before Linux 2.6.39, there is a bug that means you must allocate
> > > > an mmap ring buffer when sampling even if you do not plan to
> > > > access it.
> > > > Unless I'm somehow misunderstanding it, this statement does not seem
> > > > to be well worded, or alternatively this bug does not seem to be
> > > > fixed.
> > > 
> > > That text was probably written by me.
> > > 
> > > I tried looking at the 2.6.39 code, my perf_tests, and also PAPI which was 
> > > where the problem was probably noticed but I can't find a firm reference 
> > > for how the issue was fixed.
> > > 
> > > If I recall, the problem was if you were trying to create a sampling event 
> > > without mmap (say you want to get a signal every 100,000 retired 
> > > instructions, but you don't actually want any sample data).  I think 
> > > before 2.6.39 if you tried setting that up you'd get some sort of error 
> > > (an EINVAL?) when trying to start(?) the event.
> > > 
> > > It is possible this wasn't fixed.  I tried to be pretty good 
> > > about putting relevant git commits as comments in the manpage but there 
> > > doesn't seem to be one for that part of the text.  I'm guessing it was 
> > > PeterZ doing the work on this so maybe he remembers.
> > 
> > Do you remember what was the issue exactly on sampling events w/o mmap?
> 
> I can barely remember last release :/ But looking at the code, I don't
> see a reason that wouldn't work today.
> 
> Only the overflow handler should care about there being a buffer, and
> that just NOPs out if there isn't -- expensively, if this is a common
> thing, this could perhaps be optimized a little.
> 
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4c6f6c286b2d..190b5c3cec10 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8117,6 +8117,8 @@ __perf_event_output(struct perf_event *event,
>  
>  	/* protect the callchain buffers */
>  	rcu_read_lock();
> +	if (unlikely(!event->rb))
> +		goto exit;

We have a similar logic in __perf_output_begin() and it checks with the
parent event due to inheritance.  But doing it here would save some
cycles for preparing samples to be dropped.

Thanks,
Namhyung

>  
>  	perf_prepare_sample(data, event, regs);
>  	perf_prepare_header(&header, data, event, regs);

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-12-04 19:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-23 20:49 perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications Mikołaj Kołek
2024-11-23 21:29 ` Alejandro Colomar
2024-11-26  5:42 ` Namhyung Kim
2024-11-26 10:15   ` Alejandro Colomar
2024-11-26 22:13 ` Vince Weaver
2024-12-02 20:53   ` Namhyung Kim
2024-12-04 12:28     ` Peter Zijlstra
2024-12-04 19:30       ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).