From: Namhyung Kim <namhyung@kernel.org>
To: "Mikołaj Kołek" <kolek.mikolaj@gmail.com>
Cc: Alejandro Colomar <alx@kernel.org>,
linux-man@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications
Date: Mon, 25 Nov 2024 21:42:54 -0800
Message-ID: <Z0Vf3lrTUsbE_4NP@google.com>
In-Reply-To: <CAHGiy68y37n-y_b0gR-dArxFRzYOAr93dCw_6bvkNReNWQ37Hw@mail.gmail.com>
Hello,
On Sat, Nov 23, 2024 at 09:49:40PM +0100, Mikołaj Kołek wrote:
> I have found that when monitoring a file descriptor returned by
> perf_event_open() with poll(), it is required to allocate an mmap ring
> buffer to properly receive overflow notifications. If this is not
> done, poll() keeps continuously returning POLLHUP, even when an
> overflow notification should not be raised. Notably, this behavior is
> different from listening for overflow notifications by setting the
> O_ASYNC flag on the file descriptor - in that case, creating the mmap
> ring buffer is not required to receive the SIGIO signal delivered
> after the file descriptor becomes available for reading. I attach code
> showcasing this behavior (the functionality is explained in the
> comments).
Thanks for the report and the test code. I agree that the current man
page is a little confusing about the overflow notification. I can see
the following sentences in the "overflow handling" section.

    There are two ways to generate overflow notifications.

    The first is to set a wakeup_events or wakeup_watermark value that
    will trigger if a certain number of samples or bytes have been
    written to the mmap ring buffer.  In this case, POLL_IN is indicated.

    The other way is by use of the PERF_EVENT_IOC_REFRESH ioctl.  This
    ioctl adds to a counter that decrements each time the event
    overflows.  When nonzero, POLL_IN is indicated, but once the counter
    reaches 0 POLL_HUP is indicated and the underlying event is disabled.

I think the first (and default) way uses the ring buffer to determine the
overflow condition, so the buffer should be allocated before calling
poll(2) or similar.  The second way doesn't seem to require a ring buffer,
but I haven't actually tested it.
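
For reference, arming the second mechanism might look roughly like the
sketch below.  The helper name arm_overflow_budget is made up for
illustration; only the PERF_EVENT_IOC_REFRESH ioctl itself comes from the
man page, and whether this path works without a ring buffer is not
verified here.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <cerrno>

// Hypothetical helper (name not from the man page): add `overflows` to the
// event's refresh counter via PERF_EVENT_IOC_REFRESH, which also enables
// the event until that many overflows have occurred.
// Returns 0 on success, or the errno value reported by ioctl(2) on failure.
int arm_overflow_budget(int perf_fd, int overflows) {
    if (ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, overflows) == -1)
        return errno;
    return 0;
}
```

A caller would pass the fd returned by perf_event_open() and re-arm the
counter after each notification if it wants the event to keep running.
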
Maybe we can add something like this to the first section:

    If the ring buffer is not allocated, POLL_HUP is indicated.

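
To make the ordering concrete, here is a minimal sketch that maps the ring
buffer right after opening the event, so that a later poll(2) has a buffer
to look at.  The helper names and the data_pages_log2 parameter are
assumptions of mine, not anything from the man page; the "one metadata
page plus 2^n data pages" sizing is what the man page requires for the
mmap length.

```cpp
#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Size of a perf mmap area: one metadata page plus 2^n data pages.
// `data_pages_log2` is an illustrative parameter name, not a kernel term.
std::size_t ring_buffer_bytes(std::size_t page_size, unsigned data_pages_log2) {
    return page_size * (1 + (std::size_t{1} << data_pages_log2));
}

// Sampling attributes that request a wakeup after every sample.
perf_event_attr make_sampling_attr(std::uint64_t period) {
    perf_event_attr attr{};
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.sample_period = period;
    attr.wakeup_events = 1;
    return attr;
}

// Open the event for `pid` and map the ring buffer before the caller polls.
// Returns the fd, or -1 if either step fails (errno describes the failure).
int open_with_ring_buffer(pid_t pid, std::uint64_t period, void **base_out) {
    perf_event_attr attr = make_sampling_attr(period);
    int fd = static_cast<int>(
        syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL));
    if (fd == -1)
        return -1;

    std::size_t len =
        ring_buffer_bytes(static_cast<std::size_t>(getpagesize()), 3);
    void *base = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        int saved = errno;  // close() may overwrite errno
        close(fd);
        errno = saved;
        return -1;
    }
    *base_out = base;
    return fd;
}
```

The returned fd can then be handed to poll(2) with POLLIN, as in variant 4
of the test program.
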
>
> This behavior by itself is not a problem, however, in the current
> state of the perf_event_open man page, it's not documented, and in
> fact, there are confusing statements that seem to contradict my
> findings. In the MMAP layout section of the page, you can find this
> sentence:
>     Before Linux 2.6.39, there is a bug that means you must allocate
>     an mmap ring buffer when sampling even if you do not plan to
>     access it.
I don't remember the old kernels, but it sounds like the event failed
if no ring buffer was available.  Maybe no samples were generated in
that case.
Thanks,
Namhyung
> Unless I'm somehow misunderstanding it, this statement does not seem
> to be well worded, or alternatively this bug does not seem to be
> fixed. I would not call simply using poll() on the file descriptor
> intent to access the ring buffer (unless it's meant to be understood
> that way, in which case, in my opinion, it's quite confusing).
> Additionally, I cannot find any change in Linux 2.6.39 that would fit
> this description (however, that is likely just due to my lack of
> experience searching through the kernel changelogs and commits).
>
> I would like to receive clarification on whether this current behavior
> of perf_event_open is intentional and desired (that is why I cc'd
> linux-perf-users). If it is, I could also create a patch to the man
> page that lays out the requirements more clearly. In that case, it
> would also be helpful to further clarify the wording of the sentence
> mentioning the Linux 2.6.39 change, however I don't know if I'm
> qualified to do that, because as I have previously stated, I am unable
> to find what changes that sentence actually refers to.
> #include <linux/perf_event.h>
> #include <sys/syscall.h>
> #include <sys/mman.h>
> #include <iostream>
> #include <unistd.h>
> #include <signal.h>
> #include <fcntl.h>
> #include <cstdint>
> #include <cstdlib>
> #include <cerrno>
> #include <poll.h>
>
> // Modify the value of this constant to change the variant of the program
> // that is run. The possible values are:
> // 1: SIGIO without mmap, 2: SIGIO with mmap,
> // 3: poll without mmap, 4: poll with mmap
> // As stated in the email, variants 1, 2 and 4 properly trigger overflow
> // notifications approximately after each 1000000000 hardware instructions,
> // however when the program is run with variant = 3, poll will just
> // continuously return POLLHUP, without waiting for the overflow.
> //
> // Also, before running any variant, make sure to set the
> // kernel.perf_event_paranoid sysctl to -1
> // (for example by running sudo sysctl kernel.perf_event_paranoid=-1)
> const int variant = 1;
>
> static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
>                             int cpu, int group_fd, unsigned long flags) {
>     return syscall(SYS_perf_event_open, hw_event, pid, cpu, group_fd, flags);
> }
>
> volatile sig_atomic_t sigioOccurred = 0;
> void sigioHandler(int signum) {
>     sigioOccurred = 1;
> }
>
> // Read the current counter value from the perf file descriptor.
> uint64_t get_instructions_used(int perf_fd) {
>     uint64_t result;
>     ssize_t size = read(perf_fd, &result, sizeof(uint64_t));
>
>     if (size != sizeof(result)) {
>         std::cout << "read failed\n";
>         exit(1);
>     }
>
>     return result;
> }
>
> int main() {
>     struct sigaction sa;
>     sa.sa_handler = sigioHandler; sa.sa_flags = 0; sigemptyset(&sa.sa_mask);
>     sigaction(SIGIO, &sa, 0);
>
>     int child = fork();
>     if(child == 0) {
>         // Busy loop that retires instructions for the parent to count.
>         // volatile unsigned avoids signed overflow and keeps the loop
>         // from being optimized away.
>         volatile unsigned num = 2;
>         while(true) {
>             num = num * 2;
>         }
>     }
>
>     struct perf_event_attr attrs {}; attrs.config = PERF_COUNT_HW_INSTRUCTIONS;
>     attrs.type = PERF_TYPE_HARDWARE; attrs.sample_period = 1000000000;
>     attrs.wakeup_events = 1;
>     int perf_fd = perf_event_open(&attrs, child, -1, -1, 0);
>
>     if (perf_fd == -1) {
>         std::cout << "perf_event_open err " << errno << "\n";
>         return -1;
>     }
>
>     if(variant == 2 or variant == 4) {
>         void *base = mmap(NULL, getpagesize() * (8192 + 1),
>                           PROT_READ | PROT_WRITE, MAP_SHARED, perf_fd, 0);
>
>         if (base == MAP_FAILED) {
>             std::cout << "mmap err " << errno << "\n";
>             return -1;
>         }
>     }
>
>     if(variant == 1 or variant == 2) {
>         fcntl(perf_fd, F_SETOWN, getpid());
>         fcntl(perf_fd, F_SETFL, (fcntl(perf_fd, F_GETFL, 0) | O_ASYNC));
>     }
>
>     while(true) {
>         if(variant == 1 or variant == 2) {
>             if(sigioOccurred) {
>                 std::cout << "SIGIO delivered, instructions used: " <<
>                     get_instructions_used(perf_fd) << "\n";
>
>                 sigioOccurred = 0;
>             }
>         }
>
>         if(variant == 3 or variant == 4) {
>             struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
>             int res = poll(&pfd, 1, 1000000);
>
>             std::cout << "Poll returned ";
>             if(pfd.revents == POLLHUP)
>                 std::cout << "POLLHUP, instructions used: " <<
>                     get_instructions_used(perf_fd) << "\n";
>             else if(pfd.revents == POLLIN)
>                 std::cout << "POLLIN, instructions used: " <<
>                     get_instructions_used(perf_fd) << "\n";
>             else
>                 std::cout << pfd.revents << "\n";
>         }
>     }
>
>     return 0;
> }
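
A side note on the test program: it compares revents for equality with
POLLHUP or POLLIN, so any combination of flags falls through to the raw
number.  A small sketch that tests individual bits instead, with a made-up
helper name describe_revents, could look like this:

```cpp
#include <poll.h>
#include <string>

// Illustrative helper (the name is hypothetical): list the flags set in
// a pollfd's revents field, joined with '|', or "none" if no known flag
// is set.
std::string describe_revents(short revents) {
    std::string out;
    auto add = [&](short flag, const char *name) {
        if (revents & flag) {
            if (!out.empty())
                out += '|';
            out += name;
        }
    };
    add(POLLIN, "POLLIN");
    add(POLLHUP, "POLLHUP");
    add(POLLERR, "POLLERR");
    add(POLLNVAL, "POLLNVAL");
    return out.empty() ? std::string("none") : out;
}
```

In the poll branch of the program, printing describe_revents(pfd.revents)
would show every reported flag rather than only the exact-match cases.
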