All of lore.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Milian Wolff <milian.wolff@kdab.com>,
	linux-perf-users@vger.kernel.org,
	Adrian Hunter <adrian.hunter@intel.com>,
	Arnaldo Carvalho de Melo <acme@kenel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: perf 6.9-1 (archlinux) crashes during recording of cycles + raw_syscalls
Date: Fri, 7 Jun 2024 11:26:16 -0700	[thread overview]
Message-ID: <ZmNQyBx6N1XEWbaw@google.com> (raw)
In-Reply-To: <CAP-5=fWD+PPCVCvyBFfd3vZzUajM0HAEWzJe8XmB9p_bfT4ATQ@mail.gmail.com>

On Thu, Jun 06, 2024 at 04:17:43PM -0700, Ian Rogers wrote:
> On Thu, Jun 6, 2024 at 3:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Tue, Jun 04, 2024 at 04:02:08PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Jun 04, 2024 at 11:48:09AM -0700, Ian Rogers wrote:
> > > > On Tue, Jun 4, 2024 at 7:12 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > > > Can you please try with the attached and perhaps provide your Tested-by?
> > >
> > > > > From ab355e2c6b4cf641a9fff7af38059cf69ac712d5 Mon Sep 17 00:00:00 2001
> > > > > From: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > > Date: Tue, 4 Jun 2024 11:00:22 -0300
> > > > > Subject: [PATCH 1/1] Revert "perf record: Reduce memory for recording
> > > > >  PERF_RECORD_LOST_SAMPLES event"
> > >
> > > > > This reverts commit 7d1405c71df21f6c394b8a885aa8a133f749fa22.
> > >
> > > > I think we should try to fight back reverts when possible. Reverts are
> > > > removing something somebody poured time and attention into. When a
> > >
> > > While in the development phase, yeah, but when we find a regression and
> > > the revert makes it go away, that is the way to go.
> > >
> > > The person who poured time on the development gets notified and can
> > > decide if/when to try again.
> > >
> > > Millian had to pour time to figure out why something stopped working,
> > > was kind enough to provide the output from multiple tools to help in
> > > fixing the problem and I had to do the bisect to figure out when the
> > > problem happened and to check if reverting it we would have the tool
> > > working again.
> > >
> > > If we try to fix this for v6.10 we may end up adding yet another bug, so
> > > the safe thing to do at this point is to do the revert.
> > >
> > > We can try improving this once again for v6.11.
> >
> > I think I found a couple of problems with this issue. :(
> >
> >  1. perf_session__set_id_hdr_size() uses the first evsel in the session
> >     But I think it should pick the tracking event.  I guess we assume
> >     all events have the same set of sample_type wrt the sample_id_all
> >     but I'm not sure if it's correct.
> >
> >  2. With --call-graph dwarf, it seems to set unrelated sample type bits
> >     in the attr like ADDR and DATA_SRC.
> >
> >  3. For tracepoint events, evsel__newtp_idx() sets a couple of sample
> >     type regardless of the configuration.  This includes RAW, TIME and
> >     CPU.  This one changes the format of the id headers.
> >
> >  4. PERF_RECORD_LOST_SAMPLES is for the sampling event, so it should
> >     use the event's sample_type.  But the event parsing looks up the
> >     event using evlist->is_pos which is set for the first event.
> >
> >  5. I think we can remove some sample type (i.e. TID and CPU) from the
> >     tracking event in most cases.  ID(ENTIFIER) will be used for LOST_
> >     SAMPLES and TIME is needed anyway.  TID is might be used for SWITCH
> >     but others already contain necessary information in the type.  I
> >     wish we could add id field to PERF_RECORD_LOST_SAMPLES and tid/pid
> >     to PERF_RECORD_SWITCH.
> 
> Right, this is good. To clear up the immediate error we just need to
> increase the memory allocation size by 48 bytes to account for the
> sample ID being written. Here is a change doing that and removing the
> memory allocation altogether:

Looks ok.  I think we can have this as a quick fix and I can work on the
above issues separately.  Can you please send a formal patch?

> ```
> diff --git a/tools/lib/perf/include/perf/event.h
> b/tools/lib/perf/include/perf/event.h
> index ae64090184d3..a2dfaff26fb7 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -77,6 +77,8 @@ struct perf_record_lost_samples {
>        __u64                    lost;
> };
> 
> +#define PERF_RECORD_MAX_LOST_SAMPLE_AND_ID_SIZE \
> +       (sizeof(struct perf_record_lost_samples) + 6 * sizeof(__u64))

For possible other uses, I'd suggest to add

  #define MAX_ID_HDR_ENTRIES  6

Thanks,
Namhyung

> /*
>  * PERF_FORMAT_ENABLED | PERF_FORMAT_RUNNING | PERF_FORMAT_ID | PERF_FORMAT_LOST
>  */
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 66a3de8ac661..1615a1723fb9 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1926,7 +1926,10 @@ static void __record__save_lost_samples(struct
> record *rec, struct evsel *evs
> el,
> static void record__read_lost_samples(struct record *rec)
> {
>        struct perf_session *session = rec->session;
> -       struct perf_record_lost_samples *lost = NULL;
> +       union {
> +               struct perf_record_lost_samples lost;
> +               char
> lost_and_sample_id[PERF_RECORD_MAX_LOST_SAMPLE_AND_ID_SIZE];
> +       } lost;
>        struct evsel *evsel;
> 
>        /* there was an error during record__open */
> @@ -1951,20 +1954,13 @@ static void record__read_lost_samples(struct
> record *rec)
> 
>                                if (perf_evsel__read(&evsel->core, x,
> y, &count) < 0) {
>                                        pr_debug("read LOST count failed\n");
> -                                       goto out;
> +                                       return;
>                                }
> 
>                                if (count.lost) {
> -                                       if (!lost) {
> -                                               lost = zalloc(sizeof(*lost) +
> -
> session->machines.host.id_hdr_size);
> -                                               if (!lost) {
> -
> pr_debug("Memory allocation failed\n");
> -                                                       return;
> -                                               }
> -                                               lost->header.type =
> PERF_RECORD_LOST_SAMPLES;
> -                                       }
> -
> __record__save_lost_samples(rec, evsel, lost,
> +                                       memset(&lost, 0, sizeof(lost));
> +                                       lost.lost.header.type =
> PERF_RECORD_LOST_SAMPLES;
> +
> __record__save_lost_samples(rec, evsel, &lost.lost,
>                                                                    x,
> y, count.lost, 0);
>                                }
>                        }
> @@ -1972,21 +1968,12 @@ static void record__read_lost_samples(struct
> record *rec)
> 
>                lost_count = perf_bpf_filter__lost_count(evsel);
>                if (lost_count) {
> -                       if (!lost) {
> -                               lost = zalloc(sizeof(*lost) +
> -
> session->machines.host.id_hdr_size);
> -                               if (!lost) {
> -                                       pr_debug("Memory allocation failed\n");
> -                                       return;
> -                               }
> -                               lost->header.type = PERF_RECORD_LOST_SAMPLES;
> -                       }
> -                       __record__save_lost_samples(rec, evsel, lost,
> 0, 0, lost_count,
> +                       memset(&lost, 0, sizeof(lost));
> +                       lost.lost.header.type = PERF_RECORD_LOST_SAMPLES;
> +                       __record__save_lost_samples(rec, evsel,
> &lost.lost, 0, 0, lost_count,
> 
> PERF_RECORD_MISC_LOST_SAMPLES_BPF);
>                }
>        }
> -out:
> -       free(lost);
> }
> 
> static volatile sig_atomic_t workload_exec_errno;
> ```
> 
> Thanks,
> Ian
> 
> > Thanks,
> > Namhyung
> >
> > >
> > > > regression has occurred then I think we should add the regression case
> > > > as a test.
> > >
> > > Sure, I thought about that as well, will try and have one shell test
> > > with that, referring to this case, for v6.11.
> > >
> > > - Arnaldo

  reply	other threads:[~2024-06-07 18:26 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-04 11:44 perf 6.9-1 (archlinux) crashes during recording of cycles + raw_syscalls Milian Wolff
2024-06-04 13:49 ` Arnaldo Carvalho de Melo
2024-06-04 14:12   ` Arnaldo Carvalho de Melo
2024-06-04 18:48     ` Ian Rogers
2024-06-04 19:02       ` Arnaldo Carvalho de Melo
2024-06-06 22:20         ` Namhyung Kim
2024-06-06 23:17           ` Ian Rogers
2024-06-07 18:26             ` Namhyung Kim [this message]
2024-06-04 20:04     ` Milian Wolff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmNQyBx6N1XEWbaw@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kenel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=irogers@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=milian.wolff@kdab.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.