Re: [Patch v5 0/6] Bug fixes on topdown events reordering

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Dapeng Mi <dapeng1.mi@linux.intel.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Yongwei Ma <yongwei.ma@intel.com>,
	Dapeng Mi <dapeng1.mi@intel.com>
Subject: Re: [Patch v5 0/6] Bug fixes on topdown events reordering
Date: Thu, 3 Oct 2024 09:45:16 -0700	[thread overview]
Message-ID: <Zv7KHGQx0y3rAGWx@google.com> (raw)
In-Reply-To: <CAP-5=fXutWptEKZKNvLXvXXpuDoMje6PiOxMuF872xoMjtumGQ@mail.gmail.com>

On Thu, Oct 03, 2024 at 08:55:22AM -0700, Ian Rogers wrote:
> On Thu, Oct 3, 2024 at 7:57 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >
> >
> >
> > On 2024-10-02 8:57 p.m., Ian Rogers wrote:
> > > On Wed, Oct 2, 2024 at 5:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >>
> > >> On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
> > >>> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >>>>
> > >>>> On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
> > >>>>
> > >>>>> Changes:
> > >>>>> v5 -> v6:
> > >>>>>   * no function change.
> > >>>>>   * rebase patchset to latest code of perf-tool-next tree.
> > >>>>>   * Add Kan's reviewed-by tag.
> > >>>>>
> > >>>>> History:
> > >>>>>   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> > >>>>>   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> > >>>>>   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> > >>>>>   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> > >>>>>
> > >>>>> [...]
> > >>>>
> > >>>> Applied to perf-tools-next, thanks!
> > >>>
> > >>> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
> > >>>
> > >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
> > >>> ```
> > >>>   /* Followed by topdown events. */
> > >>>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > >>>   return -1;
> > >>> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> > >>> + /*
> > >>> + * Move topdown events forward only when topdown events
> > >>> + * are not in same group with previous event.
> > >>> + */
> > >>> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> > >>> +     lhs->core.leader != rhs->core.leader)
> > >>>   return 1;
> > >>> ```
> > >>> Is a broken comparator as the lhs then rhs behavior varies from the
> > >>> rhs then lhs behavior. The qsort implementation can randomly order the
> > >>> events.
> > >>> Please drop/revert.
> > >>
> > >> Can you please provide an example when it's broken?  I'm not sure how it
> > >> can produce new errors, but it seems to fix a specific problem.  Do you
> > >> have a new test failure after this change?
> > >
> > > It may work with a particular sort routine in a particular library
> > > today, the issue is that if the sort routine were say changed from:
> > >
> > > if (cmp(a, b) < 0)
> > >
> > > to:
> > >
> > > if (cmp(b, a) > 0)
> > >
> > > the sort would vary with this change even though such a change in the
> > > sorting code is a no-op. It is fundamental algorithms that this code
> > > is broken, I'm not going to spend the time finding a list of
> > > instructions and a sort routine to demonstrate this.
> >
> >
> > I'm not sure I fully understand. But just want to mention that the
> > change only impacts the Topdown perf metric group, which is only
> > available for the ICL and later p-core. Nothing will change if no perf
> > metrics topdown events are used. Very limited impact.
> 
> How is breaking every ICL and later Intel model (excluding atoms)
> limited impact?

I guess he meant it's only for Topdown metric groups with an incorrect
order on those models.

> 
> > I like the patch set is because it provides examples and tests to cover
> > the possible combination of the slots event and topdown metrics events.
> > So we will have a clear expectation for different combinations caused by
> > the perf metrics mess.
> 
> Tests are good. I have no problem with the change, possibly it is
> inefficient, except that hacking a sort comparator to not be symmetric
> is broken.

Ok.

> 
> > If the algorithms cannot be changed, can you please give some
> > suggestions, especially for the sample read failure?
> 
> So this is symmetric:
> ```
> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>   return -1;
> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
>   return 1;
> ```
> That is were lhs and rhs swapped then you'd get the expected comparison order.
> ```
> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> lhs->core.leader != rhs->core.leader)
>   return -1;
> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> lhs->core.leader != rhs->core.leader)
>   return 1;
> ```
> Is symmetric as well.
> ```
> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>   return -1;
> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> lhs->core.leader != rhs->core.leader)
>   return 1;
> ```
> (what this patch does) is not symmetric as the group leader impacts
> the greater-than case but not the less-than case.
> 
> It is not uncommon to see in a sort function:
> ```
> if (cmp(a, b) <= 0) {
>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare functions");
> ```

I see.  So are you proposing this?

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 438e4639fa892304..46884fa17fe658a6 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
                if (arch_is_topdown_slots(rhs))
                        return 1;
                /* Followed by topdown events. */
-               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
+               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
+                   lhs->core.leader != rhs->core.leader)
                        return -1;
                /*
                 * Move topdown events forward only when topdown events

Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.


> and we could add this here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/list_sort.c?h=perf-tools-next#n22
> Try searching up "Comparison method violates its general contract"
> which is what Java throws when its TimSort implementation detects
> cases like this. If you break the comparator you can get the sort into
> an infinite loop - maybe that's enough of an indication that the code
> is broken and you don't need the exception. As is common in kernel
> code we're missing guard rails in list_sort, were the comparator
> passed to qsort then expectations on breakage are a roll of the dice.
> 
> I believe when I originally gave this feedback I didn't think the fix
> was to do it in the comparator and maybe an additional pass over the
> list was going to be necessary. Basically a sort needs to be a sort
> and it can't kind of be a sort depending on the order of the
> comparison, this is just incurring tech debt and when a sort tweak
> happens everything will break. As the usual chump cleaning up this
> tech debt I'd prefer it wasn't introduced.

Sorry, I think I didn't follow the previous thread and missed your
feedback.

> 
> Fwiw, I refuse to carry this change in:
> https://github.com/googleprodkernel/linux-perf/commits/google_tools_master/
> and if the tests break as a consequence then the appropriate thing is
> to revert those too. Linus/upstream maintainers follow a different
> plan and that's their business, I can't build off of this code.
> 
> It is unclear to me why we get patches that have been pointed out to
> be broken rebased/resent repeatedly without the broken-ness corrected.
> For example, the abuse of aggregation for metrics in perf script. At
> least point to the disagreement in the commit message, call me an
> idiot, and let the maintainer make an informed or uninformed choice. I
> am frequently an idiot and I apologize for all those cases, to the
> best of my knowledge this isn't one of them.

Yeah I understand your point and it sounds reasonable.  Thanks for your
feedback.  I don't think I'll revert the change though, we can add the
fix on top instead.

Thanks,
Namhyung

next prev parent reply	other threads:[~2024-10-03 16:45 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-13  8:47 [Patch v5 0/6] Bug fixes on topdown events reordering Dapeng Mi
2024-09-13  8:47 ` [Patch v5 1/6] perf x86/topdown: Complete topdown slots/metrics events check Dapeng Mi
2024-10-08  5:55   ` Ian Rogers
2024-10-09  9:56     ` Mi, Dapeng
2024-09-13  8:47 ` [Patch v5 2/6] perf x86/topdown: Correct leader selection with sample_read enabled Dapeng Mi
2024-09-13  8:47 ` [Patch v5 3/6] perf x86/topdown: Don't move topdown metric events in group Dapeng Mi
2024-09-13  8:47 ` [Patch v5 4/6] perf tests: Add leader sampling test in record tests Dapeng Mi
2024-09-13  8:47 ` [Patch v5 5/6] perf tests: Add topdown events counting and sampling tests Dapeng Mi
2024-09-13  8:47 ` [Patch v5 6/6] perf tests: Add more topdown events regroup tests Dapeng Mi
2024-10-01 21:02 ` [Patch v5 0/6] Bug fixes on topdown events reordering Namhyung Kim
2024-10-01 22:32   ` Ian Rogers
2024-10-02 14:31     ` Ian Rogers
2024-10-03  0:00     ` Namhyung Kim
2024-10-03  0:57       ` Ian Rogers
2024-10-03 14:57         ` Liang, Kan
2024-10-03 15:55           ` Ian Rogers
2024-10-03 16:45             ` Namhyung Kim [this message]
2024-10-03 19:45               ` Liang, Kan
2024-10-03 21:26                 ` Ian Rogers
2024-10-03 22:13                   ` Namhyung Kim
2024-10-03 23:29                     ` Liang, Kan
2024-10-03 23:36                       ` Ian Rogers
2024-10-04  5:19                         ` Namhyung Kim
2024-10-08  2:52                         ` Mi, Dapeng1
2024-10-08  5:13                           ` Ian Rogers
2024-10-08  2:31                   ` Mi, Dapeng1
2024-10-08  2:30                 ` Mi, Dapeng1

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:438e4639fa89230 dfblob:46884fa17fe658a )
 OR (
bs:"Re: [Patch v5 0/6] Bug fixes on topdown events reordering" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zv7KHGQx0y3rAGWx@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=dapeng1.mi@intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=irogers@google.com \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=yongwei.ma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.