* Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host
@ 2025-11-05 20:29 mlevitsk
2025-11-10 19:51 ` mlevitsk
2026-02-25 1:07 ` Sean Christopherson
0 siblings, 2 replies; 6+ messages in thread
From: mlevitsk @ 2025-11-05 20:29 UTC (permalink / raw)
To: kvm; +Cc: Sean Christopherson
Hi,
I have a small, somewhat philosophical question about the pmu kvm unit test:
One of the subtests of this test exercises all GP counters at once, and it depends on the NMI watchdog being disabled,
because the watchdog occupies one GP counter.
This works fine, except when the test is run nested. In that case, assuming that the host has the NMI watchdog enabled,
L1 still can’t use all counters and has no way of working around this.
Since AFAIK the current long-term direction is vPMU, which is specifically designed to address these kinds of issues,
I am not sure it is worthwhile to attempt to fix this at the L0 level (for example, by reducing the number of counters that the guest can see,
which also won’t always fix the issue, since there could be more perf users on the host, and the NMI watchdog can also
be dynamically enabled and disabled).
My question is: since the test fails and interferes with CI, does it make sense to add a workaround to the test,
by making it use one fewer counter when run nested?
As a bonus, the test could also check the NMI watchdog state and reduce the number of tested counters instead of being skipped,
improving coverage.
Does all this make sense? If not, what about making the ‘all_counters’ testcase optional (only print a warning) when the test is run nested?
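To make the proposal concrete, here is a minimal sketch of the counter-selection policy, written in Python for brevity. It is not the actual pmu test code; the function name and its parameters are hypothetical, and it assumes that a watchdog which is enabled (or cannot be ruled out) occupies exactly one GP counter.

```python
def gp_counters_to_test(num_gp_counters: int, nested: bool, watchdog_enabled: bool) -> int:
    """Return how many GP counters an 'all counters' subtest would program.

    Hypothetical policy, not the real kvm-unit-tests logic:
    - nested: the L0 watchdog state is not visible, so assume one counter is taken;
    - non-nested with the watchdog on: one counter is known to be taken.
    """
    reserved = 1 if (nested or watchdog_enabled) else 0
    # Never return a negative count, even on a machine that reports no GP counters.
    return max(num_gp_counters - reserved, 0)


if __name__ == "__main__":
    # Example: 8 GP counters, running nested, watchdog state unknown.
    print(gp_counters_to_test(8, nested=True, watchdog_enabled=False))
```

Reserving a single counter is only a heuristic: other perf users on the host could still occupy additional counters.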
Best regards,
Maxim Levitsky
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host
2025-11-05 20:29 Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host mlevitsk
@ 2025-11-10 19:51 ` mlevitsk
2025-11-26 18:14 ` mlevitsk
2026-02-25 1:07 ` Sean Christopherson
1 sibling, 1 reply; 6+ messages in thread
From: mlevitsk @ 2025-11-10 19:51 UTC (permalink / raw)
To: kvm; +Cc: Sean Christopherson
Kind ping on this question.
Best regards,
Maxim Levitsky
* Re: Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host
2025-11-10 19:51 ` mlevitsk
@ 2025-11-26 18:14 ` mlevitsk
2026-02-10 15:23 ` mlevitsk
0 siblings, 1 reply; 6+ messages in thread
From: mlevitsk @ 2025-11-26 18:14 UTC (permalink / raw)
To: kvm; +Cc: Sean Christopherson
Another kind ping on this question.
Best regards,
Maxim Levitsky
* Re: Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host
2025-11-26 18:14 ` mlevitsk
@ 2026-02-10 15:23 ` mlevitsk
0 siblings, 0 replies; 6+ messages in thread
From: mlevitsk @ 2026-02-10 15:23 UTC (permalink / raw)
To: kvm; +Cc: Sean Christopherson
A ping on this question.
* Re: Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host
2025-11-05 20:29 Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host mlevitsk
2025-11-10 19:51 ` mlevitsk
@ 2026-02-25 1:07 ` Sean Christopherson
2026-02-25 16:02 ` mlevitsk
1 sibling, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-02-25 1:07 UTC (permalink / raw)
To: mlevitsk; +Cc: kvm
On Wed, Nov 05, 2025, mlevitsk@redhat.com wrote:
> Hi,
>
> I have a small, a bit philosophical question about the pmu kvm unit test:
The problem with philosophical questions is that they're never small :-)
> One of the subtests of this test, tests all GP counters at once, and it
> depends on the NMI watchdog being disabled, because it occupies one GP
> counter.
>
> This works fine, except when this test is run nested. In this case, assuming
> that the host has the NMI watchdog enabled, the L1 still can’t use all
> counters and has no way of working this around.
>
> Since AFAIK the current long term direction is vPMU, which is especially
> designed to address those kinds of issues, I am not sure it is worthy to
> attempt to fix this at L0 level (by reducing the number of counters that the
> guest can see for example, which also won’t always fix the issue, since there
> could be more perf users on the host, and NMI watchdog can also get
> dynamically enabled and disabled).
Agreed. For the emulated PMU, I think the only reasonable answer is that the
admin needs to understand the ramifications of exposing a PMU to the guest.
> My question is: Since the test fails and since it interferes with CI, does it
> make sense to add a workaround to the test, by making it use 1 counter less
> if run nested?
Hrm. I'd prefer not to? Mainly because reducing the number of used counters
seems fragile as it relies heavily on implementation details of pieces of the
stack beyond the current environment (the VM).
I don't suppose there's any way to configure your CI pipeline to disable the
host NMI watchdog?
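For reference, a host-side pre-check could look roughly like the sketch below. It assumes the standard Linux sysctl file /proc/sys/kernel/nmi_watchdog (0 = disabled, 1 = enabled); the function name and the overridable path parameter are hypothetical, added only so the logic can be exercised without touching the real sysctl.

```python
from pathlib import Path

# Standard location of the NMI watchdog sysctl on Linux.
NMI_WATCHDOG_SYSCTL = Path("/proc/sys/kernel/nmi_watchdog")


def nmi_watchdog_enabled(path: Path = NMI_WATCHDOG_SYSCTL) -> bool:
    """Report whether the NMI watchdog is enabled according to the sysctl file.

    A missing file is treated as 'disabled'; that is an assumption of this
    sketch, not something the thread states.
    """
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    if nmi_watchdog_enabled():
        print("NMI watchdog is enabled; it may occupy one GP counter")
    else:
        print("NMI watchdog is disabled")
```

A CI job could run such a check on the host before launching the nested VM, and disable the watchdog (for example with `sysctl kernel.nmi_watchdog=0`) when it reports that the watchdog is enabled.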
> As a bonus the test can also check the NMI watchdog state and also reduce the
> number of tested counters instead of being skipped, improving coverage.
I don't think I followed this part. How would a test that runs nested be able
to query the host's NMI watchdog state?
Oh, you're saying in a non-nested scenario to reduce the number of counters.
For me personally, I prefer the SKIP, because it's noisier, i.e. tells me pretty
loudly that I forgot to turn off the watchdog. It's saved me from debugging
false failures at least once when running tests in a VM on the same host.
> Does all this make sense? If not, what about making the ‘all_counters’
> testcase optional (only print a warning) in case the test is run nested?
Printing a warning would definitely be my least favorite option. Tests that
print warnings on failure inevitably get ignored. :-/
* Re: Question: 'pmu' kvm unit test fails when run nested with NMI watchdog on the host
2026-02-25 1:07 ` Sean Christopherson
@ 2026-02-25 16:02 ` mlevitsk
0 siblings, 0 replies; 6+ messages in thread
From: mlevitsk @ 2026-02-25 16:02 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm
On Tue, 2026-02-24 at 17:07 -0800, Sean Christopherson wrote:
> On Wed, Nov 05, 2025, mlevitsk@redhat.com wrote:
> > Hi,
> >
> > I have a small, a bit philosophical question about the pmu kvm unit test:
>
> The problem with philosophical questions is that they're never small :-)
>
> > One of the subtests of this test, tests all GP counters at once, and it
> > depends on the NMI watchdog being disabled, because it occupies one GP
> > counter.
> >
> > This works fine, except when this test is run nested. In this case, assuming
> > that the host has the NMI watchdog enabled, the L1 still can’t use all
> > counters and has no way of working this around.
> >
> > Since AFAIK the current long term direction is vPMU, which is especially
> > designed to address those kinds of issues, I am not sure it is worthy to
> > attempt to fix this at L0 level (by reducing the number of counters that the
> > guest can see for example, which also won’t always fix the issue, since there
> > could be more perf users on the host, and NMI watchdog can also get
> > dynamically enabled and disabled).
>
> Agreed. For the emulated PMU, I think the only reasonable answer is that the
> admin needs to understand the ramifications of exposing a PMU to the guest.
>
> > My question is: Since the test fails and since it interferes with CI, does it
> > make sense to add a workaround to the test, by making it use 1 counter less
> > if run nested?
>
> Hrm. I'd prefer not to? Mainly because reducing the number of used counters
> seems fragile as it relies heavily on implementation details of pieces of the
> stack beyond the current environment (the VM).
OK, then I'll leave it as is.
>
> I don't suppose there's any way to configure your CI pipeline to disable the
> host NMI watchdog?
It is probably possible, I'll ask the CI people.
>
> > As a bonus the test can also check the NMI watchdog state and also reduce the
> > number of tested counters instead of being skipped, improving coverage.
>
> I don't think I followed this part. How would a test that runs nested be able
> to query the host's NMI watchdog state?
>
> Oh, you're saying in a non-nested scenario to reduce the number of counters.
> For me personally, I prefer the SKIP, because it's noisier, i.e. tells me pretty
> loudly that I forgot to turn off the watchdog. It's saved me from debugging
> false failures at least once when running tests in a VM on the same host.
Yes, I am talking about the non-nested scenario here.
>
> > Does all this make sense? If not, what about making the ‘all_counters’
> > testcase optional (only print a warning) in case the test is run nested?
>
> Printing a warning would definitely be my least favorite option. Tests that
> print warns on failure inevitably get ignored. :-/
>
OK, let it be as it is now.
Thanks,
Best regards,
Maxim Levitsky