* [BUG] Guest OSes die simultaneously (bisected) @ 2024-01-03 22:22 Paul E. McKenney 2024-01-03 23:27 ` Paul E. McKenney 2024-01-04 10:01 ` Breno Leitao 0 siblings, 2 replies; 12+ messages in thread From: Paul E. McKenney @ 2024-01-03 22:22 UTC (permalink / raw) To: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini Cc: linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar Hello! Since some time between v5.19 and v6.4, long-running rcutorture tests would (rarely but intolerably often) have all guests on a given host die simultaneously with something like an instruction fault or a segmentation violation. Each bisection step required 20 hosts running 10 hours each, and this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit is certainly messing with things that could possibly cause all manner of mischief, I don't immediately see a smoking gun. Except that the commit prior to this one is rock solid. Just to make things a bit more exciting, bisection in mainline proved to be problematic due to bugs of various kinds that hid this one. I was therefore forced to bisect among the commits backported to the internal v5.19-based kernel, which fingered the backported version of the patch called out above. Please note that this is not (yet) an emergency. I will just continue to run rcutorture on v5.19-based hypervisors in the meantime. Any suggestions for debugging or fixing? Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-03 22:22 [BUG] Guest OSes die simultaneously (bisected) Paul E. McKenney @ 2024-01-03 23:27 ` Paul E. McKenney 2024-01-04 0:24 ` Sean Christopherson 2024-01-04 10:01 ` Breno Leitao 1 sibling, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2024-01-03 23:27 UTC (permalink / raw) To: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini Cc: linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > Hello! > > Since some time between v5.19 and v6.4, long-running rcutorture tests > would (rarely but intolerably often) have all guests on a given host die > simultaneously with something like an instruction fault or a segmentation > violation. > > Each bisection step required 20 hosts running 10 hours each, and > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit > is certainly messing with things that could possibly cause all manner > of mischief, I don't immediately see a smoking gun. Except that the > commit prior to this one is rock solid. > > Just to make things a bit more exciting, bisection in mainline proved > to be problematic due to bugs of various kinds that hid this one. I was > therefore forced to bisect among the commits backported to the internal > v5.19-based kernel, which fingered the backported version of the patch > called out above. Ah, and so why do I believe that this is a problem in mainline rather than just (say) a backporting mistake? Because this issue was first located in v6.4, which already has this commit included. Thanx, Paul > Please note that this is not (yet) an emergency. I will just continue > to run rcutorture on v5.19-based hypervisors in the meantime. > > Any suggestions for debugging or fixing? > > Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-03 23:27 ` Paul E. McKenney @ 2024-01-04 0:24 ` Sean Christopherson 2024-01-04 1:00 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Sean Christopherson @ 2024-01-04 0:24 UTC (permalink / raw) To: Paul E. McKenney Cc: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Wed, Jan 03, 2024, Paul E. McKenney wrote: > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > > Hello! > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests > > would (rarely but intolerably often) have all guests on a given host die > > simultaneously with something like an instruction fault or a segmentation > > violation. > > > > Each bisection step required 20 hosts running 10 hours each, and > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit > > is certainly messing with things that could possibly cause all manner > > of mischief, I don't immediately see a smoking gun. Except that the > > commit prior to this one is rock solid. > > Just to make things a bit more exciting, bisection in mainline proved > > to be problematic due to bugs of various kinds that hid this one. I was > > therefore forced to bisect among the commits backported to the internal > > v5.19-based kernel, which fingered the backported version of the patch > > called out above. > > Ah, and so why do I believe that this is a problem in mainline rather > than just (say) a backporting mistake? > > Because this issue was first located in v6.4, which already has this > commit included. > > Thanx, Paul > > > Please note that this is not (yet) an emergency. I will just continue > > to run rcutorture on v5.19-based hypervisors in the meantime. > > > > Any suggestions for debugging or fixing? This looks suspect: + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; + int global_ctrl, pebs_enable; - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask; - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask; - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable); - *nr = 1; + *nr = 0; + global_ctrl = (*nr)++; + arr[global_ctrl] = (struct perf_guest_switch_msr){ + .msr = MSR_CORE_PERF_GLOBAL_CTRL, + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), + }; IIUC (always a big if with this code), the intent is that the guest's version of PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not being used for PEBS. (b) is necessary because PEBS generates records in memory using virtual addresses, i.e. the CPU will write to memory using a virtual address that is valid for the host but not the guest. And so PMU counters that are configured to generate PEBS records need to be disabled while running the guest. Before that commit, the logic was: guest[PERF_GLOBAL_CTRL] = ctrl & ~host; guest[PERF_GLOBAL_CTRL] &= ~pebs; But after, it's now: guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs); I.e. the kernel is enabled counters in the guest that are not host-only OR not PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive to the host, then the new code will yield (truncated to a single byte for sanity) 1 = 1 & (0xf | 0xe) and thus keep counter 0 enabled, whereas the old code would yield 1 = 1 & 0xf 0 = 1 & 0xe A bit of a shot in the dark and completed untested, but I think this is the correct fix? diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index a08f794a0e79..92d5a3464cb2 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) arr[global_ctrl] = (struct perf_guest_switch_msr){ .msr = MSR_CORE_PERF_GLOBAL_CTRL, .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask), }; if (!x86_pmu.pebs) ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 0:24 ` Sean Christopherson @ 2024-01-04 1:00 ` Paul E. McKenney 2024-01-04 14:50 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2024-01-04 1:00 UTC (permalink / raw) To: Sean Christopherson Cc: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote: > On Wed, Jan 03, 2024, Paul E. McKenney wrote: > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > > > Hello! > > > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests > > > would (rarely but intolerably often) have all guests on a given host die > > > simultaneously with something like an instruction fault or a segmentation > > > violation. > > > > > > Each bisection step required 20 hosts running 10 hours each, and > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit > > > is certainly messing with things that could possibly cause all manner > > > of mischief, I don't immediately see a smoking gun. Except that the > > > commit prior to this one is rock solid. > > > Just to make things a bit more exciting, bisection in mainline proved > > > to be problematic due to bugs of various kinds that hid this one. I was > > > therefore forced to bisect among the commits backported to the internal > > > v5.19-based kernel, which fingered the backported version of the patch > > > called out above. > > > > Ah, and so why do I believe that this is a problem in mainline rather > > than just (say) a backporting mistake? > > > > Because this issue was first located in v6.4, which already has this > > commit included. > > > > Thanx, Paul > > > > > Please note that this is not (yet) an emergency. I will just continue > > > to run rcutorture on v5.19-based hypervisors in the meantime. > > > > > > Any suggestions for debugging or fixing? > > This looks suspect: > > + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; > + int global_ctrl, pebs_enable; > > - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask; > - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable); > - *nr = 1; > + *nr = 0; > + global_ctrl = (*nr)++; > + arr[global_ctrl] = (struct perf_guest_switch_msr){ > + .msr = MSR_CORE_PERF_GLOBAL_CTRL, > + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > + }; > > > IIUC (always a big if with this code), the intent is that the guest's version of > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not > being used for PEBS. (b) is necessary because PEBS generates records in memory > using virtual addresses, i.e. the CPU will write to memory using a virtual address > that is valid for the host but not the guest. And so PMU counters that are > configured to generate PEBS records need to be disabled while running the guest. > > Before that commit, the logic was: > > guest[PERF_GLOBAL_CTRL] = ctrl & ~host; > guest[PERF_GLOBAL_CTRL] &= ~pebs; > > But after, it's now: > > guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs); > > I.e. the kernel is enabled counters in the guest that are not host-only OR not > PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive > to the host, then the new code will yield (truncated to a single byte for sanity) > > 1 = 1 & (0xf | 0xe) > > and thus keep counter 0 enabled, whereas the old code would yield > > 1 = 1 & 0xf > 0 = 1 & 0xe > > A bit of a shot in the dark and completed untested, but I think this is the correct > fix? I am firing off some tests, and either way, thank you very much!!! Thanx, Paul > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > index a08f794a0e79..92d5a3464cb2 100644 > --- a/arch/x86/events/intel/core.c > +++ b/arch/x86/events/intel/core.c > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) > arr[global_ctrl] = (struct perf_guest_switch_msr){ > .msr = MSR_CORE_PERF_GLOBAL_CTRL, > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask), > }; > > if (!x86_pmu.pebs) > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 1:00 ` Paul E. McKenney @ 2024-01-04 14:50 ` Paul E. McKenney 2024-01-04 14:59 ` Paolo Bonzini 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2024-01-04 14:50 UTC (permalink / raw) To: Sean Christopherson Cc: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Wed, Jan 03, 2024 at 05:00:35PM -0800, Paul E. McKenney wrote: > On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote: > > On Wed, Jan 03, 2024, Paul E. McKenney wrote: > > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests > > > > would (rarely but intolerably often) have all guests on a given host die > > > > simultaneously with something like an instruction fault or a segmentation > > > > violation. > > > > > > > > Each bisection step required 20 hosts running 10 hours each, and > > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add > > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit > > > > is certainly messing with things that could possibly cause all manner > > > > of mischief, I don't immediately see a smoking gun. Except that the > > > > commit prior to this one is rock solid. > > > > Just to make things a bit more exciting, bisection in mainline proved > > > > to be problematic due to bugs of various kinds that hid this one. I was > > > > therefore forced to bisect among the commits backported to the internal > > > > v5.19-based kernel, which fingered the backported version of the patch > > > > called out above. > > > > > > Ah, and so why do I believe that this is a problem in mainline rather > > > than just (say) a backporting mistake? > > > > > > Because this issue was first located in v6.4, which already has this > > > commit included. > > > > > > Thanx, Paul > > > > > > > Please note that this is not (yet) an emergency. I will just continue > > > > to run rcutorture on v5.19-based hypervisors in the meantime. > > > > > > > > Any suggestions for debugging or fixing? > > > > This looks suspect: > > > > + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; > > + int global_ctrl, pebs_enable; > > > > - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > > - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > > - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask; > > - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable); > > - *nr = 1; > > + *nr = 0; > > + global_ctrl = (*nr)++; > > + arr[global_ctrl] = (struct perf_guest_switch_msr){ > > + .msr = MSR_CORE_PERF_GLOBAL_CTRL, > > + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > > + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > > + }; > > > > > > IIUC (always a big if with this code), the intent is that the guest's version of > > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not > > being used for PEBS. (b) is necessary because PEBS generates records in memory > > using virtual addresses, i.e. the CPU will write to memory using a virtual address > > that is valid for the host but not the guest. And so PMU counters that are > > configured to generate PEBS records need to be disabled while running the guest. > > > > Before that commit, the logic was: > > > > guest[PERF_GLOBAL_CTRL] = ctrl & ~host; > > guest[PERF_GLOBAL_CTRL] &= ~pebs; > > > > But after, it's now: > > > > guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs); > > > > I.e. the kernel is enabled counters in the guest that are not host-only OR not > > PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive > > to the host, then the new code will yield (truncated to a single byte for sanity) > > > > 1 = 1 & (0xf | 0xe) > > > > and thus keep counter 0 enabled, whereas the old code would yield > > > > 1 = 1 & 0xf > > 0 = 1 & 0xe > > > > A bit of a shot in the dark and completed untested, but I think this is the correct > > fix? > > I am firing off some tests, and either way, thank you very much!!! Woo-hoo!!! ;-) Tested-by: Paul E. McKenney <paulmck@kernel.org> Will you be sending a proper patch, or would you prefer that I do so? In the latter case, I would need your Signed-off-by. And again, thank you very much!!! Thanx, Paul > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > > index a08f794a0e79..92d5a3464cb2 100644 > > --- a/arch/x86/events/intel/core.c > > +++ b/arch/x86/events/intel/core.c > > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) > > arr[global_ctrl] = (struct perf_guest_switch_msr){ > > .msr = MSR_CORE_PERF_GLOBAL_CTRL, > > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > > - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > > + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask), > > }; > > > > if (!x86_pmu.pebs) > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 14:50 ` Paul E. McKenney @ 2024-01-04 14:59 ` Paolo Bonzini 2024-01-04 16:06 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Paolo Bonzini @ 2024-01-04 14:59 UTC (permalink / raw) To: paulmck Cc: Sean Christopherson, Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Thu, Jan 4, 2024 at 3:58 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Wed, Jan 03, 2024 at 05:00:35PM -0800, Paul E. McKenney wrote: > > On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote: > > > On Wed, Jan 03, 2024, Paul E. McKenney wrote: > > > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > > > > > Hello! > > > > > > > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests > > > > > would (rarely but intolerably often) have all guests on a given host die > > > > > simultaneously with something like an instruction fault or a segmentation > > > > > violation. > > > > > > > > > > Each bisection step required 20 hosts running 10 hours each, and > > > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add > > > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit > > > > > is certainly messing with things that could possibly cause all manner > > > > > of mischief, I don't immediately see a smoking gun. Except that the > > > > > commit prior to this one is rock solid. > > > > > Just to make things a bit more exciting, bisection in mainline proved > > > > > to be problematic due to bugs of various kinds that hid this one. I was > > > > > therefore forced to bisect among the commits backported to the internal > > > > > v5.19-based kernel, which fingered the backported version of the patch > > > > > called out above. > > > > > > > > Ah, and so why do I believe that this is a problem in mainline rather > > > > than just (say) a backporting mistake? > > > > > > > > Because this issue was first located in v6.4, which already has this > > > > commit included. > > > > > > > > Thanx, Paul > > > > > > > > > Please note that this is not (yet) an emergency. I will just continue > > > > > to run rcutorture on v5.19-based hypervisors in the meantime. > > > > > > > > > > Any suggestions for debugging or fixing? > > > > > > This looks suspect: > > > > > > + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; > > > + int global_ctrl, pebs_enable; > > > > > > - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > > > - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > > > - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask; > > > - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable); > > > - *nr = 1; > > > + *nr = 0; > > > + global_ctrl = (*nr)++; > > > + arr[global_ctrl] = (struct perf_guest_switch_msr){ > > > + .msr = MSR_CORE_PERF_GLOBAL_CTRL, > > > + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > > > + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > > > + }; > > > > > > > > > IIUC (always a big if with this code), the intent is that the guest's version of > > > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not > > > being used for PEBS. (b) is necessary because PEBS generates records in memory > > > using virtual addresses, i.e. the CPU will write to memory using a virtual address > > > that is valid for the host but not the guest. And so PMU counters that are > > > configured to generate PEBS records need to be disabled while running the guest. > > > > > > Before that commit, the logic was: > > > > > > guest[PERF_GLOBAL_CTRL] = ctrl & ~host; > > > guest[PERF_GLOBAL_CTRL] &= ~pebs; > > > > > > But after, it's now: > > > > > > guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs); > > > > > > I.e. the kernel is enabled counters in the guest that are not host-only OR not > > > PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive > > > to the host, then the new code will yield (truncated to a single byte for sanity) > > > > > > 1 = 1 & (0xf | 0xe) > > > > > > and thus keep counter 0 enabled, whereas the old code would yield > > > > > > 1 = 1 & 0xf > > > 0 = 1 & 0xe > > > > > > A bit of a shot in the dark and completed untested, but I think this is the correct > > > fix? > > > > I am firing off some tests, and either way, thank you very much!!! > > Woo-hoo!!! ;-) > > Tested-by: Paul E. McKenney <paulmck@kernel.org> > > Will you be sending a proper patch, or would you prefer that I do so? > In the latter case, I would need your Signed-off-by. I will fast track this one to Linus. Paolo > And again, thank you very much!!! > > Thanx, Paul > > > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > > > index a08f794a0e79..92d5a3464cb2 100644 > > > --- a/arch/x86/events/intel/core.c > > > +++ b/arch/x86/events/intel/core.c > > > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) > > > arr[global_ctrl] = (struct perf_guest_switch_msr){ > > > .msr = MSR_CORE_PERF_GLOBAL_CTRL, > > > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > > > - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > > > + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask), > > > }; > > > > > > if (!x86_pmu.pebs) > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 14:59 ` Paolo Bonzini @ 2024-01-04 16:06 ` Paul E. McKenney 2024-01-04 16:32 ` Paolo Bonzini 2024-01-04 17:07 ` Andi Kleen 0 siblings, 2 replies; 12+ messages in thread From: Paul E. McKenney @ 2024-01-04 16:06 UTC (permalink / raw) To: Paolo Bonzini Cc: Sean Christopherson, Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Thu, Jan 04, 2024 at 03:59:52PM +0100, Paolo Bonzini wrote: > On Thu, Jan 4, 2024 at 3:58 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Wed, Jan 03, 2024 at 05:00:35PM -0800, Paul E. McKenney wrote: > > > On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote: > > > > On Wed, Jan 03, 2024, Paul E. McKenney wrote: > > > > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > > > > > > Hello! > > > > > > > > > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests > > > > > > would (rarely but intolerably often) have all guests on a given host die > > > > > > simultaneously with something like an instruction fault or a segmentation > > > > > > violation. > > > > > > > > > > > > Each bisection step required 20 hosts running 10 hours each, and > > > > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add > > > > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit > > > > > > is certainly messing with things that could possibly cause all manner > > > > > > of mischief, I don't immediately see a smoking gun. Except that the > > > > > > commit prior to this one is rock solid. > > > > > > Just to make things a bit more exciting, bisection in mainline proved > > > > > > to be problematic due to bugs of various kinds that hid this one. I was > > > > > > therefore forced to bisect among the commits backported to the internal > > > > > > v5.19-based kernel, which fingered the backported version of the patch > > > > > > called out above. > > > > > > > > > > Ah, and so why do I believe that this is a problem in mainline rather > > > > > than just (say) a backporting mistake? > > > > > > > > > > Because this issue was first located in v6.4, which already has this > > > > > commit included. > > > > > > > > > > Thanx, Paul > > > > > > > > > > > Please note that this is not (yet) an emergency. I will just continue > > > > > > to run rcutorture on v5.19-based hypervisors in the meantime. > > > > > > > > > > > > Any suggestions for debugging or fixing? > > > > > > > > This looks suspect: > > > > > > > > + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; > > > > + int global_ctrl, pebs_enable; > > > > > > > > - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > > > > - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > > > > - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask; > > > > - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable); > > > > - *nr = 1; > > > > + *nr = 0; > > > > + global_ctrl = (*nr)++; > > > > + arr[global_ctrl] = (struct perf_guest_switch_msr){ > > > > + .msr = MSR_CORE_PERF_GLOBAL_CTRL, > > > > + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > > > > + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > > > > + }; > > > > > > > > > > > > IIUC (always a big if with this code), the intent is that the guest's version of > > > > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not > > > > being used for PEBS. (b) is necessary because PEBS generates records in memory > > > > using virtual addresses, i.e. the CPU will write to memory using a virtual address > > > > that is valid for the host but not the guest. And so PMU counters that are > > > > configured to generate PEBS records need to be disabled while running the guest. > > > > > > > > Before that commit, the logic was: > > > > > > > > guest[PERF_GLOBAL_CTRL] = ctrl & ~host; > > > > guest[PERF_GLOBAL_CTRL] &= ~pebs; > > > > > > > > But after, it's now: > > > > > > > > guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs); > > > > > > > > I.e. the kernel is enabled counters in the guest that are not host-only OR not > > > > PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive > > > > to the host, then the new code will yield (truncated to a single byte for sanity) > > > > > > > > 1 = 1 & (0xf | 0xe) > > > > > > > > and thus keep counter 0 enabled, whereas the old code would yield > > > > > > > > 1 = 1 & 0xf > > > > 0 = 1 & 0xe > > > > > > > > A bit of a shot in the dark and completed untested, but I think this is the correct > > > > fix? > > > > > > I am firing off some tests, and either way, thank you very much!!! > > > > Woo-hoo!!! ;-) > > > > Tested-by: Paul E. McKenney <paulmck@kernel.org> > > > > Will you be sending a proper patch, or would you prefer that I do so? > > In the latter case, I would need your Signed-off-by. > > I will fast track this one to Linus. Thank you, Paolo! One additional request, if I may be so bold... Although I am happy to have been able to locate the commit (and even happier that Sean spotted the problem and that you quickly pushed the fix to mainline!), chasing this consumed a lot of time and systems over an embarrassingly large number of months. As in I first spotted this bug in late July. Despite a number of increasingly complex attempts, bisection became feasible only after the buggy commit was backported to our internal v5.19 code base. :-( My (completely random) guess is that there is some rare combination of events that causes this code to fail. If so, is it feasible to construct a test that makes this rare combination of events less rare, so that similar future bugs are caught more quickly? And please understand that I am not casting shade on those who wrote, reviewed, and committed that buggy commit. As in I freely confess that I had to stare at Sean's fix for a few minutes before I figured out what was going on. Instead, the point I am trying to make is that carefully constructed tests can serve as tireless and accurate code reviewers. This won't ever replace actual code review, but my experience indicates that it will help find more bugs more quickly and more easily. Thanx, Paul > Paolo > > > And again, thank you very much!!! > > > > Thanx, Paul > > > > > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > > > > index a08f794a0e79..92d5a3464cb2 100644 > > > > --- a/arch/x86/events/intel/core.c > > > > +++ b/arch/x86/events/intel/core.c > > > > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) > > > > arr[global_ctrl] = (struct perf_guest_switch_msr){ > > > > .msr = MSR_CORE_PERF_GLOBAL_CTRL, > > > > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > > > > - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask), > > > > + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask), > > > > }; > > > > > > > > if (!x86_pmu.pebs) > > > > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 16:06 ` Paul E. McKenney @ 2024-01-04 16:32 ` Paolo Bonzini 2024-01-04 17:25 ` Sean Christopherson 2024-01-04 19:23 ` Paul E. McKenney 2024-01-04 17:07 ` Andi Kleen 1 sibling, 2 replies; 12+ messages in thread From: Paolo Bonzini @ 2024-01-04 16:32 UTC (permalink / raw) To: paulmck Cc: Sean Christopherson, Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On 1/4/24 17:06, Paul E. McKenney wrote: > Although I am happy to have been able to locate the commit (and even > happier that Sean spotted the problem and that you quickly pushed the > fix to mainline!), chasing this consumed a lot of time and systems over > an embarrassingly large number of months. As in I first spotted this > bug in late July. Despite a number of increasingly complex attempts, > bisection became feasible only after the buggy commit was backported to > our internal v5.19 code base. 🙁 Yes, this strikes two sore points. One is that I have also experienced being able to bisect only with a somewhat more linear history (namely the CentOS Stream 9 aka c9s frankenkernel [1]) and not with upstream. Even if the c9s kernel is not a fully linear set of commits, there's some benefit from merge commits that consist of slightly more curated set of patches, where each merge commit includes both new features and bugfixes. Unfortunately, whether you'll be able to do this with the c9s kernel depends a lot on the subsystems involved and on the bug. Both are factors that may or may not be known in advance. The other, of course, is testing. The KVM selftests infrastructure is meant for this kind of white box problem, but the space of tests that can be written is so large, that there's always too few tests. It shines when you have a clear bisection but an unclear fix (in the past I have had cases where spending two days to write a test led me to writing a fix in thirty minutes), but boosting the reproducibility is always a good thing. > And please understand that I am not casting shade on those who wrote, > reviewed, and committed that buggy commit. As in I freely confess that > I had to stare at Sean's fix for a few minutes before I figured out what > was going on. Oh don't worry about that---rather, I am going to cast a shade on those that did not review the commit, namely me. I am somewhat obsessed with Boolean logic and *probably* I would have caught it, or would have asked to split the use of designated initializers to a separate patch. Any of the two could, at least potentially, have saved you quite some time. > Instead, the point I am trying to make is that carefully > constructed tests can serve as tireless and accurate code reviewers. > This won't ever replace actual code review, but my experience indicates > that it will help find more bugs more quickly and more easily. TBH this (conflict between virtual addresses on the host and the guest leading to corruption of the guest) is probably not the kind of adversarial test that one would have written or suggested right off the bat. But it should be written now indeed. Paolo [1] https://www.theregister.com/2023/06/30/enterprise_distro_feature_devconf/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 16:32 ` Paolo Bonzini @ 2024-01-04 17:25 ` Sean Christopherson 2024-01-04 19:23 ` Paul E. McKenney 1 sibling, 0 replies; 12+ messages in thread From: Sean Christopherson @ 2024-01-04 17:25 UTC (permalink / raw) To: Paolo Bonzini Cc: paulmck, Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Thu, Jan 04, 2024, Paolo Bonzini wrote: > On 1/4/24 17:06, Paul E. McKenney wrote: > > Instead, the point I am trying to make is that carefully > > constructed tests can serve as tireless and accurate code reviewers. > > This won't ever replace actual code review, but my experience indicates > > that it will help find more bugs more quickly and more easily. > > TBH this (conflict between virtual addresses on the host and the guest > leading to corruption of the guest) is probably not the kind of adversarial > test that one would have written or suggested right off the bat. I disagree. The flaws with PEBS using a virtual address is blatantly obvious to anyone that has spent any time dealing with the cross-section of PMU and VMX. Intel even explicitly added "isolation" functionality to ensure PEBS can't overrun VM-Enter and generate host records in the guest. Not to mention that Intel specifically addressed the virtual addressing issue in the design of Processor Trace (PT, a.k.a. RTIT). In other words, we *knew* exactly what would break *and* there had been breakage in the past. Chalk it up to messed up priorities, poor test infrastructure, or anything along those lines. But we shouldn't pretend that this was some obscure edge case that didn't warrant a dedicated test from the get-go. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 16:32 ` Paolo Bonzini 2024-01-04 17:25 ` Sean Christopherson @ 2024-01-04 19:23 ` Paul E. McKenney 1 sibling, 0 replies; 12+ messages in thread From: Paul E. McKenney @ 2024-01-04 19:23 UTC (permalink / raw) To: Paolo Bonzini Cc: Sean Christopherson, Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar On Thu, Jan 04, 2024 at 05:32:34PM +0100, Paolo Bonzini wrote: > On 1/4/24 17:06, Paul E. McKenney wrote: > > Although I am happy to have been able to locate the commit (and even > > happier that Sean spotted the problem and that you quickly pushed the > > fix to mainline!), chasing this consumed a lot of time and systems over > > an embarrassingly large number of months. As in I first spotted this > > bug in late July. Despite a number of increasingly complex attempts, > > bisection became feasible only after the buggy commit was backported to > > our internal v5.19 code base. 🙁 > > Yes, this strikes two sore points. > > One is that I have also experienced being able to bisect only with a > somewhat more linear history (namely the CentOS Stream 9 aka c9s > frankenkernel [1]) and not with upstream. Even if the c9s kernel is not a > fully linear set of commits, there's some benefit from merge commits that > consist of slightly more curated set of patches, where each merge commit > includes both new features and bugfixes. Unfortunately, whether you'll be > able to do this with the c9s kernel depends a lot on the subsystems involved > and on the bug. Both are factors that may or may not be known in advance. I guess I am glad that it is not just me. ;-) > The other, of course, is testing. The KVM selftests infrastructure is meant > for this kind of white box problem, but the space of tests that can be > written is so large, that there's always too few tests. It shines when you > have a clear bisection but an unclear fix (in the past I have had cases > where spending two days to write a test led me to writing a fix in thirty > minutes), but boosting the reproducibility is always a good thing. Agreed, validation never will be perfect, and so improving the test suite based on production experience is a good thing, as is creating test cases based on the behavior of important production workloads for those who run them. > > And please understand that I am not casting shade on those who wrote, > > reviewed, and committed that buggy commit. As in I freely confess that > > I had to stare at Sean's fix for a few minutes before I figured out what > > was going on. > > Oh don't worry about that---rather, I am going to cast a shade on those that > did not review the commit, namely me. I am somewhat obsessed with Boolean > logic and *probably* I would have caught it, or would have asked to split > the use of designated initializers to a separate patch. Any of the two > could, at least potentially, have saved you quite some time. We have all done similar things. I certainly have! > > Instead, the point I am trying to make is that carefully > > constructed tests can serve as tireless and accurate code reviewers. > > This won't ever replace actual code review, but my experience indicates > > that it will help find more bugs more quickly and more easily. > > TBH this (conflict between virtual addresses on the host and the guest > leading to corruption of the guest) is probably not the kind of adversarial > test that one would have written or suggested right off the bat. But it > should be written now indeed. Very good, looking forward to seeing it! Thanx, Paul > Paolo > > [1] > https://www.theregister.com/2023/06/30/enterprise_distro_feature_devconf/ > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-04 16:06 ` Paul E. McKenney 2024-01-04 16:32 ` Paolo Bonzini @ 2024-01-04 17:07 ` Andi Kleen 1 sibling, 0 replies; 12+ messages in thread From: Andi Kleen @ 2024-01-04 17:07 UTC (permalink / raw) To: Paul E. McKenney Cc: Paolo Bonzini, Sean Christopherson, Like Xu, Kan Liang, Luwei Kang, Peter Zijlstra, linux-perf-users, linux-kernel, kvm, Breno Leitao, Arnaldo Carvalho de Melo, Ingo Molnar > My (completely random) guess is that there is some rare combination > of events that causes this code to fail. If so, is it feasible to > construct a test that makes this rare combination of events less rare, > so that similar future bugs are caught more quickly? Yes, I tested something similar before. What you need is create lots of PMIs with perf (running perf top should be enough) and a workload that creates lots of exits in a guest (e.g. running fio on a virtio device). This will stress test this particular path. -Andi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Guest OSes die simultaneously (bisected) 2024-01-03 22:22 [BUG] Guest OSes die simultaneously (bisected) Paul E. McKenney 2024-01-03 23:27 ` Paul E. McKenney @ 2024-01-04 10:01 ` Breno Leitao 1 sibling, 0 replies; 12+ messages in thread From: Breno Leitao @ 2024-01-04 10:01 UTC (permalink / raw) To: Paul E. McKenney Cc: Like Xu, Andi Kleen, Kan Liang, Luwei Kang, Peter Zijlstra, Paolo Bonzini, linux-perf-users, linux-kernel, kvm, Arnaldo Carvalho de Melo, Ingo Molnar, rbc On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote: > I was > therefore forced to bisect among the commits backported to the internal > v5.19-based kernel, which fingered the backported version of the patch > called out above. Just to add some context to these backport, this commit (c59a1f106f5c) was backported to the internal v5.19-based kernel in order to easily backport these two fixes. a16eb25b09c02a54c ("KVM: x86: Mask LVTPC when handling a PMI") 73554b29bd70546c1 ("KVM: x86/pmu: Synthesize at most one PMI per VM-exit") They are required to solve the softlockup problem reported here: https://lore.kernel.org/all/ZRwcpki67uhpAUKi@gmail.com/ ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-01-04 19:23 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-01-03 22:22 [BUG] Guest OSes die simultaneously (bisected) Paul E. McKenney 2024-01-03 23:27 ` Paul E. McKenney 2024-01-04 0:24 ` Sean Christopherson 2024-01-04 1:00 ` Paul E. McKenney 2024-01-04 14:50 ` Paul E. McKenney 2024-01-04 14:59 ` Paolo Bonzini 2024-01-04 16:06 ` Paul E. McKenney 2024-01-04 16:32 ` Paolo Bonzini 2024-01-04 17:25 ` Sean Christopherson 2024-01-04 19:23 ` Paul E. McKenney 2024-01-04 17:07 ` Andi Kleen 2024-01-04 10:01 ` Breno Leitao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).