* [PATCH v2 0/2] Update mce_record tracepoint
@ 2024-01-25 18:48 Avadhut Naik
2024-01-25 18:48 ` [PATCH v2 1/2] tracing: Include PPIN in " Avadhut Naik
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Avadhut Naik @ 2024-01-25 18:48 UTC (permalink / raw)
To: linux-trace-kernel, linux-edac
Cc: rostedt, tony.luck, bp, x86, linux-kernel, yazen.ghannam,
avadnaik
This patchset updates the mce_record tracepoint so that the recently added
fields of struct mce are exported through it to userspace.
The first patch adds PPIN (Protected Processor Inventory Number) field to
the tracepoint.
The second patch adds the microcode field (Microcode Revision) to the
tracepoint.
Changes in v2:
- Export microcode field (Microcode Revision) through the tracepoiont in
addition to PPIN.
Avadhut Naik (2):
tracing: Include PPIN in mce_record tracepoint
tracing: Include Microcode Revision in mce_record tracepoint
include/trace/events/mce.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
base-commit: 0d4f19418f067465b0a84a287d9a51e443a0bc3a
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH v2 1/2] tracing: Include PPIN in mce_record tracepoint 2024-01-25 18:48 [PATCH v2 0/2] Update mce_record tracepoint Avadhut Naik @ 2024-01-25 18:48 ` Avadhut Naik 2024-01-25 20:58 ` Sohil Mehta 2024-01-25 18:48 ` [PATCH v2 2/2] tracing: Include Microcode Revision " Avadhut Naik 2024-01-25 18:58 ` [PATCH v2 0/2] Update " Borislav Petkov 2 siblings, 1 reply; 18+ messages in thread From: Avadhut Naik @ 2024-01-25 18:48 UTC (permalink / raw) To: linux-trace-kernel, linux-edac Cc: rostedt, tony.luck, bp, x86, linux-kernel, yazen.ghannam, avadnaik Machine Check Error information from struct mce is exported to userspace through the mce_record tracepoint. Currently, however, the PPIN (Protected Processor Inventory Number) field of struct mce is not exported through the tracepoint. Export PPIN through the tracepoint as it may provide useful information for debug and analysis. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> --- include/trace/events/mce.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h index 1391ada0da3b..657b93ec8176 100644 --- a/include/trace/events/mce.h +++ b/include/trace/events/mce.h @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record, __field( u64, ipid ) __field( u64, ip ) __field( u64, tsc ) + __field( u64, ppin ) __field( u64, walltime ) __field( u32, cpu ) __field( u32, cpuid ) @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record, __entry->ipid = m->ipid; __entry->ip = m->ip; __entry->tsc = m->tsc; + __entry->ppin = m->ppin; __entry->walltime = m->time; __entry->cpu = m->extcpu; __entry->cpuid = m->cpuid; @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record, __entry->cpuvendor = m->cpuvendor; ), - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", __entry->cpu, __entry->mcgcap, __entry->mcgstatus, __entry->bank, __entry->status, @@ -63,6 +65,7 @@ TRACE_EVENT(mce_record, __entry->addr, __entry->misc, __entry->synd, __entry->cs, __entry->ip, __entry->tsc, + __entry->ppin, __entry->cpuvendor, __entry->cpuid, __entry->walltime, __entry->socketid, -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/2] tracing: Include PPIN in mce_record tracepoint 2024-01-25 18:48 ` [PATCH v2 1/2] tracing: Include PPIN in " Avadhut Naik @ 2024-01-25 20:58 ` Sohil Mehta 0 siblings, 0 replies; 18+ messages in thread From: Sohil Mehta @ 2024-01-25 20:58 UTC (permalink / raw) To: Avadhut Naik, linux-trace-kernel, linux-edac Cc: rostedt, tony.luck, bp, x86, linux-kernel, yazen.ghannam, avadnaik On 1/25/2024 10:48 AM, Avadhut Naik wrote: > Machine Check Error information from struct mce is exported to userspace > through the mce_record tracepoint. > > Currently, however, the PPIN (Protected Processor Inventory Number) field > of struct mce is not exported through the tracepoint. > > Export PPIN through the tracepoint as it may provide useful information > for debug and analysis. > > Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> > --- The patch looks fine to me expect for a nit below. With that fixed, please feel free to add: Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > include/trace/events/mce.h | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h > index 1391ada0da3b..657b93ec8176 100644 > --- a/include/trace/events/mce.h > +++ b/include/trace/events/mce.h > @@ -25,6 +25,7 @@ TRACE_EVENT(mce_record, > __field( u64, ipid ) > __field( u64, ip ) > __field( u64, tsc ) > + __field( u64, ppin ) The tabs are not aligned with the rest of the structure. > __field( u64, walltime ) > __field( u32, cpu ) > __field( u32, cpuid ) > @@ -45,6 +46,7 @@ TRACE_EVENT(mce_record, > __entry->ipid = m->ipid; > __entry->ip = m->ip; > __entry->tsc = m->tsc; > + __entry->ppin = m->ppin; > __entry->walltime = m->time; > __entry->cpu = m->extcpu; > __entry->cpuid = m->cpuid; > @@ -55,7 +57,7 @@ TRACE_EVENT(mce_record, > __entry->cpuvendor = m->cpuvendor; > ), > > - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", > + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", > __entry->cpu, > __entry->mcgcap, __entry->mcgstatus, > __entry->bank, __entry->status, > @@ -63,6 +65,7 @@ TRACE_EVENT(mce_record, > __entry->addr, __entry->misc, __entry->synd, > __entry->cs, __entry->ip, > __entry->tsc, > + __entry->ppin, > __entry->cpuvendor, __entry->cpuid, > __entry->walltime, > __entry->socketid, ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/2] tracing: Include Microcode Revision in mce_record tracepoint 2024-01-25 18:48 [PATCH v2 0/2] Update mce_record tracepoint Avadhut Naik 2024-01-25 18:48 ` [PATCH v2 1/2] tracing: Include PPIN in " Avadhut Naik @ 2024-01-25 18:48 ` Avadhut Naik 2024-01-25 21:03 ` Sohil Mehta 2024-01-25 18:58 ` [PATCH v2 0/2] Update " Borislav Petkov 2 siblings, 1 reply; 18+ messages in thread From: Avadhut Naik @ 2024-01-25 18:48 UTC (permalink / raw) To: linux-trace-kernel, linux-edac Cc: rostedt, tony.luck, bp, x86, linux-kernel, yazen.ghannam, avadnaik Currently, the microcode field (Microcode Revision) of struct mce is not exported to userspace through the mce_record tracepoint. Export it through the tracepoint as it may provide useful information for debug and analysis. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> --- include/trace/events/mce.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h index 657b93ec8176..203baccd3c5c 100644 --- a/include/trace/events/mce.h +++ b/include/trace/events/mce.h @@ -34,6 +34,7 @@ TRACE_EVENT(mce_record, __field( u8, cs ) __field( u8, bank ) __field( u8, cpuvendor ) + __field( u32, microcode ) ), TP_fast_assign( @@ -55,9 +56,10 @@ TRACE_EVENT(mce_record, __entry->cs = m->cs; __entry->bank = m->bank; __entry->cpuvendor = m->cpuvendor; + __entry->microcode = m->microcode; ), - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x, MICROCODE REVISION: %u", __entry->cpu, __entry->mcgcap, __entry->mcgstatus, __entry->bank, __entry->status, @@ -69,7 +71,8 @@ TRACE_EVENT(mce_record, __entry->cpuvendor, __entry->cpuid, __entry->walltime, __entry->socketid, - __entry->apicid) + __entry->apicid, + __entry->microcode) ); #endif /* _TRACE_MCE_H */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/2] tracing: Include Microcode Revision in mce_record tracepoint 2024-01-25 18:48 ` [PATCH v2 2/2] tracing: Include Microcode Revision " Avadhut Naik @ 2024-01-25 21:03 ` Sohil Mehta 2024-01-26 1:35 ` Naik, Avadhut 0 siblings, 1 reply; 18+ messages in thread From: Sohil Mehta @ 2024-01-25 21:03 UTC (permalink / raw) To: Avadhut Naik, linux-trace-kernel, linux-edac Cc: rostedt, tony.luck, bp, x86, linux-kernel, yazen.ghannam, avadnaik On 1/25/2024 10:48 AM, Avadhut Naik wrote: > Currently, the microcode field (Microcode Revision) of struct mce is not > exported to userspace through the mce_record tracepoint. > > Export it through the tracepoint as it may provide useful information for > debug and analysis. > > Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> > --- A couple of nits below. Apart from that the patch looks fine to me. Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > include/trace/events/mce.h | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h > index 657b93ec8176..203baccd3c5c 100644 > --- a/include/trace/events/mce.h > +++ b/include/trace/events/mce.h > @@ -34,6 +34,7 @@ TRACE_EVENT(mce_record, > __field( u8, cs ) > __field( u8, bank ) > __field( u8, cpuvendor ) > + __field( u32, microcode ) Tab alignment is inconsistent. > ), > > TP_fast_assign( > @@ -55,9 +56,10 @@ TRACE_EVENT(mce_record, > __entry->cs = m->cs; > __entry->bank = m->bank; > __entry->cpuvendor = m->cpuvendor; > + __entry->microcode = m->microcode; > ), > > - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", > + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x, MICROCODE REVISION: %u", Should microcode by printed as a decimal or an hexadecimal? Elsewhere such as __print_mce(), it is printed as an hexadecimal: /* * Note this output is parsed by external tools and old fields * should not be changed. */ pr_emerg(HW_ERR "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x microcode %x\n", m->cpuvendor, m->cpuid, m->time, m->socketid, m->apicid, m->microcode); > __entry->cpu, > __entry->mcgcap, __entry->mcgstatus, > __entry->bank, __entry->status, > @@ -69,7 +71,8 @@ TRACE_EVENT(mce_record, > __entry->cpuvendor, __entry->cpuid, > __entry->walltime, > __entry->socketid, > - __entry->apicid) > + __entry->apicid, > + __entry->microcode) > ); > > #endif /* _TRACE_MCE_H */ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/2] tracing: Include Microcode Revision in mce_record tracepoint 2024-01-25 21:03 ` Sohil Mehta @ 2024-01-26 1:35 ` Naik, Avadhut 0 siblings, 0 replies; 18+ messages in thread From: Naik, Avadhut @ 2024-01-26 1:35 UTC (permalink / raw) To: Sohil Mehta, linux-trace-kernel, linux-edac Cc: rostedt, tony.luck, bp, x86, linux-kernel, yazen.ghannam, Avadhut Naik On 1/25/2024 3:03 PM, Sohil Mehta wrote: > On 1/25/2024 10:48 AM, Avadhut Naik wrote: >> Currently, the microcode field (Microcode Revision) of struct mce is not >> exported to userspace through the mce_record tracepoint. >> >> Export it through the tracepoint as it may provide useful information for >> debug and analysis. >> >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> >> --- > > A couple of nits below. > > Apart from that the patch looks fine to me. > > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > >> include/trace/events/mce.h | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h >> index 657b93ec8176..203baccd3c5c 100644 >> --- a/include/trace/events/mce.h >> +++ b/include/trace/events/mce.h >> @@ -34,6 +34,7 @@ TRACE_EVENT(mce_record, >> __field( u8, cs ) >> __field( u8, bank ) >> __field( u8, cpuvendor ) >> + __field( u32, microcode ) > > Tab alignment is inconsistent. > >> ), >> >> TP_fast_assign( >> @@ -55,9 +56,10 @@ TRACE_EVENT(mce_record, >> __entry->cs = m->cs; >> __entry->bank = m->bank; >> __entry->cpuvendor = m->cpuvendor; >> + __entry->microcode = m->microcode; >> ), >> >> - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x", >> + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x, MICROCODE REVISION: %u", > > Should microcode by printed as a decimal or an hexadecimal? Elsewhere > such as __print_mce(), it is printed as an hexadecimal: > > /* > * Note this output is parsed by external tools and old fields > * should not be changed. > */ > pr_emerg(HW_ERR "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x > microcode %x\n", > m->cpuvendor, m->cpuid, m->time, m->socketid, m->apicid, > m->microcode); > > Had kept the field as decimal since I considered that version should be a decimal number instead of a hex value. Hadn't noticed the above log in core.c file. Thanks for pointing it out. Since we now have precedent that microcode version values are being reported in hex, will change the above format specifier to %x. Will just submit a new version, addressing the tab alignments too. > > >> __entry->cpu, >> __entry->mcgcap, __entry->mcgstatus, >> __entry->bank, __entry->status, >> @@ -69,7 +71,8 @@ TRACE_EVENT(mce_record, >> __entry->cpuvendor, __entry->cpuid, >> __entry->walltime, >> __entry->socketid, >> - __entry->apicid) >> + __entry->apicid, >> + __entry->microcode) >> ); >> >> #endif /* _TRACE_MCE_H */ > -- Thanks, Avadhut Naik ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-25 18:48 [PATCH v2 0/2] Update mce_record tracepoint Avadhut Naik 2024-01-25 18:48 ` [PATCH v2 1/2] tracing: Include PPIN in " Avadhut Naik 2024-01-25 18:48 ` [PATCH v2 2/2] tracing: Include Microcode Revision " Avadhut Naik @ 2024-01-25 18:58 ` Borislav Petkov 2024-01-25 19:19 ` Luck, Tony 2 siblings, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2024-01-25 18:58 UTC (permalink / raw) To: Avadhut Naik Cc: linux-trace-kernel, linux-edac, rostedt, tony.luck, x86, linux-kernel, yazen.ghannam, avadnaik On Thu, Jan 25, 2024 at 12:48:55PM -0600, Avadhut Naik wrote: > This patchset updates the mce_record tracepoint so that the recently added > fields of struct mce are exported through it to userspace. > > The first patch adds PPIN (Protected Processor Inventory Number) field to > the tracepoint. > > The second patch adds the microcode field (Microcode Revision) to the > tracepoint. This is a lot of static information to add to *every* MCE. And where does it end? Stick full dmesg in the tracepoint too? What is the real-life use case here? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-25 18:58 ` [PATCH v2 0/2] Update " Borislav Petkov @ 2024-01-25 19:19 ` Luck, Tony 2024-01-25 20:27 ` Naik, Avadhut 2024-01-26 10:27 ` Borislav Petkov 0 siblings, 2 replies; 18+ messages in thread From: Luck, Tony @ 2024-01-25 19:19 UTC (permalink / raw) To: Borislav Petkov, Avadhut Naik Cc: linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com > > The first patch adds PPIN (Protected Processor Inventory Number) field to > > the tracepoint. > > > > The second patch adds the microcode field (Microcode Revision) to the > > tracepoint. > > This is a lot of static information to add to *every* MCE. 8 bytes for PPIN, 4 more for microcode. Number of recoverable machine checks per system .... I hope the monthly rate should be countable on my fingers. If a system is getting more than that, then people should be looking at fixing the underlying problem. Corrected errors are much more common. Though Linux takes action to limit the rate when storms occur. So maybe hundreds or small numbers of thousands of error trace records? Increase in trace buffer consumption still measured in Kbytes not Mbytes. Server systems that do machine check reporting now start at tens of GBytes memory. > And where does it end? Stick full dmesg in the tracepoint too? Seems like overkill. > What is the real-life use case here? Systems using rasdaemon to track errors will be able to track both of these (I assume that Naik has plans to update rasdaemon to capture and save these new fields). PPIN is useful when talking to the CPU vendor about patterns of similar errors seen across a cluster. MICROCODE - gives a fast path to root cause problems that have already been fixed in a microcode update. -Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-25 19:19 ` Luck, Tony @ 2024-01-25 20:27 ` Naik, Avadhut 2024-01-26 10:27 ` Borislav Petkov 1 sibling, 0 replies; 18+ messages in thread From: Naik, Avadhut @ 2024-01-25 20:27 UTC (permalink / raw) To: Luck, Tony, Borislav Petkov Cc: linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, Avadhut Naik Hi, On 1/25/2024 1:19 PM, Luck, Tony wrote: >>> The first patch adds PPIN (Protected Processor Inventory Number) field to >>> the tracepoint. >>> >>> The second patch adds the microcode field (Microcode Revision) to the >>> tracepoint. >> >> This is a lot of static information to add to *every* MCE. > > 8 bytes for PPIN, 4 more for microcode. > > Number of recoverable machine checks per system .... I hope the monthly rate should > be countable on my fingers. If a system is getting more than that, then people should > be looking at fixing the underlying problem. > > Corrected errors are much more common. Though Linux takes action to limit the > rate when storms occur. So maybe hundreds or small numbers of thousands of > error trace records? Increase in trace buffer consumption still measured in Kbytes > not Mbytes. Server systems that do machine check reporting now start at tens of > GBytes memory. > >> And where does it end? Stick full dmesg in the tracepoint too? > > Seems like overkill. > >> What is the real-life use case here? > > Systems using rasdaemon to track errors will be able to track both of these > (I assume that Naik has plans to update rasdaemon to capture and save these > new fields). > Yes, I do intend to submit a pull request to the rasdaemon to parse and log these new fields. > PPIN is useful when talking to the CPU vendor about patterns of similar errors > seen across a cluster. > > MICROCODE - gives a fast path to root cause problems that have already > been fixed in a microcode update. > > -Tony -- Thanks, Avadhut Naik ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-25 19:19 ` Luck, Tony 2024-01-25 20:27 ` Naik, Avadhut @ 2024-01-26 10:27 ` Borislav Petkov 2024-01-26 17:10 ` Luck, Tony 1 sibling, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2024-01-26 10:27 UTC (permalink / raw) To: Luck, Tony Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com On Thu, Jan 25, 2024 at 07:19:22PM +0000, Luck, Tony wrote: > 8 bytes for PPIN, 4 more for microcode. I know, nothing leads to bloat like 0.01% here, 0.001% there... > Number of recoverable machine checks per system .... I hope the > monthly rate should be countable on my fingers... That's not the point. Rather, when you look at MCE reports, you pretty much almost always go and collect additional information from the target machine because you want to figure out what exactly is going on. So what's stopping you from collecting all that static information instead of parrotting it through the tracepoint with each error? > PPIN is useful when talking to the CPU vendor about patterns of > similar errors seen across a cluster. I guess that is perhaps the only thing of the two that makes some sense at least - the identifier uniquely describes which CPU the error comes from... > MICROCODE - gives a fast path to root cause problems that have already > been fixed in a microcode update. But that, nah. See above. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 10:27 ` Borislav Petkov @ 2024-01-26 17:10 ` Luck, Tony 2024-01-26 18:56 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: Luck, Tony @ 2024-01-26 17:10 UTC (permalink / raw) To: Borislav Petkov Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com > > 8 bytes for PPIN, 4 more for microcode. > > I know, nothing leads to bloat like 0.01% here, 0.001% there... 12 extra bytes divided by (say) 64GB (a very small server these days, may laptop has that much) = 0.00000001746% We will need 57000 changes like this one before we get to 0.001% :-) > > Number of recoverable machine checks per system .... I hope the > > monthly rate should be countable on my fingers... > > That's not the point. Rather, when you look at MCE reports, you pretty > much almost always go and collect additional information from the target > machine because you want to figure out what exactly is going on. > > So what's stopping you from collecting all that static information > instead of parrotting it through the tracepoint with each error? PPIN is static. So with good tracking to keep source platform information attached to the error record as it gets passed around people trying to triage the problem, you could say it can be retrieved later (or earlier when setting up a database of attributes of each machine in the fleet. But the key there is keeping the details of the source machine attached to the error record. My first contact with machine check debugging is always just the raw error record (from mcelog, rasdaemon, or console log). > > PPIN is useful when talking to the CPU vendor about patterns of > > similar errors seen across a cluster. > > I guess that is perhaps the only thing of the two that makes some sense > at least - the identifier uniquely describes which CPU the error comes > from... > > > MICROCODE - gives a fast path to root cause problems that have already > > been fixed in a microcode update. > > But that, nah. See above. Knowing which microcode version was loaded on a core *at the time of the error* is critical. You've spent enough time with Ashok and Thomas tweaking the Linux microcode driver to know that going back to the machine the next day to ask about microcode version has a bunch of ways to get a wrong answer. -Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 17:10 ` Luck, Tony @ 2024-01-26 18:56 ` Borislav Petkov 2024-01-26 19:15 ` Luck, Tony 0 siblings, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2024-01-26 18:56 UTC (permalink / raw) To: Luck, Tony Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com On Fri, Jan 26, 2024 at 05:10:20PM +0000, Luck, Tony wrote: > 12 extra bytes divided by (say) 64GB (a very small server these days, may laptop has that much) > = 0.00000001746% > > We will need 57000 changes like this one before we get to 0.001% :-) You're forgetting that those 12 bytes repeat per MCE tracepoint logged. And there's other code which adds more 0.01% here and there, well, because we can. > But the key there is keeping the details of the source machine attached to > the error record. My first contact with machine check debugging is always > just the raw error record (from mcelog, rasdaemon, or console log). Yes, that is somewhat sensible reason to have the PPIN together with the MCE record. > Knowing which microcode version was loaded on a core *at the time of > the error* is critical. So is the rest of the debug info you're going to need from that machine. And yet we're not adding that to the tracepoint. > You've spent enough time with Ashok and Thomas tweaking the Linux > microcode driver to know that going back to the machine the next day > to ask about microcode version has a bunch of ways to get a wrong > answer. Huh, what does that have to do with this? IIUC, if someone changes something on the system, whether that is updating microcode or swapping a harddrive or swapping memory or whatever, that invalidates the errors reported, pretty much. You can't put it all in the trace record, you just can't. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 18:56 ` Borislav Petkov @ 2024-01-26 19:15 ` Luck, Tony 2024-01-26 19:45 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: Luck, Tony @ 2024-01-26 19:15 UTC (permalink / raw) To: Borislav Petkov Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com > > You've spent enough time with Ashok and Thomas tweaking the Linux > > microcode driver to know that going back to the machine the next day > > to ask about microcode version has a bunch of ways to get a wrong > > answer. > > Huh, what does that have to do with this? If deployment of a microcode update across a fleet always went flawlessly, life would be simpler. But things can fail. And maybe the failure wasn't noticed. Perhaps a node was rebooting when the sysadmin pushed the update to the fleet and so missed the deployment. Perhaps one core was already acting weird and the microcode update didn't get applied to that core. > IIUC, if someone changes something on the system, whether that is > updating microcode or swapping a harddrive or swapping memory or > whatever, that invalidates the errors reported, pretty much. > > You can't put it all in the trace record, you just can't. Swapping a hard drive, or hot plugging a NIC isn't very likely to correlate with an error reported by the CPU in machine check banks. But microcode can be (and has been) the issue in enough cases that knowing the version at the time of the error matters. You seemed to agree with this argument when the microcode field was added to "struct mce" back in 2018 fa94d0c6e0f3 ("x86/MCE: Save microcode revision in machine check records") Is it so very different to add this to a trace record so that rasdaemon can have feature parity with mcelog(8)? -Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 19:15 ` Luck, Tony @ 2024-01-26 19:45 ` Borislav Petkov 2024-01-26 20:49 ` Luck, Tony 0 siblings, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2024-01-26 19:45 UTC (permalink / raw) To: Luck, Tony Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com On Fri, Jan 26, 2024 at 07:15:50PM +0000, Luck, Tony wrote: > If deployment of a microcode update across a fleet always went > flawlessly, life would be simpler. But things can fail. And maybe the > failure wasn't noticed. Perhaps a node was rebooting when the sysadmin > pushed the update to the fleet and so missed the deployment. Perhaps > one core was already acting weird and the microcode update didn't get > applied to that core. Yes, and you go collect data from that box. You will have to anyway to figure out why the microcode didn't update. > Swapping a hard drive, or hot plugging a NIC isn't very likely > to correlate with an error reported by the CPU in machine > check banks. Ofc it is - coherent probe timeoutting due to problematic insertion could be reported with a MCE, and so on and so on. > Is it so very different to add this to a trace record so that rasdaemon > can have feature parity with mcelog(8)? I knew you were gonna say that. When someone decides that it is a splendid idea to add more stuff to struct mce then said someone would want it in the tracepoint too. And then we're back to my original question: "And where does it end? Stick full dmesg in the tracepoint too?" Where do you draw the line in the sand and say, no more, especially static, fields bloating the trace record should be added and from then on, you should go collect the info from that box. Something which you're supposed to do anyway. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 19:45 ` Borislav Petkov @ 2024-01-26 20:49 ` Luck, Tony 2024-01-26 21:11 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: Luck, Tony @ 2024-01-26 20:49 UTC (permalink / raw) To: Borislav Petkov Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com > > Is it so very different to add this to a trace record so that rasdaemon > > can have feature parity with mcelog(8)? > > I knew you were gonna say that. When someone decides that it is > a splendid idea to add more stuff to struct mce then said someone would > want it in the tracepoint too. > > And then we're back to my original question: > > "And where does it end? Stick full dmesg in the tracepoint too?" > > Where do you draw the line in the sand and say, no more, especially > static, fields bloating the trace record should be added and from then > on, you should go collect the info from that box. Something which you're > supposed to do anyway. Every patch that adds new code or data structures adds to the kernel memory footprint. Each should be considered on its merits. The basic question being: "Is the new functionality worth the cost?" Where does it end? It would end if Linus declared: "Linux is now complete. Stop sending patches". I.e. it is never going to end. If somebody posts a patch asking to add the full dmesg to a tracepoint, I'll stand with you to say: "Not only no, but hell no". So for Naik's two patches we have: 1) PPIN Cost = 8 bytes. Benefit: Emdeds a system identifier into the trace record so there can be no ambiguity about which machine generated this error. Also definitively indicates which socket on a multi-socket system. 2) MICROCODE Cost = 4 bytes Benefit: Certainty about the microcode version active on the core at the time the error was detected. RAS = Reliability, Availability, Serviceability These changes fall into the serviceability bucket. They make it easier to diagnose what went wrong. -Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 20:49 ` Luck, Tony @ 2024-01-26 21:11 ` Borislav Petkov 2024-01-26 22:01 ` Luck, Tony 0 siblings, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2024-01-26 21:11 UTC (permalink / raw) To: Luck, Tony Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com On Fri, Jan 26, 2024 at 08:49:03PM +0000, Luck, Tony wrote: > Every patch that adds new code or data structures adds to the kernel > memory footprint. Each should be considered on its merits. The basic > question being: > > "Is the new functionality worth the cost?" > > Where does it end? It would end if Linus declared: > > "Linux is now complete. Stop sending patches". > > I.e. it is never going to end. No, it's not that - it is the merit thing which determines. > 1) PPIN > Cost = 8 bytes. > Benefit: Emdeds a system identifier into the trace record so there > can be no ambiguity about which machine generated this error. > Also definitively indicates which socket on a multi-socket system. > > 2) MICROCODE > Cost = 4 bytes > Benefit: Certainty about the microcode version active on the core > at the time the error was detected. > > RAS = Reliability, Availability, Serviceability > > These changes fall into the serviceability bucket. They make it > easier to diagnose what went wrong. So does dmesg. Let's add it to the tracepoint... But no, that's not the right question to ask. It is rather: which bits of information are very relevant to an error record and which are transient enough so that they cannot be gathered from a system by other means or only gathered in a difficult way, and should be part of that record. The PPIN is not transient but you have to go map ->extcpu to the PPIN so adding it to the tracepoint is purely a convenience thing. More or less. The microcode revision thing I still don't buy but it is already there so whateva... So we'd need a rule hammered out and put there in a prominent place so that it is clear what goes into struct mce and what not. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 21:11 ` Borislav Petkov @ 2024-01-26 22:01 ` Luck, Tony 2024-01-27 12:19 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: Luck, Tony @ 2024-01-26 22:01 UTC (permalink / raw) To: Borislav Petkov Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com > But no, that's not the right question to ask. > > It is rather: which bits of information are very relevant to an error > record and which are transient enough so that they cannot be gathered > from a system by other means or only gathered in a difficult way, and > should be part of that record. > > The PPIN is not transient but you have to go map ->extcpu to the PPIN so > adding it to the tracepoint is purely a convenience thing. More or less. > > The microcode revision thing I still don't buy but it is already there > so whateva... > > So we'd need a rule hammered out and put there in a prominent place so > that it is clear what goes into struct mce and what not. My personal evaluation of the value of these two additions to the trace record: PPIN: Nice to have. People that send stuff to me are terrible about providing surrounding details. The record already includes CPUID(1).EAX ... so I can at least skip the step of asking them which CPU family/model/stepping they were using). But PPIN can be recovered (so long as the submitter kept good records about which system generated the record). MICROCODE: Must have. Microcode version can be changed at run time. Going back to the system to check later may not give the correct answer to what was active at the time of the error. Especially for an error reported while a microcode update is waling across the CPUs poking the MSR on each in turn. -Tony ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/2] Update mce_record tracepoint 2024-01-26 22:01 ` Luck, Tony @ 2024-01-27 12:19 ` Borislav Petkov 0 siblings, 0 replies; 18+ messages in thread From: Borislav Petkov @ 2024-01-27 12:19 UTC (permalink / raw) To: Luck, Tony Cc: Avadhut Naik, linux-trace-kernel@vger.kernel.org, linux-edac@vger.kernel.org, rostedt@goodmis.org, x86@kernel.org, linux-kernel@vger.kernel.org, yazen.ghannam@amd.com, avadnaik@amd.com On Fri, Jan 26, 2024 at 10:01:29PM +0000, Luck, Tony wrote: > PPIN: Nice to have. People that send stuff to me are terrible about > providing surrounding details. The record already includes > CPUID(1).EAX ... so I can at least skip the step of asking them which > CPU family/model/stepping they were using). But PPIN can be recovered > (so long as the submitter kept good records about which system > generated the record). Yes. > MICROCODE: Must have. Microcode version can be changed at run time. > Going back to the system to check later may not give the correct > answer to what was active at the time of the error. Especially for an > error reported while a microcode update is waling across the CPUs > poking the MSR on each in turn. Easy: - You've got an MCE? Was it during scheduled microcode updates? - Yes. - Come back to me when it happens again, *outside* of the microcode update schedule. Anyway, I still don't buy that. Maybe I'm wrong and maybe I need to talk to data center operators more but this sounds like microcode update failing is such a common thing to happen so that we *absolutely* *must* capture the microcode revision when an MCE happens. Maybe we should make microcode updates more resilient and add a retry mechanism which doesn't back off as easily. Or maybe people should script around it and keep retrying, dunno. In my world, microcode update just works in the vast majority of the cases and if it doesn't, then those cases need a specific look. And if I am debugging an issue and I want to see the microcode revision, I look at /proc/cpuinfo. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-01-27 12:19 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-01-25 18:48 [PATCH v2 0/2] Update mce_record tracepoint Avadhut Naik 2024-01-25 18:48 ` [PATCH v2 1/2] tracing: Include PPIN in " Avadhut Naik 2024-01-25 20:58 ` Sohil Mehta 2024-01-25 18:48 ` [PATCH v2 2/2] tracing: Include Microcode Revision " Avadhut Naik 2024-01-25 21:03 ` Sohil Mehta 2024-01-26 1:35 ` Naik, Avadhut 2024-01-25 18:58 ` [PATCH v2 0/2] Update " Borislav Petkov 2024-01-25 19:19 ` Luck, Tony 2024-01-25 20:27 ` Naik, Avadhut 2024-01-26 10:27 ` Borislav Petkov 2024-01-26 17:10 ` Luck, Tony 2024-01-26 18:56 ` Borislav Petkov 2024-01-26 19:15 ` Luck, Tony 2024-01-26 19:45 ` Borislav Petkov 2024-01-26 20:49 ` Luck, Tony 2024-01-26 21:11 ` Borislav Petkov 2024-01-26 22:01 ` Luck, Tony 2024-01-27 12:19 ` Borislav Petkov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox