From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756013AbcCaIqb (ORCPT ); Thu, 31 Mar 2016 04:46:31 -0400 Received: from mail.skyhub.de ([78.46.96.112]:60259 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754160AbcCaIq1 (ORCPT ); Thu, 31 Mar 2016 04:46:27 -0400 Date: Thu, 31 Mar 2016 10:46:20 +0200 From: Borislav Petkov To: Tony Luck Cc: linux-kernel@vger.kernel.org, Ashok Raj Subject: Re: [PATCH] x86/mce: Look in genpool instead of mcelog.entry[] for pending error records Message-ID: <20160331084620.GB25191@pd.tnic> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 25, 2016 at 11:35:17AM -0700, Tony Luck wrote: > Couple of issues here: > 1) MCE_LOG_LEN is only 32 - so we may have more pending records than will > fit in the buffer on high core count cpus > 2) During a panic we may have a lot of duplicate records because multiple > logical cpus may have seen and logged the same error because some > banks are shared. > > Switch to using the genpool to look for the pending records. Squeeze > out duplicated records. > > Signed-off-by: Tony Luck > --- > > New solution for an old problem. See > https://lkml.kernel.org/r/1443073720-3940-1-git-send-email-ashok.raj@intel.com > for the previous take where Ashok said "bump MCE_LOG_LEN from 32 to 128" and > Boris replied "switch to the gen_pool". > > I think this clears the way to pull all of the /dev/mcelog driver > code out of mce.c and make it just another driver that registers > a notifier on x86_mce_decoder_chain. Good, finally! Once we have all the code around RAS daemon and things, we could finally mark it deprecated and make it a default off config option for distros which still need mcelog. > arch/x86/kernel/cpu/mcheck/mce-genpool.c | 42 +++++++++++++++++++++++++++++++ > arch/x86/kernel/cpu/mcheck/mce-internal.h | 10 ++++++++ > arch/x86/kernel/cpu/mcheck/mce.c | 21 +++++++--------- > 3 files changed, 61 insertions(+), 12 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce-genpool.c b/arch/x86/kernel/cpu/mcheck/mce-genpool.c > index 0a850100c594..2dc276260fea 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce-genpool.c > +++ b/arch/x86/kernel/cpu/mcheck/mce-genpool.c > @@ -26,6 +26,48 @@ static struct gen_pool *mce_evt_pool; > static LLIST_HEAD(mce_event_llist); > static char gen_pool_buf[MCE_POOLSZ]; > > +static bool duplicate_mce_record(struct mce_evt_llist *t, struct mce_evt_llist *l) Hmm, having the verb at the beginning of the name makes me think this function duplicates MCE records :) Maybe is_mce_record_duplicate() or mce_record_seen() ... Bah, all ideas I have right now are meh. > +{ > + struct mce_evt_llist *node; > + struct mce *m1, *m2; > + > + m1 = &t->mce; > + > + llist_for_each_entry(node, &l->llnode, llnode) { > + m2 = &node->mce; > + > + if (dup_mce_record(m1, m2)) > + return true; > + } > + return false; > +} > + > +/* > + * The system has paniced - we'd like to peruse the list of mce records panicked > + * that have been queued, but not seen by anyone yet. The list is in > + * reverse time order, so we need to reverse it. While doing that we can > + * also drop duplicate records (these were logged because some banks are > + * shared between cores or by all threads on a socket). > + */ > +struct llist_node *mce_gen_pool_panic(void) I'd call it mce_gen_pool_prepare_records() or so. > +{ > + struct llist_node *head; > + LLIST_HEAD(new_head); > + struct mce_evt_llist *node, *t; > + > + head = llist_del_all(&mce_event_llist); > + if (!head) > + return NULL; > + > + /* squeeze out duplicates while reversing order */ > + llist_for_each_entry_safe(node, t, head, llnode) { > + if (!duplicate_mce_record(node, t)) > + llist_add(&node->llnode, &new_head); > + } > + > + return new_head.first; > +} > + > void mce_gen_pool_process(void) > { > struct llist_node *head; > diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h > index 547720efd923..492e38e4d02e 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h > +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h > @@ -35,6 +35,7 @@ void mce_gen_pool_process(void); > bool mce_gen_pool_empty(void); > int mce_gen_pool_add(struct mce *mce); > int mce_gen_pool_init(void); > +struct llist_node *mce_gen_pool_panic(void); > > extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp); > struct dentry *mce_get_debugfs_dir(void); > @@ -81,3 +82,12 @@ static inline int apei_clear_mce(u64 record_id) > #endif > > void mce_inject_log(struct mce *m); > + > +/* > + * We consider records to be duplicate if bank+status+addr+misc all match > + */ > +static inline bool dup_mce_record(struct mce *m1, struct mce *m2) Since this is a "comparison" of sorts, I'd prefer something like: mce_records_equal(m1, m2); or even better: mce_cmp(m1, m2); :-) > +{ > + return m1->bank == m2->bank && m1->status == m2->status && > + m1->addr == m2->addr && m1->misc == m2->misc; Should be more readable like this: return m1->bank == m2->bank && m1->status == m2->status && m1->addr == m2->addr && m1->misc == m2->misc; -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.