From: Stewart Smith
To: Oliver, Joel Stanley
Cc: Sergey Kachkin, Alistair Popple, Benjamin Herrenschmidt, Balbir Singh, OpenBMC Maillist
Subject: Re: checkstop processing
Date: Tue, 14 Nov 2017 17:01:49 +1100
Message-Id: <87zi7ptn3m.fsf@linux.vnet.ibm.com>
List-Id: Development list for OpenBMC

Oliver writes:
> On Tue, Nov 14, 2017 at 2:42 PM, Joel Stanley wrote:
>> On Tue, Nov 14, 2017 at 8:04 AM, Sergey Kachkin wrote:
>>> Hi all,
>>>
>>> I'm investigating the checkstop processing and looking for a way to isolate
>>> a faulty component with OpenBMC.
>>> So far SEL logs available via REST are not really helpful.
>>>
>>> Is there any data source in the openbmc to troubleshoot checkstops?
>>>
>>> I guess eSEL binary data parsed with eSEL.pl can be more informative but do
>>> we have any procedure to grab the binary sel data and parse it with the
>>> latest obmc?
>>>
>>> Currently it seems that IPL checkstop analysis is not really working. I mean
>>> that the faulty component is not deconfigured on the next boot and the gard
>>> list is empty.
>>> It can be easily duplicated by injecting an error manually via putscom.
>>
>> I think you've identified an area that would be great for improvement.
>>
>> I'd like to expand the scope beyond just checkstop to other boot
>> failures: I've tried to boot machines recently that have failed to
>> even start hostboot, and I haven't known what has failed.
>>
>> A tool that inspects recent error logs, and the state of the SBE would
>> be useful. We can leverage libpdbg to talk to the host.
>
> The SBE stores some state information in cfam 2809 that we can use to
> find out the current istep. I think we can also dump the SBE trace
> buffer out of PIB memory on non-secure systems too. Parsing the trace
> buffer requires the tracehash file from the SBE build, but we can
> probably add that to the squashfs file for the host firmware.

This would be ideal to put in a sensor for boot progress.

-- 
Stewart Smith
OPAL Architect, IBM.
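
As a rough illustration of the "find out the current istep from cfam 2809" idea quoted above, here is a minimal Python sketch of pulling bit fields out of a 32-bit register value once it has been read (for example through libpdbg, as suggested in the thread). The thread does not give the register layout, so the field names and bit positions in ASSUMED_LAYOUT are hypothetical placeholders, as are the function names; they would need to be replaced with the layout from the SBE sources. The one thing the sketch relies on is IBM bit numbering, where bit 0 is the most significant bit.

```python
from typing import Dict, Tuple

# HYPOTHETICAL layout, for illustration only: name -> (start_bit, width).
# Bits use IBM numbering: bit 0 is the MOST significant bit of the word.
# The real layout of cfam 2809 must be taken from the SBE sources.
ASSUMED_LAYOUT: Dict[str, Tuple[int, int]] = {
    "booted": (0, 1),
    "curr_state": (8, 4),
    "major_istep": (12, 8),
    "minor_istep": (20, 6),
}


def extract_field(value: int, start_bit: int, width: int, reg_bits: int = 32) -> int:
    """Return the field of `width` bits starting at IBM-numbered `start_bit`."""
    if start_bit < 0 or width <= 0 or start_bit + width > reg_bits:
        raise ValueError("field does not fit in the register")
    # Convert the IBM (MSB-first) position into a right shift.
    shift = reg_bits - (start_bit + width)
    return (value >> shift) & ((1 << width) - 1)


def decode(value: int, layout: Dict[str, Tuple[int, int]] = ASSUMED_LAYOUT) -> Dict[str, int]:
    """Decode every field named in `layout` from a raw register value."""
    return {name: extract_field(value, start, width)
            for name, (start, width) in layout.items()}


if __name__ == "__main__":
    # Made-up register value, not a reading from real hardware.
    print(decode(0x80305080))
```

A boot-progress sensor along the lines suggested above could poll the register and publish the decoded step fields, but that depends on the real layout and on how the BMC exposes sensors, neither of which is covered here.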