From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vaibhav Jain
To: linuxppc-dev@lists.ozlabs.org, Russell Currey, Frederic Barrat
Cc: Vaibhav Jain, Andrew Donnellan, Ian Munsie, Christophe Lombard,
 Philippe Bergheaud, Greg Kurz, Gavin Shan
Subject: [RESEND-RFC v2 0/3] cxl: Reset freeze counter for the adapter before PERST
Date: Wed, 1 Mar 2017 16:54:49 +0530
Message-Id: <20170301112452.15798-1-vaibhav@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

Resend: Update the Cc
recipients list.

v2 changes:
* Moved the definition of eeh_pe_reset_freeze_counter() from eeh.h to
  eeh_pe.c to avoid adding a header dependency on 'pci-bridge.h'. The
  function is now marked as an exported GPL symbol.
* Incorporated changes suggested by Russell Currey:
  - Inserted logging of the PHB and PE number inside
    eeh_pe_reset_freeze_counter()
  - Suffixed all function names used in comments/patch descriptions
    with '()'
  - Removed an unneeded conditional check of '<0' in
    eeh_handle_normal_event()
  - Rephrased the function comments for eeh_pe_update_freeze_counter()
    and eeh_pe_reset_freeze_counter()
  - Brace-wrapped a single-line statement at the end of
    eeh_pe_update_freeze_counter()

v1:
Presently, to flash a cxl adapter with a new FPGA image, a warm PCIe
reset is requested on the adapter once the bitstream is loaded into the
card's flash memory. This issues a pci-fundamental reset to the card
slot, signaling the card controller to reconfigure the FPGA with the
new bitstream.

However, a pci-fundamental reset of the slot also results in a fenced
PHB that raises an EEH event, triggering the core EEH flow. The core
EEH code also maintains a counter named freeze_count for each PE inside
struct eeh_pe. The counter is incremented every time an EEH error is
reported on the PE domain, and if the counter reaches the threshold
limit, the device is permanently disabled. The threshold limit is
enforced by the eeh_max_freeze variable, which can be manipulated via
debugfs.

This creates problems for cxl adapters:
* It limits the number of times an FPGA image can be re-flashed, which
  is by default 5 times per hour.
* Since the adapter can potentially acquire a new personality after
  each reset, the freeze_count of the older FPGA image shouldn't be
  carried over to the newer image.

To fix these problems, the proposed patch-set introduces a new function
named eeh_pe_reset_freeze_counter() that resets the freeze counter for
the eeh_pe struct.
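
The accounting described above can be approximated by the following
user-space sketch. It is not the kernel code: struct pe_model, its
tstamp and disabled fields, and the pe_model_*() and simulate_resets()
helpers are hypothetical names used only for illustration. The one-hour
window and the limit of 5 are assumptions taken from the description
and test logs in this cover letter, and the model disables the PE on
the sixth error to match the "failed 6 times" log line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FREEZES 5      /* assumed default limit ("5 times per hour") */
#define WINDOW_SECS 3600L  /* assumed accounting window of one hour */

/* Hypothetical stand-in for the freeze bookkeeping kept per PE. */
struct pe_model {
	int freeze_count;  /* errors recorded in the current window */
	long tstamp;       /* start of the current window, in seconds */
	bool disabled;     /* set once the limit has been exceeded */
};

/* Record one error at time 'now'; returns false once the PE is disabled. */
static bool pe_model_report_error(struct pe_model *pe, long now)
{
	assert(pe != NULL);
	if (pe->disabled)
		return false;
	/* Start a fresh window if the previous one has expired. */
	if (now - pe->tstamp > WINDOW_SECS) {
		pe->tstamp = now;
		pe->freeze_count = 0;
	}
	pe->freeze_count++;
	if (pe->freeze_count > MAX_FREEZES)
		pe->disabled = true;
	return !pe->disabled;
}

/* Model of the proposed reset: forget the errors recorded so far. */
static void pe_model_reset_freeze_counter(struct pe_model *pe)
{
	assert(pe != NULL);
	pe->freeze_count = 0;
}

/*
 * Simulate 'resets' adapter resets, 20 seconds apart, each of which
 * raises one error. Returns true if the PE ends up disabled.
 */
bool simulate_resets(int resets, bool reset_counter_first)
{
	struct pe_model pe = { 0, 0, false };

	for (int i = 0; i < resets; i++) {
		if (reset_counter_first)
			pe_model_reset_freeze_counter(&pe);
		pe_model_report_error(&pe, 20L * i);
	}
	return pe.disabled;
}
```

In this model, simulate_resets(7, false) leaves the PE disabled, while
simulate_resets(7, true) does not, because the count never rises above
1 when it is cleared before each reset.
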
The new function can then be called by the cxl module before issuing a
pci-fundamental reset to the card slot for loading the new FPGA image.

Test Runs
==========
* Without the patchset:
  # for i in $(seq 0 6); do echo 1 > /sys/class/cxl/card0/reset; sleep 20; done
  bash: /sys/class/cxl/card0/reset: No such file or directory
  # dmesg
  ...
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour
  ...
  EEH: PHB#22-PE#0 has failed 2 times in the last hour
  ...
  EEH: PHB#22-PE#0 has failed 3 times in the last hour
  ...
  EEH: PHB#22-PE#0 has failed 4 times in the last hour
  ...
  EEH: PHB#22-PE#0 has failed 5 times in the last hour
  ...
  EEH: PHB#22-PE#0 has failed 6 times in the last hour and has been permanently disabled.

* With the patchset:
  # for i in $(seq 0 6); do echo 1 > /sys/class/cxl/card0/reset; sleep 20; done
  # dmesg
  ...
  cxl-pci 0022:01:00.0: Resetting freeze counters for the PHB
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour
  ...
  cxl-pci 0022:01:00.0: Resetting freeze counters for the PHB
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour
  ...
  cxl-pci 0022:01:00.0: Resetting freeze counters for the PHB
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour
  ...
  cxl-pci 0022:01:00.0: Resetting freeze counters for the PHB
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour
  ...
  cxl-pci 0022:01:00.0: Resetting freeze counters for the PHB
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour
  ...
  cxl-pci 0022:01:00.0: Resetting freeze counters for the PHB
  EEH: Fenced PHB#22 detected, location: N/A
  EEH: PHB#22-PE#0 has failed 1 times in the last hour

---
Vaibhav Jain (3):
  powerpc/eeh: Refactor eeh_pe_update_time_stamp() to update freeze_count
  powerpc/eeh: Introduce function eeh_pe_reset_freeze_counter()
  cxl: Reset freeze counters before adapter PERST for flashing new image

 arch/powerpc/include/asm/eeh.h   |  7 ++++-
 arch/powerpc/kernel/eeh_driver.c | 20 ++-----------
 arch/powerpc/kernel/eeh_pe.c     | 64 ++++++++++++++++++++++++++++++----------
 drivers/misc/cxl/pci.c           | 14 +++++++++
 4 files changed, 72 insertions(+), 33 deletions(-)

--
2.9.3