From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 95B6BC77B73 for ; Sun, 23 Apr 2023 10:55:08 +0000 (UTC) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Q44sl1wH1z3fVS for ; Sun, 23 Apr 2023 20:55:07 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=RNqsiMYE; dkim-atps=neutral Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Q44py4mpCz3bnM for ; Sun, 23 Apr 2023 20:52:42 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=RNqsiMYE; dkim-atps=neutral Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4Q44pt3lwSz4xFk for ; Sun, 23 Apr 2023 20:52:38 +1000 (AEST) Received: by gandalf.ozlabs.org (Postfix) id 4Q44pt3hJgz4xFv; Sun, 23 Apr 2023 20:52:38 +1000 (AEST) Authentication-Results: gandalf.ozlabs.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: gandalf.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0b-001b2d01.pphosted.com; envelope-from=sourabhjain@linux.ibm.com; receiver=) Authentication-Results: gandalf.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=RNqsiMYE; dkim-atps=neutral Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by gandalf.ozlabs.org (Postfix) with ESMTPS id 4Q44pt0jDvz4xFk for ; Sun, 23 Apr 2023 20:52:37 +1000 (AEST) Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 33NAbxhn031133; Sun, 23 Apr 2023 10:52:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : content-transfer-encoding : mime-version; s=pp1; bh=erYOZloGbili+rtx+wBv9FGcgMrrFDiu//ufXaNQbxU=; b=RNqsiMYE9HzmBgmipn+NYKegLVcXKwHGD3+52Me6uZziXVJoZf/nFSORoecrAxqSDjTl 4MYLSzksFAhHulb9YWSX39XP4vKIC2UoQmo+xiSIEfyGntOrWaKr2dPIvzaJF1pLG39D WxJtfk6mmiYMPaf1WSsF7SUTLv+6nkbmGG41dlM6BA+UmkKCuu8y8EcoEbbiZ1WC8Ef2 etOtJ/2FQxwyCt8ivEb0ss+NdYBfskHEkgfXf+IB9ilU4ctgHrztsB/ZX+jsFmYJ4GGD ki250/OnZgbLfWiVMEHKwnIKAU/Bf667zNrdTQEx4E356qR7nb8uYnVjN1sJcmBGpBOj /Q== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3q461b13kg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 23 Apr 2023 10:52:25 +0000 Received: from m0353725.ppops.net (m0353725.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 33NAidix014139; Sun, 23 Apr 2023 10:52:24 GMT Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3q461b13jw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 23 Apr 2023 10:52:24 +0000 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 33N4r9OS015124; Sun, 23 Apr 2023 10:52:22 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma03ams.nl.ibm.com (PPS) with ESMTPS id 3q47770j74-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 23 Apr 2023 10:52:22 +0000 Received: from smtpav05.fra02v.mail.ibm.com (smtpav05.fra02v.mail.ibm.com [10.20.54.104]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 33NAqJjv12059156 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sun, 23 Apr 2023 10:52:19 GMT Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 078912004D; Sun, 23 Apr 2023 10:52:19 +0000 (GMT) Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EB69920049; Sun, 23 Apr 2023 10:52:15 +0000 (GMT) Received: from li-4f5ba44c-27d4-11b2-a85c-a08f5b49eada.ibm.com.com (unknown [9.43.22.217]) by smtpav05.fra02v.mail.ibm.com (Postfix) with ESMTP; Sun, 23 Apr 2023 10:52:15 +0000 (GMT) From: Sourabh Jain To: linuxppc-dev@ozlabs.org, mpe@ellerman.id.au Subject: [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Date: Sun, 23 Apr 2023 16:22:08 +0530 Message-Id: <20230423105213.70795-1-sourabhjain@linux.ibm.com> X-Mailer: git-send-email 2.39.2 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: pZmSpiqcDN1AcWYtTM01-urNSCxt6AJ_ X-Proofpoint-ORIG-GUID: _AFm65y5WQnojapYkHRj0MmStpUgRxLd Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-23_06,2023-04-21_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 adultscore=0 mlxlogscore=999 clxscore=1011 mlxscore=0 priorityscore=1501 impostorscore=0 spamscore=0 malwarescore=0 bulkscore=0 phishscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304230097 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: eric.devolder@oracle.com, bhe@redhat.com, mahesh@linux.vnet.ibm.com, kexec@lists.infradead.org, ldufour@linux.ibm.com, hbathini@linux.ibm.com Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" The Problem: ============ Post CPU/Memory hot plug/unplug and online/offline events the kernel holds stale information about the system. Dump collection with stale kdump kernel might end up in dump capture failure or an inaccurate dump collection. Existing solution: ================== The existing solution to keep the kdump kernel up-to-date by monitoring CPU/Memory hotplug/online/offline events via udev rule and trigger a full kdump kernel reload for every hotplug event. Shortcomings: ------------------------------------------------ - Leaves a window where kernel crash might not lead to a successful dump collection. - Reloading all kexec components for each hotplug is inefficient. - udev rules are prone to races if hotplug events are frequent. More about issues with an existing solution is posted here: - https://lkml.org/lkml/2020/12/14/532 - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html Proposed Solution: ================== Instead of reloading all kexec segments on CPU/Memory hotplug/online/offline event, this patch series focuses on updating only the relevant kexec segment. Once the kexec segments are loaded in the kernel reserved area then an arch-specific hotplug handler will update the relevant kexec segment based on hotplug event type. Series Dependencies ==================== This patch series implements the crash hotplug handler on PowerPC. The generic crash hotplug handler is introduced by https://lkml.org/lkml/2023/4/4/1136 patch series. Git tree for testing: ===================== The below git tree has this patch series applied on top of dependent patch series. https://github.com/sourabhjains/linux/tree/e21-s10 To realise the feature the kdump udev rule must updated to avoid reloading of kdump reload on CPU/Memory hotplug/online/offline events. RHEL: /usr/lib/udev/rules.d/98-kexec.rules -SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu" -SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem" -SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem" +SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" +SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" Note: only kexec_file_load syscall will work. For kexec_load minor changes are required in kexec tool. --- Changelog: v10: - Drop the patch that adds fdt_index attribute to struct kimage_arch Find the fdt segment index when needed. - Added more details into commits messages. - Rebased onto 6.3.0-rc5 v9: - Removed patch to prepare elfcorehdr crash notes for possible CPUs. The patch is moved to generic patch series that introduces generic infrastructure for in kernel crash update. - Removed patch to pass the hotplug action type to the arch crash hotplug handler function. The generic patch series has introduced the hotplug action type in kimage struct. - Add detail commit message for better understanding. v8: - Restrict fdt_index initialization to machine_kexec_post_load it work for both kexec_load and kexec_file_load.[3/8] Laurent Dufour - Updated the logic to find the number of offline core. [6/8] - Changed the logic to find the elfcore program header to accommodate future memory ranges due memory hotplug events. [8/8] v7 - added a new config to configure this feature - pass hotplug action type to arch specific handler v6 - Added crash memory hotplug support v5: - Replace COFNIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU. - Move fdt segment identification for kexec_load case to load path instead of crash hotplug handler - Keep new attribute defined under kimage_arch to track FDT segment under CONFIG_HOTPLUG_CPU config. v4: - Update the logic to find the additional space needed for hotadd CPUs post kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load" patch to know more about the change. - Fix a couple of typo. - Replace pr_err to pr_info_once to warn user about memory hotplug support. - In crash hotplug handle exit the for loop if FDT segment is found. v3 - Move fdt_index and fdt_index_vaild variables to kimage_arch struct. - Rebase patche on top of https://lkml.org/lkml/2022/3/3/674 [v5] - Fixed warning reported by checpatch script v2: - Use generic hotplug handler introduced by https://lkml.org/lkml/2022/2/9/1406, a significant change from v1. Sourabh Jain (5): powerpc/kexec: turn some static helper functions public powerpc/crash: introduce a new config option CRASH_HOTPLUG powerpc/crash: add crash CPU hotplug support crash: forward memory_notify args to arch crash hotplug handler powerpc/kexec: add crash memory hotplug support arch/powerpc/Kconfig | 12 + arch/powerpc/include/asm/kexec.h | 10 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/core_64.c | 301 ++++++++++++++++++++++++ arch/powerpc/kexec/elf_64.c | 12 +- arch/powerpc/kexec/file_load_64.c | 212 ++++------------- arch/powerpc/kexec/ranges.c | 85 +++++++ arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 3 +- include/linux/kexec.h | 2 +- kernel/crash_core.c | 14 +- 11 files changed, 479 insertions(+), 175 deletions(-) -- 2.39.2