From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 635EBC25B74 for ; Tue, 21 May 2024 05:36:24 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s9I9m-0001Qj-KF; Tue, 21 May 2024 01:35:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s9I9k-0001QV-Sk for qemu-devel@nongnu.org; Tue, 21 May 2024 01:35:24 -0400 Received: from esa5.hc1455-7.c3s2.iphmx.com ([68.232.139.130]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s9I9i-0000Nm-K9 for qemu-devel@nongnu.org; Tue, 21 May 2024 01:35:24 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=fujitsu.com; i=@fujitsu.com; q=dns/txt; s=fj2; t=1716269722; x=1747805722; h=message-id:date:mime-version:subject:from:to:cc: references:in-reply-to:content-transfer-encoding; bh=SCUPDGU310DnTYsYaeNWK/y5FGc6tJaxJGlEiMyI7Sw=; b=s+dYw5UwNi5TsthEr622hK6staEZibgtiCa9ZlPQqX6cofdgqx/rmaIE eHI9La22pkHKV6SeL+dNJrXmsCTTmR6Udp8tP5VDiJ5qx9aLQ18fARdpX HiEW4CggxIShJjCGcQQcV4k5XO4oL19zHzSerjr5tlNU9668dnPWu53Zz 4fKOgmzueo9o2rADwXV86xNAoVFY5nbTcPyV2v0wd/KzyxHG5SjAzR/Wp b1Ig6+sU+NynvatSNT7OleitBdCEHwSt0EUi7RVZO4aUFxs2kjCOfB4io C6E9LwS/kQpXl3vK4jr0iksXP6OVpRNSTOUeBr/WtMHvLjFlY6hwj6+CM Q==; X-IronPort-AV: E=McAfee;i="6600,9927,11078"; a="158356044" X-IronPort-AV: E=Sophos;i="6.08,176,1712588400"; d="scan'208";a="158356044" Received: from unknown (HELO oym-r2.gw.nic.fujitsu.com) ([210.162.30.90]) by esa5.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2024 14:35:17 +0900 Received: from oym-m4.gw.nic.fujitsu.com (oym-nat-oym-m4.gw.nic.fujitsu.com [192.168.87.61]) by oym-r2.gw.nic.fujitsu.com (Postfix) with ESMTP id 0A093CD7E2 for ; Tue, 21 May 2024 14:35:15 +0900 (JST) Received: from kws-ab3.gw.nic.fujitsu.com (kws-ab3.gw.nic.fujitsu.com [192.51.206.21]) by oym-m4.gw.nic.fujitsu.com (Postfix) with ESMTP id 2C003D52EF for ; Tue, 21 May 2024 14:35:14 +0900 (JST) Received: from edo.cn.fujitsu.com (edo.cn.fujitsu.com [10.167.33.5]) by kws-ab3.gw.nic.fujitsu.com (Postfix) with ESMTP id 8848720094AC9 for ; Tue, 21 May 2024 14:35:13 +0900 (JST) Received: from [192.168.50.5] (unknown [10.167.226.114]) by edo.cn.fujitsu.com (Postfix) with ESMTP id BCACC1A000A; Tue, 21 May 2024 13:35:12 +0800 (CST) Message-ID: Date: Tue, 21 May 2024 13:35:12 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 2/2] cxl/core: add poison creation event handler To: Dan Williams Cc: linux-cxl@vger.kernel.org, qemu-devel@nongnu.org, Jonathan.Cameron@huawei.com, dave@stgolabs.net, ira.weiny@intel.com, alison.schofield@intel.com References: <20240417075053.3273543-1-ruansy.fnst@fujitsu.com> <20240417075053.3273543-3-ruansy.fnst@fujitsu.com> <6628008c39e80_a96f29415@dwillia2-mobl3.amr.corp.intel.com.notmuch> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-28400.005 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-28400.005 X-TMASE-Result: 10--18.138600-10.000000 X-TMASE-MatchedRID: OnXFgg5KIq2PvrMjLFD6eJTQgFTHgkhZ2q80vLACqaeeT9kASxo4MUbg XczvLJKOfif6PsvfsVJjDSloE/K2AoDL4lw7ay41bMGKOuLn5FWlY4F8r0vXP2O0yVK/5LmcOUJ GvwdveqIaMKZ6Kdx++qByu2sDWGqW2BIRRxPqiXqAO0kpgKezRCy+piTiLP72nc419+1VHFoRQP nsANQIhSm8xqSuoEo4ex+UDbqohJBaTyDiyKJai0H0jmDpcSmLpxAd6mi1Ga3/z5X0Kzxdul292 NvkjB1uZ4LpE3hFYlFGhiWNz7usIuCPCC2ZACqAI0cHLI6lhgJUENBIMyKD0Wjliw+xvItdH2FV /Q4wfcSNlF1efden0xLg+Gj28iXz62wfhnJCIYyVF2HD8EHNp2o8zUxTJUAjQWa52++sgNmzhHc t+GcMBE/3QbfXts8PmJxhCNBaN2LiNzFg3s1SJRhvfWx0TE/bCZa9cSpBObnAuQ0xDMaXkN9kut SXDwfHVzYnrImPSZdBCdSPDWIIm7/KM1zCNPCMWTWEh5N2a9EYQYFQ2+HwYAbxceezd6S1Wt19W /5IzWypuN1B0n2bvZwCzlUmqhVlb7A96mLiSaSAwosDBeuPwRxCcOlDoVfTfL8fHUCAmusE3acA rSTnfcL0Uzyk5KX7nagtny7ZPcS/WXZS/HqJ2kY41YX/o/8K9+qbtSz74qV4N1jhRVQFLSAHAop Ed76vCHJHCFbsTn+bNtBoHfBInQq8MbEB8QgeaMHO+hPtLa/pH1q/Md+SQA== X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 Received-SPF: pass client-ip=68.232.139.130; envelope-from=ruansy.fnst@fujitsu.com; helo=esa5.hc1455-7.c3s2.iphmx.com X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-to: Shiyang Ruan From: Shiyang Ruan via Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org 在 2024/5/3 19:32, Shiyang Ruan 写道: > > > 在 2024/4/24 2:40, Dan Williams 写道: >> Shiyang Ruan wrote: >>> Currently driver only traces cxl events, poison creation (for both vmem >>> and pmem type) on cxl memdev is silent. >> >> As it should be. >> >>> OS needs to be notified then it could handle poison pages in time. >> >> No, it was always the case that latent poison is an "action optional" >> event. I am not understanding the justification for this approach. What >> breaks if the kernel does not forward events to memory_failure_queue()? > > I think for type3(pmem) device, it should be handled like NVDIMM.  If > there are processes or filesystems running on it, they could be notified > then operate a friendly shutdown if POISON happens. > >> >> Consider that in the CPU consumption case that the firmware first path >> will do its own memory_failure_queue() and in the native case the MCE >> handler will take care of this. So that leaves pages that are accessed >> by DMA or background operation that encounter poison. Those are "action >> optional" scenarios and it is not clear to me how the driver tells the >> difference. > > So for real CXL device, it always use FW-First path to notify such > failure event?  Then, there is nothing to do with OS-First path? > >> >> This needs more precision on which agent is repsonsible for what level >> of reporting. The distribution of responsibility between ACPI GHES, >> EDAC, and the CXL driver is messy and I expect this changelog to >> demonstrate it understands all those considerations. > > Ok, I'll try to understand them. Hi Dan, I checked the GHES, EDAC codes. I think they belong to FW-First path. GHES polls mem errors, then 1. report by EDAC 2. construct a MCE, mce_log(), and handle in work queue 3. queue it into memory_failure right now if needed (sync, ...) And community is adding CXL FW-First trace support[1]. But in OS-First path, error record is sent to CXL driver via MSI, it won't conflict with FW-First path, I think. [1] https://lore.kernel.org/linux-cxl/43ab39e9-c9c2-bfe4-7d1c-bad462221c9c@amd.com/T/#t > >> >>> Per CXL spec, the device error event could be signaled through >>> FW-First and OS-First methods. >>> >>> So, add poison creation event handler in OS-First method: >>>    - Qemu: >> >> Why is QEMU relevant for this patch? QEMU is only a development vehicle >> the upstream enabling should be reference shipping or expected to be >> shipping hardware implementations. > > Yes, but currently we don't have a real CXL device so developing and > verification could only be done on Qemu with simulated CXL device. > >> >>>      - CXL device reports POISON creation event to OS by MSI by sending >>>        GMER/DER after injecting a poison record; >> >> When you say "inject" here do you mean "add to the poison list if >> present". Because "inject" to me means the "Inject Poison" Memory Device >> Command. > > It's a Qemu qmp command called "cxl-inject-poison", which only adds a > given address,length record to CXL's poison list, doesn't send > INJECT_POISON mbox command. > >> >>>    - CXL driver: >>>      a. parse the POISON event from GMER/DER; >>>      b. translate poisoned DPA to HPA (PFN); >>>      c. enqueue poisoned PFN to memory_failure's work queue; >>> >>> Signed-off-by: Shiyang Ruan >>> --- >>>   drivers/cxl/core/mbox.c   | 119 +++++++++++++++++++++++++++++++++----- >>>   drivers/cxl/cxlmem.h      |   8 +-- >>>   include/linux/cxl-event.h |  18 +++++- >>>   3 files changed, 125 insertions(+), 20 deletions(-) >>> >>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c >>> index f0f54aeccc87..76af0d73859d 100644 >>> --- a/drivers/cxl/core/mbox.c >>> +++ b/drivers/cxl/core/mbox.c >>> @@ -837,25 +837,116 @@ int cxl_enumerate_cmds(struct cxl_memdev_state >>> *mds) >>>   } >>>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL); >>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd, >>> -                enum cxl_event_log_type type, >>> -                enum cxl_event_type event_type, >>> -                const uuid_t *uuid, union cxl_event *evt) >>> +static void cxl_report_poison(struct cxl_memdev *cxlmd, struct >>> cxl_region *cxlr, >>> +                  u64 dpa) >>>   { >>> -    if (event_type == CXL_CPER_EVENT_GEN_MEDIA) >>> +    u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa); >>> +    unsigned long pfn = PHYS_PFN(hpa); >>> + >>> +    if (!IS_ENABLED(CONFIG_MEMORY_FAILURE)) >>> +        return; >> >> No need for this check, memory_failure_queue() is already stubbed out in >> the CONFIG_MEMORY_FAILURE=n case. > Yes, I'm overthinking it. > >> >>> +    memory_failure_queue(pfn, MF_ACTION_REQUIRED); >> >> My expectation is MF_ACTION_REQUIRED is not appropriate for CXL event >> reported errors since action is only required for direct consumption >> events and those need not be reported through the device event queue. > Got it. I'm not very sure about 'Host write/read' type. In my opinion, these two types of event should be sent from device when CPU is accessing a bad memory address, they could be thought of a sync event which needs the 'MF_ACTION_REQUIRED' flag. Then, we can determine the flag by the types like this: - CXL_EVENT_TRANSACTION_READ | CXL_EVENT_TRANSACTION_WRITE => MF_ACTION_REQUIRED - CXL_EVENT_TRANSACTION_INJECT_POISON => MF_SW_SIMULATED - others => 0 -- Thanks, Ruan. > >> >> It would be useful to collaborate with a BIOS firmware engineer so that >> the kernel ends up with similar logic as is used to set CPER record >> severity, or at least understands why it would want to be different. >> See how ghes_handle_memory_failure() determines the >> memory_failure_queue() flags. > > Ok, thanks for the advice. > > > -- > Ruan. > > >