From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from esa5.hc1455-7.c3s2.iphmx.com (esa5.hc1455-7.c3s2.iphmx.com [68.232.139.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C197917758 for ; Fri, 3 May 2024 11:32:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=68.232.139.130 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714735950; cv=none; b=gVrNf/xxpdCs5Gvd750ljr3XjxqkqBYGDNDH2cGMvTGlX8N4j0INfTNRruuoWFNFZD8xvwnDu0hGGD7v+JVuDB8l0thPNdX+buL9tKsfld4T1T2hQtmLmicOYox8+X4YZFN1gqkhvGJM64GbrJcTKycaVIVS8ueBFy18zmq57oI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714735950; c=relaxed/simple; bh=Ph0sVC8wxZB+O9I9nzGHvBBheV5ChbIMFIQ6F7nE5nQ=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=ubJVAYa+wh7gwgIgqCVTDFeTzF2Ee786EsJvhs9Gva6dP/3oDb6k8ssSu7msaenz/zRxJrw6oewIiMpfL/qX2fuEg+woRYEHooA57ZrCuu4qGaiv1bZnzzt8XhTXwSW0cdCMT3yA8infHcqHkRoaaxFe94oSQoTqsC6v6yBe1bw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fujitsu.com; spf=pass smtp.mailfrom=fujitsu.com; dkim=pass (2048-bit key) header.d=fujitsu.com header.i=@fujitsu.com header.b=pH2U10Qu; arc=none smtp.client-ip=68.232.139.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fujitsu.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fujitsu.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fujitsu.com header.i=@fujitsu.com header.b="pH2U10Qu" DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=fujitsu.com; i=@fujitsu.com; q=dns/txt; s=fj2; t=1714735948; x=1746271948; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=Ph0sVC8wxZB+O9I9nzGHvBBheV5ChbIMFIQ6F7nE5nQ=; b=pH2U10Qu+td4cm/M9PpmFzmcQBGwypqB6cYLojLwkcO6V5o07sBrYxnV 6yUsAuGVrSMsrscTo++j9SV07JAV4n9gs0HLLd4yQGNlxRaoinA+lUknA VMSIdBSMLrXUBGPs6wjQd4+OQ/R96QCTLECf7gjHb6pH/2t0sACmzDSCZ bkTSnTOcjoM6WVMuaEYztdFPYwppW1wPOxGo3CbSKsqsCMDwARkyhIKB5 1DpP85qaMxgbPPx7JbJtUZvV9d7xbUlnBTrEpD485QIiB2Dux7fbWOiWk VZfELb8/7GmhCbs6x9tikVo3Ufk+NfVHaUmHYFYYYjnZ//b9fyYt5gZ9B w==; X-IronPort-AV: E=McAfee;i="6600,9927,11062"; a="156735899" X-IronPort-AV: E=Sophos;i="6.07,251,1708354800"; d="scan'208";a="156735899" Received: from unknown (HELO oym-r4.gw.nic.fujitsu.com) ([210.162.30.92]) by esa5.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 May 2024 20:32:19 +0900 Received: from oym-m2.gw.nic.fujitsu.com (oym-nat-oym-m2.gw.nic.fujitsu.com [192.168.87.59]) by oym-r4.gw.nic.fujitsu.com (Postfix) with ESMTP id 0992D5AF65 for ; Fri, 3 May 2024 20:32:17 +0900 (JST) Received: from kws-ab3.gw.nic.fujitsu.com (kws-ab3.gw.nic.fujitsu.com [192.51.206.21]) by oym-m2.gw.nic.fujitsu.com (Postfix) with ESMTP id 418EDBDC97 for ; Fri, 3 May 2024 20:32:16 +0900 (JST) Received: from edo.cn.fujitsu.com (edo.cn.fujitsu.com [10.167.33.5]) by kws-ab3.gw.nic.fujitsu.com (Postfix) with ESMTP id D14902008BCE2 for ; Fri, 3 May 2024 20:32:15 +0900 (JST) Received: from [10.193.128.195] (unknown [10.193.128.195]) by edo.cn.fujitsu.com (Postfix) with ESMTP id 215F31A0002; Fri, 3 May 2024 19:32:15 +0800 (CST) Message-ID: Date: Fri, 3 May 2024 19:32:14 +0800 Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 2/2] cxl/core: add poison creation event handler To: Dan Williams Cc: linux-cxl@vger.kernel.org, qemu-devel@nongnu.org, Jonathan.Cameron@huawei.com, dave@stgolabs.net, ira.weiny@intel.com, alison.schofield@intel.com References: <20240417075053.3273543-1-ruansy.fnst@fujitsu.com> <20240417075053.3273543-3-ruansy.fnst@fujitsu.com> <6628008c39e80_a96f29415@dwillia2-mobl3.amr.corp.intel.com.notmuch> From: Shiyang Ruan In-Reply-To: <6628008c39e80_a96f29415@dwillia2-mobl3.amr.corp.intel.com.notmuch> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-TM-AS-Product-Ver: IMSS-9.1.0.1417-9.0.0.1002-28358.007 X-TM-AS-User-Approved-Sender: Yes X-TMASE-Version: IMSS-9.1.0.1417-9.0.1002-28358.007 X-TMASE-Result: 10--17.853000-10.000000 X-TMASE-MatchedRID: iE9XhvgCcn2PvrMjLFD6eJTQgFTHgkhZw5PXDQWptSnHr4PSWTXSLMWl hj9iHeVpfif6PsvfsVJjDSloE/K2AoDL4lw7ay41bMGKOuLn5FWlY4F8r0vXP2O0yVK/5LmcOUJ GvwdveqIaMKZ6Kdx++qByu2sDWGqW2BIRRxPqiXqAO0kpgKezRCy+piTiLP72nc419+1VHFoRQP nsANQIhSm8xqSuoEo4ex+UDbqohJBaTyDiyKJai0H0jmDpcSmLpxAd6mi1Ga3/z5X0KzxduuiBx aphBmrYoKLaH5bp/M7zteLMLyOs0V9dhvgA/NqZaw1J1gAXJ1KX2rvknNYlE7+7SLqdnJMRV8uV ka6dpKMDUWAqBlTrGHzelVjH2BNfH9BbtD/NYuUEOhHzDsL05jzLhqT0KeNiL31P64kiV5HoKEr 2irJf5CL637QCIVpi8vc3EUpCmrXDiZmOF0V5FVhRyidsElYk8LGB8WXSdboTf11Tnw43V6xIjS YfsSaZ6mGP/NvIDJh+QYb9VB9LzuIFWrKTp/z05DSMPEZQAkNK4f4Z+CZAZ9pSUoaJ2MP7kHs1A zjpA+xaM5U7dRXqm1+24nCsUSFNt7DW3B48kkHYoM82yqmFMvoLR4+zsDTtFEAkLaTcFnomiH6x IjcSC4jfW/vklDmQBEC8ggnHU5PP8/c7m0jcKw== X-TMASE-SNAP-Result: 1.821001.0001-0-1-22:0,33:0,34:0-0 在 2024/4/24 2:40, Dan Williams 写道: > Shiyang Ruan wrote: >> Currently driver only traces cxl events, poison creation (for both vmem >> and pmem type) on cxl memdev is silent. > > As it should be. > >> OS needs to be notified then it could handle poison pages in time. > > No, it was always the case that latent poison is an "action optional" > event. I am not understanding the justification for this approach. What > breaks if the kernel does not forward events to memory_failure_queue()? I think for type3(pmem) device, it should be handled like NVDIMM. If there are processes or filesystems running on it, they could be notified then operate a friendly shutdown if POISON happens. > > Consider that in the CPU consumption case that the firmware first path > will do its own memory_failure_queue() and in the native case the MCE > handler will take care of this. So that leaves pages that are accessed > by DMA or background operation that encounter poison. Those are "action > optional" scenarios and it is not clear to me how the driver tells the > difference. So for real CXL device, it always use FW-First path to notify such failure event? Then, there is nothing to do with OS-First path? > > This needs more precision on which agent is repsonsible for what level > of reporting. The distribution of responsibility between ACPI GHES, > EDAC, and the CXL driver is messy and I expect this changelog to > demonstrate it understands all those considerations. Ok, I'll try to understand them. > >> Per CXL spec, the device error event could be signaled through >> FW-First and OS-First methods. >> >> So, add poison creation event handler in OS-First method: >> - Qemu: > > Why is QEMU relevant for this patch? QEMU is only a development vehicle > the upstream enabling should be reference shipping or expected to be > shipping hardware implementations. Yes, but currently we don't have a real CXL device so developing and verification could only be done on Qemu with simulated CXL device. > >> - CXL device reports POISON creation event to OS by MSI by sending >> GMER/DER after injecting a poison record; > > When you say "inject" here do you mean "add to the poison list if > present". Because "inject" to me means the "Inject Poison" Memory Device > Command. It's a Qemu qmp command called "cxl-inject-poison", which only adds a given address,length record to CXL's poison list, doesn't send INJECT_POISON mbox command. > >> - CXL driver: >> a. parse the POISON event from GMER/DER; >> b. translate poisoned DPA to HPA (PFN); >> c. enqueue poisoned PFN to memory_failure's work queue; >> >> Signed-off-by: Shiyang Ruan >> --- >> drivers/cxl/core/mbox.c | 119 +++++++++++++++++++++++++++++++++----- >> drivers/cxl/cxlmem.h | 8 +-- >> include/linux/cxl-event.h | 18 +++++- >> 3 files changed, 125 insertions(+), 20 deletions(-) >> >> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c >> index f0f54aeccc87..76af0d73859d 100644 >> --- a/drivers/cxl/core/mbox.c >> +++ b/drivers/cxl/core/mbox.c >> @@ -837,25 +837,116 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds) >> } >> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL); >> >> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd, >> - enum cxl_event_log_type type, >> - enum cxl_event_type event_type, >> - const uuid_t *uuid, union cxl_event *evt) >> +static void cxl_report_poison(struct cxl_memdev *cxlmd, struct cxl_region *cxlr, >> + u64 dpa) >> { >> - if (event_type == CXL_CPER_EVENT_GEN_MEDIA) >> + u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa); >> + unsigned long pfn = PHYS_PFN(hpa); >> + >> + if (!IS_ENABLED(CONFIG_MEMORY_FAILURE)) >> + return; > > No need for this check, memory_failure_queue() is already stubbed out in > the CONFIG_MEMORY_FAILURE=n case. Yes, I'm overthinking it. > >> + memory_failure_queue(pfn, MF_ACTION_REQUIRED); > > My expectation is MF_ACTION_REQUIRED is not appropriate for CXL event > reported errors since action is only required for direct consumption > events and those need not be reported through the device event queue. Got it. > > It would be useful to collaborate with a BIOS firmware engineer so that > the kernel ends up with similar logic as is used to set CPER record > severity, or at least understands why it would want to be different. > See how ghes_handle_memory_failure() determines the > memory_failure_queue() flags. Ok, thanks for the advice. -- Ruan.