Date: Thu, 24 Mar 2022 18:31:36 -0700
To: mm-commits@vger.kernel.org, youquan.song@intel.com, tony.luck@intel.com, naoya.horiguchi@nec.com, akpm@linux-foundation.org
From: Andrew Morton
Subject: [merged] mm-hwpoison-fix-error-page-recovered-but-reported-not-recovered.patch removed from -mm tree
Message-Id: <20220325013137.41764C340ED@smtp.kernel.org>
Reply-To: linux-kernel@vger.kernel.org
X-Mailing-List: mm-commits@vger.kernel.org

The patch titled
     Subject: mm/hwpoison: fix error page recovered but reported "not recovered"
has been removed from the -mm tree. Its filename was
     mm-hwpoison-fix-error-page-recovered-but-reported-not-recovered.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Naoya Horiguchi
Subject: mm/hwpoison: fix error page recovered but reported "not recovered"

When an uncorrected memory error is consumed, there is a race between the
CMCI from the memory controller, which reports an uncorrected error with a
UCNA signature, and the core, which reports an SRAR-signature machine check
when the data is about to be consumed.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure(), and the machine check
processing code then finds the page already poisoned. It calls
kill_accessing_process() to make sure a SIGBUS is sent, but
kill_accessing_process() returns the wrong error code: -EFAULT instead of
-EHWPOISON.

The console log looks like this:

  [34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400
  [34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
  [34775.690310] Memory failure: 0x3710b3: already hardware poisoned
  [34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
  [34775.706072] mce: Memory error not recovered

kill_accessing_process() is supposed to return -EHWPOISON to indicate that
a SIGBUS has already been sent to the process, so that kill_me_maybe() does
not have to send it again. The current code fails to do this; fix it so
that it works as intended. This change avoids the spurious "Memory error
not recovered" message and skips the duplicate SIGBUS.
[tony.luck@intel.com: reword some parts of commit message]
Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Naoya Horiguchi
Reported-by: Youquan Song
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/memory-failure.c~mm-hwpoison-fix-error-page-recovered-but-reported-not-recovered
+++ a/mm/memory-failure.c
@@ -707,8 +707,10 @@ static int kill_accessing_process(struct
 			      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
+	else
+		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret ? -EFAULT : -EHWPOISON;
+	return ret > 0 ? -EHWPOISON : -EFAULT;
 }
 
 static const char *action_name[] = {
_

Patches currently in -mm which might be from naoya.horiguchi@nec.com are