From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: HORIGUCHI NAOYA(堀口 直也)
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, tony.luck@intel.com,
 linux-kernel@vger.kernel.org, linmiaohe@huawei.com, David Hildenbrand
Subject: Re: [PATCH -next resend v3] mm: hwposion: support recovery from ksm_might_need_to_copy()
Date: Fri, 16 Dec 2022 16:42:34 +0800
Message-ID: <6003e02b-6fc9-2f50-7198-2ef8b541ecad@huawei.com>
In-Reply-To: <20221216014729.GA2116060@hori.linux.bs1.fc.nec.co.jp>
References: <20221213030557.143432-1-wangkefeng.wang@huawei.com>
 <20221213120523.141588-1-wangkefeng.wang@huawei.com>
 <20221216014729.GA2116060@hori.linux.bs1.fc.nec.co.jp>

On 2022/12/16 9:47, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Dec 13, 2022 at 08:05:23PM +0800, Kefeng Wang wrote:
>> When the kernel copy a page from ksm_might_need_to_copy(), but runs
>> into an uncorrectable error, it will crash since poisoned page is
>> consumed by kernel, this is similar to Copy-on-write poison recovery,
>
> Maybe you mean "this is similar to the issue recently fixed by
> Copy-on-write poison recovery."? And if this sentence ends here,
> please put "." instead of ",".
That's what I mean; I will update the changelog.

>
>> When an error is detected during the page copy, return VM_FAULT_HWPOISON
>> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
>> swapoff, which help us to avoid system crash. Note, memory failure on
>> a KSM page will be skipped, but still call memory_failure_queue() to
>> be consistent with general memory failure process.
>
> Thank you for the work. I have a few comment below ...
>
>>
>> Signed-off-by: Kefeng Wang
>> ---
>> v3 resend:
>> - enhance unuse_pte() if ksm_might_need_to_copy() return -EHWPOISON
>> - fix issue found by lkp
>>
>>  mm/ksm.c      |  8 ++++++--
>>  mm/memory.c   |  3 +++
>>  mm/swapfile.c | 20 ++++++++++++++------
>>  3 files changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index dd02780c387f..83e2f74ae7da 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -2629,8 +2629,12 @@ struct page *ksm_might_need_to_copy(struct page *page,
>>  		new_page = NULL;
>>  	}
>>  	if (new_page) {
>> -		copy_user_highpage(new_page, page, address, vma);
>> -
>> +		if (copy_mc_user_highpage(new_page, page, address, vma)) {
>> +			put_page(new_page);
>> +			new_page = ERR_PTR(-EHWPOISON);
>> +			memory_failure_queue(page_to_pfn(page), 0);
>> +			return new_page;
>
> Simply return ERR_PTR(-EHWPOISON)?

OK.
>
>> +		}
>>  		SetPageDirty(new_page);
>>  		__SetPageUptodate(new_page);
>>  		__SetPageLocked(new_page);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index aad226daf41b..5b2c137dfb2a 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3840,6 +3840,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>  			if (unlikely(!page)) {
>>  				ret = VM_FAULT_OOM;
>>  				goto out_page;
>> +			} else if (unlikely(PTR_ERR(page) == -EHWPOISON)) {
>> +				ret = VM_FAULT_HWPOISON;
>> +				goto out_page;
>>  			}
>>  			folio = page_folio(page);
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 908a529bca12..0efb1c2c2415 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1763,12 +1763,15 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  	struct page *swapcache;
>>  	spinlock_t *ptl;
>>  	pte_t *pte, new_pte;
>> +	bool hwposioned = false;
>>  	int ret = 1;
>>
>>  	swapcache = page;
>>  	page = ksm_might_need_to_copy(page, vma, addr);
>>  	if (unlikely(!page))
>>  		return -ENOMEM;
>> +	else if (unlikely(PTR_ERR(page) == -EHWPOISON))
>> +		hwposioned = true;
>>
>>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>>  	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
>> @@ -1776,15 +1779,19 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  		goto out;
>>  	}
>>
>> -	if (unlikely(!PageUptodate(page))) {
>> -		pte_t pteval;
>> +	if (hwposioned || !PageUptodate(page)) {
>> +		swp_entry_t swp_entry;
>>
>>  		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>> -		pteval = swp_entry_to_pte(make_swapin_error_entry());
>> -		set_pte_at(vma->vm_mm, addr, pte, pteval);
>> -		swap_free(entry);
>> +		if (hwposioned) {
>> +			swp_entry = make_hwpoison_entry(swapcache);
>> +			page = swapcache;
>
> This might work for the process accessing to the broken page, but ksm
> pages are likely to be shared by multiple processes, so it would be
> much nicer if you can convert all mapping entries for the error ksm page
> into hwpoisoned ones.
> Maybe in this thorough approach,
> hwpoison_user_mappings() is updated to call try_to_unmap() for ksm pages.
> But it's not necessary to do this together with applying mcsafe-memcpy,
> because recovery action and mcsafe-memcpy can be done independently.

Yes, memory failure handling currently skips KSM pages (commit 01e00f880ca7,
"HWPOISON: fix oops on ksm pages"); we could support them later.

> Thanks,
> Naoya Horiguchi
>
>> +		} else {
>> +			swp_entry = make_swapin_error_entry();
>> +		}
>> +		new_pte = swp_entry_to_pte(swp_entry);
>>  		ret = 0;
>> -		goto out;
>> +		goto setpte;
>>  	}
>>
>>  	/* See do_swap_page() */
>> @@ -1816,6 +1823,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  		new_pte = pte_mksoft_dirty(new_pte);
>>  	if (pte_swp_uffd_wp(*pte))
>>  		new_pte = pte_mkuffd_wp(new_pte);
>> +setpte:
>>  	set_pte_at(vma->vm_mm, addr, pte, new_pte);
>>  	swap_free(entry);
>>  out:
>> --
>> 2.35.3