Date: Fri, 23 Aug 2024 09:47:52 +0800
From: Hao Ge
To: Suren Baghdasaryan, Miaohe Lin
Cc: kent.overstreet@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hao Ge, stable@vger.kernel.org, nao.horiguchi@gmail.com, akpm@linux-foundation.org, pasha.tatashin@soleen.com, david@redhat.com
Subject: Re: [PATCH] codetag: debug: mark codetags for pages which transitioned from being poison to unpoison as empty
References: <20240822025800.13380-1-hao.ge@linux.dev>

Hi Suren and Miaohe

Thank you all for taking the time to discuss this issue.

On 8/23/24 06:50, Suren Baghdasaryan wrote:
> On Thu, Aug 22, 2024 at 2:46 AM Hao Ge wrote:
>> Hi Miaohe
>>
>>
>> Thank you for taking the time to review this patch.
>>
>>
>> On 8/22/24 16:04, Miaohe Lin wrote:
>>> On 2024/8/22 10:58, Hao Ge wrote:
>>>> From: Hao Ge
>>>>
>>> Thanks for your patch.
>>>
>>>> The PG_hwpoison page will be caught and isolated on the entrance to
>>>> the free buddy page pool. so,when we clear this flag and return it
>>>> to the buddy system,mark codetags for pages as empty.
>>>>
>>> Is below scene cause the problem?
>>>
>>> 1. Pages are allocated. pgalloc_tag_add() will be called when prep_new_page().
>>>
>>> 2. Pages are hwpoisoned. memory_failure() will set PG_hwpoison flag and pgalloc_tag_sub()
>>> will be called when pages are caught and isolated on the entrance to buddy.
> Hi Folks,
> Thanks for reporting this! Could you please describe in more details
> how memory_failure() ends up calling pgalloc_tag_sub()? It's not
> obvious to me which path leads to pgalloc_tag_sub(), so I must be
> missing something.

OK, let me describe the scenario I encountered.

Here is the logic behind the test in Link [1] that I mentioned.

It performs the following operation:

madvise(ptrs[num_alloc], pagesize, MADV_SOFT_OFFLINE)

and then the kernel's call stack looks like this:

do_madvise
soft_offline_page
page_handle_poison
__folio_put
free_unref_page

The PG_hwpoison flag is set within the following function, and then the
page is released:

https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/memory-failure.c#L206

Then, because the PG_hwpoison flag is set, the page is caught and
isolated on the entrance to the free buddy page pool. Look here:

https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/page_alloc.c#L1052

At this very moment, pgalloc_tag_sub is called.

So when we later call unpoison_memory, which clears this flag and returns
the page to the buddy system, pgalloc_tag_sub is called a second time and
the problem arises.

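To make the sequence above concrete, here is a minimal userspace sketch of the accounting. It is only a model under stated assumptions: the `model_*` names and the three-state reference are hypothetical stand-ins for pgalloc_tag_add(), pgalloc_tag_sub() and clear_page_tag_ref(), not the kernel's actual code, and it assumes that a subtract on an unset reference warns while a subtract on an empty reference does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types. */
enum ref_state { REF_UNSET, REF_EMPTY, REF_SET };

struct model_tag {
    long bytes;  /* stands in for tag->counters->bytes */
    long calls;  /* stands in for tag->counters->calls */
};

struct model_page {
    enum ref_state state;   /* state of the page's codetag reference */
    struct model_tag *tag;  /* valid only while state == REF_SET */
    int warnings;           /* number of "alloc_tag was not set" events */
};

/* Models pgalloc_tag_add(): charge the page to a tag on allocation. */
void model_tag_add(struct model_page *p, struct model_tag *t, long bytes)
{
    assert(p != NULL && t != NULL);
    p->state = REF_SET;
    p->tag = t;
    t->bytes += bytes;
    t->calls++;
}

/* Models pgalloc_tag_sub(): uncharge on free, warn if nothing was set. */
void model_tag_sub(struct model_page *p, long bytes)
{
    assert(p != NULL);
    if (p->state == REF_SET) {
        p->tag->bytes -= bytes;
        p->tag->calls--;
    } else if (p->state == REF_UNSET) {
        p->warnings++;  /* models the WARN seen in the report */
    }
    /* REF_EMPTY: deliberately untracked, so no warning and no change. */
    p->state = REF_UNSET;
    p->tag = NULL;
}

/* Models clear_page_tag_ref(): mark the reference as empty. */
void model_clear_tag_ref(struct model_page *p)
{
    assert(p != NULL);
    p->state = REF_EMPTY;
    p->tag = NULL;
}

/*
 * Replays allocate -> soft offline -> unpoison and returns how many
 * warnings fired. mark_empty selects whether the unpoison step marks
 * the reference empty before the page is freed a second time.
 */
int model_replay(bool mark_empty)
{
    struct model_tag tag = { 0, 0 };
    struct model_page page = { REF_UNSET, NULL, 0 };
    const long bytes = 4096;

    model_tag_add(&page, &tag, bytes);  /* step 1: allocation */
    model_tag_sub(&page, bytes);        /* step 2: poisoned page freed and isolated */
    if (mark_empty)
        model_clear_tag_ref(&page);     /* what the patch does in unpoison_memory() */
    model_tag_sub(&page, bytes);        /* step 3: unpoison frees the page again */

    /* The counters balance either way; only the warning differs. */
    assert(tag.bytes == 0 && tag.calls == 0);
    return page.warnings;
}
```

In this model, model_replay(false) reports one warning for the second subtract, and model_replay(true) reports none, which is the behaviour the patch is aiming for.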
> On a conceptual level I want to understand if the page isolated in
> this manner should be considered freed or not. If it shouldn't be
> considered free then I think the right fix would be to avoid
> pgalloc_tag_sub() when this isolation happens.
> Thanks,
> Suren.

In my understanding, the purpose of unpoison_memory is to reclaim
poisoned pages.

I dug up the patch that originally introduced this function:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/mm/memory-failure.c?id=847ce401df392b0704369fd3f75df614ac1414b4

Therefore, this is reasonable.

Thanks

Best regards

Hao

>
>>> 3. unpoison_memory cleared flags and sent the pages to buddy. pgalloc_tag_sub() will be
>>> called again in free_pages_prepare().
>>>
>>> So there is a imbalance that pgalloc_tag_add() is called once and pgalloc_tag_sub() is called twice?
>> As you said, that's exactly the case.
>>> If so, let's think about more complicated scene:
>>>
>>> 1. Same as above.
>>>
>>> 2. Pages are hwpoisoned. But memory_failure() fails to handle it. So PG_hwpoison flag is set
>>> but pgalloc_tag_sub() is not called (pages are not sent to buddy).
>>>
>>> 3. unpoison_memory cleared flags and calls clear_page_tag_ref() without calling pgalloc_tag_sub()
>>> first. Will this cause problem?
>>>
>>> Though this should be really rare...
>>>
>>> Thanks.
>>> .
>> Great, I didn't anticipate this scenario.
>>
>> When we call clear_page_tag_ref() without calling pgalloc_tag_sub(),
>>
>> It will cause exceptions in tag->counters->bytes and tag->counters->calls.
>>
>> We can add a layer of protection to handle it
>>
>> The pseudocode is as follows:
>>
>> if (mem_alloc_profiling_enabled()) {
>>         union codetag_ref *ref = get_page_tag_ref(page);
>>
>>         if (ref) {
>>                 if (ref->ct != NULL && !is_codetag_empty(ref)) {
>>                         tag = ct_to_alloc_tag(ref->ct);
>>                         this_cpu_sub(tag->counters->bytes, bytes);
>>                         this_cpu_dec(tag->counters->calls);
>>                 }
>>                 set_codetag_empty(ref);
>>                 put_page_tag_ref(ref);
>>         }
>> }
>>
>> Hi Suren and Kent
>>
>> Do you have any suggestions for this? If it's okay, I'll add comments
>> and include this pseudocode in clear_page_tag_ref.
>>
>>>> It was detected by [1] and the following WARN occurred:
>>>>
>>>> [ 113.930443][ T3282] ------------[ cut here ]------------
>>>> [ 113.931105][ T3282] alloc_tag was not set
>>>> [ 113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164
>>>> [ 113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4
>>>> [ 113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G W 6.11.0-rc4-dirty #18
>>>> [ 113.943003][ T3282] Tainted: [W]=WARN
>>>> [ 113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
>>>> [ 113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> [ 113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164
>>>> [ 113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164
>>>> [ 113.946706][ T3282] sp : ffff800087093a10
>>>> [ 113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0
>>>> [ 113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000
>>>> [ 113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000
>>>> [ 113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff
>>>> [ 113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210
>>>> [ 113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339
>>>> [ 113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0
>>>> [ 113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff
>>>> [ 113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001
>>>> [ 113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000
>>>> [ 113.957962][ T3282] Call trace:
>>>> [ 113.958350][ T3282] pgalloc_tag_sub.part.66+0x154/0x164
>>>> [ 113.959000][ T3282] pgalloc_tag_sub+0x14/0x1c
>>>> [ 113.959539][ T3282] free_unref_page+0xf4/0x4b8
>>>> [ 113.960096][ T3282] __folio_put+0xd4/0x120
>>>> [ 113.960614][ T3282] folio_put+0x24/0x50
>>>> [ 113.961103][ T3282] unpoison_memory+0x4f0/0x5b0
>>>> [ 113.961678][ T3282] hwpoison_unpoison+0x30/0x48 [hwpoison_inject]
>>>> [ 113.962436][ T3282] simple_attr_write_xsigned.isra.34+0xec/0x1cc
>>>> [ 113.963183][ T3282] simple_attr_write+0x38/0x48
>>>> [ 113.963750][ T3282] debugfs_attr_write+0x54/0x80
>>>> [ 113.964330][ T3282] full_proxy_write+0x68/0x98
>>>> [ 113.964880][ T3282] vfs_write+0xdc/0x4d0
>>>> [ 113.965372][ T3282] ksys_write+0x78/0x100
>>>> [ 113.965875][ T3282] __arm64_sys_write+0x24/0x30
>>>> [ 113.966440][ T3282] invoke_syscall+0x7c/0x104
>>>> [ 113.966984][ T3282] el0_svc_common.constprop.1+0x88/0x104
>>>> [ 113.967652][ T3282] do_el0_svc+0x2c/0x38
>>>> [ 113.968893][ T3282] el0_svc+0x3c/0x1b8
>>>> [ 113.969379][ T3282] el0t_64_sync_handler+0x98/0xbc
>>>> [ 113.969980][ T3282] el0t_64_sync+0x19c/0x1a0
>>>> [ 113.970511][ T3282] ---[ end trace 0000000000000000 ]---
>>>>
>>>> Link [1]: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise11.c
>>>>
>>>> Fixes: a8fc28dad6d5 ("alloc_tag: introduce clear_page_tag_ref() helper function")
>>>> Cc: stable@vger.kernel.org # v6.10
>>>> Signed-off-by: Hao Ge
>>>> ---
>>>>  mm/memory-failure.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 7066fc84f351..570388c41532 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -2623,6 +2623,12 @@ int unpoison_memory(unsigned long pfn)
>>>>
>>>>          folio_put(folio);
>>>>          if (TestClearPageHWPoison(p)) {
>>>> +                /* the PG_hwpoison page will be caught and isolated
>>>> +                 * on the entrance to the free buddy page pool.
>>>> +                 * so,when we clear this flag and return it to the buddy system,
>>>> +                 * clear it's codetag
>>>> +                 */
>>>> +                clear_page_tag_ref(p);
>>>>                  folio_put(folio);
>>>>                  ret = 0;
>>>>          }
>>>>
>>>>
>> Thanks
>>
>> BR
>>
>> Hao
>>
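
For the more complicated scene raised earlier in the thread (PG_hwpoison is set but the page never reaches the buddy entrance, so pgalloc_tag_sub() is not called before unpoison), the following userspace sketch contrasts an unguarded clear with the guarded clear from the pseudocode. All `sketch_*` names are hypothetical and the types only imitate the kernel's counters; treat it as an illustration of the idea rather than the real clear_page_tag_ref().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types. */
enum sketch_state { SKETCH_UNSET, SKETCH_EMPTY, SKETCH_SET };

struct sketch_tag {
    long bytes;  /* stands in for tag->counters->bytes */
    long calls;  /* stands in for tag->counters->calls */
};

struct sketch_page {
    enum sketch_state state;  /* state of the page's codetag reference */
    struct sketch_tag *tag;   /* valid only while state == SKETCH_SET */
};

/* Unguarded clear: mark the reference empty and leave the counters alone. */
void sketch_clear_plain(struct sketch_page *p)
{
    assert(p != NULL);
    p->state = SKETCH_EMPTY;
    p->tag = NULL;
}

/* Guarded clear: uncharge first if a real tag is still attached. */
void sketch_clear_guarded(struct sketch_page *p, long bytes)
{
    assert(p != NULL);
    if (p->state == SKETCH_SET) {
        p->tag->bytes -= bytes;
        p->tag->calls--;
    }
    p->state = SKETCH_EMPTY;
    p->tag = NULL;
}

/*
 * Replays the scene where poison handling fails, so no subtract happens
 * before unpoison. Returns the bytes still charged to the tag afterwards.
 */
long sketch_leaked_bytes(bool guarded)
{
    const long bytes = 4096;
    struct sketch_tag tag = { bytes, 1 };           /* step 1: page allocated and charged */
    struct sketch_page page = { SKETCH_SET, &tag };

    /* step 2: PG_hwpoison is set, but the page is never freed or isolated */

    /* step 3: unpoison marks the reference empty */
    if (guarded)
        sketch_clear_guarded(&page, bytes);
    else
        sketch_clear_plain(&page);

    /* The later free sees an empty reference and subtracts nothing. */
    return tag.bytes;
}
```

In this model the unguarded clear leaves 4096 bytes charged to the tag forever, while the guarded clear brings the counter back to zero.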