From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94D83C52D7C for ; Fri, 23 Aug 2024 03:37:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 15E6A8007A; Thu, 22 Aug 2024 23:37:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 10E3A80073; Thu, 22 Aug 2024 23:37:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EC97C6B0133; Thu, 22 Aug 2024 23:37:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CC4BC6B00D7 for ; Thu, 22 Aug 2024 23:37:16 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 29D7EC0146 for ; Fri, 23 Aug 2024 03:37:16 +0000 (UTC) X-FDA: 82482099672.12.4FC4885 Received: from out-176.mta0.migadu.com (out-176.mta0.migadu.com [91.218.175.176]) by imf26.hostedemail.com (Postfix) with ESMTP id 2594C140008 for ; Fri, 23 Aug 2024 03:37:12 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=NZoC87Hz; spf=pass (imf26.hostedemail.com: domain of hao.ge@linux.dev designates 91.218.175.176 as permitted sender) smtp.mailfrom=hao.ge@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724384151; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cY24nOUEo2m206GlJwgB19ZAs8hLikNfAgK8dR8ZU44=; b=E9v1tdyVoUTDieFBTewmYF3vzD62KGZbVmvq7HdstumBZZ7weyPsjYcEXO5UNcX9mB6Y03 hlmJir73k/p2yZ6P01fEzNKW1McaqABXALXkJ1HghUNlMQAgl1t9putLDR8os5HuW8h1nY wIc2+uRGWP0Lqn+ZuxKS7ngzgT0Wcps= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724384151; a=rsa-sha256; cv=none; b=ujXvGnCkz6sUDkIf0mLMn7RbNOcv4u3HpibHVYyuee9wqNs/RjR/COhwcQbMpv3CsKi3E8 qumCrGAkjchgwYydat4SgxJQdCUFmvHtH713DPnXsGx61C6Aw4S2XY8yOAMi9Dc7t9xDjO 9h12IFT7HbnInSyQ/vusaFGdXzP2vUg= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=NZoC87Hz; spf=pass (imf26.hostedemail.com: domain of hao.ge@linux.dev designates 91.218.175.176 as permitted sender) smtp.mailfrom=hao.ge@linux.dev; dmarc=pass (policy=none) header.from=linux.dev Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1724384230; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cY24nOUEo2m206GlJwgB19ZAs8hLikNfAgK8dR8ZU44=; b=NZoC87Hz1pLFaHnHL4SbGQcF+e9d+YA3JSgICauOGv4UM+ANdhV6hrDti2FSSQWlFQHHI6 c5oYcYwk8maD7LYJA5HR5HIDwJvhINjWA+Jd6WOSKCK5UYkz1po8/AKORUJOl7mKDDurI0 ypYUPtj+ZIRRLA9eRWb5JTkvEgBI0MA= Date: Fri, 23 Aug 2024 11:37:05 +0800 MIME-Version: 1.0 Subject: Re: [PATCH] codetag: debug: mark codetags for pages which transitioned from being poison to unpoison as empty X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Hao Ge To: Suren Baghdasaryan , Miaohe Lin Cc: kent.overstreet@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hao Ge , stable@vger.kernel.org, nao.horiguchi@gmail.com, akpm@linux-foundation.org, pasha.tatashin@soleen.com, david@redhat.com References: <20240822025800.13380-1-hao.ge@linux.dev> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-Stat-Signature: zjooyscyzwg9sydzjc1fwwzuo6yrpdyu X-Rspamd-Queue-Id: 2594C140008 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1724384232-670821 X-HE-Meta: U2FsdGVkX18n7l+8UgxWcfi37t8kYEtq0ALsp5QDF6mIDyN69C+5IjMBcqfuU3gQcGb9bpTTBg2WY6L4OvDKf/upBb83mFcLdYhWuwUrADx9xXThTby9oLJgoKVk/2psoL/TEYnnDIfkhdaaB2Dg9eIan93NRHmyCTG4NE0rJHi6sZdJD8y6KU893DmJxcHdncF6qpSOJpJiQzYmlX3O+LEwSpoOYacNBg5t/CoX84s3LMsqwIEug52WOQ9D9paTTX/QOtdsZUOXc6ZjmbStwHVqm5YJ0vrI2AYc3j5O+pVu7l0skU4gXOCo364zoj/P4bOV7Yted7kcX9oCrIj4X6xmwdk1tR4bq6xHGeGeDxDVFUg5mXOwqpjciVkmiZXJJvgwO/Ijiec1R2a6Cv3s8gBz0gHM5mDqSTN0RJkHozVK1V7j6PMA57bFcIY76xVJ9T9VQXQXVJmX/qkvAxv1CnQ38Rr659AaJlQtp5YamRGIt8J4/HPCaFwlTWKwXsQOTPssU6Xe0ynn2xZ9KNGnZAfgc/w8yVLh7XLYZRzJHoLl8nqlX0MgUMiEOqQmrp9Jkg+f9ZghFIGYdGMy8Z3GgtZaFtEsilEqOgq3woZt5OftInvPSXtYwxyLr4QWQChVu/WMg4yrqUDE1iwCitsxG1Q4WSJOePQVDzT3TJ2TL2I7GO88KUaZzJGPKboS+oHeerAxDuRGHtjyQZ1SifUrcnRVJVunniAd/2Hv9/rUbRsaeL2e0Uee+nVeFtMTyzAk/e61u6F8Vx5XE7bs+ZlG1e4zINUkIVvBDq8JCc7VEAeUvDSHL1v7VLvLBJk87X6GadutSmYhGKcka12067ktYlwxSac36T+aGqhisv5oBytGuwOLl7MiRZWA48rW66Y0MTFxsEKKjA2889CElnuCEg4JMzPl6KVmtmqk68GPq+yPQZXsiENGem/66JdIvF8q7iUWeM64nJjLBOfgmU3 6W88pRmS KV/lvHSaSXcwBHqp/C/QsRhlJTtdyO6lDrqKVXLHR1gX2sLE8Tj8SsasQxyNRObqoJvSjTTn/ZiAoISzDW4teNq2bpK6awjB4kEEU6ChlYFJDZMsrubvFOFd4XwPpfZZ/i/2lwhiEwTlNrbzPy44dlqFoieuVzRcepH6ggDnpsz2Hih+6at38s25zko/2qSoSCKNzydn4C4RJFEb9Fr0WAQx/qLvUqhtXe/1bcY9B+ppDCNXJvRaOgyGX9slsB0lbzZYNRqn/Bl98Dm1IlXafKK3FVN72c2t/jFri/E8hrfeIWYhVP6DQyP++hq25bRU7lQ4VEnomWL/L1u4qzcKRIwt/0ZwnZRxi9X+Dnvl6bnKYWss= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Suren and Miaohe On 8/23/24 09:47, Hao Ge wrote: > Hi Suren and Miaohe > > > Thank you all for taking the time to discuss this issue. > > > On 8/23/24 06:50, Suren Baghdasaryan wrote: >> On Thu, Aug 22, 2024 at 2:46 AM Hao Ge wrote: >>> Hi Miaohe >>> >>> >>> Thank you for taking the time to review this patch. >>> >>> >>> On 8/22/24 16:04, Miaohe Lin wrote: >>>> On 2024/8/22 10:58, Hao Ge wrote: >>>>> From: Hao Ge >>>>> >>>> Thanks for your patch. >>>> >>>>> The PG_hwpoison page will be caught and isolated on the entrance to >>>>> the free buddy page pool. so,when we clear this flag and return it >>>>> to the buddy system,mark codetags for pages as empty. >>>>> >>>> Is below scene cause the problem? >>>> >>>> 1. Pages are allocated. pgalloc_tag_add() will be called when >>>> prep_new_page(). >>>> >>>> 2. Pages are hwpoisoned. memory_failure() will set PG_hwpoison flag >>>> and pgalloc_tag_sub() >>>> will be called when pages are caught and isolated on the entrance >>>> to buddy. >> Hi Folks, >> Thanks for reporting this! Could you please describe in more details >> how memory_failure() ends up calling pgalloc_tag_sub()? It's not >> obvious to me which path leads to pgalloc_tag_sub(), so I must be >> missing something. > > > OK,Let me describe the scenario I encountered. > > In the Link [1] I mentioned,here is the logic behind it: > > It performed the following operations: > > madvise(ptrs[num_alloc], pagesize, MADV_SOFT_OFFLINE) > > and then the kernel's call stack looks like this: > > do_madvise > > soft_offline_page > > page_handle_poison > > __folio_put > > free_unref_page > I just reviewed it and I think I missed a stack. Actually, it's like this do_madvise soft_offline_page soft_offline_in_use_page page_handle_poison __folio_put free_unref_page And I've come up with a minimal solution. If everyone agrees, I'll send the patch.look this https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/page_alloc.c#L1056 Let's directly call clear_page_tag_ref after pgalloc_tag_sub. Thanks BR Hao > It will set a flag within the following function and then release the > page. > > https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/memory-failure.c#L206 > > > and and then,because you set the PG_hwpoison flag, so the page will be > caught and isolated on the > > entrance to the free buddy page pool. look here: > > https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/page_alloc.c#L1052 > > At this very moment, we call pgalloc_tag_sub. > > So,when we callunpoison_memoryclear this flag and return the page to > the buddy system, the problem arises. > > >> On a conceptual level I want to understand if the page isolated in >> this manner should be considered freed or not. If it shouldn't be >> considered free then I think the right fix would be to avoid >> pgalloc_tag_sub() when this isolation happens. >> Thanks, >> Suren. > > In my understanding, the purpose of unpoison_memory is to reclaim > poisoned pages. > > I dug up the patch that introduced this function back then > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/mm/memory-failure.c?id=847ce401df392b0704369fd3f75df614ac1414b4 > > > Therefore, this is reasonable. > > Thanks > > Best regards > > Hao > >> >>>> 3. unpoison_memory cleared flags and sent the pages to buddy. >>>> pgalloc_tag_sub() will be >>>> called again in free_pages_prepare(). >>>> >>>> So there is a imbalance that pgalloc_tag_add() is called once and >>>> pgalloc_tag_sub() is called twice? >>> As you said, that's exactly the case. >>>> If so, let's think about more complicated scene: >>>> >>>> 1. Same as above. >>>> >>>> 2. Pages are hwpoisoned. But memory_failure() fails to handle it. >>>> So PG_hwpoison flag is set >>>> but pgalloc_tag_sub() is not called (pages are not sent to buddy). >>>> >>>> 3. unpoison_memory cleared flags and calls clear_page_tag_ref() >>>> without calling pgalloc_tag_sub() >>>> first. Will this cause problem? >>>> >>>> Though this should be really rare... >>>> >>>> Thanks. >>>> . >>> Great, I didn't anticipate this scenario. >>> >>> When we call clear_page_tag_ref() without calling pgalloc_tag_sub(), >>> >>> It will cause exceptions >>> in|tag->counters->bytes|and|tag->counters->calls|. >>> >>> We can add a layer of protection to handle it >>> >>> The pseudocode is as follows: >>> >>> if (mem_alloc_profiling_enabled()) { >>>           union codetag_ref *ref = get_page_tag_ref(page); >>> >>>           if (ref) { >>>               if( ref->ct != NULL && !is_codetag_empty(ref)) >>>               { >>>                   tag = ct_to_alloc_tag(ref->ct); >>>                   this_cpu_sub(tag->counters->bytes, bytes); >>>                   this_cpu_dec(tag->counters->calls); >>>               } >>>               set_codetag_empty(ref); >>>               put_page_tag_ref(ref); >>>           } >>> } >>> >>> Hi Suren and Kent >>> >>> Do you have any suggestions for this? If it's okay, I'll add comments >>> and include this pseudocode in|clear_page_tag_ref|. >>> >>>>> It was detected by [1] and the following WARN occurred: >>>>> >>>>> [  113.930443][ T3282] ------------[ cut here ]------------ >>>>> [  113.931105][ T3282] alloc_tag was not set >>>>> [  113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at >>>>> ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164 >>>>> [  113.932866][ T3282] Modules linked in: hwpoison_inject fuse >>>>> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 >>>>> xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4 >>>>> [  113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 >>>>> Kdump: loaded Tainted: G        W 6.11.0-rc4-dirty #18 >>>>> [  113.943003][ T3282] Tainted: [W]=WARN >>>>> [  113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, >>>>> BIOS unknown 2/2/2022 >>>>> [  113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO >>>>> -DIT -SSBS BTYPE=--) >>>>> [  113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164 >>>>> [  113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164 >>>>> [  113.946706][ T3282] sp : ffff800087093a10 >>>>> [  113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 >>>>> x27: ffff80008249f0a0 >>>>> [  113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 >>>>> x24: 0000000000000000 >>>>> [  113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 >>>>> x21: 0000000000000000 >>>>> [  113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 >>>>> x18: ffffffffffffffff >>>>> [  113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 >>>>> x15: ffff800081746210 >>>>> [  113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 >>>>> x12: 5b5d353031313339 >>>>> [  113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d >>>>> x9 : 00000000ffffffd0 >>>>> [  113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 >>>>> x6 : c0000000ffff7fff >>>>> [  113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 >>>>> x3 : 0000000000000001 >>>>> [  113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 >>>>> x0 : 0000000000000000 >>>>> [  113.957962][ T3282] Call trace: >>>>> [  113.958350][ T3282] pgalloc_tag_sub.part.66+0x154/0x164 >>>>> [  113.959000][ T3282]  pgalloc_tag_sub+0x14/0x1c >>>>> [  113.959539][ T3282]  free_unref_page+0xf4/0x4b8 >>>>> [  113.960096][ T3282]  __folio_put+0xd4/0x120 >>>>> [  113.960614][ T3282]  folio_put+0x24/0x50 >>>>> [  113.961103][ T3282]  unpoison_memory+0x4f0/0x5b0 >>>>> [  113.961678][ T3282]  hwpoison_unpoison+0x30/0x48 [hwpoison_inject] >>>>> [  113.962436][ T3282] simple_attr_write_xsigned.isra.34+0xec/0x1cc >>>>> [  113.963183][ T3282]  simple_attr_write+0x38/0x48 >>>>> [  113.963750][ T3282]  debugfs_attr_write+0x54/0x80 >>>>> [  113.964330][ T3282]  full_proxy_write+0x68/0x98 >>>>> [  113.964880][ T3282]  vfs_write+0xdc/0x4d0 >>>>> [  113.965372][ T3282]  ksys_write+0x78/0x100 >>>>> [  113.965875][ T3282]  __arm64_sys_write+0x24/0x30 >>>>> [  113.966440][ T3282]  invoke_syscall+0x7c/0x104 >>>>> [  113.966984][ T3282] el0_svc_common.constprop.1+0x88/0x104 >>>>> [  113.967652][ T3282]  do_el0_svc+0x2c/0x38 >>>>> [  113.968893][ T3282]  el0_svc+0x3c/0x1b8 >>>>> [  113.969379][ T3282]  el0t_64_sync_handler+0x98/0xbc >>>>> [  113.969980][ T3282]  el0t_64_sync+0x19c/0x1a0 >>>>> [  113.970511][ T3282] ---[ end trace 0000000000000000 ]--- >>>>> >>>>> Link [1]: >>>>> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise11.c >>>>> >>>>> Fixes: a8fc28dad6d5 ("alloc_tag: introduce clear_page_tag_ref() >>>>> helper function") >>>>> Cc: stable@vger.kernel.org # v6.10 >>>>> Signed-off-by: Hao Ge >>>>> --- >>>>>    mm/memory-failure.c | 6 ++++++ >>>>>    1 file changed, 6 insertions(+) >>>>> >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>> index 7066fc84f351..570388c41532 100644 >>>>> --- a/mm/memory-failure.c >>>>> +++ b/mm/memory-failure.c >>>>> @@ -2623,6 +2623,12 @@ int unpoison_memory(unsigned long pfn) >>>>> >>>>>               folio_put(folio); >>>>>               if (TestClearPageHWPoison(p)) { >>>>> +                    /* the PG_hwpoison page will be caught and >>>>> isolated >>>>> +                     * on the entrance to the free buddy page pool. >>>>> +                     * so,when we clear this flag and return it >>>>> to the buddy system, >>>>> +                     * clear it's codetag >>>>> +                     */ >>>>> +                    clear_page_tag_ref(p); >>>>>                       folio_put(folio); >>>>>                       ret = 0; >>>>>               } >>>>> >>>>> >>> Thanks >>> >>> BR >>> >>> Hao >>>