Subject: Re: [PATCH 6/8] mm/memory-failure: Convert memory_failure() to use a folio
To: Jane Chu
CC: Naoya Horiguchi, Andrew Morton, Matthew Wilcox
References: <20240229212036.2160900-1-willy@infradead.org>
 <20240229212036.2160900-7-willy@infradead.org>
 <5eab08d7-ae38-4f99-401f-f361466e34e0@huawei.com>
 <196d00e3-4335-4f8f-ac51-5ccfa5ef5f75@oracle.com>
 <55a1600d-340b-3262-99c7-8a30d6a92a84@huawei.com>
 <3a5fc87b-7362-4971-a9ab-55154627deb3@oracle.com>
From: Miaohe Lin
Date: Mon, 18 Mar 2024 10:28:16 +0800
In-Reply-To: <3a5fc87b-7362-4971-a9ab-55154627deb3@oracle.com>

On 2024/3/16 3:22, Jane Chu wrote:
> On 3/15/2024 1:32 AM, Miaohe Lin wrote:
>
>> On 2024/3/13 9:23, Jane Chu wrote:
>>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>>
>>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>>> split/freed/whatever, these three sequences are equivalent:
>>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>>
>>>>>> 1    if (PageCompound(p))
>>>>>>
>>>>>> 2    struct page *head = compound_head(p);
>>>>>> 2    if (PageHead(head))
>>>>>>
>>>>>> 3    struct folio *folio = page_folio(p);
>>>>>> 3    if (folio_test_large(folio))
>>>>>>
>>>>>> .
>>>>>>
>>>>> But please see below commit:
>>>>>
>>>>> """
>>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>>> Author: Andi Kleen
>>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>>
>>>>>       hwpoison: fix race with changing page during offlining
>>>>>
>>>>>       When a hwpoison page is locked it could change state due to parallel
>>>>>       modifications.  The original compound page can be torn down and then
>>>>>       this 4k page becomes part of a differently-size compound page is is a
>>>>>       standalone regular page.
>>>>>
>>>>>       Check after the lock if the page is still the same compound page.
>>>> I can't speak to what the rules were ten years ago, but this is not
>>>> true now.  Compound pages cannot be split if you hold a refcount.
>>>> Since we don't track a per-page refcount, we wouldn't know which of
>>>> the split pages to give the excess refcount to.
>>> I noticed this recently
>>>
>>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>>   * they are not mapped.
>>>   *
>>>   * Returns 0 if the hugepage is split successfully.
>>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>>   * us.
>>>   */
>>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> {
>>>
>>> I have a test case with poisoned shmem THP page that was mlocked and
>>>
>>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>> Can you elaborate your test case a little bit more detail? There is a check in split_huge_page_to_list():
>>
>> /* Racy check whether the huge page can be split */
>> bool can_split_folio(struct folio *folio, int *pextra_pins)
>> {
>>     int extra_pins;
>>
>>     /* Additional pins from page cache */
>>     if (folio_test_anon(folio))
>>         extra_pins = folio_test_swapcache(folio) ?
>>                 folio_nr_pages(folio) : 0;
>>     else
>>         extra_pins = folio_nr_pages(folio);
>>     if (pextra_pins)
>>         *pextra_pins = extra_pins;
>>     return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
>> }
>>
>> So a large folio can only be split if only one extra page refcnt is held. It means large folio won't be split from
>> under us if we hold an page refcnt. Or am I miss something?
> My experiment was with an older kernel, though the can_split check is the same.
> Also, I was emulating GUP pin with a hack:  in madvise_inject_error(), replaced
> get_user_pages_fast(start, 1, 0, &page) with
> pin_user_pages_fast(start, 1, FOLL_WRITE|FOLL_LONGTERM, &page)

IIUC, get_user_pages_fast() and pin_user_pages_fast(FOLL_LONGTERM) will both call try_grab_folio() to take an
extra page refcnt. get_user_pages_fast() will have FOLL_GET set while pin_user_pages_fast() will have FOLL_PIN
set. It seems they work the same way for a large folio as far as the page refcnt is concerned:

 *
 * FOLL_GET: folio's refcount will be incremented by @refs.
 *
 * FOLL_PIN on large folios: folio's refcount will be incremented by
 * @refs, and its pincount will be incremented by @refs.
 *
 * FOLL_PIN on single-page folios: folio's refcount will be incremented by
 * @refs * GUP_PIN_COUNTING_BIAS.
 *
 * Return: The folio containing @page (with refcount appropriately
 * incremented) for success, or NULL upon failure. If neither FOLL_GET
 * nor FOLL_PIN was set, that's considered failure, and furthermore,
 * a likely bug in the caller, so a warning is also emitted.
 */
struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)

They will both call try_get_folio(page, refs) to take the page refcnt. So your hack emulating a GUP pin doesn't
seem to work the way you expected. Or am I missing something?

Thanks.

> I suspect something might be wrong with my hack, I'm trying to reproduce with real GUP pin and on a newer kernel.
> Will keep you informed.
> thanks!
> -jane
>
>
>>
>> Thanks.
>>
>>> thanks,
>>>
>>> -jane
>>>
>>> .
> .
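
To make the refcount arithmetic above concrete, here is a small userspace model. It is only a sketch, not kernel code: struct model_folio, model_can_split(), model_grab_large() and the MODEL_FOLL_* values are hypothetical names made up for illustration. The split check copies the arithmetic of the quoted can_split_folio(), and the grab helper follows only the large-folio rules from the quoted try_grab_folio() comment.

```c
/* Userspace model only: hypothetical struct and helpers, not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_FOLL_GET 1u	/* stand-in for FOLL_GET */
#define MODEL_FOLL_PIN 2u	/* stand-in for FOLL_PIN */

struct model_folio {
	int nr_pages;		/* models folio_nr_pages() */
	int mapcount;		/* models folio_mapcount() */
	int refcount;		/* models folio_ref_count() */
	int pincount;		/* pin count tracked for large folios */
	bool anon;		/* models folio_test_anon() */
	bool swapcache;		/* models folio_test_swapcache() */
};

/* Same arithmetic as the quoted can_split_folio(). */
static bool model_can_split(const struct model_folio *f)
{
	int extra_pins;

	/* Additional references held by the page cache or swap cache */
	if (f->anon)
		extra_pins = f->swapcache ? f->nr_pages : 0;
	else
		extra_pins = f->nr_pages;

	/* The "- 1" accounts for the single reference held by the splitter. */
	return f->mapcount == f->refcount - extra_pins - 1;
}

/*
 * Large-folio rules from the quoted try_grab_folio() comment:
 * FOLL_GET adds @refs to the refcount; FOLL_PIN adds @refs to both
 * the refcount and the pincount.
 */
static void model_grab_large(struct model_folio *f, int refs, unsigned int flags)
{
	assert(f->nr_pages > 1);
	assert(flags == MODEL_FOLL_GET || flags == MODEL_FOLL_PIN);

	f->refcount += refs;
	if (flags == MODEL_FOLL_PIN)
		f->pincount += refs;
}
```

As an assumed example, take a 512-page shmem folio mapped once, with refcount 512 (page cache) + 1 (mapping) + 1 (splitter) = 514: the model says it can be split. After one more reference taken with either MODEL_FOLL_GET or MODEL_FOLL_PIN the refcount is 515 and the check fails in both cases, which is the point made above: for a large folio the two flags differ only in the pincount, not in the refcount the split check looks at.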