linux-mm.kvack.org archive mirror
From: Miaohe Lin <linmiaohe@huawei.com>
To: Jane Chu <jane.chu@oracle.com>
Cc: <linux-mm@kvack.org>, Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Andrew Morton <akpm@linux-foundation.org>, <ak@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 6/8] mm/memory-failure: Convert memory_failure() to use a folio
Date: Mon, 18 Mar 2024 10:28:16 +0800
Message-ID: <c63bc8c6-be38-cb69-c663-89141e32d91e@huawei.com>
In-Reply-To: <3a5fc87b-7362-4971-a9ab-55154627deb3@oracle.com>

On 2024/3/16 3:22, Jane Chu wrote:
> On 3/15/2024 1:32 AM, Miaohe Lin wrote:
> 
>> On 2024/3/13 9:23, Jane Chu wrote:
>>> On 3/12/2024 7:14 AM, Matthew Wilcox wrote:
>>>
>>>> On Tue, Mar 12, 2024 at 03:07:39PM +0800, Miaohe Lin wrote:
>>>>> On 2024/3/11 20:31, Matthew Wilcox wrote:
>>>>>> Assuming we have a refcount on this page so it can't be simultaneously
>>>>>> split/freed/whatever, these three sequences are equivalent:
>>>>> If page is stable after page refcnt is held, I agree below three sequences are equivalent.
>>>>>
>>>>>> 1    if (PageCompound(p))
>>>>>>
>>>>>> 2    struct page *head = compound_head(p);
>>>>>> 2    if (PageHead(head))
>>>>>>
>>>>>> 3    struct folio *folio = page_folio(p);
>>>>>> 3    if (folio_test_large(folio))
>>>>>>
>>>>>> .
>>>>>>
>>>>> But please see below commit:
>>>>>
>>>>> """
>>>>> commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
>>>>> Author: Andi Kleen <ak@linux.intel.com>
>>>>> Date:   Wed Aug 6 16:06:49 2014 -0700
>>>>>
>>>>>       hwpoison: fix race with changing page during offlining
>>>>>
>>>>>       When a hwpoison page is locked it could change state due to parallel
>>>>>       modifications.  The original compound page can be torn down and then
>>>>>       this 4k page becomes part of a differently-size compound page is is a
>>>>>       standalone regular page.
>>>>>
>>>>>       Check after the lock if the page is still the same compound page.
>>>> I can't speak to what the rules were ten years ago, but this is not
>>>> true now.  Compound pages cannot be split if you hold a refcount.
>>>> Since we don't track a per-page refcount, we wouldn't know which of
>>>> the split pages to give the excess refcount to.
>>> I noticed this recently
>>>
>>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>>   * they are not mapped.
>>>   *
>>>   * Returns 0 if the hugepage is split successfully.
>>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>>   * us.
>>>   */
>>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> {
>>>
>>> I have a test case with poisoned shmem THP page that was mlocked and
>>>
>>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>> Can you describe your test case in a little more detail? There is a check in split_huge_page_to_list():
>>
>> /* Racy check whether the huge page can be split */
>> bool can_split_folio(struct folio *folio, int *pextra_pins)
>> {
>>     int extra_pins;
>>
>>     /* Additional pins from page cache */
>>     if (folio_test_anon(folio))
>>         extra_pins = folio_test_swapcache(folio) ?
>>                 folio_nr_pages(folio) : 0;
>>     else
>>         extra_pins = folio_nr_pages(folio);
>>     if (pextra_pins)
>>         *pextra_pins = extra_pins;
>>     return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;
>> }
>>
>> So a large folio can only be split if exactly one extra page refcnt is held. That means a large folio won't be split from
>> under us if we hold a page refcnt. Or am I missing something?
> My experiment was with an older kernel, though the can_split check is the same.
> Also, I was emulating GUP pin with a hack:  in madvise_inject_error(), replaced
> get_user_pages_fast(start, 1, 0, &page) with
> pin_user_pages_fast(start, 1, FOLL_WRITE|FOLL_LONGTERM, &page)

IIUC, get_user_pages_fast() and pin_user_pages_fast(FOLL_LONGTERM) both call try_grab_folio() to take the extra page refcnt.
get_user_pages_fast() has FOLL_GET set while pin_user_pages_fast() has FOLL_PIN set. For a large folio, the two seem to have
the same effect on the page refcnt:

 *
 *    FOLL_GET: folio's refcount will be incremented by @refs.
 *
 *    FOLL_PIN on large folios: folio's refcount will be incremented by
 *    @refs, and its pincount will be incremented by @refs.
 *
 *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
 *    @refs * GUP_PIN_COUNTING_BIAS.
 *
 * Return: The folio containing @page (with refcount appropriately
 * incremented) for success, or NULL upon failure. If neither FOLL_GET
 * nor FOLL_PIN was set, that's considered failure, and furthermore,
 * a likely bug in the caller, so a warning is also emitted.
 */
struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)

Both of them call try_get_folio(page, refs) to take the page refcnt. So your hack to emulate a GUP pin
doesn't seem to work as you expected. Or am I missing something?
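
To make the arithmetic concrete, here is a minimal userspace sketch (not kernel code) of the can_split_folio() check quoted above, combined with the large-folio rules from the try_grab_folio() comment. Everything prefixed with model_ or MODEL_ is a hypothetical stand-in invented for illustration, plain ints replace the kernel's atomic counters, and the baseline values (512 pages, one mapping, one reference held by the splitter) are assumptions chosen only so that the check passes before any GUP reference is taken.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_FOLL_GET 0x1u	/* hypothetical stand-in for FOLL_GET */
#define MODEL_FOLL_PIN 0x2u	/* hypothetical stand-in for FOLL_PIN */

struct model_folio {
	int nr_pages;	/* models folio_nr_pages() */
	int refcount;	/* models folio_ref_count() */
	int mapcount;	/* models folio_mapcount() */
	int pincount;	/* only tracked for large folios */
	bool anon;	/* models folio_test_anon() */
	bool swapcache;	/* models folio_test_swapcache() */
};

/* Large-folio rows of the try_grab_folio() comment quoted above. */
static void model_grab_large_folio(struct model_folio *f, int refs,
				   unsigned int flags)
{
	assert(refs > 0 && f->nr_pages > 1);
	f->refcount += refs;		/* FOLL_GET and FOLL_PIN alike */
	if (flags & MODEL_FOLL_PIN)
		f->pincount += refs;	/* FOLL_PIN additionally */
}

/* Same arithmetic as the quoted can_split_folio(). */
static bool model_can_split(const struct model_folio *f)
{
	int extra_pins;

	if (f->anon)
		extra_pins = f->swapcache ? f->nr_pages : 0;
	else
		extra_pins = f->nr_pages;
	return f->mapcount == f->refcount - extra_pins - 1;
}

/*
 * Assumed baseline: a non-anon large folio whose refcount is exactly
 * extra_pins + mapcount + 1, so the check passes before any GUP ref.
 */
static struct model_folio model_shmem_thp(void)
{
	struct model_folio f = {
		.nr_pages = 512, .refcount = 512 + 1 + 1, .mapcount = 1,
		.pincount = 0, .anon = false, .swapcache = false,
	};
	return f;
}
```

In this model, taking one reference with either MODEL_FOLL_GET or MODEL_FOLL_PIN leaves the same refcount, so model_can_split() returns false in both cases. Only the pincount differs, and the quoted check does not look at it.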

Thanks.

> I suspect something might be wrong with my hack, I'm trying to reproduce with real GUP pin and on a newer kernel.
> Will keep you informed.
> thanks!
> -jane
> 
> 
>>
>> Thanks.
>>
>>> thanks,
>>>
>>> -jane
>>>
>>> .
> .



