From: David Hildenbrand <david@redhat.com>
To: jaewon31.kim@samsung.com,
	"zhaoyang.huang@unisoc.com" <zhaoyang.huang@unisoc.com>,
	유혜수 <hyesoo.yu@samsung.com>,
	"jhubbard@nvidia.com" <jhubbard@nvidia.com>,
	"surenb@google.com" <surenb@google.com>,
	"Steve.Kang@unisoc.com" <Steve.Kang@unisoc.com>,
	"huangzhaoyang@gmail.com" <huangzhaoyang@gmail.com>
Cc: Jaewon Kim <jaewon31.kim@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
Date: Thu, 22 May 2025 17:07:59 +0200	[thread overview]
Message-ID: <1dd70770-f097-46bc-99b1-1bd3249032fb@redhat.com> (raw)
In-Reply-To: <20250522144418epcms1p2a31c1a5c95b1937077bddf1b30495e83@epcms1p2>

On 22.05.25 16:44, 김재원 wrote:
>> On 22.05.25 15:09, Jaewon Kim wrote:
>>>> On 22.05.25 12:18, 黄朝阳 (Zhaoyang Huang) wrote:
>>>>>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>>>>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>>>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> If pin_user_pages_fast() does not pin all of the requested
>>>>>>>>>> pages, should drivers calling it retry until GUP has pinned
>>>>>>>>>> them all?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Approaches vary for handling partial success of pin_user_pages().
>>>>>>>>>
>>>>>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>>>>>> retry pinning the entire original range.
>>>>>>>>
>>>>>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>>>>>> in an endless loop on persistent errors IIRC?
>>>>>>>>
>>>>>>>
>>>>>>> I vaguely recall a limited number of retries, yes.
>>>>>>>
>>>>>>> thanks,
>>>>>>> --
>>>>>>> John Hubbard
>>>>>>>
>>>>>>>
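(For illustration only: a bounded-retry pattern of the kind described
above might look like the hypothetical snippet below. driver_pin_range()
and MAX_PIN_RETRIES are invented names, not taken from any real driver.)

#define MAX_PIN_RETRIES	3

static int driver_pin_range(unsigned long start, int nr_pages,
			    struct page **pages)
{
	int pinned, retries = 0;

	do {
		pinned = pin_user_pages_fast(start, nr_pages,
					     FOLL_WRITE | FOLL_LONGTERM,
					     pages);
		if (pinned == nr_pages)
			return 0;

		/* Partial success: unpin everything, retry the full range. */
		if (pinned > 0)
			unpin_user_pages(pages, pinned);
	} while (pinned >= 0 && ++retries < MAX_PIN_RETRIES);

	/* Persistent failure or retry budget exhausted: bail out. */
	return pinned < 0 ? pinned : -EFAULT;
}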
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'd like to report a potential issue introduced by a recent change,
>>>>>> commit 1aaf8c122918 ("mm: gup: fix infinite loop within
>>>>>> __get_longterm_locked").
>>>>>>
>>>>>> Previously, the call to migrate_longterm_unpinnable_folios() was guarded by the
>>>>>> collected variable. This meant that if a CMA page was temporarily held in the
>>>>>> pagevec and failed LRU isolation, it wouldn't be added to the
>>>>>> movable_page_list, but the collected counter would still be incremented.
>>>>
>>>> Okay, so that way we'd also express "any longterm unpinnable page found".
>>>>
>>>>> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>>>>
>>>> Good point. Only concurrent isolation might be problematic (concurrent reclaim?).
>>>>
>>>>>>
>>>>>> As a result, migrate_longterm_unpinnable_folios() would return -EAGAIN, and
>>>>>> the process would be retried until migration of the CMA page succeeded.
>>>>>>
>>>>>> However, in the recent patch merged into mainline, the logic now only checks
>>>>>> whether movable_page_list is empty, and no longer relies on the collected
>>>>>> count.
>>>>>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>>>>>> pinned.
>>>>>>
>>>>>> Effectively, long-term pinning is now possible for CMA pages,
>>>>>> something that previously would have been avoided through repeated
>>>>>> attempts.
>>>>
>>>> Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.
>>>>
>>>> Maybe something like:
>>>>
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 329c5f7acc7a0..58b8e40fc19ed 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>>>    }
>>>>
>>>>    /*
>>>> - * Returns the number of collected folios. Return value is always >= 0.
>>>> + * Returns whether any longterm unpinnable folio was found (if isolation
>>>> + * fails, not all can be added to the movable_folio_list).
>>>>     */
>>>> -static void collect_longterm_unpinnable_folios(
>>>> +static bool collect_longterm_unpinnable_folios(
>>>>                   struct list_head *movable_folio_list,
>>>>                   struct pages_or_folios *pofs)
>>>>    {
>>>> +       bool drain_allow = true, any_unpinnable = false;
>>>>           struct folio *prev_folio = NULL;
>>>> -       bool drain_allow = true;
>>>>           unsigned long i;
>>>>
>>>>           for (i = 0; i < pofs->nr_entries; i++) {
>>>> @@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
>>>>
>>>>                   if (folio_is_longterm_pinnable(folio))
>>>>                           continue;
>>>> +               any_unpinnable = true;
>>>>
>>>>                   if (folio_is_device_coherent(folio))
>>>>                           continue;
>>>> @@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
>>>>                                       NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>>>                                       folio_nr_pages(folio));
>>>>           }
>>>> +
>>>> +       return any_unpinnable;
>>>>    }
>>>>
>>>>    /*
>>>> @@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>>>>    static long
>>>>    check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>>    {
>>>> +       bool any_unpinnable;
>>>>           LIST_HEAD(movable_folio_list);
>>>>
>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>           if (list_empty(&movable_folio_list))
>>>> -               return 0;
>>>> +               return any_unpinnable ? -EAGAIN : 0;
>>>>
>>>>           return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>    }
>>>>
>>>>
>>>> --
>>>> Cheers,
>>>>
>>>> David / dhildenb
>>>
>>> Hi
>>>
>>> Thank you for your comment and patch.
>>> By the way, what if there are any_unpinnable pages and also pages in the
>>> movable_folio_list, but migrate_longterm_unpinnable_folios() fails to
>>> migrate them and returns some other error instead of -EAGAIN?
>>> In that case, I think the CMA or other unpinnable pages would stay pinned.
>>
>> Oh, I think what we have to do is call pofs_unpin(pofs) in case we
>> return with -EAGAIN early. That's what
>> migrate_longterm_unpinnable_folios() would do.
> 
> I don't quite understand. Do you mean we need to call pofs_unpin() in the
> any_unpinnable case?

At least that's what we did before 1aaf8c122 when calling 
collect_longterm_unpinnable_folios() I think.
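For reference, the caller looked roughly like this before that commit
(paraphrased from memory, not an exact quote of the old mm/gup.c):

	collected = collect_longterm_unpinnable_folios(&movable_folio_list,
						       pofs);
	if (!collected)
		return 0;

	/*
	 * Unpins everything via pofs_unpin() and returns -EAGAIN even
	 * when isolation failures left movable_folio_list empty.
	 */
	return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);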

> I think your following code seems to be good to me.
>    return any_unpinnable ? -EAGAIN : 0;

I think we have to call pofs_unpin() here.
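Roughly like this (an untested sketch on top of the diff above; the
early -EAGAIN return drops the pins first, which is what
migrate_longterm_unpinnable_folios() would otherwise do):

static long
check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
{
	bool any_unpinnable;
	LIST_HEAD(movable_folio_list);

	any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list,
							    pofs);
	if (list_empty(&movable_folio_list)) {
		if (!any_unpinnable)
			return 0;
		/* Drop the pins so the caller can retry from scratch. */
		pofs_unpin(pofs);
		return -EAGAIN;
	}

	return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
}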

Assuming we always call migrate_longterm_unpinnable_folios() with a list 
now, we could drop the list_empty() check in there.
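(With the sketch above, migrate_longterm_unpinnable_folios() is only
ever called with a non-empty list, so its internal empty-list handling
indeed becomes dead code.)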

-- 
Cheers,

David / dhildenb




Thread overview: 27+ messages
     [not found] <CAJrd-UtDD50iN=Yxz4=6kNkAcNAtRFkxhKAbEYiRyyDT-bYPHg@mail.gmail.com>
2025-05-22 10:18 ` reply: [RFC] pin_user_pages_fast failure count increased 黄朝阳 (Zhaoyang Huang)
2025-05-22 12:22   ` David Hildenbrand
     [not found]     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
2025-05-22 13:09       ` Jaewon Kim
2025-05-22 14:06         ` David Hildenbrand
     [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
2025-05-22 14:44           ` 김재원
2025-05-22 15:07             ` David Hildenbrand [this message]
2025-05-23  2:48               ` John Hubbard
2025-05-23  2:37           ` 김재원
2025-05-23  2:52             ` John Hubbard
2025-05-26  7:48               ` Hyesoo Yu
2025-05-26  8:05                 ` Zhaoyang Huang
2025-05-26  9:33                   ` Hyesoo Yu
2025-05-26  9:38                     ` David Hildenbrand
     [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
2025-05-26 11:17                       ` Jaewon Kim
2025-05-26 11:49                         ` Zhaoyang Huang
2025-05-28  1:23                           ` Hyesoo Yu
2025-05-28  2:49                             ` Zhaoyang Huang
2025-05-28  3:36                               ` Hyesoo Yu
2025-05-28  7:55                                 ` David Hildenbrand
2025-05-28 10:59                                   ` Zhaoyang Huang
2025-05-28 12:57                                     ` David Hildenbrand
2025-06-03 13:12                                     ` David Hildenbrand
2025-06-04  1:04                                       ` Zhaoyang Huang
2025-06-04  9:12                                         ` David Hildenbrand
2025-06-04  9:41                                           ` Zhaoyang Huang
2025-06-04  9:48                                             ` David Hildenbrand
     [not found]                                               ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
2025-06-04  9:53                                                 ` Hyesoo Yu
