reply: [RFC] pin_user_pages_fast failure count increased

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* reply: [RFC] pin_user_pages_fast failure count increased
@ 2025-05-22 10:18 ` 黄朝阳 (Zhaoyang Huang)
  2025-05-22 12:22   ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: 黄朝阳 (Zhaoyang Huang) @ 2025-05-22 10:18 UTC (permalink / raw)
  To: Hyesoo Yu, John Hubbard
  Cc: David Hildenbrand, Jaewon Kim, surenb@google.com,
	linux-mm@kvack.org, 康纪滨 (Steve Kang),
	Zhaoyang Huang

>On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>> > On 28.04.25 22:14, John Hubbard wrote:
>> > > On 4/28/25 8:17 AM, Jaewon Kim wrote:
>> > > > Hi
>> > > >
>> > > > If pin_user_pages_fast does not pin all the requested number of
>> > > > pages, then drivers calling to pin_user_pages_fast should retry
>> > > > until the gup pins all?
>> > > >
>> > >
>> > > Approaches vary, for handling partial success of pin_user_pages().
>> > >
>> > > * Many drivers unpin everything and either bail out entirely, or
>> > > retry pinning the entire original range.
>> >
>> > Hm, unpinning + trying to repin the entire range can easily result
>> > in an endless loop on persistent errors IIRC?
>> >
>>
>> I vaguely recall a limited number of retries, yes.
>>
>> thanks,
>> --
>> John Hubbard
>>
>>
>
>Hi,
>
>I'd like to report a potential issue introduced by a recent change in
>1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>
>Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>collected variable. This meant that if a CMA page was temporarily held in the
>pagevec and failed LRU isolation, it wouldn't be added to the
>movable_page_list, but the collected counter would still be incremented.
There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>
>As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>the process would be retried until migration of the CMA page succeeded.
>
>However, in the recent patch merged into mainline, the logic now only checks
>whether movable_page_list is empty, and no longer relies on the collected
>count.
>This can cause CMA pages that fail isolation to bypass retry logic and remain
>pinned.
>
>Effectively,long-term pinning is now possible for CMA pages — something that
>previously would have been avoided through repeated attempts.
>
>We've observed this behavior in practice, which has led to issues such as CMA
>allocation failures under memory pressure. This may indicate a regression in
>the logic that prevents pinning of unmovable CMA pages.
>
>I believe this warrants further discussion or possibly a fix to restore the
>intended retry behavior for pages that fail LRU isolation.
>
>Thanks,
>Hyesoo Yu.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-22 10:18 ` reply: [RFC] pin_user_pages_fast failure count increased 黄朝阳 (Zhaoyang Huang)
@ 2025-05-22 12:22   ` David Hildenbrand
       [not found]     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-05-22 12:22 UTC (permalink / raw)
  To: 黄朝阳 (Zhaoyang Huang), Hyesoo Yu,
	John Hubbard
  Cc: Jaewon Kim, surenb@google.com, linux-mm@kvack.org,
	康纪滨 (Steve Kang), Zhaoyang Huang

On 22.05.25 12:18, 黄朝阳 (Zhaoyang Huang) wrote:
>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>> Hi
>>>>>>
>>>>>> If pin_user_pages_fast does not pin all the requested number of
>>>>>> pages, then drivers calling to pin_user_pages_fast should retry
>>>>>> until the gup pins all?
>>>>>>
>>>>>
>>>>> Approaches vary, for handling partial success of pin_user_pages().
>>>>>
>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>> retry pinning the entire original range.
>>>>
>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>> in an endless loop on persistent errors IIRC?
>>>>
>>>
>>> I vaguely recall a limited number of retries, yes.
>>>
>>> thanks,
>>> --
>>> John Hubbard
>>>
>>>
>>
>> Hi,
>>
>> I'd like to report a potential issue introduced by a recent change in
>> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>>
>> Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>> collected variable. This meant that if a CMA page was temporarily held in the
>> pagevec and failed LRU isolation, it wouldn't be added to the
>> movable_page_list, but the collected counter would still be incremented.

Okay, so we'd also express that way "any longterm_pinnable page found".

> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?

Good point. Only concurrent isolation might be problematic (concurrent reclaim?).

>>
>> As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>> the process would be retried until migration of the CMA page succeeded.
>>
>> However, in the recent patch merged into mainline, the logic now only checks
>> whether movable_page_list is empty, and no longer relies on the collected
>> count.
>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>> pinned.
>>
>> Effectively,long-term pinning is now possible for CMA pages — something that
>> previously would have been avoided through repeated attempts.

Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.

Maybe something like:

diff --git a/mm/gup.c b/mm/gup.c
index 329c5f7acc7a0..58b8e40fc19ed 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
  }
  
  /*
- * Returns the number of collected folios. Return value is always >= 0.
+ * Returns whether any longterm unpinnable folio was found (if isolation
+ * fails, not all can be added to the movable_folio_list).
   */
-static void collect_longterm_unpinnable_folios(
+static bool collect_longterm_unpinnable_folios(
                 struct list_head *movable_folio_list,
                 struct pages_or_folios *pofs)
  {
+       bool drain_allow = true, any_unpinnable = false;
         struct folio *prev_folio = NULL;
-       bool drain_allow = true;
         unsigned long i;
  
         for (i = 0; i < pofs->nr_entries; i++) {
@@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
  
                 if (folio_is_longterm_pinnable(folio))
                         continue;
+               any_unpinnable = true;
  
                 if (folio_is_device_coherent(folio))
                         continue;
@@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
                                     NR_ISOLATED_ANON + folio_is_file_lru(folio),
                                     folio_nr_pages(folio));
         }
+
+       return any_unpinnable;
  }
  
  /*
@@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
  static long
  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
  {
+       bool any_unpinnable;
         LIST_HEAD(movable_folio_list);
  
-       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
         if (list_empty(&movable_folio_list))
-               return 0;
+               return any_unpinnable ? -EAGAIN : 0;
  
         return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
  }


-- 
Cheers,

David / dhildenb



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
       [not found]     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
@ 2025-05-22 13:09       ` Jaewon Kim
  2025-05-22 14:06         ` David Hildenbrand
       [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
  0 siblings, 2 replies; 27+ messages in thread
From: Jaewon Kim @ 2025-05-22 13:09 UTC (permalink / raw)
  To: zhaoyang.huang@unisoc.com, Hyesoo Yu, jhubbard@nvidia.com,
	david@redhat.com, surenb@google.com, Steve.Kang@unisoc.com,
	huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

>On 22.05.25 12:18, ?朝? (Zhaoyang Huang) wrote:
>>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> If pin_user_pages_fast does not pin all the requested number of
>>>>>>> pages, then drivers calling to pin_user_pages_fast should retry
>>>>>>> until the gup pins all?
>>>>>>>
>>>>>>
>>>>>> Approaches vary, for handling partial success of pin_user_pages().
>>>>>>
>>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>>> retry pinning the entire original range.
>>>>>
>>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>>> in an endless loop on persistent errors IIRC?
>>>>>
>>>>
>>>> I vaguely recall a limited number of retries, yes.
>>>>
>>>> thanks,
>>>> --
>>>> John Hubbard
>>>>
>>>>
>>>
>>> Hi,
>>>
>>> I'd like to report a potential issue introduced by a recent change in
>>> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>>>
>>> Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>>> collected variable. This meant that if a CMA page was temporarily held in the
>>> pagevec and failed LRU isolation, it wouldn't be added to the
>>> movable_page_list, but the collected counter would still be incremented.
>
>Okay, so we'd also express that way "any longterm_pinnable page found".
>
>> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>
>Good point. Only concurrent isolation might be problematic (concurrent reclaim?).
>
>>>
>>> As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>>> the process would be retried until migration of the CMA page succeeded.
>>>
>>> However, in the recent patch merged into mainline, the logic now only checks
>>> whether movable_page_list is empty, and no longer relies on the collected
>>> count.
>>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>>> pinned.
>>>
>>> Effectively,long-term pinning is now possible for CMA pages ? something that
>>> previously would have been avoided through repeated attempts.
>
>Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.
>
>Maybe something like:
>
>diff --git a/mm/gup.c b/mm/gup.c
>index 329c5f7acc7a0..58b8e40fc19ed 100644
>--- a/mm/gup.c
>+++ b/mm/gup.c
>@@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>  }
>
>  /*
>- * Returns the number of collected folios. Return value is always >= 0.
>+ * Returns whether any longterm unpinnable folio was found (if isolation
>+ * fails, not all can be added to the movable_folio_list).
>   */
>-static void collect_longterm_unpinnable_folios(
>+static bool collect_longterm_unpinnable_folios(
>                 struct list_head *movable_folio_list,
>                 struct pages_or_folios *pofs)
>  {
>+       bool drain_allow = true, any_unpinnable = false;
>         struct folio *prev_folio = NULL;
>-       bool drain_allow = true;
>         unsigned long i;
>
>         for (i = 0; i < pofs->nr_entries; i++) {
>@@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
>
>                 if (folio_is_longterm_pinnable(folio))
>                         continue;
>+               any_unpinnable = true;
>
>                 if (folio_is_device_coherent(folio))
>                         continue;
>@@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
>                                     NR_ISOLATED_ANON + folio_is_file_lru(folio),
>                                     folio_nr_pages(folio));
>         }
>+
>+       return any_unpinnable;
>  }
>
>  /*
>@@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>  static long
>  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>  {
>+       bool any_unpinnable;
>         LIST_HEAD(movable_folio_list);
>
>-       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>+       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>         if (list_empty(&movable_folio_list))
>-               return 0;
>+               return any_unpinnable ? -EAGAIN : 0;
>
>         return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>  }
>
>
>--
>Cheers,
>
>David / dhildenb

Hi

Thank you for your comment and patch.
By the way, what if there are any_unpinnable pages and also pages in the movable_folio_list,
but migrate_longterm_unpinnable_folios failed to migrate and return other erros instead of -EAGAIN?
In that case, I think the CMA or other unpinnable pages would be pinned.

I think we need to return -EGAIN if we have any_unpinnable.

BR
Jaewon Kim


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-22 13:09       ` Jaewon Kim
@ 2025-05-22 14:06         ` David Hildenbrand
       [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
  1 sibling, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2025-05-22 14:06 UTC (permalink / raw)
  To: jaewon31.kim, zhaoyang.huang@unisoc.com, Hyesoo Yu,
	jhubbard@nvidia.com, surenb@google.com, Steve.Kang@unisoc.com,
	huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

On 22.05.25 15:09, Jaewon Kim wrote:
>> On 22.05.25 12:18, ?朝? (Zhaoyang Huang) wrote:
>>>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> If pin_user_pages_fast does not pin all the requested number of
>>>>>>>> pages, then drivers calling to pin_user_pages_fast should retry
>>>>>>>> until the gup pins all?
>>>>>>>>
>>>>>>>
>>>>>>> Approaches vary, for handling partial success of pin_user_pages().
>>>>>>>
>>>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>>>> retry pinning the entire original range.
>>>>>>
>>>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>>>> in an endless loop on persistent errors IIRC?
>>>>>>
>>>>>
>>>>> I vaguely recall a limited number of retries, yes.
>>>>>
>>>>> thanks,
>>>>> --
>>>>> John Hubbard
>>>>>
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'd like to report a potential issue introduced by a recent change in
>>>> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>>>>
>>>> Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>>>> collected variable. This meant that if a CMA page was temporarily held in the
>>>> pagevec and failed LRU isolation, it wouldn't be added to the
>>>> movable_page_list, but the collected counter would still be incremented.
>>
>> Okay, so we'd also express that way "any longterm_pinnable page found".
>>
>>> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>>
>> Good point. Only concurrent isolation might be problematic (concurrent reclaim?).
>>
>>>>
>>>> As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>>>> the process would be retried until migration of the CMA page succeeded.
>>>>
>>>> However, in the recent patch merged into mainline, the logic now only checks
>>>> whether movable_page_list is empty, and no longer relies on the collected
>>>> count.
>>>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>>>> pinned.
>>>>
>>>> Effectively,long-term pinning is now possible for CMA pages ? something that
>>>> previously would have been avoided through repeated attempts.
>>
>> Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.
>>
>> Maybe something like:
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 329c5f7acc7a0..58b8e40fc19ed 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>   }
>>
>>   /*
>> - * Returns the number of collected folios. Return value is always >= 0.
>> + * Returns whether any longterm unpinnable folio was found (if isolation
>> + * fails, not all can be added to the movable_folio_list).
>>    */
>> -static void collect_longterm_unpinnable_folios(
>> +static bool collect_longterm_unpinnable_folios(
>>                  struct list_head *movable_folio_list,
>>                  struct pages_or_folios *pofs)
>>   {
>> +       bool drain_allow = true, any_unpinnable = false;
>>          struct folio *prev_folio = NULL;
>> -       bool drain_allow = true;
>>          unsigned long i;
>>
>>          for (i = 0; i < pofs->nr_entries; i++) {
>> @@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
>>
>>                  if (folio_is_longterm_pinnable(folio))
>>                          continue;
>> +               any_unpinnable = true;
>>
>>                  if (folio_is_device_coherent(folio))
>>                          continue;
>> @@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
>>                                      NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>                                      folio_nr_pages(folio));
>>          }
>> +
>> +       return any_unpinnable;
>>   }
>>
>>   /*
>> @@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>>   static long
>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>   {
>> +       bool any_unpinnable;
>>          LIST_HEAD(movable_folio_list);
>>
>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>          if (list_empty(&movable_folio_list))
>> -               return 0;
>> +               return any_unpinnable ? -EAGAIN : 0;
>>
>>          return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>   }
>>
>>
>> --
>> Cheers,
>>
>> David / dhildenb
> 
> Hi
> 
> Thank you for your comment and patch.
> By the way, what if there are any_unpinnable pages and also pages in the movable_folio_list,
> but migrate_longterm_unpinnable_folios failed to migrate and return other erros instead of -EAGAIN?
> In that case, I think the CMA or other unpinnable pages would be pinned.

Oh, I think what we have to do is call pofs_unpin(pofs)() in case we 
return with -EAGAIN early. That's what 
migrate_longterm_unpinnable_folios() would do.

If we happen to call migrate_longterm_unpinnable_folios(), it would 
already unpin all folios by calling pofs_unpin(pofs). So we can just 
return whatever error we actually saw.

Do you have the capacity to send a proper fix?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* reply: [RFC] pin_user_pages_fast failure count increased
       [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
@ 2025-05-22 14:44           ` 김재원
  2025-05-22 15:07             ` David Hildenbrand
  2025-05-23  2:37           ` 김재원
  1 sibling, 1 reply; 27+ messages in thread
From: 김재원 @ 2025-05-22 14:44 UTC (permalink / raw)
  To: David Hildenbrand, zhaoyang.huang@unisoc.com,
	유혜수, jhubbard@nvidia.com, surenb@google.com,
	Steve.Kang@unisoc.com, huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

>On 22.05.25 15:09, Jaewon Kim wrote:
>>> On 22.05.25 12:18, ?朝? (Zhaoyang Huang) wrote:
>>>>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>>>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> If pin_user_pages_fast does not pin all the requested number of
>>>>>>>>> pages, then drivers calling to pin_user_pages_fast should retry
>>>>>>>>> until the gup pins all?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Approaches vary, for handling partial success of pin_user_pages().
>>>>>>>>
>>>>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>>>>> retry pinning the entire original range.
>>>>>>>
>>>>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>>>>> in an endless loop on persistent errors IIRC?
>>>>>>>
>>>>>>
>>>>>> I vaguely recall a limited number of retries, yes.
>>>>>>
>>>>>> thanks,
>>>>>> --
>>>>>> John Hubbard
>>>>>>
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to report a potential issue introduced by a recent change in
>>>>> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>>>>>
>>>>> Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>>>>> collected variable. This meant that if a CMA page was temporarily held in the
>>>>> pagevec and failed LRU isolation, it wouldn't be added to the
>>>>> movable_page_list, but the collected counter would still be incremented.
>>>
>>> Okay, so we'd also express that way "any longterm_pinnable page found".
>>>
>>>> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>>>
>>> Good point. Only concurrent isolation might be problematic (concurrent reclaim?).
>>>
>>>>>
>>>>> As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>>>>> the process would be retried until migration of the CMA page succeeded.
>>>>>
>>>>> However, in the recent patch merged into mainline, the logic now only checks
>>>>> whether movable_page_list is empty, and no longer relies on the collected
>>>>> count.
>>>>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>>>>> pinned.
>>>>>
>>>>> Effectively,long-term pinning is now possible for CMA pages ? something that
>>>>> previously would have been avoided through repeated attempts.
>>>
>>> Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.
>>>
>>> Maybe something like:
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 329c5f7acc7a0..58b8e40fc19ed 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>>   }
>>>
>>>   /*
>>> - * Returns the number of collected folios. Return value is always >= 0.
>>> + * Returns whether any longterm unpinnable folio was found (if isolation
>>> + * fails, not all can be added to the movable_folio_list).
>>>    */
>>> -static void collect_longterm_unpinnable_folios(
>>> +static bool collect_longterm_unpinnable_folios(
>>>                  struct list_head *movable_folio_list,
>>>                  struct pages_or_folios *pofs)
>>>   {
>>> +       bool drain_allow = true, any_unpinnable = false;
>>>          struct folio *prev_folio = NULL;
>>> -       bool drain_allow = true;
>>>          unsigned long i;
>>>
>>>          for (i = 0; i < pofs->nr_entries; i++) {
>>> @@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
>>>
>>>                  if (folio_is_longterm_pinnable(folio))
>>>                          continue;
>>> +               any_unpinnable = true;
>>>
>>>                  if (folio_is_device_coherent(folio))
>>>                          continue;
>>> @@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
>>>                                      NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>>                                      folio_nr_pages(folio));
>>>          }
>>> +
>>> +       return any_unpinnable;
>>>   }
>>>
>>>   /*
>>> @@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>>>   static long
>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>   {
>>> +       bool any_unpinnable;
>>>          LIST_HEAD(movable_folio_list);
>>>
>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>          if (list_empty(&movable_folio_list))
>>> -               return 0;
>>> +               return any_unpinnable ? -EAGAIN : 0;
>>>
>>>          return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>   }
>>>
>>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>> 
>> Hi
>> 
>> Thank you for your comment and patch.
>> By the way, what if there are any_unpinnable pages and also pages in the movable_folio_list,
>> but migrate_longterm_unpinnable_folios failed to migrate and return other erros instead of -EAGAIN?
>> In that case, I think the CMA or other unpinnable pages would be pinned.
>
>Oh, I think what we have to do is call pofs_unpin(pofs)() in case we 
>return with -EAGAIN early. That's what 
>migrate_longterm_unpinnable_folios() would do.

I did not understand. Do you mean we need to call pofs_unpin in case of any_unpinnable?
I think your following code seems to be good to me.
  return any_unpinnable ? -EAGAIN : 0;

>
>If we happen to call migrate_longterm_unpinnable_folios(), it would 
>already unpin all folios by calling pofs_unpin(pofs). So we can just 
>return whatever error we actually saw.

Oh, I didn't see that pofs_unpin within migrate_longterm_unpinnable_folios, I missed it.
Then I think your previous change is good enough.

>
>Do you have the capacity to send a proper fix?

If you meant to allow us to use your code and make a patch, we will do.

BR

>
>-- 
>Cheers,
>
>David / dhildenb


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-22 14:44           ` 김재원
@ 2025-05-22 15:07             ` David Hildenbrand
  2025-05-23  2:48               ` John Hubbard
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-05-22 15:07 UTC (permalink / raw)
  To: jaewon31.kim, zhaoyang.huang@unisoc.com, 유혜수,
	jhubbard@nvidia.com, surenb@google.com, Steve.Kang@unisoc.com,
	huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

On 22.05.25 16:44, 김재원 wrote:
>> On 22.05.25 15:09, Jaewon Kim wrote:
>>>> On 22.05.25 12:18, ?朝? (Zhaoyang Huang) wrote:
>>>>>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>>>>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>>>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> If pin_user_pages_fast does not pin all the requested number of
>>>>>>>>>> pages, then drivers calling to pin_user_pages_fast should retry
>>>>>>>>>> until the gup pins all?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Approaches vary, for handling partial success of pin_user_pages().
>>>>>>>>>
>>>>>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>>>>>> retry pinning the entire original range.
>>>>>>>>
>>>>>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>>>>>> in an endless loop on persistent errors IIRC?
>>>>>>>>
>>>>>>>
>>>>>>> I vaguely recall a limited number of retries, yes.
>>>>>>>
>>>>>>> thanks,
>>>>>>> --
>>>>>>> John Hubbard
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'd like to report a potential issue introduced by a recent change in
>>>>>> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>>>>>>
>>>>>> Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>>>>>> collected variable. This meant that if a CMA page was temporarily held in the
>>>>>> pagevec and failed LRU isolation, it wouldn't be added to the
>>>>>> movable_page_list, but the collected counter would still be incremented.
>>>>
>>>> Okay, so we'd also express that way "any longterm_pinnable page found".
>>>>
>>>>> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>>>>
>>>> Good point. Only concurrent isolation might be problematic (concurrent reclaim?).
>>>>
>>>>>>
>>>>>> As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>>>>>> the process would be retried until migration of the CMA page succeeded.
>>>>>>
>>>>>> However, in the recent patch merged into mainline, the logic now only checks
>>>>>> whether movable_page_list is empty, and no longer relies on the collected
>>>>>> count.
>>>>>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>>>>>> pinned.
>>>>>>
>>>>>> Effectively,long-term pinning is now possible for CMA pages ? something that
>>>>>> previously would have been avoided through repeated attempts.
>>>>
>>>> Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.
>>>>
>>>> Maybe something like:
>>>>
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 329c5f7acc7a0..58b8e40fc19ed 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>>>    }
>>>>
>>>>    /*
>>>> - * Returns the number of collected folios. Return value is always >= 0.
>>>> + * Returns whether any longterm unpinnable folio was found (if isolation
>>>> + * fails, not all can be added to the movable_folio_list).
>>>>     */
>>>> -static void collect_longterm_unpinnable_folios(
>>>> +static bool collect_longterm_unpinnable_folios(
>>>>                   struct list_head *movable_folio_list,
>>>>                   struct pages_or_folios *pofs)
>>>>    {
>>>> +       bool drain_allow = true, any_unpinnable = false;
>>>>           struct folio *prev_folio = NULL;
>>>> -       bool drain_allow = true;
>>>>           unsigned long i;
>>>>
>>>>           for (i = 0; i < pofs->nr_entries; i++) {
>>>> @@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
>>>>
>>>>                   if (folio_is_longterm_pinnable(folio))
>>>>                           continue;
>>>> +               any_unpinnable = true;
>>>>
>>>>                   if (folio_is_device_coherent(folio))
>>>>                           continue;
>>>> @@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
>>>>                                       NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>>>                                       folio_nr_pages(folio));
>>>>           }
>>>> +
>>>> +       return any_unpinnable;
>>>>    }
>>>>
>>>>    /*
>>>> @@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>>>>    static long
>>>>    check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>>    {
>>>> +       bool any_unpinnable;
>>>>           LIST_HEAD(movable_folio_list);
>>>>
>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>           if (list_empty(&movable_folio_list))
>>>> -               return 0;
>>>> +               return any_unpinnable ? -EAGAIN : 0;
>>>>
>>>>           return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>    }
>>>>
>>>>
>>>> --
>>>> Cheers,
>>>>
>>>> David / dhildenb
>>>
>>> Hi
>>>
>>> Thank you for your comment and patch.
>>> By the way, what if there are any_unpinnable pages and also pages in the movable_folio_list,
>>> but migrate_longterm_unpinnable_folios failed to migrate and return other erros instead of -EAGAIN?
>>> In that case, I think the CMA or other unpinnable pages would be pinned.
>>
>> Oh, I think what we have to do is call pofs_unpin(pofs)() in case we
>> return with -EAGAIN early. That's what
>> migrate_longterm_unpinnable_folios() would do.
> 
> I did not understand. Do you mean we need to call pofs_unpin in case of any_unpinnable?

At least that's what we did before 1aaf8c122 when calling 
collect_longterm_unpinnable_folios() I think.

> I think your following code seems to be good to me.
>    return any_unpinnable ? -EAGAIN : 0;

I think we have to call pofs_unpin() here.

Assuming we always call migrate_longterm_unpinnable_folios() with a list 
now, we could drop the list_empty() check in there.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* reply: [RFC] pin_user_pages_fast failure count increased
       [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
  2025-05-22 14:44           ` 김재원
@ 2025-05-23  2:37           ` 김재원
  2025-05-23  2:52             ` John Hubbard
  1 sibling, 1 reply; 27+ messages in thread
From: 김재원 @ 2025-05-23  2:37 UTC (permalink / raw)
  To: David Hildenbrand, 김재원,
	zhaoyang.huang@unisoc.com, 유혜수,
	jhubbard@nvidia.com, surenb@google.com, Steve.Kang@unisoc.com,
	huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

>>> On 22.05.25 15:09, Jaewon Kim wrote:
>>>>> On 22.05.25 12:18, ?朝? (Zhaoyang Huang) wrote:
>>>>>>> On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
>>>>>>>> On 4/28/25 1:56 PM, David Hildenbrand wrote:
>>>>>>>>> On 28.04.25 22:14, John Hubbard wrote:
>>>>>>>>>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> If pin_user_pages_fast does not pin all the requested number of
>>>>>>>>>>> pages, then drivers calling to pin_user_pages_fast should retry
>>>>>>>>>>> until the gup pins all?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Approaches vary, for handling partial success of pin_user_pages().
>>>>>>>>>>
>>>>>>>>>> * Many drivers unpin everything and either bail out entirely, or
>>>>>>>>>> retry pinning the entire original range.
>>>>>>>>>
>>>>>>>>> Hm, unpinning + trying to repin the entire range can easily result
>>>>>>>>> in an endless loop on persistent errors IIRC?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I vaguely recall a limited number of retries, yes.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> --
>>>>>>>> John Hubbard
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'd like to report a potential issue introduced by a recent change in
>>>>>>> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>>>>>>>
>>>>>>> Previously, the call to migrate_longterm_unpinnable_folio() was guarded by the
>>>>>>> collected variable. This meant that if a CMA page was temporarily held in the
>>>>>>> pagevec and failed LRU isolation, it wouldn't be added to the
>>>>>>> movable_page_list, but the collected counter would still be incremented.
>>>>>
>>>>> Okay, so we'd also express that way "any longterm_pinnable page found".
>>>>>
>>>>>> There is lru_add_drain_all for dealing with this scenario, so this won't be the case, right?
>>>>>
>>>>> Good point. Only concurrent isolation might be problematic (concurrent reclaim?).
>>>>>
>>>>>>>
>>>>>>> As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN, and
>>>>>>> the process would be retried until migration of the CMA page succeeded.
>>>>>>>
>>>>>>> However, in the recent patch merged into mainline, the logic now only checks
>>>>>>> whether movable_page_list is empty, and no longer relies on the collected
>>>>>>> count.
>>>>>>> This can cause CMA pages that fail isolation to bypass retry logic and remain
>>>>>>> pinned.
>>>>>>>
>>>>>>> Effectively,long-term pinning is now possible for CMA pages ? something that
>>>>>>> previously would have been avoided through repeated attempts.
>>>>>
>>>>> Calling migrate_longterm_unpinnable_folios() when there is nothing to migrate is stupid.
>>>>>
>>>>> Maybe something like:
>>>>>
>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>> index 329c5f7acc7a0..58b8e40fc19ed 100644
>>>>> --- a/mm/gup.c
>>>>> +++ b/mm/gup.c
>>>>> @@ -2301,14 +2301,15 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>>>>    }
>>>>>
>>>>>    /*
>>>>> - * Returns the number of collected folios. Return value is always >= 0.
>>>>> + * Returns whether any longterm unpinnable folio was found (if isolation
>>>>> + * fails, not all can be added to the movable_folio_list).
>>>>>     */
>>>>> -static void collect_longterm_unpinnable_folios(
>>>>> +static bool collect_longterm_unpinnable_folios(
>>>>>                   struct list_head *movable_folio_list,
>>>>>                   struct pages_or_folios *pofs)
>>>>>    {
>>>>> +       bool drain_allow = true, any_unpinnable = false;
>>>>>           struct folio *prev_folio = NULL;
>>>>> -       bool drain_allow = true;
>>>>>           unsigned long i;
>>>>>
>>>>>           for (i = 0; i < pofs->nr_entries; i++) {
>>>>> @@ -2320,6 +2321,7 @@ static void collect_longterm_unpinnable_folios(
>>>>>
>>>>>                   if (folio_is_longterm_pinnable(folio))
>>>>>                           continue;
>>>>> +               any_unpinnable = true;
>>>>>
>>>>>                   if (folio_is_device_coherent(folio))
>>>>>                           continue;
>>>>> @@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_folios(
>>>>>                                       NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>>>>                                       folio_nr_pages(folio));
>>>>>           }
>>>>> +
>>>>> +       return any_unpinnable;
>>>>>    }
>>>>>
>>>>>    /*
>>>>> @@ -2417,11 +2421,12 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
>>>>>    static long
>>>>>    check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>>>    {
>>>>> +       bool any_unpinnable;
>>>>>           LIST_HEAD(movable_folio_list);
>>>>>
>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>>           if (list_empty(&movable_folio_list))
>>>>> -               return 0;
>>>>> +               return any_unpinnable ? -EAGAIN : 0;
>>>>>
>>>>>           return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>>    }
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>>
>>>>> David / dhildenb
>>>>
>>>> Hi
>>>>
>>>> Thank you for your comment and patch.
>>>> By the way, what if there are any_unpinnable pages and also pages in the movable_folio_list,
>>>> but migrate_longterm_unpinnable_folios failed to migrate and return other erros instead of -EAGAIN?
>>>> In that case, I think the CMA or other unpinnable pages would be pinned.
>>>
>>> Oh, I think what we have to do is call pofs_unpin(pofs)() in case we
>>> return with -EAGAIN early. That's what
>>> migrate_longterm_unpinnable_folios() would do.
>> 
>> I did not understand. Do you mean we need to call pofs_unpin in case of any_unpinnable?
>
>At least that's what we did before 1aaf8c122 when calling 
>collect_longterm_unpinnable_folios() I think.
>
>> I think your following code seems to be good to me.
>>    return any_unpinnable ? -EAGAIN : 0;
>
>I think we have to call pofs_unpin() here.
>
>Assuming we always call migrate_longterm_unpinnable_folios() with a list 
>now, we could drop the list_empty() check in there.
>

Thank for your kind explanation.

I think this is what you meant, please let me know if you have an idea to make this nicer.
We may be to able to prepare the patch next week.

 static long
 check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
 {
+       bool any_unpinnable;
        LIST_HEAD(movable_folio_list);
 
-       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
-       if (list_empty(&movable_folio_list))
-               return 0;
+       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+       if (list_empty(&movable_folio_list)) {
+               if (any_unpinnable)
+                       pofs_unpin(pofs);
+               return any_unpinnable ? -EAGAIN : 0;
+       }
 
        return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
 }


>-- 
>Cheers,
>
>David / dhildenb





^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-22 15:07             ` David Hildenbrand
@ 2025-05-23  2:48               ` John Hubbard
  0 siblings, 0 replies; 27+ messages in thread
From: John Hubbard @ 2025-05-23  2:48 UTC (permalink / raw)
  To: David Hildenbrand, jaewon31.kim, zhaoyang.huang@unisoc.com,
	유혜수, surenb@google.com,
	Steve.Kang@unisoc.com, huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

On 5/22/25 8:07 AM, David Hildenbrand wrote:
...
>>>> Thank you for your comment and patch.
>>>> By the way, what if there are any_unpinnable pages and also pages in the movable_folio_list,
>>>> but migrate_longterm_unpinnable_folios failed to migrate and return other erros instead of -EAGAIN?
>>>> In that case, I think the CMA or other unpinnable pages would be pinned.
>>>
>>> Oh, I think what we have to do is call pofs_unpin(pofs)() in case we
>>> return with -EAGAIN early. That's what
>>> migrate_longterm_unpinnable_folios() would do.
>>
>> I did not understand. Do you mean we need to call pofs_unpin in case of any_unpinnable?
> 
> At least that's what we did before 1aaf8c122 when calling 
> collect_longterm_unpinnable_folios() I think.
> 

Not from what I see in the diffs for that commit, no.

>> I think your following code seems to be good to me.
>>    return any_unpinnable ? -EAGAIN : 0;
> 
> I think we have to call pofs_unpin() here.

That seems correct, although I'd have to look a little deeper to 
understand why 1aaf8c122, and not some other commit, revealed
the problem


thanks,
-- 
John Hubbard

> 
> Assuming we always call migrate_longterm_unpinnable_folios() with a list 
> now, we could drop the list_empty() check in there.
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-23  2:37           ` 김재원
@ 2025-05-23  2:52             ` John Hubbard
  2025-05-26  7:48               ` Hyesoo Yu
  0 siblings, 1 reply; 27+ messages in thread
From: John Hubbard @ 2025-05-23  2:52 UTC (permalink / raw)
  To: jaewon31.kim, David Hildenbrand, zhaoyang.huang@unisoc.com,
	유혜수, surenb@google.com,
	Steve.Kang@unisoc.com, huangzhaoyang@gmail.com
  Cc: Jaewon Kim, linux-mm@kvack.org

On 5/22/25 7:37 PM, 김재원 wrote:
...
> I think this is what you meant, please let me know if you have an idea to make this nicer.
> We may be to able to prepare the patch next week.
> 
>  static long
>  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>  {
> +       bool any_unpinnable;
>         LIST_HEAD(movable_folio_list);
>  
> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> -       if (list_empty(&movable_folio_list))
> -               return 0;
> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> +       if (list_empty(&movable_folio_list)) {
> +               if (any_unpinnable)
> +                       pofs_unpin(pofs);

I think this is correct, although as I mentioned in the other thread,
that implies that commit 1aaf8c122918 (which didn't add nor remove 
any pof unpinning) is probably not the true or only culprit, right?

> +               return any_unpinnable ? -EAGAIN : 0;

Ha, the "?" operator almost always does more harm than good.

Here, for example, it has obscured from you the fact that any_unpinnable
is being checked twice, when you could have merged those into a single "if".

> +       }
>  
>         return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
>  }
thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-23  2:52             ` John Hubbard
@ 2025-05-26  7:48               ` Hyesoo Yu
  2025-05-26  8:05                 ` Zhaoyang Huang
  0 siblings, 1 reply; 27+ messages in thread
From: Hyesoo Yu @ 2025-05-26  7:48 UTC (permalink / raw)
  To: John Hubbard
  Cc: jaewon31.kim, David Hildenbrand, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, huangzhaoyang@gmail.com,
	Jaewon Kim, linux-mm@kvack.org, janghyuck.kim

[-- Attachment #1: Type: text/plain, Size: 2311 bytes --]

On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> On 5/22/25 7:37 PM, 김재원 wrote:
> ...
> > I think this is what you meant, please let me know if you have an idea to make this nicer.
> > We may be to able to prepare the patch next week.
> > 
> >  static long
> >  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >  {
> > +       bool any_unpinnable;
> >         LIST_HEAD(movable_folio_list);
> >  
> > -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > -       if (list_empty(&movable_folio_list))
> > -               return 0;
> > +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > +       if (list_empty(&movable_folio_list)) {
> > +               if (any_unpinnable)
> > +                       pofs_unpin(pofs);
> 
> I think this is correct, although as I mentioned in the other thread,
> that implies that commit 1aaf8c122918 (which didn't add nor remove 
> any pof unpinning) is probably not the true or only culprit, right?
> 
> > +               return any_unpinnable ? -EAGAIN : 0;
> 
> Ha, the "?" operator almost always does more harm than good.
> 
> Here, for example, it has obscured from you the fact that any_unpinnable
> is being checked twice, when you could have merged those into a single "if".
> 

Hello,

I was wondering if the original problem - an infinite loop when pages allocated by
cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
(To be honest, I'm not quite sure how such pages end up being pinned via GUP.
 Is that the expected behavior, or could it possibly indicate a bug ?)

Would it be make sense to try calling __lru_add_drain_all(false) in collect_longterm_unpinnable_folios ?
We could consider limiting the number of -EAGAIN retries to a fixed count (e.g., 5 attempts) to
avoid an infinite loop. Or perhaps in check_and_migrate_movable_pages_or_folio(),
if the list is empty but any_unpinnable is true, we should consider returning an error
instead of EAGAIN to explicitly prevent longterm pinning of CMA allocated memory.

I'd be interested to hear what others think.

Thanks,
Hyesoo Yu.


> > +       }
> >  
> >         return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >  }
> thanks,
> -- 
> John Hubbard
> 
> 

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-26  7:48               ` Hyesoo Yu
@ 2025-05-26  8:05                 ` Zhaoyang Huang
  2025-05-26  9:33                   ` Hyesoo Yu
  0 siblings, 1 reply; 27+ messages in thread
From: Zhaoyang Huang @ 2025-05-26  8:05 UTC (permalink / raw)
  To: Hyesoo Yu
  Cc: John Hubbard, jaewon31.kim, David Hildenbrand,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	janghyuck.kim

On Mon, May 26, 2025 at 3:50 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > On 5/22/25 7:37 PM, 김재원 wrote:
> > ...
> > > I think this is what you meant, please let me know if you have an idea to make this nicer.
> > > We may be to able to prepare the patch next week.
> > >
> > >  static long
> > >  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > >  {
> > > +       bool any_unpinnable;
> > >         LIST_HEAD(movable_folio_list);
> > >
> > > -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > -       if (list_empty(&movable_folio_list))
> > > -               return 0;
> > > +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > +       if (list_empty(&movable_folio_list)) {
> > > +               if (any_unpinnable)
> > > +                       pofs_unpin(pofs);
> >
> > I think this is correct, although as I mentioned in the other thread,
> > that implies that commit 1aaf8c122918 (which didn't add nor remove
> > any pof unpinning) is probably not the true or only culprit, right?
> >
> > > +               return any_unpinnable ? -EAGAIN : 0;
> >
> > Ha, the "?" operator almost always does more harm than good.
> >
> > Here, for example, it has obscured from you the fact that any_unpinnable
> > is being checked twice, when you could have merged those into a single "if".
> >
>
> Hello,
>
> I was wondering if the original problem - an infinite loop when pages allocated by
> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
>  Is that the expected behavior, or could it possibly indicate a bug ?)
The original problem arises from applying CMA as guestOS's memory
slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
You can check KVM code if you are interested.

> Would it be make sense to try calling __lru_add_drain_all(false) in collect_longterm_unpinnable_folios ?
Please be noted that not all the pages allocated from vm_ops->fault
will be added to LRU.
> We could consider limiting the number of -EAGAIN retries to a fixed count (e.g., 5 attempts) to
> avoid an infinite loop. Or perhaps in check_and_migrate_movable_pages_or_folio(),
> if the list is empty but any_unpinnable is true, we should consider returning an error
> instead of EAGAIN to explicitly prevent longterm pinning of CMA allocated memory.
>
> I'd be interested to hear what others think.
>
> Thanks,
> Hyesoo Yu.
>
>
> > > +       }
> > >
> > >         return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > >  }
> > thanks,
> > --
> > John Hubbard
> >
> >


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-26  8:05                 ` Zhaoyang Huang
@ 2025-05-26  9:33                   ` Hyesoo Yu
  2025-05-26  9:38                     ` David Hildenbrand
       [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
  0 siblings, 2 replies; 27+ messages in thread
From: Hyesoo Yu @ 2025-05-26  9:33 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: John Hubbard, jaewon31.kim, David Hildenbrand,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	janghyuck.kim

[-- Attachment #1: Type: text/plain, Size: 4215 bytes --]

On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> On Mon, May 26, 2025 at 3:50 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >
> > On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > > On 5/22/25 7:37 PM, 김재원 wrote:
> > > ...
> > > > I think this is what you meant, please let me know if you have an idea to make this nicer.
> > > > We may be to able to prepare the patch next week.
> > > >
> > > >  static long
> > > >  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > > >  {
> > > > +       bool any_unpinnable;
> > > >         LIST_HEAD(movable_folio_list);
> > > >
> > > > -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > > -       if (list_empty(&movable_folio_list))
> > > > -               return 0;
> > > > +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > > +       if (list_empty(&movable_folio_list)) {
> > > > +               if (any_unpinnable)
> > > > +                       pofs_unpin(pofs);
> > >
> > > I think this is correct, although as I mentioned in the other thread,
> > > that implies that commit 1aaf8c122918 (which didn't add nor remove
> > > any pof unpinning) is probably not the true or only culprit, right?
> > >
> > > > +               return any_unpinnable ? -EAGAIN : 0;
> > >
> > > Ha, the "?" operator almost always does more harm than good.
> > >
> > > Here, for example, it has obscured from you the fact that any_unpinnable
> > > is being checked twice, when you could have merged those into a single "if".
> > >
> >
> > Hello,
> >
> > I was wondering if the original problem - an infinite loop when pages allocated by
> > cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> >  Is that the expected behavior, or could it possibly indicate a bug ?)
> The original problem arises from applying CMA as guestOS's memory
> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> You can check KVM code if you are interested.
> 

Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
is that there are cases where GUP is used on CMA.

So does that mean pinning memory from the CMA was actually intended to succeed ? 

In other words, with the commit 1aaf8c122918, it seems that pinning the CMA and no LRU pages
no longer results in an infinite loop and instead, the pinning is now allowed to proceed
successfully. Was this the intended behavior ?

I believe the pinning of CMA pages is something that should be properly handled,
and I understand the proposed code above is also intended to address this issue.

However if -EAGAIN is returned in cases where unpinnable pages are present,
it seems the infinite loop problem could reappear.

In general, we try to avoid pinning memory from CMA, so the fact that certain CMA regions
can still be pinned seems contradictory to that principle.
(folio_is_longterm_pinnable() function returns false for CMA.)

Is there a way to distinguish between CMA regions that must not be pinned and those that
might be allowed to be pinned like KVM ?

> > Would it be make sense to try calling __lru_add_drain_all(false) in collect_longterm_unpinnable_folios ?
> Please be noted that not all the pages allocated from vm_ops->fault
> will be added to LRU.

Yes, however, in our case, where the issue occurred due to pinning a cma page, I believe
there was likely contention around lru_add_drain_all(). 

Thanks,
Regards.

> > We could consider limiting the number of -EAGAIN retries to a fixed count (e.g., 5 attempts) to
> > avoid an infinite loop. Or perhaps in check_and_migrate_movable_pages_or_folio(),
> > if the list is empty but any_unpinnable is true, we should consider returning an error
> > instead of EAGAIN to explicitly prevent longterm pinning of CMA allocated memory.
> >
> > I'd be interested to hear what others think.
> >
> > Thanks,
> > Hyesoo Yu.
> >
> >
> > > > +       }
> > > >
> > > >         return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >  }
> > > thanks,
> > > --
> > > John Hubbard
> > >
> > >
> 

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-26  9:33                   ` Hyesoo Yu
@ 2025-05-26  9:38                     ` David Hildenbrand
       [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
  1 sibling, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2025-05-26  9:38 UTC (permalink / raw)
  To: Hyesoo Yu, Zhaoyang Huang
  Cc: John Hubbard, jaewon31.kim, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, janghyuck.kim

On 26.05.25 11:33, Hyesoo Yu wrote:
> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
>> On Mon, May 26, 2025 at 3:50 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>
>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
>>>> On 5/22/25 7:37 PM, 김재원 wrote:
>>>> ...
>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
>>>>> We may be to able to prepare the patch next week.
>>>>>
>>>>>   static long
>>>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>>>   {
>>>>> +       bool any_unpinnable;
>>>>>          LIST_HEAD(movable_folio_list);
>>>>>
>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>> -       if (list_empty(&movable_folio_list))
>>>>> -               return 0;
>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>> +       if (list_empty(&movable_folio_list)) {
>>>>> +               if (any_unpinnable)
>>>>> +                       pofs_unpin(pofs);
>>>>
>>>> I think this is correct, although as I mentioned in the other thread,
>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
>>>> any pof unpinning) is probably not the true or only culprit, right?
>>>>
>>>>> +               return any_unpinnable ? -EAGAIN : 0;
>>>>
>>>> Ha, the "?" operator almost always does more harm than good.
>>>>
>>>> Here, for example, it has obscured from you the fact that any_unpinnable
>>>> is being checked twice, when you could have merged those into a single "if".
>>>>
>>>
>>> Hello,
>>>
>>> I was wondering if the original problem - an infinite loop when pages allocated by
>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
>>>   Is that the expected behavior, or could it possibly indicate a bug ?)
>> The original problem arises from applying CMA as guestOS's memory
>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
>> You can check KVM code if you are interested.
>>
> 
> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> is that there are cases where GUP is used on CMA.
> 
> So does that mean pinning memory from the CMA was actually intended to succeed ?

Careful: KVM uses ordinary GUP, not GUP-longterm.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* reply: [RFC] pin_user_pages_fast failure count increased
       [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
@ 2025-05-26 11:17                       ` Jaewon Kim
  2025-05-26 11:49                         ` Zhaoyang Huang
  0 siblings, 1 reply; 27+ messages in thread
From: Jaewon Kim @ 2025-05-26 11:17 UTC (permalink / raw)
  To: David Hildenbrand, Hyesoo Yu, Zhaoyang Huang
  Cc: John Hubbard, Jaewon Kim, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

>On 26.05.25 11:33, Hyesoo Yu wrote:
>> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
>>> On Mon, May 26, 2025 at 3:50?PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>>
>>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
>>>>> On 5/22/25 7:37 PM, 김재원 wrote:
>>>>> ...
>>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
>>>>>> We may be to able to prepare the patch next week.
>>>>>>
>>>>>>   static long
>>>>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>>>>   {
>>>>>> +       bool any_unpinnable;
>>>>>>          LIST_HEAD(movable_folio_list);
>>>>>>
>>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>>> -       if (list_empty(&movable_folio_list))
>>>>>> -               return 0;
>>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>>>>> +       if (list_empty(&movable_folio_list)) {
>>>>>> +               if (any_unpinnable)
>>>>>> +                       pofs_unpin(pofs);
>>>>>
>>>>> I think this is correct, although as I mentioned in the other thread,
>>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
>>>>> any pof unpinning) is probably not the true or only culprit, right?
>>>>>
>>>>>> +               return any_unpinnable ? -EAGAIN : 0;
>>>>>
>>>>> Ha, the "?" operator almost always does more harm than good.
>>>>>
>>>>> Here, for example, it has obscured from you the fact that any_unpinnable
>>>>> is being checked twice, when you could have merged those into a single "if".
>>>>>
>>>>
>>>> Hello,
>>>>
>>>> I was wondering if the original problem - an infinite loop when pages allocated by
>>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
>>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
>>>>   Is that the expected behavior, or could it possibly indicate a bug ?)
>>> The original problem arises from applying CMA as guestOS's memory
>>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
>>> You can check KVM code if you are interested.
>>>
>> 
>> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
>> is that there are cases where GUP is used on CMA.
>> 
>> So does that mean pinning memory from the CMA was actually intended to succeed ?
>
>Careful: KVM uses ordinary GUP, not GUP-longterm.

Hi. David and Zhaoyang

If possible, could you kindly explain the situation where the 1aaf8c122918 was addeded?
If KVM does not user FOLL_LONGTERM, then why the function,
collect_longterm_unpinnable_folios, was changed at that time?

First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
I am confused.

Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
would put the page to LRU. And other cma_alloc context or migration context, I guess,
put the pages back to LRU if there was race.

BR
Jaewon Kim

>
>-- 
>Cheers,
>
>David / dhildenb
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-26 11:17                       ` Jaewon Kim
@ 2025-05-26 11:49                         ` Zhaoyang Huang
  2025-05-28  1:23                           ` Hyesoo Yu
  0 siblings, 1 reply; 27+ messages in thread
From: Zhaoyang Huang @ 2025-05-26 11:49 UTC (permalink / raw)
  To: jaewon31.kim
  Cc: David Hildenbrand, Hyesoo Yu, John Hubbard,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	Jang-Hyuck Kim

On Mon, May 26, 2025 at 7:17 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
> >On 26.05.25 11:33, Hyesoo Yu wrote:
> >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> >>> On Mon, May 26, 2025 at 3:50?PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >>>>
> >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> >>>>> ...
> >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> >>>>>> We may be to able to prepare the patch next week.
> >>>>>>
> >>>>>>   static long
> >>>>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >>>>>>   {
> >>>>>> +       bool any_unpinnable;
> >>>>>>          LIST_HEAD(movable_folio_list);
> >>>>>>
> >>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>>>>> -       if (list_empty(&movable_folio_list))
> >>>>>> -               return 0;
> >>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>>>>> +       if (list_empty(&movable_folio_list)) {
> >>>>>> +               if (any_unpinnable)
> >>>>>> +                       pofs_unpin(pofs);
> >>>>>
> >>>>> I think this is correct, although as I mentioned in the other thread,
> >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> >>>>> any pof unpinning) is probably not the true or only culprit, right?
> >>>>>
> >>>>>> +               return any_unpinnable ? -EAGAIN : 0;
> >>>>>
> >>>>> Ha, the "?" operator almost always does more harm than good.
> >>>>>
> >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> >>>>> is being checked twice, when you could have merged those into a single "if".
> >>>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> >>>>   Is that the expected behavior, or could it possibly indicate a bug ?)
> >>> The original problem arises from applying CMA as guestOS's memory
> >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> >>> You can check KVM code if you are interested.
> >>>
> >>
> >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> >> is that there are cases where GUP is used on CMA.
> >>
> >> So does that mean pinning memory from the CMA was actually intended to succeed ?
> >
> >Careful: KVM uses ordinary GUP, not GUP-longterm.
>
> Hi. David and Zhaoyang
>
> If possible, could you kindly explain the situation where the 1aaf8c122918 was addeded?
> If KVM does not user FOLL_LONGTERM, then why the function,
> collect_longterm_unpinnable_folios, was changed at that time?
>
> First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> I am confused.
>
> Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> would put the page to LRU. And other cma_alloc context or migration context, I guess,
> put the pages back to LRU if there was race.
Actually, it is pkvm which was introduced by google in AOSP. I am
afraid I can just brief the callstack here for security reasons. The
pin_user_pages will setup the 2nd stage mapping for the hva by the
vm_ops->fault which is registered by kvm memfd driver and all PFNs are
from CMA area. The driver will keep the pages out of the LRU which hit
the original bug as it is counted but have the movable_page_list be
empty and lead to infinite loop within __gup_longterm_locked

pkvm_xxx_xxx(equal to user_mem_abort in kvm)
{
    unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
    ...
    ret = pin_user_pages(hva, 1, flags, &page);
            __gup_longterm_locked
                do {
                     nr_pinned_pages = __get_user_pages_locked(mm,
start, nr_pages,
                      pages, locked, gup_flags);
                     rc =
check_and_migrate_movable_pages(nr_pinned_pages, pages);
                 } while (rc == -EAGAIN);
}
>
> BR
> Jaewon Kim
>
> >
> >--
> >Cheers,
> >
> >David / dhildenb
> >


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-26 11:49                         ` Zhaoyang Huang
@ 2025-05-28  1:23                           ` Hyesoo Yu
  2025-05-28  2:49                             ` Zhaoyang Huang
  0 siblings, 1 reply; 27+ messages in thread
From: Hyesoo Yu @ 2025-05-28  1:23 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: jaewon31.kim, David Hildenbrand, John Hubbard,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	Jang-Hyuck Kim

[-- Attachment #1: Type: text/plain, Size: 5361 bytes --]

On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> On Mon, May 26, 2025 at 7:17 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> >
> > >On 26.05.25 11:33, Hyesoo Yu wrote:
> > >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> > >>> On Mon, May 26, 2025 at 3:50?PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> > >>>>
> > >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> > >>>>> ...
> > >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> > >>>>>> We may be to able to prepare the patch next week.
> > >>>>>>
> > >>>>>>   static long
> > >>>>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > >>>>>>   {
> > >>>>>> +       bool any_unpinnable;
> > >>>>>>          LIST_HEAD(movable_folio_list);
> > >>>>>>
> > >>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > >>>>>> -       if (list_empty(&movable_folio_list))
> > >>>>>> -               return 0;
> > >>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > >>>>>> +       if (list_empty(&movable_folio_list)) {
> > >>>>>> +               if (any_unpinnable)
> > >>>>>> +                       pofs_unpin(pofs);
> > >>>>>
> > >>>>> I think this is correct, although as I mentioned in the other thread,
> > >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> > >>>>> any pof unpinning) is probably not the true or only culprit, right?
> > >>>>>
> > >>>>>> +               return any_unpinnable ? -EAGAIN : 0;
> > >>>>>
> > >>>>> Ha, the "?" operator almost always does more harm than good.
> > >>>>>
> > >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> > >>>>> is being checked twice, when you could have merged those into a single "if".
> > >>>>>
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> > >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> > >>>>   Is that the expected behavior, or could it possibly indicate a bug ?)
> > >>> The original problem arises from applying CMA as guestOS's memory
> > >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> > >>> You can check KVM code if you are interested.
> > >>>
> > >>
> > >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> > >> is that there are cases where GUP is used on CMA.
> > >>
> > >> So does that mean pinning memory from the CMA was actually intended to succeed ?
> > >
> > >Careful: KVM uses ordinary GUP, not GUP-longterm.
> >
> > Hi. David and Zhaoyang
> >
> > If possible, could you kindly explain the situation where the 1aaf8c122918 was addeded?
> > If KVM does not user FOLL_LONGTERM, then why the function,
> > collect_longterm_unpinnable_folios, was changed at that time?
> >
> > First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> > I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> > during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> > I am confused.
> >
> > Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> > might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> > would put the page to LRU. And other cma_alloc context or migration context, I guess,
> > put the pages back to LRU if there was race.
> Actually, it is pkvm which was introduced by google in AOSP. I am
> afraid I can just brief the callstack here for security reasons. The
> pin_user_pages will setup the 2nd stage mapping for the hva by the
> vm_ops->fault which is registered by kvm memfd driver and all PFNs are
> from CMA area. The driver will keep the pages out of the LRU which hit
> the original bug as it is counted but have the movable_page_list be
> empty and lead to infinite loop within __gup_longterm_locked
> 
> pkvm_xxx_xxx(equal to user_mem_abort in kvm)
> {
>     unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
>     ...
>     ret = pin_user_pages(hva, 1, flags, &page);
>             __gup_longterm_locked
>                 do {
>                      nr_pinned_pages = __get_user_pages_locked(mm,
> start, nr_pages,
>                       pages, locked, gup_flags);
>                      rc =
> check_and_migrate_movable_pages(nr_pinned_pages, pages);
>                  } while (rc == -EAGAIN);
> }

Hello, Zhaoyang.

I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.

That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.

In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?

Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?

Thanks,
Regards.


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-28  1:23                           ` Hyesoo Yu
@ 2025-05-28  2:49                             ` Zhaoyang Huang
  2025-05-28  3:36                               ` Hyesoo Yu
  0 siblings, 1 reply; 27+ messages in thread
From: Zhaoyang Huang @ 2025-05-28  2:49 UTC (permalink / raw)
  To: Hyesoo Yu
  Cc: jaewon31.kim, David Hildenbrand, John Hubbard,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	Jang-Hyuck Kim

On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> > On Mon, May 26, 2025 at 7:17 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> > >
> > > >On 26.05.25 11:33, Hyesoo Yu wrote:
> > > >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> > > >>> On Mon, May 26, 2025 at 3:50?PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> > > >>>>
> > > >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > > >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> > > >>>>> ...
> > > >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> > > >>>>>> We may be to able to prepare the patch next week.
> > > >>>>>>
> > > >>>>>>   static long
> > > >>>>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > > >>>>>>   {
> > > >>>>>> +       bool any_unpinnable;
> > > >>>>>>          LIST_HEAD(movable_folio_list);
> > > >>>>>>
> > > >>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> -       if (list_empty(&movable_folio_list))
> > > >>>>>> -               return 0;
> > > >>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> +       if (list_empty(&movable_folio_list)) {
> > > >>>>>> +               if (any_unpinnable)
> > > >>>>>> +                       pofs_unpin(pofs);
> > > >>>>>
> > > >>>>> I think this is correct, although as I mentioned in the other thread,
> > > >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> > > >>>>> any pof unpinning) is probably not the true or only culprit, right?
> > > >>>>>
> > > >>>>>> +               return any_unpinnable ? -EAGAIN : 0;
> > > >>>>>
> > > >>>>> Ha, the "?" operator almost always does more harm than good.
> > > >>>>>
> > > >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> > > >>>>> is being checked twice, when you could have merged those into a single "if".
> > > >>>>>
> > > >>>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> > > >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > > >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> > > >>>>   Is that the expected behavior, or could it possibly indicate a bug ?)
> > > >>> The original problem arises from applying CMA as guestOS's memory
> > > >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> > > >>> You can check KVM code if you are interested.
> > > >>>
> > > >>
> > > >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> > > >> is that there are cases where GUP is used on CMA.
> > > >>
> > > >> So does that mean pinning memory from the CMA was actually intended to succeed ?
> > > >
> > > >Careful: KVM uses ordinary GUP, not GUP-longterm.
> > >
> > > Hi. David and Zhaoyang
> > >
> > > If possible, could you kindly explain the situation where the 1aaf8c122918 was addeded?
> > > If KVM does not user FOLL_LONGTERM, then why the function,
> > > collect_longterm_unpinnable_folios, was changed at that time?
> > >
> > > First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> > > I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> > > during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> > > I am confused.
> > >
> > > Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> > > might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> > > would put the page to LRU. And other cma_alloc context or migration context, I guess,
> > > put the pages back to LRU if there was race.
> > Actually, it is pkvm which was introduced by google in AOSP. I am
> > afraid I can just brief the callstack here for security reasons. The
> > pin_user_pages will setup the 2nd stage mapping for the hva by the
> > vm_ops->fault which is registered by kvm memfd driver and all PFNs are
> > from CMA area. The driver will keep the pages out of the LRU which hit
> > the original bug as it is counted but have the movable_page_list be
> > empty and lead to infinite loop within __gup_longterm_locked
> >
> > pkvm_xxx_xxx(equal to user_mem_abort in kvm)
> > {
> >     unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> >     ...
> >     ret = pin_user_pages(hva, 1, flags, &page);
> >             __gup_longterm_locked
> >                 do {
> >                      nr_pinned_pages = __get_user_pages_locked(mm,
> > start, nr_pages,
> >                       pages, locked, gup_flags);
> >                      rc =
> > check_and_migrate_movable_pages(nr_pinned_pages, pages);
> >                  } while (rc == -EAGAIN);
> > }
>
> Hello, Zhaoyang.
>
> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>
> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
That depends on the user of CMA, yes for my scenario since it worked
for the guest os. For common scenario such as the file/anon mapping,
the page will be judged as unpinnable for long-term and be migrated
out of CMA area.
>
> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
non-LRU CMA pages.
>
> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
I don't think so. pin_user_pages is an exported API which can't make
assumptions over the caller.
>
> Thanks,
> Regards.
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-28  2:49                             ` Zhaoyang Huang
@ 2025-05-28  3:36                               ` Hyesoo Yu
  2025-05-28  7:55                                 ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Hyesoo Yu @ 2025-05-28  3:36 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: jaewon31.kim, David Hildenbrand, John Hubbard,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	Jang-Hyuck Kim

[-- Attachment #1: Type: text/plain, Size: 2265 bytes --]

On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >
> > On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >
> > Hello, Zhaoyang.
> >
> > I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> > The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >
> > That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> That depends on the user of CMA, yes for my scenario since it worked
> for the guest os. For common scenario such as the file/anon mapping,
> the page will be judged as unpinnable for long-term and be migrated
> out of CMA area.

Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
pinning CMA can lead to bugs - we've encountered multiple issues because of this.

> >
> > In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> > that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> non-LRU CMA pages.

In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.

* In the CMA case: long term pins in a CMA region would unnecessarily fragment
* that region.  And so, CMA attempts to migrate the page before pinning, when
* FOLL_LONGTERM is specified.

Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?

> >
> > Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> > it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> I don't think so. pin_user_pages is an exported API which can't make
> assumptions over the caller.

My point is not to base the patch on assumptions about the caller,
but to define a clear mechanism that ensures safe behavior in the intended scenario.

For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.

Thanks,
Regards.

> >
> > Thanks,
> > Regards.
> >
> 

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-28  3:36                               ` Hyesoo Yu
@ 2025-05-28  7:55                                 ` David Hildenbrand
  2025-05-28 10:59                                   ` Zhaoyang Huang
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-05-28  7:55 UTC (permalink / raw)
  To: Hyesoo Yu, Zhaoyang Huang
  Cc: jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On 28.05.25 05:36, Hyesoo Yu wrote:
> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>
>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
>>>
>>> Hello, Zhaoyang.
>>>
>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>>>
>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
>> That depends on the user of CMA, yes for my scenario since it worked
>> for the guest os. For common scenario such as the file/anon mapping,
>> the page will be judged as unpinnable for long-term and be migrated
>> out of CMA area.
> 
> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> 

Right. We just disallow long-term pinning CMA pages, because we don't 
know who the real owner is that would be okay with long-term pinning them.

>>>
>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
>> non-LRU CMA pages.
> 
> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> 
> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> * that region.  And so, CMA attempts to migrate the page before pinning, when
> * FOLL_LONGTERM is specified.
> 
> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?

If something is unbounded in time, FOLL_LONGTERM is the right thing to use.

>>>
>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
>> I don't think so. pin_user_pages is an exported API which can't make
>> assumptions over the caller.
> 
> My point is not to base the patch on assumptions about the caller,
> but to define a clear mechanism that ensures safe behavior in the intended scenario.
> 
> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.

Not sure which exact semantics you have in mind. But failing if we would 
have to migrate might be ok. Not sure if the caller should worry about 
that, though: the caller should not have to worry about page placement 
in general.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-28  7:55                                 ` David Hildenbrand
@ 2025-05-28 10:59                                   ` Zhaoyang Huang
  2025-05-28 12:57                                     ` David Hildenbrand
  2025-06-03 13:12                                     ` David Hildenbrand
  0 siblings, 2 replies; 27+ messages in thread
From: Zhaoyang Huang @ 2025-05-28 10:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 28.05.25 05:36, Hyesoo Yu wrote:
> > On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> >> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >>>
> >>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >>>
> >>> Hello, Zhaoyang.
> >>>
> >>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> >>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >>>
> >>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> >> That depends on the user of CMA, yes for my scenario since it worked
> >> for the guest os. For common scenario such as the file/anon mapping,
> >> the page will be judged as unpinnable for long-term and be migrated
> >> out of CMA area.
> >
> > Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> > Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
> > pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> >
>
> Right. We just disallow long-term pinning CMA pages, because we don't
> know who the real owner is that would be okay with long-term pinning them.
>
> >>>
> >>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> >>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> >> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> >> non-LRU CMA pages.
> >
> > In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> >
> > * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> > * that region.  And so, CMA attempts to migrate the page before pinning, when
> > * FOLL_LONGTERM is specified.
> >
> > Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
>
> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
>
> >>>
> >>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> >>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> >> I don't think so. pin_user_pages is an exported API which can't make
> >> assumptions over the caller.
> >
> > My point is not to base the patch on assumptions about the caller,
> > but to define a clear mechanism that ensures safe behavior in the intended scenario.
> >
> > For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
>
> Not sure which exact semantics you have in mind. But failing if we would
> have to migrate might be ok. Not sure if the caller should worry about
> that, though: the caller should not have to worry about page placement
> in general.
With going over the whole thread, I think the root cause is
collect_longterm_unpinnable_folios() hit the race window between
lru_add_drain_all() and folio_isolate_lru() by chance and returned
with ret=0 which finally have the CMA page pinned, right? However, I
find the proposed patch below will fail the PKVM
scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
pages never go to LRU which will have the __gup_longterm_locked loop
in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
point is to find a way to distinguish the temporary(on the way to LRU)
and permanent CMA pages within collect_longterm_unpinnable_folios.

 static long
 check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
 {
+       bool any_unpinnable;
        LIST_HEAD(movable_folio_list);

-       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
-       if (list_empty(&movable_folio_list))
-               return 0;
+       any_unpinnable =
collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+       if (list_empty(&movable_folio_list)) {
+               if (any_unpinnable)
+                       pofs_unpin(pofs);
+               return any_unpinnable ? -EAGAIN : 0;
+       }

        return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
 }

>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-28 10:59                                   ` Zhaoyang Huang
@ 2025-05-28 12:57                                     ` David Hildenbrand
  2025-06-03 13:12                                     ` David Hildenbrand
  1 sibling, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2025-05-28 12:57 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On 28.05.25 12:59, Zhaoyang Huang wrote:
> On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 28.05.25 05:36, Hyesoo Yu wrote:
>>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
>>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>>>
>>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
>>>>>
>>>>> Hello, Zhaoyang.
>>>>>
>>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
>>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>>>>>
>>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
>>>> That depends on the user of CMA, yes for my scenario since it worked
>>>> for the guest os. For common scenario such as the file/anon mapping,
>>>> the page will be judged as unpinnable for long-term and be migrated
>>>> out of CMA area.
>>>
>>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
>>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
>>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
>>>
>>
>> Right. We just disallow long-term pinning CMA pages, because we don't
>> know who the real owner is that would be okay with long-term pinning them.
>>
>>>>>
>>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
>>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
>>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
>>>> non-LRU CMA pages.
>>>
>>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
>>>
>>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
>>> * that region.  And so, CMA attempts to migrate the page before pinning, when
>>> * FOLL_LONGTERM is specified.
>>>
>>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
>>
>> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
>>
>>>>>
>>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
>>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
>>>> I don't think so. pin_user_pages is an exported API which can't make
>>>> assumptions over the caller.
>>>
>>> My point is not to base the patch on assumptions about the caller,
>>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
>>>
>>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
>>
>> Not sure which exact semantics you have in mind. But failing if we would
>> have to migrate might be ok. Not sure if the caller should worry about
>> that, though: the caller should not have to worry about page placement
>> in general.
> With going over the whole thread, I think the root cause is
> collect_longterm_unpinnable_folios() hit the race window between
> lru_add_drain_all() and folio_isolate_lru() by chance and returned
> with ret=0 which finally have the CMA page pinned, right? However, I
> find the proposed patch below will fail the PKVM
> scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> pages never go to LRU which will have the __gup_longterm_locked loop
> in do while(ret == -EAGAIN) as it did before 1aaf8c.

For upstream that is the exact right thing to do.

CMA pages must not get longterm pinned.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-05-28 10:59                                   ` Zhaoyang Huang
  2025-05-28 12:57                                     ` David Hildenbrand
@ 2025-06-03 13:12                                     ` David Hildenbrand
  2025-06-04  1:04                                       ` Zhaoyang Huang
  1 sibling, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-06-03 13:12 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On 28.05.25 12:59, Zhaoyang Huang wrote:
> On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 28.05.25 05:36, Hyesoo Yu wrote:
>>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
>>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>>>
>>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
>>>>>
>>>>> Hello, Zhaoyang.
>>>>>
>>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
>>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>>>>>
>>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
>>>> That depends on the user of CMA, yes for my scenario since it worked
>>>> for the guest os. For common scenario such as the file/anon mapping,
>>>> the page will be judged as unpinnable for long-term and be migrated
>>>> out of CMA area.
>>>
>>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
>>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
>>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
>>>
>>
>> Right. We just disallow long-term pinning CMA pages, because we don't
>> know who the real owner is that would be okay with long-term pinning them.
>>
>>>>>
>>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
>>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
>>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
>>>> non-LRU CMA pages.
>>>
>>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
>>>
>>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
>>> * that region.  And so, CMA attempts to migrate the page before pinning, when
>>> * FOLL_LONGTERM is specified.
>>>
>>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
>>
>> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
>>
>>>>>
>>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
>>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
>>>> I don't think so. pin_user_pages is an exported API which can't make
>>>> assumptions over the caller.
>>>
>>> My point is not to base the patch on assumptions about the caller,
>>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
>>>
>>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
>>
>> Not sure which exact semantics you have in mind. But failing if we would
>> have to migrate might be ok. Not sure if the caller should worry about
>> that, though: the caller should not have to worry about page placement
>> in general.
> With going over the whole thread, I think the root cause is
> collect_longterm_unpinnable_folios() hit the race window between
> lru_add_drain_all() and folio_isolate_lru() by chance and returned
> with ret=0 which finally have the CMA page pinned, right? However, I
> find the proposed patch below will fail the PKVM
> scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> pages never go to LRU which will have the __gup_longterm_locked loop
> in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
> point is to find a way to distinguish the temporary(on the way to LRU)
> and permanent CMA pages within collect_longterm_unpinnable_folios.
> 
>   static long
>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>   {
> +       bool any_unpinnable;
>          LIST_HEAD(movable_folio_list);
> 
> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> -       if (list_empty(&movable_folio_list))
> -               return 0;
> +       any_unpinnable =
> collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> +       if (list_empty(&movable_folio_list)) {
> +               if (any_unpinnable)
> +                       pofs_unpin(pofs);
> +               return any_unpinnable ? -EAGAIN : 0;
> +       }
> 
So, what's the status of that? We should fix it upstream (*not* caring 
about controversial out-of-tree pkvm issues).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-06-03 13:12                                     ` David Hildenbrand
@ 2025-06-04  1:04                                       ` Zhaoyang Huang
  2025-06-04  9:12                                         ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Zhaoyang Huang @ 2025-06-04  1:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On Tue, Jun 3, 2025 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 28.05.25 12:59, Zhaoyang Huang wrote:
> > On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 28.05.25 05:36, Hyesoo Yu wrote:
> >>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> >>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >>>>>
> >>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >>>>>
> >>>>> Hello, Zhaoyang.
> >>>>>
> >>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> >>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >>>>>
> >>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> >>>> That depends on the user of CMA, yes for my scenario since it worked
> >>>> for the guest os. For common scenario such as the file/anon mapping,
> >>>> the page will be judged as unpinnable for long-term and be migrated
> >>>> out of CMA area.
> >>>
> >>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> >>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
> >>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> >>>
> >>
> >> Right. We just disallow long-term pinning CMA pages, because we don't
> >> know who the real owner is that would be okay with long-term pinning them.
> >>
> >>>>>
> >>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> >>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> >>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> >>>> non-LRU CMA pages.
> >>>
> >>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> >>>
> >>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> >>> * that region.  And so, CMA attempts to migrate the page before pinning, when
> >>> * FOLL_LONGTERM is specified.
> >>>
> >>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
> >>
> >> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
> >>
> >>>>>
> >>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> >>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> >>>> I don't think so. pin_user_pages is an exported API which can't make
> >>>> assumptions over the caller.
> >>>
> >>> My point is not to base the patch on assumptions about the caller,
> >>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
> >>>
> >>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
> >>
> >> Not sure which exact semantics you have in mind. But failing if we would
> >> have to migrate might be ok. Not sure if the caller should worry about
> >> that, though: the caller should not have to worry about page placement
> >> in general.
> > With going over the whole thread, I think the root cause is
> > collect_longterm_unpinnable_folios() hit the race window between
> > lru_add_drain_all() and folio_isolate_lru() by chance and returned
> > with ret=0 which finally have the CMA page pinned, right? However, I
> > find the proposed patch below will fail the PKVM
> > scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> > pages never go to LRU which will have the __gup_longterm_locked loop
> > in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
> > point is to find a way to distinguish the temporary(on the way to LRU)
> > and permanent CMA pages within collect_longterm_unpinnable_folios.
> >
> >   static long
> >   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >   {
> > +       bool any_unpinnable;
> >          LIST_HEAD(movable_folio_list);
> >
> > -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > -       if (list_empty(&movable_folio_list))
> > -               return 0;
> > +       any_unpinnable =
> > collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > +       if (list_empty(&movable_folio_list)) {
> > +               if (any_unpinnable)
> > +                       pofs_unpin(pofs);
> > +               return any_unpinnable ? -EAGAIN : 0;
> > +       }
> >
> So, what's the status of that? We should fix it upstream (*not* caring
> about controversial out-of-tree pkvm issues).
Leaving aside the pkvm issue, we should also care about the CMA pages
mapping to VM by special driver which are intended to be long term
pinned (actually they are fetched by cma_alloc and then mapped to VM
instead of alloc_pages during normal page fault). Could we distinguish
them by the patch below based on 1aaf8c122, that is, this kind of
pages is not on page cache and have equaled refcnt to mapcount

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf55206935c4..1ae251cd194a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2099,7 +2099,13 @@ static inline bool
folio_is_longterm_pinnable(struct folio *folio)
 #ifdef CONFIG_CMA
        int mt = folio_migratetype(folio);

-       if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
+       /*
+        * CMA pages mapping to VM by special driver may not on page
cache which has NULL folio->mapping and equal
+        * refcnt to mapcount
+        */
+       if ((mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) &&
+               (folio->mapping != NULL) &&
+               (folio_ref_count(folio) != folio_mapcount(folio)))
                return false;

>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-06-04  1:04                                       ` Zhaoyang Huang
@ 2025-06-04  9:12                                         ` David Hildenbrand
  2025-06-04  9:41                                           ` Zhaoyang Huang
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-06-04  9:12 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On 04.06.25 03:04, Zhaoyang Huang wrote:
> On Tue, Jun 3, 2025 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 28.05.25 12:59, Zhaoyang Huang wrote:
>>> On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 28.05.25 05:36, Hyesoo Yu wrote:
>>>>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
>>>>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>>>>>
>>>>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
>>>>>>>
>>>>>>> Hello, Zhaoyang.
>>>>>>>
>>>>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
>>>>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>>>>>>>
>>>>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
>>>>>> That depends on the user of CMA, yes for my scenario since it worked
>>>>>> for the guest os. For common scenario such as the file/anon mapping,
>>>>>> the page will be judged as unpinnable for long-term and be migrated
>>>>>> out of CMA area.
>>>>>
>>>>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
>>>>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
>>>>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
>>>>>
>>>>
>>>> Right. We just disallow long-term pinning CMA pages, because we don't
>>>> know who the real owner is that would be okay with long-term pinning them.
>>>>
>>>>>>>
>>>>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
>>>>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
>>>>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
>>>>>> non-LRU CMA pages.
>>>>>
>>>>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
>>>>>
>>>>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
>>>>> * that region.  And so, CMA attempts to migrate the page before pinning, when
>>>>> * FOLL_LONGTERM is specified.
>>>>>
>>>>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
>>>>
>>>> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
>>>>
>>>>>>>
>>>>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
>>>>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
>>>>>> I don't think so. pin_user_pages is an exported API which can't make
>>>>>> assumptions over the caller.
>>>>>
>>>>> My point is not to base the patch on assumptions about the caller,
>>>>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
>>>>>
>>>>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
>>>>
>>>> Not sure which exact semantics you have in mind. But failing if we would
>>>> have to migrate might be ok. Not sure if the caller should worry about
>>>> that, though: the caller should not have to worry about page placement
>>>> in general.
>>> With going over the whole thread, I think the root cause is
>>> collect_longterm_unpinnable_folios() hit the race window between
>>> lru_add_drain_all() and folio_isolate_lru() by chance and returned
>>> with ret=0 which finally have the CMA page pinned, right? However, I
>>> find the proposed patch below will fail the PKVM
>>> scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
>>> pages never go to LRU which will have the __gup_longterm_locked loop
>>> in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
>>> point is to find a way to distinguish the temporary(on the way to LRU)
>>> and permanent CMA pages within collect_longterm_unpinnable_folios.
>>>
>>>    static long
>>>    check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>>>    {
>>> +       bool any_unpinnable;
>>>           LIST_HEAD(movable_folio_list);
>>>
>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>> -       if (list_empty(&movable_folio_list))
>>> -               return 0;
>>> +       any_unpinnable =
>>> collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>>> +       if (list_empty(&movable_folio_list)) {
>>> +               if (any_unpinnable)
>>> +                       pofs_unpin(pofs);
>>> +               return any_unpinnable ? -EAGAIN : 0;
>>> +       }
>>>
>> So, what's the status of that? We should fix it upstream (*not* caring
>> about controversial out-of-tree pkvm issues).
> Leaving aside the pkvm issue, we should also care about the CMA pages
> mapping to VM by special driver which are intended to be long term
> pinned (actually they are fetched by cma_alloc and then mapped to VM
> instead of alloc_pages during normal page fault).

Is there any such "special driver" in the tree?

> Could we distinguish
> them by the patch below based on 1aaf8c122, that is, this kind of
> pages is not on page cache and have equaled refcnt to mapcount

No, not like that. We'd need some proper indication that this page was 
allocated by the CMA area owner, and that owner agrees that the folio 
can be long-term pinned (maybe that agreement is by mapping it into user 
space, tbd).

Will you send the fix or should I do it? Discussing about broken use 
cases that do no apply upstream is not particularly helpful when we're 
dealing with a real upstream bug.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-06-04  9:12                                         ` David Hildenbrand
@ 2025-06-04  9:41                                           ` Zhaoyang Huang
  2025-06-04  9:48                                             ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Zhaoyang Huang @ 2025-06-04  9:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

On Wed, Jun 4, 2025 at 5:12 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.06.25 03:04, Zhaoyang Huang wrote:
> > On Tue, Jun 3, 2025 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 28.05.25 12:59, Zhaoyang Huang wrote:
> >>> On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 28.05.25 05:36, Hyesoo Yu wrote:
> >>>>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> >>>>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >>>>>>>
> >>>>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >>>>>>>
> >>>>>>> Hello, Zhaoyang.
> >>>>>>>
> >>>>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> >>>>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >>>>>>>
> >>>>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> >>>>>> That depends on the user of CMA, yes for my scenario since it worked
> >>>>>> for the guest os. For common scenario such as the file/anon mapping,
> >>>>>> the page will be judged as unpinnable for long-term and be migrated
> >>>>>> out of CMA area.
> >>>>>
> >>>>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> >>>>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
> >>>>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> >>>>>
> >>>>
> >>>> Right. We just disallow long-term pinning CMA pages, because we don't
> >>>> know who the real owner is that would be okay with long-term pinning them.
> >>>>
> >>>>>>>
> >>>>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> >>>>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> >>>>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> >>>>>> non-LRU CMA pages.
> >>>>>
> >>>>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> >>>>>
> >>>>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> >>>>> * that region.  And so, CMA attempts to migrate the page before pinning, when
> >>>>> * FOLL_LONGTERM is specified.
> >>>>>
> >>>>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
> >>>>
> >>>> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
> >>>>
> >>>>>>>
> >>>>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> >>>>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> >>>>>> I don't think so. pin_user_pages is an exported API which can't make
> >>>>>> assumptions over the caller.
> >>>>>
> >>>>> My point is not to base the patch on assumptions about the caller,
> >>>>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
> >>>>>
> >>>>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
> >>>>
> >>>> Not sure which exact semantics you have in mind. But failing if we would
> >>>> have to migrate might be ok. Not sure if the caller should worry about
> >>>> that, though: the caller should not have to worry about page placement
> >>>> in general.
> >>> With going over the whole thread, I think the root cause is
> >>> collect_longterm_unpinnable_folios() hit the race window between
> >>> lru_add_drain_all() and folio_isolate_lru() by chance and returned
> >>> with ret=0 which finally have the CMA page pinned, right? However, I
> >>> find the proposed patch below will fail the PKVM
> >>> scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> >>> pages never go to LRU which will have the __gup_longterm_locked loop
> >>> in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
> >>> point is to find a way to distinguish the temporary(on the way to LRU)
> >>> and permanent CMA pages within collect_longterm_unpinnable_folios.
> >>>
> >>>    static long
> >>>    check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >>>    {
> >>> +       bool any_unpinnable;
> >>>           LIST_HEAD(movable_folio_list);
> >>>
> >>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>> -       if (list_empty(&movable_folio_list))
> >>> -               return 0;
> >>> +       any_unpinnable =
> >>> collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>> +       if (list_empty(&movable_folio_list)) {
> >>> +               if (any_unpinnable)
> >>> +                       pofs_unpin(pofs);
> >>> +               return any_unpinnable ? -EAGAIN : 0;
> >>> +       }
> >>>
> >> So, what's the status of that? We should fix it upstream (*not* caring
> >> about controversial out-of-tree pkvm issues).
> > Leaving aside the pkvm issue, we should also care about the CMA pages
> > mapping to VM by special driver which are intended to be long term
> > pinned (actually they are fetched by cma_alloc and then mapped to VM
> > instead of alloc_pages during normal page fault).
>
> Is there any such "special driver" in the tree?
Not that I know of. However, pin_user_pages is exported symbol which
could be used for ko, should we make it be capable of dealing with
this scenario?
>
> > Could we distinguish
> > them by the patch below based on 1aaf8c122, that is, this kind of
> > pages is not on page cache and have equaled refcnt to mapcount
>
> No, not like that. We'd need some proper indication that this page was
> allocated by the CMA area owner, and that owner agrees that the folio
> can be long-term pinned (maybe that agreement is by mapping it into user
> space, tbd).
I think the key point is to distinguish the cma pages which are
allocated from fallback of GFP_MOVABLE during common page faults from
the ones which got from cma_alloc within the special driver's
vm_ops->fault.
>
> Will you send the fix or should I do it? Discussing about broken use
> cases that do no apply upstream is not particularly helpful when we're
> dealing with a real upstream bug.
I would like to ask for your help on this since I have no further ideas. Thanks
>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
  2025-06-04  9:41                                           ` Zhaoyang Huang
@ 2025-06-04  9:48                                             ` David Hildenbrand
       [not found]                                               ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2025-06-04  9:48 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: Hyesoo Yu, jaewon31.kim, John Hubbard, zhaoyang.huang@unisoc.com,
	surenb@google.com, Steve.Kang@unisoc.com, Jaewon Kim,
	linux-mm@kvack.org, Jang-Hyuck Kim

>>>>>
>>>> So, what's the status of that? We should fix it upstream (*not* caring
>>>> about controversial out-of-tree pkvm issues).
>>> Leaving aside the pkvm issue, we should also care about the CMA pages
>>> mapping to VM by special driver which are intended to be long term
>>> pinned (actually they are fetched by cma_alloc and then mapped to VM
>>> instead of alloc_pages during normal page fault).
>>
>> Is there any such "special driver" in the tree?
> Not that I know of. However, pin_user_pages is exported symbol which
> could be used for ko, should we make it be capable of dealing with
> this scenario?

We have the tendency to not go above and beyond of adding features that 
we cannot even test without OOT drivers.

Note that we export symbols so other in-tree drivers that are built as a 
module can make use of them.

>>
>>> Could we distinguish
>>> them by the patch below based on 1aaf8c122, that is, this kind of
>>> pages is not on page cache and have equaled refcnt to mapcount
>>
>> No, not like that. We'd need some proper indication that this page was
>> allocated by the CMA area owner, and that owner agrees that the folio
>> can be long-term pinned (maybe that agreement is by mapping it into user
>> space, tbd).
> I think the key point is to distinguish the cma pages which are
> allocated from fallback of GFP_MOVABLE during common page faults from
> the ones which got from cma_alloc within the special driver's
> vm_ops->fault.

Yes. Using typed pages in the future might work. For now, this is not 
possible yet because the page type overlays page->mapcount.

Hm.

>>
>> Will you send the fix or should I do it? Discussing about broken use
>> cases that do no apply upstream is not particularly helpful when we're
>> dealing with a real upstream bug.
> I would like to ask for your help on this since I have no further ideas. Thanks

Okay, let me send a fix for the original commit.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: reply: [RFC] pin_user_pages_fast failure count increased
       [not found]                                               ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
@ 2025-06-04  9:53                                                 ` Hyesoo Yu
  0 siblings, 0 replies; 27+ messages in thread
From: Hyesoo Yu @ 2025-06-04  9:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zhaoyang Huang, jaewon31.kim, John Hubbard,
	zhaoyang.huang@unisoc.com, surenb@google.com,
	Steve.Kang@unisoc.com, Jaewon Kim, linux-mm@kvack.org,
	Jang-Hyuck Kim

[-- Attachment #1: Type: text/plain, Size: 2294 bytes --]

On Wed, Jun 04, 2025 at 11:48:54AM +0200, David Hildenbrand wrote:
> > > > > > 
> > > > > So, what's the status of that? We should fix it upstream (*not* caring
> > > > > about controversial out-of-tree pkvm issues).
> > > > Leaving aside the pkvm issue, we should also care about the CMA pages
> > > > mapping to VM by special driver which are intended to be long term
> > > > pinned (actually they are fetched by cma_alloc and then mapped to VM
> > > > instead of alloc_pages during normal page fault).
> > > 
> > > Is there any such "special driver" in the tree?
> > Not that I know of. However, pin_user_pages is exported symbol which
> > could be used for ko, should we make it be capable of dealing with
> > this scenario?
> 
> We have the tendency to not go above and beyond of adding features that we
> cannot even test without OOT drivers.
> 
> Note that we export symbols so other in-tree drivers that are built as a
> module can make use of them.
> 
> > > 
> > > > Could we distinguish
> > > > them by the patch below based on 1aaf8c122, that is, this kind of
> > > > pages is not on page cache and have equaled refcnt to mapcount
> > > 
> > > No, not like that. We'd need some proper indication that this page was
> > > allocated by the CMA area owner, and that owner agrees that the folio
> > > can be long-term pinned (maybe that agreement is by mapping it into user
> > > space, tbd).
> > I think the key point is to distinguish the cma pages which are
> > allocated from fallback of GFP_MOVABLE during common page faults from
> > the ones which got from cma_alloc within the special driver's
> > vm_ops->fault.
> 
> Yes. Using typed pages in the future might work. For now, this is not
> possible yet because the page type overlays page->mapcount.
> 
> Hm.
> 
> > > 
> > > Will you send the fix or should I do it? Discussing about broken use
> > > cases that do no apply upstream is not particularly helpful when we're
> > > dealing with a real upstream bug.
> > I would like to ask for your help on this since I have no further ideas. Thanks
> 
> Okay, let me send a fix for the original commit.
> 

I have prepared a patch and just sent it out.
I'd appreciate it if you could take a look and share your feedback.

Thanks,
Regards.

> -- 
> Cheers,
> 
> David / dhildenb
> 
> 

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-06-04  9:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAJrd-UtDD50iN=Yxz4=6kNkAcNAtRFkxhKAbEYiRyyDT-bYPHg@mail.gmail.com>
2025-05-22 10:18 ` reply: [RFC] pin_user_pages_fast failure count increased 黄朝阳 (Zhaoyang Huang)
2025-05-22 12:22   ` David Hildenbrand
     [not found]     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
2025-05-22 13:09       ` Jaewon Kim
2025-05-22 14:06         ` David Hildenbrand
     [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
2025-05-22 14:44           ` 김재원
2025-05-22 15:07             ` David Hildenbrand
2025-05-23  2:48               ` John Hubbard
2025-05-23  2:37           ` 김재원
2025-05-23  2:52             ` John Hubbard
2025-05-26  7:48               ` Hyesoo Yu
2025-05-26  8:05                 ` Zhaoyang Huang
2025-05-26  9:33                   ` Hyesoo Yu
2025-05-26  9:38                     ` David Hildenbrand
     [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
2025-05-26 11:17                       ` Jaewon Kim
2025-05-26 11:49                         ` Zhaoyang Huang
2025-05-28  1:23                           ` Hyesoo Yu
2025-05-28  2:49                             ` Zhaoyang Huang
2025-05-28  3:36                               ` Hyesoo Yu
2025-05-28  7:55                                 ` David Hildenbrand
2025-05-28 10:59                                   ` Zhaoyang Huang
2025-05-28 12:57                                     ` David Hildenbrand
2025-06-03 13:12                                     ` David Hildenbrand
2025-06-04  1:04                                       ` Zhaoyang Huang
2025-06-04  9:12                                         ` David Hildenbrand
2025-06-04  9:41                                           ` Zhaoyang Huang
2025-06-04  9:48                                             ` David Hildenbrand
     [not found]                                               ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
2025-06-04  9:53                                                 ` Hyesoo Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).