* [PATCH v3] gup: optimize longterm pin_user_pages() for large folio
@ 2025-06-05 3:34 lizhe.67
2025-06-05 7:39 ` lizhe.67
2025-06-05 7:51 ` David Hildenbrand
0 siblings, 2 replies; 5+ messages in thread
From: lizhe.67 @ 2025-06-05 3:34 UTC (permalink / raw)
To: akpm, david, jgg, jhubbard, peterx
Cc: linux-mm, linux-kernel, dev.jain, muchun.song, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
In the current implementation of the longterm pin_user_pages() function,
we invoke the collect_longterm_unpinnable_folios() function. This function
iterates through the list to check whether each folio belongs to the
"longterm_unpinnabled" category. The folios in this list essentially
correspond to a contiguous region of user-space addresses, with each folio
representing a physical address in increments of PAGESIZE. If this
user-space address range is mapped with large folio, we can optimize the
performance of function collect_longterm_unpinnable_folios() by reducing
the using of READ_ONCE() invoked in
pofs_get_folio()->page_folio()->_compound_head(). Also, we can simplify
the logic of collect_longterm_unpinnable_folios(). Instead of comparing
with prev_folio after calling pofs_get_folio(), we can check whether the
next page is within the same folio.
The performance test results, based on v6.15, obtained through the
gup_test tool from the kernel source tree are as follows. We achieve an
improvement of over 66% for large folio with pagesize=2M. For small folio,
we have only observed a very slight degradation in performance.
Without this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
With this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
Changelogs:
v2->v3:
- Update performance test data based on v6.15.
- Refine the description of the optimization approach in commit message.
- Fix some issues of code formatting.
- Fine-tune the conditions for entering the optimization path.
v1->v2:
- Modify some unreliable code.
- Update performance test data.
v2 patch: https://lore.kernel.org/all/20250604031536.9053-1-lizhe.67@bytedance.com/
v1 patch: https://lore.kernel.org/all/20250530092351.32709-1-lizhe.67@bytedance.com/
mm/gup.c | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 84461d384ae2..9fbe3592b5fc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2317,6 +2317,31 @@ static void pofs_unpin(struct pages_or_folios *pofs)
unpin_user_pages(pofs->pages, pofs->nr_entries);
}
+static struct folio *pofs_next_folio(struct folio *folio,
+ struct pages_or_folios *pofs, long *index_ptr)
+{
+ long i = *index_ptr + 1;
+
+ if (!pofs->has_folios && folio_test_large(folio)) {
+ const unsigned long start_pfn = folio_pfn(folio);
+ const unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
+
+ for (; i < pofs->nr_entries; i++) {
+ unsigned long pfn = page_to_pfn(pofs->pages[i]);
+
+ /* Is this page part of this folio? */
+ if (pfn < start_pfn || pfn >= end_pfn)
+ break;
+ }
+ }
+
+ if (unlikely(i == pofs->nr_entries))
+ return NULL;
+ *index_ptr = i;
+
+ return pofs_get_folio(pofs, i);
+}
+
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
@@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
struct list_head *movable_folio_list,
struct pages_or_folios *pofs)
{
- struct folio *prev_folio = NULL;
bool drain_allow = true;
- unsigned long i;
-
- for (i = 0; i < pofs->nr_entries; i++) {
- struct folio *folio = pofs_get_folio(pofs, i);
+ struct folio *folio;
+ long i = 0;
- if (folio == prev_folio)
- continue;
- prev_folio = folio;
+ for (folio = pofs_get_folio(pofs, i); folio;
+ folio = pofs_next_folio(folio, pofs, &i)) {
if (folio_is_longterm_pinnable(folio))
continue;
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3] gup: optimize longterm pin_user_pages() for large folio
2025-06-05 3:34 [PATCH v3] gup: optimize longterm pin_user_pages() for large folio lizhe.67
@ 2025-06-05 7:39 ` lizhe.67
2025-06-05 7:47 ` David Hildenbrand
2025-06-05 7:51 ` David Hildenbrand
1 sibling, 1 reply; 5+ messages in thread
From: lizhe.67 @ 2025-06-05 7:39 UTC (permalink / raw)
To: david
Cc: akpm, dev.jain, jgg, jhubbard, linux-kernel, linux-mm,
muchun.song, peterx, lizhe.67
On Thu, 5 Jun 2025 11:34:30 +0800, lizhe.67@bytedance.com wrote:
> @@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
> struct list_head *movable_folio_list,
> struct pages_or_folios *pofs)
> {
> - struct folio *prev_folio = NULL;
> bool drain_allow = true;
> - unsigned long i;
> -
> - for (i = 0; i < pofs->nr_entries; i++) {
> - struct folio *folio = pofs_get_folio(pofs, i);
> + struct folio *folio;
> + long i = 0;
>
> - if (folio == prev_folio)
> - continue;
> - prev_folio = folio;
> + for (folio = pofs_get_folio(pofs, i); folio;
> + folio = pofs_next_folio(folio, pofs, &i)) {
Hi David, I used three tabs for indentation here, but it doesn't seem to be
the effect you wanted. Did you mean that the indentation could be achieved
through tags and spaces, so that the folio on this line would be positioned
directly below the folio on the previous line, but offset to the right by
one space?
Thanks,
Zhe
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3] gup: optimize longterm pin_user_pages() for large folio
2025-06-05 7:39 ` lizhe.67
@ 2025-06-05 7:47 ` David Hildenbrand
0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2025-06-05 7:47 UTC (permalink / raw)
To: lizhe.67
Cc: akpm, dev.jain, jgg, jhubbard, linux-kernel, linux-mm,
muchun.song, peterx
On 05.06.25 09:39, lizhe.67@bytedance.com wrote:
> On Thu, 5 Jun 2025 11:34:30 +0800, lizhe.67@bytedance.com wrote:
>
>> @@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
>> struct list_head *movable_folio_list,
>> struct pages_or_folios *pofs)
>> {
>> - struct folio *prev_folio = NULL;
>> bool drain_allow = true;
>> - unsigned long i;
>> -
>> - for (i = 0; i < pofs->nr_entries; i++) {
>> - struct folio *folio = pofs_get_folio(pofs, i);
>> + struct folio *folio;
>> + long i = 0;
>>
>> - if (folio == prev_folio)
>> - continue;
>> - prev_folio = folio;
>> + for (folio = pofs_get_folio(pofs, i); folio;
>> + folio = pofs_next_folio(folio, pofs, &i)) {
>
> Hi David, I used three tabs for indentation here, but it doesn't seem to be
> the effect you wanted. Did you mean that the indentation could be achieved
> through tags and spaces, so that the folio on this line would be positioned
> directly below the folio on the previous line
Yes, that's what we do for these cases.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3] gup: optimize longterm pin_user_pages() for large folio
2025-06-05 3:34 [PATCH v3] gup: optimize longterm pin_user_pages() for large folio lizhe.67
2025-06-05 7:39 ` lizhe.67
@ 2025-06-05 7:51 ` David Hildenbrand
2025-06-05 8:23 ` lizhe.67
1 sibling, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2025-06-05 7:51 UTC (permalink / raw)
To: lizhe.67, akpm, jgg, jhubbard, peterx
Cc: linux-mm, linux-kernel, dev.jain, muchun.song
On 05.06.25 05:34, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function collect_longterm_unpinnable_folios() by reducing
> the using of READ_ONCE() invoked in
> pofs_get_folio()->page_folio()->_compound_head(). Also, we can simplify
> the logic of collect_longterm_unpinnable_folios(). Instead of comparing
> with prev_folio after calling pofs_get_folio(), we can check whether the
> next page is within the same folio.
>
> The performance test results, based on v6.15, obtained through the
> gup_test tool from the kernel source tree are as follows. We achieve an
> improvement of over 66% for large folio with pagesize=2M. For small folio,
> we have only observed a very slight degradation in performance.
>
> Without this patch:
>
> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> With this patch:
>
> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> Changelogs:
>
> v2->v3:
> - Update performance test data based on v6.15.
> - Refine the description of the optimization approach in commit message.
> - Fix some issues of code formatting.
> - Fine-tune the conditions for entering the optimization path.
>
> v1->v2:
> - Modify some unreliable code.
> - Update performance test data.
>
> v2 patch: https://lore.kernel.org/all/20250604031536.9053-1-lizhe.67@bytedance.com/
> v1 patch: https://lore.kernel.org/all/20250530092351.32709-1-lizhe.67@bytedance.com/
>
> mm/gup.c | 37 +++++++++++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 84461d384ae2..9fbe3592b5fc 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2317,6 +2317,31 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> unpin_user_pages(pofs->pages, pofs->nr_entries);
> }
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
^ use two tabs here
> +{
> + long i = *index_ptr + 1;
> +
> + if (!pofs->has_folios && folio_test_large(folio)) {
> + const unsigned long start_pfn = folio_pfn(folio);
> + const unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
> +
> + for (; i < pofs->nr_entries; i++) {
> + unsigned long pfn = page_to_pfn(pofs->pages[i]);
> +
> + /* Is this page part of this folio? */
> + if (pfn < start_pfn || pfn >= end_pfn)
> + break;
> + }
> + }
> +
> + if (unlikely(i == pofs->nr_entries))
> + return NULL;
> + *index_ptr = i;
> +
> + return pofs_get_folio(pofs, i);
> +}
> +
> /*
> * Returns the number of collected folios. Return value is always >= 0.
> */
> @@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
> struct list_head *movable_folio_list,
> struct pages_or_folios *pofs)
> {
> - struct folio *prev_folio = NULL;
> bool drain_allow = true;
> - unsigned long i;
> -
> - for (i = 0; i < pofs->nr_entries; i++) {
> - struct folio *folio = pofs_get_folio(pofs, i);
> + struct folio *folio;
> + long i = 0;
>
> - if (folio == prev_folio)
> - continue;
> - prev_folio = folio;
> + for (folio = pofs_get_folio(pofs, i); folio;
> + folio = pofs_next_folio(folio, pofs, &i)) {
As discussed, align both "folios" (using tabs and then spaces)
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3] gup: optimize longterm pin_user_pages() for large folio
2025-06-05 7:51 ` David Hildenbrand
@ 2025-06-05 8:23 ` lizhe.67
0 siblings, 0 replies; 5+ messages in thread
From: lizhe.67 @ 2025-06-05 8:23 UTC (permalink / raw)
To: david
Cc: akpm, dev.jain, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx
On Thu, 5 Jun 2025 09:51:06 +0200, david@redhat.com wrote:
> On 05.06.25 05:34, lizhe.67@bytedance.com wrote:
> > From: Li Zhe <lizhe.67@bytedance.com>
> >
> > In the current implementation of the longterm pin_user_pages() function,
> > we invoke the collect_longterm_unpinnable_folios() function. This function
> > iterates through the list to check whether each folio belongs to the
> > "longterm_unpinnabled" category. The folios in this list essentially
> > correspond to a contiguous region of user-space addresses, with each folio
> > representing a physical address in increments of PAGESIZE. If this
> > user-space address range is mapped with large folio, we can optimize the
> > performance of function collect_longterm_unpinnable_folios() by reducing
> > the using of READ_ONCE() invoked in
> > pofs_get_folio()->page_folio()->_compound_head(). Also, we can simplify
> > the logic of collect_longterm_unpinnable_folios(). Instead of comparing
> > with prev_folio after calling pofs_get_folio(), we can check whether the
> > next page is within the same folio.
> >
> > The performance test results, based on v6.15, obtained through the
> > gup_test tool from the kernel source tree are as follows. We achieve an
> > improvement of over 66% for large folio with pagesize=2M. For small folio,
> > we have only observed a very slight degradation in performance.
> >
> > Without this patch:
> >
> > [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >
> > With this patch:
> >
> > [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > ---
> > Changelogs:
> >
> > v2->v3:
> > - Update performance test data based on v6.15.
> > - Refine the description of the optimization approach in commit message.
> > - Fix some issues of code formatting.
> > - Fine-tune the conditions for entering the optimization path.
> >
> > v1->v2:
> > - Modify some unreliable code.
> > - Update performance test data.
> >
> > v2 patch: https://lore.kernel.org/all/20250604031536.9053-1-lizhe.67@bytedance.com/
> > v1 patch: https://lore.kernel.org/all/20250530092351.32709-1-lizhe.67@bytedance.com/
> >
> > mm/gup.c | 37 +++++++++++++++++++++++++++++--------
> > 1 file changed, 29 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 84461d384ae2..9fbe3592b5fc 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2317,6 +2317,31 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> > unpin_user_pages(pofs->pages, pofs->nr_entries);
> > }
> >
> > +static struct folio *pofs_next_folio(struct folio *folio,
> > + struct pages_or_folios *pofs, long *index_ptr)
>
> ^ use two tabs here
>
> > +{
> > + long i = *index_ptr + 1;
> > +
> > + if (!pofs->has_folios && folio_test_large(folio)) {
> > + const unsigned long start_pfn = folio_pfn(folio);
> > + const unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
> > +
> > + for (; i < pofs->nr_entries; i++) {
> > + unsigned long pfn = page_to_pfn(pofs->pages[i]);
> > +
> > + /* Is this page part of this folio? */
> > + if (pfn < start_pfn || pfn >= end_pfn)
> > + break;
> > + }
> > + }
> > +
> > + if (unlikely(i == pofs->nr_entries))
> > + return NULL;
> > + *index_ptr = i;
> > +
> > + return pofs_get_folio(pofs, i);
> > +}
> > +
> > /*
> > * Returns the number of collected folios. Return value is always >= 0.
> > */
> > @@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
> > struct list_head *movable_folio_list,
> > struct pages_or_folios *pofs)
> > {
> > - struct folio *prev_folio = NULL;
> > bool drain_allow = true;
> > - unsigned long i;
> > -
> > - for (i = 0; i < pofs->nr_entries; i++) {
> > - struct folio *folio = pofs_get_folio(pofs, i);
> > + struct folio *folio;
> > + long i = 0;
> >
> > - if (folio == prev_folio)
> > - continue;
> > - prev_folio = folio;
> > + for (folio = pofs_get_folio(pofs, i); folio;
> > + folio = pofs_next_folio(folio, pofs, &i)) {
>
> As discussed, align both "folios" (using tabs and then spaces)
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thank you very much for your review. I will fix the issue in v4 patch.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-06-05 8:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-05 3:34 [PATCH v3] gup: optimize longterm pin_user_pages() for large folio lizhe.67
2025-06-05 7:39 ` lizhe.67
2025-06-05 7:47 ` David Hildenbrand
2025-06-05 7:51 ` David Hildenbrand
2025-06-05 8:23 ` lizhe.67
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).