linux-mm.kvack.org archive mirror
* [PATCH 0/2] Improve migration by backing off earlier
@ 2024-08-09 10:31 Dev Jain
  2024-08-09 10:31 ` [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch Dev Jain
  2024-08-09 10:31 ` [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure Dev Jain
  0 siblings, 2 replies; 30+ messages in thread
From: Dev Jain @ 2024-08-09 10:31 UTC (permalink / raw)
  To: akpm, shuah, david, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, Dev Jain

It was recently observed at [1] that, during the folio unmapping stage
of migration, when the PTEs are cleared, a racing thread faulting on that
folio may increase its refcount and then sleep on the folio lock (the
migration path holds the lock); migration then ultimately fails when the
actual refcount is asserted against the expected one.

Migration is a best-effort service; the unmapping and the moving phases
are wrapped in retry loops. The refcount of the folio is currently
asserted during the move stage; if the assertion fails, we retry. But if
a racing thread changes the refcount and ends up sleeping on the folio
lock (which is mostly the case), there is no way the refcount would
be decremented; as a result, this renders the retrying useless. In the
first patch, we make the refcount check during the unmap stage as well;
if it fails, we restore the original state of the PTEs, drop the folio
lock, let the system make progress, and retry unmapping. This improves
the probability of migration winning the race.
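The assertion in question compares folio_ref_count() against
folio_expected_refs(), which boils down to counting the references the
kernel can account for. A rough userspace sketch (the struct and field
names below are illustrative stand-ins, not kernel code):

```c
#include <stdbool.h>

/* Toy stand-in for struct folio: only the fields this sketch needs. */
struct toy_folio {
	bool in_pagecache;	/* folio has an address_space mapping */
	bool has_private;	/* folio_test_private() */
	int nr_pages;		/* folio_nr_pages() */
};

/*
 * Roughly what folio_expected_refs() computes: one reference held by
 * the migration path itself, plus one per page from the pagecache for a
 * file-backed folio, plus one for attached private data. Any reference
 * beyond this (e.g. one taken by a racing faulting thread) makes the
 * actual refcount exceed the expected value and fails the migration
 * attempt.
 */
static int expected_refs(const struct toy_folio *f)
{
	int refs = 1;

	if (!f->in_pagecache)
		return refs;
	refs += f->nr_pages;
	if (f->has_private)
		refs++;
	return refs;
}
```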

Given that migration is a best-effort service, it is wrong to fail the
test for just a single failure; hence, fail the test after 100 consecutive
failures (where 100 is still a subjective choice).
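The consecutive-failure policy of the second patch can be sketched as a
standalone helper (illustrative userspace C mirroring the selftest
change; keep_retrying() is a made-up name, not part of the selftest):

```c
#include <stdbool.h>

#define MAX_RETRIES	100	/* matches the selftest's threshold */

/*
 * Returns true while the caller should keep trying. A success resets
 * the streak, so only MAX_RETRIES *consecutive* failures abort the
 * test; an occasional failure amid successes never does.
 */
static bool keep_retrying(bool migrated, int *consecutive_failures)
{
	if (migrated) {
		*consecutive_failures = 0;
		return true;
	}
	return ++(*consecutive_failures) < MAX_RETRIES;
}
```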

[1] https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/

Dev Jain (2):
  mm: Retry migration earlier upon refcount mismatch
  selftests/mm: Do not fail test for a single migration failure

 mm/migrate.c                           |  9 +++++++++
 tools/testing/selftests/mm/migration.c | 17 +++++++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-09 10:31 [PATCH 0/2] Improve migration by backing off earlier Dev Jain
@ 2024-08-09 10:31 ` Dev Jain
  2024-08-09 13:47   ` David Hildenbrand
  2024-08-12  5:34   ` Huang, Ying
  2024-08-09 10:31 ` [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure Dev Jain
  1 sibling, 2 replies; 30+ messages in thread
From: Dev Jain @ 2024-08-09 10:31 UTC (permalink / raw)
  To: akpm, shuah, david, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, Dev Jain

As is already done in __migrate_folio(), where we back off if the folio
refcount is wrong, perform the same check during the unmapping phase.
Upon its failure, the original state of the PTEs will be restored and
the folio lock will be dropped via migrate_folio_undo_src(); any racing
thread will then make progress and migration will be retried.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/migrate.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index e7296c0fb5d5..477acf996951 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 	}
 
 	if (!folio_mapped(src)) {
+		/*
+		 * Someone may have changed the refcount and may be sleeping
+		 * on the folio lock. In case of refcount mismatch, bail out,
+		 * let the system make progress and retry.
+		 */
+		struct address_space *mapping = folio_mapping(src);
+
+		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
+			goto out;
 		__migrate_folio_record(dst, old_page_state, anon_vma);
 		return MIGRATEPAGE_UNMAP;
 	}
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure
  2024-08-09 10:31 [PATCH 0/2] Improve migration by backing off earlier Dev Jain
  2024-08-09 10:31 ` [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch Dev Jain
@ 2024-08-09 10:31 ` Dev Jain
  2024-08-09 17:13   ` Shuah Khan
  1 sibling, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-09 10:31 UTC (permalink / raw)
  To: akpm, shuah, david, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, Dev Jain

Do not fail the test for just a single instance of migration failure,
since migration is a best-effort service.

Signed-off-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
---
 tools/testing/selftests/mm/migration.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index 6908569ef406..64bcbb7151cf 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -15,10 +15,10 @@
 #include <signal.h>
 #include <time.h>
 
-#define TWOMEG (2<<20)
-#define RUNTIME (20)
-
-#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
+#define TWOMEG		(2<<20)
+#define RUNTIME		(20)
+#define MAX_RETRIES	100
+#define ALIGN(x, a)	(((x) + (a - 1)) & (~((a) - 1)))
 
 FIXTURE(migration)
 {
@@ -65,6 +65,7 @@ int migrate(uint64_t *ptr, int n1, int n2)
 	int ret, tmp;
 	int status = 0;
 	struct timespec ts1, ts2;
+	int failures = 0;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
 		return -1;
@@ -79,13 +80,17 @@ int migrate(uint64_t *ptr, int n1, int n2)
 		ret = move_pages(0, 1, (void **) &ptr, &n2, &status,
 				MPOL_MF_MOVE_ALL);
 		if (ret) {
-			if (ret > 0)
+			if (ret > 0) {
+				/* Migration is best effort; try again */
+				if (++failures < MAX_RETRIES)
+					continue;
 				printf("Didn't migrate %d pages\n", ret);
+			}
 			else
 				perror("Couldn't migrate pages");
 			return -2;
 		}
-
+		failures = 0;
 		tmp = n2;
 		n2 = n1;
 		n1 = tmp;
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-09 10:31 ` [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch Dev Jain
@ 2024-08-09 13:47   ` David Hildenbrand
  2024-08-09 21:09     ` Christoph Lameter (Ampere)
                       ` (2 more replies)
  2024-08-12  5:34   ` Huang, Ying
  1 sibling, 3 replies; 30+ messages in thread
From: David Hildenbrand @ 2024-08-09 13:47 UTC (permalink / raw)
  To: Dev Jain, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest

On 09.08.24 12:31, Dev Jain wrote:
> As already being done in __migrate_folio(), wherein we backoff if the
> folio refcount is wrong, make this check during the unmapping phase, upon
> the failure of which, the original state of the PTEs will be restored and
> the folio lock will be dropped via migrate_folio_undo_src(), any racing
> thread will make progress and migration will be retried.
> 
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
>   mm/migrate.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e7296c0fb5d5..477acf996951 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>   	}
>   
>   	if (!folio_mapped(src)) {
> +		/*
> +		 * Someone may have changed the refcount and maybe sleeping
> +		 * on the folio lock. In case of refcount mismatch, bail out,
> +		 * let the system make progress and retry.
> +		 */
> +		struct address_space *mapping = folio_mapping(src);
> +
> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
> +			goto out;

This really seems to be the latest point where we can "easily" back off 
and unlock the source folio -- in this function :)

I wonder if we should be smarter in the migrate_pages_batch() loop when 
we start the actual migrations via migrate_folio_move(): if we detect 
that a folio has unexpected references *and* it has waiters 
(PG_waiters), back off then and retry the folio later. If it only has 
unexpected references, just keep retrying: no waiters -> nobody is 
waiting for the lock to make progress.

For example, when migrate_folio_move() fails with -EAGAIN, check if 
there are waiters (PG_waiter?) and undo+unlock to try again later.

But I'm not really a migration expert, so only my 2 cents :)
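The heuristic above reduces to a tiny decision function (an illustrative
sketch only: has_unexpected_refs stands in for the folio_ref_count() vs.
folio_expected_refs() comparison, has_waiters for a PG_waiters test):

```c
#include <stdbool.h>

/*
 * Back off (undo + unlock, retry the folio later) only when the folio
 * has both unexpected references and waiters on the folio lock: a
 * waiter needs the lock to make progress, so retrying while holding it
 * is futile. With no waiters, nobody is blocked on us, so keep
 * retrying in place.
 */
static bool should_back_off(bool has_unexpected_refs, bool has_waiters)
{
	return has_unexpected_refs && has_waiters;
}
```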

>   		__migrate_folio_record(dst, old_page_state, anon_vma);
>   		return MIGRATEPAGE_UNMAP;
>   	}

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure
  2024-08-09 10:31 ` [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure Dev Jain
@ 2024-08-09 17:13   ` Shuah Khan
  2024-08-09 21:10     ` Christoph Lameter (Ampere)
  2024-08-12  6:19     ` Dev Jain
  0 siblings, 2 replies; 30+ messages in thread
From: Shuah Khan @ 2024-08-09 17:13 UTC (permalink / raw)
  To: Dev Jain, akpm, shuah, david, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, Shuah Khan

On 8/9/24 04:31, Dev Jain wrote:
> Do not fail the test for just a single instance of migration failure,
> since migration is a best-effort service.

The cover letter says:

"Given that migration is a best-effort service, it is wrong to fail the
test for just a single failure; hence, fail the test after 100 consecutive
failures (where 100 is still a subjective choice)."

You do want to mention the above here.

The reason being, I would like to know what this does to the run-time of
this test if migration fails and retried 100 times.

> 
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>   tools/testing/selftests/mm/migration.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
> index 6908569ef406..64bcbb7151cf 100644
> --- a/tools/testing/selftests/mm/migration.c
> +++ b/tools/testing/selftests/mm/migration.c
> @@ -15,10 +15,10 @@
>   #include <signal.h>
>   #include <time.h>
>   
> -#define TWOMEG (2<<20)
> -#define RUNTIME (20)
> -
> -#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
> +#define TWOMEG		(2<<20)
> +#define RUNTIME		(20)
> +#define MAX_RETRIES	100
> +#define ALIGN(x, a)	(((x) + (a - 1)) & (~((a) - 1)))
>   
>   FIXTURE(migration)
>   {
> @@ -65,6 +65,7 @@ int migrate(uint64_t *ptr, int n1, int n2)
>   	int ret, tmp;
>   	int status = 0;
>   	struct timespec ts1, ts2;
> +	int failures = 0;
>   
>   	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>   		return -1;
> @@ -79,13 +80,17 @@ int migrate(uint64_t *ptr, int n1, int n2)
>   		ret = move_pages(0, 1, (void **) &ptr, &n2, &status,
>   				MPOL_MF_MOVE_ALL);
>   		if (ret) {
> -			if (ret > 0)
> +			if (ret > 0) {
> +				/* Migration is best effort; try again */
> +				if (++failures < MAX_RETRIES)
> +					continue;
>   				printf("Didn't migrate %d pages\n", ret);
> +			}
>   			else
>   				perror("Couldn't migrate pages");
>   			return -2;
>   		}
> -
> +		failures = 0;
>   		tmp = n2;
>   		n2 = n1;
>   		n1 = tmp;

thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-09 13:47   ` David Hildenbrand
@ 2024-08-09 21:09     ` Christoph Lameter (Ampere)
  2024-08-10 18:42     ` Dev Jain
  2024-08-10 21:05     ` Zi Yan
  2 siblings, 0 replies; 30+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-08-09 21:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dev Jain, akpm, shuah, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, vbabka, mhocko, apopple, osalvador, baolin.wang,
	dave.hansen, will, baohua, ioworker0, gshan, mark.rutland,
	kirill.shutemov, hughd, aneesh.kumar, yang, peterx, broonie,
	mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

On Fri, 9 Aug 2024, David Hildenbrand wrote:

>
> This really seems to be the latest point where we can "easily" back off and 
> unlock the source folio -- in this function :)
>
> I wonder if we should be smarter in the migrate_pages_batch() loop when we 
> start the actual migrations via migrate_folio_move(): if we detect that a 
> folio has unexpected references *and* it has waiters (PG_waiters), back off 
> then and retry the folio later. If it only has unexpected references, just 
> keep retrying: no waiters -> nobody is waiting for the lock to make progress.

Well, just back off ASAP whenever waiters are detected. A waiter would
have increased the refcount, and will likely modify the page status
soon. So push the folio to the end of the list of pages to be migrated,
to give it as much time as we can, and check again later.
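"Push it to the end" is essentially a rotate-to-tail of the migration
work list; a minimal array-based sketch (the kernel keeps folios on
linked lists, so this is illustrative only):

```c
#include <stddef.h>

/*
 * Move the entry at index i to the tail of the work queue, shifting the
 * rest up, so an entry with waiters is revisited as late as possible,
 * giving the waiter maximum time to finish and drop its reference.
 */
static void push_to_tail(int *queue, size_t len, size_t i)
{
	int val = queue[i];

	for (; i + 1 < len; i++)
		queue[i] = queue[i + 1];
	queue[len - 1] = val;
}
```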


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure
  2024-08-09 17:13   ` Shuah Khan
@ 2024-08-09 21:10     ` Christoph Lameter (Ampere)
  2024-08-12  6:19     ` Dev Jain
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-08-09 21:10 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Dev Jain, akpm, shuah, david, willy, ryan.roberts,
	anshuman.khandual, catalin.marinas, vbabka, mhocko, apopple,
	osalvador, baolin.wang, dave.hansen, will, baohua, ioworker0,
	gshan, mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang,
	peterx, broonie, mgorman, linux-arm-kernel, linux-kernel,
	linux-mm, linux-kselftest

On Fri, 9 Aug 2024, Shuah Khan wrote:

> "Given that migration is a best-effort service, it is wrong to fail the
> test for just a single failure; hence, fail the test after 100 consecutive
> failures (where 100 is still a subjective choice)."
>
> You do want to mention the above here.
>
> The reason being, I would like to know what this does to the run-time of
> this test if migration fails and retried 100 times.

If we back off earlier, without engaging too much with the page, then we
can in turn afford to retry more times.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-09 13:47   ` David Hildenbrand
  2024-08-09 21:09     ` Christoph Lameter (Ampere)
@ 2024-08-10 18:42     ` Dev Jain
  2024-08-10 18:52       ` David Hildenbrand
  2024-08-10 21:05     ` Zi Yan
  2 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-10 18:42 UTC (permalink / raw)
  To: David Hildenbrand, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest


On 8/9/24 19:17, David Hildenbrand wrote:
> On 09.08.24 12:31, Dev Jain wrote:
>> As already being done in __migrate_folio(), wherein we backoff if the
>> folio refcount is wrong, make this check during the unmapping phase, 
>> upon
>> the failure of which, the original state of the PTEs will be restored 
>> and
>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>> thread will make progress and migration will be retried.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>   mm/migrate.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e7296c0fb5d5..477acf996951 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t 
>> get_new_folio,
>>       }
>>         if (!folio_mapped(src)) {
>> +        /*
>> +         * Someone may have changed the refcount and maybe sleeping
>> +         * on the folio lock. In case of refcount mismatch, bail out,
>> +         * let the system make progress and retry.
>> +         */
>> +        struct address_space *mapping = folio_mapping(src);
>> +
>> +        if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>> +            goto out;
>
> This really seems to be the latest point where we can "easily" back 
> off and unlock the source folio -- in this function :)
>
> I wonder if we should be smarter in the migrate_pages_batch() loop 
> when we start the actual migrations via migrate_folio_move(): if we 
> detect that a folio has unexpected references *and* it has waiters 
> (PG_waiters), back off then and retry the folio later. If it only has 
> unexpected references, just keep retrying: no waiters -> nobody is 
> waiting for the lock to make progress.


The patch currently retries migration irrespective of the reason for the
refcount change.

If you are suggesting that we break up the retrying according to two
conditions:

1. If the folio has waiters, retry according to
NR_MAX_MIGRATE_PAGES_RETRY = 10.

2. If not, retry for a large number of iterations, say 10,000, since we
just need to keep retrying until the racer finishes reading the folio
(or failing on folio_trylock()) and decrements the refcount.

then we will have to make the check a refcount freeze (with xas_lock());
if we don't, anyone can increase the refcount again, reading data from a
stale reference to the folio, making our check futile. (Which begs the
question: is commit 0609139 correct? Checking for a refcount mismatch in
__migrate_folio() is ineffective, since after that, and before
folio_ref_freeze() in __folio_migrate_mapping(), the refcount may
change.) As a result, the freeze will have to take place immediately
after we unmap the folios from everyone's address space, something like:

while (!folio_ref_freeze(src, expected_count) && ++retries < 10000) {
	if (folio has waiters)
		break;	/* will be retried by the outer loop, giving us 10 chances in total */
}


> This really seems to be the latest point where we can "easily" back 
> off and unlock the source folio -- in this function :)
> For example, when migrate_folio_move() fails with -EAGAIN, check if 
> there are waiters (PG_waiter?) and undo+unlock to try again later.


Currently, on -EAGAIN, migrate_folio_move() returns without undoing src
and dst; even if we were to fall through to _undo_src/dst, the folios
would not be unmapped again, since _unmap() and _move() are wrapped in
different loops. This is what I was hinting at when I wrote in the cover
letter: "...there is no way the refcount would be decremented; as a
result, this renders the retrying useless", since upon the failure of
_move(), the lock will not be dropped (that happens through undo_src()),
rendering the _move() loop useless. Sorry, I should have noted this
there.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-10 18:42     ` Dev Jain
@ 2024-08-10 18:52       ` David Hildenbrand
  2024-08-11  6:06         ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2024-08-10 18:52 UTC (permalink / raw)
  To: Dev Jain, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest

On 10.08.24 20:42, Dev Jain wrote:
> 
> On 8/9/24 19:17, David Hildenbrand wrote:
>> On 09.08.24 12:31, Dev Jain wrote:
>>> As already being done in __migrate_folio(), wherein we backoff if the
>>> folio refcount is wrong, make this check during the unmapping phase,
>>> upon
>>> the failure of which, the original state of the PTEs will be restored
>>> and
>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>> thread will make progress and migration will be retried.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>>    mm/migrate.c | 9 +++++++++
>>>    1 file changed, 9 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index e7296c0fb5d5..477acf996951 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t
>>> get_new_folio,
>>>        }
>>>          if (!folio_mapped(src)) {
>>> +        /*
>>> +         * Someone may have changed the refcount and maybe sleeping
>>> +         * on the folio lock. In case of refcount mismatch, bail out,
>>> +         * let the system make progress and retry.
>>> +         */
>>> +        struct address_space *mapping = folio_mapping(src);
>>> +
>>> +        if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>> +            goto out;
>>
>> This really seems to be the latest point where we can "easily" back
>> off and unlock the source folio -- in this function :)
>>
>> I wonder if we should be smarter in the migrate_pages_batch() loop
>> when we start the actual migrations via migrate_folio_move(): if we
>> detect that a folio has unexpected references *and* it has waiters
>> (PG_waiters), back off then and retry the folio later. If it only has
>> unexpected references, just keep retrying: no waiters -> nobody is
>> waiting for the lock to make progress.
> 
> 
> The patch currently retries migration irrespective of the reason of
> refcount change.
> 
> If you are suggesting that, break the retrying according to two conditions:

That's not what I am suggesting ...

> 
> 
>> This really seems to be the latest point where we can "easily" back
>> off and unlock the source folio -- in this function :)
>> For example, when migrate_folio_move() fails with -EAGAIN, check if
>> there are waiters (PG_waiter?) and undo+unlock to try again later.
> 
> 
> Currently, on -EAGAIN, migrate_folio_move() returns without undoing src
> and dst; even if we were to fall

...

I am wondering if we should detect here if there are waiters and undo 
src+dst.

Does that make sense?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-09 13:47   ` David Hildenbrand
  2024-08-09 21:09     ` Christoph Lameter (Ampere)
  2024-08-10 18:42     ` Dev Jain
@ 2024-08-10 21:05     ` Zi Yan
  2 siblings, 0 replies; 30+ messages in thread
From: Zi Yan @ 2024-08-10 21:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dev Jain, akpm, shuah, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest, Huang Ying

[-- Attachment #1: Type: text/plain, Size: 2045 bytes --]

On 9 Aug 2024, at 9:47, David Hildenbrand wrote:

> On 09.08.24 12:31, Dev Jain wrote:
>> As already being done in __migrate_folio(), wherein we backoff if the
>> folio refcount is wrong, make this check during the unmapping phase, upon
>> the failure of which, the original state of the PTEs will be restored and
>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>> thread will make progress and migration will be retried.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>   mm/migrate.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e7296c0fb5d5..477acf996951 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>   	}
>>    	if (!folio_mapped(src)) {
>> +		/*
>> +		 * Someone may have changed the refcount and maybe sleeping
>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>> +		 * let the system make progress and retry.
>> +		 */
>> +		struct address_space *mapping = folio_mapping(src);
>> +
>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>> +			goto out;
>
> This really seems to be the latest point where we can "easily" back off and unlock the source folio -- in this function :)
>
> I wonder if we should be smarter in the migrate_pages_batch() loop when we start the actual migrations via migrate_folio_move(): if we detect that a folio has unexpected references *and* it has waiters (PG_waiters), back off then and retry the folio later. If it only has unexpected references, just keep retrying: no waiters -> nobody is waiting for the lock to make progress.
>
> For example, when migrate_folio_move() fails with -EAGAIN, check if there are waiters (PG_waiter?) and undo+unlock to try again later.
>
> But I'm not really a migration expert, so only my 2 cents :)

Sounds reasonable to me. Also adding Ying (the author of batch page migration).

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-10 18:52       ` David Hildenbrand
@ 2024-08-11  6:06         ` Dev Jain
  2024-08-11  9:08           ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-11  6:06 UTC (permalink / raw)
  To: David Hildenbrand, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, ying.huang


On 8/11/24 00:22, David Hildenbrand wrote:
> On 10.08.24 20:42, Dev Jain wrote:
>>
>> On 8/9/24 19:17, David Hildenbrand wrote:
>>> On 09.08.24 12:31, Dev Jain wrote:
>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>> folio refcount is wrong, make this check during the unmapping phase,
>>>> upon
>>>> the failure of which, the original state of the PTEs will be restored
>>>> and
>>>> the folio lock will be dropped via migrate_folio_undo_src(), any 
>>>> racing
>>>> thread will make progress and migration will be retried.
>>>>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>> ---
>>>>    mm/migrate.c | 9 +++++++++
>>>>    1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e7296c0fb5d5..477acf996951 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t
>>>> get_new_folio,
>>>>        }
>>>>          if (!folio_mapped(src)) {
>>>> +        /*
>>>> +         * Someone may have changed the refcount and maybe sleeping
>>>> +         * on the folio lock. In case of refcount mismatch, bail out,
>>>> +         * let the system make progress and retry.
>>>> +         */
>>>> +        struct address_space *mapping = folio_mapping(src);
>>>> +
>>>> +        if (folio_ref_count(src) != folio_expected_refs(mapping, 
>>>> src))
>>>> +            goto out;
>>>
>>> This really seems to be the latest point where we can "easily" back
>>> off and unlock the source folio -- in this function :)
>>>
>>> I wonder if we should be smarter in the migrate_pages_batch() loop
>>> when we start the actual migrations via migrate_folio_move(): if we
>>> detect that a folio has unexpected references *and* it has waiters
>>> (PG_waiters), back off then and retry the folio later. If it only has
>>> unexpected references, just keep retrying: no waiters -> nobody is
>>> waiting for the lock to make progress.
>>
>>
>> The patch currently retries migration irrespective of the reason of
>> refcount change.
>>
>> If you are suggesting that, break the retrying according to two 
>> conditions:
>
> That's not what I am suggesting ...
>
>>
>>
>>> This really seems to be the latest point where we can "easily" back
>>> off and unlock the source folio -- in this function :)
>>> For example, when migrate_folio_move() fails with -EAGAIN, check if
>>> there are waiters (PG_waiter?) and undo+unlock to try again later.
>>
>>
>> Currently, on -EAGAIN, migrate_folio_move() returns without undoing src
>> and dst; even if we were to fall
>
> ...
>
> I am wondering if we should detect here if there are waiters and undo 
> src+dst.

After undoing src+dst, which restores the PTEs, how are you going to set
the PTEs to migration entries again? That is done through
migrate_folio_unmap(), and the loops of _unmap() and _move() are
different. Or am I missing something...

>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-11  6:06         ` Dev Jain
@ 2024-08-11  9:08           ` David Hildenbrand
  2024-08-12  5:35             ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2024-08-11  9:08 UTC (permalink / raw)
  To: Dev Jain, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, ying.huang

On 11.08.24 08:06, Dev Jain wrote:
> 
> On 8/11/24 00:22, David Hildenbrand wrote:
>> On 10.08.24 20:42, Dev Jain wrote:
>>>
>>> On 8/9/24 19:17, David Hildenbrand wrote:
>>>> On 09.08.24 12:31, Dev Jain wrote:
>>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>>> folio refcount is wrong, make this check during the unmapping phase,
>>>>> upon
>>>>> the failure of which, the original state of the PTEs will be restored
>>>>> and
>>>>> the folio lock will be dropped via migrate_folio_undo_src(), any
>>>>> racing
>>>>> thread will make progress and migration will be retried.
>>>>>
>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>> ---
>>>>>     mm/migrate.c | 9 +++++++++
>>>>>     1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t
>>>>> get_new_folio,
>>>>>         }
>>>>>           if (!folio_mapped(src)) {
>>>>> +        /*
>>>>> +         * Someone may have changed the refcount and maybe sleeping
>>>>> +         * on the folio lock. In case of refcount mismatch, bail out,
>>>>> +         * let the system make progress and retry.
>>>>> +         */
>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>> +
>>>>> +        if (folio_ref_count(src) != folio_expected_refs(mapping,
>>>>> src))
>>>>> +            goto out;
>>>>
>>>> This really seems to be the latest point where we can "easily" back
>>>> off and unlock the source folio -- in this function :)
>>>>
>>>> I wonder if we should be smarter in the migrate_pages_batch() loop
>>>> when we start the actual migrations via migrate_folio_move(): if we
>>>> detect that a folio has unexpected references *and* it has waiters
>>>> (PG_waiters), back off then and retry the folio later. If it only has
>>>> unexpected references, just keep retrying: no waiters -> nobody is
>>>> waiting for the lock to make progress.
>>>
>>>
>>> The patch currently retries migration irrespective of the reason of
>>> refcount change.
>>>
>>> If you are suggesting that, break the retrying according to two
>>> conditions:
>>
>> That's not what I am suggesting ...
>>
>>>
>>>
>>>> This really seems to be the latest point where we can "easily" back
>>>> off and unlock the source folio -- in this function :)
>>>> For example, when migrate_folio_move() fails with -EAGAIN, check if
>>>> there are waiters (PG_waiter?) and undo+unlock to try again later.
>>>
>>>
>>> Currently, on -EAGAIN, migrate_folio_move() returns without undoing src
>>> and dst; even if we were to fall
>>
>> ...
>>
>> I am wondering if we should detect here if there are waiters and undo
>> src+dst.
> 
> After undoing src+dst, which restores the PTEs, how are you going to set the
> PTEs to migration again? That is being done through migrate_folio_unmap(),
> and the loops of _unmap() and _move() are different. Or am I missing
> something...

Again, no expert on the code, but it would mean that if we detect that 
there are waiters, we would undo src+dst and add them to ret_folios, 
similar to what we do in "Cleanup remaining folios" at the end of 
migrate_pages_batch()?

So instead of retrying migration of that folio, just give it up 
immediately and retry again later.

Of course, this means that (without further modifications to that 
function), we would leave retrying these folios to the caller, such as 
in migrate_pages_sync(), where we move ret_folios to the tail of 
"folios" and retry migration.

Maybe one would want to optimize that retry logic with such "temporarily 
failed because someone else has to make progress for us to make progress 
and free up a page reference" case. These are different to the typical 
"speculative" references that we try to handle via the existing retry magic.
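A minimal userspace sketch of the deferral policy described above, using a toy folio model (none of these names are the real mm/migrate.c API): a folio with unexpected references *and* waiters is undone and handed back for a later pass, while one with only speculative references keeps being retried in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct folio_sim {
	int extra_refs;    /* unexpected references held by others */
	bool has_waiters;  /* somebody sleeps on the folio lock */
	bool migrated;
	bool deferred;     /* undone and pushed to ret_folios */
};

#define NR_MOVE_RETRIES 7

/* Toy move phase: retry in place for speculative references,
 * but back off immediately when there are waiters. */
static void move_batch(struct folio_sim *folios, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct folio_sim *f = &folios[i];

		for (int attempt = 0; attempt < NR_MOVE_RETRIES; attempt++) {
			if (f->extra_refs == 0) {
				f->migrated = true;   /* refcount matches */
				break;
			}
			if (f->has_waiters) {
				/* undo src+dst, unlock, defer to caller */
				f->deferred = true;
				break;
			}
			/* speculative reference only: model the racer
			 * eventually dropping it, then retry */
			f->extra_refs--;
		}
	}
}
```

The point of the split is visible in the two failure cases: a speculative reference resolves under in-place retries, whereas a waiter never can, because it needs the folio lock we are holding.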

Please let me know if I am missing something fundamental.

-- 
Cheers,

David / dhildenb




* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-09 10:31 ` [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch Dev Jain
  2024-08-09 13:47   ` David Hildenbrand
@ 2024-08-12  5:34   ` Huang, Ying
  2024-08-12  6:01     ` Dev Jain
  2024-08-12  6:13     ` Dev Jain
  1 sibling, 2 replies; 30+ messages in thread
From: Huang, Ying @ 2024-08-12  5:34 UTC (permalink / raw)
  To: Dev Jain
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

Hi, Dev,

Dev Jain <dev.jain@arm.com> writes:

> As already being done in __migrate_folio(), wherein we backoff if the
> folio refcount is wrong, make this check during the unmapping phase, upon
> the failure of which, the original state of the PTEs will be restored and
> the folio lock will be dropped via migrate_folio_undo_src(), any racing
> thread will make progress and migration will be retried.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
>  mm/migrate.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e7296c0fb5d5..477acf996951 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>  	}
>  
>  	if (!folio_mapped(src)) {
> +		/*
> +		 * Someone may have changed the refcount and maybe sleeping
> +		 * on the folio lock. In case of refcount mismatch, bail out,
> +		 * let the system make progress and retry.
> +		 */
> +		struct address_space *mapping = folio_mapping(src);
> +
> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
> +			goto out;
>  		__migrate_folio_record(dst, old_page_state, anon_vma);
>  		return MIGRATEPAGE_UNMAP;
>  	}

Do you have some test results for this?  For example, after applying the
patch, the migration success rate increased XX%, etc.

My understanding for this issue is that the migration success rate can
increase if we undo all changes before retrying.  This is the current
behavior for sync migration, but not for async migration.  If so, we can
use migrate_pages_sync() for async migration too to increase success
rate?  Of course, we need to change the function name and comments.

--
Best Regards,
Huang, Ying



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-11  9:08           ` David Hildenbrand
@ 2024-08-12  5:35             ` Dev Jain
  2024-08-12  9:30               ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-12  5:35 UTC (permalink / raw)
  To: David Hildenbrand, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, ying.huang


On 8/11/24 14:38, David Hildenbrand wrote:
> On 11.08.24 08:06, Dev Jain wrote:
>>
>> On 8/11/24 00:22, David Hildenbrand wrote:
>>> On 10.08.24 20:42, Dev Jain wrote:
>>>>
>>>> On 8/9/24 19:17, David Hildenbrand wrote:
>>>>> On 09.08.24 12:31, Dev Jain wrote:
>>>>>> As already being done in __migrate_folio(), wherein we backoff if 
>>>>>> the
>>>>>> folio refcount is wrong, make this check during the unmapping phase,
>>>>>> upon
>>>>>> the failure of which, the original state of the PTEs will be 
>>>>>> restored
>>>>>> and
>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), any
>>>>>> racing
>>>>>> thread will make progress and migration will be retried.
>>>>>>
>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>> ---
>>>>>>     mm/migrate.c | 9 +++++++++
>>>>>>     1 file changed, 9 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>> --- a/mm/migrate.c
>>>>>> +++ b/mm/migrate.c
>>>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t
>>>>>> get_new_folio,
>>>>>>         }
>>>>>>           if (!folio_mapped(src)) {
>>>>>> +        /*
>>>>>> +         * Someone may have changed the refcount and maybe sleeping
>>>>>> +         * on the folio lock. In case of refcount mismatch, bail 
>>>>>> out,
>>>>>> +         * let the system make progress and retry.
>>>>>> +         */
>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>> +
>>>>>> +        if (folio_ref_count(src) != folio_expected_refs(mapping,
>>>>>> src))
>>>>>> +            goto out;
>>>>>
>>>>> This really seems to be the latest point where we can "easily" back
>>>>> off and unlock the source folio -- in this function :)
>>>>>
>>>>> I wonder if we should be smarter in the migrate_pages_batch() loop
>>>>> when we start the actual migrations via migrate_folio_move(): if we
>>>>> detect that a folio has unexpected references *and* it has waiters
>>>>> (PG_waiters), back off then and retry the folio later. If it only has
>>>>> unexpected references, just keep retrying: no waiters -> nobody is
>>>>> waiting for the lock to make progress.
>>>>
>>>>
>>>> The patch currently retries migration irrespective of the reason of
>>>> refcount change.
>>>>
>>>> If you are suggesting that, break the retrying according to two
>>>> conditions:
>>>
>>> That's not what I am suggesting ...
>>>
>>>>
>>>>
>>>>> This really seems to be the latest point where we can "easily" back
>>>>> off and unlock the source folio -- in this function :)
>>>>> For example, when migrate_folio_move() fails with -EAGAIN, check if
>>>>> there are waiters (PG_waiter?) and undo+unlock to try again later.
>>>>
>>>>
>>>> Currently, on -EAGAIN, migrate_folio_move() returns without undoing 
>>>> src
>>>> and dst; even if we were to fall
>>>
>>> ...
>>>
>>> I am wondering if we should detect here if there are waiters and undo
>>> src+dst.
>>
>> After undoing src+dst, which restores the PTEs, how are you going to 
>> set the
>>
>> PTEs to migration again? That is being done through 
>> migrate_folio_unmap(),
>>
>> and the loops of _unmap() and _move() are different. Or am I missing
>> something...
>
> Again, no expert on the code, but it would mean that if we detect that 
> there are waiters, we would undo src+dst and add them to ret_folios, 
> similar to what we do in "Cleanup remaining folios" at the end of 
> migrate_pages_batch()?
>
> So instead of retrying migration of that folio, just give it up 
> immediately and retry again later.
>
> Of course, this means that (without further modifications to that 
> function), we would leave retrying these folios to the caller, such as 
> in migrate_pages_sync(), where we move ret_folios to the tail of 
> "folios" and retry migration.

So IIUC, you are saying to change the return value in 
__folio_migrate_mapping(), so that when move_to_new_folio() fails
in migrate_folio_move(), we end up in the retrying loop of _sync() which 
calls _batch() in synchronous mode. Here, we will have to make a change
to decide how much we want to retry?

>
>
> Maybe one would want to optimize that retry logic with such 
> "temporarily failed because someone else has to make progress for us 
> to make progress and free up a page reference" case. These are 
> different to the typical "speculative" references that we try to 
> handle via the existing retry magic.
>
> Please let me know if I am missing something fundamental.
>
>



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  5:34   ` Huang, Ying
@ 2024-08-12  6:01     ` Dev Jain
  2024-08-12  6:15       ` Huang, Ying
  2024-08-12  6:13     ` Dev Jain
  1 sibling, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-12  6:01 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/12/24 11:04, Huang, Ying wrote:
> Hi, Dev,
>
> Dev Jain <dev.jain@arm.com> writes:
>
>> As already being done in __migrate_folio(), wherein we backoff if the
>> folio refcount is wrong, make this check during the unmapping phase, upon
>> the failure of which, the original state of the PTEs will be restored and
>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>> thread will make progress and migration will be retried.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>   mm/migrate.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e7296c0fb5d5..477acf996951 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>   	}
>>   
>>   	if (!folio_mapped(src)) {
>> +		/*
>> +		 * Someone may have changed the refcount and maybe sleeping
>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>> +		 * let the system make progress and retry.
>> +		 */
>> +		struct address_space *mapping = folio_mapping(src);
>> +
>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>> +			goto out;
>>   		__migrate_folio_record(dst, old_page_state, anon_vma);
>>   		return MIGRATEPAGE_UNMAP;
>>   	}
> Do you have some test results for this?  For example, after applying the
> patch, the migration success rate increased XX%, etc.

I'll get back to you on this.

>
> My understanding for this issue is that the migration success rate can
> increase if we undo all changes before retrying.  This is the current
> behavior for sync migration, but not for async migration.  If so, we can
> use migrate_pages_sync() for async migration too to increase success
> rate?  Of course, we need to change the function name and comments.


As per my understanding, this is not the current behaviour for sync
migration. After successful unmapping, when migrate_folio_move() fails
with -EAGAIN, we do not undo src+dst (rendering the loop around
migrate_folio_move() futile) and do not push the failed folio onto the
ret_folios list; therefore, in _sync(), _batch() is never tried again.

>
> --
> Best Regards,
> Huang, Ying



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  5:34   ` Huang, Ying
  2024-08-12  6:01     ` Dev Jain
@ 2024-08-12  6:13     ` Dev Jain
  2024-08-12  6:20       ` Huang, Ying
  1 sibling, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-12  6:13 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/12/24 11:04, Huang, Ying wrote:
> Hi, Dev,
>
> Dev Jain <dev.jain@arm.com> writes:
>
>> As already being done in __migrate_folio(), wherein we backoff if the
>> folio refcount is wrong, make this check during the unmapping phase, upon
>> the failure of which, the original state of the PTEs will be restored and
>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>> thread will make progress and migration will be retried.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>   mm/migrate.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e7296c0fb5d5..477acf996951 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>   	}
>>   
>>   	if (!folio_mapped(src)) {
>> +		/*
>> +		 * Someone may have changed the refcount and maybe sleeping
>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>> +		 * let the system make progress and retry.
>> +		 */
>> +		struct address_space *mapping = folio_mapping(src);
>> +
>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>> +			goto out;
>>   		__migrate_folio_record(dst, old_page_state, anon_vma);
>>   		return MIGRATEPAGE_UNMAP;
>>   	}
> Do you have some test results for this?  For example, after applying the
> patch, the migration success rate increased XX%, etc.

Noting that the migration selftest operates on a single page: before
the patch, the test fails on shared-anon mappings after an average of
10 iterations of move_pages(), and after applying the patch it fails
after an average of 100 iterations, which makes sense because the
unmapping will get retried 3 + 7 = 10 times.
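The 3 + 7 figure corresponds to the retry budget in mm/migrate.c (constant names as in the kernels this thread discusses; check your tree):

```c
/* Retry budget from mm/migrate.c: an async pass gets 3 retries and
 * the following sync pass gets the remaining 7, for 10 in total. */
#define NR_MAX_MIGRATE_PAGES_RETRY	10
#define NR_MAX_MIGRATE_ASYNC_RETRY	3
#define NR_MAX_MIGRATE_SYNC_RETRY \
	(NR_MAX_MIGRATE_PAGES_RETRY - NR_MAX_MIGRATE_ASYNC_RETRY)
```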

>
> My understanding for this issue is that the migration success rate can
> increase if we undo all changes before retrying.  This is the current
> behavior for sync migration, but not for async migration.  If so, we can
> use migrate_pages_sync() for async migration too to increase success
> rate?  Of course, we need to change the function name and comments.
>
> --
> Best Regards,
> Huang, Ying



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  6:01     ` Dev Jain
@ 2024-08-12  6:15       ` Huang, Ying
  2024-08-12  6:52         ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2024-08-12  6:15 UTC (permalink / raw)
  To: Dev Jain
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

Dev Jain <dev.jain@arm.com> writes:

> On 8/12/24 11:04, Huang, Ying wrote:
>> Hi, Dev,
>>
>> Dev Jain <dev.jain@arm.com> writes:
>>
>>> As already being done in __migrate_folio(), wherein we backoff if the
>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>> the failure of which, the original state of the PTEs will be restored and
>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>> thread will make progress and migration will be retried.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>>   mm/migrate.c | 9 +++++++++
>>>   1 file changed, 9 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index e7296c0fb5d5..477acf996951 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>   	}
>>>     	if (!folio_mapped(src)) {
>>> +		/*
>>> +		 * Someone may have changed the refcount and maybe sleeping
>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>> +		 * let the system make progress and retry.
>>> +		 */
>>> +		struct address_space *mapping = folio_mapping(src);
>>> +
>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>> +			goto out;
>>>   		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>   		return MIGRATEPAGE_UNMAP;
>>>   	}
>> Do you have some test results for this?  For example, after applying the
>> patch, the migration success rate increased XX%, etc.
>
> I'll get back to you on this.
>
>>
>> My understanding for this issue is that the migration success rate can
>> increase if we undo all changes before retrying.  This is the current
>> behavior for sync migration, but not for async migration.  If so, we can
>> use migrate_pages_sync() for async migration too to increase success
>> rate?  Of course, we need to change the function name and comments.
>
>
> As per my understanding, this is not the current behaviour for sync
> migration. After successful unmapping, we fail in migrate_folio_move()
> with -EAGAIN, we do not call undo src+dst (rendering the loop around
> migrate_folio_move() futile), we do not push the failed folio onto the
> ret_folios list, therefore, in _sync(), _batch() is never tried again.

In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
called first, if failed, the folio will be restored to the original
state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called again.
So, we unlock once.  If it's necessary, we can unlock more times via
another level of loop.
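The two-pass flow described here can be sketched in userspace (illustrative names, not the kernel API): one async batch pass, after which failed folios are restored and unlocked — letting a racer make progress — and then retried with a synchronous pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mig_mode { MODE_ASYNC, MODE_SYNC };

struct folio_sim {
	int blocked_passes;  /* passes until the racer drops its ref */
	bool migrated;
};

/* One migrate_pages_batch() pass: a folio migrates once nothing
 * blocks it; otherwise it is restored/unlocked, which lets the
 * racer make progress (modeled as blocked_passes--). Returns the
 * number of folios left on the retry list. */
static size_t batch_pass(struct folio_sim **list, size_t n,
			 enum mig_mode mode)
{
	size_t failed = 0;

	(void)mode;  /* mode does not matter for this toy model */
	for (size_t i = 0; i < n; i++) {
		if (list[i]->blocked_passes == 0) {
			list[i]->migrated = true;
		} else {
			list[i]->blocked_passes--;
			list[failed++] = list[i];  /* keep for retry */
		}
	}
	return failed;
}

static void migrate_pages_sync_sim(struct folio_sim **list, size_t n)
{
	/* async pass first, then one sync pass over the failures */
	n = batch_pass(list, n, MODE_ASYNC);
	batch_pass(list, n, MODE_SYNC);
}
```

A folio whose racer needs two unlock cycles stays unmigrated in this model, which illustrates why unlocking only once between the two passes limits the success rate unless another level of loop is added.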

--
Best Regards,
Huang, Ying



* Re: [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure
  2024-08-09 17:13   ` Shuah Khan
  2024-08-09 21:10     ` Christoph Lameter (Ampere)
@ 2024-08-12  6:19     ` Dev Jain
  1 sibling, 0 replies; 30+ messages in thread
From: Dev Jain @ 2024-08-12  6:19 UTC (permalink / raw)
  To: Shuah Khan, akpm, shuah, david, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest


On 8/9/24 22:43, Shuah Khan wrote:
> On 8/9/24 04:31, Dev Jain wrote:
>> Do not fail the test for just a single instance of migration failure,
>> since migration is a best-effort service.
>
> The cover letter says:
>
> "Given that migration is a best-effort service, it is wrong to fail the
> test for just a single failure; hence, fail the test after 100 
> consecutive
> failures (where 100 is still a subjective choice)."
>
> You do want to mention the above here.


Sure, shall update in v2.


>
> The reason being, I would like to know what this does to the run-time of
> this test if migration fails and retried 100 times.


Sure; just for the note, it won't affect the execution time of the test 
since
that is controlled by a timeout mechanism.


>
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>   tools/testing/selftests/mm/migration.c | 17 +++++++++++------
>>   1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/testing/selftests/mm/migration.c 
>> b/tools/testing/selftests/mm/migration.c
>> index 6908569ef406..64bcbb7151cf 100644
>> --- a/tools/testing/selftests/mm/migration.c
>> +++ b/tools/testing/selftests/mm/migration.c
>> @@ -15,10 +15,10 @@
>>   #include <signal.h>
>>   #include <time.h>
>>   -#define TWOMEG (2<<20)
>> -#define RUNTIME (20)
>> -
>> -#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
>> +#define TWOMEG        (2<<20)
>> +#define RUNTIME        (20)
>> +#define MAX_RETRIES    100
>> +#define ALIGN(x, a)    (((x) + (a - 1)) & (~((a) - 1)))
>>     FIXTURE(migration)
>>   {
>> @@ -65,6 +65,7 @@ int migrate(uint64_t *ptr, int n1, int n2)
>>       int ret, tmp;
>>       int status = 0;
>>       struct timespec ts1, ts2;
>> +    int failures = 0;
>>         if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>>           return -1;
>> @@ -79,13 +80,17 @@ int migrate(uint64_t *ptr, int n1, int n2)
>>           ret = move_pages(0, 1, (void **) &ptr, &n2, &status,
>>                   MPOL_MF_MOVE_ALL);
>>           if (ret) {
>> -            if (ret > 0)
>> +            if (ret > 0) {
>> +                /* Migration is best effort; try again */
>> +                if (++failures < MAX_RETRIES)
>> +                    continue;
>>                   printf("Didn't migrate %d pages\n", ret);
>> +            }
>>               else
>>                   perror("Couldn't migrate pages");
>>               return -2;
>>           }
>> -
>> +        failures = 0;
>>           tmp = n2;
>>           n2 = n1;
>>           n1 = tmp;
>
> thanks,
> -- Shuah



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  6:13     ` Dev Jain
@ 2024-08-12  6:20       ` Huang, Ying
  2024-08-12  6:32         ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2024-08-12  6:20 UTC (permalink / raw)
  To: Dev Jain
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

Dev Jain <dev.jain@arm.com> writes:

> On 8/12/24 11:04, Huang, Ying wrote:
>> Hi, Dev,
>>
>> Dev Jain <dev.jain@arm.com> writes:
>>
>>> As already being done in __migrate_folio(), wherein we backoff if the
>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>> the failure of which, the original state of the PTEs will be restored and
>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>> thread will make progress and migration will be retried.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>>   mm/migrate.c | 9 +++++++++
>>>   1 file changed, 9 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index e7296c0fb5d5..477acf996951 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>   	}
>>>     	if (!folio_mapped(src)) {
>>> +		/*
>>> +		 * Someone may have changed the refcount and maybe sleeping
>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>> +		 * let the system make progress and retry.
>>> +		 */
>>> +		struct address_space *mapping = folio_mapping(src);
>>> +
>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>> +			goto out;
>>>   		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>   		return MIGRATEPAGE_UNMAP;
>>>   	}
>> Do you have some test results for this?  For example, after applying the
>> patch, the migration success rate increased XX%, etc.
>
> Noting that the migration selftest is operating on a single page,
> before the patch, the test fails on shared-anon mappings on an
> average of 10 iterations of move_pages(), and after applying the
> patch it fails on average of 100 iterations, which makes sense
> because the unmapping() will get retried 3 + 7 = 10 times.

Thanks!  What is the test results for

https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/

?

>>
>> My understanding for this issue is that the migration success rate can
>> increase if we undo all changes before retrying.  This is the current
>> behavior for sync migration, but not for async migration.  If so, we can
>> use migrate_pages_sync() for async migration too to increase success
>> rate?  Of course, we need to change the function name and comments.

--
Best Regards,
Huang, Ying



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  6:20       ` Huang, Ying
@ 2024-08-12  6:32         ` Dev Jain
  0 siblings, 0 replies; 30+ messages in thread
From: Dev Jain @ 2024-08-12  6:32 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/12/24 11:50, Huang, Ying wrote:
> Dev Jain <dev.jain@arm.com> writes:
>
>> On 8/12/24 11:04, Huang, Ying wrote:
>>> Hi, Dev,
>>>
>>> Dev Jain <dev.jain@arm.com> writes:
>>>
>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>>> the failure of which, the original state of the PTEs will be restored and
>>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>>> thread will make progress and migration will be retried.
>>>>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>> ---
>>>>    mm/migrate.c | 9 +++++++++
>>>>    1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e7296c0fb5d5..477acf996951 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>    	}
>>>>      	if (!folio_mapped(src)) {
>>>> +		/*
>>>> +		 * Someone may have changed the refcount and maybe sleeping
>>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>>> +		 * let the system make progress and retry.
>>>> +		 */
>>>> +		struct address_space *mapping = folio_mapping(src);
>>>> +
>>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>> +			goto out;
>>>>    		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>>    		return MIGRATEPAGE_UNMAP;
>>>>    	}
>>> Do you have some test results for this?  For example, after applying the
>>> patch, the migration success rate increased XX%, etc.
>> Noting that the migration selftest is operating on a single page,
>> before the patch, the test fails on shared-anon mappings on an
>> average of 10 iterations of move_pages(), and after applying the
>> patch it fails on average of 100 iterations, which makes sense
>> because the unmapping() will get retried 3 + 7 = 10 times.
> Thanks!  What is the test results for
>
> https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/
>
> ?

That solves the problem completely and makes the test pass. Although
that still may not be a good solution, since it only handles this
particular case (a reader thread faulting and raising the refcount).
As David notes, a concurrent read()/write() should also create this
refcount race problem.


>
>>> My understanding for this issue is that the migration success rate can
>>> increase if we undo all changes before retrying.  This is the current
>>> behavior for sync migration, but not for async migration.  If so, we can
>>> use migrate_pages_sync() for async migration too to increase success
>>> rate?  Of course, we need to change the function name and comments.
> --
> Best Regards,
> Huang, Ying
>



* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  6:15       ` Huang, Ying
@ 2024-08-12  6:52         ` Dev Jain
  2024-08-12  7:31           ` Huang, Ying
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-12  6:52 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/12/24 11:45, Huang, Ying wrote:
> Dev Jain <dev.jain@arm.com> writes:
>
>> On 8/12/24 11:04, Huang, Ying wrote:
>>> Hi, Dev,
>>>
>>> Dev Jain <dev.jain@arm.com> writes:
>>>
>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>>> the failure of which, the original state of the PTEs will be restored and
>>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>>> thread will make progress and migration will be retried.
>>>>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>> ---
>>>>    mm/migrate.c | 9 +++++++++
>>>>    1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e7296c0fb5d5..477acf996951 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>    	}
>>>>      	if (!folio_mapped(src)) {
>>>> +		/*
>>>> +		 * Someone may have changed the refcount and maybe sleeping
>>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>>> +		 * let the system make progress and retry.
>>>> +		 */
>>>> +		struct address_space *mapping = folio_mapping(src);
>>>> +
>>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>> +			goto out;
>>>>    		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>>    		return MIGRATEPAGE_UNMAP;
>>>>    	}
>>> Do you have some test results for this?  For example, after applying the
>>> patch, the migration success rate increased XX%, etc.
>> I'll get back to you on this.
>>
>>> My understanding for this issue is that the migration success rate can
>>> increase if we undo all changes before retrying.  This is the current
>>> behavior for sync migration, but not for async migration.  If so, we can
>>> use migrate_pages_sync() for async migration too to increase success
>>> rate?  Of course, we need to change the function name and comments.
>>
>> As per my understanding, this is not the current behaviour for sync
>> migration. After successful unmapping, we fail in migrate_folio_move()
>> with -EAGAIN, we do not call undo src+dst (rendering the loop around
>> migrate_folio_move() futile), we do not push the failed folio onto the
>> ret_folios list, therefore, in _sync(), _batch() is never tried again.
> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
> called first, if failed, the folio will be restored to the original
> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called again.
> So, we unlock once.  If it's necessary, we can unlock more times via
> another level of loop.

Yes, that's my point. We need to undo src+dst and retry. We will have
to decide where we want this retrying to be: do we want to change the
return value, end up in the while loop wrapped around _sync(), and retry
there by adding another level of loop, or do we want to make use of the
existing retry loops, one of which is wrapped around _unmap()? The latter
is my approach. The utility I see in the former approach is that, in case
of a large number of page migrations (which should usually be the case),
we give the folio more time to get retried. The latter does not give much
time and discards the folio if it does not succeed within 7 attempts.

>
> --
> Best Regards,
> Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  6:52         ` Dev Jain
@ 2024-08-12  7:31           ` Huang, Ying
  2024-08-12 12:08             ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2024-08-12  7:31 UTC (permalink / raw)
  To: Dev Jain
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

Dev Jain <dev.jain@arm.com> writes:

> On 8/12/24 11:45, Huang, Ying wrote:
>> Dev Jain <dev.jain@arm.com> writes:
>>
>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>> Hi, Dev,
>>>>
>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>
>>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>>>> the failure of which, the original state of the PTEs will be restored and
>>>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>>>> thread will make progress and migration will be retried.
>>>>>
>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>> ---
>>>>>    mm/migrate.c | 9 +++++++++
>>>>>    1 file changed, 9 insertions(+)
>>>>>
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>    	}
>>>>>      	if (!folio_mapped(src)) {
>>>>> +		/*
>>>>> +		 * Someone may have changed the refcount and maybe sleeping
>>>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>>>> +		 * let the system make progress and retry.
>>>>> +		 */
>>>>> +		struct address_space *mapping = folio_mapping(src);
>>>>> +
>>>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>>> +			goto out;
>>>>>    		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>>>    		return MIGRATEPAGE_UNMAP;
>>>>>    	}
>>>> Do you have some test results for this?  For example, after applying the
>>>> patch, the migration success rate increased XX%, etc.
>>> I'll get back to you on this.
>>>
>>>> My understanding for this issue is that the migration success rate can
>>>> increase if we undo all changes before retrying.  This is the current
>>>> behavior for sync migration, but not for async migration.  If so, we can
>>>> use migrate_pages_sync() for async migration too to increase success
>>>> rate?  Of course, we need to change the function name and comments.
>>>
>>> As per my understanding, this is not the current behaviour for sync
>>> migration. After successful unmapping, we fail in migrate_folio_move()
>>> with -EAGAIN, we do not call undo src+dst (rendering the loop around
>>> migrate_folio_move() futile), we do not push the failed folio onto the
>>> ret_folios list, therefore, in _sync(), _batch() is never tried again.
>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>> called first, if failed, the folio will be restored to the original
>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called again.
>> So, we unlock once.  If it's necessary, we can unlock more times via
>> another level of loop.
>
> Yes, that's my point. We need to undo src+dst and retry.

For sync migration, we undo src+dst and retry now, but only once.  You
have shown that more retrying increases success rate.

> We will have
> to decide where we want this retrying to be; do we want to change the
> return value, end up in the while loop wrapped around _sync(), and retry
> there by adding another level of loop, or do we want to make use of the
> existing retry loops, one of which is wrapped around _unmap(); the latter
> is my approach. The utility I see for the former approach is that, in case
> of a large number of page migrations (which should usually be the case),
> we are giving more time for the folio to get retried. The latter does not
> give much time and discards the folio if it did not succeed under 7 times.

Because it's a race, I guess that most folios will be migrated
successfully in the first pass.

My concern with your method is that it deals with just one case
specially, while retrying after undoing everything appears more general.

If it's really important to retry after undoing all, we can either
convert two retying loops of migrate_pages_batch() into one loop, or
remove retry loop in migrate_pages_batch() and retry in its caller
instead.

--
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  5:35             ` Dev Jain
@ 2024-08-12  9:30               ` David Hildenbrand
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2024-08-12  9:30 UTC (permalink / raw)
  To: Dev Jain, akpm, shuah, willy
  Cc: ryan.roberts, anshuman.khandual, catalin.marinas, cl, vbabka,
	mhocko, apopple, osalvador, baolin.wang, dave.hansen, will,
	baohua, ioworker0, gshan, mark.rutland, kirill.shutemov, hughd,
	aneesh.kumar, yang, peterx, broonie, mgorman, linux-arm-kernel,
	linux-kernel, linux-mm, linux-kselftest, ying.huang

On 12.08.24 07:35, Dev Jain wrote:
> 
> On 8/11/24 14:38, David Hildenbrand wrote:
>> On 11.08.24 08:06, Dev Jain wrote:
>>>
>>> On 8/11/24 00:22, David Hildenbrand wrote:
>>>> On 10.08.24 20:42, Dev Jain wrote:
>>>>>
>>>>> On 8/9/24 19:17, David Hildenbrand wrote:
>>>>>> On 09.08.24 12:31, Dev Jain wrote:
>>>>>>> As already being done in __migrate_folio(), wherein we backoff if
>>>>>>> the
>>>>>>> folio refcount is wrong, make this check during the unmapping phase,
>>>>>>> upon
>>>>>>> the failure of which, the original state of the PTEs will be
>>>>>>> restored
>>>>>>> and
>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), any
>>>>>>> racing
>>>>>>> thread will make progress and migration will be retried.
>>>>>>>
>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>> ---
>>>>>>>      mm/migrate.c | 9 +++++++++
>>>>>>>      1 file changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>> --- a/mm/migrate.c
>>>>>>> +++ b/mm/migrate.c
>>>>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t
>>>>>>> get_new_folio,
>>>>>>>          }
>>>>>>>            if (!folio_mapped(src)) {
>>>>>>> +        /*
>>>>>>> +         * Someone may have changed the refcount and maybe sleeping
>>>>>>> +         * on the folio lock. In case of refcount mismatch, bail
>>>>>>> out,
>>>>>>> +         * let the system make progress and retry.
>>>>>>> +         */
>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>> +
>>>>>>> +        if (folio_ref_count(src) != folio_expected_refs(mapping,
>>>>>>> src))
>>>>>>> +            goto out;
>>>>>>
>>>>>> This really seems to be the latest point where we can "easily" back
>>>>>> off and unlock the source folio -- in this function :)
>>>>>>
>>>>>> I wonder if we should be smarter in the migrate_pages_batch() loop
>>>>>> when we start the actual migrations via migrate_folio_move(): if we
>>>>>> detect that a folio has unexpected references *and* it has waiters
>>>>>> (PG_waiters), back off then and retry the folio later. If it only has
>>>>>> unexpected references, just keep retrying: no waiters -> nobody is
>>>>>> waiting for the lock to make progress.
>>>>>
>>>>>
>>>>> The patch currently retries migration irrespective of the reason of
>>>>> refcount change.
>>>>>
>>>>> If you are suggesting that, break the retrying according to two
>>>>> conditions:
>>>>
>>>> That's not what I am suggesting ...
>>>>
>>>>>
>>>>>
>>>>>> This really seems to be the latest point where we can "easily" back
>>>>>> off and unlock the source folio -- in this function :)
>>>>>> For example, when migrate_folio_move() fails with -EAGAIN, check if
>>>>>> there are waiters (PG_waiter?) and undo+unlock to try again later.
>>>>>
>>>>>
>>>>> Currently, on -EAGAIN, migrate_folio_move() returns without undoing
>>>>> src
>>>>> and dst; even if we were to fall
>>>>
>>>> ...
>>>>
>>>> I am wondering if we should detect here if there are waiters and undo
>>>> src+dst.
>>>
>>> After undoing src+dst, which restores the PTEs, how are you going to
>>> set the
>>>
>>> PTEs to migration again? That is being done through
>>> migrate_folio_unmap(),
>>>
>>> and the loops of _unmap() and _move() are different. Or am I missing
>>> something...
>>
>> Again, no expert on the code, but it would mean that if we detect that
>> there are waiters, we would undo src+dst and add them to ret_folios,
>> similar to what we do in "Cleanup remaining folios" at the end of
>> migrate_pages_batch()?
>>
>> So instead of retrying migration of that folio, just give it up
>> immediately and retry again later.
>>
>> Of course, this means that (without further modifications to that
>> function), we would leave retrying these folios to the caller, such as
>> in migrate_pages_sync(), where we move ret_folios to the tail of
>> "folios" and retry migration.
> 
> So IIUC, you are saying to change the return value in
> __folio_migrate_mapping(), so that when move_to_new_folio() fails
> 
> in migrate_folio_move(), we end up in the retrying loop of _sync() which
> calls _batch() in synchronous mode. Here, we
> 
> will have to make a change to decide how much we want to retry?

So essentially, instead of checking for "unexpected references" and 
backing off once at the beginning (what you do in this patch), we would 
*not* add new checks for "unexpected references" and not fail early in 
that case.

Instead, we would continuously check if there are waiters, and if there 
are waiters, we back-off completely (->unlock) instead of retrying 
something that cannot possibly make progress.

For "unexpected references" it can make sense to just retry immediately, 
because these might just be speculative references or short-term 
references that will go away soon.

For "unexpected reference with waiters" (or just "waiters" which should 
be the same because "waiters" should imply "unexpected references"), 
it's different as you discovered.

What we do with these "somebody else is waiting to make progress" pages 
is indeed a good question -- Ying seems to have some ideas on how to 
optimize retrying further.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12  7:31           ` Huang, Ying
@ 2024-08-12 12:08             ` Dev Jain
  2024-08-13  5:00               ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-12 12:08 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/12/24 13:01, Huang, Ying wrote:
> Dev Jain <dev.jain@arm.com> writes:
>
>> On 8/12/24 11:45, Huang, Ying wrote:
>>> Dev Jain <dev.jain@arm.com> writes:
>>>
>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>> Hi, Dev,
>>>>>
>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>
>>>>>> As already being done in __migrate_folio(), wherein we backoff if the
>>>>>> folio refcount is wrong, make this check during the unmapping phase, upon
>>>>>> the failure of which, the original state of the PTEs will be restored and
>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), any racing
>>>>>> thread will make progress and migration will be retried.
>>>>>>
>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>> ---
>>>>>>     mm/migrate.c | 9 +++++++++
>>>>>>     1 file changed, 9 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>> --- a/mm/migrate.c
>>>>>> +++ b/mm/migrate.c
>>>>>> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>     	}
>>>>>>       	if (!folio_mapped(src)) {
>>>>>> +		/*
>>>>>> +		 * Someone may have changed the refcount and maybe sleeping
>>>>>> +		 * on the folio lock. In case of refcount mismatch, bail out,
>>>>>> +		 * let the system make progress and retry.
>>>>>> +		 */
>>>>>> +		struct address_space *mapping = folio_mapping(src);
>>>>>> +
>>>>>> +		if (folio_ref_count(src) != folio_expected_refs(mapping, src))
>>>>>> +			goto out;
>>>>>>     		__migrate_folio_record(dst, old_page_state, anon_vma);
>>>>>>     		return MIGRATEPAGE_UNMAP;
>>>>>>     	}
>>>>> Do you have some test results for this?  For example, after applying the
>>>>> patch, the migration success rate increased XX%, etc.
>>>> I'll get back to you on this.
>>>>
>>>>> My understanding for this issue is that the migration success rate can
>>>>> increase if we undo all changes before retrying.  This is the current
>>>>> behavior for sync migration, but not for async migration.  If so, we can
>>>>> use migrate_pages_sync() for async migration too to increase success
>>>>> rate?  Of course, we need to change the function name and comments.
>>>> As per my understanding, this is not the current behaviour for sync
>>>> migration. After successful unmapping, we fail in migrate_folio_move()
>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop around
>>>> migrate_folio_move() futile), we do not push the failed folio onto the
>>>> ret_folios list, therefore, in _sync(), _batch() is never tried again.
>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>> called first, if failed, the folio will be restored to the original
>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called again.
>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>> another level of loop.
>> Yes, that's my point. We need to undo src+dst and retry.
> For sync migration, we undo src+dst and retry now, but only once.  You
> have shown that more retrying increases success rate.
>
>> We will have
>> to decide where we want this retrying to be; do we want to change the
>> return value, end up in the while loop wrapped around _sync(), and retry
>> there by adding another level of loop, or do we want to make use of the
>> existing retry loops, one of which is wrapped around _unmap(); the latter
>> is my approach. The utility I see for the former approach is that, in case
>> of a large number of page migrations (which should usually be the case),
>> we are giving more time for the folio to get retried. The latter does not
>> give much time and discards the folio if it did not succeed under 7 times.
> Because it's a race, I guess that most folios will be migrated
> successfully in the first pass.
>
> My concerns of your method are that it deal with just one case
> specially.  While retrying after undoing all appears more general.


Makes sense. Also, please ignore my "change the return value"
thing; I got confused between unmap_folios, ret_folios, etc.
Now I think I understand what the lists are doing :)

>
> If it's really important to retry after undoing all, we can either
> convert two retying loops of migrate_pages_batch() into one loop, or
> remove retry loop in migrate_pages_batch() and retry in its caller
> instead.

And if I have implemented this correctly, the following always makes
the test pass:
https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe



>
> --
> Best Regards,
> Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-12 12:08             ` Dev Jain
@ 2024-08-13  5:00               ` Dev Jain
  2024-08-13  7:22                 ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-13  5:00 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/12/24 17:38, Dev Jain wrote:
>
> On 8/12/24 13:01, Huang, Ying wrote:
>> Dev Jain <dev.jain@arm.com> writes:
>>
>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>
>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>> Hi, Dev,
>>>>>>
>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>
>>>>>>> As already being done in __migrate_folio(), wherein we backoff 
>>>>>>> if the
>>>>>>> folio refcount is wrong, make this check during the unmapping 
>>>>>>> phase, upon
>>>>>>> the failure of which, the original state of the PTEs will be 
>>>>>>> restored and
>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), any 
>>>>>>> racing
>>>>>>> thread will make progress and migration will be retried.
>>>>>>>
>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>> ---
>>>>>>>     mm/migrate.c | 9 +++++++++
>>>>>>>     1 file changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>> --- a/mm/migrate.c
>>>>>>> +++ b/mm/migrate.c
>>>>>>> @@ -1250,6 +1250,15 @@ static int 
>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>         }
>>>>>>>           if (!folio_mapped(src)) {
>>>>>>> +        /*
>>>>>>> +         * Someone may have changed the refcount and maybe 
>>>>>>> sleeping
>>>>>>> +         * on the folio lock. In case of refcount mismatch, 
>>>>>>> bail out,
>>>>>>> +         * let the system make progress and retry.
>>>>>>> +         */
>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>> +
>>>>>>> +        if (folio_ref_count(src) != 
>>>>>>> folio_expected_refs(mapping, src))
>>>>>>> +            goto out;
>>>>>>>             __migrate_folio_record(dst, old_page_state, anon_vma);
>>>>>>>             return MIGRATEPAGE_UNMAP;
>>>>>>>         }
>>>>>> Do you have some test results for this?  For example, after 
>>>>>> applying the
>>>>>> patch, the migration success rate increased XX%, etc.
>>>>> I'll get back to you on this.
>>>>>
>>>>>> My understanding for this issue is that the migration success 
>>>>>> rate can
>>>>>> increase if we undo all changes before retrying.  This is the 
>>>>>> current
>>>>>> behavior for sync migration, but not for async migration.  If so, 
>>>>>> we can
>>>>>> use migrate_pages_sync() for async migration too to increase success
>>>>>> rate?  Of course, we need to change the function name and comments.
>>>>> As per my understanding, this is not the current behaviour for sync
>>>>> migration. After successful unmapping, we fail in 
>>>>> migrate_folio_move()
>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop around
>>>>> migrate_folio_move() futile), we do not push the failed folio onto 
>>>>> the
>>>>> ret_folios list, therefore, in _sync(), _batch() is never tried 
>>>>> again.
>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>>> called first, if failed, the folio will be restored to the original
>>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called again.
>>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>>> another level of loop.
>>> Yes, that's my point. We need to undo src+dst and retry.
>> For sync migration, we undo src+dst and retry now, but only once.  You
>> have shown that more retrying increases success rate.
>>
>>> We will have
>>> to decide where we want this retrying to be; do we want to change the
>>> return value, end up in the while loop wrapped around _sync(), and 
>>> retry
>>> there by adding another level of loop, or do we want to make use of the
>>> existing retry loops, one of which is wrapped around _unmap(); the 
>>> latter
>>> is my approach. The utility I see for the former approach is that, 
>>> in case
>>> of a large number of page migrations (which should usually be the 
>>> case),
>>> we are giving more time for the folio to get retried. The latter 
>>> does not
>>> give much time and discards the folio if it did not succeed under 7 
>>> times.
>> Because it's a race, I guess that most folios will be migrated
>> successfully in the first pass.
>>
>> My concerns of your method are that it deal with just one case
>> specially.  While retrying after undoing all appears more general.
>
>
> Makes sense. Also, please ignore my "change the return value"
> thing, I got confused between unmap_folios, ret_folios, etc.
> Now I think I understood what the lists are doing :)
>
>>
>> If it's really important to retry after undoing all, we can either
>> convert two retying loops of migrate_pages_batch() into one loop, or
>> remove retry loop in migrate_pages_batch() and retry in its caller
>> instead.
>
> And if I implemented this correctly, the following makes the test
> pass always:
> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe


Okay, I did mess up the implementation, leading to a false
positive. Let me try again :)

>
>
>
>>
>> -- 
>> Best Regards,
>> Huang, Ying
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-13  5:00               ` Dev Jain
@ 2024-08-13  7:22                 ` Dev Jain
  2024-08-16 11:31                   ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-13  7:22 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/13/24 10:30, Dev Jain wrote:
>
> On 8/12/24 17:38, Dev Jain wrote:
>>
>> On 8/12/24 13:01, Huang, Ying wrote:
>>> Dev Jain <dev.jain@arm.com> writes:
>>>
>>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>
>>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>>> Hi, Dev,
>>>>>>>
>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>
>>>>>>>> As already being done in __migrate_folio(), wherein we backoff 
>>>>>>>> if the
>>>>>>>> folio refcount is wrong, make this check during the unmapping 
>>>>>>>> phase, upon
>>>>>>>> the failure of which, the original state of the PTEs will be 
>>>>>>>> restored and
>>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), 
>>>>>>>> any racing
>>>>>>>> thread will make progress and migration will be retried.
>>>>>>>>
>>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>>> ---
>>>>>>>>     mm/migrate.c | 9 +++++++++
>>>>>>>>     1 file changed, 9 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>>> --- a/mm/migrate.c
>>>>>>>> +++ b/mm/migrate.c
>>>>>>>> @@ -1250,6 +1250,15 @@ static int 
>>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>>         }
>>>>>>>>           if (!folio_mapped(src)) {
>>>>>>>> +        /*
>>>>>>>> +         * Someone may have changed the refcount and maybe 
>>>>>>>> sleeping
>>>>>>>> +         * on the folio lock. In case of refcount mismatch, 
>>>>>>>> bail out,
>>>>>>>> +         * let the system make progress and retry.
>>>>>>>> +         */
>>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>>> +
>>>>>>>> +        if (folio_ref_count(src) != 
>>>>>>>> folio_expected_refs(mapping, src))
>>>>>>>> +            goto out;
>>>>>>>>             __migrate_folio_record(dst, old_page_state, anon_vma);
>>>>>>>>             return MIGRATEPAGE_UNMAP;
>>>>>>>>         }
>>>>>>> Do you have some test results for this?  For example, after 
>>>>>>> applying the
>>>>>>> patch, the migration success rate increased XX%, etc.
>>>>>> I'll get back to you on this.
>>>>>>
>>>>>>> My understanding for this issue is that the migration success 
>>>>>>> rate can
>>>>>>> increase if we undo all changes before retrying.  This is the 
>>>>>>> current
>>>>>>> behavior for sync migration, but not for async migration.  If 
>>>>>>> so, we can
>>>>>>> use migrate_pages_sync() for async migration too to increase 
>>>>>>> success
>>>>>>> rate?  Of course, we need to change the function name and comments.
>>>>>> As per my understanding, this is not the current behaviour for sync
>>>>>> migration. After successful unmapping, we fail in 
>>>>>> migrate_folio_move()
>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop around
>>>>>> migrate_folio_move() futile), we do not push the failed folio 
>>>>>> onto the
>>>>>> ret_folios list, therefore, in _sync(), _batch() is never tried 
>>>>>> again.
>>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>>>> called first, if failed, the folio will be restored to the original
>>>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called again.
>>>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>>>> another level of loop.
>>>> Yes, that's my point. We need to undo src+dst and retry.
>>> For sync migration, we undo src+dst and retry now, but only once.  You
>>> have shown that more retrying increases success rate.
>>>
>>>> We will have
>>>> to decide where we want this retrying to be; do we want to change the
>>>> return value, end up in the while loop wrapped around _sync(), and 
>>>> retry
>>>> there by adding another level of loop, or do we want to make use of 
>>>> the
>>>> existing retry loops, one of which is wrapped around _unmap(); the 
>>>> latter
>>>> is my approach. The utility I see for the former approach is that, 
>>>> in case
>>>> of a large number of page migrations (which should usually be the 
>>>> case),
>>>> we are giving more time for the folio to get retried. The latter 
>>>> does not
>>>> give much time and discards the folio if it did not succeed under 7 
>>>> times.
>>> Because it's a race, I guess that most folios will be migrated
>>> successfully in the first pass.
>>>
>>> My concerns of your method are that it deal with just one case
>>> specially.  While retrying after undoing all appears more general.
>>
>>
>> Makes sense. Also, please ignore my "change the return value"
>> thing, I got confused between unmap_folios, ret_folios, etc.
>> Now I think I understood what the lists are doing :)
>>
>>>
>>> If it's really important to retry after undoing all, we can either
>>> convert two retying loops of migrate_pages_batch() into one loop, or
>>> remove retry loop in migrate_pages_batch() and retry in its caller
>>> instead.
>>
>> And if I implemented this correctly, the following makes the test
>> pass always:
>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe
>
>
> Okay, I did mess up with the implementation, leading to a false
> positive. Let me try again :)


Hopefully this should do the job:
https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u

But the result is worse than with the proposed patch; I rarely hit
a three-digit number of successful move_pages() calls. Yet, on a
base kernel without any changes, when I apply David's suggestion to
change the test and choose 7 as the number of retries
(= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I can reach even four
digits. I am puzzled.
We can also try merging the for loops of unmap and move...

>
>>
>>
>>
>>>
>>> -- 
>>> Best Regards,
>>> Huang, Ying
>>
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-13  7:22                 ` Dev Jain
@ 2024-08-16 11:31                   ` Dev Jain
  2024-08-19  6:58                     ` Huang, Ying
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-16 11:31 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/13/24 12:52, Dev Jain wrote:
>
> On 8/13/24 10:30, Dev Jain wrote:
>>
>> On 8/12/24 17:38, Dev Jain wrote:
>>>
>>> On 8/12/24 13:01, Huang, Ying wrote:
>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>
>>>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>
>>>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>>>> Hi, Dev,
>>>>>>>>
>>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>>
>>>>>>>>> As already being done in __migrate_folio(), wherein we backoff 
>>>>>>>>> if the
>>>>>>>>> folio refcount is wrong, make this check during the unmapping 
>>>>>>>>> phase, upon
>>>>>>>>> the failure of which, the original state of the PTEs will be 
>>>>>>>>> restored and
>>>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(), 
>>>>>>>>> any racing
>>>>>>>>> thread will make progress and migration will be retried.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>>>> ---
>>>>>>>>>     mm/migrate.c | 9 +++++++++
>>>>>>>>>     1 file changed, 9 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>>>> --- a/mm/migrate.c
>>>>>>>>> +++ b/mm/migrate.c
>>>>>>>>> @@ -1250,6 +1250,15 @@ static int 
>>>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>>>         }
>>>>>>>>>           if (!folio_mapped(src)) {
>>>>>>>>> +        /*
>>>>>>>>> +         * Someone may have changed the refcount and maybe 
>>>>>>>>> sleeping
>>>>>>>>> +         * on the folio lock. In case of refcount mismatch, 
>>>>>>>>> bail out,
>>>>>>>>> +         * let the system make progress and retry.
>>>>>>>>> +         */
>>>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>>>> +
>>>>>>>>> +        if (folio_ref_count(src) != 
>>>>>>>>> folio_expected_refs(mapping, src))
>>>>>>>>> +            goto out;
>>>>>>>>>             __migrate_folio_record(dst, old_page_state, 
>>>>>>>>> anon_vma);
>>>>>>>>>             return MIGRATEPAGE_UNMAP;
>>>>>>>>>         }
>>>>>>>> Do you have some test results for this?  For example, after 
>>>>>>>> applying the
>>>>>>>> patch, the migration success rate increased XX%, etc.
>>>>>>> I'll get back to you on this.
>>>>>>>
>>>>>>>> My understanding for this issue is that the migration success 
>>>>>>>> rate can
>>>>>>>> increase if we undo all changes before retrying. This is the 
>>>>>>>> current
>>>>>>>> behavior for sync migration, but not for async migration.  If 
>>>>>>>> so, we can
>>>>>>>> use migrate_pages_sync() for async migration too to increase 
>>>>>>>> success
>>>>>>>> rate?  Of course, we need to change the function name and 
>>>>>>>> comments.
>>>>>>> As per my understanding, this is not the current behaviour for sync
>>>>>>> migration. After successful unmapping, we fail in 
>>>>>>> migrate_folio_move()
>>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop 
>>>>>>> around
>>>>>>> migrate_folio_move() futile), we do not push the failed folio 
>>>>>>> onto the
>>>>>>> ret_folios list, therefore, in _sync(), _batch() is never tried 
>>>>>>> again.
>>>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>>>>> called first, if failed, the folio will be restored to the original
>>>>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called 
>>>>>> again.
>>>>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>>>>> another level of loop.
>>>>> Yes, that's my point. We need to undo src+dst and retry.
>>>> For sync migration, we undo src+dst and retry now, but only once.  You
>>>> have shown that more retrying increases success rate.
>>>>
>>>>> We will have
>>>>> to decide where we want this retrying to be; do we want to change the
>>>>> return value, end up in the while loop wrapped around _sync(), and 
>>>>> retry
>>>>> there by adding another level of loop, or do we want to make use 
>>>>> of the
>>>>> existing retry loops, one of which is wrapped around _unmap(); the 
>>>>> latter
>>>>> is my approach. The utility I see for the former approach is that, 
>>>>> in case
>>>>> of a large number of page migrations (which should usually be the 
>>>>> case),
>>>>> we are giving more time for the folio to get retried. The latter 
>>>>> does not
>>>>> give much time and discards the folio if it did not succeed under 
>>>>> 7 times.
>>>> Because it's a race, I guess that most folios will be migrated
>>>> successfully in the first pass.
>>>>
>>>> My concerns of your method are that it deal with just one case
>>>> specially.  While retrying after undoing all appears more general.
>>>
>>>
>>> Makes sense. Also, please ignore my "change the return value"
>>> thing, I got confused between unmap_folios, ret_folios, etc.
>>> Now I think I understood what the lists are doing :)
>>>
>>>>
>>>> If it's really important to retry after undoing all, we can either
>>>> convert two retying loops of migrate_pages_batch() into one loop, or
>>>> remove retry loop in migrate_pages_batch() and retry in its caller
>>>> instead.
>>>
>>> And if I implemented this correctly, the following makes the test
>>> pass always:
>>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe
>>
>>
>> Okay, I did mess up with the implementation, leading to a false
>> positive. Let me try again :)
>
>
> Hopefully this should do the job:
> https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u
>
> But the result is worse than the patch proposed; I rarely hit
> a 3 digit number of successes of move_pages(). But, on a
> base kernel without any changes, when I apply David's
> suggestion to change the test, if I choose 7 as the number
> of retries (= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I
> can touch even 4 digits. I am puzzled.
> We can also try merging the for loops of unmap and move...


If people are okay with this change, I guess I can send it as
a v2? I concur with your assessment that my initial approach
solves only a specific case; the approach above does give me
a slight improvement on arm64 and should be an improvement
in general, since it makes sense to defer retrying a failed folio
for as long as we can.

>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-16 11:31                   ` Dev Jain
@ 2024-08-19  6:58                     ` Huang, Ying
  2024-08-20  7:16                       ` Dev Jain
  0 siblings, 1 reply; 30+ messages in thread
From: Huang, Ying @ 2024-08-19  6:58 UTC (permalink / raw)
  To: Dev Jain
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

Dev Jain <dev.jain@arm.com> writes:

> On 8/13/24 12:52, Dev Jain wrote:
>>
>> On 8/13/24 10:30, Dev Jain wrote:
>>>
>>> On 8/12/24 17:38, Dev Jain wrote:
>>>>
>>>> On 8/12/24 13:01, Huang, Ying wrote:
>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>
>>>>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>
>>>>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>>>>> Hi, Dev,
>>>>>>>>>
>>>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>>>
>>>>>>>>>> As already being done in __migrate_folio(), wherein we
>>>>>>>>>> backoff if the
>>>>>>>>>> folio refcount is wrong, make this check during the
>>>>>>>>>> unmapping phase, upon
>>>>>>>>>> the failure of which, the original state of the PTEs will be
>>>>>>>>>> restored and
>>>>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(),
>>>>>>>>>> any racing
>>>>>>>>>> thread will make progress and migration will be retried.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>>>>> ---
>>>>>>>>>>     mm/migrate.c | 9 +++++++++
>>>>>>>>>>     1 file changed, 9 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>>>>> --- a/mm/migrate.c
>>>>>>>>>> +++ b/mm/migrate.c
>>>>>>>>>> @@ -1250,6 +1250,15 @@ static int
>>>>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>>>>         }
>>>>>>>>>>           if (!folio_mapped(src)) {
>>>>>>>>>> +        /*
>>>>>>>>>> +         * Someone may have changed the refcount and maybe
>>>>>>>>>> sleeping
>>>>>>>>>> +         * on the folio lock. In case of refcount mismatch,
>>>>>>>>>> bail out,
>>>>>>>>>> +         * let the system make progress and retry.
>>>>>>>>>> +         */
>>>>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>>>>> +
>>>>>>>>>> +        if (folio_ref_count(src) !=
>>>>>>>>>> folio_expected_refs(mapping, src))
>>>>>>>>>> +            goto out;
>>>>>>>>>>             __migrate_folio_record(dst, old_page_state,
>>>>>>>>>> anon_vma);
>>>>>>>>>>             return MIGRATEPAGE_UNMAP;
>>>>>>>>>>         }
>>>>>>>>> Do you have some test results for this?  For example, after
>>>>>>>>> applying the
>>>>>>>>> patch, the migration success rate increased XX%, etc.
>>>>>>>> I'll get back to you on this.
>>>>>>>>
>>>>>>>>> My understanding for this issue is that the migration success
>>>>>>>>> rate can
>>>>>>>>> increase if we undo all changes before retrying. This is the
>>>>>>>>> current
>>>>>>>>> behavior for sync migration, but not for async migration.  If
>>>>>>>>> so, we can
>>>>>>>>> use migrate_pages_sync() for async migration too to increase
>>>>>>>>> success
>>>>>>>>> rate?  Of course, we need to change the function name and
>>>>>>>>> comments.
>>>>>>>> As per my understanding, this is not the current behaviour for sync
>>>>>>>> migration. After successful unmapping, we fail in
>>>>>>>> migrate_folio_move()
>>>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop
>>>>>>>> around
>>>>>>>> migrate_folio_move() futile), we do not push the failed folio
>>>>>>>> onto the
>>>>>>>> ret_folios list, therefore, in _sync(), _batch() is never
>>>>>>>> tried again.
>>>>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>>>>>> called first, if failed, the folio will be restored to the original
>>>>>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called
>>>>>>> again.
>>>>>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>>>>>> another level of loop.
>>>>>> Yes, that's my point. We need to undo src+dst and retry.
>>>>> For sync migration, we undo src+dst and retry now, but only once.  You
>>>>> have shown that more retrying increases success rate.
>>>>>
>>>>>> We will have
>>>>>> to decide where we want this retrying to be; do we want to change the
>>>>>> return value, end up in the while loop wrapped around _sync(),
>>>>>> and retry
>>>>>> there by adding another level of loop, or do we want to make use
>>>>>> of the
>>>>>> existing retry loops, one of which is wrapped around _unmap();
>>>>>> the latter
>>>>>> is my approach. The utility I see for the former approach is
>>>>>> that, in case
>>>>>> of a large number of page migrations (which should usually be
>>>>>> the case),
>>>>>> we are giving more time for the folio to get retried. The latter
>>>>>> does not
>>>>>> give much time and discards the folio if it did not succeed
>>>>>> under 7 times.
>>>>> Because it's a race, I guess that most folios will be migrated
>>>>> successfully in the first pass.
>>>>>
>>>>> My concerns of your method are that it deal with just one case
>>>>> specially.  While retrying after undoing all appears more general.
>>>>
>>>>
>>>> Makes sense. Also, please ignore my "change the return value"
>>>> thing, I got confused between unmap_folios, ret_folios, etc.
>>>> Now I think I understood what the lists are doing :)
>>>>
>>>>>
>>>>> If it's really important to retry after undoing all, we can either
>>>>> convert two retying loops of migrate_pages_batch() into one loop, or
>>>>> remove retry loop in migrate_pages_batch() and retry in its caller
>>>>> instead.
>>>>
>>>> And if I implemented this correctly, the following makes the test
>>>> pass always:
>>>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe
>>>
>>>
>>> Okay, I did mess up with the implementation, leading to a false
>>> positive. Let me try again :)
>>
>>
>> Hopefully this should do the job:
>> https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u
>>
>> But the result is worse than the patch proposed; I rarely hit
>> a 3 digit number of successes of move_pages(). But, on a
>> base kernel without any changes, when I apply David's
>> suggestion to change the test, if I choose 7 as the number
>> of retries (= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I
>> can touch even 4 digits. I am puzzled.
>> We can also try merging the for loops of unmap and move...
>
>
> If people are okay with this change, I guess I can send it as
> a v2? I concur with your assessment that my initial approach
> is solving a specific case; the above approach does give me
> a slight improvement on arm64 and should be an improvement
> in general, since it makes sense to defer retrying the failed folio
> as much as we can.

We need to deal with a few more things before a formal v2:

- The stats need to be fixed; please check the result processing for
  the first loop of migrate_pages_sync().

- Do we need something similar for async migration?

- Can we add another level of explicit loop for the second loop of
  migrate_pages_sync()?  That would improve code readability.  Or, add
  a function to do that?

- Is it good to remove the retry loop in migrate_pages_batch() and
  retry in the caller instead?

--
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-19  6:58                     ` Huang, Ying
@ 2024-08-20  7:16                       ` Dev Jain
  2024-09-02  6:42                         ` Huang, Ying
  0 siblings, 1 reply; 30+ messages in thread
From: Dev Jain @ 2024-08-20  7:16 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest


On 8/19/24 12:28, Huang, Ying wrote:
> Dev Jain <dev.jain@arm.com> writes:
>
>> On 8/13/24 12:52, Dev Jain wrote:
>>> On 8/13/24 10:30, Dev Jain wrote:
>>>> On 8/12/24 17:38, Dev Jain wrote:
>>>>> On 8/12/24 13:01, Huang, Ying wrote:
>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>
>>>>>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>>
>>>>>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>>>>>> Hi, Dev,
>>>>>>>>>>
>>>>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>>>>
>>>>>>>>>>> As already being done in __migrate_folio(), wherein we
>>>>>>>>>>> backoff if the
>>>>>>>>>>> folio refcount is wrong, make this check during the
>>>>>>>>>>> unmapping phase, upon
>>>>>>>>>>> the failure of which, the original state of the PTEs will be
>>>>>>>>>>> restored and
>>>>>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(),
>>>>>>>>>>> any racing
>>>>>>>>>>> thread will make progress and migration will be retried.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>>>>>> ---
>>>>>>>>>>>      mm/migrate.c | 9 +++++++++
>>>>>>>>>>>      1 file changed, 9 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>>>>>> --- a/mm/migrate.c
>>>>>>>>>>> +++ b/mm/migrate.c
>>>>>>>>>>> @@ -1250,6 +1250,15 @@ static int
>>>>>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>>>>>          }
>>>>>>>>>>>            if (!folio_mapped(src)) {
>>>>>>>>>>> +        /*
>>>>>>>>>>> +         * Someone may have changed the refcount and maybe
>>>>>>>>>>> sleeping
>>>>>>>>>>> +         * on the folio lock. In case of refcount mismatch,
>>>>>>>>>>> bail out,
>>>>>>>>>>> +         * let the system make progress and retry.
>>>>>>>>>>> +         */
>>>>>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>>>>>> +
>>>>>>>>>>> +        if (folio_ref_count(src) !=
>>>>>>>>>>> folio_expected_refs(mapping, src))
>>>>>>>>>>> +            goto out;
>>>>>>>>>>>              __migrate_folio_record(dst, old_page_state,
>>>>>>>>>>> anon_vma);
>>>>>>>>>>>              return MIGRATEPAGE_UNMAP;
>>>>>>>>>>>          }
>>>>>>>>>> Do you have some test results for this?  For example, after
>>>>>>>>>> applying the
>>>>>>>>>> patch, the migration success rate increased XX%, etc.
>>>>>>>>> I'll get back to you on this.
>>>>>>>>>
>>>>>>>>>> My understanding for this issue is that the migration success
>>>>>>>>>> rate can
>>>>>>>>>> increase if we undo all changes before retrying. This is the
>>>>>>>>>> current
>>>>>>>>>> behavior for sync migration, but not for async migration.  If
>>>>>>>>>> so, we can
>>>>>>>>>> use migrate_pages_sync() for async migration too to increase
>>>>>>>>>> success
>>>>>>>>>> rate?  Of course, we need to change the function name and
>>>>>>>>>> comments.
>>>>>>>>> As per my understanding, this is not the current behaviour for sync
>>>>>>>>> migration. After successful unmapping, we fail in
>>>>>>>>> migrate_folio_move()
>>>>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop
>>>>>>>>> around
>>>>>>>>> migrate_folio_move() futile), we do not push the failed folio
>>>>>>>>> onto the
>>>>>>>>> ret_folios list, therefore, in _sync(), _batch() is never
>>>>>>>>> tried again.
>>>>>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>>>>>>> called first, if failed, the folio will be restored to the original
>>>>>>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called
>>>>>>>> again.
>>>>>>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>>>>>>> another level of loop.
>>>>>>> Yes, that's my point. We need to undo src+dst and retry.
>>>>>> For sync migration, we undo src+dst and retry now, but only once.  You
>>>>>> have shown that more retrying increases success rate.
>>>>>>
>>>>>>> We will have
>>>>>>> to decide where we want this retrying to be; do we want to change the
>>>>>>> return value, end up in the while loop wrapped around _sync(),
>>>>>>> and retry
>>>>>>> there by adding another level of loop, or do we want to make use
>>>>>>> of the
>>>>>>> existing retry loops, one of which is wrapped around _unmap();
>>>>>>> the latter
>>>>>>> is my approach. The utility I see for the former approach is
>>>>>>> that, in case
>>>>>>> of a large number of page migrations (which should usually be
>>>>>>> the case),
>>>>>>> we are giving more time for the folio to get retried. The latter
>>>>>>> does not
>>>>>>> give much time and discards the folio if it did not succeed
>>>>>>> under 7 times.
>>>>>> Because it's a race, I guess that most folios will be migrated
>>>>>> successfully in the first pass.
>>>>>>
>>>>>> My concerns of your method are that it deal with just one case
>>>>>> specially.  While retrying after undoing all appears more general.
>>>>>
>>>>> Makes sense. Also, please ignore my "change the return value"
>>>>> thing, I got confused between unmap_folios, ret_folios, etc.
>>>>> Now I think I understood what the lists are doing :)
>>>>>
>>>>>> If it's really important to retry after undoing all, we can either
>>>>>> convert two retying loops of migrate_pages_batch() into one loop, or
>>>>>> remove retry loop in migrate_pages_batch() and retry in its caller
>>>>>> instead.
>>>>> And if I implemented this correctly, the following makes the test
>>>>> pass always:
>>>>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe
>>>>
>>>> Okay, I did mess up with the implementation, leading to a false
>>>> positive. Let me try again :)
>>>
>>> Hopefully this should do the job:
>>> https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u
>>>
>>> But the result is worse than the patch proposed; I rarely hit
>>> a 3 digit number of successes of move_pages(). But, on a
>>> base kernel without any changes, when I apply David's
>>> suggestion to change the test, if I choose 7 as the number
>>> of retries (= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I
>>> can touch even 4 digits. I am puzzled.
>>> We can also try merging the for loops of unmap and move...
>>
>> If people are okay with this change, I guess I can send it as
>> a v2? I concur with your assessment that my initial approach
>> is solving a specific case; the above approach does give me
>> a slight improvement on arm64 and should be an improvement
>> in general, since it makes sense to defer retrying the failed folio
>> as much as we can.
> We need to deal with something else before a formal v2,
>
> - stats need to be fixed, please check result processing for the first
>    loop of migrate_pages_sync().

Sorry, can you point out exactly where they need to be fixed?
The change I made is inside the while (!list_empty(from)) block,
and no stat computation is done there.

>
> - Do we need something similar for async migration.
>
> - Can we add another level of explicit loop for the second loop of
>    migrate_pages_sync()?  That is to improve code readability.  Or, add a
>    function to dot that?
>
> - Is it good to remove retry loop in migrate_pages_batch()?  And do
>    retry in the caller?

I am personally in favour of leaving the retry loop, and async
migration, as they are. Since the async version is basically
minimal-effort migration, it doesn't make sense to "optimize" it,
given the code churn it would create, including the change we would
then have to make to the "if (mode == MIGRATE_ASYNC) =>
migrate_pages_batch(ASYNC)" path inside migrate_pages().

Sorry, what do you mean by "another level of explicit loop"?

>
> --
> Best Regards,
> Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch
  2024-08-20  7:16                       ` Dev Jain
@ 2024-09-02  6:42                         ` Huang, Ying
  0 siblings, 0 replies; 30+ messages in thread
From: Huang, Ying @ 2024-09-02  6:42 UTC (permalink / raw)
  To: Dev Jain
  Cc: akpm, shuah, david, willy, ryan.roberts, anshuman.khandual,
	catalin.marinas, cl, vbabka, mhocko, apopple, osalvador,
	baolin.wang, dave.hansen, will, baohua, ioworker0, gshan,
	mark.rutland, kirill.shutemov, hughd, aneesh.kumar, yang, peterx,
	broonie, mgorman, linux-arm-kernel, linux-kernel, linux-mm,
	linux-kselftest

Dev Jain <dev.jain@arm.com> writes:

> On 8/19/24 12:28, Huang, Ying wrote:
>> Dev Jain <dev.jain@arm.com> writes:
>>
>>> On 8/13/24 12:52, Dev Jain wrote:
>>>> On 8/13/24 10:30, Dev Jain wrote:
>>>>> On 8/12/24 17:38, Dev Jain wrote:
>>>>>> On 8/12/24 13:01, Huang, Ying wrote:
>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>
>>>>>>>> On 8/12/24 11:45, Huang, Ying wrote:
>>>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>>>
>>>>>>>>>> On 8/12/24 11:04, Huang, Ying wrote:
>>>>>>>>>>> Hi, Dev,
>>>>>>>>>>>
>>>>>>>>>>> Dev Jain <dev.jain@arm.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> As already being done in __migrate_folio(), wherein we
>>>>>>>>>>>> backoff if the
>>>>>>>>>>>> folio refcount is wrong, make this check during the
>>>>>>>>>>>> unmapping phase, upon
>>>>>>>>>>>> the failure of which, the original state of the PTEs will be
>>>>>>>>>>>> restored and
>>>>>>>>>>>> the folio lock will be dropped via migrate_folio_undo_src(),
>>>>>>>>>>>> any racing
>>>>>>>>>>>> thread will make progress and migration will be retried.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>      mm/migrate.c | 9 +++++++++
>>>>>>>>>>>>      1 file changed, 9 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>>>>>>>> index e7296c0fb5d5..477acf996951 100644
>>>>>>>>>>>> --- a/mm/migrate.c
>>>>>>>>>>>> +++ b/mm/migrate.c
>>>>>>>>>>>> @@ -1250,6 +1250,15 @@ static int
>>>>>>>>>>>> migrate_folio_unmap(new_folio_t get_new_folio,
>>>>>>>>>>>>          }
>>>>>>>>>>>>            if (!folio_mapped(src)) {
>>>>>>>>>>>> +        /*
>>>>>>>>>>>> +         * Someone may have changed the refcount and maybe
>>>>>>>>>>>> sleeping
>>>>>>>>>>>> +         * on the folio lock. In case of refcount mismatch,
>>>>>>>>>>>> bail out,
>>>>>>>>>>>> +         * let the system make progress and retry.
>>>>>>>>>>>> +         */
>>>>>>>>>>>> +        struct address_space *mapping = folio_mapping(src);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (folio_ref_count(src) !=
>>>>>>>>>>>> folio_expected_refs(mapping, src))
>>>>>>>>>>>> +            goto out;
>>>>>>>>>>>>              __migrate_folio_record(dst, old_page_state,
>>>>>>>>>>>> anon_vma);
>>>>>>>>>>>>              return MIGRATEPAGE_UNMAP;
>>>>>>>>>>>>          }
>>>>>>>>>>> Do you have some test results for this?  For example, after
>>>>>>>>>>> applying the
>>>>>>>>>>> patch, the migration success rate increased XX%, etc.
>>>>>>>>>> I'll get back to you on this.
>>>>>>>>>>
>>>>>>>>>>> My understanding for this issue is that the migration success
>>>>>>>>>>> rate can
>>>>>>>>>>> increase if we undo all changes before retrying. This is the
>>>>>>>>>>> current
>>>>>>>>>>> behavior for sync migration, but not for async migration.  If
>>>>>>>>>>> so, we can
>>>>>>>>>>> use migrate_pages_sync() for async migration too to increase
>>>>>>>>>>> success
>>>>>>>>>>> rate?  Of course, we need to change the function name and
>>>>>>>>>>> comments.
>>>>>>>>>> As per my understanding, this is not the current behaviour for sync
>>>>>>>>>> migration. After successful unmapping, we fail in
>>>>>>>>>> migrate_folio_move()
>>>>>>>>>> with -EAGAIN, we do not call undo src+dst (rendering the loop
>>>>>>>>>> around
>>>>>>>>>> migrate_folio_move() futile), we do not push the failed folio
>>>>>>>>>> onto the
>>>>>>>>>> ret_folios list, therefore, in _sync(), _batch() is never
>>>>>>>>>> tried again.
>>>>>>>>> In migrate_pages_sync(), migrate_pages_batch(,MIGRATE_ASYNC) will be
>>>>>>>>> called first, if failed, the folio will be restored to the original
>>>>>>>>> state (unlocked).  Then migrate_pages_batch(,_SYNC*) is called
>>>>>>>>> again.
>>>>>>>>> So, we unlock once.  If it's necessary, we can unlock more times via
>>>>>>>>> another level of loop.
>>>>>>>> Yes, that's my point. We need to undo src+dst and retry.
>>>>>>> For sync migration, we undo src+dst and retry now, but only once.  You
>>>>>>> have shown that more retrying increases success rate.
>>>>>>>
>>>>>>>> We will have
>>>>>>>> to decide where we want this retrying to be; do we want to change the
>>>>>>>> return value, end up in the while loop wrapped around _sync(),
>>>>>>>> and retry
>>>>>>>> there by adding another level of loop, or do we want to make use
>>>>>>>> of the
>>>>>>>> existing retry loops, one of which is wrapped around _unmap();
>>>>>>>> the latter
>>>>>>>> is my approach. The utility I see for the former approach is
>>>>>>>> that, in case
>>>>>>>> of a large number of page migrations (which should usually be
>>>>>>>> the case),
>>>>>>>> we are giving more time for the folio to get retried. The latter
>>>>>>>> does not
>>>>>>>> give much time and discards the folio if it did not succeed
>>>>>>>> under 7 times.
>>>>>>> Because it's a race, I guess that most folios will be migrated
>>>>>>> successfully in the first pass.
>>>>>>>
>>>>>>> My concerns of your method are that it deal with just one case
>>>>>>> specially.  While retrying after undoing all appears more general.
>>>>>>
>>>>>> Makes sense. Also, please ignore my "change the return value"
>>>>>> thing, I got confused between unmap_folios, ret_folios, etc.
>>>>>> Now I think I understood what the lists are doing :)
>>>>>>
>>>>>>> If it's really important to retry after undoing all, we can either
>>>>>>> convert two retying loops of migrate_pages_batch() into one loop, or
>>>>>>> remove retry loop in migrate_pages_batch() and retry in its caller
>>>>>>> instead.
>>>>>> And if I implemented this correctly, the following makes the test
>>>>>> pass always:
>>>>>> https://www.codedump.xyz/diff/Zrn7EdxzNXmXyNXe
>>>>>
>>>>> Okay, I did mess up with the implementation, leading to a false
>>>>> positive. Let me try again :)
>>>>
>>>> Hopefully this should do the job:
>>>> https://www.codedump.xyz/diff/ZrsIV8JSOPYx5V_u
>>>>
>>>> But the result is worse than the patch proposed; I rarely hit
>>>> a 3 digit number of successes of move_pages(). But, on a
>>>> base kernel without any changes, when I apply David's
>>>> suggestion to change the test, if I choose 7 as the number
>>>> of retries (= NR_MAX_MIGRATE_SYNC_RETRY) in the test, I
>>>> can touch even 4 digits. I am puzzled.
>>>> We can also try merging the for loops of unmap and move...
>>>
>>> If people are okay with this change, I guess I can send it as
>>> a v2? I concur with your assessment that my initial approach
>>> is solving a specific case; the above approach does give me
>>> a slight improvement on arm64 and should be an improvement
>>> in general, since it makes sense to defer retrying the failed folio
>>> as much as we can.
>> We need to deal with something else before a formal v2,
>>
>> - stats need to be fixed, please check result processing for the first
>>    loop of migrate_pages_sync().
>
> Sorry, can you point out where do they need to be fixed exactly?
> The change I did is inside the while(!list_empty(from)) block,
> and there is no stat computation being done there already.

The migration failure for one folio may be counted multiple times
during retrying, so you may need to drop the counting except on the
final attempt.

>>
>> - Do we need something similar for async migration.
>>
>> - Can we add another level of explicit loop for the second loop of
>>    migrate_pages_sync()?  That is to improve code readability.  Or, add a
>>    function to dot that?
>>
>> - Is it good to remove retry loop in migrate_pages_batch()?  And do
>>    retry in the caller?
>
> I am personally in favour of leaving the retry loop, and async
> migration, as it is. Since async version is basically minimal-effort
> migration, it won't make sense to "optimize" it, given the code churn
> it would create, including the change we will have to then do in
> "if (mode == MIGRATE_ASYNC) => migrate_pages_batch(ASYNC)" inside
> migrate_pages().

I think it may make sense to improve async migration too, if it's
possible and not too complex.  If necessary, we can refactor the code
to improve readability.

> Sorry, what do you mean by "another level of explicit loop"?

I mean something like,

for (i = 0; i < NR_MAX_MIGRATE_SYNC_RETRY; i++) {
        while (!list_empty(from)) {
                migrate_pages_batch(, 1);
        }
}

--
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-09-02  6:46 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-09 10:31 [PATCH 0/2] Improve migration by backing off earlier Dev Jain
2024-08-09 10:31 ` [PATCH 1/2] mm: Retry migration earlier upon refcount mismatch Dev Jain
2024-08-09 13:47   ` David Hildenbrand
2024-08-09 21:09     ` Christoph Lameter (Ampere)
2024-08-10 18:42     ` Dev Jain
2024-08-10 18:52       ` David Hildenbrand
2024-08-11  6:06         ` Dev Jain
2024-08-11  9:08           ` David Hildenbrand
2024-08-12  5:35             ` Dev Jain
2024-08-12  9:30               ` David Hildenbrand
2024-08-10 21:05     ` Zi Yan
2024-08-12  5:34   ` Huang, Ying
2024-08-12  6:01     ` Dev Jain
2024-08-12  6:15       ` Huang, Ying
2024-08-12  6:52         ` Dev Jain
2024-08-12  7:31           ` Huang, Ying
2024-08-12 12:08             ` Dev Jain
2024-08-13  5:00               ` Dev Jain
2024-08-13  7:22                 ` Dev Jain
2024-08-16 11:31                   ` Dev Jain
2024-08-19  6:58                     ` Huang, Ying
2024-08-20  7:16                       ` Dev Jain
2024-09-02  6:42                         ` Huang, Ying
2024-08-12  6:13     ` Dev Jain
2024-08-12  6:20       ` Huang, Ying
2024-08-12  6:32         ` Dev Jain
2024-08-09 10:31 ` [PATCH 2/2] selftests/mm: Do not fail test for a single migration failure Dev Jain
2024-08-09 17:13   ` Shuah Khan
2024-08-09 21:10     ` Christoph Lameter (Ampere)
2024-08-12  6:19     ` Dev Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).