* [PATCH v2 0/2] page_owner: Refcount fixups @ 2024-03-19 18:32 Oscar Salvador 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador 2024-03-19 18:32 ` [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating Oscar Salvador 0 siblings, 2 replies; 16+ messages in thread From: Oscar Salvador @ 2024-03-19 18:32 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador A couple of fixups for the refcounting part. From this series on, instead of counting the stacks, we count the outstanding nr_base_pages each stack has, which gives a much better memory overview. The other fixup is for the migration part. A more detailed explanation can be found in the changelog of respective patches. Oscar Salvador (2): mm,page_owner: Fix refcount imbalance mm,page_owner: Fix accounting of pages when migrating Documentation/mm/page_owner.rst | 73 +++++++++++++++++---------------- mm/page_owner.c | 51 +++++++++++++---------- 2 files changed, 68 insertions(+), 56 deletions(-) -- 2.44.0 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 [PATCH v2 0/2] page_owner: Refcount fixups Oscar Salvador @ 2024-03-19 18:32 ` Oscar Salvador 2024-03-19 23:24 ` Andrew Morton ` (3 more replies) 2024-03-19 18:32 ` [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating Oscar Salvador 1 sibling, 4 replies; 16+ messages in thread From: Oscar Salvador @ 2024-03-19 18:32 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f Current code does not contemplate scenarios where an allocation and a free operation on the same pages do not handle them in the same amount at once. To give an example, page_alloc_exact(), where we will allocate a page of a high enough order to satisfy the size request, but we will free the remaining pages right away. In the above example, we will increment the stack_record refcount only once, but we will decrease it the same number of times as the number of unused pages we have to free. This will lead to a warning because of refcount imbalance. Fix this by recording the number of base pages in the refcount field. Reported-by: syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/00000000000090e8ff0613eda0e5@google.com Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") Signed-off-by: Oscar Salvador <osalvador@suse.de> --- Documentation/mm/page_owner.rst | 73 +++++++++++++++++---------------- mm/page_owner.c | 38 ++++++++--------- 2 files changed, 56 insertions(+), 55 deletions(-) diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst index 0d0334cd5179..3a45a20fc05a 100644 --- a/Documentation/mm/page_owner.rst +++ b/Documentation/mm/page_owner.rst @@ -24,10 +24,10 @@ fragmentation statistics can be obtained through gfp flag information of each page. It is already implemented and activated if page owner is enabled. Other usages are more than welcome. -It can also be used to show all the stacks and their outstanding -allocations, which gives us a quick overview of where the memory is going -without the need to screen through all the pages and match the allocation -and free operation. +It can also be used to show all the stacks and their current number of +allocated base pages, which gives us a quick overview of where the memory +is going without the need to screen through all the pages and match the +allocation and free operation. page owner is disabled by default. So, if you'd like to use it, you need to add "page_owner=on" to your boot cmdline.
If the kernel is built @@ -75,42 +75,45 @@ Usage cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt cat stacks.txt - prep_new_page+0xa9/0x120 - get_page_from_freelist+0x7e6/0x2140 - __alloc_pages+0x18a/0x370 - new_slab+0xc8/0x580 - ___slab_alloc+0x1f2/0xaf0 - __slab_alloc.isra.86+0x22/0x40 - kmem_cache_alloc+0x31b/0x350 - __khugepaged_enter+0x39/0x100 - dup_mmap+0x1c7/0x5ce - copy_process+0x1afe/0x1c90 - kernel_clone+0x9a/0x3c0 - __do_sys_clone+0x66/0x90 - do_syscall_64+0x7f/0x160 - entry_SYSCALL_64_after_hwframe+0x6c/0x74 - stack_count: 234 + post_alloc_hook+0x177/0x1a0 + get_page_from_freelist+0xd01/0xd80 + __alloc_pages+0x39e/0x7e0 + allocate_slab+0xbc/0x3f0 + ___slab_alloc+0x528/0x8a0 + kmem_cache_alloc+0x224/0x3b0 + sk_prot_alloc+0x58/0x1a0 + sk_alloc+0x32/0x4f0 + inet_create+0x427/0xb50 + __sock_create+0x2e4/0x650 + inet_ctl_sock_create+0x30/0x180 + igmp_net_init+0xc1/0x130 + ops_init+0x167/0x410 + setup_net+0x304/0xa60 + copy_net_ns+0x29b/0x4a0 + create_new_namespaces+0x4a1/0x820 + nr_base_pages: 16 ... ... echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt cat stacks_7000.txt - prep_new_page+0xa9/0x120 - get_page_from_freelist+0x7e6/0x2140 - __alloc_pages+0x18a/0x370 - alloc_pages_mpol+0xdf/0x1e0 - folio_alloc+0x14/0x50 - filemap_alloc_folio+0xb0/0x100 - page_cache_ra_unbounded+0x97/0x180 - filemap_fault+0x4b4/0x1200 - __do_fault+0x2d/0x110 - do_pte_missing+0x4b0/0xa30 - __handle_mm_fault+0x7fa/0xb70 - handle_mm_fault+0x125/0x300 - do_user_addr_fault+0x3c9/0x840 - exc_page_fault+0x68/0x150 - asm_exc_page_fault+0x22/0x30 - stack_count: 8248 + post_alloc_hook+0x177/0x1a0 + get_page_from_freelist+0xd01/0xd80 + __alloc_pages+0x39e/0x7e0 + alloc_pages_mpol+0x22e/0x490 + folio_alloc+0xd5/0x110 + filemap_alloc_folio+0x78/0x230 + page_cache_ra_order+0x287/0x6f0 + filemap_get_pages+0x517/0x1160 + filemap_read+0x304/0x9f0 + xfs_file_buffered_read+0xe6/0x1d0 [xfs] + xfs_file_read_iter+0x1f0/0x380 [xfs] + __kernel_read+0x3b9/0x730 + kernel_read_file+0x309/0x4d0 + __do_sys_finit_module+0x381/0x730 + do_syscall_64+0x8d/0x150 + entry_SYSCALL_64_after_hwframe+0x62/0x6a + nr_base_pages: 20824 ... cat /sys/kernel/debug/page_owner > page_owner_full.txt diff --git a/mm/page_owner.c b/mm/page_owner.c index d17d1351ec84..2613805cb665 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -196,9 +196,11 @@ static void add_stack_record_to_list(struct stack_record *stack_record, spin_unlock_irqrestore(&stack_list_lock, flags); } -static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) +static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask, + int nr_base_pages) { struct stack_record *stack_record = __stack_depot_get_stack_record(handle); + int old = REFCOUNT_SATURATED; if (!stack_record) return; @@ -210,22 +212,18 @@ static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) * Since we do not use STACK_DEPOT_FLAG_GET API, let us * set a refcount of 1 ourselves. 
*/ - if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) { - int old = REFCOUNT_SATURATED; - - if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) - /* Add the new stack_record to our list */ - add_stack_record_to_list(stack_record, gfp_mask); - } - refcount_inc(&stack_record->count); + if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) + add_stack_record_to_list(stack_record, gfp_mask); + refcount_add(nr_base_pages, &stack_record->count); } -static void dec_stack_record_count(depot_stack_handle_t handle) +static void dec_stack_record_count(depot_stack_handle_t handle, + int nr_base_pages) { struct stack_record *stack_record = __stack_depot_get_stack_record(handle); if (stack_record) - refcount_dec(&stack_record->count); + refcount_sub_and_test(nr_base_pages, &stack_record->count); } void __reset_page_owner(struct page *page, unsigned short order) @@ -263,7 +261,7 @@ void __reset_page_owner(struct page *page, unsigned short order) * the machinery is not ready yet, we cannot decrement * their refcount either. */ - dec_stack_record_count(alloc_handle); + dec_stack_record_count(alloc_handle, 1 << order); } static inline void __set_page_owner_handle(struct page_ext *page_ext, @@ -305,7 +303,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order, return; __set_page_owner_handle(page_ext, handle, order, gfp_mask); page_ext_put(page_ext); - inc_stack_record_count(handle, gfp_mask); + inc_stack_record_count(handle, gfp_mask, 1 << order); } void __set_page_owner_migrate_reason(struct page *page, int reason) @@ -861,11 +859,11 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos) return stack; } -static unsigned long page_owner_stack_threshold; +static unsigned long page_owner_pages_threshold; static int stack_print(struct seq_file *m, void *v) { - int i, stack_count; + int i, nr_base_pages; struct stack *stack = v; unsigned long *entries; unsigned long nr_entries; @@ -876,14 +874,14 @@ static int stack_print(struct seq_file *m, void *v) nr_entries = stack_record->size; entries = stack_record->entries; - stack_count = refcount_read(&stack_record->count) - 1; + nr_base_pages = refcount_read(&stack_record->count) - 1; - if (stack_count < 1 || stack_count < page_owner_stack_threshold) + if (nr_base_pages < 1 || nr_base_pages < page_owner_pages_threshold) return 0; for (i = 0; i < nr_entries; i++) seq_printf(m, " %pS\n", (void *)entries[i]); - seq_printf(m, "stack_count: %d\n\n", stack_count); + seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages); return 0; } @@ -913,13 +911,13 @@ static const struct file_operations page_owner_stack_operations = { static int page_owner_threshold_get(void *data, u64 *val) { - *val = READ_ONCE(page_owner_stack_threshold); + *val = READ_ONCE(page_owner_pages_threshold); return 0; } static int page_owner_threshold_set(void *data, u64 val) { - WRITE_ONCE(page_owner_stack_threshold, val); + WRITE_ONCE(page_owner_pages_threshold, val); return 0; } -- 2.44.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
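To make the scenario from the changelog above concrete, here is a minimal sketch modeled on the alloc_pages_exact()-style pattern the changelog refers to as page_alloc_exact(); it is illustrative only and not part of the series. The stack is charged once for the whole order-N block, while every unused tail page is freed on its own:

static void *exact_alloc_sketch(size_t size, gfp_t gfp_mask)
{
        unsigned int order = get_order(size);
        unsigned long nr_used = DIV_ROUND_UP(size, PAGE_SIZE);
        struct page *page, *p;

        /* __set_page_owner() runs once here for the whole order-N block */
        page = alloc_pages(gfp_mask & ~__GFP_COMP, order);
        if (!page)
                return NULL;

        /* turn the block into 1 << order order-0 pages */
        split_page(page, order);

        /*
         * Each unused tail page is freed individually, so __reset_page_owner()
         * (and with it dec_stack_record_count()) runs once per page.
         */
        for (p = page + nr_used; p < page + (1UL << order); p++)
                __free_pages(p, 0);

        return page_address(page);
}

With the old scheme this is a single increment against the stack but one decrement per freed base page, which is the imbalance syzbot reported; with the refcount tracking base pages it becomes +(1 << order) at allocation time and -1 for every order-0 page freed, so the accounting balances out.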
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador @ 2024-03-19 23:24 ` Andrew Morton 2024-03-20 4:40 ` Tetsuo Handa 2024-03-20 17:35 ` kernel test robot ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Andrew Morton @ 2024-03-19 23:24 UTC (permalink / raw) To: Oscar Salvador Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, syzbot+41bbfdb8d41003d12c0f On Tue, 19 Mar 2024 19:32:11 +0100 Oscar Salvador <osalvador@suse.de> wrote: > Current code does not contemplate scenarios were an allocation and > free operation on the same pages do not handle it in the same amount > at once. > To give an example, page_alloc_exact(), where we will allocate a page > of enough order to stafisfy the size request, but we will free the > remainings right away. > > In the above example, we will increment the stack_record refcount > only once, but we will decrease it the same number of times as number > of unused pages we have to free. > This will lead to a warning because of refcount imbalance. > > Fix this by recording the number of base pages in the refcount field. > > ... > > -static void dec_stack_record_count(depot_stack_handle_t handle) > +static void dec_stack_record_count(depot_stack_handle_t handle, > + int nr_base_pages) > { > struct stack_record *stack_record = __stack_depot_get_stack_record(handle); > > if (stack_record) > - refcount_dec(&stack_record->count); > + refcount_sub_and_test(nr_base_pages, &stack_record->count); > } mm/page_owner.c: In function 'dec_stack_record_count': mm/page_owner.c:226:17: error: ignoring return value of 'refcount_sub_and_test' declared with attribute 'warn_unused_result' [-Werror=unused-result] 226 | refcount_sub_and_test(nr_base_pages, &stack_record->count); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 23:24 ` Andrew Morton @ 2024-03-20 4:40 ` Tetsuo Handa 2024-03-20 5:49 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Tetsuo Handa @ 2024-03-20 4:40 UTC (permalink / raw) To: Andrew Morton, Oscar Salvador Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, syzbot+41bbfdb8d41003d12c0f On 2024/03/20 8:24, Andrew Morton wrote: > On Tue, 19 Mar 2024 19:32:11 +0100 Oscar Salvador <osalvador@suse.de> wrote: >> -static void dec_stack_record_count(depot_stack_handle_t handle) >> +static void dec_stack_record_count(depot_stack_handle_t handle, >> + int nr_base_pages) >> { >> struct stack_record *stack_record = __stack_depot_get_stack_record(handle); >> >> if (stack_record) >> - refcount_dec(&stack_record->count); >> + refcount_sub_and_test(nr_base_pages, &stack_record->count); >> } > > mm/page_owner.c: In function 'dec_stack_record_count': > mm/page_owner.c:226:17: error: ignoring return value of 'refcount_sub_and_test' declared with attribute 'warn_unused_result' [-Werror=unused-result] > 226 | refcount_sub_and_test(nr_base_pages, &stack_record->count); > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > cc1: all warnings being treated as errors > Hmm, I guess that this is not an expected user of refcount API. If it is correct behavior that refcount becomes 0 here, you need to explain like - refcount_sub_and_test(nr_base_pages, &stack_record->count); + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) { + // Explain why nothing to do here, and explain where/how + // refcount again becomes positive value using refcount_set(). + } or replace refcount_t with atomic_t where it is legal to make refcount positive without using atomic_set(). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-20 4:40 ` Tetsuo Handa @ 2024-03-20 5:49 ` Oscar Salvador 2024-03-20 9:42 ` Tetsuo Handa 0 siblings, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-20 5:49 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, syzbot+41bbfdb8d41003d12c0f On Wed, Mar 20, 2024 at 01:40:05PM +0900, Tetsuo Handa wrote: > Hmm, I guess that this is not an expected user of refcount API. > If it is correct behavior that refcount becomes 0 here, you need to explain like > > - refcount_sub_and_test(nr_base_pages, &stack_record->count); > + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) { > + // Explain why nothing to do here, and explain where/how > + // refcount again becomes positive value using refcount_set(). > + } > > or replace refcount_t with atomic_t where it is legal to make refcount positive > without using atomic_set(). No, it is not expected for the refcount to become 0. I do know why, but I lost a chunk in the middle of a rebase. This should have the following on top: diff --git a/mm/page_owner.c b/mm/page_owner.c index 2613805cb665..e477a71d6adc 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -222,8 +222,11 @@ static void dec_stack_record_count(depot_stack_handle_t handle, { struct stack_record *stack_record = __stack_depot_get_stack_record(handle); - if (stack_record) - refcount_sub_and_test(nr_base_pages, &stack_record->count); + if (!stack_record) + return; + + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) + WARN(1, "%s refcount went to 0 for %u handle\n", __func__, handle); } -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
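For reference, this is roughly how dec_stack_record_count() reads once the missing chunk above is folded into patch 1/2; consuming the refcount_sub_and_test() return value also silences the warn_unused_result build error reported earlier in the thread:

static void dec_stack_record_count(depot_stack_handle_t handle,
                                   int nr_base_pages)
{
        struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

        if (!stack_record)
                return;

        /*
         * The count is initialized to 1 and then carries the number of
         * outstanding base pages on top of that, so reaching 0 here means
         * the accounting went wrong somewhere.
         */
        if (refcount_sub_and_test(nr_base_pages, &stack_record->count))
                WARN(1, "%s refcount went to 0 for %u handle\n", __func__, handle);
}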
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-20 5:49 ` Oscar Salvador @ 2024-03-20 9:42 ` Tetsuo Handa 0 siblings, 0 replies; 16+ messages in thread From: Tetsuo Handa @ 2024-03-20 9:42 UTC (permalink / raw) To: Oscar Salvador Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, syzbot+41bbfdb8d41003d12c0f On 2024/03/20 14:49, Oscar Salvador wrote: > This should have the follwing on top: > > diff --git a/mm/page_owner.c b/mm/page_owner.c > index 2613805cb665..e477a71d6adc 100644 > --- a/mm/page_owner.c > +++ b/mm/page_owner.c > @@ -222,8 +222,11 @@ static void dec_stack_record_count(depot_stack_handle_t handle, > { > struct stack_record *stack_record = __stack_depot_get_stack_record(handle); > > - if (stack_record) > - refcount_sub_and_test(nr_base_pages, &stack_record->count); > + if (!stack_record) > + return; > + > + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) > + WARN(1, "%s refcount went to 0 for %u handle\n", __func__, handle); > } > > That fixed this problem. Thank you. -------- Forwarded Message -------- Date: Wed, 20 Mar 2024 01:31:03 -0700 In-Reply-To: <4362246e-d804-43de-800b-a7840b70919a@I-love.SAKURA.ne.jp> Message-ID: <000000000000410e3b061413692b@google.com> Subject: Re: [syzbot] [mm?] WARNING: refcount bug in __reset_page_owner From: syzbot <syzbot+98c1a1753a0731df2dd4@syzkaller.appspotmail.com> To: linux-kernel@vger.kernel.org, penguin-kernel@i-love.sakura.ne.jp, syzkaller-bugs@googlegroups.com Hello, syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+98c1a1753a0731df2dd4@syzkaller.appspotmail.com Tested on: commit: a4145ce1 Merge tag 'bcachefs-2024-03-19' of https://ev.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=131b1d66180000 kernel config: https://syzkaller.appspot.com/x/.config?x=9f47e8dfa53b0b11 dashboard link: https://syzkaller.appspot.com/bug?extid=98c1a1753a0731df2dd4 compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 patch: https://syzkaller.appspot.com/x/patch.diff?x=13972985180000 Note: testing is done by a robot and is best-effort only. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador 2024-03-19 23:24 ` Andrew Morton @ 2024-03-20 17:35 ` kernel test robot 2024-03-20 23:37 ` kernel test robot 2024-03-21 10:36 ` Vlastimil Babka 3 siblings, 0 replies; 16+ messages in thread From: kernel test robot @ 2024-03-20 17:35 UTC (permalink / raw) To: Oscar Salvador, Andrew Morton Cc: llvm, oe-kbuild-all, Linux Memory Management List, linux-kernel, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f Hi Oscar, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master next-20240320] [cannot apply to v6.8] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/mm-page_owner-Fix-refcount-imbalance/20240320-023302 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20240319183212.17156-2-osalvador%40suse.de patch subject: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance config: x86_64-buildonly-randconfig-004-20240320 (https://download.01.org/0day-ci/archive/20240321/202403210131.YLR6USxm-lkp@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240321/202403210131.YLR6USxm-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202403210131.YLR6USxm-lkp@intel.com/ All warnings (new ones prefixed by >>): >> mm/page_owner.c:213:3: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result] 213 | refcount_sub_and_test(nr_base_pages, &stack_record->count); | ^~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. vim +/warn_unused_result +213 mm/page_owner.c 206 207 static void dec_stack_record_count(depot_stack_handle_t handle, 208 int nr_base_pages) 209 { 210 struct stack_record *stack_record = __stack_depot_get_stack_record(handle); 211 212 if (stack_record) > 213 refcount_sub_and_test(nr_base_pages, &stack_record->count); 214 } 215 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador 2024-03-19 23:24 ` Andrew Morton 2024-03-20 17:35 ` kernel test robot @ 2024-03-20 23:37 ` kernel test robot 2024-03-21 10:36 ` Vlastimil Babka 3 siblings, 0 replies; 16+ messages in thread From: kernel test robot @ 2024-03-20 23:37 UTC (permalink / raw) To: Oscar Salvador, Andrew Morton Cc: oe-kbuild-all, Linux Memory Management List, linux-kernel, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f Hi Oscar, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master next-20240320] [cannot apply to v6.8] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/mm-page_owner-Fix-refcount-imbalance/20240320-023302 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20240319183212.17156-2-osalvador%40suse.de patch subject: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance config: x86_64-randconfig-r081-20240320 (https://download.01.org/0day-ci/archive/20240321/202403210718.9IiE8NIU-lkp@intel.com/config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240321/202403210718.9IiE8NIU-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202403210718.9IiE8NIU-lkp@intel.com/ All warnings (new ones prefixed by >>): mm/page_owner.c: In function 'dec_stack_record_count': >> mm/page_owner.c:213:3: warning: ignoring return value of 'refcount_sub_and_test', declared with attribute warn_unused_result [-Wunused-result] 213 | refcount_sub_and_test(nr_base_pages, &stack_record->count); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ vim +/refcount_sub_and_test +213 mm/page_owner.c 206 207 static void dec_stack_record_count(depot_stack_handle_t handle, 208 int nr_base_pages) 209 { 210 struct stack_record *stack_record = __stack_depot_get_stack_record(handle); 211 212 if (stack_record) > 213 refcount_sub_and_test(nr_base_pages, &stack_record->count); 214 } 215 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador ` (2 preceding siblings ...) 2024-03-20 23:37 ` kernel test robot @ 2024-03-21 10:36 ` Vlastimil Babka 3 siblings, 0 replies; 16+ messages in thread From: Vlastimil Babka @ 2024-03-21 10:36 UTC (permalink / raw) To: Oscar Salvador, Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, syzbot+41bbfdb8d41003d12c0f On 3/19/24 19:32, Oscar Salvador wrote: > Current code does not contemplate scenarios were an allocation and > free operation on the same pages do not handle it in the same amount > at once. > To give an example, page_alloc_exact(), where we will allocate a page > of enough order to stafisfy the size request, but we will free the > remainings right away. > > In the above example, we will increment the stack_record refcount > only once, but we will decrease it the same number of times as number > of unused pages we have to free. > This will lead to a warning because of refcount imbalance. > > Fix this by recording the number of base pages in the refcount field. > > Reported-by: syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/linux-mm/00000000000090e8ff0613eda0e5@google.com > Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") > Signed-off-by: Oscar Salvador <osalvador@suse.de> With the fixup, Reviewed-by: Vlastimil Babka <vbabka@suse.cz> But I think you'll need to resend with the missing hunk already applied, it had broken whitespace in your email and IIUC this is was dropped from mm tree. Also I'd suggest a change: > +++ b/mm/page_owner.c > @@ -196,9 +196,11 @@ static void add_stack_record_to_list(struct stack_record *stack_record, > spin_unlock_irqrestore(&stack_list_lock, flags); > } > > -static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) > +static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask, > + int nr_base_pages) > { > struct stack_record *stack_record = __stack_depot_get_stack_record(handle); > + int old = REFCOUNT_SATURATED; > > if (!stack_record) > return; > @@ -210,22 +212,18 @@ static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) > * Since we do not use STACK_DEPOT_FLAG_GET API, let us > * set a refcount of 1 ourselves. > */ > - if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) { > - int old = REFCOUNT_SATURATED; I think this was useful optimization in that most cases the count is not REFCOUNT_SATURATED so we don't have to go for the expensive cmpxchg all the time. Or do I miss a reason why this was changed? > - > - if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) > - /* Add the new stack_record to our list */ > - add_stack_record_to_list(stack_record, gfp_mask); > - } > - refcount_inc(&stack_record->count); > + if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) > + add_stack_record_to_list(stack_record, gfp_mask); > + refcount_add(nr_base_pages, &stack_record->count); > } > ^ permalink raw reply [flat|nested] 16+ messages in thread
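Here is a sketch of how inc_stack_record_count() could look with the cheap refcount_read() fast path kept in front of the cmpxchg, as Vlastimil suggests above, on top of the nr_base_pages change; whether the final version takes exactly this shape is left open in this thread:

static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask,
                                   int nr_base_pages)
{
        struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

        if (!stack_record)
                return;

        /*
         * New stack_records start out with a refcount of REFCOUNT_SATURATED
         * because we do not use the STACK_DEPOT_FLAG_GET API, so set a
         * refcount of 1 ourselves.  Only pay for the cmpxchg when that is
         * still the case, and only the winner of the cmpxchg adds the new
         * stack_record to our list.
         */
        if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
                int old = REFCOUNT_SATURATED;

                if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1))
                        add_stack_record_to_list(stack_record, gfp_mask);
        }
        refcount_add(nr_base_pages, &stack_record->count);
}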
* [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-19 18:32 [PATCH v2 0/2] page_owner: Refcount fixups Oscar Salvador 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador @ 2024-03-19 18:32 ` Oscar Salvador 2024-03-19 18:48 ` Matthew Wilcox 1 sibling, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-19 18:32 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador Upon migration, newly allocated pages are being given the handle of the old pages. This is problematic because it means that for the stack which allocated the old page, we will be subtracting the old page + the new one when that page is freed, creating an accounting imbalance. Fix this by adding a new migrate_handle in the page_owner struct, and record the handle that allocated the new page in __folio_copy_owner(). Upon freeing, we check whether we have a migrate_handle, and if we do, we use migrate_handle for dec_stack_record_count(), which will subtract those pages from the right handle. Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") Signed-off-by: Oscar Salvador <osalvador@suse.de> --- mm/page_owner.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/mm/page_owner.c b/mm/page_owner.c index 2613805cb665..1a7d0d1dc640 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -27,6 +27,7 @@ struct page_owner { gfp_t gfp_mask; depot_stack_handle_t handle; depot_stack_handle_t free_handle; + depot_stack_handle_t migrate_handle; u64 ts_nsec; u64 free_ts_nsec; char comm[TASK_COMM_LEN]; @@ -240,7 +241,15 @@ void __reset_page_owner(struct page *page, unsigned short order) return; page_owner = get_page_owner(page_ext); - alloc_handle = page_owner->handle; + /* + * If this page was allocated for migration purposes, its handle doesn't + * reference the stack it was allocated from, so make sure to use the + * migrate_handle in order to subtract it from the right stack. + */ + if (!page_owner->migrate_handle) + alloc_handle = page_owner->handle; + else + alloc_handle = page_owner->migrate_handle; handle = save_stack(GFP_NOWAIT | __GFP_NOWARN); for (i = 0; i < (1 << order); i++) { @@ -277,6 +286,7 @@ static inline void __set_page_owner_handle(struct page_ext *page_ext, page_owner->handle = handle; page_owner->order = order; page_owner->gfp_mask = gfp_mask; + page_owner->migrate_handle = 0; page_owner->last_migrate_reason = -1; page_owner->pid = current->pid; page_owner->tgid = current->tgid; @@ -358,6 +368,7 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old) new_page_owner->gfp_mask = old_page_owner->gfp_mask; new_page_owner->last_migrate_reason = old_page_owner->last_migrate_reason; + new_page_owner->migrate_handle = new_page_owner->handle; new_page_owner->handle = old_page_owner->handle; new_page_owner->pid = old_page_owner->pid; new_page_owner->tgid = old_page_owner->tgid; -- 2.44.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-19 18:32 ` [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating Oscar Salvador @ 2024-03-19 18:48 ` Matthew Wilcox 2024-03-20 5:00 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Matthew Wilcox @ 2024-03-19 18:48 UTC (permalink / raw) To: Oscar Salvador Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Tue, Mar 19, 2024 at 07:32:12PM +0100, Oscar Salvador wrote: > Upon migration, new allocated pages are being given the handle of the old > pages. This is problematic because it means that for the stack which > allocated the old page, we will be substracting the old page + the new one > when that page is freed, creating an accounting imbalance. > > Fix this by adding a new migrate_handle in the page_owner struct, and > record the handle that allocated the new page in __folio_copy_owner(). > Upon freeing, we check whether we have a migrate_handle, and if we do, > we use migrate_handle for dec_stack_record_count(), which will > subtract those pages from its right handle. Is this the right way to fix this problem? I would have thought we'd be better off accounting this as migration freeing the old page and allocating the new page. If I understand correctly, this is the code which says "This page was last allocated by X and freed by Y", and I would think that being last freed (or allocated) by the migration code would be a very nice hint about where a problem might stem from. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-19 18:48 ` Matthew Wilcox @ 2024-03-20 5:00 ` Oscar Salvador 2024-03-21 10:50 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-20 5:00 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote: > Is this the right way to fix this problem? I would have thought we'd > be better off accounting this as migration freeing the old page and > allocating the new page. If I understand correctly, this is the code > which says "This page was last allocated by X and freed by Y", and I > would think that being last freed (or allocated) by the migration code > would be a very nice hint about where a problem might stem from. I hear you, and I had the same thought when I stumbled upon this. I did not know that the handle was being changed, otherwise it would have saved me quite a lot of debugging time. Checking the history of this, I can see this decision was made in 2016 in: commit d435edca928805074dae005ab9a42d9fa60fc702 Author: Vlastimil Babka <vbabka@suse.cz> Date: Tue Mar 15 14:56:15 2016 -0700 mm, page_owner: copy page owner info during migration And let me quote: "The page_owner mechanism stores gfp_flags of an allocation and stack trace that lead to it. During page migration, the original information is practically replaced by the allocation of free page as the migration target. Arguably this is less useful and might lead to all the page_owner info for migratable pages gradually converge towards compaction or numa balancing migrations. It has also lead to inaccuracies such as one fixed by commit e2cfc91120fa ("mm/page_owner: set correct gfp_mask on page_owner")." A following patch stored the migration reason in last_migrate_reason, and that patch also added a bit of information if last_migrate_reason was other than 0: + if (page_ext->last_migrate_reason != -1) { + ret += snprintf(kbuf + ret, count - ret, + "Page has been migrated, last migrate reason: %s\n", + migrate_reason_names[page_ext->last_migrate_reason]); + if (ret >= count) + goto err; + } Now, thinking about this some more, it kind of makes sense, because one of the things page_owner is used for, in my experience, is for memory leaks. We print the output, try to correlate allocation/free operations per stack so one can easily spot a stack that just keeps allocating memory and never frees it (it might be legit, and not a leak, but it gives a hint). Now imagine there are 10k pages pointing to stack A. If those pages were to be migrated, e.g. kcompactd jumps in, we will lose the original stack and replace it with something like: migrate_pages() ... ... kcompactd After that, stack A does not have those 10k pages pointing to it anymore, although it still "owns" them; they just got replaced by new ones due to migration. This kind of defeats the purpose of page_owner. And after all, we do record some migration information in those new pages, which will give us a hint when looking at the output. So, all in all, I am leaning towards "this is fine". -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-20 5:00 ` Oscar Salvador @ 2024-03-21 10:50 ` Vlastimil Babka 2024-03-21 11:07 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2024-03-21 10:50 UTC (permalink / raw) To: Oscar Salvador, Matthew Wilcox Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On 3/20/24 06:00, Oscar Salvador wrote: > On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote: >> Is this the right way to fix this problem? I would have thought we'd >> be better off accounting this as migration freeing the old page and >> allocating the new page. If I understand correctly, this is the code >> which says "This page was last allocated by X and freed by Y", and I >> would think that being last freed (or allocated) by the migration code >> would be a very nice hint about where a problem might stem from. > > I hear you, and I had the same thought when I stumbled upon this. > I did not know that the handle was being changed, otherwise it would > have saved me quite a lot of debugging time. > > Checking the history of this, I can see this decision was made in > 2016 in: > > commit d435edca928805074dae005ab9a42d9fa60fc702 > Author: Vlastimil Babka <vbabka@suse.cz> > Date: Tue Mar 15 14:56:15 2016 -0700 > > mm, page_owner: copy page owner info during migration > Yeah I think we could keep that logic. But we could also simply subtract the refcount of the old handle (the "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need the extra migrate_handle. Also we might have more issues here. Most page owner code takes care to set everything for all pages within a folio, but __folio_copy_owner() and __set_page_owner_migrate_reason() don't. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-21 10:50 ` Vlastimil Babka @ 2024-03-21 11:07 ` Oscar Salvador 2024-03-21 11:20 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-21 11:07 UTC (permalink / raw) To: Vlastimil Babka Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Thu, Mar 21, 2024 at 11:50:36AM +0100, Vlastimil Babka wrote: > Yeah I think we could keep that logic. I am all for keeping it. > But we could also simply subtract the refcount of the old handle (the > "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need > the extra migrate_handle. Since new_page will have the old handle pointing to the old stack after the call, we could uncharge the old_page to the migrate_stack, which new_page->handle holds before it gets changed. So we basically swap it. It could work, but I kinda have a bittersweet feeling here. I am trying to work towards reducing the number of lookups in the hlist, but for the approach described above I would need to look up the stack for new_page->handle in order to subtract the page. OTOH, I understand that adding migrate_handle kinda wastes memory: 16MB for 16GB of memory. > Also we might have more issues here. Most page owner code takes care to set > everything for all pages within a folio, but __folio_copy_owner() and > __set_page_owner_migrate_reason() don't. I did not check deeply, but don't we split the folio upon migration in case it is large? Which means we should reach split_page_owner() before the copy takes place. Do I get it right? -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
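For illustration, here is one hypothetical shape of the "swap" described above: instead of copying the handle and remembering a separate migrate_handle, __folio_copy_owner() would exchange the two handles, so each stack is uncharged by whichever folio ends up being freed against it. This is a sketch based on the discussion, not code from the series, and the thread does not settle on it:

static void swap_owner_handles(struct page_owner *old_page_owner,
                               struct page_owner *new_page_owner)
{
        /* handle of the stack that allocated the migration target */
        depot_stack_handle_t migrate_target_handle = new_page_owner->handle;

        /*
         * The data lives on in the new folio, so keep the original
         * allocation stack charged until the new folio is freed ...
         */
        new_page_owner->handle = old_page_owner->handle;

        /*
         * ... while the old folio, which is freed right after migration,
         * uncharges the "allocated for migration" stack through the normal
         * free path, with no extra migrate_handle field and no stack
         * lookup at copy time.
         */
        old_page_owner->handle = migrate_target_handle;
}

A plain exchange like this would avoid the explicit subtraction (and hence the extra stack lookup) mentioned above, since both stacks end up balanced by the regular alloc/free accounting.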
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-21 11:07 ` Oscar Salvador @ 2024-03-21 11:20 ` Vlastimil Babka 2024-03-21 11:54 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2024-03-21 11:20 UTC (permalink / raw) To: Oscar Salvador Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On 3/21/24 12:07, Oscar Salvador wrote: > On Thu, Mar 21, 2024 at 11:50:36AM +0100, Vlastimil Babka wrote: >> Yeah I think we could keep that logic. > > I am all for keeping it. > >> But we could also simply subtract the refcount of the old handle (the >> "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need >> the extra migrate_handle. > > Since new_page will have the old handle pointing to the old stack after > the call, we > could uncharge the old_page to the migrate_stack, which new_page->_handle holds > before it gets changed. > > So we basically swap it. > > It could work, but I kinda have a bittersweet feeling here. > I am trying to work towards to reduce the number of lookups in the > hlist, but for the approach described above I would need to lookup > the stack for new_page->handle in order to substract the page. Understood, but migration is a kinda heavy, non-fast-path operation already, so the extra lookup wouldn't be in a critical fast path. > OTHO, I understand that adding migrate_handle kinda wasted memory. > 16MB for 16GB of memory. > >> Also we might have more issues here. Most page owner code takes care to set >> everything for all pages within a folio, but __folio_copy_owner() and >> __set_page_owner_migrate_reason() don't. > > I did not check deeply but do not we split the folio upon migration > in case it is large? > Which means we should reach split_page_owner() before the copy takes > place. Do I get it right? What I mean is we have __set_page_owner_handle() setting up everything for tail pages and then we have __folio_copy_owner updating only the head page, so this will create kinda mixed-up information. Which might not be an issue, as __folio_copy_owner() should mean it's a proper folio, thus a compound page, thus nobody will ever check those mismatched tail pages... Maybe we could adjust __set_page_owner_handle() to skip tail pages for compound pages as well and unify this, and tail pages would only be set up for those weird high-order non-compound pages so that the odd case in __free_pages() works? Or maybe page_owner is considered debugging functionality enough so it might be worth having the redundant data in tail pages in case something goes wrong. But either way now it seems we're not doing it consistently. ^ permalink raw reply [flat|nested] 16+ messages in thread
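On the tail-page consistency point, this is a hypothetical helper (not proposed in the thread) showing what walking every base page in __folio_copy_owner(), the same way __set_page_owner_handle() does, could look like; the field and helper names follow the struct page_owner and accessors shown in the patches above:

static void folio_copy_owner_tail_pages(struct page_ext *new_ext,
                                        const struct page_owner *src,
                                        unsigned short order)
{
        int i;

        for (i = 0; i < (1 << order); i++) {
                struct page_owner *page_owner = get_page_owner(new_ext);

                /* mirror what is copied for the head page today */
                page_owner->handle = src->handle;
                page_owner->gfp_mask = src->gfp_mask;
                page_owner->last_migrate_reason = src->last_migrate_reason;

                new_ext = page_ext_next(new_ext);
        }
}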
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-21 11:20 ` Vlastimil Babka @ 2024-03-21 11:54 ` Oscar Salvador 0 siblings, 0 replies; 16+ messages in thread From: Oscar Salvador @ 2024-03-21 11:54 UTC (permalink / raw) To: Vlastimil Babka Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Thu, Mar 21, 2024 at 12:20:24PM +0100, Vlastimil Babka wrote: > Understood, but migration is kinda heavy and non-fast-path operation already > so the extra lookup wouldn't be in a critical fast path. Ok, you convinced me, let us save that memory. > When I mean is we have __set_page_owner_handle() setting up everything for > tail pages and then we have __folio_copy_owner updating only the head page, > so this will create kinda a mixed up information. Which might not be an > issue as __folio_copy_owner() should mean it's a proper folio thus compound > page thus nobody ever will check those mismatched tail pages... Maybe we > could adjust __set_page_owner_handle() to skip tail pages for compound > pages as well and unify this, and tail pages would be only setup for those > weird high-order non-compound pages so that the odd case in __free_pages() > works? > > Or maybe page_owner is considered debugging functionality enough so it might > be worth having the redundant data in tail pages in case something goes > wrong. But either way now it seems we're not doing it consistently. So we basically have two options here: 1) also update tail pages in __folio_copy_owner, or 2) follow __folio_copy_owner's example and skip tail pages in __set_page_owner. The thing is, going with 2) might mean, as you pointed out, that if something goes wrong we lose the information for those pages, as page_owner does not have a record for them. OTOH, would that information really be useful? Sure, we could see the stack, but AFAICS not even the migrate_reason would be recorded in those tail pages, which means that if they are migrated and freed, we would only see that they were freed. (The free stack would point to the migrate code though, so that is a hint.) Not really sure which path to take here; skipping tail pages would be less complex, but at the risk of losing information. Anyway, let me first tackle the issue at hand here, and then I will think more about that. Thanks! -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread