* [PATCH v2 0/2] page_owner: Refcount fixups @ 2024-03-19 18:32 Oscar Salvador 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador 2024-03-19 18:32 ` [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating Oscar Salvador 0 siblings, 2 replies; 16+ messages in thread From: Oscar Salvador @ 2024-03-19 18:32 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador A couple of fixups for the refcounting part. From this series on, instead of counting the stacks, we count the outstanding nr_base_pages each stack has, which gives a much better memory overview. The other fixup is for the migration part. A more detailed explanation can be found in the changelog of respective patches. Oscar Salvador (2): mm,page_owner: Fix refcount imbalance mm,page_owner: Fix accounting of pages when migrating Documentation/mm/page_owner.rst | 73 +++++++++++++++++---------------- mm/page_owner.c | 51 +++++++++++++---------- 2 files changed, 68 insertions(+), 56 deletions(-) -- 2.44.0 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 [PATCH v2 0/2] page_owner: Refcount fixups Oscar Salvador @ 2024-03-19 18:32 ` Oscar Salvador 2024-03-19 23:24 ` Andrew Morton ` (3 more replies) 2024-03-19 18:32 ` [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating Oscar Salvador 1 sibling, 4 replies; 16+ messages in thread From: Oscar Salvador @ 2024-03-19 18:32 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f Current code does not contemplate scenarios where an allocation and a free operation on the same pages do not handle them in the same amount at once. To give an example, page_alloc_exact(), where we will allocate a page of a high enough order to satisfy the size request, but we will free the remaining pages right away. In the above example, we will increment the stack_record refcount only once, but we will decrease it the same number of times as the number of unused pages we have to free. This will lead to a warning because of refcount imbalance. Fix this by recording the number of base pages in the refcount field. Reported-by: syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-mm/00000000000090e8ff0613eda0e5@google.com Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") Signed-off-by: Oscar Salvador <osalvador@suse.de> --- Documentation/mm/page_owner.rst | 73 +++++++++++++++++---------------- mm/page_owner.c | 38 ++++++++--------- 2 files changed, 56 insertions(+), 55 deletions(-) diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst index 0d0334cd5179..3a45a20fc05a 100644 --- a/Documentation/mm/page_owner.rst +++ b/Documentation/mm/page_owner.rst @@ -24,10 +24,10 @@ fragmentation statistics can be obtained through gfp flag information of each page. It is already implemented and activated if page owner is enabled. Other usages are more than welcome. -It can also be used to show all the stacks and their outstanding -allocations, which gives us a quick overview of where the memory is going -without the need to screen through all the pages and match the allocation -and free operation. +It can also be used to show all the stacks and their current number of +allocated base pages, which gives us a quick overview of where the memory +is going without the need to screen through all the pages and match the +allocation and free operation. page owner is disabled by default. So, if you'd like to use it, you need to add "page_owner=on" to your boot cmdline.
If the kernel is built @@ -75,42 +75,45 @@ Usage cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt cat stacks.txt - prep_new_page+0xa9/0x120 - get_page_from_freelist+0x7e6/0x2140 - __alloc_pages+0x18a/0x370 - new_slab+0xc8/0x580 - ___slab_alloc+0x1f2/0xaf0 - __slab_alloc.isra.86+0x22/0x40 - kmem_cache_alloc+0x31b/0x350 - __khugepaged_enter+0x39/0x100 - dup_mmap+0x1c7/0x5ce - copy_process+0x1afe/0x1c90 - kernel_clone+0x9a/0x3c0 - __do_sys_clone+0x66/0x90 - do_syscall_64+0x7f/0x160 - entry_SYSCALL_64_after_hwframe+0x6c/0x74 - stack_count: 234 + post_alloc_hook+0x177/0x1a0 + get_page_from_freelist+0xd01/0xd80 + __alloc_pages+0x39e/0x7e0 + allocate_slab+0xbc/0x3f0 + ___slab_alloc+0x528/0x8a0 + kmem_cache_alloc+0x224/0x3b0 + sk_prot_alloc+0x58/0x1a0 + sk_alloc+0x32/0x4f0 + inet_create+0x427/0xb50 + __sock_create+0x2e4/0x650 + inet_ctl_sock_create+0x30/0x180 + igmp_net_init+0xc1/0x130 + ops_init+0x167/0x410 + setup_net+0x304/0xa60 + copy_net_ns+0x29b/0x4a0 + create_new_namespaces+0x4a1/0x820 + nr_base_pages: 16 ... ... echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt cat stacks_7000.txt - prep_new_page+0xa9/0x120 - get_page_from_freelist+0x7e6/0x2140 - __alloc_pages+0x18a/0x370 - alloc_pages_mpol+0xdf/0x1e0 - folio_alloc+0x14/0x50 - filemap_alloc_folio+0xb0/0x100 - page_cache_ra_unbounded+0x97/0x180 - filemap_fault+0x4b4/0x1200 - __do_fault+0x2d/0x110 - do_pte_missing+0x4b0/0xa30 - __handle_mm_fault+0x7fa/0xb70 - handle_mm_fault+0x125/0x300 - do_user_addr_fault+0x3c9/0x840 - exc_page_fault+0x68/0x150 - asm_exc_page_fault+0x22/0x30 - stack_count: 8248 + post_alloc_hook+0x177/0x1a0 + get_page_from_freelist+0xd01/0xd80 + __alloc_pages+0x39e/0x7e0 + alloc_pages_mpol+0x22e/0x490 + folio_alloc+0xd5/0x110 + filemap_alloc_folio+0x78/0x230 + page_cache_ra_order+0x287/0x6f0 + filemap_get_pages+0x517/0x1160 + filemap_read+0x304/0x9f0 + xfs_file_buffered_read+0xe6/0x1d0 [xfs] + xfs_file_read_iter+0x1f0/0x380 [xfs] + __kernel_read+0x3b9/0x730 + kernel_read_file+0x309/0x4d0 + __do_sys_finit_module+0x381/0x730 + do_syscall_64+0x8d/0x150 + entry_SYSCALL_64_after_hwframe+0x62/0x6a + nr_base_pages: 20824 ... cat /sys/kernel/debug/page_owner > page_owner_full.txt diff --git a/mm/page_owner.c b/mm/page_owner.c index d17d1351ec84..2613805cb665 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -196,9 +196,11 @@ static void add_stack_record_to_list(struct stack_record *stack_record, spin_unlock_irqrestore(&stack_list_lock, flags); } -static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) +static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask, + int nr_base_pages) { struct stack_record *stack_record = __stack_depot_get_stack_record(handle); + int old = REFCOUNT_SATURATED; if (!stack_record) return; @@ -210,22 +212,18 @@ static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) * Since we do not use STACK_DEPOT_FLAG_GET API, let us * set a refcount of 1 ourselves. 
*/ - if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) { - int old = REFCOUNT_SATURATED; - - if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) - /* Add the new stack_record to our list */ - add_stack_record_to_list(stack_record, gfp_mask); - } - refcount_inc(&stack_record->count); + if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) + add_stack_record_to_list(stack_record, gfp_mask); + refcount_add(nr_base_pages, &stack_record->count); } -static void dec_stack_record_count(depot_stack_handle_t handle) +static void dec_stack_record_count(depot_stack_handle_t handle, + int nr_base_pages) { struct stack_record *stack_record = __stack_depot_get_stack_record(handle); if (stack_record) - refcount_dec(&stack_record->count); + refcount_sub_and_test(nr_base_pages, &stack_record->count); } void __reset_page_owner(struct page *page, unsigned short order) @@ -263,7 +261,7 @@ void __reset_page_owner(struct page *page, unsigned short order) * the machinery is not ready yet, we cannot decrement * their refcount either. */ - dec_stack_record_count(alloc_handle); + dec_stack_record_count(alloc_handle, 1 << order); } static inline void __set_page_owner_handle(struct page_ext *page_ext, @@ -305,7 +303,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order, return; __set_page_owner_handle(page_ext, handle, order, gfp_mask); page_ext_put(page_ext); - inc_stack_record_count(handle, gfp_mask); + inc_stack_record_count(handle, gfp_mask, 1 << order); } void __set_page_owner_migrate_reason(struct page *page, int reason) @@ -861,11 +859,11 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos) return stack; } -static unsigned long page_owner_stack_threshold; +static unsigned long page_owner_pages_threshold; static int stack_print(struct seq_file *m, void *v) { - int i, stack_count; + int i, nr_base_pages; struct stack *stack = v; unsigned long *entries; unsigned long nr_entries; @@ -876,14 +874,14 @@ static int stack_print(struct seq_file *m, void *v) nr_entries = stack_record->size; entries = stack_record->entries; - stack_count = refcount_read(&stack_record->count) - 1; + nr_base_pages = refcount_read(&stack_record->count) - 1; - if (stack_count < 1 || stack_count < page_owner_stack_threshold) + if (nr_base_pages < 1 || nr_base_pages < page_owner_pages_threshold) return 0; for (i = 0; i < nr_entries; i++) seq_printf(m, " %pS\n", (void *)entries[i]); - seq_printf(m, "stack_count: %d\n\n", stack_count); + seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages); return 0; } @@ -913,13 +911,13 @@ static const struct file_operations page_owner_stack_operations = { static int page_owner_threshold_get(void *data, u64 *val) { - *val = READ_ONCE(page_owner_stack_threshold); + *val = READ_ONCE(page_owner_pages_threshold); return 0; } static int page_owner_threshold_set(void *data, u64 val) { - WRITE_ONCE(page_owner_stack_threshold, val); + WRITE_ONCE(page_owner_pages_threshold, val); return 0; } -- 2.44.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
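To make the scenario from the changelog above concrete, here is a minimal sketch modeled on the alloc_pages_exact()-style pattern the changelog refers to as page_alloc_exact(); it is illustrative only and not part of the series. The stack is charged once for the whole order-N block, while every unused tail page is freed on its own:

static void *exact_alloc_sketch(size_t size, gfp_t gfp_mask)
{
        unsigned int order = get_order(size);
        unsigned long nr_used = DIV_ROUND_UP(size, PAGE_SIZE);
        struct page *page, *p;

        /* __set_page_owner() runs once here for the whole order-N block */
        page = alloc_pages(gfp_mask & ~__GFP_COMP, order);
        if (!page)
                return NULL;

        /* turn the block into 1 << order order-0 pages */
        split_page(page, order);

        /*
         * Each unused tail page is freed individually, so __reset_page_owner()
         * (and with it dec_stack_record_count()) runs once per page.
         */
        for (p = page + nr_used; p < page + (1UL << order); p++)
                __free_pages(p, 0);

        return page_address(page);
}

With the old scheme this is a single increment against the stack but one decrement per freed base page, which is the imbalance syzbot reported; with the refcount tracking base pages it becomes +(1 << order) at allocation time and -1 for every order-0 page freed, so the accounting balances out.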
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador @ 2024-03-19 23:24 ` Andrew Morton 2024-03-20 4:40 ` Tetsuo Handa 2024-03-20 17:35 ` kernel test robot ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Andrew Morton @ 2024-03-19 23:24 UTC (permalink / raw) To: Oscar Salvador Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, syzbot+41bbfdb8d41003d12c0f On Tue, 19 Mar 2024 19:32:11 +0100 Oscar Salvador <osalvador@suse.de> wrote: > Current code does not contemplate scenarios were an allocation and > free operation on the same pages do not handle it in the same amount > at once. > To give an example, page_alloc_exact(), where we will allocate a page > of enough order to stafisfy the size request, but we will free the > remainings right away. > > In the above example, we will increment the stack_record refcount > only once, but we will decrease it the same number of times as number > of unused pages we have to free. > This will lead to a warning because of refcount imbalance. > > Fix this by recording the number of base pages in the refcount field. > > ... > > -static void dec_stack_record_count(depot_stack_handle_t handle) > +static void dec_stack_record_count(depot_stack_handle_t handle, > + int nr_base_pages) > { > struct stack_record *stack_record = __stack_depot_get_stack_record(handle); > > if (stack_record) > - refcount_dec(&stack_record->count); > + refcount_sub_and_test(nr_base_pages, &stack_record->count); > } mm/page_owner.c: In function 'dec_stack_record_count': mm/page_owner.c:226:17: error: ignoring return value of 'refcount_sub_and_test' declared with attribute 'warn_unused_result' [-Werror=unused-result] 226 | refcount_sub_and_test(nr_base_pages, &stack_record->count); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 23:24 ` Andrew Morton @ 2024-03-20 4:40 ` Tetsuo Handa 2024-03-20 5:49 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Tetsuo Handa @ 2024-03-20 4:40 UTC (permalink / raw) To: Andrew Morton, Oscar Salvador Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, syzbot+41bbfdb8d41003d12c0f On 2024/03/20 8:24, Andrew Morton wrote: > On Tue, 19 Mar 2024 19:32:11 +0100 Oscar Salvador <osalvador@suse.de> wrote: >> -static void dec_stack_record_count(depot_stack_handle_t handle) >> +static void dec_stack_record_count(depot_stack_handle_t handle, >> + int nr_base_pages) >> { >> struct stack_record *stack_record = __stack_depot_get_stack_record(handle); >> >> if (stack_record) >> - refcount_dec(&stack_record->count); >> + refcount_sub_and_test(nr_base_pages, &stack_record->count); >> } > > mm/page_owner.c: In function 'dec_stack_record_count': > mm/page_owner.c:226:17: error: ignoring return value of 'refcount_sub_and_test' declared with attribute 'warn_unused_result' [-Werror=unused-result] > 226 | refcount_sub_and_test(nr_base_pages, &stack_record->count); > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > cc1: all warnings being treated as errors > Hmm, I guess that this is not an expected user of refcount API. If it is correct behavior that refcount becomes 0 here, you need to explain like - refcount_sub_and_test(nr_base_pages, &stack_record->count); + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) { + // Explain why nothing to do here, and explain where/how + // refcount again becomes positive value using refcount_set(). + } or replace refcount_t with atomic_t where it is legal to make refcount positive without using atomic_set(). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-20 4:40 ` Tetsuo Handa @ 2024-03-20 5:49 ` Oscar Salvador 2024-03-20 9:42 ` Tetsuo Handa 0 siblings, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-20 5:49 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, syzbot+41bbfdb8d41003d12c0f On Wed, Mar 20, 2024 at 01:40:05PM +0900, Tetsuo Handa wrote: > Hmm, I guess that this is not an expected user of refcount API. > If it is correct behavior that refcount becomes 0 here, you need to explain like > > - refcount_sub_and_test(nr_base_pages, &stack_record->count); > + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) { > + // Explain why nothing to do here, and explain where/how > + // refcount again becomes positive value using refcount_set(). > + } > > or replace refcount_t with atomic_t where it is legal to make refcount positive > without using atomic_set(). No, it is not expected for the refcount to become 0. I do know why, but I lost a chunk in the middle of a rebase. This should have the following on top: diff --git a/mm/page_owner.c b/mm/page_owner.c index 2613805cb665..e477a71d6adc 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -222,8 +222,11 @@ static void dec_stack_record_count(depot_stack_handle_t handle, { struct stack_record *stack_record = __stack_depot_get_stack_record(handle); - if (stack_record) - refcount_sub_and_test(nr_base_pages, &stack_record->count); + if (!stack_record) + return; + + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) + WARN(1, "%s refcount went to 0 for %u handle\n", __func__, handle); } -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
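For reference, this is roughly how dec_stack_record_count() reads once the missing chunk above is folded into patch 1/2; consuming the refcount_sub_and_test() return value also silences the warn_unused_result build error reported earlier in the thread:

static void dec_stack_record_count(depot_stack_handle_t handle,
                                   int nr_base_pages)
{
        struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

        if (!stack_record)
                return;

        /*
         * The count is initialized to 1 and then carries the number of
         * outstanding base pages on top of that, so reaching 0 here means
         * the accounting went wrong somewhere.
         */
        if (refcount_sub_and_test(nr_base_pages, &stack_record->count))
                WARN(1, "%s refcount went to 0 for %u handle\n", __func__, handle);
}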
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-20 5:49 ` Oscar Salvador @ 2024-03-20 9:42 ` Tetsuo Handa 0 siblings, 0 replies; 16+ messages in thread From: Tetsuo Handa @ 2024-03-20 9:42 UTC (permalink / raw) To: Oscar Salvador Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, syzbot+41bbfdb8d41003d12c0f On 2024/03/20 14:49, Oscar Salvador wrote: > This should have the follwing on top: > > diff --git a/mm/page_owner.c b/mm/page_owner.c > index 2613805cb665..e477a71d6adc 100644 > --- a/mm/page_owner.c > +++ b/mm/page_owner.c > @@ -222,8 +222,11 @@ static void dec_stack_record_count(depot_stack_handle_t handle, > { > struct stack_record *stack_record = __stack_depot_get_stack_record(handle); > > - if (stack_record) > - refcount_sub_and_test(nr_base_pages, &stack_record->count); > + if (!stack_record) > + return; > + > + if (refcount_sub_and_test(nr_base_pages, &stack_record->count)) > + WARN(1, "%s refcount went to 0 for %u handle\n", __func__, handle); > } > > That fixed this problem. Thank you. -------- Forwarded Message -------- Date: Wed, 20 Mar 2024 01:31:03 -0700 In-Reply-To: <4362246e-d804-43de-800b-a7840b70919a@I-love.SAKURA.ne.jp> Message-ID: <000000000000410e3b061413692b@google.com> Subject: Re: [syzbot] [mm?] WARNING: refcount bug in __reset_page_owner From: syzbot <syzbot+98c1a1753a0731df2dd4@syzkaller.appspotmail.com> To: linux-kernel@vger.kernel.org, penguin-kernel@i-love.sakura.ne.jp, syzkaller-bugs@googlegroups.com Hello, syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+98c1a1753a0731df2dd4@syzkaller.appspotmail.com Tested on: commit: a4145ce1 Merge tag 'bcachefs-2024-03-19' of https://ev.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=131b1d66180000 kernel config: https://syzkaller.appspot.com/x/.config?x=9f47e8dfa53b0b11 dashboard link: https://syzkaller.appspot.com/bug?extid=98c1a1753a0731df2dd4 compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 patch: https://syzkaller.appspot.com/x/patch.diff?x=13972985180000 Note: testing is done by a robot and is best-effort only. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador 2024-03-19 23:24 ` Andrew Morton @ 2024-03-20 17:35 ` kernel test robot 2024-03-20 23:37 ` kernel test robot 2024-03-21 10:36 ` Vlastimil Babka 3 siblings, 0 replies; 16+ messages in thread From: kernel test robot @ 2024-03-20 17:35 UTC (permalink / raw) To: Oscar Salvador, Andrew Morton Cc: llvm, oe-kbuild-all, Linux Memory Management List, linux-kernel, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f Hi Oscar, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master next-20240320] [cannot apply to v6.8] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/mm-page_owner-Fix-refcount-imbalance/20240320-023302 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20240319183212.17156-2-osalvador%40suse.de patch subject: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance config: x86_64-buildonly-randconfig-004-20240320 (https://download.01.org/0day-ci/archive/20240321/202403210131.YLR6USxm-lkp@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240321/202403210131.YLR6USxm-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202403210131.YLR6USxm-lkp@intel.com/ All warnings (new ones prefixed by >>): >> mm/page_owner.c:213:3: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result] 213 | refcount_sub_and_test(nr_base_pages, &stack_record->count); | ^~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. vim +/warn_unused_result +213 mm/page_owner.c 206 207 static void dec_stack_record_count(depot_stack_handle_t handle, 208 int nr_base_pages) 209 { 210 struct stack_record *stack_record = __stack_depot_get_stack_record(handle); 211 212 if (stack_record) > 213 refcount_sub_and_test(nr_base_pages, &stack_record->count); 214 } 215 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador 2024-03-19 23:24 ` Andrew Morton 2024-03-20 17:35 ` kernel test robot @ 2024-03-20 23:37 ` kernel test robot 2024-03-21 10:36 ` Vlastimil Babka 3 siblings, 0 replies; 16+ messages in thread From: kernel test robot @ 2024-03-20 23:37 UTC (permalink / raw) To: Oscar Salvador, Andrew Morton Cc: oe-kbuild-all, Linux Memory Management List, linux-kernel, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador, syzbot+41bbfdb8d41003d12c0f Hi Oscar, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master next-20240320] [cannot apply to v6.8] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/mm-page_owner-Fix-refcount-imbalance/20240320-023302 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20240319183212.17156-2-osalvador%40suse.de patch subject: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance config: x86_64-randconfig-r081-20240320 (https://download.01.org/0day-ci/archive/20240321/202403210718.9IiE8NIU-lkp@intel.com/config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240321/202403210718.9IiE8NIU-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202403210718.9IiE8NIU-lkp@intel.com/ All warnings (new ones prefixed by >>): mm/page_owner.c: In function 'dec_stack_record_count': >> mm/page_owner.c:213:3: warning: ignoring return value of 'refcount_sub_and_test', declared with attribute warn_unused_result [-Wunused-result] 213 | refcount_sub_and_test(nr_base_pages, &stack_record->count); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ vim +/refcount_sub_and_test +213 mm/page_owner.c 206 207 static void dec_stack_record_count(depot_stack_handle_t handle, 208 int nr_base_pages) 209 { 210 struct stack_record *stack_record = __stack_depot_get_stack_record(handle); 211 212 if (stack_record) > 213 refcount_sub_and_test(nr_base_pages, &stack_record->count); 214 } 215 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador ` (2 preceding siblings ...) 2024-03-20 23:37 ` kernel test robot @ 2024-03-21 10:36 ` Vlastimil Babka 3 siblings, 0 replies; 16+ messages in thread From: Vlastimil Babka @ 2024-03-21 10:36 UTC (permalink / raw) To: Oscar Salvador, Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, syzbot+41bbfdb8d41003d12c0f On 3/19/24 19:32, Oscar Salvador wrote: > Current code does not contemplate scenarios were an allocation and > free operation on the same pages do not handle it in the same amount > at once. > To give an example, page_alloc_exact(), where we will allocate a page > of enough order to stafisfy the size request, but we will free the > remainings right away. > > In the above example, we will increment the stack_record refcount > only once, but we will decrease it the same number of times as number > of unused pages we have to free. > This will lead to a warning because of refcount imbalance. > > Fix this by recording the number of base pages in the refcount field. > > Reported-by: syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/linux-mm/00000000000090e8ff0613eda0e5@google.com > Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") > Signed-off-by: Oscar Salvador <osalvador@suse.de> With the fixup, Reviewed-by: Vlastimil Babka <vbabka@suse.cz> But I think you'll need to resend with the missing hunk already applied, it had broken whitespace in your email and IIUC this is was dropped from mm tree. Also I'd suggest a change: > +++ b/mm/page_owner.c > @@ -196,9 +196,11 @@ static void add_stack_record_to_list(struct stack_record *stack_record, > spin_unlock_irqrestore(&stack_list_lock, flags); > } > > -static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) > +static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask, > + int nr_base_pages) > { > struct stack_record *stack_record = __stack_depot_get_stack_record(handle); > + int old = REFCOUNT_SATURATED; > > if (!stack_record) > return; > @@ -210,22 +212,18 @@ static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask) > * Since we do not use STACK_DEPOT_FLAG_GET API, let us > * set a refcount of 1 ourselves. > */ > - if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) { > - int old = REFCOUNT_SATURATED; I think this was useful optimization in that most cases the count is not REFCOUNT_SATURATED so we don't have to go for the expensive cmpxchg all the time. Or do I miss a reason why this was changed? > - > - if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) > - /* Add the new stack_record to our list */ > - add_stack_record_to_list(stack_record, gfp_mask); > - } > - refcount_inc(&stack_record->count); > + if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1)) > + add_stack_record_to_list(stack_record, gfp_mask); > + refcount_add(nr_base_pages, &stack_record->count); > } > ^ permalink raw reply [flat|nested] 16+ messages in thread
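Here is a sketch of how inc_stack_record_count() could look with the cheap refcount_read() fast path kept in front of the cmpxchg, as Vlastimil suggests above, on top of the nr_base_pages change; whether the final version takes exactly this shape is left open in this thread:

static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask,
                                   int nr_base_pages)
{
        struct stack_record *stack_record = __stack_depot_get_stack_record(handle);

        if (!stack_record)
                return;

        /*
         * New stack_records start out with a refcount of REFCOUNT_SATURATED
         * because we do not use the STACK_DEPOT_FLAG_GET API, so set a
         * refcount of 1 ourselves.  Only pay for the cmpxchg when that is
         * still the case, and only the winner of the cmpxchg adds the new
         * stack_record to our list.
         */
        if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
                int old = REFCOUNT_SATURATED;

                if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1))
                        add_stack_record_to_list(stack_record, gfp_mask);
        }
        refcount_add(nr_base_pages, &stack_record->count);
}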
* [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-19 18:32 [PATCH v2 0/2] page_owner: Refcount fixups Oscar Salvador 2024-03-19 18:32 ` [PATCH v2 1/2] mm,page_owner: Fix refcount imbalance Oscar Salvador @ 2024-03-19 18:32 ` Oscar Salvador 2024-03-19 18:48 ` Matthew Wilcox 1 sibling, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-19 18:32 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa, Oscar Salvador Upon migration, newly allocated pages are being given the handle of the old pages. This is problematic because it means that for the stack which allocated the old page, we will be subtracting the old page + the new one when that page is freed, creating an accounting imbalance. Fix this by adding a new migrate_handle in the page_owner struct, and record the handle that allocated the new page in __folio_copy_owner(). Upon freeing, we check whether we have a migrate_handle, and if we do, we use migrate_handle for dec_stack_record_count(), which will subtract those pages from the right handle. Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") Signed-off-by: Oscar Salvador <osalvador@suse.de> --- mm/page_owner.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/mm/page_owner.c b/mm/page_owner.c index 2613805cb665..1a7d0d1dc640 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -27,6 +27,7 @@ struct page_owner { gfp_t gfp_mask; depot_stack_handle_t handle; depot_stack_handle_t free_handle; + depot_stack_handle_t migrate_handle; u64 ts_nsec; u64 free_ts_nsec; char comm[TASK_COMM_LEN]; @@ -240,7 +241,15 @@ void __reset_page_owner(struct page *page, unsigned short order) return; page_owner = get_page_owner(page_ext); - alloc_handle = page_owner->handle; + /* + * If this page was allocated for migration purposes, its handle doesn't + * reference the stack it was allocated from, so make sure to use the + * migrate_handle in order to subtract it from the right stack. + */ + if (!page_owner->migrate_handle) + alloc_handle = page_owner->handle; + else + alloc_handle = page_owner->migrate_handle; handle = save_stack(GFP_NOWAIT | __GFP_NOWARN); for (i = 0; i < (1 << order); i++) { @@ -277,6 +286,7 @@ static inline void __set_page_owner_handle(struct page_ext *page_ext, page_owner->handle = handle; page_owner->order = order; page_owner->gfp_mask = gfp_mask; + page_owner->migrate_handle = 0; page_owner->last_migrate_reason = -1; page_owner->pid = current->pid; page_owner->tgid = current->tgid; @@ -358,6 +368,7 @@ void __folio_copy_owner(struct folio *newfolio, struct folio *old) new_page_owner->gfp_mask = old_page_owner->gfp_mask; new_page_owner->last_migrate_reason = old_page_owner->last_migrate_reason; + new_page_owner->migrate_handle = new_page_owner->handle; new_page_owner->handle = old_page_owner->handle; new_page_owner->pid = old_page_owner->pid; new_page_owner->tgid = old_page_owner->tgid; -- 2.44.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-19 18:32 ` [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating Oscar Salvador @ 2024-03-19 18:48 ` Matthew Wilcox 2024-03-20 5:00 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Matthew Wilcox @ 2024-03-19 18:48 UTC (permalink / raw) To: Oscar Salvador Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Tue, Mar 19, 2024 at 07:32:12PM +0100, Oscar Salvador wrote: > Upon migration, new allocated pages are being given the handle of the old > pages. This is problematic because it means that for the stack which > allocated the old page, we will be substracting the old page + the new one > when that page is freed, creating an accounting imbalance. > > Fix this by adding a new migrate_handle in the page_owner struct, and > record the handle that allocated the new page in __folio_copy_owner(). > Upon freeing, we check whether we have a migrate_handle, and if we do, > we use migrate_handle for dec_stack_record_count(), which will > subtract those pages from its right handle. Is this the right way to fix this problem? I would have thought we'd be better off accounting this as migration freeing the old page and allocating the new page. If I understand correctly, this is the code which says "This page was last allocated by X and freed by Y", and I would think that being last freed (or allocated) by the migration code would be a very nice hint about where a problem might stem from. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-19 18:48 ` Matthew Wilcox @ 2024-03-20 5:00 ` Oscar Salvador 2024-03-21 10:50 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-20 5:00 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote: > Is this the right way to fix this problem? I would have thought we'd > be better off accounting this as migration freeing the old page and > allocating the new page. If I understand correctly, this is the code > which says "This page was last allocated by X and freed by Y", and I > would think that being last freed (or allocated) by the migration code > would be a very nice hint about where a problem might stem from. I hear you, and I had the same thought when I stumbled upon this. I did not know that the handle was being changed, otherwise it would have saved me quite a lot of debugging time. Checking the history of this, I can see this decision was made in 2016 in: commit d435edca928805074dae005ab9a42d9fa60fc702 Author: Vlastimil Babka <vbabka@suse.cz> Date: Tue Mar 15 14:56:15 2016 -0700 mm, page_owner: copy page owner info during migration And let me quote: "The page_owner mechanism stores gfp_flags of an allocation and stack trace that lead to it. During page migration, the original information is practically replaced by the allocation of free page as the migration target. Arguably this is less useful and might lead to all the page_owner info for migratable pages gradually converge towards compaction or numa balancing migrations. It has also lead to inaccuracies such as one fixed by commit e2cfc91120fa ("mm/page_owner: set correct gfp_mask on page_owner")." A following patch stored the migration reason in last_migrate_reason, and that patch also added a bit of information if last_migrate_reason was other than 0: + if (page_ext->last_migrate_reason != -1) { + ret += snprintf(kbuf + ret, count - ret, + "Page has been migrated, last migrate reason: %s\n", + migrate_reason_names[page_ext->last_migrate_reason]); + if (ret >= count) + goto err; + } Now, thinking about this some more, it kind of makes sense, because one of the things page_owner is used for, in my experience, is for memory leaks. We print the output, try to correlate allocation/free operations per stack so one can easily spot a stack that just keeps allocating memory and never frees it (it might be legit, and not a leak, but it gives a hint). Now imagine there are 10k pages pointing to stack A. If those pages were to be migrated, e.g. kcompactd jumps in, we will lose the original stack and replace it with something like: migrate_pages() ... ... kcompactd After that, stack A does not have those 10k pages pointing to it anymore, although it still "owns" them; they just got replaced by new ones due to migration. This kind of defeats the purpose of page_owner. And after all, we do record some migration information in those new pages, which will give us a hint when looking at the output. So, all in all, I am leaning towards "this is fine". -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-20 5:00 ` Oscar Salvador @ 2024-03-21 10:50 ` Vlastimil Babka 2024-03-21 11:07 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2024-03-21 10:50 UTC (permalink / raw) To: Oscar Salvador, Matthew Wilcox Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On 3/20/24 06:00, Oscar Salvador wrote: > On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote: >> Is this the right way to fix this problem? I would have thought we'd >> be better off accounting this as migration freeing the old page and >> allocating the new page. If I understand correctly, this is the code >> which says "This page was last allocated by X and freed by Y", and I >> would think that being last freed (or allocated) by the migration code >> would be a very nice hint about where a problem might stem from. > > I hear you, and I had the same thought when I stumbled upon this. > I did not know that the handle was being changed, otherwise it would > have saved me quite a lot of debugging time. > > Checking the history of this, I can see this decision was made in > 2016 in: > > commit d435edca928805074dae005ab9a42d9fa60fc702 > Author: Vlastimil Babka <vbabka@suse.cz> > Date: Tue Mar 15 14:56:15 2016 -0700 > > mm, page_owner: copy page owner info during migration > Yeah I think we could keep that logic. But we could also simply subtract the refcount of the old handle (the "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need the extra migrate_handle. Also we might have more issues here. Most page owner code takes care to set everything for all pages within a folio, but __folio_copy_owner() and __set_page_owner_migrate_reason() don't. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-21 10:50 ` Vlastimil Babka @ 2024-03-21 11:07 ` Oscar Salvador 2024-03-21 11:20 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Oscar Salvador @ 2024-03-21 11:07 UTC (permalink / raw) To: Vlastimil Babka Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Thu, Mar 21, 2024 at 11:50:36AM +0100, Vlastimil Babka wrote: > Yeah I think we could keep that logic. I am all for keeping it. > But we could also simply subtract the refcount of the old handle (the > "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need > the extra migrate_handle. Since new_page will have the old handle pointing to the old stack after the call, we could uncharge the old_page to the migrate_stack, which new_page->handle holds before it gets changed. So we basically swap it. It could work, but I kinda have a bittersweet feeling here. I am trying to work towards reducing the number of lookups in the hlist, but for the approach described above I would need to look up the stack for new_page->handle in order to subtract the page. OTOH, I understand that adding migrate_handle kinda wastes memory: 16MB for 16GB of memory. > Also we might have more issues here. Most page owner code takes care to set > everything for all pages within a folio, but __folio_copy_owner() and > __set_page_owner_migrate_reason() don't. I did not check deeply, but don't we split the folio upon migration in case it is large? Which means we should reach split_page_owner() before the copy takes place. Do I get it right? -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
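For illustration, here is one hypothetical shape of the "swap" described above: instead of copying the handle and remembering a separate migrate_handle, __folio_copy_owner() would exchange the two handles, so each stack is uncharged by whichever folio ends up being freed against it. This is a sketch based on the discussion, not code from the series, and the thread does not settle on it:

static void swap_owner_handles(struct page_owner *old_page_owner,
                               struct page_owner *new_page_owner)
{
        /* handle of the stack that allocated the migration target */
        depot_stack_handle_t migrate_target_handle = new_page_owner->handle;

        /*
         * The data lives on in the new folio, so keep the original
         * allocation stack charged until the new folio is freed ...
         */
        new_page_owner->handle = old_page_owner->handle;

        /*
         * ... while the old folio, which is freed right after migration,
         * uncharges the "allocated for migration" stack through the normal
         * free path, with no extra migrate_handle field and no stack
         * lookup at copy time.
         */
        old_page_owner->handle = migrate_target_handle;
}

A plain exchange like this would avoid the explicit subtraction (and hence the extra stack lookup) mentioned above, since both stacks end up balanced by the regular alloc/free accounting.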
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-21 11:07 ` Oscar Salvador @ 2024-03-21 11:20 ` Vlastimil Babka 2024-03-21 11:54 ` Oscar Salvador 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2024-03-21 11:20 UTC (permalink / raw) To: Oscar Salvador Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On 3/21/24 12:07, Oscar Salvador wrote: > On Thu, Mar 21, 2024 at 11:50:36AM +0100, Vlastimil Babka wrote: >> Yeah I think we could keep that logic. > > I am all for keeping it. > >> But we could also simply subtract the refcount of the old handle (the >> "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need >> the extra migrate_handle. > > Since new_page will have the old handle pointing to the old stack after > the call, we > could uncharge the old_page to the migrate_stack, which new_page->_handle holds > before it gets changed. > > So we basically swap it. > > It could work, but I kinda have a bittersweet feeling here. > I am trying to work towards to reduce the number of lookups in the > hlist, but for the approach described above I would need to lookup > the stack for new_page->handle in order to substract the page. Understood, but migration is a kinda heavy, non-fast-path operation already, so the extra lookup wouldn't be in a critical fast path. > OTHO, I understand that adding migrate_handle kinda wasted memory. > 16MB for 16GB of memory. > >> Also we might have more issues here. Most page owner code takes care to set >> everything for all pages within a folio, but __folio_copy_owner() and >> __set_page_owner_migrate_reason() don't. > > I did not check deeply but do not we split the folio upon migration > in case it is large? > Which means we should reach split_page_owner() before the copy takes > place. Do I get it right? What I mean is we have __set_page_owner_handle() setting up everything for tail pages and then we have __folio_copy_owner updating only the head page, so this will create kinda mixed-up information. Which might not be an issue, as __folio_copy_owner() should mean it's a proper folio, thus a compound page, thus nobody will ever check those mismatched tail pages... Maybe we could adjust __set_page_owner_handle() to skip tail pages for compound pages as well and unify this, and tail pages would only be set up for those weird high-order non-compound pages so that the odd case in __free_pages() works? Or maybe page_owner is considered debugging functionality enough so it might be worth having the redundant data in tail pages in case something goes wrong. But either way now it seems we're not doing it consistently. ^ permalink raw reply [flat|nested] 16+ messages in thread
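On the tail-page consistency point, this is a hypothetical helper (not proposed in the thread) showing what walking every base page in __folio_copy_owner(), the same way __set_page_owner_handle() does, could look like; the field and helper names follow the struct page_owner and accessors shown in the patches above:

static void folio_copy_owner_tail_pages(struct page_ext *new_ext,
                                        const struct page_owner *src,
                                        unsigned short order)
{
        int i;

        for (i = 0; i < (1 << order); i++) {
                struct page_owner *page_owner = get_page_owner(new_ext);

                /* mirror what is copied for the head page today */
                page_owner->handle = src->handle;
                page_owner->gfp_mask = src->gfp_mask;
                page_owner->last_migrate_reason = src->last_migrate_reason;

                new_ext = page_ext_next(new_ext);
        }
}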
* Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating 2024-03-21 11:20 ` Vlastimil Babka @ 2024-03-21 11:54 ` Oscar Salvador 0 siblings, 0 replies; 16+ messages in thread From: Oscar Salvador @ 2024-03-21 11:54 UTC (permalink / raw) To: Vlastimil Babka Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver, Andrey Konovalov, Alexander Potapenko, Tetsuo Handa On Thu, Mar 21, 2024 at 12:20:24PM +0100, Vlastimil Babka wrote: > Understood, but migration is kinda heavy and non-fast-path operation already > so the extra lookup wouldn't be in a critical fast path. Ok, you convinced me, let us save that memory. > When I mean is we have __set_page_owner_handle() setting up everything for > tail pages and then we have __folio_copy_owner updating only the head page, > so this will create kinda a mixed up information. Which might not be an > issue as __folio_copy_owner() should mean it's a proper folio thus compound > page thus nobody ever will check those mismatched tail pages... Maybe we > could adjust __set_page_owner_handle() to skip tail pages for compound > pages as well and unify this, and tail pages would be only setup for those > weird high-order non-compound pages so that the odd case in __free_pages() > works? > > Or maybe page_owner is considered debugging functionality enough so it might > be worth having the redundant data in tail pages in case something goes > wrong. But either way now it seems we're not doing it consistently. So we basically have two options here: 1) also update tail pages in __folio_copy_owner, or 2) follow __folio_copy_owner's example and skip tail pages in __set_page_owner. The thing is, going with 2) might mean, as you pointed out, that if something goes wrong we lose the information for those pages, as page_owner does not have a record for them. OTOH, would that information really be useful? Sure, we could see the stack, but AFAICS not even the migrate_reason would be recorded in those tail pages, which means that if they are migrated and freed, we would only see that they were freed. (The free stack would point to the migrate code though, so that is a hint.) Not really sure which path to take here; skipping tail pages would be less complex, but at the risk of losing information. Anyway, let me first tackle the issue at hand here, and then I will think more about that. Thanks! -- Oscar Salvador SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread