linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] filemap_add_folio_nocharge()
@ 2025-08-06  0:11 Boris Burkov
  2025-08-06  0:11 ` [PATCH 1/3] mm/filemap: add filemap_add_folio_nocharge() Boris Burkov
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Boris Burkov @ 2025-08-06  0:11 UTC (permalink / raw)
  To: linux-btrfs, linux-mm, linux-fsdevel, kernel-team; +Cc: shakeel.butt, hch, wqu

I would like to revisit Qu's proposal to not charge btrfs extent_buffer
allocations to the user's cgroup.

https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/

I believe it is detrimental to randomly account these global pages to
the cgroup using them, basically at random. A bit more justification
and explanation in the patches themselves.

Three meta-considerations/questions:
1. Which tree should this go through, assuming it is acceptable?
   For now, I have based it off btrfs/for-next as that is what I am
   used to doing, but I am happy to re-send it based off the appropriate mm
   branch.
2. Christoph wrote the first patch as-is in his suggestion to Qu. I am happy
   to replace it with his authorship/s-o-b, I just didn't want to do that
   without asking. For now, I put his "Suggested-by".
3. The previous suggestion also requested "proper" documentation. I don't
   know what that entails in this case, and was unable to find corresponding
   documentation for filemap_add_folio() in the code or in Documentation/.
   Please let me know what I should be doing there, as well.

Boris Burkov (3):
  mm/filemap: add filemap_add_folio_nocharge()
  btrfs: use filemap_add_folio_nocharge() for extent_buffers
  mm: add vmstat for cgroup uncharged pages

 fs/btrfs/extent_io.c    |  4 ++--
 include/linux/mmzone.h  |  3 +++
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 41 +++++++++++++++++++++++++++++++++++------
 mm/vmstat.c             |  3 +++
 5 files changed, 45 insertions(+), 8 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] mm/filemap: add filemap_add_folio_nocharge()
  2025-08-06  0:11 [PATCH 0/3] filemap_add_folio_nocharge() Boris Burkov
@ 2025-08-06  0:11 ` Boris Burkov
  2025-08-06  0:11 ` [PATCH 2/3] btrfs: use filemap_add_folio_nocharge() for extent_buffers Boris Burkov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Boris Burkov @ 2025-08-06  0:11 UTC (permalink / raw)
  To: linux-btrfs, linux-mm, linux-fsdevel, kernel-team; +Cc: shakeel.butt, hch, wqu

Btrfs currently tracks its metadata pages in the page cache, using a
fake inode (fs_info->btree_inode) with offsets corresponding to where
the metadata is stored in the filesystem's full logical address space.

A consequence of this is that when btrfs uses filemap_add_folio(), this
usage is charged to the cgroup of whichever task happens to be running
at the time. These folios don't belong to any particular user cgroup, so
I don't think it makes much sense for them to be charged in that way.
Some negative consequences as a result:
- A task can be holding some important btrfs locks, then need to lookup
  some metadata and go into reclaim, extending the duration it holds
  that lock for, and unfairly pushing its own reclaim pain onto other
  cgroups.
- If that cgroup goes into reclaim, it might reclaim these folios a
  different non-reclaiming cgroup might need soon. This is naturally
  offset by LRU reclaim, but still.

A very similar proposal to use the root cgroup was previously made by
Qu, but he was advised by Christoph to instead introduce a _nocharge
variant of filemap_add_folio.

Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Boris Burkov <boris@bur.io>
---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 23 +++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e63fbfbd5b0f..acc8d390ecbb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1237,6 +1237,8 @@ size_t fault_in_readable(const char __user *uaddr, size_t size);
 
 int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
+int filemap_add_folio_nocharge(struct address_space *mapping,
+			       struct folio *folio, pgoff_t index, gfp_t gfp);
 int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 		pgoff_t index, gfp_t gfp);
 void filemap_remove_folio(struct folio *folio);
diff --git a/mm/filemap.c b/mm/filemap.c
index bada249b9fb7..ccc9cfb4d418 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -955,20 +955,15 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 }
 ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);
 
-int filemap_add_folio(struct address_space *mapping, struct folio *folio,
+int filemap_add_folio_nocharge(struct address_space *mapping, struct folio *folio,
 				pgoff_t index, gfp_t gfp)
 {
 	void *shadow = NULL;
 	int ret;
 
-	ret = mem_cgroup_charge(folio, NULL, gfp);
-	if (ret)
-		return ret;
-
 	__folio_set_locked(folio);
 	ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
 	if (unlikely(ret)) {
-		mem_cgroup_uncharge(folio);
 		__folio_clear_locked(folio);
 	} else {
 		/*
@@ -986,6 +981,22 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 	}
 	return ret;
 }
+EXPORT_SYMBOL_GPL(filemap_add_folio_nocharge);
+
+int filemap_add_folio(struct address_space *mapping, struct folio *folio,
+				pgoff_t index, gfp_t gfp)
+{
+	int ret;
+
+	ret = mem_cgroup_charge(folio, NULL, gfp);
+	if (ret)
+		return ret;
+
+	ret = filemap_add_folio_nocharge(mapping, folio, index, gfp);
+	if (ret)
+		mem_cgroup_uncharge(folio);
+	return ret;
+}
 EXPORT_SYMBOL_GPL(filemap_add_folio);
 
 #ifdef CONFIG_NUMA
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] btrfs: use filemap_add_folio_nocharge() for extent_buffers
  2025-08-06  0:11 [PATCH 0/3] filemap_add_folio_nocharge() Boris Burkov
  2025-08-06  0:11 ` [PATCH 1/3] mm/filemap: add filemap_add_folio_nocharge() Boris Burkov
@ 2025-08-06  0:11 ` Boris Burkov
  2025-08-06  0:11 ` [PATCH 3/3] mm: add vmstat for cgroup uncharged pages Boris Burkov
  2025-08-06 14:00 ` [PATCH 0/3] filemap_add_folio_nocharge() Matthew Wilcox
  3 siblings, 0 replies; 9+ messages in thread
From: Boris Burkov @ 2025-08-06  0:11 UTC (permalink / raw)
  To: linux-btrfs, linux-mm, linux-fsdevel, kernel-team; +Cc: shakeel.butt, hch, wqu

Don't block on cgroup reclaim or subject shared extent_buffers to cgroup
reclaim.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/extent_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f9ccbe5a203a..1f25447f1e39 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3258,8 +3258,8 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
 
 retry:
 	existing_folio = NULL;
-	ret = filemap_add_folio(mapping, eb->folios[i], index + i,
-				GFP_NOFS | __GFP_NOFAIL);
+	ret = filemap_add_folio_nocharge(mapping, eb->folios[i], index + i,
+					 GFP_NOFS | __GFP_NOFAIL);
 	if (!ret)
 		goto finish;
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] mm: add vmstat for cgroup uncharged pages
  2025-08-06  0:11 [PATCH 0/3] filemap_add_folio_nocharge() Boris Burkov
  2025-08-06  0:11 ` [PATCH 1/3] mm/filemap: add filemap_add_folio_nocharge() Boris Burkov
  2025-08-06  0:11 ` [PATCH 2/3] btrfs: use filemap_add_folio_nocharge() for extent_buffers Boris Burkov
@ 2025-08-06  0:11 ` Boris Burkov
  2025-08-06 20:40   ` kernel test robot
                     ` (2 more replies)
  2025-08-06 14:00 ` [PATCH 0/3] filemap_add_folio_nocharge() Matthew Wilcox
  3 siblings, 3 replies; 9+ messages in thread
From: Boris Burkov @ 2025-08-06  0:11 UTC (permalink / raw)
  To: linux-btrfs, linux-mm, linux-fsdevel, kernel-team; +Cc: shakeel.butt, hch, wqu

If cgroups are configured into the kernel, then uncharged pages can only
come from filemap_add_folio_nocharge. Track such uncharged folios in
vmstat so that they are accounted for.

Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Boris Burkov <boris@bur.io>
---
 include/linux/mmzone.h |  3 +++
 mm/filemap.c           | 18 ++++++++++++++++++
 mm/vmstat.c            |  3 +++
 3 files changed, 24 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 283913d42d7b..a945dec65371 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -241,6 +241,9 @@ enum node_stat_item {
 	NR_HUGETLB,
 #endif
 	NR_BALLOON_PAGES,
+#ifdef CONFIG_MEMCG
+	NR_UNCHARGED_FILE_PAGES,
+#endif
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/filemap.c b/mm/filemap.c
index ccc9cfb4d418..0a258b4a9246 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -146,6 +146,22 @@ static void page_cache_delete(struct address_space *mapping,
 	mapping->nrpages -= nr;
 }
 
+#ifdef CONFIG_MEMCG
+static void filemap_mod_uncharged_vmstat(struct folio *folio, int sign)
+{
+	long nr = folio_nr_pages(folio) * sign;
+
+	if (!folio_memcg(folio))
+		__lruvec_stat_mod_folio(folio, NR_UNCHARGED_FILE_PAGES, nr);
+}
+#else
+static void filemap_mod_uncharged_cgroup_vmstat(struct folio *folio, int sign)
+{
+	return;
+}
+#endif
+
+
 static void filemap_unaccount_folio(struct address_space *mapping,
 		struct folio *folio)
 {
@@ -190,6 +206,7 @@ static void filemap_unaccount_folio(struct address_space *mapping,
 		__lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
 		filemap_nr_thps_dec(mapping);
 	}
+	filemap_mod_uncharged_vmstat(folio, -1);
 
 	/*
 	 * At this point folio must be either written or cleaned by
@@ -978,6 +995,7 @@ int filemap_add_folio_nocharge(struct address_space *mapping, struct folio *foli
 		if (!(gfp & __GFP_WRITE) && shadow)
 			workingset_refault(folio, shadow);
 		folio_add_lru(folio);
+		filemap_mod_uncharged_vmstat(folio, 1);
 	}
 	return ret;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a78d70ddeacd..63318742ae5a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1281,6 +1281,9 @@ const char * const vmstat_text[] = {
 	"nr_hugetlb",
 #endif
 	"nr_balloon_pages",
+#ifdef CONFIG_MEMCG
+	"nr_uncharged_file_pages",
+#endif
 	/* system-wide enum vm_stat_item counters */
 	"nr_dirty_threshold",
 	"nr_dirty_background_threshold",
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] filemap_add_folio_nocharge()
  2025-08-06  0:11 [PATCH 0/3] filemap_add_folio_nocharge() Boris Burkov
                   ` (2 preceding siblings ...)
  2025-08-06  0:11 ` [PATCH 3/3] mm: add vmstat for cgroup uncharged pages Boris Burkov
@ 2025-08-06 14:00 ` Matthew Wilcox
  2025-08-06 23:19   ` Shakeel Butt
  3 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2025-08-06 14:00 UTC (permalink / raw)
  To: Boris Burkov
  Cc: linux-btrfs, linux-mm, linux-fsdevel, kernel-team, shakeel.butt,
	hch, wqu

On Tue, Aug 05, 2025 at 05:11:46PM -0700, Boris Burkov wrote:
> I would like to revisit Qu's proposal to not charge btrfs extent_buffer
> allocations to the user's cgroup.
> 
> https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/

I prefer Qu's suggestion to add a flag to the address_space.  This really
is a property of the address_space, not a property of the call-site.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] mm: add vmstat for cgroup uncharged pages
  2025-08-06  0:11 ` [PATCH 3/3] mm: add vmstat for cgroup uncharged pages Boris Burkov
@ 2025-08-06 20:40   ` kernel test robot
  2025-08-06 20:51   ` kernel test robot
  2025-08-07 17:23   ` Shakeel Butt
  2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-08-06 20:40 UTC (permalink / raw)
  To: Boris Burkov, linux-btrfs, linux-mm, linux-fsdevel, kernel-team
  Cc: oe-kbuild-all, shakeel.butt, hch, wqu

Hi Boris,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on v6.16]
[cannot apply to akpm-mm/mm-everything linus/master next-20250806]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Boris-Burkov/mm-filemap-add-filemap_add_folio_nocharge/20250806-130147
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/eae30d630ba07de8966d09a3e1700f53715980c2.1754438418.git.boris%40bur.io
patch subject: [PATCH 3/3] mm: add vmstat for cgroup uncharged pages
config: i386-buildonly-randconfig-003-20250807 (https://download.01.org/0day-ci/archive/20250807/202508070434.6EpRsRF3-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250807/202508070434.6EpRsRF3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508070434.6EpRsRF3-lkp@intel.com/

All warnings (new ones prefixed by >>):

   mm/filemap.c: In function 'filemap_unaccount_folio':
   mm/filemap.c:209:9: error: implicit declaration of function 'filemap_mod_uncharged_vmstat'; did you mean 'filemap_mod_uncharged_cgroup_vmstat'? [-Werror=implicit-function-declaration]
     209 |         filemap_mod_uncharged_vmstat(folio, -1);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |         filemap_mod_uncharged_cgroup_vmstat
   mm/filemap.c: At top level:
>> mm/filemap.c:158:13: warning: 'filemap_mod_uncharged_cgroup_vmstat' defined but not used [-Wunused-function]
     158 | static void filemap_mod_uncharged_cgroup_vmstat(struct folio *folio, int sign)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/filemap_mod_uncharged_cgroup_vmstat +158 mm/filemap.c

   148	
   149	#ifdef CONFIG_MEMCG
   150	static void filemap_mod_uncharged_vmstat(struct folio *folio, int sign)
   151	{
   152		long nr = folio_nr_pages(folio) * sign;
   153	
   154		if (!folio_memcg(folio))
   155			__lruvec_stat_mod_folio(folio, NR_UNCHARGED_FILE_PAGES, nr);
   156	}
   157	#else
 > 158	static void filemap_mod_uncharged_cgroup_vmstat(struct folio *folio, int sign)
   159	{
   160		return;
   161	}
   162	#endif
   163	
   164	
   165	static void filemap_unaccount_folio(struct address_space *mapping,
   166			struct folio *folio)
   167	{
   168		long nr;
   169	
   170		VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
   171		if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) {
   172			pr_alert("BUG: Bad page cache in process %s  pfn:%05lx\n",
   173				 current->comm, folio_pfn(folio));
   174			dump_page(&folio->page, "still mapped when deleted");
   175			dump_stack();
   176			add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
   177	
   178			if (mapping_exiting(mapping) && !folio_test_large(folio)) {
   179				int mapcount = folio_mapcount(folio);
   180	
   181				if (folio_ref_count(folio) >= mapcount + 2) {
   182					/*
   183					 * All vmas have already been torn down, so it's
   184					 * a good bet that actually the page is unmapped
   185					 * and we'd rather not leak it: if we're wrong,
   186					 * another bad page check should catch it later.
   187					 */
   188					atomic_set(&folio->_mapcount, -1);
   189					folio_ref_sub(folio, mapcount);
   190				}
   191			}
   192		}
   193	
   194		/* hugetlb folios do not participate in page cache accounting. */
   195		if (folio_test_hugetlb(folio))
   196			return;
   197	
   198		nr = folio_nr_pages(folio);
   199	
   200		__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
   201		if (folio_test_swapbacked(folio)) {
   202			__lruvec_stat_mod_folio(folio, NR_SHMEM, -nr);
   203			if (folio_test_pmd_mappable(folio))
   204				__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
   205		} else if (folio_test_pmd_mappable(folio)) {
   206			__lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
   207			filemap_nr_thps_dec(mapping);
   208		}
 > 209		filemap_mod_uncharged_vmstat(folio, -1);
   210	
   211		/*
   212		 * At this point folio must be either written or cleaned by
   213		 * truncate.  Dirty folio here signals a bug and loss of
   214		 * unwritten data - on ordinary filesystems.
   215		 *
   216		 * But it's harmless on in-memory filesystems like tmpfs; and can
   217		 * occur when a driver which did get_user_pages() sets page dirty
   218		 * before putting it, while the inode is being finally evicted.
   219		 *
   220		 * Below fixes dirty accounting after removing the folio entirely
   221		 * but leaves the dirty flag set: it has no effect for truncated
   222		 * folio and anyway will be cleared before returning folio to
   223		 * buddy allocator.
   224		 */
   225		if (WARN_ON_ONCE(folio_test_dirty(folio) &&
   226				 mapping_can_writeback(mapping)))
   227			folio_account_cleaned(folio, inode_to_wb(mapping->host));
   228	}
   229	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] mm: add vmstat for cgroup uncharged pages
  2025-08-06  0:11 ` [PATCH 3/3] mm: add vmstat for cgroup uncharged pages Boris Burkov
  2025-08-06 20:40   ` kernel test robot
@ 2025-08-06 20:51   ` kernel test robot
  2025-08-07 17:23   ` Shakeel Butt
  2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2025-08-06 20:51 UTC (permalink / raw)
  To: Boris Burkov, linux-btrfs, linux-mm, linux-fsdevel, kernel-team
  Cc: llvm, oe-kbuild-all, shakeel.butt, hch, wqu

Hi Boris,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on v6.16]
[cannot apply to akpm-mm/mm-everything linus/master next-20250806]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Boris-Burkov/mm-filemap-add-filemap_add_folio_nocharge/20250806-130147
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/eae30d630ba07de8966d09a3e1700f53715980c2.1754438418.git.boris%40bur.io
patch subject: [PATCH 3/3] mm: add vmstat for cgroup uncharged pages
config: i386-buildonly-randconfig-002-20250807 (https://download.01.org/0day-ci/archive/20250807/202508070450.bjCshoYI-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250807/202508070450.bjCshoYI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508070450.bjCshoYI-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/filemap.c:209:2: error: call to undeclared function 'filemap_mod_uncharged_vmstat'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     209 |         filemap_mod_uncharged_vmstat(folio, -1);
         |         ^
   mm/filemap.c:209:2: note: did you mean 'filemap_mod_uncharged_cgroup_vmstat'?
   mm/filemap.c:158:13: note: 'filemap_mod_uncharged_cgroup_vmstat' declared here
     158 | static void filemap_mod_uncharged_cgroup_vmstat(struct folio *folio, int sign)
         |             ^
   mm/filemap.c:998:3: error: call to undeclared function 'filemap_mod_uncharged_vmstat'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     998 |                 filemap_mod_uncharged_vmstat(folio, 1);
         |                 ^
   2 errors generated.


vim +/filemap_mod_uncharged_vmstat +209 mm/filemap.c

   163	
   164	
   165	static void filemap_unaccount_folio(struct address_space *mapping,
   166			struct folio *folio)
   167	{
   168		long nr;
   169	
   170		VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
   171		if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) {
   172			pr_alert("BUG: Bad page cache in process %s  pfn:%05lx\n",
   173				 current->comm, folio_pfn(folio));
   174			dump_page(&folio->page, "still mapped when deleted");
   175			dump_stack();
   176			add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
   177	
   178			if (mapping_exiting(mapping) && !folio_test_large(folio)) {
   179				int mapcount = folio_mapcount(folio);
   180	
   181				if (folio_ref_count(folio) >= mapcount + 2) {
   182					/*
   183					 * All vmas have already been torn down, so it's
   184					 * a good bet that actually the page is unmapped
   185					 * and we'd rather not leak it: if we're wrong,
   186					 * another bad page check should catch it later.
   187					 */
   188					atomic_set(&folio->_mapcount, -1);
   189					folio_ref_sub(folio, mapcount);
   190				}
   191			}
   192		}
   193	
   194		/* hugetlb folios do not participate in page cache accounting. */
   195		if (folio_test_hugetlb(folio))
   196			return;
   197	
   198		nr = folio_nr_pages(folio);
   199	
   200		__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
   201		if (folio_test_swapbacked(folio)) {
   202			__lruvec_stat_mod_folio(folio, NR_SHMEM, -nr);
   203			if (folio_test_pmd_mappable(folio))
   204				__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
   205		} else if (folio_test_pmd_mappable(folio)) {
   206			__lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
   207			filemap_nr_thps_dec(mapping);
   208		}
 > 209		filemap_mod_uncharged_vmstat(folio, -1);
   210	
   211		/*
   212		 * At this point folio must be either written or cleaned by
   213		 * truncate.  Dirty folio here signals a bug and loss of
   214		 * unwritten data - on ordinary filesystems.
   215		 *
   216		 * But it's harmless on in-memory filesystems like tmpfs; and can
   217		 * occur when a driver which did get_user_pages() sets page dirty
   218		 * before putting it, while the inode is being finally evicted.
   219		 *
   220		 * Below fixes dirty accounting after removing the folio entirely
   221		 * but leaves the dirty flag set: it has no effect for truncated
   222		 * folio and anyway will be cleared before returning folio to
   223		 * buddy allocator.
   224		 */
   225		if (WARN_ON_ONCE(folio_test_dirty(folio) &&
   226				 mapping_can_writeback(mapping)))
   227			folio_account_cleaned(folio, inode_to_wb(mapping->host));
   228	}
   229	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] filemap_add_folio_nocharge()
  2025-08-06 14:00 ` [PATCH 0/3] filemap_add_folio_nocharge() Matthew Wilcox
@ 2025-08-06 23:19   ` Shakeel Butt
  0 siblings, 0 replies; 9+ messages in thread
From: Shakeel Butt @ 2025-08-06 23:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Boris Burkov, linux-btrfs, linux-mm, linux-fsdevel, kernel-team,
	hch, wqu, hannes, mhocko, roman.gushchin, muchun.song, cgroups

CCing memcg maintainers.

On Wed, Aug 06, 2025 at 03:00:43PM +0100, Matthew Wilcox wrote:
> On Tue, Aug 05, 2025 at 05:11:46PM -0700, Boris Burkov wrote:
> > I would like to revisit Qu's proposal to not charge btrfs extent_buffer
> > allocations to the user's cgroup.
> > 
> > https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
> 
> I prefer Qu's suggestion to add a flag to the address_space.  This really
> is a property of the address_space, not a property of the call-site.
> 

I think Michal wanted call-site (explicit interface) for easy
searchability. I don't have a strong opinion either way. However I
wonder if having this information in address_space might be more useful.
At the moment, this series is using !folio_memcg(folio) to detect if the
folio has skipped memcg charging on the free path to decrement the stat.
Having information in address_space would be another way to extract that
information. I need to think more on this.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] mm: add vmstat for cgroup uncharged pages
  2025-08-06  0:11 ` [PATCH 3/3] mm: add vmstat for cgroup uncharged pages Boris Burkov
  2025-08-06 20:40   ` kernel test robot
  2025-08-06 20:51   ` kernel test robot
@ 2025-08-07 17:23   ` Shakeel Butt
  2 siblings, 0 replies; 9+ messages in thread
From: Shakeel Butt @ 2025-08-07 17:23 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, linux-mm, linux-fsdevel, kernel-team, hch, wqu

On Tue, Aug 05, 2025 at 05:11:49PM -0700, Boris Burkov wrote:
> If cgroups are configured into the kernel, then uncharged pages can only
> come from filemap_add_folio_nocharge. Track such uncharged folios in
> vmstat so that they are accounted for.
> 
> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  include/linux/mmzone.h |  3 +++
>  mm/filemap.c           | 18 ++++++++++++++++++
>  mm/vmstat.c            |  3 +++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 283913d42d7b..a945dec65371 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -241,6 +241,9 @@ enum node_stat_item {
>  	NR_HUGETLB,
>  #endif
>  	NR_BALLOON_PAGES,
> +#ifdef CONFIG_MEMCG
> +	NR_UNCHARGED_FILE_PAGES,

> +#endif
>  	NR_VM_NODE_STAT_ITEMS
>  };
>  
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ccc9cfb4d418..0a258b4a9246 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -146,6 +146,22 @@ static void page_cache_delete(struct address_space *mapping,
>  	mapping->nrpages -= nr;
>  }
>  
> +#ifdef CONFIG_MEMCG
> +static void filemap_mod_uncharged_vmstat(struct folio *folio, int sign)
> +{
> +	long nr = folio_nr_pages(folio) * sign;
> +
> +	if (!folio_memcg(folio))
> +		__lruvec_stat_mod_folio(folio, NR_UNCHARGED_FILE_PAGES, nr);

From filemap_add_folio_nocharge(), this function is called without
preemption disabled, so you will get lockdep warning from following
chain:

__lruvec_stat_mod_folio -> __mod_node_page_state ->
preempt_disable_nested -> lockdep_assert_preemption_disabled().


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-08-07 17:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-06  0:11 [PATCH 0/3] filemap_add_folio_nocharge() Boris Burkov
2025-08-06  0:11 ` [PATCH 1/3] mm/filemap: add filemap_add_folio_nocharge() Boris Burkov
2025-08-06  0:11 ` [PATCH 2/3] btrfs: use filemap_add_folio_nocharge() for extent_buffers Boris Burkov
2025-08-06  0:11 ` [PATCH 3/3] mm: add vmstat for cgroup uncharged pages Boris Burkov
2025-08-06 20:40   ` kernel test robot
2025-08-06 20:51   ` kernel test robot
2025-08-07 17:23   ` Shakeel Butt
2025-08-06 14:00 ` [PATCH 0/3] filemap_add_folio_nocharge() Matthew Wilcox
2025-08-06 23:19   ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).