From: Roman Gushchin <roman.gushchin@linux.dev>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org, mhocko@kernel.org,
shakeel.butt@linux.dev, muchun.song@linux.dev,
cgroups@vger.kernel.org, linux-mm@kvack.org,
Michal Hocko <mhocko@suse.com>,
Vlastimil Babka <vbabka@kernel.org>
Subject: Re: [PATCH v7 2/3] btrfs: always uses root memcgroup for filemap_add_folio()
Date: Fri, 19 Jul 2024 17:25:09 +0000 [thread overview]
Message-ID: <ZpqhddM6-DnbfC7T@google.com> (raw)
In-Reply-To: <20240719170206.GA3242034@cmpxchg.org>
On Fri, Jul 19, 2024 at 01:02:06PM -0400, Johannes Weiner wrote:
> On Fri, Jul 19, 2024 at 07:58:40PM +0930, Qu Wenruo wrote:
> > [BACKGROUND]
> > The function filemap_add_folio() charges the memory cgroup,
> > as we assume all page caches are accessible by user space progresses
> > thus needs the cgroup accounting.
> >
> > However btrfs is a special case, it has a very large metadata thanks to
> > its support of data csum (by default it's 4 bytes per 4K data, and can
> > be as large as 32 bytes per 4K data).
> > This means btrfs has to go page cache for its metadata pages, to take
> > advantage of both cache and reclaim ability of filemap.
> >
> > This has a tiny problem, that all btrfs metadata pages have to go through
> > the memcgroup charge, even all those metadata pages are not
> > accessible by the user space, and doing the charging can introduce some
> > latency if there is a memory limits set.
> >
> > Btrfs currently uses __GFP_NOFAIL flag as a workaround for this cgroup
> > charge situation so that metadata pages won't really be limited by
> > memcgroup.
> >
> > [ENHANCEMENT]
> > Instead of relying on __GFP_NOFAIL to avoid charge failure, use root
> > memory cgroup to attach metadata pages.
> >
> > With root memory cgroup, we directly skip the charging part, and only
> > rely on __GFP_NOFAIL for the real memory allocation part.
> >
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> > fs/btrfs/extent_io.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index aa7f8148cd0d..cfeed7673009 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2971,6 +2971,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
> >
> > struct btrfs_fs_info *fs_info = eb->fs_info;
> > struct address_space *mapping = fs_info->btree_inode->i_mapping;
> > + struct mem_cgroup *old_memcg;
> > const unsigned long index = eb->start >> PAGE_SHIFT;
> > struct folio *existing_folio = NULL;
> > int ret;
> > @@ -2981,8 +2982,17 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
> > ASSERT(eb->folios[i]);
> >
> > retry:
> > + /*
> > + * Btree inode is a btrfs internal inode, and not exposed to any
> > + * user.
> > + * Furthermore we do not want any cgroup limits on this inode.
> > + * So we always use root_mem_cgroup as our active memcg when attaching
> > + * the folios.
> > + */
> > + old_memcg = set_active_memcg(root_mem_cgroup);
> > ret = filemap_add_folio(mapping, eb->folios[i], index + i,
> > GFP_NOFS | __GFP_NOFAIL);
> > + set_active_memcg(old_memcg);
>
> It looks correct. But it's going through all dance to set up
> current->active_memcg, then have the charge path look that up,
> css_get(), call try_charge() only to bail immediately, css_put(), then
> update current->active_memcg again. All those branches are necessary
> when we want to charge to a "real" other cgroup. But in this case, we
> always know we're not charging, so it seems uncalled for.
>
> Wouldn't it be a lot simpler (and cheaper) to have a
> filemap_add_folio_nocharge()?
Time to restore GFP_NOACCOUNT? I think it might be useful for allocating objects
which are shared across the entire system and/or unlikely will go away under
the memory pressure.
next prev parent reply other threads:[~2024-07-19 17:25 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-19 10:28 [PATCH v7 0/3] btrfs: try to allocate larger folios for metadata Qu Wenruo
2024-07-19 10:28 ` [PATCH v7 1/3] memcontrol: define root_mem_cgroup for CONFIG_MEMCG=n cases Qu Wenruo
2024-07-19 11:13 ` Michal Hocko
2024-07-19 21:58 ` Qu Wenruo
2024-07-22 7:32 ` Michal Hocko
2024-07-19 10:28 ` [PATCH v7 2/3] btrfs: always uses root memcgroup for filemap_add_folio() Qu Wenruo
2024-07-19 17:02 ` Johannes Weiner
2024-07-19 17:25 ` Roman Gushchin [this message]
2024-07-19 18:13 ` Michal Hocko
2024-07-19 22:11 ` Qu Wenruo
2024-07-22 7:34 ` Michal Hocko
2024-07-22 8:08 ` Qu Wenruo
2024-07-24 3:46 ` Qu Wenruo
2024-07-19 10:28 ` [PATCH v7 3/3] btrfs: prefer to allocate larger folio for metadata Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZpqhddM6-DnbfC7T@google.com \
--to=roman.gushchin@linux.dev \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=vbabka@kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.