From: Dennis Zhou <dennis@kernel.org>
To: Roman Gushchin <guro@fb.com>
Cc: Jan Kara <jack@suse.cz>, Tejun Heo <tj@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Alexander Viro <viro@zeniv.linux.org.uk>,
Dave Chinner <dchinner@redhat.com>,
cgroups@vger.kernel.org
Subject: Re: [PATCH v7 0/6] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Date: Sat, 5 Jun 2021 21:37:37 +0000
Message-ID: <YLvuofB0xMuz/wz9@google.com>
In-Reply-To: <20210604013159.3126180-1-guro@fb.com>
Hello,

On Thu, Jun 03, 2021 at 06:31:53PM -0700, Roman Gushchin wrote:
> When an inode is getting dirty for the first time it's associated
> with a wb structure (see __inode_attach_wb()). It can later be
> switched to another wb (if e.g. some other cgroup is writing a lot of
> data to the same inode), but otherwise stays attached to the original
> wb until being reclaimed.
>
> The problem is that the wb structure holds a reference to the original
> memory and blkcg cgroups. So if an inode has been dirty once and later
> is actively used in read-only mode, it has a good chance to pin down
> the original memory and blkcg cgroups forever. This is often the case with
> services bringing data for other services, e.g. updating some rpm
> packages.
>
> In real life this becomes a problem because of the large size of the memcg
> structure, which can easily be 1000x larger than an inode. Also, a
> really large number of dying cgroups can cause various scalability
> issues, e.g. making memory reclaim costly and less effective.
>
> To solve the problem, inodes should eventually be detached from the
> corresponding writeback structure. It's inefficient to do it after
> every writeback completion. Instead it can be done whenever the
> original memory cgroup is offlined and the writeback structure is getting
> killed. Scanning over a (potentially long) list of inodes and detaching
> them from the writeback structure can take quite some time. To avoid
> scanning all inodes, attached inodes are kept on a new list (b_attached).
> To make it less noticeable to a user, the scanning and switching are performed
> from a work context.
>
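
For illustration only (this is not code from the series): a minimal userspace sketch of the idea described above, assuming a toy model in which each wb keeps a singly linked list of its attached inodes and cleanup moves them all to a root wb. The names toy_wb, toy_inode, toy_attach(), toy_switch_all() and toy_wb_unpinned() are hypothetical; the sketch leaves out the locking, RCU and work-context handling that the cover letter mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct bdi_writeback / struct inode. */
struct toy_inode;

struct toy_wb {
	const char *name;
	struct toy_inode *attached;	/* models the b_attached list */
};

struct toy_inode {
	int ino;
	struct toy_wb *wb;		/* models inode->i_wb */
	struct toy_inode *next;
};

/* Attach an inode that is not currently on any list to @wb. */
static void toy_attach(struct toy_inode *inode, struct toy_wb *wb)
{
	inode->wb = wb;
	inode->next = wb->attached;
	wb->attached = inode;
}

/*
 * Models the cleanup step: walk only the inodes attached to the dying wb
 * (not all inodes) and switch each one to the root wb.
 * Returns the number of inodes switched.
 */
static int toy_switch_all(struct toy_wb *dying, struct toy_wb *root)
{
	int switched = 0;

	assert(dying != root);
	while (dying->attached) {
		struct toy_inode *inode = dying->attached;

		dying->attached = inode->next;	/* unlink from the dying wb */
		toy_attach(inode, root);	/* re-home on the root wb */
		switched++;
	}
	return switched;
}

/* In this model a wb with no attached inodes no longer pins its cgroup. */
static int toy_wb_unpinned(const struct toy_wb *wb)
{
	return wb->attached == NULL;
}
```

In this model, once toy_switch_all() returns, the dying wb has an empty list and nothing keeps it (or the cgroups it references) alive.
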
> Big thanks to Jan Kara, Dennis Zhou and Hillf Danton for their ideas and
> contribution to this patchset.
>
> v7:
> - shared locking for multiple inode switching
> - introduced inode_prepare_wbs_switch() helper
> - extended the pre-switch inode check for I_WILL_FREE
> - added comments here and there
>
> v6:
> - extended and reused wbs switching functionality to switch inodes
>   on cgwb cleanup
> - fixed offline_list handling
> - switched to the unbound_wq
> - other minor fixes
>
> v5:
> - switch inodes to bdi->wb instead of zeroing inode->i_wb
> - split the single patch into two
> - only cgwbs maintain lists of attached inodes
> - added cond_resched()
> - fixed !CONFIG_CGROUP_WRITEBACK handling
> - extended the list of prohibited inode flags
> - other small fixes
>
>
> Roman Gushchin (6):
>   writeback, cgroup: do not switch inodes with I_WILL_FREE flag
>   writeback, cgroup: switch to rcu_work API in inode_switch_wbs()
>   writeback, cgroup: keep list of inodes attached to bdi_writeback
>   writeback, cgroup: split out the functional part of
>     inode_switch_wbs_work_fn()
>   writeback, cgroup: support switching multiple inodes at once
>   writeback, cgroup: release dying cgwbs by switching attached inodes
>
>  fs/fs-writeback.c                | 302 +++++++++++++++++++++----------
>  include/linux/backing-dev-defs.h |  20 +-
>  include/linux/writeback.h        |   1 +
>  mm/backing-dev.c                 |  69 ++++++-
>  4 files changed, 293 insertions(+), 99 deletions(-)
>
> --
> 2.31.1
>
I too am a bit late to the party. Feel free to add mine as well to the
series.

Acked-by: Dennis Zhou <dennis@kernel.org>

I left my one comment on the last patch regarding a possible future
extension.

Thanks,
Dennis