From mboxrd@z Thu Jan 1 00:00:00 1970
From: Waiman Long
Subject: Re: [LSF/MM/BPF TOPIC] Reducing zombie memcgs
Date: Tue, 25 Apr 2023 14:42:41 -0400
Message-ID: <27e15be8-d0eb-ed32-a0ec-5ec9b59f1f27@redhat.com>
References:
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1682448173;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=gVdh3EskhXvSNgkkcnc90XfD4jl5qJEnnEfhoKtnoTY=;
 b=Bp1B/HqRkNxqgs7x1H0vIpg1qHP22vA6kYFqoT3HGRj+7YFuae1SO7F8gMa8Y+NCMGZa5n
 NTqK9rCSeFMspub0CIy6E/14RBkkxRscWU4XGiTWoTEBhiyksNVXZ1X0zyut8tuZJpHFt9
 1HsI4OSGoNsguhzq/I7WGJ1dLubsEAI=
Content-Language: en-US
In-Reply-To:
List-ID:
Content-Type: text/plain; charset="utf-8"; format="flowed"
To: Yosry Ahmed, "T.J. Mercier"
Cc: lsf-pc-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Tejun Heo, Shakeel Butt, Muchun Song, Johannes Weiner, Roman Gushchin,
 Alistair Popple, Jason Gunthorpe, Kalesh Singh, Yu Zhao, Matthew Wilcox,
 David Rientjes, Greg Thelen

On 4/25/23 07:36, Yosry Ahmed wrote:
> +David Rientjes +Greg Thelen +Matthew Wilcox
>
> On Tue, Apr 11, 2023 at 4:48 PM Yosry Ahmed wrote:
>> On Tue, Apr 11, 2023 at 4:36 PM T.J. Mercier wrote:
>>> When a memcg is removed by userspace it gets offlined by the kernel.
>>> Offline memcgs are hidden from user space, but they still live in the
>>> kernel until their reference count drops to 0. New allocations cannot
>>> be charged to offline memcgs, but existing allocations charged to
>>> offline memcgs remain charged, and hold a reference to the memcg.
>>>
>>> As such, an offline memcg can remain in the kernel indefinitely,
>>> becoming a zombie memcg.
>>> The accumulation of a large number of zombie
>>> memcgs lead to increased system overhead (mainly percpu data in struct
>>> mem_cgroup). It also causes some kernel operations that scale with the
>>> number of memcgs to become less efficient (e.g. reclaim).
>>>
>>> There are currently out-of-tree solutions which attempt to
>>> periodically clean up zombie memcgs by reclaiming from them. However
>>> that is not effective for non-reclaimable memory, which it would be
>>> better to reparent or recharge to an online cgroup. There are also
>>> proposed changes that would benefit from recharging for shared
>>> resources like pinned pages, or DMA buffer pages.
>> I am very interested in attending this discussion, it's something that
>> I have been actively looking into -- specifically recharging pages of
>> offlined memcgs.
>>
>>> Suggested attendees:
>>> Yosry Ahmed
>>> Yu Zhao
>>> T.J. Mercier
>>> Tejun Heo
>>> Shakeel Butt
>>> Muchun Song
>>> Johannes Weiner
>>> Roman Gushchin
>>> Alistair Popple
>>> Jason Gunthorpe
>>> Kalesh Singh
> I was hoping I would bring a more complete idea to this thread, but
> here is what I have so far.
>
> The idea is to recharge the memory charged to memcgs when they are
> offlined. I like to think of the options we have to deal with memory
> charged to offline memcgs as a toolkit. This toolkit includes:
>
> (a) Evict memory.
>
> This is the simplest option, just evict the memory.
>
> For file-backed pages, this writes them back to their backing files,
> uncharging and freeing the page. The next access will read the page
> again and the faulting process’s memcg will be charged.
>
> For swap-backed pages (anon/shmem), this swaps them out. Swapping out
> a page charged to an offline memcg uncharges the page and charges the
> swap to its parent. The next access will swap in the page and the
> parent will be charged. This is effectively deferred recharging to the
> parent.
>
> Pros:
> - Simple.
>
> Cons:
> - Behavior is different for file-backed vs. swap-backed pages, for
>   swap-backed pages, the memory is recharged to the parent (aka
>   reparented), not charged to the "rightful" user.
> - Next access will incur higher latency, especially if the pages are active.
>
> (b) Direct recharge to the parent
>
> This can be done for any page and should be simple as the pages are
> already hierarchically charged to the parent.
>
> Pros:
> - Simple.
>
> Cons:
> - If a different memcg is using the memory, it will keep taxing the
>   parent indefinitely. Same not the "rightful" user argument.

Muchun had actually posted a patch to do this last year. See

https://lore.kernel.org/all/20220621125658.64935-10-songmuchun-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org/T/#me9dbbce85e2f3c4e5f34b97dbbdb5f79d77ce147

I am wondering if he is going to post an updated version of that or not.
Anyway, I am looking forward to learning about the result of this
discussion even though I am not a conference invitee.

Thanks,
Longman
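
For readers following the thread, options (a) and (b) quoted above can be
sketched as a toy model. This is only an illustration under stated
assumptions, not kernel code: the Memcg class and the evict() and
reparent() helpers are hypothetical names invented here, each page is
tracked only against the memcg it is charged to directly (the kernel's
hierarchical counters are not modeled), and "zombie" simply means
"offline with charges remaining".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not struct mem_cgroup)."""
    name: str
    parent: Optional["Memcg"] = None
    online: bool = True
    # Pages charged directly to this memcg: page id -> "file" or "anon".
    pages: Dict[str, str] = field(default_factory=dict)
    # Swap entries charged to this memcg.
    swap: Set[str] = field(default_factory=set)

    def is_zombie(self) -> bool:
        # Offline, but still pinned because charges remain.
        return not self.online and bool(self.pages)


def evict(memcg: Memcg) -> None:
    """Option (a): drop file pages; swap out anon pages, charging the
    swap entry to the parent."""
    if memcg.parent is None:
        raise ValueError("toy model needs a parent to charge swap to")
    for page, kind in list(memcg.pages.items()):
        del memcg.pages[page]            # page is uncharged and freed
        if kind == "anon":
            memcg.parent.swap.add(page)  # swap is charged to the parent


def reparent(memcg: Memcg) -> None:
    """Option (b): move every remaining charge straight to the parent."""
    if memcg.parent is None:
        raise ValueError("toy model needs a parent to recharge to")
    memcg.parent.pages.update(memcg.pages)
    memcg.pages.clear()


root = Memcg("root")
child = Memcg("child", parent=root, pages={"p1": "file", "p2": "anon"})
child.online = False       # userspace removed the cgroup
print(child.is_zombie())   # True: charges still pin the offline memcg
reparent(child)
print(child.is_zombie())   # False: nothing is charged to it any more
```

In both helpers the offline memcg ends up with no charges, which is what
would let its reference count drop to 0. The difference the quoted mail
points out shows up in where the charge lands afterwards: evict() leaves
file pages uncharged until the next access, while anon pages and
everything handled by reparent() end up billed to the parent.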