From: Johannes Weiner
Subject: Re: [PATCH v2 1/2] mm, memcg: reclaim more aggressively before high allocator throttling
Date: Tue, 14 Jul 2020 11:45:04 -0400
Message-ID: <20200714154504.GB215857@cmpxchg.org>
To: Chris Down
Cc: Andrew Morton, Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, kernel-team-b10kYP2dOMg@public.gmane.org

On Mon, Jul 13, 2020 at 12:42:35PM +0100, Chris Down wrote:
> In Facebook production, we've seen cases where cgroups have been put
> into allocator throttling even when they appear to have a lot of slack
> file caches which should be trivially reclaimable.
>
> Looking more closely, the problem is that we only try a single cgroup
> reclaim walk for each return to usermode before calculating whether or
> not we should throttle. This single attempt doesn't produce enough
> pressure to shrink for cgroups with a rapidly growing amount of file
> caches prior to entering allocator throttling.
>
> As an example, we see that threads in an affected cgroup are stuck in
> allocator throttling:
>
>     # for i in $(cat cgroup.threads); do
>     >     grep over_high "/proc/$i/stack"
>     > done
>     [<0>] mem_cgroup_handle_over_high+0x10b/0x150
>     [<0>] mem_cgroup_handle_over_high+0x10b/0x150
>     [<0>] mem_cgroup_handle_over_high+0x10b/0x150
>
> ...however, there is no I/O pressure reported by PSI, despite a lot of
> slack file pages:
>
>     # cat memory.pressure
>     some avg10=78.50 avg60=84.99 avg300=84.53 total=5702440903
>     full avg10=78.50 avg60=84.99 avg300=84.53 total=5702116959
>     # cat io.pressure
>     some avg10=0.00 avg60=0.00 avg300=0.00 total=78051391
>     full avg10=0.00 avg60=0.00 avg300=0.00 total=78049640
>     # grep _file memory.stat
>     inactive_file 1370939392
>     active_file 661635072
>
> This patch changes the behaviour to retry reclaim either until the
> current task goes below the 10ms grace period, or we are making no
> reclaim progress at all. In the latter case, we enter reclaim throttling
> as before.
>
> To a user, there's no intuitive reason for the reclaim behaviour to
> differ from hitting memory.high as part of a new allocation, as opposed
> to hitting memory.high because someone lowered its value. As such this
> also brings an added benefit: it unifies the reclaim behaviour between
> the two.
>
> There's precedent for this behaviour: we already do reclaim retries when
> writing to memory.{high,max}, in max reclaim, and in the page allocator
> itself.
>
> Signed-off-by: Chris Down
> Cc: Andrew Morton
> Cc: Johannes Weiner
> Cc: Tejun Heo
> Cc: Michal Hocko

Acked-by: Johannes Weiner