From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751489AbdCPIsS (ORCPT ); Thu, 16 Mar 2017 04:48:18 -0400
Received: from shells.gnugeneration.com ([66.240.222.126]:55108 "EHLO
	shells.gnugeneration.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751204AbdCPIqs (ORCPT );
	Thu, 16 Mar 2017 04:46:48 -0400
Date: Thu, 16 Mar 2017 01:47:33 -0700
From: lkml@pengaru.com
To: Michal Hocko
Cc: Gerhard Wiesinger, Minchan Kim, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Linus Torvalds
Subject: Re: Still OOM problems with 4.9er/4.10er kernels
Message-ID: <20170316084733.GP802@shells.gnugeneration.com>
References: <20170104091120.GD25453@dhcp22.suse.cz>
	<82bce413-1bd7-7f66-1c3d-0d890bbaf6f1@wiesinger.com>
	<20170227090236.GA2789@bbox>
	<20170227094448.GF14029@dhcp22.suse.cz>
	<20170228051723.GD2702@bbox>
	<20170228081223.GA26792@dhcp22.suse.cz>
	<20170302071721.GA32632@bbox>
	<20170316082714.GC30501@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170316082714.GC30501@dhcp22.suse.cz>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 16, 2017 at 09:27:14AM +0100, Michal Hocko wrote:
> On Thu 16-03-17 07:38:08, Gerhard Wiesinger wrote:
> [...]
> > The following commit is included in that version:
> > commit 710531320af876192d76b2c1f68190a1df941b02
> > Author: Michal Hocko
> > Date: Wed Feb 22 15:45:58 2017 -0800
> >
> > mm, vmscan: cleanup lru size claculations
> >
> > commit fd538803731e50367b7c59ce4ad3454426a3d671 upstream.
>
> This patch shouldn't make any difference. It is a cleanup patch.
> I guess you meant 71ab6cfe88dc ("mm, vmscan: consider eligible zones in
> get_scan_count") but even that one shouldn't make any difference for 64b
> systems.
>
> > But still OOMs:
> > [157048.030760] clamscan: page allocation stalls for 19405ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
>
> This is not OOM it is an allocation stall. The allocation request cannot
> simply make forward progress for more than 10s. This alone is bad but
> considering this is GFP_HIGHUSER_MOVABLE which has the full reclaim
> capabilities I would suspect your workload overcommits the available
> memory too much. You only have ~380MB of RAM with ~160MB sitting in the
> anonymous memory, almost nothing in the page cache so I am not wondering
> that you see a constant swap activity. There seems to be only 40M in the
> slab so we are still missing ~180MB which is neither on the LRU lists
> nor allocated by slab. This means that some kernel subsystem allocates
> from the page allocator directly.
>
> That being said, I believe that what you are seeing is not a bug in the
> MM subsystem but rather some subsystem using more memory than it used to
> before so your workload doesn't fit into the amount of memory you have
> anymore.
>

While on the topic of understanding allocation stalls, Philip Freeman
recently mailed linux-kernel with a similar report, and in his case
there are plenty of page cache pages. It was also a
GFP_HIGHUSER_MOVABLE order-0 allocation.

I'm no MM expert, but it appears a bit broken for such a low-order
allocation to stall on the order of 10 seconds when there are plenty of
reclaimable pages, in addition to mostly unused and abundant swap space
on SSD.

Regards,
Vito Caputo
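
For reference, the memory accounting in the quoted reply can be written
out as plain arithmetic. This is only a rough sketch using the rounded
figures quoted above (in MB); it assumes page cache is zero because the
reply describes it as "almost nothing", it ignores free memory, and the
function name is hypothetical, not anything from the kernel.

```python
def unaccounted_mb(total_mb, anon_mb, page_cache_mb, slab_mb):
    """Return memory (MB) that is neither on the LRU lists nor in slab.

    Simplification: free memory is ignored, as in the quoted estimate.
    """
    # Anonymous memory and page cache are the pages tracked on the LRU lists.
    lru_mb = anon_mb + page_cache_mb
    return total_mb - lru_mb - slab_mb


# Rounded figures from the quoted report; page cache assumed to be 0.
print(unaccounted_mb(total_mb=380, anon_mb=160, page_cache_mb=0, slab_mb=40))
```

With those inputs the remainder is the ~180MB that the reply attributes
to some subsystem allocating from the page allocator directly.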