From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754432AbbIROu4 (ORCPT ); Fri, 18 Sep 2015 10:50:56 -0400
Received: from mga09.intel.com ([134.134.136.24]:60903 "EHLO mga09.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754339AbbIROuy (ORCPT ); Fri, 18 Sep 2015 10:50:54 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.17,553,1437462000"; d="scan'208";a="647566764"
Subject: Re: 4.3-rc1 dirty page count underflow (cgroup-related?)
To: Greg Thelen
References: <55FB9319.2010000@intel.com>
Cc: Johannes Weiner , Michal Hocko , Tejun Heo , Jens Axboe ,
	Andrew Morton , Jan Kara ,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" ,
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" ,
	open list
From: Dave Hansen
Message-ID: <55FC24C2.8020501@intel.com>
Date: Fri, 18 Sep 2015 07:50:42 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: 
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/17/2015 11:09 PM, Greg Thelen wrote:
> I'm not denying the issue, but the WARNING splat isn't necessarily
> catching a problem.  The corresponding code comes from your debug patch:
> + WARN_ONCE(__this_cpu_read(memcg->stat->count[MEM_CGROUP_STAT_DIRTY]) > (1UL<<30), "MEM_CGROUP_STAT_DIRTY bogus");
>
> This only checks a single cpu's counter, which can be negative.  The sum
> of all counters is what matters.
> Imagine:
> cpu1) dirty page: inc
> cpu2) clean page: dec
> The sum is properly zero, but cpu2 is -1, which will trigger the WARN.
>
> I'll look at the code and also see if I can reproduce the failure using
> mem_cgroup_read_stat() for all of the new WARNs.

D'oh.
I'll replace those with the proper mem_cgroup_read_stat() and test with
your patch to see if anything still triggers.

> Did you notice if the global /proc/meminfo:Dirty count also underflowed?

It did not underflow.  It was one of the first things I looked at and it
looked fine, went down near 0 at 'sync', etc...
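
For illustration only, the per-cpu effect Greg describes can be modeled in
a small userspace sketch.  This is not kernel code: NR_CPUS, struct
pcpu_stat, stat_add(), stat_sum(), bogus_single_cpu() and bogus_sum() are
hypothetical names made up for this example, and the summing helper only
loosely stands in for what mem_cgroup_read_stat() is expected to do.  The
one thing taken from the thread is the comparison against (1UL<<30).

```c
#include <assert.h>

#define NR_CPUS     2            /* hypothetical: two cpus, as in the example */
#define BOGUS_LIMIT (1UL << 30)  /* threshold used by the debug WARN_ONCE */

/* One signed slot per cpu, standing in for memcg->stat->count[idx]. */
struct pcpu_stat {
	long count[NR_CPUS];
};

/* A cpu only ever updates its own slot, so a single slot may go negative. */
void stat_add(struct pcpu_stat *s, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	s->count[cpu] += delta;
}

/* Only the sum over all cpus is a meaningful value. */
long stat_sum(const struct pcpu_stat *s)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += s->count[cpu];
	return sum;
}

/*
 * Shape of the check in the debug patch: one cpu's slot compared against
 * an unsigned limit.  A slot holding -1 converts to a huge unsigned value
 * and trips the check even though nothing is wrong.
 */
int bogus_single_cpu(const struct pcpu_stat *s, int cpu)
{
	return (unsigned long)s->count[cpu] > BOGUS_LIMIT;
}

/* Same check applied to the sum, which is what the WARNs should look at. */
int bogus_sum(const struct pcpu_stat *s)
{
	return (unsigned long)stat_sum(s) > BOGUS_LIMIT;
}
```

Replaying the quoted scenario (stat_add(&s, 0, 1) for the dirtying cpu,
stat_add(&s, 1, -1) for the cleaning cpu) leaves stat_sum() at 0, while
bogus_single_cpu(&s, 1) fires and bogus_sum(&s) does not.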