From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx168.postini.com [74.125.245.168]) by kanga.kvack.org (Postfix) with SMTP id 242D96B0072 for ; Tue, 20 Nov 2012 15:25:57 -0500 (EST)
Received: from /spool/local by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 20 Nov 2012 13:25:56 -0700
Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com [9.17.195.107]) by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id B26221FF001B for ; Tue, 20 Nov 2012 13:25:49 -0700 (MST)
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay05.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id qAKKPexx242406 for ; Tue, 20 Nov 2012 13:25:40 -0700
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id qAKKPdtT009268 for ; Tue, 20 Nov 2012 13:25:39 -0700
Message-ID: <50ABE741.2020604@linux.vnet.ibm.com>
Date: Tue, 20 Nov 2012 12:25:37 -0800
From: Dave Hansen
MIME-Version: 1.0
Subject: [3.7-rc6] capture_free_page() frees page without accounting for them??
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org, Mel Gorman, Andrew Morton, LKML

Hi Mel,

I'm chasing an apparent memory leak introduced post-3.6. The
interesting thing is that it appears that the pages are in the
allocator, but not being accounted for:

	http://www.spinics.net/lists/linux-mm/msg46187.html
	https://bugzilla.kernel.org/show_bug.cgi?id=50181

I started auditing anything that might be messing with NR_FREE_PAGES,
and came across commit 1fb3f8ca. It does something curious with
capture_free_page() (previously known as split_free_page()).

int capture_free_page(struct page *page, int alloc_order,
...
	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));

-	/* Split into individual pages */
-	set_page_refcounted(page);
-	split_page(page, order);
+	if (alloc_order != order)
+		expand(zone, page, alloc_order, order,
+			&zone->free_area[order], migratetype);

Note that expand() puts the pages _back_ in the allocator, but it does
not bump NR_FREE_PAGES. We "return" 'alloc_order' worth of pages, but we
accounted for removing 'order'.

I _think_ the correct fix is to just:

-	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << alloc_order));

I'm trying to confirm the theory by making this happen a bit more often,
but I'd appreciate a second pair of eyes on the code in case I'm reading
it wrong.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id 5A1A46B0078 for ; Tue, 20 Nov 2012 19:49:06 -0500 (EST)
Received: from /spool/local by e7.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
	Violators will be prosecuted for from ; Tue, 20 Nov 2012 19:49:05 -0500
Received: from d01relay03.pok.ibm.com (d01relay03.pok.ibm.com [9.56.227.235]) by d01dlp02.pok.ibm.com (Postfix) with ESMTP id 87DF86E803A for ; Tue, 20 Nov 2012 19:48:56 -0500 (EST)
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay03.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id qAL0muJG313404 for ; Tue, 20 Nov 2012 19:48:56 -0500
Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id qAL0muaL031986 for ; Tue, 20 Nov 2012 22:48:56 -0200
Message-ID: <50AC24F5.9090303@linux.vnet.ibm.com>
Date: Tue, 20 Nov 2012 16:48:53 -0800
From: Dave Hansen
MIME-Version: 1.0
Subject: Re: [3.7-rc6] capture_free_page() frees page without accounting for them??
References: <50ABE741.2020604@linux.vnet.ibm.com>
In-Reply-To: <50ABE741.2020604@linux.vnet.ibm.com>
Content-Type: multipart/mixed; boundary="------------060201040006090003020504"
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org, Mel Gorman, Andrew Morton, LKML

This is a multi-part message in MIME format.
--------------060201040006090003020504
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

I'm really evil, so I changed the loop in compact_capture_page() to
basically steal the highest-order page it can. This shouldn't _break_
anything, but it does ensure that we'll be splitting pages that we find
more often and recreating this *MUCH* faster:

-	for (order = cc->order; order < MAX_ORDER; order++) {
+	for (order = MAX_ORDER - 1; order >= cc->order; order--)

I also augmented the area in capture_free_page() that I expect to be
leaking:

	if (alloc_order != order) {
		static int leaked_pages = 0;
		leaked_pages += 1 << [...]
		expand(zone, page, alloc_order, order,
			&zone->free_area[order], migratetype);
	}

I add up all the fields in buddyinfo to figure out how much _should_ be
in the allocator and then compare it to MemFree to get a guess at how
much is leaked.
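
The buddyinfo-versus-MemFree comparison described above can be sketched
in userspace C. This is only an illustration, not the script that was
actually used: the function names buddyinfo_line_pages() and
leaked_pages_estimate() are hypothetical, the line layout assumed is the
usual "Node N, zone NAME count0 count1 ..." form of /proc/buddyinfo, and
per-cpu counter drift is ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sum the free pages described by one /proc/buddyinfo line, e.g.
 *   "Node 0, zone   Normal      3      2      1"
 * Column i after the zone name is the number of free blocks of order i,
 * and each such block covers (1 << i) pages.
 * Returns -1 if the line does not start with the expected prefix.
 */
long buddyinfo_line_pages(const char *line)
{
	int node, consumed = 0;
	char zone[32];
	const char *p;
	char *end;
	long pages = 0;
	unsigned int order;

	if (sscanf(line, "Node %d, zone %31s%n", &node, zone, &consumed) != 2)
		return -1;

	p = line + consumed;
	for (order = 0; ; order++) {
		unsigned long count = strtoul(p, &end, 10);

		if (end == p)	/* no more numeric columns */
			break;
		pages += (long)(count << order);
		p = end;
	}
	return pages;
}

/*
 * Hypothetical helper: rough estimate of unaccounted pages, given the
 * buddyinfo total (in pages) and MemFree from /proc/meminfo (in kB).
 * A positive result means the free lists hold more than is reported.
 */
long leaked_pages_estimate(long buddy_pages, long memfree_kb, long page_size_kb)
{
	assert(page_size_kb > 0);
	return buddy_pages - memfree_kb / page_size_kb;
}
```

To use it, call buddyinfo_line_pages() on every line of /proc/buddyinfo,
add up the results, and pass the total to leaked_pages_estimate() along
with MemFree and the page size (4 kB on most x86 configurations).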
That number correlates _really_ well with the "leaked_pages" variable
above. That pretty much seals it for me. I'll run a stress test
overnight to see if it pops up again. The patch I'm running is
attached. I'll send a properly changelogged one tomorrow if it works.

--------------060201040006090003020504
Content-Type: text/x-patch; name="leak-fix-20121120-1.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="leak-fix-20121120-1.patch"

---

 linux-2.6.git-dave/mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/page_alloc.c~leak-fix-20121120-1 mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~leak-fix-20121120-1 2012-11-20 19:44:09.588966346 -0500
+++ linux-2.6.git-dave/mm/page_alloc.c 2012-11-20 19:44:21.993057915 -0500
@@ -1405,7 +1405,7 @@ int capture_free_page(struct page *page,

 	mt = get_pageblock_migratetype(page);
 	if (unlikely(mt != MIGRATE_ISOLATE))
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_freepage_state(zone, -(1UL << alloc_order), mt);

 	if (alloc_order != order)
 		expand(zone, page, alloc_order, order,
_

--------------060201040006090003020504--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx156.postini.com [74.125.245.156]) by kanga.kvack.org (Postfix) with SMTP id 5819E6B00AD for ; Wed, 21 Nov 2012 09:33:09 -0500 (EST)
Date: Wed, 21 Nov 2012 14:33:03 +0000
From: Mel Gorman
Subject: Re: [3.7-rc6] capture_free_page() frees page without accounting for them??
Message-ID: <20121121143303.GD8218@suse.de>
References: <50ABE741.2020604@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <50ABE741.2020604@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dave Hansen
Cc: linux-mm@kvack.org, Andrew Morton, LKML

On Tue, Nov 20, 2012 at 12:25:37PM -0800, Dave Hansen wrote:
> Hi Mel,
>
> I'm chasing an apparent memory leak introduced post-3.6.

An accounting leak could also contribute to the kswapd bugs we've been
seeing recently. Andrew, this is quite important and might be worth
wedging in before 3.7 comes out because it'll cause serious problems if
Dave is right.

> The
> interesting thing is that it appears that the pages are in the
> allocator, but not being accounted for:
>
> http://www.spinics.net/lists/linux-mm/msg46187.html
> https://bugzilla.kernel.org/show_bug.cgi?id=50181
>

Differences between the buddy allocator and reported free figures almost
always point to either per-cpu drift or NR_FREE_PAGES accounting bugs.
Usually the drift is not too bad and is always within a margin related
to the number of CPUs. NR_FREE_PAGES accounting bugs get progressively
worse until the machine starts OOM killing or locks up.

> I started auditing anything that might be messing with NR_FREE_PAGES,
> and came across commit 1fb3f8ca.

It could certainly affect NR_FREE_PAGES due to its manipulation of buddy
pages. It will not end happily, and the system would potentially need to
be running a long time before it's spotted.

> It does something curious with
> capture_free_page() (previously known as split_free_page()).
>
> int capture_free_page(struct page *page, int alloc_order,
> ...
> 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
>
> -	/* Split into individual pages */
> -	set_page_refcounted(page);
> -	split_page(page, order);
> +	if (alloc_order != order)
> +		expand(zone, page, alloc_order, order,
> +			&zone->free_area[order], migratetype);
>
> Note that expand() puts the pages _back_ in the allocator, but it does
> not bump NR_FREE_PAGES. We "return" 'alloc_order' worth of pages, but we
> accounted for removing 'order'.
>
> I _think_ the correct fix is to just:
>
> -	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> +	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << alloc_order));
>

This looks correct to me but it will collide with other patches. You'll
need something like the below. If it works for you, stick a changelog on
it, feel free to put my Acked-by on it and get it to Andrew ASAP because
I really think this needs to be in before 3.7 comes out or we'll be
swamped with a maze of kswapd-goes-mental bugs, all similar but with
different root causes.

Thanks a million Dave!

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd6a073..ad99f0f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1406,7 +1406,7 @@ int capture_free_page(struct page *page, int alloc_order, int migratetype)

 	mt = get_pageblock_migratetype(page);
 	if (unlikely(mt != MIGRATE_ISOLATE))
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_freepage_state(zone, -(1UL << alloc_order), mt);

 	if (alloc_order != order)
 		expand(zone, page, alloc_order, order,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux-mm.org/ .
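
The arithmetic behind the one-line fix discussed in this thread can be
checked with a small userspace model. This is a sketch under stated
assumptions, not kernel code: struct zone_model, capture_model() and
accounting_gap() are hypothetical names, the MIGRATE_ISOLATE special case
is left out, and the free lists are reduced to a single page count.

```c
#include <assert.h>

/* Hypothetical stand-in for the parts of struct zone that matter here. */
struct zone_model {
	long nr_free_pages;	/* models the NR_FREE_PAGES counter */
	long pages_on_lists;	/* pages really sitting on the free lists */
};

/*
 * Model of capturing a free block of size (1 << order) when only
 * (1 << alloc_order) pages are wanted. The remainder goes back onto
 * the free lists, which is what expand() does in the kernel.
 * fixed == 0 charges the counter for 'order' (behaviour before the fix),
 * fixed != 0 charges it for 'alloc_order' (behaviour after the fix).
 */
void capture_model(struct zone_model *z, unsigned int alloc_order,
		   unsigned int order, int fixed)
{
	long block = 1L << order;
	long kept = 1L << alloc_order;

	assert(alloc_order <= order);

	z->nr_free_pages -= fixed ? kept : block;

	z->pages_on_lists -= block;		/* block leaves the free list */
	z->pages_on_lists += block - kept;	/* expand() returns the rest */
}

/* Pages that are free but missing from the counter. */
long accounting_gap(const struct zone_model *z)
{
	return z->pages_on_lists - z->nr_free_pages;
}
```

With order 9 and alloc_order 0, the unfixed variant leaves 511 pages on
the free lists that the counter no longer knows about, while the fixed
variant leaves a gap of zero. Repeated captures add up, which matches the
observation that this kind of bug gets progressively worse over time.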