From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx153.postini.com [74.125.245.153])
    by kanga.kvack.org (Postfix) with SMTP id CAC136B006E
    for ; Tue, 20 Nov 2012 14:04:48 -0500 (EST)
Date: Tue, 20 Nov 2012 14:04:41 -0500
From: Johannes Weiner
Subject: kswapd endless loop for compaction
Message-ID: <20121120190440.GA24381@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
List-ID:
To: Rik van Riel, Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

Hi guys,

while testing a 3.7-rc5ish kernel, I noticed that kswapd can drop into
a busy spin state without doing reclaim.  printk-style debugging told
me that this happens when the distance between a zone's high watermark
and its low watermark is less than two huge pages (DMA zone).

1. The first loop in balance_pgdat() over the zones finds all zones to
   be above their high watermark and only does goto out (all_zones_ok).

2. pgdat_balanced() at the out: label also just checks the high
   watermark, so the node is considered balanced and the order is not
   reduced.

3. In the `if (order)' block after it, compaction_suitable() checks if
   the zone's low watermark + twice the huge page size is okay, which
   it's not necessarily in a small zone, and so COMPACT_SKIPPED makes
   it go back to loop_again:.

This will go on until somebody else allocates and breaches the high
watermark and then hopefully goes on to reclaim the zone above low
watermark + 2 * THP.

I'm not really sure what the correct solution is.  Should we modify
the zone_watermark_ok() checks in balance_pgdat() to take into account
the higher watermark requirements for reclaim on behalf of compaction?
Change the check in compaction_suitable() / not use it directly?

Thanks,
Johannes

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx105.postini.com [74.125.245.105])
    by kanga.kvack.org (Postfix) with SMTP id 7F7316B0062
    for ; Wed, 21 Nov 2012 17:01:37 -0500 (EST)
Date: Wed, 21 Nov 2012 17:01:26 -0500
From: Johannes Weiner
Subject: Re: kswapd endless loop for compaction
Message-ID: <20121121220126.GA2301@cmpxchg.org>
References: <20121120190440.GA24381@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121120190440.GA24381@cmpxchg.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Rik van Riel, Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

Just to be clear, this is not fixed by Dave's patch to NR_FREE_PAGES
accounting.  I can still get 3.7-rc5 + Dave's fix to drop into an
endless loop in kswapd within a couple of minutes on my test box.

As described below, the bug comes from contradictory conditions in
balance_pgdat(), not an accounting problem.

On Tue, Nov 20, 2012 at 02:04:41PM -0500, Johannes Weiner wrote:
> Hi guys,
>
> while testing a 3.7-rc5ish kernel, I noticed that kswapd can drop into
> a busy spin state without doing reclaim.  printk-style debugging told
> me that this happens when the distance between a zone's high watermark
> and its low watermark is less than two huge pages (DMA zone).
>
> 1. The first loop in balance_pgdat() over the zones finds all zones to
>    be above their high watermark and only does goto out (all_zones_ok).
>
> 2. pgdat_balanced() at the out: label also just checks the high
>    watermark, so the node is considered balanced and the order is not
>    reduced.
>
> 3. In the `if (order)' block after it, compaction_suitable() checks if
>    the zone's low watermark + twice the huge page size is okay, which
>    it's not necessarily in a small zone, and so COMPACT_SKIPPED makes
>    it go back to loop_again:.
>
> This will go on until somebody else allocates and breaches the high
> watermark and then hopefully goes on to reclaim the zone above low
> watermark + 2 * THP.
>
> I'm not really sure what the correct solution is.  Should we modify
> the zone_watermark_ok() checks in balance_pgdat() to take into account
> the higher watermark requirements for reclaim on behalf of compaction?
> Change the check in compaction_suitable() / not use it directly?
>
> Thanks,
> Johannes

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx140.postini.com [74.125.245.140])
    by kanga.kvack.org (Postfix) with SMTP id B98366B0072
    for ; Thu, 22 Nov 2012 09:42:23 -0500 (EST)
Received: by mail-pb0-f41.google.com with SMTP id xa7so6361299pbc.14
    for ; Thu, 22 Nov 2012 06:42:23 -0800 (PST)
Message-ID: <50AE397A.8080000@gmail.com>
Date: Thu, 22 Nov 2012 22:40:58 +0800
From: Jaegeuk Hanse
MIME-Version: 1.0
Subject: Re: kswapd endless loop for compaction
References: <20121120190440.GA24381@cmpxchg.org>
In-Reply-To: <20121120190440.GA24381@cmpxchg.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Johannes Weiner
Cc: Rik van Riel, Mel Gorman, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On 11/21/2012 03:04 AM, Johannes Weiner wrote:
> Hi guys,
>
> while testing a 3.7-rc5ish kernel, I noticed that kswapd can drop into
> a busy spin state without doing reclaim.
> printk-style debugging told
> me that this happens when the distance between a zone's high watermark
> and its low watermark is less than two huge pages (DMA zone).
>
> 1. The first loop in balance_pgdat() over the zones finds all zones to
>    be above their high watermark and only does goto out (all_zones_ok).
>
> 2. pgdat_balanced() at the out: label also just checks the high
>    watermark, so the node is considered balanced and the order is not
>    reduced.
>
> 3. In the `if (order)' block after it, compaction_suitable() checks if
>    the zone's low watermark + twice the huge page size is okay, which
>    it's not necessarily in a small zone, and so COMPACT_SKIPPED makes
>    it go back to loop_again:.
>
> This will go on until somebody else allocates and breaches the high
> watermark and then hopefully goes on to reclaim the zone above low
> watermark + 2 * THP.
>
> I'm not really sure what the correct solution is.  Should we modify
> the zone_watermark_ok() checks in balance_pgdat() to take into account
> the higher watermark requirements for reclaim on behalf of compaction?
> Change the check in compaction_suitable() / not use it directly?

Hi Johannes,

- If all zones meet their high watermark and we goto out, why do we
  then enter the `if (order)' block?
- If we depend on compaction to get enough contiguous pages, why can't

      if (COMPACTION_BUILD && order &&
          compaction_suitable(zone, order) != COMPACT_SKIPPED)
              testorder = 0;

  guarantee that low watermark + twice the huge page size is okay?

Regards,
Jaegeuk

> Thanks,
> Johannes

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx119.postini.com [74.125.245.119])
    by kanga.kvack.org (Postfix) with SMTP id 5F44D6B006C
    for ; Fri, 23 Nov 2012 03:50:33 -0500 (EST)
Received: by mail-ie0-f169.google.com with SMTP id 10so16181015ied.14
    for ; Fri, 23 Nov 2012 00:50:32 -0800 (PST)
Message-ID: <50AF38D2.6090106@gmail.com>
Date: Fri, 23 Nov 2012 16:50:26 +0800
From: Jaegeuk Hanse
MIME-Version: 1.0
Subject: Re: kswapd endless loop for compaction
References: <20121120190440.GA24381@cmpxchg.org>
In-Reply-To: <20121120190440.GA24381@cmpxchg.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Johannes Weiner
Cc: Rik van Riel, Mel Gorman, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On 11/21/2012 03:04 AM, Johannes Weiner wrote:
> Hi guys,
>
> while testing a 3.7-rc5ish kernel, I noticed that kswapd can drop into
> a busy spin state without doing reclaim.  printk-style debugging told
> me that this happens when the distance between a zone's high watermark
> and its low watermark is less than two huge pages (DMA zone).
>
> 1. The first loop in balance_pgdat() over the zones finds all zones to
>    be above their high watermark and only does goto out (all_zones_ok).
>
> 2. pgdat_balanced() at the out: label also just checks the high
>    watermark, so the node is considered balanced and the order is not
>    reduced.
>
> 3. In the `if (order)' block after it, compaction_suitable() checks if
>    the zone's low watermark + twice the huge page size is okay, which
>    it's not necessarily in a small zone, and so COMPACT_SKIPPED makes
>    it go back to loop_again:.
>
> This will go on until somebody else allocates and breaches the high
> watermark and then hopefully goes on to reclaim the zone above low
> watermark + 2 * THP.
>
> I'm not really sure what the correct solution is.
> Should we modify
> the zone_watermark_ok() checks in balance_pgdat() to take into account
> the higher watermark requirements for reclaim on behalf of compaction?
> Change the check in compaction_suitable() / not use it directly?

Hi Johannes,

If we depend on compaction to get enough contiguous pages, why can't

    if (COMPACTION_BUILD && order &&
        compaction_suitable(zone, order) != COMPACT_SKIPPED)
            testorder = 0;

guarantee that low watermark + twice the huge page size is okay?

Regards,
Jaegeuk

> Thanks,
> Johannes
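
The three steps in the original report can be modelled in user space.
What follows is only a minimal sketch under stated assumptions, not the
kernel code: struct zone_model and all three helper names are
hypothetical, the watermark test is reduced to a plain free-page
comparison (no lowmem reserve, no per-order free lists), and a huge
page is assumed to be order 9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct zone: page counts only. */
struct zone_model {
    unsigned long free_pages;
    unsigned long low_wmark;
    unsigned long high_wmark;
};

/* Steps 1 and 2: both checks only compare against the high watermark. */
static bool zone_above_high_wmark(const struct zone_model *z)
{
    assert(z != NULL);
    return z->free_pages > z->high_wmark;
}

/* Step 3: compaction wants low watermark + twice the allocation size. */
static bool compaction_has_headroom(const struct zone_model *z,
                                    unsigned int order)
{
    assert(z != NULL);
    return z->free_pages > z->low_wmark + (2UL << order);
}

/*
 * True when the zone looks balanced (so nothing is reclaimed) while the
 * compaction check still fails (so the loop restarts): the busy spin.
 */
static bool kswapd_would_spin(const struct zone_model *z, unsigned int order)
{
    if (order == 0)
        return false; /* the `if (order)' block is skipped entirely */
    return zone_above_high_wmark(z) && !compaction_has_headroom(z, order);
}
```

With made-up numbers, a zone with 100 free pages, a low watermark of 40
and a high watermark of 60 is reported as spinning at order 9, because
it is above the high watermark but far below 40 + (2 << 9) = 1064 pages.
A zone with 13000 free pages, low 10000 and high 12000 passes both
checks and is not.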