From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111]) by oss.sgi.com (Postfix) with ESMTP id 6BF617F37 for ; Fri, 20 Mar 2015 04:56:19 -0500 (CDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by relay1.corp.sgi.com (Postfix) with ESMTP id 378A98F8078 for ; Fri, 20 Mar 2015 02:56:16 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de [195.135.220.15]) by cuda.sgi.com with ESMTP id VdTH4g451mYqKOC6 (version=TLSv1 cipher=AES256-SHA bits=256 verify=NO) for ; Fri, 20 Mar 2015 02:56:14 -0700 (PDT)
Date: Fri, 20 Mar 2015 09:56:06 +0000
From: Mel Gorman
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Message-ID: <20150320095606.GE3087@suse.de>
References: <20150317070655.GB10105@dastard> <20150317205104.GA28621@dastard> <20150317220840.GC28621@dastard> <20150319224143.GI10105@dastard>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
List-Id: XFS Filesystem from SGI
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Linus Torvalds
Cc: Linux Kernel Mailing List, xfs@oss.sgi.com, Linux-MM, Aneesh Kumar, Andrew Morton, ppc-dev, Ingo Molnar

On Thu, Mar 19, 2015 at 04:05:46PM -0700, Linus Torvalds wrote:
> On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner wrote:
> >
> > My recollection wasn't faulty - I pulled it from an earlier email.
> > That said, the original measurement might have been faulty. I ran
> > the numbers again on the 3.19 kernel I saved away from the original
> > testing. That came up at 235k, which is pretty much the same as
> > yesterday's test. The runtime, however, is unchanged from my original
> > measurements of 4m54s (pte_hack came in at 5m20s).
>
> Ok. Good.
> So the "more than an order of magnitude difference" was
> really about measurement differences, not quite as real. Looks like
> more a "factor of two" than a factor of 20.
>
> Did you do the profiles the same way? Because that would explain the
> differences in the TLB flush percentages too (the "1.4% from
> tlb_invalidate_range()" vs "pretty much everything from migration").
>
> The runtime variation does show that there's some *big* subtle
> difference for the numa balancing in the exact TNF_NO_GROUP details.

TNF_NO_GROUP affects whether the scheduler tries to group related
processes together. Whether migration occurs depends on what node a
process is scheduled on. If processes are aggressively grouped
inappropriately then it is possible there is a bug that causes the load
balancer to move processes off a node (possible migration) with NUMA
balancing trying to pull it back (another possible migration). Small
bugs there can result in excessive migration.

-- 
Mel Gorman
SUSE Labs
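
To make the suspected ping-pong concrete, here is a toy two-node model of the interaction described above. It is only an illustrative sketch, not kernel code: the function name, the per-node capacity and the "one move per tick" rule are all assumptions made up for this example, and the real load balancer and NUMA balancing logic are far more involved.

```python
def simulate(tasks_preferring_node0, capacity_per_node, ticks):
    """Count task moves in a hypothetical two-node model.

    Assumptions (not taken from the kernel):
    - every task prefers node 0, e.g. because grouping packed them there;
    - each tick, a stand-in "load balancer" pushes one task off node 0
      if node 0 runs more than capacity_per_node tasks;
    - each tick, a stand-in "NUMA balancing" step pulls one displaced
      task back to node 0, its preferred node.
    """
    on_node0 = tasks_preferring_node0  # all tasks start on their preferred node
    on_node1 = 0
    moves = 0
    for _ in range(ticks):
        # Load balancer: relieve the overloaded node.
        if on_node0 > capacity_per_node:
            on_node0 -= 1
            on_node1 += 1
            moves += 1
        # NUMA balancing: pull a displaced task back where it "belongs".
        if on_node1 > 0:
            on_node1 -= 1
            on_node0 += 1
            moves += 1
    return moves
```

In this model, simulate(5, 4, 10) counts 20 moves because one task bounces between the nodes on every tick, while simulate(4, 4, 10) counts none. The point is only that a small disagreement between the two mechanisms is enough to produce a steady stream of migrations.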