From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from relay.sgi.com (relay1.corp.sgi.com [137.38.102.111])
	by oss.sgi.com (Postfix) with ESMTP id AEF637F5D;
	Sun, 8 Mar 2015 05:02:35 -0500 (CDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by relay1.corp.sgi.com (Postfix) with ESMTP id 805EE8F8037;
	Sun, 8 Mar 2015 03:02:32 -0700 (PDT)
Received: from mail-wg0-f51.google.com (mail-wg0-f51.google.com [74.125.82.51])
	by cuda.sgi.com with ESMTP id HojKqSC5Aalmf3wj
	(version=TLSv1 cipher=RC4-SHA bits=128 verify=NO);
	Sun, 08 Mar 2015 03:02:30 -0700 (PDT)
Received: by wggy19 with SMTP id y19so4906682wgg.9;
	Sun, 08 Mar 2015 03:02:29 -0700 (PDT)
Date: Sun, 8 Mar 2015 11:02:23 +0100
From: Ingo Molnar
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Message-ID: <20150308100223.GC15487@gmail.com>
References: <1425741651-29152-1-git-send-email-mgorman@suse.de>
	<1425741651-29152-5-git-send-email-mgorman@suse.de>
	<20150307163657.GA9702@gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
List-Id: XFS Filesystem from SGI
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Linus Torvalds
Cc: Linux Kernel Mailing List, xfs@oss.sgi.com, Linux-MM, Mel Gorman,
	Andrew Morton, ppc-dev, Aneesh Kumar

* Linus Torvalds wrote:

> On Sat, Mar 7, 2015 at 8:36 AM, Ingo Molnar wrote:
> >
> > And the patch Dave bisected to is a relatively simple patch. Why
> > not simply revert it to see whether that cures much of the
> > problem?
>
> So the problem with that is that "pmd_set_numa()" and friends simply
> no longer exist. So we can't just revert that one patch, it's the
> whole series, and the whole point of the series.

Yeah.

> What confuses me is that the only real change that I can see in that
> patch is the change to "change_huge_pmd()". Everything else is
> pretty much a 100% equivalent transformation, afaik. Of course, I
> may be wrong about that, and missing something silly.

Well, there's a difference in what we write to the pte:

 #define _PAGE_BIT_NUMA          (_PAGE_BIT_GLOBAL+1)
 #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL

and our expectation was that the two should be equivalent methods from
the POV of the NUMA balancing code, right?

> And the changes to "change_huge_pmd()" were basically re-done
> differently by subsequent patches anyway.
>
> The *only* change I see remaining is that change_huge_pmd() now does
>
>    entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>    entry = pmd_modify(entry, newprot);
>    set_pmd_at(mm, addr, pmd, entry);
>
> for all changes. It used to do that "pmdp_set_numa()" for the
> prot_numa case, which did just
>
>    pmd_t pmd = *pmdp;
>    pmd = pmd_mknuma(pmd);
>    set_pmd_at(mm, addr, pmdp, pmd);
>
> instead.
>
> I don't like the old pmdp_set_numa() because it can drop dirty bits,
> so I think the old code was actively buggy.

Could we, as a silly testing hack not to be applied, write a
hack-patch that re-introduces the racy way of setting the NUMA bit, to
confirm that it is indeed this difference that changes pte visibility
across CPUs enough to create so many more faults?

Because if the answer is 'yes', then we can safely say: 'we regressed
performance because correctness [not dropping dirty bits] comes before
performance'.

If the answer is 'no', then we still have a mystery (and a regression)
to track down.

As a second hack (not to be applied), could we change:

 #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL

to:

 #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)

to double check that the position of the bit does not matter?

I don't think we've exhausted all avenues of analysis here.

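To make the suspected difference concrete, here is a rough userspace
model of the two update styles quoted above. It is only a sketch, not
kernel code: every demo_* name and DEMO_* bit value is made up for
illustration, the "remote CPU" is simulated by a function call placed
in the race window, and it assumes that the old helper cleared the
present bit while setting the NUMA bit and that hardware sets the dirty
bit only on a present entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions for this model only (assumptions). */
#define DEMO_PRESENT  (UINT64_C(1) << 0)
#define DEMO_DIRTY    (UINT64_C(1) << 6)
#define DEMO_PROTNONE (UINT64_C(1) << 8)
#define DEMO_NUMA     (UINT64_C(1) << 9)

typedef _Atomic uint64_t demo_entry_t;

/*
 * Models another CPU writing through the mapping: the dirty bit is set
 * only while the entry is still present. Returns true if the write
 * landed; a real CPU would take a fault otherwise.
 */
static bool demo_remote_write(demo_entry_t *entry)
{
	uint64_t old = atomic_load(entry);

	while (old & DEMO_PRESENT) {
		if (atomic_compare_exchange_weak(entry, &old, old | DEMO_DIRTY))
			return true;
	}
	return false;
}

/*
 * Old style: plain read, modify, write back. The remote write is
 * injected between the read and the write. Returns true if a write
 * that landed lost its dirty bit.
 */
static bool demo_racy_update(demo_entry_t *entry)
{
	uint64_t val = atomic_load(entry);        /* pmd = *pmdp */
	bool landed = demo_remote_write(entry);   /* race window */

	val = (val & ~DEMO_PRESENT) | DEMO_NUMA;  /* stands in for pmd_mknuma() */
	atomic_store(entry, val);                 /* stands in for set_pmd_at() */
	return landed && !(atomic_load(entry) & DEMO_DIRTY);
}

/*
 * New style: atomically fetch and clear the entry first, so nothing
 * can set the dirty bit behind our back while we modify the value.
 */
static bool demo_cleared_update(demo_entry_t *entry)
{
	uint64_t val = atomic_exchange(entry, 0);    /* get-and-clear */
	bool landed = demo_remote_write(entry);      /* entry is not present */

	val = (val & ~DEMO_PRESENT) | DEMO_PROTNONE; /* stands in for pmd_modify() */
	atomic_store(entry, val);
	return landed && !(atomic_load(entry) & DEMO_DIRTY);
}

/* Runs both paths on a clean, present entry and checks the outcome. */
static void demo_check(void)
{
	demo_entry_t racy = DEMO_PRESENT;
	demo_entry_t cleared = DEMO_PRESENT;

	assert(demo_racy_update(&racy));        /* dirty bit was dropped */
	assert(!demo_cleared_update(&cleared)); /* nothing was dropped */
}
```

Calling demo_check() from a main() exercises both paths. In this model
the get-and-clear variant never loses a dirty bit, but the remote write
cannot land while the entry is cleared, which is the kind of extra
fault traffic the first testing hack would try to confirm or rule out.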
Thanks,

	Ingo