From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xfs-bounces@oss.sgi.com>
Received: from relay.sgi.com (relay3.corp.sgi.com [198.149.34.15])
	by oss.sgi.com (Postfix) with ESMTP id 764637F37
	for <xfs@oss.sgi.com>; Thu, 19 Mar 2015 17:42:01 -0500 (CDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by relay3.corp.sgi.com (Postfix) with ESMTP id 09DCAAC002
	for <xfs@oss.sgi.com>; Thu, 19 Mar 2015 15:42:00 -0700 (PDT)
Received: from ipmail07.adl2.internode.on.net (ipmail07.adl2.internode.on.net
	[150.101.137.131]) by cuda.sgi.com with ESMTP id
	IHIRQOgu2R9GFSmJ for <xfs@oss.sgi.com>;
	Thu, 19 Mar 2015 15:41:58 -0700 (PDT)
Date: Fri, 20 Mar 2015 09:41:44 +1100
From: Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures
	occur
Message-ID: <20150319224143.GI10105@dastard>
References: <CA+55aFx=81BGnQFNhnAGu6CetL7yifPsnD-+v7Y6QRqwgH47gQ@mail.gmail.com>
	<20150312184925.GH3406@suse.de> <20150317070655.GB10105@dastard>
	<CA+55aFzdLnFdku-gnm3mGbeS=QauYBNkFQKYXJAGkrMd2jKXhw@mail.gmail.com>
	<20150317205104.GA28621@dastard>
	<CA+55aFzSPcNgxw4GC7aAV1r0P5LniyVVC66COz=3cgMcx73Nag@mail.gmail.com>
	<20150317220840.GC28621@dastard>
	<CA+55aFwne-fe_Gg-_GTUo+iOAbbNpLBa264JqSFkH79EULyAqw@mail.gmail.com>
	<CA+55aFy-Mw74rAdLMMMUgnsG3ZttMWVNGz7CXZJY7q9fqyRYfg@mail.gmail.com>
	<CA+55aFyxA9u2cVzV+S7TSY9ZvRXCX=z22YAbi9mdPVBKmqgR5g@mail.gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <CA+55aFyxA9u2cVzV+S7TSY9ZvRXCX=z22YAbi9mdPVBKmqgR5g@mail.gmail.com>
List-Id: XFS Filesystem from SGI <xfs.oss.sgi.com>
List-Unsubscribe: <http://oss.sgi.com/mailman/options/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=unsubscribe>
List-Archive: <http://oss.sgi.com/pipermail/xfs>
List-Post: <mailto:xfs@oss.sgi.com>
List-Help: <mailto:xfs-request@oss.sgi.com?subject=help>
List-Subscribe: <http://oss.sgi.com/mailman/listinfo/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, xfs@oss.sgi.com, Linux-MM <linux-mm@kvack.org>, Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>, Andrew Morton <akpm@linux-foundation.org>, ppc-dev <linuxppc-dev@lists.ozlabs.org>, Ingo Molnar <mingo@kernel.org>, Mel Gorman <mgorman@suse.de>

On Thu, Mar 19, 2015 at 02:41:48PM -0700, Linus Torvalds wrote:
> On Wed, Mar 18, 2015 at 10:31 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So I think there's something I'm missing. For non-shared mappings, I
> > still have the idea that pte_dirty should be the same as pte_write.
> > And yet, your testing of 3.19 shows that it's a big difference.
> > There's clearly something I'm completely missing.
> 
> Ahh. The normal page table scanning and page fault handling both clear
> and set the dirty bit together with the writable one. But "fork()"
> will clear the writable bit without clearing dirty. For some reason I
> thought it moved the dirty bit into the struct page like the VM
> scanning does, but that was just me having a brainfart. So yeah,
> pte_dirty doesn't have to match pte_write even under perfectly normal
> circumstances. Maybe there are other cases.
> 
> Not that I see a lot of forking in the xfs repair case either, so..
> 
> Dave, mind re-running the plain 3.19 numbers to really verify that the
> pte_dirty/pte_write change really made that big of a difference. Maybe
> your recollection of ~55,000 migrate_pages events was faulty. If the
> pte_write ->pte_dirty change is the *only* difference, it's still very
> odd how that one difference would make migrate_rate go from ~55k to
> 471k. That's an order of magnitude difference, for what really
> shouldn't be a big change.

My recollection wasn't faulty - I pulled it from an earlier email.
That said, the original measurement might have been faulty. I ran
the numbers again on the 3.19 kernel I saved away from the original
testing. That came up at 235k, which is pretty much the same as
yesterday's test. The runtime, however, is unchanged from my original
measurements of 4m54s (pte_hack came in at 5m20s).

Wondering where the 55k number came from, I played around with when
I started the measurement - all the numbers since I did the bisect
have come from starting it at roughly 130 AGs into phase 3, where the
memory footprint stabilises and the TLB flush overhead kicks in.

However, if I start the measurement at the same time as the repair
test, I get something much closer to the 55k number. I also note
that my original 4.0-rc1 numbers were much lower than the more
recent steady-state measurements (360k vs 470k), so I'd say the
original numbers weren't representative of the steady-state
behaviour and so can be ignored...

> Maybe a system update has changed libraries and memory allocation
> patterns, and there is something bigger than that one-liner
> pte_dirty/write change going on?

Possibly. The xfs_repair binary has definitely been rebuilt (testing
unrelated bug fixes that only affect phase 6/7 behaviour), but
otherwise the system libraries are unchanged.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
