From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, Linux-MM <linux-mm@kvack.org>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
ppc-dev <linuxppc-dev@lists.ozlabs.org>,
Ingo Molnar <mingo@kernel.org>, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Date: Mon, 9 Mar 2015 22:29:36 +1100
Message-ID: <20150309112936.GD26657@destitution>
In-Reply-To: <CA+55aFyQyZXu2fi7X9bWdSX0utk8=sccfBwFaSoToROXoE_PLA@mail.gmail.com>
On Sun, Mar 08, 2015 at 11:35:59AM -0700, Linus Torvalds wrote:
> On Sun, Mar 8, 2015 at 3:02 AM, Ingo Molnar <mingo@kernel.org> wrote:
> But:
>
> > As a second hack (not to be applied), could we change:
> >
> > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
> >
> > to:
> >
> > #define _PAGE_BIT_PROTNONE (_PAGE_BIT_GLOBAL+1)
> >
> > to double check that the position of the bit does not matter?
>
> Agreed. We should definitely try that.
>
> Dave?
As Mel has already mentioned, I'm in Boston for LSFMM and don't have
access to the test rig I've used to generate this.
> Also, is there some sane way for me to actually see this behavior on a
> regular machine with just a single socket? Dave is apparently running
> in some fake-numa setup, I'm wondering if this is easy enough to
> reproduce that I could see it myself.
Should be - I don't actually use 500TB of storage to generate this -
50GB on an SSD is all you need from the storage side. I just use a
sparse backing file to make it look like a 500TB device. :P
i.e. create an XFS filesystem on a 500TB sparse file with "mkfs.xfs
-d size=500t,file=1 /path/to/file.img", mount it on loopback or as a
virtio,cache=none device for the guest VM, and then use fs_mark to
generate several million files spread across many, many directories,
such as:
$ fs_mark -D 10000 -S0 -n 100000 -s 1 -L 32 -d \
/mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d \
/mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d \
/mnt/scratch/6 -d /mnt/scratch/7
That should only take a few minutes to run - if you throw 8p at it
then it should run at >100k files/s being created.
Then unmount and run "xfs_repair -o bhash=101703 /path/to/file.img"
on the resultant image file.
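The steps above can be strung together roughly as follows. This is only a sketch under stated assumptions: the IMG and MNT defaults are placeholders, and the DRY_RUN wrapper (which prints each command instead of running it) is an illustrative addition that is not part of the original test rig. The loopback mount is one of the two options mentioned above; the virtio variant is not covered.

```shell
#!/bin/sh
# Sketch of the reproduction steps described above.
# IMG, MNT and the DRY_RUN wrapper are illustrative assumptions.
IMG=${IMG:-/path/to/file.img}
MNT=${MNT:-/mnt/scratch}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_steps() {
    # Make the filesystem on a sparse 500TB backing file.
    run mkfs.xfs -d size=500t,file=1 "$IMG" || return 1
    run mount -o loop "$IMG" "$MNT" || return 1

    # Build "-d $MNT/0 ... -d $MNT/7", the eight target directories
    # used in the fs_mark command above.
    dirs=""
    for i in 0 1 2 3 4 5 6 7; do
        dirs="$dirs -d $MNT/$i"
    done
    # $dirs is intentionally unquoted so it splits into separate arguments.
    run fs_mark -D 10000 -S0 -n 100000 -s 1 -L 32 $dirs || return 1

    # Unmount, then run the repair against the image file.
    run umount "$MNT" || return 1
    run xfs_repair -o bhash=101703 "$IMG"
}

repro_steps
```

With the default DRY_RUN=1 this only prints the five commands, so they can be checked before anything is touched. Setting DRY_RUN=0 (as root, with IMG and MNT pointing at real locations) would execute them.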
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com