From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Dave Hansen <dave@sr71.net>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
Date: Fri, 28 Jun 2013 13:47:57 +0800
Message-ID: <20130628054757.GA10429@gmail.com>
In-Reply-To: <20130627231605.8F9F12E6@viggo.jf.intel.com>
Hi Dave,
On Thu, Jun 27, 2013 at 04:16:05PM -0700, Dave Hansen wrote:
>
> I've been doing some testing involving large amounts of
> page cache. It's quite painful to get hundreds of GB
> of page cache mapped in, especially when I am trying to
> do it in parallel threads. This is true even when the
> page cache is already allocated and I only need to map
> it in. The test:
>
> 1. take 160 16MB files
> 2. clone 160 threads, mmap the 16MB files, and either
> a. walk through the file touching each page
Why not extend the MAP_POPULATE flag in mmap(2) instead? Right now it only
applies to private mappings, but maybe we could make it support shared
mappings as well.
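From the userspace side, something like the following is what I have in
mind. It is only a sketch: map_shared_populated() is a made-up helper name,
and whether MAP_POPULATE really pre-faults a shared file mapping depends on
the kernel it runs on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any patch): map a whole file read-only
 * and shared, asking the kernel to pre-fault it at mmap() time.
 * Returns NULL on failure, otherwise the mapping and its length in *len.
 */
static void *map_shared_populated(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd;

	assert(path && len);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (p == MAP_FAILED)
		return NULL;

	*len = st.st_size;
	return p;
}
```

The caller would munmap() the returned pointer with the returned length
when it is done.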
Regards,
- Zheng
> b. run MADV_POPULATE on the file
> 3. MADV_DONTNEED on the mmap()'d area
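For reference, steps 2a and 3 of the test can be sketched in userspace
roughly as below. The helper names touch_each_page() and drop_mappings()
are my own, not taken from the test program, and step 2b is left out
because MADV_POPULATE only exists with this patch applied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Step 2a: read one byte from every page so each page takes a fault. */
static unsigned long touch_each_page(const volatile unsigned char *base,
				     size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long sum = 0;
	size_t off;

	assert(page > 0);
	for (off = 0; off < len; off += (size_t)page)
		sum += base[off];

	return sum;	/* returned so the reads are not optimized away */
}

/* Step 3: drop the mappings again so the next pass has to re-fault. */
static int drop_mappings(void *base, size_t len)
{
	return madvise(base, len, MADV_DONTNEED);
}
```

Each thread in the test would run this pair in a loop over its own 16MB
mapping.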
>
> 160 threads/processes:
>              faulting | MADV_POPULATE
> Threads:          698 | 102239 (146x speedup)
> Processes:     154247 | 297518 (1.9x speedup)
>
> single threaded:
>              faulting | MADV_POPULATE
>                  1908 | 3710 (1.9x speedup)
>
> To fix the thread suckage, this patch just walks the
> VMAs and maps all the pages in. Since it does a
> bunch of them in one go, it amortizes the cost of
> acquiring the mmap_sem across all of those pages.
>
> FAQ:
>
> Why do threads suck so much?
>
> Bouncing the mmap_sem cacheline around, plus anything
> else that we write to during a fault. We do one page,
> move the cachelines to another CPU, do one more page,
> etc...
>
> Does MADV_WILLNEED work for this?
>
> No. It brings the pages in to the page cache, but
> does not map them the way it is implemented at the
> moment. I guess we'd be within our rights to make
> it behave like MADV_POPULATE if we want though.
>
>
>
> ---
>
> linux.git-davehans/include/uapi/asm-generic/mman-common.h | 1
> linux.git-davehans/mm/madvise.c | 40 +++++++++++++-
> 2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff -puN include/uapi/asm-generic/mman-common.h~madv_populate include/uapi/asm-generic/mman-common.h
> --- linux.git/include/uapi/asm-generic/mman-common.h~madv_populate 2013-06-27 15:22:35.651854196 -0700
> +++ linux.git-davehans/include/uapi/asm-generic/mman-common.h 2013-06-27 15:22:35.656854418 -0700
> @@ -51,6 +51,7 @@
> #define MADV_DONTDUMP 16 /* Explicity exclude from the core dump,
> overrides the coredump filter bits */
> #define MADV_DODUMP 17 /* Clear the MADV_NODUMP flag */
> +#define MADV_POPULATE 18 /* Fill in mapping like faults would */
>
> /* compatibility flags */
> #define MAP_FILE 0
> diff -puN mm/madvise.c~madv_populate mm/madvise.c
> --- linux.git/mm/madvise.c~madv_populate 2013-06-27 15:22:35.652854240 -0700
> +++ linux.git-davehans/mm/madvise.c 2013-06-27 15:22:35.656854418 -0700
> @@ -19,6 +19,7 @@
> #include <linux/blkdev.h>
> #include <linux/swap.h>
> #include <linux/swapops.h>
> +#include "internal.h"
>
> /*
> * Any behaviour which results in changes to the vma->vm_flags needs to
> @@ -31,6 +32,7 @@ static int madvise_need_mmap_write(int b
> case MADV_REMOVE:
> case MADV_WILLNEED:
> case MADV_DONTNEED:
> + case MADV_POPULATE:
> return 0;
> default:
> /* be safe, default to 1. list exceptions explicitly */
> @@ -252,6 +254,39 @@ static long madvise_willneed(struct vm_a
> }
>
> /*
> + * Do not just populate the page cache (WILLNEED), also map the pages.
> + */
> +static long madvise_populate(struct vm_area_struct * vma,
> + struct vm_area_struct ** prev,
> + unsigned long start, unsigned long end)
> +{
> + struct file *file = vma->vm_file;
> + int locked = 1;
> + int ret;
> +
> + if (file && file->f_mapping->a_ops->get_xip_mem) {
> + /* no bad return value, but ignore advice */
> + return 0;
> + }
> +
> + ret = __mlock_vma_pages_range(vma, start, end, &locked);
> + /*
> + * Make sure that our down_read() matches (read vs.
> + * write) what we did in sys_madvise.
> + */
> + BUG_ON(madvise_need_mmap_write(MADV_POPULATE));
> + if (!locked) {
> + down_read(&current->mm->mmap_sem);
> + /* tell sys_madvise we drop mmap_sem: */
> + *prev = NULL;
> + } else {
> + *prev = vma;
> + }
> +
> + return ret;
> +}
> +
> +/*
> * Application no longer needs these pages. If the pages are dirty,
> * it's OK to just throw them away. The app will be more careful about
> * data it wants to keep. Be sure to free swap resources too. The
> @@ -378,6 +413,8 @@ madvise_vma(struct vm_area_struct *vma,
> return madvise_remove(vma, prev, start, end);
> case MADV_WILLNEED:
> return madvise_willneed(vma, prev, start, end);
> + case MADV_POPULATE:
> + return madvise_populate(vma, prev, start, end);
> case MADV_DONTNEED:
> return madvise_dontneed(vma, prev, start, end);
> default:
> @@ -407,6 +444,7 @@ madvise_behavior_valid(int behavior)
> #endif
> case MADV_DONTDUMP:
> case MADV_DODUMP:
> + case MADV_POPULATE:
> return 1;
>
> default:
> @@ -536,7 +574,7 @@ SYSCALL_DEFINE3(madvise, unsigned long,
> goto out;
> if (prev)
> vma = prev->vm_next;
> - else /* madvise_remove dropped mmap_sem */
> + else /* madvise_remove/populate dropped mmap_sem */
> vma = find_vma(current->mm, start);
> }
> out:
> _
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>