From: Shaohua Li <shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Daniel Micay <danielmicay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Michael Kerrisk
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"
<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
KOSAKI Motohiro
<kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
"Kirill A. Shutemov"
<kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>,
Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Jason Evans <je-b10kYP2dOMg@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
yalin wang
<yalin.wang2010-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
Date: Thu, 5 Nov 2015 10:17:26 -0800
Message-ID: <20151105181726.GA63566@kernel.org>
In-Reply-To: <563A813B.9080903-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Wed, Nov 04, 2015 at 05:05:47PM -0500, Daniel Micay wrote:
> > With enough pages at once, though, munmap would be fine, too.
>
> That implies lots of page faults and zeroing though. The zeroing alone
> is a major performance issue.
>
> There are separate issues with munmap since it ends up resulting in a
> lot more virtual memory fragmentation. It would help if the kernel used
> first-best-fit for mmap instead of the current naive algorithm (bonus:
> O(log n) worst-case, not O(n)). Since allocators like jemalloc and
> PartitionAlloc want 2M aligned spans, mixing them with other allocators
> can also accelerate the VM fragmentation caused by the dumb mmap
> algorithm (i.e. they make a 2M aligned mapping, some other mmap user
> does 4k, now there's a nearly 2M gap when the next 2M region is made and
> the kernel keeps going rather than reusing it). Anyway, that's a totally
> separate issue from this. Just felt like complaining :).
>
> > Maybe what's really needed is a MADV_FREE variant that takes an iovec.
> > On an all-cores multithreaded mm, the TLB shootdown broadcast takes
> > thousands of cycles on each core more or less regardless of how much
> > of the TLB gets zapped.
>
> That would work very well. The allocator ends up having a sequence of
> dirty spans that it needs to purge in one go. As long as purging is
> fairly spread out, the cost of a single TLB shootdown isn't that bad. It
> is extremely bad if it needs to do it over and over to purge a bunch of
> ranges, which can happen if the memory has ended up being very, very
> fragmented despite the efforts to compact it (depends on what the
> application ends up doing).
I posted a patch that does exactly this: an iovec-based madvise. It doesn't
support MADV_FREE yet, but adding that should be easy.
http://marc.info/?l=linux-mm&m=144615663522661&w=2