From: Andrew Morton <akpm@linux-foundation.org>
To: Shaohua Li <shli@fb.com>
Cc: linux-mm@kvack.org, linux-api@vger.kernel.org, je@fb.com,
Kernel-team@fb.com, Rik van Riel <riel@redhat.com>,
Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andi Kleen <andi@firstfloor.org>,
Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH V2] mm: add a new vector based madvise syscall
Date: Tue, 1 Dec 2015 15:59:26 -0800 [thread overview]
Message-ID: <20151201155926.6291e5e541c49d453079849b@linux-foundation.org> (raw)
In-Reply-To: <c25b90749f9212359a085125f6403f4c148dfde0.1447098139.git.shli@fb.com>
On Mon, 9 Nov 2015 11:44:54 -0800 Shaohua Li <shli@fb.com> wrote:
> In jemalloc, a free(3) doesn't immediately free the memory to OS even
> the memory is page aligned/size, and hope the memory can be reused soon.
> Later the virtual address becomes fragmented, and more and more free
> memory are aggregated. If the free memory size is large, jemalloc uses
> madvise(DONT_NEED) to actually free the memory back to OS.
>
> The madvise has significantly overhead paritcularly because of TLB
> flush. jemalloc does madvise for several virtual address space ranges
> one time. Instead of calling madvise for each of the ranges, we
> introduce a new syscall to purge memory for several ranges one time. In
> this way, we can merge several TLB flush for the ranges to one big TLB
> flush. This also reduce mmap_sem locking and kernel/userspace switching.
>
> I'm running a simple memory allocation benchmark. 32 threads do random
> malloc/free/realloc. Corresponding jemalloc patch to utilize this API is
> attached.
> Without patch:
> real 0m18.923s
> user 1m11.819s
> sys 7m44.626s
> each cpu gets around 3000K/s TLB flush interrupt. Perf shows TLB flush
> is hotest functions. mmap_sem read locking (because of page fault) is
> also heavy.
>
> with patch:
> real 0m15.026s
> user 0m48.548s
> sys 6m41.153s
> each cpu gets around 140k/s TLB flush interrupt. TLB flush isn't hot at
> all. mmap_sem read locking (still because of page fault) becomes the
> sole hot spot.
>
> Another test malloc a bunch of memory in 48 threads, then all threads
> free the memory. I measure the time of the memory free.
> Without patch: 34.332s
> With patch: 17.429s
>
> Current implementation only supports MADV_DONTNEED. Should be trival to
> support MADV_FREE if necessary later.
I'd like to see a full description of the proposed userspace interface:
arguments, data structures, return values, etc. A propotype manpage,
basically.
I'd also like to see an analysis of which other userspace allocators
will benefit from this. glibc? tcmalloc?
>
> ...
>
> +/*
> + * The vector madvise(). Like madvise except running for a vector of virtual
> + * address ranges
> + */
> +SYSCALL_DEFINE3(madvisev, const struct iovec __user *, uvector,
> + unsigned long, nr_segs, int, behavior)
> +{
> + struct iovec iovstack[UIO_FASTIOV];
> + struct iovec *iov = NULL;
> + unsigned long start, end = 0;
> + int unmapped_error = 0;
> + size_t len;
> + struct mmu_gather tlb;
> + int error;
> + int i;
> +
> + if (behavior != MADV_DONTNEED)
> + return -EINVAL;
> +
> + error = rw_copy_check_uvector(CHECK_IOVEC_ONLY, uvector, nr_segs,
> + UIO_FASTIOV, iovstack, &iov);
> + if (error <= 0)
> + goto out;
> + /* Make sure address in ascend order */
> + sort(iov, nr_segs, sizeof(struct iovec), iov_cmp_func, NULL);
Do we really need to sort the addresses? That's something which can be
done in userspace and we can easily add a check-for-sortedness to the
below loop.
It depends on whether userspace can easily generate a sorted array. If
basically all userspace will always need to run sort() then it doesn't
matter much whether it's done in the kernel or in userspace. But if
*some* userspace can naturally generate its array in sorted form then
neither userspace nor the kernel needs to run sort() and we should take
this out.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2015-12-01 23:59 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-09 19:44 [PATCH V2] mm: add a new vector based madvise syscall Shaohua Li
2015-12-01 23:59 ` Andrew Morton [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151201155926.6291e5e541c49d453079849b@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=Kernel-team@fb.com \
--cc=aarcange@redhat.com \
--cc=andi@firstfloor.org \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=je@fb.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=minchan@kernel.org \
--cc=riel@redhat.com \
--cc=shli@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).