From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
peterz@infradead.org, mingo@kernel.org
Subject: Re: [PATCH 2/2] mm: replace remap_file_pages() syscall with emulation
Date: Mon, 12 May 2014 20:05:14 +0300
Message-ID: <20140512170514.GA28227@node.dhcp.inet.fi>
In-Reply-To: <5370E4B4.1060802@oracle.com>

On Mon, May 12, 2014 at 11:11:48AM -0400, Sasha Levin wrote:
> On 05/08/2014 05:57 PM, Andrew Morton wrote:
> > On Thu, 8 May 2014 15:41:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >
> >> > remap_file_pages(2) was invented to be able to efficiently map parts
> >> > of a huge file into a limited 32-bit virtual address space, such as in
> >> > database workloads.
> >> >
> >> > Nonlinear mappings are a pain to support and it seems there are no
> >> > legitimate use-cases nowadays since 64-bit systems are widely available.
> >> >
> >> > Let's drop it and get rid of all this special-cased code.
> >> >
> >> > The patch replaces the syscall with an emulation which creates a new
> >> > VMA on each remap_file_pages() call, unless it can be merged with an
> >> > adjacent one.
> >> >
> >> > I didn't find *any* real code that uses remap_file_pages(2) to test
> >> > emulation impact on. I've checked Debian code search and source of all
> >> > packages in ALT Linux. No real users: libc wrappers, mentions in strace,
> >> > gdb, valgrind and this kind of stuff.
> >> >
> >> > There are a few basic tests in LTP for the syscall. They work just fine
> >> > with emulation.
> >> >
> >> > To test the performance impact, I've written a small test case which
> >> > demonstrates pretty much the worst-case scenario: map a 4G shmfs file,
> >> > write each page's pgoff at the beginning of that page, remap the pages
> >> > in reverse order, read every page.
> >> >
> >> > The test creates 1 million VMAs if emulation is in use, so I had to
> >> > set vm.max_map_count to 1100000 to avoid -ENOMEM.
> >> >
> >> > Before: 23.3 ( +- 4.31% ) seconds
> >> > After: 43.9 ( +- 0.85% ) seconds
> >> > Slowdown: 1.88x
> >> >
> >> > I believe we can live with that.
> >> >
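For anyone who wants to try the quoted worst-case scenario on a small
scale, here is a rough sketch. It is not the original test: the function
name reverse_remap_check() is made up, tmpfile() stands in for the shmfs
file, and mmap() with MAP_FIXED is used as a userspace stand-in for what
the emulation is described as doing on each call (one new VMA per
remapped page, since reversed offsets cannot be merged). Assuming 4 KiB
pages, 4 GiB / 4 KiB = 1,048,576 pages, which matches the "1 million
VMAs" figure and is far above the default vm.max_map_count of 65530.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical, scaled-down version of the worst-case test quoted above.
 * Returns 0 if every page reads back the expected pgoff, -1 otherwise.
 */
static int reverse_remap_check(size_t npages)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * psz;
    size_t i;
    char *base;
    int fd, ret = -1;
    FILE *f = tmpfile();    /* assumption: stand-in for the shmfs file */

    if (!f)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        goto out_close;

    /* Write each page's own pgoff at the beginning of that page. */
    for (i = 0; i < npages; i++)
        *(size_t *)(base + i * psz) = i;

    /*
     * Remap in reverse order: window page i now shows file page
     * npages-1-i. Each MAP_FIXED call replaces one page of the window.
     */
    for (i = 0; i < npages; i++) {
        size_t pgoff = npages - 1 - i;

        if (mmap(base + i * psz, psz, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd,
                 (off_t)(pgoff * psz)) == MAP_FAILED)
            goto out_unmap;
    }

    /* Read every page and check that it shows the reversed pgoff. */
    ret = 0;
    for (i = 0; i < npages; i++)
        if (*(size_t *)(base + i * psz) != npages - 1 - i)
            ret = -1;

out_unmap:
    munmap(base, len);
out_close:
    fclose(f);
    return ret;
}
```

Calling reverse_remap_check() with a small page count keeps the number
of VMAs well under the default limit; the full 4G run would need the
sysctl bump mentioned in the quote.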
> > There's still all the special-case goop around the place to be cleaned
> > up - VM_NONLINEAR is a decent search term. As is "grep nonlinear
> > mm/*.c". And although this cleanup is the main reason for the
> > patchset, let's not do it now - we can do all that if/after this patch
> > gets merged.
> >
> > I'll queue the patches for some linux-next exposure and shall send
> > [1/2] Linuswards for 3.16 if nothing terrible happens. Once we've
> > sorted out the too-many-vmas issue we'll need to work out when to merge
> > [2/2].
>
> It seems that since no one is really using it, it's also impossible to
> properly test it. I've sent a fix that deals with panics in error paths
> that are very easy to trigger, but I'm worried that there are a lot more
> of those hiding over there.

Sorry for that.

> Since we can't find any actual users, testing suites are very incomplete
> w.r.t this syscall, and the amount of work required to "remove" it is
> non-trivial, can we just kill this syscall off?
>
> It sounds to me like a better option than to ship a new, buggy and possibly
> security dangerous version which we can't even test.

Taking your employer into account, is it possible to check how the RDBMS
(the old but still supported 32-bit versions) would react to -ENOSYS here?

I would like to get rid of it completely, but I thought that was not an
option for compatibility reasons.
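
To make the -ENOSYS question concrete, below is a rough sketch of how a
userspace caller could cope if the syscall went away. It is only an
illustration: remap_or_fallback() and demo_first_page_after_remap() are
made-up names, and nothing here is taken from any real database code.
The fallback assumes the window is a read-write MAP_SHARED mapping of
the same fd.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the pages at `addr` show file pages starting
 * at `pgoff`. `addr` must lie inside an existing read-write MAP_SHARED
 * mapping of `fd`.
 */
static int remap_or_fallback(void *addr, size_t len, int fd, size_t pgoff)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);

    if (remap_file_pages(addr, len, 0, pgoff, 0) == 0)
        return 0;
    if (errno != ENOSYS)
        return -1;      /* a real failure, not a missing syscall */

    /* Syscall is gone: mmap() with MAP_FIXED gives the same layout. */
    if (mmap(addr, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
             fd, (off_t)(pgoff * psz)) == MAP_FAILED)
        return -1;
    return 0;
}

/*
 * Usage sketch: a 4-page temporary file, each page tagged with its
 * index; make the first page of the window show file page 3.
 * Returns the value read through the first page, or -1 on error.
 */
static long demo_first_page_after_remap(void)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    long seen = -1;
    long i;
    char *base;
    FILE *f = tmpfile();

    if (!f)
        return -1;
    if (ftruncate(fileno(f), (off_t)(4 * psz)) != 0) {
        fclose(f);
        return -1;
    }
    base = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE, MAP_SHARED,
                fileno(f), 0);
    if (base != MAP_FAILED) {
        for (i = 0; i < 4; i++)
            *(long *)(base + (size_t)i * psz) = i;
        if (remap_or_fallback(base, psz, fileno(f), 3) == 0)
            seen = *(long *)base;
        munmap(base, 4 * psz);
    }
    fclose(f);
    return seen;
}
```

A caller that treats any remap_file_pages() failure as fatal would of
course not recover this way, which is exactly what would need checking.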
--
Kirill A. Shutemov