From: Michal Hocko <mhocko@kernel.org>
To: linux-api@vger.kernel.org
Cc: Khalid Aziz <khalid.aziz@oracle.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Russell King - ARM Linux <linux@armlinux.org.uk>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org, Florian Weimer <fweimer@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Abdul Haleem <abdhalee@linux.vnet.ibm.com>,
	Joel Stanley <joel@jms.id.au>, Kees Cook <keescook@chromium.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [PATCH 0/2] mm: introduce MAP_FIXED_SAFE
Date: Wed, 29 Nov 2017 15:42:17 +0100
Message-ID: <20171129144219.22867-1-mhocko@kernel.org>

Hi,
I am resending with the RFC tag dropped and asking for inclusion. There
haven't been any fundamental objections to the RFC [1]. I have also
prepared a man page patch which is 0/3 of this series.

This started as a follow-up to the discussion [2][3] of a runtime
failure caused by the hardening patch [4], which removes MAP_FIXED from
the elf loader because MAP_FIXED is inherently dangerous: it might
silently clobber an existing underlying mapping (e.g. the stack). The
reason for the failure is that some architectures enforce an alignment
on the given address hint when MAP_FIXED is not used (e.g. for shared or
file backed mappings).

One way around this would be to exclude those archs which do alignment
tricks from the hardening [5]. The patch is really trivial, but it has
been objected, rightfully so, that this screams for a more generic
solution. We basically want a non-destructive MAP_FIXED.

The first patch introduces MAP_FIXED_SAFE, which enforces the given
address but, unlike MAP_FIXED, fails with ENOMEM if the given range
conflicts with an existing one. The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of backward
compatibility.
We really want a never-clobber semantic even on older kernels
which do not recognize the flag. Unfortunately mmap sucks wrt. flags
evaluation because we do not EINVAL on unknown flags. On those kernels
we would simply use the traditional hint based semantic, so the caller
can still get a different address (which sucks) but at least will not
silently corrupt an existing mapping. I do not see a good way around
that, except for not exposing the new semantic to userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [6].

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them are completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned a CUDA example:
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_SAFE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
: Interestingly enough, "freeing" is also done via MAP_FIXED, and
: setting PROT_NONE to the subregion. Anyway, I just included (c) for
: general interest.

Atomic address range probing in multithreaded programs sounds like an
interesting thing to me in general.

The second patch simply replaces the MAP_FIXED use in the elf loader
with MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED
should follow. Actually, real MAP_FIXED usages should be documented
properly and they should be more of an exception.

Does anybody see any fundamental reasons why this is a wrong approach?

Diffstat says
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/metag/kernel/process.c          |  6 +++++-
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 fs/binfmt_elf.c                      | 12 ++++++++----
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 11 files changed, 36 insertions(+), 5 deletions(-)

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[3] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[4] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[5] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[6] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
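
For illustration only, the never-clobber reservation described in the
cover letter (step (b) of the CUDA example, with the proposed flag added
when available) might be sketched in userspace roughly as follows. This
is a sketch under assumptions, not code from the series: the helper
names map_exactly_at() and demo_conflict() are hypothetical, and
MAP_FIXED_SAFE is defined to 0 when the installed headers do not provide
it, so on such systems the call degrades to the plain hint-based
semantic plus an explicit check and munmap().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Assumption: MAP_FIXED_SAFE is the flag proposed in this series and is
 * normally absent from installed headers. Defining it to 0 turns the
 * request into a plain address hint.
 */
#ifndef MAP_FIXED_SAFE
#define MAP_FIXED_SAFE 0
#endif

/*
 * Hypothetical helper: map len bytes of anonymous memory at exactly va,
 * or return NULL. It never passes MAP_FIXED, so an existing mapping at
 * va is never replaced.
 */
static void *map_exactly_at(void *va, size_t len)
{
    assert(va != NULL); /* a NULL hint would mean "anywhere" */

    void *p = mmap(va, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_SAFE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;    /* range busy (with the flag) or other error */

    if (p != va) {
        /* The flag was ignored and the kernel picked another address. */
        munmap(p, len);
        return NULL;
    }
    return p;
}

/*
 * Hypothetical demo: asking for an address that is already mapped must
 * fail and must leave the existing contents untouched.
 * Returns 0 when both hold, -1 otherwise.
 */
static int demo_conflict(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *busy = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (busy == MAP_FAILED)
        return -1;

    busy[0] = 'x';      /* sentinel that a clobber would wipe */
    void *p = map_exactly_at(busy, len);
    int ok = (p == NULL) && (busy[0] == 'x');

    munmap(busy, len);
    return ok ? 0 : -1;
}
```

A caller in the style of the ld.so or CUDA use cases would presumably
loop: pick a candidate address, call map_exactly_at(), and retry with a
new candidate on NULL. With a kernel that honours the flag, the munmap()
branch would never be taken.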