From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755437AbdABIrq convert rfc822-to-8bit (ORCPT );
        Mon, 2 Jan 2017 03:47:46 -0500
Received: from mout.kundenserver.de ([212.227.126.131]:55473 "EHLO
        mout.kundenserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750990AbdABIro (ORCPT );
        Mon, 2 Jan 2017 03:47:44 -0500
From: Arnd Bergmann
To: "Kirill A. Shutemov"
Cc: Linus Torvalds, Andrew Morton, x86@kernel.org, Thomas Gleixner,
        Ingo Molnar, "H. Peter Anvin", Andi Kleen, Dave Hansen,
        Andy Lutomirski, linux-arch@vger.kernel.org, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
        linux-arm-kernel@lists.infradead.org, Catalin Marinas, Will Deacon
Subject: Re: [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
Date: Mon, 02 Jan 2017 09:44:46 +0100
Message-ID: <2736959.3MfCab47fD@wuerfel>
User-Agent: KMail/5.1.3 (Linux/4.4.0-34-generic; KDE/5.18.0; x86_64; ; )
In-Reply-To: <20161227015413.187403-30-kirill.shutemov@linux.intel.com>
References: <20161227015413.187403-1-kirill.shutemov@linux.intel.com>
        <20161227015413.187403-30-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8BIT
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tuesday, December 27, 2016 4:54:13 AM CET Kirill A. Shutemov wrote:
> This patch introduces new rlimit resource to manage maximum virtual
> address available to userspace to map.
>
> On x86, 5-level paging enables 56-bit userspace virtual address space.
> Not all user space is ready to handle wide addresses. It's known that
> at least some JIT compilers use high bit in pointers to encode their
> information. It collides with valid pointers with 5-level paging and
> leads to crashes.
>
> The patch aims to address this compatibility issue.
>
> MM would use min(RLIMIT_VADDR, TASK_SIZE) as upper limit of virtual
> address available to map by userspace.
>
> The default hard limit will be RLIM_INFINITY, which basically means that
> TASK_SIZE limits available address space.
>
> The soft limit will also be RLIM_INFINITY everywhere, but the machine
> with 5-level paging enabled. In this case, soft limit would be
> (1UL << 47) - PAGE_SIZE. It’s current x86-64 TASK_SIZE_MAX with 4-level
> paging which known to be safe
>
> New rlimit resource would follow usual semantics with regards to
> inheritance: preserved on fork(2) and exec(2). This has potential to
> break application if limits set too wide or too narrow, but this is not
> uncommon for other resources (consider RLIMIT_DATA or RLIMIT_AS).
>
> As with other resources you can set the limit lower than current usage.
> It would affect only future virtual address space allocations.
>
> Use-cases for new rlimit:
>
> - Bumping the soft limit to RLIM_INFINITY, allows current process all
>   its children to use addresses above 47-bits.
>
> - Bumping the soft limit to RLIM_INFINITY after fork(2), but before
>   exec(2) allows the child to use addresses above 47-bits.
>
> - Lowering the hard limit to 47-bits would prevent current process all
>   its children to use addresses above 47-bits, unless a process has
>   CAP_SYS_RESOURCES.
>
> - It’s also can be handy to lower hard or soft limit to arbitrary
>   address. User-mode emulation in QEMU may lower the limit to 32-bit
>   to emulate 32-bit machine on 64-bit host.
>
> TODO:
>   - port to non-x86;
>
> Not-yet-signed-off-by: Kirill A. Shutemov
> Cc: linux-api@vger.kernel.org

This seems to nicely address the same problem on arm64, which has
run into the same issue due to the various page table formats
that can currently be chosen at compile time.

I don't see how this interacts with the existing
PER_LINUX32/PER_LINUX32_3GB personality flags, but I assume you have
either already thought of that, or we can come up with a good way
to define what happens when conflicting settings are applied.

The two reasonable ways I can think of are to either use the minimum
of the two limits, or to make the personality syscall set the soft
rlimit and use whatever limit was last set.

        Arnd
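
To make the two options concrete, here is a rough user-space model of
how the effective limit could be computed in each case. This is only a
sketch and not kernel code: every sketch_* name and SKETCH_* constant
is made up for illustration, the 3 GiB value stands in for whatever
PER_LINUX32_3GB would impose, and the 56-bit task size is taken from
the patch description above.

```c
#include <stdint.h>

/* Illustrative values only; the real ones depend on arch and config. */
#define SKETCH_PAGE_SIZE      4096ULL
#define SKETCH_RLIM_INFINITY  UINT64_MAX
#define SKETCH_TASK_SIZE_56   ((1ULL << 56) - SKETCH_PAGE_SIZE)
#define SKETCH_LIMIT_47       ((1ULL << 47) - SKETCH_PAGE_SIZE)
#define SKETCH_LIMIT_3GB      0xC0000000ULL  /* assumed PER_LINUX32_3GB cap */

static uint64_t sketch_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Option 1: keep the rlimit and the personality limit as separate
 * settings and always apply the smaller one, capped by the task size.
 */
static uint64_t sketch_limit_minimum(uint64_t rlim_soft,
				     uint64_t pers_limit,
				     uint64_t task_size)
{
	return sketch_min(sketch_min(rlim_soft, pers_limit), task_size);
}

/*
 * Option 2: there is only one stored value. Both personality() and
 * setrlimit() overwrite it, so whichever was called last wins.
 */
struct sketch_task {
	uint64_t vaddr_soft;
};

static void sketch_personality(struct sketch_task *t, uint64_t pers_limit)
{
	t->vaddr_soft = pers_limit;
}

static void sketch_setrlimit(struct sketch_task *t, uint64_t soft)
{
	t->vaddr_soft = soft;
}

static uint64_t sketch_limit_last_set(const struct sketch_task *t,
				      uint64_t task_size)
{
	return sketch_min(t->vaddr_soft, task_size);
}
```

The visible difference is in what happens when the soft rlimit is
raised after the personality has been set: with the first model the
3 GiB cap stays in force, with the second model the later setrlimit
call replaces it and only the task size bounds the result.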