From: Andy Lutomirski
Date: Wed, 15 Mar 2017 09:51:31 -0700
Subject: Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
In-Reply-To: <20170314161229.tl6hsmian2gdep47@arch-dev>
References: <20170314161229.tl6hsmian2gdep47@arch-dev>
List-Id: Linux on PowerPC Developers Mail List
To: Andy Lutomirski, Till Smejkal, Richard Henderson, Ivan Kokshaysky,
 Matt Turner, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
 Steven Miao, Richard Kuo, Tony Luck, Fenghua Yu, James Hogan,
 Ralf Baechle, "James E.J. Bottomley", Helge Deller,
 Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
 Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
 "David S. Miller", Chris Metcalf, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", X86 ML, Chris Zankel, Max Filippov, Arnd Bergmann,
 Greg Kroah-Hartman, Laurent Pinchart, Mauro Carvalho Chehab,
 Pawel Osciak, Marek Szyprowski, Kyungmin Park, David Woodhouse,
 Brian Norris, Boris Brezillon, Marek Vasut, Richard Weinberger,
 Cyrille Pitchen, Felipe Balbi, Alexander Viro, Benjamin LaHaise,
 Nadia Yvette Chambers, Jeff Layton, "J. Bruce Fields", Peter Zijlstra,
 Hugh Dickins, Arnaldo Carvalho de Melo, Alexander Shishkin,
 Jaroslav Kysela, Takashi Iwai, "linux-kernel@vger.kernel.org",
 linux-alpha@vger.kernel.org, arcml,
 "linux-arm-kernel@lists.infradead.org",
 adi-buildroot-devel@lists.sourceforge.net,
 linux-hexagon@vger.kernel.org, "linux-ia64@vger.kernel.org",
 linux-metag@vger.kernel.org, Linux MIPS Mailing List,
 linux-parisc@vger.kernel.org, linuxppc-dev,
 "linux-s390@vger.kernel.org", "linux-sh@vger.kernel.org",
 sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
 Linux Media Mailing List, linux-mtd@lists.infradead.org, USB list,
 Linux FS Devel, linux-aio@kvack.org, "linux-mm@kvack.org", Linux API,
 linux-arch, ALSA development

On Tue, Mar 14, 2017 at 9:12 AM, Till Smejkal wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal wrote:
>> > On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> >> This sounds rather complicated. Getting TLB flushing right seems
>> >> tricky. Why not just map the same thing into multiple mms?
>> >
>> > This is exactly what happens at the end. The memory region that is described by the
>> > VAS segment will be mapped in the ASes that use the segment.
>>
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
>
> One advantage of VAS segments is that they can be globally queried by user programs
> which means that VAS segments can be shared by applications that not necessarily have
> to be related. If I am not mistaken, MAP_SHARED of pure in memory data will only work
> if the tasks that share the memory region are related (aka. have a common parent that
> initialized the shared mapping). Otherwise, the shared mapping have to be backed by a
> file.

What's wrong with memfd_create()?

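For illustration, a minimal sketch of the memfd_create() route, assuming
Linux 3.17 or later. The helper names shm_region_create() and
shm_region_map() are hypothetical and not part of the RFC. The system call
is issued through syscall(2) because glibc only gained a memfd_create()
wrapper in version 2.27. The resulting descriptor has no file behind it and
can be handed to an unrelated task, for example as SCM_RIGHTS ancillary data
over a Unix domain socket.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the RFC): create a file-less memory
 * object of `size` bytes. Returns a descriptor, or -1 on error. */
static int shm_region_create(const char *name, size_t size)
{
	/* Raw syscall; flags 0 means no close-on-exec and no sealing. */
	int fd = (int)syscall(SYS_memfd_create, name, 0U);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: map the object shared and writable. Any task
 * that holds the descriptor can do this; the tasks need not be related.
 * Returns NULL on error. */
static void *shm_region_map(int fd, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Two shm_region_map() calls on the same descriptor return two views of the
same pages, so a store through one view is visible through the other. In a
real setup the second call would happen in the receiving task.
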
> VAS segments on the other side allow sharing of pure in memory data by
> arbitrary related tasks without the need of a file. This becomes especially
> interesting if one combines VAS segments with non-volatile memory since one can keep
> data structures in the NVM and still be able to share them between multiple tasks.

What's wrong with regular mmap?

>
>> >> Ick. Please don't do this. Can we please keep an mm as just an mm
>> >> and not make it look magically different depending on which process
>> >> maps it? If you need a trampoline (which you do, of course), just
>> >> write a trampoline in regular user code and map it manually.
>> >
>> > Did I understand you correctly that you are proposing that the switching thread
>> > should make sure by itself that its code, stack, … memory regions are properly setup
>> > in the new AS before/after switching into it? I think, this would make using first
>> > class virtual address spaces much more difficult for user applications to the extend
>> > that I am not even sure if they can be used at all. At the moment, switching into a
>> > VAS is a very simple operation for an application because the kernel will just simply
>> > do the right thing.
>>
>> Yes. I think that having the same mm_struct look different from
>> different tasks is problematic. Getting it right in the arch code is
>> going to be nasty. The heuristics of what to share are also tough --
>> why would text + data + stack or whatever you're doing be adequate?
>> What if you're in a thread? What if two tasks have their stacks in
>> the same place?
>
> The different ASes that a task now can have when it uses first class virtual address
> spaces are not realized in the kernel by using only one mm_struct per task that just
> looks differently but by using multiple mm_structs - one for each AS that the task
> can execute in. When a task attaches a first class virtual address space to itself to
> be able to use another AS, the kernel adds a temporary mm_struct to this task that
> contains the mappings of the first class virtual address space and the one shared
> with the task's original AS. If a thread now wants to switch into this attached first
> class virtual address space the kernel only changes the 'mm' and 'active_mm' pointers
> in the task_struct of the thread to the temporary mm_struct and performs the
> corresponding mm_switch operation. The original mm_struct of the thread will not be
> changed.
>
> Accordingly, I do not magically make mm_structs look differently depending on the
> task that uses it, but create temporary mm_structs that only contain mappings to the
> same memory regions.

This sounds complicated and fragile. What happens if a heuristically
shared region coincides with a region in the "first class address
space" being selected?

I think the right solution is "you're a user program playing virtual
address games -- make sure you do it right".

--Andy
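
As an illustration of the "make sure you do it right" approach, a user
program can refuse to place a mapping on top of something that already
lives at the requested address. This is a minimal sketch, assuming Linux.
map_exactly_at() is a hypothetical helper, not part of the RFC. It relies on
mmap() without MAP_FIXED treating the address as a hint only: an occupied
range is never replaced, the kernel picks a different address instead.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not from the RFC): place an anonymous private
 * mapping of `len` bytes exactly at the page-aligned address `want`.
 * Returns 0 on success, or -1 if the range is already in use or mmap()
 * fails. Existing mappings are never overwritten. */
static int map_exactly_at(void *want, size_t len)
{
	/* No MAP_FIXED: `want` is only a hint to the kernel. */
	void *got = mmap(want, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (got == MAP_FAILED)
		return -1;
	if (got != want) {
		/* The hint was not honored, so the range collides. */
		munmap(got, len);
		return -1;
	}
	return 0;
}
```

The same check applies to shared or file-backed mappings by passing the
corresponding flags and descriptor. The anonymous private case is used here
only to keep the sketch short.
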