From: Till Smejkal
Subject: Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
Date: Wed, 15 Mar 2017 12:44:47 -0700
To: Andy Lutomirski

On Wed, 15 Mar 2017, Andy Lutomirski wrote:
> > One advantage of VAS segments is that they can be globally queried by
> > user programs, which means that VAS segments can be shared by
> > applications that do not necessarily have to be related. If I am not
> > mistaken, MAP_SHARED of pure in-memory data will only work if the tasks
> > that share the memory region are related (i.e. have a common parent
> > that initialized the shared mapping). Otherwise, the shared mapping has
> > to be backed by a file.
>
> What's wrong with memfd_create()?
>
> > VAS segments on the other hand allow sharing of pure in-memory data by
> > arbitrary unrelated tasks without the need of a file. This becomes
> > especially interesting if one combines VAS segments with non-volatile
> > memory, since one can keep data structures in the NVM and still be able
> > to share them between multiple tasks.
>
> What's wrong with regular mmap?

I never wanted to say that there is something wrong with regular mmap. We
just figured that with VAS segments you could remove the need to mmap your
shared data and instead keep everything purely in memory.

Unfortunately, I am not fully up to speed on memfds. Is my understanding
correct that if the last user of such a file descriptor closes it, the
corresponding memory is freed? Accordingly, memfd cannot be used to keep
data in memory while no program is currently using it, can it? To be able
to do this, you again need some representation of the data in a file. Yes,
you can use a tmpfs to keep the file content in memory as well, or some
DAX filesystem to keep the file content in NVM, but this always requires
that such filesystems are mounted on the system that the application is
currently running on. VAS segments on the other hand would provide the
same functionality without the need for any mounted filesystem. However, I
agree that this is just a small advantage compared to what can already be
achieved with the existing functionality provided by the Linux kernel. I
probably need to revisit the whole idea of first class virtual address
space segments before continuing with this patchset.

Thank you very much for the great feedback.

> >> >> Ick. Please don't do this. Can we please keep an mm as just an mm
> >> >> and not make it look magically different depending on which process
> >> >> maps it?
> >> >> If you need a trampoline (which you do, of course), just
> >> >> write a trampoline in regular user code and map it manually.
> >> >
> >> > Did I understand you correctly that you are proposing that the
> >> > switching thread should make sure by itself that its code, stack, …
> >> > memory regions are properly set up in the new AS before/after
> >> > switching into it? I think this would make using first class
> >> > virtual address spaces much more difficult for user applications,
> >> > to the extent that I am not even sure if they can be used at all.
> >> > At the moment, switching into a VAS is a very simple operation for
> >> > an application because the kernel will simply do the right thing.
> >>
> >> Yes. I think that having the same mm_struct look different from
> >> different tasks is problematic. Getting it right in the arch code is
> >> going to be nasty. The heuristics of what to share are also tough --
> >> why would text + data + stack or whatever you're doing be adequate?
> >> What if you're in a thread? What if two tasks have their stacks in
> >> the same place?
> >
> > The different ASes that a task can now have when it uses first class
> > virtual address spaces are not realized in the kernel by using only
> > one mm_struct per task that just looks different, but by using
> > multiple mm_structs - one for each AS that the task can execute in.
> > When a task attaches a first class virtual address space to itself to
> > be able to use another AS, the kernel adds a temporary mm_struct to
> > this task that contains the mappings of the first class virtual
> > address space and the ones shared with the task's original AS. If a
> > thread now wants to switch into this attached first class virtual
> > address space, the kernel only changes the 'mm' and 'active_mm'
> > pointers in the task_struct of the thread to the temporary mm_struct
> > and performs the corresponding mm_switch operation.
> > The original mm_struct of the thread will not be changed.
> >
> > Accordingly, I do not magically make mm_structs look different
> > depending on the task that uses them, but create temporary mm_structs
> > that only contain mappings to the same memory regions.
>
> This sounds complicated and fragile. What happens if a heuristically
> shared region coincides with a region in the "first class address
> space" being selected?

If such a conflict happens, the task cannot use the first class address
space and the corresponding system call will return an error. However,
with the virtual address space size that is currently available to
programs, such conflicts are probably rare. I could also imagine some
additional functionality that allows a user to mark parts of its AS as
shared or not shared when switching into a VAS. With this functionality
in place, there would be no need for a heuristic in the kernel; instead,
the user decides what to share. The kernel would by default only share
code, data, and stack, and the application/libraries would have to mark
all the other memory regions as shared if they need to also be available
in the VAS.

> I think the right solution is "you're a user program playing virtual
> address games -- make sure you do it right".

Hm, in general I agree that the easier and more robust solution from the
kernel perspective is to let the user do the AS setup and only provide the
functionality to create new empty ASes. However, I think that such an
interface would be much more difficult to use than my current design.
Letting the user program set up the AS also has another implication that
my current design avoids: since I share the code and stack regions
between all ASes that are available to a process, I don't need to
save/restore stack pointers or instruction pointers when threads switch
between ASes.
However, when the user sets up the AS, the kernel cannot be sure that the
code and stack will be mapped at the same virtual address, and hence it
has to save and restore these registers (and potentially others as well,
since we can now basically jump between different execution contexts).

When we first designed first class virtual address spaces, we had one
special use-case in mind, namely that one application wants to use
different data sets that it cannot or does not want to keep in the same
AS. Hence, sharing code and stack between the different ASes that the
application uses was a logical step for us, because the code memory
region, for example, has to be available in all ASes anyway since all of
them execute the same application. Sharing the stack memory region
enabled the application to keep volatile information that might be needed
in the new AS on the stack, which allows easy information flow between
the different ASes.

For this patch, I extended the initial sharing of stack and code memory
regions to all memory regions that are available in the task's original
AS, to also allow dynamically linked applications and multi-threaded
applications to flawlessly use first class virtual address spaces.

To put it in a nutshell, we envisioned first class virtual address spaces
to be used rather as shareable/reusable data containers, which made
sharing the various memory regions that are crucial for the execution of
the application a feasible implementation decision.

Thank you all very much for the feedback. I really appreciate it.

Till