From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx175.postini.com [74.125.245.175])
    by kanga.kvack.org (Postfix) with SMTP id 8251A6B0031
    for ; Fri, 21 Jun 2013 19:42:42 -0400 (EDT)
Received: by mail-vb0-f46.google.com with SMTP id 10so6380520vbe.5
    for ; Fri, 21 Jun 2013 16:42:41 -0700 (PDT)
MIME-Version: 1.0
Date: Fri, 21 Jun 2013 16:42:41 -0700
Message-ID:
Subject: RFC: named anonymous vmas
From: Colin Cross
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: lkml
Cc: Linux-MM, Android Kernel Team, John Stultz

One of the features of ashmem (drivers/staging/android/ashmem.c) that
hasn't gotten much discussion about moving out of staging is named
anonymous memory.

In Android, ashmem is used for three different features, and most
users of it only care about one feature at a time. One is volatile
ranges, which John Stultz has been implementing. The second is
anonymous shareable memory without having a world-writable tmpfs that
untrusted apps could fill with files. The third and most heavily used
feature within the Android codebase is named anonymous memory, where a
region of anonymous memory can have a name associated with it that
will show up in /proc/pid/maps. The Dalvik VM likes to use this
feature extensively, even for memory that will never be shared and
could easily be allocated using an anonymous mmap, and even malloc has
used it in the past. It provides an easy way to collate memory used
for different purposes across multiple processes, which Android uses
for its "dumpsys meminfo" and "librank" tools to determine how much
memory is used for java heaps, JIT caches, native mallocs, etc.

I'd like to add this feature for anonymous mmap memory. I propose
adding an madvise2(unsigned long start, size_t len_in, int behavior,
void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
treats ptr as a string of length size.
The string would be copied
somewhere reusable in the kernel, or reused if it already exists, and
the kernel address of the string would get stashed in a new field in
struct vm_area_struct. Adjacent vmas would only get merged if the
name pointer matched, and naming part of a mapping would split the
mapping. show_map_vma would print the name only if none of the other
existing names rules match.

Any comments as I start implementing it? Is there any reason to allow
naming a file-backed mapping and showing it alongside the file name in
/proc/pid/maps?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx160.postini.com [74.125.245.160])
    by kanga.kvack.org (Postfix) with SMTP id 6D7BB6B0031
    for ; Sat, 22 Jun 2013 01:12:51 -0400 (EDT)
Received: by mail-oa0-f42.google.com with SMTP id j6so649367oag.29
    for ; Fri, 21 Jun 2013 22:12:50 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References:
Date: Sat, 22 Jun 2013 14:12:50 +0900
Message-ID:
Subject: Re: RFC: named anonymous vmas
From: Kyungmin Park
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Colin Cross
Cc: lkml, Linux-MM, Android Kernel Team, John Stultz, Hyunhee Kim,
    Marek Szyprowski, Tomasz Stanislawski

On Sat, Jun 22, 2013 at 8:42 AM, Colin Cross wrote:
> One of the features of ashmem (drivers/staging/android/ashmem.c) that
> hasn't gotten much discussion about moving out of staging is named
> anonymous memory.
>
> In Android, ashmem is used for three different features, and most
> users of it only care about one feature at a time. One is volatile
> ranges, which John Stultz has been implementing. The second is
> anonymous shareable memory without having a world-writable tmpfs that
> untrusted apps could fill with files.
> The third and most heavily used
> feature within the Android codebase is named anonymous memory, where a
> region of anonymous memory can have a name associated with it that
> will show up in /proc/pid/maps. The Dalvik VM likes to use this

Good to know. I didn't know ashmem provided these features.
We are also discussing this requirement internally, and studying how to
show who requested this anonymous memory and which callback is used for
it.

> feature extensively, even for memory that will never be shared and
> could easily be allocated using an anonymous mmap, and even malloc has
> used it in the past. It provides an easy way to collate memory used
> for different purposes across multiple processes, which Android uses
> for its "dumpsys meminfo" and "librank" tools to determine how much
> memory is used for java heaps, JIT caches, native mallocs, etc.

App developers have the same requirement. They want to know what this
anonymous memory is allocated for, and how to find where in their code
it is allocated.

>
> I'd like to add this feature for anonymous mmap memory. I propose
> adding an madvise2(unsigned long start, size_t len_in, int behavior,
> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
> treats ptr as a string of length size. The string would be copied
> somewhere reusable in the kernel, or reused if it already exists, and
> the kernel address of the string would get stashed in a new field in
> struct vm_area_struct. Adjacent vmas would only get merged if the
> name pointer matched, and naming part of a mapping would split the
> mapping. show_map_vma would print the name only if none of the other
> existing names rules match.

Do you want to create a new syscall? Can't it use the current madvise
and allow this feature on Linux only?
As you know, it's just a hint and it doesn't break existing memory
behavior.

>
> Any comments as I start implementing it?
> Is there any reason to allow
> naming a file-backed mapping and showing it alongside the file name in
> /proc/pid/maps?
>

Thank you,
Kyungmin Park

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx202.postini.com [74.125.245.202])
    by kanga.kvack.org (Postfix) with SMTP id 7A03F6B0031
    for ; Sat, 22 Jun 2013 01:20:03 -0400 (EDT)
Received: by mail-ve0-f172.google.com with SMTP id jz10so7227088veb.3
    for ; Fri, 21 Jun 2013 22:20:02 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References:
Date: Fri, 21 Jun 2013 22:20:02 -0700
Message-ID:
Subject: Re: RFC: named anonymous vmas
From: Colin Cross
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Kyungmin Park
Cc: lkml, Linux-MM, Android Kernel Team, John Stultz, Hyunhee Kim,
    Marek Szyprowski, Tomasz Stanislawski

On Fri, Jun 21, 2013 at 10:12 PM, Kyungmin Park wrote:
> On Sat, Jun 22, 2013 at 8:42 AM, Colin Cross wrote:
>> One of the features of ashmem (drivers/staging/android/ashmem.c) that
>> hasn't gotten much discussion about moving out of staging is named
>> anonymous memory.
>>
>> In Android, ashmem is used for three different features, and most
>> users of it only care about one feature at a time. One is volatile
>> ranges, which John Stultz has been implementing. The second is
>> anonymous shareable memory without having a world-writable tmpfs that
>> untrusted apps could fill with files. The third and most heavily used
>> feature within the Android codebase is named anonymous memory, where a
>> region of anonymous memory can have a name associated with it that
>> will show up in /proc/pid/maps. The Dalvik VM likes to use this
>
> Good to know it. I didn't know ashmem provides these features.
> we are also discussing these requirement internally.
> and study how to
> show who request these anon memory and which callback is used for it.
>
>> feature extensively, even for memory that will never be shared and
>> could easily be allocated using an anonymous mmap, and even malloc has
>> used it in the past. It provides an easy way to collate memory used
>> for different purposes across multiple processes, which Android uses
>> for its "dumpsys meminfo" and "librank" tools to determine how much
>> memory is used for java heaps, JIT caches, native mallocs, etc.
> Same requirement for app developers. they want to know what's the
> meaning these anon memory is allocated and how to find out these anon
> memory is allocated at their codes.
>>
>> I'd like to add this feature for anonymous mmap memory. I propose
>> adding an madvise2(unsigned long start, size_t len_in, int behavior,
>> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
>> treats ptr as a string of length size. The string would be copied
>> somewhere reusable in the kernel, or reused if it already exists, and
>> the kernel address of the string would get stashed in a new field in
>> struct vm_area_struct. Adjacent vmas would only get merged if the
>> name pointer matched, and naming part of a mapping would split the
>> mapping. show_map_vma would print the name only if none of the other
>> existing names rules match.
> Do you want to create new syscall? can it use current madvise and only
> allow this feature at linux only?
> As you know it's just hint and it doesn't break existing memory behaviors.

The existing madvise syscall only takes a single int to modify the
vma, which is not enough to pass a pointer to a string.
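The name handling in the quoted proposal (copy the string once, reuse it if it already exists, and merge adjacent vmas only when the name pointers match) can be modeled in userspace. What follows is a toy sketch under stated assumptions, not kernel code: `intern_name`, `struct toy_vma`, and `can_merge` are hypothetical names invented for illustration, the table is a fixed-size array, and there is no locking or reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 64

/* Hypothetical intern table: each distinct name is stored exactly once. */
static char *name_table[MAX_NAMES];
static size_t name_count;

/* Return the stored copy of the `size`-byte string at `name`, adding it
 * if it is new. Returns NULL if the table is full or malloc fails. */
static const char *intern_name(const char *name, size_t size)
{
    assert(name != NULL);
    for (size_t i = 0; i < name_count; i++) {
        if (strlen(name_table[i]) == size &&
            memcmp(name_table[i], name, size) == 0)
            return name_table[i];       /* reuse the existing copy */
    }
    if (name_count == MAX_NAMES)
        return NULL;
    char *copy = malloc(size + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, name, size);
    copy[size] = '\0';
    name_table[name_count++] = copy;
    return copy;
}

/* Toy stand-in for struct vm_area_struct: a range plus a name pointer. */
struct toy_vma {
    unsigned long start;
    unsigned long end;          /* exclusive */
    const char *name;           /* interned pointer, or NULL if unnamed */
};

/* Adjacent regions may merge only if their name pointers are equal.
 * Because names are interned, pointer equality implies string equality. */
static int can_merge(const struct toy_vma *a, const struct toy_vma *b)
{
    return a->end == b->start && a->name == b->name;
}
```

Because equal names share one stored copy, the merge check is a single pointer comparison rather than a string comparison, which is presumably why the proposal stores the kernel address of the string in the vma.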
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx146.postini.com [74.125.245.146])
    by kanga.kvack.org (Postfix) with SMTP id B4FD46B0031
    for ; Sat, 22 Jun 2013 06:32:10 -0400 (EDT)
Date: Sat, 22 Jun 2013 03:31:58 -0700
From: Christoph Hellwig
Subject: Re: RFC: named anonymous vmas
Message-ID: <20130622103158.GA16304@infradead.org>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Colin Cross
Cc: lkml, Linux-MM, Android Kernel Team, John Stultz

On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote:
> ranges, which John Stultz has been implementing. The second is
> anonymous shareable memory without having a world-writable tmpfs that
> untrusted apps could fill with files.

I still haven't seen any explanation of what ashmem buys over a shared
mmap of /dev/zero in that respect, btw.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171])
    by kanga.kvack.org (Postfix) with SMTP id 57FB16B0034
    for ; Sat, 22 Jun 2013 13:30:31 -0400 (EDT)
Received: by mail-ve0-f176.google.com with SMTP id c13so7379831vea.21
    for ; Sat, 22 Jun 2013 10:30:30 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20130622103158.GA16304@infradead.org>
References: <20130622103158.GA16304@infradead.org>
Date: Sat, 22 Jun 2013 10:30:30 -0700
Message-ID:
Subject: Re: RFC: named anonymous vmas
From: Colin Cross
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Hellwig
Cc: lkml, Linux-MM, Android Kernel Team, John Stultz

On Sat, Jun 22, 2013 at 3:31 AM, Christoph Hellwig wrote:
> On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote:
>> ranges, which John Stultz has been implementing. The second is
>> anonymous shareable memory without having a world-writable tmpfs that
>> untrusted apps could fill with files.
>
> I still haven't seen any explanation of what ashmem buys over a shared
> mmap of /dev/zero in that respect, btw.

I believe the difference is that ashmem ties the memory to an fd, so
it can be passed to another process and mmaped to get to the same
memory, but /dev/zero does not. Passing a /dev/zero fd and mmaping it
would result in a brand new region of zeroed memory. Opening a tmpfs
file would allow sharing memory by passing the fd, but we don't want a
world-writable tmpfs.
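The difference described above can be observed from a single process. The sketch below is only an illustration under assumptions: the helper names are made up, an unlinked temporary file under /tmp stands in for "memory tied to an fd" (ashmem itself is not used), and a 4096-byte region is assumed to be enough for the demonstration. It maps the same fd twice with MAP_SHARED and checks whether a write through the first mapping shows up in the second.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Map `fd` twice with MAP_SHARED, write through the first mapping, and
 * report whether the write is visible through the second one.
 * Returns 1 if visible, 0 if not, -1 on error. */
static int write_visible_in_second_mapping(int fd)
{
    assert(fd >= 0);
    char *first = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (first == MAP_FAILED)
        return -1;
    char *second = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (second == MAP_FAILED) {
        munmap(first, REGION_SIZE);
        return -1;
    }
    first[0] = 'x';
    int visible = (second[0] == 'x');
    munmap(first, REGION_SIZE);
    munmap(second, REGION_SIZE);
    return visible;
}

/* /dev/zero: every mmap() call gets its own fresh zeroed memory. */
static int dev_zero_shares(void)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;
    int r = write_visible_in_second_mapping(fd);
    close(fd);
    return r;
}

/* Unlinked temporary file: the memory belongs to the file behind the
 * fd, so every mapping of that fd sees the same pages. */
static int unlinked_file_shares(void)
{
    char path[] = "/tmp/shared-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, REGION_SIZE) != 0) {
        close(fd);
        return -1;
    }
    int r = write_visible_in_second_mapping(fd);
    close(fd);
    return r;
}
```

On Linux, `dev_zero_shares()` is expected to return 0 and `unlinked_file_shares()` to return 1, which is the behavior the reply relies on: a passed /dev/zero fd cannot be used to reach an existing region, while a file-backed fd can.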
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx193.postini.com [74.125.245.193])
    by kanga.kvack.org (Postfix) with SMTP id A81F16B0034
    for ; Sat, 22 Jun 2013 15:47:42 -0400 (EDT)
Received: from list by plane.gmane.org with local (Exim 4.69)
    (envelope-from ) id 1UqTme-0008Hc-Bj
    for linux-mm@kvack.org; Sat, 22 Jun 2013 21:47:40 +0200
Received: from c-50-132-41-203.hsd1.wa.comcast.net ([50.132.41.203])
    by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
    for ; Sat, 22 Jun 2013 21:47:40 +0200
Received: from eternaleye by c-50-132-41-203.hsd1.wa.comcast.net with local
    (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
    for ; Sat, 22 Jun 2013 21:47:40 +0200
From: Alex Elsayed
Subject: Re: RFC: named anonymous vmas
Date: Sat, 22 Jun 2013 12:47:29 -0700
Message-ID:
References: <20130622103158.GA16304@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7Bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org

Colin Cross wrote:
> On Sat, Jun 22, 2013 at 3:31 AM, Christoph Hellwig wrote:
>> On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote:
>>> ranges, which John Stultz has been implementing. The second is
>>> anonymous shareable memory without having a world-writable tmpfs that
>>> untrusted apps could fill with files.
>>
>> I still haven't seen any explanation of what ashmem buys over a shared
>> mmap of /dev/zero in that respect, btw.
>
> I believe the difference is that ashmem ties the memory to an fd, so
> it can be passed to another process and mmaped to get to the same
> memory, but /dev/zero does not. Passing a /dev/zero fd and mmaping it
> would result in a brand new region of zeroed memory. Opening a tmpfs
> file would allow sharing memory by passing the fd, but we don't want a
> world-writable tmpfs.
Couldn't this be done by having a root-only tmpfs, and having a userspace
component that creates per-app directories with restrictive permissions on
startup/app install? Then each app creates files in its own directory, and
can pass the fds around.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx158.postini.com [74.125.245.158])
    by kanga.kvack.org (Postfix) with SMTP id A56C46B007D
    for ; Mon, 24 Jun 2013 07:48:33 -0400 (EDT)
Date: Mon, 24 Jun 2013 04:48:32 -0700
From: Christoph Hellwig
Subject: Re: RFC: named anonymous vmas
Message-ID: <20130624114832.GA9961@infradead.org>
References: <20130622103158.GA16304@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Alex Elsayed
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
> Couldn't this be done by having a root-only tmpfs, and having a userspace
> component that creates per-app directories with restrictive permissions on
> startup/app install? Then each app creates files in its own directory, and
> can pass the fds around.

Honestly, having a device that allows passing fds around that can be
mmaped sounds a lot simpler. I have to admit that I expected /dev/zero
to do this, but looking at the code it creates new file structures
at ->mmap time, which would defeat this.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx113.postini.com [74.125.245.113])
    by kanga.kvack.org (Postfix) with SMTP id 8540E6B0032
    for ; Mon, 24 Jun 2013 13:26:58 -0400 (EDT)
Received: by mail-ve0-f180.google.com with SMTP id pa12so8925106veb.25
    for ; Mon, 24 Jun 2013 10:26:57 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20130624114832.GA9961@infradead.org>
References: <20130622103158.GA16304@infradead.org>
    <20130624114832.GA9961@infradead.org>
Date: Mon, 24 Jun 2013 10:26:57 -0700
Message-ID:
Subject: Re: RFC: named anonymous vmas
From: Colin Cross
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Hellwig
Cc: Alex Elsayed, Linux-MM, lkml

On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig wrote:
> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>> Couldn't this be done by having a root-only tmpfs, and having a userspace
>> component that creates per-app directories with restrictive permissions on
>> startup/app install? Then each app creates files in its own directory, and
>> can pass the fds around.

If each app gets its own writable directory that's not really
different than a world writable tmpfs. It requires something that
watches for apps to exit for any reason and cleans up their
directories, and it requires each app to come up with an unused name
when it wants to create a file, and the kernel can give you both very
cleanly.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx120.postini.com [74.125.245.120])
    by kanga.kvack.org (Postfix) with SMTP id 359586B0032
    for ; Mon, 24 Jun 2013 19:45:13 -0400 (EDT)
Received: by mail-ie0-f180.google.com with SMTP id f4so25976866iea.39
    for ; Mon, 24 Jun 2013 16:45:12 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <20130622103158.GA16304@infradead.org>
    <20130624114832.GA9961@infradead.org>
Date: Mon, 24 Jun 2013 16:45:12 -0700
Message-ID:
Subject: Re: RFC: named anonymous vmas
From: John Stultz
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Colin Cross
Cc: Christoph Hellwig, Alex Elsayed, Linux-MM, lkml

On Mon, Jun 24, 2013 at 10:26 AM, Colin Cross wrote:
> On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig wrote:
>> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>>> Couldn't this be done by having a root-only tmpfs, and having a userspace
>>> component that creates per-app directories with restrictive permissions on
>>> startup/app install? Then each app creates files in its own directory, and
>>> can pass the fds around.
>
> If each app gets its own writable directory that's not really
> different than a world writable tmpfs. It requires something that
> watches for apps to exit for any reason and cleans up their
> directories, and it requires each app to come up with an unused name
> when it wants to create a file, and the kernel can give you both very
> cleanly.

Though, I believe having a daemon that has exclusive access to tmpfs,
and creates, unlinks and passes the fd to the requesting application
would provide a userspace only implementation of the second feature
requirement ("without having a world-writable tmpfs that untrusted
apps could fill with files").

Though I'm not sure what the proc//maps naming would look like on the
unlinked file, so it might not solve the third naming issue.
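The create, unlink, and pass-the-fd flow described above can be sketched with a Unix socket and SCM_RIGHTS. This is a minimal single-process illustration under assumptions, not a daemon: the helper names (`create_unlinked_region`, `send_fd`, `recv_fd`, `roundtrip_shares_memory`) are hypothetical, the directory argument stands in for a tmpfs that only the daemon can write, and both ends of the socketpair live in one process to keep the example short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Daemon side: create a file under `dir`, unlink it at once so no name
 * is left behind, and size it. Returns the fd, or -1 on error. */
static int create_unlinked_region(const char *dir, size_t size)
{
    char path[256];
    assert(dir != NULL && size > 0);
    if (snprintf(path, sizeof path, "%s/region-XXXXXX", dir) >= (int)sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send `fd` over the Unix socket `sock` as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg = { 0 };

    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(). Returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg = { 0 };
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Round trip: returns 1 if a write through the sender's mapping is
 * visible through the receiver's mapping, 0 if not, -1 on error. */
static int roundtrip_shares_memory(const char *dir)
{
    int sv[2], result = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int fd = create_unlinked_region(dir, 4096);
    if (fd >= 0 && send_fd(sv[0], fd) == 0) {
        int got = recv_fd(sv[1]);
        if (got >= 0) {
            char *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            char *b = mmap(NULL, 4096, PROT_READ, MAP_SHARED, got, 0);
            if (a != MAP_FAILED && b != MAP_FAILED) {
                strcpy(a, "hello");
                result = (strcmp(b, "hello") == 0);
            }
            if (a != MAP_FAILED) munmap(a, 4096);
            if (b != MAP_FAILED) munmap(b, 4096);
            close(got);
        }
    }
    if (fd >= 0) close(fd);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling `roundtrip_shares_memory("/tmp")` exercises the whole path. In a real deployment the two socket ends would belong to the daemon and the requesting application, and the file is already unlinked by the time the fd is handed over, so the application never needs write access to the directory.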
thanks
-john

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx204.postini.com [74.125.245.204])
    by kanga.kvack.org (Postfix) with SMTP id 986DC6B0034
    for ; Wed, 26 Jun 2013 14:53:47 -0400 (EDT)
Received: from list by plane.gmane.org with local (Exim 4.69)
    (envelope-from ) id 1Uruqf-0003hj-Av
    for linux-mm@kvack.org; Wed, 26 Jun 2013 20:53:45 +0200
Received: from c-24-17-197-101.hsd1.wa.comcast.net ([24.17.197.101])
    by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
    for ; Wed, 26 Jun 2013 20:53:45 +0200
Received: from eternaleye by c-24-17-197-101.hsd1.wa.comcast.net with local
    (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
    for ; Wed, 26 Jun 2013 20:53:45 +0200
From: Alex Elsayed
Subject: Re: RFC: named anonymous vmas
Date: Wed, 26 Jun 2013 11:53:32 -0700
Message-ID:
References: <20130622103158.GA16304@infradead.org>
    <20130624114832.GA9961@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7Bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org

Colin Cross wrote:
> On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig wrote:
>> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>>> Couldn't this be done by having a root-only tmpfs, and having a
>>> userspace component that creates per-app directories with restrictive
>>> permissions on startup/app install? Then each app creates files in its
>>> own directory, and can pass the fds around.
>
> If each app gets its own writable directory that's not really
> different than a world writable tmpfs.
> It requires something that
> watches for apps to exit for any reason and cleans up their
> directories, and it requires each app to come up with an unused name
> when it wants to create a file, and the kernel can give you both very
> cleanly.

Not so far as I can tell. I'm thinking specifically in the Android model of
'one user per app', and as I see it the issues with a world writable tmpfs
would be:

1.) Race conditions and all the sticky bit bugs of history - app A tries to
    create file foo, but app C is doing the same. This is resolved with
    per-app directories and restrictive permissions.

2.) Resource exhaustion - implementing this for a mmap'ed device node as
    described in HCH's mail would amount to implementing some sort of quota
    support. A world-writable tmpfs would require user quotas. A dir-per-app
    tmpfs could mount a separate, limited tmpfs on each even in the absence
    of user quotas, and mount -o remount,size=$foo works to change those
    limits (within certain bounds of behavior).

3.) Cleanup - doing this with a device makes it simple, yes; once the FDs
    are closed the mapping goes away. But if the only way the mapping gets
    shared is via FD passing, and your users are all via a platform library,
    unlink() after open(O_CREAT) would get you the same behavior as I
    understand it. At that point, the only thing to clean up is the per-app
    directory itself, which can be done on app uninstall IIUC.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx128.postini.com [74.125.245.128])
    by kanga.kvack.org (Postfix) with SMTP id 2F15E6B0031
    for ; Sat, 13 Jul 2013 20:27:10 -0400 (EDT)
Received: by mail-ie0-f172.google.com with SMTP id 16so23943371iea.3
    for ; Sat, 13 Jul 2013 17:27:09 -0700 (PDT)
Message-ID: <51E1F056.3000108@gmail.com>
Date: Sun, 14 Jul 2013 08:27:02 +0800
From: Sam Ben
MIME-Version: 1.0
Subject: Re: RFC: named anonymous vmas
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Colin Cross
Cc: lkml, Linux-MM, Android Kernel Team, John Stultz

Hi Colin,
On 06/22/2013 07:42 AM, Colin Cross wrote:
> One of the features of ashmem (drivers/staging/android/ashmem.c) that
> hasn't gotten much discussion about moving out of staging is named
> anonymous memory.
>
> In Android, ashmem is used for three different features, and most
> users of it only care about one feature at a time. One is volatile
> ranges, which John Stultz has been implementing. The second is
> anonymous shareable memory without having a world-writable tmpfs that
> untrusted apps could fill with files. The third and most heavily used

How should I understand "anonymous shareable memory without having a
world-writable tmpfs that untrusted apps could fill with files"?

> feature within the Android codebase is named anonymous memory, where a
> region of anonymous memory can have a name associated with it that
> will show up in /proc/pid/maps. The Dalvik VM likes to use this
> feature extensively, even for memory that will never be shared and
> could easily be allocated using an anonymous mmap, and even malloc has
> used it in the past.
> It provides an easy way to collate memory used
> for different purposes across multiple processes, which Android uses
> for its "dumpsys meminfo" and "librank" tools to determine how much
> memory is used for java heaps, JIT caches, native mallocs, etc.
>
> I'd like to add this feature for anonymous mmap memory. I propose
> adding an madvise2(unsigned long start, size_t len_in, int behavior,
> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
> treats ptr as a string of length size. The string would be copied
> somewhere reusable in the kernel, or reused if it already exists, and
> the kernel address of the string would get stashed in a new field in
> struct vm_area_struct. Adjacent vmas would only get merged if the
> name pointer matched, and naming part of a mapping would split the
> mapping. show_map_vma would print the name only if none of the other
> existing names rules match.
>
> Any comments as I start implementing it? Is there any reason to allow
> naming a file-backed mapping and showing it alongside the file name in
> /proc/pid/maps?
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx145.postini.com [74.125.245.145])
    by kanga.kvack.org (Postfix) with SMTP id A68866B0031
    for ; Sat, 13 Jul 2013 20:57:22 -0400 (EDT)
Received: by mail-ie0-f177.google.com with SMTP id aq17so22182244iec.8
    for ; Sat, 13 Jul 2013 17:57:22 -0700 (PDT)
Message-ID: <51E1F769.7090208@gmail.com>
Date: Sun, 14 Jul 2013 08:57:13 +0800
From: Sam Ben
MIME-Version: 1.0
Subject: Re: RFC: named anonymous vmas
References: <20130622103158.GA16304@infradead.org>
    <20130624114832.GA9961@infradead.org>
In-Reply-To: <20130624114832.GA9961@infradead.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Hellwig
Cc: Alex Elsayed, linux-mm@kvack.org, linux-kernel@vger.kernel.org

Hi Christoph,
On 06/24/2013 07:48 PM, Christoph Hellwig wrote:
> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>> Couldn't this be done by having a root-only tmpfs, and having a userspace
>> component that creates per-app directories with restrictive permissions on
>> startup/app install? Then each app creates files in its own directory, and
>> can pass the fds around.
> Honestly having a device that allows passing fds around that can be
> mmaped sounds a lot simpler. I have to admit that I expect /dev/zero
> to do this, but looking at the code it creates new file structures
> at ->mmap time which would defeat this.

Could you point out where this is done?
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx146.postini.com [74.125.245.146])
    by kanga.kvack.org (Postfix) with SMTP id 281E16B0031
    for ; Thu, 1 Aug 2013 04:29:54 -0400 (EDT)
Date: Thu, 1 Aug 2013 01:29:51 -0700
From: Christoph Hellwig
Subject: Re: RFC: named anonymous vmas
Message-ID: <20130801082951.GA23563@infradead.org>
References: <20130622103158.GA16304@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Colin Cross
Cc: Christoph Hellwig, lkml, Linux-MM, Android Kernel Team, John Stultz,
    libc-alpha@sourceware.org

Btw, FreeBSD has an extension to shm_open to create unnamed but
fd-passable segments. From their man page:

    As a FreeBSD extension, the constant SHM_ANON may be used for the path
    argument to shm_open(). In this case, an anonymous, unnamed shared
    memory object is created. Since the object has no name, it cannot be
    removed via a subsequent call to shm_unlink(). Instead, the shared
    memory object will be garbage collected when the last reference to the
    shared memory object is removed. The shared memory object may be shared
    with other processes by sharing the file descriptor via fork(2) or
    sendmsg(2). Attempting to open an anonymous shared memory object with
    O_RDONLY will fail with EINVAL. All other flags are ignored.

To me this sounds like the best way to expose this functionality to the
user. Implementing it is another question: as shm_open sits in libc,
we could either move it and shm_unlink into the kernel, or use O_TMPFILE
on tmpfs as the backend.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx185.postini.com [74.125.245.185])
    by kanga.kvack.org (Postfix) with SMTP id 428C66B0031
    for ; Thu, 1 Aug 2013 04:36:41 -0400 (EDT)
Date: Thu, 1 Aug 2013 04:36:08 -0400
Subject: Re: RFC: named anonymous vmas
Message-ID: <20130801083608.GJ221@brightrain.aerifal.cx>
References: <20130622103158.GA16304@infradead.org>
    <20130801082951.GA23563@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130801082951.GA23563@infradead.org>
From: Rich Felker
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Hellwig
Cc: Colin Cross, lkml, Linux-MM, Android Kernel Team, John Stultz,
    libc-alpha@sourceware.org

On Thu, Aug 01, 2013 at 01:29:51AM -0700, Christoph Hellwig wrote:
> Btw, FreeBSD has an extension to shm_open to create unnamed but fd
> passable segments. From their man page:
>
> As a FreeBSD extension, the constant SHM_ANON may be used for the path
> argument to shm_open(). In this case, an anonymous, unnamed shared
> memory object is created. Since the object has no name, it cannot be
> removed via a subsequent call to shm_unlink(). Instead, the shared
> memory object will be garbage collected when the last reference to the
> shared memory object is removed. The shared memory object may be shared
> with other processes by sharing the file descriptor via fork(2) or
> sendmsg(2). Attempting to open an anonymous shared memory object with
> O_RDONLY will fail with EINVAL. All other flags are ignored.
>
> To me this sounds like the best way to expose this functionality to the
> user. Implementing it is another question as shm_open sits in libc,
> we could either take it and shm_unlink to the kernel, or use O_TMPFILE
> on tmpfs as the backend.

I'm not sure what the purpose is.
shm_open with a long random filename and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as good except in the case where you have a malicious user killing the process in between these two operations. Rich From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx107.postini.com [74.125.245.107]) by kanga.kvack.org (Postfix) with SMTP id 44AB16B0032 for ; Fri, 2 Aug 2013 11:11:45 -0400 (EDT) Date: Fri, 2 Aug 2013 08:11:41 -0700 From: Christoph Hellwig Subject: Re: RFC: named anonymous vmas Message-ID: <20130802151141.GA4439@infradead.org> References: <20130622103158.GA16304@infradead.org> <20130801082951.GA23563@infradead.org> <20130801083608.GJ221@brightrain.aerifal.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130801083608.GJ221@brightrain.aerifal.cx> Sender: owner-linux-mm@kvack.org List-ID: To: Rich Felker Cc: Christoph Hellwig , Colin Cross , lkml , Linux-MM , Android Kernel Team , John Stultz , libc-alpha@sourceware.org On Thu, Aug 01, 2013 at 04:36:08AM -0400, Rich Felker wrote: > I'm not sure what the purpose is. shm_open with a long random filename > and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as > good except in the case where you have a malicious user killing the > process in between these two operations. The Android people already have an shm API that doesn't leave traces in the filesystem, and I at least conceptually agree that having an API that doesn't introduce possible other access is a good idea. This is the same reason why the O_TMPFILE API was added in this release.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx186.postini.com [74.125.245.186]) by kanga.kvack.org (Postfix) with SMTP id 969546B0031 for ; Sat, 3 Aug 2013 19:54:59 -0400 (EDT) Received: by mail-oa0-f41.google.com with SMTP id j6so3930024oag.28 for ; Sat, 03 Aug 2013 16:54:58 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20130801083608.GJ221@brightrain.aerifal.cx> References: <20130622103158.GA16304@infradead.org> <20130801082951.GA23563@infradead.org> <20130801083608.GJ221@brightrain.aerifal.cx> From: KOSAKI Motohiro Date: Sat, 3 Aug 2013 19:54:38 -0400 Message-ID: Subject: Re: RFC: named anonymous vmas Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Rich Felker Cc: Christoph Hellwig , Colin Cross , lkml , Linux-MM , Android Kernel Team , John Stultz , libc-alpha On Thu, Aug 1, 2013 at 4:36 AM, Rich Felker wrote: > On Thu, Aug 01, 2013 at 01:29:51AM -0700, Christoph Hellwig wrote: >> Btw, FreeBSD has an extension to shm_open to create unnamed but fd >> passable segments. From their man page: >> >> As a FreeBSD extension, the constant SHM_ANON may be used for the path >> argument to shm_open(). In this case, an anonymous, unnamed shared >> memory object is created. Since the object has no name, it cannot be >> removed via a subsequent call to shm_unlink(). Instead, the shared >> memory object will be garbage collected when the last reference to the >> shared memory object is removed. The shared memory object may be shared >> with other processes by sharing the file descriptor via fork(2) or >> sendmsg(2). Attempting to open an anonymous shared memory object with >> O_RDONLY will fail with EINVAL. All other flags are ignored. >> >> To me this sounds like the best way to expose this functionality to the >> user.
Implementing it is another question as shm_open sits in libc, >> we could either take it and shm_unlink to the kernel, or use O_TMPFILE >> on tmpfs as the backend. > > I'm not sure what the purpose is. shm_open with a long random filename > and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as > good except in the case where you have a malicious user killing the > process in between these two operations. Practically, filename length is restricted by NAME_MAX (255 bytes). Several people don't think that is long enough. The point is a race-free API. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1423380Ab3FUXmn (ORCPT ); Fri, 21 Jun 2013 19:42:43 -0400 Received: from mail-vb0-f51.google.com ([209.85.212.51]:47114 "EHLO mail-vb0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161536Ab3FUXmm (ORCPT ); Fri, 21 Jun 2013 19:42:42 -0400 MIME-Version: 1.0 Date: Fri, 21 Jun 2013 16:42:41 -0700 Message-ID: Subject: RFC: named anonymous vmas From: Colin Cross To: lkml Cc: Linux-MM , Android Kernel Team , John Stultz Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org One of the features of ashmem (drivers/staging/android/ashmem.c) that hasn't gotten much discussion about moving out of staging is named anonymous memory. In Android, ashmem is used for three different features, and most users of it only care about one feature at a time. One is volatile ranges, which John Stultz has been implementing. The second is anonymous shareable memory without having a world-writable tmpfs that untrusted apps could fill with files.
The third and most heavily used feature within the Android codebase is named anonymous memory, where a region of anonymous memory can have a name associated with it that will show up in /proc/pid/maps. The Dalvik VM likes to use this feature extensively, even for memory that will never be shared and could easily be allocated using an anonymous mmap, and even malloc has used it in the past. It provides an easy way to collate memory used for different purposes across multiple processes, which Android uses for its "dumpsys meminfo" and "librank" tools to determine how much memory is used for java heaps, JIT caches, native mallocs, etc. I'd like to add this feature for anonymous mmap memory. I propose adding an madvise2(unsigned long start, size_t len_in, int behavior, void *ptr, size_t size) syscall and a new MADV_NAME behavior, which treats ptr as a string of length size. The string would be copied somewhere reusable in the kernel, or reused if it already exists, and the kernel address of the string would get stashed in a new field in struct vm_area_struct. Adjacent vmas would only get merged if the name pointer matched, and naming part of a mapping would split the mapping. show_map_vma would print the name only if none of the other existing names rules match. Any comments as I start implementing it? Is there any reason to allow naming a file-backed mapping and showing it alongside the file name in /proc/pid/maps? 
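As a rough illustration of the collation use case described in the message above, the following Python sketch totals mapping sizes per name from text in /proc/pid/maps format. The sample lines, their addresses, and the bytes_by_name helper are hypothetical and invented for illustration; they are not taken from Android's "dumpsys meminfo" or "librank".

```python
from collections import defaultdict

# Hypothetical sample in /proc/<pid>/maps format. The addresses, inode
# numbers and region names are invented for illustration only.
SAMPLE_MAPS = """\
40000000-40100000 rw-p 00000000 00:04 1234 /dev/ashmem/dalvik-heap (deleted)
40100000-40180000 rw-p 00000000 00:04 1235 /dev/ashmem/dalvik-jit-code-cache (deleted)
40200000-40240000 rw-p 00100000 00:04 1234 /dev/ashmem/dalvik-heap (deleted)
b6f00000-b6f21000 rw-p 00000000 00:00 0 [heap]
b6f30000-b6f31000 rw-p 00000000 00:00 0
"""


def bytes_by_name(maps_text):
    """Sum the size of every mapping, grouped by the name column."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, dev, inode, optional name.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(part, 16) for part in fields[0].split("-"))
        # "[anon]" is a label chosen here for mappings with no name column.
        name = fields[5].strip() if len(fields) > 5 else "[anon]"
        totals[name] += end - start
    return dict(totals)
```

On a Linux system the same function could be fed the contents of /proc/self/maps. Running the sketch over several processes and adding up the per-name totals gives the kind of cross-process breakdown the message describes.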
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750844Ab3FVFMv (ORCPT ); Sat, 22 Jun 2013 01:12:51 -0400 Received: from mail-oa0-f41.google.com ([209.85.219.41]:53279 "EHLO mail-oa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750753Ab3FVFMu (ORCPT ); Sat, 22 Jun 2013 01:12:50 -0400 MIME-Version: 1.0 In-Reply-To: References: Date: Sat, 22 Jun 2013 14:12:50 +0900 X-Google-Sender-Auth: RKFiUPa4x4ww_fSb2vFwWru9Xi4 Message-ID: Subject: Re: RFC: named anonymous vmas From: Kyungmin Park To: Colin Cross Cc: lkml , Linux-MM , Android Kernel Team , John Stultz , Hyunhee Kim , Marek Szyprowski , Tomasz Stanislawski Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 22, 2013 at 8:42 AM, Colin Cross wrote: > One of the features of ashmem (drivers/staging/android/ashmem.c) that > hasn't gotten much discussion about moving out of staging is named > anonymous memory. > > In Android, ashmem is used for three different features, and most > users of it only care about one feature at a time. One is volatile > ranges, which John Stultz has been implementing. The second is > anonymous shareable memory without having a world-writable tmpfs that > untrusted apps could fill with files. The third and most heavily used > feature within the Android codebase is named anonymous memory, where a > region of anonymous memory can have a name associated with it that > will show up in /proc/pid/maps. The Dalvik VM likes to use this Good to know. I didn't know ashmem provides these features. We are also discussing these requirements internally and studying how to show who requested this anon memory and which callback is used for it.
> feature extensively, even for memory that will never be shared and > could easily be allocated using an anonymous mmap, and even malloc has > used it in the past. It provides an easy way to collate memory used > for different purposes across multiple processes, which Android uses > for its "dumpsys meminfo" and "librank" tools to determine how much > memory is used for java heaps, JIT caches, native mallocs, etc. App developers have the same requirement. They want to know what this anon memory is allocated for and how to find where in their code it is allocated. > > I'd like to add this feature for anonymous mmap memory. I propose > adding an madvise2(unsigned long start, size_t len_in, int behavior, > void *ptr, size_t size) syscall and a new MADV_NAME behavior, which > treats ptr as a string of length size. The string would be copied > somewhere reusable in the kernel, or reused if it already exists, and > the kernel address of the string would get stashed in a new field in > struct vm_area_struct. Adjacent vmas would only get merged if the > name pointer matched, and naming part of a mapping would split the > mapping. show_map_vma would print the name only if none of the other > existing names rules match. Do you want to create a new syscall? Can it use the current madvise and allow this feature on Linux only? As you know, it's just a hint and it doesn't break existing memory behaviors. > > Any comments as I start implementing it? Is there any reason to allow > naming a file-backed mapping and showing it alongside the file name in > /proc/pid/maps?
> Thank you, Kyungmin Park From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750921Ab3FVFUF (ORCPT ); Sat, 22 Jun 2013 01:20:05 -0400 Received: from mail-vc0-f169.google.com ([209.85.220.169]:41464 "EHLO mail-vc0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750753Ab3FVFUD (ORCPT ); Sat, 22 Jun 2013 01:20:03 -0400 MIME-Version: 1.0 In-Reply-To: References: Date: Fri, 21 Jun 2013 22:20:02 -0700 Message-ID: Subject: Re: RFC: named anonymous vmas From: Colin Cross To: Kyungmin Park Cc: lkml , Linux-MM , Android Kernel Team , John Stultz , Hyunhee Kim , Marek Szyprowski , Tomasz Stanislawski Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 21, 2013 at 10:12 PM, Kyungmin Park wrote: > On Sat, Jun 22, 2013 at 8:42 AM, Colin Cross wrote: >> One of the features of ashmem (drivers/staging/android/ashmem.c) that >> hasn't gotten much discussion about moving out of staging is named >> anonymous memory. >> >> In Android, ashmem is used for three different features, and most >> users of it only care about one feature at a time. One is volatile >> ranges, which John Stultz has been implementing. The second is >> anonymous shareable memory without having a world-writable tmpfs that >> untrusted apps could fill with files. The third and most heavily used >> feature within the Android codebase is named anonymous memory, where a >> region of anonymous memory can have a name associated with it that >> will show up in /proc/pid/maps. The Dalvik VM likes to use this > > Good to know it. I didn't know ashmem provides these features. > we are also discussing these requirement internally. and study how to > show who request these anon memory and which callback is used for it. 
> >> feature extensively, even for memory that will never be shared and >> could easily be allocated using an anonymous mmap, and even malloc has >> used it in the past. It provides an easy way to collate memory used >> for different purposes across multiple processes, which Android uses >> for its "dumpsys meminfo" and "librank" tools to determine how much >> memory is used for java heaps, JIT caches, native mallocs, etc. > Same requirement for app developers. they want to know what's the > meaning these anon memory is allocated and how to find out these anon > memory is allocated at their codes. >> >> I'd like to add this feature for anonymous mmap memory. I propose >> adding an madvise2(unsigned long start, size_t len_in, int behavior, >> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which >> treats ptr as a string of length size. The string would be copied >> somewhere reusable in the kernel, or reused if it already exists, and >> the kernel address of the string would get stashed in a new field in >> struct vm_area_struct. Adjacent vmas would only get merged if the >> name pointer matched, and naming part of a mapping would split the >> mapping. show_map_vma would print the name only if none of the other >> existing names rules match. > Do you want to create new syscall? can it use current madvise and only > allow this feature at linux only? > As you know it's just hint and it doesn't break existing memory behaviors. The existing madvise syscall only takes a single int to modify the vma, which is not enough to pass a pointer to a string. 
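The split-and-merge rule in the quoted proposal ("Adjacent vmas would only get merged if the name pointer matched, and naming part of a mapping would split the mapping") can be modelled outside the kernel. The Python sketch below is a toy model under stated assumptions, not kernel code: a vma is reduced to a hypothetical (start, end, name) tuple, names are compared as Python strings rather than as interned kernel pointers, and all the other conditions the kernel checks before merging vmas are ignored.

```python
def name_range(vmas, start, end, name):
    """Return a new vma list in which [start, end) carries `name`.

    `vmas` is a sorted list of non-overlapping (start, end, name) tuples.
    This is a toy model: real vma merging depends on much more than the name.
    """
    pieces = []
    for (s, e, n) in vmas:
        if e <= start or s >= end:
            pieces.append((s, e, n))          # outside the range: unchanged
            continue
        if s < start:
            pieces.append((s, start, n))      # head split off, keeps old name
        pieces.append((max(s, start), min(e, end), name))
        if e > end:
            pieces.append((end, e, n))        # tail split off, keeps old name

    merged = []
    for (s, e, n) in pieces:
        # Merge only when the regions touch and their names match.
        if merged and merged[-1][1] == s and merged[-1][2] == n:
            merged[-1] = (merged[-1][0], e, n)
        else:
            merged.append((s, e, n))
    return merged
```

Naming the middle of a single unnamed mapping yields three entries in this model. Naming the first of those three with the same string afterwards collapses it with its neighbour, leaving two.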
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755988Ab3FVKcM (ORCPT ); Sat, 22 Jun 2013 06:32:12 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:36068 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752674Ab3FVKcL (ORCPT ); Sat, 22 Jun 2013 06:32:11 -0400 Date: Sat, 22 Jun 2013 03:31:58 -0700 From: Christoph Hellwig To: Colin Cross Cc: lkml , Linux-MM , Android Kernel Team , John Stultz Subject: Re: RFC: named anonymous vmas Message-ID: <20130622103158.GA16304@infradead.org> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote: > ranges, which John Stultz has been implementing. The second is > anonymous shareable memory without having a world-writable tmpfs that > untrusted apps could fill with files. I still haven't seen any explanation of what ashmem buys over a shared mmap of /dev/zero in that respect, btw. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752022Ab3FVRac (ORCPT ); Sat, 22 Jun 2013 13:30:32 -0400 Received: from mail-ve0-f179.google.com ([209.85.128.179]:53187 "EHLO mail-ve0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751862Ab3FVRab (ORCPT ); Sat, 22 Jun 2013 13:30:31 -0400 MIME-Version: 1.0 In-Reply-To: <20130622103158.GA16304@infradead.org> References: <20130622103158.GA16304@infradead.org> Date: Sat, 22 Jun 2013 10:30:30 -0700 Message-ID: Subject: Re: RFC: named anonymous vmas From: Colin Cross To: Christoph Hellwig Cc: lkml , Linux-MM , Android Kernel Team , John Stultz Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 22, 2013 at 3:31 AM, Christoph Hellwig wrote: > On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote: >> ranges, which John Stultz has been implementing. The second is >> anonymous shareable memory without having a world-writable tmpfs that >> untrusted apps could fill with files. > > I still haven't seen any explanation of what ashmem buys over a shared > mmap of /dev/zero in that respect, btw. I believe the difference is that ashmem ties the memory to an fd, so it can be passed to another process and mmaped to get to the same memory, but /dev/zero does not. Passing a /dev/zero fd and mmaping it would result in a brand new region of zeroed memory. Opening a tmpfs file would allow sharing memory by passing the fd, but we don't want a world-writable tmpfs. 
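The difference described in the message above can be observed from userspace. The following Python sketch assumes a Linux system and maps the same fd twice within one process instead of passing it to another process; the helper names are hypothetical. It writes through the first mapping and checks whether the second one sees the data, once for /dev/zero and once for an ordinary temporary file standing in for a tmpfs file.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def second_mapping_sees_write(fd):
    """Map fd twice with MAP_SHARED, write via the first, read via the second."""
    prot = mmap.PROT_READ | mmap.PROT_WRITE
    first = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED, prot=prot)
    second = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED, prot=prot)
    try:
        first[:5] = b"hello"
        return second[:5] == b"hello"
    finally:
        first.close()
        second.close()


def compare_dev_zero_and_file():
    """Return (dev_zero_shared, file_shared) for the two kinds of fd."""
    zero_fd = os.open("/dev/zero", os.O_RDWR)
    try:
        dev_zero_shared = second_mapping_sees_write(zero_fd)
    finally:
        os.close(zero_fd)

    # A regular temporary file stands in for a tmpfs-backed file here.
    with tempfile.TemporaryFile() as backing:
        backing.truncate(PAGE)
        file_shared = second_mapping_sees_write(backing.fileno())
    return dev_zero_shared, file_shared
```

On Linux this is expected to report that the two /dev/zero mappings are independent while the two file mappings share their contents, which matches the reasoning in the message: an fd that names a single memory object is what makes fd passing useful for sharing.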
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752750Ab3FXLsf (ORCPT ); Mon, 24 Jun 2013 07:48:35 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:43284 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752389Ab3FXLse (ORCPT ); Mon, 24 Jun 2013 07:48:34 -0400 Date: Mon, 24 Jun 2013 04:48:32 -0700 From: Christoph Hellwig To: Alex Elsayed Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: RFC: named anonymous vmas Message-ID: <20130624114832.GA9961@infradead.org> References: <20130622103158.GA16304@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote: > Couldn't this be done by having a root-only tmpfs, and having a userspace > component that creates per-app directories with restrictive permissions on > startup/app install? Then each app creates files in its own directory, and > can pass the fds around. Honestly having a device that allows passing fds around that can be mmaped sounds a lot simpler. I have to admit that I expect /dev/zero to do this, but looking at the code it creates new file structures at ->mmap time which would defeat this. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752482Ab3FXR07 (ORCPT ); Mon, 24 Jun 2013 13:26:59 -0400 Received: from mail-vb0-f44.google.com ([209.85.212.44]:48651 "EHLO mail-vb0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751506Ab3FXR06 (ORCPT ); Mon, 24 Jun 2013 13:26:58 -0400 MIME-Version: 1.0 In-Reply-To: <20130624114832.GA9961@infradead.org> References: <20130622103158.GA16304@infradead.org> <20130624114832.GA9961@infradead.org> Date: Mon, 24 Jun 2013 10:26:57 -0700 Message-ID: Subject: Re: RFC: named anonymous vmas From: Colin Cross To: Christoph Hellwig Cc: Alex Elsayed , Linux-MM , lkml Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig wrote: > On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote: >> Couldn't this be done by having a root-only tmpfs, and having a userspace >> component that creates per-app directories with restrictive permissions on >> startup/app install? Then each app creates files in its own directory, and >> can pass the fds around. If each app gets its own writable directory that's not really different than a world writable tmpfs. It requires something that watches for apps to exit for any reason and cleans up their directories, and it requires each app to come up with an unused name when it wants to create a file, and the kernel can give you both very cleanly. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752240Ab3FXXpO (ORCPT ); Mon, 24 Jun 2013 19:45:14 -0400 Received: from mail-ie0-f172.google.com ([209.85.223.172]:42216 "EHLO mail-ie0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750860Ab3FXXpM (ORCPT ); Mon, 24 Jun 2013 19:45:12 -0400 MIME-Version: 1.0 In-Reply-To: References: <20130622103158.GA16304@infradead.org> <20130624114832.GA9961@infradead.org> Date: Mon, 24 Jun 2013 16:45:12 -0700 X-Google-Sender-Auth: I7BDcyJAcULIgkVsIg4xz-C2aNw Message-ID: Subject: Re: RFC: named anonymous vmas From: John Stultz To: Colin Cross Cc: Christoph Hellwig , Alex Elsayed , Linux-MM , lkml Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 24, 2013 at 10:26 AM, Colin Cross wrote: > On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig wrote: >> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote: >>> Couldn't this be done by having a root-only tmpfs, and having a userspace >>> component that creates per-app directories with restrictive permissions on >>> startup/app install? Then each app creates files in its own directory, and >>> can pass the fds around. > > If each app gets its own writable directory that's not really > different than a world writable tmpfs. It requires something that > watches for apps to exit for any reason and cleans up their > directories, and it requires each app to come up with an unused name > when it wants to create a file, and the kernel can give you both very > cleanly. Though, I believe having a daemon that has exclusive access to tmpfs, and creates, unlinks and passes the fd to the requesting application would provide a userspace only implementation of the second feature requirement ("without having a world-writable tmpfs that untrusted apps could fill with files"). 
Though I'm not sure what the /proc/<pid>/maps naming would look like on the unlinked file, so it might not solve the third naming issue. thanks -john From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752093Ab3GNA1L (ORCPT ); Sat, 13 Jul 2013 20:27:11 -0400 Received: from mail-ie0-f175.google.com ([209.85.223.175]:62594 "EHLO mail-ie0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751903Ab3GNA1K (ORCPT ); Sat, 13 Jul 2013 20:27:10 -0400 Message-ID: <51E1F056.3000108@gmail.com> Date: Sun, 14 Jul 2013 08:27:02 +0800 From: Sam Ben User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5 MIME-Version: 1.0 To: Colin Cross CC: lkml , Linux-MM , Android Kernel Team , John Stultz Subject: Re: RFC: named anonymous vmas References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Colin, On 06/22/2013 07:42 AM, Colin Cross wrote: > One of the features of ashmem (drivers/staging/android/ashmem.c) that > hasn't gotten much discussion about moving out of staging is named > anonymous memory. > > In Android, ashmem is used for three different features, and most > users of it only care about one feature at a time. One is volatile > ranges, which John Stultz has been implementing. The second is > anonymous shareable memory without having a world-writable tmpfs that > untrusted apps could fill with files. The third and most heavily used How should I understand "anonymous shareable memory without having a world-writable tmpfs that untrusted apps could fill with files"? > feature within the Android codebase is named anonymous memory, where a > region of anonymous memory can have a name associated with it that > will show up in /proc/pid/maps.
The Dalvik VM likes to use this > feature extensively, even for memory that will never be shared and > could easily be allocated using an anonymous mmap, and even malloc has > used it in the past. It provides an easy way to collate memory used > for different purposes across multiple processes, which Android uses > for its "dumpsys meminfo" and "librank" tools to determine how much > memory is used for java heaps, JIT caches, native mallocs, etc. > > I'd like to add this feature for anonymous mmap memory. I propose > adding an madvise2(unsigned long start, size_t len_in, int behavior, > void *ptr, size_t size) syscall and a new MADV_NAME behavior, which > treats ptr as a string of length size. The string would be copied > somewhere reusable in the kernel, or reused if it already exists, and > the kernel address of the string would get stashed in a new field in > struct vm_area_struct. Adjacent vmas would only get merged if the > name pointer matched, and naming part of a mapping would split the > mapping. show_map_vma would print the name only if none of the other > existing names rules match. > > Any comments as I start implementing it? Is there any reason to allow > naming a file-backed mapping and showing it alongside the file name in > /proc/pid/maps? > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752025Ab3GNA5X (ORCPT ); Sat, 13 Jul 2013 20:57:23 -0400 Received: from mail-ie0-f170.google.com ([209.85.223.170]:45497 "EHLO mail-ie0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751748Ab3GNA5W (ORCPT ); Sat, 13 Jul 2013 20:57:22 -0400 Message-ID: <51E1F769.7090208@gmail.com> Date: Sun, 14 Jul 2013 08:57:13 +0800 From: Sam Ben User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130329 Thunderbird/17.0.5 MIME-Version: 1.0 To: Christoph Hellwig CC: Alex Elsayed , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: RFC: named anonymous vmas References: <20130622103158.GA16304@infradead.org> <20130624114832.GA9961@infradead.org> In-Reply-To: <20130624114832.GA9961@infradead.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Christoph, On 06/24/2013 07:48 PM, Christoph Hellwig wrote: > On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote: >> Couldn't this be done by having a root-only tmpfs, and having a userspace >> component that creates per-app directories with restrictive permissions on >> startup/app install? Then each app creates files in its own directory, and >> can pass the fds around. > Honestly having a device that allows passing fds around that can be > mmaped sounds a lot simpler. I have to admit that I expect /dev/zero > to do this, but looking at the code it creates new file structures > at ->mmap time which would defeat this. Could you point out where this is done? > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753299Ab3HAI3z (ORCPT ); Thu, 1 Aug 2013 04:29:55 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:43420 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751744Ab3HAI3w (ORCPT ); Thu, 1 Aug 2013 04:29:52 -0400 Date: Thu, 1 Aug 2013 01:29:51 -0700 From: Christoph Hellwig To: Colin Cross Cc: Christoph Hellwig , lkml , Linux-MM , Android Kernel Team , John Stultz , libc-alpha@sourceware.org Subject: Re: RFC: named anonymous vmas Message-ID: <20130801082951.GA23563@infradead.org> References: <20130622103158.GA16304@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Btw, FreeBSD has an extension to shm_open to create unnamed but fd passable segments. From their man page: As a FreeBSD extension, the constant SHM_ANON may be used for the path argument to shm_open(). In this case, an anonymous, unnamed shared memory object is created. Since the object has no name, it cannot be removed via a subsequent call to shm_unlink(). Instead, the shared memory object will be garbage collected when the last reference to the shared memory object is removed. The shared memory object may be shared with other processes by sharing the file descriptor via fork(2) or sendmsg(2). Attempting to open an anonymous shared memory object with O_RDONLY will fail with EINVAL. All other flags are ignored. To me this sounds like the best way to expose this functionality to the user. 
Implementing it is another question as shm_open sits in libc, we could either take it and shm_unlink to the kernel, or use O_TMPFILE on tmpfs as the backend. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752521Ab3HAIgl (ORCPT ); Thu, 1 Aug 2013 04:36:41 -0400 Received: from 216-12-86-13.cv.mvl.ntelos.net ([216.12.86.13]:55675 "EHLO brightrain.aerifal.cx" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750742Ab3HAIgj (ORCPT ); Thu, 1 Aug 2013 04:36:39 -0400 Date: Thu, 1 Aug 2013 04:36:08 -0400 To: Christoph Hellwig Cc: Colin Cross , lkml , Linux-MM , Android Kernel Team , John Stultz , libc-alpha@sourceware.org Subject: Re: RFC: named anonymous vmas Message-ID: <20130801083608.GJ221@brightrain.aerifal.cx> References: <20130622103158.GA16304@infradead.org> <20130801082951.GA23563@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130801082951.GA23563@infradead.org> User-Agent: Mutt/1.5.21 (2010-09-15) From: Rich Felker Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 01, 2013 at 01:29:51AM -0700, Christoph Hellwig wrote: > Btw, FreeBSD has an extension to shm_open to create unnamed but fd > passable segments. From their man page: > > As a FreeBSD extension, the constant SHM_ANON may be used for the path > argument to shm_open(). In this case, an anonymous, unnamed shared > memory object is created. Since the object has no name, it cannot be > removed via a subsequent call to shm_unlink(). Instead, the shared > memory object will be garbage collected when the last reference to the > shared memory object is removed. The shared memory object may be shared > with other processes by sharing the file descriptor via fork(2) or > sendmsg(2). Attempting to open an anonymous shared memory object with > O_RDONLY will fail with EINVAL. 
All other flags are ignored. > > To me this sounds like the best way to expose this functionality to the > user. Implementing it is another question as shm_open sits in libc, > we could either take it and shm_unlink to the kernel, or use O_TMPFILE > on tmpfs as the backend. I'm not sure what the purpose is. shm_open with a long random filename and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as good except in the case where you have a malicious user killing the process in between these two operations. Rich From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753155Ab3HBPLq (ORCPT ); Fri, 2 Aug 2013 11:11:46 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:49754 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751997Ab3HBPLo (ORCPT ); Fri, 2 Aug 2013 11:11:44 -0400 Date: Fri, 2 Aug 2013 08:11:41 -0700 From: Christoph Hellwig To: Rich Felker Cc: Christoph Hellwig , Colin Cross , lkml , Linux-MM , Android Kernel Team , John Stultz , libc-alpha@sourceware.org Subject: Re: RFC: named anonymous vmas Message-ID: <20130802151141.GA4439@infradead.org> References: <20130622103158.GA16304@infradead.org> <20130801082951.GA23563@infradead.org> <20130801083608.GJ221@brightrain.aerifal.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130801083608.GJ221@brightrain.aerifal.cx> User-Agent: Mutt/1.5.21 (2010-09-15) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 01, 2013 at 04:36:08AM -0400, Rich Felker wrote: > I'm not sure what the purpose is. 
shm_open with a long random filename > and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as > good except in the case where you have a malicious user killing the > process in between these two operations. The Android people already have an shm API that doesn't leave traces in the filesystem, and I at least conceptually agree that having an API that doesn't introduce possible other access is a good idea. This is the same reason why the O_TMPFILE API was added in this release. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752245Ab3HCXzA (ORCPT ); Sat, 3 Aug 2013 19:55:00 -0400 Received: from mail-oa0-f53.google.com ([209.85.219.53]:46012 "EHLO mail-oa0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750875Ab3HCXy7 (ORCPT ); Sat, 3 Aug 2013 19:54:59 -0400 MIME-Version: 1.0 In-Reply-To: <20130801083608.GJ221@brightrain.aerifal.cx> References: <20130622103158.GA16304@infradead.org> <20130801082951.GA23563@infradead.org> <20130801083608.GJ221@brightrain.aerifal.cx> From: KOSAKI Motohiro Date: Sat, 3 Aug 2013 19:54:38 -0400 Message-ID: Subject: Re: RFC: named anonymous vmas To: Rich Felker Cc: Christoph Hellwig , Colin Cross , lkml , Linux-MM , Android Kernel Team , John Stultz , libc-alpha Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 1, 2013 at 4:36 AM, Rich Felker wrote: > On Thu, Aug 01, 2013 at 01:29:51AM -0700, Christoph Hellwig wrote: >> Btw, FreeBSD has an extension to shm_open to create unnamed but fd >> passable segments. From their man page: >> >> As a FreeBSD extension, the constant SHM_ANON may be used for the path >> argument to shm_open(). In this case, an anonymous, unnamed shared >> memory object is created. Since the object has no name, it cannot be >> removed via a subsequent call to shm_unlink().
Instead, the shared >> memory object will be garbage collected when the last reference to the >> shared memory object is removed. The shared memory object may be shared >> with other processes by sharing the file descriptor via fork(2) or >> sendmsg(2). Attempting to open an anonymous shared memory object with >> O_RDONLY will fail with EINVAL. All other flags are ignored. >> >> To me this sounds like the best way to expose this functionality to the >> user. Implementing it is another question as shm_open sits in libc, >> we could either take it and shm_unlink to the kernel, or use O_TMPFILE >> on tmpfs as the backend. > > I'm not sure what the purpose is. shm_open with a long random filename > and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as > good except in the case where you have a malicious user killing the > process in between these two operations. Practically, filename length is restricted by NAME_MAX (255 bytes). Several people don't think that is long enough. The point is a race-free API.