* Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
  [not found] <20190722213205.140845-1-joel@joelfernandes.org>
From: Michal Hocko @ 2019-07-23 6:05 UTC
To: Joel Fernandes (Google)
Cc: linux-kernel, vdavydov.dev, Brendan Gregg, kernel-team, Alexey Dobriyan, Al Viro, Andrew Morton, carmenjackson, Christian Hansen, Colin Ian King, dancol, David Howells, fmayer, joaodias, joelaf, Jonathan Corbet, Kees Cook, Kirill Tkhai, Konstantin Khlebnikov, linux-doc, linux-fsdevel, linux-mm, Mike Rapoport, minchan, minchan, namhyung

[Cc linux-api - please always do CC this list when introducing a user
visible API]

On Mon 22-07-19 17:32:04, Joel Fernandes (Google) wrote:
> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> This is quite cumbersome and can be error-prone too. If between
> accessing the per-PID pagemap and the global page_idle bitmap, if
> something changes with the page then the information is not accurate.
> More over looking up PFN from pagemap in Android devices is not
> supported by unprivileged process and requires SYS_ADMIN and gives 0 for
> the PFN.
>
> This patch adds support to directly interact with page_idle tracking at
> the PID level by introducing a /proc/<pid>/page_idle file. This
> eliminates the need for userspace to calculate the mapping of the page.
> It follows the exact same semantics as the global
> /sys/kernel/mm/page_idle, however it is easier to use for some usecases
> where looking up PFN is not needed and also does not require SYS_ADMIN.
> It ended up simplifying userspace code, solving the security issue
> mentioned and works quite well. SELinux does not need to be turned off
> since no pagemap look up is needed.
>
> In Android, we are using this for the heap profiler (heapprofd) which
> profiles and pin points code paths which allocates and leaves memory
> idle for long periods of time.
>
> Documentation material:
> The idle page tracking API for virtual address indexing using virtual page
> frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> except that it uses virtual instead of physical frame numbers.
>
> This idle page tracking API can be simpler to use than physical address
> indexing, since the pagemap for a process does not need to be looked up
> to mark or read a page's idle bit. It is also more accurate than
> physical address indexing since in physical address indexing, address
> space changes can occur between reading the pagemap and reading the
> bitmap. In virtual address indexing, the process's mmap_sem is held for
> the duration of the access.

I didn't get to read the actual code but the overall idea makes sense to
me. I can see this being useful for userspace memory management (along
with remote MADV_PAGEOUT, MADV_COLD).

Normally I would object that a cumbersome nature of the existing
interface can be hidden in a userspace but I do agree that rowhammer has
made this one close to unusable for anything but a privileged process.

I do not think you can make any argument about accuracy because
the information will never be accurate. Sure the race window is smaller
in principle but you can hardly say anything about how much or whether
at all.

Thanks.
--
Michal Hocko
SUSE Labs

* Re: [PATCH v1 1/2] mm/page_idle: Add support for per-pid page_idle using virtual indexing
From: Joel Fernandes @ 2019-07-23 14:34 UTC
To: Michal Hocko
Cc: linux-kernel, vdavydov.dev, Brendan Gregg, kernel-team, Alexey Dobriyan, Al Viro, Andrew Morton, carmenjackson, Christian Hansen, Colin Ian King, dancol, David Howells, fmayer, joaodias, Jonathan Corbet, Kees Cook, Kirill Tkhai, Konstantin Khlebnikov, linux-doc, linux-fsdevel, linux-mm, Mike Rapoport, minchan, minchan, namhyung, sspatil@

On Tue, Jul 23, 2019 at 08:05:25AM +0200, Michal Hocko wrote:
> [Cc linux-api - please always do CC this list when introducing a user
> visible API]

Sorry, will do.

> On Mon 22-07-19 17:32:04, Joel Fernandes (Google) wrote:
> > The page_idle tracking feature currently requires looking up the pagemap
> > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > This is quite cumbersome and can be error-prone too. If between
> > accessing the per-PID pagemap and the global page_idle bitmap, if
> > something changes with the page then the information is not accurate.
> > More over looking up PFN from pagemap in Android devices is not
> > supported by unprivileged process and requires SYS_ADMIN and gives 0 for
> > the PFN.
> >
> > This patch adds support to directly interact with page_idle tracking at
> > the PID level by introducing a /proc/<pid>/page_idle file. This
> > eliminates the need for userspace to calculate the mapping of the page.
> > It follows the exact same semantics as the global
> > /sys/kernel/mm/page_idle, however it is easier to use for some usecases
> > where looking up PFN is not needed and also does not require SYS_ADMIN.
> > It ended up simplifying userspace code, solving the security issue
> > mentioned and works quite well. SELinux does not need to be turned off
> > since no pagemap look up is needed.
> >
> > In Android, we are using this for the heap profiler (heapprofd) which
> > profiles and pin points code paths which allocates and leaves memory
> > idle for long periods of time.
> >
> > Documentation material:
> > The idle page tracking API for virtual address indexing using virtual page
> > frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> > that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> > except that it uses virtual instead of physical frame numbers.
> >
> > This idle page tracking API can be simpler to use than physical address
> > indexing, since the pagemap for a process does not need to be looked up
> > to mark or read a page's idle bit. It is also more accurate than
> > physical address indexing since in physical address indexing, address
> > space changes can occur between reading the pagemap and reading the
> > bitmap. In virtual address indexing, the process's mmap_sem is held for
> > the duration of the access.
>
> I didn't get to read the actual code but the overall idea makes sense to
> me. I can see this being useful for userspace memory management (along
> with remote MADV_PAGEOUT, MADV_COLD).

Thanks.

> Normally I would object that a cumbersome nature of the existing
> interface can be hidden in a userspace but I do agree that rowhammer has
> made this one close to unusable for anything but a privileged process.

Agreed, this is one of the primary motivations for the patch as you said.

> I do not think you can make any argument about accuracy because
> the information will never be accurate. Sure the race window is smaller
> in principle but you can hardly say anything about how much or whether
> at all.

Sure, fair enough. That is why I wasn't beating the drum too much on the
accuracy point. However, this surprisingly does work quite well.

thanks,

 - Joel

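Illustration (not part of either message): the documentation material quoted above says the proposed /proc/<pid>/page_idle file follows the same semantics as /sys/kernel/mm/page_idle/bitmap, only indexed by virtual frame number. A rough sketch of what a userspace lookup might look like under that statement is given below. It assumes the layout of the existing global bitmap (an array of native-endian 8-byte words, with frame i stored in bit i % 64 of word i / 64) carries over unchanged, and it assumes 4 KiB pages. The helper names vfn_location, is_idle and read_idle are hypothetical, and the file itself exists only on a kernel carrying this patch.

```python
import struct

PAGE_SIZE = 4096     # assumption: 4 KiB pages; query the real value on a live system
WORD_BYTES = 8       # the global page_idle bitmap is accessed in 8-byte words
BITS_PER_WORD = 64


def vfn_location(vaddr, page_size=PAGE_SIZE):
    """Map a virtual address to (file offset, bit index) in the bitmap.

    Assumes virtual frame number i lives in bit i % 64 of the 8-byte
    word at index i // 64, mirroring the PFN layout of the global file.
    """
    vfn = vaddr // page_size
    return (vfn // BITS_PER_WORD) * WORD_BYTES, vfn % BITS_PER_WORD


def is_idle(word_bytes, bit):
    """Test one bit of an 8-byte bitmap word read in native byte order."""
    (word,) = struct.unpack("=Q", word_bytes)
    return bool((word >> bit) & 1)


def read_idle(pid, vaddr):
    """Hypothetical helper: read the idle bit for the page holding vaddr.

    Only meaningful on a kernel that provides the proposed file.
    """
    offset, bit = vfn_location(vaddr)
    # Unbuffered, so exactly one aligned 8-byte word is requested.
    with open(f"/proc/{pid}/page_idle", "rb", buffering=0) as f:
        f.seek(offset)
        return is_idle(f.read(WORD_BYTES), bit)
```

No pagemap lookup appears anywhere in the sketch, which is the simplification the patch description is after. As noted in the reply, the result is still only a snapshot: the bit can change as soon as the read returns.
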