* [PATCH 0/4] enhance shmem process and swap accounting
From: Vlastimil Babka @ 2015-02-26 13:51 UTC
To: linux-mm, Jerome Marchand
Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Vlastimil Babka

This series is based on Jerome Marchand's [1] so let me quote the first
paragraph from there:

There are several shortcomings with the accounting of shared memory
(sysV shm, shared anonymous mapping, mapping to a tmpfs file). The values in
/proc/<pid>/status and statm don't allow distinguishing between shmem memory
and a shared mapping to a regular file, even though their implications for
memory usage are quite different: at reclaim, a file mapping can be dropped or
written back to disk, while shmem needs a place in swap. As for shmem pages
that are swapped out or in swap cache, they aren't accounted at all.

My original motivation is that a customer found it (IMHO rightfully)
confusing that e.g. top output for process swap usage is unreliable with
respect to swapped-out shmem pages, which are not accounted for.

The fundamental difference between private anonymous and shmem pages is that
when swapped out, the latter have their PTEs converted to pte_none, and not
to swapents. As such, they are not accounted to the number of swapents visible
e.g. in the /proc/pid/status VmSwap row. It might be theoretically possible to
use swapents when swapping out shmem (without extra cost, as one has to change
all mappers anyway), and on swap-in only convert the swapent for the faulting
process, leaving swapents in other processes until they also fault (so again
no extra cost). But I don't know how many assumptions this would break, and it
would be too disruptive a change for a relatively small benefit.

Instead, my approach is to document the limitation of VmSwap, and provide
means to determine the swap usage for shmem areas for those who are interested
and willing to pay the price, using /proc/pid/smaps. Outside of ipcs, I don't
think it is currently possible to determine the usage at all. The previous
patchset [1] did introduce new shmem-specific fields into smaps output, and
functions to determine the values. I take a simpler approach, noting that
smaps output already has a "Swap: X kB" line, where currently X == 0 always
for shmem areas. I think we can just consider this a bug and provide the
proper value by consulting the radix tree, as e.g. mincore_page() does. In the
patch changelog I explain why this is also not perfect (and cannot be without
swapents), but still arguably much better than showing a 0.

The last two patches are adapted from Jerome's patchset and provide a VmRSS
breakdown to VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted that
this is a welcome addition, and I agree that it might help e.g. debugging
process memory usage at a non-zero, but still rather low, cost of an extra
per-mm counter and some page flag checks. I updated these patches to 4.0-rc1,
made them respect !CONFIG_SHMEM so that tiny systems don't pay the cost, and
optimized the page flag checking somewhat.

[1] http://lwn.net/Articles/611966/

Jerome Marchand (2):
  mm, shmem: Add shmem resident memory accounting
  mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status

Vlastimil Babka (2):
  mm, documentation: clarify /proc/pid/status VmSwap limitations
  mm, proc: account for shmem swap in /proc/pid/smaps

 Documentation/filesystems/proc.txt | 15 +++++++++++++--
 arch/s390/mm/pgtable.c             |  5 +----
 fs/proc/task_mmu.c                 | 35 +++++++++++++++++++++++++++++++++--
 include/linux/mm.h                 | 28 ++++++++++++++++++++++++++++
 include/linux/mm_types.h           |  9 ++++++---
 kernel/events/uprobes.c            |  2 +-
 mm/memory.c                        | 30 ++++++++++--------------------
 mm/oom_kill.c                      |  5 +++--
 mm/rmap.c                          | 15 ++++-----------
 9 files changed, 99 insertions(+), 45 deletions(-)

-- 
2.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
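
For readers who want to compare the two numbers discussed in the cover letter
from userspace, here is a minimal sketch in C. It is not part of the series.
It sums every "<field>: <n> kB" line in an smaps- or status-formatted stream.
The helper names sum_kb_field() and proc_sum_kb() are hypothetical, and the
512-byte line buffer is an arbitrary assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field>: <n> kB" line of an smaps- or status-formatted stream.
 * Returns the total in kB; lines that do not match are ignored.
 * sum_kb_field() is a hypothetical helper, not a kernel or libc interface.
 */
unsigned long sum_kb_field(FILE *f, const char *field)
{
	char line[512];	/* assumed to be long enough for the value lines */
	size_t flen = strlen(field);
	unsigned long total = 0, kb;

	while (fgets(line, sizeof(line), f)) {
		/* The field name must start the line and be followed by ':' */
		if (strncmp(line, field, flen) != 0 || line[flen] != ':')
			continue;
		if (sscanf(line + flen + 1, "%lu kB", &kb) == 1)
			total += kb;
	}
	return total;
}

/*
 * Convenience wrapper (also hypothetical): open /proc/<pid>/<file> and sum
 * one field. Returns -1 if the file cannot be opened.
 */
long proc_sum_kb(const char *pid, const char *file, const char *field)
{
	char path[64];
	FILE *f;
	unsigned long total;

	snprintf(path, sizeof(path), "/proc/%s/%s", pid, file);
	f = fopen(path, "r");
	if (!f)
		return -1;
	total = sum_kb_field(f, field);
	fclose(f);
	return (long)total;
}
```

Calling proc_sum_kb("self", "smaps", "Swap") and
proc_sum_kb("self", "status", "VmSwap") gives the two values side by side.
Without the smaps change in this series, the shmem share is missing from both.
With it, the smaps total would be expected to be the larger one for a process
that maps swapped-out shmem.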
* [PATCH 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
From: Vlastimil Babka @ 2015-02-26 13:51 UTC
To: linux-mm, Jerome Marchand
Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Vlastimil Babka

The documentation for /proc/pid/status does not mention that the value of
VmSwap counts only swapped out anonymous private pages and not shmem. This is
not obvious, so document this limitation.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
I've noticed that proc(5) manpage is currently missing the VmSwap field
altogether.

 Documentation/filesystems/proc.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a07ba61..d4f56ec 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -231,6 +231,8 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  VmLib                       size of shared library code
  VmPTE                       size of page table entries
  VmSwap                      size of swap usage (the number of referred swapents)
+                             by anonymous private data (shmem swap usage is not
+                             included)
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
-- 
2.1.4
* Re: [PATCH 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
From: Michael Kerrisk @ 2015-02-27 10:37 UTC
To: Vlastimil Babka
Cc: linux-mm, Jerome Marchand, Linux Kernel, Andrew Morton, linux-doc,
    Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
    Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
    Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
    Linux API

[CC += linux-api@]

On Thu, Feb 26, 2015 at 2:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> The documentation for /proc/pid/status does not mention that the value of
> VmSwap counts only swapped out anonymous private pages and not shmem. This is
> not obvious, so document this limitation.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> I've noticed that proc(5) manpage is currently missing the VmSwap field
> altogether.
>
>  Documentation/filesystems/proc.txt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index a07ba61..d4f56ec 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -231,6 +231,8 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
>   VmLib                       size of shared library code
>   VmPTE                       size of page table entries
>   VmSwap                      size of swap usage (the number of referred swapents)
> +                             by anonymous private data (shmem swap usage is not
> +                             included)
>   Threads                     number of threads
>   SigQ                        number of signals queued/max. number for queue
>   SigPnd                      bitmap of pending signals for the thread
> --
> 2.1.4

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
* [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
From: Vlastimil Babka @ 2015-02-26 13:51 UTC
To: linux-mm, Jerome Marchand
Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Vlastimil Babka

Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
mappings, even if the mapped portion does contain pages that were swapped out.
This is because unlike private anonymous mappings, shmem does not change the
pte to a swap entry, but to pte_none, when swapping the page out. In the smaps
page walk, such a page thus looks like it was never faulted in.

This patch changes smaps_pte_entry() to determine the swap status for such
pte_none entries for shmem mappings, similarly to how mincore_page() does it.
Swapped out pages are thus accounted for.

The accounting is arguably still not as precise as for private anonymous
mappings, since now we will also count pages that the process in question
never accessed, but that another process populated and then let become
swapped out. I believe it is still less confusing and subtle than not showing
any swap usage by shmem mappings at all. Also, swapped out pages only become a
performance issue for future accesses, and we cannot predict those for either
kind of mapping.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt |  3 ++-
 fs/proc/task_mmu.c                 | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d4f56ec..8b30543 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -437,7 +437,8 @@ indicates the amount of memory currently marked as referenced or accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
-swap.
+swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
+underlying shmem object is on swap.
 
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 956b75d..0410309 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -13,6 +13,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
+#include <linux/shmem_fs.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -496,6 +497,25 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 			mss->swap += PAGE_SIZE;
 		else if (is_migration_entry(swpent))
 			page = migration_entry_to_page(swpent);
+	} else if (IS_ENABLED(CONFIG_SHMEM) && IS_ENABLED(CONFIG_SWAP) &&
+					pte_none(*pte) && vma->vm_file) {
+		struct address_space *mapping =
+			file_inode(vma->vm_file)->i_mapping;
+
+		/*
+		 * shmem does not use swap pte's so we have to consult
+		 * the radix tree to account for swap
+		 */
+		if (shmem_mapping(mapping)) {
+			page = find_get_entry(mapping, pgoff);
+			if (page) {
+				if (radix_tree_exceptional_entry(page))
+					mss->swap += PAGE_SIZE;
+				else
+					page_cache_release(page);
+			}
+			page = NULL;
+		}
 	}
 
 	if (!page)
-- 
2.1.4
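
The decision the patch adds to smaps_pte_entry() can be modelled in userspace.
The sketch below is a toy, not kernel code. Every toy_* name, the tag value
and the page size are assumptions made for illustration. Page-cache slots are
plain pointers, and a slot with the tag bit set stands in for what the kernel
calls an exceptional radix tree entry, which is how shmem records a
swapped-out page.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096UL
#define TOY_SWAP_TAG	2UL	/* assumed tag bit marking a swap entry */

enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_SWAP };

/* Build a tagged slot that stands in for an exceptional (swap) entry. */
void *toy_make_swap_slot(unsigned long swp)
{
	return (void *)(uintptr_t)((swp << 2) | TOY_SWAP_TAG);
}

/* Non-zero if the slot holds a swap entry rather than a page pointer. */
int toy_is_swap_slot(const void *slot)
{
	return slot != NULL && ((uintptr_t)slot & TOY_SWAP_TAG) != 0;
}

/*
 * Count swapped bytes for one mapping of npages pages.
 *  - a swap pte counts, as for private anonymous memory;
 *  - an empty pte in a shmem mapping counts only if the page-cache slot
 *    at the same offset holds a swap entry;
 *  - everything else contributes nothing.
 */
unsigned long toy_count_swap(const enum toy_pte *ptes, void *const *slots,
			     size_t npages, int is_shmem)
{
	unsigned long swap = 0;

	for (size_t i = 0; i < npages; i++) {
		if (ptes[i] == TOY_PTE_SWAP)
			swap += TOY_PAGE_SIZE;
		else if (ptes[i] == TOY_PTE_NONE && is_shmem &&
			 toy_is_swap_slot(slots[i]))
			swap += TOY_PAGE_SIZE;
	}
	return swap;
}
```

The model also shows the imprecision the changelog mentions. An empty pte
carries no record of whether this process ever touched the page, so a slot
that another process populated and that was then swapped out is counted all
the same.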
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
From: Jerome Marchand @ 2015-02-26 14:39 UTC
To: Vlastimil Babka, linux-mm
Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov

On 02/26/2015 02:51 PM, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change the
> pte to a swap entry, but to pte_none, when swapping the page out. In the smaps
> page walk, such a page thus looks like it was never faulted in.
>
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
>
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will also count pages that the process in question
> never accessed, but that another process populated and then let become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  Documentation/filesystems/proc.txt |  3 ++-
>  fs/proc/task_mmu.c                 | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index d4f56ec..8b30543 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -437,7 +437,8 @@ indicates the amount of memory currently marked as referenced or accessed.
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
> -swap.
> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
> +underlying shmem object is on swap.
>
>  "VmFlags" field deserves a separate description. This member represents the kernel
>  flags associated with the particular virtual memory area in two letter encoded
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 956b75d..0410309 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -13,6 +13,7 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  #include <linux/mmu_notifier.h>
> +#include <linux/shmem_fs.h>
>
>  #include <asm/elf.h>
>  #include <asm/uaccess.h>
> @@ -496,6 +497,25 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>  			mss->swap += PAGE_SIZE;
>  		else if (is_migration_entry(swpent))
>  			page = migration_entry_to_page(swpent);
> +	} else if (IS_ENABLED(CONFIG_SHMEM) && IS_ENABLED(CONFIG_SWAP) &&
> +					pte_none(*pte) && vma->vm_file) {
> +		struct address_space *mapping =
> +			file_inode(vma->vm_file)->i_mapping;
> +
> +		/*
> +		 * shmem does not use swap pte's so we have to consult
> +		 * the radix tree to account for swap
> +		 */
> +		if (shmem_mapping(mapping)) {
> +			page = find_get_entry(mapping, pgoff);
> +			if (page) {
> +				if (radix_tree_exceptional_entry(page))
> +					mss->swap += PAGE_SIZE;
> +				else
> +					page_cache_release(page);
> +			}
> +			page = NULL;
> +		}

Hi Vlastimil,

I'm afraid that isn't enough. Without walking the pte holes too, big
chunks of swapped out shmem pages may be missed.

Jerome

>  	}
>
>  	if (!page)
>
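
Jerome's objection can be illustrated with a small userspace model. This is
a sketch under stated assumptions, not kernel code. All toy_* names, the
table size and the tag bit are invented for the example. A "hole" here is a
range for which no page table exists at all, so a per-pte callback is never
invoked for it. The kernel's page walker has a separate pte_hole callback in
struct mm_walk for such ranges. The visit_holes flag below stands in for
whether the walk handles them.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PTES_PER_TABLE	4	/* assumed, tiny for illustration */
#define TOY_SWAP_TAG		2UL	/* assumed tag bit marking a swap entry */

/* Non-zero if the page-cache slot holds a swap entry, not a page pointer. */
int toy_slot_is_swap(const void *slot)
{
	return slot != NULL && ((uintptr_t)slot & TOY_SWAP_TAG) != 0;
}

/*
 * Count swapped-out shmem pages in a mapping made of ntables page tables.
 * tables[t] is NULL for a hole, otherwise an array of TOY_PTES_PER_TABLE
 * flags (non-zero means the pte is present). slots[] is the page cache of
 * the backing object, indexed by page offset.
 *
 * With visit_holes == 0 only existing page tables are examined, which is
 * what a per-pte callback alone can see. With visit_holes != 0 the slots
 * behind holes are examined as well.
 */
size_t toy_count_swapped(const int *const *tables, size_t ntables,
			 void *const *slots, int visit_holes)
{
	size_t swapped = 0;

	for (size_t t = 0; t < ntables; t++) {
		const int *table = tables[t];

		if (!table && !visit_holes)
			continue;	/* the hole is skipped entirely */

		for (size_t i = 0; i < TOY_PTES_PER_TABLE; i++) {
			size_t pgoff = t * TOY_PTES_PER_TABLE + i;

			if (table && table[i])
				continue;	/* present pte: resident */
			if (toy_slot_is_swap(slots[pgoff]))
				swapped++;
		}
	}
	return swapped;
}
```

In this model, a mapping whose second page table was never allocated hides
every swapped-out page behind it unless holes are visited. That is the
undercount described in the reply above.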
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
From: Michael Kerrisk @ 2015-02-27 10:38 UTC
To: Vlastimil Babka
Cc: linux-mm, Jerome Marchand, Linux Kernel, Andrew Morton, linux-doc,
    Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
    Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
    Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
    Linux API

[CC += linux-api@]

On Thu, Feb 26, 2015 at 2:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change the
> pte to a swap entry, but to pte_none, when swapping the page out. In the smaps
> page walk, such a page thus looks like it was never faulted in.
>
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
>
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will also count pages that the process in question
> never accessed, but that another process populated and then let become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  Documentation/filesystems/proc.txt |  3 ++-
>  fs/proc/task_mmu.c                 | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index d4f56ec..8b30543 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -437,7 +437,8 @@ indicates the amount of memory currently marked as referenced or accessed.
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
> -swap.
> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
> +underlying shmem object is on swap.
>
>  "VmFlags" field deserves a separate description. This member represents the kernel
>  flags associated with the particular virtual memory area in two letter encoded
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 956b75d..0410309 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -13,6 +13,7 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  #include <linux/mmu_notifier.h>
> +#include <linux/shmem_fs.h>
>
>  #include <asm/elf.h>
>  #include <asm/uaccess.h>
> @@ -496,6 +497,25 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>  			mss->swap += PAGE_SIZE;
>  		else if (is_migration_entry(swpent))
>  			page = migration_entry_to_page(swpent);
> +	} else if (IS_ENABLED(CONFIG_SHMEM) && IS_ENABLED(CONFIG_SWAP) &&
> +					pte_none(*pte) && vma->vm_file) {
> +		struct address_space *mapping =
> +			file_inode(vma->vm_file)->i_mapping;
> +
> +		/*
> +		 * shmem does not use swap pte's so we have to consult
> +		 * the radix tree to account for swap
> +		 */
> +		if (shmem_mapping(mapping)) {
> +			page = find_get_entry(mapping, pgoff);
> +			if (page) {
> +				if (radix_tree_exceptional_entry(page))
> +					mss->swap += PAGE_SIZE;
> +				else
> +					page_cache_release(page);
> +			}
> +			page = NULL;
> +		}
>  	}
>
>  	if (!page)
> --
> 2.1.4

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
  2015-02-26 13:51 ` Vlastimil Babka
@ 2015-03-11 12:30   ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-03-11 12:30 UTC
To: Vlastimil Babka
Cc: linux-mm@kvack.org, Jerome Marchand, Linux Kernel Mailing List,
    Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov

On Thu, Feb 26, 2015 at 4:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change pte
> to swap entry, but pte_none when swapping the page out. In the smaps page
> walk, such page thus looks like it was never faulted in.

Maybe just add the count of swap entries allocated by the mapped shmem
object to the swap usage of this vma? That isn't exactly correct for
partially mapped shmem, but that is a weird case anyway.

>
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
>
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will count also pages that the process in question never
> accessed, but only another process populated them and then let them become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  Documentation/filesystems/proc.txt |  3 ++-
>  fs/proc/task_mmu.c                 | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index d4f56ec..8b30543 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -437,7 +437,8 @@ indicates the amount of memory currently marked as referenced or accessed.
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
> -swap.
> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
> +underlying shmem object is on swap.
>
>  "VmFlags" field deserves a separate description. This member represents the kernel
>  flags associated with the particular virtual memory area in two letter encoded
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 956b75d..0410309 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -13,6 +13,7 @@
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
>  #include <linux/mmu_notifier.h>
> +#include <linux/shmem_fs.h>
>
>  #include <asm/elf.h>
>  #include <asm/uaccess.h>
> @@ -496,6 +497,25 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>  			mss->swap += PAGE_SIZE;
>  		else if (is_migration_entry(swpent))
>  			page = migration_entry_to_page(swpent);
> +	} else if (IS_ENABLED(CONFIG_SHMEM) && IS_ENABLED(CONFIG_SWAP) &&
> +					pte_none(*pte) && vma->vm_file) {
> +		struct address_space *mapping =
> +			file_inode(vma->vm_file)->i_mapping;
> +
> +		/*
> +		 * shmem does not use swap pte's so we have to consult
> +		 * the radix tree to account for swap
> +		 */
> +		if (shmem_mapping(mapping)) {
> +			page = find_get_entry(mapping, pgoff);
> +			if (page) {
> +				if (radix_tree_exceptional_entry(page))
> +					mss->swap += PAGE_SIZE;
> +				else
> +					page_cache_release(page);
> +			}
> +			page = NULL;
> +		}
>  	}
>
>  	if (!page)
> --
> 2.1.4
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
  2015-03-11 12:30 ` Konstantin Khlebnikov
@ 2015-03-11 15:03   ` Konstantin Khlebnikov
  2015-03-11 15:26     ` Jerome Marchand
From: Konstantin Khlebnikov @ 2015-03-11 15:03 UTC
To: Konstantin Khlebnikov, Vlastimil Babka
Cc: linux-mm@kvack.org, Jerome Marchand, Linux Kernel Mailing List,
    Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov

[-- Attachment #1: Type: text/plain, Size: 4611 bytes --]

On 11.03.2015 15:30, Konstantin Khlebnikov wrote:
> On Thu, Feb 26, 2015 at 4:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
>> mappings, even if the mapped portion does contain pages that were swapped out.
>> This is because unlike private anonymous mappings, shmem does not change pte
>> to swap entry, but pte_none when swapping the page out. In the smaps page
>> walk, such page thus looks like it was never faulted in.
>
> Maybe just add the count of swap entries allocated by the mapped shmem
> object to the swap usage of this vma? That isn't exactly correct for
> partially mapped shmem, but that is a weird case anyway.

Something like that (see patch in attachment)

>
>>
>> This patch changes smaps_pte_entry() to determine the swap status for such
>> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
>> Swapped out pages are thus accounted for.
>>
>> The accounting is arguably still not as precise as for private anonymous
>> mappings, since now we will count also pages that the process in question never
>> accessed, but only another process populated them and then let them become
>> swapped out. I believe it is still less confusing and subtle than not showing
>> any swap usage by shmem mappings at all. Also, swapped out pages only become a
>> performance issue for future accesses, and we cannot predict those for either
>> kind of mapping.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>  Documentation/filesystems/proc.txt |  3 ++-
>>  fs/proc/task_mmu.c                 | 20 ++++++++++++++++++++
>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index d4f56ec..8b30543 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -437,7 +437,8 @@ indicates the amount of memory currently marked as referenced or accessed.
>>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>  and a page is modified, the file page is replaced by a private anonymous copy.
>>  "Swap" shows how much would-be-anonymous memory is also used, but out on
>> -swap.
>> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
>> +underlying shmem object is on swap.
>>
>>  "VmFlags" field deserves a separate description. This member represents the kernel
>>  flags associated with the particular virtual memory area in two letter encoded
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 956b75d..0410309 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/swap.h>
>>  #include <linux/swapops.h>
>>  #include <linux/mmu_notifier.h>
>> +#include <linux/shmem_fs.h>
>>
>>  #include <asm/elf.h>
>>  #include <asm/uaccess.h>
>> @@ -496,6 +497,25 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>>  			mss->swap += PAGE_SIZE;
>>  		else if (is_migration_entry(swpent))
>>  			page = migration_entry_to_page(swpent);
>> +	} else if (IS_ENABLED(CONFIG_SHMEM) && IS_ENABLED(CONFIG_SWAP) &&
>> +					pte_none(*pte) && vma->vm_file) {
>> +		struct address_space *mapping =
>> +			file_inode(vma->vm_file)->i_mapping;
>> +
>> +		/*
>> +		 * shmem does not use swap pte's so we have to consult
>> +		 * the radix tree to account for swap
>> +		 */
>> +		if (shmem_mapping(mapping)) {
>> +			page = find_get_entry(mapping, pgoff);
>> +			if (page) {
>> +				if (radix_tree_exceptional_entry(page))
>> +					mss->swap += PAGE_SIZE;
>> +				else
>> +					page_cache_release(page);
>> +			}
>> +			page = NULL;
>> +		}
>>  	}
>>
>>  	if (!page)
>> --
>> 2.1.4

[-- Attachment #2: shmem-show-swap-usage-in-smaps --]
[-- Type: text/plain, Size: 1917 bytes --]

shmem: show swap usage in smaps

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/proc/task_mmu.c |    3 +++
 include/linux/mm.h |    2 ++
 mm/shmem.c         |    8 ++++++++
 3 files changed, 13 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 956b75d61809..09a94cec159e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -624,6 +624,9 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	/* mmap_sem is held in m_start */
 	walk_page_vma(vma, &smaps_walk);
 
+	if (vma->vm_ops && vma->vm_ops->get_swap_usage)
+		mss.swap += vma->vm_ops->get_swap_usage(vma) << PAGE_SHIFT;
+
 	show_map_vma(m, vma, is_pid);
 
 	seq_printf(m,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6571dd78e984..477a46987859 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -292,6 +292,8 @@ struct vm_operations_struct {
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
 					  unsigned long addr);
+
+	unsigned long (*get_swap_usage)(struct vm_area_struct *vma);
 };
 
 struct mmu_gather;
diff --git a/mm/shmem.c b/mm/shmem.c
index cf2d0ca010bc..492f78f51fc2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1363,6 +1363,13 @@ static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
 }
 #endif
 
+static unsigned long shmem_get_swap_usage(struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+
+	return SHMEM_I(inode)->swapped;
+}
+
 int shmem_lock(struct file *file, int lock, struct user_struct *user)
 {
 	struct inode *inode = file_inode(file);
@@ -3198,6 +3205,7 @@ static const struct vm_operations_struct shmem_vm_ops = {
 	.set_policy     = shmem_set_policy,
 	.get_policy     = shmem_get_policy,
 #endif
+	.get_swap_usage	= shmem_get_swap_usage,
 };
 
 static struct dentry *shmem_mount(struct file_system_type *fs_type,
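To illustrate the "not exactly correct for partially mapped shmem" caveat of the attached approach, here is a small userspace sketch. Everything in it (the toy_shmem struct and both functions) is hypothetical and only models the idea: a whole-object counter, like the per-inode value returned by shmem_get_swap_usage(), is compared with a count restricted to the page range a VMA actually maps.

```c
#include <stddef.h>

/* Toy shmem object: swapped_slot[i] != 0 means page i is out on swap. */
struct toy_shmem {
	const unsigned char *swapped_slot;
	size_t nr_pages;
};

/* Whole-object count, analogous to reading a per-inode "swapped" counter. */
static size_t toy_object_swapped(const struct toy_shmem *obj)
{
	size_t i, n = 0;

	for (i = 0; i < obj->nr_pages; i++)
		n += obj->swapped_slot[i] != 0;
	return n;
}

/* Count only the pages covered by a VMA mapping [pgoff, pgoff + len). */
static size_t toy_range_swapped(const struct toy_shmem *obj,
				size_t pgoff, size_t len)
{
	size_t i, n = 0;

	for (i = pgoff; i < pgoff + len && i < obj->nr_pages; i++)
		n += obj->swapped_slot[i] != 0;
	return n;
}
```

For an 8-page object with pages 0, 1 and 6 on swap, a VMA that maps only pages 4..7 covers one swapped page, but the whole-object counter would attribute all three to it. Several VMAs mapping the same object would each be charged the full count as well.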
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
  2015-03-11 15:03 ` Konstantin Khlebnikov
@ 2015-03-11 15:26   ` Jerome Marchand
  2015-03-11 16:31     ` Konstantin Khlebnikov
From: Jerome Marchand @ 2015-03-11 15:26 UTC
To: Konstantin Khlebnikov, Konstantin Khlebnikov, Vlastimil Babka
Cc: linux-mm@kvack.org, Linux Kernel Mailing List, Andrew Morton,
    linux-doc, Hugh Dickins, Michal Hocko, Kirill A. Shutemov,
    Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
    Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov

On 03/11/2015 04:03 PM, Konstantin Khlebnikov wrote:
> On 11.03.2015 15:30, Konstantin Khlebnikov wrote:
>> On Thu, Feb 26, 2015 at 4:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
>>> mappings, even if the mapped portion does contain pages that were
>>> swapped out.
>>> This is because unlike private anonymous mappings, shmem does not change
>>> pte to swap entry, but pte_none when swapping the page out. In the smaps
>>> page walk, such page thus looks like it was never faulted in.
>>
>> Maybe just add the count of swap entries allocated by the mapped shmem
>> object to the swap usage of this vma? That isn't exactly correct for
>> partially mapped shmem, but that is a weird case anyway.
>
> Something like that (see patch in attachment)
>

-8<---

diff --git a/mm/shmem.c b/mm/shmem.c
index cf2d0ca010bc..492f78f51fc2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1363,6 +1363,13 @@ static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
 }
 #endif
 
+static unsigned long shmem_get_swap_usage(struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+
+	return SHMEM_I(inode)->swapped;
+}
+
 int shmem_lock(struct file *file, int lock, struct user_struct *user)
 {
 	struct inode *inode = file_inode(file);

-8<---

That will not work for shared anonymous mappings, since they all share the
same vm_file (/dev/zero).

Jerome
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
  2015-03-11 15:26 ` Jerome Marchand
@ 2015-03-11 16:31   ` Konstantin Khlebnikov
From: Konstantin Khlebnikov @ 2015-03-11 16:31 UTC
To: Jerome Marchand, Konstantin Khlebnikov, Vlastimil Babka
Cc: linux-mm@kvack.org, Linux Kernel Mailing List, Andrew Morton,
    linux-doc, Hugh Dickins, Michal Hocko, Kirill A. Shutemov,
    Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
    Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov

On 11.03.2015 18:26, Jerome Marchand wrote:
> On 03/11/2015 04:03 PM, Konstantin Khlebnikov wrote:
>> On 11.03.2015 15:30, Konstantin Khlebnikov wrote:
>>> On Thu, Feb 26, 2015 at 4:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
>>>> mappings, even if the mapped portion does contain pages that were
>>>> swapped out.
>>>> This is because unlike private anonymous mappings, shmem does not change
>>>> pte to swap entry, but pte_none when swapping the page out. In the smaps
>>>> page walk, such page thus looks like it was never faulted in.
>>>
>>> Maybe just add the count of swap entries allocated by the mapped shmem
>>> object to the swap usage of this vma? That isn't exactly correct for
>>> partially mapped shmem, but that is a weird case anyway.
>>
>> Something like that (see patch in attachment)
>>
>
> -8<---
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index cf2d0ca010bc..492f78f51fc2 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1363,6 +1363,13 @@ static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
>  }
>  #endif
>
> +static unsigned long shmem_get_swap_usage(struct vm_area_struct *vma)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	return SHMEM_I(inode)->swapped;
> +}
> +
>  int shmem_lock(struct file *file, int lock, struct user_struct *user)
>  {
>  	struct inode *inode = file_inode(file);
>
> -8<---
>
> That will not work for shared anonymous mappings, since they all share the
> same vm_file (/dev/zero).

Nope. They have different files and inodes. They're just called
"/dev/zero (deleted)".

>
> Jerome
>

--
Konstantin
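One way to check the point about distinct inodes from userspace is to compare the inode column of /proc/<pid>/maps for two shared anonymous mappings. Below is a minimal sketch of a parser for one maps line; the helper name maps_line_inode() is made up, and the field layout (range, permissions, offset, device, inode, path) is assumed from the usual maps format.

```c
#include <stdio.h>

/*
 * Extract the start address and inode number from one /proc/<pid>/maps
 * line. Returns 0 on success, -1 if the line does not have the expected
 * "start-end perms offset major:minor inode" prefix.
 */
static int maps_line_inode(const char *line, unsigned long long *start,
			   unsigned long long *inode)
{
	unsigned long long end, offset;
	unsigned int major, minor;
	char perms[5];

	if (sscanf(line, "%llx-%llx %4s %llx %x:%x %llu",
		   start, &end, perms, &offset, &major, &minor, inode) != 7)
		return -1;
	return 0;
}
```

To use it, a process could create two mappings with mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0), read /proc/self/maps line by line, and compare the inode values reported for the two start addresses. If the reply above is right, both lines carry the name "/dev/zero (deleted)" but different inode numbers.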
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
  2015-03-11 12:30 ` Konstantin Khlebnikov
@ 2015-03-11 19:10   ` Vlastimil Babka
From: Vlastimil Babka @ 2015-03-11 19:10 UTC
To: Konstantin Khlebnikov
Cc: linux-mm@kvack.org, Jerome Marchand, Linux Kernel Mailing List,
    Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov

On 03/11/2015 01:30 PM, Konstantin Khlebnikov wrote:
> On Thu, Feb 26, 2015 at 4:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
>> mappings, even if the mapped portion does contain pages that were swapped out.
>> This is because unlike private anonymous mappings, shmem does not change pte
>> to swap entry, but pte_none when swapping the page out. In the smaps page
>> walk, such page thus looks like it was never faulted in.
>
> Maybe just add the count of swap entries allocated by the mapped shmem
> object to the swap usage of this vma? That isn't exactly correct for
> partially mapped shmem, but that is a weird case anyway.

Yeah, for the next version I want to add a patch optimizing for the
(hopefully) common cases:

1. SHMEM_I(inode)->swapped is 0 - no need to consult the radix tree
2. the shmem inode is mapped fully (I hope it's ok to just compare its size
   and the mapping size) - just use the value of SHMEM_I(inode)->swapped
   like you suggest
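The two fast paths listed above can be sketched as a small decision function. This is a userspace model under stated assumptions, not the planned patch: the toy_object struct, its fields and both function names are hypothetical, and "mapped fully" is modelled as a VMA that starts at offset 0 and is at least as long as the object, which is one possible reading of "compare its size and the mapping size".

```c
#include <stddef.h>

struct toy_object {
	const unsigned char *swapped;	/* swapped[i] != 0: page i is on swap */
	size_t nr_pages;		/* object size in pages */
	size_t nr_swapped;		/* cached total, like a per-inode counter */
	size_t slow_walks;		/* how often the slow path ran */
};

/* Slow path: look at every page offset the VMA covers. */
static size_t toy_walk_range(struct toy_object *obj, size_t pgoff, size_t len)
{
	size_t i, n = 0;

	obj->slow_walks++;
	for (i = pgoff; i < pgoff + len && i < obj->nr_pages; i++)
		n += obj->swapped[i] != 0;
	return n;
}

/* Swapped pages attributed to a VMA mapping [pgoff, pgoff + len). */
static size_t toy_vma_swap_pages(struct toy_object *obj, size_t pgoff, size_t len)
{
	if (obj->nr_swapped == 0)	/* case 1: nothing is on swap */
		return 0;
	if (pgoff == 0 && len >= obj->nr_pages)	/* case 2: fully mapped */
		return obj->nr_swapped;
	return toy_walk_range(obj, pgoff, len);	/* partial mapping */
}
```

The slow_walks field exists only so the sketch can show that neither fast path touches the per-page data; the per-page walk runs only for a partial mapping of an object that has something on swap.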
* Re: [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps
  2015-03-11 19:10 ` Vlastimil Babka
@ 2015-03-11 20:03 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 37+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-11 20:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm@kvack.org, Jerome Marchand, Linux Kernel Mailing List,
	Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko,
	Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
	Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov

On Wed, Mar 11, 2015 at 10:10 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 03/11/2015 01:30 PM, Konstantin Khlebnikov wrote:
>>
>> On Thu, Feb 26, 2015 at 4:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>
>>> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
>>> mappings, even if the mapped portion does contain pages that were swapped
>>> out.
>>> This is because unlike private anonymous mappings, shmem does not change
>>> pte
>>> to swap entry, but pte_none when swapping the page out. In the smaps page
>>> walk, such page thus looks like it was never faulted in.
>>
>>
>> Maybe just add count of swap entries allocated by mapped shmem into
>> swap usage of this vma? That's isn't exactly correct for partially
>> mapped shmem but this is something weird anyway.
>
>
> Yeah for next version I want to add a patch optimizing for the (hopefully)
> common cases:
>
> 1. SHMEM_I(inode)->swapped is 0 - no need to consult radix tree
> 2. shmem inode is mapped fully (I hope it's ok to just compare its size and
> mapping size) - just use the value of SHMEM_I(inode)->swapped like you
> suggest
>

BTW, using a radix tree iterator you can count swap entries without
touching page->count.

Also, a long time ago I suggested marking swap entries in shmem with one of
the radix tree tags -- a tagged iterator is much faster for sparse trees.
(Just for this case it's overkill, but these tags can speed up swapoff.)

^ permalink raw reply	[flat|nested] 37+ messages in thread
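To illustrate the idea of counting swap entries by iterating over the
mapping's slots instead of looking up and pinning each page, here is a small
userspace model. It does not use the kernel's radix tree API: the flat slot
array, the MODEL_EXCEPTIONAL marker bit and the model_* helpers are all
assumptions for illustration. The only property carried over is that a slot
holds either nothing, an aligned page pointer, or a marked non-pointer value
that stands for a swap entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a low bit that is never set in an aligned page pointer. */
#define MODEL_EXCEPTIONAL 0x2UL

/* Encode a (hypothetical) swap entry value as a marked slot. */
static uintptr_t model_swap_slot(unsigned long swp_val)
{
	return ((uintptr_t)swp_val << 2) | MODEL_EXCEPTIONAL;
}

/* A slot counts as swapped out if it carries the marker bit. */
static int model_slot_is_swap(uintptr_t slot)
{
	return (slot & MODEL_EXCEPTIONAL) != 0;
}

/*
 * Count marked slots in the half-open index range [start, end).
 * No page is dereferenced and no reference count is touched; only
 * the slot values themselves are inspected.
 */
static unsigned long model_count_swapped(const uintptr_t *slots, size_t nslots,
					 size_t start, size_t end)
{
	unsigned long swapped = 0;
	size_t i;

	assert(start <= end);
	if (end > nslots)
		end = nslots;	/* clamp to the populated part of the object */

	for (i = start; i < end; i++)
		if (model_slot_is_swap(slots[i]))
			swapped++;

	return swapped;
}
```

A tagged iterator, as suggested above, would additionally let the walk skip
subtrees that contain no swap entries at all; this flat model has no
equivalent of that and always visits every slot in the range.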
* [PATCH 3/4] mm, shmem: Add shmem resident memory accounting
  2015-02-26 13:51 ` Vlastimil Babka
@ 2015-02-26 13:51 ` Vlastimil Babka
  -1 siblings, 0 replies; 37+ messages in thread
From: Vlastimil Babka @ 2015-02-26 13:51 UTC (permalink / raw)
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Vlastimil Babka

From: Jerome Marchand <jmarchan@redhat.com>

Currently looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem
pages are mapped to /dev/zero), even though their implication in
actual memory use is quite different.
This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
shmem pages instead of MM_FILEPAGES.

[vbabka@suse.cz: port to 4.0, add #ifdefs, mm_counter_file() variant]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 arch/s390/mm/pgtable.c   |  5 +----
 fs/proc/task_mmu.c       |  4 +++-
 include/linux/mm.h       | 28 ++++++++++++++++++++++++++++
 include/linux/mm_types.h |  9 ++++++---
 kernel/events/uprobes.c  |  2 +-
 mm/memory.c              | 30 ++++++++++--------------------
 mm/oom_kill.c            |  5 +++--
 mm/rmap.c                | 15 ++++-----------
 8 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index b2c1542..5bffd5d 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -617,10 +617,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
 	else if (is_migration_entry(entry)) {
 		struct page *page = migration_entry_to_page(entry);
 
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	}
 	free_swap_and_cache(entry);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 0410309..d70334c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -81,7 +81,8 @@ unsigned long task_statm(struct mm_struct *mm,
 			 unsigned long *shared, unsigned long *text,
 			 unsigned long *data, unsigned long *resident)
 {
-	*shared = get_mm_counter(mm, MM_FILEPAGES);
+	*shared = get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter(mm, MM_SHMEMPAGES);
 	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
 								>> PAGE_SHIFT;
 	*data = mm->total_vm - mm->shared_vm;
@@ -501,6 +502,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 			pte_none(*pte) && vma->vm_file) {
 		struct address_space *mapping =
 			file_inode(vma->vm_file)->i_mapping;
+		pgoff_t pgoff = linear_page_index(vma, addr);
 
 		/*
 		 * shmem does not use swap pte's so we have to consult
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..adfbb5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1364,6 +1364,16 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
 	return (unsigned long)val;
 }
 
+/* A wrapper for the CONFIG_SHMEM dependent counter */
+static inline unsigned long get_mm_counter_shmem(struct mm_struct *mm)
+{
+#ifdef CONFIG_SHMEM
+	return get_mm_counter(mm, MM_SHMEMPAGES);
+#else
+	return 0;
+#endif
+}
+
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
 	atomic_long_add(value, &mm->rss_stat.count[member]);
@@ -1379,9 +1389,27 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
 	atomic_long_dec(&mm->rss_stat.count[member]);
 }
 
+/* Optimized variant when page is already known not to be PageAnon */
+static inline int mm_counter_file(struct page *page)
+{
+#ifdef CONFIG_SHMEM
+	if (PageSwapBacked(page))
+		return MM_SHMEMPAGES;
+#endif
+	return MM_FILEPAGES;
+}
+
+static inline int mm_counter(struct page *page)
+{
+	if (PageAnon(page))
+		return MM_ANONPAGES;
+	return mm_counter_file(page);
+}
+
 static inline unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	return get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter_shmem(mm) +
 		get_mm_counter(mm, MM_ANONPAGES);
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 199a03a..d3c2372 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -327,9 +327,12 @@ struct core_state {
 };
 
 enum {
-	MM_FILEPAGES,
-	MM_ANONPAGES,
-	MM_SWAPENTS,
+	MM_FILEPAGES,	/* Resident file mapping pages */
+	MM_ANONPAGES,	/* Resident anonymous pages */
+	MM_SWAPENTS,	/* Anonymous swap entries */
+#ifdef CONFIG_SHMEM
+	MM_SHMEMPAGES,	/* Resident shared memory pages */
+#endif
 	NR_MM_COUNTERS
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index cb346f2..0a08fdd 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -188,7 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	lru_cache_add_active_or_unevictable(kpage, vma);
 
 	if (!PageAnon(page)) {
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..f145d9e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		} else if (is_migration_entry(entry)) {
 			page = migration_entry_to_page(entry);
 
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]++;
-			else
-				rss[MM_FILEPAGES]++;
+			rss[mm_counter(page)]++;
 
 			if (is_write_migration_entry(entry) &&
 					is_cow_mapping(vm_flags)) {
@@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (page) {
 		get_page(page);
 		page_dup_rmap(page);
-		if (PageAnon(page))
-			rss[MM_ANONPAGES]++;
-		else
-			rss[MM_FILEPAGES]++;
+		rss[mm_counter(page)]++;
 	}
 
 out_set_pte:
@@ -1113,9 +1107,8 @@ again:
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else {
+
+			if (!PageAnon(page)) {
 				if (pte_dirty(ptent)) {
 					force_flush = 1;
 					set_page_dirty(page);
@@ -1123,8 +1116,8 @@ again:
 				if (pte_young(ptent) &&
 				    likely(!(vma->vm_flags & VM_SEQ_READ)))
 					mark_page_accessed(page);
-				rss[MM_FILEPAGES]--;
 			}
+			rss[mm_counter(page)]--;
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
@@ -1146,11 +1139,7 @@ again:
 				struct page *page;
 
 				page = migration_entry_to_page(entry);
-
-				if (PageAnon(page))
-					rss[MM_ANONPAGES]--;
-				else
-					rss[MM_FILEPAGES]--;
+				rss[mm_counter(page)]--;
 			}
 			if (unlikely(!free_swap_and_cache(entry)))
 				print_bad_pte(vma, addr, ptent, NULL);
@@ -1460,7 +1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 
 	/* Ok, finally just insert the thing.. */
 	get_page(page);
-	inc_mm_counter_fast(mm, MM_FILEPAGES);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
 	page_add_file_rmap(page);
 	set_pte_at(mm, addr, pte, mk_pte(page, prot));
 
@@ -2174,7 +2163,8 @@ gotten:
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				dec_mm_counter_fast(mm,
+						mm_counter_file(old_page));
 				inc_mm_counter_fast(mm, MM_ANONPAGES);
 			}
 		} else
@@ -2703,7 +2693,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, address);
 	} else {
-		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
+		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page);
 	}
 	set_pte_at(vma->vm_mm, address, pte, entry);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 642f38c..a5ee3a2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -573,10 +573,11 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	/* mm cannot safely be dereferenced after task_unlock(victim) */
 	mm = victim->mm;
 	mark_tsk_oom_victim(victim);
-	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
+	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
-		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
+		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
+		K(get_mm_counter_shmem(victim->mm)));
 	task_unlock(victim);
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 5e3e090..e3c4392 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1216,12 +1216,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	update_hiwater_rss(mm);
 
 	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
-		if (!PageHuge(page)) {
-			if (PageAnon(page))
-				dec_mm_counter(mm, MM_ANONPAGES);
-			else
-				dec_mm_counter(mm, MM_FILEPAGES);
-		}
+		if (!PageHuge(page))
+			dec_mm_counter(mm, mm_counter(page));
 		set_pte_at(mm, address, pte,
 			   swp_entry_to_pte(make_hwpoison_entry(page)));
 	} else if (pte_unused(pteval)) {
@@ -1230,10 +1226,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 * interest anymore. Simply discard the pte, vmscan
 		 * will take care of the rest.
 		 */
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		pte_t swp_pte;
@@ -1276,7 +1269,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		entry = make_migration_entry(page, pte_write(pteval));
 		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
 	} else
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 
 	page_remove_rmap(page);
 	page_cache_release(page);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 37+ messages in thread
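The classification that mm_counter() and mm_counter_file() perform in the
patch above can be tried out in userspace with a small model. Everything
named model_* below is a stand-in invented for illustration (two booleans
instead of real page flags, a plain long array instead of the atomic
per-mm counters), and the model always behaves as if CONFIG_SHMEM were
enabled. The order of the tests mirrors the patch: the anonymous check comes
first, since anonymous pages are normally swap-backed too and would otherwise
be miscounted as shmem.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the counter indices, with the shmem counter always present. */
enum model_counter {
	MODEL_FILEPAGES,	/* resident file mapping pages */
	MODEL_ANONPAGES,	/* resident anonymous pages */
	MODEL_SWAPENTS,		/* anonymous swap entries */
	MODEL_SHMEMPAGES,	/* resident shared memory pages */
	MODEL_NR_COUNTERS
};

/* Hypothetical page descriptor: two flags stand in for the page flag tests. */
struct model_page {
	bool anon;		/* models PageAnon() */
	bool swap_backed;	/* models PageSwapBacked() */
};

/* Hypothetical mm: plain (non-atomic) counters. */
struct model_mm {
	long count[MODEL_NR_COUNTERS];
};

/* Variant for pages already known not to be anonymous. */
static int model_counter_file(const struct model_page *page)
{
	return page->swap_backed ? MODEL_SHMEMPAGES : MODEL_FILEPAGES;
}

/* General variant: anonymous first, then shmem versus regular file. */
static int model_counter(const struct model_page *page)
{
	if (page->anon)
		return MODEL_ANONPAGES;
	return model_counter_file(page);
}

/* Account a map (+1) or unmap (-1) of the page against the mm. */
static void model_account(struct model_mm *mm, const struct model_page *page,
			  long delta)
{
	int member = model_counter(page);

	assert(member >= 0 && member < MODEL_NR_COUNTERS);
	mm->count[member] += delta;
}

/* RSS is the sum of the three resident counters; swap entries are excluded. */
static long model_rss(const struct model_mm *mm)
{
	return mm->count[MODEL_FILEPAGES] +
	       mm->count[MODEL_SHMEMPAGES] +
	       mm->count[MODEL_ANONPAGES];
}
```

Mapping one anonymous page, one shmem page and one regular file page leaves
each of the three resident counters at 1 and the modelled RSS at 3, so the
total is unchanged by the split; only the breakdown gains a category.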
* Re: [PATCH 3/4] mm, shmem: Add shmem resident memory accounting
  2015-02-26 13:51 ` Vlastimil Babka
  (?)
@ 2015-02-26 14:59 ` Jerome Marchand
  2015-03-27 16:39 ` Vlastimil Babka
  -1 siblings, 1 reply; 37+ messages in thread
From: Jerome Marchand @ 2015-02-26 14:59 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov

[-- Attachment #1: Type: text/plain, Size: 10706 bytes --]

On 02/26/2015 02:51 PM, Vlastimil Babka wrote:
> From: Jerome Marchand <jmarchan@redhat.com>
>
> Currently looking at /proc/<pid>/status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implication in
> actual memory use is quite different.
> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
> shmem pages instead of MM_FILEPAGES.
>
> [vbabka@suse.cz: port to 4.0, add #ifdefs, mm_counter_file() variant]
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  arch/s390/mm/pgtable.c   |  5 +----
>  fs/proc/task_mmu.c       |  4 +++-
>  include/linux/mm.h       | 28 ++++++++++++++++++++++++++++
>  include/linux/mm_types.h |  9 ++++++---
>  kernel/events/uprobes.c  |  2 +-
>  mm/memory.c              | 30 ++++++++++--------------------
>  mm/oom_kill.c            |  5 +++--
>  mm/rmap.c                | 15 ++++-----------
>  8 files changed, 56 insertions(+), 42 deletions(-)
>
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index b2c1542..5bffd5d 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -617,10 +617,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
>  	else if (is_migration_entry(entry)) {
>  		struct page *page = migration_entry_to_page(entry);
>
> -		if (PageAnon(page))
> -			dec_mm_counter(mm, MM_ANONPAGES);
> -		else
> -			dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter(page));
>  	}
>  	free_swap_and_cache(entry);
>  }
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 0410309..d70334c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -81,7 +81,8 @@ unsigned long task_statm(struct mm_struct *mm,
>  			 unsigned long *shared, unsigned long *text,
>  			 unsigned long *data, unsigned long *resident)
>  {
> -	*shared = get_mm_counter(mm, MM_FILEPAGES);
> +	*shared = get_mm_counter(mm, MM_FILEPAGES) +
> +		get_mm_counter(mm, MM_SHMEMPAGES);
>  	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>  								>> PAGE_SHIFT;
>  	*data = mm->total_vm - mm->shared_vm;
> @@ -501,6 +502,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>  			pte_none(*pte) && vma->vm_file) {
>  		struct address_space *mapping =
>  			file_inode(vma->vm_file)->i_mapping;
> +		pgoff_t pgoff = linear_page_index(vma, addr);
>
>  		/*
>  		 * shmem does not use swap pte's so we have to consult
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 47a9392..adfbb5b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1364,6 +1364,16 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
>  	return (unsigned long)val;
>  }
>
> +/* A wrapper for the CONFIG_SHMEM dependent counter */
> +static inline unsigned long get_mm_counter_shmem(struct mm_struct *mm)
> +{
> +#ifdef CONFIG_SHMEM
> +	return get_mm_counter(mm, MM_SHMEMPAGES);
> +#else
> +	return 0;
> +#endif
> +}
> +
>  static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
>  {
>  	atomic_long_add(value, &mm->rss_stat.count[member]);
> @@ -1379,9 +1389,27 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
>  	atomic_long_dec(&mm->rss_stat.count[member]);
>  }
>
> +/* Optimized variant when page is already known not to be PageAnon */
> +static inline int mm_counter_file(struct page *page)

Just a nitpick, but I don't like that name as it keeps the confusion we
currently have between shmem and file backed pages. I'm not sure what
other name to use though. mm_counter_shared() maybe? I'm not sure it is
less confusing...

Jerome

> +{
> +#ifdef CONFIG_SHMEM
> +	if (PageSwapBacked(page))
> +		return MM_SHMEMPAGES;
> +#endif
> +	return MM_FILEPAGES;
> +}
> +
> +static inline int mm_counter(struct page *page)
> +{
> +	if (PageAnon(page))
> +		return MM_ANONPAGES;
> +	return mm_counter_file(page);
> +}
> +
>  static inline unsigned long get_mm_rss(struct mm_struct *mm)
>  {
>  	return get_mm_counter(mm, MM_FILEPAGES) +
> +		get_mm_counter_shmem(mm) +
>  		get_mm_counter(mm, MM_ANONPAGES);
>  }
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 199a03a..d3c2372 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -327,9 +327,12 @@ struct core_state {
>  };
>
>  enum {
> -	MM_FILEPAGES,
> -	MM_ANONPAGES,
> -	MM_SWAPENTS,
> +	MM_FILEPAGES,	/* Resident file mapping pages */
> +	MM_ANONPAGES,	/* Resident anonymous pages */
> +	MM_SWAPENTS,	/* Anonymous swap entries */
> +#ifdef CONFIG_SHMEM
> +	MM_SHMEMPAGES,	/* Resident shared memory pages */
> +#endif
>  	NR_MM_COUNTERS
>  };
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index cb346f2..0a08fdd 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -188,7 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  	lru_cache_add_active_or_unevictable(kpage, vma);
>
>  	if (!PageAnon(page)) {
> -		dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter_file(page));
>  		inc_mm_counter(mm, MM_ANONPAGES);
>  	}
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8068893..f145d9e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  		} else if (is_migration_entry(entry)) {
>  			page = migration_entry_to_page(entry);
>
> -			if (PageAnon(page))
> -				rss[MM_ANONPAGES]++;
> -			else
> -				rss[MM_FILEPAGES]++;
> +			rss[mm_counter(page)]++;
>
>  			if (is_write_migration_entry(entry) &&
>  					is_cow_mapping(vm_flags)) {
> @@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	if (page) {
>  		get_page(page);
>  		page_dup_rmap(page);
> -		if (PageAnon(page))
> -			rss[MM_ANONPAGES]++;
> -		else
> -			rss[MM_FILEPAGES]++;
> +		rss[mm_counter(page)]++;
>  	}
>
>  out_set_pte:
> @@ -1113,9 +1107,8 @@ again:
>  			tlb_remove_tlb_entry(tlb, pte, addr);
>  			if (unlikely(!page))
>  				continue;
> -			if (PageAnon(page))
> -				rss[MM_ANONPAGES]--;
> -			else {
> +
> +			if (!PageAnon(page)) {
>  				if (pte_dirty(ptent)) {
>  					force_flush = 1;
>  					set_page_dirty(page);
> @@ -1123,8 +1116,8 @@ again:
>  				if (pte_young(ptent) &&
>  				    likely(!(vma->vm_flags & VM_SEQ_READ)))
>  					mark_page_accessed(page);
> -				rss[MM_FILEPAGES]--;
>  			}
> +			rss[mm_counter(page)]--;
>  			page_remove_rmap(page);
>  			if (unlikely(page_mapcount(page) < 0))
>  				print_bad_pte(vma, addr, ptent, page);
> @@ -1146,11 +1139,7 @@ again:
>  				struct page *page;
>
>  				page = migration_entry_to_page(entry);
> -
> -				if (PageAnon(page))
> -					rss[MM_ANONPAGES]--;
> -				else
> -					rss[MM_FILEPAGES]--;
> +				rss[mm_counter(page)]--;
>  			}
>  			if (unlikely(!free_swap_and_cache(entry)))
>  				print_bad_pte(vma, addr, ptent, NULL);
> @@ -1460,7 +1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
>
>  	/* Ok, finally just insert the thing.. */
>  	get_page(page);
> -	inc_mm_counter_fast(mm, MM_FILEPAGES);
> +	inc_mm_counter_fast(mm, mm_counter_file(page));
>  	page_add_file_rmap(page);
>  	set_pte_at(mm, addr, pte, mk_pte(page, prot));
>
> @@ -2174,7 +2163,8 @@ gotten:
>  	if (likely(pte_same(*page_table, orig_pte))) {
>  		if (old_page) {
>  			if (!PageAnon(old_page)) {
> -				dec_mm_counter_fast(mm, MM_FILEPAGES);
> +				dec_mm_counter_fast(mm,
> +						mm_counter_file(old_page));
>  				inc_mm_counter_fast(mm, MM_ANONPAGES);
>  			}
>  		} else
> @@ -2703,7 +2693,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
>  		page_add_new_anon_rmap(page, vma, address);
>  	} else {
> -		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
> +		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
>  		page_add_file_rmap(page);
>  	}
>  	set_pte_at(vma->vm_mm, address, pte, entry);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 642f38c..a5ee3a2 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -573,10 +573,11 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	/* mm cannot safely be dereferenced after task_unlock(victim) */
>  	mm = victim->mm;
>  	mark_tsk_oom_victim(victim);
> -	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> +	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
>  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
>  		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
> -		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
> +		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
> +		K(get_mm_counter_shmem(victim->mm)));
>  	task_unlock(victim);
>
>  	/*
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5e3e090..e3c4392 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1216,12 +1216,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  	update_hiwater_rss(mm);
>
>  	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
> -		if (!PageHuge(page)) {
> -			if (PageAnon(page))
> -				dec_mm_counter(mm, MM_ANONPAGES);
> -			else
> -				dec_mm_counter(mm, MM_FILEPAGES);
> -		}
> +		if (!PageHuge(page))
> +			dec_mm_counter(mm, mm_counter(page));
>  		set_pte_at(mm, address, pte,
>  			   swp_entry_to_pte(make_hwpoison_entry(page)));
>  	} else if (pte_unused(pteval)) {
> @@ -1230,10 +1226,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		 * interest anymore. Simply discard the pte, vmscan
>  		 * will take care of the rest.
>  		 */
> -		if (PageAnon(page))
> -			dec_mm_counter(mm, MM_ANONPAGES);
> -		else
> -			dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter(page));
>  	} else if (PageAnon(page)) {
>  		swp_entry_t entry = { .val = page_private(page) };
>  		pte_t swp_pte;
> @@ -1276,7 +1269,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		entry = make_migration_entry(page, pte_write(pteval));
>  		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>  	} else
> -		dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter_file(page));
>
>  	page_remove_rmap(page);
>  	page_cache_release(page);
>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH 3/4] mm, shmem: Add shmem resident memory accounting 2015-02-26 14:59 ` Jerome Marchand @ 2015-03-27 16:39 ` Vlastimil Babka 0 siblings, 0 replies; 37+ messages in thread From: Vlastimil Babka @ 2015-03-27 16:39 UTC (permalink / raw) To: Jerome Marchand, linux-mm Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov On 02/26/2015 03:59 PM, Jerome Marchand wrote: > On 02/26/2015 02:51 PM, Vlastimil Babka wrote: >> >> +/* Optimized variant when page is already known not to be PageAnon */ >> +static inline int mm_counter_file(struct page *page) > > Just a nitpick, but I don't like that name as it keeps the confusion we > currently have between shmem and file backed pages. I'm not sure what > other name to use though. mm_counter_shared() maybe? I'm not sure it is > less confusing... I think that's also confusing, but differently. Didn't come up with better name, so leaving as it is for v2. Thanks > Jerome > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/4] mm, shmem: Add shmem resident memory accounting 2015-02-26 13:51 ` Vlastimil Babka @ 2015-02-27 10:38 ` Michael Kerrisk -1 siblings, 0 replies; 37+ messages in thread From: Michael Kerrisk @ 2015-02-27 10:38 UTC (permalink / raw) To: Vlastimil Babka Cc: linux-mm, Jerome Marchand, Linux Kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API [CC += linux-api@] On Thu, Feb 26, 2015 at 2:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote: > From: Jerome Marchand <jmarchan@redhat.com> > > Currently looking at /proc/<pid>/status or statm, there is no way to > distinguish shmem pages from pages mapped to a regular file (shmem > pages are mapped to /dev/zero), even though their implication in > actual memory use is quite different. > This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for > shmem pages instead of MM_FILEPAGES. 
> > [vbabka@suse.cz: port to 4.0, add #ifdefs, mm_counter_file() variant] > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > --- > arch/s390/mm/pgtable.c | 5 +---- > fs/proc/task_mmu.c | 4 +++- > include/linux/mm.h | 28 ++++++++++++++++++++++++++++ > include/linux/mm_types.h | 9 ++++++--- > kernel/events/uprobes.c | 2 +- > mm/memory.c | 30 ++++++++++-------------------- > mm/oom_kill.c | 5 +++-- > mm/rmap.c | 15 ++++----------- > 8 files changed, 56 insertions(+), 42 deletions(-) > > diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c > index b2c1542..5bffd5d 100644 > --- a/arch/s390/mm/pgtable.c > +++ b/arch/s390/mm/pgtable.c > @@ -617,10 +617,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm) > else if (is_migration_entry(entry)) { > struct page *page = migration_entry_to_page(entry); > > - if (PageAnon(page)) > - dec_mm_counter(mm, MM_ANONPAGES); > - else > - dec_mm_counter(mm, MM_FILEPAGES); > + dec_mm_counter(mm, mm_counter(page)); > } > free_swap_and_cache(entry); > } > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 0410309..d70334c 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -81,7 +81,8 @@ unsigned long task_statm(struct mm_struct *mm, > unsigned long *shared, unsigned long *text, > unsigned long *data, unsigned long *resident) > { > - *shared = get_mm_counter(mm, MM_FILEPAGES); > + *shared = get_mm_counter(mm, MM_FILEPAGES) + > + get_mm_counter(mm, MM_SHMEMPAGES); > *text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) > >> PAGE_SHIFT; > *data = mm->total_vm - mm->shared_vm; > @@ -501,6 +502,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, > pte_none(*pte) && vma->vm_file) { > struct address_space *mapping = > file_inode(vma->vm_file)->i_mapping; > + pgoff_t pgoff = linear_page_index(vma, addr); > > /* > * shmem does not use swap pte's so we have to consult > diff --git a/include/linux/mm.h 
b/include/linux/mm.h > index 47a9392..adfbb5b 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1364,6 +1364,16 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member) > return (unsigned long)val; > } > > +/* A wrapper for the CONFIG_SHMEM dependent counter */ > +static inline unsigned long get_mm_counter_shmem(struct mm_struct *mm) > +{ > +#ifdef CONFIG_SHMEM > + return get_mm_counter(mm, MM_SHMEMPAGES); > +#else > + return 0; > +#endif > +} > + > static inline void add_mm_counter(struct mm_struct *mm, int member, long value) > { > atomic_long_add(value, &mm->rss_stat.count[member]); > @@ -1379,9 +1389,27 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member) > atomic_long_dec(&mm->rss_stat.count[member]); > } > > +/* Optimized variant when page is already known not to be PageAnon */ > +static inline int mm_counter_file(struct page *page) > +{ > +#ifdef CONFIG_SHMEM > + if (PageSwapBacked(page)) > + return MM_SHMEMPAGES; > +#endif > + return MM_FILEPAGES; > +} > + > +static inline int mm_counter(struct page *page) > +{ > + if (PageAnon(page)) > + return MM_ANONPAGES; > + return mm_counter_file(page); > +} > + > static inline unsigned long get_mm_rss(struct mm_struct *mm) > { > return get_mm_counter(mm, MM_FILEPAGES) + > + get_mm_counter_shmem(mm) + > get_mm_counter(mm, MM_ANONPAGES); > } > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 199a03a..d3c2372 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -327,9 +327,12 @@ struct core_state { > }; > > enum { > - MM_FILEPAGES, > - MM_ANONPAGES, > - MM_SWAPENTS, > + MM_FILEPAGES, /* Resident file mapping pages */ > + MM_ANONPAGES, /* Resident anonymous pages */ > + MM_SWAPENTS, /* Anonymous swap entries */ > +#ifdef CONFIG_SHMEM > + MM_SHMEMPAGES, /* Resident shared memory pages */ > +#endif > NR_MM_COUNTERS > }; > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index cb346f2..0a08fdd 
100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -188,7 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, > lru_cache_add_active_or_unevictable(kpage, vma); > > if (!PageAnon(page)) { > - dec_mm_counter(mm, MM_FILEPAGES); > + dec_mm_counter(mm, mm_counter_file(page)); > inc_mm_counter(mm, MM_ANONPAGES); > } > > diff --git a/mm/memory.c b/mm/memory.c > index 8068893..f145d9e 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, > } else if (is_migration_entry(entry)) { > page = migration_entry_to_page(entry); > > - if (PageAnon(page)) > - rss[MM_ANONPAGES]++; > - else > - rss[MM_FILEPAGES]++; > + rss[mm_counter(page)]++; > > if (is_write_migration_entry(entry) && > is_cow_mapping(vm_flags)) { > @@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, > if (page) { > get_page(page); > page_dup_rmap(page); > - if (PageAnon(page)) > - rss[MM_ANONPAGES]++; > - else > - rss[MM_FILEPAGES]++; > + rss[mm_counter(page)]++; > } > > out_set_pte: > @@ -1113,9 +1107,8 @@ again: > tlb_remove_tlb_entry(tlb, pte, addr); > if (unlikely(!page)) > continue; > - if (PageAnon(page)) > - rss[MM_ANONPAGES]--; > - else { > + > + if (!PageAnon(page)) { > if (pte_dirty(ptent)) { > force_flush = 1; > set_page_dirty(page); > @@ -1123,8 +1116,8 @@ again: > if (pte_young(ptent) && > likely(!(vma->vm_flags & VM_SEQ_READ))) > mark_page_accessed(page); > - rss[MM_FILEPAGES]--; > } > + rss[mm_counter(page)]--; > page_remove_rmap(page); > if (unlikely(page_mapcount(page) < 0)) > print_bad_pte(vma, addr, ptent, page); > @@ -1146,11 +1139,7 @@ again: > struct page *page; > > page = migration_entry_to_page(entry); > - > - if (PageAnon(page)) > - rss[MM_ANONPAGES]--; > - else > - rss[MM_FILEPAGES]--; > + rss[mm_counter(page)]--; > } > if (unlikely(!free_swap_and_cache(entry))) > print_bad_pte(vma, addr, ptent, NULL); > @@ -1460,7 
+1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr, > > /* Ok, finally just insert the thing.. */ > get_page(page); > - inc_mm_counter_fast(mm, MM_FILEPAGES); > + inc_mm_counter_fast(mm, mm_counter_file(page)); > page_add_file_rmap(page); > set_pte_at(mm, addr, pte, mk_pte(page, prot)); > > @@ -2174,7 +2163,8 @@ gotten: > if (likely(pte_same(*page_table, orig_pte))) { > if (old_page) { > if (!PageAnon(old_page)) { > - dec_mm_counter_fast(mm, MM_FILEPAGES); > + dec_mm_counter_fast(mm, > + mm_counter_file(old_page)); > inc_mm_counter_fast(mm, MM_ANONPAGES); > } > } else > @@ -2703,7 +2693,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, > inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); > page_add_new_anon_rmap(page, vma, address); > } else { > - inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES); > + inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page)); > page_add_file_rmap(page); > } > set_pte_at(vma->vm_mm, address, pte, entry); > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 642f38c..a5ee3a2 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -573,10 +573,11 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, > /* mm cannot safely be dereferenced after task_unlock(victim) */ > mm = victim->mm; > mark_tsk_oom_victim(victim); > - pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n", > + pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", > task_pid_nr(victim), victim->comm, K(victim->mm->total_vm), > K(get_mm_counter(victim->mm, MM_ANONPAGES)), > - K(get_mm_counter(victim->mm, MM_FILEPAGES))); > + K(get_mm_counter(victim->mm, MM_FILEPAGES)), > + K(get_mm_counter_shmem(victim->mm))); > task_unlock(victim); > > /* > diff --git a/mm/rmap.c b/mm/rmap.c > index 5e3e090..e3c4392 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1216,12 +1216,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, 
> update_hiwater_rss(mm); > > if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) { > - if (!PageHuge(page)) { > - if (PageAnon(page)) > - dec_mm_counter(mm, MM_ANONPAGES); > - else > - dec_mm_counter(mm, MM_FILEPAGES); > - } > + if (!PageHuge(page)) > + dec_mm_counter(mm, mm_counter(page)); > set_pte_at(mm, address, pte, > swp_entry_to_pte(make_hwpoison_entry(page))); > } else if (pte_unused(pteval)) { > @@ -1230,10 +1226,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > * interest anymore. Simply discard the pte, vmscan > * will take care of the rest. > */ > - if (PageAnon(page)) > - dec_mm_counter(mm, MM_ANONPAGES); > - else > - dec_mm_counter(mm, MM_FILEPAGES); > + dec_mm_counter(mm, mm_counter(page)); > } else if (PageAnon(page)) { > swp_entry_t entry = { .val = page_private(page) }; > pte_t swp_pte; > @@ -1276,7 +1269,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > entry = make_migration_entry(page, pte_write(pteval)); > set_pte_at(mm, address, pte, swp_entry_to_pte(entry)); > } else > - dec_mm_counter(mm, MM_FILEPAGES); > + dec_mm_counter(mm, mm_counter_file(page)); > > page_remove_rmap(page); > page_cache_release(page); > -- > 2.1.4 > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface", http://blog.man7.org/ ^ permalink raw reply [flat|nested] 37+ messages in thread
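For readers following the quoted hunks, the counter selection can be modelled in user space roughly as below. This is only a sketch, assuming CONFIG_SHMEM=y: struct model_page and the model_* helpers are hypothetical stand-ins for struct page, PageAnon(), PageSwapBacked(), mm_counter_file(), mm_counter() and get_mm_rss(), and none of it is kernel API.

```c
#include <stdbool.h>

/* Counter indices in the order the patch gives them (CONFIG_SHMEM=y case). */
enum {
	MM_FILEPAGES,	/* Resident file mapping pages */
	MM_ANONPAGES,	/* Resident anonymous pages */
	MM_SWAPENTS,	/* Anonymous swap entries */
	MM_SHMEMPAGES,	/* Resident shared memory pages */
	NR_MM_COUNTERS
};

/* Hypothetical stand-in for struct page: only the two flags the helpers test. */
struct model_page {
	bool anon;		/* models PageAnon(page) */
	bool swap_backed;	/* models PageSwapBacked(page) */
};

/* Models mm_counter_file(): caller already knows the page is not anonymous. */
static int model_counter_file(const struct model_page *page)
{
	if (page->swap_backed)
		return MM_SHMEMPAGES;
	return MM_FILEPAGES;
}

/* Models mm_counter(): the anonymous check comes first, then file vs shmem. */
static int model_counter(const struct model_page *page)
{
	if (page->anon)
		return MM_ANONPAGES;
	return model_counter_file(page);
}

/* Models get_mm_rss(): resident set = file + shmem + anon, swap entries excluded. */
static long model_rss(const long *count)
{
	return count[MM_FILEPAGES] + count[MM_SHMEMPAGES] + count[MM_ANONPAGES];
}
```

The order of the checks matters in the model: a page with both flags set is counted as anonymous, so the swap-backed test is only reached for non-anonymous pages, which is the case the "already known not to be PageAnon" variant is meant for.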
* [PATCH 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status 2015-02-26 13:51 ` Vlastimil Babka @ 2015-02-26 13:51 ` Vlastimil Babka -1 siblings, 0 replies; 37+ messages in thread From: Vlastimil Babka @ 2015-02-26 13:51 UTC (permalink / raw) To: linux-mm, Jerome Marchand Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov, Vlastimil Babka From: Jerome Marchand <jmarchan@redhat.com> It's currently inconvenient to retrieve MM_ANONPAGES value from status and statm files and there is no way to separate MM_FILEPAGES and MM_SHMEMPAGES. Add VmAnon, VmFile and VmShm lines in /proc/<pid>/status to solve these issues. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> --- Documentation/filesystems/proc.txt | 10 +++++++++- fs/proc/task_mmu.c | 13 +++++++++++-- 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 8b30543..c777adb 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -168,6 +168,9 @@ read the file /proc/PID/status: VmLck: 0 kB VmHWM: 476 kB VmRSS: 476 kB + VmAnon: 352 kB + VmFile: 120 kB + VmShm: 4 kB VmData: 156 kB VmStk: 88 kB VmExe: 68 kB @@ -224,7 +227,12 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7) VmSize total program size VmLck locked memory size VmHWM peak resident set size ("high water mark") - VmRSS size of memory portions + VmRSS size of memory portions. 
It contains the three + following parts (VmRSS = VmAnon + VmFile + VmShm) + VmAnon size of resident anonymous memory + VmFile size of resident file mappings + VmShm size of resident shmem memory (includes SysV shm, + mapping of tmpfs and shared anonymous mappings) VmData size of data, stack, and text segments VmStk size of data, stack, and text segments VmExe size of text segment diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index d70334c..a77a3ac 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -22,7 +22,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) { - unsigned long data, text, lib, swap, ptes, pmds; + unsigned long data, text, lib, swap, ptes, pmds, anon, file, shmem; unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss; /* @@ -39,6 +39,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) if (hiwater_rss < mm->hiwater_rss) hiwater_rss = mm->hiwater_rss; + anon = get_mm_counter(mm, MM_ANONPAGES); + file = get_mm_counter(mm, MM_FILEPAGES); + shmem = get_mm_counter_shmem(mm); data = mm->total_vm - mm->shared_vm - mm->stack_vm; text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10; lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text; @@ -52,6 +55,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) "VmPin:\t%8lu kB\n" "VmHWM:\t%8lu kB\n" "VmRSS:\t%8lu kB\n" + "VmAnon:\t%8lu kB\n" + "VmFile:\t%8lu kB\n" + "VmShm:\t%8lu kB\n" "VmData:\t%8lu kB\n" "VmStk:\t%8lu kB\n" "VmExe:\t%8lu kB\n" @@ -65,6 +71,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) mm->pinned_vm << (PAGE_SHIFT-10), hiwater_rss << (PAGE_SHIFT-10), total_rss << (PAGE_SHIFT-10), + anon << (PAGE_SHIFT-10), + file << (PAGE_SHIFT-10), + shmem << (PAGE_SHIFT-10), data << (PAGE_SHIFT-10), mm->stack_vm << (PAGE_SHIFT-10), text, lib, ptes >> 10, @@ -82,7 +91,7 @@ unsigned long task_statm(struct mm_struct *mm, unsigned long *data, unsigned long *resident) { *shared = get_mm_counter(mm, MM_FILEPAGES) + - get_mm_counter(mm, MM_SHMEMPAGES); 
+ get_mm_counter_shmem(mm); *text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> PAGE_SHIFT; *data = mm->total_vm - mm->shared_vm; -- 2.1.4 ^ permalink raw reply related [flat|nested] 37+ messages in thread
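A minimal user-space sketch of how a tool might consume the new lines, assuming the VmAnon, VmFile and VmShm names proposed in this patch. A given kernel may not provide these fields, so the helper reports a missing key instead of guessing. status_field_kb() is a hypothetical name used only for illustration, and the caller is assumed to have already read /proc/<pid>/status into a NUL-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of the line that starts with `key` (for example
 * "VmShm:") in a /proc/<pid>/status style buffer, or -1 if no line starts
 * with that key or the value cannot be parsed.
 */
static long status_field_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			long kb;

			/* %ld skips the tab and padding that follow the key */
			if (sscanf(line + klen, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* advance to the start of the next line, if there is one */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

With the sample values from the proc.txt hunk (VmRSS 476 kB, VmAnon 352 kB, VmFile 120 kB, VmShm 4 kB), the three parts add up to VmRSS, which is the relation the documentation change states.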
* Re: [PATCH 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
  2015-02-26 13:51 ` Vlastimil Babka
@ 2015-02-27 10:38 ` Michael Kerrisk
From: Michael Kerrisk @ 2015-02-27 10:38 UTC
To: Vlastimil Babka
Cc: linux-mm, Jerome Marchand, Linux Kernel, Andrew Morton, linux-doc,
    Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
    Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
    Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
    Oleg Nesterov, Linux API

[CC += linux-api@]

On Thu, Feb 26, 2015 at 2:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> From: Jerome Marchand <jmarchan@redhat.com>
>
> It's currently inconvenient to retrieve MM_ANONPAGES value from status
> and statm files and there is no way to separate MM_FILEPAGES and
> MM_SHMEMPAGES. Add VmAnon, VmFile and VmShm lines in /proc/<pid>/status
> to solve these issues.
>
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  Documentation/filesystems/proc.txt | 10 +++++++++-
>  fs/proc/task_mmu.c                 | 13 +++++++++++--
>  2 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 8b30543..c777adb 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -168,6 +168,9 @@ read the file /proc/PID/status:
>    VmLck:         0 kB
>    VmHWM:       476 kB
>    VmRSS:       476 kB
> +  VmAnon:      352 kB
> +  VmFile:      120 kB
> +  VmShm:         4 kB
>    VmData:      156 kB
>    VmStk:        88 kB
>    VmExe:        68 kB
> @@ -224,7 +227,12 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
>   VmSize                      total program size
>   VmLck                       locked memory size
>   VmHWM                       peak resident set size ("high water mark")
> - VmRSS                       size of memory portions
> + VmRSS                       size of memory portions. It contains the three
> +                              following parts (VmRSS = VmAnon + VmFile + VmShm)
> + VmAnon                      size of resident anonymous memory
> + VmFile                      size of resident file mappings
> + VmShm                       size of resident shmem memory (includes SysV shm,
> +                              mapping of tmpfs and shared anonymous mappings)
>   VmData                      size of data, stack, and text segments
>   VmStk                       size of data, stack, and text segments
>   VmExe                       size of text segment
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index d70334c..a77a3ac 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -22,7 +22,7 @@
>
>  void task_mem(struct seq_file *m, struct mm_struct *mm)
>  {
> -	unsigned long data, text, lib, swap, ptes, pmds;
> +	unsigned long data, text, lib, swap, ptes, pmds, anon, file, shmem;
>  	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
>
>  	/*
> @@ -39,6 +39,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  	if (hiwater_rss < mm->hiwater_rss)
>  		hiwater_rss = mm->hiwater_rss;
>
> +	anon = get_mm_counter(mm, MM_ANONPAGES);
> +	file = get_mm_counter(mm, MM_FILEPAGES);
> +	shmem = get_mm_counter_shmem(mm);
>  	data = mm->total_vm - mm->shared_vm - mm->stack_vm;
>  	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
>  	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
> @@ -52,6 +55,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  		"VmPin:\t%8lu kB\n"
>  		"VmHWM:\t%8lu kB\n"
>  		"VmRSS:\t%8lu kB\n"
> +		"VmAnon:\t%8lu kB\n"
> +		"VmFile:\t%8lu kB\n"
> +		"VmShm:\t%8lu kB\n"
>  		"VmData:\t%8lu kB\n"
>  		"VmStk:\t%8lu kB\n"
>  		"VmExe:\t%8lu kB\n"
> @@ -65,6 +71,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  		mm->pinned_vm << (PAGE_SHIFT-10),
>  		hiwater_rss << (PAGE_SHIFT-10),
>  		total_rss << (PAGE_SHIFT-10),
> +		anon << (PAGE_SHIFT-10),
> +		file << (PAGE_SHIFT-10),
> +		shmem << (PAGE_SHIFT-10),
>  		data << (PAGE_SHIFT-10),
>  		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
>  		ptes >> 10,
> @@ -82,7 +91,7 @@ unsigned long task_statm(struct mm_struct *mm,
>  			 unsigned long *data, unsigned long *resident)
>  {
>  	*shared = get_mm_counter(mm, MM_FILEPAGES) +
> -		get_mm_counter(mm, MM_SHMEMPAGES);
> +		get_mm_counter_shmem(mm);
>  	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>  								>> PAGE_SHIFT;
>  	*data = mm->total_vm - mm->shared_vm;
> --
> 2.1.4

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
* Re: [PATCH 0/4] enhance shmem process and swap accounting
  2015-02-26 13:51 ` Vlastimil Babka
@ 2015-02-27 10:36 ` Michael Kerrisk
From: Michael Kerrisk @ 2015-02-27 10:36 UTC
To: Vlastimil Babka
Cc: linux-mm, Jerome Marchand, Linux Kernel, Andrew Morton, linux-doc,
    Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
    Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
    Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
    Oleg Nesterov, Linux API

[CC += linux-api@]

Hello Vlastimil,

Since this is a kernel-user-space API change, please CC linux-api@.
The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Cheers,

Michael

On Thu, Feb 26, 2015 at 2:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> This series is based on Jerome Marchand's [1] so let me quote the first
> paragraph from there:
>
>   There are several shortcomings with the accounting of shared memory
>   (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The
>   values in /proc/<pid>/status and statm don't allow distinguishing
>   between shmem memory and a shared mapping to a regular file, even
>   though their implications for memory usage are quite different: at
>   reclaim, a file mapping can be dropped or written back to disk, while
>   shmem needs a place in swap. As for shmem pages that are swapped out
>   or in swap cache, they aren't accounted at all.
>
> My original motivation is that a customer found it (IMHO rightfully)
> confusing that e.g. top output for process swap usage is unreliable with
> respect to swapped-out shmem pages, which are not accounted for.
>
> The fundamental difference between private anonymous and shmem pages is that
> the latter have their PTEs converted to pte_none, and not to swapents. As
> such, they are not counted in the number of swapents visible e.g. in the
> /proc/pid/status VmSwap row. It might be theoretically possible to use
> swapents when swapping out shmem (without extra cost, as one has to change
> all mappers anyway), and on swap-in only convert the swapent for the faulting
> process, leaving swapents in other processes until they also fault (so again
> no extra cost). But I don't know how many assumptions this would break, and
> it would be too disruptive a change for a relatively small benefit.
>
> Instead, my approach is to document the limitation of VmSwap, and provide a
> means to determine the swap usage for shmem areas for those who are
> interested and willing to pay the price, using /proc/pid/smaps. Outside of
> ipcs, I don't think it's currently possible to determine the usage at all.
> The previous patchset [1] introduced new shmem-specific fields into smaps
> output, and functions to determine the values. I take a simpler approach,
> noting that smaps output already has a "Swap: X kB" line, where currently
> X == 0 always for shmem areas. I think we can just consider this a bug and
> provide the proper value by consulting the radix tree, as e.g. mincore_page()
> does. In the patch changelog I explain why this is also not perfect (and
> cannot be without swapents), but still arguably much better than showing a 0.
>
> The last two patches are adapted from Jerome's patchset and provide a VmRSS
> breakdown into VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted that
> this is a welcome addition, and I agree that it might help e.g. with
> debugging process memory usage, at the non-zero but still rather low cost of
> an extra per-mm counter and some page flag checks. I updated these patches
> to 4.0-rc1, made them respect !CONFIG_SHMEM so that tiny systems don't pay
> the cost, and optimized the page flag checking somewhat.
>
> [1] http://lwn.net/Articles/611966/
>
> Jerome Marchand (2):
>   mm, shmem: Add shmem resident memory accounting
>   mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
>
> Vlastimil Babka (2):
>   mm, documentation: clarify /proc/pid/status VmSwap limitations
>   mm, proc: account for shmem swap in /proc/pid/smaps
>
>  Documentation/filesystems/proc.txt | 15 +++++++++++++--
>  arch/s390/mm/pgtable.c             |  5 +----
>  fs/proc/task_mmu.c                 | 35 +++++++++++++++++++++++++++++++++--
>  include/linux/mm.h                 | 28 ++++++++++++++++++++++++++++
>  include/linux/mm_types.h           |  9 ++++++---
>  kernel/events/uprobes.c            |  2 +-
>  mm/memory.c                        | 30 ++++++++++--------------------
>  mm/oom_kill.c                      |  5 +++--
>  mm/rmap.c                          | 15 ++++-----------
>  9 files changed, 99 insertions(+), 45 deletions(-)
>
> --
> 2.1.4

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
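The cover letter quoted above points at the per-mapping "Swap: X kB" lines of /proc/pid/smaps as the place to look for shmem swap usage. As a rough userspace illustration (not taken from the series), the sketch below sums those lines over smaps-style text; smaps_total_swap_kb() is a hypothetical helper name, and whether shmem mappings report a non-zero value depends on the running kernel carrying a change like patch 2/4.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: add up the values of all "Swap: X kB" lines found
 * in smaps-style text and return the total in kB. */
static unsigned long smaps_total_swap_kb(const char *text)
{
	unsigned long total = 0, kb;
	const char *line = text;

	while (line && *line) {
		/* Match "Swap:" only at the start of a line, so keys that
		 * merely begin with "Swap" but continue differently are skipped. */
		if (strncmp(line, "Swap:", 5) == 0 &&
		    sscanf(line + 5, "%lu", &kb) == 1)
			total += kb;
		line = strchr(line, '\n');
		if (line)
			line++; /* step past the newline to the next line */
	}
	return total;
}
```

A caller would read /proc/PID/smaps into a buffer and pass it in. Comparing the result with the VmSwap row of /proc/PID/status is one way to see the gap the cover letter describes, under the assumption that the difference comes from swapped-out shmem pages.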
* Re: [PATCH 0/4] enhance shmem process and swap accounting
  2015-02-27 10:36 ` Michael Kerrisk
@ 2015-02-27 10:52 ` Vlastimil Babka
From: Vlastimil Babka @ 2015-02-27 10:52 UTC
To: Michael Kerrisk
Cc: linux-mm, Jerome Marchand, Linux Kernel, Andrew Morton, linux-doc,
    Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
    Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
    Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
    Oleg Nesterov, Linux API

On 02/27/2015 11:36 AM, Michael Kerrisk wrote:
> [CC += linux-api@]
>
> Hello Vlastimil,
>
> Since this is a kernel-user-space API change, please CC linux-api@.
> The kernel source file Documentation/SubmitChecklist notes that all
> Linux kernel patches that change userspace interfaces should be CCed
> to linux-api@vger.kernel.org, so that the various parties who are
> interested in API changes are informed. For further information, see
> https://www.kernel.org/doc/man-pages/linux-api-ml.html

Yes I meant to do that but forgot in the end, what a shame. Sorry for the
trouble.
Thread overview: 37+ messages
2015-02-26 13:51 [PATCH 0/4] enhance shmem process and swap accounting Vlastimil Babka
2015-02-26 13:51 ` [PATCH 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations Vlastimil Babka
2015-02-27 10:37 ` Michael Kerrisk
2015-02-26 13:51 ` [PATCH 2/4] mm, procfs: account for shmem swap in /proc/pid/smaps Vlastimil Babka
2015-02-26 14:39 ` Jerome Marchand
2015-02-27 10:38 ` Michael Kerrisk
2015-03-11 12:30 ` Konstantin Khlebnikov
2015-03-11 15:03 ` Konstantin Khlebnikov
2015-03-11 15:26 ` Jerome Marchand
2015-03-11 16:31 ` Konstantin Khlebnikov
2015-03-11 19:10 ` Vlastimil Babka
2015-03-11 20:03 ` Konstantin Khlebnikov
2015-02-26 13:51 ` [PATCH 3/4] mm, shmem: Add shmem resident memory accounting Vlastimil Babka
2015-02-26 14:59 ` Jerome Marchand
2015-03-27 16:39 ` Vlastimil Babka
2015-02-27 10:38 ` Michael Kerrisk
2015-02-26 13:51 ` [PATCH 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status Vlastimil Babka
2015-02-27 10:38 ` Michael Kerrisk
2015-02-27 10:36 ` [PATCH 0/4] enhance shmem process and swap accounting Michael Kerrisk
2015-02-27 10:52 ` Vlastimil Babka