* [PATCH v3 0/4] enhance shmem process and swap accounting
From: Vlastimil Babka @ 2015-08-05 13:01 UTC
To: Andrew Morton, Jerome Marchand
Cc: linux-mm, linux-kernel, Vlastimil Babka, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

Reposting due to lack of feedback in May. I hope at least patches 1 and 2
could be merged, as they are IMHO bugfixes. 3 and 4 are optional but IMHO
useful.

Changes since v2:
o Rebase on next-20150805.
o This means that /proc/pid/maps has the proportional swap share (SwapPss:)
  field as per https://lkml.org/lkml/2015/6/15/274
  It's not clear what to do with shmem here, so it's 0 for now.
  - swapped-out shmem doesn't have swap entries, so we would have to look at
    who else has the shmem object (partially) mapped
  - to be more precise, we should also check whether their range actually
    includes the offset in question, which could get rather involved
  - or is there some easy way I don't see?
o Konstantin suggested for patch 3/4 that I drop the CONFIG_SHMEM #ifdefs.
  I didn't see the point in going against tinyfication when the work is
  already done, but I can do that if more people think it's better and it
  would block the series.

Changes since v1:
o In patch 2, rely on SHMEM_I(inode)->swapped if possible, and fall back to
  the radix tree iterator on partially mapped shmem objects, i.e. decouple
  shmem swap usage determination from the page walk, for performance
  reasons. Thanks to Jerome and Konstantin for the tips. The downside is
  that mm/shmem.c had to be touched.

This series is based on Jerome Marchand's [1], so let me quote the first
paragraph from there:

There are several shortcomings with the accounting of shared memory
(sysV shm, shared anonymous mapping, mapping to a tmpfs file). The values
in /proc/<pid>/status and statm don't allow one to distinguish between
shmem memory and a shared mapping to a regular file, even though their
implications for memory usage are quite different: at reclaim, a file
mapping can be dropped or written back to disk, while shmem needs a place
in swap. As for shmem pages that are swapped out or in the swap cache,
they aren't accounted for at all.

The original motivation for myself is that a customer found it (IMHO
rightfully) confusing that e.g. top output for process swap usage is
unreliable with respect to swapped-out shmem pages, which are not
accounted for.

The fundamental difference between private anonymous and shmem pages is
that the latter has its PTEs converted to pte_none when swapped out, not
to swap entries. As such, shmem pages are not accounted in the number of
swapents visible e.g. in the /proc/pid/status VmSwap row. It might be
theoretically possible to use swapents when swapping out shmem (without
extra cost, as one has to change all mappers anyway), and on swap-in only
convert the swapent for the faulting process, leaving swapents in other
processes until they also fault (so again no extra cost). But I don't know
how many assumptions this would break, and it would be too disruptive a
change for a relatively small benefit.
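For illustration, the smaps page walk currently classifies entries roughly
as follows (a simplified sketch, not the actual kernel code):

	/* Simplified sketch of today's smaps accounting logic */
	static void smaps_classify(struct mem_size_stats *mss, pte_t ptent)
	{
		if (pte_present(ptent)) {
			/* resident page: accounted in Rss, Pss, ... */
		} else if (is_swap_pte(ptent)) {
			/* swapped-out private anon page: accounted in Swap */
			mss->swap += PAGE_SIZE;
		}
		/*
		 * Otherwise pte_none: looks like "never faulted in" -- which
		 * is also what a swapped-out shmem page looks like, so it is
		 * invisible to the walk.
		 */
	}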
Instead, my approach is to document the limitation of VmSwap, and to
provide a means to determine the swap usage of shmem areas, via
/proc/pid/smaps, for those who are interested and willing to pay the price.
Because outside of ipcs, I don't think it's currently possible to determine
this usage at all. The previous patchset [1] did introduce new
shmem-specific fields into smaps output, and functions to determine the
values. I take a simpler approach, noting that smaps output already has a
"Swap: X kB" line, where currently X == 0 always for shmem areas. I think
we can just consider this a bug and provide the proper value by consulting
the radix tree, as e.g. mincore_page() does (see the sketch below). In the
patch changelog I explain why this is also not perfect (and cannot be,
without swapents), but still arguably much better than showing a 0.
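For reference, the mincore_page()-style check boils down to looking for an
exceptional (i.e. swap) entry in the mapping's radix tree. A minimal sketch
(simplified; the helper name is hypothetical, not part of the series):

	/*
	 * Hypothetical helper: is the page at @index of a shmem mapping
	 * currently swapped out? A radix tree slot holding an exceptional
	 * entry stores a swap entry in place of the page.
	 */
	static bool shmem_index_swapped(struct address_space *mapping,
					pgoff_t index)
	{
		struct page *page = find_get_entry(mapping, index);

		if (!page)
			return false;
		if (radix_tree_exceptional_entry(page))
			return true;
		/* a real page: find_get_entry() elevated its refcount */
		page_cache_release(page);
		return false;
	}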
The last two patches are adapted from Jerome's patchset and provide a VmRSS
breakdown into VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted
that this is a welcome addition, and I agree that it might help e.g.
debugging process memory usage, at an albeit non-zero, but still rather
low, cost of an extra per-mm counter and some page flag checks. I updated
these patches to 4.0-rc1, made them respect !CONFIG_SHMEM so that tiny
systems don't pay the cost, and optimized the page flag checking somewhat.

[1] http://lwn.net/Articles/611966/

Jerome Marchand (2):
  mm, shmem: Add shmem resident memory accounting
  mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status

Vlastimil Babka (2):
  mm, documentation: clarify /proc/pid/status VmSwap limitations
  mm, proc: account for shmem swap in /proc/pid/smaps

 Documentation/filesystems/proc.txt | 18 ++++++++++---
 arch/s390/mm/pgtable.c             |  5 +---
 fs/proc/task_mmu.c                 | 52 ++++++++++++++++++++++++++++++++++--
 include/linux/mm.h                 | 28 ++++++++++++++++++++
 include/linux/mm_types.h           |  9 ++++---
 include/linux/shmem_fs.h           |  6 +++++
 kernel/events/uprobes.c            |  2 +-
 mm/memory.c                        | 30 +++++++--------------
 mm/oom_kill.c                      |  5 ++--
 mm/rmap.c                          | 15 +++-------
 mm/shmem.c                         | 54 ++++++++++++++++++++++++++++++++++++++
 11 files changed, 178 insertions(+), 46 deletions(-)

-- 
2.4.6


* [PATCH v3 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
From: Vlastimil Babka @ 2015-08-05 13:01 UTC
To: Andrew Morton, Jerome Marchand
Cc: linux-mm, linux-kernel, Vlastimil Babka, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
mappings, even if the mapped portion does contain pages that were swapped
out. This is because, unlike private anonymous mappings, shmem does not
change the pte to a swap entry, but to pte_none, when swapping the page
out. In the smaps page walk, such a page thus looks like it was never
faulted in.

This patch changes smaps_pte_entry() to determine the swap status of such
pte_none entries for shmem mappings, similarly to how mincore_page() does
it. Swapped-out pages are thus accounted for.

The accounting is arguably still not as precise as for private anonymous
mappings, since we will now also count pages that the process in question
never accessed; another process populated them and then let them get
swapped out.

I believe it is still less confusing and subtle than not showing any swap
usage by shmem mappings at all. Also, swapped-out pages only become a
performance issue for future accesses, and we cannot predict those for
either kind of mapping.
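With the patch, the smaps output for such a mapping would look e.g. like
this (illustrative values, other fields omitted):

  Size:               1024 kB
  Rss:                 512 kB
  Swap:                512 kB
  SwapPss:               0 kB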
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt |  6 +++--
 fs/proc/task_mmu.c                 | 38 +++++++++++++++++++++++++++
 include/linux/shmem_fs.h           |  6 +++++
 mm/shmem.c                         | 54 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 29f4011..fcf67c7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -451,8 +451,10 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
-swap.
-"SwapPss" shows proportional swap share of this mapping.
+swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
+underlying shmem object is on swap.
+"SwapPss" shows proportional swap share of this mapping. Shmem mappings will
+currently show 0 here.
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
 manner. The codes are the following:
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7c9a174..f94f8f3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -13,6 +13,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
+#include <linux/shmem_fs.h>

 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -625,6 +626,41 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 	seq_putc(m, '\n');
 }

+#if defined(CONFIG_SHMEM) && defined(CONFIG_SWAP)
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+	struct inode *inode;
+	unsigned long swapped;
+	pgoff_t start, end;
+
+	if (!vma->vm_file)
+		return 0;
+
+	inode = file_inode(vma->vm_file);
+
+	if (!shmem_mapping(inode->i_mapping))
+		return 0;
+
+	swapped = shmem_swap_usage(inode);
+
+	if (swapped == 0)
+		return 0;
+
+	if (vma->vm_end - vma->vm_start >= inode->i_size)
+		return swapped;
+
+	start = linear_page_index(vma, vma->vm_start);
+	end = linear_page_index(vma, vma->vm_end);
+
+	return shmem_partial_swap_usage(inode->i_mapping, start, end);
+}
+#else
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
@@ -639,6 +675,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	/* mmap_sem is held in m_start */
 	walk_page_vma(vma, &smaps_walk);

+	mss.swap += smaps_shmem_swap(vma);
+
 	show_map_vma(m, vma, is_pid);

 	seq_printf(m,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 50777b5..12519e4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);

+#ifdef CONFIG_SWAP
+extern unsigned long shmem_swap_usage(struct inode *inode);
+extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+						pgoff_t start, pgoff_t end);
+#endif
+
 static inline struct page *shmem_read_mapping_page(
 				struct address_space *mapping, pgoff_t index)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index aa9c82a..88319f8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -357,6 +357,60 @@ static int shmem_free_swap(struct address_space *mapping,
 	return 0;
 }

+#ifdef CONFIG_SWAP
+unsigned long shmem_swap_usage(struct inode *inode)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	unsigned long swapped;
+
+	spin_lock(&info->lock);
+	swapped = info->swapped;
+	spin_unlock(&info->lock);
+
+	return swapped << PAGE_SHIFT;
+}
+
+unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+						pgoff_t start, pgoff_t end)
+{
+	struct radix_tree_iter iter;
+	void **slot;
+	struct page *page;
+	unsigned long swapped = 0;
+
+	rcu_read_lock();
+
+restart:
+	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+		if (iter.index >= end)
+			break;
+
+		page = radix_tree_deref_slot(slot);
+
+		/*
+		 * This should only be possible to happen at index 0, so we
+		 * don't need to reset the counter, nor do we risk infinite
+		 * restarts.
+		 */
+		if (radix_tree_deref_retry(page))
+			goto restart;
+
+		if (radix_tree_exceptional_entry(page))
+			swapped++;
+
+		if (need_resched()) {
+			cond_resched_rcu();
+			start = iter.index + 1;
+			goto restart;
+		}
+	}
+
+	rcu_read_unlock();
+
+	return swapped << PAGE_SHIFT;
+}
+#endif
+
 /*
  * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
  */
-- 
2.4.6
* Re: [PATCH v3 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
From: Michal Hocko @ 2015-09-25 12:57 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Jerome Marchand, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
    linux-kernel-u79uwXL29TY76Z2rM5mHXA, Hugh Dickins, Kirill A. Shutemov,
    Cyrill Gorcunov, Randy Dunlap, linux-s390-u79uwXL29TY76Z2rM5mHXA,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

[Sorry for a really long delay]

On Wed 05-08-15 15:01:23, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped
> out. This is because, unlike private anonymous mappings, shmem does not
> change the pte to a swap entry, but to pte_none, when swapping the page
> out. In the smaps page walk, such a page thus looks like it was never
> faulted in.
>
> This patch changes smaps_pte_entry() to determine the swap status of such
> pte_none entries for shmem mappings, similarly to how mincore_page() does
> it. Swapped-out pages are thus accounted for.
>
> The accounting is arguably still not as precise as for private anonymous
> mappings, since we will now also count pages that the process in question
> never accessed; another process populated them and then let them get
> swapped out.
>
> I believe it is still less confusing and subtle than not showing any swap
> usage by shmem mappings at all. Also, swapped-out pages only become a
> performance issue for future accesses, and we cannot predict those for
> either kind of mapping.

Yes I agree.

> Signed-off-by: Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>
[...]
> +#if defined(CONFIG_SHMEM) && defined(CONFIG_SWAP)
> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
> +{
> +	struct inode *inode;
> +	unsigned long swapped;
> +	pgoff_t start, end;
> +
> +	if (!vma->vm_file)
> +		return 0;
> +
> +	inode = file_inode(vma->vm_file);

Why don't we need to take i_mutex here? What prevents a parallel truncate?
I guess we do not care, because radix_tree_for_each_slot would cope with a
truncated portion of the range, right? It would deserve a comment I guess.

[...]
-- 
Michal Hocko
SUSE Labs
* [PATCH v3 3/4] mm, shmem: Add shmem resident memory accounting
From: Vlastimil Babka @ 2015-08-05 13:01 UTC
To: Andrew Morton, Jerome Marchand
Cc: linux-mm, linux-kernel, Vlastimil Babka, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

From: Jerome Marchand <jmarchan@redhat.com>

Currently, when looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem pages
are mapped to /dev/zero), even though their implications for actual memory
use are quite different.

This patch adds an MM_SHMEMPAGES counter to mm_rss_stat to account for
shmem pages instead of MM_FILEPAGES.
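With this patch, the OOM kill report also gains a shmem-rss field, e.g.
(illustrative values, made up):

  Killed process 9842 (memhog) total-vm:204800kB, anon-rss:102400kB,
  file-rss:128kB, shmem-rss:51200kB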
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 arch/s390/mm/pgtable.c   |  5 +----
 fs/proc/task_mmu.c       |  3 ++-
 include/linux/mm.h       | 28 ++++++++++++++++++++++++++++
 include/linux/mm_types.h |  9 ++++++---
 kernel/events/uprobes.c  |  2 +-
 mm/memory.c              | 30 ++++++++++--------------------
 mm/oom_kill.c            |  5 +++--
 mm/rmap.c                | 15 ++++-----------
 8 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index b33f661..276e3dd 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -610,10 +610,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
 	else if (is_migration_entry(entry)) {
 		struct page *page = migration_entry_to_page(entry);

-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	}
 	free_swap_and_cache(entry);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f94f8f3..99b0efe 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -81,7 +81,8 @@ unsigned long task_statm(struct mm_struct *mm,
 			 unsigned long *shared, unsigned long *text,
 			 unsigned long *data, unsigned long *resident)
 {
-	*shared = get_mm_counter(mm, MM_FILEPAGES);
+	*shared = get_mm_counter(mm, MM_FILEPAGES) +
+			get_mm_counter(mm, MM_SHMEMPAGES);
 	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
 								>> PAGE_SHIFT;
 	*data = mm->total_vm - mm->shared_vm;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5e08787..b814ac2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1235,6 +1235,16 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
 	return (unsigned long)val;
 }

+/* A wrapper for the CONFIG_SHMEM dependent counter */
+static inline unsigned long get_mm_counter_shmem(struct mm_struct *mm)
+{
+#ifdef CONFIG_SHMEM
+	return get_mm_counter(mm, MM_SHMEMPAGES);
+#else
+	return 0;
+#endif
+}
+
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
 	atomic_long_add(value, &mm->rss_stat.count[member]);
@@ -1250,9 +1260,27 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
 	atomic_long_dec(&mm->rss_stat.count[member]);
 }

+/* Optimized variant when page is already known not to be PageAnon */
+static inline int mm_counter_file(struct page *page)
+{
+#ifdef CONFIG_SHMEM
+	if (PageSwapBacked(page))
+		return MM_SHMEMPAGES;
+#endif
+	return MM_FILEPAGES;
+}
+
+static inline int mm_counter(struct page *page)
+{
+	if (PageAnon(page))
+		return MM_ANONPAGES;
+	return mm_counter_file(page);
+}
+
 static inline unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	return get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter_shmem(mm) +
 		get_mm_counter(mm, MM_ANONPAGES);
 }

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4957bd3..e02a855 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -356,9 +356,12 @@ struct core_state {
 };

 enum {
-	MM_FILEPAGES,
-	MM_ANONPAGES,
-	MM_SWAPENTS,
+	MM_FILEPAGES,	/* Resident file mapping pages */
+	MM_ANONPAGES,	/* Resident anonymous pages */
+	MM_SWAPENTS,	/* Anonymous swap entries */
+#ifdef CONFIG_SHMEM
+	MM_SHMEMPAGES,	/* Resident shared memory pages */
+#endif
 	NR_MM_COUNTERS
 };

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4e5e979..6288606 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -180,7 +180,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	lru_cache_add_active_or_unevictable(kpage, vma);

 	if (!PageAnon(page)) {
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}

diff --git a/mm/memory.c b/mm/memory.c
index fe1e6de..00030e8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		} else if (is_migration_entry(entry)) {
 			page = migration_entry_to_page(entry);

-			if (PageAnon(page))
-				rss[MM_ANONPAGES]++;
-			else
-				rss[MM_FILEPAGES]++;
+			rss[mm_counter(page)]++;

 			if (is_write_migration_entry(entry) &&
 			    is_cow_mapping(vm_flags)) {
@@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (page) {
 		get_page(page);
 		page_dup_rmap(page);
-		if (PageAnon(page))
-			rss[MM_ANONPAGES]++;
-		else
-			rss[MM_FILEPAGES]++;
+		rss[mm_counter(page)]++;
 	}

 out_set_pte:
@@ -1113,9 +1107,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else {
+
+			if (!PageAnon(page)) {
 				if (pte_dirty(ptent)) {
 					force_flush = 1;
 					set_page_dirty(page);
@@ -1123,8 +1116,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_young(ptent) &&
 				    likely(!(vma->vm_flags & VM_SEQ_READ)))
 					mark_page_accessed(page);
-				rss[MM_FILEPAGES]--;
 			}
+			rss[mm_counter(page)]--;
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
@@ -1146,11 +1139,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;

 			page = migration_entry_to_page(entry);
-
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else
-				rss[MM_FILEPAGES]--;
+			rss[mm_counter(page)]--;
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
@@ -1460,7 +1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,

 	/* Ok, finally just insert the thing.. */
 	get_page(page);
-	inc_mm_counter_fast(mm, MM_FILEPAGES);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
 	page_add_file_rmap(page);
 	set_pte_at(mm, addr, pte, mk_pte(page, prot));

@@ -2097,7 +2086,8 @@ static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				dec_mm_counter_fast(mm,
+						mm_counter_file(old_page));
 				inc_mm_counter_fast(mm, MM_ANONPAGES);
 			}
 		} else {
@@ -2820,7 +2810,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, address);
 	} else {
-		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
+		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page);
 	}
 	set_pte_at(vma->vm_mm, address, pte, entry);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ecc0bc..230edc4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -555,10 +555,11 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	/* mm cannot safely be dereferenced after task_unlock(victim) */
 	mm = victim->mm;
 	mark_oom_victim(victim);
-	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
+	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
-		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
+		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
+		K(get_mm_counter_shmem(victim->mm)));
 	task_unlock(victim);

 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index b6db6a6..e38a134 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1381,12 +1381,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	update_hiwater_rss(mm);

 	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
-		if (!PageHuge(page)) {
-			if (PageAnon(page))
-				dec_mm_counter(mm, MM_ANONPAGES);
-			else
-				dec_mm_counter(mm, MM_FILEPAGES);
-		}
+		if (!PageHuge(page))
+			dec_mm_counter(mm, mm_counter(page));
 		set_pte_at(mm, address, pte,
 			   swp_entry_to_pte(make_hwpoison_entry(page)));
 	} else if (pte_unused(pteval)) {
@@ -1395,10 +1391,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 * interest anymore. Simply discard the pte, vmscan
 		 * will take care of the rest.
 		 */
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		pte_t swp_pte;
@@ -1454,7 +1447,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		entry = make_migration_entry(page, pte_write(pteval));
 		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
 	} else
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));

 discard:
 	page_remove_rmap(page);
-- 
2.4.6
* Re: [PATCH v3 3/4] mm, shmem: Add shmem resident memory accounting
From: Michal Hocko @ 2015-09-25 13:26 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Jerome Marchand, linux-mm, linux-kernel, Hugh Dickins,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

On Wed 05-08-15 15:01:24, Vlastimil Babka wrote:
> From: Jerome Marchand <jmarchan@redhat.com>
>
> Currently, when looking at /proc/<pid>/status or statm, there is no way
> to distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implications for
> actual memory use are quite different.
> This patch adds an MM_SHMEMPAGES counter to mm_rss_stat to account for
> shmem pages instead of MM_FILEPAGES.

Conflating SHMEM and FILEPAGES was imho unfortunate; people had to learn
to subtract shmem from cache...

> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I am not sure that making MM_SHMEMPAGES conditional is really worth the
complications. We already have MM_SWAPENTS unconditional. If we really
care about the additional 64b, then let's make both conditional, but that
can be done in a separate patch IMO.

Btw. task_statm gets it wrong and uses the MM_SHMEMPAGES counter directly
(which breaks the !CONFIG_SHMEM build). So either this one has to be
fixed, or just make it unconditional.

Other than that:
Acked-by: Michal Hocko <mhocko@suse.com>

[...]
-- 
Michal Hocko
SUSE Labs
* [PATCH v3 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
From: Vlastimil Babka @ 2015-08-05 13:01 UTC
To: Andrew Morton, Jerome Marchand
Cc: linux-mm, linux-kernel, Vlastimil Babka, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

From: Jerome Marchand <jmarchan@redhat.com>

It's currently inconvenient to retrieve the MM_ANONPAGES value from the
status and statm files, and there is no way to separate MM_FILEPAGES from
MM_SHMEMPAGES. Add VmAnon, VmFile and VmShm lines to /proc/<pid>/status to
solve these issues.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt | 10 +++++++++-
 fs/proc/task_mmu.c                 | 13 +++++++++++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index fcf67c7..fadd1b3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -168,6 +168,9 @@ For example, to get the status information of a process, all you have to do is
   VmLck:         0 kB
   VmHWM:       476 kB
   VmRSS:       476 kB
+  VmAnon:      352 kB
+  VmFile:      120 kB
+  VmShm:         4 kB
   VmData:      156 kB
   VmStk:        88 kB
   VmExe:        68 kB
@@ -229,7 +232,12 @@ Table 1-2: Contents of the status files (as of 4.1)
  VmSize                      total program size
  VmLck                       locked memory size
  VmHWM                       peak resident set size ("high water mark")
- VmRSS                       size of memory portions
+ VmRSS                       size of memory portions. It contains the three
+                             following parts (VmRSS = VmAnon + VmFile + VmShm)
+ VmAnon                      size of resident anonymous memory
+ VmFile                      size of resident file mappings
+ VmShm                       size of resident shmem memory (includes SysV shm,
+                             mapping of tmpfs and shared anonymous mappings)
  VmData                      size of data, stack, and text segments
  VmStk                       size of data, stack, and text segments
  VmExe                       size of text segment
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 99b0efe..e299101 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -22,7 +22,7 @@

 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
-	unsigned long data, text, lib, swap, ptes, pmds;
+	unsigned long data, text, lib, swap, ptes, pmds, anon, file, shmem;
 	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;

 	/*
@@ -39,6 +39,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	if (hiwater_rss < mm->hiwater_rss)
 		hiwater_rss = mm->hiwater_rss;

+	anon = get_mm_counter(mm, MM_ANONPAGES);
+	file = get_mm_counter(mm, MM_FILEPAGES);
+	shmem = get_mm_counter_shmem(mm);
 	data = mm->total_vm - mm->shared_vm - mm->stack_vm;
 	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
 	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
@@ -52,6 +55,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"VmPin:\t%8lu kB\n"
 		"VmHWM:\t%8lu kB\n"
 		"VmRSS:\t%8lu kB\n"
+		"VmAnon:\t%8lu kB\n"
+		"VmFile:\t%8lu kB\n"
+		"VmShm:\t%8lu kB\n"
 		"VmData:\t%8lu kB\n"
 		"VmStk:\t%8lu kB\n"
 		"VmExe:\t%8lu kB\n"
@@ -65,6 +71,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		mm->pinned_vm << (PAGE_SHIFT-10),
 		hiwater_rss << (PAGE_SHIFT-10),
 		total_rss << (PAGE_SHIFT-10),
+		anon << (PAGE_SHIFT-10),
+		file << (PAGE_SHIFT-10),
+		shmem << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
 		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
 		ptes >> 10,
@@ -82,7 +91,7 @@ unsigned long task_statm(struct mm_struct *mm,
 			 unsigned long *data, unsigned long *resident)
 {
 	*shared = get_mm_counter(mm, MM_FILEPAGES) +
-			get_mm_counter(mm, MM_SHMEMPAGES);
+			get_mm_counter_shmem(mm);
 	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
 								>> PAGE_SHIFT;
 	*data = mm->total_vm - mm->shared_vm;
-- 
2.4.6
* Re: [PATCH v3 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
From: Konstantin Khlebnikov @ 2015-08-05 13:21 UTC
To: Vlastimil Babka, Andrew Morton, Jerome Marchand
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
    Randy Dunlap, linux-s390-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
    Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API, Minchan Kim

On 05.08.2015 16:01, Vlastimil Babka wrote:
> From: Jerome Marchand <jmarchan-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>
> It's currently inconvenient to retrieve the MM_ANONPAGES value from the
> status and statm files, and there is no way to separate MM_FILEPAGES
> from MM_SHMEMPAGES. Add VmAnon, VmFile and VmShm lines to
> /proc/<pid>/status to solve these issues.
>
> Signed-off-by: Jerome Marchand <jmarchan-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>
[...]
> - VmRSS                       size of memory portions
> + VmRSS                       size of memory portions. It contains the three
> +                             following parts (VmRSS = VmAnon + VmFile + VmShm)
> + VmAnon                      size of resident anonymous memory
> + VmFile                      size of resident file mappings
> + VmShm                       size of resident shmem memory (includes SysV shm,
> +                             mapping of tmpfs and shared anonymous mappings)

"Vm" is an acronym for Virtual Memory, but all these are not virtual.
They are real pages. Let's leave VmRSS as is and invent a better prefix
for the new fields: something like "Mem", "Pg", or no prefix at all.

[...]
-- 
Konstantin
* Re: [PATCH v3 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
From: Vlastimil Babka @ 2015-08-27 7:22 UTC
To: Konstantin Khlebnikov, Andrew Morton, Jerome Marchand
Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API, Minchan Kim

On 08/05/2015 03:21 PM, Konstantin Khlebnikov wrote:
> On 05.08.2015 16:01, Vlastimil Babka wrote:
[...]
>> - VmRSS                       size of memory portions
>> + VmRSS                       size of memory portions. It contains the three
>> +                             following parts (VmRSS = VmAnon + VmFile + VmShm)
>> + VmAnon                      size of resident anonymous memory
>> + VmFile                      size of resident file mappings
>> + VmShm                       size of resident shmem memory (includes SysV shm,
>> +                             mapping of tmpfs and shared anonymous mappings)
>
> "Vm" is an acronym for Virtual Memory, but all these are not virtual.
> They are real pages. Let's leave VmRSS as is and invent a better prefix
> for the new fields: something like "Mem", "Pg", or no prefix at all.

No prefix would be IMHO confusing. Mem could work, but it's not exactly
consistent with the rest. I think only VmPeak and VmSize talk about
virtual memory. The rest of the existing counters are about physical
memory being mapped into that virtual memory, or consumed by supporting it
(PTE, PMD), or swapped out. I don't see any difference for the new
counters here; they would just stand out oddly with some new prefix IMHO.
* Re: [PATCH v3 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
From: Michal Hocko @ 2015-09-25 13:29 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Jerome Marchand, linux-mm, linux-kernel, Hugh Dickins,
    Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

On Wed 05-08-15 15:01:25, Vlastimil Babka wrote:
> From: Jerome Marchand <jmarchan@redhat.com>
>
> It's currently inconvenient to retrieve the MM_ANONPAGES value from the
> status and statm files, and there is no way to separate MM_FILEPAGES
> from MM_SHMEMPAGES. Add VmAnon, VmFile and VmShm lines to
> /proc/<pid>/status to solve these issues.

Yes, this is definitely an improvement. I have no strong opinion on the
naming. VmFOO is consistent with the rest (e.g. VmData, Stk...).

> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

[...]
> @@ -82,7 +91,7 @@ unsigned long task_statm(struct mm_struct *mm,
>  			 unsigned long *data, unsigned long *resident)
>  {
>  	*shared = get_mm_counter(mm, MM_FILEPAGES) +
> -			get_mm_counter(mm, MM_SHMEMPAGES);
> +			get_mm_counter_shmem(mm);
>  	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>  								>> PAGE_SHIFT;
>  	*data = mm->total_vm - mm->shared_vm;

Ahh, so you have fixed up the compilation issue from the previous patch
here... This really belongs to the previous patch, as already noted.

-- 
Michal Hocko
SUSE Labs
* [PATCH v3 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
From: Vlastimil Babka @ 2015-08-05 13:01 UTC
To: Andrew Morton, Jerome Marchand
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    Vlastimil Babka, Hugh Dickins, Michal Hocko, Kirill A. Shutemov,
    Cyrill Gorcunov, Randy Dunlap, linux-s390-u79uwXL29TY76Z2rM5mHXA,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

The documentation for /proc/pid/status does not mention that the value of
VmSwap counts only swapped-out anonymous private pages, and not shmem.
This is not obvious, so document this limitation.
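To illustrate the pitfall being documented: a process whose shmem pages
have been swapped out will still show e.g. (illustrative values):

  VmSwap:        0 kB

even though swap space is in fact consumed on its behalf.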
Signed-off-by: Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>
---
 Documentation/filesystems/proc.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d411ca6..29f4011 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -237,6 +237,8 @@ Table 1-2: Contents of the status files (as of 4.1)
  VmPTE                       size of page table entries
  VmPMD                       size of second level page tables
  VmSwap                      size of swap usage (the number of referred swapents)
+                             by anonymous private data (shmem swap usage is not
+                             included)
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
-- 
2.4.6
* Re: [PATCH v3 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
From: Michal Hocko @ 2015-09-25 11:36 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Jerome Marchand, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
    linux-kernel-u79uwXL29TY76Z2rM5mHXA, Hugh Dickins, Kirill A. Shutemov,
    Cyrill Gorcunov, Randy Dunlap, linux-s390-u79uwXL29TY76Z2rM5mHXA,
    Martin Schwidefsky, Heiko Carstens, Peter Zijlstra, Paul Mackerras,
    Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
    Konstantin Khlebnikov, Minchan Kim

[Sorry for a really long delay]

On Wed 05-08-15 15:01:22, Vlastimil Babka wrote:
> The documentation for /proc/pid/status does not mention that the value of
> VmSwap counts only swapped-out anonymous private pages, and not shmem.
> This is not obvious, so document this limitation.

This is definitely an improvement.

> Signed-off-by: Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>

Acked-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>

[...]

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v3 0/4] enhance shmem process and swap accounting
       [not found] ` <1438779685-5227-1-git-send-email-vbabka-AlSwsSmVLrQ@public.gmane.org>
  2015-08-05 13:01   ` [PATCH v3 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations Vlastimil Babka
@ 2015-08-05 13:28   ` Konstantin Khlebnikov
  2015-08-07  9:37   ` Jerome Marchand
  2 siblings, 0 replies; 13+ messages in thread
From: Konstantin Khlebnikov @ 2015-08-05 13:28 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, Jerome Marchand
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
	Randy Dunlap, linux-s390-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API, Minchan Kim

On 05.08.2015 16:01, Vlastimil Babka wrote:
> Reposting due to lack of feedback in May. I hope at least patches 1 and 2
> could be merged as they are IMHO bugfixes. 3 and 4 is optional but IMHO useful.
>
> Changes since v2:
> o Rebase on next-20150805.
> o This means that /proc/pid/maps has the proportional swap share (SwapPss:)
>   field as per https://lkml.org/lkml/2015/6/15/274
>   It's not clear what to do with shmem here so it's 0 for now.
>   - swapped out shmem doesn't have swap entries, so we would have to look at who
>     else has the shmem object (partially) mapped
>   - to be more precise we should also check if his range actually includes
>     the offset in question, which could get rather involved
>   - or is there some easy way I don't see?
> o Konstantin suggested for patch 3/4 that I drop the CONFIG_SHMEM #ifdefs
>   I didn't see the point in going against tinyfication when the work is
>   already done, but I can do that if more people think it's better and it
>   would block the series.

That's not a blocker. Apart from the naming in the last patch, you can add:

Acked-by: Konstantin Khlebnikov <khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>

>
> Changes since v1:
> o In Patch 2, rely on SHMEM_I(inode)->swapped if possible, and fallback to
>   radix tree iterator on partially mapped shmem objects, i.e. decouple shmem
>   swap usage determination from the page walk, for performance reasons.
>   Thanks to Jerome and Konstantin for the tips.
>   The downside is that mm/shmem.c had to be touched.
>
> This series is based on Jerome Marchand's [1] so let me quote the first
> paragraph from there:
>
> There are several shortcomings with the accounting of shared memory
> (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The
> values in /proc/<pid>/status and statm don't allow to distinguish
> between shmem memory and a shared mapping to a regular file, even
> though theirs implication on memory usage are quite different: at
> reclaim, file mapping can be dropped or write back on disk while shmem
> needs a place in swap. As for shmem pages that are swapped-out or in
> swap cache, they aren't accounted at all.
>
> The original motivation for myself is that a customer found (IMHO rightfully)
> confusing that e.g. top output for process swap usage is unreliable with
> respect to swapped out shmem pages, which are not accounted for.
>
> The fundamental difference between private anonymous and shmem pages is that
> the latter has PTE's converted to pte_none, and not swapents. As such, they are
> not accounted to the number of swapents visible e.g. in /proc/pid/status VmSwap
> row. It might be theoretically possible to use swapents when swapping out shmem
> (without extra cost, as one has to change all mappers anyway), and on swap in
> only convert the swapent for the faulting process, leaving swapents in other
> processes until they also fault (so again no extra cost). But I don't know how
> many assumptions this would break, and it would be too disruptive change for a
> relatively small benefit.
>
> Instead, my approach is to document the limitation of VmSwap, and provide means
> to determine the swap usage for shmem areas for those who are interested and
> willing to pay the price, using /proc/pid/smaps. Because outside of ipcs, I
> don't think it's possible to currently to determine the usage at all. The
> previous patchset [1] did introduce new shmem-specific fields into smaps
> output, and functions to determine the values. I take a simpler approach,
> noting that smaps output already has a "Swap: X kB" line, where currently X ==
> 0 always for shmem areas. I think we can just consider this a bug and provide
> the proper value by consulting the radix tree, as e.g. mincore_page() does. In the
> patch changelog I explain why this is also not perfect (and cannot be without
> swapents), but still arguably much better than showing a 0.
>
> The last two patches are adapted from Jerome's patchset and provide a VmRSS
> breakdown to VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted that
> this is a welcome addition, and I agree that it might help e.g. debugging
> process memory usage at albeit non-zero, but still rather low cost of extra
> per-mm counter and some page flag checks. I updated these patches to 4.0-rc1,
> made them respect !CONFIG_SHMEM so that tiny systems don't pay the cost, and
> optimized the page flag checking somewhat.
>
> [1] http://lwn.net/Articles/611966/
>
> Jerome Marchand (2):
>   mm, shmem: Add shmem resident memory accounting
>   mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
>
> Vlastimil Babka (2):
>   mm, documentation: clarify /proc/pid/status VmSwap limitations
>   mm, proc: account for shmem swap in /proc/pid/smaps
>
>  Documentation/filesystems/proc.txt | 18 ++++++++++---
>  arch/s390/mm/pgtable.c             |  5 +---
>  fs/proc/task_mmu.c                 | 52 ++++++++++++++++++++++++++++++++++--
>  include/linux/mm.h                 | 28 ++++++++++++++++++++
>  include/linux/mm_types.h           |  9 ++++---
>  include/linux/shmem_fs.h           |  6 +++++
>  kernel/events/uprobes.c            |  2 +-
>  mm/memory.c                        | 30 +++++++-------------
>  mm/oom_kill.c                      |  5 ++--
>  mm/rmap.c                          | 15 +++-------
>  mm/shmem.c                         | 54 ++++++++++++++++++++++++++++++++++++++
>  11 files changed, 178 insertions(+), 46 deletions(-)
>

--
Konstantin

^ permalink raw reply	[flat|nested] 13+ messages in thread
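To make the cover letter's smaps-based approach concrete: once the per-VMA
"Swap:" line stops reading 0 for shmem areas (patch 2), a process's total
swap footprint, shmem included, can be recovered from userspace by summing
those lines. A minimal sketch under that assumption — my illustration, not
code from the series:

/*
 * Illustrative sketch, not part of the series: with patch 2 applied,
 * the per-VMA "Swap:" lines in /proc/<pid>/smaps account shmem swap,
 * so summing them recovers a total that VmSwap alone cannot provide.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	long kb, total_kb = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/smaps",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* "SwapPss:" lines do not match the "Swap:" literal */
		if (sscanf(line, "Swap: %ld kB", &kb) == 1)
			total_kb += kb;
	}
	fclose(f);
	printf("total swap: %ld kB\n", total_kb);
	return 0;
}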
* Re: [PATCH v3 0/4] enhance shmem process and swap accounting
       [not found] ` <1438779685-5227-1-git-send-email-vbabka-AlSwsSmVLrQ@public.gmane.org>
  2015-08-05 13:01   ` [PATCH v3 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations Vlastimil Babka
  2015-08-05 13:28   ` [PATCH v3 0/4] enhance shmem process and swap accounting Konstantin Khlebnikov
@ 2015-08-07  9:37   ` Jerome Marchand
  2 siblings, 0 replies; 13+ messages in thread
From: Jerome Marchand @ 2015-08-07 9:37 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
	Randy Dunlap, linux-s390-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov, Minchan Kim

[-- Attachment #1: Type: text/plain, Size: 5917 bytes --]

On 08/05/2015 03:01 PM, Vlastimil Babka wrote:
> Reposting due to lack of feedback in May. I hope at least patches 1 and 2
> could be merged as they are IMHO bugfixes. 3 and 4 is optional but IMHO useful.
>
> Changes since v2:
> o Rebase on next-20150805.
> o This means that /proc/pid/maps has the proportional swap share (SwapPss:)
>   field as per https://lkml.org/lkml/2015/6/15/274
>   It's not clear what to do with shmem here so it's 0 for now.
>   - swapped out shmem doesn't have swap entries, so we would have to look at who
>     else has the shmem object (partially) mapped
>   - to be more precise we should also check if his range actually includes
>     the offset in question, which could get rather involved
>   - or is there some easy way I don't see?

Hmm... This is much more difficult than I envisioned when commenting on
Minchan's patch. One possibility could be to set the ptes of paged-out
shmem pages in a similar way to regular swap entries, but that would
consume some very precious real estate in the pte. As it is, a zero
value, while obviously wrong, has the advantage of not being misleading
the way a bad approximation would be (the kind that doesn't properly
account for partial mappings).

Jerome

> o Konstantin suggested for patch 3/4 that I drop the CONFIG_SHMEM #ifdefs
>   I didn't see the point in going against tinyfication when the work is
>   already done, but I can do that if more people think it's better and it
>   would block the series.
>
> Changes since v1:
> o In Patch 2, rely on SHMEM_I(inode)->swapped if possible, and fallback to
>   radix tree iterator on partially mapped shmem objects, i.e. decouple shmem
>   swap usage determination from the page walk, for performance reasons.
>   Thanks to Jerome and Konstantin for the tips.
>   The downside is that mm/shmem.c had to be touched.
>
> This series is based on Jerome Marchand's [1] so let me quote the first
> paragraph from there:
>
> There are several shortcomings with the accounting of shared memory
> (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The
> values in /proc/<pid>/status and statm don't allow to distinguish
> between shmem memory and a shared mapping to a regular file, even
> though theirs implication on memory usage are quite different: at
> reclaim, file mapping can be dropped or write back on disk while shmem
> needs a place in swap. As for shmem pages that are swapped-out or in
> swap cache, they aren't accounted at all.
>
> The original motivation for myself is that a customer found (IMHO rightfully)
> confusing that e.g. top output for process swap usage is unreliable with
> respect to swapped out shmem pages, which are not accounted for.
>
> The fundamental difference between private anonymous and shmem pages is that
> the latter has PTE's converted to pte_none, and not swapents. As such, they are
> not accounted to the number of swapents visible e.g. in /proc/pid/status VmSwap
> row. It might be theoretically possible to use swapents when swapping out shmem
> (without extra cost, as one has to change all mappers anyway), and on swap in
> only convert the swapent for the faulting process, leaving swapents in other
> processes until they also fault (so again no extra cost). But I don't know how
> many assumptions this would break, and it would be too disruptive change for a
> relatively small benefit.
>
> Instead, my approach is to document the limitation of VmSwap, and provide means
> to determine the swap usage for shmem areas for those who are interested and
> willing to pay the price, using /proc/pid/smaps. Because outside of ipcs, I
> don't think it's possible to currently to determine the usage at all. The
> previous patchset [1] did introduce new shmem-specific fields into smaps
> output, and functions to determine the values. I take a simpler approach,
> noting that smaps output already has a "Swap: X kB" line, where currently X ==
> 0 always for shmem areas. I think we can just consider this a bug and provide
> the proper value by consulting the radix tree, as e.g. mincore_page() does. In the
> patch changelog I explain why this is also not perfect (and cannot be without
> swapents), but still arguably much better than showing a 0.
>
> The last two patches are adapted from Jerome's patchset and provide a VmRSS
> breakdown to VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted that
> this is a welcome addition, and I agree that it might help e.g. debugging
> process memory usage at albeit non-zero, but still rather low cost of extra
> per-mm counter and some page flag checks. I updated these patches to 4.0-rc1,
> made them respect !CONFIG_SHMEM so that tiny systems don't pay the cost, and
> optimized the page flag checking somewhat.
>
> [1] http://lwn.net/Articles/611966/
>
> Jerome Marchand (2):
>   mm, shmem: Add shmem resident memory accounting
>   mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
>
> Vlastimil Babka (2):
>   mm, documentation: clarify /proc/pid/status VmSwap limitations
>   mm, proc: account for shmem swap in /proc/pid/smaps
>
>  Documentation/filesystems/proc.txt | 18 ++++++++++---
>  arch/s390/mm/pgtable.c             |  5 +---
>  fs/proc/task_mmu.c                 | 52 ++++++++++++++++++++++++++++++++++--
>  include/linux/mm.h                 | 28 ++++++++++++++++++++
>  include/linux/mm_types.h           |  9 ++++---
>  include/linux/shmem_fs.h           |  6 +++++
>  kernel/events/uprobes.c            |  2 +-
>  mm/memory.c                        | 30 +++++++-------------
>  mm/oom_kill.c                      |  5 ++--
>  mm/rmap.c                          | 15 +++-------
>  mm/shmem.c                         | 54 ++++++++++++++++++++++++++++++++++++++
>  11 files changed, 178 insertions(+), 46 deletions(-)
>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
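To make the partial-mapping subtlety Jerome raises concrete: each VMA maps
only a byte window [offset, offset + (end - start)) of its backing object,
so a precise per-mapping swap figure would have to count only swapped-out
pages inside that window. A small sketch (illustrative only, not from the
series) that derives the window for each mapping from /proc/pid/maps:

/*
 * Illustrative only: prints, for each VMA of the current process, the
 * range of backing-object offsets it covers -- the window a precise
 * per-VMA shmem swap count would be restricted to. Field layout
 * follows /proc/pid/maps ("start-end perms offset ..."); the variable
 * names are mine.
 */
#include <stdio.h>

int main(void)
{
	unsigned long start, end, offset;
	char perms[8], line[512];
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%lx-%lx %7s %lx",
			   &start, &end, perms, &offset) == 4)
			printf("vma %lx-%lx covers object offsets [%lx, %lx)\n",
			       start, end, offset, offset + (end - start));
	}
	fclose(f);
	return 0;
}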
end of thread, other threads:[~2015-09-25 13:29 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-05 13:01 [PATCH v3 0/4] enhance shmem process and swap accounting Vlastimil Babka
2015-08-05 13:01 ` [PATCH v3 2/4] mm, proc: account for shmem swap in /proc/pid/smaps Vlastimil Babka
     [not found]   ` <1438779685-5227-3-git-send-email-vbabka-AlSwsSmVLrQ@public.gmane.org>
2015-09-25 12:57     ` Michal Hocko
2015-08-05 13:01 ` [PATCH v3 3/4] mm, shmem: Add shmem resident memory accounting Vlastimil Babka
2015-09-25 13:26   ` Michal Hocko
2015-08-05 13:01 ` [PATCH v3 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status Vlastimil Babka
     [not found]   ` <1438779685-5227-5-git-send-email-vbabka-AlSwsSmVLrQ@public.gmane.org>
2015-08-05 13:21     ` Konstantin Khlebnikov
2015-08-27  7:22       ` Vlastimil Babka
2015-09-25 13:29         ` Michal Hocko
     [not found] ` <1438779685-5227-1-git-send-email-vbabka-AlSwsSmVLrQ@public.gmane.org>
2015-08-05 13:01   ` [PATCH v3 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations Vlastimil Babka
     [not found]     ` <1438779685-5227-2-git-send-email-vbabka-AlSwsSmVLrQ@public.gmane.org>
2015-09-25 11:36       ` Michal Hocko
2015-08-05 13:28   ` [PATCH v3 0/4] enhance shmem process and swap accounting Konstantin Khlebnikov
2015-08-07  9:37 ` Jerome Marchand