From: Jerome Marchand <jmarchan@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.cz>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
Randy Dunlap <rdunlap@infradead.org>,
linux-s390@vger.kernel.org,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Paul Mackerras <paulus@samba.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Oleg Nesterov <oleg@redhat.com>,
Linux API <linux-api@vger.kernel.org>,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH v3 0/4] enhance shmem process and swap accounting
Date: Fri, 07 Aug 2015 11:37:39 +0200
Message-ID: <55C47C63.6050406@redhat.com>
In-Reply-To: <1438779685-5227-1-git-send-email-vbabka@suse.cz>
On 08/05/2015 03:01 PM, Vlastimil Babka wrote:
> Reposting due to lack of feedback in May. I hope at least patches 1 and 2
> could be merged as they are IMHO bugfixes. 3 and 4 are optional but IMHO useful.
>
> Changes since v2:
> o Rebase on next-20150805.
> o This means that /proc/pid/smaps has the proportional swap share (SwapPss:)
> field as per https://lkml.org/lkml/2015/6/15/274
> It's not clear what to do with shmem here so it's 0 for now.
> - swapped out shmem doesn't have swap entries, so we would have to look at who
> else has the shmem object (partially) mapped
> - to be more precise we should also check if his range actually includes
> the offset in question, which could get rather involved
> - or is there some easy way I don't see?
Hmm... This is much more difficult than I envisioned when commenting on
Minchan's patch. One possibility could be to have the PTEs of paged-out
shmem pages set in a similar way to regular swap entries. But that
would need to use some very precious real estate in the PTE.
As it is, a zero value, while obviously wrong, has the advantage of not
being misleading like a bad approximation would be (like the kind which
doesn't properly account for partial mappings).
Jerome
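
To make the partial-mapping concern concrete, here is a small user-space sketch in Python. It is a toy model only, not kernel code: the Mapping tuple, both function names and the page offsets are hypothetical and chosen purely for illustration. It compares a naive approximation (split the object's swapped pages evenly among all mappers) with a range-aware computation (charge each swapped page only to the mappers whose range covers its offset).

```python
from collections import namedtuple

# Hypothetical model: a mapper covers page offsets [start, end) of one shmem object.
Mapping = namedtuple("Mapping", ["name", "start", "end"])

def naive_swap_pss(swapped_offsets, mappings):
    """Split all swapped pages of the object evenly among all mappers."""
    share = len(swapped_offsets) / len(mappings)
    return {m.name: share for m in mappings}

def range_aware_swap_pss(swapped_offsets, mappings):
    """Charge each swapped page only to the mappers whose range covers it."""
    pss = {m.name: 0.0 for m in mappings}
    for off in swapped_offsets:
        covering = [m for m in mappings if m.start <= off < m.end]
        for m in covering:
            pss[m.name] += 1.0 / len(covering)
    return pss

# A maps the whole 8-page object, B maps only its first two pages.
mappings = [Mapping("A", 0, 8), Mapping("B", 0, 2)]
swapped = [1, 4, 5, 6]  # page offsets currently swapped out

print(naive_swap_pss(swapped, mappings))        # {'A': 2.0, 'B': 2.0}
print(range_aware_swap_pss(swapped, mappings))  # {'A': 3.5, 'B': 0.5}
```

In this toy case the naive split charges B four times what its mapped range could account for, which is the kind of misleading approximation meant above.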
> o Konstantin suggested for patch 3/4 that I drop the CONFIG_SHMEM #ifdefs
> I didn't see the point in going against tinyfication when the work is
> already done, but I can do that if more people think it's better and it
> would block the series.
>
> Changes since v1:
> o In Patch 2, rely on SHMEM_I(inode)->swapped if possible, and fall back to
> radix tree iterator on partially mapped shmem objects, i.e. decouple shmem
> swap usage determination from the page walk, for performance reasons.
> Thanks to Jerome and Konstantin for the tips.
> The downside is that mm/shmem.c had to be touched.
>
> This series is based on Jerome Marchand's [1] so let me quote the first
> paragraph from there:
>
> There are several shortcomings with the accounting of shared memory
> (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The
> values in /proc/<pid>/status and statm don't allow distinguishing
> between shmem memory and a shared mapping to a regular file, even
> though their implications for memory usage are quite different: at
> reclaim, a file mapping can be dropped or written back to disk while shmem
> needs a place in swap. As for shmem pages that are swapped-out or in
> swap cache, they aren't accounted at all.
>
> The original motivation for myself is that a customer found (IMHO rightfully)
> confusing that e.g. top output for process swap usage is unreliable with
> respect to swapped out shmem pages, which are not accounted for.
>
> The fundamental difference between private anonymous and shmem pages is that
> the latter have their PTEs converted to pte_none, and not swapents. As such, they are
> not accounted to the number of swapents visible e.g. in /proc/pid/status VmSwap
> row. It might be theoretically possible to use swapents when swapping out shmem
> (without extra cost, as one has to change all mappers anyway), and on swap in
> only convert the swapent for the faulting process, leaving swapents in other
> processes until they also fault (so again no extra cost). But I don't know how
> many assumptions this would break, and it would be too disruptive a change for a
> relatively small benefit.
>
> Instead, my approach is to document the limitation of VmSwap, and provide means
> to determine the swap usage for shmem areas for those who are interested and
> willing to pay the price, using /proc/pid/smaps. Because outside of ipcs, I
> don't think it's currently possible to determine the usage at all. The
> previous patchset [1] did introduce new shmem-specific fields into smaps
> output, and functions to determine the values. I take a simpler approach,
> noting that smaps output already has a "Swap: X kB" line, where currently X ==
> 0 always for shmem areas. I think we can just consider this a bug and provide
> the proper value by consulting the radix tree, as e.g. mincore_page() does. In the
> patch changelog I explain why this is also not perfect (and cannot be without
> swapents), but still arguably much better than showing a 0.
>
> The last two patches are adapted from Jerome's patchset and provide a VmRSS
> breakdown to VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted that
> this is a welcome addition, and I agree that it might help e.g. debugging
> process memory usage at an albeit non-zero, but still rather low cost of extra
> per-mm counter and some page flag checks. I updated these patches to 4.0-rc1,
> made them respect !CONFIG_SHMEM so that tiny systems don't pay the cost, and
> optimized the page flag checking somewhat.
>
> [1] http://lwn.net/Articles/611966/
>
> Jerome Marchand (2):
> mm, shmem: Add shmem resident memory accounting
> mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
>
> Vlastimil Babka (2):
> mm, documentation: clarify /proc/pid/status VmSwap limitations
> mm, proc: account for shmem swap in /proc/pid/smaps
>
> Documentation/filesystems/proc.txt | 18 ++++++++++---
> arch/s390/mm/pgtable.c | 5 +---
> fs/proc/task_mmu.c | 52 ++++++++++++++++++++++++++++++++++--
> include/linux/mm.h | 28 ++++++++++++++++++++
> include/linux/mm_types.h | 9 ++++---
> include/linux/shmem_fs.h | 6 +++++
> kernel/events/uprobes.c | 2 +-
> mm/memory.c | 30 +++++++--------------
> mm/oom_kill.c | 5 ++--
> mm/rmap.c | 15 +++--------
> mm/shmem.c | 54 ++++++++++++++++++++++++++++++++++++++
> 11 files changed, 178 insertions(+), 46 deletions(-)
>
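
For readers who want to try the smaps-based approach from the quoted cover letter, here is a minimal user-space sketch in Python. It only sums the existing "Swap: X kB" lines of an smaps dump. Whether those lines include swapped-out shmem depends on the running kernel carrying patch 2/4 or an equivalent, which this sketch simply assumes. The function name total_swap_kb is hypothetical.

```python
def total_swap_kb(smaps_text):
    """Sum the 'Swap: X kB' lines of an smaps dump and return the total in kB."""
    total = 0
    for line in smaps_text.splitlines():
        fields = line.split()
        # Match the "Swap:" field exactly so that "SwapPss:" is not counted.
        if len(fields) >= 2 and fields[0] == "Swap:":
            total += int(fields[1])
    return total

if __name__ == "__main__":
    try:
        with open("/proc/self/smaps") as f:
            print(total_swap_kb(f.read()), "kB")
    except OSError:
        # Not on Linux, or procfs is not mounted.
        print("smaps not available on this system")
```

Reading /proc/<pid>/smaps of another process works the same way, given sufficient permissions. Walking smaps costs more than reading VmSwap from /proc/<pid>/status, which is the price mentioned in the cover letter.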