Re: [PATCH V2 0/6] VA to numa node information

From: "prakash.sangappa" <prakash.sangappa@oracle.com>
To: Steven Sistare <steven.sistare@oracle.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	dave.hansen@intel.com, nao.horiguchi@gmail.com,
	akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
	khandual@linux.vnet.ibm.com
Subject: Re: [PATCH V2 0/6] VA to numa node information
Date: Tue, 18 Dec 2018 15:46:45 -0800
Message-ID: <c81d912f-157f-749a-92fb-78f5e836da85@oracle.com>
In-Reply-To: <79d5e991-d9f6-65e2-cb77-0f999fa512fe@oracle.com>

On 11/26/2018 11:20 AM, Steven Sistare wrote:
> On 11/9/2018 11:48 PM, Prakash Sangappa wrote:
>>
>> Here is some data from pmap using move_pages() API  with optimization.
>> Following table compares time pmap takes to print address mapping of a
>> large process, with numa node information using move_pages() api vs pmap
>> using /proc numa_vamaps file.
>>
>> Running pmap command on a process with 1.3 TB of address space, with
>> sparse mappings.
>>
>>                         ~1.3 TB sparse    250G dense segment with hugepages
>> move_pages              8.33s             3.14s
>> optimized move_pages    6.29s             0.92s
>> /proc numa_vamaps       0.08s             0.04s
>>
>>   
>> Second column is pmap time on a 250G address range of this process, which maps
>> hugepages(THP & hugetlb).
> The data look compelling to me.  numa_vamaps provides a much smoother user experience
> for the analyst who is casting a wide net looking for the root of a performance issue.
> Almost no waiting to see the data.
>
> - Steve

What do others think? How should we proceed on this?

Summarizing the discussion so far:

The use case for getting VA (virtual address) to NUMA node information is
performance analysis. Investigating performance issues involves looking at
where a process's memory is allocated from (which NUMA node). For the user
analyzing the issue, an efficient way to get this information is useful
when looking at application processes that have a large address space.

The patch proposes adding a /proc/<pid>/numa_vamaps file that provides the
VA to NUMA node id mapping of a process. The file lists address ranges
with their NUMA node ids. An address range that has no pages mapped is
indicated with '-' in place of the node id. Sample file content:

00400000-00410000 N1
00410000-0047f000 N0
00480000-00481000 -
00481000-004a0000 N0
..
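
For anyone who wants to post-process this output, here is a minimal
parsing sketch in Python. It assumes only the line format shown in the
sample above, and it assumes the end address is exclusive, as in
/proc/<pid>/maps. The names VaRange and parse_numa_vamaps are made up
for illustration and are not part of the patch.

```python
import re
from typing import List, NamedTuple, Optional


class VaRange(NamedTuple):
    start: int           # start address of the range
    end: int             # end address (assumed exclusive, as in /proc/<pid>/maps)
    node: Optional[int]  # NUMA node id, or None when the range is shown as '-'


# One line looks like "00400000-00410000 N1" or "00480000-00481000 -".
_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+(?:N(\d+)|-)$")


def parse_numa_vamaps(text: str) -> List[VaRange]:
    """Parse numa_vamaps-style text into a list of VaRange tuples."""
    ranges = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognized lines
        start, end, node = m.groups()
        ranges.append(VaRange(int(start, 16), int(end, 16),
                              None if node is None else int(node)))
    return ranges


# The sample content from the text above.
SAMPLE = """\
00400000-00410000 N1
00410000-0047f000 N0
00480000-00481000 -
00481000-004a0000 N0
"""
```

Calling parse_numa_vamaps(SAMPLE) gives four ranges, and summing
end - start per node id gives the number of bytes placed on each node.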

Dave Hansen asked how it would scale with respect to reading this file
from a large process. The answer is that the file contents are generated
using a page table walk and copied to the user buffer. The mmap_sem lock
is dropped and re-acquired in the process of walking the page table and
copying the file content. The kernel buffer size used determines how long
the lock is held. This can be further improved to drop the lock and
re-acquire it after a fixed number (512) of pages are walked.
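
To illustrate the batching idea, here is a user-space analogy in Python.
This is not the kernel code: a threading.Lock stands in for mmap_sem
(which in the kernel is a read/write semaphore), and walk_in_batches and
its parameters are hypothetical names. The point is only that the lock
hold time is bounded by the batch size rather than by the size of the
address space.

```python
import threading
from itertools import islice
from typing import Callable, Iterable

BATCH_PAGES = 512  # fixed batch size mentioned in the text


def walk_in_batches(pages: Iterable[int],
                    visit: Callable[[int], None],
                    lock: threading.Lock,
                    batch: int = BATCH_PAGES) -> int:
    """Visit every page, holding `lock` for at most `batch` pages at a time.

    Returns how many times the lock was acquired.
    """
    acquisitions = 0
    it = iter(pages)
    while True:
        with lock:
            acquisitions += 1
            chunk = list(islice(it, batch))  # next batch of pages to walk
            for page in chunk:
                visit(page)
        # The lock is released here, so anyone waiting on it can run
        # before the next batch starts.
        if len(chunk) < batch:
            break  # ran out of pages
    return acquisitions
```

With 1300 pages and the default batch size, the lock is taken three
times (512 + 512 + 276 pages).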

Also, the file supports seeking to a specific VA of the process, from
which the VA to NUMA node information will then be provided, so the file
offset is not taken into consideration. This behavior differs from
reading a normal file. Other /proc files (e.g. /proc/<pid>/pagemap) also
differ in certain ways from reading a normal file.
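
As a rough illustration of how offsets in such files differ from a
normal file, here is a small Python sketch. The pagemap offset formula
follows from pagemap holding one 64-bit entry per virtual page.
read_from is a hypothetical helper; treating the VA itself as the seek
offset for numa_vamaps is my reading of the description above and should
be taken as an assumption.

```python
import os

PAGEMAP_ENTRY_BYTES = 8  # pagemap holds one 64-bit entry per virtual page


def pagemap_offset(va: int, page_size: int) -> int:
    """File offset of the /proc/<pid>/pagemap entry covering address `va`."""
    return (va // page_size) * PAGEMAP_ENTRY_BYTES


def read_from(path: str, offset: int, nbytes: int = 4096) -> bytes:
    """Seek to `offset` and read up to `nbytes`.

    On a normal file, `offset` is a byte position in the contents.
    For numa_vamaps, `offset` would be the VA to start reporting from
    (assumption based on the description, not verified against the patch).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, nbytes)
    finally:
        os.close(fd)
```

For example, with 4k pages the pagemap entry for address 0x400000 is at
file offset (0x400000 / 4096) * 8 = 8192, whereas a tool reading
numa_vamaps would seek to 0x400000 directly under the assumption above.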

Michal Hocko suggested that the currently available move_pages() API
could be used to collect the VA to NUMA node id information. However,
using the numa_vamaps /proc file is more efficient than move_pages().
Steven Sistare suggested optimizing move_pages() for the case when
consecutive 4k page addresses are passed in. I tried out this
optimization, and the table quoted above compares the performance of the
move_pages() API with the numa_vamaps /proc file. In the sparse mapping
case in particular, the optimization to move_pages() helps little
(8.33s vs. 6.29s, against 0.08s for the /proc file). The performance
benefit seen with the /proc file will make a difference from a usability
point of view.
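
To make the move_pages() approach concrete: when move_pages() is called
with a NULL nodes array it only reports, per page, either the node id or
a negative errno (for example -ENOENT when the page is not present). A
tool like pmap then has to merge those per-page results into ranges. A
minimal sketch of that merging step in Python follows; the function
names and the 4k page size are assumptions for illustration, and this is
not the actual pmap code.

```python
from typing import Iterable, List, Optional, Tuple

PAGE_SIZE = 4096  # assumption: 4k base pages, as in the text

Range = Tuple[int, int, Optional[int]]  # (start, end, node or None)


def coalesce(pages: Iterable[Tuple[int, int]],
             page_size: int = PAGE_SIZE) -> List[Range]:
    """Merge per-page (address, status) results into address ranges.

    status >= 0 is a node id; a negative status (an errno) is treated
    as "no page mapped" and reported with node None.
    """
    out: List[Range] = []
    for addr, status in pages:
        node = status if status >= 0 else None
        if out and out[-1][1] == addr and out[-1][2] == node:
            # Contiguous with the previous range and on the same node:
            # extend that range by one page.
            out[-1] = (out[-1][0], addr + page_size, node)
        else:
            out.append((addr, addr + page_size, node))
    return out


def fmt(r: Range) -> str:
    """Format a range in the same style as the numa_vamaps sample."""
    start, end, node = r
    return f"{start:08x}-{end:08x} " + ("-" if node is None else f"N{node}")
```

Note that this needs one status entry per 4k page of the address range
being examined, which is part of why the per-page interface is costly
for very large or sparse address spaces.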

Andrew Morton had asked about the performance difference between the
move_pages() API and the numa_vamaps /proc file, and also about the use
case for getting VA to NUMA node id information. I hope the above
description answers those questions.


Thread overview: 21+ messages
2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
2018-09-12 20:23 ` [PATCH V2 1/6] Add check to match numa node id when gathering pte stats Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 2/6] Add /proc/<pid>/numa_vamaps file for numa node information Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 3/6] Provide process address range to numa node id mapping Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 4/6] Add support to lseek /proc/<pid>/numa_vamaps file Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 5/6] File /proc/<pid>/numa_vamaps access needs PTRACE_MODE_READ_REALCREDS check Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 6/6] /proc/pid/numa_vamaps: document in Documentation/filesystems/proc.txt Prakash Sangappa
2018-09-13  8:40 ` [PATCH V2 0/6] VA to numa node information Michal Hocko
2018-09-13 22:32   ` prakash.sangappa
2018-09-14  0:10     ` Andrew Morton
2018-09-14  0:25       ` Dave Hansen
2018-09-15  1:31         ` Prakash Sangappa
2018-09-14  5:56     ` Michal Hocko
2018-09-14 16:01       ` Steven Sistare
2018-09-14 18:04         ` Prakash Sangappa
2018-09-14 19:01           ` Dave Hansen
2018-09-24 17:14         ` Michal Hocko
2018-11-10  4:48           ` Prakash Sangappa
2018-11-26 19:20             ` Steven Sistare
2018-12-18 23:46               ` prakash.sangappa [this message]
2018-12-19 20:52                 ` Michal Hocko
