Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "surenb@google.com" <surenb@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Agrawal, Punit" <punit.agrawal@bytedance.com>
Subject: Re: [PATCH 6/6] mm: Run the fault-around code under the VMA lock
Date: Wed, 12 Apr 2023 15:35:49 +0800	[thread overview]
Message-ID: <4e8b083c-cdb4-a5e3-abb5-2aae259bd2d7@intel.com> (raw)
In-Reply-To: <ZDQZBdJNiG0lIw2v@casper.infradead.org>



On 4/10/2023 10:11 PM, Matthew Wilcox wrote:
> On Mon, Apr 10, 2023 at 04:53:19AM +0000, Yin, Fengwei wrote:
>> On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
>>> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle)
>>> wrote:
>>>> The map_pages fs method should be safe to run under the VMA lock
>>>> instead
>>>> of the mmap lock.  This should have a measurable reduction in
>>>> contention
>>>> on the mmap lock.
>>>
>>> https://github.com/antonblanchard/will-it-scale/pull/37/files should
>>> be a good microbenchmark to report numbers from.  Obviously real-
>>> world
>>> benchmarks will be more compelling.
>>>
>>
>> Test result in my side with page_fault4 of will-it-scale in thread 
>> mode is:
>>   15274196 (without the patch) -> 17291444 (with the patch)
>>
>> 13.2% improvement on a Ice Lake with 48C/96T + 192G RAM + ext4 
>> filesystem.
> 
> Thanks!  That is really good news.
> 
>> The perf showed the mmap_lock contention reduced a lot:
>> (Removed the grandson functions of do_user_addr_fault()) 
>>
>> latest linux-next with the patch:
>>     51.78%--do_user_addr_fault
>>             |          
>>             |--49.09%--handle_mm_fault
>>             |--1.19%--lock_vma_under_rcu
>>             --1.09%--down_read
>>
>> latest linux-next without the patch:
>>     73.65%--do_user_addr_fault
>>             |          
>>             |--28.65%--handle_mm_fault
>>             |--17.22%--down_read_trylock
>>             |--10.92%--down_read
>>             |--9.20%--up_read
>>             --7.30%--find_vma
>>
>> My understanding is down_read_trylock, down_read and up_read all are
>> related with mmap_lock. So the mmap_lock contention reduction is quite
>> obvious.
> 
> Absolutely.  I'm a little surprised that find_vma() basically disappeared
> from the perf results.  I guess that it was cache cold after contending
> on the mmap_lock rwsem.  But this is a very encouraging result.

Yes. find_vma() was surprise. So I did more check about it.
1. re-run the testing for more 3 times in case I made stupid mistake.
   All testing show same behavior for find_vma().

2. perf report for find_vma() shows:
	6.26%--find_vma                                       
		|                                             
		--0.66%--mt_find                             
			|                                  
			--0.60%--mtree_range_walk

   The most time used in find_vma() is not mt_find. It's mmap_assert_locked(mm).

3. perf annotate of find_vma shows:
          │    ffffffffa91e9f20 <load0>:                                                                                                                         
     0.07 │      nop                                                                                                                                             
     0.07 │      sub  $0x10,%rsp                                                                                                                                 
     0.01 │      mov  %gs:0x28,%rax                                                                                                                              
     0.05 │      mov  %rax,0x8(%rsp)                                                                                                                             
     0.02 │      xor  %eax,%eax                                                                                                                                  
          │      mov  %rsi,(%rsp)                                                                                                                                
     0.00 │      mov  0x70(%rdi),%rax     --> This is rwsem_is_locked(&mm->mmap_lock)                                                                                                                      
    99.60 │      test %rax,%rax                                                                                                                                  
          │    ↓ je   4e                                                                                                                                         
     0.09 │      mov  $0xffffffffffffffff,%rdx                                                                                                                   
          │      mov  %rsp,%rsi

   I believe the line "mov  0x70(%rdi),%rax" should occupy 99.60% runtime of find_vma()
   instead of line "test %rax,%rax". And it's accessing the mmap_lock. With this series,
   the mmap_lock contention is reduced a lot with page_fault4 of will-it-scale. It
   makes mmap_lock access fast also.


Regards
Yin, Fengwei

     prev parent reply	other threads:[~2023-04-12  7:36 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04 13:58 [PATCH 0/6] Avoid the mmap lock for fault-around Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs Matthew Wilcox (Oracle)
2023-04-07 17:54   ` Suren Baghdasaryan
2023-04-07 20:12     ` Matthew Wilcox
2023-04-07 20:26       ` Suren Baghdasaryan
2023-04-07 21:36         ` Matthew Wilcox
2023-04-07 22:40           ` Suren Baghdasaryan
2023-04-07 22:49             ` Suren Baghdasaryan
2023-04-14 18:02           ` Suren Baghdasaryan
2023-04-04 13:58 ` [PATCH 2/6] mm: Move FAULT_FLAG_VMA_LOCK check from handle_mm_fault() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 3/6] mm: Move FAULT_FLAG_VMA_LOCK check into handle_pte_fault() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 4/6] mm: Move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 5/6] mm: Move the FAULT_FLAG_VMA_LOCK check down from do_pte_missing() Matthew Wilcox (Oracle)
2023-04-04 13:58 ` [PATCH 6/6] mm: Run the fault-around code under the VMA lock Matthew Wilcox (Oracle)
2023-04-04 15:23   ` Matthew Wilcox
2023-04-07 18:20     ` Suren Baghdasaryan
2023-04-10  4:53     ` Yin, Fengwei
2023-04-10 14:11       ` Matthew Wilcox
2023-04-12  7:35         ` Yin, Fengwei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e8b083c-cdb4-a5e3-abb5-2aae259bd2d7@intel.com \
    --to=fengwei.yin@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=punit.agrawal@bytedance.com \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).