Re: [PATCH v2] mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp()

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH v2] mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp()
Date: Sat, 20 Apr 2024 12:05:04 +0800	[thread overview]
Message-ID: <073f3d13-5656-4de0-a62b-cee96f2b0eaa@huawei.com> (raw)
In-Reply-To: <ZiKLBxS8D_jzD6F9@x1n>



On 2024/4/19 23:17, Peter Xu wrote:
> On Fri, Apr 19, 2024 at 11:00:46AM +0800, Kefeng Wang wrote:
>>
>>
>> On 2024/4/19 0:32, Peter Xu wrote:
>>> Hi, Kefeng,
>>>
>>> On Thu, Apr 18, 2024 at 08:06:41PM +0800, Kefeng Wang wrote:
>>>> Add userfaultfd_wp() check in vmf_orig_pte_uffd_wp() to avoid the
>>>> unnecessary pte_marker_entry_uffd_wp() in most pagefault, difference
>>>> as shows below from perf data of lat_pagefault, note, the function
>>>> vmf_orig_pte_uffd_wp() is not inlined in the two kernel versions.
>>>>
>>>>     perf report -i perf.data.before | grep vmf
>>>>        0.17%     0.13%  lat_pagefault  [kernel.kallsyms]      [k] vmf_orig_pte_uffd_wp.part.0.isra.0
>>>>     perf report -i perf.data.after  | grep vmf
>>>
>>> Any real number to share too besides the perf greps?  I meant, even if perf
>>> report will not report such function anymore, it doesn't mean it'll be
>>> faster, and how much it improves?
>>
>> dd if=/dev/zero of=/tmp/XXX bs=512M count=1
>> ./lat_pagefault -W 5 -N 5 /tmp/XXX
>>
>> 	before		after	
>> 1	0.2623		0.2605	
>> 2	0.2622		0.2598	
>> 3	0.2621		0.2595	
>> 4	0.2622		0.2600	
>> 5	0.2651		0.2598	
>> 6	0.2624		0.2594	
>> 7	0.2624		0.2605	
>> 8	0.2627		0.2608	
>> average	0.262675	0.2600375	-0.0026375
>>
>> The lat_pagefault does show some improvement(also I reboot and retest,
>> the results are same).
> 
> Thanks. Could you replace the perf report with these real data?  Or just
> append to it.

Sure, will append it.

> 
> I had a look at the asm and indeed the current code will generate two
> jumps when without this patch, and I don't know why..
> 
>     0x0000000000006ac4 <+52>:    test   $0x8,%ah                 <---- check FAULT_FLAG_ORIG_PTE_VALID
>     0x0000000000006ac7 <+55>:    jne    0x6bcf <set_pte_range+319>
>     0x0000000000006acd <+61>:    mov    0x18(%rbp),%rsi
> 
>     ...
> 
>     0x0000000000006bcf <+319>:   mov    0x40(%rdi),%rdi
>     0x0000000000006bd3 <+323>:   test   $0xffffffffffffff9f,%rdi <---- pte_none() check
>     0x0000000000006bda <+330>:   je     0x6acd <set_pte_range+61>
>     0x0000000000006be0 <+336>:   test   $0x101,%edi              <---- pte_present() check
>     0x0000000000006be6 <+342>:   jne    0x6acd <set_pte_range+61>
>     0x0000000000006bec <+348>:   call   0x1c50 <pte_to_swp_entry>
>     0x0000000000006bf1 <+353>:   mov    0x0(%rip),%rdx        # 0x6bf8 <set_pte_range+360>
>     0x0000000000006bf8 <+360>:   mov    %rax,%r15
>     0x0000000000006bfb <+363>:   shr    $0x3a,%rax
>     0x0000000000006bff <+367>:   cmp    $0x1f,%rax
>     0x0000000000006c03 <+371>:   mov    $0x0,%eax
>     0x0000000000006c08 <+376>:   cmovne %rax,%r15
>     0x0000000000006c0c <+380>:   mov    0x28(%rbx),%eax
>     0x0000000000006c0f <+383>:   and    $0x1,%r15d
>     0x0000000000006c13 <+387>:   jmp    0x6acd <set_pte_range+61>
> 
> I also don't know why the compiler cannot already merge the none+present
> check into one shot, I thought it could.  Also surprised me that
> pte_to_swp_entry() is a function call.. but not involved in this context.
> 
> So I think I was right it should bypass this when seeing it pte_none,
> however that includes two jumps.
> 
> And with your patch applied the two jumps are not there:
> 
>     0x0000000000006b0c <+124>:   testb  $0x8,0x29(%r14)           <--- FAULT_FLAG_ORIG_PTE_VALID
>     0x0000000000006b11 <+129>:   je     0x6b6a <set_pte_range+218>
>     0x0000000000006b13 <+131>:   mov    (%r14),%rax
>     0x0000000000006b16 <+134>:   testb  $0x10,0x21(%rax)          <--- userfaultfd_wp(vmf->vma) check
>     0x0000000000006b1a <+138>:   je     0x6b6a <set_pte_range+218>
> 
> Maybe that's what contributes to that 0.x% extra time of a fault.
> 
> So if we do care about this 0.x% and we're doing this anyway, perhaps move

The latency of lat_pagefault increased a lot than the old kernel(vs 
5.10), except mm counter updating, the another obvious difference
shown from perf graph is the new vmf_orig_pte_uffd_wp().

> the vma check upper?  Because afaict FAULT_FLAG_ORIG_PTE_VALID should
> always hit in set_pte_range(), so we can avoid two more insts in the common
> paths.

Moving it upper is better, and maybe add __always_inline to 
vmf_orig_pte_uffd_wp() to make set_pte_range() only check VM_UFFD_WP 
from vm_flags?

> 
> I'll leave that to you too if you want to mention some details in above and
> add that into the commit log.

Will update the changelog, thanks.
> 
> Thanks,
>

next prev parent reply	other threads:[~2024-04-20  4:05 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-18 12:06 [PATCH v2] mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp() Kefeng Wang
2024-04-18 16:32 ` Peter Xu
2024-04-19  3:00   ` Kefeng Wang
2024-04-19 15:17     ` Peter Xu
2024-04-20  4:05       ` Kefeng Wang [this message]
2024-04-21 13:53         ` Peter Xu
2024-04-22  2:13           ` Kefeng Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=073f3d13-5656-4de0-a62b-cee96f2b0eaa@huawei.com \
    --to=wangkefeng.wang@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox