Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Michael Ellerman <mpe@ellerman.id.au>
To: Nicholas Piggin <npiggin@gmail.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
Date: Wed, 04 Aug 2021 16:59:25 +1000	[thread overview]
Message-ID: <87im0lofua.fsf@mpe.ellerman.id.au> (raw)
In-Reply-To: <1628053302.0qclx0xcj9.astroid@bobo.none>

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
>> With shared mapping, even though we are unmapping a large range, the kernel
>> will force a TLB flush with ptl lock held to avoid the race mentioned in
>> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
>> This results in the kernel issuing a high number of TLB flushes even for a large
>> range. This can be improved by making sure the kernel switch to pid based flush if the
>> kernel is unmapping a 2M range.
>
> It would be good to have a bit more description here.
>
> In any patch that changes a heuristic like this, I would like to see 
> some justification or reasoning that could be refuted or used as a 
> supporting argument if we ever wanted to change the heuristic later.
> Ideally with some of the obvious downsides listed as well.
>
> This "improves" things here, but what if it hurt things elsewhere, how 
> would we come in later and decide to change it back?
>
> THP flushes for example, I think now they'll do PID flushes (if they 
> have to be broadcast, which they will tend to be when khugepaged does
> them). So now that might increase jitter for THP and cause it to be a
> loss for more workloads.
>
> So where do you notice this? What's the benefit?

Ack. Needs some numbers and supporting evidence.

>> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
>> index aefc100d79a7..21d0f098e43b 100644
>> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
>> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
>> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>>   * invalidating a full PID, so it has a far lower threshold to change from
>>   * individual page flushes to full-pid flushes.
>>   */
>> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
>> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>>  
>>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>>  	if (fullmm)
>>  		flush_pid = true;
>>  	else if (type == FLUSH_TYPE_GLOBAL)
>> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
>> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>
> Arguably >= is nicer than > here, but this shouldn't be in the same 
> patch as the value change.
>
>>  	else
>>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
>
> And it should change everything to be consistent. Although I'm not sure 
> it's worth changing even though I highly doubt any administrator would
> be tweaking this.

This made me look at how an administrator tweaks these thresholds, and
AFAICS the answer is "recompile the kernel"?

It looks like x86 have a debugfs file for tlb_single_page_flush_ceiling,
but we don't. I guess we meant to copy that but never did?

So at the moment both thresholds could just be #defines.

Making them tweakable at runtime would be nice, it would give us an
escape hatch if we ever hit a workload in production that wants a
different value.

cheers

next prev parent reply	other threads:[~2021-08-04  6:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-03 14:37 [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE Aneesh Kumar K.V
2021-08-04  5:14 ` Nicholas Piggin
2021-08-04  6:39   ` Nicholas Piggin
2021-08-04  6:39     ` Nicholas Piggin
2021-08-04  7:34     ` Peter Zijlstra
2021-08-04  7:34       ` Peter Zijlstra
2021-08-04  6:59   ` Michael Ellerman [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-08-06  5:22 Puvichakravarthy Ramachandran
2021-08-06  7:56 Puvichakravarthy Ramachandran
2021-08-12 12:49 ` Michael Ellerman
2021-08-12 13:20   ` Aneesh Kumar K.V
2021-08-16  7:03     ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87im0lofua.fsf@mpe.ellerman.id.au \
    --to=mpe@ellerman.id.au \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.