From: Uladzislau Rezki <urezki@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	"Russell King (Oracle)" <linux@armlinux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Baoquan He <bhe@redhat.com>, John Ogness <jogness@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	x86@kernel.org, Nadav Amit <nadav.amit@gmail.com>
Subject: Re: Excessive TLB flush ranges
Date: Fri, 19 May 2023 19:02:49 +0200
Message-ID: <ZGeruW3ouCiJ61kF@pc636>
In-Reply-To: <87fs7s46z9.ffs@tglx>

On Fri, May 19, 2023 at 06:32:42PM +0200, Thomas Gleixner wrote:
> On Fri, May 19 2023 at 17:14, Uladzislau Rezki wrote:
> > On Fri, May 19, 2023 at 04:56:53PM +0200, Thomas Gleixner wrote:
> >> > +       /* Flush per-VA. */
> >> > +       list_for_each_entry(va, &local_purge_list, list)
> >> > +               flush_tlb_kernel_range(va->va_start, va->va_end);
> >> >
> >> > -       flush_tlb_kernel_range(start, end);
> >> >         resched_threshold = lazy_max_pages() << 1;
> >> 
> >> That's completely wrong, really.
> >> 
> > Absolutely. That is why we do not flush a range per-VA ;-) I provided the
> > data just to show what happens if we do it!
> 
> Seriously, you think you need to demonstrate that to me? Did you
> actually read what I wrote?
> 
>    "I understand why you want to batch and coalesce and rather do a rare
>     full tlb flush than sending gazillions of IPIs."
> 
Yes, I read it. Since I had also mentioned IPIs without providing any
data, I supplied it later, just in case. I shared my observation, and that is it.

> > A per-VA flushing works when a system is not capable of doing a full
> > flush, so it has to do it page by page. In this scenario we should
> > bypass ranges(not mapped) which are between VAs in a purge-list.
> 
> ARM32 has a full flush as does x86. Just ARM32 does not have a cutoff
> for a full flush in flush_tlb_kernel_range(). That's easily fixable, but
> the underlying problem remains.
> 
> The point is that coalescing the VA ranges blindly is also fundamentally
> wrong:
>
>        start1 = 0x95c8d000 end1 = 0x95c8e000
>        start2 = 0xf08a1000 end2 = 0xf08a5000
> 
> -->    start  = 0x95c8d000 end  = 0xf08a5000
> 
> So this ends up with:
> 
>    if (end - start > flush_all_threshold)
>            ipi_flush_all();
>    else
>            ipi_flush_range();
> 
> So with the above example this ends up with flush_all(), but with a
> flush_vas(), as I demonstrated with the list approach (ignore the storage
> problem, which is fixable), this results in
> 
>    if (total_nr_pages > flush_all_threshold)
>            ipi_flush_all();
>    else
>            ipi_flush_vas();
> 
> and that ipi flushes 3 pages instead of taking out the whole TLB, which
> results in a 1% gain on that machine. Not massive, but still.
> 
> The blind coalescing is also wrong if the resulting range is not gigantic
> but below the flush_all_threshold. Let's assume a threshold of 32 pages.
> 
>        start1 = 0xf0800000 end1 = 0xf0802000           2 pages
>        start2 = 0xf081e000 end2 = 0xf0820000           2 pages
> 
> -->    start  = 0xf0800000 end  = 0xf0820000
> 
> So because this does not qualify for a full flush and it should not,
> this ends up flushing 32 pages one by one instead of flushing exactly
> four.
> 
> IOW, the existing code is fully biased towards full flushes which is
> wrong.
> 
> Just because this does not show up in your performance numbers on some
> enterprise workload does not make it more correct.
> 
Usually we flush the lazy areas once the lazy_max_pages() threshold is
reached. There are three exceptions: when an allocation fails, we drain
the areas (if there are any); the second is the per-cpu allocator; and the
last is vm_reset_perms(), when a "vm" is marked as VM_FLUSH_RESET_PERMS.

As for your description, I totally see the problem.
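
To make the two decision rules quoted above concrete, here is a minimal
user-space sketch, not kernel code. It assumes 4 KiB pages, and every name
in it (struct va_range, coalesced_pages(), total_pages(),
flush_all_coalesced(), flush_all_per_va()) is hypothetical, chosen only
for illustration. It compares the page count of the blindly coalesced
range with the sum of pages over the individual VAs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for one entry of a purge list; end is exclusive. */
struct va_range {
	unsigned long start;
	unsigned long end;
};

/* Pages covered by the single range spanning min(start)..max(end). */
static unsigned long coalesced_pages(const struct va_range *r, size_t n)
{
	unsigned long start, end;
	size_t i;

	assert(n > 0);
	start = r[0].start;
	end = r[0].end;
	for (i = 1; i < n; i++) {
		if (r[i].start < start)
			start = r[i].start;
		if (r[i].end > end)
			end = r[i].end;
	}
	return (end - start) >> PAGE_SHIFT;
}

/* Pages actually covered by the ranges, summed one by one. */
static unsigned long total_pages(const struct va_range *r, size_t n)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(r[i].end >= r[i].start);
		pages += (r[i].end - r[i].start) >> PAGE_SHIFT;
	}
	return pages;
}

/* Rule 1: decide on the size of the coalesced range. */
static bool flush_all_coalesced(const struct va_range *r, size_t n,
				unsigned long threshold)
{
	return coalesced_pages(r, n) > threshold;
}

/* Rule 2: decide on the number of pages that really need flushing. */
static bool flush_all_per_va(const struct va_range *r, size_t n,
			     unsigned long threshold)
{
	return total_pages(r, n) > threshold;
}
```

For the second example above (0xf0800000-0xf0802000 and
0xf081e000-0xf0820000, threshold of 32 pages), coalesced_pages() yields 32
and total_pages() yields 4. Neither rule asks for a full flush, but the
coalesced range would be walked for 32 pages while the per-VA list only
needs four. For the first example, flush_all_coalesced() returns true and
flush_all_per_va() returns false.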

--
Uladzislau Rezki
