From: Thomas Gleixner <tglx@linutronix.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
	Uladzislau Rezki <urezki@gmail.com>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Baoquan He <bhe@redhat.com>, John Ogness <jogness@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	Russell King <linux@arm.linux.org.uk>,
	Mark Rutland <mark.rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>
Subject: Excessive TLB flush ranges
Date: Mon, 15 May 2023 18:43:40 +0200	[thread overview]
Message-ID: <87a5y5a6kj.ffs@tglx> (raw)

Folks!

We're observing massive latencies and slowdowns on ARM32 machines due to
excessive TLB flush ranges.

They show up when tearing down a process which has a seccomp BPF filter
installed. ARM32 uses the vmalloc area for module space.

bpf_prog_free_deferred()
  vfree()
    _vm_unmap_aliases()
       collect_per_cpu_vmap_blocks: start:0x95c8d000 end:0x95c8e000 size:0x1000 
       __purge_vmap_area_lazy(start:0x95c8d000, end:0x95c8e000)

         va_start:0xf08a1000 va_end:0xf08a5000 size:0x00004000 gap:0x5ac13000 (371731 pages)
         va_start:0xf08a5000 va_end:0xf08a9000 size:0x00004000 gap:0x00000000 (     0 pages)
         va_start:0xf08a9000 va_end:0xf08ad000 size:0x00004000 gap:0x00000000 (     0 pages)
         va_start:0xf08ad000 va_end:0xf08b1000 size:0x00004000 gap:0x00000000 (     0 pages)
         va_start:0xf08b3000 va_end:0xf08b7000 size:0x00004000 gap:0x00002000 (     2 pages)
         va_start:0xf08b7000 va_end:0xf08bb000 size:0x00004000 gap:0x00000000 (     0 pages)
         va_start:0xf08bb000 va_end:0xf08bf000 size:0x00004000 gap:0x00000000 (     0 pages)
         va_start:0xf0a15000 va_end:0xf0a17000 size:0x00002000 gap:0x00156000 (   342 pages)

      flush_tlb_kernel_range(start:0x95c8d000, end:0xf0a17000)

         Does 372106 flush operations where only 31 are useful
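
The two numbers above can be rechecked from the addresses in the trace.
The following is only an illustrative sketch, not kernel code: it assumes
4 KiB pages (as the sizes in the trace suggest) and non-overlapping
ranges, and the names TRACE_RANGES and flush_stats() are made up for
this example.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, matching the sizes printed in the trace

# (start, end) pairs copied from the trace: the block collected by
# _vm_unmap_aliases() followed by the eight va entries on the purge list.
TRACE_RANGES = [
    (0x95C8D000, 0x95C8E000),
    (0xF08A1000, 0xF08A5000),
    (0xF08A5000, 0xF08A9000),
    (0xF08A9000, 0xF08AD000),
    (0xF08AD000, 0xF08B1000),
    (0xF08B3000, 0xF08B7000),
    (0xF08B7000, 0xF08BB000),
    (0xF08BB000, 0xF08BF000),
    (0xF0A15000, 0xF0A17000),
]


def flush_stats(ranges, page_size=PAGE_SIZE):
    """Return (start, end, merged_pages, useful_pages).

    merged_pages counts every page between the lowest start and the
    highest end; useful_pages counts only pages inside the ranges.
    Assumes the ranges do not overlap.
    """
    start = min(s for s, _ in ranges)
    end = max(e for _, e in ranges)
    merged_pages = (end - start) // page_size
    useful_pages = sum((e - s) // page_size for s, e in ranges)
    return start, end, merged_pages, useful_pages


if __name__ == "__main__":
    start, end, merged, useful = flush_stats(TRACE_RANGES)
    print(f"flush {start:#x}-{end:#x}: {merged} pages, {useful} useful")
```

Almost all of the merged range is the 371731-page gap between the
collected vmap block and the first va entry.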

So for all architectures which lack a mechanism to do a full TLB flush
in flush_tlb_kernel_range() this takes ages (4-8ms) and slows down
realtime processes on the other CPUs by a factor of two or more.

So while ARM32, CSKY, NIOS and PPC (some variants) _should_ arguably
have a fallback to tlb_flush_all() when the range is too large, there is
another issue. I've seen a couple of instances where _vm_unmap_aliases()
collects one page and the actual va list has only 2 pages, which might
eventually be worth flushing one by one.

I'm not sure whether that's worth it as checking for those gaps might be
too expensive for the case where a large number of va entries needs to
be flushed.
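
To make the trade-off concrete, here is a rough sketch of how such a
decision could look. It is not a proposed patch and not how any
architecture implements it today: choose_flush() and the MAX_FLUSH_PAGES
cut-off of 64 are assumptions picked purely for illustration. It needs
one pass over the list to get the span and the number of mapped pages,
which is exactly the extra cost in question.

```python
PAGE_SIZE = 0x1000     # 4 KiB pages, as in the trace
MAX_FLUSH_PAGES = 64   # hypothetical cut-off, not taken from any kernel


def choose_flush(ranges, max_pages=MAX_FLUSH_PAGES, page_size=PAGE_SIZE):
    """Pick a flush strategy for a list of non-overlapping (start, end) ranges.

    Returns one of:
      "merged_range" - one ranged flush from lowest start to highest end
      "per_range"    - one ranged flush per entry, skipping the gaps
      "full"         - a full TLB flush
    """
    span_pages = (max(e for _, e in ranges)
                  - min(s for s, _ in ranges)) // page_size
    mapped_pages = sum((e - s) // page_size for s, e in ranges)

    if span_pages <= max_pages:
        # Gaps are small enough that a single ranged flush is cheap.
        return "merged_range"
    if mapped_pages <= max_pages:
        # Huge gaps but few mapped pages: flush the entries one by one.
        return "per_range"
    # Too many pages either way: fall back to flushing everything.
    return "full"


if __name__ == "__main__":
    # One collected page plus a 2-page va entry far away from it.
    print(choose_flush([(0x95C8D000, 0x95C8E000), (0xF08A1000, 0xF08A3000)]))
```

With these assumed numbers the case from the trace (31 mapped pages
spread over 372106) would end up in the one-by-one path, while a long va
list with many mapped pages would take the full flush.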

We'll experiment with a tlb_flush_all() fallback on that ARM32 system
over the next few days and see how that works out.

Thanks,

        tglx

