From: Thomas Gleixner <tglx@linutronix.de>
To: Baoquan He <bhe@redhat.com>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
Uladzislau Rezki <urezki@gmail.com>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
John Ogness <jogness@linutronix.de>,
linux-arm-kernel@lists.infradead.org,
Mark Rutland <mark.rutland@arm.com>,
Marc Zyngier <maz@kernel.org>,
x86@kernel.org
Subject: Re: Excessive TLB flush ranges
Date: Fri, 19 May 2023 13:22:33 +0200
Message-ID: <875y8o5zwm.ffs@tglx>
In-Reply-To: <ZGSx6B2AWQAv/smj@MiWiFi-R3L-srv>

On Wed, May 17 2023 at 18:52, Baoquan He wrote:
> On 05/17/23 at 11:38am, Thomas Gleixner wrote:
>> On Tue, May 16 2023 at 21:03, Thomas Gleixner wrote:
>> >
>> > Aside of that, if I read the code correctly then if there is an unmap
>> > via vb_free() which does not cover the whole vmap block then vb->dirty
>> > is set and every _vm_unmap_aliases() invocation flushes that dirty range
>> > over and over until that vmap block is completely freed, no?
>>
>> Something like the below would cure that.
>>
>> While it prevents that this is flushed forever it does not cure the
>> eventually overly broad flush when the block is completely dirty and
>> purged:
>>
>> Assume a block with 1024 pages, where 1022 pages are already freed and
>> TLB flushed. Now the last 2 pages are freed and the block is purged,
>> which results in a flush of 1024 pages where 1022 are already done,
>> right?
>
> This is good idea, I am thinking how to reply to your last mail and how
> to fix this. While your cure code may not work well. Please see below
> inline comment.

See below.

> One vmap block has 64 pages.
> #define VMAP_MAX_ALLOC BITS_PER_LONG /* 256K with 4K pages */

No, VMAP_MAX_ALLOC is the allocation limit for a single vb_alloc().

On 64bit a vmap block has at least 128 pages, but can have up to 1024:

    #define VMAP_BBMAP_BITS_MAX     1024    /* 4MB with 4K pages */
    #define VMAP_BBMAP_BITS_MIN     (VMAP_MAX_ALLOC*2)

and then some magic happens to calculate the actual size

    #define VMAP_BBMAP_BITS \
                    VMAP_MIN(VMAP_BBMAP_BITS_MAX, \
                    VMAP_MAX(VMAP_BBMAP_BITS_MIN, \
                            VMALLOC_PAGES / roundup_pow_of_two(NR_CPUS) / 16))

which is in a range of (2*BITS_PER_LONG) ... 1024.

The actual vmap block size is:

    #define VMAP_BLOCK_SIZE         (VMAP_BBMAP_BITS * PAGE_SIZE)

Which is then obviously something between 512k and 4MB on 64bit and
between 256k and 4MB on 32bit.

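To make the clamping concrete, here is a minimal userspace sketch of the
same calculation. It is not the kernel code: BITS_PER_LONG and PAGE_SIZE
are fixed to assumed values (64bit, 4K pages), VMALLOC_PAGES and NR_CPUS
are passed in as plain parameters, and roundup_pow_of_two_ul(),
vmap_bbmap_bits() and vmap_block_size() are hypothetical helper names
used only for illustration.

```c
#include <assert.h>

#define BITS_PER_LONG        64UL    /* assumption: 64bit build */
#define PAGE_SIZE            4096UL  /* assumption: 4K pages */

#define VMAP_MAX_ALLOC       BITS_PER_LONG
#define VMAP_BBMAP_BITS_MAX  1024UL
#define VMAP_BBMAP_BITS_MIN  (VMAP_MAX_ALLOC * 2)

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
	unsigned long p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Same clamp as VMAP_BBMAP_BITS, with VMALLOC_PAGES and NR_CPUS
 * supplied by the caller instead of coming from the kernel config.
 */
static unsigned long vmap_bbmap_bits(unsigned long vmalloc_pages,
				     unsigned long nr_cpus)
{
	unsigned long bits;

	bits = vmalloc_pages / roundup_pow_of_two_ul(nr_cpus) / 16;
	if (bits < VMAP_BBMAP_BITS_MIN)
		bits = VMAP_BBMAP_BITS_MIN;
	if (bits > VMAP_BBMAP_BITS_MAX)
		bits = VMAP_BBMAP_BITS_MAX;
	return bits;
}

/* Block size in bytes, i.e. VMAP_BBMAP_BITS * PAGE_SIZE. */
static unsigned long vmap_block_size(unsigned long vmalloc_pages,
				     unsigned long nr_cpus)
{
	return vmap_bbmap_bits(vmalloc_pages, nr_cpus) * PAGE_SIZE;
}
```

With these assumed inputs the result never drops below 128 pages (512k)
and never exceeds 1024 pages (4MB), no matter how small or large the
vmalloc space is relative to the CPU count.
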
>> @@ -2240,13 +2240,17 @@ static void _vm_unmap_aliases(unsigned l
>> rcu_read_lock();
>> list_for_each_entry_rcu(vb, &vbq->free, free_list) {
>> spin_lock(&vb->lock);
>> - if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>> + if (vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
>> unsigned long va_start = vb->va->va_start;
>> unsigned long s, e;
>
> When vb_free() is invoked, it could cause three kinds of vmap_block as
> below. Your code works well for the 2nd case, for the 1st one, it may be
> not. And the 2nd one is the stuff that we reclaim and put into purge
> list in purge_fragmented_blocks_allcpus().
>
> 1)
> |-----|------------|-----------|-------|
> |dirty|still mapped| dirty | free |
>
> 2)
> |------------------------------|-------|
> | dirty | free |

You sure? The first one is put into the purge list too.

    /* Expand dirty range */
    vb->dirty_min = min(vb->dirty_min, offset);
    vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));

                pages   bits    dirtymin            dirtymax
vb_alloc(A)     2       0 - 1   VMAP_BBMAP_BITS     0
vb_alloc(B)     4       2 - 5
vb_alloc(C)     2       6 - 7

So you get three variants:

1) Flush after freeing A

vb_free(A)      2       0 - 1   0                   1
Flush                           VMAP_BBMAP_BITS     0    <- correct
vb_free(C)      2       6 - 7   6                   7
Flush                           VMAP_BBMAP_BITS     0    <- correct

2) No flush between freeing A and C

vb_free(A)      2       0 - 1   0                   1
vb_free(C)      2       6 - 7   0                   7
Flush                           VMAP_BBMAP_BITS     0    <- overbroad flush

3) No flush between freeing A, C, B

vb_free(A)      2       0 - 1   0                   1
vb_free(C)      2       6 - 7   0                   7
vb_free(B)      4       2 - 5   0                   7
Flush                           VMAP_BBMAP_BITS     0    <- correct

So my quick hack makes it correct for #1 and #3 and prevents repeated
flushes of already flushed areas.
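
The three variants can be replayed with a small userspace model. This is
only a sketch under assumptions: struct vb_model and the vb_model_*()
helpers are hypothetical stand-ins for struct vmap_block, vb_free() and
the flush in _vm_unmap_aliases(), sizes are given in pages instead of an
order, and VMAP_BBMAP_BITS is fixed to 1024. dirty_max is kept exclusive
here, as in the range expansion code quoted above, so the 1 and 7 in the
tables show up as 2 and 8.

```c
#include <assert.h>

#define VMAP_BBMAP_BITS 1024UL  /* assumption: largest possible block */

/* Hypothetical, stripped-down stand-in for struct vmap_block. */
struct vb_model {
	unsigned long dirty;      /* number of freed pages */
	unsigned long dirty_min;  /* first dirty page */
	unsigned long dirty_max;  /* one past the last dirty page */
};

static void vb_model_init(struct vb_model *vb)
{
	vb->dirty = 0;
	vb->dirty_min = VMAP_BBMAP_BITS;
	vb->dirty_max = 0;
}

/* Models the dirty range expansion done on free. */
static void vb_model_free(struct vb_model *vb, unsigned long offset,
			  unsigned long pages)
{
	assert(offset + pages <= VMAP_BBMAP_BITS);
	if (offset < vb->dirty_min)
		vb->dirty_min = offset;
	if (offset + pages > vb->dirty_max)
		vb->dirty_max = offset + pages;
	vb->dirty += pages;
}

/*
 * Models the flush with the quick hack applied: flush only when a
 * dirty range is recorded, then reset the range so that it is not
 * flushed again. Returns the number of pages covered by the flush.
 */
static unsigned long vb_model_flush(struct vb_model *vb)
{
	unsigned long flushed;

	if (!vb->dirty_max)
		return 0;
	flushed = vb->dirty_max - vb->dirty_min;
	vb->dirty_min = VMAP_BBMAP_BITS;
	vb->dirty_max = 0;
	return flushed;
}
```

In variant #2 the model flushes 8 pages although only 4 are dirty, which
is the overbroad flush. In #1 and #3 the flushed range matches the freed
pages, and a second flush without an intervening free covers nothing.
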
To prevent #2 you need a bitmap which keeps track of the flushed areas.
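
A rough userspace sketch of what such a bitmap could look like, assuming
one pending bit per page of the block. None of this is existing kernel
code: struct vb_bitmap_model, struct flush_range and the helpers are
made-up names, and a real implementation would use the kernel bitmap
helpers and issue the TLB flush per range instead of returning the
ranges to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define VMAP_BBMAP_BITS 1024UL  /* assumption: largest possible block */
#define BITS_PER_LONG   ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))
#define BBMAP_LONGS     ((VMAP_BBMAP_BITS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* One page range [start, end) which still needs a TLB flush. */
struct flush_range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical per-block state: one bit per freed but unflushed page. */
struct vb_bitmap_model {
	unsigned long pending[BBMAP_LONGS];
};

static int pending_test(const struct vb_bitmap_model *vb, unsigned long bit)
{
	return (vb->pending[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}

/* Mark the freed pages as pending instead of widening a min/max range. */
static void vb_bitmap_free(struct vb_bitmap_model *vb, unsigned long offset,
			   unsigned long pages)
{
	unsigned long i;

	assert(offset + pages <= VMAP_BBMAP_BITS);
	for (i = offset; i < offset + pages; i++)
		vb->pending[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}

/*
 * Collect each contiguous run of pending pages as one range and clear
 * the bits. Returns the number of ranges written to out. Runs which
 * do not fit into out stay pending for the next call.
 */
static size_t vb_bitmap_flush(struct vb_bitmap_model *vb,
			      struct flush_range *out, size_t max)
{
	unsigned long bit = 0;
	size_t n = 0;

	while (bit < VMAP_BBMAP_BITS && n < max) {
		if (!pending_test(vb, bit)) {
			bit++;
			continue;
		}
		out[n].start = bit;
		while (bit < VMAP_BBMAP_BITS && pending_test(vb, bit)) {
			vb->pending[bit / BITS_PER_LONG] &=
				~(1UL << (bit % BITS_PER_LONG));
			bit++;
		}
		out[n].end = bit;
		n++;
	}
	return n;
}
```

For variant #2 this yields the two ranges 0 - 1 and 6 - 7 instead of one
range 0 - 7, so the still mapped pages 2 - 5 are left alone. The price is
a walk over the bitmap and potentially several smaller flushes instead of
one.
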
Thanks,
tglx