From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Jann Horn <jannh@google.com>, John Hubbard <jhubbard@nvidia.com>,
X86 ML <x86@kernel.org>, Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
kernel list <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>,
Andrea Arcangeli <aarcange@redhat.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
jroedel@suse.de, ubizjak@gmail.com
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
Date: Fri, 28 Oct 2022 17:42:39 -0700 [thread overview]
Message-ID: <CAHk-=wgzT1QsSCF-zN+eS06WGVTBg4sf=6oTMg95+AEq7QrSCQ@mail.gmail.com> (raw)
In-Reply-To: <B88D3073-440A-41C7-95F4-895D3F657EF2@gmail.com>
On Fri, Oct 28, 2022 at 4:57 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> The problem is in the following code of zap_pte_range():
>
> if (!PageAnon(page)) {
> if (pte_dirty(ptent)) {
> force_flush = 1;
> set_page_dirty(page);
> }
> …
> }
> page_remove_rmap(page, vma, false);
>
> Once we remove the rmap, rmap_walk() would not acquire the page-table lock
> anymore. As a result, nothing prevents the kernel from performing writeback
> and cleaning the page-struct dirty-bit before the TLB is actually flushed.
Hah.
The original reason for force_flush there was similar, with a race wrt
page_mkclean() because this code doesn't take the page lock that would
normally serialize things, because the page lock is so painful and
ends up having nasty nasty interactions with slow IO operations.
So all the page-dirty handling really *wants* to use the page lock,
and for the IO side (writeback) that ends up being acceptable and
works well, but from that "serialize VM state" it's horrendous.
So instead the code intentionally serialized on the rmap data
structures which page_mkclean() also walks, and as you point out,
that's broken. It's not broken at the point where we do
set_page_dirty(), but it *comes* broken when we drop the rmap, and the
problem is exactly that "we still have the dirty bit hidden in the TLB
state" issue that you pinpoint.
I think the proper fix (or at least _a_ proper fix) would be to
actually carry the dirty bit along to the __tlb_remove_page() point,
and actually treat it exactly the same way as the page pointer itself
- set the page dirty after the TLB flush, the same way we can free the
page after the TLB flush.
We could easiy hide said dirty bit in the low bits of the
"batch->pages[]" array or something like that. We'd just have to add
the 'dirty' argument to __tlb_remove_page_size() and friends.
Hmm?
Your idea of "do the page_remove_rmap() late instead" would also work,
but the reason I think just squirrelling away the dirty bit is the
"proper" fix is that it would get rid of the whole need for
'force_flush' in this area entirely. So we'd not only fix that race
you noticed, we'd actually do so and reduce the number of TLB flushes
too.
I don't know. Maybe I'm missing something fundamental, and my idea is
just stupid.
Linus
next prev parent reply other threads:[~2022-10-29 0:43 UTC|newest]
Thread overview: 143+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-22 11:14 [PATCH 00/13] Clean up pmd_get_atomic() and i386-PAE Peter Zijlstra
2022-10-22 11:14 ` [PATCH 01/13] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2022-10-24 5:42 ` John Hubbard
2022-10-24 8:00 ` Peter Zijlstra
2022-10-24 19:58 ` Jann Horn
2022-10-24 20:19 ` Linus Torvalds
2022-10-24 20:23 ` Jann Horn
2022-10-24 20:36 ` Linus Torvalds
2022-10-25 3:21 ` Matthew Wilcox
2022-10-25 7:54 ` Alistair Popple
2022-10-25 13:33 ` Peter Zijlstra
2022-10-25 13:44 ` Jann Horn
2022-10-26 0:45 ` Alistair Popple
2022-10-25 14:02 ` Peter Zijlstra
2022-10-25 14:18 ` Jann Horn
2022-10-25 15:06 ` Peter Zijlstra
2022-10-26 16:45 ` Jann Horn
2022-10-27 7:08 ` Peter Zijlstra
2022-10-27 18:13 ` Linus Torvalds
2022-10-27 19:35 ` Peter Zijlstra
2022-10-27 19:43 ` Linus Torvalds
2022-10-27 20:15 ` Nadav Amit
2022-10-27 20:31 ` Linus Torvalds
2022-10-27 21:44 ` Nadav Amit
2022-10-28 23:57 ` Nadav Amit
2022-10-29 0:42 ` Linus Torvalds [this message]
2022-10-29 18:05 ` Nadav Amit
2022-10-29 18:36 ` Linus Torvalds
2022-10-29 18:58 ` Linus Torvalds
2022-10-29 19:14 ` Linus Torvalds
2022-10-29 19:28 ` Nadav Amit
2022-10-30 0:18 ` Nadav Amit
2022-10-30 2:17 ` Nadav Amit
2022-10-30 18:19 ` Linus Torvalds
2022-10-30 18:51 ` Linus Torvalds
2022-10-30 22:47 ` Linus Torvalds
2022-10-31 1:47 ` Linus Torvalds
2022-10-31 4:09 ` Nadav Amit
2022-10-31 4:55 ` Nadav Amit
2022-10-31 5:00 ` Linus Torvalds
2022-10-31 15:43 ` Nadav Amit
2022-10-31 17:32 ` Linus Torvalds
2022-10-31 9:36 ` Peter Zijlstra
2022-10-31 17:28 ` Linus Torvalds
2022-10-31 18:43 ` mm: delay rmap removal until after TLB flush Linus Torvalds
2022-11-02 9:14 ` Christian Borntraeger
2022-11-02 9:23 ` Christian Borntraeger
2022-11-02 17:55 ` Linus Torvalds
2022-11-02 18:28 ` Linus Torvalds
2022-11-02 22:29 ` Gerald Schaefer
2022-11-02 12:45 ` Peter Zijlstra
2022-11-02 22:31 ` Gerald Schaefer
2022-11-02 23:13 ` Linus Torvalds
2022-11-03 9:52 ` David Hildenbrand
2022-11-03 16:54 ` Linus Torvalds
2022-11-03 17:09 ` Linus Torvalds
2022-11-03 17:36 ` David Hildenbrand
2022-11-04 6:33 ` Alexander Gordeev
2022-11-04 17:35 ` Linus Torvalds
2022-11-06 21:06 ` Hugh Dickins
2022-11-06 22:34 ` Linus Torvalds
2022-11-06 23:14 ` Andrew Morton
2022-11-07 0:06 ` Stephen Rothwell
2022-11-07 16:19 ` Linus Torvalds
2022-11-07 23:02 ` Andrew Morton
2022-11-07 23:44 ` Stephen Rothwell
2022-11-07 9:12 ` Peter Zijlstra
2022-11-07 20:07 ` Johannes Weiner
2022-11-07 20:29 ` Linus Torvalds
2022-11-07 23:47 ` Linus Torvalds
2022-11-08 4:28 ` Linus Torvalds
2022-11-08 19:56 ` Linus Torvalds
2022-11-08 20:03 ` Konstantin Ryabitsev
2022-11-08 20:18 ` Linus Torvalds
2022-11-08 19:41 ` [PATCH 1/4] mm: introduce 'encoded' page pointers with embedded extra bits Linus Torvalds
2022-11-08 20:37 ` Nadav Amit
2022-11-08 20:46 ` Linus Torvalds
2022-11-09 6:36 ` Alexander Gordeev
2022-11-09 18:00 ` Linus Torvalds
2022-11-09 20:02 ` Linus Torvalds
2022-11-08 19:41 ` [PATCH 2/4] mm: teach release_pages() to take an array of encoded page pointers too Linus Torvalds
2022-11-08 19:41 ` [PATCH 3/4] mm: mmu_gather: prepare to gather encoded page pointers with flags Linus Torvalds
2022-11-08 19:41 ` [PATCH 4/4] mm: delay page_remove_rmap() until after the TLB has been flushed Linus Torvalds
2022-11-08 21:05 ` Nadav Amit
2022-11-09 15:53 ` Johannes Weiner
2022-11-09 19:31 ` Hugh Dickins
2022-10-31 9:39 ` [PATCH 01/13] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2022-10-31 17:22 ` Linus Torvalds
2022-10-31 9:46 ` Peter Zijlstra
2022-10-31 9:28 ` Peter Zijlstra
2022-10-31 17:19 ` Linus Torvalds
2022-10-30 19:34 ` Nadav Amit
2022-10-29 19:39 ` John Hubbard
2022-10-29 20:15 ` Linus Torvalds
2022-10-29 20:30 ` Linus Torvalds
2022-10-29 20:42 ` John Hubbard
2022-10-29 20:56 ` Nadav Amit
2022-10-29 21:03 ` Nadav Amit
2022-10-29 21:12 ` Linus Torvalds
2022-10-29 20:59 ` Theodore Ts'o
2022-10-26 19:43 ` Nadav Amit
2022-10-27 7:27 ` Peter Zijlstra
2022-10-27 17:30 ` Nadav Amit
2022-10-22 11:14 ` [PATCH 02/13] x86/mm/pae: Make pmd_t similar to pte_t Peter Zijlstra
2022-10-22 11:14 ` [PATCH 03/13] sh/mm: " Peter Zijlstra
2022-12-21 13:54 ` Guenter Roeck
2022-10-22 11:14 ` [PATCH 04/13] mm: Fix pmd_read_atomic() Peter Zijlstra
2022-10-22 17:30 ` Linus Torvalds
2022-10-24 8:09 ` Peter Zijlstra
2022-11-01 12:41 ` Peter Zijlstra
2022-11-01 17:42 ` Linus Torvalds
2022-10-22 11:14 ` [PATCH 05/13] mm: Rename GUP_GET_PTE_LOW_HIGH Peter Zijlstra
2022-10-22 11:14 ` [PATCH 06/13] mm: Rename pmd_read_atomic() Peter Zijlstra
2022-10-22 11:14 ` [PATCH 07/13] mm/gup: Fix the lockless PMD access Peter Zijlstra
2022-10-23 0:42 ` Hugh Dickins
2022-10-24 7:42 ` Peter Zijlstra
2022-10-25 3:58 ` Hugh Dickins
2022-10-22 11:14 ` [PATCH 08/13] x86/mm/pae: Dont (ab)use atomic64 Peter Zijlstra
2022-10-22 11:14 ` [PATCH 09/13] x86/mm/pae: Use WRITE_ONCE() Peter Zijlstra
2022-10-22 17:42 ` Linus Torvalds
2022-10-24 10:21 ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 10/13] x86/mm/pae: Be consistent with pXXp_get_and_clear() Peter Zijlstra
2022-10-22 17:53 ` Linus Torvalds
2022-10-24 11:13 ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 11/13] x86_64: Remove pointless set_64bit() usage Peter Zijlstra
2022-10-22 17:55 ` Linus Torvalds
2022-11-03 19:09 ` Nathan Chancellor
2022-11-03 19:23 ` Uros Bizjak
2022-11-03 19:35 ` Nathan Chancellor
2022-11-03 20:39 ` Linus Torvalds
2022-11-03 21:06 ` Peter Zijlstra
2022-11-04 16:01 ` Peter Zijlstra
2022-11-04 17:15 ` Linus Torvalds
2022-11-05 13:29 ` Jason A. Donenfeld
2022-11-05 15:14 ` Peter Zijlstra
2022-11-05 20:54 ` Jason A. Donenfeld
2022-11-07 9:14 ` David Laight
2022-12-19 15:44 ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 12/13] x86/mm/pae: Get rid of set_64bit() Peter Zijlstra
2022-10-22 11:14 ` [PATCH 13/13] mm: Remove pointless barrier() after pmdp_get_lockless() Peter Zijlstra
2022-10-22 19:59 ` Yu Zhao
2022-10-22 17:57 ` [PATCH 00/13] Clean up pmd_get_atomic() and i386-PAE Linus Torvalds
2022-10-29 12:21 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wgzT1QsSCF-zN+eS06WGVTBg4sf=6oTMg95+AEq7QrSCQ@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=jannh@google.com \
--cc=jhubbard@nvidia.com \
--cc=jroedel@suse.de \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nadav.amit@gmail.com \
--cc=peterz@infradead.org \
--cc=ubizjak@gmail.com \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).