From: Peter Xu <peterx@redhat.com>
To: Yang Shi <shy828301@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
David Hildenbrand <david@redhat.com>,
Alistair Popple <apopple@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
"Kirill A . Shutemov" <kirill@shutemov.name>,
Johannes Weiner <hannes@cmpxchg.org>,
John Hubbard <jhubbard@nvidia.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Muhammad Usama Anjum <usama.anjum@collabora.com>,
Hugh Dickins <hughd@google.com>, Mike Rapoport <rppt@kernel.org>
Subject: Re: [PATCH 4/4] mm: Make most walk page paths with pmd_trans_unstable() to retry
Date: Mon, 5 Jun 2023 15:20:03 -0400 [thread overview]
Message-ID: <ZH41YzZ0DBoF8csH@x1n> (raw)
In-Reply-To: <CAHbLzkp_tzN8SZVeWTKxtMAnFSzUvk2064KFg3quj=raOSHPrA@mail.gmail.com>
On Mon, Jun 05, 2023 at 11:46:04AM -0700, Yang Shi wrote:
> On Fri, Jun 2, 2023 at 4:06 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > For most of the page walk paths, logically it'll always be good to have the
> > pmd retries if hit pmd_trans_unstable() race. We can treat it as none
> > pmd (per comment above pmd_trans_unstable()), but in most cases we're not
> > even treating that as a none pmd. If to fix it anyway, a retry will be the
> > most accurate.
> >
> > I've went over all the pmd_trans_unstable() special cases and this patch
> > should cover all the rest places where we should retry properly with
> > unstable pmd. With the newly introduced ACTION_AGAIN since 2020 we can
> > easily achieve that.
> >
> > These are the call sites that I think should be fixed with it:
> >
> > *** fs/proc/task_mmu.c:
> > smaps_pte_range[634] if (pmd_trans_unstable(pmd))
> > clear_refs_pte_range[1194] if (pmd_trans_unstable(pmd))
> > pagemap_pmd_range[1542] if (pmd_trans_unstable(pmdp))
> > gather_pte_stats[1891] if (pmd_trans_unstable(pmd))
> > *** mm/memcontrol.c:
> > mem_cgroup_count_precharge_pte_range[6024] if (pmd_trans_unstable(pmd))
> > mem_cgroup_move_charge_pte_range[6244] if (pmd_trans_unstable(pmd))
> > *** mm/memory-failure.c:
> > hwpoison_pte_range[794] if (pmd_trans_unstable(pmdp))
> > *** mm/mempolicy.c:
> > queue_folios_pte_range[517] if (pmd_trans_unstable(pmd))
> > *** mm/madvise.c:
> > madvise_cold_or_pageout_pte_range[425] if (pmd_trans_unstable(pmd))
> > madvise_free_pte_range[625] if (pmd_trans_unstable(pmd))
> >
> > IIUC most of them may or may not be a big issue even without a retry,
> > either because they're already not strict (smaps, pte_stats, MADV_COLD,
> > .. it can mean e.g. the statistic may be inaccurate or one less 2M chunk to
> > cold worst case), but some of them could have functional error without the
> > retry afaiu (e.g. pagemap, where we can have the output buffer shifted over
> > the unstable pmd range.. so IIUC the pagemap result can be wrong).
> >
> > While these call sites all look fine, and don't need any change:
> >
> > *** include/linux/pgtable.h:
> > pmd_devmap_trans_unstable[1418] return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
> > *** mm/gup.c:
> > follow_pmd_mask[695] if (pmd_trans_unstable(pmd))
> > *** mm/mapping_dirty_helpers.c:
> > wp_clean_pmd_entry[131] if (!pmd_trans_unstable(&pmdval))
> > *** mm/memory.c:
> > do_anonymous_page[4060] if (unlikely(pmd_trans_unstable(vmf->pmd)))
> > *** mm/migrate_device.c:
> > migrate_vma_insert_page[616] if (unlikely(pmd_trans_unstable(pmdp)))
> > *** mm/mincore.c:
> > mincore_pte_range[116] if (pmd_trans_unstable(pmd)) {
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > fs/proc/task_mmu.c | 17 +++++++++++++----
> > mm/madvise.c | 8 ++++++--
> > mm/memcontrol.c | 8 ++++++--
> > mm/memory-failure.c | 4 +++-
> > mm/mempolicy.c | 4 +++-
> > 5 files changed, 31 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 6259dd432eeb..823eaba5c6bf 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -631,8 +631,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> > goto out;
> > }
> >
> > - if (pmd_trans_unstable(pmd))
> > + if (pmd_trans_unstable(pmd)) {
> > + walk->action = ACTION_AGAIN;
> > goto out;
> > + }
> > +
> > /*
> > * The mmap_lock held all the way back in m_start() is what
> > * keeps khugepaged out of here and from collapsing things
> > @@ -1191,8 +1194,10 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
> > return 0;
> > }
> >
> > - if (pmd_trans_unstable(pmd))
> > + if (pmd_trans_unstable(pmd)) {
> > + walk->action = ACTION_AGAIN;
> > return 0;
> > + }
> >
> > pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > for (; addr != end; pte++, addr += PAGE_SIZE) {
> > @@ -1539,8 +1544,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > return err;
> > }
> >
> > - if (pmd_trans_unstable(pmdp))
> > + if (pmd_trans_unstable(pmdp)) {
> > + walk->action = ACTION_AGAIN;
> > return 0;
>
> Had a quick look at the pagemap code, I agree with your analysis,
> "returning 0" may mess up pagemap, retry should be fine. But I'm
> wondering whether we should just fill in empty entries. Anyway I don't
> have a strong opinion on this, just a little bit concerned by
> potential indefinite retry.
Yes, none pte is still an option. But if we're going to fix this anyway,
it seems better to fix it with the accurate new thing that poped up, and
it's even less change (just apply walk->action rather than doing random
stuff in different call sites).
I see that you have worry on deadloop over this, so I hope to discuss
altogether here.
Unlike normal checks, pmd_trans_unstable() check means something must have
changed in the past very short period or it should just never if nothing
changed concurrently from under us, so it's not a "if (flag==true)" check
which is even more likely to loop.
If we see the places that I didn't touch, most of them suggested a retry in
one form or another. So if there's a worry this will also not the first
time to do a retry (and for such a "unstable" API, that's really the most
natural thing to do which is to retry until it's stable).
So in general, it seems to me if we deadloop over pmd_trans_unstable() for
whatever reason then something more wrong could have happened..
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2023-06-05 19:21 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-02 23:05 [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry Peter Xu
2023-06-02 23:05 ` [PATCH 1/4] mm/mprotect: Retry on pmd_trans_unstable() Peter Xu
2023-06-03 2:04 ` Yang Shi
2023-06-04 23:58 ` Peter Xu
2023-06-02 23:05 ` [PATCH 2/4] mm/migrate: Unify and retry an unstable pmd when hit Peter Xu
2023-06-02 23:05 ` [PATCH 3/4] mm: Warn for unstable pmd in move_page_tables() Peter Xu
2023-06-02 23:05 ` [PATCH 4/4] mm: Make most walk page paths with pmd_trans_unstable() to retry Peter Xu
2023-06-05 18:46 ` Yang Shi
2023-06-05 19:20 ` Peter Xu [this message]
2023-06-06 19:12 ` Yang Shi
2023-06-06 19:59 ` Peter Xu
2023-06-07 13:49 ` [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry Peter Xu
2023-06-07 15:45 ` David Hildenbrand
2023-06-07 16:21 ` Peter Xu
2023-06-07 16:39 ` Yang Shi
2023-06-07 18:22 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZH41YzZ0DBoF8csH@x1n \
--to=peterx@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=jhubbard@nvidia.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=naoya.horiguchi@nec.com \
--cc=rppt@kernel.org \
--cc=shy828301@gmail.com \
--cc=usama.anjum@collabora.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox