From: Mike Kravetz <mike.kravetz@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Yin Fengwei <fengwei.yin@intel.com>,
linux-mm@kvack.org, akpm@linux-foundation.org,
Sidhartha Kumar <sidhartha.kumar@oracle.com>,
Jane Chu <jane.chu@oracle.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: Re: [PATCH 1/5] rmap: move hugetlb try_to_unmap to dedicated function
Date: Thu, 23 Feb 2023 16:20:28 -0800 [thread overview]
Message-ID: <Y/gCzDpmTa3DpwqO@monkey> (raw)
In-Reply-To: <Y/eiKn78T16m0IJl@casper.infradead.org>
On 02/23/23 17:28, Matthew Wilcox wrote:
> On Thu, Feb 23, 2023 at 04:31:56PM +0800, Yin Fengwei wrote:
> > It's to prepare the batched rmap update for large folio.
> > No need to looped handle hugetlb. Just handle hugetlb and
> > bail out early.
> >
> > Almost no functional change. Just one change to mm counter
> > update.
>
> This looks like a very worthwhile cleanup in its own right. Adding
> various hugetlb & memory poisoning experts for commentary on the mm
> counter change (search for three asterisks)
The counter change looks correct to me.
However, there is a bunch of code in the else cases for
if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON))
in try_to_unmap_one. Are you sure none of it applies to hugetlb pages?
It seems like some of the code in the
} else if (folio_test_anon(folio)) {
could apply. pte_uffd_wp perhaps?
Since you are cleaning up, I think the following make apply ...
> > +static bool try_to_unmap_one_hugetlb(struct folio *folio,
> > + struct vm_area_struct *vma, struct mmu_notifier_range range,
> > + struct page_vma_mapped_walk pvmw, unsigned long address,
> > + enum ttu_flags flags)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + pte_t pteval;
> > + bool ret = true, anon = folio_test_anon(folio);
> > +
> > + /*
> > + * The try_to_unmap() is only passed a hugetlb page
> > + * in the case where the hugetlb page is poisoned.
> > + */
> > + VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
> > + /*
> > + * huge_pmd_unshare may unmap an entire PMD page.
> > + * There is no way of knowing exactly which PMDs may
> > + * be cached for this mm, so we must flush them all.
> > + * start/end were already adjusted above to cover this
> > + * range.
> > + */
> > + flush_cache_range(vma, range.start, range.end);
> > +
> > + /*
> > + * To call huge_pmd_unshare, i_mmap_rwsem must be
> > + * held in write mode. Caller needs to explicitly
> > + * do this outside rmap routines.
> > + *
> > + * We also must hold hugetlb vma_lock in write mode.
> > + * Lock order dictates acquiring vma_lock BEFORE
> > + * i_mmap_rwsem. We can only try lock here and fail
> > + * if unsuccessful.
> > + */
> > + if (!anon) {
> > + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> > + if (!hugetlb_vma_trylock_write(vma)) {
> > + ret = false;
> > + goto out;
> > + }
> > + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
> > + hugetlb_vma_unlock_write(vma);
> > + flush_tlb_range(vma,
> > + range.start, range.end);
> > + mmu_notifier_invalidate_range(mm,
> > + range.start, range.end);
> > + /*
> > + * The ref count of the PMD page was
> > + * dropped which is part of the way map
> > + * counting is done for shared PMDs.
> > + * Return 'true' here. When there is
> > + * no other sharing, huge_pmd_unshare
> > + * returns false and we will unmap the
> > + * actual page and drop map count
> > + * to zero.
> > + */
> > + goto out;
> > + }
> > + hugetlb_vma_unlock_write(vma);
> > + }
> > + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
> > +
> > + /*
> > + * Now the pte is cleared. If this pte was uffd-wp armed,
> > + * we may want to replace a none pte with a marker pte if
> > + * it's file-backed, so we don't lose the tracking info.
> > + */
> > + pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
> > +
> > + /* Set the dirty flag on the folio now the pte is gone. */
> > + if (pte_dirty(pteval))
> > + folio_mark_dirty(folio);
> > +
> > + /* Update high watermark before we lower rss */
> > + update_hiwater_rss(mm);
> > +
> > + if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) {
I'm guessing this can just be,
if (!(flags & TTU_IGNORE_HWPOISON))
as we only get here in the poison case as indicated by,
VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
above.
> > + pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
> > + set_huge_pte_at(mm, address, pvmw.pte, pteval);
> > + }
> > +
> > + /*** try_to_unmap_one() called dec_mm_counter for
> > + * (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) not
> > + * true case, looks incorrect. Change it to hugetlb_count_sub() here.
> > + */
> > + hugetlb_count_sub(folio_nr_pages(folio), mm);
> > +
> > + /*
> > + * No need to call mmu_notifier_invalidate_range() it has be
> > + * done above for all cases requiring it to happen under page
> > + * table lock before mmu_notifier_invalidate_range_end()
> > + *
> > + * See Documentation/mm/mmu_notifier.rst
> > + */
> > + page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
Always compound/hugetlb, so could be:
page_remove_rmap(&folio->page, vma, true);
> > + if (vma->vm_flags & VM_LOCKED)
> > + mlock_drain_local();
Since hugetlb pages are always memory resident, I do not think the above
is necessary.
> > + folio_put(folio);
> > +
> > +out:
> > + return ret;
> > +}
--
Mike Kravetz
next prev parent reply other threads:[~2023-02-24 0:20 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-23 8:31 [PATCH 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
2023-02-23 8:31 ` [PATCH 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
2023-02-23 17:28 ` Matthew Wilcox
2023-02-24 0:20 ` Mike Kravetz [this message]
2023-02-24 0:52 ` Yin, Fengwei
2023-02-24 2:51 ` HORIGUCHI NAOYA(堀口 直也)
2023-02-24 4:41 ` Yin, Fengwei
2023-02-24 19:21 ` Mike Kravetz
2023-02-26 11:44 ` Yin, Fengwei
2023-02-23 8:31 ` [PATCH 2/5] rmap: move page unmap operation " Yin Fengwei
2023-02-23 8:31 ` [PATCH 3/5] rmap: cleanup exit path of try_to_unmap_one_page() Yin Fengwei
2023-02-23 8:31 ` [PATCH 4/5] rmap:addd folio_remove_rmap_range() Yin Fengwei
2023-02-23 8:32 ` [PATCH 5/5] try_to_unmap_one: batched remove rmap, update folio refcount Yin Fengwei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y/gCzDpmTa3DpwqO@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=fengwei.yin@intel.com \
--cc=jane.chu@oracle.com \
--cc=linux-mm@kvack.org \
--cc=naoya.horiguchi@nec.com \
--cc=sidhartha.kumar@oracle.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.