* Re: [PATCH v10 04/10] mm/rmap: Split migration into its own function
@ 2021-06-09 17:43 kernel test robot
0 siblings, 0 replies; 3+ messages in thread
From: kernel test robot @ 2021-06-09 17:43 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 16246 bytes --]
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210607075855.5084-5-apopple@nvidia.com>
References: <20210607075855.5084-5-apopple@nvidia.com>
TO: Alistair Popple <apopple@nvidia.com>
TO: linux-mm(a)kvack.org
TO: akpm(a)linux-foundation.org
CC: rcampbell(a)nvidia.com
CC: linux-doc(a)vger.kernel.org
CC: nouveau(a)lists.freedesktop.org
CC: hughd(a)google.com
CC: linux-kernel(a)vger.kernel.org
CC: dri-devel(a)lists.freedesktop.org
CC: hch(a)infradead.org
CC: bskeggs(a)redhat.com
Hi Alistair,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on s390/features]
[also build test WARNING on kselftest/next linus/master v5.13-rc5]
[cannot apply to hnaz-linux-mm/master next-20210609]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210607-160056
base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
:::::: branch date: 2 days ago
:::::: commit date: 2 days ago
config: parisc-randconfig-s032-20210607 (attached as .config)
compiler: hppa64-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/80e54e5e679c95e425608a45721ca6cc30ae25f4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210607-160056
git checkout 80e54e5e679c95e425608a45721ca6cc30ae25f4
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=parisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
mm/rmap.c: note: in included file (through include/linux/ksm.h):
include/linux/rmap.h:217:28: sparse: sparse: context imbalance in 'page_referenced_one' - unexpected unlock
mm/rmap.c:970:25: sparse: sparse: context imbalance in 'page_mkclean_one' - different lock contexts for basic block
include/linux/rmap.h:217:28: sparse: sparse: context imbalance in 'try_to_unmap_one' - unexpected unlock
>> mm/rmap.c:1899:17: sparse: sparse: context imbalance in 'try_to_migrate_one' - different lock contexts for basic block
include/linux/rmap.h:217:28: sparse: sparse: context imbalance in 'page_mlock_one' - unexpected unlock
vim +/try_to_migrate_one +1899 mm/rmap.c
80e54e5e679c95 Alistair Popple 2021-06-07 1744
80e54e5e679c95 Alistair Popple 2021-06-07 1745 /* Unexpected PMD-mapped THP? */
80e54e5e679c95 Alistair Popple 2021-06-07 1746 VM_BUG_ON_PAGE(!pvmw.pte, page);
80e54e5e679c95 Alistair Popple 2021-06-07 1747
80e54e5e679c95 Alistair Popple 2021-06-07 1748 subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1749 address = pvmw.address;
80e54e5e679c95 Alistair Popple 2021-06-07 1750
80e54e5e679c95 Alistair Popple 2021-06-07 1751 if (PageHuge(page) && !PageAnon(page)) {
80e54e5e679c95 Alistair Popple 2021-06-07 1752 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1753 * To call huge_pmd_unshare, i_mmap_rwsem must be
80e54e5e679c95 Alistair Popple 2021-06-07 1754 * held in write mode. Caller needs to explicitly
80e54e5e679c95 Alistair Popple 2021-06-07 1755 * do this outside rmap routines.
80e54e5e679c95 Alistair Popple 2021-06-07 1756 */
80e54e5e679c95 Alistair Popple 2021-06-07 1757 VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
80e54e5e679c95 Alistair Popple 2021-06-07 1758 if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
80e54e5e679c95 Alistair Popple 2021-06-07 1759 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1760 * huge_pmd_unshare unmapped an entire PMD
80e54e5e679c95 Alistair Popple 2021-06-07 1761 * page. There is no way of knowing exactly
80e54e5e679c95 Alistair Popple 2021-06-07 1762 * which PMDs may be cached for this mm, so
80e54e5e679c95 Alistair Popple 2021-06-07 1763 * we must flush them all. start/end were
80e54e5e679c95 Alistair Popple 2021-06-07 1764 * already adjusted above to cover this range.
80e54e5e679c95 Alistair Popple 2021-06-07 1765 */
80e54e5e679c95 Alistair Popple 2021-06-07 1766 flush_cache_range(vma, range.start, range.end);
80e54e5e679c95 Alistair Popple 2021-06-07 1767 flush_tlb_range(vma, range.start, range.end);
80e54e5e679c95 Alistair Popple 2021-06-07 1768 mmu_notifier_invalidate_range(mm, range.start,
80e54e5e679c95 Alistair Popple 2021-06-07 1769 range.end);
80e54e5e679c95 Alistair Popple 2021-06-07 1770
80e54e5e679c95 Alistair Popple 2021-06-07 1771 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1772 * The ref count of the PMD page was dropped
80e54e5e679c95 Alistair Popple 2021-06-07 1773 * which is part of the way map counting
80e54e5e679c95 Alistair Popple 2021-06-07 1774 * is done for shared PMDs. Return 'true'
80e54e5e679c95 Alistair Popple 2021-06-07 1775 * here. When there is no other sharing,
80e54e5e679c95 Alistair Popple 2021-06-07 1776 * huge_pmd_unshare returns false and we will
80e54e5e679c95 Alistair Popple 2021-06-07 1777 * unmap the actual page and drop map count
80e54e5e679c95 Alistair Popple 2021-06-07 1778 * to zero.
80e54e5e679c95 Alistair Popple 2021-06-07 1779 */
80e54e5e679c95 Alistair Popple 2021-06-07 1780 page_vma_mapped_walk_done(&pvmw);
80e54e5e679c95 Alistair Popple 2021-06-07 1781 break;
80e54e5e679c95 Alistair Popple 2021-06-07 1782 }
80e54e5e679c95 Alistair Popple 2021-06-07 1783 }
80e54e5e679c95 Alistair Popple 2021-06-07 1784
80e54e5e679c95 Alistair Popple 2021-06-07 1785 /* Nuke the page table entry. */
80e54e5e679c95 Alistair Popple 2021-06-07 1786 flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
80e54e5e679c95 Alistair Popple 2021-06-07 1787 pteval = ptep_clear_flush(vma, address, pvmw.pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1788
80e54e5e679c95 Alistair Popple 2021-06-07 1789 /* Move the dirty bit to the page. Now the pte is gone. */
80e54e5e679c95 Alistair Popple 2021-06-07 1790 if (pte_dirty(pteval))
80e54e5e679c95 Alistair Popple 2021-06-07 1791 set_page_dirty(page);
80e54e5e679c95 Alistair Popple 2021-06-07 1792
80e54e5e679c95 Alistair Popple 2021-06-07 1793 /* Update high watermark before we lower rss */
80e54e5e679c95 Alistair Popple 2021-06-07 1794 update_hiwater_rss(mm);
80e54e5e679c95 Alistair Popple 2021-06-07 1795
80e54e5e679c95 Alistair Popple 2021-06-07 1796 if (is_zone_device_page(page)) {
80e54e5e679c95 Alistair Popple 2021-06-07 1797 swp_entry_t entry;
80e54e5e679c95 Alistair Popple 2021-06-07 1798 pte_t swp_pte;
80e54e5e679c95 Alistair Popple 2021-06-07 1799
80e54e5e679c95 Alistair Popple 2021-06-07 1800 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1801 * Store the pfn of the page in a special migration
80e54e5e679c95 Alistair Popple 2021-06-07 1802 * pte. do_swap_page() will wait until the migration
80e54e5e679c95 Alistair Popple 2021-06-07 1803 * pte is removed and then restart fault handling.
80e54e5e679c95 Alistair Popple 2021-06-07 1804 */
80e54e5e679c95 Alistair Popple 2021-06-07 1805 entry = make_readable_migration_entry(
80e54e5e679c95 Alistair Popple 2021-06-07 1806 page_to_pfn(page));
80e54e5e679c95 Alistair Popple 2021-06-07 1807 swp_pte = swp_entry_to_pte(entry);
80e54e5e679c95 Alistair Popple 2021-06-07 1808
80e54e5e679c95 Alistair Popple 2021-06-07 1809 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1810 * pteval maps a zone device page and is therefore
80e54e5e679c95 Alistair Popple 2021-06-07 1811 * a swap pte.
80e54e5e679c95 Alistair Popple 2021-06-07 1812 */
80e54e5e679c95 Alistair Popple 2021-06-07 1813 if (pte_swp_soft_dirty(pteval))
80e54e5e679c95 Alistair Popple 2021-06-07 1814 swp_pte = pte_swp_mksoft_dirty(swp_pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1815 if (pte_swp_uffd_wp(pteval))
80e54e5e679c95 Alistair Popple 2021-06-07 1816 swp_pte = pte_swp_mkuffd_wp(swp_pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1817 set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1818 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1819 * No need to invalidate here it will synchronize on
80e54e5e679c95 Alistair Popple 2021-06-07 1820 * against the special swap migration pte.
80e54e5e679c95 Alistair Popple 2021-06-07 1821 *
80e54e5e679c95 Alistair Popple 2021-06-07 1822 * The assignment to subpage above was computed from a
80e54e5e679c95 Alistair Popple 2021-06-07 1823 * swap PTE which results in an invalid pointer.
80e54e5e679c95 Alistair Popple 2021-06-07 1824 * Since only PAGE_SIZE pages can currently be
80e54e5e679c95 Alistair Popple 2021-06-07 1825 * migrated, just set it to page. This will need to be
80e54e5e679c95 Alistair Popple 2021-06-07 1826 * changed when hugepage migrations to device private
80e54e5e679c95 Alistair Popple 2021-06-07 1827 * memory are supported.
80e54e5e679c95 Alistair Popple 2021-06-07 1828 */
80e54e5e679c95 Alistair Popple 2021-06-07 1829 subpage = page;
80e54e5e679c95 Alistair Popple 2021-06-07 1830 } else if (PageHWPoison(page)) {
80e54e5e679c95 Alistair Popple 2021-06-07 1831 pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
80e54e5e679c95 Alistair Popple 2021-06-07 1832 if (PageHuge(page)) {
80e54e5e679c95 Alistair Popple 2021-06-07 1833 hugetlb_count_sub(compound_nr(page), mm);
80e54e5e679c95 Alistair Popple 2021-06-07 1834 set_huge_swap_pte_at(mm, address,
80e54e5e679c95 Alistair Popple 2021-06-07 1835 pvmw.pte, pteval,
80e54e5e679c95 Alistair Popple 2021-06-07 1836 vma_mmu_pagesize(vma));
80e54e5e679c95 Alistair Popple 2021-06-07 1837 } else {
80e54e5e679c95 Alistair Popple 2021-06-07 1838 dec_mm_counter(mm, mm_counter(page));
80e54e5e679c95 Alistair Popple 2021-06-07 1839 set_pte_at(mm, address, pvmw.pte, pteval);
80e54e5e679c95 Alistair Popple 2021-06-07 1840 }
80e54e5e679c95 Alistair Popple 2021-06-07 1841
80e54e5e679c95 Alistair Popple 2021-06-07 1842 } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
80e54e5e679c95 Alistair Popple 2021-06-07 1843 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1844 * The guest indicated that the page content is of no
80e54e5e679c95 Alistair Popple 2021-06-07 1845 * interest anymore. Simply discard the pte, vmscan
80e54e5e679c95 Alistair Popple 2021-06-07 1846 * will take care of the rest.
80e54e5e679c95 Alistair Popple 2021-06-07 1847 * A future reference will then fault in a new zero
80e54e5e679c95 Alistair Popple 2021-06-07 1848 * page. When userfaultfd is active, we must not drop
80e54e5e679c95 Alistair Popple 2021-06-07 1849 * this page though, as its main user (postcopy
80e54e5e679c95 Alistair Popple 2021-06-07 1850 * migration) will not expect userfaults on already
80e54e5e679c95 Alistair Popple 2021-06-07 1851 * copied pages.
80e54e5e679c95 Alistair Popple 2021-06-07 1852 */
80e54e5e679c95 Alistair Popple 2021-06-07 1853 dec_mm_counter(mm, mm_counter(page));
80e54e5e679c95 Alistair Popple 2021-06-07 1854 /* We have to invalidate as we cleared the pte */
80e54e5e679c95 Alistair Popple 2021-06-07 1855 mmu_notifier_invalidate_range(mm, address,
80e54e5e679c95 Alistair Popple 2021-06-07 1856 address + PAGE_SIZE);
80e54e5e679c95 Alistair Popple 2021-06-07 1857 } else {
80e54e5e679c95 Alistair Popple 2021-06-07 1858 swp_entry_t entry;
80e54e5e679c95 Alistair Popple 2021-06-07 1859 pte_t swp_pte;
80e54e5e679c95 Alistair Popple 2021-06-07 1860
80e54e5e679c95 Alistair Popple 2021-06-07 1861 if (arch_unmap_one(mm, vma, address, pteval) < 0) {
80e54e5e679c95 Alistair Popple 2021-06-07 1862 set_pte_at(mm, address, pvmw.pte, pteval);
80e54e5e679c95 Alistair Popple 2021-06-07 1863 ret = false;
80e54e5e679c95 Alistair Popple 2021-06-07 1864 page_vma_mapped_walk_done(&pvmw);
80e54e5e679c95 Alistair Popple 2021-06-07 1865 break;
80e54e5e679c95 Alistair Popple 2021-06-07 1866 }
80e54e5e679c95 Alistair Popple 2021-06-07 1867
80e54e5e679c95 Alistair Popple 2021-06-07 1868 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1869 * Store the pfn of the page in a special migration
80e54e5e679c95 Alistair Popple 2021-06-07 1870 * pte. do_swap_page() will wait until the migration
80e54e5e679c95 Alistair Popple 2021-06-07 1871 * pte is removed and then restart fault handling.
80e54e5e679c95 Alistair Popple 2021-06-07 1872 */
80e54e5e679c95 Alistair Popple 2021-06-07 1873 if (pte_write(pteval))
80e54e5e679c95 Alistair Popple 2021-06-07 1874 entry = make_writable_migration_entry(
80e54e5e679c95 Alistair Popple 2021-06-07 1875 page_to_pfn(subpage));
80e54e5e679c95 Alistair Popple 2021-06-07 1876 else
80e54e5e679c95 Alistair Popple 2021-06-07 1877 entry = make_readable_migration_entry(
80e54e5e679c95 Alistair Popple 2021-06-07 1878 page_to_pfn(subpage));
80e54e5e679c95 Alistair Popple 2021-06-07 1879
80e54e5e679c95 Alistair Popple 2021-06-07 1880 swp_pte = swp_entry_to_pte(entry);
80e54e5e679c95 Alistair Popple 2021-06-07 1881 if (pte_soft_dirty(pteval))
80e54e5e679c95 Alistair Popple 2021-06-07 1882 swp_pte = pte_swp_mksoft_dirty(swp_pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1883 if (pte_uffd_wp(pteval))
80e54e5e679c95 Alistair Popple 2021-06-07 1884 swp_pte = pte_swp_mkuffd_wp(swp_pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1885 set_pte_at(mm, address, pvmw.pte, swp_pte);
80e54e5e679c95 Alistair Popple 2021-06-07 1886 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1887 * No need to invalidate here it will synchronize on
80e54e5e679c95 Alistair Popple 2021-06-07 1888 * against the special swap migration pte.
80e54e5e679c95 Alistair Popple 2021-06-07 1889 */
80e54e5e679c95 Alistair Popple 2021-06-07 1890 }
80e54e5e679c95 Alistair Popple 2021-06-07 1891
80e54e5e679c95 Alistair Popple 2021-06-07 1892 /*
80e54e5e679c95 Alistair Popple 2021-06-07 1893 * No need to call mmu_notifier_invalidate_range() it has be
80e54e5e679c95 Alistair Popple 2021-06-07 1894 * done above for all cases requiring it to happen under page
80e54e5e679c95 Alistair Popple 2021-06-07 1895 * table lock before mmu_notifier_invalidate_range_end()
80e54e5e679c95 Alistair Popple 2021-06-07 1896 *
80e54e5e679c95 Alistair Popple 2021-06-07 1897 * See Documentation/vm/mmu_notifier.rst
80e54e5e679c95 Alistair Popple 2021-06-07 1898 */
80e54e5e679c95 Alistair Popple 2021-06-07 @1899 page_remove_rmap(subpage, PageHuge(page));
80e54e5e679c95 Alistair Popple 2021-06-07 1900 put_page(page);
80e54e5e679c95 Alistair Popple 2021-06-07 1901 }
80e54e5e679c95 Alistair Popple 2021-06-07 1902
80e54e5e679c95 Alistair Popple 2021-06-07 1903 mmu_notifier_invalidate_range_end(&range);
80e54e5e679c95 Alistair Popple 2021-06-07 1904
80e54e5e679c95 Alistair Popple 2021-06-07 1905 return ret;
80e54e5e679c95 Alistair Popple 2021-06-07 1906 }
80e54e5e679c95 Alistair Popple 2021-06-07 1907
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37498 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread* [PATCH v10 00/10] Add support for SVM atomics in Nouveau @ 2021-06-07 7:58 Alistair Popple 2021-06-07 7:58 ` Alistair Popple 0 siblings, 1 reply; 3+ messages in thread From: Alistair Popple @ 2021-06-07 7:58 UTC (permalink / raw) To: linux-mm, akpm Cc: rcampbell, linux-doc, nouveau, hughd, linux-kernel, dri-devel, hch, bskeggs, jgg, peterx, shakeelb, jhubbard, willy, Alistair Popple Hi Andrew, This is an update to address some comments on the previous version of this series. Most are code comment updates, although there were a couple of code changes as well. The most significant are: - Re-introduce the check of VM_LOCKED under the PTL in page_mlock_one(). This was present in an earlier version of the series but removed because we thought it was redundant. However Shakeel provided some background making it clear it is needed. - Reworked the return codes in copy_pte_range() based on suggestions from Peter Xu to hopefully make the code clearer and less error-prone. - Integrated a fix to the Nouveau code reported by Colin King. As discussed to minimise impact I have also made this dependent on CONFIG_DEVICE_PRIVATE. Hopefully these changes don't break any other series that may have been based on the previous version. I see there has been some discussion from Hugh and others around patch order, so if you need me to rebase these to a different branch let me know. Introduction ============ Some devices have features such as atomic PTE bits that can be used to implement atomic access to system memory. To support atomic operations to a shared virtual memory page such a device needs access to that page which is exclusive of the CPU. This series introduces a mechanism to temporarily unmap pages granting exclusive access to a device. These changes are required to support OpenCL atomic operations in Nouveau to shared virtual memory (SVM) regions allocated with the CL_MEM_SVM_ATOMICS clSVMAlloc flag. A more complete description of the OpenCL SVM feature is available at https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/ OpenCL_API.html#_shared_virtual_memory . Implementation ============== Exclusive device access is implemented by adding a new swap entry type (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main difference is that on fault the original entry is immediately restored by the fault handler instead of waiting. Restoring the entry triggers calls to MMU notifers which allows a device driver to revoke the atomic access permission from the GPU prior to the CPU finalising the entry. Patches ======= Patches 1 & 2 refactor existing migration and device private entry functions. Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated functionality into separate functions - try_to_migrate_one() and try_to_munlock_one(). Patch 5 renames some existing code but does not introduce functionality. Patch 6 is a small clean-up to swap entry handling in copy_pte_range(). Patch 7 contains the bulk of the implementation for device exclusive memory. Patch 8 contains some additions to the HMM selftests to ensure everything works as expected. Patch 9 is a cleanup for the Nouveau SVM implementation. Patch 10 contains the implementation of atomic access for the Nouveau driver. Testing ======= This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program which checks that GPU atomic accesses to system memory are atomic. Without this series the test fails as there is no way of write-protecting the page mapping which results in the device clobbering CPU writes. For reference the test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/ Further testing has been performed by adding support for testing exclusive access to the hmm-tests kselftests. Alistair Popple (10): mm: Remove special swap entry functions mm/swapops: Rework swap entry manipulation code mm/rmap: Split try_to_munlock from try_to_unmap mm/rmap: Split migration into its own function mm: Rename migrate_pgmap_owner mm/memory.c: Allow different return codes for copy_nonpresent_pte() mm: Device exclusive memory access mm: Selftests for exclusive device memory nouveau/svm: Refactor nouveau_range_fault nouveau/svm: Implement atomic SVM access Documentation/vm/hmm.rst | 19 +- Documentation/vm/unevictable-lru.rst | 33 +- arch/s390/mm/pgtable.c | 2 +- drivers/gpu/drm/nouveau/include/nvif/if000c.h | 1 + drivers/gpu/drm/nouveau/nouveau_svm.c | 156 ++++- drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h | 1 + .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 6 + fs/proc/task_mmu.c | 23 +- include/linux/mmu_notifier.h | 26 +- include/linux/rmap.h | 11 +- include/linux/swap.h | 13 +- include/linux/swapops.h | 123 ++-- lib/test_hmm.c | 126 +++- lib/test_hmm_uapi.h | 2 + mm/debug_vm_pgtable.c | 12 +- mm/hmm.c | 12 +- mm/huge_memory.c | 45 +- mm/hugetlb.c | 10 +- mm/memcontrol.c | 2 +- mm/memory.c | 173 ++++- mm/migrate.c | 51 +- mm/mlock.c | 12 +- mm/mprotect.c | 18 +- mm/page_vma_mapped.c | 15 +- mm/rmap.c | 602 +++++++++++++++--- tools/testing/selftests/vm/hmm-tests.c | 158 +++++ 26 files changed, 1328 insertions(+), 324 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v10 04/10] mm/rmap: Split migration into its own function 2021-06-07 7:58 [PATCH v10 00/10] Add support for SVM atomics in Nouveau Alistair Popple @ 2021-06-07 7:58 ` Alistair Popple 0 siblings, 0 replies; 3+ messages in thread From: Alistair Popple @ 2021-06-07 7:58 UTC (permalink / raw) To: linux-mm, akpm Cc: rcampbell, linux-doc, nouveau, hughd, linux-kernel, dri-devel, hch, bskeggs, jgg, peterx, shakeelb, jhubbard, willy, Alistair Popple, Christoph Hellwig Migration is currently implemented as a mode of operation for try_to_unmap_one() generally specified by passing the TTU_MIGRATION flag or in the case of splitting a huge anonymous page TTU_SPLIT_FREEZE. However it does not have much in common with the rest of the unmap functionality of try_to_unmap_one() and thus splitting it into a separate function reduces the complexity of try_to_unmap_one() making it more readable. Several simplifications can also be made in try_to_migrate_one() based on the following observations: - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK. - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON. - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH. TTU_SPLIT_FREEZE is a special case of migration used when splitting an anonymous page. This is most easily dealt with by calling the correct function from unmap_page() in mm/huge_memory.c - either try_to_migrate() for PageAnon or try_to_unmap(). Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> --- v5: * Added comments about how PMD splitting works for migration vs. unmapping * Tightened up the flag check in try_to_migrate() to be explicit about which TTU_XXX flags are supported. --- include/linux/rmap.h | 4 +- mm/huge_memory.c | 15 +- mm/migrate.c | 9 +- mm/rmap.c | 358 ++++++++++++++++++++++++++++++++----------- 4 files changed, 280 insertions(+), 106 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 38a746787c2f..0e25d829f742 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -86,8 +86,6 @@ struct anon_vma_chain { }; enum ttu_flags { - TTU_MIGRATION = 0x1, /* migration mode */ - TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */ TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */ TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */ @@ -96,7 +94,6 @@ enum ttu_flags { * do a final flush if necessary */ TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock: * caller holds it */ - TTU_SPLIT_FREEZE = 0x100, /* freeze pte under splitting thp */ }; #ifdef CONFIG_MMU @@ -193,6 +190,7 @@ static inline void page_dup_rmap(struct page *page, bool compound) int page_referenced(struct page *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); +bool try_to_migrate(struct page *page, enum ttu_flags flags); bool try_to_unmap(struct page *, enum ttu_flags flags); /* Avoid racy checks */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2ec6dab72217..6dddc75b89ee 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2345,16 +2345,21 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, static void unmap_page(struct page *page) { - enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | - TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; + enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; bool unmap_success; VM_BUG_ON_PAGE(!PageHead(page), page); if (PageAnon(page)) - ttu_flags |= TTU_SPLIT_FREEZE; - - unmap_success = try_to_unmap(page, ttu_flags); + unmap_success = try_to_migrate(page, ttu_flags); + else + /* + * Don't install migration entries for file backed pages. This + * helps handle cases when i_size is in the middle of the page + * as there is no need to unmap pages beyond i_size manually. + */ + unmap_success = try_to_unmap(page, ttu_flags | + TTU_IGNORE_MLOCK); VM_BUG_ON_PAGE(!unmap_success, page); } diff --git a/mm/migrate.c b/mm/migrate.c index 930de919b1f2..05740f816bc4 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1103,7 +1103,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, /* Establish migration ptes */ VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma, page); - try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK); + try_to_migrate(page, 0); page_was_mapped = 1; } @@ -1305,7 +1305,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, if (page_mapped(hpage)) { bool mapping_locked = false; - enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK; + enum ttu_flags ttu = 0; if (!PageAnon(hpage)) { /* @@ -1322,7 +1322,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, ttu |= TTU_RMAP_LOCKED; } - try_to_unmap(hpage, ttu); + try_to_migrate(hpage, ttu); page_was_mapped = 1; if (mapping_locked) @@ -2712,7 +2712,6 @@ static void migrate_vma_prepare(struct migrate_vma *migrate) */ static void migrate_vma_unmap(struct migrate_vma *migrate) { - int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK; const unsigned long npages = migrate->npages; const unsigned long start = migrate->start; unsigned long addr, i, restore = 0; @@ -2724,7 +2723,7 @@ static void migrate_vma_unmap(struct migrate_vma *migrate) continue; if (page_mapped(page)) { - try_to_unmap(page, flags); + try_to_migrate(page, 0); if (page_mapped(page)) goto restore; } diff --git a/mm/rmap.c b/mm/rmap.c index b6c50df08b3b..be0450d905cd 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1405,14 +1405,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; - if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && - is_zone_device_page(page) && !is_device_private_page(page)) - return true; - - if (flags & TTU_SPLIT_HUGE_PMD) { - split_huge_pmd_address(vma, address, - flags & TTU_SPLIT_FREEZE, page); - } + if (flags & TTU_SPLIT_HUGE_PMD) + split_huge_pmd_address(vma, address, false, page); /* * For THP, we have to assume the worse case ie pmd for invalidation. @@ -1436,16 +1430,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, mmu_notifier_invalidate_range_start(&range); while (page_vma_mapped_walk(&pvmw)) { -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - /* PMD-mapped THP migration entry */ - if (!pvmw.pte && (flags & TTU_MIGRATION)) { - VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page); - - set_pmd_migration_entry(&pvmw, page); - continue; - } -#endif - /* * If the page is mlock()d, we cannot swap it out. * If it's recently referenced (perhaps page_referenced @@ -1507,46 +1491,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } } - if (IS_ENABLED(CONFIG_MIGRATION) && - (flags & TTU_MIGRATION) && - is_zone_device_page(page)) { - swp_entry_t entry; - pte_t swp_pte; - - pteval = ptep_get_and_clear(mm, pvmw.address, pvmw.pte); - - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - entry = make_readable_migration_entry(page_to_pfn(page)); - swp_pte = swp_entry_to_pte(entry); - - /* - * pteval maps a zone device page and is therefore - * a swap pte. - */ - if (pte_swp_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. - * - * The assignment to subpage above was computed from a - * swap PTE which results in an invalid pointer. - * Since only PAGE_SIZE pages can currently be - * migrated, just set it to page. This will need to be - * changed when hugepage migrations to device private - * memory are supported. - */ - subpage = page; - goto discard; - } - /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); if (should_defer_flush(mm, flags)) { @@ -1599,39 +1543,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); - } else if (IS_ENABLED(CONFIG_MIGRATION) && - (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) { - swp_entry_t entry; - pte_t swp_pte; - - if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); - ret = false; - page_vma_mapped_walk_done(&pvmw); - break; - } - - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - if (pte_write(pteval)) - entry = make_writable_migration_entry( - page_to_pfn(subpage)); - else - entry = make_readable_migration_entry( - page_to_pfn(subpage)); - swp_pte = swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. - */ } else if (PageAnon(page)) { swp_entry_t entry = { .val = page_private(subpage) }; pte_t swp_pte; @@ -1758,6 +1669,268 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) .anon_lock = page_lock_anon_vma_read, }; + if (flags & TTU_RMAP_LOCKED) + rmap_walk_locked(page, &rwc); + else + rmap_walk(page, &rwc); + + return !page_mapcount(page) ? true : false; +} + +/* + * @arg: enum ttu_flags will be passed to this argument. + * + * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs + * containing migration entries. This and TTU_RMAP_LOCKED are the only supported + * flags. + */ +static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, + unsigned long address, void *arg) +{ + struct mm_struct *mm = vma->vm_mm; + struct page_vma_mapped_walk pvmw = { + .page = page, + .vma = vma, + .address = address, + }; + pte_t pteval; + struct page *subpage; + bool ret = true; + struct mmu_notifier_range range; + enum ttu_flags flags = (enum ttu_flags)(long)arg; + + if (is_zone_device_page(page) && !is_device_private_page(page)) + return true; + + /* + * unmap_page() in mm/huge_memory.c is the only user of migration with + * TTU_SPLIT_HUGE_PMD and it wants to freeze. + */ + if (flags & TTU_SPLIT_HUGE_PMD) + split_huge_pmd_address(vma, address, true, page); + + /* + * For THP, we have to assume the worse case ie pmd for invalidation. + * For hugetlb, it could be much worse if we need to do pud + * invalidation in the case of pmd sharing. + * + * Note that the page can not be free in this function as call of + * try_to_unmap() must hold a reference on the page. + */ + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, + address, + min(vma->vm_end, address + page_size(page))); + if (PageHuge(page)) { + /* + * If sharing is possible, start and end will be adjusted + * accordingly. + */ + adjust_range_if_pmd_sharing_possible(vma, &range.start, + &range.end); + } + mmu_notifier_invalidate_range_start(&range); + + while (page_vma_mapped_walk(&pvmw)) { +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + /* PMD-mapped THP migration entry */ + if (!pvmw.pte) { + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + set_pmd_migration_entry(&pvmw, page); + continue; + } +#endif + + /* Unexpected PMD-mapped THP? */ + VM_BUG_ON_PAGE(!pvmw.pte, page); + + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); + address = pvmw.address; + + if (PageHuge(page) && !PageAnon(page)) { + /* + * To call huge_pmd_unshare, i_mmap_rwsem must be + * held in write mode. Caller needs to explicitly + * do this outside rmap routines. + */ + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { + /* + * huge_pmd_unshare unmapped an entire PMD + * page. There is no way of knowing exactly + * which PMDs may be cached for this mm, so + * we must flush them all. start/end were + * already adjusted above to cover this range. + */ + flush_cache_range(vma, range.start, range.end); + flush_tlb_range(vma, range.start, range.end); + mmu_notifier_invalidate_range(mm, range.start, + range.end); + + /* + * The ref count of the PMD page was dropped + * which is part of the way map counting + * is done for shared PMDs. Return 'true' + * here. When there is no other sharing, + * huge_pmd_unshare returns false and we will + * unmap the actual page and drop map count + * to zero. + */ + page_vma_mapped_walk_done(&pvmw); + break; + } + } + + /* Nuke the page table entry. */ + flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + pteval = ptep_clear_flush(vma, address, pvmw.pte); + + /* Move the dirty bit to the page. Now the pte is gone. */ + if (pte_dirty(pteval)) + set_page_dirty(page); + + /* Update high watermark before we lower rss */ + update_hiwater_rss(mm); + + if (is_zone_device_page(page)) { + swp_entry_t entry; + pte_t swp_pte; + + /* + * Store the pfn of the page in a special migration + * pte. do_swap_page() will wait until the migration + * pte is removed and then restart fault handling. + */ + entry = make_readable_migration_entry( + page_to_pfn(page)); + swp_pte = swp_entry_to_pte(entry); + + /* + * pteval maps a zone device page and is therefore + * a swap pte. + */ + if (pte_swp_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); + /* + * No need to invalidate here it will synchronize on + * against the special swap migration pte. + * + * The assignment to subpage above was computed from a + * swap PTE which results in an invalid pointer. + * Since only PAGE_SIZE pages can currently be + * migrated, just set it to page. This will need to be + * changed when hugepage migrations to device private + * memory are supported. + */ + subpage = page; + } else if (PageHWPoison(page)) { + pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); + if (PageHuge(page)) { + hugetlb_count_sub(compound_nr(page), mm); + set_huge_swap_pte_at(mm, address, + pvmw.pte, pteval, + vma_mmu_pagesize(vma)); + } else { + dec_mm_counter(mm, mm_counter(page)); + set_pte_at(mm, address, pvmw.pte, pteval); + } + + } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + /* + * The guest indicated that the page content is of no + * interest anymore. Simply discard the pte, vmscan + * will take care of the rest. + * A future reference will then fault in a new zero + * page. When userfaultfd is active, we must not drop + * this page though, as its main user (postcopy + * migration) will not expect userfaults on already + * copied pages. + */ + dec_mm_counter(mm, mm_counter(page)); + /* We have to invalidate as we cleared the pte */ + mmu_notifier_invalidate_range(mm, address, + address + PAGE_SIZE); + } else { + swp_entry_t entry; + pte_t swp_pte; + + if (arch_unmap_one(mm, vma, address, pteval) < 0) { + set_pte_at(mm, address, pvmw.pte, pteval); + ret = false; + page_vma_mapped_walk_done(&pvmw); + break; + } + + /* + * Store the pfn of the page in a special migration + * pte. do_swap_page() will wait until the migration + * pte is removed and then restart fault handling. + */ + if (pte_write(pteval)) + entry = make_writable_migration_entry( + page_to_pfn(subpage)); + else + entry = make_readable_migration_entry( + page_to_pfn(subpage)); + + swp_pte = swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + set_pte_at(mm, address, pvmw.pte, swp_pte); + /* + * No need to invalidate here it will synchronize on + * against the special swap migration pte. + */ + } + + /* + * No need to call mmu_notifier_invalidate_range() it has be + * done above for all cases requiring it to happen under page + * table lock before mmu_notifier_invalidate_range_end() + * + * See Documentation/vm/mmu_notifier.rst + */ + page_remove_rmap(subpage, PageHuge(page)); + put_page(page); + } + + mmu_notifier_invalidate_range_end(&range); + + return ret; +} + +/** + * try_to_migrate - try to replace all page table mappings with swap entries + * @page: the page to replace page table entries for + * @flags: action and flags + * + * Tries to remove all the page table entries which are mapping this page and + * replace them with special swap entries. Caller must hold the page lock. + * + * If is successful, return true. Otherwise, false. + */ +bool try_to_migrate(struct page *page, enum ttu_flags flags) +{ + struct rmap_walk_control rwc = { + .rmap_one = try_to_migrate_one, + .arg = (void *)flags, + .done = page_not_mapped, + .anon_lock = page_lock_anon_vma_read, + }; + + /* + * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and + * TTU_SPLIT_HUGE_PMD flags. + */ + if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD))) + return false; + /* * During exec, a temporary VMA is setup and later moved. * The VMA is moved under the anon_vma lock but not the @@ -1766,8 +1939,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) * locking requirements of exec(), migration skips * temporary VMAs until after exec() completes. */ - if ((flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE)) - && !PageKsm(page) && PageAnon(page)) + if (!PageKsm(page) && PageAnon(page)) rwc.invalid_vma = invalid_migration_vma; if (flags & TTU_RMAP_LOCKED) -- 2.20.1 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v10 04/10] mm/rmap: Split migration into its own function @ 2021-06-07 7:58 ` Alistair Popple 0 siblings, 0 replies; 3+ messages in thread From: Alistair Popple @ 2021-06-07 7:58 UTC (permalink / raw) To: linux-mm, akpm Cc: rcampbell, willy, linux-doc, nouveau, Alistair Popple, hughd, linux-kernel, dri-devel, hch, peterx, shakeelb, bskeggs, jgg, jhubbard, Christoph Hellwig Migration is currently implemented as a mode of operation for try_to_unmap_one() generally specified by passing the TTU_MIGRATION flag or in the case of splitting a huge anonymous page TTU_SPLIT_FREEZE. However it does not have much in common with the rest of the unmap functionality of try_to_unmap_one() and thus splitting it into a separate function reduces the complexity of try_to_unmap_one() making it more readable. Several simplifications can also be made in try_to_migrate_one() based on the following observations: - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK. - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON. - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH. TTU_SPLIT_FREEZE is a special case of migration used when splitting an anonymous page. This is most easily dealt with by calling the correct function from unmap_page() in mm/huge_memory.c - either try_to_migrate() for PageAnon or try_to_unmap(). Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> --- v5: * Added comments about how PMD splitting works for migration vs. unmapping * Tightened up the flag check in try_to_migrate() to be explicit about which TTU_XXX flags are supported. --- include/linux/rmap.h | 4 +- mm/huge_memory.c | 15 +- mm/migrate.c | 9 +- mm/rmap.c | 358 ++++++++++++++++++++++++++++++++----------- 4 files changed, 280 insertions(+), 106 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 38a746787c2f..0e25d829f742 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -86,8 +86,6 @@ struct anon_vma_chain { }; enum ttu_flags { - TTU_MIGRATION = 0x1, /* migration mode */ - TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */ TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */ TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */ @@ -96,7 +94,6 @@ enum ttu_flags { * do a final flush if necessary */ TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock: * caller holds it */ - TTU_SPLIT_FREEZE = 0x100, /* freeze pte under splitting thp */ }; #ifdef CONFIG_MMU @@ -193,6 +190,7 @@ static inline void page_dup_rmap(struct page *page, bool compound) int page_referenced(struct page *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); +bool try_to_migrate(struct page *page, enum ttu_flags flags); bool try_to_unmap(struct page *, enum ttu_flags flags); /* Avoid racy checks */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2ec6dab72217..6dddc75b89ee 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2345,16 +2345,21 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, static void unmap_page(struct page *page) { - enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | - TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; + enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; bool unmap_success; VM_BUG_ON_PAGE(!PageHead(page), page); if (PageAnon(page)) - ttu_flags |= TTU_SPLIT_FREEZE; - - unmap_success = try_to_unmap(page, ttu_flags); + unmap_success = try_to_migrate(page, ttu_flags); + else + /* + * Don't install migration entries for file backed pages. This + * helps handle cases when i_size is in the middle of the page + * as there is no need to unmap pages beyond i_size manually. + */ + unmap_success = try_to_unmap(page, ttu_flags | + TTU_IGNORE_MLOCK); VM_BUG_ON_PAGE(!unmap_success, page); } diff --git a/mm/migrate.c b/mm/migrate.c index 930de919b1f2..05740f816bc4 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1103,7 +1103,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, /* Establish migration ptes */ VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma, page); - try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK); + try_to_migrate(page, 0); page_was_mapped = 1; } @@ -1305,7 +1305,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, if (page_mapped(hpage)) { bool mapping_locked = false; - enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK; + enum ttu_flags ttu = 0; if (!PageAnon(hpage)) { /* @@ -1322,7 +1322,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, ttu |= TTU_RMAP_LOCKED; } - try_to_unmap(hpage, ttu); + try_to_migrate(hpage, ttu); page_was_mapped = 1; if (mapping_locked) @@ -2712,7 +2712,6 @@ static void migrate_vma_prepare(struct migrate_vma *migrate) */ static void migrate_vma_unmap(struct migrate_vma *migrate) { - int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK; const unsigned long npages = migrate->npages; const unsigned long start = migrate->start; unsigned long addr, i, restore = 0; @@ -2724,7 +2723,7 @@ static void migrate_vma_unmap(struct migrate_vma *migrate) continue; if (page_mapped(page)) { - try_to_unmap(page, flags); + try_to_migrate(page, 0); if (page_mapped(page)) goto restore; } diff --git a/mm/rmap.c b/mm/rmap.c index b6c50df08b3b..be0450d905cd 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1405,14 +1405,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; - if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && - is_zone_device_page(page) && !is_device_private_page(page)) - return true; - - if (flags & TTU_SPLIT_HUGE_PMD) { - split_huge_pmd_address(vma, address, - flags & TTU_SPLIT_FREEZE, page); - } + if (flags & TTU_SPLIT_HUGE_PMD) + split_huge_pmd_address(vma, address, false, page); /* * For THP, we have to assume the worse case ie pmd for invalidation. @@ -1436,16 +1430,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, mmu_notifier_invalidate_range_start(&range); while (page_vma_mapped_walk(&pvmw)) { -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - /* PMD-mapped THP migration entry */ - if (!pvmw.pte && (flags & TTU_MIGRATION)) { - VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page); - - set_pmd_migration_entry(&pvmw, page); - continue; - } -#endif - /* * If the page is mlock()d, we cannot swap it out. * If it's recently referenced (perhaps page_referenced @@ -1507,46 +1491,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } } - if (IS_ENABLED(CONFIG_MIGRATION) && - (flags & TTU_MIGRATION) && - is_zone_device_page(page)) { - swp_entry_t entry; - pte_t swp_pte; - - pteval = ptep_get_and_clear(mm, pvmw.address, pvmw.pte); - - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - entry = make_readable_migration_entry(page_to_pfn(page)); - swp_pte = swp_entry_to_pte(entry); - - /* - * pteval maps a zone device page and is therefore - * a swap pte. - */ - if (pte_swp_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. - * - * The assignment to subpage above was computed from a - * swap PTE which results in an invalid pointer. - * Since only PAGE_SIZE pages can currently be - * migrated, just set it to page. This will need to be - * changed when hugepage migrations to device private - * memory are supported. - */ - subpage = page; - goto discard; - } - /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); if (should_defer_flush(mm, flags)) { @@ -1599,39 +1543,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); - } else if (IS_ENABLED(CONFIG_MIGRATION) && - (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) { - swp_entry_t entry; - pte_t swp_pte; - - if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); - ret = false; - page_vma_mapped_walk_done(&pvmw); - break; - } - - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - if (pte_write(pteval)) - entry = make_writable_migration_entry( - page_to_pfn(subpage)); - else - entry = make_readable_migration_entry( - page_to_pfn(subpage)); - swp_pte = swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. - */ } else if (PageAnon(page)) { swp_entry_t entry = { .val = page_private(subpage) }; pte_t swp_pte; @@ -1758,6 +1669,268 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) .anon_lock = page_lock_anon_vma_read, }; + if (flags & TTU_RMAP_LOCKED) + rmap_walk_locked(page, &rwc); + else + rmap_walk(page, &rwc); + + return !page_mapcount(page) ? true : false; +} + +/* + * @arg: enum ttu_flags will be passed to this argument. + * + * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs + * containing migration entries. This and TTU_RMAP_LOCKED are the only supported + * flags. + */ +static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, + unsigned long address, void *arg) +{ + struct mm_struct *mm = vma->vm_mm; + struct page_vma_mapped_walk pvmw = { + .page = page, + .vma = vma, + .address = address, + }; + pte_t pteval; + struct page *subpage; + bool ret = true; + struct mmu_notifier_range range; + enum ttu_flags flags = (enum ttu_flags)(long)arg; + + if (is_zone_device_page(page) && !is_device_private_page(page)) + return true; + + /* + * unmap_page() in mm/huge_memory.c is the only user of migration with + * TTU_SPLIT_HUGE_PMD and it wants to freeze. + */ + if (flags & TTU_SPLIT_HUGE_PMD) + split_huge_pmd_address(vma, address, true, page); + + /* + * For THP, we have to assume the worse case ie pmd for invalidation. + * For hugetlb, it could be much worse if we need to do pud + * invalidation in the case of pmd sharing. + * + * Note that the page can not be free in this function as call of + * try_to_unmap() must hold a reference on the page. + */ + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, + address, + min(vma->vm_end, address + page_size(page))); + if (PageHuge(page)) { + /* + * If sharing is possible, start and end will be adjusted + * accordingly. + */ + adjust_range_if_pmd_sharing_possible(vma, &range.start, + &range.end); + } + mmu_notifier_invalidate_range_start(&range); + + while (page_vma_mapped_walk(&pvmw)) { +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + /* PMD-mapped THP migration entry */ + if (!pvmw.pte) { + VM_BUG_ON_PAGE(PageHuge(page) || + !PageTransCompound(page), page); + + set_pmd_migration_entry(&pvmw, page); + continue; + } +#endif + + /* Unexpected PMD-mapped THP? */ + VM_BUG_ON_PAGE(!pvmw.pte, page); + + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); + address = pvmw.address; + + if (PageHuge(page) && !PageAnon(page)) { + /* + * To call huge_pmd_unshare, i_mmap_rwsem must be + * held in write mode. Caller needs to explicitly + * do this outside rmap routines. + */ + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) { + /* + * huge_pmd_unshare unmapped an entire PMD + * page. There is no way of knowing exactly + * which PMDs may be cached for this mm, so + * we must flush them all. start/end were + * already adjusted above to cover this range. + */ + flush_cache_range(vma, range.start, range.end); + flush_tlb_range(vma, range.start, range.end); + mmu_notifier_invalidate_range(mm, range.start, + range.end); + + /* + * The ref count of the PMD page was dropped + * which is part of the way map counting + * is done for shared PMDs. Return 'true' + * here. When there is no other sharing, + * huge_pmd_unshare returns false and we will + * unmap the actual page and drop map count + * to zero. + */ + page_vma_mapped_walk_done(&pvmw); + break; + } + } + + /* Nuke the page table entry. */ + flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + pteval = ptep_clear_flush(vma, address, pvmw.pte); + + /* Move the dirty bit to the page. Now the pte is gone. */ + if (pte_dirty(pteval)) + set_page_dirty(page); + + /* Update high watermark before we lower rss */ + update_hiwater_rss(mm); + + if (is_zone_device_page(page)) { + swp_entry_t entry; + pte_t swp_pte; + + /* + * Store the pfn of the page in a special migration + * pte. do_swap_page() will wait until the migration + * pte is removed and then restart fault handling. + */ + entry = make_readable_migration_entry( + page_to_pfn(page)); + swp_pte = swp_entry_to_pte(entry); + + /* + * pteval maps a zone device page and is therefore + * a swap pte. + */ + if (pte_swp_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); + /* + * No need to invalidate here it will synchronize on + * against the special swap migration pte. + * + * The assignment to subpage above was computed from a + * swap PTE which results in an invalid pointer. + * Since only PAGE_SIZE pages can currently be + * migrated, just set it to page. This will need to be + * changed when hugepage migrations to device private + * memory are supported. + */ + subpage = page; + } else if (PageHWPoison(page)) { + pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); + if (PageHuge(page)) { + hugetlb_count_sub(compound_nr(page), mm); + set_huge_swap_pte_at(mm, address, + pvmw.pte, pteval, + vma_mmu_pagesize(vma)); + } else { + dec_mm_counter(mm, mm_counter(page)); + set_pte_at(mm, address, pvmw.pte, pteval); + } + + } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + /* + * The guest indicated that the page content is of no + * interest anymore. Simply discard the pte, vmscan + * will take care of the rest. + * A future reference will then fault in a new zero + * page. When userfaultfd is active, we must not drop + * this page though, as its main user (postcopy + * migration) will not expect userfaults on already + * copied pages. + */ + dec_mm_counter(mm, mm_counter(page)); + /* We have to invalidate as we cleared the pte */ + mmu_notifier_invalidate_range(mm, address, + address + PAGE_SIZE); + } else { + swp_entry_t entry; + pte_t swp_pte; + + if (arch_unmap_one(mm, vma, address, pteval) < 0) { + set_pte_at(mm, address, pvmw.pte, pteval); + ret = false; + page_vma_mapped_walk_done(&pvmw); + break; + } + + /* + * Store the pfn of the page in a special migration + * pte. do_swap_page() will wait until the migration + * pte is removed and then restart fault handling. + */ + if (pte_write(pteval)) + entry = make_writable_migration_entry( + page_to_pfn(subpage)); + else + entry = make_readable_migration_entry( + page_to_pfn(subpage)); + + swp_pte = swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + set_pte_at(mm, address, pvmw.pte, swp_pte); + /* + * No need to invalidate here it will synchronize on + * against the special swap migration pte. + */ + } + + /* + * No need to call mmu_notifier_invalidate_range() it has be + * done above for all cases requiring it to happen under page + * table lock before mmu_notifier_invalidate_range_end() + * + * See Documentation/vm/mmu_notifier.rst + */ + page_remove_rmap(subpage, PageHuge(page)); + put_page(page); + } + + mmu_notifier_invalidate_range_end(&range); + + return ret; +} + +/** + * try_to_migrate - try to replace all page table mappings with swap entries + * @page: the page to replace page table entries for + * @flags: action and flags + * + * Tries to remove all the page table entries which are mapping this page and + * replace them with special swap entries. Caller must hold the page lock. + * + * If is successful, return true. Otherwise, false. + */ +bool try_to_migrate(struct page *page, enum ttu_flags flags) +{ + struct rmap_walk_control rwc = { + .rmap_one = try_to_migrate_one, + .arg = (void *)flags, + .done = page_not_mapped, + .anon_lock = page_lock_anon_vma_read, + }; + + /* + * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and + * TTU_SPLIT_HUGE_PMD flags. + */ + if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD))) + return false; + /* * During exec, a temporary VMA is setup and later moved. * The VMA is moved under the anon_vma lock but not the @@ -1766,8 +1939,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) * locking requirements of exec(), migration skips * temporary VMAs until after exec() completes. */ - if ((flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE)) - && !PageKsm(page) && PageAnon(page)) + if (!PageKsm(page) && PageAnon(page)) rwc.invalid_vma = invalid_migration_vma; if (flags & TTU_RMAP_LOCKED) -- 2.20.1 ^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-06-09 17:43 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-06-09 17:43 [PATCH v10 04/10] mm/rmap: Split migration into its own function kernel test robot -- strict thread matches above, loose matches on Subject: below -- 2021-06-07 7:58 [PATCH v10 00/10] Add support for SVM atomics in Nouveau Alistair Popple 2021-06-07 7:58 ` [PATCH v10 04/10] mm/rmap: Split migration into its own function Alistair Popple 2021-06-07 7:58 ` Alistair Popple
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.