From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
David Hildenbrand <david@redhat.com>,
Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>,
Samuel Mendoza-Jonas <samjonas@amazon.com>
Subject: [PATCH 5.4 33/67] mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Date: Mon, 12 Dec 2022 14:17:08 +0100 [thread overview]
Message-ID: <20221212130919.207389968@linuxfoundation.org> (raw)
In-Reply-To: <20221212130917.599345531@linuxfoundation.org>
From: Baolin Wang <baolin.wang@linux.alibaba.com>
commit fac35ba763ed07ba93154c95ffc0c4a55023707f upstream.
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[5.4: Fixup contextual diffs before pin_user_pages()]
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/hugetlb.h | 6 +++---
mm/gup.c | 13 ++++++++++++-
mm/hugetlb.c | 28 ++++++++++++++--------------
3 files changed, 29 insertions(+), 18 deletions(-)
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -127,8 +127,8 @@ struct page *follow_huge_addr(struct mm_
struct page *follow_huge_pd(struct vm_area_struct *vma,
unsigned long address, hugepd_t hpd,
int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags);
+struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address,
+ int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
@@ -175,7 +175,7 @@ static inline void hugetlb_show_meminfo(
{
}
#define follow_huge_pd(vma, addr, hpd, flags, pdshift) NULL
-#define follow_huge_pmd(mm, addr, pmd, flags) NULL
+#define follow_huge_pmd_pte(vma, addr, flags) NULL
#define follow_huge_pud(mm, addr, pud, flags) NULL
#define follow_huge_pgd(mm, addr, pgd, flags) NULL
#define prepare_hugepage_range(file, addr, len) (-EINVAL)
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -188,6 +188,17 @@ static struct page *follow_page_pte(stru
spinlock_t *ptl;
pte_t *ptep, pte;
+ /*
+ * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+ * ARM64 architecture.
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ page = follow_huge_pmd_pte(vma, address, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
+ }
+
retry:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
@@ -333,7 +344,7 @@ static struct page *follow_pmd_mask(stru
if (pmd_none(pmdval))
return no_page_table(vma, flags);
if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) {
- page = follow_huge_pmd(mm, address, pmd, flags);
+ page = follow_huge_pmd_pte(vma, address, flags);
if (page)
return page;
return no_page_table(vma, flags);
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5157,30 +5157,30 @@ follow_huge_pd(struct vm_area_struct *vm
}
struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags)
+follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags)
{
+ struct hstate *h = hstate_vma(vma);
+ struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
spinlock_t *ptl;
- pte_t pte;
+ pte_t *ptep, pte;
+
retry:
- ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
- /*
- * make sure that the address range covered by this pmd is not
- * unmapped from other threads.
- */
- if (!pmd_huge(*pmd))
- goto out;
- pte = huge_ptep_get((pte_t *)pmd);
+ ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ if (!ptep)
+ return NULL;
+
+ ptl = huge_pte_lock(h, mm, ptep);
+ pte = huge_ptep_get(ptep);
if (pte_present(pte)) {
- page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ page = pte_page(pte) +
+ ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
if (flags & FOLL_GET)
get_page(page);
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
- __migration_entry_wait(mm, (pte_t *)pmd, ptl);
+ __migration_entry_wait(mm, ptep, ptl);
goto retry;
}
/*
next prev parent reply other threads:[~2022-12-12 13:21 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-12 13:16 [PATCH 5.4 00/67] 5.4.227-rc1 review Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 01/67] arm64: dts: rockchip: keep I2S1 disabled for GPIO function on ROCK Pi 4 series Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 02/67] arm: dts: rockchip: fix node name for hym8563 rtc Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 03/67] ARM: dts: rockchip: fix ir-receiver node names Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 04/67] ARM: dts: rockchip: rk3188: fix lcdc1-rgb24 node name Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 05/67] ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 06/67] ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 07/67] ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188 Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 08/67] 9p/fd: Use P9_HDRSZ for header size Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 09/67] regulator: slg51000: Wait after asserting CS pin Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 10/67] ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event Greg Kroah-Hartman
2022-12-12 13:16 ` Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 11/67] btrfs: send: avoid unaligned encoded writes when attempting to clone range Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 12/67] ASoC: soc-pcm: Add NULL check in BE reparenting Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 13/67] regulator: twl6030: fix get status of twl6032 regulators Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 14/67] fbcon: Use kzalloc() in fbcon_prepare_logo() Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 15/67] 9p/xen: check logical size for buffer size Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 16/67] net: usb: qmi_wwan: add u-blox 0x1342 composition Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 17/67] mm/khugepaged: take the right locks for page table retraction Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 18/67] mm/khugepaged: fix GUP-fast interaction by sending IPI Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 19/67] mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 20/67] xen/netback: Ensure protocol headers dont fall in the non-linear area Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 21/67] xen/netback: do some code cleanup Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 22/67] xen/netback: dont call kfree_skb() with interrupts disabled Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 23/67] Revert "net: dsa: b53: Fix valid setting for MDB entries" Greg Kroah-Hartman
2022-12-12 13:16 ` [PATCH 5.4 24/67] media: v4l2-dv-timings.c: fix too strict blanking sanity checks Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 25/67] memcg: fix possible use-after-free in memcg_write_event_control() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 26/67] mm/gup: fix gup_pud_range() for dax Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 27/67] KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 28/67] drm/shmem-helper: Remove errant put in error path Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 29/67] HID: usbhid: Add ALWAYS_POLL quirk for some mice Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 30/67] HID: hid-lg4ff: Add check for empty lbuf Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 31/67] HID: core: fix shift-out-of-bounds in hid_report_raw_event Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 32/67] can: af_can: fix NULL pointer dereference in can_rcv_filter Greg Kroah-Hartman
2022-12-12 13:17 ` Greg Kroah-Hartman [this message]
2022-12-12 13:17 ` [PATCH 5.4 34/67] ieee802154: cc2520: Fix error return code in cc2520_hw_init() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 35/67] ca8210: Fix crash by zero initializing data Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 36/67] drm/bridge: ti-sn65dsi86: Fix output polarity setting bug Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 37/67] gpio: amd8111: Fix PCI device reference count leak Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 38/67] e1000e: Fix TX dispatch condition Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 39/67] igb: Allocate MSI-X vector when testing Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 40/67] af_unix: Get user_ns from in_skb in unix_diag_get_exact() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 41/67] Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 42/67] Bluetooth: Fix not cleanup led when bt_init fails Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 43/67] net: dsa: ksz: Check return value Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 44/67] selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 45/67] mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 46/67] net: encx24j600: Add parentheses to fix precedence Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 47/67] net: encx24j600: Fix invalid logic in reading of MISTAT register Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 48/67] xen-netfront: Fix NULL sring after live migration Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 49/67] net: mvneta: Prevent out of bounds read in mvneta_config_rss() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 50/67] i40e: Fix not setting default xps_cpus after reset Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 51/67] i40e: Fix for VF MAC address 0 Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 52/67] i40e: Disallow ip4 and ip6 l4_4_bytes Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 53/67] NFC: nci: Bounds check struct nfc_target arrays Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 54/67] nvme initialize core quirks before calling nvme_init_subsystem Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 55/67] net: stmmac: fix "snps,axi-config" node property parsing Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 56/67] net: thunderx: Fix missing destroy_workqueue of nicvf_rx_mode_wq Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 57/67] net: hisilicon: Fix potential use-after-free in hisi_femac_rx() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 58/67] net: hisilicon: Fix potential use-after-free in hix5hd2_rx() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 59/67] tipc: Fix potential OOB in tipc_link_proto_rcv() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 60/67] ipv4: Fix incorrect route flushing when source address is deleted Greg Kroah-Hartman
2023-02-03 18:47 ` Shaoying Xu
2023-02-04 8:06 ` Greg KH
2023-02-05 5:30 ` Shaoying Xu
2023-02-05 5:30 ` [PATCH stable 5.4 1/2] Revert "ipv4: Fix incorrect route flushing when source address is deleted" Shaoying Xu
2023-02-07 9:40 ` Greg KH
2023-02-05 5:31 ` [PATCH stable 5.4 2/2] ipv4: Fix incorrect route flushing when source address is deleted Shaoying Xu
2023-02-07 9:40 ` [PATCH 5.4 60/67] " Greg KH
2023-02-07 18:28 ` [PATCH 5.4 v2 1/2] Revert "ipv4: Fix incorrect route flushing when source address is deleted" Shaoying Xu
2023-02-07 18:28 ` [PATCH 5.4 v2 2/2] ipv4: Fix incorrect route flushing when source address is deleted Shaoying Xu
2023-02-08 2:22 ` David Ahern
2023-02-08 2:23 ` David Ahern
2023-02-08 2:40 ` Shaoying Xu
2023-02-17 14:25 ` Patch "ipv4: Fix incorrect route flushing when source address is deleted" has been added to the 5.4-stable tree gregkh
2023-02-17 14:25 ` Patch "Revert "ipv4: Fix incorrect route flushing when source address is deleted"" " gregkh
2022-12-12 13:17 ` [PATCH 5.4 61/67] ipv4: Fix incorrect route flushing when table ID 0 is used Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 62/67] ethernet: aeroflex: fix potential skb leak in greth_init_rings() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 63/67] xen/netback: fix build warning Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 64/67] net: plip: dont call kfree_skb/dev_kfree_skb() under spin_lock_irq() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 65/67] ipv6: avoid use-after-free in ip6_fragment() Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 66/67] net: mvneta: Fix an out of bounds check Greg Kroah-Hartman
2022-12-12 13:17 ` [PATCH 5.4 67/67] can: esd_usb: Allow REC and TEC to return to zero Greg Kroah-Hartman
2022-12-12 20:11 ` [PATCH 5.4 00/67] 5.4.227-rc1 review Jon Hunter
2022-12-12 22:30 ` Florian Fainelli
2022-12-13 0:02 ` Shuah Khan
2022-12-13 0:24 ` Guenter Roeck
2022-12-13 9:20 ` Naresh Kamboju
2022-12-13 15:07 ` Greg Kroah-Hartman
2022-12-13 12:02 ` Sudip Mukherjee (Codethink)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221212130919.207389968@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=mike.kravetz@oracle.com \
--cc=patches@lists.linux.dev \
--cc=samjonas@amazon.com \
--cc=sashal@kernel.org \
--cc=songmuchun@bytedance.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.