From: Kamal Mostafa <kamal@canonical.com>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
kernel-team@lists.ubuntu.com
Cc: Mel Gorman <mgorman@suse.de>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Kamal Mostafa <kamal@canonical.com>
Subject: [PATCH 3.8 78/91] mm: Close races between THP migration and PMD numa clearing
Date: Thu, 7 Nov 2013 18:15:33 -0800 [thread overview]
Message-ID: <1383876946-2396-79-git-send-email-kamal@canonical.com> (raw)
In-Reply-To: <1383876946-2396-1-git-send-email-kamal@canonical.com>
3.8.13.13 -stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman <mgorman@suse.de>
commit 3f926ab945b60a5824369d21add7710622a2eac0 upstream.
THP migration uses the page lock to guard against parallel allocations
but there are cases like this still open
Task A Task B
--------------------- ---------------------
do_huge_pmd_numa_page do_huge_pmd_numa_page
lock_page
mpol_misplaced == -1
unlock_page
goto clear_pmdnuma
lock_page
mpol_misplaced == 2
migrate_misplaced_transhuge
pmd = pmd_mknonnuma
set_pmd_at
During hours of testing, one crashed with weird errors and while I have
no direct evidence, I suspect something like the race above happened.
This patch extends the page lock to being held until the pmd_numa is
cleared to prevent migration starting in parallel while the pmd_numa is
being cleared. It also flushes the old pmd entry and orders pagetable
insertion before rmap insertion.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.8 (context) ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
mm/huge_memory.c | 29 +++++++++++++++--------------
mm/migrate.c | 19 +++++++++++--------
2 files changed, 26 insertions(+), 22 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1390fdd..9c37776 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1323,24 +1323,25 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
target_nid = mpol_misplaced(page, vma, haddr);
if (target_nid == -1) {
/* If the page was locked, there are no parallel migrations */
- if (page_locked) {
- unlock_page(page);
+ if (page_locked)
goto clear_pmdnuma;
- }
- /* Otherwise wait for potential migrations and retry fault */
+ /*
+ * Otherwise wait for potential migrations and retry. We do
+ * relock and check_same as the page may no longer be mapped.
+ * As the fault is being retried, do not account for it.
+ */
spin_unlock(&mm->page_table_lock);
wait_on_page_locked(page);
+ page_nid = -1;
goto out;
}
/* Page is misplaced, serialise migrations and parallel THP splits */
get_page(page);
spin_unlock(&mm->page_table_lock);
- if (!page_locked) {
+ if (!page_locked)
lock_page(page);
- page_locked = true;
- }
anon_vma = page_lock_anon_vma_read(page);
/* Confirm the PTE did not while locked */
@@ -1348,29 +1349,29 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(!pmd_same(pmd, *pmdp))) {
unlock_page(page);
put_page(page);
+ page_nid = -1;
goto out_unlock;
}
- /* Migrate the THP to the requested node */
+ /*
+ * Migrate the THP to the requested node, returns with page unlocked
+ * and pmd_numa cleared.
+ */
spin_unlock(&mm->page_table_lock);
migrated = migrate_misplaced_transhuge_page(mm, vma,
pmdp, pmd, addr, page, target_nid);
if (migrated)
page_nid = target_nid;
- else
- goto check_same;
goto out;
-check_same:
- spin_lock(&mm->page_table_lock);
- if (unlikely(!pmd_same(pmd, *pmdp)))
- goto out_unlock;
clear_pmdnuma:
+ BUG_ON(!PageLocked(page));
pmd = pmd_mknonnuma(pmd);
set_pmd_at(mm, haddr, pmdp, pmd);
VM_BUG_ON(pmd_numa(*pmdp));
update_mmu_cache_pmd(vma, addr, pmdp);
+ unlock_page(page);
out_unlock:
spin_unlock(&mm->page_table_lock);
diff --git a/mm/migrate.c b/mm/migrate.c
index 811a2ca..d2296c5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1725,12 +1725,12 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
unlock_page(new_page);
put_page(new_page); /* Free it */
- unlock_page(page);
+ /* Retake the callers reference and putback on LRU */
+ get_page(page);
putback_lru_page(page);
-
- count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR);
- isolated = 0;
- goto out;
+ mod_zone_page_state(page_zone(page),
+ NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
+ goto out_fail;
}
/*
@@ -1747,9 +1747,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
entry = pmd_mkhuge(entry);
- page_add_new_anon_rmap(new_page, vma, haddr);
-
+ pmdp_clear_flush(vma, haddr, pmd);
set_pmd_at(mm, haddr, pmd, entry);
+ page_add_new_anon_rmap(new_page, vma, haddr);
update_mmu_cache_pmd(vma, address, &entry);
page_remove_rmap(page);
/*
@@ -1768,7 +1768,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
-out:
mod_zone_page_state(page_zone(page),
NR_ISOLATED_ANON + page_lru,
-HPAGE_PMD_NR);
@@ -1777,6 +1776,10 @@ out:
out_fail:
count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR);
out_dropref:
+ entry = pmd_mknonnuma(entry);
+ set_pmd_at(mm, haddr, pmd, entry);
+ update_mmu_cache_pmd(vma, address, &entry);
+
unlock_page(page);
put_page(page);
return 0;
--
1.8.1.2
next prev parent reply other threads:[~2013-11-08 2:15 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-08 2:14 [3.8.y.z extended stable] Linux 3.8.13.13 stable review Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 01/91] ACPICA: Interpreter: Fix Store() when implicit conversion is not possible Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 02/91] ACPICA: DeRefOf operator: Update to fully resolve FieldUnit and BufferField refs Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 03/91] ACPICA: Return error if DerefOf resolves to a null package element Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 04/91] ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 05/91] vxlan: fix ip_select_ident skb parameter Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 06/91] tcp: TSO packets automatic sizing Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 07/91] tcp: TSQ can use a dynamic limit Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 08/91] net: Add skb_unclone() helper function Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 09/91] ipv6: fix warning in xfrm6_mode_tunnel_input Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 10/91] ip: fix warning in xfrm4_mode_tunnel_input Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 11/91] tcp: must unclone packets before mangling them Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 12/91] tcp: do not forget FIN in tcp_shifted_skb() Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 13/91] net: do not call sock_put() on TIMEWAIT sockets Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 14/91] l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 15/91] l2tp: Fix build warning with ipv6 disabled Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 16/91] net: mv643xx_eth: update statistics timer from timer context only Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 17/91] net: mv643xx_eth: fix orphaned statistics timer crash Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 18/91] net: heap overflow in __audit_sockaddr() Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 19/91] proc connector: fix info leaks Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 20/91] ipv4: fix ineffective source address selection Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 21/91] can: dev: fix nlmsg size calculation in can_get_size() Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 22/91] net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 23/91] xen-netback: Don't destroy the netdev until the vif is shut down Kamal Mostafa
2013-11-08 9:53 ` Ian Campbell
2013-11-08 17:54 ` Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 24/91] net: vlan: fix nlmsg size calculation in vlan_get_size() Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 25/91] vti: get rid of nf mark rule in prerouting Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 26/91] l2tp: must disable bh before calling l2tp_xmit_skb() Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 27/91] farsync: fix info leak in ioctl Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 28/91] unix_diag: fix info leak Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 29/91] connector: use nlmsg_len() to check message length Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 30/91] virtio-net: don't respond to cpu hotplug notifier if we're not ready Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 31/91] bridge: Correctly clamp MAX forward_delay when enabling STP Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 32/91] net: dst: provide accessor function to dst->xfrm Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 33/91] sctp: Use software crc32 checksum when xfrm transform will happen Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 34/91] sctp: Perform software checksum if packet has to be fragmented Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 35/91] wanxl: fix info leak in ioctl Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 36/91] net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 37/91] net: fix cipso packet validation when !NETLABEL Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 38/91] inet: fix possible memory corruption with UDP_CORK and UFO Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 39/91] davinci_emac.c: Fix IFF_ALLMULTI setup Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 40/91] jfs: fix error path in ialloc Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 41/91] cfg80211: fix warning when using WEXT for IBSS Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 42/91] mac80211: drop spoofed packets in ad-hoc mode Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 43/91] mac80211: use sta_info_get_bss() for nl80211 tx and client probing Kamal Mostafa
2013-11-08 2:14 ` [PATCH 3.8 44/91] mac80211: update sta->last_rx on acked tx frames Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 45/91] mwifiex: fix SDIO interrupt lost issue Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 46/91] ath9k: fix tx queue scheduling after channel changes Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 47/91] libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failures Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 48/91] mac80211: correctly close cancelled scans Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 49/91] can: flexcan: fix mx28 detection by rearanging OF match table Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 50/91] rtlwifi: rtl8192cu: Fix error in pointer arithmetic Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 51/91] wireless: radiotap: fix parsing buffer overrun Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 52/91] mac80211: fix crash if bitrate calculation goes wrong Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 53/91] drm/vmwgfx: Don't put resources with invalid id's on lru list Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 54/91] drm/vmwgfx: Don't kill clients on VT switch Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 55/91] ecryptfs: Fix memory leakage in keystore.c Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 56/91] drm: Prevent overwriting from userspace underallocating core ioctl structs Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 57/91] drm: Pad drm_mode_get_connector to 64-bit boundary Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 58/91] drm/radeon/atom: workaround vbios bug in transmitter table on rs780 Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 59/91] md: Fix skipping recovery for read-only arrays Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 60/91] md: avoid deadlock when md_set_badblocks Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 61/91] raid5: set bio bi_vcnt 0 for discard request Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 62/91] raid5: avoid finding "discard" stripe Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 63/91] target/pscsi: fix return value check Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 64/91] vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 65/91] parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 66/91] dmi: add support for exact DMI matches in addition to substring matching Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 67/91] clk: fixup argument order when setting VCO parameters Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 68/91] xtensa: don't use alternate signal stack on threads Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 69/91] ALSA: hda - Fix unbalanced runtime PM refcount after S3/S4 Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 70/91] ASoC: dapm: Fix source list debugfs outputs Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 71/91] drm/i915: quirk away phantom LVDS on Intel's D510MO mainboard Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 72/91] drm/i915: quirk away phantom LVDS on Intel's D525MW mainboard Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 73/91] drm/i915: No LVDS hardware on Intel D410PT and D425KT Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 74/91] mm: Wait for THP migrations to complete during NUMA hinting faults Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 75/91] mm: numa: cleanup flow of transhuge page migration Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 76/91] mm: Prevent parallel splits during THP migration Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 77/91] mm: numa: Sanitize task_numa_fault() callsites Kamal Mostafa
2013-11-08 2:15 ` Kamal Mostafa [this message]
2013-11-08 2:15 ` [PATCH 3.8 79/91] mm: Account for a THP NUMA hinting update as one PTE update Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 80/91] Fix a few incorrectly checked [io_]remap_pfn_range() calls Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 81/91] ALSA: hda - Add a fixup for ASUS N76VZ Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 82/91] ASoC: wm_hubs: Add missing break in hp_supply_event() Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 83/91] uml: check length in exitcode_proc_write() Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 84/91] staging: ozwpan: prevent overflow in oz_cdev_write() Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 85/91] aacraid: missing capable() check in compat ioctl Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 86/91] staging: wlags49_h2: buffer overflow setting station name Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 87/91] Staging: bcm: info leak in ioctl Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 88/91] Staging: sb105x: info leak in mp_get_count() Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 89/91] ALSA: fix oops in snd_pcm_info() caused by ASoC DPCM Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 90/91] lib/scatterlist.c: don't flush_kernel_dcache_page on slab page Kamal Mostafa
2013-11-08 2:15 ` [PATCH 3.8 91/91] scripts/kallsyms: filter symbols not in kernel address space Kamal Mostafa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1383876946-2396-79-git-send-email-kamal@canonical.com \
--to=kamal@canonical.com \
--cc=aarcange@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@lists.ubuntu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=srikar@linux.vnet.ibm.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox