From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Nick Piggin <npiggin@suse.de>,
Jared Hulbert <jaredeh@gmail.com>,
Carsten Otte <cotte@freenet.de>, Hugh Dickins <hugh@veritas.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: [patch 33/71] mm: dirty page tracking race fix
Date: Mon, 6 Oct 2008 17:38:50 -0700
Message-ID: <20081007003850.GH3055@suse.de>
In-Reply-To: <20081007003634.GA3055@suse.de>
[-- Attachment #1: mm-dirty-page-tracking-race-fix.patch --]
[-- Type: text/plain, Size: 4410 bytes --]
2.6.26-stable review patch. If anyone has any objections, please let us
know.
------------------
From: Nick Piggin <npiggin@suse.de>
commit 479db0bf408e65baa14d2a9821abfcbc0804b847 upstream
There is a race with dirty page accounting where a page may not properly
be accounted for.
clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
page_mkclean walks the rmaps for that page, and for each one it cleans and
write protects the pte if it was dirty. It uses page_check_address to
find the pte. That function has a shortcut to avoid the ptl if the pte is
not present. Unfortunately, the pte can be switched to not-present then
back to present by other code while holding the page table lock -- this
should not be a signal for page_mkclean to ignore that pte, because it may
be dirty.
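The interleaving described above can be replayed deterministically in user space. The following is only a sketch under stated assumptions: `struct toy_pte`, `toy_check_address` and `toy_mkclean_misses_dirty` are hypothetical names invented for illustration, not kernel code, and the page table lock is modelled as a plain flag rather than a real spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page table entry; NOT the kernel's pte_t. */
struct toy_pte {
	bool present;
	bool dirty;
	bool locked;	/* models another CPU holding the page table lock */
};

/*
 * Toy model of the lookup. Returns the pte if it may be inspected, else
 * NULL. *must_wait is set when the lock is held elsewhere, i.e. where a
 * real caller would spin on the lock instead of returning.
 */
static struct toy_pte *toy_check_address(struct toy_pte *pte, int sync,
					 bool *must_wait)
{
	*must_wait = false;
	if (!sync && !pte->present)
		return NULL;		/* lockless shortcut: the racy path */
	if (pte->locked) {
		*must_wait = true;
		return NULL;
	}
	return pte->present ? pte : NULL;
}

/* Replays the race once; returns true if a dirty pte was left behind. */
static bool toy_mkclean_misses_dirty(int sync)
{
	struct toy_pte pte = { .present = true, .dirty = true, .locked = false };
	bool must_wait;
	struct toy_pte *found;

	/* Writer: takes the lock and transiently clears the pte. */
	pte.locked = true;
	pte.present = false;

	/* Cleaner: looks up the pte while it is transiently not present. */
	found = toy_check_address(&pte, sync, &must_wait);

	/* Writer: installs the final (dirty) value and drops the lock. */
	pte.present = true;
	pte.dirty = true;
	pte.locked = false;

	/* A cleaner that had to wait for the lock now sees the final value. */
	if (must_wait) {
		found = toy_check_address(&pte, sync, &must_wait);
		assert(!must_wait);	/* the writer has released the lock */
	}

	if (found)
		found->dirty = false;	/* stands in for clean + write-protect */

	return pte.dirty;
}
```

In this model, `toy_mkclean_misses_dirty(0)` reports a missed dirty pte, while `toy_mkclean_misses_dirty(1)` does not, because the cleaner is forced to wait for the lock holder to finish.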
For example, powerpc64's set_pte_at will clear a previously present pte
before setting it to the desired value. There may also be other code in
core mm or in arch which does similar things.
The consequence of the bug is loss of data integrity due to msync, and
loss of dirty page accounting accuracy. XIP's __xip_unmap could easily
also be unreliable (depending on the exact XIP locking scheme), which can
lead to data corruption.
Fix this by having an option to always take ptl to check the pte in
page_check_address.
It's possible to retain this optimization for page_referenced and
try_to_unmap.
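The shape of the fix, an optional lockless pre-check followed by an authoritative check under the lock, can be sketched in portable C11. This is an illustration only: `struct toy_slot`, `toy_lock`, `toy_unlock` and `toy_check_slot` are hypothetical names, the spinlock is a bare `atomic_flag`, and the recheck under the lock is an assumption drawn from the "On success returns with pte mapped and locked" comment rather than a copy of the kernel function.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot guarded by a tiny spinlock; not a kernel structure. */
struct toy_slot {
	atomic_flag lock;
	atomic_bool present;
	int owner_id;		/* stands in for "this pte maps this page" */
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;		/* spin until the holder releases */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Returns the slot with its lock HELD on success, NULL (lock not held)
 * otherwise. With sync == 0 a racy unlocked peek may bail out early;
 * with sync != 0 the decision is always made under the lock.
 */
static struct toy_slot *toy_check_slot(struct toy_slot *slot, int owner_id,
				       int sync)
{
	if (!sync && !atomic_load(&slot->present))
		return NULL;	/* cheap, but may miss a transient clear */

	toy_lock(&slot->lock);
	if (atomic_load(&slot->present) && slot->owner_id == owner_id)
		return slot;	/* caller must call toy_unlock() */
	toy_unlock(&slot->lock);
	return NULL;
}
```

Callers that can tolerate a false "not mapped" answer, as the message says of page_referenced and try_to_unmap, would pass 0 and skip the lock on the common miss path. Callers whose correctness depends on seeing every mapping would pass 1.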
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
include/linux/rmap.h | 2 +-
mm/filemap_xip.c | 2 +-
mm/rmap.c | 14 +++++++++-----
3 files changed, 11 insertions(+), 7 deletions(-)
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -94,7 +94,7 @@ int try_to_unmap(struct page *, int igno
* Called from mm/filemap_xip.c to unmap empty zero page
*/
pte_t *page_check_address(struct page *, struct mm_struct *,
- unsigned long, spinlock_t **);
+ unsigned long, spinlock_t **, int);
/*
* Used by swapoff to help locate where page is expected in vma.
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -184,7 +184,7 @@ __xip_unmap (struct address_space * mapp
address = vma->vm_start +
((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
- pte = page_check_address(page, mm, address, &ptl);
+ pte = page_check_address(page, mm, address, &ptl, 1);
if (pte) {
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -223,10 +223,14 @@ unsigned long page_address_in_vma(struct
/*
* Check that @page is mapped at @address into @mm.
*
+ * If @sync is false, page_check_address may perform a racy check to avoid
+ * the page table lock when the pte is not present (helpful when reclaiming
+ * highly shared pages).
+ *
* On success returns with pte mapped and locked.
*/
pte_t *page_check_address(struct page *page, struct mm_struct *mm,
- unsigned long address, spinlock_t **ptlp)
+ unsigned long address, spinlock_t **ptlp, int sync)
{
pgd_t *pgd;
pud_t *pud;
@@ -248,7 +252,7 @@ pte_t *page_check_address(struct page *p
pte = pte_offset_map(pmd, address);
/* Make a quick check before getting the lock */
- if (!pte_present(*pte)) {
+ if (!sync && !pte_present(*pte)) {
pte_unmap(pte);
return NULL;
}
@@ -280,7 +284,7 @@ static int page_referenced_one(struct pa
if (address == -EFAULT)
goto out;
- pte = page_check_address(page, mm, address, &ptl);
+ pte = page_check_address(page, mm, address, &ptl, 0);
if (!pte)
goto out;
@@ -449,7 +453,7 @@ static int page_mkclean_one(struct page
if (address == -EFAULT)
goto out;
- pte = page_check_address(page, mm, address, &ptl);
+ pte = page_check_address(page, mm, address, &ptl, 1);
if (!pte)
goto out;
@@ -707,7 +711,7 @@ static int try_to_unmap_one(struct page
if (address == -EFAULT)
goto out;
- pte = page_check_address(page, mm, address, &ptl);
+ pte = page_check_address(page, mm, address, &ptl, 0);
if (!pte)
goto out;
--