public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Miklos Szeredi <mszeredi@suse.cz>,
	Hugh Dickins <hugh@veritas.com>
Subject: [patch 07/33] holepunch: fix shmem_truncate_range punch locking
Date: Thu, 26 Apr 2007 09:55:32 -0700	[thread overview]
Message-ID: <20070426165532.GH1898@kroah.com> (raw)
In-Reply-To: <20070426165445.GA1898@kroah.com>

[-- Attachment #1: holepunch-fix-shmem_truncate_range-punch-locking.patch --]
[-- Type: text/plain, Size: 7128 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Hugh Dickins <hugh@veritas.com>

Miklos Szeredi observes that during truncation of shmem page directories,
info->lock is released to improve latency (after lowering i_size and
next_index to exclude races); but this is quite wrong for holepunching,
which receives no such protection from i_size or next_index, and is left
vulnerable to races with shmem_unuse, shmem_getpage and shmem_writepage.

Hold info->lock throughout when holepunching?  No, any user could prevent
rescheduling for far too long.  Instead take info->lock just when needed:
in shmem_free_swp when removing the swap entries, and whenever removing
a directory page from the level above.  But so long as we remove before
scanning, we can safely skip taking the lock at the lower levels, except
at misaligned start and end of the hole.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/shmem.c |   96 ++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 23 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -402,26 +402,38 @@ static swp_entry_t *shmem_swp_alloc(stru
 /*
  * shmem_free_swp - free some swap entries in a directory
  *
- * @dir:   pointer to the directory
- * @edir:  pointer after last entry of the directory
+ * @dir:        pointer to the directory
+ * @edir:       pointer after last entry of the directory
+ * @punch_lock: pointer to spinlock when needed for the holepunch case
  */
-static int shmem_free_swp(swp_entry_t *dir, swp_entry_t *edir)
+static int shmem_free_swp(swp_entry_t *dir, swp_entry_t *edir,
+						spinlock_t *punch_lock)
 {
+	spinlock_t *punch_unlock = NULL;
 	swp_entry_t *ptr;
 	int freed = 0;
 
 	for (ptr = dir; ptr < edir; ptr++) {
 		if (ptr->val) {
+			if (unlikely(punch_lock)) {
+				punch_unlock = punch_lock;
+				punch_lock = NULL;
+				spin_lock(punch_unlock);
+				if (!ptr->val)
+					continue;
+			}
 			free_swap_and_cache(*ptr);
 			*ptr = (swp_entry_t){0};
 			freed++;
 		}
 	}
+	if (punch_unlock)
+		spin_unlock(punch_unlock);
 	return freed;
 }
 
-static int shmem_map_and_free_swp(struct page *subdir,
-		int offset, int limit, struct page ***dir)
+static int shmem_map_and_free_swp(struct page *subdir, int offset,
+		int limit, struct page ***dir, spinlock_t *punch_lock)
 {
 	swp_entry_t *ptr;
 	int freed = 0;
@@ -431,7 +443,8 @@ static int shmem_map_and_free_swp(struct
 		int size = limit - offset;
 		if (size > LATENCY_LIMIT)
 			size = LATENCY_LIMIT;
-		freed += shmem_free_swp(ptr+offset, ptr+offset+size);
+		freed += shmem_free_swp(ptr+offset, ptr+offset+size,
+							punch_lock);
 		if (need_resched()) {
 			shmem_swp_unmap(ptr);
 			if (*dir) {
@@ -482,6 +495,8 @@ static void shmem_truncate_range(struct 
 	int offset;
 	int freed;
 	int punch_hole;
+	spinlock_t *needs_lock;
+	spinlock_t *punch_lock;
 	unsigned long upper_limit;
 
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
@@ -495,6 +510,7 @@ static void shmem_truncate_range(struct 
 		limit = info->next_index;
 		upper_limit = SHMEM_MAX_INDEX;
 		info->next_index = idx;
+		needs_lock = NULL;
 		punch_hole = 0;
 	} else {
 		if (end + 1 >= inode->i_size) {	/* we may free a little more */
@@ -505,6 +521,7 @@ static void shmem_truncate_range(struct 
 			limit = (end + 1) >> PAGE_CACHE_SHIFT;
 			upper_limit = limit;
 		}
+		needs_lock = &info->lock;
 		punch_hole = 1;
 	}
 
@@ -521,7 +538,7 @@ static void shmem_truncate_range(struct 
 		size = limit;
 		if (size > SHMEM_NR_DIRECT)
 			size = SHMEM_NR_DIRECT;
-		nr_swaps_freed = shmem_free_swp(ptr+idx, ptr+size);
+		nr_swaps_freed = shmem_free_swp(ptr+idx, ptr+size, needs_lock);
 	}
 
 	/*
@@ -531,6 +548,19 @@ static void shmem_truncate_range(struct 
 	if (!topdir || limit <= SHMEM_NR_DIRECT)
 		goto done2;
 
+	/*
+	 * The truncation case has already dropped info->lock, and we're safe
+	 * because i_size and next_index have already been lowered, preventing
+	 * access beyond.  But in the punch_hole case, we still need to take
+	 * the lock when updating the swap directory, because there might be
+	 * racing accesses by shmem_getpage(SGP_CACHE), shmem_unuse_inode or
+	 * shmem_writepage.  However, whenever we find we can remove a whole
+	 * directory page (not at the misaligned start or end of the range),
+	 * we first NULLify its pointer in the level above, and then have no
+	 * need to take the lock when updating its contents: needs_lock and
+	 * punch_lock (either pointing to info->lock or NULL) manage this.
+	 */
+
 	upper_limit -= SHMEM_NR_DIRECT;
 	limit -= SHMEM_NR_DIRECT;
 	idx = (idx > SHMEM_NR_DIRECT)? (idx - SHMEM_NR_DIRECT): 0;
@@ -552,7 +582,13 @@ static void shmem_truncate_range(struct 
 			diroff = ((idx - ENTRIES_PER_PAGEPAGE/2) %
 				ENTRIES_PER_PAGEPAGE) / ENTRIES_PER_PAGE;
 			if (!diroff && !offset && upper_limit >= stage) {
-				*dir = NULL;
+				if (needs_lock) {
+					spin_lock(needs_lock);
+					*dir = NULL;
+					spin_unlock(needs_lock);
+					needs_lock = NULL;
+				} else
+					*dir = NULL;
 				nr_pages_to_free++;
 				list_add(&middir->lru, &pages_to_free);
 			}
@@ -578,8 +614,16 @@ static void shmem_truncate_range(struct 
 			}
 			stage = idx + ENTRIES_PER_PAGEPAGE;
 			middir = *dir;
+			if (punch_hole)
+				needs_lock = &info->lock;
 			if (upper_limit >= stage) {
-				*dir = NULL;
+				if (needs_lock) {
+					spin_lock(needs_lock);
+					*dir = NULL;
+					spin_unlock(needs_lock);
+					needs_lock = NULL;
+				} else
+					*dir = NULL;
 				nr_pages_to_free++;
 				list_add(&middir->lru, &pages_to_free);
 			}
@@ -588,31 +632,37 @@ static void shmem_truncate_range(struct 
 			dir = shmem_dir_map(middir);
 			diroff = 0;
 		}
+		punch_lock = needs_lock;
 		subdir = dir[diroff];
-		if (subdir && page_private(subdir)) {
+		if (subdir && !offset && upper_limit-idx >= ENTRIES_PER_PAGE) {
+			if (needs_lock) {
+				spin_lock(needs_lock);
+				dir[diroff] = NULL;
+				spin_unlock(needs_lock);
+				punch_lock = NULL;
+			} else
+				dir[diroff] = NULL;
+			nr_pages_to_free++;
+			list_add(&subdir->lru, &pages_to_free);
+		}
+		if (subdir && page_private(subdir) /* has swap entries */) {
 			size = limit - idx;
 			if (size > ENTRIES_PER_PAGE)
 				size = ENTRIES_PER_PAGE;
 			freed = shmem_map_and_free_swp(subdir,
-						offset, size, &dir);
+					offset, size, &dir, punch_lock);
 			if (!dir)
 				dir = shmem_dir_map(middir);
 			nr_swaps_freed += freed;
-			if (offset)
+			if (offset || punch_lock) {
 				spin_lock(&info->lock);
-			set_page_private(subdir, page_private(subdir) - freed);
-			if (offset)
+				set_page_private(subdir,
+					page_private(subdir) - freed);
 				spin_unlock(&info->lock);
-			if (!punch_hole)
-				BUG_ON(page_private(subdir) > offset);
-		}
-		if (offset)
-			offset = 0;
-		else if (subdir && upper_limit - idx >= ENTRIES_PER_PAGE) {
-			dir[diroff] = NULL;
-			nr_pages_to_free++;
-			list_add(&subdir->lru, &pages_to_free);
+			} else
+				BUG_ON(page_private(subdir) != freed);
 		}
+		offset = 0;
 	}
 done1:
 	shmem_dir_unmap(dir);

-- 

  parent reply	other threads:[~2007-04-26 16:59 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20070426165111.393445007@mini.kroah.org>
2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
2007-04-26 16:48   ` David Lang
2007-04-26 17:30     ` Greg KH
2007-04-26 17:45       ` [stable] " Chris Wright
2007-04-26 16:54   ` [patch 01/33] knfsd: Use a spinlock to protect sk_info_authunix Greg KH
2007-04-26 16:55   ` [patch 02/33] IB/mthca: Fix data corruption after FMR unmap on Sinai Greg KH
2007-04-26 16:55   ` [patch 03/33] HID: zeroing of bytes in output fields is bogus Greg KH
2007-04-26 16:55   ` [patch 04/33] KVM: MMU: Fix guest writes to nonpae pde Greg KH
2007-04-26 16:55   ` [patch 05/33] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram Greg KH
2007-04-26 16:55   ` [patch 06/33] holepunch: fix shmem_truncate_range punching too far Greg KH
2007-04-26 16:55   ` Greg KH [this message]
2007-04-26 16:55   ` [patch 08/33] holepunch: fix disconnected pages after second truncate Greg KH
2007-04-26 16:55   ` [patch 09/33] holepunch: fix mmap_sem i_mutex deadlock Greg KH
2007-04-26 16:55   ` [patch 10/33] Fix sparc64 SBUS IOMMU allocator Greg KH
2007-04-26 16:55   ` [patch 11/33] Fix qlogicpti DMA unmapping Greg KH
2007-04-26 16:55   ` [patch 12/33] Fix compat sys_ipc() on sparc64 Greg KH
2007-04-26 16:55   ` [patch 13/33] Fix bogus inline directive in sparc64 PCI code Greg KH
2007-04-26 16:55   ` [patch 14/33] Fix errors in tcp_memcalculations Greg KH
2007-04-26 16:56   ` [patch 15/33] Fix netpoll UDP input path Greg KH
2007-04-26 16:56   ` [patch 16/33] Fix IRDA oopser Greg KH
2007-04-26 16:56   ` [patch 17/33] cache_k8_northbridges() overflows beyond allocation Greg KH
2007-04-26 16:56   ` [patch 18/33] exec.c: fix coredump to pipe problem and obscure "security hole" Greg KH
2007-04-26 16:56   ` [patch 19/33] NFS: Fix an Oops in nfs_setattr() Greg KH
2007-04-26 16:56   ` [patch 20/33] x86: Dont probe for DDC on VBE1.2 Greg KH
2007-04-26 16:56   ` [patch 21/33] vt: fix potential race in VT_WAITACTIVE handler Greg KH
2007-04-26 16:56   ` [patch 22/33] 3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling Greg KH
2007-04-26 16:56   ` [patch 23/33] fix bogon in /dev/mem mmaping on nommu Greg KH
2007-04-26 16:56   ` [patch 24/33] fix OOM killing processes wrongly thought MPOL_BIND Greg KH
2007-04-26 16:56   ` [patch 25/33] Fix possible NULL pointer access in 8250 serial driver Greg KH
2007-04-26 16:56   ` [patch 26/33] page migration: fix NR_FILE_PAGES accounting Greg KH
2007-04-26 16:57   ` [patch 27/33] Taskstats fix the structure members alignment issue Greg KH
2007-04-26 16:57   ` [patch 28/33] reiserfs: fix xattr root locking/refcount bug Greg KH
2007-04-26 16:57   ` [patch 29/33] hwmon/w83627ehf: Fix the fan5 clock divider write Greg KH
2007-04-26 16:57   ` [patch 30/33] ALSA: intel8x0 - Fix speaker output after S2RAM Greg KH
2007-04-26 16:57   ` [patch 31/33] AGPGART: intel_agp: fix G965 GTT size detect Greg KH
2007-04-26 16:57   ` [patch 32/33] cfq-iosched: fix alias + front merge bug Greg KH
2007-04-26 16:57   ` [patch 33/33] Revert "adjust legacy IDE resource setting (v2)" Greg KH
2007-04-26 17:01   ` [patch 00/33] 2.6.20-stable review Greg KH
2007-04-26 20:29   ` Chuck Ebbert
2007-04-27 10:15   ` Wu, Bryan
2007-04-27 11:05     ` Jesper Juhl
2007-04-27 13:47       ` Chuck Ebbert
2007-04-27 15:13     ` Greg KH
2007-04-28  4:21       ` Bryan WU
2007-04-28  5:48         ` Greg KH
2007-04-28  6:46           ` Bryan WU
2007-04-28  7:01             ` Greg KH
2007-04-28 16:24             ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070426165532.GH1898@kroah.com \
    --to=gregkh@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=cebbert@redhat.com \
    --cc=chuckw@quantumlinux.com \
    --cc=davej@redhat.com \
    --cc=hugh@veritas.com \
    --cc=jmforbes@linuxtx.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mkrufky@linuxtv.org \
    --cc=mszeredi@suse.cz \
    --cc=rdunlap@xenotime.net \
    --cc=reviews@ml.cw.f00f.org \
    --cc=stable@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=zwane@arm.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox