public inbox for linux-kernel@vger.kernel.org
From: Rusty Russell <rusty@rustcorp.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@osdl.org>
Subject: Re: [PATCH 2/2] Futex non-page-pinning fix
Date: Thu, 28 Aug 2003 18:03:26 +1000
Message-ID: <20030828080403.F3DF52C899@lists.samba.org>
In-Reply-To: Your message of "Wed, 27 Aug 2003 09:37:25 +0100." <Pine.LNX.4.44.0308270846580.2063-100000@localhost.localdomain>

In message <Pine.LNX.4.44.0308270846580.2063-100000@localhost.localdomain> you write:
> But I do not understand how futex_wake can still be doing a
> this->page == page test: its __pin_page will ensure that some
> page is faulted in, but that's not necessarily the same page
> as in this->page.

Yeah, my complete screwup.  See below, with new routine
page_matches().

> I strongly agree with Andrew that add_to_swap and
> delete_from_swap_cache (probably the one without the __s)
> are the places for switching the anonymous, not page_io.c:
> page->mapping will be set and unset in those, and it's
> page->mapping that you're keying off in hash_futex.

Agreed, it's simplest, at least, to call futex_rehash every time
->mapping changes.  But that's a lot of calls, so we'll need that futex
page->flags optimization.
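
For reference, the page->flags optimization could be modelled in
userspace roughly as below.  This is only a sketch under assumptions:
MODEL_PG_FUTEX, struct page_model and the nr_futexes counter are made
up for illustration and stand in for a real page flag and for the walk
over the futex hash table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; not a real PG_* name from the kernel. */
#define MODEL_PG_FUTEX 0x1UL

/* Illustrative stand-in for struct page plus the futexes queued on it. */
struct page_model {
	unsigned long flags;
	int nr_futexes;
};

/* Queueing a futex marks the page so later rehash calls look at it. */
static void model_queue_futex(struct page_model *page)
{
	page->flags |= MODEL_PG_FUTEX;
	page->nr_futexes++;
}

/* Unqueueing leaves the flag alone; it is cleared lazily by rehash. */
static void model_unqueue_futex(struct page_model *page)
{
	assert(page->nr_futexes > 0);
	page->nr_futexes--;
}

/*
 * Returns true if the (expensive) table walk was needed.  Pages that
 * never held a futex take the cheap path; a walk that finds nothing
 * clears the flag so the next call is cheap again.
 */
static bool model_rehash(struct page_model *page)
{
	if (!(page->flags & MODEL_PG_FUTEX))
		return false;
	/* The walk over every hash bucket would happen here. */
	if (page->nr_futexes == 0)
		page->flags &= ~MODEL_PG_FUTEX;
	return true;
}
```

The point of clearing lazily is that the common case (a page which
never had a futex in it) costs one flag test per ->mapping change.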

I've included 3 patches and my test code.  This includes debugging
output, so it's not a final patch.

The test code sets up 3 futexes (one in a file-backed mmap, one in a
malloc, and one in a shared memory segment), then runs through <N> MB
forcing things to swap.  When you see the kernel message saying a
futex has been swapped out, press Enter and it will wake the futexes.
A successful run looks like (it can take a few attempts before the
futex gets swapped out, depending on the phase of the moon):

  test:~# swapon /swap
  Adding 14992k swap on /swap.  Priority:-1 extents:82
  test:~# ./test-swap2 36
  MMAP=8, SHM=16, ANON=24, beginning chill...
  .
  futex_rehash 13370: offset 24 00000000 -> 902bd100
  
  Exiting...
  Waking up 0x30012008
  Waking up 0x804c018
  Waking up 0x30013010
  test:~# 

Thanks for feedback,
Rusty
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Minor futex comment tweaks and cleanups
Author: Rusty Russell
Status: Booted on 2.6.0-test4-bk2

D: Changes:
D: 
D: (1) don't return 0 from futex_wait if we are somehow
D:     spuriously woken up, loop in that case.
D: 
D: (2) remove bogus comment about address no longer being in this
D:     address space: we hold the mm lock, and __pin_page succeeded, so it
D:     can't be true, 
D: 
D: (3) remove bogus comment about "get_user might fault and schedule", 
D: 
D: (4) clarify comment about hashing: we hash address of struct page,
D:     not page itself,
D: 
D: (5) remove list_empty check: we still hold the lock, so it can
D:     never happen, and
D: 
D: (6) single error exit path, and move __queue_me to the end (order
D:     doesn't matter since we're inside the futex lock).
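
The return-value ordering after the sleep in futex_wait can be
sketched as a small standalone function.  This is a model, not kernel
code: classify_wakeup() and enum wait_outcome are hypothetical names,
and still_queued stands for the result of unqueue_me() (nonzero if we
had to remove ourselves, zero if a waker already did).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result of one pass through the sleep. */
enum wait_outcome { WAIT_DONE, WAIT_RETRY };

/*
 * A real wakeup (someone took us off the queue) always wins and
 * reports success.  Otherwise timeout is checked before signals, and
 * anything else is treated as spurious, so the caller loops.
 */
static enum wait_outcome classify_wakeup(bool still_queued,
					 unsigned long time_left,
					 bool signal_pending,
					 int *ret)
{
	if (!still_queued) {
		*ret = 0;
		return WAIT_DONE;
	}
	if (time_left == 0) {
		*ret = -ETIMEDOUT;
		return WAIT_DONE;
	}
	if (signal_pending) {
		*ret = -EINTR;
		return WAIT_DONE;
	}
	return WAIT_RETRY;
}
```

Note that a wakeup racing with a timeout or signal still returns 0,
which is what change (1) is after.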

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9416-linux-2.6.0-test4-bk2/kernel/futex.c .9416-linux-2.6.0-test4-bk2.updated/kernel/futex.c
--- .9416-linux-2.6.0-test4-bk2/kernel/futex.c	2003-05-27 15:02:23.000000000 +1000
+++ .9416-linux-2.6.0-test4-bk2.updated/kernel/futex.c	2003-08-27 12:34:53.000000000 +1000
@@ -84,9 +84,7 @@ static inline void unlock_futex_mm(void)
 	spin_unlock(&current->mm->page_table_lock);
 }
 
-/*
- * The physical page is shared, so we can hash on its address:
- */
+/* The struct page is shared, so we can hash on its address. */
 static inline struct list_head *hash_futex(struct page *page, int offset)
 {
 	return &futex_queues[hash_long((unsigned long)page + offset,
@@ -311,67 +309,68 @@ static inline int futex_wait(unsigned lo
 		      int val,
 		      unsigned long time)
 {
-	DECLARE_WAITQUEUE(wait, current);
-	int ret = 0, curval;
+	wait_queue_t wait;
+	int ret, curval;
 	struct page *page;
 	struct futex_q q;
 
+again:
 	init_waitqueue_head(&q.waiters);
+	init_waitqueue_entry(&wait, current);
 
 	lock_futex_mm();
 
 	page = __pin_page(uaddr - offset);
 	if (!page) {
-		unlock_futex_mm();
-		return -EFAULT;
+		ret = -EFAULT;
+		goto unlock;
 	}
-	__queue_me(&q, page, uaddr, offset, -1, NULL);
 
 	/*
-	 * Page is pinned, but may no longer be in this address space.
+	 * Page is pinned, but may be a kernel address.
 	 * It cannot schedule, so we access it with the spinlock held.
 	 */
 	if (get_user(curval, (int *)uaddr) != 0) {
-		unlock_futex_mm();
 		ret = -EFAULT;
-		goto out;
+		goto unlock;
 	}
+
 	if (curval != val) {
-		unlock_futex_mm();
 		ret = -EWOULDBLOCK;
-		goto out;
+		goto unlock;
 	}
-	/*
-	 * The get_user() above might fault and schedule so we
-	 * cannot just set TASK_INTERRUPTIBLE state when queueing
-	 * ourselves into the futex hash. This code thus has to
-	 * rely on the futex_wake() code doing a wakeup after removing
-	 * the waiter from the list.
-	 */
+
+	__queue_me(&q, page, uaddr, offset, -1, NULL);
 	add_wait_queue(&q.waiters, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
-	if (!list_empty(&q.list)) {
-		unlock_futex_mm();
-		time = schedule_timeout(time);
-	}
+	unlock_futex_mm();
+
+	time = schedule_timeout(time);
+
 	set_current_state(TASK_RUNNING);
 	/*
 	 * NOTE: we don't remove ourselves from the waitqueue because
 	 * we are the only user of it.
 	 */
-	if (time == 0) {
-		ret = -ETIMEDOUT;
-		goto out;
-	}
-	if (signal_pending(current))
-		ret = -EINTR;
-out:
-	/* Were we woken up anyway? */
+	put_page(q.page);
+
+	/* Were we woken up (and removed from queue)?  Always return
+	 * success when this happens. */
 	if (!unqueue_me(&q))
 		ret = 0;
-	put_page(q.page);
+	else if (time == 0)
+		ret = -ETIMEDOUT;
+	else if (signal_pending(current))
+		ret = -EINTR;
+	else
+		/* Spurious wakeup somehow.  Loop. */
+		goto again;
 
 	return ret;
+
+unlock:
+	unlock_futex_mm();
+	return ret;
 }
 
 static int futex_close(struct inode *inode, struct file *filp)


Name: Allow Futex Rehashing In Interrupt
Author: Rusty Russell
Status: Tested on 2.6.0-test4-bk2
Depends: Misc/futex-minor-tweaks.patch.gz
Depends: Misc/qemu-page-offset.patch.gz

D: This patch simply uses spin_lock_irq() instead of spin_lock for the
D: futex lock, in preparation for the futex_rehash patch which needs to
D: operate on the futex hash table in IRQ context.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5392-linux-2.6.0-test4-bk2/kernel/futex.c .5392-linux-2.6.0-test4-bk2.updated/kernel/futex.c
--- .5392-linux-2.6.0-test4-bk2/kernel/futex.c	2003-08-28 17:51:06.000000000 +1000
+++ .5392-linux-2.6.0-test4-bk2.updated/kernel/futex.c	2003-08-28 17:51:37.000000000 +1000
@@ -74,12 +74,12 @@ static inline void lock_futex_mm(void)
 {
 	spin_lock(&current->mm->page_table_lock);
 	spin_lock(&vcache_lock);
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 }
 
 static inline void unlock_futex_mm(void)
 {
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 	spin_unlock(&vcache_lock);
 	spin_unlock(&current->mm->page_table_lock);
 }
@@ -196,7 +196,7 @@ static void futex_vcache_callback(vcache
 	struct futex_q *q = container_of(vcache, struct futex_q, vcache);
 	struct list_head *head = hash_futex(new_page, q->offset);
 
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 
 	if (!list_empty(&q->list)) {
 		put_page(q->page);
@@ -206,7 +206,7 @@ static void futex_vcache_callback(vcache
 		list_add_tail(&q->list, head);
 	}
 
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 }
 
 /*
@@ -293,13 +293,13 @@ static inline int unqueue_me(struct fute
 	int ret = 0;
 
 	spin_lock(&vcache_lock);
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 	if (!list_empty(&q->list)) {
 		list_del(&q->list);
 		__detach_vcache(&q->vcache);
 		ret = 1;
 	}
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 	spin_unlock(&vcache_lock);
 	return ret;
 }
@@ -391,10 +391,10 @@ static unsigned int futex_poll(struct fi
 	int ret = 0;
 
 	poll_wait(filp, &q->waiters, wait);
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 	if (list_empty(&q->list))
 		ret = POLLIN | POLLRDNORM;
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 
 	return ret;
 }


Name: Futexes without pinning pages
Author: Rusty Russell
Status: Tested on 2.6.0-test4-bk2
Depends: Misc/futex-irqsave.patch.gz
Depends: Misc/qemu-page-offset.patch.gz

D: TODO: Fix compilation with CONFIG_FUTEX=n.
D: 
D: Avoid pinning pages with futexes in them, to resolve FUTEX_FD DoS.
D: This means we need another way to uniquely identify pages, rather
D: than simply comparing the "struct page *" (and using a hash table
D: based on the "struct page *").  For file-backed pages, the
D: page->mapping and page->index provide a unique identifier, which is
D: persistent even if they get swapped out to the file and back.
D: There are cases where the mapping changes: for these we need a
D: callback to rehash all futexes in that page.
D: 
D: For anonymous pages, page->mapping == NULL.  So for this case we
D: use the "struct page *" itself to hash and compare, because the
D: mapping will be set (to &swapper_space) if the page is moved to the
D: swap cache, and will be rehashed, and we can find it again.
D: 
D: The current futex_rehash() call walks the entire hash table, which
D: is slow.  The simplest optimization is to have a page->flags bit
D: which indicates a futex has been placed in this page: we can clear
D: it if futex_rehash() finds no futexes.  This will come in a
D: followup patch.
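
The keying scheme described above can be tried out in userspace with
the model below.  Everything here is an assumption for illustration:
the *_model types and functions are hypothetical, and model_bucket()
uses a plain multiplicative hash as a stand-in for the kernel's
hash_long().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HASHBITS 8

/* Hypothetical stand-ins for address_space, page and futex_q. */
struct space_model { int id; };
struct page_model { struct space_model *mapping; unsigned long index; };
struct waiter_model {
	struct space_model *mapping;	/* key if non-NULL */
	unsigned long index;
	const struct page_model *page;	/* key if mapping is NULL */
	int offset;
};

/* Same rule as page_matches(): (mapping, index) if set, else pointer. */
static int model_page_matches(const struct page_model *page,
			      const struct waiter_model *w)
{
	if (w->mapping)
		return page->mapping == w->mapping && page->index == w->index;
	return page == w->page;
}

/* Bucket choice; the multiplier is a stand-in, not hash_long(). */
static unsigned int model_bucket(const struct space_model *mapping,
				 unsigned long index,
				 const struct page_model *page, int offset)
{
	uintptr_t hashin = mapping ? (uintptr_t)mapping + index
				   : (uintptr_t)page;
	uint32_t k = (uint32_t)(hashin + (uintptr_t)offset);

	return (k * 2654435761u) >> (32 - MODEL_HASHBITS);
}

/* Record the page's current identity in the waiter. */
static void model_queue(struct waiter_model *w,
			const struct page_model *page, int offset)
{
	w->mapping = page->mapping;
	w->index = page->index;
	w->page = page;
	w->offset = offset;
}

/*
 * Re-key one waiter.  Must be called before page->mapping is updated,
 * since matching uses the old identity.  Returns 1 if re-keyed.
 */
static int model_rehash_waiter(struct waiter_model *w,
			       const struct page_model *page,
			       struct space_model *new_mapping,
			       unsigned long new_index)
{
	if (!model_page_matches(page, w))
		return 0;
	w->mapping = new_mapping;
	w->index = new_index;
	w->page = page;
	return 1;
}
```

With this, a waiter queued on an anonymous page is re-keyed once when
the page gets a mapping, and from then on any page carrying the same
mapping and index matches it, whichever "struct page *" that is.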

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/futex.h .5947-2.6.0-test4-bk2-futex-swap/include/linux/futex.h
--- .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/futex.h	2003-05-27 15:02:21.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/include/linux/futex.h	2003-08-28 17:52:47.000000000 +1000
@@ -17,4 +17,10 @@ asmlinkage long sys_futex(u32 __user *ua
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2);
 
+/* To tell us when the page->mapping or page->index changes. */
+struct page;
+struct address_space;
+extern void futex_rehash(struct page *page,
+			 struct address_space *new_mapping,
+			 unsigned long new_index);
 #endif
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/pagemap.h .5947-2.6.0-test4-bk2-futex-swap/include/linux/pagemap.h
--- .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/pagemap.h	2003-08-25 11:58:34.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/include/linux/pagemap.h	2003-08-28 17:52:47.000000000 +1000
@@ -11,6 +11,7 @@
 #include <linux/pagemap.h>
 #include <asm/uaccess.h>
 #include <linux/gfp.h>
+#include <linux/futex.h>
 
 /*
  * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
@@ -143,6 +144,7 @@ static inline void ___add_to_page_cache(
 		struct address_space *mapping, unsigned long index)
 {
 	list_add(&page->list, &mapping->clean_pages);
+	futex_rehash(page, mapping, index);
 	page->mapping = mapping;
 	page->index = index;
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/kernel/futex.c .5947-2.6.0-test4-bk2-futex-swap/kernel/futex.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/kernel/futex.c	2003-08-28 17:52:46.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/kernel/futex.c	2003-08-28 17:52:47.000000000 +1000
@@ -35,6 +35,15 @@
 #include <linux/vcache.h>
 #include <linux/mount.h>
 
+/* Futexes need to have a way of identifying pages which are the same,
+   when they may be in different address spaces (ie. virtual address
+   might be different, eg. shared mmap).  We don't want to pin the
+   pages, so we use page->mapping & page->index where page->mapping is
+   not NULL (file-backed pages), and hash the page struct itself for
+   other pages.  Callbacks rehash pages when page->mapping is changed
+   or set (such as when anonymous pages enter the swap cache), so we
+   recognize them when they get swapped back in. */
+
 #define FUTEX_HASHBITS 8
 
 /*
@@ -45,7 +54,10 @@ struct futex_q {
 	struct list_head list;
 	wait_queue_head_t waiters;
 
-	/* Page struct and offset within it. */
+	/* These match if mapping != NULL */
+	struct address_space *mapping;
+	unsigned long index;
+	/* Otherwise, the page itself. */
 	struct page *page;
 	int offset;
 
@@ -57,7 +69,6 @@ struct futex_q {
 	struct file *filp;
 };
 
-/* The key for the hash is the address + index + offset within page */
 static struct list_head futex_queues[1<<FUTEX_HASHBITS];
 static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
 
@@ -84,11 +95,68 @@ static inline void unlock_futex_mm(void)
 	spin_unlock(&current->mm->page_table_lock);
 }
 
-/* The struct page is shared, so we can hash on its address. */
-static inline struct list_head *hash_futex(struct page *page, int offset)
+static inline int page_matches(struct page *page, struct futex_q *elem)
 {
-	return &futex_queues[hash_long((unsigned long)page + offset,
-							FUTEX_HASHBITS)];
+	if (elem->mapping)
+		return page->mapping == elem->mapping
+			&& page->index == elem->index;
+	return page == elem->page;
+}
+
+/* For pages which are file backed, we can simply hash by mapping and
+ * index, which is persistent as they get swapped out.  For anonymous
+ * regions, we hash by the actual struct page *: their mapping will
+ * change and they will be rehashed if they are swapped out.
+ */
+static inline struct list_head *hash_futex(struct address_space *mapping,
+					   unsigned long index,
+					   struct page *page,
+					   int offset)
+{
+	unsigned long hashin;
+	if (mapping)
+		hashin = (unsigned long)mapping + index;
+	else
+		hashin = (unsigned long)page;
+	return &futex_queues[hash_long(hashin+offset, FUTEX_HASHBITS)];
+}
+
+/* Called when we change page->mapping (or page->index).  Can be in
+ * interrupt context. */
+void futex_rehash(struct page *page,
+		  struct address_space *new_mapping, unsigned long new_index)
+{
+	unsigned int i;
+	unsigned long flags;
+	struct futex_q *q;
+	LIST_HEAD(gather);
+	static int rehash_count = 0;
+
+	spin_lock_irqsave(&futex_lock, flags);
+	rehash_count++;
+	for (i = 0; i < 1 << FUTEX_HASHBITS; i++) {
+		struct list_head *l, *next;
+		list_for_each_safe(l, next, &futex_queues[i]) {
+			q = list_entry(l, struct futex_q, list);
+			if (page_matches(page, q)) {
+				list_del(&q->list);
+				list_add(&q->list, &gather);
+				printk("futex_rehash %i: offset %i %p -> %p\n",
+				       rehash_count,
+				       q->offset, page->mapping, new_mapping);
+			}
+		}
+	}
+	while (!list_empty(&gather)) {
+		q = list_entry(gather.next, struct futex_q, list);
+		q->mapping = new_mapping;
+		q->index = new_index;
+		q->page = page;
+		list_del(&q->list);
+		list_add(&q->list,
+			 hash_futex(new_mapping, new_index, page, q->offset));
+	}
+	spin_unlock_irqrestore(&futex_lock, flags);
 }
 
 /*
@@ -162,12 +230,12 @@ static inline int futex_wake(unsigned lo
 		return -EFAULT;
 	}
 
-	head = hash_futex(page, offset);
+	head = hash_futex(page->mapping, page->index, page, offset);
 
 	list_for_each_safe(i, next, head) {
 		struct futex_q *this = list_entry(i, struct futex_q, list);
 
-		if (this->page == page && this->offset == offset) {
+		if (page_matches(page, this) && this->offset == offset) {
 			list_del_init(i);
 			__detach_vcache(&this->vcache);
 			wake_up_all(&this->waiters);
@@ -194,15 +262,16 @@ static inline int futex_wake(unsigned lo
 static void futex_vcache_callback(vcache_t *vcache, struct page *new_page)
 {
 	struct futex_q *q = container_of(vcache, struct futex_q, vcache);
-	struct list_head *head = hash_futex(new_page, q->offset);
+	struct list_head *head = hash_futex(new_page->mapping, new_page->index,
+					    new_page, q->offset);
 
 	spin_lock_irq(&futex_lock);
 
 	if (!list_empty(&q->list)) {
-		put_page(q->page);
-		q->page = new_page;
-		__pin_page_atomic(new_page);
 		list_del(&q->list);
+		q->mapping = new_page->mapping;
+		q->index = new_page->index;
+		q->page = new_page;
 		list_add_tail(&q->list, head);
 	}
 
@@ -229,13 +298,13 @@ static inline int futex_requeue(unsigned
 	if (!page2)
 		goto out;
 
-	head1 = hash_futex(page1, offset1);
-	head2 = hash_futex(page2, offset2);
+	head1 = hash_futex(page1->mapping, page1->index, page1, offset1);
+	head2 = hash_futex(page2->mapping, page2->index, page2, offset2);
 
 	list_for_each_safe(i, next, head1) {
 		struct futex_q *this = list_entry(i, struct futex_q, list);
 
-		if (this->page == page1 && this->offset == offset1) {
+		if (page_matches(page1, this) && this->offset == offset1) {
 			list_del_init(i);
 			__detach_vcache(&this->vcache);
 			if (++ret <= nr_wake) {
@@ -244,8 +313,6 @@ static inline int futex_requeue(unsigned
 					send_sigio(&this->filp->f_owner,
 							this->fd, POLL_IN);
 			} else {
-				put_page(this->page);
-				__pin_page_atomic (page2);
 				list_add_tail(i, head2);
 				__attach_vcache(&this->vcache, uaddr2,
 					current->mm, futex_vcache_callback);
@@ -272,11 +339,14 @@ static inline void __queue_me(struct fut
 				unsigned long uaddr, int offset,
 				int fd, struct file *filp)
 {
-	struct list_head *head = hash_futex(page, offset);
+	struct list_head *head
+		= hash_futex(page->mapping, page->index, page, offset);
 
 	q->offset = offset;
 	q->fd = fd;
 	q->filp = filp;
+	q->mapping = page->mapping;
+	q->index = page->index;
 	q->page = page;
 
 	list_add_tail(&q->list, head);
@@ -332,18 +402,19 @@ again:
 	 */
 	if (get_user(curval, (int *)uaddr) != 0) {
 		ret = -EFAULT;
-		goto unlock;
+		goto putpage;
 	}
 
 	if (curval != val) {
 		ret = -EWOULDBLOCK;
-		goto unlock;
+		goto putpage;
 	}
 
 	__queue_me(&q, page, uaddr, offset, -1, NULL);
 	add_wait_queue(&q.waiters, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
 	unlock_futex_mm();
+	put_page(page);
 
 	time = schedule_timeout(time);
 
@@ -352,7 +423,6 @@ again:
 	 * NOTE: we don't remove ourselves from the waitqueue because
 	 * we are the only user of it.
 	 */
-	put_page(q.page);
 
 	/* Were we woken up (and removed from queue)?  Always return
 	 * success when this happens. */
@@ -368,6 +438,8 @@ again:
 
 	return ret;
 
+putpage:
+	put_page(page);
 unlock:
 	unlock_futex_mm();
 	return ret;
@@ -378,7 +450,6 @@ static int futex_close(struct inode *ino
 	struct futex_q *q = filp->private_data;
 
 	unqueue_me(q);
-	put_page(q->page);
 	kfree(filp->private_data);
 	return 0;
 }
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/mm/filemap.c .5947-2.6.0-test4-bk2-futex-swap/mm/filemap.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/mm/filemap.c	2003-08-25 11:58:36.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/mm/filemap.c	2003-08-28 17:52:47.000000000 +1000
@@ -27,6 +27,7 @@
 #include <linux/pagevec.h>
 #include <linux/blkdev.h>
 #include <linux/security.h>
+#include <linux/futex.h>
 /*
  * This is needed for the following functions:
  *  - try_to_release_page
@@ -92,6 +93,7 @@ void __remove_from_page_cache(struct pag
 
 	radix_tree_delete(&mapping->page_tree, page->index);
 	list_del(&page->list);
+	futex_rehash(page, NULL, 0);
 	page->mapping = NULL;
 
 	mapping->nrpages--;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/mm/page_io.c .5947-2.6.0-test4-bk2-futex-swap/mm/page_io.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/mm/page_io.c	2003-02-07 19:18:58.000000000 +1100
+++ .5947-2.6.0-test4-bk2-futex-swap/mm/page_io.c	2003-08-28 17:52:47.000000000 +1000
@@ -19,6 +19,7 @@
 #include <linux/buffer_head.h>	/* for block_sync_page() */
 #include <linux/mpage.h>
 #include <linux/writeback.h>
+#include <linux/futex.h>
 #include <asm/pgtable.h>
 
 static struct bio *
@@ -151,6 +152,7 @@ int rw_swap_page_sync(int rw, swp_entry_
 	lock_page(page);
 
 	BUG_ON(page->mapping);
+	futex_rehash(page, &swapper_space, entry.val);
 	page->mapping = &swapper_space;
 	page->index = entry.val;
 
@@ -161,7 +163,10 @@ int rw_swap_page_sync(int rw, swp_entry_
 		ret = swap_writepage(page, &swap_wbc);
 		wait_on_page_writeback(page);
 	}
+	lock_page(page);
+	futex_rehash(page, NULL, 0);
 	page->mapping = NULL;
+	unlock_page(page);
 	if (ret == 0 && (!PageUptodate(page) || PageError(page)))
 		ret = -EIO;
 	return ret;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/mm/swap_state.c .5947-2.6.0-test4-bk2-futex-swap/mm/swap_state.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/mm/swap_state.c	2003-08-12 06:58:06.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/mm/swap_state.c	2003-08-28 17:52:47.000000000 +1000
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
+#include <linux/futex.h>
 
 #include <asm/pgtable.h>
 
@@ -200,6 +201,7 @@ int move_to_swap_cache(struct page *page
 
 	err = radix_tree_insert(&swapper_space.page_tree, entry.val, page);
 	if (!err) {
+		futex_rehash(page, &swapper_space, entry.val);
 		__remove_from_page_cache(page);
 		___add_to_page_cache(page, &swapper_space, entry.val);
 	}
@@ -236,6 +238,7 @@ int move_from_swap_cache(struct page *pa
 
 	err = radix_tree_insert(&mapping->page_tree, index, page);
 	if (!err) {
+		futex_rehash(page, mapping, index);
 		__delete_from_swap_cache(page);
 		___add_to_page_cache(page, mapping, index);
 	}
================
/* Test program for unpinned futexes. */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <sys/shm.h>
#include <sys/poll.h>
#include <sys/mman.h>
#include "usersem.h"

#define ARRAY_SIZE(arr) (sizeof(arr)/sizeof((arr)[0]))
#define streq(a,b) (strcmp((a),(b)) == 0)

#define UNTOUCHED_VALUE 7
#define TOUCHED_VALUE 100

/* Recognizable offsets within page so printks can easily identify the futex */
#define MMAP_OFFSET 8
#define SHMEM_OFFSET 16
#define ANON_OFFSET 24

static void chill(int size)
{
	char *mem;
	unsigned int i, numpages;
	unsigned int pagesize = getpagesize();

	numpages = size * 1024*1024 / pagesize;
	mem = malloc(numpages * pagesize);
	if (!mem) {
		fprintf(stderr, "Out of memory\n");
		exit(1);
	}

	for (;;) {
		for (i = 0; i < numpages; i++)
			*(mem + i*pagesize) = '\0';
		printf(".\n");
	}
}

static void *adjust_to_offset(void *addr, unsigned int off)
{
	unsigned int pagesize = getpagesize();

	while ((unsigned long)addr % pagesize != off)
		addr++;
	return addr;
}

/* Give an FD waiting on futex at this addr. */
static int fd_for_addr(void *addr)
{
	return sys_futex(addr, FUTEX_FD, 0, NULL);
}

static void *get_anon_page(void)
{
	return malloc(getpagesize());
}

static void *get_shared_memory(void)
{
	unsigned int pagesize = getpagesize();
	void *memory;
	static int shm;

	/* Owner read/write, so a non-root run can attach the segment. */
	shm = shmget(IPC_PRIVATE, pagesize, IPC_CREAT | 0600);
	if (shm < 0) {
		perror("shmget");
		return NULL;
	}
	memory = shmat(shm, NULL, 0);
	if (memory == (void *)-1) {
		perror("shmat");
		shmctl(shm, IPC_RMID, NULL);
		return NULL;
	}
	/* Delete when we exit. */
	shmctl(shm, IPC_RMID, NULL);
	return memory;
}

static void *get_mmap(void)
{
	unsigned int pagesize = getpagesize();
	void *memory;
	int fd;

	fd = open("/tmp/test-swap2", O_CREAT|O_EXCL|O_RDWR, 0600);
	if (fd < 0) {
		perror("Opening /tmp/test-swap2");
		return NULL;
	}
	unlink("/tmp/test-swap2");
	if (write(fd, malloc(pagesize), pagesize) != pagesize) {
		perror("Writing /tmp/test-swap2");
		return NULL;
	}
	memory = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (memory == MAP_FAILED) {
		perror("mmap /tmp/test-swap2");
		return NULL;
	}
	return memory;
}

/* Touching the page swaps it back in, which is slightly different
 * from waking on a page which is swapped out. */
static void wake_futex(int *futex, int touch, const char action[])
{
	if (touch)
		*futex = TOUCHED_VALUE;
	printf("Waking up %p\n", futex);
	if (sys_futex(futex, FUTEX_WAKE, 1, NULL) != 1) {
		perror(action);
		exit(1);
	}
}

int main(int argc, char *argv[])
{
	int *map, *anon, *shm, touch = 0;
	pid_t child;
	char c;
	struct pollfd pollarr[3];

	if (argc > 1 && streq(argv[1], "-t")) {
		touch = 1;
		argc--;
		argv++;
	}

	if (argc != 2) {
		fprintf(stderr, "Usage: test-swap2 [-t] <mb-to-thrash>\n"
			"   Where -t means touch before waking futexes\n");
		exit(1);
	}

	map = get_mmap();
	anon = get_anon_page();
	shm = get_shared_memory();
	if (!map || !anon || !shm)
		exit(1);

	/* Different offsets so we can recognize them in printks. */
	map = adjust_to_offset(map, MMAP_OFFSET);
	anon = adjust_to_offset(anon, ANON_OFFSET);
	shm = adjust_to_offset(shm, SHMEM_OFFSET);

	*map = *anon = *shm = UNTOUCHED_VALUE;

	pollarr[0].fd = fd_for_addr(map);
	pollarr[1].fd = fd_for_addr(anon);
	pollarr[2].fd = fd_for_addr(shm);
	pollarr[0].events = pollarr[1].events = pollarr[2].events = POLLIN;

	/* None should be ready yet */
	if (poll(pollarr, ARRAY_SIZE(pollarr), 0)) {
		perror("poll");
		exit(1);
	}

	printf("MMAP=%u, SHM=%u, ANON=%u, beginning chill...\n",
	       MMAP_OFFSET, SHMEM_OFFSET, ANON_OFFSET);

	child = fork();
	if (child < 0) {
		perror("fork");
		exit(1);
	}
	if (child == 0)
		chill(atoi(argv[1]));

	/* Wait until they hit <CR>. */
	if (read(0, &c, 1) != 1) {
		perror("read");
		exit(1);
	}

	kill(child, SIGTERM);
	printf("Exiting...\n");

	wake_futex(map, touch, "wake up mmap");
	wake_futex(anon, touch, "wake up anon");
	wake_futex(shm, touch, "wake up shm");

	if (poll(pollarr, ARRAY_SIZE(pollarr), -1) != 3) {
		perror("poll after wakeup");
		exit(1);
	}

	if (touch)
		c = TOUCHED_VALUE;
	else
		c = UNTOUCHED_VALUE;

	if (*map != c) {
		fprintf(stderr, "mmap didn't get value %u: it's %u\n",
			c, *map);
		exit(1);
	}
	if (*anon != c) {
		fprintf(stderr, "anon didn't get value %u: it's %u\n",
			c, *anon);
		exit(1);
	}
	if (*shm != c) {
		fprintf(stderr, "shm didn't get value %u: it's %u\n",
			c, *shm);
		exit(1);
	}
	return 0;
}
================
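
usersem.h is not included in this mail.  For anyone without it, a
minimal sys_futex() wrapper with the four-argument shape the test
program uses might look like the sketch below.  It assumes Linux and a
libc that provides syscall() and SYS_futex; the wrapper itself is an
assumption, not the contents of the original header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Thin wrapper over the raw futex system call.  The second address
 * and third value are unused by the operations the test program
 * issues, so they are passed as NULL and 0.  Returns -1 and sets
 * errno on failure, like any syscall() user.
 */
static long sys_futex(int *uaddr, int op, int val,
		      const struct timespec *timeout)
{
	return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}
```

FUTEX_WAKE on a word nobody waits on returns 0, and FUTEX_WAIT with a
value that doesn't match the word fails with EAGAIN rather than
sleeping.  Note that FUTEX_FD, which the test program relies on, was
removed in Linux 2.6.26, so the program only runs on kernels of
roughly this vintage.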
