All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@osdl.org>
Subject: Re: [PATCH 2/2] Futex non-page-pinning fix
Date: Thu, 28 Aug 2003 18:03:26 +1000	[thread overview]
Message-ID: <20030828080403.F3DF52C899@lists.samba.org> (raw)
In-Reply-To: Your message of "Wed, 27 Aug 2003 09:37:25 +0100." <Pine.LNX.4.44.0308270846580.2063-100000@localhost.localdomain>

In message <Pine.LNX.4.44.0308270846580.2063-100000@localhost.localdomain> you write:
> But I do not understand how futex_wake can still be doing a
> this->page == page test: its __pin_page will ensure that some
> page is faulted in, but that's not necessarily the same page
> as in this->page.

Yeah, my complete screwup.  See below, with new routine
page_matches().

> I strongly agree with Andrew that add_to_swap and
> delete_from_swap_cache (probably the one without the __s)
> are the places for switching the anonymous, not page_io.c:
> page->mapping will be set and unset in those, and it's
> page->mapping that you're keying off in hash_futex.

Agreed, it's simplest, at least, to call futex_rehash every time
->mapping changes.  But that's alot, so we'll need that futex
page->flags optimization.

I've included 3 patches and my test code.  This includes debugging
output, so it's not a final patch.

The test code sets up 3 futexes (one in a file-backed mmap, one in a
malloc, and one in a shared memory segment), then runs through <N> MB
forcing things to swap.  When you see the kernel message saying a
futex has been swapped out, press Enter and it will wake the futexes.
A successful run looks like (it can take a few attempts before the
futex gets swapped out, depending on the phase of the moon):

  test:~# swapon /swap
  Adding 14992k swap on /swap.  Priority:-1 extents:82
  test:~# ./test-swap2 36
  MMAP=8, SHM=16, ANON=24, beginning chill...
  .
  futex_rehash 13370: offset 24 00000000 -> 902bd100
  
  Exiting...
  Waking up 0x30012008
  Waking up 0x804c018
  Waking up 0x30013010
  test:~# 

Thanks for feedback,
Rusty
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Minor futex comment tweaks and cleanups
Author: Rusty Russell
Status: Booted on 2.6.0-test4-bk2

D: Changes:
D: 
D: (1) don't return 0 from futex_wait if we are somehow
D:     spuriously woken up, loop in that case.
D: 
D: (2) remove bogus comment about address no longer being in this
D:     address space: we hold the mm lock, and __pin_page succeeded, so it
D:     can't be true, 
D: 
D: (3) remove bogus comment about "get_user might fault and schedule", 
D: 
D: (4) clarify comment about hashing: we hash address of struct page,
D:     not page itself,
D: 
D: (4) remove list_empty check: we still hold the lock, so it can
D:     never happen, and
D: 
D: (5) single error exit path, and move __queue_me to the end (order
D:     doesn't matter since we're inside the futex lock).

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9416-linux-2.6.0-test4-bk2/kernel/futex.c .9416-linux-2.6.0-test4-bk2.updated/kernel/futex.c
--- .9416-linux-2.6.0-test4-bk2/kernel/futex.c	2003-05-27 15:02:23.000000000 +1000
+++ .9416-linux-2.6.0-test4-bk2.updated/kernel/futex.c	2003-08-27 12:34:53.000000000 +1000
@@ -84,9 +84,7 @@ static inline void unlock_futex_mm(void)
 	spin_unlock(&current->mm->page_table_lock);
 }
 
-/*
- * The physical page is shared, so we can hash on its address:
- */
+/* The struct page is shared, so we can hash on its address. */
 static inline struct list_head *hash_futex(struct page *page, int offset)
 {
 	return &futex_queues[hash_long((unsigned long)page + offset,
@@ -311,67 +309,68 @@ static inline int futex_wait(unsigned lo
 		      int val,
 		      unsigned long time)
 {
-	DECLARE_WAITQUEUE(wait, current);
-	int ret = 0, curval;
+	wait_queue_t wait;
+	int ret, curval;
 	struct page *page;
 	struct futex_q q;
 
+again:
 	init_waitqueue_head(&q.waiters);
+	init_waitqueue_entry(&wait, current);
 
 	lock_futex_mm();
 
 	page = __pin_page(uaddr - offset);
 	if (!page) {
-		unlock_futex_mm();
-		return -EFAULT;
+		ret = -EFAULT;
+		goto unlock;
 	}
-	__queue_me(&q, page, uaddr, offset, -1, NULL);
 
 	/*
-	 * Page is pinned, but may no longer be in this address space.
+	 * Page is pinned, but may be a kernel address.
 	 * It cannot schedule, so we access it with the spinlock held.
 	 */
 	if (get_user(curval, (int *)uaddr) != 0) {
-		unlock_futex_mm();
 		ret = -EFAULT;
-		goto out;
+		goto unlock;
 	}
+
 	if (curval != val) {
-		unlock_futex_mm();
 		ret = -EWOULDBLOCK;
-		goto out;
+		goto unlock;
 	}
-	/*
-	 * The get_user() above might fault and schedule so we
-	 * cannot just set TASK_INTERRUPTIBLE state when queueing
-	 * ourselves into the futex hash. This code thus has to
-	 * rely on the futex_wake() code doing a wakeup after removing
-	 * the waiter from the list.
-	 */
+
+	__queue_me(&q, page, uaddr, offset, -1, NULL);
 	add_wait_queue(&q.waiters, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
-	if (!list_empty(&q.list)) {
-		unlock_futex_mm();
-		time = schedule_timeout(time);
-	}
+	unlock_futex_mm();
+
+	time = schedule_timeout(time);
+
 	set_current_state(TASK_RUNNING);
 	/*
 	 * NOTE: we don't remove ourselves from the waitqueue because
 	 * we are the only user of it.
 	 */
-	if (time == 0) {
-		ret = -ETIMEDOUT;
-		goto out;
-	}
-	if (signal_pending(current))
-		ret = -EINTR;
-out:
-	/* Were we woken up anyway? */
+	put_page(q.page);
+
+	/* Were we woken up (and removed from queue)?  Always return
+	 * success when this happens. */
 	if (!unqueue_me(&q))
 		ret = 0;
-	put_page(q.page);
+	else if (time == 0)
+		ret = -ETIMEDOUT;
+	else if (signal_pending(current))
+		ret = -EINTR;
+	else
+		/* Spurious wakeup somehow.  Loop. */
+		goto again;
 
 	return ret;
+
+unlock:
+	unlock_futex_mm();
+	return ret;
 }
 
 static int futex_close(struct inode *inode, struct file *filp)


Name: Allow Futex Rehashing In Interrupt
Author: Rusty Russell
Status: Tested on 2.6.0-test4-bk2
Depends: Misc/futex-minor-tweaks.patch.gz
Depends: Misc/qemu-page-offset.patch.gz

D: This patch simply uses spin_lock_irq() instead of spin_lock for the
D: futex lock, in preparation for the futex_rehash patch which needs to
D: operate on the futex hash table in IRQ context.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5392-linux-2.6.0-test4-bk2/kernel/futex.c .5392-linux-2.6.0-test4-bk2.updated/kernel/futex.c
--- .5392-linux-2.6.0-test4-bk2/kernel/futex.c	2003-08-28 17:51:06.000000000 +1000
+++ .5392-linux-2.6.0-test4-bk2.updated/kernel/futex.c	2003-08-28 17:51:37.000000000 +1000
@@ -74,12 +74,12 @@ static inline void lock_futex_mm(void)
 {
 	spin_lock(&current->mm->page_table_lock);
 	spin_lock(&vcache_lock);
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 }
 
 static inline void unlock_futex_mm(void)
 {
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 	spin_unlock(&vcache_lock);
 	spin_unlock(&current->mm->page_table_lock);
 }
@@ -196,7 +196,7 @@ static void futex_vcache_callback(vcache
 	struct futex_q *q = container_of(vcache, struct futex_q, vcache);
 	struct list_head *head = hash_futex(new_page, q->offset);
 
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 
 	if (!list_empty(&q->list)) {
 		put_page(q->page);
@@ -206,7 +206,7 @@ static void futex_vcache_callback(vcache
 		list_add_tail(&q->list, head);
 	}
 
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 }
 
 /*
@@ -293,13 +293,13 @@ static inline int unqueue_me(struct fute
 	int ret = 0;
 
 	spin_lock(&vcache_lock);
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 	if (!list_empty(&q->list)) {
 		list_del(&q->list);
 		__detach_vcache(&q->vcache);
 		ret = 1;
 	}
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 	spin_unlock(&vcache_lock);
 	return ret;
 }
@@ -391,10 +391,10 @@ static unsigned int futex_poll(struct fi
 	int ret = 0;
 
 	poll_wait(filp, &q->waiters, wait);
-	spin_lock(&futex_lock);
+	spin_lock_irq(&futex_lock);
 	if (list_empty(&q->list))
 		ret = POLLIN | POLLRDNORM;
-	spin_unlock(&futex_lock);
+	spin_unlock_irq(&futex_lock);
 
 	return ret;
 }


Name: Futexes without pinning pages
Author: Rusty Russell
Status: Tested on 2.6.0-test4-bk2
Depends: Misc/futex-irqsave.patch.gz
Depends: Misc/qemu-page-offset.patch.gz

D: TODO: Fix compilation with CONFIG_FUTEX=n.
D: 
D: Avoid pinning pages with futexes in them, to resolve FUTEX_FD DoS.
D: This means we need another way to uniquely identify pages, rather
D: than simply comparing the "struct page *" (and using a hash table
D: based on the "struct page *".  For file-backed pages, the
D: page->mapping and page->index provide a unique identifier, which is
D: persistent even if they get swapped out to the file and back.
D: There are cases where the mapping changes: for these we need a
D: callback to rehash all futexes in that page.
D: 
D: For anonymous pages, page->mapping == NULL.  So for this case we
D: use the "struct page *" itself to hash and compare, because the
D: mapping will be set (to &swapper_space) if the page is moved to the
D: swap cache, and will be rehashed, and we can find it again.
D: 
D: The current futex_rehash() call walks the entire hash table, which
D: is slow.  The simplest optimization is to have a page->flags bit
D: which indicates a futex has been placed in this page: we can clear
D: it if futex_rehash() finds no futexes.  This will come in a
D: followup patch.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/futex.h .5947-2.6.0-test4-bk2-futex-swap/include/linux/futex.h
--- .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/futex.h	2003-05-27 15:02:21.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/include/linux/futex.h	2003-08-28 17:52:47.000000000 +1000
@@ -17,4 +17,10 @@ asmlinkage long sys_futex(u32 __user *ua
 long do_futex(unsigned long uaddr, int op, int val,
 		unsigned long timeout, unsigned long uaddr2, int val2);
 
+/* To tell us when the page->mapping or page->index changes. */
+struct page;
+struct address_space;
+extern void futex_rehash(struct page *page,
+			 struct address_space *new_mapping,
+			 unsigned long new_index);
 #endif
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/pagemap.h .5947-2.6.0-test4-bk2-futex-swap/include/linux/pagemap.h
--- .5947-2.6.0-test4-bk2-futex-swap.pre/include/linux/pagemap.h	2003-08-25 11:58:34.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/include/linux/pagemap.h	2003-08-28 17:52:47.000000000 +1000
@@ -11,6 +11,7 @@
 #include <linux/pagemap.h>
 #include <asm/uaccess.h>
 #include <linux/gfp.h>
+#include <linux/futex.h>
 
 /*
  * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
@@ -143,6 +144,7 @@ static inline void ___add_to_page_cache(
 		struct address_space *mapping, unsigned long index)
 {
 	list_add(&page->list, &mapping->clean_pages);
+	futex_rehash(page, mapping, index);
 	page->mapping = mapping;
 	page->index = index;
 
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/kernel/futex.c .5947-2.6.0-test4-bk2-futex-swap/kernel/futex.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/kernel/futex.c	2003-08-28 17:52:46.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/kernel/futex.c	2003-08-28 17:52:47.000000000 +1000
@@ -35,6 +35,15 @@
 #include <linux/vcache.h>
 #include <linux/mount.h>
 
+/* Futexes need to have a way of identifying pages which are the same,
+   when they may be in different address spaces (ie. virtual address
+   might be different, eg. shared mmap).  We don't want to pin the
+   pages, so we use page->mapping & page->index where page->mapping is
+   not NULL (file-backed pages), and hash the page struct itself for
+   other pages.  Callbacks rehash pages when page->mapping is changed
+   or set (such as when anonymous pages enter the swap cache), so we
+   recognize them when the get swapped back in. */
+
 #define FUTEX_HASHBITS 8
 
 /*
@@ -45,7 +54,10 @@ struct futex_q {
 	struct list_head list;
 	wait_queue_head_t waiters;
 
-	/* Page struct and offset within it. */
+	/* These match if mapping != NULL */
+	struct address_space *mapping;
+	unsigned long index;
+	/* Otherwise, the page itself. */
 	struct page *page;
 	int offset;
 
@@ -57,7 +69,6 @@ struct futex_q {
 	struct file *filp;
 };
 
-/* The key for the hash is the address + index + offset within page */
 static struct list_head futex_queues[1<<FUTEX_HASHBITS];
 static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
 
@@ -84,11 +95,68 @@ static inline void unlock_futex_mm(void)
 	spin_unlock(&current->mm->page_table_lock);
 }
 
-/* The struct page is shared, so we can hash on its address. */
-static inline struct list_head *hash_futex(struct page *page, int offset)
+static inline int page_matches(struct page *page, struct futex_q *elem)
 {
-	return &futex_queues[hash_long((unsigned long)page + offset,
-							FUTEX_HASHBITS)];
+	if (elem->mapping)
+		return page->mapping == elem->mapping
+			&& page->index == elem->index;
+	return page == elem->page;
+}
+
+/* For pages which are file backed, we can simply hash by mapping and
+ * index, which is persistent as they get swapped out.  For anonymous
+ * regions, we hash by the actual struct page *: their mapping will
+ * change and they will be rehashed if they are swapped out.
+ */
+static inline struct list_head *hash_futex(struct address_space *mapping,
+					   unsigned long index,
+					   struct page *page,
+					   int offset)
+{
+	unsigned long hashin;
+	if (mapping)
+		hashin = (unsigned long)mapping + index;
+	else
+		hashin = (unsigned long)page;
+	return &futex_queues[hash_long(hashin+offset, FUTEX_HASHBITS)];
+}
+
+/* Called when we change page->mapping (or page->index).  Can be in
+ * interrupt context. */
+void futex_rehash(struct page *page,
+		  struct address_space *new_mapping, unsigned long new_index)
+{
+	unsigned int i;
+	unsigned long flags;
+	struct futex_q *q;
+	LIST_HEAD(gather);
+	static int rehash_count = 0;
+
+	spin_lock_irqsave(&futex_lock, flags);
+	rehash_count++;
+	for (i = 0; i < 1 << FUTEX_HASHBITS; i++) {
+		struct list_head *l, *next;
+		list_for_each_safe(l, next, &futex_queues[i]) {
+			q = list_entry(l, struct futex_q, list);
+			if (page_matches(page, q)) {
+				list_del(&q->list);
+				list_add(&q->list, &gather);
+				printk("futex_rehash %i: offset %i %p -> %p\n",
+				       rehash_count,
+				       q->offset, page->mapping, new_mapping);
+			}
+		}
+	}
+	while (!list_empty(&gather)) {
+		q = list_entry(gather.next, struct futex_q, list);
+		q->mapping = new_mapping;
+		q->index = new_index;
+		q->page = page;
+		list_del(&q->list);
+		list_add(&q->list,
+			 hash_futex(new_mapping, new_index, page, q->offset));
+	}
+	spin_unlock_irqrestore(&futex_lock, flags);
 }
 
 /*
@@ -162,12 +230,12 @@ static inline int futex_wake(unsigned lo
 		return -EFAULT;
 	}
 
-	head = hash_futex(page, offset);
+	head = hash_futex(page->mapping, page->index, page, offset);
 
 	list_for_each_safe(i, next, head) {
 		struct futex_q *this = list_entry(i, struct futex_q, list);
 
-		if (this->page == page && this->offset == offset) {
+		if (page_matches(page, this) && this->offset == offset) {
 			list_del_init(i);
 			__detach_vcache(&this->vcache);
 			wake_up_all(&this->waiters);
@@ -194,15 +262,16 @@ static inline int futex_wake(unsigned lo
 static void futex_vcache_callback(vcache_t *vcache, struct page *new_page)
 {
 	struct futex_q *q = container_of(vcache, struct futex_q, vcache);
-	struct list_head *head = hash_futex(new_page, q->offset);
+	struct list_head *head = hash_futex(new_page->mapping, new_page->index,
+					    new_page, q->offset);
 
 	spin_lock_irq(&futex_lock);
 
 	if (!list_empty(&q->list)) {
-		put_page(q->page);
-		q->page = new_page;
-		__pin_page_atomic(new_page);
 		list_del(&q->list);
+		q->mapping = new_page->mapping;
+		q->index = new_page->index;
+		q->page = new_page;
 		list_add_tail(&q->list, head);
 	}
 
@@ -229,13 +298,13 @@ static inline int futex_requeue(unsigned
 	if (!page2)
 		goto out;
 
-	head1 = hash_futex(page1, offset1);
-	head2 = hash_futex(page2, offset2);
+	head1 = hash_futex(page1->mapping, page1->index, page1, offset1);
+	head2 = hash_futex(page2->mapping, page2->index, page2, offset2);
 
 	list_for_each_safe(i, next, head1) {
 		struct futex_q *this = list_entry(i, struct futex_q, list);
 
-		if (this->page == page1 && this->offset == offset1) {
+		if (page_matches(page1, this) && this->offset == offset1) {
 			list_del_init(i);
 			__detach_vcache(&this->vcache);
 			if (++ret <= nr_wake) {
@@ -244,8 +313,6 @@ static inline int futex_requeue(unsigned
 					send_sigio(&this->filp->f_owner,
 							this->fd, POLL_IN);
 			} else {
-				put_page(this->page);
-				__pin_page_atomic (page2);
 				list_add_tail(i, head2);
 				__attach_vcache(&this->vcache, uaddr2,
 					current->mm, futex_vcache_callback);
@@ -272,11 +339,14 @@ static inline void __queue_me(struct fut
 				unsigned long uaddr, int offset,
 				int fd, struct file *filp)
 {
-	struct list_head *head = hash_futex(page, offset);
+	struct list_head *head
+		= hash_futex(page->mapping, page->index, page, offset);
 
 	q->offset = offset;
 	q->fd = fd;
 	q->filp = filp;
+	q->mapping = page->mapping;
+	q->index = page->index;
 	q->page = page;
 
 	list_add_tail(&q->list, head);
@@ -332,18 +402,19 @@ again:
 	 */
 	if (get_user(curval, (int *)uaddr) != 0) {
 		ret = -EFAULT;
-		goto unlock;
+		goto putpage;
 	}
 
 	if (curval != val) {
 		ret = -EWOULDBLOCK;
-		goto unlock;
+		goto putpage;
 	}
 
 	__queue_me(&q, page, uaddr, offset, -1, NULL);
 	add_wait_queue(&q.waiters, &wait);
 	set_current_state(TASK_INTERRUPTIBLE);
 	unlock_futex_mm();
+	put_page(page);
 
 	time = schedule_timeout(time);
 
@@ -352,7 +423,6 @@ again:
 	 * NOTE: we don't remove ourselves from the waitqueue because
 	 * we are the only user of it.
 	 */
-	put_page(q.page);
 
 	/* Were we woken up (and removed from queue)?  Always return
 	 * success when this happens. */
@@ -368,6 +438,8 @@ again:
 
 	return ret;
 
+putpage:
+	put_page(page);
 unlock:
 	unlock_futex_mm();
 	return ret;
@@ -378,7 +450,6 @@ static int futex_close(struct inode *ino
 	struct futex_q *q = filp->private_data;
 
 	unqueue_me(q);
-	put_page(q->page);
 	kfree(filp->private_data);
 	return 0;
 }
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/mm/filemap.c .5947-2.6.0-test4-bk2-futex-swap/mm/filemap.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/mm/filemap.c	2003-08-25 11:58:36.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/mm/filemap.c	2003-08-28 17:52:47.000000000 +1000
@@ -27,6 +27,7 @@
 #include <linux/pagevec.h>
 #include <linux/blkdev.h>
 #include <linux/security.h>
+#include <linux/futex.h>
 /*
  * This is needed for the following functions:
  *  - try_to_release_page
@@ -92,6 +93,7 @@ void __remove_from_page_cache(struct pag
 
 	radix_tree_delete(&mapping->page_tree, page->index);
 	list_del(&page->list);
+	futex_rehash(page, NULL, 0);
 	page->mapping = NULL;
 
 	mapping->nrpages--;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/mm/page_io.c .5947-2.6.0-test4-bk2-futex-swap/mm/page_io.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/mm/page_io.c	2003-02-07 19:18:58.000000000 +1100
+++ .5947-2.6.0-test4-bk2-futex-swap/mm/page_io.c	2003-08-28 17:52:47.000000000 +1000
@@ -19,6 +19,7 @@
 #include <linux/buffer_head.h>	/* for block_sync_page() */
 #include <linux/mpage.h>
 #include <linux/writeback.h>
+#include <linux/futex.h>
 #include <asm/pgtable.h>
 
 static struct bio *
@@ -151,6 +152,7 @@ int rw_swap_page_sync(int rw, swp_entry_
 	lock_page(page);
 
 	BUG_ON(page->mapping);
+	futex_rehash(page, &swapper_space, entry.val);
 	page->mapping = &swapper_space;
 	page->index = entry.val;
 
@@ -161,7 +163,10 @@ int rw_swap_page_sync(int rw, swp_entry_
 		ret = swap_writepage(page, &swap_wbc);
 		wait_on_page_writeback(page);
 	}
+	lock_page(page);
+	futex_rehash(page, NULL, 0);
 	page->mapping = NULL;
+	unlock_page(page);
 	if (ret == 0 && (!PageUptodate(page) || PageError(page)))
 		ret = -EIO;
 	return ret;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5947-2.6.0-test4-bk2-futex-swap.pre/mm/swap_state.c .5947-2.6.0-test4-bk2-futex-swap/mm/swap_state.c
--- .5947-2.6.0-test4-bk2-futex-swap.pre/mm/swap_state.c	2003-08-12 06:58:06.000000000 +1000
+++ .5947-2.6.0-test4-bk2-futex-swap/mm/swap_state.c	2003-08-28 17:52:47.000000000 +1000
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
+#include <linux/futex.h>
 
 #include <asm/pgtable.h>
 
@@ -200,6 +201,7 @@ int move_to_swap_cache(struct page *page
 
 	err = radix_tree_insert(&swapper_space.page_tree, entry.val, page);
 	if (!err) {
+		futex_rehash(page, &swapper_space, entry.val);
 		__remove_from_page_cache(page);
 		___add_to_page_cache(page, &swapper_space, entry.val);
 	}
@@ -236,6 +238,7 @@ int move_from_swap_cache(struct page *pa
 
 	err = radix_tree_insert(&mapping->page_tree, index, page);
 	if (!err) {
+		futex_rehash(page, mapping, index);
 		__delete_from_swap_cache(page);
 		___add_to_page_cache(page, mapping, index);
 	}
================
/* Test program for unpinned futexes. */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <sys/shm.h>
#include <sys/poll.h>
#include <sys/mman.h>
#include "usersem.h"

#define ARRAY_SIZE(arr) (sizeof(arr)/sizeof((arr)[0]))
#define streq(a,b) (strcmp((a),(b)) == 0)

#define UNTOUCHED_VALUE 7
#define TOUCHED_VALUE 100

/* Recognizable offsets within page so printks can easily identify the futex */
#define MMAP_OFFSET 8
#define SHMEM_OFFSET 16
#define ANON_OFFSET 24

static void chill(int size)
{
	char *mem;
	unsigned int i, numpages;
	unsigned int pagesize = getpagesize();

	numpages = size * 1024*1024 / pagesize;
	mem = malloc(numpages * pagesize);
	if (!mem) {
		fprintf(stderr, "Out of memory\n");
		exit(1);
	}

	for (;;) {
		for (i = 0; i < numpages; i++)
			*(mem + i*pagesize) = '\0';
		printf(".\n");
	}
}

static void *adjust_to_offset(void *addr, unsigned int off)
{
	unsigned int pagesize = getpagesize();

	while ((unsigned long)addr % pagesize != off)
		addr++;
	return addr;
}

/* Give an FD waiting on futex at this addr. */
static int fd_for_addr(void *addr)
{
	return sys_futex(addr, FUTEX_FD, 0, NULL);
}

static void *get_anon_page(void)
{
	return malloc(getpagesize());
}

static void *get_shared_memory(void)
{
	unsigned int pagesize = getpagesize();
	void *memory;
	static int shm;

	shm = shmget(IPC_PRIVATE, pagesize, IPC_CREAT);
	if (shm < 0) {
		perror("shmget");
		return NULL;
	}
	memory = shmat(shm, NULL, 0);
	if (memory == (void *)-1) {
		perror("shmat");
		shmctl(shm, IPC_RMID, NULL);
		return NULL;
	}
	/* Delete when we exit. */
	shmctl(shm, IPC_RMID, NULL);
	return memory;
}

static void *get_mmap(void)
{
	unsigned int pagesize = getpagesize();
	void *memory;
	int fd;

	fd = open("/tmp/test-swap2", O_CREAT|O_EXCL|O_RDWR, 0600);
	if (fd < 0) {
		perror("Opening /tmp/test-swap2");
		return NULL;
	}
	unlink("/tmp/test-swap2");
	if (write(fd, malloc(pagesize), pagesize) != pagesize) {
		perror("Writing /tmp/test-swap2");
		return NULL;
	}
	memory = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (memory == MAP_FAILED) {
		perror("mmap /tmp/test-swap2");
		return NULL;
	}
	return memory;
}

/* Touching the page swaps it back in, which is slightly different
 * from waking on a page which is swapped out. */
static void wake_futex(int *futex, int touch, const char action[])
{
	if (touch)
		*futex = TOUCHED_VALUE;
	printf("Waking up %p\n", futex);
	if (sys_futex(futex, FUTEX_WAKE, 1, NULL) != 1) {
		perror(action);
		exit(1);
	}
}

int main(int argc, char *argv[])
{
	int *map, *anon, *shm, touch = 0;
	pid_t child;
	char c;
	struct pollfd pollarr[3];

	if (argc > 1 && streq(argv[1], "-t")) {
		touch = 1;
		argc--;
		argv++;
	}

	if (argc != 2) {
		fprintf(stderr, "Usage: test-swap2 [-t] <mb-to-thrash>\n"
			"   Where -t means touch before waking futexes\n");
		exit(1);
	}

	map = get_mmap();
	anon = get_anon_page();
	shm = get_shared_memory();
	if (!map || !anon || !shm)
		exit(1);

	/* Different offsets so we can recognize them in printks. */
	map = adjust_to_offset(map, MMAP_OFFSET);
	anon = adjust_to_offset(anon, ANON_OFFSET);
	shm = adjust_to_offset(shm, SHMEM_OFFSET);

	*map = *anon = *shm = UNTOUCHED_VALUE;

	pollarr[0].fd = fd_for_addr(map);
	pollarr[1].fd = fd_for_addr(anon);
	pollarr[2].fd = fd_for_addr(shm);
	pollarr[0].events = pollarr[1].events = pollarr[2].events = POLLIN;

	/* None should be ready yet */
	if (poll(pollarr, ARRAY_SIZE(pollarr), 0)) {
		perror("poll");
		exit(1);
	}

	printf("MMAP=%u, SHM=%u, ANON=%u, beginning chill...\n",
	       MMAP_OFFSET, SHMEM_OFFSET, ANON_OFFSET);

	child = fork();
	if (child < 0) {
		perror("fork");
		exit(1);
	}
	if (child == 0)
		chill(atoi(argv[1]));

	/* Wait until they hit <CR>. */
	if (read(0, &c, 1) != 1) {
		perror("read");
		exit(1);
	}

	kill(child, SIGTERM);
	printf("Exiting...\n");

	wake_futex(map, touch, "wake up mmap");
	wake_futex(anon, touch, "wake up anon");
	wake_futex(shm, touch, "wake up shm");

	if (poll(pollarr, ARRAY_SIZE(pollarr), -1) != 3) {
		perror("poll after wakeup");
		exit(1);
	}

	if (touch)
		c = TOUCHED_VALUE;
	else
		c = UNTOUCHED_VALUE;

	if (*map != c) {
		fprintf(stderr, "mmap didn't get value %u: it's %u\n",
			c, *mmap);
		exit(1);
	}
	if (*anon != c) {
		fprintf(stderr, "anon didn't get value %u: it's %u\n",
			c, *anon);
		exit(1);
	}
	if (*shm != c) {
		fprintf(stderr, "shm didn't get value %u: it's %u\n",
			c, *shm);
		exit(1);
	}
	return 0;
}
================

      parent reply	other threads:[~2003-08-28  8:42 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-08-26  3:12 [PATCH 2/2] Futex non-page-pinning fix Rusty Russell
2003-08-26  4:06 ` Andrew Morton
2003-08-26  5:30   ` Ingo Molnar
2003-08-26  5:50     ` Andrew Morton
2003-08-26  5:58       ` Ingo Molnar
2003-08-26  6:14         ` Andrew Morton
2003-08-26  6:36           ` Ingo Molnar
2003-08-26  7:02             ` Andrew Morton
2003-08-26  7:56               ` Arjan van de Ven
2003-08-26  8:08                 ` Muli Ben-Yehuda
2003-08-26  8:11                   ` Arjan van de Ven
2003-08-26  8:25                   ` Andrew Morton
2003-08-26  9:02                     ` Muli Ben-Yehuda
2003-08-26 10:38                   ` William Lee Irwin III
2003-08-26 10:44                     ` Andrew Morton
2003-08-26 10:45                       ` William Lee Irwin III
2003-08-26 17:29                         ` Andrew Morton
2003-08-26 19:35                           ` William Lee Irwin III
2003-08-27  5:17   ` Rusty Russell
2003-08-27  6:20     ` Andrew Morton
2003-08-28  0:47       ` Rusty Russell
2003-08-28  8:21         ` Andrew Morton
2003-08-29  3:46           ` Rusty Russell
2003-08-29  4:17             ` Andrew Morton
2003-08-30  7:49               ` Rusty Russell
2003-09-01  0:35             ` Jamie Lokier
2003-09-01  4:11               ` Rusty Russell
2003-09-01 20:57                 ` Hugh Dickins
2003-09-02  3:12                   ` Rusty Russell
2003-09-02  6:51                     ` Jamie Lokier
2003-09-02 16:14                       ` Hugh Dickins
2003-09-02 19:54                         ` Jamie Lokier
2003-09-02 20:15                           ` Andrew Morton
2003-09-02 21:20                             ` Jamie Lokier
2003-09-03  2:40                         ` Rusty Russell
2003-09-03  7:36                           ` [PATCH] Alternate futex non-page-pinning and COW fix Jamie Lokier
2003-09-03 11:19                             ` Hugh Dickins
2003-09-03 14:38                               ` Jamie Lokier
2003-09-03 17:39                               ` Jamie Lokier
2003-09-03 17:55                                 ` Linus Torvalds
2003-09-03 18:06                                   ` Hugh Dickins
2003-09-03 18:19                                     ` Linus Torvalds
2003-09-03 18:43                                       ` Hugh Dickins
2003-09-03 19:05                                         ` Linus Torvalds
2003-09-03 19:40                                           ` Hugh Dickins
2003-09-03 20:04                                             ` Linus Torvalds
2003-09-04  2:43                                           ` Rusty Russell
2003-09-04  8:28                                             ` Linus Torvalds
2003-09-04 12:20                                               ` Hugh Dickins
2003-09-04 15:40                                                 ` Linus Torvalds
2003-09-04 16:55                                                   ` Hugh Dickins
2003-09-04 18:38                                           ` Jamie Lokier
2003-09-04 18:46                                             ` Linus Torvalds
2003-09-04 20:04                                               ` Jamie Lokier
2003-09-04 21:49                                                 ` Linus Torvalds
2003-09-04 22:10                                                   ` Jamie Lokier
2003-09-04 17:16                                   ` Jamie Lokier
2003-09-04 17:38                                     ` Linus Torvalds
2003-09-04 18:42                                       ` Linus Torvalds
2003-09-04 18:42                                         ` Linus Torvalds
2003-09-05  3:55                                     ` Rusty Russell
2003-09-05 17:55                                       ` Jamie Lokier
2003-09-04 17:26                                   ` Jamie Lokier
2003-09-04  1:30                               ` Rusty Russell
2003-09-04 21:00                                 ` Jamie Lokier
2003-09-05  5:19                                   ` Rusty Russell
2003-09-05 20:54                                     ` Jamie Lokier
2003-09-07  6:45                                       ` Rusty Russell
2003-09-07 13:20                                         ` Jamie Lokier
2003-09-08  1:49                                           ` Rusty Russell
2003-09-08  9:44                                             ` Jamie Lokier
2003-09-03 14:40                             ` [PATCH 2] Little fixes to previous futex patch Jamie Lokier
2003-09-04 16:45                               ` Hugh Dickins
2003-09-04 17:59                                 ` Jamie Lokier
2003-09-04 18:35                                   ` Hugh Dickins
2003-09-04 20:11                                     ` Jamie Lokier
2003-09-04 21:36                                       ` Hugh Dickins
2003-09-04 21:58                                         ` Jamie Lokier
2003-09-07  7:23                                         ` Ingo Molnar
2003-09-07 12:27                                           ` Hugh Dickins
2003-09-07 15:03                                             ` Jamie Lokier
2003-09-08  1:56                                             ` Rusty Russell
2003-09-07 13:00                                           ` Jamie Lokier
2003-09-08  3:32                                             ` Ingo Molnar
2003-09-08  9:33                                               ` Jamie Lokier
2003-09-08  9:57                                                 ` Ingo Molnar
2003-09-05  4:56                                   ` Rusty Russell
2003-09-03 15:34                             ` [PATCH] Alternate futex non-page-pinning and COW fix Andrew Morton
2003-09-03 17:16                               ` Jamie Lokier
2003-09-04  1:35                             ` Rusty Russell
2003-09-04 17:35                               ` Jamie Lokier
2003-09-03  0:14                       ` [PATCH 2/2] Futex non-page-pinning fix Rusty Russell
2003-09-03  1:16                         ` Andrew Morton
2003-09-03  1:54                           ` Dave Hansen
2003-09-03  2:54                             ` Andrew Morton
2003-09-02  3:23                   ` Hugh Dickins
2003-09-02 23:58                     ` Rusty Russell
2003-08-27  8:37     ` Hugh Dickins
2003-08-27  8:56       ` William Lee Irwin III
2003-08-27 10:38       ` Andrew Morton
2003-08-27 10:57         ` Hugh Dickins
2003-08-28  8:03       ` Rusty Russell [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030828080403.F3DF52C899@lists.samba.org \
    --to=rusty@rustcorp.com.au \
    --cc=akpm@osdl.org \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.