public inbox for linux-kernel@vger.kernel.org
* shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
@ 2001-04-14 15:59 Marcelo Tosatti
  2001-04-14 19:27 ` Rik van Riel
  2001-04-15 11:54 ` Christoph Rohland
  0 siblings, 2 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2001-04-14 15:59 UTC (permalink / raw)
  To: Christoph Rohland; +Cc: Stephen C. Tweedie, Linus Torvalds, Rik van Riel, lkml


Hi, 

There is a nasty race between shmem_getpage_locked() and
swapin_readahead() with the new shmem code (introduced in 2.4.3-ac3 and
merged in the main tree in 2.4.4-pre3): 

shmem_getpage_locked() finds a page in the swap cache and moves it to the
page cache as a shmem page, freeing the swap cache entry and the swap map
entry for this page. (This triggers the BUG() at mm/shmem.c:353, since
the swap map entry is still in use.)

Meanwhile, swapin_readahead() is allocating a page and adding it to
the swap cache.

I don't see any clean fix for this one.

Suggestions?




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
  2001-04-14 15:59 shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3 Marcelo Tosatti
@ 2001-04-14 19:27 ` Rik van Riel
  2001-04-14 23:31   ` [NEED TESTERS] remove swapin_readahead " Marcelo Tosatti
  2001-04-15 11:54 ` Christoph Rohland
  1 sibling, 1 reply; 6+ messages in thread
From: Rik van Riel @ 2001-04-14 19:27 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Christoph Rohland, Stephen C. Tweedie, Linus Torvalds, lkml

On Sat, 14 Apr 2001, Marcelo Tosatti wrote:

> There is a nasty race between shmem_getpage_locked() and
> swapin_readahead() with the new shmem code (introduced in
> 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):

> I don't see any clean fix for this one.
> Suggestions ?

As we discussed with Alan on irc, we could remove the (physical)
swapin_readahead() and get 2.4 stable. Once 2.4 is stable we
could (if needed) introduce a virtual address based readahead
strategy for swap-backed things ... most of that code has been
ready for months thanks to Ben LaHaise.

A virtual-address based readahead not only makes much more sense
from a performance POV, it also cleanly gets the ugly locking
issues out of the way.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/



* [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
  2001-04-14 19:27 ` Rik van Riel
@ 2001-04-14 23:31   ` Marcelo Tosatti
  2001-04-17 20:23     ` Stephen C. Tweedie
  0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2001-04-14 23:31 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Christoph Rohland, Stephen C. Tweedie, Linus Torvalds, lkml


On Sat, 14 Apr 2001, Rik van Riel wrote:

> On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> 
> > There is a nasty race between shmem_getpage_locked() and
> > swapin_readahead() with the new shmem code (introduced in
> > 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):
> 
> > I don't see any clean fix for this one.
> > Suggestions ?
> 
> As we discussed with Alan on irc, we could remove the (physical)
> swapin_readahead() and get 2.4 stable. Once 2.4 is stable we
> could (if needed) introduce a virtual address based readahead
> strategy for swap-backed things ... most of that code has been
> ready for months thanks to Ben LaHaise.
> 
> A virtual-address based readahead not only makes much more sense
> from a performance POV, it also cleanly gets the ugly locking
> issues out of the way.

Test (multiple shm-stress) runs fine without swapin_readahead(), as
expected.

I tried a "make -j32" test (128M RAM, 4 CPUs) and got 4m17s without
readahead against 3m40s with readahead, on average. We need really
swap-intensive workloads to know how much it hurts, though.

People with swap-intensive workloads: please test this and report results.

Stephen/Linus? 


Patch against 2.4.4-pre3.


diff -Nur linux.orig/include/linux/mm.h linux/include/linux/mm.h
--- linux.orig/include/linux/mm.h	Sat Apr 14 21:31:38 2001
+++ linux/include/linux/mm.h	Sat Apr 14 21:30:44 2001
@@ -425,7 +425,6 @@
 extern void mem_init(void);
 extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
-extern void swapin_readahead(swp_entry_t);
 
 /* mmap.c */
 extern void lock_vma_mappings(struct vm_area_struct *);
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Sat Apr 14 21:31:38 2001
+++ linux/include/linux/swap.h	Sat Apr 14 21:30:28 2001
@@ -145,7 +145,6 @@
 					struct inode **);
 extern int swap_duplicate(swp_entry_t);
 extern int swap_count(struct page *);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
 #define get_swap_page() __get_swap_page(1)
 extern void __swap_free(swp_entry_t, unsigned short);
 #define swap_free(entry) __swap_free((entry), 1)
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c	Sat Apr 14 21:31:38 2001
+++ linux/mm/memory.c	Sat Apr 14 21:28:34 2001
@@ -1012,42 +1012,6 @@
 	return;
 }
 
-
-
-/* 
- * Primitive swap readahead code. We simply read an aligned block of
- * (1 << page_cluster) entries in the swap area. This method is chosen
- * because it doesn't cost us any seek time.  We also make sure to queue
- * the 'original' request together with the readahead ones...  
- */
-void swapin_readahead(swp_entry_t entry)
-{
-	int i, num;
-	struct page *new_page;
-	unsigned long offset;
-
-	/*
-	 * Get the number of handles we should do readahead io to. Also,
-	 * grab temporary references on them, releasing them as io completes.
-	 */
-	num = valid_swaphandles(entry, &offset);
-	for (i = 0; i < num; offset++, i++) {
-		/* Don't block on I/O for read-ahead */
-		if (atomic_read(&nr_async_pages) >= pager_daemon.swap_cluster
-				* (1 << page_cluster)) {
-			while (i++ < num)
-				swap_free(SWP_ENTRY(SWP_TYPE(entry), offset++));
-			break;
-		}
-		/* Ok, do the async read-ahead now */
-		new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0);
-		if (new_page != NULL)
-			page_cache_release(new_page);
-		swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
-	}
-	return;
-}
-
 /*
  * We hold the mm semaphore and the page_table_lock on entry and exit.
  */
@@ -1062,7 +1026,6 @@
 	page = lookup_swap_cache(entry);
 	if (!page) {
 		lock_kernel();
-		swapin_readahead(entry);
 		page = read_swap_cache(entry);
 		unlock_kernel();
 		if (!page) {
diff -Nur linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c	Sat Apr 14 21:31:38 2001
+++ linux/mm/shmem.c	Sat Apr 14 21:28:44 2001
@@ -328,7 +328,6 @@
 		if (!page) {
 			spin_unlock (&info->lock);
 			lock_kernel();
-			swapin_readahead(*entry);
 			page = read_swap_cache(*entry);
 			unlock_kernel();
 			if (!page) 
diff -Nur linux.orig/mm/swapfile.c linux/mm/swapfile.c
--- linux.orig/mm/swapfile.c	Thu Mar 22 14:22:15 2001
+++ linux/mm/swapfile.c	Sat Apr 14 21:30:04 2001
@@ -955,34 +955,3 @@
 	}
 	return;
 }
-
-/*
- * Kernel_lock protects against swap device deletion. Grab an extra
- * reference on the swaphandle so that it dos not become unused.
- */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
-{
-	int ret = 0, i = 1 << page_cluster;
-	unsigned long toff;
-	struct swap_info_struct *swapdev = SWP_TYPE(entry) + swap_info;
-
-	*offset = SWP_OFFSET(entry);
-	toff = *offset = (*offset >> page_cluster) << page_cluster;
-
-	swap_device_lock(swapdev);
-	do {
-		/* Don't read-ahead past the end of the swap area */
-		if (toff >= swapdev->max)
-			break;
-		/* Don't read in bad or busy pages */
-		if (!swapdev->swap_map[toff])
-			break;
-		if (swapdev->swap_map[toff] == SWAP_MAP_BAD)
-			break;
-		swapdev->swap_map[toff]++;
-		toff++;
-		ret++;
-	} while (--i);
-	swap_device_unlock(swapdev);
-	return ret;
-}



* Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
  2001-04-14 15:59 shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3 Marcelo Tosatti
  2001-04-14 19:27 ` Rik van Riel
@ 2001-04-15 11:54 ` Christoph Rohland
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Rohland @ 2001-04-15 11:54 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Stephen C. Tweedie, Linus Torvalds, Rik van Riel, lkml

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

Hi,

On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> There is a nasty race between shmem_getpage_locked() and
> swapin_readahead() with the new shmem code (introduced in 2.4.3-ac3
> and merged in the main tree in 2.4.4-pre3):
> 
> shmem_getpage_locked() finds a page in the swapcache and moves it to
> the pagecache as an shmem page, freeing the swapcache and the swap
> map entry for this page. (which causes a BUG() in mm/shmem.c:353
> since the swap map entry is being used)
> 
> In the meanwhile, swapin_readahead() is allocating a page and adding
> it to the swapcache.

Oh, I was just chasing this also. 

> I don't see any clean fix for this one.

I think the current swap_count check is not necessary: if
swapin_readahead allocates a new swap cache page for the entry, that's
not a real bug; under memory pressure the page will be reclaimed.

Actually we have to make shmem much more unfriendly to the swap cache
to make it correct: I think we have to drop the whole "drop swap cache
pages on truncate" logic, since it uses lookup_swap_cache and
delete_from_swap_cache, which both lock the page, while we are holding
a spinlock :-(

The appended patch implements both changes and relies on the page
stealer to shrink the swap cache. 

It also integrates fixes which Marcelo sent earlier.

Greetings
		Christoph


[-- Attachment #2: patch-2.4.4-tmpfs-fixes --]
[-- Type: text/plain, Size: 1190 bytes --]

--- 2.4.4-pre3/mm/shmem.c	Sat Apr 14 11:12:54 2001
+++ u2.4.3/mm/shmem.c	Sun Apr 15 13:45:58 2001
@@ -123,10 +123,19 @@
 		entry = *ptr;
 		*ptr = (swp_entry_t){0};
 		freed++;
+#if 0
+                /*
+		 * This does not work since it may sleep while holding
+		 * a spinlock 
+		 *
+		 * We rely on the page stealer to free up the
+		 * allocated swap space later
+		 */
 		if ((page = lookup_swap_cache(entry)) != NULL) {
 			delete_from_swap_cache(page);
 			page_cache_release(page);	
 		}
+#endif
 		swap_free (entry);
 	}
 	return freed;
@@ -236,8 +245,10 @@
 	
 	/* Only move to the swap cache if there are no other users of
 	 * the page. */
-	if (atomic_read(&page->count) > 2)
-		goto out;
+	if (atomic_read(&page->count) > 2){
+		set_page_dirty(page);
+ 		goto out;
+	}
 	
 	inode = page->mapping->host;
 	info = &inode->u.shmem_i;
@@ -348,9 +359,6 @@
 		if (TryLockPage(page)) 
 			goto wait_retry;
 
-		if (swap_count(page) > 2)
-			BUG();
-		
 		swap_free(*entry);
 		*entry = (swp_entry_t) {0};
 		delete_from_swap_cache_nolock(page);
@@ -432,6 +440,7 @@
 		*ptr = NOPAGE_SIGBUS;
 	return error;
 sigbus:
+	up (&inode->i_sem);
 	*ptr = NOPAGE_SIGBUS;
 	return -EFAULT;
 }


* Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
  2001-04-14 23:31   ` [NEED TESTERS] remove swapin_readahead " Marcelo Tosatti
@ 2001-04-17 20:23     ` Stephen C. Tweedie
  2001-04-18 12:50       ` Christoph Rohland
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen C. Tweedie @ 2001-04-17 20:23 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Rik van Riel, Christoph Rohland, Stephen C. Tweedie,
	Linus Torvalds, lkml

Hi,

On Sat, Apr 14, 2001 at 08:31:07PM -0300, Marcelo Tosatti wrote:
> On Sat, 14 Apr 2001, Rik van Riel wrote:
> > On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> > 
> > > There is a nasty race between shmem_getpage_locked() and
> > > swapin_readahead() with the new shmem code (introduced in
> > > 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):

> Test (multiple shm-stress) runs fine without swapin_readahead(), as
> expected.

> Stephen/Linus? 

I don't see the problem.  shmem_getpage_locked appears to back off
correctly when it encounters an existing swap-cached page that
swapin_readahead installed first, at least with the code in 2.4.3-ac5.

There *does* appear to be a race, but it's swapin_readahead racing
with shmem_writepage.  That code does not search for an existing entry
in the swap cache when it decides to move a shmem page to swap, so we
can install the page twice and end up doing a lookup on the wrong
physical page if there is swap readahead going on.

To fix that, shmem_writepage needs to do a swap cache lookup and lock
before installing the new page --- it should probably just copy the
new page into the old one if it finds one already there.

--Stephen


* Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
  2001-04-17 20:23     ` Stephen C. Tweedie
@ 2001-04-18 12:50       ` Christoph Rohland
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Rohland @ 2001-04-18 12:50 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Marcelo Tosatti, Rik van Riel, Linus Torvalds, lkml

Hi Stephen,

On Tue, 17 Apr 2001, Stephen C. Tweedie wrote:
> I don't see the problem.  shmem_getpage_locked appears to back off
> correctly if it encounters a swap-cached page already existing if
> swapin_readahead has installed the page first, at least with the
> code in 2.4.3-ac5.

But the swap count can be increased by anybody without having the page
lock. So the check triggers and is bogus. See my old patch.

> There *does* appear to be a race, but it's swapin_readahead racing
> with shmem_writepage.  That code does not search for an existing
> entry in the swap cache when it decides to move a shmem page to
> swap, so we can install the page twice and end up doing a lookup on
> the wrong physical page if there is swap readahead going on.

I cannot follow you here. How can we have a swap cache entry if there
is no swap entry... Oh wait, you mean swapin_readahead can swap some
totally bogus page into the swap cache after we did __get_swap_page?
I never thought about that!

> To fix that, shmem_writepage needs to do a swap cache lookup and
> lock before installing the new page --- it should probably just copy
> the new page into the old one if it finds one already there.

OK I will look into that.

Greetings
		Christoph




end of thread, other threads:[~2001-04-18 13:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-04-14 15:59 shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3 Marcelo Tosatti
2001-04-14 19:27 ` Rik van Riel
2001-04-14 23:31   ` [NEED TESTERS] remove swapin_readahead " Marcelo Tosatti
2001-04-17 20:23     ` Stephen C. Tweedie
2001-04-18 12:50       ` Christoph Rohland
2001-04-15 11:54 ` Christoph Rohland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox