public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]
@ 2001-04-19 23:28 D.W.Howells
  2001-04-20  1:42 ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: D.W.Howells @ 2001-04-19 23:28 UTC (permalink / raw)
  To: andrea; +Cc: dhowells, linux-kernel, torvalds


You asked for some benchmarks Andrea, so I've obtained some.

The set of test modules can be found at:

	ftp://infradead.org/pub/people/dwh/rwsem-test.tar.bz2

(This also includes rwsem-stat.txt which has a copy of the benchmark results 
in as well)

There are six test programs. They can be made for i386 by the following 
command:

	make

And can also be made to invoke the scheduler after each pass through the loop:

	make SCHED=yes

I ran each individual test twice, hence the two sets of results for 
permutation.

My machine is a Dual 400MHz PII with an 440BX chipset. All the tests were run 
in runlevel 3 with no other applications running.

I benchmarked four different environments:

	(1) 2.4.4-pre3 + Andrea's generic rwsem patch
	(2) 2.4.4-pre4 using XADD to implement the rwsems
	(3) same as (2) but with a tweak to make rwsem_wake() less fair
	(4) 2.4.4-pre3 using my generic spinlock code to implement the rwsems

David


TEST		NUM READERS	NUM WRITERS	CONTENTION
===============	===============	===============	==========
rwsem-rw	4		2		r-w & w-w
rwsem-ro	4		0		no
rwsem-wo	0		4		w-w only
rwsem-r1	1		0		no
rwsem-w1	0		1		no
rwsem-r2	2		0		no


ENVIRONMENT			TEST	SCHED	READERS		WRITERS
===============================	=======	=======	===============	=======
Linux-2.4.4-pre3 + AA-rwsem	rws-rw	no	 3330281	    1009
						 3331972	     994
					yes	 1767102	  607091
						 1743420	  642095
				rws-ro	no	 2534630	n/a
						 3535202	n/a
					yes	 2837218	n/a
						 3164814	n/a
				rws-wo	no	n/a		 2507449
						n/a		 2399102
					yes	n/a		 1568467
						n/a		 1412262
				rws-r1	no	 9232485	n/a
						 9217585	n/a
					yes	 5483757	n/a
						 5487028	n/a
				rws-w1	no	n/a		 9900333
						n/a		 9918021
					yes	n/a		 5745657
						n/a		 5747063
				rws-r2	no	 3499275	n/a
						 3518590	n/a
					yes	 3184431	n/a
						 3180287	n/a

-------------------------------	-------	-------	---------------	-------
Linux-2.4.4-pre4 [XADD]		rws-rw	no	  563388	  283005
						  558159	  280288
					yes	  683670	  197017
						  700714	  194316
				rws-ro	no	 6316985	n/a
						 6314241	n/a
					yes	 4309406	n/a
						 4575043	n/a
				rws-wo	no	n/a		  765699
						n/a		  763876
					yes	n/a		  650512
						n/a		  652287
				rws-r1	no	15171532	n/a
						15158899	n/a
					yes	 7222310	n/a
						 7229793	n/a
				rws-w1	no	n/a		13942744
						n/a		13991823
					yes	n/a		 7362605
						n/a		 7356127
				rws-r2	no	 5517671	n/a
						 5516168	n/a
					yes	 3452796	n/a
						 3331947	n/a

-------------------------------	-------	-------	---------------	-------
Linux-2.4.4-pre4 [XADD]		rws-rw	no	  531929	  267129
 + slightly-unfair-contention			  531093	  266822
   tweak				yes	  839560	  185670
						  903995	  183958
				rws-ro	no	 6318293	n/a
						 6320336	n/a
					yes	 4257862	n/a
						 4315243	n/a
				rws-wo	no	n/a		  766427
						n/a		  766471
					yes	n/a		  644036
						n/a		  642236


-------------------------------	-------	-------	---------------	-------
Linux-2.4.4-pre4 [GENERIC-SPIN]	rws-rw	no	  545138	  274002
						  545378	  273785
					yes	  755343	  187874
						  745888	  185562
				rws-ro	no	 4500398	n/a
						 4506583	n/a
					yes	 3137883	n/a
						 3129119	n/a
				rws-wo	no	n/a		  763599
						n/a		  763059
					yes	n/a		  641256
						n/a		  647319
				rws-r1	no	13110083	n/a
						13115436	n/a
					yes	 6950687	n/a
						 6951901	n/a
				rws-w1	no	n/a		13004627
						n/a		13003754
					yes	n/a		 6899764
						n/a		 6898953
				rws-r2	no	 4741615	n/a
						 4746860	n/a
					yes	 3340292	n/a
						 2967268	n/a
\b0ã\x12A8\x01

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]
  2001-04-19 23:28 rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]] D.W.Howells
@ 2001-04-20  1:42 ` Andrea Arcangeli
  2001-04-20 10:10   ` David Howells
  2001-04-20 17:17   ` x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]] Andrea Arcangeli
  0 siblings, 2 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2001-04-20  1:42 UTC (permalink / raw)
  To: D . W . Howells; +Cc: dhowells, linux-kernel, torvalds

On Fri, Apr 20, 2001 at 12:28:09AM +0100, D . W . Howells wrote:
> I benchmarked four different environments:
> 
> 	(1) 2.4.4-pre3 + Andrea's generic rwsem patch
> 	(2) 2.4.4-pre4 using XADD to implement the rwsems
> 	(3) same as (2) but with a tweak to make rwsem_wake() less fair
> 	(4) 2.4.4-pre3 using my generic spinlock code to implement the rwsems
> 
> David
> 
> 
> TEST		NUM READERS	NUM WRITERS	CONTENTION
> ===============	===============	===============	==========
> rwsem-rw	4		2		r-w & w-w
> rwsem-ro	4		0		no
> rwsem-wo	0		4		w-w only
> rwsem-r1	1		0		no
> rwsem-w1	0		1		no
> rwsem-r2	2		0		no
> 
> 
> ENVIRONMENT			TEST	SCHED	READERS		WRITERS
> ===============================	=======	=======	===============	=======
> Linux-2.4.4-pre3 + AA-rwsem	rws-rw	no	 3330281	    1009
> 						 3331972	     994
[..]
> -------------------------------	-------	-------	---------------	-------
> Linux-2.4.4-pre4 [GENERIC-SPIN]	rws-rw	no	  545138	  274002
> 						  545378	  273785
> 					yes	  755343	  187874
> 						  745888	  185562

Some explanation on the above extreme difference. In the misc rw benchmark the
reason in the same amount of time I get a total number of down 3332966 and you
get only 819163 is that I provide recursive down_read and that in turn can
starve the down_write (my first patches weren't implementing fair semaphores).

As you can see in my post of yesterday I made my semaphores fair in my last
patches (from rwsem-generic-5). (you didn't said which patch you used exactly
but obviously it was earlier than the -5 revision)

I'm uncertain if I should drop the list_empty() check from the fast path and if
I should still allow up_* to be called from irq/softirq, if I reduce the max
number of sleepers to 2^16 and I will provide weaker wakeup semantics I won't
be penalizied anymore and then we'll really compare apples to orange making the
comparison more interesting (probably I will do because later on I can probably
re-add that two features without too much pain).

About the benchmark you wrote it looks good measure to me, thanks.

Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]
  2001-04-20  1:42 ` Andrea Arcangeli
@ 2001-04-20 10:10   ` David Howells
  2001-04-20 17:17   ` x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]] Andrea Arcangeli
  1 sibling, 0 replies; 13+ messages in thread
From: David Howells @ 2001-04-20 10:10 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel


> About the benchmark you wrote it looks good measure to me, thanks.

As with all benchmarks, take with one pinch of salt and two of Mindcraft:-)

David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-20  1:42 ` Andrea Arcangeli
  2001-04-20 10:10   ` David Howells
@ 2001-04-20 17:17   ` Andrea Arcangeli
  2001-04-20 23:45     ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2001-04-20 17:17 UTC (permalink / raw)
  To: D . W . Howells; +Cc: dhowells, linux-kernel, torvalds

On Fri, Apr 20, 2001 at 03:42:15AM +0200, Andrea Arcangeli wrote:
> I'm uncertain if I should drop the list_empty() check from the fast path and if

While dropping the list_empty check to speed up the fast path I faced the same
complexity of the 2.4.4pre4 lib/rwsem.c and so before reinventing the wheel I
read how the problem was solved in 2.4.4pre4.

I couldn't get convinced that all the subtle possible cases was handled
correctly so I tried to stress them. To do that I changed the 2.4.4pre4 rwsem
(compiled for PII SMP) as below patch shows in order to reproduce more easily
those corner cases (note my changes cannot be the cause of the deadlock, as far
I can tell they can _only_ change the timings, and the timings have to be
flexible because they can be randomly influenced by interrupt handlers machine
checks and whatever else in whatever hardware): 

--- rwsemref-1/lib/rwsem.c.~1~	Thu Apr 19 19:59:46 2001
+++ rwsemref-1/lib/rwsem.c	Fri Apr 20 17:13:05 2001
@@ -6,6 +6,7 @@
 #include <linux/rwsem.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/delay.h>
 
 /*
  * wait for the read lock to be granted
@@ -18,6 +19,7 @@
 	DECLARE_WAITQUEUE(wait,tsk);
 	signed long count;
 
+	mdelay(1);
 	rwsemtrace(sem,"Entering rwsem_down_read_failed");
 
 	/* this waitqueue context flag will be cleared when we are granted the lock */
@@ -59,6 +61,7 @@
 	DECLARE_WAITQUEUE(wait,tsk);
 	signed long count;
 
+	mdelay(1);
 	rwsemtrace(sem,"Entering rwsem_down_write_failed");
 
 	/* this waitqueue context flag will be cleared when we are granted the lock */
@@ -115,6 +118,7 @@
 		goto out;
 	}
 
+	mdelay(1);
 	/* try to grant a single write lock if there's a writer at the front of the queue
 	 * - note we leave the 'active part' of the count incremented by 1 and the waiting part
 	 *   incremented by 0x00010000
@@ -122,6 +126,7 @@
 	if (wake_up_ctx(&sem->wait,1,-RWSEM_WAITING_FOR_WRITE)==1)
 		goto out;
 
+	mdelay(10);
 	/* grant an infinite number of read locks to the readers at the front of the queue
 	 * - note we increment the 'active part' of the count by the number of readers just woken,
 	 *   less one for the activity decrement we've already done
@@ -129,6 +134,7 @@
 	woken = wake_up_ctx(&sem->wait,65535,-RWSEM_WAITING_FOR_READ);
 	if (woken<=0)
 		goto counter_correction;
+	mdelay(1);
 
 	woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
 	woken -= RWSEM_ACTIVE_BIAS;

and I succeeded in deadlocking the 2.4.4pre[2345] rwsemaphores with kernel
looping forever in 2.4.4pre4/lib/rwsem.c:rwsem_wake() but at least the breakage
is 100% reproducible if you use the above patch.

All you have to do to reproduce is to apply the above patch and then to run the
rwsem proggy that I posted to the list two days ago (the one that shows my
latest -5/6 revisions to be just a bit faster in the slow path) that does the
mmap and page faults from different threads in loop, in another console you run
`while :; do ps xa; done`, then while `ps` is near the end of its run you kill
the parent thread of the rwsem testcase so that all thread-children gets killed
as well, and at that point `ps` deadlocks hard in R state unkillable (it keeps
looping forever in the try_again: counter_correction: paths).

[..]
  285 ?        SW     0:00 [rpciod]
  461 ?        S      0:00 /usr/sbin/inetd
  463 tty1     S      0:00 /sbin/mingetty --noclear tty1
  464 tty2     S      0:00 /sbin/mingetty tty2
  465 tty3     S      0:00 /sbin/mingetty tty3
  466 tty4     S      0:00 /sbin/mingetty tty4
  467 tty5     S      0:00 /sbin/mingetty tty5
  468 tty6     S      0:00 /sbin/mingetty tty6
  469 ?        S      0:00 in.rlogind
  470 pts/0    S      0:00 login -- andrea
  471 pts/0    S      0:00 -bash
  530 ?        S      0:00 in.rlogind
  531 pts/1    S      0:00 login -- andrea
  532 pts/1    S      0:00 -bash
  666 pts/1    R      0:00 ps xa
  667 pts/0    S      0:00 ./rwsem
  668 pts/0    R      0:00 ./rwsem
  669 pts/0    D      0:00 ./rwsem
  670 pts/0    D      0:00 ./rwsem
  671 pts/0    D      0:00 ./rwsem
  672 pts/0    D      0:00 ./rwsem
  673 pts/0    D      0:00 ./rwsem
  674 pts/0    D      0:00 ./rwsem
  675 pts/0    D      0:00 ./rwsem
  676 pts/0    D      0:00 ./rwsem
  677 pts/0    R      0:00 ./rwsem
  678 pts/0    D      0:00 ./rwsem
  679 pts/0    D      0:00 ./rwsem
  680 pts/0    D      0:00 ./rwsem
  681 pts/0    D      0:00 ./rwsem
  682 pts/0    D      0:00 ./rwsem
  683 pts/0    D      0:00 ./rwsem
  684 pts/0    D      0:00 ./rwsem
  685 pts/0    D      0:00 ./rwsem
  686 pts/0    D      0:00 ./rwsem
  687 pts/0    D      0:00 ./rwsem
  688 pts/0    R      0:00 ./rwsem
  689 pts/0    D      0:00 ./rwsem
  690 pts/0    D      0:00 ./rwsem
  691 pts/0    D      0:00 ./rwsem
  692 pts/0    D      0:00 ./rwsem
  693 pts/0    D      0:00 ./rwsem
  694 pts/0    D      0:00 ./rwsem
  695 pts/0    D      0:00 ./rwsem
  696 pts/0    D      0:00 ./rwsem
  697 pts/0    D      0:00 ./rwsem
  698 pts/0    D      0:00 ./rwsem
  699 pts/0    D      0:00 ./rwsem
  700 pts/0    D      0:00 ./rwsem
  701 pts/0    D      0:00 ./rwsem
  702 pts/0    D      0:00 ./rwsem
  703 pts/0    D      0:00 ./rwsem
[ hang here ]

from another terminal:

andrea@laser:~ > kill 666
andrea@laser:~ > kill 666
andrea@laser:~ > kill 666
andrea@laser:~ > kill 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > ps 666
  PID TTY      STAT   TIME COMMAND
  666 pts/1    R      0:31 ps xa
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > ps 666
  PID TTY      STAT   TIME COMMAND
  666 pts/1    R     53:38 ps xa
andrea@laser:~ > 

At first I had a short look to check if this bug could be unrelated to the rwsem and
to be instead a race between the /proc filesystem and the do_exit path, but
after a short review of the interesting places the task_lock seems to serialize
correctly the accesses to the tsk->mm so the semaphore shouldn't go away under
rwsem_wake and furthmore it would be quite strange if the mm goes away under
the /proc filesystem that it only reproducibly fails in the same way in
rwsem_wake and I never get other system malfunctions for example while reading
the contents of the mm.

As further confirm this is a bug in the rwsemaphores I changed your benchmark
to be a little more aggressive:

--- rwsem-rw.c.orig	Thu Apr 19 22:37:27 2001
+++ rwsem-rw.c	Fri Apr 20 18:56:06 2001
@@ -71,9 +71,8 @@
 
 static inline void sched(void)
 {
-#ifdef SCHED
-	schedule();
-#endif
+	if (current->need_resched)
+		schedule();
 }
 
 int reader(void *arg)
@@ -86,8 +85,8 @@
 
 	while (atomic_read(&do_stuff)) {
 		dr();
-		ur();
 		sched();
+		ur();
 	}
 
 	atomic_dec(&runners);
@@ -105,8 +104,8 @@
 
 	while (atomic_read(&do_stuff)) {
 		dw();
-		uw();
 		sched();
+		uw();
 	}
 
 	atomic_dec(&runners);
@@ -125,22 +124,64 @@
 
 	init_rwsem(&rwsem_sem);
 	atomic_set(&do_stuff, 1);
+	wmb();
 
 #ifdef DEBUG_RWSEM
 	rwsem_to_monitor = &rwsem_sem;
 #endif
 	init_timer(&timer);
 	timer.function = stop_test;
-	timer.expires = jiffies + 5 * HZ;
+	timer.expires = jiffies + 50 * HZ;
 	add_timer(&timer);
 
-	kernel_thread(writer, "WriteA", 0);
-	kernel_thread(writer, "WriteB", 0);
-
-	kernel_thread(reader, "ReadA", 0);
-	kernel_thread(reader, "ReadB", 0);
-	kernel_thread(reader, "ReadC", 0);
-	kernel_thread(reader, "ReadD", 0);
+	kernel_thread(writer, "Write0", 0);
+	kernel_thread(writer, "Write1", 0);
+	kernel_thread(writer, "Write2", 0);
+	kernel_thread(writer, "Write3", 0);
+	kernel_thread(writer, "Write4", 0);
+	kernel_thread(writer, "Write5", 0);
+	kernel_thread(writer, "Write6", 0);
+	kernel_thread(writer, "Write7", 0);
+	kernel_thread(writer, "Write8", 0);
+	kernel_thread(writer, "Write9", 0);
+	kernel_thread(writer, "Write10", 0);
+	kernel_thread(writer, "Write11", 0);
+	kernel_thread(writer, "Write12", 0);
+	kernel_thread(writer, "Write13", 0);
+	kernel_thread(writer, "Write14", 0);
+	kernel_thread(writer, "Write15", 0);
+
+	kernel_thread(reader, "Read0", 0);
+	kernel_thread(reader, "Read1", 0);
+	kernel_thread(reader, "Read2", 0);
+	kernel_thread(reader, "Read3", 0);
+	kernel_thread(reader, "Read4", 0);
+	kernel_thread(reader, "Read5", 0);
+	kernel_thread(reader, "Read6", 0);
+	kernel_thread(reader, "Read7", 0);
+	kernel_thread(reader, "Read8", 0);
+	kernel_thread(reader, "Read9", 0);
+	kernel_thread(reader, "Read10", 0);
+	kernel_thread(reader, "Read11", 0);
+	kernel_thread(reader, "Read12", 0);
+	kernel_thread(reader, "Read13", 0);
+	kernel_thread(reader, "Read14", 0);
+	kernel_thread(reader, "Read15", 0);
+	kernel_thread(reader, "Read16", 0);
+	kernel_thread(reader, "Read17", 0);
+	kernel_thread(reader, "Read18", 0);
+	kernel_thread(reader, "Read19", 0);
+	kernel_thread(reader, "Read20", 0);
+	kernel_thread(reader, "Read21", 0);
+	kernel_thread(reader, "Read22", 0);
+	kernel_thread(reader, "Read23", 0);
+	kernel_thread(reader, "Read24", 0);
+	kernel_thread(reader, "Read25", 0);
+	kernel_thread(reader, "Read26", 0);
+	kernel_thread(reader, "Read27", 0);
+	kernel_thread(reader, "Read28", 0);
+	kernel_thread(reader, "Read29", 0);
+	kernel_thread(reader, "Read30", 0);
 
 	return 0;
 }
@@ -155,6 +196,8 @@
 		schedule();
 	printk("reads taken: %d\n", atomic_read(&reads_taken));
 	printk("writes taken: %d\n", atomic_read(&writes_taken));
+	printk("readers: %d\n", atomic_read(&readers));
+	printk("writers: %d\n", atomic_read(&writers));
 #ifdef DEBUG_RWSEM
 	rwsem_to_monitor = 0;
 #endif

and now also the benchmark deadlocked:

andrea@laser:~ > ps 669
  PID TTY      STAT   TIME COMMAND
  669 pts/0    R      6:25 [Read19]
andrea@laser:~ > su
root@laser:/home/andrea > rmmod rwsem-rw
[ hang here ]

Until we fix that bug I will keep running my rwsem-generic-6 patch to be 100%
sure to sit on rock solid rwsems.

BTW, this is the fix against pre4 for the 386+SMP rwsemaphores bug that I found
yesterday:

--- rwsemref-1/include/asm-i386/rwsem-spin.h.~1~	Fri Apr 20 05:03:59 2001
+++ rwsemref-1/include/asm-i386/rwsem-spin.h	Fri Apr 20 19:05:52 2001
@@ -93,9 +93,9 @@
 	__asm__ __volatile__(
 		"# beginning down_read\n\t"
 #ifdef CONFIG_SMP
+		"1:\n\t"
 LOCK_PREFIX	"  decb      "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
 		"  js        3f\n" /* jump if failed */
-		"1:\n\t"
 #endif
 		"  incl      (%%eax)\n\t" /* adds 0x00000001, returns the old value */
 #ifdef CONFIG_SMP
@@ -132,9 +132,9 @@
 	__asm__ __volatile__(
 		"# beginning down_write\n\t"
 #ifdef CONFIG_SMP
+		"1:\n\t"
 LOCK_PREFIX	"  decb      "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
 		"  js        3f\n" /* jump if failed */
-		"1:\n\t"
 #endif
 		"  xchg      %0,(%%eax)\n\t" /* retrieve the old value */
 		"  add       %0,(%%eax)\n\t" /* add 0xffff0001, result in memory */
@@ -173,9 +173,9 @@
 	__asm__ __volatile__(
 		"# beginning __up_read\n\t"
 #ifdef CONFIG_SMP
+		"1:\n\t"
 LOCK_PREFIX	"  decb      "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
 		"  js        3f\n" /* jump if failed */
-		"1:\n\t"
 #endif
 		"  xchg      %0,(%%eax)\n\t" /* retrieve the old value */
 		"  addl      %0,(%%eax)\n\t" /* subtract 1, result in memory */
@@ -213,9 +213,9 @@
 	__asm__ __volatile__(
 		"# beginning __up_write\n\t"
 #ifdef CONFIG_SMP
+		"1:\n\t"
 LOCK_PREFIX	"  decb      "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
 		"  js        3f\n" /* jump if failed */
-		"1:\n\t"
 #endif
 		"  addl      %3,(%%eax)\n\t" /* adds 0x00010001 */
 #ifdef CONFIG_SMP
@@ -251,9 +251,9 @@
 	__asm__ __volatile__(
 		"# beginning rwsem_atomic_update\n\t"
 #ifdef CONFIG_SMP
+		"1:\n\t"
 LOCK_PREFIX	"  decb      "RWSEM_SPINLOCK_OFFSET_STR"(%1)\n\t" /* try to grab the spinlock */
 		"  js        3f\n" /* jump if failed */
-		"1:\n\t"
 #endif
 		"  xchgl     %0,(%1)\n\t" /* retrieve the old value */
 		"  addl      %0,(%1)\n\t" /* add 0xffff0001, result in memory */
@@ -287,9 +287,9 @@
 	__asm__ __volatile__(
 		"# beginning rwsem_cmpxchgw\n\t"
 #ifdef CONFIG_SMP
+		"1:\n\t"
 LOCK_PREFIX	"  decb      "RWSEM_SPINLOCK_OFFSET_STR"(%3)\n\t" /* try to grab the spinlock */
 		"  js        3f\n" /* jump if failed */
-		"1:\n\t"
 #endif
 		"  cmpw      %w1,(%3)\n\t"
 		"  jne       4f\n\t" /* jump if old doesn't match sem->count LSW */

Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-20 17:17   ` x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]] Andrea Arcangeli
@ 2001-04-20 23:45     ` Linus Torvalds
  2001-04-21 14:03       ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2001-04-20 23:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: D . W . Howells, dhowells, linux-kernel



On Fri, 20 Apr 2001, Andrea Arcangeli wrote:
>
> While dropping the list_empty check to speed up the fast path I faced the same
> complexity of the 2.4.4pre4 lib/rwsem.c and so before reinventing the wheel I
> read how the problem was solved in 2.4.4pre4.

I would suggest the following:

 - the generic semaphores should use the lock that already exists in the
   wait-queue as the semaphore spinlock.

 - the generic semaphores should _not_ drop the lock. Right now it drops
   the semaphore lock when it goes into the slow path, only to re-aquire
   it. This is due to bad interfacing with the generic slow-path routines.

   I suspect that this lock-drop is why Andrea sees problems with the
   generic semaphores. The changes to "count" and "sleeper" aren't
   actually atomic, because we don't hold the lock over them all. And
   re-using the lock means that we don't need the two levels of
   spinlocking for adding ourselves to the wait queue. Easily done by just
   moving the locking _out_ of the wait-queue helper functions, no?

 - the generic semaphores are entirely out-of-line, and are just declared
   universally as regular FASTCALL() functions.

The fast-path x86 code looks ok to me. The debugging stuff makes it less
readable than it should be, I suspect, and is probably not worth it at
this stage. The users of rw-semaphores are so well-defined (and so well
debugged) that the debugging code only makes the code harder to follow
right now.

Comments?  Andrea? Your patches have looked ok, but I absoutely refuse to
see the non-inlined fast-path for reasonable x86 hardware..

		Linus


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-20 23:45     ` Linus Torvalds
@ 2001-04-21 14:03       ` Andrea Arcangeli
  2001-04-21 14:17         ` Russell King
  2001-04-21 14:37         ` Russell King
  0 siblings, 2 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2001-04-21 14:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: D . W . Howells, dhowells, linux-kernel

On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> I would suggest the following:
> 
>  - the generic semaphores should use the lock that already exists in the
>    wait-queue as the semaphore spinlock.

Ok, that is what my generic code does.

>  - the generic semaphores should _not_ drop the lock. Right now it drops
>    the semaphore lock when it goes into the slow path, only to re-aquire
>    it. This is due to bad interfacing with the generic slow-path routines.

My generic code doesn't drop the lock.

>    I suspect that this lock-drop is why Andrea sees problems with the
>    generic semaphores. The changes to "count" and "sleeper" aren't
>    actually atomic, because we don't hold the lock over them all. And
>    re-using the lock means that we don't need the two levels of
>    spinlocking for adding ourselves to the wait queue. Easily done by just
>    moving the locking _out_ of the wait-queue helper functions, no?

Basically yes, however for the wakeup I wrote a dedicated routine that
knows how to do the wake-all-next-readers or wake-next-writer (it is not
the same helper function of sched.c).

>  - the generic semaphores are entirely out-of-line, and are just declared
>    universally as regular FASTCALL() functions.

This is what I implemented originally but then I moved the fast path inline
for the fast-path benchmark reasons. I think in real life it doesn't matter
much if the fast path is inline or not.

> The fast-path x86 code looks ok to me. The debugging stuff makes it less
> readable than it should be, I suspect, and is probably not worth it at
> this stage. The users of rw-semaphores are so well-defined (and so well
> debugged) that the debugging code only makes the code harder to follow
> right now.

yes I agree, infact I added the ->magic check only to catch uninitialized
semaphores (and this one doesn't hurt readability that much).

> Comments?  Andrea? Your patches have looked ok, but I absoutely refuse to
> see the non-inlined fast-path for reasonable x86 hardware..

In my last patch the fast path is inline as said above but it is not in asm yet
because I couldn't get convinced it was right code. I plan to return looking
into the rwsem soon. I also seen David fixed the bug and dropped the buggy
rwsem-spin.h, so I suggest to merge his code for now, after a very short look
it seems certainly better than pre5.

Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:03       ` Andrea Arcangeli
@ 2001-04-21 14:17         ` Russell King
  2001-04-21 14:29           ` Andrea Arcangeli
  2001-04-21 17:18           ` Linus Torvalds
  2001-04-21 14:37         ` Russell King
  1 sibling, 2 replies; 13+ messages in thread
From: Russell King @ 2001-04-21 14:17 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, D . W . Howells, dhowells, linux-kernel

On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > I would suggest the following:
> > 
> >  - the generic semaphores should use the lock that already exists in the
> >    wait-queue as the semaphore spinlock.
> 
> Ok, that is what my generic code does.

Erm, spin_lock()?  What if up_read or up_write gets called from interrupt
context (is this allowed)?

If these are now allowed, then maybe we should either consider getting
the Stanford checker to check for this, or else we ought to do a debugging
if (in_interupt()) BUG(); thing.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:17         ` Russell King
@ 2001-04-21 14:29           ` Andrea Arcangeli
  2001-04-21 14:37             ` rmk
  2001-04-21 17:18           ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2001-04-21 14:29 UTC (permalink / raw)
  To: Russell King; +Cc: Linus Torvalds, D . W . Howells, dhowells, linux-kernel

On Sat, Apr 21, 2001 at 03:17:37PM +0100, Russell King wrote:
> On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> > On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > > I would suggest the following:
> > > 
> > >  - the generic semaphores should use the lock that already exists in the
> > >    wait-queue as the semaphore spinlock.
> > 
> > Ok, that is what my generic code does.
> 
> Erm, spin_lock()?  What if up_read or up_write gets called from interrupt
> context (is this allowed)?

That it is allowed by my generic code that does spin_lock_irq in down_* and
spin_lock_irqsave in up_* but it's disallowed by the weaker semantics of the
generic and x86 semaphores 2.4.4pre[2345] (or + David's last patch).

> If these are now allowed, then maybe we should either consider getting
> the Stanford checker to check for this, or else we ought to do a debugging
> if (in_interupt()) BUG(); thing.

Caller bug is the last of my worries. (and I seriously doubt that if somebody
doesn't know the semantics of the rwsem, he will care to enable the debugging
checks during regression testing ;)

Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:29           ` Andrea Arcangeli
@ 2001-04-21 14:37             ` rmk
  2001-04-21 15:04               ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: rmk @ 2001-04-21 14:37 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, D . W . Howells, dhowells, linux-kernel

Andrea Arcangeli writes:
> That it is allowed by my generic code that does spin_lock_irq in down_* and
> spin_lock_irqsave in up_* but it's disallowed by the weaker semantics of the
> generic and x86 semaphores 2.4.4pre[2345] (or + David's last patch).

Hang on, who's code is in 2.4.4-pre5?  It claims to be Davids, which does
suffer from the problem I described.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:03       ` Andrea Arcangeli
  2001-04-21 14:17         ` Russell King
@ 2001-04-21 14:37         ` Russell King
  2001-04-21 15:07           ` Andrea Arcangeli
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King @ 2001-04-21 14:37 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, D . W . Howells, dhowells, linux-kernel

On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > I would suggest the following:
> > 
> >  - the generic semaphores should use the lock that already exists in the
> >    wait-queue as the semaphore spinlock.
> 
> Ok, that is what my generic code does.

rwsem-spinlock.h requires linux/types.h to be included (you're using
__u16 at towards the bottom):

gcc -D__KERNEL__ -I/usr/src2/v2.4/linux-rpc/include -Wall -Wstrict-prototypes
 -O2 -fno-strict-aliasing -pipe -mapcs-32  -march=armv3m -mtune=strongarm110
 -mshort-load-bytes -msoft-float -c -o ieee1284.o ieee1284.c
In file included from /usr/src2/v2.4/linux-rpc/include/linux/rwsem.h:56,
                 from /usr/src2/v2.4/linux-rpc/include/asm/semaphore.h:10,
                 from /usr/src2/v2.4/linux-rpc/include/linux/parport.h:101,
                 from ieee1284.c:19:
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `rwsem_cmpxchgw'
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `__u16'
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: return-type defaults to `int'
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: function declaration isn't a prototype

Here is a patch that fixes this oversight against 2.4.4-pre5:

--- orig/include/linux/rwsem-spinlock.h	Sat Apr 21 15:32:57 2001
+++ linux/include/linux/rwsem-spinlock.h	Sat Apr 21 15:28:45 2001
@@ -11,6 +11,7 @@
 #endif
 
 #include <linux/spinlock.h>
+#include <linux/types.h>
 
 #ifdef __KERNEL__
 


--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:37             ` rmk
@ 2001-04-21 15:04               ` Andrea Arcangeli
  0 siblings, 0 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2001-04-21 15:04 UTC (permalink / raw)
  To: rmk; +Cc: Linus Torvalds, D . W . Howells, dhowells, linux-kernel

On Sat, Apr 21, 2001 at 03:37:05PM +0100, rmk@arm.linux.org.uk wrote:
> Andrea Arcangeli writes:
> > That it is allowed by my generic code that does spin_lock_irq in down_* and
> > spin_lock_irqsave in up_* but it's disallowed by the weaker semantics of the
			      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > generic and x86 semaphores 2.4.4pre[2345] (or + David's last patch).
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Hang on, who's code is in 2.4.4-pre5?  It claims to be Davids, which does
> suffer from the problem I described.

Yes.

Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:37         ` Russell King
@ 2001-04-21 15:07           ` Andrea Arcangeli
  0 siblings, 0 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2001-04-21 15:07 UTC (permalink / raw)
  To: Russell King; +Cc: Linus Torvalds, D . W . Howells, dhowells, linux-kernel

On Sat, Apr 21, 2001 at 03:37:42PM +0100, Russell King wrote:
> On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> > On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > > I would suggest the following:
> > > 
> > >  - the generic semaphores should use the lock that already exists in the
> > >    wait-queue as the semaphore spinlock.
> > 
> > Ok, that is what my generic code does.
> 
> rwsem-spinlock.h requires linux/types.h to be included (you're using
> __u16 at towards the bottom):

As you are quoting me on "that is what my generic code does", note that my
code is only here:

	ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6

and no one single line of my rwsem patches is ever been included in any of
Linus's or Alan's kernels.

> gcc -D__KERNEL__ -I/usr/src2/v2.4/linux-rpc/include -Wall -Wstrict-prototypes
>  -O2 -fno-strict-aliasing -pipe -mapcs-32  -march=armv3m -mtune=strongarm110
>  -mshort-load-bytes -msoft-float -c -o ieee1284.o ieee1284.c
> In file included from /usr/src2/v2.4/linux-rpc/include/linux/rwsem.h:56,
>                  from /usr/src2/v2.4/linux-rpc/include/asm/semaphore.h:10,
>                  from /usr/src2/v2.4/linux-rpc/include/linux/parport.h:101,
>                  from ieee1284.c:19:
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `rwsem_cmpxchgw'
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `__u16'
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: return-type defaults to `int'
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: function declaration isn't a prototype
> 
> Here is a patch that fixes this oversight against 2.4.4-pre5:
> 
> --- orig/include/linux/rwsem-spinlock.h	Sat Apr 21 15:32:57 2001
> +++ linux/include/linux/rwsem-spinlock.h	Sat Apr 21 15:28:45 2001
> @@ -11,6 +11,7 @@
>  #endif
>  
>  #include <linux/spinlock.h>
> +#include <linux/types.h>
>  
>  #ifdef __KERNEL__
>  
> 
> 
> --
> Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
>              http://www.arm.linux.org.uk/personal/aboutme.html
> 


Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]
  2001-04-21 14:17         ` Russell King
  2001-04-21 14:29           ` Andrea Arcangeli
@ 2001-04-21 17:18           ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2001-04-21 17:18 UTC (permalink / raw)
  To: Russell King; +Cc: Andrea Arcangeli, D . W . Howells, dhowells, linux-kernel



On Sat, 21 Apr 2001, Russell King wrote:
>
> Erm, spin_lock()?  What if up_read or up_write gets called from interrupt
> context (is this allowed)?

Currently that is not allowed.

We allow it for regular semaphores, but not for rw-semaphores.

We may some day have to revisit that issue, but I suspect we won't have
much reason to.

		Linus


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2001-04-21 17:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-04-19 23:28 rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]] D.W.Howells
2001-04-20  1:42 ` Andrea Arcangeli
2001-04-20 10:10   ` David Howells
2001-04-20 17:17   ` x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]] Andrea Arcangeli
2001-04-20 23:45     ` Linus Torvalds
2001-04-21 14:03       ` Andrea Arcangeli
2001-04-21 14:17         ` Russell King
2001-04-21 14:29           ` Andrea Arcangeli
2001-04-21 14:37             ` rmk
2001-04-21 15:04               ` Andrea Arcangeli
2001-04-21 17:18           ` Linus Torvalds
2001-04-21 14:37         ` Russell King
2001-04-21 15:07           ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox