Re: generic rwsem [Re: Alpha "process table hang"]

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: generic rwsem [Re: Alpha "process table hang"]
@ 2001-04-17 19:18 D.W.Howells
  2001-04-17 20:49 ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: D.W.Howells @ 2001-04-17 19:18 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel

Andrea,

> As said the design of the framework to plugin per-arch rwsem implementation 
> isn't flexible enough and the generic spinlocks are as well broken, try to 
> use them if you can (yes I tried that for the alpha, it was just a mess and 
> it was more productive to rewrite than to fix).

Having thought about the matter a bit, I know what the problem is:

As stated in the email with the latest patch, I haven't yet extended this to 
cover any architecture but i386. That patch was actually put up for comments, 
though it got included anyway.

Therefore, all other archs use the old (and probably) broken implementations!

I'll quickly knock up a patch to fix the other archs. This should also fix 
the alpha problem.

As for making the stuff I had done less generic, and more specific, I only 
made it more generic because I got asked to by a number of people. It was 
suggested that I move the contention functions into lib/rwsem.c and make them 
common.

As far as using atomic_add_return() goes, the C compiler cannot make the 
fastpath anywhere near as efficient, because amongst other things, I can make 
use of the condition flags set in EFLAGS and the compiler can't.

> And it's also more readable and it's not bloated code, 65+110 lines
> compared to 156+148+174 lines. 

You do my code an injustice there... I've put comments in mine.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 19:18 generic rwsem [Re: Alpha "process table hang"] D.W.Howells
@ 2001-04-17 20:49 ` Andrea Arcangeli
  2001-04-17 21:29   ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 20:49 UTC (permalink / raw)
  To: D . W . Howells; +Cc: linux-kernel

On Tue, Apr 17, 2001 at 08:18:57PM +0100, D . W . Howells wrote:
> Andrea,
> 
> > As said the design of the framework to plugin per-arch rwsem implementation 
> > isn't flexible enough and the generic spinlocks are as well broken, try to 
> > use them if you can (yes I tried that for the alpha, it was just a mess and 
> > it was more productive to rewrite than to fix).
> 
> Having thought about the matter a bit, I know what the problem is:
> 
> As stated in the email with the latest patch, I haven't yet extended this to 
> cover any architecture but i386. That patch was actually put up for comments, 
> though it got included anyway.
> 
> Therefore, all other archs use the old (and probably) broken implementations!

I am sure ppc couldn't race (at least unless up_read/up_write were excuted
from irq/softnet context and that never happens in 2.4.4pre3, see below ;).

> I'll quickly knock up a patch to fix the other archs. This should also fix 
> the alpha problem.

This is not the point. The point is that we want a generic implementation in C
always available as a fallback and my one is IMHO better to what is in
2.4.4pre3, and secondly it has a superior API to replace the generic
implementation with a per-arch implementation. So the only thing left to do is
to plugin your x86 specific implementation in my patch using the simple API I
provide to override the generic implementation completly and I preferred if you
could do that at least for the 386/486 case (you know your code better). If
you're not interested I probably end ingoring the <586 compiles by implementing
only the atomic_*_return with xadd for >=586 and a CONFIG_RWSEM_ATOMIC_RETURN
config option in the common code so I will optimize almost all archs in one go
as I think that's the way to go and so I prefer to invest time in such direction
only.

> As for making the stuff I had done less generic, and more specific, I only 
> made it more generic because I got asked to by a number of people. It was 
> suggested that I move the contention functions into lib/rwsem.c and make them 
> common.

And the generic part was implemented bad and that's why I rewrote it to boot
my alpha.

> As far as using atomic_add_return() goes, the C compiler cannot make the 
> fastpath anywhere near as efficient, because amongst other things, I can make 
> use of the condition flags set in EFLAGS and the compiler can't.

All we need to do is to avoid the spinlock until we enter the fast path.  We
only need the fast path to be a few asm inlined opcodes that jumps out of line
if there's contention. I don't think other optimizations are interesting.

That can obviously be done for example with C code like this:

	count = atomic_inc_return(&sem->count);
	if (__builtin_expect(count == 0, 0))
		slow_path()

The above is the perfect C implementation IMHO, but it cannot be
the most generic one because [34]86 doesn't have xadd and they
cannot implement atomic_*_return and friends.

And incidentally the above is what (I guess Richard) did on the alpha and that
should really go into common code instead of having asm-i386/rwsem-xadd.h
asm-alpha/rwsem.h etcc.etc...  just implement atomic_inc_return using xadd in
asm-i386/atomic.h, that's much better design IMHO.

> > And it's also more readable and it's not bloated code, 65+110 lines
> > compared to 156+148+174 lines. 
> 
> You do my code an injustice there... I've put comments in mine.

Put it this way: my one is readable enough that I don't need to add comments ;).
More serously I may be biased in the readability point, but if I read
lib/rwsem.c and include/linux/rwsem*.h before applying the patch and after
applying the patch I have no dobut on what I want to run on my computer (I'm
not talking about asm-i386/rwsem*.h of course).  You are of course free to keep
your one if you prefer it but I don't see technical arguments for that
decision.

BTW, Andrew Morton is been so kind to audit my code and he noticed my patch was
not allowing up_read/up_write to be called from irqs because I forgotten an
_irq in the down_write (thanks Andrew!). I didn't catched during regression
testing because nobody in the whole 2.4.4pre3 kernel is running either up_read
or up_write from irq/softirq context, so it cannot destabilize the runtime but
nevertheless that was a leftover also shared by the ppc port code in 2.4.3. So
if you just started the alpha regression testing on the -1 revision go head and
don't stop because you are reading this.  just for Linus I released a -2 new
version of the patch but again upgrade is not necessary for production (at
least unless you use drivers outside the kernel tree that could release rwsem
from irq/softirq context).

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre3/rwsem-generic-2

I didn't exported rwsem.c if CONFIG_RWSEM_GENERIC is set to n as suggested
by Christoph yet because the old code couldn't be buggy and it's not obvious to
me that the other way around is correct (Christoph are you sure we can export an
object file that is not even compiled/generated? If answer is yes the export
mechanism must be smart enough to discard that file if not present but I'm not
sure if that's the case ;)

thanks for your comments.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 20:49 ` Andrea Arcangeli
@ 2001-04-17 21:29   ` Christoph Hellwig
  2001-04-17 22:06     ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2001-04-17 21:29 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, dhowells

Hi Andrea,

In article <20010417224933.E31982@athlon.random> you wrote:
> I didn't exported rwsem.c if CONFIG_RWSEM_GENERIC is set to n as suggested
> by Christoph yet because the old code couldn't be buggy and it's not obvious to
> me that the other way around is correct (Christoph are you sure we can export an
> object file that is not even compiled/generated? If answer is yes the export
> mechanism must be smart enough to discard that file if not present but I'm not
> sure if that's the case ;)

Yes! All the objects in export-objs only get additional depencies in
Rules.make - but if they do not get compiled at all that depencies won't
matter either.  All other makefile work this way, btw.

In my first mail I forgot that the makefile can be optimized even
further, the hunk should look like this:
(NOTE: the patch is handwritten, no apply gurantee)

diff -urN 2.4.4pre3/lib/Makefile rwsem/lib/Makefile
--- 2.4.4pre3/lib/Makefile      Sat Apr 14 15:21:29 2001
+++ rwsem/lib/Makefile  Tue Apr 17 21:58:57 2001
@@ -10,10 +10,12 @@

 export-objs := cmdline.o rwsem.o

-obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o rwsem.o
+obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o

 ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
   obj-y += dec_and_lock.o
 endif

+obj-$(CONFIG_GENERIC_RWSEM)	+= rwsem.o
+
 include $(TOPDIR)/Rules.make



	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 21:29   ` Christoph Hellwig
@ 2001-04-17 22:06     ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 22:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, dhowells

On Tue, Apr 17, 2001 at 11:29:23PM +0200, Christoph Hellwig wrote:
> Yes! All the objects in export-objs only get additional depencies in
> Rules.make - but if they do not get compiled at all that depencies won't
> matter either.  All other makefile work this way, btw.

ok thanks for the confirm.

> In my first mail I forgot that the makefile can be optimized even
> further, the hunk should look like this:

Yes, I didn't used the -$() form only because I thought I had to make
conditional the export-objs too.

I applied it.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
@ 2001-04-17 23:54 D.W.Howells
  2001-04-18 20:49 ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: D.W.Howells @ 2001-04-17 23:54 UTC (permalink / raw)
  To: andrea; +Cc: dhowells, linux-kernel

> It is 36bytes. and on 64bit archs the difference is going to be less. 

You're right - I can't add up (must be too late at night), and I was looking 
at wait_queue not wait_queue_head. I suppose that means my implementations 
are then 20 and 16 bytes respectively.

On 64-bit archs the difference will be less, depending on what a "long" is.

> The real waste is the lock of the waitqueue that I don't need, so I should 
> probably keep two list_head in the waitqueue instead of using the 
> wait_queue_head_t and wake_up_process by hand. 

Perhaps you should steal my wake_up_ctx() idea. That means you only need one 
wait queue, and you use bits in the wait_queue flags to note which type of 
waiter is at the front of the queue.

You can then say "wake up the first thing at the front of the queue if it is 
a writer"; and you can say "wake up the first consequtive bunch of things at 
the front of the queue, provided they're all readers" or "wake up all the 
readers in the queue".

> The fast path has to be as fast as yours, if not then the only variable
> that can make difference is the fact I'm not inlining the fast path because
> it's not that small, in such a case I should simply inline the fast path

My point exactly... It can't be as fast because it's _all_ out of line. 
Therefore you always have to go through the overhead of a function call, 
whatever that entails on the architecture of choice.

> I don't care about the scalability of the slow path and I think the slow
> path may even be faster than yours because I don't run additional
> unlock/lock and memory barriers and the other cpus will stop dirtifying my
> stuff after their first trylock until I unlock. 

Except for the optimised case, you may be correct on an SMP configured kernel 
(for a UP kernel, spinlocks are nops).

However! mine runs for as little time as possible with spinlocks held in the 
generic case, and, perhaps more importantly, as little time as possible with 
interrupts disabled.

One other thing: should you be using spin_lock_irqsave() instead of 
spin_lock_irq() in your down functions? I'm not sure it's necessary, however, 
since you probably shouldn't be sleeping if you've got the interrupts 
disabled (though schedule() will cope).

> If you have time to benchmark I'd be interested to see some number. But
> anyways my implementation was mostly meant to be obviously right and
> possible to ovverride with per-arch algorithms

I'll have a go tomorrow evening. It's time to go to bed now I think:-)

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 23:54 D.W.Howells
@ 2001-04-18 20:49 ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-18 20:49 UTC (permalink / raw)
  To: D . W . Howells; +Cc: dhowells, linux-kernel

On Wed, Apr 18, 2001 at 12:54:41AM +0100, D . W . Howells wrote:
> > It is 36bytes. and on 64bit archs the difference is going to be less. 
> 
> You're right - I can't add up (must be too late at night), and I was looking 
> at wait_queue not wait_queue_head. I suppose that means my implementations 
> are then 20 and 16 bytes respectively.
> 
> On 64-bit archs the difference will be less, depending on what a "long" is.

yes. I actually modified my implementation and now I gone below the size of a
waitqueue because I don't need its lock. The rw_semaphore now is only 16
bytes in size even in SMP (while your generic rw_semaphore is larger than 16 in
smp).  I also changed the wakeup mechanism to do the same fair logic as yours
but I'm not changing any common code (so it's complete FIFO with a wake up all
contigous readers behaviour). It's also now completly fair as the fast path
will go to sleep if anybody is registered in a waitqueue so it has the property
that we are still missing in the non rw semaphores. And it still seems quite
obvious while reading it.

> Perhaps you should steal my wake_up_ctx() idea. That means you only need one 

I now stolen the wakeup logic but I reimplemented it internally to the rwsem.c
without involving the external visible waitqueue mechanism (short version: no
changes to sched.c and wait.h).

> You can then say "wake up the first thing at the front of the queue if it is 
> a writer"; and you can say "wake up the first consequtive bunch of things at 
> the front of the queue, provided they're all readers" or "wake up all the 
> readers in the queue".

I preferred not to generalize that in the wake_up_ctx way but yes I hardcoded
that in a function that knows what to do with a simple list_head that is
even ligther and faster.

> My point exactly... It can't be as fast because it's _all_ out of line. 

Ok, I also inlined the fast path. Of course if you're not running out of icache
(like in a benchmark dedicated to the rwsem) inlining the fast path is
faster... (previously I was talking about real world misc load where we can more
easily run out of icache)

> However! mine runs for as little time as possible with spinlocks held in the 
> generic case, and, perhaps more importantly, as little time as possible with 
> interrupts disabled.

But it reacquires other locks for the waitqueue and it clears irqs again too
and it will ping pong cachelines in the wait event interface. So it's _slower_
and not faster. Making the locks more granular makes sense when the contention
on the lock goes away after you make it granular, but you are using two
spinlocks instead of one and you still have contention in the same slow path
on the second spinlock and wait_even runtime, while I only have contention in
the first one and that's why I'm more efficient (see numbers below).

> One other thing: should you be using spin_lock_irqsave() instead of 
> spin_lock_irq() in your down functions? I'm not sure it's necessary, however, 

It's not necessary to save flags because they can sleep.

BTW, your rwsem-spinlock.h forgets the clear irqs in down_* and to clear irqs
and save flags in the up_*! So my spinlock are penalized as the cli/sti pairs
are not that light and you are providing weaker wakeup semantics than me.

My new implementation only handle up to 2^31 concurrent readers, as usual
unlimited sleepers in the slow paths and down_read is not anymore recursive
because of the guaranteed total fifo behaviour.  This scenario will deadlock
while it was working fine with the previous patches on my ftp area:

	task0		task1
	down_read(sem)
			down_write(sem)
	down_read(sem)

new patch is here:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre3/rwsem-generic-5

And now I compared 2.4.4pre3aa3 - 00_rwsem-generic-1 with 2.4.4pre3aa3 - 00_rwsem-generic-1 + rwsem-generic-5,
that is the same as comparing vanilla 2.4.4pre3 with vanilla 2.4.4pre3 + rwsem-generic-5.

I wrote this rwsem stresser (if you have any bug in the rwsem this stresser will trigger it
almost immediatly).

/*
 *  rw_semaphore benchmark (use with 2.4.3 or more recent kernels
 *  that uses down_read() in the page fault and down_write() in mmap).
 *
 *  Copyright (C) 2001  Andrea Arcangeli <andrea@suse.de> SuSE
 */

#include <pthread.h>
#include <stdio.h>
#include <asm/page.h>
#include <sys/mman.h>
#include <asm/system.h>

#define NR_THREADS 50
#define RWSEM_LOOPS 500
#define READ_DOWN_PER_WRITE_DOWN 4

static int start;

void * rwsemflood(void * foo)
{
	int i;
	pthread_mutex_t * mutex = (pthread_mutex_t *) foo;

	if (pthread_mutex_lock(mutex))
		perror("pthread_mutex_lock"), exit(1);

	for (i = 0; i < RWSEM_LOOPS; i++) {
		volatile char * mem;
		int i;

		mem = mmap(NULL, PAGE_SIZE * READ_DOWN_PER_WRITE_DOWN,
			   PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		if (mem == MAP_FAILED)
			perror("mmap"), exit(1);
		for (i = 0; i < PAGE_SIZE * READ_DOWN_PER_WRITE_DOWN; i += PAGE_SIZE)
			*(mem+i);
		if (munmap((char *)mem, PAGE_SIZE) < 0)
			perror("munmap"), exit(1);
	}
	pthread_exit(NULL);
}

main()
{
	pthread_t thread[NR_THREADS];
	pthread_mutex_t mutex[NR_THREADS];
	int i;

	for (i = 0; i < NR_THREADS; i++) {
		pthread_mutex_init(&mutex[i], NULL);
		if (pthread_mutex_lock(&mutex[i]))
			perror("pthread_mutex_lock"), exit(1);
	}

	for (i = 0; i < NR_THREADS; i++)
		if (pthread_create(&thread[i], NULL , rwsemflood, &mutex[i]) < 0)
			perror("pthread_create"), exit(1);

	for (i = 0; i < NR_THREADS; i++)
		if (pthread_mutex_unlock(&mutex[i]))
			perror("pthread_mutex_unlock"), exit(1);

	for (i = 0; i < NR_THREADS; i++)
		if (pthread_join(thread[i], NULL))
			perror("pthread_join"), exit(1);
}

And here are the numbers with the plain 2.4.4pre3 rwsemaphores that are
implemented in asm with the kernel configured for PII and the wakeup that
cannot happen in an irq/softirq. The hardware is a 2-way SMP PII.

andrea@laser:~ > for i in 1 2 3 4; do time ./rwsem ;done

real    0m51.587s
user    0m0.100s
sys     0m52.770s

real    0m50.476s
user    0m0.100s
sys     0m50.730s

real    0m51.502s
user    0m0.110s
sys     0m53.110s

real    0m50.437s
user    0m0.080s
sys     0m51.070s

and now here it is the same benchmark run with rwsem-generic-5:

ndrea@laser:~ > for i in 1 2 3 4; do time ./rwsem ;done

real    0m50.035s
user    0m0.080s
sys     0m51.430s

real    0m50.636s
user    0m0.090s
sys     0m51.100s

real    0m50.038s
user    0m0.050s
sys     0m50.640s

real    0m50.655s
user    0m0.060s
sys     0m50.800s

as you can see despite it was an unfair comparison my implementation was still
a bit faster or at least running at the same speed.  And yes, this only
benchmark the slow path, the fast path is not easy to measure from userspace
and since I now inlined the fast path it must not run slower than yours. If you
have more interesting bench go ahead of course.

I think you should now agree on my generic rwsemaphore implementation.

About your last patch where you try to change all archs to use your generic
implementation in pre3, it still breaks during compilation and I dislike
that long cryptic names:

	CONFIG_RWSEM_GENERIC_SPINLOCK
	CONFIG_RWSEM_XCHGADD_ALGORITHM

I don't see why you didn't call them CONFIG_RWSEM_GENERIC and
CONFIG_RWSEM_ATOMIC_RETURN respectively (as I suggested originally).

I now recommend anyone with an alpha to use 2.4.4pre3 with those
two patches applied:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre3/rwsem-generic-5

(they can be applied also on ia32 kernels of course)

About your comment on the atomic_*_return vs spinlock in the fast path, the
atomic_*_return way is obviously much faster and shorter than the spinlock
version on the alpha and it also saves 8 bytes in the size of the rw_semaphore
compared to the generic implementation (and I'm sure the same is valid for the
other RISC chips that provides read locked store conditional mechanism to
implement atomic updates in memory).  That's why I suggested to move it in the
common code (if it would be slower not even the alpha would be trying to use it
:). It's just that by sharing it we increase the userbase of the brainer part.

Also on any 64bit arch we can provide 2^32 readers and 2^32 writers at the same
time without the need of a spinlock. We could do the same also on >=586 using
chmxchg8b but I'm not sure if that would be a great idea.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
@ 2001-04-17 21:48 D.W.Howells
  2001-04-17 23:06 ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: D.W.Howells @ 2001-04-17 21:48 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel, dhowells, torvalds

> I am sure ppc couldn't race (at least unless up_read/up_write were excuted 
> from irq/softnet context and that never happens in 2.4.4pre3, see below ;). 

This is not actually using the rwsem code I wrote at the moment.

> And incidentally the above is what (I guess Richard) did on the alpha and
> that should really go into common code instead of having
> asm-i386/rwsem-xadd.h asm-alpha/rwsem.h etcc.etc... just implement
> atomic_inc_return using xadd in asm-i386/atomic.h, that's much better
> design IMHO. 

I disagree... you want such primitives to be as efficient as possible. The 
whole point of having asm/xxxx.h files is that you can stuff them full of 
dirty tricks specific to certain architectures.

> That can obviously be done for example with C code like this: 
>        count = atomic_inc_return(&sem->count); 
>        if (__builtin_expect(count == 0, 0)) 
>                slow_path() 
>
> The above is the perfect C implementation IMHO

But not so efficient since it _has_ to take a jump unless the compiler can 
emit code in alternative text sections when it seems appropriate. Plus, you 
have to test count's value. XADD sets EFLAGS based on the result in memory, 
something that allows all but one fastpath to be two instructions in length 
(the one that isn't is three).

But, yes, there should probably be two generic cases: one implemented with a 
spinlock in the rwsem struct (as I supply) and one implemented using 
atomic_add_return(). Note, however! atomic_add_return() is not necessarily 
implemented efficiently: if it involves a cmpxchg loop (as many seem to), 
then that is really quite inefficient, and may be better done as a spinlock 
anyway.

I've had a look at your implementation... It seems to hold the spinlocks for 
an awfully long time... specifically around the local variable initialisation 
in the 'failed' functions. Don't forget that the compiler can't reorder these 
because they're inside the spinlock. I would, if I were you, fold the 
'failed' functions into the main ones to avoid this problem.

Your rw_semaphore structure is also rather large: 46 bytes without debugging 
stuff (16 bytes apiece for the waitqueues and 12 bytes for the rest). 
Contrast that with mine: generic is 24 bytes and the i386-xadd optimised is 
20.

Admittedly, though, yours is extremely simple and easy to follow, but I don't 
think it's going to be very fast.

Of course, I still prefer mine... smaller, faster, more efficient:-)

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 21:48 D.W.Howells
@ 2001-04-17 23:06 ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 23:06 UTC (permalink / raw)
  To: D . W . Howells; +Cc: linux-kernel, dhowells, torvalds

On Tue, Apr 17, 2001 at 10:48:02PM +0100, D . W . Howells wrote:
> I disagree... you want such primitives to be as efficient as possible. The 
> whole point of having asm/xxxx.h files is that you can stuff them full of 
> dirty tricks specific to certain architectures.

Of course you always have the option to override completly and you should
on x86 (providing an API for total override is the main object of my patch).

> I've had a look at your implementation... It seems to hold the spinlocks for 
> an awfully long time... specifically around the local variable initialisation 

My point for not unlocking is that unlocking and locking back another spinlock
for the waitqueue and using the wait_even interface for serializing the slow
path is expensive and generates more cacheline ping pong between cpus.  And
quite frankly I don't care about the scalability of the slow path so if the
slow path is simpler and slower I'm happy with it.

> Your rw_semaphore structure is also rather large: 46 bytes without debugging 

It is 36bytes. and on 64bit archs the difference is going to be less.

> stuff (16 bytes apiece for the waitqueues and 12 bytes for the rest). 

The real waste is the lock of the waitqueue that I don't need, so I should
probably keep two list_head in the waitqueue instead of using the
wait_queue_head_t and wake_up_process by hand.

> Admittedly, though, yours is extremely simple and easy to follow, but I don't 
> think it's going to be very fast.

The fast path has to be as fast as yours, if not then the only variable that
can make difference is the fact I'm not inlining the fast path because it's not
that small, in such a case I should simply inline the fast path, I don't care
about the scalability of the slow path and I think the slow path may even be
faster than yours because I don't run additional unlock/lock and memory
barriers and the other cpus will stop dirtifying my stuff after their first
trylock until I unlock.

If you have time to benchmark I'd be interested to see some number. But anyways
my implementation was mostly meant to be obviously right and possible to
ovverride with per-arch algorithms.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Alpha "process table hang"
@ 2001-04-11 17:57 Bob McElrath
       [not found] ` <E14nOzo-0007Ew-00@the-village.bc.nu>
  0 siblings, 1 reply; 17+ messages in thread
From: Bob McElrath @ 2001-04-11 17:57 UTC (permalink / raw)
  To: Peter Rival; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

Peter Rival [frival@zk3.dec.com] wrote:
> Hmpf.  Haven't seen this at all on any of the Alphas that I'm running.  What
> exact system are you seeing this on, and what are you running when it happens?

This is a LX164 system, 533 MHz.

I have a hunch it's related to the X server because I've seen it many,
many times while sitting at the console (in X), but never when I'm
logged on remotely.  I've seen it with both XFree86 3.3.6, 4.0.2, 4.0.3,
Matrox Millenium II video card, 8MB.

I'm also experiencing regular X crashes, but the process-table-hang
doesn't occur at the same time as an X crash (or v/v).  I sent a patch
to xfree86@xfree86.org a few days ago that seemed to fix (one of) the X
crashes (in the mga driver, ask if you want details).

(But since the X server shouldn't have the ability to corrupt the
kernel's process list, there has to be a problem in the kernel
somewhere)

Note that this system was completely stable with 2.2 kernels.

Cheers,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <E14nOzo-0007Ew-00@the-village.bc.nu>]

* Re: Alpha "process table hang"
       [not found] ` <E14nOzo-0007Ew-00@the-village.bc.nu>
@ 2001-04-13 13:48   ` Bob McElrath
  2001-04-17 15:07     ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Bob McElrath @ 2001-04-13 13:48 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1276 bytes --]

Alan Cox [alan@lxorguk.ukuu.org.uk] wrote:
> > (But since the X server shouldn't have the ability to corrupt the
> > kernel's process list, there has to be a problem in the kernel
> > somewhere)
> 
> The X server has enough priviledge to corrupt anything. Its unlikely to and
> I do agree they two are likely to be unrelated.

Well, nix that idea.  I just fell back to 2.2.19, and I see neither the
X crash nor the process-table-hang crash (which rules out hardware
problems, thankfully).  The X crash is also kernel related, it seems.

I'm using XFree86 4.0.3 with the mga driver.  It hangs in mga_storm.c on
a line that looks like:
    while (MGAISBUSY()) {}
where:
    #define MGAISBUSY() (INREG8(MGAREG_Status + 2) & 0x01)

Killing and restarting X causes it to immediately hang in the same
place.  (I have to reboot to recover the console)

This would seem to be PCI related.  Have any significant PCI code
changes been made to the alpha architecture, especially pyxis or
cabriolet code?  I see that arch/alpha/kernel has been totally
rearranged, but since this doesn't crash in kernel code, I have no idea
how to debug it.

Thanks,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* generic rwsem [Re: Alpha "process table hang"]
  2001-04-13 13:48   ` Bob McElrath
@ 2001-04-17 15:07     ` Andrea Arcangeli
  2001-04-17 15:28       ` Bob McElrath
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 15:07 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-kernel, Peter Rival, Linus Torvalds, David Howells

On Fri, Apr 13, 2001 at 08:48:05AM -0500, Bob McElrath wrote:
> Alan Cox [alan@lxorguk.ukuu.org.uk] wrote:
> > > (But since the X server shouldn't have the ability to corrupt the
> > > kernel's process list, there has to be a problem in the kernel
> > > somewhere)
> > 
> > The X server has enough priviledge to corrupt anything. Its unlikely to and
> > I do agree they two are likely to be unrelated.
> 
> Well, nix that idea.  I just fell back to 2.2.19, and I see neither the
> X crash nor the process-table-hang crash (which rules out hardware
> problems, thankfully).  The X crash is also kernel related, it seems.
> 
> I'm using XFree86 4.0.3 with the mga driver.  It hangs in mga_storm.c on
> a line that looks like:
>     while (MGAISBUSY()) {}
> where:
>     #define MGAISBUSY() (INREG8(MGAREG_Status + 2) & 0x01)
> 
> Killing and restarting X causes it to immediately hang in the same
> place.  (I have to reboot to recover the console)
> 
> This would seem to be PCI related.  Have any significant PCI code
> changes been made to the alpha architecture, especially pyxis or
> cabriolet code?  I see that arch/alpha/kernel has been totally
> rearranged, but since this doesn't crash in kernel code, I have no idea
> how to debug it.

It seems it was an SMP race in the rw alpha semaphores. I rewrote the
rwsemaphores starting from my first implementation of them in C that is now
adpoted by the ppc port (I added some scalability and locking optimization),
and made them generic dropping all the rwsem stuff that is been included into
2.4.4pre[23] (the generic rwsemaphores in those kernels is broken, try to use
them in other archs or x86 and you will notice) and I cannot reproduce the hang
any longer.

My generic rwsem should be also cleaner and faster than the generic ones in
2.4.4pre3 and they can be turned off completly so an architecture can really
takeover with its own asm implementation (while with the 2.4.4pre3 design this
is obviously not possible because lib/rwsem.c compilation isn't conditional and
such file knows the internals of the struct rw_semaphore).

In the below generic implementation of the rw sem the max limit of concurrent
readers in the critical section is 2^sizeof(int) and down_read is recursive.
There's no limit of tasks sleeping in the slow path either by down_read or
down_write. The waitqueue wakeups are done without any additional lock (the
lock in the waitqueue is unused).

So please try to reproduce the hang with 2.4.4pre3 with those two
patches applied:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1

All alpha users should run with at least the above two patches applied
to compile their tree and to make sure to have rock solid rwsemaphores.

Both patches are suggested for inclusion, the arch optimizations can be done on
top of the cleaner and arch friendly rwsem code (just copy the asm files from
2.4.4pre3 and set CONFIG_GENERIC_RWSEM to `n') and the current lib/rwsem.c can be
moved in arch/i386/kernel without any problem. I didn't do that myself because
I wasn't going to audit every line of the x86 asm rwsem right now and I only
wanted obviously right stuff into my tree but I'd appreciate if David could do
that. Note that besides my patch drops the asm stuff I don't want to reject the asm
based implementation in the long run, but I only care to proivide a solid
and clean generic implementation that can be used as a fallback anytime on any
arch by only changing a configuration option.

The alpha-numa patch also fixes some mm bug in the common code.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 15:07     ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
@ 2001-04-17 15:28       ` Bob McElrath
  2001-04-19 16:21         ` Bob McElrath
  2001-04-17 15:45       ` Christoph Hellwig
  2001-04-17 16:59       ` David Howells
  2 siblings, 1 reply; 17+ messages in thread
From: Bob McElrath @ 2001-04-17 15:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Peter Rival, Linus Torvalds, David Howells

[-- Attachment #1: Type: text/plain, Size: 1051 bytes --]

Andrea Arcangeli [andrea@suse.de] wrote:
>
> So please try to reproduce the hang with 2.4.4pre3 with those two
> patches applied:
> 
> 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
> 
> All alpha users should run with at least the above two patches applied
> to compile their tree and to make sure to have rock solid rwsemaphores.

Excellent!  I'll give it a try.

Note that I recently saw the X hang with the 2.2.19 kernel, but I still
haven't seen the process-table-hang with 2.2.19 (about 4 days running
with 2.2.19).  It is *far* easier to get the X hang in 2.4 than 2.2.
(minutes for 2.4, days for 2.2)  Also note that this is not an SMP
machine (single processor 21164a, LX164 mobo).

But I'll apply your patch tonight and let you know the results.

Cheers,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 15:28       ` Bob McElrath
@ 2001-04-19 16:21         ` Bob McElrath
  2001-04-19 17:17           ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Bob McElrath @ 2001-04-19 16:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1391 bytes --]

Bob McElrath [mcelrath+linux@draal.physics.wisc.edu] wrote:
> Andrea Arcangeli [andrea@suse.de] wrote:
> >
> > So please try to reproduce the hang with 2.4.4pre3 with those two
> > patches applied:
> > 
> > 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> > 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
> > 
> > All alpha users should run with at least the above two patches applied
> > to compile their tree and to make sure to have rock solid rwsemaphores.
> 
> Excellent!  I'll give it a try.
> 
> Note that I recently saw the X hang with the 2.2.19 kernel, but I still
> haven't seen the process-table-hang with 2.2.19 (about 4 days running
> with 2.2.19).  It is *far* easier to get the X hang in 2.4 than 2.2.
> (minutes for 2.4, days for 2.2)  Also note that this is not an SMP
> machine (single processor 21164a, LX164 mobo).
> 
> But I'll apply your patch tonight and let you know the results.

Status report:
I'm at 2 days uptime now, and have not seen the process-table-hang.
Looks like this fixed it.  Previously I would get a hang in the first
day or so.  I'm using your alpha-numa-3 and rwsem-generic-4 against
2.4.4pre3.

Cheers,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-19 16:21         ` Bob McElrath
@ 2001-04-19 17:17           ` Andrea Arcangeli
  2001-04-23 23:27             ` Bob McElrath
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-19 17:17 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-kernel

On Thu, Apr 19, 2001 at 11:21:17AM -0500, Bob McElrath wrote:
> I'm at 2 days uptime now, and have not seen the process-table-hang.
> Looks like this fixed it.  Previously I would get a hang in the first
> day or so.  I'm using your alpha-numa-3 and rwsem-generic-4 against
> 2.4.4pre3.

good, thanks for the report.

BTW, if you upgrade to 2.4.4pre4 you can apply those two patches:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_alpha-numa-4
	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6

really the first is not necessary anymore unless you're using a wildfire. The
second also resurrect the optimized rwsemaphores for all archs but alpha and
ia32.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-19 17:17           ` Andrea Arcangeli
@ 2001-04-23 23:27             ` Bob McElrath
  2001-04-23 23:40               ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Bob McElrath @ 2001-04-23 23:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

Andrea Arcangeli [andrea@suse.de] wrote:
> On Thu, Apr 19, 2001 at 11:21:17AM -0500, Bob McElrath wrote:
> > I'm at 2 days uptime now, and have not seen the process-table-hang.
> > Looks like this fixed it.  Previously I would get a hang in the first
> > day or so.  I'm using your alpha-numa-3 and rwsem-generic-4 against
> > 2.4.4pre3.
> 
> good, thanks for the report.
> 
> BTW, if you upgrade to 2.4.4pre4 you can apply those two patches:
> 
> 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_alpha-numa-4
> 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6
> 
> really the first is not necessary anymore unless you're using a wildfire. The
> second also resurrect the optimized rwsemaphores for all archs but alpha and
> ia32.

Well, take that back, I just got it to hang.  Again, this is 2.4.4pre3
with alpha-numa-3 and rwsem-generic-4.  I saw it upon starting mozilla.
I also saw some scary filesystem errors that may or may not be related:
    Apr 23 18:09:40 draal kernel: EXT2-fs error (device sd(8,2)): 
        ext2_new_block: Free blocks count corrupted for block group 252 

There has been a lot of discussion on the topic of rwsems (that,
admittedly, I haven't followed very closely).  It looks like
rwsem-generic-6 is the latest from Andrea, I'll build a new 2.4.4pre4
kernel with these patches and let you know the results.  Have you made
changes between rwsem-generic-4 and rwsem-generic-6 that would
fix/prevent a deadlock?

Let me know if there are any useful tests I could perform.  Would it be
useful for me to run the rwsem benchmarks you've been using?  Could
these detect a deadlock situation?

Cheers,
-- Bob

Bob McElrath (rsmcelrath@students.wisc.edu) 
Univ. of Wisconsin at Madison, Department of Physics

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-23 23:27             ` Bob McElrath
@ 2001-04-23 23:40               ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-23 23:40 UTC (permalink / raw)
  To: Bob McElrath; +Cc: linux-kernel

On Mon, Apr 23, 2001 at 06:27:23PM -0500, Bob McElrath wrote:
> Well, take that back, I just got it to hang.  Again, this is 2.4.4pre3
> with alpha-numa-3 and rwsem-generic-4.  I saw it upon starting mozilla.
> I also saw some scary filesystem errors that may or may not be related:
>     Apr 23 18:09:40 draal kernel: EXT2-fs error (device sd(8,2)): 
>         ext2_new_block: Free blocks count corrupted for block group 252 

That is probably unrelated to the ps hang. I suspect you are been bitten by the
ext2 metadata corruption (2.4.4pre2 was just fixed but previous kernel wasn't).

> rwsem-generic-6 is the latest from Andrea, I'll build a new 2.4.4pre4
> kernel with these patches and let you know the results.  Have you made

Yes, that's safe.

> changes between rwsem-generic-4 and rwsem-generic-6 that would
> fix/prevent a deadlock?

No, but I think they are two separate issues.

> Let me know if there are any useful tests I could perform.  Would it be
> useful for me to run the rwsem benchmarks you've been using?  Could
> these detect a deadlock situation?

yes to be sure you can run it without my patch and see if it hangs (I never
tried that myself, but I was able to reproduce the ps hang quite easily and it
was quite obviously due the rwsemaphores and it gone away completly after I
used the generic semaphores).

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 15:07     ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
  2001-04-17 15:28       ` Bob McElrath
@ 2001-04-17 15:45       ` Christoph Hellwig
  2001-04-17 16:59       ` David Howells
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2001-04-17 15:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

In article <20010417170717.H2696@athlon.random> you wrote:
> My generic rwsem should be also cleaner and faster than the generic ones in
> 2.4.4pre3 and they can be turned off completly so an architecture can really
> takeover with its own asm implementation (while with the 2.4.4pre3 design this
> is obviously not possible because lib/rwsem.c compilation isn't conditional and
> such file knows the internals of the struct rw_semaphore).
>
> In the below generic implementation of the rw sem the max limit of concurrent
> readers in the critical section is 2^sizeof(int) and down_read is recursive.
> There's no limit of tasks sleeping in the slow path either by down_read or
> down_write. The waitqueue wakeups are done without any additional lock (the
> lock in the waitqueue is unused).
>
> So please try to reproduce the hang with 2.4.4pre3 with those two
> patches applied:

> 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> 	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1

Hey it looks like someone finally fixed the rwsems :P

A little comment on the path:

In lib/Makefile you should _always_ add rwsem.o the export-objs, not only if
CONFIG_GENERIC_RWSEM is 'y' - that's the whole idea behind export-objs.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 15:07     ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
  2001-04-17 15:28       ` Bob McElrath
  2001-04-17 15:45       ` Christoph Hellwig
@ 2001-04-17 16:59       ` David Howells
  2001-04-17 17:55         ` Andrea Arcangeli
  2 siblings, 1 reply; 17+ messages in thread
From: David Howells @ 2001-04-17 16:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Bob McElrath, linux-kernel, Peter Rival, Linus Torvalds,
	David Howells

Andrea,

How did you generate the 00_rwsem-generic-1 patch? Against what did you diff?
You seem to have removed all the optimised i386 rwsem stuff... Did it not work
for you?

> (the generic rwsemaphores in those kernels is broken, try to use them in
> other archs or x86 and you will notice) and I cannot reproduce the hang any
> longer.

Can you supply a test case that demonstrates it not working?

> My generic rwsem should be also cleaner and faster than the generic ones in
> 2.4.4pre3 and they can be turned off completly so an architecture can really
> takeover with its own asm implementation.

I quick look says it shouldn't be faster (inline functions and all that).

However, I think you might be right about it being too dependent on the
algorithm I put in, and that is easy to change.

> (while with the 2.4.4pre3 design this is obviously not possible because
> lib/rwsem.c compilation isn't conditional and such file knows the internals
> of the struct rw_semaphore).

Could be very easily changed.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: generic rwsem [Re: Alpha "process table hang"]
  2001-04-17 16:59       ` David Howells
@ 2001-04-17 17:55         ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2001-04-17 17:55 UTC (permalink / raw)
  To: David Howells; +Cc: Bob McElrath, linux-kernel, Peter Rival, Linus Torvalds

On Tue, Apr 17, 2001 at 05:59:13PM +0100, David Howells wrote:
> Andrea,
> 
> How did you generate the 00_rwsem-generic-1 patch? Against what did you diff?

2.4.4pre3 from kernel.org.

> You seem to have removed all the optimised i386 rwsem stuff... Did it not work
> for you?

As said the design of the framework to plugin per-arch rwsem implementation
isn't flexible enough and the generic spinlocks are as well broken, try to use
them if you can (yes I tried that for the alpha, it was just a mess and it was
more productive to rewrite than to fix).

> > (the generic rwsemaphores in those kernels is broken, try to use them in
> > other archs or x86 and you will notice) and I cannot reproduce the hang any
> > longer.
> 
> Can you supply a test case that demonstrates it not working?

#define __RWSEM_INITIALIZER(name,count) \
				 ^^^^^
{ RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
  ^^^^^^^^^^^^^^^^^^^^
	__WAIT_QUEUE_HEAD_INITIALIZER((name).wait) \
	__RWSEM_DEBUG_INIT __RWSEM_DEBUG_MINIT(name) }

#define __DECLARE_RWSEM_GENERIC(name,count) \
	struct rw_semaphore name = __RWSEM_INITIALIZER(name,count)
							    ^^^^^

#define DECLARE_RWSEM(name) __DECLARE_RWSEM_GENERIC(name,RW_LOCK_BIAS)
							 ^^^^^^^^^^^^
#define DECLARE_RWSEM_READ_LOCKED(name) __DECLARE_RWSEM_GENERIC(name,RW_LOCK_BIAS-1)
								     ^^^^^^^^^^^^^^
#define DECLARE_RWSEM_WRITE_LOCKED(name) __DECLARE_RWSEM_GENERIC(name,0)

> > My generic rwsem should be also cleaner and faster than the generic ones in
> > 2.4.4pre3 and they can be turned off completly so an architecture can really
> > takeover with its own asm implementation.
> 
> I quick look says it shouldn't be faster (inline functions and all that).

The spinlock based generic semaphores are quite large, so I don't want to waste
icache because of that, a call asm instruction isn't that costly (it's
obviously _very_ costly for a spinlock because a spinlock is 1 asm instruction
in the fast path, but not for a C based rwsem). But the real point is the
locking and the waitqueue mechanism that is superior in my implementation (not
the non inlining part).

And it's also more readable and it's not bloated code, 65+110 lines compared to
156+148+174 lines.

andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3aa/include/linux/rwsem.h 
     65 2.4.4pre3aa/include/linux/rwsem.h
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3aa/lib/rwsem.c           
    110 2.4.4pre3aa/lib/rwsem.c
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/lib/rwsem.c   
    156 2.4.4pre3/lib/rwsem.c
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/include/linux/rwsem.h 
    148 2.4.4pre3/include/linux/rwsem.h
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/include/linux/rwsem-spinlock.h 
    174 2.4.4pre3/include/linux/rwsem-spinlock.h
andrea@athlon:~/devel/kernel > 

I suggest you to apply my patch, read my implementation, tell me if you think
it's not more efficient and more readable, and then to set CONFIG_RWSEM_GENERIC
to n in arch/i386/config.in and to plugin your asm code taken from vanilla
2.4.4pre3 into include/asm-i386/rwsem.h and arch/i386/kernel/rwsem.c then we're
done, and if someone has problems with the asm code with a one liner he can
fallback in a obviously right and quite efficient implementation [even if the
fastpath is not 1 inlined asm instruction] (all archs will be allowed to do
that transparently to the arch dependent code). Same can be done on alpha and
other archs, resurrecting the inlined fast paths based on the atomic_add_return
APIs is easy too. Infact I'd _recommend_ for archs that can implement the
atomic_add_return and friends (included ia32 with xadd on >=586) to
implement the "fast path" version of the rwsem it in C too in the common code
selectable with a CONFIG_RWSEM_ATOMIC_RETURN (plus we add
linux/include/linux/compiler.h with the builtin_expect macro to be able to
define the fast path in C too). Most archs have the atomic_*_return and friends
and they will be able share completly the common code and have rwsem fast paths
as fast as ia32 without risk to introduce bugs in the port. The more we share
the less risk there is. After CONFIG_RWSEM_ATOMIC_RETURN is implemented we can
probably drop the file asm-i386/rwsem-xadd.h.

Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2001-04-23 23:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-04-17 19:18 generic rwsem [Re: Alpha "process table hang"] D.W.Howells
2001-04-17 20:49 ` Andrea Arcangeli
2001-04-17 21:29   ` Christoph Hellwig
2001-04-17 22:06     ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2001-04-17 23:54 D.W.Howells
2001-04-18 20:49 ` Andrea Arcangeli
2001-04-17 21:48 D.W.Howells
2001-04-17 23:06 ` Andrea Arcangeli
2001-04-11 17:57 Alpha "process table hang" Bob McElrath
     [not found] ` <E14nOzo-0007Ew-00@the-village.bc.nu>
2001-04-13 13:48   ` Bob McElrath
2001-04-17 15:07     ` generic rwsem [Re: Alpha "process table hang"] Andrea Arcangeli
2001-04-17 15:28       ` Bob McElrath
2001-04-19 16:21         ` Bob McElrath
2001-04-19 17:17           ` Andrea Arcangeli
2001-04-23 23:27             ` Bob McElrath
2001-04-23 23:40               ` Andrea Arcangeli
2001-04-17 15:45       ` Christoph Hellwig
2001-04-17 16:59       ` David Howells
2001-04-17 17:55         ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox