public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] futex requeueing feature, futex-requeue-2.5.69-D3
@ 2003-05-19  9:31 Ingo Molnar
  2003-05-19 10:10 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Ingo Molnar @ 2003-05-19  9:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Rusty Russell, Ulrich Drepper


the attached patch addresses a futex related SMP scalability problem of
glibc. A number of regressions have been reported to the NTPL mailing list
when going to many CPUs, for applications that use condition variables and
the pthread_cond_broadcast() API call. Using this functionality, testcode
shows a slowdown from 0.12 seconds runtime to over 237 seconds (!)  
runtime, on 4-CPU systems.

pthread condition variables use two futex-backed mutex-alike locks: an
internal one for the glibc CV state itself, and a user-supplied mutex
which the API guarantees to take in certain codepaths. (Unfortunately the
user-supplied mutex cannot be used to protect the CV state, so we've got
to deal with two locks.)

The cause of the slowdown is a 'swarm effect': if lots of threads are
blocked on a condition variable, and pthread_cond_broadcast() is done,
then glibc first does a FUTEX_WAKE on the cv-internal mutex, then down a
mutex_down() on the user-supplied mutex. Ie. a swarm of threads is created
which all race to serialize on the user-supplied mutex. The more threads
are used, the more likely it becomes that the scheduler will balance them
over to other CPUs - where they just schedule, try to lock the mutex, and
go to sleep. This 'swarm effect' is purely technical, a side-effect of
glibc's use of futexes, and the imperfect coupling of the two locks.

the solution to this problem is to not wake up the swarm of threads, but
'requeue' them from the CV-internal mutex to the user-supplied mutex. The
attached patch adds the FUTEX_REQUEUE feature FUTEX_REQUEUE requeues N
threads from futex address A to futex address B:

	sys_futex(uaddr, FUTEX_REQUEUE, nr_wake, NULL, uaddr2);

the 'val' parameter to sys_futex (nr_wake) is the # of woken up threads.  
This way glibc can wake up a single thread (which will take the
user-mutex), and can requeue the rest, with a single system-call.

Ulrich Drepper has implemented FUTEX_REQUEUE support in glibc, and a
number of people have tested it over the past couple of weeks. Here are
the measurements done by Saurabh Desai:

System: 4xPIII 700MHz

 ./cond-perf -r 100 -n 200:        1p       2p         4p
 Default NPTL:                 0.120s   0.211s   237.407s
 requeue NPTL:                 0.124s   0.156s     0.040s

 ./cond-perf -r 1000 -n 100:
 Default NPTL:                 0.276s   0.412s     0.530s
 requeue NPTL:                 0.349s   0.503s     0.550s

 ./pp -v -n 128 -i 1000 -S 32768:
 Default NPTL: 128 games in    1.111s   1.270s    16.894s
 requeue NPTL: 128 games in    1.111s   1.959s     2.426s

 ./pp -v -n 1024 -i 10 -S 32768:
 Default NPTL: 1024 games in   0.181s   0.394s     incompleted 2m+
 requeue NPTL: 1024 games in   0.166s   0.254s     0.341s

the speedup with increasing number of threads is quite significant, in the
128 threads, case it's more than 8 times. In the cond-perf test, on 4 CPUs
it's almost infinitely faster than the 'swarm of threads' catastrophy
triggered by the old code.

there's a slowdown on UP, which is expected: on UP the O(1) scheduler
implicitly serializes all active threads on the runqueue, and doesnt
degrade under lots of threads. On SMP the 'point of breakdown' depends on
the precise amount of time needed for the threads to become rated as
'cache-cold' by the load-balancer.

(the patch adds a new futex syscall parameter (uaddr2), which is a
compatible extension of sys_futex. Old NPTL applications will continue to
work without any impact, only the FUTEX_REQUEUE codepath uses the new
parameter.)

	Ingo

--- linux/include/linux/futex.h.orig	
+++ linux/include/linux/futex.h	
@@ -5,7 +5,8 @@
 #define FUTEX_WAIT (0)
 #define FUTEX_WAKE (1)
 #define FUTEX_FD (2)
+#define FUTEX_REQUEUE (3)
 
-extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime);
+extern asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2);
 
 #endif
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -457,7 +457,7 @@ void mm_release(struct task_struct *tsk,
 		 * not set up a proper pointer then tough luck.
 		 */
 		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL);
+		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL);
 	}
 }
 
--- linux/kernel/compat.c.orig	
+++ linux/kernel/compat.c	
@@ -214,7 +214,7 @@ asmlinkage long compat_sys_sigprocmask(i
 extern long do_futex(unsigned long, int, int, unsigned long);
 
 asmlinkage long compat_sys_futex(u32 *uaddr, int op, int val,
-		struct compat_timespec *utime)
+		struct compat_timespec *utime, u32 *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
@@ -224,7 +224,7 @@ asmlinkage long compat_sys_futex(u32 *ua
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
 }
 
 asmlinkage long sys_setrlimit(unsigned int resource, struct rlimit *rlim);
--- linux/kernel/futex.c.orig	
+++ linux/kernel/futex.c	
@@ -2,6 +2,9 @@
  *  Fast Userspace Mutexes (which I call "Futexes!").
  *  (C) Rusty Russell, IBM 2002
  *
+ *  Generalized futexes, futex requeueing, misc fixes by Ingo Molnar
+ *  (C) Copyright 2003 Red Hat Inc, All Rights Reserved
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -9,9 +12,6 @@
  *  "The futexes are also cursed."
  *  "But they come in a choice of three flavours!"
  *
- *  Generalized futexes for every mapping type, Ingo Molnar, 2002
- *
- *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
  *  the Free Software Foundation; either version 2 of the License, or
@@ -216,6 +216,61 @@ static void futex_vcache_callback(vcache
 	spin_unlock(&futex_lock);
 }
 
+/*
+ * Requeue all waiters hashed on one physical page to another
+ * physical page.
+ */
+static int futex_requeue(unsigned long uaddr1, int offset1, unsigned long uaddr2, int offset2, int num)
+{
+	struct list_head *i, *next, *head1, *head2;
+	struct page *page1, *page2;
+	int ret = 0;
+
+	lock_futex_mm();
+
+	page1 = __pin_page(uaddr1 - offset1);
+	if (!page1) {
+		unlock_futex_mm();
+		return -EFAULT;
+	}
+	page2 = __pin_page(uaddr2 - offset2);
+	if (!page2) {
+		unlock_futex_mm();
+		return -EFAULT;
+	}
+
+	head1 = hash_futex(page1, offset1);
+	head2 = hash_futex(page2, offset2);
+
+	list_for_each_safe(i, next, head1) {
+		struct futex_q *this = list_entry(i, struct futex_q, list);
+
+		if (this->page == page1 && this->offset == offset1) {
+			list_del_init(i);
+			__detach_vcache(&this->vcache);
+			if (++ret <= num) {
+				wake_up_all(&this->waiters);
+				if (this->filp)
+					send_sigio(&this->filp->f_owner, this->fd, POLL_IN);
+			} else {
+				unpin_page(this->page);
+				__pin_page_atomic (page2);
+				list_add_tail(i, head2);
+				__attach_vcache(&this->vcache, uaddr2, current->mm, futex_vcache_callback);
+				this->offset = offset2;
+				this->page = page2;
+			}
+		}
+	}
+
+	unlock_futex_mm();
+
+	unpin_page(page1);
+	unpin_page(page2);
+
+	return ret;
+}
+
 static inline void __queue_me(struct futex_q *q, struct page *page,
 				unsigned long uaddr, int offset,
 				int fd, struct file *filp)
@@ -425,9 +480,9 @@ out:
 	return ret;
 }
 
-long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout)
+long do_futex(unsigned long uaddr, int op, int val, unsigned long timeout, unsigned long uaddr2)
 {
-	unsigned long pos_in_page;
+	unsigned long pos_in_page, pos_in_page2;
 	int ret;
 
 	pos_in_page = uaddr % PAGE_SIZE;
@@ -443,6 +498,14 @@ long do_futex(unsigned long uaddr, int o
 	case FUTEX_WAKE:
 		ret = futex_wake(uaddr, pos_in_page, val);
 		break;
+	case FUTEX_REQUEUE:
+		pos_in_page2 = uaddr2 % PAGE_SIZE;
+
+		/* Must be "naturally" aligned */
+		if (pos_in_page2 % sizeof(u32))
+			return -EINVAL;
+		ret = futex_requeue(uaddr, pos_in_page, uaddr2, pos_in_page2, val);
+		break;
 	case FUTEX_FD:
 		/* non-zero val means F_SETOWN(getpid()) & F_SETSIG(val) */
 		ret = futex_fd(uaddr, pos_in_page, val);
@@ -453,7 +516,7 @@ long do_futex(unsigned long uaddr, int o
 	return ret;
 }
 
-asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime)
+asmlinkage long sys_futex(u32 __user *uaddr, int op, int val, struct timespec __user *utime, u32 __user *uaddr2)
 {
 	struct timespec t;
 	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
@@ -463,7 +526,7 @@ asmlinkage long sys_futex(u32 __user *ua
 			return -EFAULT;
 		timeout = timespec_to_jiffies(&t) + 1;
 	}
-	return do_futex((unsigned long)uaddr, op, val, timeout);
+	return do_futex((unsigned long)uaddr, op, val, timeout, (unsigned long)uaddr2);
 }
 
 static struct super_block *



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19  9:31 Ingo Molnar
@ 2003-05-19 10:10 ` Christoph Hellwig
  2003-05-19 10:16   ` Ingo Molnar
  2003-05-20  0:08   ` Rusty Russell
  2003-05-19 10:23 ` Andrew Morton
  2003-05-20  0:04 ` Rusty Russell
  2 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2003-05-19 10:10 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper

On Mon, May 19, 2003 at 11:31:51AM +0200, Ingo Molnar wrote:
> the solution to this problem is to not wake up the swarm of threads, but
> 'requeue' them from the CV-internal mutex to the user-supplied mutex. The
> attached patch adds the FUTEX_REQUEUE feature FUTEX_REQUEUE requeues N
> threads from futex address A to futex address B:
> 
> 	sys_futex(uaddr, FUTEX_REQUEUE, nr_wake, NULL, uaddr2);
> 
> the 'val' parameter to sys_futex (nr_wake) is the # of woken up threads.  
> This way glibc can wake up a single thread (which will take the
> user-mutex), and can requeue the rest, with a single system-call.

Urgg, yet another sys_futex extension.  Could you please split all
these totally different cases into separate syscalls instead?

> +				wake_up_all(&this->waiters);
> +				if (this->filp)
> +					send_sigio(&this->filp->f_owner, this->fd, POLL_IN);
> +			} else {
> +				unpin_page(this->page);
> +				__pin_page_atomic (page2);
> +				list_add_tail(i, head2);
> +				__attach_vcache(&this->vcache, uaddr2, current->mm, futex_vcache_callback);

Please linewrap after 80 lines, thanks.

> +	case FUTEX_REQUEUE:
> +		pos_in_page2 = uaddr2 % PAGE_SIZE;
> +
> +		/* Must be "naturally" aligned */
> +		if (pos_in_page2 % sizeof(u32))
> +			return -EINVAL;

Who guarantess that the alignment of u32 is always the same as it's size?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19 10:10 ` Christoph Hellwig
@ 2003-05-19 10:16   ` Ingo Molnar
  2003-05-19 11:49     ` Christoph Hellwig
  2003-05-20  0:08   ` Rusty Russell
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2003-05-19 10:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, linux-kernel, Rusty Russell, Ulrich Drepper


On Mon, 19 May 2003, Christoph Hellwig wrote:

> [...] Could you please split all these totally different cases into
> separate syscalls instead?

sure, i'm all for it, but in a different pass, and after syncing up with
glibc. An API cleanup like this should have been done upon the
introduction of futexes, why didnt you comment on this then? Splitting off
FUTEX_REQUEUE in isolation is quite pointless.

> > +	case FUTEX_REQUEUE:
> > +		pos_in_page2 = uaddr2 % PAGE_SIZE;
> > +
> > +		/* Must be "naturally" aligned */
> > +		if (pos_in_page2 % sizeof(u32))
> > +			return -EINVAL;
> 
> Who guarantess that the alignment of u32 is always the same as it's size?

glibc. We do not want to handle all the misaligned cases for obvious
reasons. The use of u32 (instead of a native word) is a bit unfortunate on
64-bit systems but now a reality.

	Ingo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19  9:31 Ingo Molnar
  2003-05-19 10:10 ` Christoph Hellwig
@ 2003-05-19 10:23 ` Andrew Morton
  2003-05-20  0:04 ` Rusty Russell
  2 siblings, 0 replies; 31+ messages in thread
From: Andrew Morton @ 2003-05-19 10:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, linux-kernel, rusty, drepper

Ingo Molnar <mingo@elte.hu> wrote:
>
>  +	page1 = __pin_page(uaddr1 - offset1);
>  +	if (!page1) {
>  +		unlock_futex_mm();
>  +		return -EFAULT;
>  +	}
>  +	page2 = __pin_page(uaddr2 - offset2);
>  +	if (!page2) {
>  +		unlock_futex_mm();
>  +		return -EFAULT;
>  +	}

page1 is leaked.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19 10:16   ` Ingo Molnar
@ 2003-05-19 11:49     ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2003-05-19 11:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, Rusty Russell,
	Ulrich Drepper

On Mon, May 19, 2003 at 12:16:02PM +0200, Ingo Molnar wrote:
> sure, i'm all for it, but in a different pass, and after syncing up with
> glibc. An API cleanup like this should have been done upon the
> introduction of futexes, why didnt you comment on this then? Splitting off
> FUTEX_REQUEUE in isolation is quite pointless.

Maybe I don't spend all my time watching the futex API? :)  Okay,
let's make a deal, you add a new syscall for this case and I'll fix
up the older ones in a patch that's ontop of yours?

> > > +	case FUTEX_REQUEUE:
> > > +		pos_in_page2 = uaddr2 % PAGE_SIZE;
> > > +
> > > +		/* Must be "naturally" aligned */
> > > +		if (pos_in_page2 % sizeof(u32))
> > > +			return -EINVAL;
> > 
> > Who guarantess that the alignment of u32 is always the same as it's size?
> 
> glibc. We do not want to handle all the misaligned cases for obvious
> reasons. The use of u32 (instead of a native word) is a bit unfortunate on
> 64-bit systems but now a reality.

Sorry if the question wasn't clear, but who guarantess that the alignment
of u32 is the same as it's size?  You test of the size of u32, not it's
alignment even if they usually are the same.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19  9:31 Ingo Molnar
  2003-05-19 10:10 ` Christoph Hellwig
  2003-05-19 10:23 ` Andrew Morton
@ 2003-05-20  0:04 ` Rusty Russell
  2003-05-20  0:40   ` Ulrich Drepper
  2003-05-20  6:19   ` Ingo Molnar
  2 siblings, 2 replies; 31+ messages in thread
From: Rusty Russell @ 2003-05-20  0:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper

In message <Pine.LNX.4.44.0305191103500.5653-100000@localhost.localdomain> you write:
> 
> the attached patch addresses a futex related SMP scalability problem of
> glibc. A number of regressions have been reported to the NTPL mailing list
> when going to many CPUs, for applications that use condition variables and
> the pthread_cond_broadcast() API call. Using this functionality, testcode
> shows a slowdown from 0.12 seconds runtime to over 237 seconds (!)  
> runtime, on 4-CPU systems.

I gave feedback on this before, but didn't get a response.

1) Overload the last futex arg (change from timeval * to void *),
   don't add YA arg at the end.

2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
   you really mean __alignof__ here.

I'm glad you finally put your name at the top of the file...

Thanks!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-19 10:10 ` Christoph Hellwig
  2003-05-19 10:16   ` Ingo Molnar
@ 2003-05-20  0:08   ` Rusty Russell
  2003-05-20  0:31     ` Valdis.Kletnieks
  1 sibling, 1 reply; 31+ messages in thread
From: Rusty Russell @ 2003-05-20  0:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper

In message <20030519111028.B8663@infradead.org> you write:
> Urgg, yet another sys_futex extension.  Could you please split all
> these totally different cases into separate syscalls instead?

I don't mind either way, but wouldn't that be pointless incompatible
churn?

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:08   ` Rusty Russell
@ 2003-05-20  0:31     ` Valdis.Kletnieks
  0 siblings, 0 replies; 31+ messages in thread
From: Valdis.Kletnieks @ 2003-05-20  0:31 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On Tue, 20 May 2003 10:08:46 +1000, Rusty Russell said:
> In message <20030519111028.B8663@infradead.org> you write:
> > Urgg, yet another sys_futex extension.  Could you please split all
> > these totally different cases into separate syscalls instead?
> 
> I don't mind either way, but wouldn't that be pointless incompatible
> churn?

Doesn't matter, the churn hasn't been reflected in glibc-kernheaders yet
either way.  Once kernheaders gets updated to match, you're stuck with whatever
it is at that point, so churn now while you have a chance. ;)

(Sorry, I couldn't resist)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:04 ` Rusty Russell
@ 2003-05-20  0:40   ` Ulrich Drepper
  2003-05-20  1:46     ` Rusty Russell
  2003-05-20  6:19   ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Ulrich Drepper @ 2003-05-20  0:40 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rusty Russell wrote:

> 1) Overload the last futex arg (change from timeval * to void *),
>    don't add YA arg at the end.

It wasn't Ingo's idea.  I suggested it.  Overloading parameter types is
evil.  This isn't an issue anymore if the extension is implemented as a
new syscall which certainly is better.


> 2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
>    you really mean __alignof__ here.

I would always use sizeof() in this situation.  alignof is a property
the compiler can potentially relax.  On x86 it could easily be defined
to 1 in case someone wants to save memory.  OK, not really useful for
x86, but some embedded archs might be in such a situation.  Using
sizeof() makes sure that the variable is always 4-byte aligned.  For
atomic operations this certainly is a good minimal requirement.

If you want to really correct you might have to introduce something like
atomic_alignof().

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+yXmW2ijCOnn/RHQRAnXcAKCeYFvfkKIO/bbwqX1vUvbLkBvHKwCeN6zB
99qZZOVLMckvh6lxieUfVpY=
=LvwT
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:40   ` Ulrich Drepper
@ 2003-05-20  1:46     ` Rusty Russell
  2003-05-20  2:11       ` Ulrich Drepper
  2003-05-20  6:27       ` Ingo Molnar
  0 siblings, 2 replies; 31+ messages in thread
From: Rusty Russell @ 2003-05-20  1:46 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

In message <3EC97996.1080800@redhat.com> you write:
> Rusty Russell wrote:
> 
> > 1) Overload the last futex arg (change from timeval * to void *),
> >    don't add YA arg at the end.
> 
> It wasn't Ingo's idea.  I suggested it.  Overloading parameter types is
> evil.  This isn't an issue anymore if the extension is implemented as a
> new syscall which certainly is better.

People are using the interface, so I don't think changing it because
"it's nicer this way" is worthwhile: Ingo's "new syscall" patch has
backwards compat code for the old syscalls.  That's fugly 8(

So I'd say if we're going to multiplex, then overloading is easiest,
and in fact already done for FUTEX_FD.

But it's an issue over which sane people can disagree, IMHO.

> > 2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
> >    you really mean __alignof__ here.
> 
> I would always use sizeof() in this situation.

Comment says: /* Must be "naturally" aligned */.  This used to be true
in a much earlier version of the code, now AFAICT the requirement test
should be:

	/* Handling futexes on multiple pages?  -ETOOHARD */
	if (pos_in_page + sizeof(u32) > PAGE_SIZE)
		return -EINVAL;

Ingo?

Thanks for the comments!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  1:46     ` Rusty Russell
@ 2003-05-20  2:11       ` Ulrich Drepper
  2003-05-20  6:27       ` Ingo Molnar
  1 sibling, 0 replies; 31+ messages in thread
From: Ulrich Drepper @ 2003-05-20  2:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rusty Russell wrote:

> People are using the interface, so I don't think changing it because
> "it's nicer this way" is worthwhile: Ingo's "new syscall" patch has
> backwards compat code for the old syscalls.

But not the new requeue variant and this is what needs the extra
parameter.  So the parameter question is not issue.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+yY7n2ijCOnn/RHQRAuwkAJ93BCAOgeebWmWvClkYLCiE7CHAIwCgmu6N
9xteUehAOkcGrOWnPL5Wi5U=
=lHNT
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  0:04 ` Rusty Russell
  2003-05-20  0:40   ` Ulrich Drepper
@ 2003-05-20  6:19   ` Ingo Molnar
  1 sibling, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2003-05-20  6:19 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper


On Tue, 20 May 2003, Rusty Russell wrote:

> 1) Overload the last futex arg (change from timeval * to void *),
>    don't add YA arg at the end.

the right approach is the splitup of the arguments, done in the later
cleanup patch. But i'd rather add yet another argument than some nasty
overload of arguments ...

> 2) Use __alignof__(u32) not sizeof(u32).  Sure, they're the same, but
>    you really mean __alignof__ here.

ok. Could they ever be realistically different?

	Ingo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  1:46     ` Rusty Russell
  2003-05-20  2:11       ` Ulrich Drepper
@ 2003-05-20  6:27       ` Ingo Molnar
  2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:55         ` Rusty Russell
  1 sibling, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2003-05-20  6:27 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel


On Tue, 20 May 2003, Rusty Russell wrote:

> > It wasn't Ingo's idea.  I suggested it.  Overloading parameter types is
> > evil.  This isn't an issue anymore if the extension is implemented as a
> > new syscall which certainly is better.
> 
> People are using the interface, [...]

it's a _new_ argument, only used by new glibc. Like i mentioned it in the
original announcement, it's fully backwards compatible with any older app.
This is a minimum requirement for a live syscall!

> [...] so I don't think changing it because "it's nicer this way" is
> worthwhile: Ingo's "new syscall" patch has backwards compat code for the
> old syscalls.  That's fugly 8(

yes, but the damage has been done already, and now we've got to start the
slow wait for the old syscall to flush out of our tree. It will a few
years to get rid of the compat code, but we better start now. hch is
perfectly right that the old futex multiplexer interface is quite ugly,
the requeue op only made this even more prominent.

> Comment says: /* Must be "naturally" aligned */.  This used to be true
> in a much earlier version of the code, now AFAICT the requirement test
> should be:
> 
> 	/* Handling futexes on multiple pages?  -ETOOHARD */
> 	if (pos_in_page + sizeof(u32) > PAGE_SIZE)
> 		return -EINVAL;

yes - but i'd rather enforce this for every futex, than to hit it in every
1000th app that manages to misalign a futex _and_ lay it across two pages.  

Also, it's only x86 that guarantees atomic instructions on misaligned
futexes (even then it comes with a cycle penalty), are you sure this also
works on other architectures? So i'd rather be a bit more strict with this
requirement.

	Ingo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:27       ` Ingo Molnar
@ 2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:57           ` Rusty Russell
                             ` (2 more replies)
  2003-05-20  8:55         ` Rusty Russell
  1 sibling, 3 replies; 31+ messages in thread
From: Christoph Hellwig @ 2003-05-20  6:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Tue, May 20, 2003 at 08:27:03AM +0200, Ingo Molnar wrote:
> yes, but the damage has been done already, and now we've got to start the
> slow wait for the old syscall to flush out of our tree.

Actually it should go away before 2.6.0.  sys_futex never was part of a
released stable kernel so having the old_ version around is silly.  I
Think it's enough time until 2.6 hits the roads for people to have those
vendor libc flushed out that use it.  (and sys_futex still isn't used
in the glibc CVS, only in the addon nptl package with pre-1 release
numbers.)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:27       ` Ingo Molnar
  2003-05-20  6:56         ` Christoph Hellwig
@ 2003-05-20  8:55         ` Rusty Russell
  1 sibling, 0 replies; 31+ messages in thread
From: Rusty Russell @ 2003-05-20  8:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel

In message <Pine.LNX.4.44.0305200821010.2445-100000@localhost.localdomain> you write:
> yes, but the damage has been done already, and now we've got to start the
> slow wait for the old syscall to flush out of our tree. It will a few
> years to get rid of the compat code, but we better start now. hch is
> perfectly right that the old futex multiplexer interface is quite ugly,
> the requeue op only made this even more prominent.

It's a judgement call: how simple it is to change vs. the amount of
damage done by not changing it.

I don't think it's worth changing, but I don't think we're going to
convince each other.

> > Comment says: /* Must be "naturally" aligned */.  This used to be true
> > in a much earlier version of the code, now AFAICT the requirement test
> > should be:
> > 

> > 	/* Handling futexes on multiple pages?  -ETOOHARD */
> > 	if (pos_in_page + sizeof(u32) > PAGE_SIZE)
> > 		return -EINVAL;
> 
> yes - but i'd rather enforce this for every futex, than to hit it in every
> 1000th app that manages to misalign a futex _and_ lay it across two pages.  

Good point.  I'd prefer to fix the comment though, since it's not
true.  How about changing it to something like:

      	/* We can't handle futexes across multiple pages: best to reject
	   any crazy alignment to save the users from themselves. */

> Also, it's only x86 that guarantees atomic instructions on misaligned
> futexes (even then it comes with a cycle penalty), are you sure this also
> works on other architectures? So i'd rather be a bit more strict with this
> requirement.

Sure.  My point was that this comment is actually from a v. early futex
version where the kernel actually did the atomic ops itself.

Hope that clarifies!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:56         ` Christoph Hellwig
@ 2003-05-20  8:57           ` Rusty Russell
  2003-05-20  9:03             ` Ingo Molnar
  2003-05-20  9:12           ` Ingo Oeser
  2003-05-20 15:41           ` Linus Torvalds
  2 siblings, 1 reply; 31+ messages in thread
From: Rusty Russell @ 2003-05-20  8:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ulrich Drepper, mingo, Linus Torvalds, linux-kernel

In message <20030520075627.A28002@infradead.org> you write:
> On Tue, May 20, 2003 at 08:27:03AM +0200, Ingo Molnar wrote:
> > yes, but the damage has been done already, and now we've got to start the
> > slow wait for the old syscall to flush out of our tree.
> 
> Actually it should go away before 2.6.0.  sys_futex never was part of a
> released stable kernel so having the old_ version around is silly.

Hmm, in that case I'd say "just break it", and I'd be all in favour of
demuxing the syscall.

But I think vendors have backported and released futexes, which is why
Ingo did this...

(Although you might be right about "by the time 2.6 is out" 8)
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  8:57           ` Rusty Russell
@ 2003-05-20  9:03             ` Ingo Molnar
  2003-05-20  9:51               ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2003-05-20  9:03 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoph Hellwig, Ulrich Drepper, Linus Torvalds, linux-kernel


On Tue, 20 May 2003, Rusty Russell wrote:

> > Actually it should go away before 2.6.0.  sys_futex never was part of a
> > released stable kernel so having the old_ version around is silly.
> 
> Hmm, in that case I'd say "just break it", and I'd be all in favour of
> demuxing the syscall.

have you all gone nuts??? It's not an option to break perfectly working
binaries out there. Hell, we didnt even reorder the new NPTL
syscalls/extensions 1-2 kernel releases after the fact. Please grow up!

the interface should have been gotten right initially. We are all guilty
of it - now lets face the consequences. It's only a couple of lines of
code in a well isolated place of the file so i dont know what the fuss is
about. I havent even added FUTEX_REQUEUE to the old API.

> But I think vendors have backported and released futexes, which is why
> Ingo did this...

of course. And which brought the productization of futexes in the first
place.

	Ingo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:57           ` Rusty Russell
@ 2003-05-20  9:12           ` Ingo Oeser
  2003-05-20 15:41           ` Linus Torvalds
  2 siblings, 0 replies; 31+ messages in thread
From: Ingo Oeser @ 2003-05-20  9:12 UTC (permalink / raw)
  To: Christoph Hellwig, Ingo Molnar, Rusty Russell, Ulrich Drepper,
	Linus Torvalds, linux-kernel

Hi,

On Tue, May 20, 2003 at 07:56:27AM +0100, Christoph Hellwig wrote:
> On Tue, May 20, 2003 at 08:27:03AM +0200, Ingo Molnar wrote:
> > yes, but the damage has been done already, and now we've got to start the
> > slow wait for the old syscall to flush out of our tree.
> 
> Actually it should go away before 2.6.0.  sys_futex never was part of a
> released stable kernel so having the old_ version around is silly.  I
> Think it's enough time until 2.6 hits the roads for people to have those
> vendor libc flushed out that use it.  (and sys_futex still isn't used
> in the glibc CVS, only in the addon nptl package with pre-1 release
> numbers.)

This sounds reasonable. The people who shipped that already to
customers have so heavily patched kernels, that this simple patch
for the old sys_futex shouldn't really matter.

But cluttering the kernel with an API/ABI that was born in the
same development cycle, where it has been obsoleted sounds not
worth the bytes it consumes. And so we have a syscall slot
available for the next development cycle.

Or did anybody promise that this was final already? Usally this
promise comes with x.even.y not with x.odd.y .

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  9:03             ` Ingo Molnar
@ 2003-05-20  9:51               ` Christoph Hellwig
  2003-05-20 15:46                 ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2003-05-20  9:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Tue, May 20, 2003 at 11:03:36AM +0200, Ingo Molnar wrote:
> have you all gone nuts??? It's not an option to break perfectly working
> binaries out there.

Of course it is.  Linux has enough problem problems due to past mainline
stupidities, now we don't need to codify vendor braindamages aswell.  E.g
mainline doesn't have the RH AS aio vsyscall crap or suse's get dev_t behind
/dev/console stuff either.  If Red Hat thinks it needs to live with more interfaces
than what the stable kernel release provide their on their own luck.

And it's a lot of time to 2.6 anyway, don't tell me Jakub isn't smart enough to
get a glibc rpm out by then that works with the new and the old futex stuff.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  6:56         ` Christoph Hellwig
  2003-05-20  8:57           ` Rusty Russell
  2003-05-20  9:12           ` Ingo Oeser
@ 2003-05-20 15:41           ` Linus Torvalds
  2 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2003-05-20 15:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Rusty Russell, Ulrich Drepper, linux-kernel


On Tue, 20 May 2003, Christoph Hellwig wrote:
> 
> Actually it should go away before 2.6.0.  sys_futex never was part of a
> released stable kernel so having the old_ version around is silly.

That's not a valid argument. That's an _excuse_.

Guys, binary compatibility is important. It's important enough that if 
something got extensively used on development kernels, it's _still_ a hell 
of a lot more important than most other things around. The _only_ things 
that trump binary compatibility are

 - developer sanity (ie it has to be truly mindbogglingly hard to support 
   the old interface)
 - stability (ie if the old interface was so badly designed that it can't 
   be done right - mmap on /proc/<pid>/mem was one of these).
 - it's been deprecated over at least one full stable release.

Something like "it's only been in the development kernels" is simply not
an issue. The only thing that matters is whether it is used by various
binaries or not. And I think futexes are used a lot by glibc..

		Linus


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-20  9:51               ` Christoph Hellwig
@ 2003-05-20 15:46                 ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2003-05-20 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Rusty Russell, Ulrich Drepper, linux-kernel


On Tue, 20 May 2003, Christoph Hellwig wrote:
>
> On Tue, May 20, 2003 at 11:03:36AM +0200, Ingo Molnar wrote:
> > have you all gone nuts??? It's not an option to break perfectly working
> > binaries out there.
> 
> Of course it is. 

NO. 

Christoph, get a grip. Ingo is 100% right.

IT IS NEVER ACCEPTABLE TO BREAK USER LEVEL BINARIES! In particular, we do 
_not_ do it just because of some sense of "aesthetics".

If you want "aesthetics", go play with microkernels, or other academic
projects. They don't care about their users, they care about their ideas. 
The end result is, in my opinion, CRAP.

Linux has never been that way. The _founding_ principle of Linux was
"compatibility". Don't ever forget that. The user comes first. ALWAYS.  
Pretty code is a desirable feature, but if prettifying the code breaks
user apps, it doesn't get prettified.

Repeat after me: the goodness of an operating system is not in how pretty 
it is, but in how well it supports the user.

Make it your mantra.

		Linus


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
       [not found] <3ECAC2AE.8090401@redhat.com>
@ 2003-05-21  2:34 ` Rusty Russell
  2003-05-21  9:48   ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Rusty Russell @ 2003-05-21  2:34 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linus Torvalds, Ingo Molnar, linux-kernel

In message <3ECAC2AE.8090401@redhat.com> you write:
> Ingo's last patch is just fine.  New functionality should not be added
> to the to-be-obsoleted interfaces.

Perhaps I was reading too much into Linus' mail, but I read it as
"don't obsolete the old interface and introduce a new one just because
of some sense of aesthetics".

ie. it's not a to-be-obsoleted interface.

I think we're long past this being a productive thread, though.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-21  2:34 ` Rusty Russell
@ 2003-05-21  9:48   ` Ingo Molnar
  2003-05-21 10:24     ` Christoph Hellwig
  2003-05-22  0:30     ` Rusty Russell
  0 siblings, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2003-05-21  9:48 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel


On Wed, 21 May 2003, Rusty Russell wrote:

> Perhaps I was reading too much into Linus' mail, but I read it as "don't
> obsolete the old interface and introduce a new one just because of some
> sense of aesthetics".

no. The concept is: "dont cause the user any pain". Reshuffling the
syscall internally and providing new interfaces for the feature to be
exposed in a cleaner way is perfectly OK as long as this does not hurt
anything else - and it does not in this case. New glibc will use the new
syscalls and will fall back to the old one if they are -ENOSYS, so new
glibc will work on older kernels as well. Old glibc will work with old
kernels and new kernels as well. This is being done for other interfaces
currently, this is the only mechanism to 'flush out' old syscalls
gracefully.

the newer a kernel and a glibc is, the less 'fallback' happens - the
faster the application is, so this compatibility scheme makes perfect
sense both support-wise and technically. (not counting the speedup caused
by the interface cleanups themselves)

not adding FUTEX_REQUEUE to the old syscall is the right solution - it was
never exposed to anyone in this way, so old glibc doesnt know about it -
and new glibc will never use the old interfaces.

any other disagreements?

	Ingo



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-21  9:48   ` Ingo Molnar
@ 2003-05-21 10:24     ` Christoph Hellwig
  2003-05-22  0:30     ` Rusty Russell
  1 sibling, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2003-05-21 10:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, Ulrich Drepper, Linus Torvalds, linux-kernel

On Wed, May 21, 2003 at 11:48:49AM +0200, Ingo Molnar wrote:
> no. The concept is: "dont cause the user any pain". Reshuffling the
> syscall internally and providing new interfaces for the feature to be
> exposed in a cleaner way is perfectly OK as long as this does not hurt
> anything else - and it does not in this case. New glibc will use the new
> syscalls and will fall back to the old one if they are -ENOSYS, so new
> glibc will work on older kernels as well. Old glibc will work with old
> kernels and new kernels as well. This is being done for other interfaces
> currently, this is the only mechanism to 'flush out' old syscalls
> gracefully.

I don't think anyone disagreed with that (at least not me).  The only
thing I argued is that we don't need the usual flush-out period of two
stable series because this syscall never was implemented in any relead
stable kernel but rather a much shorter one so that this feature never
hits a released stable kernel.

Linus disagrees strongly so we'll have to keep this crap around for five
years - that's just yet another bit of bloat growing everyones kernel
but no irreperable damage.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-21  9:48   ` Ingo Molnar
  2003-05-21 10:24     ` Christoph Hellwig
@ 2003-05-22  0:30     ` Rusty Russell
  2003-05-22  9:15       ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Rusty Russell @ 2003-05-22  0:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel

In message <Pine.LNX.4.44.0305211140120.2045-100000@localhost.localdomain> you 
write:
> 
> On Wed, 21 May 2003, Rusty Russell wrote:
> 
> > Perhaps I was reading too much into Linus' mail, but I read it as "don't
> > obsolete the old interface and introduce a new one just because of some
> > sense of aesthetics".
> 
> no. The concept is: "dont cause the user any pain". Reshuffling the
> syscall internally and providing new interfaces for the feature to be
> exposed in a cleaner way is perfectly OK as long as this does not hurt
> anything else

I understand what you're saying, but I disagree.

See, I don't think the current interface is too ugly to live.
Especially since you don't need to introduce a new arg to implement
FUTEX_REQUEUE, so there's no immediate problem with it.

Do you understand what I'm saying?  If so, we can agree to disagree,
and it doesn't matter, because Linus will take your patch 8)

Cheers,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-22  0:30     ` Rusty Russell
@ 2003-05-22  9:15       ` Ingo Molnar
  2003-05-22 10:35         ` Rusty Russell
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2003-05-22  9:15 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel


On Thu, 22 May 2003, Rusty Russell wrote:

> See, I don't think the current interface is too ugly to live. Especially
> since you don't need to introduce a new arg to implement FUTEX_REQUEUE,
> so there's no immediate problem with it.

i think overloading a completely unrelated pointer (ie. overloading the
timespec pointer with a futex pointer, depending on the value of the 'op'
flag), within an already overcrowded multiplexing syscall is a 100%
non-starter to begin with.

really, i dont see what your problem with the new syscalls are. Futexes
started out as a special, limited hack for userspace threading, but by
today they have become a first-class concept: generic, user-accessible
waitqueues identified by the piece of VM they are located at. Very
flexible, very useful and very fast.

eg. under LinuxThreads, all the pthread_* functions were assuming a shared
VM. glibc functions like pthread_mutex_lock() only worked for 'threaded'
setups - ie. threads sharing pagetables via CLONE_VM. LinuxThreads had to
use thread pointers, lists and other constructs that made the library only
useful for the limited 'threaded' case.

today, with NPTL + glibc, all these pthread APIs are 100% available for
processes as well: all that is needed are the variables to be shared
between two processes. Ie. it works for shared mmap()-ed regions, or SysV
regions. Just #include <pthread.h> and use these rich locking APIs in your
process-based applications. No need to go to threads to be able to use the
pthread locking APIs. This, as far as i know, is a 'first' amongst unices.

While doing all this stuff we found a new, generic operation needed for
these 'user-accissible hashed waitqueues': the ability to requeue waiting
threads between two waitqueues, so that userspace can implement locking
primitives more flexibly. This is a clean, generic operation nicely
extending the already existing ability to wait on a waitqueue and to wake
up a thread in a waitqueue. Glibc will happily use all the new syscalls.  
They are even slightly faster, and are much cleaner - and there's zero
slowdown for code that used the old API. Plus historically we've
constantly moved away from multiplexer syscalls. So it's precisely the
right time to do this, and it's a win-win situation all over.

all that is needed now is some actual review of the new APIs from the
conceptual angle (i've done that and i think they are okay, but more eyes
see more), so that we make sure these are good and we wont need to discard
any aspect of them anytime soon.

	Ingo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-22  9:15       ` Ingo Molnar
@ 2003-05-22 10:35         ` Rusty Russell
  0 siblings, 0 replies; 31+ messages in thread
From: Rusty Russell @ 2003-05-22 10:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ulrich Drepper, Linus Torvalds, linux-kernel, pthreads-devel

In message <Pine.LNX.4.44.0305221052590.4549-100000@localhost.localdomain> you write:
> really, i dont see what your problem with the new syscalls are.

That's clear.  But I just changed my mind 8)

Because if you're going to demux the syscall, it might be worth
looking at FUTEX_FD: an "expected val" arg there might be worthwhile,
because to use it currently I think NPTL does:

	try to get futex
	fd = sys_futex(FUTEX_FD...)
	try to get futex again because of race

Have the futex_fd act like futex_wait, ie. return -EWOULDBLOCK if the
value != expected value.

> all that is needed now is some actual review of the new APIs from the
> conceptual angle (i've done that and i think they are okay, but more eyes
> see more), so that we make sure these are good and we wont need to discard
> any aspect of them anytime soon.

Sorry, I didn't comment because I thought your explanation, concept,
analysis and code were all very neat.

Thanks,
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
@ 2003-05-22 11:23 Martin Wirth
  2003-05-22 11:34 ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Wirth @ 2003-05-22 11:23 UTC (permalink / raw)
  To: mingo, linux-kernel, rusty

Ingo Molnar wrote:

 >all that is needed now is some actual review of the new APIs from the
 >conceptual angle (i've done that and i think they are okay, but more 
 >eyes see more), so that we make sure these are good and we wont need to
 >discard any aspect of them anytime soon.

What about adding an u32 flags field to each one of the new futex 
sys_calls. This gives you more freedom for future extensions without
changing the API again. Possible uses may be:

- Specify the futexes to be mm-local: By using the pair mm* and vaddr as
   key it is possible to have process local futexes living on the same
   hash with the following advantages:
   1. no page_table lock contention (I implemented mm-local futexes
      for my application after I noticed long latencies on SMP where a
      high prio tasks spun in futex_wake while another task was doing 		
      mmap/munmap on a second processor).
   2. no vcache pollution (I guess 99% of all futexes will not be in 		 
     shared memory)
   3. Slightly faster, since no page pinning is needed

- Specify queueing or unqueueing in priority order


Martin


P.S.

By the way, your latest futex patch still contains the bogus
if (!list_empty(&q.list)) conditional, that's always true since
you hold the locks at this point and no one can have removed us
from the list:

 > 	add_wait_queue(&q.waiters, &wait);
 > 	set_current_state(TASK_INTERRUPTIBLE);
 >-	if (!list_empty(&q.list))
 >+	if (!list_empty(&q.list)) {
 >+		unlock_futex_mm();
 > 		time = schedule_timeout(time);
 >+	}

Of course the test would be (and was) pretty necessary if you drop the
locks before the get_user(...) call. And I must admit that I still
can't see why you need to hold the locks across get_user. Even if it's
save to do so at least every automatic code checker will bark at this point.



	


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-22 11:23 [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Martin Wirth
@ 2003-05-22 11:34 ` Ingo Molnar
  2003-05-22 12:04   ` William Lee Irwin III
  0 siblings, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2003-05-22 11:34 UTC (permalink / raw)
  To: Martin Wirth; +Cc: linux-kernel, rusty


On Thu, 22 May 2003, Martin Wirth wrote:

> - Specify the futexes to be mm-local: By using the pair mm* and vaddr as
>    key it is possible to have process local futexes living on the same
>    hash with the following advantages:
>    1. no page_table lock contention (I implemented mm-local futexes
>       for my application after I noticed long latencies on SMP where a
>       high prio tasks spun in futex_wake while another task was doing 		
>       mmap/munmap on a second processor).
>    2. no vcache pollution (I guess 99% of all futexes will not be in 		 
>      shared memory)
>    3. Slightly faster, since no page pinning is needed

there's a patch in the works to change futex hashing to not require page
pinning, without making futexes process-local. This should also fix the
FUTEX_FD problems.

basically the idea is that struct page remains to be a good hash whenever
the page is present, and the swapout pte value is a good hash key when the
page is swapped out. So we basically can make futexes swappable.

this gets rid of pinned pages and pagetable locking, without the need for
API-level changes.

> By the way, your latest futex patch still contains the bogus if
> (!list_empty(&q.list)) conditional, that's always true since you hold
> the locks at this point and no one can have removed us from the list:

ok, i'll fix this! I knew i missed something ...

	Ingo



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
       [not found] <3ECCB319.4060706@dlr.de.suse.lists.linux.kernel>
@ 2003-05-22 11:38 ` Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2003-05-22 11:38 UTC (permalink / raw)
  To: Martin Wirth; +Cc: mingo, linux-kernel

Martin Wirth <martin.wirth@dlr.de> writes:
>       mmap/munmap on a second processor).
>    2. no vcache pollution (I guess 99% of all futexes will not be in
> shared memory)

- Could also be specified as a mutex attribute in pthreads/NPTL to get faster
mutexes (or perhaps a environment variable to allow users easy tuning)

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch] futex requeueing feature, futex-requeue-2.5.69-D3
  2003-05-22 11:34 ` Ingo Molnar
@ 2003-05-22 12:04   ` William Lee Irwin III
  0 siblings, 0 replies; 31+ messages in thread
From: William Lee Irwin III @ 2003-05-22 12:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Martin Wirth, linux-kernel, rusty

On Thu, May 22, 2003 at 01:34:42PM +0200, Ingo Molnar wrote:
> there's a patch in the works to change futex hashing to not require page
> pinning, without making futexes process-local. This should also fix the
> FUTEX_FD problems.
> basically the idea is that struct page remains to be a good hash whenever
> the page is present, and the swapout pte value is a good hash key when the
> page is swapped out. So we basically can make futexes swappable.
> this gets rid of pinned pages and pagetable locking, without the need for
> API-level changes.

Someone's already doing it? I'll back off if so.


-- wli

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2003-05-22 11:51 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-05-22 11:23 [patch] futex requeueing feature, futex-requeue-2.5.69-D3 Martin Wirth
2003-05-22 11:34 ` Ingo Molnar
2003-05-22 12:04   ` William Lee Irwin III
     [not found] <3ECCB319.4060706@dlr.de.suse.lists.linux.kernel>
2003-05-22 11:38 ` Andi Kleen
     [not found] <3ECAC2AE.8090401@redhat.com>
2003-05-21  2:34 ` Rusty Russell
2003-05-21  9:48   ` Ingo Molnar
2003-05-21 10:24     ` Christoph Hellwig
2003-05-22  0:30     ` Rusty Russell
2003-05-22  9:15       ` Ingo Molnar
2003-05-22 10:35         ` Rusty Russell
  -- strict thread matches above, loose matches on Subject: below --
2003-05-19  9:31 Ingo Molnar
2003-05-19 10:10 ` Christoph Hellwig
2003-05-19 10:16   ` Ingo Molnar
2003-05-19 11:49     ` Christoph Hellwig
2003-05-20  0:08   ` Rusty Russell
2003-05-20  0:31     ` Valdis.Kletnieks
2003-05-19 10:23 ` Andrew Morton
2003-05-20  0:04 ` Rusty Russell
2003-05-20  0:40   ` Ulrich Drepper
2003-05-20  1:46     ` Rusty Russell
2003-05-20  2:11       ` Ulrich Drepper
2003-05-20  6:27       ` Ingo Molnar
2003-05-20  6:56         ` Christoph Hellwig
2003-05-20  8:57           ` Rusty Russell
2003-05-20  9:03             ` Ingo Molnar
2003-05-20  9:51               ` Christoph Hellwig
2003-05-20 15:46                 ` Linus Torvalds
2003-05-20  9:12           ` Ingo Oeser
2003-05-20 15:41           ` Linus Torvalds
2003-05-20  8:55         ` Rusty Russell
2003-05-20  6:19   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox