* [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
@ 2007-01-09 16:15 Pierre Peiffer
  2007-01-11 17:47 ` Ulrich Drepper
  0 siblings, 1 reply; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-09 16:15 UTC (permalink / raw)
To: LKML
Cc: Dinakar Guniguntala, Jean-Pierre Dion, Ingo Molnar, Ulrich Drepper,
    Jakub Jelinek, Darren Hart, Sébastien Dugué

Hi,

Today, several futex functionalities and improvements are included in the
-rt kernel tree which, I think, make sense to have in mainline. Among them
are:

 * futex uses prio list: allows threads to be woken in priority order
   instead of FIFO order.

 * futex_wait uses hrtimer: allows the use of finer timer resolution.

 * futex_requeue_pi functionality: allows use of the requeue optimisation
   for PI-mutexes/PI-futexes.

 * futex64 syscall: allows use of 64-bit futexes instead of 32-bit ones.

The following mails provide the corresponding patches.

Comments, suggestions, feedback, etc. are welcome, as usual.

-- 
Pierre Peiffer

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-09 16:15 [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements Pierre Peiffer
@ 2007-01-11 17:47 ` Ulrich Drepper
  [not found]   ` <20070111134615.34902742.akpm@osdl.org>
  0 siblings, 1 reply; 11+ messages in thread

From: Ulrich Drepper @ 2007-01-11 17:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: Pierre Peiffer, LKML, Ingo Molnar, Jakub Jelinek

Andrew, if the patches allow this, I'd like to see parts 2, 3, and 4 in
-mm ASAP. The 64-bit variants especially are urgently needed. Just hold
off adding the plist use; I am still not convinced that unconditional use
is a good thing, especially with one single global list.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[parent not found: <20070111134615.34902742.akpm@osdl.org>]
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  [not found] ` <20070111134615.34902742.akpm@osdl.org>
@ 2007-01-12  7:53   ` Pierre Peiffer
  2007-01-12  7:58     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-12 7:53 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, Ulrich Drepper, Ingo Molnar, Jakub Jelinek

Andrew Morton a écrit :
> OK. Unfortunately patches 2-4 don't apply without #1 present and the fix
> is not immediately obvious, so we'll need a respin+retest, please.

Ok, I'll provide updated patches for -mm ASAP.

> On Thu, 11 Jan 2007 09:47:28 -0800
> Ulrich Drepper <drepper@redhat.com> wrote:
>> if the patches allow this, I'd like to see parts 2, 3, and 4 to be in
>> -mm ASAP. Especially the 64-bit variants are urgently needed. Just
>> hold off adding the plist use, I am still not convinced that
>> unconditional use is a good thing, especially with one single global list.

Just to avoid any misunderstanding (I really do understand your point
about the performance issue):

* The problem I mentioned about several futexes hashing to the same key,
  and thus with all potential waiters listed on the same list, is _not_ a
  new problem introduced by this patch: it already exists today, with the
  simple list.

* Measuring performance with pthread_cond_broadcast (and thus with
  futex_requeue) is a good choice (well, maybe not a realistic one when
  considering real applications (*)) to expose the performance impact,
  rather than having threads do FUTEX_WAIT/FUTEX_WAKE: what is expensive
  with plist is the plist_add operation (which occurs in FUTEX_WAIT), not
  plist_del (which occurs during FUTEX_WAKE; thus, no big impact should be
  noticed there). Any measurement will be difficult to do with only
  FUTEX_WAIT/WAKE.

=> futex_requeue does as many plist_del/plist_add operations as the number
of waiting threads (minus 1), and thus has a direct impact on the time
needed to wake everybody (or, to be more precise, to wake the first
thread).

(*) I'll try the VolanoMark bench, if I have time.

-- 
Pierre
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-12  7:53 ` Pierre Peiffer
@ 2007-01-12  7:58   ` Ingo Molnar
  2007-01-16  8:34     ` Pierre Peiffer
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2007-01-12 7:58 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Andrew Morton, Ulrich Drepper, Jakub Jelinek

* Pierre Peiffer <pierre.peiffer@bull.net> wrote:

> [...] Any measure will be difficult to do with only FUTEX_WAIT/WAKE.

that's not a problem - just do such a measurement and show that it does
/not/ impact performance measurably. That's what we want to know...

> (*) I'll try the volano bench, if I have time.

yeah. As an alternative, it might be a good idea to pthread-ify
hackbench.c - that should replicate the Volano workload pretty
accurately. I've attached hackbench.c. (It's process-based right now, so
it won't trigger contended futex ops.)

	Ingo

[-- Attachment #2: hackbench.c --]

/* Test groups of 20 processes spraying to 20 receivers */
#include <stdio.h>
#include <stdlib.h>	/* exit, atoi */
#include <string.h>
#include <errno.h>
#include <unistd.h>	/* fork, pipe, read, write, close */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/poll.h>

#define DATASIZE 100

static unsigned int loops = 100;
static int use_pipes = 0;

static void barf(const char *msg)
{
	fprintf(stderr, "%s (error: %s)\n", msg, strerror(errno));
	exit(1);
}

static void fdpair(int fds[2])
{
	if (use_pipes) {
		if (pipe(fds) == 0)
			return;
	} else {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0)
			return;
	}
	barf("Creating fdpair");
}

/* Block until we're ready to go */
static void ready(int ready_out, int wakefd)
{
	char dummy;
	struct pollfd pollfd = { .fd = wakefd, .events = POLLIN };

	/* Tell them we're ready. */
	if (write(ready_out, &dummy, 1) != 1)
		barf("CLIENT: ready write");

	/* Wait for "GO" signal */
	if (poll(&pollfd, 1, -1) != 1)
		barf("poll");
}

/* Sender sprays loops messages down each file descriptor */
static void sender(unsigned int num_fds, int out_fd[num_fds],
		   int ready_out, int wakefd)
{
	char data[DATASIZE];
	unsigned int i, j;

	ready(ready_out, wakefd);

	/* Now pump to every receiver. */
	for (i = 0; i < loops; i++) {
		for (j = 0; j < num_fds; j++) {
			int ret, done = 0;

		again:
			ret = write(out_fd[j], data + done,
				    sizeof(data) - done);
			if (ret < 0)
				barf("SENDER: write");
			done += ret;
			if (done < sizeof(data))
				goto again;
		}
	}
}

/* One receiver per fd */
static void receiver(unsigned int num_packets, int in_fd,
		     int ready_out, int wakefd)
{
	unsigned int i;

	/* Wait for start... */
	ready(ready_out, wakefd);

	/* Receive them all */
	for (i = 0; i < num_packets; i++) {
		char data[DATASIZE];
		int ret, done = 0;

	again:
		ret = read(in_fd, data + done, DATASIZE - done);
		if (ret < 0)
			barf("SERVER: read");
		done += ret;
		if (done < DATASIZE)
			goto again;
	}
}

/* One group of senders and receivers */
static unsigned int group(unsigned int num_fds, int ready_out, int wakefd)
{
	unsigned int i;
	unsigned int out_fds[num_fds];

	for (i = 0; i < num_fds; i++) {
		int fds[2];

		/* Create the pipe between client and server */
		fdpair(fds);

		/* Fork the receiver. */
		switch (fork()) {
		case -1:
			barf("fork()");
		case 0:
			close(fds[1]);
			receiver(num_fds * loops, fds[0], ready_out, wakefd);
			exit(0);
		}

		out_fds[i] = fds[1];
		close(fds[0]);
	}

	/* Now we have all the fds, fork the senders */
	for (i = 0; i < num_fds; i++) {
		switch (fork()) {
		case -1:
			barf("fork()");
		case 0:
			sender(num_fds, out_fds, ready_out, wakefd);
			exit(0);
		}
	}

	/* Close the fds we have left */
	for (i = 0; i < num_fds; i++)
		close(out_fds[i]);

	/* Return number of children to reap */
	return num_fds * 2;
}

int main(int argc, char *argv[])
{
	unsigned int i, num_groups, total_children;
	struct timeval start, stop, diff;
	unsigned int num_fds = 20;
	int readyfds[2], wakefds[2];
	char dummy;

	if (argv[1] && strcmp(argv[1], "-pipe") == 0) {
		use_pipes = 1;
		argc--;
		argv++;
	}

	if (argc != 2 || (num_groups = atoi(argv[1])) == 0)
		barf("Usage: hackbench [-pipe] <num groups>\n");

	fdpair(readyfds);
	fdpair(wakefds);

	total_children = 0;
	for (i = 0; i < num_groups; i++)
		total_children += group(num_fds, readyfds[1], wakefds[0]);

	/* Wait for everyone to be ready */
	for (i = 0; i < total_children; i++)
		if (read(readyfds[0], &dummy, 1) != 1)
			barf("Reading for readyfds");

	gettimeofday(&start, NULL);

	/* Kick them off */
	if (write(wakefds[1], &dummy, 1) != 1)
		barf("Writing to start them");

	/* Reap them all */
	for (i = 0; i < total_children; i++) {
		int status;

		wait(&status);
		if (!WIFEXITED(status))
			exit(1);
	}

	gettimeofday(&stop, NULL);

	/* Print time... */
	timersub(&stop, &start, &diff);
	printf("Time: %lu.%03lu\n", diff.tv_sec, diff.tv_usec / 1000);
	exit(0);
}
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-12  7:58 ` Ingo Molnar
@ 2007-01-16  8:34   ` Pierre Peiffer
  2007-01-16  9:44     ` Ingo Molnar
  2007-01-16 15:14     ` Ulrich Drepper
  0 siblings, 2 replies; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-16 8:34 UTC (permalink / raw)
To: LKML; +Cc: Ingo Molnar, Andrew Morton, Ulrich Drepper, Jakub Jelinek

Hi,

Ingo Molnar a écrit :
> yeah. As an alternative, it might be a good idea to pthread-ify
> hackbench.c - that should replicate the Volano workload pretty
> accurately. I've attached hackbench.c. (it's process based right now, so
> it wont trigger contended futex ops)

Ok, thanks. I've adapted your test, Ingo, and made some measurements.
(I've only replaced fork with pthread_create; I didn't use a condvar or
barrier for the first synchronization.)

The modified hackbench is available here:

http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c

I've run this bench 1000 times with pipes and 800 groups. Here are the
results:

Test1 - with simple list (i.e. without any futex patches)
=========================================================
Iterations=1000
Latency (s)	min	max	avg	stddev
		26.67	27.89	27.14	0.19

Test2 - with plist (i.e. with only patch 1/4 as is)
===================================================
Iterations=1000
Latency (s)	min	max	avg	stddev
		26.87	28.18	27.30	0.18

Test3 - with plist, but all SCHED_OTHER threads registered with the same
priority (MAX_RT_PRIO) (i.e. with a modified patch 1/4, not yet posted here)
============================================================================
Iterations=1000
Latency (s)	min	max	avg	stddev
		26.74	27.84	27.16	0.18

-- 
Pierre
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16  8:34 ` Pierre Peiffer
@ 2007-01-16  9:44   ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread

From: Ingo Molnar @ 2007-01-16 9:44 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Andrew Morton, Ulrich Drepper, Jakub Jelinek

* Pierre Peiffer <pierre.peiffer@bull.net> wrote:

> The modified hackbench is available here:
>
> http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c

cool!

> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:
>
> Test1 - with simple list (i.e. without any futex patches)
> =========================================================
> Latency (s)	min	max	avg	stddev
> 		26.67	27.89	27.14	0.19
>
> Test2 - with plist (i.e. with only patch 1/4 as is)
> 		26.87	28.18	27.30	0.18
>
> Test3 - with plist but all SCHED_OTHER registered
> 		26.74	27.84	27.16	0.18

ok, seems like the last one is the winner - it's the same as unmodified,
within noise.

	Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16  8:34 ` Pierre Peiffer
  2007-01-16  9:44 ` Ingo Molnar
@ 2007-01-16 15:14   ` Ulrich Drepper
  2007-01-16 15:40     ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread

From: Ulrich Drepper @ 2007-01-16 15:14 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Ingo Molnar, Andrew Morton, Jakub Jelinek

Pierre Peiffer wrote:
> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:

This is not what I'm mostly concerned about. The patches create a
bottleneck since _all_ processes use the same resource. Plus, this code
has to be run on a machine with multiple processors to bring RFOs into
play.

So, please do this: on an SMP (4p or more) machine, rig the test so that
it runs for quite a while. Then, in a script, start the program a bunch
of times, all in parallel. Have the script wait until all program runs
are done, and measure the time until the last program finishes.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 15:14 ` Ulrich Drepper
@ 2007-01-16 15:40   ` Ingo Molnar
  2007-01-16 17:46     ` Ulrich Drepper
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2007-01-16 15:40 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek

* Ulrich Drepper <drepper@redhat.com> wrote:

> Pierre Peiffer wrote:
> > I've run this bench 1000 times with pipe and 800 groups.
> > Here are the results:
>
> This is not what I'm mostly concerned about. The patches create a
> bottleneck since _all_ processes use the same resource. [...]

what do you mean by that - which is this shared resource?

	Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 15:40 ` Ingo Molnar
@ 2007-01-16 17:46   ` Ulrich Drepper
  2007-01-16 17:50     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread

From: Ulrich Drepper @ 2007-01-16 17:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek

Ingo Molnar wrote:
> what do you mean by that - which is this same resource?

From what has been said here before, all futexes are stored in the same
list or hash table or whatever it was. I want to see how that code
behaves if many separate processes concurrently use futexes.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 17:46 ` Ulrich Drepper
@ 2007-01-16 17:50   ` Ingo Molnar
  2007-01-17  7:50     ` Pierre Peiffer
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2007-01-16 17:50 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek

* Ulrich Drepper <drepper@redhat.com> wrote:

> > what do you mean by that - which is this same resource?
>
> From what has been said here before, all futexes are stored in the
> same list or hash table or whatever it was. I want to see how that
> code behaves if many separate processes concurrently use futexes.

futexes are stored in the bucket hash, and these patches do not change
that. The pi-list that was talked about is per-futex. So there's no
change to the way futexes are hashed, nor should there be any scalability
impact - besides the micro-impact that was measured in a number of ways -
AFAICS.

	Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 17:50 ` Ingo Molnar
@ 2007-01-17  7:50   ` Pierre Peiffer
  0 siblings, 0 replies; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-17 7:50 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Ulrich Drepper, LKML, Andrew Morton, Jakub Jelinek

Ingo Molnar a écrit :
> * Ulrich Drepper <drepper@redhat.com> wrote:
>
>>> what do you mean by that - which is this same resource?
>> From what has been said here before, all futexes are stored in the
>> same list or hash table or whatever it was. I want to see how that
>> code behaves if many separate processes concurrently use futexes.
>
> futexes are stored in the bucket hash, and these patches do not change
> that. The pi-list that was talked about is per-futex. So there's no
> change to the way futexes are hashed nor should there be any scalability
> impact - besides the micro-impact that was measured in a number of ways
> - AFAICS.

Yes, that's completely right!

-- 
Pierre
Thread overview: 11+ messages
2007-01-09 16:15 [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements Pierre Peiffer
2007-01-11 17:47 ` Ulrich Drepper
[not found] ` <20070111134615.34902742.akpm@osdl.org>
2007-01-12 7:53 ` Pierre Peiffer
2007-01-12 7:58 ` Ingo Molnar
2007-01-16 8:34 ` Pierre Peiffer
2007-01-16 9:44 ` Ingo Molnar
2007-01-16 15:14 ` Ulrich Drepper
2007-01-16 15:40 ` Ingo Molnar
2007-01-16 17:46 ` Ulrich Drepper
2007-01-16 17:50 ` Ingo Molnar
2007-01-17 7:50 ` Pierre Peiffer