* [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
@ 2007-01-09 16:15 Pierre Peiffer
  2007-01-11 17:47 ` Ulrich Drepper
  0 siblings, 1 reply; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-09 16:15 UTC (permalink / raw)
To: LKML
Cc: Dinakar Guniguntala, Jean-Pierre Dion, Ingo Molnar, Ulrich Drepper,
    Jakub Jelinek, Darren Hart, Sébastien Dugué

Hi,

Today, several futex functionalities and improvements are included in the
-rt kernel tree which, I think, make sense to have in mainline. Among them
are:

 * futex uses prio list: allows threads to be woken in priority order
   instead of FIFO order.

 * futex_wait uses hrtimer: allows the use of finer timer resolution.

 * futex_requeue_pi functionality: allows use of the requeue optimisation
   for PI-mutexes/PI-futexes.

 * futex64 syscall: allows use of 64-bit futexes instead of 32-bit ones.

The following mails provide the corresponding patches.

Comments, suggestions, feedback, etc. are welcome, as usual.

-- 
Pierre Peiffer

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-09 16:15 [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements Pierre Peiffer
@ 2007-01-11 17:47 ` Ulrich Drepper
  [not found]   ` <20070111134615.34902742.akpm@osdl.org>
  0 siblings, 1 reply; 11+ messages in thread

From: Ulrich Drepper @ 2007-01-11 17:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: Pierre Peiffer, LKML, Ingo Molnar, Jakub Jelinek

Andrew, if the patches allow this, I'd like to see parts 2, 3, and 4 in
-mm ASAP. The 64-bit variants especially are urgently needed. Just hold
off adding the plist use; I am still not convinced that unconditional use
is a good thing, especially with one single global list.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[parent not found: <20070111134615.34902742.akpm@osdl.org>]
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  [not found] ` <20070111134615.34902742.akpm@osdl.org>
@ 2007-01-12  7:53   ` Pierre Peiffer
  2007-01-12  7:58     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-12 7:53 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, Ulrich Drepper, Ingo Molnar, Jakub Jelinek

Andrew Morton a écrit :
> OK. Unfortunately patches 2-4 don't apply without #1 present and the fix
> is not immediately obvious, so we'll need a respin+retest, please.

Ok, I'll provide updated patches for -mm ASAP.

> On Thu, 11 Jan 2007 09:47:28 -0800
> Ulrich Drepper <drepper@redhat.com> wrote:
>> if the patches allow this, I'd like to see parts 2, 3, and 4 to be in
>> -mm ASAP. Especially the 64-bit variants are urgently needed. Just
>> hold off adding the plist use, I am still not convinced that
>> unconditional use is a good thing, especially with one single global list.

Just to avoid any misunderstanding (I really do understand your point
about the performance issue):

* The problem I mentioned about several futexes hashing to the same key,
  and thus with all potential waiters listed on the same list, is _not_ a
  new problem introduced by this patch: it already exists today, with the
  simple list.

* Measuring performance with pthread_cond_broadcast (and thus with
  futex_requeue) is a good choice (well, maybe not a realistic one when
  considering real applications (*)) to expose the performance impact,
  rather than having threads do FUTEX_WAIT/FUTEX_WAKE: what is expensive
  with plist is the plist_add operation (which occurs in FUTEX_WAIT), not
  plist_del (which occurs during FUTEX_WAKE; thus, no big impact should be
  noticed there). Any measurement will be difficult to do with only
  FUTEX_WAIT/WAKE.

=> futex_requeue does as many plist_del/plist_add operations as the number
of waiting threads (minus 1), and thus has a direct impact on the time
needed to wake everybody (or, to be more precise, to wake the first
thread).

(*) I'll try the VolanoMark bench, if I have time.

-- 
Pierre
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-12  7:53 ` Pierre Peiffer
@ 2007-01-12  7:58   ` Ingo Molnar
  2007-01-16  8:34     ` Pierre Peiffer
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2007-01-12 7:58 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Andrew Morton, Ulrich Drepper, Jakub Jelinek

* Pierre Peiffer <pierre.peiffer@bull.net> wrote:

> [...] Any measure will be difficult to do with only FUTEX_WAIT/WAKE.

that's not a problem - just do such a measurement and show that it does
/not/ impact performance measurably. That's what we want to know...

> (*) I'll try the volano bench, if I have time.

yeah. As an alternative, it might be a good idea to pthread-ify
hackbench.c - that should replicate the Volano workload pretty
accurately. I've attached hackbench.c. (It's process-based right now, so
it won't trigger contended futex ops.)

	Ingo

[-- Attachment #2: hackbench.c --]

/* Test groups of 20 processes spraying to 20 receivers */
#include <stdio.h>
#include <stdlib.h>	/* exit, atoi */
#include <string.h>
#include <errno.h>
#include <unistd.h>	/* fork, pipe, read, write, close */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/poll.h>

#define DATASIZE 100

static unsigned int loops = 100;
static int use_pipes = 0;

static void barf(const char *msg)
{
	fprintf(stderr, "%s (error: %s)\n", msg, strerror(errno));
	exit(1);
}

static void fdpair(int fds[2])
{
	if (use_pipes) {
		if (pipe(fds) == 0)
			return;
	} else {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0)
			return;
	}
	barf("Creating fdpair");
}

/* Block until we're ready to go */
static void ready(int ready_out, int wakefd)
{
	char dummy;
	struct pollfd pollfd = { .fd = wakefd, .events = POLLIN };

	/* Tell them we're ready. */
	if (write(ready_out, &dummy, 1) != 1)
		barf("CLIENT: ready write");

	/* Wait for "GO" signal */
	if (poll(&pollfd, 1, -1) != 1)
		barf("poll");
}

/* Sender sprays loops messages down each file descriptor */
static void sender(unsigned int num_fds, int out_fd[num_fds],
		   int ready_out, int wakefd)
{
	char data[DATASIZE];
	unsigned int i, j;

	ready(ready_out, wakefd);

	/* Now pump to every receiver. */
	for (i = 0; i < loops; i++) {
		for (j = 0; j < num_fds; j++) {
			int ret, done = 0;

		again:
			ret = write(out_fd[j], data + done,
				    sizeof(data) - done);
			if (ret < 0)
				barf("SENDER: write");
			done += ret;
			if (done < sizeof(data))
				goto again;
		}
	}
}

/* One receiver per fd */
static void receiver(unsigned int num_packets, int in_fd,
		     int ready_out, int wakefd)
{
	unsigned int i;

	/* Wait for start... */
	ready(ready_out, wakefd);

	/* Receive them all */
	for (i = 0; i < num_packets; i++) {
		char data[DATASIZE];
		int ret, done = 0;

	again:
		ret = read(in_fd, data + done, DATASIZE - done);
		if (ret < 0)
			barf("SERVER: read");
		done += ret;
		if (done < DATASIZE)
			goto again;
	}
}

/* One group of senders and receivers */
static unsigned int group(unsigned int num_fds, int ready_out, int wakefd)
{
	unsigned int i;
	unsigned int out_fds[num_fds];

	for (i = 0; i < num_fds; i++) {
		int fds[2];

		/* Create the pipe between client and server */
		fdpair(fds);

		/* Fork the receiver. */
		switch (fork()) {
		case -1:
			barf("fork()");
		case 0:
			close(fds[1]);
			receiver(num_fds * loops, fds[0], ready_out, wakefd);
			exit(0);
		}

		out_fds[i] = fds[1];
		close(fds[0]);
	}

	/* Now we have all the fds, fork the senders */
	for (i = 0; i < num_fds; i++) {
		switch (fork()) {
		case -1:
			barf("fork()");
		case 0:
			sender(num_fds, out_fds, ready_out, wakefd);
			exit(0);
		}
	}

	/* Close the fds we have left */
	for (i = 0; i < num_fds; i++)
		close(out_fds[i]);

	/* Return number of children to reap */
	return num_fds * 2;
}

int main(int argc, char *argv[])
{
	unsigned int i, num_groups, total_children;
	struct timeval start, stop, diff;
	unsigned int num_fds = 20;
	int readyfds[2], wakefds[2];
	char dummy;

	if (argv[1] && strcmp(argv[1], "-pipe") == 0) {
		use_pipes = 1;
		argc--;
		argv++;
	}

	if (argc != 2 || (num_groups = atoi(argv[1])) == 0)
		barf("Usage: hackbench [-pipe] <num groups>\n");

	fdpair(readyfds);
	fdpair(wakefds);

	total_children = 0;
	for (i = 0; i < num_groups; i++)
		total_children += group(num_fds, readyfds[1], wakefds[0]);

	/* Wait for everyone to be ready */
	for (i = 0; i < total_children; i++)
		if (read(readyfds[0], &dummy, 1) != 1)
			barf("Reading for readyfds");

	gettimeofday(&start, NULL);

	/* Kick them off */
	if (write(wakefds[1], &dummy, 1) != 1)
		barf("Writing to start them");

	/* Reap them all */
	for (i = 0; i < total_children; i++) {
		int status;

		wait(&status);
		if (!WIFEXITED(status))
			exit(1);
	}

	gettimeofday(&stop, NULL);

	/* Print time... */
	timersub(&stop, &start, &diff);
	printf("Time: %lu.%03lu\n", diff.tv_sec, diff.tv_usec / 1000);
	exit(0);
}
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-12  7:58 ` Ingo Molnar
@ 2007-01-16  8:34   ` Pierre Peiffer
  2007-01-16  9:44     ` Ingo Molnar
  2007-01-16 15:14     ` Ulrich Drepper
  0 siblings, 2 replies; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-16 8:34 UTC (permalink / raw)
To: LKML; +Cc: Ingo Molnar, Andrew Morton, Ulrich Drepper, Jakub Jelinek

Hi,

Ingo Molnar a écrit :
> yeah. As an alternative, it might be a good idea to pthread-ify
> hackbench.c - that should replicate the Volano workload pretty
> accurately. I've attached hackbench.c. (it's process based right now, so
> it wont trigger contended futex ops)

Ok, thanks. I've adapted your test, Ingo, and made some measurements.
(I've only replaced fork with pthread_create; I didn't use a condvar or
barrier for the first synchronization.)

The modified hackbench is available here:

http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c

I've run this bench 1000 times with pipes and 800 groups. Here are the
results:

Test1 - with simple list (i.e. without any futex patches)
=========================================================
Iterations=1000
Latency (s)	min	max	avg	stddev
		26.67	27.89	27.14	0.19

Test2 - with plist (i.e. with only patch 1/4 as is)
===================================================
Iterations=1000
Latency (s)	min	max	avg	stddev
		26.87	28.18	27.30	0.18

Test3 - with plist, but all SCHED_OTHER threads registered with the same
priority (MAX_RT_PRIO) (i.e. with a modified patch 1/4, not yet posted here)
============================================================================
Iterations=1000
Latency (s)	min	max	avg	stddev
		26.74	27.84	27.16	0.18

-- 
Pierre
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16  8:34 ` Pierre Peiffer
@ 2007-01-16  9:44   ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread

From: Ingo Molnar @ 2007-01-16 9:44 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Andrew Morton, Ulrich Drepper, Jakub Jelinek

* Pierre Peiffer <pierre.peiffer@bull.net> wrote:

> The modified hackbench is available here:
>
> http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c

cool!

> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:
>
> Test1 - with simple list (i.e. without any futex patches)
> =========================================================
> Latency (s)	min	max	avg	stddev
> 		26.67	27.89	27.14	0.19
>
> Test2 - with plist (i.e. with only patch 1/4 as is)
> 		26.87	28.18	27.30	0.18
>
> Test3 - with plist but all SCHED_OTHER registered
> 		26.74	27.84	27.16	0.18

ok, seems like the last one is the winner - it's the same as unmodified,
within noise.

	Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16  8:34 ` Pierre Peiffer
  2007-01-16  9:44 ` Ingo Molnar
@ 2007-01-16 15:14   ` Ulrich Drepper
  2007-01-16 15:40     ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread

From: Ulrich Drepper @ 2007-01-16 15:14 UTC (permalink / raw)
To: Pierre Peiffer; +Cc: LKML, Ingo Molnar, Andrew Morton, Jakub Jelinek

Pierre Peiffer wrote:
> I've run this bench 1000 times with pipe and 800 groups.
> Here are the results:

This is not what I'm mostly concerned about. The patches create a
bottleneck since _all_ processes use the same resource. Plus, this code
has to be run on a machine with multiple processors to bring RFOs into
play.

So, please do this: on an SMP (4p or more) machine, rig the test so that
it runs for quite a while. Then, in a script, start the program a bunch
of times, all in parallel. Have the script wait until all program runs
are done, and measure the time until the last program finishes.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 15:14 ` Ulrich Drepper
@ 2007-01-16 15:40   ` Ingo Molnar
  2007-01-16 17:46     ` Ulrich Drepper
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2007-01-16 15:40 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek

* Ulrich Drepper <drepper@redhat.com> wrote:

> Pierre Peiffer wrote:
> > I've run this bench 1000 times with pipe and 800 groups.
> > Here are the results:
>
> This is not what I'm mostly concerned about. The patches create a
> bottleneck since _all_ processes use the same resource. [...]

what do you mean by that - which is this shared resource?

	Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 15:40 ` Ingo Molnar
@ 2007-01-16 17:46   ` Ulrich Drepper
  2007-01-16 17:50     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread

From: Ulrich Drepper @ 2007-01-16 17:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek

Ingo Molnar wrote:
> what do you mean by that - which is this same resource?

From what has been said here before, all futexes are stored in the same
list or hash table or whatever it was. I want to see how that code
behaves if many separate processes concurrently use futexes.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 17:46 ` Ulrich Drepper
@ 2007-01-16 17:50   ` Ingo Molnar
  2007-01-17  7:50     ` Pierre Peiffer
  0 siblings, 1 reply; 11+ messages in thread

From: Ingo Molnar @ 2007-01-16 17:50 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: Pierre Peiffer, LKML, Andrew Morton, Jakub Jelinek

* Ulrich Drepper <drepper@redhat.com> wrote:

> > what do you mean by that - which is this same resource?
>
> From what has been said here before, all futexes are stored in the
> same list or hash table or whatever it was. I want to see how that
> code behaves if many separate processes concurrently use futexes.

futexes are stored in the bucket hash, and these patches do not change
that. The pi-list that was talked about is per-futex. So there's no
change to the way futexes are hashed, nor should there be any scalability
impact - besides the micro-impact that was measured in a number of ways -
AFAICS.

	Ingo
* Re: [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements
  2007-01-16 17:50 ` Ingo Molnar
@ 2007-01-17  7:50   ` Pierre Peiffer
  0 siblings, 0 replies; 11+ messages in thread

From: Pierre Peiffer @ 2007-01-17 7:50 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Ulrich Drepper, LKML, Andrew Morton, Jakub Jelinek

Ingo Molnar a écrit :
> * Ulrich Drepper <drepper@redhat.com> wrote:
>
>>> what do you mean by that - which is this same resource?
>> From what has been said here before, all futexes are stored in the
>> same list or hash table or whatever it was. I want to see how that
>> code behaves if many separate processes concurrently use futexes.
>
> futexes are stored in the bucket hash, and these patches do not change
> that. The pi-list that was talked about is per-futex. So there's no
> change to the way futexes are hashed nor should there be any scalability
> impact - besides the micro-impact that was measured in a number of ways
> - AFAICS.

Yes, that's completely right!

-- 
Pierre
Thread overview: 11+ messages
2007-01-09 16:15 [PATCH 2.6.20-rc4 0/4] futexes functionalities and improvements Pierre Peiffer
2007-01-11 17:47 ` Ulrich Drepper
[not found] ` <20070111134615.34902742.akpm@osdl.org>
2007-01-12 7:53 ` Pierre Peiffer
2007-01-12 7:58 ` Ingo Molnar
2007-01-16 8:34 ` Pierre Peiffer
2007-01-16 9:44 ` Ingo Molnar
2007-01-16 15:14 ` Ulrich Drepper
2007-01-16 15:40 ` Ingo Molnar
2007-01-16 17:46 ` Ulrich Drepper
2007-01-16 17:50 ` Ingo Molnar
2007-01-17 7:50 ` Pierre Peiffer