* Improving lock pages
@ 2013-01-15 17:38 Nathan Zimmer
  2013-01-15 18:10 ` Nathan Zimmer
  2013-02-06 16:31 ` Mel Gorman
  0 siblings, 2 replies; 5+ messages in thread
From: Nathan Zimmer @ 2013-01-15 17:38 UTC
  To: Mel Gorman; +Cc: holt, linux-mm

Hello Mel,
You helped some time ago with contention in lock_pages on very large boxes.
You worked with Jack Steiner on this. Currently I am tasked with improving this
area even more. So I am fishing for any more ideas that would be productive or
worth trying.

I have some numbers from a 512 machine.

Linux uvpsw1 3.0.51-0.7.9-default #1 SMP Thu Nov 29 22:12:17 UTC 2012 (f3be9d0) x86_64 x86_64 x86_64 GNU/Linux
0.166850
0.082339
0.248428
0.081197
0.127635

Linux uvpsw1 3.8.0-rc1-medusa_ntz_clean-dirty #32 SMP Tue Jan 8 16:01:04 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
0.151778
0.118343
0.135750
0.437019
0.120536

Nathan Zimmer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Improving lock pages
  2013-01-15 17:38 Improving lock pages Nathan Zimmer
@ 2013-01-15 18:10 ` Nathan Zimmer
  2013-02-06 16:31 ` Mel Gorman
  1 sibling, 0 replies; 5+ messages in thread
From: Nathan Zimmer @ 2013-01-15 18:10 UTC
  To: Nathan Zimmer; +Cc: Mel Gorman, holt, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]

On Tue, Jan 15, 2013 at 11:38:14AM -0600, Nathan Zimmer wrote:
>
> Hello Mel,
> You helped some time ago with contention in lock_pages on very large boxes.
> You worked with Jack Steiner on this. Currently I am tasked with improving this
> area even more. So I am fishing for any more ideas that would be productive or
> worth trying.
>
> I have some numbers from a 512 machine.
>
> Linux uvpsw1 3.0.51-0.7.9-default #1 SMP Thu Nov 29 22:12:17 UTC 2012 (f3be9d0) x86_64 x86_64 x86_64 GNU/Linux
> 0.166850
> 0.082339
> 0.248428
> 0.081197
> 0.127635
>
> Linux uvpsw1 3.8.0-rc1-medusa_ntz_clean-dirty #32 SMP Tue Jan 8 16:01:04 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
> 0.151778
> 0.118343
> 0.135750
> 0.437019
> 0.120536
>
> Nathan Zimmer

I realized I forgot to attach the test.
The test is fairly basic. It forks off a number of processes, each pinned to
its own cpu, has them all wait on a shared cell, and measures how long it
takes for all of them to exit.

Usage is ./time_exit -p 3 512

The numbers I have provided were from some runs on a 512 system.
I tried for a 4096 box but it was being fickle and was needed for some other
testing.
[-- Attachment #2: time_exit.c --]
[-- Type: text/x-c++src, Size: 2092 bytes --]

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/wait.h>

/* Shared between the parent and all children; each flag is 64-byte aligned. */
struct time_exit {
	volatile int ready __attribute__((aligned(64)));
	volatile int quit __attribute__((aligned(64)));
};

#define cpu_relax() asm volatile ("rep;nop":::"memory")

#define MAXCPUS 4096

static int cpu_set_size;
static cpu_set_t *task_affinity;
static int delay;

/* Pin the calling process to the given cpu. */
static void pin(int cpu)
{
	cpu_set_t *affinity;

	if (cpu < 0 || cpu >= MAXCPUS)
		return;

	affinity = CPU_ALLOC(MAXCPUS);
	CPU_ZERO_S(cpu_set_size, affinity);
	CPU_SET_S(cpu, cpu_set_size, affinity);
	(void)sched_setaffinity(0, cpu_set_size, affinity);
	CPU_FREE(affinity);
}

/* Announce readiness, spin until the parent sets quit, then exit. */
static void child(struct time_exit *sharep, int cpu)
{
	pin(cpu);
	__sync_fetch_and_add(&sharep->ready, 1);
	while (sharep->quit == 0)
		cpu_relax();
	exit(0);
}

int main(int argc, char **argv)
{
	int children, i;
	struct time_exit *sharep;
	struct timeval tv0, tv1;
	long secs, usecs;
	int opt;	/* getopt() returns int; a plain char may never equal -1 */

	while ((opt = getopt(argc, argv, "p:")) != -1) {
		switch (opt) {
		case 'p':
			delay = atoi(optarg);
			break;
		default:
			fprintf(stderr, "Usage:\n");
		}
	}
	argv += optind - 1;
	argc -= optind - 1;

	if (argc != 2) {
		printf("Wrong\n");
		exit(-1);
	}
	children = atoi(argv[1]);

	cpu_set_size = CPU_ALLOC_SIZE(MAXCPUS);
	task_affinity = CPU_ALLOC(MAXCPUS);
	if (sched_getaffinity(0, cpu_set_size, task_affinity) < 0) {
		perror("Failed in sched_getaffinity");
		exit(-2);
	}

	sharep = mmap(0, sizeof(struct time_exit), PROT_READ | PROT_WRITE,
		      MAP_ANONYMOUS | MAP_SHARED, -1, 0);

	for (i = 0; i < children; i++)
		if (fork() == 0)
			child(sharep, i);

	/* Wait until every child has checked in. */
	while (sharep->ready != children)
		cpu_relax();

	if (delay)
		sleep(delay);

	/* Time from releasing the children until all of them are reaped. */
	gettimeofday(&tv0, NULL);
	sharep->quit = 1;
	while (wait(&i) > 0)
		cpu_relax();
	gettimeofday(&tv1, NULL);

	usecs = tv1.tv_usec - tv0.tv_usec;
	secs = tv1.tv_sec - tv0.tv_sec;
	if (usecs < 0) {
		secs--;
		usecs += 1000000;
	}

	printf("%7ld.%06ld\n", secs, usecs);

	return 0;
}
* Re: Improving lock pages
  2013-01-15 17:38 Improving lock pages Nathan Zimmer
  2013-01-15 18:10 ` Nathan Zimmer
@ 2013-02-06 16:31 ` Mel Gorman
  2013-02-08 21:55   ` Nathan Zimmer
  1 sibling, 1 reply; 5+ messages in thread
From: Mel Gorman @ 2013-02-06 16:31 UTC
  To: Nathan Zimmer; +Cc: holt, linux-mm

On Tue, Jan 15, 2013 at 11:38:14AM -0600, Nathan Zimmer wrote:
>
> Hello Mel,

Hi Nathan,

> You helped some time ago with contention in lock_pages on very large boxes.

It was Nick Piggin and Jack Steiner that helped the situation within SLES
and before my time. I inherited the relevant patches but made relatively
few contributions to the effort.

> You worked with Jack Steiner on this. Currently I am tasked with improving this
> area even more. So I am fishing for any more ideas that would be productive or
> worth trying.
>
> I have some numbers from a 512 machine.
>
> Linux uvpsw1 3.0.51-0.7.9-default #1 SMP Thu Nov 29 22:12:17 UTC 2012 (f3be9d0) x86_64 x86_64 x86_64 GNU/Linux
> 0.166850
> 0.082339
> 0.248428
> 0.081197
> 0.127635

Ok, this looks like a SLES 11 SP2 kernel and so includes some unlock/lock
page optimisations.

> Linux uvpsw1 3.8.0-rc1-medusa_ntz_clean-dirty #32 SMP Tue Jan 8 16:01:04 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
> 0.151778
> 0.118343
> 0.135750
> 0.437019
> 0.120536
>

And this is a mainline-ish kernel which doesn't.

The main reason I never made a strong effort to push them upstream is
that the problems are barely observable on any machine I had access to.
The unlock page optimisation requires a page flag and while it helps
profiles a little, the effects are barely observable on smaller machines
(at least since I last checked). One machine it was reported to help
dramatically was a 768-way 128 node machine.

For the 512-way machine you're testing with the figures are marginal. The
time to exit is shorter but the amount of time is tiny and very close to
noise. I forward ported the relevant patches but on a 48-way machine the
results for the same test were well within the noise and the standard
deviation was higher.

I know you're tasked with improving this area more but what are you
using as your example workload? What's the minimum sized machine needed
for the optimisations to make a difference?

--
Mel Gorman
SUSE Labs
* Re: Improving lock pages
  2013-02-06 16:31 ` Mel Gorman
@ 2013-02-08 21:55   ` Nathan Zimmer
  2013-02-13 10:47     ` Mel Gorman
  0 siblings, 1 reply; 5+ messages in thread
From: Nathan Zimmer @ 2013-02-08 21:55 UTC
  To: Mel Gorman; +Cc: holt, linux-mm

On 02/06/2013 10:31 AM, Mel Gorman wrote:
> On Tue, Jan 15, 2013 at 11:38:14AM -0600, Nathan Zimmer wrote:
>> Hello Mel,
> Hi Nathan,
>
>> You helped some time ago with contention in lock_pages on very large boxes.
> It was Nick Piggin and Jack Steiner that helped the situation within SLES
> and before my time. I inherited the relevant patches but made relatively
> few contributions to the effort.
>
>> You worked with Jack Steiner on this. Currently I am tasked with improving this
>> area even more. So I am fishing for any more ideas that would be productive or
>> worth trying.
>>
>> I have some numbers from a 512 machine.
>>
>> Linux uvpsw1 3.0.51-0.7.9-default #1 SMP Thu Nov 29 22:12:17 UTC 2012 (f3be9d0) x86_64 x86_64 x86_64 GNU/Linux
>> 0.166850
>> 0.082339
>> 0.248428
>> 0.081197
>> 0.127635
> Ok, this looks like a SLES 11 SP2 kernel and so includes some unlock/lock
> page optimisations.
>
>> Linux uvpsw1 3.8.0-rc1-medusa_ntz_clean-dirty #32 SMP Tue Jan 8 16:01:04 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
>> 0.151778
>> 0.118343
>> 0.135750
>> 0.437019
>> 0.120536
>>
> And this is a mainline-ish kernel which doesn't.
>
> The main reason I never made a strong effort to push them upstream is
> that the problems are barely observable on any machine I had access to.
> The unlock page optimisation requires a page flag and while it helps
> profiles a little, the effects are barely observable on smaller machines
> (at least since I last checked). One machine it was reported to help
> dramatically was a 768-way 128 node machine.
>
> For the 512-way machine you're testing with the figures are marginal. The
> time to exit is shorter but the amount of time is tiny and very close to
> noise. I forward ported the relevant patches but on a 48-way machine the
> results for the same test were well within the noise and the standard
> deviation was higher.

One thing I had noticed is that the performance curve on this issue is
worse than linear.
This has made it tough to measure/capture data on smaller boxes.

> I know you're tasked with improving this area more but what are you
> using as your example workload? What's the minimum sized machine needed
> for the optimisations to make a difference?
>

Right now I am just using the time_exit test I posted earlier.
I know it is a bit artificial and am open to suggestions.

One of the rough goals is to get under a second on a 4096 box.

Also here are some numbers from a larger box with 3.8-rc4...
nzimmer@uv48-sys:~/tests/time_exit> for I in $(seq 1 5); { ./time_exit -p 3 2048; }
0.762282
0.810356
0.777785
0.840679
0.743509

nzimmer@uv48-sys:~/tests/time_exit> for I in $(seq 1 5); { ./time_exit -p 3 4096; }
2.550571
2.374378
2.669021
2.703232
2.679028
* Re: Improving lock pages
  2013-02-08 21:55   ` Nathan Zimmer
@ 2013-02-13 10:47     ` Mel Gorman
  0 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2013-02-13 10:47 UTC
  To: Nathan Zimmer; +Cc: holt, linux-mm

On Fri, Feb 08, 2013 at 03:55:09PM -0600, Nathan Zimmer wrote:
> >The main reason I never made a strong effort to push them upstream is
> >that the problems are barely observable on any machine I had access to.
> >The unlock page optimisation requires a page flag and while it helps
> >profiles a little, the effects are barely observable on smaller machines
> >(at least since I last checked). One machine it was reported to help
> >dramatically was a 768-way 128 node machine.
> >
> >For the 512-way machine you're testing with the figures are marginal. The
> >time to exit is shorter but the amount of time is tiny and very close to
> >noise. I forward ported the relevant patches but on a 48-way machine the
> >results for the same test were well within the noise and the standard
> >deviation was higher.
>
> One thing I had noticed is that the performance curve on this issue is
> worse than linear.
> This has made it tough to measure/capture data on smaller boxes.
>

While this is true, the figures you present are of marginal gain given the
complexity involved. I know the patches also affected boot-times quite
significantly but this was not a common task for the machines involved.

> >I know you're tasked with improving this area more but what are you
> >using as your example workload? What's the minimum sized machine needed
> >for the optimisations to make a difference?
> >
>
> Right now I am just using the time_exit test I posted earlier.
> I know it is a bit artificial and am open to suggestions.
>

I'm not currently aware of a workload that is dominated by lock_page
contention and I was expecting SGI was. There are plenty of times where we
stall on lock_page but it's usually IO related and not because processes
trying to acquire the lock went to sleep too quickly.

> One of the rough goals is to get under a second on a 4096 box.
>
> Also here are some numbers from a larger box with 3.8-rc4...
> nzimmer@uv48-sys:~/tests/time_exit> for I in $(seq 1 5); {
> ./time_exit -p 3 2048; }
> 0.762282
> 0.810356
> 0.777785
> 0.840679
> 0.743509
>
> nzimmer@uv48-sys:~/tests/time_exit> for I in $(seq 1 5); {
> ./time_exit -p 3 4096; }
> 2.550571
> 2.374378
> 2.669021
> 2.703232
> 2.679028
>

I collapsed the patches, edited them a bit and pushed them to the
mm-lock-page-optimise-v1r1 branch in the git repository
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git

The patches are rebased against 3.8-rc6 but I did not pay any special
attention to actually improving them. I did leave a few notes on what could
be done in the changelog. You could try them out as a starting point and see
if they can be reduced to the minimum you require. Unfortunately I suspect
that you'll need a more compelling test case than time_exit on a 4096-way
machine to justify pushing them to mainline.

--
Mel Gorman
SUSE Labs