* fio hangs with --status-interval @ 2014-07-09 22:56 Michael Mattsson 2014-07-10 8:44 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-07-09 22:56 UTC (permalink / raw) To: fio Hey, I've got 8 identical CentOS 6.5 clients that randomly keeps hanging fio when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10 they both behave the same. I've also tried piping the output to tee instead of redirecting to a file. I also tried --output and specified output file, still same problem. My fio command runs through its tests flawlessly without --status-interval and exits cleanly every time. There could be anywhere from 0 to 5 clients that gets affected. Running strace on the process that seem hung yields the following output: $ strace -p 31055 Process 31055 attached - interrupt to quit futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL It will sit there forever. Excuse the splurge but this is the command (run 8 clients simultaneously where /fut1 and /fut2 is on shared storage (NFSv3)): $ fio --minimal --direct=1 --group_reporting --filesize=64m --norandommap --blocksize=4k --time_based --iodepth=1 --ramp_time=15 --ioengine=libaio --status-interval=5 --name=mytest-foo --rw=randread --numjobs=64 --filename=/fut1/6of4_1:/fut2/6of4_1:/fut1/6of4_2:/fut2/6of4_2:/fut1/6of4_3:/fut2/6of4_3:/fut1/6of4_4:/fut2/6of4_4:/fut1/6of4_5:/fut2/6of4_5:/fut1/6of4_6:/fut2/6of4_6:/fut1/6of4_7:/fut2/6of4_7:/fut1/6of4_8:/fut2/6of4_8:/fut1/6of4_9:/fut2/6of4_9:/fut1/6of4_10:/fut2/6of4_10:/fut1/6of4_11:/fut2/6of4_11:/fut1/6of4_12:/fut2/6of4_12:/fut1/6of4_13:/fut2/6of4_13:/fut1/6of4_14:/fut2/6of4_14:/fut1/6of4_15:/fut2/6of4_15:/fut1/6of4_16:/fut2/6of4_16:/fut1/6of4_17:/fut2/6of4_17:/fut1/6of4_18:/fut2/6of4_18:/fut1/6of4_19:/fut2/6of4_19:/fut1/6of4_20:/fut2/6of4_20:/fut1/6of4_21:/fut2/6of4_21:/fut1/6of4_22:/fut2/6of4_22:/fut1/6of4_23:/fut2/6of4_23:/fut1/6of4_24:/fut2/6of4_24:/fut1/6of4_25:/fut2/6of4_25:/fut1/6of4_26:/fut2/6of4_26:/fut1/6of4_27:/fut2/6of4_27:/fut1/6of4_28:/fut2/6of4_28:/fut1/6of4_29:/fut2/6of4_29:/fut1/6of4_30:/fut2/6of4_30:/fut1/6of4_31:/fut2/6of4_31:/fut1/6of4_32:/fut2/6of4_32 --runtime=60 Worth noting is that the output file is redirected to shared storage (NFSv4) on a different system than the one under test. These are the fio 2.1.4 compile options: $ ./configure Operating system Linux CPU x86_64 Big endian no Compiler gcc Cross compile no Wordsize 64 zlib no Linux AIO support yes POSIX AIO support yes POSIX AIO support needs -lrt yes POSIX AIO fsync yes Solaris AIO support no __sync_fetch_and_add yes libverbs no rdmacm no Linux fallocate yes POSIX fadvise yes POSIX fallocate yes sched_setaffinity(3 arg) yes sched_setaffinity(2 arg) no clock_gettime yes CLOCK_MONOTONIC yes CLOCK_MONOTONIC_PRECISE no gettimeofday yes fdatasync yes sync_file_range yes EXT4 move extent yes Linux splice(2) yes GUASI no Fusion-io atomic engine no libnuma no strsep yes strcasestr yes getopt_long_only() yes inet_aton yes socklen_t yes __thread yes gtk 2.18 or higher no RUSAGE_THREAD yes SCHED_IDLE yes TCP_NODELAY yes RLIMIT_MEMLOCK yes pwritev/preadv yes Regards Michael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-09 22:56 fio hangs with --status-interval Michael Mattsson @ 2014-07-10 8:44 ` Jens Axboe 2014-07-10 14:55 ` Michael Mattsson 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-10 8:44 UTC (permalink / raw) To: Michael Mattsson, fio [-- Attachment #1: Type: text/plain, Size: 988 bytes --] On 2014-07-10 00:56, Michael Mattsson wrote: > Hey, > I've got 8 identical CentOS 6.5 clients that randomly keeps hanging > fio when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10 > they both behave the same. I've also tried piping the output to tee > instead of redirecting to a file. I also tried --output and specified > output file, still same problem. My fio command runs through its tests > flawlessly without --status-interval and exits cleanly every time. > There could be anywhere from 0 to 5 clients that gets affected. > Running strace on the process that seem hung yields the following > output: > > $ strace -p 31055 > Process 31055 attached - interrupt to quit > futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL Strange, it must be stuck on the stat mutex, but I don't immediately see why that would happen. Does the attached patch make any difference for you, both in getting rid of the hang but still producing output at the desired intervals? -- Jens Axboe [-- Attachment #2: stat.patch --] [-- Type: text/x-patch, Size: 1790 bytes --] diff --git a/stat.c b/stat.c index 979c8100d378..93316a239f7b 100644 --- a/stat.c +++ b/stat.c @@ -1466,11 +1466,12 @@ static void *__show_running_run_stats(void fio_unused *arg) * in the sig handler, but we should be disturbing the system less by just * creating a thread to do it. */ -void show_running_run_stats(void) +int show_running_run_stats(void) { pthread_t thread; - fio_mutex_down(stat_mutex); + if (fio_mutex_down_trylock(stat_mutex)) + return 1; if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) { int err; @@ -1479,10 +1480,11 @@ void show_running_run_stats(void) if (err) log_err("fio: DU thread detach failed: %s\n", strerror(err)); - return; + return 0; } fio_mutex_up(stat_mutex); + return 1; } static int status_interval_init; @@ -1531,8 +1533,8 @@ void check_for_running_stats(void) fio_gettime(&status_time, NULL); status_interval_init = 1; } else if (mtime_since_now(&status_time) >= status_interval) { - show_running_run_stats(); - fio_gettime(&status_time, NULL); + if (!show_running_run_stats()) + fio_gettime(&status_time, NULL); return; } } diff --git a/stat.h b/stat.h index 2e46175053e8..82b8e973e4be 100644 --- a/stat.h +++ b/stat.h @@ -218,7 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs); extern int calc_thread_status(struct jobs_eta *je, int force); extern void display_thread_status(struct jobs_eta *je); extern void show_run_stats(void); -extern void show_running_run_stats(void); +extern int show_running_run_stats(void); extern void check_for_running_stats(void); extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr); extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src); ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-10 8:44 ` Jens Axboe @ 2014-07-10 14:55 ` Michael Mattsson 2014-07-10 20:07 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-07-10 14:55 UTC (permalink / raw) To: Jens Axboe; +Cc: fio Hey, Thanks for the patch. I got log output but I had two clients hanging in the same way with the attached patch. $ patch -p0 < stat.patch patching file b/stat.c patching file b/stat.h Hunk #1 succeeded at 213 (offset -5 lines). I was using --output <filename> Worth mentioning here is that on the NFS server the output file is written to is issuing a stat(1) on the above output file once per second. Could it cause any prolems? Thanks! Michael On Thu, Jul 10, 2014 at 1:44 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-10 00:56, Michael Mattsson wrote: >> >> Hey, >> I've got 8 identical CentOS 6.5 clients that randomly keeps hanging >> fio when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10 >> they both behave the same. I've also tried piping the output to tee >> instead of redirecting to a file. I also tried --output and specified >> output file, still same problem. My fio command runs through its tests >> flawlessly without --status-interval and exits cleanly every time. >> There could be anywhere from 0 to 5 clients that gets affected. >> Running strace on the process that seem hung yields the following >> output: >> >> $ strace -p 31055 >> Process 31055 attached - interrupt to quit >> futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL > > > Strange, it must be stuck on the stat mutex, but I don't immediately see why > that would happen. Does the attached patch make any difference for you, both > in getting rid of the hang but still producing output at the desired > intervals? > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-10 14:55 ` Michael Mattsson @ 2014-07-10 20:07 ` Jens Axboe 2014-07-10 21:27 ` Michael Mattsson 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-10 20:07 UTC (permalink / raw) To: Michael Mattsson; +Cc: fio On 2014-07-10 16:55, Michael Mattsson wrote: > Hey, > Thanks for the patch. I got log output but I had two clients hanging > in the same way with the attached patch. > > $ patch -p0 < stat.patch > patching file b/stat.c > patching file b/stat.h > Hunk #1 succeeded at 213 (offset -5 lines). > > I was using --output <filename> > > Worth mentioning here is that on the NFS server the output file is > written to is issuing a stat(1) on the above output file once per > second. Could it cause any prolems? OK, so next question. If you leave it long enough, do you get "stuck process" dumps from the kernel? In any case, a: # echo t > /proc/sysrq-trigger dump would be handy to see for all the fio processes, this doesn't smell like a fio issue. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-10 20:07 ` Jens Axboe @ 2014-07-10 21:27 ` Michael Mattsson 2014-07-11 11:48 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-07-10 21:27 UTC (permalink / raw) To: Jens Axboe; +Cc: fio Hey, I don't get any kernel messages about stuck processes. echo t > /proc/sysrq-trigger gives the following related to fio: fio S 0000000000000004 0 4189 3638 0x00000080 ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867 ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10 ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8 Call Trace: [<ffffffff8111f867>] ? unlock_page+0x27/0x30 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0 [<ffffffff810af668>] futex_wait+0x1f8/0x380 [<ffffffff8100988e>] ? __switch_to+0x26e/0x320 [<ffffffff810b0f31>] do_futex+0x121/0xb50 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60 [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0 [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30 [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0 [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180 [<ffffffff810b19db>] sys_futex+0x7b/0x170 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b fio S 0000000000000004 0 4191 3638 0x00000080 ffff88186bccde68 0000000000000086 0000000000000000 ffffffff8109fdd3 0000000000000004 0000000100000286 ffff88186bccde08 ffffffff8109f491 ffff8818680c7af8 ffff88186bccdfd8 000000000000fbc8 ffff8818680c7af8 Call Trace: [<ffffffff8109fdd3>] ? __hrtimer_start_range_ns+0x1a3/0x460 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8152a40b>] do_nanosleep+0x8b/0xc0 [<ffffffff810a0294>] hrtimer_nanosleep+0xc4/0x180 [<ffffffff8109f0f0>] ? hrtimer_wakeup+0x0/0x30 [<ffffffff810a00c4>] ? hrtimer_start_range_ns+0x14/0x20 [<ffffffff810a03be>] sys_nanosleep+0x6e/0x80 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b fio S 0000000000000002 0 4270 3638 0x00000080 ffff88186a503bf8 0000000000000086 ffff88186a503b68 ffffffff8111f867 ffff88186a503d01 00007fa2f39b2000 ffff88186a503bd8 ffffffff810aec10 ffff88186a747058 ffff88186a503fd8 000000000000fbc8 ffff88186a747058 Call Trace: [<ffffffff8111f867>] ? unlock_page+0x27/0x30 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0 [<ffffffff810af668>] futex_wait+0x1f8/0x380 [<ffffffff810b0f31>] do_futex+0x121/0xb50 [<ffffffff810b19db>] sys_futex+0x7b/0x170 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Above output is without the stat.patch, below is with the patch: fio S 0000000000000004 0 4189 3638 0x00000080 ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867 ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10 ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8 Call Trace: [<ffffffff8111f867>] ? unlock_page+0x27/0x30 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0 [<ffffffff810af668>] futex_wait+0x1f8/0x380 [<ffffffff8100988e>] ? __switch_to+0x26e/0x320 [<ffffffff810b0f31>] do_futex+0x121/0xb50 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60 [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0 [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30 [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0 [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180 [<ffffffff810b19db>] sys_futex+0x7b/0x170 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b fio S 0000000000000004 0 4191 3638 0x00000080 ffff88186bccde68 0000000000000086 0000000000000000 ffffffff8109fdd3 0000000000000004 0000000100000286 ffff88186bccde08 ffffffff8109f491 ffff8818680c7af8 ffff88186bccdfd8 000000000000fbc8 ffff8818680c7af8 Call Trace: [<ffffffff8109fdd3>] ? __hrtimer_start_range_ns+0x1a3/0x460 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8152a40b>] do_nanosleep+0x8b/0xc0 [<ffffffff810a0294>] hrtimer_nanosleep+0xc4/0x180 [<ffffffff8109f0f0>] ? hrtimer_wakeup+0x0/0x30 [<ffffffff810a00c4>] ? hrtimer_start_range_ns+0x14/0x20 [<ffffffff810a03be>] sys_nanosleep+0x6e/0x80 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b fio S 0000000000000002 0 4270 3638 0x00000080 ffff88186a503bf8 0000000000000086 ffff88186a503b68 ffffffff8111f867 ffff88186a503d01 00007fa2f39b2000 ffff88186a503bd8 ffffffff810aec10 ffff88186a747058 ffff88186a503fd8 000000000000fbc8 ffff88186a747058 Call Trace: [<ffffffff8111f867>] ? unlock_page+0x27/0x30 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0 [<ffffffff810af668>] futex_wait+0x1f8/0x380 [<ffffffff810b0f31>] do_futex+0x121/0xb50 [<ffffffff810b19db>] sys_futex+0x7b/0x170 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Thanks! Michael On Thu, Jul 10, 2014 at 1:07 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-10 16:55, Michael Mattsson wrote: >> >> Hey, >> Thanks for the patch. I got log output but I had two clients hanging >> in the same way with the attached patch. >> >> $ patch -p0 < stat.patch >> patching file b/stat.c >> patching file b/stat.h >> Hunk #1 succeeded at 213 (offset -5 lines). >> >> I was using --output <filename> >> >> Worth mentioning here is that on the NFS server the output file is >> written to is issuing a stat(1) on the above output file once per >> second. Could it cause any prolems? > > > OK, so next question. If you leave it long enough, do you get "stuck > process" dumps from the kernel? In any case, a: > > # echo t > /proc/sysrq-trigger > > dump would be handy to see for all the fio processes, this doesn't smell > like a fio issue. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-10 21:27 ` Michael Mattsson @ 2014-07-11 11:48 ` Jens Axboe 2014-07-11 16:19 ` Michael Mattsson 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-11 11:48 UTC (permalink / raw) To: Michael Mattsson; +Cc: fio On 2014-07-10 23:27, Michael Mattsson wrote: > Hey, > I don't get any kernel messages about stuck processes. echo t > > /proc/sysrq-trigger gives the following related to fio: > > fio S 0000000000000004 0 4189 3638 0x00000080 > ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867 > ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10 > ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8 > Call Trace: > [<ffffffff8111f867>] ? unlock_page+0x27/0x30 > [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0 > [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0 > [<ffffffff810af668>] futex_wait+0x1f8/0x380 > [<ffffffff8100988e>] ? __switch_to+0x26e/0x320 > [<ffffffff810b0f31>] do_futex+0x121/0xb50 > [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60 > [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0 > [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30 > [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0 > [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180 > [<ffffffff810b19db>] sys_futex+0x7b/0x170 > [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200 > [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290 > [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b OK, so nothing unexpected there. But it's all still very weird. Could you try and attach gdb to a fio process that is stuck like this, and generate 'bt' backtraces? -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-11 11:48 ` Jens Axboe @ 2014-07-11 16:19 ` Michael Mattsson 2014-07-12 8:42 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-07-11 16:19 UTC (permalink / raw) To: Jens Axboe; +Cc: fio Hey, This system will be busy over the weekend with other work, I'll get this to you at the next window or try reproduce it on a different system. Do you want the trace with or without the patch you sent? Thanks! Michael On Fri, Jul 11, 2014 at 4:48 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-10 23:27, Michael Mattsson wrote: >> >> Hey, >> I don't get any kernel messages about stuck processes. echo t > >> /proc/sysrq-trigger gives the following related to fio: >> >> fio S 0000000000000004 0 4189 3638 0x00000080 >> ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867 >> ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10 >> ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8 >> Call Trace: >> [<ffffffff8111f867>] ? unlock_page+0x27/0x30 >> [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0 >> [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0 >> [<ffffffff810af668>] futex_wait+0x1f8/0x380 >> [<ffffffff8100988e>] ? __switch_to+0x26e/0x320 >> [<ffffffff810b0f31>] do_futex+0x121/0xb50 >> [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60 >> [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0 >> [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30 >> [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0 >> [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180 >> [<ffffffff810b19db>] sys_futex+0x7b/0x170 >> [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200 >> [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290 >> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b > > > OK, so nothing unexpected there. But it's all still very weird. Could you > try and attach gdb to a fio process that is stuck like this, and generate > 'bt' backtraces? > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-11 16:19 ` Michael Mattsson @ 2014-07-12 8:42 ` Jens Axboe 2014-07-16 16:58 ` Vasily Tarasov 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-12 8:42 UTC (permalink / raw) To: Michael Mattsson; +Cc: fio On 2014-07-11 18:19, Michael Mattsson wrote: > Hey, > This system will be busy over the weekend with other work, I'll get > this to you at the next window or try reproduce it on a different > system. Do you want the trace with or without the patch you sent? That's fine - and either version can be used, doesn't matter to me. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-12 8:42 ` Jens Axboe @ 2014-07-16 16:58 ` Vasily Tarasov 2014-07-21 8:08 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Vasily Tarasov @ 2014-07-16 16:58 UTC (permalink / raw) To: Jens Axboe; +Cc: Michael Mattsson, fio@vger.kernel.org I started to observe similar behavior on one of my workloads. Also, with periodic statistics output and also on RHEL 6.5. Here is gdb output in my case: # ps axu | grep fio root 4489 0.0 0.0 322040 52816 pts/1 Sl+ 08:31 0:03 fio --status-interval 10 --minimal fios/1.fio root 5547 0.0 0.0 103256 860 pts/0 S+ 09:56 0:00 grep fio # cat /proc/4489/wchan futex_wait_queue_me # gdb GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. (gdb) attach 4489 Attaching to process 4489 Reading symbols from /usr/local/bin/fio...done. Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/librdmacm.so.1 Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib64/libibverbs.so.1 Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/librt.so.1 Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libaio.so.1 Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /lib64/libz.so.1 Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done. [New LWP 4768] [New LWP 4491] [Thread debugging using libthread_db enabled] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libibverbs-1.1.7-1.el6.x86_64 librdmacm-1.0.17-1.el6.x86_64 zlib-1.2.3-29.el6.x86_64 (gdb) bt #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x000000000042ea39 in fio_mutex_down (mutex=0x7f08b4e9f000) at mutex.c:155 #2 0x000000000041b680 in show_run_stats () at stat.c:1409 #3 0x0000000000449c85 in fio_backend () at backend.c:2042 #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6 #5 0x000000000040a4b9 in _start () (gdb) Does it help? Thanks, Vasily On Sat, Jul 12, 2014 at 4:42 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-11 18:19, Michael Mattsson wrote: >> >> Hey, >> This system will be busy over the weekend with other work, I'll get >> this to you at the next window or try reproduce it on a different >> system. Do you want the trace with or without the patch you sent? > > > That's fine - and either version can be used, doesn't matter to me. > > > -- > Jens Axboe > > -- > To unsubscribe from this list: send the line "unsubscribe fio" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-16 16:58 ` Vasily Tarasov @ 2014-07-21 8:08 ` Jens Axboe 2014-07-21 20:25 ` Vasily Tarasov 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-21 8:08 UTC (permalink / raw) To: Vasily Tarasov; +Cc: Michael Mattsson, fio@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3422 bytes --] On 2014-07-16 18:58, Vasily Tarasov wrote: > I started to observe similar behavior on one of my workloads. Also, > with periodic statistics output and also on RHEL 6.5. Here is gdb > output in my case: > > > # ps axu | grep fio > root 4489 0.0 0.0 322040 52816 pts/1 Sl+ 08:31 0:03 fio > --status-interval 10 --minimal fios/1.fio > root 5547 0.0 0.0 103256 860 pts/0 S+ 09:56 0:00 grep fio > > # cat /proc/4489/wchan > futex_wait_queue_me > > # gdb > GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1) > Copyright (C) 2010 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-redhat-linux-gnu". > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>. > (gdb) attach 4489 > Attaching to process 4489 > Reading symbols from /usr/local/bin/fio...done. > Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols > found)...done. > Loaded symbols for /usr/lib64/librdmacm.so.1 > Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging > symbols found)...done. > Loaded symbols for /usr/lib64/libibverbs.so.1 > Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done. > Loaded symbols for /lib64/librt.so.1 > Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done. > Loaded symbols for /lib64/libaio.so.1 > Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done. > Loaded symbols for /lib64/libz.so.1 > Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done. > Loaded symbols for /lib64/libm.so.6 > Reading symbols from /lib64/libpthread.so.0...(no debugging symbols > found)...done. > [New LWP 4768] > [New LWP 4491] > [Thread debugging using libthread_db enabled] > Loaded symbols for /lib64/libpthread.so.0 > Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done. > Loaded symbols for /lib64/libdl.so.2 > Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. > Loaded symbols for /lib64/libc.so.6 > Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging > symbols found)...done. > Loaded symbols for /lib64/ld-linux-x86-64.so.2 > 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > Missing separate debuginfos, use: debuginfo-install > glibc-2.12-1.132.el6.x86_64 libaio-0.3.107-10.el6.x86_64 > libibverbs-1.1.7-1.el6.x86_64 librdmacm-1.0.17-1.el6.x86_64 > zlib-1.2.3-29.el6.x86_64 > (gdb) bt > #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x000000000042ea39 in fio_mutex_down (mutex=0x7f08b4e9f000) at mutex.c:155 > #2 0x000000000041b680 in show_run_stats () at stat.c:1409 > #3 0x0000000000449c85 in fio_backend () at backend.c:2042 > #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6 > #5 0x000000000040a4b9 in _start () > (gdb) Are there other threads alive, would be interesting to see a backtrace from them. In any case, I think it'd be better to move the stat mutex grab to the stat thread itself. I can't reproduce this, so can you check if the attached patch makes a difference? -- Jens Axboe [-- Attachment #2: stat.patch --] [-- Type: text/x-patch, Size: 1325 bytes --] diff --git a/stat.c b/stat.c index 979c8100d378..d8365811b25f 100644 --- a/stat.c +++ b/stat.c @@ -1411,13 +1411,15 @@ void show_run_stats(void) fio_mutex_up(stat_mutex); } -static void *__show_running_run_stats(void fio_unused *arg) +static void *__show_running_run_stats(void *arg) { struct thread_data *td; unsigned long long *rt; struct timeval tv; int i; + fio_mutex_down(stat_mutex); + rt = malloc(thread_number * sizeof(unsigned long long)); fio_gettime(&tv, NULL); @@ -1458,6 +1460,7 @@ static void *__show_running_run_stats(void fio_unused *arg) free(rt); fio_mutex_up(stat_mutex); + free(arg); return NULL; } @@ -1468,21 +1471,23 @@ static void *__show_running_run_stats(void fio_unused *arg) */ void show_running_run_stats(void) { - pthread_t thread; + pthread_t *thread; - fio_mutex_down(stat_mutex); + thread = calloc(1, sizeof(*thread)); + if (!thread) + return; - if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) { + if (!pthread_create(thread, NULL, __show_running_run_stats, thread)) { int err; - err = pthread_detach(thread); + err = pthread_detach(*thread); if (err) log_err("fio: DU thread detach failed: %s\n", strerror(err)); return; } - fio_mutex_up(stat_mutex); + free(thread); } static int status_interval_init; ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-21 8:08 ` Jens Axboe @ 2014-07-21 20:25 ` Vasily Tarasov 2014-07-21 20:54 ` Jens Axboe 2014-07-25 7:43 ` Jens Axboe 0 siblings, 2 replies; 25+ messages in thread From: Vasily Tarasov @ 2014-07-21 20:25 UTC (permalink / raw) To: Jens Axboe; +Cc: Michael Mattsson, fio@vger.kernel.org Hi Jens, I tried your patch, but it didn't help. Interestingly, the number of threads changes in the end. At first, during the run: # ps -eLf | grep fio root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio --status-interval 10 --minimal fios/1.fio root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio [root@bison01 vass]# ps -eLf | grep fio root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio --status-interval 10 --minimal fios/1.fio root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio [root@bison01 vass]# ps -eLf | grep fio root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio --status-interval 10 --minimal fios/1.fio root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio Later, when the threads are stuck: # ps -eLf | grep fio root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio --status-interval 10 --minimal fios/1.fio root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio Is the number of threads supposed to change?.. Here is the fio file I'm using: <snip starts> # cat fios/1.fio [global] rw=write bs=8m direct=0 [sdaa] filename=/dev/sdaa <snip ends> Command line is fio --status-interval 10 --minimal fios/1.fio Thanks, Vasily ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-21 20:25 ` Vasily Tarasov @ 2014-07-21 20:54 ` Jens Axboe 2014-07-25 7:43 ` Jens Axboe 1 sibling, 0 replies; 25+ messages in thread From: Jens Axboe @ 2014-07-21 20:54 UTC (permalink / raw) To: Vasily Tarasov; +Cc: Michael Mattsson, fio@vger.kernel.org On 2014-07-21 22:25, Vasily Tarasov wrote: > Hi Jens, > > I tried your patch, but it didn't help. Interestingly, the number of > threads changes in the end. At first, during the run: > > # ps -eLf | grep fio > root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio > --status-interval 10 --minimal fios/1.fio > root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio > [root@bison01 vass]# ps -eLf | grep fio > root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio > --status-interval 10 --minimal fios/1.fio > root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio > [root@bison01 vass]# ps -eLf | grep fio > root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio > --status-interval 10 --minimal fios/1.fio > root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio > > Later, when the threads are stuck: > > # ps -eLf | grep fio > root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio > > Is the number of threads supposed to change?.. > > Here is the fio file I'm using: > > <snip starts> > # cat fios/1.fio > [global] > rw=write > bs=8m > direct=0 > > [sdaa] > filename=/dev/sdaa > <snip ends> > > Command line is > > fio --status-interval 10 --minimal fios/1.fio Thanks, I'll try and reproduce it tomorrow. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-21 20:25 ` Vasily Tarasov 2014-07-21 20:54 ` Jens Axboe @ 2014-07-25 7:43 ` Jens Axboe 2014-07-25 7:56 ` Jens Axboe 1 sibling, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-25 7:43 UTC (permalink / raw) To: Vasily Tarasov; +Cc: Michael Mattsson, fio@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3920 bytes --] On 2014-07-21 22:25, Vasily Tarasov wrote: > Hi Jens, > > I tried your patch, but it didn't help. Interestingly, the number of > threads changes in the end. At first, during the run: > > # ps -eLf | grep fio > root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio > --status-interval 10 --minimal fios/1.fio > root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio > [root@bison01 vass]# ps -eLf | grep fio > root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio > --status-interval 10 --minimal fios/1.fio > root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio > [root@bison01 vass]# ps -eLf | grep fio > root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio > --status-interval 10 --minimal fios/1.fio > root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio > > Later, when the threads are stuck: > > # ps -eLf | grep fio > root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio > --status-interval 10 --minimal fios/1.fio > root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio > > Is the number of threads supposed to change?.. Never answered this one... Yes, it'll change, since when you run the job, you'll have one backend process, a number of IO workers, and one disk util thread typically. When you get stuck, it's the backend that is left waiting for that mutex. In any case, I haven't been able to figure this one out yet. But it should be safe enough to just ignore the stat mutex for the final output, since the threads otherwise accessing it are gone. Can you see if this one makes the issue go away? -- Jens Axboe [-- Attachment #2: stat-hang.patch --] [-- Type: text/x-patch, Size: 812 bytes --] diff --git a/backend.c b/backend.c index 30f78b72f9d1..981625b61095 100644 --- a/backend.c +++ b/backend.c @@ -2068,7 +2068,7 @@ int fio_backend(void) run_threads(); if (!fio_abort) { - show_run_stats(); + __show_run_stats(); if (write_bw_log) { int i; diff --git a/stat.h b/stat.h index 2e46175053e8..90a7fb31a1bc 100644 --- a/stat.h +++ b/stat.h @@ -218,6 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs); extern int calc_thread_status(struct jobs_eta *je, int force); extern void display_thread_status(struct jobs_eta *je); extern void show_run_stats(void); +extern void __show_run_stats(void); extern void show_running_run_stats(void); extern void check_for_running_stats(void); extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr); ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-25 7:43 ` Jens Axboe @ 2014-07-25 7:56 ` Jens Axboe 2014-07-25 16:34 ` Vasily Tarasov 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-25 7:56 UTC (permalink / raw) To: Vasily Tarasov; +Cc: Michael Mattsson, fio@vger.kernel.org On 2014-07-25 09:43, Jens Axboe wrote: > On 2014-07-21 22:25, Vasily Tarasov wrote: >> Hi Jens, >> >> I tried your patch, but it didn't help. Interestingly, the number of >> threads changes in the end. At first, during the run: >> >> # ps -eLf | grep fio >> root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio >> --status-interval 10 --minimal fios/1.fio >> root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio >> [root@bison01 vass]# ps -eLf | grep fio >> root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio >> --status-interval 10 --minimal fios/1.fio >> root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio >> [root@bison01 vass]# ps -eLf | grep fio >> root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio >> --status-interval 10 --minimal fios/1.fio >> root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio >> >> Later, when the threads are stuck: >> >> # ps -eLf | grep fio >> root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio >> --status-interval 10 --minimal fios/1.fio >> root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio >> >> Is the number of threads supposed to change?.. > > Never answered this one... Yes, it'll change, since when you run the > job, you'll have one backend process, a number of IO workers, and one > disk util thread typically. When you get stuck, it's the backend that is > left waiting for that mutex. > > In any case, I haven't been able to figure this one out yet. But it > should be safe enough to just ignore the stat mutex for the final > output, since the threads otherwise accessing it are gone. Can you see > if this one makes the issue go away? Patch was not compiled, was missing the non-static __show_run_stats(). But just pull current -git, I have committed a variant that does compile :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-25 7:56 ` Jens Axboe @ 2014-07-25 16:34 ` Vasily Tarasov 2014-07-28 8:56 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Vasily Tarasov @ 2014-07-25 16:34 UTC (permalink / raw) To: Jens Axboe; +Cc: Michael Mattsson, fio@vger.kernel.org Hi Jens, You'll be surprised but it did not help :( I used the latest code from git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture. I don't know if it helps, but I see this behavior on a machine with 96GB of RAM. So, after buffered writes are over, fio waits for a long time till all dirty buffers hit the disk. But, even after there is no more disk activity, fio is still stuck for as long as I don't kill it. Regarding the number of threads. I do understand where the 3 threads can come from: 1) Backend thread (sort of a manager) 1) Worker thread(s) 2) Disk stats thread I my case I defined only one job instance, so I suppose there always should be only one worker thread. I don't understand how the total number of threads go to 10 in the end. <snip starts> $ ps -eLf | grep fio root 4427 4135 4427 0 15 07:44 pts/1 00:00:02 fio --minimal --status-interval 10 1.fio root 4427 4135 4636 0 15 07:56 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4637 0 15 07:57 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4638 0 15 07:57 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4647 0 15 07:57 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4650 0 15 07:57 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4651 0 15 07:57 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4652 0 15 07:57 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4653 0 15 07:58 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4654 0 15 07:58 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4663 0 15 07:58 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4664 0 15 07:58 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4666 0 15 07:58 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4668 0 15 07:58 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio root 4427 4135 4669 0 15 07:59 pts/1 00:00:00 fio --minimal --status-interval 10 1.fio <snip ends> Thanks, Vasily On Fri, Jul 25, 2014 at 3:56 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-25 09:43, Jens Axboe wrote: >> >> On 2014-07-21 22:25, Vasily Tarasov wrote: >>> >>> Hi Jens, >>> >>> I tried your patch, but it didn't help. Interestingly, the number of >>> threads changes in the end. At first, during the run: >>> >>> # ps -eLf | grep fio >>> root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio >>> [root@bison01 vass]# ps -eLf | grep fio >>> root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio >>> [root@bison01 vass]# ps -eLf | grep fio >>> root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio >>> >>> Later, when the threads are stuck: >>> >>> # ps -eLf | grep fio >>> root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio >>> --status-interval 10 --minimal fios/1.fio >>> root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio >>> >>> Is the number of threads supposed to change?.. >> >> >> Never answered this one... Yes, it'll change, since when you run the >> job, you'll have one backend process, a number of IO workers, and one >> disk util thread typically. When you get stuck, it's the backend that is >> left waiting for that mutex. >> >> In any case, I haven't been able to figure this one out yet. But it >> should be safe enough to just ignore the stat mutex for the final >> output, since the threads otherwise accessing it are gone. Can you see >> if this one makes the issue go away? > > > Patch was not compiled, was missing the non-static __show_run_stats(). But > just pull current -git, I have committed a variant that does compile :-) > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-25 16:34 ` Vasily Tarasov @ 2014-07-28 8:56 ` Jens Axboe 2014-07-28 16:05 ` Vasily Tarasov 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-28 8:56 UTC (permalink / raw) To: Vasily Tarasov; +Cc: Michael Mattsson, fio@vger.kernel.org On 2014-07-25 18:34, Vasily Tarasov wrote: > Hi Jens, > > You'll be surprised but it did not help :( I used the latest code from > git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture. That's actually good news, since it didn't make a lot of sense. So lets see if we can't get to the bottom of this... > I don't know if it helps, but I see this behavior on a machine with > 96GB of RAM. So, after buffered writes are over, fio waits for a long > time till all dirty buffers hit the disk. But, even after there is no > more disk activity, fio is still stuck for as long as I don't kill it. > > Regarding the number of threads. I do understand where the 3 threads > can come from: > > 1) Backend thread (sort of a manager) > 1) Worker thread(s) > 2) Disk stats thread > > I my case I defined only one job instance, so I suppose there always > should be only one worker thread. I don't understand how the total > number of threads go to 10 in the end. > > <snip starts> > $ ps -eLf | grep fio > root 4427 4135 4427 0 15 07:44 pts/1 00:00:02 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4636 0 15 07:56 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4637 0 15 07:57 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4638 0 15 07:57 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4647 0 15 07:57 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4650 0 15 07:57 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4651 0 15 07:57 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4652 0 15 07:57 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4653 0 15 07:58 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4654 0 15 07:58 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4663 0 15 07:58 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4664 0 15 07:58 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4666 0 15 07:58 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4668 0 15 07:58 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > root 4427 4135 4669 0 15 07:59 pts/1 00:00:00 fio > --minimal --status-interval 10 1.fio > <snip ends> Can you try and gdb attach to it when it's hung and produce a new backtrace? It can't be off the final status run, I wonder if it's off the mutex down and remove instead. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-28 8:56 ` Jens Axboe @ 2014-07-28 16:05 ` Vasily Tarasov 2014-07-28 19:49 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Vasily Tarasov @ 2014-07-28 16:05 UTC (permalink / raw) To: Jens Axboe; +Cc: Michael Mattsson, fio@vger.kernel.org Hi Jens, Thanks for looking into this. Here is the information you've requested: 1. Workload I'm running: <snip starts> [global] rw=write bs=8m direct=0 thread=1 size=4g [sdaa] filename=/dev/sdaa <snip ends> 2. That's what I see on the screen: <snip starts> # fio --status-interval 10 1.fio sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1 fio-2.1.11-10-gae7e Starting 1 thread Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s] sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014 write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68 lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71 clat percentiles (usec): | 1.00th=[ 2448], 5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ 2544], | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ 2576], | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ 3152], | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ 3664], | 99.99th=[ 3664] bw (MB /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, stdev=215.50 lat (msec) : 4=100.00% cpu : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=512/d=0, short=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s, maxb=367566KB/s, mint=11411msec, maxt=11411msec Disk stats (read/write): sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36% <snip ends> At this point fio does not exit. Below are the gdb backtraces. There are three threads in the run. When I attach to one thread, get backtrace, and then detach fio crashes with Segfault. So, to collect 3 backtraces, I ran the experiment 3 times and attached to a different thread every time. Below, I change the PIDs so that they match the ps output. # ps -eLf | grep fio root 18844 10064 18844 0 3 08:50 pts/1 00:00:00 fio --status-interval 10 1.fio root 18844 10064 18862 0 3 08:50 pts/1 00:00:00 fio --status-interval 10 1.fio root 18844 10064 18863 0 3 08:50 pts/1 00:00:00 fio --status-interval 10 1.fio root 18902 18805 18902 0 1 08:53 pts/0 00:00:00 grep fio # gdb (gdb) attach 18844 (gdb) bt #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155 #2 0x0000000000411bc0 in stat_exit () at stat.c:1905 #3 0x000000000044c1d6 in fio_backend () at backend.c:2094 #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6 #5 0x000000000040a679 in _start () (gdb) attach 1862 (gdb) bt #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at mutex.c:155 #2 0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at stat.c:1445 #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0 #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6 (gdb) attach 1863 (gdb) bt #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155 #2 0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at stat.c:1421 #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0 #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6 Let me know if you need any additional information. Thank you, Vasily On Mon, Jul 28, 2014 at 1:56 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-25 18:34, Vasily Tarasov wrote: >> >> Hi Jens, >> >> You'll be surprised but it did not help :( I used the latest code from >> git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture. > > > That's actually good news, since it didn't make a lot of sense. So lets see > if we can't get to the bottom of this... > > >> I don't know if it helps, but I see this behavior on a machine with >> 96GB of RAM. So, after buffered writes are over, fio waits for a long >> time till all dirty buffers hit the disk. But, even after there is no >> more disk activity, fio is still stuck for as long as I don't kill it. >> >> Regarding the number of threads. I do understand where the 3 threads >> can come from: >> >> 1) Backend thread (sort of a manager) >> 1) Worker thread(s) >> 2) Disk stats thread >> >> I my case I defined only one job instance, so I suppose there always >> should be only one worker thread. I don't understand how the total >> number of threads go to 10 in the end. >> >> <snip starts> >> $ ps -eLf | grep fio >> root 4427 4135 4427 0 15 07:44 pts/1 00:00:02 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4636 0 15 07:56 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4637 0 15 07:57 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4638 0 15 07:57 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4647 0 15 07:57 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4650 0 15 07:57 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4651 0 15 07:57 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4652 0 15 07:57 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4653 0 15 07:58 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4654 0 15 07:58 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4663 0 15 07:58 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4664 0 15 07:58 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4666 0 15 07:58 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4668 0 15 07:58 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> root 4427 4135 4669 0 15 07:59 pts/1 00:00:00 fio >> --minimal --status-interval 10 1.fio >> <snip ends> > > > Can you try and gdb attach to it when it's hung and produce a new backtrace? > It can't be off the final status run, I wonder if it's off the mutex down > and remove instead. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-28 16:05 ` Vasily Tarasov @ 2014-07-28 19:49 ` Jens Axboe 2014-10-23 15:57 ` Michael Mattsson 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-07-28 19:49 UTC (permalink / raw) To: Vasily Tarasov; +Cc: Michael Mattsson, fio@vger.kernel.org On 2014-07-28 18:05, Vasily Tarasov wrote: > Hi Jens, > > Thanks for looking into this. Here is the information you've requested: > > 1. Workload I'm running: > > <snip starts> > > [global] > rw=write > bs=8m > direct=0 > thread=1 > size=4g > > [sdaa] > filename=/dev/sdaa > > <snip ends> > > 2. That's what I see on the screen: > > <snip starts> > > # fio --status-interval 10 1.fio > sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1 > fio-2.1.11-10-gae7e > Starting 1 thread > Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s] > sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014 > write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec > clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68 > lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71 > clat percentiles (usec): > | 1.00th=[ 2448], 5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ 2544], > | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ 2576], > | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ 3152], > | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ 3664], > | 99.99th=[ 3664] > bw (MB /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, stdev=215.50 > lat (msec) : 4=100.00% > cpu : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6 > IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > issued : total=r=0/w=512/d=0, short=r=0/w=0/d=0 > latency : target=0, window=0, percentile=100.00%, depth=1 > > Run status group 0 (all jobs): > WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s, > maxb=367566KB/s, mint=11411msec, maxt=11411msec > > Disk stats (read/write): > sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36% > > <snip ends> > > At this point fio does not exit. Below are the gdb backtraces. There > are three threads in the run. When I attach to one thread, get > backtrace, and then detach fio crashes with Segfault. So, to collect 3 > backtraces, I ran the experiment 3 times and attached to a different > thread every time. Below, I change the PIDs so that they match the ps > output. > > # ps -eLf | grep fio > root 18844 10064 18844 0 3 08:50 pts/1 00:00:00 fio > --status-interval 10 1.fio > root 18844 10064 18862 0 3 08:50 pts/1 00:00:00 fio > --status-interval 10 1.fio > root 18844 10064 18863 0 3 08:50 pts/1 00:00:00 fio > --status-interval 10 1.fio > root 18902 18805 18902 0 1 08:53 pts/0 00:00:00 grep fio > # gdb > (gdb) attach 18844 > (gdb) bt > #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155 > #2 0x0000000000411bc0 in stat_exit () at stat.c:1905 > #3 0x000000000044c1d6 in fio_backend () at backend.c:2094 > #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6 > #5 0x000000000040a679 in _start () > (gdb) attach 1862 > (gdb) bt > #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at mutex.c:155 > #2 0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at stat.c:1445 > #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0 > #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6 > (gdb) attach 1863 > (gdb) bt > #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155 > #2 0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at stat.c:1421 > #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0 > #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6 > > Let me know if you need any additional information. This is perfect, makes a lot more sense. Progress on this may be a little slow, I'm on vacation this week... But expect something next week, at least. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-07-28 19:49 ` Jens Axboe @ 2014-10-23 15:57 ` Michael Mattsson 2014-10-23 16:01 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-10-23 15:57 UTC (permalink / raw) To: Jens Axboe; +Cc: Vasily Tarasov, fio@vger.kernel.org Hey, Have there been any work on this? This is still an issue for me on fio 2.1.13. It happens less frequently but I've had two hangs in a sample of ~40 runs. Regards Michael On Mon, Jul 28, 2014 at 12:49 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-07-28 18:05, Vasily Tarasov wrote: >> >> Hi Jens, >> >> Thanks for looking into this. Here is the information you've requested: >> >> 1. Workload I'm running: >> >> <snip starts> >> >> [global] >> rw=write >> bs=8m >> direct=0 >> thread=1 >> size=4g >> >> [sdaa] >> filename=/dev/sdaa >> >> <snip ends> >> >> 2. That's what I see on the screen: >> >> <snip starts> >> >> # fio --status-interval 10 1.fio >> sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1 >> fio-2.1.11-10-gae7e >> Starting 1 thread >> Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta >> 00m:00s] >> sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014 >> write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec >> clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68 >> lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71 >> clat percentiles (usec): >> | 1.00th=[ 2448], 5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ >> 2544], >> | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ >> 2576], >> | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ >> 3152], >> | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ >> 3664], >> | 99.99th=[ 3664] >> bw (MB /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, >> stdev=215.50 >> lat (msec) : 4=100.00% >> cpu : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6 >> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >> >=64=0.0% >> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >> >=64=0.0% >> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >> >=64=0.0% >> issued : total=r=0/w=512/d=0, short=r=0/w=0/d=0 >> latency : target=0, window=0, percentile=100.00%, depth=1 >> >> Run status group 0 (all jobs): >> WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s, >> maxb=367566KB/s, mint=11411msec, maxt=11411msec >> >> Disk stats (read/write): >> sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36% >> >> <snip ends> >> >> At this point fio does not exit. Below are the gdb backtraces. There >> are three threads in the run. When I attach to one thread, get >> backtrace, and then detach fio crashes with Segfault. So, to collect 3 >> backtraces, I ran the experiment 3 times and attached to a different >> thread every time. Below, I change the PIDs so that they match the ps >> output. >> >> # ps -eLf | grep fio >> root 18844 10064 18844 0 3 08:50 pts/1 00:00:00 fio >> --status-interval 10 1.fio >> root 18844 10064 18862 0 3 08:50 pts/1 00:00:00 fio >> --status-interval 10 1.fio >> root 18844 10064 18863 0 3 08:50 pts/1 00:00:00 fio >> --status-interval 10 1.fio >> root 18902 18805 18902 0 1 08:53 pts/0 00:00:00 grep fio >> # gdb >> (gdb) attach 18844 >> (gdb) bt >> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from >> /lib64/libpthread.so.0 >> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at >> mutex.c:155 >> #2 0x0000000000411bc0 in stat_exit () at stat.c:1905 >> #3 0x000000000044c1d6 in fio_backend () at backend.c:2094 >> #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6 >> #5 0x000000000040a679 in _start () >> (gdb) attach 1862 >> (gdb) bt >> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from >> /lib64/libpthread.so.0 >> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at >> mutex.c:155 >> #2 0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at >> stat.c:1445 >> #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0 >> #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6 >> (gdb) attach 1863 >> (gdb) bt >> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from >> /lib64/libpthread.so.0 >> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at >> mutex.c:155 >> #2 0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at >> stat.c:1421 >> #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0 >> #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6 >> >> Let me know if you need any additional information. > > > This is perfect, makes a lot more sense. Progress on this may be a little > slow, I'm on vacation this week... But expect something next week, at least. > > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-10-23 15:57 ` Michael Mattsson @ 2014-10-23 16:01 ` Jens Axboe 2014-10-23 21:58 ` Michael Mattsson 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-10-23 16:01 UTC (permalink / raw) To: Michael Mattsson; +Cc: Vasily Tarasov, fio@vger.kernel.org On 10/23/2014 09:57 AM, Michael Mattsson wrote: > Hey, > Have there been any work on this? This is still an issue for me on fio > 2.1.13. It happens less frequently but I've had two hangs in a sample > of ~40 runs. Try current -git and see if it works better. If not, lets get some new traces and we'll get this fixed up. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-10-23 16:01 ` Jens Axboe @ 2014-10-23 21:58 ` Michael Mattsson 2014-10-24 5:17 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-10-23 21:58 UTC (permalink / raw) To: Jens Axboe; +Cc: Vasily Tarasov, fio@vger.kernel.org Hi, I got some new traces of the stuck fio process. This is 2.1.13: (gdb) bt #0 0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at mutex.c:155 #2 0x0000000000411950 in stat_exit () at stat.c:1905 #3 0x000000000044c756 in fio_backend () at backend.c:2094 #4 0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6 #5 0x0000000000409e49 in _start () (gdb) With fio-2.1.13-88-gb2ee7: (gdb) bt #0 0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0 #1 0x0000000000411939 in wait_for_status_interval_thread_exit () at stat.c:1939 #2 0x000000000044ce2a in fio_backend () at backend.c:2084 #3 0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6 #4 0x000000000040a529 in _start () (gdb) Regards Michael On Thu, Oct 23, 2014 at 9:01 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 10/23/2014 09:57 AM, Michael Mattsson wrote: >> Hey, >> Have there been any work on this? This is still an issue for me on fio >> 2.1.13. It happens less frequently but I've had two hangs in a sample >> of ~40 runs. > > Try current -git and see if it works better. If not, lets get some new > traces and we'll get this fixed up. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-10-23 21:58 ` Michael Mattsson @ 2014-10-24 5:17 ` Jens Axboe 2014-10-24 15:33 ` Michael Mattsson 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-10-24 5:17 UTC (permalink / raw) To: Michael Mattsson; +Cc: Vasily Tarasov, fio@vger.kernel.org On 2014-10-23 15:58, Michael Mattsson wrote: > Hi, > I got some new traces of the stuck fio process. > > This is 2.1.13: > (gdb) bt > #0 0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at mutex.c:155 > #2 0x0000000000411950 in stat_exit () at stat.c:1905 > #3 0x000000000044c756 in fio_backend () at backend.c:2094 > #4 0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6 > #5 0x0000000000409e49 in _start () > (gdb) > > With fio-2.1.13-88-gb2ee7: > (gdb) bt > #0 0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0 > #1 0x0000000000411939 in wait_for_status_interval_thread_exit () at stat.c:1939 > #2 0x000000000044ce2a in fio_backend () at backend.c:2084 > #3 0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6 > #4 0x000000000040a529 in _start () OK, I think I see what this is. Please update to current -git again and see if that doesn't fix up this issue. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-10-24 5:17 ` Jens Axboe @ 2014-10-24 15:33 ` Michael Mattsson 2014-10-24 15:50 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Michael Mattsson @ 2014-10-24 15:33 UTC (permalink / raw) To: Jens Axboe; +Cc: Vasily Tarasov, fio@vger.kernel.org Hey, This seems to be fixed now. I successfully finished 1500+ runs over night in my rig without issues (it would have a hanging fio thread about every other or second or so between eight clients). The stumbling output issue on the second status interval thread I've got still remain inconsistent. Regards Michael On Thu, Oct 23, 2014 at 10:17 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-10-23 15:58, Michael Mattsson wrote: >> >> Hi, >> I got some new traces of the stuck fio process. >> >> This is 2.1.13: >> (gdb) bt >> #0 0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from >> /lib64/libpthread.so.0 >> #1 0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at >> mutex.c:155 >> #2 0x0000000000411950 in stat_exit () at stat.c:1905 >> #3 0x000000000044c756 in fio_backend () at backend.c:2094 >> #4 0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6 >> #5 0x0000000000409e49 in _start () >> (gdb) >> >> With fio-2.1.13-88-gb2ee7: >> (gdb) bt >> #0 0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0 >> #1 0x0000000000411939 in wait_for_status_interval_thread_exit () at >> stat.c:1939 >> #2 0x000000000044ce2a in fio_backend () at backend.c:2084 >> #3 0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6 >> #4 0x000000000040a529 in _start () > > > OK, I think I see what this is. Please update to current -git again and see > if that doesn't fix up this issue. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-10-24 15:33 ` Michael Mattsson @ 2014-10-24 15:50 ` Jens Axboe 2014-11-04 15:46 ` Vasily Tarasov 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2014-10-24 15:50 UTC (permalink / raw) To: Michael Mattsson; +Cc: Vasily Tarasov, fio@vger.kernel.org On 2014-10-24 09:33, Michael Mattsson wrote: > Hey, > This seems to be fixed now. I successfully finished 1500+ runs over > night in my rig without issues (it would have a hanging fio thread > about every other or second or so between eight clients). Goodie, I thought that it might. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval 2014-10-24 15:50 ` Jens Axboe @ 2014-11-04 15:46 ` Vasily Tarasov 0 siblings, 0 replies; 25+ messages in thread From: Vasily Tarasov @ 2014-11-04 15:46 UTC (permalink / raw) To: Jens Axboe; +Cc: Michael Mattsson, fio@vger.kernel.org The latest code from git does not hang for me as well. Thanks. Vasily On Fri, Oct 24, 2014 at 11:50 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 2014-10-24 09:33, Michael Mattsson wrote: >> >> Hey, >> This seems to be fixed now. I successfully finished 1500+ runs over >> night in my rig without issues (it would have a hanging fio thread >> about every other or second or so between eight clients). > > > Goodie, I thought that it might. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2014-11-04 15:46 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-09 22:56 fio hangs with --status-interval Michael Mattsson 2014-07-10 8:44 ` Jens Axboe 2014-07-10 14:55 ` Michael Mattsson 2014-07-10 20:07 ` Jens Axboe 2014-07-10 21:27 ` Michael Mattsson 2014-07-11 11:48 ` Jens Axboe 2014-07-11 16:19 ` Michael Mattsson 2014-07-12 8:42 ` Jens Axboe 2014-07-16 16:58 ` Vasily Tarasov 2014-07-21 8:08 ` Jens Axboe 2014-07-21 20:25 ` Vasily Tarasov 2014-07-21 20:54 ` Jens Axboe 2014-07-25 7:43 ` Jens Axboe 2014-07-25 7:56 ` Jens Axboe 2014-07-25 16:34 ` Vasily Tarasov 2014-07-28 8:56 ` Jens Axboe 2014-07-28 16:05 ` Vasily Tarasov 2014-07-28 19:49 ` Jens Axboe 2014-10-23 15:57 ` Michael Mattsson 2014-10-23 16:01 ` Jens Axboe 2014-10-23 21:58 ` Michael Mattsson 2014-10-24 5:17 ` Jens Axboe 2014-10-24 15:33 ` Michael Mattsson 2014-10-24 15:50 ` Jens Axboe 2014-11-04 15:46 ` Vasily Tarasov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.