From: Jens Axboe <axboe@kernel.dk>
To: Erik Lattimore <elatt@permabit.com>
Cc: fio@vger.kernel.org
Subject: Re: Race condition in fio atexit code
Date: Tue, 31 Jul 2012 21:06:34 +0200 [thread overview]
Message-ID: <50182CBA.4060602@kernel.dk> (raw)
In-Reply-To: <998AA462-26EC-421F-B005-B4B6C936A233@permabit.com>
On 2012-06-20 01:05, Erik Lattimore wrote:
> Lately it seems like we've been hitting this more frequently, so I figured I'd file a bug. Fio starts up a thread running the function disk_thread_main, which periodically calls update_io_ticks, which calls update_io_tick_disk on each entry in a circular linked list. The function disk_thread_main returns when the global variable "threads" is set to null, but it's only checked a couple of times in the loop.
>
> The main thread runs the test and exits, and has registered an atexit handler free_shm. This routine sets "threads" to null and frees up storage, including the storage where the linked list used by update_io_ticks is stored.
>
> Occasionally, somehow, update_io_tick_disk winds up getting called with a null pointer and crashing. The problem may be exacerbated when memory is tight. Here's the backtrace of the core dump:
>
> Program terminated with signal 11, Segmentation fault.
> #0 update_io_tick_disk (du=<optimized out>) at diskutil.c:80
> 80 if (!du->users)
> (gdb) t apply all bt
>
> Thread 2 (Thread 0x7faab680b700 (LWP 23148)):
> #0 0x00007faab58df377 in shmdt () from /lib64/libc.so.6
> #1 0x000000000040b98d in free_shm () at init.c:231
> #2 0x00007faab583b7f5 in __run_exit_handlers () from /lib64/libc.so.6
> #3 0x00007faab583b845 in exit () from /lib64/libc.so.6
> #4 0x00007faab5824c3d in __libc_start_main () from /lib64/libc.so.6
> #5 0x0000000000408ed9 in _start ()
>
> Thread 1 (Thread 0x7faab32dd700 (LWP 23149)):
> #0 update_io_tick_disk (du=<optimized out>) at diskutil.c:80
> #1 update_io_ticks () at diskutil.c:114
> #2 0x000000000043b303 in disk_thread_main (data=<optimized out>) at backend.c:1589
> #3 0x00007faab61907b6 in start_thread () from /lib64/libpthread.so.0
> #4 0x00007faab58dd9cd in clone () from /lib64/libc.so.6
> #5 0x0000000000000000 in ?? ()
> (gdb) q--
This is clearly a race in how the disk util thread is shut down and the
structures freed. I'll take a look at a fix. It would be useful if you
told me how you are hitting this most easily, as I don't recall seeing
it. Would make me more confident in a fix.
Also, are you sure it's threads == NULL, and not the du's themselves
being freed? They are in separate storage. It might be a good idea to
have diskutil.c:free_disk_util() signal and wait for the disk util
thread to shutdown before going further.
--
Jens Axboe
next prev parent reply other threads:[~2012-07-31 19:06 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-19 23:05 Race condition in fio atexit code Erik Lattimore
2012-07-31 19:06 ` Jens Axboe [this message]
2012-07-31 19:13 ` Jens Axboe
2012-08-01 7:41 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50182CBA.4060602@kernel.dk \
--to=axboe@kernel.dk \
--cc=elatt@permabit.com \
--cc=fio@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.