Flexible I/O Tester development
From: Jens Axboe <axboe@kernel.dk>
To: Erik Lattimore <elatt@permabit.com>
Cc: fio@vger.kernel.org
Subject: Re: Race condition in fio atexit code
Date: Tue, 31 Jul 2012 21:06:34 +0200
Message-ID: <50182CBA.4060602@kernel.dk>
In-Reply-To: <998AA462-26EC-421F-B005-B4B6C936A233@permabit.com>

On 2012-06-20 01:05, Erik Lattimore wrote:
> Lately it seems like we've been hitting this more frequently, so I figured I'd file a bug.  Fio starts up a thread running the function disk_thread_main, which periodically calls update_io_ticks, which calls update_io_tick_disk on each entry in a circular linked list. The function disk_thread_main returns when the global variable "threads" is set to null, but it's only checked a couple of times in the loop.
> 
> The main thread runs the test and exits, and has registered an atexit handler free_shm. This routine sets "threads" to null and frees up storage, including the storage where the linked list used by update_io_ticks is stored.
> 
> Occasionally, somehow, update_io_tick_disk winds up getting called with a null pointer and crashing. The problem may be exacerbated when memory is tight. Here's the backtrace of the core dump:
> 
> Program terminated with signal 11, Segmentation fault.
> #0  update_io_tick_disk (du=<optimized out>) at diskutil.c:80
> 80              if (!du->users)
> (gdb) t apply all bt
> 
> Thread 2 (Thread 0x7faab680b700 (LWP 23148)):
> #0  0x00007faab58df377 in shmdt () from /lib64/libc.so.6
> #1  0x000000000040b98d in free_shm () at init.c:231
> #2  0x00007faab583b7f5 in __run_exit_handlers () from /lib64/libc.so.6
> #3  0x00007faab583b845 in exit () from /lib64/libc.so.6
> #4  0x00007faab5824c3d in __libc_start_main () from /lib64/libc.so.6
> #5  0x0000000000408ed9 in _start ()
> 
> Thread 1 (Thread 0x7faab32dd700 (LWP 23149)):
> #0  update_io_tick_disk (du=<optimized out>) at diskutil.c:80
> #1  update_io_ticks () at diskutil.c:114
> #2  0x000000000043b303 in disk_thread_main (data=<optimized out>) at backend.c:1589
> #3  0x00007faab61907b6 in start_thread () from /lib64/libpthread.so.0
> #4  0x00007faab58dd9cd in clone () from /lib64/libc.so.6
> #5  0x0000000000000000 in ?? ()
> (gdb) q--

This is clearly a race between shutting down the disk util thread and
freeing the structures it uses. I'll take a look at a fix. It would help
if you told me how you hit this most easily, as I don't recall seeing
it. That would make me more confident in a fix.

Also, are you sure it's threads == NULL, and not the du's themselves
being freed? They are in separate storage. It might be a good idea to
have diskutil.c:free_disk_util() signal the disk util thread and wait
for it to shut down before going further.

-- 
Jens Axboe


