From: Jens Axboe <axboe@kernel.dk>
To: Erik Lattimore <elatt@permabit.com>
Cc: fio@vger.kernel.org
Subject: Re: Race condition in fio atexit code
Date: Tue, 31 Jul 2012 21:13:07 +0200 [thread overview]
Message-ID: <50182E43.5030309@kernel.dk> (raw)
In-Reply-To: <50182CBA.4060602@kernel.dk>
On 2012-07-31 21:06, Jens Axboe wrote:
> On 2012-06-20 01:05, Erik Lattimore wrote:
>> Lately it seems like we've been hitting this more frequently, so I figured I'd file a bug. Fio starts up a thread running the function disk_thread_main, which periodically calls update_io_ticks, which calls update_io_tick_disk on each entry in a circular linked list. The function disk_thread_main returns when the global variable "threads" is set to null, but it's only checked a couple of times in the loop.
>>
>> The main thread runs the test and exits, and has registered an atexit handler free_shm. This routine sets "threads" to null and frees up storage, including the storage where the linked list used by update_io_ticks is stored.
>>
>> Occasionally, somehow, update_io_tick_disk winds up getting called with a null pointer and crashing. The problem may be exacerbated when memory is tight. Here's the backtrace of the core dump:
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0 update_io_tick_disk (du=<optimized out>) at diskutil.c:80
>> 80 if (!du->users)
>> (gdb) t apply all bt
>>
>> Thread 2 (Thread 0x7faab680b700 (LWP 23148)):
>> #0 0x00007faab58df377 in shmdt () from /lib64/libc.so.6
>> #1 0x000000000040b98d in free_shm () at init.c:231
>> #2 0x00007faab583b7f5 in __run_exit_handlers () from /lib64/libc.so.6
>> #3 0x00007faab583b845 in exit () from /lib64/libc.so.6
>> #4 0x00007faab5824c3d in __libc_start_main () from /lib64/libc.so.6
>> #5 0x0000000000408ed9 in _start ()
>>
>> Thread 1 (Thread 0x7faab32dd700 (LWP 23149)):
>> #0 update_io_tick_disk (du=<optimized out>) at diskutil.c:80
>> #1 update_io_ticks () at diskutil.c:114
>> #2 0x000000000043b303 in disk_thread_main (data=<optimized out>) at backend.c:1589
>> #3 0x00007faab61907b6 in start_thread () from /lib64/libpthread.so.0
>> #4 0x00007faab58dd9cd in clone () from /lib64/libc.so.6
>> #5 0x0000000000000000 in ?? ()
>> (gdb) q--
>
> This is clearly a race in how the disk util thread is shut down and the
> structures freed. I'll take a look at a fix. It would be useful if you
> told me how you are hitting this most easily, as I don't recall seeing
> it. Would make me more confident in a fix.
>
> Also, are you sure it's threads == NULL, and not the du's themselves
> being freed? They are in separate storage. It might be a good idea to
> have diskutil.c:free_disk_util() signal and wait for the disk util
> thread to shutdown before going further.
Can you reproduce with this patch?
diff --git a/backend.c b/backend.c
index e1dc0ac..03d35df 100644
--- a/backend.c
+++ b/backend.c
@@ -50,6 +50,8 @@
#include "server.h"
static pthread_t disk_util_thread;
+struct fio_mutex *disk_util_mutex;
+int disk_util_exit = 0;
static struct fio_mutex *startup_mutex;
static struct fio_mutex *writeout_mutex;
static struct flist_head *cgroup_list;
@@ -1592,9 +1594,9 @@ static void *disk_thread_main(void *data)
{
fio_mutex_up(startup_mutex);
- while (threads) {
+ while (threads && !disk_util_exit) {
usleep(DISK_UTIL_MSEC * 1000);
- if (!threads)
+ if (!threads || disk_util_exit)
break;
update_io_ticks();
@@ -1602,6 +1604,7 @@ static void *disk_thread_main(void *data)
print_thread_status();
}
+ fio_mutex_up(disk_util_mutex);
return NULL;
}
@@ -1609,15 +1612,19 @@ static int create_disk_util_thread(void)
{
int ret;
+ disk_util_mutex = fio_mutex_init(0);
+
ret = pthread_create(&disk_util_thread, NULL, disk_thread_main, NULL);
if (ret) {
log_err("Can't create disk util thread: %s\n", strerror(ret));
+ fio_mutex_remove(disk_util_mutex);
return 1;
}
ret = pthread_detach(disk_util_thread);
if (ret) {
log_err("Can't detatch disk util thread: %s\n", strerror(ret));
+ fio_mutex_remove(disk_util_mutex);
return 1;
}
diff --git a/diskutil.c b/diskutil.c
index feb8852..36b381f 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -521,6 +521,9 @@ void free_disk_util(void)
{
struct disk_util *du;
+ disk_util_exit = 1;
+ fio_mutex_down(disk_util_mutex);
+
while (!flist_empty(&disk_list)) {
du = flist_entry(disk_list.next, struct disk_util, list);
flist_del(&du->list);
@@ -528,6 +531,7 @@ void free_disk_util(void)
}
last_majdev = last_mindev = -1;
+ fio_mutex_remove(disk_util_mutex);
}
void print_disk_util(struct disk_util_stat *dus, struct disk_util_agg *agg,
diff --git a/diskutil.h b/diskutil.h
index 5a9b079..c7212fa 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -3,6 +3,9 @@
#define FIO_DU_NAME_SZ 64
+extern struct fio_mutex *disk_util_mutex;
+extern int disk_util_exit;
+
/*
* Disk utils as read in /sys/block/<dev>/stat
*/
@@ -106,7 +109,10 @@ extern void update_io_ticks(void);
#else
#define print_disk_util(dus, agg, terse)
#define show_disk_util(terse)
-#define free_disk_util()
+static inline void free_disk_util(void)
+{
+ fio_mutex_remove(disk_util_mutex);
+}
#define init_disk_util(td)
#define update_io_ticks()
#endif
--
Jens Axboe
next prev parent reply other threads:[~2012-07-31 19:13 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-19 23:05 Race condition in fio atexit code Erik Lattimore
2012-07-31 19:06 ` Jens Axboe
2012-07-31 19:13 ` Jens Axboe [this message]
2012-08-01 7:41 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50182E43.5030309@kernel.dk \
--to=axboe@kernel.dk \
--cc=elatt@permabit.com \
--cc=fio@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.