Flexible I/O Tester development
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: "Elliott, Robert (Server Storage)" <Elliott@hp.com>,
	"stephenmcameron@gmail.com" <stephenmcameron@gmail.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: Re: fio main thread got stuck over the weekend
Date: Mon, 15 Dec 2014 13:49:35 -0700	[thread overview]
Message-ID: <548F495F.7090200@kernel.dk> (raw)
In-Reply-To: <548F452D.7040401@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 2222 bytes --]

On 12/15/2014 01:31 PM, Jens Axboe wrote:
> On 12/15/2014 01:12 PM, Jens Axboe wrote:
>>> (gdb) thread 212
>>> [Switching to thread 212 (Thread 0x7fa9a086e700 (LWP 6509))]#0  0x0000003974c0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>> (gdb) bt
>>> #0  0x0000003974c0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>> #1  0x0000000000449381 in helper_thread_main (data=<value optimized out>) at backend.c:2127
>>> #2  0x0000003974c079d1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x00000039748e8b7d in clone () from /lib64/libc.so.6
>>> (gdb) select-frame 1
>>> (gdb) info local
>>> ts = {tv_sec = 1418658662, tv_nsec = 980074000}
>>> tv = {tv_sec = 1418658662, tv_usec = 730074}
>>> ret = <value optimized out>
>>> (gdb) print helper_exit
>>> $36 = 0
>>> (gdb) print nr_thread
>>> $37 = 224
>>> (gdb) print nr_process
>>> $38 = 0
>>
>> This is the helper thread, this is what will output the ETA etc stats.
>> There's a signal handler that runs and sets/wakes it to do so. So it
>> would be interesting if you could see what it is up to, set a break
>> point in there and check what the control flow ends up being.
>>
>> It's backend.c:helper_thread_main()
>>
>> backend.c:sig_show_status() should repeatedly fire and end up waking
>> this thread.
> 
> So that referred to the output status, your bug was on the ETA, correct?
> For that, we never get woken up by someone signaling our condvar, we
> just wake up every second to do the ETA display. The time conversion
> looks wrong, though:
> 
> gettimeofday(&tv, NULL);
> ts.tv_sec = tv.tv_sec + sec;
> ts.tv_nsec = (tv.tv_usec * 1000) + nsec;
> if (ts.tv_nsec > 1000000000ULL) {
> 	ts.tv_nsec -= 1000000000ULL;
> 	ts.tv_sec++;
> }
> 
> I wonder what happens if ts.tv_nsec == 10^6, that check really should be
> on >= instead. Not sure if this explains it though, I'll conjure up a
> test case...

On my system, pthread_cond_timedwait() returns EINVAL if nsec is 10^9 or
higher, which is the sane thing to do. I wonder if it does that at your
end too, or whether it hangs? If it hangs indefinitely, that could
explain it. Test case attached, I checked in a fix for the off-by-one.

-- 
Jens Axboe


[-- Attachment #2: cvtest.c --]
[-- Type: text/x-csrc, Size: 1006 bytes --]

/*
 * gcc -Wall -O2 -o cvtest cvtest.c -lpthread
 */
#include <stdio.h>
#include <limits.h>
#include <inttypes.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/time.h>
#include <string.h>
#include <pthread.h>

#define DISK_UTIL_MSEC	250

struct d {
	pthread_mutex_t *lock;
	pthread_cond_t *cond;
};

static void *thread(void *data)
{
	struct d *d = data;
	struct timespec ts;
	struct timeval tv;
	int ret;

	gettimeofday(&tv, NULL);
	ts.tv_sec = tv.tv_sec + 1;
	ts.tv_nsec = 1000000000ULL;

	printf("will wait\n");
	pthread_mutex_lock(d->lock);
	ret = pthread_cond_timedwait(d->cond, d->lock, &ts);
	pthread_mutex_unlock(d->lock);
	printf("done wait %d\n", ret);
	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_cond_t cond;
	pthread_mutex_t lock;
	pthread_t pthread;
	struct d d;
	void *ret;

	pthread_cond_init(&cond, NULL);
	pthread_mutex_init(&lock, NULL);

	d.lock = &lock;
	d.cond = &cond;

	pthread_create(&pthread, NULL, thread, &d);
	pthread_join(pthread, &ret);

	return 0;
}

  reply	other threads:[~2014-12-15 20:49 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-11 15:44 fio main thread got stuck over the weekend scameron
2014-08-11 16:04 ` scameron
2014-08-22 19:04   ` Jens Axboe
2014-08-22 19:09     ` scameron
2014-08-22 19:11       ` Jens Axboe
2014-12-12 20:32         ` Elliott, Robert (Server Storage)
2014-12-13  4:49           ` Jens Axboe
2014-12-15 17:33             ` Elliott, Robert (Server Storage)
2014-12-15 17:37               ` Jens Axboe
2014-12-15 19:39                 ` Elliott, Robert (Server Storage)
2014-12-15 20:12                   ` Jens Axboe
2014-12-15 20:31                     ` Jens Axboe
2014-12-15 20:49                       ` Jens Axboe [this message]
2014-12-16  0:52                         ` Elliott, Robert (Server Storage)
2014-12-16  2:51                           ` Jens Axboe
2014-12-16 22:43                           ` Jens Axboe
2014-12-17  3:52                             ` Elliott, Robert (Server Storage)
2014-12-17  5:43                               ` Jens Axboe
2014-12-17 16:48                                 ` Elliott, Robert (Server Storage)
2014-12-17 17:27                                   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=548F495F.7090200@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=Elliott@hp.com \
    --cc=fio@vger.kernel.org \
    --cc=stephenmcameron@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox