From: Jens Axboe <axboe@fb.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
linux-ext4@vger.kernel.org, fstests@vger.kernel.org,
tarasov@vasily.name
Subject: Re: Test generic/299 stalling forever
Date: Sun, 23 Oct 2016 08:32:49 -0600
Message-ID: <53fe5a98-6ff9-4fa1-e84c-8a3e16cc0f50@fb.com>
In-Reply-To: <20161021221551.sdv4hgw33zjxnkvu@thunk.org>
On 10/21/2016 04:15 PM, Theodore Ts'o wrote:
> On Thu, Oct 20, 2016 at 08:22:00AM -0600, Jens Axboe wrote:
>>> So what's happening is that generic/299 is looping in the
>>> fallocate/truncate loop until fio exits, but since fio never exits,
>>> it ends up looping forever.
>>
>> I'm setting up the GCE now, I've had the tests running for about 24h now
>> on another test box and haven't been able to trigger any hangs. I'll
>> match your setup as closely as I can, hopefully that'll work.
>
> Any luck reproducing the problem?
>
> On Wed, Oct 19, 2016 at 08:06:44AM -0600, Jens Axboe wrote:
>>
>> I'll take a look today. I agree, this definitely looks like a fio
>> bug. But not related to the mutex issue for the stat part, all verifier
>> threads are waiting to be woken up, but the main thread is done.
>>
>
> I was taking a closer look at this, and it does look like it's related
> to the stat_mutex. The main thread (according to gdb) seems to be
> stuck in this loop in backend.c line 1738 (in thread_main):
>
>     do {
>             check_update_rusage(td);
>             if (!fio_mutex_down_trylock(stat_mutex))
>                     break;
>             usleep(1000);   <----- line 1738
>     } while (1);
>
> So it looks like it's not able to grab the stat_mutex. But I can't
> figure out how the stat_mutex could be down. None of the stack
> traces seem to show that, and I've looked at all of the places where
> stat_mutex is taken, and it doesn't look like stat_mutex should ever
> be down for more than, say, a second?
>
> So as a temporary workaround, I'm considering adding a check to see if
> we stay stuck in this loop for more than a thousand iterations, and if
> so, print an error to stderr and then call _exit(1), or maybe just break
> out two levels by jumping to line 1778 at "td_set_runstate(td, TD_FINISHING)"
> and just give up on the usage statistics (since for xfstests we really
> don't care about the usage stats).
Very strange. Can you see who the owner of stat_mutex->lock is? That's
the pthread_mutex_t they are sleeping on.
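One way to answer the owner question, sketched under a loud assumption: on Linux with glibc, a default (non-robust, non-PI) pthread_mutex_t records the kernel thread ID of its current holder in the internal field __data.__owner, and 0 when unlocked. This is an implementation detail of glibc, not a documented API, and the helper names below are hypothetical. In a debugger the same field can be inspected directly on the mutex in question.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * ASSUMPTION: glibc on Linux. __data.__owner is a glibc-internal field
 * holding the kernel TID of the thread that locked the mutex, or 0 when
 * the mutex is not held. It is not portable and may change.
 */
static pid_t mutex_owner_tid(const pthread_mutex_t *lock)
{
    return (pid_t)lock->__data.__owner;
}

/* Kernel thread ID of the caller, for comparison with the owner field. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

Comparing the reported owner against the thread IDs listed by the debugger would show which thread, if any, still holds the lock.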
For now, I'll apply the work-around you sent. I haven't been able to
reproduce this, but knowing that it's the stat_mutex will allow me to
better make up a test case to hit it.
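For illustration only, a minimal sketch of the kind of bounded retry loop described in the quoted work-around. It is not the patch that was sent: pthread_mutex_trylock() stands in for fio_mutex_down_trylock(), and bounded_trylock() and MAX_TRYLOCK_SPINS are hypothetical names. The cap of 1000 comes from the "thousand" figure in the quoted text.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical cap, taken from the "thousand" suggested in the thread. */
#define MAX_TRYLOCK_SPINS 1000

/*
 * Try to take the lock, sleeping sleep_us microseconds between attempts.
 * Returns 0 once the lock is held, or -1 after MAX_TRYLOCK_SPINS failed
 * attempts so the caller can give up instead of spinning forever.
 */
static int bounded_trylock(pthread_mutex_t *lock, unsigned int sleep_us)
{
    for (int spins = 0; spins < MAX_TRYLOCK_SPINS; spins++) {
        if (pthread_mutex_trylock(lock) == 0)
            return 0;
        usleep(sleep_us);
    }
    fprintf(stderr, "lock still busy after %d attempts, giving up\n",
            MAX_TRYLOCK_SPINS);
    return -1;
}
```

On a -1 return the caller could either exit with an error or skip the statistics update and continue shutting down, which are the two options raised in the quoted message.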
--
Jens Axboe